The Recent “OGG Opus” Codec

One of the uses which I’ve had for OGG Files has been, as a container-file for music, which has been compressed using the lossy “Vorbis” Codec. This has given me superior sound to what MP3 Files once delivered, assuming that I’ve set my Vorbis-encoded streams to a higher bit-rate than what most people set, that being 256kbps, or, Quality Level 8.

But the same people who invented the Vorbis Codec, have embarked on a more recent project, which is called “OGG Opus”, which is a Codec that can switch back and forth seamlessly, between a lossy, Linear Predictive Coding mode (“SILK”), and a mode based on the Type 4 Discrete Cosine Transform (‘DCT’), the latter of which will dominate, when the Codec is used for high-fidelity music. This music-mode is defined by “The CELT Codec”, which has a detailed write-up dating in the year 2010 from its developers, that This Link points to.

I have read the write-up and offer an interpretation of it, which does not require as much technical comprehension, as the technical write-up itself requires, to be understood.

Essentially, the developers have made a radical departure from the approaches used previously, when compressing audio in the frequency domain. Only the least of the changes is, that shorter sampling windows are to be used, such as the 512-sample window which has been sketched, as well as a possible 256-sample window, which was mentioned as well. In return, both the even and odd coefficients of these sampling windows – aka Frames – are used, so that only very little overlap will exist between them. Hence, even though there will still be some overlap, these are mainly just Type 4 Discrete Cosine Transforms.

The concept has been abandoned, that the Codec should reconstruct the spectral definition of the original sound as much as possible, minus the fact that it has to be simplified, in order to be represented with far fewer bits, than the original sound was defined as having. A 44.1kHz, 16-bit, stereo, uncompressed Wave-File consumes about 1.4Mbps, while compressed sampling rates as low as 64kbps are achievable, and music will still sound decently like music. The emphasis here seems to be, that only the subjective perception of the sound is supposed to remain accurate.

(Updated 8/03/2019,16h00 … )

Continue reading The Recent “OGG Opus” Codec

Wavelet Decomposition of Images

One type of wavelet which exists, and which has continued to be of some interest to computational signal processing, is the Haar Wavelet. It’s thought to have a low-pass and a high-pass version complementing each other. This would be the low-pass Haar Wavelet:

[ +1 +1 ]

And this would be the high-pass version:

[ +1 -1 ]

These wavelets are intrinsically flawed, in that if they are applied to audio signals, they will produce poor frequency response each. But they do have as an important Mathematical property, that from its low-pass and its high-pass component, the original signal can be reconstructed fully.

Now, there is also something called a wavelet transform, but I seldom see it used.

If we wanted to extend the Haar Wavelet to the 2D domain, then a first approach might be, to apply it twice, once, one-dimensionally, along each axis of an image. But in reality, this would give the following low-frequency component:

[ +1 +1 ]

[ +1 +1 ]

And only, the following high-frequency component:

[ +1 -1 ]

[ -1 +1 ]

This creates an issue with common sense, because in order to be able to reconstruct the original signal – in this case an image – we’d need to arrive at 4 reduced values, not 2, because the original signal had 4 distinct values.

gpu_1_ow

And so closer inspection should reveal, that the wavelet reduction of images has 3 distinct high-frequency components: ( :1 )

Continue reading Wavelet Decomposition of Images