One of the subjects which I’ve blogged about often, is the compression of sound, including Codecs which are based in the frequency-domain, rather than in the time-domain. What I’ve basically written is that in such cases, the time-domain samples of sound generate granules of frequency-domain coefficients, which are then in turn quantized. What tends to happen is that a new granule of sound is encoded every 576 time-domain samples, but that each time, a 1152-sample sampling window is used, and that due to the application of the “Modified Discrete Cosine Transform” (the ‘MDCT’), what amounts to all the odd coefficients of the Type 2 ‘DCT‘ are encoded, resulting in 576 coefficients being encoded each time. The present sampling window’s cosine function corresponds to the previous and next sampling window’s sine function, so that in a way that is orthogonal, these overlapping sampling windows also have the potential to preserve phase-information.
One observation which my readers may have about this, is the fact that while it does a good job at maintaining spectral resolution, this granule-size does not provide good temporal resolution. Therefore, a mechanism which MP3 compression introduced already, was ‘transient detection’. This feature can arbitrarily replace one of these full-length granules with 3 granules that only generate 192 frequency coefficients, and that recur as frequently.
The method by which transients are detected may be simple. For example, these short granules may tentatively have the stream subdivided all the time, but if any one of them contains more than average variance – which corresponds to signal energy – for example, if one shorter granule contains 1.5 times the average signal energy between the current 3, then this switch can take place.
What I do know is that when granules of sound – or rather, the quantized spectral information from granules of sound – are included in the stream, they include two extra bits each time, that define what the “Zone” of the present granule is. This can be one of four zones:
- A full-sized granule belonging to a stream of them,
- A shortened granule, belonging to a stream of them,
- A shortened granule, that precedes a full-sized granule,
- A shortened granule, that follows a full-sized granule.
- Because it’s inherent in MP3 compression that the entire current sampling window must overlap, partially with the preceding, and partially with the following one, there may be no special rule for how to shape a sampling window, that corresponds to a long granule, both preceded and followed by shortened ones. However, when that happens, both the preceding and following shortened granules will be encoded, to be followed and preceded respectively, by a long granule, for which reason those granules will already have long overlap-portions. Therefore, the current granule in such a case can be encoded as though it was just part of a sequence of long granules.
This information is ultimately non-trivial because it also affects the computation of sampling windows, i.e., it also affects the exact windowing function to be used when encoding. If the granule is followed or preceded by short granules, then either side of the windowing function must also be shortened. (:1)
Now, in the case of other Codecs, such as ‘OGG Vorbis’, a similar approach is taken. But I can well imagine that if specific ideals were simply implemented exactly as they were with MP3 sound, then eventually, the owners of the MP3 Codec might cry foul, over software patent violations. And yet, this problem can easily be sidestepped, let’s say by deciding that the shortened granules be made 1/2 the length of the full-sized granule, instead of 1/3 that length. And at that point the implementation would be sufficiently different from the original idea, that it would no longer constitute a patent violation.
Continue reading Some Trivia about Granules of Sound