Threshold Elimination in Compressed Sound

I’ve written quite a few postings in this blog about sound compression based on the Discrete Cosine Transform. Mixed in with my thoughts about that – where I was still, basically, trying to figure the subject out – were my statements to the effect that frequency-coefficients below a certain threshold of perceptibility could be set to zero, thus reducing the total number of bits taken up when Huffman-encoded.

My biggest problem in trying to analyze this is the fact that I’m considering generalities, when in fact, specific compression methods based on the DCT may or may not apply threshold-elimination at all. As an alternative, the compression technique could just rely on the quantization to reduce how many bits per second it’s going to allocate to each sub-band of frequencies. ( :1 ) If the quantization step / scale-factor was high enough – suggesting the lowest quality-level – then many coefficients could still end up set to zero, just because, as first computed from the DCT, they were below the quantization step used.

My impression is that the procedure which gets used to compute the quantization step remains straightforward:

  • Subdivide the frequencies into an arbitrary set of sub-bands – fewer than 32.
  • For each sub-band, first compute the DCTs to scale.
  • Take the (absolute of the) highest coefficient that results.
  • Divide that by the quality-level ( + 0.5 ) , to arrive at the quantization step to be used for that sub-band.
  • Divide all the actual DCT-coefficients by that quantization step, so that the maximum (signed) integer value that results will be equal to the quality-level.
  • How many coefficients end up encoded with such a high integer value remains beyond our control.
  • Encode the quantization step / scale-factor with the sub-band, as part of the header information for each granule of sound.
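
The steps above can be sketched in a few lines of Python. This is only my illustration of the procedure as listed, not code taken from any real encoder. The function name `quantize_sub_band` is hypothetical, and I am assuming that the division truncates toward zero, which is what makes the peak coefficient land exactly on the quality-level, and which also makes any coefficient smaller than the quantization step come out as zero.

```python
def quantize_sub_band(coeffs, quality_level):
    """Quantize the DCT coefficients of one sub-band.

    Returns (step, quantized), where 'step' is the quantization step /
    scale-factor to store in the header, and 'quantized' is a list of
    signed integers whose largest magnitude equals quality_level.
    """
    # Take the absolute value of the highest coefficient.
    peak = max(abs(c) for c in coeffs)
    if peak == 0.0:
        # A silent sub-band: nothing to scale.
        return 0.0, [0] * len(coeffs)

    # Divide the peak by (quality-level + 0.5) to get the step.
    step = peak / (quality_level + 0.5)

    # Assumption: truncate toward zero. Coefficients whose magnitude
    # is below 'step' therefore become zero without any explicit
    # threshold-elimination.
    quantized = [int(c / step) for c in coeffs]
    return step, quantized


if __name__ == "__main__":
    step, q = quantize_sub_band([10.5, -3.0, 0.4, 1.9], 10)
    print(step, q)
```

On playback, each coefficient would be approximated again as the stored integer times the stored step.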

The sub-bands which I speak of have nothing to do with the fact that additionally, in MP3-compression, the signal is first passed through a quadrature filter-bank, resulting in 32 sub-bands that are evenly spaced in frequency by nature, and that a DCT is computed for each of those sub-bands. This latter feature is a refinement which, as best I recall, was not present in the earliest forms of MP3-compression, and which does not affect how an MP3-file needs to be decoded.

(Updated 03/10/2018 : )


Emphasizing a Presumed Difference between OGG and MP3 Sound Compression

In this posting from some time ago, I wrote down certain details I had learned about MP3 sound compression. I suppose that while I did write that the Discrete Cosine Transform coefficients get scaled, I may have neglected to mention in that same posting that they also get quantized. But I did imply it, and I also made up for the omission in this posting.

But one subject which I did mention over several postings was my own disagreement with the practice of culling frequency-coefficients which are deemed inaudible, thus setting those to zero, just to reduce the bit-rate in one step, hoping to get better results, ‘because a lower initial bit-rate also means that the user can select a higher final bit-rate…’

In fact, I think that some technical observers have confused two separate processes that take place in MP3:

  1. An audibility threshold is determined, so that coefficients which are lower than that are set to zero.
  2. The non-zero coefficients are quantized, in such a way that the highest of them fits inside a fixed maximum, quantized value. Since a scale-factor is computed for one frequency sub-band, this also implies that close to strong frequency coefficients, weaker ones are just quantized more.

In principle, I disagree with concept (1) above, while concept (2) seems perfectly fine.

And so, based on that, I also need to emphasize that with MP3, a Fast Fourier Transform is computed first. Its exact implementation is not critical for the correct playback of the stream; its only purpose is to determine audibility thresholds for the DCT coefficients. The frequency sub-bands of the DCT, by contrast, must fit the standard exactly, since the DCT is what is actually used to compress the sound, and then to play it back.

This FFT can serve a second purpose in Stereo. Since this transform is assumed to produce complex numbers – unlike the DCT – it is possible to determine whether the Left-Minus-Right channel correlates positively or negatively with the Left-Plus-Right channel, regarding their phase. The way to do this effectively is to compute the dot-product between two complex numbers, treated as two-component vectors, and to see whether this dot-product is positive or negative. If this is done as a complex multiplication, the imaginary component of one of the sources needs to be negated for that to work, and the real part of the product is the result.

But then negative or positive correlation can be recorded once for each sub-band of the DCT, as one bit. This will tell whether the difference-signal is positive when the left channel is more so, or positive when the right channel is more so.
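
As a minimal sketch of that measurement: the dot-product of two complex numbers, treated as two-component vectors, is the real part of one number multiplied by the conjugate of the other. The function name `correlation_is_positive` is hypothetical, and summing the dot-products over all the FFT bins that fall inside one sub-band is my own assumption about how a single bit per sub-band could be arrived at.

```python
def correlation_is_positive(sum_bins, diff_bins):
    """Return True if the Left-Minus-Right bins correlate positively
    with the Left-Plus-Right bins, False if negatively.

    Both arguments are sequences of complex FFT bins belonging to the
    same sub-band (the grouping is an assumption of this sketch).
    """
    total = 0.0
    for s, d in zip(sum_bins, diff_bins):
        # Negating the imaginary component of one operand (the
        # conjugate) and taking the real part of the product gives
        # the dot-product: s.real*d.real + s.imag*d.imag
        total += (s * d.conjugate()).real
    return total >= 0.0


if __name__ == "__main__":
    lr_sum = [1 + 1j, 2 + 0j]
    lr_diff = [1 + 1j, 1 + 0j]
    print(correlation_is_positive(lr_sum, lr_diff))
    print(correlation_is_positive(lr_sum, [-d for d in lr_diff]))
```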

You see, in addition to the need to store this information, potentially with each coefficient, there is the need to measure this information somehow first.

But an alternative approach is possible, in which no initial FFT is computed, but in which only the DCT is computed, once for each Stereo channel. This might even have been done, to reduce the required coding effort. And in that case, the DCT would need to be computed for each channel separately, before a later encoding stage decides to store the sum and the difference for each coefficient. In that case, it is not possible first to determine, whether the time-domain streams correlate positively or negatively.
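
A sketch of what that later encoding stage might do, assuming it simply forms the sum and the difference of the two channels' DCT coefficients, one pair at a time. The function names are hypothetical. Because the DCT is linear, the sum of the two channels' coefficients equals the DCT of the summed signal, so nothing is lost by transforming each channel first.

```python
def to_sum_diff(left, right):
    """Turn per-channel DCT coefficients into sum and difference."""
    sums = [l + r for l, r in zip(left, right)]
    diffs = [l - r for l, r in zip(left, right)]
    return sums, diffs


def from_sum_diff(sums, diffs):
    """Recover the left and right coefficients on playback."""
    left = [(s + d) / 2 for s, d in zip(sums, diffs)]
    right = [(s - d) / 2 for s, d in zip(sums, diffs)]
    return left, right


if __name__ == "__main__":
    left = [4.0, -2.0, 1.0]
    right = [2.0, -2.0, -1.0]
    sums, diffs = to_sum_diff(left, right)
    print(sums, diffs)
    print(from_sum_diff(sums, diffs))
```

Here each difference coefficient carries its own sign, which is why no separate correlation bit would need to be measured beforehand.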

This would also imply, that close to strong frequency-components, the weaker ones are only quantized more, not culled.

So, partially because of what I read, and partially because of my own idea of how I might do things, I am hoping that OGG sound compression takes this latter approach.

