Threshold Elimination in Compressed Sound

I’ve written quite a few postings in this blog, about sound compression based on the Discrete Cosine Transform. And mixed in with my thoughts about that – where I was still, basically, trying to figure the subject out – were my statements to the effect that frequency-coefficients that are below a certain threshold of perceptibility could be set to zeroes, thus reducing the total number bits taken up, when Huffman-encoded.

My biggest problem in trying to analyze this is, the fact that I’m considering generalities, when in fact, specific compression methods based on the DCT, may or may not apply threshold-elimination at all. As an alternative, the compression technique could just rely on the quantization, to reduce how many bits per second it’s going to allocate to each sub-band of frequencies. ( :1 ) If the quantization step / scale-factor was high enough – suggesting the lowest quality-level – then many coefficients could still end up set to zeroes, just because they were below the quantization step used, as first computed from the DCT.

My impression is that the procedure which gets used to compute the quantization step remains straightforward:

  • Subdivide the frequencies into an arbitrary set of sub-bands – fewer than 32.
  • For each sub-band, first compute the DCTs to scale.
  • Take the (absolute of the) highest coefficient that results.
  • Divide that by the quality-level ( + 0.5 ) , to arrive at the quantization step to be used for that sub-band.
  • Divide all the actual DCT-coefficients by that quantization step, so that the maximum, (signed) integer value that results, will be equal to the quality-level.
  • How many coefficients end up being encoded to having such a high integer value, remains beyond our control.
  • Encode the quantization step / scale-factor with the sub-band, as part of the header information for each granule of sound.

The sub-band which I speak of has nothing to do with the fact that additionally, in MP3-compression, the signal is first passed through a quadrature filter-bank, resulting in 32 sub-bands that are evenly-spaced in frequencies by nature, and that the DCT is computed of each sub-band. This latter feature is a refinement, which as best I recall, was not present in the earliest forms of MP3-compression, and which does not affect how an MP3-file needs to be decoded.

(Updated 03/10/2018 : )

Continue reading Threshold Elimination in Compressed Sound