Threshold Elimination in Compressed Sound

I’ve written quite a few postings in this blog, about sound compression based on the Discrete Cosine Transform. And mixed in with my thoughts about that – where I was still, basically, trying to figure the subject out – were my statements to the effect that frequency-coefficients that are below a certain threshold of perceptibility could be set to zeroes, thus reducing the total number bits taken up, when Huffman-encoded.

My biggest problem in trying to analyze this is, the fact that I’m considering generalities, when in fact, specific compression methods based on the DCT, may or may not apply threshold-elimination at all. As an alternative, the compression technique could just rely on the quantization, to reduce how many bits per second it’s going to allocate to each sub-band of frequencies. ( :1 ) If the quantization step / scale-factor was high enough – suggesting the lowest quality-level – then many coefficients could still end up set to zeroes, just because they were below the quantization step used, as first computed from the DCT.

My impression is that the procedure which gets used to compute the quantization step remains straightforward:

  • Subdivide the frequencies into an arbitrary set of sub-bands – fewer than 32.
  • For each sub-band, first compute the DCTs to scale.
  • Take the (absolute of the) highest coefficient that results.
  • Divide that by the quality-level ( + 0.5 ) , to arrive at the quantization step to be used for that sub-band.
  • Divide all the actual DCT-coefficients by that quantization step, so that the maximum, (signed) integer value that results, will be equal to the quality-level.
  • How many coefficients end up being encoded to having such a high integer value, remains beyond our control.
  • Encode the quantization step / scale-factor with the sub-band, as part of the header information for each granule of sound.

The sub-band which I speak of has nothing to do with the fact that additionally, in MP3-compression, the signal is first passed through a quadrature filter-bank, resulting in 32 sub-bands that are evenly-spaced in frequencies by nature, and that the DCT is computed of each sub-band. This latter feature is a refinement, which as best I recall, was not present in the earliest forms of MP3-compression, and which does not affect how an MP3-file needs to be decoded.

(Updated 03/10/2018 : )

Continue reading Threshold Elimination in Compressed Sound

An Observation about the Daubechies Wavelet and PQF

In an earlier posting, I had written about what a wonderful thing Quadrature Mirror Filter was, and that it is better to apply the Daubechies Wavelet than the older Haar Wavelet. But the question remains less obvious, as to how the process can be reversed.

The concept was clear, that an input stream in the Time-Domain could first be passed through a low-pass filter, and then sub-sampled at (1/2) its original sampling rate. Simultaneously, the same stream can be passed through the corresponding band-pass filter, and then sub-sampled again, so that only frequencies above half the Nyquist Frequency are sub-sampled, thereby reversing them to below the new Nyquist Frequency.

A first approximation for how to reverse this might be, to duplicate each sample of the lower sub-band once, before super-sampling them, and to invert each sample of the upper side-band once, after expressing it positively, but we would not want playback-quality to drop to that of a Haar wavelet again ! And so we would apply the same wavelets to recombine the sub-bands. There is a detail to that which I left out.

We might want to multiply each sample of each sub-band by its entire wavelet, but only once for every second output-sample. And then one concern we might have could be, that the output-amplitude might not be constant. I suspect that one of the constraints which each of these wavelets satisfies would be, that their output-amplitude will actually be constant, if they are applied once per second output-sample.

Now, in the case of ‘Polyphase Quadrature Filter’, Engineers reduced the amount of computational effort, by not applying a band-pass filter, but only the low-pass filter. When encoding, the low sub-band is produced as before, but the high sub-band is simply produced as the difference between every second input-sample, and the result that was obtained when applying the low-pass filter. The question about this which is not obvious, is ‘How does one recombine that?’

And the best answer I can think of would be, to apply the low-pass wavelet to the low sub-band, and then to supply the sample from the high sub-band for two operations:

  1. The first sample from the output of the low-pass wavelet, plus the input sample.
  2. The second sample from the output of the low-pass wavelet, minus the same input sample, from the high sub-band.

Continue reading An Observation about the Daubechies Wavelet and PQF

Scale Factor == Step Size

It could be the case, that in my own postings I referred to a ‘Scale Factor’, which would come prior to Quantization. In other works of reference, the term ‘Quantization Step’ might appear. As far as I am concerned, these terms are synonymous.

The goal could be to start with a maximum value as input, and to find a way to quantize it and all lower values, to arrive at a maximum quantized integer value. One would divide the (absolute) first by the second input value, to find this parameter, for an interval of time.

Dividing all the sample-values in the interval by the resulting parameter, will yield the maximum, quantized integer. And this parameter will also be equal to the minimum difference between the sample values, leading to two different quantized integers.