I am an old-timer. And one of the tricks which once existed in Computing, to compress the amount of memory that would be needed, just to store digitized sound, was called “Delta Modulation”. At that time, the only ‘normal’ way to digitize sound was what is now called PCM, which often took up too much memory.
And so a scheme was devised very early, by which only the difference between two consecutive samples would actually be stored. Today, this is called ‘DPCM’. And yet, this method has an obvious, severe drawback. If the signal contains substantial amplitudes, associated with frequencies that are half the Nyquist Frequency or higher, this method will clip that content, and produce dull, altered sound.
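To make the drawback concrete, here is a minimal sketch of such a scheme in Python. It is only an illustration under my own assumptions: the function names and the `max_step` parameter are hypothetical, and the encoder is assumed to track the decoder's running estimate, which is one common way to arrange DPCM.

```python
def dpcm_encode(samples, max_step):
    """Store only the (range-limited) difference from the running estimate."""
    deltas = []
    estimate = 0
    for s in samples:
        d = s - estimate
        # The stored difference has fewer bits than a full sample,
        # so it gets clipped to +/- max_step (an assumed limit).
        d = max(-max_step, min(max_step, d))
        deltas.append(d)
        estimate += d  # track what the decoder will reconstruct
    return deltas


def dpcm_decode(deltas):
    """Rebuild the samples by accumulating the stored differences."""
    out = []
    estimate = 0
    for d in deltas:
        estimate += d
        out.append(estimate)
    return out


slow = [0, 2, 4, 6, 4, 2, 0]        # small steps between samples
fast = [0, 100, -100, 100, -100]    # large swings, high-frequency content

print(dpcm_decode(dpcm_encode(slow, 7)))  # reproduced exactly
print(dpcm_decode(dpcm_encode(fast, 7)))  # swings are clipped to the step limit
```

The slowly changing signal survives intact, while the fast one comes back with its amplitude reduced to whatever the limited differences can follow.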
One welcome fact which I have learned, is that this limitation has essentially been overcome. One commercial domain in which this has been overcome, is the compression scheme / CODEC named “aptX”. This is a proprietary scheme, owned by Qualcomm, but is frequently used, as the chips designed and manufactured by Qualcomm are installed into many devices and circuits. One important place where this gets used, is the type of Bluetooth headset that now has high-quality sound.
aptX requires that the band of frequencies which starts out as a PCM stream, gets ‘beaten down’ into 4 sub-bands, using a type of filter known as a “Quadrature Mirror Filter”. This happens in two stages. I know of a kind of Quadrature Mirror Filter which was possible in the old analog days, but have had problems until now, imagining how somebody might implement one using algorithms.
The analog approach required a local sine-wave, a phase-shifted local sine-wave, a balanced demodulator used twice, and a phase-shifter which was capable of phase-shifting a (wide) band of frequencies, without altering their relative amplitudes. This latter feat is a little difficult to accomplish with simple algorithms, and when accomplished, typically involves high latency. aptX is a CODEC with low latency.
The main thing to understand about a Quadrature Mirror Filter, implemented using algorithms in digital signal processing today, is that the hypothetical example the WiKi article above cites, using a Haar Wavelet for H0 and its complementary series for H1, actually fails to implement a quadrature-split in a pure way, and was offered just as a hypothetical example. The idea that H1( H0(z) ) always equals zero, simply suggested that the frequencies passed by these two filters are mutually exclusive, so that in an abstract way, they pass the requirements. After the signal is passed through H0 and H1 in parallel, the output of each is reduced to half the sampling rate of the input.
What Qualcomm explicitly does, is to define a series H0 and a series H1, such that they apply “64 coefficients”, so that they may achieve a frequency-split accurately. And it is not clear from the article, whether the number of coefficients for each filter is 64, or whether their sum for two filters is 64, or the sum of all six. Either way, this implies a lot of coefficients, which is why dedicated hardware is needed today, to implement aptX, and this dedicated hardware belongs to the kind, which needs to run its own microprogram.
Back in the early days of Computing, programmers would actually use the Haar Wavelet, because of its computational simplicity, even though doing so did not split the spectrum cleanly. And then this wavelet would define the ‘upper sideband’ in a notional way, while its complementary filter would define the notional, ‘lower sideband’, when splitting.
But then the result of this becomes 4 channels in the case of aptX, each of which has 1/4 the sampling rate of the original audio. And then it is possible, in effect, to delta-modulate each of these channels separately. The higher frequencies have then been beaten down to lower frequencies…
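The two-stage arrangement can be sketched as a small tree in Python. This is only a structural illustration: the simple sum / difference pair stands in for the much longer filters that aptX actually uses, and the function names are hypothetical.

```python
def split(x):
    """One stage: filter into two bands and keep every second output.

    Stand-in filters (an assumption for illustration only):
    pairwise sum for the low band, pairwise difference for the high band.
    """
    low = [x[i] + x[i + 1] for i in range(0, len(x) - 1, 2)]
    high = [x[i] - x[i + 1] for i in range(0, len(x) - 1, 2)]
    return low, high


def four_band_tree(x):
    """Two stages of splitting give 4 sub-bands at 1/4 the sampling rate."""
    low, high = split(x)      # stage 1: two bands at 1/2 the rate
    ll, lh = split(low)       # stage 2 applied to the low band
    hl, hh = split(high)      # stage 2 applied to the high band
    return [ll, lh, hl, hh]


signal = list(range(16))
bands = four_band_tree(signal)
print([len(b) for b in bands])  # each band holds a quarter of the samples
```

The total number of samples stays the same as in the input, which is what allows each band to be coded separately without increasing the amount of data before quantization.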
But there is a catch. In reality, aptX needs to use ‘ADPCM‘ and not ‘DPCM’, because it can happen in any case, that the amplitudes of upper-frequency bands could be high. ADPCM is a scheme, by which the maximum short-term differential is computed for some time-interval, which is allowed to be a frame of samples, and where a simple division is used to compute a scale factor, by which these differentials are to be quantized.
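The frame-based scaling described above can be sketched as follows. This is my own minimal illustration of that description and not the actual aptX algorithm; the function names, the `bits` parameter and the `start` value are assumptions.

```python
def adpcm_encode_frame(samples, start=0.0, bits=4):
    """Quantize the differentials of one frame with a per-frame scale factor."""
    levels = (1 << (bits - 1)) - 1            # e.g. 7 for 4-bit signed codes
    prev = [start] + list(samples[:-1])
    # Maximum short-term differential within this frame.
    peak = max(abs(s - p) for s, p in zip(samples, prev))
    # A simple division gives the scale factor (size of one quantization step).
    scale = peak / levels if peak > 0 else 1.0
    codes = []
    estimate = start
    for s in samples:
        code = round((s - estimate) / scale)
        code = max(-levels, min(levels, code))  # keep within the code range
        codes.append(code)
        estimate += code * scale                # track the decoder's output
    return scale, codes


def adpcm_decode_frame(scale, codes, start=0.0):
    """Rebuild the frame by accumulating the scaled codes."""
    out = []
    estimate = start
    for c in codes:
        estimate += c * scale
        out.append(estimate)
    return out


quiet = [0.0, 1.0, -1.0, 1.0, -1.0, 0.5]
loud = [100.0 * v for v in quiet]
for frame in (quiet, loud):
    scale, codes = adpcm_encode_frame(frame)
    decoded = adpcm_decode_frame(scale, codes)
    worst = max(abs(a - b) for a, b in zip(frame, decoded))
    print(scale, worst)
```

Because the step size follows the loudest differential in the frame, the quantization error of the loud frame is larger in absolute terms, but stays at about the same fraction of the signal as for the quiet frame.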
This is a special situation, in which the sound is quantized in the time-domain, rather than being quantized in the frequency-domain. Quantizing the higher-frequency sub-bands has the effect of adding background – ‘white’ – noise to the decoded signal, thus making the scheme lossy. Yet, because the ADPCM stages are adaptive, the degree of quantization keeps the level of this background noise at a certain fraction, of the amplitude of the intended signal.
And so it would seem, that even old tricks which once existed in Computing, such as delta modulation, have not gone to waste, and have been transformed into something more HQ today.
I think that one observation to add would be, that this approach makes most sense, if the number of output samples of each instance of H0 is half as many, as the number of input samples, and if the same can be said for H1.
And another observation would be, that this approach does not invert the lower sideband, the way real quadrature demodulation would. Instead, it would seem that H0 inverts the upper sideband.
If the intent of down-sampling is to act as a 2:1 low-pass filter, then it remains productive to add successive pairs of samples. Yet, this could just as easily be the definition of H1.
(Edit 06/20/2016 : ) There is an observation to add about wavelets. The Haar Wavelet is the simplest kind:
H0 = [ +1, -1 ]
H1 = [ +1, +1 ]
And this one guarantees that the original signal can be reconstructed from the two down-sampled sub-bands. But, if we remove one of the sub-bands completely, this wavelet produces weird spectral results. This can also be a problem if the sub-bands are modified in ways that do not match.
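Both points can be checked with a few lines of Python. This is a minimal sketch which assumes an even number of input samples, and which applies H0 as the difference 'first minus second' within each pair; the function names are hypothetical.

```python
def haar_split(x):
    """Apply H1 = [+1, +1] and H0 = [+1, -1] to successive pairs of samples."""
    low = [x[i] + x[i + 1] for i in range(0, len(x) - 1, 2)]   # H1: pair sums
    high = [x[i] - x[i + 1] for i in range(0, len(x) - 1, 2)]  # H0: pair differences
    return low, high


def haar_merge(low, high):
    """Invert haar_split: recover each pair from its sum and difference."""
    out = []
    for s, d in zip(low, high):
        out.append((s + d) / 2)
        out.append((s - d) / 2)
    return out


x = [3, 1, 4, 1, 5, 9, 2, 6]
low, high = haar_split(x)
print(haar_merge(low, high))            # the original samples come back
print(haar_merge(low, [0] * len(low)))  # high band removed: each pair is flattened
```

With both sub-bands intact, the reconstruction is exact. With the high band zeroed, every pair of output samples becomes equal, which gives a staircase instead of a smoothly low-passed signal.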
It is possible to define complementary Wavelets, that are also orthogonal, but which again, result in weird spectral results.
The task of defining ones, which are both orthogonal and spectrally neutral, has been solved better by the Daubechies series of Wavelets. However, the series of coefficients used there are non-intuitive, and were also beyond my personal ability to figure out spontaneously.
The idea is that there exists a “scaling function”, which also results in the low-pass filter H1. And then, if we reverse the order of coefficients and negate every second one, we get the high-pass filter H0, which is really a band-pass filter.
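The rule for deriving one filter from the other can be tried out on the 4-coefficient Daubechies scaling filter, whose coefficients have a known closed form. The sketch below is only an illustration, with hypothetical function names; it also checks that the same rule turns [ +1, +1 ] into [ +1, -1 ].

```python
import math


def daubechies4_lowpass():
    """Closed-form coefficients of the 4-tap Daubechies scaling filter."""
    r3 = math.sqrt(3.0)
    norm = 4.0 * math.sqrt(2.0)
    return [(1 + r3) / norm, (3 + r3) / norm, (3 - r3) / norm, (1 - r3) / norm]


def highpass_from_lowpass(low):
    """Reverse the order of the coefficients and negate every second one."""
    rev = low[::-1]
    return [c if k % 2 == 0 else -c for k, c in enumerate(rev)]


h1 = daubechies4_lowpass()        # low-pass, from the scaling function
h0 = highpass_from_lowpass(h1)    # high-pass, derived from it

print(sum(h0))                                 # close to 0: blocks a constant signal
print(sum(a * b for a, b in zip(h0, h1)))      # close to 0: the two are orthogonal
print(highpass_from_lowpass([1, 1]))           # the Haar pair follows the same rule
```

The derived filter sums to zero, so it rejects a constant input, and its inner product with the low-pass filter is also zero, which is the orthogonality mentioned above.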
To my surprise, the Daubechies Wavelets achieve ‘good results’, even with a low number of coefficients such as maybe 4? But for very good audio results, a longer series of coefficients would still be needed.
One aspect to this which is not mentioned elsewhere, is that while a Daubechies Wavelet-set that has a high order of approximation could be used for encoding, it could still be that simple appliances will use the Haar Wavelet for decoding. This could be disappointing, but I guess that when decoding, the damage done in this way will be less severe than when encoding.
The most correct thing to do, would be to use the Daubechies Wavelets again for decoding, and the mere time-delays that result from their use, still fall within the customary definitions today, of “low-latency solutions”. If we needed a Sinc Filter, a solution that used one might no longer be considered low-latency, and if we needed to find a Fourier Transform of granules of sound, only to invert it again later, it would certainly not be considered low-latency anymore.
And, when the subject is image decomposition or compression, it is a 2-dimensional application, and the reuse of the Haar Wavelet is more common.