An Observation about the Daubechies Wavelet and PQF

In an earlier posting, I had written about what a wonderful thing the Quadrature Mirror Filter is, and that it is better to apply the Daubechies Wavelet than the older Haar Wavelet. But a less obvious question remains, as to how the process can be reversed.

The concept was clear, that an input stream in the Time-Domain can first be passed through a low-pass filter, and then sub-sampled at (1/2) its original sampling rate. Simultaneously, the same stream can be passed through the corresponding band-pass filter, and then sub-sampled in the same way, so that only frequencies above half the original Nyquist Frequency are passed on, and so that sub-sampling reverses them to below the new Nyquist Frequency.

A first approximation for how to reverse this might be, to duplicate each sample of the lower sub-band once, before super-sampling them, and to invert each sample of the upper sub-band once, after expressing it positively, but we would not want playback-quality to drop to that of a Haar wavelet again! And so we would apply the same wavelets to recombine the sub-bands. There is a detail to that which I left out.

We might want to multiply each sample of each sub-band by its entire wavelet, but only once for every second output-sample. And then one concern we might have could be, that the output-amplitude might not be constant. I suspect that one of the constraints which each of these wavelets satisfies would be, that their output-amplitude will actually be constant, if they are applied once per second output-sample.
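The suspicion above can be tried out with a minimal Python sketch, using the Daubechies wavelet that has 4 coefficients. The function names, the 8-sample block and the periodic wrap-around at the block edges are assumptions made only for illustration. Each sub-band sample contributes one whole wavelet to the output, and the position advances by two output samples each time.

```python
import math

# Daubechies scaling (low-pass) coefficients for the 4-tap case.
S3 = math.sqrt(3.0)
LOW = [c / (4 * math.sqrt(2.0)) for c in (1 + S3, 3 + S3, 3 - S3, 1 - S3)]
# Band-pass wavelet: reverse the order and negate every second coefficient.
HIGH = [(-1) ** k * LOW[3 - k] for k in range(4)]

def analyze(x):
    """Split x (even length, treated as periodic) into two half-rate sub-bands."""
    n = len(x)
    low = [sum(LOW[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    high = [sum(HIGH[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return low, high

def synthesize(low, high):
    """Add one whole wavelet per sub-band sample, once per second output sample."""
    n = 2 * len(low)
    out = [0.0] * n
    for i in range(len(low)):
        for k in range(4):
            out[(2 * i + k) % n] += LOW[k] * low[i] + HIGH[k] * high[i]
    return out

# A constant low sub-band, with an empty high sub-band, gives a constant output.
flat = synthesize([1.0] * 4, [0.0] * 4)
```

With these coefficients, the even-numbered and the odd-numbered taps of the low-pass wavelet each sum to 1/sqrt(2), which is why `flat` holds the same value everywhere, and why `synthesize(*analyze(x))` returns the original block to within rounding error.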

Now, in the case of ‘Polyphase Quadrature Filter’, Engineers reduced the amount of computational effort, by not applying a band-pass filter, but only the low-pass filter. When encoding, the low sub-band is produced as before, but the high sub-band is simply produced as the difference between every second input-sample, and the result that was obtained when applying the low-pass filter. The question about this which is not obvious, is ‘How does one recombine that?’

And the best answer I can think of would be, to apply the low-pass wavelet to the low sub-band, and then to supply the sample from the high sub-band for two operations:

  1. The first sample from the output of the low-pass wavelet, plus the input sample from the high sub-band.
  2. The second sample from the output of the low-pass wavelet, minus the same input sample from the high sub-band.
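The two operations can be sketched in Python for the simplest case only, in which the low-pass filter is assumed to be a two-tap average. That assumption, and the names `encode` and `decode`, are mine. With a longer low-pass wavelet, the same two operations would apply to its interpolated output, and I have not verified that the reconstruction would then be exact.

```python
def encode(x):
    """Low band: average of each pair. High band: even input sample minus the low-pass result."""
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    high = [x[2 * i] - low[i] for i in range(len(low))]
    return low, high

def decode(low, high):
    """Use each high sub-band sample twice: once added, once subtracted."""
    out = []
    for l, h in zip(low, high):
        out.append(l + h)  # operation 1: low-pass output plus the high sample
        out.append(l - h)  # operation 2: low-pass output minus the same high sample
    return out

samples = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
restored = decode(*encode(samples))
```

In this two-tap case, `restored` equals `samples`, because the second sample of each pair is always twice the average minus the first.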


An Observation about Modifying Fourier Transforms

A concept which seems to exist, is that certain standard Fourier Transforms do not produce desired results, and that therefore, they must be modified for use with compressed sound.

What I have noticed is that often, when we modify a Fourier Transform, it only produces a special case of an existing standard Transform.

For example, we may start with a Type 4 Discrete Cosine Transform, that has a sampling interval of 576 elements, but want it to overlap 50%, therefore wanting to double the length of samples taken in, without doubling the number of Frequency-Domain samples output. One way to accomplish that is to adhere to the standard Math, but just to extend the array of input samples, and to allow the reference-waves to continue into the extension of the sampling interval, at unchanged frequencies.

Because the Type 4 applies a half-sample shift to its output elements as well as to its input elements, this is really equivalent to what we would obtain, if we were to compute a Type 2 Discrete Cosine Transform over a sampling interval of 1152 elements, but if we were only to keep the odd-numbered coefficients. All the output elements would count as odd-numbered ones then, after their index is doubled.
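This equivalence can be checked numerically. The sketch below is plain Python with unnormalized transforms and a small interval of 8 output elements instead of 576, to keep it quick. The function names are hypothetical, and `extended_dct4` is only the 'extended input array' variant described above, not any particular codec's transform.

```python
import math

def extended_dct4(x, n_out):
    """Type 4 reference waves for n_out bins, continued over all of x (length 2 * n_out)."""
    return [sum(v * math.cos(math.pi / n_out * (n + 0.5) * (k + 0.5))
                for n, v in enumerate(x))
            for k in range(n_out)]

def dct2(x):
    """Unnormalized Type 2 Discrete Cosine Transform over the whole of x."""
    size = len(x)
    return [sum(v * math.cos(math.pi / size * (n + 0.5) * m)
                for n, v in enumerate(x))
            for m in range(size)]

x = [math.sin(0.37 * n) + 0.1 * n for n in range(16)]
a = extended_dct4(x, 8)   # 8 coefficients from 16 input samples
b = dct2(x)               # 16 coefficients from the same 16 samples
largest_gap = max(abs(a[k] - b[2 * k + 1]) for k in range(8))
```

The reason is that, with (m = 2k + 1) and an interval of (2N), the Type 2 phase (pi / 2N) (n + 1/2) (2k + 1) is the same number as the Type 4 phase (pi / N) (n + 1/2) (k + 1/2), so `largest_gap` is only rounding error.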

The only new information I really have on Frequency-Based sound-compression, is that there is nonetheless an advantage gained, in storing the sign of each coefficient.

(Edit 08/07/2017 : )


About the Amplitudes of a Discrete Differential

One of the concepts which exists in digital signal processing, is that the difference between two consecutive input samples (in the time-domain) can simply be output, thus resulting in a differential of some sort, even though the samples of data do not represent a continuous function. There is a fact which must be observed to occur at (F = N / 2), i.e. when the frequency is half the Nyquist Frequency (N), which is itself (h / 2) , if (h) is the sampling frequency.

The input signal could be aligned with the samples, to give a sequence of [s0 … s3] equal to

0, +1, 0, -1

This set of (s) is equivalent to a sine-wave at (F = N / 2) . Its discrete differentiation [h0 … h3] would be

+1, +1, -1, -1

At first glance we might think, that this output stream has the same amplitude as the input stream. But the problem becomes that the output stream is, by the same token, not aligned with the samples. There is an implicit peak in amplitudes between (h0) and (h1) which is greater than (+1) , and an implicit peak between (h2) and (h3) more negative than (-1) . Any adequate filtering of this stream, belonging to a D/A conversion, will reproduce a sine-wave with a peak amplitude greater than (1).

(Edit 03/23/2017 : )

In this case we can see, that samples h0 and h1 of the output stream, would be phase-shifted 45° with respect to the zero crossings and to the peak amplitude, that would exist exactly between h0 and h1. Therefore, the amplitude of h0 and h1 will be the sine-function of 45° with respect to this peak value, and the actual peak would be (the square root of 2) times the values of h0 and h1.
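A few lines of Python reproduce this. The sketch assumes that the sequence repeats, so that the sample before s0 is s3, and it relies on the identity sin(w n) - sin(w (n - 1)) = 2 sin(w / 2) cos(w (n - 1/2)), which gives both the hidden peak amplitude and the half-sample shift of the output.

```python
import math

s = [0.0, 1.0, 0.0, -1.0]                     # one cycle at half the Nyquist Frequency
h = [s[n] - s[n - 1] for n in range(len(s))]  # s[-1] wraps around to the previous cycle

w = math.pi / 2                               # radians per sample at this frequency
peak = 2 * math.sin(w / 2)                    # amplitude of the differenced sine-wave

# The output samples sit 45 degrees to either side of the implicit peaks.
model = [peak * math.cos(w * (n - 0.5)) for n in range(len(s))]
```

Here `h` comes out as the sequence +1, +1, -1, -1 , `peak` is the square root of 2, and `model` matches `h` to within rounding error.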

(Erratum 11/28/2017 —

And so a logical question which anybody might want an answer to would be, ‘Below what frequency does the gain cross unity gain?’ And the answer to that question is, somewhat obscurely, at (N/3) . This is a darned low frequency in practice. If the sampling rate was 44.1 kHz, this is reached at 7.35 kHz, and music, for which that sampling rate was devised, easily contains sound energy above that frequency.

Hence the sequences which result would be:

s = [ +1, +1/2, -1/2, -1, -1/2, +1/2 ]

h = [ +1/2, -1/2, -1, -1/2, +1/2, +1 ]
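The same figures can be reproduced in Python. The gain formula used here, 2 sin(pi F / h), follows from taking the difference of a sine-wave one sample apart. The function name is hypothetical, and the sequence is again assumed to repeat.

```python
import math

def difference_gain(freq_hz, rate_hz):
    """Amplitude gain of y[n] = x[n] - x[n-1] for a sine-wave at freq_hz."""
    return 2 * math.sin(math.pi * freq_hz / rate_hz)

rate = 44100.0
nyquist = rate / 2
unity_freq = nyquist / 3                      # 7350 Hz
gain_there = difference_gain(unity_freq, rate)

# One cycle spread over 6 samples, i.e. a frequency of (N / 3).
s = [math.cos(2 * math.pi * n / 6) for n in range(6)]
h = [s[n] - s[n - 1] for n in range(6)]       # s[-1] wraps around to the previous cycle
```

At this frequency `gain_there` is 1 to within rounding error, and the largest magnitude in `h` equals the largest magnitude in `s`. At any higher frequency, up to the Nyquist Frequency, the gain is greater than 1.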

This is also a reason for which by itself, DPCM offers poor performance in compressing signals. It usually needs to be combined with other methods of data-reduction, thus possibly resulting in the lossy ADPCM. One approach which uses ADPCM is aptX, a proprietary codec, which minimizes the loss of quality that might otherwise stem from using ADPCM.

I believe this observation is also relevant to This Earlier Posting of mine, which implied a High-Pass Filter with a cutoff frequency of 500 Hz, that would be part of a Band-Pass Filter. My goal was to obtain a gain of at most 0.5 , over the entire interval, and to simplify the Math.

— End of Erratum. )

(Posting shortened here on 11/28/2017 . )

Dirk


aptX and Delta-Modulation

I am an old-timer. And one of the tricks which once existed in Computing, to compress the amount of memory that would be needed, just to store digitized sound, was called “Delta Modulation”. At that time, the only ‘normal’ way to digitize sound was what is now called PCM, which often took up too much memory.

And so a scheme was devised very early, by which only the difference between two consecutive samples would actually be stored. Today, this is called ‘DPCM‘. And yet, this method has an obvious, severe drawback. If the signal contains substantial amplitudes, associated with frequencies that are half the Nyquist Frequency or higher, this method will clip that content, and produce dull, altered sound.
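The drawback can be shown with a toy DPCM codec in Python. The sketch assumes that the stored differences are limited to the same range as 8-bit samples (+/- 127). That limit, the function names and the test signals are all hypothetical.

```python
def dpcm_encode(samples, max_step):
    """Store clipped differences, while tracking what the decoder will rebuild."""
    codes, rebuilt = [], 0
    for s in samples:
        step = max(-max_step, min(max_step, s - rebuilt))
        codes.append(step)
        rebuilt += step
    return codes

def dpcm_decode(codes):
    """Accumulate the stored differences back into samples."""
    out, rebuilt = [], 0
    for step in codes:
        rebuilt += step
        out.append(rebuilt)
    return out

# A slowly-changing signal: every difference fits into the allowed range.
slow = [0, 49, 90, 117, 127, 117, 90, 49]
# Near-full amplitude at half the Nyquist Frequency: differences of 180 do not fit.
fast = [90, 90, -90, -90, 90, 90, -90, -90]

slow_out = dpcm_decode(dpcm_encode(slow, 127))
fast_out = dpcm_decode(dpcm_encode(fast, 127))
```

The slow signal survives unchanged, while `fast_out` lags behind its input whenever the required step exceeds the limit, so the high-frequency content comes back altered.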

Well, one welcome fact which I have learned, is that this limitation has essentially been overcome. One commercial domain in which this has been overcome, is with the compression scheme / CODEC named “aptX“. This is a proprietary scheme, owned by Qualcomm, but is frequently used, as the chips manufactured and designed by Qualcomm are installed into many devices and circuits. One important place this gets used, is with the type of Bluetooth headset, that now has high-quality sound.

What happens in aptX, requires that the band of frequencies which starts out as a PCM stream, be ‘beaten down’ into 4 sub-bands, using a type of filter known as a “Quadrature Mirror Filter“. This happens in two stages. I know of a kind of Quadrature Mirror Filter which was possible in the old analog days, but have had problems until now, imagining how somebody might implement one using algorithms.

The analog approach required, a local sine-wave, a phase-shifted local sine-wave, a balanced demodulator used twice, and a phase-shifter which was capable of phase-shifting a (wide) band of frequencies, without altering their relative amplitudes. This latter feat is a little difficult to accomplish with simple algorithms, and when accomplished, typically involves high latency. aptX is a CODEC with low latency.

The main thing to understand about a Quadrature Mirror Filter, implemented using algorithms in digital signal processing today, is that the hypothetical example the WiKi article above cites, using a Haar Wavelet for H0 and its complementary series for H1, actually fails to implement a quadrature-split in a pure way, and was offered just as a hypothetical example. The idea that H1( H0(z) ) always equals zero, simply suggested that the frequencies passed by these two filters are mutually exclusive, so that in an abstract way, they pass the requirements. After the signal is passed through H0 and H1 in parallel, the output of each is reduced to half the sampling rate of the input.

What Qualcomm explicitly does, is to define a series H0 and a series H1, such that they apply “64 coefficients”, so that they may achieve a frequency-split accurately. And it is not clear from the article, whether the number of coefficients for each filter is 64, or whether their sum for two filters is 64, or the sum of all six. Either way, this implies a lot of coefficients, which is why dedicated hardware is needed today, to implement aptX, and this dedicated hardware belongs to the kind, which needs to run its own microprogram.

Back in the early days of Computing, programmers would actually use the Haar Wavelet, because of its computational simplicity, even though doing so did not split the spectrum cleanly. And then this wavelet would define the ‘upper sideband’ in a notional way, while its complementary filter would define the notional, ‘lower sideband’, when splitting.

But then the result of this becomes 4 channels in the case of aptX, each of which has 1/4 the sampling rate of the original audio. And then it is possible, in effect, to delta-modulate each of these channels separately. The higher frequencies have then been beaten down to lower frequencies…
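The two-stage structure can be sketched in Python, with the Haar pair standing in for the much longer filters that a real codec would use. That substitution and the function names are assumptions made for brevity.

```python
def split(x):
    """One stage: pairwise sums (low) and differences (high), each at half the rate."""
    low = [x[i] + x[i + 1] for i in range(0, len(x), 2)]
    high = [x[i] - x[i + 1] for i in range(0, len(x), 2)]
    return low, high

def four_bands(x):
    """Two stages: split the stream, then split each half again."""
    low, high = split(x)
    return [*split(low), *split(high)]

bands = four_bands([1.0] * 8)   # a constant (DC) input of 8 samples
```

Each of the 4 resulting channels has 1/4 as many samples as the input, and the constant input ends up entirely in the first channel. The channels are listed in the order of the tree, which is not necessarily the order of ascending frequency.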

But there is a catch. In reality, aptX needs to use ‘ADPCM‘ and not ‘DPCM’, because it can happen in any case, that the amplitudes of upper-frequency bands could be high. ADPCM is a scheme, by which the maximum short-term differential is computed for some time-interval, which is allowed to be a frame of samples, and where a simple division is used to compute a scale factor, by which these differentials are to be quantized.
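A minimal Python sketch of this per-frame scheme follows. It only mirrors the description just given, and is not taken from the aptX specification. The 7 quantization levels per polarity, the frame contents and the function names are hypothetical.

```python
def adpcm_encode_frame(frame, start, levels=7):
    """Quantize the differentials of one frame with a scale factor chosen for that frame."""
    previous = [start] + frame[:-1]
    largest = max(abs(b - a) for a, b in zip(previous, frame))
    scale = max(largest / levels, 1e-12)   # a simple division gives the step size
    codes, rebuilt = [], start
    for s in frame:
        code = max(-levels, min(levels, round((s - rebuilt) / scale)))
        codes.append(code)
        rebuilt += code * scale            # track the decoder's state
    return scale, codes

def adpcm_decode_frame(scale, codes, start):
    """Rebuild the frame from the scale factor and the small integer codes."""
    out, rebuilt = [], start
    for code in codes:
        rebuilt += code * scale
        out.append(rebuilt)
    return out

frame = [0.0, 0.9, 0.2, -0.7, -0.3, 0.5, 0.8, -0.1]
scale, codes = adpcm_encode_frame(frame, 0.0)
decoded = adpcm_decode_frame(scale, codes, 0.0)
worst_error = max(abs(a - b) for a, b in zip(frame, decoded))
```

Because the encoder tracks the decoder's state, `worst_error` stays within half of `scale`, and because `scale` follows the largest differential in the frame, a frame that is 10 times louder gets a step size, and therefore a noise level, that is 10 times larger.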

This is a special situation, in which the sound is quantized in the time-domain, rather than being quantized in the frequency-domain. Quantizing the higher-frequency sub-bands has the effect of adding background – ‘white’ – noise to the decoded signal, thus making the scheme lossy. Yet, because the ADPCM stages are adaptive, the degree of quantization keeps the level of this background noise at a certain fraction, of the amplitude of the intended signal.

And so it would seem, that even old tricks which once existed in Computing, such as delta modulation, have not gone to waste, and have been transformed into something more HQ today.

I think that one observation to add would be, that this approach makes most sense, if the number of output samples of each instance of H0 is half as many, as the number of input samples, and if the same can be said for H1.

And another observation would be, that this approach does not invert the lower sideband, the way real quadrature demodulation would. Instead, it would seem that H0 inverts the upper sideband.

If the intent of down-sampling is to act as a 2:1 low-pass filter, then it remains productive to add successive pairs of samples. Yet, this could just as easily be the definition of H1.

Dirk

(Edit 06/20/2016 : ) There is an observation to add about wavelets. The Haar Wavelet is the simplest kind:


H0 = [ +1, -1 ]
H1 = [ +1, +1 ]

And this one guarantees that the original signal can be reconstructed from two down-sampled sub-bands. But, if we remove one of the sub-bands completely, this one results in weird spectral results. This can also be a problem if the sub-bands are modified in ways that do not match.
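Both points can be seen in a short Python sketch. The function names are hypothetical, and the factor of 1/2 when merging is the normalization that these unscaled coefficients require.

```python
def haar_split(x):
    """H1 = [+1, +1] gives the sums, H0 = [+1, -1] the differences, once per pair."""
    low = [x[i] + x[i + 1] for i in range(0, len(x), 2)]
    high = [x[i] - x[i + 1] for i in range(0, len(x), 2)]
    return low, high

def haar_merge(low, high):
    """Rebuild each pair of samples from one sum and one difference."""
    out = []
    for l, h in zip(low, high):
        out += [(l + h) / 2, (l - h) / 2]
    return out

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
low, high = haar_split(x)
exact = haar_merge(low, high)              # both sub-bands kept
stepped = haar_merge(low, [0.0] * 4)       # high sub-band removed completely
```

With both sub-bands, `exact` equals `x`. With the high sub-band removed, `stepped` consists of each pair's average repeated twice, a staircase, which is where the weird spectral results come from.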

It is possible to define complementary Wavelets, that are also orthogonal, but which again, result in weird spectral results.

The task of defining ones, which are both orthogonal and spectrally neutral, has been solved better by the Daubechies series of Wavelets. However, the series of coefficients used there are non-intuitive, and were also beyond my personal ability to figure out spontaneously.

The idea is that there exists a “scaling function”, which also results in the low-pass filter H1. And then, if we reverse the order of coefficients and negate every second one, we get the high-pass filter H0, which is really a band-pass filter.
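The reverse-and-negate rule is short enough to write out in Python. The function name is hypothetical, and the 4-coefficient Daubechies low-pass filter is used as the example.

```python
import math

def high_from_low(low):
    """Reverse the low-pass coefficients and negate every second one."""
    n = len(low)
    return [(-1) ** k * low[n - 1 - k] for k in range(n)]

s3 = math.sqrt(3.0)
d4_low = [c / (4 * math.sqrt(2.0)) for c in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]
d4_high = high_from_low(d4_low)

dc_response = sum(d4_high)                            # response to a constant input
cross = sum(a * b for a, b in zip(d4_low, d4_high))   # inner product of the two filters
```

Both `dc_response` and `cross` are zero to within rounding error, meaning that the derived filter rejects a constant input and is orthogonal to the low-pass filter. Applied to the Haar low-pass filter [ +1, +1 ] , the same rule gives [ +1, -1 ] .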

To my surprise, the Daubechies Wavelets achieve ‘good results’, even with a low number of coefficients such as maybe 4? But for very good audio results, a longer series of coefficients would still be needed.

One aspect to this which is not mentioned elsewhere, is that while a Daubechies Wavelet-set could be used for encoding, that has a high order of approximation, it could still be that simple appliances will use the Haar Wavelet for decoding. This could be disappointing, but I guess that when decoding, the damage done in this way will be less severe than when encoding.

The most correct thing to do, would be to use the Daubechies Wavelets again for decoding, and the mere time-delays that result from their use, still fall within the customary definitions today, of “low-latency solutions”. If we needed a Sinc Filter, using it may no longer be considered so, and if we needed to find a Fourier Transform of granules of sound, only to invert it again later, it would certainly not be considered low-latency anymore.

And, when the subject is image decomposition or compression, it is a 2-dimensional application, and the reuse of the Haar Wavelet is more common.