The primary representation of audio streams in Computing is as samples taken at a constant sampling rate, which is called ‘PCM’ – or, Pulse-Code Modulation. This is also the basis for .WAV files. But everybody knows that the files needed to represent even the highest humanly-audible frequencies in this way become large. And so means have been pursued over the decades to compress this format after it has been generated, and to decompress it again before playing the stream back. As early as the 1970s, a compression technique existed which is today called ‘DPCM’: Differential Pulse-Code Modulation. Back then it was not referred to as DPCM, but rather as ‘Delta Modulation’, and it first formed the basis for the voice chips in ‘talking dolls’ (toys). Later, it became the basis for the first solid-state (telephone) answering machines.

The way DPCM works is that, instead of each sample-value being stored or transmitted, only the exact difference between two consecutive sample-values is stored. And this subject is sometimes explained as though software engineers had two ways to go about encoding it (both are sketched below):

1. Simply subtract the previous sample-value from the current one and output the difference,
2. Keep a local copy of what the decoder would reconstruct, had the previous sample-differences been decoded, and output the difference between the current sample-value and what this local model has regenerated.
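
Here is a minimal sketch of the two structures, in Python (my own illustration, not any standardized codec):

```python
def encode_open_loop(samples):
    """Approach (1): difference between each sample and the previous one."""
    prev, diffs = 0, []
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def encode_closed_loop(samples, quantize=lambda d: d):
    """Approach (2): difference against the decoder's own reconstruction."""
    recon, diffs = 0, []               # recon mirrors the decoder's state
    for s in samples:
        d = quantize(s - recon)
        diffs.append(d)
        recon += d                     # exactly what the decoder will do
    return diffs

def decode(diffs):
    """The decoder integrates (keeps a running sum of) the differences."""
    out, acc = [], 0
    for d in diffs:
        acc += d
        out.append(acc)
    return out
```

With `quantize` left as the identity, the two encoders emit identical streams; they only diverge once the differences are quantized, which is where the problem below comes in.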

What happens when DPCM is used directly is that a smaller field of bits can be used per difference – let’s say 4 instead of 8. But then a problem quickly becomes obvious: Unless the uncompressed signal was very low in higher-frequency components – frequencies above 1/3 the Nyquist Frequency – a step could take place in the 8-bit sample-values which is too large to represent as a 4-bit number. And given this possibility, only approach (2) gives the correct result, which would be that the decoded sample-values slew where the original values had a step, but slew back to an originally-correct, low-frequency value.
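
Reusing `encode_closed_loop()` and `decode()` from the sketch above, the slewing can be demonstrated by clamping each difference to a signed 4-bit range (the exact bit-widths are only for illustration):

```python
def clamp4(d):
    return max(-8, min(7, d))      # signed 4-bit range

step = [0, 0, 100, 100, 100, 100, 100, 100]    # a step too large for 4 bits
print(decode(encode_closed_loop(step, clamp4)))
# [0, 0, 7, 14, 21, 28, 35, 42] -- the decoded values slew toward 100
# instead of jumping, and would settle at the correct value a few samples on.
```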

But then we’d still be left with the advantage of fixed field-widths, and thus a truly Constant Bitrate (CBR).

But because, according to today’s customs, the signal is practically guaranteed to be rich in its higher-frequency components, a derivative of DPCM has been devised, called ‘ADPCM’ – Adaptive Differential Pulse-Code Modulation. When encoding ADPCM, each sample-difference is quantized according to a quantization step – aka scale factor – that adapts to how large the successive differences are at any moment. And as long as we include the scale factor as part of the (small) header information of an audio format that’s organized into blocks, we can achieve fixed field-sizes and fixed block-sizes again, and thus also achieve true CBR.
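
A toy sketch of that block-adaptive idea (my own layout, not any specific ADPCM variant – each block’s header carries one scale factor, and every difference is stored in the same fixed 4-bit field):

```python
import numpy as np

def adpcm_encode_block(samples, prev=0, bits=4):
    """Quantize one block's differences against a single per-block scale."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7, for 4 bits
    diffs = np.diff(np.concatenate(([prev], samples)))
    scale = max(1, int(np.ceil(np.abs(diffs).max() / qmax)))
    recon, codes = prev, []
    for s in samples:
        q = int(np.clip(round((s - recon) / scale), -qmax - 1, qmax))
        codes.append(q)                              # fixed-width field
        recon += q * scale                           # mirror the decoder
    return scale, codes, recon                       # scale goes in the header
```

Every block occupies the same number of bits (header plus codes), which is exactly the constant-bitrate property being described.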

(Updated 03/07/2018 : )

## About the Amplitudes of a Discrete Differential

One of the concepts which exists in digital signal processing is that the difference between two consecutive input samples (in the time-domain) can simply be output, resulting in a differential of some sort, even though the samples of data do not represent a continuous function. There is a fact which can be observed to occur at (F = N / 2) – i.e., when the frequency (F) is half the Nyquist Frequency (N), the latter being (h / 2) if (h) is the sampling frequency.

The input signal could be aligned with the samples, to give a sequence [s0 … s3] equal to

0, +1, 0, -1

This set of (s) is equivalent to a sine-wave at (F = N / 2). Its discrete differential [h0 … h3] would be

+1, +1, -1, -1

At first glance we might think that this output stream has the same amplitude as the input stream. But the problem is that the output stream, by the same token, is not aligned with the samples. There is an implicit peak in amplitude between (h0) and (h1) which is greater than (+1), and an implicit peak between (h2) and (h3) more negative than (-1). Any adequate filtering of this stream, belonging to a D/A conversion, will reproduce a sine-wave with a peak amplitude greater than (1).

(Edit 03/23/2017 : )

In this case we can see that samples h0 and h1 of the output stream are phase-shifted 45° with respect to the zero-crossings and to the peak amplitude, which would exist exactly between h0 and h1. Therefore, the amplitude of h0 and h1 will be the sine of 45° times this peak value, and the actual peak will be (the square root of 2) times the values of h0 and h1.

(Erratum 11/28/2017 —

And so a logical question which anybody might want an answer to would be: ‘Below what frequency does this differentiation cross unity gain?’ And the answer to that question is, somewhat obscurely, at (N/3). This is a darned low frequency in practice. If the sampling rate is 44.1 kHz, unity gain is reached at 7.35 kHz, and music, for which that sampling rate was devised, easily contains sound energy above that frequency.

Hence the sequences which result, at (F = N/3), would be:

s = [ +1, +1/2, -1/2, -1, -1/2, +1/2 ]

h = [ +1/2, -1/2, -1, -1/2, +1/2, +1 ]
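
Both figures can be checked against the closed form: the gain of the first difference (1 - z⁻¹) at frequency F, given sampling rate h, is |1 - e^(-2jπF/h)| = 2·sin(πF/h). A minimal verification:

```python
import numpy as np

h = 44100.0          # sampling rate
N = h / 2            # Nyquist frequency

def diff_gain(F):
    """Magnitude response of the first-difference filter at frequency F."""
    return 2.0 * np.sin(np.pi * F / h)

print(diff_gain(N / 2))    # 1.414... = sqrt(2), the 45-degree case above
print(diff_gain(N / 3))    # 1.0 exactly: unity gain at N/3 = 7350 Hz
```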

The above is also a reason for which, by itself, DPCM offers poor performance in compressing signals. It usually needs to be combined with other methods of data-reduction, thus possibly resulting in the lossy ADPCM. And another approach which uses ADPCM is aptX, a proprietary codec which minimizes the loss of quality that might otherwise stem from using ADPCM.

I believe this observation is also relevant to This Earlier Posting of mine, which implied a High-Pass Filter with a cutoff frequency of 500 Hz, as part of a Band-Pass Filter. My goal there was to obtain a gain of at most 0.5 over the entire interval, and to simplify the Math.

— End of Erratum. )

(Posting shortened here on 11/28/2017 . )

Dirk

## aptX Handles Polyphonic Sound Surprisingly Well.

Right now, as I am typing this, I am listening to Beethoven’s 9th Symphony on my real “LG Tone Pro HBS-750” Bluetooth Headphones. The quality of sound is dramatically better than what the fake HBS-730s had produced, simply because those were fake.

This recording of Beethoven is stored on my phone as a series of FLAC files, and Android Lollipop devices are well able to play back FLAC files. I did this in order to test the fake headphones at first, because I was not sure whether their poor performance was due to some interaction of aptX with MP3 or OGG compression, rather than due to the implementation of aptX I was getting. Because FLAC is lossless, playing back a FLAC file is equivalent to playing back a raw audio file.

From what I read, aptX not only splits the uncompressed spectrum into 4 sub-bands, but then quantizes each sub-band. The 4 sub-bands extend approximately from 0 to 5.5 kHz, from 5.5 to 11 kHz, from 11 to 16.5 kHz, and finally from 16.5 to 22 kHz. These sub-bands are then compressed using ADPCM, which allocates 8, 4, 2 and 2 bits to them respectively.
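
Writing those figures out makes the compression ratio visible (approximate band edges, assuming a 44.1 kHz source; each sub-band runs at 1/4 of the input sampling rate):

```python
# (band edges in Hz, ADPCM bits), as quoted above
SUBBANDS = [
    ((0,      5_500), 8),    # bass and mid-range
    ((5_500, 11_000), 4),    # 'melodic treble'
    ((11_000, 16_500), 2),   # texture
    ((16_500, 22_000), 2),   # texture
]

# One sample per sub-band covers 4 input samples, so:
bits_per_4_samples = sum(bits for _, bits in SUBBANDS)     # 16
print(bits_per_4_samples / 4)    # 4 bits per input sample: 4:1 vs 16-bit PCM
```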

This implies that the first sub-band contains the bass and the mid-range, and that what I would call ‘melodic treble’ does not extend beyond sub-band 2, since treble notes with fundamental frequencies higher than 11 kHz are not usually played. Sub-bands 3 and 4 simply add texture to the sound. This means that to allocate fewer bits of precision to sub-bands 3 and 4 ‘makes sense’, since our natural way of interpreting sound already perceives less detail at those frequencies.

A question which I had raised earlier was whether the act of quantizing sub-bands 3 and 4 greatly – down to 2 bits, in fact – would damage the degree of polyphony that can be achieved.

And now that I possess true headphones, I am finding that the answer is No. Sub-bands 3 and 4 are still capable of being played back in a multi-spectral way, even though their differentials have been quantized that much.

(Edit 06/25/2016 : ) Instead of receiving a regular sequence of +1, 0 and -1 data-points, it is possible to receive an atactic sequence of them. The first thing that happens when decoding is an integration, which already re-emphasizes the lower, original frequency components that differentiation had de-emphasized. After that, the degree to which the analog signal can be reconstructed is only as good as the interpolation. And in practice, interpolation is often provided by means of a linear filter which has more than two coefficients. Having a longer sequence of coefficients, such as 6 or 8, provides better interpolation, even in sub-bands 3 and 4, which we supposedly hear less well.
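
A sketch of what ‘more than two coefficients’ buys (my own illustration, not any codec’s actual filter): estimating the value midway between two samples, with 2 neighbours (linear interpolation) versus 6 (a truncated sinc):

```python
import numpy as np

def midpoint(x, weights):
    """Interpolate between x[i] and x[i+1] using len(weights) neighbours."""
    half = len(weights) // 2
    return [float(np.dot(weights, x[i - half + 1 : i + half + 1]))
            for i in range(half - 1, len(x) - half)]

linear = np.array([0.5, 0.5])
sinc6  = np.sinc(np.arange(-2.5, 3.0))  # [0.127, -0.212, 0.637, 0.637, -0.212, 0.127]

x = np.sin(2 * np.pi * 0.15 * np.arange(24))     # a fairly high-frequency tone
mid_lin, mid_sinc = midpoint(x, linear), midpoint(x, sinc6)
# The 6-tap estimates track the underlying sine between the samples
# noticeably better than the 2-tap ones do, which is the point made above.
```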

I do find, though, now that the entire signal is much clearer, that when I listen closely, the highest frequencies belonging to Beethoven’s 9th seem to have slightly less resolution than they are truly supposed to have. But not as much less resolution as I am used to hearing, due to poor headphones, or due to MP3 compression.

It is already a dramatic improvement over what my past experience had taught me, that today, some Bluetooth headphones can play back high-quality music, in addition to being usable for telephony.

Now, the 9th Symphony concludes with its choral movement, which is actually a setting of Schiller’s “Ode To Joy”. According to what I am hearing, that final movement is compromised more by the aptX compression than the earlier, purely orchestral movements are.

The reason seems to be the fact that Schiller’s text is set operatically, with choruses singing very high notes, which results in a lot of the signal energy being in the 2nd, 3rd and 4th sub-bands. So when I hear that movement, I can hear the quantization quite clearly.

It is usually not a preference of mine to listen to this choral movement, because I don’t find it to be typical Beethoven. But right now I am listening to it, and observing this effect with some fascination.

Dirk

## aptX and Delta-Modulation

I am an old-timer. And one of the tricks which once existed in Computing, to compress the amount of memory needed just to store digitized sound, was called “Delta Modulation”. At that time, the only ‘normal’ way to digitize sound was what is now called PCM, which often took up too much memory.

And so a scheme was devised very early, by which only the difference between two consecutive samples would actually be stored. Today, this is called ‘DPCM’. And yet, this method has an obvious, severe drawback: If the signal contains substantial amplitudes associated with frequencies that are half the Nyquist Frequency or higher, this method will clip that content, and produce dull, altered sound.

Well, one welcome fact which I have learned is that this limitation has essentially been overcome. One commercial domain in which it has been overcome is the compression scheme / CODEC named “aptX”. This is a proprietary scheme, owned by Qualcomm, but frequently used, as the chips designed and manufactured by Qualcomm are installed into many devices and circuits. One important place this gets used is in the type of Bluetooth headset that now has high-quality sound.

What happens in aptX requires that the band of frequencies which starts out as a PCM stream gets ‘beaten down’ into 4 sub-bands, using a type of filter known as a “Quadrature Mirror Filter”. This happens in two stages. I know of a kind of Quadrature Mirror Filter which was possible in the old analog days, but I have had problems until now imagining how somebody might implement one using algorithms.

The analog approach required a local sine-wave, a phase-shifted local sine-wave, a balanced demodulator used twice, and a phase-shifter capable of phase-shifting a (wide) band of frequencies without altering their relative amplitudes. This latter feat is a little difficult to accomplish with simple algorithms, and when accomplished, typically involves high latency. aptX is a CODEC with low latency.

The main thing to understand about a Quadrature Mirror Filter, implemented using algorithms in digital signal processing today, is that the example the WiKi article above cites – using a Haar Wavelet for H0 and its complementary series for H1 – actually fails to implement a quadrature-split in a pure way, and was offered just as a hypothetical example. The idea that H1( H0(z) ) always equals zero simply suggests that the frequencies passed by these two filters are mutually exclusive, so that in an abstract way, they pass the requirements. After the signal is passed through H0 and H1 in parallel, the output of each is reduced to half the sampling rate of the input.
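
As a concrete illustration of that last step, here is a one-stage analysis bank using the Haar pair (a minimal sketch, with the coefficients scaled by 1/2 for convenience): filter through H0 and H1 in parallel, then keep every second output sample.

```python
import numpy as np

H0 = np.array([+1.0, -1.0]) / 2    # Haar high-pass: notional 'upper sideband'
H1 = np.array([+1.0, +1.0]) / 2    # its complement:  notional 'lower sideband'

def analyze(x):
    """One QMF-style stage: filter, then 2:1 down-sample each branch."""
    lo = np.convolve(x, H1)[1::2]    # half-rate low band
    hi = np.convolve(x, H0)[1::2]    # half-rate high band
    return lo, hi

# Applying analyze() again to each branch would yield the 4 quarter-rate
# channels discussed below.
```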

What Qualcomm explicitly does is to define a series H0 and a series H1 such that they apply “64 coefficients”, so that they achieve the frequency-split accurately. And it is not clear from the article whether the number of coefficients for each filter is 64, whether their sum for two filters is 64, or whether it is the sum for all six. Either way, this implies a lot of coefficients, which is why dedicated hardware is needed today to implement aptX – and this dedicated hardware belongs to the kind which needs to run its own microprogram.

Back in the early days of Computing, programmers would actually use the Haar Wavelet because of its computational simplicity, even though doing so did not split the spectrum cleanly. This wavelet would then define the ‘upper sideband’ in a notional way, while its complementary filter would define the notional ‘lower sideband’, when splitting.

But then the result of this becomes 4 channels in the case of aptX, each of which has 1/4 the sampling rate of the original audio. And then it is possible, in effect, to delta-modulate each of these channels separately. The higher frequencies have then been beaten down to lower frequencies…

But there is a catch. In reality, aptX needs to use ‘ADPCM’ and not ‘DPCM’, because it can happen in any case that the amplitudes of the upper-frequency bands are high. ADPCM is a scheme by which the maximum short-term differential is computed for some time-interval – which is allowed to be a frame of samples – and by which a simple division is used to compute a scale factor, by which these differentials are then quantized.
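
That computation, in miniature (toy figures, 4-bit fields assumed):

```python
bits = 4
qmax = 2 ** (bits - 1) - 1                  # 7 levels on each side of zero
frame_diffs = [3, -150, 42, 96, -7]         # one frame's differentials
scale = max(abs(d) for d in frame_diffs) / qmax       # 150 / 7 ~= 21.4
quantized = [round(d / scale) for d in frame_diffs]   # [0, -7, 2, 4, 0]
```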

This is a special situation, in which the sound is quantized in the time-domain rather than in the frequency-domain. Quantizing the higher-frequency sub-bands has the effect of adding background – ‘white’ – noise to the decoded signal, thus making the scheme lossy. Yet, because the ADPCM stages are adaptive, the degree of quantization keeps the level of this background noise at a certain fraction of the amplitude of the intended signal.

And so it would seem that even old tricks which once existed in Computing, such as delta modulation, have not gone to waste, and have been transformed into something more HQ today.

I think that one observation to add would be that this approach makes the most sense if the number of output samples of each instance of H0 is half the number of input samples, and if the same can be said for H1.

And another observation would be that this approach does not invert the lower sideband, the way real quadrature demodulation would. Instead, it would seem that H0 inverts the upper sideband.

If the intent of down-sampling is to act as a 2:1 low-pass filter, then it remains productive to add successive pairs of samples. Yet, this could just as easily be the definition of H1.

Dirk

(Edit 06/20/2016 : ) There is an observation to add about wavelets. The Haar Wavelet is the simplest kind:


H0 = [ +1, -1 ]
H1 = [ +1, +1 ]


And this pair guarantees that the original signal can be reconstructed from the two down-sampled sub-bands. But if we remove one of the sub-bands completely, it produces weird spectral results. This can also be a problem if the sub-bands are modified in ways that do not match.
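
The reconstruction guarantee is easy to demonstrate (a minimal sketch, with the filters scaled by 1/2 so the round trip is exact):

```python
import numpy as np

def haar_analyze(x):                       # x must have even length
    pairs = x.reshape(-1, 2)
    lo = (pairs[:, 0] + pairs[:, 1]) / 2   # H1: average of each pair
    hi = (pairs[:, 0] - pairs[:, 1]) / 2   # H0: difference of each pair
    return lo, hi

def haar_synthesize(lo, hi):
    out = np.empty(2 * len(lo))
    out[0::2] = lo + hi
    out[1::2] = lo - hi
    return out

x = np.array([4.0, 2.0, -1.0, 3.0])
lo, hi = haar_analyze(x)
assert np.allclose(haar_synthesize(lo, hi), x)   # perfect reconstruction

# Zeroing one sub-band before synthesis (e.g. hi[:] = 0) leaves a stair-step
# signal, which is one way of seeing the 'weird spectral results' above.
```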

It is possible to define complementary Wavelets that are also orthogonal, but which again result in weird spectral results.

The task of defining ones which are both orthogonal and spectrally neutral has been solved better by the Daubechies series of Wavelets. However, the series of coefficients used there are non-intuitive, and were also beyond my personal ability to figure out spontaneously.

The idea is that there exists a “scaling function”, which yields the low-pass filter H1. And then, if we reverse the order of its coefficients and negate every second one, we get the high-pass filter H0 – which is really a band-pass filter.
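
That construction can be sketched with the Daubechies D4 coefficients, which have a known closed form (assuming the same convention as above, with H1 as the low-pass):

```python
import numpy as np

s3 = np.sqrt(3.0)
# D4 scaling coefficients -- the low-pass filter H1:
H1 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
# Reverse the order and negate every second coefficient -- the high-pass H0:
H0 = H1[::-1] * np.array([1, -1, 1, -1])

print(np.sum(H1))   # ~1.414 = sqrt(2): the low-pass passes DC
print(np.sum(H0))   # ~0.0: the high-pass rejects DC
```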

To my surprise, the Daubechies Wavelets achieve ‘good results’ even with a low number of coefficients, such as 4. But for very good audio results, a longer series of coefficients would still be needed.

One aspect of this which is not mentioned elsewhere is that while a Daubechies Wavelet-set with a high order of approximation could be used for encoding, it could still be that simple appliances will use the Haar Wavelet for decoding. This could be disappointing, but I would guess that when decoding, the damage done in this way is less severe than when encoding.

The most correct thing to do would be to use the Daubechies Wavelets again for decoding, and the mere time-delays that result from their use still fall within the customary definition today of a “low-latency solution”. If we needed a Sinc Filter, using it might no longer be considered so; and if we needed to find a Fourier Transform of granules of sound, only to invert it again later, it would certainly not be considered low-latency anymore.

And when the subject is image decomposition or compression, that is a 2-dimensional application, and the reuse of the Haar Wavelet is more common there.