## How Low-Latency CODECs can have a Time-Delay on PCs and Mobile Devices.

aptX is a CODEC that uses two stages of “Linear Filters”, which are also known as “Convolutions”. aptX is offered as a special feature by some Bluetooth Headphones, so that they can play HiFi music.

If we could just assume for the moment that each of the filters used by aptX is a 6-tap filter, meaning a filter with 6 coefficients, which is a realistic assumption, it would seem that ‘low latency’ is implied.

aptX subdivides the uncompressed spectrum into 4 sub-bands, about 5.5 kHz wide each if the source is sampled at 44.1 kHz, and each sub-band is converted into a parallel 11.025 kHz stream of samples for further processing. At first glance, one would assume that the latency of such a filter is the amount of time it takes for an input sample of sound to make it past 6 coefficients. This would mean that the latency of one filter stage is less than 1 millisecond. And so the next question which a casual observer might ask would be, ‘Why then, is there a noticeable time-delay when I listen to my Bluetooth Media?’
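The arithmetic behind that 'less than 1 millisecond' figure can be checked with a short sketch. It only uses the numbers assumed in the text (6 taps, 11.025 kHz sub-band rate), and the function name `fir_span_ms` is made up for illustration.

```python
def fir_span_ms(taps: int, sample_rate_hz: float) -> float:
    """Time spanned by `taps` consecutive samples, in milliseconds."""
    return 1000.0 * taps / sample_rate_hz

# Figures assumed in the text: a 6-tap filter running on an 11.025 kHz stream.
one_stage_ms = fir_span_ms(6, 11025.0)
print(f"one 6-tap stage at 11.025 kHz: {one_stage_ms:.3f} ms")
```

Even if both filter stages were counted this way, the total would stay in the range of about a millisecond, which is far below a delay that a listener would notice.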

In this case, this time-delay has nothing to do with the Wavelets used, or the low-pass and band-pass filters themselves. When we listen to a stream on a PC, a Laptop, or a consumer Mobile Device such as a smart-phone, there are stages involved, which most users do not see, and which have nothing to do with these individual 6-tap filters.

aptX is actually implemented on the hardware side, within the Bluetooth chip-set of the source of the stream. It does not even rely on the CPU for the processing.

But what happens to any audio streamed on a consumer PC / Laptop / Mobile Device, is that first a user-space application needs to transfer the audio into kernel space, that next a kernel module needs to transfer the stream, essentially, to the hardware interface, and that then, firmware for the chip-set allows the latter to compress the stream on its way out via the Bluetooth antenna.

In consumer computing, every time an audio stream needs to be transferred from one process to another, there is a buffer that stores a minimum interval of sound. The number of buffered samples can be higher than we imagine, because software specialists try to allow for multi-tasking: the CPU may be called away to do some background processing, outside of playing back the audio stream, and the stream would then skip or drop. Such a condition is called a “Buffer Underflow” when it happens. So the delay of each of these buffers is commonly kept high. All the audio we are hearing has then been delayed as a unit, and the CPU can still perform additional tasks without necessarily causing the audio to skip.

Such buffering does not just happen once, in consumer electronics, but several times.
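To get a feeling for how several buffers add up, here is a rough sketch. The three stages mirror the path described above, but the buffer sizes are purely hypothetical values that I picked for illustration. Real sizes depend on the operating system, the driver and its settings.

```python
SAMPLE_RATE_HZ = 44100

# Hypothetical buffer sizes, in frames, for each hand-off along the path.
buffers = {
    "application -> kernel": 2048,
    "kernel -> Bluetooth hardware": 2048,
    "chip-set transmit queue": 1024,
}

def buffer_delay_ms(frames: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Delay contributed by one full buffer, in milliseconds."""
    return 1000.0 * frames / sample_rate_hz

for name, frames in buffers.items():
    print(f"{name}: {buffer_delay_ms(frames):.1f} ms")

total_ms = sum(buffer_delay_ms(frames) for frames in buffers.values())
print(f"total: {total_ms:.1f} ms")
```

With sizes like these, the buffers alone account for more than a tenth of a second, which dwarfs anything the short filters could contribute.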

The situation is different when aptX is built into professional equipment that gets used in concerts, etc. Here, the chips are not embedded in all-purpose computing devices, but rather in dedicated devices. And here, the buffering has essentially been eliminated, so that the technology can be used live.

Dirk

## I now have LG Tone Pro HBS-750 Bluetooth Headphones.

And unlike how it went with the previous set, I paid the full price for these, and know that they are genuine.

I can now comment accurately for the first time, about the “aptX” sound compression they use.

I understand that most of the music that I will be playing, has already been MP3 or OGG compressed. But with my simple headphones, that were wired to the stereo mini-jack on my phone, there was a loss in quality, just in getting the sound to my ears, after MP3 or OGG decoding. With aptX, it could be argued that there is also some small loss of quality in getting the sound to my ears.

aptX, and the HBS-750 headphones, are able to get the sound to my ears, after lossy decompression, better than the wired headphones could. So the only sound artifacts that I will ever hear with these, will be those due to MP3 or OGG, and the OGG files will play better again than the MP3 files did, as the OGG files are supposed to do.

The sound of these headphones is truly superb.

Further, the reason for which the suggested app ‘Tone And Talk’ was not recognizing the supposed HBS-730 headphones, was the mere fact that this app was able to read the meta-data of those, and was able to determine, that those were just not on the list of supported headphones, even though I was fooled into believing that they were.

Tone And Talk works properly with the HBS-750 headphones, which are genuine LG headphones. At least, it does so as far as I can tell without a detailed test of this app, which may come later. But the app does not just sit there and stay lame, as it did with the counterfeit headphones.

Dirk

## aptX and Delta-Modulation

I am an old-timer. And one of the tricks which once existed in Computing, to compress the amount of memory that would be needed, just to store digitized sound, was called “Delta Modulation”. At that time, the only ‘normal’ way to digitize sound was what is now called PCM, which often took up too much memory.

And so a scheme was devised very early, by which only the difference between two consecutive samples would actually be stored. Today, this is called ‘DPCM’. And yet, this method has an obvious, severe drawback. If the signal contains substantial amplitudes, associated with frequencies that are half the Nyquist Frequency or higher, this method will clip that content, and produce dull, altered sound.
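A minimal sketch of this drawback follows, assuming that the stored differences are limited to a smaller range than the samples themselves. The limit of 64 and the test frequencies are hypothetical values chosen for illustration, not figures from any historical format.

```python
import math

def dpcm_encode(samples, max_step):
    """Store only the clamped difference between each sample and the decoder's running value."""
    deltas, predicted = [], 0
    for s in samples:
        d = max(-max_step, min(max_step, s - predicted))
        deltas.append(d)
        predicted += d  # track what the decoder will reconstruct
    return deltas

def dpcm_decode(deltas):
    """Rebuild the signal by accumulating the stored differences."""
    out, value = [], 0
    for d in deltas:
        value += d
        out.append(value)
    return out

def peak_error(freq_fraction, max_step=64, amplitude=127, n=256):
    """Peak reconstruction error for a sine at freq_fraction times the sampling rate."""
    x = [round(amplitude * math.sin(2 * math.pi * freq_fraction * i)) for i in range(n)]
    y = dpcm_decode(dpcm_encode(x, max_step))
    return max(abs(a - b) for a, b in zip(x, y))

low = peak_error(0.02)   # far below Nyquist: every difference fits
high = peak_error(0.30)  # above half the Nyquist Frequency: differences get clipped
print(low, high)
```

The low-frequency sine survives intact, because its sample-to-sample differences stay small. The high-frequency sine at the same amplitude does not, because its differences exceed what can be stored.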

Well, one welcome fact which I have learned is that this limitation has essentially been overcome. One commercial domain in which this has been overcome is the compression scheme / CODEC named “aptX”. This is a proprietary scheme, owned by Qualcomm, but it is frequently used, as the chips manufactured and designed by Qualcomm are installed into many devices and circuits. One important place this gets used is the type of Bluetooth headset that now has high-quality sound.

aptX requires that the band of frequencies which starts out as a PCM stream be ‘beaten down’ into 4 sub-bands, using a type of filter known as a “Quadrature Mirror Filter”. This happens in two stages. I know of a kind of Quadrature Mirror Filter which was possible in the old analog days, but have had problems until now, imagining how somebody might implement one using algorithms.

The analog approach required, a local sine-wave, a phase-shifted local sine-wave, a balanced demodulator used twice, and a phase-shifter which was capable of phase-shifting a (wide) band of frequencies, without altering their relative amplitudes. This latter feat is a little difficult to accomplish with simple algorithms, and when accomplished, typically involves high latency. aptX is a CODEC with low latency.

The main thing to understand about a Quadrature Mirror Filter, implemented using algorithms in digital signal processing today, is that the hypothetical example the WiKi article above cites, using a Haar Wavelet for H0 and its complementary series for H1, actually fails to implement a quadrature-split in a pure way, and was offered just as a hypothetical example. The idea that H1( H0(z) ) always equals zero, simply suggested that the frequencies passed by these two filters are mutually exclusive, so that in an abstract way, they pass the requirements. After the signal is passed through H0 and H1 in parallel, the output of each is reduced to half the sampling rate of the input.

What Qualcomm explicitly does, is to define a series H0 and a series H1, such that they apply “64 coefficients”, so that they may achieve a frequency-split accurately. And it is not clear from the article, whether the number of coefficients for each filter is 64, or whether their sum for two filters is 64, or the sum of all six. Either way, this implies a lot of coefficients, which is why dedicated hardware is needed today, to implement aptX, and this dedicated hardware belongs to the kind, which needs to run its own microprogram.

Back in the early days of Computing, programmers would actually use the Haar Wavelet, because of its computational simplicity, even though doing so did not split the spectrum cleanly. And then this wavelet would define the ‘upper sideband’ in a notional way, while its complementary filter would define the notional, ‘lower sideband’, when splitting.

But then the result of this becomes 4 channels in the case of aptX, each of which has 1/4 the sampling rate of the original audio. And then it is possible, in effect, to delta-modulate each of these channels separately. The higher frequencies have then been beaten down to lower frequencies…
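The two-stage split into 4 channels can be sketched as follows. For simplicity, this uses the 2-tap sum and difference pair of the Haar Wavelet, not the much longer filters that aptX actually applies, and the function names `split` and `four_bands` are made up for illustration.

```python
def split(x):
    """One stage: apply the sum / difference pair to successive pairs of samples.

    Each output has half as many samples as the input.
    """
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    return low, high

def four_bands(x):
    """Two stages: split the input, then split each half again."""
    low, high = split(x)
    # Order of the result: low-low, low-high, high-low, high-high
    return split(low) + split(high)

signal = list(range(16))
bands = four_bands(signal)
for band in bands:
    print(len(band), band)
```

Each of the 4 resulting channels holds a quarter as many samples as the input, which is the property that makes it possible to delta-modulate them separately.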

But there is a catch. In reality, aptX needs to use ‘ADPCM’ and not ‘DPCM’, because it can happen in any case, that the amplitudes of upper-frequency bands could be high. ADPCM is a scheme, by which the maximum short-term differential is computed for some time-interval, which is allowed to be a frame of samples, and where a simple division is used to compute a scale factor, by which these differentials are to be quantized.
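Here is a minimal sketch of the scheme as I have just described it: find the largest differential in a frame, divide to obtain a scale factor, and quantize the differentials by that factor. This is only an illustration of the description and not the actual aptX algorithm. The 4-bit code width, the frame contents and the function names are assumptions of mine.

```python
def encode_frame(samples, start, bits=4):
    """Quantize one frame of differentials with a scale derived from the largest one."""
    levels = 2 ** (bits - 1) - 1  # largest code magnitude, 7 for 4 bits
    diffs = [b - a for a, b in zip([start] + samples[:-1], samples)]
    peak = max(abs(d) for d in diffs)
    scale = peak / levels if peak else 1.0  # the 'simple division'
    codes, value = [], start
    for s in samples:
        code = round((s - value) / scale)
        code = max(-levels, min(levels, code))
        codes.append(code)
        value += code * scale  # follow what the decoder will reconstruct
    return scale, codes

def decode_frame(scale, codes, start):
    """Accumulate the rescaled codes to rebuild the frame."""
    out, value = [], start
    for c in codes:
        value += c * scale
        out.append(value)
    return out

quiet = [10, 40, -30, 20, 5, -45, 15, 0]
loud = [100 * v for v in quiet]
for frame in (quiet, loud):
    scale, codes = encode_frame(frame, start=0)
    decoded = decode_frame(scale, codes, start=0)
    worst = max(abs(a - b) for a, b in zip(frame, decoded))
    print(f"scale={scale:.2f}  worst error={worst:.2f}")
```

Because the scale factor follows the size of the differentials in each frame, the quantization error grows and shrinks with the signal, instead of being a fixed amount.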

This is a special situation, in which the sound is quantized in the time-domain, rather than being quantized in the frequency-domain. Quantizing the higher-frequency sub-bands has the effect of adding background – ‘white’ – noise to the decoded signal, thus making the scheme lossy. Yet, because the ADPCM stages are adaptive, the degree of quantization keeps the level of this background noise at a certain fraction, of the amplitude of the intended signal.

And so it would seem, that even old tricks which once existed in Computing, such as delta modulation, have not gone to waste, and have been transformed into something more HQ today.

I think that one observation to add would be, that this approach makes most sense, if the number of output samples of each instance of H0 is half as many, as the number of input samples, and if the same can be said for H1.

And another observation would be, that this approach does not invert the lower sideband, the way real quadrature demodulation would. Instead, it would seem that H0 inverts the upper sideband.

If the intent of down-sampling is to act as a 2:1 low-pass filter, then it remains productive to add successive pairs of samples. Yet, this could just as easily be the definition of H1.

Dirk

(Edit 06/20/2016 : ) There is an observation to add about wavelets. The Haar Wavelet is the simplest kind:


    H0 = [ +1, -1 ]
    H1 = [ +1, +1 ]


And this one guarantees that the original signal can be reconstructed from two down-sampled sub-bands. But, if we remove one of the sub-bands completely, this one results in weird spectral results. This can also be a problem if the sub-bands are modified in ways that do not match.
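Both statements can be tried out with a small sketch. It applies H1 = [ +1, +1 ] and H0 = [ +1, -1 ] to successive pairs of samples, and the function names `analyze` and `synthesize` as well as the test signal are made up for illustration.

```python
def analyze(x):
    """Apply H0 = [+1, -1] and H1 = [+1, +1] to successive pairs, one output per pair."""
    h0 = [x[i] - x[i + 1] for i in range(0, len(x) - 1, 2)]  # differences
    h1 = [x[i] + x[i + 1] for i in range(0, len(x) - 1, 2)]  # sums
    return h0, h1

def synthesize(h0, h1):
    """Rebuild each pair of samples from its sum and its difference."""
    out = []
    for d, s in zip(h0, h1):
        out += [(s + d) / 2, (s - d) / 2]
    return out

x = [3, 1, 4, 1, 5, 9, 2, 6]
h0, h1 = analyze(x)

exact = synthesize(h0, h1)                 # both sub-bands kept
low_only = synthesize([0] * len(h0), h1)   # the H0 sub-band removed completely

print(exact)
print(low_only)
```

With both sub-bands, the input comes back exactly. With the H0 sub-band removed, every pair of samples collapses to its average, so the output turns into a staircase, which is one way to picture the weird spectral results.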

It is possible to define complementary Wavelets, that are also orthogonal, but which again, result in weird spectral results.

The task of defining ones, which are both orthogonal and spectrally neutral, has been solved better by the Daubechies series of Wavelets. However, the series of coefficients used there are non-intuitive, and were also beyond my personal ability to figure out spontaneously.

The idea is that there exists a “scaling function”, which also results in the low-pass filter H1. And then, if we reverse the order of coefficients and negate every second one, we get the high-pass filter H0, which is really a band-pass filter.

To my surprise, the Daubechies Wavelets achieve ‘good results’, even with a low number of coefficients such as maybe 4? But for very good audio results, a longer series of coefficients would still be needed.
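As a sketch of what such a short series looks like, the following uses the well-known 4-coefficient Daubechies scaling filter as H1, and derives H0 from it by reversing the order and negating every second coefficient. The helper `dot` and the variable names are mine. Which of the alternating coefficients receives the minus sign is a convention that varies between texts.

```python
import math

s3 = math.sqrt(3.0)
norm = 4.0 * math.sqrt(2.0)

# 4-coefficient Daubechies scaling filter: the low-pass filter, H1 in the text
h1 = [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]

# Reverse the order and negate every second coefficient: the high-pass filter, H0
h0 = [((-1) ** k) * c for k, c in enumerate(reversed(h1))]

def dot(a, b, shift=0):
    """Inner product of a with b, where b is delayed by `shift` samples."""
    return sum(a[i] * b[i - shift] for i in range(len(a)) if 0 <= i - shift < len(b))

print("sum of H1:", sum(h1))               # gain for a constant signal
print("sum of H0:", sum(h0))               # a constant signal is rejected
print("H1 . H0:", dot(h1, h0))             # the two filters are orthogonal
print("H1 . H1 shifted by 2:", dot(h1, h1, shift=2))
```

The printed sums and inner products are the properties which make the pair usable for a split that can be undone: H1 passes a constant signal, H0 rejects it, and the two are orthogonal to each other as well as to their own copies shifted by two samples.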

One aspect to this which is not mentioned elsewhere, is that while a Daubechies Wavelet-set could be used for encoding, that has a high order of approximation, it could still be that simple appliances will use the Haar Wavelet for decoding. This could be disappointing, but I guess that when decoding, the damage done in this way will be less severe than when encoding.

The most correct thing to do would be to use the Daubechies Wavelets again for decoding, and the mere time-delays that result from their use still fall within the customary definitions today, of “low-latency solutions”. If we needed a Sinc Filter, the result might no longer be considered low-latency, and if we needed to find a Fourier Transform of granules of sound, only to invert it again later, it would certainly not be considered low-latency anymore.

And, when the subject is image decomposition or compression, it is a 2-dimensional application, and the reuse of the Haar Wavelet is more common.