Understanding ADPCM

One concept which exists in Computing, is a primary representation of audio streams, as samples with a constant sampling-rate, which is also called ‘PCM’ – or, Pulse-Code Modulation. this is also the basis for .WAV-Files. But, everybody knows that the files needed to represent even the highest humanly-audible frequencies in this way, become large. And so means have been pursued over the decades to compress this format after it has been generated, or to decompress it before reading the stream. And as early as in the 1970s, a compression-technique existed, which is called ‘DPCM’ today: Differential Pulse-Code Modulation. Back then, it was just not referred to as DPCM, but rather as ‘Delta-Modulation’, and it first formed a basis for the voice-chips, in ‘talking dolls’ (toys). Later it became the basis for the first solid-state (telephone) answering machines.

The way DPCM works, is that instead of each sample-value being stored or transmitted, only the exact difference between two consecutive sample-values is stored. And this subject is sometimes explained, as though software engineers had two ways to go about encoding it:

  1. Simply subtract the current sample-value from the previous one and output it,
  2. Create a local copy, of what the decoder would do, if the previous sample-differences had been decoded, and output the difference between the current sample-value, and what this local model regenerated.

What happens when DPCM is used directly, is that a smaller field of bits can be used as data, let’s say ‘4’ instead of ‘8’. But then, a problem quickly becomes obvious: Unless the uncompressed signal was very low in higher-frequency components – frequencies above 1/3 the Nyquist-Frequency – a step in the 8-bit sample-values could take place, which is too large to represent as a 4-bit number. And given this possibility, it would seem that only approach (2) will give the correct result, which would be, that the decoded sample-values will slew, where the original values had a step, but slew back to an originally-correct, low-frequency value.

But then we’d still be left with the advantage, of fixed field-widths, and thus, a truly Constant Bitrate (CBR).

But because according to today’s customs, the signal is practically guaranteed to be rich in its higher-frequency components, a derivative of DPCM has been devised, which is called ‘ADPCM’ – Adaptive Differential Pulse-Code Modulation. When encoding ADPCM, each sample-difference is quantized, according to a quantization-step – aka scale-factor – that adapts to how high the successive differences are at any time. But again, as long as we include the scale-factor as part of (small) header-information for an audio-format, that’s organized into blocks, we can achieve fixed field-sizes and fixed block-sizes again, and thus also achieve true CBR.

(Updated 03/07/2018 : )

Continue reading Understanding ADPCM

About the Amplitudes of a Discrete Differential

One of the concepts which exist in digital signal processing, is that the difference between two consecutive input samples (in the time-domain) can simply be output, thus resulting in a differential of some sort, even though the samples of data do not represent a continuous function. There is a fact which must be observed to occur at (F = N / 2) – i.e. when the frequency is half the Nyquist Frequency, of (h / 2) , if (h) is the sampling frequency.

The input signal could be aligned with the samples, to give a sequence of [s0 … s3] equal to

0, +1, 0, -1

This set of (s) is equivalent to a sine-wave at (F = N / 2) . Its discrete differentiation [h0 … h3] would be

+1, +1, -1, -1

At first glance we might think, that this output stream has the same amplitude as the input stream. But the problem becomes that the output stream is by same token, not aligned with the samples. There is an implicit peak in amplitudes between (h0) and (h1) which is greater than (+1) , and an implicit peak between (h2) and (h3) more negative than (-1) . Any adequate filtering of this stream, belonging to a D/A conversion, will reproduce a sine-wave with a peak amplitude greater than (1).

(Edit 03/23/2017 : )

In this case we can see, that samples h0 and h1 of the output stream, would be phase-shifted 45⁰ with respect to the zero crossings and to the peak amplitude, that would exist exactly between h0 and h1. Therefore, the amplitude of h0 and h1 will be the sine-function of 45⁰ with respect to this peak value, and the actual peak would be (the square root of 2) times the values of h0 and h1.

(Erratum 11/28/2017 —

And so a logical question which anybody might want an answer to would be, ‘Below what frequency does the gain cross unity gain?’ And the answer to that question is, somewhat obscurely, at (N/3) . This is a darned low frequency in practice. If the sampling rate was 44.1kHz, this is achieved somewhere around 7 kHz, and music, for which that sampling rate was devised, easily contains sound energy above that frequency.

Hence the sequences which result would be:

s = [ +1, +1/2, -1/2, -1, -1/2, +1/2 ]

h = [ +1/2, -1/2, -1, -1/2, +1/2, +1 ]

What follows is also a reason for which by itself, DPCM offers poor performance in compressing signals. It usually needs to be combined with other methods of data-reduction, thus possibly resulting in the lossy ADPCM. And another approach which uses ADPCM, is aptX, the last of which is a proprietary codec, which minimizes the loss of quality that might otherwise stem from using ADPCM.

I believe this observation is also relevant to This Earlier Posting of mine, which implied a High-Pass Filter with a cutoff frequency of 500 Hz, that would be part of a Band-Pass Filter. My goal was to obtain a gain of at most 0.5 , over the entire interval, and to simplify the Math.

— End of Erratum. )

(Posting shortened here on 11/28/2017 . )