One concept which exists in Computing, is a primary representation of audio streams, as samples with a constant sampling-rate, which is also called ‘PCM’ – or, Pulse-Code Modulation. this is also the basis for .WAV-Files. But, everybody knows that the files needed to represent even the highest humanly-audible frequencies in this way, become large. And so means have been pursued over the decades to compress this format after it has been generated, or to decompress it before reading the stream. And as early as in the 1970s, a compression-technique existed, which is called ‘DPCM’ today: Differential Pulse-Code Modulation. Back then, it was just not referred to as DPCM, but rather as ‘Delta-Modulation’, and it first formed a basis for the voice-chips, in ‘talking dolls’ (toys). Later it became the basis for the first solid-state (telephone) answering machines.
The way DPCM works, is that instead of each sample-value being stored or transmitted, only the exact difference between two consecutive sample-values is stored. And this subject is sometimes explained, as though software engineers had two ways to go about encoding it:
- Simply subtract the current sample-value from the previous one and output it,
- Create a local copy, of what the decoder would do, if the previous sample-differences had been decoded, and output the difference between the current sample-value, and what this local model regenerated.
What happens when DPCM is used directly, is that a smaller field of bits can be used as data, let’s say ‘4’ instead of ‘8’. But then, a problem quickly becomes obvious: Unless the uncompressed signal was very low in higher-frequency components – frequencies above 1/3 the Nyquist-Frequency – a step in the 8-bit sample-values could take place, which is too large to represent as a 4-bit number. And given this possibility, it would seem that only approach (2) will give the correct result, which would be, that the decoded sample-values will slew, where the original values had a step, but slew back to an originally-correct, low-frequency value.
But then we’d still be left with the advantage, of fixed field-widths, and thus, a truly Constant Bitrate (CBR).
But because according to today’s customs, the signal is practically guaranteed to be rich in its higher-frequency components, a derivative of DPCM has been devised, which is called ‘ADPCM’ – Adaptive Differential Pulse-Code Modulation. When encoding ADPCM, each sample-difference is quantized, according to a quantization-step – aka scale-factor – that adapts to how high the successive differences are at any time. But again, as long as we include the scale-factor as part of (small) header-information for an audio-format, that’s organized into blocks, we can achieve fixed field-sizes and fixed block-sizes again, and thus also achieve true CBR.
(Updated 03/07/2018 : )
(As of 03/06/2018 : )
But with ADPCM, it’s absolutely essential, to use approach (2) above, when encoding. Because, as the quantization-step changes, there is no longer any guarantee that the sample-values which will result from the summation upon decoding, will ever come back to what the original values were, before encoding. If the encoder was using a local model of the decoder, then at least this will be so for the lowest-frequency components.
BTW: ADPCM is lossy compression, because of the quantization.
(Edit 03/07/2018 : )
There’s a certain refinement which I can suggest. If we were to assume for the moment, that ‘a block contains 64 samples’, then
For the sake of determining the optimum quantization-step / scale factor, when encoding:
- The first difference to be computed would be between the last decoder-output and the first input-sample of the block, followed by ’63’ mere differences between input block-samples.
- Alternatively, since header-information is already being spent, to define the scale factor, which could have as many bits as the input-samples have bits, one input-sample could just as easily be made a part of this overhead, after which ’63’ samples could follow through differentiation.
- Even though the arithmetic is (unusually) in the integer-domain, the equivalent of adding (0.5) to the largest-possible positive integer out of signed ones, to be output, should still be computed, and a computed scale factor would never be less than (1). This can be simulated with integers, by left-shifting both the largest difference to be input, and the largest-possible integer output, adding (1) to the divisor, dividing, and adding (1) to the quotient again if zero.