An observation about some purchased FLAC Files.

One of the ideas which I’ve blogged about often – a pet peeve of mine – is how lossy compression is not inaudible, although some people have claimed it is, and how its use degrades the final quality of modern, streamed or downloaded music.

And so if this is taken to be real for the moment, a question can rise as to what the modern methods are, to purchase High-Fidelity, Classical Music after all. One method could be, only to purchase Audio CDs that were mastered in the 1990s. But then, the eventual problem becomes, that even the best producers may not be mastering new recordings in that format anymore, in the year 2019. We may be able to purchase famous recordings made in the 1990s, but none from later, depending on what, exactly, our needs are. But, an alternative method exists to acquire such music today, especially to acquire the highest quality of Classical music recorded recently.

What people can do is to purchase and download the music in 16-bit, FLAC-compressed format. Ideally, this form of compression should not insert any flaws into the sound on its own. The sound could still be lacking in certain ways, but if it is, then this will be because the raw audio was flawed, before it was even compressed. By definition, lossless compression decompresses exactly to what was present, before the sound was compressed.

I have just taken part in such a transaction, and downloaded Gershwin’s Rhapsody In Blue, in 16-bit FLAC Format. But I made an interesting observation. The raw 16-bit audio at a sample-rate of 44.1kHz, would take up just over 1.4mbps. When I’ve undertaken to Flac-compress such recordings myself, I’ve never been able to achieve a ratio much better than 2:1. Hence, I should not be able to achieve bit-rates much lower than 700kbps. But the recording of Gershwin which I just downloaded, achieves 561kbps. This is a piece in which a piano and a clarinet feature most prominently, and, in this version, also some muted horns. And yet, the overall sound quality of the recording seems good. So what magic might be employed by the producers, to result in smaller FLAC Files?

(Updated 8/27/2019, 14h45 … )

Continue reading An observation about some purchased FLAC Files.

A Basic Limitation in Stereo FM Reproduction

One of the concepts which exist in modern, high-definition sound, is that Human Sound perception can take place between 20 Hz and 20kHz, even though those endpoints are somewhat arbitrary. Some people cannot hear frequencies as high as 20kHz, especially older people, or anybody who just does not have good hearing. Healthy, young children and teenagers can typically hear that entire frequency range.

But, way back when FM radio was invented, sound engineers had flawed data about what frequencies Humans can hear. It was given to them as data to work with that Humans can only hear frequencies from 30Hz to 15kHz. And so, even though Their communications authorities had the ability to assign frequencies somewhat arbitrarily, they did so in a way that was based on such data. (:1)

For that reason, the playback of FM Stereo today, using household receivers, is still limited to an audio frequency range from 30Hz to 15kHz. Even very expensive receivers will not be able to reproduce sound, that was once part of the modulated input, outside this frequency range, although other reference points can be applied, to try to gauge how good the sound quality is.

There is one artifact of this initial standard which was sometimes apparent in early receivers. Stereo FM has a pilot frequency at 19kHz, which a receiver needs to lock an internal oscillator to, but in such a way that the internal oscillator runs at 38kHz, but such that this internal oscillator can be used to demodulate the stereo part of the sound. Because the pilot signal which is actually part of the broadcast signal is ‘only’ at 19kHz, this gives an additional reason to cut off the audible signal at ‘only’ 15Khz; the pilot is not meant to be heard. But, way back in the 1970s and earlier, Electrical Engineers did not have the type of low-pass filters available to them which they do now, that are also known as ‘brick-wall filters’, or filters that attenuate frequencies above the cutoff frequency very suddenly. Instead, equipment designed to be manufactured in the 1970s and earlier, would only use low-pass filters with gradual ‘roll-off’ curves, to attenuate the higher frequencies progressively more, above the cutoff frequency by an increasing distance, but in a way that was gentle. And in fact, even today the result seems to be, that gentler roll-off of the higher frequencies, results in better sound, when the quality is measured in other ways than just the frequency range, such as, when sound quality is measured for how good the temporal resolution, of very short pulses, of high-frequency sound is.

Generally, very sharp spectral resolution results in worse temporal resolution, and this is a negative side effect of some examples of modern sound technology.

But then sometimes, when listeners with high-end receivers in the 1970s and before, who had very good hearing, were tuned in to an FM Stereo Signal, they could actually hear some residual amount of the 19kHz pilot signal, which was never a part of the original broadcast audio. That was sometimes still audible, just because the low-pass filter that defined 15kHz as the upper cut-off frequency, was admitting the 19kHz component to a partial degree.

One technical accomplishment that has been possible since the 1970s however, in consumer electronics, was an analog ‘notch filter’, which seemed to suppress one exact frequency – or almost so – and such a notch filter could be calibrated to suppress 19kHz specifically.

Modern electronics makes possible such things as analog low-pass filters with a more-sudden frequency-cut-off, digital filters, etc. So it’s improbable today, that even listeners whose hearing would be good enough, would still be receiving this 19kHz sound-component to their headphones. In fact, the sound today is likely to seem ‘washed out’, simply because of too many transistors being fit on one chip. And when I just bought an AM/FM Radio in recent days, I did not even try the included ear-buds at first, because I have better headphones. When I did try the included ear-buds, their sound-quality was worse than that, when using my own, valued headphones. I’d say the included ear-buds did not seem to reproduce frequencies above 10kHz at all. My noise-cancelling headphones clearly continue to do so.

One claim which should be approached with extreme skepticism would be, that the sound which a listener seemed to be getting from an FM Tuner, was as good as sound that he was also obtaining from his Vinyl Turntable. AFAIK, the only way in which this would be possible would be, if he was using an extremely poor turntable to begin with.

What has happened however, is that audibility curves have been accepted – since the 1980s – that state the upper limit of Human hearing as 20kHz, and that all manner of audio equipment designed since then takes this into consideration. This would include Audio CD Players, some forms of compressed sound, etc. What some people will claim in a way that strikes me as credible however, is that the frequency-response of the HQ turntables was as good, as that of Audio CDs was. And the main reason I’ll believe that is the fact that Quadraphonic LPs were sold at some point, which had a sub-carrier for each stereo channel, that differentiated that stereo channel front-to-back. This sub-carrier was actually phase-modulated. But in order for Quadraphonic LPs to have worked at all, their actual frequency response need to go as high as  40kHz. And phase-modulation was chosen because this form of modulation is particularly immune to the various types of distortion which an LP would insert, when playing back frequencies as high as 40kHz.

About Digital FM:

(Updated 7/3/2019, 22h15 … )

Continue reading A Basic Limitation in Stereo FM Reproduction

Understanding ADPCM

One concept which exists in Computing, is a primary representation of audio streams, as samples with a constant sampling-rate, which is also called ‘PCM’ – or, Pulse-Code Modulation. this is also the basis for .WAV-Files. But, everybody knows that the files needed to represent even the highest humanly-audible frequencies in this way, become large. And so means have been pursued over the decades to compress this format after it has been generated, or to decompress it before reading the stream. And as early as in the 1970s, a compression-technique existed, which is called ‘DPCM’ today: Differential Pulse-Code Modulation. Back then, it was just not referred to as DPCM, but rather as ‘Delta-Modulation’, and it first formed a basis for the voice-chips, in ‘talking dolls’ (toys). Later it became the basis for the first solid-state (telephone) answering machines.

The way DPCM works, is that instead of each sample-value being stored or transmitted, only the exact difference between two consecutive sample-values is stored. And this subject is sometimes explained, as though software engineers had two ways to go about encoding it:

  1. Simply subtract the current sample-value from the previous one and output it,
  2. Create a local copy, of what the decoder would do, if the previous sample-differences had been decoded, and output the difference between the current sample-value, and what this local model regenerated.

What happens when DPCM is used directly, is that a smaller field of bits can be used as data, let’s say ‘4’ instead of ‘8’. But then, a problem quickly becomes obvious: Unless the uncompressed signal was very low in higher-frequency components – frequencies above 1/3 the Nyquist-Frequency – a step in the 8-bit sample-values could take place, which is too large to represent as a 4-bit number. And given this possibility, it would seem that only approach (2) will give the correct result, which would be, that the decoded sample-values will slew, where the original values had a step, but slew back to an originally-correct, low-frequency value.

But then we’d still be left with the advantage, of fixed field-widths, and thus, a truly Constant Bitrate (CBR).

But because according to today’s customs, the signal is practically guaranteed to be rich in its higher-frequency components, a derivative of DPCM has been devised, which is called ‘ADPCM’ – Adaptive Differential Pulse-Code Modulation. When encoding ADPCM, each sample-difference is quantized, according to a quantization-step – aka scale-factor – that adapts to how high the successive differences are at any time. But again, as long as we include the scale-factor as part of (small) header-information for an audio-format, that’s organized into blocks, we can achieve fixed field-sizes and fixed block-sizes again, and thus also achieve true CBR.

(Updated 03/07/2018 : )

Continue reading Understanding ADPCM

About the Amplitudes of a Discrete Differential

One of the concepts which exist in digital signal processing, is that the difference between two consecutive input samples (in the time-domain) can simply be output, thus resulting in a differential of some sort, even though the samples of data do not represent a continuous function. There is a fact which must be observed to occur at (F = N / 2) – i.e. when the frequency is half the Nyquist Frequency, of (h / 2) , if (h) is the sampling frequency.

The input signal could be aligned with the samples, to give a sequence of [s0 … s3] equal to

0, +1, 0, -1

This set of (s) is equivalent to a sine-wave at (F = N / 2) . Its discrete differentiation [h0 … h3] would be

+1, +1, -1, -1

At first glance we might think, that this output stream has the same amplitude as the input stream. But the problem becomes that the output stream is by same token, not aligned with the samples. There is an implicit peak in amplitudes between (h0) and (h1) which is greater than (+1) , and an implicit peak between (h2) and (h3) more negative than (-1) . Any adequate filtering of this stream, belonging to a D/A conversion, will reproduce a sine-wave with a peak amplitude greater than (1).

(Edit 03/23/2017 : )

In this case we can see, that samples h0 and h1 of the output stream, would be phase-shifted 45⁰ with respect to the zero crossings and to the peak amplitude, that would exist exactly between h0 and h1. Therefore, the amplitude of h0 and h1 will be the sine-function of 45⁰ with respect to this peak value, and the actual peak would be (the square root of 2) times the values of h0 and h1.

(Erratum 11/28/2017 —

And so a logical question which anybody might want an answer to would be, ‘Below what frequency does the gain cross unity gain?’ And the answer to that question is, somewhat obscurely, at (N/3) . This is a darned low frequency in practice. If the sampling rate was 44.1kHz, this is achieved somewhere around 7 kHz, and music, for which that sampling rate was devised, easily contains sound energy above that frequency.

Hence the sequences which result would be:

s = [ +1, +1/2, -1/2, -1, -1/2, +1/2 ]

h = [ +1/2, -1/2, -1, -1/2, +1/2, +1 ]

What follows is also a reason for which by itself, DPCM offers poor performance in compressing signals. It usually needs to be combined with other methods of data-reduction, thus possibly resulting in the lossy ADPCM. And another approach which uses ADPCM, is aptX, the last of which is a proprietary codec, which minimizes the loss of quality that might otherwise stem from using ADPCM.

I believe this observation is also relevant to This Earlier Posting of mine, which implied a High-Pass Filter with a cutoff frequency of 500 Hz, that would be part of a Band-Pass Filter. My goal was to obtain a gain of at most 0.5 , over the entire interval, and to simplify the Math.

— End of Erratum. )

(Posting shortened here on 11/28/2017 . )

Dirk