A Basic Limitation in Stereo FM Reproduction

One of the concepts which exist in modern, high-definition sound, is that Human Sound perception can take place between 20 Hz and 20kHz, even though those endpoints are somewhat arbitrary. Some people cannot hear frequencies as high as 20kHz, especially older people, or anybody who just does not have good hearing. Healthy, young children and teenagers can typically hear that entire frequency range.

But, way back when FM radio was invented, sound engineers had flawed data about what frequencies Humans can hear. It was given to them as data to work with that Humans can only hear frequencies from 30Hz to 15kHz. And so, even though Their communications authorities had the ability to assign frequencies somewhat arbitrarily, they did so in a way that was based on such data. (:1)

For that reason, the playback of FM Stereo today, using household receivers, is still limited to an audio frequency range from 30Hz to 15kHz. Even very expensive receivers will not be able to reproduce sound, that was once part of the modulated input, outside this frequency range, although other reference points can be applied, to try to gauge how good the sound quality is.

There is one artifact of this initial standard which was sometimes apparent in early receivers. Stereo FM has a pilot frequency at 19kHz, which a receiver needs to lock an internal oscillator to, but in such a way that the internal oscillator runs at 38kHz, but such that this internal oscillator can be used to demodulate the stereo part of the sound. Because the pilot signal which is actually part of the broadcast signal is ‘only’ at 19kHz, this gives an additional reason to cut off the audible signal at ‘only’ 15Khz; the pilot is not meant to be heard. But, way back in the 1970s and earlier, Electrical Engineers did not have the type of low-pass filters available to them which they do now, that are also known as ‘brick-wall filters’, or filters that attenuate frequencies above the cutoff frequency very suddenly. Instead, equipment designed to be manufactured in the 1970s and earlier, would only use low-pass filters with gradual ‘roll-off’ curves, to attenuate the higher frequencies progressively more, above the cutoff frequency by an increasing distance, but in a way that was gentle. And in fact, even today the result seems to be, that gentler roll-off of the higher frequencies, results in better sound, when the quality is measured in other ways than just the frequency range, such as, when sound quality is measured for how good the temporal resolution, of very short pulses, of high-frequency sound is.

Generally, very sharp spectral resolution results in worse temporal resolution, and this is a negative side effect of some examples of modern sound technology.

But then sometimes, when listeners with high-end receivers in the 1970s and before, who had very good hearing, were tuned in to an FM Stereo Signal, they could actually hear some residual amount of the 19kHz pilot signal, which was never a part of the original broadcast audio. That was sometimes still audible, just because the low-pass filter that defined 15kHz as the upper cut-off frequency, was admitting the 19kHz component to a partial degree.

One technical accomplishment that has been possible since the 1970s however, in consumer electronics, was an analog ‘notch filter’, which seemed to suppress one exact frequency – or almost so – and such a notch filter could be calibrated to suppress 19kHz specifically.

Modern electronics makes possible such things as analog low-pass filters with a more-sudden frequency-cut-off, digital filters, etc. So it’s improbable today, that even listeners whose hearing would be good enough, would still be receiving this 19kHz sound-component to their headphones. In fact, the sound today is likely to seem ‘washed out’, simply because of too many transistors being fit on one chip. And when I just bought an AM/FM Radio in recent days, I did not even try the included ear-buds at first, because I have better headphones. When I did try the included ear-buds, their sound-quality was worse than that, when using my own, valued headphones. I’d say the included ear-buds did not seem to reproduce frequencies above 10kHz at all. My noise-cancelling headphones clearly continue to do so.

One claim which should be approached with extreme skepticism would be, that the sound which a listener seemed to be getting from an FM Tuner, was as good as sound that he was also obtaining from his Vinyl Turntable. AFAIK, the only way in which this would be possible would be, if he was using an extremely poor turntable to begin with.

What has happened however, is that audibility curves have been accepted – since the 1980s – that state the upper limit of Human hearing as 20kHz, and that all manner of audio equipment designed since then takes this into consideration. This would include Audio CD Players, some forms of compressed sound, etc. What some people will claim in a way that strikes me as credible however, is that the frequency-response of the HQ turntables was as good, as that of Audio CDs was. And the main reason I’ll believe that is the fact that Quadraphonic LPs were sold at some point, which had a sub-carrier for each stereo channel, that differentiated that stereo channel front-to-back. This sub-carrier was actually phase-modulated. But in order for Quadraphonic LPs to have worked at all, their actual frequency response need to go as high as  40kHz. And phase-modulation was chosen because this form of modulation is particularly immune to the various types of distortion which an LP would insert, when playing back frequencies as high as 40kHz.

(Updated 6/24/2019, 14h50 … )

Linear Predictive Coding

Linear Predictive Coding is a method of using a constant number of known sample-values, that precede an unknown sample-value, and to find coefficients for each of the preceding sample-values, which they should be multiplied by, and the products summed, to derive the most-probable following sample-value.

More specifically, while the exercise of applying these coefficients should not require much explaining, methods for deriving them do.

Finding these coefficients is also called an Auto-Correlation, because the following sample is part of the same sequence, as the preceding samples belonged to.

Even though LPC travels along the stream of values, each preceding position relative to the current sample to predict is treated as having a persistent meaning, and for the sake of simplicity I’ll be referring to each preceding sample-position as One Predictor.

If the Linear Predictive Coding was only to be of the 2nd order, thus taking into account 2 Predictors, then it will often be simple to use a fixed system of coefficients.

In this case, the coefficients should be { -1, +2 }, which will successfully predict the continuation of a straight line, and nothing else.

One fact about a set of coefficients is, that their sum should be equal to 1, in order to predict a DC value correctly.

( If the intent was to use a set of 3 predictors, to conserve both the 1st and the 2nd derivatives of a curve, then the 3 coefficients should automatically be { +1, -3, +3 } . But, what’s needed for Signal Processing is often not, what’s needed for Analytical Geometry. )

But for orders of LPC greater than 3, the determination of the coefficients is anything but trivial. In order to understand how these coefficients can be computed, one must first understand a basic concept in Statistics called a Correlation. A correlation supposes that an ordered set of X-value and Y-value pairs exist, which could have any values for both X and Y, but that Y is supposed to follow from X, according to a linear equation, such that

Y = α + β X

Quite simply, the degree of correlation is the ideal value for β, which achieves the closest-fitting set of predicted Y-values, given hypothetical X-values.

The process of trying to compute this ideal value for β is also called Linear Regression Analysis, and I keep a fact-sheet about it:

This little sheet actually describes Non-Linear Regression Analysis at the top, using a matrix which states the polynomial terms of X, but it goes on to show the simpler examples of Linear Regression afterward.

There is a word of commentary to make, before understanding correlations at all. Essentially, they exist in two forms

1. There is a form, in which the products of the deviations of X and Y are divided by the variance of X, before being divided by the number of samples.
2. There is a form, in which the products of the deviations of X and Y are divided by the square root, of the product of the variance of X and the variance of Y, before being divided by the number of samples.

The variance of any data-set is also its standard deviation squared. And essentially, there are two ways to deal with the possibility of non-zero Y-intercepts – non-zero values of α. One way is to compute the mean of X, and to use the deviations of individual values of X from this mean, as well as to find the corresponding mean of Y, and to use deviations of individual values of Y from this mean.

Another way to do the Math, is what my fact-sheet describes.

Essentially, Form (1) above treats Y-values as following from known X-values, and is easily capable of indicating amounts of correlation greater than 1.

Form (2) finds how similar X and Y -values are, symmetrically, and should never produce correlations greater than 1.

For LPC, Form (2) is rather useless, and the mean of a set of predictors must be found anyway, so that individual deviations from this mean are also the easiest values to compute with.

The main idea when this is to become an autocorrelation, is that the correlation of the following sample is computed individually, as if it was one of the Y-values, as following each predictor, as if that was just one of the X-values. But it gets just a touch trickier…

(Last Edited 06/07/2017 … )

My Opinion on the Opinion of Chris “Monty” Montgomery

Chris Montgomery is the Audio Expert, who invented the OGG Vorbis codec. That gives enough reason to accredit him with good advice. I recommend that my readers read his advice here.

I did read the whole thing, but have three comments on it:

1. The Author suggests that 16-bit sample-depth offers a de-facto solution to the limits in dynamic range, simply due to the correct application of dithering. If I cannot trust my hardware to perform correct low-pass filtering, why on Earth would I trust it to perform correct, 16-bit, audio dithering?
2. The Author explains the famous loudness curves, that define threshold of perceptibility, as well as the higher threshold of pain. What he fails to point out is that these curves assume, that the sound being tested for, is the only sound being played over the headphones. If there is another, background sound being played – i.e. the current loudness-level already higher than zero – then the threshold of perception for a given test-sound, is higher – requires a higher level, for that test-sound itself to be heard. Yet, this level is still lower, than the peak level of the background sound. People who design codecs know this, as I am sure the author does. It is the threshold of perceptibility next to a background sound – not the absolute threshold – which gets used in the design of codecs.
3. The Author suggests it would be a misuse of his codec, to encode discrete multi-channel sound. And one reason he states, would be the waste in file-size, while the next reason he states, would be the fact that sound jumps to the nearest speaker, when they are all encoded that way.

This last observation strikes a cord with me. I have already noticed, that OGG Files do allow numerous channels to be encoded in parallel, but that if we exceed 2, we lose the benefits of Joint Stereo. By itself, this does not really count against this Author, whose codec therefore does not offer explicit surround-sound. But the possibility is very real, that the localization of sound will jump to the nearest speaker, if the listener moves and the sound was encoded that way. It is entirely possible, that purposeful encoding of surround-sound by the (competing) AC3 or the AAC codecs, reduces this risk.

But then I would suggest an alternative approach, to people who do not want to use the proprietary codecs, yet who wish to encode their movies with surround.

There exists the Steve Harris LADSPA plug-in library, which includes a matrix encoder for Pro Logic. This matrix encoder accepts 4 input channels, one of which is the surround channel, and outputs 2 stereo channels.

Further, the circuitry must exist someplace as well, to accept 2 stereo, 1 center and 1 surround-channel, and to encode those in real-time, so that the surround-effect can be played back over 6 speakers.

• In principle, it should be possible to OGG-compress 4 channels and not 6, so that these channels can be used as inputs, to a matrix surround-system, like to the LADSPA plug-in, so that listenable surround will emanate from all speakers. Does Audio Software exist, which applies the LADSPA plug-in in real-time?
• Alternatively, it might be possible to mix down Pro Logic sound into Stereo using the Steve Harris plug-in, and then to use FLAC on the resulting stereo.

BTW: What the Author mainly writes, is how incorrect it would be for pure listeners, to download their music in 24/192 format. He does not actually write, that Music / Sound Authors should avoid recording in this format. And so one fact which I have observed, is that there exists a lot of Audio Software – such as – that stores its sound in some higher, internal format, but which, when instructed to Export that to a 16-bit format, offer Dithering as an option.

This is possible because the Application is numeric and not physical. Thus, If I had used my USB-sound-device to record in 24-bit, I could next Export the finished sound tracks to 16-bit:

But, It would also seem that Chris Montgomery equates the use of such technology, as only being suited for Professionals. I am not a professional, and do not have the extremely expensive tools they do. Yet, I am able to author sound-projects.

Dirk

A Note on FLAC -Compressing 24-bit

One note which I had commented about before my blog began, was that if authors decide to capture sound at 96k samples /second, the resulting sound should compress well using FLAC.

But now that I have experimented with ‘QTractor‘ and an external sound card, I have realized that we will probably also be capturing that sound in 24-bit sample-format, instead of 16-bit. And the sad fact is, that FLAC will not compress the 24-bit format as well, as it did 16-bit.

The reason seems clear. Using ‘Linear Predictive Coding’ means that FLAC will be able to predict the next sample in a set of so-many, to maybe 8 bits of precision, except that the next sample will always deviate from this prediction by a small residual. So 8-bit sound should compress brilliantly.

But then with 16-bit, the accuracy of the encoding stays the same. So again, the ‘LPC’ is really only 8-bits accurate at best, meaning that we get a larger residual. The size of that residual is what makes up most of a FLAC File.

Well at 24-bit, again, the LPC will only predict the next sample, accurately to within 8 bits. And so the residual is likely to be twice as large, as it was with 16-bit, completing 24-bit accuracy this time. We are not left with much compression then.

When I recorded my 14-second sound session the other day, I selected FLAC as my capture file format. I had a noisy air-conditioner running in the background. Additionally, the compression level defaults to Fastest, because the file needs to be written in real-time, and not chewed on.

At 96 kHz, 24-bit stereo, raw audio will take up about 4.6 mbps. At 44.1 kHz, 16-bit stereo, raw audio takes up about 1.4 mbps.

Well I was capturing to a stereo FLAC File, but was only using one channel out of the two. So the FLAC File that resulted, had a bit-rate of 2.3 mbps. This means that FLAC recognized the silent track and used ‘Run-Length Encoding’ on it, but that was about all this CODEC could do for me.

Now, we do have a command-line tool which will-re-compress that file:


$flac -8 infile.flac -o outfile.flac$ flac -8 infile.flac --channels=1 -o outfile.flac
\$ flac -8 infile.flac --channels=1 --blocksize=8192 -o outfile.flac



The -8 means to use maximum compression.

For me, the bit-rate went down to 2.2 mbps either way.

It beats using a raw format, because using the latter would have meant, nothing would have detected my silent stereo channel, and the file would have been twice as large.

Dirk