There exists HD Radio.

In Canada and the USA, a relatively recent practice in FM radio has been, to piggy-back a digital audio stream, onto the carriers of some existing, analog radio carriers. This is referred to as “HD Radio”. A receiver as good as the broadcasting standard should cost slightly more than $200. This additional content isn’t audible to people who have standard, analog receivers, but can be decoded by people who have the capable receivers. I like to try evaluating how well certain ‘Codecs’ work, which is an acronym for “Compressor-Decompressor”. Obviously, the digital audio has been compressed, so that it will take up a narrower range of radio-frequencies than it offers audio-frequencies. In certain cases, either a poor choice, or an outdated choice of a Codec in itself, can leave the sound-quality injured.

There was an earlier blog posting, in which I described the European Standard for ‘DAB’ this way. That uses ‘MPEG-1, Layer 2′ compression (:1). The main difference between ‘DAB’ and ‘HD Radio’ is the fact that, with ‘DAB’ or ‘DAB+’, a separate band of VHF frequencies is being used, while ‘HD Radio’ uses existing radio stations and therefore the existing band of frequencies.

The Codec used in HD Radio is proprietary, and is owned by a company named ‘iBiquity’. Some providers may reject the format, over an unwillingness to enter a contractual relationship with one commercial undertaking. But what is written is, that The Codec used here resembles AAC. One of the things which I will not do, is to provide my opinion about a lossy audio Codec, without ever having listened to it. Apple and iTunes have been working with AAC for many years, but I’ve neither owned an iPhone, nor an OS/X computer.

What I’ve done in recent days was to buy an HD Radio -capable Receiver, and this provides me with my first hands-on experience with this family of Codecs. Obviously, when trying to assess the levels of quality for FM radio, I use my headphones and not the speakers in my echoic computer-room. But, it can sometimes be more relaxing to play the radio over the speakers, despite the loss of quality that takes place, whenever I do so. (:2)

What I find is that the quality of HD Radio is better than that of analog, FM radio, but still not as good as that of lossless, 44.1kHz audio (such as, with actual Audio CDs). Yet, because we know that this Codec is lossy, that last part is to be expected.

(Updated 8/01/2019, 19h00 … )

Continue reading There exists HD Radio.

A Basic Limitation in Stereo FM Reproduction

One of the concepts which exist in modern, high-definition sound, is that Human Sound perception can take place between 20 Hz and 20kHz, even though those endpoints are somewhat arbitrary. Some people cannot hear frequencies as high as 20kHz, especially older people, or anybody who just does not have good hearing. Healthy, young children and teenagers can typically hear that entire frequency range.

But, way back when FM radio was invented, sound engineers had flawed data about what frequencies Humans can hear. It was given to them as data to work with that Humans can only hear frequencies from 30Hz to 15kHz. And so, even though Their communications authorities had the ability to assign frequencies somewhat arbitrarily, they did so in a way that was based on such data. (:1)

For that reason, the playback of FM Stereo today, using household receivers, is still limited to an audio frequency range from 30Hz to 15kHz. Even very expensive receivers will not be able to reproduce sound, that was once part of the modulated input, outside this frequency range, although other reference points can be applied, to try to gauge how good the sound quality is.

There is one artifact of this initial standard which was sometimes apparent in early receivers. Stereo FM has a pilot frequency at 19kHz, which a receiver needs to lock an internal oscillator to, but in such a way that the internal oscillator runs at 38kHz, but such that this internal oscillator can be used to demodulate the stereo part of the sound. Because the pilot signal which is actually part of the broadcast signal is ‘only’ at 19kHz, this gives an additional reason to cut off the audible signal at ‘only’ 15Khz; the pilot is not meant to be heard. But, way back in the 1970s and earlier, Electrical Engineers did not have the type of low-pass filters available to them which they do now, that are also known as ‘brick-wall filters’, or filters that attenuate frequencies above the cutoff frequency very suddenly. Instead, equipment designed to be manufactured in the 1970s and earlier, would only use low-pass filters with gradual ‘roll-off’ curves, to attenuate the higher frequencies progressively more, above the cutoff frequency by an increasing distance, but in a way that was gentle. And in fact, even today the result seems to be, that gentler roll-off of the higher frequencies, results in better sound, when the quality is measured in other ways than just the frequency range, such as, when sound quality is measured for how good the temporal resolution, of very short pulses, of high-frequency sound is.

Generally, very sharp spectral resolution results in worse temporal resolution, and this is a negative side effect of some examples of modern sound technology.

But then sometimes, when listeners with high-end receivers in the 1970s and before, who had very good hearing, were tuned in to an FM Stereo Signal, they could actually hear some residual amount of the 19kHz pilot signal, which was never a part of the original broadcast audio. That was sometimes still audible, just because the low-pass filter that defined 15kHz as the upper cut-off frequency, was admitting the 19kHz component to a partial degree.

One technical accomplishment that has been possible since the 1970s however, in consumer electronics, was an analog ‘notch filter’, which seemed to suppress one exact frequency – or almost so – and such a notch filter could be calibrated to suppress 19kHz specifically.

Modern electronics makes possible such things as analog low-pass filters with a more-sudden frequency-cut-off, digital filters, etc. So it’s improbable today, that even listeners whose hearing would be good enough, would still be receiving this 19kHz sound-component to their headphones. In fact, the sound today is likely to seem ‘washed out’, simply because of too many transistors being fit on one chip. And when I just bought an AM/FM Radio in recent days, I did not even try the included ear-buds at first, because I have better headphones. When I did try the included ear-buds, their sound-quality was worse than that, when using my own, valued headphones. I’d say the included ear-buds did not seem to reproduce frequencies above 10kHz at all. My noise-cancelling headphones clearly continue to do so.

One claim which should be approached with extreme skepticism would be, that the sound which a listener seemed to be getting from an FM Tuner, was as good as sound that he was also obtaining from his Vinyl Turntable. AFAIK, the only way in which this would be possible would be, if he was using an extremely poor turntable to begin with.

What has happened however, is that audibility curves have been accepted – since the 1980s – that state the upper limit of Human hearing as 20kHz, and that all manner of audio equipment designed since then takes this into consideration. This would include Audio CD Players, some forms of compressed sound, etc. What some people will claim in a way that strikes me as credible however, is that the frequency-response of the HQ turntables was as good, as that of Audio CDs was. And the main reason I’ll believe that is the fact that Quadraphonic LPs were sold at some point, which had a sub-carrier for each stereo channel, that differentiated that stereo channel front-to-back. This sub-carrier was actually phase-modulated. But in order for Quadraphonic LPs to have worked at all, their actual frequency response need to go as high asĀ  40kHz. And phase-modulation was chosen because this form of modulation is particularly immune to the various types of distortion which an LP would insert, when playing back frequencies as high as 40kHz.

About Digital FM:

(Updated 7/3/2019, 22h15 … )

Continue reading A Basic Limitation in Stereo FM Reproduction

Wavelet Decomposition of Images

One type of wavelet which exists, and which has continued to be of some interest to computational signal processing, is the Haar Wavelet. It’s thought to have a low-pass and a high-pass version complementing each other. This would be the low-pass Haar Wavelet:

[ +1 +1 ]

And this would be the high-pass version:

[ +1 -1 ]

These wavelets are intrinsically flawed, in that if they are applied to audio signals, they will produce poor frequency response each. But they do have as an important Mathematical property, that from its low-pass and its high-pass component, the original signal can be reconstructed fully.

Now, there is also something called a wavelet transform, but I seldom see it used.

If we wanted to extend the Haar Wavelet to the 2D domain, then a first approach might be, to apply it twice, once, one-dimensionally, along each axis of an image. But in reality, this would give the following low-frequency component:

[ +1 +1 ]

[ +1 +1 ]

And only, the following high-frequency component:

[ +1 -1 ]

[ -1 +1 ]

This creates an issue with common sense, because in order to be able to reconstruct the original signal – in this case an image – we’d need to arrive at 4 reduced values, not 2, because the original signal had 4 distinct values.

gpu_1_ow

And so closer inspection should reveal, that the wavelet reduction of images has 3 distinct high-frequency components: ( :1 )

Continue reading Wavelet Decomposition of Images

Why the Temporal Resolution of MP3s is Poor.

I have spent a lot of my private time, thinking about lossy sound compression, and then, simplifying my ideas to something more likely to have been implemented in actual MP3 compression. In order to understand this concept, one needs to be familiar with the concept of Fourier Transforms. There are two varieties of them, which are important in sound compression: The “Discreet Fourier Transform” (‘DFT’), and the “Discreet Cosine Transform” (‘DCT’), the latter of which has several types again.

I did notice that the temporal resolution of MP3s I listen to is poor, and it was an important realization I finally came to, that this was not due to the actual length of the sampling window.

If we were to assume for the moment that the sampling interval was 1024 samples long – and for MP3, it is not – then to compute the DFT of that would produce 1024 frequency coefficients, from (0-1023 / 2) cycles / sampling interval. Each of these coefficients would be a complex number, and the whole set of them can be used to reconstruct the original sample-set accurately, by inverting the DFT. The inverse of the DFT is actually the DFT computation again, but with the imaginary (sine) component inverted (negated).

But, MP3s do not use the DFT, instead using the DCT, the main difference in which is, that the DCT does not record a complex number for each coefficient, rather just stating a real number, which would normally correspond only to the cosine function within the DFT… Admittedly, each of these absolute amplitudes may possibly be negated.

If the time-domain signal consisted of a 5 kHz wave, which was pulsating on and off 200 times per second – which would actually sound like buzzing to human ears – then the DCT would record a frequency component at 5kHz, but as long as they are not suppressed due to the psychoacoustic models used, would also record ‘sidebands’ at 4800 and at 5200 Hz, each of which has 1/2 the amplitude of the center frequency at 5 kHz. I know this, because for the center frequency to be turned on and off, it must be amplitude modulated, virtually. And so what has this time-domain representation, even though this happens faster than once per sampling window, also has a valid frequency-domain representation.

When this gets decoded, the coefficient-set will reproduce a sample-set, whose 5 kHz center frequency again seems to ‘buzz’ 200 times per second, due to the individual frequency components interfering constructively and then destructively, even though they are being applied equally across the entire sampling interval.

But because the coefficient-set was produced by a DCT, it has no accurate phase information. And so the exact time each 5 kHz burst has its maximum amplitude, will not correspond to the exact time it did before. This will only seem to correct itself once per frame. If the sampling interval was truly 1024 samples long, then a frame will recur every 512 samples, which is ~80 times per second.

Now the question could be asked, why it should not be possible to base lossy audio compression on the DFT instead. And the answer is that in principle, it would be possible to do so. Only, if each coefficient-set consisted of complex numbers, it would also become more difficult to compress the number of kbps kept in the stream, in an effective way. It would probably still not be possible to preserve the phase information perfectly.

And then as a side-note, this one hypothetical sample-set started out as consisting of real numbers. But with the DFT, the sample-set could carry complex numbers as easily as the coefficient-set did. If the coefficients were compressed-and-simplified, then the samples reproduced would probably end up being so, with complex values. In a case like this, the correct thing to do is to ignore the imaginary component, and only output the real component, as the decoded result…

When using a DCT to encode a stream of sound, which is supposed to be continuous, a vulgarization of the problem could be, that the stream contains ‘a sine wave instead of a cosine wave’, which would therefore get missed by all the sampling intervals, because only the product with the cosine function is being computed each time, for a specific coefficient. The solution that comes from the Math of the DCT itself is, that the phase of the unit vector generally rotates 90 degrees ~from each frame to the next~. To the best of my understanding, two sampling intervals will generally overlap by 50% in time, resulting in one frame half as long. It may be that the designers only compute the odd-numbered coefficients. Then, the same coefficient belonging to the next frame should be aware of this wave notwithstanding. Further, the sampling intervals are made to overlap when the stream is decoded again, such that a continuous wave can be reconstructed. ( :1 )

The only question I remain curious about, is whether a need exists when encoding with a DCT, to blend any given coefficient as belonging to two consecutive frames, the current one plus the previous one.

While it can be done, to use rectangular sampling windows for encoding, the results from that are likely to be offensive to listen to. So in practice, Blackman Windows should ideally be used for encoding (that are twice as long as a frame).

The choice of whether decoders should use a Hanning Window or a Linear Taper, can depend on what sort of situation should best be reproduced.

Decoding with a linear taper, will cause crescendos to seem maximally smooth, and perfectly so if the crescendo is linear. But considering that linear crescendos might be rare in real music, a Hanning Window will minimize the distortion that is generated, when a burst of sound is decoded, just as a Blackman Window was supposed to do when encoding. Only, a Blackman Window cannot be used to decode, because coefficients being constant from one frame to the next would result in non-constant (output) sample amplitudes.

Dirk

(Edit 05/18/2016 : ) One related fact should be acknowledged. The DCT can be used to reconstruct a phase-correct sample-set, if non-zero even-numbered as well as odd-numbered coefficients are included. This follows directly from the fact that a ‘Type 3′ DCT is the inverse of the ‘Type 2′. But, the compression method used by several codecs is such, that a psychoacoustic model suppresses coefficients, on the assumption that they should be inaudible, because they are too close spectrally, to stronger ones. This would almost certainly go into effect, between complementary even-numbered and odd-numbered DCT coefficients.

( 05/31/2016 : ) One detail which was not made clear to me, was whether instead, coefficients which are in the same sub-band as one that has a stronger peak, are merely quantized more, due to the scale-factor of that sub-band being higher, to capture this higher peak. This would strike me as favorable, but also results in greater bit-rates, than what would follow, from setting supposedly-inaudible coefficients to zero. Due to Huffman Encoding, the bit-length of a more-quantized coefficient, is still longer than that for the value (zero).

In any type of Fourier Transform, signal energy at one frequency cannot be separated fully from energy measured at a frequency different by only half a cycle per frame. When the difference is at least by one cycle per frame, energy and therefore amplitude become isolated. This does not mean however, that the presence of a number of coefficients equal to the number of samples, is always redundant.

And so, one good way to achieve some phase-correctness might be, to try designing a codec, which does not rely too strongly on the customary psychoacoustic models. For example, a hypothetical codec might rely on Quantization, followed by Exponential-Golomb Coding of the coefficients, being sure to state the scale of quantization in the header information of each frame.

It is understood that such approaches will produce ‘poorer results’ at a given bit-rate. But then, simply choosing a higher bit-rate (than what might be appropriate for an MP3) could result in better sound.

And then, just not to make our hypothetical codec too primitive, one could subdivide the audible spectrum into 8 bands, each one octave higher than the previous, starting from coefficient (8), so that each of these bands can be quantized by a different scale, according to the Threshold Of Audibility. These Human Loudness Perception Curves may be a simple form of psychoacoustics, but are also thought to be reliable fact, as perceived loudness does not usually correspond consistently to uniform spectral distribution of energy.

Parts of the spectrum could be quantized less, for which ‘the lower threshold of hearing’ is lower with respect to a calculable loudness value, at which the Human ears are thought to be uniformly sensitive to all frequencies.

Assigning such a header-field 8 times for each frame would not be prohibitive.

1: ) ( 05/31/2016 ) An alternative approach, which the designers of MP3 could just as easily have used, would have been first to compute the DCT, including both even- and odd-numbered coefficients F(k), but then to derive only the even-numbered coefficients from that. The best way would have been, for even numbered, derived coefficient G(k) to be found as

r = F(k)

i = F(k+1) – F(k-1)

G(0) = F(0)

G(k) = sqrt( r^2 + i^2 )