In Canada and the USA, a relatively recent practice in FM radio has been to piggy-back a digital audio stream onto the carriers of some existing, analog radio stations. This is referred to as “HD Radio”. A receiver as good as the broadcasting standard should cost slightly more than $200. This additional content isn’t audible to people who have standard, analog receivers, but can be decoded by people who have capable receivers. I like to try evaluating how well certain ‘Codecs’ work, the word being a contraction of “Compressor-Decompressor” (or “Coder-Decoder”). Obviously, the digital audio has been compressed, so that it will take up a narrower range of radio-frequencies than it offers audio-frequencies. In certain cases, either a poor choice or an outdated choice of Codec can, in itself, leave the sound-quality injured.
There was an earlier blog posting, in which I described the European Standard for ‘DAB’ this way. That uses ‘MPEG-1, Layer 2’ compression (:1). The main difference between ‘DAB’ and ‘HD Radio’ is the fact that, with ‘DAB’ or ‘DAB+’, a separate band of VHF frequencies is being used, while ‘HD Radio’ uses existing radio stations and therefore the existing band of frequencies.
The Codec used in HD Radio is proprietary, and is owned by a company named ‘iBiquity’. Some providers may reject the format, over an unwillingness to enter a contractual relationship with one commercial undertaking. But what is written is that the Codec used here resembles AAC. One of the things which I will not do is to provide my opinion about a lossy audio Codec without ever having listened to it. Apple and iTunes have been working with AAC for many years, but I’ve owned neither an iPhone nor an OS X computer.
What I’ve done in recent days was to buy an HD Radio-capable receiver, and this provides me with my first hands-on experience with this family of Codecs. Obviously, when trying to assess the levels of quality for FM radio, I use my headphones and not the speakers in my echoic computer-room. But it can sometimes be more relaxing to play the radio over the speakers, despite the loss of quality that takes place whenever I do so. (:2)
What I find is that the quality of HD Radio is better than that of analog, FM radio, but still not as good as that of lossless, 44.1kHz audio (such as, with actual Audio CDs). Yet, because we know that this Codec is lossy, that last part is to be expected.
(Updated 8/01/2019, 19h00 … )
(As of 7/16/2019 : )
With any Codec similar to AAC, the content source is able to adjust the bit-rate, and therefore ostensibly, to achieve whatever level of sound quality they wish, only limited in the end, either by the resulting file-size, or by the maximum rate of ‘kbps’ that a broadcast, digital stream can carry. But an interesting detail with HD Radio is the fact, that some stations which offer it, actually offer 2 and not 1 HD channel. Because at that point, the stream is digital, this is a trivial accomplishment, since all that’s required is to multiplex 2 or more channels into 1, using mere logic. What I find to be an interesting observation, though, is that:
- Stations which use HD Radio to provide Classical Music, will often put that content as the second out of two logical channel-choices – obviously for strange people like me, who listen to Classical Music at least part of the time, and
- The quality of the resulting stream is still what it should be for Classical Music, on the proviso that the listener does not have the ultimate in musical hearing skills. (:3)
In Europe, the DAB+ standard is meant to provide such an improved quality as well. When DAB was adopted, the regulators needed to use Codecs that were available at the time, which initially gave DAB poorer sound than DAB+ presumably offers.
What most users refer to as ‘MP3’ is actually ‘MPEG-1, Layer 3’ compression. Only, this format is usually stored in .MP3 files. By comparison, ‘MPEG-1, Layer 2’ damages sound less than Layer 3 compression does. The trade-off is that Layer 2 also compresses the stream less. Thus, when the European regulatory bodies were adopting MPEG-1, Layer 2 for DAB, they did know what they were doing.
My ability to play the audio from my new, HD Radio receiver over my speakers, was contingent on first setting up a loop-back, on the ‘Pulseaudio’ server of my computer, that computer being named ‘Phosphene’. This is because the rear jacks of the receiver neither have speaker amplifiers, nor a volume-control. So, when listening to it over my speakers, I depend on my sound-card resampling the analog signal from the receiver to arrive at a 44.1kHz digital stream that only exists inside the computer, and that consumes a minimal amount of CPU time. This stream is multiplied by the volume-levels that I get to set in the GUI of my computer. And then, the resulting stream is converted back into analog form by my sound-card, and sent to my speakers.
It’s a good thing, then, that my sound-card allows for full-duplex operation.
The volume-control buttons on the receiver, only control the headphone-jack.
But what this also means is that many of the artifacts that are potentially wrong with my speaker playback could simply be due to the resampling by the sound card, not the receiver.
A problem which many PC sound cards have, is to put a ‘high-frequency’ noise (of an audible number of Kilohertz) into the analog signal, both coming in and going out, that originated as ripples in the PC’s supply voltage. Those imperfections don’t affect the operation of digital circuits, but the reverse does happen.
If a sound card had special filtering against power-supply noise, this would be a luxury feature which I never paid for.
Switching power-supplies tend to operate at some frequency in the Kilohertz range.
Even though the PC’s CPU is clocked at Gigahertz frequencies, this doesn’t prevent a simultaneity in the switching of digital circuits, which has a period of recurrence in the Kilohertz range, and which Computer Experts were not specifically trained to be aware of, until they are given a reason to be aware of it.
The way in which this family of Codecs selects what part of the spectral information to encode, is ‘more intelligent’ (in an AI sort of way), than the way MPEG-1, Layer 3 does so. And one result of that is, the fact that much lower bit-rates can be used for music, than with any other Codec I’m familiar with. In fact, the level of quality requires maybe half the bit-rate, that an MP3 would require, to achieve ‘the same thing’. But the real question which still lingers in my mind would be:
If the content provider set the bit-rate to 128kbps, for that one channel, given that he has the capacity to transmit at least two channels that way, is there still a disadvantage for Classical Music?
What I seem to hear is that, while listening to Classical Music, at no point in time do I hear the entire spectrum. I only get to hear the parts of the spectrum that are being played by the main instruments at any one time, as well as their harmonics, and the sibilants (high-frequency components associated with the instruments, but not identifiable as having much more than a short-term amplitude-definition). It might be that the higher frequency-coefficients are being quantized more than the mid-range frequency-coefficients.
And yet, my capacity to hear the full spectrum during an entire recording could be a part of the experience which I’m used to, giving me ‘that feel of complete sound’. However, if the stream has in fact been reduced to 128kbps, science dictates that it can’t contain the same amount of information as some lossless format would. The results are ‘Very Listenable, Classical Music.’
With some of the compression-formats that I’m familiar with, that are based on the Modified Discrete Cosine Transform, what gets done after the time-domain granules of sound have been converted into frequency-domain sets of coefficients, and those have been quantized, is that they are merely grouped into pairs, and then Huffman-Encoded in such a way that, if both coefficients belonging to one pair are zero, a single bit corresponding to zero is nevertheless encoded. What this naturally tends to do in the case of a 44.1kHz sample-rate, is to require that at least 22,050 bits per second are encoded, even if they are all zeroes, plus all the header information, plus additional bits for non-zero coefficients.
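The figure of 22,050 bits per second follows from simple arithmetic, which can be sketched as below. The function name is hypothetical, and the sketch assumes that a critically sampled transform produces exactly one coefficient per input sample, and that each all-zero pair costs exactly one bit.

```python
def all_zero_floor_bps(sample_rate_hz: int, coeffs_per_group: int = 2) -> float:
    """Bits per second spent when every group of coefficients is zero,
    assuming each all-zero group still costs exactly one bit."""
    # Critical sampling: one frequency coefficient per input sample.
    coefficients_per_second = sample_rate_hz
    return coefficients_per_second / coeffs_per_group


print(all_zero_floor_bps(44_100))  # 22050.0
```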
This can strictly be considered wasteful. Hypothetically, if the number of frequency sub-bands in the audible spectrum was said to equal 32, then each of those would potentially encode 18 frequency coefficients. What can be done for each sub-band would be to encode one additional bit, which can state that all the coefficients in the sub-band are zeroes, and if it does, all the individual bits for that sub-band can be omitted from the stream. If this approach can be combined with greater quantization for some parts of the spectrum, it can considerably reduce the number of integers that need to be Huffman-Encoded, and thus also the minimum bit-rate needed.
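As a rough illustration of how much such a flag could save, the following sketch counts only the bits that signal ‘nothing here’, for one granule of 576 quantized coefficients. The function and constant names are my own, the flag scheme is the hypothetical one described above and not a feature of any real Codec that I know of, and the cost of non-zero pairs is deliberately ignored.

```python
SUBBANDS = 32
COEFFS_PER_SUBBAND = 18   # 32 * 18 = 576 coefficients per granule


def zero_signalling_bits(granule, use_subband_flags):
    """Count only the bits spent on saying 'nothing here' for one granule.

    The cost of non-zero pairs is left out, because it depends on the
    Huffman table in use.
    """
    if len(granule) != SUBBANDS * COEFFS_PER_SUBBAND:
        raise ValueError("expected 576 quantized coefficients")
    bits = 0
    for b in range(SUBBANDS):
        band = granule[b * COEFFS_PER_SUBBAND:(b + 1) * COEFFS_PER_SUBBAND]
        if use_subband_flags:
            bits += 1                      # the hypothetical per-sub-band flag
            if not any(band):
                continue                   # whole sub-band omitted from the stream
        # One bit for every pair in which both coefficients are zero.
        bits += sum(1 for x, y in zip(band[0::2], band[1::2]) if x == 0 and y == 0)
    return bits


silent = [0] * 576
print(zero_signalling_bits(silent, use_subband_flags=False))  # 288
print(zero_signalling_bits(silent, use_subband_flags=True))   # 32
```

At 44,100 / 576 granules per second, 288 bits per granule is the 22,050 bits per second mentioned earlier, while 32 bits per granule would be 2,450 bits per second.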
And in that case, yes: if the encoding is also designed to act as a kind of ‘noise-gate’, a real reduction in the bit-rate will result. It can still be argued that, if some Quality-level is assigned a higher number, more bits of the stream can be made to encode the parts of the spectrum which the listener ‘is supposed to hear’, and which would then be quantized less.
(Update 8/01/2019, 19h00 : )
What some sources suggest is, that AAC encoding should be set to 256kbps for maximum quality. I have a problem with this estimate.
In the case of MP3, I fail to notice any improvement of the sound quality, if I increase the bit-rate above 192kbps, let’s say to 256kbps. Some streaming services will use that bit-rate with MP3, to try to maximize the sound quality. But I hear a basic limitation in the sound still, that was already present in MP3s which I encoded myself at 192kbps.
According to that, it may not make much sense to increase the bit-rate of the AAC-encoded audio above 128kbps. I have now done some experimental encoding of music to that format, actually to try out different bit-rates. I’ve also listened to some HD Radio, where two channels were being offered – therefore each at a halved bit-rate, and out of which the Classical-Music channel was listenable. Also, the Codec actually used by HD Radio isn’t exactly AAC. It has some extra features designed to reduce the overall bit-rate, such as Spectral Band Replication. Because of the way that works, half the frequency coefficients don’t need to be encoded at all, only header information about how those are to be derived from the lower half (actually the upper half, of the lower half) of the spectrum, once per granule of sound. I suppose that would specify a multiplication-factor, as well as an additive amount of white noise.
Given that reality, it might make sense to say, that 128kbps with the AAC-HE Codec, is roughly equivalent to 192kbps with plain AAC.
(Edit 7/19/2019, 6h10 : )
This last observation might sound like a condemnation, but it shouldn’t be. The way most people hear frequencies above 10kHz, takes the form of ‘a bright hissing sound’ anyway, without much in the way of further definition. And it’s like that with me as well. I can just barely spot that this hissing sound is less-defined than the hissing sound from other sources.
For example, I just listened to a string performance on my headphones, that included Violins playing high notes, as well as either Cellos or Basses. The high-pitched component of the violins came through well, while there is also supposed to be some high-pitched sound coming from a cello and a bass, which in this case was not coming through well.
Is it possible that the variant of AAC-HE which HD Radio uses just refuses to add any white noise, when the harmonics of defined frequencies fail to produce high-pitched sound? Or, is it possible that a radio station that specializes in Classical Music just has the option to turn this feature down?
Hypothetically, the associated encoder could have as a parameter, a threshold which an expected, added amount of white noise needs to exceed, before an amount of white noise is transmitted as part of the granule of sound. And for Classical Music, that threshold could simply be set very high, i.e., at (100 - 20) dB, so that only such events as ‘cymbals crashing’ would set this off.
(End of Edit, 7/18/2019, 18h55.)
Because it might be useless to try to compute a correlation in the time-domain, between two signals, where one signal has twice the frequencies of the other, in order to compute this added data, the proper encoding of AAC-HE will still require that all the coefficients be computed with the DCT. But what can then be done is, that starting from the first coefficient after 1/4 the spectrum, the index of the coefficients can be doubled, (1) subtracted, and a correlation can be computed, in a way that preserves signs, between half the available coefficients in the upper half of the spectrum (that represent the second harmonics), and their corresponding coefficients belonging to the upper half, of the lower half, of the spectrum (that represent the fundamental frequencies). As with any good correlation, a Y-intercept can then also be computed.
I’m assuming that the index of the coefficients starts with (1), not (0). (:4)
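The index arithmetic, and one possible reading of the ‘correlation’, can be sketched as follows. I’m assuming 576 coefficients per granule and a 1-based index, and I’m taking the correlation to mean a least-squares slope through the origin, which preserves the sign. The function names are hypothetical, and this is not taken from any actual AAC-HE encoder.

```python
def second_harmonic_pairs(n_coeffs=576):
    """(fundamental, harmonic) index pairs, 1-based, harmonic = 2k - 1.

    Fundamentals run over the upper half of the lower half of the spectrum.
    """
    return [(k, 2 * k - 1) for k in range(n_coeffs // 4 + 1, n_coeffs // 2 + 1)]


def signed_slope(coeffs, pairs):
    """Least-squares slope through the origin; coeffs[0] is unused padding
    so that coeffs[k] is the coefficient with 1-based index k."""
    numerator = sum(coeffs[k] * coeffs[h] for k, h in pairs)
    denominator = sum(coeffs[k] ** 2 for k, _ in pairs)
    return numerator / denominator if denominator else 0.0


pairs = second_harmonic_pairs()
print(len(pairs), pairs[0], pairs[-1])  # 144 (145, 289) (288, 575)
```

The 144 pairs cover exactly half of the 288 coefficients in the upper half of the spectrum, namely the ones with odd indexes.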
This is a special case, in which the absolute of the causal values ( = fundamental frequencies ) is to be multiplied by (1/2) the absolute, of the correctly computed correlation, and the result subtracted from the absolute, of the derived values ( = second harmonics ) , in order to compute how much white noise should later be added, when decoding.
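Read literally, that subtraction could look like the sketch below. The averaging over each set of coefficients and the clamping at zero are my own assumptions, as is the function name.

```python
def sibilant_amplitude(fundamentals, harmonics, correlation):
    """Estimate how much white noise to add back when decoding.

    fundamentals, harmonics: equally long lists of paired coefficients.
    correlation: the signed value computed for this harmonic.
    """
    if not fundamentals or len(fundamentals) != len(harmonics):
        raise ValueError("need two non-empty lists of equal length")
    mean_abs_f = sum(abs(f) for f in fundamentals) / len(fundamentals)
    mean_abs_h = sum(abs(h) for h in harmonics) / len(harmonics)
    explained = 0.5 * abs(correlation) * mean_abs_f
    # Assumption: a negative remainder means 'no noise', not negative noise.
    return max(mean_abs_h - explained, 0.0)


print(sibilant_amplitude([2.0, -2.0], [3.0, -3.0], correlation=1.0))  # 2.0
```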
Before the Y-intercept is computed, as a refinement, a separate correlation could compute, to what extent the upper half of the spectrum follows as a third harmonic, of the upper half, of the lower third of the coefficients, the index of which is tripled and (1) subtracted. In that case, because there is an overlap in the indexes between multiples of 2 and multiples of 3, a more interesting way exists to compute the Y-intercept – i.e., the amount of sibilant that will need to be added. The correctly computed (signed) product of each fundamental coefficient can be subtracted from the harmonic coefficient individually, and the average of the remaining absolutes computed… The result of that would simulate more closely, what will happen in the decoder. And the number of higher coefficients left-out, either as doubled or tripled indexes, would correspond to 1/3 of values never derived.
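The claim about 1/3 of the values never being derived can be checked by counting indexes, again assuming 576 coefficients per granule and a 1-based index. The function name is hypothetical.

```python
def derived_indexes(n_coeffs=576):
    """Upper-half indexes reachable as 2k - 1 or as 3k - 1 (1-based)."""
    second = {2 * k - 1 for k in range(n_coeffs // 4 + 1, n_coeffs // 2 + 1)}
    third = {3 * k - 1 for k in range(n_coeffs // 6 + 1, n_coeffs // 3 + 1)}
    return second, third


second, third = derived_indexes()
upper = set(range(576 // 2 + 1, 576 + 1))   # indexes 289 .. 576
never_derived = upper - second - third

print(len(second), len(third), len(second & third))  # 144 96 48
print(len(never_derived), len(upper))                # 96 288
```

So 96 out of 288 upper coefficients, which is 1/3, are reached by neither the doubled nor the tripled indexes.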
(Update 7/17/2019, 18h45 : )
The described exercise can be continued, to compute correlations for higher harmonics, that are prime numbers. Half the prime number would need to be subtracted from the index, but (0.5) added. However, it’s doubtful that doing so will improve the sound quality much, unless another type of problem is solved first: Even though, when critically sampled, the sound format accepts frequencies up to 22.05kHz, it can easily happen that faults in the media source act as a low-pass filter, with a cutoff-frequency lower than that. As I left it, this by itself would lead to an underestimation of the computed correlations, which in turn would lead to an excessively high sibilant-amplitude.
One way to solve that problem would be, to bisect the upper half of the audible spectrum, so that two sets of parameters are computed separately, one set for the interval from (1/2) to (3/4) times Nyquist Frequency, and the other set for the interval from (3/4) to (1) times Nyquist Frequency. And then, because each set of parameters needs to restate both the correlation for all harmonics accounted for, as well as to restate the amplitude of the sibilant, a point in time would eventually come, when doing all this takes up as many bits per granule of sound, as it would have taken just to encode the coefficients.
(Update 7/17/2019, 19h50 : )
I would think that a good compromise, which still yields considerable data reduction, would be, to compute the correlations that correspond to the second and third harmonics only, but to base this computation on the interval from (1/2) to (3/4) times Nyquist Frequency when encoding, and then to pretend that the application of this data is also valid for the interval from (3/4) to (1) times Nyquist Frequency. As a consequence, I’d compute two amplitudes for the sibilant, that form the Y-intercepts, separately for these two intervals of the coefficients. This would follow as the subtraction each time, of (1/2) the absolute of the correlation times the fundamental coefficients for the second harmonic, then, of (1/3) the absolute of the correlation times the fundamental coefficients for the third harmonic, from the average of the absolutes of the coefficients of the higher sub-band.
I suppose that the third harmonic would then be based on only 48 pairs of coefficients.
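That count of 48 can be verified with a short sketch, under the same assumptions as before: 576 coefficients per granule, a 1-based index, and the harmonic index taken as the multiple of the fundamental index, minus (1). The interval from (1/2) to (3/4) times Nyquist Frequency then corresponds to indexes 289 through 432. The function name is hypothetical.

```python
def pairs_in_band(multiple, lowest, highest):
    """(fundamental, harmonic) pairs whose harmonic index, multiple * k - 1,
    falls inside the band [lowest, highest], all indexes being 1-based."""
    return [(k, multiple * k - 1)
            for k in range(1, highest + 1)
            if lowest <= multiple * k - 1 <= highest]


second = pairs_in_band(2, 289, 432)
third = pairs_in_band(3, 289, 432)
print(len(second), len(third))  # 72 48
print(third[0], third[-1])      # (97, 290) (144, 431)
```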
(Updated 7/31/2019, 22h05 : )
Because of the way the MDCT works, the exact determination of the 2nd harmonic is highly non-trivial, and the method I suggested above detunes it by approximately -20Hz, assuming a sample-rate of 44.1kHz, and 576-sample ‘granules’ of sound.
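The size of that detuning can be worked out as follows. The sketch assumes that the MDCT coefficient with 1-based index k is centred on (k - 0.5) times the bin width, which is the usual placement of MDCT bins, and the function name is hypothetical.

```python
def detune_hz(k, sample_rate_hz=44_100, granule=576, harmonic=2, offset=1):
    """Difference between the centre of the coefficient actually used
    (index harmonic * k - offset) and the true harmonic of coefficient k."""
    bin_width = sample_rate_hz / 2 / granule      # about 38.28 Hz

    def centre(index):
        # 1-based index: the first bin is centred on half a bin width.
        return (index - 0.5) * bin_width

    target = harmonic * centre(k)
    used = centre(harmonic * k - offset)
    return used - target


print(detune_hz(145))  # -19.140625
```

The result is half a bin width, regardless of which fundamental index is chosen.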
If a more accurate way is sought, that may be achievable, as a subtraction between two adjacent coefficient-values, i.e., by the subtraction of the value at the fundamental’s coefficient-index times two, from that at the same index minus one, when encoding. This value will naturally tend to be twice as high, as the average of the resulting absolutes, and must be halved before decoding. Also, this affects how much to subtract when computing the amplitude of the sibilant…
The possibility that the chosen polarity describes a cosine-wave which is negative with respect to the cosine-wave at the fundamental frequencies matters not, because it would be decoded with the same polarity with which it was encoded. What would matter is if the decoding generated pulsed output, because of the way the lapped transform works. And one way to prevent that in turn, would be to apply the multiplier to a series of 4 or more coefficients when decoding, such that their resulting values alternate between positive and negative, and with diminishing amplitude as their index goes further away from the notional centre-frequency. I.e., one fundamental-frequency coefficient should multiply out, first by the encoded multiplier for the second harmonic, and then by:
Index    Multiplier
 -2        -0.5
 -1        +1.0
  0        -1.0
 +1        +0.5
Please note that this table assumes that the correlation that was initially found, has already been multiplied by (0.5) when encoding, and multiplied by (0.75) when determining the amplitude of the sibilant. If this second harmonic was to be multiplied out over 6 indexes instead of 4 when decoding, then it would also need to be multiplied by (0.875) when determining the amplitude of the sibilant, when encoding.
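How a decoder might apply those four multipliers can be sketched like this. I’m assuming that the table’s index is an offset relative to twice the fundamental’s 1-based index, which is consistent with the subtraction between adjacent coefficients described above. The names are hypothetical.

```python
# Offsets relative to index 2k, and the multiplier applied at each offset.
SPREAD = {-2: -0.5, -1: +1.0, 0: -1.0, +1: +0.5}


def replicate_second_harmonic(fundamentals, multiplier, n_coeffs=576):
    """Spread each fundamental over four upper coefficients.

    fundamentals: dict mapping a 1-based index k to its coefficient value.
    multiplier:   the encoded multiplier for the second harmonic.
    Returns a dict mapping 1-based output indexes to derived values.
    """
    derived = {}
    for k, value in fundamentals.items():
        for offset, weight in SPREAD.items():
            index = 2 * k + offset
            if 1 <= index <= n_coeffs:
                derived[index] = derived.get(index, 0.0) + value * multiplier * weight
    return derived


print(replicate_second_harmonic({150: 2.0}, multiplier=0.5))
# {298: -0.5, 299: 1.0, 300: -1.0, 301: 0.5}
```

Contributions from neighbouring fundamentals overlap and are simply summed in this sketch.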
The methods described above will only detect third harmonics, when their peaks are in-phase with the peaks of the fundamental, and second harmonics, when their sine-waves coincide with the cosine-waves of the fundamental.
A way can also exist to detect harmonic components which are 90° out-of-phase. The way to do that would be to compute a second, single correlation for each harmonic in question, that uses the fundamental coefficients from the previous granule of sound. Then, a Pythagorean Sum can be computed between the two correlations, and given whatever sign there was, using the present granule. However, if this is done, then the way the harmonics are played back will always put them in-phase again with the fundamentals.
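The Pythagorean Sum with the sign carried over from the present granule is a one-liner in practice. The function name below is hypothetical.

```python
import math


def combined_correlation(corr_present, corr_previous):
    """Pythagorean sum of the two correlations, given the sign of the one
    that was computed against the present granule."""
    magnitude = math.hypot(corr_present, corr_previous)
    return math.copysign(magnitude, corr_present)


print(combined_correlation(-3.0, 4.0))  # -5.0
```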
Alternatively, the same genius of the MDCT could be applied again, and the correlations be encoded separately, between the harmonic coefficients of the current granule, and the fundamental coefficients of the current and of the previous granule, each time with the sign preserved. Only, when it comes to Humans being able to hear harmonics above 10kHz, I don’t know whether this ability also recognizes phase-position. If it does not, then this extension of the exercise may only waste bits.
When decoding, one might just as easily assume that the second harmonic is always positive, and that the third is always negative, to correspond to a clamped (co)sine-wave. That way, the per-granule data on how to replicate the upper sub-band might not require sign bits at all.