Some Trivia about Granules of Sound

One of the subjects I’ve blogged about often is the compression of sound, including Codecs which are based in the frequency domain rather than in the time domain. What I’ve basically written is that in such cases, the time-domain samples of sound generate granules of frequency-domain coefficients, which are then in turn quantized. What tends to happen is that a new granule of sound is encoded every 576 time-domain samples, but that each time, an 1152-sample sampling window is used. Due to the way the “Modified Discrete Cosine Transform” (the ‘MDCT’) is applied, what amounts to all the odd coefficients of the Type 2 ‘DCT’ are encoded, resulting in 576 coefficients being encoded each time. The present sampling window’s cosine function corresponds to the previous and next sampling windows’ sine functions, so that, in a way that is orthogonal, these overlapping sampling windows also have the potential to preserve phase information.
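To make that arithmetic concrete, here is a minimal sketch in Python / NumPy of the textbook MDCT definition, which maps a windowed block of 2N time-domain samples onto N frequency coefficients. The framing into overlapping granules, MP3’s exact window tables and any FFT-based speed-ups are left out; the function name, the sine window and the random test block are purely illustrative assumptions.

```python
import numpy as np

def mdct(frame, window):
    """Minimal MDCT sketch: 2N windowed time-domain samples -> N coefficients.

    This is the textbook MDCT definition, not an optimized (FFT-based)
    implementation; the 50%-overlapping framing that MP3-style codecs use
    is assumed to happen outside this function.
    """
    two_n = len(frame)
    half = two_n // 2
    x = frame * window
    ns = np.arange(two_n)
    ks = np.arange(half)
    # X[k] = sum_n x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))
    basis = np.cos(np.pi / half * np.outer(ns + 0.5 + half / 2, ks + 0.5))
    return x @ basis

# Illustrative numbers from the posting: an 1152-sample window
# produces 576 frequency coefficients per granule.
N = 576
window = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))  # a common sine window
frame = np.random.randn(2 * N)                               # stand-in for real audio
print(mdct(frame, window).shape)                             # (576,)
```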

One observation my readers may have about this is that, while this granule size does a good job of maintaining spectral resolution, it does not provide good temporal resolution. Therefore, a mechanism which MP3 compression already introduced was ‘transient detection’. This feature can replace one of these full-length granules with 3 shorter granules, each of which only generates 192 frequency coefficients, and which recur three times as frequently.

The method by which transients are detected can be simple. For example, the stream could tentatively be subdivided into the shorter granules all the time, and if any one of them contains more than the average variance – which corresponds to signal energy – say, if one shorter granule contains 1.5 times the average signal energy of the current 3, then the switch to short granules can take place.
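As a rough illustration, here is a hedged sketch of that heuristic, assuming it amounts to comparing the energy of three tentative sub-blocks; the function name, the number of pieces and the 1.5 threshold come from the example above, not from any particular encoder.

```python
import numpy as np

def should_switch_to_short_blocks(samples, pieces=3, ratio=1.5):
    """Hedged sketch of the energy-based transient heuristic described above.

    'samples' is the block of time-domain audio that would otherwise become
    one long granule.  It is tentatively cut into 'pieces' short sub-blocks;
    if any sub-block carries more than 'ratio' times the average energy of
    the set, a transient is assumed and short granules are used instead.
    """
    sub_blocks = np.array_split(np.asarray(samples, dtype=float), pieces)
    energies = np.array([np.mean(b ** 2) for b in sub_blocks])  # variance ~ signal energy
    return energies.max() > ratio * energies.mean()
```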

What I do know is that when granules of sound – or rather, the quantized spectral information from granules of sound – are included in the stream, they include two extra bits each time, which define the “Zone” of the present granule. This can be one of four zones:

  • A full-sized granule belonging to a stream of them,
  • A shortened granule, belonging to a stream of them,
  • A shortened granule, that precedes a full-sized granule,
  • A shortened granule, that follows a full-sized granule.
Because it’s inherent in MP3 compression that the entire current sampling window must overlap, partially with the preceding one and partially with the following one, there may be no special rule for how to shape a sampling window that corresponds to a long granule which is both preceded and followed by shortened ones. However, when that happens, the preceding and following shortened granules will be encoded as being followed and preceded, respectively, by a long granule, for which reason those granules will already have long overlap portions. Therefore, the current granule in such a case can be encoded as though it were just part of a sequence of long granules.

This information is ultimately non-trivial because it also affects the computation of the sampling windows, i.e., the exact windowing function to be used when encoding. If the granule is followed or preceded by short granules, then the corresponding side of the windowing function must also be shortened. (:1)
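The sketch below only illustrates that bookkeeping, following the list and the remarks above; the boolean interface and the labels are my own, and real MP3 encoders express the same decision through a block-type field and fixed window tables.

```python
def window_shape(prev_is_short, this_is_short, next_is_short):
    """Return a label for each half of the current granule's windowing
    function ('long' or 'short' overlap), based on the neighbouring
    granules, as discussed above."""
    if this_is_short:
        # A shortened granule always uses the short window shape.
        return ("short", "short")
    if prev_is_short and next_is_short:
        # The special case discussed above: a long granule sandwiched
        # between shortened ones can be windowed as though it sat in a run
        # of long granules, because those neighbours are encoded as
        # adjoining a long granule and so already have long overlap portions.
        return ("long", "long")
    return ("short" if prev_is_short else "long",
            "short" if next_is_short else "long")
```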

Now, in the case of other Codecs, such as ‘OGG Vorbis’, a similar approach is taken. But I can well imagine that if specific ideas were simply implemented exactly as they were with MP3 sound, then eventually, the owners of the MP3 Codec might cry foul over software-patent violations. And yet, this problem can easily be sidestepped, let’s say by deciding that the shortened granules be made 1/2 the length of the full-sized granule, instead of 1/3 that length. At that point, the implementation would be sufficiently different from the original idea that it would no longer constitute a patent violation.


There exists HD Radio.

In Canada and the USA, a relatively recent practice in FM radio has been to piggy-back a digital audio stream onto the carriers of some existing, analog radio stations. This is referred to as “HD Radio”. A receiver that does justice to the broadcasting standard should cost slightly more than $200. This additional content isn’t audible to people who have standard, analog receivers, but it can be decoded by people who have capable receivers. I like to try to evaluate how well certain ‘Codecs’ work – the word is short for “Compressor-Decompressor”. Obviously, the digital audio has been compressed, so that it will take up a narrower range of radio frequencies than the range of audio frequencies it offers. In certain cases, either a poor choice or an outdated choice of Codec can, in itself, leave the sound quality injured.

There was an earlier blog posting, in which I described the European standard, ‘DAB’, in this way. That uses ‘MPEG-1, Layer 2’ compression (:1). The main difference between ‘DAB’ and ‘HD Radio’ is the fact that, with ‘DAB’ or ‘DAB+’, a separate band of VHF frequencies is used, while ‘HD Radio’ uses existing radio stations and therefore the existing band of frequencies.

The Codec used in HD Radio is proprietary, and is owned by a company named ‘iBiquity’. Some providers may reject the format over an unwillingness to enter a contractual relationship with one commercial undertaking. But what is written is that the Codec used here resembles AAC. One of the things which I will not do is provide my opinion about a lossy audio Codec without ever having listened to it. Apple and iTunes have been working with AAC for many years, but I’ve neither owned an iPhone nor an OS/X computer.

What I’ve done in recent days was to buy an HD Radio-capable receiver, and this provides me with my first hands-on experience with this family of Codecs. Obviously, when trying to assess the quality of FM radio, I use my headphones and not the speakers in my echoic computer room. But it can sometimes be more relaxing to play the radio over the speakers, despite the loss of quality that takes place whenever I do so. (:2)

What I find is that the quality of HD Radio is better than that of analog FM radio, but still not as good as that of lossless, 44.1kHz audio (such as actual Audio CDs). Yet, because we know that this Codec is lossy, that last part is to be expected.

(Updated 8/01/2019, 19h00 … )


A Basic Limitation in Stereo FM Reproduction

One of the concepts which exists in modern, high-definition sound is that Human sound perception takes place between 20Hz and 20kHz, even though those endpoints are somewhat arbitrary. Some people cannot hear frequencies as high as 20kHz, especially older people, or anybody who just does not have good hearing. Healthy, young children and teenagers can typically hear that entire frequency range.

But way back when FM radio was invented, sound engineers had flawed data about what frequencies Humans can hear. The data they were given to work with stated that Humans can only hear frequencies from 30Hz to 15kHz. And so, even though their communications authorities had the ability to assign frequencies somewhat arbitrarily, they did so in a way that was based on such data. (:1)

For that reason, the playback of FM Stereo today, using household receivers, is still limited to an audio frequency range from 30Hz to 15kHz. Even very expensive receivers will not be able to reproduce sound that was once part of the modulated input but lies outside this frequency range, although other reference points can be applied to try to gauge how good the sound quality is.

There is one artifact of this initial standard which was sometimes apparent in early receivers. Stereo FM has a pilot signal at 19kHz, to which a receiver needs to lock an internal oscillator, in such a way that the internal oscillator runs at 38kHz, so that it can be used to demodulate the stereo part of the sound. Because the pilot signal which is actually part of the broadcast signal is ‘only’ at 19kHz, this gives an additional reason to cut off the audible signal at ‘only’ 15kHz; the pilot is not meant to be heard. But way back in the 1970s and earlier, Electrical Engineers did not have the type of low-pass filters available to them which they do now, also known as ‘brick-wall filters’ – filters that attenuate frequencies above the cutoff frequency very suddenly. Instead, equipment designed to be manufactured in the 1970s and earlier would only use low-pass filters with gradual ‘roll-off’ curves, which attenuate frequencies progressively more, the further they lie above the cutoff frequency, but in a way that is gentle. And in fact, even today the result seems to be that a gentler roll-off of the higher frequencies results in better sound, when the quality is measured in ways other than just the frequency range – such as, when sound quality is measured by how good the temporal resolution of very short pulses of high-frequency sound is.
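For readers who want to see the arithmetic behind the pilot, here is a hedged sketch in Python / SciPy, assuming the composite (MPX) signal has already been digitized at a rate well above 53kHz. A real receiver would use a PLL and de-emphasis; here the pilot is merely band-passed and frequency-doubled by squaring, just to show how the 19kHz tone yields a 38kHz carrier that demodulates the stereo difference signal.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def decode_fm_stereo(mpx, fs):
    """Illustrative FM-stereo demodulation of a sampled composite signal."""

    def bandpass(sig, lo, hi):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, sig)

    def lowpass(sig, cutoff):
        b, a = butter(4, cutoff / (fs / 2), btype="low")
        return filtfilt(b, a, sig)

    pilot = bandpass(mpx, 18e3, 20e3)        # recover the 19kHz pilot
    pilot /= np.max(np.abs(pilot)) + 1e-12   # normalize its amplitude
    carrier38 = 2.0 * pilot ** 2 - 1.0       # cos(2x) = 2*cos(x)^2 - 1

    # A gentle low-pass for the mono (L+R) part; a little pilot residue
    # survives it, much as described above.
    l_plus_r = lowpass(mpx, 15e3)
    # The (L-R) part rides on a suppressed 38kHz carrier; mixing with the
    # regenerated carrier shifts it back down to baseband.
    l_minus_r = 2.0 * lowpass(mpx * carrier38, 15e3)

    left = 0.5 * (l_plus_r + l_minus_r)
    right = 0.5 * (l_plus_r - l_minus_r)
    return left, right
```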

Generally, very sharp spectral resolution results in worse temporal resolution, and this is a negative side effect of some examples of modern sound technology.

But then sometimes, when listeners with high-end receivers in the 1970s and before, who had very good hearing, were tuned in to an FM Stereo signal, they could actually hear some residual amount of the 19kHz pilot signal, which was never a part of the original broadcast audio. That was sometimes still audible, just because the low-pass filter that defined 15kHz as the upper cut-off frequency was admitting the 19kHz component to a partial degree.

One technical accomplishment that has been possible in consumer electronics since the 1970s, however, was an analog ‘notch filter’, which suppresses one exact frequency – or almost so – and such a notch filter could be calibrated to suppress 19kHz specifically.
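In the digital domain, the equivalent is nearly a one-liner. The sketch below is just that, assuming audio digitized at 48kHz and an arbitrary Q of 30, and using SciPy’s standard notch-filter design rather than anything from an actual tuner.

```python
from scipy.signal import iirnotch, lfilter

fs = 48000.0        # assumed sampling rate of the digitized audio
f_pilot = 19000.0   # the FM-stereo pilot frequency
Q = 30.0            # quality factor: higher means a narrower notch

b, a = iirnotch(f_pilot, Q, fs=fs)
# audio_out = lfilter(b, a, audio_in)   # 'audio_in' would carry pilot residue
```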

Modern electronics makes possible such things as analog low-pass filters with a more sudden frequency cut-off, digital filters, etc. So it’s improbable today that even listeners whose hearing is good enough would still be receiving this 19kHz sound component at their headphones. In fact, the sound today is likely to seem ‘washed out’, simply because of too many transistors being fit onto one chip. And when I bought an AM/FM radio in recent days, I did not even try the included ear-buds at first, because I have better headphones. When I did try the included ear-buds, their sound quality was worse than when using my own, valued headphones. I’d say the included ear-buds did not seem to reproduce frequencies above 10kHz at all. My noise-cancelling headphones clearly continue to do so.

One claim which should be approached with extreme skepticism would be that the sound a listener seemed to be getting from an FM Tuner was as good as the sound he was also obtaining from his Vinyl Turntable. AFAIK, the only way this would be possible would be if he was using an extremely poor turntable to begin with.

What has happened, however, is that audibility curves have been accepted – since the 1980s – that state the upper limit of Human hearing as 20kHz, and all manner of audio equipment designed since then takes this into consideration. This includes Audio CD players, some forms of compressed sound, etc. What some people will claim, in a way that strikes me as credible however, is that the frequency response of the HQ turntables was as good as that of Audio CDs. And the main reason I’ll believe that is the fact that Quadraphonic LPs were sold at some point, which had a sub-carrier for each stereo channel that differentiated that channel front-to-back. This sub-carrier was actually phase-modulated. But in order for Quadraphonic LPs to have worked at all, their actual frequency response needed to go as high as 40kHz. And phase-modulation was chosen because this form of modulation is particularly immune to the various types of distortion which an LP would insert, when playing back frequencies as high as 40kHz.

About Digital FM:

(Updated 7/3/2019, 22h15 … )


Threshold Elimination in Compressed Sound

I’ve written quite a few postings in this blog about sound compression based on the Discrete Cosine Transform. And mixed in with my thoughts about that – where I was still, basically, trying to figure the subject out – were my statements to the effect that frequency coefficients which are below a certain threshold of perceptibility could be set to zeroes, thus reducing the total number of bits taken up once Huffman-encoded.

My biggest problem in trying to analyze this is the fact that I’m considering generalities, when in fact specific compression methods based on the DCT may or may not apply threshold elimination at all. As an alternative, the compression technique could just rely on the quantization to reduce how many bits per second it’s going to allocate to each sub-band of frequencies. (:1) If the quantization step / scale-factor is high enough – suggesting the lowest quality level – then many coefficients could still end up set to zeroes, just because they were smaller than the quantization step, as first computed from the DCT.

My impression is that the procedure used to compute the quantization step remains straightforward (a code sketch follows the list):

  • Subdivide the frequencies into an arbitrary set of sub-bands – fewer than 32.
  • For each sub-band, first compute the DCTs to scale.
  • Take the absolute value of the highest coefficient that results.
  • Divide that by the quality-level (+ 0.5), to arrive at the quantization step to be used for that sub-band.
  • Divide all the actual DCT coefficients by that quantization step, so that the maximum (signed) integer value that results will be equal to the quality-level.
  • How many coefficients end up being encoded with such a high integer value remains beyond our control.
  • Encode the quantization step / scale-factor with the sub-band, as part of the header information for each granule of sound.
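Here is a minimal sketch of the list above, for one sub-band of one granule. The function name and the ‘quality_level’ argument are my own shorthand, and the truncating round is my reading of how the maximum integer comes out equal to the quality-level; nothing here is taken from any codec’s reference code.

```python
import numpy as np

def quantize_subband(dct_coeffs, quality_level):
    """Hedged sketch of the per-sub-band procedure listed above.

    Returns the quantization step (the value to store as the sub-band's
    scale-factor) together with the integer coefficients.  Because the peak
    coefficient divided by the step equals quality_level + 0.5, truncating
    toward zero makes the largest integer magnitude equal to quality_level.
    """
    coeffs = np.asarray(dct_coeffs, dtype=float)
    peak = np.max(np.abs(coeffs))
    if peak == 0.0:
        return 0.0, np.zeros(len(coeffs), dtype=int)
    step = peak / (quality_level + 0.5)
    quantized = np.trunc(coeffs / step).astype(int)
    return step, quantized

# Any coefficient whose magnitude is smaller than one step truncates to
# zero, which is how low-energy coefficients can drop out even without an
# explicit threshold-elimination pass.
```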

The sub-band which I speak of has nothing to do with the fact that, additionally, in MP3 compression, the signal is first passed through a quadrature filter-bank, resulting in 32 sub-bands that are evenly spaced in frequency by nature, and that the DCT is then computed for each of those sub-bands. That hybrid filter-bank is part of the layered MPEG-1 design itself, and a decoder needs to apply the matching synthesis filter-bank when reconstructing the sound.
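As an illustration of what splitting a signal into 32 evenly spaced sub-bands can look like, here is a generic cosine-modulated analysis filter bank in the same sketch style. The prototype filter, its length and the modulation phases are textbook-style assumptions, not the coefficient tables from the MP3 specification, and no attempt is made at perfect reconstruction.

```python
import numpy as np

def cosine_modulated_analysis(x, num_bands=32, taps_per_band=16):
    """Split 'x' into 'num_bands' evenly spaced, critically decimated
    sub-band signals, using a generic cosine-modulated filter bank."""
    m = num_bands
    length = m * taps_per_band                    # prototype filter length
    n = np.arange(length)
    centre = (length - 1) / 2.0
    # A simple windowed-sinc low-pass prototype with cutoff pi/(2*m):
    proto = np.sinc((n - centre) / (2 * m)) * np.hanning(length)
    # Modulate the prototype up to the centre of each band:
    k = np.arange(m)[:, None]
    h = proto[None, :] * np.cos(np.pi / m * (k + 0.5) * (n[None, :] - centre))
    # Filter and critically decimate: one output sample per band per m inputs.
    return np.array([np.convolve(x, h[i])[::m] for i in range(m)])
```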

(Updated 03/10/2018 : )
