Different types of music for testing Audio Codecs – Or Maybe Not.

One of my recent activities has been to take Audio CDs from the 1980s and 1990s, the encoding of which was limited only by the 44.1kHz sample rate and the bit-depth, as well as by whatever type of Sinc Filter was once used to master them, but not by any sort of lossy compression, and to “rip” those into different types of lossy compression, in order to evaluate the latter. The two types of compression I recently played with were ‘AAC’ (plain) and ‘OGG Opus’, at 128kbps both times.
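
For concreteness, the comparison amounts to something like the two commands sketched below, expressed in Python. This is a minimal sketch, assuming both encoders are installed, with hypothetical file names; it is not copied from my actual scripts:

```python
import subprocess

# AAC (plain) at 128kbps, using ffmpeg's native AAC encoder:
subprocess.run(
    ["ffmpeg", "-i", "track.wav", "-c:a", "aac", "-b:a", "128k", "track.m4a"],
    check=True,
)

# OGG Opus at 128kbps, using 'opusenc' from the opus-tools package:
subprocess.run(
    ["opusenc", "--bitrate", "128", "track.wav", "track.opus"],
    check=True,
)
```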

One of the apparent facts which I learned was that Phil Collins’ music is not the best type to try this with. The reason is the fact that much of his music was recorded using electronic instruments of his era, the main function of which was to emulate standard acoustical instruments, but in a way that was ‘acoustically pure’. The fact that Phil Collins started his career as a drummer did not prevent him from releasing later, solo albums.

If somebody is listening to an entire string section of an orchestra, or to a brass section, then one factor which contributes to the nature of the sound is that every Violin is minutely off-pitch, as would be every French Horn. But what that also means is that the sound which results is “Thick”. Its spectral energy is spread in subtle ways, whereas if somebody mixes two sine-waves that have exactly the same frequency, he obtains another sine-wave. If one mixes 10 sine-waves that have exactly the same frequency, one still obtains a single sine-wave.
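
This claim is easy to verify numerically. The following is a toy Python sketch, in which the tone frequency and the amounts of detuning are invented for illustration:

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs            # one second of audio; FFT bins are 1Hz wide
rng = np.random.default_rng(1)

# 10 sine-waves at exactly 440Hz, with random phases: the sum is still a
# single sine-wave at 440Hz, merely with some other amplitude and phase.
exact = sum(np.sin(2 * np.pi * 440 * t + rng.uniform(0, 2 * np.pi))
            for _ in range(10))

# 10 sine-waves, each minutely off-pitch, like the violins of a section:
thick = sum(np.sin(2 * np.pi * (440 + rng.uniform(-3, 3)) * t)
            for _ in range(10))

# The first spectrum is a single line at bin 440, while the second is
# smeared over several bins around it: the “Thick” sound.
print(np.argmax(np.abs(np.fft.rfft(exact))))          # prints 440
print(np.count_nonzero(np.abs(np.fft.rfft(thick)) > 1000.0))
```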

Having ‘Thick’ sound to start from is a good basis for testing Codecs. Phil Collins does not provide that. Therefore, if the acoustical nature of the recording is boring, I have no way to know whether it was the Codec that failed to bring out greater depth, or whether that was the fault of Phil Collins.

(Update 8/03/2019, 12h55 : )

Since the last time I edited this posting, I have learned that Debian / Stretch, the Linux version I have installed on the computer I name ‘Phosphene’, only ships with ‘libopus v1.2~alpha2-1’ in its package repositories. Apparently, when using this version, the best one can hope for is the equivalent of 128kbps MP3 quality. This was the true reason I was obtaining inferior results, along with the fact that I had given the command to encode my Opus Files using a version of ‘ffmpeg’ that just happened to include marginal support for ‘Opus’, instead of using the actual ‘opus-tools’ package.

What I have now done is to download and custom-compile ‘libopus v1.3.1’, as well as its associated tools, and to make sure that the programs work. Rumour has it that when this version is used at a bit-rate of 96kbps, virtual transparency results.

And I’ve written quite a long synopsis as to why this might be so.

(Update 8/03/2019, 15h50 : )

I have now run an altered experiment, encoding my Opus Files at 96kbps, and discovered to my amazement that the sound I obtained seemed better than what I had already obtained above, using 128kbps AAC-encoded Files.


(Update 8/04/2019, 10h10 : )

When I use the command ‘opusenc’, which I’ve custom-compiled as written above, it defaults to a frame-size of 20 milliseconds. Given a sampling rate of 48kHz, this amounts to a frame-size – or granule – of 960 samples. This is very different from what the developers were suggesting in their 2010 article (see the posting linked to above). With that sampling interval, the spectral resolution will be approximately as good as it was with MP3 or AAC encoding, without requiring that any “Hadamard Transforms” be used. Unfortunately, even though the resulting sound-quality was very good, this also means that support for Hadamard Transforms was not tested, and the command’s help text does not mention them either.

If Hadamard Transforms are in fact used, it will be easier for a decoder to adapt to granules which have been encoded that way, when the transforms are being used to increase spectral resolution rather than temporal resolution, because the decoder only needs to receive the signal that any given sub-band contains 2x, 4x, or 8x as many coefficients as it would normally contain. Then, in order to decode a sub-band to which the transform has been applied, the decoder needs to have a separately sized ‘iDCT’ buffer running, to decode those coefficients to, the output of which would need to be added to the output resulting from the main ‘Inverse Discrete Cosine Transform’.

If the transform was used to increase temporal resolution instead, more would be required of the decoder, such as to split a sub-band into 2, 4, or 8 time-intervals, even though a shorter frame-size has not been set. But again, a separate ‘iDCT’ buffer would need to be running.
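
To make the idea concrete, below is a toy numpy sketch of the transform itself, applied across 4 consecutive short blocks of one sub-band. It illustrates the mathematics only, and is not the actual Opus / CELT code; the block count and band size are made up:

```python
import numpy as np
from scipy.linalg import hadamard

n_blocks = 4      # consecutive short blocks being merged (made-up count)
band_size = 8     # coefficients of one sub-band, per short block (made up)

rng = np.random.default_rng(0)
coeffs = rng.standard_normal((n_blocks, band_size))  # rows: blocks in time order

H = hadamard(n_blocks) / np.sqrt(n_blocks)   # orthonormal 4x4 Hadamard matrix

# Mixing along the time axis yields 4x as many effective frequency lines
# for this sub-band, at the cost of localizing events in time.
merged = H @ coeffs

# Because H is orthogonal, the decoder inverts it with the transpose, which
# is why it only needs to be told the 2x / 4x / 8x factor that was used.
restored = H.T @ merged
assert np.allclose(restored, coeffs)
```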

So it’s very possible that, because a rather long frame-size was already set, Hadamard Transforms were never used in the present exercise.

The ‘opusenc’ command that I custom-compiled allows me to shorten the frame-size to 10, 5, or 2.5 milliseconds, and in the last case, the frames will be only 120 samples long. In that case, it would be difficult to imagine that the Codec does nothing to improve the spectral resolution. However, doing so might just ensure that my Samsung Galaxy S9 would no longer be able to play the resulting files.
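
For reference, such a short-frame encode would be invoked as sketched below, assuming the custom-compiled ‘opusenc’ is on the PATH, and with hypothetical file names:

```python
import subprocess

# Encode at 96kbps with 2.5-millisecond frames, i.e., 120 samples at 48kHz.
# '--framesize' accepts 2.5, 5, 10, 20 (the default), 40 or 60 milliseconds.
subprocess.run(
    ["opusenc", "--bitrate", "96", "--framesize", "2.5",
     "track.wav", "track-short-frames.opus"],
    check=True,
)
```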


(Update 8/12/2019, 6h00 : )

I have now listened very carefully to the Phil Collins music encoded with AAC at 128kbps, and to the exact same songs encoded with Opus at 96kbps. And what I’ve come to find is that Opus seems to preserve spectral complexity better than AAC. However, the AAC-encoded versions of the same music seem to provide a slightly better perception of the positioning, all around the listener, of instruments and voices in the lower-mid-range frequencies, as a result of Stereo. And this would be when the listener is using a good set of headphones.

Dirk


There exists HD Radio.

In Canada and the USA, a relatively recent practice in FM radio has been to piggy-back a digital audio stream onto the carriers of some existing, analog radio stations. This is referred to as “HD Radio”. This additional content isn’t audible to people who have standard, analog receivers, but can be decoded by people who have capable receivers; a receiver as good as the broadcasting standard should cost slightly more than $200. I like to try evaluating how well certain ‘Codecs’ work, the word being short for “Compressor-Decompressor”. Obviously, the digital audio has been compressed, so that it will take up a narrower range of radio-frequencies than the range of audio-frequencies it offers. In certain cases, either a poor choice, or an outdated choice of Codec, can in itself leave the sound-quality injured.

There was an earlier blog posting, in which I described the European standard ‘DAB’ this way. That standard uses ‘MPEG-1, Layer 2’ compression (:1). The main difference between ‘DAB’ and ‘HD Radio’ is the fact that, with ‘DAB’ or ‘DAB+’, a separate band of VHF frequencies is used, while ‘HD Radio’ uses existing radio stations, and therefore the existing band of frequencies.

The Codec used in HD Radio is proprietary, and is owned by a company named ‘iBiquity’. Some providers may reject the format, over an unwillingness to enter a contractual relationship with one commercial undertaking. But what is written is that the Codec used here resembles AAC. One of the things which I will not do is to give my opinion about a lossy audio Codec without ever having listened to it. Apple and iTunes have been working with AAC for many years, but I’ve owned neither an iPhone nor an OS/X computer.

What I’ve done in recent days was to buy an HD Radio -capable receiver, and this provides me with my first hands-on experience with this family of Codecs. Obviously, when trying to assess the level of quality of FM radio, I use my headphones, and not the speakers in my echoic computer-room. But it can sometimes be more relaxing to play the radio over the speakers, despite the loss of quality that takes place whenever I do so. (:2)

What I find is that the quality of HD Radio is better than that of analog FM radio, but still not as good as that of lossless, 44.1kHz audio (such as actual Audio CDs). Yet, because we know that this Codec is lossy, that last part is to be expected.

(Updated 8/01/2019, 19h00 … )


A Basic Limitation in Stereo FM Reproduction

One of the concepts which exist in modern, high-definition sound is that Human sound perception can take place between 20Hz and 20kHz, even though those endpoints are somewhat arbitrary. Some people cannot hear frequencies as high as 20kHz, especially older people, or anybody who just does not have good hearing. Healthy, young children and teenagers can typically hear that entire frequency range.

But, way back when FM radio was invented, sound engineers had flawed data about what frequencies Humans can hear. The data they had been given to work with stated that Humans can only hear frequencies from 30Hz to 15kHz. And so, even though their communications authorities had the ability to assign frequencies somewhat arbitrarily, they did so in a way that was based on such data. (:1)

For that reason, the playback of FM Stereo today, using household receivers, is still limited to an audio frequency range from 30Hz to 15kHz. Even very expensive receivers will not be able to reproduce sound that was once part of the modulated input but falls outside this frequency range, although other reference points can be applied, to try to gauge how good the sound quality is.

There is one artifact of this initial standard which was sometimes apparent in early receivers. Stereo FM has a pilot tone at 19kHz, to which a receiver needs to lock an internal oscillator, but in such a way that the internal oscillator runs at 38kHz, so that this internal oscillator can be used to demodulate the stereo part of the sound. Because the pilot signal which is actually part of the broadcast signal is ‘only’ at 19kHz, this gives an additional reason to cut off the audible signal at ‘only’ 15kHz; the pilot is not meant to be heard.

But, way back in the 1970s and earlier, Electrical Engineers did not have the type of low-pass filters available to them which they do now, also known as ‘brick-wall filters’, meaning filters that attenuate frequencies above the cutoff frequency very suddenly. Instead, equipment designed to be manufactured in the 1970s and earlier would only use low-pass filters with gradual ‘roll-off’ curves, which attenuate the higher frequencies progressively more, the further above the cutoff frequency they lie, but in a way that is gentle. And in fact, even today the result seems to be that a gentler roll-off of the higher frequencies results in better sound, when quality is measured in ways other than just the frequency range, such as how good the temporal resolution of very short pulses of high-frequency sound is.
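
For readers who prefer to see the multiplexing spelled out, the following is a toy numpy sketch of how the doubled pilot demodulates the stereo difference signal. The tone frequencies, the pilot level and the brick-wall low-pass are simplifications of mine, not a model of real receiver circuitry:

```python
import numpy as np

fs = 192_000                          # sample rate wide enough for the MPX band
t = np.arange(fs) / fs                # one second
L = np.sin(2 * np.pi * 440 * t)       # hypothetical left channel: 440Hz tone
R = np.sin(2 * np.pi * 660 * t)       # hypothetical right channel: 660Hz tone

pilot = 0.1 * np.sin(2 * np.pi * 19_000 * t)
subcarrier = np.sin(2 * np.pi * 38_000 * t)   # 2x the pilot frequency
mpx = (L + R) / 2 + subcarrier * (L - R) / 2 + pilot

def lowpass_15k(x):
    """Crude brick-wall low-pass at 15kHz, applied in the frequency domain."""
    X = np.fft.rfft(x)
    X[int(15_000 * len(x) / fs):] = 0
    return np.fft.irfft(X, n=len(x))

# Receiver side: regenerate the 38kHz oscillator (ideally, locked to the
# pilot), multiply, and low-pass to recover (L-R)/2; low-passing the MPX
# directly recovers (L+R)/2. The two are then matrixed back into L and R.
diff = lowpass_15k(2 * mpx * subcarrier)   # ~ (L - R) / 2
mono = lowpass_15k(mpx)                    # ~ (L + R) / 2
L_out, R_out = mono + diff, mono - diff
```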

Generally, very sharp spectral resolution results in worse temporal resolution, and this is a negative side effect of some examples of modern sound technology.
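
The trade-off can be demonstrated with standard filter-design tools. In the sketch below, the filter orders and ripple figures are arbitrary choices of mine, made only to contrast a gentle roll-off with a steep one:

```python
import numpy as np
from scipy import signal

fs = 48_000
# Gentle roll-off: a low-order Butterworth low-pass at 15kHz.
soft = signal.butter(2, 15_000, fs=fs, output='sos')
# 'Brick-wall': a high-order elliptic filter with a steep transition band.
hard = signal.ellip(10, 0.1, 80, 15_000, fs=fs, output='sos')

# Feed both filters the same very short pulse, and see how long they ring.
pulse = np.zeros(512)
pulse[0] = 1.0
ring_soft = signal.sosfilt(soft, pulse)
ring_hard = signal.sosfilt(hard, pulse)

def ring_length(h, floor=1e-3):
    """Index of the last sample whose magnitude is still above the floor."""
    return int(np.nonzero(np.abs(h) > floor)[0][-1])

# The steeper filter rings for noticeably longer, smearing short pulses:
print(ring_length(ring_soft), ring_length(ring_hard))
```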

But then sometimes, when listeners of the 1970s and before, who had high-end receivers and very good hearing, were tuned in to an FM Stereo signal, they could actually hear some residual amount of the 19kHz pilot tone, which was never a part of the original broadcast audio. That was sometimes still audible, just because the low-pass filter that defined 15kHz as the upper cut-off frequency was admitting the 19kHz component to a partial degree.

One technical accomplishment that has been possible in consumer electronics since the 1970s, however, was an analog ‘notch filter’, which seemed to suppress one exact frequency – or almost so – and such a notch filter could be calibrated to suppress 19kHz specifically.
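
The digital counterpart is a standard library call today. A minimal sketch, assuming 48kHz audio and a quality factor that I chose arbitrarily:

```python
from scipy import signal

fs = 48_000                      # assumed sample rate of the digitized audio
b, a = signal.iirnotch(19_000, Q=30, fs=fs)   # notch centred on the pilot

# The filter would then be applied to any audio buffer, e.g.:
# cleaned = signal.lfilter(b, a, samples)
w, h = signal.freqz(b, a, fs=fs)
# |h| dips toward zero near 19kHz, and stays near 1 everywhere else.
```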

Modern electronics makes possible such things as analog low-pass filters with a more sudden frequency cut-off, digital filters, etc. So it’s improbable today that even listeners whose hearing would be good enough, would still be receiving this 19kHz sound-component in their headphones. In fact, the sound today is likely to seem ‘washed out’, simply because of too many transistors being fit onto one chip. And when I bought an AM/FM radio in recent days, I did not even try the included ear-buds at first, because I have better headphones. When I did try the included ear-buds, their sound-quality was worse than when I used my own, valued headphones. I’d say the included ear-buds did not reproduce frequencies above 10kHz at all. My noise-cancelling headphones clearly continue to do so.

One claim which should be approached with extreme skepticism would be that the sound which a listener seemed to be getting from an FM tuner was as good as sound that he was also obtaining from his Vinyl Turntable. AFAIK, the only way in which this would be possible would be if he was using an extremely poor turntable to begin with.

What has happened, however, is that audibility curves have been accepted since the 1980s that state the upper limit of Human hearing as 20kHz, and all manner of audio equipment designed since then takes this into consideration. This would include Audio CD players, some forms of compressed sound, etc. What some people will claim in a way that strikes me as credible, however, is that the frequency-response of the HQ turntables was as good as that of Audio CDs. And the main reason I’ll believe that is the fact that Quadraphonic LPs were sold at some point, which had a sub-carrier for each stereo channel, that differentiated that stereo channel front-to-back. This sub-carrier was actually phase-modulated. But in order for Quadraphonic LPs to have worked at all, their actual frequency response needed to go as high as 40kHz. And phase-modulation was chosen because this form of modulation is particularly immune to the various types of distortion which an LP would insert, when playing back frequencies as high as 40kHz.
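
As a side note, phase modulation itself is simple to write down. Below is a toy numpy sketch of a difference signal being phase-modulated onto an ultrasonic sub-carrier; the 30kHz carrier frequency and the modulation index are assumptions of mine for illustration, not the specifications of any actual quadraphonic format:

```python
import numpy as np

fs = 192_000                      # sample rate wide enough for the sub-carrier
t = np.arange(fs) / fs
front_minus_back = np.sin(2 * np.pi * 300 * t)   # hypothetical difference signal

f_carrier = 30_000                # assumed sub-carrier frequency, in Hz
k = 0.8                           # assumed modulation index, in radians

# Phase modulation: the message shifts the carrier's phase directly.
subcarrier = np.cos(2 * np.pi * f_carrier * t + k * front_minus_back)

# The groove would then carry the audible signals plus this sub-carrier,
# which is why the LP's frequency response had to extend far past 20kHz.
```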

About Digital FM:

(Updated 7/3/2019, 22h15 … )


A Gap in My Understanding of Surround-Sound Filled: Separate Surround Channel when Compressed

In This earlier posting of mine, I had written about certain concepts in surround-sound which were based on Pro Logic and the analog days. But I had gone on to write that, in the case of the AC3 or the AAC audio CODEC, the actual surround channel could be encoded separately from the stereo. The purpose in doing so would have been that, if decoded on the appropriate hardware, the surround channel could be sent directly to the rear speakers, thus giving 6-channel output.

While writing what I just linked to above, I had not yet realized that either channel of the compressed stream could have its phase information conserved. This had caused me some confusion. Now that I realize that the phase information could be correct, and not based on the sampling windows themselves, a conclusion comes to mind:

Such a separate, compressed surround-channel would already be 90° phase-shifted with respect to the panned stereo. And what this means could be that, if the software recognizes that only 2 output channels are to be decoded, the CODEC might just mix the surround channel directly into the stereo. The resulting stereo would then also be prepped for Pro Logic decoding.
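
A minimal sketch of such a downmix, assuming discrete left, right and surround channels, and using a Hilbert transform for the 90° shift; the mixing coefficient is an illustrative choice of mine, not a published matrix specification:

```python
import numpy as np
from scipy.signal import hilbert

def downmix_with_surround(left, right, surround):
    """Fold a discrete surround channel into stereo, Pro Logic-style."""
    # The analytic signal's imaginary part is the input, phase-shifted
    # by 90 degrees at every frequency.
    shifted = np.imag(hilbert(surround))
    # Opposite-polarity injection lets a matrix decoder separate it again.
    k = 1 / np.sqrt(2)            # illustrative mixing coefficient
    return left + k * shifted, right - k * shifted

# Usage, with synthetic one-second test signals at 48kHz:
fs = 48_000
t = np.arange(fs) / fs
L = np.sin(2 * np.pi * 440 * t)
R = np.sin(2 * np.pi * 660 * t)
S = np.sin(2 * np.pi * 550 * t)
L_total, R_total = downmix_with_surround(L, R, S)
```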

Dirk