Different types of music, for testing Audio Codecs – Or Maybe Not.

One of my recent activities has been to take Audio CDs from the 1980s and 1990s, the encoding of which was limited only by the 44.1kHz sample rate and the bit-depth, as well as by whatever type of Sinc Filter was once used to master them, but not by any sort of lossy compression, and to “rip” those into different types of lossy compression, in order to evaluate the codecs. The two types of compression I recently played with were ‘AAC’ (plain) and ‘OGG Opus’, at 128kbps both times.

One of the apparent facts which I learned was that Phil Collins’s music is not the best type to try this with. The reason is that much of his music was recorded using electronic instruments of his era, the main function of which was to emulate standard acoustical instruments, but in a way that was ‘acoustically pure’. The fact that Phil Collins started his career as a drummer did not prevent him from later releasing solo albums.

If somebody is listening to an entire string section of an orchestra, or to a brass section, then one factor which contributes to the nature of the sound is that every Violin is minutely off-pitch, as is every French Horn. What that also means is that the resulting sound is “Thick”: its spectral energy is spread in subtle ways. By contrast, if somebody mixes two sine-waves that have exactly the same frequency, he obtains another sine-wave. If he mixes 10 sine-waves that have exactly the same frequency, he still obtains one sine-wave.
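
As a quick numerical illustration of that last claim, the following is a minimal Python sketch, in which the frequency, amplitudes and phases are made-up values, and only ‘numpy’ is assumed:

import numpy as np

# Mix 10 sine-waves which all have exactly the same frequency,
# but arbitrary amplitudes and phases.
fs = 48000                        # sample rate, in Hz
t = np.arange(fs) / fs            # one second of time-points
f = 440.0                         # the one shared frequency, in Hz

rng = np.random.default_rng(0)
amplitudes = rng.uniform(0.1, 1.0, 10)
phases = rng.uniform(0.0, 2.0 * np.pi, 10)

mix = sum(a * np.sin(2.0 * np.pi * f * t + p)
          for a, p in zip(amplitudes, phases))

# The spectrum of the mix has energy in one frequency-bin only,
# meaning that the mix is still a single, pure sine-wave.
spectrum = np.abs(np.fft.rfft(mix))
print(np.argmax(spectrum))                             # 440
print(int((spectrum > 1e-6 * spectrum.max()).sum()))   # 1

Real instruments differ from this example precisely because their frequencies are never exactly equal, which is what spreads the spectral energy and makes the sound ‘Thick’.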

Having ‘Thick’ sound to start from is a good basis for testing Codecs, and Phil Collins does not provide that. Therefore, if the acoustical nature of the recording is boring, I have no way of knowing whether it was the Codec that failed to bring out greater depth, or whether that was the fault of Phil Collins.

(Update 8/03/2019, 12h55 : )

Since the last time I edited this posting, I have learned that Debian / Stretch, the Linux version I have installed on the computer I name ‘Phosphene’, only ships with ‘libopus v1.2~alpha2-1’ from the package repositories. Apparently, when using this version, the best one can hope for is the equivalent of 128kbps MP3 quality. This was the true reason for which I was obtaining inferior results, along with the fact that I had given the command to encode my Opus Files using a version of ‘ffmpeg’ that just happened to include marginal support for ‘Opus’, instead of using the actual ‘opus-tools’ package.

What I have now done was to download and custom-compile ‘libopus v1.3.1’, as well as its associated toolkit, and to make sure that the programs work. Rumour has it that, when this version is used at a bit-rate of 96kbps, virtual transparency will result.

And I’ve written quite a long synopsis as to why this might be so.

(Update 8/03/2019, 15h50 : )

I have now run an altered experiment, encoding my Opus Files at 96kbps, and discovered to my amazement that the sound I obtained seemed better than what I had already obtained above, using 128kbps AAC-encoded Files.
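
For completeness, such a batch-encode can be sketched roughly in Python as follows. The directory names below are placeholders of my own choosing, and the sketch assumes that the custom-compiled ‘opusenc’ is in the search path (‘--bitrate’, in kbps, is one of its documented options):

import subprocess
from pathlib import Path

src = Path("rips")         # placeholder: directory of ripped .wav files
dst = Path("opus-96k")     # placeholder: output directory
dst.mkdir(exist_ok=True)

# Re-encode every ripped track to Opus at 96kbps, using 'opusenc'
# from the custom-compiled opus-tools, rather than 'ffmpeg'.
for wav in sorted(src.glob("*.wav")):
    out = dst / (wav.stem + ".opus")
    subprocess.run(["opusenc", "--bitrate", "96", str(wav), str(out)],
                   check=True)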

(Update 8/04/2019, 10h10 : )

When I use the command ‘opusenc’, which I’ve custom-compiled as written above, it defaults to a frame-size of 20 milliseconds. Given a sampling rate of 48kHz, this amounts to a frame-size – or granule – of 960 samples. This is very different from what the developers were suggesting in their article in 2010 (see the posting linked to above). With that sampling interval, the spectral resolution will be approximately as good as it was with MP3 or AAC encoding, without requiring that any “Hadamard Transforms” be used. Unfortunately, even though the resulting sound-quality was very good, this also means that support for Hadamard Transforms was not tested; the command-line help does not mention them either.
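
That sample count is just the sampling rate multiplied by the frame duration, which a few lines of Python can tabulate:

# Frame-size in samples = sampling rate x frame duration.
fs = 48000                 # Opus codes at 48kHz internally
for ms in (20.0, 10.0, 5.0, 2.5):
    print(ms, "ms ->", int(fs * ms / 1000), "samples")
# 20.0 ms -> 960 samples
# 10.0 ms -> 480 samples
# 5.0 ms -> 240 samples
# 2.5 ms -> 120 samples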

If Hadamard Transforms are in fact used, it will be easier to adapt the granules which have been encoded that way when the transforms are being used to increase spectral resolution, not temporal resolution, because the decoder only needs to receive the signal that any given sub-band contains 2x, 4x, or 8x as many coefficients as it would normally contain. Then, in order to decode a sub-band to which the transform was applied during encoding, the decoder needs to keep a separately sized ‘iDCT’ buffer running, into which the coefficients are decoded, and whose output would need to be added to the output of the main ‘Inverse Discrete Cosine Transform’. If the transform was used to increase temporal resolution, more would be required of the decoder, such as splitting a sub-band into 2, 4, or 8 time-intervals, even though a shorter frame-size has not been set. But again, a separate ‘iDCT’ buffer would need to be running.
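
For readers who have not seen one, the transform in question is simple to state. The following is only my own Python sketch of a normalized Hadamard Transform of length 2, 4, or 8, as it might be applied to the coefficients of one sub-band; it is not code taken from the Opus source:

import numpy as np

def hadamard(x):
    """Orthonormal Hadamard Transform of a vector whose length is a
    power of 2 (here, 2, 4 or 8), via the usual butterfly recursion.
    Because it is normalized, the transform is its own inverse."""
    x = np.array(x, dtype=float)          # work on a copy
    n = x.size
    assert n & (n - 1) == 0, "length must be a power of 2"
    h = 1
    while h < n:
        x = x.reshape(-1, 2 * h)
        a = x[:, :h].copy()
        b = x[:, h:].copy()
        x[:, :h] = a + b                  # butterfly: sums
        x[:, h:] = a - b                  # butterfly: differences
        x = x.reshape(-1)
        h *= 2
    return x / np.sqrt(n)

coeffs = np.array([1.0, 2.0, 3.0, 4.0])   # made-up sub-band coefficients
fwd = hadamard(coeffs)
print(np.allclose(hadamard(fwd), coeffs))  # True: self-inverse

Notice that, apart from the final scaling, the transform only ever adds and subtracts pairs of values, and that it undoes itself; both properties make it cheap for a decoder to apply per sub-band.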

So it’s very possible that, because a rather long frame-size was already set, Hadamard Transforms were never used in the present exercise.

The ‘opusenc’ command that I custom-compiled allows me to shorten the frame-size to 10, 5, or 2.5 milliseconds, and in the last case, the frames will be only 120 samples long. In that case, it would be difficult to imagine that the Codec does nothing to improve the spectral resolution. However, doing so might just ensure that my Samsung Galaxy S9 would no longer be able to play the resulting files.
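
In case anybody wants to reproduce that, ‘--framesize’ is the documented ‘opusenc’ option which controls this, and the file names below are again placeholders:

import subprocess

# Force 2.5-millisecond frames, i.e., 120-sample granules.
subprocess.run(["opusenc", "--bitrate", "96", "--framesize", "2.5",
                "input.wav", "output.opus"], check=True)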

(Update 8/12/2019, 6h00 : )

I have now listened very carefully to the Phil Collins Music encoded with AAC at 128kbps, and to the exact same songs encoded with Opus at 96kbps. What I’ve come to find is that Opus seems to preserve spectral complexity better than AAC. However, the AAC-encoded versions of the same music seem to provide a slightly better perception of the stereo positioning of instruments and voices in the lower-mid-range frequencies, all around the listener. And this would be when the listener is using a good set of headphones.

Dirk