## Different types of music, for testing Audio Codecs – Or Maybe Not.

One of my recent activities has been, to start with Audio CDs from the 1980s and 1990s, the encoding of which was limited only by the 44.1kHz sample rate, and the bit-depth, as well as by whatever type of Sinc Filter was once used to master them, but not limited by any sort of lossy compression, and to “rip” those, but into different types of lossy compression, in order to evaluate that. The two types of compression I recently played with were ‘AAC’ (plain), and ‘OGG Opus’, at 128kbps both times.

One of the apparent facts which I learned was that Phil Collins music is not the best type to try this with. The reason is the fact that much of his music was recorded using electronic instruments of his era, the main function of which was, to emulate standard acoustical instruments, but in a way that was ‘acoustically pure’. The fact that Phil Collins started his career as a drummer, did not prevent him from releasing later, solo albums.

If somebody is listening to an entire string section of an orchestra, or to a brass section, then one factor which contributes to the nature of the sound is, that every Violin is minutely off-pitch, as would be every French Horn. But what that also means is that the sound which results, is “Thick”. Its spectral energy is spread in subtle ways, whereas if somebody mixes two sine-waves that have exactly the same frequency, then he obtains another sine-wave. If one mixes 10 sine-waves that have exactly the same frequency, then he still obtains one sine-wave.

Having sound to start from which is ‘Thick’, is a good basis to test Codecs. Phil Collins does not provide that. Therefore, if the acoustical nature of the recording is boring, I have no way to know, whether it’s the Codec that failed to bring out greater depth, or whether that was the fault of Phil Collins.

(Update 8/03/2019, 12h55 : )

Since the last time I edited this posting, I learned, that Debian / Stretch, the Linux version I have installed on the computer I name ‘Phosphene’, only ships with ‘libopus v1.2~alpha2-1′ from the package repositories. Apparently, when using this version, the best one can hope for is an equivalent, to 128kbps MP3 -quality. This was the true reason, for which I was obtaining inferior results, along with the fact that I had given the command to encode my Opus Files using a version of ‘ffmpeg’, that just happened to include marginal support for ‘Opus’, instead of using the actual ‘opus-toolkit’ package.

What I have now done was, to download and custom-compile ‘libopus v1.3.1′, as well as its associated toolkit, and just to make sure that the programs work. Rumours have it, that when this version is used at a bit-rate of 96kbps, virtual transparency will result.

And I’ve written quite a long synopsis, as to why this might be so.

(Update 8/03/2019, 15h50 : )

I have now run an altered experiment, by encoding my Opus Files at 96kbps, and discovered to my amazement, that the sound I obtained seemed better, than what I had already obtained above, using 128kbps AAC-encoded Files.

(Update 10/02/2019, 12h25 : )

When I use the command ‘opusenc’, which I’ve custom-compiled as written above, it defaults to a frame-size of 20 milliseconds. Given a sampling rate of 48kHz, this amounts to a frame-size – or granule – of 960 samples. This is very different from what the developers were suggesting in their article in 2010 (see posting linked to above). With that sampling interval, the tonal accuracy will be approximately twice as good as it was with MP3 encoding, or with AAC encoding, without requiring that any “Hadamard Transforms” be used. With the default setting, Opus will generally be able to distinguish frequencies that are 25Hz apart, while it was in the nature of MP3 only to be able to distinguish between frequencies that are 40Hz apart, except for the fact that the lowest distinguishable frequency was near 20Hz.

If Hadamard Transforms are in fact used, then it now strikes me as more likely that they will be used to increase temporal resolution at the expense of spectral resolution, not the reverse. And the reason I think this is the following paragraph in the manpage, for the ‘opusenc’ command-line:


--framesize N
Set maximum frame size in milliseconds (2.5, 5, 10, 20, 40,  60,
default: 20)
Smaller  framesizes  achieve lower latency but less quality at a
given bitrate.
Sizes greater than 20ms  are  only  interesting  at  fairly  low
bitrates.



(Update 8/12/2019, 6h00 : )

I have now listened very carefully to the Phil Collins Music encoded with AAC at 128kbps, and the exact same songs, encoded with Opus at 96kbps. And what I’ve come to find, is that Opus seems to preserve spectral complexity better than AAC. However, the AAC encoded versions of the same music, seem to provide slightly better perception of positioning all around the listener, of instruments and voices in the lower-mid-range frequencies, as a result of Stereo. And this would be, when the listener is using a good set of headphones.

Dirk

## Testing of USB Sound Device Complete.

According to my previous posting, I needed to do a more thorough test of the USB Sound Card I have bought, which is a “Focusrite Scarlett 2i2“.

In particular, I needed to address the discrepancy according to which, the Linux JACK daemon reports capture at 32 bits, while the specifications of the sound card state a 24 bit sample format.

Also, I needed to be sure whether it would run as well at 96 kHz, as it already did at 48 kHz.

According to my more complete test, the 32-bit sample-format which ‘QJackCtl‘ shows me, which can be viewed in its Messages box, state the ALSA parameters and not the JACK internals. Therefore, JACK has after all chosen to capture and/or play back audio at a physical 32 bits, at the 96 kHz sample-rate. This is not, after all, a statement of the JACK internal behavior.

Since I am using Linux, and since the manufacturer chose to rate this capture device as only being capable of 24-bit capture, I must assume that for hardware reasons the device uses 32-bit registers, but that only the first, most-significant 24 of those bits are accurate. Therefore, when I open ‘QTractor‘ – the Digital Audio Workstation / Tracker application, it is best to truncate its capture format to 24 bits as well, which is most probably what the Windows or Mac drivers for this device do.

Aside from that, using QTractor next, to capture a 96 kHz, 24-bit, stereo FLAC file was easy and uneventful. Further, the stability of my software suggests that I can play with the GUIs as much as I need to, to figure them out, and I will not screw anything up.

After I closed JACK, I next imported this FLAC file, that plays for 14 seconds, into “Audacity“, which has been set up to use the default sound settings (‘PulseAudio‘), and which performs an on-demand re-sampling of the FLAC file.

The on-demand FLAC playback is not filtered well by Audacity, but since it is running at 96 kHz, compared with the 44.1 kHz that the internal sound of the laptop runs at, this observation is not surprising.

And then the captured sound clip simply contains, what I spoke into my microphone.

Dirk