My Opinion on the Opinion of Chris “Monty” Montgomery

Chris Montgomery is the Audio Expert, who invented the OGG Vorbis codec. That gives enough reason to accredit him with good advice. I recommend that my readers read his advice here.

I did read the whole thing, but have three comments on it:

  1. The Author suggests that 16-bit sample-depth offers a de-facto solution to the limits in dynamic range, simply due to the correct application of dithering. If I cannot trust my hardware to perform correct low-pass filtering, why on Earth would I trust it to perform correct, 16-bit, audio dithering?
  2. The Author explains the famous loudness curves, that define threshold of perceptibility, as well as the higher threshold of pain. What he fails to point out is that these curves assume, that the sound being tested for, is the only sound being played over the headphones. If there is another, background sound being played – i.e. the current loudness-level already higher than zero – then the threshold of perception for a given test-sound, is higher – requires a higher level, for that test-sound itself to be heard. Yet, this level is still lower, than the peak level of the background sound. People who design codecs know this, as I am sure the author does. It is the threshold of perceptibility next to a background sound – not the absolute threshold – which gets used in the design of codecs.
  3. The Author suggests it would be a misuse of his codec, to encode discrete multi-channel sound. And one reason he states, would be the waste in file-size, while the next reason he states, would be the fact that sound jumps to the nearest speaker, when they are all encoded that way.

This last observation strikes a cord with me. I have already noticed, that OGG Files do allow numerous channels to be encoded in parallel, but that if we exceed 2, we lose the benefits of Joint Stereo. By itself, this does not really count against this Author, whose codec therefore does not offer explicit surround-sound. But the possibility is very real, that the localization of sound will jump to the nearest speaker, if the listener moves and the sound was encoded that way. It is entirely possible, that purposeful encoding of surround-sound by the (competing) AC3 or the AAC codecs, reduces this risk.

But then I would suggest an alternative approach, to people who do not want to use the proprietary codecs, yet who wish to encode their movies with surround.

There exists the Steve Harris LADSPA plug-in library, which includes a matrix encoder for Pro Logic. This matrix encoder accepts 4 input channels, one of which is the surround channel, and outputs 2 stereo channels.

Further, the circuitry must exist someplace as well, to accept 2 stereo, 1 center and 1 surround-channel, and to encode those in real-time, so that the surround-effect can be played back over 6 speakers.

  • In principle, it should be possible to OGG-compress 4 channels and not 6, so that these channels can be used as inputs, to a matrix surround-system, like to the LADSPA plug-in, so that listenable surround will emanate from all speakers. Does Audio Software exist, which applies the LADSPA plug-in in real-time?
  • Alternatively, it might be possible to mix down Pro Logic sound into Stereo using the Steve Harris plug-in, and then to use FLAC on the resulting stereo.

BTW: What the Author mainly writes, is how incorrect it would be for pure listeners, to download their music in 24/192 format. He does not actually write, that Music / Sound Authors should avoid recording in this format. And so one fact which I have observed, is that there exists a lot of Audio Software – such as – that stores its sound in some higher, internal format, but which, when instructed to Export that to a 16-bit format, offer Dithering as an option.

This is possible because the Application is numeric and not physical. Thus, If I had used my USB-sound-device to record in 24-bit, I could next Export the finished sound tracks to 16-bit:

ardour_klystr_6

 

But, It would also seem that Chris Montgomery equates the use of such technology, as only being suited for Professionals. I am not a professional, and do not have the extremely expensive tools they do. Yet, I am able to author sound-projects.

Dirk

 

Emphasizing a Presumed Difference between OGG and MP3 Sound Compression

In this posting from some time ago, I wrote down certain details I had learned about MP3 sound compression. I suppose that while I did write, that the Discreet Cosine Transform coefficients get scaled, I may have missed to mention in that same posting, that they also get quantized. But I did imply it, and I also made up for the omission in this posting.

But one subject which I did mention over several postings, was my own disagreement with the practice, of culling frequency-coefficients which are deemed inaudible, thus setting those to zero, just to reduce the bit-rate in one step, hoping to get better results, ‘because a lower initial bit-rate also means that the user can select a higher final bit-rate…’

In fact, I think that some technical observers have confused two separate processes that take place in MP3:

  1. An audibility threshold is determined, so that coefficients which are lower than that are set to zero.
  2. The non-zero coefficients are quantized, in such a way that the highest of them fits inside a fixed maximum, quantized value. Since a scale-factor is computed for one frequency sub-band, this also implies that close to strong frequency coefficients, weaker ones are just quantized more.

In principle, concept (1) above disagrees with me, while concept (2) seems perfectly fine.

And so based on that I also need to emphasize, that with MP3, first a Fast-Fourier Transform is computed, the exact implementation of which is not critical for the correct playback of the stream, but the only purpose of which is to determine audibility thresholds for the DCT transform coefficients, the frequency-sub-bands of which must fit the standard exactly, since the DCT is actually used to compress the sound, and then to play it back.

This FFT can serve a second purpose in Stereo. Since this transform is assumed to produce complex numbers – unlike the DCT – it is possible to determine whether the Left-Minus-Right channel correlates positively or negatively with the Left-Plus-Right channel, regarding their phase. The way to do this effectively, is to compute the dot-product between two complex numbers, and to see whether this dot-product is positive or negative. The imaginary component of one of the sources needs to be inverted for that to work.

But then negative or positive correlation can be recorded once for each sub-band of the DCT as one bit. This will tell, whether a positive difference-signal, is positive when the left channel is more so, or positive if the right channel is more so.

You see, in addition to the need to store this information, potentially with each coefficient, there is the need to measure this information somehow first.

But an alternative approach is possible, in which no initial FFT is computed, but in which only the DCT is computed, once for each Stereo channel. This might even have been done, to reduce the required coding effort. And in that case, the DCT would need to be computed for each channel separately, before a later encoding stage decides to store the sum and the difference for each coefficient. In that case, it is not possible first to determine, whether the time-domain streams correlate positively or negatively.

This would also imply, that close to strong frequency-components, the weaker ones are only quantized more, not culled.

So, partially because of what I read, and partially because of my own idea of how I might do things, I am hoping that OGG sound compression takes this latter approach.

Dirk

Continue reading Emphasizing a Presumed Difference between OGG and MP3 Sound Compression

The Advantage of Linear Filters – ie Convolutions

A question might come to mind to readers who are not familiar with this subject, as to why the subset of ‘Morphologies’ that is known as ‘Convolutions’ – i.e. ‘Linear Filters’ – is advantageous in filtering signals.

This is because even though such a static system of coefficients, applied constantly to input samples, will often produce spectral changes in the signal, they will not produce frequency components that were not present before. If new frequency components are produced, this is referred to as ‘distortion’, while otherwise all we get is spectral errors – i.e. ‘coloration of the sound’. The latter type of error is gentler on the ear.

For this reason, the mere realization that certain polynomial approximations can be converted into systems, that entirely produce linear products of the input samples, makes those more interesting.

OTOH, If each sampling of a continuous polynomial curve was at a random, irregular point in time – thus truly revealing it to be a polynomial – then additional errors get introduced, which might resemble ‘noise’, because those may not have deterministic frequencies with respect to the input.

And, the fact that the output samples are being generated at a frequency which is a multiple of the original sample-rate, also means that new frequency components will be generated, that go up to the same multiple.

In the case of digital signal processing, the most common type of distortion is ‘Aliasing’, while with analog methods it used to be ‘Total Harmonic Distortion’, followed by ‘Intermodulation Distortion’.

If we up-sample a digital stream and apply a filter, which consistently underestimates the sub-sampled, then the resulting distortion will consist of unwanted modulations of the higher Nyquist Frequency.

Continue reading The Advantage of Linear Filters – ie Convolutions

I now have LG Tone Pro HBS-750 Bluetooth Headphones.

And unlike how it went with the previous set, I paid the full price for these, and know that they are genuine.

I can now comment accurately for the first time, about the “aptX” sound compression they use.

I understand that most of the music that I will be playing, has already been MP3 or OGG compressed. But with my simple headphones, that were wired to the stereo mini-jack on my phone, there was a loss in quality, just in getting the sound to my ears, after MP3 or OGG decoding. With aptX, it could be argued that there is also some small loss of quality in getting the sound to my ears.

aptX, and the HBS-750 headphones, are able to get the sound to my ears, after lossy decompression, better than the wired headphones could. So the only sound artifacts that I will ever hear with these, will be those due to MP3 or OGG, and the OGG files will play better again than the MP3 files did, as the OGG files are supposed to do.

The sound of these headphones is truly superb.

Further, the reason for which the suggested app ‘Tone And Talk’ was not recognizing the supposed HBS-730 headphones, was the mere fact that this app was able to read the meta-data of those, and was able to determine, that those were just not on the list of supported headphones, even though I was fooled into believing that they were.

Tone And Talk works properly, with the HBS-750 headphones, that are genuine LG headphones. That is, unless I am to do a detailed test of this app, which may come later. But the app does not just sit there and stay lame, as it did with the counterfeit headphones.

Dirk