My Opinion on the Opinion of Chris “Monty” Montgomery

Chris Montgomery is the Audio Expert, who invented the OGG Vorbis codec. That gives enough reason to accredit him with good advice. I recommend that my readers read his advice here.

I did read the whole thing, but have three comments on it:

  1. The Author suggests that 16-bit sample-depth offers a de-facto solution to the limits in dynamic range, simply due to the correct application of dithering. If I cannot trust my hardware to perform correct low-pass filtering, why on Earth would I trust it to perform correct, 16-bit, audio dithering?
  2. The Author explains the famous loudness curves, that define threshold of perceptibility, as well as the higher threshold of pain. What he fails to point out is that these curves assume, that the sound being tested for, is the only sound being played over the headphones. If there is another, background sound being played – i.e. the current loudness-level already higher than zero – then the threshold of perception for a given test-sound, is higher – requires a higher level, for that test-sound itself to be heard. Yet, this level is still lower, than the peak level of the background sound. People who design codecs know this, as I am sure the author does. It is the threshold of perceptibility next to a background sound – not the absolute threshold – which gets used in the design of codecs.
  3. The Author suggests it would be a misuse of his codec, to encode discrete multi-channel sound. And one reason he states, would be the waste in file-size, while the next reason he states, would be the fact that sound jumps to the nearest speaker, when they are all encoded that way.

This last observation strikes a cord with me. I have already noticed, that OGG Files do allow numerous channels to be encoded in parallel, but that if we exceed 2, we lose the benefits of Joint Stereo. By itself, this does not really count against this Author, whose codec therefore does not offer explicit surround-sound. But the possibility is very real, that the localization of sound will jump to the nearest speaker, if the listener moves and the sound was encoded that way. It is entirely possible, that purposeful encoding of surround-sound by the (competing) AC3 or the AAC codecs, reduces this risk.

But then I would suggest an alternative approach, to people who do not want to use the proprietary codecs, yet who wish to encode their movies with surround.

There exists the Steve Harris LADSPA plug-in library, which includes a matrix encoder for Pro Logic. This matrix encoder accepts 4 input channels, one of which is the surround channel, and outputs 2 stereo channels.

Further, the circuitry must exist someplace as well, to accept 2 stereo, 1 center and 1 surround-channel, and to encode those in real-time, so that the surround-effect can be played back over 6 speakers.

  • In principle, it should be possible to OGG-compress 4 channels and not 6, so that these channels can be used as inputs, to a matrix surround-system, like to the LADSPA plug-in, so that listenable surround will emanate from all speakers. Does Audio Software exist, which applies the LADSPA plug-in in real-time?
  • Alternatively, it might be possible to mix down Pro Logic sound into Stereo using the Steve Harris plug-in, and then to use FLAC on the resulting stereo.

BTW: What the Author mainly writes, is how incorrect it would be for pure listeners, to download their music in 24/192 format. He does not actually write, that Music / Sound Authors should avoid recording in this format. And so one fact which I have observed, is that there exists a lot of Audio Software – such as – that stores its sound in some higher, internal format, but which, when instructed to Export that to a 16-bit format, offer Dithering as an option.

This is possible because the Application is numeric and not physical. Thus, If I had used my USB-sound-device to record in 24-bit, I could next Export the finished sound tracks to 16-bit:



But, It would also seem that Chris Montgomery equates the use of such technology, as only being suited for Professionals. I am not a professional, and do not have the extremely expensive tools they do. Yet, I am able to author sound-projects.



Some Music may be Suitable for Surround-Sound.

One question which older members of the population might ask themselves, is whether it makes any sense, for people to be listening to music with surround-sound.

This question – or rather bias – stems from the way much music was mixed-down in the 1970s and 1980s, where the Artists applied a catch-as-catch-can approach, to creating Stereo. In fact back then, the goal was often even to confuse how the listener hears sound, using phase-shifts, and thus to be psychedelic. And so a basis exists to think, that electronic music especially, was never meant to be heard in surround-sound.

But the situation has changed since then. For several decades, some FM radio stations have been offering some music in surround-sound. And further, much of the old music from the 1970s has also been remastered, with more-modern technical considerations.

Before I could know whether a friend of mine is listening to Beethovens 9th Symphony, I cannot be sure whether what he says is real, and so I generally give people the benefit of the doubt. And, if he says he is listening to Neil Young, does he mean the way he recorded in the 1970s, or is he referring to a recording, which Neil Young personally remastered after the year 2000? ;)



Not Being Sure about the Sign Bit

One format in which MP3 can encode stereo, is in the form of “Joint Stereo”. In this form, the signal is sent as a sum-channel, and a difference-channel. The left and right channels can be reconstructed from them, just as easily as a sum and a difference, could be computed. The reason this is done, is to save on the bit-rate of the difference channel, along the argument that human stereo-directionality is more limited in frequencies, than straightforward hearing is.

But this is also one example, in which the difference channel needs to have a defined sign, as either being positive when left is more so, or being more positive when right is more so.

And so one way in which this could be encoded, if the decision was made to include doing so into the standard, could be to encode an additional sign-bit into each non-zero frequency coefficient of the difference channel. But doing so would also affect the overall bit-rate of the signal enough, that this deters professionals from doing it. Also, the argument is made that lower bit-rates can lead to higher sound quality, because if they want, users can increase the bit-rate anyway, resulting in greater definition of the more audible components of the sound.

And so with Joint Stereo, a feature built-in to MP3 is, “Side-Bits”. These side-bits are included if the mode is enabled, and declare as header information, how the signal should be panned, either to the left or the right, as well as what the sign of the difference-channel might be. ( :1 )

Well there is an implementation detail about the side-bits, which I do not specifically know: Whether they are stored only once per frame, or once per frequency sub-band, since the compression schemes do divide the spectrum into sub-bands already. Thus, if the side-bits were encoded for each sub-band, for each frame, then a good compromise could be reached, in terms of how much data should be spent on that.

The only coefficient which I imagine would have a sign-bit to itself each time, would be the zero coefficient, that corresponds to DC, but that also corresponds to F = Sampling Interval / 2.

This question becomes more relevant for surround-sound encoding. If, rather than using the Pro Logic method, some other scheme was decided on, for defining a ‘Back-Front’ channel, then it would suddenly become critical, that this channel have correct sign information. And then it would also become possible for this channel to correlate negatively with frequency components that also belong to the stereo channels. Hence, a more familiar method of designing the servos of Pro Logic II would be effective again, than what would be needed, if none of the cosine transforms could produce negative coefficients. ( :2 )


P.S. When we listen to accurately-reproduced signals on stereo headphones, If the signals are perfectly in-phase, humans tend to hear them as if they came from inside our own heads. If they are 180 degrees out-of-phase, we tend to hear them as coming from a non-specific place outside our heads.

I have had a personal friend complain, that when he listens to Classical Music via MP3 Files, he cannot ascertain the direction which sounds are coming from – i.e. the positions of the instruments in the orchestra, but can hear panning and this in-versus-out condition. Listening to Classical Music via MP3 is a very bad idea.

What this tells me is that my friend has very good hearing, which is tuned to the needs of Classical Music.

The reason he hears some of the sound as being in the out-of-phase position, may well just be due to the fact that Joint Stereo was being selected by the encoder, and that certain frequency components in the difference-channel, did not have substantial counterparts in the sum-channel. Mathematically, this results in a correlation of zero between the encoded channels, but in a correlation of -1 between the reconstructed left and right…

Subjectively, I would say that I have observed better sound quality in this regard, when using OGG Compression, and at a higher bit-rate. I found that “Enya” required 192kbps, while “Strauss” did not sound good until I had reached 256kbps.

But I do not know objectively, what it is in the OGG Files, that gives me this better experience. I do not have the precision hearing, which said friend has. I have used FLAC to encode some “Beethoven” and “Schubert”, but mainly just in order to archive their music without any loss in information at all, and not as a testament, to the listening experience really being that much better.

1: ) In the case of Joint Stereo with MP3, what I would expect, is that the ‘pan-number’ will also direct the decoder to set the polarity of the difference-channel, to be positive with whichever side the sum-channel is being panned-towards more strongly. And I expect this to happen, regardless of what the phase information was when encoding.

If there was explicit sign information here, such information would first also have had to be measured, when encoding, as relative to the phase-position of the sum-channel. Since phase-information is generally relative. And I do not hear speak, of correlation information being collected first when encoding, between the stereo and difference channels.

2: ) This subject peaked my interest into how OGG Compression deals with multi-channel sound. I did an experiment using “Audacity”, in which I prepared a 6-channel project, chose to export it in a custom channel-configuration, and then chose different settings in the channel-meta-data window.

While AC3 was ‘limited’ to allowing a maximum of 7 channels, OGG allows a compressed stream with up to 32 channels. But, I seem to have observed that when compressing more than 2 channels, OGG forgoes even joint stereo optimization, instead only compressing each channel individually. This seems to follow from the observation, that If I mix channels 3 and 4, assigned to a hypothetical front-center and LFE, I should have turned those two into a monaural signal repeated once. But doing so does not improve the OGG File size.

There was a 3m45s stream, which took 5.2MB as a 6-channel AC3. The same stream takes up 18.1MB as a 6-channel OGG. And these bit-rates result from choosing a rate of 192kbps for the AC3 File, while choosing ‘Quality Level 8/10′ for the OGG.

I think that one reason for the big difference in bit-rates, is the fact that my stream consisted of a stereo signal originally, of which there were merely 3 copies. The AC3 File takes advantage of the correlations to compress, while OGG is not as able to do so.

Further, I read somewhere that OGG takes the remarkable approach, to convert the stereo into joint stereo, after quantization (in the frequency domain), while MP3 does so before quantization. This makes the conversion which OGG performs, of a signal into stereo, a lossless process, and also seems to imply, encoding one sign bit with each coefficient of the difference-channel. Any advantage OGG gives to the bit-rate would need to stem, from the majority of low-amplitude coefficients in the difference-channel, as well as from limiting its frequencies.

By contrast, this would seem to suggest, that MP3 will compute an ‘FFT’ of each channel, also in order to determine the side-bits, after which it will compute a sum and a difference channel in the time-domain, and then compute the ‘DCT’ of each…


An Observation about Pro Logic Versus AC3

One question that people might ask, would be ‘Why is there still any interest in Pro Logic, when in the world today, we have AC3 sound compression?’ Beyond AC3, we also have AAC sound compression, which gets used in MP4 Video Files, or by itself, in M4A Audio Files.

The answer I would give is as follows. As long as our player or player application supports AC3, it will definitely be better able to output 6 channels of sound from such a compressed, digital stream.

But it can happen to us that our speaker amplifier only accepts two analog channels, which would have been called Left and Right. In such a case, If our speaker amp possesses a Pro Logic decoder, the player of our AC3-compressed stream still has the option, of Pro Logic encoding its stereo output.

In that case, our speaker amp will still try to decode that into surround sound, with as many speakers as we have connected to this amp.

But, If we do that, we are subjecting the sound to a loss in quality, because the sound has been collapsed into analog stereo first.

Yet, to substitute some other, Back-Front component for the surround channel, which is being fed to the Pro Logic decoder, does not really hurt the quality of the surround decoding more, than using Pro Logic already would. And so I would see no hesitation in doing so, if the need arises.