## An observation about how the OGG Opus CODEC may do Stereo.

One of the subjects which I’ve written about before is the fact that the developers of the original OGG Vorbis form of music compression have more recently developed the OGG Opus CODEC, which is partially based on the CELT CODEC. And, in studying the manpage on how to use the ‘opusenc’ command (under Linux), I ran across the following detail:


    --no-phase-inv
        Disable use of phase inversion for intensity stereo. This trades
        some stereo quality for a higher quality mono downmix, and is
        useful when encoding stereo audio that is likely to be downmixed
        to mono after decoding.



What does this mean? Let me explain.

I should first preface that with an admission: an idea which was true for the original version of the Modified Discrete Cosine Transform, as introduced by MP3 compression and then reused frequently by other CODECs, may not always hold. That idea was that, when defining monaural sound, each frequency coefficient needed to be signed. Because CELT uses a form of the Type 4 Discrete Cosine Transform which is only partially lapped, it may be that all the coefficients are assumed to be positive.

This will work as long as there is no destructive interference between the same coefficient, in the overlapping region, from one frame to the next, in spite of the half-sample shift of each frequency-value. Also, a hypotenuse function should be avoided, as that would present itself as distortion. One explicit way to achieve this could be, to rotate the reference-waves (n)·90° + 45° for coefficient (n):

Where ‘FN’ refers to the current Frame-Number.

In general, modern compression schemes will subdivide the audible spectrum into sub-bands, which in the case of CELT are referred to as its Critical Bands. And for each frame, the way stereo is encoded for each critical band switches back and forth between X/Y intensity stereo and Mid/Side stereo, which is also just referred to as M/S stereo. What will happen with M/S stereo is that the (L-R) channel has its own spectral shape, independent of the (L+R) channel’s, while with X/Y stereo, there is only one spectral pattern, which is reproduced by a linear factor, as both the (L+R) component and the (L-R) component.
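To make the distinction concrete, here is a minimal sketch in Python of the two representations as I have described them. The function names (`ms_encode`, `ms_decode`, `xy_decode`) and the factors of one half are my own assumptions for illustration; this is not how the actual Opus code is organized.

```python
def ms_encode(left, right):
    """Mid/Side: the band keeps two independent spectral shapes."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def xy_decode(shape, gain_sum, gain_diff):
    """Intensity-style (assumed model): one shared spectral shape,
    reproduced by a linear factor as both the sum and the difference."""
    mid = [gain_sum * c for c in shape]
    side = [gain_diff * c for c in shape]
    return ms_decode(mid, side)


if __name__ == "__main__":
    left = [0.5, -0.25, 0.0, 0.75]
    right = [0.25, 0.25, 0.0, -0.5]
    mid, side = ms_encode(left, right)
    print(ms_decode(mid, side))            # reproduces left and right
    print(xy_decode([1.0, 0.5], 1.0, 0.5))  # both channels share one shape
```

In the M/S case, `mid` and `side` can differ in every coefficient, whereas in the X/Y case the two channels can only differ by how strongly the single shape is scaled.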

Even if the (L+R) is only being recorded as having positive DCT coefficients, with M/S stereo, the need persists for the (L-R) channel to be signed. Yet, even if M/S stereo is not taking place, implying that X/Y stereo is taking place, what can happen is that:

|L-R| > (L+R)

This would cause phase-inversion to take place between the two channels, (L) and (R). Apparently, the ‘--no-phase-inv’ setting will prevent this from happening.
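The condition above can be checked numerically. What follows is only a sketch of my own reading: the helper names are hypothetical, and `clamp_difference` is one conceivable way a setting could prevent the inversion, not something I have confirmed the encoder does.

```python
def channels_from_band(sum_amp, diff_amp):
    """Recover (L, R) for one coefficient from (L+R) and (L-R)."""
    left = (sum_amp + diff_amp) / 2
    right = (sum_amp - diff_amp) / 2
    return left, right


def is_phase_inverted(sum_amp, diff_amp):
    """True when L and R end up with opposite signs,
    which happens exactly when |L-R| > |L+R|."""
    left, right = channels_from_band(sum_amp, diff_amp)
    return left * right < 0


def clamp_difference(sum_amp, diff_amp):
    """Hypothetical remedy: limit |L-R| so it never exceeds |L+R|."""
    limit = abs(sum_amp)
    return max(-limit, min(limit, diff_amp))


if __name__ == "__main__":
    print(is_phase_inverted(1.0, 0.5))   # False: both channels positive
    print(is_phase_inverted(1.0, 1.5))   # True: R becomes negative
    print(is_phase_inverted(1.0, clamp_difference(1.0, 1.5)))  # False
```

Since L·R equals ((L+R)² − (L−R)²)/4, the sign test in `is_phase_inverted` is just the inequality above in another form.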

Further, because CELT has as its main feature, that it first states the amplitude of the critical band, and then a Code-Word which identifies the actual non-zero coefficients, which may only number 4, the setting may also affect critical bands for which M/S stereo is being used during any one frame. I’m not really sure if it does. But if it does, it will also make sure that the amplitude of the (L+R) critical band exceeds or equals that of the (L-R) critical band.
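The idea of stating the amplitude first, and the pattern of coefficients second, can be sketched as a gain/shape split. This is a simplified illustration under my own assumptions: the function names are hypothetical, and the real Code-Word is a compact index, not the list of normalized values used here.

```python
import math


def split_gain_shape(band):
    """Separate one critical band into its amplitude (gain)
    and a unit-length pattern of coefficients (shape)."""
    gain = math.sqrt(sum(c * c for c in band))
    if gain == 0.0:
        return 0.0, [0.0] * len(band)
    return gain, [c / gain for c in band]


def join_gain_shape(gain, shape):
    """Rebuild the band by scaling the shape with the gain."""
    return [gain * c for c in shape]


def sum_band_dominates(sum_band, diff_band):
    """Check whether the (L+R) band's amplitude is at least
    that of the (L-R) band, as speculated in the text."""
    gain_sum, _ = split_gain_shape(sum_band)
    gain_diff, _ = split_gain_shape(diff_band)
    return gain_sum >= gain_diff


if __name__ == "__main__":
    gain, shape = split_gain_shape([3.0, 0.0, 4.0, 0.0])
    print(gain, shape)
    print(join_gain_shape(gain, shape))
```

Because the amplitude is stored separately from the shape, a rule such as the one in `sum_band_dominates` could be enforced on the two amplitudes alone, without touching the coefficients.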

The way in which the CODEC decides, whether to encode the critical band using X/Y or M/S, for any one frame, is to detect the extent to which the non-zero coefficients coincide. If the majority of them do, encoding automatically switches to X/Y… Having said that, my own ideas on stereo perception are such that, if none of the coefficients coincide, it should not make any difference whether the specific coefficients belonging to the (L-R) channel are positive or negative. And finally, a feature which CELT could have enabled constantly, is to compute whether the (L-R) critical band correlates positively or negatively with the (L+R), independently of what the two amplitudes are. And this last observation suggests that even when encoding in M/S mode, the individual coefficients may not be signed.
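Both ideas in the paragraph above, the coincidence test and the sign of the correlation, are easy to sketch. The ratio, the 0.5 threshold and the function names are assumptions of mine, chosen only to illustrate "the majority of them coincide"; the encoder's real decision rule may well differ.

```python
def coincidence_ratio(sum_band, diff_band):
    """Fraction of non-zero positions that the two bands share."""
    nz_sum = {i for i, c in enumerate(sum_band) if c != 0}
    nz_diff = {i for i, c in enumerate(diff_band) if c != 0}
    union = nz_sum | nz_diff
    if not union:
        return 1.0  # two silent bands trivially coincide
    return len(nz_sum & nz_diff) / len(union)


def choose_mode(sum_band, diff_band, threshold=0.5):
    """Pick X/Y when most non-zero coefficients coincide (assumed rule)."""
    if coincidence_ratio(sum_band, diff_band) > threshold:
        return "X/Y"
    return "M/S"


def correlation_sign(sum_band, diff_band):
    """+1, -1 or 0: whether (L-R) correlates positively or
    negatively with (L+R), independently of their amplitudes."""
    dot = sum(a * b for a, b in zip(sum_band, diff_band))
    return (dot > 0) - (dot < 0)


if __name__ == "__main__":
    print(choose_mode([1.0, 0.0, 2.0, 0.0], [0.5, 0.0, 1.0, 0.0]))
    print(choose_mode([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]))
    print(correlation_sign([1.0, 0.0, 2.0, 0.0], [-0.5, 0.0, -1.0, 0.0]))
```

When none of the coefficients coincide, `correlation_sign` returns 0, which is the case where I would expect the signs of the (L-R) coefficients not to matter.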

(Update 10/03/2019, 9h30 … )

## Comparing two Bose headphones, both of which use active technology.

In this posting I’m going to do something I rarely do, which is, something like a product review. I have purchased the following two headphones within the past few months:

The first set of headphones has an analog 3.5mm stereo input cable, which has a dual-purpose Mike / Headphone Jack, and comes either compatible with Samsung, or with Apple phones, while the second uses Bluetooth to connect to either brand of phone. I should add that the phone I use with either set of headphones is a Samsung Galaxy S9, which supports Bluetooth 5.

The first set of headphones requires a single, AAA alkaline battery to work properly. And this not only fuels its active noise cancelling, but also an equalizer chip that has become standard with many similar middle-price-range headphones. The second has a built-in rechargeable Lithium-Ion Battery, which is rumoured to be good for 10-15 hours of play-time, which I have not yet tested. Like the first, the second has an equalizer chip, but no active noise cancellation.

I think that right off the bat I should point out, that I don’t approve of this use of an equalizer chip, effectively, to compensate for the sound oddities of the internal voice-coils. I think that more properly, the voice-coils should be designed to deliver the best frequency response possible, by themselves. But the reality in the year 2019 is, that many headphones come with an internal equalizer chip instead.

What I’ve found is that the first set of headphones, while having excellent noise cancellation, has two main drawbacks:

• The jack into which the analog cable fits, is poorly designed, and can cause bad connections,
• The single, AAA battery can only deliver a voltage of 1.5V, and if the actual voltage is any lower, either because a Ni-MH battery was used in place of an alkaline cell, or, because the battery is just plain low, the low-voltage equalizer chip will no longer work fully, resulting in sound that reveals the deficiencies in the voice-coil.

The second set of headphones overcomes both these limitations, and I fully expect that its equalizer chips will have uniform behaviour, that my ears will be able to adjust to in the long term, even when I use them for hours or days. Also, I’d tend to say that the way the equalizer arrangement worked in the first set of headphones was not complete in fulfilling its job, even when the battery was fully charged. Therefore, if I only had the money to buy one of the headphones, I’d choose the second set, which I just received today.

But, having said that, I should also add that I have two 12,000BTU air conditioners running in the Summer months, which really require the noise-cancellation of the first set of headphones, that the second set does not provide.

Also, I have an observation of why the EQ chip in the second set of headphones may work better than the similarly purposed chip in the first set…

(Updated 9/28/2019, 19h05 … )

## Is it valid that audio equipment from the 1970s sounds better than modern equipment?

That depends on which piece of audio equipment from the 1970s, is being compared with which piece of equipment from today.

If the equipment consists of a top-quality turntable from the late 1970s, compared to the most basic MP3-player from today, and if we assume for the moment that the type of sound file which is being played on the Portable Audio Player, is in fact an MP3 File recorded at a bit-rate of 128kbps, then the answer would be Yes. Top-quality turntables from the late 1970s were able to outperform that.

OTOH, if the audio equipment from today is a Digital Audio Player that boasts 24-bit sound, that also happens to be able to play MP3 Files, but that is in fact playing a FLAC File, then it becomes very difficult for even the better audio equipment from the 1970s to match that.

Top-Quality Audio Equipment from the late 1970s, would have cost over $1000 for one component, without taking into account, how many dollars that would have been equivalent to today. The type of Digital Audio Player I described cost me C$ 140.- plus shipping, plus handling, in 2018.

Also, there is a major distinction between any sort of equipment which is only meant to reproduce an Electronic signal, and equipment which is Electromechanical in nature, including speakers, headphones, phonographs… ‘The old Electromechanical technology’ was very good, except for the basic limitation that the engineers of the time could not design good bass-reflex speakers, which require computers to design well. With no bass-reflex speakers, the older generations tended to listen to stereo on bigger, expensive speakers. But their sound was good, with even bass.

## The Recent “OGG Opus” Codec

One of the uses which I’ve had for OGG Files has been, as a container-file for music, which has been compressed using the lossy “Vorbis” Codec. This has given me superior sound to what MP3 Files once delivered, assuming that I’ve set my Vorbis-encoded streams to a higher bit-rate than what most people set, that being 256kbps, or, Quality Level 8.

But the same people who invented the Vorbis Codec have embarked on a more recent project, which is called “OGG Opus”, which is a Codec that can switch back and forth seamlessly between a lossy, Linear Predictive Coding mode (“SILK”) and a mode based on the Type 4 Discrete Cosine Transform (‘DCT’), the latter of which will dominate when the Codec is used for high-fidelity music. This music-mode is defined by “The CELT Codec”, which has a detailed write-up dating from the year 2010 from its developers, that This Link points to.

I have read the write-up and offer an interpretation of it, which does not require as much technical comprehension, as the technical write-up itself requires, to be understood.

Essentially, the developers have made a radical departure from the approaches used previously, when compressing audio in the frequency domain. Only the least of the changes is, that shorter sampling windows are to be used, such as the 512-sample window which has been sketched, as well as a possible 256-sample window, which was mentioned as well. In return, both the even and odd coefficients of these sampling windows – aka Frames – are used, so that only very little overlap will exist between them. Hence, even though there will still be some overlap, these are mainly just Type 4 Discrete Cosine Transforms.
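For readers who have not met the Type 4 Discrete Cosine Transform, here is a plain, direct-summation sketch of it in Python. It leaves out the windowing and the small overlap between Frames entirely, and it is far slower than any real implementation; the function names are mine.

```python
import math


def dct4(samples):
    """Type 4 DCT: N samples in, N coefficients out.
    Both the sample index and the frequency index are
    offset by one half."""
    n = len(samples)
    return [
        sum(samples[j] * math.cos(math.pi / n * (j + 0.5) * (k + 0.5))
            for j in range(n))
        for k in range(n)
    ]


def idct4(coefficients):
    """The Type 4 DCT is its own inverse, up to a factor of 2/N."""
    n = len(coefficients)
    return [2.0 / n * v for v in dct4(coefficients)]


if __name__ == "__main__":
    frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
    spectrum = dct4(frame)
    restored = idct4(spectrum)
    print(max(abs(a - b) for a, b in zip(frame, restored)))  # tiny error
```

The point to notice is that a Frame of N samples yields N coefficients, all of which are kept, so that no second, heavily overlapping Frame is needed to recover the samples.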

The concept has been abandoned, that the Codec should reconstruct the spectral definition of the original sound as much as possible, minus the fact that it has to be simplified, in order to be represented with far fewer bits than the original sound was defined as having. A 44.1kHz, 16-bit, stereo, uncompressed Wave-File consumes about 1.4Mbps, while compressed bit-rates as low as 64kbps are achievable, and music will still sound decently like music. The emphasis here seems to be, that only the subjective perception of the sound is supposed to remain accurate.
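The figure of about 1.4Mbps follows directly from the format, as this short calculation shows. The function name is mine, and the Wave-File's header is ignored.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit-rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)            # 1411200 bits per second, about 1.4Mbps
print(cd_rate / 64_000)   # how many times smaller a 64kbps stream is
```

So a 64kbps stream has to get by with roughly one twenty-second of the bits in the original.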

(Updated 8/03/2019,16h00 … )