Modern Consumer Sound Appreciation

Over recent months, I have been racking my brain, trying to answer questions I have, about how sound that was compressed in the frequency-domain, may or may not be able to preserve phase-information. This does not mean that I, personally, can hear phase-information, nor that specific MP3 Files I have been listening to, would even be good examples of how well modern MP3s compress sound. I suspect that in order to stay in business, the developers of MP3 have in fact been improving their codec, so that when played back correctly, the quality of MP3s will stay in line with more-recent formats that exist, such as OGG Vorbis…

But I think that people under-appreciate my intellectual point of view.

For many months and years, I had my doubts, that MP3 Files can in fact encode ± 180⁰ phase-shifts, i.e. a stereo-difference channel that has the correct polarity with respect to the stereo-sum channel, over a range of frequencies. What my own musings have only taught me in recent days, is that in fact, MP3 is capable of ± 180⁰ phase-separation.

Further, similar types of compression should be capable of better phase-separation than that, If their bit-rates are set high enough, that not too many of their frequency-coefficients get chopped down – according to what I have reasoned out today.

What I also know, is that the sound-formats AC3 and AAC have as an explicit feature, to store surround-sound. MPEG-2 Video Files more-or-less require the use of the AC3 codec for sound, and MP4 Files absolutely require the use of the AAC codec. And, stored in its compressed format, the surround-effect only requires ± 180⁰ phase-accuracy.

This subject is orthogonal to debate which exists, about whether it is of benefit to human listeners, to have sound reproduced at very high sample-rates, or at great bit-depths. Furthermore, I do not fully know what good a very high sample-rate – such as “192kHz” – is supposed to do any listener, if his sound has been MP3-compressed. As far as I am concerned, ultra-high sample-rates have to do with lossless compression, or no compression, which also happen to produce the same file-sizes at that signal-format.

What I did was just check, in what format iTunes downloads music by default. And it downloads its music in AAC Format. All this does for me, is corroborate a claim a friend of mine made, that he can hear his music with full positioning, since that is also the main feature of AAC, and not of MP3.

Continue reading Modern Consumer Sound Appreciation

Another Observation about MP3

An interesting fact about MP3 compression, is that the actual encoding procedure is not constant, but that the decoding procedure is. This means that many of the differences that have come to MP3 in recent years, are optimizations of the encoding, which when played back in an unchanging way, may give better results.

For example, I myself have often written, that to quantize the frequency coefficients is ‘good’, but that to cut them off because deemed inaudible is ‘bad’. Well, why not just set all the assumed audibility thresholds lower? It would not affect the actual decoding of the stream.



An Update about MP3-Compressed Sound

In many of my earlier postings, I stated what happens in MP3-compressed sound somewhat inaccurately. One reason is the fact that an overview requires that information be combined from numerous sources. While earlier WiKiPedia articles tended to be quite incomplete on this subject, it happens that more-recent WiKi-coverage has become quite complete, yet still requires that users click deeper and deeper, into subjects such as the Type 4 Discrete Cosine Transform, the Modified Discrete Cosine Transform, and Polyphase Quadrature Filters.

What seems to happen with MP3 compression, which is also known as MPEG-2, Layer 3, is that the Discrete Cosine Transform is not applied to the audio directly, but that rather, the audio stream is divided down to 32 sub-bands in fact, and that the MDCT is applied to each sub-band individually.

Actually, after the coefficients are computed, a specific filter is applied to them, to reduce the aliasing that happened, just because of the PQF Filter-bank.

I cannot be sure that this was always how MP3 was implemented, because if we take into account the fact that with PQF, every second sub-band is frequency-inverted, we may be able to obtain equivalent results just by performing the Discrete Cosine Transform which is needed, directly on the audio. But apparently, there is some advantage in subdividing the spectrum into its 32 sub-bands first.

One advantage could be, that doing so reduces the amount of computation required. Another advantage could be the reduction of round-off errors. Computing many smaller Fourier Transforms has generally accomplished both.

Also, if the spectrum is first subdivided in this way, it becomes easier to extract the parameters from each sub-band, that will determine how best to quantize its coefficients, or to cull ones either deemed to be inaudible, or aliased artifacts.

Continue reading An Update about MP3-Compressed Sound

The approximate Difference between a DFT and an FFT

Both the Discreet Fourier Transform and the Fast Fourier Transform produce complex-numbered coefficients, the non-zero amplitudes of which will represent frequency components in the signal. They both produce a more accurate measure of this property of the signal, than the Discreet Cosine Transforms do.

Without getting into rigorous Math,

If we have a 1024-sample interval in the time-domain, then the DFT of that simply computes the coefficients from 0 through to 1023, half-cycles. A frequency component present at one coefficient, let us say an even-numbered coefficient, will also have a non-zero effect on the adjacent, odd-numbered coefficients, which can therefore not be separated fully, by a Fourier Transform that defines both sets. A DFT will generally compute them all.

An FFT has as a premise, a specific number of coefficients per octave. That number could be (1), but seldom actually is. In general, an FFT will at first compute (2 * n) coefficients over the full sampling interval, will then fold the interval, and will then compute another (n) coefficients, and will fold the interval again, until the highest-frequency coefficient approaches 1/2 the number of time-domain samples in the last computed interval.

This will cause the higher-octave coefficients to be more spread out and less numerous, but because they are also being computed for successively shorter sampling intervals, they also become less selective, so that all the signal energy is eventually accounted for.

Also, with an FFT, it is usually the coefficients which correspond to the even-numbered ones in the DFT which are computed, again because one frequency component from the signal does not need to be accounted for twice. Thus, whole-numbers of cycles per sampling interval are usually computed.

For example, if we start with a 1024-sample interval in the time-domain, we may decide that we want to achieve (n = 4) coefficients per octave. We therefore compute 8 over the full interval, including (F = 0) but excluding (F = 8). Then we fold the interval down to 512 samples, and compute the coefficients from (F = 4) through (F = 7).

A frequency component that completes the 1024-sample interval 8 times, will complete the 512-sample interval 4 times, so that the second set of coefficients continues where the first left off. And then again, for a twice-folded interval of 256 samples, we compute from (F = 4) through (F = 7)…


After we have folded our original sampling interval 6 times, we are left with a 16-sample interval, which forms the end of our series, because (F = 8) would fit in exactly, into 16 samples. And, true to form, we omit the last coefficient, as we did with the DFT.

210  =  1024

10 – 6 = 4

24  =  16

So we would end up with

(1 * 8) + (6 * 4) =  32  Coefficients .

And this type of symmetry seemed relevant in this earlier posting.


Continue reading The approximate Difference between a DFT and an FFT