## A single time-delay can also be expressed in the frequency-domain.

Another way to state, that a stream of time-domain samples has been given a time-delay, is simply to state that each frequency-coefficient has been given a phase-shift, that depends both on the frequency of the coefficient, and on the intended time-delay.

A concern that some readers might have with this, is the fact that a number of samples need to be stored, in order for a time-delay to be executed in the time-domain. But as soon as differing values for coefficients, for a Fourier Transform, are spaced closer together, indicating in this case a longer time-delay, its computation also requires that a longer interval of samples in the time-domain need to be combined.

Now, if the reader would like to visualize what this would look like, as a homology to a graphical equalizer, then he would need to imagine a graphical equalizer the sliders of which can be made negative – i.e. one that can command, that one frequency come out inverted – so that then, if he was to set his sliders into the accurate shape of a sine-wave that goes both positive and negative in its settings, he should obtain a simple time-delay.

But there is one more reason for which this homology would be flawed. The type of Fourier Transform which is best-suited for this, would be the Discrete Fourier Transform, not one of the Discrete Cosine Transforms. The reason is the fact that the DFT accepts complex numbers as its terms. And so the reader would also have to imagine, that his equalizer not only have sliders that move up and down, but sliders with little wheels on them, from which he can give a phase-shift to one frequency, without changing its amplitude. Obviously graphical equalizers for music are not made that way.

## The approximate Difference between a DFT and an FFT

Both the Discreet Fourier Transform and the Fast Fourier Transform produce complex-numbered coefficients, the non-zero amplitudes of which will represent frequency components in the signal. They both produce a more accurate measure of this property of the signal, than the Discreet Cosine Transforms do.

Without getting into rigorous Math,

If we have a 1024-sample interval in the time-domain, then the DFT of that simply computes the coefficients from 0 through to 1023, half-cycles. A frequency component present at one coefficient, let us say an even-numbered coefficient, will also have a non-zero effect on the adjacent, odd-numbered coefficients, which can therefore not be separated fully, by a Fourier Transform that defines both sets. A DFT will generally compute them all.

An FFT has as a premise, a specific number of coefficients per octave. That number could be (1), but seldom actually is. In general, an FFT will at first compute (2 * n) coefficients over the full sampling interval, will then fold the interval, and will then compute another (n) coefficients, and will fold the interval again, until the highest-frequency coefficient approaches 1/2 the number of time-domain samples in the last computed interval.

This will cause the higher-octave coefficients to be more spread out and less numerous, but because they are also being computed for successively shorter sampling intervals, they also become less selective, so that all the signal energy is eventually accounted for.

Also, with an FFT, it is usually the coefficients which correspond to the even-numbered ones in the DFT which are computed, again because one frequency component from the signal does not need to be accounted for twice. Thus, whole-numbers of cycles per sampling interval are usually computed.

For example, if we start with a 1024-sample interval in the time-domain, we may decide that we want to achieve (n = 4) coefficients per octave. We therefore compute 8 over the full interval, including (F = 0) but excluding (F = 8). Then we fold the interval down to 512 samples, and compute the coefficients from (F = 4) through (F = 7).

A frequency component that completes the 1024-sample interval 8 times, will complete the 512-sample interval 4 times, so that the second set of coefficients continues where the first left off. And then again, for a twice-folded interval of 256 samples, we compute from (F = 4) through (F = 7)…

After we have folded our original sampling interval 6 times, we are left with a 16-sample interval, which forms the end of our series, because (F = 8) would fit in exactly, into 16 samples. And, true to form, we omit the last coefficient, as we did with the DFT.

210  =  1024

10 – 6 = 4

24  =  16

So we would end up with

(1 * 8) + (6 * 4) =  32  Coefficients .

And this type of symmetry seemed relevant in this earlier posting.

Dirk

## Emphasizing a Presumed Difference between OGG and MP3 Sound Compression

In this posting from some time ago, I wrote down certain details I had learned about MP3 sound compression. I suppose that while I did write, that the Discreet Cosine Transform coefficients get scaled, I may have missed to mention in that same posting, that they also get quantized. But I did imply it, and I also made up for the omission in this posting.

But one subject which I did mention over several postings, was my own disagreement with the practice, of culling frequency-coefficients which are deemed inaudible, thus setting those to zero, just to reduce the bit-rate in one step, hoping to get better results, ‘because a lower initial bit-rate also means that the user can select a higher final bit-rate…’

In fact, I think that some technical observers have confused two separate processes that take place in MP3:

1. An audibility threshold is determined, so that coefficients which are lower than that are set to zero.
2. The non-zero coefficients are quantized, in such a way that the highest of them fits inside a fixed maximum, quantized value. Since a scale-factor is computed for one frequency sub-band, this also implies that close to strong frequency coefficients, weaker ones are just quantized more.

In principle, concept (1) above disagrees with me, while concept (2) seems perfectly fine.

And so based on that I also need to emphasize, that with MP3, first a Fast-Fourier Transform is computed, the exact implementation of which is not critical for the correct playback of the stream, but the only purpose of which is to determine audibility thresholds for the DCT transform coefficients, the frequency-sub-bands of which must fit the standard exactly, since the DCT is actually used to compress the sound, and then to play it back.

This FFT can serve a second purpose in Stereo. Since this transform is assumed to produce complex numbers – unlike the DCT – it is possible to determine whether the Left-Minus-Right channel correlates positively or negatively with the Left-Plus-Right channel, regarding their phase. The way to do this effectively, is to compute the dot-product between two complex numbers, and to see whether this dot-product is positive or negative. The imaginary component of one of the sources needs to be inverted for that to work.

But then negative or positive correlation can be recorded once for each sub-band of the DCT as one bit. This will tell, whether a positive difference-signal, is positive when the left channel is more so, or positive if the right channel is more so.

You see, in addition to the need to store this information, potentially with each coefficient, there is the need to measure this information somehow first.

But an alternative approach is possible, in which no initial FFT is computed, but in which only the DCT is computed, once for each Stereo channel. This might even have been done, to reduce the required coding effort. And in that case, the DCT would need to be computed for each channel separately, before a later encoding stage decides to store the sum and the difference for each coefficient. In that case, it is not possible first to determine, whether the time-domain streams correlate positively or negatively.

This would also imply, that close to strong frequency-components, the weaker ones are only quantized more, not culled.

So, partially because of what I read, and partially because of my own idea of how I might do things, I am hoping that OGG sound compression takes this latter approach.

Dirk

## Those Mysterious FFTs

One question I do not know the answer to, is why many Fourier Transforms are being named ‘FFTs’, which I feel should be named ‘DFTs’.

According to what I read, an FFT is supposed to compute a number of coefficients per octave, while a DFT is supposed to compute a number of them per unit frequency. This would improve the computation of FFTs, by folding them and computing fewer, high-frequency coefficients.

I am seeing Transforms named FFTs, that are still computing the full number of coefficients, per unit of frequency.

Dirk