## An Observation about Modifying Fourier Transforms

A concept which seems to exist, is that certain standard Fourier Transforms do not produce desired results, and that therefore, They must be modified for use with compressed sound.

What I have noticed is that often, when we modify a Fourier Transform, it only produces a special case of an existing standard Transform.

For example, we may start with a Type 4 Discrete Cosine Transform, that has a sampling interval of 576 elements, but want it to overlap 50%, therefore wanting to double the length of samples taken in, without doubling the number of Frequency-Domain samples output. One way to accomplish that is to adhere to the standard Math, but just to extend the array of input samples, and to allow the reference-waves to continue into the extension of the sampling interval, at unchanged frequencies.

Because the Type 4 applies a half-sample shift to its output elements as well as to its input elements, this is really equivalent to what we would obtain, if we were to compute a Type 2 Discrete Cosine Transform over a sampling interval of 1152 elements, but if we were only to keep the odd-numbered coefficients. All the output elements would count as odd-numbered ones then, after their index is doubled.

The only new information I really have on Frequency-Based sound-compression, is that there is an advantage gained, in storing the sign of each coefficient, notwithstanding.

(Edit 08/07/2017 : )

## An Update about MP3-Compressed Sound

In many of my earlier postings, I stated what happens in MP3-compressed sound somewhat inaccurately. One reason is the fact that an overview requires that information be combined from numerous sources. While earlier WiKiPedia articles tended to be quite incomplete on this subject, it happens that more-recent WiKi-coverage has become quite complete, yet still requires that users click deeper and deeper, into subjects such as the Type 4 Discrete Cosine Transform, the Modified Discrete Cosine Transform, and Polyphase Quadrature Filters.

What seems to happen with MP3 compression, which is also known as MPEG-2, Layer 3, is that the Discrete Cosine Transform is not applied to the audio directly, but that rather, the audio stream is divided down to 32 sub-bands in fact, and that the MDCT is applied to each sub-band individually.

Actually, after the coefficients are computed, a specific filter is applied to them, to reduce the aliasing that happened, just because of the PQF Filter-bank.

I cannot be sure that this was always how MP3 was implemented, because if we take into account the fact that with PQF, every second sub-band is frequency-inverted, we may be able to obtain equivalent results just by performing the Discrete Cosine Transform which is needed, directly on the audio. But apparently, there is some advantage in subdividing the spectrum into its 32 sub-bands first.

One advantage could be, that doing so reduces the amount of computation required. Another advantage could be the reduction of round-off errors. Computing many smaller Fourier Transforms has generally accomplished both.

Also, if the spectrum is first subdivided in this way, it becomes easier to extract the parameters from each sub-band, that will determine how best to quantize its coefficients, or to cull ones either deemed to be inaudible, or aliased artifacts.

## A single time-delay can also be expressed in the frequency-domain.

Another way to state, that a stream of time-domain samples has been given a time-delay, is simply to state that each frequency-coefficient has been given a phase-shift, that depends both on the frequency of the coefficient, and on the intended time-delay.

A concern that some readers might have with this, is the fact that a number of samples need to be stored, in order for a time-delay to be executed in the time-domain. But as soon as differing values for coefficients, for a Fourier Transform, are spaced closer together, indicating in this case a longer time-delay, its computation also requires that a longer interval of samples in the time-domain need to be combined.

Now, if the reader would like to visualize what this would look like, as a homology to a graphical equalizer, then he would need to imagine a graphical equalizer the sliders of which can be made negative – i.e. one that can command, that one frequency come out inverted – so that then, if he was to set his sliders into the accurate shape of a sine-wave that goes both positive and negative in its settings, he should obtain a simple time-delay.

But there is one more reason for which this homology would be flawed. The type of Fourier Transform which is best-suited for this, would be the Discrete Fourier Transform, not one of the Discrete Cosine Transforms. The reason is the fact that the DFT accepts complex numbers as its terms. And so the reader would also have to imagine, that his equalizer not only have sliders that move up and down, but sliders with little wheels on them, from which he can give a phase-shift to one frequency, without changing its amplitude. Obviously graphical equalizers for music are not made that way.

## An Observation about the Discrete Fourier Transforms

Discrete Fourier Transforms, including the Cosine Transforms, tend to have as many elements in the frequency-domain, as the sampling interval had in the time-domain.

Thus, if a sampling interval had 1024 samples, there will be as many frequency-coefficients, numbered from 0 to 1023 inclusively. One way in which these transforms differ from the FFT, is in the possibility of having a number of elements either way, that are not a power of 2. It is possible to have a discrete transform with 11 time-domain samples, that translate into as many frequency-coefficients, numbered from 0 to 10 inclusively.

If it was truly the project to compute an FFT that has one coefficient per octave, then we would include the Nyquist Frequency, which is usually not done. And in that case, we would also ask ourselves, whether the component at F=0 is best computed as the summation over the longest interval, where it would usually be computed, or whether it makes more sense then, just to fold the shortest interval, which consists of 2 samples, one more time, to arrive at 1 sample, the value of which corresponds to F=0 .

Now, if our discrete transform had the frequency-coefficients


G(n) = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}




Then the fact could be exploited that these transforms tend to act as their own inverse. Therefore I can know, that the same set of samples in the time-domain, would constitute a DC signal, which would therefore have the frequency-coefficients


F(n) = {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}




If this was taken to be a convolution again, because the discrete transforms are their own inverse, it would correspond to the function


F(n) · S(m) == S(m)




We would assume that multiplication begins with element (0) and not with element (10). So I have a hint, that maybe I am on the right track. But, because the DCT has an inverse which is not exactly the same, the inverse being the IDCT, the next question I would need to investigate, is whether indeed I should be using the DCT and not the IDCT, to turn an intended set of frequency-coefficients, into a working convolution. And to answer that question, the simple thought does not suffice.

The main advantage with the DCT would be, that we will never need to deal with complex values.

Dirk