Readers who are not familiar with this subject might wonder why the subset of ‘Morphologies’ known as ‘Convolutions’ – i.e. ‘Linear Filters’ – is advantageous for filtering signals.
This is because even though such a static system of coefficients, applied constantly to input samples, will often produce spectral changes in the signal, it will not produce frequency components that were not present before. If new frequency components are produced, the result is referred to as ‘distortion’; otherwise, all we get are spectral errors – i.e. ‘coloration of the sound’. The latter type of error is gentler on the ear.
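To make this concrete, here is a minimal plain-Python sketch (the 3-tap kernel, the test tone, and the circular convolution are toy choices of mine, not from any real filter design): a linear filter rescales and phase-shifts a tone sitting in DFT bin k, but puts no energy into any other bin.

```python
import cmath, math

N = 64          # DFT length
k = 5           # the input tone sits exactly in bin k

# A cycle-aligned sine wave: its spectrum occupies bins k and N-k only.
x = [math.sin(2 * math.pi * k * n / N) for n in range(N)]

# A small FIR low-pass kernel, applied as a circular convolution
# so that the demonstration stays exact.
h = [0.25, 0.5, 0.25]
y = [sum(h[m] * x[(n - m) % N] for m in range(len(h))) for n in range(N)]

def dft_mag(s):
    """Naive DFT magnitudes, enough for a 64-point demonstration."""
    return [abs(sum(s[n] * cmath.exp(-2j * math.pi * j * n / N)
                    for n in range(N))) for j in range(N)]

Y = dft_mag(y)
# Energy remains confined to bins k and N-k: the filter scales the tone
# but creates no new frequency components.
leak = max(Y[j] for j in range(N) if j not in (k, N - k))
print(leak < 1e-9, Y[k] > 0)   # → True True
```

The same check with a memoryless nonlinearity in place of the filter would fail, which is the whole point of the distinction above.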
For this reason, the mere realization that certain polynomial approximations can be converted into systems that produce purely linear products of the input samples makes those approximations more interesting.
OTOH, if each sampling of a continuous polynomial curve happened at a random, irregular point in time – thus truly revealing it to be a polynomial – then additional errors would be introduced that might resemble ‘noise’, because they may not have deterministic frequencies with respect to the input.
And the fact that the output samples are generated at a frequency which is a multiple of the original sample rate also means that new frequency components will be generated, extending up to that same multiple.
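This can be illustrated with zero-stuffing, the usual first step of integer up-sampling. In this toy sketch (N, L and k are arbitrary values I chose), inserting L−1 zeros between samples makes images of the original tone appear at bins k, N−k, N+k, 2N−k, …, i.e. up to L times the old Nyquist limit:

```python
import cmath, math

N, L, k = 32, 3, 4   # original length, up-sampling factor, tone bin

x = [math.sin(2 * math.pi * k * n / N) for n in range(N)]

# Zero-stuffing: insert L-1 zeros between samples (this is what happens
# before any interpolation filter runs).
u = [x[n // L] if n % L == 0 else 0.0 for n in range(N * L)]

M = N * L
def mag(s, j):
    """Magnitude of one naive DFT bin of the length-M signal s."""
    return abs(sum(s[n] * cmath.exp(-2j * math.pi * j * n / M)
                   for n in range(M)))

# The original tone reappears as equal-strength images above the
# old Nyquist frequency, at bins N+k, 2N-k, and so on.
print(round(mag(u, k), 6), round(mag(u, N + k), 6),
      round(mag(u, 2 * N - k), 6))   # → 16.0 16.0 16.0
```

Removing those images is exactly the job of the interpolation filter that follows the zero-stuffing stage.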
In the case of digital signal processing, the most common type of distortion is ‘Aliasing’, while with analog methods it used to be ‘Total Harmonic Distortion’, followed by ‘Intermodulation Distortion’.
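Aliasing itself can be shown in a few lines; the sample rate and frequencies below are made-up values, chosen so the arithmetic works out exactly. Once sampled at 8000 Hz, a 5000 Hz tone produces the same samples as an inverted 3000 Hz tone:

```python
import math

fs = 8000              # sample rate in Hz
f_high = 5000          # above the 4000 Hz Nyquist frequency
f_alias = fs - f_high  # 3000 Hz: the frequency we would actually hear

hi = [math.sin(2 * math.pi * f_high * n / fs) for n in range(32)]
lo = [math.sin(2 * math.pi * f_alias * n / fs) for n in range(32)]

# Sample for sample, the 5000 Hz tone is indistinguishable from an
# inverted 3000 Hz tone: hi[n] == -lo[n] for every n.
match = max(abs(a + b) for a, b in zip(hi, lo))
print(match < 1e-9)   # → True
```

This is why an anti-aliasing filter has to run *before* sampling (or before decimation): once the samples exist, the two tones cannot be told apart.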
If we up-sample a digital stream and apply a filter that consistently underestimates the sub-sampled signal, then the resulting distortion will consist of unwanted modulations of the higher Nyquist Frequency.
In Human perception of sound, a pure sine wave represents purity, just as it does according to a Fourier Transform. When analog distortion is applied to a single input sine wave, that distortion generates Harmonics. For this reason, people tend to use the existence of excessive or inappropriate harmonics – frequency components at multiples of the fundamental – to judge whether a sound source is distorted.
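The harmonic-generating effect of a nonlinearity can be sketched as follows; the hard clipper stands in for generic analog distortion, and the levels are arbitrary. A symmetric clipper adds odd harmonics (3k, 5k, …) while leaving the even ones at zero:

```python
import cmath, math

N, k = 64, 2
x = [math.sin(2 * math.pi * k * n / N) for n in range(N)]

# A memoryless nonlinearity: hard-clip the waveform at +/- 0.5.
y = [max(-0.5, min(0.5, v)) for v in x]

def mag(s, j):
    """One naive DFT bin magnitude, normalized by N."""
    return abs(sum(s[n] * cmath.exp(-2j * math.pi * j * n / N)
                   for n in range(N))) / N

# The clean sine had energy at bin k only; the clipped version now
# carries a third harmonic, while the even harmonic at 2k stays zero
# because the clipping is symmetric.
print(round(mag(y, 3 * k), 3), round(mag(y, 2 * k), 12))
```

An asymmetric nonlinearity (say, clipping only the positive half) would populate the even harmonics as well, which is one reason different analog circuits distort with recognizably different ‘flavors’.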
This can also explain why, when surrounding people are able to hear the sound coming from an individual’s headphones, they often tend to hear ‘distorted, scratchy sounds’. When we listen to any sort of sound source or musical instrument that has ‘sound color / timbre’, this means the source already has an appropriate set of harmonics.
Typically, the ‘noise pollution’ given off by headphones only tends to change the coloration of sounds, in a way that strongly attenuates the mid-range frequencies – where the fundamentals that define musical notes lie.
What this does is leave the harmonics the individual was already listening to much stronger than they would normally be, relative to the amplitude of the fundamental, which has been strongly attenuated by the time it reaches the surrounding people.
This leaves the surrounding people unable to recognize any instruments or voices; instead, they perceive the now-excessive harmonics as distortion.
It is not strictly necessary for the user’s smart-phone to be distorting sound much, in order for surrounding people to perceive the sound given off by the headphones as distorted; that sound merely has an extremely colored spectrum.
Yet, there is also some tolerance with which the reverse can take place. Sometimes the harmonics actually being played back are excessive, but ‘because we are in tune with the music, we have flow’, and our hearing has its own version of ‘suspension of disbelief’, so that we think all we are hearing is proper timbre.
This is often because, when MP3 compresses sounds, it quantizes the harmonics just as it quantizes the fundamentals. So the harmonics can easily be over-represented or under-represented relative to the fundamentals, and, depending on what was deemed inaudible, the harmonics of our music will often be quantized more heavily than the mid-range frequencies were.
And so, on days when we are ‘off’, our MP3 music can sound scratchy even to us. This in turn aggravates how readily the accidental emissions of sound from our headphones will seem like distortion to the people around us.
To the best of my understanding, OGG makes many of the same assumptions that MP3 makes, including that the original sound can be passed through a Discrete Cosine Transform – which can be simplified to keeping only the even-numbered coefficients – and that these coefficients may be quantized, so that they may then be stored with a variable-length encoding.
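A toy version of that DCT-plus-quantization pipeline might look like this; the transform here is a plain DCT-II with its DCT-III inverse, and the quantization step size is an arbitrary choice of mine – real codecs use windowed, overlapped transforms and much more elaborate quantizers:

```python
import math

N = 16
# A toy signal made of two sinusoids.
x = [math.sin(2 * math.pi * 2 * n / N) + 0.3 * math.sin(2 * math.pi * 5 * n / N)
     for n in range(N)]

def dct(s):
    """DCT-II of a length-N list."""
    return [sum(s[n] * math.cos(math.pi * j * (2 * n + 1) / (2 * N))
                for n in range(N)) for j in range(N)]

def idct(C):
    """Inverse transform (DCT-III with the matching scaling)."""
    return [(C[0] / 2 + sum(C[j] * math.cos(math.pi * j * (2 * n + 1) / (2 * N))
                            for j in range(1, N))) * 2 / N for n in range(N)]

step = 0.5   # a coarser step means harsher quantization
q = [round(c / step) * step for c in dct(x)]   # quantize each coefficient
y = idct(q)

# The round trip is exact without quantization; with it, we get a small,
# bounded reconstruction error in exchange for cheaply encodable values.
err = max(abs(a - b) for a, b in zip(x, y))
print(err < 1.0)   # → True
```

Varying `step` per coefficient is the knob that lets an encoder spend fewer bits on components it considers less audible.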
But AFAIK, OGG makes fewer assumptions about the validity of ‘Psychoacoustic Masking’, and therefore applies it less, keeping as non-zero values more of the coefficients which MP3 would cut from the encoded file. This causes OGG to require more kbps to sound good initially, but to my taste, it also causes many of my own OGG Files to sound better in the end than my MP3 Files do.
But what my observations lead me to suspect is that, unlike MP3, OGG does not perform a Fast Fourier Transform first in order to compute the audibility thresholds for each sub-band. Instead, OGG may simply compute the DCT and end up quantizing some of the coefficients more than others. AFAIK, OGG also needs to store one scale factor per sub-band, but that scale factor can be computed entirely from the DCT.
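How a per-band scale factor could be computed entirely from the DCT might be sketched like this; the band layout, the max-magnitude scale rule, and the small integer quantizer are all my own guesses for illustration, not the actual Vorbis scheme:

```python
# Hypothetical sub-band layout: 16 DCT coefficients split into 4 bands
# of 4 coefficients each (made-up values with a typical decaying shape).
coeffs = [12.0, 7.5, -3.2, 0.8, 0.4, -0.3, 0.1, 0.05,
          0.02, -0.02, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0]
bands = [coeffs[i:i + 4] for i in range(0, 16, 4)]

quantized, scales = [], []
for band in bands:
    # One scale factor per band, derived from the band's own DCT values
    # (so no separate FFT analysis is required)...
    scale = max(abs(c) for c in band) or 1.0
    scales.append(scale)
    # ...then each coefficient is quantized relative to that scale,
    # into a small integer range suitable for variable-length coding.
    quantized.append([round(c / scale * 7) for c in band])

# Decoder side: undo the scaling.
decoded = [q * s / 7 for band_q, s in zip(quantized, scales) for q in band_q]
err = max(abs(a - b) for a, b in zip(coeffs, decoded))
print(len(scales), err < 1.0)   # → 4 True
```

The point of the sketch is only that the scale factors fall out of the transform coefficients themselves, with no second analysis pass needed.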
And then it would also no longer be possible to derive stereo information from the FFT – only from the DCT.