In This Posting, I suggested a way of using a Discreet Fourier Transform, which I suspect may be in use in sound compression techniques such as MP3, with the exception of the fact that I think MP3 uses sampling intervals of 1152 samples, while in theory I was suggesting 1024.
What I was suggesting, was that if the sampling intervals overlap by 50%, Because they only use the odd-numbered coefficients, each of them would analyze a unit vector, as part of a phase-vector diagram, which would have been 90 degrees out of phase with the previous. And every second sampling interval would also have a base vector, which is 180 degrees out of phase with the earlier one.
If the aim was, to preserve the phase-position of the sampled sound correctly, it might seem that all we need to do, is to preserve the sign of each coefficient, so that when the sampling intervals are reconstructed as overlapping, a wave will result, that has the correct phase angle, between being a ‘cosine’ and a ‘sine’ wave.
But there would be yet another problem with that, specifically in sound compression, if the codec is using customary psychoacoustic models.
By its nature, such a scheme would produce amplitudes which, in addition to requiring a sign bit to store, would be substantially different between even-numbered and odd-numbered sampling intervals, not because of time-based changes in the signal, but because they are orthogonal unit vectors.
An assumption that several sound compression schemes also make, is that If the amplitude of a certain frequency component was at say 100% at t=-1, Then at t=0 that same frequency component has an inherited ‘audibility threshold’ of maybe 80%, ? Thus, if the later frequency coefficient does exceed 80%, it will be deemed inaudible and suppressed. Hence, an entire ‘skirt’ of audibility thresholds tends to get stored, for all the coefficients, which not only surrounds peak amplitudes with slopes, but which additionally decays from one frame to the next.
Hence, even if our frames were intended to complement each other as being orthogonal, practical algorithms will nonetheless treat them as having the same meaning, but consecutive in time. And then, if one or the other is simply suppressed, our phase accuracy is gone again.
This thought was also the main reason for which I had suggested, that the current and the previous sampling interval should have their coefficients blended, to arrive at the coefficient for the current frame. And the blending method which I would have suggested, was not a linear summation, but to find the square root, of the sums, of the squares of the two coefficients.
As soon as anybody has done that, they have computed the absolute amplitude, and have destroyed all phase-information.
But there is an observation about surround-sound which comes as comforting for this subject. Methods that exist today, to decode stereo into 5.1 surround, only require phase-accuracy to within 90 degrees, as far as I know, to work properly. This would be due to the industrious way in which Pro Logic 1 and 2 were designed.
And so one type of information which could be added back in to the frequency coefficients, would be of whether the cosine and the sine function are each positive or negative, with respect to the origin of each frame. This will result in sound reproduction which is actually 45 degrees out of phase, from how it started, yet possessing 4 possible phase positions, that correspond to the 4 quadrants of the sine and cosine functions.
And this could even be encoded, simply by giving the coefficient a single sign-bit, with respect to each frame.
And this could cause some perception oddity when the weaker coefficients are played back, with an inaccurate phase position with respect to the dominant coefficients. Yet, a system is plausible, that at least states a phase position, for the stronger, dominant frequency components.
What I could also add, is that in a case where Coefficient 1 had an amplitude of 100%, and the audibility threshold of Coefficient 2 followed as being at 80%, their computation does not always require that these values be represented in decibels.
Obviously, in order for any psychoacoustic model to work, the initial research needs to reveal relationships in decibels. But if we can at least assume that Coefficient 2 was always a set number of decibels lower than Coefficient 1, even at odd numbers of decibels, this relationship can be converted into a fraction, which can be applied to amplitude units, instead of to decibel values.
And, if we are given a signed 16-bit amplitude, and would wish to multiply it by a fraction, which has also been prepared as an unsigned 15-bit value expressing values from 0 to 99%, then to perform an integer multiplication between these two will yield a 32-bit integer. Because we have right-shifted the value of one of our two integers, from the sense in which they usually express +32767 … -32768, we do have the option of next ignoring the least-significant word from our product, and using only the most-significant 16-bit word, to result in a fractional multiplication.
The same can be done with 8-bit integers.
Further, if we did not want our hypothetical scheme to introduce a constant 45 degree phase-shift, there would be a way to get rid of that, which would add some complexity at the encoding level.
For each pair of sampling intervals, it could be determined rather easily, which of thee two was the even-numbered, and which was the odd-numbered. Then, we could find whether the absolute of the sine or the absolute of the cosine component was greater, and record that as 1 bit. Finally, we would determine whether the given component was negative or not, and record that as another bit.
(Edit 05/23/2016 : ) Further, there is no need to encode both frames in MPEG-2, where one frame is stored in their place, as derived from both sampling intervals. Hence, the default pattern would be shortened to
[ (0,0), (1,0) ] .
(Edit 12/31/2016 : The default pattern would be shortened to [ (0,0), (0,1) ] .
When playing back the frames, the second granule of each could follow from the first, by default, in the following pattern:
+cos -> -sin -sin -> -cos -cos -> +sin +sin -> +cos
We would need access to the sine-counterpart, of the IDCT, to play this back.
End of Edit . )
But then we should also ask ourselves, what has truly been gained. A phase-position that lines up with an assumed cosine or an assumed sine vector, should be remembered as only being lined up, with the origin of each sampling window. But the exact timing of each sampling window is arbitrary, with respect to the input signal. There is really no reason to assume any exact correspondence, since the source of the signal is typically from somewhere else, than the provider of the codec.
And so in my opinion, to have all the reproduced waves consistently 45 degrees out of phase, only puts them that way, with respect to a sampling window whose timing is unknown. According to Pro Logic, Surround-Sound decoding, what should really matter, is whether two waves belonging to the same stream, are in-phase with each other, or out-of-phase, to whatever degree of accuracy can be achieved.
( This last concept, actually contradicts being able to reconstruct a waveform accurately, because a constant phase-shift is inconsistent with a constant time-delay, over a range of frequencies. When a complex waveform is actually time-shifted, so as to keep its shape, then this conversely implies different phase-shifts for its frequency components. )