Exploring the Discrete Sine Transform…

I sometimes describe a way of using certain tools – in this case, one of the Discrete Cosine Transforms – which is correct in principle, but which has an underlying flaw in my first approximation of how it can be applied, and that flaw then needs to be corrected.

One of the things which I had said was possible was, to take a series of frequency-domain ‘equalizer settings’, which are spaced at one per unit of frequency rather than at so many per octave, to compute whichever DCT was relevant, such that the result had the lowest frequency as its first element, and then to apply that result as a convolution, in order finally to apply the computed equalizer to a signal.

One of the facts which I’m only realizing recently is, that if the result of the DCT is applied in a one-sided way, the results are ‘completely non-ideal’, because doing so gives no control over what the phase-shifts will be, at any frequency! Similarly, the sinc function cannot be applied as such a one-sided convolution either, because the amount of sine-wave output, in response to a cosine-wave input, will approach infinity when the frequency is exactly at the cutoff frequency.

What I have found instead is, that if such a cosine transform is mirrored around a centre-point, the amount of sine response, to an input cosine-wave, will cancel out and become zero, thus giving phase-shifts of zero.
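The cancellation can be checked numerically. What follows is only a minimal sketch, under assumptions that are mine: the equalizer settings are a short list named `gains`, bin k is assumed to sit at the angular frequency pi * (k + 0.5) / K, and the helper names `cosine_kernel` and `response` are hypothetical.

```python
import math

def cosine_kernel(gains, half_len):
    """One-sided kernel h[0..half_len], as a cosine series of the gains.
    Assumption: bin k sits at angular frequency pi * (k + 0.5) / K."""
    K = len(gains)
    return [sum(g * math.cos(math.pi * (k + 0.5) * n / K)
                for k, g in enumerate(gains)) / K
            for n in range(half_len + 1)]

def response(taps, offsets, omega):
    """Cosine and sine output amounts, for an input cos(omega * t)."""
    c = sum(h * math.cos(omega * n) for h, n in zip(taps, offsets))
    s = sum(h * math.sin(omega * n) for h, n in zip(taps, offsets))
    return c, s

gains = [1.0, 1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0]  # illustrative settings
M = 16
one_sided = cosine_kernel(gains, M)
one_sided_offsets = list(range(M + 1))

# Mirror around the centre-point, so that h[-n] == h[n].
mirrored = one_sided[:0:-1] + one_sided
mirrored_offsets = list(range(-M, M + 1))

omega = math.pi * 2.5 / len(gains)
c_one, s_one = response(one_sided, one_sided_offsets, omega)
c_mir, s_mir = response(mirrored, mirrored_offsets, omega)
print("one-sided:", c_one, s_one)
print("mirrored: ", c_mir, s_mir)
```

For the mirrored kernel, every term h[n] * sin(omega * n) is paired with an equal and opposite term at -n, so that the sine amount vanishes up to rounding, at any frequency tried.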

But a result which some people might like is, to be able to apply controlled phase-shifts, differently for each frequency, such that those people specify a cosine as well as a sine component, for an assumed input cosine-wave.

The way to accomplish that is, to add-in the corresponding (normalized) sine-transform, of the series of phase-shifted response values, and to observe that the sine-transform will actually be zero at the centre-point. Then, the thing to do is, to apply with a negated sign on the other side of the centre-point, the results which are applied positively on the first side.
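As a rough sketch of that construction, the kernel can be built as an even (cosine) part plus an odd (sine) part. The function name `phase_kernel`, the bin placement at pi * (k + 0.5) / K, and the example values are assumptions made for the sake of illustration.

```python
import math

def phase_kernel(cos_gains, sin_gains, half_len):
    """Return the even (cosine) and odd (sine) parts of a kernel
    over the offsets -half_len..half_len.
    Assumption: bin k sits at angular frequency pi * (k + 0.5) / K."""
    K = len(cos_gains)
    even, odd = [], []
    for n in range(-half_len, half_len + 1):
        even.append(sum(a * math.cos(math.pi * (k + 0.5) * n / K)
                        for k, a in enumerate(cos_gains)) / K)
        odd.append(sum(b * math.sin(math.pi * (k + 0.5) * n / K)
                       for k, b in enumerate(sin_gains)) / K)
    return even, odd

cos_gains = [1.0, 0.8, 0.5, 0.0]   # desired cosine components (illustrative)
sin_gains = [0.0, 0.3, 0.5, 0.0]   # desired sine components (illustrative)
M = 12
even, odd = phase_kernel(cos_gains, sin_gains, M)
kernel = [e + o for e, o in zip(even, odd)]

offsets = range(-M, M + 1)
omega = math.pi * 1.5 / len(cos_gains)
sine_out_total = sum(h * math.sin(omega * n) for h, n in zip(kernel, offsets))
sine_out_odd = sum(h * math.sin(omega * n) for h, n in zip(odd, offsets))
print(odd[M], sine_out_total, sine_out_odd)
```

The odd part is exactly zero at the centre-point, and takes opposite signs on either side of it. All of the sine output comes from that odd part, since the even part contributes none.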



I have carried out a certain experiment with the Computer Algebra System named “wxMaxima”, in order first to observe what happens if a set of equal, discrete frequency-coefficients belonging to a series is summed. And then, I plotted the result of the definite integral of the sine function, over a short interval. Just as, with the sinc function, the definite integral of the cosine function was (sin(x) – sin(0)) / x, the definite integral of the sine function will be (1 – cos(x)) / x. And, because the derivative of cos(x) is zero at (x = 0), this expression will actually approach zero as its limit there, in spite of the apparent division by zero, and be well-behaved.
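Written out, the two limits work as follows. The division by x is taken here to be the normalization by the length of the interval, which is my reading of the expressions above.

```latex
\frac{1}{x}\int_0^x \cos t \, dt = \frac{\sin x - \sin 0}{x} = \frac{\sin x}{x}
  \;\longrightarrow\; 1 \quad (x \to 0)

\frac{1}{x}\int_0^x \sin t \, dt = \frac{-\cos x - (-\cos 0)}{x} = \frac{1 - \cos x}{x}
  \;\longrightarrow\; -\left.\frac{d}{dx}\cos x\,\right|_{x=0} = \sin 0 = 0 \quad (x \to 0)
```

The second expression is just the difference quotient of -cos(x) at zero, which is why it stays finite.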



(Update 1/31/2021, 13h35: )

There is an underlying truth about Integrals in general, which people who studied Calculus 2 generally know, but, I have no right just to assume that any reader of my blog did so. There exist certain standard Integrals, which behave in the reverse way of how the standard Derivatives behave, just because ‘Integrals’ are ‘Antiderivatives’…

When one solves the Derivatives of certain trig functions repeatedly, one obtains the sequence:

sin(x) -> cos(x) -> -sin(x) -> -cos(x) -> sin(x)

Solving the Indefinite Integrals of the same trig functions yields the result:

sin(x) -> -cos(x) -> -sin(x) -> cos(x) -> sin(x)

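Hence, the Indefinite Integral of sin(x) is in fact -cos(x), and when the Definite Integral is taken from 0 to x, subtracting its value at the lower limit contributes: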

( -(-cos(0)) = +1 )

(End of Update, 1/31/2021, 13h35.)


(Updated 2/04/2021, 17h10…)


An Observation about the Modern Application of Sinc Filters

One of the facts which I have blogged about before, was that an important type of filter, which was essentially digital, except for its first implementations, was called a ‘Sinc Filter’. This filter was once presented as an ideal low-pass filter, that was also a brick-wall filter, meaning, that as the series was made longer, near-perfect cutoff was achieved.

Well, while the use of this filter in its original form has largely been deprecated, there is a modern version of it that has captured some popularity. The Sinc Filter is nowadays often combined with a ‘Kaiser Window’, and doing so accomplishes two goals:

  • The Kaiser Window puts an end to the series being an infinite series, which many coders had issues with,
  • It also makes possible Sinc Filters with cutoff-frequencies, that are not negative powers of two, times the Nyquist Frequency.

It has always been possible to design a Sinc Filter with 2x or 4x over-sampling, and in some frivolous examples, with 8x over-sampling. But if a Circuit Designer ever tried to design one, that has 4.3x over-sampling, for example, thereby resulting in a cutoff-frequency which is just lower than 1/4 the Nyquist Frequency, the sticky issue would always remain, as to what would take place with the last zero-crossing of the Sinc Function, furthest from the origin. Truncating the series at that point could create a mess in the resulting signal as well.

Because the Kaiser Windowing Function actually goes to zero gradually, it suppresses the zero-crossings of the Sinc Function that are farthest from the origin, without preventing the filter from working essentially as the Math of the Sinc Function would suggest.
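To make that concrete, here is a minimal sketch of a Kaiser-windowed Sinc Filter, with the cutoff placed at 1/4.3 of the Nyquist Frequency. The function names, the kernel length and the value of `beta` are my own illustrative choices, and the Bessel function is just summed from its power series.

```python
import math

def bessel_i0(x, terms=30):
    """Zeroth-order modified Bessel function, summed from its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def kaiser_sinc(cutoff, half_len, beta):
    """Low-pass taps for offsets -half_len..half_len.
    `cutoff` is a fraction of the Nyquist Frequency (0..1)."""
    taps = []
    for n in range(-half_len, half_len + 1):
        x = cutoff * n
        ideal = cutoff if n == 0 else cutoff * math.sin(math.pi * x) / (math.pi * x)
        r = n / half_len
        window = bessel_i0(beta * math.sqrt(1.0 - r * r)) / bessel_i0(beta)
        taps.append(ideal * window)
    return taps

HALF_LEN = 64                      # illustrative kernel half-length
taps = kaiser_sinc(1.0 / 4.3, HALF_LEN, beta=8.6)

def gain(omega):
    """Magnitude response at angular frequency omega (kernel is symmetric)."""
    return abs(sum(h * math.cos(omega * n)
                   for h, n in zip(taps, range(-HALF_LEN, HALF_LEN + 1))))

print(gain(0.0), gain(0.5 * math.pi), taps[0])
```

The outermost taps come out tiny, so that where exactly the Sinc Function happens to be in its cycle, at the ends of the kernel, stops mattering.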

Further, even Linux utilities such as ‘ffmpeg’, employ a Sinc Filter by default when resampling an audio stream, but with a Kaiser Window.

(Updated 8/06/2019, 15h35 … )


Some Trivia about Granules of Sound

One of the subjects which I’ve blogged about often, is the compression of sound, including Codecs which are based in the frequency-domain, rather than in the time-domain. What I’ve basically written is that in such cases, the time-domain samples of sound generate granules of frequency-domain coefficients, which are then in turn quantized. What tends to happen is that a new granule of sound is encoded every 576 time-domain samples, but that each time, a 1152-sample sampling window is used, and that due to the application of the “Modified Discrete Cosine Transform” (the ‘MDCT’), what amounts to all the odd coefficients of the Type 2 ‘DCT’ is encoded, resulting in 576 coefficients being encoded each time. The present sampling window’s cosine function corresponds to the previous and next sampling window’s sine function, so that in a way that is orthogonal, these overlapping sampling windows also have the potential to preserve phase-information.
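The overlapping scheme can be tried out on a small scale. The sketch below uses a hop of N = 8 samples in place of 576, a sine window, and a 2/N scale factor on the inverse transform; those three choices are assumptions of mine, meant only to show that windows of 2N samples, yielding N coefficients each, can still reconstruct the signal.

```python
import math

def mdct(block, window):
    """2N windowed samples in, N coefficients out."""
    N = len(block) // 2
    return [sum(window[n] * block[n] *
                math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, window):
    """N coefficients in, 2N windowed (and still aliased) samples out."""
    N = len(coeffs)
    return [window[n] * (2.0 / N) *
            sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for k in range(N))
            for n in range(2 * N)]

N = 8   # stands in for the 576-sample granule
window = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]
signal = [math.sin(0.37 * n) + 0.1 * (n % 5) for n in range(4 * N)]

# Overlap-add: each new block starts N samples after the previous one.
output = [0.0] * len(signal)
for start in range(0, len(signal) - 2 * N + 1, N):
    block = signal[start:start + 2 * N]
    restored = imdct(mdct(block, window), window)
    for n, value in enumerate(restored):
        output[start + n] += value

# Samples covered by two blocks are reconstructed; the edges are not.
worst = max(abs(output[n] - signal[n]) for n in range(N, 3 * N))
print(worst)
```

Each block on its own comes back aliased, and only the sum of two neighbouring blocks cancels that aliasing, which is one way to see why the windows must overlap.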

One observation which my readers may have about this, is the fact that while it does a good job at maintaining spectral resolution, this granule-size does not provide good temporal resolution. Therefore, a mechanism which MP3 compression introduced already, was ‘transient detection’. This feature can arbitrarily replace one of these full-length granules with 3 granules that only generate 192 frequency coefficients each, and that recur three times as frequently.

The method by which transients are detected may be simple. For example, the stream may tentatively be subdivided into these short granules all the time, but if any one of them contains more than the average variance – which corresponds to signal energy – for example, if one shorter granule contains 1.5 times the average signal energy of the current 3, then this switch can take place.
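A toy version of such a test might look like the following. It is only a sketch of the rule as I have just described it, not the method of any actual encoder, and the function name `wants_short_granules` is hypothetical.

```python
import math

def wants_short_granules(samples, ratio=1.5):
    """True if one third of the granule holds more than `ratio` times
    the average energy (variance) of the three thirds."""
    third = len(samples) // 3
    energies = []
    for i in range(3):
        part = samples[i * third:(i + 1) * third]
        mean = sum(part) / len(part)
        energies.append(sum((s - mean) ** 2 for s in part))
    average = sum(energies) / 3.0
    return average > 0.0 and max(energies) > ratio * average

steady = [math.sin(0.3 * n) for n in range(576)]
attack = [0.0] * 384 + [1.0, -1.0] * 96   # silence, then a sudden burst

print(wants_short_granules(steady), wants_short_granules(attack))
```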

What I do know is that when granules of sound – or rather, the quantized spectral information from granules of sound – are included in the stream, they include two extra bits each time, that define what the “Zone” of the present granule is. This can be one of four zones:

  • A full-sized granule belonging to a stream of them,
  • A shortened granule, belonging to a stream of them,
  • A shortened granule, that precedes a full-sized granule,
  • A shortened granule, that follows a full-sized granule.

Because it’s inherent in MP3 compression that the entire current sampling window must overlap, partially with the preceding, and partially with the following one, there may be no special rule for how to shape a sampling window, that corresponds to a long granule, both preceded and followed by shortened ones. However, when that happens, both the preceding and following shortened granules will be encoded, to be followed and preceded respectively, by a long granule, for which reason those granules will already have long overlap-portions. Therefore, the current granule in such a case can be encoded as though it was just part of a sequence of long granules.

This information is ultimately non-trivial because it also affects the computation of sampling windows, i.e., it also affects the exact windowing function to be used when encoding. If the granule is followed or preceded by short granules, then either side of the windowing function must also be shortened. (:1)
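One way in which a windowing function can be shortened on one side only is sketched below. The helper names and the sizes (a long hop of 18 and a short hop of 6) are illustrative assumptions, and the shape is just one that keeps the squared, overlapping halves summing to one.

```python
import math

def sine_half(length, rising):
    """One half of a sine window: rising towards 1, or falling from 1."""
    offset = 0 if rising else length
    return [math.sin(math.pi * (i + offset + 0.5) / (2 * length))
            for i in range(length)]

def transition_window(long_hop, short_hop, short_follows):
    """Window of 2 * long_hop samples, with one long and one shortened side.
    Assumes (long_hop - short_hop) is even."""
    pad = (long_hop - short_hop) // 2
    long_side = sine_half(long_hop, rising=True)
    short_side = [1.0] * pad + sine_half(short_hop, rising=False) + [0.0] * pad
    window = long_side + short_side
    return window if short_follows else window[::-1]

before_short = transition_window(18, 6, short_follows=True)
after_short = transition_window(18, 6, short_follows=False)
short_rise = sine_half(6, rising=True)   # rising side of a short window
print([round(w, 3) for w in before_short])
```

The shortened side stays flat at 1, falls over only as many samples as a short window rises, and is zero after that, so that it can overlap with a short window rather than with a long one.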

Now, in the case of other Codecs, such as ‘OGG Vorbis’, a similar approach is taken. But I can well imagine that if specific ideals were simply implemented exactly as they were with MP3 sound, then eventually, the owners of the MP3 Codec might cry foul, over software patent violations. And yet, this problem can easily be sidestepped, let’s say by deciding that the shortened granules be made 1/2 the length of the full-sized granule, instead of 1/3 that length. And at that point the implementation would be sufficiently different from the original idea, that it would no longer constitute a patent violation.


An Update about MP3-Compressed Sound

In many of my earlier postings, I stated what happens in MP3-compressed sound somewhat inaccurately. One reason is the fact that an overview requires that information be combined from numerous sources. While earlier Wikipedia articles tended to be quite incomplete on this subject, it happens that more-recent Wiki-coverage has become quite complete, yet still requires that users click deeper and deeper, into subjects such as the Type 4 Discrete Cosine Transform, the Modified Discrete Cosine Transform, and Polyphase Quadrature Filters.

What seems to happen with MP3 compression, which is formally known as MPEG-1 Audio Layer III (and which was later extended under MPEG-2), is that the Discrete Cosine Transform is not applied to the audio directly, but that rather, the audio stream is in fact first divided into 32 sub-bands, and that the MDCT is applied to each sub-band individually.
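The sub-band arithmetic can be checked against the granule sizes of 576 and 192 coefficients. The per-sub-band window lengths of 36 and 12 samples used below are an assumption that I am bringing in from the MP3 format, as they are not stated above.

```python
SUB_BANDS = 32
LONG_WINDOW = 36    # assumed samples per sub-band, per long window
SHORT_WINDOW = 12   # assumed samples per sub-band, per short window

# The MDCT yields half as many coefficients as its window has samples.
long_coeffs = SUB_BANDS * (LONG_WINDOW // 2)
short_coeffs = SUB_BANDS * (SHORT_WINDOW // 2)

print(long_coeffs)               # 576 coefficients per long granule
print(SUB_BANDS * LONG_WINDOW)   # 1152 samples spanned by the long window
print(short_coeffs)              # 192 coefficients per short granule
print(3 * short_coeffs)          # 576 again, for 3 short granules
```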

Actually, after the coefficients are computed, a specific filter is applied to them, to reduce the aliasing that happened, just because of the PQF Filter-bank.

I cannot be sure that this was always how MP3 was implemented, because if we take into account the fact that with PQF, every second sub-band is frequency-inverted, we may be able to obtain equivalent results just by performing the Discrete Cosine Transform which is needed, directly on the audio. But apparently, there is some advantage in subdividing the spectrum into its 32 sub-bands first.
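The frequency-inversion itself is easy to illustrate: negating every second sample of a sub-band mirrors its spectrum, so that a tone at the angular frequency w lands at (pi - w). This is only a sketch of that identity, with a hypothetical function name, and not a description of what any particular decoder does.

```python
import math

def invert_spectrum(samples):
    """Negate every second sample, which mirrors the spectrum
    around half the Nyquist Frequency."""
    return [-s if n % 2 else s for n, s in enumerate(samples)]

omega = 0.3 * math.pi
tone = [math.cos(omega * n) for n in range(64)]
mirrored_tone = [math.cos((math.pi - omega) * n) for n in range(64)]

inverted = invert_spectrum(tone)
worst = max(abs(a - b) for a, b in zip(inverted, mirrored_tone))
print(worst)
```

Applying the same negation a second time restores the original samples, so that an inverted sub-band can always be put back the right way around.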

One advantage could be, that doing so reduces the amount of computation required. Another advantage could be the reduction of round-off errors. Computing many smaller Fourier Transforms has generally accomplished both.

Also, if the spectrum is first subdivided in this way, it becomes easier to extract the parameters from each sub-band, that will determine how best to quantize its coefficients, or to cull ones either deemed to be inaudible, or aliased artifacts.
