Hypothetically, how an FFT-based equalizer can be programmed.

One of the things I only recently posted about was that I had activated an equalizer function which was once integrated into the PulseAudio sound server, but which, in more recent versions of Debian Linux, may be installed with additional packages. As I wrote, activating this under Debian 8 / Jessie was a bit problematic at first, but could ultimately be accomplished. The following is what the controls of this equalizer look like on the screen:

[Image: Equalizer_1]

And, this is what the newly-created ‘sink’ is named, within the (old) KDE-4 desktop manager’s Settings Panel:

[Image: Equalizer_2]

What struck me as remarkable about this was its naming, as an “FFT-based Equalizer…”. I had written an earlier posting, about How the Fast Fourier Transform differs from the Discrete Fourier Transform. And, because I tend to think first about how convolutions may be computed using a Discrete Cosine Transform, it took me a bit of thought to comprehend how an equalizer function could be implemented, based on the FFT.

BTW, that earlier posting which I linked to above has a major flaw: a guess on my part about how MP3 sound compression works, which makes a false assumption. I have made more recent postings on how sound-compression schemes work, which no longer make the same false assumption. But otherwise, that old posting still explains what the difference between the FFT and other, Discrete Transforms is.

So, the question which may go through some readers’ minds, as it went through mine, would be how a graphic equalizer based on the FFT can be made maximally efficient, computationally. Obviously, when the FFT is only being used to analyze a sampling interval, what results is a (small) number of frequency coefficients, spaced approximately uniformly over a series of octaves. Apparently, such a set of coefficients-as-output needs to be replaced by one stream each, isolating one frequency component. Each stream then needs to be multiplied by an equalizer setting, before being mixed into the combined equalizer output.

I think that one way to compute that would be to replace the ‘folding’ operation normally used in Fourier Analysis with a procedure that only computes one or more product-sums, of the input signal with reference sine-waves; but in each case except for the lowest frequency, over only a small fraction of the entire buffer, a fraction which becomes shorter according to powers of 2.

Thus, it should remain constant that, in order for the equalizer to isolate the frequency of ~31Hz, a sine-product with a buffer of 1408 samples needs to be computed, once per input sample. But beyond that, determining the ~63Hz frequency component really only requires that the sine-product be computed with the most recent 704 samples of the same buffer. Frequency components belonging to even higher octaves can all be computed as per-input-sample sine-products with the most recent 352 input samples (for multiples of ~125Hz), etc.. Eventually, as the frequency components start to become odd products of an octave, an interval of 176 input samples can be used for the frequency components belonging to the same octave, thus yielding the ~500Hz and ~750Hz components… After that, in order to filter out the ~1kHz and the ~1.5kHz components, a section of the buffer only 88 samples long can be used…
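What follows is a minimal sketch of that scheme, in Python with NumPy, assuming a 44.1kHz sample-rate and the buffer lengths just named. The function name band_component and the demonstration buffer are my own inventions, not anything taken from PulseAudio:

    import numpy as np

    RATE = 44100                 # assumed sample-rate

    def band_component(buf, length, F):
        """Product-sum of the most recent `length` samples of `buf` with a
        reference sine-wave completing F cycles over that interval,
        normalized by the number of samples used.  This would be
        recomputed once per input sample."""
        seg = buf[-length:]
        n = np.arange(length)
        ref = np.sin(2.0 * np.pi * n * F / length)
        return np.dot(seg, ref) / length

    buf = np.random.randn(1408)  # the shared history; newest samples last

    c31  = band_component(buf, 1408, 1)   # ~31Hz: the whole buffer
    c63  = band_component(buf,  704, 1)   # ~63Hz: most recent half
    c125 = band_component(buf,  352, 1)   # ~125Hz: most recent quarter
    c500 = band_component(buf,  176, 2)   # ~500Hz (F=2 over 176 samples)
    c750 = band_component(buf,  176, 3)   # ~750Hz (F=3 over the same 176)
    c1k  = band_component(buf,   88, 2)   # ~1kHz
    c1k5 = band_component(buf,   88, 3)   # ~1.5kHz

Each of these components would then be scaled by its slider’s gain, before being mixed back into the output.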

Mind you, one alternative to doing all that would be to apply a convolution of fixed length to the input stream constantly, but to recompute that convolution by first interpolating frequency coefficients between the GUI’s slider positions, and then applying one of the Discrete Cosine Transforms to the resulting set of coefficients. The advantage to using a DCT in this way would be that the coefficients would only need to be recomputed once, every time the user changes the slider positions. But then, to name the resulting equalizer an ‘FFT-based’ equalizer would actually be false.
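That alternative could look roughly like the following sketch, which assumes the slider gains have already been interpolated onto a uniform grid of frequency coefficients. The explicit cosine sum is a DCT-III-style synthesis; a real implementation might well choose a different DCT variant or kernel length:

    import numpy as np

    def kernel_from_gains(gains):
        """Turn per-frequency gain coefficients (DC first) into a fixed
        FIR kernel, via an explicit inverse-DCT-style cosine sum.  Only
        recomputed when a slider actually moves."""
        K = len(gains)
        N = 2 * K                            # chosen kernel length
        n = np.arange(N)
        h = np.full(N, gains[0] / N)         # DC term; kernel's DC gain
        for k in range(1, K):                #   ends up equal to gains[0]
            h += (2.0 / N) * gains[k] * np.cos(np.pi * k * (2*n + 1) / (2*N))
        return h

    gains = np.ones(16)                      # flat response, a trivial test
    h = kernel_from_gains(gains)             # approximates a (band-limited)
                                             #   unit impulse
    # The per-sample work is then one fixed convolution of the input with h.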

(Updated 6/23/2020, 18h50… )

(As of 6/20/2020, 19h00: )

I look at this equalizer configuration as a 14-band one, because I don’t tend to view the slider that has been labelled “Coda” as being particularly valid, unless I’m misinterpreting what it does, which I think is to give an extra boost to the actual Nyquist Frequency.

Contrary to that, what the DC slider and the “31Hz” slider do should be orthogonal, and they count as two settings.

The Nyquist Frequency needs to be regarded with some suspicion in signal processing, as bad things happen at that frequency. And in the case of this equalizer, that frequency is already being passed to a partial degree, by the “16kHz” setting.

Because the speakers on the computer in question are additionally lousy, trying to boost the actual 22.05kHz content can only result in more distortion, or might even damage them.


 

What I’ve seen written elsewhere is that, if I merely drag that window to become wider, the equalizer will reconfigure itself to have more bands. But I see no urgent need to test that for the moment.


 

When it comes to the actual question of computing efficiency, I suppose a more pertinent question would be of the form: ‘Should actual sine functions be computed, for each of the product-sums with a reference sine-wave? If so, would that not impact efficiency severely?’

  • But that very question has had a known answer since the beginning of the Computing Era. What one would do is precompute a sine-function table once, when the software initializes itself… (A sketch follows this list.)
  • As long as the shorter sine-function tables are powers of (1/2) times the length of the first, they only need to sample 1 sine-value out of 2, then 1 out of 4, then 1 out of 8, etc., from that first table.
  • As soon as the table of sine-function values would become 6 samples long, for the multiples of ~8kHz, instead of 11 samples long, for the multiples of ~4kHz, the problem would also become slightly more involved…
  • Visible slider positions corresponding to 8kHz and 16kHz could actually be implemented as hidden, interpolated slider positions for 8, 12, 16 and 20kHz, again using the 11-sample table of sine values.
  • Rather than only mixing a product of the Nyquist Frequency back in, the function of the visible slider labelled “Coda” could additionally be to define how a 20kHz position is to be interpolated, when it is not being displayed.
  • Even in the early days of Computing, the fact was recognized that the first derivative of the sine function is the cosine function, which in turn can be found by looking up the sine function differently. Further, its second derivative is the negative of the sine function…
  • Therefore, a first-order interpolation between any 2 positions of the sine-value table can be computed, by looking up values from the table twice.
  • Further, a second-order interpolation can be computed as a quadratic spline, the endpoints of which are two adjacent look-ups in the table, and the position of the handle of which corresponds to (the sum of the endpoints, minus (the second derivative, times the square of the increment in radians, divided by two)), all divided by two. This is still cheaper than to compute the sine function each time it’s to be used.
  • (A separate table of 6 values could also be precomputed.)
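Here is the sketch promised above, under assumptions of my own: a 1408-entry master table covering one full cycle, strided views of it for the higher octaves, and a spline handle chosen so that the parabola’s curvature agrees with the sine’s second derivative:

    import numpy as np

    TABLE_LEN = 1408                       # hypothetical master table size
    TABLE = np.sin(2.0 * np.pi * np.arange(TABLE_LEN) / TABLE_LEN)
    DX = 2.0 * np.pi / TABLE_LEN           # increment in radians

    def sine_lookup(x):
        """Quadratic-spline estimate of sin() at fractional table index
        x >= 0, using only table reads -- no trig call at runtime."""
        i0 = int(x) % TABLE_LEN
        i1 = (i0 + 1) % TABLE_LEN
        t = x - int(x)                     # fraction within the step
        p0, p1 = TABLE[i0], TABLE[i1]
        d2 = -TABLE[i0]                    # second derivative of the sine
        ctrl = (p0 + p1 - d2 * DX * DX / 2.0) / 2.0   # the spline handle
        return (1-t)**2 * p0 + 2*t*(1-t)*ctrl + t*t*p1

    # The shorter, per-octave tables are just strided views of the first:
    octave1 = TABLE[::2]                   # 1 sine-value out of 2
    octave2 = TABLE[::4]                   # 1 out of 4, etc.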

 

Another possible question this equalizer poses would be: ‘How is the DC level being implemented differently from the 31.25Hz level?’ But even that question leads to one answer out of two, based entirely on theory:

  • There is an ‘HQ’ way to do it, that actually doubles the computational cost. Or,
  • There is an ‘LQ’ way to do it, that will cause sound artifacts when transients are input, but that will not increase computational cost by much.

If the HQ approach is to be assumed, then what the DC component needs is a windowing function. If the LQ approach is to be used – a mere average of the sample-buffer – then, as soon as an input transient takes place while the DC level has been set high, that transient will seem to have an echo. (:2)

A suitable windowing function could just be a half-sine-wave that spans the sample-buffer symmetrically, while the 31.25Hz component is defined by a full sine-wave, asymmetrically.
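As a sketch of the difference, under the same hypothetical 1408-sample buffer (the normalizations are only meant to make the three results comparable):

    import numpy as np

    N = 1408
    buf = np.random.randn(N)                    # current sample-buffer

    # 'LQ' DC estimate: a plain average over the buffer.
    dc_lq = np.sum(buf) / N

    # 'HQ' DC estimate: weighted by a half-sine window that spans the
    # buffer symmetrically (zero at both ends, peak in the middle).
    window = np.sin(np.pi * np.arange(N) / N)
    dc_hq = np.dot(buf, window) / np.sum(window)

    # The ~31Hz component, by contrast, uses one full, asymmetric cycle:
    ref31 = np.sin(2.0 * np.pi * np.arange(N) / N)
    c31 = np.dot(buf, ref31) / N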


 

Another issue would result, if the product of a discrete series of input samples was just multiplied by a reference sine-wave: at low frequencies this works fine, while, as the frequency gets close to the Nyquist Frequency, the sine function has the property of generating values close to zero.

(Update 6/21/2020, 11h45:

The reason this will happen is the fact that, at the Nyquist Frequency, sampling the sine function at discrete points in time will tend ‘to catch it exactly at a zero-crossing each time’.

If the first 11 samples of the buffer were being used, and occupied by a single sine-wave, then by default, with n ∈ [0 .. 10] and F = 1, sin(2πnF/11) would be computed. Just as is frequently done with cosine transforms, a half-sample shift would need to be added to this index, resulting in generated values of sin(π(2n+1)F/11).

Then, given a sine-product with a sine-wave that completes (F=5) cycles over the same interval, that reference sine-wave would become [sin(π5/11), sin(π15/11), .. sin(π105/11)]. So, the sine function would begin near full amplitude and would end near full amplitude, but would generate several low amplitudes in the middle of the interval. When (n=5), the value actually computed for the sine function would become sin(π55/11), which is exactly zero. This should agree well with the additional assumption, that such products tend to have ½ amplitude over the entire interval. ) (:1)
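That claim is easy to verify numerically; the following few lines are only a check of the arithmetic, not part of any equalizer:

    import numpy as np

    n = np.arange(11)
    F = 5
    ref = np.sin(np.pi * (2*n + 1) * F / 11)   # half-sample-shifted reference
    print(np.round(ref, 3))
    # [ 0.99  -0.91   0.756 -0.541  0.282  0.    -0.282  0.541 -0.756
    #   0.91  -0.99 ]  -- near full amplitude at both ends, zero at n = 5
    print(np.mean(ref ** 2))                   # ~0.5, the assumed amplitude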

 


 

(Update 6/21/2020, 22h10: )

That last bit of Math might strike the reader as somewhat obscure, so what I would also offer is to explain it in a less abstract way.

If the assumption of an ‘FFT-based’ filter is that only one table of sine-values has been precomputed, then it also follows that the index into this table consists of integers. But, for index (i), the assumption would be that the sine-value stored is actually sin(πi/352), because in my hypothetical version of such an equalizer, the table would actually hold (704) elements, numbered from [0 .. 703].

This would enable the DC signal component to receive an accurate windowing function, in the form of a half-sine-wave, without any half-sample shift, just by reading the first (352) samples from this table.

For the other look-ups into the table, the actual implementation would ignore the fact that (2π) was ever a part of the argument to the sine function. Instead, the index to be looked up in the table would be:

i = (2n + 1) · 2^Octave · F mod 704

Where (n) corresponds to the element of the sample-buffer which is to be multiplied by the sine function, and (F) corresponds to the multiple of that octave’s base frequency that is to be filtered out, in such a way that (F) may become an odd number of sine-waves over the given interval of the sample-buffer…


 

Just to keep my Hypothetical Equalizer Mathematically simple, I’m also assuming that:

Octave ∈ [0 .. 5]

Such that, when (Octave = 0), 352 samples are actually being used from the sample buffer, in order to filter out the ~125Hz frequency component; and, when (Octave = 5), only 11 samples are being used from the sample buffer, so that multiples of ~4kHz are being filtered out, from multiple (F = 2) through multiple (F = 5), where a frequency of ~20kHz starts to get close to the Nyquist Frequency, and actually requires the half-sample shift in order to be computed correctly.
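Pulling those pieces together, a sketch of the table and the look-up, again under my own hypothetical naming:

    import numpy as np

    TABLE = np.sin(np.pi * np.arange(704) / 352)   # sin(pi*i/352), i in [0..703]

    def ref_value(n, octave, F):
        """Reference sine-value for buffer element n, using the index
        i = (2n + 1) * 2^Octave * F, mod 704."""
        i = ((2*n + 1) * (1 << octave) * F) % 704
        return TABLE[i]

    dc_window = TABLE[:352]          # half-sine window for DC, no shift

    # Octave 5 uses only 11 samples; F = 5 then filters out ~20kHz:
    refs_20k = [ref_value(n, 5, 5) for n in range(11)]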

Obviously, the real implementation may differ considerably. But such implementations are not Linux-specific, and it’s generally good to understand what FFT-based analyses of sampling intervals do differently from DCT-based analyses.


 

Whatever the exact implementation details are, when I look at the GUI shown at the beginning of this posting (of the equalizer that I just activated, and that was obviously programmed by other people), I can’t help but notice a certain pattern in the frequencies assigned to the sliders. That is, once the base frequencies of the octaves reached 125Hz, the equalizer seems to offer (2) frequencies per octave, visible as 2x125Hz, 3x125Hz, 2x250Hz, 3x250Hz, 2x500Hz, 3x500Hz… But if that pattern were to continue, then what is conspicuously missing are 3x1kHz and 3x2kHz. Further, my posting has argued that ‘greater Mathematical simplicity’ could be achieved, if there were also settings for 3x4kHz and 5x4kHz.


1:)

What the bulk of this posting states is that a very specific, yet hypothetical, system could be implemented, which includes an explicit definition for the (5x4kHz = 20kHz) frequency component. A valid question which the reader could ask would be: ‘Why would the need exist to define the level at which the Nyquist Frequency is transmitted, separately from this hidden 20kHz setting; that is, if the Nyquist Frequency is to be transmitted at all?’

The Nyquist Frequency is the frequency at which a simple, alternating pattern of [+1, -1, +1, …] oscillates; such a pattern may be present to some degree in the input, and, if the sampling rate is the standard 44.1kHz, the Nyquist Frequency lies at 22.05kHz.

What my earlier paragraph states is that a half-sample shift can be applied to the reference sine function, so that, when filtering out the 20kHz component, maximum amplitude is achieved at the beginning of the sampling interval, as well as at the end. This sampling interval would have 11 samples, and yet see 5 complete cycles, in a way that generates minimum amplitude near its midpoint. Well, what follows from that is that the parity with which this reference sine-wave oscillates (no longer discernible as an actual sine function) is opposite at the two endpoints. Therefore, if a simple alternation did appear at the input, its product with the reference pattern would mainly cancel out.

Therefore, if that was a desired part of the signal, it would also need to be added back in. Fortunately, the convolution which would do this, and do it with an amplitude consistent with all the other amplitudes, would simply be an even number of elements following [+0.5, -0.5, +0.5, …] (maybe, 10 such elements). So, it is possible that the setting which was actually labelled “Coda” adjusts that frequency component separately.
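Both halves of that argument can be checked in a few lines (the name coda_ref is mine):

    import numpy as np

    n = np.arange(11)
    ref20k = np.sin(np.pi * (2*n + 1) * 5 / 11)  # the shifted 20kHz reference
    alt = (-1.0) ** n                            # a pure Nyquist alternation
    print(np.dot(alt, ref20k))                   # ~0: the product cancels

    # Hence the separate, even-length alternating reference, which the
    # same alternation matches with the assumed amplitude of 1/2:
    coda_ref = 0.5 * np.tile([+1.0, -1.0], 5)    # 10 elements
    print(np.dot(alt[:10], coda_ref) / 10)       # 0.5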


2:)

The question of whether a windowing function is actually necessary, in order to invent a DC channel, eventually boils down to the following assumption:

  • The way in which the product-sums are generally normalized is that they are divided by the number of elements from the sample-buffer that are being used, to arrive at an assumed amplitude of ½.

Hence, if 1408 samples are in fact being averaged (arriving at an assumed amplitude of 1), then, effectively, the amplitude of a short-term / high-frequency transient is also being divided by 1408. For that reason, even though windowing functions are more proper, the situation required to make the sound artifact noticeable would also be extreme. It might in that case seem better to halve the computational cost, instead of computing yet another 1408-element product-sum.

However, if the buffer ‘only’ holds 352 samples, as stated, then it makes more sense to apply a windowing function.
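The arithmetic behind that trade-off, as a quick check of the numbers only:

    import numpy as np

    N = 1408
    n = np.arange(N)

    # A matched sine-product, normalized by N, yields the assumed 1/2:
    ref = np.sin(np.pi * (2*n + 1) / N)
    print(np.dot(ref, ref) / N)              # ~0.5

    # A plain average passes DC at amplitude 1, but a single-sample
    # transient of amplitude 1 only leaks through at 1/1408:
    print(1.0 / N)                           # ~0.00071, roughly -63dB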


My desire to speculate has taken a back seat for the moment, to my actual curiosity. What I have now done is actually to drag that equalizer window sideways, to become a wider window. And indeed, the equalizer has several new bands now, totalling 21, plus the “Coda” setting:

[Image: Equalizer_4]

As I expected, the 3kHz, 6kHz and 12kHz sliders are now visible. But quite to my surprise, there is also a 15Hz slider now, as well as a few new sliders that seem to result as odd products of lower octave-base frequencies.

According to some traditional thinking, a few facts should not change, such as that, in order to isolate the 15Hz frequency, a product-sum of 2816 elements would now need to be recomputed, once per input sample. That seems unrealistic. Apparently, while my words are cheap, CPU cycles on that old laptop are even cheaper.

And of course, the equalizer still works, without stuttering.


I suppose that I should introduce a new type of question, in light of this apparent discovery, that the equalizer can just seem to go down to a lower octave of frequencies, as many times as desired, without tying up the CPU drastically over ~2816-element sample-buffers, etc…

‘Is it possible that the way this equalizer works is actually to down-sample the input stream by a factor of (2), each time that the lowest non-zero frequency setting becomes another octave lower than 125Hz, and to add a new buffer of 352 samples each time, in a way that allows the factors (F=2) and (F=3) of the down-sampled octave-base frequencies to be filtered out, resulting in a linear increase in the number of elements?’

And, the only reservations I’d have about this idea would be:

  • The fact that down-sampling, without causing gross frequency non-linearity, requires that some sort of wavelet be applied. I don’t know whether the Haar Wavelet would remain frequency-neutral enough for most purposes.
  • Where the streams belonging to lower, derived octaves were down-sampled, they also need to be up-sampled again, after being multiplied by their respective equalizer settings, and before being mixed back into the output stream, again in an inherently frequency-neutral way.

But Yes, once a methodology has been perfected, by which this stage-wise down-sampling and then re-up-sampling can take place, it should be possible only to apply a smaller buffer of maybe (352) samples each time, extracting those multiples of (F=2) and (F=3) cycles per buffer-length. And the ultimate result would be a more modest increase in CPU and RAM usage.


I think that the way in which resampling errors in the reproduction of the lowest frequencies can be avoided, is by not making each buffer too small. The resolution of those added buffers, together with simple up-sampling per se, could keep the up-sampling ‘smooth’.
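A sketch of that cascade, with Haar-style averaging for the down-sampling and the per-stage 352-sample buffers; all of the structure here is my guess, not the equalizer’s actual code:

    import numpy as np

    def haar_downsample(x):
        """Halve the sample-rate by averaging adjacent pairs."""
        return 0.5 * (x[0::2] + x[1::2])

    def linear_upsample(x):
        """Double the sample-rate by inserting linear midpoints."""
        out = np.empty(2 * len(x) - 1)
        out[0::2] = x
        out[1::2] = 0.5 * (x[:-1] + x[1:])
        return out

    def band(buf, F):
        """F cycles per 352-sample buffer, with the half-sample shift."""
        n = np.arange(352)
        ref = np.sin(np.pi * (2*n + 1) * F / 352)
        return np.dot(buf[-352:], ref) / 352

    stream = np.random.randn(4096)       # enough history for 3 stages
    levels = []
    for stage in range(3):               # each stage: one octave lower
        stream = haar_downsample(stream)
        levels.append((band(stream, 2), band(stream, 3)))
    # After scaling by the slider gains, each stage's output would be
    # up-sampled again (e.g., with linear_upsample) and mixed back in.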


 

(Update 6/22/2020, 12h15: )

As I left the subject above, I had asked two questions explicitly, but only given an implicit answer to each:

  • Would it be suitable to down-sample the input stream, using the Haar Wavelet?
  • Can the frequencies filtered out of the down-sampled streams and their additional buffers be up-sampled again, in a simple way, yet in a way that keeps their sine-waves smooth?

 

After thinking about it for another day, I’d say that the explicit answer to both questions is Yes:

  • Using the Haar Wavelet for down-sampling will create strong frequency-response non-linearity near the newly defined Nyquist Frequency, which will actually be half the previous Nyquist Frequency. However, using it should not produce strong non-linearity 6 octaves down from the newly defined Nyquist Frequency.
  • Using Linear Interpolation in order to up-sample a stream twofold will also create noticeable artifacts, when the frequencies in the original stream get close to the Nyquist Frequency. However, if the highest frequency (i.e., sine-wave) to be up-sampled is still being subdivided into ~100 samples (according to my hypothesis, at F=3), then performing a linear interpolation on those samples will result in much weaker artifacts.

Hence, if the frequency being up-sampled is indeed 15.625Hz, 31.25Hz, or 46.875Hz, but the sample-rate it is being taken from is still ~5.5kHz, then the artifacts that result from the linear interpolation, which is being implied somewhere, should not be terribly audible.


 

(Update 6/22/2020, 15h00: )

According to some brief calculations, when the ‘46.875Hz’ frequency component is at its peak, its second derivative is also at its (negative) peak. And then, after subdividing into ~100 samples, applying a linear interpolation should still cause a 5.5kHz noise to be measurable, at a relative level of -42dB.

This type of result can be suppressed by not using the most primitive method available to up-sample, that being linear interpolation. In a much earlier posting, I had already described that a better method exists, which works well when the signal frequency is much lower than the Nyquist Frequency.
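That earlier posting’s method isn’t reproduced here; but, as one example of the general idea, the inserted midpoints can be computed with a short, Hann-windowed sinc kernel instead of a 2-point average, which suppresses the artifact strongly, as long as the signal frequency stays well below the Nyquist Frequency:

    import numpy as np

    def upsample2(x, taps=8):
        """Double the sample-rate of x.  Original samples pass through;
        the inserted midpoints come from a windowed-sinc kernel rather
        than from 2-point linear interpolation."""
        j = np.arange(1 - taps, taps + 1)      # neighbours of a midpoint
        h = np.sinc(0.5 - j)                   # ideal interpolation kernel
        h *= np.hanning(len(h) + 2)[1:-1]      # taper, keeping ends nonzero
        h /= h.sum()
        xp = np.pad(x, taps, mode='edge')      # pad to handle the borders
        out = np.empty(2 * len(x))
        out[0::2] = x
        for i in range(len(x)):
            out[2*i + 1] = np.dot(xp[i + 1 : i + 1 + 2*taps], h)
        return out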


 

(Update 6/23/2020, 18h50: )

I suppose that the initial statement of the purpose of this posting can be taken more literally: to try to minimize RAM and CPU usage ‘to the maximum’. But if that were done, then I’d also worry that, somewhere, doing so might come at the expense of ‘ultimate sound quality’, at least as far as that is conceivable, for a 44.1kHz sample-rate.

This entire hypothesis can be refactored, so that each sample-buffer is only 176 samples long, so that there are 4 implied buffers instead of 3 at the default GUI configuration, and so that the sine-value table only has 352 sine-function values, not 704. As a result, ‘only’ 39 million multiplications per second would need to be computed, instead of 78 million.

But then, one side effect of that would be that, for the octaves low enough to require later up-sampling, the frequency which I’ve defined as (F=3) would only be subdivided into ~50 samples per cycle, instead of ~100. At that point, the relative level of the resampling artifact that would result from purely linear interpolation would be -36dB, instead of -42dB.

It would still be true that, given a better method of interpolation, the artifacts due to this very high amount of granularity could be made much weaker than -36dB. But even a hypothetical ‘better method of interpolation’ would eventually break at some point.


 

Either way, an assumption I’m making is that the GUI is interpolating certain slider positions, in addition to whatever ‘the real equalizer’ is doing in the background.

 

Dirk

 
