## Sound Fonts: How something that I blogged was still not 100% accurate.

Sometimes, explaining a subject 100% accurately would seem to require writing an almost endless amount of text, so that, with short blog postings, I’m limited to posting a mere approximation of the subject. The following posting is a good example:

(Link to an earlier posting.)

Its title clearly states that there are exactly two types of interpolation for use in resampling (audio). After some thought, I realized that a third type of interpolation might exist, and that it might be especially useful for Sound Fonts.

According to my posting, the situation can exist in which the ratio between the spacing of (interpolated) output samples and that of (Sound Font) input samples is irrational. Plausibly, the approach would then be to derive a polynomial’s actual coefficients from the input sample-values (which would be the result of one matrix multiplication), and to compute the value of the resulting polynomial at (x = t), (0.0 <= t < 1.0).

But there is an alternative: the input samples could be up-sampled, arriving at a set of sub-samples with fixed positions, and then every output sample could arise as a mere linear interpolation between two sub-samples.

It would seem then, that a third way to interpolate is feasible, even when the spacing of output samples is irrational with respect to that of input samples.
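The idea of reading output samples out of a fixed table of sub-samples can be sketched as follows. This is only a minimal illustration in Python, not the code of any real Sound Font Player. The names `read_linear` and `render` are hypothetical, and the sketch assumes that the table `sub_samples` was already up-sampled by an integer `factor` beforehand, by whatever better interpolation was available.

```python
import math


def read_linear(sub_samples, factor, position):
    """Return one output sample at `position`, measured in original input samples.

    `sub_samples` is assumed to hold the input, already up-sampled by `factor`,
    so that sub-sample k sits at input position k / factor.
    """
    sub_pos = position * factor          # position measured in sub-samples
    index = int(math.floor(sub_pos))     # sub-sample just before the position
    frac = sub_pos - index               # 0.0 <= frac < 1.0
    a = sub_samples[index]
    b = sub_samples[index + 1]
    return a + frac * (b - a)            # plain linear interpolation


def render(sub_samples, factor, step, count):
    """Compute `count` output samples, spaced `step` input samples apart.

    `step` may be any real number, including one that is irrational
    with respect to the input spacing.
    """
    return [read_linear(sub_samples, factor, n * step) for n in range(count)]
```

For example, `render(table, 2, 2 ** 0.5 / 2, 3)` reads three output samples from a 2x up-sampled table at a spacing that has no simple ratio to the input spacing.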

Also, with Sound Fonts, the possibility presents itself that the Sound Font could have been recorded professionally at a sample rate considerably higher than 44.1kHz, such as 96kHz, just so that, if the Sound Font Player did rely on linear interpolation, doing so would not mess up the sound as much as it would if the Sound Font had been recorded at 44.1kHz.

Further, specifically with Sound Font Players, an added problem presents itself: the virtual instrument could get bent upward in pitch, even though its recording already had frequencies approaching the Nyquist Frequency. Those frequencies could then end up being pushed ‘higher than the output Nyquist Frequency’, resulting in aliasing, i.e., getting reflected back down to lower frequencies, even though each output sample could have been interpolated super-finely by itself.

These are the main reasons why, as I did experience, playing a sampled sound at a very high, bent pitch actually just results in ‘screeching’.

Yet, the Sound Font Player could again be coded cleverly, so that it would also over-sample its output. I.e., if expected to play the virtual instrument at a sample rate of 44.1kHz, it could actually compute interpolated samples closer together than that, corresponding to 88.2kHz, and then compute each ‘real output sample’ as the average of two ‘virtual, sub-sampled output samples’. This would effectively insert a low-pass filter, which would attenuate the screeching that results from frequencies higher than 22kHz being reflected below 22kHz, and eventually, all the way back down to 0kHz. And admittedly, the type of (very simple) low-pass filter such an arrangement would imply would be The Haar Wavelet again.
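The averaging step just described can be sketched in a few lines of Python. The function name `average_pairs` and the two test signals are hypothetical, chosen only for illustration. The sketch assumes that the player has already computed its ‘virtual’ samples at twice the real output rate.

```python
def average_pairs(virtual_samples):
    """Collapse a 2x over-sampled stream to the real output rate.

    Each real output sample is the average of two adjacent virtual samples,
    which amounts to a two-tap (Haar-style) low-pass filter followed by
    keeping every second result.
    """
    pairs = len(virtual_samples) // 2
    return [
        (virtual_samples[2 * i] + virtual_samples[2 * i + 1]) / 2.0
        for i in range(pairs)
    ]


# A component at half the virtual rate alternates in sign on every sample.
nyquist_tone = [1.0, -1.0] * 4
# A slowly changing signal, by contrast, passes almost unchanged.
slow_ramp = [0.0, 1.0, 2.0, 3.0]
```

Here, `average_pairs(nyquist_tone)` cancels completely, while `average_pairs(slow_ramp)` just yields the midpoints of the ramp. Frequencies in between are attenuated only gradually, so this is a very gentle filter.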

If you asked me what the best was that a Soundblaster sound card from 1998 would have been able to do, I’d say, ‘Just compute each audio sample as a linear interpolation between two Sound Font samples.’ Doing so would have required an added lookup into an address in (shared) RAM, a subtraction, a multiplication, and an addition. In fact, basing this speculation on my estimation of how much circuit-complexity such an early Soundblaster card just couldn’t have had, I’d say that those cards would have needed to apply integer arithmetic, with a limited number of fractional bits (maybe 8?), to state which Sound Font sample-position a given audio sample was being ‘read from’. It would have been up to the driver to approximate the integer fed to the hardware. And then, if that sound card was poorly designed, its logic might have stated, ‘Just truncate the Sound-Font sample-position being read from, to the nearest sample.’
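To make the speculation above concrete, here is a minimal Python sketch of integer-only linear interpolation with 8 fractional bits. It does not describe any real sound card. The constants and the names `phase_step`, `read_fixed_point` and `read_truncated` are assumptions made only for this illustration.

```python
FRAC_BITS = 8                    # assumed number of fractional bits
FRAC_ONE = 1 << FRAC_BITS        # 256 represents a step of exactly one sample
FRAC_MASK = FRAC_ONE - 1


def phase_step(pitch_ratio):
    """What a driver might do: approximate the real ratio by an integer."""
    return int(round(pitch_ratio * FRAC_ONE))


def read_fixed_point(samples, phase):
    """Linear interpolation: one extra lookup, a subtraction,
    a multiplication and an addition, all on integers."""
    index = phase >> FRAC_BITS   # whole part of the sample-position
    frac = phase & FRAC_MASK     # fractional part, 0..255
    a = samples[index]
    b = samples[index + 1]       # the added lookup
    return a + (((b - a) * frac) >> FRAC_BITS)


def read_truncated(samples, phase):
    """The 'poorly designed' variant: drop the fractional bits entirely."""
    return samples[phase >> FRAC_BITS]
```

With `samples = [0, 256, -256]` and a phase of 320 (position 1.25), the interpolating version returns 128, whereas the truncating version simply returns the sample at position 1.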

In contrast, when Audio software is being programmed today, one of the first things the developer will insist on, is to apply floating-point numbers wherever possible…

Also, suppose that a hypothetical, superior Sound Font Player did have as its logic, ‘If the sample rate of the loaded Sound Font is (< 80kHz), up-sample it 2x; if that sample rate is actually (< 40kHz), up-sample it 4x…’, simplified here just to the point of making it plausible. This up-sampling would only take place once, when the Sound Font is actually being loaded into RAM. By contrast, the oversampling of the output of the virtual instrument, as well as the low-pass filter, would need to be applied in real-time… ‘If the output sample rate is (>= 80kHz), replace adjacent Haar Wavelets with overlapping Haar Wavelets.’

Food for thought.

Sincerely,
Dirk

## A Hypothetical Algorithm…

One of the ideas which I’ve written about often is that, when certain Computer Algebra Software needs to compute the root of an equation, such as of a polynomial, an exact Algebraic solution, which is also referred to as the analytical solution, or symbolic Math, may not be at hand, and that the software therefore uses numerical approximation, in a way that never churns out the Algebraic solution in the first place. And while it might sound disappointing, often, the numerical solution is what Engineers really need.

But one subject which I haven’t analyzed in depth before is how this art might work. This is a subject which some people may study in University, and I never studied it. I can see that in certain cases, an obvious pathway suggests itself. For example, if somebody knows an interval for (x), and if the polynomial function of (x), that being (y), happens to be positive at one end of the interval and negative at the other end, then it becomes feasible to keep bisecting the interval. If (y) is positive at the point of bisection, that value of (x) replaces the ‘positive’ end of the interval, while if (y) is negative at that new point, its value of (x) replaces the ‘negative’ end. This can be repeated until the interval has become smaller than the amount by which the root is allowed to be inaccurate.
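The bisection procedure described above can be written out as a short Python sketch. The names `poly_eval` and `bisect_root` are hypothetical, and the convention that coefficients are listed from the highest power down is an assumption of this example.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs run from highest power down."""
    y = 0.0
    for c in coeffs:
        y = y * x + c
    return y


def bisect_root(coeffs, x_pos, x_neg, tolerance=1e-12):
    """Find a real root between x_pos (where y > 0) and x_neg (where y < 0)."""
    if poly_eval(coeffs, x_pos) <= 0 or poly_eval(coeffs, x_neg) >= 0:
        raise ValueError("y must be positive at x_pos and negative at x_neg")
    while abs(x_pos - x_neg) > tolerance:
        mid = (x_pos + x_neg) / 2.0
        if mid == x_pos or mid == x_neg:
            break                    # floating-point resolution exhausted
        y = poly_eval(coeffs, mid)
        if y > 0:
            x_pos = mid              # replaces the 'positive' end
        elif y < 0:
            x_neg = mid              # replaces the 'negative' end
        else:
            return mid               # landed exactly on the root
    return (x_pos + x_neg) / 2.0
```

As a usage example, `bisect_root([1.0, 0.0, -2.0], 2.0, 1.0)` searches for the root of (x^2 - 2) between 1 and 2, and so approximates the square root of 2.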

But there exist certain cases in which the path forward is not as obvious, such as what one should do if one was given a polynomial of even degree that only has complex roots, and if these complex roots nevertheless needed to be found. Granted, in practical terms such a problem may never present itself in the lifetime of the reader. But if it does, I just had lots of idle time, and have contemplated an answer.

(Updated 1/30/2019, 13h00 … )

## How the JACK Sound Daemon is capable of running at 192kHz

Most of my Linux-computers have as their sound-server “Pulse Audio”. But specifically on my laptop named ‘Klystron’, I have set up the JACK Daemon to be able to run as an alternative, yet not to be running by default. I have performed experiments on that laptop, to confirm that I can launch this sound-server, using a GUI named ‘QJackCtl’, but have also had to make modifications to how this GUI executes commands from the user, so that its start-up pauses the Pulse Audio daemon, which has been able to resume successfully after I was done using JACK. Without such a detail, the attempt should not be made.

One fact which I can see in QJackCtl is that JACK is capable of running at 192kHz, even though it has not interrogated any of the available devices about what their real capabilities are.

The reason this is possible is the fact that individual sound devices are just clients to that daemon, including any number of devices that act as sound-sources, rather than acting as sinks, i.e. that act as inputs rather than as one output.

I also own a USB-Sound-Device named the ‘Scarlett Focusrite 2i2’, which is mainly intended for use in sound capture, but which also has outputs intended for monitoring purposes.

If I was to run JACK at 192kHz, then one simple consequence of that would be, that zero actual sound-devices would remain compatible with it. As to how cleanly an attempt to connect to an incompatible device exits, giving error messages or crashes, I have not tested, because when I tested the Focusrite, I took into account the real limit of that device at 96kHz.

Similarly, the JACK Daemon runs with 32-bit linear precision by default. In this case, when we enable devices to act as clients, which are only capable of 24-bit sample-depth, which is common, the mismatch is safely ignored. JACK already sees to it, that the last 8 bits of precision get ignored.

Now, I could be cautious and worry, that because of errors in the Linux drivers, those last 8 bits somehow get mapped to a control register as an error. But then the simple way to test for that, was simply to send some 32-bit sound through JACK, to this output device. What I found when testing this, is that the basic operation of the Focusrite was not disturbed, even though my hearing was not good enough, to tell me when I had my Sennheisers on, whether in fact 24-bit precision was still working. I was mainly testing, that trying to send a 32-bit value, does not disrupt the actual operation.

## A Cheapo Idea To Throw Out There, On Digital Oversampling

In This Posting, I elaborated at length about Polynomial Approximation that is not overdetermined, but rather exactly defined, by a set of unknown (y) values along a set of known time-coordinates (x). Just to summarize: if the sample-time-points are known to be the arbitrary x-coordinates 0, 1, 2 and 3, then the matrix (X1) can state the powers of these coordinates, and if additionally the vector (A) states the coefficients of a polynomial, then the product ( X1 * A ) produces the four y-values as the vector (Y).

X1 can be computed before the algorithm is designed, and its inverse, ( X1^-1 ), would be such that ( X1^-1 * Y = A ). Hence, given a prepared matrix, a linear multiplication can derive a set of coefficients easily from a set of variable y-values.

Well, this idea can be driven further. There could be another arbitrary set of x-coordinates 1.0, 1.25, 1.5, 1.75, which are meant to be a 4-point interpolation within the first set. And then another matrix could be prepared before the algorithm is committed, called (X2), which states the powers of this sequence. And then ( X2 * A = Y' ), where (Y') is a set of interpolated samples.

What follows from this is that ( X2 * X1^-1 * Y = Y' ). But wait a moment. Before the algorithm is ever burned onto a chip, the matrix ( X2 * X1^-1 ) can be computed by Human programmers. We could call that our constant matrix (X3).

So a really cheap interpolation scheme could start with a vector of 4 samples (Y), and derive the 4 interpolated samples (Y') just by doing one matrix-multiplication ( X3 * Y = Y' ). It would just happen that

Y'[1] = Y[2]

And so we could already guess off the top of our heads, that the first row of X3 should be equal to ( 0, 1, 0, 0 ).
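The constant matrix (X3) can actually be worked out ahead of time, as the text suggests. The following Python sketch does so with exact fractions, so that no rounding obscures the result. The helper names `vandermonde`, `invert`, `matmul` and `apply` are hypothetical, and the ordering of the coefficients in (A), from the constant term upward, is an assumption of this example.

```python
from fractions import Fraction


def vandermonde(xs):
    """Row i states the powers x_i^0 .. x_i^(n-1) of one coordinate."""
    return [[Fraction(x) ** p for p in range(len(xs))] for x in xs]


def invert(m):
    """Gauss-Jordan inversion of a small square matrix of Fractions."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def apply(matrix, vector):
    """Multiply a matrix by a column vector, e.g. X3 * Y = Y'."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


X1 = vandermonde([0, 1, 2, 3])
X2 = vandermonde([Fraction(1), Fraction(5, 4), Fraction(3, 2), Fraction(7, 4)])
X3 = matmul(X2, invert(X1))    # computed once, before run-time
```

The first row of `X3` does come out as ( 0, 1, 0, 0 ), and the row belonging to x = 1.5 comes out as ( -1/16, 9/16, 9/16, -1/16 ). At run-time, only `apply(X3, Y)` would need to be computed for each group of 4 samples.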

While this idea would certainly be considered obsolete by standards today, it would correspond roughly to the amount of computational power a single digital chip would have had in real-time, in the late 1980s… ?

I suppose that an important question to ask would be, ‘Aside from just stating that this interpolation smooths the curve, what else does it cause?’ And my answer would be that, although it allows for (undesirable) aliasing of frequencies to occur during playback when the encoded ones are close to the Nyquist Frequency, if the encoded frequencies are about 1/2 that or lower, very little aliasing will still take place. And so, over most of the audible spectrum, this will still act as a kind of low-pass filter, although over-sampling has taken place.

Dirk