Linear Predictive Coding

Linear Predictive Coding is a method of using a constant number of known sample-values, that precede an unknown sample-value, and to find coefficients for each of the preceding sample-values, which they should be multiplied by, and the products summed, to derive the most-probable following sample-value.

More specifically, while the exercise of applying these coefficients should not require much explaining, methods for deriving them do.

Finding these coefficients is also called an Auto-Correlation, because the following sample is part of the same sequence, as the preceding samples belonged to.

Even though LPC travels along the stream of values, each preceding position relative to the current sample to predict is treated as having a persistent meaning, and for the sake of simplicity I’ll be referring to each preceding sample-position as One Predictor.

If the Linear Predictive Coding was only to be of the 2nd order, thus taking into account 2 Predictors, then it will often be simple to use a fixed system of coefficients.

In this case, the coefficients should be { -1, +2 }, which will successfully predict the continuation of a straight line, and nothing else.

One fact about a set of coefficients is, that their sum should be equal to 1, in order to predict a DC value correctly.

( If the intent was to use a set of 3 predictors, to conserve both the 1st and the 2nd derivatives of a curve, then the 3 coefficients should automatically be { +1, -3, +3 } . But, what’s needed for Signal Processing is often not, what’s needed for Analytical Geometry. )

But for orders of LPC greater than 3, the determination of the coefficients is anything but trivial. In order to understand how these coefficients can be computed, one must first understand a basic concept in Statistics called a Correlation. A correlation supposes that an ordered set of X-value and Y-value pairs exist, which could have any values for both X and Y, but that Y is supposed to follow from X, according to a linear equation, such that

Y = α + β X

Quite simply, the degree of correlation is the ideal value for β, which achieves the closest-fitting set of predicted Y-values, given hypothetical X-values.

The process of trying to compute this ideal value for β is also called Linear Regression Analysis, and I keep a fact-sheet about it:

Fact Sheet on Regression Analyses.

This little sheet actually describes Non-Linear Regression Analysis at the top, using a matrix which states the polynomial terms of X, but it goes on to show the simpler examples of Linear Regression afterward.

There is a word of commentary to make, before understanding correlations at all. Essentially, they exist in two forms

  1. There is a form, in which the products of the deviations of X and Y are divided by the variance of X, before being divided by the number of samples.
  2. There is a form, in which the products of the deviations of X and Y are divided by the square root, of the product of the variance of X and the variance of Y, before being divided by the number of samples.

The variance of any data-set is also its standard deviation squared. And essentially, there are two ways to deal with the possibility of non-zero Y-intercepts – non-zero values of α. One way is to compute the mean of X, and to use the deviations of individual values of X from this mean, as well as to find the corresponding mean of Y, and to use deviations of individual values of Y from this mean.

Another way to do the Math, is what my fact-sheet describes.

Essentially, Form (1) above treats Y-values as following from known X-values, and is easily capable of indicating amounts of correlation greater than 1.

Form (2) finds how similar X and Y -values are, symmetrically, and should never produce correlations greater than 1.

For LPC, Form (2) is rather useless, and the mean of a set of predictors must be found anyway, so that individual deviations from this mean are also the easiest values to compute with.

The main idea when this is to become an autocorrelation, is that the correlation of the following sample is computed individually, as if it was one of the Y-values, as following each predictor, as if that was just one of the X-values. But it gets just a touch trickier…

(Last Edited 06/07/2017 … )

Continue reading Linear Predictive Coding

A single time-delay can also be expressed in the frequency-domain.

Another way to state, that a stream of time-domain samples has been given a time-delay, is simply to state that each frequency-coefficient has been given a phase-shift, that depends both on the frequency of the coefficient, and on the intended time-delay.

A concern that some readers might have with this, is the fact that a number of samples need to be stored, in order for a time-delay to be executed in the time-domain. But as soon as differing values for coefficients, for a Fourier Transform, are spaced closer together, indicating in this case a longer time-delay, its computation also requires that a longer interval of samples in the time-domain need to be combined.

Now, if the reader would like to visualize what this would look like, as a homology to a graphical equalizer, then he would need to imagine a graphical equalizer the sliders of which can be made negative – i.e. one that can command, that one frequency come out inverted – so that then, if he was to set his sliders into the accurate shape of a sine-wave that goes both positive and negative in its settings, he should obtain a simple time-delay.

But there is one more reason for which this homology would be flawed. The type of Fourier Transform which is best-suited for this, would be the Discrete Fourier Transform, not one of the Discrete Cosine Transforms. The reason is the fact that the DFT accepts complex numbers as its terms. And so the reader would also have to imagine, that his equalizer not only have sliders that move up and down, but sliders with little wheels on them, from which he can give a phase-shift to one frequency, without changing its amplitude. Obviously graphical equalizers for music are not made that way.

Continue reading A single time-delay can also be expressed in the frequency-domain.

How certain signal-operations are not convolutions.

One concept that exists in signal processing, is that there could be a definition of a filter, which is based in the time-domain, and that this definition can resemble a convolution. And yet, a derived filter could no longer be expressible perfectly as a convolution.

For example, the filter in question might add reverb to a signal recursively. In the frequency-domain, the closer two frequencies are, which need to be distinguished, the longer the interval is in the time-domain, which needs to be considered before an output sample is computed.

Well, reverb that is recursive would need to be expressed as a convolution with an infinite number of samples. In the frequency-domain, this would result in sharp spikes instead of smooth curves.

I.e., If the time-constant of the reverb was 1/4 millisecond, a 4kHz sine-wave would complete within this interval, while a 2kHz sine-wave would be inverted in phase 180⁰. What this can mean is that a representation in the frequency-domain may simply have maxima and minima, that alternate every 2kHz. The task might never be undertaken to make the effect recursive.

(Last Edited on 02/23/2017 … )

Continue reading How certain signal-operations are not convolutions.

A Thought on SRS

Today, when we buy a laptop, we assume that its internal speakers offer inferior sound by themselves, but that through the use of a feature named ‘SRS’, they are enhanced, so that sound which simply comes from two speakers in front of us, seems to fill the space around us, kind of how surround-sound would work.

The immediate problem with Linux computers is, that they do not offer this enhancement. However, technophiles have known for a long time that this problem can be solved.

The underlying assumption here is, that the stereo being sent to the speakers should act as if each channel was sent to one ear in an isolated way, as if we were using headphones.

The sound that leaves the left speaker, reaches our right ear with a slightly longer time-delay, than the time-delay with which it reaches our left ear, and a converse truth exists for the right speaker.

It has always been possible to time-delay and attenuate the sound that came from the left speaker in total, before subtracting the result from the right speaker-output, and vice-verso. That way, the added signal that reaches the left ear from the left speaker, cancels with the sound that reached it from the right speaker…

The main problem with that effect, is that it will mainly seem to work when the listener is positioned in front of the speakers, in exactly one position.

I have just represented a hypothetical setup in the time-domain. There can exist a corresponding representation in the frequency-domain. The only problem is, that this effect cannot truly be achieved just with one graphical equalizer setting, because it affects (L+R) differently from how it affects (L-R). (L+R) would be receiving some recursive, negative reverb, while (L-R) would be receiving some recursive, positive reverb. But reverb can also be expressed by a frequency-response curve, as long as that has sufficiently fine resolution.

This effect will also work well with MP3-compressed stereo, because with Joint Stereo, an MP3 stream is spectrally complex in its reproduction of the (L-R) component.

I expect that when companies package SRS, they do something similar, except that they may tweak the actual frequency-response curves into something simpler, and they may also incorporate a compensation, for the inferior way the speakers reproduce frequencies.

Simplifying the curves would allow the effect to break down less, when the listener is not perfectly positioned.

We do not have it under Linux.

(Edit 02/24/2017 : A related effect is possible, by which 2 or more speakers are converted into an effectively-directional speaker-system. I.e., the intent could be, that sound which reaches our filter as the (L) channel, should predominantly leave the speaker-set at one angle, while sound which reaches our filter as the (R) channel, should leave the speaker-set at an opposing angle.

In fact, if we have an entire array of speakers – i.e. a speaker-bar – then we can apply the same sort of logic to them, as we would apply to a phased-array radar system.

The main difference with such a system, as opposed to one based on the Inter-Aural Delay, is that this one would absolutely require we know the distance between the speakers. And then we would use that distance, as the basis for our time-delays… )

Continue reading A Thought on SRS