Linear Predictive Coding

Linear Predictive Coding is a method of using a constant number of known sample-values, that precede an unknown sample-value, and to find coefficients for each of the preceding sample-values, which they should be multiplied by, and the products summed, to derive the most-probable following sample-value.

More specifically, while the exercise of applying these coefficients should not require much explaining, methods for deriving them do.

Finding these coefficients is also called an Auto-Correlation, because the following sample is part of the same sequence, as the preceding samples belonged to.

Even though LPC travels along the stream of values, each preceding position relative to the current sample to predict is treated as having a persistent meaning, and for the sake of simplicity I’ll be referring to each preceding sample-position as One Predictor.

If the Linear Predictive Coding was only to be of the 2nd order, thus taking into account 2 Predictors, then it will often be simple to use a fixed system of coefficients.

In this case, the coefficients should be { -1, +2 }, which will successfully predict the continuation of a straight line, and nothing else.

One fact about a set of coefficients is, that their sum should be equal to 1, in order to predict a DC value correctly.

( If the intent was to use a set of 3 predictors, to conserve both the 1st and the 2nd derivatives of a curve, then the 3 coefficients should automatically be { +1, -3, +3 } . But, what’s needed for Signal Processing is often not, what’s needed for Analytical Geometry. )

But for orders of LPC greater than 3, the determination of the coefficients is anything but trivial. In order to understand how these coefficients can be computed, one must first understand a basic concept in Statistics called a Correlation. A correlation supposes that an ordered set of X-value and Y-value pairs exist, which could have any values for both X and Y, but that Y is supposed to follow from X, according to a linear equation, such that

Y = α + β X

Quite simply, the degree of correlation is the ideal value for β, which achieves the closest-fitting set of predicted Y-values, given hypothetical X-values.

The process of trying to compute this ideal value for β is also called Linear Regression Analysis, and I keep a fact-sheet about it:

Fact Sheet on Regression Analyses.

This little sheet actually describes Non-Linear Regression Analysis at the top, using a matrix which states the polynomial terms of X, but it goes on to show the simpler examples of Linear Regression afterward.

There is a word of commentary to make, before understanding correlations at all. Essentially, they exist in two forms

  1. There is a form, in which the products of the deviations of X and Y are divided by the variance of X, before being divided by the number of samples.
  2. There is a form, in which the products of the deviations of X and Y are divided by the square root, of the product of the variance of X and the variance of Y, before being divided by the number of samples.

The variance of any data-set is also its standard deviation squared. And essentially, there are two ways to deal with the possibility of non-zero Y-intercepts – non-zero values of α. One way is to compute the mean of X, and to use the deviations of individual values of X from this mean, as well as to find the corresponding mean of Y, and to use deviations of individual values of Y from this mean.

Another way to do the Math, is what my fact-sheet describes.

Essentially, Form (1) above treats Y-values as following from known X-values, and is easily capable of indicating amounts of correlation greater than 1.

Form (2) finds how similar X and Y -values are, symmetrically, and should never produce correlations greater than 1.

For LPC, Form (2) is rather useless, and the mean of a set of predictors must be found anyway, so that individual deviations from this mean are also the easiest values to compute with.

The main idea when this is to become an autocorrelation, is that the correlation of the following sample is computed individually, as if it was one of the Y-values, as following each predictor, as if that was just one of the X-values. But it gets just a touch trickier…

(Last Edited 06/07/2017 … )

Continue reading Linear Predictive Coding