Linear Predictive Coding

Linear Predictive Coding is a method that takes a fixed number of known sample-values preceding an unknown sample-value, and finds a coefficient for each of those preceding samples, such that multiplying each sample by its coefficient and summing the products yields the most probable following sample-value.
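As a minimal sketch of the prediction step itself (my own illustration in Python, not from the post, and with placeholder values that aren't taken from any real signal), it amounts to a dot product between the coefficients and the most recent samples:

```python
# A minimal sketch of applying already-known LPC coefficients; the values
# below are placeholders, not taken from any real signal.

def predict_next(samples, coeffs):
    """Predict the next sample from the last len(coeffs) samples.

    'coeffs' is ordered oldest-Predictor-first, to match the notation
    used later in this post.
    """
    order = len(coeffs)
    window = samples[-order:]                        # the most recent 'order' samples
    return sum(c * s for c, s in zip(coeffs, window))

# Example: a 2nd-order predictor with the fixed coefficients { -1, +2 }:
print(predict_next([1.0, 2.0, 3.0, 4.0], [-1.0, 2.0]))   # prints 5.0, continuing the line
```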

While the exercise of applying these coefficients should not require much explaining, the methods for deriving them do.

Finding these coefficients is also referred to as an Auto-Correlation, because the following sample belongs to the same sequence as the preceding samples.

Even though LPC travels along the stream of values, each position preceding the sample to be predicted is treated as having a persistent meaning, and for the sake of simplicity I’ll be referring to each such preceding sample-position as one Predictor.

If the Linear Predictive Coding is only to be of the 2nd order, thus taking 2 Predictors into account, then it will often be simplest to use a fixed set of coefficients.

In this case, the coefficients should be { -1, +2 } (listed with the oldest sample first), which will successfully predict the continuation of a straight line, and nothing else.

One fact about a set of coefficients is that their sum should be equal to 1, in order to predict a DC value correctly.

( If the intent was to use a set of 3 Predictors, to conserve both the 1st and the 2nd derivatives of a curve, then the 3 coefficients would accordingly be { +1, -3, +3 }. But what’s needed for Signal Processing is often not what’s needed for Analytical Geometry. )
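A quick check (again my own sketch, not part of the post) confirms that both fixed coefficient sets behave as claimed, and that each sums to 1:

```python
# Both coefficient sets are listed oldest-sample-first, as above.

line     = [2.0, 4.0, 6.0, 8.0]          # constant 1st derivative
parabola = [1.0, 4.0, 9.0, 16.0]         # squares: constant 2nd derivative

c2 = [-1.0, +2.0]                        # the 2nd-order set
c3 = [+1.0, -3.0, +3.0]                  # the 3-Predictor set

print(sum(c2), sum(c3))                  # both sums equal 1, so DC is preserved

# { -1, +2 } extrapolates the straight line exactly:
print(c2[0] * line[-2] + c2[1] * line[-1])                                  # 10.0

# { +1, -3, +3 } extrapolates the parabola exactly:
print(c3[0] * parabola[-3] + c3[1] * parabola[-2] + c3[2] * parabola[-1])   # 25.0
```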

But for orders of LPC greater than 3, the determination of the coefficients is anything but trivial. In order to understand how these coefficients can be computed, one must first understand a basic concept in Statistics called a Correlation. A correlation supposes that an ordered set of X-value and Y-value pairs exists, which could have any values for both X and Y, but that Y is supposed to follow from X according to a linear equation, such that

Y = α + β X

Quite simply, the degree of correlation is the ideal value for β, which achieves the closest-fitting set of predicted Y-values, given hypothetical X-values.
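For reference, the usual closed-form, least-squares answer for that ideal β (and for the corresponding α) is standard Statistics, not something quoted from the fact-sheet below:

```latex
\beta \;=\; \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}
                 {\sum_{i=1}^{N} (X_i - \bar{X})^{2}}
\qquad\qquad
\alpha \;=\; \bar{Y} - \beta\,\bar{X}
```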

The process of trying to compute this ideal value for β is also called Linear Regression Analysis, and I keep a fact-sheet about it:

Fact Sheet on Regression Analyses.

This little sheet actually describes Non-Linear Regression Analysis at the top, using a matrix which states the polynomial terms of X, but it goes on to show the simpler examples of Linear Regression afterward.

There is a word of commentary to make, before correlations can be understood at all. Essentially, they exist in two forms:

  1. There is a form, in which the products of the deviations of X and Y are divided by the variance of X, before being divided by the number of samples.
  2. There is a form, in which the products of the deviations of X and Y are divided by the square root, of the product of the variance of X and the variance of Y, before being divided by the number of samples.

The variance of any data-set is also its standard deviation squared. And essentially, there are two ways to deal with the possibility of non-zero Y-intercepts, i.e. non-zero values of α. One way is to compute the mean of X and to use the deviations of the individual X-values from this mean, as well as to find the corresponding mean of Y and to use the deviations of the individual Y-values from that mean.
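As a small illustration (my own sketch, not taken from the fact-sheet) of the two forms listed above, computed from deviations about the means as just described:

```python
# A sketch of Form (1) and Form (2), using deviations from the means.

def two_forms(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    cov   = sum(a * b for a, b in zip(dx, dy)) / n    # mean product of deviations
    var_x = sum(a * a for a in dx) / n                # standard deviation of X, squared
    var_y = sum(b * b for b in dy) / n

    form_1 = cov / var_x                              # regression slope; can exceed 1
    form_2 = cov / (var_x * var_y) ** 0.5             # symmetric; never exceeds 1
    return form_1, form_2

print(two_forms([1, 2, 3, 4], [2, 5, 8, 11]))         # (3.0, 1.0) for a perfectly linear set
```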

Another way to do the Math is what my fact-sheet describes.

Essentially, Form (1) above treats Y-values as following from known X-values, and is easily capable of indicating amounts of correlation greater than 1.

Form (2) finds how similar the X- and Y-values are, symmetrically, and should never produce correlations greater than 1.

For LPC, Form (2) is rather useless. The mean of a set of Predictors must be found anyway, so that individual deviations from this mean are also the easiest values to compute with.

The main idea, when this is to become an autocorrelation, is that the correlation of the following sample is computed individually, as if it were one of the Y-values, against each Predictor in turn, as if that Predictor were just one of the X-values. But it gets just a touch trickier…
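As a rough sketch of that idea only (and not the full procedure the post is leading up to), one Form-(1)-style correlation per Predictor position could be gathered like this:

```python
# A rough sketch: one correlation per Predictor position, treating the
# sample that follows as the Y-value and the sample 'lag' positions
# earlier as the X-value.  The complete LPC solution involves more.

def per_lag_correlations(samples, order):
    n = len(samples)
    mean = sum(samples) / n
    dev = [s - mean for s in samples]
    var = sum(d * d for d in dev) / n                 # variance of the whole stream

    corrs = []
    for lag in range(1, order + 1):
        pairs = zip(dev[:-lag], dev[lag:])            # (predictor, following sample)
        cov = sum(x * y for x, y in pairs) / (n - lag)
        corrs.append(cov / var)
    return corrs

print(per_lag_correlations([1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0], 3))
```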

(Last Edited 06/07/2017 … )


My Distinction Between Variables And Constants

The way I process information, as applied to ‘Computer Algebra Systems’, defines the difference between constants and variables in a context-sensitive way. It is for the purpose of solving one problem that certain symbols in an expression become variables, others constants, and others yet, function names. The fact that a syntax has been defined to store these symbols does not affect the fact that their status can be changed from constant to variable and vice versa.

I’ll give an example. For most purposes a Univariate Polynomial has the single variable (x), denotes powers of (x) as its base terms, and multiplies each of those base terms by a constant coefficient. To some people this might seem immutable.

But if the purpose of the exercise is to compute a Statistical, Polynomial Regression (which is “an overdetermined system”), then we must find optimal values for the prospective coefficients. We can use this as the basis to form a “Polynomial Approximation” of a system, which could be of the 8th degree for example. Yet this polynomial must fit a data-set as closely as possible, and that data-set could have a list of 20 values of (x), each associated with a real value of (y), which our optimized set of coefficients is supposed to approximate from the powers of (x), including the power (0), which always yields the base value (1).

In order to determine our 9 coefficients, we need to decide that all the powers of (x) have become constants. The coefficients we’re trying to determine have now become the variables in our problem. Thus, we have a column-vector of real (y)s (still variables), and a matrix which states the powers of (x) which supposedly led to those values of (y). I believe that this is a standard way of doing so:

Regression Analysis Guide
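As a sketch of that standard procedure (written in Python with NumPy, which is my choice here and not something the guide prescribes), here is an 8th-degree fit to 20 (x, y) pairs, matching the example above; the data themselves are made up for the demonstration:

```python
# A sketch only: an 8th-degree polynomial fitted to 20 (x, y) pairs by
# least squares.  The powers of x are treated as constants (the design
# matrix), and the 9 coefficients are the unknowns being solved for.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)                          # 20 known values of x
y = np.sin(3.0 * x) + 0.05 * rng.standard_normal(20)    # invented "real" y-values

degree = 8
X = np.vander(x, degree + 1, increasing=True)           # columns: x^0, x^1, ..., x^8
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)          # the 9 optimized coefficients

# Once found, the coefficients become constants again, and new values of x
# can be plugged in to obtain predicted values y':
x_new = np.array([0.15, 0.35])
y_pred = np.vander(x_new, degree + 1, increasing=True) @ coeffs
print(coeffs)
print(y_pred)
```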

Well, another conclusion we can reach is that the base values which need to be correlated with the real (y) aren’t limited to powers of (x). They could just as easily be some other functions of (x). It’s just that one advantage which polynomials have is that, if there is some scaling of (x), it’s possible to define a scaled parameter (t = ux) such that a corresponding polynomial in terms of (t) can do what our polynomial in terms of (x) did. If the base value was ( sin(x) ), then ( sin(t) ) could not simply take its place. This is important to note if we are trying to approximate the orbital motions of planets, for example.
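To make the scaling argument concrete (my notation, not the post’s): if t = ux, each coefficient can simply absorb a power of u, so a polynomial in (x) remains a polynomial in (t), whereas sin(x) does not become sin(t):

```latex
p(x) \;=\; \sum_{k=0}^{n} a_k\, x^{k}
\;=\; \sum_{k=0}^{n} \frac{a_k}{u^{k}}\, t^{k},
\qquad t = u x,
\qquad\text{but}\quad
\sin(x) \;=\; \sin\!\left(\tfrac{t}{u}\right) \;\neq\; \sin(t)\ \text{in general.}
```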

But then, as soon as we’ve computed our best-fitting vector of coefficients, we can treat them as constants again, so that plugging in values of (x) which did not occur in the original data-set will also yield the corresponding, predicted values of (y’). So now (x) and (y’) are our variables again.

Dirk