
Moulton  Lectures

 

On

Electro-Acoustics

 

 

 

Lecture 19

 

Digital Noise Cancellation 4

 

Adaptive Filters And The LMS Algorithm

 

 

 

Presented by:

Dave L Moulton

 

 

 

 

Location:  Thales Acoustics Harrow UK

Date:  09-April-2003

 


Content

 

Continuing from Lecture 18, I now want to explain how coefficient adaptation can be achieved using probably the most famous algorithm in the world of adaptive filtering: the Least Mean Squares (LMS) algorithm.

 

I will explain the derivation of this algorithm in its simplest form. A small amount of matrix theory will be required in order to reach the final derivation. I will not devote a lot of time to teaching matrix theory, but will simply rely on the intuition of the method to get us through the stages of the derivation.

 

Towards the end of the lecture I will give examples of the adaptive filter in action, showing block diagram arrangements of noise cancellers and predictive correlation.

 

On with the lecture

 

Let's first remind ourselves of the simplified block arrangement for an adaptive FIR filter.

 

Figure 19.1

 

 

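We have already seen from Lecture 18 that we can represent the output of the FIR filter by using the following mathematical expression, where $w_k$ are the N filter coefficients and $x(n-k)$ are the delayed input samples:

$y(n) = \sum_{k=0}^{N-1} w_k \, x(n-k)$ ........ [19.1]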

 

 

We can also represent the error signal e(n), the difference between the desired signal d(n) and the filter output y(n), in the following way:

$e(n) = d(n) - y(n)$

 

 

From Figure 19.1 we can see that the adaptive controller is a function of the error signal e(n).  In the case of performing a system identification, the filter coefficients will adapt to match the impulse response of the unknown system when the error signal is reduced to zero.  So clearly the adaptive controller must contain an algorithm that has the task of monitoring e(n) and then adjusting each of the coefficients in the FIR filter until e(n) ideally reduces to zero.
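As a small numerical illustration of the output and error expressions, the sketch below evaluates y(n) and e(n) for a single time step. It assumes the usual FIR sum y(n) = Σ w_k x(n-k) and the error e(n) = d(n) - y(n); the function names and the sample values are made up for the example.

```python
def fir_output(weights, recent_inputs):
    """y(n) = sum over k of w_k * x(n-k).

    recent_inputs[0] is x(n), recent_inputs[1] is x(n-1), and so on.
    """
    return sum(w * x for w, x in zip(weights, recent_inputs))


def error_signal(desired, weights, recent_inputs):
    """e(n) = d(n) - y(n)."""
    return desired - fir_output(weights, recent_inputs)


# Hypothetical three-tap filter and input history, chosen for easy arithmetic.
weights = [0.5, 0.25, 0.125]
recent_inputs = [1.0, 2.0, 4.0]        # x(n), x(n-1), x(n-2)

y = fir_output(weights, recent_inputs)          # 0.5 + 0.5 + 0.5
e = error_signal(2.0, weights, recent_inputs)   # d(n) = 2.0 in this example
print(y, e)
```

The adaptive controller's job is then to change `weights` from one sample to the next so that `e` shrinks.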

 

In real life we know that we cannot get a 100% perfect match for the impulse response of a continuous (analogue) system using a discrete filter. So from an analytical point of view we tend to consider the system to have reached optimum adaptation when e(n) reaches a minimum value. One way of representing the error function is to look at the squared error $e^2(n)$. This allows us to view the error as a positive quantity rather than a signal that can go either positive or negative. We call this function the Squared Error and give it the symbol ξ.

 

Thus we can express our error function in the following way:

$\xi = e^2(n)$

 

 

Statistically we would be interested in the Mean Squared Error (MSE), which is defined as the expected (average) value of the squared error:

$\mathrm{MSE} = E\left[ e^2(n) \right]$
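To see why the squared error is preferred over the raw error, the sketch below compares the plain average of an error sequence with its mean squared value. The sample average is used here as an estimate of the statistical expectation, and the error values are invented for the illustration.

```python
def mean_error(errors):
    """Plain average of e(n); positive and negative errors cancel."""
    return sum(errors) / len(errors)


def mean_squared_error(errors):
    """Sample estimate of the MSE: the average of e(n) squared."""
    return sum(e * e for e in errors) / len(errors)


# Hypothetical error samples that swing equally positive and negative.
errors = [1.0, -1.0, 2.0, -2.0]

average = mean_error(errors)          # cancels to zero
mse = mean_squared_error(errors)      # (1 + 1 + 4 + 4) / 4
print(average, mse)
```

The plain average suggests there is no error at all, whereas the MSE reports the real size of the mismatch.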

 

 

 

In order to achieve coefficient adaptation we need to do the following:

 

1.     Relate the Squared Error ξ to the weighted FIR filter coefficients w.

2.     Minimize ξ as a function of w.

 

The two steps above will enable us to develop an algorithm that causes each weighting w in the FIR filter to converge to an optimum value, resulting in a minimum value for ξ.

 

Note:  Each coefficient will converge to its own optimum value.  This optimum will result in the coefficients being weighted to create the optimum impulse response for the system.
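The idea of an error that depends on the weights and has a single minimum can be sketched for the simplest possible case: one weight modelling an unknown gain. The gain of 0.8, the input samples and the grid of candidate weights are all assumptions made for this illustration.

```python
def mean_squared_error(weight, inputs, desired):
    """Average of e(n)^2 for a one-tap filter, where e(n) = d(n) - w * x(n)."""
    errors = [d - weight * x for x, d in zip(inputs, desired)]
    return sum(e * e for e in errors) / len(errors)


true_gain = 0.8                                   # hypothetical unknown system
inputs = [1.0, -2.0, 0.5, 3.0, -1.5]
desired = [true_gain * x for x in inputs]         # d(n) produced by the unknown system

candidates = [i / 10 for i in range(17)]          # w = 0.0, 0.1, ..., 1.6
surface = [mean_squared_error(w, inputs, desired) for w in candidates]
best_weight = candidates[surface.index(min(surface))]
print(best_weight, min(surface))
```

The values in `surface` fall as w approaches the optimum and rise again beyond it, which is the bowl-shaped performance curve that the adaptation has to descend.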

 

Now let's go back to expression [19.1]. This expression is very nice; however, it does not lend itself to a great deal of mathematical flexibility. We need to soften up this expression by representing it in matrix notation. We can use matrices to represent operations on series of data values. For example, I can represent all of the coefficients in the FIR filter as a row matrix W, thus:

$W = [\, w_0 \;\; w_1 \;\; \cdots \;\; w_{N-1} \,]$

 

 

I can also represent the input data as a row matrix X. Since my FIR filter can only operate on a maximum of N data values at any one time (equal to the number of FIR coefficients), my matrix X will contain N data points, starting at x(n), thus:

$X = [\, x(n) \;\; x(n-1) \;\; \cdots \;\; x(n-N+1) \,]$

 

This is a good start, but we must now find a way of combining both matrices W and X to form a matrix that describes all of the operations carried out in equation [19.1]. It so happens that this is very easy to do, simply by transposing the W matrix ($W^T$) and multiplying it by the X matrix.

 

The transpose of W simply converts it from being a row matrix into a column matrix.

 

Thus:

 

Hence :

Which results in the square matrix (equal number of rows and columns)

 

 

We can put this matrix to the test by applying a single impulse to the FIR filter and watching how the impulse ripples through the filter, reproducing the weighted coefficients of the FIR filter at the output.

Figure 19.2

 

If we write the matrix out we can see the impulse ripple through the filter, resulting in a diagonal matrix containing the impulse response of the filter. The column matrix Y(n) represents the output at each interval of time. Y(n) is basically the sum of all the components in the nth row of the matrix $W^T X$.
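The impulse test can be sketched numerically. The code below builds a table in which row n, column k holds the product w_k x(n-k), which is one reading of the matrix described above, and then sums each row to get the output. The coefficient values are hypothetical and the input is taken as zero before n = 0.

```python
def tap_product_matrix(weights, signal):
    """Row n, column k holds w_k * x(n-k); x is taken as zero before n = 0."""
    rows = []
    for n in range(len(signal)):
        row = []
        for k in range(len(weights)):
            x = signal[n - k] if n >= k else 0.0
            row.append(weights[k] * x)
        rows.append(row)
    return rows


weights = [0.5, 0.25, 0.125, 0.0625]   # hypothetical FIR coefficients
impulse = [1.0, 0.0, 0.0, 0.0]         # a single unit impulse at n = 0

matrix = tap_product_matrix(weights, impulse)
outputs = [sum(row) for row in matrix]  # y(n) is the sum of row n

for row in matrix:
    print(row)
print(outputs)
```

Only the diagonal entries are non-zero, and the outputs reproduce the coefficients one per sample, which is the impulse response of the filter.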

 

 

If I had put a step function into the filter then the result would have been different:

Figure 19.3

 

Resulting in a Matrix which looks like:
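For a unit step, every tap eventually holds a 1, so each row of the table fills in from the left (a lower-triangular pattern) and the output becomes the running sum of the coefficients. The sketch below checks this, assuming hypothetical coefficients and an input that is zero before n = 0.

```python
from itertools import accumulate


def fir_response(weights, signal):
    """y(n) = sum over k of w_k * x(n-k), with x taken as zero before n = 0."""
    return [
        sum(w * signal[n - k] for k, w in enumerate(weights) if n >= k)
        for n in range(len(signal))
    ]


weights = [0.5, 0.25, 0.125, 0.125]    # hypothetical FIR coefficients
step = [1.0] * 6                       # unit step starting at n = 0

step_response = fir_response(weights, step)
running_sum = list(accumulate(weights))

print(step_response)
print(running_sum)
```

Once the step has filled all N taps the output settles at the sum of the coefficients.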

 

 

Thus:

 

 

 

 

 

Note: matrix theory allows us to swap which of the two matrices is transposed and still get the same result.

 

Thus:

$W^T X = X^T W$ ........ [19.16]

$W^2 = W^T W$ ........ [19.17]

$X^2 = X^T X$ ........ [19.18]

 

Getting back to expression [19.1], we can now re-express this as follows:

$y(n) = W^T X$

 

 

Thus we can also write our error function as:

$e(n) = d(n) - W^T X$ ........ [19.20]

 

 

So we have now met the first of our objectives, by establishing a simple link between the error function e(n) and the weighting function W.

 

Gradient Iteration to achieve optimum weighting coefficients

 

Now let us consider a recurrence relationship that allows each weight $w_k$ to be updated based on the previous value of the weight plus some other controlling function F(n).

 

We could express this algorithm as:

$w_k(n+1) = w_k(n) + F(n)$

 

 

So what about F(n)?

 

F(n) needs to be a function that is directly related to the minimum error, since the minimum error is reached only when the weights stop changing, that is, when w(n+1) = w(n).

 

One way of relating F(n) to the error is to relate it to the rate of change of the Mean Squared Error ξ with respect to the weighting w. If we start at a maximum error we would like to iterate down a performance curve to reach the minimum; hence we will be working with a negative gradient function. Thus:

$F(n) \propto -\dfrac{\partial \xi}{\partial w}$

 

 

We can introduce a constant μ for the proportionality:

$F(n) = -\mu \dfrac{\partial \xi}{\partial w}$

 

 

Graphically this looks like Figure 19.4, where μ represents the step size down the performance curve. μ is very significant in determining the stability of the recurrence algorithm.
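The effect of the step size on stability can be sketched on a made-up performance curve. The code below assumes a single weight and the curve ξ(w) = (w - w_opt)², so the gradient is 2(w - w_opt); for this particular curve the iteration settles only when the step size lies between 0 and 1. The curve, the starting point and the three step sizes are all illustrative assumptions.

```python
def descend(mu, w_opt=1.0, w_start=0.0, steps=50):
    """Iterate w(n+1) = w(n) - mu * d(xi)/dw on the curve xi(w) = (w - w_opt)^2."""
    w = w_start
    for _ in range(steps):
        gradient = 2.0 * (w - w_opt)   # slope of the performance curve at w
        w = w - mu * gradient          # step against the slope
    return w


small = descend(mu=0.1)      # creeps steadily down towards w_opt
large = descend(mu=0.9)      # overshoots on every step but still settles
unstable = descend(mu=1.1)   # every step lands further from w_opt

print(small, large, unstable)
```

A small step size converges slowly but safely, a larger one converges with overshoot, and beyond a limit set by the shape of the curve the recurrence diverges.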

 

Figure 19.4

 

Since we are trying to achieve a Least Mean Square (LMS) value in the error function to get optimum weighting values, we call this algorithm the LMS Algorithm.

 

 

Writing ξ as $e^2(n)$ we have:

 

Performing the differentiation of $e^2(n)$ gives:

 

 

Substituting for e(n) from equation [19.20] into the differential we get:

 

 

Performing the Differentiation:

 

 

 

Equation [19.28] relates to the operation on the nth weighting w(n). Re-expressing this as a function of matrices will represent the operation on all of the FIR weighting coefficients. Thus:
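Written out in full, the steps above follow the standard LMS derivation. The lines below are a sketch under two assumptions: that $e(n) = d(n) - W^T X$, with X holding the N most recent inputs x(n), ..., x(n-N+1), and that the instantaneous $e^2(n)$ stands in for ξ. Some texts absorb the factor of 2 into μ, so the constants may differ from the numbered equations of the lecture.

```latex
\begin{align*}
w_k(n+1) &= w_k(n) - \mu \, \frac{\partial e^2(n)}{\partial w_k(n)} \\
\frac{\partial e^2(n)}{\partial w_k(n)}
  &= 2\, e(n)\, \frac{\partial e(n)}{\partial w_k(n)} \\
  &= 2\, e(n)\, \frac{\partial}{\partial w_k(n)}
     \left[ d(n) - \sum_{j=0}^{N-1} w_j(n)\, x(n-j) \right] \\
  &= -2\, e(n)\, x(n-k) \\
w_k(n+1) &= w_k(n) + 2\mu \, e(n)\, x(n-k) \\
W(n+1) &= W(n) + 2\mu \, e(n)\, X
\end{align*}
```

The last line applies the single-weight update to every coefficient at once: each weight moves in proportion to the current error and to the input sample sitting at its own tap.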

 

 

We can now identify the adaptive controller as the LMS algorithm.
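A minimal sketch of the LMS update used for system identification is given below. It assumes the update W(n+1) = W(n) + 2μ e(n) X, a noise-free unknown FIR system, and a white random test signal; the unknown coefficients, the step size and the function name are hypothetical choices for the example.

```python
import random


def lms_identify(unknown, n_samples=5000, mu=0.05, seed=0):
    """Adapt FIR weights until they match the response of the unknown system."""
    rng = random.Random(seed)
    n_taps = len(unknown)
    weights = [0.0] * n_taps
    history = [0.0] * n_taps                   # X: x(n), x(n-1), ..., x(n-N+1)
    for _ in range(n_samples):
        history = [rng.uniform(-1.0, 1.0)] + history[:-1]
        desired = sum(h * x for h, x in zip(unknown, history))   # d(n)
        output = sum(w * x for w, x in zip(weights, history))    # y(n)
        err = desired - output                                   # e(n)
        # LMS update: every weight moves by 2 * mu * e(n) * (its own tap input)
        weights = [w + 2.0 * mu * err * x for w, x in zip(weights, history)]
    return weights


unknown = [0.6, -0.3, 0.2, 0.1]                # hypothetical unknown impulse response
estimate = lms_identify(unknown)
print([round(w, 4) for w in estimate])
```

Because the adaptive filter has as many taps as the unknown system and no measurement noise is present, the error can be driven very close to zero and the weights settle on the unknown impulse response.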

 

Figure 19.5

 

Some Examples of Adaptive Systems:

 

1.     Noise Canceller (figure 19.6)

 

In this arrangement the noise entering the filter correlates with the noise entering the summing node (represented by the desired signal d(n)), so the filter's coefficients adapt to reduce the error in the noise signal towards zero. The speech, however, is uncorrelated with the noise and is therefore ignored by the adaptive filter. The result is that the noise is removed from the signal.

 

The error signal e(n) represents the speech signal with the noise removed.
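The noise canceller can be sketched with synthetic signals. In the code below the reference noise is white, the path from the noise source to the primary input is a hypothetical two-tap FIR, and a slow sine wave stands in for speech; none of these choices come from the lecture. The error output e(n) is compared with the speech stand-in to measure how much noise is left.

```python
import math
import random


def cancel_noise(n_samples=6000, n_taps=4, mu=0.005, seed=1):
    """Return (speech, primary, error) sequences for an LMS noise canceller."""
    rng = random.Random(seed)
    noise_path = [0.7, 0.2]        # hypothetical path from noise source to primary input
    weights = [0.0] * n_taps
    history = [0.0] * n_taps       # recent reference-noise samples
    speech, primary, errors = [], [], []
    for n in range(n_samples):
        history = [rng.uniform(-1.0, 1.0)] + history[:-1]
        s = 0.5 * math.sin(2.0 * math.pi * 0.01 * n)             # stand-in for speech
        d = s + sum(h * x for h, x in zip(noise_path, history))  # d(n): speech plus noise
        y = sum(w * x for w, x in zip(weights, history))         # noise estimate
        e = d - y                                                # cleaned output
        weights = [w + 2.0 * mu * e * x for w, x in zip(weights, history)]
        speech.append(s)
        primary.append(d)
        errors.append(e)
    return speech, primary, errors


def power(values):
    return sum(v * v for v in values) / len(values)


speech, primary, errors = cancel_noise()
tail = slice(-1000, None)          # judge performance after the weights have settled
noise_before = power([d - s for d, s in zip(primary[tail], speech[tail])])
noise_after = power([e - s for e, s in zip(errors[tail], speech[tail])])
print(noise_before, noise_after)
```

The residual noise is not exactly zero because the speech component keeps nudging the weights; a smaller μ reduces this jitter at the cost of slower adaptation.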

 

 

Figure 19.6

 

2.     Prediction System (figure 19.7)

 

This arrangement forms the basis of many ANC systems for electrically stripping noise signals from microphone lines.

 

In this system speech and noise exist together on the same input channel. Most noise tends to be continuous, with relatively periodic fluctuations in amplitude, so a delayed version of the noise will have a very similar spectrum to an un-delayed version. Thus the noise signal can still be correlated after the delay. Speech, on the other hand, has rapidly changing amplitude and phase, so a delayed version of speech may have very little in common with an un-delayed version. The delay in Figure 19.7 is used to de-correlate the speech but not the noise. The result is that the noise will be adapted out of the signal and the speech will remain.

 

Clearly in this type of system the delay needs to be chosen carefully so that the de-correlation of the speech signal is not at the expense of an excessive delay in the speech signal. A long delay can result in a very unnatural sidetone, which is uncomfortable for the user and can reduce communications effectiveness.

 

If the delay is too short, the speech signal might still have significant components that correlate with the delayed version. This results in the filter adapting to components in the speech and modifying the voice response, leaving the user's voice sounding distorted. Often the effect is like the voice you get after inhaling helium.
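The prediction arrangement can be sketched with a single input channel carrying both signals. In the code below a sine wave plays the part of the periodic noise and white random samples are a crude stand-in for speech (they share nothing with their own delayed version); the one-sample delay, the 16 taps and the step size are assumptions for the example. The filter sees only the delayed input, so it can predict the periodic noise but not the speech stand-in, and the error output keeps the speech.

```python
import math
import random


def predict_and_subtract(n_samples=6000, n_taps=16, delay=1, mu=0.002, seed=2):
    """Return (speech, hum, error) sequences for a delay-based LMS predictor."""
    rng = random.Random(seed)
    weights = [0.0] * n_taps
    past = [0.0] * (delay + n_taps)        # past[0] is x(n-1), past[1] is x(n-2), ...
    speech, hum, errors = [], [], []
    for n in range(n_samples):
        s = rng.uniform(-0.5, 0.5)                     # stand-in for speech
        h = math.sin(2.0 * math.pi * n / 8.0)          # periodic noise
        x = s + h                                      # both share one input channel
        taps = past[delay - 1:delay - 1 + n_taps]      # x(n-delay), x(n-delay-1), ...
        y = sum(w * u for w, u in zip(weights, taps))  # prediction of the correlated part
        e = x - y                                      # output: what could not be predicted
        weights = [w + 2.0 * mu * e * u for w, u in zip(weights, taps)]
        past = [x] + past[:-1]
        speech.append(s)
        hum.append(h)
        errors.append(e)
    return speech, hum, errors


def power(values):
    return sum(v * v for v in values) / len(values)


speech, hum, errors = predict_and_subtract()
tail = slice(-1000, None)                  # judge performance after the weights have settled
noise_before = power(hum[tail])
noise_after = power([e - s for e, s in zip(errors[tail], speech[tail])])
print(noise_before, noise_after)
```

Real speech is correlated over short lags, which is why the choice of delay matters in practice; this sketch avoids the issue by using an uncorrelated stand-in.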

 

 

Figure 19.7

 

3.     Inverse System Identification (figure 19.8)

 

Remember back to Lecture 18 where we looked at a method for establishing the impulse response of the headset transfer function H(s) using adaptive system identification.

 

As ever our Active Noise Reduction Closed Loop equation takes the form:

 

 

To achieve ideal noise cancellation we would like to achieve the following condition:

 

 

Using inverse system identification we could get the FIR coefficients to adapt to the impulse response of the inverse function.
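A minimal sketch of inverse system identification is shown below. It assumes the adaptive filter is placed after the system to be inverted and that the desired signal is the original excitation, so the error falls only when the cascade of system and filter passes the signal unchanged. The two-tap system is a hypothetical stand-in, not the headset transfer function, and it is chosen so that a short FIR inverse exists without any extra delay.

```python
import random


def convolve(a, b):
    """Impulse response of two FIR sections connected in cascade."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def identify_inverse(system, n_taps=8, n_samples=10000, mu=0.02, seed=3):
    """Adapt FIR weights so that system followed by filter approximates a unit impulse."""
    rng = random.Random(seed)
    weights = [0.0] * n_taps
    drive = [0.0] * len(system)            # excitation x(n), x(n-1), ...
    history = [0.0] * n_taps               # system output samples fed to the filter
    for _ in range(n_samples):
        drive = [rng.uniform(-1.0, 1.0)] + drive[:-1]
        system_out = sum(h * x for h, x in zip(system, drive))
        history = [system_out] + history[:-1]
        desired = drive[0]                 # the undistorted excitation
        output = sum(w * u for w, u in zip(weights, history))
        err = desired - output
        weights = [w + 2.0 * mu * err * u for w, u in zip(weights, history)]
    return weights


system = [1.0, 0.5]                        # hypothetical system to be inverted
inverse = identify_inverse(system)
cascade = convolve(system, inverse)        # ideally 1 followed by zeros
print([round(c, 3) for c in cascade])
```

A finite FIR filter can only approximate the inverse, so the cascade is close to, rather than exactly, a single unit impulse.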

 

 

Figure 19.8

 

 

End Of Lecture