# An overview on optimized NLMS algorithms for acoustic echo cancellation

Constantin Paleologu^{1} (corresponding author), Silviu Ciochină^{1}, Jacob Benesty^{2}, and Steven L. Grant^{3}

**Received:** 30 June 2015

**Accepted:** 6 November 2015

**Published:** 19 November 2015

## Abstract

Acoustic echo cancellation represents one of the most challenging system identification problems. The most used adaptive filter in this application is the popular normalized least mean square (NLMS) algorithm, which has to address the classical compromise between fast convergence/tracking and low misadjustment. In order to meet these conflicting requirements, the step-size of this algorithm needs to be controlled. Inspired by the pioneering work of Prof. E. Hänsler and his collaborators on this fundamental topic, we present in this paper several solutions to control the adaptation of the NLMS adaptive filter. The developed algorithms are “non-parametric” in nature, i.e., they do not require any additional features to control their behavior. Simulation results indicate the good performance of the proposed solutions and support the practical applicability of these algorithms.

## 1 Review

### 1.1 Introduction

Hands-free audio terminals are required in many popular applications, such as mobile telephony and teleconferencing systems. An important issue that has to be addressed when dealing with such devices is the acoustic coupling between the loudspeaker and the microphone. Due to this coupling, the microphone of the device captures a signal coming from its own loudspeaker, known as the acoustic echo. This phenomenon is influenced by the environment’s characteristics, and it can be very disturbing for the users. For example, in a telephone conversation, the user could hear a replica of her/his own voice. Consequently, in order to enhance the overall quality of the communication, there is a need to cancel the unwanted acoustic echo.

In this context, acoustic echo cancellation (AEC) provides one of the best solutions to the control of acoustic echoes generated by hands-free audio terminals. The basic issue in AEC is then to estimate the impulse response between the loudspeaker and the microphone of the device. The most reliable solution to this problem is the use of an adaptive filter that generates at its output a replica of the echo, which is further subtracted from the microphone signal [1–9]. In other words, the adaptive filter has to model an unknown system (i.e., the acoustic echo path between the loudspeaker and the microphone), like in a “system identification” problem [10–12].

Despite the straightforward formulation of the problem, there are several specific features of AEC, which represent a challenge for any adaptive algorithm. First, the acoustic echo paths have excessive lengths in time (up to hundreds of milliseconds), due to the slow speed of sound in the air, together with multiple reflections caused by the environment; consequently, long length adaptive filters are required (hundreds or even thousands of coefficients), thus influencing the convergence rate of the algorithm. Also, the acoustic echo paths are time-variant systems, depending on temperature, pressure, humidity, and movement of objects or bodies; hence, good tracking capabilities are required for the echo canceller. Second, the echo signal is combined with the near-end signal; ideally, the adaptive filter should separate this mixture and provide an estimate of the echo at its output and an estimate of the near-end from the error signal (from this point of view, the adaptive filter works as in an “interference cancelling” configuration [10–12]). This is not an easy task, since the near-end signal can contain both the background noise and the near-end speech; the background noise can be non-stationary and strong (it is also amplified by the microphone of the hands-free device), while the near-end speech acts like a large level disturbance. Moreover, the input of the adaptive filter (i.e., the far-end signal) is mainly speech, which is a non-stationary and highly correlated signal that can influence the overall performance of the adaptive algorithm.

In addition, the double-talk case (i.e., the talkers on both sides speak simultaneously) is perhaps the most challenging situation in AEC. The behavior of the adaptive filter can be seriously affected in this case, up to divergence. For this reason, the echo canceller is usually equipped with a double-talk detector (DTD), in order to slow down or completely halt the adaptation process during double-talk periods [6, 7]. Nevertheless, there is some inherent delay in the decision of any DTD; during this small period, a few undetected large amplitude samples can perturb the echo path estimate considerably. Consequently, it is highly desirable to improve the robustness of the adaptive algorithm in order to handle a certain amount of double-talk without diverging.

Many adaptive algorithms were proposed in the context of AEC [1–9, 13], but the workhorse remains the normalized least mean square (NLMS) algorithm [10–12]. The main reasons behind this popularity are its moderate computational complexity, together with its good numerical stability. The performance of the NLMS algorithm is influenced by two important parameters, i.e., the normalized step-size and regularization terms [1, 8, 11]. The first one reflects a trade-off between convergence rate and misadjustment of the algorithm. The second parameter is essential in all ill-posed and ill-conditioned problems such as in adaptive filters; it depends on the signal-to-noise ratio (SNR) of the system [14]. Both these parameters can be controlled (i.e., making them time dependent) in order to address the conflicting requirement of fast convergence and low misadjustment. This was the main motivation behind the development of variable step-size (VSS) and variable regularized (VR) versions of the NLMS algorithm, e.g., [13, 15–25]. Even if they focus on the optimization of different parameters, the VSS-NLMS and VR-NLMS algorithms are basically equivalent in terms of their purpose [1, 19]. In general, most of them require the tuning of some additional parameters that are difficult to control in practice. For real-world AEC applications, it is highly desirable to design “non-parametric” algorithms, which can operate without requiring additional features related to the acoustic environment (e.g., system change detector).

In this context, the contributions of Prof. E. Hänsler and his collaborators represent real milestones in the field. For example, in [1], Hänsler and Schmidt present a comprehensive and insightful review of the methods and algorithms used for acoustic echo and noise control. In their work, a special interest is given to the performance analysis of the NLMS algorithm (e.g., see Chapters 7 and 13 from [1]), in terms of developing optimal expressions for its control parameters, i.e., the normalized step-size and regularization term. In Section 1.2 of this paper, we summarize their main findings related to the control of the NLMS algorithm. Also, in Section 1.3, we present another benchmark solution, i.e., the non-parametric variable step-size NLMS (NPVSS-NLMS) algorithm [19]. Motivated and inspired by the work of Hänsler and Schmidt [1] (summarized in Section 1.2), we extend their findings in the framework of a state variable model (similar to Kalman filtering) [26]. The joint-optimized NLMS (JO-NLMS) algorithm developed in Section 1.4 brings together three main elements: a time-variant system model, an optimization criterion based on the minimization of the system misalignment, and an iterative procedure for adjusting the system model parameter. Consequently, it achieves a proper compromise between the performance criteria, i.e., fast convergence/tracking and low misadjustment, without requiring any additional features to control its behavior (like stability thresholds or system change detector). Simulations performed in Section 1.5 support the theoretical findings and indicate the good performance of the presented algorithms. Finally, Section 2 concludes this work and outlines several perspectives.

### 1.2 Control of the NLMS algorithm

In the general AEC configuration (Fig. 1), the far-end signal, *x*(*n*), goes through the echo path, **h**(*n*), providing the echo signal, *y*(*n*), where *n* is the time index. This signal is added to the near-end signal, *v*(*n*) (which can contain both the background noise and the near-end speech), resulting in the microphone signal, *d*(*n*). The adaptive filter, defined by the vector \(\widehat {\mathbf {h}}(n)\), aims to produce at its output an estimate of the echo, \(\widehat {y}(n)\), while the error signal, *e*(*n*), should contain an estimate of the near-end signal.

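The unknown system (i.e., the acoustic echo path) and the adaptive filter are driven by the same input signal, *x*(*n*). These two systems are assumed to be finite impulse response (FIR) filters of length *L*, defined by the real-valued vectors:

\[\mathbf{h}(n) = \left[ h_{0}(n) \ h_{1}(n) \ \cdots \ h_{L-1}(n) \right]^{T}, \qquad \widehat{\mathbf{h}}(n) = \left[ \widehat{h}_{0}(n) \ \widehat{h}_{1}(n) \ \cdots \ \widehat{h}_{L-1}(n) \right]^{T},\]

where the superscript ^{T} denotes transposition. The desired (or microphone) signal for the adaptive filter is

\[d(n) = \mathbf{x}^{T}(n)\mathbf{h}(n) + v(n) = y(n) + v(n), \tag{1}\]

where

\[\mathbf{x}(n) = \left[ x(n) \ x(n-1) \ \cdots \ x(n-L+1) \right]^{T}\]

is a real-valued vector containing the *L* most recent time samples of the input signal, *x*(*n*), and *v*(*n*) (i.e., the near-end signal) plays the role of the system noise (assumed to be quasi-stationary, zero mean, and independent of *x*(*n*)) that corrupts the output of the unknown system.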

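The a priori and a posteriori error signals are defined as

\[e(n) = d(n) - \mathbf{x}^{T}(n)\widehat{\mathbf{h}}(n-1), \tag{2}\]

\[\varepsilon(n) = d(n) - \mathbf{x}^{T}(n)\widehat{\mathbf{h}}(n), \tag{3}\]

where the vectors \(\widehat{\mathbf{h}}(n-1)\) and \(\widehat{\mathbf{h}}(n)\) contain the adaptive filter coefficients at time *n*−1 and *n*, respectively. The update equation for NLMS-type algorithms is

\[\widehat{\mathbf{h}}(n) = \widehat{\mathbf{h}}(n-1) + \mu(n)\mathbf{x}(n)e(n), \tag{4}\]

where *μ*(*n*) is a positive factor known as the step-size, which governs the stability, the convergence rate, and the misadjustment of the algorithm. A reasonable way to derive *μ*(*n*), taking into account the stability conditions, is to cancel the a posteriori error signal [27]. Replacing (4) in (3) with the requirement *ε*(*n*)=0, it results that

\[\varepsilon(n) = e(n)\left[ 1 - \mu(n)\mathbf{x}^{T}(n)\mathbf{x}(n) \right] = 0, \tag{5}\]

and, assuming that *e*(*n*)≠0, we find

\[\mu(n) = \frac{1}{\mathbf{x}^{T}(n)\mathbf{x}(n)}. \tag{6}\]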

We should note that the above procedure makes sense in the absence of noise [i.e., *v*(*n*)=0], where the condition *ε*(*n*)=0 implies that \(\mathbf {x}^{T}(n) \left [ \mathbf {h}(n) - \widehat {\mathbf {h}}(n) \right ]=0\). Finding the parameter *μ*(*n*) in the presence of noise will introduce noise in \(\widehat {\mathbf {h}}(n)\), since the condition *ε*(*n*)=0 leads to \(\mathbf {x}^{T}(n) \left [ \mathbf {h}(n) - \widehat {\mathbf {h}}(n) \right ]=-v (n) \neq 0\). In fact, we would like to have \(\mathbf {x}^{T}(n) \left [ \mathbf {h}(n) - \widehat {\mathbf {h}}(n) \right ]=0\), which implies that *ε*(*n*)=*v*(*n*).

For this reason, a positive constant *α* (with 0<*α*<2), known as the normalized step-size, multiplies (6) to achieve a proper compromise between the convergence rate and the misadjustment [10–12]; also, a positive constant *δ*, known as the regularization parameter, is added to the denominator of (6) in order to make the adaptive filter work well in the presence of noise. Consequently, the well-known update equation of the NLMS algorithm becomes

\[\widehat{\mathbf{h}}(n) = \widehat{\mathbf{h}}(n-1) + \frac{\alpha\, \mathbf{x}(n)e(n)}{\delta + \mathbf{x}^{T}(n)\mathbf{x}(n)}.\]
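The NLMS recursion described above (a priori error, normalization by the regularized input energy, coefficient update) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names `nlms` and `demo_nlms`, the 4-tap impulse response, and the default values of `alpha` and `delta` are assumptions chosen for the example, not values taken from the paper.

```python
import random


def nlms(x, d, L, alpha=1.0, delta=1e-6):
    """Identify an FIR system with NLMS; return (h_hat, a priori errors)."""
    h_hat = [0.0] * L
    x_vec = [0.0] * L  # [x(n), x(n-1), ..., x(n-L+1)]
    errors = []
    for x_n, d_n in zip(x, d):
        x_vec = [x_n] + x_vec[:-1]
        y_hat = sum(h * xi for h, xi in zip(h_hat, x_vec))
        e = d_n - y_hat  # a priori error e(n)
        # normalized step-size divided by the regularized input energy
        mu = alpha / (delta + sum(xi * xi for xi in x_vec))
        h_hat = [h + mu * xi * e for h, xi in zip(h_hat, x_vec)]
        errors.append(e)
    return h_hat, errors


def demo_nlms(n_samples=2000, seed=0):
    """Noise-free toy example with a hypothetical 4-tap echo path."""
    rng = random.Random(seed)
    h_true = [0.5, -0.3, 0.2, 0.1]
    x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    x_vec = [0.0] * len(h_true)
    d = []
    for x_n in x:
        x_vec = [x_n] + x_vec[:-1]
        d.append(sum(h * xi for h, xi in zip(h_true, x_vec)))
    h_hat, _ = nlms(x, d, L=len(h_true))
    return h_true, h_hat
```

In the noise-free case with *α*=1, the estimate should approach the true coefficients very closely; adding noise to `d` exposes the convergence/misadjustment compromise discussed in the text.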

#### 1.2.1 Performance analysis

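The two control parameters, *α* and *δ*, highly influence the overall performance of the NLMS algorithm. An insightful analysis of their influence was developed by Hänsler and Schmidt in [1]. To begin, let us define the a posteriori misalignment (also known as the system mismatch [1]) as

\[\mathbf{m}(n) = \mathbf{h}(n) - \widehat{\mathbf{h}}(n).\]

Assuming that the unknown system is time-invariant, i.e.,

\[\mathbf{h}(n) = \mathbf{h}(n-1), \tag{9}\]

the NLMS update can be rewritten in terms of the misalignment as

\[\mathbf{m}(n) = \mathbf{m}(n-1) - \frac{\alpha\, \mathbf{x}(n)e(n)}{\delta + \mathbf{x}^{T}(n)\mathbf{x}(n)}. \tag{10}\]

Taking the *ℓ*_{2} norm in (10), we obtain

\[\left\| \mathbf{m}(n) \right\|_{2}^{2} = \left\| \mathbf{m}(n-1) \right\|_{2}^{2} - \frac{2\alpha\, \mathbf{x}^{T}(n)\mathbf{m}(n-1)e(n)}{\delta + \left\| \mathbf{x}(n) \right\|_{2}^{2}} + \frac{\alpha^{2} e^{2}(n) \left\| \mathbf{x}(n) \right\|_{2}^{2}}{\left[ \delta + \left\| \mathbf{x}(n) \right\|_{2}^{2} \right]^{2}}. \tag{11}\]

Taking mathematical expectation on both sides, using \(e(n) = \mathbf{x}^{T}(n)\mathbf{m}(n-1) + v(n)\), and removing the uncorrelated products (*x*(*n*) and *v*(*n*) are assumed to be independent and zero mean), we obtain

\[E\left[ \left\| \mathbf{m}(n) \right\|_{2}^{2} \right] = E\left[ \left\| \mathbf{m}(n-1) \right\|_{2}^{2} \right] - E\left\{ \left[ \frac{2\alpha}{\delta + \left\| \mathbf{x}(n) \right\|_{2}^{2}} - \frac{\alpha^{2} \left\| \mathbf{x}(n) \right\|_{2}^{2}}{\left( \delta + \left\| \mathbf{x}(n) \right\|_{2}^{2} \right)^{2}} \right] \left[ \mathbf{x}^{T}(n)\mathbf{m}(n-1) \right]^{2} \right\} + E\left[ \frac{\alpha^{2} \left\| \mathbf{x}(n) \right\|_{2}^{2}\, v^{2}(n)}{\left( \delta + \left\| \mathbf{x}(n) \right\|_{2}^{2} \right)^{2}} \right]. \tag{14}\]

For a large value of *L* (i.e., *L*≫1), it holds that \(\left \| \mathbf {x}(n) \right \|^{2}_{2} \approx L{\sigma _{x}^{2}}\) [1, 19]. Consequently,

\[\frac{\alpha}{\delta + \left\| \mathbf{x}(n) \right\|_{2}^{2}} \approx \frac{\alpha}{\delta + L\sigma_{x}^{2}}.\]

However, for a large value of *L* and a certain stationarity degree of the input signal, we can treat this term as a deterministic quantity at this point [1]. Under these circumstances, (14) becomes

\[E\left[ \left\| \mathbf{m}(n) \right\|_{2}^{2} \right] = E\left[ \left\| \mathbf{m}(n-1) \right\|_{2}^{2} \right] - \left[ \frac{2\alpha}{\delta + L\sigma_{x}^{2}} - \frac{\alpha^{2} L \sigma_{x}^{2}}{\left( \delta + L\sigma_{x}^{2} \right)^{2}} \right] E\left\{ \left[ \mathbf{x}^{T}(n)\mathbf{m}(n-1) \right]^{2} \right\} + \frac{\alpha^{2} L \sigma_{x}^{2} \sigma_{v}^{2}}{\left( \delta + L\sigma_{x}^{2} \right)^{2}}.\]

Next, let us assume that the input vector, **x**(*n*), and the a posteriori misalignment vector, **m**(*n*−1), are statistically independent, and *x*(*n*) is white. In this case,

\[E\left\{ \left[ \mathbf{x}^{T}(n)\mathbf{m}(n-1) \right]^{2} \right\} = \sigma_{x}^{2} E\left[ \left\| \mathbf{m}(n-1) \right\|_{2}^{2} \right], \tag{17}\]

so that

\[E\left[ \left\| \mathbf{m}(n) \right\|_{2}^{2} \right] = A\left(\alpha,\delta,L,\sigma_{x}^{2}\right) E\left[ \left\| \mathbf{m}(n-1) \right\|_{2}^{2} \right] + B\left(\alpha,\delta,L,\sigma_{x}^{2}\right) \sigma_{v}^{2},\]

where

\[A\left(\alpha,\delta,L,\sigma_{x}^{2}\right) = 1 - \frac{2\alpha \sigma_{x}^{2}}{\delta + L\sigma_{x}^{2}} + \frac{\alpha^{2} L \sigma_{x}^{4}}{\left( \delta + L\sigma_{x}^{2} \right)^{2}}, \tag{19}\]

\[B\left(\alpha,\delta,L,\sigma_{x}^{2}\right) = \frac{\alpha^{2} L \sigma_{x}^{2}}{\left( \delta + L\sigma_{x}^{2} \right)^{2}} \tag{20}\]

represent the so-called *contraction* and *expansion* parameters, respectively [1].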

The contraction parameter, \(A\left (\alpha,\delta,L,{\sigma _{x}^{2}}\right)\), governs the convergence rate and is smaller than 1 for 0<*α*<2 and *δ*≥0. The expansion parameter, \(B\left (\alpha,\delta,L,{\sigma _{x}^{2}}\right)\), is related to the influence of the system noise, since it multiplies \({\sigma _{v}^{2}}\). Both terms depend on the control parameters, *α* and *δ*, as well as on the filter length, *L*, and the input signal power, \({\sigma _{x}^{2}}\). However, a compromise should be made when setting the values of the control parameters. For example, if the influence of the system noise should be eliminated completely, i.e., \(B\left (\alpha,\delta,L,{\sigma _{x}^{2}}\right) = 0\), we should set *α*=0 or *δ*=∞, which, on the other hand, leads to \(A\left (\alpha,\delta,L,{\sigma _{x}^{2}} \right) = 1\), i.e., the filter will not be updated. The fastest convergence (FC) mode is achieved when \(A\left (\alpha,\delta,L,{\sigma _{x}^{2}} \right)\) reaches its minimum (e.g., for *α*=1 and *δ*=0), which, unfortunately, increases the misadjustment (in terms of the influence of the system noise). For example, taking the normalized step-size as the reference parameter and evaluating

\[\frac{\partial A\left(\alpha,\delta,L,\sigma_{x}^{2}\right)}{\partial \alpha} = 0,\]

it results that

\[\alpha_{\text{FC}} = 1 + \frac{\delta}{L\sigma_{x}^{2}}. \tag{22}\]

Neglecting the regularization constant (i.e., *δ*≈0), the fastest convergence mode is achieved for *α*≈1, which is a well-known result [1, 11, 12]. Also, the stability condition can be found by imposing \(\left |A(\alpha,\delta,L,{\sigma _{x}^{2}})\right | < 1\), which leads to

\[0 < \alpha < 2\left( 1 + \frac{\delta}{L\sigma_{x}^{2}} \right). \tag{23}\]

For example, taking *δ*=0 in (23), the standard stability condition of the NLMS algorithm results, i.e., 0<*α*<2. On the other hand, the lowest misadjustment (LM) is obtained when the term from (20) reaches its minimum. Also, taking the normalized step-size as the reference parameter and evaluating

\[\frac{\partial B\left(\alpha,\delta,L,\sigma_{x}^{2}\right)}{\partial \alpha} = 0,\]

it results that

\[\alpha_{\text{LM}} = 0, \tag{25}\]

which is also a well-known result [1, 11, 12]; unfortunately, the filter is not updated in this case.

Summarizing, the convergence rate of the algorithm is not influenced by the level of the system noise, but the misadjustment increases when the system noise increases. More importantly, the expansion term from (20) always increases when *α* increases, which confirms that a higher value of the normalized step-size increases the misadjustment. Nevertheless, the ideal requirements of the algorithm are both fast convergence and low misadjustment. It is clear that (22) and (25) “push” the normalized step-size in opposite directions. This aspect represents the motivation behind the VSS approaches, i.e., the normalized step-size needs to be controlled in order to meet these conflicting requirements. The regularization constant also influences the performance of the algorithm, but in a “milder” way. For 0<*α*≤1, increasing the regularization constant moves the contraction term from (19) toward 1 (i.e., the convergence slows down) and reduces the expansion term from (20); conversely, the expansion term always increases when the regularization constant decreases.
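The compromise summarized above can be checked numerically. The sketch below assumes that the contraction and expansion terms take the standard form obtained under the white-input and independence assumptions, i.e., \(A = 1 - 2\alpha\sigma_x^2/(\delta + L\sigma_x^2) + \alpha^2 L \sigma_x^4/(\delta + L\sigma_x^2)^2\) and \(B = \alpha^2 L \sigma_x^2/(\delta + L\sigma_x^2)^2\). The function names and the steady-state helper are illustrative assumptions, not part of the original analysis.

```python
def contraction(alpha, delta, L, sigma_x2):
    """Contraction term A: weight of the previous mean-square misalignment."""
    denom = delta + L * sigma_x2
    return 1.0 - 2.0 * alpha * sigma_x2 / denom + alpha**2 * L * sigma_x2**2 / denom**2


def expansion(alpha, delta, L, sigma_x2):
    """Expansion term B: weight of the system noise power."""
    denom = delta + L * sigma_x2
    return alpha**2 * L * sigma_x2 / denom**2


def steady_state_misalignment(alpha, delta, L, sigma_x2, sigma_v2):
    """Fixed point of m = A*m + B*sigma_v2 (meaningful only when |A| < 1)."""
    A = contraction(alpha, delta, L, sigma_x2)
    B = expansion(alpha, delta, L, sigma_x2)
    return B * sigma_v2 / (1.0 - A)
```

For instance, with *δ*=0 the contraction is strongest at *α*=1, while the steady-state misalignment given by the fixed point grows with *α*, which is the trade-off that motivates a variable step-size.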

#### 1.2.2 Optimal choice of the control parameters

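Consider first the control of the normalized step-size alone, i.e., the NLMS update with *δ*=0 and a time-dependent normalized step-size, *α*(*n*). The a priori error can be decomposed as

\[e(n) = \mathbf{x}^{T}(n)\mathbf{m}(n-1) + v(n) = e_{\mathrm{u}}(n) + v(n), \tag{27}\]

where *e*_{u}(*n*) denotes the so-called undistorted error signal [1], i.e., the part of the error that is not affected by the system noise. Using this notation, (11) can be rewritten as

\[\left\| \mathbf{m}(n) \right\|_{2}^{2} = \left\| \mathbf{m}(n-1) \right\|_{2}^{2} - \frac{2\alpha(n)\, e_{\mathrm{u}}(n)e(n)}{\left\| \mathbf{x}(n) \right\|_{2}^{2}} + \frac{\alpha^{2}(n)\, e^{2}(n)}{\left\| \mathbf{x}(n) \right\|_{2}^{2}}.\]

Taking mathematical expectation on both sides and setting the derivative with respect to *α*(*n*) to zero, the step-size that minimizes the misalignment is

\[\alpha_{\text{opt}}(n) = \frac{E\left[ e_{\mathrm{u}}(n)e(n) / \left\| \mathbf{x}(n) \right\|_{2}^{2} \right]}{E\left[ e^{2}(n) / \left\| \mathbf{x}(n) \right\|_{2}^{2} \right]}. \tag{31}\]

For a large value of *L* (i.e., *L*≫1), the assumption \(\left \| \mathbf {x}(n) \right \|^{2}_{2} \approx L{\sigma _{x}^{2}}\) is valid [1, 19]. Also, since the input signal, *x*(*n*), and the system noise, *v*(*n*), are uncorrelated, the undistorted error signal, *e*_{u}(*n*), is also uncorrelated with the system noise. Therefore, (31) simplifies to

\[\alpha_{\text{opt}}(n) = \frac{E\left[ e_{\mathrm{u}}^{2}(n) \right]}{E\left[ e^{2}(n) \right]}. \tag{32}\]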

In the absence of the system noise [i.e., *v*(*n*)=0], the a priori error signal, *e*(*n*), equals the undistorted error signal, *e*_{u}(*n*), so that \(E\left[ e^{2}(n) \right] = E\left[ e_{\mathrm{u}}^{2}(n) \right]\) and the optimal normalized step-size is equal to 1, which justifies the discussion related to (6). In the presence of the system noise, when the adaptive filter starts to converge, the power of the undistorted error signal, *e*_{u}(*n*), decreases and, consequently, the normalized step-size decreases, thus leading to low misadjustment.

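However, the undistorted error signal, *e*_{u}(*n*), is not available in practice. In order to overcome this issue, several solutions were proposed in [1]. For example, assuming that the excitation, *x*(*n*), is white and considering that the input vector, **x**(*n*), and the a posteriori misalignment vector, **m**(*n*−1), are statistically independent, (32) can be developed based on (17) as

\[\alpha_{\text{opt}}(n) \approx \frac{\sigma_{x}^{2} E\left[ \left\| \mathbf{m}(n-1) \right\|_{2}^{2} \right]}{E\left[ e^{2}(n) \right]}.\]

The expected misalignment in the numerator is not directly available either. One of the solutions from [1] introduces an artificial delay in front of the echo path, so that the corresponding coefficients of the true impulse response are zero and the adaptive filter coefficients at these positions directly reflect the misalignment, i.e.,

\[E\left[ \left\| \mathbf{m}(n-1) \right\|_{2}^{2} \right] \approx \frac{L}{L_{\mathrm{D}}} \sum_{l=0}^{L_{\mathrm{D}}-1} \widehat{h}_{l}^{2}(n-1), \tag{34}\]

where *L*_{D} denotes the number of coefficients corresponding to the artificial delay. However, this method may freeze the adaptation when the unknown system changes, which would require an additional system change detector [1].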

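A similar development can be carried out for the regularization parameter, *δ*. Using a similar approach as before, the only-regularized version of the NLMS algorithm is considered (also imposing that the regularization parameter is time dependent), with the update

\[\widehat{\mathbf{h}}(n) = \widehat{\mathbf{h}}(n-1) + \frac{\mathbf{x}(n)e(n)}{\delta(n) + \mathbf{x}^{T}(n)\mathbf{x}(n)}.\]

With \(\mathbf{x}^{T}(n)\mathbf{x}(n) \approx L\sigma_{x}^{2}\), this update is equivalent to the step-size-controlled one when

\[\frac{1}{\delta(n) + L\sigma_{x}^{2}} = \frac{\alpha(n)}{L\sigma_{x}^{2}}, \quad \text{i.e.,} \quad \delta(n) = L\sigma_{x}^{2}\, \frac{1 - \alpha(n)}{\alpha(n)}. \tag{38}\]

In terms of *α*_{opt}(*n*), (38) leads to the optimal regularization parameter, which can be further developed as

\[\delta_{\text{opt}}(n) = L\sigma_{x}^{2}\, \frac{1 - \alpha_{\text{opt}}(n)}{\alpha_{\text{opt}}(n)} = \frac{L\sigma_{x}^{2}\sigma_{v}^{2}}{E\left[ e_{\mathrm{u}}^{2}(n) \right]} = \frac{L\sigma_{v}^{2}}{E\left[ \left\| \mathbf{m}(n-1) \right\|_{2}^{2} \right]}. \tag{39}\]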

The denominator of (39) can be evaluated based on (34). Also, another important parameter to be found is the noise power, \({\sigma _{v}^{2}}\). There are different methods for estimating this parameter; for example, in echo cancellation, it can be estimated during silences of the near-end talker [19]. Also, other practical methods to estimate \({\sigma _{v}^{2}}\) in AEC can be found in [28, 29] (which are briefly detailed in the end of Section 1.3). However, we should note that different other estimators can be used for the noise power; the analysis of their influence on the algorithms’ performance is beyond the scope of this paper.

Concluding, both control methods proposed by Hänsler and Schmidt in [1] [i.e., *α*_{opt}(*n*) and *δ*_{opt}(*n*)] are theoretically equivalent and represent valuable benchmarks in the field of VSS/VR-NLMS algorithms. However, in practice, their implementations are usually different. In most cases, the control of the normalized step-size is preferred, mainly due to the limited dynamic range of its values; on the other hand, the regularization control usually requires an upper bound (to avoid overflow in case of very large values).

### 1.3 NPVSS-NLMS algorithm

In the previous section, the optimization criterion used for adjusting the control parameters was the minimization of the system misalignment. However, in a system identification setup like AEC (as shown in Fig. 1), this is equivalent to recovering the system noise from the error of the adaptive filter [1].

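In this context, the step-size parameter of (4), *μ*(*n*) (which is deterministic in nature), can be found by imposing the condition [19]

\[E\left[ \varepsilon^{2}(n) \right] = \sigma_{v}^{2}.\]

Using (5) and the approximation \(\mathbf{x}^{T}(n)\mathbf{x}(n) \approx L\sigma_{x}^{2}\) (valid for *L*≫1), it results

\[\mu_{\text{NPVSS}}(n) = \frac{1}{\mathbf{x}^{T}(n)\mathbf{x}(n)} \left[ 1 - \frac{\sigma_{v}}{\sigma_{e}(n)} \right] = \frac{\alpha_{\text{NPVSS}}(n)}{\mathbf{x}^{T}(n)\mathbf{x}(n)}, \tag{44}\]

where \(\sigma_{e}^{2}(n) = E\left[ e^{2}(n) \right]\) is the power of the error signal and \(\alpha_{\text{NPVSS}}(n) = 1 - \sigma_{v}/\sigma_{e}(n)\) is the normalized step-size.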

Let us examine the behavior of the algorithm in terms of its normalized step-size. Looking at (44), it is obvious that before the algorithm converges, *σ*_{e}(*n*) is large compared to *σ*_{v} and, consequently, *α*_{NPVSS}(*n*)≈1. When the algorithm has converged to the true solution, *σ*_{e}(*n*)≈*σ*_{v} and *α*_{NPVSS}(*n*)≈0. This is the desired behavior for the adaptive algorithm, leading to both fast convergence and low misadjustment.

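Since \(E\left[ e^{2}(n) \right] = E\left[ e_{\mathrm{u}}^{2}(n) \right] + \sigma_{v}^{2}\), (32) can be written as \(\alpha_{\text{opt}}(n) = 1 - \sigma_{v}^{2}/\sigma_{e}^{2}(n) = \alpha_{\text{NPVSS}}(n)\left[ 1 + \sigma_{v}/\sigma_{e}(n) \right]\). It is clear that *α*_{opt}(*n*) is larger than *α*_{NPVSS}(*n*) by a factor between 1 and 2, but the two variable step-sizes have the same effect for good convergence and low misadjustment.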

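The convergence of the NPVSS-NLMS algorithm can be analyzed in terms of the misalignment. For a time-invariant system, the update (4) can be written as

\[\mathbf{m}(n) = \mathbf{m}(n-1) - \mu_{\text{NPVSS}}(n)\mathbf{x}(n)e(n). \tag{48}\]

Taking the *ℓ*_{2} norm in (48), then mathematical expectation on both sides, and assuming that *v*(*n*) is white, we obtain

\[E\left[ \left\| \mathbf{m}(n) \right\|_{2}^{2} \right] - E\left[ \left\| \mathbf{m}(n-1) \right\|_{2}^{2} \right] = -\mu_{\text{NPVSS}}(n) \left[ \sigma_{e}(n) - \sigma_{v} \right] \left[ \sigma_{e}(n) + 2\sigma_{v} \right] \leq 0.\]

Hence, the length of the misalignment vector is non-increasing, which implies that \(\lim_{n \to \infty} \sigma_{e}(n) = \sigma_{v}\). In addition,

\[\sigma_{e}^{2}(n) = \sigma_{v}^{2} + \text{tr}\left[ \mathbf{R}\mathbf{K}(n-1) \right]\]

if the misalignment vector, **m**(*n*−1), and the input vector, **x**(*n*), are independent (i.e., the white input assumption), where tr(·) is the trace of a matrix, **R**=*E*[**x**(*n*)**x**^{T}(*n*)], and **K**(*n*−1)=*E*[**m**(*n*−1)**m**^{T}(*n*−1)]. Combining the two previous results gives \(\text{tr}\left[ \mathbf{R}\mathbf{K}(\infty) \right] = 0\). Since **R**>0 (i.e., **R** is a positive definite matrix), it results that **K**(∞)=**0** and, consequently, the misalignment vanishes asymptotically, i.e., \(\lim_{n \to \infty} \left\| \mathbf{m}(n) \right\|_{2} = 0\).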

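Several practical considerations have to be addressed. First, a regularization constant, *δ*, should be added to the denominator of *μ*_{NPVSS}(*n*). A second consideration is related to the estimation of the parameter *σ*_{e}(*n*). In practice, the power of the error signal is estimated as follows:

\[\widehat{\sigma}_{e}^{2}(n) = \lambda \widehat{\sigma}_{e}^{2}(n-1) + (1-\lambda) e^{2}(n), \tag{55}\]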

where *λ* is a weighting factor. Its value is chosen as *λ*=1−1/(*KL*), where *K*>1. The initial value for (55) is \(\widehat {\sigma }_{e}^{2}(0)=0\). Theoretically, it is clear that \({\sigma _{e}^{2}}(n) \geq {\sigma _{v}^{2}}\), which implies that *μ*_{NPVSS}(*n*)≥0. Nevertheless, the estimation from (55) could result in a lower magnitude than the noise power estimate, which would make *μ*_{NPVSS}(*n*) negative. In this situation, the problem is solved by setting *μ*_{NPVSS}(*n*)=0.

NPVSS-NLMS algorithm

| Step | Operation |
| --- | --- |
| Initialization | \(\widehat {\mathbf {h}}(0)=\mathbf {0}_{L \times 1}\) |
| | \(\widehat {\sigma }_{e}^{2}(0) = 0\) |
| Parameters | \(\lambda = 1 - \frac {1}{KL}\), weighting factor with \(K>1\) |
| | \({\sigma _{v}^{2}}\), noise power known or estimated |
| | \(\delta > 0\), regularization constant |
| | \(\zeta > 0\), small constant that avoids division by zero |
| For each time index *n* | \(e(n) = d(n) - \mathbf {x}^{T}(n) \widehat {\mathbf {h}}(n-1)\) |
| | \(\widehat {\sigma }_{e}^{2}(n) = \lambda \widehat {\sigma }_{e}^{2}(n-1) + (1-\lambda) e^{2}(n)\) |
| | \(\alpha _{\text {NPVSS}}(n) = 1 - \frac {\sigma _{v}}{\zeta + \widehat {\sigma }_{e}(n)}\) |
| | \(\mu _{\text {NPVSS}}(n) = \begin{cases} \alpha _{\text {NPVSS}}(n) \left [ \delta + \mathbf {x}^{T}(n) \mathbf {x}(n) \right ]^{-1}, & \text {if} \ \alpha _{\text {NPVSS}}(n) > 0 \\ 0, & \text {otherwise} \end{cases}\) |
| | \(\widehat {\mathbf {h}}(n) = \widehat {\mathbf {h}}(n-1) + \mu _{\text {NPVSS}}(n) \mathbf {x}(n) e(n)\) |
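The steps of the NPVSS-NLMS table can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the function names, the default values of `K`, `delta`, and `zeta`, and the synthetic 16-tap echo path in the demo are hypothetical choices made for illustration, and the noise power is assumed to be known.

```python
import math
import random


def npvss_nlms(x, d, L, sigma_v2, K=4, delta=1e-2, zeta=1e-8):
    """NPVSS-NLMS sketch; returns (h_hat, list of applied normalized step-sizes)."""
    lam = 1.0 - 1.0 / (K * L)        # weighting factor of the error power estimate
    sigma_v = math.sqrt(sigma_v2)
    h_hat = [0.0] * L
    x_vec = [0.0] * L
    sigma_e2 = 0.0
    alphas = []
    for x_n, d_n in zip(x, d):
        x_vec = [x_n] + x_vec[:-1]
        e = d_n - sum(h * xi for h, xi in zip(h_hat, x_vec))
        sigma_e2 = lam * sigma_e2 + (1.0 - lam) * e * e
        alpha = 1.0 - sigma_v / (zeta + math.sqrt(sigma_e2))
        if alpha > 0.0:  # otherwise the step-size is set to zero (no update)
            mu = alpha / (delta + sum(xi * xi for xi in x_vec))
            h_hat = [h + mu * xi * e for h, xi in zip(h_hat, x_vec)]
        alphas.append(max(alpha, 0.0))
    return h_hat, alphas


def demo_npvss(n_samples=8000, L=16, seed=1):
    """White Gaussian input, hypothetical echo path, SNR of about 20 dB."""
    rng = random.Random(seed)
    h_true = [(-0.8) ** k for k in range(L)]
    sigma_v2 = 0.01 * sum(h * h for h in h_true)  # 20 dB below the echo power
    x_vec = [0.0] * L
    x, d = [], []
    for _ in range(n_samples):
        x_n = rng.gauss(0.0, 1.0)
        x_vec = [x_n] + x_vec[:-1]
        echo = sum(h * xi for h, xi in zip(h_true, x_vec))
        x.append(x_n)
        d.append(echo + rng.gauss(0.0, math.sqrt(sigma_v2)))
    h_hat, alphas = npvss_nlms(x, d, L, sigma_v2)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(h_true, h_hat)))
    return err / math.sqrt(sum(h * h for h in h_true)), alphas
```

In this toy setting, the applied normalized step-size should start close to 1 and drop toward 0 as the error power approaches the noise power, in line with the behavior described for (44).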

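In practice, the noise power, \({\sigma _{v}^{2}}\), has to be estimated. A first method [28] relies on the cross-correlation between the input and the error signals, i.e.,

\[\widehat{\sigma}_{v}^{2}(n) = \widehat{\sigma}_{e}^{2}(n) - \frac{1}{\widehat{\sigma}_{x}^{2}(n)}\, \widehat{\mathbf{r}}_{ex}^{T}(n) \widehat{\mathbf{r}}_{ex}(n),\]

where the power of *e*(*n*) is estimated based on (55) and the other terms are evaluated in a similar manner, i.e.,

\[\widehat{\sigma}_{x}^{2}(n) = \lambda \widehat{\sigma}_{x}^{2}(n-1) + (1-\lambda) x^{2}(n),\]

\[\widehat{\mathbf{r}}_{ex}(n) = \lambda \widehat{\mathbf{r}}_{ex}(n-1) + (1-\lambda) \mathbf{x}(n) e(n).\]

A second method [29] starts from (1), i.e., *d*(*n*)=*y*(*n*)+*v*(*n*). Since the echo signal and the near-end signal can be considered uncorrelated, the previous relation can be rewritten in terms of variances as

\[E\left[ d^{2}(n) \right] = E\left[ y^{2}(n) \right] + E\left[ v^{2}(n) \right].\]

Assuming that the adaptive filter has converged to a certain degree, we can use the approximation

\[E\left[ y^{2}(n) \right] \approx E\left[ \widehat{y}^{2}(n) \right], \tag{60}\]

so that the power of the near-end signal can be estimated as

\[\widehat{\sigma}_{v}^{2}(n) = \left| \widehat{\sigma}_{d}^{2}(n) - \widehat{\sigma}_{\widehat{y}}^{2}(n) \right|, \tag{61}\]

where \(\widehat{\sigma}_{d}^{2}(n)\) and \(\widehat{\sigma}_{\widehat{y}}^{2}(n)\) denote the power estimates of *d*(*n*) and \(\widehat {y}(n)\), respectively. These parameters can be recursively evaluated similar to (55), i.e.,

\[\widehat{\sigma}_{d}^{2}(n) = \lambda \widehat{\sigma}_{d}^{2}(n-1) + (1-\lambda) d^{2}(n),\]

\[\widehat{\sigma}_{\widehat{y}}^{2}(n) = \lambda \widehat{\sigma}_{\widehat{y}}^{2}(n-1) + (1-\lambda) \widehat{y}^{2}(n).\]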

The absolute values in (61) prevent any minor deviations (due to the use of power estimates) from the true values, which can make the normalized step-size negative or complex.

When only the background noise is present, an estimate of its power is obtained using the right-hand term in (61). This expression holds even if the level of the background noise changes, so that there is no need for the estimation of this parameter during silences of the near-end talker. In case of double-talk, when the near-end speech is also present (assuming that it is uncorrelated with the background noise), the right-hand term in (61) still provides a power estimate of the near-end signal. Most importantly, this term depends only on the signals that are available within the AEC application, i.e., the microphone signal, *d*(*n*), and the output of the adaptive filter, \(\widehat {y}(n)\). Moreover, as it was demonstrated in [29], the estimation from (61) is also suitable for the under-modeling case, i.e., when the length of \(\widehat {\mathbf {h}}(n)\) is smaller than the length of **h**(*n*), so that an under-modeling noise appears (i.e., the residual echo caused by the part of the echo path that is not modeled by the adaptive filter; it can be interpreted as an additional noise that corrupts the near-end signal).
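The near-end power estimate discussed above can be sketched as follows, assuming that it takes the form of the absolute difference between recursive power estimates of the microphone signal and of the adaptive filter output, consistent with the variance relation in the text. The function name and the choice of returning the whole trajectory are illustrative assumptions.

```python
def near_end_power_estimate(d, y_hat, lam):
    """Track |sigma_d^2(n) - sigma_yhat^2(n)| with exponential windows.

    d     : microphone samples d(n)
    y_hat : adaptive filter output samples
    lam   : weighting factor, 0 < lam < 1
    """
    sigma_d2 = 0.0
    sigma_y2 = 0.0
    estimates = []
    for d_n, y_n in zip(d, y_hat):
        sigma_d2 = lam * sigma_d2 + (1.0 - lam) * d_n * d_n
        sigma_y2 = lam * sigma_y2 + (1.0 - lam) * y_n * y_n
        # the absolute value keeps the power estimate non-negative
        estimates.append(abs(sigma_d2 - sigma_y2))
    return estimates
```

Only signals that are available in the application are used, so the estimate can follow changes of the near-end level; it is expected to be biased while the filter output is still a poor replica of the echo.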

The main drawback of (61) is due to the approximation in (60). This assumption will be biased in the initial convergence phase or when there is a change of the echo path. Concerning the first problem, we can use a regular NLMS algorithm in the first steps (e.g., in the first *L* iterations).

### 1.4 JO-NLMS algorithm

In both previous sections, the assumption from (9) was used when evaluating the a posteriori misalignment, i.e., the unknown system is time-invariant. However, in AEC and also in many other system identification problems, this assumption is quite strong. In practice, the system to be identified could be variable in time. For example, in AEC, it can be assumed that the impulse response of the echo path is modeled by a time-varying system following a first-order Markov model [9]. Therefore, a more reliable approach could be based on the Kalman filter, since the state variable model fits better in this context [26, 30, 31].

In this framework, we assume that **h**(*n*) is a zero-mean random vector, which follows a simplified first-order Markov model, i.e.,

\[\mathbf{h}(n) = \mathbf{h}(n-1) + \mathbf{w}(n), \tag{64}\]

where **w**(*n*) is a zero-mean white Gaussian noise signal vector, which is uncorrelated with **h**(*n*−1). The correlation matrix of **w**(*n*) is assumed to be \(\mathbf {R}_{\mathbf {w}} = {\sigma _{w}^{2}} \mathbf {I}_{L}\), where **I**_{L} is the *L*×*L* identity matrix. The variance, \({\sigma _{w}^{2}}\), captures the uncertainties in **h**(*n*). Equations (1) and (64) now define a state variable model, similar to the Kalman filtering setup.

#### 1.4.1 Convergence analysis

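In the framework of the state variable model, the a posteriori misalignment, \(\mathbf{m}(n) = \mathbf{h}(n) - \widehat{\mathbf{h}}(n)\), of the NLMS algorithm evolves as

\[\mathbf{m}(n) = \mathbf{m}(n-1) + \mathbf{w}(n) - \frac{\alpha\, \mathbf{x}(n)e(n)}{\delta + \mathbf{x}^{T}(n)\mathbf{x}(n)}. \tag{65}\]

For a large value of *L* (i.e., *L*≫1), it holds that \(\mathbf {x}^{T}(n)\mathbf {x}(n) \approx L{\sigma _{x}^{2}}\) [1, 19]. Consequently,

\[\frac{\alpha}{\delta + \mathbf{x}^{T}(n)\mathbf{x}(n)} \approx \frac{\alpha}{\delta + L\sigma_{x}^{2}}. \tag{66}\]

This term contains both the control parameters, i.e., *α* and *δ*, and also the statistical information on the input signal. However, for a large value of *L* and a certain stationarity degree of the input signal, we can treat this term as a deterministic quantity [1].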

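Taking the *ℓ*_{2} norm in (65), then mathematical expectation on both sides (also using (66)), and removing the uncorrelated products, we obtain a recursion for the mean-square misalignment, \(m(n) = E\left[ \left\| \mathbf{m}(n) \right\|_{2}^{2} \right]\). To evaluate it, two assumptions are used: (i) the a posteriori misalignment at time index *n*−1 is uncorrelated with the input vector at time index *n* and (ii) the correlation matrix of the input is close to a diagonal one, i.e., \(E\left [\mathbf {x}(n)\mathbf {x}^{T}(n)\right ] \approx {\sigma _{x}^{2}}\mathbf {I}_{L}\) (this is a fairly restrictive assumption; however, it has been widely used to simplify the analysis [16]). Using the Gaussian moment factoring for the fourth-order terms, taking into account that the correlation matrix of **w**(*n*) is assumed to be diagonal, and removing the uncorrelated products, it results in

\[m(n) = \widetilde{A}\left(\alpha,\delta,L,\sigma_{x}^{2}\right) \left[ m(n-1) + L\sigma_{w}^{2} \right] + B\left(\alpha,\delta,L,\sigma_{x}^{2}\right) \sigma_{v}^{2}, \tag{75}\]

where

\[\widetilde{A}\left(\alpha,\delta,L,\sigma_{x}^{2}\right) = 1 - \frac{2\alpha \sigma_{x}^{2}}{\delta + L\sigma_{x}^{2}} + \frac{(L+2)\alpha^{2} \sigma_{x}^{4}}{\left( \delta + L\sigma_{x}^{2} \right)^{2}}, \qquad B\left(\alpha,\delta,L,\sigma_{x}^{2}\right) = \frac{\alpha^{2} L \sigma_{x}^{2}}{\left( \delta + L\sigma_{x}^{2} \right)^{2}}.\]

Neglecting the regularization constant (i.e., *δ*≈0) and assuming that *L*≫2, the fastest convergence mode is achieved for *α*≈1, which is the same conclusion related to (22). Also, similar to (23), the stability condition can be found by imposing \(\left |\widetilde {A}(\alpha,\delta,L,{\sigma _{x}^{2}})\right | < 1\), which leads to

\[0 < \alpha < \frac{2\left( \delta + L\sigma_{x}^{2} \right)}{(L+2)\sigma_{x}^{2}}. \tag{79}\]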

For example, taking *δ*=0 and *L*≫2 in (79), the standard stability condition of the NLMS algorithm results, i.e., 0<*α*<2.

In order to compare this result with (25), let us assume that the system is time-invariant, i.e., \({\sigma _{w}^{2}} \approx 0\). Consequently, (80) leads to *α*≈0 (i.e., the lowest misadjustment is obtained for a normalized step-size close to zero), which is the same result obtained in (25).

#### 1.4.2 Derivation of the algorithm

It is known that the ideal requirements of any adaptive algorithm are for both fast convergence and low misadjustment. In our framework, there are two important issues to be considered: (1) we have two main parameters to control, *α* and *δ*, which influence the overall performance of the NLMS algorithm and (2) in the context of system identification, it is reasonable to follow a minimization problem in terms of the system misalignment, as outlined by Hänsler and Schmidt in [1].

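Following this idea, the term \(\alpha / \left( \delta + L\sigma_{x}^{2} \right)\) in (75) is treated as a single control parameter, and the value that minimizes *m*(*n*) is

\[\left. \frac{\alpha}{\delta + L\sigma_{x}^{2}} \right|_{\text{opt}} = \frac{m(n-1) + L\sigma_{w}^{2}}{L\sigma_{v}^{2} + (L+2)\sigma_{x}^{2} \left[ m(n-1) + L\sigma_{w}^{2} \right]}. \tag{83}\]

The corresponding update is

\[\widehat{\mathbf{h}}(n) = \widehat{\mathbf{h}}(n-1) + \frac{\left[ m(n-1) + L\sigma_{w}^{2} \right] \mathbf{x}(n)e(n)}{L\sigma_{v}^{2} + (L+2)\sigma_{x}^{2} \left[ m(n-1) + L\sigma_{w}^{2} \right]}. \tag{84}\]

It remains to update the parameter *m*(*n*) required in (84). Using (83) in (75), followed by several straightforward computations, it results in

\[m(n) = \left\{ 1 - \frac{\sigma_{x}^{2} \left[ m(n-1) + L\sigma_{w}^{2} \right]}{L\sigma_{v}^{2} + (L+2)\sigma_{x}^{2} \left[ m(n-1) + L\sigma_{w}^{2} \right]} \right\} \left[ m(n-1) + L\sigma_{w}^{2} \right]. \tag{85}\]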

Consequently, the resulting JO-NLMS algorithm is defined by the relations (2), (84), and (85).

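The parameter \({\sigma _{w}^{2}}\) can be estimated by taking the *ℓ*_{2} norm on both sides of (64) and replacing **h**(*n*) by its estimate \(\widehat {\mathbf {h}}(n)\), thus resulting in

\[\widehat{\sigma}_{w}^{2}(n) = \frac{1}{L} \left\| \widehat{\mathbf{h}}(n) - \widehat{\mathbf{h}}(n-1) \right\|_{2}^{2}. \tag{86}\]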

According to (86), the parameter \(\widehat {\sigma }_{w}^{2}(n)\) takes large values in the beginning of adaptation (or when there is an abrupt change of the system), thus providing fast convergence and tracking. On the other hand, when the algorithm starts to converge, the value of \(\widehat {\sigma }_{w}^{2}(n)\) reduces, which leads to low misadjustment. In this way, the algorithm achieves a proper compromise between the performance criteria. In finite precision implementations, in order to avoid any risk of freezing in (86), it is recommended to set a lower bound for \(\widehat {\sigma }_{w}^{2}(n)\) (e.g., the smallest positive number available).

JO-NLMS algorithm

| Step | Operation |
| --- | --- |
| Initialization | \(\widehat {\mathbf {h}}(0)=\mathbf {0}_{L \times 1}\) |
| | \(\widehat {\sigma }_{w}^{2}(0) = 0\) |
| Parameters | \({\sigma _{v}^{2}}\), noise power known or estimated |
| For each time index *n* | \(\widehat {\sigma }_{x}^{2}(n) = \frac {1}{L}\mathbf {x}^{T}(n)\mathbf {x}(n)\) |
| | \(e(n) = d(n)-\mathbf {x}^{T}(n)\widehat {\mathbf {h}}(n-1)\) |
| | \(p(n) = m(n-1)+ L\widehat {\sigma }_{w}^{2}(n-1)\) |
| | \(q(n) = \frac {p(n)}{L{\sigma _{v}^{2}} + (L+2)p(n)\widehat {\sigma }_{x}^{2}(n)}\) |
| | \(\widehat {\mathbf {h}}(n) = \widehat {\mathbf {h}}(n-1)+q(n)\mathbf {x}(n)e(n)\) |
| | \(m(n) = \left [1 - q(n)\widehat {\sigma }_{x}^{2}(n) \right ] p(n)\) |
| | \(\widehat {\sigma }_{w}^{2}(n)=\frac {1}{L}\left \lVert \widehat {\mathbf {h}}(n) - \widehat {\mathbf {h}}(n-1) \right \rVert_{2}^{2}\) |
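The JO-NLMS recursion listed above can be sketched in plain Python. Two points are assumptions of this sketch rather than information recovered from the table: the positive initial value `m0` of the misalignment parameter (without it, `p(n)` would stay at zero and the filter would never update), and the lower bound `sigma_w2_floor`, which follows the recommendation in the text to bound the estimate of \(\sigma_w^2\) from below. The function names and the synthetic echo path are likewise hypothetical.

```python
import math
import random


def jo_nlms(x, d, L, sigma_v2, m0=1.0, sigma_w2_floor=1e-12):
    """JO-NLMS sketch with known noise power sigma_v2 (> 0)."""
    h_hat = [0.0] * L
    x_vec = [0.0] * L
    m = m0            # assumed positive initial misalignment parameter
    sigma_w2 = 0.0
    for x_n, d_n in zip(x, d):
        x_vec = [x_n] + x_vec[:-1]
        sigma_x2 = sum(xi * xi for xi in x_vec) / L
        e = d_n - sum(h * xi for h, xi in zip(h_hat, x_vec))
        p = m + L * sigma_w2
        q = p / (L * sigma_v2 + (L + 2) * p * sigma_x2)
        h_new = [h + q * xi * e for h, xi in zip(h_hat, x_vec)]
        m = (1.0 - q * sigma_x2) * p
        # system-uncertainty estimate from the coefficient increment, floored
        diff2 = sum((a - b) ** 2 for a, b in zip(h_new, h_hat))
        sigma_w2 = max(diff2 / L, sigma_w2_floor)
        h_hat = h_new
    return h_hat


def demo_jo_nlms(n_samples=8000, L=16, seed=2):
    """White Gaussian input, hypothetical echo path, SNR of about 20 dB."""
    rng = random.Random(seed)
    h_true = [(-0.8) ** k for k in range(L)]
    sigma_v2 = 0.01 * sum(h * h for h in h_true)
    x_vec = [0.0] * L
    x, d = [], []
    for _ in range(n_samples):
        x_n = rng.gauss(0.0, 1.0)
        x_vec = [x_n] + x_vec[:-1]
        echo = sum(h * xi for h, xi in zip(h_true, x_vec))
        x.append(x_n)
        d.append(echo + rng.gauss(0.0, math.sqrt(sigma_v2)))
    h_hat = jo_nlms(x, d, L, sigma_v2)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(h_true, h_hat)))
    return err / math.sqrt(sum(h * h for h in h_true))
```

Note that no step-size or regularization constant appears as a tuning parameter: the gain `q` is driven only by the noise power, the input power, and the two recursively updated quantities.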

### 1.5 Simulation results

The length of the adaptive filter is *L*=512; the sampling rate is 8 kHz. We should note that in many real-world AEC scenarios, the adaptive filter most likely works in an under-modeling situation, i.e., its length is smaller than the length of the acoustic impulse response. Hence, the residual echo caused by the part of the system that cannot be modeled acts like an additional noise (that corrupts the near-end signal) and disturbs the overall performance. However, for experimental purposes, we set the same length for both the unknown system (i.e., the acoustic echo path) and the adaptive filter.

The far-end signal, *x*(*n*), is either a white Gaussian noise, an AR(1) process generated by filtering a white Gaussian noise through a first-order system 1/(1−0.8*z*^{−1}), or a speech sequence. An independent white Gaussian noise *v*(*n*) is added to the echo signal *y*(*n*), with SNR = 20 dB (except in the last experiment where the SNR is variable and the near-end speech is also present). In most of the experiments (except in the last one), we assume that \({\sigma _{v}^{2}}\) is known; in practice, this variance can be estimated like in [19, 28, 29] (as presented in the end of Section 1.3). The tracking capability of the algorithm is an important issue in AEC, where the acoustic impulse response may rapidly change at any time during the connection. Consequently, an echo path change scenario is simulated in most of the experiments, by shifting the impulse response to the right by 12 samples, in the middle of the experiment. The measure of performance is the normalized misalignment (in dB), defined as

\[20\log_{10} \frac{\left\| \mathbf{h}(n) - \widehat{\mathbf{h}}(n) \right\|_{2}}{\left\| \mathbf{h}(n) \right\|_{2}}. \tag{87}\]
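The ingredients of this experimental setup (AR(1) input, echo path shift, and the normalized misalignment in dB) can be sketched in plain Python. The helper names are hypothetical, and the misalignment is assumed to follow the usual definition as 20 log10 of the ratio between the norm of the coefficient error and the norm of the true impulse response.

```python
import math


def ar1_filter(w, pole=0.8):
    """Filter a white sequence w through 1/(1 - pole * z^-1)."""
    x, prev = [], 0.0
    for w_n in w:
        prev = pole * prev + w_n
        x.append(prev)
    return x


def shift_right(h, k):
    """Simulate an echo path change by delaying the impulse response by k samples."""
    return [0.0] * k + list(h[:len(h) - k])


def normalized_misalignment_db(h, h_hat):
    """20*log10(||h - h_hat|| / ||h||); returns -inf for a perfect estimate."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(h, h_hat)))
    den = math.sqrt(sum(a * a for a in h))
    if num == 0.0:
        return float("-inf")
    return 20.0 * math.log10(num / den)
```

With an all-zero estimate, the misalignment is 0 dB, which is the natural starting point of the curves; each additional 20 dB of attenuation corresponds to a tenfold reduction of the coefficient error norm.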

The first experiments evaluate the optimal control parameters [i.e., *α*_{opt}(*n*) and *δ*_{opt}(*n*) from Section 1.2], assuming that the undistorted error signal *e*_{u}(*n*) from (27) is available and its power, \(E\left [ e_{\mathrm {u}}^{2}(n) \right ] = \sigma _{e_{\mathrm {u}}}^{2}(n)\), can be evaluated similar to (55), i.e.,

\[\widehat{\sigma}_{e_{\mathrm{u}}}^{2}(n) = \lambda \widehat{\sigma}_{e_{\mathrm{u}}}^{2}(n-1) + (1-\lambda) \left[ e(n) - v(n) \right]^{2}, \tag{88}\]

where *λ* is a weighting factor [*λ*=1−1/(*KL*), with *K*>1]. Of course, in practice, the near-end signal *v*(*n*) is not available; however, for comparison purposes, we consider that it is available in (88).

Two versions of the NLMS algorithm are considered, controlled by *α*_{opt}(*n*) and *δ*_{opt}(*n*), respectively. Since the estimation from (88) is used for both these parameters, we deal with the ideal behavior of the algorithms. Consequently, we will refer to these algorithms as the ideal optimal step-size NLMS (OSS-NLMS-id) and the ideal optimal regularized NLMS (OR-NLMS-id), respectively. In Fig. 3, these ideal benchmarks are compared to the NLMS algorithm using different constant values of the normalized step-size, *α*, and regularization parameter, *δ*; the input signal is a white Gaussian noise. First, it can be noticed that the performance of the regular NLMS algorithm can be controlled in terms of both parameters, *α* and *δ*, either by setting the fastest convergence mode (i.e., *α*=1) and adjusting the value of *δ*, or by neglecting the regularization constant (i.e., *δ*=0) and tuning the value of *α*. On the other hand, in case of the optimal control parameters, the OSS-NLMS-id and OR-NLMS-id algorithms achieve both fast convergence/tracking and low misalignment, outperforming the NLMS algorithms that use constant values for *α* and *δ*. Besides, it should be noted that the OSS-NLMS-id and OR-NLMS-id algorithms are equivalent in terms of their performance (their misalignment curves overlap), which supports the findings from Section 1.2. For this experiment, the evolution of *α*_{opt}(*n*) and *δ*_{opt}(*n*) is depicted in Fig. 4, also supporting the expected behavior of these parameters.

From the evolution of *α*_{opt}(*n*) and *δ*_{opt}(*n*) depicted in Fig. 6, we can outline again the discussion from the end of Section 1.2, related to the dynamic range of these parameters. In practice, it is usually more convenient to control the performance of the algorithm in terms of the normalized step-size, since its values are limited to a specific interval. On the other hand, it could be more difficult to control the adaptation in terms of the regularization term, since its values are increasing and could lead to overflows. Usually, an upper bound on the regularization parameter could be imposed, but this would introduce an extra tuning parameter in the algorithm. Due to these aspects, only the OSS-NLMS-id algorithm will be considered as a benchmark in the following experiments.

*L*. For our experimental setup, i.e., *L* = 512 and SNR = 20 dB, the value \(\delta = 20\sigma_{x}^{2}\) fits well. However, this value should be increased for larger values of *L* or lower SNRs [14]. To conclude this experiment, the influence of the regularization parameter can also be noticed in Fig. 8, where the control parameters of the NPVSS-NLMS and OSS-NLMS-id algorithms are depicted, i.e., \(\alpha_{\mathrm{NPVSS}}(n)\) and \(\alpha_{\mathrm{opt}}(n)\), respectively. Clearly, their behavior is strongly biased in the case of a small regularization parameter, while they behave similarly when the regularization is properly chosen.
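For readers who want to experiment with the behavior of a variable normalized step-size, the following is a hedged sketch in the spirit of the NPVSS idea. It assumes the commonly used form \(1 - \sigma_{v}/\sigma_{e}(n)\), with the error power \(\sigma_{e}^{2}(n)\) tracked by an exponential window. The function name `npvss_step`, the forgetting factor `lam`, and the guard `eps` are illustrative assumptions and should be checked against the definitions in Section 1.3 and [19].

```python
import math

def npvss_step(e, sigma_e2_prev, sigma_v2, lam=0.999, eps=1e-8):
    """One update of an NPVSS-style normalized step-size (assumed form).

    e             : current a priori error sample
    sigma_e2_prev : previous estimate of the error power
    sigma_v2      : (estimated) power of the near-end signal
    Returns (alpha, sigma_e2).
    """
    # Exponentially weighted estimate of the error power.
    sigma_e2 = lam * sigma_e2_prev + (1.0 - lam) * e * e
    sigma_e = math.sqrt(sigma_e2)
    sigma_v = math.sqrt(sigma_v2)
    # Freeze the adaptation when the error is already at the noise level.
    if sigma_e >= sigma_v:
        alpha = 1.0 - sigma_v / (sigma_e + eps)
    else:
        alpha = 0.0
    return alpha, sigma_e2

if __name__ == "__main__":
    alpha, _ = npvss_step(e=10.0, sigma_e2_prev=100.0, sigma_v2=1.0)
    print(alpha)  # large error relative to the noise: step-size near 1
    alpha, _ = npvss_step(e=0.5, sigma_e2_prev=0.25, sigma_v2=1.0)
    print(alpha)  # error below the noise level: adaptation is frozen
```

In such a scheme the returned `alpha` would multiply the usual regularized NLMS gain, so a poorly chosen *δ* affects both the error signal and, through it, the step-size itself.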

*α*) is compared with the NPVSS-NLMS, JO-NLMS, and OSS-NLMS-id algorithms, when the far-end signal is an AR(1) process or a speech sequence, respectively. According to these results, it can be noticed that the NLMS algorithm is clearly outperformed by the other algorithms, in terms of convergence rate, tracking, and misalignment. Also, the NPVSS-NLMS and JO-NLMS algorithms perform in a similar manner (with a slight advantage for the JO-NLMS algorithm); besides, they are close to the performance of the OSS-NLMS-id algorithm, which represents the ideal benchmark.

*v*(*n*) represents the near-end signal, which can contain both the background noise and the near-end speech; since both signals could be non-stationary, the estimation of \(\sigma_{v}^{2}\) becomes more difficult. There are different methods for estimating this parameter; for example, in a single-talk scenario, it can be estimated during silences of the near-end talker [19]. Other practical methods to estimate \(\sigma_{v}^{2}\) can be found in [28, 29], as shown at the end of Section 1.3. In the last experiment, the estimate from (61) is used within the NPVSS-NLMS and JO-NLMS algorithms. Two challenging scenarios are considered in Fig. 11, where the far-end signal is a speech sequence. First, a variation of the background noise is simulated by decreasing the SNR from 20 to 10 dB between times 10 and 20 s; second, the near-end speech appears between times 25 and 30 s (i.e., the double-talk case), without using any DTD. The results from Fig. 11 indicate that the NLMS algorithm fails in this case, especially during double-talk. The NPVSS-NLMS and JO-NLMS algorithms show good robustness in both situations (with an advantage for the JO-NLMS algorithm during double-talk). In terms of robustness, the JO-NLMS algorithm performs similarly to the ideal case represented by the OSS-NLMS-id algorithm. Finally, it should be noted that neither the NPVSS-NLMS nor the JO-NLMS algorithm requires any additional features to control its behavior, which makes both reliable candidates for AEC applications.
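The background noise variation described above can be reproduced approximately with a short sketch. It assumes that the SNR is defined as the ratio between the long-term echo power and the noise power, and that the noise is white and Gaussian; the function name `noise_with_snr_profile`, its parameters, and the sampling rate in the demo are hypothetical choices for illustration.

```python
import numpy as np

def noise_with_snr_profile(echo, fs, snr_db=20.0, low_snr_db=10.0,
                           t_start=10.0, t_stop=20.0, seed=0):
    """White Gaussian noise whose level follows a piecewise-constant SNR.

    The SNR equals low_snr_db for t_start <= t < t_stop and snr_db elsewhere.
    The SNR is taken relative to the long-term echo power (assumption).
    """
    rng = np.random.default_rng(seed)
    echo_power = np.mean(echo ** 2)
    t = np.arange(len(echo)) / fs
    snr = np.where((t >= t_start) & (t < t_stop), low_snr_db, snr_db)
    sigma_v = np.sqrt(echo_power / 10.0 ** (snr / 10.0))   # per-sample noise std
    return sigma_v * rng.standard_normal(len(echo))

if __name__ == "__main__":
    fs = 8000                                   # illustrative sampling rate
    echo = np.random.default_rng(1).standard_normal(30 * fs)
    v = noise_with_snr_profile(echo, fs)
    for a, b in [(0, 10), (10, 20), (20, 30)]:
        seg = v[a * fs:b * fs]
        print(a, b, 10.0 * np.log10(np.mean(echo ** 2) / np.mean(seg ** 2)))
```

Adding this noise to the echo yields a microphone signal on which the estimation of \(\sigma_{v}^{2}\) can be tested before introducing the harder double-talk segment.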

## 2 Conclusions

In this paper, we have presented several NLMS-based algorithms suitable for AEC applications. These algorithms are based on different control strategies for adjusting their main parameters, i.e., the normalized step-size and the regularization term, in order to achieve a proper compromise between the performance criteria (i.e., fast convergence/tracking and low misadjustment). The main motivation behind this approach was the reference work of Hänsler and Schmidt [1]. Following their ideas, we presented here two related solutions, i.e., the NPVSS-NLMS and JO-NLMS algorithms. The first one (originally proposed in [19]) represents a simple and efficient method to control the normalized step-size. Due to its non-parametric nature, it is a reliable choice in many practical applications. The second one is developed in the context of a state-variable model and follows an optimization criterion based on the minimization of the system misalignment. It is also a non-parametric algorithm, which does not require any additional control features (e.g., system change detector, stability thresholds, etc.). It also offers good robustness against double-talk, which is one of the most challenging situations in AEC. Consequently, it could be an appealing candidate for real-world applications.

There are several perspectives that could follow the ideas presented in this paper. First, the extension to the affine projection algorithm represents a straightforward approach. Second, it would be highly interesting to further develop these solutions in the context of proportionate-type algorithms, which are also attractive choices for sparse system identification.

In conclusion, although the NLMS algorithm has long been the workhorse in AEC and in many other applications, it is still widely studied and very often remains the algorithm of choice in practice. Therefore, let us end this paper with a neat remark of Hänsler and Schmidt from [4], which fits best in this context: “The NLMS algorithm has often been declared to be dead. According to a popular saying, this is an infallible sign of a very long life.”

### Acknowledgements

This work was supported by the UEFISCDI Romania under Grant PN-II-RU-TE-2014-4-1880.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

