- Review
- Open Access
Adaptive filters: stable but divergent
- Markus Rupp^{1}
https://doi.org/10.1186/s13634-015-0289-8
© Rupp. 2015
- Received: 3 June 2015
- Accepted: 16 November 2015
- Published: 3 December 2015
Abstract
The pros and cons of a quadratic error measure have often been discussed in the context of various applications. In this tutorial, we argue that it is not merely a suboptimal but definitely the wrong choice when describing the stability behavior of adaptive filters. We take a walk through the past and recent history of adaptive filters and present 14 canonical forms of adaptive algorithms, and even more variants thereof, contrasting their mean-square with their l_2-stability conditions. In particular, in safety-critical applications, convergence in the mean-square sense turns out to provide wrong results, often not leading to stability at all. Only the robustness concept with its l_2-stability conditions ensures the absence of divergence.
Keywords
- Adaptive gradient-type filters
- l_2-stability
- Mean squared error
- Small-gain theorem
- Contraction mapping
- Error bounds
- Neural networks
- Backpropagation
- Proportionate normalized least-mean-square
1 Introduction: some historical background on adaptive-filter stability
The basic concept of a quadratic error measure, whose minimum can simply be found by differentiating and solving a resulting set of linear equations, was invented by C. F. Gauss in 1795 and has been the tool of choice for about 200 years. In [1], many arguments were presented that question the usefulness of the mean squared error (MSE) in image and audio processing due to our complex human perception, and these arguments were nicely supported by many practical examples and observations.
Most commonly used variables and parameters

| Variable | Meaning |
|---|---|
| u_k | Input sequence |
| **u**_k | Vector with input sequence |
| **x**_k | Vector with alternative input sequence |
|  | Alternative regression vector |
| **w** | Unknown system (impulse response) |
| **w**_k | Estimate of **w** |
| **g** | Unknown system, 1st partition |
| **g**_k | Estimate of **g** |
| **h** | Unknown system, 2nd partition |
| **h**_k | Estimate of **h** |
| **a** | Unknown IIR system, recursive partition |
| **a**_k | Estimate of **a** |
| **b** | Unknown IIR system, forward partition |
| **b**_k | Estimate of **b** |
| d_k | Desired, observed noisy output |
| y_k | Desired, undistorted output |
| **y**_k | Vector of undistorted outputs |
| \(\hat {{\mathrm {\mathbf {y}}}}_{k}\) | Vector with estimated outputs |
| v_k | Additive noise |
| M | Filter order |
This paper provides a historical overview of adaptive-filter theory spanning the past 50 years. In Section 2, we review the problems of filters including filtered errors as they emerged in the 1970s. In Section 3, we formally introduce the two different concepts of stability, MSE- and l_2-stability, and compare their properties. Section 4 then continues our historical walk into the 1990s, including newer and older algorithms that exhibit stability problems which were not observed at the time of their proposal. We even address adaptive algorithms for blind channel estimation and show their robustness. We further exploit the robustness concept and l_2-stability in Section 5 by a recently proposed singular-value-decomposition (SVD)-based method that is better suited to detect the instability of adaptive systems. We also provide an example of the so-called proportionate normalized LMS algorithm (PNLMS) which shows that an adaptive-filter algorithm can be MSE-stable but still exhibit divergence. Based on this new framework, we investigate in Section 6 the stability of adaptive algorithms whose error signals are linearly coupled. As shown in Section 7, this sets the framework for all cascaded adaptive algorithms and finally allows us to describe the stability behavior of such an algorithmic family. Some open issues are addressed in Section 8. Altogether, we discuss 14 different adaptive-filter algorithms and many of their variants in terms of stability and robustness.
2 First stability problems found
Employing the MSE method, the favorite and trusted tool of most researchers in the field of adaptive filters, Feintuch introduced in 1976 an adaptive algorithm [7] (see Algorithm 2) that revealed the first obstacles. He proposed estimating IIR filter coefficients (A,B), rather than the conventional FIR (B) coefficients, located in the filter weights w_k. Usually, when a stability issue occurs in adaptive filters, practitioners recommend lowering the step-size μ_k, buying increased stability at the expense of slower convergence. However, this remedy did not work for Feintuch’s algorithm.
Equivalent SPR conditions of linear operators

| Condition | |
|---|---|
| γ_min(F) = min_Ω Re{F(e^{jΩ})} | > 0 |
| min_Ω {F(e^{jΩ}) + F(e^{−jΩ})} | > 0 |
| γ_min(F) = min_{x≠0} Re{x^H F x} | > 0 |
| min_{x≠0} {x^H [F + F^H] x} | > 0 |
| min eig[F + F^H] | > 0 |
In fact, Feintuch’s adaptive IIR filter algorithm is a special case of a so-called filtered-error-type algorithm (see Algorithm 4). A very simple instantiation of such a filtered-error-type algorithm is the so-called LMS algorithm with delayed updates (DLMS) [12–14]. It occurs if the error filter is a simple delay, which can easily happen if a pipelined chip structure for the LMS algorithm is designed that requires introducing a delayed version of the error signal. If the error signal appears delayed by, say, K>0 steps, the filter F(e^{−jΩ})=e^{−jKΩ} cannot be SPR, and thus a pipelined LMS algorithm can become unstable. However, the cure for this algorithm is simply obtained by also delaying the regression vector by K steps and applying an older estimate, as shown in Algorithm 3.
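To make the SPR condition concrete, the following Python sketch evaluates min_Ω Re{F(e^{jΩ})} on a dense frequency grid for two FIR error filters; the function name and grid size are illustrative choices, not from the paper. For a pure delay of K=3 steps, the minimum is close to −1, confirming that a delay can never be SPR, while F=1 trivially is.

```python
import numpy as np

# Check the SPR condition min_W Re{F(e^{jW})} > 0 on a frequency grid.
def min_real_part(f_impulse, n_grid=1024):
    """Minimum of Re{F(e^{jW})} over W for an FIR error filter f."""
    W = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    k = np.arange(len(f_impulse))
    F = np.exp(-1j * np.outer(W, k)) @ f_impulse  # F(e^{jW}) on the grid
    return np.real(F).min()

# Pure delay of K = 3 steps: f = [0, 0, 0, 1], i.e., F(e^{jW}) = e^{-j3W}
delay = np.array([0.0, 0.0, 0.0, 1.0])
print(min_real_part(delay))     # close to -1: a pure delay is never SPR

identity = np.array([1.0])
print(min_real_part(identity))  # 1.0: F = 1 is trivially SPR
```

The same check applied to the delayed error path of a pipelined LMS implementation explains why the plain DLMS can become unstable.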
If the filter F is known and non-SPR, the cure can be rather simple: apply an additional backward filter F^R to the filtered error. This results in an SPR part F^∗F=|F|^2 and a pure delay e^{−jMΩ} that can be treated by delaying the regression vector u_k in a way similar to the DLMS cure in Algorithm 3. Note, however, that such treatment usually results in a rather slow update rate, as the error signal is now severely delayed.
Such behavior was also detected in the context of active noise control, where the linear filter F is not defined by the unknown recursive part of an IIR filter but by an acoustic-electrical transfer function, determined among others by the mechanical construction of the concatenated loudspeaker, free-space, and microphone system [15]. In contrast to the adaptive IIR filter, however, in acoustic noise control the filter F can be observed, its impulse response identified first, and then compensated for. An alternative idea that avoids applying the backward filter is to apply the error filter F to the regression vector. Many algorithms derived from this so-called Filtered-X LMS algorithm (see Algorithm 5) were proposed during the 1980s to overcome the SPR condition, once F is known. The essential idea is to compensate the impact of the filtered error by an identical filter on the regression vector (in this case G_k[.]=1). In [16], robustness conditions for the Filtered-X LMS algorithm were analyzed, and it was found that, although placing F on the regression vector is beneficial, the algorithm is in general only locally robust. The Filtered-X LMS algorithm was then reformulated as a filtered-error type; however, a new, time-variant linear operator 1/[1−μ_k C_k] now applies to the filtered error. The coefficients of this time-variant operator C_k depend on linearly filtered versions of the input signal u_k as well as on the algorithm’s step-size. As the coefficients of μ_k C_k are proportional to the step-size μ_k, sufficiently small step-sizes can ensure that 1−μ_k C_k is SPR, however, at the price of slower adaptation. Only if a particular time-variant linear operator \(G_{k}=G_{o}=\frac 1{1+\mu _{k} C_{k}}\) is additionally applied to the filtered error term can this be compensated for and the algorithm sped up considerably.
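As an illustration of the compensation idea, here is a minimal, hypothetical Filtered-X LMS sketch in Python for a system-identification setting with a known, exactly modeled error-path filter F; all coefficient values and variable names are invented for this example, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: unknown plant w, known error-path (secondary) filter f.
w = np.array([0.5, -0.3, 0.2])       # unknown plant to identify
f = np.array([1.0, 0.5, 0.25])       # known error-path filter F (SPR here)
M, mu, n = len(w), 0.02, 20000

u = rng.standard_normal(n)
w_hat = np.zeros(M)
u_hist = np.zeros(M)                 # current regressor u_k
x_hist = np.zeros(M)                 # Filtered-X regressor (F applied to u)
uf_hist = np.zeros(len(f))           # input history for filtering by F
y_hist = np.zeros(len(f))            # plant-output history for filtering by F
yh_hist = np.zeros(len(f))           # model-output history for filtering by F

for k in range(n):
    u_hist = np.r_[u[k], u_hist[:-1]]
    uf_hist = np.r_[u[k], uf_hist[:-1]]
    x_hist = np.r_[f @ uf_hist, x_hist[:-1]]   # filtered regression vector
    y_hist = np.r_[w @ u_hist, y_hist[:-1]]
    yh_hist = np.r_[w_hat @ u_hist, yh_hist[:-1]]
    e = f @ y_hist - f @ yh_hist               # filtered error, noise-free
    w_hat = w_hat + mu * x_hist * e            # Filtered-X LMS update

print(np.round(w_hat, 3))  # approaches w = [0.5, -0.3, 0.2]
```

The key point is that the same filter F acts on both the error and the regression vector, which is what the compensation argument above formalizes.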
The Filtered-X LMS algorithm has experienced a renaissance in recent years as it appears to be the right choice for vibration control in car engines [17–20]. A novel aspect here is that car engines can be controlled without sensors as the engine speed is known. The input signal u_k can thus be generated artificially out of weighted sine and cosine terms of the car’s rotation frequency Ω and multiples thereof. A compact notation of this algorithm results in a complex-valued Filtered-X LMS algorithm. However, for physical reasons, the error signal must be real-valued, and therefore the complex-valued LMS algorithm is driven only by a real-valued fraction of the error. In [21], it was shown that this variant indeed behaves in a robust way, while an alternative variant employing a complex-valued error and a real-valued regressor does not. Both variants show identical MSE behavior though.
3 Stability of adaptive filters
After so much disturbing news about potential instability, it is time to take a closer look at the stability of adaptive filters, as we need to understand the various notions of stability.
MSE-stability: is based on minimizing \(E[|\tilde {e}_{a}|^{2}]\) with respect to the parameter estimates w_k [2, 22–26]. Depending on the application, a minimal remaining error energy may be desired (signal adaptation), but also the correct knowledge of the parameters w (system adaptation). In the classical MSE analysis, the parameter-error covariance matrix \(\mathbf {P}_{k}=E\left [\tilde {\mathbf {w}}_{k}{\tilde {\mathbf {w}}}_{k}^{\textsf {H}}\right ]\) is studied, requiring the so-called independence assumptions on the participating processes u_k and v_k, and step-size conditions are derived that guarantee that tr(P_k) decreases. Due to this procedure, MSE-stability always includes some form of convergence. If an additive stationary noise process v_k is assumed, the algorithm converges to a nonzero steady state.
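A minimal numerical sketch of this MSE viewpoint follows, using a simplified small-step-size approximation of the covariance recursion under the independence assumptions (the exact recursion contains additional fourth-moment terms omitted here; R and the noise variance are illustrative values):

```python
import numpy as np

# Simplified MSE-style recursion for the parameter-error covariance of LMS
# (small step-size approximation under the independence assumptions):
#   P_k ≈ (I - mu R) P_{k-1} (I - mu R) + mu^2 sigma_v2 R
R = np.array([[1.0, 0.8], [0.8, 1.0]])   # input covariance (illustrative)
mu, sigma_v2 = 0.05, 0.01                # step-size and noise variance
P = np.eye(2)                            # initial error covariance
A = np.eye(2) - mu * R

traces = []
for _ in range(500):
    P = A @ P @ A + mu**2 * sigma_v2 * R
    traces.append(np.trace(P))

print(traces[0], traces[-1])  # tr(P_k) decays toward a small nonzero floor
```

The decaying trace with a nonzero floor is exactly the "convergence into a nonzero steady-state" described above.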
l_2-stability: is based on robustness terms originating from control theory [27, 28] in the form of l_2-norms of instantaneous regression vectors rather than their expected values. In the context of adaptive filters, it was introduced in 1993 by Kailath, Sayed, and Hassibi [29]. Further work over the next 10 years [25, 30–33] showed that more and more adaptive filters exhibit this property. Loosely speaking, l_2-stability simply says that if the input sequence has a bounded Euclidean norm, so does the output sequence. Note that, in contrast to the common treatment, the inputs of the scheme are now the additive noise v_k as well as the initial parameter-error vector \(\tilde {\mathbf {w}}_{0}=\mathbf {w}-\mathbf {w}_{0}\); the outputs are the undistorted a-priori error sequence \(e_{a,k}=\mathbf {u}_{k}^{\textsf {T}} \tilde {\mathbf {w}}_{k-1}\) and possibly the a-posteriori parameter-error vector \(\tilde {\mathbf {w}}_{k}=\mathbf {w}-\mathbf {w}_{k}\). The driving sequence u_k only influences the algorithmic mapping from input to output.
As the stability result depends on the small-gain theorem [27, 28], the resulting step-size bound is conservative. While for the classic LMS algorithm, the observation coincides very sharply with the predicted bounds, for many other algorithms, the bound obtained is indeed conservative.
For gradient-type algorithms, it was concluded that if the noise sequence compensates the undistorted error, i.e., v_k=−e_{a,k}, the algorithms do not update, and their maximum robustness level γ is attained with equality; such sequences were therefore considered to cause worst-case situations. Surprisingly, there was no worst-case condition imposed on the driving sequence u_k as long as \(0<\mu _{k}<2\bar {\mu }_{k}=\frac 2{\|{\mathrm {u}_{k}}\|_{2}^{2}}\). The reason for this may lie in the fact that the method itself aims for convergence of the undisturbed error sequence \(e_{a,k}={{\mathbf {u}_{k}^{\textsf {T}}}} \tilde {{\mathbf {w}}}_{k-1}\rightarrow 0\) rather than of the parameter-error vector \(\tilde {{\mathbf {w}}}_{k}\). Not surprisingly, signal conditions only came up when requiring that not only the error energy |e_{a,k}|^2 tends to zero but also that the parameter-error vector \(\tilde {{\mathbf {w}}}_{k-1}={\mathbf {w}}-\hat {{\mathbf {w}}}_{k-1}\), i.e., the difference between the true system impulse response and its estimate, converges strongly (in norm) to zero. If the latter is also required, the driving signal vectors u_k need to be persistently exciting, i.e., consecutive vectors need to span the space of dimension M, where M denotes the filter order.
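This robustness statement can be checked numerically. At the step-size μ_k = 1/∥u_k∥², the LMS recursion satisfies a deterministic, per-realization energy equality relating the weighted a-priori error energy plus final parameter error to the weighted noise energy plus initial parameter error. The following sketch (illustrative dimensions and noise level) verifies it for a single random realization:

```python
import numpy as np

rng = np.random.default_rng(1)

# Deterministic energy relation for LMS at mu_k = 1/||u_k||^2:
#   ||w~_N||^2 + sum_k mu_k |e_{a,k}|^2 = ||w~_0||^2 + sum_k mu_k |v_k|^2
# This holds for every realization, not only in the mean.
M, n = 4, 2000
w = rng.standard_normal(M)       # unknown system
w_hat = np.zeros(M)              # initial estimate, so w~_0 = w
err_energy = 0.0
noise_energy = 0.0

for _ in range(n):
    u = rng.standard_normal(M)
    v = 0.1 * rng.standard_normal()
    mu = 1.0 / (u @ u)
    e_a = u @ (w - w_hat)        # undistorted a-priori error
    e = e_a + v                  # observed error
    err_energy += mu * e_a**2
    noise_energy += mu * v**2
    w_hat = w_hat + mu * u * e   # LMS update at the NLMS step-size

lhs = np.sum((w - w_hat)**2) + err_energy
rhs = np.sum(w**2) + noise_energy
print(lhs, rhs, np.isclose(lhs, rhs))
```

For step-sizes below 1/∥u_k∥², the equality relaxes to an inequality (the robustness level γ = 1), which is the contractive mapping exploited in the l_2-stability proofs.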
A consequence of the l_2-stability property is that an energy-bounded input sequence (noise v_k and initial parameter-error vector \(\tilde {{\mathbf {w}}}_{0}\)) causes a bounded output of undistorted errors e_{a,k}. If the input sequence is a Cauchy sequence, so is the output. If, on the other hand, such a bound γ cannot be guaranteed, it is likely that an input sequence exists that causes divergence. Convergence in this context means that a range of step-size parameters (or alternative design parameters) exists for which, even under worst-case sequences, no divergence occurs.
The concept of l_2-stability is thus very different from MSE-stability: the existence of a single worst-case sequence (one among infinitely many) would still permit MSE-stability, as the infinitely many well-behaved sequences outweigh the single worst-case sequence in the average, but it rules out l_2-stability. The notion of l_2-stability is thus more restrictive and to be preferred in cases where safety is of utmost importance (smart cities, smart grids, transportation flow, automatically controlled cars, flight control, and so on), while MSE-stability might be sufficient for typical applications in telecommunications where corrupted data transmissions can be corrected by other means. We can conclude that for bounded random sequences, l_2-stability implies MSE-stability but not conversely.
The robustness framework was even able to handle algorithms as different as the Gauss-Newton-type Algorithm 6, of which the recursive least squares (RLS) algorithm is the most famous special case, but also single-layer neural-network adaptations. The global robustness and l_2-stability of the RLS algorithm was shown in [34]; corresponding results for the entire Gauss-Newton algorithmic family with time-variant forgetting factor 0<λ_k<1 as well as memory factor 0<β_k≤1 are reported in [35], and special results for least squares (LS) estimators including Kalman filters appeared in [36]. The real-valued Perceptron learning algorithm (PLA), see Algorithm 7, was shown to be l_2-stable in [37]. Even more complicated single-layer structures, such as the so-called Narendra and Parthasarathy structure, which include feedback with memory, could be analyzed and l_2-stability conditions were provided.
4 Recently discovered evidence
Up to this point, the occurrence of a linear filter in the error path may have been regarded as a mere curiosity among the many variations of adaptive-filter algorithms and applications, an isolated exception requiring a different treatment while the majority of adaptive-filter algorithms work accurately according to an MSE-based theory. The robustness description developed since then allows stability conditions to be defined for all those cases very accurately.
Back to our historical walk. In the 1990s, the focus was on adaptive filters for neural networks and on particularly fast versions of LS techniques, the so-called Fast-RLS algorithms. Their theory is also based on the minimum MSE (MMSE) but, due to their deterministic nature, independence assumptions were not required. To make them fit practical applications, their LS nature was often sacrificed and time-variant step-sizes were introduced; with such step-sizes, however, they behave more like stochastic gradient-type algorithms. One of these RLS derivatives is the affine projection (AP) algorithm [38], which speeds up convergence compared to its simpler gradient counterpart by taking P past regression directions into account. A fast version of it [39, 40] is the basis for the millions of copies of such algorithms running today in electrical echo cancellation devices, reducing the echoes of long-distance telephone cables. Unlike the original algorithm, they use a sophisticated step-size control to prevent unstable behavior in double-talk situations [41], that is, when both talkers are active. The resulting algorithm is called the pseudo affine projection (PAP) algorithm (see Algorithm 8), as with a moderate step-size the original projection property is lost. Recently, it has been shown [42] that, depending on the correlation of the input signal, the PAP algorithm can become unstable; situations exist in which even small step-sizes do not yield stable behavior but larger ones are required. Thus, depending on the steady-state of the predictor coefficients a_k (correspondingly denoted here as the linear operator A(q^{−1})), lower normalized step-size bounds α_min may exist as well as upper bounds α_max.
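For orientation, here is a sketch of the plain affine-projection update of order P (the PAP variant adds the prediction-based step-size control described above, which is not reproduced here); the dimensions, step-size, regularization, and AR(1) input model are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Plain affine-projection (AP) update of order P:
#   w_k = w_{k-1} + mu * U_k (U_k^T U_k + delta I)^{-1} e_k
# with U_k holding the last P regressors and e_k the last P a-priori errors.
M, P, mu, delta, n = 8, 2, 0.5, 1e-6, 4000
w = rng.standard_normal(M)               # unknown system

# Correlated AR(1) input, where AP pays off against plain NLMS:
u = np.zeros(n)
for k in range(1, n):
    u[k] = 0.9 * u[k - 1] + rng.standard_normal()

w_hat = np.zeros(M)
for k in range(M + P, n):
    U = np.array([u[k - i - np.arange(M)] for i in range(P)]).T   # M x P
    d = U.T @ w + 0.01 * rng.standard_normal(P)                   # noisy outputs
    e = d - U.T @ w_hat                                           # P a-priori errors
    w_hat = w_hat + mu * U @ np.linalg.solve(U.T @ U + delta * np.eye(P), e)

print(np.linalg.norm(w - w_hat))   # small: the AP sketch identifies w
```

Reusing P past regression directions is what decorrelates the update against the AR(1) input; the PAP modification replaces the exact projection by a prediction-based approximation, which is where its stability subtleties originate.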
However, this is not the only algorithm for which stability problems remained undiscovered for a long time. A well-known adaptive algorithm for zero forcing (ZF) equalization is the gradient algorithm by Lucky [43], see Algorithm 9. In the well-known textbook by Proakis [44], we can read:
“The peak distortion has been shown by Lucky (1965) to be a convex function of the coefficients. That is, it possesses a global minimum and no relative minima. Its minimization can be carried out numerically, using, for example, the method of steepest descent”.
The argumentation sounded very convincing until the algorithmic behavior was analyzed thoroughly in [45], where it was found that channel conditions and data sequences indeed exist that cause the algorithm to diverge, even for the smallest step-sizes. Based on the channel impulse response {h_i}, step-size conditions can be derived for MSE-stability only. See also [46] for alternative non-robust ZF equalizer algorithms.
Such examples may corroborate the suspicion that they are all related to a linear filter of some form in the error path and thus depend on an SPR condition. Note, however, that neither for the ZF algorithm nor for the PAP algorithm does any SPR condition appear in the error path; thus, they do not fall under the existing knowledge of the early 1990s, and their l_2-stability behavior differs much from their MSE behavior. In the meantime, however, they have been correctly analyzed by the now existing robustness techniques [42, 45].
Moreover, other problems can cause stability trouble when the driving signal lacks persistent excitation. For algorithms with matrix inverses, such as RLS algorithms, it is well understood that with a lack of persistent excitation, a null space opens up in the solution that offers the algorithm a wide space to diverge. Also in applications such as stereo hands-free telephones [47], null spaces can occur as part of the solution and cause adaptive filters to diverge. In such cases, regularization and leakage factors are often applied to force the null spaces out of the obtained estimates.
5 A converse approach: worst-case scenarios that lead to divergence
In several algorithms, the update is applied along directions x_k that are different (not parallel) from the driving process vector u_k constituting the error \(\tilde {e}_{a,k}={e}_{a,k}+v_{k}={{\mathbf {u}^{\textsf {T}}_{k}}}\tilde {{\mathbf {w}}}_{k-1}+v_{k}\). We refer to these algorithms in the following as asymmetric, in contrast to symmetric algorithms such as LMS or RLS. Equivalently speaking, for adaptive filters of this general asymmetric structure, it remained unclear whether worst-case sequences exist that could cause divergence no matter what the step-size (μ_k>0) is.
A more general view that encompasses also the driving processes u _{ k } into the worst-case scenarios and aims directly at the convergence or divergence of the parameter-error vector \(\tilde {{\mathbf {w}}}_{k}\) is proposed in [52] where a similar argument to robustness is employed but instead of using the small-gain theorem, the sub-multiplicative property of norms in the context of SVD is applied.
In this framework, each update propagates the parameter-error vector through a matrix B_k, with the update applied not necessarily in direction u_k but in x_k. Applying the update several times results in a product of matrices \(\prod \mathbf {B}_{k}\) whose largest singular value should remain bounded to preserve stability. This is equivalent to requiring a norm of B_k to remain bounded, and since \(\|\prod \mathbf {B}_{k}\|\le \prod \| \mathbf {B}_{k}\|\) for many norms (sub-multiplicative property), we can conclude that l_2-stability is guaranteed as long as the largest singular value satisfies σ_max(B_k)≤1. This condition is, similar to the small-gain theorem applied before, a conservative condition. However, due to the linear operators involved, it is now simpler to analyze converse conditions, i.e., bounds for instability rather than stability. Note that for the above example, the largest singular value turns out to be larger than one if x_k≠α u_k, that is, if these vectors are not parallel.
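The singular-value criterion is easy to probe numerically. For a rank-one update matrix of the form B_k = I − μ x_k u_k^T/∥u_k∥², the symmetric case x_k = u_k keeps σ_max(B_k) ≤ 1 for 0 ≤ μ ≤ 2, while a non-parallel x_k pushes σ_max above one even for small μ; the following sketch (random illustrative vectors) demonstrates both:

```python
import numpy as np

rng = np.random.default_rng(3)

# Largest singular value of B_k = I - mu x u^T / ||u||^2.
def sigma_max(u, x, mu):
    B = np.eye(len(u)) - mu * np.outer(x, u) / (u @ u)
    return np.linalg.svd(B, compute_uv=False)[0]

u = rng.standard_normal(5)
x = rng.standard_normal(5)          # generic direction, not parallel to u

print(sigma_max(u, u, mu=1.0))      # 1.0: symmetric case, mu within [0, 2]
print(sigma_max(u, u, mu=2.5))      # > 1: step-size beyond the LMS bound
print(sigma_max(u, x, mu=0.1))      # > 1: asymmetric even at a small step-size
```

In the symmetric case, B_k is symmetric with eigenvalues 1 and 1−μ, so the bound 0 ≤ μ ≤ 2 falls out directly; in the asymmetric case, no positive step-size keeps all singular values at or below one.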
Weaker energy bounds are obtained this way, obviously weaker than the original ones as the terms of the undistorted errors \(\sum _{k=1}^{N} \mu _{k} |e_{a,k}|^{2}\) are missing. Similarly to the robustness method of the previous section, stability conditions for the step-size μ_k and boundedness conditions on the additive noise can now be derived. The bounds so obtained, however, appear to be tighter than (or equivalent to) the previous ones based on robustness. If the largest singular value of the mapping B_k is larger than one, \(\tilde {\gamma }\) will not remain bounded as N grows to infinity, and thus robustness is potentially lost.
A good first example is the LMS algorithm, i.e., Algorithm 1. The classic robustness scheme showed l_2-stability as long as \(\mu _{k}<2/\|{\mathbf {u}_{k}}\|_{2}^{2}\). But how much larger can μ_k become until the algorithm really diverges? Due to the conservatism of the small-gain theorem, we cannot answer this. The SVD method, on the other hand, allows deriving worst-case sequences u_k so that divergence is guaranteed [52]; indeed, for \(\mu _{k}>2/\|{\mathbf {u}_{k}}\|_{2}^{2}\), sequences that cause divergence can always be found. The stability bound of the LMS algorithm is thus tight as both methods deliver the same bound. While the SVD-based method provides an identical bound in this case, in many other algorithms larger bounds could be identified.
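The divergence beyond 2/∥u_k∥² can be reproduced with a trivially simple worst-case sequence: repeating the same regression direction. The noise-free error recursion then multiplies the component along u by 1 − μ∥u∥² at every step, whose magnitude exceeds one beyond the bound (the values below are illustrative):

```python
import numpy as np

# Worst-case driving sequence for LMS: repeat the same direction u with
# mu just above 2/||u||^2; the parameter-error component along u then
# grows as |1 - mu ||u||^2|^k > 1, i.e., the algorithm diverges noise-free.
u = np.array([1.0, 1.0])
mu = 2.2 / (u @ u)                 # beyond the tight bound 2/||u||^2
w_err = np.array([1.0, 0.0])       # initial parameter-error vector

norms = []
for _ in range(50):
    e_a = u @ w_err
    w_err = w_err - mu * u * e_a   # noise-free LMS error recursion
    norms.append(np.linalg.norm(w_err))

print(norms[0], norms[-1])         # the error norm grows without bound
```

Replacing 2.2 by any value below 2 makes the same loop contract along u, which is the tightness statement above in executable form.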
As a second example, it is very illustrative to consider the so-called proportionate normalized LMS (PNLMS) algorithm. Originally derived by Duttweiler [53] in 2000, the algorithm can be viewed as a time-variant counterpart of the algorithm by Makino [54]; both variants are shown in Algorithm 11. During the following 10 years, the algorithm became very popular, as a clever control of the diagonal step-size matrix can speed up the algorithm significantly [55]. Note that time-invariant matrix step-sizes that are positive definite or exhibit SPR properties are shown to be robust in [32, 42], ensuring l_2-stability of Makino’s algorithm. This can easily be shown as the product of consecutive matrices B_k is equivalent to Eq. (5).
The asymmetric form of matrix B_k can thus be made symmetric, and standard theory can be applied. Duttweiler replaced L by a time-variant diagonal matrix L_k for which such a symmetry correction in the style of (5) no longer works. He showed his algorithm to be mean-square convergent. First attempts at showing robustness, however, turned out to require further, rather limiting conditions on L_k [56]. In [57], it is finally shown that the PNLMS algorithm can indeed become non-robust even if the positive definite entries of L_k fluctuate only slightly.
6 Linearly-coupled and partitioned adaptive filters
In a partitioned algorithm, the input vector is split into two or more sections that run with different (individual) step-sizes. This can facilitate parallel implementation and/or improve convergence speed. To simplify matters, let us envisage a simple form of a gradient algorithm in which we use two partitions with different step-sizes μ_{g,k} and μ_{h,k}. We split the entire parameter vector into two parts, say g and h, and correspondingly use two partitions, say u_k and x_k, as regression vectors. The so obtained bipartite PNLMS algorithm is summarized in Algorithm 12.
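A minimal sketch of such a partitioned gradient update with two partitions, individual step-sizes, and one shared error term may look as follows (all dimensions and step-sizes are illustrative; note that with time-invariant step-sizes as used here, the scheme corresponds to the robust, positive-definite matrix step-size case, whereas time-variant proportionate gains can break robustness):

```python
import numpy as np

rng = np.random.default_rng(5)

# Bipartite gradient sketch: the parameter vector is split into partitions
# g and h with individual step-sizes, but both partitions are driven by the
# same scalar error -- the linear coupling discussed in this section.
Mg, Mh, n = 4, 4, 5000
g = rng.standard_normal(Mg)
h = rng.standard_normal(Mh)
g_hat = np.zeros(Mg)
h_hat = np.zeros(Mh)
mu_g, mu_h = 0.4, 0.1                    # per-partition step-sizes

for _ in range(n):
    u = rng.standard_normal(Mg)          # regressor of partition g
    x = rng.standard_normal(Mh)          # regressor of partition h
    e = (u @ g + x @ h) - (u @ g_hat + x @ h_hat)   # one shared error
    norm2 = u @ u + x @ x
    g_hat = g_hat + mu_g * u * e / norm2 # both updates reuse the same e
    h_hat = h_hat + mu_h * x * e / norm2

print(np.linalg.norm(g - g_hat), np.linalg.norm(h - h_hat))
```

With μ_{g,k} = μ_{h,k}, the scheme collapses to plain NLMS on the stacked vector; with unequal step-sizes, the shared error makes the partition errors linearly dependent, which is the coupling analyzed above.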
Based on such an algorithmic formulation, we recognize that the bipartite PNLMS algorithm is of the same kind as linearly-coupled adaptive filters with the special step-size/coupling factor choice ν_{1,k}=ν_{2,k}=μ_{g,k} and ν_{3,k}=ν_{4,k}=μ_{h,k}. Even if the two step-sizes are not identical, the update error is still linearly dependent for both partitions, causing one singular value to be larger than one and thus violating robustness. Only the weaker MSE-stability remains. In the following, we demonstrate this behavior with a simple example in which we first run the PNLMS algorithm with worst-case sequences rather than random sequences.
7 Cascaded adaptive algorithms
Cascaded or concatenated structures of adaptive-filter algorithms have attracted many researchers in the past. The motivation can be as simple as dividing a long filter into shorter autonomous parts, or it can stem from structural considerations [59, 60]. In the context of the identification of non-linear power amplifiers of large bandwidth, a concatenation of linear filter parts with memory and nonlinear parts without memory is very common. Depending on whether the linear filter comes first or not, we distinguish so-called Wiener or Hammerstein models [61, 62].
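The two cascade types can be stated in a few lines; the cubic nonlinearity below is purely illustrative:

```python
import numpy as np

# Wiener vs. Hammerstein cascades:
# Wiener: linear filter with memory, then a memoryless nonlinearity;
# Hammerstein: memoryless nonlinearity first, then the linear filter.
def wiener(u, g, nl=lambda s: s + 0.1 * s**3):
    return nl(np.convolve(u, g)[: len(u)])

def hammerstein(u, g, nl=lambda s: s + 0.1 * s**3):
    return np.convolve(nl(u), g)[: len(u)]

rng = np.random.default_rng(6)
u = rng.standard_normal(100)
g = np.array([1.0, 0.5])
print(np.allclose(wiener(u, g), hammerstein(u, g)))  # False: order matters
```

For a purely linear "nonlinearity," the two models coincide, which is a quick sanity check that the difference really stems from the placement of the memoryless block.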
In Eq. (10), we recognize the linearly-coupled error term from Eq. (6) with \(\nu _{1,k}=\mu _{g,k} {\mathbf {d}^{\textsf {T}}_{k}} {\mathbf {h}}\), ν_{2,k}=μ_{g,k}, \(\nu _{3,k}= \mu _{h,k}{\mathbf {d}^{\textsf {T}}_{k}} {\mathbf {h}}\), and ν_{4,k}=μ_{h,k}, a common property of cascaded filter structures. An extension toward more than two cascaded stages is straightforward but does not change the essential properties. As the update error is identically applied in both stages (in all stages if the filter chain comprises more partitions), the update errors are linearly dependent, differing only by potentially different step-sizes. For this particular case of linearly dependent error terms, we find that no robustness is possible, in particular as long as \({\mathbf {d}^{\textsf {T}}_{k}} {\mathbf {h}}\) is not exactly known. Recent results on this are provided in [64]. Cascaded structures also appear in multiple-input, multiple-output form in the context of Big Data [65, 66].
8 Outlook and conclusions
It may thus be surprising that many practically relevant adaptive algorithms are indeed non-robust although they are MSE-stable. While in everyday situations they appear to work properly, input sequences can be found that cause the algorithm to diverge. Once such a sequence is present, no step-size control can cure it.
Are all adaptive filters well understood now? No, there certainly are still blind spots left that are not as clear as they could be. Take, for example, the well-known backpropagation algorithm [67–69] (see Algorithm 14), an extension of the PLA to several layers. Early investigations [70] only showed that the algorithm is locally but not globally robust. Single-layer PLAs, however, are globally robust.
The search for worst-case sequences in asymmetric algorithms that exhibit singular values larger than one can, however, remain inconclusive. Once such a sequence is found, non-robustness follows; but if the search space is too large and no sequence is found by a random or somehow sophisticated search, it remains unclear whether such a sequence does not exist or whether we simply cannot find it.
Once we need to rely on the algorithms, we should thus turn to the few robust algorithms rather than pray for stability in the mean-square sense. An open question remains in this context, however:
Is the MSE really the troublemaker, or is it the independence assumption?
As the mean-square analysis typically comes with the independence assumption, the two are not easy to separate. The few cases for which an analysis without the independence assumption is known [71, 72] are valid for the LMS algorithm only (either for very short or infinitely long filters), and this is, as we know, a very robust algorithm of symmetric form.
There are indeed many more algorithms worth mentioning; they cannot all be named due to limited space. Let us, however, briefly refer to the notion of stability in probability (almost-sure convergence), as it provides another means of describing stable or unstable filter behavior. In [73], the LMS algorithm was analyzed in terms of almost-sure convergence, showing that substantially larger step-sizes can be employed than those obtained from MSE analysis. Such methods were successfully applied to the constant modulus algorithm (CMA) [74] and the least mean fourth (LMF) algorithm [75], showing that divergence of the algorithms can readily be obtained when modifying the signal properties of the noise [76] and the input [77, 78], respectively. Extensions to exponents higher than four can be found in [79].
Declarations
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
- Z Wang, AC Bovik, Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Proc. Mag.26(1), 98–117 (2009).View ArticleGoogle Scholar
- G Ungerboeck, Theory on the speed of convergence in adaptive equalizers for digital communication. IBM J. Res. Develop. 16(6), 546–555 (1972).View ArticleMATHGoogle Scholar
- B Widrow, ME Hoff Jr, Adaptive switching circuits. IRE WESCON Conv. Rec.4:, 96–104 (1960).Google Scholar
- JE Mazo, On the independence theory of equalizer convergence. Bell Syst. Tech. Journal. 58:, 963–993 (1979).View ArticleMathSciNetMATHGoogle Scholar
- E Hänsler, G Schmidt, Acoustic Echo and Noise Control (Wiley, Hoboken, NJ, USA, 2004).View ArticleGoogle Scholar
- R Nitzberg, Application of the normalized LMS algorithm to MSLC. IEEE Trans. Aerosp. Electron. Syst. 21(1), 79–91 (1985).View ArticleGoogle Scholar
- PL Feintuch, An adaptive recursiv LMS filter. Proc. IEEE. 64(11), 1622–1624 (1976).View ArticleGoogle Scholar
- RCJ Johnson, MG Larimore, Comments on and additions to an adaptive recursive LMS filter. Proc. IEEE. 65(9), 1401–1402 (1977).View ArticleGoogle Scholar
- B Widrow, JM McCool, Comments on an adaptive recursive LMS filter. Proc. IEEE. 65(9), 1402–1404 (1977).View ArticleGoogle Scholar
- CJ Johnson, Inf. Theory, IEEE Transac. 25(6), 745–749 (1979). doi:10.1109/TIT.1979.1056097.
- JJ Shynk, Adaptive IIR filtering. IEEE ASSP Mag. 6(2), 4–21 (1989). doi:10.1109/53.29644.
- P Kabal, The stability of adaptive minimum mean square error equalizers using delayed adjustment. IEEE Trans. Commun. 31(3), 430–432 (1983).
- G Long, F Ling, JG Proakis, The LMS algorithm with delayed coefficient adaptation. IEEE Trans. Acoust. Speech Signal Process. 37(9), 1397–1405 (1989).
- M Rupp, R Frenzel, Analysis of LMS and NLMS algorithms with delayed coefficient update under the presence of spherically invariant processes. IEEE Trans. Signal Process. 42(3), 668–672 (1994). doi:10.1109/78.277860.
- B Widrow, D Shur, S Shaffer, in Record of the Fifteenth Asilomar Conference on Circuits, Systems and Computers. On adaptive inverse control (1981), pp. 185–189.
- M Rupp, AH Sayed, Robust FxLMS algorithm with improved convergence performance. IEEE Trans. Speech Audio Process. 6(1), 78–85 (1998). doi:10.1109/89.650314.
- K Tammi, Active control of rotor vibrations by two feedforward control algorithms. J. Dyn. Syst. Meas. Control. 131, 1–10 (2009).
- AJ Hillis, Multi-input multi-output control of an automotive active engine mounting system. Proc. Inst. Mech. Eng. Part D: J. Automob. Eng. 225, 1492–1504 (2011).
- F Hausberg, S Vollmann, P Pfeffer, S Hecker, M Plöchl, T Kolkhorst, in 42nd International Congress and Exposition on Noise Control Engineering (Internoise 2013). Improving the convergence behavior of active engine mounts in vehicles with cylinder-on-demand engines (Innsbruck, Austria, 2013).
- F Hausberg, C Scheiblegger, P Pfeffer, M Plöchl, S Hecker, M Rupp, Experimental and analytical study of secondary path variations in active engine mounts. J. Sound Vib. 340, 22–38 (2015).
- M Rupp, F Hausberg, in Proceedings of the 22nd European Signal Processing Conference (EUSIPCO 2014). LMS algorithmic variants in active noise and vibration control (2014), pp. 691–695.
- NJ Bershad, Analysis of the normalized LMS algorithm with Gaussian inputs. IEEE Trans. Acoust. Speech Signal Process. 34(4), 793–806 (1986).
- M Tarrab, A Feuer, Convergence and performance analysis of the normalized LMS algorithm with uncorrelated Gaussian data. IEEE Trans. Inf. Theory. 34(4), 680–691 (1988).
- M Rupp, The behavior of LMS and NLMS algorithms in the presence of spherically invariant processes. IEEE Trans. Signal Process. 41(3), 1149–1160 (1993). doi:10.1109/78.205720.
- AH Sayed, Fundamentals of Adaptive Filtering (Wiley, Hoboken, NJ, USA, 2003).
- M Rupp, Asymptotic equivalent analysis of the LMS algorithm under linearly filtered processes. EURASIP J. Adv. Signal Process. (2015).
- HK Khalil, Nonlinear Systems (Macmillan, US, 1992).
- M Vidyasagar, Nonlinear Systems Analysis, 2nd edn. (Prentice Hall, New Jersey, 1993).
- B Hassibi, AH Sayed, T Kailath, in Proc. Conference on Decision and Control, vol. 1. LMS is H_{∞} optimal (San Antonio, TX, 1993), pp. 74–79.
- AH Sayed, M Rupp, in Proc. SPIE Conf. Adv. Signal Process., vol. 2563. A time-domain feedback analysis of adaptive gradient algorithms via the small gain theorem (San Diego, CA, USA, 1995), pp. 458–469. doi:10.1117/12.211422.
- M Rupp, AH Sayed, A time-domain feedback analysis of filtered-error adaptive gradient algorithms. IEEE Trans. Signal Process. 44(6), 1428–1439 (1996). doi:10.1109/78.506609.
- AH Sayed, M Rupp, Error-energy bounds for adaptive gradient algorithms. IEEE Trans. Signal Process. 44(8), 1982–1989 (1996). doi:10.1109/78.533719.
- AH Sayed, M Rupp, in The Digital Signal Processing Handbook. Robustness issues in adaptive filtering (CRC Press, Boca Raton, FL, USA, 1998). Chap. 20.
- B Hassibi, T Kailath, in Proceedings of the 33rd IEEE Conference on Decision and Control, vol. 4. H_{∞} bounds for the recursive-least-squares algorithm (1994), pp. 3927–3928. doi:10.1109/CDC.1994.411555.
- M Rupp, AH Sayed, Robustness of Gauss-Newton recursive methods: a deterministic feedback analysis. Signal Process. 50, 165–187 (1996). doi:10.1016/0165-1684(96)00022-9.
- B Hassibi, T Kailath, H_{∞} bounds for LS estimators. IEEE Trans. Autom. Control. 46, 309–314 (2001).
- M Rupp, AH Sayed, Supervised learning of perceptron and output feedback dynamic networks: a feedback analysis via the small gain theorem. IEEE Trans. Neural Netw. 8(3), 612–622 (1997). doi:10.1109/72.572100.
- K Ozeki, T Umeda, An adaptive filtering algorithm using orthogonal projection to an affine subspace and its properties. Electron. Commun. Japan. 67-A(5), 19–27 (1984).
- SL Gay, in Third International Workshop on Acoustic Echo Control. A fast converging, low complexity adaptive filtering algorithm (Plestin les Greves, France, 1993).
- SL Gay, S Tavathia, in Proc. Intl. Conf. on Acoustics, Speech and Signal Processing. The fast affine projection algorithm (Detroit, MI, 1995).
- A Mader, H Puder, G Schmidt, Step-size control for acoustic echo cancellation filters—an overview. Signal Process. 80(9), 1697–1719 (2000).
- M Rupp, Pseudo affine projection algorithms revisited: robustness and stability analysis. IEEE Trans. Signal Process. 59(5), 2017–2023 (2011). doi:10.1109/TSP.2011.2113346.
- RW Lucky, Automatic equalization for digital communication. Bell Syst. Tech. J. 44, 547–588 (1965).
- J Proakis, Digital Communications (McGraw-Hill, New York, 2000).
- M Rupp, Convergence properties of adaptive equalizer algorithms. IEEE Trans. Signal Process. 59(6), 2562–2574 (2011). doi:10.1109/TSP.2011.2121905.
- M Rupp, Robust design of adaptive equalizers. IEEE Trans. Signal Process. 60(4), 1612–1626 (2012). doi:10.1109/TSP.2011.2180717.
- J Benesty, T Gänsler, Y Huang, M Rupp, in Audio Signal Processing for Next-Generation Multimedia Communication Systems, ed. by Y Huang, J Benesty. Adaptive algorithms for MIMO acoustic echo cancellation (Springer, 2004), pp. 119–147. ISBN: 978-1-4020-7768-5.
- L Tong, G Xu, T Kailath, in Conference Record of the Twenty-Fifth Asilomar Conference on Signals, Systems and Computers. A new approach to blind identification and equalization of multipath channels (1991), pp. 856–860. doi:10.1109/ACSSC.1991.186568.
- L Tong, S Perreau, Multichannel blind identification: from subspace to maximum likelihood methods. Proc. IEEE. 86(10), 1951–1968 (1998). doi:10.1109/5.720247.
- Y Huang, J Benesty, Adaptive multi-channel least mean square and Newton algorithms for blind channel identification. Signal Process. 82, 1127–1138 (2002).
- M Rupp, AH Sayed, On the convergence of blind adaptive equalizers for constant modulus signals. IEEE Trans. Commun. 48(5), 795–803 (2000). doi:10.1109/26.843192.
- R Dallinger, M Rupp, in Record of the 43rd Asilomar Conference on Signals, Systems and Computers (ACSSC). A strict stability limit for adaptive gradient type algorithms (Pacific Grove, CA, USA, 2009), pp. 1370–1374. doi:10.1109/ACSSC.2009.5469884.
- DL Duttweiler, Proportionate normalized least mean square adaptation in echo cancellers. IEEE Trans. Speech Audio Process. 8(5), 508–518 (2000).
- S Makino, Y Kaneda, N Koizumi, Exponentially weighted step-size NLMS adaptive filter based on the statistics of a room impulse response. IEEE Trans. Speech Audio Process. 1(1), 101–108 (1993). doi:10.1109/89.221372.
- J Benesty, SL Gay, in Proc. IEEE ICASSP. An improved PNLMS algorithm (2002), pp. 1881–1884.
- M Rupp, J Cezanne, Robustness conditions of the LMS algorithm with time-variant matrix step-size. Signal Process. 80(9), 1787–1794 (2000). doi:10.1016/S0165-1684(00)00088-8.
- R Dallinger, M Rupp, in Proc. of the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP’13). On the robustness of LMS algorithms with time-variant diagonal matrix step-size (2013). doi:10.1109/ICASSP.2013.6638754.
- J Arenas-García, AR Figueiras-Vidal, AH Sayed, Mean-square performance of a convex combination of two adaptive filters. IEEE Trans. Signal Process. 54(3), 1078–1090 (2006). doi:10.1109/TSP.2005.863126.
- RT Flanagan, J-J Werner, Cascade echo canceler arrangement. U.S. Patent 6,009,083 (1999).
- DY Huang, X Su, A Nallanathan, in Proc. IEEE ICASSP 2005, vol. 3. Characterization of a cascade LMS predictor (Singapore, 2005), pp. 173–176. doi:10.1109/ICASSP.2005.1415674.
- SC Cripps, Advanced Techniques in RF Power Amplifier Design (Artech House, Boston, MA, USA, 2002).
- E Aschbacher, M Rupp, in Proc. IEEE SSP 2005. Robustness analysis of a gradient identification method for a nonlinear Wiener system (Bordeaux, France, 2005), pp. 103–108. doi:10.1109/SSP.2005.1628573.
- R Dallinger, M Rupp, in Proc. IEEE ICASSP 2010. Stability analysis of an adaptive Wiener structure (Dallas, TX, USA, 2010), pp. 3718–3721. doi:10.1109/ICASSP.2010.5495866.
- R Dallinger, M Rupp, in Proc. of EUSIPCO Conference. Stability of adaptive filters with linearly interfering update errors (2015).
- M Rupp, S Schwarz, in 40th International Conference on Acoustics, Speech, and Signal Processing (ICASSP’15). A tensor LMS algorithm (2015), pp. 3347–3351. doi:10.1109/ICASSP.2015.7178591.
- M Rupp, S Schwarz, in Proc. of EUSIPCO Conference. Gradient-based approaches to learn tensor products (2015).
- DE Rumelhart, GE Hinton, RJ Williams, Learning representations by back-propagating errors. Nature. 323, 533–536 (1986). doi:10.1038/323533a0.
- RP Lippmann, An introduction to computing with neural nets. IEEE ASSP Mag. 4(2), 4–22 (1987). doi:10.1109/MASSP.1987.1165576.
- R Rojas, Neural Networks (Springer, Berlin, Germany, 1996).
- B Hassibi, AH Sayed, T Kailath, in Theoretical Advances in Neural Computation and Learning, ed. by V Roychowdhury, K Siu, A Orlitsky. LMS and backpropagation are minimax filters (Kluwer Academic Publishers, Norwell, MA, USA, 1994), pp. 425–447. Chap. 12.
- SC Douglas, W Pan, Exact expectation analysis of the LMS adaptive filter. IEEE Trans. Signal Process. 43(12), 2863–2871 (1995). doi:10.1109/78.476430.
- H-J Butterweck, in International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95), vol. 2. A steady-state analysis of the LMS adaptive algorithm without use of the independence assumption (1995), pp. 1404–1407. doi:10.1109/ICASSP.1995.480504.
- VH Nascimento, AH Sayed, in Conference Record of the Thirty-Second Asilomar Conference on Signals, Systems and Computers, vol. 2. Are ensemble-average learning curves reliable in evaluating the performance of adaptive filters? (1998), pp. 1171–1175. doi:10.1109/ACSSC.1998.751511.
- DN Godard, Self-recovering equalization and carrier tracking in two-dimensional data communication systems. IEEE Trans. Commun. 28(11), 1867–1875 (1980). doi:10.1109/TCOM.1980.1094608.
- E Walach, B Widrow, The least mean fourth (LMF) adaptive algorithm and its family. IEEE Trans. Inf. Theory. 30(2), 275–283 (1984). doi:10.1109/TIT.1984.1056886.
- O Dabeer, E Masry, Convergence analysis of the constant modulus algorithm. IEEE Trans. Inf. Theory. 49(6), 1447–1464 (2003). doi:10.1109/TIT.2003.811903.
- VH Nascimento, JCM Bermudez, Probability of divergence for the least-mean fourth algorithm. IEEE Trans. Signal Process. 54(4), 1376–1385 (2006). doi:10.1109/TSP.2006.870546.
- PI Hubscher, JCM Bermudez, VH Nascimento, A mean-square stability analysis of the least mean fourth adaptive algorithm. IEEE Trans. Signal Process. 55(8), 4018–4028 (2007). doi:10.1109/TSP.2007.894423.
- M Moinuddin, UM Al-Saggaf, A Ahmed, Family of state space least mean power of two-based algorithms. EURASIP J. Adv. Signal Process. 39 (2015). doi:10.1186/s13634-015-0219-9.
- H Robbins, S Monro, A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951).
- J Kiefer, J Wolfowitz, Stochastic estimation of the maximum of a regression function. Ann. Math. Stat. 23(3), 462–466 (1952).