
Mean square cross error: performance analysis and applications in non-Gaussian signal processing

Abstract

Most of the cost functions of adaptive filtering algorithms include the square error, which depends on the current error signal. When the additive noise is impulsive, we can expect that the square error will be very large. By contrast, the cross error, which is the correlation of the error signal and its delayed version, may be very small. Based on this fact, we propose a new cost function called the mean square cross error for adaptive filters, and provide the mean value and mean square performance analysis in detail. Furthermore, we present a two-stage method to estimate the closed-form solution of the proposed method, and generalize the two-stage method to estimate the closed-form solutions of information theoretic learning methods, including the least mean fourth, maximum correntropy criterion, generalized maximum correntropy criterion, and minimum kernel risk-sensitive loss. Simulations of the adaptive solutions and closed-form solutions demonstrate the effectiveness of the new method.

Introduction

The mean square error (MSE) is probably the most widely used cost function for adaptive linear filters [1,2,3,4,5]. The MSE relies heavily on Gaussianity assumptions and performs well for Gaussian noise. Recently, information theoretic learning (ITL) has been proposed to process non-Gaussian noise. ITL uses the higher-order moments of the probability density function and may work well for non-Gaussian noise. Inspired by ITL, some cost functions, such as the maximum correntropy criterion (MCC) [6,7,8,9,10,11], improved least sum of exponentials (ILSE) [12], least mean kurtosis (LMK) [13], least mean fourth (LMF) [14,15,16,17,18,19], generalized MCC (GMCC) [20], and minimum kernel risk-sensitive loss (MKRSL) criterion [21, 22] have been presented.

The LMK and LMF are robust to sub-Gaussian noise; one typical sub-Gaussian distribution is the uniform distribution. The MCC and ILSE are robust to large outliers and impulsive noise, which takes values that are very close to zero most of the time but occasionally very large. This means that impulsive noise has a super-Gaussian distribution [23, 24].

Altogether, the distribution of the additive noise in linear filtering can be divided into three types: Gaussian, super-Gaussian, and sub-Gaussian. Super-Gaussian noise and sub-Gaussian noise are both non-Gaussian.

From the performance viewpoint, for example, the steady-state error, the MSE, MCC, and LMF work well for Gaussian, super-Gaussian, and sub-Gaussian noise, respectively. The MSE demonstrates similar performance for the three types of noise under the same signal-to-noise ratio (SNR). For Gaussian noise, all the algorithms have similar steady-state errors. For super-Gaussian noise, the steady-state error comparison under the same SNR is MCC < MSE < LMF. For sub-Gaussian noise, the comparison is LMF < MSE < MCC.

Note that the cost functions of the above algorithms all include the square error, which is the zero-lag correlation of the error signal with itself. When impulsive noise is involved, we can expect that the square error will be very large. By contrast, the cross error (CE), which is the correlation of the error signal and its delayed version, may be very small for impulsive noise.

In our early work [25,26,27], we proposed the mean square cross prediction error to extract the desired signal in blind source separation (BSS), where the square cross prediction error was much smaller than the square prediction error. In this paper, we propose a new cost function called the mean square CE (MSCE) for adaptive filtering to process non-Gaussian noise. We expect that the proposed MSCE algorithm will perform well for non-Gaussian noise.

Note that the ITL methods can capture higher-order statistics of data. Thus, it is hard to directly obtain the corresponding closed-form solutions. We present a two-stage method to estimate the closed-form solutions for the LMF, MCC, GMCC, MKRSL, and MSCE.

The contributions of this paper are summarized as follows:

  i) We present a new cost function, that is, the MSCE, for adaptive filters, and provide the mean value and mean square performance analysis in detail.

  ii) We propose a two-stage method to estimate the closed-form solution of the proposed MSCE algorithm.

  iii) We generalize the two-stage method to estimate the closed-form solutions of the LMF, MCC, GMCC, and MKRSL algorithms.

The paper is organized as follows: In Section 2, the problem statement is explained in detail. In Section 3, the MSCE algorithm is presented with the adaptive algorithm and closed-form solution. In Section 4, the closed-form solution of the LMF, MCC, GMCC, and MKRSL are estimated. In Section 5, the mean behavior and mean square behavior of MSCE are analyzed. Simulations are provided in Section 6. Lastly, a conclusion is provided in Section 7.

Problem formulation

The absolute value of the normalized kurtosis may be considered as one measure of the non-Gaussianity of the error signal. Several definitions for a zero-mean random variable are presented as follows:

  • Definition 1 (normalized kurtosis)

The normalized kurtosis of random variable x is defined as

$$ {\kappa}_4=\frac{E\left\{{\left|x\right|}^4\right\}}{E^2\left\{{\left|x\right|}^2\right\}}-3 $$
(1)

where x has zero mean.

  • Definition 2 (sub-Gaussian or platykurtic)

A distribution with negative normalized kurtosis is called sub-Gaussian, platykurtic, or short-tailed (e.g., uniform).

  • Definition 3 (super-Gaussian or leptokurtic)

A distribution with positive normalized kurtosis is called super-Gaussian, leptokurtic, or heavy-tailed (e.g., Laplacian).

  • Definition 4 (mesokurtic)

A zero-kurtosis distribution is called mesokurtic (e.g., Gaussian).
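These definitions can be checked numerically. The following minimal sketch (the sample size, seed, and helper name are illustrative assumptions, not from the paper) estimates the normalized kurtosis (1) and recovers the sub-/super-/mesokurtic classification:

```python
import numpy as np

def normalized_kurtosis(x):
    """Sample estimate of (1): kappa_4 = E{|x|^4} / E^2{|x|^2} - 3."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # enforce the zero-mean assumption
    m2 = np.mean(np.abs(x) ** 2)          # second moment E{|x|^2}
    m4 = np.mean(np.abs(x) ** 4)          # fourth moment E{|x|^4}
    return m4 / m2 ** 2 - 3.0

rng = np.random.default_rng(0)
n = 200_000
k_uniform = normalized_kurtosis(rng.uniform(-1, 1, n))   # sub-Gaussian: negative
k_gauss = normalized_kurtosis(rng.normal(0, 1, n))       # mesokurtic: near zero
k_laplace = normalized_kurtosis(rng.laplace(0, 1, n))    # super-Gaussian: positive
```

The uniform distribution has κ4 = −1.2 and the Laplacian has κ4 = 3, matching Definitions 2 and 3.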

When the linear filtering problem is considered, there is an input vector u ∈ ℝM, an unknown parameter vector wo ∈ ℝM, and a desired response d ∈ ℝ. Data d(i) are observed at each time point i by the linear regression model:

$$ d(i)={\mathbf{w}}_o^T\mathbf{u}(i)+v(i),\kern1em i=1,2,\cdots, L, $$
(2)

where v is zero-mean background noise with variance \( {\sigma}_v^2 \) and L is the length of the sequence. The error signal for the linear filter is defined as

$$ e(i)=d(i)-{\mathbf{w}}^T\mathbf{u}(i), $$
(3)

where w is the estimate of wo. The distribution of the additive noise in linear filtering can be divided into three types: Gaussian, super-Gaussian, and sub-Gaussian. Super-Gaussian noise and sub-Gaussian noise are both non-Gaussian.

In this research, we made the following assumptions:

  • A1) The additive noise is white, that is,

$$ E\left\{v(i)v(j)\kern0.1em \right\}=0,i\ne j. $$
(4)
  • A2) Inputs u(i) at different time moments i ≠ j are uncorrelated:

$$ E\left\{{\mathbf{u}}^H(i)\mathbf{u}(j)\right\}=E\left\{{\mathbf{u}}^T(i)\mathbf{u}(j)\right\}=0,i\ne j. $$
(5)
  • A3) The inputs and additive noise at different time moments (i, j) are uncorrelated:

$$ E\left\{{\mathbf{u}}^H(i)v(j)\right\}=0,i\ne j. $$
(6)

The linear filtering algorithms of the MSE, MCC, and LMF are as follows: the cost function based on the MSE is given by

$$ {J}_{MSE}\left(\mathbf{w}\right)=E\left\{{e}^2\right\}, $$
(7)

where E denotes the expectation operator. The gradient is defined as

$$ \frac{\partial {J}_{MSE}}{\partial \mathbf{w}}=-E\left\{e\mathbf{u}\right\}. $$
(8)

At the stationary point, E{eu} = 0. The closed-form solution denoted by wMSE is given by the Wiener–Hopf equation:

$$ {\displaystyle \begin{array}{l}{\mathbf{w}}_{MSE}={\left[\mathbf{R}\left(\mathbf{u}\right)\right]}^{-1}\left[{\mathbf{R}}_{du}\right],\\ {}\mathbf{R}\left(\mathbf{u}\right)=E\left\{{\mathbf{uu}}^H\right\},\\ {}{\mathbf{R}}_{du}=E\left\{d\mathbf{u}\right\}.\end{array}} $$
(9)

The corresponding stochastic gradient descent, or LMS, algorithm is

$$ {\mathbf{w}}_{LMS}\left(i+1\right)={\mathbf{w}}_{LMS}(i)+\mu e(i)\mathbf{u}(i)\kern0.1em $$
(10)

where μ denotes the step size and μ > 0.
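As a concrete illustration of the model (2) and the Wiener–Hopf solution (9), the following sketch estimates the solution from sample correlations (the dimensions, noise level, and seed are illustrative assumptions):

```python
import numpy as np

# Simulated regression data following model (2).
rng = np.random.default_rng(1)
M, L = 5, 5000
w_o = rng.normal(size=M)                  # unknown parameter vector w_o
U = rng.normal(size=(L, M))               # input vectors u(i), one per row
d = U @ w_o + 0.1 * rng.normal(size=L)    # d(i) = w_o^T u(i) + v(i)

# Wiener-Hopf solution (9) from sample correlations.
R_uu = U.T @ U / L                        # estimate of R(u) = E{u u^H}
R_du = U.T @ d / L                        # estimate of R_du = E{d u}
w_mse = np.linalg.solve(R_uu, R_du)
```

For Gaussian noise, w_mse recovers w_o up to an error on the order of the noise level.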

The cost function based on the LMF is given by

$$ {J}_{LMF}\left(\mathbf{w}\right)=E\left\{{e}^4\right\}. $$
(11)

The corresponding stochastic gradient descent algorithm is

$$ {\mathbf{w}}_{LMF}\left(i+1\right)={\mathbf{w}}_{LMF}(i)+\mu {e}^3(i)\boldsymbol{u}(i). $$
(12)

The cost function based on the correntropy of the error, also called the MCC, is given by

$$ {J}_{MCC}\left(\mathbf{w}\right)=E\left\{\exp \left(-\frac{e^2}{2{\sigma}^2}\right)\right\}\kern0.1em $$
(13)

where σ denotes the kernel bandwidth. The corresponding stochastic gradient descent algorithm is

$$ {\mathbf{w}}_{MCC}\left(i+1\right)={\mathbf{w}}_{MCC}(i)+\mu \exp \left(-\frac{e^2(i)}{2{\sigma}^2}\right)e(i)\mathbf{u}(i). $$
(14)

The cost function based on the GMCC is given by

$$ {J}_{GMCC}\left(\mathbf{w}\right)={\gamma}_{\alpha, \beta}\left\{1-E\left[\exp \left(-\lambda {\left|e(i)\right|}^{\alpha}\right)\right]\right\}. $$
(15)

The corresponding stochastic gradient descent algorithm is

$$ {\displaystyle \begin{array}{l}{\mathbf{w}}_{GMCC}\left(i+1\right)={\mathbf{w}}_{GMCC}(i)+\\ {}\mu \lambda \alpha \exp \left(-\lambda {\left|e(i)\right|}^{\alpha}\right){\left|e(i)\right|}^{\alpha -1}\operatorname{sign}\left(e(i)\right)\mathbf{u}(i).\end{array}} $$
(16)

The cost function based on the MKRSL is given by

$$ {\displaystyle \begin{array}{l}{J}_{MKRSL}\left(\mathbf{w}\right)=\frac{1}{L\lambda}\sum \limits_{i=1}^L\exp \left(\lambda \left(1-{\kappa}_{\sigma}\left(e(i)\right)\right)\right),\\ {}{\kappa}_{\sigma}\left(e(i)\right)=\exp \left(-\frac{e^2(i)}{2{\sigma}^2}\right).\end{array}} $$
(17)

The corresponding stochastic gradient descent algorithm is

$$ {\displaystyle \begin{array}{l}{\mathbf{w}}_{MKRSL}\left(i+1\right)={\mathbf{w}}_{MKRSL}(i)+\\ {}\frac{\mu }{\sigma^2}\exp \left(\lambda \left(1-{\kappa}_{\sigma}\left(e(i)\right)\right)\right){\kappa}_{\sigma}\left(e(i)\right)e(i)\mathbf{u}(i).\end{array}} $$
(18)
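Note that the recursions (10), (12), (14), (16), and (18) share one form: the update is the error times the input, scaled by an error-dependent weighting. A hedged sketch of this family (the helper `adapt`, the data parameters, and the unit-norm target are illustrative assumptions):

```python
import numpy as np

def adapt(U, d, weight, mu=0.01):
    """Generic stochastic gradient filter: w <- w + mu * weight(e) * e * u(i).
    weight(e) = 1 gives LMS (10); weight(e) = e**2 gives LMF (12);
    weight(e) = exp(-e**2 / (2 * sigma**2)) gives MCC (14)."""
    L, M = U.shape
    w = np.zeros(M)
    for i in range(L):
        e = d[i] - w @ U[i]               # error signal (3)
        w = w + mu * weight(e) * e * U[i]
    return w

rng = np.random.default_rng(2)
M, L = 5, 5000
w_o = rng.normal(size=M)
w_o /= np.linalg.norm(w_o)                # unit-norm target keeps LMF stable
U = rng.normal(size=(L, M))
d = U @ w_o + 0.1 * rng.normal(size=L)

w_lms = adapt(U, d, lambda e: 1.0)
w_lmf = adapt(U, d, lambda e: e ** 2, mu=0.005)
w_mcc = adapt(U, d, lambda e: np.exp(-e ** 2 / 2.0))   # kernel bandwidth sigma = 1
```

The GMCC (16) and MKRSL (18) recursions fit the same template with their respective error weightings.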

Methods

Adaptive algorithm of the MSCE

The CE can be expressed as e(i)e(i − q), where q denotes the error delay. Because the CE may be negative, we define a new cost function, the MSCE, as

$$ {J}_{MSCE}\left(\mathbf{w}\right)=\frac{1}{2}E\left\{{e}^2(i){e}^2\left(i-q\right)\right\}\kern0.1em $$
(19)

where

$$ {\displaystyle \begin{array}{l}e(i)=d(i)-{\mathbf{w}}^T\mathbf{u}(i),\\ {}e\left(i-q\right)=d\left(i-q\right)-{\mathbf{w}}^T\mathbf{u}\left(i-q\right).\end{array}} $$
(20)

The gradient of the MSCE can be derived as

$$ \frac{\partial {J}_{MSCE}}{\partial \mathbf{w}}=-E\left\{{e}^2\left(i-q\right)\left[e(i)\mathbf{u}(i)\right]+{e}^2(i)\left[e\left(i-q\right)\mathbf{u}\left(i-q\right)\right]\right\}. $$
(21)

Then, the corresponding stochastic gradient descent algorithm is

$$ {\displaystyle \begin{array}{l}{\mathbf{w}}_{MSCE}\left(i+1\right)={\mathbf{w}}_{MSCE}(i)+\\ {}\mu {e}^2\left(i-q\right)e(i)\mathbf{u}(i)+\mu {e}^2(i)e\left(i-q\right)\mathbf{u}\left(i-q\right).\end{array}} $$
(22)
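A minimal sketch of the MSCE recursion (22) (the function name, data parameters, and unit-norm target are illustrative assumptions):

```python
import numpy as np

def msce_filter(U, d, mu=0.005, q=1):
    """MSCE stochastic gradient recursion (22): the update is driven by the
    cross error e(i) e(i-q) rather than the square error e(i)^2."""
    L, M = U.shape
    w = np.zeros(M)
    for i in range(q, L):
        e_i = d[i] - w @ U[i]             # current error e(i), as in (20)
        e_q = d[i - q] - w @ U[i - q]     # delayed error e(i-q)
        w = w + mu * e_q ** 2 * e_i * U[i] + mu * e_i ** 2 * e_q * U[i - q]
    return w

rng = np.random.default_rng(4)
M, L = 5, 5000
w_o = rng.normal(size=M)
w_o /= np.linalg.norm(w_o)                # unit-norm target for a stable demo
U = rng.normal(size=(L, M))
d = U @ w_o + 0.1 * rng.normal(size=L)
w_msce = msce_filter(U, d)
```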

Equation (19) may not be robust against outliers. We provide the generalized MSCE (GMSCE) as

$$ {J}_{MSCE}\left(\mathbf{w}\right)=E\left\{G\left[e(i)\right]G\left[e\left(i-q\right)\right]\right\}\kern0.1em $$
(23)

where G(x) is an x2-like function. The stochastic gradient descent algorithm for the GMSCE is

$$ {\displaystyle \begin{array}{l}{\mathbf{w}}_{GMSCE}\left(i+1\right)={\mathbf{w}}_{GMSCE}(i)\\ {}+\mu G\left[e(i)\right]g\left[e\left(i-q\right)\right]\mathbf{u}\left(i-q\right)\\ {}+\mu G\left[e\left(i-q\right)\right]g\left[e(i)\right]\mathbf{u}(i)\end{array}} $$
(24)

where g(.) is the derivative of G(.).

A suitable criterion for cost function

Combining the ICA literature [23, 24, 28,29,30] with that of the MCC [6,7,8,9,10,11], ILSE [12], and LMF [14,15,16,17,18,19], we observe that three cost functions are used in the FastICA [30] algorithm:

$$ {G}_1(u)=\frac{1}{\alpha_1}\log \cosh \left({\alpha}_1u\right), $$
(25)
$$ {G}_2(u)=-\frac{1}{\alpha_2}\exp \left(-\frac{\alpha_2{u}^2}{2{\sigma}^2}\right), $$
(26)
$$ {G}_3(u)=\frac{1}{4}{u}^4. $$
(27)

G2(u) is used to separate the super-Gaussian source in ICA, and works as the cost function of the MCC. G3(u) is used to separate the sub-Gaussian source in ICA when there are no outliers, and works as the cost function of the LMF. G1(u) has not been used in adaptive filtering, but cosh(α1u) works as the cost function of the ILSE.

This motivated us to explore ICA or BSS algorithms to determine a suitable criterion for adaptive filtering. Here, we use G1(u) in the proposed GMSCE algorithm:

$$ G(x)=\frac{1}{\alpha}\log \cosh \left(\alpha x\right) $$
(28)

where 1 ≤ α ≤ 2, and α = 1 in the simulations. The derivative of G(x) is

$$ g(x)=\tanh \left(\alpha x\right). $$
(29)

Substituting (28)–(29) into (24) with α = 1, we obtain

$$ {\displaystyle \begin{array}{l}{\mathbf{w}}_{GMSCE}\left(i+1\right)={\mathbf{w}}_{GMSCE}(i)\\ {}+\mu \log \cosh \left[e(i)\right]\tanh \left[e\left(i-q\right)\right]\mathbf{u}\left(i-q\right)\\ {}+\mu \log \cosh \left[e\left(i-q\right)\right]\tanh \left[e(i)\right]\mathbf{u}(i).\end{array}} $$
(30)

Closed-form solution of the MSCE and GMSCE

We can estimate the closed-form solution of the MSCE from the stationary point ∂JMSCE/∂w = 0:

$$ \frac{\partial {J}_{MSCE}}{\partial \mathbf{w}}=-E\left\{{e}^2\left(i-q\right)\left[e(i)\mathbf{u}(i)\right]+{e}^2(i)\left[e\left(i-q\right)\mathbf{u}\left(i-q\right)\right]\right\}=0. $$
(31)

Substituting (20) into (31), we obtain

$$ {\displaystyle \begin{array}{l}\frac{\partial {J}_{MSCE}}{\partial \mathbf{w}}=-E\left\{{e}^2\left(i-q\right)\left[d(i)\mathbf{u}(i)-\mathbf{u}(i){\mathbf{u}}^T(i)\mathbf{w}\right]\right\}\\ {}-E\left\{{e}^2(i)\left[d\left(i-q\right)\mathbf{u}\left(i-q\right)-\mathbf{u}\left(i-q\right){\mathbf{u}}^T\left(i-q\right)\mathbf{w}\right]\right\}=0.\end{array}} $$
(32)

It is difficult to solve (32) for w because e2(i) contains second-order terms of w. We present a two-stage method to estimate w.

In the first stage, we estimate e(i) and e(i − q) from (9):

$$ {\displaystyle \begin{array}{l}{\mathbf{w}}_{MSE}={\left[R\left(\mathbf{u}\right)\right]}^{-1}\left[{R}_{du}\right],\\ {}\hat{e}\left(i-q\right)=d\left(i-q\right)-{\mathbf{w}}_{MSE}^T\mathbf{u}\left(i-q\right),\\ {}\hat{e}(i)=d(i)-{\mathbf{w}}_{MSE}^T\mathbf{u}(i),i=1,2,\cdots, L\end{array}} $$
(33)

where \( \hat{e}(i) \) and \( \hat{e}\left(i-q\right) \) are the estimates of e(i) and e(i − q), respectively.

In the second stage, we estimate w from (32).

If we define

$$ {F}_1\left(\mathbf{w}\right)=\frac{\partial {J}_{MSCE}}{\partial \mathbf{w}}, $$

then we can rewrite (32), replacing the errors with their estimates, as

$$ {\displaystyle \begin{array}{l}{F}_1\left(\mathbf{w}\right)=-E\left\{{\hat{e}}^2\left(i-q\right)\left[d(i)\mathbf{u}(i)-\mathbf{u}(i){\mathbf{u}}^T(i)\mathbf{w}\right]\right\}\\ {}-E\left\{{\hat{e}}^2(i)\left[d\left(i-q\right)\mathbf{u}\left(i-q\right)-\mathbf{u}\left(i-q\right){\mathbf{u}}^T\left(i-q\right)\mathbf{w}\right]\right\}=0\end{array}} $$
(34)

where the errors in (34) have been replaced with their estimates; thus, the right-hand side of (34) is an estimate of F1(w).

Note that the expectation can be estimated by averaging over the samples using

$$ E\left\{f(x)\right\}\approx \frac{1}{L}\sum \limits_{i=1}^Lf\left({x}_i\right). $$
(35)

Equation (34) can be estimated by

$$ {\displaystyle \begin{array}{l}{F}_1\left(\mathbf{w}\right)\approx -\frac{1}{L}\sum \limits_{i=q+1}^{L+q}\left\{{\hat{e}}^2\left(i-q\right)\left[d(i)\mathbf{u}(i)-\mathbf{u}(i){\mathbf{u}}^T(i)\mathbf{w}\right]\right\}\\ {}-\frac{1}{L}\sum \limits_{i=q+1}^{L+q}\left\{{\hat{e}}^2(i)\left[d\left(i-q\right)\mathbf{u}\left(i-q\right)-\mathbf{u}\left(i-q\right){\mathbf{u}}^T\left(i-q\right)\mathbf{w}\right]\right\}\\ {}=-\frac{1}{L}\sum \limits_{i=q+1}^{L+q}\left\{{\hat{e}}^2\left(i-q\right)d(i)\mathbf{u}(i)+{\hat{e}}^2(i)d\left(i-q\right)\mathbf{u}\left(i-q\right)\right\}\\ {}+\frac{1}{L}\sum \limits_{i=q+1}^{L+q}\left\{{\hat{e}}^2\left(i-q\right)\mathbf{u}(i){\mathbf{u}}^T(i)+{\hat{e}}^2(i)\mathbf{u}\left(i-q\right){\mathbf{u}}^T\left(i-q\right)\right\}\mathbf{w}=0.\end{array}} $$
(36)

Furthermore, we have

$$ {\displaystyle \begin{array}{l}\frac{1}{L}\sum \limits_{i=q+1}^{L+q}\left\{{\hat{e}}^2\left(i-q\right)d(i)\mathbf{u}(i)+{\hat{e}}^2(i)d\left(i-q\right)\mathbf{u}\left(i-q\right)\right\}=\\ {}\frac{1}{L}\sum \limits_{i=q+1}^{L+q}\left\{{\hat{e}}^2\left(i-q\right)\mathbf{u}(i){\mathbf{u}}^T(i)+{\hat{e}}^2(i)\mathbf{u}\left(i-q\right){\mathbf{u}}^T\left(i-q\right)\right\}\mathbf{w}.\end{array}} $$
(37)

If we define Rdu(q) and Ruu(q) as

$$ {\displaystyle \begin{array}{l}{\mathbf{R}}_{du}(q)=E\left\{{e}^2\left(i-q\right)d(i)\mathbf{u}(i)+{e}^2(i)d\left(i-q\right)\mathbf{u}\left(i-q\right)\right\},\\ {}{\mathbf{R}}_{uu}(q)=E\left\{{e}^2\left(i-q\right)\mathbf{u}(i){\mathbf{u}}^T(i)+{e}^2(i)\mathbf{u}\left(i-q\right){\mathbf{u}}^T\left(i-q\right)\right\},\end{array}} $$
(38)

then we can estimate them as

$$ {\displaystyle \begin{array}{l}{\mathbf{R}}_{du}(q)=\frac{1}{L}\sum \limits_{i=q+1}^{L+q}\left\{{\hat{e}}^2\left(i-q\right)d(i)\mathbf{u}(i)+{\hat{e}}^2(i)d\left(i-q\right)\mathbf{u}\left(i-q\right)\right\},\\ {}{\mathbf{R}}_{uu}(q)=\frac{1}{L}\sum \limits_{i=q+1}^{L+q}\left\{{\hat{e}}^2\left(i-q\right)\mathbf{u}(i){\mathbf{u}}^T(i)+{\hat{e}}^2(i)\mathbf{u}\left(i-q\right){\mathbf{u}}^T\left(i-q\right)\right\},\end{array}} $$
(39)

where the sample averages in (39) estimate Rdu(q) and Ruu(q) of (38), respectively.

We can estimate the closed-form solution of the MSCE as

$$ {\mathbf{w}}_{close\_ MSCE}={\left[{\mathbf{R}}_{uu}(q)\right]}^{-1}\left[{\mathbf{R}}_{du}(q)\right]. $$
(40)
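The two-stage estimate given by (33), (39), and (40) can be sketched as follows (the function name, data parameters, and seed are illustrative assumptions):

```python
import numpy as np

def closed_form_msce(U, d, q=1):
    """Two-stage closed-form MSCE estimate: (33) for stage 1, (39)-(40) for stage 2."""
    L, M = U.shape
    # Stage 1: Wiener solution (9) and error estimates (33).
    w_mse = np.linalg.solve(U.T @ U / L, U.T @ d / L)
    e_hat = d - U @ w_mse
    # Stage 2: weighted correlations (39) and solution (40).
    Ui, Uq = U[q:], U[:-q]                    # u(i) and u(i-q)
    di, dq = d[q:], d[:-q]                    # d(i) and d(i-q)
    ei2, eq2 = e_hat[q:] ** 2, e_hat[:-q] ** 2
    R_du = (Ui.T @ (eq2 * di) + Uq.T @ (ei2 * dq)) / (L - q)
    R_uu = ((Ui * eq2[:, None]).T @ Ui + (Uq * ei2[:, None]).T @ Uq) / (L - q)
    return np.linalg.solve(R_uu, R_du)

rng = np.random.default_rng(5)
M, L = 5, 5000
w_o = rng.normal(size=M)
U = rng.normal(size=(L, M))
d = U @ w_o + 0.1 * rng.normal(size=L)
w_cf = closed_form_msce(U, d)
```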

Note that tanh(x) ≈ x when x is small; thus, the GMSCE stationary point has the same linear structure in w, and we can estimate the closed-form solution of the GMSCE in the same way.

If we define RGdu(q) and RGuu(q) as

$$ {\displaystyle \begin{array}{l}{\mathbf{R}}_{Gdu}(q)=E\left\{G\left(e\left(i-q\right)\right)d(i)\mathbf{u}(i)+G\left(e(i)\right)d\left(i-q\right)\mathbf{u}\left(i-q\right)\right\},\\ {}{\mathbf{R}}_{Guu}(q)=E\left\{G\left(e\left(i-q\right)\right)\mathbf{u}(i){\mathbf{u}}^T(i)+G\left(e(i)\right)\mathbf{u}\left(i-q\right){\mathbf{u}}^T\left(i-q\right)\right\},\end{array}} $$
(41)

then we can estimate them as

$$ {\displaystyle \begin{array}{l}{\mathbf{R}}_{Gdu}(q)=\frac{1}{L}\sum \limits_{i=q+1}^{L+q}\left\{G\left(\hat{e}\left(i-q\right)\right)d(i)\mathbf{u}(i)+G\left(\hat{e}(i)\right)d\left(i-q\right)\mathbf{u}\left(i-q\right)\right\},\\ {}{\mathbf{R}}_{Guu}(q)=\frac{1}{L}\sum \limits_{i=q+1}^{L+q}\left\{G\left(\hat{e}\left(i-q\right)\right)\mathbf{u}(i){\mathbf{u}}^T(i)+G\left(\hat{e}(i)\right)\mathbf{u}\left(i-q\right){\mathbf{u}}^T\left(i-q\right)\right\}.\end{array}} $$
(42)

We can estimate the closed-form solution of the GMSCE as

$$ {\mathbf{w}}_{close\_ GMSCE}={\left[{\mathbf{R}}_{Guu}(q)\right]}^{-1}\left[{\mathbf{R}}_{Gdu}(q)\right]. $$
(43)

Closed-form solution of the LMF, MCC, GMCC, and MKRSL

Based on the two-stage method, we can also estimate the closed-form solutions of the LMF, MCC, GMCC, and MKRSL algorithms as follows.

Closed-form solution of the LMF

We can estimate the closed-form solution of the LMF from the stationary point ∂JLMF/∂w = 0 (constant factors are dropped):

$$ \frac{\partial {J}_{LMF}}{\partial \mathbf{w}}=-E\left\{{e}^2(i)\left[d(i)-{\mathbf{w}}^T\mathbf{u}(i)\right]\mathbf{u}(i)\right\}. $$
(44)

In the first stage, we estimate e(i) from (9):

$$ {\displaystyle \begin{array}{l}{\mathbf{w}}_{MSE}={\left[R\left(\mathbf{u}\right)\right]}^{-1}\left[{R}_{du}\right]\\ {}\hat{e}(i)=d(i)-{\mathbf{w}}_{MSE}^T\mathbf{u}(i),i=1,2,\cdots, L.\end{array}} $$
(45)

In the second stage, we estimate w from (44).

If we define

$$ {F}_2\left(\mathbf{w}\right)=\frac{\partial {J}_{LMF}}{\partial \mathbf{w}}, $$
(46)

then we can rewrite (44) as

$$ {F}_2\left(\mathbf{w}\right)=-E\left\{{\hat{e}}^2(i)\left[d(i)\mathbf{u}(i)-\mathbf{u}(i){\mathbf{u}}^T(i)\mathbf{w}\right]\right\} $$
(47)

where the errors in (47) have been replaced with their estimates; thus, the right-hand side of (47) is an estimate of F2(w).

With the help of (35), Eq. (47) can be estimated by

$$ {F}_2\left(\mathbf{w}\right)\approx -\frac{1}{L}\sum \limits_{i=1}^L\left\{{\hat{e}}^2(i)\left[d(i)\mathbf{u}(i)-\mathbf{u}(i){\mathbf{u}}^T(i)\mathbf{w}\right]\right\}. $$
(48)

If we define Rdu2 and Ruu2 as

$$ {\displaystyle \begin{array}{l}{\mathbf{R}}_{du2}=E\left\{{e}^2(i)d(i)\mathbf{u}(i)\right\},\\ {}{\mathbf{R}}_{uu2}=E\left\{{e}^2(i)\mathbf{u}(i){\mathbf{u}}^T(i)\right\},\end{array}} $$
(49)

then we can estimate them as

$$ {\displaystyle \begin{array}{l}{\mathbf{R}}_{du2}=\frac{1}{L}\sum \limits_{i=1}^L\left\{{\hat{e}}^2(i)d(i)\mathbf{u}(i)\right\},\\ {}{\mathbf{R}}_{uu2}=\frac{1}{L}\sum \limits_{i=1}^L\left\{{\hat{e}}^2(i)\mathbf{u}(i){\mathbf{u}}^T(i)\right\},\end{array}} $$
(50)

where Rdu2 and Ruu2 are the estimates of Rdu2 and Ruu2, respectively.

We can estimate the closed-form solution of the LMF as

$$ {\mathbf{w}}_{close\_ LMF}={\left[{\mathbf{R}}_{uu2}\right]}^{-1}\left[{\mathbf{R}}_{du2}\right]. $$
(51)

Closed-form solution of the MCC

We can estimate the closed-form solution of the MCC from the stationary point ∂JMCC/∂w = 0:

$$ \frac{\partial {J}_{MCC}}{\partial \mathbf{w}}=\frac{1}{\sigma^2}E\left\{\exp \left(-{e}^2(i)/2{\sigma}^2\right)\left[d(i)-{\mathbf{w}}^T\mathbf{u}(i)\right]\mathbf{u}(i)\right\}=0. $$
(52)

In the first stage, we estimate e(i) from (45). In the second stage, we estimate w from (52).

If we define

$$ {F}_3\left(\mathbf{w}\right)={\sigma}^2\frac{\partial {J}_{MCC}}{\partial \mathbf{w}}, $$
(53)

then we can rewrite (52) as

$$ {F}_3\left(\mathbf{w}\right)=E\left\{\exp \left(-{\hat{e}}^2(i)/2{\sigma}^2\right)\left[d(i)\mathbf{u}(i)-\mathbf{u}(i){\mathbf{u}}^T(i)\mathbf{w}\right]\right\}\kern0.1em $$
(54)

where the errors in (54) have been replaced with their estimates; thus, the right-hand side of (54) is an estimate of F3(w).

With the help of (35), Eq. (54) can be estimated by

$$ {F}_3\left(\mathbf{w}\right)\approx \frac{1}{L}\sum \limits_{i=1}^L\left\{\exp \left(-{\hat{e}}^2(i)/2{\sigma}^2\right)\left[d(i)\mathbf{u}(i)-\mathbf{u}(i){\mathbf{u}}^T(i)\mathbf{w}\right]\right\}. $$
(55)

Define

$$ {\displaystyle \begin{array}{l}{\mathbf{R}}_{du3}=\frac{1}{L}\sum \limits_{i=1}^L\left\{\exp \left(-{\hat{e}}^2(i)/2{\sigma}^2\right)d(i)\mathbf{u}(i)\right\},\\ {}{\mathbf{R}}_{uu3}=\frac{1}{L}\sum \limits_{i=1}^L\left\{\exp \left(-{\hat{e}}^2(i)/2{\sigma}^2\right)\mathbf{u}(i){\mathbf{u}}^T(i)\right\}.\end{array}} $$
(56)

We can estimate the closed-form solution of the MCC as

$$ {\mathbf{w}}_{close\_ MCC}={\left[{\mathbf{R}}_{uu3}\right]}^{-1}\left[{\mathbf{R}}_{du3}\right] $$
(57)

Closed-form solution of the GMCC

The closed-form solution of the GMCC is given by [20]:

$$ {\displaystyle \begin{array}{l}{\mathbf{w}}_{close\_ GMCC}={\left[E\left({h}_1\left(e(i)\right)\mathbf{u}(i){\mathbf{u}}^T(i)\right)\right]}^{-1}\left[E\left({h}_1\left(e(i)\right)d(i)\mathbf{u}(i)\right)\right],\\ {}{h}_1\left(e(i)\right)=\exp \left(-\lambda {\left|e(i)\right|}^{\alpha}\right){\left|e(i)\right|}^{\alpha -2}.\end{array}} $$
(58)

In the first stage, we estimate e(i) from (45), and we have

$$ {h}_1\left(\hat{e}(i)\right)=\exp \left(-\lambda {\left|\hat{e}(i)\right|}^{\alpha}\right){\left|\hat{e}(i)\right|}^{\alpha -2}. $$
(59)

In the second stage, we estimate w from (58) using the weights (59). Define

$$ {\displaystyle \begin{array}{l}{\mathbf{R}}_{du4}=\frac{1}{L}\sum \limits_{i=1}^L\left\{{h}_1\left(\hat{e}(i)\right)d(i)\mathbf{u}(i)\right\},\\ {}{\mathbf{R}}_{uu4}=\frac{1}{L}\sum \limits_{i=1}^L\left\{{h}_1\left(\hat{e}(i)\right)\mathbf{u}(i){\mathbf{u}}^T(i)\right\}.\end{array}} $$
(60)

We can estimate the closed-form solution of the GMCC as

$$ {\mathbf{w}}_{close\_ GMCC}={\left[{\mathbf{R}}_{uu4}\right]}^{-1}\left[{\mathbf{R}}_{du4}\right] $$
(61)

Closed-form solution of the MKRSL

The closed-form solution of the MKRSL is given by [22]:

$$ {\displaystyle \begin{array}{l}{\mathbf{w}}_{close\_ MKRSL}={\left[E\left({h}_2\left(e(i)\right)\mathbf{u}(i){\mathbf{u}}^T(i)\right)\right]}^{-1}\left[E\left({h}_2\left(e(i)\right)d(i)\mathbf{u}(i)\right)\right],\\ {}{h}_2\left(e(i)\right)=\exp \left(\lambda \left(1-{\kappa}_{\sigma}\left(e(i)\right)\right)\right){\kappa}_{\sigma}\left(e(i)\right).\end{array}} $$
(62)

In the first stage, we estimate e(i) from (45), and we have

$$ {h}_2\left(\hat{e}(i)\right)=\exp \left(\lambda \left(1-{\kappa}_{\sigma}\left(\hat{e}(i)\right)\right)\right){\kappa}_{\sigma}\left(\hat{e}(i)\right). $$
(63)

In the second stage, we estimate w from (62) using the weights (63). Define

$$ {\displaystyle \begin{array}{l}{\mathbf{R}}_{du5}=\frac{1}{L}\sum \limits_{i=1}^L\left\{{h}_2\left(\hat{e}(i)\right)d(i)\mathbf{u}(i)\right\},\\ {}{\mathbf{R}}_{uu5}=\frac{1}{L}\sum \limits_{i=1}^L\left\{{h}_2\left(\hat{e}(i)\right)\mathbf{u}(i){\mathbf{u}}^T(i)\right\}.\end{array}} $$
(64)

We can estimate the closed-form solution of the MKRSL as

$$ {\mathbf{w}}_{close\_ MKRSL}={\left[{\mathbf{R}}_{uu5}\right]}^{-1}\left[{\mathbf{R}}_{du5}\right]. $$
(65)
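All four closed-form solutions above share one weighted least-squares pattern: stage 1 produces the Wiener error estimates (45), and stage 2 solves w = [Σ h(ê)uuT]−1 Σ h(ê)du for a criterion-specific weight h. A hedged sketch (the helper names and the values of σ, λ, and α are illustrative assumptions):

```python
import numpy as np

def two_stage_solution(U, d, h):
    """Generic two-stage closed-form estimate: stage 1 computes the Wiener
    error estimates (45); stage 2 solves the weighted least-squares system
    of (51), (57), (61), or (65) with weight function h(e)."""
    L, M = U.shape
    w_mse = np.linalg.solve(U.T @ U / L, U.T @ d / L)   # stage 1
    e_hat = d - U @ w_mse
    hw = h(e_hat)                                       # per-sample weights
    R_uu = (U * hw[:, None]).T @ U / L
    R_du = U.T @ (hw * d) / L
    return np.linalg.solve(R_uu, R_du)                  # stage 2

# Criterion-specific weights; sigma, lam, and alpha values are illustrative.
h_lmf = lambda e: e ** 2                                       # LMF, cf. (50)
h_mcc = lambda e: np.exp(-e ** 2 / 2.0)                        # MCC, sigma = 1, cf. (56)
h_gmcc = lambda e: np.exp(-np.abs(e) ** 4) * np.abs(e) ** 2    # GMCC, lam = 1, alpha = 4, cf. (59)

rng = np.random.default_rng(6)
M, L = 5, 5000
w_o = rng.normal(size=M)
U = rng.normal(size=(L, M))
d = U @ w_o + 0.1 * rng.normal(size=L)
w_lmf_cf = two_stage_solution(U, d, h_lmf)
w_mcc_cf = two_stage_solution(U, d, h_mcc)
```

The MKRSL weights (63) plug into the same helper.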

Performance analysis of MSCE

Mean value behavior

To compare the performance of the MSE and MSCE, we define the total weight error as

$$ E\left\{{\left|\varepsilon (i)\right|}^2\right\}=E\left\{{\varepsilon}^T(i)\varepsilon (i)\right\} $$
(66)

where

$$ \varepsilon (i)={\mathbf{w}}_o-{\mathbf{w}}_{MSCE}(i). $$
(67)

Substituting (22) into (67), we obtain

$$ {\displaystyle \begin{array}{l}\varepsilon \left(i+1\right)=\varepsilon (i)-\mu {e}^2\left(i-q\right)\left[\mathbf{u}(i){\mathbf{u}}^T(i)\kern0.1em \varepsilon (i)+\mathbf{u}(i)\kern0.1em v(i)\right]\\ {}-\mu {e}^2(i)\left[\mathbf{u}\left(i-q\right)\kern0.1em {\mathbf{u}}^T\left(i-q\right)\kern0.1em \varepsilon (i)+\mathbf{u}\left(i-q\right)\kern0.1em v\left(i-q\right)\right]\\ {}=\left[\mathbf{I}-\mu {e}^2(i)\mathbf{u}\left(i-q\right)\kern0.1em {\mathbf{u}}^T\left(i-q\right)-\mu {e}^2\left(i-q\right)\mathbf{u}(i){\mathbf{u}}^T(i)\right]\varepsilon (i)\\ {}-\left[\mu {e}^2(i)v\left(i-q\right)\mathbf{u}\left(i-q\right)+\mu {e}^2\left(i-q\right)v(i)\mathbf{u}(i)\kern0.1em \right]\end{array}} $$
(68)

where I is the identity matrix.

Taking the expectation of (68) and using the definition (38), we obtain

$$ {\displaystyle \begin{array}{l}E\left\{\varepsilon \left(i+1\right)\right\}=\left[\mathbf{I}-\mu {\mathbf{R}}_{uu}(q)\right]E\left\{\varepsilon (i)\right\}\\ {}-\mu E\left[{e}^2(i)v\left(i-q\right)\mathbf{u}\left(i-q\right)+{e}^2\left(i-q\right)v(i)\mathbf{u}(i)\kern0.1em \right].\end{array}} $$
(69)

Note that Ruu(q) is positive definite; thus, (69) is stable for a sufficiently small step size μ.

The eigenvalue decomposition of Ruu(q) is Ruu(q) = UΛUT. Neglecting the driving term in (69), which is approximately zero under assumptions A1–A3, we have the first-order moment of ε(i)

$$ {\displaystyle \begin{array}{l}E\left\{\varepsilon (i)\right\}\approx \mathbf{U}{\left[\mathbf{I}-\mu \Lambda \right]}^i{\mathbf{U}}^T\varepsilon (0),\\ {}\Lambda =\mathit{\operatorname{diag}}\left\{{d}_1,{d}_2,\cdots {d}_M\right\}.\end{array}} $$
(70)

Let dmax denote the maximum eigenvalue of Ruu(q). The step size should be selected as

$$ 0<\mu <2/{d}_{\mathrm{max}} $$
(71)

so that the iterations will converge.

Mean square behavior

If we define

$$ {\displaystyle \begin{array}{l}\delta (i)={\mathbf{U}}^T\varepsilon (i),\\ {}\mathbf{U}=\left[{\mathbf{U}}_1,\kern0.5em {\mathbf{U}}_2,\cdots, \kern0.5em {\mathbf{U}}_M\right],\end{array}} $$
(72)

where Um is the mth column of U, then we can rewrite (68) as

$$ {\displaystyle \begin{array}{l}\delta \left(i+1\right)\approx \left[\mathbf{I}-\mu \Lambda \right]\delta (i)\\ {}-\mu {\mathbf{U}}^T\left[{e}^2(i)v\left(i-q\right)\mathbf{u}\left(i-q\right)+{e}^2\left(i-q\right)v(i)\mathbf{u}(i)\kern0.1em \right],\end{array}} $$
(73)

which is composed of M decoupled difference equations:

$$ {\displaystyle \begin{array}{l}{\delta}_m\left(i+1\right)\approx \left(1-\mu {d}_m\right){\delta}_m(i)\\ {}-\mu {\mathbf{U}}_m^T\left[{e}^2(i)v\left(i-q\right)\mathbf{u}\left(i-q\right)+{e}^2\left(i-q\right)v(i)\mathbf{u}(i)\kern0.1em \right],\\ {}m=1,2,\cdots, M.\end{array}} $$
(74)

The second-order moment of δm(i + 1) can be derived from (74) as

$$ {\displaystyle \begin{array}{l}E\left\{{\delta}_m^2\left(i+1\right)\right\}\approx {\left(1-\mu {d}_m\right)}^2E\left\{{\delta}_m^2(i)\right\}\\ {}+{\mu}^2E\left\{{e}^2(i){v}^2\left(i-q\right){\mathbf{U}}_m^T{e}^2(i)\mathbf{u}\left(i-q\right){\mathbf{u}}^T\left(i-q\right){\mathbf{U}}_m\right\}\\ {}+{\mu}^2E\left\{{e}^2\left(i-q\right){v}^2(i){\mathbf{U}}_m^T{e}^2\left(i-q\right)\mathbf{u}(i){\mathbf{u}}^T(i){\mathbf{U}}_m\right\}.\end{array}} $$
(75)

From (2), (3), and (67), we obtain

$$ {\displaystyle \begin{array}{l}e(i)={\varepsilon}^T(i)\mathbf{u}(i)+v(i),\\ {}e\left(i-q\right)={\varepsilon}^T(i)\mathbf{u}\left(i-q\right)+v\left(i-q\right).\end{array}} $$
(76)

Thus, we have

$$ {\displaystyle \begin{array}{l}{e}^2(i)={\mathbf{u}}^T(i)\varepsilon (i){\varepsilon}^T(i)\mathbf{u}(i)+{v}^2(i)\approx {v}^2(i),\\ {}{e}^2\left(i-q\right)={\mathbf{u}}^T\left(i-q\right)\varepsilon (i){\varepsilon}^T(i)\mathbf{u}\left(i-q\right)+{v}^2\left(i-q\right)\approx {v}^2\left(i-q\right).\end{array}} $$
(77)

Substituting (77) into (75) yields

$$ {\displaystyle \begin{array}{l}E\left\{{\delta}_m^2\left(i+1\right)\right\}\approx {\left(1-\mu {d}_m\right)}^2E\left\{{\delta}_m^2(i)\right\}\\ {}+{\mu}^2E\left\{{e}^2(i){e}^2\left(i-q\right){\mathbf{U}}_m^T{e}^2(i)\mathbf{u}\left(i-q\right){\mathbf{u}}^T\left(i-q\right){\mathbf{U}}_m\right\}\\ {}+{\mu}^2E\left\{{e}^2\left(i-q\right){e}^2(i){\mathbf{U}}_m^T{e}^2\left(i-q\right)\mathbf{u}(i){\mathbf{u}}^T(i){\mathbf{U}}_m\right\}.\end{array}} $$
(78)

Let

$$ {r}_{uu}(q)={e}^2\left(i-q\right)\mathbf{u}(i){\mathbf{u}}^T(i)+{e}^2(i)\mathbf{u}\left(i-q\right){\mathbf{u}}^T\left(i-q\right), $$
(79)

then we have

$$ {\mathbf{R}}_{uu}(q)=E\left\{{r}_{uu}(q)\right\}. $$
(80)

Substituting (79) and (80) into (78) yields

$$ {\displaystyle \begin{array}{l}E\left\{{\delta}_m^2\left(i+1\right)\right\}\approx {\left(1-\mu {d}_m\right)}^2E\left\{{\delta}_m^2(i)\right\}\\ {}+{\mu}^2E\left\{{e}^2(i){e}^2\left(i-q\right){\mathbf{U}}_m^T\left[{r}_{uu}(q)\right]{\mathbf{U}}_m\right\}\\ {}\approx {\left(1-\mu {d}_m\right)}^2E\left\{{\delta}_m^2(i)\right\}\\ {}+{\mu}^2E\left\{{e}^2(i){e}^2\left(i-q\right)\right\}E\left\{{\mathbf{U}}_m^T\left[{r}_{uu}(q)\right]{\mathbf{U}}_m\right\}\\ {}={\left(1-\mu {d}_m\right)}^2E\left\{{\delta}_m^2(i)\right\}\\ {}+{\mu}^2E\left\{{e}^2(i){e}^2\left(i-q\right)\right\}{\mathbf{U}}_m^T{\mathbf{R}}_{uu}(q){\mathbf{U}}_m.\end{array}} $$
(81)

Note that \( {\mathbf{U}}_m^T{\mathbf{R}}_{uu}(q){\mathbf{U}}_m={d}_m \), and we have

$$ {\displaystyle \begin{array}{l}E\left\{{\delta}_m^2\left(i+1\right)\right\}\approx {\left(1-\mu {d}_m\right)}^2E\left\{{\delta}_m^2(i)\right\}\\ {}+{\mu}^2E\left\{{e}^2(i){e}^2\left(i-q\right)\right\}{d}_m.\end{array}} $$
(82)

Iterating (82), the second-order moment of δm(i) can be written as

$$ {\displaystyle \begin{array}{l}E\left\{{\delta}_m^2(i)\right\}\approx {\left(1-\mu {d}_m\right)}^{2i}E\left\{{\delta}_m^2(0)\right\}\\ {}+\sum \limits_{j=0}^i{\left(1-\mu {d}_m\right)}^{2j}{\mu}^2{d}_mE\left\{{e}^2(i){e}^2\left(i-q\right)\right\}\\ {}={\left(1-\mu {d}_m\right)}^{2i}E\left\{{\delta}_m^2(0)\right\}\\ {}+\frac{1-{\left(1-\mu {d}_m\right)}^{2i}}{\left(2-\mu {d}_m\right)\mu {d}_m}{\mu}^2{d}_mE\left\{{e}^2(i){e}^2\left(i-q\right)\right\}\\ {}={\left(1-\mu {d}_m\right)}^{2i}E\left\{{\delta}_m^2(0)\right\}\\ {}+\frac{1-{\left(1-\mu {d}_m\right)}^{2i}}{\left(2-\mu {d}_m\right)}\mu E\left\{{e}^2(i){e}^2\left(i-q\right)\right\}\\ {}\approx {\left(1-\mu {d}_m\right)}^{2i}E\left\{{\delta}_m^2(0)\right\}\\ {}+\frac{1-{\left(1-\mu {d}_m\right)}^{2i}}{\left(2-\mu {d}_m\right)}\mu E\left\{{v}^2(i){v}^2\left(i-q\right)\right\}.\end{array}} $$
(83)

Thus, the steady-state error of the MSCE algorithm is given by

$$ \underset{i\to \infty }{\lim}\sum \limits_{m=1}^ME\left\{{\delta}_m^2(i)\right\}\approx \sum \limits_{m=1}^M\frac{1}{\left(2-\mu {d}_m\right)}\mu E\left\{{v}^2(i){v}^2\left(i-q\right)\right\}. $$
(84)

When μ ≪ 1/dmax, we have

$$ \underset{i\to \infty }{\lim}\sum \limits_{m=1}^ME\left\{{\delta}_m^2(i)\right\}\approx \frac{1}{2}\mu ME\left\{{v}^2(i){v}^2\left(i-q\right)\right\}. $$
(85)

For comparison, we can write the steady-state error of the MSE algorithm as

$$ \underset{i\to \infty }{\lim}\sum \limits_{m=1}^ME\left\{{\delta}_m^2(i)\right\}\approx \frac{1}{2}\mu ME\left\{{v}^2(i)\right\}. $$
(86)

For impulsive noise, the variance E{v²(i)} may be very large, whereas the cross moment E{v²(i)v²(i − q)} may remain small. Thus, the proposed MSCE algorithm may achieve a smaller steady-state error than the MSE algorithm for impulsive noise.
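As a numerical illustration of this claim (a minimal sketch with illustrative parameters, not the paper's experiment), consider impulses that recur with period P. For a delay q that is not a multiple of P, v(i) and v(i − q) are never simultaneously large, so the cross moment that drives the MSCE floor (85) vanishes while the variance that drives the MSE floor (86) stays large:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, q = 10_000, 50, 1                            # length, impulse period, delay
v = np.zeros(N)
v[::P] = rng.normal(0.0, 10.0, size=len(v[::P]))   # large periodic impulses

var_v = np.mean(v**2)                  # E{v^2(i)}: drives the MSE floor in (86)
cross = np.mean(v[q:]**2 * v[:-q]**2)  # E{v^2(i)v^2(i-q)}: drives the MSCE floor in (85)
# var_v is on the order of 1, while cross is exactly 0 here:
# no two impulses are ever q = 1 samples apart when P = 50.
```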

Selection of delay q

After obtaining the estimate of e(i) using (33), we can estimate the MSCE for q = 1, 2, …, Q:

$$ {J}_{MSCE}\left(\mathbf{w},q\right)\approx \frac{1}{2L}\sum \limits_{i=q+1}^{L+q}\left\{{\hat{e}}^2\left(i-q\right){\hat{e}}^2(i)\right\} $$
(87)

where \( {\hat{J}}_{MSCE} \) is the estimate of JMSCE. Because the mean-square performance of the MSCE algorithm is proportional to E{v²(i)v²(i − q)} according to (85), we should select the q with the smallest \( {\hat{J}}_{MSCE} \) in (87).
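This selection rule can be sketched as follows (the function name and the synthetic residual are illustrative assumptions, not the paper's code):

```python
import numpy as np

def select_delay(e_hat, Q):
    """Pick the delay q in 1..Q minimizing the empirical MSCE estimate (87)."""
    costs = [0.5 * np.mean(e_hat[q:]**2 * e_hat[:-q]**2)
             for q in range(1, Q + 1)]
    return 1 + int(np.argmin(costs))

# Residual dominated by noise that repeats every 2 samples: for q = 1 the
# two factors are never simultaneously large, so q = 1 minimizes the cost.
e_hat = np.zeros(400)
e_hat[::2] = 3.0
best = select_delay(e_hat, Q=5)   # → 1
```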

Simulation results and discussion

In this section, the performance of the MSE, MSCE, GMSCE, LMF, MCC, GMCC, and MKRSL algorithms is evaluated by simulations. All simulation points were averaged over 100 independent runs. The performance of the adaptive solutions was measured by the steady-state mean-square deviation (MSD)

$$ MSD=\underset{i\to \infty }{\lim }E\left\{{\left\Vert {\mathbf{w}}_o-\mathbf{w}(i)\right\Vert}_2^2\right\}. $$
(88)

The performance of the closed-form solutions was measured by

$$ MSD={\left\Vert {\mathbf{w}}_o-\mathbf{w}\right\Vert}_2^2. $$
(89)

The smaller the MSD, the better the performance.
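Both figures of merit reduce to a squared Euclidean distance, usually reported in dB; a minimal helper (the function name is our own, not from the paper) might look like:

```python
import numpy as np

def msd_db(w_o, w):
    """MSD of an estimate w against the true weights w_o, in dB, per (88)-(89)."""
    return 10.0 * np.log10(np.sum((np.asarray(w_o) - np.asarray(w))**2))

# Example: a length-5 filter estimated with an error of 0.1 in every tap
# gives 10*log10(5 * 0.01) ≈ -13.0 dB.
```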

Closed-form solutions comparison

The closed-form solutions of the MSE, MSCE, GMSCE, LMF, MCC, GMCC, and MKRSL are expressed by (9), (40), (43), (51), (57), (61), and (65), respectively. The GMCC with α = 2, 4, and 6 are denoted by GMCC1, GMCC2, and GMCC3, respectively. The MKRSL with λ = 0.1 and 32 are denoted by MKRSL1 and MKRSL2, respectively.

In the experiments, we compared the MSDs of the closed-form solutions of the ten algorithms under different non-Gaussian noises. The filter order was M = 5, and the sample size was L = 3000. For SNRs ranging from −20 to 20 dB, we obtained similar performance comparisons; here the SNR was set to 6 dB.

Figures 1 and 2 show segments of the four types of sub-Gaussian and super-Gaussian noise, respectively.

Fig. 1 Four types of sub-Gaussian noises

Fig. 2 Four types of super-Gaussian noises

Figure 1a–c shows periodic noises, and Fig. 1d shows uniformly distributed noise. The kurtoses of the noises in Fig. 1a–d are −1.5, −1.0, −1.4, and −1.2, respectively.

Figure 2a, b shows periodic super-Gaussian noises, and Fig. 2c, d shows impulsive noise. The impulsive noise v(i) is generated as v(i) = b(i)G(i), where b(i) is a Bernoulli process with probability of success P{b(i) = 1} = p. G(i) in Fig. 2c is zero-mean Rayleigh noise, and G(i) in Fig. 2d is zero-mean Gaussian noise. The kurtoses of the noises in Fig. 2a–d are 3.0, 4.1, 14.4, and 7.3, respectively.
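The Bernoulli–Gaussian model of Fig. 2d can be reproduced as follows (a sketch; the parameter values p and σ are illustrative, not the ones used in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 100_000, 0.05                     # sample size, impulse probability
b = (rng.random(N) < p).astype(float)    # Bernoulli process, P{b(i)=1} = p
G = rng.normal(0.0, 5.0, N)              # zero-mean Gaussian amplitudes
v = b * G                                # impulsive noise v(i) = b(i)G(i)

# Excess kurtosis E{v^4}/E{v^2}^2 - 3 is about 3/p - 3 for this model
# (≈ 57 here), i.e. strongly super-Gaussian.
m2 = np.mean(v**2)
excess_kurtosis = np.mean(v**4) / m2**2 - 3.0
```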

The MSDs of the closed-form solutions for sub- and super-Gaussian noise are shown in Tables 1 and 2, respectively. From these two tables, we can draw three conclusions. First, none of the existing algorithms (LMF, MCC, GMCC, and MKRSL) performs better than the MSE method for sub- and super-Gaussian noise simultaneously: the MCC, GMCC1, MKRSL1, and MKRSL2 perform better (worse) than the MSE method for super-Gaussian (sub-Gaussian) noise, whereas the LMF and GMCC2 perform better (worse) than the MSE for sub-Gaussian (super-Gaussian) noise. By contrast, the proposed MSCE and GMSCE algorithms may perform better than the MSE algorithm for both sub- and super-Gaussian noise. Second, the MCC performs as well as the MKRSL, whose parameters λ and σ did not influence the MSDs of the closed-form solution. Third, the parameters λ and α have a great influence on the GMCC: when α = 2 and λ = 0.031, GMCC1 performs better than the MSE for super-Gaussian noise; when α = 4 and λ = 0.005, GMCC2 performs better than the MSE for sub-Gaussian noise.

Table 1 The MSDs (dB) of the closed-form solutions of the MSCE, GMSCE, MSE, LMF, and MCC with different sub-Gaussian noises at SNR = 6 dB (L = 3000)
Table 2 The MSDs (dB) of the closed-form solutions of the MSCE, GMSCE, MSE, LMF, and MCC with different super-Gaussian noises at SNR = 6 dB (L = 3000)

Adaptive solution for sub-Gaussian noise

In this simulation, the filter order was M = 5, the sample size was L = 10,000, and the SNR was set to 6 dB. The proposed algorithms (22) and (30) are denoted by MSCE and GMSCE, respectively.

For the sub-Gaussian noises shown in Fig. 1a, d, we compared the performance of the LMS, LMF, MCC, GMCC1-3, MKRSL1-2, MSCE, and GMSCE. The step sizes were chosen such that all the algorithms had almost the same initial convergence speed, and the other parameters (if any) of each algorithm were selected experimentally to achieve desirable performance.

The comparisons are shown in Figs. 3 and 4. From the two figures, we observe the following:

Fig. 3 Comparisons of the algorithms under the sub-Gaussian noise shown in Fig. 1a

Fig. 4 Comparisons of the algorithms under the sub-Gaussian noise shown in Fig. 1d

First, the GMCC1-3, LMF, and MSCE performed better than the LMS for sub-Gaussian noise; GMCC1 and GMCC2 performed best among the algorithms.

Second, the MKRSL1-2 and MCC performed worse than the LMS. The performance curves of MKRSL1 and the MCC almost overlapped.

Third, the performance of the adaptive solution was not always consistent with that of the closed-form solution. Table 1 shows that the closed-form solution of GMCC3 was worse than that of the MSE, yet the adaptive solution of GMCC3 was better than that of the MSE. It may be hard for an algorithm to achieve a good tradeoff between matching the initial convergence speed and reaching a desirable steady-state error.

Adaptive solution for super-Gaussian noise

In these simulations, the filter order was M = 5, the sample size was L = 10,000, and the SNR was set to 6 dB. The step sizes were chosen such that all the algorithms had almost the same initial convergence speed.

For the super-Gaussian noises shown in Fig. 2a, d, we compared the performance of the LMS, LMF, MCC, GMCC1-3, MKRSL1-2, MSCE, and GMSCE. The comparisons are shown in Figs. 5 and 6. From the two figures, we observe the following:

Fig. 5 Comparisons of the algorithms under the super-Gaussian noise shown in Fig. 2a

Fig. 6 Comparisons of the algorithms under the super-Gaussian noise shown in Fig. 2d

First, the proposed MSCE and GMSCE performed much better than the other algorithms for the periodic super-Gaussian noise shown in Fig. 2a. The MSCE performed a little better than the LMS for the impulsive noise shown in Fig. 2d.

Second, MKRSL1 and the MCC had almost the same performance; both performed a little better than the LMS.

Third, the LMF, GMCC1-3, and MKRSL2 performed worse than the LMS, even though the closed-form solutions of GMCC2 and MKRSL2 performed better than that of the MSE.

Combining the simulations in this section, we find that each algorithm has its strengths, and no single algorithm performs best for all kinds of noise. Dividing the additive noise into three types is helpful for selecting a suitable algorithm in real applications.

Conclusions

This paper proposed a new cost function, the MSCE, for adaptive filters and provided a detailed mean-value and mean-square performance analysis. We also presented a two-stage method to estimate the closed-form solution of the MSCE method, and generalized this two-stage method to estimate the closed-form solutions of information theoretic learning methods such as the LMF, MCC, GMCC, and MKRSL.

The additive noise in adaptive filtering was divided into three types: Gaussian, sub-Gaussian, and super-Gaussian. The existing algorithms do not perform better than the MSE method for sub- and super-Gaussian noise simultaneously: the MCC, GMCC1, MKRSL1, and MKRSL2 perform better (worse) than the MSE method for super-Gaussian (sub-Gaussian) noise, whereas the LMF and GMCC2 perform better (worse) than the MSE for sub-Gaussian (super-Gaussian) noise. Simulations demonstrated that the proposed MSCE and GMSCE algorithms may perform better than the MSE algorithm for both sub- and super-Gaussian noise.

In future work, the MSCE algorithm may be extended to Kalman filtering, complex-valued filtering, distributed estimation, and nonlinear filtering.

Availability of data and materials

The datasets used during the current study are available from the corresponding author on reasonable request.

Abbreviations

MSE: Mean square error
ITL: Information theoretic learning
MCC: Maximum correntropy criterion
ILSE: Improved least sum of exponentials
LMK: Least mean kurtosis
LMF: Least mean fourth
GMCC: Generalized maximum correntropy criterion
MKRSL: Minimum kernel risk-sensitive loss
SNR: Signal-to-noise ratio
CE: Cross error
BSS: Blind source separation
MSCE: Mean square cross error

References

1. X. Li, T. Adali, Complex-valued linear and widely linear filtering using MSE and Gaussian entropy. IEEE Trans. Signal Process. 60(11), 5672–5684 (2012)
2. T. Adali, P.J. Schreier, Optimization and estimation of complex valued signals: theory and applications in filtering and blind source separation. IEEE Signal Process. Mag. 31(5), 112–128 (2014)
3. S.Y. Huang, C.G. Li, Y. Liu, Complex-valued filtering based on the minimization of complex-error entropy. IEEE Trans. Signal Process. 24(5), 695–708 (2013)
4. A.H. Sayed, Adaptive networks. Proc. IEEE 102, 460–497 (2014)
5. J. Chen, A.H. Sayed, Diffusion adaptation strategies for distributed optimization and learning over networks. IEEE Trans. Signal Process. 60(8), 4289–4305 (2011)
6. W. Liu, P.P. Pokharel, J.C. Principe, Correntropy: properties and applications in non-Gaussian signal processing. IEEE Trans. Signal Process. 55(11), 5286–5298 (2007)
7. A. Singh, J.C. Principe, Using correntropy as a cost function in linear adaptive filters, in Proceedings of the International Joint Conference on Neural Networks 2009 (IEEE, Atlanta, 2009), pp. 2950–2955. https://doi.org/10.1109/IJCNN.2009.5178823
8. R. He, W.S. Zheng, B.G. Hu, Maximum correntropy criterion for robust face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1561–1576 (2011)
9. A.I. Fontes, A.M. de Martins, L.F. Silveira, J. Principe, Performance evaluation of the correntropy coefficient in automatic modulation classification. Expert Syst. Appl. 42(1), 1–8 (2015)
10. W. Ma, B. Chen, J. Duan, H. Zhao, Diffusion maximum correntropy criterion algorithms for robust distributed estimation. Digit. Signal Process. 58, 10–19 (2016)
11. J.P.F. Guimaraes, A.I.R. Fontes, J.B.A. Rego, A.M. de Martins, J.C. Principe, Complex correntropy: probabilistic interpretation and application to complex-valued data. IEEE Signal Process. Lett. 24(1), 42–25 (2017)
12. S. Wang, Y. Zheng, S. Duan, L. Wang, C.K. Tse, A class of improved least sum of exponentials algorithms. Signal Process. 128, 340–349 (2016)
13. N.J. Bershad, J.C.M. Bermudez, Stochastic analysis of the least mean kurtosis algorithm for Gaussian inputs. Digit. Signal Process. 54, 35–45 (2016)
14. E. Walach, B. Widrow, The least mean fourth (LMF) adaptive algorithm and its family. IEEE Trans. Inf. Theory 30, 275–283 (1984)
15. P.I. Hubscher, J.C.M. Bermudez, An improved statistical analysis of the least mean fourth (LMF) adaptive algorithm. IEEE Trans. Signal Process. 51(3), 664–671 (2003)
16. E. Eweda, Global stabilization of the least mean fourth algorithm. IEEE Trans. Signal Process. 60(3), 1473–1477 (2012)
17. E. Eweda, Mean-square stability analysis of a normalized least mean fourth algorithm for a Markov plant. IEEE Trans. Signal Process. 62(24), 6545–6553 (2014)
18. E. Eweda, Dependence of the stability of the least mean fourth algorithm on target weights non-stationarity. IEEE Trans. Signal Process. 62(7), 1634–1643 (2014)
19. W. Wang, H. Zhao, Performance analysis of diffusion least mean fourth algorithm over network. Signal Process. 141, 32–47 (2017)
20. B. Chen, L. Xing, H. Zhao, N. Zheng, J.C. Príncipe, Generalized correntropy for robust adaptive filtering. IEEE Trans. Signal Process. 64(13), 3376–3387 (2016)
21. B. Chen, R. Wang, Risk-sensitive loss in kernel space for robust adaptive filtering, in Proc. 2015 IEEE Int. Conf. Digit. Signal Process. (2015), pp. 921–925
22. B. Chen, L. Xing, B. Xu, H. Zhao, N. Zheng, J.C. Principe, Kernel risk-sensitive loss: definition, properties and application to robust adaptive filtering. IEEE Trans. Signal Process. 65(11), 2888–2901 (2017)
23. A. Cichocki, S. Amari, Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications (Wiley, New York, 2002)
24. A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis (Wiley, New York, 2001)
25. G. Wang, Y. Zhang, B. He, K.T. Chong, A framework of target detection in hyperspectral imagery based on blind source extraction. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 9(2), 835–844 (2016)
26. G. Wang, C. Li, L. Dong, Noise estimation using mean square cross prediction error for speech enhancement. IEEE Trans. Circuits Syst. I Reg. Pap. 57, 1489–1499 (2010)
27. G. Wang, N. Rao, S. Shepherd, C. Beggs, Extraction of desired signal based on AR model with its application to atrial activity estimation in atrial fibrillation. EURASIP J. Adv. Signal Process. 9, 728409 (2008)
28. A. Hyvärinen, Blind source separation by nonstationarity of variance: a cumulant-based approach. IEEE Trans. Neural Netw. 12(6), 1471–1474 (2001)
29. A. Hyvärinen, Sparse code shrinkage: denoising of nongaussian data by maximum likelihood estimation. Neural Comput. 11(7), 1739–1768 (1999)
30. A. Hyvärinen, E. Oja, Fast and robust fixed-point algorithm for independent component analysis. IEEE Trans. Neural Netw. 10(3), 626–634 (1999)

Acknowledgements

Thanks to the anonymous reviewers and editors for their hard work.

Funding

This work was supported by the National Key Research and Development Program of China (Project No. 2017YFB0503400), National Natural Science Foundation of China under Grants 61371182 and 41301459.

Author information


Contributions

Rui Xue and Gang Wang proposed the original idea of the full text. Rui Xue designed the experiment. Yunxiang Zhang, Yuyang Zhao and Gang Wang performed the experiment and analyzed the results. Yunxiang Zhang and Yuyang Zhao drafted the manuscript. Rui Xue and Gang Wang wrote the manuscript. All authors read and approved this submission.

Corresponding author

Correspondence to Rui Xue.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors agree to publish the submitted paper in this journal.

Competing interests

The authors declare that they have no competing interests.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Zhang, Y., Zhao, Y., Wang, G. et al. Mean square cross error: performance analysis and applications in non-Gaussian signal processing. EURASIP J. Adv. Signal Process. 2021, 24 (2021). https://doi.org/10.1186/s13634-021-00733-7


Keywords

  • Adaptive filter
  • Mean square error (MSE)
  • Maximum correntropy criterion (MCC)
  • Least mean fourth (LMF)
  • Mean square cross error (MSCE)