Mean square cross error: performance analysis and applications in non-Gaussian signal processing

Most cost functions of adaptive filtering algorithms include the square error, which depends on the current error signal. When the additive noise is impulsive, we can expect the square error to be very large. By contrast, the cross error, which is the correlation of the error signal and its delayed version, may be very small. Based on this fact, we propose a new cost function, called the mean square cross error, for adaptive filters, and provide a detailed analysis of its mean value and mean square performance. Furthermore, we present a two-stage method to estimate the closed-form solution of the proposed method, and generalize this two-stage method to estimate the closed-form solutions of information theoretic learning methods, including least mean fourth, maximum correntropy criterion, generalized maximum correntropy criterion, and minimum kernel risk-sensitive loss. Simulations of the adaptive solutions and closed-form solutions show the effectiveness of the new method.

The LMK and LMF are robust to sub-Gaussian noise; one typical sub-Gaussian distribution is the uniform distribution. The MCC and ILSE are robust to large outliers and impulsive noise, which, compared with Gaussian noise, more often takes values that are either very close to zero or very large. This means that impulsive noise has a super-Gaussian distribution [23,24].
Altogether, the distribution of the additive noise in linear filtering can be divided into three types: Gaussian, super-Gaussian, and sub-Gaussian. Super-Gaussian noise and sub-Gaussian noise are both non-Gaussian.
From the viewpoint of performance, for example the steady-state error, the MSE, MCC, and LMF work well for Gaussian, super-Gaussian, and sub-Gaussian noise, respectively. The MSE demonstrates similar performance for the three types of noise under the same signal-to-noise ratio (SNR). For Gaussian noise, all of the algorithms have similar steady-state errors. For super-Gaussian noise, the steady-state error comparison under the same SNR is MCC < MSE < LMF. For sub-Gaussian noise, the comparison is LMF < MSE < MCC.
Note that the cost functions of the above algorithms all include the square error, which is the correlation of the error signal with itself. When impulsive noise is involved, we can expect the square error to be very large. By contrast, the cross error (CE), which is the correlation of the error signal and its delayed version, may be very small for impulsive noise.
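A quick numerical illustration of this point (a hypothetical setup of our own, not from the paper): with an impulse train that never places two large impulses q samples apart, the square error spikes while the cross error stays moderate.

```python
import numpy as np

# Hypothetical error sequence: small Gaussian background plus
# large impulses every 50 samples (our own illustrative numbers).
rng = np.random.default_rng(0)
L, q = 5000, 1
e = 0.1 * rng.standard_normal(L)
e[::50] += 10.0

square_error = e[q:] ** 2          # e^2(i): spikes to ~100 at impulses
cross_error = e[q:] * e[:-q]       # e(i)e(i-q): impulses never pair up
print(np.max(np.abs(square_error)))   # on the order of 1e2
print(np.max(np.abs(cross_error)))    # on the order of 1e0
```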
In our early work [25-27], we proposed the mean square cross prediction error to extract the desired signal in blind source separation (BSS), where the square cross prediction error was much smaller than the square prediction error. In this paper, we propose a new cost function, called the mean square CE (MSCE), for adaptive filtering to process non-Gaussian noise. We expect the proposed MSCE algorithm to perform well for non-Gaussian noise.
Note that information theoretic learning (ITL) methods capture higher-order statistics of the data; thus, it is hard to obtain their closed-form solutions directly. We present a two-stage method to estimate the closed-form solutions of the LMF, MCC, GMCC, MKRSL, and MSCE.
The contributions of this paper are summarized as follows: i) We present a new cost function, that is, the MSCE, for adaptive filters, and provide a detailed analysis of its mean value and mean square performance. ii) We propose a two-stage method to estimate the closed-form solution of the proposed MSCE algorithm. iii) We generalize the two-stage method to estimate the closed-form solutions of the LMF, MCC, GMCC, and MKRSL algorithms.
The paper is organized as follows: In Section 2, the problem statement is explained in detail. In Section 3, the MSCE algorithm is presented with its adaptive algorithm and closed-form solution. In Section 4, the closed-form solutions of the LMF, MCC, GMCC, and MKRSL are estimated. In Section 5, the mean behavior and mean square behavior of the MSCE are analyzed. Simulations are provided in Section 6. Lastly, a conclusion is provided in Section 7.

Problem formulation
The absolute value of the normalized kurtosis may be considered a measure of the non-Gaussianity of the error signal. Several definitions for a zero-mean random variable are presented as follows:

Definition 1 (normalized kurtosis)
The normalized kurtosis of a zero-mean random variable x is defined as
$$\kappa(x) = \frac{E\{x^4\}}{\left(E\{x^2\}\right)^2} - 3.$$

Definition 2 (leptokurtic)
A positive-kurtosis distribution is called leptokurtic, or super-Gaussian (e.g., Laplacian).

Definition 3 (platykurtic)
A negative-kurtosis distribution is called platykurtic, or sub-Gaussian (e.g., uniform).

Definition 4 (mesokurtic)
A zero-kurtosis distribution is called mesokurtic (e.g., Gaussian).

When the linear filtering problem is considered, there is an input vector u ∈ ℝ^M, an unknown parameter w_o ∈ ℝ^M, and a desired response d ∈ ℝ. Data d(i) are observed at each time point i through the linear regression model
$$d(i) = \mathbf{u}^T(i)\mathbf{w}_o + v(i), \quad i = 1, 2, \ldots, L,$$
where v is zero-mean background noise with variance σ_v² and L is the length of the sequence. The error signal for the linear filter is defined as
$$e(i) = d(i) - \mathbf{u}^T(i)\mathbf{w},$$
where w is the estimate of w_o. The distribution of the additive noise in linear filtering can be divided into three types: Gaussian, super-Gaussian, and sub-Gaussian. Super-Gaussian noise and sub-Gaussian noise are both non-Gaussian.
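A quick sample-based check of these definitions (our own illustration):

```python
import numpy as np

# Estimate the normalized kurtosis kappa(x) = E{x^4}/(E{x^2})^2 - 3
# from samples, for the three noise classes discussed above.
def normalized_kurtosis(x):
    x = x - np.mean(x)                 # enforce zero mean empirically
    return np.mean(x**4) / np.mean(x**2)**2 - 3.0

rng = np.random.default_rng(0)
L = 100_000
print(normalized_kurtosis(rng.uniform(-1, 1, L)))   # sub-Gaussian: ~ -1.2
print(normalized_kurtosis(rng.standard_normal(L)))  # Gaussian:     ~  0
print(normalized_kurtosis(rng.laplace(size=L)))     # super-Gaussian: ~ 3
```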
In this research, we made the following assumptions:

A1) The additive noise is white, that is, $E\{v(i)v(j)\} = \sigma_v^2\delta_{ij}$.
A2) Inputs at different time moments (i, j), i ≠ j, are uncorrelated: $E\{\mathbf{u}(i)\mathbf{u}^T(j)\} = \mathbf{0}$.
A3) The inputs and the additive noise at different time moments (i, j) are uncorrelated: $E\{\mathbf{u}(i)v(j)\} = \mathbf{0}$.

The linear filtering algorithms of the MSE, MCC, and LMF are as follows. The cost function based on the MSE is given by
$$J_{\mathrm{MSE}} = E\{e^2(i)\},$$
where E denotes the expectation operator. The gradient is defined as
$$\frac{\partial J_{\mathrm{MSE}}}{\partial \mathbf{w}} = -2E\{e(i)\mathbf{u}(i)\}.$$
At the stationary point, E{eu} = 0. The closed-form solution, denoted by w_MSE, is given by the Wiener-Hopf equation:
$$\mathbf{w}_{\mathrm{MSE}} = \mathbf{R}_{uu}^{-1}\mathbf{R}_{du}, \quad \mathbf{R}_{uu} = E\{\mathbf{u}(i)\mathbf{u}^T(i)\}, \quad \mathbf{R}_{du} = E\{d(i)\mathbf{u}(i)\}.$$
The corresponding stochastic gradient descent, or LMS, algorithm is
$$\mathbf{w}(i+1) = \mathbf{w}(i) + \mu e(i)\mathbf{u}(i),$$
where μ denotes the step size and μ > 0. The cost function based on the LMF is given by
$$J_{\mathrm{LMF}} = E\{e^4(i)\}.$$
The corresponding stochastic gradient descent algorithm is
$$\mathbf{w}(i+1) = \mathbf{w}(i) + \mu e^3(i)\mathbf{u}(i).$$
The cost function based on the correntropy of the error, also called the MCC, is given by
$$J_{\mathrm{MCC}} = E\left\{\exp\left(-\frac{e^2(i)}{2\sigma^2}\right)\right\},$$
where σ denotes the kernel bandwidth. The corresponding stochastic gradient descent algorithm is
$$\mathbf{w}(i+1) = \mathbf{w}(i) + \mu \exp\left(-\frac{e^2(i)}{2\sigma^2}\right)e(i)\mathbf{u}(i).$$
The cost function based on the GMCC is given by
$$J_{\mathrm{GMCC}} = E\{\gamma_{\alpha,\beta}\exp(-\lambda|e(i)|^\alpha)\},$$
where λ = 1/β^α is the kernel parameter and γ_{α,β} is a normalization constant. The corresponding stochastic gradient descent algorithm is
$$\mathbf{w}(i+1) = \mathbf{w}(i) + \mu \exp(-\lambda|e(i)|^\alpha)|e(i)|^{\alpha-1}\operatorname{sign}(e(i))\mathbf{u}(i).$$
The cost function based on the MKRSL is given by
$$J_{\mathrm{MKRSL}} = \frac{1}{\lambda}E\{\exp(\lambda(1-\kappa_\sigma(e(i))))\}, \quad \kappa_\sigma(e) = \exp\left(-\frac{e^2}{2\sigma^2}\right).$$
The corresponding stochastic gradient descent algorithm is
$$\mathbf{w}(i+1) = \mathbf{w}(i) + \mu \exp(\lambda(1-\kappa_\sigma(e(i))))\kappa_\sigma(e(i))e(i)\mathbf{u}(i).$$
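For concreteness, a minimal sketch of these stochastic gradient updates (our own illustration; the function names and parameter values are assumptions, and constant factors are absorbed into the step size μ):

```python
import numpy as np

# Generic SGD filter: w(i+1) = w(i) + mu * f(e(i)) * e(i) * u(i),
# where f(e) selects the algorithm (constants absorbed into mu).
def run_filter(u, d, mu, f, M):
    w = np.zeros(M)
    for i in range(len(d)):
        e = d[i] - u[i] @ w
        w = w + mu * f(e) * e * u[i]
    return w

lms_f = lambda e: 1.0                                # MSE / LMS
lmf_f = lambda e: e**2                               # LMF: mu * e^3 * u
mcc_f = lambda e, s=2.0: np.exp(-e**2 / (2 * s**2))  # MCC kernel weight
```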

Methods

Adaptive algorithm of the MSCE
The CE can be expressed as e(i)e(i − q), where q denotes the error delay. Because the CE may be negative, we provide a new cost function, that is, the MSCE, as
$$J_{\mathrm{MSCE}} = E\{e^2(i)e^2(i-q)\}.$$
The gradient of the MSCE can be derived as
$$\frac{\partial J_{\mathrm{MSCE}}}{\partial \mathbf{w}} = -2E\{e(i)e^2(i-q)\mathbf{u}(i) + e^2(i)e(i-q)\mathbf{u}(i-q)\}.$$
Then, the corresponding stochastic gradient descent algorithm is
$$\mathbf{w}(i+1) = \mathbf{w}(i) + \mu\left[e(i)e^2(i-q)\mathbf{u}(i) + e^2(i)e(i-q)\mathbf{u}(i-q)\right].$$
Equation (19) may not be robust against outliers. We provide the generalized MSCE (GMSCE) as
$$J_{\mathrm{GMSCE}} = E\{G(e(i))G(e(i-q))\},$$
where G(x) is an x²-like function. The stochastic gradient descent algorithm for the GMSCE is
$$\mathbf{w}(i+1) = \mathbf{w}(i) + \mu\left[g(e(i))G(e(i-q))\mathbf{u}(i) + G(e(i))g(e(i-q))\mathbf{u}(i-q)\right],$$
where g(·) is the derivative of G(·).
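A minimal sketch of the MSCE update above (the ring-buffer handling and names are ours):

```python
import numpy as np

# MSCE stochastic gradient descent for J_MSCE = E{e^2(i) e^2(i-q)}.
def msce_filter(u, d, mu, q, M):
    w = np.zeros(M)
    e_hist = np.zeros(q + 1)               # e_hist[k] holds e(i-k)
    for i in range(len(d)):
        e = d[i] - u[i] @ w
        e_hist = np.roll(e_hist, 1)
        e_hist[0] = e
        if i >= q:
            e_q = e_hist[q]                # e(i-q)
            # w(i+1) = w(i) + mu*[e(i)e^2(i-q)u(i) + e^2(i)e(i-q)u(i-q)]
            w = w + mu * (e * e_q**2 * u[i] + e**2 * e_q * u[i - q])
    return w
```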

A suitable criterion for cost function
Combining the references on ICA [23, 24, 28-30] with those on the MCC [6-11], ILSE [12], and LMF [14-19], we determined that there are three cost functions in the FastICA [30] algorithm:
$$G_1(u) = \frac{1}{\alpha_1}\log\cosh(\alpha_1 u), \quad G_2(u) = -\frac{1}{\alpha_2}\exp\left(-\frac{\alpha_2 u^2}{2}\right), \quad G_3(u) = \frac{1}{4}u^4.$$
G_2(u) is used to separate the super-Gaussian source in ICA, and works as the cost function of the MCC. G_3(u) is used to separate the sub-Gaussian source in ICA when there are no outliers, and works as the cost function of the LMF. G_1(u) has not been used in adaptive filtering, but cosh(α_1 u) works as the cost function of the ILSE.
This motivated us to explore ICA or BSS algorithms to determine a suitable criterion for adaptive filtering. Here, we use G_1(u) in the proposed GMSCE algorithm:
$$G(x) = \frac{1}{\alpha_1}\log\cosh(\alpha_1 x),$$
$$g(x) = \tanh(\alpha_1 x).$$
Substituting (28)-(29) into (24) with α_1 = 1, we obtain
$$\mathbf{w}(i+1) = \mathbf{w}(i) + \mu\left[\tanh(e(i))\log\cosh(e(i-q))\mathbf{u}(i) + \log\cosh(e(i))\tanh(e(i-q))\mathbf{u}(i-q)\right].$$
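A minimal sketch of this GMSCE update with a numerically stable log cosh (the helper and names are ours; the paper does not specify an implementation):

```python
import numpy as np

def G(x):
    # log cosh(x) computed stably for large |x|
    ax = np.abs(x)
    return ax + np.log1p(np.exp(-2.0 * ax)) - np.log(2.0)

g = np.tanh    # derivative of log cosh

def gmsce_step(w, u_i, u_iq, e, e_q, mu):
    # w(i+1) = w(i) + mu*[g(e(i))G(e(i-q))u(i) + G(e(i))g(e(i-q))u(i-q)]
    return w + mu * (g(e) * G(e_q) * u_i + G(e) * g(e_q) * u_iq)
```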

Closed-form solution of the MSCE and GMSCE
We can estimate the closed-form solution of the MSCE from the stationary point ∂J_MSCE/∂w = 0:
$$E\{e(i)e^2(i-q)\mathbf{u}(i) + e^2(i)e(i-q)\mathbf{u}(i-q)\} = \mathbf{0}.$$
Substituting (20) into (31), we obtain
$$E\{[d(i) - \mathbf{u}^T(i)\mathbf{w}]e^2(i-q)\mathbf{u}(i) + e^2(i)[d(i-q) - \mathbf{u}^T(i-q)\mathbf{w}]\mathbf{u}(i-q)\} = \mathbf{0}.$$
It is difficult to solve for w from (32) because e²(i) contains the second-order term of w. We present a two-stage method to estimate w.
In the first stage, we estimate e(i) and e(i − q) from (9):
$$\hat{e}(i) = d(i) - \mathbf{u}^T(i)\hat{\mathbf{w}}_{\mathrm{MSE}}, \quad \hat{e}(i-q) = d(i-q) - \mathbf{u}^T(i-q)\hat{\mathbf{w}}_{\mathrm{MSE}},$$
where ê(i) and ê(i − q) are the estimates of e(i) and e(i − q), respectively. In the second stage, we estimate w from (32). If we define
$$F_1(\mathbf{w}) = E\{\hat{e}^2(i-q)e(i)\mathbf{u}(i) + \hat{e}^2(i)e(i-q)\mathbf{u}(i-q)\},$$
then we can rewrite (33) as
$$F_1(\mathbf{w}) = \mathbf{0}.$$
Note that the expectation can be estimated by averaging over the samples using
$$E\{x(i)\} \approx \frac{1}{L-q}\sum_{i=q+1}^{L} x(i).$$
Equation (34) can be estimated by
$$\frac{1}{L-q}\sum_{i=q+1}^{L}\left\{\hat{e}^2(i-q)[d(i) - \mathbf{u}^T(i)\mathbf{w}]\mathbf{u}(i) + \hat{e}^2(i)[d(i-q) - \mathbf{u}^T(i-q)\mathbf{w}]\mathbf{u}(i-q)\right\} = \mathbf{0}.$$
If we define R_du(q) and R_uu(q) as
$$\mathbf{R}_{du}(q) = E\{\hat{e}^2(i-q)d(i)\mathbf{u}(i) + \hat{e}^2(i)d(i-q)\mathbf{u}(i-q)\},$$
$$\mathbf{R}_{uu}(q) = E\{\hat{e}^2(i-q)\mathbf{u}(i)\mathbf{u}^T(i) + \hat{e}^2(i)\mathbf{u}(i-q)\mathbf{u}^T(i-q)\},$$
then we can estimate them as
$$\hat{\mathbf{R}}_{du}(q) = \frac{1}{L-q}\sum_{i=q+1}^{L}\left[\hat{e}^2(i-q)d(i)\mathbf{u}(i) + \hat{e}^2(i)d(i-q)\mathbf{u}(i-q)\right],$$
$$\hat{\mathbf{R}}_{uu}(q) = \frac{1}{L-q}\sum_{i=q+1}^{L}\left[\hat{e}^2(i-q)\mathbf{u}(i)\mathbf{u}^T(i) + \hat{e}^2(i)\mathbf{u}(i-q)\mathbf{u}^T(i-q)\right],$$
where R̂_du(q) and R̂_uu(q) are the estimates of R_du(q) and R_uu(q), respectively. We can estimate the closed-form solution of the MSCE as
$$\hat{\mathbf{w}}_{\mathrm{MSCE}} = \hat{\mathbf{R}}_{uu}^{-1}(q)\hat{\mathbf{R}}_{du}(q).$$
Note that tanh(x) ≈ x when x is small. We can estimate the closed-form solution of the GMSCE in the same way.
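Putting the two stages together, a compact sketch of the MSCE closed-form estimate (assuming batch data U ∈ ℝ^{L×M} and d ∈ ℝ^L; the names are ours):

```python
import numpy as np

def msce_closed_form(U, d, q):
    # Stage 1: residuals from the Wiener-Hopf (MSE) solution.
    w_mse = np.linalg.solve(U.T @ U, U.T @ d)
    e_hat = d - U @ w_mse
    # Stage 2: weighted normal equations R_uu(q) w = R_du(q).
    Ui, Uq = U[q:], U[:-q]                 # u(i) and u(i-q)
    di, dq = d[q:], d[:-q]
    wi = e_hat[:-q] ** 2                   # e_hat^2(i-q)
    wq = e_hat[q:] ** 2                    # e_hat^2(i)
    R_uu = (Ui * wi[:, None]).T @ Ui + (Uq * wq[:, None]).T @ Uq
    R_du = Ui.T @ (wi * di) + Uq.T @ (wq * dq)
    return np.linalg.solve(R_uu, R_du)
```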
If we define R_Gdu(q) and R_Guu(q) as
$$\mathbf{R}_{Gdu}(q) = E\{G(\hat{e}(i-q))d(i)\mathbf{u}(i) + G(\hat{e}(i))d(i-q)\mathbf{u}(i-q)\},$$
$$\mathbf{R}_{Guu}(q) = E\{G(\hat{e}(i-q))\mathbf{u}(i)\mathbf{u}^T(i) + G(\hat{e}(i))\mathbf{u}(i-q)\mathbf{u}^T(i-q)\},$$
then we can estimate them as
$$\hat{\mathbf{R}}_{Gdu}(q) = \frac{1}{L-q}\sum_{i=q+1}^{L}\left[G(\hat{e}(i-q))d(i)\mathbf{u}(i) + G(\hat{e}(i))d(i-q)\mathbf{u}(i-q)\right],$$
$$\hat{\mathbf{R}}_{Guu}(q) = \frac{1}{L-q}\sum_{i=q+1}^{L}\left[G(\hat{e}(i-q))\mathbf{u}(i)\mathbf{u}^T(i) + G(\hat{e}(i))\mathbf{u}(i-q)\mathbf{u}^T(i-q)\right].$$
We can estimate the closed-form solution of the GMSCE as
$$\hat{\mathbf{w}}_{\mathrm{GMSCE}} = \hat{\mathbf{R}}_{Guu}^{-1}(q)\hat{\mathbf{R}}_{Gdu}(q).$$

Closed-form solution of the LMF, MCC, GMCC, and MKRSL

Based on the two-stage method, we can also estimate the closed-form solutions of the LMF, MCC, GMCC, and MKRSL algorithms as follows.

Closed-form solution of the LMF
We can estimate the closed-form solution of the LMF from the stationary point ∂J_LMF/∂w = 0:
$$E\{e^3(i)\mathbf{u}(i)\} = \mathbf{0}.$$
In the first stage, we estimate e(i) from (9):
$$\hat{e}(i) = d(i) - \mathbf{u}^T(i)\hat{\mathbf{w}}_{\mathrm{MSE}}.$$
In the second stage, we estimate w from (44). If we define
$$F_2(\mathbf{w}) = E\{e^3(i)\mathbf{u}(i)\},$$
then we can rewrite (44) as
$$\hat{F}_2(\mathbf{w}) = E\{\hat{e}^2(i)[d(i) - \mathbf{u}^T(i)\mathbf{w}]\mathbf{u}(i)\} = \mathbf{0},$$
where F̂_2(w) is the estimate of F_2(w).
With the help of (35), Eq. (47) can be estimated by
$$\frac{1}{L}\sum_{i=1}^{L}\hat{e}^2(i)[d(i) - \mathbf{u}^T(i)\mathbf{w}]\mathbf{u}(i) = \mathbf{0}.$$
If we define R_du2 and R_uu2 as
$$\mathbf{R}_{du2} = E\{\hat{e}^2(i)d(i)\mathbf{u}(i)\}, \quad \mathbf{R}_{uu2} = E\{\hat{e}^2(i)\mathbf{u}(i)\mathbf{u}^T(i)\},$$
then we can estimate them as
$$\hat{\mathbf{R}}_{du2} = \frac{1}{L}\sum_{i=1}^{L}\hat{e}^2(i)d(i)\mathbf{u}(i), \quad \hat{\mathbf{R}}_{uu2} = \frac{1}{L}\sum_{i=1}^{L}\hat{e}^2(i)\mathbf{u}(i)\mathbf{u}^T(i),$$
where R̂_du2 and R̂_uu2 are the estimates of R_du2 and R_uu2, respectively. We can estimate the closed-form solution of the LMF as
$$\hat{\mathbf{w}}_{\mathrm{LMF}} = \hat{\mathbf{R}}_{uu2}^{-1}\hat{\mathbf{R}}_{du2}.$$
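A minimal sketch of this two-stage LMF estimate (names are ours); it reduces to weighted normal equations with weights ê²(i):

```python
import numpy as np

def lmf_closed_form(U, d):
    w_mse = np.linalg.solve(U.T @ U, U.T @ d)   # stage 1: Wiener residuals
    wts = (d - U @ w_mse) ** 2                  # e_hat^2(i)
    R_uu2 = (U * wts[:, None]).T @ U            # ~ E{e_hat^2 u u^T}
    R_du2 = U.T @ (wts * d)                     # ~ E{e_hat^2 d u}
    return np.linalg.solve(R_uu2, R_du2)        # stage 2
```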

Closed-form solution of the MCC
We can estimate the closed-form solution of the MCC from the stationary point ∂J_MCC/∂w = 0:
$$E\left\{\exp\left(-\frac{e^2(i)}{2\sigma^2}\right)e(i)\mathbf{u}(i)\right\} = \mathbf{0}.$$
In the first stage, we estimate e(i) from (45). In the second stage, we estimate w from (52).
If we define
$$F_3(\mathbf{w}) = E\left\{\exp\left(-\frac{e^2(i)}{2\sigma^2}\right)e(i)\mathbf{u}(i)\right\},$$
then we can rewrite (52) as
$$\hat{F}_3(\mathbf{w}) = E\left\{\exp\left(-\frac{\hat{e}^2(i)}{2\sigma^2}\right)[d(i) - \mathbf{u}^T(i)\mathbf{w}]\mathbf{u}(i)\right\} = \mathbf{0},$$
where F̂_3(w) is the estimate of F_3(w). With the help of (35), Eq. (54) can be estimated by
$$\frac{1}{L}\sum_{i=1}^{L}\exp\left(-\frac{\hat{e}^2(i)}{2\sigma^2}\right)[d(i) - \mathbf{u}^T(i)\mathbf{w}]\mathbf{u}(i) = \mathbf{0}.$$
Denote
$$\hat{\mathbf{R}}_{du3} = \frac{1}{L}\sum_{i=1}^{L}\exp\left(-\frac{\hat{e}^2(i)}{2\sigma^2}\right)d(i)\mathbf{u}(i), \quad \hat{\mathbf{R}}_{uu3} = \frac{1}{L}\sum_{i=1}^{L}\exp\left(-\frac{\hat{e}^2(i)}{2\sigma^2}\right)\mathbf{u}(i)\mathbf{u}^T(i).$$
We can estimate the closed-form solution of the MCC as
$$\hat{\mathbf{w}}_{\mathrm{MCC}} = \hat{\mathbf{R}}_{uu3}^{-1}\hat{\mathbf{R}}_{du3}.$$

Closed-form solution of the GMCC
The closed-form solution of the GMCC is given by [20]:
$$\mathbf{w}_{\mathrm{GMCC}} = \left[E\{h(e(i))\mathbf{u}(i)\mathbf{u}^T(i)\}\right]^{-1}E\{h(e(i))d(i)\mathbf{u}(i)\}, \quad h(e) = \exp(-\lambda|e|^\alpha)|e|^{\alpha-2}.$$
In the first stage, we estimate e(i) from (45), and we have
$$\hat{h}(e(i)) = \exp(-\lambda|\hat{e}(i)|^\alpha)|\hat{e}(i)|^{\alpha-2}.$$
In the second stage, we estimate w from (59). Denote
$$\hat{\mathbf{R}}_{du4} = \frac{1}{L}\sum_{i=1}^{L}\hat{h}(e(i))d(i)\mathbf{u}(i), \quad \hat{\mathbf{R}}_{uu4} = \frac{1}{L}\sum_{i=1}^{L}\hat{h}(e(i))\mathbf{u}(i)\mathbf{u}^T(i).$$
We can estimate the closed-form solution of the GMCC as
$$\hat{\mathbf{w}}_{\mathrm{GMCC}} = \hat{\mathbf{R}}_{uu4}^{-1}\hat{\mathbf{R}}_{du4}.$$

Closed-form solution of the MKRSL
The closed-form solution of the MKRSL is given by [22]:
$$\mathbf{w}_{\mathrm{MKRSL}} = \left[E\{\varphi(e(i))\mathbf{u}(i)\mathbf{u}^T(i)\}\right]^{-1}E\{\varphi(e(i))d(i)\mathbf{u}(i)\}, \quad \varphi(e) = \exp(\lambda(1-\kappa_\sigma(e)))\kappa_\sigma(e).$$
In the first stage, we estimate e(i) from (45), and we have
$$\hat{\varphi}(e(i)) = \exp(\lambda(1-\kappa_\sigma(\hat{e}(i))))\kappa_\sigma(\hat{e}(i)).$$
In the second stage, we estimate w from (63). Denote
$$\hat{\mathbf{R}}_{du5} = \frac{1}{L}\sum_{i=1}^{L}\hat{\varphi}(e(i))d(i)\mathbf{u}(i), \quad \hat{\mathbf{R}}_{uu5} = \frac{1}{L}\sum_{i=1}^{L}\hat{\varphi}(e(i))\mathbf{u}(i)\mathbf{u}^T(i).$$
We can estimate the closed-form solution of the MKRSL as
$$\hat{\mathbf{w}}_{\mathrm{MKRSL}} = \hat{\mathbf{R}}_{uu5}^{-1}\hat{\mathbf{R}}_{du5}.$$
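The MCC, GMCC, and MKRSL closed forms above share the same weighted-normal-equations structure as the LMF; only the per-sample weight changes. A sketch of the three weight functions and the common solver (our reading of the equations; names are ours):

```python
import numpy as np

def mcc_w(e, sigma):
    return np.exp(-e**2 / (2 * sigma**2))

def gmcc_w(e, lam, alpha):
    # exp(-lam*|e|^alpha) * |e|^(alpha-2); well defined for alpha >= 2
    return np.exp(-lam * np.abs(e)**alpha) * np.abs(e)**(alpha - 2)

def mkrsl_w(e, lam, sigma):
    k = np.exp(-e**2 / (2 * sigma**2))          # kappa_sigma(e)
    return np.exp(lam * (1.0 - k)) * k

def weighted_closed_form(U, d, wts):
    R_uu = (U * wts[:, None]).T @ U
    R_du = U.T @ (wts * d)
    return np.linalg.solve(R_uu, R_du)
```

For example, with the stage-1 residuals e_hat, the MCC estimate is weighted_closed_form(U, d, mcc_w(e_hat, sigma)).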

Performance analysis of MSCE

Mean value behavior
To compare the performance of the MSE and MSCE, we define the total weight error as
$$\boldsymbol{\varepsilon}(i) = \mathbf{w}(i) - \mathbf{w}_o.$$
Substituting (22) into this definition and taking the expectation, we obtain
$$E\{\boldsymbol{\varepsilon}(i+1)\} = \left[\mathbf{I} - \mu\mathbf{R}_{uu}(q)\right]E\{\boldsymbol{\varepsilon}(i)\},$$
where I is the identity matrix. Substituting (39) into (67), we obtain the same recursion with R_uu(q) replaced by its sample estimate. Note that R_uu(q) is positive definite; thus, (69) is stable for a sufficiently small step size μ.
The eigenvalue decomposition of R_uu(q) is R_uu(q) = UΛU^T. Then, we have the first-order moment of ε(i):
$$E\{\boldsymbol{\varepsilon}(i)\} = \mathbf{U}(\mathbf{I} - \mu\boldsymbol{\Lambda})^i\mathbf{U}^T E\{\boldsymbol{\varepsilon}(0)\}.$$
Let d_max denote the maximum eigenvalue of R_uu(q). The step size should be selected as
$$0 < \mu < \frac{2}{d_{\max}}$$
so that the iterations will converge.
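As a small practical check of this bound (our own illustration, reusing the sample estimate of R_uu(q) from the closed-form construction):

```python
import numpy as np

def max_step_size(U, e_hat, q):
    # Build the sample estimate of R_uu(q), then bound mu by 2/d_max.
    Ui, Uq = U[q:], U[:-q]
    wi, wq = e_hat[:-q]**2, e_hat[q:]**2
    R_uu = ((Ui * wi[:, None]).T @ Ui
            + (Uq * wq[:, None]).T @ Uq) / (len(e_hat) - q)
    d_max = np.linalg.eigvalsh(R_uu).max()      # largest eigenvalue
    return 2.0 / d_max                          # choose mu below this value
```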

Mean square behavior
If we define
$$\delta_m(i) = \mathbf{U}_m^T\boldsymbol{\varepsilon}(i),$$
where U_m is the mth column of U, then we can rewrite (56) in the rotated coordinates, which is composed of M decoupled difference equations:
$$\delta_m(i+1) = (1 - \mu d_m)\,\delta_m(i) + \mu\,\mathbf{U}_m^T\mathbf{z}(i),$$
where d_m is the mth eigenvalue of R_uu(q) and z(i) denotes the gradient noise driven by the additive noise. The second-order moment of δ_m(i + 1) can be derived from (74) as
$$E\{\delta_m^2(i+1)\} = (1 - \mu d_m)^2 E\{\delta_m^2(i)\} + \mu^2 E\{[\mathbf{U}_m^T\mathbf{z}(i)]^2\}.$$
From (2), (3), and (67), the cross term between δ_m(i) and the gradient noise vanishes under assumptions A1-A3, and the driving term satisfies
$$E\{[\mathbf{U}_m^T\mathbf{z}(i)]^2\} \propto E\{v^2(i)v^2(i-q)\}.$$
Note that $\mathbf{U}_m^T\mathbf{R}_{uu}(q)\mathbf{U}_m = d_m$. Then the second-order moment of δ_m(i + 1) converges, and the steady-state error of the MSCE algorithm is given by
$$E\{\delta_m^2(\infty)\} = \frac{\mu^2 E\{[\mathbf{U}_m^T\mathbf{z}(i)]^2\}}{1 - (1 - \mu d_m)^2}.$$
When μ ≪ 1/d_max, we have
$$E\{\delta_m^2(\infty)\} \approx \frac{\mu\,E\{[\mathbf{U}_m^T\mathbf{z}(i)]^2\}}{2 d_m} \propto \mu\,E\{v^2(i)v^2(i-q)\}.$$
For comparison, the steady-state error of the MSE (LMS) algorithm is proportional to μE{v²(i)} = μσ_v². For impulsive noise, the variance E{v²(i)} may be very large, but the MSCE term E{v²(i)v²(i − q)} may be small. Thus, the proposed MSCE algorithm may have a smaller steady-state error than the MSE for impulsive noise.

Selection of delay q
After obtaining the estimate of e(i) using (33), we can estimate the MSCE for q = 1, 2, ⋯, Q:
$$\hat{J}_{\mathrm{MSCE}}(q) = \frac{1}{L-q}\sum_{i=q+1}^{L}\hat{e}^2(i)\hat{e}^2(i-q),$$
where Ĵ_MSCE is the estimate of J_MSCE. Because the mean square performance of the MSCE algorithm is proportional to E{v²(i)v²(i − q)} according to (85), we should select the q with the smallest Ĵ_MSCE in (87).
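A minimal sketch of this selection rule (the function name and interface are ours):

```python
import numpy as np

def select_delay(e_hat, Q):
    # Estimate J_MSCE(q) from stage-1 residuals for q = 1, ..., Q
    J = [np.mean(e_hat[q:]**2 * e_hat[:-q]**2) for q in range(1, Q + 1)]
    return 1 + int(np.argmin(J))        # q with the smallest estimate
```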

Simulation results and discussion
In this section, the performance of the MSE, MSCE, GMSCE, LMF, MCC, GMCC, and MKRSL is evaluated by simulations. All the simulation points were averaged over 100 independent runs. The performance of the adaptive solution was measured by the steady-state mean square deviation (MSD)
$$\mathrm{MSD} = E\{\|\mathbf{w}(i) - \mathbf{w}_o\|^2\},$$
and the performance of the closed-form solution by
$$\mathrm{MSD} = E\{\|\hat{\mathbf{w}} - \mathbf{w}_o\|^2\}.$$
The smaller the MSD, the better the performance.
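For concreteness, the MSD figure of merit can be computed as follows (a sketch; names are ours, and dB reporting is optional):

```python
import numpy as np

def msd(W_est, w_o):
    # W_est: (runs, M) array of estimated weights from independent runs
    return np.mean(np.sum((W_est - w_o)**2, axis=1))

def msd_db(W_est, w_o):
    return 10.0 * np.log10(msd(W_est, w_o))
```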

Closed-form solutions comparison
In the experiments, we compared the MSDs of the closed-form solutions of the ten algorithms under different non-Gaussian noises. The input filter order was M = 5, and the sample size was L = 3000. When the SNR ranged from −20 to 20 dB, we obtained similar performance comparisons; here, the SNR was set to 6 dB. Figures 1 and 2 partly show the four types of sub- and super-Gaussian noise, respectively. The noise in Fig. 2c is zero-mean Rayleigh noise, and G(i) in Fig. 2d is zero-mean Gaussian noise. The kurtoses of the noises shown in Fig. 2a-d are 3.0, 4.1, 14.4, and 7.3, respectively.
The MSDs of the closed-form solutions for sub- and super-Gaussian noise are shown in Tables 1 and 2, respectively. From these two tables, we can draw three conclusions. First, none of the existing algorithms (LMF, MCC, GMCC, and MKRSL) performs better than the MSE method for both sub- and super-Gaussian noise. The MCC, GMCC1, MKRSL1, and MKRSL2 perform better (worse) than the MSE method for super-Gaussian (sub-Gaussian) noise, whereas the LMF and GMCC2 perform better (worse) than the MSE for sub-Gaussian (super-Gaussian) noise. The simulations demonstrate that the proposed MSCE and GMSCE algorithms may perform better than the MSE algorithm for both sub- and super-Gaussian noise. Second, the MCC performs as well as the MKRSL, whose parameters λ and σ did not influence the MSDs of the closed-form solution. Third, the parameters λ and α have a great influence on the GMCC. When α = 2 and λ = 0.031, GMCC1 performs better than the MSE for super-Gaussian noise. When α = 4 and λ = 0.005, GMCC2 performs better than the MSE for sub-Gaussian noise.

Adaptive solution for sub-Gaussian noise
In this simulation, the input filter order was M = 5, the sample size was L = 10,000, and the SNR was set to 6 dB. The proposed algorithms (22) and (30) are denoted by MSCE and GMSCE, respectively. For the sub-Gaussian noise shown in Fig. 1a, d, we compared the performance of the LMS, LMF, MCC, GMCC1-3, MKRSL1-2, MSCE, and GMSCE. The step sizes were chosen such that all the algorithms had almost the same initial convergence speed, and the other parameters (if any) of each algorithm were selected experimentally to achieve desirable performance.
The comparisons are shown in Figs. 3 and 4. From the two figures, we can observe the following. First, the GMCC1-3, LMF, and MSCE performed better than the LMS for sub-Gaussian noise; GMCC1 and GMCC2 performed best among the algorithms.
Second, the MKRSL1-2 and MCC performed worse than the LMS. The performance curves of MKRSL1 and MCC almost overlapped.
Third, the performance of the adaptive solution was not always consistent with that of the closed-form solution. Table 1 shows that the closed-form solution of GMCC3 was worse than that of the MSE, but the adaptive solution of GMCC3 was better than that of the MSE. It may be hard for each algorithm to achieve a good tradeoff between the same initial convergence speed and a desirable steady-state error.

Adaptive solution for super-Gaussian noise
In these simulations, the input filter order was M = 5, the sample size was L = 10,000, and the SNR was set to 6 dB. The step sizes were chosen such that all the algorithms had almost the same initial convergence speed.
For the super-Gaussian noise shown in Fig. 2a, d, we compared the performance of the LMS, LMF, MCC, GMCC1-3, MKRSL1-2, MSCE, and GMSCE. The comparisons are shown in Figs. 5 and 6. From the two figures, we can observe the following. First, the proposed MSCE and GMSCE performed much better than the other algorithms for the periodic super-Gaussian noise shown in Fig. 2a. The MSCE performed a little better than the LMS for the impulsive noise shown in Fig. 2d.
Second, the MKRSL1 and MCC had almost the same performance; both performed a little better than the LMS.