EURASIP Journal on Applied Signal Processing 2002:12, 1–13. © 2002 Hindawi Publishing Corporation

Noise Cancellation with Static Mixtures of a Nonstationary Signal and Stationary Noise

We address the problem of cancelling a stationary noise component from its static mixtures with a nonstationary signal of interest. Two different approaches, both based on second-order statistics, are considered. The first is the blind source separation (BSS) approach, which aims at estimating the mixing parameters via approximate joint diagonalization of estimated correlation matrices. Proper exploitation of the nonstationary nature of the desired signal, in contrast to the stationarity of the noise, allows parameterization of the joint diagonalization problem in terms of a nonlinear weighted least squares (WLS) problem. The second approach is a denoising approach, which translates into direct estimation of just one of the mixing coefficients via solution of a linear WLS problem, followed by the use of this coefficient to create a noise-only signal to be properly eliminated from the mixture. Under certain assumptions, the BSS approach is asymptotically optimal, yet computationally more intense, since it involves an iterative nonlinear WLS solution, whereas the second approach only requires a closed-form linear LS solution. We analyze and compare the performance of the two approaches and provide some simulation results which confirm our analysis. Comparison to other methods is also provided.


1. INTRODUCTION
In many applications in signal processing and communications, a desired signal is contaminated by some unknown, statistically independent, noise signal. Multisensor arrays are often used for the purpose of separating, or denoising, the desired signal. Each sensor receives a linear combination of the desired signal and noise so that, by properly combining the received signals, enhancement of the desired signal is possible.
This problem can be regarded either as a denoising or as a blind source separation (BSS) problem. The difference between these two approaches lies within the treatment of the noise signal: while the former regards the noise merely as a disturbance, the latter regards it as another source signal to be separated from the desired one.
A major practical difference between the two approaches to this problem lies in their computational complexity: while the BSS approach involves approximate joint diagonalization, which amounts to the solution of a nonlinear weighted least squares (WLS) problem, the denoising approach only requires the solution of a linear WLS problem. It is therefore interesting to compare the performance of the two approaches in order to gauge the benefit of using the computationally more intense BSS approach.
In order to attain the desired noise cancellation, some special characteristics of the signals and/or the mixing have to be exploited. Traditionally, the BSS approach is only based on statistical independence of the sources. However, in several contexts (e.g., [1, 2, 3]), second-order statistics are sufficient. One such context is the framework of nonstationarity.
The key property to be employed in this paper is the assumption that the desired signal is nonstationary whereas the noise signal is stationary. This assumption holds in several situations of interest, for example, (i) in a microphone array, when the desired nonstationary signal is speech, whereas some stationary noise source (such as fan noise) is also received (as another coherent source); (ii) in a multiuser array communication system, when the signal of interest (possibly a mobile source) is received through a fading (time-varying) channel, while a nuisance signal (possibly a static source) is received through a nonfading (constant) channel. In that case, although both sources are stationary at the origin, the source of interest appears nonstationary at the array, while the undesired source (regarded as noise) appears stationary.
The mixing that links the source signals to the sensors is usually assumed to be linear and time-invariant (LTI). In its more general form, it consists of different (unknown) LTI systems relating each source signal to each sensor. However, a more degenerate case of an LTI system is a static mixture, in which each sensor receives a memoryless (static) linear combination of the source signals. While the case of static mixtures is not as prevalent in practical situations as the dynamic (convolutive) mixtures case, it has been treated extensively in the context of BSS and independent components analysis (ICA); see, for example, [4, 5, 6], for a comprehensive review. In many situations, the assumption of a static mixture holds precisely, and in other situations, it can be justified as a first-order approximation of a short-memory convolutive system (e.g., in communications applications with narrowband sources or in nonreverberant acoustic situations with closely-spaced directional microphones). The treatment of the static case basically encompasses many of the principles underlying the BSS problem in general, even in the context of convolutive mixtures.
Our purpose in this paper is to present and compare (by analysis and simulations) both the denoising and the separation approaches for the problem of a static mixture of a nonstationary (desired) signal and a stationary (noise) signal.
The problem of BSS in a static mixture of nonstationary signals has recently been treated by Pham and Cardoso in [3], where one proposed method was to apply a special form of joint diagonalization to a set of estimated correlation matrices taken from different segments. It is assumed that the source signals have constant powers within segments but that these powers vary between segments, thus constituting the nonstationarity of the sources. While directly applicable in our problem, this approach cannot exploit the fact that one of the source signals (the noise in our case) is stationary. In the BSS approach we take in this paper, the joint diagonalization problem assumes the form of a WLS problem, in which the parameterization properly exploits the noise stationarity. It is therefore interesting to compare the performance of our BSS approach to the approach of [3]. We include an empirical comparison in this paper.
Static mixtures in the BSS context were also addressed in [7] by Parra and Spence as a preliminary tool for treatment of the convolutive case. Their model is more general since it also contains uncorrelated additive noise components in each sensor (on top of the signals' mixing). Therefore, this model is overparameterized for our more concise problem. We also provide an empirical comparison of performance, comparing the BSS and denoising approaches to the approach taken in [7] in the context of static mixtures.
In [8, 9], Rahbar et al. address the case of convolutive mixtures of nonstationary signals, where separation is performed in the frequency domain by applying static source separation to the spectral components at each frequency taken over different segments (and later resolving the scale/permutation ambiguity). Again, exploitation of stationarity of one of the sources is beyond the scope of these contributions (although the extension of the associated diagonalization problems accordingly is possible).
The alternative approach, which regards the separation as a denoising problem, was first introduced by Gannot et al. in [10] and analyzed in [11]. It was applied in the convolutive mixture case, and relies on a system-identification method proposed by Shalvi and Weinstein in [12]. This method estimates an LTI system's transfer function by exploiting the nonstationarity of its input signal contrasted with the stationarity of the input/output noise signal. One identification approach in [12] was based on estimated time-domain correlations, while another approach was based on spectral estimates. Only the frequency-domain approach was (approximately) analyzed. However, the degenerate case of a static mixture, which allows exact (small-errors) analysis in the time domain, was not addressed.
The paper is organized as follows. In Section 2, we provide the problem formulation. In Section 3, we present the BSS approach and in Section 4, we present the denoising approach. While the general approaches in both sections do not make any assumptions on the actual distribution of each source, a small-errors analysis is also provided (for both approaches) for the case of Gaussian, temporally-uncorrelated sources. Based on these analyses, optimized versions (under the same assumptions) of both approaches are derived. In Section 5, we present some simulation results comparing the two approaches as well as showing the agreement with the analytically predicted performance. In addition, the algorithms are empirically compared to other algorithms, and their robustness is tested. Some conclusions are drawn in Section 6.

2. PROBLEM FORMULATION

We denote the nonstationary source signal by s[n] and the stationary noise by v[n].
In the blind scenario, the scales of neither the source signal nor the noise are known. Therefore, some arbitrary constraints have to be imposed on the mixing coefficients in order to eliminate the inherent ambiguity involved in the possible commutation of scales between the channel and the signal. We use unity scales in the direct paths, denoting by a and b the two unknown mixing parameters, so that the observed signals x_1[n] and x_2[n] are given by

x_1[n] = s[n] + a·v[n],  x_2[n] = b·s[n] + v[n].

The source signal s[n] is assumed to be piecewise power-stationary in the following sense: divide the observation interval into K segments. In each segment, s[n] satisfies E{s^2[n]} = σ_k^2 for all n in the kth segment, L_k being the (known) length of the kth segment.
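The setup above can be sketched numerically. The following Python fragment simulates the model under the mixing convention implied by the unity direct-path constraint (x_1[n] = s[n] + a·v[n], x_2[n] = b·s[n] + v[n]); the segment powers and the values of a and b are illustrative only.

```python
import numpy as np

# Sketch of the signal model (illustrative parameter values, not from the paper).
rng = np.random.default_rng(0)

K, L = 6, 500                                           # K segments of length L each
seg_powers = np.array([0.5, 2.0, 1.0, 3.0, 0.25, 1.5])  # hypothetical sigma_k^2 values
a, b = 0.3, -0.5                                        # hypothetical mixing parameters

# Piecewise power-stationary source: constant power within each segment
s = np.concatenate([np.sqrt(p) * rng.standard_normal(L) for p in seg_powers])
v = rng.standard_normal(K * L)                          # stationary noise, sigma_v^2 = 1

# Static mixture with unity direct paths
x1 = s + a * v
x2 = b * s + v
```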

In general, the source signals, as well as the mixing parameters, may be either real-valued or complex-valued. Unfortunately, the real-valued case cannot be regarded as a special case of the complex-valued case since, in the complex-valued case, the signals are usually assumed to be circular (see, e.g., [13]). A real-valued signal cannot be considered a circular complex-valued signal. While both cases are of interest, the presentation of the real-valued case is considerably more concise. Therefore, in order to capture the essence of the proposed approaches, we will mainly address the real-valued case, leaving for the appendix the further modifications required to address the complex-valued case.

3. THE BSS APPROACH
In this section, we address the denoising problem as a BSS problem, attempting to estimate the mixing parameters explicitly in order to use their estimates for demixing.
Transforming to matrix-vector notation, we define

M(a, b) = [ 1  a ; b  1 ]

as the mixing matrix, and x[n] = [x_1[n]  x_2[n]]^T as the observation vector, so that x[n] = M(a, b)·[s[n]  v[n]]^T.
Since s[n] and v[n] are zero mean and statistically independent, and are both power-stationary in each segment, the signals x_1[n] and x_2[n] are jointly power-stationary in each segment. Specifically, the zero-lag correlation matrices are independent of n within each segment, so that we may define the kth segment's zero-lag correlation matrix,

R_k = E{x[n] x^T[n]} = M(a, b) Λ_k M^T(a, b),  Λ_k = diag{σ_k^2, σ_v^2},  n in segment k.   (6)

These correlation matrices can be estimated in each segment using straightforward averaging,

R̂_k = (1/L_k) Σ_{n in segment k} x[n] x^T[n].

The estimates are unbiased and, moreover, consistent if the source signal and noise are weakly ergodic within each segment (consistency is per segment, with respect to its length L_k).
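A minimal sketch of the per-segment correlation estimation (straightforward zero-lag averaging), assuming the two observed signals are stacked in a 2 × N array:

```python
import numpy as np

def segment_corr_matrices(x, seg_lens):
    """Zero-lag correlation estimates R_k = (1/L_k) * sum_n x[n] x[n]^T, one
    per segment.  x is a (2, N) array of the observed signals; seg_lens gives
    the (known) segment lengths L_1, ..., L_K.  Zero-mean signals assumed."""
    Rs, start = [], 0
    for L in seg_lens:
        seg = x[:, start:start + L]
        Rs.append(seg @ seg.T / L)
        start += L
    return Rs
```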
A set of K matrices R_1, R_2, ..., R_K is said to be jointly diagonalized by a matrix M if there exist K diagonal matrices Λ_1, Λ_2, ..., Λ_K such that R_k = M Λ_k M^T for all k = 1, 2, ..., K. Under certain conditions on the Λ_k-s, the diagonalizing matrix M is unique up to possible scaling and permutation of its columns.
It is evident from (6) that the true correlation matrices R_1, R_2, ..., R_K are jointly diagonalized by M(a, b). Thus, an estimate of M(a, b) can be obtained by attempting to jointly diagonalize the K estimated correlation matrices R̂_1, R̂_2, ..., R̂_K, which we will denote the "target matrices." However, if K > 2, then it is (almost surely) impossible to attain exact joint diagonalization of these target matrices. We must then resort to approximate joint diagonalization, a concept which has seen extensive use in the field of BSS [6, 14, 15, 16, 17] with various selections of sets of "target matrices." Several criteria have been proposed as a measure of the extent of attainable diagonalization; see, for example, [15, 17, 18], and especially [3] in a context similar to ours.
One possible measure of diagonalization is the straightforward least-squares (LS) criterion which, in our case, assumes the following form:

C_LS(â, b̂, σ̂_v^2, σ̂_1^2, ..., σ̂_K^2) = Σ_{k=1}^{K} ‖ R̂_k − M(â, b̂) diag{σ̂_k^2, σ̂_v^2} M^T(â, b̂) ‖_F^2,   (8)

where ‖·‖_F^2 denotes the squared Frobenius norm. Note that the minimization has to be attained with respect to (w.r.t.) the nuisance parameters σ̂_v^2, σ̂_1^2, σ̂_2^2, ..., σ̂_K^2 (as well as w.r.t. the parameters of interest â, b̂), since these are additional unknowns.
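For concreteness, the LS criterion can be sketched as follows, assuming the unit-diagonal mixing-matrix convention M(a, b) = [[1, a], [b, 1]]; this is a direct transcription of the criterion, not an optimized implementation:

```python
import numpy as np

def M(a, b):
    """Unit-diagonal mixing matrix (assumed convention)."""
    return np.array([[1.0, a], [b, 1.0]])

def ls_criterion(a, b, sig_k2, sig_v2, R_hats):
    """Unweighted LS diagonalization criterion of (8):
    sum_k || R_hat_k - M diag(sigma_k^2, sigma_v^2) M^T ||_F^2."""
    Mab = M(a, b)
    c = 0.0
    for s2, Rk in zip(sig_k2, R_hats):
        c += np.linalg.norm(Rk - Mab @ np.diag([s2, sig_v2]) @ Mab.T, 'fro') ** 2
    return c
```

With exact (noise-free) target matrices, the criterion vanishes at the true parameters and is strictly positive elsewhere.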
This formulation differs from the general formulation of standard approximate joint diagonalization problems in two respects: one is the structural constraint on the mixing matrix, which eliminates the scaling and permutation ambiguity by explicitly parameterizing just two degrees of freedom. The other is the constraint on the diagonal matrices, by which the (2, 2) element (namely σ̂_v^2) must be the same for all k, a direct consequence of the noise's stationarity.
Therefore, with slight manipulations, we prefer to represent this criterion as a standard (nonlinear, possibly weighted) LS problem. First denote, for shorthand, the vector θ = [σ̂_1^2, ..., σ̂_K^2, σ̂_v^2]^T consisting of all nuisance parameters. In addition, define K vectors consisting of the entries of the respective target matrices in vec{·} formation,

r̂_k = vec{R̂_k},  k = 1, 2, ..., K.

The equivalent vec{·} formation of the kth diagonal form would be vec{M(â, b̂) diag{σ̂_k^2, σ̂_v^2} M^T(â, b̂)}. Consequently, we may concatenate all r̂_k-s into a 4K × 1 vector r̂ = [r̂_1^T  r̂_2^T  ...  r̂_K^T]^T; in what follows, I, 1, and 0 denote the K × K identity matrix, a K × 1 all-ones vector, and a 4 × 1 all-zeros vector, respectively, and ⊗ denotes the Kronecker product. The concatenation of the K vectors r̂_k would normally comprise the entire "measurements vector" for the LS formulation. However, since R̂_k is symmetric, the second and the third elements of each r̂_k are identical, and hence, one of them is redundant. To mitigate this redundancy, we define reduced "measurement vectors" y_k, each consisting of the three distinct entries of R̂_k, which we concatenate to form y = [y_1^T  y_2^T  ...  y_K^T]^T. Adding an arbitrary weight matrix W, the weighted LS criterion becomes

C_WLS(â, b̂, θ) = [y − g(â, b̂, θ)]^T W [y − g(â, b̂, θ)],

where g(â, b̂, θ) denotes the corresponding concatenation of the reduced model vectors. Note that this criterion coincides with the criterion in (8) when W is diagonal with per-segment weights {1, 2, 1} (the weight 2 accounting for the discarded duplicate off-diagonal element). However, any symmetric positive definite matrix can be used, and we will pursue the optimal weight matrix in the sequel.

3.1. Nonlinear LS solution
While linear in θ, this WLS criterion is nonlinear in â and b̂. As a minimization approach, we propose to use "alternating coordinates minimization" (ACM) in the following form. Assuming â and b̂ are fixed, minimization w.r.t. θ is readily attained by the linear WLS solution

θ̂ = [Φ^T(â, b̂) W Φ(â, b̂)]^{-1} Φ^T(â, b̂) W y,   (17)

where Φ(â, b̂) is the matrix relating θ linearly to the model vector, g(â, b̂, θ) = Φ(â, b̂)θ. Assuming that θ is fixed, we may take Gauss' method (see, e.g., [19]) to solve the nonlinear problem in terms of â and b̂. Define H(â, b̂, θ) to be the derivative matrix of g(â, b̂, θ) w.r.t. â and b̂, evaluated at the current estimates and at the current elements of θ. Gauss' method iteratively updates the estimates â and b̂ via

[â[l+1]  b̂[l+1]]^T = [â[l]  b̂[l]]^T + (H^T W H)^{-1} H^T W [y − g(â[l], b̂[l], θ)],   (19)

where â[l] and b̂[l] are the lth iteration values of â and b̂, respectively.
A "true" ACM algorithm would alternate between minimization of the LS criterion w.r.t. θ assuming â and b̂ are fixed, and full minimization w.r.t. â and b̂ assuming θ is fixed. However, these full minimizations may require a large number of inner (Gauss) iterations for each outer (ACM) iteration. In an attempt to speed up the iterative process, it may be desirable to interlace minimizations w.r.t. θ between Gauss iterations. Thus, each Gauss iteration (19) would be preceded by re-estimation of θ using (17).
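An unweighted (W = I) sketch of the interlaced scheme is given below. It assumes the unit-diagonal convention M(a, b) = [[1, a], [b, 1]], under which the reduced model for the kth segment's measurements [R_11, R_12, R_22] is [σ_k^2 + a^2·σ_v^2, b·σ_k^2 + a·σ_v^2, b^2·σ_k^2 + σ_v^2]; the function and variable names are ours:

```python
import numpy as np

def acm_bss(y_segs, a0, b0, K, n_iter=100):
    """Interlaced ACM sketch (unweighted, W = I): alternate the closed-form
    linear update of theta = (sigma_1^2, ..., sigma_K^2, sigma_v^2) with one
    Gauss step on (a, b).  y_segs holds K reduced measurement vectors
    [R11, R12, R22], one per segment."""
    y = np.concatenate(y_segs)                # 3K-long "measurements" vector
    a, b = a0, b0
    for _ in range(n_iter):
        # --- linear LS in theta for fixed (a, b): y ~ Phi @ theta ---
        Phi = np.zeros((3 * K, K + 1))
        for k in range(K):
            Phi[3*k:3*k+3, k] = [1.0, b, b * b]      # coefficients of sigma_k^2
            Phi[3*k:3*k+3, K] = [a * a, a, 1.0]      # coefficients of sigma_v^2
        theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        sk2, sv2 = theta[:K], theta[K]
        # --- one Gauss iteration on (a, b) for fixed theta ---
        g = Phi @ theta                               # current model prediction
        H = np.zeros((3 * K, 2))
        for k in range(K):
            H[3*k:3*k+3, 0] = [2 * a * sv2, sv2, 0.0]        # d g_k / d a
            H[3*k:3*k+3, 1] = [0.0, sk2[k], 2 * b * sk2[k]]  # d g_k / d b
        step, *_ = np.linalg.lstsq(H, y - g, rcond=None)
        a, b = a + step[0], b + step[1]
    return a, b
```

With noise-free measurements and a reasonable initial guess, the iteration converges to the true parameters.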
In a "true" ACM algorithm, the WLS criterion is guaranteed not to increase (usually to decrease) in each iteration. Being bounded below, this property guarantees convergence of the WLS criterion which, under some reasonable assumptions (see, e.g., [17]), implies convergence of the parameters. Since the criterion is fully minimized w.r.t. either θ or â, b̂ in each iteration, the point of convergence must be a minimum both w.r.t. θ and w.r.t. â, b̂. However, it may happen that this point would not be a minimum with respect to â, b̂, and θ simultaneously.
In the "interlaced" ACM algorithm, the WLS criterion is guaranteed not to increase in each application of (17), but not (in general) in each application of a Gauss iteration (19). Nevertheless, under a "small errors assumption," each Gauss iteration solves a linearized WLS problem in the vicinity of a true minimum, thus the nonlinear WLS criterion is decreased as well.
In order to justify such a "small errors assumption," a reasonable initial guess for the parameters has to be used. A possible choice for â[0] and b̂[0] can be computed from the (exact) joint diagonalization of any two matrices of the set R̂_1, R̂_2, ..., R̂_K, say R̂_1 and R̂_2. Since these estimated correlation matrices are symmetric and positive definite, there exist some M̂, Λ̂_1, and Λ̂_2 that satisfy R̂_1 = M̂ Λ̂_1 M̂^T and R̂_2 = M̂ Λ̂_2 M̂^T, meaning that M̂ is the eigenvectors matrix of R̂_1 R̂_2^{-1} (with eigenvalues given by the diagonal values of Λ̂_1 Λ̂_2^{-1}). Thus, initial guesses for â and b̂ can be obtained from this eigenvectors matrix using proper normalization. The permutation ambiguity can be resolved by ordering the eigenvalues such that the (2, 2) element of the eigenvalues matrix is the nearest to unity among the two (reflecting the nominal requirement that the noise-related eigenvalue, σ_v^2/σ_v^2 = 1, equal unity). The minimization algorithm therefore assumes the form shown in Algorithm 1.
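The initialization step can be sketched as follows (same unit-diagonal convention for M; the eigenvalue ordering implements the nearest-to-unity rule described above):

```python
import numpy as np

def init_ab(R1, R2):
    """Initial guess for (a, b) from exact joint diagonalization of two
    estimated correlation matrices: the columns of the (assumed) mixing matrix
    M(a, b) = [[1, a], [b, 1]] are eigenvectors of R1 @ inv(R2)."""
    w, V = np.linalg.eig(R1 @ np.linalg.inv(R2))
    w, V = w.real, V.real          # eigenvalues are real here (SPD pencil)
    # Resolve the permutation: the noise-related eigenvalue is nearest to
    # unity (sigma_v^2 / sigma_v^2 = 1) and belongs in the second column.
    order = np.argsort(np.abs(w - 1.0))[::-1]
    M0 = V[:, order]
    M0 = M0 / np.diag(M0)          # normalize: column j divided by M0[j, j]
    return M0[0, 1], M0[1, 0]      # a = M[0, 1], b = M[1, 0]
```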
A reasonable convergence criterion would be to monitor the norm of all the parameters' update in each iteration and compare to a small threshold.
Algorithm 1: initialize â and b̂ via the two-matrix joint diagonalization above, then iterate the θ update (17) interlaced with Gauss iterations (19) until the convergence criterion is met.

Once â and b̂ are estimated, the demixing matrix M^{-1}(â, b̂) can be constructed, and the source (and noise) process(es) estimated.

3.2. Performance analysis and optimal weighting
When some statistical knowledge regarding the source and the noise processes is available, a small-errors performance analysis can be derived and, moreover, an optimal (or an asymptotically optimal) weight matrix W can be found.A key step in the analysis would be to obtain the covariance matrix of the "measurements" y.
To this end, we will now use a statistical model consisting of the following additional assumptions (on top of the assumptions stated in Section 2): (i) both the source and the noise are Gaussian processes; (ii) all nonzero-lag correlations of both processes are zero. These additional assumptions imply statistical independence between observation vectors x[n] belonging to different segments. This statistical independence implies, in turn, zero covariance between the estimates of correlation matrices from two different segments. We therefore need only the covariance between the elements of the estimated R̂_k for each k (segment). By exploiting the Gaussianity and the in-segment whiteness of both signals (expanding the fourth-order moments via Isserlis' theorem), we obtain

Cov{[R̂_k]_{ij}, [R̂_k]_{pq}} = (1/L_k)·([R_k]_{ip}·[R_k]_{jq} + [R_k]_{iq}·[R_k]_{jp}),

where i, j, p, q = 1, 2: the first term in the moment expansion equals the product of the means and cancels, and the remaining term equals the desired covariance. Consequently, the entire covariance matrix (per segment k) can be written in matrix form, and the covariance matrix C_{y,k} of the reduced measurements y_k follows by selecting the rows and columns corresponding to the distinct entries retained in (13). Finally, the covariance matrix of the entire measurements vector is given by

C_y = diag{C_{y,1}, C_{y,2}, ..., C_{y,K}},

where diag{·} is in the matrices-to-matrix block-diagonal sense.
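The per-segment covariance expression can be transcribed directly; the entry ordering [r_x1x1, r_x1x2, r_x2x2] matches the reduced measurement vectors y_k:

```python
import numpy as np

def corr_est_covariance(Rk, Lk):
    """Covariance of the zero-lag correlation estimates within segment k, for
    zero-mean Gaussian temporally-white signals (Isserlis' theorem):
    Cov(r_ij, r_pq) = (R_ip * R_jq + R_iq * R_jp) / L_k.
    Returned for the three distinct entries (i, j) in [(0,0), (0,1), (1,1)]."""
    idx = [(0, 0), (0, 1), (1, 1)]
    C = np.empty((3, 3))
    for m, (i, j) in enumerate(idx):
        for n, (p, q) in enumerate(idx):
            C[m, n] = (Rk[i, p] * Rk[j, q] + Rk[i, q] * Rk[j, p]) / Lk
    return C
```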
With C_y in hand, we can now proceed to analyze the error in estimating a and b and the consequent denoising performance. It is well known that, under the "small errors assumption," the nonlinear-WLS estimates are unbiased, and their covariance can be calculated as follows. Define φ = [â  b̂  θ^T]^T as the complete vector of unknown parameters, and F(φ) as the complete derivative matrix of the model w.r.t. φ. Then

C_φ = [F^T(φ) W F(φ)]^{-1} F^T(φ) W C_y W F(φ) [F^T(φ) W F(φ)]^{-1}.   (29)

The covariance matrix of â and b̂ is then given by the upper-left 2 × 2 block of C_φ. Specifically, define σ_a^2 as the (1, 1) element of this block.
When the estimated demixing matrix is applied to the observed signals, the entire (residual) mixing is given by M^{-1}(â, b̂) M(a, b), such that the denoised signal is given by ŝ[n] = α·s[n] + ε·v[n]. The residual interference to signal ratio (ISR) is usually defined as the expected value of the power of the residual noise coefficient ε, normalized by the power of the signal coefficient α. Under the "small error assumption," and assuming further that the true mixing matrix is well conditioned (the product ab is far from unity), it can be deduced that α ≈ 1 and ISR ≈ E{ε^2} ≈ σ_a^2/(1 − ab)^2. When such a statistical model is in effect, it becomes relatively straightforward to use the optimal weight matrix, which is well known [19] to be given by W_opt = C_y^{-1}. However, since the true correlation matrices are unknown, the estimated matrices can be used in computing C_y, yielding a suboptimal weight matrix. Nevertheless, due to the ergodicity of the source and the noise processes, the estimated weight is asymptotically optimal ("asymptotically" means here that the number of segments is fixed and their lengths all tend to infinity). The optimality here is in the sense of the resulting mean squared error in estimating a and b, which translates directly into the ISR.
Note, in addition, that when W_opt is used, the expression in (29) reduces to C_φ = [F^T(φ) W_opt F(φ)]^{-1}.

4. DENOISING APPROACH
The BSS approach presented so far is approximately optimal (under several assumptions), but involves an iterative solution of a nonlinear LS problem. We will now derive a different approach which only involves a linear LS solution. A comparison between the two methods will be presented in Section 5. This solution addresses the noise cancellation problem as a denoising problem, attempting to cancel out the noise term in the first signal, x_1[n]. Again, the nonstationarity of the desired signal s[n] is exploited in concert with the stationarity of the noise v[n].

4.1. Algorithm derivation
To get an estimate of the desired signal, we first define a noise-only reference signal,

u[n] = x_2[n] − b·x_1[n] = (1 − ab)·v[n].   (33)

Obviously, u[n] is unavailable since b is unknown. We will therefore replace b with its estimate b̂. The procedure for estimating b̂ will be discussed in the sequel. However, assuming for now that u[n] is available, an estimate of the desired signal s[n] can be obtained by fixing the coefficient h in the following expression:

ŝ[n] = x_1[n] − h·u[n],   (34)

such that the power of ŝ[n] is minimized. This dwells on the fact that s[n] is uncorrelated with v[n] (and hence with u[n]).
Let the output power be defined by P(h) = E{ŝ^2[n]} = r_x1x1 − 2h·r_x1u + h^2·r_uu, which is minimized by h = r_x1u/r_uu. Since r_x1u and r_uu are not directly available, we will express them using the input signals' correlations.
Using (33), we note that, indeed, if r_x1x1, r_x1x2, and r_x2x2 are known, then

r_x1u = r_x1x2 − b·r_x1x1,  r_uu = r_x2x2 − 2b·r_x1x2 + b^2·r_x1x1.

However, since, in practice, the cross and auto correlations are not known, we should use their estimated values instead, where r̂_x1x1 = (1/N) Σ_n x_1^2[n], r̂_x1x2 = (1/N) Σ_n x_1[n]·x_2[n], and r̂_x2x2 = (1/N) Σ_n x_2^2[n] are the correlation estimates (at lag zero) taken over the entire observation interval. Zero-lag correlations are sufficient due to the static mixture framework.
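A compact sketch of the resulting denoising step, assuming the convention u[n] = x_2[n] − b·x_1[n]; given b̂, everything is closed-form:

```python
import numpy as np

def denoise(x1, x2, b_hat):
    """Noise-reference denoising sketch: u = x2 - b*x1 is (ideally) noise-only,
    and h = r_x1u / r_uu minimizes the output power of s_hat = x1 - h*u.
    Both correlations are expressed through the observable zero-lag estimates."""
    N = len(x1)
    r11 = x1 @ x1 / N
    r12 = x1 @ x2 / N
    r22 = x2 @ x2 / N
    r_x1u = r12 - b_hat * r11
    r_uu = r22 - 2 * b_hat * r12 + b_hat**2 * r11
    h = r_x1u / r_uu
    return x1 - h * (x2 - b_hat * x1)
```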
When estimates ĥ and b̂ are used (for h and b, resp.), the estimated signal is given by

ŝ[n] = x_1[n] − ĥ·(x_2[n] − b̂·x_1[n]).   (41)

The first additive term is (a scaled version of) the desired signal, and the second term is a residual noise term. This expression is similar in structure to (31). However, in (31), direct estimates â, b̂ of both mixing parameters (a, b, resp.) were used, whereas in (41), a is not estimated directly. Instead, an external parameter h is introduced and estimated. Now we turn to the estimation of b. To this end, we will exploit the nonstationarity of s[n]: in each segment,

r̂(k)_x2x1 = b·r̂(k)_x1x1 + r_ux1 + ε(k)_ux1,   (43)

where r̂(k)_x2x1 and r̂(k)_x1x1 are (the kth segment's) consistent correlation estimates (at lag zero) and ε(k)_ux1 = r̂(k)_ux1 − r_ux1 is the zero-mean error in estimating r_ux1 (which, due to the noise's stationarity, is common to all segments). Concatenating (43) for k = 1, 2, ..., K, we obtain in matrix form

r̂_x2x1 = A·η + e,  A = [r̂_x1x1  1],  η = [b  r_ux1]^T,

where r̂_x2x1 and r̂_x1x1 here denote the K × 1 vectors of segment-wise estimates and 1 is a K × 1 all-ones vector. Treating this as an LS problem in the parameter η, with e a zero-mean "noise" vector, the WLS estimate of η is given by

η̂ = (A^T W A)^{-1} A^T W r̂_x2x1,   (46)

where W is a possible weight matrix. The desired estimate of b is given by the first element of η̂; the second element could be regarded as a nuisance parameter. Choosing an asymptotically optimal weight matrix W_opt will be discussed in Section 4.2. We summarize in Algorithm 2.
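The segment-wise LS estimation of b can be sketched as follows (unweighted by default; the regressor matrix A = [r̂_x1x1  1] and η = [b, r_ux1]^T follow the formulation above):

```python
import numpy as np

def estimate_b(x1, x2, seg_lens, W=None):
    """Linear (W)LS estimate of b from the per-segment identity
    r_x2x1^(k) = b * r_x1x1^(k) + r_ux1 + error; the noise stationarity makes
    r_ux1 a single common intercept across segments.  eta = [b, r_ux1]."""
    K = len(seg_lens)
    A = np.empty((K, 2))
    r = np.empty(K)
    start = 0
    for k, L in enumerate(seg_lens):
        s1, s2 = x1[start:start + L], x2[start:start + L]
        A[k] = [s1 @ s1 / L, 1.0]        # [r_x1x1^(k), 1]
        r[k] = s2 @ s1 / L               # r_x2x1^(k)
        start += L
    if W is None:
        W = np.eye(K)                    # unweighted LS
    eta = np.linalg.solve(A.T @ W @ A, A.T @ W @ r)
    return eta[0]                        # first element of eta is b_hat
```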

4.2. Performance analysis and optimal weighting
In this section, we analyze the expected performance of the suggested denoising algorithm. Using a small error analysis, we can write b̂ = b + ε_b and ĥ = h + ε_h, where ε_b and ε_h are zero-mean "small" random variables. Using (41), the residual noise term is given by ṽ[n] = [a·(1 + ĥ·b̂) − ĥ]·v[n]. Neglecting the second-order error term ε_h·ε_b and using h = a/(1 − ab), we obtain ṽ[n] ≈ (1 − ab)·(h^2·ε_b − ε_h)·v[n]. The scaling error in (41) is given by α = 1 + ĥ·(b̂ − b) ≈ 1 + h·ε_b, where in the last transition we again neglected the second-order error term ε_h·ε_b. Thus, in order to calculate the residual error energy and the scaling distortion, we need to calculate the second-order statistics of ε_b and ε_h. Since all the error terms in the analysis are due to errors in estimating the input signals' correlations, we will now define the relations between these segment-wise errors and the error terms of interest. We reemploy the additional assumptions of Section 3.2, namely, both the signal s[n] and the noise v[n] are Gaussian and temporally uncorrelated. Consequently, the covariance of the kth segment's estimation error vector (which equals the covariance of y_k of (13)) is given by C_{y,k} of (26), and the covariance of the augmented vector is given by C_y of (27).
The error in estimating η = [b  r_ux1]^T using the LS solution (46) is given by ε_η = (A^T W A)^{-1} A^T W e. Thus, the error term in estimating b is given by this vector's first element, namely, ε_b = Σ_{k=1}^{K} q_k·e_k, where q_1, ..., q_K are the elements of q^T, the first row of (A^T W A)^{-1} A^T W. A similar linearized expression can be written for ε_h in terms of the correlation estimation errors, neglecting second and higher order terms in all approximations. Consequently, using (49), the second-order statistics of ε_b and ε_h follow from the segment-wise correlation errors' covariances, and the ISR is defined as the residual noise power normalized by the (properly scaled) signal power. As we did in the BSS context, we may, under the same statistical assumptions, employ an asymptotically optimal weight matrix in the WLS problem (46). Since the errors e_k are uncorrelated between segments, the optimal weight matrix is

W_opt = diag{Var{ε(1)_ux1}, Var{ε(2)_ux1}, ..., Var{ε(K)_ux1}}^{-1},   (63)

where Var{ε(k)_ux1} = δ^T C_{y,k} δ, with δ^T = [−b  1  0]. Since the true correlation terms are unknown, the estimated terms can be used instead. Note that this also requires an estimate b̂ of b. Thus, in order to use the optimal weighting matrix, we may first estimate b using (46) with W = I (the identity matrix) and then use (63) to obtain the (asymptotically) optimal W. Note that, as in the BSS approach, this procedure requires reasonably "good" estimates in order for the estimated W to be close to the true optimal weight. Recall, further, that this weight matrix is only optimal under the assumption that the source and noise signals are Gaussian and temporally uncorrelated. When this is not the case, the algorithm can still be applied using either W = I or any other properly calculated weight matrix.
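A sketch of the optimal-weight construction of (63), reusing the Gaussian-case covariance of the segment correlation estimates; the weight is computed from (estimated) segment correlation matrices and a preliminary estimate b̂:

```python
import numpy as np

def denoising_opt_weight(R_seg, seg_lens, b_hat):
    """Asymptotically optimal diagonal weight for the WLS estimate of b, as in
    (63): W = diag{Var(eps_ux1^(k))}^(-1), with Var(eps_ux1^(k)) = d^T C_k d,
    d = [-b, 1, 0]^T, and C_k the Gaussian-case covariance of the kth
    segment's estimates [r_x1x1, r_x1x2, r_x2x2]."""
    d = np.array([-b_hat, 1.0, 0.0])
    idx = [(0, 0), (0, 1), (1, 1)]
    variances = []
    for Rk, Lk in zip(R_seg, seg_lens):
        C = np.empty((3, 3))
        for m, (i, j) in enumerate(idx):
            for n, (p, q) in enumerate(idx):
                C[m, n] = (Rk[i, p] * Rk[j, q] + Rk[i, q] * Rk[j, p]) / Lk
        variances.append(d @ C @ d)
    return np.diag(1.0 / np.array(variances))
```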

5. PERFORMANCE EVALUATION AND COMPARISON
In this section, we compare the performance of the two approaches, both analytically and empirically. In addition, we compare their performance to that obtained by several other algorithms applied to the same problem. We also provide empirical results that address the performance degradation in the presence of additional (additive, noncoherent) noise. Finally, we address the sensitivity of performance to the Gaussianity assumption by presenting empirical results for signals/noise with non-Gaussian distributions.
The nominal setup used is as follows. All signals involved are temporally uncorrelated zero-mean Gaussian. We use six equal-length segments, with unity noise power σ_v^2 = 1 and a fixed true mixing matrix. In Figure 2, we present analytical and empirical results for three algorithms: the optimally weighted BSS algorithm, the unweighted denoising algorithm, and the optimally weighted denoising algorithm. All results are displayed in terms of ISR versus the entire observation length N = 6L. The empirical (simulation) results represent averages over 250 trials each. All algorithms were applied to the same data.
The empirical results for our algorithms are seen to coincide (asymptotically) with the theoretically predicted values.As expected, the computationally more intensive BSS approach outperforms the denoising approach in both its weighted and unweighted versions.However, this advantage is more pronounced at the longer observation lengths.At the shorter lengths, the BSS weighting departs from its optimal value, and hence, the advantage in performance decreases.As for the denoising approach, its weighted version attains some slight improvement over the unweighted version.
We proceed to compare (empirically) the performance of these algorithms to that of three other algorithms, namely, (i) a BSS algorithm for nonstationary signals by Pham and Cardoso [3]: the algorithm is based on a special form of joint diagonalization of the empirical correlation matrices, and attains the maximum likelihood (ML) estimate for all unknown parameters. However, it cannot directly exploit the fact that one of the signals (the noise) is stationary; (ii) a least-squares gradient-descent algorithm proposed by Parra and Spence [7]: this algorithm minimizes an unweighted least-squares criterion using a gradient-descent approach and, in general, includes parameterization for additive (noncoherent) noise as well. However, the additive noise parameters can only be estimated when the number of observed signals is at least four. Since, in our setup, the number of observed signals is two, we applied a noiseless version of the algorithm, in which the noncoherent additive noises' variances are assumed zero. Like Pham's algorithm, this algorithm does not exploit the stationarity of the (coherent) noise; (iii) the joint approximate diagonalization of eigenmatrices (JADE) algorithm (Cardoso and Souloumiac [20]), which is based on empirical fourth-order cumulant matrices (estimated over the entire observation interval) and does not take advantage of the nonstationarity. Nevertheless, it can be applied to any BSS problem as long as no more than one of the sources has a zero fourth-order cumulant (e.g., is Gaussian). Note that although both the source and the noise signals are Gaussian in our case, the source signal appears to the JADE algorithm as non-Gaussian due to its nonstationarity: the overall empirical fourth-order cumulant would depart from zero, behaving like the fourth-order cumulant of a Gaussian-mixture distribution.
In Table 1, we present the empirical ISR for all algorithms, averaged over 250 trials and applied to the same data, generated using the same setup described above, with N = 996.
The results of the BSS and the denoising approaches are in accordance with those already depicted (for the same N) in Figure 2. As for the other algorithms, it is interesting to observe that Pham's maximum likelihood estimate attains the same performance as the optimal BSS algorithm although it does not explicitly use the knowledge that the noise is stationary. However, Parra's ordinary least-squares gradient-descent algorithm, as well as the JADE algorithm, attained inferior performance relative to the proposed algorithms. The main reasons for the degraded performance of the LS algorithm are the suboptimal (uniform) weighting, combined with the fact that the noise stationarity is unaccounted for. The degraded performance of JADE could be easily anticipated from the fact that the nonstationarity is not exploited.
To further evaluate the behavior of the algorithms under various off-nominal conditions, we will now present empirical results for the following two cases: (1) presence of additive (noncoherent) uncorrelated white noise, in addition to the coherent noise signal v[n]; (2) non-Gaussian source/noise distributions.
In Figure 3, we demonstrate the performance in the presence of noncoherent additive noise for JADE, weighted denoising, denoising, BSS, and Pham's algorithm, all for a fixed observation length N = 996. The measurement model is augmented with additive terms w_1[n] and w_2[n] on the two sensors, where w_1[n] and w_2[n] are white, uncorrelated, Gaussian noise processes with equal variances σ_w^2. Results are displayed in terms of the ISR versus the additive noise variance σ_w^2. To generate the signals s[n] and v[n], the same model specified above was used. Note that the ISR reflects only the separation performance and not the suppression of the incoherent noise, which would still be present at the separated outputs, even if the mixing matrix were perfectly known.
All algorithms are seen to exhibit degraded performance as σ_w^2 increases. The differences in performance vanish as all curves converge into one, with the exception of the BSS algorithm, which, at high σ_w^2 levels, departs towards further degradation. It is interesting to observe, once again, that the performance of Pham's algorithm usually follows that of BSS rather closely. It is to be noted, however, that unlike Pham's algorithm, the BSS algorithm can easily be adjusted to accommodate the additional noise terms by proper parameterization thereof. In such a case, it can be expected that its performance under noisy conditions would improve significantly. However, the pursuit of such a modification is beyond the scope of this paper.
To conclude the simulation section, we explore the robustness of the algorithms with respect to the source signals' distributions. Empirical performance (in terms of ISR) is presented for all 16 combinations of the following four source/noise distributions (all zero-mean with the prescribed variances; source distributions are per segment): (1) Gaussian; (2) binary (BPSK-like); (3) uniform; (4) Laplace (double-sided exponential).
No additive (incoherent) noise was added in this experiment. All algorithms used the same data with overall observation length N = 996. The results for our three approaches (for all 16 combinations) are summarized in Tables 2, 3, and 4.
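The four tested laws can all be drawn at zero mean with a prescribed variance by scaling each distribution's natural parameter. The sketch below shows one way to do this; the helper name and normalizations are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw(dist, n, var=1.0):
    """Zero-mean samples with variance `var` from the four tested laws."""
    std = np.sqrt(var)
    if dist == "gaussian":
        return std * rng.standard_normal(n)
    if dist == "binary":                 # BPSK-like: equiprobable +/- std
        return std * rng.choice([-1.0, 1.0], size=n)
    if dist == "uniform":                # U(-c, c) has variance c^2 / 3
        c = std * np.sqrt(3.0)
        return rng.uniform(-c, c, n)
    if dist == "laplace":                # Laplace(0, b) has variance 2 b^2
        return rng.laplace(0.0, std / np.sqrt(2.0), n)
    raise ValueError(dist)

for d in ("gaussian", "binary", "uniform", "laplace"):
    x = draw(d, 200_000)
    print(d, round(float(x.mean()), 2), round(float(x.var()), 2))
```

The heavier Laplace tails make the sample correlations noisier, which is consistent with the degradation noted below for Laplace-distributed sources.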
Results are seen to be roughly insensitive to the actual source distribution, with the most notable degradation occurring when the source is Laplace distributed, in which case the estimation of its correlations becomes more erratic. Although the optimal weighting we used assumes a Gaussian distribution, departure of the actual distributions from Gaussianity does not have a severe effect on performance, at least in these tested cases. Moreover, in some cases, the mismatched weighting is compensated for by the improved accuracy of the correlation estimates, yielding improved performance.

CONCLUSION
We presented and compared two approaches for the noise cancellation problem in static mixtures of a nonstationary desired signal and stationary noise. Both approaches are based on second-order statistics. However, the BSS approach requires the solution of a nonlinear WLS problem, whereas the denoising approach only requires the solution of a linear WLS problem. Accordingly, the BSS approach attains performance superior to that of the denoising approach, at the cost of higher computational complexity.
To capture the essence of the different approaches, to simplify the exposition, and to enable a tractable analysis of performance, we concentrated on the simple 2 × 2 static-mixture model. While justified in only a limited number of applications, such a model has been the subject of extensive research in the literature, since it serves as a basis for evolving methods for the more prevalent model of dynamic mixtures. Indeed, both of the approaches presented in this paper can be extended and applied in the convolutive case, possibly exhibiting similar tradeoffs between computational complexity and performance.

APPENDIX: MODIFICATIONS FOR THE COMPLEX-VALUED CASE
For the complex-valued case, we assume that both the source signal and the noise are complex-valued circular random processes. The circularity property [13], often assumed in the context of complex random processes, implies that

E[s[n]s[m]] = 0 and E[v[n]v[m]] = 0 for all n and m.

In other words, writing v[n] = v_R[n] + j·v_I[n], where v_R[n] and v_I[n] denote the real and imaginary parts (respectively) of v[n], we have

E[v_R[n]v_R[m]] = E[v_I[n]v_I[m]] and E[v_R[n]v_I[m]] = −E[v_I[n]v_R[m]] for all n and m.

A similar property holds for s[n] in each segment. Note that (taking n = m) this implies that the real and imaginary parts at each time instant n are uncorrelated.
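The circularity property is easy to check numerically. The sketch below generates a circular complex Gaussian process (independent real and imaginary parts with equal variances σ^2/2) and verifies that the pseudo-covariance E[v v] vanishes while the ordinary covariance E[v v*] equals σ^2; sample sizes and tolerances are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Circular complex Gaussian: independent real/imaginary parts,
# each with variance sigma^2 / 2, so that E[|v|^2] = sigma^2.
sigma2 = 1.0
v = np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

pseudo_cov = np.mean(v * v)        # E[v v]  -> 0 for a circular process
cov = np.mean(v * np.conj(v))      # E[v v*] -> sigma^2
print(abs(pseudo_cov), abs(cov - sigma2))
```

For a non-circular process (e.g., purely real noise), the pseudo-covariance would instead equal the variance.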
In addition, we assume a properly normalized complex mixing matrix M(a, b) as in (4), with a = a_R + j·a_I and b = b_R + j·b_I, so there are now four real-valued parameters of interest, a_R, a_I, b_R, and b_I. The other K + 1 nuisance parameters remain unchanged (since they represent real-valued positive variances). The modifications to the BSS approach are as follows. The segmental correlation matrices are now estimated using

R̂_k = (1/N_k) Σ_{n ∈ segment k} x[n]x^H[n],

where N_k denotes the segment length and the superscript H denotes the conjugate transpose. With r̂_k = vec{R̂_k} and ŷ_k = D·r̂_k defined as in (9) and (13) (respectively), the matrix G(â, b̂) is still defined as in (15), but now b = [1  b*  |b|^2]^T and a = [|a|^2  a  1]^T. The matrix H(â, b̂, θ̂) of (18) is redefined accordingly. Therefore, the minimization w.r.t. θ still takes the form of (17), with the T superscript replaced by H. However, the Gaussian iterations take the augmented form

θ̂ = (Re{H^H W H})^{-1} Re{H^H W ŷ},

where Re{·} denotes the real part of the enclosed expression. This is the special form of a linear WLS solution obtained when using complex-valued measurements and model matrix, while constraining the estimated parameters to be real-valued.
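The real-constrained linear WLS step can be sketched as follows. The problem sizes, the identity weight matrix, and the synthetic model are illustrative assumptions; only the solution formula θ̂ = (Re{H^H W H})^{-1} Re{H^H W y} is the point of the example.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic complex-valued linear model y = H theta + noise, with a
# real-valued parameter vector theta (sizes are hypothetical).
m, p = 40, 3
H = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))
theta_true = np.array([1.0, -0.5, 2.0])
noise = 0.01 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
y = H @ theta_true + noise

W = np.eye(m)  # weight matrix (identity here for simplicity)

# Real-constrained WLS: take real parts BEFORE solving, so that the
# estimate is guaranteed real-valued.
A = np.real(H.conj().T @ W @ H)
b = np.real(H.conj().T @ W @ y)
theta_hat = np.linalg.solve(A, b)
print(np.round(theta_hat, 3))
```

Solving the unconstrained complex LS problem and then taking the real part of the result would, in general, give a different (inferior) estimate.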
As for calculating the optimal weight matrix W_opt, the only modification is to C_r,k, which is now given (still under the assumption of a complex circular Gaussian, temporally uncorrelated source signal and noise) by

C_r,k = (1/N_k) R_k^T ⊗ R_k.   (A.5)

The matrices C_y,k, C_y, and W_opt = C_y^{-1} are automatically updated accordingly.
The modifications to the denoising approach are simpler; naturally, all correlations should be estimated using conjugation, as indicated in (A.2). The linear LS problem still holds, so the estimate of the complex value of b is still given by (46), but with the T superscript replaced by H. All other procedures, including calculation of the optimal weight in (63) and (64), remain unchanged, provided that (A.5) is used for C_r,k in (64).

Figure 2: Empirical and theoretical results for the BSS, denoising, and weighted denoising approaches in terms of ISR [dB] versus the entire observation length N.

Figure 3: Empirical results for the BSS, denoising, weighted denoising, and JADE algorithms, in terms of ISR [dB] versus the incoherent additive noise variance σ_w^2.

Sharon Gannot received the B.S. degree (summa cum laude) from the Technion-Israel Institute of Technology, Haifa, in 1986, and the M.S. (cum laude) and Ph.D. degrees from Tel-Aviv University, Tel-Aviv, Israel, in 1995 and 2000, respectively, all in electrical engineering. Between 1986 and 1993, he was Head of a research and development section in the Israeli Defense Forces. In 2001, he held a post-doctoral position at the Department of Electrical Engineering (SISTA), K.U.Leuven, Leuven, Belgium. He currently holds a research and teaching position at the Signal and Image Processing Lab (SIPL), Faculty of Electrical Engineering, the Technion, Israel. His research interests include parameter estimation, statistical signal processing, and speech processing using either single- or multi-microphone arrays.

Arie Yeredor was born in Haifa, Israel, in 1963. He received the B.S. degree in electrical engineering (summa cum laude) and the Ph.D. degree from Tel-Aviv University in 1984 and 1997, respectively. From 1984 to 1990, he was with the Israeli Defense Forces (Intelligence Corps), in charge of advanced research and development activities in the fields of statistical and array signal processing. Since 1990, he has been with NICE Systems Inc., where he holds a consulting position in the fields of speech and audio processing, video processing, and emitter location algorithms. He is currently a faculty member at the Department of Electrical Engineering-Systems at Tel-Aviv University, where he teaches courses in statistical and digital signal processing, and has been awarded the Best Lecturer of the Faculty of Engineering Award for three consecutive years. His research interests include estimation theory, statistical signal processing, and blind source separation.

Table 1: ISR results [dB] attained by the different algorithms, all using the same data with N = 996.