The improvement in speech quality and intelligibility depends strongly on the accuracy of the noise power estimate. The estimators presented here are inspired by the target cancelation technique, in which the coherent target speech signal is blocked from the microphone signals to retrieve the noise components. However, the estimated noise components at the output of the blocking system are always filtered versions of the actual noise signal. A spectral correction gain, obtained via the estimated blocking filters, is thus employed in each case to undo this filtering effect.

It should also be mentioned that the assumption of complete target speech cancelation is not fulfilled in the presence of observation noise, which is the case considered in this paper. Therefore, residual speech components (called speech leakage) leak into the estimated noise, increasing the estimated noise power and possibly leading to speech distortion in the enhancement stage of Fig. 1. The speech leakage problem in blocking-based noise PSD estimators is examined in more detail in Section 6.2 of this paper.

The algorithms elaborated upon in this section are all based on square-error minimization. However, the filter structures are different for each method, cf. Fig. 2a, b, and c. All methods can be understood as different forms of subspace analysis, with different origins in the signal- or noise-subspace analysis; however, they will all be cast into the common framework of a noise PSD estimator here.

### 4.1 ITF-based adaptive blocking (ITFB)

The interaural transfer function (ITF) estimation errors, subject to minimization, are written as [41]

$$\begin{array}{*{20}l} {e}_{l}(k) &= {y}_{l}(k-\tau_{a}) - {\mathbf{\widehat{w}}_{r}}^{T}(k)\mathbf{y}_{r}(k), \\ {e}_{r}(k) &= {y}_{r}(k-\tau_{a}) - {\mathbf{\widehat{w}}_{l}}^{T}(k)\mathbf{y}_{l}(k), \end{array} $$

(12)

where the causality delay \(\tau_{a}\) has been added to ensure that the system identification problem is causal. The left-to-right and right-to-left interaural impulse responses \({\mathbf {\widehat {w}}_{i}}\), with *i*∈{*l,r*}, are then updated iteratively according to

$$\begin{array}{*{20}l} \widehat{\mathbf{w}}_{l}(k~+~1) &= \widehat{\mathbf{w}}_{l}(k)+\mu_{l}(k) e_{r}(k){\mathbf{y}}_{l}(k),\\ \widehat{\mathbf{w}}_{r}(k~+~1) &= \widehat{\mathbf{w}}_{r}(k)+\mu_{r}(k) e_{l}(k){\mathbf{y}}_{r}(k), \end{array} $$

(13)

where \(\mu _{i}(k) = {\mu _{0}}/{\mathbf {y}^{T}_{i}(k)\mathbf {y}_{i}(k)}\) is the normalized stepsize with a fixed stepsize factor \(0<\mu_{0}\leq 1\). This minimization of the respective error signal powers is in accordance with the sample-based normalized least-mean-square (NLMS) algorithm, shown here in the time domain; alternatively, it can be carried out via the more efficient frequency-domain adaptive filter (FDAF) [49]. In either case, two parallel adaptive filters are implemented to perform the minimization of the left and right error signals independently. The presence of observation noise will naturally affect the adaptive filter performance, but we rely on the general insight that the target cancelation error of LMS-type adaptive filters is theoretically several dB below the observation noise level [30, 44]. Although the actual target cancelation error depends on the stepsize of the LMS algorithm, we found stepsize factors in the range \(0.01<\mu_{0}<0.1\) to be sufficient to deduce an accurate noise PSD estimate from the error signals of the adaptive filters. With this argument, we can characterize the error signals of (12) as

$$\begin{array}{*{20}l} {e}_{i}(k) \!&= \!{x}_{i}(k-\tau_{a})\! + \!{n}_{i}(k-\tau_{a}) \! \\ &- \!\mathbf{\widehat{w}}^{T}_{j}(k)\mathbf{x}_{j}(k) \! -\! \mathbf{\widehat{w}}^{T}_{j}(k)\mathbf{n}_{j}(k), \\ &\approx \!{n}_{i}(k-\!\tau_{a})\! - \!\mathbf{\widehat{w}}^{T}_{j}(k)\mathbf{n}_{j}(k), \; \; \; \;i\neq j \in\{l,r\}. \end{array} $$

(14)
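The adaptation in (12) and (13) can be sketched in a few lines of NumPy. This is only an illustrative time-domain sketch, not the implementation used in this paper: the function name `itf_nlms_blocking`, the default filter length `L`, the delay `tau_a`, and the small regularizer `eps` in the stepsize normalization are assumptions introduced here.

```python
import numpy as np

def itf_nlms_blocking(y_l, y_r, L=32, tau_a=8, mu0=0.05, eps=1e-8):
    """Two parallel NLMS filters following (12)-(13).

    Returns the error signals e_l, e_r and the final filters w_l, w_r.
    L, tau_a, mu0 and eps are illustrative defaults (assumptions).
    """
    if not 0 <= tau_a < L:
        raise ValueError("tau_a must satisfy 0 <= tau_a < L")
    y_l = np.asarray(y_l, dtype=float)
    y_r = np.asarray(y_r, dtype=float)
    n = len(y_l)
    w_l = np.zeros(L)  # filters y_l to match the delayed right signal
    w_r = np.zeros(L)  # filters y_r to match the delayed left signal
    e_l = np.zeros(n)
    e_r = np.zeros(n)
    for k in range(L - 1, n):
        # Tap-delay vectors [y(k), y(k-1), ..., y(k-L+1)]
        yl_vec = y_l[k - L + 1:k + 1][::-1]
        yr_vec = y_r[k - L + 1:k + 1][::-1]
        # Error signals of (12) with causality delay tau_a
        e_l[k] = y_l[k - tau_a] - w_r @ yr_vec
        e_r[k] = y_r[k - tau_a] - w_l @ yl_vec
        # Normalized stepsizes; eps avoids division by zero (assumption)
        mu_l = mu0 / (yl_vec @ yl_vec + eps)
        mu_r = mu0 / (yr_vec @ yr_vec + eps)
        # Filter updates of (13)
        w_l = w_l + mu_l * e_r[k] * yl_vec
        w_r = w_r + mu_r * e_l[k] * yr_vec
    return e_l, e_r, w_l, w_r
```

In a noise-free test with a single coherent source, both error signals should decay toward zero, which corresponds to the target cancelation that the blocking structure relies on.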

By computing the PSDs of the error signals according to (5), a system of equations including the left and right noise PSDs is obtained,

$$\begin{array}{*{20}l} \widehat{\Phi}_{{e}_{l}e_{l}} &= {\Phi}_{{n}_{l}n_{l}} + \left|{{\widehat{W}}_{r}}\right|^{2} {\Phi}_{{n}_{r}n_{r}} -2\textup{Re} \left\{e^{j\frac{2\pi}{M}\lambda\tau_{a}}{\widehat{W}}_{r}{\Phi}_{{n}_{l}{n}_{r}}\right\},\\ \;\widehat{\Phi}_{{e}_{r}e_{r}} &= {\Phi}_{{n}_{r}n_{r}} + \left|{{\widehat{W}}_{l}}\right|^{2} {\Phi}_{{n}_{l}n_{l}} -2\text{Re}\left\{e^{j\frac{2\pi }{M}\lambda\tau_{a}}{\widehat{W}}_{l}{\Phi}_{{n}_{l}{n}_{r}}\right\}, \end{array} $$

(15)

with an STFT length of *M*. The PSD of the left and right noise signals, \(\widehat {\Phi }_{{n}_{l}n_{l}}\) and \(\widehat {\Phi }_{{n}_{r}n_{r}}\), respectively, can then be derived by solving the simultaneous equations in (15), and consequently, the noise distortion due to the blocking filters can be corrected. In this process, at least three different noise coherence models can be assumed: (1) uncorrelated noise, (2a) free-field spherically isotropic diffuse noise, and (2b) measured or semi-analytical head-related coherence.

#### 4.1.1 Uncorrelated noise

First, we assume that the noise signals at the left and right microphones are uncorrelated, i.e., \(\Phi _{n_{l}n_{r}}~=~\Phi _{n_{r}n_{l}}~=~0\), which is a reasonable assumption for a diffuse noise field above a cutoff frequency. Then, (15) becomes a system of linear equations in the two noise PSDs. Solving these equations yields the PSDs of the left and right noise signals as

$$\begin{array}{*{20}l} \widehat{\Phi}_{{n}_{l}n_{l}} &= \frac{\widehat{\Phi}_{{e}_{l}e_{l}} - \left|{{\widehat{W}}_{r} }\right|^{2}\widehat{\Phi}_{{e}_{r}e_{r}} }{1-\left|{{\widehat{W}}_{l}}\right|^{2}{\left|{\widehat{W}}_{r}\right|}^{2}}, \\ \widehat{\Phi}_{{n}_{r}n_{r}} &= \frac{\widehat{\Phi}_{{e}_{r}e_{r}} - \left|{{\widehat{W}}_{l}}\right|^{2}\widehat{\Phi}_{{e}_{l}e_{l}}}{1-\left|{{\widehat{W}}_{l}}\right|^{2}\left|{{\widehat{W}}_{r}}\right|^{2}}. \end{array} $$

(16)
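Because (16) is evaluated independently in every frequency bin, it maps directly onto element-wise array operations. The following sketch is an illustration under that reading; the function name `noise_psd_uncorrelated` and the guard `eps` against a vanishing determinant are assumptions not taken from the paper.

```python
import numpy as np

def noise_psd_uncorrelated(phi_el, phi_er, W_l, W_r, eps=1e-12):
    """Closed-form solution (16) of the linear system (15) for uncorrelated noise.

    phi_el, phi_er: error PSD estimates per bin; W_l, W_r: blocking filters per bin.
    """
    a_l = np.abs(W_l) ** 2
    a_r = np.abs(W_r) ** 2
    det = 1.0 - a_l * a_r
    # Guard against a (near-)zero determinant; this safeguard is an assumption
    det = np.where(np.abs(det) < eps, eps, det)
    phi_nl = (phi_el - a_r * phi_er) / det
    phi_nr = (phi_er - a_l * phi_el) / det
    return phi_nl, phi_nr
```

Inserting \(\widehat{\Phi}_{e_{l}e_{l}} = \Phi_{n_{l}n_{l}} + |\widehat{W}_{r}|^{2}\Phi_{n_{r}n_{r}}\) and the corresponding right-channel expression into this function returns the original noise PSDs, which is a convenient sanity check.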

Many practical noise signals, however, exhibit high correlation in the low-frequency range, so the premise of fully uncorrelated noise does not hold in real acoustic scenarios. Consequently, the solution based on the uncorrelated noise model underestimates the noise PSD at low frequencies, where the noise signals are correlated (not shown here). The low-frequency compensation of the noise PSD is addressed in the following section.

#### 4.1.2 Diffuse noise

To overcome the underestimation of the noise power at low frequencies, we employ the noise coherence function. The complex coherence between two noise signals is generally defined as [50]

$$ \Gamma_{{n}_{l}{n}_{r}}(\lambda,\kappa) = \frac{\Phi_{n_{l}n_{r}}(\lambda,\kappa)}{\sqrt{\Phi_{n_{l}n_{l}}(\lambda,\kappa)\Phi_{n_{r}n_{r}}(\lambda,\kappa)}}, $$

(17)

where \(\Phi _{n_{i}n_{j}}(\lambda,\kappa), i,j\in \{l,r\}\) are the cross and auto-PSDs of the noise signals, which can be estimated using a first-order recursive equation as in (5) when \(n_{l}(k)\) and \(n_{r}(k)\) are available. Substituting (17) into (15) leads to a nonlinear system of equations. To simplify the equations, the noise PSDs at the left and right ear are considered to be equal. In [41], it was shown that for measured noise signals, the assumption of equal noise PSDs at the two microphones is more plausible at low frequencies than at high frequencies. Assuming equal noise PSDs at the two microphones, i.e., \({\Phi }_{{n}_{l}n_{l}} = {\Phi }_{{n}_{r}n_{r}} = \Phi _{n}\), the cross PSD \({\Phi }_{{n}_{l}{n}_{r}}\) in (15) can consequently be expressed via the noise PSD and the coherence function, i.e., \({\Phi }_{{n}_{r}{n}_{l}} = {\Phi }_{{n}_{l}{n}_{r}} = \Gamma _{{n}_{l}{n}_{r}} {\Phi }_{{n}}\), considering that the noise coherence of a diffuse noise field is real valued. Therefore, the noise PSD estimates can be obtained as

$$\begin{array}{*{20}l} \widehat{\Phi}_{{n}_{l}n_{l}} &= \frac{\widehat{\Phi}_{{e}_{l}e_{l}}}{1+\left|{{\widehat{W}}_{r}}\right|^{2} - 2\text{Re}\left\{e^{j\frac{2\pi}{M}\lambda\tau_{a}}{\widehat{W}}_{r}\Gamma_{{n}_{l}{n}_{r}} \right\}}, \\ \widehat{\Phi}_{{n}_{r}n_{r}} &= \frac{\widehat{\Phi}_{{e}_{r}e_{r}}}{1+\left|{{\widehat{W}}_{l}}\right|^{2} - 2\text{Re}\left\{e^{j\frac{2\pi}{M}\lambda\tau_{a}}{\widehat{W}}_{l}\Gamma_{{n}_{l}{n}_{r}} \right\}}. \end{array} $$

(18)

A spectral flooring of −20 dB is additionally used in the denominator to avoid division by zero. Moreover, the following noise coherence models can be considered here: (1) free-field diffuse noise coherence, (2) the head-related coherence model [51], and (3) head-related coherence estimates. It has been observed that an accurate estimation of the noise PSD can be obtained if a good model of the noise coherence is employed. Therefore, we suggest using the 2D head-related coherence model proposed in [51].
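One line of (18), including the denominator flooring, might be evaluated as in the sketch below. Several points are assumptions made only for illustration: the function names, the interpretation of −20 dB as a power ratio of 0.01, the microphone distance of 0.17 m, and the use of the simple free-field diffuse coherence \(\sin(x)/x\) with \(x = 2\pi f d/c\) instead of the head-related model of [51] that is recommended above.

```python
import numpy as np

def diffuse_coherence(freqs_hz, mic_distance_m=0.17, c=343.0):
    """Free-field spherically isotropic coherence sin(x)/x, x = 2*pi*f*d/c.

    mic_distance_m and c are illustrative values (assumptions).
    """
    # np.sinc(t) = sin(pi*t)/(pi*t), hence t = 2*f*d/c
    return np.sinc(2.0 * np.asarray(freqs_hz, dtype=float) * mic_distance_m / c)

def noise_psd_diffuse(phi_e, W, gamma, tau_a, M, floor_db=-20.0):
    """Evaluate one line of (18) for the bins lambda = 0 .. len(W)-1.

    phi_e: error PSD, W: blocking filter of the opposite channel,
    gamma: real-valued noise coherence, M: STFT length.
    """
    lam = np.arange(len(W))
    phase = np.exp(1j * 2.0 * np.pi * lam * tau_a / M)
    denom = 1.0 + np.abs(W) ** 2 - 2.0 * np.real(phase * W * gamma)
    # Floor the denominator; -20 dB is read as a power ratio (assumption)
    floor = 10.0 ** (floor_db / 10.0)
    return phi_e / np.maximum(denom, floor)
```

The flooring becomes active when the blocking filter resembles a pure delay and the noise is fully coherent, in which case the denominator of (18) approaches zero.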

### 4.2 CR-based adaptive blocking (CRB)

The cross-relation (CR) error between the microphone signals is given, for instance in [44], as

$$ e(k) = {\widehat{\mathbf{h}}^{T}_{r}}(k){\mathbf{y}_{l}(k)} - {\widehat{\mathbf{h}}^{T}_{l}}(k){\mathbf{y}_{r}(k)}, $$

(19)

where the left and right impulse responses \(\mathbf {\widehat {h}}_{i}(k)=\left [ \widehat {h}_{i}(0)~\widehat {h}_{i}(1) ~ {\ldots } ~ \widehat {h}_{i}(L-1)\right ]^{T}\) can be determined by a stereo normalized least-mean-square (NLMS) algorithm [42, 44]:

$$\begin{array}{*{20}l} \widehat{\mathbf{h}}_{l}(k~+~1) = \widehat{\mathbf{h}}_{l}(k)~+~\mu(k) e(k){\mathbf{y}}_{r}(k), \\ \widehat{\mathbf{h}}_{r}(k~+~1) = \widehat{\mathbf{h}}_{r}(k)~-~\mu(k) e(k){\mathbf{y}}_{l}(k), \end{array} $$

(20)

where the normalized stepsize

$$ \mu(k) = \mu_{0} \left({\mathbf{y}}^{T}_{l}(k){\mathbf{y}}_{l}(k)~+~{\mathbf{y}^{T}_{r}(k)\mathbf{y}}_{r}(k)\right)^{-1} $$

(21)

governs the convergence rate of the algorithm.

The estimated impulse responses are further normalized to unit norm in each iteration of the recursive adaptation, i.e.,

$$ \widehat{\mathbf{h}}_{l}^{T}(k)\widehat{\mathbf{h}}_{l}(k) + \widehat{\mathbf{h}}_{r}^{T}(k)\widehat{\mathbf{h}}_{r}(k) = 1, $$

(22)

to avoid trivial solutions. Substituting the binaural signal model (1) into (19), we have

$$\begin{array}{*{20}l} e(k) = \:& {\widehat{\mathbf{h}}^{T}_{r}}(k)({\mathbf{x}_{l}(k) + \mathbf{n}_{l}(k)}) \\ &-{\widehat{\mathbf{h}}^{T}_{l}}(k)(\mathbf{x}_{r}(k)+ \mathbf{n}_{r}(k)). \end{array} $$

(23)
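The stereo NLMS recursion of (19) to (22) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `cr_nlms_blocking`, the regularizer `eps`, and in particular the non-zero initialization of the filters (needed because an all-zero start cannot be normalized) are assumptions.

```python
import numpy as np

def cr_nlms_blocking(y_l, y_r, L=32, mu0=0.5, eps=1e-8):
    """Cross-relation NLMS following (19)-(22).

    Returns the CR error e and the unit-norm filters h_l, h_r.
    """
    y_l = np.asarray(y_l, dtype=float)
    y_r = np.asarray(y_r, dtype=float)
    n = len(y_l)
    # Non-zero start that already satisfies (22); this choice is an assumption
    h_l = np.zeros(L)
    h_r = np.zeros(L)
    h_l[0] = h_r[0] = 1.0 / np.sqrt(2.0)
    e = np.zeros(n)
    for k in range(L - 1, n):
        # Tap-delay vectors [y(k), y(k-1), ..., y(k-L+1)]
        yl_vec = y_l[k - L + 1:k + 1][::-1]
        yr_vec = y_r[k - L + 1:k + 1][::-1]
        # Cross-relation error (19)
        e[k] = h_r @ yl_vec - h_l @ yr_vec
        # Normalized stepsize (21); eps avoids division by zero (assumption)
        mu = mu0 / (yl_vec @ yl_vec + yr_vec @ yr_vec + eps)
        # Updates (20)
        h_l = h_l + mu * e[k] * yr_vec
        h_r = h_r - mu * e[k] * yl_vec
        # Joint unit-norm constraint (22) to avoid the trivial zero solution
        norm = np.sqrt(h_l @ h_l + h_r @ h_r)
        h_l = h_l / norm
        h_r = h_r / norm
    return e, h_l, h_r
```

For a noise-free source filtered by two short channels without common zeros, the stacked vector of `h_l` and `h_r` should align (up to sign and scale) with the stacked true channels, and the error should vanish.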

Because we expect that \(\widehat {\mathbf {h}}^{T}_{r}(k)\mathbf {x}_{l}(k) \approx \widehat {\mathbf {h}}^{T}_{l}(k)\mathbf {x}_{r}(k)\) after the error signal minimization in cross-relation techniques, the speech-related part in (23) is canceled. Even when the estimated channels are altered by an unknown yet common convolutive operation, i.e., \(\widehat {h}_{i}(k) = f(k)\ast {h}_{i}(k)\) [52], this common convolutive error, which might be a drawback in blind channel identification, does not seriously affect the speech blocking performance because it applies simultaneously to both the left and right estimated impulse responses. Therefore, the error signal

$$ e(k) \approx {\widehat{\mathbf{h}}^{T}_{r}}(k)\mathbf{n}_{l}(k)-{\widehat{\mathbf{h}}^{T}_{l}}(k)\mathbf{n}_{r}(k), $$

(24)

contains the filtered noise components of the left and right microphone signals. Thus, although the error signal can be considered as an estimation of the noise signal, this estimation is biased because the left and right noise signal components are filtered by the estimated impulse responses. Transferring (24) into the PSD domain, we obtain

$$ \widehat{\Phi}_{e} = \left|{\widehat{H}_{r}}\right|^{2} \Phi_{n_{l}n_{l}}+ \left|{\widehat{H}_{l}}\right|^{2} \Phi_{n_{r}n_{r}} - 2\text{Re}\left\{\widehat{H}_{l}\widehat{H}_{r}^{\ast}\Phi_{n_{l}n_{r}}\right\}. $$

(25)

Moreover, the left and right noise PSDs are again assumed to be identical to solve the single Eq. (25), i.e., \(\Phi _{n_{r}n_{r}}~=~\Phi _{n_{l}n_{l}}~=~\Phi _{n} \). The cross PSD of the left and right noise signals is again replaced by the coherence of the noise signals, i.e., \(\Phi _{n_{l}n_{r}}~=~\Phi _{n}\Gamma _{n_{l}n_{r}}\). Thus,

$$ \widehat{\Phi}_{e} = \left|{\widehat{H}_{r}}\right|^{2} {\Phi}_{n}+ \left|{\widehat{H}_{l}}\right|^{2} \Phi_{n} - 2\textup{Re}\left\{\widehat{H}_{l}{\widehat{H}_{r}}^{\ast} \Gamma_{n_{l}n_{r}}\Phi_{n}\right\}. $$

(26)

The error PSD \(\widehat {\Phi }_{e}\) is obtained using the first-order recursive averaging according to (5), with *E*(*λ*,*κ*) being the STFT of the cross-relation error signal *e*(*k*) according to (19). By solving (26), the estimated noise PSD is obtained as

$$ \widehat{\Phi}_{n}= \frac{\widehat{\Phi}_{e}}{\left|{\widehat{H}_{r}}\right|^{2} + \left|{\widehat{H}_{l}}\right|^{2} - 2\text{Re}\left\{\widehat{H}_{l}{\widehat{H}_{r}}^{\ast} \Gamma_{n_{l}n_{r}}\right\}}. $$

(27)

To avoid division by zero, a spectral flooring is applied to limit the denominator to −20 dB.
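The correction in (27) requires the frequency responses of the time-domain filters. A possible sketch, assuming that \(\widehat{H}_{l}\) and \(\widehat{H}_{r}\) are obtained by a zero-padded length-*M* FFT of the estimated impulse responses and that −20 dB denotes a power ratio of 0.01 (both assumptions, as is the function name), is:

```python
import numpy as np

def cr_noise_psd(phi_e, h_l, h_r, gamma, M, floor_db=-20.0):
    """Noise PSD estimate (27) from the CR error PSD for bins 0 .. M//2.

    h_l, h_r: estimated impulse responses; gamma: noise coherence per bin.
    """
    # Frequency responses via zero-padded FFT (assumed mapping to STFT bins)
    H_l = np.fft.rfft(h_l, n=M)
    H_r = np.fft.rfft(h_r, n=M)
    denom = (np.abs(H_r) ** 2 + np.abs(H_l) ** 2
             - 2.0 * np.real(H_l * np.conj(H_r) * gamma))
    # Floor the denominator; -20 dB is read as a power ratio (assumption)
    floor = 10.0 ** (floor_db / 10.0)
    return phi_e / np.maximum(denom, floor)
```

With identical left and right filters and fully coherent noise, the denominator vanishes and the floor determines the result, which illustrates why the flooring is needed.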

### 4.3 PCA-based adaptive blocking (PCAB)

In this algorithm, the left and the right source-to-microphone transfer functions are identified by minimizing the error between each microphone signal and its reconstruction from an estimated source signal \(\widehat {s}(k)\) [43, 44, 53]:

$$\begin{array}{*{20}l} {e}_{l}(k) &= y_{l}(k-L) - \widehat{\mathbf{h}}_{l}^{T}\widehat{\mathbf{s}}(k), \\ {e}_{r}(k) &= y_{r}(k-L) - \widehat{\mathbf{h}}_{r}^{T}\widehat{\mathbf{s}}(k). \end{array} $$

(28)

The estimated source signal \(\widehat {\mathbf {s}}(k)\) is a vector of *L* recent successive samples \(\widehat {\mathbf {s}}(k) =\left [\widehat {s}(k) ~ \widehat {s}(k-1)~ {\ldots } \widehat {s}(k~-~L~+~1) \right ]^{T}\) resulting in a matched filter operation,

$$ \widehat{s}(k) = \widehat{\mathbf{h}}_{l}^{T{\hookleftarrow}}(k)\mathbf{y}_{l}(k) + \widehat{\mathbf{h}}_{r}^{T{\hookleftarrow}}(k)\mathbf{y}_{r}(k), $$

(29)

where (.)^{↩} denotes the time-reversed estimated impulse response. The estimated left and right impulse responses are updated in LMS fashion,

$$\begin{array}{*{20}l} \widehat{\mathbf{h}}_{l}(k~+~1) &= \widehat{\mathbf{h}}_{l}(k)+\mu(k)e_{l}(k)\widehat{\mathbf{s}}(k), \\ \widehat{\mathbf{h}}_{r}(k~+~1) &= \widehat{\mathbf{h}}_{r}(k)+\mu(k)e_{r}(k)\widehat{\mathbf{s}}(k). \end{array} $$

(30)

We can transfer (28) into the STFT domain,

$$\begin{array}{*{20}l} E_{l}(\kappa,\lambda) &= e^{-j\frac{2\pi}{M}\lambda L}Y_{l}(\kappa,\lambda)- \widehat{H}_{l}(\kappa,\lambda)\widehat{S}(\kappa,\lambda), \\ E_{r}(\kappa,\lambda) &= e^{-j\frac{2\pi}{M}\lambda L}Y_{r}(\kappa,\lambda)- \widehat{H}_{r}(\kappa,\lambda)\widehat{S}(\kappa,\lambda), \end{array} $$

(31)

and the matched filter output of (29) is

$$ \widehat{S}= e^{-j\frac{2\pi}{M}\lambda L}\widehat{H}^{*}_{l}Y_{l} + e^{-j\frac{2\pi}{M}\lambda L}\widehat{H}^{*}_{r}Y_{r}. $$

(32)

Assuming the proper transfer function estimation, i.e., \(\widehat {H}_{i} = H_{i} F\), where *F* is a common filter error [52], (32) is expressed as

$$\begin{array}{*{20}l} \widehat{S} =\: & e^{-j\frac{2\pi}{M}\lambda L}SF^{-1}\left(\left|{\widehat{H}_{l}}\right|^{2} + \left|{\widehat{H}_{r}}\right|^{2}\right) \\ & + e^{-j\frac{2\pi}{M}\lambda L} N_{l}\widehat{H}^{*}_{l} + e^{-j\frac{2\pi}{M}\lambda L} N_{r}\widehat{H}^{*}_{r}. \end{array} $$

(33)

Because the recursive algorithm in (30) can be viewed as a one-to-one translation of a frequency-domain (bin-wise) representation of adaptive PCA [54], it approximately provides a bin-wise unit norm, i.e., \( \left |{\widehat {H}_{l}}\right |^{2} + \left |{\widehat {H}_{r}}\right |^{2} \approx 1\), once convergence toward the principal components is achieved. Thus,

$$ \widehat{S} = e^{-j\frac{2\pi}{M}\lambda L}\left(F^{-1}S + N_{l}\widehat{H}^{*}_{l} + N_{r}\widehat{H}^{*}_{r}\right). $$

(34)

Again considering the binaural signal model (3) and substituting (34) back into (31), the target signal is canceled out, and the error signals consist of only the filtered noise components as follows:

$$\begin{array}{*{20}l} E_{l} &= e^{-j\frac{2\pi}{M}\lambda L}\left (N_{l}\left(1 -\left|{\widehat{H}_{l}}\right|^{2}\right) - N_{r}\widehat{H}_{l}\widehat{H}^{*}_{r}\right), \\ E_{r} &= e^{-j\frac{2\pi}{M}\lambda L}\left (N_{r}\left(1 -\left|{\widehat{H}_{r}}\right|^{2}\right) - N_{l}\widehat{H}_{r}\widehat{H}^{*}_{l}\right). \end{array} $$

(35)

By transforming (35) into the PSD domain, we have

$$ \widehat{\mathbf{\Phi}}_{e} = \mathbf{A}\mathbf{\Phi}_{n} - 2\text{Re}\left\{\widehat{H}^{\ast}_{l}\widehat{H}_{r}\right\}{\Phi}_{n_{l}n_{r}}\widehat{\mathbf{H}}', $$

(36)

where \(\widehat {\mathbf {\Phi }}_{e} =\left [ \widehat {\Phi }_{e_{l}}~ \widehat {\Phi }_{e_{r}} \right ]^{T}\) and \(\mathbf {\Phi }_{n} = \left [ \Phi _{n_{l}n_{l}} ~ \Phi _{n_{r}n_{r}} \right ]^{T}\) are a concatenation of the left and right error and noise PSDs, respectively. The matrix **A** is defined as

$$ \mathbf{A} =\left[ \begin{array}{ll} \left(1 -\left|{\widehat{H}_{l}}\right|^{2}\right)^{2} &\left|{\widehat{H}_{l}}\right|^{2}\left|{\widehat{H}_{r}}\right|^{2}\\ \left|{\widehat{H}_{l}}\right|^{2}\left|{\widehat{H}_{r}}\right|^{2}&\left(1 -\left|{\widehat{H}_{r}}\right|^{2}\right)^{2} \end{array} \right], $$

(37)

while \( \widehat {\mathbf {H}}' = \left [ 1- \left |{\widehat {H}_{l}}\right |^{2} ~ 1- \left |{\widehat {H}_{r}}\right |^{2} \right ]^{T}.\)

Due to the bin-wise norm normalization, det (**A**) is approximately zero, and thus, **A** is (nearly) singular, regardless of the position of the target speaker. To resolve the rank deficiency of **A**, the noise PSDs at the left and right ear are again assumed to be identical, i.e., \(\Phi _{n_{l}n_{l}} = \Phi _{n_{r}n_{r}} = \Phi _{n}\). Therefore, (36) is rewritten as

$$ \widehat{\mathbf{\Phi}}_{e} = \mathbf{B}{\Phi}_{n} - 2\text{Re}\{\widehat{H}^{\ast}_{l}\widehat{H}_{r}\}{\Phi}_{n_{l}n_{r}}\widehat{\mathbf{H}}', $$

(38)

with

$$ \mathbf{B} = \left[ \begin{array}{ll} \left|{\widehat{H}_{l}}\right|^{2}\left|{\widehat{H}_{r}}\right|^{2} + \left(1- \left|{\widehat{H}_{l}}\right|^{2}\right)^{2} \\ \left|{\widehat{H}_{l}}\right|^{2}\left|{\widehat{H}_{r}}\right|^{2} + \left(1- \left|{\widehat{H}_{r}}\right|^{2}\right)^{2} \end{array}\right]. $$

(39)
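The rank deficiency of **A** can be made explicit with a short worked step, added here as an illustration under the assumption that the bin-wise unit norm \(\left|\widehat{H}_{l}\right|^{2} + \left|\widehat{H}_{r}\right|^{2} = 1\) holds exactly. Then \(1-\left|\widehat{H}_{l}\right|^{2} = \left|\widehat{H}_{r}\right|^{2}\) and \(1-\left|\widehat{H}_{r}\right|^{2} = \left|\widehat{H}_{l}\right|^{2}\), so that (37) becomes

$$ \mathbf{A} =\left[ \begin{array}{ll} \left|{\widehat{H}_{r}}\right|^{4} &\left|{\widehat{H}_{l}}\right|^{2}\left|{\widehat{H}_{r}}\right|^{2}\\ \left|{\widehat{H}_{l}}\right|^{2}\left|{\widehat{H}_{r}}\right|^{2}&\left|{\widehat{H}_{l}}\right|^{4} \end{array} \right], \qquad \det(\mathbf{A}) = \left|{\widehat{H}_{l}}\right|^{4}\left|{\widehat{H}_{r}}\right|^{4} - \left|{\widehat{H}_{l}}\right|^{4}\left|{\widehat{H}_{r}}\right|^{4} = 0. $$

Under the same assumption, the vector **B** in (39) reduces to \(\left[ \left|\widehat{H}_{r}\right|^{2} ~ \left|\widehat{H}_{l}\right|^{2} \right]^{T}\), since \(\left|\widehat{H}_{l}\right|^{2}\left|\widehat{H}_{r}\right|^{2} + \left|\widehat{H}_{r}\right|^{4} = \left|\widehat{H}_{r}\right|^{2}\left(\left|\widehat{H}_{l}\right|^{2} + \left|\widehat{H}_{r}\right|^{2}\right)\), and likewise for the second entry.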

#### 4.3.1 Uncorrelated noise

Assuming uncorrelated noise, i.e., \({\Phi }_{n_{l}n_{r}}~=~{0}\), (38) will be simplified to

$$\begin{array}{*{20}l} \widehat{\mathbf{\Phi}}_{e}(\kappa,\lambda) = \mathbf{B}(\kappa,\lambda){\Phi}_{n}(\kappa,\lambda), \end{array} $$

(40)

which is an over-determined problem. Thus, (40) can be solved using least-squares [55],

$$ \widehat{\Phi}_{n} = \left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\mathbf{B}^{T}\widehat{\mathbf{\Phi}}_{e}. $$

(41)

Many practical noise situations, however, have to be modeled as diffuse noise [22], with high correlation in the low frequencies. Therefore, the noise PSD is underestimated especially at low frequencies.

#### 4.3.2 Diffuse noise

Assuming an isotropic homogeneous noise field, the noise will be correlated at low frequencies and uncorrelated at high frequencies. Under the assumption of equal noise PSD at the left and right ear and substituting \(\Phi _{n_{l}n_{r}}~= ~\Phi _{n_{r}n_{l}}~=~\Gamma _{n_{l}n_{r}} \Phi _{n} \) into (38), we have

$$\begin{array}{*{20}l} \widehat{\mathbf{\Phi}}_{e}(\kappa,\lambda) &= \mathbf{B}(\kappa,\lambda){\Phi}_{n}(\kappa,\lambda) \\ &\quad- 2\textup{Re}\left\{\widehat{H}^{\ast}_{l}\widehat{H}_{r}\right\}{\Phi}_{n}\Gamma_{n_{l}n_{r}}\widehat{\mathbf{H}}'. \end{array} $$

(42)

The noise PSD then again can be estimated by solving (42) in the least-squares sense [55] as

$$ \widehat{\Phi}_{n} = \left(\mathbf{C}^{T}\mathbf{C}\right)^{-1}\mathbf{C}^{T}\widehat{\mathbf{\Phi}}_{e}, $$

(43)

where

$$ \mathbf{C} = \mathbf{B} - 2\text{Re}\left\{\widehat{H}^{\ast}_{l}\widehat{H}_{r}\right\}\Gamma_{n_{l}n_{r}}\widehat{\mathbf{H}}'. $$

(44)
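Since **B** and **C** are 2×1 vectors in every bin, the least-squares solution reduces to a ratio of two scalar inner products. The sketch below illustrates this for the model (42) with **C** from (44); setting the coherence to zero gives **C** = **B**, i.e., the uncorrelated case (40). The function name `pcab_noise_psd` and the guard `eps` are assumptions, and the coherence is treated as real valued, as stated for diffuse noise.

```python
import numpy as np

def pcab_noise_psd(phi_el, phi_er, H_l, H_r, gamma=0.0, eps=1e-12):
    """Bin-wise least-squares noise PSD for the PCAB model.

    gamma = 0 corresponds to uncorrelated noise (C equals B of (39)).
    """
    a_l = np.abs(H_l) ** 2
    a_r = np.abs(H_r) ** 2
    cross = a_l * a_r
    # Entries of B as defined in (39)
    b1 = cross + (1.0 - a_l) ** 2
    b2 = cross + (1.0 - a_r) ** 2
    # Entries of C as defined in (44), with H' = [1-|H_l|^2, 1-|H_r|^2]^T
    g = 2.0 * np.real(np.conj(H_l) * H_r) * np.real(gamma)
    c1 = b1 - g * (1.0 - a_l)
    c2 = b2 - g * (1.0 - a_r)
    # Scalar least squares: (C^T phi_e) / (C^T C); eps guards the division (assumption)
    return (c1 * phi_el + c2 * phi_er) / np.maximum(c1 ** 2 + c2 ** 2, eps)
```

Feeding the function with error PSDs generated as \(\mathbf{C}\Phi_{n}\) returns \(\Phi_{n}\), which can serve as a quick consistency check of the model.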