# Phase reference for the generalized multichannel Wiener filter

Simon Grimm^{1}, Toby Christian Lawin-Ore^{2}, Simon Doclo^{2} and Jürgen Freudenberger^{1}

**2016**:78

https://doi.org/10.1186/s13634-016-0375-6

© The Author(s) 2016

**Received: **16 December 2015

**Accepted: **20 June 2016

**Published: **7 July 2016

## Abstract

The multichannel Wiener filter (MWF) is a well-established noise reduction technique for speech processing. Most commonly, the speech component in a selected reference microphone is estimated. The choice of this reference microphone influences the broadband output signal-to-noise ratio (SNR) as well as the speech distortion. Recently, a generalized formulation for the MWF (G-MWF) was proposed that uses a weighted sum of the individual transfer functions from the speaker to the microphones to form a better speech reference resulting in an improved broadband output SNR. For the MWF, the influence of the phase reference is often neglected, because it has no impact on the narrow-band output SNR. The G-MWF allows an arbitrary choice of the phase reference especially in the context of spatially distributed microphones.

In this work, we demonstrate that the phase reference determines the overall transfer function and hence has an impact on both the speech distortion and the broadband output SNR. We propose two speech references that achieve a better signal-to-reverberation ratio (SRR) and an improvement in the broadband output SNR. Both proposed references are based on the phase of a delay-and-sum beamformer. Hence, the time-difference-of-arrival (TDOA) of the speech source is required to align the signals. The different techniques are compared in terms of SRR and SNR performance.


## 1 Introduction

Recently, research on speech enhancement using so-called acoustic sensor networks consisting of spatially distributed microphones has gained significant interest [1–12]. Compared with a microphone array at a single position, spatially distributed microphones are able to acquire more information about the sound field. Spatially distributed microphones make it possible to employ beamforming techniques for speech quality improvement in reverberant and noisy conditions. Several methods were introduced that use a reference channel. These include the relative transfer function generalized sidelobe canceler (RTF-GSC) [13], the minimum variance distortionless response (MVDR) beamformer [14], and the speech distortion weighted multichannel Wiener filter (SDW-MWF) [15, 16].

The MWF is a well-established technique for speech enhancement. It produces a minimum-mean-squared error (MMSE) estimate of an unknown desired signal. The desired signal of the standard MWF (S-MWF) is usually the speech component in one of the microphone signals, referred to as the reference microphone signal. For spatially distributed microphones, the selection of the reference microphone may have a large influence on the performance of the MWF depending on the positions of the speech/noise sources and the microphones [5–7, 17].

With the S-MWF, the overall transfer function from the speaker to the output of the MWF equals the acoustic transfer function (ATF) from the speaker to the reference microphone. Hence, the reference microphone selection determines the amount of speech distortion. Moreover, the overall transfer function has an impact on the broadband output SNR of the MWF [17]. In [5], an MWF formulation with partial equalization (P-MWF) was presented, where the overall transfer function was chosen as the envelope of the individual ATFs with the phase of an arbitrary reference microphone. This results in a partial equalization of the acoustic system and an improved broadband output SNR. While this approach has advantages with respect to background noise reduction, the reverberation caused by the acoustic environment is not reduced.

Recently, the generalized MWF was proposed in order to improve the broadband output SNR [7] (see also [6]). With the G-MWF, the speech reference is a weighted sum of the speech components, such that the output signal has the same phase as the speech component in the reference microphone. The overall transfer function is the weighted sum of squared amplitudes of all ATFs.

In this work, we consider the phase of the speech reference. That is, we present a further generalization of the G-MWF approach in [7], which enables different phase references. We demonstrate that the phase of the speech reference shapes the overall transfer function and hence impacts the speech distortion. Moreover, the overall transfer function influences the broadband output SNR. We propose two speech references that achieve a better signal-to-reverberation ratio and an improvement in broadband output SNR. The proposed references are based on the phase of a delay-and-sum beamformer (DSB) [18].

As shown in [19], the temporal smearing and therefore the reverberation relies on the all-pass component of the overall transfer function. This suggests that a suitable phase reference can improve the output SRR of the system. As a consequence, the phase term of a delay-and-sum beamformer is applied as a phase reference of the G-MWF. Similar concepts were proposed in [20–22]. The DSB needs an estimate of the TDOA to align the signals properly. In the literature, several methods for TDOA estimation were proposed [23–30]. Many of these techniques are summarized in [29].

This work is a sequel to [21]. In addition to the concept proposed in [21], we present a new approach that combines the delay-and-sum beamformer and the P-MWF. Both approaches for the G-MWF can improve the SRR and SNR compared with the S-MWF and P-MWF. Furthermore, we present a theoretical analysis of the broadband output SNR of the G-MWF.

The paper is organized as follows: in Section 2, we introduce the signal model and notation. The G-MWF formulation and the analysis of the output SNR are presented in Sections 3 and 4, respectively. The design of the overall transfer function is explained in Section 5. The block diagram structure of the system is presented in Section 6, together with the necessary TDOA estimation and the challenge of acquiring these estimates in noisy and reverberant environments. In Section 7, the simulation results in terms of SNR and SRR improvement are given, followed by a conclusion in Section 8.

## 2 Signal model and notation

We consider \(M\) microphones. The \(i\)th microphone signal \(y_{i}(k)\) can be expressed as the convolution of the speech signal \(s(k)\) with the acoustic impulse response \(h_{i}(k)\) from the speech source to the \(i\)th microphone plus an additive noise term \(n_{i}(k)\). In the short time frequency domain, the resulting microphone signals can be written as follows

\[ Y_{i}(\kappa,\nu) = H_{i}(\nu)\,S(\kappa,\nu) + N_{i}(\kappa,\nu), \]

where \(Y_{i}(\kappa,\nu)\), \(S(\kappa,\nu)\), and \(N_{i}(\kappa,\nu)\) correspond to the short time spectra of the time domain signals. \(H_{i}(\nu)\) represents the ATF corresponding to the acoustic impulse response and \(X_{i}(\kappa,\nu)=H_{i}(\nu)S(\kappa,\nu)\) is the speech component at the \(i\)th microphone. \(\kappa\) and \(\nu\) denote the subsampled time index and the frequency bin index, respectively. In the following, these indices are often omitted when possible. The short time spectra and the ATF can be written as \(M\)-dimensional vectors:

\[
\begin{aligned}
\boldsymbol{X} &= [X_{1}, X_{2}, \ldots, X_{M}]^{T}, \\
\boldsymbol{N} &= [N_{1}, N_{2}, \ldots, N_{M}]^{T}, \\
\boldsymbol{Y} &= \boldsymbol{X} + \boldsymbol{N}, \\
\boldsymbol{H} &= [H_{1}, H_{2}, \ldots, H_{M}]^{T}.
\end{aligned}
\]

Here, \(^{T}\) denotes the transpose of a vector, \(^{*}\) the complex conjugate, and \(^{\dag}\) the conjugate transpose. Vectors and matrices are written in bold and scalars are normal letters.

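The power spectral densities (PSDs) of the noise at the \(i\)th microphone and of the speech signal are denoted by \(\Phi_{N_{i}}^{2}\) and \(\Phi_{S}^{2}\), respectively. Assuming a single speech source, the speech correlation matrix \(\boldsymbol{R}_{S}\) has rank one and therefore can be expressed as

\[ \boldsymbol{R}_{S} = \mathbb{E}\bigl\{\boldsymbol{X}\boldsymbol{X}^{\dag}\bigr\} = \Phi_{S}^{2}\,\boldsymbol{H}\boldsymbol{H}^{\dag}, \]

where \(\mathbb {E}\) denotes the mathematical expectation. Similarly, \(\boldsymbol {R}_{N}=\mathbb {E} \bigl \{ \boldsymbol {N}\boldsymbol {N}^{\dag } \bigr \}\) denotes the noise correlation matrix. It is assumed that the speech and noise terms are uncorrelated.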
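The rank-one structure of the speech correlation matrix can be checked numerically. The following is a minimal sketch for two microphones and a single frequency bin. The ATF values `H` and the PSD value `speech_psd` are made-up numbers for illustration, not data from this work.

```python
# Illustrative sketch only: H and speech_psd are hypothetical values.

def outer_conj(h):
    """Return the matrix h h^dagger as a nested list (M x M)."""
    return [[hi * hj.conjugate() for hj in h] for hi in h]

H = [0.9 + 0.2j, 0.4 - 0.6j]   # hypothetical ATFs H_1, H_2 at one frequency bin
speech_psd = 2.0               # hypothetical speech PSD at that bin

# Speech correlation matrix for a single source: speech_psd * H H^dagger
R_S = [[speech_psd * v for v in row] for row in outer_conj(H)]

# A non-zero 2 x 2 matrix has rank one exactly when its determinant is zero.
det = R_S[0][0] * R_S[1][1] - R_S[0][1] * R_S[1][0]
is_hermitian = all(
    abs(R_S[i][j] - R_S[j][i].conjugate()) < 1e-12
    for i in range(2)
    for j in range(2)
)
print("rank one:", abs(det) < 1e-12, "Hermitian:", is_hermitian)
```

The diagonal entries carry the speech PSDs at the individual microphones, and the off-diagonal entries carry the cross PSDs, whose phases encode the phase differences between the ATFs.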

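The output signal \(Z\) of the beamformer with filter coefficients \(\mathbf{G}=[G_{1},G_{2},\ldots,G_{M}]^{T}\) is obtained by filtering and summing the microphone signals, i.e.,

\[ Z = \mathbf{G}^{\dag}\boldsymbol{Y} = \mathbf{G}^{\dag}\boldsymbol{X} + \mathbf{G}^{\dag}\boldsymbol{N} = Z_{S} + Z_{N}, \]

where \(Z_{S}\) and \(Z_{N}\) denote the speech and the noise components at the beamformer output.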
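As a small illustration of the filter-and-sum operation, the sketch below evaluates the beamformer output for one time-frequency bin and splits it into its speech and noise parts. It assumes that the filter is applied as the conjugate transpose of the coefficient vector times the signal vector. All numerical values are hypothetical.

```python
# Illustrative sketch only: all coefficients and spectra are hypothetical.

def filter_and_sum(G, V):
    """Return G^dagger V for one time-frequency bin."""
    return sum(g.conjugate() * v for g, v in zip(G, V))

G = [0.5 + 0.1j, 0.3 - 0.2j]        # filter coefficients G_1, G_2
H = [0.9 + 0.2j, 0.4 - 0.6j]        # ATFs H_1, H_2
S = 1.0 - 0.5j                      # speech spectrum in this bin
N = [0.05 + 0.02j, -0.03 + 0.04j]   # noise spectra N_1, N_2

X = [h * S for h in H]              # speech components X_i = H_i S
Y = [x + n for x, n in zip(X, N)]   # microphone signals Y_i = X_i + N_i

Z = filter_and_sum(G, Y)            # beamformer output
Z_S = filter_and_sum(G, X)          # speech component at the output
Z_N = filter_and_sum(G, N)          # noise component at the output
print(Z, Z_S + Z_N)
```

Because the beamformer is linear, the two printed values agree up to rounding, which is what allows the speech distortion and the noise reduction to be analyzed separately.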

## 3 Generalized MWF

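The SDW-MWF minimizes the cost function

\[ \xi(\mathbf{G}) = \mathbb{E}\bigl\{\bigl|H_{d}S - \mathbf{G}^{\dag}\boldsymbol{X}\bigr|^{2}\bigr\} + \mu\,\mathbb{E}\bigl\{\bigl|\mathbf{G}^{\dag}\boldsymbol{N}\bigr|^{2}\bigr\}, \qquad (9) \]

where \(\mu\) is a trade-off parameter between noise reduction and speech distortion. The filter minimizing (9) is given by

\[ \mathbf{G} = \left(\boldsymbol{R}_{S} + \mu\boldsymbol{R}_{N}\right)^{-1}\boldsymbol{R}_{S}\,\mathbf{u}, \]

where **u** is a vector that selects the reference microphone, i.e., the vector **u** contains a single one and all other elements are zero. Therefore, the overall transfer function is equal to the ATF of a reference microphone, i.e., \(H_{d}=H_{\text{ref}}\).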

Since \(\boldsymbol{R}_{S}\) is a rank one matrix, it should be noted that any non-zero vector **u** achieves the same (optimal) narrow-band output SNR. In [7], the generalized MWF was presented, where the elements \(u_{i}\) of the vector **u** define a speech reference for the MWF which is a weighted sum of the speech components in the different microphones with the phase of the speech component in the reference microphone signal. The vector **u** can be used to define the desired complex-valued response as

In [7], the magnitude of the response \(\tilde {H}_{d} \) was designed to improve the broadband output SNR, whereas the phase term of \(\tilde {H}_{d}\) was set equal to the phase of the ATF in the reference microphone. In contrast to the approach in [7], we consider a complex-valued selection vector **u** which enables different phase references. In the following, we demonstrate that \(\tilde {H}_{d}\) can be considered as the overall transfer function.
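The statement that the choice of **u** does not affect the narrow-band output SNR can be verified with a small numerical experiment. The sketch below assumes the usual SDW-MWF solution \(\mathbf{G}=(\boldsymbol{R}_{S}+\mu\boldsymbol{R}_{N})^{-1}\boldsymbol{R}_{S}\mathbf{u}\) and an output formed as \(\mathbf{G}^{\dag}\boldsymbol{Y}\). The correlation matrices and both vectors **u** are hypothetical values for two microphones.

```python
# Illustrative sketch only: R_S, R_N and the vectors u are hypothetical.

def mat_vec(A, x):
    """Matrix-vector product for nested lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inv2(A):
    """Invert a 2 x 2 complex matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad_form(G, R):
    """Return the (real) quadratic form G^dagger R G."""
    RG = mat_vec(R, G)
    return sum(g.conjugate() * v for g, v in zip(G, RG)).real

def mwf(R_S, R_N, u, mu):
    """Assumed SDW-MWF solution (R_S + mu R_N)^{-1} R_S u for two microphones."""
    A = [[R_S[i][j] + mu * R_N[i][j] for j in range(2)] for i in range(2)]
    return mat_vec(inv2(A), mat_vec(R_S, u))

H = [0.9 + 0.2j, 0.4 - 0.6j]
R_S = [[2.0 * hi * hj.conjugate() for hj in H] for hi in H]   # rank one
R_N = [[1.0, 0.2 + 0.1j], [0.2 - 0.1j, 0.5]]                  # positive definite

def narrowband_snr(u, mu=1.0):
    G = mwf(R_S, R_N, u, mu)
    return quad_form(G, R_S) / quad_form(G, R_N)

snr_ref = narrowband_snr([1.0, 0.0])                  # reference microphone 1
snr_complex = narrowband_snr([0.3 - 0.7j, 1.2 + 0.4j])  # arbitrary complex u
print(snr_ref, snr_complex)
```

Both values coincide because, for a rank-one speech correlation matrix, changing **u** only rescales the filter by a complex factor (as long as **u** is not orthogonal to the ATF vector). The factor changes the overall transfer function, but it cancels in the narrow-band SNR.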

### 3.1 MWF overall transfer function

For \(\mu=0\), the overall transfer function equals \(\tilde {H}_{d}\), because \(\mathbf{G}^{\text{MVDR}}\) has a unity gain transfer function. The output signal can be written as

In the following, we consider some special cases of the G-MWF. Note that the different formulations of the G-MWF differ only with respect to the vector **u** and the corresponding transfer function \(\tilde {H}_{d}\).

### 3.2 MVDR beamformer

**u** are

However, the resulting G-MWF requires perfect knowledge about the ATF from the speaker to the microphones. The corresponding issue of blind channel estimation is a challenging task in noisy environments and so far an unsolved problem. A further issue is the inversion of the squared norm of the ATFs, since they may contain zeros in their magnitude response.

### 3.3 Selection of a reference channel

where **u** is a column vector of length *M* that selects the reference microphone, i.e., the corresponding entry is equal to one, while all other entries are equal to zero. As a result, the corresponding ATF remains as the overall transfer function.

Compared to the MVDR beamformer in Section 3.2, the advantage of the S-MWF is that it only depends on estimates of the signal statistics, i.e., \(\boldsymbol{R}_{S}\) and \(\boldsymbol{R}_{N}\), and no explicit knowledge of the ATFs is required. However, it should be noted that the output signal is as reverberant as the input signal.

### 3.4 Partial equalization approach

\(\phi_{\text{ref}}\) of an arbitrary (reference) ATF, i.e.,

**u** can be computed as

\(\boldsymbol{R}_{S}\) in the \(i\)th row and \(j\)th column. Hence, for the P-MWF, we have

Similar to the S-MWF, the P-MWF only depends on the signal statistics and therefore no explicit knowledge of the ATFs is required. It should be noted that the phase of the output speech component is equal to the phase of the reverberant speech component in the reference microphone signal. As a result, the P-MWF approach equalizes the amplitude of the desired overall transfer function, but the output signal is as reverberant as the selected microphone signal.

## 4 Output SNR

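Since the MVDR beamformer has a unity gain transfer function, the PSD of the speech component at its output equals the PSD of the speech signal \(P_{S}\). Hence, the PSD of the speech component \(Z_{S}\) at the output of the G-MWF is \(\mathbb {E} \left \{|Z_{S}(\nu)|^{2}\right \}=|G_{WF}|^{2}|\tilde {H}_{d}|^{2} P_{S}\). Similarly, the PSD of the noise component at the output of the MVDR beamformer is \(P_{N,\text {MVDR}}=\mathbf {G}_{\text {MVDR}}^{\dag }\boldsymbol {R}_{N}\mathbf {G}_{\text {MVDR}}\), such that the PSD of the noise component at the output of the G-MWF is \(\mathbb {E} \left \{|Z_{N}(\nu)|^{2}\right \}=|G_{WF}|^{2}|\tilde {H}_{d}|^{2} P_{N,\text {MVDR}}\) and the broadband output SNR is

\[ \gamma_{\text{out}} = \frac{\sum_{\nu} |G_{WF}(\nu)|^{2}\,|\tilde{H}_{d}(\nu)|^{2}\,P_{S}(\nu)}{\sum_{\nu} |G_{WF}(\nu)|^{2}\,|\tilde{H}_{d}(\nu)|^{2}\,P_{N,\text{MVDR}}(\nu)}. \]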

From this equation, it can be seen that the overall transfer function as well as the single-channel Wiener post filter impact the broadband output SNR.

Here, \(F\) denotes the total number of frequency bins. Maximizing \(\gamma_{\text{out}}\) is equivalent to solving the generalized eigenvalue problem \(\mathbf {A}\tilde {\mathbf {H}}=\lambda \mathbf {B}\tilde {\mathbf {H}}\) or \(\mathbf {B}^{-1}\mathbf {A}\tilde {\mathbf {H}}=\lambda \tilde {\mathbf {H}}\). The solution to the eigenvalue problem is the eigenvector corresponding to the largest eigenvalue \(\lambda_{\max}\). Since \(\mathbf{B}^{-1}\mathbf{A}\) is a diagonal matrix, the largest eigenvalue is

Comparing Eq. (28) with Eq. (26), we obtain the corresponding eigenvector \(\tilde {\mathbf {H}}=[0,\ldots,1,\ldots,0]^{T}\), with a one in the frequency bin corresponding to the largest eigenvalue and zero elsewhere. Although this overall transfer function maximizes the broadband output SNR, the corresponding speech distortion will not be acceptable, because only one frequency bin will pass the beamformer.

Hence, we conclude that the design of the desired response \(\tilde {H}_{d}\) requires additional constraints on the speech distortion. The optimal solution with respect to speech distortion is the MVDR beamformer which is, however, hardly attainable in practice.
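The trade-off discussed in this section can be illustrated with a toy example of four frequency bins. The sketch below takes the output speech and noise PSDs per bin as \(|G_{WF}|^{2}|\tilde{H}_{d}|^{2}P_{S}\) and \(|G_{WF}|^{2}|\tilde{H}_{d}|^{2}P_{N,\text{MVDR}}\), sums them over all bins, and forms their ratio. The PSD values are hypothetical, and the post filter is held fixed at one as a simplifying assumption.

```python
import random

def broadband_snr(H_d, G_wf, P_S, P_N_mvdr):
    """Ratio of summed output speech PSD to summed output noise PSD."""
    gains = [abs(g) ** 2 * abs(h) ** 2 for g, h in zip(G_wf, H_d)]
    speech = sum(w * p for w, p in zip(gains, P_S))
    noise = sum(w * p for w, p in zip(gains, P_N_mvdr))
    return speech / noise

# Hypothetical per-bin PSDs at the MVDR output (four bins only).
P_S = [4.0, 1.0, 0.5, 0.25]
P_N = [1.0, 0.5, 1.0, 2.0]
G_wf = [1.0] * 4                    # post filter held fixed for simplicity

narrowband = [s / n for s, n in zip(P_S, P_N)]   # per-bin SNR
best = max(range(4), key=lambda i: narrowband[i])

flat = [1.0, 1.0, 1.0, 1.0]                      # no spectral shaping
one_hot = [1.0 if i == best else 0.0 for i in range(4)]

snr_flat = broadband_snr(flat, G_wf, P_S, P_N)
snr_one_hot = broadband_snr(one_hot, G_wf, P_S, P_N)

# Random magnitude responses never exceed the single-bin solution.
random.seed(0)
trials = [[random.random() + 1e-3 for _ in range(4)] for _ in range(1000)]
never_better = all(
    broadband_snr(h, G_wf, P_S, P_N) <= snr_one_hot + 1e-12 for h in trials
)
print(snr_flat, snr_one_hot, never_better)
```

The single-bin response reaches the largest per-bin SNR, but it discards all other bins, which is why a constraint on the speech distortion is needed when designing the desired response.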

## 5 MWF reference selection

It was shown in [19] that the temporal smearing and therefore the reverberation relies on the all-pass component of the overall ATF. This suggests that a suitable phase reference can improve the output SRR. In this section, we present two formulations of the G-MWF that improve the SRR and the broadband output SNR compared with the S-MWF or the P-MWF. Both formulations use a phase reference from a DSB, which delays the microphone signals to compensate for the different times of arrival. Hence, the DSB enhances the direct path component and, as we will see in Section 7, improves the SRR.

### 5.1 Delay-and-sum beamformer

**u** can be described as

\(\tau_{i}\) is a delay (in samples), which compensates the TDOA of the direct path speech components at the microphones. The speech components are typically aligned to the microphone with the latest arrival time to obtain a causal DSB. Using (12) we obtain the overall transfer function

### 5.2 Partial equalization with DSB phase reference

**u** can be described as

Hence, the direct path speech components in the microphones are aligned, but additionally the microphone signals are weighted with the magnitude of the ATFs similar to the P-MWF approach.
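The effect of the DSB alignment on the overall impulse response can be sketched in the time domain, where each delay \(\tau_{i}\) corresponds to a linear phase term in the frequency domain. The example below aligns two short, hypothetical impulse responses to the latest direct-path arrival and averages them. The responses and arrival indices are invented for illustration, and the arrival times are assumed to be known.

```python
# Illustrative sketch only: impulse responses and arrival times are hypothetical.

def delay(h, tau, length):
    """Delay impulse response h by tau samples and zero-pad to `length`."""
    out = [0.0] * length
    for n, v in enumerate(h):
        if n + tau < length:
            out[n + tau] += v
    return out

def delay_and_sum(responses, arrivals):
    """Align every response to the latest direct-path arrival and average."""
    latest = max(arrivals)
    taus = [latest - a for a in arrivals]          # causal, non-negative delays
    length = max(len(h) + t for h, t in zip(responses, taus))
    aligned = [delay(h, t, length) for h, t in zip(responses, taus)]
    M = len(responses)
    return [sum(col) / M for col in zip(*aligned)], taus

h1 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.3, 0.0]    # direct path at n = 2
h2 = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, -0.4, 0.0, 0.0, 0.2]   # direct path at n = 4

h_dsb, taus = delay_and_sum([h1, h2], [2, 4])
print(taus)
print(h_dsb)
```

In this toy case, the direct-path taps add coherently and keep their amplitude, whereas the reflections arrive at different lags in the two channels and are halved by the averaging.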

## 6 System structure of the G-MWF

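The noise correlation matrix \(\boldsymbol{R}_{N}\) is updated. The estimate of the speech correlation matrix \(\boldsymbol{R}_{S}\) is obtained from the input correlation matrix \(\boldsymbol{R}_{Y}\) as

\[ \boldsymbol{R}_{S} = \boldsymbol{R}_{Y} - \boldsymbol{R}_{N}. \]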

Furthermore, for the phase reference proposed in Section 5, the TDOA from the speaker to the microphones is required to achieve a coherent summation of the microphone signals. Depending on the TDOA, a suitable vector **u** is derived to compensate for the phase differences of the microphone signals, as calculated in Eq. (29). A very popular TDOA estimation approach is the generalized cross correlation (GCC) method [23, 28, 29], where the cross-correlation between the microphone signals is calculated in the frequency domain as the cross power spectral density (CPSD). Depending on the application and the environmental conditions, the CPSD is typically weighted with a coherence or noise-based weighting using the magnitude spectrum of the CPSD. The weighted CPSD is transformed to the time domain using the inverse Fourier transform, resulting in the cross correlation vector. The main peak in the cross correlation vector indicates the time delay. It should be noted that the TDOA estimate is only valid in signal blocks where the speaker is active, which can be determined based on a voice activity detector (VAD).
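The GCC procedure described above can be sketched in a few lines. The example uses the phase transform (PHAT) as the CPSD weighting, which is one common choice and not necessarily the weighting used in this work. A naive discrete Fourier transform is used to keep the sketch self-contained, and the test signals are synthetic, noise-free, and free of reverberation.

```python
import cmath
import random

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (sufficient for a short demo)."""
    N = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(N):
        acc = sum(
            x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N)
        )
        out.append(acc / N if inverse else acc)
    return out

def gcc_phat(y1, y2):
    """Delay of y2 relative to y1 in samples (circular GCC, PHAT weighting)."""
    N = len(y1)
    Y1, Y2 = dft(y1), dft(y2)
    cpsd = [b * a.conjugate() for a, b in zip(Y1, Y2)]
    # PHAT: keep only the phase of the cross power spectral density.
    weighted = [c / abs(c) if abs(c) > 1e-12 else 0.0 for c in cpsd]
    cc = [v.real for v in dft(weighted, inverse=True)]
    peak = max(range(N), key=lambda n: cc[n])
    return peak if peak <= N // 2 else peak - N   # map to signed lag

# Synthetic test: a random burst, zero-padded, and a copy delayed by 3 samples.
random.seed(1)
burst = [random.gauss(0.0, 1.0) for _ in range(24)]
y1 = burst + [0.0] * 8
y2 = [0.0] * 3 + y1[:-3]

print(gcc_phat(y1, y2))
```

With real recordings, noise and reverberation produce additional peaks in the cross correlation vector, so the estimate should only be evaluated in blocks with speech activity, as noted above.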

\(\boldsymbol{R}_{Y}\) and \(\boldsymbol{R}_{N}\) [34, 35]. In [36], an approach for unbiased RTF estimation was proposed, requiring estimates of the PSDs and CPSDs of the speech and noise components, which can be obtained from the estimated speech and noise correlation matrices \(\boldsymbol{R}_{S}\) and \(\boldsymbol{R}_{N}\). The RTF estimate between microphones \(i\) and \(j\) is computed as a combination of two weighted coefficients

\(f_{i}\) and \(f_{j}\) are SNR-based weighting coefficients which are defined as

\(j\) can be calculated as

where \(\hat {w}_{\text {unbiased}}(n)\) is the *n*th element of the vector \(\hat {W}_{\text {unbiased}}\).

## 7 Simulation results

The number of frequency bins is \(F=512\). We consider a noisy car environment as well as a reverberant classroom. The signals for testing the algorithms are ITU speech signals convolved with measured impulse responses. For the car scenario, this was done with an artificial head and two cardioid microphones that were mounted close to the rear-view mirror. For the classroom scenario [37], impulse responses were recorded with a loudspeaker and omnidirectional microphones at two different spatial locations with a microphone distance of 0.5 m. The reverberation time \(RT_{60}\) of the classroom has a value between 1.5 and 1.8 s over all frequencies. To evaluate the dereverberation capabilities of the algorithms, the energy decay curves (EDCs) [38] of the resulting overall transfer functions \(\tilde {H}_{d}\) using the measured impulse responses were calculated (for \(\mu=0\)). For the car environment, the resulting EDCs are shown in Fig. 2.

Curve (a) depicts the EDC of the overall transfer function for the S-MWF. Curve (b) depicts the resulting EDC of the overall transfer function of the P-MWF. Compared with (a), it can be observed that the decay time is increased, but the energy of the first reflections is reduced due to the partial equalization as can be seen from the first 230 samples of the EDC. Curves (c) and (d) depict the EDC of the overall transfer function for the G-MWF-1 and G-MWF-2, respectively. Compared with (a) and (b), a reduced decay time is observed due to the coherent combining of the phase terms. As a result, the direct components of the ATF are enhanced, which leads to an improvement in speech quality of the overall system.
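An energy decay curve can be obtained from an impulse response by backward integration of the squared response. The following sketch shows this computation for a synthetic, exponentially decaying response. The response `h` is hypothetical and only serves to make the example self-contained.

```python
import math

def energy_decay_curve_db(h):
    """Backward integration of h^2, normalized to 0 dB at n = 0."""
    tail = 0.0
    edc = [0.0] * len(h)
    for n in range(len(h) - 1, -1, -1):
        tail += h[n] ** 2          # energy remaining from sample n onwards
        edc[n] = tail
    total = edc[0]
    return [
        10.0 * math.log10(e / total) if e > 0.0 else float("-inf") for e in edc
    ]

h = [0.8 ** n for n in range(40)]   # hypothetical decaying impulse response
edc_db = energy_decay_curve_db(h)
print(edc_db[:5])
```

A steeper curve indicates a shorter decay time, which is the property compared between curves (a) to (d).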

Here, \(h_{d}\) is the impulse response of the overall transfer function \(\tilde {H}_{d}\) in the time domain and \(n_{d}\) are the samples of the direct path. For \(n_{d}\), we considered a time interval of 8 ms after the first arrival of the direct sound. In Table 1, the direct-to-reverberant ratio (DRR) values for the different overall transfer functions \(\tilde {H}_{d}\) are presented. From the table, it can be seen that the G-MWF approaches improve the DRR in both scenarios compared with the S-MWF and P-MWF.

**Table 1** DRR of the overall transfer function for choosing a different phase and magnitude reference

| Scenario | S-MWF | P-MWF | G-MWF1 | G-MWF2 |
|---|---|---|---|---|
| Car scenario | 12.6 dB | 9.3 dB | 14.7 dB | 14.3 dB |
| Classroom scenario | −3.8 dB | −3.7 dB | −1.4 dB | −1.7 dB |
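A DRR value of this kind can be computed from an impulse response as the ratio of the energy in the direct-path window to the energy of the remaining samples. The sketch below assumes this common energy-ratio definition, takes the index of the first arrival as a given input, and uses an 8 ms window. The impulse response and the low sampling rate are hypothetical and chosen only to keep the example short.

```python
import math

def drr_db(h, onset, fs, direct_ms=8.0):
    """Energy ratio (in dB) of the direct window to the remaining tail.

    onset     : index of the first arrival of the direct sound (assumed known)
    fs        : sampling rate in Hz
    direct_ms : length of the direct-path window after the onset
    """
    n_d = onset + int(round(direct_ms * 1e-3 * fs))
    direct = sum(v * v for v in h[:n_d])
    reverb = sum(v * v for v in h[n_d:])
    return 10.0 * math.log10(direct / reverb)

fs = 1000                     # hypothetical sampling rate: 8 ms = 8 samples
h = [0.0] * 20
h[2] = 1.0                    # direct sound
h[5] = 0.5                    # early reflection inside the 8 ms window
h[12] = 0.5                   # later reflections
h[15] = 0.5

print(drr_db(h, onset=2, fs=fs))
```

The function fails if the tail contains no energy, so very short responses need separate handling in practice.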

For \(\mu>0\), the MWF performs an adaptive noise reduction and therefore the resulting overall transfer function is time varying. As a result, signal-based performance measures for the noise reduction and dereverberation performance need to be used. For the dereverberation performance, the signal-to-reverberation ratio (SRR) according to [39] is used, i.e.,

where \(s_{d}(k)\) is the direct path signal component of the first microphone and \(\hat {s}(k)\) is the output signal of the beamformer in the time domain. It should be noted that this measure is only valid for signal segments where speech activity is detected.
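A signal-based SRR can be sketched as follows. The code assumes a commonly used definition, namely the energy of the direct path signal divided by the energy of the difference between the output signal and the direct path signal, evaluated only on samples flagged as speech-active. Whether this matches the exact expression used here is an assumption. The signals and the activity mask are hypothetical, and the two signals are assumed to be time-aligned.

```python
import math

def srr_db(s_direct, s_hat, active=None):
    """Assumed SRR: direct-path energy over residual energy, in dB."""
    if active is None:
        active = [True] * len(s_direct)
    pairs = [(d, y) for d, y, a in zip(s_direct, s_hat, active) if a]
    num = sum(d * d for d, _ in pairs)
    den = sum((y - d) ** 2 for d, y in pairs)
    return 10.0 * math.log10(num / den)

s_d = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0]      # direct path component
s_hat = [1.1, -0.9, 0.9, -1.1, 0.3, -0.3]   # beamformer output
speech_active = [True, True, True, True, False, False]

print(srr_db(s_d, s_hat, speech_active))
```

Restricting the sums to speech-active samples avoids counting residual noise in speech pauses as reverberation.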

\(\mu\), where a larger value of \(\mu\) results in more noise reduction. The SRR was measured in time frames where speech was present. The performance of both G-MWF approaches is compared with the S-MWF and P-MWF. It can be observed that both G-MWF approaches outperform the S-MWF in terms of SRR and SNR. G-MWF-1 outperforms the P-MWF in terms of SRR and SNR, whereas G-MWF-2 improves the SRR compared to G-MWF-1 at the expense of a small SNR loss.

SRR and SNR comparison for different MWF formulations

| | SNR | SRR |
|---|---|---|
| S-MWF | −1.94 dB | 2.87 dB |
| P-MWF | −0.86 dB | 2.29 dB |
| G-MWF1 | −0.72 dB | 4.69 dB |
| G-MWF2 | −1.33 dB | 5.86 dB |

| | SNR | SRR |
|---|---|---|
| S-MWF | 2.82 dB | 1.66 dB |
| P-MWF | 4.35 dB | 1.81 dB |
| G-MWF1 | 4.90 dB | 3.49 dB |
| G-MWF2 | 4.25 dB | 5.08 dB |

## 8 Conclusions

For the multichannel Wiener filter, the influence of the phase reference is often neglected, because it has no impact on the narrow-band output SNR. In this work, we have shown that the phase reference influences the overall transfer function. Moreover, the overall transfer function determines the speech distortion and impacts the broadband output SNR. We have proposed two generalized formulations for the MWF where the phase reference is based on the phase of a delay-and-sum beamformer. The proposed G-MWF technique requires an estimate of the time-difference-of-arrival, which can be acquired from the estimates of the speech and noise correlation matrices. Thus, the G-MWF requires only information about the second order statistics of the signals. The presented simulation results indicate that both G-MWF versions can achieve a better signal-to-reverberation ratio and an improvement in broadband output SNR compared to previously known MWF formulations.


## Declarations

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


## References

1. S Wehr, I Kozintsev, R Lienhart, W Kellermann, in *Proceedings of IEEE Sixth International Symposium on Multimedia Software Engineering*. Synchronization of acoustic sensors for distributed ad-hoc audio networks and its use for blind source separation (IEEE, 2004), pp. 18–25.
2. S Doclo, M Moonen, T Van den Bogaert, J Wouters, Reduced-bandwidth and distributed MWF-based noise reduction algorithms for binaural hearing aids. IEEE Transac. Audio, Speech, and Language Processing. **17**(1), 38–51 (2009).
3. TC Lawin-Ore, S Doclo, in *IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*. Analysis of rate constraints for MWF-based noise reduction in acoustic sensor networks (IEEE, 2011), pp. 269–272.
4. S Stenzel, J Freudenberger, Blind matched filtering for speech enhancement with distributed microphones (2012), p. 15. Article ID 169853.
5. S Stenzel, TC Lawin-Ore, J Freudenberger, S Doclo, in *IEEE Workshop on Applications of Signal Processing to Audio and Acoustics*. A multichannel Wiener filter with partial equalization for distributed microphones (Mohonk Mountain House, New Paltz, NY, 2013).
6. TC Lawin-Ore, S Stenzel, J Freudenberger, S Doclo, in *Proceedings of the International Workshop on Acoustic Signal Enhancement (IWAENC)*. Alternative formulation and robustness analysis of the multichannel Wiener filter for spatially distributed microphones (Antibes, France, 2014), pp. 208–212.
7. TC Lawin-Ore, S Stenzel, J Freudenberger, S Doclo, in *Proc. ITG Conference on Speech Communication*. Generalized multichannel Wiener filter for spatially distributed microphones (Erlangen, Germany, 2014), pp. 1–4.
8. TC Lawin-Ore, S Doclo, Analysis of the average performance of the multichannel Wiener filter based noise reduction using statistical room acoustics. Signal Process. **107**, 96–108 (2015).
9. S Markovich-Golan, A Bertrand, M Moonen, S Gannot, Optimal distributed minimum-variance beamforming approaches for speech enhancement in wireless acoustic sensor networks. Signal Process. **107**, 4–20 (2015).
10. J Schmalenstroeer, P Jebramcik, R Haeb-Umbach, A combined hardware-software approach for acoustic sensor network synchronization. Signal Process. **107**, 171–184 (2015).
11. S Miyabe, N Ono, S Makino, Blind compensation of interchannel sampling frequency mismatch for ad hoc microphone array based on maximum likelihood estimation. Signal Process. **107**, 185–196 (2015).
12. L Wang, S Doclo, Blind compensation of interchannel sampling frequency mismatch for ad hoc microphone array based on maximum likelihood estimation. IEEE/ACM Trans. Audio, Speech Lang. Process. **24**, 571–582 (2016).
13. S Gannot, D Burshtein, E Weinstein, Signal enhancement using beamforming and nonstationarity with applications to speech. IEEE Transac. Signal Process. **49**(8), 1614–1626 (2001).
14. EAP Habets, J Benesty, I Cohen, S Gannot, J Dmochowski, New insights into the MVDR beamformer in room acoustics. IEEE Transactions on Audio, Speech, Language Process. **18**(1), 158–170 (2010).
15. J Chen, J Benesty, Y Huang, S Doclo, New insights into the noise reduction Wiener filter. IEEE Transac. Audio, Speech Lang. Process. **14**(4), 1218–1234 (2006).
16. S Doclo, A Spriet, J Wouters, M Moonen, Frequency-domain criterion for the speech distortion weighted multichannel Wiener filter for robust noise reduction. Speech Commun. **49**(7–8), 636–656 (2007).
17. TC Lawin-Ore, S Doclo, in *Proceedings of 10. ITG Symposium on Speech Communication*. Reference microphone selection for MWF-based noise reduction using distributed microphone arrays (VDE, Braunschweig, 2012), pp. 31–34.
18. JB Allen, DA Berkley, J Blauert, Multimicrophone signal-processing technique to remove room reverberation from speech signals. J. Acoust. Soc. Am. **62**(4), 912–915 (1977).
19. Q-G Liu, B Champagne, P Kabal, Room speech dereverberation via minimum-phase and all-pass component processing of multi-microphone signals. IEEE Pacific Rim Conf. Commun. Comput. Signal Process, 571–574 (1995).
20. EAP Habets, J Benesty, A two-stage beamforming approach for noise reduction and dereverberation. IEEE Transac. Audio, Speech, Language Process. **21**(5), 945–958 (2013).
21. S Grimm, J Freudenberger, in *Jahrestagung für Akustik (DEGA)*. A phase reference for a multichannel Wiener filter by a delay and sum beamformer (Nürnberg, 2015), pp. 208–212.
22. I Kodrasi, S Doclo, Joint dereverberation and noise reduction based on acoustic multichannel equalization. IEEE Transac. Audio, Speech, Lang. Process. **24**(4), 680–693 (2016).
23. C Knapp, G Carter, The generalized correlation method for estimation of time delay. IEEE Transac. Acoust. Speech Signal Process. **24**(4), 320–327 (1976).
24. GC Carter, Coherence and time delay estimation: an applied tutorial for research, development, test and evaluation engineers (IEEE Press, 1993).
25. S Doclo, M Moonen, Robust adaptive time delay estimation for speaker localization in noisy and reverberant acoustic environments. EURASIP J. Appl Signal Process. **11**, 1110–1124 (2003).
26. MS Brandstein, HF Silverman, A robust method for speech signal time-delay estimation in reverberant rooms. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. **1**, 375–378 (1997).
27. TG Dvorkind, S Gannot, Time difference of arrival estimation of speech source in a noisy and reverberant environment. Signal Process. **85**(1), 177–204 (2005).
28. J Chen, J Benesty, Y (Arden) Huang, Performance of GCC- and AMDF-based time-delay estimation in practical reverberant environments. EURASIP J. Appl. Signal Process. **2005**(1), 25–36 (2005).
29. J Chen, J Benesty, Y Huang, Time delay estimation in room acoustic environments: an overview. EURASIP J. Appl. Signal Process. **2006**, 1–19 (2006).
30. TG Manickam, RJ Vaccaro, DW Tufts, A least-squares algorithm for multipath time-delay estimation. IEEE Transac. Signal Process. **42**(11), 3229–3233 (1994).
31. S Doclo, A Spriet, M Moonen, J Wouters, in *Speech Enhancement*. Speech distortion weighted multichannel Wiener filtering techniques for noise reduction, chapter 9 (Springer, Berlin/Heidelberg, 2005).
32. J Freudenberger, S Stenzel, in *IEEE Workshop on Statistical Sig. Proc. (SSP), Nice*. Time-frequency dependent voice activity detection based on a simple threshold test (IEEE, Nice, 2011).
33. I Cohen, Relative transfer function identification using speech signals. Speech Audio Process. IEEE Transac. **12**(5), 451–459 (2004).
34. S Markovich-Golan, S Gannot, I Cohen, Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals. IEEE Transac. Audio, Speech, Lang. Process. **17**(6), 1071–1086 (2009).
35. S Markovich-Golan, S Gannot, in *IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*. Performance analysis of the covariance subtraction method for relative transfer function estimation and comparison to the covariance whitening method (IEEE, South Brisbane, 2015), pp. 544–548.
36. M Schwab, P Noll, T Sikora, in *European Signal Processing Conference (EUSIPCO)*, 2. Noise robust relative transfer function estimation (IEEE, Florence, 2006), pp. 1–5.
37. R Stewart, M Sandler, in *IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP)*. Database of omnidirectional and B-format room impulse responses (IEEE, Dallas, 2010), pp. 165–168.
38. MR Schroeder, Frequency correlation functions of frequency responses in rooms. J. Acoust. Soc. Am. **34**(2), 1819–1823 (1962).
39. PA Naylor, ND Gaubitch, *Speech Dereverberation*, 1st edn. (Springer, London, 2010).