EURASIP Journal on Applied Signal Processing 2003:11, 1074–1090
© 2003 Hindawi Publishing Corporation

Subspace Methods for Multimicrophone Speech Dereverberation

A novel approach for multimicrophone speech dereverberation is presented. The method is based on the construction of the null subspace of the data matrix in the presence of colored noise, using the generalized singular-value decomposition (GSVD) technique, or the generalized eigenvalue decomposition (GEVD) of the respective correlation matrices. The special Sylvester structure of the filtering matrix, related to this subspace, is exploited for deriving a total least squares (TLS) estimate for the acoustical transfer functions (ATFs). Other less robust but computationally more efficient methods are derived based on the same structure and on the QR decomposition (QRD). A preliminary study of incorporating the subspace method into a subband framework proves efficient, although some problems remain open. Speech reconstruction is achieved by virtue of the matched filter beamformer (MFBF). An experimental study supports the potential of the proposed methods.


INTRODUCTION
In many speech communication applications, the recorded speech signal is subject to reflections off the room walls and other objects on its way from the source to the microphones. The resulting speech signal is then called reverberated. The quality of the speech signal might deteriorate severely, and this can even cause a degradation in intelligibility. Subsequent processing of the speech signal, such as speech coding or automatic speech recognition, might be rendered useless in the presence of reverberated speech. Although single-microphone dereverberation techniques do exist, the most successful methods for dereverberation are based on multimicrophone measurements.
Spatiotemporal methods, which are directly applied to the received signals, have been presented by Liu et al. [1] and by Gonzalez-Rodriguez et al. [2]. They consist of a spatial averaging of the minimum-phase component of the speech signal and cepstrum domain processing for manipulating the all-pass component of the speech signal. Other methods use the linear prediction residual signal to dereverberate the speech signal [3,4].
Beamforming methods [5,6], which use an estimate of the related acoustical transfer functions (ATFs), can reduce the amount of reverberation, especially if some a priori knowledge of the acoustics is given. Averaging the ATFs over all the microphones proves efficient and quite robust to small speaker movements. However, if this information is not available, these methods cannot eliminate the reverberation completely. Hence, we avoid the small-movement assumption in this work, as it is not valid in many important applications.
Subspace methods appear to be the most promising methods for dereverberation. These methods consist of estimating the null subspace of the data matrix. These null subspace vectors are used to extract the ATFs (see, e.g., [7,8]). The EVAM algorithm presented by Gürelli and Nikias [9] is of special interest. As the null subspace vectors are shown to be filtered versions of the actual ATFs, extraneous zeros should be eliminated. This is done by the "fractal" method which is essentially a recursive method for successively eliminating these zeros, yielding the correct filters.
The methods presented in this contribution are also based on null subspace estimation. The special Sylvester structure of the filtering matrix is taken into account to derive several algorithms. Both fullband and subband versions are derived. A shorter preliminary conference version of the fullband methods has been published in [10].
The general dereverberation problem is presented in Section 2. The proposed method is outlined in Section 3. We start by deriving a method for constructing the null subspace in the presence of colored noise. Then, the special structure of the filtering matrix is exploited to derive a total least squares (TLS) approach for ATF estimation. Suboptimal procedures, based on the QR decomposition (QRD), are derived in Section 4. The use of decimated subbands for reducing the complexity of the algorithm and increasing its robustness is explored in Section 5. A reconstruction procedure, based on the ATFs' matched filter and incorporated into an extension of the generalized sidelobe canceller (GSC) is proposed in Section 6. The derivation of the algorithms is followed by an experimental study presented in Section 7.

PROBLEM FORMULATION
Assume a speech signal is received by M microphones in a noisy and reverberating environment. The microphones receive a speech signal which is subject to propagation through a set of ATFs and contaminated by additive noise. The M received signals are given by

z_m(t) = y_m(t) + n_m(t) = a_m(t) * s(t) + n_m(t),  (1)

where m = 1, ..., M, t = 0, 1, ..., T, z_m(t) is the mth received signal, y_m(t) is the corresponding desired signal part, n_m(t) is the noise signal received at the mth microphone, s(t) is the desired speech signal, and T + 1 is the number of samples observed. The convolution operation is denoted by *. We further assume that the ATFs relating the speech source and each of the M microphones can be modelled as FIR filters of order n_a, with taps given by

a_m^T = [a_m(0)  a_m(1)  ···  a_m(n_a)],  m = 1, 2, ..., M.  (2)

Define also the Z-transform of each of the M filters as

A_m(z) = Σ_{k=0}^{n_a} a_m(k) z^{−k}.  (3)

All the involved signals and ATFs are depicted in Figure 1. The goal of the dereverberation problem is to reconstruct the speech signal s(t) from the noisy observations z_m(t), m = 1, 2, ..., M. In this contribution, we try to achieve this goal by first estimating the ATFs a_m and then applying a signal reconstruction scheme based on these ATF estimates. Schematically, an ATF estimation procedure, depicted in Figure 2, is searched for.
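As a minimal sketch of the signal model in (1), the M reverberated and noisy microphone signals can be simulated as follows (toy dimensions and white signals, chosen for illustration only and not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative): M microphones, ATF order n_a, T+1 samples.
M, n_a, T = 3, 8, 1999
s = rng.standard_normal(T + 1)           # desired source signal s(t)
a = rng.standard_normal((M, n_a + 1))    # ATF taps a_m(0), ..., a_m(n_a)

# z_m(t) = a_m(t) * s(t) + n_m(t), truncated to the observation window.
z = np.empty((M, T + 1))
for m in range(M):
    y_m = np.convolve(a[m], s)[: T + 1]      # reverberated (noiseless) part
    n_m = 0.01 * rng.standard_normal(T + 1)  # additive sensor noise
    z[m] = y_m + n_m
```

The dereverberation task is then to recover s(t) from the rows of z, without knowledge of a.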

ATF ESTIMATION: ALGORITHM DERIVATION
In this section, the proposed algorithm is derived in several stages. First, it is shown that the desired ATFs are embedded in a data matrix null subspace. Then, the special structure of the null subspace is exploited to derive several estimation methods. We start our discussion with a special case of the problem, namely, the two-microphone noiseless case. We then proceed to the case of two microphones contaminated by colored noise, after which the general multimicrophone colored-noise case is addressed. Special treatment for the case in which only part of the null subspace vectors are determined concludes this section.

Figure 1: The general dereverberation problem.
Figure 2: ATF estimation.

Two-microphone noiseless case
In this section, we lay the foundations of the algorithm by showing that the desired ATFs are embedded in the null subspace of a signal data matrix. This proof is merely a repetition of previously established results (see, e.g., [9]), but presented in a more intuitive way.

Preliminaries
The two-microphone noiseless case is depicted in Figure 3. The noiseless signals y_m(t), as can be seen from the left-hand side of the figure, are given by

y_m(t) = a_m(t) * s(t),  m = 1, 2.  (4)

Clearly, as depicted in the right-hand side of Figure 3, the following identity holds:

e_l(t) * a_2(t) * y_1(t) − e_l(t) * a_1(t) * y_2(t) = 0,  (5)

where e_l(t), l = 0, 1, 2, ..., are arbitrary and unknown filters, the number and the order of which will be discussed in the sequel. It is evident that filtered versions of the desired ATFs, subject to the constraint that the arbitrary filters e_l(t) are common to all the microphones, might result in zero output. This observation was previously shown in [7, 8, 9]. Define the (n̂_a + 1) × (T + n̂_a + 1) single-channel data matrix

𝒴_m = [ y_m(n̂_a)    y_m(n̂_a+1)  ···  y_m(T)    0       ···  0
        y_m(n̂_a−1)  y_m(n̂_a)    ···  y_m(T−1)  y_m(T)  ···  0
        ⋮            ⋮                 ⋮         ⋮            ⋮
        y_m(0)       y_m(1)      ···  ···       y_m(T)  0 ··· 0 ].  (6)

Note that, as the ATFs order n_a is unknown, we use instead an (over)estimated value n̂_a. An estimate of the correct order will be a by-product of the proposed algorithm. We assume that the inequality n̂_a ≥ n_a holds, that is, the ATFs order is always overestimated. Define also the two-channel data matrix

𝒴 = [ 𝒴_1^T  𝒴_2^T ]^T.  (7)

The 2(n̂_a + 1) × 2(n̂_a + 1) correlation matrix of the data is thus given by R̂_y = 𝒴𝒴^T/(T + 1). Now, following [7, 9], the null subspace of the correlation matrix can be calculated by virtue of the eigenvalue decomposition (EVD). Let λ_l, l = 0, 1, ..., 2n̂_a + 1, be the eigenvalues of the correlation matrix R̂_y. Then, by sorting them in ascending order, we have

λ_l = 0,  l = 0, 1, ..., n̂_a − n_a;   λ_l > 0,  otherwise.  (8)
Thus, as proven by Gürelli and Nikias [9], the rank of the null subspace of the correlation matrix is n̂_a − n_a + 1. This rank is useful for determining the correct ATFs order n_a. We note that the singular-value decomposition (SVD) of the data matrix 𝒴 may be used instead of the EVD for determining the null subspace. The SVD is generally regarded as a more robust method. Denote the null subspace vectors (eigenvectors corresponding to zero eigenvalues or singular values) by g_l for l = 0, 1, 2, ..., n̂_a − n_a. Then, splitting each null subspace vector into two parts of equal length n̂_a + 1, we obtain

𝒢 = [ g_0  g_1  ···  g_{n̂_a−n_a} ] = [ ã_{1,0}  ã_{1,1}  ···  ã_{1,n̂_a−n_a}
                                        ã_{2,0}  ã_{2,1}  ···  ã_{2,n̂_a−n_a} ].  (9)
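The rank result above can be checked numerically. The sketch below builds a Toeplitz data matrix equivalent to (6) up to row ordering (full convolutions are used so that the zero-padding convention holds exactly at the borders), stacks the two channels as in (7), and counts the numerically zero singular values; with n̂_a = 6 and n_a = 4, the null-subspace rank should be n̂_a − n_a + 1 = 3. All names and dimensions are illustrative:

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(1)

n_a, n_hat = 4, 6                 # true and overestimated ATF orders
s = rng.standard_normal(401)      # white source signal
a1 = rng.standard_normal(n_a + 1)
a2 = rng.standard_normal(n_a + 1)
y1 = np.convolve(a1, s)           # full convolutions, so identity (5)
y2 = np.convolve(a2, s)           # holds exactly at the signal borders

def data_matrix(y, n_hat):
    """(n_hat+1) x (T+n_hat+1) data matrix; row i holds y delayed by i."""
    first_row = np.r_[y, np.zeros(n_hat)]
    first_col = np.r_[y[0], np.zeros(n_hat)]
    return toeplitz(first_col, first_row)

Y = np.vstack([data_matrix(y1, n_hat), data_matrix(y2, n_hat)])
sing = np.linalg.svd(Y, compute_uv=False)
null_rank = int(np.sum(sing < 1e-8 * sing[0]))   # numerically zero sigmas
```

For random (hence coprime) filters, `null_rank` comes out as 3, matching the lemma.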
From the above discussion, these null subspace filters may be presented as the following product:

Ã_{ml}(z) = E_l(z) A_m(z),  m = 1, 2;  l = 0, 1, ..., n̂_a − n_a.  (11)

Thus, the zeros of the filters Ã_{ml}(z) extracted from the null subspace of the data contain the roots of the desired filters as well as some extraneous zeros. This observation was proven by Gürelli and Nikias [9] and serves as the basis of their EVAM algorithm. It can be stated in the following lemma (for the general M-channel case).
Lemma 1. Let ã_{ml} be the partitions of the null subspace eigenvectors into M vectors of length n̂_a + 1, with Ã_{ml}(z) their equivalent filters. Then, all the filters Ã_{ml}(z) for l = 0, ..., n̂_a − n_a have n_a common roots, which constitute the desired ATFs A_m(z), and n̂_a − n_a different extraneous roots, which constitute E_l(z). These extraneous roots are common to all partitions of the same vector, that is, to Ã_{ml}(z) for m = 1, ..., M.
Under several regularity conditions (stated, e.g., by Moulines et al. [7]), the filters A_m(z) can be found. An observation of special interest is that common roots of the filters A_m(z) cannot be extracted by the algorithm, as they are treated as extraneous roots belonging to E_l(z). Although this is a drawback of the method, we will take advantage of it when constructing a subband structure in Section 5.
In matrix form, equation (11) may be written in the following manner. Define the (n̂_a + 1) × (n̂_a − n_a + 1) Sylvester filtering matrix (recall that we assume n̂_a ≥ n_a)

𝒜_m = [ a_m(0)    0         ···  0
        a_m(1)    a_m(0)    ···  0
        ⋮         a_m(1)    ⋱    ⋮
        a_m(n_a)  ⋮         ⋱    a_m(0)
        0         a_m(n_a)  ⋱    a_m(1)
        ⋮         ⋮         ⋱    ⋮
        0         0         ···  a_m(n_a) ].  (12)

Then,

ã_{ml} = 𝒜_m e_l,  (13)

where e_l^T = [e_l(0)  e_l(1)  ···  e_l(n̂_a − n_a)] are vectors of the coefficients of the arbitrary unknown filters E_l(z). Thus, the number of different filters (as shown in (11)) is n̂_a − n_a + 1, and their order is n̂_a − n_a. Using Figure 3 and identity (5), and denoting ℰ = [e_0  e_1  ···  e_{n̂_a−n_a}], we conclude that

𝒢 = [ 𝒜_1^T  𝒜_2^T ]^T ℰ = 𝒜 ℰ,  (15)

where ℰ is an unknown (n̂_a − n_a + 1) × (n̂_a − n_a + 1) matrix. We note that in the special case when the ATFs order is known, that is, n̂_a = n_a, there is only one vector in the null subspace, and its partitions ã_{m0}, m = 1, ..., M, are equal to the desired filters a_m up to a (common) scaling factor ambiguity. In the case where n̂_a > n_a, the actual ATFs A_m(z) are embedded in Ã_{ml}(z), l = 0, 1, ..., n̂_a − n_a. The case n̂_a < n_a cannot be treated properly by the proposed method. The special structure depicted in (15) and (12) forms the basis of our suggested algorithm.
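The Sylvester matrix of (12) simply encodes convolution: multiplying 𝒜_m by a coefficient vector e_l is the same as convolving a_m with e_l, which is the statement ã_{ml} = 𝒜_m e_l in (13). A small sketch (toy filters, illustrative names):

```python
import numpy as np

def sylvester_matrix(a, n_hat):
    """(n_hat+1) x (n_hat-n_a+1) Sylvester matrix of the filter a,
    as in (12): column c holds a delayed by c samples."""
    n_a = len(a) - 1
    A = np.zeros((n_hat + 1, n_hat - n_a + 1))
    for c in range(n_hat - n_a + 1):
        A[c : c + n_a + 1, c] = a
    return A

a = np.array([1.0, -0.5, 0.25])   # toy ATF, n_a = 2
e = np.array([2.0, 1.0])          # arbitrary filter of order n_hat - n_a = 1
A = sylvester_matrix(a, n_hat=3)

# The matrix-vector product equals the linear convolution of the two filters:
assert np.allclose(A @ e, np.convolve(a, e))
```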

Algorithm
Based on the special structure of (15) and, in particular, on the Sylvester structure of 𝒜_1 and 𝒜_2 in (12), we now derive an algorithm for finding the ATFs A_m(z).
Note that ℰ in (15) is a square and arbitrary matrix, implying that its inverse usually exists. Denote this inverse by

ℰ^i ≜ ℰ^{−1},  (16)

and denote its columns by ℰ^i = [e^i_0  e^i_1  ···  e^i_{n̂_a−n_a}]. Right-multiplying (15) by ℰ^i gives 𝒢ℰ^i = 𝒜, that is, 𝒢e^i_l equals the lth column of 𝒜, which is the stacked coefficient vector [a_1^T  a_2^T]^T shifted in accordance with the Sylvester structure. Equation (16) can then be rewritten as the homogeneous set

𝒢̂ x = 0,  (17)

where 0 is an all-zeros vector, 0^T = [0 0 ··· 0], 𝒢̂ is defined as

𝒢̂ = [ 𝒢  𝒪  ···  𝒪  −ℐ^(0)
       𝒪  𝒢  ···  𝒪  −ℐ^(1)
       ⋮   ⋮   ⋱   ⋮   ⋮
       𝒪  𝒪  ···  𝒢  −ℐ^(n̂_a−n_a) ],  (18)

and the vector of unknowns is defined as

x^T = [ (e^i_0)^T  (e^i_1)^T  ···  (e^i_{n̂_a−n_a})^T  a_1^T  a_2^T ].  (19)

Here, 𝒪 is a 2(n̂_a + 1) × (n̂_a − n_a + 1) all-zeros matrix, and ℐ^(l), l = 0, 1, ..., n̂_a − n_a, is a fixed shifting matrix, block diagonal with two identical blocks S^(l),

S^(l) = [ 𝒪_{l×(n_a+1)}
          I_{(n_a+1)×(n_a+1)}
          𝒪_{(n̂_a−n_a−l)×(n_a+1)} ],

where I_{(n_a+1)×(n_a+1)} is the (n_a + 1) × (n_a + 1) identity matrix. A nontrivial (and exact) solution of the homogeneous set (17) may be obtained by finding the eigenvector of the matrix 𝒢̂ corresponding to its zero eigenvalue. The ATF coefficients are given by the last 2(n_a + 1) terms of this eigenvector. The first part of the eigenvector comprises the nuisance parameters e^i_l, l = 0, 1, ..., n̂_a − n_a. In the presence of noise, this somewhat nonstraightforward procedure will prove to be useful.

Two microphone noisy case
Recall that 𝒢 is a matrix containing the eigenvectors corresponding to zero eigenvalues of the noiseless data matrix. In the presence of additive noise, the noisy observations z_m(t), given in (1), can be stacked into a data matrix fulfilling

𝒵 = 𝒴 + 𝒩,

where 𝒵 and 𝒩 are the noisy-signal and noise-only data matrices, respectively, constructed from z_m(t) and n_m(t) exactly as 𝒴 is constructed from y_m(t) in (6) and (7). Now, for a long observation time, the following approximation holds:

R̂_z ≈ R̂_y + R̂_n,

where R̂_z = 𝒵𝒵^T/(T + 1) and R̂_n = 𝒩𝒩^T/(T + 1) are the noisy-signal and noise-only correlation matrices, respectively. Now, (17) will not be accurate anymore. First, the null subspace matrix 𝒢 should be determined in a slightly different manner than suggested in (8). The white noise and colored noise cases are treated separately in the sequel. Second, the matrix 𝒢̂ will in general not have an eigenvalue of value 0. A reasonable, although not exact, approximation for the solution is obtained by transforming (17) into the following problem:

𝒢̂ x = μ,

where μ is an error term, which should be minimized. To obtain this minimization, the eigenvector corresponding to the smallest eigenvalue is chosen, and the desired ATFs are obtained from the last part of the vector (as in the noiseless case). Note that this is exactly the total least squares (TLS) approach for estimating the parameters. As the matrix 𝒢̂ is highly structured, more efficient structured total least squares (STLS) methods [11] are called for. This issue is not treated further in this work.
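The TLS step itself reduces to a standard computation: the minimizer of ‖𝒢̂x‖ over unit-norm x is the right singular vector associated with the smallest singular value of 𝒢̂. The toy check below (a synthetic matrix, not the structured 𝒢̂ of (18)) recovers a known null direction from a slightly perturbed rank-deficient matrix:

```python
import numpy as np

def tls_null_vector(G_hat):
    """TLS solution of G_hat @ x ~= 0: the right singular vector
    belonging to the smallest singular value."""
    _, _, Vt = np.linalg.svd(G_hat)
    return Vt[-1]

rng = np.random.default_rng(2)
x_true = np.array([1.0, -2.0, 0.5])
x_true /= np.linalg.norm(x_true)

# Rows (nearly) orthogonal to x_true: an exact null direction, plus noise.
basis = np.linalg.svd(x_true[None, :])[2][1:]    # orthonormal complement
G_hat = rng.standard_normal((20, 2)) @ basis
G_hat += 1e-6 * rng.standard_normal(G_hat.shape)

x_hat = tls_null_vector(G_hat)
err = min(np.linalg.norm(x_hat - x_true), np.linalg.norm(x_hat + x_true))
```

The recovered direction matches x_true up to the unavoidable sign (scaling) ambiguity.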

White noise case
In the case of spatiotemporally white noise, that is, R̂_n ≈ σ²I, where I is the identity matrix, the first n̂_a − n_a + 1 eigenvalues in (8) will be σ² instead of zero. The corresponding eigenvectors remain intact. Thus, the algorithm remains unchanged.

Colored noise case
The case of a nonwhite noise signal was addressed in [7, 9]. In contrast to the noise balancing method presented in [9] and the prewhitening of the noise correlation matrix presented in [7], the problem is treated here more rigorously, by application of the generalized eigenvalue decomposition (GEVD) or the generalized singular-value decomposition (GSVD). These alternative methods are also computationally more efficient. We suggest using the GEVD of the measurement correlation matrix R̂_z and the noise correlation matrix R̂_n (usually, the latter is estimated from speech-free data segments). The null subspace matrix 𝒢 is formed by choosing the generalized eigenvectors related to the generalized eigenvalues of value 1. Alternatively, the GSVD of the corresponding data matrices 𝒵 and 𝒩 can be used. After determining the null subspace matrix, the subsequent steps of the algorithm remain intact.
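A sketch of the GEVD step (with illustrative synthetic correlation matrices; in the algorithm, R̂_z and R̂_n would come from the data). On the null subspace of the noiseless correlation matrix, R̂_z v = (R̂_y + R̂_n)v = R̂_n v, so the generalized eigenvalues there equal 1 exactly:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)

d, r = 8, 5                           # dimension, rank of the "signal" part
Hs = rng.standard_normal((d, r))
R_y = Hs @ Hs.T                       # rank-r noiseless correlation
Hn = rng.standard_normal((d, d))
R_n = Hn @ Hn.T + 0.1 * np.eye(d)     # full-rank colored-noise correlation
R_z = R_y + R_n

# Generalized eigenvalues of (R_z, R_n), ascending; null directions give 1.
lam, V = eigh(R_z, R_n)
mask = lam < 1.0 + 1e-6               # eigenvalues of value (numerically) 1
G = V[:, mask]                        # null subspace matrix
null_dim = int(mask.sum())
```

The remaining eigenvalues exceed 1, because R̂_y is positive semidefinite; here `null_dim` equals d − r = 3.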

Multimicrophone case (M > 2)
In the multimicrophone case, a reasonable extension is based on channel pairing (see [9]): each pair of channels (m, m'), m ≠ m', satisfies a relation of the form (5), and the corresponding pairs of data matrices are stacked, padded by all-zero blocks 𝒪, into one augmented data matrix, where 𝒪 here is an (n̂_a + 1) × (T + n̂_a + 1) all-zero matrix. This data matrix, as well as the correspondingly constructed noise matrix, can be used by either the GEVD or the GSVD method to construct the null subspace. Denoting this null subspace by 𝒢, we can construct a new TLS equation

𝒢̂ x = μ,  (24)

where 𝒢̂ is constructed from 𝒢 in a similar way as in (18). The vector of unknowns is given by

x^T = [ (e^i_0)^T  ···  (e^i_{n̂_a−n_a})^T  a_1^T  a_2^T  ···  a_M^T ].

Note that the last M(n_a + 1) terms of x are the required filter coefficients a_m, m = 1, 2, ..., M.

Partial knowledge of the null subspace
In the noisy case, especially when the dynamic range of the input signal s(t) is high (which is the case for speech signals), determination of the null subspace might be a troublesome task. As there are no exactly zero eigenvalues, and as some of the eigenvalues are small due to the input signal, the borderline between the signal eigenvalues and the noise eigenvalues becomes vague. As the number of actual null subspace vectors is not known in advance, using only a subgroup of the eigenvectors, associated with the smallest eigenvalues, might increase the robustness of the method. Based on Lemma 1, it is obvious that, in the noiseless case, even two null subspace vectors are sufficient for estimating the ATFs, just by extracting their common zeros. Denote by L̃ < n̂_a − n_a + 1 the number of eigenvectors used. The matrix ℰ in (15) is then of dimensions (n̂_a − n_a + 1) × L̃ and is thus noninvertible.
To overcome this problem, we suggest concatenating several shifted versions of (15) into an extended relation 𝒢̃ = 𝒜̃ ℰ̃. The new dimensions of ℰ̃ are (n̂_a − n_a + 1 + l̃) × L̃(l̃ + 1), where l̃ is the number of blocks added: each block adds 1 to the row dimension and L̃ to the column dimension. The matrix 𝒜̃ has a structure similar to that of 𝒜 in (12) and (15), but with more columns. The resulting matrix ℰ̃ now has more columns than rows and thus can generally be (right-)pseudoinverted, resulting in

𝒢̃ ℰ̃^† = 𝒜̃.

Now, the extended matrix 𝒢̃ can be used instead of 𝒢 in (24) to construct 𝒢̂ in a manner similar to (18). The subsequent stages of the algorithm remain intact.

A SUBOPTIMAL METHOD: THE QR DECOMPOSITION AND ESTIMATES AVERAGING
Recall that the special structure of the filtering matrix 𝒜 was the basis for the TLS approach. In this section, a new method for estimating the ATFs is derived, which is computationally more efficient although less robust. We rely again on the fact that each column of the Sylvester matrix is a delayed version of the previous one. Thus, in the noiseless case, it is enough to extract one of the columns. In the noisy case, each column may differ slightly, so extracting all the columns yields several slightly different estimates. We can take the median (or average) of these estimates to increase the robustness.

Complete knowledge of the null subspace
Apply the transpose operation to (15):

𝒢^T = ℰ^T 𝒜^T.

As ℰ^T is an arbitrary matrix, it will usually have a QRD. Denote ℰ^T = Q_ℰ R_ℰ. Then,

𝒢^T = Q_ℰ R_ℰ 𝒜^T = Q_ℰ R_𝒢,  (33)

where R_𝒢 = R_ℰ 𝒜^T is also an upper triangular matrix, since it is the product of two upper triangular matrices. Since the QRD is unique, equation (33) constitutes the QRD of 𝒢^T. As R_ℰ is a square and upper triangular matrix, it has only one nonzero element in its last row. Therefore, the last row of R_𝒢 is a scaled version of the last column of 𝒜. This last column consists of a concatenation of the vectors a_m, m = 1, 2, ..., M, each preceded by n̂_a − n_a zeros.

[Algorithm 1 (excerpt): define A(t, e^{jω}) = [A_1(t, e^{jω})  A_2(t, e^{jω})  ···  A_M(t, e^{jω})]; the fixed beamformer (FBF) is W_0(t, e^{jω}) = A(t, e^{jω}) / ‖A(t, e^{jω})‖², with FBF output Y_FBF(t, e^{jω}) = W_0^†(e^{jω}) Z(t, e^{jω}).]

For extracting the other columns of the matrix 𝒜, we use rotations of the null subspace matrix 𝒢. Note that the QRD procedure extracts the rightmost column of 𝒜 regardless of its Sylvester structure. Define the K × K row-rotation matrix

J_K = [ 0^T      1
        I_{K−1}  0 ],

and denote by J^k_K its kth power. It is obvious that left multiplication of a K-row matrix by J^k_K rotates its rows downwards k times, while right multiplication of an L-column matrix by (J^l_L)^T rotates its columns rightwards l times. Lemma 2 can now be used to extract an estimate of the ATFs.

Lemma 2. Compute the QRD of the transpose of the k-times (k ≤ n̂_a − n_a + 1) row-rotated null subspace matrix 𝒢. The last row of the R matrix equals the kth column (counting from the rightmost column) of the filtering matrix 𝒜, up to a scaling factor.
The proof of this lemma follows.
Proof. Rotate the M(n̂_a + 1) × (n̂_a − n_a + 1) null subspace matrix 𝒢 not more than n̂_a − n_a + 1 times. Then,

J^k_{M(n̂_a+1)} 𝒢 = J^k_{M(n̂_a+1)} 𝒜 ℰ.

Exploiting the orthogonality of the matrices J^k_K, we have

J^k_{M(n̂_a+1)} 𝒢 = ( J^k_{M(n̂_a+1)} 𝒜 (J^k_{n̂_a−n_a+1})^T ) ( J^k_{n̂_a−n_a+1} ℰ ).

Then, applying the transpose operation,

( J^k_{M(n̂_a+1)} 𝒢 )^T = ( J^k_{n̂_a−n_a+1} ℰ )^T ( J^k_{M(n̂_a+1)} 𝒜 (J^k_{n̂_a−n_a+1})^T )^T.

Now assume a QRD for the first term (although ℰ is not known),

( J^k_{n̂_a−n_a+1} ℰ )^T = Q̃_ℰ R̃_ℰ.

Then,

( J^k_{M(n̂_a+1)} 𝒢 )^T = Q̃_ℰ R̃_ℰ ( J^k_{M(n̂_a+1)} 𝒜 (J^k_{n̂_a−n_a+1})^T )^T.

The last row of ( J^k_{M(n̂_a+1)} 𝒜 (J^k_{n̂_a−n_a+1})^T )^T corresponds to the kth column (counting from the rightmost column) of 𝒜, provided that k ≤ n̂_a − n_a + 1, and this matrix is still upper triangular. Thus, the same statements made for the nonrotated matrices apply to the rotated matrices.
By rotating through all the columns of the matrix 𝒜, several estimates of the desired filter are obtained. An average or a median of these estimates can be used to obtain a more robust estimate.
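The core QRD property (the k = 0 case of Lemma 2: the last row of R is a scaled copy of the rightmost column of 𝒜) can be verified on a toy two-channel null subspace, built directly as 𝒢 = 𝒜ℰ with a random invertible ℰ:

```python
import numpy as np

rng = np.random.default_rng(4)

def sylvester(a, n_hat):
    """Sylvester matrix of (12): column c holds a delayed by c samples."""
    n_a = len(a) - 1
    A = np.zeros((n_hat + 1, n_hat - n_a + 1))
    for c in range(n_hat - n_a + 1):
        A[c : c + n_a + 1, c] = a
    return A

n_a, n_hat = 3, 5
a1 = rng.standard_normal(n_a + 1)
a2 = rng.standard_normal(n_a + 1)
A = np.vstack([sylvester(a1, n_hat), sylvester(a2, n_hat)])   # stacked A
E = rng.standard_normal((n_hat - n_a + 1, n_hat - n_a + 1))   # arbitrary E
G = A @ E                                                     # null subspace

_, R = np.linalg.qr(G.T)        # QRD of G^T, as in (33)
est = R[-1]                     # last row of R
ref = A[:, -1]                  # rightmost column of A
scale = est @ ref / (ref @ ref)
assert np.allclose(est, scale * ref, atol=1e-8)
```

Rotating the rows of G with J^k before the QRD extracts the other columns in the same way; averaging those (rescaled) estimates gives the robustified version described above.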

Partial knowledge of the null subspace
As in the TLS approach, we may want to use only part of the null subspace vectors. Assume that we have only two of these null subspace vectors,

𝒢̆ = 𝒜 ℰ̆,

where 𝒢̆ is an M(n̂_a + 1) × 2 matrix and ℰ̆ is an (n̂_a − n_a + 1) × 2 matrix. Since ℰ̆ is not a square matrix, the algorithm of Section 4.1 is not applicable anymore. Let

𝒢̆^T = [ ã^T_{1,0}  ã^T_{2,0}  ···  ã^T_{M,0}
         ã^T_{1,1}  ã^T_{2,1}  ···  ã^T_{M,1} ].

Each of the vectors ã_{m,l} represents a null subspace filter of order n̂_a. Since there are only two rows, applying the QRD to 𝒢̆^T yields an R_𝒢̆ matrix whose second row is of the form

[ ă^T_{1,1}  ă^T_{2,1}  ···  ă^T_{M,1} ].

Note that the new filters ă_{m,1} have an order lower by 1 than the corresponding filters ã_{m,1}. As the first row of R_𝒢̆ is not needed, it is not presented. To further reduce the order by virtue of another QRD application, we need another set of filtered versions of the ATFs. This set may be obtained in several ways. One possibility (although others are also applicable) is to rotate each part of 𝒢̆, that is, each ã_{m,l}, downwards, and to apply the QRD again. After this two-step stage, we obtain a shorter null subspace. This process is repeated n̂_a − n_a times, until the correct order is reached and only a common scale factor ambiguity remains. This method has an appealing structure, since the extraneous roots are eliminated recursively, one in each stage of the algorithm. Each stage of the recursion is similar to the previous one. This property resembles the "fractal" nature of the EVAM algorithm [9].

SUBBAND METHOD
The proposed method, although theoretically supported, can have several drawbacks in real-life scenarios. First, actual ATFs in real room environments may be very long (1000-2000 taps are common in medium-sized rooms). In such a case, the GEVD procedure is not robust enough, as it is quite sensitive to small errors in the null subspace matrix [13]. Furthermore, the matrices involved become extremely large, imposing huge memory and computation requirements. Another problem is the wide dynamic range of the speech signal. This may result in erroneous estimates of the frequency response of the ATFs in the low-energy parts of the input signal. Thus, frequency-domain approaches are called for. In this section, we suggest incorporating the TLS subspace method into a subband structure. The use of subbands for splitting adaptive filters, especially in the context of echo cancellation, has gained recent interest in the literature [14, 15, 16, 17]. However, the use of subbands in subspace methods is not that common. The design of the subbands is of crucial importance, and special emphasis should be given to adjusting the subband structure to the problem at hand. In this contribution, we only aim at demonstrating the ability of the method; thus, only a simple eight-channel subband structure was used, as depicted in Figure 4. Each of the channel filters is an FIR filter of order 150. The filters are equispaced along the frequency axis and are of equal bandwidth. Now the M microphone signals are filtered by the subband structure. The subspace methods presented above can be applied to each subband signal separately. Although the resulting subband signal corresponds to a longer filter (namely, the convolution of the corresponding ATF and the subband filter), the algorithm aims at reconstructing the ATF alone, ignoring the filterbank roots. This is due to the fact that the subband filter is common to all channels.
Recall that subspace methods are blind to common zeros, as discussed in Section 3. To properly exploit the benefits of the subband structure, each subband signal should be decimated. We used a critically decimated filterbank, that is, the decimation factor equals the number of bands. By doing so, the ATF order in each band is reduced by approximately the decimation factor, making the estimation task easier. Note that we now need only to overestimate the reduced order of the ATFs in each subband, rather than the fullband order. Another benefit arises from the decimation: the signals in each subband are flatter in the frequency domain, that is, whiter, which lowers the dynamic range and improves performance. After estimating the decimated ATFs, they are combined using a proper synthesis filterbank, comprised of interpolation followed by filtering with a filterbank similar to the analysis subband filters. The overall subband system is depicted in Figure 5, where the ATF estimation block is shown schematically in Figure 2. Gain ambiguity may be a major drawback of the subband method. Recall that all the subspace methods estimate the ATFs only up to a common gain factor. In the fullband scheme, this does not impose any problem, since it results in an overall scaling of the output. In the subband scheme, the gain factor is common to all microphone channels within a subband but generally differs from band to band. Thus, the estimated ATFs (and the reconstructed signal) are actually filtered by a new arbitrary filter, which can be regarded as a new reverberation term. Several methods can be applied to overcome this gain ambiguity problem. First, the original gain of the signals in each subband may be restored as an approximate gain adjustment.
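A rough sketch of the analysis side (the filter design here is a plain firwin bandpass bank, only loosely standing in for the filterbank of Figure 4; the band count, filter order, and decimation factor follow the text):

```python
import numpy as np
from scipy.signal import firwin, lfilter

rng = np.random.default_rng(5)

n_bands, order, D = 8, 150, 8            # 8 bands, order-150 FIRs, decimate by 8
edges = np.linspace(0, 1, n_bands + 1)   # normalized band edges (Nyquist = 1)
bank = [firwin(order + 1, [max(lo, 1e-3), min(hi, 1 - 1e-3)], pass_zero=False)
        for lo, hi in zip(edges[:-1], edges[1:])]

z = rng.standard_normal(4000)            # one microphone signal
# Analysis: filter, then critically decimate; the subspace method would now
# run on each (much shorter-ATF) subband signal separately.
subbands = [lfilter(h, 1.0, z)[::D] for h in bank]
```

Each decimated subband signal is eight times shorter, and the ATF order to be estimated in each band drops by roughly the same factor.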
Another method was suggested by Rahbar et al. [18]. This method constrains the resulting filters to have as few taps as possible (actually, the filters are constrained to be FIR). The order of these filters should be determined in advance. As we do not have this information, we suggest using the ATF order estimate obtained by the subspace method. The use of this method is a topic for further research. As far as this paper goes, the gain ambiguity problem is ignored, and we assume that the gain in each subband is known. Thus, we only demonstrate the ability of the method to estimate the frequency shaping of the ATFs within each band.

SIGNAL RECONSTRUCTION
Having the estimated ATFs, we can invert them and apply the inverse filter to the received signals to obtain the desired signal estimate. A method for inverting multichannel FIR filters by a multichannel set of FIR filters is presented in [19].
We use instead a frequency-domain method. Rewrite (1) in the (short-time) frequency domain:

Z_m(t, e^{jω}) = A_m(t, e^{jω}) S(t, e^{jω}) + N_m(t, e^{jω}),  m = 1, ..., M.

Given the ATF estimates Â_m(t, e^{jω}), the matched filter beamformer (MFBF) output is

Y_FBF(t, e^{jω}) = ( Σ_{m=1}^{M} Â*_m(t, e^{jω}) Z_m(t, e^{jω}) ) / ( Σ_{m=1}^{M} |Â_m(t, e^{jω})|² ).  (45)

It is easily verified that if the estimation of the ATFs is sufficiently accurate, that is, Â_m(t, e^{jω}) ≈ A_m(t, e^{jω}), then the first term in (45) becomes S(t, e^{jω}) and dereverberation is obtained. The second term is a residual noise term, which can even be amplified by the procedure. To achieve a better estimate of the speech signal when noise is present, we suggest incorporating the procedure into the recently proposed extended GSC, derived by Gannot et al. [20], shown schematically in Figure 6 and summarized in Algorithm 1. This GSC-based structure enables the use of general ATFs, rather than delay-only filters, in order to dereverberate the speech signal and to reduce the noise level. It consists of a fixed beamformer branch, which is essentially the MFBF described in (45); a noise reference construction block, which uses the ATF ratios (note that U_m(t, e^{jω}) contain only noise terms); and a multichannel noise canceller branch (consisting of the filters G_m(t, e^{jω})). The use of the GSC structure is only essential when the noise level is relatively high; otherwise, the MFBF branch alone produces a sufficiently accurate estimate.
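A sketch of the MFBF of (45) in the STFT domain (toy signals; for simplicity the true frequency responses stand in for the estimates Â_m, and the usual multiplicative STFT approximation of the convolution is used):

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(6)

M, T = 3, 8000
s = rng.standard_normal(T)
a = rng.standard_normal((M, 9))                  # short illustrative ATFs
z = np.stack([np.convolve(a[m], s)[:T] for m in range(M)])

nfft = 256
A_hat = np.fft.rfft(a, nfft)                     # stand-in ATF "estimates"
_, _, Z = stft(z, nperseg=nfft)                  # Z: M x freq x frames

# Y_FBF = sum_m conj(A_m) Z_m / sum_m |A_m|^2, per (45)
W = A_hat.conj() / np.maximum(np.sum(np.abs(A_hat) ** 2, axis=0), 1e-12)
Y = np.einsum('mf,mft->ft', W, Z)
_, s_hat = istft(Y, nperseg=nfft)                # dereverberated estimate
```

With accurate ATF estimates, the signal term of Y reduces to S(t, e^{jω}); any residual noise would pass through scaled by the same matched filter, which is why the extended GSC is preferred at low SNR.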

EXPERIMENTAL STUDY
The validity of the proposed methods was tested using various input signals and ATFs modelled as FIR filters with an exponentially decaying envelope, and compared with the EVAM algorithm [9]. The input signal consisted of either white noise, speech-like noise (a white signal colored to have a speech-like spectrum, drawn from the NOISEX-92 database [21]), or a real speech signal comprised of a concatenation of several speech signals drawn from the TIMIT database [22]. The input signal was 32000 samples long (corresponding to 4 seconds for the 8-kHz sampled speech signal, including silence periods). Three microphone signals were simulated by filtering the input signal by various ATFs. In most applications, the order of realistic ATFs is at least several hundred taps. However, our method (as well as other methods in the literature) is incapable of dealing with such an order. Thus, we settled for short ATFs of order 8, 16, or 32. To approximate a more realistic scenario, we used ATFs with an exponentially decaying envelope. Various SNR levels were used to test the robustness of the algorithms to additive noise. Temporally nonwhite but spatially white (i.e., no correlation between noise signals at the microphones) noise signals were used. The noise correlation matrix was estimated using signals drawn from the same source but at different segments. The basic fullband algorithm (using all the null subspace vectors or only part of them) as well as the QRD-based algorithms (again, with all the vectors or only part of them) were tested and compared with the state-of-the-art EVAM algorithm. Then, the subband-based algorithm was evaluated to confirm its ability to handle longer filters. The gain ambiguity problem was not addressed in this experimental study, and the gain levels in the various bands were assumed to be known a priori (see also [5]).
The frequency responses of the real ATFs compared with the estimated ATFs for the fullband algorithm are depicted in Figures 7, 8, and 9 for SNR levels of 45 dB, 35 dB, and 25 dB, respectively. The filter order was overestimated by 5, that is, n̂_a − n_a = 5 in all cases. While the estimation with speech-like noise (with a wide dynamic range) is quite good at the higher SNR level and filter order 32 (Figure 7), when the SNR decreases to 35 dB, the performance is maintained only with a white noise input signal (Figure 8). For an SNR level of 25 dB, the algorithm fails to work with the longer filter, and its order had to be reduced to only 8 (Figure 9). The sensitivity to the noise level is thus clearly indicated.
Results for the suboptimal QRD-based algorithm are depicted in Figure 10 for the speech-like input and an SNR level of 45 dB, and in Figure 11 for a white noise input and an SNR level of 35 dB. The QRD-based method is more sensitive to the noise level: at 45 dB, good results could be obtained only with a filter of order 16, and at 35 dB, only with a filter of order 8.
In all cases, using only part of the null subspace vectors yielded reduced performance. Therefore, we omit results concerning these experiments from the presentation.
For comparison, we used the EVAM algorithm, while successively reducing the overestimation of the filter order in their fractal-based method as explained in [9]. Results for the speech-like input are depicted in Figures 12 and 13 for SNR levels of 45 dB and 35 dB, respectively. It is clearly shown that, while high performance of the EVAM algorithm is demonstrated, degradation is encountered at the lower SNR level, especially at the high-frequency range where the input signal content is low.
The incorporation of the subspace method into a subband structure is depicted in Figures 14 and 15. We used the eight-subband structure shown in Figure 4. The decimation in each channel by a factor of 8 (critically decimated) allowed for a significant order reduction. In particular, the correct order of the filter in each channel was only 4. In this case, we overestimated the correct order only by 2, since the null subspace determination is less robust. In Figure 14, the subband structure is depicted, and the estimated response is given in each band separately. In Figure 15, all the bands are combined to form the entire frequency response of the ATFs. The results demonstrate the ability of the algorithm to work well at lower SNR levels (25 dB) while the filter order is still relatively high (n_a = 32), even for the speech-like signal. It is worth noting that errors in the frequency response are mainly encountered in the transition regions between the frequency bands. This phenomenon should be explored in depth to enable a filterbank design that is better suited to the problem at hand.
Finally, the fullband algorithm was tested with a real speech signal. To demonstrate the dereverberation ability, we used a high SNR level (50 dB) and n_a = 32. The frequency response of the estimated filter is depicted in Figure 16 for the fullband method. The dereverberated speech signal is shown in Figure 17. The figure clearly shows that, while the microphone signal differs from the original signal (due to the filtering by the ATF), the dereverberated signal resembles it. This is also supported by a more than 12 dB decrease in the power of the difference signal.

CONCLUSIONS
A novel method for speech dereverberation based on null subspace extraction (applying either the GSVD to a noisy data matrix or the GEVD to the corresponding correlation matrix) is suggested. An ATF estimation procedure is obtained by exploiting the special Sylvester structure of the corresponding filtering matrix through TLS fitting. An alternative, more efficient method is proposed, based on the same null subspace structure and on the QRD. The TLS approach, although imposing a higher computational burden, is found to be superior to the cheaper QRD method. The desired signal is obtained by incorporating the estimated ATFs into an extended GSC structure.
Of special interest is the subband framework, in which the ATF estimation is done in each decimated band separately and the results are then combined to form the fullband ATFs. This technique allows an increase in the filter order that can be treated by the proposed system, while maintaining good performance even with real speech signals and higher noise levels. This method still suffers from the gain ambiguity problem, which is left for further research. However, we note that such a subband structure might be incorporated into other methods as well (e.g., the EVAM algorithm). An experimental study supports the above conclusions. It is worth mentioning that the method (as well as other methods in the literature) should be further explored and tested in actual scenarios, where the ATFs are more realistic and much longer.