Exploiting Narrowband Efﬁciency for Broadband Convolutive Blind Source Separation

Based on a recently presented generic broadband blind source separation (BSS) algorithm for convolutive mixtures, we propose in this paper a novel algorithm combining advantages of broadband algorithms with the computational e ﬃ ciency of narrowband techniques. By selective application of the Szeg¨o theorem which relates properties of Toeplitz and circulant matrices, a new normalization is derived as a special case of the generic broadband algorithm. This results in a computationally e ﬃ cient and fast converging algorithm without introducing typical narrowband problems such as the internal permutation problem or circularity e ﬀ ects. Moreover, a novel regularization method for the generic broadband algorithm is proposed and subsequently also derived for the proposed algorithm. Experimental results in realistic acoustic environments show improved performance of the novel al-gorithm compared to previous approximations.


INTRODUCTION
Blind source separation (BSS) refers to the problem of recovering signals from several observed linear mixtures [1].In this paper we deal with the convolutive mixing case as encountered, for example, in acoustic environments.Therefore, we are interested in finding a corresponding demixing system, where the output signals y q (n), q = 1, . . ., P, are described by and where w pq,κ , κ = 0, . . ., L − 1, denote the current weights of the MIMO filter taps from the pth sensor channel x p (n) to the qth output channel.In this paper the number of active source signals Q is less than or equal to the number of microphones P. BSS algorithms are solely based on the fundamental assumption of mutual statistical independence of the different source signals.The separation is achieved by forcing the output signals y q to be mutually statistically decoupled up to joint moments of a certain order.
In [2] a generic framework called TRINICON (Triple-N ICA for convolutive mixtures) has been introduced for multichannel blind signal processing, such as BSS or dereverberation based on multichannel blind deconvolution (MCBD).In [3,4] we have also shown that based on this framework many seemingly different BSS algorithms can be treated in a unified way.Apart from these existing BSS algorithms, also several novel broadband convolutive BSS algorithms for both the time and frequency domains have been derived.In this paper we exemplarily use a second-order BSS algorithm resulting from the broadband time-domain derivation in [3,4].This yields an algorithm which possesses an inherent normalization of the coefficient update leading to fast convergence also for colored signals such as speech.However, for realistic acoustic environments large correlation matrices have to be inverted for every output channel.An approximation of this matrix by a diagonal matrix led to a very efficient algorithm which allows real-time implementation using a block-online update structure [5].In Section 2 the generic broadband algorithm combined with the blockonline update is briefly summarized.In Section 3 a novel normalization strategy is presented which is obtained by the application of the Szegö theorem and constitutes a better approximation of the inverse autocorrelation matrix.In general, the Szegö theorem relates the eigenvalues of circulant and Toeplitz matrices which can here be interpreted as the relation between broadband and narrowband signal models.The novel normalization leads to an algorithm where the main parts of the algorithm are still implemented in a broadband manner and thus avoid the internal permutation problem and circularity effects as experienced in purely narrowband BSS algorithms.Due to the selective application of the Szegö theorem only the normalization is implemented using the narrowband approximation which leads to a computationally efficient algorithm as the matrix inverse can be replaced by a scalar inversion in each frequency bin.Another important aspect for robust implementations is the regularization of the possibly ill-conditioned correlation matrices prior to inversion.This issue is discussed in Section 4 and a novel regularization strategy is presented for the generic broadband algorithm.An analogous regularization method is then derived for the proposed algorithm.Finally, experimental results show the improved performance of the new algorithm.

Cost function and block-online update
A block processing broadband algorithm simultaneously exploiting nonwhiteness and nonstationarity of the source signals is derived from the following matrix formulation [3].First, we introduce a block output signal matrix y q (mL + 1) . . .y q (mL − D + 2) . . . . . . . . .
and reformulate the convolution (1) as with m being the block time index and N denoting the block length.The N ×D matrix Y q (m) incorporates D time lags into the correlation matrices in the cost function, as is necessary for the exploitation of the nonwhiteness property.To ensure linear convolutions for all elements of Y q (m), the N ×2L matrices X p (m) and 2L × D matrices W pq are given as x p (mL + 1) . . .x p (mL − 2L + 2) . . . . . . . . .
w pq,0 0 where the matrices X p (m), p = 1, . . ., P, in (3) are Toeplitz matrices due to the shift of subsequent rows by one sample each.The matrices W pq exhibit a Sylvester structure which is a special form of a Toeplitz matrix, where each column is shifted by one sample containing the current weights w pq = [w pq,0 , w pq,1 , . . ., w pq,L−1 ] T of the MIMO sub-filter of length L from the pth sensor channel to the qth output channel.Superscript T denotes transposition of a vector or a matrix.It can be seen that for the general case 1 ≤ D ≤ L the last L − D + 1 rows are padded with zeros to ensure compatibility with X p .To allow a convenient notation of the algorithm combining all channels, we write (3) for all channels simultaneously as with the matrices The definition of Y q in (2) leads to the short-time corre- Here • H denotes conjugate transposition.In [3] a cost function based on these correlation matrices has been presented which inherently includes all D time lags of all autocorrelations and crosscorrelations of the BSS output signals where bdiag R yy creates a PD × PD block-diagonal matrix with the channelwise D × D submatrices R yqyq , q = 1, . . ., P, on the main diagonal and zeros elsewhere.The variable β denotes a weighting function with finite support that is normalized according to m i=0 β(i, m) = 1 allowing offline, online, or block-online realizations of the algorithm.The concept of a general weighting function is already well known from supervised adaptive filtering [6].There it was shown that, for example, the weighting function ∞ i=0 β(i, m) = m i=0 (1 − λ)λ m−i leads to a recursive online algorithm.The parameter λ denotes the exponential forgetting factor (0 < λ < 1) and i is the summation index of all blocks up to the current block m.The cost function becomes zero if and only if R ypyq , p = q, that is, all output cross-correlations over all time lags become zero.Thus, (8) explicitly exploits the nonwhiteness property of the output signals.
In [3] a coefficient update based on (8) was derived and in [5] a block-online update rule was derived for the coefficient update by specifying β(i, m) such that it leads to a combination of an online update and an offline update.In the block-online update scheme the offline part is calculated iteratively for the current block m containing KN samples as where j = 1, . . ., j max denotes the current iteration, μ is the stepsize, and W j (m) is the demixing filter matrix after j iterations based on data of the mth block.Equation ( 10) performs a simultaneous optimization for K blocks of length N which allows to exploit the nonstationarity of the source signals as for each block the source statistics change and thus new conditions are generated.Thus, (10) contains K update terms Q(i, W j−1 (m)) which are determined as the natural gradient of the cost function (8) [3] A high number of offline iterations j max allows a fast convergence without introducing an additional algorithmic delay but at the cost of an increased computational complexity.The demixing filter matrix W jmax (m) of the current block m which is obtained from the offline part after j max iterations is then used as input to the online part of the block-online algorithm which is written recursively as with the forgetting factor λ.This yields the final demixing filter matrix W(m) of the current block m containing the filter weights w pq (m) used for separation.The demixing filter weights w pq (m) of the current block are then used as initial values for the offline algorithm (9) of the next block.An overview of the block-online update procedure can also be found in the pseudocode given in Table 1.
It should be pointed out that the natural gradient (11) obtained from the cost function (8) can similarly be derived using the Kullback-Leibler divergence based on multivariate probability density functions [4].The second-order BSS algorithm is then obtained by using the multivariate Gaussian probability density function.

Compute for each block
Calculate the signal energy of each block i σ 2 Calculate Y H 1 Y 1 in (33) by scalar multiplication in each frequency bin and perform narrowband regularization according to (33) by using the signal energy Perform scalar inversion of the frequency-domain values on the main diagonal of S y1y1 (i) as given in (26) and apply the inverse DFT to the resulting vector to obtain the first column of the circulant matrix C −1 Y1 Y1 (i).( 9) In ( 27) the circulant matrix C −1 Y1 Y1 (i) is constrained to yield the approximation of the inverse of the Toeplitz matrix R −1 y1y1 (i).Matrix R −1 y1y1 (i) can be generated by picking the first L and last L − 1 values of the resulting vector from Step (8).(10) Compute the matrix product R y2y1 (i)R −1 y1y1 (i) in (11) by fast convolution techniques exploiting the Toeplitz structure of both matrices.The result A y2y1 (i) of the matrix product may be approximated due to complexity reasons by calculating only the entries [a(i, 0), . . ., a(i, −L + 1)] in the first column and the entries [a(i, 0), . . ., a(i, L − 1)] in the first row and generate a Toeplitz structure from these values.(11) Compute the matrix product Table 1: Continued.
(12) Update equation for the offline part (note that also an adaptive stepsize according to [5] can be applied):

Estimation of the correlation matrices and Sylvester constraint SC
In principle, there are two basic methods for the blockbased estimation of the short-time output correlation matrices R ypyq (i) for nonstationary signals: the so-called covariance method and the correlation method, as they are known from linear prediction problems [7]. 1 In [3] the more accurate covariance method was introduced by the definition R ypyq (i) = Y H p (i)Y q (i).In [5] the computationally less complex correlation method was used which is obtained by assuming stationarity within each block i.This leads to a Toeplitz structure of the D × D matrix R ypyq (i) which can be expressed as r yp yq (i, −1) . . .r yp yq (i, D − 2) . . . . . . . . .
Using the correlation method, the Toeplitz matrix R ypyq can also be written as a matrix product where Y p denotes (N + D) × D matrix exhibiting a Sylvester structure as shown for the coefficient matrix in (5).The first column vector of Y p (i) contains the output signal values y q (iL), . . ., y q (iL + N − 1) analogously to the first column vector of (2).In contrast to the covariance method using the matrix defined in (2) now additionally D zeros are appended to the output signal values.For each subsequent column this vector is shifted by one sample as shown in (5). 1 It should be emphasized that the terms covariance method and correlation method are not based upon the standard usage of the covariance function as the correlation function with the means removed.
In [3] the coefficient update was derived by taking the derivative with respect to the Sylvester matrix W. There, it was shown that the Sylvester structure of the update Q in (11) has to be ensured by a Sylvester constraint (SC).In [5,8] two efficient versions have been discussed.They allow to implement the matrix multiplication of W with the remaining Toeplitz matrix in (11) as a fast convolution reducing the complexity from O(L 3 ) to O(log(L)).A detailed analysis of the computational complexity of the algorithm ( 9)-( 12) can be found in [5].In the present paper we apply the row Sylvester constraint SC R which calculates only the Lth row of the update Q and then replicates the elements to obtain the Sylvester structure of W. A detailed discussion of the Sylvester constraints can be found in [8].

NORMALIZATION STRATEGIES
The update of the generic algorithm given by ( 11) exhibits an inherent normalization by the inverse of a block-diagonal matrix.This is an advantage compared to algorithms based on Frobenius norm cost functions as, for example, [9] where heuristic normalizations have to be introduced.Moreover, (11) allows for several normalization strategies by applying certain approximations as shown in the following.

Exact normalization based on matrix inverse
When using the correlation method, the D × D Toeplitz matrices R yqyq , q = 1, . . ., P, given by (15), have to be inverted in (11).This is similar to the matrix inversion occurring in the recursive least-squares (RLS) algorithm in supervised adaptive filtering [6].The complexity of a Toeplitz matrix inversion is O(D 2 ).For realistic acoustic environments large values for D (e.g., 1024) are required which are prohibitive for a real-time implementation of the exact normalization on most current hardware platforms.

Normalization based on diagonal matrices in the time domain
In [5] an approximation of the matrix inverse has been used to obtain an efficient algorithm suitable for real-time implementations.There, the off-diagonals of the autocorrelation submatrices have been neglected, so that for the correlation method it can be approximated by a diagonal matrix with the output signal powers, that is, for q = 1, . . ., P, where the diag operator applied to a matrix sets all off-diagonal elements to zero.Thus, the matrix inversion is replaced by an element-wise division.This is comparable to the normalization in the well-known normalized least mean squares (NLMS) algorithm in supervised adaptive filtering approximating the RLS algorithm [6].

Novel approximation of exact normalization based on the Szegö theorem
The broadband algorithm given by ( 9)-( 12) can also be formulated equivalently in the frequency domain as has been presented in [3].Additionally it has been shown that by certain approximations to this frequency-domain formulation a purely narrowband version of the broadband algorithm can be obtained.In this section we will derive a novel algorithm combining broadband and narrowband techniques by using two steps.First, the exact normalization is formulated equivalently in the frequency domain (Section 3.3.1).In a second step the Szegö theorem is applied to the normalization to obtain an efficient version of the exact normalization (Section 3.3.2).The Szegö theorem allows a selective introduction of narrowband approximations to specific parts of the algorithm.This approach allows to combine both the advantages of the broadband algorithm (e.g., avoiding internal permutation ambiguity and circularity problem) and the low complexity of a narrowband approach.

Exact normalization expressed in the frequency domain
In [10] it was shown that any Toeplitz matrix can be expressed equivalently in the frequency domain by first generating a circulant matrix by proper extension of the Toeplitz matrix.Then the circulant matrix is diagonalized by using the discrete Fourier transform (DFT) matrix F R of size R × R where R ≥ N + D denotes the transformation length.These two steps are given for the Toeplitz output signal matrix Y q as where C Yq is a R × R circulant matrix and the window matrices are given as Here the convention is used that the lower index of a matrix denotes its dimensions and the upper index describes the positions of ones and zeros.The size of the unity submatrices is indicated in subscript (e.g., "01 D ").The matrix Y q exhibits a diagonal structure containing the eigenvalues of the circulant matrix C Yq on the main diagonal.The eigenvalues are calculated by the DFT of the first column of C Yq and thus Y q can be interpreted as the frequency-domain counterpart of Y q : Diag F R 0, . . ., 0, y q (iL), . . ., y q (iL+N − 1), 0, . . ., 0 T . (20) The operator Diag{a} denotes a square matrix with the elements of vector a on its main diagonal.An illustration of the circulant matrix C Yq and the window matrices, which constrain the circular matrix to the original matrix Y q , is given in Figure 1.With (18) we can now write R ypyq as It can be seen in the upper left corner of the illustration in Figure 1 that by extending the window matrix W 01N+D N+D×R to W 01R R×R = I R×R only rows of zeros are introduced at the beginning of the matrix Y q , that is, ( 17) is now of the form These appended rows of zeros have no effect on the calculation of the correlation matrix R ypyq and thus we can replace the multiplication of the window matrices in (21) by This leads to The correlation matrix in ( 24) is an equivalent expression to (15) in the frequency domain.Thus, the normalization based on the inversion of (24) or (25) for p = q = 1, . . ., P still corresponds to the exact normalization based on the matrix inverse of a Toeplitz matrix as described in Section 3.1.In the following it is shown how the inverse of ( 25) can be approximated to obtain an efficient implementation.

Application of the Szegö theorem
In the tutorial paper [10] the Szegö theorem is formulated and proven for finite-order Toeplitz matrices.A finite-order Toeplitz matrix is defined as an R × R Toeplitz matrix where a finite D exists such that all elements of the matrix with the row or column index greater than D are equal to zero.It was shown in [10] that the R × R Toeplitz matrix of order D is asymptotically equivalent to the R × R circulant matrix generated from an appropriately complemented D × D Toeplitz matrix.If the two matrices are also of Hermitian structure, then the Szegö theorem on the asymptotic eigenvalue distribution states the following.
(1) The eigenvalues of both matrices lie between a lower bound and an upper bound.(2) The arithmetic means of the eigenvalues of both matrices are equal if the size R of both matrices approaches infinity.
Then, the eigenvalues of both matrices are said to be asymptotically equally distributed.
It can be seen in ( 25) that the autocorrelation matrix necessary for the normalization can be expressed as a D × D Toeplitz matrix R yqyq or an R × R circulant matrix C Yq Yq generated from the Toeplitz matrix by extending it appropriately and multiplying it with some window matrices.According to [10] both matrices are asymptotically equivalent.As both the Toeplitz and the circulant matrices are Hermitian, it is possible to apply the Szegö theorem.The eigenvalues of C Yq Yq are given in (24) as the elements on the main diagonal of the diagonal matrix Y H q Y q .The Szegö theorem states that the eigenvalues of the R × R Toeplitz matrix generated by appending zeros to R yqyq can be asymptotically approximated by Y H q Y q for R → ∞.The benefit of this approximation becomes clear if we take a look at the inverse of a circulant matrix.The inverse of a circulant matrix can be easily calculated by inverting its eigenvalues By using the Szegö theorem we can now approximate the inverse of the Toeplitz matrix R yqyq by the inverse of the circulant matrix (26 This can also be denoted as narrowband approximation because the eigenvalues Y H q Y q can easily be determined as the DFT of the first column of the circulant matrix C Yq Yq .The inverse in (27) can now be efficiently implemented as a scalar inversion because Y H q Y q denotes a diagonal matrix.Moreover, it is important to note that the inverse of a circulant matrix is also circulant.Thus, after the windowing by W 1D0

•••
the resulting matrix R −1 yqyq exhibits again a Toeplitz structure.The error which is introduced by the narrowband approximation has been examined in [11] for the case of stationary random processes.The error has been measured as the difference between the exact inversion of the Toeplitz matrix given in (24) and the approximated inverse given in (27).The results obtained in [11] show that for R D the narrowband approximation is well justified.
In summary, (27) can be efficiently implemented as a DFT of the first column of C Yq Yq followed by a scalar inversion of the frequency-domain values and then applying the inverse DFT.After the windowing operation these values are then replicated to generate the Toeplitz structure of R −1 yqyq .This approach reduces the complexity from O(D 2 ) to O(R log R) (e.g., experiments in Section 5: D = L, R = 4L).Obtaining a Toeplitz matrix after the inversion has the advantage that in the update equation (11) again a product of Toeplitz matrices has to be calculated which can be efficiently implemented using fast convolutions.For more details see [5].

REGULARIZATION OF THE MATRIX INVERSE
Prior to the inversion of the autocorrelation Toeplitz matrices according to (15) a regularization is necessary as these matrices may be ill-conditioned.Here we propose to attenuate the off-diagonals of R yqyq by multiplying them with the factor ρ: The attenuation factor ρ has to be within the range 0 ≤ ρ ≤ 1.Using this regularization, the algorithm performs also well even if there is just one active source.It should be noted that for ρ = 0 the previous approximation of the normalization in [5] and Section 3.2 can be seen as a special case of the regularized version of the novel normalization presented in Section 3.3.
The selective narrowband approximation of Section 3.3 leads to an inversion of circulant matrices C Yp Yq instead of Toeplitz matrices R yqyq .Thus, analogously to (28) it is desirable for the proposed algorithm to also regularize C Yp Yq prior to inversion: In Section 3.3 it was pointed out that every circulant matrix can be expressed using the DFT, inverse DFT matrix, and a diagonal matrix The diagonal matrix Y H q Y q contains the DFT transformed elements of the first column of the circulant matrix on its diagonal.Thus, by applying the diag operator on C Yq Yq we can write diag C Yq Yq = r yq yq (0) Thus, (29) can be simplified to a narrowband regularization in each frequency bin as Note that the second term in (32) is equivalent to the second term in (28).This time-frequency equivalence can be explained by the Parseval theorem.It should be noted that the regularization in (32) can also be applied to purely narrowband algorithms (e.g., [3, Section IV-C]).There, considerable separation performance improvements compared to a regularization by adding a constant have been observed too.
A pseudocode of the efficient implementation of the proposed algorithm based on ( 9)-( 12) together with the novel normalization presented in Section 3.3 and the new regularization in Section 4 is given in Table 1.There, the implementation is exemplarily shown for the update Δw 11 (m) for P = 2, D = L and application of the Sylvester constraint SC R .

EXPERIMENTS
The experiments were conducted using speech data convolved with measured impulse responses of speakers in two different environments: (a) in a real room (580 cm × 590 cm × 310 cm) with reverberation time T 60 = 250 ms at ±45 • and 2 m distance of the sources to the array, and (b) impulse responses of a driver and codriver in a car (T 60 = 50 ms) with the array mounted to the rear mirror.In the car scenario also recorded background noise with 0 dB SNR was added.The sampling frequency was f s = 16 kHz.A two-element microphone array with an interelement spacing of 20 cm was used for both recordings.The demixing filter length L was chosen to 1024 taps, the block length N = 2L, and the number of time lags considered in the correlation matrices was set to D = L.The frameshift was L samples, K = 8 blocks have been used to exploit nonstationarity, and j max = 5 iterations have been used as number of iterations for the offline update.The adaptive stepsize proposed in [5] has been used with the minimum and maximum values μ min = 0.0001, μ max = 0.01, respectively, and the forgetting factor λ = 0.2.The factor ρ for the novel regularization has been set to ρ = 0.5.The demixing filters were initialized with a shifted unit impulse where w qq,20 = 1 for q = 1, . . ., P and zeros elsewhere.
To evaluate the performance, the signal-to-interference ratio (SIR) was calculated which is defined for the qth channel as the ratio of the signal power of the target source signal y s,q (n) to the signal power from the crosstalk signal y c,q (n) given by where the estimate E of the expectation operator is implemented as a moving average.To obtain the target and crosstalk signal component for the SIR calculation, each signal component at the microphone signals is processed individually by the demixing system obtained by the BSS algorithm.A possible external permutation, that is, if the source signal s p (n) is obtained at a BSS output channel y q (n) with p = q, is corrected before the SIR calculation.In the experiments the channelwise SIR q defined in (34) has been averaged over both channels q = 1, 2.  In Figures 2 and 3 the results of the broadband algorithm with the three different normalization schemes presented in Section 3 are shown.The dashed line represents the exact normalization by the inverse of the Toeplitz matrix which is estimated using the correlation method.It can be seen that the novel normalization scheme (solid) obtained by the narrowband approximation corresponding to the inversion of a circulant matrix approximates the exact normalization very well.Moreover, the novel normalization yields improved performance compared to the time-domain approximation (dash-dotted) resulting in a normalization by the output signal power.Sometimes the novel algorithm even seems to slightly outperform the exact normalization.This can be explained by the usage of an adaptive stepsize [5] which may result in slightly different convergence speeds for all three algorithms.It should also be noted that the fluctuation of the SIR is due to the nonstationarity of the speech signals.

CONCLUSION
In this paper a novel efficient normalization scheme was presented resulting in a novel algorithm combining advantages of broadband algorithms with the efficiency of narrowband techniques.Moreover, a regularization method was proposed leading to improved convergence behavior.Experimental results in realistic acoustic environments confirm the efficiency of the proposed approach.
From 1989 to 1990, he was a Postdoctoral Member of technical staff at AT&T Bell Laboratories, Murray Hill, NJ.In 1990, he joined Philips Kommunikations Industrie, Nuremberg, Germany.From 1993 to 1999, he was a Professor at the Fachhochschule Regensburg, before he had joined the University of Erlangen-Nuremberg as a Professor and Head of the Audio Research Laboratory in 1999.He authored or coauthored seven book chapters and more than 70 refereed papers in journals and conference proceedings.He served as a Guest Editor to various journals, as an Associate Calculate the values on the diagonal of Y 1 by computing the DFT of length R of the ith output signal block of length N of Step (3).

Figure 1 :
Figure 1: Illustration of (17) showing the relation between circulant matrix C Yq and Toeplitz matrix Y q .
Editor and Guest Editor to IEEE Transactions on Speech and Audio Processing from 2000 to 2004, and presently serves as an Associate Editor to the EURASIP Journal on Signal Processing and EURASIP Journal on Advances in Signal Processing.He was the General Chair of the 5th International Workshop on Microphone Arrays in 2003 and the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics in 2005.His current research interests include speech signal processing, array signal processing, adaptive filtering, and its applications to acoustic human/machine interfaces.

Table 1 :
Pseudocode of the block-online algorithm with improved normalization according to Section 3.3 exemplarily shown for the update Δw 11 (m) in the 2 × 2 case.