EURASIP Journal on Applied Signal Processing 2003:1, 21–31 © 2003 Hindawi Publishing Corporation

An Acoustic Human-Machine Front-End for Multimedia Applications

A concept of robust adaptive beamforming with integrated stereophonic acoustic echo cancellation is presented which reconciles the need for low computational complexity and efficient adaptive filtering with versatility and robustness in real-world scenarios. The synergetic combination of a robust generalized sidelobe canceller and a stereo acoustic echo canceller is designed in the frequency domain, based on a general framework for multichannel adaptive filtering in the frequency domain. Theoretical analysis and real-time experiments show that this concept is superior to comparable time-domain approaches in terms of computational complexity and adaptation behaviour. The real-time implementation confirms that the concept is robust and meets the practical requirements of real-world scenarios well, which makes it a promising candidate for commercial products.


INTRODUCTION
With a continuously increasing desire for convenient human-machine interaction, the acoustic interface of any terminal for multimedia or telecommunication services is challenged to allow seamless, hands-free, and untethered audio communication for the benefit of human users.
Audio capture is usually responsible for extracting desired signals for the multimedia device or, in telecommunication applications, for remote listeners. Compared to sound capture by a microphone next to the source, seamless audio interfaces as depicted in Figure 1 cause the desired signals to be impaired by (a) acoustic echoes from the loudspeaker(s), (b) local interferers, and (c) reverberation due to distant talking.
Techniques for acoustic echo cancellation (AEC) have evolved over the last two decades [1,2] and led to the recent presentation of a five-channel AEC for real-time operation on a personal computer (PC) [3,4]. If no distortion of the desired signal is allowed, suppression of local interference is best handled by microphone arrays [5,6]. Here, robust adaptive beamforming algorithms are necessary to cope with time-varying acoustic environments, including moving desired sources. Removing reverberation from the desired signal ideally requires blind identification and inversion of the channel(s) from the source to the sensor(s). For realistic time-varying environments, this problem still lacks a theoretical solution, and robust implementations are out of reach. Consequently, practical dereverberation is limited to the spatial filtering performed by a beamforming microphone array, which suppresses acoustic reflections from undesired directions.
It follows that, for practical multimedia terminals, a combination of a beamforming microphone array with AEC is desirable. While the general properties and synergies of such combinations have been studied in [7], we describe here a system that incorporates advanced adaptive filtering techniques for both beamforming and multichannel AEC, leading to a highly efficient and robust real-time implementation. For beamforming, a robust generalized sidelobe canceller (RGSC) [8] serves as the starting point; it is discussed in Section 2. For stereo sound reproduction as considered here, the system identification problem of stereo AEC (SAEC) is described in Section 3.
In Section 4, a general framework for multichannel adaptive filtering in the frequency domain (more exactly, the discrete Fourier transform (DFT) domain) is presented; this formalism is subsequently used to systematically derive efficient algorithms for both adaptive beamforming and SAEC.
In Section 5, as the main contribution of this paper, the embedding of SAEC into an RGSC structure in the frequency domain is described. The algorithms for each adaptive building block are formulated, including crucial issues of adaptation control.
The functionality and efficiency of the realized system are documented in Section 6. Results on the convergence behaviour of the various adaptive components are presented for well-defined real-world simulation scenarios, and the main characteristics of the real-time implementation are described.
Compared to [9], (a) the frequency-domain system is derived starting from a description in the time domain and by rigorously applying the concept of multiple-input multiple-output (MIMO) frequency-domain adaptive filtering, and (b) the AEC part is integrated such that the system with stereophonic AEC runs in real time on low-cost PC platforms.

GENERALIZED SIDELOBE CANCELLER (GSC) FOR NONSTATIONARY BROADBAND SIGNALS
Essentially, beamforming microphone arrays separate desired signals from interference by exploiting spatial information about the source location. Since acoustic environments are strongly time-variant, adaptive beamforming is necessary. First approaches only took the time-variance of the interference into account but assumed fixed positions for the desired speaker [10]. Although this yields sufficient interference suppression, it often leads to cancellation of the desired signal even for slightly moving desired sources. Thus, adaptive beamformers are needed that track both (a) transient interference and (b) moving desired sources. Due to their simplicity, adaptive beamformers realized as GSC structures [11] are especially promising. An RGSC that explicitly takes the time-variance of the desired source position into account was presented in [8]; it enhances robustness against desired signal cancellation compared to conventional GSCs.
In this section, we describe the RGSC algorithm (see Figure 2). It consists of a fixed-reference path, which is formed by a fixed beamformer (FBF), and an adaptive sidelobe-cancelling path with the adaptive blocking matrix (ABM) and the adaptive interference canceller (AIC). These building blocks are described in the discrete time domain in Sections 2.1, 2.2, and 2.3. In Section 2.4, we show problems of the original RGSC structure and propose solutions by way of a realization in the DFT domain (see Sections 4 and 5).

Fixed beamformer
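Capturing the L_f most recent samples of each of the M microphone signals x_m(k), m = 1, 2, ..., M, in vectors

$$\mathbf{x}_m(k) = \left[x_m(k), x_m(k-1), \ldots, x_m(k-L_f+1)\right]^T,$$

where T denotes vector or matrix transposition, and describing the FBF impulse responses by vectors

$$\mathbf{w}_m = \left[w_{0,m}, w_{1,m}, \ldots, w_{L_f-1,m}\right]^T,$$

where L_f is the number of filter taps, we can write the FBF output signal as

$$y_f(k) = \sum_{m=1}^{M}\mathbf{w}_m^T\,\mathbf{x}_m(k) = \mathbf{w}^T\mathbf{x}(k),$$

where w and x(k) are obtained by concatenating the vectors w_m and x_m(k), respectively. The vector w defines the beamformer response with respect to (w.r.t.) the signal impinging from the location of the desired source d(k). The weight vector w is designed such that desired speaker movements within a predefined region are possible without distorting the desired signal (see, e.g., [9]), whereas any interference n(k) arriving from another direction is attenuated.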
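A minimal NumPy sketch of the filter-and-sum output y_f(k) = Σ_m w_m^T x_m(k) may help. The function name `filter_and_sum` and the zero initial conditions are illustrative assumptions, not the authors' implementation; the weights themselves would come from a design such as the one referenced in [9].

```python
import numpy as np

def filter_and_sum(x, w):
    """Filter-and-sum beamformer output y_f(k) = sum_m w_m^T x_m(k).

    x: (M, K) microphone signals; w: (M, L_f) FIR weights, one row per microphone.
    Returns the length-K output, assuming zero signal values before k = 0.
    """
    M, K = x.shape
    y = np.zeros(K)
    for m in range(M):
        # Causal FIR filtering of channel m; keep the first K output samples.
        y += np.convolve(x[m], w[m])[:K]
    return y
```

With each w_m set to a unit impulse scaled by 1/M, the sketch reduces to a delay&sum beamformer with zero steering delays, that is, broadside steering.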

Adaptive blocking matrix
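The ABM suppresses the desired signal components in the adaptive sidelobe-cancelling path. Ideally, the M-channel ABM output only contains interference components, which are used in the AIC to form an estimate of the interference contained in y_f(k). The ABM is realized by M adaptive filters with impulse response vectors

$$\mathbf{b}_m(k) = \left[b_{0,m}(k), b_{1,m}(k), \ldots, b_{L_b-1,m}(k)\right]^T,$$

using the FBF output as reference signal, captured in the vector

$$\mathbf{y}_f(k) = \left[y_f(k), y_f(k-1), \ldots, y_f(k-L_b+1)\right]^T,$$

and the sensor signals x_m(k), delayed by κ_b samples to ensure causality, as desired signals (see Figure 2). Defining a matrix B(k) = [b_1(k), b_2(k), ..., b_M(k)] which captures all adaptive filters b_m(k), we obtain for the ABM output signals in vector notation

$$\mathbf{e}_b(k) = \left[e_{b,1}(k), \ldots, e_{b,M}(k)\right]^T = \left[x_1(k-\kappa_b), \ldots, x_M(k-\kappa_b)\right]^T - \mathbf{B}^T(k)\,\mathbf{y}_f(k).$$

In order to cancel the desired signal d(k) by the ABM, B(k) must be determined such that the ABM output signals e_b(k) are minimized w.r.t. desired signal components. This can be expressed as an exponentially weighted least-squares problem [12],

$$\mathbf{B}(k) = \arg\min_{\mathbf{B}}\sum_{i=1}^{k}\lambda_b^{\,k-i}\,\left\|\mathbf{e}_b(i)\right\|_2^2, \qquad (9)$$

where λ_b (0 < λ_b < 1) is an exponential forgetting factor and the minimization is carried out only for the desired-signal components of e_b(i). In contrast to fixed blocking matrices, which ensure distortion-free desired signals only for very few predetermined source positions [10], the adaptivity of the ABM allows arbitrarily moving desired speakers to be tracked. Leakage of desired signal components is efficiently prevented by the ABM, so that the RGSC is more robust against desired signal cancellation than GSCs using fixed blocking matrices.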
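The exponentially weighted least-squares problem for a single ABM channel can be sketched as follows, assuming (as the criterion requires) that only desired-signal components are present in the data. The names `ewls_fir`, `lam`, and `delay` (standing in for λ_b and the causality delay κ_b) are hypothetical; a batch solution of the weighted normal equations is shown for clarity, whereas a real-time system would solve the problem recursively.

```python
import numpy as np

def ewls_fir(u, d, L, lam, delay=0):
    """Exponentially weighted least-squares FIR estimate b of d(i - delay) from u(i).

    Minimizes sum_i lam**(K-1-i) * (d(i - delay) - b^T u_i)**2, where
    u_i = [u(i), u(i-1), ..., u(i-L+1)]^T and samples before i = 0 are zero.
    """
    K = len(u)
    R = np.zeros((L, L))                         # weighted autocorrelation matrix
    p = np.zeros(L)                              # weighted crosscorrelation vector
    up = np.concatenate([np.zeros(L - 1), u])    # zero-padded past of the reference
    for i in range(K):
        ui = up[i:i + L][::-1]                   # [u(i), ..., u(i-L+1)]
        di = d[i - delay] if i >= delay else 0.0
        R = lam * R + np.outer(ui, ui)
        p = lam * p + di * ui
    return np.linalg.solve(R, p)                 # solution of the normal equations
```

Here u plays the role of the FBF output y_f(k) and d the role of one sensor signal; the returned vector corresponds to one column b_m of B(k).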

Adaptive interference canceller
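The AIC is realized by M adaptive filters with impulse response vectors a_m(k) of length L_a,

$$\mathbf{a}_m(k) = \left[a_{0,m}(k), a_{1,m}(k), \ldots, a_{L_a-1,m}(k)\right]^T,$$

which use the ABM output signals, captured in the vectors

$$\mathbf{e}_{b,m}(k) = \left[e_{b,m}(k), e_{b,m}(k-1), \ldots, e_{b,m}(k-L_a+1)\right]^T,$$

as reference signals and the FBF output y_f(k) as desired signal.
The AIC structure minimizes the interference at the RGSC output e_a(k) by subtracting its estimate of the interference from the fixed beamforming path,

$$e_a(k) = y_f(k-\kappa_a) - \sum_{m=1}^{M}\mathbf{a}_m^T(k)\,\mathbf{e}_{b,m}(k) = y_f(k-\kappa_a) - \mathbf{a}^T(k)\,\bar{\mathbf{e}}_b(k), \qquad (12)$$

where a(k) and ē_b(k) are obtained by concatenating the vectors a_m(k) and e_b,m(k), respectively, and the delay κ_a ensures causality. Nonstationary interference is efficiently suppressed by determining the optimum AIC filters, again using an exponentially weighted least-squares criterion,

$$\mathbf{a}(k) = \arg\min_{\mathbf{a}}\sum_{i=1}^{k}\lambda_a^{\,k-i}\,e_a^2(i), \qquad (13)$$

with λ_a (0 < λ_a < 1) as an exponential forgetting factor, where the minimization is carried out w.r.t. the interference components.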
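For the AIC, the concatenated regressor of all M ABM outputs can be handled by a standard exponentially weighted RLS recursion, which solves the same kind of least-squares problem sample by sample. This is a sketch under stated assumptions: the name `rls_miso`, the initialization P = I/δ, and the parameter values are illustrative, and the desired signal `d` stands for the delayed FBF output y_f(k − κ_a).

```python
import numpy as np

def rls_miso(refs, d, L, lam=0.999, delta=1e-2):
    """Exponentially weighted RLS for a multi-input single-output FIR canceller.

    refs: (M, K) reference signals (e.g. ABM outputs); d: (K,) desired signal.
    The regressor concatenates [u_m(k), ..., u_m(k-L+1)] for m = 1..M.
    Returns the final weights (M*L,) and the a priori error e(k).
    """
    M, K = refs.shape
    w = np.zeros(M * L)
    P = np.eye(M * L) / delta                      # inverse correlation matrix estimate
    padded = np.hstack([np.zeros((M, L - 1)), refs])
    e = np.zeros(K)
    for k in range(K):
        u = padded[:, k:k + L][:, ::-1].reshape(-1)   # concatenated regressor
        e[k] = d[k] - w @ u                           # a priori output error
        g = P @ u / (lam + u @ P @ u)                 # RLS gain vector
        w = w + g * e[k]
        P = (P - np.outer(g, u @ P)) / lam
    return w, e
```

The recursion needs O((ML)^2) operations per sample, which illustrates why the paper moves to frequency-domain approximations.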

Adaptation strategy
The above description reveals the fundamental problem of the RGSC: how to adapt the adaptive sidelobe-cancelling path.
In this section, we first describe this problem and then show how it can be relieved by transforming the system into the frequency domain. The adaptive sidelobe-cancelling path consists of two cascaded adaptive modules, ABM and AIC. Although they need to be adapted simultaneously for optimum tracking of nonstationarities of the desired signal and the interference, they can only be adapted separately, which impairs tracking performance and output signal quality.
The ABM ideally suppresses only desired signal components. Interference must be excluded from the (unconstrained) adaptation of the ABM (see (9)). Therefore, the ABM cannot be adapted during double-talk, tracking performance is impaired, and desired signal may leak to the AIC input.
Ideally, the ABM output only contains interference components, which are used as reference for the AIC for minimizing interference in the GSC output signal (see (13)). Since the ABM output may contain desired signal during double-talk, the AIC cannot be adapted during double-talk, to prevent cancellation of the desired signal by the AIC. As a result, the ABM should only be adapted if the signal-to-noise ratio (SNR) is high, and the AIC only if the SNR is low. Otherwise, the adaptation should be stopped for optimum output signal quality. When this adaptation strategy is applied to nonstationary signals in the full frequency band, the adaptive sidelobe-cancelling path becomes nearly ineffective: interference suppression is reduced to that of the FBF [9].
Applying this adaptation strategy not in the full band but in narrow subbands allows more flexibility for nonstationary signals. For each subband, the SNR can be estimated independently, and the adaptation of ABM and AIC can be controlled based on this SNR estimate. In Section 6, we illustrate that tracking performance, interference suppression, and output signal quality are improved by our realization in the DFT domain.

STEREOPHONIC AEC
The fundamental idea of any two-channel AEC structure (Figure 3) is to use adaptive FIR filters with length-L impulse response vectors

$$\hat{\mathbf{h}}_p(k) = \left[\hat{h}_{0,p}(k), \hat{h}_{1,p}(k), \ldots, \hat{h}_{L-1,p}(k)\right]^T, \quad p = 1, 2, \qquad (14)$$

which identify the truncated (generally time-varying) echo path impulse responses h_p(k). The filters ĥ_p(k) are excited by the loudspeaker signals x_ls,p(k), and the resulting echo estimates ŷ_p(k) are subtracted from the microphone signal y(k) to cancel the echoes. The generalization to the multimicrophone case is straightforward but is disregarded in this section. The residual echo signal reads

$$e(k) = y(k) - \sum_{p=1}^{2}\hat{\mathbf{h}}_p^T(k-1)\,\mathbf{x}_{ls,p}(k), \qquad (15)$$

where

$$\mathbf{x}_{ls,p}(k) = \left[x_{ls,p}(k), x_{ls,p}(k-1), \ldots, x_{ls,p}(k-L+1)\right]^T. \qquad (16)$$

Adaptation of the filters, minimizing the power of e(k), is carried out only if there is no activity of the speaker in the receiving room.

Specific problems of SAEC compared to single-channel AEC
The specific problems of SAEC include all those known for single-channel AEC, such as colored and nonstationary excitation of very long adaptive filters (e.g., [1]). In addition, SAEC usually has to cope with high crosscorrelation between the loudspeaker signals, which in turn causes correlated echoes that cannot easily be distinguished in the microphone signal [13]. The correlation results from the fact that the signals are usually derived from a common sound source at the far end, for example, a speaker as shown in Figure 3. Straightforward extension of known mono AEC schemes thus often leads to very slow convergence of the adaptive filters towards the physically true echo paths [13]. If the relation between the signals x_ls,p(k) is strictly linear, there is a fundamental problem of nonuniqueness in the two-channel case, as was shown in [13]. In general, convergence to the true echo paths is necessary, since otherwise the AEC would have to track not only changes of the echo paths at the near end but also any changes of the crosscorrelation between the channels of the incoming audio signal, leading to sudden degradations of the echo cancellation performance [13]. To some extent, the problem can be relieved by nearly inaudible preprocessing of the loudspeaker signals (e.g., [14,15]) for partial decorrelation of the channels; in addition, however, sophisticated adaptation algorithms that take the crosscorrelations into account are still necessary for SAEC. This is discussed next.
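The nonuniqueness for strictly linearly related channels can be checked numerically. In this sketch (hypothetical function `stereo_echo`; the linear relation is assumed to be a pure two-sample delay between the loudspeaker signals), two different pairs of echo paths produce exactly the same microphone echo, so no adaptive filter can tell them apart from the microphone signal alone.

```python
import numpy as np

def stereo_echo(x1, x2, h1, h2):
    """Microphone echo y(k) = (h1 * x1)(k) + (h2 * x2)(k), truncated to len(x1)."""
    K = len(x1)
    return np.convolve(x1, h1)[:K] + np.convolve(x2, h2)[:K]

rng = np.random.default_rng(6)
K = 1000
x1 = rng.standard_normal(K)
g = np.array([0.0, 0.0, 1.0])            # x2 = g * x1: x1 delayed by two samples
x2 = np.convolve(x1, g)[:K]
h1, h2 = rng.standard_normal(8), rng.standard_normal(8)
u = rng.standard_normal(6)               # arbitrary filter
# (h1 + g*u, h2 - u) yields the same echo as (h1, h2) because x2 = g * x1.
h1b = h1 + np.convolve(g, u)             # length 3 + 6 - 1 = 8
h2b = h2 - np.concatenate([u, np.zeros(2)])
```

Since u is arbitrary, there is a whole family of filter pairs that cancel the echo equally well, which is the nonuniqueness referred to in [13].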

Two-channel and multichannel adaptive filtering for highly cross-correlated excitation signals
Multichannel versions of known adaptation algorithms such as the (normalized) least-mean-squares ((N)LMS) or the recursive least-squares (RLS) algorithm can be derived straightforwardly by rewriting (15) using concatenated vectors in the same way as shown in (12). However, due to the high crosscorrelation between the loudspeaker signals, the performance of SAEC is more severely affected by the choice of algorithm than that of its monophonic counterpart. This is easily seen, since the convergence speed of most adaptive algorithms depends on the condition number of the input signal's covariance matrix, and in the stereo case this condition number is very high. To cope with such ill-conditioned problems, the RLS algorithm turns out to be the optimum choice, since its mean-squared error convergence is completely independent of that condition number [12]. Using the concatenated vectors x(k) = [x_ls,1^T(k), x_ls,2^T(k)]^T and ĥ(k) = [ĥ_1^T(k), ĥ_2^T(k)]^T, the corresponding coefficient update equation reads

$$\hat{\mathbf{h}}(k) = \hat{\mathbf{h}}(k-1) + \mathbf{R}_{xx}^{-1}(k)\,\mathbf{x}(k)\,e(k),$$

where R_xx(k) denotes the 2L × 2L covariance matrix of the loudspeaker signals x_ls,p(k). Note that this matrix contains estimates of both the autocorrelations (block matrices on the main diagonal) and the crosscorrelations (block matrices on the off-diagonals). Unfortunately, because of the very high computational cost of inverting R_xx and the associated numerical stability problems, this algorithm is not readily suitable for AEC in real-time operation. Therefore, efficient approximations to the multichannel RLS algorithm are needed which explicitly take the high crosscorrelations into account. Section 4 describes an efficient and systematic adaptive filtering concept that solves this problem in the frequency domain.
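The effect of a common far-end source on the condition number of the 2L × 2L covariance matrix can be illustrated with synthetic signals. The sketch below is an illustration under assumptions, not a measurement: both channels are generated by filtering one white source with short random filters and adding a small independent component, and `stereo_covariance` is a hypothetical helper.

```python
import numpy as np

def stereo_covariance(x1, x2, L):
    """Sample 2L x 2L covariance of [x1(k), ..., x1(k-L+1), x2(k), ..., x2(k-L+1)]."""
    K = len(x1)
    rows = [np.concatenate([x1[k - L + 1:k + 1][::-1], x2[k - L + 1:k + 1][::-1]])
            for k in range(L - 1, K)]
    X = np.array(rows)
    return X.T @ X / len(rows)

rng = np.random.default_rng(5)
K, L = 5000, 8
s = rng.standard_normal(K)                             # common far-end source
g1, g2 = rng.standard_normal(4), rng.standard_normal(4)
x1 = np.convolve(s, g1)[:K] + 1e-2 * rng.standard_normal(K)
x2 = np.convolve(s, g2)[:K] + 1e-2 * rng.standard_normal(K)
cond_correlated = np.linalg.cond(stereo_covariance(x1, x2, L))
cond_independent = np.linalg.cond(
    stereo_covariance(rng.standard_normal(K), rng.standard_normal(K), L))
```

For the correlated pair, the smallest eigenvalue is set essentially by the weak independent component, so the condition number is several orders of magnitude larger than for two independent white signals.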

EFFICIENT MULTICHANNEL FREQUENCY-DOMAIN ADAPTIVE FILTERING
The integrated system presented in Section 5 is based solely on efficient frequency-domain adaptive filtering using the overlap-save method. In the following, we give a compact formulation of a generic adaptive filter structure with P input channels and Q output channels, as shown in Figure 4. This formalism will then be applied in Section 5 to our combination of RGSC and SAEC. As it turns out, the following formulation supports a systematic transformation of the entire structure (Sections 2 and 3) into the frequency domain and leads to several desirable properties, such as improved adaptation control for the RGSC and explicit consideration of the crosscorrelation between the loudspeaker signals in the SAEC module. Note that the application of the overlap-save method using DFTs requires block processing of the input and output data streams. In the following, we derive the algorithm for a block length N equal to the filter length L, which yields maximum efficiency. However, to keep the processing delay short and to preserve optimum tracking behaviour, the data blocks are overlapped in our realization (Section 5). Moreover, we consider here only multichannel frequency-domain adaptive filters in their unconstrained form. A more general treatment of this class of adaptive algorithms, including an in-depth convergence analysis, can be found in [4].
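The overlap-save method with block length N = L and 2L-point DFTs can be sketched for a single channel as follows (NumPy; the function name is hypothetical). Each block combines the previous L and the current L input samples, and only the last L output samples of the circular convolution are kept, because these equal the linear convolution.

```python
import numpy as np

def overlap_save(x, h):
    """Overlap-save FIR filtering with block length N = L = len(h) and 2L-point FFTs.

    x must have a length that is a multiple of L. Returns (h * x)[:len(x)].
    """
    L = len(h)
    Hf = np.fft.fft(np.concatenate([h, np.zeros(L)]))   # 2L-point filter spectrum
    xp = np.concatenate([np.zeros(L), x])               # zero history before k = 0
    y = np.zeros(len(x))
    for n in range(len(x) // L):
        block = xp[n * L:n * L + 2 * L]                  # previous L + current L samples
        yb = np.fft.ifft(np.fft.fft(block) * Hf).real
        y[n * L:(n + 1) * L] = yb[L:]                    # first L samples are circularly aliased
    return y
```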

Optimization criterion
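To obtain a MIMO algorithm in the frequency domain, we first formulate a block-error signal and a suitable cost function for optimization. According to Figure 4, the error signal of the qth output channel (q = 1, ..., Q) is

$$e_q(k) = y_q(k) - \sum_{p=1}^{P}\mathbf{x}_p^T(k)\,\hat{\mathbf{h}}_{p,q} = y_q(k) - \mathbf{x}^T(k)\,\hat{\mathbf{h}}_q, \qquad (18)$$

where the vectors x_p(k) and ĥ_p,q are defined as in (16) and (14), respectively. The vectors x(k) and ĥ_q are obtained by concatenating the vectors x_p(k) and ĥ_p,q, respectively. To allow the application of DFTs, the corresponding L × 1 block error signal vector is defined as

$$\mathbf{e}_q(n) = \left[e_q(nL), e_q(nL+1), \ldots, e_q(nL+L-1)\right]^T,$$

where n denotes the block index over time. Moreover, the signals of all Q channels are put together into an L × Q block-error signal matrix

$$\mathbf{E}(n) = \left[\mathbf{e}_1(n), \mathbf{e}_2(n), \ldots, \mathbf{e}_Q(n)\right],$$

which leads to an equivalent matrix formulation of (18),

$$\mathbf{E}(n) = \mathbf{Y}(n) - \mathbf{U}(n)\,\hat{\mathbf{H}},$$

containing the L × Q block signal matrix Y(n) = [y_1(n), ..., y_Q(n)], built like E(n), the L × PL input matrix U(n) = [U_1(n), ..., U_P(n)] with U_p(n) = [x_p(nL), x_p(nL+1), ..., x_p(nL+L−1)]^T, and the PL × Q matrix of MIMO filter coefficients Ĥ = [ĥ_1, ..., ĥ_Q]. The data vector x(k) in (18) thus translates into a block-Toeplitz matrix in the block formulation. According to the overlap-save method [4,16], each Toeplitz block can be transformed by appropriate windowing and by using DFT matrices F of size 2L × 2L into a diagonal matrix (underlined symbols denote DFT-domain quantities):

$$\mathbf{U}_p(n) = \mathbf{W}^{01}_{L\times 2L}\,\mathbf{F}^{-1}\,\underline{\mathbf{X}}_p(n)\,\mathbf{F}\,\mathbf{W}^{10}_{2L\times L},$$

where

$$\underline{\mathbf{X}}_p(n) = \operatorname{diag}\left\{\mathbf{F}\left[x_p(nL-L), \ldots, x_p(nL+L-1)\right]^T\right\}, \quad \mathbf{W}^{01}_{L\times 2L} = \left[\mathbf{0}_{L\times L}, \mathbf{I}_{L\times L}\right], \quad \mathbf{W}^{10}_{2L\times L} = \left[\mathbf{I}_{L\times L}, \mathbf{0}_{L\times L}\right]^T.$$

It follows for the MIMO block-error matrix, transformed into the DFT domain,

$$\underline{\mathbf{E}}(n) = \mathbf{F}\,\mathbf{W}^{01}_{2L\times L}\,\mathbf{E}(n) = \underline{\mathbf{Y}}(n) - \mathbf{G}\,\underline{\mathbf{X}}(n)\,\underline{\hat{\mathbf{H}}},$$

where X(n) = [X_1(n), ..., X_P(n)] is of size 2L × 2LP, the 2LP × Q coefficient matrix Ĥ contains the DFT-domain filters F W^10_{2L×L} ĥ_p,q, Y(n) = F W^01_{2L×L} Y(n), W^01_{2L×L} = [0_{L×L}, I_{L×L}]^T, and

$$\mathbf{G} = \mathbf{F}\,\mathbf{W}^{01}_{2L\times L}\,\mathbf{W}^{01}_{L\times 2L}\,\mathbf{F}^{-1}.$$

Having derived a frequency-domain error matrix, we apply the following frequency-domain criterion [4] for optimizing the coefficient matrix Ĥ = Ĥ(n):

$$\mathcal{J}_f(n) = (1-\lambda)\sum_{i=0}^{n}\lambda^{n-i}\operatorname{tr}\left\{\underline{\mathbf{E}}^H(i)\,\underline{\mathbf{E}}(i)\right\}, \qquad (33)$$

where H denotes conjugate transposition and λ (0 < λ < 1) is an exponential forgetting factor. The criterion (33) is very similar to the one leading to the well-known RLS algorithm. Its main advantage is that it allows the use of the fast Fourier transform (FFT) and thus leads to low-complexity adaptive filters.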

Adaptive algorithm
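An RLS-like algorithm can be derived straightforwardly from the so-called normal equation, which is obtained by setting the gradient of (33) w.r.t. Ĥ equal to zero. According to [12], and noting that G^H G = G and G^H Y(i) = Y(i), we have for the gradient

$$\nabla_{\hat{\mathbf{H}}}\mathcal{J}_f(n) = 2\,\frac{\partial\mathcal{J}_f(n)}{\partial\underline{\hat{\mathbf{H}}}^{*}} = 2(1-\lambda)\sum_{i=0}^{n}\lambda^{n-i}\left[\underline{\mathbf{X}}^H(i)\,\mathbf{G}\,\underline{\mathbf{X}}(i)\,\underline{\hat{\mathbf{H}}} - \underline{\mathbf{X}}^H(i)\,\underline{\mathbf{Y}}(i)\right].$$

Setting this gradient equal to zero, we obtain the normal equation

$$\mathbf{S}_{xx}(n)\,\underline{\hat{\mathbf{H}}}(n) = \mathbf{S}_{xy}(n), \qquad (35)$$

where

$$\mathbf{S}_{xx}(n) = (1-\lambda)\sum_{i=0}^{n}\lambda^{n-i}\,\underline{\mathbf{X}}^H(i)\,\mathbf{G}\,\underline{\mathbf{X}}(i) = \lambda\,\mathbf{S}_{xx}(n-1) + (1-\lambda)\,\underline{\mathbf{X}}^H(n)\,\mathbf{G}\,\underline{\mathbf{X}}(n), \qquad (36)$$

$$\mathbf{S}_{xy}(n) = (1-\lambda)\sum_{i=0}^{n}\lambda^{n-i}\,\underline{\mathbf{X}}^H(i)\,\underline{\mathbf{Y}}(i) = \lambda\,\mathbf{S}_{xy}(n-1) + (1-\lambda)\,\underline{\mathbf{X}}^H(n)\,\underline{\mathbf{Y}}(n). \qquad (37)$$

The iterative algorithm, that is, the recursive update of the coefficient matrix Ĥ, is derived directly from (35), (36), and (37). In the recursive equation (37), we replace S_xy(n) and S_xy(n − 1) by formulating (35) in terms of the block-time indices n and n − 1, respectively. We then eliminate S_xx(n − 1) from the resulting equation using (36). Reintroducing the error signal vector (28), we obtain the adaptive algorithm

$$\underline{\mathbf{E}}(n) = \underline{\mathbf{Y}}(n) - \mathbf{G}\,\underline{\mathbf{X}}(n)\,\underline{\hat{\mathbf{H}}}(n-1),$$

$$\underline{\hat{\mathbf{H}}}(n) = \underline{\hat{\mathbf{H}}}(n-1) + (1-\lambda)\,\mathbf{S}_{xx}^{-1}(n)\,\underline{\mathbf{X}}^H(n)\,\underline{\mathbf{E}}(n). \qquad (39)$$

Additionally, the matrix S_xx(n) is estimated by (36). The above algorithm is equivalent to the RLS algorithm in the sense that its mean-squared error convergence is also independent of the condition number of the input covariance matrix. To reduce the computational complexity of the adaptation drastically, it is shown in [4] that the matrix G in (36) can be well approximated by the diagonal matrix G ≈ I/2. Using this approximation and introducing a diagonal 2L × 2L matrix µ of frequency-dependent stepsizes, we may rewrite (36) and (39) as

$$\mathbf{S}_{xx}(n) = \lambda\,\mathbf{S}_{xx}(n-1) + (1-\lambda)\,\underline{\mathbf{X}}^H(n)\,\underline{\mathbf{X}}(n),$$

$$\underline{\hat{\mathbf{H}}}(n) = \underline{\hat{\mathbf{H}}}(n-1) + (1-\lambda)\,\boldsymbol{\mu}\,\mathbf{S}_{xx}^{-1}(n)\,\underline{\mathbf{X}}^H(n)\,\underline{\mathbf{E}}(n),$$

where the diagonal elements of µ satisfy 0 ≤ µ_i ≤ 2, and the optimum stepsize is µ = 2I. Note that, prior to the inversion of S_xx(n), proper regularization by adding a suitable diagonal matrix [4] is important to ensure robust convergence behaviour.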
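The approximated update (with G ≈ I/2) is cheap because each X_p(n) is diagonal: X^H(n) X(n) and S_xx(n) consist of diagonal blocks, so the inversion reduces to one P × P matrix per DFT bin. A minimal NumPy sketch of the unconstrained algorithm with N = L follows; the function name, the scalar stepsize `mu` (instead of a frequency-dependent diagonal matrix), and the regularization constant `delta` are assumptions for illustration.

```python
import numpy as np

def mimo_fdaf(x, y, L, lam=0.9, mu=1.0, delta=1e-3):
    """Unconstrained MIMO frequency-domain adaptive filter with G ~ I/2, block length N = L.

    x: (P, K) input signals; y: (Q, K) desired signals.
    Returns DFT-domain filters H of shape (2L, P, Q) and the per-block error energies.
    """
    P, K = x.shape
    Q = y.shape[0]
    H = np.zeros((2 * L, P, Q), dtype=complex)
    S = np.zeros((2 * L, P, P), dtype=complex)       # per-bin P x P blocks of S_xx
    reg = delta * np.eye(P)
    energies = []
    for n in range(1, K // L):
        # X(n): DFT of the 2L most recent samples of every input channel, one row per bin.
        X = np.fft.fft(x[:, (n - 1) * L:(n + 1) * L], axis=1).T        # (2L, P)
        # Overlap-save output: keep the last L samples of the circular convolution.
        y_hat = np.fft.ifft(np.einsum('vp,vpq->vq', X, H), axis=0).real[L:]
        e = y[:, n * L:(n + 1) * L].T - y_hat                           # (L, Q)
        energies.append(float(np.sum(e ** 2)))
        # Frequency-domain error: prepend L zeros (window W01), then DFT.
        E = np.fft.fft(np.vstack([np.zeros((L, Q)), e]), axis=0)       # (2L, Q)
        # S_xx(n) = lam S_xx(n-1) + (1 - lam) X^H(n) X(n), one P x P matrix per bin.
        S = lam * S + (1 - lam) * np.einsum('vp,vr->vpr', X.conj(), X)
        # H(n) = H(n-1) + (1 - lam) mu S_xx^{-1}(n) X^H(n) E(n), with regularized inverse.
        grad = np.einsum('vp,vq->vpq', X.conj(), E)                     # (2L, P, Q)
        H += (1 - lam) * mu * np.linalg.solve(S + reg, grad)
    return H, energies
```

In a noise-free identification experiment with independent white inputs and true impulse responses of at most L taps, the DFT-domain filters are expected to approach the DFTs of the zero-padded true responses; with real signals, regularization and stepsize control matter much more.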

REALIZATION OF RGSC WITH EMBEDDED STEREOPHONIC AEC IN THE DFT DOMAIN
Fundamentally, adaptive beamforming and AEC need to be combined such that their advantages are exploited and their shortcomings are relieved. Optimum positive synergies between SAEC and GSC are obtained when the SAEC is placed in the sensor channels of the GSC (AEGSC) [7]. Maximum computational efficiency is obtained if the SAEC is located in the fixed-reference path after the FBF (GSAEC), since the number of SAEC output channels Q is then minimized (see Figure 5). In [17], it is shown that most of the synergies are preserved for the latter structure. In this section, we present a frequency-domain GSAEC realization (FGSAEC). Systematic application of multichannel frequency-domain adaptive filters (see Section 4) yields a system that exploits the advantages of multichannel frequency-domain adaptive filtering while preserving the positive synergies between GSC and SAEC. In particular, (a) the crosscorrelation between the loudspeaker signals is taken into account for fast convergence of the SAEC, (b) the adaptation problems of the adaptive sidelobe-cancelling path of the GSC are efficiently resolved (see Section 5.6), and (c) the computational complexity is minimized for efficient implementation of the integrated system on low-cost PC platforms for real-time application (see Section 6 and [9]).

Figure 6: SAEC in the fixed-reference path of the GSC.

Notations
For optimum performance, we use different DFT lengths 2L_g and 2L_h for GSC and SAEC, respectively, which yields the DFT matrices F_{2Lg×2Lg} and F_{2Lh×2Lh}. The parameters L_g = L_b = L_a and L_h equal the numbers of filter taps of the GSC and AEC adaptive filters, respectively. For better tracking behaviour of the adaptive filters, block overlaps by factors α_g and α_h are introduced in the GSC and AEC input signal blocks, respectively [18]. This leads to the block-time index n = kα_g/L_g, which reflects the discrete time in numbers of blocks of length L_g/α_g. In the sequel, we assume that L_h/α_h is an integer multiple of L_g/α_g, which maximizes efficiency. For readability, we define R = L_h α_g/(L_g α_h) and the time index r = kα_h/L_h. GSC and AEC adaptive filters are updated at times n and r, respectively.
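The block bookkeeping can be checked with the typical values given in Section 6 (L_g = 128, L_h = 2048, α_g = 2, α_h = 8, 12 kHz sampling rate). The helper below is hypothetical; it computes the block shifts L_g/α_g and L_h/α_h, the ratio R, and the corresponding durations.

```python
from fractions import Fraction

def block_parameters(L_g, L_h, alpha_g, alpha_h, fs):
    """Block shifts, the ratio R, and the block-shift durations (ms) for GSC and AEC."""
    hop_g = Fraction(L_g, alpha_g)          # new samples per GSC block, L_g / alpha_g
    hop_h = Fraction(L_h, alpha_h)          # new samples per AEC block, L_h / alpha_h
    R = hop_h / hop_g                       # R = L_h alpha_g / (L_g alpha_h)
    if hop_g.denominator != 1 or hop_h.denominator != 1 or R.denominator != 1:
        raise ValueError("block shifts and R must be integers")
    return {
        "hop_g": int(hop_g), "hop_h": int(hop_h), "R": int(R),
        "hop_g_ms": 1000 * float(hop_g) / fs,
        "hop_h_ms": 1000 * float(hop_h) / fs,
    }

params = block_parameters(128, 2048, 2, 8, 12000)
```

For these values, the GSC processes 64 new samples (about 5.3 ms) per block and the AEC 256 new samples (about 21.3 ms), so R = 4 GSC blocks are produced per AEC block.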

FBF
The mainlobe of a simple delay&sum beamformer with a broadside-steered microphone array is too narrow at high frequencies. This often leads to cancellation of the desired signal at high frequencies if the desired speaker position and the steering direction do not match. Dolph-Chebyshev beamformer design [19] allows the first null of the array pattern relative to the steering direction to be chosen arbitrarily while minimizing the level of the sidelobes. It allows the design of filter&sum beamformers with predefined mainlobe widths that are constant over a wide range of frequencies. This makes the design method especially appropriate for our application, since it allows a region in which desired signals are not attenuated to be specified arbitrarily [9].
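The narrowing of the delay&sum mainlobe with frequency can be quantified with a narrowband far-field model. The sketch assumes the array geometry of Section 6 (8 sensors, 4 cm spacing), omnidirectional sensors, and a speed of sound of 343 m/s; the function name is hypothetical, and it does not implement the Dolph-Chebyshev design itself.

```python
import numpy as np

def das_half_power_beamwidth(f, M=8, d=0.04, c=343.0):
    """Half-power beamwidth (degrees) of a broadside delay&sum array of M omnidirectional sensors.

    Narrowband far-field model: |AF(theta)| = |sum_m exp(j 2 pi f m d sin(theta) / c)| / M.
    Returns 180 if the pattern stays above -3 dB over the whole half-plane.
    """
    theta = np.radians(np.linspace(0.0, 90.0, 9001))       # 0.01-degree grid
    m = np.arange(M)[:, None]
    af = np.abs(np.exp(2j * np.pi * f * m * d * np.sin(theta) / c).sum(axis=0)) / M
    below = np.nonzero(af ** 2 < 0.5)[0]                   # first angle below -3 dB
    if below.size == 0:
        return 180.0
    return 2.0 * np.degrees(theta[below[0]])
```

Under this model, the half-power beamwidth is roughly 145 degrees at 500 Hz but only about 11 degrees at 5 kHz, so a small mismatch between speaker position and steering direction already attenuates the high frequencies of the desired signal.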

Stereophonic AEC integrated in fixed-reference path of RGSC
Using the notation of Section 4, the basic signal processing of the SAEC in the fixed-reference path of the GSC can be summarized as follows (see Figure 6). The number of input channels is P = 2, and the number of output channels is Q = 1. According to (24), we capture the last 2L_h samples of the loudspeaker signals x_ls,p(k), p = 1, 2, in vectors x_ls,p(r), and we find for the frequency-domain loudspeaker signals

$$\underline{\mathbf{X}}_{ls,p}(r) = \operatorname{diag}\left\{\mathbf{F}_{2L_h\times 2L_h}\,\mathbf{x}_{ls,p}(r)\right\}.$$

The loudspeaker signals x_ls,p(k) are assumed to be preprocessed by inaudible nonlinearities [14]. Capturing the 2L_h × 1 vectors of adaptive filter transfer functions h_p(r) according to (27) in a vector h(r) = [h_1^T(r), h_2^T(r)]^T and defining a matrix of loudspeaker signals X_ls(r) = [X_ls,1(r), X_ls,2(r)], as in (23), we obtain for the L_h × 1 time-domain block error signal vector (see (25))

$$\mathbf{e}_h(r) = \mathbf{y}_f(r) - \mathbf{W}^{01}_{L_h\times 2L_h}\,\mathbf{F}^{-1}_{2L_h\times 2L_h}\,\underline{\mathbf{X}}_{ls}(r)\,\underline{\mathbf{h}}(r-1),$$

where y_f(r) is the L_h × 1 vector of the most recent FBF output samples.
We define a frequency-domain error signal according to (29) as

$$\underline{\mathbf{e}}_h(r) = \mathbf{F}_{2L_h\times 2L_h}\,\mathbf{W}^{01}_{2L_h\times L_h}\,\mathbf{e}_h(r).$$
This allows us to write the SAEC filter update equation in the form of the approximated update of Section 4 with P = 2 and Q = 1, where λ_h (0 < λ_h < 1) is an exponential forgetting factor and S_XlsXls(r) is a recursive estimate of the cross-power spectral density matrix of the loudspeaker signals,

$$\mathbf{S}_{X_{ls}X_{ls}}(r) = (1-\lambda_h)\,\mathbf{S}_{X_{ls}X_{ls}}(r-1) + \lambda_h\,\underline{\mathbf{X}}_{ls}^H(r)\,\underline{\mathbf{X}}_{ls}(r). \qquad (48)$$

Because the inverse of the cross-power spectral density matrix S_XlsXls(r) appears in the update equation, the crosscorrelation of the loudspeaker signals is explicitly taken into account, leading to fast convergence of the adaptive filters. One block of length L_h/α_h of the AEC output signal is finally given by the last L_h/α_h samples of the error signal e_h(r); this block is a factor R longer than the signal blocks required by the GSC. We therefore split it into R blocks x_b(n − i), i = 0, 1, ..., R − 1, of length L_g/α_g, and R − 1 of these blocks are buffered until they are used by the GSC.
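Because P = 2, the inverse of S_XlsXls(r) required in the update decomposes into one 2 × 2 matrix per DFT bin, which can be inverted in closed form. The sketch follows the recursion (48) as written, where λ_h weights the newest term; the helper names and the regularization `delta` are assumptions for illustration.

```python
import numpy as np

def update_cross_psd(S, X1, X2, lam_h):
    """Per-bin recursion S(r) = (1 - lam_h) S(r-1) + lam_h X_ls^H(r) X_ls(r).

    S: (B, 2, 2) Hermitian matrices; X1, X2: (B,) DFT bins of the two loudspeaker signals.
    """
    x = np.stack([X1, X2], axis=-1)                                  # (B, 2)
    return (1 - lam_h) * S + lam_h * np.einsum('bp,bq->bpq', x.conj(), x)

def inv2x2(S, delta=0.0):
    """Closed-form inverse of the regularized 2x2 matrices S + delta I, vectorized over bins."""
    a = S[:, 0, 0] + delta
    b = S[:, 0, 1]
    c = S[:, 1, 0]
    d = S[:, 1, 1] + delta
    det = a * d - b * c
    inv = np.empty_like(S)
    inv[:, 0, 0] = d / det
    inv[:, 0, 1] = -b / det
    inv[:, 1, 0] = -c / det
    inv[:, 1, 1] = a / det
    return inv
```

The off-diagonal entries of each 2 × 2 matrix carry the interchannel crosscorrelation; a diagonal (per-channel) normalization would ignore exactly this information.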

Adaptive blocking matrix
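Figure 7 depicts the reference path and the adaptive sidelobe-cancelling path. For the ABM, P = 1 and Q = M. To apply the overlap-save method to the ABM adaptive filter inputs in the frequency domain, we have to transform 2α_g subsequent blocks of the AEC output signal x_b(n) into the frequency domain, that is,

$$\underline{\mathbf{X}}_b(n) = \operatorname{diag}\left\{\mathbf{F}_{2L_g\times 2L_g}\left[\mathbf{x}_b^T(n-2\alpha_g+1), \ldots, \mathbf{x}_b^T(n-1), \mathbf{x}_b^T(n)\right]^T\right\}. \qquad (49)$$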
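With the ABM adaptive filters B(n), written in the frequency domain according to (27) as a 2L_g × M matrix B(n), the L_g × M block error matrix E_b(n) is obtained from (25) as the difference between the L_g × M block sensor signal matrix, whose mth column contains the L_g most recent samples of the delayed sensor signal x_m(k − κ_b), and the filtered ABM input W^01_{Lg×2Lg} F^{-1}_{2Lg×2Lg} X_b(n) B(n).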
The time delay κ_b ensures causality of the ABM adaptive filters. Defining the frequency-domain error matrix E_b(n) = F_{2Lg×2Lg} W^01_{2Lg×Lg} E_b(n), the update equation for B(n) is obtained in constrained form; here, the matrix G is defined according to (31) with L replaced by L_g. In contrast to the SAEC (see Section 5.3), circular convolution constraints [4] are required for the ABM, since the impulse responses of the ideal ABM filters are generally much longer than the adaptive filters. Thus, circular convolution effects cannot be disregarded. The 2L_g × 2L_g diagonal matrix µ_b(n) contains frequency-dependent stepsizes on its main diagonal, which control the adaptation of the ABM (see Section 5.6).
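The circular convolution constraint can be viewed as a projection: transform the 2L_g DFT-domain coefficients to the time domain, zero the last L_g taps, and transform back. A minimal NumPy sketch follows (hypothetical helper name; where exactly the projection enters the ABM update is not reproduced here).

```python
import numpy as np

def constrain(Bf, L):
    """Circular-convolution constraint on 2L-point DFT-domain filters.

    Bf: (2L, ...) DFT-domain coefficients, one column per filter. Transforms to the
    time domain, zeroes the last L taps, and transforms back to the DFT domain.
    """
    b = np.fft.ifft(Bf, axis=0)
    b[L:] = 0.0
    return np.fft.fft(b, axis=0)
```

Because the operation is a projection, applying it twice gives the same result as applying it once, and filters that already have at most L taps pass through unchanged.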
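The diagonal power spectral density matrix is given by

$$\mathbf{S}_{X_bX_b}(n) = (1-\lambda_b)\,\mathbf{S}_{X_bX_b}(n-1) + \lambda_b\,\underline{\mathbf{X}}_b^H(n)\,\underline{\mathbf{X}}_b(n).$$

One block of length L_g/α_g of the mth time-domain AIC input signal x_a,m(n) is obtained by saving the last L_g/α_g samples of the mth column of the block error signal matrix E_b(n).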

Adaptive interference canceller
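With P = M and Q = 1, the frequency-domain adaptive filter input matrix X_a(n) of size 2L_g × 2L_g M is given by

$$\underline{\mathbf{X}}_a(n) = \left[\underline{\mathbf{X}}_{a,1}(n), \underline{\mathbf{X}}_{a,2}(n), \ldots, \underline{\mathbf{X}}_{a,M}(n)\right],$$

where X_a,m(n) is obtained in the same way as in (49), with X_b(n) and x_b(n) replaced by X_a,m(n) and x_a,m(n), respectively.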
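Writing the AIC adaptive filters a(n) in the frequency domain according to (27) as a 2L_g M × 1 vector a(n), the time-domain block error vector reads

$$\mathbf{e}_a(n) = \mathbf{y}_a(n) - \mathbf{W}^{01}_{L_g\times 2L_g}\,\mathbf{F}^{-1}_{2L_g\times 2L_g}\,\underline{\mathbf{X}}_a(n)\,\underline{\mathbf{a}}(n),$$

where y_a(n) is the L_g × 1 vector of the most recent samples of the fixed-reference-path signal, delayed by κ_a.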
The time delay κ_a ensures causality. Defining the frequency-domain error signal e_a(n) = F_{2Lg×2Lg} W^01_{2Lg×Lg} e_a(n), we obtain the multichannel filter update equation, in which the 2L_g × 2L_g diagonal matrix µ_a(n) with frequency-dependent stepsizes on its main diagonal controls the adaptation of the AIC (see Section 5.6).
Note that circular convolution is prevented by the matrix G (see (31)). As for the ABM, the ideal AIC impulse responses are much longer than the adaptive filters. The diagonal power spectral density matrix is computed from

$$\mathbf{S}_{X_aX_a}(n) = (1-\lambda_a)\,\mathbf{S}_{X_aX_a}(n-1) + \lambda_a\operatorname{diag}\left\{\underline{\mathbf{X}}_a^H(n)\,\underline{\mathbf{X}}_a(n)\right\}, \qquad (60)$$

where diag{·} extracts the main diagonal of its argument. Finally, one block of length L_g/α_g of the GSC output signal is obtained by saving the last L_g/α_g samples of e_a(n).

Adaptation control
In this system, we modified the GSC adaptation control presented in [8] to operate DFT-bin-wise, which increases convergence speed and robustness significantly. It is based on a spatial SNR estimate: the FBF output yields an estimate of the desired signal PSD, and a fixed beamformer that is complementary to the FBF yields an estimate of the interference PSD. A frequency-dependent SNR estimate is then obtained as the ratio of desired signal PSD and interference PSD. This estimate is used for a bin-wise decision on whether the ABM or the AIC is adapted.
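A minimal sketch of the bin-wise decision follows, assuming recursively smoothed PSDs with smoothing constant `beta` and two hypothetical thresholds (`snr_hi` for ABM adaptation, `snr_lo` for AIC adaptation); the actual thresholds and smoothing of [8] are not given in this section, and the function name is illustrative. Bins with an SNR between the two thresholds adapt neither filter.

```python
import numpy as np

def binwise_adaptation_masks(Yf, Yc, psd_d, psd_n, beta=0.9, snr_hi=4.0, snr_lo=0.25):
    """Recursive PSD estimates and per-bin ABM/AIC adaptation decisions.

    Yf: DFT bins of the FBF output (desired-signal estimate); Yc: DFT bins of a
    complementary beamformer output (interference estimate); psd_d, psd_n: previous PSDs.
    Returns the updated PSDs, the SNR estimate, and boolean masks for ABM and AIC.
    """
    psd_d = beta * psd_d + (1 - beta) * np.abs(Yf) ** 2
    psd_n = beta * psd_n + (1 - beta) * np.abs(Yc) ** 2
    snr = psd_d / np.maximum(psd_n, 1e-12)
    adapt_abm = snr > snr_hi        # desired signal dominates: adapt the ABM
    adapt_aic = snr < snr_lo        # interference dominates: adapt the AIC
    return psd_d, psd_n, snr, adapt_abm, adapt_aic
```

The masks would then gate the diagonal stepsize matrices µ_b(n) and µ_a(n), for example by setting the stepsizes of non-adapted bins to zero.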
We do not consider the stepsize control of the AEC here. Various stepsize-control methods can be found in the literature (see, e.g., [2,20]). The adaptation control of GSC and SAEC does not need to rely on synergies between both adaptation mechanisms, so that GSC and SAEC can be adapted independently of each other.

REAL-TIME IMPLEMENTATION AND EXPERIMENTAL EVALUATION
For demonstrating the performance of our acoustic human-machine interface in real time, we implemented the FGSAEC algorithm on a PC platform. The multichannel audio capture unit is realized as separate hardware integrating the microphones, the preamplifiers, the A/D conversion, and the microphone calibration. The digitized sensor data is fed into the PC via a standard USB port with specific drivers for the microphone array. Our experiments were conducted on an Intel Pentium IV 1.8 GHz processor at a sampling rate of 12 kHz. Optimum performance of the frequency-domain RGSC (FGSC) and FGSAEC was obtained with 30% and 52% CPU load, respectively.
For all experiments, we use a linear microphone array with 8 equally spaced, broadside-steered sensors with 4 cm spacing in an office environment with a reverberation time of 300 ms. The male desired speaker and the male interferer, with an average signal power ratio of 0 dB, are located in the array look direction and 30 degrees off the array axis, respectively. The stereophonic loudspeakers emitting music are placed to the left and to the right of the microphone array. All distances to the array center are 60 cm. The frequency band is 300 Hz to 5.9 kHz. The FBF is realized by a Dolph-Chebyshev design described in [9]. Typical numerical values for filter lengths and block overlapping factors are L_g = 128, L_h = 2048 and α_g = 2, α_h = 8, respectively. In Section 6.1, we study the steady-state performance of FGSAEC. In Section 6.2, the tracking capability of the RGSC with a moving desired speaker is illustrated.

Performance after convergence of the adaptive filters
For evaluating the proposed system after convergence of the adaptive filters, we compare the average interference rejection (IR) and the average echo-return-loss enhancement (ERLE) of FGSAEC, FGSC, frequency-domain AEGSC (FAEGSC), and TGSAEC (the time-domain equivalent of FGSAEC), both for interference only and for double-talk of interference and desired speaker.
Since it is difficult to study IR and ERLE separately for real-time scenarios, we illustrate the results that we obtained with recorded signals in simulations. Audio examples which illustrate the performance of the real-time system can be found in [21].
The results are depicted in Table 1. For interference only, IR and ERLE are higher than in the double-talk case, since the ABM is kept fixed and the AIC can be adapted permanently over the entire frequency range, yielding optimum tracking of nonstationary interference. The performance of TGSAEC and FGSAEC is identical. During double-talk, IR and ERLE are considerably improved for FGSAEC relative to TGSAEC, as controlling the adaptation in individual frequency bins still allows tracking of the transient ABM and of nonstationary interference at frequencies with low SNR [9]. FGSAEC clearly improves the suppression of acoustic echoes relative to FGSC; however, the optimum performance of FAEGSC cannot be obtained due to leakage effects across the GSC sidelobe-cancelling path [17].

Tracking of the ABM
For illustrating the tracking capability of the ABM, desired signal rejection (DR) and interference rejection of the FGSC are measured over time for a changing desired speaker position. Both rejections are estimated by the ratio of recursively averaged squared sensor signals and beamformer output signals w.r.t. the desired signal components and interference components, respectively. Figure 8 depicts the results for the ABM in comparison with a fixed blocking matrix (BM) after [11]. Parameters are chosen to give the same IR(k) for ABM and fixed BM. For controlling the adaptation of both GSC realizations, knowledge about the true sensor SIR is assumed. For controlling the adaptation of ABM and AIC, knowledge of the true sensor SNR is assumed. At 1.66 s, the desired speaker switches from broadside (0 degrees) to 10 degrees; neither interference suppression nor desired signal quality is impaired, owing to the fast tracking capability of the FGSC. The fixed blocking matrix is designed to suppress signals from a single propagation path. Due to reverberation, it leads to considerable desired signal distortion before and after the change of the desired speaker position.
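The rejection measures can be sketched as the ratio, in dB, of two recursively averaged squared signals, for example the desired-signal component at a sensor and at the beamformer output for DR. The averaging constant `beta`, the use of a single reference signal (instead of, say, an average over sensors), and the helper name are assumptions for illustration.

```python
import numpy as np

def recursive_power_ratio_db(num, den, beta=0.99, eps=1e-12):
    """Ratio (dB) of recursively averaged squared signals, e.g. sensor vs. output components.

    p(k) = beta * p(k-1) + (1 - beta) * s(k)**2 for both signals;
    returns 10 log10(p_num(k) / p_den(k)) for every sample k.
    """
    p_num = p_den = 0.0
    out = np.empty(len(num))
    for k in range(len(num)):
        p_num = beta * p_num + (1 - beta) * num[k] ** 2
        p_den = beta * p_den + (1 - beta) * den[k] ** 2
        out[k] = 10 * np.log10((p_num + eps) / (p_den + eps))
    return out
```

An output component attenuated by a factor of 10 in amplitude relative to the sensor component corresponds to a rejection of 20 dB.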

CONCLUSIONS
The presented signal-processing algorithms describe an example of efficient integration of adaptive beamforming and multichannel AEC which meets the practical requirements regarding the suppression of interference and acoustic echoes for seamless acoustic human-machine interfaces well. Without structural changes, it can be extended to more reproduction channels and even to multichannel recording. Moving the implementation from the PC platform to more specialized hardware will be smooth as long as efficient and numerically sound implementations of basic signal-processing algorithms such as fast Fourier transforms are assured and as long as block processing and control loops pose no obstacles.