 Research
 Open access
Efficient multichannel acoustic echo cancellation using constrained tap selection schemes in the subband domain
EURASIP Journal on Advances in Signal Processing volume 2017, Article number: 63 (2017)
Abstract
Acoustic echo cancellation (AEC) is a key speech enhancement technology in speech communication and voice-enabled devices. AEC systems employ adaptive filters to estimate the acoustic echo paths between the loudspeakers and the microphone(s). In applications involving surround sound, the computational complexity of an AEC system may become demanding due to the multiple loudspeaker channels and the necessity of using long filters in reverberant environments. In order to reduce the computational complexity, the approach of partially updating the AEC filters is considered in this paper. In particular, we investigate tap selection schemes which exploit the sparsity present in the loudspeaker channels for partially updating subband AEC filters. The potential for exploiting signal sparsity across three dimensions, namely time, frequency, and channels, is analyzed. A thorough analysis of different state-of-the-art tap selection schemes is performed and insights into their limitations are gained. A novel tap selection scheme is proposed which overcomes these limitations by exploiting signal sparsity while not ignoring any filters for update in the different subbands and channels. Extensive simulation results using both artificial and real-world multichannel signals show that the proposed tap selection scheme outperforms state-of-the-art tap selection schemes in terms of echo cancellation performance. In addition, it yields almost identical echo cancellation performance compared with updating all filter taps, at a significantly reduced computational cost.
1 Introduction
Acoustic echo cancellation (AEC) [1, 2] is a key technology used in hands-free telephony and voice-enabled systems. An AEC system consists of an adaptive filter which estimates the acoustic echo path between the loudspeaker and the microphone. Using this estimated echo path, an estimate of the acoustic echo signal is generated, which is then subtracted from the microphone signal. When multiple loudspeakers are present, as is the case for surround-sound systems, multichannel acoustic echo cancellation (MAEC) systems are required [3–6]. These systems consist of multiple adaptive filters dedicated to estimating the acoustic echo paths between each loudspeaker and each microphone, i.e., one filter per channel. When employing time-domain MAEC systems in large and/or reverberant rooms, very long filters with several thousand taps may be required in order to achieve effective echo cancellation. Using such long filters requires a large computational effort, both for updating the filters and for generating the acoustic echo signal estimates.
In order to reduce the computational complexity of time-domain adaptive filters, a number of tap selection schemes [7–14] have been proposed for implementing partial updates of the adaptive filters. These schemes reduce complexity by updating only a subset of M out of all N filter taps in each iteration, where the subset is chosen based on a tap selection criterion. Since speech and/or surround-sound entertainment signals usually exhibit significant sparsity across frequency (due to spectrally colored content), channels (due to different content in the different loudspeakers) and time (due to nonstationary content), a number of tap selection schemes have been proposed which exploit the sparsity present in the loudspeaker signals for partially updating the filters [8–12]. The M-Max scheme [9, 10] is a well-known tap selection scheme which exploits signal sparsity by selecting the filter taps corresponding to the M largest-magnitude tap-inputs in each iteration. For a given M, this scheme maximizes the energy of the update in each iteration and thereby gives the closest possible performance to a full filter update in terms of minimizing the mean squared error. Another tap selection scheme which exploits signal sparsity is the selective-partial-update (SPU) scheme [11], where the N-tap adaptive filter is first divided into B blocks, which are then ranked according to the squared Euclidean norm of their respective tap-inputs. Based on this ranking, in each iteration the top \(\lfloor B \cdot \frac {M}{N} \rfloor \) blocks are selected to be updated, where ⌊·⌋ denotes the flooring operation. Many other schemes have been proposed which further improve performance by exploiting the sparseness of the echo path [13, 14].
Since sparseness of the echo path is more relevant for applications such as network echo cancellation [1] than for the considered AEC application (as acoustic impulse responses are generally not sparse), we do not consider such approaches in this paper.
Apart from a large computational complexity, MAEC systems also suffer from other notable problems such as the misalignment problem [3, 15, 16]. Since in MAEC systems the different loudspeaker input signals are typically correlated with each other, the input covariance matrix may be ill-conditioned, possibly resulting in a large filter misalignment and a slow convergence speed. Note that the misalignment problem is typically more severe in the context of speech communication systems, where the loudspeaker signals are obtained by filtering the same source (far-end speaker), than in surround-sound systems, where the loudspeaker signals may be independent of each other. The most common approach to tackle the misalignment problem is to decorrelate the tap-inputs, for which several techniques have been proposed in the literature [3, 15, 17]. Tap selection schemes such as the exclusive-maximum (XM) scheme [18–20] have also been proposed to specifically tackle the misalignment problem for stereo AEC applications. The XM scheme improves the conditioning of the tap-input covariance matrix via exclusive updates of the two adaptive filters, i.e., in each iteration the same filter tap index is never selected in both channels. In this paper, however, we do not aim to solve the misalignment problem using tap selection schemes and do not claim to improve the misalignment performance for highly coherent loudspeaker signals; our main motivation is solely the computational complexity reduction of MAEC systems.
As an alternative to time-domain adaptive filters, frequency-domain and subband adaptive filters are frequently used, as they enable more efficient and frequency-dependent filter updates [2, 21–25]. Frequency-domain adaptive filtering algorithms, such as the frequency-domain least mean square (FLMS) [21], the partitioned block frequency-domain adaptive filtering (PBFDAF) [22] and the multidelay block frequency-domain adaptive filtering (MDF) algorithm [23], are typically based on the overlap-save method [24, 25] and use the fast Fourier transform (FFT) to efficiently compute the required time-domain convolution and correlation operations. In [26], the M-Max tap selection scheme has been proposed for the frequency-domain MDF algorithm. Alternatively, adaptive filtering can be performed using subband processing, where an analysis filterbank transforms the time-domain signals into the subband domain, the filter adaptation and processing are performed independently in each subband, and a synthesis filterbank is used to reconstruct the time-domain signals. In this paper, we only consider subband adaptive filters. More specifically, we use the well-known weighted overlap-add (WOLA) method [2, 27], i.e., an FFT analysis filterbank to transform the (windowed) time-domain signals to the short-time Fourier transform (STFT) domain and an inverse FFT synthesis filterbank. Such a processing scheme provides a suitable compromise between computational complexity and latency, and allows a suitable time and frequency resolution to be achieved.
In general, using a tap selection scheme may lead to a significant amount of processing overhead, primarily due to the required sorting effort. The computational savings obtained due to the partial filter update are offset (and may even be exceeded in some cases) by the additional effort required for sorting. Compared with popular sorting algorithms such as the QUICKSORT routine [28], a more efficient, fast-running algorithm known as the SORTLINE routine [29] has been proposed for sorting vectors which have many elements in common with a pre-sorted vector from a previous iteration, which is often the case for tap-input vectors from one iteration to the next.
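The idea behind SORTLINE can be illustrated with a small sketch. Assuming (our simplification, not necessarily the exact routine of [29]) that the tap-input magnitudes of one subfilter form a sliding window in which one new value enters and the oldest value leaves per iteration, a pre-sorted copy can be maintained with binary search instead of being re-sorted from scratch. The class `SlidingSortedWindow` is hypothetical and relies on Python's standard `bisect` module; locating both positions costs O(log L) comparisons, although the list insertion and deletion still move O(L) elements in memory.

```python
import bisect
from collections import deque


class SlidingSortedWindow:
    """Sorted view of the last `length` tap-input magnitudes (illustrative sketch)."""

    def __init__(self, length):
        self.length = length
        self.window = deque()      # (magnitude, seq) in arrival order, oldest first
        self.sorted_entries = []   # same entries, ascending by magnitude
        self.seq = -1              # sequence number of the newest entry

    def push(self, magnitude):
        """Shift in a new magnitude and drop the oldest one if the window is full."""
        self.seq += 1
        if len(self.window) == self.length:
            oldest = self.window.popleft()
            # Binary search for the outgoing entry (entries are unique via seq).
            idx = bisect.bisect_left(self.sorted_entries, oldest)
            del self.sorted_entries[idx]
        entry = (magnitude, self.seq)
        self.window.append(entry)
        bisect.insort(self.sorted_entries, entry)  # binary-search insertion

    def top_taps(self, m):
        """Tap indices (0 = newest frame) of the m largest magnitudes."""
        chosen = self.sorted_entries[-m:] if m > 0 else []
        return sorted(self.seq - s for _, s in chosen)
```

For example, after pushing 0.1, 0.9, 0.3, 0.5 and 0.2 into a window of length 4, `top_taps(2)` returns `[1, 3]`, i.e., the taps fed by the frames one and three frames in the past.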
In this paper, we propose and investigate different tap selection schemes in the subband domain for constrained partial updates of subband MAEC filters. Note that in such a framework, the tap selection schemes operate on the magnitudes of the complex-valued STFT coefficients. Also, we consider the subband AEC filter in each channel to be composed of a number of subfilters, i.e., one subfilter per subband. First, we extend the M-Max tap selection scheme proposed for complex-valued loudspeaker signals in [26] to the multichannel scenario, thereby applying the M-Max criterion across three dimensions, i.e., subbands, channels and filter length. Then, we present two new tap selection schemes which apply the M-Max criterion independently in each subfilter across filter length only. The first scheme selects the same number of taps in each subfilter, while the second scheme exploits the sparsity present in the loudspeaker signals across frequency and channels to select taps dynamically in the different subfilters. Preliminary results in [30] indicated that the signal sparsity present in real-world multichannel entertainment signals can be exploited to efficiently update the MAEC filters. The proposed tap selection schemes are then compared to the SPU tap selection scheme [11] in the subband domain^{1}.
The remainder of the paper is organized as follows. The signal model is presented in Section 2 and the different tap selection schemes considered are presented in Section 3. Section 4 presents a sparsity analysis for several synthetic and real-world multichannel signals, and the echo cancellation performance obtained when the different tap selection schemes are used. Section 5 discusses the computational effort required for the different tap selection schemes and the computational savings obtained when performing partial filter updates.
2 Signal model
We consider a loudspeaker–enclosure–microphone (LEM) system with R loudspeakers and a single microphone. The acoustic echo paths between the loudspeakers and the microphone are assumed to be time-invariant, such that the echo contribution from the rth loudspeaker at discrete time index n is given by
$$ d_{r}(n) = \sum_{v=0}^{V_{r}-1} h_{r}(v)\, x_{r}(n-v), $$(1)
where \(x_{r}\) denotes the rth input signal and \(h_{r}\) denotes the impulse response corresponding to the rth acoustic echo path, with \(V_{r}\) denoting its length. Considering the near-end speech signal s and the near-end noise signal b, the microphone signal y is given as
$$ y(n) = \sum_{r=1}^{R} d_{r}(n) + s(n) + b(n) = d(n) + s(n) + b(n), $$(2)
where \(d(n) = \sum _{r=1}^{R} d_{r}(n)\) denotes the total acoustic echo component.
For the subband-domain processing, an FFT analysis filterbank of order \(N_{\text{FFT}}\) is used to transform the (windowed) time-domain signals into the STFT domain, with the total number of subbands given by \(K = \frac {N_{\text {FFT}}}{2} + 1\). The STFT coefficient of the rth input signal in the kth subband and ℓth frame is computed as
$$ X_{r}(k,\ell) = \sum_{n=0}^{N_{\text{FFT}}-1} W_{\text{ana}}(n)\, x_{r}(\ell F + n)\, e^{-j 2\pi k n / N_{\text{FFT}}}, $$(3)
where \(j = \sqrt {-1}\), F denotes the frameshift and \(W_{\text{ana}}\) denotes the analysis window. In the remainder of the paper, the terms reference channels and reference spectra will be used to refer to the loudspeaker signals and their corresponding STFT coefficients, respectively.
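The STFT analysis can be checked numerically with a direct (slow) DFT. The sketch below is illustrative only: it assumes that frame ℓ starts at sample ℓF, uses a periodic Hann window as one possible reading of "Hanning window", and replaces the FFT by an explicit O(N_FFT²) sum; `hann` and `stft_frame` are hypothetical names.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window (one common reading of 'Hanning'; an assumption)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)]


def stft_frame(x, frame, n_fft, hop, window):
    """STFT coefficients X(k, frame), k = 0..n_fft/2, via an explicit DFT.

    Assumes frame `frame` starts at sample frame * hop (the frameshift F).
    A practical implementation would use an FFT instead of this O(n_fft^2) sum.
    """
    start = frame * hop
    seg = [window[n] * x[start + n] for n in range(n_fft)]
    num_bins = n_fft // 2 + 1  # K = N_FFT / 2 + 1 non-redundant bins
    return [
        sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
        for k in range(num_bins)
    ]
```

With N_FFT = 512 it returns K = 257 coefficients per frame, and a frameshift of F = 128 samples corresponds to the 75% overlap used in Section 4.1.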
The subband MAEC system is depicted in Fig. 1 and consists of R adaptive filters, i.e., one for each reference channel, where each filter is composed of K subfilters with L taps each. Thus, the total number of filter taps is given as
$$ N = L \cdot K \cdot R, $$(4)
i.e., L taps × K subbands × R channels.
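The subfilter for the kth subband in the rth channel is denoted as \(\underline {\hat {H}}_{r}(k,\ell)\) and consists of L complex-valued coefficients
$$ \underline{\hat{H}}_{r}(k,\ell) = \left[\hat{H}^{0}_{r}(k,\ell) \;\; \hat{H}^{1}_{r}(k,\ell) \;\; \ldots \;\; \hat{H}^{L-1}_{r}(k,\ell)\right]^{T}, $$(5)
where \(\hat {H}^{i}_{r}(k,\ell)\) denotes the ith filter tap and \((\cdot)^{T}\) denotes the transpose operator. The tap-input vector to the subfilter \(\underline {\hat {H}}_{r}(k,\ell)\) also consists of L complex-valued spectral coefficients and is given as
$$ \underline{X}_{r}(k,\ell) = \left[X_{r}(k,\ell) \;\; X_{r}(k,\ell-1) \;\; \ldots \;\; X_{r}(k,\ell-L+1)\right]^{T}. $$(6)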
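The acoustic echo estimate for the rth channel is generated by filtering the reference spectrum \(\underline {X}_{r}(k,\ell)\) with the subfilter \(\underline {\hat {H}}_{r}(k,\ell)\), i.e.,
$$ \hat{D}_{r}(k,\ell) = \underline{\hat{H}}^{H}_{r}(k,\ell)\, \underline{X}_{r}(k,\ell), $$(7)
where \((\cdot)^{H}\) denotes the Hermitian (conjugate transpose) operator. The total MAEC filter output is given as
$$ \hat{D}(k,\ell) = \sum_{r=1}^{R} \hat{D}_{r}(k,\ell), $$(8)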
with the residual echo equal to
$$ E(k,\ell) = Y(k,\ell) - \hat{D}(k,\ell), $$(9)
where Y denotes the complex-valued spectrum of the microphone signal y, computed similarly to (3).
In order to reduce the computational complexity of the MAEC filter update in every frame, we consider a partial update of \(\underline {\hat {H}}_{r}(k,\ell)\) by updating only a subset of \(\mathcal {L}_{r}(k,\ell)\) out of all L filter taps, where the integer \(\mathcal {L}_{r}(k,\ell)\) is determined using a tap selection scheme (see Section 3). These tap selection schemes compute a vector
$$ \underline{T}_{r}(k,\ell) = \left[T^{0}_{r}(k,\ell) \;\; T^{1}_{r}(k,\ell) \;\; \ldots \;\; T^{L-1}_{r}(k,\ell)\right]^{T}, $$(10)
consisting of L binary-valued elements. If the element \(T^{i}_{r}(k,\ell) = 1\), the corresponding filter tap \(\hat {H}^{i}_{r}(k,\ell)\) is selected to be updated; otherwise it is not. Thus, the sum of the elements of \(\underline {T}_{r}(k,\ell)\) always satisfies
$$ \sum_{i=0}^{L-1} T^{i}_{r}(k,\ell) = \mathcal{L}_{r}(k,\ell). $$(11)
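For updating \(\underline {\hat {H}}_{r}(k,\ell)\), we use a variant of the normalized least mean squares (NLMS) algorithm [25], incorporating a partial filter update as shown below:
$$ \underline{\hat{H}}_{r}(k,\ell+1) = \underline{\hat{H}}_{r}(k,\ell) + \frac{\mu}{P(k,\ell) + \epsilon}\, \underline{T}_{r}(k,\ell) \odot \underline{X}_{r}(k,\ell)\, E^{*}(k,\ell), $$(12)
where μ denotes the (fixed) step-size, \((\cdot)^{*}\) denotes the complex-conjugate operator and ⊙ denotes the element-wise multiplication operator. The step-size is normalized by the sum of the regularization parameter ε and the multichannel tap-input power
$$ P(k,\ell) = \sum_{r=1}^{R} \underline{X}^{H}_{r}(k,\ell)\, \underline{X}_{r}(k,\ell). $$(13)
From here on, we will refer to (12) as the partial update NLMS (PU-NLMS) algorithm.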
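A minimal sketch of one PU-NLMS iteration for a single subband may help make the role of the tap selection vector concrete. It assumes an update of the form Ĥ_r ← Ĥ_r + μ/(ε + Σ_r ‖X_r‖²) · T_r ⊙ X_r · E*, with residual E = Y − Σ_r Ĥ_r^H X_r; the function name `punlms_update` and the list-of-lists data layout are hypothetical.

```python
def punlms_update(H, X, T, Y, mu, eps):
    """One partial-update NLMS iteration for a single subband (sketch).

    H, X, T: one list per channel r, each holding L values: subfilter taps,
    tap-inputs [X_r(k,l), ..., X_r(k,l-L+1)] and 0/1 tap-selection flags.
    Y: microphone STFT coefficient Y(k,l). Returns (updated H, residual E).
    """
    # Echo estimate: sum over channels of the Hermitian product H_r^H X_r.
    d_hat = sum(
        sum(h.conjugate() * x for h, x in zip(H_r, X_r)) for H_r, X_r in zip(H, X)
    )
    e = Y - d_hat
    # Multichannel tap-input power (all taps, selected or not).
    power = sum(abs(x) ** 2 for X_r in X for x in X_r)
    step = mu / (eps + power)
    # Only taps with flag 1 move; flag-0 taps are copied unchanged.
    H_new = [
        [h + step * t * x * e.conjugate() for h, x, t in zip(H_r, X_r, T_r)]
        for H_r, X_r, T_r in zip(H, X, T)
    ]
    return H_new, e
```

Note that the normalization uses the power of all tap-inputs even though only the selected taps are updated, so the per-tap update magnitude is the same as in the full-update NLMS.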
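All tap selection schemes considered in this paper are based on the magnitudes of the tap-input vector \(\underline {X}_{r}(k,\ell)\), i.e.,
$$ \underline{\mathcal{X}}_{r}(k,\ell) = \left[|X_{r}(k,\ell)| \;\; |X_{r}(k,\ell-1)| \;\; \ldots \;\; |X_{r}(k,\ell-L+1)|\right]^{T}. $$(14)
By stacking the vector \(\underline {\mathcal {X}}_{r}(k,\ell)\) over all K subbands and R channels, we define the N-element vector
$$ \underline{\mathbf{X}}(\ell) = \left[\underline{\mathcal{X}}^{T}_{1}(0,\ell) \;\; \ldots \;\; \underline{\mathcal{X}}^{T}_{1}(K-1,\ell) \;\; \ldots \;\; \underline{\mathcal{X}}^{T}_{R}(K-1,\ell)\right]^{T}, $$(15)
containing the magnitudes of all MAEC filter tap-inputs. Similarly to (15), we define the N-element tap selection vector \(\underline {\alpha }(\ell)\) by stacking the vector \(\underline {T}_{r}(k,\ell)\) over all K subbands and R channels.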
3 Tap selection schemes
In this section, we investigate and propose different tap selection schemes for designing the tap selection vector \(\underline {\alpha }(\ell)\). All tap selection schemes exploit sparsity in \(\underline {\mathbf {X}}(\ell)\) across one or more dimensions, i.e., frames, subbands and channels. A vector is considered sparse if a small number of its elements contain a large proportion of its energy. The terms temporal, spectral, and spatial sparsity will be used to refer to sparsity present across frames, subbands, and channels, respectively. For all considered schemes, we impose the constraint that in every frame exactly M taps across all K·R subfilters are selected to be updated, with
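where \(Q \! \in \! \mathcal {R}\) is a design parameter, with 0≤Q≤1. Note that Q=0 implies no filter update and Q=1 implies a full filter update. This also means that exactly M elements in the tap selection vector \(\underline {\alpha }(\ell)\) are equal to 1, i.e.,
$$ \sum_{i=0}^{N-1} \alpha^{i}(\ell) = M, $$(17)
where \(\alpha^{i}(\ell)\) denotes the ith element of \(\underline {\alpha }(\ell)\).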
The first tap selection scheme we investigate is the 3D M-Max scheme, which applies the M-Max criterion across the three dimensions of subbands, channels, and filter length for selecting taps. Then, we investigate the SPU scheme, which sorts the K·R subfilters in each frame according to the squared Euclidean norm of their respective tap-inputs and then selects all L taps in the top \(\left \lfloor \frac {M}{L} \right \rfloor \) subfilters. Finally, we present two 1D M-Max schemes which apply the M-Max criterion only across the dimension of filter length, with the first scheme selecting the same number of taps in all subfilters and the second scheme dynamically selecting taps in each subfilter.
3.1 3D M-Max (3DM) scheme
The 3D M-Max tap selection scheme is an extension of the M-Max scheme proposed for the single-channel scenario in [26] to the multichannel scenario. Using this scheme, the filter taps corresponding to the M largest-magnitude tap-inputs in every frame are selected to be updated by applying the M-Max criterion on the vector \(\underline {\mathbf {X}}(\ell)\). The resulting tap selection vector \(\underline {\alpha }(\ell)\) can then be unstacked to obtain the vectors \(\underline {T}_{r}(k,\ell)\) corresponding to the K·R subfilters. Implementing this scheme requires sorting the N-element vector \(\underline {\mathbf {X}}(\ell)\) in every frame, which is done efficiently using the QUICKSORT routine, requiring on the order of \(\mathcal {O}\left (N \cdot \log _{2} N\right)\) comparisons per frame.
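The 3DM selection can be sketched as follows. The data layout (a dict mapping (r, k) to the L tap-input magnitudes of each subfilter) and the name `select_3dm` are assumptions for illustration. Instead of a full QUICKSORT, the sketch uses `heapq.nlargest` from the Python standard library, which yields the same set of M largest magnitudes (up to ties) without sorting the entire N-element vector.

```python
import heapq


def select_3dm(mags, M):
    """3D M-Max: flag the M largest tap-input magnitudes over all K*R subfilters.

    mags: dict (r, k) -> list of L magnitudes |X_r(k, l - i)|, i = 0..L-1.
    Returns dict (r, k) -> list of L flags, i.e. the vectors T_r(k, l).
    """
    # Flatten to (magnitude, subfilter, tap) triples: the stacked vector X(l).
    flat = [(m, key, i) for key, row in mags.items() for i, m in enumerate(row)]
    chosen = heapq.nlargest(M, flat, key=lambda item: item[0])
    T = {key: [0] * len(row) for key, row in mags.items()}
    for _, key, i in chosen:
        T[key][i] = 1
    return T
```

With magnitudes `{(0, 0): [5.0, 0.1], (0, 1): [0.2, 0.3], (1, 0): [4.0, 3.0]}` and M = 3, subfilter (0, 1) receives no taps at all, which illustrates how 3DM can ignore weak subfilters.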
As this scheme applies the M-Max criterion on the complete vector \(\underline {\mathbf {X}}(\ell)\), it is able to exploit the spectro-spatio-temporal sparsity that may be present in the multichannel reference spectra, with the M selected taps distributed amongst the different subfilters in every frame. For reference spectra with significant temporal, spatial and spectral diversity/nonstationarity, it is highly likely that each of the N filter taps is eventually updated at some stage. However, if the reference spectra exhibit stationarity and large spectral coloration and/or a large inter-channel power difference, all M taps may be selected in only a small subset of the K·R subfilters in every frame. This may result in the subfilters in certain subbands and/or channels being completely ignored for a long time period, which may severely affect filter convergence. This disadvantage of the 3DM scheme motivates us to look for schemes which do not completely ignore these subfilters when allocating taps to be updated.
3.2 SPU scheme
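In the SPU scheme [11], in each frame the K·R subfilters are sorted according to the squared Euclidean norm of their respective tap-inputs
$$ \left\| \underline{X}_{r}(k,\ell) \right\|_{2}^{2} = \sum_{i=0}^{L-1} \left|X_{r}(k,\ell-i)\right|^{2}. $$(18)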
All L taps in the top \(\left \lfloor \frac {M}{L} \right \rfloor \) subfilters are then selected to be updated, while no taps are selected in the remaining subfilters. Hence, this scheme exploits the sparsity present in the multichannel reference spectra but suffers from the same problem as the 3DM scheme, i.e., it may completely ignore subfilters in certain subbands and/or channels when the reference signals are spectrally colored and stationary and/or exhibit a large inter-channel power difference.
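For comparison, a sketch of the SPU selection, assuming a dict that maps (r, k) to the L tap-input magnitudes of each subfilter; `select_spu` is a hypothetical name. Since the inputs are magnitudes, the squared Euclidean norm is simply the sum of squared entries.

```python
def select_spu(mags, M):
    """SPU: update all L taps of the floor(M / L) subfilters with the largest
    squared Euclidean tap-input norm; all other subfilters get no update.

    mags: dict (r, k) -> list of L magnitudes (all lists of equal length L).
    """
    L = len(next(iter(mags.values())))
    ranked = sorted(mags, key=lambda key: sum(m * m for m in mags[key]), reverse=True)
    chosen = set(ranked[: M // L])
    return {key: [1 if key in chosen else 0] * L for key in mags}
```

With magnitudes `{(0, 0): [5.0, 0.1], (0, 1): [0.2, 0.3], (1, 0): [4.0, 3.0]}`, M = 3 and L = 2, only ⌊3/2⌋ = 1 subfilter is updated, so just 2 taps change and two of the three subfilters are ignored in this frame.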
3.3 1D M-Max schemes
In this section, we present two tap selection schemes which apply the M-Max criterion only across the single dimension of filter length, thereby exploiting the temporal sparsity present in the multichannel reference spectra. Unlike the 3DM and SPU schemes, these two schemes are designed not to completely ignore the subfilters with small-magnitude tap-inputs when allocating taps to be updated. In both schemes, the M-Max criterion is applied on the L-element vector \(\underline {\mathcal {X}}_{r}(k,\ell)\) for selecting taps in the subfilter \(\underline {\hat {H}}_{r}(k,\ell)\), with the number of taps selected given as
$$ \mathcal{L}_{r}(k,\ell) = \left\lfloor \psi_{r}(k,\ell) \cdot L \right\rfloor, $$(19)
where \(\psi_{r}(k,\ell)\) is computed using a different criterion for each of the two schemes.
The fixed effort allocation (FEA) scheme selects the same number of filter taps in each subfilter, thereby not exploiting spectral and spatial sparsity. On the other hand, the dynamic effort allocation (DEA) scheme selects filter taps in each subfilter dynamically, aiming to exploit spectro-spatial sparsity while not ignoring subfilters with small-magnitude tap-inputs. Note that \(\psi_{r}(k,\ell)\) needs to satisfy the condition
$$ 0 \leq \psi_{r}(k,\ell) \leq 1, $$(20)
as \(\mathcal {L}_{r}(k,\ell)\) obviously cannot be larger than L. The vector \(\underline {\mathcal {X}}_{r}(k,\ell)\) is sorted very efficiently using the SORTLINE routine, with the number of comparisons on the order of \(\mathcal {O}(\log _{2} L)\) per frame.
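The per-subfilter selection shared by both 1D M-Max schemes is sketched below. `select_1d_mmax` is a hypothetical helper, and Python's `sorted` stands in for SORTLINE purely for readability; the helper rejects factors ψ outside the range allowed by (20).

```python
import math


def select_1d_mmax(row, psi):
    """1D M-Max within one subfilter: flag the floor(psi * L) largest magnitudes.

    row: the L tap-input magnitudes of the subfilter; psi: factor in [0, 1].
    """
    if not 0.0 <= psi <= 1.0:
        raise ValueError("psi must satisfy 0 <= psi <= 1")
    L = len(row)
    count = math.floor(psi * L)  # number of taps to update, eq. (19)
    order = sorted(range(L), key=lambda i: row[i], reverse=True)
    flags = [0] * L
    for i in order[:count]:
        flags[i] = 1
    return flags
```

The two schemes differ only in how ψ is chosen for each subfilter; the selection inside a subfilter is identical.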
Substituting (16) and (19) into (17) gives
Assuming no rounding errors when computing the flooring operation in (21), the constraint in (16) can be reformulated as
$$ \sum_{r=1}^{R} \sum_{k=0}^{K-1} \psi_{r}(k,\ell) = Q \cdot K \cdot R. $$(22)
3.3.1 Fixed effort allocation (FEA)
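In the FEA scheme, the same number of filter taps is allocated to all K·R subfilters in every frame, i.e.,
$$ \psi^{F}_{r}(k,\ell) = Q, $$(23)
where the superscript F denotes the FEA scheme. Substituting (23) in (22) yields
$$ \sum_{r=1}^{R} \sum_{k=0}^{K-1} \psi^{F}_{r}(k,\ell) = Q \cdot K \cdot R, $$(24)
i.e., the constraint is satisfied in every frame.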
Thus, in each subfilter the filter coefficients corresponding to the ⌊Q·L⌋ largest-magnitude tap-inputs are selected to be updated in every frame. Since the same number of taps is selected in all subfilters, this scheme does not exploit the spectral and spatial sparsity present in the multichannel reference spectra.
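As a worked example of the FEA allocation, take the values used later in Section 4.1 (L = 20, K = 257, R = 5, so N = 25 700) together with two illustrative values of Q that are not taken from the simulations:

```latex
Q = 0.25:\quad \lfloor Q \cdot L \rfloor = \lfloor 0.25 \cdot 20 \rfloor = 5, \qquad
K \cdot R \cdot 5 = 257 \cdot 5 \cdot 5 = 6425 = Q \cdot N, \\
Q = 0.33:\quad \lfloor Q \cdot L \rfloor = \lfloor 6.6 \rfloor = 6, \qquad
K \cdot R \cdot 6 = 7710 \; < \; Q \cdot N = 0.33 \cdot 25\,700 = 8481 .
```

The second case shows the flooring loss that the "no rounding errors" assumption behind (22) sets aside: when Q·L is not an integer, FEA updates fewer than Q·N taps per frame.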
3.3.2 Dynamic effort allocation (DEA)
In the DEA scheme, filter taps are dynamically allocated to the different subfilters based on their respective tap-input content. We propose to allocate a larger number of taps in every frame to subfilters with relatively larger-magnitude tap-inputs, while not completely ignoring the subfilters with smaller-magnitude tap-inputs. Thus, the DEA scheme aims to combine the advantages of the 3DM and the FEA schemes while avoiding their disadvantages, i.e., exploiting the spectro-spatial sparsity present in the multichannel reference spectra while not ignoring the subfilters with small-magnitude tap-inputs.
In general, in the DEA scheme the number of filter taps allocated to the subfilter for the kth subband in the rth channel is based on the corresponding tap-input content, which can be quantified by
$$ \phi_{r}(k,\ell) = \left\| \underline{\mathcal{X}}_{r}(k,\ell) \right\|_{p}, $$(25)
where \(\|\cdot\|_{p}\) denotes the \(l_{p}\)-norm for p>0. Hence, subfilters with larger-magnitude tap-inputs have larger values of \(\phi_{r}(k,\ell)\) than subfilters with smaller-magnitude tap-inputs. For simplicity, we use p=1. The factor \(\psi_{r}(k,\ell)\) in (19) is then computed as
$$ \psi^{G}_{r}(k,\ell) = \min\left\{ f\big(\phi_{r}(k,\ell)\big),\; 1 \right\}, $$(26)
where the superscript G denotes the generic form of the DEA scheme, the function f(·) depends on the tap allocation criterion used, and the minimum operator is required to satisfy the condition in (20). The number of taps selected in the subfilter \(\underline {\hat {H}}_{r}(k,\ell)\) is finally determined by substituting (26) in (19).
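We propose to design the function f(·) based on the simple criterion that subfilters with \(\phi_{r}(k,\ell)\) above a certain threshold \(\phi_{\text{th}}(\ell)\) get all L filter taps selected, while all other subfilters get a number of taps proportional to \(\phi_{r}(k,\ell)\), i.e.,
$$ f\big(\phi_{r}(k,\ell)\big) = \frac{\phi_{r}(k,\ell)}{\phi_{\text{th}}(\ell)}. $$(27)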
Choosing an appropriate value for the threshold \(\phi_{\text{th}}(\ell)\) is important. On the one hand, choosing a low value could result in a large number of subfilters having all L taps updated, which potentially dilutes the extent to which spectro-spatial sparsity is exploited for tap allocation. On the other hand, choosing a large value could result in a large number of subfilters being completely ignored. Hence, we propose to use the average value of \(\phi_{r}(k,\ell)\) across all subbands and channels, i.e.,
$$ \phi_{\text{th}}(\ell) = \frac{1}{K \cdot R} \sum_{r=1}^{R} \sum_{k=0}^{K-1} \phi_{r}(k,\ell). $$(28)
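However, when using the function in (27) with the threshold in (28), it cannot be guaranteed that the constraint in (22) is satisfied in every frame. Since min(a,1)≤a for any real number \(a \in \mathcal {R}\), it can easily be shown that
$$ \sum_{r=1}^{R} \sum_{k=0}^{K-1} \psi^{G}_{r}(k,\ell) \leq \sum_{r=1}^{R} \sum_{k=0}^{K-1} \frac{\phi_{r}(k,\ell)}{\phi_{\text{th}}(\ell)} = K \cdot R, $$(29)
such that
$$ M_{\text{G}}(\ell) = \sum_{r=1}^{R} \sum_{k=0}^{K-1} \psi^{G}_{r}(k,\ell) \leq K \cdot R. $$(30)
Thus, it is not guaranteed that \(M_{\text{G}}(\ell)\) is equal to Q·K·R, and hence the constraint in (22) may not always be satisfied.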
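The generic allocation and the bound on M_G(ℓ) can be checked with a short sketch. It assumes p = 1 as in the text, a dict mapping (r, k) to the L magnitudes of each subfilter, and at least one nonzero magnitude (otherwise the threshold is zero); `dea_generic` is a hypothetical name.

```python
def dea_generic(mags):
    """Generic DEA factors psi^G and their sum M_G for one frame (sketch, p = 1).

    mags: dict (r, k) -> list of L tap-input magnitudes of that subfilter.
    """
    phi = {key: sum(row) for key, row in mags.items()}   # l1-norm, eq. (25)
    phi_th = sum(phi.values()) / len(phi)                 # mean, eq. (28)
    psi_g = {key: min(v / phi_th, 1.0) for key, v in phi.items()}  # eqs. (26)-(27)
    m_g = sum(psi_g.values())                             # eq. (30): m_g <= K * R
    return psi_g, m_g
```

For three subfilters with l1-norms 6, 1 and 1, the threshold is 8/3, giving ψ^G = 1, 0.375 and 0.375 and M_G(ℓ) = 1.75, below K·R = 3; whether this is larger or smaller than Q·K·R depends on Q.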
We now distinguish two cases, \(M_{\text{G}}(\ell) < Q \cdot K \cdot R\) and \(M_{\text{G}}(\ell) > Q \cdot K \cdot R\), and discuss how to adjust the filter tap allocation in order to satisfy the constraint.

Case 1: \(M_{\text{G}}(\ell) < Q \cdot K \cdot R\)
Figure 2 shows an exemplary function \(f(\phi_{r}(k,\ell))\) (black curve) and the corresponding \(\psi ^{G}_{r}(k,\ell)\) (blue curve) plotted for all K·R subfilters for the case \(M_{\text{G}}(\ell) < Q \cdot K \cdot R\), sorted from largest to smallest value of \(\phi_{r}(k,\ell)\). Note that the area under the black curve is equal to K·R, while the area under the blue curve is equal to \(M_{\text{G}}(\ell)\). In order to satisfy the constraint in (22), the surplus effort \(Q \cdot K \cdot R - M_{\text{G}}(\ell)\) needs to be redistributed amongst the subfilters for which \(\psi ^{G}_{r}(k,\ell) < 1\). To do so, different criteria can be used for modifying \(\psi ^{G}_{r}(k,\ell)\):

Trickle Down (TD): When using this criterion (red), the surplus effort is redistributed via the trickle-down procedure, i.e., the subfilters are filled up in sorted order of \(\psi ^{G}_{r}(k,\ell)\). Allocating taps in this way respects the spectro-spatial sparsity present in the tap-inputs, but would most likely completely ignore the subfilters with the smallest-magnitude tap-inputs.

Equal Income (EI): When using this criterion (orange), the same number of taps is allocated to all subfilters for which \(\psi ^{G}_{r}(k,\ell) < 1\). This has the beneficial effect that no subfilters are ignored, but has the detrimental effect that the spectro-spatial sparsity present in the tap-inputs would most likely not be exploited for tap allocation.

Equal Bonus (EB): When using this criterion (green), the surplus effort is redistributed equally amongst all subfilters for which \(\psi ^{G}_{r}(k,\ell) < 1\). Allocating taps in this way respects the spectro-spatial sparsity present in the tap-inputs while making sure that all subfilters get a few taps updated.
Since the EB criterion attains a balance between exploiting spectro-spatial sparsity and not completely ignoring subfilters, we use this criterion in our proposed DEA scheme when \(M_{\text{G}}(\ell) < Q \cdot K \cdot R\), i.e.,
$$ \psi^{D}_{r}(k,\ell) = \{1-\gamma(\ell)\} + \gamma(\ell) \cdot \psi^{G}_{r}(k,\ell), $$(31)
where the superscript D denotes the proposed DEA scheme. The constant γ(ℓ) can be computed by substituting (31) into (22), yielding
$$ \gamma(\ell) = \frac{K \cdot R - Q \cdot K \cdot R}{ K \cdot R - M_{\text{G}}(\ell)}. $$(32)
Thus, each subfilter has a minimum of ⌊{1−γ(ℓ)}·L⌋ taps selected in the ℓth frame.
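Substituting the Equal Bonus rule into constraint (22) can be written out in one line, which also shows why the result remains a valid allocation:

```latex
\sum_{r=1}^{R}\sum_{k=0}^{K-1} \psi^{D}_{r}(k,\ell)
  = K \cdot R \,\{1-\gamma(\ell)\} + \gamma(\ell)\, M_{\text{G}}(\ell)
  = Q \cdot K \cdot R
\;\Longrightarrow\;
\gamma(\ell) = \frac{K \cdot R - Q \cdot K \cdot R}{K \cdot R - M_{\text{G}}(\ell)} .
```

Because M_G(ℓ) < Q·K·R ≤ K·R, the denominator exceeds the numerator, so 0 ≤ γ(ℓ) < 1 and ψ^G_r(k,ℓ) ≤ ψ^D_r(k,ℓ) ≤ 1, i.e., condition (20) still holds. For instance, K·R = 3, Q = 0.75 and M_G(ℓ) = 1.75 give γ(ℓ) = 0.75/1.25 = 0.6.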


Case 2: \(M_{\text{G}}(\ell) > Q \cdot K \cdot R\)
Similarly to Fig. 2, Fig. 3 shows an exemplary function \(f(\phi_{r}(k,\ell))\) (black curve) and the corresponding \(\psi ^{G}_{r}(k,\ell)\) (blue curve) for the case \(M_{\text{G}}(\ell) > Q \cdot K \cdot R\). In order to satisfy the constraint, different criteria can be used for modifying \(\psi ^{G}_{r}(k,\ell)\):

Tax the Poor (TP): When using this criterion (red), the constraint is satisfied by decreasing the number of taps allocated to the subfilters with the lowest \(\psi ^{G}_{r}(k,\ell)\). Such a scheme typically results in a highly unequal tap allocation, with all taps reserved for a small number of subfilters with the largest-magnitude tap-inputs.

Tax the Rich (TR): When using this criterion (orange), the constraint is satisfied by decreasing the number of taps allocated to the subfilters with the highest \(\psi ^{G}_{r}(k,\ell)\). This scheme has the beneficial effect that the majority of subfilters are not ignored when allocating taps, but has the detrimental effect that the spectro-spatial sparsity present in the tap-inputs is most likely not exploited for tap allocation.

Equal Tax (ET): When using this criterion (violet), the constraint is satisfied by subtracting the same number of taps from all K·R subfilters. At first, this looks like a fair way of subtracting taps, as it respects the spectro-spatial sparsity in the tap-inputs. However, this criterion ignores the subfilters with the smallest-magnitude tap-inputs, as it takes away any small number of taps that may have been previously allocated to them.

Proportionate Tax (PT): When using this criterion (green), the constraint is satisfied by uniformly scaling down the number of allocated taps in the different subfilters. Allocating taps in this way respects the spectro-spatial sparsity present in the tap-inputs, while ensuring that fewer taps are removed from subfilters with smaller \(\psi ^{G}_{r}(k,\ell)\).
Since the PT criterion attains a good balance between exploiting spectro-spatial sparsity and not completely ignoring subfilters, we use this criterion in our proposed DEA scheme when \(M_{\text{G}}(\ell) > Q \cdot K \cdot R\), i.e.,
$$ \psi^{D}_{r}(k,\ell) = \delta(\ell) \cdot \psi^{G}_{r}(k,\ell), $$(33)
where the constant δ(ℓ) can be computed by substituting (33) into (22), yielding
$$ \delta(\ell) = \frac{Q \cdot K \cdot R} {M_{\text{G}}(\ell)}. $$(34) 
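The corresponding one-line derivation for the Proportionate Tax rule is:

```latex
\sum_{r=1}^{R}\sum_{k=0}^{K-1} \psi^{D}_{r}(k,\ell)
  = \delta(\ell)\, M_{\text{G}}(\ell) = Q \cdot K \cdot R
\;\Longrightarrow\;
\delta(\ell) = \frac{Q \cdot K \cdot R}{M_{\text{G}}(\ell)}, \qquad 0 \leq \delta(\ell) < 1 .
```

Since M_G(ℓ) > Q·K·R, every ψ^D_r(k,ℓ) is a scaled-down copy of ψ^G_r(k,ℓ), so condition (20) holds. For K·R = 3, Q = 0.25 and M_G(ℓ) = 1.75, δ(ℓ) = 0.75/1.75 = 3/7 ≈ 0.43.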
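The proposed DEA scheme can thus be summarized as
$$ \psi^{D}_{r}(k,\ell) = \begin{cases} \{1-\gamma(\ell)\} + \gamma(\ell) \cdot \psi^{G}_{r}(k,\ell), & \text{if } M_{\text{G}}(\ell) < Q \cdot K \cdot R, \\ \delta(\ell) \cdot \psi^{G}_{r}(k,\ell), & \text{if } M_{\text{G}}(\ell) \geq Q \cdot K \cdot R. \end{cases} $$(35)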
The number of taps selected to be updated in the subfilter \(\underline {\hat {H}}_{r}(k,\ell)\) using the DEA scheme is finally determined by substituting (35) into (19).
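Putting (19) and (25) to (35) together gives the following sketch of the complete DEA allocation for one frame. Assumptions: p = 1, a dict mapping (r, k) to the L tap-input magnitudes, at least one nonzero magnitude, and the boundary case M_G(ℓ) = Q·K·R handled by the Proportionate Tax branch (where δ(ℓ) = 1 leaves ψ^G unchanged). `dea_allocation` is a hypothetical name, and the small tolerance before flooring is our addition to guard against floating-point round-off.

```python
import math


def dea_allocation(mags, Q):
    """Number of taps per subfilter under the DEA scheme (sketch, p = 1)."""
    L = len(next(iter(mags.values())))
    n_sub = len(mags)                                      # K * R subfilters
    phi = {key: sum(row) for key, row in mags.items()}     # l1-norms, eq. (25)
    phi_th = sum(phi.values()) / n_sub                     # mean threshold, eq. (28)
    psi_g = {key: min(v / phi_th, 1.0) for key, v in phi.items()}  # eqs. (26)-(27)
    m_g = sum(psi_g.values())                              # M_G(l), eq. (30)
    target = Q * n_sub                                     # Q * K * R, eq. (22)
    if m_g < target:
        # Case 1, Equal Bonus: psi^D = (1 - gamma) + gamma * psi^G, eqs. (31)-(32)
        gamma = (n_sub - target) / (n_sub - m_g)
        psi_d = {key: (1.0 - gamma) + gamma * v for key, v in psi_g.items()}
    else:
        # Case 2, Proportionate Tax: psi^D = delta * psi^G, eqs. (33)-(34)
        delta = target / m_g
        psi_d = {key: delta * v for key, v in psi_g.items()}
    # eq. (19); the 1e-9 guards against round-off such as 0.9999999999999999 * L
    return {key: math.floor(min(v, 1.0) * L + 1e-9) for key, v in psi_d.items()}
```

For three subfilters with l1-norms 6, 1 and 1 and L = 8, Q = 0.75 yields 8, 5 and 5 taps (18 = Q·N), whereas Q = 0.25 yields 3, 1 and 1 taps; in both cases no subfilter is left without updates.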
4 Simulations, results and discussion
In this section, we present the reference signals and algorithmic parameters used, as well as the different metrics used to analyze signal sparsity, tap selection, and echo cancellation performance. We perform a sparsity analysis of the multichannel reference signals, individually across the three dimensions of subbands, channels, and filter length, as well as jointly across multiple dimensions. We then analyze the effect of using the different tap selection schemes on the echo cancellation performance obtained for the different types of reference signals used.
4.1 Signals and algorithmic parameters
In our simulations, we use time-domain reference signals at a sampling frequency of \(f_{s} = 16\) kHz. The different reference signals used can be divided into two categories:

Synthetic signals

Mono brown and white noise signals, i.e., signals whose power spectral densities change at a rate of −6 and 0 dB/octave, respectively.

Stereo white noise signal.


Real-world signals

Mono speech signals (TIMIT database)

Surroundsound movie signals (Dolby Digital 5.0 format)

Surroundsound concert signals (Dolby Digital 5.0 format)

The acoustic impulse responses have been measured in a room with \(T_{60} \approx 550\) ms, with the microphone and the five loudspeakers placed on a circle of 3 m radius. The microphone was placed at a height of 1.2 m, the centre (C) loudspeaker was placed directly 0.85 m below the microphone, the front left (FL) and front right (FR) loudspeakers were placed at the same height and 30° to either side of the microphone, and the side left (SL) and side right (SR) loudspeakers were placed 0.4 m above and 110° to either side of the microphone. The acoustic echo signal \(d_{r}\) is obtained by convolving the reference signal \(x_{r}\) with the corresponding impulse response \(h_{r}\) of length \(V_{r} = 200\) ms. We assume no near-end speech signal (s(n)=0) and no additive near-end noise signal (b(n)=0) in our simulations. For the mono reference signals, we use the impulse response corresponding to the C loudspeaker only, while for the stereo white noise signal, we use the impulse responses corresponding to the FL and FR channels. The time-domain signals have been transformed into the subband domain using STFT processing with \(N_{\text{FFT}} = 512\) (i.e., K=257), a Hanning window and an overlap of 75%. We use a filter length of L=20 for the MAEC filters, which corresponds to \(N_{\text{FFT}} \cdot \{1+0.25 \cdot (L-1)\}\) samples, or 184 ms. For updating the MAEC filters, a fixed step-size of μ=0.1 and a regularization parameter of \(\epsilon = 10^{-60}\) have been used.
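The filter span and filter sizes follow from these parameters by simple arithmetic: one frame of N_FFT samples plus (L − 1) frameshifts of F = 0.25·N_FFT = 128 samples (8 ms at 16 kHz):

```latex
N_{\text{FFT}} \cdot \{1 + 0.25 \cdot (L-1)\} = 512 \cdot (1 + 0.25 \cdot 19) = 512 \cdot 5.75 = 2944 \ \text{samples},
\qquad \frac{2944}{16\,000 \ \text{Hz}} = 0.184 \ \text{s} = 184 \ \text{ms}, \\
N = L \cdot K \cdot R = 20 \cdot 257 \cdot 5 = 25\,700 \ \text{(5.0 signals)}, \quad
20 \cdot 257 \cdot 2 = 10\,280 \ \text{(stereo)}, \quad
20 \cdot 257 = 5140 \ \text{(mono)} .
```

For comparison, the echo path length V_r = 200 ms corresponds to 3200 samples at 16 kHz.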
4.2 Performance measures
Here, we present the different metrics used to analyze the sparsity present in the reference spectra, to analyze the performance of the different tap selection schemes in exploiting signal sparsity and to measure the echo cancellation performance.
4.2.1 Sparsity metric
To analyze the sparsity in the multichannel reference spectra across subbands, channels and frames, different metrics exist, such as the \(l_{0}\)-norm, the \(l_{1}\)-norm, the Gini index [31] and the Hoyer metric [32]. For an N-element (nonzero) vector \(\underline {u} = [u_{0} \ldots u_{N-1}]\), where the elements are sorted in ascending order of magnitude \(u_{0} \leq \ldots \leq u_{N-1}\), the Gini index is defined as
$$ g(\underline{u}) = 1 - 2 \sum_{n=0}^{N-1} \frac{u_{n}}{\|\underline{u}\|_{1}} \left( \frac{N - n - \frac{1}{2}}{N} \right). $$(36)
On the one hand, for the extreme case where \(u_{0} = \ldots = u_{N-1}\), i.e., no sparsity in \(\underline {u}\), \(g(\underline {u}) = 0\). On the other hand, for the extreme case where \(u_{0} = \ldots = u_{N-2} = 0\) and \(u_{N-1} \neq 0\), i.e., very high sparsity in \(\underline {u}\), \(g(\underline {u}) = 1 - \frac {1}{N}\), which for a large value of N is approximately equal to 1. Thus, the sparser the vector, the higher the Gini index.
Furthermore, the Gini index exhibits the following properties:

Limited range: \(0 \leq g(\underline {u}) \leq 1\).

Scaling invariance: \(g(a \cdot \underline {u}) = g(\underline {u})\), ∀\(a \in \mathcal {R}, a \neq 0\).

Sensitivity to addition: \(g(a + \underline {u}) < g(\underline {u})\), ∀\(a \in \mathcal {R}, a > 0\).

Cloning invariance: \(g(\underline {u}) = g([\underline {u} \hspace {3pt} \underline {u}]) = g([\underline {u} \hspace {3pt} \underline {u} \hspace {3pt} \underline {u}])\)

Sensitivity to zeropadding: \(g([\underline {u} \hspace {3pt} 0]) > g(\underline {u})\)
The cloning invariance property allows a fair comparison of the sparsity of vectors with different numbers of elements. This is an important consideration, as we want to compare the sparsity of the reference spectra across the different dimensions of subbands, channels and frames. Note that the often-used Hoyer metric does not exhibit this invariance and is hence not suited for comparing vectors with different numbers of elements.
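The cloning argument can be checked numerically. The sketch below compares the Gini index with the Hoyer sparseness of [32], (√N − ‖u‖₁/‖u‖₂)/(√N − 1), on a vector and on two- and three-fold copies of it. The test vector is arbitrary and the helper names are illustrative. The Gini index stays the same up to rounding, while the Hoyer value changes with N.

```python
import numpy as np

def gini_index(u):
    """Gini index on magnitudes (same formula as in the previous sketch)."""
    c = np.sort(np.abs(np.asarray(u)).ravel().astype(float))
    n_el = c.size
    weights = (n_el - np.arange(n_el) - 0.5) / n_el
    return float(1.0 - 2.0 * np.sum(c / c.sum() * weights))

def hoyer(u):
    """Hoyer sparseness [32]: (sqrt(N) - ||u||_1 / ||u||_2) / (sqrt(N) - 1)."""
    x = np.abs(np.asarray(u)).ravel().astype(float)
    sqrt_n = np.sqrt(x.size)
    return float((sqrt_n - x.sum() / np.linalg.norm(x)) / (sqrt_n - 1.0))

u = np.array([0.1, 0.2, 0.5, 3.0])   # arbitrary example vector
for name, v in [("u", u), ("[u u]", np.tile(u, 2)), ("[u u u]", np.tile(u, 3))]:
    # Gini stays constant under cloning; Hoyer drifts as the length grows
    print(f"{name:8s} Gini = {gini_index(v):.4f}   Hoyer = {hoyer(v):.4f}")
```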
4.2.2 Tap selection performance
In order to quantify how close a tap selection scheme is to full tap selection, we use the so-called Closeness Measure ξ [19, 20], which is defined as the ratio of the energy of the M selected tap-inputs to the energy of all tap-inputs.
For full filter update, i.e., \(\underline {\alpha }(\ell) = \underline {1}\), we trivially obtain ξ=1. For a given Q, the 3DM scheme maximizes the Closeness Measure in every frame, as it selects the M largest-magnitude tap-inputs. The underlying assumption is that the tap selection scheme yielding the largest Closeness Measure also results in the smallest difference in AEC performance compared to updating the filters using full tap selection.
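A minimal sketch of the Closeness Measure follows, together with two selectors. The first is the global M-Max selection applied by the 3DM scheme. The second is a fixed per-subfilter M-Max selection in the spirit of the FEA scheme. The SPU and DEA rules from Section 3 are not reproduced. The tap-inputs are assumed to be stored in an array of shape (R, K, L) (channels, subbands, taps). This layout, the function names and the toy data are illustrative assumptions, not the paper's notation. Because both selectors use the same budget M, the 3DM value can never fall below the FEA value.

```python
import numpy as np

def closeness_measure(tap_inputs, mask):
    """xi: energy of the selected tap-inputs over the energy of all tap-inputs."""
    energy = np.abs(tap_inputs) ** 2
    return float(energy[mask].sum() / energy.sum())

def select_3dm(tap_inputs, q):
    """M-Max over channels, subbands and taps: keep the M = round(q*N) largest |x|."""
    mag = np.abs(tap_inputs).ravel()
    m = int(round(q * mag.size))
    mask = np.zeros(mag.size, dtype=bool)
    if m > 0:
        mask[np.argpartition(mag, mag.size - m)[mag.size - m:]] = True
    return mask.reshape(tap_inputs.shape)

def select_fea(tap_inputs, q):
    """Fixed allocation: the round(q*L) largest taps inside every subfilter (last axis)."""
    n = int(round(q * tap_inputs.shape[-1]))
    mask = np.zeros(tap_inputs.shape, dtype=bool)
    if n > 0:
        idx = np.argsort(np.abs(tap_inputs), axis=-1)[..., -n:]
        np.put_along_axis(mask, idx, True, axis=-1)
    return mask

rng = np.random.default_rng(1)
R, K, L = 2, 16, 20                  # ASSUMED toy layout: (channels, subbands, taps)
x = rng.standard_normal((R, K, L)) + 1j * rng.standard_normal((R, K, L))
x *= (1.0 / (1 + np.arange(K)))[None, :, None]   # spectrally coloured tap-inputs
q = 0.2
print(closeness_measure(x, np.ones(x.shape, dtype=bool)))  # full update: xi = 1
print(closeness_measure(x, select_3dm(x, q)), closeness_measure(x, select_fea(x, q)))
```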
4.2.3 Echo cancellation performance
The echo cancellation performance is evaluated using the echo return loss enhancement (ERLE) [2], which is defined as

\[ \text{ERLE}(n) = 10 \log_{10} \frac{E[d^{2}(n)]}{E[(d(n) - \hat{d}(n))^{2}]}, \]
where \(\hat {d}(n)\) is the time-domain signal corresponding to the total MAEC filter output \(\hat {D}(k,\ell)\) and E[·] denotes the statistical expectation operator. In practice, the ERLE is computed by approximating the expectation operator with the current sample value. The convergence speed of the MAEC filters is assessed using the \(t_{20}\) metric, i.e., the time required for the ERLE to reach 20 dB.
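A minimal sketch of the ERLE and \(t_{20}\) computations follows. With `block=1`, each expectation is replaced by the current sample value, as described above. Larger blocks average over non-overlapping blocks instead. That block averaging, the small regularizer `eps` and the synthetic test signal are assumptions added for illustration. In the synthetic check, the residual echo decays by 40 dB per second, so the ERLE should reach 20 dB after roughly 0.5 s.

```python
import numpy as np

def erle_db(d, d_hat, block=1, eps=1e-12):
    """ERLE in dB per block: 10*log10(E[d^2] / E[(d - d_hat)^2]).

    block=1 uses the current sample value for each expectation; block > 1
    (an ASSUMPTION, for smoother curves) averages over non-overlapping blocks.
    """
    d = np.asarray(d, dtype=float)
    e = d - np.asarray(d_hat, dtype=float)          # residual echo
    n_blocks = d.size // block
    d_pow = np.mean(d[: n_blocks * block].reshape(n_blocks, block) ** 2, axis=1)
    e_pow = np.mean(e[: n_blocks * block].reshape(n_blocks, block) ** 2, axis=1)
    return 10.0 * np.log10((d_pow + eps) / (e_pow + eps))

def t20_seconds(erle, block, fs):
    """Time at the end of the first block whose ERLE reaches 20 dB (None if never)."""
    hits = np.flatnonzero(erle >= 20.0)
    return None if hits.size == 0 else (hits[0] + 1) * block / fs

# Synthetic check (ASSUMED signals): ERLE grows by 40 dB per second.
fs = 16_000
n = np.arange(2 * fs)
d = np.random.default_rng(0).standard_normal(n.size)
residual = d * 10.0 ** (-2.0 * n / fs)               # residual amplitude falls 40 dB/s
erle = erle_db(d, d - residual, block=256)
print(t20_seconds(erle, block=256, fs=fs))           # close to 0.5 s
```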
4.3 Sparsity analysis
In this section, we present an example to illustrate the amount of sparsity typically present in real-world multichannel spectra across subbands, channels and frames, both individually and jointly across multiple dimensions. Figure 4 depicts the waveform of a 10 s segment from the soundtrack of a 5-channel movie signal, with the spectrograms of the C, FL, FR, SL and SR channels shown in the subplots below. Each magnitude spectrogram is composed of K=257 subbands and T=1247 frames. In this movie signal, the centre channel contains the speech content, while the surround-sound channels contain the background score.
From these spectrograms, we first analyze the sparsity across subbands (spectral sparsity), across frames (temporal sparsity) and across channels (spatial sparsity). The Gini index for spectral sparsity in each channel is computed in every frame on a vector of K spectral coefficients, as exemplarily shown in Fig. 4b for the centre channel using the magenta box in frame 200. Similarly, the Gini index for temporal sparsity in each channel is computed in every subband on a vector of T spectral coefficients, as shown using the blue box for subband 150. The Gini index for spatial sparsity in each subband and frame is computed on a vector of R spectral coefficients, as exemplarily shown using the black boxes for the first subband in frame 400. The resulting Gini indices for spectral, temporal and spatial sparsity are shown in Fig. 5a, b, and c, respectively. It can be observed that the multichannel reference spectra display a fairly high amount of sparsity in each of the three dimensions individually, with Gini indices above 0.5 on average (except for temporal sparsity in the surround-sound channels). The centre channel displays higher temporal sparsity than the surround-sound channels, as it contains time-varying speech content, whereas the surround-sound channels contain the background score, which varies slowly over time.
Additionally, we analyze the sparsity present in the spectra jointly across multiple dimensions. In Fig. 5a, the black curve displays the Gini index for the joint spectro-spatial sparsity, computed in every frame on a vector with K·R spectral coefficients. Similarly, in Fig. 5b, the black curve displays the Gini index for the joint spatio-temporal sparsity, computed in every subband on a vector with R·T spectral coefficients. The Gini index for the joint spectro-temporal sparsity in each channel is computed on the entire magnitude spectrogram of that channel and is plotted in Fig. 5d, along with the joint spectro-spatio-temporal sparsity computed on all K·R·T coefficients. This figure clearly shows that the multichannel reference spectra exhibit even higher levels of sparsity when analyzed across multiple dimensions, with Gini indices above 0.85 on average. This motivates exploiting sparsity jointly across subbands, channels and frames for the purpose of tap selection.
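The individual and joint sparsity profiles described in the two paragraphs above differ only in which axes of the magnitude array are flattened into the vector passed to the Gini index. The sketch below assumes a (K, R, T) layout (subbands, channels, frames). It uses random toy data with an assumed 1/f spectral tilt in place of the movie soundtrack. The helper name `gini_along` is hypothetical.

```python
import numpy as np

def gini_along(x, axes):
    """Gini index computed jointly over `axes` (non-negative ints) of |x|.

    The remaining axes index the result, e.g. axes=0 on a (K, R, T) array gives
    one spectral-sparsity value per channel and frame.
    """
    mag = np.abs(np.asarray(x)).astype(float)
    axes = tuple(int(a) for a in np.atleast_1d(axes))
    keep = [a for a in range(mag.ndim) if a not in axes]
    # Move the analysed axes to the end and flatten them into one vector per slice.
    vecs = np.transpose(mag, keep + list(axes))
    vecs = vecs.reshape(tuple(mag.shape[a] for a in keep) + (-1,))
    c = np.sort(vecs, axis=-1)
    n_el = c.shape[-1]
    weights = (n_el - np.arange(n_el) - 0.5) / n_el
    return 1.0 - 2.0 * np.sum(c / c.sum(axis=-1, keepdims=True) * weights, axis=-1)

rng = np.random.default_rng(0)
K, R, T = 257, 5, 100                       # subbands, channels, frames (toy sizes)
mag = np.abs(rng.standard_normal((K, R, T)) + 1j * rng.standard_normal((K, R, T)))
mag *= (1.0 / (1 + np.arange(K)))[:, None, None]   # ASSUMED brown-like spectral tilt

profiles = {
    "spectral (per channel, frame)": gini_along(mag, 0),        # shape (R, T)
    "spatial (per subband, frame)": gini_along(mag, 1),         # shape (K, T)
    "temporal (per subband, channel)": gini_along(mag, 2),      # shape (K, R)
    "spectro-spatial (per frame)": gini_along(mag, (0, 1)),     # shape (T,)
    "spatio-temporal (per subband)": gini_along(mag, (1, 2)),   # shape (K,)
    "spectro-temporal (per channel)": gini_along(mag, (0, 2)),  # shape (R,)
    "spectro-spatio-temporal": gini_along(mag, (0, 1, 2)),      # scalar
}
for name, g in profiles.items():
    print(f"{name:32s} mean Gini = {np.mean(g):.3f}")
```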
Figure 6 shows the Gini indices for the joint spectro-spatio-temporal sparsity of the different considered reference signals. The stereo white noise signal is chosen to be spatially sparse, with an inter-channel broadband power ratio of 20 dB. First, it can be observed for the synthetic signals that the spectrally colored brown noise signal and the stereo white noise signal are, as expected, sparser than the mono white noise signal. Second, typical real-world signals such as mono speech and five-channel movie and concert signals also display high amounts of sparsity.
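The ordering of the synthetic signals can be reproduced qualitatively with a short experiment. The signal length, the 16 kHz rate and the use of integrated white noise as "brown" noise are assumptions of this sketch. The STFT settings follow Section 4.1 (512-point FFT, hop of 128 samples), using NumPy's symmetric `np.hanning` window. Only the ordering is expected to match Fig. 6, not the exact values.

```python
import numpy as np

def stft_magnitudes(x, n_fft=512, hop=128):
    """|STFT| with a Hann window and no edge padding; shape (frames, n_fft//2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def gini_index(u):
    c = np.sort(np.asarray(u, dtype=float).ravel())   # magnitudes are non-negative
    n_el = c.size
    weights = (n_el - np.arange(n_el) - 0.5) / n_el
    return float(1.0 - 2.0 * np.sum(c / c.sum() * weights))

rng = np.random.default_rng(0)
n = 2 * 16_000                                  # ASSUMED: 2 s at 16 kHz
white = rng.standard_normal(n)
brown = np.cumsum(rng.standard_normal(n))       # integrated white noise (~1/f^2 power)
ratio_db = 20.0                                 # inter-channel broadband power ratio
stereo = [rng.standard_normal(n), rng.standard_normal(n) * 10 ** (-ratio_db / 20)]

joint = {
    "mono white": gini_index(stft_magnitudes(white)),
    "mono brown": gini_index(stft_magnitudes(brown)),
    "stereo white (20 dB)": gini_index(np.stack([stft_magnitudes(ch) for ch in stereo])),
}
for name, g in joint.items():
    print(f"{name:22s} joint Gini = {g:.3f}")
```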
4.4 Analysis of tap selection schemes for synthetic signals
In this section, we analyze the effect of using the constrained tap selection schemes from Section 3 (3DM, SPU, FEA and DEA) for synthetic signals.
4.4.1 Effect of Spectral Coloration
For the different tap selection schemes, Fig. 7 shows the number of taps selected in each subband when using a mono brown noise signal with Q=0.2. For the 3DM and SPU schemes, a larger number of taps is selected in the low-frequency subbands, which contain the larger-magnitude tap-inputs, while the high-frequency subbands with the smallest-magnitude tap-inputs get no taps selected. Since the FEA scheme does not exploit spectral sparsity, it allocates an equal number of taps to all subfilters irrespective of the signal content. The proposed DEA scheme strikes a balance by allocating more taps to subfilters with larger-magnitude tap-inputs (thereby exploiting spectral sparsity), while not completely ignoring the subfilters with the smallest-magnitude tap-inputs.
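The qualitative behaviour described above can be illustrated on synthetic tap-inputs whose magnitude falls off as 1/(k+1) across subbands, a rough stand-in for brown noise. The sketch compares global M-Max selection (as in 3DM), a fixed per-subfilter allocation (as in FEA), and a third rule, `select_floor_then_3dm`. That third rule is a hypothetical balanced allocation written only for illustration: it guarantees every subfilter one tap and hands the rest of the budget to the globally largest tap-inputs. It is not the paper's DEA rule, which is defined in Section 3.

```python
import numpy as np

def select_3dm(x, q):
    """Global M-Max: keep the M = round(q*N) largest |x| over all subfilters."""
    mag = np.abs(x).ravel()
    m = int(round(q * mag.size))
    mask = np.zeros(mag.size, dtype=bool)
    if m > 0:
        mask[np.argpartition(mag, mag.size - m)[mag.size - m:]] = True
    return mask.reshape(x.shape)

def select_fea(x, q):
    """Fixed allocation: round(q*L) largest taps within every subfilter (last axis)."""
    n = int(round(q * x.shape[-1]))
    mask = np.zeros(x.shape, dtype=bool)
    if n > 0:
        np.put_along_axis(mask, np.argsort(np.abs(x), axis=-1)[..., -n:], True, axis=-1)
    return mask

def select_floor_then_3dm(x, q, floor=1):
    """HYPOTHETICAL balanced rule (illustration only, NOT the paper's DEA rule):
    each subfilter first gets its `floor` largest taps; the rest of the budget
    M = round(q*N) goes to the globally largest remaining tap-inputs."""
    mag = np.abs(x)
    m = int(round(q * mag.size))
    mask = np.zeros(x.shape, dtype=bool)
    np.put_along_axis(mask, np.argsort(mag, axis=-1)[..., -floor:], True, axis=-1)
    remaining = m - int(mask.sum())
    if remaining < 0:
        raise ValueError("budget M is too small for the per-subfilter floor")
    if remaining > 0:
        rest = np.where(mask, -np.inf, mag).ravel()   # exclude already selected taps
        np.put(mask, np.argpartition(rest, rest.size - remaining)[rest.size - remaining:], True)
    return mask

rng = np.random.default_rng(0)
R, K, L, q = 1, 257, 20, 0.2                    # mono, K subbands, L taps, Q = 0.2
x = rng.standard_normal((R, K, L)) + 1j * rng.standard_normal((R, K, L))
x *= (1.0 / (1 + np.arange(K)))[None, :, None]  # ASSUMED brown-like 1/f magnitude tilt

for name, sel in [("3DM", select_3dm), ("FEA", select_fea), ("floor+3DM", select_floor_then_3dm)]:
    taps_per_subband = sel(x, q).sum(axis=-1)[0]   # taps selected in each subfilter
    print(f"{name:10s} total={taps_per_subband.sum():5d}  "
          f"subbands with no taps={np.sum(taps_per_subband == 0):3d}")
```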
4.4.2 Effect of Inter-Channel Power Ratio
We now consider a stereo white noise signal, where the broadband powers of the first and the second channel are denoted as \(\lambda_{1}\) and \(\lambda_{2}\), respectively. Figure 8 shows the effect of the inter-channel power ratio \(\frac {\lambda _{2}}{\lambda _{1}}\) on the number of taps selected in the subfilters of the first channel (as a fraction of the M taps selected in both channels) for the different tap selection schemes with Q=0.2. When using the 3DM and SPU schemes with \(\lambda_{1} > \lambda_{2}\), the subfilters in the first channel get the majority of the M taps. Both schemes are thus highly spatially selective, as hardly any taps of the subfilters in the less dominant reference channel are updated (e.g., for the SPU scheme when the inter-channel power ratio is larger than 5 dB and for the 3DM scheme when it is larger than 10 dB). Since the FEA scheme does not exploit spatial sparsity, it allocates an equal number of taps to the subfilters in the first and the second channel (i.e., \(\frac {M}{2}\) taps each), irrespective of the inter-channel power ratio. The proposed DEA scheme strikes a balance by allocating more taps to the subfilters in the dominant reference channel (thereby exploiting spatial sparsity), while not completely ignoring the channel with the smaller-magnitude tap-inputs.
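The spatial selectivity of global M-Max selection can be sketched by attenuating the second channel of synthetic stereo white-noise tap-inputs and counting how many of the M selected taps land in the first channel. The same random draw is reused for every ratio, so the share can only grow as the second channel gets weaker. Only the 3DM-style selector is shown. The SPU and DEA rules are defined in Section 3 and are not reproduced here, and the sizes are assumptions matching the setup of Section 4.1.

```python
import numpy as np

def select_3dm(x, q):
    """M-Max over all channels, subbands and taps: keep the round(q*N) largest |x|."""
    mag = np.abs(x).ravel()
    m = int(round(q * mag.size))
    mask = np.zeros(mag.size, dtype=bool)
    if m > 0:
        mask[np.argpartition(mag, mag.size - m)[mag.size - m:]] = True
    return mask.reshape(x.shape)

rng = np.random.default_rng(0)
K, L, q = 257, 20, 0.2                          # subbands, taps, Q (toy stereo case)
base = rng.standard_normal((2, K, L)) + 1j * rng.standard_normal((2, K, L))

fractions = {}
for ratio_db in (0, 5, 10, 20):                 # lambda_1 / lambda_2 in dB
    x = base.copy()
    x[1] *= 10 ** (-ratio_db / 20)              # attenuate the second channel
    mask = select_3dm(x, q)
    fractions[ratio_db] = mask[0].sum() / mask.sum()
    print(f"{ratio_db:2d} dB: share of the M taps in channel 1 = {fractions[ratio_db]:.3f}")
```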
4.4.3 Closeness Measure
For different values of Q, Fig. 9 depicts the Closeness Measure ξ obtained with the different tap selection schemes for the mono brown, mono white and stereo white noise signals. For the stereo white noise signal, an inter-channel power ratio of 20 dB has been chosen. This figure shows how close the different tap selection schemes are to full tap selection in terms of the energy of the selected tap-inputs. By design, the 3DM scheme maximizes the Closeness Measure for a given Q and hence yields the highest values for each signal. For a highly sparse signal such as the mono brown noise signal, the 3DM scheme achieves a Closeness Measure close to 1 even when only 10% of the total filter taps are selected (i.e., Q=0.1). This means that just 10% of the tap-inputs contain almost the entire energy. For the least sparse mono white noise signal, low values of the Closeness Measure are obtained for all schemes, especially for the SPU scheme. For example, for Q=0.5, a Closeness Measure of about 0.85 is obtained for the 3DM, FEA and DEA schemes, whereas a Closeness Measure of about 0.6 is obtained for the SPU scheme. For all schemes except FEA, the Closeness Measure values obtained for the stereo white noise signal lie between those obtained for the sparser mono brown noise signal and the less sparse mono white noise signal; the FEA scheme yields the same values as for the mono white noise signal. The SPU scheme gives high values for highly sparse signals and very low values for signals with low amounts of sparsity, whereas the proposed DEA scheme performs similarly to the 3DM scheme for highly sparse signals and similarly to the FEA scheme for signals with low amounts of sparsity.
4.4.4 ERLE and \(t_{20}\)
As shown by the previous experiments, depending on the spectral coloration and the inter-channel power ratio of the reference signals, each considered tap selection scheme results in a different distribution of the selected taps across subbands and channels, and in a different Closeness Measure. Hence, the tap selection schemes can be expected to influence the overall acoustic echo cancellation performance, i.e., the ERLE and the speed of filter convergence.
For the mono brown, mono white and stereo white noise (inter-channel power ratio of 20 dB) signals, Fig. 10a shows the ERLE convergence curves for the 3DM, SPU, FEA, and DEA tap selection schemes (Q=0.2), compared to full filter update (Q=1). Figure 10b shows the corresponding \(t_{20}\) values for different values of the parameter Q. It can be observed that for signals with a high amount of spectral sparsity, such as the mono brown noise signal, the DEA scheme yields the best echo cancellation performance, while the 3DM and SPU schemes yield the poorest performance despite obtaining the highest values for the Closeness Measure. This is due to the highly spectrally selective nature of the 3DM and SPU schemes (discussed in Section 4.4.1): the subfilters with the smallest-magnitude tap-inputs do not have taps updated in every frame, resulting in very slow convergence of these subfilters and thus degrading the overall echo cancellation performance. For the least sparse mono white noise signal, the 3DM, FEA, and DEA schemes yield similar echo cancellation performance, while the SPU scheme again yields the poorest performance. This may be because the SPU scheme is the only one that completely ignores entire subbands when updating the filters, while the other schemes may allocate a few taps to each subband when the reference signal has a low amount of sparsity. For the spatially sparse stereo white noise signal, the DEA scheme performs better than the FEA scheme, both in terms of the converged ERLE value and the \(t_{20}\) values. For all considered signals, the ERLE and \(t_{20}\) values obtained by the proposed DEA scheme for Q=0.2 are very similar to those obtained for full filter update. Thus, the DEA scheme gives echo cancellation performance very similar to full filter update even when only 20% of the total MAEC filter taps are updated in every frame.
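The mechanism behind these convergence differences can be seen in a single subfilter. The sketch identifies one complex subfilter of L=20 taps with a generic M-Max partial-update NLMS, in the spirit of [10]. In each iteration, only the taps with the largest tap-input magnitudes are updated, and the step is normalized by the full tap-input energy. This is an illustrative assumption, not necessarily the subband PU-NLMS update used in the paper. The function name, step size and white-noise input are hypothetical choices. With Q=0.2, the misalignment still decreases, but more slowly than with full update.

```python
import numpy as np

def mmax_nlms_misalignment(x, h, q, mu=0.5, eps=1e-8):
    """Normalized misalignment (dB) of a generic M-Max NLMS identifying h.

    ASSUMPTIONS: one complex subfilter, noise-free echo d = h @ buf, update of
    the round(q*L) taps with the largest |tap-input|, full-energy normalization.
    """
    L = h.size
    m = max(1, int(round(q * L)))            # taps updated per iteration
    w = np.zeros(L, dtype=complex)
    buf = np.zeros(L, dtype=complex)         # tap-input vector [x(n), ..., x(n-L+1)]
    mis_db = np.empty(x.size)
    for n, xn in enumerate(x):
        buf = np.roll(buf, 1)
        buf[0] = xn
        e = h @ buf - w @ buf                # a-priori error
        sel = np.argpartition(np.abs(buf), L - m)[L - m:]   # M largest |tap-inputs|
        w[sel] += mu * e * np.conj(buf[sel]) / (np.vdot(buf, buf).real + eps)
        mis = np.sum(np.abs(h - w) ** 2) + 1e-30
        mis_db[n] = 10 * np.log10(mis / np.sum(np.abs(h) ** 2))
    return mis_db

rng = np.random.default_rng(0)
L = 20                                       # subfilter length (from the text)
h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) * np.exp(-np.arange(L) / 5)
x = rng.standard_normal(3000) + 1j * rng.standard_normal(3000)   # white input

full = mmax_nlms_misalignment(x, h, q=1.0)
partial = mmax_nlms_misalignment(x, h, q=0.2)
for n in (100, 500, 2999):
    print(f"n={n:4d}: full {full[n]:7.1f} dB   Q=0.2 {partial[n]:7.1f} dB")
```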
4.5 Analysis of tap selection schemes for real-world signals
In contrast to the synthetic (stationary) signals considered in the previous section, in this section we investigate the effect of the constrained tap selection schemes on the echo cancellation performance for (non-stationary) real-world signals.
For a mono speech signal, Fig. 11 shows the ERLE curves obtained when the MAEC filters are updated using the different tap selection schemes for Q=0.2 and for full filter update (Q=1) for a period of 10 s. For this signal, we find that even when only 20% of all filter taps are updated in every frame, both the 3DM scheme and the proposed DEA scheme typically perform as well as full filter update in terms of ERLE, with the FEA scheme performing slightly worse (about 1–2 dB). On the other hand, the SPU scheme performs significantly worse, yielding about 7–8 dB deterioration in terms of ERLE.
For a 5-channel concert signal, Fig. 12 shows the ERLE curves obtained when the MAEC filters are updated using the different tap selection schemes for Q=0.2 and for full filter update (Q=1) over a period of 30 s. For this signal, we find that even when only 20% of all filter taps are updated in every frame, both the 3DM scheme and the proposed DEA scheme perform almost identically to full filter update in terms of ERLE, with less than 1 dB deterioration, while the FEA scheme leads to about 2–4 dB deterioration. The SPU scheme again performs significantly worse, with about 10–12 dB deterioration in ERLE. Around the 12 s mark, all schemes exhibit a sudden drop in ERLE. This is because the tap-input covariance matrix becomes ill-conditioned, leading to an increase in misalignment. It can also be observed, however, that even though the FEA and DEA schemes have not been designed to tackle this misalignment problem, they do not aggravate it.
Additionally, Fig. 13 shows the number of taps \(\mathcal {L}_{r}(k,\ell)\) updated in the different subfilters in every frame using the DEA scheme for Q=0.2. It can be observed that the subfilters in each channel get a small number of taps selected in every frame, where the number of taps updated across subbands depends on the spectral content present in each channel. As the centre channel for this signal consists of speech only, the tap allocation for the centre channel strongly resembles the spectrogram of a speech signal. The surround-sound channels are mainly dominated by the background score and low-frequency crowd noise but also contain some speech, which is reflected in how taps are allocated in these channels.
5 Computational effort
Compared to full filter update, implementing a tap selection scheme requires some computational overhead, but may still result in significant savings when updating the MAEC filters, as only a fraction Q of the total N filter taps are updated in every frame. The computational effort per frame for implementing the different tap selection schemes and for updating the MAEC filters using the PU-NLMS algorithm is given in Table 1. The computations have been divided into four categories, namely the number of additions (# Adds), multiplications (# Mults), divisions (# Divs) and comparisons (# Comps). Note that all complex operations have been converted into an equivalent number of real operations, e.g., one complex multiplication has been counted as 4 real multiplications and 2 real additions.
Figure 14 depicts, as an example, the total computational effort required per frame for implementing tap selection and partial filter update for different values of Q. The numbers have been computed for K=257, R=5, and L=20, assuming that comparison, multiplication and division operations are 1, 4 and 15 times as computationally expensive as an addition, respectively. The numbers are plotted as a percentage of the computational effort required for full filter update, i.e., the effort required for updating the MAEC filters using the PU-NLMS algorithm with Q=1. For these settings, the total computational effort for the 3DM, SPU, FEA and DEA schemes is smaller than that of full filter update for Q<0.27, Q<0.95, Q<0.96 and Q<0.93, respectively. Hence, the SPU, FEA, and DEA schemes are cheaper than full filter update for almost all values of Q. When only 20% of the MAEC filter taps are updated in every frame (Q=0.2), the 3DM scheme requires 94%, while the SPU, FEA, and DEA schemes require about 28% of the total computational effort required for full filter update. The SPU and DEA schemes require slightly more computational effort than the FEA scheme due to the additional overhead of computing \(\eta_{r}(k,\ell)\) in (18) and \(\psi ^{D}_{r}(k,\ell)\) in (35), respectively.
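The cost weighting used for Fig. 14 can be written down directly. The sketch below expresses operation counts in addition-equivalents using the relative costs stated above (comparison 1, multiplication 4, division 15). It checks the quoted conversion of one complex multiplication and computes the problem size N and the update budget M for Q=0.2. The per-scheme operation counts of Table 1 are not reproduced here, and the helper name `weighted_cost` is hypothetical.

```python
# Operation weights relative to one real addition, as assumed in the text.
WEIGHTS = {"adds": 1, "comps": 1, "mults": 4, "divs": 15}

def weighted_cost(adds=0, mults=0, divs=0, comps=0):
    """Total cost in addition-equivalents for the given real-operation counts."""
    return (adds * WEIGHTS["adds"] + mults * WEIGHTS["mults"]
            + divs * WEIGHTS["divs"] + comps * WEIGHTS["comps"])

# One complex multiplication = 4 real multiplications + 2 real additions.
complex_mult = weighted_cost(adds=2, mults=4)

# Problem size used for Fig. 14: N = R*K*L taps, of which M = Q*N are updated.
R, K, L, Q = 5, 257, 20, 0.2
N = R * K * L
M = round(Q * N)
print(complex_mult, N, M)   # 18 25700 5140
```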
6 Conclusions
In this paper, different tap selection schemes for constrained partial updates of subband MAEC filters have been compared. Real-world multichannel signals have been analyzed and shown to be sparse across subbands (spectrally), channels (spatially), and frames (temporally). This sparsity is then exploited by different tap selection schemes for updating the MAEC filters. The MAEC system consists of a dedicated subband AEC filter for each loudspeaker channel, with each filter composed of multiple subfilters, i.e., one subfilter per subband per channel. The first tap selection scheme considered applies the well-known M-Max criterion to the multichannel input spectra across all three dimensions, and is hence called the 3DM scheme. This scheme jointly exploits the spectral, spatial and temporal sparsity in the input signals but typically results in some subfilters having no taps updated. In order to avoid this problem, two new schemes have been presented which perform tap selection by applying the M-Max criterion only across the filter length (thereby exploiting temporal sparsity for updating each subfilter) and do not completely ignore the subfilters with the smallest-magnitude tap-inputs. The FEA scheme allocates a fixed number of taps to be updated in each subfilter per frame, while the proposed DEA scheme exploits the joint spectro-spatial sparsity present in the input signals to dynamically allocate the number of taps to be updated in the different subfilters. The new tap selection schemes have been compared to the state-of-the-art SPU tap selection scheme in the subband domain, which displays properties similar to those of the 3DM scheme. The proposed DEA scheme is designed such that it selects more taps in the subfilters with larger-magnitude tap-inputs (like the 3DM and SPU schemes) while not completely ignoring the subfilters with smaller-magnitude tap-inputs (like the FEA scheme).
Simulation results for speech and music signals showed that, in terms of ERLE and convergence speed, the 3DM and DEA schemes achieved almost identical echo cancellation performance compared to full filter update even when only 20% of the MAEC filter taps were updated in every frame, while the FEA and SPU schemes performed worse (about 2–4 dB and 10–12 dB deterioration in ERLE, respectively). The SPU, FEA and DEA tap selection schemes have a reduced computational cost compared to full filter update, while the 3DM scheme does not necessarily lead to a reduction in computational complexity. In conclusion, the proposed DEA tap selection scheme yields almost identical echo cancellation performance compared to updating all filter taps at a significantly reduced computational cost.
References
J Benesty, T Gansler, DR Morgan, MM Sondhi, SL Gay, Advances in Network and Acoustic Echo Cancellation (SpringerVerlag, Berlin, 2001).
E Hänsler, G Schmidt, Acoustic Echo and Noise Control: A Practical Approach (Wiley and Sons, Hoboken, NJ, 2004).
MM Sondhi, DR Morgan, JL Hall, Stereophonic acoustic echo cancellation: an overview of the fundamental problem. IEEE Signal Proc. Lett. 2, 148–151 (1995).
H Buchner, J Benesty, W Kellermann, in Adaptive Signal Processing: Application to Real-World Problems, ed. by J Benesty, Y Huang. Multichannel frequency-domain adaptive filtering with application to acoustic echo cancellation (Springer-Verlag, Berlin/Heidelberg, 2003), pp. 95–128.
H Buchner, J Benesty, W Kellermann, Generalized multichannel frequency-domain adaptive filtering: efficient realization and application to hands-free speech communication. Signal Proc. 85(3), 549–570 (2005).
Y Huang, J Benesty, J Chen, Identification of acoustic MIMO systems: Challenges and opportunities. Signal Proc. 86(6), 1278–1295 (2006).
SC Douglas, Adaptive filters employing partial updates. IEEE Trans. Circ. Syst. II Analog Digit. Signal Proc. 44(3), 209–216 (1997).
T Schertler, Selective block update of NLMS type algorithms. Proc. IEEE Int. Conf. Acoust. Speech Signal Proc., Seattle, USA, vol. 3, 1717–1720 (1998).
T Aboulnasr, K Mayyas, Selective coefficient update of gradient-based adaptive algorithms. Proc. IEEE Int. Conf. Acoust. Speech Signal Proc., Munich, Germany, vol. 3, 1929–1932 (1997).
T Aboulnasr, K Mayyas, Complexity reduction of the NLMS algorithm via selective coefficient update. IEEE Trans. Signal Proc. 47(5), 1421–1424 (1999).
K Doǧançay, O Tanrikulu, Adaptive filtering algorithms with selective partial updates. IEEE Trans. Circ. Syst. II Analog Digit. Signal Proc. 48(8), 762–769 (2001).
K Doǧançay, PA Naylor, Recent advances in partial update and sparse adaptive filters. Proc. Eur. Signal Proc. Conf., Antalya, Turkey, 1–4 (2005).
PA Naylor, W Sherliker, A short-sort M-Max NLMS partial-update adaptive filter with application to echo cancellation. Proc. IEEE Int. Conf. Acoust. Speech Signal Proc., Hong Kong, vol. 5, 373–376 (2003).
H Deng, M Doroslovački, New sparse adaptive algorithms using partial update. Proc. IEEE Int. Conf. Acoust. Speech Signal Proc., Montreal, Canada, vol. 2, 845–848 (2004).
J Benesty, DR Morgan, MM Sondhi, A better understanding and an improved solution to the specific problems of stereophonic acoustic echo cancellation. IEEE Trans. Speech Audio Proc. 6(2), 156–165 (1998).
AWH Khong, J Benesty, PA Naylor, Stereophonic acoustic echo cancellation: analysis of the misalignment in the frequency domain. IEEE Signal Proc. Lett. 13(1), 33–36 (2006).
M Ali, Stereophonic acoustic echo cancellation system using time-varying all-pass filtering for signal decorrelation. Proc. IEEE Int. Conf. Acoust. Speech Signal Proc., vol. 6, 3689–3692 (1998).
AWH Khong, PA Naylor, Reducing inter-channel coherence in stereophonic acoustic echo cancellation using partial update adaptive filters. Proc. Eur. Signal Proc. Conf., Vienna, Austria, 405–408 (2004).
AWH Khong, PA Naylor, A family of selective-tap algorithms for stereo acoustic echo cancellation. Proc. IEEE Int. Conf. Acoust. Speech Signal Proc., Philadelphia, USA, vol. 3, 133–136 (2005).
AWH Khong, PA Naylor, Stereophonic acoustic echo cancellation employing selective-tap adaptive algorithms. IEEE Trans. Audio Speech Lang. Proc. 14(3), 785–796 (2006).
E Ferrara, Fast implementations of LMS adaptive filters. IEEE Trans. Acoust. Speech Signal Proc. 28, 474–475 (1980).
JMP Borrallo, MG Otero, On the implementation of a partitioned block frequency domain adaptive filter (PBFDAF) for long acoustic echo cancellation. Signal Proc. 27, 301–315 (1992).
JS Soo, K Pang, Multidelay block frequency domain adaptive filter. IEEE Trans. Acoust. Speech Signal Proc. 38(2), 373–376 (1990).
JJ Shynk, Frequency-domain and multirate adaptive filtering. IEEE Signal Proc. Mag. 9(1), 14–37 (1992).
S Haykin, Adaptive Filter Theory (Prentice Hall, Upper Saddle River, NJ, 1996).
X Lin, AWH Khong, M Doroslovački, PA Naylor, Frequency-domain adaptive algorithm for network echo cancellation in VoIP. EURASIP J. Audio Speech Music Proc. 2008, 1–9 (2008).
R Crochiere, A weighted overlap-add method of short-time Fourier analysis/synthesis. IEEE Trans. Acoust. Speech Signal Proc. 28, 99–102 (1980).
DE Knuth, The Art of Computer Programming, vol. 3 (Addison-Wesley, Reading, MA, 1973).
I Pitas, Fast algorithms for running ordering and max/min calculation. IEEE Trans. Circ. Syst. 36(6), 795–804 (1989).
NK Desiraju, S Doclo, T Gerkmann, T Wolff, Efficient multichannel acoustic echo cancellation using constrained sparse filter updates in the subband domain. Proc. ITG Symp. Speech Commun., Erlangen, Germany, 1–4 (2014).
NP Hurley, ST Rickard, Comparing measures of sparsity. IEEE Trans. Inform. Theory. 55(10), 4723–4741 (2009).
PO Hoyer, Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 5, 1457–1469 (2004).
Acknowledgements
The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) project DREAMS under grant agreement ITN-GA-2012-316969. The authors would also like to acknowledge the contribution of Anirudha Kalya in helping develop the DEA tap selection scheme.
Author information
Contributions
The contribution of the first author consists of developing the main algorithmic idea, performing simulations, analyzing the simulation results and drafting the article. The contribution of the second and third authors consists of critically discussing the developed algorithms and the simulation results with the first author, and proofreading and revising the article. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Desiraju, N., Doclo, S. & Wolff, T. Efficient multichannel acoustic echo cancellation using constrained tap selection schemes in the subband domain. EURASIP J. Adv. Signal Process. 2017, 63 (2017). https://doi.org/10.1186/s13634-017-0497-5