 Research
 Open Access
Efficient multichannel acoustic echo cancellation using constrained tap selection schemes in the subband domain
 Naveen Kumar Desiraju^{1, 2},
 Simon Doclo^{2} and
 Tobias Wolff^{1}
https://doi.org/10.1186/s13634-017-0497-5
© The Author(s) 2017
 Received: 31 March 2017
 Accepted: 14 August 2017
 Published: 4 September 2017
Abstract
Acoustic echo cancellation (AEC) is a key speech enhancement technology in speech communication and voice-enabled devices. AEC systems employ adaptive filters to estimate the acoustic echo paths between the loudspeakers and the microphone(s). In applications involving surround sound, the computational complexity of an AEC system may become demanding due to the multiple loudspeaker channels and the necessity of using long filters in reverberant environments. In order to reduce the computational complexity, the approach of partially updating the AEC filters is considered in this paper. In particular, we investigate tap selection schemes which exploit the sparsity present in the loudspeaker channels for partially updating subband AEC filters. The potential for exploiting signal sparsity across three dimensions, namely time, frequency, and channels, is analyzed. A thorough analysis of different state-of-the-art tap selection schemes is performed and insights about their limitations are gained. A novel tap selection scheme is proposed which overcomes these limitations by exploiting signal sparsity while not ignoring any filters for update in the different subbands and channels. Extensive simulation results using both artificial as well as real-world multichannel signals show that the proposed tap selection scheme outperforms state-of-the-art tap selection schemes in terms of echo cancellation performance. In addition, it yields almost identical echo cancellation performance as compared to updating all filter taps at a significantly reduced computational cost.
Keywords
 Acoustic echo cancellation
 Multichannel
 Subband domain
 Adaptive filters
 Partial filter updates
 Tap selection
 Signal sparsity
 Computational complexity
1 Introduction
Acoustic echo cancellation (AEC) [1, 2] is a key technology used in hands-free telephony and voice-enabled systems. An AEC system consists of an adaptive filter which estimates the acoustic echo path between the loudspeaker and the microphone. Using this estimated echo path, an estimate of the acoustic echo signal is generated which is then subtracted from the microphone signal. When multiple loudspeakers are present, as is the case for surround-sound systems, Multichannel Acoustic Echo Cancellation (MAEC) systems are required [3–6]. These systems consist of multiple adaptive filters dedicated to estimating the acoustic echo paths between each loudspeaker and each microphone, i.e., one filter per channel. When employing time-domain MAEC systems in large and/or reverberant rooms, very long filters with several thousand taps may be required in order to achieve effective echo cancellation. Using such long filters requires large computational effort, both for updating the filters as well as for generating the acoustic echo signal estimates.
In order to reduce the computational complexity of time-domain adaptive filters, a number of tap selection schemes [7–14] have been proposed for implementing partial updates of the adaptive filters. These schemes reduce complexity by updating only a subset of M out of all N filter taps in each iteration, where the subset is chosen based on a tap selection criterion. Since speech and/or surround-sound entertainment signals usually exhibit significant sparsity across frequency (due to spectrally colored content), channels (due to different content in the different loudspeakers) and time (due to non-stationary content), a number of tap selection schemes have been proposed which exploit the sparsity present in the loudspeaker signals for partially updating the filters [8–12]. The M-Max [9, 10] is a well-known tap selection scheme which exploits signal sparsity by selecting the filter taps corresponding to the M largest magnitude tap-inputs in each iteration. For a given M, this scheme maximizes the energy of the update in each iteration and thereby gives the closest possible performance to full filter update in terms of minimizing the mean squared error. Another tap selection scheme which exploits signal sparsity is the selective-partial-update (SPU) [11] tap selection scheme, where the N-tap adaptive filter is first divided into B blocks, which are then ranked according to the squared Euclidean norm of their respective tap-inputs. Based on this ranking, in each iteration the top \(\lfloor B \cdot \frac {M}{N} \rfloor \) blocks, where ⌊·⌋ denotes the flooring operation, are selected to be updated. Many other schemes have been proposed which further improve performance by exploiting the sparseness of the echo path [13, 14]. Since sparseness of the echo path is more relevant for applications such as network echo cancellation [1], and not particularly relevant for the considered AEC application (as acoustic impulse responses are not particularly sparse), we will not consider such approaches in this paper.
Apart from large computational complexity, MAEC systems also suffer from other notable problems such as the misalignment problem [3, 15, 16]. Since in MAEC systems the different loudspeaker input signals are typically correlated with each other, the input covariance matrix may be ill-conditioned, possibly resulting in a large filter misalignment and a slow convergence speed. It should be realized that the misalignment problem is typically more severe in the context of speech communication systems, since the loudspeaker signals are obtained by filtering the same source (far-end speaker), as compared to surround-sound systems, where the loudspeaker signals may be independent of each other. The most common approach to tackle the misalignment problem is to decorrelate the tap-inputs, for which several techniques have been proposed in literature [3, 15, 17]. Tap selection schemes such as the exclusive-maximum (XM) [18–20] have also been proposed to specifically tackle the misalignment problem for stereo AEC applications. The XM scheme improves the conditioning of the tap-input covariance matrix via exclusive updates of the two adaptive filters, i.e., in each iteration the same filter tap index is never selected in both channels. In this paper, however, we do not aim to solve the misalignment problem using tap selection schemes and do not claim to improve the misalignment performance for highly coherent loudspeaker signals, i.e., our main motivation is solely computational complexity reduction of MAEC systems.
As an alternative to time-domain adaptive filters, frequency-domain and subband adaptive filters are frequently used as they enable more efficient and frequency-dependent filter updates [2, 21–25]. Frequency-domain adaptive filtering algorithms, such as the frequency-domain least mean square (FLMS) [21], the partitioned block frequency-domain adaptive filtering (PBFDAF) [22] and the multidelay block frequency-domain adaptive filtering (MDF) algorithm [23], are typically based on the overlap-save method [24, 25] and use the fast Fourier transform (FFT) to efficiently compute the required time-domain convolution and correlation operations. In [26], the M-Max tap selection scheme has been proposed for the frequency-domain MDF algorithm. Alternatively, adaptive filtering can be performed using subband processing, where an analysis filterbank transforms the time-domain signals into the subband domain, the filter adaptation and processing is performed independently in each subband, and a synthesis filterbank is used to reconstruct the time-domain signals. In this paper, we will only consider subband adaptive filters. More specifically, we will use the well-known weighted overlap-add (WOLA) method [2, 27], i.e., using an FFT analysis filterbank to transform the (windowed) time-domain signals to the short-time Fourier transform (STFT) domain and an inverse FFT synthesis filterbank. Such a processing scheme provides a suitable compromise between computational complexity and latency, and makes it possible to achieve a suitable time and frequency resolution.
In general, using a tap selection scheme may lead to a significant amount of processing overhead, primarily due to the required sorting effort. The computational savings obtained due to partial filter update are offset (and may even be exceeded in some cases) by the additional effort required for sorting. Compared to popular sorting algorithms such as the QUICKSORT routine [28], a more efficient running-sort algorithm known as the SORTLINE routine [29] has been proposed for sorting vectors which contain many elements in common with a pre-sorted vector from a previous iteration, which is often the case with tap-input vectors from one iteration to the next.
In this paper, we propose and investigate different tap selection schemes in the subband domain for constrained partial updates of subband MAEC filters. Please note that in such a framework, the tap selection schemes operate on the magnitudes of the complex-valued STFT coefficients. Also, we consider the subband AEC filter in each channel to be composed of a number of subfilters, i.e., one subfilter per subband. First, we extend the M-Max tap selection scheme proposed for complex-valued loudspeaker signals in [26] to the multichannel scenario, thereby applying the M-Max criterion across three dimensions, i.e., subbands, channels and filter length. Then, we present two new tap selection schemes which apply the M-Max criterion independently in each subfilter across filter length only. The first scheme selects the same number of taps in each subfilter, while the second scheme exploits the sparsity present in the loudspeaker signals across frequency and channels to select taps dynamically in the different subfilters. Some preliminary results were obtained in [30] which indicated that signal sparsity present in real-world multichannel entertainment signals can be exploited to efficiently update the MAEC filters. The proposed tap selection schemes are then compared to the SPU tap selection scheme [11] in the subband domain^{1}.
The remainder of the paper is organized as follows. The signal model is presented in Section 2 and the different tap selection schemes considered are presented in Section 3. Section 4 presents a sparsity analysis for several synthetic and real-world multichannel signals, and the echo cancellation performance obtained when the different tap selection schemes are used. Section 5 discusses the computational effort required for the different tap selection schemes and the computational savings obtained when performing partial filter updates.
2 Signal model
where \(d(n) = \sum _{r=1}^{R} d_{r}(n)\) denotes the total acoustic echo component.
where \(j = \sqrt {-1}\), F denotes the frameshift and W _{ana} denotes the analysis window. In the remainder of the paper, the terms reference channels and reference spectra will be used to refer to the loudspeaker signals and their corresponding STFT coefficients, respectively.
i.e., L taps ×K subbands ×R channels.
where Y denotes the complex-valued spectrum of the microphone signal y, computed similarly to (3).
From here on, we will refer to (12) as the partial update NLMS (PU-NLMS) algorithm.
containing the magnitudes of all MAEC filter tap-inputs. Similarly to (15), we define the N-element tap selection vector \(\underline {\alpha }(\ell)\) by stacking the vector \(\underline {T}_{r}(k,\ell)\) over all K subbands and R channels.
3 Tap selection schemes
The first tap selection scheme we investigate is the 3D M-Max scheme, which applies the M-Max criterion across the three dimensions of subbands, channels, and filter length for selecting taps. Then, we investigate the SPU scheme, which sorts the K·R subfilters in each frame according to the squared Euclidean norm of their respective tap-inputs and then selects all L taps in the top \(\left \lfloor \frac {M}{L} \right \rfloor \) subfilters. Finally, we present two 1D M-Max schemes which apply the M-Max criterion only across the dimension of filter length, with the first scheme selecting the same number of taps in all subfilters and the second scheme dynamically selecting taps in each subfilter.
3.1 3D M-Max (3DM) scheme
The 3D M-Max tap selection scheme is an extension of the M-Max scheme proposed for the single-channel scenario in [26] to the multichannel scenario. Using this scheme, the filter taps corresponding to the M largest magnitude tap-inputs in every frame are selected to be updated by applying the M-Max criterion on the vector \(\underline {\mathbf {X}}(\ell)\). The resulting tap selection vector \(\underline {\alpha }(\ell)\) can then be unstacked to obtain the vectors \(\underline {T}_{r}(k,\ell)\) corresponding to the K·R subfilters. Implementing this scheme requires sorting the N-element vector \(\underline {\mathbf {X}}(\ell)\) in every frame, which is done efficiently using the QUICKSORT routine, requiring comparisons in the order of \(\mathcal {O}\left (N \cdot \log _{2} N\right)\) per frame.
As this scheme applies the M-Max criterion on the complete vector \(\underline {\mathbf {X}}(\ell)\), it is able to exploit the spectro-spatio-temporal sparsity that may be present in the multichannel reference spectra, with the M selected taps distributed amongst the different subfilters in every frame. For reference spectra with significant temporal, spatial and spectral diversity/non-stationarity, it is highly likely that each of the N filter taps is eventually updated at some stage. However, if the reference spectra exhibit stationarity and large spectral coloration and/or large inter-channel power difference, all M taps may be selected in only a small subset of the K·R subfilters in every frame. This may result in the subfilters in certain subbands and/or channels being completely ignored for a long time period, which may severely affect filter convergence. This disadvantage of the 3DM scheme motivates us to look for schemes which do not completely ignore these subfilters when allocating taps to be updated.
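The 3DM selection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the array layout (channels × subbands × taps) and the name `select_3dm` are assumptions, and `np.argpartition` stands in for the QUICKSORT routine mentioned above. The toy input is deliberately dominated by two subfilters, so that the scheme's tendency to ignore the remaining subfilters becomes visible.

```python
import numpy as np

def select_3dm(X_mag, M):
    """Boolean mask marking the M largest-magnitude tap-inputs.

    X_mag : array of shape (R, K, L) holding tap-input magnitudes
            (hypothetical layout: channels x subbands x taps).
    M     : number of taps to update in the current frame.
    """
    flat = X_mag.ravel()                      # stacked N-element vector
    mask = np.zeros(flat.size, dtype=bool)
    if M > 0:
        # Indices of the M largest entries (in no particular order).
        top = np.argpartition(flat, flat.size - M)[flat.size - M:]
        mask[top] = True
    return mask.reshape(X_mag.shape)

# Toy example: R=2 channels, K=4 subbands, L=3 taps, magnitudes 0..23.
X_mag = np.arange(24, dtype=float).reshape(2, 4, 3)
mask = select_3dm(X_mag, M=6)
taps_per_subfilter = mask.sum(axis=-1)        # shape (R, K)
```

In this toy case all six selected taps fall into the two strongest subfilters of the second channel, while every other subfilter receives no update in the frame.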
3.2 SPU scheme
All L taps in the top \(\left \lfloor \frac {M}{L} \right \rfloor \) subfilters are then selected to be updated, while no taps are selected in the remaining subfilters. Hence, this scheme exploits the sparsity present in the multichannel reference spectra but suffers from the same problem as the 3DM scheme, i.e., it may completely ignore subfilters in certain subbands and/or channels when the reference signals are spectrally coloured and stationary and/or exhibit large inter-channel power difference.
3.3 1D M-Max schemes
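A minimal sketch of the SPU selection, under the same assumed array layout (channels × subbands × taps): the subfilters are ranked by the squared Euclidean norm of their tap-inputs, and all L taps of the top ⌊M/L⌋ subfilters are marked. The function name `select_spu` is illustrative.

```python
import numpy as np

def select_spu(X_mag, M):
    """Boolean mask selecting all L taps in the floor(M / L) strongest subfilters.

    X_mag : array of shape (R, K, L) holding tap-input magnitudes
            (hypothetical layout: channels x subbands x taps).
    """
    R, K, L = X_mag.shape
    energy = np.sum(X_mag ** 2, axis=-1).ravel()   # squared norm per subfilter
    n_sel = min(M // L, energy.size)               # floor(M / L) subfilters
    order = np.argsort(energy)[::-1]               # strongest subfilter first
    sub_mask = np.zeros(energy.size, dtype=bool)
    sub_mask[order[:n_sel]] = True
    # Every tap of a selected subfilter is updated.
    return np.repeat(sub_mask.reshape(R, K)[:, :, None], L, axis=-1)

X_mag = np.arange(24, dtype=float).reshape(2, 4, 3)
mask = select_spu(X_mag, M=7)                      # floor(7 / 3) = 2 subfilters
```

Because M is generally not a multiple of L, slightly fewer than M taps may end up selected (six instead of seven in this toy case).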
where ψ _{ r }(k,ℓ) is computed using two different criteria for the two schemes.
as \(\mathcal {L}_{r}(k,\ell)\) obviously cannot be larger than L. The vector \(\underline {\mathcal {X}}_{r}(k,\ell)\) is sorted very efficiently using the SORTLINE routine, with the number of comparisons in the order of \(\mathcal {O}(\log _{2} L)\) per frame.
3.3.1 Fixed effort allocation (FEA)
Thus, in each subfilter the filter coefficients corresponding to the ⌊Q·L⌋ largest magnitude tap-inputs are selected to be updated in every frame. Due to the same number of taps selected in all subfilters, this scheme does not exploit the spectral and spatial sparsity present in the multichannel reference spectra.
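The FEA rule can be sketched as below, again assuming a (channels × subbands × taps) magnitude array. The function name `select_fea` is hypothetical, and `np.argpartition` replaces the SORTLINE routine for brevity.

```python
import numpy as np

def select_fea(X_mag, Q):
    """Select the floor(Q * L) largest-magnitude tap-inputs in every subfilter.

    X_mag : array of shape (R, K, L) holding tap-input magnitudes
            (hypothetical layout: channels x subbands x taps).
    Q     : fraction of taps to update, 0 <= Q <= 1.
    """
    L = X_mag.shape[-1]
    n_sel = int(np.floor(Q * L))
    mask = np.zeros(X_mag.shape, dtype=bool)
    if n_sel > 0:
        # Per subfilter: indices of the n_sel largest entries along the tap axis.
        top = np.argpartition(X_mag, L - n_sel, axis=-1)[..., L - n_sel:]
        np.put_along_axis(mask, top, True, axis=-1)
    return mask

X_mag = np.arange(24, dtype=float).reshape(2, 3, 4)   # R=2, K=3, L=4
mask = select_fea(X_mag, Q=0.5)                       # 2 taps per subfilter
```

Every subfilter receives the same number of updated taps regardless of its signal level, which is exactly why spectral and spatial sparsity go unused.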
3.3.2 Dynamic effort allocation (DEA)
In the DEA scheme, filter taps are dynamically allocated to the different subfilters based on their respective tap-input content. We propose to allocate a larger number of taps in every frame to subfilters with relatively larger magnitude tap-inputs, while not completely ignoring the subfilters with smaller magnitude tap-inputs. Thus, the DEA scheme aims to combine the advantages of the 3DM and the FEA schemes while avoiding their disadvantages, i.e., exploiting the spectro-spatial sparsity present in the multichannel reference spectra, while not ignoring the subfilters with small magnitude tap-inputs.
where the superscript ^{ G } denotes the generic form of the DEA scheme, the function f(·) depends on the used tap allocation criterion and the minimum operator is required to satisfy the condition in (20). The number of taps selected in the subfilter \(\underline {\hat {H}}_{r}(k,\ell)\) is finally determined by substituting (26) in (19).
Thus, it is not guaranteed that M _{G}(ℓ) is equal to Q·K·R, and hence the constraint in (22) may not always be satisfied.

Case 1: M _{G}(ℓ)<Q·K·R
Figure 2 shows an exemplary function f(ϕ _{ r }(k,ℓ)) (black curve) and corresponding \(\psi ^{G}_{r}(k,\ell)\) (blue curve) plotted for all K·R subfilters for the case M _{G}(ℓ)<Q·K·R, sorted from largest to smallest value in terms of ϕ _{ r }(k,ℓ). Please note that the area under the black curve is equal to K·R, while the area under the blue curve is equal to M _{G}(ℓ). In order to satisfy the constraint in (22), the surplus effort Q·K·R−M _{G}(ℓ) needs to be redistributed amongst the subfilters for which \(\psi ^{G}_{r}(k,\ell) < 1\). In order to do so, different criteria can be used for modifying \(\psi ^{G}_{r}(k,\ell)\):
Trickle Down (TD): When using this criterion (red), the surplus effort is redistributed via the trickle-down procedure, i.e., the subfilters are filled up in sorted order of \(\psi ^{G}_{r}(k,\ell)\). Allocating taps in this way respects the spectro-spatial sparsity present in the tap-inputs, but would most likely completely ignore subfilters with the smallest magnitude tap-inputs.

Equal Income (EI): When using this criterion (orange), the same number of taps are allocated in all subfilters for which \(\psi ^{G}_{r}(k,\ell) < 1\). This has the beneficial effect that no subfilters are ignored, but has the detrimental effect that the spectro-spatial sparsity present in the tap-inputs would most likely not be exploited for tap allocation.

Equal Bonus (EB): When using this criterion (green), the surplus effort is redistributed equally amongst all subfilters for which \(\psi ^{G}_{r}(k,\ell) < 1\). Allocating taps in this way respects the spectro-spatial sparsity present in the tap-inputs while making sure that all subfilters get a few taps updated.
Since the EB criterion attains a balance between exploiting spectro-spatial sparsity and not completely ignoring subfilters, we decide to use this criterion in our proposed DEA scheme when M _{G}(ℓ)<Q·K·R, i.e.,
$$ \psi^{D}_{r}(k,\ell) = \{1 - \gamma(\ell)\} + \gamma(\ell) \cdot \psi^{G}_{r}(k,\ell), $$(31)
where the superscript ^{ D } denotes the proposed DEA scheme. The constant γ(ℓ) can be computed by substituting (31) into (22), yielding
$$ \gamma(\ell) = \frac{K \cdot R - Q \cdot K \cdot R}{ K \cdot R - M_{\text{G}}(\ell)}. $$(32)
Thus, each subfilter has a minimum of ⌊{1−γ(ℓ)}·L⌋ taps selected in the ℓ ^{th} frame.


Case 2: M _{G}(ℓ)>Q·K·R
Similarly to Fig. 2, Fig. 3 shows an exemplary function f(ϕ _{ r }(k,ℓ)) (black curve) and corresponding \(\psi ^{G}_{r}(k,\ell)\) (blue curve) for the case M _{G}(ℓ)>Q·K·R. In order to satisfy the constraint, different criteria can be used for modifying \(\psi ^{G}_{r}(k,\ell)\):
Tax the Poor (TP): When using this criterion (red), the constraint is satisfied by decreasing the number of taps allocated to subfilters with the lowest \(\psi ^{G}_{r}(k,\ell)\). Such a scheme typically results in highly unequal tap allocation, with all taps reserved for a small number of subfilters with the largest magnitude tap-inputs.

Tax the Rich (TR): When using this criterion (orange), the constraint is satisfied by decreasing the number of taps allocated to subfilters with the highest \(\psi ^{G}_{r}(k,\ell)\). This scheme has the beneficial effect that the majority of subfilters are not ignored when allocating taps but has the detrimental effect that the spectro-spatial sparsity present in the tap-inputs is most likely not exploited for tap allocation.

Equal Tax (ET): When using this criterion (violet), the constraint is satisfied by removing the same number of taps from all K·R subfilters. At first, this looks like a fair way of subtracting taps as it respects the spectro-spatial sparsity in the tap-inputs. However, it can be observed that this criterion ignores subfilters with the smallest magnitude tap-inputs, as it takes away any small number of taps that may have been previously allocated to them.

Proportionate Tax (PT): When using this criterion (green), the constraint is satisfied by uniformly scaling down the number of allocated taps in the different subfilters. Allocating taps in this way respects the spectro-spatial sparsity present in the tap-inputs, while ensuring that fewer taps are removed from subfilters with smaller \(\psi ^{G}_{r}(k,\ell)\).
Since the PT criterion attains a good balance between exploiting spectro-spatial sparsity and not completely ignoring subfilters, we decide to use this criterion in our proposed DEA scheme when M _{G}(ℓ)>Q·K·R, i.e.,
$$ \psi^{D}_{r}(k,\ell) = \delta(\ell) \cdot \psi^{G}_{r}(k,\ell), $$(33)
The number of taps selected to be updated in the subfilter \(\underline {\hat {H}}_{r}(k,\ell)\) using the DEA scheme is finally determined by substituting (35) into (19).
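The two correction steps of the DEA scheme can be sketched as follows. The Equal Bonus branch follows (31) and (32) directly. For the Proportionate Tax branch, the expression for δ(ℓ) is not reproduced in the text above, so the sketch assumes it is chosen such that the corrected values again sum to Q·K·R, i.e., δ(ℓ) = Q·K·R / M_G(ℓ). The function name `dea_adjust` and the toy values are illustrative.

```python
import numpy as np

def dea_adjust(psi_g, Q):
    """Map generic effort fractions psi^G to psi^D so that they sum to Q * K * R.

    psi_g : effort fractions in [0, 1], one per subfilter (K * R values).
    Q     : overall fraction of taps to update, 0 < Q <= 1.
    """
    psi_g = np.asarray(psi_g, dtype=float)
    n_sub = psi_g.size                 # K * R subfilters
    target = Q * n_sub                 # required total effort Q * K * R
    m_g = psi_g.sum()                  # M_G of the current frame
    if m_g < target:
        # Case 1, Equal Bonus: Eqs. (31) and (32).
        gamma = (n_sub - target) / (n_sub - m_g)
        return (1.0 - gamma) + gamma * psi_g
    if m_g > target:
        # Case 2, Proportionate Tax: Eq. (33). ASSUMPTION: delta is derived
        # from the sum constraint; its defining equation is not given above.
        delta = target / m_g
        return delta * psi_g
    return psi_g.copy()

Q = 0.5
psi_case1 = dea_adjust([1.0, 0.2, 0.1, 0.0], Q)   # M_G = 1.3 < Q*K*R = 2
psi_case2 = dea_adjust([1.0, 0.9, 0.6, 0.3], Q)   # M_G = 2.8 > Q*K*R = 2
```

In the first case even the subfilter with zero generic effort ends up with a non-zero share, and in the second case the ranking of the subfilters is preserved because every value is scaled by the same factor.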
4 Simulations, results and discussion
In this section, we present the reference signals and algorithmic parameters used, as well as the different metrics used to analyze signal sparsity, tap selection, and echo cancellation performance. We perform a sparsity analysis of the multichannel reference signals, individually across the three dimensions of subbands, channels, and filter length, as well as jointly across multiple dimensions. We then analyze the effect of using the different tap selection schemes on the echo cancellation performance obtained for the different types of reference signals used.
4.1 Signals and algorithmic parameters

Synthetic signals

 Mono brown and white noise signals, i.e., signals whose power densities change at the rate of −6 and 0 dB/octave, respectively.

Stereo white noise signal.


 Real-world signals

 Mono speech signals (TIMIT database)

 Surround-sound movie signals (Dolby Digital 5.0 format)

 Surround-sound concert signals (Dolby Digital 5.0 format)

The acoustic impulse responses have been measured in a room with T _{60}≈550 ms, with the microphone and the five loudspeakers placed on a circle of 3 m radius. The microphone was placed at a height of 1.2 m, the centre (C) loudspeaker was placed directly 0.85 m below the microphone, the front left (FL) and right (FR) loudspeakers were placed at the same height and 30° either side of the microphone, and the side left (SL) and right (SR) loudspeakers were placed 0.4 m above and 110° either side of the microphone, respectively. The acoustic echo signal d _{ r } is obtained by convolving the reference signal x _{ r } with the corresponding impulse response h _{ r } for V _{ r }=200 ms. We assume no near-end speech signal (s(n)=0) and no additive near-end noise signal (b(n)=0) for our simulations. For the mono reference signals, we use the impulse response corresponding to the C loudspeaker only, while for the stereo white noise signal, we use the impulse responses corresponding to the FL and FR channels. The time-domain signals have been transformed into the subband domain using STFT processing with N _{FFT}=512 (i.e., K=257) using a Hanning window and an overlap of 75%. We use a filter length L=20 for the MAEC filters, which corresponds to N _{FFT}·{1+0.25·(L−1)} samples or 184 ms. For updating the MAEC filters, a fixed step-size of μ=0.1 and regularization parameter of ε=10^{−60} have been used.
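The STFT and filter dimensions quoted above can be checked with a few lines of arithmetic. The sampling rate of 16 kHz is an assumption: it is not stated in the text above, but it is the value for which 2944 samples correspond to the quoted 184 ms. The total tap count uses N = L·K·R for the five-channel surround signals.

```python
# Parameters taken from the text above.
N_FFT = 512          # FFT length
overlap = 0.75       # 75 % overlap between successive frames
L = 20               # subband filter length in frames
R = 5                # loudspeaker channels (5.0 surround format)

# ASSUMPTION: sampling rate implied by "2944 samples or 184 ms".
fs = 16000

K = N_FFT // 2 + 1                             # number of subbands
frameshift = int(N_FFT * (1 - overlap))        # F, in samples
span_samples = N_FFT + frameshift * (L - 1)    # N_FFT * {1 + 0.25 * (L - 1)}
span_ms = 1000 * span_samples / fs             # modelled echo path length
N = L * K * R                                  # total number of filter taps
```

With these values the filters cover 2944 samples (184 ms), slightly less than the 200 ms impulse responses used to generate the echo signals, and the five-channel MAEC system has 25,700 complex-valued taps in total.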
4.2 Performance measures
Here, we present the different metrics used to analyze the sparsity present in the reference spectra, to analyze the performance of the different tap selection schemes in exploiting signal sparsity and to measure the echo cancellation performance.
4.2.1 Sparsity metric
On the one hand, for the extreme case where u _{0}=…=u _{ N−1}, i.e., no sparsity in \(\underline {u}\), \(g(\underline {u}) = 0\). On the other hand, for the extreme case where u _{0}=…=u _{ N−2}=0 and u _{ N−1}≠0, i.e., very high sparsity in \(\underline {u}\), \(g(\underline {u}) = 1 - \frac {1}{N}\), which for a large value of N is approximately equal to 1. Thus, the sparser the vector, the higher the Gini index.

Limited range: \(0 \leq g(\underline {u}) \leq 1\).

Scaling invariance: \(g(a \cdot \underline {u}) = g(\underline {u})\), ∀\(a \in \mathcal {R}\).

Sensitivity to addition: \(g(a + \underline {u}) < g(\underline {u})\), ∀\(a \in \mathcal {R}, a > 0\).

Cloning invariance: \(g(\underline {u}) = g([\underline {u} \hspace {3pt} \underline {u}]) = g([\underline {u} \hspace {3pt} \underline {u} \hspace {3pt} \underline {u}])\)

 Sensitivity to zero-padding: \(g([\underline {u} \hspace {3pt} 0]) > g(\underline {u})\)
The cloning invariance property allows a fair comparison of the sparsity of vectors with different number of elements. This is an important consideration, as we want to compare the sparsity of the reference spectra across the different dimensions of subbands, channels and frames. Note that the oft-used Hoyer metric does not exhibit this invariance and is hence not suited for comparing vectors with different number of elements.
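The properties listed above can be checked numerically. The defining equation of the Gini index is not reproduced in the text above, so the sketch below assumes the commonly used form \(g(\underline{u}) = 1 - 2 \sum_{i=1}^{N} \frac{u_{(i)}}{\|\underline{u}\|_{1}} \cdot \frac{N - i + 1/2}{N}\), with the magnitudes \(u_{(i)}\) sorted in ascending order. This form reproduces both extreme cases stated above; the function name and the all-zero convention are illustrative choices.

```python
import numpy as np

def gini_index(u):
    """Gini index of the magnitudes of u (assumed definition, see lead-in)."""
    mag = np.sort(np.abs(np.asarray(u, dtype=float)).ravel())   # ascending
    n = mag.size
    l1 = mag.sum()
    if l1 == 0.0:
        return 0.0          # convention chosen here for an all-zero vector
    i = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum((mag / l1) * (n - i + 0.5) / n)

u = np.array([1.0, 2.0, 3.0, 4.0])
g_flat = gini_index(np.ones(8))                     # no sparsity
g_onehot = gini_index([0, 0, 0, 0, 0, 0, 0, 5.0])   # single non-zero entry
g_clone = gini_index(np.concatenate([u, u]))        # cloning
g_padded = gini_index(np.append(u, 0.0))            # zero-padding
g_scaled = gini_index(3.0 * u)                      # scaling
```

Under this definition, `g_flat` evaluates to 0 and `g_onehot` to 1 − 1/8, matching the two extreme cases, while cloning and scaling leave the index unchanged and zero-padding increases it.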
4.2.2 Tap selection performance
For full filter update, i.e., \(\underline {\alpha }(\ell) = \underline {1}\), we obviously obtain ξ=1. For a given Q, the 3DM scheme maximizes the Closeness Measure in every frame, as it selects the M largest magnitude tap-inputs. The expectation and assumption is that the tap selection scheme yielding the largest Closeness Measure also results in the smallest difference in AEC performance compared to updating the filters using full tap selection.
4.2.3 Echo cancellation performance
where \(\hat {d}(n)\) is the time-domain signal corresponding to the total MAEC filter output \(\hat {D}(k,\ell)\) and E[·] denotes the statistical expectation operator. In practice, the ERLE is computed by approximating the expectation operator with the current sample value. The speed of convergence of the MAEC filters is assessed using the t _{20} metric, which is the time required for the ERLE to reach 20 dB.
4.3 Sparsity analysis
Additionally, we analyze the sparsity present in the spectra jointly across multiple dimensions. In Fig. 5a, the black curve displays the Gini index for the joint spectro-spatial sparsity, computed in every frame on a vector with K·R spectral coefficients. Similarly, in Fig. 5b, the black curve displays the Gini index for the joint spatio-temporal sparsity, computed in every subband on a vector with R·T spectral coefficients. The Gini index for the joint spectro-temporal sparsity in each channel is computed by processing the magnitude spectrogram of that channel and is plotted in Fig. 5d, along with the joint spectro-spatio-temporal sparsity for all K·R·T coefficients. From this figure, it can be clearly observed that the multichannel reference spectra exhibit even higher levels of sparsity when analyzed across multiple dimensions, with Gini indices on average above 0.85. This provides the motivation to exploit sparsity jointly across subbands, channels and frames for the purpose of tap selection.
4.4 Analysis of tap selection schemes for synthetic signals
In this section, we analyze the effect of using the constrained tap selection schemes from Section 3 (3DM, SPU, FEA and DEA) for synthetic signals.
4.4.1 Effect of Spectral Coloration
4.4.2 Effect of Inter-Channel Power Ratio
4.4.3 Closeness Measure
4.4.4 ERLE and t _{20}
As shown by the previous experiments, depending on the spectral coloration and the inter-channel power ratio of the reference signals, each considered tap selection scheme results in a different distribution of the selected taps across subbands and channels, and a different Closeness Measure. Hence, it is to be expected that the tap selection schemes have an influence on the overall acoustic echo cancellation performance, i.e., ERLE and speed of filter convergence.
4.5 Analysis of tap selection schemes for real-world signals
Contrary to the synthetic (stationary) signals in the previous section, in this section we investigate the effect of using constrained tap selection schemes on the echo cancellation performance for (non-stationary) real-world signals.
5 Computational effort
Computational effort
| Operation | 3DM | SPU | FEA | DEA | PU-NLMS |
| # Adds | KR | 3KR | 0 | 6KR + 1 | 4QN + 3KR − K |
| # Mults | 0 | 2KR | KR | 3KR + 2 | 4QN + 2KR + 3K |
| # Divs | 0 | 0 | 0 | 2 | K |
| # Comps | N log2(N) | KR log2(KR) | KR (2 log2(L) + 2) | KR (2 log2(L) + 3) + 1 | 0 |
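To get a feeling for the orders of magnitude, the expressions in the table can be evaluated for the simulation parameters K = 257, R = 5, L = 20 and Q = 0.2. The sketch below weights all operation types equally and adds the selection overhead to the cost of the partial filter update; both are simplifying assumptions made here for illustration, and the function names are hypothetical.

```python
import math

def op_counts(K, R, L, Q):
    """Per-frame operation counts, transcribed from the table above."""
    N = L * K * R
    KR = K * R
    return {
        "3DM": {"adds": KR, "mults": 0, "divs": 0,
                "comps": N * math.log2(N)},
        "SPU": {"adds": 3 * KR, "mults": 2 * KR, "divs": 0,
                "comps": KR * math.log2(KR)},
        "FEA": {"adds": 0, "mults": KR, "divs": 0,
                "comps": KR * (2 * math.log2(L) + 2)},
        "DEA": {"adds": 6 * KR + 1, "mults": 3 * KR + 2, "divs": 2,
                "comps": KR * (2 * math.log2(L) + 3) + 1},
        "PU-NLMS": {"adds": 4 * Q * N + 3 * KR - K,
                    "mults": 4 * Q * N + 2 * KR + 3 * K,
                    "divs": K, "comps": 0},
    }

def total_ops(counts, scheme):
    """Selection overhead plus filter update, all operations weighted equally
    (a simplifying assumption, not a cost model from the paper)."""
    return sum(counts[scheme].values()) + sum(counts["PU-NLMS"].values())

K, R, L, Q = 257, 5, 20, 0.2
partial = op_counts(K, R, L, Q)
full = op_counts(K, R, L, 1.0)

full_update_ops = sum(full["PU-NLMS"].values())   # all taps, no selection
dea_ops = total_ops(partial, "DEA")
three_dm_ops = total_ops(partial, "3DM")
```

Under this crude equal-weight model, the DEA scheme with Q = 0.2 stays well below the cost of a full update, whereas the N·log2(N) comparisons of the 3DM scheme alone exceed it.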
6 Conclusions
In this paper, different tap selection schemes for constrained partial updates of subband MAEC filters have been compared. Real-world multichannel signals have been analyzed and shown to be sparse across subbands (spectrally), channels (spatially), and frames (temporally). This sparsity is then exploited by different tap selection schemes for updating the MAEC filters. The MAEC system consists of a dedicated subband AEC filter for each loudspeaker channel, with each filter composed of multiple subfilters, i.e., one subfilter per subband per channel. The first tap selection scheme considered applied the well-known M-Max criterion on the multichannel input spectra across all three dimensions, and is hence called the 3DM scheme. This scheme jointly exploits the spectral, spatial and temporal sparsity in the input signals but typically results in some subfilters having no taps updated. In order to avoid this problem, two new schemes have been presented which perform tap selection by applying the M-Max criterion only across filter length (and thereby exploit temporal sparsity for updating each subfilter) and do not completely ignore the subfilters with the smallest magnitude tap-inputs. The FEA scheme allocates a fixed number of taps to be updated in each subfilter per frame, while the proposed DEA scheme exploits the joint spectro-spatial sparsity present in the input signals for dynamically allocating the number of taps to be updated in the different subfilters. The new tap selection schemes have been compared to the state-of-the-art SPU tap selection scheme in the subband domain, which displays similar properties to the 3DM scheme. The proposed DEA scheme is designed such that it selects more taps in the subfilters with larger magnitude tap-inputs (like the 3DM and SPU schemes) while not completely ignoring the subfilters with smaller magnitude tap-inputs (like the FEA scheme).
Simulation results for speech and music signals showed that in terms of ERLE and convergence speed, the 3DM and DEA schemes achieved almost identical echo cancellation performance compared to full filter update even when only 20% of the MAEC filter taps were updated in every frame, while the FEA and SPU schemes performed worse (about 2–4 dB and 10–12 dB deterioration in ERLE, respectively). The SPU, FEA and DEA tap selection schemes have a reduced computational cost compared to full filter update, while the 3DM scheme does not necessarily lead to reduction in computational complexity. Hence, in conclusion, the proposed DEA tap selection scheme yields almost identical echo cancellation performance compared to updating all filter taps at a significantly reduced computational cost.
7 Endnote
Declarations
Acknowledgements
The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) project DREAMS under grant agreement ITN-GA-2012-316969. The authors would also like to acknowledge the contribution of Anirudha Kalya in helping develop the DEA tap selection scheme.
Authors’ contributions
The contribution of the first author consists of developing the main algorithmic idea, performing simulations, analyzing the simulation results and drafting the article. The contribution of the second and third authors consists of critically discussing the developed algorithms and the simulation results with the first author, and proofreading and revising the article. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.