Significance-aware filtering for nonlinear acoustic echo cancellation

Hofmann, Christian; Huemmer, Christian; Guenther, Michael; Kellermann, Walter

doi:10.1186/s13634-016-0410-7

Research
Open access
Published: 08 November 2016


EURASIP Journal on Advances in Signal Processing volume 2016, Article number: 113 (2016)


Abstract

This article summarizes and extends the recently proposed concept of Significance-Aware (SA) filtering for nonlinear acoustic echo cancellation. The core idea of SA filtering is to decompose the estimation of the nonlinear echo path into beneficially interacting subsystems, each of which can be adapted with high computational efficiency. The previously proposed SA Hammerstein Group Models (SA-HGMs) decompose the nonlinear acoustic echo path into a direct-path part, modeled by a Hammerstein Group Model (HGM), and a complementary part, modeled by a very efficient Hammerstein model. In this article, we furthermore propose a novel Equalization-based SA (ESA) structure, where the echo path is equalized by a linear filter to allow for an estimation of the loudspeaker nonlinearities by very small and efficient models. Additionally, we provide a novel in-depth analysis of the computational complexity of the previously proposed SA and the novel ESA filters and compare both SA filtering approaches to each other, to adaptive HGMs, and to linear filters, where fast partitioned-block frequency-domain realizations of the competing filter structures are considered. Finally, the echo reduction performance of the proposed SA filtering approaches is verified using real recordings from a commercially available smartphone. Beyond the scope of previous publications on SA-HGMs, the ability of the SA filters to generalize to double-talk situations is explicitly considered as well. The low complexity and the good echo reduction performance of both SA filters illustrate the potential of SA filtering in practice.

1 Introduction

Since the first adaptive linear echo canceler for network echoes in telephone lines [1], linear echo cancellation has evolved into a key ingredient of almost any full-duplex speech communication system. This has resulted in a multitude of approaches to efficiently model, parametrize, and estimate even complex linear systems, such as the acoustic echo paths in hands-free wideband telecommunication scenarios [2]. With increasingly common nonlinear distortions produced by miniaturized amplifiers and loudspeakers in modern portable devices, dedicated nonlinear echo path models have emerged as an important topic of research and have motivated sophisticated approaches for nonlinear Acoustic Echo Cancellation (AEC) based on Volterra filters [3–5], artificial neural networks [6, 7], Functional Link Adaptive Filters (FLAFs) [8, 9], or kernel methods [10, 11]. A very simple, yet effective model for nonlinear acoustic echo paths is the cascade of a memoryless preprocessor (modeling loudspeaker signal distortions) and a subsequent linear system (modeling sound propagation through air) [12]. Due to its simplicity, this so-called Hammerstein Model (HM) has been frequently employed [13–22] and will also be used in this contribution, as will a group of B parallel HMs, referred to as a Hammerstein Group Model (HGM) in the following. The recently proposed efficient Significance-Aware HGM (SA-HGM) [20] combines the advantages of HMs and HGMs and was extended to an efficient partitioned-block frequency-domain realization in [22].

Beyond previous work, this article introduces a novel variant of the Significance-Aware (SA) filtering concept, denoted as Equalization-based SA (ESA) filtering, which complements the existing efficient frequency-domain realization in [22]. This allows highly efficient SA filters to be derived for higher-order nonlinear systems, even without block partitioning. Furthermore, a novel in-depth analysis of the computational complexity of the previously proposed SA filtering concepts is provided and contrasted with the computational complexity of the novel ESA filtering concept, adaptive HGMs, and conventional adaptive linear filters. Beyond previous investigations [22], this article also assesses the ability of the considered echo path models to generalize to double-talk situations, in which the models cannot be adapted to the current input signal. The performance in such situations reflects how well the estimated model represents the physical system to be identified.

The remainder of this paper is structured as follows: after introducing the notation in Section 2, frequently used echo path models and their adaptation are reviewed in Sections 3.1 and 3.2, respectively. Afterwards, the recently proposed Partitioned-Block SA-HGM (PBSA-HGM) is summarized in Section 4.1, before the novel ESA filtering concept, which results in an Equalization-based Significance-Aware HM (ESA-HM), is introduced in Section 4.2. Then, the computational complexity of the SA filters (PBSA-HGM and ESA-HM) is analyzed and compared to the complexity of adapting a linear model and an HGM in Section 5.1. An experimental verification of the efficacy of the novel ESA-HM in comparison to other approaches in terms of echo reduction performance is given in Section 5.2. Finally, the main results are summarized in Section 6.

2 Notation

Throughout this article, vectors will be typeset in boldface lowercase letters, e.g., $\mathbf{a}$, and matrices in boldface uppercase letters, e.g., $\mathbf{A}$. The complex conjugate, transpose, and Hermitian transpose of a vector $\mathbf{a}$ (a column vector by default) will be written as $\mathbf{a}^{*}$, $\mathbf{a}^{\mathrm{T}}$, and $\mathbf{a}^{\mathrm{H}}$, respectively. Moreover, $\mathbf{A}\odot\mathbf{B}$, $\mathbf{A}\oslash\mathbf{B}$, and $|\mathbf{A}|^{2}$ denote element-wise multiplication (Hadamard product), element-wise division, and element-wise magnitude-squaring of matrices or vectors of the same size, respectively. Furthermore, $\langle\mathbf{a},\mathbf{b}\rangle$ stands for the scalar product $\mathbf{a}^{\mathrm{H}}\mathbf{b}$. Special matrices are the $M\times M$ identity matrix $\mathbf{I}_{M}$, the $M\times M$ all-zero matrix $\mathbf{0}_{M}$, and the windowing matrices

$$ \mathbf{W}_{01}= \left[\begin{array}{cc} \mathbf{0}_{M} & \mathbf{0}_{M}\\ \mathbf{0}_{M} & \mathbf{I}_{M} \end{array}\right] \quad\text{and} \quad\mathbf{W}_{10}= \left[\begin{array}{cc} \mathbf{I}_{M} & \mathbf{0}_{M}\\ \mathbf{0}_{M} & \mathbf{0}_{M} \end{array}\right], $$

(1)

which set the first or the second half of a length-$2M$ vector to zero, respectively. Furthermore, $\mathbf{F}$ and $\mathbf{F}^{\mathrm{H}}$ denote the $N$-point Discrete Fourier Transform (DFT) matrix and its inverse, respectively. Besides, $a(k)\ast b(k)$ and $a(k)\circledast b(k)$ denote linear and cyclic convolution between the time series $a(k)$ and $b(k)$, respectively, where $k$ is the discrete-time sample index. Analogously to vectors, $\left\langle a(k),b(k)\right\rangle=\sum_{k=-\infty}^{\infty}a^{*}(k)\,b(k)$ denotes the scalar product between the time series $a(k)$ and $b(k)$. For a real-valued scalar $a$, $\lceil a\rceil$ denotes the smallest integer greater than or equal to $a$, also known as the ceiling function.
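To make the windowing matrices in (1) and the two convolution types concrete, the following NumPy sketch builds $\mathbf{W}_{01}$ and $\mathbf{W}_{10}$ for a small, arbitrarily chosen M and checks that a DFT-based cyclic convolution reproduces a linear convolution once both sequences are zero-padded to a sufficient length. It is an illustrative sketch only: numpy.fft.fft and numpy.fft.ifft stand in for the DFT and the inverse transform written $\mathbf{F}^{\mathrm{H}}$ above, and the helper name cyclic_convolve is hypothetical.

```python
import numpy as np

M = 4                                        # illustrative frame shift, N = 2M
I_M, O_M = np.eye(M), np.zeros((M, M))
W01 = np.block([[O_M, O_M], [O_M, I_M]])     # sets the first half of a length-2M vector to zero
W10 = np.block([[I_M, O_M], [O_M, O_M]])     # sets the second half to zero

def cyclic_convolve(a, b):
    """Cyclic convolution of two equal-length real sequences via the DFT."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

v = np.arange(2.0 * M)
print(W01 @ v)   # [0. 0. 0. 0. 4. 5. 6. 7.]
print(W10 @ v)   # [0. 1. 2. 3. 0. 0. 0. 0.]

# Two length-3 sequences zero-padded to N = 8 >= 3 + 3 - 1: cyclic == linear convolution
a = np.array([1.0, 2.0, 3.0, 0, 0, 0, 0, 0])
b = np.array([1.0, -1.0, 0.5, 0, 0, 0, 0, 0])
print(np.allclose(cyclic_convolve(a, b)[:5], np.convolve(a[:3], b[:3])))  # True
```

Without sufficient zero padding, the tail of the linear convolution would wrap around to the beginning of the cyclic result; this is exactly the kind of artifact that $\mathbf{W}_{01}$ discards in the overlap-save scheme of Section 3.2.2.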

3 Fundamentals of linear and nonlinear acoustic echo cancellation

The acoustic echo path of a full-duplex communication system can be described as illustrated in Fig. 1.

Therein, the discrete-time loudspeaker signal x(k), where k is the sample index, is interpolated by the D/A converter, amplified, and played back via a loudspeaker, yielding the analog loudspeaker signal $x_{\mathrm{a}}(t)$, where t is the continuous time variable. The subsequent propagation of the sound waves through air to the microphone can be modeled very accurately by a linear system with impulse response $h_{\mathrm{a}}(t)$. These acoustic echoes (far-end components) superimpose with the signals of local speakers and interferences (near-end components), yielding the continuous-time microphone signal $y_{\mathrm{a}}(t)$ and its sampled, discrete-time representation y(k). In order to provide good estimates of the local (near-end) speakers, two technologies are typically employed: an AEC unit for removing the far-end signal components from y(k) and a postfilter suppressing residual echoes and near-end interferences [23–32].

In this work, we will not address the latter but focus on the AEC unit, which provides an echo estimate $\hat {y}(k)$ by identifying the acoustic echo path adaptively. The AEC error signal

$$e (k) = y (k) - \hat{y} (k) $$

is an estimate of the near-end signal, even during double-talk periods, where both far-end and near-end components are present. In periods of vanishing near-end signal (single-talk periods), e(k) can be used for refining the echo path estimate. Thus, the AEC unit typically requires a double-talk detector [33–38]. As double-talk detection is beyond the scope of this article, we assume it to be handled separately, such that the microphone signal during time intervals used for echo path estimation consists only of far-end signal components (acoustic echoes). Furthermore, the AEC unit requires the choice of a suitable echo path model and of an adaptation strategy. These will be addressed in Sections 3.1 and 3.2, respectively.

3.1 From linear to nonlinear echo path models

This section contains a brief overview of frequently employed echo path models. Strategies for estimating the model parameters (i.e., filter coefficients) will be described in Section 3.2.

3.1.1 Linear models

The simplest, yet frequently used echo path model is the linear model, depicted in Fig. 2 a.

Therein, the Loudspeaker-Enclosure-Microphone System (LEMS) is modeled by a linear FIR filter with the input/output relation

$$\begin{array}{*{20}l} y (k) & = x (k)* h (k)\,, \end{array} $$

(2)

where h(k) is the discrete-time impulse response of the LEMS. Such models are most suitable for high-quality audio equipment. On the other hand, energy-efficient and miniaturized portable devices operating at the limit of their capabilities (such as portable navigation devices or smartphones in hands-free mode) lead to nonlinear distortions in the played-back signal [12] (see red box in Fig. 1), which render linear echo path models insufficient.

3.1.2 Hammerstein models

Due to the cascaded nature of the acoustic echo path between x(k) and y(k) (first nonlinear playback equipment, then linear transmission through air), a very simple, yet effective nonlinear echo path model is given by the cascade of a memoryless nonlinearity (memoryless preprocessor) with a subsequent linear system [12–22]. Such a structure is depicted in Fig. 2 b and is referred to as HM or nonlinear-linear cascade in the literature [39–43]. This corresponds to the input/output relation

$$\begin{array}{*{20}l} y(k) & =h(k)\ast x_{\text{NL}}{(k)} \\ & =h(k)\ast f \left\{ x \,(k)\right\}. \end{array} $$

(3)

In AEC, the memoryless preprocessor $x_{\text{NL}}(k)=f\{x(k)\}$ typically has to model a saturation introduced by the amplifier and the loudspeaker. To this end, $f\{x(k)\}$ can range from a simple hard limiter to a parametric preprocessor [14]

$$ x_{\text{NL}}{(k)}=\sum_{b=1}^{B}w_{b}f_{b}\left\{x(k)\right\} $$

(4)

with nonlinearity basis functions $f_b\{\cdot\}$ and preprocessor weights (parameters) $w_b$, which corresponds to the block diagram in Fig. 2 c. For a Bth-order polynomial preprocessor, intermodulation products up to order B can be modeled. Examples of such polynomial preprocessors can be found in [13, 14].
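As an illustration of (3) and (4), the sketch below evaluates an HM whose parametric preprocessor is a weighted sum of basis functions. The choice $f_b\{x\}=P_b(x)$, the Legendre polynomial of degree b (so that $f_1\{x\}=x$), as well as the weights and the two-tap impulse response, are assumptions made for this example only; numpy.polynomial.legendre.legval evaluates the polynomials.

```python
import numpy as np
from numpy.polynomial import legendre

def basis(x, b):
    """Assumed basis function f_b{x}: Legendre polynomial of degree b (f_1{x} = x)."""
    coeffs = np.zeros(b + 1)
    coeffs[b] = 1.0
    return legendre.legval(x, coeffs)

def hammerstein(x, w, h):
    """HM output: parametric preprocessor (4) followed by the FIR filter h, cf. (3)."""
    x_nl = sum(w_b * basis(x, b) for b, w_b in enumerate(w, start=1))   # x_NL(k), (4)
    return np.convolve(x_nl, h)[: len(x)]     # causal linear convolution h(k) * x_NL(k)

x = np.linspace(-1.0, 1.0, 9)                 # toy loudspeaker signal
w = [1.0, 0.0, -0.2]                          # linear term plus a cubic Legendre term
h = np.array([1.0, 0.5])                      # toy two-tap impulse response
y = hammerstein(x, w, h)
```

With w = [1.0] the preprocessor reduces to the identity and the HM collapses to the linear model (2).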

3.1.3 Hammerstein group models

A more complex and more general class of models will be referred to as HGMs. As depicted in Fig. 2 d, they consist of a group of B parallel HMs with an individual impulse response $h_b(k)$ (referred to as kernel) for each branch $b=1,\dots,B$ after the branch’s nonlinearity basis function $f_b\{\cdot\}$. This corresponds to the input/output relation

$$\begin{array}{*{20}l} y\,(k) & =\sum_{b=1}^{B}h_{b} (k) \ast\underbrace{f_{b}\left\{x(k)\right\}}_{x_{b}(k)}, \end{array} $$

(5)

where $x_b(k)$ will be referred to as branch signals in the following.
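Equation (5) can be sketched in the same way; the only difference from the HM is that every branch signal is filtered by its own kernel $h_b(k)$. The Legendre basis of degree b is again an assumption for illustration, and the kernels are arbitrary random toy values.

```python
import numpy as np
from numpy.polynomial import legendre

def branch_signals(x, B):
    """Branch signals x_b(k) = f_b{x(k)}, b = 1..B, assuming f_b = Legendre polynomial of degree b."""
    return np.stack([legendre.legval(x, np.eye(B + 1)[b]) for b in range(1, B + 1)])

def hgm(x, kernels):
    """HGM output (5): sum over branches of h_b(k) * x_b(k), truncated to len(x) samples."""
    xb = branch_signals(x, len(kernels))
    return sum(np.convolve(xb[i], h_b)[: len(x)] for i, h_b in enumerate(kernels))

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 200)
kernels = [rng.standard_normal(16), 0.1 * rng.standard_normal(16), 0.05 * rng.standard_normal(16)]
y = hgm(x, kernels)
```

Setting all kernels except the first to zero recovers the linear model, which illustrates why the adaptation in Section 3.2 can be described once for the HGM.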

Practical examples of such HGMs employ, e.g., monomials as nonlinearity basis functions [44, 45] (so-called power filters), Legendre polynomials [19–22], or sinusoidal functions [8, 9, 42]. Some of the publications employing sinusoidal functions also refer to HGMs as FLAFs without memory [8, 9]. Inspired by machine learning, FLAFs typically express the input/output relation of (5) differently: instead of individual branch signals, all branch signals are interleaved and concatenated to a larger expanded vector, which is mapped to an echo estimate by linear combination (which covers both the convolution and the summation in (5)). However, with focus on the applicability of fast convolution methods in this article, the formulation as a group of parallel Hammerstein systems is advantageous.

Furthermore, note that the special case of an HGM where all $h_{b}(k)=w_{b}h_{b_{\text{ref}}}(k)\;\forall b=1,\dots,B$ are scaled versions of a reference impulse response $h_{b_{\text{ref}}}(k)$ in branch $b_{\text{ref}}$ can be expressed as an HM. One possible HM representation of such an HGM has the reference impulse response $h(k)=h_{b_{\text{ref}}}(k)$ and a parametric preprocessor according to (4) with the HGM’s nonlinearity basis functions $f_b\{\cdot\}$ and preprocessor weights

$$ w_{b}=\frac{\left\langle h_{b_{\text{ref}}}{(k)}, h_{b}{(k)}\right\rangle }{\left\langle h_{b_{\text{ref}}}{(k)}, h_{b_{\text{ref}}}{(k)}\right\rangle}. $$

(6)

This correspondence will be essential for the SA filters in Section 4, where $b_{\text{ref}}=1$ (implying $w_1=1$) will be considered by default.
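The following sketch checks this HGM-to-HM correspondence numerically: it builds an HGM whose kernels are scaled copies of a reference kernel, recovers the weights with (6), and compares the HGM output with that of the equivalent HM. Monomials x, x², x³ serve as basis functions here (a power filter), which is an arbitrary choice; Python indexing is 0-based, so b_ref = 0 in the code corresponds to $b_{\text{ref}}=1$ in the text. For real-valued kernels that are not exact scaled copies, (6) yields the least-squares projection coefficient of $h_b(k)$ onto $h_{b_{\text{ref}}}(k)$.

```python
import numpy as np

def hm_weights_from_hgm(kernels, b_ref=0):
    """Preprocessor weights (6): w_b = <h_ref, h_b> / <h_ref, h_ref>, using the
    scalar product a^H b (np.vdot conjugates its first argument)."""
    h_ref = kernels[b_ref]
    return np.array([np.vdot(h_ref, h_b) / np.vdot(h_ref, h_ref) for h_b in kernels])

rng = np.random.default_rng(2)
K, L = 300, 32
x = rng.uniform(-1.0, 1.0, K)
branches = np.stack([x, x**2, x**3])          # power-filter branch signals x_b(k)
h_ref = rng.standard_normal(L)
w_true = np.array([1.0, 0.3, -0.1])
kernels = [w_b * h_ref for w_b in w_true]     # h_b(k) = w_b h_ref(k)

w_est = hm_weights_from_hgm(kernels)
y_hgm = sum(np.convolve(xb, hb)[:K] for xb, hb in zip(branches, kernels))   # HGM, (5)
y_hm = np.convolve(w_est @ branches, h_ref)[:K]                            # HM, (3)-(4)
```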

3.1.4 Models with dynamic nonlinearities

A further generalization of HGMs is given by models in which the nonlinearity basis functions are nonlinear functions $f_b\{\mathbf{x}(k)\}$ of a vector $\mathbf{x}(k)$ formed from samples of the input signal x(k). The most prominent examples of such filters are the so-called Volterra filters [3–5, 40, 46–48], for which $f_b\{\mathbf{x}(k)\}$ computes time-lagged products of samples of x(k) (different elements of $\mathbf{x}(k)$)¹. Note that power filters [44, 45] (having a memoryless nonlinearity) represent the special case of Volterra filters where only the main diagonal of each Volterra kernel is populated with non-zero coefficients, which correspond to the respective kernels of the power filter. Volterra filters can be seen as a multidimensional Taylor series expansion of the mapping $\mathbf{x}(k)\mapsto y(k)$. Alternatively, the monomial basis functions of Volterra filters have also been replaced by Legendre polynomials [49, 50] or Fourier basis functions (sinusoids) [51]. Note that for the Fourier-basis nonlinear filters, the feed-forward structure has also been complemented by a feedback path, leading to bounded-input/bounded-output (BIBO)-stable recursive nonlinear filters [52].
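The statement that power filters are Volterra filters with purely diagonal kernels can be checked with a small second-order example. The explicit loop over time below evaluates the quadratic form over time-lagged products for clarity rather than efficiency; the kernel values and signal length are arbitrary.

```python
import numpy as np

def volterra2(x, h1, H2):
    """Second-order Volterra filter with linear kernel h1 (L,) and quadratic kernel H2 (L, L)."""
    L = len(h1)
    xp = np.concatenate([np.zeros(L - 1), x])    # zero initial conditions
    y = np.zeros(len(x))
    for k in range(len(x)):
        v = xp[k:k + L][::-1]                     # v[l] = x(k - l), l = 0..L-1
        y[k] = h1 @ v + v @ H2 @ v                # linear part + sum of H2[l1, l2] x(k-l1) x(k-l2)
    return y

rng = np.random.default_rng(3)
x = rng.standard_normal(100)
h1, h2 = rng.standard_normal(6), rng.standard_normal(6)
y_volterra = volterra2(x, h1, np.diag(h2))                         # diagonal quadratic kernel
y_power = np.convolve(x, h1)[:100] + np.convolve(x**2, h2)[:100]   # power filter with B = 2
```

Populating off-diagonal entries of H2 introduces products of different time lags, which a power filter (and hence an HGM with memoryless basis functions) cannot represent.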

Although the SA filters described in Section 4 will employ only HGMs and memoryless preprocessors, it is worth noting that the SA filtering concept can also be applied to dynamic nonlinearities $f_b\{\mathbf{x}(k)\}$, involving, e.g., Volterra filters instead of HGMs.

3.2 Adaptation strategies

Adaptation of nonlinear models can be performed on different levels. On a basic level, parameters like filter coefficients of a model with a given structure are identified. To this end, the parameters may be modeled as deterministic parameters, resulting in, e.g., Least-Mean-Square (LMS) algorithms [53] for adaptation, or as probabilistic parameters, leading to Kalman filter-like algorithms [15, 19, 21, 54]. On a higher level, the model structure (e.g., filter lengths of linear subsystems or numbers of diagonals of Volterra kernels) can be estimated via self-configuring evolutionary algorithms [5, 55–57]. In this article, the term “adaptation” will refer to the iterative estimation of filter coefficients modeled as deterministic parameters of a model with fixed structure.

In this section, the adaptation of the linear subsystems of the models described in Section 3.1 will be discussed. As all these models can be expressed as special cases of a parallel structure like an HGM, the adaptation will be described for an HGM. Then, the adaptation schemes for all other models can be derived as special cases of the HGM case. In particular, two common approaches for filter adaptation will be revisited: the direct adaptation of the impulse responses in the time domain by a Normalized Least-Mean-Square (NLMS) algorithm and the adaptation of partitioned versions of the impulse responses in the frequency domain, which allows a smooth trade-off between computational efficiency and algorithmic delay (latency) of the digital signal processing system.

3.2.1 Time-domain adaptive filtering

A straightforward identification of the linear filters of an HGM is possible by iteratively minimizing a quadratic cost function derived from the error signal e(k). The most common algorithms are of the LMS, affine projection, or Recursive Least-Squares (RLS) type [53]. Due to its computational efficiency, the adaptation of an HGM by an NLMS algorithm, which aims at minimizing the Mean Squared Error (MSE) in a gradient-descent manner, will be considered. The following description is independent of the actual choice of the nonlinearity basis functions $f_b\{\cdot\}$, as long as they are time-invariant (Legendre polynomials will be employed for the evaluations in Section 5).

An adaptive HGM with such time-invariant nonlinearity basis functions $f_b\{\cdot\}$ first computes the branch signals

$$ x_{b}(k)=f_{b}\left\{x\,(k)\right\} $$

(7)

and afterwards the echo estimate

$$ \hat{y}(k)=\sum_{b=1}^{B}\sum_{l=0}^{L-1}x_{b}{(k-l)}\cdot \hat{h}_{b,l}({k-1}), $$

(8)

where $\hat {h}_{b,l}(k - 1)$ is the lth tap of a length-L estimate of the impulse response in branch b and has been obtained at time index k−1. Then, the error signal

$$ e (k) = y (k) - \hat{y} (k) $$

(9)

has to be computed, before performing for each branch b and filter tap l an update according to

$$ \hat{h}_{b,l}(k)=\hat{h}_{b,l}(k - 1)+\mu_{b}\cdot\frac{e\,(k)}{E_{x_{b}}(k)+\delta}x_{b}(k-l), $$

(10)

where $0<\mu_b<1$ are branch-specific adaptation stepsizes, $E_{x_{b}}(k)=\sum_{l=0}^{L-1}\left|x_{b}(k-l)\right|^{2}$ are the branch signal energies, and δ is a regularization constant for numerical stability². An efficient implementation (as assumed for the complexity analysis in Section 5.1) decomposes (10) by determining the branch powers

$$ P_{x_{b}}\,(k)=\left(x_{b}(k)\right)^{2} $$

(11)

to compute the branch energies

$$ E_{x_{b}}\,(k)=E_{x_{b}}({k-1})+ P_{x_{b}}(k)-P_{x_{b}} (k - L), $$

(12)

computes normalized errors

$$ e_{\mathrm{norm,}b}(k)=\mu_{b}e (k)\left/\left(E_{x_{b}}\,(k)+\delta\right)\right., $$

(13)

and finally updates the filter coefficients according to

$$ \hat{h}_{b,l} (k)=\hat{h}_{b,l}\,(k - 1)+e_{\text{norm},b}(k)\,x_{b}\,(k - l). $$

(14)

This allows the individual branches of HGMs to be identified adaptively and also covers the identification of single HMs and linear models as special cases.
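A compact sketch of the time-domain NLMS adaptation (8)–(14) for an HGM is given below, including the recursive energy update (11)–(12). It assumes real-valued, precomputed branch signals; the toy system, the stepsizes $\mu_b=0.25$, and δ are arbitrary choices for the demonstration, which identifies a noise-free two-branch HGM whose branches are the Legendre polynomials x and (3x²−1)/2.

```python
import numpy as np

def hgm_nlms(x_branches, y, L, mu, delta=1e-6):
    """Adapt an HGM with branch-wise NLMS, cf. (8)-(14).
    x_branches: (B, K) branch signals x_b(k); y: (K,) microphone signal;
    L: kernel length; mu: (B,) stepsizes. Returns kernel estimates and error signal."""
    B, K = x_branches.shape
    h_hat = np.zeros((B, L))
    buf = np.zeros((B, L))        # buf[b, l] = x_b(k - l)
    energy = np.zeros(B)          # E_{x_b}(k), updated recursively
    e = np.zeros(K)
    for k in range(K):
        p_out = buf[:, -1] ** 2                    # P_{x_b}(k - L): power leaving the window
        buf = np.roll(buf, 1, axis=1)
        buf[:, 0] = x_branches[:, k]
        energy = energy + buf[:, 0] ** 2 - p_out   # (11), (12)
        y_hat = np.sum(buf * h_hat)                # (8)
        e[k] = y[k] - y_hat                        # (9)
        e_norm = mu * e[k] / (energy + delta)      # (13)
        h_hat += e_norm[:, None] * buf             # (14)
    return h_hat, e

rng = np.random.default_rng(4)
K, L = 4000, 8
x = rng.uniform(-1.0, 1.0, K)
x_branches = np.stack([x, 1.5 * x**2 - 0.5])      # Legendre degrees 1 and 2
h_true = np.stack([rng.standard_normal(L), 0.3 * rng.standard_normal(L)])
y = sum(np.convolve(xb, hb)[:K] for xb, hb in zip(x_branches, h_true))
h_hat, e = hgm_nlms(x_branches, y, L, mu=np.array([0.25, 0.25]))
```

Because the recursion (12) only adds the newest and subtracts the oldest power, the normalization costs O(1) per branch and sample instead of O(L).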

3.2.2 Partitioned-block frequency-domain adaptive filtering

3.2.2.1 Partitioned-block convolution:

Even for large filter lengths L, a linear convolution can be realized efficiently and with a low input/output delay by block-based processing methods like Partitioned-Block Frequency-Domain Filtering (PBFDF) [56, 58–60], also known as multidelay convolution [61]. In the following, only a uniform partitioning with frame shift M and frame size N=2M will be considered.

In this case, the input signal x(k), the impulse response h(k), and the output signal y(k) are partitioned into length-N vectors

$$\begin{array}{*{20}l} \mathbf{x}(\kappa) & = \left[x(\kappa M - N + 1),\ldots,x(\kappa M)\right]^{\mathrm{T}} \end{array} $$

(15)

$$\begin{array}{*{20}l} \mathbf{h}^{(p)} & =\left[ h(pM),\ldots, h (p M + M - 1),0,\ldots,0\right]^{\mathrm{T}} \end{array} $$

(16)

$$\begin{array}{*{20}l} \mathbf{y}(\kappa) & =\left[\,0,\ldots,0,\, {y}(\kappa M- M + 1),\ldots, y \left(\kappa M\right)\right]^{\mathrm{T}}\,, \end{array} $$

(17)

respectively, where κ is the frame index for block processing and p is the index of the impulse response partition. After a DFT, represented by the DFT matrix F, the transformed versions of the signal vectors and impulse response partitions are referred to as

$$\begin{array}{*{20}l} \underline{\mathbf{x}}{(\kappa)} & =\mathbf{F}\mathbf{x}(\kappa) \end{array} $$

(18)

$$\begin{array}{*{20}l} \underline{\mathbf{h}}^{(p)} & =\mathbf{F}\mathbf{h}^{(p)}. \end{array} $$

(19)

As illustrated in Fig. 3 for P=2 partitions, performing fast DFT-domain convolution between each pair of $\mathbf{h}^{(p)}$ and $\mathbf{x}(\kappa-p)$ and summing up the respective partial results yields

$$\begin{array}{*{20}l} \mathbf{y}(\kappa) & = \mathbf{W}_{01}\overbrace{\mathbf{F}^{{\mathrm{H}}}\underbrace{\sum_{p=0}^{P-1}\underline{\mathbf{x}}~{(\kappa-p)}\odot \underline{\mathbf{h}}^{(p)}}_{\underline{\mathbf{y}}^{\circ}{(\kappa)}}}^{\mathbf{y}^{\circ}({\kappa})}, \end{array} $$

(20)

where $P=\left\lceil\frac{L}{M}\right\rceil$ is the number of non-zero impulse response partitions and where the windowing matrix $\mathbf{W}_{01}$ according to (1) suppresses the additionally computed samples in $\mathbf{y}^{\circ}(\kappa)$, which result from the previous frame and may contain cyclic convolution artifacts.

Note that such a PBFDF scheme is computationally efficient because the DFT of each input signal frame has to be computed only once and can then be reused for all P partitions, cf. (20).
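The uniformly partitioned overlap-save convolution (15)–(20) can be sketched as follows and verified against numpy.convolve. For simplicity, the loop index starts at κ = 0, so the frame processed in iteration κ ends at sample (κ+1)M − 1 rather than at κM as in (15); numpy.fft.ifft plays the role of $\mathbf{F}^{\mathrm{H}}$, and the DFTs of past input frames are kept in a short history buffer so that each frame is transformed only once. Filter length, M, and signals are arbitrary test values.

```python
import numpy as np

def pbfdf_convolve(x, h, M):
    """Uniformly partitioned overlap-save convolution, cf. (15)-(20); frame size N = 2M."""
    K, N = len(x), 2 * M
    P = -(-len(h) // M)                                   # P = ceil(L / M)
    h = np.concatenate([h, np.zeros(P * M - len(h))])
    # (16), (19): each length-M partition, zero-padded to N, transformed once
    H = np.stack([np.fft.fft(np.concatenate([h[p * M:(p + 1) * M], np.zeros(M)]))
                  for p in range(P)])
    n_frames = -(-K // M)
    x = np.concatenate([x, np.zeros(n_frames * M - K)])
    X_hist = np.zeros((P, N), dtype=complex)              # X_hist[p]: DFT of frame kappa - p
    frame = np.zeros(N)
    y = np.zeros(n_frames * M)
    for kappa in range(n_frames):
        frame = np.concatenate([frame[M:], x[kappa * M:(kappa + 1) * M]])   # (15)
        X_hist = np.roll(X_hist, 1, axis=0)
        X_hist[0] = np.fft.fft(frame)                     # (18): one DFT per input frame
        y_circ = np.fft.ifft(np.sum(X_hist * H, axis=0)).real               # y°(kappa)
        y[kappa * M:(kappa + 1) * M] = y_circ[M:]         # W_01: keep the artifact-free half
    return y[:K]

rng = np.random.default_rng(5)
x = rng.standard_normal(1000)
h = rng.standard_normal(37)
y = pbfdf_convolve(x, h, M=16)
```

The input/output delay is governed by the frame shift M rather than by the filter length L, which is the trade-off between efficiency and latency mentioned in Section 3.2.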

3.2.2.2 Partitioned-block frequency-domain adaptive filtering (PBFDAF):

In the following, partitioned-block convolution will be utilized during the adaptation of an HGM with DFT-domain filter estimates $\hat{\underline{\mathbf{h}}}_{b}^{(p)}(\kappa)$ for partition p in branch b at time frame κ. As for the time-domain description in Section 3.2.1, we assume that the nonlinearity basis functions $f_b\{\cdot\}$ are chosen in advance (e.g., as monomials or Legendre polynomials); the following derivation holds independently of this choice. Iteratively minimizing the MSE by adapting $\hat{\underline{\mathbf{h}}}_{b}^{(p)}(\kappa)$ in a gradient-descent manner with instantaneous estimates of the gradient can be realized by the NLMS algorithm. All computations required for modeling a Partitioned-Block HGM (PB-HGM) and adapting it via a Frequency-domain Normalized Least-Mean-Square (FNLMS) algorithm can be summarized as follows: first, the set of new samples of each branch signal $x_b(k)$ has to be computed and appended to the already available samples of the branch signal for a given frame (for overlap-save processing), yielding the branch signal vectors

$$ \mathbf{x}_{b}(\kappa)= \left[\begin{array}{c} [\mathbf{0}_{M},\mathbf{I}_{M}]\,\mathbf{x}_{b}(\kappa-1)\\ \left[f_{b}\left\{x(\kappa M-M+1)\right\},\dots,f_{b}\left\{x(\kappa M)\right\} \right]^{\mathrm{T}} \end{array}\right], $$

(21)

which are then transformed into the DFT domain according to

$$ \mathbf{\underline{x}}_{b}({\kappa})=\mathbf{F}\mathbf{x}_{b}\,({\kappa}). $$

(22)

This allows the computation of the DFT-domain error signal vector

$$ \mathbf{\underline{e}}({\kappa})=\mathbf{F}\left(\mathbf{y}({\kappa})-\underbrace{ \mathbf{W}_{01}\mathbf{F}^{{\mathrm{H}}}\hat{\underline{\mathbf{y}}}^\circ{(\kappa)}}_{\hat{\mathbf{y}}({\kappa})}\right), $$

(23)

where, analogously to (20),

$$ \hat{\underline{\mathbf{y}}}^\circ{(\kappa)}=\sum_{b=1}^{B}\sum_{p=0}^{P-1}\underline{\mathbf{x}}_b {{(\kappa-p)}}\odot \hat{\underline{\mathbf{h}}}_{b}^{(p)}{(\kappa-1)} $$

(24)

is the intermediate DFT-domain microphone signal estimate with cyclic convolution artifacts of the partitioned-block convolution and $\hat {\mathbf {y}}({\kappa })$ is the final time-domain estimate containing M zeros and the M most recent samples of $\hat {y} (k)$. Similar to the time-domain adaptation, branch signal Power Spectral Density (PSD) vectors

$$ \underline{\mathbf{s}}_{x_{b}}{(\kappa)}=\gamma_{\textsc{E}}\underline{\mathbf{s}}_{x_{b}}{(\kappa-1)}+ \left(1-\gamma_{\textsc{E}}\right)\left|\mathbf{\underline{x}}_{b}({\kappa})\right|^{2} $$

(25)

are calculated in the DFT domain by recursively smoothing the element-wise squared magnitudes $\left|\underline{\mathbf{x}}_{b}(\kappa)\right|^{2}$ with $0\le\gamma_{\textsc{E}}<1$. In the following, $\gamma_{\textsc{E}}=0.9$ will be employed by default. Based on $\underline{\mathbf{s}}_{x_{b}}(\kappa)$, normalized branch signals are computed according to

$$ \underline{\mathbf{x}}_{\text{norm,}b}{(\kappa)}=\left(\underline{\mathbf{x}}_b {(\kappa)}\right)^{*}\odot\left(\mu_{b}\oslash\left(\underline{\mathbf{s}}_{x_{b}}({\kappa})+\delta\right)\right), $$

(26)

before performing the actual filter update according to

$$ \tilde{\underline{\mathbf{h}}}_{b}^{(p)}{(\kappa)}=\hat{\underline{\mathbf{h}}}_{b}^{(p)}{(\kappa-1)} +\underline{\mathbf{x}}_{\text{norm,} b}{(\kappa-p)}\odot \underline{\mathbf{e}}{(\kappa)}. $$

(27)

Additionally, the temporal support of the filters can be explicitly limited to the partition length M via

$$ \hat{\underline{\mathbf{h}}}_{b}^{(p)}(\kappa)=\mathbf{F}\underbrace{\mathbf{W}_{10}\mathbf{F}^{\mathrm{H}} \left(\tilde{\underline{\mathbf{h}}}_{b}^{(p)}(\kappa)\right)}_{\hat{\mathbf{h}}_{b}^{(p)}(\kappa)}. $$

(28)

This is equivalent to the classical formulation of the so-called constrained update, where the constraint (zeros in time domain) is imposed on the update (see [61]). Yet, the computation of ${\hat {\mathbf {h}}_{b}^{(p)}({\kappa })}$ as a byproduct of the filter constraint in (28) will be beneficial for the SA filtering later on.

As for time-domain adaptive filtering, the algorithm for adapting an HGM in the partitioned-block frequency domain covers the adaptation of an HM (an HGM with B=1 and $f_1\{x(k)\}=f\{x(k)\}$) and of a linear model (an HGM with B=1 and $x_1(k)=f_1\{x(k)\}=f\{x(k)\}=x(k)$). Furthermore, the non-partitioned FNLMS algorithm, derived in many textbooks, e.g., [53, 62], results from the aforementioned partitioned-block description for P=1 with a sufficiently large N=2L. Adapting a model with this block partitioning will be referred to as the Partitioned-Block Frequency-domain Normalized Least Mean Squares (PB-FNLMS) algorithm.
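The following sketch strings (21)–(28) together into a PB-FNLMS adaptation of an HGM with B branches and P partitions, again with 0-based frame indexing and numpy.fft.ifft in place of $\mathbf{F}^{\mathrm{H}}$. The zero initialization of the PSD estimates, the stepsizes $\mu_b=0.1$, δ, and the noise-free toy system are assumptions for this demonstration, not settings from the article; $\gamma_{\textsc{E}}=0.9$ follows the default stated above. Since only the second frame half of the error is formed, prepending M zeros to it corresponds to the windowing by $\mathbf{W}_{01}$ in (23).

```python
import numpy as np

def pb_fnlms_hgm(x_branches, y, M, P, mu, gamma=0.9, delta=1e-6):
    """PB-FNLMS adaptation of an HGM, cf. (21)-(28).
    x_branches: (B, K) branch signals; y: (K,) microphone signal; mu: (B,) stepsizes.
    Returns DFT-domain partitions H[b, p] and the time-domain error signal."""
    B, K = x_branches.shape
    N, n_frames = 2 * M, K // M
    H = np.zeros((B, P, N), dtype=complex)    # filter partitions h_b^(p)
    X = np.zeros((B, P, N), dtype=complex)    # X[b, p]: DFT of x_b(kappa - p)
    Xn = np.zeros((B, P, N), dtype=complex)   # Xn[b, p]: normalized x_norm,b(kappa - p)
    S = np.zeros((B, N))                      # branch PSD estimates s_xb
    frames = np.zeros((B, N))
    e = np.zeros(n_frames * M)
    for kappa in range(n_frames):
        new = slice(kappa * M, (kappa + 1) * M)
        frames = np.concatenate([frames[:, M:], x_branches[:, new]], axis=1)  # (21)
        X = np.roll(X, 1, axis=1)
        X[:, 0] = np.fft.fft(frames, axis=1)                                   # (22)
        y_circ = np.fft.ifft(np.sum(X * H, axis=(0, 1))).real                 # (24)
        e[new] = y[new] - y_circ[M:]                                           # W_01, (23)
        E = np.fft.fft(np.concatenate([np.zeros(M), e[new]]))                  # (23)
        S = gamma * S + (1 - gamma) * np.abs(X[:, 0]) ** 2                     # (25)
        Xn = np.roll(Xn, 1, axis=1)
        Xn[:, 0] = np.conj(X[:, 0]) * (mu[:, None] / (S + delta))              # (26)
        h_tilde = np.fft.ifft(H + Xn * E, axis=2).real                         # (27)
        h_tilde[:, :, M:] = 0.0                                                # W_10, (28)
        H = np.fft.fft(h_tilde, axis=2)
    return H, e

rng = np.random.default_rng(6)
K, M, P = 8000, 8, 2
x = rng.uniform(-1.0, 1.0, K)
x_branches = np.stack([x, 1.5 * x**2 - 0.5])
h_true = np.stack([rng.standard_normal(P * M), 0.3 * rng.standard_normal(P * M)])
y = sum(np.convolve(xb, hb)[:K] for xb, hb in zip(x_branches, h_true))
H, e = pb_fnlms_hgm(x_branches, y, M, P, mu=np.array([0.1, 0.1]))
h_est = np.fft.ifft(H, axis=2).real[:, :, :M].reshape(2, P * M)   # time-domain kernels
```

The time-domain partitions h_tilde computed for the constraint are the byproduct mentioned after (28); the SA filters of Section 4 reuse such time-domain kernels.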

4 Significance-aware filtering

Significance-aware (SA) filtering is a generalized system identification concept which exploits prior knowledge about the physical system to be identified by decomposing the adaptive model in a divide-and-conquer manner into beneficially interacting subsystems.

In particular, the originally proposed SA-HGM [20] performs a temporal decomposition of the acoustic echo path, as depicted in Fig. 4 for a linear model, into a region describing the direct-path wavefront and a complementary region.

This decomposition is then employed, as depicted in Fig. 5, to model the short direct-path region (carrying a significant amount of energy) by an HGM and to model the large complementary region by a computationally efficient HM. Apart from the decomposition into subsystems, the interaction between the subsystems is a key feature of SA filtering as well. In particular, the HM subsystem’s nonlinear preprocessor has the form

$$ x_{\text{pp}} (k)=f_{\text{pp}}\left\{ x (k)\right\} =\sum_{b=1}^{B}\hat{w}_{b}\,(\kappa)\,x_{b} (k) $$

(29)

and combines the branch signals $x_b(k)$ with frame-wise updated weights $\hat{w}_{b}(\kappa)$. These weights are estimated from the kernels of the HGM subsystem (as will be explained in Section 4.1.3). Thereby, a nonlinear model estimated from the direct path is extrapolated to the entire acoustic echo path.
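A time-domain sketch of the decomposition in Fig. 5 and of the preprocessor (29) may help to see why this extrapolation can work. Assuming, hypothetically, that the true echo path is an HM with weights $w_b$ and impulse response h(k), an SA model whose HGM kernels equal $w_b h(k)$ inside an assumed direct-path region (here the first 16 taps) and whose HM filter equals h(k) outside that region reproduces the true echo exactly. Region length, signals, and weights are illustrative values, and the sketch ignores adaptation and block processing.

```python
import numpy as np

def sa_echo_estimate(x_branches, h_direct, w_hat, h_comp):
    """Time-domain SA echo estimate, cf. Fig. 5 and (29).
    x_branches: (B, K) branch signals; h_direct: (B, L) HGM kernels (direct-path region only);
    w_hat: (B,) preprocessor weights; h_comp: (L,) HM filter for the complementary region."""
    K = x_branches.shape[1]
    x_pp = w_hat @ x_branches                                         # (29)
    y_hgm = sum(np.convolve(xb, hb)[:K] for xb, hb in zip(x_branches, h_direct))
    y_hm = np.convolve(x_pp, h_comp)[:K]
    return y_hgm + y_hm

rng = np.random.default_rng(7)
K, L, L_d = 500, 64, 16                     # L_d: assumed direct-path region length
x = rng.uniform(-1.0, 1.0, K)
x_branches = np.stack([x, x**3])            # illustrative branch signals
w = np.array([1.0, -0.2])
h = rng.standard_normal(L) * np.exp(-np.arange(L) / 20.0)   # toy decaying impulse response
direct = np.arange(L) < L_d
h_direct = np.outer(w, h * direct)          # HGM kernels w_b h(k) in the direct-path region
h_comp = h * ~direct                        # HM filter outside the direct-path region
y_sa = sa_echo_estimate(x_branches, h_direct, w, h_comp)
y_true = np.convolve(w @ x_branches, h)[:K]   # echo of the assumed true HM
```

Only the short direct-path region requires B kernels, whereas the long complementary region is handled by a single filter, which is the source of the computational savings.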

These two features, the decomposition and the preprocessor coefficient estimation via an HGM, are key components of both the recently proposed PBSA-HGM [22], revisited in Section 4.1, and the novel alternative SA filtering structure, which will be denoted by ESA filtering and will be introduced in Section 4.2.

4.1 Significance-aware Hammerstein group models

SA-HGMs are efficient because the potentially computationally expensive HGM only has to model a small temporal support of the acoustic echo path. To realize this concept while exploiting the benefit of fast frequency-domain convolution, a partitioned-block variant of the SA-HGMs, denoted as PBSA-HGM, has recently been introduced in [22]³. This PBSA-HGM will be briefly reviewed in the following. Employing the uniform partitioning according to Section 3.2.2 leads to an adaptive HM with DFT-domain partition estimates $\hat{\underline{\mathbf{h}}}^{(p)}(\kappa)$ for partitions $p=0,\dots,P-1$ and an adaptive HGM with DFT-domain estimates $\hat{\underline{\mathbf{h}}}_{b}^{(p_{\mathrm{d}})}(\kappa)$ for branches $b=1,\dots,B$ but just a single partition with index $p_{\mathrm{d}}$. This partition should capture the direct path and thus a significant portion of the energy transmitted from the loudspeaker to the microphone. Note that $\hat{\underline{\mathbf{h}}}^{(p_{\mathrm{d}})}(\kappa)$ models $h_{\mathrm{d}}(k)$ in Fig. 5 and $\hat{\underline{\mathbf{h}}}^{(p)}(\kappa)\;\forall p\neq p_{\mathrm{d}}$ jointly model $h_{\mathrm{c}}(k)$ in Fig. 5. In the following, the estimation of $\hat{\underline{\mathbf{h}}}^{(p)}(\kappa)$ (covering both $h_{\mathrm{d}}(k)$ and $h_{\mathrm{c}}(k)$ of Fig. 5) and of $\hat{\underline{\mathbf{h}}}_{b}^{(p_{\mathrm{d}})}(\kappa)$ will be described in Sections 4.1.1 and 4.1.2, respectively, and the estimation of the HM’s preprocessor coefficients $\hat{w}_{b}(\kappa)$ based on $\hat{\underline{\mathbf{h}}}_{b}^{(p_{\mathrm{d}})}(\kappa)$ will be explained in Section 4.1.3.

4.1.1 Estimation of the RIR of the HM submodel

In the following, the identification of the HM subsystem’s impulse response partitions will be described. For this purpose, the preprocessed loudspeaker signal can be expressed in vector notation as

$$ \mathbf{x}_{\text{pp}}({\kappa})= \left[\begin{array}{c} [\mathbf{0}_{M},\mathbf{I}_{M}]{\mathbf{x}}_{\text{pp}}{(\kappa-1)}\\ \sum_{b=1}^{B}\hat{w}_{b}{(\kappa)}[\mathbf{0}_{M},\mathbf{I}_{M}]{\mathbf{x}}_{b}{(\kappa)} \end{array}\right], $$

(30)

where $\mathbf{x}_b(\kappa)$ are the branch signal vectors of the HGM submodel. The HM submodel with DFT-domain partition estimates $\hat {\underline {\mathbf {h}}}^{(p)}{(\kappa)}$ yields, analogously to (23), an echo signal estimate vector

$$ \hat{\mathbf{y}}_{\text{HM}}(\kappa)= \mathbf{W}_{01}\mathbf{F}^{{\mathrm{H}}}\sum_{p=0}^{P-1} \underline{\mathbf{x}}_{\text{pp}}{(\kappa-p)}\odot\hat{\underline{\mathbf{h}}}^{(p)}{(\kappa-1)}, $$

(31)

where $\underline{\mathbf{x}}_{\text{pp}}(\kappa)=\mathbf{F}\mathbf{x}_{\text{pp}}(\kappa)$ with $\mathbf{x}_{\text{pp}}(\kappa)$ from (30).

The HM submodel’s DFT-domain error vector can then be computed via

$$ \underline{\mathbf{e}}_{\text{HM}}{(\kappa)} =\mathbf{F}\left(\mathbf{y}(\kappa)-\hat{\mathbf{y}}_{\text{HM}}(\kappa)\right). $$

(32)

Iteratively minimizing the two-norm of $\underline {\mathbf {e}}_{\text {HM}}{(\kappa)}$ in a gradient-descent manner by an FNLMS algorithm leads to the update rule

$$ \begin{aligned} \hat{\underline{\mathbf{h}}}^{(p)}{(\kappa)} &=\mathbf{FW}_{10}\mathbf{F}^{{\mathrm{H}}}\left(\vphantom{\hat{\underline{\mathbf{h}}}^{(p)}{(\kappa-1)} \underline{\mathbf{h}}}\hat{\underline{\mathbf{h}}}^{(p)}{(\kappa-1)} \right.\\ & \left. + \underline{\mathbf{x}}_{\mathrm{norm,}\text{pp}}{(\kappa-p)}\odot \underline{\mathbf{e}}_{\text{HM}}{(\kappa)} \vphantom{\hat{\underline{\mathbf{h}}}^{(p)}{(\kappa-1)}}\right), \end{aligned} $$

(33)

where $\underline {\mathbf {x}}_{\mathrm {norm,}\text {pp}}{(\kappa -p)}$ is the normalized DFT-domain signal vector computed analogously to (26) from the preprocessed input $\underline {\mathbf {x}}_{\text {pp}}{(\kappa)}$.
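Equation (30) implies that samples already contained in the preprocessed signal vector are not recomputed when the weights change: only the M new samples use the current weights $\hat{w}_b(\kappa)$. The helper below (a hypothetical name) sketches one such frame update and checks this behavior for two consecutive frames with different, arbitrarily chosen weights.

```python
import numpy as np

def update_pp_frame(x_pp_prev, branch_frames, w_hat):
    """One step of (30). x_pp_prev: (2M,) previous vector x_pp(kappa - 1);
    branch_frames: (B, 2M) current branch signal vectors x_b(kappa); w_hat: (B,) weights."""
    M = x_pp_prev.shape[0] // 2
    old_half = x_pp_prev[M:]                   # [0_M, I_M] x_pp(kappa - 1)
    new_half = w_hat @ branch_frames[:, M:]    # sum_b w_b(kappa) [0_M, I_M] x_b(kappa)
    return np.concatenate([old_half, new_half])

rng = np.random.default_rng(8)
M = 4
stream = rng.standard_normal((2, 3 * M))       # two branch signals, three blocks of M samples
w1, w2 = np.array([1.0, 0.5]), np.array([1.0, -0.3])
x_pp1 = update_pp_frame(np.zeros(2 * M), stream[:, 0:2 * M], w1)
x_pp2 = update_pp_frame(x_pp1, stream[:, M:3 * M], w2)
```

In x_pp2, the first half was formed with w1 and the second half with w2, so the vector differs from simply applying w2 to both halves of the branch signal vectors.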

4.1.2 Estimating the HGM submodel

The HGM submodel alone provides an echo estimate

$$ \hat{\mathbf{y}}_{\text{HGM}}{(\kappa)}= \mathbf{W}_{01}\mathbf{F}^{{\mathrm{H}}}\left(\sum_{b=1}^{B}\underline{\mathbf{x}}_b {(\kappa-{p_{\mathrm{d}}})}\odot \hat{\underline{\mathbf{h}}}_{b}^{(p_{\mathrm{d}})}{(\kappa-1)}\right), $$

(34)

which is combined with the complementary-region partitions’ echo estimate of the HM submodel according to

$$ \begin{aligned} \hat{\mathbf{y}}_{\text{SA}}{(\kappa)} & = \mathbf{W}_{01}\mathbf{F}^{{\mathrm{H}}} \left(\sum_{b=1}^{{B}}\underline{\mathbf{x}}_{b}{(\kappa-{p_{\mathrm{d}}})}\odot\hat{\underline{\mathbf{h}}}_{b}^{(p_{\mathrm{d}})}{(\kappa-1)} \right. \\ & \left.+\sum_{p\in\{0,\dots,P-1\}\setminus\{p_{\mathrm{d}}\}}\underline{\mathbf{x}}_{\text{pp}}{(\kappa-p)} \odot\hat{\underline{\mathbf{h}}}^{(p)}{(\kappa-1)} \vphantom{\sum_{b=1}^{B}}\right). \end{aligned} $$

(35)

Minimizing the two-norm of the error signal vector

$$ \underline{\mathbf{e}}_{\text{SA}}{(\kappa)} =\mathbf{F}\left(\mathbf{y}(\kappa)-\hat{\mathbf{y}}_{\text{SA}}(\kappa)\right) $$

(36)

by an FNLMS algorithm leads to the update rule

$$ \begin{aligned} \hat{\underline{\mathbf{h}}}_{b}^{(p_{\mathrm{d}})}{(\kappa)} & =\mathbf{FW}_{10}\mathbf{F}^{{\mathrm{H}}}\left(\vphantom{\hat{ \underline{\mathbf{h}}}_{b}^{(p_d)}{(\kappa-1)}} \hat{\underline{\mathbf{h}}}_{b}^{(p_{\mathrm{d}})}{(\kappa-1)}\right. \\ & \left. \quad +\underline{\mathbf{x}}_{\text{norm,}b}{(\kappa-p_{\mathrm{d}})} \vphantom{\hat{\underline{\mathbf{h}}}_{b}^{(p_d)} {(\kappa-1)}}\odot \underline{\mathbf{e}}_{\text{SA}}{(\kappa)} \right), \end{aligned} $$

(37)

where $\underline {\mathbf {x}}_{\text {norm},b}(\kappa)$ is the normalized DFT-domain signal vector computed analogously to (26). The reason for applying the window $\mathbf{W}_{10}$ in (37) to the actual impulse response partitions will become clear in the following section.

4.1.3 Estimating the preprocessor of the HM

Finally, the HM’s preprocessor coefficients can be recomputed from the HGM. To avoid the overhead of complex-valued arithmetic, the time-domain kernels $\hat {\mathbf {h}}_{b}^{(p_{\mathrm {d}})}({\kappa })$ (obtained as a byproduct of (37)) can be used directly to compute instantaneous least-squares estimates of the preprocessor weights [20] according to

$$ \tilde{w}_{b}({\kappa})=C_{b,1}(\kappa)\left/C_{1,1}(\kappa)\right. $$

(38)

with

$$ C_{b,1}(\kappa)=\left\langle \hat{\mathbf{h}}_{b}^{(p_{\mathrm{d}})}({\kappa}),\hat{\mathbf{h}}_{1}^{(p_{\mathrm{d}})}({\kappa})\right\rangle. $$

(39)

A subsequent temporal smoothing of these estimates leads to the final preprocessor coefficients

$$ \hat{w}_{b}({\kappa+1})=\gamma_{\mathrm{w}}\hat{w}_{b}({\kappa})+(1-\gamma_{\mathrm{w}})\tilde{w}_{b}({\kappa}), $$

(40)

where $0\le\gamma_{\mathrm{w}}<1$. In the following, $\gamma_{\mathrm{w}}=0.95$ is used by default. Note that $\hat {w}_{1}{(\kappa)}=1\,\forall \kappa $, so it does not need to be computed at all. As in [20, 22], the first branch ($b=1$) is assumed to be linear ($f_1\{x(t)\}=x(t)$), such that a preprocessor with $\hat {w}_{b}({\kappa })=0\,\,\forall \,b>1$ results in an entirely linear echo path model.
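If the echo path has Hammerstein structure, all branch kernels are scaled copies of one impulse response, $\hat{\mathbf{h}}_b \approx w_b \hat{\mathbf{h}}_1$, and (38) to (39) is then exactly the least-squares solution of $\min_{w}\|\hat{\mathbf{h}}_b - w\,\hat{\mathbf{h}}_1\|^2$. A minimal sketch of (38) to (40) follows; the function names are hypothetical, and the same computation applies to the short time-domain kernels in (45) to (46).

```python
import numpy as np


def ls_preprocessor_weights(h_kernels):
    """Instantaneous LS weights (38)-(39): w_b = <h_b, h_1> / <h_1, h_1>.

    h_kernels: real array of shape (B, L_d); row b-1 is the time-domain kernel of branch b.
    Branch 1 is the linear branch, so the first returned weight is 1.
    """
    h1 = h_kernels[0]
    return h_kernels @ h1 / np.dot(h1, h1)


def smooth_weights(w_prev, w_tilde, gamma_w=0.95):
    """Recursive smoothing (40), 0 <= gamma_w < 1."""
    return gamma_w * w_prev + (1.0 - gamma_w) * w_tilde


# Usage: kernels that are exact scaled copies of one impulse response (Hammerstein case).
rng = np.random.default_rng(0)
h1 = rng.standard_normal(256)
kernels = np.outer([1.0, 0.3, -0.1], h1)
w_tilde = ls_preprocessor_weights(kernels)          # approx. [1.0, 0.3, -0.1]
w_hat = smooth_weights(np.array([1.0, 0.0, 0.0]), w_tilde)
```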

By the method described in Section 4.1 and its subsections, the estimation of the nonlinear system has been split into two beneficially interacting subproblems (HM and HGM adaptation). The beneficial interaction is achieved by the preprocessor coefficient refinement based on the HGM and by the extension of the temporal support of the HGM employing partitions of the HM. An in-depth evaluation of the computational complexity of such a PBSA-HGM will be given in Section 5.1.

4.2 Equalization-based significance-aware Hammerstein models

Previous applications of the SA concept [20–22] split the echo path model along the time axis (using knowledge about a dominant direct-path component) to allow for an efficient estimation of the nonlinear parameters (see also Section 4.1). This corresponds to a parallel decomposition of the acoustic echo path (see $h_{\text{c}}(k)$ and $h_{\text{d}}(k)$ of the HM submodel block in Fig. 5, where $f_{\text{pp}}\{\cdot\}$ could equally be moved into the parallel branches of $h_{\text{c}}(k)$ and $h_{\text{d}}(k)$).

In this section, we propose a novel realization of the SA filtering concept. This realization employs a serial decomposition (a cascade) of the echo path into a nonlinear loudspeaker and a subsequent linear RIR in order to estimate the nonlinear parameters of an HM (which is used as the actual echo path model). The resulting ESA-HM structure is depicted in Fig. 6. Therein, the topmost branch contains the actual echo path, which is assumed to have Hammerstein structure. Below it, three colored blocks form the novel structure denoted as ESA-HM: block α computes the actual echo estimate for AEC by an HM, while blocks β and γ facilitate the estimation of the nonlinear preprocessor $f_{\text{pp}}\{\cdot\}$ of the HM in block α. In the following, the ESA-HM will be explained bottom-up, starting with block γ and ending with block α. To avoid redundancy, the employed adaptive filtering algorithms, all of which have been described in previous sections, are not restated in full for the particular filters estimated in this section but are referenced instead.

4.2.1 Equalization of the RIR (block γ)

In block γ (Fig. 6), the ESA-HM directly exploits the cascaded nature of the nonlinear echo path (memoryless nonlinearity f{·}, followed by an impulse response h(k)): the IR h(k) is equalized by an adaptive linear filter with coefficients $\hat {h}_{\text {eq},l} (k)$ for tap l at time k. This yields a discrete-time estimate

$$ \begin{aligned} \hat{x}_{\text{NL}}{(k-L)}= \sum_{l=0}^{L-1} \hat{h}_{\text{eq},l}{(k-1)} \cdot y {(k-l)} \end{aligned} $$

(41)

of the delayed nonlinearly distorted loudspeaker signal $x_{\text{NL}}(k-L)$. The linear equalizer $\hat {h}_{\text {eq},l} (k)$ is identified by a PB-FNLMS algorithm (see Section 3.2.2) using the error signal

$$ e_{\text{eq}}(k) = x (k - L) - \hat{x}_{\text{NL}}(k - L)\,. $$

(42)

On the one hand, (41) and (42) suggest that $\hat {x}_{\text {NL}}(k -L)$ is an estimate of $x(k-L)$, obtained via linear filtering of $y(k)$. On the other hand, the linear equalizer $\hat {h}_{\text {eq},l} (k)$ cannot equalize the nonlinear components of the LEMS, such that the nonlinear distortion remains in the equalizer output and $\hat {x}_{\text {NL}}(k - L)$ according to (41) can be seen as an estimate of the nonlinearly distorted signal $x_{\text{NL}}(k-L)$. Thus, while the inability of impulse responses to model nonlinear systems hampers the performance of the AEC system in the first place, this inability is exploited here to estimate an otherwise inaccessible intermediate signal that guides the estimation of the nonlinear components (described in the next paragraphs). Furthermore, the equalization is only as complex as a single linear AEC system (assuming identical filter lengths).
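The article adapts the equalizer with a PB-FNLMS. For readability, the following sketch swaps in a plain time-domain NLMS (a substitution, not the article's realization) to show the signal flow of (41) to (42). The echo path `[0, 0, 1, 0.3]`, the equalizer length `L = 16`, and `mu` are hypothetical choices for a purely linear sanity check, in which the equalizer output should approach $x(k-L)$.

```python
import numpy as np


def nlms_equalizer(y, x, L, mu=0.2, delta=1e-6):
    """Time-domain NLMS stand-in for the equalizer of (41)-(42).

    Adapts L taps h_eq so that sum_l h_eq[l] * y(k - l) approximates x(k - L).
    Returns the equalizer output x_nl_hat (x_nl_hat[k] ~ x(k - L)) and the final taps.
    """
    h_eq = np.zeros(L)
    y_buf = np.zeros(L)                          # y_buf[l] = y(k - l)
    x_nl_hat = np.zeros(len(y))
    for k in range(len(y)):
        y_buf = np.roll(y_buf, 1)
        y_buf[0] = y[k]
        x_nl_hat[k] = h_eq @ y_buf               # (41) with the taps from time k - 1
        target = x[k - L] if k >= L else 0.0
        e_eq = target - x_nl_hat[k]              # (42)
        h_eq += mu * e_eq * y_buf / (y_buf @ y_buf + delta)
    return x_nl_hat, h_eq


# Usage: linear sanity check with a hypothetical path (2-sample delay, short tail).
rng = np.random.default_rng(0)
x = rng.standard_normal(20000)
y = np.convolve(x, [0.0, 0.0, 1.0, 0.3])[: len(x)]
L = 16
x_nl_hat, h_eq = nlms_equalizer(y, x, L)
err = x_nl_hat[-5000:] - x[len(x) - 5000 - L : len(x) - L]
```

If `y` is generated from a nonlinearly distorted loudspeaker signal instead, the linear taps cannot undo the distortion, so `x_nl_hat` retains it; this is the property that block β exploits.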

4.2.2 Estimating the nonlinearity (block β)

Note that both the RIR and the frequency response of the loudspeaker (the linear component of the loudspeaker) have been equalized by the inverse filtering in block γ (Fig. 6), such that $\hat {x}_{\text {NL}}(k - L)$ is time-aligned with x(k−L). Thus, the nonlinear mapping between the discrete-time loudspeaker signal x(k−L) and the estimate of the nonlinear loudspeaker signal $\hat {x}_{\text {NL}} (k - L)$ can be modeled by an adaptive time-domain HGM with branch kernels $\hat {h}_{b} (k)$ with a very small temporal support of $L_{\text{SA}}$ taps (e.g., $L_{\text{SA}}=3$). This corresponds to the input/output relation

$$ \hat{\hat{x}}_{\text{NL}}{(k-L)} = \sum_{b=1}^{B} \sum_{l=0}^{L_{\text{SA}}-1} f_{b}\{x (k - L - l)\} \cdot \hat{h}_{b,l} (k - 1). $$

(43)

The adaptation of $\hat {h}_{b,l} (k)$ is performed by a time-domain NLMS algorithm (see Section 3.2.1) operating on the error signal

$$ e_{\text{eq},\text{HGM}}(k) = \hat{x}_{\text{NL}}(k - L) - \hat{\hat{x}}_{\text{NL}}{(k-L)}. $$

(44)

Due to the small number $L_{\text{SA}}$ of modeled taps, such a time-domain algorithm is computationally inexpensive.
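A minimal sketch of the block-β adaptation (43) to (44) with a time-domain NLMS follows. It takes generic branch signals $f_b\{x(k)\}$ as input; the delay $L$ is omitted (the target is assumed already aligned), and the joint normalization over all $B L_{\text{SA}}$ regressors is an assumption, since Section 3.2.1 is not part of this excerpt. The usage applies the branches $x$ and $x^3$ (a hypothetical choice for brevity) to a synthetic memoryless target and then evaluates the weights of (45) to (46).

```python
import numpy as np


def nlms_short_hgm(branches, target, L_sa=3, mu=0.5, delta=1e-8):
    """Time-domain NLMS for a short HGM, sketching (43)-(44).

    branches: (B, n) array of branch signals f_b{x(k)}; target: (n,) desired signal,
    i.e., the equalizer output in the ESA-HM. Returns kernels h of shape (B, L_sa).
    """
    B, n = branches.shape
    h = np.zeros((B, L_sa))
    for k in range(L_sa - 1, n):
        u = branches[:, k - L_sa + 1 : k + 1][:, ::-1]   # u[b, l] = f_b{x(k - l)}
        e = target[k] - np.sum(h * u)                    # error as in (44)
        h += mu * e * u / (np.sum(u * u) + delta)        # joint NLMS normalization (assumed)
    return h


# Usage: synthetic memoryless target 0.8 x + 0.2 x^3 with branches f_1 = x, f_2 = x^3.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 20000)
h = nlms_short_hgm(np.vstack([x, x ** 3]), 0.8 * x + 0.2 * x ** 3)
# Preprocessor weights as in (45)-(46): C_{b,1} / C_{1,1}
w = h @ h[0] / (h[0] @ h[0])
```

Because the target is memoryless, only the first tap of each branch is nonzero after convergence, and the weight of the cubic branch relative to the linear one is $0.2/0.8=0.25$.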

4.2.3 Estimating the preprocessor of the HM (between blocks β and α)

Analogously to the previously proposed SA filtering concept, the parameters of the adaptive HGM in block β in Fig. 6 (branch kernels $\hat {h}_{b,l}{(k)}$) can be employed to determine the coefficients of the nonlinear preprocessor of the HM in block α in Fig. 6. Using a preprocessor with the structure of (29), this is done analogously to Eqs. (38) and (39) on a frame-wise basis (every M samples) according to

$$ \tilde{w}_{b}({\kappa})=C_{b,1}(\kappa)\left/C_{1,1}(\kappa)\right. $$

(45)

with

$$ C_{b,1} (\kappa)= \sum_{l=0}^{L_{\text{SA}}-1} \hat{h}_{b,l}{(\kappa\cdot M)} \hat{h}_{1,l}{(\kappa\cdot M)}, $$

(46)

with the filter taps $\hat {h}_{b,l}{(k)}$ from the end of time frame κ, where k=κ·M. As in (40), a subsequent temporal smoothing of these estimates leads to the final preprocessor coefficients

$$ \hat{w}_{b}({\kappa+1})=\gamma_{\mathrm{w}}\hat{w}_{b}({\kappa})+(1-\gamma_{\mathrm{w}})\tilde{w}_{b}\,({\kappa}), $$

(47)

where $0\le\gamma_{\mathrm{w}}<1$.

4.2.4 Computation of the echo estimate

The HM’s linear subsystem computes the echo estimate in block α in Fig. 6, similarly to (3), by convolving $x_{\text{pp}}(k)$ with an impulse response estimate $\hat {h}_{\text {HM},l} (k)$ according to

$$ \hat{y} (k) = \sum_{l=0}^{L-1} x_{\text{pp}}{(k-l)} \hat{h}_{\text{HM},l}\,(k - 1). $$

(48)

For computational efficiency, $\hat {h}_{\text {HM},l} (k)$ is estimated adaptively by a PB-FNLMS algorithm (see Section 3.2.2) operating on the error signal

$$ e_{\text{ESA}}(k) = y (k) - \hat{y} (k). $$

(49)

The overall computational complexity of this algorithm is slightly higher than that of two adaptive linear filters (originating from blocks α and γ). A detailed analysis of the computational complexity of this novel algorithm and of the previously proposed algorithms is given in Section 5.1.

4.3 Structural comparison of SA and ESA filtering

In the following, the similarities and differences between the previously proposed SA filtering and the novel ESA filtering will be summarized. Note that in this section, the notation of time-domain filters will be employed, despite their potential realization as block-partitioned adaptive filters in the frequency domain.

Similarities Both the SA-HGM and ESA-HM decompose the acoustic echo path into subsystems to facilitate the efficient estimation of an HGM to estimate the nonlinear components of the echo path. Furthermore, both the SA-HGM and ESA-HM estimate the preprocessor coefficients of an HM from identified HGM kernels as a least squares estimate and use this preprocessor in an HM with a long subsequent impulse response. Therefore, both structures are considered as different variants of SA filtering.

Differences Despite their similarities, the previously proposed SA filtering and the novel ESA filtering employ fundamentally different mechanisms for estimating the nonlinearity by an HGM with small temporal support. As depicted in Fig. 7 a, classical SA filtering employs a parallel decomposition of the echo path into a direct-path component and a complementary-path component. Canceling the complementary-path component of the echo signal leaves the direct-path component of the echo path, which the SA-HGM identifies efficiently by an HGM (recall Section 4.1). In contrast, ESA filtering does not rely on any temporal decomposition of the echo path, so it may also be advantageous for non-acoustic system identification, where a dominant direct-path peak does not exist. Instead, the cascade structure of the Hammerstein system (a serial decomposition) is exploited by augmenting the LEMS with an equalizer (recall block γ in Fig. 6), as depicted in Fig. 7 b. The resulting overall system ideally reduces to the first component of the Hammerstein echo path, namely the nonlinearity. This nonlinearity can be identified by an even shorter HGM than in the originally proposed SA filtering and does not require knowledge about any temporal structure of the linear subsystem of the Hammerstein-shaped echo path. Another difference is that the ESA-HM provides an echo path model with Hammerstein structure (output $e_{\text{ESA}}(k)$ in Fig. 6), while the SA-HGM provides both an echo path model with Hammerstein structure and one in which the direct-path region of this model is replaced by an HGM. These correspond to the SA-HGM outputs $e_{\text{HM}}(k)$ and $e_{\text{SA}}(k)$ in Fig. 5, respectively.

5 Evaluation

In this section, the novel and the previously proposed SA filtering concepts will be compared to each other and to classical adaptive filters in terms of computational complexity (Section 5.1) and echo cancellation performance (Section 5.2).

5.1 Computational complexity

This section contains an in-depth analysis of the computational complexity of the previously discussed adaptive filters. In practice, the actual computational load of an algorithm is determined by many platform-specific factors, such as the instruction set, the number of clock cycles for a particular arithmetic operation, pipelining, and the caching abilities of the processor. Furthermore, especially with a growing amount of data to be processed, the memory access pattern may significantly affect whether the processor can fully exploit its capabilities or has to wait for new data from external memory. Nevertheless, the number of FLoating Point Operations (FLOPs) is still a commonly accepted indicator of the computational complexity of an algorithm on modern platforms, where the different arithmetic operations in principle require similar execution times. Real-valued multiplications (RMULs), additions (RADDs), and divisions (RDIVs) are counted as FLOPs.

As complex numbers typically have to be implemented by treating real and imaginary parts individually, complex-valued arithmetic operations are composed of multiple real-valued operations. In the following, a complex-valued multiplication (CMUL) is assumed to be realized by four RMULs and two RADDs. Only for FFT algorithms, where operations on the twiddle factors can be precomputed and cached, an alternative implementation with three RMULs and three RADDs will be considered [63]. With this variant, an N-point DFT of a real-valued time-domain signal can be implemented by

$$\text{RMUL}_{\text{FFT}}(N)=\frac{1}{2}N\log_{2}N-\frac{5}{4}N $$

RMULs and

$$\text{RADD}_{\text{FFT}}(N)=\frac{3}{2}N\log_{2}N-\frac{1}{4}N-4 $$

RADDs [64].
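The two counts above can be evaluated directly. The short sketch below (hypothetical function names) does so for the DFT length $N=2M=512$ used in the evaluation. Both CMUL variants cost six FLOPs in total; they only trade one RMUL for one RADD.

```python
import math

# FLOPs of one complex multiplication (CMUL): 4 RMULs + 2 RADDs (or 3 + 3 in FFTs).
CMUL_FLOPS = 4 + 2


def rmul_fft(N):
    """RMULs of an N-point DFT of a real-valued signal, per the count quoted from [64]."""
    return 0.5 * N * math.log2(N) - 1.25 * N


def radd_fft(N):
    """RADDs of an N-point DFT of a real-valued signal, per the count quoted from [64]."""
    return 1.5 * N * math.log2(N) - 0.25 * N - 4


def fft_flops(N):
    return rmul_fft(N) + radd_fft(N)


# Default setting of the evaluation: M = 256, hence N = 2M = 512.
print(rmul_fft(512), radd_fft(512), fft_flops(512))  # 1664.0 6780.0 8444.0
```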

Complexity of adaptive HGM Using these conventions, Table 1 lists the computational effort of the individual operations for the adaptation of an HGM in the time domain (recall Section 3.2.1).

Table 1 Computational effort for the identification of an HGM by a time-domain NLMS algorithm


The first column contains the name of the computed quantity, the second column contains the corresponding equation number in this article, the third column indicates how often the equation has to be evaluated, and the subsequent columns list the number of FFTs, complex-valued, and real-valued operations for the particular equation. Below this list, the accumulated number of individual operations and the total number of FLOPs are listed as well. Note that these operation counts are given per output sample for this time-domain algorithm. The frequency-domain algorithms considered in the following will be analyzed on a frame-wise basis.

Complexity of adaptive PB-HGM Analogously to Table 1, the operations for identifying a PB-HGM in the frequency domain according to Section 3.2.2 are listed in Table 2. Therein, $N_{\mathrm {N}}=\left \lceil \tfrac {N-1}{2}\right \rceil +1$ denotes the number of non-redundant frequency bins (DC bin to Nyquist bin) on which the processing is performed; the remaining bins follow from conjugate symmetry. Recall that the operations for a linear model are included in this analysis as the special case B=1. Furthermore, the case of a single-partition HGM results for P=1.
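As a quick check of the definition of $N_{\mathrm{N}}$, the following sketch (hypothetical function name) compares it with the number of bins returned by NumPy's real-input FFT, which keeps exactly the non-redundant half of the spectrum.

```python
import math

import numpy as np


def num_nonredundant_bins(N):
    """N_N = ceil((N - 1) / 2) + 1 non-redundant DFT bins of a real-valued signal."""
    return math.ceil((N - 1) / 2) + 1


# NumPy's real-input FFT returns exactly these bins (DC up to Nyquist for even N).
assert all(num_nonredundant_bins(N) == len(np.fft.rfft(np.zeros(N))) for N in range(1, 65))
print(num_nonredundant_bins(512))  # 257
```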

Table 2 Computational effort for the identification of an HGM by a PB-FNLMS algorithm


Complexity of adaptive SA-HGM The two aforementioned PB-HGM special cases are building blocks for the complexity analysis of the PBSA-HGM (recall Section 4.1), which is listed in Table 3. The additional $N_{\mathrm{N}}$ additions for the HGM submodel stem from the fact that the HGM submodel’s output has to be added to the complementary-part echo estimate obtained with the HM (recall (35)).

Table 3 Computational effort for the identification of a PBSA-HGM with a PB-FNLMS algorithm


Complexity of adaptive ESA-HM In the same way, the operations for the novel ESA-HM are listed in Table 4. These operations consist of one linear frequency-domain adaptive filter for the HM and one for the echo path inversion, a very short time-domain HGM for assessing loudspeaker nonlinearities, determining the preprocessor weights from the HGM, and evaluating the preprocessor of the HM.

Table 4 Computational effort for the identification of an ESA-HM with a PB-FNLMS algorithm


Comparison For consistency with the echo reduction performance evaluation later on, a frame shift of M=256 taps, B=5 branches, P=4 frames, and only $L_{\text{SA}}=3$ taps for the ESA-HM’s HGM submodel are assumed as default. Varying the number of branches B of the nonlinear models and normalizing the resulting FLOPs to the FLOPs of a linear PB-FNLMS yields the relative FLOPs depicted in Fig. 8 a.

Obviously, the ESA-HM carries the constant overhead of an additional linear filter for system inversion, which makes it unattractive in comparison to the previously proposed PBSA-HGM for only two or three branches. For two branches, even a full PB-HGM is marginally more efficient than an ESA-HM and only slightly more complex than a PBSA-HGM. However, with an increasing number of branches, the benefit of an ESA-HM in comparison to a PB-HGM or even to the less complex PBSA-HGM becomes evident: the complexity of an ESA-HM is dominated by the two linear adaptive filters, even for B=20 branches.

Varying the number of frames P while keeping B=5 constant results in the relative FLOPs depicted in Fig. 8 b, where the FLOPs are normalized to those of an adaptive linear model with a single partition. As can be seen, the ESA-HM leads to a significantly reduced computational complexity for a low number of frames P. Moreover, the ESA-HM leads to a complexity reduction compared to the PB-HGM for all considered P, even for P=1, which corresponds to conventional frequency-domain adaptive filtering without block partitioning.

5.2 Echo reduction performance

In this section, the echo reduction performance of the novel ESA and the previously proposed SA filtering concept will be compared to classical adaptive filters based on linear models and HGMs.

5.2.1 Experimental setup

The following evaluation is based on double-talk-free recordings of male and female speech, played back and recorded with a smartphone in hands-free mode at very high volume. In total, five different physical setups, referred to as setups A to E, will be considered. The acoustically relevant characteristics of these setups are listed in Table 5.

Table 5 Properties of setups A to E for the AEC experiments


Setups A and B correspond to recordings with a duration of about 130 s in a living room-like environment. Setups A and B differ in the placement of the smartphone within the same environment. Setup C corresponds to a recording in an anechoic environment and has a duration of about 80 s. The data for setups D and E are synthesized by convolving the anechoic nonlinear recordings obtained from setup C with measured RIRs. The employed RIRs are from a lab environment with variable acoustics, once with curtains at the walls (setup D) and once with open curtains (setup E). For all setups, a partitioned-block linear model (PB-HGM with a single linear branch), a full PB-HGM (full temporal support), a PBSA-HGM, and an ESA-HM will be compared.

In the following experiments, processing is done at a sampling rate of $f_{\text{S}}=16$ kHz. The acoustic echo paths are estimated with FIR filters of length L=1024 taps, and the input signals and impulse responses are partitioned into P=4 blocks with a frame shift of M=256 taps (implying N=2M=512 and L=PM). Note that, despite large values of T₆₀ for some setups, the echo reduction performance in the experiments is limited not by the filter length but by the nonlinearity of the echo path (see Section 5.2.2 for a quantitative analysis of the amount of nonlinearity in the recordings).

As in [20], the HGMs consist of B=5 branches, with odd-order Legendre polynomials up to order 9 as nonlinearity basis functions $f_b\{\cdot\}$ in (5). The algorithm parameters used in the experiments have been selected based on the recordings of setup C. In particular, a stepsize of $\mu_b=0.1$ (see (26)) is used for the kernels of the PB-HGMs, and a stepsize of $\mu_b=0.2$ is used for the PBSA-HGMs and ESA-HMs. For the reference algorithm, i.e., the linear filter, a stepsize of $\mu_1=0.5$ is employed to account for the lack of multiple adaptation branches. Furthermore, the adaptation process is refined using the robust statistics described in [34, 59], which hardly affects the computational complexity but makes the system identification robust against outliers in the error signal. For the same purpose, the change of the adaptive preprocessor weights is limited to at most ±0.001 per frame.
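The branch nonlinearities named above can be generated with NumPy's Legendre module. The following sketch (hypothetical function names) builds the five odd-order branches and the memoryless preprocessor output $\sum_b \hat{w}_b f_b\{x(k)\}$ that appears inside (30). It assumes the loudspeaker signal is normalized to $[-1, 1]$, the interval on which Legendre polynomials are orthogonal.

```python
import numpy as np
from numpy.polynomial import legendre


def odd_legendre_branches(x, B=5):
    """Branch signals f_b{x} = P_{2b-1}(x), b = 1..B (B = 5: orders 1, 3, 5, 7, 9)."""
    out = np.empty((B, len(x)))
    for b in range(1, B + 1):
        c = np.zeros(2 * b)       # legval coefficients: c[n] weights P_n
        c[-1] = 1.0               # select only P_{2b-1}
        out[b - 1] = legendre.legval(x, c)
    return out


def preprocess(x, w):
    """Memoryless preprocessor x_pp(k) = sum_b w_b f_b{x(k)}, cf. (30)."""
    return np.asarray(w) @ odd_legendre_branches(x, len(w))


x = np.array([1.0, 0.5])
f = odd_legendre_branches(x)                        # every P_n(1) equals 1; P_3(0.5) = -0.4375
x_lin = preprocess(x, [1.0, 0.0, 0.0, 0.0, 0.0])    # linear preprocessor returns x
```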

The AEC performance is quantified by computing the Echo-Return Loss Enhancement (ERLE) measure [62]

$$ \text{ERLE}=10\log_{10}\left(\frac{\mathcal{E} \left\{ {y_{\text{echo}}^{2}(k)} \right\}}{\mathcal{E} \left\{\left|y_{\text{echo}}(k) - \hat y(k)\right|^{2}\right\}}\right) \text{dB}, $$

(50)

where y _echo(k) is the echo signal component of the microphone signal y(k) and $\hat y(k)$ is the echo estimate obtained using the model to be evaluated (linear model, PB-HGM, PBSA-HGM, ESA-HM). For single talk, (50) becomes

$$ \text{ERLE}=10\log_{10}\left(\frac{\mathcal{E} \left\{ {y^{2}(k)} \right\}}{\mathcal{E} \left\{ {e^{2}(k)} \right\} }\right) \text{dB}, $$

(51)

where the numerator is the power of the microphone signal y(k) and the denominator is the power of the error signal e(k) produced by the model to be evaluated. For a practical evaluation, the mathematical expectation (ensemble average) in (51) will be replaced by a time-averaging over the entire sequences. Note that the ERLE measure in its general definition of (50) does not depend on the near-end signal components (including the number and positions of the near-end sources) but requires knowledge of the actual far-end signal component during double-talk periods. This knowledge is not available in practice. Nonetheless, the ERLE performance during double-talk periods can be simulated by performing AEC on single-talk recordings with previously determined fixed filters (as will be done in Section 5.2.3 in experiment 2).
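With the expectations replaced by time averages, (50) and (51) reduce to ratios of mean squared values. A minimal sketch follows (hypothetical function names); the general form needs the true echo component, which is only available in simulations such as the fixed-filter processing described above.

```python
import numpy as np


def erle_db(y, e):
    """Single-talk ERLE (51); expectations replaced by time averages over the whole sequence."""
    y, e = np.asarray(y, dtype=float), np.asarray(e, dtype=float)
    return 10.0 * np.log10(np.mean(y ** 2) / np.mean(e ** 2))


def erle_general_db(y_echo, y_hat):
    """General ERLE (50); needs the true echo component y_echo (available in simulations only)."""
    y_echo, y_hat = np.asarray(y_echo, dtype=float), np.asarray(y_hat, dtype=float)
    return 10.0 * np.log10(np.mean(y_echo ** 2) / np.mean((y_echo - y_hat) ** 2))


y = np.full(100, 2.0)
print(erle_db(y, 0.1 * y))  # approx. 20 dB: the error has 1 % of the microphone power
```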

5.2.2 Quantification of nonlinearity

For monofrequent excitation signals, the nonlinearity of a system can easily be assessed by means of the Total Harmonic Distortion (THD), which measures the ratio of the energy of the harmonics to the energy at the excitation frequency. However, for nonstationary broadband excitation signals like speech, this measure is no longer meaningful. In order to characterize the nonlinearity nevertheless, one can pursue noise-loading methods [65] or, as done in the following, methods based on the actual AEC excitation signal itself. To this end, consider the energy $E_{\text{in}}$ recorded in the bandwidth excited by the digital loudspeaker signal (0–8 kHz) and the energy $E_{\text{out}}$ recorded at frequencies above the highest frequency contained in the excitation signal (8–16 kHz). For a linear system, the out-of-band energy is caused by noise only. For a nonlinear system, $E_{\text{out}}$ is significantly increased due to harmonic distortion. The In-band/Out-of-band Ratio (IOR)

$$\text{IOR}=10\log_{10}\left(\frac{E_{\text{in}}}{E_{\text{out}}}\right)\text{dB} $$

may therefore serve as a measure of the intensity of the nonlinearity in the echo signal.
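The article does not specify how $E_{\text{in}}$ and $E_{\text{out}}$ are computed. The following sketch assumes a single DFT over the whole recording and a recording rate of 32 kHz (so that the 8–16 kHz band is observable); both are assumptions, and the function name is hypothetical. The synthetic test signal has an out-of-band component 20 dB below the in-band tone.

```python
import numpy as np


def ior_db(rec, fs, f_split=8000.0):
    """IOR: energy up to f_split (in-band) vs. energy above f_split (out-of-band)."""
    spec = np.abs(np.fft.rfft(rec)) ** 2
    f = np.fft.rfftfreq(len(rec), d=1.0 / fs)
    e_in = np.sum(spec[f <= f_split])
    e_out = np.sum(spec[f > f_split])
    return 10.0 * np.log10(e_in / e_out)


# Usage: hypothetical 1-s recording at fs = 32 kHz with an in-band tone at 1 kHz
# and a 20-dB weaker out-of-band component at 10 kHz.
fs = 32000
t = np.arange(fs) / fs
rec = np.sin(2 * np.pi * 1000 * t) + 0.1 * np.sin(2 * np.pi * 10000 * t)
ior = ior_db(rec, fs)
```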

Furthermore, the so-called Linear-to-Non-Linear Ratio (LNLR) is defined as the ratio between the power of the linear echo signal component and the power of the nonlinear echo signal component. Thus, a lower LNLR corresponds to stronger nonlinearities. While the THD is a description of how much nonlinearity a particular frequency causes at its harmonics, a frequency-selective LNLR quantifies the distortion present at a particular frequency⁴. Although an exact LNLR computation would require exact knowledge of the linear and nonlinear system components to compute the respective signal components, the LNLR can be estimated using the echo estimate $\hat {y}(k)$ obtained by a converged long adaptive linear filter and the resulting error signal $e(k)=y(k)-\hat {y}(k)$. Employing the respective signal spectra estimates $S_{\hat {y}\hat {y}}(f)$ and $S_{ee}(f)$, the LNLR at a frequency f can be estimated as

$$ \text{LNLR}(f)=10\log_{10}\left(\frac{S_{\hat{y}\hat{y}}\,\,(f)}{S_{ee}\,\,(f)}\right) \text{dB}. $$

(52)

The particular smartphone employed for the AEC experiments leads to an IOR of 27 dB for the AEC speech signal in an anechoic environment⁵. The LNLR estimated from the same measurement is depicted in Fig. 9 up to 8 kHz; for higher frequencies, the numerator of (52) is determined by noise only, so (52) can no longer be employed to estimate the LNLR.

As can be seen, LNLR(f) is strongly frequency-dependent and varies between 1.2 dB (at about 2.5 kHz) and 13.7 dB (at about 500 Hz). The minimum around 2.5 kHz and the maximum around 500 Hz coincide with a local minimum and the maximum of the far-end signal spectrum, respectively.
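The spectrum estimator behind (52) is not specified in this section. The following sketch assumes a simple averaged periodogram (Hann window, 50 % overlap, 512-point frames), which is a hypothetical choice, and evaluates LNLR(f) from an echo estimate and the corresponding error signal. In the usage line the error is an exact scaled copy of the estimate, so every bin yields 20 dB.

```python
import numpy as np


def avg_periodogram(s, nfft=512):
    """Averaged periodogram (Hann window, 50 % overlap) as a simple PSD estimate."""
    win = np.hanning(nfft)
    hop = nfft // 2
    frames = np.array([s[i:i + nfft] * win for i in range(0, len(s) - nfft + 1, hop)])
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)


def lnlr_db(y_hat, e, fs, nfft=512):
    """Frequency-selective LNLR (52) from the linear echo estimate y_hat and error e = y - y_hat."""
    f = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return f, 10.0 * np.log10(avg_periodogram(y_hat, nfft) / avg_periodogram(e, nfft))


rng = np.random.default_rng(0)
y_hat = rng.standard_normal(16000)
freqs, lnlr = lnlr_db(y_hat, 0.1 * y_hat, fs=16000)   # 20 dB in every bin
```

Any window or normalization constant cancels in the ratio, so only the averaging (which controls the variance of each bin) matters for (52).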

5.2.3 Experimental results

In the following, two experiments will be discussed. Experiment 1 considers an adaptive identification of the nonlinear systems according to Section 5.2.1, and experiment 2 considers an offline filtering without further adaptation with previously adapted models from the end of experiment 1. While experiment 1 is affected by the initial convergence phase, experiment 2 evaluates the performance achievable in double-talk situations after convergence of the filters, which is of vital importance for a full-duplex voice communication system as well.

5.2.3.1 Experiment 1 - continuously adapted filters

Figure 10 a shows the ERLE measure for different setups and models for experiment 1. Note that the PBSA-HGM is listed twice: once with the previously considered output signal $e_{\text{SA}}(k)$ and once with the HM submodel’s output $e_{\text{HM}}(k)$. Obviously, the linear filter performs worst for all setups. The other end of the scale is represented by the PB-HGM, which achieves an average ERLE value almost twice as high as that of the linear filter for setup C. The PBSA-HGM approximates the PB-HGM well. In setups A and B, the PB-HGM with full temporal support is even marginally outperformed by the PBSA-HGM’s HM and HGM outputs, respectively. On the other hand, when considering only the HM submodel of the PBSA-HGM, the average gap to the PB-HGM performance is larger. This is not surprising, as the classical PBSA-HGM with a peak-modeling HGM has more degrees of freedom and tracks the nonlinearities before providing a smoothed nonlinearity estimate to the HM submodel. The ESA-HM can also approximate the PB-HGM but achieves slightly less ERLE than the PBSA-HGM’s HM output in experiment 1. Yet, the ESA-HM consistently outperforms the partitioned-block linear filter. This confirms the applicability of SA filtering in general, and of the proposed novel ESA filtering concept in particular, for practically relevant scenarios with real-world smartphone recordings.

5.2.3.2 Experiment 2 - echo reduction during double talk

In experiment 2 (depicted in Fig. 10 b), where the entire sequence is processed offline with the converged filters from the previous experiment, the ESA-HM performs as well as the full PB-HGM in setups C and D. Only in setups A and E can the PBSA-HGM marginally outperform the ESA-HM in experiment 2. Overall, the ESA-HM generalizes slightly better to double-talk situations than the PBSA-HGM. Interestingly, the typically considered HGM output and the HM output of the PBSA-HGM do not differ significantly in experiment 2, which supports the initial assumption that an HM is a suitable efficient approximation of the echo path. The increased performance due to the PBSA-HGM’s HGM submodel in experiment 1 therefore seems to originate from a quicker reaction to the instantaneous signal characteristics.

Overall, the novel ESA-HM is a well-performing alternative to the PBSA-HGM, yielding comparable or even better echo reduction in double-talk situations.

6 Conclusions

The adaptation of nonlinear echo paths for small portable devices requires efficient adaptive nonlinear echo path models. To this end, a novel variant of SA filtering has been introduced and compared to known concepts in this article. This novel ESA filtering method, leading to an adaptive ESA-HM, exploits the inability of IRs to model nonlinearities to obtain an estimate of the unobservable nonlinearly distorted loudspeaker signal by inverse filtering. While the previously proposed PBSA-HGM is an efficient alternative to HGMs only for block-partitioned filters, the novel ESA filtering concept is advantageous even without block partitioning and for a very high number of branches (see the complexity analyses in Section 5.1). For applications where very long RIRs need to be modeled and a low input-output delay is required, block partitioning, and therefore the PBSA-HGM, may be computationally more efficient than the ESA-HM. Both methods thus complement each other very well in terms of computational efficiency for different application scenarios. A comparison of the echo reduction performance of the ESA-HM, the PBSA-HGM, a linear model, and a PB-HGM in Section 5.2 has demonstrated the efficacy of the proposed ESA-HM, especially for double-talk situations, in which AEC is actually most important.

7 Endnotes

¹Note that this representation of a Volterra filter is also referred to as diagonal coordinate representation [66].

²This update rule is also referred to as NLMS with kernel-specific normalization in [66].

³As opposed to time-domain adaptive filters, the complexity for filter adaptation in the frequency domain is not determined by the length of the time-domain support of the filter, but by the DFT size N and the number of partitions P. This disqualifies unpartitioned frequency-domain adaptive filters, as the HGM submodel in Fig. 5 would have the same complexity as the HGM with full temporal support in Fig. 2 d.

⁴Determining such a measure is also referred to as Schüßler’s model in [67], as it goes back to [68, 69].

⁵Note that this number is caused by nonlinearity and not by background noise, as the SNR of the recorded signal is more than 24 dB in the 8−16 kHz range, where E _out is computed.

References

MM Sondhi, An adaptive echo canceller. Bell Syst. Tech. J. 46(3), 497–511 (1967).
P Dreiseitel, E Hänsler, H Puder, in Proc. European Signal Process. Conf. (EUSIPCO). Acoustic echo and noise control—a long lasting challenge (EURASIP, Rhodes, 1998), pp. 945–952.
EJ Thomas, Some considerations on the application of the Volterra representation of nonlinear networks to adaptive echo cancellers. Bell Syst. Tech. J. 50(8), 2797–2805 (1971).
A Stenger, L Trautmann, R Rabenstein, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP), 2. Nonlinear acoustic echo cancellation with 2nd order adaptive Volterra filters (IEEE, Phoenix, 1999), pp. 877–880.
M Zeller, LA Azpicueta-Ruiz, J Arenas-Garcia, W Kellermann, Adaptive Volterra filters with evolutionary quadratic kernels using a combination scheme for memory control. IEEE Trans. Signal Process. 59(4), 1449–1464 (2011).
AN Birkett, RA Goubran, in Proc. IEEE Workshop Neural Networks Signal Process. (NNSP). Nonlinear echo cancellation using a partial adaptive time delay neural network (IEEE, Cambridge, 1995), pp. 449–458.
Google Scholar
LSH Ngja, J Sjobert, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP). Nonlinear acoustic echo cancellation using a Hammerstein model (IEEESeattle, 1998), pp. 1229–1232.
Google Scholar
D Comminiello, M Scarpiniti, R Parisi, A Uncini, in Proc. Int. Workshop Acoustic Echo, Noise Control (IWAENC). A functional link based nonlinear echo canceller exploiting sparsity (IWAENCTel Aviv, 2010).
Google Scholar
D Comminiello, M Scarpiniti, LA Azpicueta-Ruiz, J Arenas-Garcia, A Uncini, Functional link adaptive filters for nonlinear acoustic echo cancellation. IEEE Trans. Audio Speech Lang. Process.21(7), 1502–1512 (2013).
Article Google Scholar
G Li, C Wen, WX Zheng, Y Chen, Identification of a class of nonlinear autoregressive models with exogenous inputs based on kernel machines. IEEE Trans. Signal Process.59(5), 2146–2159 (2011).
Article MathSciNet Google Scholar
J Kivinen, AJ Smola, RC Williamson, Online learning with kernels. IEEE Trans. Signal Process.52(8), 165–176 (2004).
Article MathSciNet Google Scholar
AN Birkett, RA Goubran, in Proc. IEEE Workshop Applications Signal Process. Audio Acoustics (WASPAA). Limitations of handsfree acoustic echo cancellers due to nonlinear loudspeaker distortion and enclosure vibration effects (IEEENew Paltz, 1995), pp. 103–106.
Google Scholar
A Stenger, R Rabenstein, in Proc. European Signal Process. Conf. (EUSIPCO), 98. An acoustic echo canceller with compensation of nonlinearities (EURASIPIsland of Rhodes, 1998), pp. 969–972.
Google Scholar
A Stenger, W Kellermann, Adaptation of a memoryless preprocessor for nonlinear acoustic echo cancelling. Signal Process.80(9), 1747–1760 (2000).
Article MATH Google Scholar
S Malik, G Enzner, State-space frequency-domain adaptive filtering for nonlinear acoustic echo cancellation. IEEE Trans. Audio Speech Lang. Process.20(7), 2065–2079 (2012).
Article Google Scholar
S Shimauchi, Y Haneda, in Proc. Int. Workshop Acoustic Echo, Noise Control (IWAENC). Nonlinear acoustic echo cancellation based on piecewise linear approximation with amplitude threshold decomposition (VDEAachen, 2012), pp. 1–4.
Google Scholar
S Malik, G Enzner, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP). Variational Bayesian inference for nonlinear acoustic echo cancellation using adaptive cascade modeling (IEEEKyoto, 2012), pp. 37–40.
Google Scholar
S Malik, G Enzner, A variational Bayesian learning approach for nonlinear acoustic echo control. IEEE Trans. Signal Process.61(23), 5853–5867 (2013).
Article MathSciNet Google Scholar
C Huemmer, C Hofmann, R Maas, A Schwarz, W Kellermann, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP). The elitist particle filter based on evolutionary strategies as novel approach for nonlinear acoustic echo cancellation (IEEEFlorence, 2014), pp. 1315–1319.
Google Scholar
C Hofmann, C Huemmer, W Kellermann, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP). Significance-aware Hammerstein group models for nonlinear acoustic echo cancellation (IEEEFlorence, 2014), pp. 5934–5938.
Google Scholar
C Huemmer, C Hofmann, R Maas, W Kellermann, in Proc. IEEE Global Conf. Signal Information Process. (GlobalSIP). The significance-aware EPFES to estimate a memoryless preprocessor for nonlinear acoustic echo cancellation (IEEEAtlanta, 2014), pp. 557–561.
Google Scholar
C Hofmann, M Guenther, C Huemmer, W Kellermann, in Proc. European Signal Process. Conf. (EUSIPCO). Efficient nonlinear acoustic echo cancellation by partitioned-block significance-aware Hammerstein group models (EURASIPBudapest, 2016).
Google Scholar
S Boll, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process.27(2), 113–120 (1979).
Article Google Scholar
S Gustafsson, R Martin, P Vary, Combined acoustic echo control and noise reduction for hands-free telephony. Signal Process.64(1), 21–32 (1998).
Article MATH Google Scholar
K Linhard, T Haulick, in Proc. European Conf. Speech Communication and Technology (EUROSPEECH). Noise subtraction with parametric recursive gain curves (ISCABudapest, 1999), pp. 2611–2614.
Google Scholar
I Cohen, Optimal speech enhancement under signal presence uncertainty using log-spectral amplitude estimator. IEEE Signal Process. Lett.9(4), 113–116 (2002).
Article Google Scholar
O Hoshuyama, A Sugiyama, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP), 5. An acoustic echo suppressor based on a frequency-domain model of highly nonlinear residual echo (IEEEToulouse, 2006), pp. 269–272.
Google Scholar
F Küch, W Kellermann, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP), I. Nonlinear residual echo suppression using a power filter model of the acoustic echo path (IEEEHonolulu, 2007), pp. 73–76.
Google Scholar
DA Bendersky, JW Stokes, HS Malvar, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP). Nonlinear residual acoustic echo suppression for high levels of harmonic distortion (IEEELas Vegas, 2008), pp. 261–264.
Google Scholar
O Hoshuyama, in Proc. Int. Workshop Acoustic Echo, Noise Control (IWAENC). An update algorithm for frequency-domain correlation model in a nonlinear echo suppressor (VDEAachen, 2012).
Google Scholar
A Schwarz, C Hofmann, W Kellermann, in Proc. IEEE Workshop Applications Signal Process. Audio Acoustics (WASPAA). Spectral feature-based nonlinear residual echo suppression (IEEENew Paltz, 2013).
Google Scholar
A Schwarz, C Hofmann, W Kellermann, in ITG Conf. Speech Commun. Combined nonlinear echo cancellation and residual echo suppression (VDEErlangen, 2014).
Google Scholar
DL Duttweiler, A twelve-channel digital echo canceler. IEEE Trans. Commun.26(5), 647–653 (1978).
Article Google Scholar
T Gänsler, A double-talk resistant subband echo canceller. Signal Process.65(1), 89–101 (1998).
Article MATH Google Scholar
T Gänsler, J Benesty, A frequency-domain double-talk detector based on a normalized cross-correlation vector. Signal Process.81:, 1783–1787 (2001).
Article MATH Google Scholar
H Buchner, J Benesty, T Gänsler, W Kellermann, Robust extended multidelay filter and double-talk detector for acoustic echo cancellation. IEEE Trans. Audio Speech Lang. Process.14(5), 1633–1644 (2006).
Article Google Scholar
K Shi, X Ma, GT Zhou, in Annual Conf. on Information Sciences and Systems (CISS). A mutual information based double-talk detector for nonlinear systems (IEEEPrinceton, 2008), pp. 356–360.
Google Scholar
M Schneider, EAP Habets, in Proc. European Signal Process. Conf. (EUSIPCO). Comparison of multichannel doubletalk detectors for acoustic echo cancellation (EURASIPNice, 2015), pp. 300–304.
Google Scholar
IW Hunter, MJ Korenberg, The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biol. Cybern.55(2), 135–144 (1986). doi:10.1007/BF00341929.
MathSciNet MATH Google Scholar
VJ Mathews, GL Sicuranza, Polynomial Signal Processing (Wiley, New York, 2000).
Google Scholar
GL Sicuranza, A Carini, in Proc. IEEE Instrumentation and Measurement Technology Conf. (IMTC). On the accuracy of generalized Hammerstein models for nonlinear active noise control (IEEESorrento, 2006), pp. 1411–1416.
Google Scholar
S Malik, G Enzner, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP). Fourier expansion of Hammerstein models for nonlinear acoustic system identification (IEEEPrague, 2011), pp. 85–88.
Google Scholar
S Van Vaerenbergh, LA Azpicueta-Ruiz, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP). Kernel-based identification of Hammerstein systems for nonlinear acoustic echo-cancellation (IEEEFlorence, 2014), pp. 3739–3743.
Google Scholar
F Küch, A Mitnacht, W Kellermann, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP), 3. Nonlinear acoustic echo cancellation using adaptive orthogonalized power filters (IEEEPhiladelphia, 2005), pp. 105–108.
Google Scholar
F Küch, W Kellermann, Orthogonalized power filters for nonlinear acoustic echo cancellation. Signal Process.86(6), 1168–1181 (2006).
Article MATH Google Scholar
VJ Mathews, Adaptive polynomial filters. IEEE Signal Process. Mag.8(3), 10–26 (1991).
Article Google Scholar
MJ Korenberg, IW Hunter, The identification of nonlinear biological systems: Volterra kernel approaches. Ann. Biomed. Eng.24(2), 250–268 (1996).
Article Google Scholar
A Fermo, A Carini, GL Sicuranza, Low-complexity nonlinear adaptive filters for acoustic echo cancellation in GSM handset receivers. Eur. Trans. Telecommun.14(2), 161–169 (2003).
Article Google Scholar
A Carini, S Cecchi, M Gasparini, GL Sicuranza, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP). Introducing Legendre nonlinear filters (IEEEFlorence, 2014), pp. 7939–7943.
Google Scholar
A Carini, S Cecchi, L Romoli, GL Sicuranza, in Proc. European Signal Process. Conf. (EUSIPCO). Perfect periodic sequences for Legendre nonlinear filters (EURASIPLisbon, 2014), pp. 2400–2404.
Google Scholar
A Carini, GL Sicuranza, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP). Even mirror Fourier nonlinear filters (IEEEVancouver, 2013), pp. 5608–5612.
Google Scholar
GL Sicuranza, A Carini, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP). A novel class of BIBO stable recursive nonlinear filters (IEEEFlorence, 2014), pp. 7934–7938.
Google Scholar
S Haykin, Adaptive Filter Theory, 4th edn. (Prentice Hall, Upper Saddle River, 2002).
MATH Google Scholar
G Enzner, 22. A Model-Based Optimum Filtering Approach to Acoustic Echo Control: Theory and Practice. Aachener Beiträge zu Digitalen Nachrichtensystemen (ABDN) (Verlag MainzAachen, 2006).
M Zeller, LA Azpicueta-Ruiz, W Kellermann, in Proc. IEEE Workshop Applications Signal Process. Audio Acoustics (WASPAA). Adaptive FIR filters with automatic length optimization by monitoring a normalized combination scheme (IEEENew Paltz, 2009), pp. 149–152, doi:10.1109/ASPAA.2009.5346547.
Google Scholar
M Zeller, W Kellermann, in Proc. Int. Workshop Acoustic Echo, Noise Control (IWAENC). Self-configuring system identification via evolutionary frequency-domain adaptive filters (IWAENCTel Aviv, 2010).
Google Scholar
M Zeller, W Kellermann, in Proc. European Signal Process. Conf. (EUSIPCO). Evolutionary adaptive filtering based on competing filter structures (EURASIPBarcelona, 2011), pp. 1264–1268.
Google Scholar
WG Gardner, Efficient convolution without input-output delay. J. Audio Eng. Soc.43(2), 127–136 (1995).
Google Scholar
H Buchner, J Benesty, T Gänsler, W Kellermann, in Proc. Int. Workshop Acoustic Echo, Noise Control (IWAENC). An Outlier-Robust Extended Multidelay Filter with Application to Acoustic Echo Cancellation (IEEE Japan Council et al.Kyoto, 2003), pp. 19–22.
Google Scholar
F Küch, E Mabande, G Enzner, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP). State-space architecture of the partitioned-block-based acoustic echo controller (IEEEFlorence, 2014), pp. 1295–1299.
Google Scholar
J-S Soo, KK Pang, Multidelay block frequency domain adaptive filter. IEEE Trans. Acoust. Speech Signal Process.38(2), 373–376 (1990).
Article Google Scholar
E Ha Ënsler, G Schmidt, Acoustic Echo and Noise Control: a Practical Approach (Wiley, Hoboken, 2004).
Book Google Scholar
CSS Burrus, TW Parks, DFT/FFT and Convolution Algorithms: Theory and Implementation, 1st edn. (John Wiley & Sons, Inc., New York, 1985).
MATH Google Scholar
H Sorensen, D Jones, M Heideman, C Burrus, Real-valued fast fourier transform algorithms. IEEE Trans. Acoust. Speech Signal Process.35(6), 849–863 (1987).
Article Google Scholar
HW Schüßler, An objective method for measuring the performance of weakly nonlinear and noisy systems. Frequenz. 41(6–7), 147–154 (1987).
Google Scholar
F Küch, Adaptive polynomial filters and their application to nonlinear acoustic echo cancellation (PhD thesis, University Erlangen-Nuremberg, Germany, 2005).
Google Scholar
G Enzner, in ITG Conf. Speech Communication. From acoustic nonlinearity to adaptive nonlinear system identification (VDEBraunschweig, 2012), pp. 1–4.
Google Scholar
HW Schüßler, Y Dong, in 7. Aachener Symposium für Signaltheorie (ASST ’90). Messung und Modellierung von schwach nichtlinearen Systemen (SpringerAachen, 1990), pp. 2–7. (in German).
Chapter Google Scholar
HW Schüßler, Y Dong, in Proc. IEEE Intl. Conf. Acoustics, Speech, Signal Process. (ICASSP), 4. A new method for measuring the performance of weakly nonlinear systems (IEEEGlasgow, 1989), pp. 2089–2092.
Google Scholar

Acknowledgements

The authors would like to thank the Deutsche Forschungsgemeinschaft (DFG) for supporting this work (contract number KE 890/9-1).

Competing interests

The authors declare that they have no competing interests.

Author information

Authors and Affiliations

Multimedia Communications and Signal Processing, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Cauerstraße 7, Erlangen, Germany
Christian Hofmann, Christian Huemmer, Michael Guenther & Walter Kellermann

Corresponding author

Correspondence to Christian Hofmann.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Hofmann, C., Huemmer, C., Guenther, M. et al. Significance-aware filtering for nonlinear acoustic echo cancellation. EURASIP J. Adv. Signal Process. 2016, 113 (2016). https://doi.org/10.1186/s13634-016-0410-7

Received: 26 June 2016
Accepted: 06 October 2016
Published: 08 November 2016
DOI: https://doi.org/10.1186/s13634-016-0410-7