 Research Article
 Open Access
A Stereo Crosstalk Cancellation System Based on the Common-Acoustical Pole/Zero Model
Lin Wang^{1, 2},
 Fuliang Yin^{1} and
 Zhe Chen^{1}
https://doi.org/10.1155/2010/719197
© Lin Wang et al. 2010
 Received: 8 January 2010
 Accepted: 7 August 2010
 Published: 11 August 2010
Abstract
Crosstalk cancellation plays an important role in displaying binaural signals with loudspeakers. It aims to reproduce binaural signals at a listener's ears by inverting the acoustic transfer paths. The crosstalk cancellation filter should be updated in real time according to the head position, which demands high computational efficiency from the crosstalk cancellation algorithm. To reduce the computational cost, this paper proposes a stereo crosstalk cancellation system based on common-acoustical pole/zero (CAPZ) models. Because CAPZ models share one set of common poles and keep their zeros individually, the computational complexity of crosstalk cancellation is cut down dramatically. In the proposed method, the acoustic transfer paths from the loudspeakers to the ears are approximated with CAPZ models, and the crosstalk cancellation filter is then designed from the CAPZ transfer functions. Simulation results show that, compared to conventional methods, the proposed method reduces the computational cost while achieving comparable crosstalk cancellation performance.
Keywords
 Filter Length
 Inverse Filter
 Acoustic Path
 Acoustic Transfer
 Crosstalk Cancellation
1. Introduction
A 3D audio system can be used to position sounds around a listener so that the sounds are perceived to come from arbitrary points in space [1, 2]. This is not possible with classical stereo systems. 3D audio therefore has the potential to increase the sense of realism in music or movies. It can be of great benefit in virtual reality, augmented reality, remote video conferencing, or home entertainment. A 3D audio technique achieves virtual sound perception by synthesizing a pair of binaural signals from a monaural source signal, using the provided 3D acoustic information: the distance and direction of the sound source with respect to the listener. Specifically, the sense of direction can be rendered using head-related acoustic information, such as head-related transfer functions (HRTFs), which can be obtained by either experimental or theoretical means [3, 4]. The simplest way to deliver binaural signals is through headphones. However, in many applications, such as home entertainment and teleconferencing, many listeners prefer not to wear headphones. If loudspeakers are used, delivering these binaural signals to the listener's ears is not straightforward. Each ear receives a so-called crosstalk component from the opposite loudspeaker, and the direct signals are also distorted by room reverberation. To overcome these problems, an inverse filter must be applied before the binaural signals are played through the loudspeakers.
The concept of crosstalk cancellation and equalization was introduced by Atal and Schroeder [5] and Bauer [6] in the early 1960s. Many sophisticated crosstalk cancellation algorithms have been presented since then, using two or more loudspeakers to render binaural signals. Crosstalk cancellation can be realized directly or adaptively. Assuming that the acoustic transfer paths from the loudspeakers to the ears are known, the direct implementation method calculates the crosstalk cancellation filter by directly inverting the acoustic transfer functions [7, 8]. A head-tracking scheme, which can determine the head position precisely, is generally employed together with the direct method. The direct method can be implemented in the time or frequency domain. Time-domain algorithms are generally computationally expensive, while frequency-domain algorithms have lower complexity. On the other hand, time-domain algorithms perform better than frequency-domain ones for the same crosstalk cancellation filter length. For example, the fast deconvolution method [7] is a frequency-domain method that has proved very useful and easy to use in several practical cases. However, it can suffer from a circular convolution effect when the inverse filters are not long enough compared to the duration of the acoustic path response. In an adaptive implementation method, the crosstalk cancellation filter is calculated adaptively from the feedback signals received by miniature microphones placed in the listener's ears [9]. Adaptive crosstalk cancellation methods typically employ some variation of the LMS or RLS algorithms [10–13]. The LMS algorithm is known for its simplicity and robustness and has been widely used, but it converges slowly. The RLS algorithm can accelerate convergence, but at the cost of a much larger computational load.
Although many algorithms have been proposed, the adaptive implementation method remains a topic of academic research rather than a practical solution. The reason is that people who do not want to use headphones are unlikely to accept a pair of microphones in their ears to optimize loudspeaker reproduction either.
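To illustrate the frequency-domain approach mentioned above, the sketch below follows the regularized per-bin inversion idea of the fast deconvolution method [7]. At each FFT bin it computes H(k) = [C^H(k) C(k) + βI]^(-1) C^H(k) e^(-j2πkm/N), where m is a modeling delay. This is a minimal sketch under stated assumptions, not the exact procedure of [7]. The function name `fast_deconvolution`, the array layout, and the frequency-independent β are choices made for this example.

```python
import numpy as np


def fast_deconvolution(c, n_fft, beta, delay):
    """Regularized frequency-domain inversion of a 2x2 acoustic path matrix (sketch).

    c     : array (2, 2, Lc); c[i, j] is the impulse response from loudspeaker j to ear i
    n_fft : FFT size, which is also the length of the returned filters (n_fft >= Lc)
    beta  : regularization parameter, assumed constant over frequency in this sketch
    delay : modeling delay in samples
    Returns h with shape (2, 2, n_fft); h[j, k] feeds binaural channel k to loudspeaker j.
    """
    C = np.moveaxis(np.fft.fft(c, n_fft, axis=-1), -1, 0)  # (n_fft, 2, 2), one matrix per bin
    Ch = np.conj(np.swapaxes(C, -1, -2))                    # Hermitian transpose per bin
    k = np.arange(n_fft)
    D = np.exp(-2j * np.pi * k * delay / n_fft)             # modeling delay z^(-delay)
    # H(k) = [C^H C + beta I]^(-1) C^H D(k), solved bin by bin
    H = np.linalg.solve(Ch @ C + beta * np.eye(2), Ch) * D[:, None, None]
    # The inverse DFT is real up to round-off because C(k) and D(k) are conjugate-symmetric
    return np.fft.ifft(np.moveaxis(H, 0, -1), axis=-1).real
```

The filters come from N-point DFTs, so they are the time-aliased (circular) version of the ideal inverse. When `n_fft` is short compared with the decay of that inverse, the aliasing produces the circular convolution effect noted above.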
One key limitation of a crosstalk cancellation system is that any listener movement exceeding 75–100 mm may completely destroy the desired spatial effect [14, 15]. This problem can be resolved by tracking the listener's head in 3D space. The head position is captured by a magnetic or camera-based tracker, and the HRTF filters and the crosstalk canceller are then updated in real time according to the listener's location [16]. Even when head tracking is employed, measures should still be taken to increase the robustness of the crosstalk cancellation system. It has been shown that a robust solution for this virtual sound system can be obtained by placing the loudspeakers so that the acoustic transfer function matrix is well conditioned [17–19]. Robust crosstalk cancellation methods with multiple loudspeakers have also been proposed [8, 20, 21]. Another approach improves the robustness of a crosstalk canceller by exploiting statistical knowledge of the acoustic transfer functions [22].
This paper focuses on the crosstalk cancellation problem for a stereo loudspeaker system. Least-squares methods are popular for designing crosstalk cancellation systems, but their large computational cost remains a challenge. To reduce the computational cost, this paper proposes a novel crosstalk cancellation system based on common-acoustical pole/zero (CAPZ) models, which are more computationally efficient than conventional all-zero or pole/zero models [23, 24]. The acoustic paths from the loudspeakers to the ears are approximated with CAPZ models, and the crosstalk cancellation filters are then designed from the CAPZ transfer functions. Compared with conventional least-squares methods, the proposed method greatly reduces the computational cost. The paper is organized as follows. Conventional crosstalk cancellation methods are introduced in Section 2. The proposed crosstalk cancellation method based on the CAPZ model is described in detail in Section 3. The performance of the proposed method is evaluated in Section 4. Finally, conclusions are drawn in Section 5.
2. Conventional Crosstalk Canceller
where is the identity matrix. The delay term is necessary to guarantee that is physically realizable (causal). However, perfect reproduction is impossible because is generally non-minimum-phase. In that case, a least-squares algorithm is employed to approximate the optimal inverse filter . The time-domain least-squares algorithm is given below.
is a vector of length whose th component equals 1, and is a vector of length containing only zeros.
where is a regularization parameter to increase the robustness of the inversion [25].
The acoustic path matrix depends on the head position. When the head moves, it is necessary to update and recalculate in real time. The computational load becomes heavy when the size of is large.
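The sketch below is a minimal time-domain least-squares design. It assumes the usual stacked-convolution-matrix formulation: for each binaural channel, a regularized normal equation is solved so that the target ear receives a delayed unit impulse and the other ear receives zeros. These are the "1 at the delay position" and all-zero target vectors described above. The helper names `conv_matrix` and `ls_crosstalk_canceller` and the array layout are hypothetical, and the paper's own equations are not reproduced here.

```python
import numpy as np


def conv_matrix(x, n_cols):
    """Toeplitz matrix T such that T @ y == np.convolve(x, y) for len(y) == n_cols."""
    T = np.zeros((len(x) + n_cols - 1, n_cols))
    for j in range(n_cols):
        T[j:j + len(x), j] = x
    return T


def ls_crosstalk_canceller(c, n_taps, delay, beta):
    """Regularized time-domain least-squares crosstalk canceller (2 loudspeakers, 2 ears).

    c : array (2, 2, Lc); c[i, j] is the impulse response from loudspeaker j to ear i
    Returns h with shape (2, 2, n_taps); h[j, k] feeds binaural channel k to loudspeaker j,
    chosen so that sum_j c[i, j] * h[j, k] approximates a unit impulse delayed by `delay`
    when i == k and zero otherwise.
    """
    rows = c.shape[-1] + n_taps - 1
    # Stacked convolution matrix: block (i, j) convolves with the path loudspeaker j -> ear i
    A = np.block([[conv_matrix(c[i, j], n_taps) for j in range(2)] for i in range(2)])
    lhs = A.T @ A + beta * np.eye(2 * n_taps)  # regularized normal equations
    h = np.zeros((2, 2, n_taps))
    for k in range(2):
        target = np.zeros(2 * rows)
        target[k * rows + delay] = 1.0  # delayed impulse at ear k, zeros at the other ear
        x = np.linalg.solve(lhs, A.T @ target)
        h[0, k], h[1, k] = x[:n_taps], x[n_taps:]
    return h
```

For 200-tap HRTFs and 150-tap filters, `A` has 2 × 349 rows and `lhs` is 300 × 300. This shows why re-solving the system on every head movement is costly.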
In [26], a single-filter structure for a stereo loudspeaker system is proposed to calculate the inverse of , which needs less computation. It is given as follows.
is a convolution matrix of size constructed from the vector ; .
The off-diagonal items of (21) are always zeros regardless of the value of . This implies that the crosstalk is almost fully suppressed. However, the filtering effect of the diagonal items in (21) introduces distortion when the target signals are reproduced. This is the inherent disadvantage of the single-filter structure method.
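The single-filter idea can be sketched as follows. The sketch assumes that the single filter h0 is a regularized least-squares inverse of the determinant det C(z) = C11 C22 - C12 C21, and that the canceller is H(z) = adj C(z) h0(z). With this structure C(z)H(z) = det C(z) h0(z) I, so the off-diagonal terms vanish for any h0. The structure also matches the 349-tap canceller length reported in Section 4.3 for 200-tap HRTFs and a 150-tap inverse filter (200 + 150 - 1). The helper names are hypothetical, and the exact formulation of [26] may differ.

```python
import numpy as np


def conv_matrix(x, n_cols):
    """Toeplitz matrix T such that T @ y == np.convolve(x, y) for len(y) == n_cols."""
    T = np.zeros((len(x) + n_cols - 1, n_cols))
    for j in range(n_cols):
        T[j:j + len(x), j] = x
    return T


def single_filter_canceller(c, n_taps, delay, beta):
    """Canceller H = adj(C) * h0, where h0 ~ z^(-delay) / det C (hypothetical sketch).

    c : array (2, 2, Lc); c[i, j] is the impulse response from loudspeaker j to ear i
    Returns h with shape (2, 2, Lc + n_taps - 1).
    """
    det = np.convolve(c[0, 0], c[1, 1]) - np.convolve(c[0, 1], c[1, 0])
    T = conv_matrix(det, n_taps)
    target = np.zeros(T.shape[0])
    target[delay] = 1.0
    # The single inverse filter h0: regularized least-squares inverse of det C(z)
    h0 = np.linalg.solve(T.T @ T + beta * np.eye(n_taps), T.T @ target)
    adj = [[c[1, 1], -c[0, 1]], [-c[1, 0], c[0, 0]]]  # adjugate of C(z)
    return np.array([[np.convolve(adj[j][k], h0) for k in range(2)] for j in range(2)])
```

The diagonal terms of the resulting system equal det C(z) h0(z). Any residual error in that product is the reproduction distortion discussed above.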
3. Crosstalk Cancellation System Based on CAPZ Models
The acoustic transfer function is usually represented by an all-zero model, whose coefficients are its impulse response. However, when the impulse response is long, a large number of parameters is required to represent the transfer function [27]. This results in heavy computation in binaural synthesis and crosstalk cancellation. Pole/zero models may decrease the computational load, but both their poles and their zeros change when the acoustic transfer function varies, which complicates acoustic path inversion. To reduce the computational cost, this paper approximates the acoustic transfer functions with common-acoustical pole/zero (CAPZ) models and then designs a crosstalk cancellation system based on them.
3.1. CAPZ Modeling of Acoustic Transfer Functions
Haneda et al. proposed the concept of common-acoustical pole/zero (CAPZ) models and used them to model room transfer functions and head-related transfer functions with good results [23, 24]. They argued that an HRTF contains a resonance system of the ear canal whose resonance frequencies and Q factors are independent of source direction. On this basis, the HRTF can be modeled efficiently using poles that are independent of source direction and zeros that depend on it. The poles represent the resonance frequencies and Q factors. This model is called the common-acoustical pole/zero model. CAPZ models share one set of poles and keep their own zeros individually. This clearly reduces the number of parameters compared with conventional pole/zero models, and it also cuts down computation.
where and are the numbers of poles and zeros, and are the pole and zero coefficient vectors, respectively. The CAPZ parameters may be estimated with a least-squares method [23, 24] or a state-space method [28]. The least-squares method is summarized below.
where is the length of and is the impulse response of .
where vector and matrix .
The numbers of poles and zeros, and , must also be selected. Using more poles and zeros may give a better approximation, but more parameters also require more computation, so a trade-off must be made. In the least-squares method, the number of parameters is generally determined empirically [24]. In the state-space method, it is determined from the singular-value decomposition [28].
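The sketch below shows a common-pole least-squares fit. It assumes an equation-error (Prony-type) formulation in which the denominator A(z) = 1 + a_1 z^-1 + ... + a_P z^-P is shared by all responses. Beyond the last zero (n > Q), A(z)H_i(z) = B_i(z) forces h_i[n] + Σ_k a_k h_i[n-k] = 0, so these equations from every response are stacked to solve for the common poles. Each numerator then follows from the first Q + 1 samples of a * h_i. This follows the spirit of [23, 24], but the exact cost function used there may differ. The function name `capz_fit` is hypothetical.

```python
import numpy as np


def capz_fit(h, n_poles, n_zeros):
    """Equation-error least-squares fit of common poles and individual zeros (sketch).

    h : array (M, N) holding M measured impulse responses of length N
    Returns a (n_poles + 1,) with a[0] == 1, shared by all responses, and
    b (M, n_zeros + 1), one numerator per response, so that h[i] ~ b[i] / a.
    """
    N = h.shape[1]
    lags = np.arange(1, n_poles + 1)
    rows, rhs = [], []
    for hi in h:
        hp = np.concatenate([np.zeros(n_poles), hi])  # hp[m + n_poles] == hi[m], zero for m < 0
        # For n > n_zeros there is no numerator term, so every response contributes
        # the equation sum_k a_k h_i[n - k] = -h_i[n] for the shared a_1..a_P.
        for n in range(n_zeros + 1, N):
            rows.append(hp[n + n_poles - lags])
            rhs.append(-hi[n])
    a_tail, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    a = np.concatenate(([1.0], a_tail))
    # Individual zeros: first n_zeros + 1 samples of A(z) H_i(z)
    b = np.array([np.convolve(a, hi)[: n_zeros + 1] for hi in h])
    return a, b
```

In practice `h` would hold the set of HRIRs to be modeled, with `n_poles` and `n_zeros` chosen through the trade-off described above.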
3.2. Crosstalk Cancellation Based on the CAPZ Model
where , , , and are the transmission delays from the loudspeakers to the ears.
where , , and is the delay.
is a vector of length whose th component equals 1.
where is the regularization parameter.
where .
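The sketch below shows one way the CAPZ structure can be exploited in the canceller. It assumes that all four paths share the denominator A(z), that is, C_ij(z) = B_ij(z)/A(z), and for simplicity it leaves the pure transmission delays inside the numerators. Then C^(-1)(z) = A(z) adj B(z) / det B(z). Only the short polynomial det B(z) = B11 B22 - B12 B21 needs a regularized least-squares inverse h0 with delay d, and the canceller is H(z) = A(z) adj B(z) h0(z). This is an illustrative reconstruction under those assumptions, not necessarily the exact formulation above. `capz_canceller` and `conv_matrix` are hypothetical names.

```python
import numpy as np


def conv_matrix(x, n_cols):
    """Toeplitz matrix T such that T @ y == np.convolve(x, y) for len(y) == n_cols."""
    T = np.zeros((len(x) + n_cols - 1, n_cols))
    for j in range(n_cols):
        T[j:j + len(x), j] = x
    return T


def capz_canceller(a, b, n_taps, delay, beta):
    """Crosstalk canceller for paths C_ij(z) = B_ij(z) / A(z) with a common A(z) (sketch).

    a : common denominator coefficients, shape (P + 1,)
    b : numerator coefficients, shape (2, 2, Q + 1); b[i, j] is loudspeaker j -> ear i
    Returns FIR filters of shape (2, 2, P + Q + n_taps).
    """
    # det B(z) has only 2Q + 1 taps, far fewer than det C(z) for long FIR paths
    det_b = np.convolve(b[0, 0], b[1, 1]) - np.convolve(b[0, 1], b[1, 0])
    T = conv_matrix(det_b, n_taps)
    target = np.zeros(T.shape[0])
    target[delay] = 1.0
    # Regularized least-squares inverse h0 ~ z^(-delay) / det B(z)
    h0 = np.linalg.solve(T.T @ T + beta * np.eye(n_taps), T.T @ target)
    g = np.convolve(a, h0)  # common factor A(z) h0(z), shared by all four entries
    adj = [[b[1, 1], -b[0, 1]], [-b[1, 0], b[0, 0]]]  # adjugate of B(z)
    return np.array([[np.convolve(adj[j][k], g) for k in range(2)] for j in range(2)])
```

Applying B(z) to the returned filters gives approximately A(z) z^(-delay) on the diagonal and exactly zero off the diagonal, up to rounding. Equivalently, C(z)H(z) ≈ z^(-delay) I. The matrix inverted has 2Q + n_taps rows instead of 2Lc + n_taps - 2, which is where the estimation savings come from.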
3.3. Computational Complexity Analysis
Parameters for the three methods: the least-squares method, the single-filter structure method, and the CAPZ method.
Method  Inverse filter  Matrix size  Crosstalk cancellation filter length

Least-squares
Single-filter structure
CAPZ
3.3.1. Computational Complexity of Crosstalk Cancellation Filter Estimation
Computational complexity of crosstalk cancellation filter estimation for the three methods: the least-squares method, the single-filter structure method, and the CAPZ method.
Method  Computation cost (in multiplications)

Least-squares
Single-filter structure
CAPZ
From Table 2, the computational complexity of the least-squares method is much higher than that of the other two methods (almost 8 times), while that of the single-filter structure method is slightly higher than that of the proposed CAPZ method.
3.3.2. Computational Complexity of Crosstalk Cancellation Filter Implementation
with the assumption of .
The least-squares method has the lowest computational complexity in crosstalk cancellation filter implementation, while the single-filter structure method has the highest.
In summary, the least-squares method has the lowest computational cost in filter implementation, but its complexity in filter estimation is much higher than that of the other two methods. The CAPZ method, on the other hand, has the lowest complexity in filter estimation and ranks second in filter implementation complexity. Taking both measures into account, the CAPZ method is the most efficient of the three. The performance of the three methods is compared in Section 4.3 under the same assumption with .
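The arithmetic below makes the implementation-cost ranking concrete. It rests on a hypothetical assumption: each 2x2 canceller runs as four direct-form FIR filters, each with the total crosstalk cancellation filter length reported in Section 4.3 for an inverse filter length of 150 (150, 349, and 233 taps). Under that assumption, the ordering matches the statement above. A factored realization, such as one shared filter followed by short per-path filters, would change the absolute counts.

```python
# Crosstalk cancellation filter lengths reported for an inverse filter length of 150
FILTER_LENGTHS = {"Least-squares": 150, "Single-filter structure": 349, "CAPZ": 233}


def fir_mults_per_sample(length, n_filters=4):
    """Multiplications per stereo output sample for n_filters direct-form FIR filters.

    Hypothetical realization: the 2x2 canceller is four independent FIR filters of
    equal length, costing one multiplication per tap.
    """
    return n_filters * length


costs = {method: fir_mults_per_sample(n) for method, n in FILTER_LENGTHS.items()}
for method, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{method:<24}{cost:>5} multiplications per stereo sample")
```

This prints 600 for least-squares, 932 for CAPZ, and 1396 for the single-filter structure.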
4. Performance Evaluation
The acoustic transfer function can be estimated from the positions of the loudspeakers and ears. Head-related transfer functions (HRTFs) characterize the transfer path of a sound from a point in space to the ear canal. This paper assumes that the acoustic transfer functions can be represented by HRTFs under anechoic conditions. The HRTFs used in our experiments are taken from the extensive set measured at the CIPIC Interface Laboratory, University of California, Davis [29]. The database contains HRTFs for 45 subjects. For each subject, HRTFs were measured at 1250 directions (25 azimuths and 50 elevations). Each HRTF is 200 taps long with a sampling rate of 44.1 kHz. In the experiments, the HRTFs are first modeled with CAPZ models. The performance of the proposed crosstalk cancellation method is then evaluated for two loudspeaker placements: symmetric and asymmetric.
4.1. Experiments on CAPZ Modeling
4.2. Performance Metrics
and the average signal-to-crosstalk ratio is given by .
and the average signal-to-distortion ratio is .
According to these definitions, the signal-to-crosstalk ratio measures crosstalk suppression performance, and the signal-to-distortion ratio measures signal reproduction performance.
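The original SCR and SDR equations are not recoverable here, so the sketch below uses one plausible pair of energy-ratio definitions. These definitions are an assumption and not necessarily the paper's. The global response is p_ik = Σ_j c_ij * h_jk, from binaural input k to ear i. SCR_i compares the energy of the direct term p_ii with that of the crosstalk term p_ik (k ≠ i). SDR_i compares the energy of the ideal delayed impulse with that of p_ii minus that impulse. Both are averaged over the two ears. `global_response` and `scr_sdr_db` are hypothetical names.

```python
import numpy as np


def global_response(c, h):
    """p[i][k]: response from binaural input k to ear i, i.e. sum_j c[i, j] * h[j, k]."""
    return [[sum(np.convolve(c[i, j], h[j, k]) for j in range(2)) for k in range(2)]
            for i in range(2)]


def scr_sdr_db(c, h, delay):
    """Average SCR and SDR in dB under hypothetical energy-ratio definitions.

    SCR_i: energy of the direct term p_ii over energy of the crosstalk term p_ik (k != i).
    SDR_i: energy of the ideal delayed impulse over energy of (p_ii - delayed impulse).
    """
    p = global_response(c, h)
    scr, sdr = [], []
    for i in range(2):
        direct, cross = p[i][i], p[i][1 - i]
        ideal = np.zeros_like(direct)
        ideal[delay] = 1.0
        scr.append(10 * np.log10(np.sum(direct ** 2) / np.sum(cross ** 2)))
        sdr.append(10 * np.log10(np.sum(ideal ** 2) / np.sum((direct - ideal) ** 2)))
    return float(np.mean(scr)), float(np.mean(sdr))
```

If the crosstalk term vanishes exactly, the SCR computed this way is infinite, and NumPy emits a divide-by-zero warning.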
4.3. Performance Evaluation in Symmetric Cases
Optimal delay at various inverse filter lengths (in samples) for the three methods: the least-squares method (LS), the single-filter structure method (SF), and the CAPZ method.
Filter length  LS  SF  CAPZ 

50  50  100  100 
100  100  150  150 
150  100  150  150 
200  150  200  200 
250  150  200  200 
300  200  250  250 
350  200  250  250 
400  250  300  300 
From Figures 3–6, similar trends in the signal-to-distortion ratio (SDR) and signal-to-crosstalk ratio (SCR) can be observed in both the noisy and noise-free cases. For all three methods, the SDR increases with the inverse filter length , and the increase is small for . The slow growth of the SDR for large may be related to the least-squares matrix inversion. As increases, the matrices , and grow, the matrix inversion becomes more difficult, and more numerical error is introduced. This error may offset part of the benefit of a longer inverse filter, so the SDR increases only slowly for large inverse filter lengths. Regarding SCR, the least-squares method yields an SCR that increases with the inverse filter length, while the single-filter structure method and the CAPZ method yield an almost constant SCR. Since the off-diagonal items of (21) are always zeros regardless of the value of , the SCR of the single-filter structure method is hardly affected by the inverse filter length. The CAPZ method shows a similar trend. In Figure 6, the curves of the CAPZ method and the single-filter structure method also decrease slowly, which may be caused by the noise added to the acoustic transfer functions.
Mean crosstalk cancellation performance in the symmetric case for the three methods when the inverse filter length equals 150.
Method  SDR(dB)  SCR(dB)  Crosstalk cancellation filter length 

Least-squares  11.2  15.6  150 
Single-filter structure  7.1  26.8  349 
CAPZ  8.6  17.6  233 
4.4. Performance Evaluation in Asymmetric Cases
Crosstalk cancellation performance in the asymmetric case for the three methods when the inverse filter length equals 150.
Method  SDR(dB)  SCR(dB) 

Least-squares  14.7  18.9 
Single-filter structure  10.2  27.7 
CAPZ  12.0  19.1 
5. Conclusion
This paper investigates crosstalk cancellation for authentic binaural reproduction of stereo sounds over two loudspeakers. Since the crosstalk cancellation filter has to be updated in real time according to the head position, the computational efficiency of the crosstalk cancellation algorithm is crucial for practical applications. To reduce the computational cost, this paper presents a novel crosstalk cancellation system based on common-acoustical pole/zero (CAPZ) models. The acoustic transfer paths from the loudspeakers to the ears are approximated with CAPZ models, and the crosstalk cancellation filter is then designed from the CAPZ model. Since the CAPZ model has advantages in storage and computation, the proposed method is more efficient than conventional ones. Simulation results show that the proposed method greatly reduces the computational complexity while achieving crosstalk cancellation performance comparable to that of conventional methods.
The experiments in this paper are conducted under anechoic conditions. Given the promising results in anechoic environments, the proposed method may be extended to realistic situations. For example, under reverberant conditions, the acoustic transfer functions may also be approximated by CAPZ models, and crosstalk cancellation may then be conducted in a similar way. However, because of the large computational complexity and time-varying environments involved, this situation has not been addressed here. Our further research will focus on this practical problem.
Declarations
Acknowledgments
This work is supported by the National Natural Science Foundation of China (60772161, 60372082) and the Specialized Research Fund for the Doctoral Program of Higher Education of China (200801410015). This work is also supported by the NRC-MOE Research and Postdoctoral Fellowship Program from the Ministry of Education of China and the National Research Council of Canada. The authors gratefully acknowledge stimulating discussions with Dr. Heping Ding and Dr. Michael R. Stinson from the Institute for Microstructural Sciences, National Research Council Canada.
References
[1] Begault DR: 3D Sound for Virtual Reality and Multimedia. 1st edition. Academic Press, London, UK; 1994.
[2] Bronkhorst AW: Localization of real and virtual sound sources. Journal of the Acoustical Society of America 1995, 98(5):2542–2553. 10.1121/1.413219
[3] Gardner WG, Martin KD: HRTF measurements of a KEMAR. Journal of the Acoustical Society of America 1995, 97(6):3907–3908. 10.1121/1.412407
[4] Otani M, Ise S: Fast calculation system specialized for head-related transfer function based on boundary element method. Journal of the Acoustical Society of America 2006, 119(5):2589–2598. 10.1121/1.2191608
[5] Atal BS, Schroeder MR: Apparent sound source translator. US Patent no. 3,236,949, 1966.
[6] Bauer BB: Stereophonic earphones and binaural loudspeakers. Journal of the Audio Engineering Society 1961, 9(2):148–151.
[7] Kirkeby O, Nelson PA, Hamada H, Orduna-Bustamante F: Fast deconvolution of multichannel systems using regularization. IEEE Transactions on Speech and Audio Processing 1998, 6(2):189–194. 10.1109/89.661479
[8] Huang Y, Benesty J, Chen J: On crosstalk cancellation and equalization with multiple loudspeakers for 3D sound reproduction. IEEE Signal Processing Letters 2007, 14(10):649–652.
[9] Garas J: Adaptive 3D Sound Systems. Kluwer Academic Publishers, Norwell, Mass, USA; 2000.
[10] Mouchtaris A, Reveliotis P, Kyriakakis C: Inverse filter design for immersive audio rendering over loudspeakers. IEEE Transactions on Multimedia 2000, 2(2):77–87. 10.1109/6046.845012
[11] Nelson PA, Hamada H, Elliott SJ: Adaptive inverse filters for stereophonic sound reproduction. IEEE Transactions on Signal Processing 1992, 40(7):1621–1632. 10.1109/78.143434
[12] Gonzalez A, Lopez JJ: Time domain recursive deconvolution in sound reproduction. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, June 2000, 833–836.
[13] Kuo SM, Canfield GH: Dual-channel audio equalization and crosstalk cancellation for 3D sound reproduction. IEEE Transactions on Consumer Electronics 1997, 43(4):1189–1196. 10.1109/30.642386
[14] Kyriakakis C: Fundamental and Technological Limitations of Immersive Audio Systems. Proceedings of the IEEE 1998, 86(5):941–951. 10.1109/5.664281
[15] Bai MR, Lee CC: Objective and subjective analysis of effects of listening angle on crosstalk cancellation in spatial sound reproduction. Journal of the Acoustical Society of America 2006, 120(4):1976–1989. 10.1121/1.2257986
[16] Lentz T: Dynamic crosstalk cancellation for binaural synthesis in virtual reality environments. Journal of the Audio Engineering Society 2006, 54(4):283–294.
[17] Ward DB, Elko GW: Effect of loudspeaker position on the robustness of acoustic crosstalk cancellation. IEEE Signal Processing Letters 1999, 6(5):106–108. 10.1109/97.755428
[18] Takeuchi T, Nelson PA: Optimal source distribution for binaural synthesis over loudspeakers. Journal of the Acoustical Society of America 2002, 112(6):2786–2797. 10.1121/1.1513363
[19] Bai MR, Tung CW, Lee CC: Optimal design of loudspeaker arrays for robust crosstalk cancellation using the Taguchi method and the genetic algorithm. Journal of the Acoustical Society of America 2005, 117(5):2802–2813. 10.1121/1.1880852
[20] Yang J, Gan WS, Tan SE: Improved sound separation using three loudspeakers. Acoustic Research Letters Online 2003, 4:47–52. 10.1121/1.1566419
[21] Kim Y, Deille O, Nelson PA: Crosstalk cancellation in virtual acoustic imaging systems for multiple listeners. Journal of Sound and Vibration 2006, 297(1–2):251–266. 10.1016/j.jsv.2006.03.042
[22] Kallinger M, Mertins A: A spatially robust least squares crosstalk canceller. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), April 2007, 177–180.
[23] Haneda Y, Makino S, Kaneda Y: Common acoustical pole and zero modeling of room transfer functions. IEEE Transactions on Speech and Audio Processing 1994, 2(2):320–328. 10.1109/89.279281
[24] Haneda Y, Makino S, Kaneda Y, Kitawaki N: Common-acoustical-pole and zero modeling of head-related transfer functions. IEEE Transactions on Speech and Audio Processing 1999, 7(2):188–195. 10.1109/89.748123
[25] Golub GH, Van Loan CF: Matrix Computations. 3rd edition. Johns Hopkins University Press, Baltimore, Md, USA; 1996.
[26] Kim SM, Wang S: A Wiener filter approach to the binaural reproduction of stereo sound. Journal of the Acoustical Society of America 2003, 114(6):3179–3188. 10.1121/1.1624070
[27] Wang L, Yin F, Chen Z: HRTF compression via principal components analysis and vector quantization. IEICE Electronics Express 2008, 5(9):321–325. 10.1587/elex.5.321
[28] Grantham DW, Willhite JA, Frampton KD, Ashmead DH: Reduced order modeling of head related impulse responses for virtual acoustic displays. Journal of the Acoustical Society of America 2005, 117(5):3116–3125. 10.1121/1.1882944
[29] Algazi VR, Duda RO, Thompson DM, Avendano C: The CIPIC HRTF database. Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 2001, 99–102.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.