- Research Article
- Open access
A Robust Statistical-Based Speaker's Location Detection Algorithm in a Vehicular Environment
EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 013601 (2006)
Abstract
This work presents a robust speaker's location detection algorithm that uses a single linear microphone array and can detect multiple speech sources, under the assumption that nonoverlapped speech segments exist among the sources. Overlapped speech segments are treated as uncertainty and are not used for detection. The location detection algorithm is derived from a previous work (2006), in which Gaussian mixture models (GMMs) are used to model phase difference distributions that are location-dependent but content- and speaker-independent. The proposed algorithm is shown to be robust against complex vehicular acoustics, including noise, reverberation, near-field, far-field, line-of-sight, and non-line-of-sight conditions, as well as microphone mismatch. An adaptive system architecture is developed to adjust the Gaussian mixture (GM) location model to environmental noise. To deal with unmodeled speech sources as well as overlapped speech signals, a threshold adaptation scheme is proposed. Experimental results demonstrate high detection accuracy in a noisy vehicular environment.
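The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the diagonal-covariance form of the mixtures, the two example locations, all numeric parameters, and the fixed rejection threshold are assumptions made for illustration (the paper adapts its threshold, which is not reproduced here). The sketch computes inter-microphone phase differences, scores them against one GMM per candidate location, and rejects a frame whose best score falls below the threshold, as would be expected for an unmodeled or overlapped source.

```python
import numpy as np


def phase_differences(frame_a, frame_b, bins):
    """Wrapped phase difference between two microphone frames at the given FFT bins."""
    spec_a = np.fft.rfft(frame_a)
    spec_b = np.fft.rfft(frame_b)
    # angle(A * conj(B)) equals phase(A) - phase(B), wrapped to (-pi, pi].
    return np.angle(spec_a[bins] * np.conj(spec_b[bins]))


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    weights: (K,), means: (K, D), variances: (K, D).
    """
    x = np.asarray(x, dtype=float)
    diff = x[None, :] - means
    log_comp = (
        np.log(weights)
        - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
        - 0.5 * np.sum(diff ** 2 / variances, axis=1)
    )
    # Log-sum-exp over mixture components for numerical stability.
    peak = np.max(log_comp)
    return peak + np.log(np.sum(np.exp(log_comp - peak)))


def detect_location(x, models, threshold):
    """Return (best location or None, per-location scores).

    None means the best score is below the (here fixed, hypothetical) threshold,
    i.e. the frame is treated as an unmodeled or overlapped source.
    """
    scores = {
        name: gmm_log_likelihood(x, *params) for name, params in models.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores


# Hypothetical location models over a 2-dimensional phase-difference feature.
models = {
    "driver": (
        np.array([0.6, 0.4]),
        np.array([[0.5, 1.0], [0.7, 1.2]]),
        np.full((2, 2), 0.05),
    ),
    "passenger": (
        np.array([0.5, 0.5]),
        np.array([[-0.5, -1.0], [-0.7, -1.2]]),
        np.full((2, 2), 0.05),
    ),
}

near_driver, _ = detect_location(np.array([0.55, 1.05]), models, threshold=-10.0)
unmodeled, _ = detect_location(np.array([3.0, -3.0]), models, threshold=-10.0)
print(near_driver, unmodeled)

# Phase difference of two synthetic tones offset by 0.5 rad at FFT bin 4.
n = np.arange(64)
tone_a = np.cos(2.0 * np.pi * 4 * n / 64)
tone_b = np.cos(2.0 * np.pi * 4 * n / 64 - 0.5)
print(phase_differences(tone_a, tone_b, [4]))
```

In this sketch each location has its own mixture, so adding a seat only requires adding one more entry to `models`; a real system would train the mixture parameters from recorded data rather than set them by hand.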
References
Ryan JG, Goubran RA: Application of near-field optimum microphone arrays to hands-free mobile telephony. IEEE Transactions on Vehicular Technology 2003,52(2):390–400.
Pulasinghe K, Watanabe K, Izumi K, Kiguchi K: Modular fuzzy-neuro controller driven by spoken language commands. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2004,34(1):293–302. 10.1109/TSMCB.2003.811511
Herbordt W, Horiuchi T, Fujimoto M, Jitsuhiro T, Nakamura S: Noise-robust hands-free speech recognition on PDAs using microphone array technology. Autumn Meeting of the Acoustical Society of Japan, September 2005, Sendai, Japan 51–54.
Gannot S, Burshtein D, Weinstein E: Signal enhancement using beamforming and nonstationarity with applications to speech. IEEE Transactions on Signal Processing 2001,49(8):1614–1626. 10.1109/78.934132
Aarabi P, Shi G: Phase-based dual-microphone robust speech enhancement. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2004,34(4):1763–1773. 10.1109/TSMCB.2004.830345
Hu J-S, Cheng C-C: Frequency domain microphone array calibration and beamforming for automatic speech recognition. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 2005,E88-A(9):2401–2411. 10.1093/ietfec/e88-a.9.2401
Ahn S, Ko H: Background noise reduction via dual-channel scheme for speech recognition in vehicular environment. IEEE Transactions on Consumer Electronics 2005,51(1):22–27. 10.1109/TCE.2005.1405694
Carter GC, Nuttall AH, Cable PG: The smoothed coherence transform. Proceedings of the IEEE 1973,61(10):1497–1498.
Knapp CH, Carter GC: The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing 1976, 24: 320–327. 10.1109/TASSP.1976.1162830
Bienvenu G: Eigensystem properties of the sampled space correlation matrix. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '83), 1983, Boston, Mass, USA 8: 332–335.
Wax M, Shan T-J, Kailath T: Spatio-temporal spectral analysis by eigenstructure methods. IEEE Transactions on Acoustics, Speech, and Signal Processing 1984,32(4):817–827. 10.1109/TASSP.1984.1164400
Wang H, Kaveh M: Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources. IEEE Transactions on Acoustics, Speech, and Signal Processing 1985,33(4):823–831. 10.1109/TASSP.1985.1164667
Smith JO, Abel JS: Closed-form least-squares source location estimation from range-difference measurements. IEEE Transactions on Acoustics, Speech, and Signal Processing 1987,35(12):1661–1669. 10.1109/TASSP.1987.1165089
Hu J-S, Cheng C-C, Liu W-H, Su TM: A speaker tracking system with distance estimation using microphone array. Proceedings of the IEEE/ASME International Conference on Advanced Manufacturing Technologies and Education, August 2002, Chiayi, Taiwan 485–494.
Hu J-S, Su TM, Cheng C-C, Liu W-H, Wu TI: A self-calibrated speaker tracking system using both audio and video data. Proceedings of the IEEE Conference on Control Applications, September 2002, Glasgow, Scotland 2: 731–735.
Omologo M, Svaizer P: Acoustic source location in noisy and reverberant environment using CSP analysis. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '96), May 1996, Atlanta, Ga, USA 901–904.
Brandstein MS, Silverman HF: A robust method for speech signal time-delay estimation in reverberant rooms. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), April 1997, Munich, Germany 1: 375–378.
Strobel N, Rabenstein R: Classification of time delay estimates for robust speaker localization. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '99), March 1999, Phoenix, Ariz, USA 6: 3081–3084.
Mavandadi S, Aarabi P: Multichannel nonlinear phase analysis for time-frequency data fusion. Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2003, April 2003, Orlando, Fla, USA, Proceedings of SPIE 5099: 222–231.
Aarabi P, Mavandadi S: Robust sound localization using conditional time-frequency histograms. Information Fusion 2003,4(2):111–122. 10.1016/S1566-2535(03)00003-4
Ward DB, Williamson RC: Particle filter beamforming for acoustic source localization in a reverberant environment. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 2: 1777–1780.
Potamitis I, Chen H, Tremoulis G: Tracking of multiple moving speakers with multiple microphone arrays. IEEE Transactions on Speech and Audio Processing 2004,12(5):520–529. 10.1109/TSA.2004.833004
Chen JC, Yao K, Hudson RE: Acoustic source localization and beamforming: theory and practice. EURASIP Journal on Applied Signal Processing 2003,2003(4):359–370. 10.1155/S1110865703212038
Chung P-J, Böhme JF, Hero AO: Tracking of multiple moving sources using recursive EM algorithm. EURASIP Journal on Applied Signal Processing 2005,2005(1):50–60. 10.1155/ASP.2005.50
Ng BC, See CMS: Sensor-array calibration using a maximum-likelihood approach. IEEE Transactions on Antennas and Propagation 1996,44(6):827–835. 10.1109/8.509886
Ward DB, Lehmann EA, Williamson RC: Particle filtering algorithms for tracking an acoustic source in a reverberant environment. IEEE Transactions on Speech and Audio Processing 2003,11(6):826–836. 10.1109/TSA.2003.818112
Hu J-S, Cheng C-C, Liu W-H: Robust speaker's location detection in a vehicle environment using GMM models. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2006,36(2):403–412.
Reynolds DA, Rose RC: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 1995,3(1):72–83. 10.1109/89.365379
Ramírez J, Segura JC, Benítez C, De la Torre A, Rubio Á: Efficient voice activity detection algorithms using long-term speech information. Speech Communication 2004,42(3–4):271–287. 10.1016/j.specom.2003.10.002
Potamitis I: Estimation of speech presence probability in the field of microphone array. IEEE Signal Processing Letters 2004,11(12):956–959. 10.1109/LSP.2004.838200
Brandstein M, Ward D: Microphone Arrays: Signal Processing Techniques and Applications. Springer, New York, NY, USA; 2001. chapter 2
Xuan G, Zhang W, Chai P: EM algorithms of Gaussian mixture model and hidden Markov model. Proceedings of the IEEE International Conference on Image Processing (ICIP '01), October 2001, Thessaloniki, Greece 1: 145–148.
Mitsubishi Motors - Savrin (www.sym-motor.com.tw/savrin-1.htm)
Ryan JG, Goubran RA: Near-field beamforming for microphone arrays. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), April 1997, Munich, Germany 1: 363–366.
Vries DD, Hulsebos EM, Bann J: Spatial fluctuations in measures for spaciousness. Journal of the Acoustical Society of America 2001,110(2):947–954. 10.1121/1.1377634
Pelorson X, Vian J-P, Polack J-D: On the variability of room acoustical parameters: reproducibility and statistical validity. Applied Acoustics 1992,37(3):175–198. 10.1016/0003-682X(92)90002-A
Rights and permissions
Open Access: This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Hu, JS., Cheng, CC. & Liu, WH. A Robust Statistical-Based Speaker's Location Detection Algorithm in a Vehicular Environment. EURASIP J. Adv. Signal Process. 2007, 013601 (2006). https://doi.org/10.1155/2007/13601
DOI: https://doi.org/10.1155/2007/13601