A Robust Statistical-Based Speaker's Location Detection Algorithm in a Vehicular Environment

Hu, Jwu-Sheng; Cheng, Chieh-Cheng; Liu, Wei-Han

doi:10.1155/2007/13601

Research Article
Open access
Published: 01 December 2006

EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 013601 (2006)

Abstract

This work presents a robust speaker's location detection algorithm that uses a single linear microphone array and can detect multiple speech sources, under the assumption that nonoverlapped speech segments exist among the sources. Overlapped speech segments are treated as uncertain and are not used for detection. The location detection algorithm is derived from a previous work (2006), in which Gaussian mixture models (GMMs) are used to model phase difference distributions that are location-dependent but content- and speaker-independent. The proposed algorithm is proven to be robust against complex vehicular acoustics, including noise, reverberation, near-field, far-field, line-of-sight, and non-line-of-sight conditions, as well as microphone mismatch. An adaptive system architecture is developed to adjust the Gaussian mixture (GM) location model to environmental noise. A threshold adaptation scheme is proposed to deal with unmodeled speech sources and overlapped speech signals. Experimental results demonstrate high detection accuracy in a noisy vehicular environment.
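
The detection idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each frame has already been reduced to a short vector of inter-microphone phase differences, that one diagonal-covariance GMM per candidate location has already been trained, and that a fixed rejection threshold stands in for the adaptive threshold scheme, whose details are not given in this abstract. The names `diag_gmm_loglik`, `detect_location`, and `MODELS`, and all numeric values, are hypothetical.

```python
import math


def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian mixture."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp_logs.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))


def detect_location(frames, models, threshold):
    """Pick the location whose GMM best explains the frames.

    Returns (label, score). The label is None when the best average
    log-likelihood falls below the threshold, which is how this sketch
    rejects unmodeled sources or overlapped speech.
    """
    best_label, best_score = None, -math.inf
    for label, (weights, means, variances) in models.items():
        total = sum(diag_gmm_loglik(x, weights, means, variances) for x in frames)
        score = total / len(frames)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Hypothetical, hand-picked location models: (weights, means, variances).
MODELS = {
    "driver": ([0.6, 0.4], [[0.5, 1.0], [0.7, 1.2]], [[0.05, 0.05], [0.05, 0.05]]),
    "passenger": ([1.0], [[-0.6, -1.1]], [[0.05, 0.05]]),
}

if __name__ == "__main__":
    print(detect_location([[0.55, 1.05], [0.5, 1.0]], MODELS, threshold=-10.0))
    print(detect_location([[3.0, -3.0]], MODELS, threshold=-10.0))
```

In this sketch, frames close to a modeled location score well under that location's mixture, while frames that fit no model fall below the threshold and are left unassigned.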

References

Ryan JG, Goubran RA: Application of near-field optimum microphone arrays to hands-free mobile telephony. IEEE Transactions on Vehicular Technology 2003,52(2):390–400.
Pulasinghe K, Watanabe K, Izumi K, Kiguchi K: Modular fuzzy-neuro controller driven by spoken language commands. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2004,34(1):293–302. 10.1109/TSMCB.2003.811511
Herbordt W, Horiuchi T, Fujimoto M, Jitsuhiro T, Nakamura S: Noise-robust hands-free speech recognition on PDAs using microphone array technology. Autumn Meeting of the Acoustical Society of Japan, September 2005, Sendai, Japan 51–54.
Gannot S, Burshtein D, Weinstein E: Signal enhancement using beamforming and nonstationarity with applications to speech. IEEE Transactions on Signal Processing 2001,49(8):1614–1626. 10.1109/78.934132
Aarabi P, Shi G: Phase-based dual-microphone robust speech enhancement. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2004,34(4):1763–1773. 10.1109/TSMCB.2004.830345
Hu J-S, Cheng C-C: Frequency domain microphone array calibration and beamforming for automatic speech recognition. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 2005,E88-A(9):2401–2411. 10.1093/ietfec/e88-a.9.2401
Ahn S, Ko H: Background noise reduction via dual-channel scheme for speech recognition in vehicular environment. IEEE Transactions on Consumer Electronics 2005,51(1):22–27. 10.1109/TCE.2005.1405694
Carter GC, Nuttall AH, Cable PG: The smoothed coherence transform. Proceedings of the IEEE 1973,61(10):1497–1498.
Knapp CH, Carter GC: The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing 1976, 24: 320–327. 10.1109/TASSP.1976.1162830
Bienvenu G: Eigensystem properties of the sampled space correlation matrix. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '83), 1983, Boston, Mass, USA 8: 332–335.
Wax M, Shan T-J, Kailath T: Spatio-temporal spectral analysis by eigenstructure methods. IEEE Transactions on Acoustics, Speech, and Signal Processing 1984,32(4):817–827. 10.1109/TASSP.1984.1164400
Wang H, Kaveh M: Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources. IEEE Transactions on Acoustics, Speech, and Signal Processing 1985,33(4):823–831. 10.1109/TASSP.1985.1164667
Smith JO, Abel JS: Closed-form least-squares source location estimation from range-difference measurements. IEEE Transactions on Acoustics, Speech, and Signal Processing 1987,35(12):1661–1669. 10.1109/TASSP.1987.1165089
Hu J-S, Cheng C-C, Liu W-H, Su TM: A speaker tracking system with distance estimation using microphone array. Proceedings of the IEEE/ASME International Conference on Advanced Manufacturing Technologies and Education, August 2002, Chiayi, Taiwan 485–494.
Hu J-S, Su TM, Cheng C-C, Liu W-H, Wu TI: A self-calibrated speaker tracking system using both audio and video data. Proceedings of the IEEE Conference on Control Applications, September 2002, Glasgow, Scotland 2: 731–735.
Omologo M, Svaizer P: Acoustic source location in noisy and reverberant environment using CSP analysis. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '96), May 1996, Atlanta, Ga, USA 901–904.
Brandstein MS, Silverman HF: A robust method for speech signal time-delay estimation in reverberant rooms. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), April 1997, Munich, Germany 1: 375–378.
Strobel N, Rabenstein R: Classification of time delay estimates for robust speaker localization. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '99), March 1999, Phoenix, Ariz, USA 6: 3081–3084.
Mavandadi S, Aarabi P: Multichannel nonlinear phase analysis for time-frequency data fusion. Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2003, April 2003, Orlando, Fla, USA, Proceedings of SPIE 5099: 222–231.
Aarabi P, Mavandadi S: Robust sound localization using conditional time-frequency histograms. Information Fusion 2003,4(2):111–122. 10.1016/S1566-2535(03)00003-4
Ward DB, Williamson RC: Particle filter beamforming for acoustic source localization in a reverberant environment. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 2: 1777–1780.
Potamitis I, Chen H, Tremoulis G: Tracking of multiple moving speakers with multiple microphone arrays. IEEE Transactions on Speech and Audio Processing 2004,12(5):520–529. 10.1109/TSA.2004.833004
Chen JC, Yao K, Hudson RE: Acoustic source localization and beamforming: theory and practice. EURASIP Journal on Applied Signal Processing 2003,2003(4):359–370. 10.1155/S1110865703212038
Chung P-J, Böhme JF, Hero AO: Tracking of multiple moving sources using recursive EM algorithm. EURASIP Journal on Applied Signal Processing 2005,2005(1):50–60. 10.1155/ASP.2005.50
Ng BC, See CMS: Sensor-array calibration using a maximum-likelihood approach. IEEE Transactions on Antennas and Propagation 1996,44(6):827–835. 10.1109/8.509886
Ward DB, Lehmann EA, Williamson RC: Particle filtering algorithms for tracking an acoustic source in a reverberant environment. IEEE Transactions on Speech and Audio Processing 2003,11(6):826–836. 10.1109/TSA.2003.818112
Hu J-S, Cheng C-C, Liu W-H: Robust speaker's location detection in a vehicle environment using GMM models. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2006,36(2):403–412.
Reynolds DA, Rose RC: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 1995,3(1):72–83. 10.1109/89.365379
Ramírez J, Segura JC, Benítez C, De la Torre A, Rubio Á: Efficient voice activity detection algorithms using long-term speech information. Speech Communication 2004,42(3–4):271–287. 10.1016/j.specom.2003.10.002
Potamitis I: Estimation of speech presence probability in the field of microphone array. IEEE Signal Processing Letters 2004,11(12):956–959. 10.1109/LSP.2004.838200
Brandstein M, Ward D: Microphone Arrays: Signal Processing Techniques and Applications. Springer, New York, NY, USA; 2001. chapter 2
Reynolds DA, Rose RC: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 1995,3(1):72–83. 10.1109/89.365379
Xuan G, Zhang W, Chai P: EM algorithms of Gaussian mixture model and hidden Markov model. Proceedings of the IEEE International Conference on Image Processing (ICIP '01), October 2001, Thessaloniki, Greece 1: 145–148.
Mitsubishi Motors - Savrin (www.sym-motor.com.tw/savrin-1.htm)
Ryan JG, Goubran RA: Near-field beamforming for microphone arrays. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), April 1997, Munich, Germany 1: 363–366.
Vries DD, Hulsebos EM, Bann J: Spatial fluctuations in measures for spaciousness. Journal of the Acoustical Society of America 2001,110(2):947–954. 10.1121/1.1377634
Pelorson X, Vian J-P, Polack J-D: On the variability of room acoustical parameters: reproducibility and statistical validity. Applied Acoustics 1992,37(3):175–198. 10.1016/0003-682X(92)90002-A

Author information

Authors and Affiliations

Department of Electrical and Control Engineering, National Chiao Tung University, Hsinchu, 300, Taiwan
Jwu-Sheng Hu, Chieh-Cheng Cheng & Wei-Han Liu

Corresponding author

Correspondence to Jwu-Sheng Hu.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Hu, JS., Cheng, CC. & Liu, WH. A Robust Statistical-Based Speaker's Location Detection Algorithm in a Vehicular Environment. EURASIP J. Adv. Signal Process. 2007, 013601 (2006). https://doi.org/10.1155/2007/13601

Received: 01 May 2006
Revised: 27 July 2006
Accepted: 26 August 2006
Published: 01 December 2006
DOI: https://doi.org/10.1155/2007/13601
