  • Research Article
  • Open Access

A Robust Statistical-Based Speaker's Location Detection Algorithm in a Vehicular Environment

EURASIP Journal on Advances in Signal Processing 2006, 2007:013601

https://doi.org/10.1155/2007/13601

  • Received: 1 May 2006
  • Accepted: 26 August 2006
  • Published:

Abstract

This work presents a robust speaker's location detection algorithm that uses a single linear microphone array and can detect multiple speech sources, under the assumption that nonoverlapped speech segments exist among the sources. That is, overlapped speech segments are treated as uncertain and are not used for detection. The location detection algorithm is derived from a previous work (2006), in which Gaussian mixture models (GMMs) are used to model location-dependent but content- and speaker-independent phase difference distributions. The proposed algorithm is proven to be robust against complex vehicular acoustics, including noise, reverberation, near-field, far-field, line-of-sight, and non-line-of-sight conditions, and microphone mismatch. An adaptive system architecture is developed to adjust the Gaussian mixture (GM) location model to environmental noise. A threshold adaptation scheme is proposed to deal with unmodeled speech sources as well as overlapped speech signals. Experimental results demonstrate high detection accuracy in a noisy vehicular environment.
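The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single scalar phase-difference feature per frame, synthetic training data, and a fixed-margin rejection threshold as a stand-in for the paper's threshold adaptation scheme. All names (`fit_gmm`, `avg_loglik`, `detect`, `sample`) and all numeric values are hypothetical.

```python
import math
import random

def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def fit_gmm(data, k=2, iters=50):
    """Fit a 1-D k-component GMM with EM; returns [(weight, mean, var), ...]."""
    srt = sorted(data)
    n = len(srt)
    mean_all = sum(srt) / n
    var_all = sum((x - mean_all) ** 2 for x in srt) / n + 1e-6
    # Initialise means at evenly spaced quantiles with a shared variance.
    comps = [(1.0 / k, srt[(2 * j + 1) * n // (2 * k)], var_all) for j in range(k)]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in data:
            logs = [math.log(w) + gauss_logpdf(x, m, v) for w, m, v in comps]
            z = logsumexp(logs)
            resp.append([math.exp(l - z) for l in logs])
        # M-step: re-estimate weights, means, and variances (variance floored).
        new = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mj = sum(r[j] * x for r, x in zip(resp, data)) / nj
            vj = sum(r[j] * (x - mj) ** 2 for r, x in zip(resp, data)) / nj + 1e-6
            new.append((nj / n, mj, vj))
        comps = new
    return comps

def avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of the frames under one location model."""
    total = sum(logsumexp([math.log(w) + gauss_logpdf(x, m, v) for w, m, v in gmm])
                for x in frames)
    return total / len(frames)

def detect(frames, models, threshold):
    """Return the best-scoring location, or None if no model clears the threshold."""
    scores = {loc: avg_loglik(frames, gmm) for loc, gmm in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

def sample(rng, parts, n):
    """Draw n values from an equal-weight mixture given as [(mean, std), ...]."""
    return [rng.gauss(*rng.choice(parts)) for _ in range(n)]

# Synthetic phase differences (radians) for two modeled seats; values are invented.
rng = random.Random(0)
train = {
    "driver": sample(rng, [(0.8, 0.10), (1.0, 0.15)], 400),
    "passenger": sample(rng, [(-0.9, 0.10), (-0.7, 0.15)], 400),
}
models = {loc: fit_gmm(x) for loc, x in train.items()}
# Assumed rejection rule: a fixed margin below the weakest training score.
threshold = min(avg_loglik(train[loc], models[loc]) for loc in models) - 3.0
```

Calling `detect(frames, models, threshold)` on frames from a modeled seat returns that seat's label, while frames whose phase differences match neither model fall below the threshold and return `None`, which mirrors how unmodeled or overlapped speech is excluded from detection.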

Keywords

  • Mixture Model
  • Acoustics
  • Speech Signal
  • Adaptation Scheme
  • Gaussian Mixture Model

[1–37]

Authors’ Affiliations

(1) Department of Electrical and Control Engineering, National Chiao Tung University, Hsinchu, 300, Taiwan

References

  1. Ryan JG, Goubran RA: Application of near-field optimum microphone arrays to hands-free mobile telephony. IEEE Transactions on Vehicular Technology 2003, 52(2):390-400.
  2. Pulasinghe K, Watanabe K, Izumi K, Kiguchi K: Modular fuzzy-neuro controller driven by spoken language commands. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2004, 34(1):293-302. 10.1109/TSMCB.2003.811511
  3. Herbordt W, Horiuchi T, Fujimoto M, Jitsuhiro T, Nakamura S: Noise-robust hands-free speech recognition on PDAs using microphone array technology. Autumn Meeting of the Acoustical Society of Japan, September 2005, Sendai, Japan 51-54.
  4. Gannot S, Burshtein D, Weinstein E: Signal enhancement using beamforming and nonstationarity with applications to speech. IEEE Transactions on Signal Processing 2001, 49(8):1614-1626. 10.1109/78.934132
  5. Aarabi P, Shi G: Phase-based dual-microphone robust speech enhancement. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2004, 34(4):1763-1773. 10.1109/TSMCB.2004.830345
  6. Hu J-S, Cheng C-C: Frequency domain microphone array calibration and beamforming for automatic speech recognition. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 2005, E88-A(9):2401-2411. 10.1093/ietfec/e88-a.9.2401
  7. Ahn S, Ko H: Background noise reduction via dual-channel scheme for speech recognition in vehicular environment. IEEE Transactions on Consumer Electronics 2005, 51(1):22-27. 10.1109/TCE.2005.1405694
  8. Carter GC, Nuttall AH, Cable PG: The smoothed coherence transform. Proceedings of the IEEE 1973, 61(10):1497-1498.
  9. Knapp CH, Carter GC: The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing 1976, 24:320-327. 10.1109/TASSP.1976.1162830
  10. Bienvenu G: Eigensystem properties of the sampled space correlation matrix. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '83), 1983, Boston, Mass, USA 8:332-335.
  11. Wax M, Shan T-J, Kailath T: Spatio-temporal spectral analysis by eigenstructure methods. IEEE Transactions on Acoustics, Speech, and Signal Processing 1984, 32(4):817-827. 10.1109/TASSP.1984.1164400
  12. Wang H, Kaveh M: Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources. IEEE Transactions on Acoustics, Speech, and Signal Processing 1985, 33(4):823-831. 10.1109/TASSP.1985.1164667
  13. Smith JO, Abel JS: Closed-form least-squares source location estimation from range-difference measurements. IEEE Transactions on Acoustics, Speech, and Signal Processing 1987, 35(12):1661-1669. 10.1109/TASSP.1987.1165089
  14. Hu J-S, Cheng C-C, Liu W-H, Su TM: A speaker tracking system with distance estimation using microphone array. Proceedings of the IEEE/ASME International Conference on Advanced Manufacturing Technologies and Education, August 2002, Chiayi, Taiwan 485-494.
  15. Hu J-S, Su TM, Cheng C-C, Liu W-H, Wu TI: A self-calibrated speaker tracking system using both audio and video data. Proceedings of the IEEE Conference on Control Applications, September 2002, Glasgow, Scotland 2:731-735.
  16. Omologo M, Svaizer P: Acoustic source location in noisy and reverberant environment using CSP analysis. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '96), May 1996, Atlanta, Ga, USA 901-904.
  17. Brandstein MS, Silverman HF: A robust method for speech signal time-delay estimation in reverberant rooms. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), April 1997, Munich, Germany 1:375-378.
  18. Strobel N, Rabenstein R: Classification of time delay estimates for robust speaker localization. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '99), March 1999, Phoenix, Ariz, USA 6:3081-3084.
  19. Mavandadi S, Aarabi P: Multichannel nonlinear phase analysis for time-frequency data fusion. Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2003, April 2003, Orlando, Fla, USA, Proceedings of SPIE 5099:222-231.
  20. Aarabi P, Mavandadi S: Robust sound localization using conditional time-frequency histograms. Information Fusion 2003, 4(2):111-122. 10.1016/S1566-2535(03)00003-4
  21. Ward DB, Williamson RC: Particle filter beamforming for acoustic source localization in a reverberant environment. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 2:1777-1780.
  22. Potamitis I, Chen H, Tremoulis G: Tracking of multiple moving speakers with multiple microphone arrays. IEEE Transactions on Speech and Audio Processing 2004, 12(5):520-529. 10.1109/TSA.2004.833004
  23. Chen JC, Yao K, Hudson RE: Acoustic source localization and beamforming: theory and practice. EURASIP Journal on Applied Signal Processing 2003, 2003(4):359-370. 10.1155/S1110865703212038
  24. Chung P-J, Böhme JF, Hero AO: Tracking of multiple moving sources using recursive EM algorithm. EURASIP Journal on Applied Signal Processing 2005, 2005(1):50-60. 10.1155/ASP.2005.50
  25. Ng BC, See CMS: Sensor-array calibration using a maximum-likelihood approach. IEEE Transactions on Antennas and Propagation 1996, 44(6):827-835. 10.1109/8.509886
  26. Ward DB, Lehmann EA, Williamson RC: Particle filtering algorithms for tracking an acoustic source in a reverberant environment. IEEE Transactions on Speech and Audio Processing 2003, 11(6):826-836. 10.1109/TSA.2003.818112
  27. Hu J-S, Cheng C-C, Liu W-H: Robust speaker's location detection in a vehicle environment using GMM models. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2006, 36(2):403-412.
  28. Reynolds DA, Rose RC: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 1995, 3(1):72-83. 10.1109/89.365379
  29. Ramírez J, Segura JC, Benítez C, De la Torre A, Rubio Á: Efficient voice activity detection algorithms using long-term speech information. Speech Communication 2004, 42(3-4):271-287. 10.1016/j.specom.2003.10.002
  30. Potamitis I: Estimation of speech presence probability in the field of microphone array. IEEE Signal Processing Letters 2004, 11(12):956-959. 10.1109/LSP.2004.838200
  31. Brandstein M, Ward D: Microphone Arrays: Signal Processing Techniques and Applications. Springer, New York, NY, USA; 2001. chapter 2
  32. Reynolds DA, Rose RC: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 1995, 3(1):72-83. 10.1109/89.365379
  33. Xuan G, Zhang W, Chai P: EM algorithms of Gaussian mixture model and hidden Markov model. Proceedings of the IEEE International Conference on Image Processing (ICIP '01), October 2001, Thessaloniki, Greece 1:145-148.
  34. Mitsubishi Motors - Savrin (http://www.sym-motor.com.tw/savrin-1.htm)
  35. Ryan JG, Goubran RA: Near-field beamforming for microphone arrays. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), April 1997, Munich, Germany 1:363-366.
  36. Vries DD, Hulsebos EM, Bann J: Spatial fluctuations in measures for spaciousness. Journal of the Acoustical Society of America 2001, 110(2):947-954. 10.1121/1.1377634
  37. Pelorson X, Vian J-P, Polack J-D: On the variability of room acoustical parameters: reproducibility and statistical validity. Applied Acoustics 1992, 37(3):175-198. 10.1016/0003-682X(92)90002-A
