A Robust Statistical-Based Speaker's Location Detection Algorithm in a Vehicular Environment

Abstract

This work presents a robust speaker's location detection algorithm that uses a single linear microphone array and can detect multiple speech sources, under the assumption that nonoverlapped speech segments exist among the sources. Overlapped speech segments are treated as uncertainty and are not used for detection. The location detection algorithm is derived from a previous work (2006), in which Gaussian mixture models (GMMs) are used to model phase difference distributions that are location-dependent but independent of speech content and speaker. The proposed algorithm is proven to be robust against complex vehicular acoustics, including noise, reverberation, near-field, far-field, line-of-sight, and non-line-of-sight conditions, as well as microphone mismatch. An adaptive system architecture is developed to adjust the Gaussian mixture (GM) location model to environmental noise. To deal with unmodeled speech sources as well as overlapped speech signals, a threshold adaptation scheme is proposed. Experimental results demonstrate high detection accuracy in a noisy vehicular environment.
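The detection idea summarized above (score observed phase differences against one Gaussian mixture per location, and reject segments that no model explains well) can be illustrated with a minimal sketch. This is not the authors' implementation: the location labels, the mixture parameters, the three-bin feature vectors, and the fixed threshold of -5.0 are all hypothetical values chosen for illustration, and a fixed threshold stands in for the adaptive threshold scheme the abstract describes.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, gmm):
    """Log-likelihood of x under a mixture given as [(weight, mean, var), ...]."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    # log-sum-exp keeps the sum numerically stable
    return top + math.log(sum(math.exp(t - top) for t in terms))

def detect_location(frames, models, threshold):
    """Pick the location whose mixture best explains the frames.

    Returns (label, scores); label is None when even the best average
    log-likelihood falls below the threshold (unmodeled or overlapped speech).
    """
    scores = {loc: sum(gmm_loglik(f, g) for f in frames) / len(frames)
              for loc, g in models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores

# Hypothetical location models over phase differences (radians) in 3 frequency bins.
models = {
    "driver": [(0.6, [0.5, 1.0, 1.5], [0.05, 0.05, 0.05]),
               (0.4, [0.6, 1.2, 1.8], [0.10, 0.10, 0.10])],
    "passenger": [(0.5, [-0.5, -1.0, -1.5], [0.05, 0.05, 0.05]),
                  (0.5, [-0.4, -0.8, -1.2], [0.10, 0.10, 0.10])],
}

# Hypothetical observations: frames near the driver model, and frames near no model.
driver_frames = [[0.52, 0.98, 1.55], [0.58, 1.15, 1.70]]
unmodeled_frames = [[0.0, 0.1, -0.1], [0.05, -0.05, 0.0]]

print(detect_location(driver_frames, models, threshold=-5.0)[0])
print(detect_location(unmodeled_frames, models, threshold=-5.0)[0])
```

With these illustrative values, the first call selects the "driver" model and the second returns `None`, because neither mixture reaches the threshold. In a real system the mixture parameters would be trained from recorded data, for example with the EM algorithm, rather than written by hand.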

References

  1. Ryan JG, Goubran RA: Application of near-field optimum microphone arrays to hands-free mobile telephony. IEEE Transactions on Vehicular Technology 2003, 52(2):390–400.
  2. Pulasinghe K, Watanabe K, Izumi K, Kiguchi K: Modular fuzzy-neuro controller driven by spoken language commands. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2004, 34(1):293–302. 10.1109/TSMCB.2003.811511
  3. Herbordt W, Horiuchi T, Fujimoto M, Jitsuhiro T, Nakamura S: Noise-robust hands-free speech recognition on PDAs using microphone array technology. Autumn Meeting of the Acoustical Society of Japan, September 2005, Sendai, Japan 51–54.
  4. Gannot S, Burshtein D, Weinstein E: Signal enhancement using beamforming and nonstationarity with applications to speech. IEEE Transactions on Signal Processing 2001, 49(8):1614–1626. 10.1109/78.934132
  5. Aarabi P, Shi G: Phase-based dual-microphone robust speech enhancement. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2004, 34(4):1763–1773. 10.1109/TSMCB.2004.830345
  6. Hu J-S, Cheng C-C: Frequency domain microphone array calibration and beamforming for automatic speech recognition. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 2005, E88-A(9):2401–2411. 10.1093/ietfec/e88-a.9.2401
  7. Ahn S, Ko H: Background noise reduction via dual-channel scheme for speech recognition in vehicular environment. IEEE Transactions on Consumer Electronics 2005, 51(1):22–27. 10.1109/TCE.2005.1405694
  8. Carter GC, Nuttall AH, Cable PG: The smoothed coherence transform. Proceedings of the IEEE 1973, 61(10):1497–1498.
  9. Knapp CH, Carter GC: The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing 1976, 24:320–327. 10.1109/TASSP.1976.1162830
  10. Bienvenu G: Eigensystem properties of the sampled space correlation matrix. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '83), 1983, Boston, Mass, USA 8:332–335.
  11. Wax M, Shan T-J, Kailath T: Spatio-temporal spectral analysis by eigenstructure methods. IEEE Transactions on Acoustics, Speech, and Signal Processing 1984, 32(4):817–827. 10.1109/TASSP.1984.1164400
  12. Wang H, Kaveh M: Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources. IEEE Transactions on Acoustics, Speech, and Signal Processing 1985, 33(4):823–831. 10.1109/TASSP.1985.1164667
  13. Smith JO, Abel JS: Closed-form least-squares source location estimation from range-difference measurements. IEEE Transactions on Acoustics, Speech, and Signal Processing 1987, 35(12):1661–1669. 10.1109/TASSP.1987.1165089
  14. Hu J-S, Cheng C-C, Liu W-H, Su TM: A speaker tracking system with distance estimation using microphone array. Proceedings of the IEEE/ASME International Conference on Advanced Manufacturing Technologies and Education, August 2002, Chiayi, Taiwan 485–494.
  15. Hu J-S, Su TM, Cheng C-C, Liu W-H, Wu TI: A self-calibrated speaker tracking system using both audio and video data. Proceedings of the IEEE Conference on Control Applications, September 2002, Glasgow, Scotland 2:731–735.
  16. Omologo M, Svaizer P: Acoustic source location in noisy and reverberant environment using CSP analysis. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '96), May 1996, Atlanta, Ga, USA 901–904.
  17. Brandstein MS, Silverman HF: A robust method for speech signal time-delay estimation in reverberant rooms. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), April 1997, Munich, Germany 1:375–378.
  18. Strobel N, Rabenstein R: Classification of time delay estimates for robust speaker localization. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '99), March 1999, Phoenix, Ariz, USA 6:3081–3084.
  19. Mavandadi S, Aarabi P: Multichannel nonlinear phase analysis for time-frequency data fusion. Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2003, April 2003, Orlando, Fla, USA, Proceedings of SPIE 5099:222–231.
  20. Aarabi P, Mavandadi S: Robust sound localization using conditional time-frequency histograms. Information Fusion 2003, 4(2):111–122. 10.1016/S1566-2535(03)00003-4
  21. Ward DB, Williamson RC: Particle filter beamforming for acoustic source localization in a reverberant environment. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 2:1777–1780.
  22. Potamitis I, Chen H, Tremoulis G: Tracking of multiple moving speakers with multiple microphone arrays. IEEE Transactions on Speech and Audio Processing 2004, 12(5):520–529. 10.1109/TSA.2004.833004
  23. Chen JC, Yao K, Hudson RE: Acoustic source localization and beamforming: theory and practice. EURASIP Journal on Applied Signal Processing 2003, 2003(4):359–370. 10.1155/S1110865703212038
  24. Chung P-J, Böhme JF, Hero AO: Tracking of multiple moving sources using recursive EM algorithm. EURASIP Journal on Applied Signal Processing 2005, 2005(1):50–60. 10.1155/ASP.2005.50
  25. Ng BC, See CMS: Sensor-array calibration using a maximum-likelihood approach. IEEE Transactions on Antennas and Propagation 1996, 44(6):827–835. 10.1109/8.509886
  26. Ward DB, Lehmann EA, Williamson RC: Particle filtering algorithms for tracking an acoustic source in a reverberant environment. IEEE Transactions on Speech and Audio Processing 2003, 11(6):826–836. 10.1109/TSA.2003.818112
  27. Hu J-S, Cheng C-C, Liu W-H: Robust speaker's location detection in a vehicle environment using GMM models. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2006, 36(2):403–412.
  28. Reynolds DA, Rose RC: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 1995, 3(1):72–83. 10.1109/89.365379
  29. Ramírez J, Segura JC, Benítez C, De la Torre A, Rubio Á: Efficient voice activity detection algorithms using long-term speech information. Speech Communication 2004, 42(3–4):271–287. 10.1016/j.specom.2003.10.002
  30. Potamitis I: Estimation of speech presence probability in the field of microphone array. IEEE Signal Processing Letters 2004, 11(12):956–959. 10.1109/LSP.2004.838200
  31. Brandstein M, Ward D: Microphone Arrays: Signal Processing Techniques and Applications. Springer, New York, NY, USA; 2001. chapter 2
  32. Reynolds DA, Rose RC: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 1995, 3(1):72–83. 10.1109/89.365379
  33. Xuan G, Zhang W, Chai P: EM algorithms of Gaussian mixture model and hidden Markov model. Proceedings of the IEEE International Conference on Image Processing (ICIP '01), October 2001, Thessaloniki, Greece 1:145–148.
  34. Mitsubishi Motors - Savrin (www.sym-motor.com.tw/savrin-1.htm)
  35. Ryan JG, Goubran RA: Near-field beamforming for microphone arrays. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), April 1997, Munich, Germany 1:363–366.
  36. Vries DD, Hulsebos EM, Bann J: Spatial fluctuations in measures for spaciousness. Journal of the Acoustical Society of America 2001, 110(2):947–954. 10.1121/1.1377634
  37. Pelorson X, Vian J-P, Polack J-D: On the variability of room acoustical parameters: reproducibility and statistical validity. Applied Acoustics 1992, 37(3):175–198. 10.1016/0003-682X(92)90002-A

Author information

Corresponding author

Correspondence to Jwu-Sheng Hu.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Hu, J., Cheng, C. & Liu, W. A Robust Statistical-Based Speaker's Location Detection Algorithm in a Vehicular Environment. EURASIP J. Adv. Signal Process. 2007, 013601 (2006). https://doi.org/10.1155/2007/13601

Keywords

  • Mixture Model
  • Acoustics
  • Speech Signal
  • Adaptation Scheme
  • Gaussian Mixture Model