A Posterior Union Model with Applications to Robust Speech and Speaker Recognition

Ming, Ji; Lin, Jie; Smith, F. Jack

doi:10.1155/ASP/2006/75390

Research Article
Open access
Published: 01 December 2006

Ji Ming¹,
Jie Lin² &
F. Jack Smith¹

EURASIP Journal on Advances in Signal Processing, volume 2006, Article number: 075390 (2006)

Abstract

This paper investigates speech and speaker recognition under partial feature corruption, assuming unknown, time-varying noise characteristics. The probabilistic union model is extended from a conditional-probability formulation to a posterior-probability formulation as an improved solution to the problem. The new formulation allows the order of the model to be optimized for every frame, which improves the model's ability to deal with nonstationary noise corruption. It also allows the model to be readily incorporated into a Gaussian mixture model (GMM) for speaker recognition. Experiments were conducted on two databases, TIDIGITS for speech recognition and SPIDRE for speaker identification, both subject to unknown, time-varying band-selective corruption. The results demonstrate improved robustness for the new model.

Author information

Authors and Affiliations

School of Computer Science, Queen's University Belfast, Belfast, BT7 1NN, United Kingdom
Ji Ming & F. Jack Smith
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, China
Jie Lin

Corresponding author

Correspondence to Ji Ming.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Ming, J., Lin, J. & Smith, F.J. A Posterior Union Model with Applications to Robust Speech and Speaker Recognition. EURASIP J. Adv. Signal Process. 2006, 075390 (2006). https://doi.org/10.1155/ASP/2006/75390

Received: 13 January 2005
Revised: 12 December 2005
Accepted: 14 December 2005
Published: 01 December 2006
