Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition

Suh, Youngjoo; Kim, Sungtak; Kim, Hoirin

doi:10.1155/2007/67870

Research Article
Open access
Published: 01 December 2007

EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 067870 (2007)

Abstract

A new class-based histogram equalization method is proposed for robust speech recognition. The method aims not only to compensate for the acoustic mismatch between training and test environments but also to reduce two fundamental limitations of the conventional histogram equalization method: the discrepancy between the phonetic distributions of training and test speech data, and the nonmonotonic transformation caused by the acoustic mismatch. The algorithm employs multiple class-specific reference and test cumulative distribution functions, classifies noisy test features into their corresponding classes, and equalizes each feature using the reference and test distributions of its class. Minimum mean-square error log-spectral amplitude (MMSE-LSA) speech enhancement is applied just before the baseline feature extraction to reduce corruption by additive noise. Experiments on the Aurora2 database show the effectiveness of the proposed method, which reduces relative errors over both the mel-cepstral-based features and the conventional histogram equalization method.
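The equalization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single scalar feature, empirical CDFs built from ranks, and class labels for the test features that are already supplied by some classifier (the abstract does not say how classification is done). The function names `equalize` and `class_based_equalize` are hypothetical.

```python
import numpy as np

def equalize(test_vals, ref_vals):
    """Map test values so their empirical CDF matches the reference CDF."""
    test_vals = np.asarray(test_vals, dtype=float)
    ref_sorted = np.sort(np.asarray(ref_vals, dtype=float))
    # Empirical test CDF at each test value (mid-rank, strictly inside (0, 1)).
    ranks = np.argsort(np.argsort(test_vals))
    test_cdf = (ranks + 0.5) / test_vals.size
    # Inverse reference CDF, approximated by interpolating reference quantiles.
    ref_probs = (np.arange(ref_sorted.size) + 0.5) / ref_sorted.size
    return np.interp(test_cdf, ref_probs, ref_sorted)

def class_based_equalize(test_vals, test_labels, ref_vals, ref_labels):
    """Equalize each test value with the CDFs of its own class.

    Assumption: test_labels come from an external classifier that is
    not modeled here.
    """
    test_vals = np.asarray(test_vals, dtype=float)
    test_labels = np.asarray(test_labels)
    ref_vals = np.asarray(ref_vals, dtype=float)
    ref_labels = np.asarray(ref_labels)
    out = test_vals.copy()
    for c in np.unique(test_labels):
        mask = test_labels == c
        ref_c = ref_vals[ref_labels == c]
        if ref_c.size:  # leave values untouched if the class has no reference data
            out[mask] = equalize(test_vals[mask], ref_c)
    return out

# Toy data: two classes, each distorted by a different monotonic mapping.
rng = np.random.default_rng(0)
ref = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
labels = np.repeat([0, 1], 200)
test = np.where(labels == 0, 0.5 * ref + 1.0, 2.0 * ref - 3.0)
restored = class_based_equalize(test, labels, ref, labels)
```

In this toy setup the overall distortion is not monotonic across classes, so a single global CDF mapping cannot undo it, while the per-class mapping can. Real features are multidimensional and the class labels of noisy speech are uncertain, both of which this sketch ignores.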

Author information

Authors and Affiliations

School of Engineering, Information and Communications University, 119 Munjiro, Daejeon, Yuseong-Gu, 305-732, South Korea
Youngjoo Suh, Sungtak Kim & Hoirin Kim

Corresponding author

Correspondence to Youngjoo Suh.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Suh, Y., Kim, S. & Kim, H. Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition. EURASIP J. Adv. Signal Process. 2007, 067870 (2007). https://doi.org/10.1155/2007/67870

Received: 01 February 2006
Revised: 26 November 2006
Accepted: 01 February 2007
Published: 01 December 2007
DOI: https://doi.org/10.1155/2007/67870
