- Research Article
- Open access
Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition
EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 067870 (2007)
Abstract
A new class-based histogram equalization method is proposed for robust speech recognition. The method aims not only to compensate for the acoustic mismatch between training and test environments but also to reduce two fundamental limitations of conventional histogram equalization: the discrepancy between the phonetic distributions of training and test speech data, and the nonmonotonic transformation caused by the acoustic mismatch. The algorithm employs multiple class-specific reference and test cumulative distribution functions, classifies noisy test features into their corresponding classes, and equalizes the features using the reference and test distributions of their class. Minimum mean-square error log-spectral amplitude (MMSE-LSA)-based speech enhancement is applied just prior to the baseline feature extraction to reduce corruption by additive noise. Experiments on the Aurora2 database demonstrate the effectiveness of the proposed method, which yields relative error reductions over both the mel-cepstral-based features and the conventional histogram equalization method.
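The class-wise equalization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier, the rank-based empirical CDF, and the names `equalize` and `class_based_heq` are assumptions made for illustration, and the MMSE-LSA enhancement stage is left out. Each test frame is assigned to a class, and each feature component is then mapped through the test CDF of its class onto the inverse reference CDF of the same class.

```python
import numpy as np


def equalize(test, ref):
    """Map 1-D test values onto the reference distribution by CDF matching."""
    n = len(test)
    ranks = np.argsort(np.argsort(test))   # rank 0..n-1 of each test value
    cdf = (ranks + 0.5) / n                # empirical test CDF, strictly in (0, 1)
    return np.quantile(ref, cdf)           # inverse empirical reference CDF


def class_based_heq(test, ref, ref_labels, centroids):
    """Hypothetical class-based histogram equalization.

    test:       (T, D) noisy test features
    ref:        (N, D) training (reference) features
    ref_labels: (N,)   class index of each reference frame
    centroids:  (K, D) one centroid per class (assumed classifier)
    """
    # Assumption: classify each test frame by its nearest centroid.
    dists = np.linalg.norm(test[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)

    out = test.astype(float).copy()
    for k in range(len(centroids)):
        idx = np.flatnonzero(labels == k)
        ref_k = ref[ref_labels == k]
        if len(idx) == 0 or len(ref_k) == 0:
            continue  # nothing to equalize for this class
        # Equalize each feature component independently within the class.
        for d in range(test.shape[1]):
            out[idx, d] = equalize(test[idx, d], ref_k[:, d])
    return out, labels
```

Because `equalize` ranks only the frames assigned to one class, a mismatch between the class proportions of the training and test data does not distort the mapping, which is the motivation the abstract gives for using class-specific distributions.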
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Suh, Y., Kim, S. & Kim, H. Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition. EURASIP J. Adv. Signal Process. 2007, 067870 (2007). https://doi.org/10.1155/2007/67870
DOI: https://doi.org/10.1155/2007/67870