Open Access

Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition

EURASIP Journal on Advances in Signal Processing 2007, 2007:067870

https://doi.org/10.1155/2007/67870

Received: 1 February 2006

Accepted: 1 February 2007

Published: 22 March 2007

Abstract

A new class-based histogram equalization method is proposed for robust speech recognition. The proposed method aims not only to compensate for the acoustic mismatch between training and test environments but also to reduce two fundamental limitations of the conventional histogram equalization method: the discrepancy between the phonetic distributions of the training and test speech data, and the nonmonotonic transformation caused by the acoustic mismatch. The algorithm employs multiple class-specific reference and test cumulative distribution functions, classifies noisy test features into their corresponding classes, and equalizes each feature using the reference and test distributions of its class. Minimum mean-square error log-spectral amplitude (MMSE-LSA)-based speech enhancement is applied just prior to the baseline feature extraction to reduce corruption by additive noise. Experiments on the Aurora2 database demonstrated the effectiveness of the proposed method, which achieved relative error reductions over both the mel-cepstral-based features and the conventional histogram equalization method.
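The equalization step described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes one-dimensional features, a simple nearest-class-mean classifier, and empirical CDFs built from sample arrays; the function names (`empirical_cdf`, `equalize`, `class_based_heq`) are hypothetical. Conventional histogram equalization maps a test feature x to y = F_ref⁻¹(F_test(x)); the class-based variant first selects the class of x and then applies that class's pair of distributions.

```python
import numpy as np

def empirical_cdf(samples):
    """Return a function mapping a value to its empirical CDF probability."""
    s = np.sort(np.asarray(samples))
    def cdf(x):
        # Fraction of samples less than or equal to x.
        return np.searchsorted(s, x, side="right") / len(s)
    return cdf

def equalize(x, test_samples, ref_samples):
    """Quantile mapping y = F_ref^{-1}(F_test(x)): look up the test-CDF
    probability of x, then invert the reference CDF at that probability."""
    p = empirical_cdf(test_samples)(x)
    return np.quantile(np.asarray(ref_samples), np.clip(p, 0.0, 1.0))

def class_based_heq(x, class_means, test_by_class, ref_by_class):
    """Classify x by nearest class mean (an illustrative stand-in for the
    paper's classifier), then equalize with that class's test and
    reference distributions instead of a single global pair."""
    c = int(np.argmin([abs(x - m) for m in class_means]))
    return equalize(x, test_by_class[c], ref_by_class[c])
```

For example, if class 0's test features are shifted by +5 relative to its reference distribution, a test feature near the class-0 test median is mapped close to the class-0 reference median, while features assigned to class 1 are transformed with class 1's own distributions, avoiding the single nonmonotonic global mapping.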

Keywords

Feature Extraction, Histogram Equalization, Class Reference, Speech Data, Speech Enhancement


Authors’ Affiliations

(1)
School of Engineering, Information and Communications University, Daejeon, South Korea

References

  1. Sankar A, Lee C-H: A maximum-likelihood approach to stochastic matching for robust speech recognition. IEEE Transactions on Speech and Audio Processing 1996, 4(3): 190-202. doi:10.1109/89.496215
  2. Huang X, Acero A, Hon H-W: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice-Hall, Englewood Cliffs, NJ, USA; 2001.
  3. Rosenberg AE, Lee C-H, Soong FK: Cepstral channel normalization techniques for HMM-based speaker verification. Proceedings of the 3rd International Conference on Spoken Language Processing (ICSLP '94), September 1994, Yokohama, Japan, 1835-1838.
  4. Viikki O, Laurila K: Cepstral domain segmental feature vector normalization for noise robust speech recognition. Speech Communication 1998, 25(1-3): 133-147.
  5. Kermorvant C: A comparison of noise reduction techniques for robust speech recognition. IDIAP Research Report IDIAP-RR 99-10, IDIAP Research Institute, Martigny, Switzerland; 1999.
  6. Kim NS, Kim YJ, Kim H: Feature compensation based on soft decision. IEEE Signal Processing Letters 2004, 11(3): 378-381. doi:10.1109/LSP.2003.821720
  7. Droppo J, Deng L, Acero A: Evaluation of the SPLICE algorithm on the Aurora2 database. Proceedings of the 7th European Conference on Speech Communication and Technology (EUROSPEECH '01), September 2001, Aalborg, Denmark, 217-220.
  8. Gales MJF: Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech and Language 1998, 12(2): 75-98. doi:10.1006/csla.1998.0043
  9. Gonzalez RC, Woods RE: Digital Image Processing. Prentice-Hall, Englewood Cliffs, NJ, USA; 2002.
  10. Dharanipragada S, Padmanabhan M: A nonlinear unsupervised adaptation technique for speech recognition. Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP '00), October 2000, Beijing, China, 4: 556-559.
  11. Hilger F, Ney H: Quantile based histogram equalization for noise robust speech recognition. Proceedings of the 7th European Conference on Speech Communication and Technology (EUROSPEECH '01), September 2001, Aalborg, Denmark, 1135-1138.
  12. Saon G, Huerta JM: Improvements to the IBM Aurora 2 multi-condition system. Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP '02), September 2002, Denver, Colo, USA, 469-472.
  13. Molau S, Hilger F, Keysers D, Ney H: Enhanced histogram normalization in the acoustic feature space. Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP '02), September 2002, Denver, Colo, USA, 1421-1424.
  14. Molau S, Hilger F, Ney H: Feature space normalization in adverse acoustic conditions. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '03), April 2003, Hong Kong, 1: 656-659.
  15. Hilger F: Quantile based histogram equalization for noise robust speech recognition, Ph.D. thesis. RWTH Aachen University of Technology, Aachen, Germany; 2004.
  16. Segura JC, Benítez C, de la Torre Á, Rubio AJ, Ramírez J: Cepstral domain segmental nonlinear feature transformations for robust speech recognition. IEEE Signal Processing Letters 2004, 11(5): 517-520. doi:10.1109/LSP.2004.826648
  17. de la Torre Á, Peinado AM, Segura JC, Pérez-Córdoba JL, Benítez MC, Rubio AJ: Histogram equalization of speech representation for robust speech recognition. IEEE Transactions on Speech and Audio Processing 2005, 13(3): 355-366.
  18. Suh Y, Kim H: Class-based histogram equalization for robust speech recognition. ETRI Journal 2006, 28(4): 502-505. doi:10.4218/etrij.06.0206.0005
  19. Ephraim Y, Malah D: Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing 1985, 33(2): 443-445. doi:10.1109/TASSP.1985.1164550
  20. Ephraim Y, Malah D: Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing 1984, 32(6): 1109-1121. doi:10.1109/TASSP.1984.1164453
  21. Kim H, Rose RC: Cepstrum-domain acoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments. IEEE Transactions on Speech and Audio Processing 2003, 11(5): 435-446. doi:10.1109/TSA.2003.815515
  22. Cho YD, Al-Naimi K, Kondoz A: Mixed decision-based noise adaptation for speech enhancement. Electronics Letters 2001, 37(8): 540-542. doi:10.1049/el:20010368
  23. Kim NS, Chang J-H: Spectral enhancement based on global soft decision. IEEE Signal Processing Letters 2000, 7(5): 108-110. doi:10.1109/97.841154
  24. Cohen I: Speech enhancement using a noncausal a priori SNR estimator. IEEE Signal Processing Letters 2004, 11(9): 725-728. doi:10.1109/LSP.2004.833478
  25. ETSI ES 202 050 V1.1.1 (final draft): Speech Processing, Transmission and Quality aspects (STQ); Distributed speech recognition; advanced front-end feature extraction algorithm; compression algorithms. ETSI, June 2002.
  26. Chen SS, Gopinath RA: Gaussianization. Proceedings of Advances in Neural Information Processing Systems (NIPS '00), December 2000, Denver, Colo, USA, 423-429.
  27. Saon G, Dharanipragada S, Povey D: Feature space gaussianization. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '04), May 2004, Montreal, Que, Canada, 1: 329-332.
  28. Nadeu C, Macho D, Hernando J: Time and frequency filtering of filter-bank energies for robust HMM speech recognition. Speech Communication 2001, 34(1-2): 93-114. doi:10.1016/S0167-6393(00)00048-0

Copyright

© Youngjoo Suh et al. 2007

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
