  • Research Article
  • Open access

Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition

Abstract

A new class-based histogram equalization method is proposed for robust speech recognition. The proposed method aims not only to compensate for the acoustic mismatch between training and test environments but also to reduce two fundamental limitations of the conventional histogram equalization method: the discrepancy between the phonetic distributions of training and test speech data, and the nonmonotonic transformation caused by the acoustic mismatch. The algorithm employs multiple class-specific reference and test cumulative distribution functions, classifies noisy test features into their corresponding classes, and equalizes the features using the reference and test distributions of their classes. Minimum mean-square error log-spectral amplitude (MMSE-LSA)-based speech enhancement is applied just before the baseline feature extraction to reduce corruption by additive noise. Experiments on the Aurora2 database demonstrated the effectiveness of the proposed method, which reduced relative errors by over the mel-cepstral-based features and by over the conventional histogram equalization method, respectively.
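The equalization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes nearest-centroid classification, empirical CDFs computed from the samples, and per-dimension equalization, and it omits the MMSE-LSA enhancement stage. The names `classify`, `empirical_cdf`, `class_based_heq`, `centroids`, and `ref_by_class` are hypothetical, as is the toy data.

```python
import numpy as np


def classify(features, centroids):
    """Assign each feature vector to its nearest class centroid (assumed rule)."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def empirical_cdf(sample, x):
    """Empirical CDF of `sample` evaluated at `x`, kept strictly inside (0, 1)."""
    n = len(sample)
    rank = np.searchsorted(np.sort(sample), x, side="right")
    return np.clip((rank - 0.5) / n, 0.5 / n, 1.0 - 0.5 / n)


def class_based_heq(test, centroids, ref_by_class):
    """Equalize test features class by class and dimension by dimension.

    test:         (T, D) noisy test features
    centroids:    (K, D) class centroids (hypothetical classifier parameters)
    ref_by_class: list of K arrays, each (N_k, D), reference features per class
    """
    test = np.asarray(test, dtype=float)
    labels = classify(test, centroids)
    out = test.copy()
    for c in range(len(centroids)):
        idx = labels == c
        if not idx.any():
            continue  # no test frames fell into this class
        for d in range(test.shape[1]):
            x = test[idx, d]
            # Class-specific test CDF, then inverse class-specific reference CDF.
            p = empirical_cdf(x, x)
            out[idx, d] = np.quantile(ref_by_class[c][:, d], p)
    return out, labels


# Toy data: two one-dimensional classes and a monotone distortion that
# stands in for the acoustic mismatch (illustrative assumption only).
rng = np.random.default_rng(0)
ref_by_class = [rng.normal(-3.0, 1.0, size=(500, 1)),
                rng.normal(3.0, 1.0, size=(500, 1))]
centroids = np.array([r.mean(axis=0) for r in ref_by_class])
clean = np.vstack([rng.normal(-3.0, 1.0, size=(200, 1)),
                   rng.normal(3.0, 1.0, size=(200, 1))])
noisy = 0.5 * clean + 0.4
equalized, labels = class_based_heq(noisy, centroids, ref_by_class)
```

Within each class the mapping is monotone by construction, because it composes a nondecreasing test CDF with a nondecreasing reference quantile function. The sketch classifies the noisy features directly, so classification errors under mismatch carry through to the equalized output.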

Author information

Corresponding author

Correspondence to Youngjoo Suh.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://doi.org/creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Suh, Y., Kim, S. & Kim, H. Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition. EURASIP J. Adv. Signal Process. 2007, 067870 (2007). https://doi.org/10.1155/2007/67870

  • DOI: https://doi.org/10.1155/2007/67870
