
Research Article | Open Access

Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition

Abstract

A new class-based histogram equalization method is proposed for robust speech recognition. The proposed method aims not only to compensate for the acoustic mismatch between training and test environments but also to reduce two fundamental limitations of conventional histogram equalization: the discrepancy between the phonetic distributions of the training and test speech data, and the nonmonotonic transformation caused by the acoustic mismatch. The algorithm employs multiple class-specific reference and test cumulative distribution functions, classifies noisy test features into their corresponding classes, and equalizes each feature using the reference and test distributions of its class. Minimum mean-square error log-spectral amplitude (MMSE-LSA) speech enhancement is applied just prior to the baseline feature extraction to reduce corruption by additive noise. Experiments on the Aurora2 database confirmed the effectiveness of the proposed method, which achieved relative error reductions over both the mel-cepstral-based features and the conventional histogram equalization method.
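The core operation behind histogram equalization is mapping each test feature through the empirical test CDF and then through the inverse reference CDF, y = F_ref⁻¹(F_test(x)); the class-based variant described above applies this per class after classifying each test frame. The sketch below illustrates this idea for scalar features and is not the authors' implementation: the function names and the nearest-class-mean classifier are assumptions made for illustration only.

```python
import numpy as np

def equalize(test_feats, ref_feats):
    """Histogram equalization sketch: y = F_ref^{-1}(F_test(x)).

    Maps each test value to its empirical CDF rank within the test data,
    then inverts the empirical reference CDF by interpolation.
    """
    ref_sorted = np.sort(np.asarray(ref_feats, dtype=float))
    test_sorted = np.sort(np.asarray(test_feats, dtype=float))
    n_ref, n_test = len(ref_sorted), len(test_sorted)
    # Empirical test CDF value of each test sample (rank / n, in (0, 1]).
    cdf = np.searchsorted(test_sorted, test_feats, side="right") / n_test
    # Inverse reference CDF via interpolation over reference quantiles.
    quantiles = np.arange(1, n_ref + 1) / n_ref
    return np.interp(cdf, quantiles, ref_sorted)

def class_based_equalize(test_feats, ref_sets, class_means):
    """Class-based variant (sketch): assign each test frame to the nearest
    class mean, then equalize it with that class's own reference data and
    the test CDF estimated from the frames assigned to that class."""
    test_feats = np.asarray(test_feats, dtype=float)
    labels = np.argmin(
        np.abs(test_feats[:, None] - np.asarray(class_means)[None, :]), axis=1)
    out = np.empty_like(test_feats)
    for c, ref in enumerate(ref_sets):
        idx = labels == c
        if idx.any():
            out[idx] = equalize(test_feats[idx], ref)
    return out
```

Under an additive mismatch (e.g., test features shifted by a constant), the equalized features recover the reference distribution; the class-based version avoids mixing frames whose class-conditional distributions differ.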


Author information

Correspondence to Youngjoo Suh.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 Generic License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Keywords

  • Feature Extraction
  • Histogram Equalization
  • Class Reference
  • Speech Data
  • Speech Enhancement