  • Research Article
  • Open access

Robust Speech Recognition Using Factorial HMMs for Home Environments

Abstract

We address speech recognition in the presence of sudden, nonstationary noise, which is common in home environments. As a model compensation method for this problem, we investigated a factorial hidden Markov model (FHMM) architecture built from a clean-speech hidden Markov model (HMM) and a sudden-noise HMM. Whereas conventional studies define this architecture only for the static features of the observation vector, we extended it to dynamic features. In addition, we adapted the FHMMs to the characteristics of a given house (home-environment adaptation). The proposed method was evaluated on a database recorded in home environments by a personal robot called PaPeRo. Isolated word recognition experiments demonstrated the effectiveness of the proposed method under noisy conditions: home-dependent word FHMMs (HD-FHMMs) reduced the word error rate by 20.5% compared with the clean-speech word HMMs.
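
As a rough illustration of how an FHMM can be assembled from a speech HMM and a noise HMM, the sketch below builds the joint state space of two independent chains. It is a minimal example under stated assumptions, not the authors' implementation: the function names, the toy parameters, and in particular the log-max rule for combining state means (taking the element-wise maximum of log-spectral means) are assumptions introduced here, since the abstract does not state the combination rule. Dynamic features and home-environment adaptation are not covered.

```python
import numpy as np

def combine_transitions(a_speech, a_noise):
    """Joint transition matrix over (speech_state, noise_state) pairs.

    The two chains evolve independently, so the joint transition
    probability is the product of the per-chain probabilities.
    Joint state index = speech_index * n_noise_states + noise_index.
    """
    return np.kron(a_speech, a_noise)

def combine_means_logmax(mu_speech, mu_noise):
    """ASSUMPTION: log-max approximation for log-spectral means.

    Each joint state's mean is the element-wise maximum of the
    speech-state mean and the noise-state mean.
    """
    # (S, 1, D) against (1, N, D) broadcasts to (S, N, D)
    joint = np.maximum(mu_speech[:, None, :], mu_noise[None, :, :])
    # Flatten to (S * N, D), matching the np.kron state ordering
    return joint.reshape(-1, mu_speech.shape[1])

# Toy parameters (hypothetical values, for illustration only)
a_speech = np.array([[0.9, 0.1],
                     [0.0, 1.0]])
a_noise = np.array([[0.8, 0.2],
                    [0.5, 0.5]])
mu_speech = np.array([[1.0, 5.0],
                      [3.0, 2.0]])
mu_noise = np.array([[0.0, 0.0],
                     [4.0, 4.0]])

a_joint = combine_transitions(a_speech, a_noise)
mu_joint = combine_means_logmax(mu_speech, mu_noise)
```

With two states per chain, `a_joint` is a 4 x 4 row-stochastic matrix and `mu_joint` holds one mean vector per (speech, noise) state pair. The joint state space grows as the product of the chain sizes, which is the main cost of this kind of model combination.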

Author information

Corresponding author

Correspondence to Agnieszka Betkowska.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Betkowska, A., Shinoda, K. & Furui, S. Robust Speech Recognition Using Factorial HMMs for Home Environments. EURASIP J. Adv. Signal Process. 2007, 020593 (2007). https://doi.org/10.1155/2007/20593

  • DOI: https://doi.org/10.1155/2007/20593
