Open Access

Robust Speech Recognition Using Factorial HMMs for Home Environments

EURASIP Journal on Advances in Signal Processing 2007, 2007:020593

https://doi.org/10.1155/2007/20593

Received: 1 February 2006

Accepted: 17 December 2006

Published: 14 February 2007

Abstract

We focus on the problem of speech recognition in the presence of nonstationary sudden noise, which is very likely to occur in home environments. As a model compensation method for this problem, we investigated the use of a factorial hidden Markov model (FHMM) architecture developed from a clean-speech hidden Markov model (HMM) and a sudden-noise HMM. Whereas conventional studies define this architecture only for static features of the observation vector, we extended it to dynamic features. In addition, we performed home-environment adaptation of FHMMs to the characteristics of a given house. A database recorded in home environments by a personal robot called PaPeRo was used to evaluate the proposed method. Isolated word recognition experiments demonstrated the effectiveness of the proposed method under noisy conditions. Home-dependent word FHMMs (HD-FHMMs) reduced the word error rate by 20.5% compared to that of the clean-speech word HMMs.
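As a rough illustration of how an FHMM can be built from a clean-speech HMM and a sudden-noise HMM, the sketch below forms the joint state space of two independent chains: the joint transition matrix is the Kronecker product of the two individual transition matrices. The function names (`kron`, `combine_log_max`), the toy transition probabilities, and the log-max rule for combining log-spectral means are illustrative assumptions, not details taken from this abstract.

```python
from itertools import product


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows.

    For two independent Markov chains, entry ((i, k), (j, l)) of the
    joint transition matrix equals a[i][j] * b[k][l].
    """
    return [
        [a_ij * b_kl for a_ij, b_kl in product(row_a, row_b)]
        for row_a, row_b in product(a, b)
    ]


def combine_log_max(speech_log_spec, noise_log_spec):
    """Assumed combination rule: element-wise maximum of log-spectral
    values (the log-max approximation). Hypothetical helper; the abstract
    does not state which combination rule the authors use.
    """
    return [max(s, n) for s, n in zip(speech_log_spec, noise_log_spec)]


# Toy 2-state left-to-right speech HMM and 2-state ergodic noise HMM.
speech_trans = [[0.9, 0.1],
                [0.0, 1.0]]
noise_trans = [[0.95, 0.05],
               [0.20, 0.80]]

# 4 joint states, ordered (speech 0, noise 0), (speech 0, noise 1), ...
joint_trans = kron(speech_trans, noise_trans)

# Mean log-spectrum of one joint state under the log-max assumption.
joint_mean = combine_log_max([1.0, 5.0], [3.0, 2.0])
```

Each row of `joint_trans` is again a probability distribution, so the product model can be decoded with the usual HMM algorithms, at the cost of a state space that grows multiplicatively with the number of chains.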

Keywords

Markov Model; Hidden Markov Model; Quantum Information; Static Feature; Word Recognition

Authors’ Affiliations

(1) Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Tokyo, Japan

Copyright

© Agnieszka Betkowska et al. 2007

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
