

  • Research Article
  • Open Access

Robust Speech Recognition Using Factorial HMMs for Home Environments

EURASIP Journal on Advances in Signal Processing 2007, 2007:020593

  • Received: 1 February 2006
  • Accepted: 17 December 2006
  • Published:


We focus on the problem of speech recognition in the presence of nonstationary sudden noise, which is very likely to occur in home environments. As a model compensation method for this problem, we investigated the use of a factorial hidden Markov model (FHMM) architecture developed from a clean-speech hidden Markov model (HMM) and a sudden-noise HMM. While conventional studies define this architecture only for the static features of the observation vector, we extended it to dynamic features. In addition, we adapted the FHMMs to the characteristics of a given house (home-environment adaptation). A database recorded in home environments by a personal robot called PaPeRo was used to evaluate the proposed method. Isolated word recognition experiments demonstrated the effectiveness of the proposed method under noisy conditions. Home-dependent word FHMMs (HD-FHMMs) reduced the word error rate by 20.5% compared to that of the clean-speech word HMMs.
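As a rough illustration of how an FHMM can be built from a clean-speech HMM and a sudden-noise HMM, the following minimal sketch forms the joint state space of two independent chains. It is not the authors' implementation. The function names, the toy parameters, and the use of a log-max approximation (elementwise maximum of log-spectral means) to combine the two models' outputs are assumptions made for this example only, since the abstract does not specify how the observation distributions are combined.

```python
import numpy as np

def combine_transitions(a_speech, a_noise):
    """Joint transition matrix of two independent Markov chains.

    Joint state (i, k) is indexed as i * n_noise + k, which matches
    the block layout produced by the Kronecker product.
    """
    return np.kron(a_speech, a_noise)

def combine_means_logmax(mu_speech, mu_noise):
    """Joint-state means under an assumed log-max approximation.

    Crude simplification: in a log-spectral domain, the louder source is
    taken to dominate each dimension, so the joint mean is the elementwise
    maximum of the two component means.
    """
    joint = np.maximum(mu_speech[:, None, :], mu_noise[None, :, :])
    return joint.reshape(-1, mu_speech.shape[1])

# Hypothetical toy models: 2 speech states, 2 noise states, 2-dim features.
a_speech = np.array([[0.9, 0.1],
                     [0.0, 1.0]])
a_noise = np.array([[0.8, 0.2],
                    [0.5, 0.5]])
mu_speech = np.array([[1.0, 4.0],
                      [3.0, 0.5]])
mu_noise = np.array([[0.0, 0.0],
                     [2.0, 5.0]])

a_joint = combine_transitions(a_speech, a_noise)      # shape (4, 4)
mu_joint = combine_means_logmax(mu_speech, mu_noise)  # shape (4, 2)
```

Each row of `a_joint` still sums to one, so the combined model is an ordinary HMM over the product state space and can be decoded with standard algorithms. The cost is that the number of states grows as the product of the two chains' sizes.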


  • Markov Model
  • Hidden Markov Model
  • Quantum Information
  • Static Feature
  • Word Recognition


Authors’ Affiliations

Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Tokyo 152-8552, Japan

