Dynamic Bayesian Networks for Audio-Visual Speech Recognition

Nefian, Ara V.; Liang, Luhong; Pi, Xiaobo; Liu, Xiaoxing; Murphy, Kevin

doi:10.1155/S1110865702206083

Research Article
Published: 28 November 2002

EURASIP Journal on Advances in Signal Processing volume 2002, Article number: 783042 (2002)

Abstract

The use of visual features in audio-visual speech recognition (AVSR) is justified both by the speech generation mechanism, which is essentially bimodal in audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare the performance of these models with that of the existing models used in speaker-dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and the FHMM make it possible to model the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming both the FHMM and all the existing models.
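
To make the idea of state asynchrony with preserved correlation concrete, the following is a minimal sketch of a scaled forward pass for a two-chain coupled HMM. It is not the authors' implementation. It assumes the common CHMM formulation in which each chain's next state depends on the previous states of both chains, and it uses discrete observation symbols as a simplification. All names (`chmm_log_likelihood`, `PI_A`, `A_A`, and so on) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
from itertools import product


def chmm_log_likelihood(pi_a, pi_v, A_a, A_v, B_a, B_v, obs_a, obs_v):
    """Log-likelihood of paired discrete sequences under a two-chain coupled HMM.

    Assumed (illustrative) parameterization:
      pi_a[i], pi_v[j]  initial state probabilities of the audio and video chains
      A_a[k][l][i]      P(audio state i at t | audio k, video l at t-1)
      A_v[k][l][j]      P(video state j at t | audio k, video l at t-1)
      B_a[i][o]         P(audio symbol o | audio state i)
      B_v[j][o]         P(video symbol o | video state j)
    """
    if len(obs_a) != len(obs_v) or not obs_a:
        raise ValueError("observation sequences must be non-empty and equally long")
    n_a, n_v = len(pi_a), len(pi_v)

    # alpha[i][j] is the forward variable over the joint state (audio=i, video=j).
    # The two chains may occupy different states at the same time step (asynchrony).
    alpha = [[pi_a[i] * B_a[i][obs_a[0]] * pi_v[j] * B_v[j][obs_v[0]]
              for j in range(n_v)] for i in range(n_a)]
    log_lik = 0.0
    for t in range(len(obs_a)):
        if t > 0:
            # Each chain's transition is conditioned on BOTH previous states,
            # which is what couples the audio and video streams over time.
            alpha = [[B_a[i][obs_a[t]] * B_v[j][obs_v[t]]
                      * sum(alpha[k][l] * A_a[k][l][i] * A_v[k][l][j]
                            for k in range(n_a) for l in range(n_v))
                      for j in range(n_v)] for i in range(n_a)]
        # Rescale at every step to avoid numerical underflow on long sequences.
        scale = sum(sum(row) for row in alpha)
        log_lik += math.log(scale)
        alpha = [[x / scale for x in row] for row in alpha]
    return log_lik


# Hypothetical toy parameters: 2 states per chain, 2 symbols per stream.
PI_A = [0.6, 0.4]
PI_V = [0.7, 0.3]
A_A = [[[0.8, 0.2], [0.6, 0.4]], [[0.3, 0.7], [0.1, 0.9]]]
A_V = [[[0.9, 0.1], [0.4, 0.6]], [[0.5, 0.5], [0.2, 0.8]]]
B_A = [[0.9, 0.1], [0.2, 0.8]]
B_V = [[0.7, 0.3], [0.1, 0.9]]

if __name__ == "__main__":
    print(chmm_log_likelihood(PI_A, PI_V, A_A, A_V, B_A, B_V, [0, 1, 1], [0, 0, 1]))
    # Sanity check: likelihoods over all length-2 sequence pairs should sum to about 1.
    seqs = list(product(range(2), repeat=2))
    print(sum(math.exp(chmm_log_likelihood(PI_A, PI_V, A_A, A_V, B_A, B_V, oa, ov))
              for oa in seqs for ov in seqs))
```

In an isolated word recognition setting such as the one described above, one plausible use of this function would be to keep one parameter set per word and pick the word whose model gives the highest log-likelihood. That usage is an assumption of this sketch rather than a detail stated in the abstract.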

Author information

Authors and Affiliations

Intel Corporation, Microprocessor Research Labs, 2200 Mission College Blvd., Santa Clara, CA, 95052-8119, USA
Ara V. Nefian
Intel Corporation, Microcomputer Research Labs, Guanghua Road, 100020, Chaoyang District, Beijing, China
Luhong Liang, Xiaobo Pi & Xiaoxing Liu
Computer Science Division, University of California, Berkeley, Berkeley, CA, 94720-1776, USA
Kevin Murphy

Corresponding author

Correspondence to Ara V. Nefian.

About this article

Cite this article

Nefian, A.V., Liang, L., Pi, X. et al. Dynamic Bayesian Networks for Audio-Visual Speech Recognition. EURASIP J. Adv. Signal Process. 2002, 783042 (2002). https://doi.org/10.1155/S1110865702206083

Received: 30 November 2001
Revised: 06 August 2002
Published: 28 November 2002
DOI: https://doi.org/10.1155/S1110865702206083
