Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus

Patterson, Eric K.; Gurbuz, Sabri; Tufekci, Zekeriya; Gowdy, John N.

doi:10.1155/S1110865702206101

Research Article
Published: 28 November 2002


Eric K. Patterson¹, Sabri Gurbuz¹, Zekeriya Tufekci¹ & John N. Gowdy¹

EURASIP Journal on Advances in Signal Processing volume 2002, Article number: 208541 (2002)

Abstract

Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties due to background noise and multiple speakers in an application environment are significantly reduced by the additional information provided by visual features. This paper presents information on a new audio-visual database, a feature study on moving speakers, and baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper seeks to introduce a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE) database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper. One of these goals is to allow testing of adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed. It compares stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image transform method, and a deformable template scheme are used in this comparison to obtain visual features. This paper also presents methods, with results, aimed at making these techniques more robust to speaker movement. Finally, initial baseline speaker-independent results are included using all speakers, and conclusions as well as suggested areas of research are given.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, 29634, USA
Eric K. Patterson, Sabri Gurbuz, Zekeriya Tufekci & John N. Gowdy

Corresponding author

Correspondence to Eric K. Patterson.

About this article

Cite this article

Patterson, E.K., Gurbuz, S., Tufekci, Z. et al. Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus. EURASIP J. Adv. Signal Process. 2002, 208541 (2002). https://doi.org/10.1155/S1110865702206101

Received: 30 November 2001
Revised: 10 May 2002
Published: 28 November 2002
DOI: https://doi.org/10.1155/S1110865702206101
