
Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues

  • W. H. Adams¹,
  • Giridharan Iyengar¹,
  • Ching-Yung Lin²,
  • Milind Ramesh Naphade²,
  • Chalapathy Neti¹,
  • Harriet J. Nock¹ and
  • John R. Smith²
EURASIP Journal on Advances in Signal Processing 2003, 2003:987184

Received: 2 April 2002

Published: 25 February 2003


Abstract

We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon; novel concepts are then expressed in terms of the concepts in this lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely, audio, visual, and text. Concept representations are modeled using Gaussian mixture models (GMMs), hidden Markov models (HMMs), and support vector machines (SVMs). Models such as Bayesian networks and SVMs are then used in a late-fusion stage to detect concepts that are not modeled directly from features. Our experiments indicate promise in the proposed classification and fusion methodologies: the proposed fusion scheme achieves more than 10% relative improvement over the best unimodal concept detector.
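To make the two-stage design concrete, here is a minimal sketch (not the authors' implementation) in Python with scikit-learn: per-modality GMM likelihood-ratio detectors score a single concept, and an SVM performs the late fusion of those scores. All feature arrays, dimensionalities, and labels below are hypothetical placeholders.

    # Minimal late-fusion sketch: unimodal GMM detectors + SVM fusion.
    # Hypothetical data throughout; not the authors' code.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import SVC

    def gmm_detector_scores(X_train, y_train, X_eval, n_components=4):
        """Concept confidence = log-likelihood ratio of concept vs. background GMMs."""
        pos = GaussianMixture(n_components, covariance_type="diag").fit(X_train[y_train == 1])
        neg = GaussianMixture(n_components, covariance_type="diag").fit(X_train[y_train == 0])
        return pos.score_samples(X_eval) - neg.score_samples(X_eval)

    rng = np.random.default_rng(0)
    n = 200  # number of video shots (placeholder)
    X_audio = rng.normal(size=(n, 20))   # e.g., audio cepstral features
    X_visual = rng.normal(size=(n, 30))  # e.g., color/texture features
    X_text = rng.normal(size=(n, 10))    # e.g., transcript-derived features
    y = rng.integers(0, 2, size=n)       # binary labels for one concept

    # Stage 1: each modality yields one confidence score per shot.
    scores = np.column_stack([
        gmm_detector_scores(X, y, X) for X in (X_audio, X_visual, X_text)
    ])

    # Stage 2: late fusion -- an SVM combines the unimodal scores.
    fusion = SVC(kernel="rbf", probability=True).fit(scores, y)
    print("fused concept probabilities:", fusion.predict_proba(scores[:5])[:, 1])

In practice, the fusion classifier would be trained on detector scores computed over held-out data rather than on the detectors' own training set, to avoid overly optimistic fusion weights.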


Keywords: query by keywords, multimodal information fusion, statistical modeling of multimedia, video indexing and retrieval, SVM, GMM, HMM, spoken document retrieval, video event detection, video TREC

Authors’ Affiliations

¹ IBM T. J. Watson Research Center, Yorktown Heights, USA
² IBM T. J. Watson Research Center, Hawthorne, USA


© 2003 Hindawi Publishing Corporation