
  • Research Article
  • Open Access

Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues

EURASIP Journal on Advances in Signal Processing 2003, 2003:987184

https://doi.org/10.1155/S1110865703211173

  • Received: 2 April 2002
  • Published:

Abstract

We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon; novel concepts are then expressed in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely, audio, visual, and text. Concept representations are modeled using Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM). Models such as Bayesian networks and SVMs are used in a late-fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate promise in the proposed classification and fusion methodologies: the fusion scheme achieves more than 10% relative improvement over the best unimodal concept detector.
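To illustrate the late-fusion idea described above, here is a minimal sketch: each unimodal detector (audio, visual, text) emits a confidence score for a concept, and a second-stage classifier learns to weight and combine those scores. The paper uses SVMs and Bayesian networks for this fusion stage; the sketch below substitutes a simple logistic-regression combiner trained on synthetic scores, purely to show the structure of the approach. All data and parameters here are illustrative assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_scores(n, positive):
    # Synthetic unimodal confidence scores for n shots: one column per
    # modality (audio, visual, text). Positives skew high, negatives low,
    # with enough noise that no single modality is reliable on its own.
    base = 0.7 if positive else 0.3
    return np.clip(base + rng.normal(0.0, 0.2, size=(n, 3)), 0.0, 1.0)

# 200 positive and 200 negative examples of some concept.
X = np.vstack([make_scores(200, True), make_scores(200, False)])
y = np.array([1.0] * 200 + [0.0] * 200)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train fusion weights by gradient descent on the logistic loss.
# (Stand-in for the SVM/Bayesian-network fusion used in the paper.)
w, b = np.zeros(3), 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

# Fused decision: threshold the combined score.
fused = sigmoid(X @ w + b) > 0.5
accuracy = float(np.mean(fused == y))
```

Because the modality noise is independent, the fused detector recovers a higher accuracy than any single column of scores would give alone, which is the intuition behind the reported relative improvement over the best unimodal detector.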

Keywords

  • query by keywords
  • multimodal information fusion
  • statistical modeling of multimedia
  • video indexing and retrieval
  • SVM
  • GMM
  • HMM
  • spoken document retrieval
  • video event detection
  • video TREC

Authors’ Affiliations

(1)
IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
(2)
IBM T. J. Watson Research Center, Hawthorne, NY 10532, USA
