

  • Research Article
  • Open Access

Search the Audio, Browse the Video—A Generic Paradigm for Video Collections

EURASIP Journal on Advances in Signal Processing 2003, 2003:182545

  • Received: 18 April 2002
  • Published:


The amount of digital video being shot, captured, and stored is growing at a rate faster than ever before. This large body of stored video is not penetrable without efficient video indexing, retrieval, and browsing technology. Most prior work in the field can be roughly categorized into two classes. One class is based on image processing techniques, often called content-based image and video retrieval, in which video frames are indexed and searched for visual content. The other class is based on spoken document retrieval, which relies on automatic speech recognition and text queries. Both approaches have major limitations. In the first approach, semantic queries pose a great challenge, while the second, speech-based approach does not support efficient video browsing. This paper describes a system in which speech is used for efficient searching and visual data for efficient browsing, a combination that takes advantage of both approaches. A fully automatic indexing and retrieval system has been developed and tested. Automated speech recognition and phonetic speech indexing support text-to-speech queries. New browsable views are generated from the original video. A special synchronized browser allows instantaneous, context-preserving switching from one view to another. The system was successfully used to produce searchable-browsable video proceedings for three local conferences.


  • automatic video indexing
  • video browsing
  • video and speech retrieval
  • phonetic speech retrieval

Authors’ Affiliations

IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA
Computer Science, University of Arizona, Tucson, AZ 85721-0077, USA


Copyright © 2003 Hindawi Publishing Corporation