

  • Research Article
  • Open Access

A System for the Semantic Multimodal Analysis of News Audio-Visual Content

EURASIP Journal on Advances in Signal Processing 2010, 2010:645052

  • Received: 24 July 2009
  • Accepted: 21 February 2010
  • Published:


News-related content is among the most popular types of content for users of everyday applications. Although the generation and distribution of news content have become commonplace, owing to inexpensive media capturing devices and to media sharing services that target both professional and user-generated news content, the automatic analysis and annotation required to support intelligent search and delivery of this content remain an open issue. In this paper, a complete architecture for knowledge-assisted multimodal analysis of news-related multimedia content is presented, along with its constituent components. The proposed architecture employs state-of-the-art methods to analyze each individual modality (visual, audio, text) separately and proposes a novel fusion technique, based on the particular characteristics of news-related content, for combining the individual modality analysis results. Experimental results on news broadcast video illustrate the usefulness of the proposed techniques in the automatic generation of semantic annotations.


  • Automatic Generation
  • Individual Modality
  • Multimedia Content
  • Fusion Technique
  • Semantic Annotation

Publisher note

To access the full article, please see PDF.

Authors’ Affiliations

Centre for Research and Technology Hellas, Informatics and Telematics Institute, 6th Km Charilaou-Thermi Road, P.O. BOX 60361, 57001 Thermi, Greece
Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, 54006 Thessaloniki, Greece
Language Technology Lab, DFKI GmbH, Stuhlsatzenhausweg 3, 66123 Saarbrucken, Germany
Department of Computer Science/Human Media Interaction, University of Twente, 7500 AE Enschede, The Netherlands
Centre for Language and Speech Technology, Radboud University Nijmegen, 6525 HT Nijmegen, The Netherlands


© Vasileios Mezaris et al. 2010

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.