Open Access

Acoustic Event Detection Based on Feature-Level Fusion of Audio and Video Modalities

  • Taras Butko¹,
  • Cristian Canton-Ferrer¹,
  • Carlos Segura¹,
  • Xavier Giró¹,
  • Climent Nadeu¹,
  • Javier Hernando¹ and
  • Josep R. Casas¹
EURASIP Journal on Advances in Signal Processing 2011, 2011:485738

Received: 20 May 2010

Accepted: 14 January 2011

Published: 13 February 2011


Acoustic event detection (AED) aims to determine the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information produces a large number of errors, most of which are due to temporal overlaps. Indeed, temporal overlaps accounted for more than 70% of the errors in the real-world interactive seminar recordings used in the CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events by using information from both the audio and video modalities. First, the acoustic data are processed to obtain both a set of spectrotemporal features and the 3D localization coordinates of the sound source. Second, a number of features are extracted from the video recordings by means of object detection, motion analysis, and multicamera person tracking to represent the visual counterpart of several acoustic events. A feature-level fusion strategy is used together with a parallel structure of binary HMM-based detectors. The experimental results show that information from both the microphone array and the video cameras is useful for improving the detection rate of isolated as well as spontaneously generated acoustic events.
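
The fusion scheme summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the three streams (spectrotemporal features, 3D source coordinates, video features) have already been aligned to a common frame rate, and it replaces the HMM-based detectors with simple hypothetical threshold rules so that the example stays self-contained. The function names, event labels, feature dimensions, and threshold values are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[float]


def fuse_features(audio: Sequence[Frame],
                  location: Sequence[Frame],
                  video: Sequence[Frame]) -> List[Frame]:
    """Feature-level fusion: concatenate the per-frame vectors of each stream."""
    if not (len(audio) == len(location) == len(video)):
        raise ValueError("streams must be aligned to the same number of frames")
    return [list(a) + list(l) + list(v)
            for a, l, v in zip(audio, location, video)]


def run_parallel_detectors(
        fused: Sequence[Frame],
        detectors: Dict[str, Callable[[Frame], bool]]) -> Dict[str, List[bool]]:
    """Apply every binary (event vs. rest) detector independently to each frame."""
    return {name: [detect(frame) for frame in fused]
            for name, detect in detectors.items()}


# Two frames of made-up data: 2 audio features, 3 coordinates, 1 video feature.
audio = [[0.1, 0.2], [0.3, 0.4]]
location = [[1.0, 2.0, 1.5], [1.0, 2.0, 1.5]]
video = [[0.0], [1.0]]

fused = fuse_features(audio, location, video)

# Hypothetical stand-ins for the binary HMM-based detectors (assumption).
detectors = {
    "door_slam": lambda f: f[5] > 0.5,   # uses the video feature
    "speech": lambda f: f[0] > 0.05,     # uses the first audio feature
}

decisions = run_parallel_detectors(fused, detectors)
```

Because each detector in this sketch decides independently, more than one event can be flagged on the same frame, as happens for the second frame here.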


Recognition Rate, Sound Source, Object Detection, Audio Signal, Fusion Strategy

Authors’ Affiliations

Department of Signal Theory and Communications, TALP Research Center, Technical University of Catalonia, Barcelona, Spain


© Taras Butko et al. 2011

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.