Acoustic Event Detection Based on Feature-Level Fusion of Audio and Video Modalities

Butko, Taras; Canton-Ferrer, Cristian; Segura, Carlos; Giró, Xavier; Nadeu, Climent; Hernando, Javier; Casas, Josep R.

doi:10.1155/2011/485738

Research Article
Open access
Published: 13 February 2011

Taras Butko¹,
Cristian Canton-Ferrer¹,
Carlos Segura¹,
Xavier Giró¹,
Climent Nadeu¹,
Javier Hernando¹ &
Josep R. Casas¹

EURASIP Journal on Advances in Signal Processing volume 2011, Article number: 485738 (2011)

Abstract

Acoustic event detection (AED) aims to determine the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information produces a large number of errors, most of which are due to temporal overlaps. Indeed, temporal overlaps accounted for more than 70% of errors in the real-world interactive seminar recordings used in the CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both audio and video modalities. First, the acoustic data are processed to obtain both a set of spectrotemporal features and the 3D localization coordinates of the sound source. Second, a number of features are extracted from video recordings by means of object detection, motion analysis, and multicamera person tracking to represent the visual counterpart of several acoustic events. The features are combined with a feature-level fusion strategy, and detection is performed by a parallel structure of binary HMM-based detectors. The experimental results show that information from both the microphone array and the video cameras helps improve the detection rate of isolated as well as spontaneously generated acoustic events.
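
The fusion and detection structure named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the feature dimensions, and the toy scorer are all hypothetical, the streams are assumed to be already aligned to a common frame rate, and a simple mean-versus-threshold rule stands in for the pair of HMM likelihoods that a real binary detector would compare.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = List[float]
# A binary scorer returns (score for "this class", score for "everything else").
# In an HMM-based system these would be two model log-likelihoods; here the
# scorer is a hypothetical stand-in.
BinaryScorer = Callable[[List[Frame]], Tuple[float, float]]


def fuse_features(audio: Sequence[Sequence[float]],
                  video: Sequence[Sequence[float]],
                  localization: Sequence[Sequence[float]]) -> List[Frame]:
    """Feature-level fusion: concatenate the per-frame vectors of each stream.

    Assumes every stream has already been resampled to the same frame rate.
    """
    if not (len(audio) == len(video) == len(localization)):
        raise ValueError("all streams must have the same number of frames")
    return [list(a) + list(v) + list(p)
            for a, v, p in zip(audio, video, localization)]


def run_parallel_detectors(fused: List[Frame],
                           detectors: Dict[str, BinaryScorer]) -> List[str]:
    """Run one independent class-vs-rest detector per event class."""
    detected = []
    for name, scorer in detectors.items():
        score_class, score_rest = scorer(fused)
        if score_class > score_rest:
            detected.append(name)
    return detected


def mean_threshold_scorer(dim: int, threshold: float) -> BinaryScorer:
    """Toy stand-in scorer: mean of one fused dimension against a threshold."""
    def scorer(frames: List[Frame]) -> Tuple[float, float]:
        mean = sum(frame[dim] for frame in frames) / len(frames)
        return mean, threshold
    return scorer


# Two frames of made-up data: 2 audio features, 1 video feature, 3D position.
audio = [[0.1, 0.2], [0.3, 0.4]]
video = [[1.0], [0.0]]
position = [[2.0, 3.0, 1.5], [2.1, 3.0, 1.5]]

fused = fuse_features(audio, video, position)
detectors = {
    "door_slam": mean_threshold_scorer(dim=2, threshold=0.4),  # video feature
    "steps": mean_threshold_scorer(dim=0, threshold=0.5),      # audio feature
}
events = run_parallel_detectors(fused, detectors)
```

Because each detector decides independently, this structure can in principle report more than one event class for the same segment, which a single multi-class decision cannot do.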

Publisher note

To access the full article, please see PDF.

Author information

Authors and Affiliations

Department of Signal Theory and Communications, TALP Research Center, Technical University of Catalonia, Campus Nord, Ed. D5, Jordi Girona 1-3, 08034, Barcelona, Spain
Taras Butko, Cristian Canton-Ferrer, Carlos Segura, Xavier Giró, Climent Nadeu, Javier Hernando & Josep R. Casas

Authors

Taras Butko
Cristian Canton-Ferrer
Carlos Segura
Xavier Giró
Climent Nadeu
Javier Hernando
Josep R. Casas

Corresponding author

Correspondence to Taras Butko.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Butko, T., Canton-Ferrer, C., Segura, C. et al. Acoustic Event Detection Based on Feature-Level Fusion of Audio and Video Modalities. EURASIP J. Adv. Signal Process. 2011, 485738 (2011). https://doi.org/10.1155/2011/485738

Received: 20 May 2010
Revised: 30 November 2010
Accepted: 14 January 2011
Published: 13 February 2011
DOI: https://doi.org/10.1155/2011/485738
