Human-Activity Analysis in Multimedia Data


Many important applications in multimedia revolve around the detection of humans and the interpretation of their behavior. These include surveillance and intrusion detection, video conferencing applications, assisted living applications, and automatic analysis of sports videos, broadcasts, and movies, to name just a few. Success in these tasks often requires the integration of various sensor or data modalities such as video, audio, motion, and accompanying text, and typically hinges on a host of machine-learning methodologies to handle the inherent variability and complexity of the ensuing features. The computational efficiency of the resulting algorithms is critical since the amount of data to be processed in multimedia applications is typically large, and in real-time systems, speed is of the essence.

There have been several recent special issues dealing with the detection of humans and the analysis of their activity relying solely on video footage. In this special issue, we have tried to provide a platform for contributions that make use of a broader spectrum of multimedia information, complementing video with audio or text information as well as other types of sensor signals, whenever available.

The first group of papers in the special issue addresses the joint use of audio and video data. The paper "Audiovisual head orientation estimation with particle filtering in multisensor scenarios" by C. Canton-Ferrer et al. describes a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones, such as smart rooms for automatic video conferencing. The fusion of audio and vision is based on particle filtering.

S. Shivappa et al., in the paper "An iterative decoding algorithm for fusion of multimodal information," present an algorithm for speech segmentation in a meeting room scenario using both audio and visual cues. The authors put forward an iterative fusion algorithm that draws on the theory of turbo codes in communications by establishing an analogy between the redundant parity bits of the constituent codes of a turbo code and the information from different sensors in a multimodal system. Dimoulas et al., in the paper "Joint wavelet video denoising and motion activity detection in multimodal human activity analysis: application to video-assisted bioacoustic/psychophysiological monitoring," also integrate both audio and video information to develop a video-assisted biomedical monitoring system that has been tested for the noninvasive diagnosis of gastrointestinal motility dysfunctions.

The articles by N. Ince et al., titled "Detection of early morning daily activities with static home and wearable wireless sensors," and B. Toreyin et al., titled "Falling person detection using multisensor signal processing," are concerned with indoor monitoring and surveillance applications that rely on the integration of sensor data. N. Ince et al. describe a human activity monitoring system to assist patients with cognitive impairments caused by traumatic brain injury. The article details how fixed motion sensors combined with accelerometers embedded in wearable wireless sensors allow the system to detect and classify daily morning activities. B. Toreyin et al. outline a smart room application employing passive infrared and vibration sensors, as well as audio, to reliably detect a person falling.

The rest of the papers in this issue describe video-based surveillance applications. F. Porikli et al., in the paper "Robust abandoned object detection using dual foregrounds," detect abandoned objects by estimating dual foreground images from video recorded in an intelligent building. G. Pieri and D. Moroni, in the paper "Active video-surveillance based on stereo and infrared imaging," describe a video surveillance system integrating information from regular stereo and infrared cameras. They exploit the strengths of both modalities by utilizing the more accurate localization made possible by the stereo cameras in combination with the improved detection robustness that results from inspecting the IR data. The article by L. Raskin et al., titled "Using Gaussian process annealing particle filter for 3D human tracking," tracks humans in 3D scenes using particle filters. The article by M. Hossain et al., titled "Edge segment-based automatic video surveillance," describes a method that uses image edge information for automatic video surveillance.

A. Enis Cetin

Eric Pauwels

Ovidio Salvetti

Author information


Corresponding author

Correspondence to A. Enis Cetin.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Cetin, A.E., Pauwels, E. & Salvetti, O. Human-Activity Analysis in Multimedia Data. EURASIP J. Adv. Signal Process. 2008, 293453 (2007).
