Human-Activity Analysis in Multimedia Data
© A. Enis Cetin et al. 2008
Received: 11 November 2007
Accepted: 11 November 2007
Published: 19 November 2007
Many important applications in multimedia revolve around the detection of humans and the interpretation of their behavior. These include surveillance and intrusion detection, video conferencing applications, assisted living applications, and automatic analysis of sports videos, broadcasts, and movies, to name just a few. Success in these tasks often requires the integration of various sensor or data modalities such as video, audio, motion, and accompanying text, and typically hinges on a host of machine-learning methodologies to handle the inherent variability and complexity of the ensuing features. The computational efficiency of the resulting algorithms is critical since the amount of data to be processed in multimedia applications is typically large, and in real-time systems, speed is of the essence.
There have been several recent special issues dealing with the detection of humans and the analysis of their activity relying solely on video footage. In this special issue, we have sought to provide a platform for contributions that draw on a broader spectrum of multimedia information, complementing video with audio or text as well as other types of sensor signals, whenever available.
The first group of papers in the special issue addresses the joint use of audio and video data. The paper "Audiovisual head orientation estimation with particle filtering in multisensor scenarios'' by C. Canton-Ferrer et al. describes a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones such as smart rooms for automatic video conferencing. The fusion of audio and vision is based on particle filtering.
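As a purely illustrative aside (not the authors' implementation), the kind of particle-filter fusion described above can be sketched for a single head-orientation angle: each particle is propagated by a random-walk motion model and reweighted by the product of independent audio and video observation likelihoods. All noise levels and the conditional-independence assumption here are hypothetical.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian observation model."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def pf_step(particles, weights, z_video, z_audio,
            proc_sigma=0.05, video_sigma=0.1, audio_sigma=0.3):
    # 1. Propagate particles with a random-walk motion model.
    particles = [p + random.gauss(0.0, proc_sigma) for p in particles]
    # 2. Reweight by the product of the per-modality likelihoods
    #    (assumes audio and video are conditionally independent).
    weights = [w * gaussian_pdf(z_video, p, video_sigma)
                 * gaussian_pdf(z_audio, p, audio_sigma)
               for p, w in zip(particles, weights)]
    total = sum(weights)
    weights = [w / total for w in weights]
    # 3. Multinomial resampling to avoid weight degeneracy.
    particles = random.choices(particles, weights=weights, k=len(particles))
    weights = [1.0 / len(particles)] * len(particles)
    return particles, weights

random.seed(0)
n = 500
particles = [random.uniform(-math.pi, math.pi) for _ in range(n)]
weights = [1.0 / n] * n
true_angle = 0.8  # hypothetical ground-truth head orientation (radians)
for _ in range(20):
    z_v = true_angle + random.gauss(0.0, 0.1)   # simulated video observation
    z_a = true_angle + random.gauss(0.0, 0.3)   # simulated audio observation
    particles, weights = pf_step(particles, weights, z_v, z_a)

estimate = sum(p * w for p, w in zip(particles, weights))
```

The estimate converges toward the true angle; the noisier audio likelihood contributes less sharply than video but still refines the posterior, which is the appeal of multimodal fusion.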
S. Shivappa et al., in the paper "An iterative decoding algorithm for fusion of multimodal information,'' present an algorithm for speech segmentation in a meeting room scenario using both audio and visual cues. The authors put forward an iterative fusion algorithm that takes advantage of the theory of turbo codes in communications by establishing an analogy between the redundant parity bits of the constituent codes of a turbo code and the information from the different sensors in a multimodal system. Dimoulas et al., in the paper "Joint wavelet video denoising and motion activity detection in multimodal human activity analysis: application to video-assisted bioacoustic/psychophysiological monitoring,'' also integrate audio and video information to develop a video-assisted biomedical monitoring system that has been tested for the noninvasive diagnosis of gastrointestinal motility dysfunctions.
The articles by N. Ince et al., titled "Detection of early morning daily activities with static home and wearable wireless sensors,'' and B. Toreyin et al., titled "Falling person detection using multisensor signal processing,'' are concerned with indoor monitoring and surveillance applications that rely on the integration of sensor data. N. Ince et al. describe a human activity monitoring system to assist patients with cognitive impairments caused by traumatic brain injury. The article details how fixed motion sensors combined with accelerometers embedded in wearable wireless sensors allow the system to detect and classify daily morning activities. B. Toreyin et al. outline a smart room application employing passive infrared and vibration sensors, as well as audio, to reliably detect a person falling.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.