Localized Detection of Abandoned Luggage
© Jing-Ying Chang et al. 2010
Received: 15 December 2009
Accepted: 2 June 2010
Published: 29 June 2010
Abandoned luggage represents a potential threat to public safety. Identifying objects as luggage, identifying the owners of such objects, and determining whether owners have left luggage behind are the three main problems to be solved. This paper proposes two techniques: "foreground-mask sampling", which detects luggage of arbitrary appearance, and "selective tracking", which locates and tracks owners by examining only the neighborhood of the luggage. Experimental results demonstrate that once an owner abandons luggage and leaves the scene, the alarm fires within a few seconds. The average processing speed of the approach is 17.37 frames per second, which is sufficient for real-world applications.
Intelligent and automatic security surveillance systems have recently become an active research focus due to continuously growing public demand for such systems. Terrorist attacks frequently employ bombs, such as car bombs, suicide bombs, and luggage bombs. Modern technology cannot fully prevent such attacks, and security officers can easily miss their targets. However, compared with the first two forms, luggage bombs are relatively difficult to hide, and there is generally ample time either to deal with the bombs or to organize an evacuation. Humans thus have a better chance of preventing destruction arising from luggage bombs. Therefore, to achieve early detection of these threats with the assistance of automatic security systems, the ability to reliably detect suspicious items and identify their owners is urgently needed in venues such as airports and train stations.
Previous studies have given several definitions of a luggage abandonment event [1–5]. This study follows three similar but slightly different rules. (1) Contextual rule: luggage is considered unattended after the person who entered the area in possession of that luggage is no longer in close proximity to it. (2) Spatial rule: luggage is considered unattended when its owner is outside of a small neighborhood around the luggage. (3) Temporal rule: if the owner of a luggage item leaves the area without the luggage, or if the luggage has been left unattended for more than a specified number of consecutive seconds, the luggage is considered abandoned.
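The spatial and temporal rules can be sketched as a simple per-frame check. This is a minimal illustration, not the implementation used in this study: the function name, the `radius` and `max_unattended_s` parameters, and the convention that a distance of `None` means the owner has left the scene are all assumptions made for the example.

```python
def first_alarm_frame(owner_distances, fps, radius, max_unattended_s):
    """Return the index of the first frame at which the luggage counts as
    abandoned, or None if no alarm is raised.

    owner_distances: per-frame distance from owner to luggage; None means
    the owner is no longer in the scene (hypothetical convention).
    """
    unattended_frames = 0
    for i, dist in enumerate(owner_distances):
        if dist is None:
            # Temporal rule, first clause: owner left the area without the luggage.
            return i
        if dist > radius:
            # Spatial rule: owner is outside the neighborhood of the luggage.
            unattended_frames += 1
            # Temporal rule, second clause: unattended for too long in a row.
            if unattended_frames > max_unattended_s * fps:
                return i
        else:
            # Owner is back in close proximity, so the count restarts.
            unattended_frames = 0
    return None
```

Resetting the counter whenever the owner returns reflects the word "consecutive" in the temporal rule.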
1.1. Related Works
The task of abandoned luggage detection in surveillance video generally comprises three stages: The first stage localizes candidate abandoned luggage items in the video. The second stage locates and tracks the luggage owner(s), providing a trajectory for subsequent probabilistic reasoning. The final stage assesses a probability or confidence score for the luggage-abandonment event based on information obtained during previous stages. The three stages all represent distinct research areas with their own rich literature. Various existing algorithms may employ different methods for different stages.
The first stage of locating candidate abandoned luggage items within the video frame is performed using two types of techniques: those that utilize background subtraction [6–8], and those that do not [9, 10]. As generally acknowledged, object detection and recognition is an instinctive and spontaneous process for the human visual system. However, implementing a robust and accurate computer vision system capable of detecting relevant objects in monitored areas has proved challenging. The main difficulty is that the appearance of an object can vary significantly due to viewpoint changes, scene clutter, ambient lighting changes, and in some cases even shape changes (for nonrigid objects such as the human body). Consequently, the same object may present enormously different images under various viewing conditions. Background subtraction works reasonably well when the camera is stationary and the change in ambient lighting is gradual. For approaches without background subtraction, a set of discriminative object features must be learned through machine-learning algorithms to enable subsequent detection of these objects.
Most existing event detection methods incorporate some form of tracking algorithm [4–7, 10–13]. In most cases, tracking is performed on all detected moving objects or foreground blobs. However, because of occlusion and the fixed camera angle, this comprehensive tracking frequently results in errors such as identity switches (when two objects in close proximity switch identities), which are difficult to avoid and occur in many PETS 2006 demonstration sequences.
The final stage of determining whether an alarm is necessary is generally performed deterministically. In a deterministic system, an event is declared to have occurred if particular criteria are satisfied. A few reports employ a probabilistic framework for event modeling, with an event being deemed to have occurred if its confidence score exceeds a certain threshold. The probabilistic approach gives users increased flexibility to set thresholds, and thus system sensitivity, and a better understanding of the reality of a situation.
The contributions of this paper are as follows. First, this paper proposes foreground-mask sampling to localize candidate abandoned luggage items by calculating the intersection of a number of background-subtracted frames sampled over a period of time. Abandoned luggage items are assumed to be static foreground objects, and thus appear in this intersection. Since this approach requires no prior learning of luggage appearance in any form, luggage of all shapes, sizes, orientations, viewing angles and colors can be localized without the need for training data and its associated constraints.
Second, selective tracking is applied following identification and localization of a suspicious luggage item. This approach seeks the owner of the luggage in a neighborhood around the detected item. If the owner is found within this neighborhood, the luggage is assumed to be attended by its owner and thus to require no further processing. However, if no owner is found, the tracking algorithm returns to a frame in which the owner was still attending the luggage, and starts tracking the owner from that point. Selective tracking tracks only the owner and ignores other irrelevant moving objects in the foreground. Accordingly, the computational requirements of selective tracking are lower than those of previous works.
The remainder of this paper is organized as follows. Section 2 details how the foreground-mask sampling approach localizes the suspicious luggage. Section 3 then elucidates the selective tracking module. Section 4 presents the experimental results, indicating the tracked owner and alarm time. Finally, Section 5 draws conclusions.
2. Foreground-Mask Sampling
During the first stage of the system, foreground-mask sampling attempts to localize static and possibly abandoned luggage items within the camera view. This technique imitates the natural human ability to focus attention exclusively on objects of interest. The algorithm identifies the objects (in this case abandoned luggage items) via logical foreground-background reasoning, while ignoring all irrelevant objects within the same scene. Because no appearance-based model is used to locate suspicious luggage items, the method can deal with luggage of any color and shape and is not affected by different viewing angles.
Since abandoned luggage is assumed to remain static for longer than the temporal threshold defined above, a fixed number of video frames are collected from the most recent interval of that length, evenly distributed across the interval. In the subsequent experiments, detection performance is not significantly influenced by changing the number of sampled frames.
The background model is constructed using selected clean frames from the standard test sequences in which foreground clutter is minimized. In situations in which clean background frames are unavailable, frames with minimal foreground clutter are used. The background model comprises the average of the selected frames, with a standard deviation calculated on each background pixel to consider the pixel variation. This study does not employ dynamic update of the background model, since the tested video sequences contain minimal ambient lighting change, and for such sequences a one-time construction of a static background model provides reasonable performance. The background-subtraction-based object detection is not constrained to the following method. It can be replaced by other state-of-the-art approaches [15, 16] for complex environments.
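The background model and the intersection of sampled foreground masks can be sketched as follows, assuming grayscale frames stored as NumPy arrays. The function names, the deviation factor `k`, and the `min_std` floor are assumptions introduced for illustration; the text specifies only a per-pixel mean and standard deviation and an intersection of background-subtracted frames.

```python
import numpy as np

def build_background(clean_frames):
    """Per-pixel mean and standard deviation over the selected clean frames."""
    stack = np.stack(clean_frames).astype(np.float64)
    return stack.mean(axis=0), stack.std(axis=0)

def foreground_mask(frame, mean, std, k=3.0, min_std=2.0):
    """Mark pixels that deviate from the background mean by more than k deviations.

    min_std (assumed) floors the deviation so that near-constant background
    pixels do not fire on small sensor noise.
    """
    deviation = np.abs(frame.astype(np.float64) - mean)
    return deviation > k * np.maximum(std, min_std)

def static_foreground(sampled_frames, mean, std, k=3.0):
    """Intersect the foreground masks of frames sampled over a time window.

    Only pixels that are foreground in every sampled frame survive, so moving
    objects drop out and static objects remain.
    """
    masks = [foreground_mask(f, mean, std, k) for f in sampled_frames]
    return np.logical_and.reduce(masks)
```

In this sketch a person walking through the scene occupies different pixels in each sampled frame and therefore vanishes from the intersection, whereas a stationary bag stays in it.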
3. Selective Tracking Module
The system presents information on the locations of suspicious items after obtaining the static foreground mask. All static foreground objects are assumed to be either humans or luggage items. Each foreground region in the mask is checked to determine whether it is a human via a combination of skin color information and body contours. If the region is identified as a human, it is discarded because the object of the search is abandoned luggage. If the region is identified as not a human, it is assumed to be a luggage item. A local search region is constructed around the detected luggage to see whether its owner is in close proximity in the present frame. If the owner is found, the region is again discarded because the owner exhibits no intention of abandoning the luggage. However, if the owner is not located near the luggage, the algorithm goes back in time by a predefined number of seconds to a frame in which the owner was still attending the luggage and begins tracking the owner from that point. The tracking module also employs skin color information and human body contour to track the owner.
The backtracking interval is set to 30 seconds based on the assumption that when an isolated luggage item is first detected in a scene, its owner must have been in close proximity to the item until shortly before detection. This assumption is valid because if the owner had been absent for some time, the isolated luggage item would already have been detected by the foreground-mask sampling technique. Furthermore, owners who abandon their luggage with criminal intent would generally want to avoid attention and thus are unlikely to loiter; instead they will remain constantly with their luggage prior to abandonment. Therefore, when multiple people surround the abandoned luggage, the person closest to the luggage is assumed to be the owner.
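The owner search and the backtracking decision can be illustrated with a short sketch. The function names, the circular search region, and the `radius` parameter are assumptions; the text states only that a local search region is used, that the closest person is taken as the owner, and that the system goes back 30 seconds when no owner is nearby.

```python
import math

def find_owner(luggage_xy, people_xy, radius):
    """Return the index of the closest person within `radius` of the luggage,
    or None if nobody is inside the (assumed circular) search region."""
    best_index, best_dist = None, radius
    for i, (px, py) in enumerate(people_xy):
        dist = math.hypot(px - luggage_xy[0], py - luggage_xy[1])
        if dist <= best_dist:
            best_index, best_dist = i, dist
    return best_index

def backtrack_start(t_now, backtrack_s=30.0):
    """Time (seconds) from which owner tracking restarts when no owner is found."""
    return max(0.0, t_now - backtrack_s)
```

If `find_owner` returns `None` for the present frame, tracking would start from `backtrack_start(t_now)` and `find_owner` would be applied to the people detected in that earlier frame.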
The actual implementation uses a cache mechanism to store the information from the previous back-tracking. When the system needs to repeat back-tracking around a single abandoned luggage item, it needs only to update some of the stored information. For example, if two 30-second back-tracks overlap by 20 seconds, the information regarding the first 20 seconds of the second back-tracking can be obtained directly from the last 20 seconds of the first back-tracking, which is cached in the system. The overhead of recollecting that information is thus eliminated. This mechanism reduces the computation required by the back-tracking procedure sufficiently to maintain real-time performance on live streaming surveillance videos.
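One way such a cache could work is sketched below. The class, its method names, and the per-frame `extract` callback are hypothetical; the text describes only that information from overlapping back-tracks is reused rather than recollected.

```python
class BacktrackCache:
    """Caches per-frame information so overlapping back-tracks reuse it."""

    def __init__(self, extract):
        self._extract = extract   # hypothetical per-frame feature extractor
        self._store = {}          # frame index -> cached information
        self.computed = 0         # number of frames actually processed

    def window(self, start, end):
        """Return information for frames in [start, end), reusing cached entries."""
        # Entries older than the new window are dropped to bound memory use.
        self._store = {i: v for i, v in self._store.items() if i >= start}
        for i in range(start, end):
            if i not in self._store:
                self._store[i] = self._extract(i)
                self.computed += 1
        return [self._store[i] for i in range(start, end)]


# Two 30-frame back-tracks that overlap by 20 frames (one frame per second
# is assumed here purely to keep the numbers small).
cache = BacktrackCache(extract=lambda i: i * i)
first = cache.window(0, 30)
second = cache.window(10, 40)
print(cache.computed)  # 40: only the 10 new frames were processed the second time
```

The second call processes only the frames that were not part of the first window, which mirrors the 30-second and 20-second example in the text.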
Because suspicious luggage items have been identified, tracking can be performed solely and selectively on their owners. This mechanism closely mimics the human ability to notice and track only objects of interest even under a highly cluttered background; for example, humans have a natural ability to identify familiar faces even in such crowded environments as an airport pick-up area.
The implementation of detection and tracking using skin color information and human body contour is detailed below, together with its integration into the motion prediction of the tracking module.
3.1. Cr Color Channel with Human Skin
The response of human skin is significantly larger in the YCbCr color space than in the commonly used RGB color space. Due to significant blood flow, human skin responds strongly in the Cr channel of the YCbCr space, irrespective of skin color. Accordingly, the Cr channel is used for human face localization because in situations involving severe occlusion (crowded scenes with people overlapping one another), the human face is the most visible body part when viewed with a typical surveillance camera looking downwards from a height.
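A minimal sketch of Cr-based skin detection follows, assuming 8-bit RGB input and the full-range BT.601 conversion used by JPEG/JFIF. The threshold range of 133 to 173 is an illustrative assumption borrowed from values commonly quoted in the skin-color literature, not a value given in this paper.

```python
import numpy as np

def cr_channel(rgb):
    """Cr (red-difference chroma) of an 8-bit RGB image, full-range BT.601."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b

def skin_mask(rgb, cr_min=133.0, cr_max=173.0):
    """Boolean mask of pixels whose Cr value falls in the assumed skin range."""
    cr = cr_channel(rgb)
    return (cr >= cr_min) & (cr <= cr_max)
```

Neutral gray pixels map to Cr = 128 under this conversion, so they fall outside the assumed range, while reddish skin tones fall inside it.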
3.2. Improved Hough Transform on Body Contour
The Hough transform (HT) is a morphological tool which, in its simplest form, maps a straight line in normal space to a point in parameter space. A generalized version of the HT is utilized to localize a contour of arbitrary shape. The algorithm comprises two stages: template generation and contour matching.
During template generation, given a predefined head-shoulder contour, as in Figure 5, the HT algorithm first establishes a center point of the face for the contour template. The algorithm then runs through all edge points on the contour template and for each point records the edge orientation (angle with respect to the horizontal direction), the radial distance (distance with respect to the center point) and the radial angle (angle with respect to the center point). The edge orientation lies between 0 and 180 degrees, and thus serves as the bin index under which the (distance, angle) pair is recorded in a 180-bin reference table. Multiple (distance, angle) pairs can be recorded under the same orientation bin in the reference table. After all the points on the contour template have been traversed, the template generation and the reference table are complete.
The improvement over the standard generalized HT is that, during contour matching, each edge point is matched against a Gaussian-weighted range of orientation bins rather than a single bin. This modification is made because an orientation angle computed from an edge point on the input edge image carries an inherent error arising from pixel quantization and angle quantization, and thus the angle obtained at best indicates only a small range of neighboring angles. This small range is modeled by applying a Gaussian-weighting system to a range of orientation bins. In addition, by allowing the angle to vary within a limited range, the system can handle human head-shoulder contours that are slightly out of alignment with a perfect frontal image, thereby allowing some variance in pose.
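The two stages can be sketched as follows, assuming edge points are supplied as (x, y, orientation in degrees) triples. The function names, the Gaussian width `sigma`, and the bin range `half_width` are assumptions for illustration; the paper does not state the values it uses.

```python
import math
import numpy as np

N_BINS = 180  # one bin per degree of edge orientation

def build_reference_table(template_edges, center):
    """Template generation: store (distance, angle) to the center per orientation bin."""
    table = [[] for _ in range(N_BINS)]
    cx, cy = center
    for x, y, phi in template_edges:
        r = math.hypot(cx - x, cy - y)
        alpha = math.atan2(cy - y, cx - x)
        table[int(round(phi)) % N_BINS].append((r, alpha))
    return table

def vote(image_edges, table, shape, sigma=2.0, half_width=4):
    """Contour matching: each edge point votes for candidate centers, using a
    Gaussian-weighted range of neighboring orientation bins."""
    acc = np.zeros(shape, dtype=np.float64)  # accumulator indexed [y, x]
    for x, y, phi in image_edges:
        base = int(round(phi))
        for d in range(-half_width, half_width + 1):
            weight = math.exp(-0.5 * (d / sigma) ** 2)
            for r, alpha in table[(base + d) % N_BINS]:
                cx = int(round(x + r * math.cos(alpha)))
                cy = int(round(y + r * math.sin(alpha)))
                if 0 <= cy < shape[0] and 0 <= cx < shape[1]:
                    acc[cy, cx] += weight
    return acc
```

The peak of the accumulator gives the most likely center point. Because votes are spread over neighboring bins, an edge whose measured orientation is off by a few degrees still contributes, only with a reduced weight.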
Because multiple cues are used to detect owners, even if the color of an abandoned object is close to skin color, the object will not be recognized as a human, since it has no head-shoulder contour.
3.3. Integration into Motion Prediction
Because the motion prediction is calculated recursively, past information is considered and past influences decay exponentially with time. In the implementation, the two exponential smoothing coefficients are empirically determined to be 0.4 and 0.6, respectively.
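A recursive, exponentially smoothed motion estimate can be sketched as below. The form of the update is an assumption: a first-order smoothed velocity is used here, with 0.4 weighting the newest displacement and 0.6 weighting the previous estimate. The two values come from the text, but their assignment to these terms, and the function names, are illustrative guesses rather than the paper's formulation.

```python
def smooth_velocity(positions, alpha=0.4, beta=0.6):
    """Recursively smoothed (vx, vy) after each new position.

    Assumed update: v_t = alpha * (p_t - p_{t-1}) + beta * v_{t-1}.
    Older displacements are scaled by successive powers of beta, so their
    influence decays exponentially with time.
    """
    vx = vy = 0.0
    velocities = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        vx = alpha * (x1 - x0) + beta * vx
        vy = alpha * (y1 - y0) + beta * vy
        velocities.append((vx, vy))
    return velocities

def predict_next(positions, alpha=0.4, beta=0.6):
    """Predict the next position from the last position and smoothed velocity."""
    if len(positions) < 2:
        raise ValueError("at least two positions are needed")
    vx, vy = smooth_velocity(positions, alpha, beta)[-1]
    x, y = positions[-1]
    return x + vx, y + vy
```

With a target moving at constant velocity, the smoothed estimate in this sketch approaches the true velocity over successive frames.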
The three probabilities serve more as comparative than absolute values. A change in the standard deviations of these probability calculations would similarly affect all probabilities thus calculated, with the most probable detection still having the highest probability ranking. Empirical values thus are assigned to the standard deviations, and parameter selections in the experiment produce insignificant effects.
4. Experimental Results
Table: Alarm time (seconds) on PETS2006 Seq. 1, Seq. 2, Seq. 4, Seq. 5, Seq. 6, and Seq. 7.
The AVSS 2007 dataset contains three cases with different difficulty levels: easy, medium and hard. The easy case contains objects that appear larger, activities closer to the camera and less scene clutter; as the difficulty level rises, objects shrink and clutter increases. The proposed approach successfully detects the abandoned luggage in all three cases. The owner in the easy case is tracked continuously until leaving the scene without his luggage, triggering an alarm event. In the medium and hard cases, the owners pass behind a large pillar before leaving the scene without their luggage, and both are occluded for about 1.5 seconds. The proposed tracking engine is unable to follow the owner through the occlusion, and thus the owner is deemed lost; therefore alarms are also triggered in these two cases.
Table: Comparison on the AVSS 2007 dataset.
5. Conclusion and Future Work
This paper presents a localized approach for detecting abandoned luggage in surveillance environments. Through foreground-mask sampling, only the object of interest is localized, while all irrelevant, interfering agents are filtered out. Tracking thus can be performed in a more selective and localized manner. An improved implementation of the HT for detecting the contours of the upper body is also proposed for use in tandem with skin color detection. These techniques allow abandoned luggage detection to run in real time, at 17.37 frames per second on average.
In the future, the proposed approach will be extended to a multicamera network in which coordination of various cameras enables cues to be gathered from multiple perspectives and information to be relayed from one camera to another. In addition, the approach will be generalized to include different viewing angles on the human form. Currently, our approach can detect multiple abandoned objects, and the alarm fires at the first abandonment. However, the system still has room for improvement. It becomes inefficient (running at a lower frame rate) when multiple abandoned objects exist simultaneously, since the system then requires multiple selective tracking modules to locate each owner. High population density is another difficult issue for vision-based methods; in such scenes even humans cannot notice abandonment reliably. In this case, the foreground-mask sampling method may fail, and the system needs an object-recognition-based solution to detect the abandonment.
The authors would like to thank the National Science Council of the Republic of China, Taiwan, for financially supporting this research under Contract no. NSC 97-2221-E-002-173-MY3. Ted Knoy is appreciated for his editorial assistance.
- Tian Y-L, Feris R, Hampapur A: Real-time detection of abandoned and removed objects in complex environments. Proceedings of the IEEE International Workshop on Visual Surveillance in Conjunction with European Conference on Computer Vision (ECCV '08), 2008.
- Bird N, Atev S, Caramelli N, Martin R, Masoud O, Papanikolopoulos N: Real time, online detection of abandoned objects in public areas. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '06), May 2006 3775-3780.
- Ferrando S, Gera G, Regazzoni C: Classification of unattended and stolen objects in video-surveillance system. In Proceedings of the IEEE International Conference on Video and Signal Based Surveillance (AVSS '06), 2006, Washington, DC, USA. IEEE Computer Society; 21.
- Beynon M, Van Hook D, Seibert M, Peacock A, Dudgeon D: Detecting abandoned packages in a multi-camera video surveillance system. Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance, July 2003 221-228.
- Lv F, Song X, Wu B, Singh V, Nevatia R: Left-luggage detection using bayesian inference. Proceedings of the 9th IEEE International Workshop on Performance Evaluation in Tracking and Surveillance (PETS '06), June 2006.
- Martinez-del Rincon J, Herrero-Jaraba JE, Gomez JR, Orrite-Urunuela C: Automatic left luggage detection and tracking using multi-camera ukf. Proceedings of the 9th IEEE International Workshop on Performance Evaluation in Tracking and Surveillance (PETS '06), June 2006 59-66.
- Li L, Luo R, Ma R, Huang W, Leman K: Evaluation of an ivs system for abandoned object detection on pets 2006 datasets. Proceedings of the 9th IEEE International Workshop on Performance Evaluation in Tracking and Surveillance (PETS '06), June 2006 91-98.
- Zhou J, Hoang J: Real time robust human detection and tracking system. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), June 2005 149.
- Wu B, Nevatia R: Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors. In Proceedings of the IEEE International Conference on Computer Vision, 2005, Washington, DC, USA. Volume 1. IEEE Computer Society; 90-97.
- Wu B, Nevatia R: Tracking of multiple, partially occluded humans based on static body part detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), 2006, Washington, DC, USA 1: 951-958.
- Auvinet E, Grossmann E, Rougier C, Dahmane M, Meunier J: Left-luggage detection using homographies and simple heuristics. Proceedings of the 9th IEEE International Workshop on Performance Evaluation in Tracking and Surveillance (PETS '06), June 2006 51-58.
- Guler S, Silverstein JA, Pushee IH: Stationary objects in multiple object tracking. Proceedings of the IEEE International Conference on Video and Signal Based Surveillance (AVSS '07), September 2007 248-253.
- Smith K, Quelhas P, Gatica-Perez D: Detecting abandoned luggage items in a public space. Proceedings of the 9th IEEE International Workshop on Performance Evaluation in Tracking and Surveillance (PETS '06), June 2006 75-82.
- PETS 2006 dataset. http://www.cvg.cs.reading.ac.uk/PETS2006/data.html
- Li L, Huang W, Gu IY-H, Tian Q: Statistical modeling of complex backgrounds for foreground object detection. IEEE Transactions on Image Processing 2004, 13(11):1459-1472. 10.1109/TIP.2004.836169
- Nadimi S, Bhanu B: Physical models for moving shadow and object detection in video. IEEE Transactions on Pattern Analysis and Machine Intelligence 2004, 26(8):1079-1087. 10.1109/TPAMI.2004.51
- Kumar CNR, Bindu A: An efficient skin illumination compensation model for efficient face detection. Proceedings of the 32nd Annual Conference on IEEE Industrial Electronics (IECON '06), November 2006 3444-3449.
- Chai D, Ngan KN: Face segmentation using skin-color map in videophone applications. IEEE Transactions on Circuits and Systems for Video Technology 1999, 9(4):551-564. 10.1109/76.767122
- Axis Communications. http://www.axis.com/
- Arecont Vision. http://www.arecontvision.com/
- Hough PVC: Method and means for recognizing complex patterns. US Patent no. 3 069 654, December 1962.
- Duda RO, Hart PE: Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM 1972, 15(1):11-15. 10.1145/361237.361242
- Hetzel G, Leibe B, Levi P, Schiele B: 3d object recognition from range images using local feature histograms. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), 2001 2: 394-399.
- i-LIDS Dataset for AVSS 2007. http://www.eecs.qmul.ac.uk/~andrea/avss2007_d.html
- Porikli F, Ivanov Y, Haga T: Robust abandoned object detection using dual foregrounds. EURASIP Journal on Advances in Signal Processing 2008, 2008.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.