- Research Article
- Open Access
Robust Real-Time Background Subtraction Based on Local Neighborhood Patterns
© Ariel Amato et al. 2010
- Received: 1 December 2009
- Accepted: 21 June 2010
- Published: 29 June 2010
This paper describes an efficient background subtraction technique for detecting moving objects. The proposed approach is able to overcome difficulties such as illumination changes and moving shadows. Our method introduces two discriminative features based on angular and modular patterns, which are formed by similarity measurements between two sets of RGB color vectors: one belonging to the background image and the other to the current image. We show how these patterns are used to improve foreground detection in the presence of moving shadows and when there are strong similarities in color between background and foreground pixels. Experimental results over a collection of public datasets and our own datasets of real image sequences demonstrate that the proposed technique achieves superior performance compared with state-of-the-art methods. Furthermore, its low computational and space complexities make the presented algorithm feasible for real-time applications.
- Background Model
- Current Frame
- Foreground Pixel
- Cast Shadow
- Color Vector
Moving object detection is a crucial part of automatic video surveillance systems. One of the most common and effective approaches to localizing moving objects is background subtraction, in which a model of the static scene background is subtracted from each frame of a video sequence. This technique has been actively investigated and applied by many researchers in recent years [1–3]. The task of moving object detection is strongly hindered by several factors such as shadows cast by moving objects, illumination changes, and camouflage. In particular, cast shadows are the areas projected on a surface because objects partially or totally occlude direct light sources. An area affected by a cast shadow experiences a change of illumination, so the background subtraction algorithm can misclassify background as foreground [4, 5]. Camouflage occurs when there is a strong similarity in color between background and foreground, so that foreground pixels are classified as background. Broadly speaking, these issues give rise to problems such as shape distortion, object merging, and even object loss. Thus a robust and accurate algorithm to segment moving objects is highly desirable.
In this paper, we present an adaptive background model formed by temporal and spatial components. These components are computed by measuring the angle and the Euclidean distance between two sets of color vectors. We show how these components are combined to improve the robustness and the discriminative sensitivity of the background subtraction algorithm in the presence of moving shadows and strong similarities in color between background and foreground pixels. Another important advantage of our algorithm is its low computational and space complexity, which makes it feasible for real-time applications.
The rest of the paper is organized as follows. Section 2 introduces a brief literature review. Section 3 presents our method. In Section 4 experimental results are discussed. Concluding remarks are available in Section 5.
Haritaoglu et al. state that in W4 [6] the background is modeled by representing each pixel by three values: its minimum and maximum intensity values and the maximum intensity difference between consecutive frames observed during a training period. Pixels are classified as foreground if the differences between the current value and the minimum and maximum values are greater than the maximal interframe difference. However, this approach is rather sensitive to shadows and lighting changes, since only the illumination intensity cue is used, and the memory required to implement this algorithm is extremely high.
Horprasert et al. [7] implement a statistical color background algorithm, which uses color chrominance and brightness distortion. The background model is built using four values: the mean, the standard deviation, the variation of the brightness distortion, and the variation of the chrominance distortion. However, this approach usually fails for low and high intensities.
Kim et al. [8] use an approach similar to [7], but they obtain more robust motion segmentation in the presence of illumination and scene changes by using a background model with codebooks. The codebook idea makes it possible to learn more about the model during the training period. The authors propose a way to cope with the unstable information of dark pixels, but they still have some problems in the low- and high-intensity regions. Furthermore, the space complexity of their algorithm is high.
Stauffer and Grimson [9] address the low- and high-intensity regions problem by using a mixture of Gaussians to build a background color model for every pixel. Pixels from the current frame are checked against the background model by comparing them with every Gaussian in the model until a matching Gaussian is found. If a match is found, the mean and variance of the matched Gaussian are updated; otherwise, a new Gaussian with the mean equal to the current pixel color and some initial variance is introduced into the mixture.
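The match-then-update loop described above can be sketched for a single grey-level pixel as follows. This is a simplified illustration, not the published algorithm: the function name, the 2.5-sigma match test, the learning rate, the initial variance, and the cap on the number of Gaussians are all assumptions, and component weights are omitted entirely.

```python
import math

def update_pixel_mixture(mixture, value, rate=0.05, match_sigmas=2.5,
                         init_var=36.0, max_size=3):
    """Update the mixture of one pixel with a new observation.

    mixture: list of [mean, variance] pairs, modified in place.
    Returns True if the value matched an existing Gaussian.
    All default parameter values are illustrative assumptions.
    """
    for gaussian in mixture:
        mean, var = gaussian
        # Assumed match test: value lies within match_sigmas standard deviations.
        if abs(value - mean) <= match_sigmas * math.sqrt(var):
            gaussian[0] = (1 - rate) * mean + rate * value
            gaussian[1] = (1 - rate) * var + rate * (value - gaussian[0]) ** 2
            return True
    # No match: introduce a new Gaussian centred on the current value.
    if len(mixture) >= max_size:
        # Simplification: drop the last component. The original method keeps
        # weights and replaces the least probable component instead.
        mixture.pop()
    mixture.append([float(value), init_var])
    return False

pixel_model = [[100.0, 25.0]]
matched_first = update_pixel_mixture(pixel_model, 102)   # close to the mean
matched_second = update_pixel_mixture(pixel_model, 200)  # far from every mean
```

In this sketch a small fluctuation only nudges the matched Gaussian, whereas a large jump creates a new component, which is the behaviour the paragraph describes.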
McKenna et al. [10] assume that cast shadows result in a significant change in intensity without much change in chromaticity. Pixel chromaticity is modeled using its mean and variance, and the first-order gradient of each background pixel is modeled using gradient means and magnitude variance. Moving shadows are then classified as background if the chromaticity or gradient information supports their classification.
Cucchiara et al. [11] use a model in the Hue-Saturation-Value (HSV) color space and focus their approach on shadow suppression. The idea is that shadows change the hue component slightly and decrease the saturation component significantly. In the HSV color space a more realistic noise model can be built. However, this approach also has drawbacks. The similarity measured in the nonlinear HSV color space usually generates ambiguity at gray levels. Furthermore, threshold handling is the major limitation of this approach.
A simple and common background subtraction procedure involves subtraction of each new image from a static model of the scene. As a result, a binary mask with two labels (foreground and background) is formed for each pixel in the image plane. Broadly speaking, this technique can be separated into two stages, one dealing with the scene modeling and another with the motion detection process. The scene modeling stage represents a crucial part of the background subtraction technique [12–17].
A simple unimodal approach usually relies on statistical parameters such as the mean and standard deviation, for example, [7, 8, 10]. These parameters are obtained during a training period and are then updated dynamically. In the background modeling process, the statistical values depend on both the low- and high-frequency changes of the camera signal. If the standard deviations of the low- and high-frequency components of the signal are comparable, methods based on such statistical parameters exhibit robust discriminability. When the standard deviation of the high-frequency change is significantly smaller than that of the low-frequency change, the background model can be improved to make the discriminative sensitivity much higher. Since most real video sequences exhibit a considerable change in the low-frequency domain, we propose to build a model that is insensitive to low-frequency changes. The main idea is to estimate only the high-frequency change of each pixel value over one interframe interval. In this case the general background model can be explained as the subtraction between the current frame and the previous frame, which is assumed to be the background image. Two values are computed for each pixel to model background changes during the training period: the maximum differences in angle and in Euclidean distance between the color vectors of consecutive frames. The angular difference is used because it can be considered a photometric invariant of the color measurement and, in turn, a significant cue for detecting moving shadows.
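The photometric-invariance argument can be made concrete with a short worked equation. Here $\mathbf{p}$ denotes a nonzero background RGB vector and $\alpha$ a shading factor; both symbols, and the assumption that a shadow scales all three channels by the same factor, are illustrative simplifications rather than notation taken from the text.

```latex
\theta(\mathbf{p}, \alpha\mathbf{p})
  = \arccos\frac{\mathbf{p}\cdot(\alpha\mathbf{p})}
                {\lVert\mathbf{p}\rVert\,\lVert\alpha\mathbf{p}\rVert}
  = \arccos\frac{\alpha\lVert\mathbf{p}\rVert^{2}}
                {\alpha\lVert\mathbf{p}\rVert^{2}}
  = \arccos 1 = 0,
\qquad
\lVert\mathbf{p}-\alpha\mathbf{p}\rVert = (1-\alpha)\,\lVert\mathbf{p}\rVert,
\qquad 0<\alpha<1 .
```

Under this idealized model a shadow leaves the angle unchanged while the Euclidean distance grows with the strength of the shading, which is why the two measurements carry complementary information.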
Often pixelwise comparison is not enough to distinguish background from foreground and in our classification process we further analyze the neighborhood of each pixel position. In the next section we give a formal definition of the proposed similarity measurements.
3.1. Background Scene Modeling
3.1.1. Similarity Measurements
- (ii) Euclidean distance similarity measurement between two color vectors in the RGB color space is defined as follows:
where the two threshold values are intrinsic parameters of the threshold functions of the similarity measurements.
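As a minimal sketch of the pixelwise measurements, the following assumes the standard definitions of the angle and the Euclidean distance between two RGB vectors, and a threshold function that returns 1 when a measurement strictly exceeds its threshold. The function names and the direction of the comparison are assumptions made for illustration.

```python
import math

def angular_difference(p, q):
    """Angle in radians between two RGB vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norms == 0.0:
        return 0.0  # assumed convention: the angle is undefined for a zero vector
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def euclidean_difference(p, q):
    """Euclidean distance between two RGB vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def exceeds(value, threshold):
    """Assumed threshold function: 1 if value is strictly above threshold, else 0."""
    return 1 if value > threshold else 0

background = (100.0, 50.0, 25.0)
shadowed = (50.0, 25.0, 12.5)   # same color direction at half the brightness
angle = angular_difference(background, shadowed)
distance = euclidean_difference(background, shadowed)
```

For the shadow-like pair above, the angle stays close to zero while the distance is large, so the two measurements respond differently to the same change.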
To describe a neighborhood similarity measurement, let us first characterize the index vector, which defines the position of a pixel in the image. We also need to name the neighborhood radius vector, which defines the positions of the pixels that belong to the neighborhood relative to any current pixel. In practice, the domain of the radius vector is just a square window around the chosen pixel.
- (iii) Angular neighborhood similarity measurement between two sets of color vectors in the RGB color space can be written as
where the threshold function and the angular similarity measurement are defined in (3) and (1), respectively.
- (iv) Euclidean distance neighborhood similarity measurement between two sets of color vectors in the RGB color space can be written as
where the threshold function and the Euclidean distance similarity measurement are defined in (3) and (2), respectively.
With each of the neighborhood similarity measurements we associate a threshold function:
where the two threshold values are intrinsic parameters of the threshold functions of the neighborhood similarity measurements.
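One way to read the neighborhood measurements is as a count of the pixels inside the square window whose pixelwise measurement exceeds its threshold. The sketch below follows that reading; the function names, the image layout (a list of rows of RGB tuples), and the choice to skip window positions that fall outside the image are assumptions.

```python
def neighbourhood_count(background, current, row, col, radius, difference, threshold):
    """Count window pixels whose per-pixel difference exceeds the threshold.

    The window is the (2*radius + 1) x (2*radius + 1) square centred on
    (row, col); positions outside the image are skipped (an assumption).
    difference is any per-pixel measurement taking two RGB tuples.
    """
    height, width = len(background), len(background[0])
    count = 0
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            if difference(background[r][c], current[r][c]) > threshold:
                count += 1
    return count

def euclidean(p, q):
    """Euclidean distance between two RGB tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

# 3x3 background of a uniform color; two pixels change in the current frame.
bg = [[(10, 10, 10)] * 3 for _ in range(3)]
cur = [list(r) for r in bg]
cur[0][0] = (200, 10, 10)
cur[1][1] = (200, 10, 10)

centre_count = neighbourhood_count(bg, cur, 1, 1, 1, euclidean, 20)
corner_count = neighbourhood_count(bg, cur, 2, 2, 1, euclidean, 20)
```

Passing the per-pixel measurement as an argument lets the same routine serve both the angular and the Euclidean distance variants.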
3.1.2. Scene Modeling
Our background model (BG) is represented by two classes of components, namely, running components (RCs) and training components (TCs). The RC is a color vector in RGB space, and it is the only component that can be updated during the running process. The TC is a set of fixed threshold values obtained during training.
where the first component is the maximum of the chromaticity variation, the second is the maximum of the intensity variation, and the third is the half size of the neighborhood window.
where is the number of frames in the training period.
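A training pass of this kind can be sketched as a per-pixel running maximum of a measurement taken between consecutive frames. The function name and the frame layout (a list of images, each a list of rows of RGB tuples) are assumptions; the measurement is passed in so that the same routine can produce both the chromaticity (angular) and the intensity (Euclidean distance) maxima.

```python
def max_interframe_difference(frames, difference):
    """Per-pixel maximum of difference(previous, current) over a training sequence.

    frames: list of N images, each a list of rows of RGB tuples.
    difference: per-pixel measurement taking two RGB tuples.
    Returns an image-shaped list of rows holding the maxima.
    """
    height, width = len(frames[0]), len(frames[0][0])
    maxima = [[0.0] * width for _ in range(height)]
    for previous, current in zip(frames, frames[1:]):
        for r in range(height):
            for c in range(width):
                d = difference(previous[r][c], current[r][c])
                if d > maxima[r][c]:
                    maxima[r][c] = d
    return maxima

def channel_sum_difference(p, q):
    """Toy measurement used only for this example: sum of absolute channel differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

training_frames = [
    [[(10, 10, 10), (0, 0, 0)]],
    [[(12, 10, 10), (0, 0, 0)]],
    [[(11, 10, 10), (0, 0, 5)]],
]
maxima = max_interframe_difference(training_frames, channel_sum_difference)
```

Only the maxima need to be stored after training, which is consistent with the low space complexity claimed for the model.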
3.2. Classification Process
Our classification rules consist of two steps.
where the two coefficients are experimental scale factors. Otherwise, when (9) is not TRUE, the classification has to be done in the following step.
3.3. Model Updating
where the remaining parameter is the update rate. Based on our experiments, this parameter is set to 0.45.
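As a hedged illustration of the update, the sketch below assumes a standard exponential running average applied only to pixels classified as background, using the update rate of 0.45 reported above. The blending formula, the function name, and the decision to leave foreground pixels untouched are assumptions.

```python
def update_background(background, current, foreground_mask, rate=0.45):
    """Blend the current frame into the background model.

    background, current: lists of rows of RGB tuples.
    foreground_mask: lists of rows of booleans (True marks foreground).
    Pixels marked as foreground keep their previous background value
    (an assumption); all others move towards the current frame by rate.
    """
    updated = []
    for bg_row, cur_row, mask_row in zip(background, current, foreground_mask):
        row = []
        for bg, cur, is_foreground in zip(bg_row, cur_row, mask_row):
            if is_foreground:
                row.append(bg)
            else:
                row.append(tuple((1 - rate) * b + rate * c for b, c in zip(bg, cur)))
        updated.append(row)
    return updated

model = [[(100.0, 100.0, 100.0), (50.0, 50.0, 50.0)]]
frame = [[(120.0, 120.0, 120.0), (200.0, 200.0, 200.0)]]
mask = [[False, True]]
new_model = update_background(model, frame, mask)
```

With a rate of 0.45 the running component follows gradual illumination changes within a few frames, while masked foreground pixels do not contaminate the background.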
In this section we present the performance of our approach in terms of quantitative and qualitative results on 7 video sequences taken from 5 well-known datasets: PETS 2009 (http://www.cvg.rdg.ac.uk/ (View 7 and 8)), ATON (http://cvrr.ucsd.edu/aton/shadow/ (Laboratory and Intelligentroom)), ISELAB (http://iselab.cvc.uab.es (ETSE Outdoor)), LVSN (http://vision.gel.ulaval.ca/CastShadows/ (HallwayI)), and VSSN (http://mmc36.informatik.uni-augsburg.de/VSSN06_OSAC/).
This paper proposes an efficient background subtraction technique that overcomes difficulties such as illumination changes and moving shadows. The main novelty of our method is the incorporation of two discriminative similarity measures based on angular and Euclidean distance patterns in local neighborhoods. Such patterns are used to improve foreground detection in the presence of moving shadows and strong similarities in color between background and foreground. Experimental results over a collection of public datasets and our own datasets of real image sequences demonstrate the effectiveness of the proposed technique, which performs favorably in comparison with other methods. Most recent approaches are based on very complex models designed to achieve an extremely effective classification; however, these approaches become infeasible for real-time applications. In contrast, our proposed method exhibits low computational and space complexities, which make it well suited to real-time processing in surveillance systems with low-resolution cameras or Internet webcams.
This work has been supported by the Spanish Research Programs Consolider-Ingenio 2010:MIPRCV (CSD200700018) and Avanza I+D ViCoMo (TSI-020400-2009-133) and by the Spanish projects TIN2009-14501-C02-01 and TIN2009-14501-C02-02.
- Karaman M, Goldmann L, Yu D, Sikora T: Comparison of static background segmentation methods. Visual Communications and Image Processing, 2005, Proceedings of SPIE 5960(4):2140-2151.
- Piccardi M: Background subtraction techniques: a review. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '04), October 2004, The Hague, The Netherlands 4: 3099-3104.
- McIvor A: Background subtraction techniques. Proceedings of the International Conference on Image and Vision Computing, 2000, Auckland, New Zealand.
- Prati A, Mikic I, Trivedi MM, Cucchiara R: Detecting moving shadows: algorithms and evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2003, 25(7):918-923. 10.1109/TPAMI.2003.1206520
- Obinata G, Dutta A: Vision Systems: Segmentation and Pattern Recognition. I-TECH Education and Publishing, Vienna, Austria; 2007.
- Haritaoglu I, Harwood D, Davis LS: W4: real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence 2000, 22(8):809-830. 10.1109/34.868683
- Horprasert T, Harwood D, Davis LS: A statistical approach for real-time robust background subtraction and shadow detection. Proceedings of the 7th IEEE International Conference on Computer Vision, Frame Rate Workshop (ICCV '99), September 1999, Kerkyra, Greece 4: 1-9.
- Kim K, Chalidabhongse TH, Harwood D, Davis L: Real-time foreground-background segmentation using codebook model. Real-Time Imaging 2005, 11(3):172-185. 10.1016/j.rti.2004.12.004
- Stauffer C, Grimson WEL: Learning patterns of activity using real-time tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 2000, 22(8):747-757. 10.1109/34.868677
- McKenna SJ, Jabri S, Duric Z, Rosenfeld A, Wechsler H: Tracking groups of people. Computer Vision and Image Understanding 2000, 80(1):42-56. 10.1006/cviu.2000.0870
- Cucchiara R, Grana C, Piccardi M, Prati A, Sirotti S: Improving shadow suppression in moving object detection with HSV color information. Proceedings of the IEEE Intelligent Transportation Systems Proceedings, August 2001, Oakland, Calif, USA 334-339.
- Toyama K, Krumm J, Brumitt B, Meyers B: Wallflower: principles and practice of background maintenance. Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV '99), September 1999, Kerkyra, Greece 1: 255-261.
- Elgammal A, Harwood D, Davis LS: Nonparametric background model for background subtraction. Proceedings of the European Conference on Computer Vision (ECCV '00), 2000, Dublin, Ireland 751-767.
- Mittal A, Paragios N: Motion-based background subtraction using adaptive kernel density estimation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), July 2004, Washington, DC, USA 2: 302-309.
- Chen Y-T, Chen C-S, Huang C-R, Hung Y-P: Efficient hierarchical method for background subtraction. Pattern Recognition 2007, 40(10):2706-2715. 10.1016/j.patcog.2006.11.023
- Li L, Huang W, Gu IY-H, Tian Q: Statistical modeling of complex backgrounds for foreground object detection. IEEE Transactions on Image Processing 2004, 13(11):1459-1472. 10.1109/TIP.2004.836169
- Zhong J, Sclaroff S: Segmenting foreground objects from a dynamic textured background via a robust Kalman filter. Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV '03), October 2003, Nice, France 44-50.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.