Object detection oriented video reconstruction using compressed sensing
- Bin Kang^{1},
- Wei-Ping Zhu^{1, 2} and
- Jun Yan^{1}
https://doi.org/10.1186/s13634-015-0194-1
© Kang et al.; licensee Springer. 2015
Received: 17 August 2014
Accepted: 5 January 2015
Published: 26 February 2015
Abstract
Moving object detection plays a key role in video surveillance. A number of object detection methods have been proposed in the spatial domain. In this paper, we propose a compressed sensing (CS)-based algorithm for the detection of moving objects in video sequences. First, we propose an object detection model to simultaneously reconstruct the foreground, background, and video sequence from the sampled measurements. Then, we use the reconstructed video sequence to estimate a confidence map that improves the foreground reconstruction result. Experimental results show that the proposed moving object detection algorithm outperforms the state-of-the-art approaches and is robust to movement turbulence and sudden illumination changes.
Keywords
Compressed sensing; Low-rank optimization; Moving object detection; Moving turbulence mitigation
1 Introduction
With the strong market demand for sensor networks for video surveillance, the design of multimedia sensors equipped with high-resolution video acquisition systems that adapt to particular environments and to limited bandwidth is of crucial importance. In multimedia sensor networks, the captured video sequences are first encoded and then transmitted to the processing center for video analysis. Moving object detection, which aims to locate and segment objects of interest in a video sequence, is key to video surveillance.
A common approach for detecting moving objects, called background subtraction (BS) [1], is to estimate a background model first and then compare video frames with the background model to detect the moving objects. When processing real video surveillance sequences, BS algorithms face several challenges such as sudden illumination changes and movement turbulence [2]. A sudden illumination change strongly affects the appearance of the background and thus causes false foreground subtraction. Movement turbulence may include (1) periodic or irregular turbulence such as waving trees and water ripples and (2) objects being suddenly introduced into or removed from the scene. Eliminating movement turbulence is still an open problem because of its complex structure. Recently, Tsai et al. [3] proposed a fast background subtraction scheme using independent component analysis (ICA) for object detection. This scheme is tolerant of sudden illumination changes in indoor surveillance videos. Zhang et al. [4] proposed a kernel similarity modeling method for motion detection in complex and dynamic environments. This approach is robust to simple movement turbulence. Kim et al. [5] proposed a fuzzy color histogram (FCH)-based background subtraction algorithm to detect moving objects in a dynamic background. This algorithm can minimize the color variations generated by background motion. Chen et al. [6] suggested a hierarchical background model based on the fact that background images consist of different objects whose conditions may change frequently. In the same year, Han et al. [7] proposed a piecewise background model that integrates color, gradient, and Haar-like features to handle spatiotemporal variations. This model is robust to illumination changes and shadow effects. All the aforementioned BS algorithms operate in the spatial domain and require a large number of training sequences to estimate a background model. The training process imposes high computational complexity, which limits the application of BS algorithms in multimedia sensor networks.
1. There is a key problem as to how to obtain a robust video foreground reconstruction result using the compressed measurement. In order to solve this problem, we first propose a new object detection model to simultaneously reconstruct the video foreground, background, and video sequence using a small number of compressed measurements. Then, we use the reconstructed video sequence to estimate a confidence map, which is used to further refine the foreground reconstruction result.
2. An efficient alternating algorithm is proposed for solving the minimization problem of the new object detection model. We prove that the alternating algorithm is guaranteed to yield a feasible background, foreground, and video reconstruction result.
The paper is organized as follows: Section 2 discusses how to solve the key problem in the CS-based object detection algorithm. Section 3 develops an alternating algorithm for solving the new object detection model. The experimental results of the proposed approach are then given in Section 4. Finally, the conclusion is provided in Section 5.
2 Problem formulation
where X ∈ R ^{(MN) × T } is the original video sequence, and B and F represent the background and foreground of the video, respectively. The RPCA model has two drawbacks. (1) RPCA cannot reconstruct B and F directly from the sampled measurement A, because object detection requires the original video sequence X. Reconstructing the original video first imposes a high computational complexity. (2) In RPCA, the foreground reconstruction result is robust only to corruption that has a sparse distribution [17,18]. In real-world video sequences, however, movement turbulence is rarely sparse in nature.
where D _{1} and D _{2} are, respectively, the horizontal and vertical difference operators within a frame, and D _{3} is the time-varying difference operator.
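As a rough illustration of the three difference operators, the following sketch computes a weighted anisotropic 3D total variation of a video stored as an (MN) × T matrix. It is only a sketch under stated assumptions: the function name `tv3d`, the weights `alphas`, the row-major vectorization of each frame, and the use of forward differences without wrap-around at the boundaries are all choices made here for illustration and are not taken from the paper.

```python
import numpy as np

def tv3d(X, M, N, alphas=(1.0, 1.0, 1.0)):
    """Weighted anisotropic 3D total variation (hypothetical helper).

    X is an (M*N) x T matrix whose columns are vectorized frames.
    Assumption: each frame was vectorized in row-major order.
    """
    T = X.shape[1]
    V = np.asarray(X, dtype=float).reshape(M, N, T)
    d1 = np.diff(V, axis=1)  # horizontal differences within a frame
    d2 = np.diff(V, axis=0)  # vertical differences within a frame
    d3 = np.diff(V, axis=2)  # differences between consecutive frames
    return float(alphas[0] * np.abs(d1).sum()
                 + alphas[1] * np.abs(d2).sum()
                 + alphas[2] * np.abs(d3).sum())

# A video whose frames are spatially flat but brighten by 1 per frame:
# only the temporal term contributes.
X = np.tile(np.arange(4.0), (6, 1))  # 2 x 3 frames, 4 time steps
print(tv3d(X, M=2, N=3))
```

A spatially flat, temporally changing video is penalized only through the third operator, which is why a temporal difference term favors reconstructions that vary smoothly from frame to frame.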
Problem (4) differs from the 3DCS model in [16] as follows. The 3DCS model aims at high-quality video reconstruction: it uses TV3D for video reconstruction and also adopts the nuclear norm to exploit the low-rank property of the video sequence in the wavelet domain. Problem (4) in this paper, in contrast, aims to exactly reconstruct the video foreground and background using a small number of sampled measurements. To achieve this goal, we employ TV3D to guarantee the exact low-rank and sparse decomposition.
By solving problem (4), we can obtain the reconstructed foreground \( \widehat{\mathbf{F}} \), background \( \widehat{\mathbf{B}} \), and the video sequence \( \widehat{\mathbf{X}} \). However, the reconstructed \( \widehat{\mathbf{F}} \) is not robust to strong movement turbulence. Borenstein et al. [19] proposed an algorithm that achieves excellent image segmentation performance by using a confidence map to identify the image region. Inspired by this idea, we use the reconstructed video sequence \( \widehat{\mathbf{X}} \) to construct a confidence map denoted as O = [o _{1}, o _{2} …, o _{ T }], where each element of O is 0 or 1. We then use O to further improve the reconstructed foreground \( \widehat{\mathbf{F}} \) through \( \mathbf{O}\odot \widehat{\mathbf{F}} \), where ⊙ denotes the Hadamard (point-wise) product. Note that the confidence map is a binary matrix, in which the locations of the movement turbulence are set to 0 and the locations of the moving object are set to 1.
where f(x _{ ij }) represents the probability density of a pixel x _{ ij } at the jth element in the ith column of \( \widehat{\mathbf{X}} \), ω is the weight of the two Gaussian models, μ_{ x } and σ _{ x } are the mean and the standard deviation, which are estimated by the EM algorithm, and μ_{ p } and Σ _{ p } are the mean and the covariance matrix, which are estimated from the particle trajectory of x _{ ij } [22]. The particle trajectory captures the deformation caused by movement turbulence and can be obtained using the Lagrangian particle trajectory advection approach [24,25].
The confidence map is obtained as follows: we first estimate each pixel’s probability density f(x _{ ij }) using (5); then we decide which pixels belong to the movement turbulence and which belong to the moving object using a threshold θ. If f(x _{ ij }) > θ, we set the corresponding entry of the map to 1. Otherwise, we set it to 0. The resulting binary matrix is the final confidence map.
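The thresholding step and the subsequent Hadamard refinement can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-pixel density values are made up rather than computed from (5), and the function names `confidence_map` and `refine_foreground` are hypothetical.

```python
import numpy as np

def confidence_map(density, theta):
    """Binary map: 1 where the density exceeds theta (moving object), else 0."""
    return (np.asarray(density, dtype=float) > theta).astype(np.uint8)

def refine_foreground(F_hat, O):
    """Element-wise (Hadamard) product of the map O and the foreground F_hat."""
    F_hat = np.asarray(F_hat, dtype=float)
    O = np.asarray(O)
    if F_hat.shape != O.shape:
        raise ValueError("confidence map and foreground must have the same shape")
    return O * F_hat

# Made-up densities and foreground values, for illustration only.
density = np.array([[0.8, 0.1],
                    [0.3, 0.9]])
F_hat = np.array([[5.0, 4.0],
                  [3.0, 2.0]])

O = confidence_map(density, theta=0.5)
refined = refine_foreground(F_hat, O)
print(O)
print(refined)
```

Entries whose density does not exceed θ are zeroed in the refined foreground, which is how pixels attributed to movement turbulence are suppressed.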
3 Reconstruction algorithm
Next, we propose an alternating algorithm for the reconstruction of X, B, and F in (6). Each iteration of the alternating algorithm contains two steps: the R-step, which reconstructs the original video X, and the S-step, which segments the background and foreground.
In sub-problem (14), X is updated through solving a quadratic problem.
The complete algorithm proposed to solve problem (6) is summarized in Algorithm 1 below.
In the above algorithm, \( \mathbf{M}={\displaystyle \sum_{i=1}^3}{\alpha}_i{\beta}_i{\mathbf{D}}_i^T{\mathbf{D}}_i+{\beta}_4{\mathbf{C}}^T\mathbf{C} \), and \( {\mathcal{D}}_{\alpha}\left(\cdotp \right) \) is the singular value shrinkage operator [27], which is defined as follows: suppose the SVD of a matrix Z is given by Z = UΣV ^{ T }, where Σ is a rectangular diagonal matrix whose diagonal entries Σ _{ ii } are the singular values of Z, and U and V are real unitary matrices. The singular value shrinkage operator for matrix Z is defined as \( {\mathcal{D}}_{\alpha }(Z)=U{S}_{\alpha}\left(\varSigma \right){V}^T \), where S _{ α }(·) is the soft-thresholding operator applied to Σ with threshold α. In Algorithm 1, the termination criterion is set as \( \frac{{\left\Vert {\mathbf{X}}^{k+1}-{\mathbf{X}}^k\right\Vert}_F}{{\left\Vert {\mathbf{X}}^k\right\Vert}_F}\le {10}^{-6} \), considering that the reconstruction of B and F relies on the reconstruction of X.
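The singular value shrinkage operator and the relative-change quantity used in the stopping rule can be sketched with NumPy as below. The function names are hypothetical, and the sketch covers only these two building blocks, not Algorithm 1 itself.

```python
import numpy as np

def soft_threshold(x, alpha):
    """Soft-thresholding: shrink each entry toward zero by alpha."""
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

def singular_value_shrinkage(Z, alpha):
    """Soft-threshold the singular values of Z and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Scaling the columns of U by the shrunk singular values is
    # equivalent to U @ diag(S_alpha(s)) @ Vt.
    return (U * soft_threshold(s, alpha)) @ Vt

def relative_change(X_new, X_old):
    """Frobenius-norm relative change between two iterates."""
    return np.linalg.norm(X_new - X_old) / np.linalg.norm(X_old)

Z = np.diag([3.0, 1.0])
print(singular_value_shrinkage(Z, alpha=2.0))
```

Singular values below α are set to zero, so the operator lowers the rank of its argument; this is the reason it appears in nuclear-norm-based low-rank updates. An iterative solver would stop once `relative_change` drops to the chosen tolerance.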
The solution to problem (7) does not guarantee a global minimum solution for problem (6). Moreover, it is difficult to rigorously prove the convergence of the proposed alternating algorithm for problem (7). But we can prove that there exists a feasible solution for X, B, and F that can minimize the cost function in (6). This feasible solution is stated in the following theorem.
Theorem 1: The sequences {X ^{ k }}, {B ^{ k }}, and {F ^{ k }} generated in Algorithm 1 are bounded, and there exists a feasible point (X*, B*, F*) for the solution of problem (6).
The proof of Theorem 1 is given in the Appendix.
4 Experimental results
In this section, we perform numerical experiments to show the performance of the proposed object detection algorithm. We focus on the illustration of the moving object reconstruction result and show that the new object detection algorithm is robust to the movement turbulence.
Parameters used in solving the proposed reconstruction model
η _{1} | β _{1} | β _{2} | β _{3} | β _{4} | β _{5} | τ | μ |
---|---|---|---|---|---|---|---|
\( \frac{1}{\sqrt{M\times N}} \) | 100 | 100 | 100 | 100 | 100 | 1.6 | 1 |
4.1 The new object detection model
Clearly, Figure 4b,c,d give an exact foreground reconstruction result; to show the difference among the three images, we provide locally magnified views of the foreground reconstruction result. Figure 4d gives the best foreground reconstruction result, and Figure 4c performs slightly better than Figure 4b because the sampling rate used in Figure 4c is higher than that in Figure 4b. Figure 4a does not give a clear foreground reconstruction result because of the poor video reconstruction. Compared with Figure 4a,b,c,d, Figure 4e,f,g,h give poor video foreground and background reconstruction results. This is because Figure 4e,f,g,h are reconstructed by compressive PCP, which is a special case of problem (6) when α _{ i } = 0 (i = 1, 2, 3). In this special case, the poor video reconstruction performance becomes the bottleneck that precludes good video background and foreground reconstruction at low sampling rates. We can conclude from this experiment that using the TV3D norm in our model enables high object detection performance at low sampling rates. In addition to the above subjective assessment of the object detection performance at different sampling rates, we choose PSNR and root mean square error (RMSE) as objective evaluation metrics to further illustrate the performance of the proposed object detection model and compressive PCP at different sampling rates.
Evaluation of the proposed model at different sampling rates
Sampling rate | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
---|---|---|---|---|---|---|---|---|
Proposed object detection model, PSNR (dB) | 24.06 | 31.81 | 34.20 | 36.77 | 40.37 | 43.61 | 46.35 | 49.45 |
Proposed object detection model, RMSE_B | 0.080 | 0.061 | 0.059 | 0.057 | 0.052 | 0.049 | 0.047 | 0.046 |
Compressive PCP, PSNR (dB) | 4.61 | 5.16 | 6.83 | 7.19 | 7.32 | 12.51 | 22.43 | 30.56 |
Compressive PCP, RMSE_B | 0.924 | 0.828 | 0.632 | 0.577 | 0.522 | 0.292 | 0.130 | 0.079 |
RMSE_B in the RPCA model is 0.045.
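For readers who want to compute metrics of the kind reported in the table, the sketch below uses the standard textbook definitions of RMSE and PSNR. The paper's exact normalization (for example, the peak value or whether RMSE_B is a relative error) is not reproduced in this section, so the `peak` parameter and both function names are assumptions made here for illustration.

```python
import numpy as np

def rmse(reference, estimate):
    """Root mean square error between two arrays of the same shape."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed maximum pixel value."""
    err = rmse(reference, estimate)
    if err == 0.0:
        return float("inf")  # identical inputs
    return float(20.0 * np.log10(peak / err))

# Toy example: every pixel is off by exactly 1.
ref = np.zeros((2, 2))
est = np.ones((2, 2))
print(rmse(ref, est), psnr(ref, est, peak=10.0))
```

Because PSNR depends on the assumed peak value, numbers computed this way are comparable with the table only if the same pixel range and normalization are used.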
4.2 The moving object detection result
Quality evaluation (F-measure) of the detection results in Figure 5
Sequence | Proposed | RPCA | GMM |
---|---|---|---|
Airport | 0.55 | 0.56 | 0.50 |
Lobby | 0.56 | 0.45 | 0.43 |
Canteen | 0.63 | 0.61 | 0.59 |
Shopping mall | 0.49 | 0.50 | 0.39 |
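The F-measure is commonly defined as the harmonic mean of precision and recall computed from a detected foreground mask and a ground-truth mask. The sketch below follows that common definition, which is assumed here rather than quoted from the paper; the function name `f_measure` and the convention of returning 0 when there are no true positives are illustrative choices.

```python
import numpy as np

def f_measure(detected, ground_truth):
    """Harmonic mean of precision and recall for two binary masks."""
    detected = np.asarray(detected, dtype=bool)
    ground_truth = np.asarray(ground_truth, dtype=bool)
    tp = np.count_nonzero(detected & ground_truth)   # correctly detected pixels
    fp = np.count_nonzero(detected & ~ground_truth)  # false alarms
    fn = np.count_nonzero(~detected & ground_truth)  # missed pixels
    if tp == 0:
        return 0.0  # convention chosen here to avoid division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# One hit, one false alarm, one miss: precision = recall = 0.5.
print(f_measure([1, 1, 0, 0], [1, 0, 1, 0]))
```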
We now illustrate the performance of the proposed algorithm on outdoor video sequences, which usually contain strong movement turbulence. We choose the campus, fountain, and pedestrian video sequences for this experiment. The pedestrian video sequence is captured by a COTS camera (the Sony DCR-TRV740).
5 Conclusion
In this paper, we have proposed a CS-based algorithm for detecting moving objects in video sequences. To achieve a robust foreground reconstruction result using only a small number of sampled measurements, we have first proposed an object detection model to simultaneously reconstruct the foreground, background, and original video sequence from the sampled measurements. Then, the reconstructed video sequence is used to estimate a confidence map to refine the foreground reconstruction result. Experiments have shown that the proposed moving object detection algorithm performs well on both indoor and outdoor video sequences. For outdoor video sequences in particular, the proposed reconstruction model is able to effectively eliminate movement turbulence such as waving trees, water fountains, and video noise. In conclusion, the proposed moving object detection algorithm can achieve an accuracy comparable to some known spatial-domain methods with a significantly reduced number of sampled measurements. The limitations of the proposed method include: (1) In Algorithm 1, solving the nuclear norm minimization imposes high computational complexity. (2) There is a lack of theoretical analysis of the impact of the sampling rate on the object detection result. To address these problems in future work, (1) we will use an online version of the object detection model to achieve background reconstruction, and (2) we will refer to [15] for possible theoretical analysis of the performance of the proposed model.
Declarations
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No.61372122 and 61302103); the Innovation Program for Postgraduate in Jiangsu Province under Grant (No. CXZZ13_0491).
References
- O Barnich, M Van Droogenbroeck, ViBe: a universal background subtraction algorithm for video sequences. IEEE Trans. Image Process. 20(6), 1709–1724 (2011)
- S Brutzer, B Hoferlin, G Heidemann, Evaluation of background subtraction techniques for video surveillance, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1937–1944
- T Du-Ming, L Shia-Chih, Independent component analysis-based background subtraction for indoor surveillance. IEEE Trans. Image Process. 18(1), 158–167 (2009)
- Z Baochang, G Yongsheng, Z Sanqiang, Z Bineng, Kernel similarity modeling of texture pattern flow for motion detection in complex background. IEEE Trans. Circuits Syst. Video Technol. 21(1), 29–38 (2011)
- K Wonjun, K Changick, Background subtraction for dynamic texture scenes using fuzzy color histograms. IEEE Signal Process. Lett. 19(3), 127–130 (2012)
- S Chen, J Zhang, Y Li, J Zhang, A hierarchical model incorporating segmented regions and pixel descriptors for video background subtraction. IEEE Trans Ind Inf. 8(1), 118–127 (2012)
- H Bohyung, LS Davis, Density-based multifeature background subtraction with support vector machine. IEEE Trans. Pattern Anal. Mach. Intell. 34(5), 1017–1023 (2012)
- R Baraniuk, Compressive sensing. IEEE Signal Process. Mag. 24(4), 118–121 (2007)
- DL Donoho, Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
- EJ Candes, MB Wakin, An introduction to compressive sampling. IEEE Signal Process. Mag. 25(2), 21–30 (2008)
- J Ma, G Plonka, MY Hussaini, Compressive video sampling with approximate message passing decoding. IEEE Trans. Circuits Syst. Video Technol. 22(9), 1354–1364 (2012)
- V Cevher, A Sankaranarayanan, M Duarte, D Reddy, R Baraniuk, R Chellappa, Compressive sensing for background subtraction, in Proc. European Conference on Computer Vision (ECCV), 2008
- H Jiang, W Deng, Z Shen, Surveillance video processing using compressive sensing. Inverse Probl. Imaging 6(2), 201–214 (2012)
- F Yang, H Jiang, Z Shen, W Deng, D Metaxas, Adaptive low rank and sparse decomposition of video using compressive sensing, in Proc. IEEE International Conference on Image Processing (ICIP), 2013, pp. 1016–1020
- J Wright, A Ganesh, K Min, Y Ma, Compressive principal component pursuit. Inf. Inference 2(1), 32–68 (2013)
- X Shu, N Ahuja, Imaging via three-dimensional compressive sampling (3DCS), in IEEE International Conference on Computer Vision (ICCV), 2011, pp. 439–446
- B Bao, G Liu, C Xu, S Yan, Inductive robust principal component analysis. IEEE Trans. Image Process. 21(8), 3794–3800 (2012)
- EJ Candes, X Li, Y Ma, J Wright, Robust principal component analysis? J. ACM 58(1), 1–37 (2009)
- E Borenstein, E Sharon, S Ullman, Combining top-down and bottom-up segmentation, in Proc. Conference on Computer Vision and Pattern Recognition Workshop, 2004, p. 46
- M Shimizu, S Yoshimura, M Tanaka, M Okutomi, Super-resolution from image sequence under influence of hot-air optical turbulence, in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8
- O Oreifej, G Shu, T Pace, M Shah, A two-stage reconstruction approach for seeing through water, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1153–1160
- O Oreifej, X Li, M Shah, Simultaneous video stabilization and moving object detection in turbulence. IEEE Trans. Pattern Anal. Mach. Intell. 35(2), 450–462 (2013)
- C Stauffer, WEL Grimson, Adaptive background mixture models for real-time tracking, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 1999, p. 252
- W Shandong, O Oreifej, M Shah, Action recognition in videos acquired by a moving camera using motion decomposition of Lagrangian particle trajectories, in IEEE International Conference on Computer Vision (ICCV), 2011, pp. 1419–1426
- S Wu, BE Moore, M Shah, Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 2054–2060
- W Yin, S Morgan, J Yang, Y Zhang, Practical compressive sensing with Toeplitz and circulant matrices, in Visual Communications and Image Processing Huangshan China, 2010
- H Yao, Z Debing, Y Jieping, L Xuelong, H Xiaofei, Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE Trans. Pattern Anal. Mach. Intell. 35(9), 2117–2130 (2013)
- X Zhou, C Yang, Y Weichuan, Moving object detection by detecting contiguous outliers in the low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 35(3), 597–610 (2013)
- Z Zivkovic, Improved adaptive Gaussian mixture model for background subtraction. Int. Conf. Pattern Recog. 2, 28–31 (2004)
- L Li, W Huang, IYH Gu, Q Tian, Statistical modeling of complex backgrounds for foreground object detection. IEEE Trans. Image Process. 13(11), 1459–1472 (2004)
- Y Sheikh, M Shah, Bayesian modeling of dynamic scenes for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 27(11), 1778–1792 (2005)
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.