 Research
 Open access
Object detection oriented video reconstruction using compressed sensing
EURASIP Journal on Advances in Signal Processing, volume 2015, Article number: 15 (2015)
Abstract
Moving object detection plays a key role in video surveillance. A number of object detection methods have been proposed in the spatial domain. In this paper, we propose a compressed sensing (CS)-based algorithm for the detection of moving objects in video sequences. First, we propose an object detection model that simultaneously reconstructs the foreground, background, and video sequence from the sampled measurements. Then, we use the reconstructed video sequence to estimate a confidence map that improves the foreground reconstruction result. Experimental results show that the proposed moving object detection algorithm outperforms state-of-the-art approaches and is robust to movement turbulence and sudden illumination changes.
1 Introduction
With the strong market demand for sensor networks for video surveillance, the design of multimedia sensors equipped with high-resolution video acquisition systems that adapt to particular environments and limited bandwidth is of crucial importance. In multimedia sensor networks, the captured video sequences are first encoded and then transmitted to the processing center for video analysis. Moving object detection, which aims to locate and segment interesting objects in a video sequence, is key to video surveillance.
A common approach to detecting moving objects, called background subtraction (BS) [1], is to estimate a background model first and then compare video frames with the background model to detect the moving objects. When processing real video surveillance sequences, BS algorithms face several challenges, such as sudden illumination changes and movement turbulence [2]. A sudden illumination change strongly affects the appearance of the background and thus causes false foreground subtraction. Movement turbulence may comprise (1) periodic or irregular disturbances such as waving trees and water ripples and (2) objects being suddenly introduced into or removed from the scene. Eliminating movement turbulence remains an open problem due to its complex structure. Recently, Tsai et al. [3] proposed a fast background subtraction scheme using independent component analysis (ICA) for object detection. This scheme is tolerant of sudden illumination changes in indoor surveillance videos. Zhang et al. [4] proposed a kernel similarity modeling method for motion detection in complex and dynamic environments. This approach is robust to simple movement turbulence. Kim et al. [5] proposed a fuzzy color histogram (FCH)-based background subtraction algorithm to detect moving objects against a dynamic background. This algorithm can minimize the color variations generated by background motion. Chen et al. [6] suggested a hierarchical background model based on the fact that background images consist of different objects whose conditions may change frequently. In the same year, Han et al. [7] proposed a piecewise background model that integrates color, gradient, and Haar-like features to handle spatiotemporal variations. This model is robust to illumination changes and shadow effects. All the aforementioned BS algorithms operate in the spatial domain and require a large number of training sequences to estimate a background model.
The training process imposes high computational complexity, which limits the application of BS algorithms in multimedia sensor networks.
Compressed sensing (CS) [8-10] is a recently proposed sampling method which states that if a signal is sparse, it can be faithfully reconstructed from a small number of random measurements. The number of measurements required by CS is much smaller than that dictated by the Nyquist sampling rate. CS can perform image sensing and compression simultaneously with low computational complexity, which makes it well suited to reducing the computational cost of video encoders [11]. Due to these advantages, CS has become an attractive solution for object detection. One early attempt at using CS for object detection utilizes the sampled measurements of the image background to train an object silhouette first and then uses the trained silhouette to detect the moving object [12]. This algorithm needs a large amount of storage and computation for training the object silhouette, which is not suitable for real-time multimedia sensor networks. In 2012, Jiang et al. [13] proposed an object detection model that performs low-rank and sparse decomposition using the compressed measurements. Although this model adapts to the limited bandwidth of multimedia sensor networks, it is not robust to movement turbulence and sudden illumination changes because the wavelet transform coefficients of the video sequence are not sparse under movement turbulence. In 2013, Yang et al. [14] proposed a CS-based algorithm for object detection. This algorithm can exactly and simultaneously reconstruct the video foreground and background using only 10% of the sampled measurements. However, it still uses the wavelet transform, as [13] does, to achieve sparse decomposition, which causes false foreground reconstruction under movement turbulence and sudden illumination changes. Wright et al. [15] proposed an algorithm called compressive principal component pursuit and analyzed the performance of the natural convex heuristic for recovering a low-rank matrix and a sparse component from a small set of linear measurements. This algorithm can be used to achieve object detection in the compressed domain. In this paper, we propose a new CS-based algorithm for detecting moving objects. We first use a three-dimensional circulant sampling method to obtain the sampled measurements, from which we simultaneously reconstruct the video foreground and background by solving an optimization problem. The main contributions of this paper are as follows:

1.
A key problem is how to obtain a robust video foreground reconstruction from the compressed measurements. To solve this problem, we first propose a new object detection model that simultaneously reconstructs the video foreground, background, and video sequence from a small number of compressed measurements. Then, we use the reconstructed video sequence to estimate a confidence map, which is used to further refine the foreground reconstruction result.

2.
An efficient alternating algorithm is proposed for solving the minimization problem of the new object detection model. We prove that the alternating algorithm is guaranteed to yield a feasible background, foreground, and video reconstruction result.
The paper is organized as follows: Section 2 discusses how to solve the key problem in CS-based object detection. Section 3 develops an alternating algorithm for solving the new object detection model. Experimental results of the proposed approach are given in Section 4. Finally, conclusions are provided in Section 5.
2 Problem formulation
The authors of [16] have proposed a three-dimensional circulant sampling method, as shown in Figure 1, which can perform video sensing and compression simultaneously with low computational complexity and easy hardware implementation. This method achieves video compression in two steps. The first step is random convolution, which yields circulant measurements Cx_t by convolving the original vectorized video frames x_t (t = 1, 2, …, T) with a circulant matrix C. The second step is random subsampling, which aims to reduce the number of circulant measurements Cx_t. In this step, a random permutation is first applied to each vector Cx_t by a permutation matrix P. The permuted measurements PCx_t are then each subsampled using a subsampling matrix S_t to generate the compressed (dimension-reduced) measurements a_t = S_t PCx_t. In the figure, the complete set of compressed measurements is denoted by the matrix A = [a_1, a_2, …, a_T].
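As a concrete illustration, the two-step sampling pipeline can be sketched in a few lines of numpy. The frame size, number of frames, and measurement count below are illustrative choices, not the paper's settings, and the dense circulant matrix stands in for what would in practice be an FFT-based convolution:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, m = 64, 4, 16   # pixels per vectorized frame, frames, kept measurements

# Step 1: random convolution. C is circulant, built from a random kernel.
kernel = rng.standard_normal(n)
C = np.stack([np.roll(kernel, shift) for shift in range(n)], axis=0)

perm = rng.permutation(n)   # random permutation P, shared across frames

def sample_frame(x):
    """a_t = S_t P C x_t: convolve, permute, keep m random entries (S_t)."""
    r = C @ x               # circulant measurements r_t = C x_t
    p = r[perm]             # permuted measurements P C x_t
    idx = rng.choice(n, size=m, replace=False)   # subsampling S_t (per frame)
    return p[idx]

X = rng.standard_normal((n, T))   # stand-in for vectorized video frames
A = np.stack([sample_frame(X[:, t]) for t in range(T)], axis=1)
print(A.shape)   # (16, 4)
```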
Given the sampled measurement matrix A, reconstructing the foreground and background becomes the key problem in CS-based object detection. In 2009, Candes et al. [18] proposed the robust principal component analysis (RPCA) model to simultaneously reconstruct the video foreground and background by solving the following minimization problem:
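The display equation is not reproduced above; in the convex (principal component pursuit) form of [18], the RPCA model reads:

```latex
\min_{\mathbf{B},\mathbf{F}} \ \|\mathbf{B}\|_{*} + \lambda \|\mathbf{F}\|_{1}
\quad \text{s.t.} \quad \mathbf{X} = \mathbf{B} + \mathbf{F}
\tag{1}
```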
where X ∈ R^{(MN)×T} is the original video sequence, and B and F represent the background and foreground of the video, respectively. There are two drawbacks to the RPCA model. (1) RPCA cannot reconstruct B and F directly from the sampled measurements A because the original video sequence X is required; the need to first reconstruct the original video imposes a high computational complexity. (2) In RPCA, the foreground reconstruction is robust only to corruption that has a sparse distribution [17,18]. In real-world video sequences, however, movement turbulence is rarely sparse in nature.
The so-called three-dimensional total variation (TV3D) has recently been proposed for CS-based video reconstruction [16]; it exploits both intra-frame and inter-frame correlations of the video sequence. The advantage of TV3D is that it guarantees the performance of the video reconstruction with a low computational complexity of O(3 × MN × T). The TV3D model is formulated as:
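The formulation, reconstructed from the operator definitions that follow (the weights α_i are those appearing later in Algorithm 1), is plausibly:

```latex
\min_{\mathbf{X}} \ \sum_{i=1}^{3} \alpha_i \|\mathbf{D}_i \mathbf{X}\|_{1}
\quad \text{s.t.} \quad \boldsymbol{\Phi}\mathbf{X} = \mathbf{A}
\tag{2}
```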
where D_1 and D_2 are, respectively, the horizontal and vertical difference operators within a frame, and D_3 is the time-varying (temporal) difference operator across frames.
To detect the moving object directly from the sampled measurements, we propose a new object detection model, combining TV3D and RPCA, that simultaneously reconstructs the foreground, background, and video sequence. The proposed object detection model is described as:
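The model is plausibly of the following form, a reconstruction consistent with the surrounding text, which states that (3) combines the TV3D terms with a rank(B) penalty:

```latex
\min_{\mathbf{X},\mathbf{B},\mathbf{F}} \ \sum_{i=1}^{3} \alpha_i \|\mathbf{D}_i \mathbf{X}\|_{1}
+ \operatorname{rank}(\mathbf{B}) + \lambda \|\mathbf{F}\|_{1}
\quad \text{s.t.} \quad \boldsymbol{\Phi}\mathbf{X} = \mathbf{A},\ \ \mathbf{X} = \mathbf{B} + \mathbf{F}
\tag{3}
```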
where X = [x_1, x_2, …, x_T] represents the original video sequence to be reconstructed, B = [b_1, b_2, …, b_T] is the background, F = [f_1, f_2, …, f_T] is the foreground (moving object), and Φ is the measurement matrix. Since the accuracy of the reconstructed background and foreground relies on the quality of the video reconstruction, TV3D is used to enhance the reconstructed video. As mentioned earlier, TV3D has a low computational complexity (see (2)), while (3) has a computational complexity similar to that of RPCA. Problem (3) is also insensitive to the variable initialization, so we can initialize X, B, and F as zero matrices. Note that minimizing rank(B) in (3) is NP-hard due to its non-convex and discontinuous nature [17]. We therefore relax the rank(B) term to the nuclear norm, turning (3) into:
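With the nuclear norm relaxation, the model becomes (again a reconstruction consistent with the surrounding text):

```latex
\min_{\mathbf{X},\mathbf{B},\mathbf{F}} \ \sum_{i=1}^{3} \alpha_i \|\mathbf{D}_i \mathbf{X}\|_{1}
+ \|\mathbf{B}\|_{*} + \lambda \|\mathbf{F}\|_{1}
\quad \text{s.t.} \quad \boldsymbol{\Phi}\mathbf{X} = \mathbf{A},\ \ \mathbf{X} = \mathbf{B} + \mathbf{F}
\tag{4}
```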
The difference between problem (4) and the 3DCS model in [16] is the following: the 3DCS model aims at high-quality video reconstruction, using not only TV3D but also the nuclear norm to exploit the low-rank property of the video sequence in the wavelet domain. Problem (4), in contrast, aims to exactly reconstruct the video foreground and background from a small number of sampled measurements. To achieve this goal, we employ TV3D to guarantee an exact low-rank and sparse decomposition.
By solving problem (4), we obtain the reconstructed foreground \( \widehat{\mathbf{F}} \), background \( \widehat{\mathbf{B}} \), and video sequence \( \widehat{\mathbf{X}} \). The reconstructed \( \widehat{\mathbf{F}} \) is, however, not robust to strong movement turbulence. Borenstein et al. [19] achieved excellent image segmentation performance by using a confidence map to identify image regions. Inspired by this idea, we use the reconstructed video sequence \( \widehat{\mathbf{X}} \) to construct a confidence map O = [o_1, o_2, …, o_T], whose elements are 0 or 1. We then use O to further improve the reconstructed foreground through \( \mathbf{O} \odot \widehat{\mathbf{F}} \), where ⊙ denotes the Hadamard (pointwise) product. Note that the confidence map is a binary matrix in which the locations of movement turbulence are set to 0 and the locations of the moving object are set to 1.
In real-world video surveillance, movement turbulence is repetitive and locally centered [20,21] and can be modeled by a Gaussian distribution [22,23]. In this paper, we utilize the following mixed Gaussian model to estimate the intensity distribution of a pixel undergoing movement turbulence [22]:
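A two-component mixture consistent with the variables defined below would read as follows; this is a sketch, and the exact form of the second, trajectory-based component in [22] may differ:

```latex
f(x_{ij}) = \omega\, \mathcal{N}\!\left(x_{ij};\, \mu_x,\, \sigma_x^{2}\right)
+ (1-\omega)\, \mathcal{N}\!\left(x_{ij};\, \mu_p,\, \Sigma_p\right)
\tag{5}
```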
where f(x_ij) represents the probability density of the pixel x_ij at the jth element of the ith column of \( \widehat{\mathbf{X}} \), ω is the weight of the two Gaussian components, μ_x and σ_x are the mean and standard deviation estimated by the EM algorithm, and μ_p and Σ_p are the mean and covariance matrix estimated from the particle trajectory of x_ij [22]. The particle trajectory captures the deformation caused by movement turbulence and can be obtained using the Lagrangian particle trajectory advection approach [24,25].
The confidence map is obtained as follows: we first estimate each pixel's probability density f(x_ij) using (5); we then decide which pixels belong to the movement turbulence and which belong to the moving object using a threshold θ. If f(x_ij) > θ, we set the corresponding element of O to 1; otherwise, we set it to 0. The resulting binary matrix is the final confidence map.
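The thresholding step can be sketched as follows; the two-component mixture here is an illustrative stand-in for Eq. (5), since the paper estimates μ_p and Σ_p from Lagrangian particle trajectories, and all parameter values below are hypothetical:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Elementwise Gaussian density N(x; mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def confidence_map(X_hat, theta, mu_x, sigma_x, mu_p, sigma_p, omega=0.5):
    """Binary confidence map O from the reconstructed video X_hat.
    Illustrative stand-in for the paper's mixture model (5)."""
    f = omega * gauss_pdf(X_hat, mu_x, sigma_x) \
        + (1.0 - omega) * gauss_pdf(X_hat, mu_p, sigma_p)
    # Thresholding rule from the text: f > theta -> 1, otherwise 0.
    return (f > theta).astype(np.uint8)

O = confidence_map(np.array([[0.0, 5.0]]), theta=0.05,
                   mu_x=0.0, sigma_x=1.0, mu_p=0.5, sigma_p=1.0)
print(O)   # [[1 0]]
```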
3 Reconstruction algorithm
In problem (4), we generalized the video compression process as a_t = Φx_t. Since we use P, C, and S_t (t = 1, 2, …, T) to generate the compressed measurements A (see Figure 1), we use the specific forms r_t = Cx_t and S_t P r_t = a_t (t = 1, 2, …, T) to replace ΦX = A in (4), rewriting it as:
where R = [r_1, r_2, …, r_T] is the circulant measurement matrix.
Next, we propose an alternating algorithm for the reconstruction of X, B, and F in (6). Each iteration of the alternating algorithm contains two steps: the R-step, which reconstructs the original video X, and the S-step, which segments the background and foreground.
In Rstep, we reconstruct X by solving the following problem:
We adopt the augmented Lagrange multiplier (ALM) algorithm [26] to solve problem (7). The augmented Lagrange function of (7) is given by:
where λ_i and υ are Lagrange multiplier matrices. The constrained optimization problem in (7) has been replaced by problem (8). The ALM algorithm solves the minimization problem of (8) by iteratively minimizing the Lagrange function and updating the Lagrange multipliers,
Note that it is difficult to solve (9) directly. One can use an alternating strategy to minimize the augmented Lagrange function with respect to each component separately, namely,
The subproblem in (12) is solved as follows:
where S_α(·) is the soft-thresholding operator, defined for a scalar x as:
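The standard scalar soft-thresholding operator is:

```latex
S_{\alpha}(x) = \operatorname{sign}(x)\,\max\left(|x| - \alpha,\ 0\right)
\tag{16}
```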
where α is the soft-thresholding parameter. For a matrix Z = (z_ij), S_α(Z) is applied elementwise, i.e., each element of S_α(Z) follows the definition in (16).
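As a quick sketch, the elementwise operator in numpy:

```python
import numpy as np

def soft_threshold(Z, alpha):
    """Elementwise soft-thresholding: sign(z) * max(|z| - alpha, 0)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - alpha, 0.0)

print(soft_threshold(np.array([-2.0, 0.5, 3.0]), 1.0))   # [-1.  0.  2.]
```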
Next, we solve the subproblem (13) through the following two steps [16].
where PicS_t is the index set of measurements selected by S_t, and r_t is the tth column of R.
In subproblem (14), X is updated through solving a quadratic problem.
By fixing X^{k+1}, we reconstruct B and F in the S-step by solving the following problem:
The augmented Lagrange function of (19) can be expressed as:
where Y is the Lagrange multiplier matrix, and ⟨·, ·⟩ denotes the matrix inner product. We use the ALM algorithm to solve the minimization problem in (20) through the following two steps:
Similarly, we use an alternating strategy to minimize problem (21) with respect to each component separately:
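For the S-step alone (with X fixed), the alternating updates reduce to the familiar inexact-ALM iteration for low-rank plus sparse decomposition. The following numpy sketch uses ad hoc parameter choices (a fixed penalty μ and λ = 1/√max(m, n)) rather than the paper's settings:

```python
import numpy as np

def svt(Z, tau):
    """Singular value shrinkage D_tau(Z) = U S_tau(Sigma) V^T."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(Z, tau):
    """Elementwise soft-thresholding S_tau(Z)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def s_step(X, lam=None, mu=1.0, iters=300):
    """Split a fixed reconstruction X into low-rank B plus sparse F via an
    inexact-ALM iteration. A simplified stand-in for the paper's S-step."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(X.shape))
    B = np.zeros_like(X)
    F = np.zeros_like(X)
    Y = np.zeros_like(X)          # Lagrange multiplier matrix
    for _ in range(iters):
        B = svt(X - F + Y / mu, 1.0 / mu)      # background: nuclear-norm prox
        F = shrink(X - B + Y / mu, lam / mu)   # foreground: l1 prox
        Y = Y + mu * (X - B - F)               # multiplier update
    return B, F

# Toy demo: a rank-1 "background" plus two sparse "foreground" spikes.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((30, 1)) @ rng.standard_normal((1, 30))
sparse = np.zeros((30, 30))
sparse[2, 5], sparse[11, 20] = 8.0, -7.0
B, F = s_step(low_rank + sparse)
```

The B-update and F-update are the proximal operators of the nuclear norm and the l1 norm, respectively, which is why the singular value shrinkage and soft-thresholding operators appear in Algorithm 1.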
The complete algorithm proposed to solve problem (6) is summarized in Algorithm 1 below.
In the above algorithm, \( \mathbf{M}={\displaystyle \sum_{i=1}^3}{\alpha}_i{\beta}_i{\mathbf{D}}_i^T{\mathbf{D}}_i+{\beta}_4{\mathbf{C}}^T\mathbf{C} \), and \( {\mathcal{D}}_{\alpha}\left(\cdotp \right) \) is the singular value shrinkage operator [27], defined as follows: suppose the SVD of a matrix Z is given by Z = UΣV^T, where Σ is a rectangular diagonal matrix whose diagonal entries Σ_ii are the singular values of Z, and U and V are real unitary matrices. The singular value shrinkage operator for Z is defined as \( {\mathcal{D}}_{\alpha }(Z)=U{S}_{\alpha}\left(\varSigma \right){V}^T \), where S_α(·) is the soft-thresholding operator applied to Σ with parameter α. In Algorithm 1, the termination criterion is set as \( \frac{{\left\Vert {\mathbf{X}}^{k+1}-{\mathbf{X}}^k\right\Vert}_F}{{\left\Vert {\mathbf{X}}^k\right\Vert}_F}\le {10}^{-6} \), considering that the reconstruction of B and F relies on the reconstruction of X.
The solution to problem (7) does not guarantee a global minimum for problem (6). Moreover, it is difficult to rigorously prove the convergence of the proposed alternating algorithm for problem (7). However, we can prove that there exists a feasible solution for X, B, and F that minimizes the cost function in (6). This feasible solution is stated in the following theorem.
Theorem 1: The sequences {X^k}, {B^k}, and {F^k} generated by Algorithm 1 are bounded, and there exists a feasible point (X*, B*, F*) for the solution of problem (6).
The proof of Theorem 1 is given in the Appendix.
4 Experimental results
In this section, we perform numerical experiments to show the performance of the proposed object detection algorithm. We focus on the illustration of the moving object reconstruction result and show that the new object detection algorithm is robust to the movement turbulence.
For quantitative evaluation, we use the F-measure to evaluate the accuracy of the moving object detection result. The F-measure is defined as:
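The standard definition is:

```latex
F\text{-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}
```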
where 'precision' and 'recall' are given by:
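These follow the usual definitions:

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}
```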
'Precision' and 'recall' are two classification accuracy measures widely used to assess the accuracy of background subtraction results [28]. In these definitions, TP, FP, and FN are the numbers of true positives, false positives, and false negatives, respectively. The higher the F-measure, the better the accuracy of the moving object detection. The major parameters used in Algorithm 1 are shown in Table 1. In our experiments, we compare the proposed object detection algorithm with the RPCA method as well as a widely used background subtraction algorithm, the improved Gaussian mixture model (GMM) [29]. Both RPCA and GMM operate in the spatial domain. All experiments are performed on an Acer PC (Intel(R) Core(TM) i3-2310M CPU, 2.10 GHz).
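Given binary foreground masks, the evaluation above amounts to counting pixel-level agreements; a minimal sketch (the tiny masks below are hypothetical):

```python
import numpy as np

def f_measure(pred, gt):
    """F-measure of a binary foreground mask `pred` against ground truth `gt`."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)     # true positives
    fp = np.sum(pred & ~gt)    # false positives
    fn = np.sum(~pred & gt)    # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(f_measure(pred, gt))   # 0.666... (precision = recall = 2/3)
```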
The testing video sequences for all experiments are chosen from the databases detailed in Table 2.
4.1 The new object detection model
Here, we choose the Fountain sequence as an example to first show the video reconstruction result of the new object detection model. In this experiment, we compare the video reconstruction result of the proposed object detection model with three known video reconstruction sparsity measures: 2DTV, DWT, and 2DTV + DWT. The simulation results in terms of peak signal-to-noise ratio (PSNR) for the four methods are shown in Figure 2.
The PSNR of the video reconstructed using the proposed object detection model is significantly higher than that of 2DTV, DWT, and 2DTV + DWT. Figure 3 shows the twentieth frame of the original video sequence and the corresponding reconstruction results of the four methods. Evidently, the video frame reconstructed using the proposed object detection model is clearer than those from 2DTV, DWT, and 2DTV + DWT. We conclude from this experiment that the proposed reconstruction model yields superior video reconstruction performance.
Next, we illustrate the video reconstruction and object detection results of our proposed model versus the sampling rate, as shown in Figure 4. The chosen video sequence is from an airport video that contains a large amount of edge information and thus highlights the differences in video foreground reconstruction at different sampling rates. In addition, we compare our object detection results with compressive principal component pursuit (PCP) in Figure 4, which clearly shows the advantage of using the TV3D norm in our object detection model.
Clearly, Figure 4b,c,d give exact foreground reconstruction results; to make the differences among the three images visible, we provide locally magnified views of the foreground reconstructions. Figure 4d gives the best foreground reconstruction result, and Figure 4c performs slightly better than Figure 4b because the sampling rate used in Figure 4c is higher. Figure 4a does not give a clear foreground reconstruction due to the poor video reconstruction. Compared with Figure 4a,b,c,d, Figure 4e,f,g,h give poor video foreground and background reconstruction results. This is because Figure 4e,f,g,h are reconstructed by compressive PCP, which is a special case of problem (6) with α_i = 0 (i = 1, 2, 3). In this special case, the poor video reconstruction becomes the bottleneck that precludes good background and foreground reconstruction at low sampling rates. We conclude from this experiment that using the TV3D norm in our model guarantees high object detection performance at low sampling rates. In addition to the above subjective assessment of object detection at different sampling rates, we choose PSNR and root mean square error (RMSE) as objective evaluation metrics to further illustrate the performance of the proposed object detection model and compressive PCP at different sampling rates.
In Table 3, PSNR is used to measure the video reconstruction result, and RMSE_B denotes the RMSE of the background reconstruction. From Figure 4 and Table 3, we can see that at a 20% sampling rate, the PSNR of our video reconstruction is already above 30 dB, which means we have obtained enough information for exact foreground reconstruction.
4.2 The moving object detection result
Here, we illustrate the performance of the proposed object detection algorithm with an emphasis on the reconstruction of foreground and background. To allow comparison with the GMM algorithm, we give the binary form of our foreground reconstruction result in the following experiments. We choose four indoor video sequences (airport, lobby, canteen, and shopping mall) to show that the proposed object detection algorithm performs similarly to popular spatial-domain moving object detection methods. The reconstruction results of our algorithm for the four indoor sequences are shown in Figure 5, where columns 1 to 4 are the moving object detection results for the airport, lobby, canteen, and shopping mall sequences, respectively. Our algorithm, using only 20% of the sampled measurements, gives moving object detection performance similar to that of the RPCA and GMM methods. In the lobby and canteen sequences, the proposed algorithm is also able to reduce the shadow disturbance. Table 4 gives objective evaluation results in terms of the F-measure for the proposed algorithm and the two known methods on the four video sequences. The F-measure of the proposed object detection algorithm is clearly higher than that of the GMM method. From this experiment, we conclude that the proposed algorithm can exactly detect the moving object in indoor video sequences using only 20% of the sampled measurements.
Figure 6 shows object detection results for the lobby video sequence with a sudden illumination change from the 10th frame to the 11th frame. The proposed algorithm is clearly robust to sudden illumination changes in the indoor video sequence.
We now illustrate the performance of the proposed algorithm on outdoor video sequences, which usually contain strong movement turbulence. We choose the campus, fountain, and pedestrian video sequences for this experiment. The pedestrian sequence was captured by a COTS camera (a SONY DCR-TRV740).
This test case is very challenging because the whole video sequence is strongly disturbed by swaying trees and a flag. From Figure 7, it is evident that the proposed algorithm effectively eliminates the disturbance of the swaying trees (Figure 7b), while RPCA is not robust to this kind of strong movement turbulence (Figure 7c). The post-processed result of RPCA (Figure 7e) performs slightly better than the proposed algorithm. Although the GMM method can reduce the movement turbulence (Figure 7d), its foreground reconstruction is not better than that of the proposed object detection algorithm. We conclude from this experiment that the proposed algorithm gives a robust foreground reconstruction using only 40% of the sampled measurements.
In this experiment, the background contains a large fountain, which strongly disturbs the moving object. As seen from Figure 8, the new object detection algorithm efficiently eliminates the fountain disturbance and gives a better foreground reconstruction than the GMM method (Figure 8b,d). RPCA is again not robust to this kind of movement turbulence (Figure 8c). The post-processed result of RPCA (Figure 8e) is better than that of the proposed algorithm because RPCA operates in the spatial domain, where the original video sequence provides it with a large amount of detailed information.
We choose a real-world outdoor video sequence for this experiment. The chosen sequence contains ordinary disturbances such as shadows and camera noise. We randomly select four frames to show the moving object detection performance of the different methods. Figure 9b clearly shows that the proposed object detection algorithm exactly distinguishes the contour outlines of the moving person and completely eliminates the camera noise. Neither RPCA nor GMM (see Figure 9c,d) gives a clear moving object detection result. The averaged F-measures for Figures 7, 8, and 9 are given in Table 5, which shows that the proposed algorithm achieves a markedly higher F-measure than the RPCA and GMM methods.
5 Conclusion
In this paper, we have proposed a CS-based algorithm for detecting moving objects in video sequences. To achieve robust foreground reconstruction using only a small number of sampled measurements, we first proposed an object detection model that simultaneously reconstructs the foreground, background, and original video sequence from the sampled measurements. Then, the reconstructed video sequence is used to estimate a confidence map that refines the foreground reconstruction result. Experiments have shown that the proposed moving object detection algorithm performs well for both indoor and outdoor video sequences. For outdoor sequences in particular, the proposed reconstruction model effectively eliminates movement turbulence such as waving trees, water fountains, and video noise. In conclusion, the proposed moving object detection algorithm achieves accuracy comparable to known spatial-domain methods with a significantly reduced number of sampled measurements. The limitations of the proposed method are: (1) in Algorithm 1, the nuclear norm minimization imposes high computational complexity, and (2) there is no theoretical analysis of the impact of the sampling rate on the object detection result. In future work, (1) we will use an online version of the object detection model to achieve background reconstruction, and (2) we will refer to [15] for a possible theoretical analysis of the performance of the proposed model.
References
O Barnich, M Van Droogenbroeck, ViBe: a universal background subtraction algorithm for video sequences. IEEE Trans. Image Process. 20(6), 1709–1724 (2011)
S Brutzer, B Hoferlin, G Heidemann, Evaluation of background subtraction techniques for video surveillance, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1937–1944
DM Tsai, SC Lai, Independent component analysis-based background subtraction for indoor surveillance. IEEE Trans. Image Process. 18(1), 158–167 (2009)
B Zhang, Y Gao, S Zhao, B Zhong, Kernel similarity modeling of texture pattern flow for motion detection in complex background. IEEE Trans. Circuits Syst. Video Technol. 21(1), 29–38 (2011)
W Kim, C Kim, Background subtraction for dynamic texture scenes using fuzzy color histograms. IEEE Signal Process. Lett. 19(3), 127–130 (2012)
S Chen, J Zhang, Y Li, J Zhang, A hierarchical model incorporating segmented regions and pixel descriptors for video background subtraction. IEEE Trans. Ind. Inf. 8(1), 118–127 (2012)
B Han, LS Davis, Density-based multifeature background subtraction with support vector machine. IEEE Trans. Pattern Anal. Mach. Intell. 34(5), 1017–1023 (2012)
R Baraniuk, Compressive sensing. IEEE Signal Process. Mag. 24(4), 118–121 (2007)
DL Donoho, Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
EJ Candes, MB Wakin, An introduction to compressive sampling. IEEE Signal Process. Mag. 25(2), 21–30 (2008)
J Ma, G Plonka, MY Hussaini, Compressive video sampling with approximate message passing decoding. IEEE Trans. Circuits Syst. Video Technol. 22(9), 1354–1364 (2012)
V Cevher, A Sankaranarayanan, M Duarte, D Reddy, R Baraniuk, R Chellappa, Compressive sensing for background subtraction, in Proc. European Conference on Computer Vision (ECCV), 2008
H Jiang, W Deng, Z Shen, Surveillance video processing using compressive sensing. Inverse Probl. Imaging 6(2), 201–214 (2012)
F Yang, H Jiang, Z Shen, W Deng, D Metaxas, Adaptive low rank and sparse decomposition of video using compressive sensing, in Proc. IEEE International Conference on Image Processing (ICIP), 2013, pp. 1016–1020
J Wright, A Ganesh, K Min, Y Ma, Compressive principal component pursuit. Inf. Inference 2(1), 32–68 (2013)
X Shu, N Ahuja, Imaging via three-dimensional compressive sampling (3DCS), in IEEE International Conference on Computer Vision (ICCV), 2011, pp. 439–446
B Bao, G Liu, C Xu, S Yan, Inductive robust principal component analysis. IEEE Trans. Image Process. 21(8), 3794–3800 (2012)
EJ Candes, X Li, Y Ma, J Wright, Robust principal component analysis? J. ACM 58(1), 1–37 (2009)
E Borenstein, E Sharon, S Ullman, Combining top-down and bottom-up segmentation, in Proc. Conference on Computer Vision and Pattern Recognition Workshop, 2004, p. 46
M Shimizu, S Yoshimura, M Tanaka, M Okutomi, Super-resolution from image sequences under the influence of hot-air optical turbulence, in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8
O Oreifej, G Shu, T Pace, M Shah, A two-stage reconstruction approach for seeing through water, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1153–1160
O Oreifej, X Li, M Shah, Simultaneous video stabilization and moving object detection in turbulence. IEEE Trans. Pattern Anal. Mach. Intell. 35(2), 450–462 (2013)
C Stauffer, WEL Grimson, Adaptive background mixture models for real-time tracking, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 1999, p. 252
S Wu, O Oreifej, M Shah, Action recognition in videos acquired by a moving camera using motion decomposition of Lagrangian particle trajectories, in IEEE International Conference on Computer Vision (ICCV), 2011, pp. 1419–1426
S Wu, BE Moore, M Shah, Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 2054–2060
W Yin, S Morgan, J Yang, Y Zhang, Practical compressive sensing with Toeplitz and circulant matrices, in Proc. Visual Communications and Image Processing, Huangshan, China, 2010
Y Hu, D Zhang, J Ye, X Li, X He, Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE Trans. Pattern Anal. Mach. Intell. 35(9), 2117–2130 (2013)
X Zhou, C Yang, W Yu, Moving object detection by detecting contiguous outliers in the low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 35(3), 597–610 (2013)
Z Zivkovic, Improved adaptive Gaussian mixture model for background subtraction. Int. Conf. Pattern Recog. 2, 28–31 (2004)
L Li, W Huang, IYH Gu, Q Tian, Statistical modeling of complex backgrounds for foreground object detection. IEEE Trans. Image Process. 13(11), 1459–1472 (2004)
Y Sheikh, M Shah, Bayesian modeling of dynamic scenes for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 27(11), 1778–1792 (2005)
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61372122 and 61302103) and the Innovation Program for Postgraduates in Jiangsu Province under Grant No. CXZZ13_0491.
Competing interests
The authors declare that they have no competing interests.
Appendix: Proof of Theorem 1
The proof of Theorem 1 is based on the following two lemmas.
Lemma 1: Let \( \mathbf{a}_i^k=\boldsymbol{\lambda}_i^k-\tau \left(\mathbf{G}_i^{k+1}-\mathbf{D}_i\mathbf{X}^k\right)\ \left(i=1,2,3\right) \), \( \mathbf{b}^k=\mathbf{Y}^k+\mu \left(\mathbf{X}^{k+1}-\mathbf{B}^{k+1}-\mathbf{F}^k\right) \), and \( \mathbf{c}^k=\mathbf{Y}^k+\mu \left(\mathbf{X}^{k+1}-\mathbf{B}^{k+1}-\mathbf{F}^{k+1}\right) \). The sequences \( \left\{\mathbf{a}_i^k\right\}\left(i=1,2,3\right) \), {b^{k}}, and {c^{k}} are bounded.
Proof of Lemma 1
(i) We first prove that \( \left\{\mathbf{a}_i^k\right\}\left(i=1,2,3\right) \) is bounded.
In each iteration of Algorithm 1, G _{ i } (i = 1, 2, 3) is updated by solving problem (7), and B and F are reconstructed by solving problem (19). Minimizing the Lagrange function in (8) yields:
Since \( \mathbf{a}_i^k\in {\partial}_{\mathbf{G}_i}{\left\Vert \mathbf{G}_i^{k+1}\right\Vert}_1 \), the sequence \( \left\{\mathbf{a}_i^k\right\}\left(i=1,2,3\right) \) is bounded [22].
(ii) We now prove that {b ^{k}} and {c ^{k}} are bounded.
When minimizing the Lagrange function in (20), we can obtain that:
Since b ^{k} ∈ ∂_{B}‖B ^{k + 1}‖_{*}, the sequence {b ^{k}} is bounded [22].
Since c ^{k} ∈ ∂_{F}‖F ^{k + 1}‖_{1}, the sequence {c ^{k}} is bounded [22].
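For orientation, the subgradient relations above are consistent with an augmented Lagrangian of the standard low-rank-plus-sparse form. The following is a reconstruction, not quoted from problem (19) itself (which lies outside this excerpt); the foreground weight λ is an assumed regularization parameter:

```latex
% Presumed augmented Lagrangian of problem (19), inferred from
% b^k \in \partial\|B^{k+1}\|_* , c^k \in \partial\|F^{k+1}\|_1 ,
% and the inner-product and penalty terms used later in the proof.
\mathcal{L}(\mathbf{B},\mathbf{F},\mathbf{Y})
  = \|\mathbf{B}\|_{*} + \lambda\,\|\mathbf{F}\|_{1}
  + \left\langle \mathbf{Y},\ \mathbf{X}-\mathbf{B}-\mathbf{F} \right\rangle
  + \frac{\mu}{2}\,\left\Vert \mathbf{X}-\mathbf{B}-\mathbf{F} \right\Vert_F^{2}
```

Under this form, setting the B- and F-subdifferentials to zero at the minimizers B^{k+1} and F^{k+1} produces exactly the vectors b^{k} and c^{k} defined in Lemma 1.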
Lemma 2: Let ℒ(B, F, Y) be the Lagrange function of problem (19), and let ℒ^{k + 1} = ℒ(B ^{k + 1}, F ^{k + 1}, Y ^{k}). Then \( {\mathcal{L}}^{k+1}-{\mathcal{L}}^k\le \frac{1}{\mu }{e}^k \) with e ^{k} = ‖Y ^{k} − Y ^{k − 1}‖_{ F }, k = 1, 2, …
Proof of Lemma 2
Let ℒ^{k + 1} = ℒ(B ^{k + 1}, F ^{k + 1}, Y ^{k}); then:
Substituting (A2) into (A1), we obtain \( {\mathcal{L}}^{k+1}-{\mathcal{L}}^k\le \frac{1}{\mu }{\left\Vert {\mathbf{Y}}^k-{\mathbf{Y}}^{k-1}\right\Vert}_F \). This completes the proof of Lemma 2.
In Algorithm 1, two Lagrange functions are solved for the reconstruction of X, B, and F: ℒ(X, G _{ i }, R, λ _{ i }, υ) and ℒ(B, F, Y ^{k}). If there exists a feasible point (X*, B*, F*) for problem (6), both ℒ(X, G _{ i }, R, λ _{ i }, υ) and ℒ(B, F, Y ^{k}) must be bounded.
Theorem 1 of [16] proves that ℒ(X, G _{ i }, R, λ _{ i }, υ) is bounded, so we only need to prove that ℒ(B, F, Y ^{k}) is bounded. Lemma 1 established that {c ^{k}} is bounded.
Noting that Y ^{k + 1}â€‰=â€‰c ^{k}, we have bounded {Y ^{k}}.
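Spelling this step out: with c ^{k} as defined in Lemma 1, the identity Y ^{k + 1} = c ^{k} is precisely the standard multiplier update

```latex
% Multiplier update, obtained by combining Y^{k+1} = c^k with the
% definition of c^k in Lemma 1.
\mathbf{Y}^{k+1} = \mathbf{c}^{k}
  = \mathbf{Y}^{k}
  + \mu\left(\mathbf{X}^{k+1}-\mathbf{B}^{k+1}-\mathbf{F}^{k+1}\right)
```

so the boundedness of {c ^{k}} established in Lemma 1 transfers directly to {Y ^{k}}.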
Since \( {\mathcal{L}}^{k+1}-{\mathcal{L}}^k\le \frac{1}{\mu }{\left\Vert {\mathbf{Y}}^k-{\mathbf{Y}}^{k-1}\right\Vert}_F \) and {Y ^{k}} is bounded, we conclude that ℒ(B, F, Y ^{k}) is bounded.
Next, we prove that {X ^{k}}, {B ^{k}}, and {F ^{k}} are bounded.
Since ℒ(X, G _{ i }, R, λ _{ i }, υ) has been proved bounded, the Lagrange multiplier sequence \( \left\{\boldsymbol{\lambda}_i^k\right\} \) is bounded.
From (10) in the paper, we obtain:
From Lemma 1, we have:
Substituting (A4) into (A3), we get:
Since \( \left\{\boldsymbol{\lambda}_i^k\right\} \) and \( \left\{\mathbf{a}_i^k\right\} \) are bounded, {X ^{k}} is bounded.
Noting that:
Therefore,
Since Y ^{k} is bounded, \( \left\langle {\mathbf{Y}}^k,{\mathbf{X}}^{k+1}-{\mathbf{B}}^{k+1}-{\mathbf{F}}^{k+1}\right\rangle +\frac{\mu }{2}{\left\Vert {\mathbf{X}}^{k+1}-{\mathbf{B}}^{k+1}-{\mathbf{F}}^{k+1}\right\Vert}_F^2 \) converges to 0.
Thus, we have:
As ℒ^{k + 1} is bounded and \( \left\langle {\mathbf{Y}}^k,{\mathbf{X}}^{k+1}-{\mathbf{B}}^{k+1}-{\mathbf{F}}^{k+1}\right\rangle +\frac{\mu }{2}{\left\Vert {\mathbf{X}}^{k+1}-{\mathbf{B}}^{k+1}-{\mathbf{F}}^{k+1}\right\Vert}_F^2 \) converges to 0, it follows that {B ^{k}} and {F ^{k}} are bounded, which completes the proof of Theorem 1.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Cite this article
Kang, B., Zhu, W.-P. & Yan, J. Object detection oriented video reconstruction using compressed sensing. EURASIP J. Adv. Signal Process. 2015, 15 (2015). https://doi.org/10.1186/s13634-015-0194-1