An Adaptive Motion Segmentation for Automated Video Surveillance
EURASIP Journal on Advances in Signal Processing volume 2008, Article number: 187413 (2008)
This paper presents an adaptive motion segmentation algorithm that utilizes spatiotemporal information from the three most recent frames. The algorithm first extracts the moving edges by applying a novel flexible edge matching technique that makes use of a combined distance transformation image. A watershed-based iterative algorithm is then employed to segment the moving object region from the extracted moving edges. The challenges for existing three-frame-based methods include slow movement, edge localization error, minor camera movement, and homogeneity of the background and foreground regions. The proposed method represents edges as segments and uses a flexible edge matching algorithm to deal with edge localization error and minor camera movement. The combined distance transformation image accumulates the gradient information of the overlapping region, which effectively improves the sensitivity to slow movement. The segmentation algorithm uses watershed, the gradient information of the difference images, and the extracted moving edges. It segments the moving object region with a more accurate boundary even when some part of the moving edges cannot be detected, because of region homogeneity or other reasons, during the detection step. Experimental results on different types of video sequences are presented to demonstrate the efficiency and accuracy of the proposed method.
1. Introduction
Automatic segmentation of moving objects is a fundamental technique for analyzing image sequences in video surveillance, video conferencing, multimedia, and real-time imaging applications. Successful detection of motion helps to reduce the information redundancy in all of these applications, and thus motion detection has become an active research issue. Much research effort has been devoted to this area, and it remains challenging. Details of existing research works can be found in . One of the typical approaches for moving object detection is background subtraction, where background modeling is an unavoidable part intended to accommodate changes in the environment. However, most background-modeling approaches are complex and too time consuming for real-time processing . Moreover, these methods perform poorly because they do not compensate for the dynamism of real environments . Some researchers use edge information instead of intensity, as edges are more robust against illumination variation and noise . Nonetheless, it is difficult to keep track of variations in the environment because of the poor representation of edges in the existing edge pixel-based methods.
To overcome these challenges, some methods do not utilize any background; instead, they make use of the temporal information of consecutive frames [5, 6]. Because of region homogeneity and the high sensitivity of pixel intensity to noise, these methods perform poorly in detecting the moving object region. Compared with region-based methods, edge-based methods are robust, as the difference of consecutive frames produces a significant difference only on the boundary region of the moving object [2, 7, 8]. However, these methods treat each edge point independently, without carrying shape and neighborhood information, and thus they are not convenient for matching, segmentation, and tracking. Due to illumination variation, quantization error, and speckle error, an edge pixel may change its location in subsequent frames, which is termed edge-localization error. Hence, with pixel-wise matching, these methods produce scattered moving edges and frequently fail to detect the moving object.
Figure 1 illustrates some of the limitations of existing methods. Figures 1(a)–1(c) show three successive frames: previous (), current (), and next (). The left and right difference image edge maps computed from the difference images of these three frames are shown in Figures 1(d) and 1(e), respectively. Figure 1(f) shows the scattered moving edges obtained by exact matching between the difference image edge maps. This deterioration results from the positional variations of edge points in different frames. Some methods utilize more than three successive frames and accumulate moving edge information of to improve the detection result  as shown in Figure 1(g). However, these methods require more computation and still suffer from scattered edges due to edge localization error. Figure 1(h) shows the moving edges obtained when two pixels are considered matched if their distance is not greater than two. The detected moving edges are still scattered and deviate from the ground truth of the moving object shown in Figure 1(i). Existing three-frame-based methods perform worse when detecting objects that move slowly between successive frames. This is due to the insignificant gradient values at the respective edge locations in the overlapping region of the difference image. Moreover, edges in the overlapping region of the difference image edge map deteriorate in size and shape compared with those in the current image, . So, it is difficult to recover the actual moving edges in as shown in Figure 1(i) by matching the left and right difference image edge maps. For better visualization, the edges of the left and right difference image edge maps within the region of the moving object in are shown in Figures 1(j) and 1(k), respectively. Figure 1(l) shows the missing moving object region in the difference image obtained by subtracting the two successive frames shown in Figures 1(a) and 1(b). This problem occurs due to region homogeneity, which prevents region-based background-independent approaches from detecting the moving object.
Considering the above-mentioned problems, we extract moving edges from the current image while removing the background edges by comparing them with the spatiotemporal edge information of three successive frames. We represent edges as segments, where all the edge pixels belonging to a segment are considered as a unit and processed together. A distance transformation-based flexible edge matching is proposed, which is more robust than pixel-based matching. Segment-based representation of edges and flexible edge matching make the system efficient in terms of accuracy and time, and significantly reduce the occurrence of scattered moving edges. The proposed method adapts to illumination variation as it uses the most recent frames. A watershed-based algorithm is utilized, which makes use of the extracted moving edges to segment the moving object region with a more accurate boundary. It ensures a meaningful representation of moving objects, which is essential in video surveillance and many other image content-based applications in multimedia communication.
2. Related Works
A good number of research efforts on moving object detection have been reported during the last few years. Background subtraction-based methods are the typical approaches for moving object detection because of their simplicity [9–11]. However, the intensity of a background pixel frequently changes due to object motion, illumination variation, or noise. To deal with this dynamism, background subtraction-based methods need to incorporate automatic background estimation and update methods [3, 12, 13]. These methods usually utilize temporal change information or optical flow-based information to identify the appropriate pixel values in a time series for the background model, which is complex and time consuming for real-time processing. Some optical flow-based approaches are used for moving object detection [14, 15]. For these methods, intensity changes are important cues for locating object movement in time and space. However, these methods may produce false detections if the temporal changes are generated by noise or other external factors such as illumination drift due to weather change. Moreover, these methods are computationally expensive. Methods that utilize temporal differencing [5, 6] use the pixel-wise difference between two or three consecutive frames. Though these methods adapt well to changes in the environment, they perform poorly in extracting all the relevant feature pixels because of region homogeneity and the excessive responsiveness of pixels to noise.
In [2, 7, 16], the authors propose edge-based methods for moving object detection utilizing double edge maps. In , one edge map is generated from the difference image of background and . Another edge map is generated from the difference image of and . Finally, moving edge points are detected by applying a logical OR operation on these two edge maps. Due to illumination variation, random noise may vary in the background, which may cause false detection of edges in the edge map. If a false edge appears in either of the edge maps, it is carried into the detection result because the logical OR operation is applied to the edge maps.
In , one edge map is computed from the difference image of and , and another edge map is computed from and . Then the moving edges of are extracted by applying logical AND operation on these two edge maps. However, due to random noise, edge pixel positions may be changed to some extent in consecutive frames. Moreover, edges located in the overlapping region of difference images are deteriorated due to insignificant gradient values of that region. Hence, exact matching like AND operation is not sufficient to extract accurate shape information of moving object. Moreover, pixel-based representation of edges is not suitable for flexible edge matching and tracking.
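The weakness of exact matching can be seen in a small illustrative sketch. It is not the implementation of any of the cited methods; the set-based edge maps, the function names, and the `tol` parameter are assumptions made only for this example. A vertical edge that drifts by one pixel in half of its rows loses those rows under a logical AND, whereas a match that tolerates a one-pixel offset keeps them.

```python
def exact_and(edges_a, edges_b):
    """Logical AND of two edge maps stored as sets of (row, col) pixels."""
    return edges_a & edges_b


def tolerant_match(edges_a, edges_b, tol=1):
    """Pixels of edges_a that have a pixel of edges_b within `tol` pixels."""
    offsets = range(-tol, tol + 1)
    return {
        (r, c)
        for (r, c) in edges_a
        if any((r + dr, c + dc) in edges_b for dr in offsets for dc in offsets)
    }


# The same vertical edge seen in two edge maps; in the second map the lower
# half is displaced by one column (a toy edge-localization error).
edge_map_1 = {(r, 5) for r in range(10)}
edge_map_2 = {(r, 5) for r in range(5)} | {(r, 6) for r in range(5, 10)}

kept_exact = exact_and(edge_map_1, edge_map_2)
kept_tolerant = tolerant_match(edge_map_1, edge_map_2)
print(len(kept_exact), len(kept_tolerant))
```

Only the undisplaced half of the edge survives the AND operation, which is the kind of fragmentation described above.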
Some edge-based methods utilize more than three successive frames to accumulate the edge information of frame in the difference image. In , initially a coarse moving edge representation is computed from a given frame and two equidistant frames, and later on nondesired edges are removed by means of a filtering. Finally, iterative accumulation of detection result obtained with varying distance images is used in this method to strengthen the respective moving edge points of current image. However, this is time consuming and requires many preceding/succeeding frames in consideration which is not reasonable for real-time detection.
A pseudogradient-based moving edge extraction method is proposed in . Though this method is computationally faster, its background is not efficient enough to handle the situation in which a new object arrives in the scene and stops moving. In this case, the stopped object is continuously detected as a moving object. As no background update is adopted in this method, it is not very robust against illumination change. Additionally, this method also suffers from scattered edge pixels of moving objects. The proposed method intends to address the drawbacks of the existing pixel-based methods by introducing a fast and robust edge segment-based solution for moving object detection.
3. The Proposed Method
Figure 2 illustrates the overall flow diagram of the proposed method. The proposed method makes use of three most recent frames , , and for moving edge detection, and later on the detected edges are utilized for segmentation of moving object. Since this method does not require any background, it is free from complex and time-consuming background modeling technique. Moreover, it is adaptive to the change of environment because of using most recent frames.
In the first step of the proposed method, edges are extracted from the current image and represented as segments to generate the current image edge map (). In the segment-based representation, all the edge pixels belonging to a segment are processed together instead of considering each edge pixel individually. This makes it possible to take advantage of robust edge matching and shape information for moving edge detection. It also significantly reduces the occurrence of scattered edge pixels in the detection result.
To identify the moving edges from , two edge maps: left difference image edge map and right difference image edge map are computed. and are utilized to generate distance transformation images and , respectively. Distance transformation image contains the distance values from the nearest edge points of the respective edge map. It provides a linear progression of distances from edge points and is used for edge matching to detect moving edges. An accumulated distance transformation image is computed from and . It contains the lowest distance value in the location of the difference image edge maps. This works as an accumulator of gradient values of two difference image edge maps and reduces the loss of moving edge information on DT image due to overlapping of moving objects in the consecutive frames. It provides more information of moving edge location in current frame. It is to be noted here that minor movement of camera between successive frames is adjusted before obtaining the difference image, using the extracted edge information as described in our previous work [18, 19].
Figure 3 illustrates the advantage of using during matching. Figures 3(a)–3(c) show three consecutive frames, where Figures 3(d)–3(f) show , , and , respectively. For better visualization, we focus only on the moving object region in distance images and scale the inverted distance values in range . Region with brighter portion of DT images represents more likelihood of containing moving edges. By accumulating and , recovers gradient information on the overlapping region to some extent. It improves the accuracy of edge matching. It also helps to detect moving edges even in slow movement, where the existing three frame-based detection methods usually fail.
Moving edges are detected from , applying edge segment-based matching by making use of . However, this process may detect some of the background edges as moving edge because of accumulating more gradient information in . These background edges are removed in the postprocessing step utilizing and with a variability test of matching confidence (DM). After detection, moving edges are grouped together, where each group represents a moving object . Each group of moving edges is used to generate the region of interest (ROI). Watershed algorithm with reduced oversegmentation problem is applied on ROI of current image. Moving edges and gradient infimum values of two difference images are used by an iterative algorithm which removes background segments from ROI to obtain moving object regions. Since the proposed method is applied only on current image ROI, it is faster and applicable for real-time detection. The segmentation result is more accurate because of applying watershed algorithm and it is suitable for content-based applications, where motion information is important to increase the efficiency. The proposed method is described in detail in the following subsections.
3.2. Edge Detection and Representation as Segment
Three edge maps: , , and are utilized in the proposed method for moving edge detection. is computed from by using Canny edge detection algorithm. Two difference image edge maps, and , are obtained utilizing , , and as follows:
where , , and G represent Canny edge detector, gradient operator, and Gaussian mask for noise filtering, respectively. Though we use a fixed camera for moving object detection, minor displacement of the camera frequently occurs in real applications, and it is adjusted using distance transformation-based translation with the help of edge segments. The camera adjustment procedure is described in detail in our previous work [18, 19]. Edges extracted from are represented as segments to form utilizing an efficiently designed edge class. In this representation, an edge segment consists of a number of neighboring consecutive edge pixels, and edge operations are performed on the whole segment instead of on individual edge pixels. A detailed description of the edge class can be found in [18, 21]. This representation provides the shape information of an edge and allows local geometric operations. Segment-based representation of edges helps to incorporate an efficient and flexible edge-matching algorithm with higher accuracy and moderate computation time. Since we extract edges from and apply segment-based flexible edge matching, the detected moving edges preserve the shape information, and the loss of edge pixels is reduced significantly.
Figure 4 illustrates the robustness and suitability of using edge segment-based approach over edge pixel-based approach during matching. Figures 4(a) and 4(b) show two edge images of an object taken at different times. Due to edge localization error, there are some displacements of edge pixel position in these two different frames. As a result, pixel-based matching is not suitable in this situation and produces scattered edge pixels in the detection result which is shown in Figure 4(c). Value of disparity threshold (matching flexibility) was set to for this illustration. However, in the case of segment-based representation, no edge pixels are missed as all the pixels belonging to a segment are processed together. Result of segment-based matching is shown in Figure 4(d). It is to be noted that about 17% of the edge pixels are missed in the case of pixel-based matching as compared to segment-based matching.
3.3. Moving Edge Detection
Moving edges are detected from by eliminating background edges using and . Equation (1) is used to compute and from the difference images of consecutive frames. These two edge maps can also be computed by differencing edge points of , , and . However, edge differencing approach is more noise prone as random noise in one frame is different from that of the successive frame . Hence, (1) is utilized to generate difference image edge maps instead of utilizing edge differencing approach. Still, shape of edges on the overlapping regions in and changes to some extent than that in . Thus, existing three frame-based methods that match two difference image edge maps fail to extract accurate shape information of moving object. To solve this problem, we detect moving edges from by making use of distance transformation images of and , instead of comparing these two difference image edge maps directly. Since distance transformation image provides a linear progression of distances from the edge points, the combined distance transformation image provides better information to extract moving edges using . Moreover, segment-based representation of edges and flexible edge matching increase the robustness in terms of accuracy and computational speed. Figure 5 illustrates some of the intermediate steps of moving edge detection process. Figures 5(a)–5(c) show , , and , where , , and are shown in Figures 5(d)–5(f), respectively. Distance transformation and edge matching procedure utilizing , , and are described as follows.
3.3.1. Distance Transformation
During edge matching, one edge map is converted to a distance transformation image, DT, and another edge map is overlaid on it to compute the disparity in matching, DM, for each of the edge segments. In DT, each pixel contains the distance to the nearest edge pixel. Since the true Euclidean distance is resource demanding in terms of time and memory, an integer approximation (3/4 Chamfer ) is used. Thus, the distance image can be generated in linear time without any floating point operation.
The basic idea behind DT image generation is that global distances in the image are approximated by propagating local distances. This transformation is performed in three steps. In the first step, all the edge pixels are initialized with zero and all other pixels are initialized with a high value. The second step is a forward pass that scans the image from left to right and top to bottom and updates the distances as follows:
Finally, a backward pass scans the image from right to left and bottom to top and modifies the image as follows:
Since this algorithm uses only two passes to generate the DT image, it is fast and can be computed in linear time. Using this algorithm, and are computed from and , respectively. However, insignificant gradient values in the overlapping region may result in failure to detect moving edges in and . Hence, to have more information in the distance image, we compute an accumulated distance image as depicted in the following equation:
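The two passes and the accumulation step can be sketched as follows. This is a minimal pure-Python illustration, assuming the standard 3-4 chamfer weights (3 for horizontal and vertical steps, 4 for diagonal steps) and taking the accumulated image as the pixel-wise minimum of the two distance images, in line with its description as holding the lowest distance value. The function names and the `INF` constant are hypothetical.

```python
INF = 10**6  # the "high value" used to initialise non-edge pixels


def chamfer_34(edge_map):
    """Two-pass 3-4 chamfer distance transform of a binary edge map."""
    h, w = len(edge_map), len(edge_map[0])
    dt = [[0 if edge_map[y][x] else INF for x in range(w)] for y in range(h)]

    def relax(y, x, neighbours):
        # Keep the smallest of the current value and neighbour + local cost.
        best = dt[y][x]
        for dy, dx, cost in neighbours:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                best = min(best, dt[ny][nx] + cost)
        dt[y][x] = best

    # Forward pass: left to right, top to bottom (already visited neighbours).
    forward = [(0, -1, 3), (-1, -1, 4), (-1, 0, 3), (-1, 1, 4)]
    for y in range(h):
        for x in range(w):
            relax(y, x, forward)

    # Backward pass: right to left, bottom to top.
    backward = [(0, 1, 3), (1, 1, 4), (1, 0, 3), (1, -1, 4)]
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            relax(y, x, backward)
    return dt


def accumulate(dt_left, dt_right):
    """Pixel-wise minimum of two distance images (assumed accumulation rule)."""
    return [[min(a, b) for a, b in zip(row_l, row_r)]
            for row_l, row_r in zip(dt_left, dt_right)]


# Toy edge maps with a single edge pixel each.
dt_left = chamfer_34([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
dt_right = chamfer_34([[0, 0, 0], [0, 0, 0], [0, 0, 1]])
dt_acc = accumulate(dt_left, dt_right)
```

Because each pixel is visited once per pass and only integer additions and comparisons are used, the cost grows linearly with the number of pixels, as stated above.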
3.3.2. Computation of Disparity in Matching
contains moving edge information of and , as it is computed from the difference image of these two frames. Similarly, contains moving edge information of and . Therefore, contains moving edge information of , , and . Moving edges in are detected by making use of these three DT images, where is used first to detect the coarse moving edge list, and later on and are used for noise filtering.
During matching, edge segments in are overlaid on and respective distance values are accumulated. DM for an edge segment is computed by taking a normalized average of distance values in that are hit by the edge segments of , shown as follows:
where k is the number of edge points in the edge segment of and is the distance value at edge point of edge segment. A segment is aligned with the distance transformation image by translation in a small search window. As we represent edges as segments, this translation is performed easily and very fast by just adding the necessary displacement  with each of the edge points. The translation which results in the lowest DM value is finally selected.
During matching, two similar edge segments produce a lower disparity value DM. An edge segment in with distance image having is considered as moving edge and enlisted in the coarse moving edge list (CMEL). Here, is the disparity threshold and we use it to allow some flexibility during matching. Flexibility is allowed as edges change their location in different frames due to noise, illumination variation, and quantization error. Edges in the overlapping region of difference image also experience size, shape, and positional variations than that in current frame (as shown in Figure 1). Moreover, if the object movement is not very significant, low gradient values in the overlapping region cause missing of some moving object pixels in the difference image edge maps. We make use of along with to handle this variation of edges during matching. However, selection of a very high threshold value for might allow an edge segment to be matched with a different edge in the difference image edge maps and thereby increases false positive in the detection result. On the other hand, selection of a very low threshold value might miss some of the moving edge segments to be matched with and thereby increases false negative. In our implementation, we set empirically for all datasets as it gives comparatively better result in most of the cases.
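A minimal sketch of the matching step, under stated assumptions: the disparity is taken as the root mean square of the distance values hit by the segment, divided by 3 (the unit step of the 3-4 chamfer distance), which is the form commonly used in chamfer matching and may differ in detail from (5). The function names, the search `radius`, and the threshold `tau_example` are hypothetical; in particular, `tau_example` is a placeholder and not the value used in the experiments.

```python
import math


def disparity(segment, dt, dy=0, dx=0):
    """RMS of the chamfer distances hit by a translated segment, divided by 3."""
    h, w = len(dt), len(dt[0])
    total = 0
    for y, x in segment:
        yy, xx = y + dy, x + dx
        if not (0 <= yy < h and 0 <= xx < w):
            return math.inf  # the translation pushes the segment off the image
        total += dt[yy][xx] ** 2
    return math.sqrt(total / len(segment)) / 3.0


def best_disparity(segment, dt, radius=1):
    """Lowest disparity over all translations in a small search window."""
    shifts = range(-radius, radius + 1)
    return min(disparity(segment, dt, dy, dx) for dy in shifts for dx in shifts)


# Distance image of a horizontal edge lying on row 1 (3-4 chamfer values).
dt = [
    [3, 3, 3, 3, 3],
    [0, 0, 0, 0, 0],
    [3, 3, 3, 3, 3],
    [6, 6, 6, 6, 6],
]
# An edge segment of the current image, displaced by one row.
segment = [(2, 1), (2, 2), (2, 3)]

dm_unaligned = disparity(segment, dt)
dm_aligned = best_disparity(segment, dt)

tau_example = 0.5  # placeholder threshold, for illustration only
is_moving = dm_aligned <= tau_example
```

Translating a segment only requires adding the same offset to each of its points, which is why the search over a small window stays cheap.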
3.3.3. Noise Filtering
CMEL enlists the edge segments of that have a higher possibility of being moving edges in . However, some background edges of may be erroneously enlisted in CMEL due to excessive incorporation of moving edge information in . Hence, a further filtering procedure is applied to make the detection result more accurate. We perform a variability test on each of the edge segments enlisted in CMEL for the final classification as a moving edge. The steps of the variability test for noise filtering are as follows.
(i) Select an unclassified edge segment from CMEL.
(ii) Use (5) to compute disparity of matching and utilizing and , separately.
(iii) If , then it is discarded from CMEL. Here, is a threshold to allow some flexibility.
(iv) Repeat steps (i) to (iii) until all the edge segments of CMEL are considered.
In noise filtering, our intention is to observe the variation of and for each of the edge segments in CMEL. Since moving edges of the current image also exist in both and , the absolute difference of their matching confidence values is expected to be zero in the ideal situation. However, due to noise and edge localization error, some flexibility is needed and hence we use . Selecting a high value for might cause false matching of edges, whereas a very low threshold might cause true moving edges to be missed in the final detection result. Considering the above issues, we set empirically in our experiment. Figure 5(h) shows the edges initially enlisted in CMEL, and the final detection result after noise filtering is shown in Figure 5(i). To show the advantages of segment-based matching, we also present the result of pixel-based matching in Figure 5(g), where most of the moving edge pixels are missed due to edge-localization error.
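The variability test can be sketched in a few lines. The sketch assumes that a segment is discarded when the absolute difference between its two disparity values exceeds the threshold; the dictionary-based interface, the segment identifiers, and the value of `tau_var` are hypothetical and chosen only for illustration.

```python
def variability_filter(cmel, dm_left, dm_right, tau_var):
    """Keep the CMEL segments whose left and right disparities agree.

    cmel     : list of segment identifiers in the coarse moving edge list
    dm_left  : disparity of each segment against the left distance image
    dm_right : disparity of each segment against the right distance image
    tau_var  : tolerance on the absolute difference (placeholder value)
    """
    return [seg for seg in cmel
            if abs(dm_left[seg] - dm_right[seg]) <= tau_var]


cmel = ["s1", "s2", "s3"]
dm_left = {"s1": 0.4, "s2": 0.2, "s3": 1.0}
dm_right = {"s1": 0.5, "s2": 2.1, "s3": 1.2}

moving_edges = variability_filter(cmel, dm_left, dm_right, tau_var=0.5)
```

In this toy input, segment `s2` matches one difference image well and the other poorly, which is the behaviour expected of a background edge that entered CMEL through the accumulated distance image.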
3.4. Segmentation of Moving Object
In segmentation, moving regions are extracted from moving edges by using a watershed-based iterative background removal technique. Detected moving edges do not provide the complete boundary of moving object. Thus, a separate algorithm is required for segmentation. The segmentation algorithm is applied on ROI of and it makes use of moving edge segments and gradient infimum values of difference image. Moving object segmentation procedure is described in the following subsections.
3.4.1. ROI Detection and Segmentation
The rectangular bounding box of the moving edges is used to determine the ROI of the moving object for segmentation. Figure 6(a) shows the detected moving edges, and the ROI defined using these edges is shown in Figure 6(b). Applying watershed only on the ROI reduces processing time significantly. Since watershed has been proven to be very useful and efficient for image segmentation , it extracts a more accurate moving object region during segmentation.
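As a small illustration, the ROI of one group of moving edges can be obtained from the extreme coordinates of its edge pixels. The function name and the optional `margin` parameter are hypothetical; the text does not state whether the box is padded.

```python
def roi_from_edges(moving_segments, height, width, margin=0):
    """Inclusive bounding box (top, left, bottom, right) of a group of edges.

    moving_segments : list of segments, each a list of (row, col) pixels
    height, width   : image size, used to clip the box
    margin          : optional padding in pixels (an assumption)
    """
    rows = [r for seg in moving_segments for r, _ in seg]
    cols = [c for seg in moving_segments for _, c in seg]
    top = max(min(rows) - margin, 0)
    left = max(min(cols) - margin, 0)
    bottom = min(max(rows) + margin, height - 1)
    right = min(max(cols) + margin, width - 1)
    return top, left, bottom, right


segments = [[(2, 3), (3, 4), (4, 4)], [(5, 1), (5, 2)]]
roi_tight = roi_from_edges(segments, height=8, width=8)
roi_padded = roi_from_edges(segments, height=8, width=8, margin=1)
```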
In the watershed algorithm, the image is split into areas known as catchment basins based on the topology of the image. A catchment basin is defined as the region over which all points flow "downhill" to a common point, as shown in Figure 6(c). The local minima (black regions) and maxima (dotted line) of the gray-level data yield catchment basins and watershed lines (dams), respectively. An efficient watershed transformation is a flooding process, where only the dams emerge, defining the watershed of the image  as shown in Figure 6(d). However, watershed segmentation frequently results in an oversegmented image with hundreds or thousands of catchment basins; each corresponds to a minimum of the gradient, some of which may be due to small variations caused by noise. Considering accuracy and efficiency, we have adopted the Vincent-Soille watershed algorithm  in our proposed method for segmentation. To solve the oversegmentation problem, we replace the gradient value by zero where it is less than a particular threshold, . is determined as the mean of the gradient image minus one fourth of its standard deviation. Thus, around fifty percent  of the gradient values are replaced with zeros, which reduces the oversegmentation problem significantly.
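The thresholding rule used to limit oversegmentation can be written directly from its description. The sketch below assumes the population standard deviation and a gradient image stored as a list of rows; the function name is hypothetical, and the watershed transform itself is not shown.

```python
from statistics import mean, pstdev


def suppress_low_gradients(grad):
    """Zero every gradient value below mean - std / 4 before watershed."""
    values = [v for row in grad for v in row]
    threshold = mean(values) - pstdev(values) / 4
    suppressed = [[v if v >= threshold else 0 for v in row] for row in grad]
    return suppressed, threshold


grad = [[0, 2], [4, 10]]
suppressed, threshold = suppress_low_gradients(grad)
```

Flattening the weak gradients merges many shallow minima into flat zones, so fewer catchment basins are produced by the subsequent flooding.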
3.4.2. Computation of Gradient Infimum Value
Gradient infimum value is computed from the gradient values of difference images of consecutive frames. It is utilized for making decision on watershed segments for classification as foreground or background. Left gradient image () and right gradient image () are computed using the following equation:
Due to the above formulation, high gradient values in exist only in the region, where moving object boundaries exist in and . Similarly, high gradient values in exist on the boundaries of moving object of and . Hence, to obtain high gradient values only on moving object boundary region in , gradient infimum values, is computed from and . To achieve more robustness, pixels including their eight neighbors are considered while computing as shown in the following equation:
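One possible reading of this computation is sketched below. The pixel-wise minimum of the two gradient images follows from the definition of an infimum; how the eight neighbours enter is not recoverable from the description alone, so the neighbourhood aggregation is exposed as a parameter, and its default (`max`, which tolerates a one-pixel offset between the two gradient images) is an assumption. All names are hypothetical.

```python
def neighbourhood(img, y, x):
    """Values of a pixel and its eight neighbours, clipped at the border."""
    h, w = len(img), len(img[0])
    return [img[ny][nx]
            for ny in range(max(0, y - 1), min(h, y + 2))
            for nx in range(max(0, x - 1), min(w, x + 2))]


def gradient_infimum(grad_left, grad_right, aggregate=max):
    """Minimum of the two gradient images after 3x3 aggregation.

    aggregate : function applied to each 3x3 neighbourhood (assumed: max)
    """
    h, w = len(grad_left), len(grad_left[0])
    return [[min(aggregate(neighbourhood(grad_left, y, x)),
                 aggregate(neighbourhood(grad_right, y, x)))
             for x in range(w)]
            for y in range(h)]


# A boundary that responds at column 1 in one gradient image and at
# column 2 in the other, for instance because of edge-localization error.
grad_left = [[0, 9, 0, 0, 0]]
grad_right = [[0, 0, 8, 0, 0]]

gi_tolerant = gradient_infimum(grad_left, grad_right)
gi_strict = gradient_infimum(grad_left, grad_right, aggregate=min)
```

With the tolerant aggregation the shared boundary is retained even though the two responses do not coincide exactly, whereas the strict variant keeps nothing in this toy case.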
3.4.3. Iterative Procedure for Background Removal
The background removal technique attempts, in every step, to remove the segments adjacent to the outer boundary of the selected region if they are identified as background. In the first iteration, segments adjacent to the outer boundary of the ROI are selected for consideration. If the common boundary portion of the selected segment and the outer boundary belongs to a moving edge, the segment is marked as foreground. Otherwise, the gradient values at the positions of the boundary pixels of the selected segment are checked from . If high gradient values (greater than or equal to threshold, ) exist in more than pixels, the segment is marked as a moving object segment. Here, NB is the number of boundary pixels in a segment and μ is an adjusting parameter. In the ideal situation, high gradient values are expected to exist in the regions of , where the boundary of the moving object region exists. Due to noise, illumination variation, and low contrast between foreground and background regions, high gradient values do not exist in many boundary pixel positions of the moving object region. Moreover, insignificant interframe displacement of the moving object in consecutive frames also reduces the high gradient information on the boundary region. Hence, to allow some flexibility during segment boundary matching, we set empirically.
If the boundary pixels of the selected segment do not satisfy the above condition, it is considered a background segment. At the end of the first iteration, all the background segments neighboring the ROI are removed and the outer boundary is updated as well. In the following iterations, segments adjacent to the updated outer boundary are selected and classified as foreground or background similarly. However, the segments classified as foreground in the previous step are not taken into consideration. This iterative process continues until no further outer segments are classified as background. At this stage, the remaining segments represent regions of the moving object. For computing the value of , we utilize the same procedure as for . The convergence of the algorithm depends on the number of background segments present inside the bounding box after applying watershed segmentation. The background segment removal procedure is as follows.
(i) All the segments are initialized as unmarked and outer boundary pixels of ROI are enlisted as outer boundary list, .
(ii) Segments neighboring to outer boundary and not marked yet are enlisted in current segment list, .
(iii) A segment is selected from for marking. If the common boundary portion of the selected segment and belongs to a moving edge, the segment is marked as foreground. Else all its boundary positions are checked in . If more than pixels in contain high gradient values, the segment is marked as foreground. Otherwise, the segment is marked as background.
(iv) All the segments marked as background are removed. is updated by removing the portion common to the boundary of the removed background segment and including the rest of the boundary of removed segment. is updated accordingly.
(v) Stop the process and constitute moving object from remaining segments if is not updated any more in step (iv). Repeat step (ii) to step (iv) for all the segments in .
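The steps above can be sketched on a small label image. This is a simplified illustration rather than the authors' implementation, and it rests on several assumptions: the watershed output is replaced by a hand-written label grid, 4-connectivity is used, the common boundary of a segment is taken to be its pixels that touch the outside of the remaining region, a segment counts as lying on a moving edge only if all of those pixels are moving-edge pixels, and the gradient rule is read as "more than μ·NB boundary pixels at or above the threshold". The names and the values of `t_grad` and `mu` are hypothetical.

```python
def remove_background(labels, moving_edge, gi, t_grad, mu):
    """Peel background segments off a labelled ROI, outermost first.

    labels      : 2D list of segment ids (stand-in for watershed output)
    moving_edge : set of (row, col) pixels on detected moving edges
    gi          : 2D list of gradient infimum values
    Returns the set of segment ids kept as moving object.
    """
    h, w = len(labels), len(labels[0])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    removed, foreground = set(), set()

    def outside(y, x):
        # Outside the ROI, or inside a segment that was already removed.
        return not (0 <= y < h and 0 <= x < w) or labels[y][x] in removed

    while True:
        common, boundary = {}, {}
        for y in range(h):
            for x in range(w):
                seg = labels[y][x]
                if seg in removed:
                    continue
                nbrs = [(y + dy, x + dx) for dy, dx in steps]
                if any(outside(ny, nx) for ny, nx in nbrs):
                    common.setdefault(seg, []).append((y, x))
                if any(outside(ny, nx) or labels[ny][nx] != seg
                       for ny, nx in nbrs):
                    boundary.setdefault(seg, []).append((y, x))

        background = set()
        for seg, shared in common.items():
            if seg in foreground:
                continue  # segments already marked are not reconsidered
            if all(p in moving_edge for p in shared):
                foreground.add(seg)
                continue
            pixels = boundary[seg]
            strong = sum(1 for y, x in pixels if gi[y][x] >= t_grad)
            if strong > mu * len(pixels):
                foreground.add(seg)
            else:
                background.add(seg)

        if not background:  # outer boundary unchanged: stop
            return {s for row in labels for s in row} - removed
        removed |= background


# Segment 1 is a background ring, segment 2 the object border, 3 its interior.
labels = [
    [1, 1, 1, 1, 1],
    [1, 2, 2, 2, 1],
    [1, 2, 3, 2, 1],
    [1, 2, 2, 2, 1],
    [1, 1, 1, 1, 1],
]
ring = {(y, x) for y in range(5) for x in range(5) if labels[y][x] == 2}
gi = [[10 if (y, x) in ring else 0 for x in range(5)] for y in range(5)]
flat = [[0] * 5 for _ in range(5)]

kept_by_edges = remove_background(labels, ring, flat, t_grad=5, mu=0.5)
kept_by_gradient = remove_background(labels, set(), gi, t_grad=5, mu=0.5)
kept_without_cues = remove_background(labels, set(), flat, t_grad=5, mu=0.5)
```

In the first two calls the outer ring is removed and the peeling stops at the object border, once because of the moving edges and once because of the gradient rule. Without either cue every segment is eventually removed, which shows why both sources of information are used together.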
Figure 7 illustrates the steps of the proposed segmentation method. Figure 7(a) shows the segmented ROI of current image, where Figure 7(b) shows only watershed lines. Figure 7(c) shows , where high gradient values exist on the boundary region of moving object. Initially, contains the pixels of bounding box of ROI and is shown in Figure 7(d). The shaded segments neighboring to in Figure 7(e) are selected for in the first iteration. The segments belonging to white region are marked as background and thus removed at the end of first iteration, depicted in Figure 7(f). Updated is shown in Figure 7(g) and thereby, its neighboring segments are enlisted in for consideration in second iteration. Figures 7(h) and 7(i) show the regions enlisted in and the segmentation result in second iteration. Figures 7(j) and 7(k) show the updated outer boundary and selected segments neighboring to of third iteration, respectively. Figure 7(l) shows the segmentation result obtained in the final iteration. From the result, it can be noticed that watershed algorithm is effective to extract the complete and more accurate boundary of moving object.
4. Results and Analysis
Experiments were carried out with several video sequences captured in indoor as well as outdoor environments to verify the effectiveness of the proposed method. We used a system with an Intel Pentium IV processor and 512 MB of RAM. Visual C++ and MTES (Multimedia Technology for Educational System) were used as our working environment. This system processes 7 frames/second at the frame size used in the experiments. Various image sequences are used to investigate the performance of the proposed method in different situations. The results are evaluated in two ways: subjective evaluation and quantitative evaluation. In the subjective evaluation, the detected moving edges and the segmented moving objects are visualized and compared with the results obtained by other related edge-based methods. In the quantitative evaluation, we analyze the accuracy of the detected moving edge points and compare these data with other methods to investigate the robustness of the proposed method.
4.1. Subjective Evaluation
Figure 8 shows the detection results obtained by the proposed method on the "Hall Monitor" sequence and a comparison with two standard reference-independent moving edge detection methods. Figures 8(a)–8(c) show three consecutive frames. Figure 8(d) shows the detection result of the method proposed by Dailey et al. (DC), which uses frame differences of three consecutive frames followed by an AND operation for moving edge detection. However, because the edges are extracted from the difference images, they change shape (deteriorate) to some extent in the difference image edge maps. Moreover, illumination variation and noise also cause edge localization error. Hence, exact matching such as the AND operation fails to produce good results in real scenarios.
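The weakness of exact matching can be illustrated with a small Python sketch. It is not the implementation of DC; it only assumes that two difference image edge maps are available as sets of pixel coordinates and compares a pixel-wise AND with a hypothetical tolerant variant (`tolerant_and`, with an assumed tolerance `tol` in pixels) when part of an edge is localized one pixel away in the second map.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def exact_and(edges_a: Set[Pixel], edges_b: Set[Pixel]) -> Set[Pixel]:
    """Pixel-wise AND: keep only positions present in both edge maps."""
    return edges_a & edges_b


def tolerant_and(edges_a: Set[Pixel], edges_b: Set[Pixel], tol: int = 1) -> Set[Pixel]:
    """Keep pixels of edges_a that have a pixel of edges_b within `tol` pixels
    (chessboard distance). `tol` is an assumed parameter for illustration."""
    offsets = [(dr, dc) for dr in range(-tol, tol + 1) for dc in range(-tol, tol + 1)]
    return {(r, c) for (r, c) in edges_a
            if any((r + dr, c + dc) in edges_b for dr, dc in offsets)}


# A vertical edge at column 5 in the first difference edge map ...
map_a = {(r, 5) for r in range(5)}
# ... whose lower part is localized one pixel to the right in the second map.
map_b = {(0, 5), (1, 5), (2, 6), (3, 6), (4, 6)}

strict = exact_and(map_a, map_b)      # loses the shifted part of the edge
relaxed = tolerant_and(map_a, map_b)  # keeps the whole edge
```

With a one-pixel localization error on three of the five edge pixels, the exact AND retains only the two unshifted pixels, whereas the tolerant comparison retains the full edge.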
The method of Sappa and Dornaika (SD) tries to solve this problem by considering a combination of m frame pairs equidistant from the current frame. Figure 8(e) depicts the result obtained by this method with m = 2. This iterative solution improves the result to some extent but increases the processing time significantly because it uses more preceding and future frames. The method still produces scattered moving edges because of its pixel-based processing. The proposed method does not suffer from this problem, as edges are extracted as segments from the current image and flexible matching is used to obtain the moving edges. Figure 8(f) shows the moving edge detection result of the proposed method, and Figure 8(g) shows the segmentation result. Figures 8(h) and 8(i) show the moving edge detection and segmentation results of the proposed method for another frame.
Figure 9 illustrates the performance of the proposed method under a change of illumination. Figure 9(a) shows the background frame. Figures 9(b) and 9(c) show two frames captured under different (brighter) illumination conditions. Figure 9(d) shows the result obtained by the method proposed by Kim and Hwang (KW), where many background edge pixels are also detected as moving edges. This is because KW does not update the background efficiently enough to adapt to the illumination change. The proposed method, in contrast, works with the most recent frames and can therefore adapt to the illumination change without any background update. Figures 9(e) and 9(f) show the moving edge detection and segmentation results, respectively, of the proposed method.
Figure 10 shows that the proposed method is robust against slight movement of the camera. Figures 10(a)–10(c) show three consecutive frames, which are displaced by 2, 3, and 4 pixels, respectively, with respect to the background along the upper left direction. Thus, each pair of consecutive frames differs by a movement of 1 pixel. Figure 10(d) is an alternative next frame whose movement with respect to the current frame is similar to that of the previous frame (Figure 10(a)). These displacements were introduced manually to illustrate the robustness of the proposed method. Figure 10(e) shows the result obtained by DC. Many background edge pixels are detected as foreground: because of the camera movement, the background edge pixels of one frame cannot cancel those of the other frame during difference image edge map generation. The result is even worse when the previous (Figure 10(a)) and next (Figure 10(d)) frames have similar movement with respect to the current frame. In this case, the AND operation brings most of the background pixels into the detection result, as shown in Figure 10(f). The result obtained by KW is shown in Figure 10(g). Because of the camera movement, the edge pixels of the background frame cannot cancel the background edge pixels of the current frame. The difference image edge map therefore contains some background edge pixels, which cause false detections in the final result. Our method overcomes this problem because we align the successive frames before obtaining the difference image and apply flexible matching to each edge segment of the current image, which tolerates minor camera movement in the video sequence. The result obtained by the proposed method is shown in Figure 10(h), and Figure 10(i) shows the segmentation result of the moving object.
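The idea of matching a whole edge segment against a distance transformation image can be sketched as follows. This is a simplified illustration, not the matching procedure of the proposed method: it uses a plain city-block distance transform of a single edge map rather than the combined distance transformation image, and the acceptance threshold `max_mean_distance` and all function names are assumptions.

```python
from collections import deque
from typing import List, Sequence, Set, Tuple

Pixel = Tuple[int, int]


def distance_transform(edge_pixels: Set[Pixel], rows: int, cols: int) -> List[List[int]]:
    """City-block distance from every pixel to the nearest edge pixel (multi-source BFS)."""
    far = rows + cols  # larger than any possible city-block distance in the image
    dist = [[far] * cols for _ in range(rows)]
    queue = deque()
    for r, c in edge_pixels:
        dist[r][c] = 0
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] > dist[r][c] + 1:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def matching_score(segment: Sequence[Pixel], dist: List[List[int]]) -> float:
    """Average distance value sampled along the pixels of one edge segment."""
    return sum(dist[r][c] for r, c in segment) / len(segment)


def moving_segments(segments: Sequence[Sequence[Pixel]], dist: List[List[int]],
                    max_mean_distance: float = 1.5):  # assumed tolerance
    """Keep the segments whose average distance to the reference edges is small."""
    return [s for s in segments if matching_score(s, dist) <= max_mean_distance]


# Reference edges (e.g., from a difference image): a vertical line at column 4.
dist = distance_transform({(r, 4) for r in range(8)}, rows=8, cols=8)

near = [(r, 3) for r in range(1, 6)]  # current-image segment, off by one pixel
far_away = [(r, 0) for r in range(1, 6)]  # segment far from any reference edge
kept = moving_segments([near, far_away], dist)
```

Because the decision is made per segment on an averaged distance, a segment displaced by one pixel is still accepted, while a segment with no nearby reference edge is rejected.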
Figure 11 presents experimental results for moving edge detection and segmentation of moving objects from the "Hall Monitor" and "Highway" sequences to illustrate the behavior of the proposed method in dynamic environments. Figures 11(a)–11(f) illustrate the results obtained from the "Hall Monitor" sequence, where the background is heavily cluttered and a high level of noise is present. Figures 11(a) and 11(d) show two frames of the "Hall Monitor" sequence. Figures 11(b) and 11(e) show the corresponding moving edge detection results, and the segmentation results are shown in Figures 11(c) and 11(f), respectively. Figures 11(g)–11(l) show the detection results for the "Highway" sequence. Figures 11(g) and 11(j) show two frames of the sequence, Figures 11(h) and 11(k) show the corresponding moving edge detection results, and Figures 11(i) and 11(l) show the final segmentation results, respectively. Note that the top-right and top-left vehicles in Figure 11(j) are not detected. The "Highway" sequence is challenging because of background movement and a cluttered scene. Moreover, the interframe displacement of some cars near the top of the images is very small, which leaves too little gradient information in the difference image edge maps to detect their moving edges. As a result, some of the moving edges are missed, and the corresponding moving objects are missed as well.
Figure 12 illustrates the segmentation result of the proposed method to show its robustness even when some moving edge pixels are absent from the detection result. The segmentation result of the proposed method is compared with the result of the VOP extraction method proposed by KW. In KW, moving object regions are segmented by horizontal and vertical scanning followed by a morphological operation. Figure 12(a) shows the edge detection result of KW. Since the moving edges form an almost complete boundary of the moving object, the VOP extraction method, with the help of morphological closing (with a structuring element), extracts the moving object region effectively, as shown in Figure 12(b). In this situation, the proposed method also works well, as shown in Figures 12(c) and 12(d), respectively.
However, when the contrast between foreground and background is low, or in the presence of illumination variation or noise, the moving edge detection result may deteriorate, which may in turn degrade the segmentation result of KW. Figures 12(e) and 12(i) show the moving edge detection results of KW for two different experiments, where the moving edges do not form a complete boundary. As a result, the VOP extraction method based on horizontal and vertical scanning fails to extract the moving object region properly, as shown in Figures 12(f) and 12(j). The segmentation result of this method depends largely on the extracted moving edges and on the size of the morphological operator. Figures 12(h) and 12(l) illustrate the moving object segmentation results of the proposed method, and the respective moving edges are shown in Figures 12(g) and 12(k). Some of the edges were missed in these moving edge detection results as well. Since we utilize the watershed segments of the current image ROI with the gradient infimum value instead of relying only on moving edges, the proposed method segments the moving object region properly even in such a challenging environment.
4.2. Quantitative Evaluation
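For the quantitative evaluation, the accuracy of the detected moving edges is measured against ground truth obtained by extracting the moving edges manually. Two criteria are used, precision and recall, defined as

$$\text{Precision} = \frac{\text{number of correctly detected moving edge pixels}}{\text{total number of detected moving edge pixels}}, \qquad \text{Recall} = \frac{\text{number of correctly detected moving edge pixels}}{\text{total number of moving edge pixels in the ground truth}}.$$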
Precision evaluates the accuracy of the detected moving edges, while recall measures how much of the actual moving edges a particular method extracts. Precision therefore reflects the number of nonmoving edge pixels detected as moving edges, whereas recall reflects the quantity of moving edges missed during the detection process.
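For concreteness, the two measures can be computed from a detected edge set and a manually marked ground-truth set as in the short Python sketch below. It assumes the standard definitions of precision and recall and counts a detected pixel as correct only if it coincides exactly with a ground-truth pixel; whether any positional tolerance was allowed in the reported experiments is not stated, and the function name `precision_recall` is hypothetical.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def precision_recall(detected: Set[Pixel], ground_truth: Set[Pixel]) -> Tuple[float, float]:
    """Return (precision, recall) for a set of detected moving edge pixels."""
    correct = len(detected & ground_truth)  # correctly detected moving edge pixels
    precision = correct / len(detected) if detected else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Six true moving edge pixels; four pixels detected, one of them a false alarm.
truth = {(0, c) for c in range(6)}
found = {(0, 0), (0, 1), (0, 2), (3, 3)}
precision, recall = precision_recall(found, truth)
```

Here three of the four detected pixels are correct and three of the six true pixels are found, so the false alarm lowers precision while the three missed pixels lower recall.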
Figure 13 shows the precision of the moving edges detected by the proposed method. For comparison, the precision values of KW and DC are also included. Results of five different experiments, each with 12 frames, are reported; the first two experiments use image sequences obtained in an indoor environment. As the indoor environment is more challenging than the outdoor environment, these results are somewhat worse than those of the outdoor sequences. The precision of the proposed method is better than that of the other approaches. Flexible edge matching based on distance transformation and the segment-based representation of the edges of the current image contribute to this improvement. Moreover, we apply a further refinement process to the detected coarse moving edges to remove noisy background edge segments. The results of experiment 5 are worse than those of the two other outdoor experiments because of frequent variations of the scene content in the busy road scene and because of noise.
Figure 14 shows the recall values of the different methods. The benefit of the segment-based representation is clearly visible here, as recall is higher for the proposed method than for the others. In KW and DC, each edge pixel participates in matching individually, so many edge pixels are missed. This results in scattered moving edges, which eventually lead to lower recall values for KW and DC. The result of DC is further degraded by the loss of moving edges through matching with the AND operator. In the proposed method, false matching causes some moving edge pixels to be classified as background and removed during detection, which degrades the recall value to some extent. Some moving edges are also removed during postprocessing when noisy edge segments are filtered out. However, the overall recall is satisfactory considering the dynamism of the environment together with the obtained precision.
5. Conclusion
The proposed method presents a novel solution for moving object segmentation that is computationally efficient and suitable for real-time automated video surveillance systems. The method overcomes some major limitations of existing background-independent methods by using a segment-based representation of edges and by combining the gradient information of moving edges in an accumulated distance image. It also shows robustness against sensor noise, quantization error, and edge localization error. Since the method uses the most recent frames, it adapts automatically to changes in the environment and does not require any reinitialization step. The proposed edge matching runs in linear time and is effective in terms of both accuracy and speed. The segment-based representation of edges can be easily extended to moving object tracking, recognition, and classification. The boundary of the segmented moving object is more precise because we apply the watershed algorithm. Experimental results and comparative studies support the effectiveness of the proposed method for moving object segmentation. The effectiveness of the method can be further improved by determining suitable application-specific values for the threshold parameters. Our future work focuses on tracking moving objects using edge segments. Tracking information may also help to adjust some of the threshold values dynamically to achieve better performance.
References
Radke RJ, Andra S, Al-Kofahi O, Roysam B: Image change detection algorithms: a systematic survey. IEEE Transactions on Image Processing 2005, 14(3):294-307.
Sappa AD, Dornaika F: An edge-based approach to motion detection. Proceedings of the 6th International Conference on Computational Science (ICCS '06), May 2006, Reading, Mass, USA, Lecture Notes in Computer Science 3991: 563-570.
Gutchess D, Trajković M, Cohen-Solal E, Lyons D, Jain AK: A background model initialization algorithm for video surveillance. Proceedings of the 8th IEEE International Conference on Computer Vision (ICCV '01), July 2001, Vancouver, Canada 1: 733-740.
Bergen JR, Burt PJ, Hingorani R, Peleg S: A three-frame algorithm for estimating two-component image motion. IEEE Transactions on Pattern Analysis and Machine Intelligence 1992, 14(9):886-896. 10.1109/34.161348
Kameda Y, Minoh M: A human motion estimation method using 3-successive video frames. Proceedings of the 2nd International Conference on Virtual Systems and Multimedia (VSMM '96), September 1996, Gifu, Japan 135-140.
Yokoyama M, Poggio T: A contour-based moving object detection and tracking. Proceedings of the 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS '05), October 2005, Beijing, China 271-276.
Dailey DJ, Cathey FW, Pumrin S: An algorithm to estimate mean traffic speed using uncalibrated cameras. IEEE Transactions on Intelligent Transportation Systems 2000, 1(2):98-107. 10.1109/6979.880967
Vieren C, Cabestaing F, Postaire J-G: Catching moving objects with snakes for motion tracking. Pattern Recognition Letters 1995, 16(7):679-685. 10.1016/0167-8655(95)00019-D
Kim JB, Kim HJ: Efficient region-based motion segmentation for a video monitoring system. Pattern Recognition Letters 2003, 24(1–3):113-128.
Cai Q, Aggarwal JK: Tracking human motion in structured environments using a distributed-camera system. IEEE Transactions on Pattern Analysis and Machine Intelligence 1999, 21(11):1241-1247. 10.1109/34.809119
Murray D, Basu A: Motion tracking with an active camera. IEEE Transactions on Pattern Analysis and Machine Intelligence 1994, 16(5):449-459. 10.1109/34.291452
Elgammal A, Duraiswami R, Harwood D, Davis LS: Background and foreground modeling using nonparametric kernel density estimation for visual surveillance. Proceedings of the IEEE 2002, 90(7):1151-1163. 10.1109/JPROC.2002.801448
Lee D-S: Effective Gaussian mixture learning for video background subtraction. IEEE Transactions on Pattern Analysis and Machine Intelligence 2005, 27(5):827-832.
Duncan JH, Chou T-C: On the detection of motion and the computation of optical flow. IEEE Transactions on Pattern Analysis and Machine Intelligence 1992, 14(3):346-352. 10.1109/34.120329
Smith SM, Brady JM: ASSET-2: real-time motion segmentation and shape tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 1995, 17(8):814-820. 10.1109/34.400573
Kim C, Hwang J-N: Fast and automatic video object segmentation and tracking for content-based applications. IEEE Transactions on Circuits and Systems for Video Technology 2002, 12(2):122-129. 10.1109/76.988659
Makarov A, Vesin J-M, Kunt M: Intrusion detection using extraction of moving edges. Proceedings of the 12th IAPR International Conference on Pattern Recognition (ICPR '94), October 1994, Jerusalem, Israel 1: 804-807.
Hossain MJ, Dewan MAA, Chae O: Edge segment-based automatic video surveillance. EURASIP Journal on Advances in Signal Processing 2008, 2008:-14.
Hossain MJ: An edge segment based moving object detection for automated video surveillance, Ph.D. dissertation. Computer Engineering Department, Kyung Hee University, Yongin, South Korea; February, 2008.
Dewan MAA, Hossain MJ, Chae O: Moving object detection and classification using neural network. Proceedings of the 2nd KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications (KES-AMSTA '08), March 2008, Incheon, Korea, Lecture Notes in Computer Science 4953: 152-161.
Hossain MJ, Dewan MAA, Chae O: Moving object detection for real time video surveillance: an edge segment based approach. IEICE Transactions on Communications 2007, E90-B(12):3654-3664. 10.1093/ietcom/e90-b.12.3654
Borgefors G: Hierarchical chamfer matching: a parametric edge matching algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 1988, 10(6):849-865. 10.1109/34.9107
Vincent L, Soille P: Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence 1991, 13(6):583-598. 10.1109/34.87344
Yates RD, Goodman DJ: Probability and Stochastic Processes. 2nd edition. John Wiley & Sons, New York, NY, USA; 2005.
Lee J, Cho Y, Heo H, Chae O: MTES: visual programming environment for teaching and research in image processing. Proceedings of the 5th International Conference on Computational Science (ICCS '05), May 2005, Atlanta, Ga, USA, Lecture Notes in Computer Science 3514: 1035-1042.
Cite this article
Dewan, M.A., Hossain, M.J. & Chae, O. An Adaptive Motion Segmentation for Automated Video Surveillance. EURASIP J. Adv. Signal Process. 2008, 187413 (2008). https://doi.org/10.1155/2008/187413
- Segmentation Result
- Consecutive Frame
- Edge Pixel
- Edge Segment
- Watershed Algorithm