Motion detection using binocular image flow in dynamic scenes
EURASIP Journal on Advances in Signal Processing volume 2016, Article number: 49 (2016)
Abstract
Motion detection is a hard task for intelligent vehicles since target motion is mixed with the ego-motion caused by the moving cameras. This paper proposes a stereo-motion fusion method for detecting moving objects from a moving platform. A 3-dimensional motion model integrating stereo and optical flow is established to estimate the ego-motion flow. The mixed flow is calculated with an edge-indexed correspondence matching algorithm. The difference between the mixed flow and the ego-motion flow yields the residual target motion flow, from which the intact target is segmented. To estimate the ego-motion flow, a visual odometer has been implemented. We first extract feature points on the ground plane that are identified as static points using a height constraint and the Harris algorithm. Then, the 6-DOF motion parameters of the moving camera are calculated by fitting the feature points with a linear least-squares algorithm. The approach presented here is tested on substantial traffic videos, and the results prove the efficiency of the method.
Introduction
Detection of moving obstacles like pedestrians and vehicles is of critical importance for autonomous vehicles. Vision-based sensing systems have been used for object detection in many applications including autonomous vehicles, robotics, and surveillance. Compared with static systems such as traffic and crowd surveillance, motion detection from a moving platform (vehicle) is more challenging since target motion is mixed with the camera's ego-motion. This paper addresses this issue and presents a binocular stereovision-based in-vehicle motion detection approach which integrates stereo with optical flow. The approach makes full use of the two pairs of image sequences captured from a stereovision rig, i.e., disparity from the left and right pair images and motion fields from consecutive images.
Vision-based motion detection methods can be categorized into three major classes, i.e., temporal difference, background modeling and subtraction, and optical flow. Temporal difference methods [1] readily adapt to sudden changes in the environment, but the resulting shapes of moving objects are often incomplete. Background modeling and subtraction is mainly used in video surveillance, where the background is relatively fixed and static. Its basic idea is to subtract or differentiate the current image from a reference background model [2]. However, the generated background model may not be applicable in some scenes, such as gradual or sudden illumination changes and dynamic backgrounds (waving trees). To address these issues, a hierarchical background modeling and subtraction method [3] and a self-adaptive background matching method [4] have been proposed. Adaptive background models have also been used in autonomous vehicles in an effort to adapt surveillance methods to the dynamic on-road environment. In [5], an adaptive background model was constructed, with vehicles detected based on motion that differentiated them from the background. Dynamic modeling of the scene background in the area of the image where vehicles typically overtake was implemented in [6].
Optical flow, a fundamental machine vision tool, has the advantage of directly reflecting point motion, representing the change in position of a moving point. It has been used for motion detection and tracking in defense [7] and for abnormal crowd behavior detection in video surveillance [8]. In autonomous vehicles, monocular optical flow has been used to detect head-on vehicles [9], overtaking vehicles in the blind spot [10], and crossing obstacles [11]. In [12], interest points that persisted over long periods of time were detected and tracked using a hidden Markov model as vehicles traveling parallel to the ego vehicle. In [13], optical flow was used to form a spatiotemporal descriptor, which was able to classify the scene as either intersection or non-intersection. Optical flow has also been widely used in stereovision-based motion detection, i.e., the stereo-motion fusion method, which benefits from motion cues as well as depth information. There are many different fusion schemes. In [14], Pantilie et al. fuse motion information derived from optical flow into a depth-adaptive occupancy grid (bird's-eye-view map) generated from stereovision 3D reconstruction. As an improvement of the stereovision-based approach, the method helps to distinguish between static and moving obstacles and to reason about motion speed and direction. Franke and Heinrich [15] propose a depth/flow quotient constraint: independently moving regions of the image do not fulfill the constraint and are thus detected. Since the fusion algorithm compares the flow/depth quotient against a threshold function at distinct points only, it is computationally efficient. However, the approach reduces the possibility of carrying out geometrical reasoning and lacks a precise measurement of the detected movements. In addition, the approach is limited with respect to robustness since only two consecutive frames are considered.
To get more reliable results, a Kalman filter can be employed to integrate the observations over time. In [16], Rabe et al. employ a Kalman filter to track image points and to fuse the spatial and temporal information so that static and moving pixels can be distinguished before any segmentation is performed. The result is an improved accuracy of the 3D position and an estimation of the 3D motion of the detected moving objects. In [17], Kitt et al. use a sparse set of static image features (e.g., corners) with measured optical flow and disparity and apply the Longuet-Higgins equations with an implicit extended Kalman filter to recover the ego-motion. Feature points whose optical flow and disparity flow are not consistent with the estimated ego-motion indicate the existence of independently moving objects. In [18], Bota and Nedevschi focus on fusing stereo and optical flow for multi-class object tracking by designing Kalman filters fitted with static and dynamic cuboidal object models. In [19], moving interest points are first detected and projected onto the 3D-reconstructed ground plane using optical flow and stereo disparity. The scene flow is computed via finite differences for a track of up to five 3D positions, and points with a similar scene flow are grouped together as rigid objects in the scene. A graph-like structure connecting all detected interest points is generated, and edges are removed where the scene flow differences exceed a certain threshold. The remaining connected components describe moving objects.
A precise recovery of the ego-motion is essential in order to distinguish between static and moving objects in dynamic scenes. One method of ego-motion estimation is to use an in-vehicle inertial navigation system (INS) [15]. However, ego-motion from the in-car sensor is not sufficient for a variety of reasons like navigation loss, wheel slip, INS saturation, and calibration errors. Thus, it is ideal to estimate the camera ego-motion directly from the imagery. Ego-motion estimation using monocular optical flow with integrated detection of vehicles was implemented in [20]. Several groups have reported stereo-based ego-motion estimation based on tracking point features. In [18], the concept of 6D vision, i.e., the tracking of interest points in 3D using Kalman filtering, along with ego-motion compensation, was used to identify moving objects in the scene. In [21], the vehicle's ego-motion was estimated from computationally expensive dense stereo and dense optical flow with a method of iterative learning from all points in the image.
Stereo-motion fusion has been studied in a theoretical manner by Waxman and Duncan [22]. The important result was the relationship between the camera's 3D motion and the corresponding image velocities under stereo constraints. Our work builds on the basic principles presented in [22] and extends them to dynamic scene analysis. In this work, a mathematical model integrating optical flow, depth, and camera ego-motion parameters is first derived from Waxman and Duncan's theoretical analysis. The camera's ego-motion is then estimated from the model by using ground feature points, and accordingly the ego-motion flow of the image is calculated from the model. A moving target is detected from the difference between the mixed flow and the ego-motion flow.
The main contributions of this work can be summarized as follows: (1) The relationship between optical flow, stereo depth, and camera ego-motion parameters has been established based on Waxman and Duncan's theoretical model. Accordingly, a novel motion detection approach fusing stereo with an optical flow sensor has been proposed for in-vehicle environment sensing systems. A visual odometer able to estimate the camera's ego-motion has also been proposed. Motion detection using stereo-motion fusion normally identifies image points [16, 19] or features [17] as static or moving and then segments moving objects accordingly. Our method works on the image level, i.e., on the difference between the mixed flow image and the ego-motion flow image. (2) Existing motion detection approaches often make assumptions on object/vehicle motion or scene structure. Our approach can detect moving objects without any constraints on object/vehicle motion or scene structure since the proposed visual odometer can estimate all six motion parameters. (3) When fusing stereo with optical flow, the computational load, accuracy, and consistency between the stereo and optical flow calculations are practical issues. Our method uses the edge-indexed method for all calculations and therefore greatly reduces the computational load without impact on detection performance, improves the calculation accuracy, especially of the mixed flow, and provides pixel-wise consistency for all calculations so that the stereo depth, the mixed flow, and the ego-motion flow can be compared pixel by pixel.
Approaches
Overview of the approach
The difficulty of motion detection from a moving camera/vehicle is that the background is also moving and its motion is mixed with the target motion. Therefore, the key to motion detection in dynamic scenes is to distinguish the background motion from the target motion. The underlying idea of our approach is to subtract the motion of the camera (ego-motion) from the calculated (mixed) optical flow; that is, a moving target can be detected from the difference between the mixed optical flow and the ego-motion optical flow. Figure 1 gives an overview of the approach.
The mixed flow of the scene is caused by both camera motion and target motion and is obtained from correspondence matching between consecutive images. The ego-motion flow is caused only by camera motion and is calculated from a mathematical model derived from Waxman and Duncan's theoretical analysis [22], which indicates the relation between optical flow, depth map, and camera ego-motion parameters. To calculate the ego-motion flow, we first need to know the ego-motion parameters with six degrees of freedom. A visual odometer has been implemented for this purpose, in which the six motion parameters are estimated by solving a set of equations fitted with a fixed number of feature points using the linear least-squares method. The feature points are selected as corner points lying on the road surface and are determined by using a height constraint and the Harris corner detection algorithm [23]. Within the two stages, the depth of the image points is provided by the stereovision disparity map. The difference between the mixed flow and the ego-motion flow yields an independent flow which is purely caused by the target motion. The moving target is extracted according to the continuity of the similar independent flow.
To reduce the computational workload, and considering that the object contour is the most effective cue for object segmentation, all calculations are edge-indexed, i.e., we only conduct calculations on edge points for stereo matching, the mixed flow, and the ego-motion flow. This tactic greatly improves the real-time performance and has no impact on object detection performance.
The mixed flow
Many methods have been developed to calculate dense optical flow from image sequences [24]. Basically, these approaches can be split into two categories: spatiotemporal gradient-based and correspondence matching techniques. The spatiotemporal gradient-based techniques calculate optical flow based on assumptions such as global smoothness or directional smoothness. Our experience shows that these methods incur a huge computational cost and have difficulty obtaining accurate optical flow in complex traffic scenarios. The correspondence-matching-based techniques detect optical flow by searching for correspondence points between consecutive images and are therefore more suitable for dynamic traffic scene images. In this work, we implement an edge-indexed correspondence matching algorithm based on greyscale similarity to calculate the mixed optical flow. The details of the algorithm can be found in our previous work [25]. A summary is as follows:

Step 1.
Generate an edge image using the Canny operator and use the edge points as seed points to find the correspondence points in the next frame.

Step 2.
Define the search range as a square area centered at the seed point and define a rectangular matching window.

Step 3.
Use the normalized cross-correlation coefficient as the measure of greyscale similarity between two matching windows. The correspondence point is taken as the candidate with the maximum cross-correlation coefficient, provided it is greater than a predefined threshold.

Step 4.
Achieve sub-pixel estimation of the calculated optical flow along the vertical and horizontal directions by introducing a quadratic interpolation. This improves the optical flow resolution so that a higher optical flow accuracy can be achieved.
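The matching procedure of Steps 1-3 can be sketched as follows; the window size, search radius, and 0.8 correlation threshold below are illustrative assumptions, not the tuned values used in the experiments:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation coefficient of two equal-size windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def match_point(prev, curr, y, x, win=3, search=5, thresh=0.8):
    """For an edge (seed) point (y, x) of `prev`, search a square area of
    `curr` for the window with the maximum NCC; return the flow (dy, dx),
    or None if no candidate exceeds the threshold."""
    h, w = prev.shape
    tpl = prev[y - win:y + win + 1, x - win:x + win + 1].astype(float)
    best, flow = thresh, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if win <= yy < h - win and win <= xx < w - win:
                cand = curr[yy - win:yy + win + 1, xx - win:xx + win + 1]
                c = ncc(tpl, cand.astype(float))
                if c > best:
                    best, flow = c, (dy, dx)
    return flow
```

The sub-pixel refinement of Step 4 would then fit a quadratic through the correlation scores around the best integer displacement.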
3-dimensional motion and ego-motion flow
Ego-motion flow is the optical flow evoked by the moving camera/vehicle, representing the effect of the camera motion. The camera's 3-dimensional motion and planar imaging model is represented in Fig. 2. The origin of the world coordinate system (X, Y, Z) is located at the center of the image coordinates (x, y), and the Z-axis is directed along the optical axis of the camera. The translational velocity of the camera is \( \overline{V}=\left({V}_x,{V}_y,{V}_z\right) \), and the rotational velocity is \( \overline{W}=\left({W}_x,{W}_y,{W}_z\right) \).
Assuming a point P(X, Y, Z) in space moves to point P′(X′, Y′, Z′), the relation between the point motion and camera motion is as below [22]:
The cross product of the point P(X, Y, Z) and the camera's rotational velocity vector can be represented as
where \( \left(i,\ j,\ k\right) \) denotes the unit vectors in the directions of the X-, Y-, and Z-axes, and × refers to the cross product. Thus, Eq. (2) can be rewritten as
The 3-dimensional velocity \( \left(\frac{dX}{dt},\ \frac{dY}{dt},\ \frac{dZ}{dt}\right) \) of the point can be obtained as below:
For an ideal pinhole camera model, the image point p(x, y) of the world point P(X, Y, Z) projected onto the image plane can be expressed as
where f denotes the focal length of the stereo camera. The optical flow (u, v) of P(X, Y, Z) can be obtained by differentiating the image coordinates x and y with respect to time.
Combining Eqs. (4) to (6) yields the following:
where \( A=\left[\begin{array}{cccccc} -\frac{f}{Z} & 0 & \frac{x}{Z} & \frac{xy}{f} & -\frac{f^2+{x}^2}{f} & y \\ 0 & -\frac{f}{Z} & \frac{y}{Z} & \frac{f^2+{y}^2}{f} & -\frac{xy}{f} & -x \end{array}\right] \).
Equation (7) indicates the relationship between the ego-motion flow, the depth, and the six parameters of the camera motion. It is evident that the ego-motion flow can be calculated from Eq. (7) if the depth and the six motion parameters are known. The depth can be obtained from stereovision as reported in our previous work [26]. Two methods can be used to obtain the motion parameters: one is to measure them with an in-vehicle INS or gyroscope; the other is to use a visual odometer. However, subject to problems like navigation loss, wheel slip, INS saturation, and calibration errors between the IMU and the cameras, an in-vehicle INS may yield inaccurate motion estimates in some cases. Thus, it is ideal to estimate the camera motion directly from the imagery. Ultimately, it could be fused with other state sensors to produce a more accurate and reliable joint estimate of camera/vehicle motion.
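As an illustration, the ego-motion flow of Eq. (7) can be evaluated per pixel once the depth and the six motion parameters are known. The sketch below follows the standard sign convention for camera-induced flow, which is an assumption and may differ from the convention of the original derivation by signs:

```python
import numpy as np

def flow_matrix(x, y, Z, f):
    """2x6 coefficient matrix A at image point (x, y) with depth Z and
    focal length f, standard camera-induced-flow sign convention (an
    assumption)."""
    return np.array([
        [-f / Z, 0.0, x / Z, x * y / f, -(f * f + x * x) / f, y],
        [0.0, -f / Z, y / Z, (f * f + y * y) / f, -x * y / f, -x],
    ])

def egomotion_flow(x, y, Z, f, V, W):
    """Ego-motion flow (u, v) at (x, y): the product A (V, W)^T of Eq. (7)."""
    return flow_matrix(x, y, Z, f) @ np.concatenate([V, W])
```

For example, pure forward translation produces zero flow at the principal point (the focus of expansion), while lateral translation V_x produces a horizontal flow of magnitude f·V_x/Z.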
Visual odometry
Equation (7) shows that if the ego-motion flow and the depth of six or more points in the scene are known, we can set up a set of equations in six unknown variables, i.e., the six camera motion parameters, and estimate these variables by solving the equation set with least-squares fitting. The points used for the least-squares fitting must have accurately calculated optical flow and must not belong to moving objects.
In this work, corner points lying on the road surface are selected for this purpose, since the ground points are static and corner points have good stability and insensitivity to light intensity changes, and therefore possess relatively accurate optical flow.
Extraction of ground corner points using stereovision and Harris method
Ground points can be determined from the height information obtained from stereovision, as reported in our previous work [26]. The height Y_g of the ground points, namely their Y-axis coordinate, depends on the camera installation height H_c, the tilt angle θ towards the road plane, and the distance Z_g, as indicated in Eq. (8) and Fig. 3. Those points with a Y-axis coordinate less than Y_g are regarded as ground points.
A corner is defined as a point for which there are two dominant and different edge directions in a local neighborhood of the point. Harris corner points are detected by considering the differential of the corner score with respect to direction [23]. The corner score is referred to as the autocorrelation. Assuming that a pixel I(X, Y) is shifted in any direction by a small displacement (Δx, Δy), the autocorrelation function is defined as below:
where φ(x, y) is a Gaussian weighting function used here to reduce the impact of noise; W(x, y) denotes the window block centered at the point; I_x is the gradient in the x direction; and I_y is the gradient in the y direction. The Sobel convolution kernel ω_x and its transposed form ω_y are used to obtain I_x = I(X, Y) ⊗ ω_x and I_y = I(X, Y) ⊗ ω_y. M(x, y) is called the autocorrelation matrix and
The corner response function (CRF) can be calculated as
where det(M) = λ_1 × λ_2 and trace(M) = λ_1 + λ_2, with λ_1 and λ_2 denoting the eigenvalues of the matrix M; we set α = 0.04. A point with a CRF greater than a certain threshold is regarded as a corner point.
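A minimal sketch of the Harris response computation described above, omitting for brevity the Gaussian weighting φ(x, y) applied in practice and summing the autocorrelation entries over a plain 3 × 3 window instead:

```python
import numpy as np

def harris_response(img, alpha=0.04):
    """CRF = det(M) - alpha * trace(M)^2 at every interior pixel, with M
    built from Sobel gradients summed over a 3x3 window."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # Sobel x
    ky = kx.T                                                   # Sobel y

    def conv2(a, k):
        # Naive valid-interior 3x3 correlation (border left at zero)
        h, w = a.shape
        out = np.zeros((h, w))
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                out[i, j] = (a[i-1:i+2, j-1:j+2] * k).sum()
        return out

    Ix, Iy = conv2(img.astype(float), kx), conv2(img.astype(float), ky)
    box = np.ones((3, 3))
    Ixx = conv2(Ix * Ix, box)   # entries of the autocorrelation matrix M
    Iyy = conv2(Iy * Iy, box)
    Ixy = conv2(Ix * Iy, box)
    det = Ixx * Iyy - Ixy ** 2
    tr = Ixx + Iyy
    return det - alpha * tr ** 2
```

The response is strongly positive at corners, negative along edges, and near zero in flat regions, so thresholding the CRF as described above isolates corner points.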
Ego-motion parameter estimation using the linear least-squares algorithm
The objective function is defined as the Euclidean distance between the estimated optical flow \( \left(\widehat{u},\widehat{v}\right) \) and the true optical flow (u, v).
The true optical flow (u, v) is calculated with the method introduced in Section 2.2. The estimated optical flow \( \left(\widehat{u},\widehat{v}\right) = A{\left(\overline{V},\overline{W}\right)}^T \) is obtained from Eq. (7). The minimum of the objective function is found by setting the gradient to zero, and the optimal parameter values are
where \( A \) denotes the coefficient matrix composed of the focal length f of the stereo camera, the depth Z, and the image coordinates, as shown in Eq. (7).
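The least-squares estimation can be sketched by stacking the two rows of the coefficient matrix for each ground feature point and solving the resulting overdetermined linear system; the signs in the coefficient rows follow the standard camera-induced-flow convention, which is an assumption:

```python
import numpy as np

def estimate_egomotion(pts, flows, depths, f):
    """Estimate (Vx, Vy, Vz, Wx, Wy, Wz) from ground feature points.
    Each point (x, y) with measured flow (u, v) and depth Z contributes
    two linear equations; the stacked system is solved in the
    least-squares sense."""
    rows, rhs = [], []
    for (x, y), (u, v), Z in zip(pts, flows, depths):
        rows.append([-f / Z, 0.0, x / Z, x * y / f, -(f * f + x * x) / f, y])
        rows.append([0.0, -f / Z, y / Z, (f * f + y * y) / f, -x * y / f, -x])
        rhs += [u, v]
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return sol
```

With 15 selected corner points, the system has 30 equations in 6 unknowns, so the solution is well overdetermined and robust to moderate flow noise.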
Independent flow and target segmentation
The difference between the mixed flow and the ego-motion flow yields the independent flow, which is attributable purely to the moving targets. This operation ideally cancels out the inter-frame changes caused by the vehicle motion and involves a 2D vector difference:
where [u_r v_r] denotes the independent flow in the horizontal and vertical directions, [u_m v_m] the mixed flow, and [u_e v_e] the ego-motion flow. The magnitude of the independent flow is calculated from its two components as \( s=\sqrt{u_r^2+{v}_r^2} \). Target segmentation is based on this magnitude.
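The vector difference and its magnitude are straightforward to compute per point:

```python
import numpy as np

def independent_flow(mixed, ego):
    """Residual (target-induced) flow [u_r, v_r] = [u_m, v_m] - [u_e, v_e]
    and its magnitude s = sqrt(u_r^2 + v_r^2) for an array of flow vectors."""
    r = np.asarray(mixed, float) - np.asarray(ego, float)
    return r, np.hypot(r[..., 0], r[..., 1])
```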
In theory, the independent flow of the background should be zero. In practice, however, the background has some residual independent flow due to calculation errors. The key to distinguishing a moving object from the background is to determine a threshold on the independent flow. In this work, we adopt the OTSU algorithm to determine a self-adapting threshold. The algorithm can be described as follows:

1.
For a threshold t, s_min < t < s_max, define the variance ε(t) between the moving target's independent flow and the background's independent flow as
$$ \varepsilon (t)={P}_o{\left({s}_o-t\right)}^2+{P}_g{\left({s}_g-t\right)}^2 $$
(15)
where s_o denotes the mean of the independent flows of the moving points, \( {s}_o=\frac{{\displaystyle \sum }{s}_i\cdot {p}_i}{p_o}\ \left({s}_i>t,\ i=1,2,3\dots \right) \); s_g denotes the mean of the independent flows of the background points, \( {s}_g=\frac{{\displaystyle \sum }{s}_i\cdot {p}_i}{p_g}\ \left({s}_i<t,\ i=1,2,3\dots \right) \); p_o denotes the proportion of the points with s > t, p_g the proportion of the points with s < t, and p_i the proportion of the points with s = s_i.

2.
Search for the t from s_min to s_max that makes ε(t) maximum and use it as the threshold to segment the moving objects from the background. This ensures a maximum between-class distance.
We cancel out the pixels with an independent flow below the threshold determined above. For the pixels with an independent flow above the threshold, we use the region-growing method to cluster similar points together to form the final segmentation. In fact, in this work, the independent flow is also combined with the disparity (depth) for object clustering. This tactic is especially useful for separating objects that are close to each other or occluded.
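The threshold search can be sketched as below. The sketch uses the classic form of the Otsu between-class criterion, P_o·P_g·(s_o − s_g)², as an assumption; Eq. (15) writes the criterion in a slightly different form, but the sweep over candidate thresholds is the same:

```python
import numpy as np

def otsu_flow_threshold(s, steps=256):
    """Self-adapting threshold on independent-flow magnitudes s: sweep t
    from s_min to s_max and keep the t maximizing the between-class
    variance P_o * P_g * (s_o - s_g)**2, where P_o, s_o (P_g, s_g) are
    the proportion and mean of magnitudes above (below) t."""
    s = np.asarray(s, float)
    best_t, best_var = s.min(), -1.0
    for t in np.linspace(s.min(), s.max(), steps)[1:-1]:
        o, g = s[s > t], s[s <= t]
        if o.size == 0 or g.size == 0:
            continue
        var = (o.size / s.size) * (g.size / s.size) * (o.mean() - g.mean()) ** 2
        if var > best_var:
            best_t, best_var = var > best_var and t or t, var
    return best_t
```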
Experiments and results
A VIDERE stereo rig with strict calibration is used to capture images in this work. The two cameras have an image resolution of 640 × 480 pixels, a pixel size of 15 μm, and a baseline of 218.95 mm. The focal length is 16.63 mm. The detection range is from −8 to 8 m laterally and from 4 to 50 m in distance. Figure 4a, d shows the left images of two typical traffic scenarios. The first scenario involves a pedestrian, an oncoming coach, and some static obstacles like parked cars and trees, where the equipped vehicle moves in the longitudinal direction. In the second scenario, the vehicle is turning in a bend, which helps to evaluate our algorithm when the vehicle undergoes more complex movement.
Disparity of stereovision
Figure 4b, e shows the edge maps obtained from a Canny detector. The edge points in the left image are used as seed points to search for the correspondence points in the right image, using greyscale similarity as the measure. The resulting disparity maps are displayed in Fig. 4c, f. A color scheme is used to visualize the disparity. The depth information of the image points can be derived from the disparity map. It should be noted that some points, like the trees out of the detection range, are not presented in the disparity maps. It is worth noting that occluding contours can arise due to the different viewpoints of the two cameras and may cause problems for stereo correspondence matching, especially at short distances with a wider baseline. In our application, we use a relatively short stereo baseline of 218.95 mm and a detection range of 4 to 50 m, so the occlusion effect is not significant. In addition, stereo matching depends on the selection of the matching windows and the setting of the correlation coefficient threshold. The detailed edge-indexed stereo matching procedure can be found in our previous work [26]. All experiments show that the edge-indexed stereo matching can successfully generate an edge-indexed disparity map.
Mixed flow results
Figure 5 shows the mixed flow obtained using the edge-indexed correspondence matching algorithm described in Section 2.2. It can be noted that even static objects like parked cars, trees, and ground points show obvious motion, which is caused by the camera/vehicle's motion. The motion of the pedestrian and the oncoming coach is significantly different from that of their surroundings due to their own motion. The mixed flow shown in Fig. 5a, b reflects the actual movement of the points and will be used for the subtraction of motion flows at a later stage.
Visual odometer results
The edge points on the ground surface are successfully extracted, as shown in Fig. 6a (scenario 1) and 6b (scenario 2). The Harris corner points are detected and marked with "+" in the figures. For each case, the 15 Harris corner points with the highest CRF scores are selected to set up a set of equations for estimating the six ego-motion parameters using the least-squares fitting method. The results are presented in Table 1. It can be found that for both scenarios, V_z is significant while V_y, W_x, W_y, and W_z are tiny. This is reasonable since the vehicle was moving with a certain speed on a relatively flat road. For scenario 2, V_x is also significant because the vehicle was turning left in a bend. For scenario 1, V_x is equal to 0.17 m/frame, indicating that the vehicle was not moving strictly in the longitudinal direction and had a small lateral movement at the moment.
During the video acquisition, a Spatial NAV 982 Inertial Navigation System was fitted in the car to measure the ego-motion parameters. Although the INS may lose its measurements in some cases, the comparison between the effective data of the two systems shows that the difference in the results is within 4 %, indicating that our visual odometer is reasonably accurate.
Egomotion flow results
The ego-motion flow calculated from Eq. (7) using the estimated ego-motion parameters above is shown in Fig. 7. The ego-motion flow shown in Fig. 7a, b will be used for the subtraction of motion flows at a later stage.
Independent flow and motion extraction
The subtraction of Fig. 7a from Fig. 5a is shown in Fig. 8a, while the subtraction of Fig. 7b from Fig. 5b is shown in Fig. 8b. The subtraction yields the independent flow, which is purely caused by the target motion. It can be seen that most of the background has been canceled out and the moving objects are significantly highlighted using the method described in Section 2.5. Furthermore, the pedestrian can be framed according to the continuity of the similar independent flow, as shown in Fig. 8c, d.
Evaluation of the system
Experiments have also been conducted on the public image database KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) [27]. Figure 9 shows the process of motion detection for one of the scenarios containing multiple moving obstacles. A total of 5000 frames of various scenarios with hand-labeled moving objects, including pedestrians and cars, have been tested using our approach. Recall and precision are usually used to assess the accuracy of object detection.
Recall is defined as follows:
where tp is the total number of true-positive detections, fn is the total number of false negatives, and (tp + fn) indicates the total number of objects in the ground truth. Precision is defined as follows:
where fp is the total number of false-positive detections, and (tp + fp) indicates the total number of detected objects.
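In code, the two metrics follow directly from the counts:

```python
def recall_precision(tp, fn, fp):
    """Recall = tp / (tp + fn); Precision = tp / (tp + fp)."""
    return tp / (tp + fn), tp / (tp + fp)
```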
Table 2 lists the performance of our method in terms of detection of pedestrians and cars.
The system is implemented in C++ on an industrial computer equipped with a 2.40-GHz Intel dual-core i5 processor and 4 GB of RAM. In general, we achieve a processing rate of 10-15 frames per second (FPS), depending on the complexity of the images. This processing rate includes the stereo preprocessing time. Ideally, a real-time system should run at 25 FPS or more, but we believe it will not be a problem to achieve this by using bespoke image processing hardware in the future.
Comparison with other methods
Table 3 lists the comparison with some other work reported for moving object detection, including applications in video surveillance [4]. It is a hard task to make a uniform comparison with other approaches for two reasons: (1) the evaluation metrics used can be different; (2) many research works do not give a statistical accuracy rate. The work most related to our approach can be found in [14-19], which uses stereo-motion fusion. However, there are no reports on detection rate or accuracy rate in [14-16, 19]. In [17], the authors only give the accuracy for feature point detection rather than for object detection; moreover, their accuracy definition is slightly different from ours. In [18], the authors only provide results for object tracking.
Conclusions
This paper presents a novel motion detection approach using a stereovision sensor for in-vehicle environment sensing systems. The relationship between optical flow, stereo depth, and camera ego-motion parameters has been established. Accordingly, a visual odometer has been implemented for the estimation of the six ego-motion parameters by solving a set of equations fitted with a number of feature points using the linear least-squares method. The feature points are selected as corner points lying on the road surface and determined by using a height constraint and the Harris corner detection algorithm. The ego-motion flow evoked by the moving camera/vehicle is calculated from the relational model using the estimated ego-motion parameters. The mixed flow caused by both camera motion and target motion is obtained from correspondence matching between consecutive images. The difference between the mixed flow and the ego-motion flow yields the independent flow, which is attributable purely to the target motion. The moving targets are extracted according to the continuity of the similar independent flow. The approach presented here was tested on substantial complex urban traffic videos. The experimental results demonstrate that the approach can detect moving objects with a correct detection rate of 93 %. The accuracy of the ego-motion estimation is within 4 % compared to an in-vehicle INS sensor. The processing rate reaches 10-15 FPS on an industrial computer equipped with a 2.40-GHz Intel dual-core i5 processor and 4 GB of RAM.
References
 1.
JE Ha, WH Lee, Foreground objects detection using multiple difference images. Opt. Eng. 4, 047201 (2010)
 2.
M Cristani, M Farenzena, D Bloisi, V Murino, Background subtraction for automated multimedia surveillance: a comprehensive review. EURASIP J. Adv. Signal Process. 2010, 343057 (2010). doi:10.1155/2010/343057
 3.
L Wei, H Yu, H Yuan, H Zhao, X Xu, Effective background modelling and subtraction approach for moving object detection. IET Computer Vision 9(1), 13–24 (2015)
 4.
FC Cheng, SJ Ruan, Accurate motion detection using a self-adaptive background matching framework. IEEE Trans. Intell. Transp. Syst. 13(2), 671-679 (2012)
 5.
A Broggi, A Cappalunga, S Cattani, P Zani, Lateral vehicles detection using monocular high resolution cameras on TerraMax, in Proceedings of IEEE Intell. Veh. Symp. (2008), pp. 1143-1148
 6.
Y Zhu, D Comaniciu, M Pellkofer, T Koehler, Reliable detection of overtaking vehicles using robust information fusion. IEEE Trans. Intell. Transp. Syst. 7(4), 401-414 (2006)
 7.
L. Kui, D. Qian, Y. He, M. Ben. Optical flow and principal component analysisbased motion detection in outdoor videos. EURASIP. J.Adv. Signal Process. 2010, 680623. doi:10.1155/2010/680623
 8.
L. Yang, X.F. Li, J. Limin, In Proceedings of the 11th World Congress on intelligent control and automation (WCICA). Abnormal crowd behavior detection based on optical flow and dynamic threshold (2014), pp. 29022906
 9.
E. Martinez, M. Diaz, J. Melenchon, J. Montero, I. Iriondo, J. Socoro, In Proc. IEEE Intell. Veh. Symp.. Driving assistance system based on the detection of headon collisions (2008); pp. 913–918
 10.
J Diaz Alonso, E Ros Vidal, A Rotter, M Muhlenberg, Lanechange decision aid system based on motiondriven vehicle tracking. IEEE Trans. Veh. Technol 57, 2736–2746 (2008)
 11.
I. Sato, C. Yamano, H. Yanagawa, In Proc. IEEE IV. Crossing obstacle detection with a vehiclemounted camera (2011), pp. 60–65
 12.
H Jazayeri, J Cai, Y Zheng, M Tuceryan, Vehicle detection and tracking in car video based on motion model. IEEE Trans. Intell. Transp. Syst. 12(2), 583–595 (2011)
 13.
H. Geiger, B. Kitt, In Proc. IEEE Intell. Veh. Symp. Object flow: a descriptor for classifying traffic motion (San Diego, USA, 2010), pp. 287–293
 14.
D. Pantilie, S. Nedevschi, In Proc. IEEE Conference on Intelligent Transportation Systems. Realtime obstacle detection in complex scenarios using dense stereo vision and optical flow (Funchal, 2010), pp. 439 – 444
 15.
U Franke, S Heinrich, Fast obstacle detection for urban traffic situations. IEEE Trans. Intell. Transp. Syst 3(3), 173–181 (2002)
 16.
C. Rabe, U. Franke, S Gehrig, In Proceedings of IEEE Intelligent Vehicles Symposium. Fast detection of moving objects in complex scenarios (2007), pp. 398–403
 17.
B. Kitt, B. Ranft, H. Lategahn, In Proc. 13th Int. IEEE Conf. on ITSC. Detection and tracking of independently moving objects in urban environments (2010), pp. 1396–1401
 18.
S. Bota, S. Nedevschi, In Proc. 14th Int. IEEE Conf. ITSC, Tracking multiple objects in urban traffic environments using dense stereo and optical flow (2011), pp. 791–796
 19.
P. Lenz, J. Ziegler, A. Geiger, In Proc. IEEE Intell. Veh. Symp.. Roser M. Sparse scene flow segmentation for moving object detection in urban environments (BadenBaden, Germany, 2011), pp. 926–932
 20.
T. Yamaguchi, H. Kato, Y. Ninomiya, In Proc. IEEE Intell. Veh. Symp.. Moving obstacle detection using monocular vision (2006), pp. 288–293
 21.
A. Talukder, L. Matthies, In Proc. IEEE Int. Conf. Intelligent Robots and Systems. Realtime detection of moving vehicles using dense stereo objects from moving and optical flow (2004), pp. 37183725
 22.
AM Waxman, JH Duncan, Binocular image flows: steps towards stereomotion fusion. IEEE Trans Pattern Anal Mach Intell 8, 715–729 (1986)
 23.
C. Harris, M.A. Stephens, In Proceedings of the 4th Alvey Vision Conference. Combined corner and edge detector, (1988), pp. 147–151
 24.
B McCane, K Novins, D Crannitch, B Galvin, On benchmarking optical flow. Comput Vis Image Underst 84, 126–143 (2001)
 25.
Y. Huang, K. Young. Binocular image sequence analysis: integration of stereo disparity and optic flow for improved obstacle detection and tracking. EURASIP. J Adv. Signal. Process. 2008, 843232. doi:10.1155/2008/843232
 26.
Y Huang, S Fu, C Thompson, Stereovisionbased object segmentation for automotive applications. EURASIP J Appl Signal Process 14, 2322–2329 (2005)
 27.
KITTI Vision, Available online: http://www.cvlibs.net/datasets/kitti/, Accessed 18 Jul 2015.
Acknowledgements
This work was sponsored by Specialized Research Fund for the Doctoral Program of Higher Education (Project No. 20133120110006), the National Natural Science Foundation of China (Project No. 61374197), and the Science and Technology Commission of Shanghai Municipality (Project No. 13510502600).
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Cite this article
Min, Q., Huang, Y. Motion detection using binocular image flow in dynamic scenes. EURASIP J. Adv. Signal Process. 2016, 49 (2016). https://doi.org/10.1186/s13634-016-0349-8
Keywords
 Motion detection
 Stereo vision
 Optical flow
 Ego-motion
 Visual odometer