Rule-driven Object Tracking in Clutter and Partial Occlusion with Model-based Snakes

In the last few years it has become clear to the research community that further improvements in classic approaches to low-level computer vision and image/video understanding tasks are difficult to obtain. New approaches are evolving that employ knowledge-based processing, although transforming a priori knowledge into low-level models and rules is far from straightforward. In this paper, we examine one of the most popular active contour models, Snakes, and propose a snake model that modifies existing terms and introduces a model-based one, which eliminates basic problems through the use of prior shape knowledge. A probabilistic rule-driven utilization of the proposed model follows, which copes with objects of different shape complexity and motion, different environments (indoor and outdoor), cluttered sequences, complex (non-smooth) backgrounds, and partially occluded moving objects. The proposed method has been tested on a variety of sequences and the experimental results verify its efficiency.


I. Introduction
Active contours, popularly known as snakes, have drawn special attention over the last decade among computer vision, image and video processing researchers. They employ weak models, which deform in conformance with salient image features. The approaches proposed in the literature focus on either the highest accuracy in estimating moving silhouettes or the lowest computational complexity. Active contours (snakes) were first introduced by Kass et al. [18]. A snake is a curve defined by energy terms, able to deform itself in order to minimize its total energy. This total energy consists of an "internal" term, which enforces smoothness along the curve, and an "external" term, which makes the curve move towards the desired object boundaries. Many variations and extensions of snakes have been proposed and applied to specific applications [11], [6]. However, the majority of them face three main limitations. The first is the quality of the initialization, which is crucial for the convergence of the algorithm. The second is the need for parameter tuning, which may lead to loss of generality, and the third is the sensitivity to noise, clutter and occlusions.
During the last decade snakes and their variants have been applied to motion segmentation [5], [20], [3], [25], and to object detection, localization and tracking in video sequences [28], [26], [34], [8]. Most approaches require an initial shape approximation that is close to the boundaries of the objects of interest [4]. The straightforward incorporation of prior knowledge in such models is a very interesting property that makes them appropriate for capturing case-dependent constraints. Constraining the active contour representation to follow a global shape prior while preserving local deformations has drawn the interest of the research community. Cootes et al. [7] introduce the term "Active Shape Models" to describe the extension of classical snakes with global constraints. They describe a technique that allows an initial rough guess of the best shape, orientation, scale and position to be refined by comparing a hypothesized model instance with image data, and using the differences between model and image to deform the shape. Their results demonstrate that the method can deal with clutter and limited occlusion. An efficient method for combining low- and high-level information in a consistent probabilistic framework is proposed by Isard and Blake [15], [16]. The result is highly robust tracking of agile motion in clutter that runs in near real-time. The CONDENSATION algorithm they introduce is a fusion of the statistical factored sampling algorithm for static, non-Gaussian problems with a stochastic differential equation model for object motion. Rouson et al. [31] propose a two-stage approach using level-set representations. During the first stage a shape model is built directly on the level-set space using a collection of samples. This model allows shape variabilities that can be seen as an "uncertainty region" around the initial shape. Then, this model is used as a basis to introduce the shape prior in an energetic form.
In the proposed approach we consider a knowledge-based view of active contour models, which is appropriate for handling object tracking under partial occlusion, as well as for tracking objects whose shape can be approximated by parameter-based models. We use shape priors, set in a rather loose way to preserve the required deformations, and introduce an uncertainty region around the contour to be extracted, which is based on motion history. To cope with partial occlusion we use a rule-driven approach and provide several results. The algorithm appears to provide efficient solutions in terms of both accuracy and computational complexity. Head tracking has been selected as a test-bed application of the integrated model; the head is approximated by shape priors derived from an ellipsoid. This approach imposes the constraint that the desired object is not strongly deformed in successive frames of a video sequence, which is valid in most cases.
The paper is organized as follows. In the next section we review the classic snake model and provide information on the adopted model-based approach. Section III describes in detail the proposed tracking approach and Section IV provides the experimental results. Future research directions are given in Section V.

II. Snake Model
In general, snakes concern model and image data analysis through the definition of a linear energy function and a set of regularization parameters. Their energy function consists of two components: the internal or smoothness-driven one, which enforces smoothness along the snake, and the external or data-driven one, which depends on the image data according to a chosen criterion and forces the snake towards the object boundaries. The goal is to minimize the total snake energy; this is achieved iteratively, starting from an initial approximation of the object shape (prototype). Once an appropriate initialization is specified, the snake can converge to the nearby energy minimum using gradient descent techniques. According to this formulation, a snake is modeled as being able to deform elastically, but any deformation increases its internal energy, causing a "restitution" force that tries to bring it back to its original shape. At the same time, the snake is immersed in an energy field (created by the examined image), which exerts a force on the snake. These two forces balance each other and the contour actively adjusts its shape and position until it reaches a local minimum of its total energy.
Let us consider a snake $C_{snake}$ defined by a set $V(s)$ of $N$ ordered points (snaxels) $\{V_i(s) \mid i = 1, 2, \ldots, N\}$, corresponding to the positions $(x_i(s), y_i(s))$ in the image plane, where $s$ is a parameter denoting the normalized arc-length in $[0, 1]$. For simplicity, in the following the parameter $s$ is mentioned only when necessary. The total energy function $E_{snake}$ is then defined as the weighted sum of the internal energy $E_{int}$, which corresponds to the sum of the stretching and bending energies of the snake, and the external energy $E_{ext}$, which indicates how the snake evolves according to the features of the image:
where $e_{int}(V_i)$ and $e_{ext}(V_i)$ are the internal and external energies corresponding to point $V_i$. The convergence of the snake to the object boundary is given by the solution of its total energy minimization:
where $a_1$ and $a_2$ are the snake's regularization parameters.
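As a rough illustration of the weighted-sum structure described above, the following sketch evaluates a discrete snake energy from per-snaxel internal and external energies. The function name `total_snake_energy` and the exact way the weights $a_1$ and $a_2$ enter the sum are assumptions made for illustration; the sketch is not a reproduction of the paper's equations.

```python
def total_snake_energy(e_int, e_ext, a1, a2):
    """Weighted sum of per-snaxel internal and external energies.

    e_int, e_ext: sequences with one energy value per snaxel V_i.
    a1, a2: regularization weights (assumed to multiply each term).
    """
    if len(e_int) != len(e_ext):
        raise ValueError("e_int and e_ext must have one value per snaxel")
    return sum(a1 * ei + a2 * ee for ei, ee in zip(e_int, e_ext))
```

A minimizer would compare this value across candidate contours and keep the one with the lowest total.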

A. Internal Energy
The internal energy $E_{int}$ has been given various definitions in the literature [9], [14], [29], depending on the application criteria. In our approach we define the internal energy in terms of the snake curvature $CU_{snake}$ and its point density distribution $DV_{snake}$, eq. (5):
where $(x, y)$ parameterize the curve as $V_i = [x_i, y_i]$ and the first and second derivatives of $(x, y)$ denote the velocity and the acceleration along the curve: $(\dot{x} = dx/ds, \dot{y} = dy/ds)$ and $(\ddot{x} = d^2x/ds^2, \ddot{y} = d^2y/ds^2)$. Thus, the internal energy of the snake is defined as:
where $|\cdot|$ denotes the magnitude of the corresponding quantities. In the discrete case, the value of the curvature at the k-th point is calculated using the neighboring points on each side of it; the sign of the curvature is positive if the contour is locally convex and negative if it is concave. Moreover, the curvature distribution uniquely defines a propagating curve at different time instances, although it is not affine-invariant and is thus inappropriate for object recognition problems [14], [1]. In the proposed snake model the points constituting a curve are not equally spaced, and thus the distances between successive points represent the local elasticity of the snake. Finally, it should be noted that curvature and point density terms are often used in the literature [18], [23], [29]; in the present work they are used both as smoothness and as curve similarity criteria, as described in the following sections. Fig. 1 illustrates the curvature (curve smoothness) and point density (elasticity) distributions of a given snake.
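The discrete curvature and point spacing mentioned above can be sketched as follows. This is a minimal illustration that assumes a closed contour traversed counter-clockwise (so that convex parts get positive curvature), central finite differences for the derivatives, and the standard planar curvature formula $(\dot{x}\ddot{y} - \dot{y}\ddot{x}) / (\dot{x}^2 + \dot{y}^2)^{3/2}$; the function name is hypothetical and the paper's own discretization may differ.

```python
import math

def curvature_and_spacing(points):
    """Signed curvature and distance to the next point for a closed contour.

    points: list of (x, y) tuples, assumed ordered counter-clockwise.
    Returns (curvature, spacing), each with one value per point.
    """
    n = len(points)
    curvature, spacing = [], []
    for k in range(n):
        xp, yp = points[k - 1]          # previous neighbor (wraps around)
        x0, y0 = points[k]
        xn, yn = points[(k + 1) % n]    # next neighbor (wraps around)
        dx, dy = (xn - xp) / 2.0, (yn - yp) / 2.0        # first derivative
        ddx, ddy = xn - 2 * x0 + xp, yn - 2 * y0 + yp    # second derivative
        denom = (dx * dx + dy * dy) ** 1.5
        curvature.append((dx * ddy - dy * ddx) / denom if denom > 0 else 0.0)
        spacing.append(math.hypot(xn - x0, yn - y0))
    return curvature, spacing
```

For a circle of radius R sampled densely, the curvature values approach 1/R and the spacing is uniform; reversing the traversal order flips the sign of the curvature.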

A.1 Prior Model Constraints
The inclusion of a global shape model biases the snake contour towards a target shape, allowing some selectivity over image features. In several applications the general shape, and possibly the location and orientation, of objects is known, and this knowledge may be incorporated into the deformable adaptive contour in the form of initial conditions, data constraints, constraints on the model shape parameters, or into the model fitting procedure. However, for efficient interpretation, it is essential to have a model that not only describes the size, shape, location and orientation of the target object but also permits expected variations in these characteristics. A number of researchers have incorporated knowledge of object shape into deformable models by using deformable shape templates. These models usually use global shape parameters to embody a priori knowledge of the expected shape and shape variation of the structures, and have been used successfully in many applications of automatic image interpretation. An excellent example in computer vision is the work of Yuille et al. [35], who construct deformable templates for detecting and describing features of faces, such as the eye. Staib et al. [32] use probability distributions on the parameters of the representation and bias the model to a particular overall shape while allowing for deformations. Boundary finding is formulated as an optimization problem using a maximum a posteriori objective function. A model-based snake that is directly applicable in image space, as opposed to parameter space, is proposed in [10]. This method is simple and fast and therefore fits well with our intention to extend the previous formulation with a model prior constraint. We note that our goal is to illustrate the increased robustness that the inclusion of shape information brings to the proposed method, rather than to introduce a novel shape prior constraint representation. We formulate the model energy function using a slightly different shape modeling than the one adopted in [10]. Therefore, we define the constraint energy term $E_{model}(V(s))$ as:
where $\lambda$ is parameterized, since it can vary with position, and eq. (7) is reformulated as:
As an example, a generalized ellipse represented by eq. (10) is used as the model ($model_{ellipse}$) here. The ellipse is a typical model for human faces and is therefore appropriate for head tracking, which is our test-bed application.
where $a$ and $b$ are the minor and major axes respectively and $\vartheta$ is the ellipsoid rotation.
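One common way to obtain the axes and rotation of such a model ellipse from a set of contour points is through the eigen-decomposition of their 2×2 covariance matrix. The sketch below is a minimal pure-Python illustration. The function name `ellipse_from_points` is hypothetical, and the conversion from eigenvalues to semi-axis lengths ($\sqrt{2\lambda}$, exact only when the points are spread uniformly in the ellipse parameter) is an assumption that is not taken from the paper.

```python
import math

def ellipse_from_points(points):
    """Fit center, semi-axes and rotation of an ellipse from point statistics.

    points: list of (x, y) tuples (e.g. the contour of the previous frame).
    Returns (center, major_semi_axis, minor_semi_axis, rotation_in_radians).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix S.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Closed-form eigenvalues of a symmetric 2x2 matrix (lam1 >= lam2).
    half_trace = (sxx + syy) / 2.0
    root = math.hypot((sxx - syy) / 2.0, sxy)
    lam1, lam2 = half_trace + root, max(half_trace - root, 0.0)
    # Orientation of the principal direction associated with lam1.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # Assumed scaling from eigenvalues to semi-axis lengths.
    return (cx, cy), math.sqrt(2.0 * lam1), math.sqrt(2.0 * lam2), theta
```

Because the statistics are recomputed from the latest point set, translation, scaling and rotation of the tracked object are all reflected in the fitted parameters.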
The model should take scaling, translation and rotation into account. To meet these requirements, we base the calculation of the minor and major axes and of the rotation on a statistical representation of an ellipsoid, namely the covariance matrix $S$ derived from the distribution of the last recovered solution points (previous frame). The eigenvalues $\lambda_1$ and $\lambda_2$ ($\lambda_1 \geq \lambda_2$) correspond to the principal directions $e_1$ and $e_2$, respectively. The eigenvalues determine the shape of the ellipsoid, while the eigenvectors determine its orientation, as shown in Fig. 2.

B. External Energy
Regarding the external energy term, in most approaches it is defined for each point $V_i$ as:
where $|\nabla G_\sigma * I(x_i, y_i)|$ denotes the magnitude of the gradient of the image convolved with a Gaussian filter of variance $\sigma$ at the point $(x_i, y_i)$ corresponding to the snaxel $V_i$; $g(V_i)$ is the respective gradient direction and $n(V_i)$ is the normal vector of the snake at the snaxel $V_i$. Common problems in snake models are the presence of noise, background edges close to object boundaries, and edges in the interior of the desired object. These problems stem from the definition of the external energy and the Laplacian-of-Gaussian (LoG) term $\nabla G_\sigma * I$, especially when the initialization is not close enough to the object boundaries. For that reason, snakes turn out to be efficient only for specific kinds of images and video sequences. In the proposed model a different term is introduced instead, which minimizes the local variance of the image gradient and preserves the most important image regions. This is achieved through morphological operations leading to a modified image gradient.
In particular, the expression $|\nabla G_\sigma * I(x_i, y_i)|$ is replaced by a modified image gradient $G_m$ and the image-data criterion is strengthened through the square of $G_m$:
To obtain the modified image gradient, we first pre-smooth the image with a non-linear morphological filter, called ASF (Alternating Sequential Filter) [21], and then extract the morphological image gradient. The ASF used in our model is based on morphological area opening ($\circ$) and closing ($\bullet$) operations with structuring elements of increasing scale. The main advantage of such filters is that they preserve line-type image structures, which cannot be achieved with, e.g., median filtering. Fig. 3 illustrates the pre-smoothing of a frame with the proposed ASF: it can be clearly seen that noise is eliminated and the most important edges are preserved. More details can be found in [33]. Fig. 4 illustrates the differences between the two image-data criteria $|\nabla G_\sigma * I(x_i, y_i)|$ and $G_m$, presented in equations (12) and (13). It can be seen in Figs. 4(b,c) that the proposed procedure clearly suppresses noise and retains the most important edges of the examined image, whereas Figs. 4(d,e) illustrate the difference between the image gradient and the proposed modified gradient, computed along a randomly selected image line. Fig. 4 clearly shows the advantages of the proposed external energy term for edge-based methods, in terms of noise reduction and preservation of the most important edges. Comparing this external energy with related work in the literature, apart from the commonly used LoG-based definitions, a representative example is the respective term proposed in [19]. In that work, a Gaussian filter is used to obtain the image gradient, but an appropriate value of the Gaussian variance is required, which is set manually. Fig. 5 illustrates the difference between the proposed external energy term and the one proposed in [19].
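The effect of pre-smoothing with an alternating sequential filter followed by a morphological gradient can be illustrated on a single image line. The sketch below is a simplified stand-in and not the filter used in the paper: it works in 1-D and uses ordinary flat openings and closings with windows of increasing odd size, rather than the 2-D area openings and closings described above. All function names are hypothetical.

```python
def erode(signal, size):
    """Flat erosion: minimum over a centered window (truncated at the borders)."""
    half, n = size // 2, len(signal)
    return [min(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]

def dilate(signal, size):
    """Flat dilation: maximum over a centered window (truncated at the borders)."""
    half, n = size // 2, len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]

def opening(signal, size):
    return dilate(erode(signal, size), size)    # removes narrow bright peaks

def closing(signal, size):
    return erode(dilate(signal, size), size)    # fills narrow dark valleys

def alternating_sequential_filter(signal, max_size):
    """Apply opening then closing with window sizes 3, 5, ..., max_size."""
    out = list(signal)
    for size in range(3, max_size + 1, 2):
        out = closing(opening(out, size), size)
    return out

def morphological_gradient(signal, size=3):
    """Dilation minus erosion: large only near intensity transitions."""
    return [d - e for d, e in zip(dilate(signal, size), erode(signal, size))]
```

On a step edge corrupted by isolated spikes, the filter removes the spikes while leaving the step in place, so the gradient responds only at the true edge.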

III. The Proposed Tracking Approach
Object tracking concerns the separation of moving objects from the background [24], which has so far been approached in two different ways: (a) motion-based approaches, which rely on grouping motion information over time, and (b) model-based approaches, which impose high-level semantic representation and knowledge. In these approaches either geometrical properties or region-based features of the desired objects are extracted and utilized. Thus the methods proposed in the literature can be categorized into edge-based methods [15], which rely on boundary information, and region-based ones [22], which utilize the information provided by the interior region of the tracked objects.
The main problems that tracking approaches are called upon to cope with are non-rigid (deformable) objects, objects with complicated (non-smooth) contours, object movements that are not simple translations, and movement in natural sequences, where the background is usually complicated and the amount of noise or the external lighting changes are not known. The latter has motivated many researchers, especially in recent years, to follow probabilistic approaches, e.g. [27]. In addition, a more difficult problem emerges in many sequences, namely occlusion, i.e. when moving objects become progressively occluded as time passes; this requires some assumptions about the shape, region or motion of the tracked object in order to estimate its contour even in regions that are covered by other moving or static objects.
In the following we describe the proposed approach, which aims to cope with the above-mentioned problems. The proposed method consists of two main steps: the extraction of the "uncertainty regions" of each object in a sequence, and the estimation of the contours of the moving objects. The term "uncertainty regions" describes the regions in a frame where moving contours are likely to be located, whereas the estimation of the contours consists of an energy minimization procedure based on the proposed snake energy terms, described in Section II. More specifically, the contour of a moving object is first estimated in a few successive frames of a sequence; this can be achieved with appropriate parameter initialization using the proposed snake model. Then, for the next frames, a force-based approach is followed to minimize the total snake energy inside the respective uncertainty regions, which are extracted using the displacement history of each point of the contour. The force-based approach is adopted as an alternative to direct energy minimization, while some rules are introduced to separate objects from the background and to detect possible occlusions.

A. Uncertainty Region Estimation
The minimization of the snake's total energy is essentially a problem of picking out the "correct" curve in the image, i.e. the curve that corresponds to the object of interest among a set of candidate curves, given an initial estimate of the object's contour. In this section we propose a way to determine, for each frame of a video sequence, a region around the snake initialization in which the correct curve is located. This idea is not new, as stochastic models have recently been proposed in the literature, mostly as shape prior knowledge [28], to define possible positions of the curve points around an initialization. In the same direction, we introduce here the term "uncertainty region", which denotes that the minimization procedure (the picking out of the correct curve) takes place inside that region, constraining the problem to a narrow band around the snake initialization. Such regions are extracted by exploiting the motion history of the tracked contour (the displacements of the curve points at previous time instances) and extracting statistical measurements of the motion: the previously estimated contour is deformed according to the previously calculated point displacements (giving the initialization for the next frame) and the standard deviation of each point's mean motion is calculated; the uncertainty region around each point is then defined in terms of its corresponding standard deviation. The next step is to find the new position of each point of the curve inside its corresponding uncertainty region, namely the position that minimizes a criterion defined by the snake's energy terms described in Section II.
Let us define the contour of an object, located in the I-th frame (I > 1) of a video sequence, as a vector of complex numbers, i.e.,
where $C^{(I)}_k$ is the location of the k-th point of the contour. We define the instant motion of the k-th point of the object contour, computed in the I-th frame, as:
where $MF^{(I-1,I)}(x_k, y_k)$ is the motion vector of the pixel $(x_k, y_k)$, estimated between the successive frames $I-1$ and $I$ with the robust motion estimation technique proposed by Black et al. [2].
Based on the definition of the instant motion, we calculate the mean movement of the contour C up to frame I as:
where $m^{(I)}_{c,k}$ is the corresponding mean movement of the k-th point of the contour. Similarly, the standard deviation of the contour's mean movement is defined as:
where $s^{(I)}_{c,k}$ is the standard deviation of the k-th point's mean movement.
In practice, equations (17) and (19) are computed over the last L frames, so as to take into account only the recent history of the contour's movement, i.e.,
The initial estimate of the object's contour, $C^{(I+1)}_{init}$, in frame I + 1 is computed from the contour's current location and: (a) its mean motion, when no abrupt movements are expected to occur, i.e.,
or (b) its instant motion, when no knowledge about the motion of the desired object is available, i.e.,
where [...] (24) [...] $E_{ext}(V)$ and $E_{model}(V)$ are given by equations (3) and (8) [...] according to the standard deviation of their mean movement, computed using equations (18) and (21). The Gaussian formulation for the point oscillations is mainly adopted to express that each point of the curve is likely to move in the same way (amplitude and direction) as it has been moving up to the current frame. [...] $C^{(I+1)}_{init,k}$. If point k was moving with constant velocity, then the standard deviation of its movement is again $s^{(I)}_{c,k} = 0$ and the previous case holds regarding its uncertainty region. On the other hand, if point k was oscillating in the previous L frames, the standard deviation of its movement is high and consequently its uncertainty region is large. Figure 6 illustrates the proposed approach in steps, in the case of face tracking. Figures 6(a) and 6(b) present two successive frames of a face sequence and the respective contours. Figure 6(c) presents the amplitude of the computed standard deviation (in pixels) of the contour mean motion; based on this standard deviation, the uncertainty regions are then extracted (Fig. 6(d)).
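The motion statistics behind the uncertainty regions can be sketched as follows, with contour points and displacements stored as complex numbers as in the text. This is a minimal illustration under stated assumptions: the per-point standard deviation is taken as the root-mean-square distance of the displacements from their mean, the region around each point is modeled as a disc whose radius is a multiple (`scale`) of that standard deviation, and all function names are hypothetical. The paper's own definitions may differ.

```python
import math

def motion_statistics(displacements, L=5):
    """Per-point mean and standard deviation of motion over the last L frames.

    displacements: list with one entry per frame; each entry is a list of
    complex displacements, one per contour point.
    """
    recent = displacements[-L:]
    means, stds = [], []
    for k in range(len(recent[0])):
        samples = [frame[k] for frame in recent]
        mean = sum(samples) / len(samples)
        variance = sum(abs(m - mean) ** 2 for m in samples) / len(samples)
        means.append(mean)
        stds.append(math.sqrt(variance))
    return means, stds

def predict_contour(contour, means):
    """Initial contour for the next frame: current location plus mean motion."""
    return [c + m for c, m in zip(contour, means)]

def in_uncertainty_region(candidate, init_point, std, scale=1.0):
    """True if a candidate position lies within scale * std of the initial point."""
    return abs(candidate - init_point) <= scale * std
```

A point that moved with constant velocity gets a zero standard deviation, so its region collapses to the predicted position, whereas an oscillating point gets a wide region.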

B. Force-Based Approach
The minimization of equation (24) is a procedure of high complexity: if N is the number of points constituting the examined curve C and M is the number of all possible positions of each curve point $C^{(I+1)}_{init,k}$ inside the extracted uncertainty region (assuming that M is the same for all points), then the number of all possible curves $r \in R$ generated by the points' oscillations is $M^N$. To avoid this problem, we propose a force-based approach (instead of using a dynamic programming algorithm), in which the energy terms participating in the snake energy function are transformed into forces applied to each curve point, so that the curve converges to the desired object boundaries. Let us consider the curve V describing the object's contour. The object's contour at frame I is given by $C^{(I)}$, and its initialization at frame I + 1 is given by $C^{(I+1)}_{init}$. Let also t be the set of tangential unit vectors and n the set of normal vectors of the curve V, given by equations (27):
We define the following forces acting at each contour point $V_k$:
[...] represents the deformation of the curve along its normal direction. The property of the curvature distribution of taking low values where the curve is relatively smooth and high values where the curve has strong variations makes $F_c$ force the curve towards its initial shape (the one in the previous frame) and not towards a smoother form. Moreover, we exploit the property of the curvature of being positive where the curve is convex and negative where it is concave. Fig. 7 illustrates the directions of $F_c$ and $F_d$ along a curve.
These forces represent the internal snake forces that deform the curve V, initialized at $C^{(I+1)}_{init}$, according to the shape of the contour $C^{(I)}$ in the previous frame. The constraint on such a deformation is the first term of equation (24), i.e. the external energy $E_{ext}$, which is transformed into a force as described in the following. Let $g_{m,k}(p)$, given by (30), be the modified image gradient function of all pixels $p = x_p + j \cdot y_p$ that: (a) belong to the uncertainty region U, and (b) lie on the line segment that is defined by the normal direction of the curve V at point $V_k$.
The maximum of this function determines the most salient edge pixel on the line segment defined above and thus defines the direction of the external snake force:
where $sgn_k$ denotes the sign/direction of the external force to be applied to $V_k$. Then, the external snake force for each point $V_k$ is given by:
From the definition of the external energy term (equation (13)) it can be seen that it takes values close to zero at contour points corresponding to regions with high image gradient ($G_m^2(V_k) \approx 1$) and values close to unity in regions with relatively constant intensity ($G_m^2(V_k) \approx 0$). Thus, the term $F_e = [F_e(k) \mid k = 1, \ldots, N]$ is proportional to $G_m$ and forces the curve towards the salient edges inside the extracted uncertainty region. In the definition of this force we exploit the advantage of $G_m$ over $|\nabla G_\sigma * I|$ in preserving the most important edges, as shown before, and thus the problem of the existence of many local maxima in equation (31) is eliminated.
In the force-based approach, the examined curve V marches towards the object's boundaries in the next frame I + 1, according to the forces applied to it. Thus, the minimization of equation (24) can be approximated by using the internal and external snake forces defined above in an iterative manner similar to the steepest descent approach [12], as summarized below. In particular, let $V^{(\xi)}$ be the estimated contour at the ξ-th iteration; then the following equations hold:
where the internal forces and the external force $F_e$ are estimated according to equations (28), (29) and (33), respectively, and $F_{model}(V)$ is the regularization force, according to the specific model adopted, given by:
The final curve V corresponding to the contour $C^{(I+1)}$ is obtained when one of the following criteria is satisfied:
(a) $F_\tau(V^{(\xi)}) < a \cdot F_\tau(V^{(\xi+1)})$, where
Parameter a is a positive constant in the range 0 < a < 1. When a is selected to be close to one, $C^{(I+1)}$ is more likely to correspond to a local minimum solution; lower values of a increase the number of iterations and, therefore, the execution time. The statistical approach we follow to estimate the regions of uncertainty allows the use of a value of a close to one.
(b) The maximum number of iterations is reached. In this case
It must be noted that the use of the proposed steepest descent approach does not ensure that the final contour corresponds to the solution of equation (24). However, under the constraints we pose, even if $C^{(I+1)}$ corresponds to a local minimum, it is close to the desired solution (global minimum).
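The iterative scheme and its two stopping criteria can be sketched generically as below. This is an illustration under several assumptions rather than the paper's algorithm: the combined force is supplied by a caller-provided function `force_fn` (hypothetical), the scalar $F_\tau$ is taken to be the sum of force magnitudes over all points, each iteration moves every point by `step` times its force, and when criterion (a) fires the contour of the current iteration is returned.

```python
def evolve_contour(contour, force_fn, a=0.9, step=1.0, max_iter=100):
    """Move contour points along their forces until a stopping rule fires.

    contour: list of complex point positions.
    force_fn: callable mapping a contour to a list of complex forces.
    Returns (final_contour, number_of_iterations_performed).
    """
    def total_force(c):
        # Assumed definition of the scalar total force F_tau.
        return sum(abs(f) for f in force_fn(c))

    current = list(contour)
    for iteration in range(max_iter):
        forces = force_fn(current)
        candidate = [p + step * f for p, f in zip(current, forces)]
        # Criterion (a): the total force grew by more than a factor 1/a.
        if total_force(current) < a * total_force(candidate):
            return current, iteration
        current = candidate
    # Criterion (b): the maximum number of iterations was reached.
    return current, max_iter
```

With a force that attracts each point towards a fixed target, the total force shrinks at every step and the loop runs until `max_iter`; with a force that grows as the points move, criterion (a) stops the evolution immediately.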

C. Weights Estimation
In equations (24) and (37), four energy and force terms, respectively, participate in the minimization procedure with different weights $w_1$, $w_2$, $\mu_1$ and $\mu_2$. The choice of appropriate values for these weights is important for the method's performance. The values should be set depending on the complexity of the background and the smoothness of the object silhouette. For sequences with relatively smooth background (without significant edges close to object boundaries, or with edges only far from object boundaries) the curve's external energy/force term is a reliable criterion and thus $w_1$ is set to a higher value. Moreover, if the contour of the tracked object is complicated (not smooth) or noisy, the elasticity and smoothness energy/force terms are not reliable and thus $w_2$ is set to lower values. To estimate the value of $w_2$ automatically, it suffices to count the zero-crossings of the curvature and point density distributions, which indicate the contour's local smoothness/elasticity. To estimate the value of $w_1$, it suffices to calculate the mean value of the external energy over all pixels p inside the extracted uncertainty region U (as verified by trial and error). Thus, a smooth background inside the uncertainty region results in higher mean values and $w_1$ is set to a higher value, whereas low mean values correspond to complex/noisy uncertainty regions (a large number of edge pixels) and $w_1$ is set to a lower value.
Fig. 8 illustrates three different sequences capturing moving objects of different contour complexities. Figs. 8(a,d,g) show the original images along with the moving object contours. In the first sequence, the background is relatively smooth and the object (car) has an uncomplicated contour. In the case of the aircraft, the background is also smooth but the contour is quite complicated, whereas in the third case the walking man's contour is simple but the background is very cluttered. The respective external energies are visualized in Figs. 8(b,e,h), where the background complexity can be clearly seen; the modified image gradient preserves the most salient edges and eliminates noise. Finally, in Figs. 8(c,f,i) the complexity of the respective object contours is presented in terms of the curvature. The first and the second subplots of each case illustrate the x and y coordinate

and
Zc,man N man 0.05. The parameters $\mu_1$ and $\mu_2$, related to the internal snake force, can be set according to the application under consideration. If strict prior model knowledge is available (e.g. in medical applications), then the model can strongly influence the solution. On the contrary, if there is no high certainty regarding the model prior, then the first term of the internal force should have a greater effect on the solution. This competitive relation between the two internal force terms can easily be represented by allowing one of them to change according to the other in a functional manner ($\mu_1 = f(\mu_2)$).
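The two measurements suggested in this section for setting $w_1$ and $w_2$ can be sketched as follows. The sketch only computes the measurements (zero-crossings of a distribution such as the curvature, normalized by the number of points, and the mean external energy inside the uncertainty region); how these numbers are mapped to actual weight values is not specified here and is left open. The function names and the wrap-around handling for closed contours are assumptions.

```python
def zero_crossings(values, closed=True):
    """Count sign changes along a sequence, ignoring exact zeros.

    closed: if True, also compare the last value with the first one,
    as appropriate for a distribution sampled along a closed contour.
    """
    signs = [v > 0 for v in values if v != 0]
    if len(signs) < 2:
        return 0
    pairs = list(zip(signs, signs[1:]))
    if closed:
        pairs.append((signs[-1], signs[0]))
    return sum(1 for first, second in pairs if first != second)

def zero_crossing_ratio(values, closed=True):
    """Zero-crossings per contour point: higher means a less smooth contour."""
    return zero_crossings(values, closed) / len(values)

def mean_external_energy(energies):
    """Mean external energy over the pixels of an uncertainty region."""
    return sum(energies) / len(energies)
```

A smooth contour yields a low zero-crossing ratio, which would favor a larger $w_2$, while a high mean external energy indicates a smooth background, which would favor a larger $w_1$.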

D. Rule-driven Approach for Complex Background and Partial Occlusion Cases
In order to separate background and object regions, especially when the background is not homogeneous (smooth), and to cope with partial occlusion of the moving object, we introduce additional constraints that $p_k$ in equation (31) must obey so that its estimate is reasonable. The adopted motion estimation technique [2] ensures the distinction between moving background and foreground even in hard-to-detect cases (slightly different movements). Therefore, without loss of generality, we suppose that the background is static and that possible occluding objects are also static.
Let $m^{(I)}_{c,k}$ be the mean estimated motion of the k-th contour point at frame I, estimated through equation (17) or by any motion estimation algorithm as shown in Fig. 14, and let $p_l$ and $p_m$ be the surrounding pixels of $p_k$ on the line segment along which the function $g_{m,k}(p)$ is computed. Then $p_k$ must fulfill the following two requirements: (a) $p_k$ must divide that line segment into two parts that do not mix in terms of motion, one purely moving and one purely static (equation (42)), and (b) $p_k$ must be a moving point with velocity close to $m^{(I)}_{c,k}$ (equation (43)), where $u(\cdot)$ denotes the instantaneous velocity.
Taking these constraints into consideration, we overcome the following cases: (a) the maximum is found in the background: it is not a moving point and does not separate two immiscible (with respect to motion) parts of the function $g_m$; (b) the maximum is found inside the moving object region: although it is a moving point, it does not divide the function $g_m$ into two such parts; (c) occlusion occurs and the maximum lies on the occluding object boundary: the maximum is not moving, although it does separate $g_m$ into two such parts; and (d) occlusion occurs and the maximum lies inside the occluding object region: the maximum is neither moving nor separating. In all these cases, where the two constraints are not satisfied, we ignore the external force and evolve the curve according to its internal forces; in this way, we obtain contours similar to the ones in the past frames. Fig. 9 illustrates the detection of occlusion with the use of the above rules, which the local maximum $p_k$ (shown as a minimum), corresponding to a curve point k, must obey. It has to be mentioned that the treatment of large occlusions is limited by the capabilities of the estimated motion field.
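As an illustration only, the two rules can be sketched in Python as a check on a velocity profile sampled along the line segment. The function name `satisfies_rules`, the thresholds `static_tol` and `match_tol`, and the representation of the segment as an array of 2-D velocity vectors are assumptions made for this sketch; they are not part of the original formulation, whose exact constraints are not reproduced here.

```python
import numpy as np

def satisfies_rules(velocities, k, mean_motion, static_tol=0.25, match_tol=0.5):
    """Check a candidate point p_k (index k) on a sampled line segment.

    velocities  : sequence of 2-D velocity vectors u(p) along the segment
    mean_motion : mean estimated motion of the contour point (2-D vector)
    static_tol  : hypothetical speed threshold below which a pixel is static
    match_tol   : hypothetical tolerance for "velocity close to the mean"
    """
    v = np.asarray(velocities, dtype=float)
    m = np.asarray(mean_motion, dtype=float)
    moving = np.linalg.norm(v, axis=1) > static_tol  # per-pixel moving flag

    left, right = moving[:k], moving[k + 1:]
    if left.size == 0 or right.size == 0:
        return False  # p_k cannot split the segment from an endpoint

    # Rule (a): one side purely moving, the other purely static.
    separates = (left.all() and not right.any()) or (right.all() and not left.any())
    # Rule (b): p_k itself moves with velocity close to the mean motion.
    follows = moving[k] and np.linalg.norm(v[k] - m) <= match_tol
    return bool(separates and follows)

# Toy profile: two static background pixels followed by three moving ones.
profile = [[0, 0], [0, 0], [2, 0], [2, 0], [2, 0]]
accepted = [k for k in range(len(profile)) if satisfies_rules(profile, k, [2.0, 0.0])]
```

In this toy profile only the first moving pixel next to the static part passes both rules; a candidate in the background, inside the object, or on a static occluder is rejected, in which case the external force would be ignored for that contour point.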

IV. Experimental Results
The performance of the proposed approach is tested on a large number of natural sequences in which specific tracking problems emerge. The results presented in this section concern cases of different object shape complexities, different motions, noisy video sequences, complicated backgrounds, as well as partial occlusion. Finally, a specific application of the proposed method is shown, where the desired objects are human heads. The contour initialization for the first frame of each sequence is done manually. The time-window parameter L (number of past successive frames) is set to 5 for all the sequences under consideration. Additionally, the motion at the very first frame of each sequence is assumed to be zero.
Fig. 10 illustrates the case of tracking an object with a complicated contour (low smoothness) moving in front of a relatively smooth background. In such a case, the weight $w_1$ of equations (24) and (37) is significantly greater than $w_2$ ($[w_1, w_2] = [10, 1]$). Here the desired object (aircraft) is moving towards the camera and, even though the object is rigid, its projection on the image plane deforms (its contour expands) over time.
In Fig. 11 the case of car tracking in six successive frames of a traffic sequence is presented. In this example the desired object (car) is moving towards the camera and, although it is rigid, its projection on the image plane slowly deforms over time, as in the previous example. In this case the sequence is more cluttered, although the car silhouette is smoother. The image pre-smoothing with the ASF and the modified image gradient, used for the external snake energy definition as described in Section II, result in the estimation of such accurate contours.
Fig. 12 illustrates the method's performance in a strongly cluttered sequence, where the object is non-rigid and its motion projection is both rotational and translational rather than a simple translation or expansion/shrinkage. The low accuracy of the method in this case is mainly due to the large uncertainty regions extracted; on the other hand, the object is well detected and localized in each frame of the sequence, owing to the snake's external energy definition through the ASF pre-filtering and the modified image gradient estimation.
In Fig. 13 the proposed approach is applied to a strongly cluttered sequence, where the desired object is a man walking in one direction. The contour of the moving human body deforms strongly over time, resulting in large uncertainty regions, whereas the weights $w_1$ and $w_2$ are estimated as 1 and 10, respectively. The accuracy of the method relies on the snake's external energy definition and the rule-driven approach described in Subsection III-D.
Fig. 14 illustrates a case of two moving objects where, as time goes by, one becomes partially occluded by a static obstacle while the other moves in front of the obstacle. Sub-figs. 14(a1-f1) illustrate the motion estimates, showing that the noise is effectively eliminated on the boundaries between the static and moving regions even when occlusion occurs, whereas the respective sub-figs. 14(a2-f2) show that both objects' contours are estimated with sufficient accuracy, owing to the additional constraints (equations (42) and (43)) imposed on the maximum of equation (31).
In order to demonstrate the efficiency of the proposed approach we also evaluate the results of our technique applied to head (face) extraction. Obtaining quantitative results for such an application area is hard, since no extensive ground-truth databases are available today. Therefore, in order to quantitatively assess the improvement to our algorithm achieved by the addition of the geometrical model, we generated ground-truth masks for available sequences. Fig. 15 shows the ground-truth masks for selected frames. The presence of noise is strong, since these sequences are extracted from TV clips. Consequently, the developed technique faces several difficulties, not only due to occlusion/clutter cases but also due to vaguely defined object borders (weak gradient). The ground-truth database consists of 100 images.
Figs. 16 and 17 show specific applications of the tracking method with and without the geometrical model for head contour tracking. We present representative example frames from the TV clip collection and comment on the various difficulties introduced. Even though the head is a rigid object, its contour deforms on the image plane over time, due to the projection of its motion. The head contours produced by the rule-driven model-based approach (Fig. 17) are obtained using an ellipsoid as shape prior. In order to visualize the way that an ellipsoid (par. II-A.1) affects the contour deformation, we present the results in two ways: we superimpose (a) the ellipse model (soft) and the non-model-based contour (Fig. 16), and (b) the model-based (bold) and the non-model-based contour on the original image (Fig. 17). As illustrated in Fig. 16, an ellipsoid may affect the contour evolution positively (Fig. 16(a1)-(d1) case), since it fits well with the actual head contour, or negatively, due to strong deviations from the actual head shape (Fig. 16(a2)-(d2) case, forehead area). We expect that the competitive nature of the different forces in the total energy formula will produce an acceptable result in terms of accuracy and total shape.
Fig. 17 shows the final results of the proposed technique, which meet our expectations. The obtained contours are smooth and accurately capture the side regions of the head, as shown clearly in Figs. 17(a1)-(d1) and (a2)-(d2). The actual head contour of the Fig. 17(a4)-(d4) case is fairly different from the corresponding "perfect" ellipsoid (Fig. 17(a4)-(d4) case), since our ground-truth generation is merely based on skin presence. However, the final model-based contour seems to be much closer to the shape of a head and can therefore be treated as a better way to verify the presence of a human head in a sequence, even in the case of partial occlusion (due to hair in the considered case).
Additionally, we provide a table with precision and recall measurements for the selected sequences in order to verify the improvement brought by the model-based approach. As mentioned before, we extracted the ground-truth masks based on skin activation. Consequently, we expect slightly lower precision than recall values, since areas covered e.g. by hair may be considered as "head" but are not included in the ground-truth masks. Table I compares the recall/precision values for the 16 selected frames used in Figs. 16-17 and gives the overall values for the 100 images we tested.
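To make the evaluation measure concrete, the following Python sketch computes precision and recall for one frame, assuming the usual pixel-wise definitions on binary masks (the region enclosed by the estimated contour versus the skin-based ground-truth mask). The function name `precision_recall` and the mask representation are assumptions for illustration; the exact evaluation procedure behind Table I is not specified in the text.

```python
import numpy as np

def precision_recall(predicted_mask, ground_truth_mask):
    """Pixel-wise precision and recall of a predicted region (assumed definitions)."""
    pred = np.asarray(predicted_mask, dtype=bool)
    truth = np.asarray(ground_truth_mask, dtype=bool)
    true_pos = np.logical_and(pred, truth).sum()  # pixels marked in both masks
    # Guard against empty masks to avoid division by zero.
    precision = true_pos / pred.sum() if pred.sum() else 0.0
    recall = true_pos / truth.sum() if truth.sum() else 0.0
    return float(precision), float(recall)

# Toy example: the predicted head region covers the skin mask plus an
# extra column (e.g. hair), so recall is perfect while precision drops.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True
precision, recall = precision_recall(pred, truth)
```

The toy masks mirror the situation described above: a contour that also encloses hair fully covers the skin-based ground truth, so precision is lower than recall.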

V. Conclusions and Further Work
In this work we have presented a probabilistic application of Snakes for object tracking in clutter, partial occlusion and complex backgrounds. Statistical measurements of the object contour motion history are extracted to obtain uncertainty regions, in which the estimated contours are to be localized. In this way we constrain the solution to a narrow band around the next frame's snake initialization; moreover, by utilizing various tools from image morphology, we eliminate noise. This approach is extended to cope with complex backgrounds and partial occlusion by introducing rule-based knowledge to separate objects from background and to detect occlusion. Finally, for specific applications where the desired object contours can be approximated by specific models, we use prior shape knowledge in addition to the rule-driven approach so as to obtain more accurate contours.
As indicated before, our goal in this work is to illustrate the increased robustness of the proposed method with the addition of a model, rather than to introduce a novel prior constraint representation. Therefore, the future direction of our work is a more sophisticated representation and use of generalized geometric-based models, which will permit the method to deal even more efficiently with occlusions and to perform tracking under various conditions (e.g. static or mobile camera). In this sense, a possible extension is the incorporation of region-based tracking modules into the existing framework, which would increase robustness. Additionally, covering cases of large occlusion would require extensions of the method, e.g. using a higher-level representation of the moving regions, which is a topic of future research. We are currently examining such issues using "semantics" and ontological knowledge techniques in the framework of [36].

Fig. 1. Curvature and point density distribution of a given contour. (a) The snake is locked at car boundaries, whereas the circled areas denote parts of the curve of high curvature and point density, (b) curvature distribution and (c) point density distribution.

Fig. 2. Proposed model, constraining the obtained solutions to the application of the human head modeling and tracking.

Fig. 5. Qualitative comparison between (a) a representative example of an external energy term, using Gaussian filtering, and (b) the proposed external energy term.
N)]. The final solution, i.e. the desired contour $C^{(I+1)} = [C^{(I+1)}(k) \mid k = 1, \ldots, N]$, is obtained by solving the following equations: ) respectively, CU (C ) are the curvature and the point density values of the contour $C^{(I)}$ at the k-th point. Parameters $w_1$ and $w_2$ represent the weights with which the energy-based terms of equation (24) participate in the minimization procedure, whereas $\mu_1$ and $\mu_2$ control the model's influence on the final solution; these weights are discussed further in paragraph III-C. The set of all possible curves R, defining the uncertainty region, emerges by oscillating the points of the curve $C^{(I+1)}_{init}$

Fig. 6. The proposed tracking approach in steps. (a)-(b) Two successive frames of a face sequence and the respective contours. (c) Amplitude of the standard deviation of the contour mean motion leading to (d) the uncertainty regions of the curve.

Fig. 7. Curvature-based and point density-based forces $F_c$ and $F_d$, respectively, along the initialization of a curve V in the frame I + 1.

$F_d = [F_d(k) \mid k = 1, \ldots, N]$ represents the stretching component that forces points to come closer to or draw away from each other along the curve, and it is always tangential to it. Thus, if the distance between two curve points C is greater than the distance between $V_k$ and $V_{k+1}$, then

Fig. 8. Curvature and external energy terms. (a,d,g) Different cases of curves and background complexity, (b,e,h) respective external energy visualizations and (c,f,i) respective curvature distributions.

Fig. 9. Detection of occlusion using the two rules of eqs. (42) and (43) for the local minimum $p_k$ of the function $1 - g_{m,k}(p)$.
Table I. Recall and precision values, comparing the ground-truth results with the ones obtained using the proposed method with and without the shape prior model.