An advanced Bayesian model for the visual tracking of multiple interacting objects
- Carlos R del Blanco^{1} (corresponding author),
- Fernando Jaureguizar^{1} and
- Narciso García^{1}
https://doi.org/10.1186/1687-6180-2011-130
© del Blanco et al; licensee Springer. 2011
Received: 14 May 2011
Accepted: 12 December 2011
Published: 12 December 2011
Abstract
Visual tracking of multiple objects is a key component of many vision-based systems. While there are reliable algorithms for tracking a single object in constrained scenarios, object tracking remains a challenge in uncontrolled situations involving multiple interacting objects with complex dynamics. In this article, a novel Bayesian model for tracking multiple interacting objects in unrestricted situations is proposed. This is accomplished by means of an advanced object dynamic model that predicts possible interactive behaviors, which in turn depend on the inference of potential object occlusion events. The proposed tracking model can also handle the false and missing detections that are typical of visual object detectors operating in uncontrolled scenarios. In addition, a Rao-Blackwellization technique is used to improve the accuracy of the estimated object trajectories, which is a fundamental aspect of multiple object tracking because of its high dimensionality. The approach has been evaluated on a publicly available database, where it produced fewer tracking errors than a state-of-the-art method in interactions involving trajectory changes during occlusions.
1 Introduction
Visual object tracking is a fundamental part of many video-based systems such as vehicle navigation, traffic monitoring, human-computer interaction, motion-based recognition, and security and surveillance. While there exist reliable algorithms for tracking a single object in constrained scenarios, object tracking remains a challenge in uncontrolled situations involving multiple objects with complex dynamics. The main problem is that object detectors produce a set of unlabeled and unordered detections, whose correspondence with the tracked objects is unknown. The estimation of this correspondence, called the data association problem, is of paramount importance for the proper estimation of the object trajectories. In addition, visual object detectors can produce false and missing detections as a consequence of object appearance changes, illumination variations, occlusions, and scene structures similar to the objects of interest (also called clutter). This complicates the estimation of the true correspondence between detections and objects. Another important issue related to data association is its computational cost, which grows exponentially with the number of objects.
To alleviate the data association problem, the tracking also relies on the prior knowledge about the object dynamics, which constrains the feasible associations between detections and objects. Nonetheless, the modeling of the object dynamics can be a very difficult task, especially in situations in which the objects undergo complex interactions.
Besides, the estimation of the object trajectories can be quite inaccurate in situations involving many objects due to the high dimensionality of the resulting tracking problem, which is called the curse of dimensionality [1].
In this article, an efficient Bayesian tracking framework for multiple interacting objects in complex situations is proposed. Complex object interactions are simulated by means of a novel dynamic model that uses potential object occlusion events to predict different object behaviors. This interacting dynamic model makes it possible to estimate a set of data association hypotheses that are then used to estimate the object trajectories. In addition, a Rao-Blackwellization strategy [2] has been used to derive an approximation of the posterior distribution over the object trajectories, which yields accurate estimates in spite of the high dimensionality.
The article is organized as follows. The state of the art is presented in Section 2. The tracking model for interacting objects is described in Section 3. The inference method used to estimate the object trajectories from the given tracking model is presented in Sections 4, 5, and 6. Results are shown in Section 7, and conclusions are drawn in Section 8.
2 State of the art
Many strategies have been proposed in the scientific literature to solve the data association problem. The simplest one is the global nearest neighbor algorithm [3], also known as the 2D assignment algorithm, which computes a single association between detections and objects. However, this approach discards many feasible associations. At the other extreme, the multiple hypotheses tracker (MHT) [4, 5] attempts to compute all the possible associations over time. However, the number of associations grows exponentially over time, and consequently the computational cost becomes prohibitive. Therefore, a trade-off between computational efficiency and handling of multiple association hypotheses is needed. In this respect, one of the most popular methods is the joint probabilistic data association filter (JPDAF) [6, 7], which performs a soft association between detections and objects. This consists of combining all the detections with all the objects, which prunes away many infeasible hypotheses, but also restricts the data association distribution to be Gaussian. Subsequent works [8, 9] have tried to overcome this limitation by using a mixture of Gaussians to model the data association distribution. However, heuristic techniques are necessary to prune the number of components and make the algorithm computationally manageable. The probabilistic multiple hypotheses tracker (PMHT) [10, 11] assumes that the data association is an independent process to overcome the problems with the pruning. Nevertheless, its performance is similar to that of the JPDAF, while its computational cost is higher.
The data association problem has also been addressed with particle filtering techniques. These can deal with arbitrary data association distributions in a natural way, establishing a compromise between the computational cost and the accuracy of the estimation. In practice, the performance of particle filtering techniques depends on the ability to correctly sample association hypotheses from a proposal distribution. In [12], a Gibbs sampler is used to sample the data association hypotheses, while in [13, 14] a strategy based on Markov Chain Monte Carlo (MCMC) is followed. The main problem with these samplers is that they are iterative methods that need an unknown number of iterations to converge, which can make them inappropriate for online applications. Some works [15–17] overcome this limitation by designing an efficient and non-iterative proposal distribution that depends on the specific characteristics of the tracking system. An additional problem is that the accuracy of the estimated object trajectories can be very poor due to the high dimensionality of the tracking problem. In [18], a variance reduction technique called Rao-Blackwellization has been used to improve the accuracy.
A random finite set (RFS) approach, which treats the collections of objects and detections as finite sets, can be used as an alternative to data association methods. However, the computation of the posterior of an RFS is intractable in general, and therefore the use of approximations is required. In [19], a probability hypothesis density (PHD) filter is used in the context of visual tracking, which approximates the full posterior distribution by its first-order moment. The cardinalized PHD (CPHD) filter [20] is a variation of the PHD filter that is able to propagate the entire probability distribution on the number of objects. In [21], a closed form for the posterior distribution is derived assuming that the image regions that are influenced by individual states do not overlap.
One common shortcoming of the previous works is their limited ability to track interacting objects. They cannot manage complex interactions involving trajectory changes and occlusions, since the assumption that the objects move independently does not hold. Part of the problem comes from the fact that these techniques were developed for radar and sonar applications, in which the dynamics of the target objects have certain physical restrictions that prevent the complex interactions that can occur in visual tracking. In addition, tracked objects in those applications are usually considered as point targets [22]. Therefore, occlusion events between tracked objects are not as problematic as in the field of visual tracking, wherein they are one of the main sources of tracking errors. Some works have proposed specific strategies to deal with the problems that arise in visual tracking. In [23, 24], data association hypotheses are computed using a sampling technique that is able to handle split and merged detections. These types of detections are typical of background subtraction techniques [25], which are used to detect moving objects in video sequences. In [26], an approach for handling object interactions involving occlusions and changes in trajectories is proposed. It creates virtual detections of possibly occluded objects to cope with the changes in trajectories during the occlusions. However, tracking errors can appear when a virtual detection is associated with an object that is actually not occluded. In this article, a novel Bayesian approach that explicitly models the occlusion phenomenon and the object interactions has been developed, which is able to reliably track complex interacting objects whose trajectories change during occlusions.
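As a concrete illustration of the global nearest neighbor idea mentioned above, the following minimal sketch picks the single detection-to-object assignment with the smallest total Euclidean distance. It is a brute-force stand-in rather than the algorithm of [3]; the function name `global_nearest_neighbor`, the distance-based cost, and the assumption that every object has a detection are illustrative choices not taken from the article.

```python
from itertools import permutations
from math import hypot

def global_nearest_neighbor(objects, detections):
    """Return the one assignment (object i -> detection perm[i]) of minimum total distance.

    Brute force over permutations, so only practical for a handful of objects.
    Assumes len(detections) >= len(objects), i.e. no missing detections.
    """
    best_cost, best_assignment = float("inf"), None
    for perm in permutations(range(len(detections)), len(objects)):
        cost = sum(
            hypot(objects[i][0] - detections[j][0], objects[i][1] - detections[j][1])
            for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_assignment = cost, perm
    return best_assignment, best_cost

# Two objects, three detections (the last one plays the role of clutter).
objects = [(0.0, 0.0), (10.0, 0.0)]
detections = [(9.0, 1.0), (1.0, 0.0), (50.0, 50.0)]
assignment, cost = global_nearest_neighbor(objects, detections)
```

Because only the best assignment is kept, every other feasible hypothesis is discarded, which is the weakness that the multi-hypothesis methods discussed in this section try to address.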
3 Bayesian tracking model for multiple interacting objects
where each component contains the 2D position and velocity of a tracked object. The number of tracked objects N_{obj} is variable, but it is assumed that entrances and exits of objects in the scene are known. This makes it possible to focus on the modeling of object interactions.
The sequence of available detections until the current time step is represented by z_{1:t}= {z_{1}, ..., z_{ t }}, where z_{ t } = {z_{ t, j }|j = 1, ..., N_{ ms }} contains the set of detections at the current time step t. The number of detections N_{ ms } can vary at each time step. Each detection z_{ t, j } contains the position of a potential object and a confidence value related to the quality of the detection. Detections are obtained from each frame by means of a set of object detectors, where each detector is specialized in one specific type or category of object. Each detection carries an object category identifier according to the object detector that created it. In addition, some of the computed detections can be false alarms due to clutter, and there can also be objects without any detection, called missing detections, as a consequence of occlusions and changes in the object appearance and illumination.
where each component stores the occlusion information of one object. To express that the i th object is occluded by the l th object, o_{ t, i } = l is written. And, if the object is not occluded, it is expressed as o_{ t, i } = 0.
where the probability term in the denominator is just a normalization constant, and the other terms are explained as follows.
taking into account the conditional independence properties of the involved variables (see [27, 28] for an explanation of how to derive and apply the conditional independence properties given a graphical model). From now on, the conditional independence properties will be applied whenever possible to simplify probability expressions. These properties express three different characteristics of the tracking problem: first, p(x_{ t }| x_{t-1}, o_{ t }), which models the dynamics of interacting objects, depends only on the previous object positions and possible occlusions; second, since the detections are unordered, previous data associations and object positions are useless for the prediction of the current data association p(a_{ t }); and last, p(o_{ t }| x_{t-1}), which models the object occlusions, depends only on the previous object positions.
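Written out, the three properties just listed imply that the joint transition term factorizes into the three named factors. The expression below is a reconstruction from that description, not the article's typeset equation, and the conditioning set on the left-hand side is an assumption:

$$p(x_t, a_t, o_t \mid x_{t-1}, a_{t-1}, o_{t-1}) = p(x_t \mid x_{t-1}, o_t)\; p(a_t)\; p(o_t \mid x_{t-1})$$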
This expression reflects the fact that the data association between detections and objects is necessary for estimating the object trajectories.
Lastly, the object trajectories at the current time step are obtained by computing the maximum a posteriori (MAP) estimation of p(x_{ t }| z_{1:t}).
However, p(x_{ t }, a_{ t }, o_{ t }| z_{1:t}) cannot be solved analytically, and therefore neither can p(x_{ t }|z_{1:t}). This problem arises from the fact that some of the stochastic processes involved in the multiple object tracking model are nonlinear and/or non-Gaussian [29]. To overcome this problem, an approximate inference technique that yields an accurate suboptimal solution is introduced in the next section.
4 Approximate inference based on a Rao-Blackwellized particle filtering
The variance reduction technique known as Rao-Blackwellization has been used to accurately approximate p(x_{ t }, a_{ t }, o_{ t }| z_{1:t}). This technique assumes that the random variables have a special structure that makes it possible to analytically marginalize out some of the variables conditioned on the remaining ones, improving the estimation in high dimensional problems.
where p(x_{ t }| z_{1:t}, a_{ t }, o_{ t }) is assumed to be conditionally linear Gaussian, and therefore to have an analytical expression given by the Kalman filter. This assumption arises from the fact that the object dynamics can be acceptably simulated by a constant velocity model with Gaussian perturbations if the object occlusions and the data association are known, that is, if the main sources of non-linearity and multimodality in the tracking problem are known. Section 5 derives the expression of p(x_{ t }| z_{1:t}, a_{ t }, o_{ t }) using a dynamic model for interacting objects.
where one association depends on the previously computed associations. If one detection fulfills the second and third restrictions, the object association probability is p(a_{ t, j }= i| a_{t,1}, ..., a_{t, j-1}) = p^{obj}, which expresses the prior probability that one detection is associated with one object. Under the same conditions, the clutter association probability is p(a_{ t, j } = 0|a_{t,1}, ..., a_{t, j-1}) = p^{clu}. If any of the restrictions is not fulfilled, the detection is associated with the clutter.
where an occlusion event depends on the previously computed occlusions. The probability that one object is occluded by another, provided that neither object has been involved in previous occlusion events, is expressed by a Gaussian function that depends on the distance between the two considered objects. Under the same conditions, the probability that it is not occluded is determined by the probability density d^{vis}. If any of the considered objects has been involved in previous occlusion events, the occlusion restrictions are applied to avoid unrealistic situations.
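The decomposition that underlies this Rao-Blackwellization follows from the chain rule of probability. It is stated here as a sketch in the notation of the surrounding text rather than as a copy of the article's numbered equation:

$$p(x_t, a_t, o_t \mid z_{1:t}) = p(x_t \mid z_{1:t}, a_t, o_t)\; p(a_t, o_t \mid z_{1:t})$$

The first factor is the conditionally linear Gaussian part handled analytically, while the second factor is a distribution over the discrete association and occlusion variables only, which is the part left to sampling.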
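The occlusion prior described above can be sketched as follows for a single object. This is an illustration under stated assumptions, not the article's exact expression: the Gaussian is taken to be isotropic with a hypothetical standard deviation `sigma_occ`, the value of `d_vis` is arbitrary, the weights are normalized over all candidate labels, and the occlusion restrictions for objects already involved in earlier occlusion events are not modeled.

```python
from math import exp, hypot

def occlusion_probabilities(positions, i, sigma_occ=20.0, d_vis=0.5):
    """Return {label: probability} for the object at index i.

    Label 0 means 'not occluded'; label l >= 1 means 'occluded by the l-th
    object' (1-based, mirroring the o_{t,i} = l convention of the text).
    """
    xi, yi = positions[i]
    weights = {0: d_vis}  # assumed constant weight for the visible case
    for l, (xl, yl) in enumerate(positions):
        if l == i:
            continue  # an object cannot occlude itself
        dist = hypot(xi - xl, yi - yl)
        # Unnormalized Gaussian function of the inter-object distance
        weights[l + 1] = exp(-0.5 * (dist / sigma_occ) ** 2)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Object 0 is 5 pixels from object 1 and far away from object 2.
positions = [(100.0, 100.0), (105.0, 100.0), (400.0, 300.0)]
probs = occlusion_probabilities(positions, i=0)
```

With these illustrative values, an occlusion by the nearby object receives most of the probability mass, while an occlusion by the distant object is practically ruled out.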
where i ∈ {1,..., N_{obj}}, ${\mathbf{r}}_{t,j}^{z}$ and ${\mathbf{r}}_{t,i}^{x}$ are the positional information of the detection and the object, respectively, d^{clu} is the clutter probability density, and Σ^{ lh } is the covariance matrix of the Gaussian function. The previous expression is only applicable between detections and objects of the same category, since the object association probability is zero otherwise.
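To make the interplay between the association prior (p^{obj}, p^{clu}) and the likelihood terms (the Gaussian with covariance Σ^{lh}, the clutter density d^{clu}) concrete, the sketch below samples associations one detection at a time. It is a simplified illustration, not the article's sampler: the isotropic variance `var`, all numeric values, the dictionary layout of objects and detections, and the single restriction "an object takes at most one detection" are assumptions.

```python
import random
from math import exp, pi

def gaussian_2d(r_z, r_x, var):
    """Isotropic 2D Gaussian density, an assumed stand-in for the Sigma^lh Gaussian."""
    dx, dy = r_z[0] - r_x[0], r_z[1] - r_x[1]
    return exp(-0.5 * (dx * dx + dy * dy) / var) / (2.0 * pi * var)

def association_weights(det, objects, taken, p_obj=0.9, p_clu=0.1, d_clu=1e-5, var=25.0):
    """Normalized weights over labels {0: clutter, i >= 1: i-th object} for one detection."""
    weights = {0: p_clu * d_clu}
    for i, obj in enumerate(objects, start=1):
        # Zero probability for already associated objects and for other categories
        if i in taken or obj["category"] != det["category"]:
            continue
        weights[i] = p_obj * gaussian_2d(det["pos"], obj["pos"], var)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

def sample_associations(detections, objects, rng):
    """Sample a_{t,1}, ..., a_{t,N_ms} sequentially, each conditioned on the earlier ones."""
    taken, result = set(), []
    for det in detections:
        w = association_weights(det, objects, taken)
        labels = list(w)
        choice = rng.choices(labels, weights=[w[k] for k in labels])[0]
        if choice != 0:
            taken.add(choice)
        result.append(choice)
    return result

objects = [{"pos": (100.0, 100.0), "category": "player"},
           {"pos": (300.0, 100.0), "category": "player"}]
detections = [{"pos": (101.0, 100.0), "category": "player"},   # close to object 1
              {"pos": (500.0, 400.0), "category": "player"}]   # far from every object
first_weights = association_weights(detections[0], objects, set())
sampled = sample_associations(detections, objects, random.Random(0))
```

The detection far from every object can only be explained as clutter here, because its Gaussian likelihood underflows to zero for both objects.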
The last probability term p(z_{ t }| z_{1:t-1}) in Equation 10 is just a normalization constant.
As with p(x_{ t }, a_{ t }, o_{ t }|z_{1:t}), the posterior pdf p(a_{ t }, o_{ t }| z_{1:t}) has no analytical form. To overcome this problem, an approximate inference method based on a particle filtering framework is used to obtain a suboptimal solution, which is described in Section 6.
5 Conditional Kalman filtering of object trajectories
depending on whether the object is assumed to undergo an interaction or not. The interaction event is governed by a Bernoulli distribution, whose parameter can be adjusted according to the expected number of interactions per occlusion.
The covariance matrix ${\widehat{\mathbf{\Sigma}}}_{t}$ is computed using the standard equations of the Kalman filter, taking into account that the prior covariance for occluded objects should be higher than that for non-occluded ones, since the uncertainty in the trajectory of an occluded object is usually higher.
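A minimal sketch of such a prediction step is given below for one image axis, with state [position, velocity] and a constant velocity model. The way the uncertainty of occluded objects is raised (scaling the process noise by a hypothetical `occlusion_factor`), the white-noise acceleration form of the process noise, and all numeric values are assumptions made for illustration; the article only states that the prior covariance should be higher for occluded objects.

```python
def predict_constant_velocity(mean, cov, dt=1.0, q=1.0, occluded=False, occlusion_factor=4.0):
    """Kalman prediction for one axis with state [position, velocity].

    mean is (pos, vel); cov is a 2x2 covariance as a tuple of row tuples.
    """
    pos, vel = mean
    (p00, p01), (p10, p11) = cov
    # Mean: x' = F x with F = [[1, dt], [0, 1]]
    new_mean = (pos + dt * vel, vel)
    # Covariance: F P F^T, expanded by hand for the 2x2 case
    f00 = p00 + dt * (p01 + p10) + dt * dt * p11
    f01 = p01 + dt * p11
    f10 = p10 + dt * p11
    f11 = p11
    # Process noise Q, inflated when the object is assumed to be occluded
    scale = occlusion_factor if occluded else 1.0
    q00 = scale * q * dt ** 4 / 4.0
    q01 = scale * q * dt ** 3 / 2.0
    q11 = scale * q * dt ** 2
    new_cov = ((f00 + q00, f01 + q01), (f10 + q01, f11 + q11))
    return new_mean, new_cov

prior_mean = (0.0, 2.0)
prior_cov = ((1.0, 0.0), (0.0, 1.0))
vis_mean, vis_cov = predict_constant_velocity(prior_mean, prior_cov)
occ_mean, occ_cov = predict_constant_velocity(prior_mean, prior_cov, occluded=True)
```

Both calls predict the same mean; only the predicted covariance differs, which lets a later update step trust detections more strongly for objects that were hidden.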
where the parameters of the Gaussian function are obtained using the standard expressions of the Kalman filter. The update step is applied only to those objects that have an associated detection, as determined by a_{ t, j }= i; i ∈ {1, ..., N_{obj}}.
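The selective update can be sketched as follows for one image axis, using the standard Kalman update with a position-only measurement. The state layout [position, velocity], the measurement variance `r`, and the helper names `update_position` and `update_tracks` are illustrative assumptions; objects without an associated detection simply keep their predicted mean and covariance.

```python
def update_position(mean, cov, z, r=4.0):
    """Kalman update of state [position, velocity] with a position measurement z (H = [1, 0])."""
    pos, vel = mean
    (p00, p01), (p10, p11) = cov
    s = p00 + r                  # innovation variance H P H^T + R
    k0, k1 = p00 / s, p10 / s    # Kalman gain K = P H^T / s
    innovation = z - pos
    new_mean = (pos + k0 * innovation, vel + k1 * innovation)
    # P' = (I - K H) P, expanded for the 2x2 case
    new_cov = ((p00 - k0 * p00, p01 - k0 * p01),
               (p10 - k1 * p00, p11 - k1 * p01))
    return new_mean, new_cov

def update_tracks(tracks, detections, associations):
    """Apply the update only where associations[j] = i > 0 (1-based object index, 0 = clutter)."""
    updated = list(tracks)
    for j, i in enumerate(associations):
        if i > 0:
            mean, cov = tracks[i - 1]
            updated[i - 1] = update_position(mean, cov, detections[j])
    return updated

tracks = [((0.0, 1.0), ((4.0, 0.0), (0.0, 1.0))),
          ((10.0, 0.0), ((4.0, 0.0), (0.0, 1.0)))]
detections = [2.0, 99.0]     # positions along one axis
associations = [1, 0]        # detection 0 -> object 1, detection 1 -> clutter
updated = update_tracks(tracks, detections, associations)
```

In this example the first track moves halfway toward its detection because the prior and measurement variances are equal, and the second track is left untouched.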
6 Ancestral particle filtering of data association and object occlusions
which is a discrete probability defined in Section 4.
where a_{ t } acts as a parameter of f (x_{ t }; a_{ t }), det() is the determinant function, and Σ_{ f } is the covariance matrix of f (x_{ t }; a_{ t }).
where all the involved probability terms are discrete, and whose mathematical expressions are defined in Sections 4 and 5.
7 Results
The proposed Bayesian tracking model for interacting objects has been evaluated using the public database 'VS-PETS 2003' [31], which contains sequences of a football match. Given the great number and variety of player interactions, this dataset is very suitable for testing purposes.
The proposed tracking algorithm has been compared with the Rao-Blackwellized Monte Carlo data association (RBMCDA) method [18], a state-of-the-art tracking algorithm for multiple objects. Its main characteristics are the ability to handle false and missing detections, and the use of the Rao-Blackwellization technique to achieve accurate estimation in high dimensional state space. The main difference with the algorithm proposed in this article is the lack of an interacting model, which limits its ability to handle object interactions.
Tracking results for the proposed IRBMCDA algorithm and the RBMCDA algorithm used for comparison purposes
Interaction name | Interaction type | Number of players in interaction | Total number of players | Duration of interaction (frames) | Number of errors, IRBMCDA (proposed) | Number of errors, RBMCDA |
---|---|---|---|---|---|---|
interact-1 | Simple cross | 2 | 17 | 46 | 0 | 0 |
interact-2 | Simple cross | 3 | 17 | 72 | 0 | 0 |
interact-3 | Simple cross | 2 | 18 | 48 | 0 | 0 |
interact-4 | Simple cross | 2 | 18 | 50 | 0 | 0 |
interact-5 | Simple cross | 2 | 17 | 123 | 0 | 0 |
interact-6 | Simple cross | 2 | 16 | 99 | 0 | 0 |
interact-7 | Simple cross | 2 | 5 | 37 | 0 | 0 |
interact-8 | Simple cross | 2 | 5 | 56 | 0 | 0 |
interact-9 | Simple cross | 2 | 18 | 73 | 0 | 0 |
interact-10 | Complex cross | 2 | 14 | 36 | 0 | 0 |
interact-11 | Complex cross | 3 | 14 | 56 | 0 | 0 |
interact-12 | Complex cross | 2 | 13 | 55 | 3 | 36 |
interact-13 | Complex cross | 3 | 17 | 78 | 0 | 45 |
interact-14 | Complex cross | 2 | 15 | 69 | 0 | 0 |
interact-15 | Complex cross | 2 | 18 | 61 | 0 | 0 |
interact-16 | Complex cross | 2 | 17 | 113 | 0 | 87 |
interact-17 | Complex cross | 2 | 16 | 109 | 0 | 74 |
interact-18 | Complex cross | 2 | 17 | 50 | 0 | 0 |
interact-19 | Complex cross | 2 | 8 | 92 | 0 | 47 |
interact-20 | Complex cross | 2 | 10 | 126 | 0 | 84 |
interact-21 | Complex cross | 3 | 16 | 45 | 6 | 32 |
interact-22 | Complex cross | 2 | 18 | 38 | 0 | 0 |
interact-23 | Overtaking | 2 | 17 | 95 | 0 | 0 |
interact-24 | Overtaking | 2 | 17 | 60 | 0 | 0 |
interact-25 | Overtaking | 3 | 14 | 94 | 0 | 0 |
interact-26 | Overtaking | 2 | 14 | 35 | 13 | 14 |
interact-27 | Overtaking | 3 | 19 | 89 | 0 | 0 |
interact-28 | Overtaking | 2 | 19 | 29 | 12 | 15 |
interact-29 | Overtaking | 2 | 17 | 108 | 0 | 0 |
interact-30 | Overtaking | 2 | 15 | 90 | 0 | 0 |
interact-31 | Overtaking | 2 | 15 | 89 | 0 | 0 |
interact-32 | Overtaking | 2 | 10 | 27 | 1 | 2 |
interact-33 | Overtaking | 2 | 8 | 63 | 0 | 0 |
interact-35 | Overtaking | 2 | 14 | 100 | 0 | 0 |
interact-36 | Overtaking | 2 | 16 | 45 | 14 | 16 |
The results show that the proposed algorithm clearly outperforms the RBMCDA method in complex crosses, which are the most challenging interactions. The reason is that the RBMCDA method cannot handle trajectory changes during occlusions, since it assumes that the involved objects keep their trajectories unchanged. The proposed IRBMCDA method, in contrast, explicitly considers this situation by computing several object behavior hypotheses. In overtaking actions, the performance of the proposed method is slightly better, and the improvement is more noticeable when the duration of the interaction increases or the object velocities vary during the occlusion. In simple crosses, both algorithms correctly estimate the object trajectories, since the objects do not change their trajectories.
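The per-type comparison can be checked by summing the error columns of the table. The short script below transcribes the (IRBMCDA, RBMCDA) error pairs row by row, in table order, and aggregates them by interaction type; the variable names are illustrative only.

```python
# (IRBMCDA errors, RBMCDA errors) per interaction, copied from the results table
errors = {
    "Simple cross": [(0, 0)] * 9,
    "Complex cross": [(0, 0), (0, 0), (3, 36), (0, 45), (0, 0), (0, 0), (0, 87),
                      (0, 74), (0, 0), (0, 47), (0, 84), (6, 32), (0, 0)],
    "Overtaking": [(0, 0), (0, 0), (0, 0), (13, 14), (0, 0), (12, 15), (0, 0),
                   (0, 0), (0, 0), (1, 2), (0, 0), (0, 0), (14, 16)],
}

# Sum each column within every interaction type
totals = {
    kind: (sum(a for a, _ in rows), sum(b for _, b in rows))
    for kind, rows in errors.items()
}
overall = (sum(t[0] for t in totals.values()), sum(t[1] for t in totals.values()))

for kind, (irbmcda, rbmcda) in totals.items():
    print(f"{kind}: IRBMCDA={irbmcda}, RBMCDA={rbmcda}")
print(f"Overall: IRBMCDA={overall[0]}, RBMCDA={overall[1]}")
```

Summed this way, complex crosses account for 9 errors with IRBMCDA against 405 with RBMCDA, and overtaking actions for 40 against 47, which matches the qualitative description above.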
The main source of errors arises from situations involving players of the same team, since there is not enough information to reliably estimate the data association. A more sophisticated object detector would be needed, which provides richer information such as pose and shape. In spite of this fact, the tracking algorithm is able to identify when the trajectory estimation is not very reliable, since its variance is significantly higher in these cases.
8 Conclusions
A novel Bayesian tracking model for interacting objects has been presented. One of the main contributions is an object dynamic model that is able to simulate the object interactions using the predicted occlusion events among objects. The tracking algorithm is also able to handle false and missing detections through a probabilistic data association stage. For the inference of object trajectories, a Rao-Blackwellized particle filtering technique has been used, which is able to obtain accurate estimates in the presence of a high number of tracked objects. In addition, the presented tracking model can work with any object detector that provides at least positional information. The experiments have shown that the approach is reliable, especially in situations involving complex object interactions where the objects change their trajectories while they are occluded.
Declarations
Acknowledgements
This study has been partially supported by the Ministerio de Ciencia e Innovación of the Spanish Government under the Project TEC2010-20412 (Enhanced 3DTV).
References
- Bellman RE: Dynamic Programming. Courier Dover Publications, New York; 2003.
- Doucet A, Freitas Nd, Murphy KP, Russell SJ: Rao-Blackwellised particle filtering for dynamic Bayesian networks. Proceedings of the Conference on Uncertainty in Artificial Intelligence 2000, 176-183.
- Blackman S: Multiple-target Tracking with Radar Applications. Artech House, Dedham; 1986.
- Reid D: An algorithm for tracking multiple targets. IEEE Trans Automat Control 1979,24(6):843-854. 10.1109/TAC.1979.1102177
- Blackman S: Multiple hypothesis tracking for multiple target tracking. IEEE Trans Aerospace Electronic Syst Mag 2004,19(1):5-18.
- Cox IJ: A review of statistical data association for motion correspondence. Int J Comput Vis 1993,10(1):53-66. 10.1007/BF01440847
- Fortmann T, Bar-Shalom Y, Scheffe M: Sonar tracking of multiple targets using joint probabilistic data association. IEEE J Oceanic Eng 1983,8(3):173-184. 10.1109/JOE.1983.1145560
- Pao LY: Multisensor multitarget mixture reduction algorithms for tracking. J Guidance Control Dynamics 1994, 17: 1205-1211. 10.2514/3.21334
- Salmond D: Mixture reduction algorithms for target tracking in clutter. SPIE Signal and Data Processing of Small Targets 1990,1305(1):434-445.
- Gauvrit H, Le Cadre J: A formulation of multitarget tracking as an incomplete data problem. IEEE Trans Aerospace Electronic Syst 1997, 33: 1242-1257.
- Streit R, Luginbuhl T: Maximum likelihood method for probabilistic multi-hypothesis tracking. SPIE Proceedings of the Signal and Data Processing of Small Targets 1994, 2235: 394-405.
- Hue C, Le Cadre J, Perez P: Tracking multiple objects with particle filtering. IEEE Trans Aerospace Electronic Syst 2002,38(3):791-812. 10.1109/TAES.2002.1039400
- Khan Z, Balch T, Dellaert F: MCMC-based particle filtering for tracking a variable number of interacting targets. IEEE Trans Pattern Anal Mach Intell 2005, 27: 1805-1819.
- del Blanco CR, Jaureguizar F, García N: Robust tracking in aerial imagery based on an ego-motion Bayesian model. EURASIP J Adv Signal Process 2010,2010(30):1-18.
- Gordon N, Doucet A: Sequential Monte Carlo for maneuvering target tracking in clutter. SPIE Proceedings of the Signal and Data Processing of Small Targets 1999, 3809: 493-500.
- Doucet A, Vo B, Andrieu C, Davy M: Particle filtering for multi-target tracking and sensor management. Proceedings of the International Conference on Information Fusion 2002, 1: 474-481.
- Cuevas C, del Blanco CR, Garcia N, Jaureguizar F: Segmentation-tracking feedback approach for high-performance video surveillance applications. IEEE Proceedings of the Southwest Symposium on Image Analysis Interpretation 2010, 41-44.
- Särkkä S, Vehtari A, Lampinen J: Rao-Blackwellized particle filter for multiple target tracking. J Inf Fusion 2007,8(1):2-15. 10.1016/j.inffus.2005.09.009
- Maggio E, Taj M, Cavallaro A: Efficient multitarget visual tracking using random finite sets. IEEE Trans Circuits Syst Video Technol 2008,18(8):1016-1027.
- Mahler R: PHD filters of higher order in target number. IEEE Trans Aerospace Electronic Syst 2007,43(4):1523-1543.
- Vo B-N, Vo B-T, Pham N-T, Suter D: Joint detection and estimation of multiple objects from image observations. IEEE Trans Signal Process 2010,58(10):5129-5141.
- Pulford G: Taxonomy of multiple target tracking methods. IEE Proceedings of the Radar Sonar and Navigation 2005,152(5):291-304. 10.1049/ip-rsn:20045064
- Ma Y, Yu Q, Cohen I: Target tracking with incomplete detection. Comput Vision Image Understanding 2009,113(4):580-587. 10.1016/j.cviu.2009.01.002
- Khan Z, Balch T, Dellaert F: Multitarget tracking with split and merged measurements. IEEE Proceedings of the Conference on Computer Vision and Pattern Recognition 2005, 1: 605-610.
- Piccardi M: Background subtraction techniques: a review. IEEE Proceedings of the International Conference on Systems, Man and Cybernetics 2004, 4: 3099-3104.
- del Blanco CR, Jaureguizar F, Garcia N: Visual tracking of multiple interacting objects through Rao-Blackwellized data association particle filtering. IEEE Proceedings of the International Conference on Image Processing 2010, 821-824.
- Bishop CM: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, Berlin; 2006.
- Lauritzen S: Graphical Models. 1st edition. Clarendon Press, Oxford; 1996.
- Arulampalam S, Maskell S, Gordon N: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans Signal Process 2002, 50: 174-188. 10.1109/78.978374
- MacKay D: Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge; 2003.
- PI INMOVE (2003) VS-PETS 2003 [Online]. Available: http://www.cvg.cs.rdg.ac.uk/VSPETS/vspets-db.html
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.