Pedestrian tracking with an infrared sensor using road network information
EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 26 (2012)
Abstract
This article presents a pedestrian tracking methodology using an infrared sensor for surveillance applications. A distinctive feature of this study compared to the existing pedestrian tracking approaches is that the road network information is utilized for performance enhancement. A multiple model particle filter, which uses two different motion models, is designed for enabling the tracking of both road-constrained (on-road) and unconstrained (off-road) targets. The lateral position of the pedestrians on the walkways is taken into account by a specific on-road target model. The overall framework seamlessly integrates the negative information of occlusion events into the algorithm, and the required modifications are discussed. The resulting algorithm is illustrated on real data from a field trial for different scenarios.
1 Introduction
Accurate pedestrian tracking and anomaly detection are important hot topics in surveillance applications in the security area (see the surveys [1, 2] and the special issue [3]), where currently the demands on the operator are very high. Further, the tracking algorithms integrated in the sensors have the potential to solve some of the integrity problems currently associated with video surveillance. In order to obtain efficient solutions, in terms of both performance and cost, there is a need for automatic processing and analysis of imagery. Multiple pedestrian tracking is a very challenging task due to clutter, occlusion, etc. The exploitation of contextual information, such as maps and terrain information, is therefore highly desirable not only for the enhancement of the tracking performance, but also for behavior analysis and anomaly detection.
This article presents a sensor system with an infrared camera and sophisticated algorithms for pedestrian detection and tracking. The focus here is on the tracking part rather than the detector, which is a classifier trained using a variant of boosting. The proposed multiple pedestrian tracker is a multiple-model particle filter that uses prior information about walkways to enhance the estimation performance. State-of-the-art multiple-model particle filters are used with two different models, namely an on-road (road-constrained) model and an off-road (unconstrained) model, to perform tracking in 3D global coordinates. The proposed algorithms are applied to real-world imagery data where a number of pedestrians are walking around in a park-like environment.
The related literature is vast and spans the areas of research related to several academic communities. For this reason, we defer a more comprehensive survey until Section 2.4 and summarize below just the main contributions of this study compared to the existing literature.

1.
The use of the road network for pedestrian tracking, enabling multiple-model approaches, is novel. To the best of the authors' knowledge, this has not been presented in the literature before.

2.
The use of road network information in target tracking has indeed been proposed earlier for road vehicles observed by a radar sensor, typically GMTI (ground moving target indicator). Compared to the state-of-the-art GMTI-based approaches, the following distinct properties of pedestrian tracking make our study a significant contribution to the road-constrained tracking literature (see [4] and the references therein):

Better angular resolution of the sensors (compared to radar) enables tracking the lateral position on the road.

Pedestrians move much more freely and independently than cars, so the algorithm cannot rely on the motion model to the same extent.

Switches between onroad and offroad modes occur more frequently, increasing the need for robust mode tracking.

3.
The multiple model framework with onroad and offroad modes

gives better tracking performance, regardless of which state-of-the-art algorithm is used (MMPF or IMMPF);

provides improved predictions during occlusion by using the concept of negative information;

serves well for planning the pan/tilt/zoom of the camera via improved predictions;

includes statistical tools that can be used to calculate the switching times, frequency, corresponding positions, and correlation for such events between different pedestrians which makes it possible to learn what is normal behavior. This is in fact a technical enabler for future anomaly detection algorithms.

4.
Although road network information has been used in GMTI-based target tracking before, very few examples include real-world experiments. The algorithm presented in this study is applied to a real-world data set and the resulting estimates are compared to GPS data, which answers some fundamental questions about the accuracy achievable in this type of application.
We finalize this section with a brief outline of the remaining parts of the article as follows. Section 2 introduces the elements of the surveillance problem considered in this article such as surveillance environment, prior knowledge, and sensor system. In particular, a global overview of the multiple pedestrian motion models is given, and the pedestrian image detector is described. The section ends with a literature survey of the related research. Section 3 gives a brief introduction to estimation theory and multiple target tracking from a particle filter perspective. In Section 4 the specific models of on-road/off-road pedestrian motion and the infrared sensor are described in detail and the proposed multiple model pedestrian tracking particle filter is presented. The filter is applied to a real-world data set and the results are illustrated in Section 5. Finally, in Section 6 some conclusions are drawn along with the discussion of the results.
2 Problem description
We consider a surveillance scenario where a sensor system with an infrared camera is monitoring a certain area with a number of known walkways. Detected pedestrians must be tracked simultaneously. The detector and tracking modules would be an essential part of (semi)autonomous surveillance systems corresponding to the autonomous unmanned aerial vehicle (UAV) framework presented in [5, 6] where also sensor management is an important part. The sensor management controls the movement of the sensor platform and the pointing direction of the pan/tilt infrared camera such that the performance of the tracking and monitoring is as good as possible.
One major tool for providing a "situation awareness" of the scene is to estimate interesting states of the environment. These states can have very different properties, depending on the mission and the user requirements, but in this study the position, velocity, etc., of the pedestrians are important. Prior knowledge about the walkway network is used to facilitate the estimation process and thereby improve the tracking performance.
2.1 Multiple pedestrian motion models
The walkway network is available for a park-like environment; see the orthophoto with the network overlaid in Figure 1. An infrared sensor is located south of the area pointing upwards; the approximate sensor footprint on the ground is also shown. One image frame is shown in Figure 2 with the walkway network projected onto the image. We will use the symbolic notation {\mathcal{I}}_{RN} to denote the road network information. (The terms road and walkway are used interchangeably in this article. The terms pedestrian and target are also used interchangeably.)
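Suppose we would like to track pedestrians which can move both on-road and off-road. We consider two different state space representations corresponding to on-road and off-road target modes,
{x}_{t+1}^{r}={f}^{r}\left({x}_{t}^{r},{\mathcal{I}}_{RN},{\eta}_{t}^{r},{\nu}_{t}^{r}\right) \qquad (1)
{x}_{t+1}^{g}={f}^{g}\left({x}_{t}^{g},{\eta}_{t}^{g}\right) \qquad (2)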
where the vectors {x}_{t}^{r}\in {\mathbb{R}}^{{n}_{x}^{r}} and {x}_{t}^{g}\in {\mathbb{R}}^{{n}_{x}^{g}} represent the state vectors of the target in on-road and off-road (global) coordinates, respectively. The functions f^{r}(·) and f^{g}(·) are in general nonlinear functions. The process noise terms {\eta}_{t}^{r}\in {\mathbb{R}}^{{n}_{x}^{r}} and {\eta}_{t}^{g}\in {\mathbb{R}}^{{n}_{x}^{g}} are assumed to be white. The discrete process noise {\nu}_{t+1}^{r}\in \left\{1,2,...,{N}_{r}\left({x}_{t}^{r}\right)\right\} determines which road segment the target will follow in the next sampling interval in case more than one alternative exists. We assume the availability of prior probability density functions (or probability mass functions in the discrete case) {p}_{{\eta}_{t}^{r}}\left(\cdot \right),{p}_{{\eta}_{t}^{g}}\left(\cdot \right), and {p}_{{\nu}_{t}^{r}}\left(\cdot \right) for the random variables {\eta}_{t}^{r},{\eta}_{t}^{g}, and {\nu}_{t}^{r}, respectively.
In order to be able to use both models at the same time, one always needs the appropriate functions to convert the state vectors given in one of the representations into the other representation. For this purpose we assume the availability of two transformation functions named T^{gr}(·) (transformation from road coordinates to global coordinates) and T^{rg}(·) (transformation from global coordinates to road coordinates).
The measurements associated with the target are modeled according to relations
where h^{g}(·) is in general a nonlinear function of the global state of the target and {e}_{t}^{g} is white measurement noise. We assume that the probability density function {p}_{{e}_{t}^{g}}\left(\cdot \right) is available. Note that with this notation, the measurements related to on-road coordinates of the target can be written to satisfy
The hypothesis (event) that the target is moving on-road or off-road is modeled by a discrete variable q_{t} ∈ {1,2} where the events {q_{t} = 1} and {q_{t} = 2} correspond to the hypotheses that the target is on-road and off-road, respectively. According to the value of the variable q_{t}, the corresponding dynamics of the target given in (1) and (2) must be used. It is assumed that q_{t} is a homogeneous, possibly state-dependent, Markov chain with transition probability matrix denoted as Π = [π_{ij}] where
This modeling framework, where the underlying dynamics of the target evolves based on a Markov chain, belongs to the class of so-called jump Markov nonlinear systems in the literature (see [7] and the references therein).
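The mode process can be illustrated with a small simulation. The following is a minimal sketch, assuming the usual convention π_{ij} = P(q_{t} = j | q_{t-1} = i) and a state-independent chain; the numerical values in `PI` and the function names are illustrative assumptions, not taken from the article.

```python
import random

# Hypothetical transition matrix, PI[i][j] = P(q_t = j+1 | q_{t-1} = i+1).
# The numbers are illustrative only, not values from the article.
PI = [[0.95, 0.05],   # on-road  -> (on-road, off-road)
      [0.10, 0.90]]   # off-road -> (on-road, off-road)


def sample_next_mode(q_prev, rng):
    """Draw q_t in {1, 2} given q_{t-1}, using row q_prev of PI."""
    return 1 if rng.random() < PI[q_prev - 1][0] else 2


def simulate_modes(q0, steps, seed=0):
    """Simulate the mode sequence q_0, ..., q_steps of the Markov chain."""
    rng = random.Random(seed)
    modes = [q0]
    for _ in range(steps):
        modes.append(sample_next_mode(modes[-1], rng))
    return modes
```

In a jump Markov filter, each particle would carry its own mode and apply the on-road model (1) or the off-road model (2) according to the sampled value.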
2.2 Infrared sensor system
The experimental sensor system consists of a gyro-stabilized gimbal with IR and CCD video sensors, and an integrated high-performance navigation system. The navigation system combines GPS with data from an inertial measurement unit (IMU) mounted with reference to the optical sensors. However, in the experiments presented in this article external landmarks with known location have also been used to estimate the orientation of the camera relative to the world frame by using standard camera calibration techniques [8].
The IR sensor in the gimbal is a FLIR Systems ThermaCAM SC3000, which is a long-wave infrared (LWIR) sensor with a quantum well infrared photodetector (QWIP) focal plane array. It has a low noise equivalent temperature difference (NETD) of 30 mK. The detector array is composed of 320 × 240 pixels with a comparatively narrow spectral sensitivity of 8.0 to 9.2 μm, which corresponds to the wavelength peak of an equivalent black body radiator at 25°C. The digital output has a resolution of 14 bits/pixel and a frame rate of 50 Hz. The mounted optics has a field-of-view of 20° × 15° which gives a spatial angular resolution of 1.1 mrad per pixel.
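The stated angular resolution follows directly from the field-of-view and the array size. The short check below reproduces the arithmetic; the 100 m range used for the ground footprint of a pixel is an assumed example value, not a figure from the article.

```python
import math

H_FOV_DEG, V_FOV_DEG = 20.0, 15.0   # field-of-view from the text
H_PIXELS, V_PIXELS = 320, 240       # detector array size from the text


def mrad_per_pixel(fov_deg, pixels):
    """Angular extent of one pixel in milliradians."""
    return math.radians(fov_deg) / pixels * 1e3


h_res = mrad_per_pixel(H_FOV_DEG, H_PIXELS)   # about 1.09 mrad
v_res = mrad_per_pixel(V_FOV_DEG, V_PIXELS)   # about 1.09 mrad

# Cross-range size of one pixel at an assumed range of 100 m (small-angle).
ASSUMED_RANGE_M = 100.0
pixel_size_m = ASSUMED_RANGE_M * h_res * 1e-3
```

Both axes give roughly 1.09 mrad per pixel, which rounds to the 1.1 mrad quoted above.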
2.3 Target detector
The detection problem is to find targets in cluttered backgrounds and the output from the detector is a set of image coordinates for all detections in each video frame. In this study a sliding window approach is used to detect pedestrians in cluttered backgrounds [9]. At each image position, the content of a local image region is fed into a classifier that decides whether or not the region contains a target.
The classifier is trained using a variant of boosting [10]. Boosting iteratively builds a highly discriminative classifier by combining the outputs of many component functions often referred to as "weak learners". Applying the resulting classifier to an image window x, the output can be written as F(x) = Σ_{i} f_{i}(x) and the window is classified as containing a target if the confidence sum F(x) is greater than a threshold that is set to achieve an acceptable false alarm rate. Viola and Jones [11] proposed a highly efficient cascade-structured detector architecture where each stage is a boosting classifier that is trained to reject a moderate fraction of the remaining background examples, while retaining a large fraction of the target examples. This leads to an exponential decay in the probability that a retained window belongs to the background class. Another important contribution by [11] is the design of weak learners that can be computed very efficiently.
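The decision rule F(x) = Σ_{i} f_{i}(x) compared against a threshold can be sketched as follows. This is only an illustration with hand-made decision stumps; the feature indices, thresholds, and vote values are hypothetical and are not the trained classifier of the article.

```python
def make_stump(index, threshold, vote):
    """Weak learner: returns +vote if features[index] > threshold, else -vote."""
    def f(features):
        return vote if features[index] > threshold else -vote
    return f


# Hypothetical weak learners (in practice these are selected by boosting).
weak_learners = [
    make_stump(0, 0.5, 0.8),
    make_stump(1, 0.2, 0.5),
    make_stump(2, 0.7, 0.3),
]


def confidence(features):
    """Confidence sum F(x) over all weak learners."""
    return sum(f(features) for f in weak_learners)


def is_target(features, threshold=0.0):
    """Classify the window as a target if F(x) exceeds the threshold."""
    return confidence(features) > threshold
```

Raising `threshold` lowers the false alarm rate at the cost of more missed detections, which is the trade-off the text refers to.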
In the Viola-Jones detection framework each weak learner bases its decision on the response of a single Haar-like image feature, which can be computed very efficiently using a so-called integral image representation. In addition to Haar-like features, our implementation also uses more discriminative (but computationally more expensive) gradient histogram features, similarly to Laptev [12]. We adopt the soft cascade detector architecture [13] which allows for an efficient trade-off between accuracy and speed.
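The efficiency of Haar-like features comes from the integral image, which turns any box sum into four table lookups. The sketch below shows the idea on a plain list-of-lists image; the function names and the particular two-rectangle feature are illustrative assumptions rather than the feature set used in the article.

```python
def integral_image(img):
    """ii[r][c] = sum of img over rows < r and columns < c (zero-padded)."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of the image over a rectangle using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect_horizontal(ii, top, left, height, width):
    """Left half minus right half of a window; width is assumed to be even."""
    half = width // 2
    return (box_sum(ii, top, left, height, half)
            - box_sum(ii, top, left + half, height, half))
```

Once the integral image is built, the cost of each feature is independent of the window size, which is what makes the sliding window search tractable.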
Figure 3 shows an infrared image frame with a number of pedestrian detections. The false alarm rate is very low, and persistent false alarms can easily be handled by the tracking filter, or ignored if the detection location is in unreasonable areas according to prior information of the buildings and environment. Non-persistent clutter is handled by a suitable initiator logic that prevents false alarms from giving rise to new tracks.
2.4 Related research
Visual surveillance and crowd analysis in dynamic scenes with humans are very active research topics in computer vision [2, 14]. The possible applications are numerous, and so is the number of publications in the area.
This article's focus is on the object tracker part of the surveillance system, see [15] and the references therein for an overview. The study [16] is an early publication where a particle filter is used for visual contour tracking. In [17] a mixture particle filter and an Adaboost detector are used to track multiple objects (hockey players) in a video stream. Visual tracking is often performed in the image plane with the benefit of keeping the state dimension low and avoiding the calibration of extrinsic camera parameters, i.e., the location and orientation of the camera relative to a world reference frame. In this study, tracking is performed in global coordinates which simplifies the motion model of the target and also makes it easier to combine with other tracking systems and contextual knowledge about the environment. Tracking in global coordinates with a vision sensor is essentially equivalent to tracking with a bearings-only sensor which has been traditionally treated in the target tracking community, see [18, Chapter 6] and the references therein.
Association is a hard problem, especially with a single camera in crowded environments with occlusions. A hierarchical association approach is proposed in [19] to form the trajectories of the pedestrians. The method also contains an automatic scene structure estimator. The study [20] estimates the probabilities of the occupancy bins in the ground plane represented as a grid. The Viterbi algorithm is then used to estimate target trajectories in a sequence of frames. One common approach for handling occlusion is to use multiple views in order to be able to utilize the depth information. In [21] a planar homography constraint is used to locate the targets on the ground plane. Only the types of occlusion which are due to stationary and known objects like buildings and trees are considered in this study.
In a classic surveillance setup the vision sensors are stationary, but in recent years a number of pedestrian detection and tracking systems have been proposed for moving cameras in automotive applications, see e.g., [22]. The study [23] uses structure-from-motion to estimate the ground plane that supports the target tracking.
Target tracking with road network information requires methodologies which can keep the inherent multimodality of the underlying probability densities. The first attempts [24–26] used jump Markov (non)linear systems in combination with the interacting multiple model (IMM) algorithm [27, 28] with extended Kalman filters (EKFs) as sub-blocks. Since the different road segments correspond to different modes in these IMM algorithms, there are too many of them to be considered at a single step of the multiple model filter. Hence, these algorithms applied the so-called variable structure interacting multiple model (VSIMM) algorithm [29] which adds/removes modes into/from the filter when necessary.
Important alternatives to IMM-based methods appear in [30] and [18, Chapter 10], which propose variable structure multiple model particle filters (VSMMPF) as an extension of the VSIMM approaches. Since the particle filters can handle nonlinear and non-Gaussian models, the user has much more freedom than in VSIMM modeling. The road constraints are handled using the concept of directional process noise. In [31] the roads are 3D curves represented by linear segments and the road network is represented as a graph with roads and intersections as the edges and nodes, respectively. The position and velocity along a single road are modeled by a standard linear Gauss-Markov model. The target can be masked both by the clutter notch of the sensor and by terrain obstacles. The results for a Gaussian sum filter (see also [32]) and a standard bootstrap particle filter approach are presented.
A considerable amount of research effort has been made in the literature for improving particle filter based methods in terms of both performance and computational efficiency. The so-called optimal proposals and Rao-Blackwellization have been utilized to produce more efficient particle filters. In this respect [33] proposes an unscented particle filter (UPF) in a GMTI context and it is shown that fewer particles are needed compared to VSMMPF. Optimal proposal densities are also used in [34]. However, their use unfortunately requires the combinatorial enumeration of all the possible models and the road segments the target can use in the next sampling period, which might be a computational bottleneck. The proposed filter is applied to a GMTI target tracking example and it also utilizes Rao-Blackwellization of the full kinematic state in order to minimize the number of particles, i.e., given the road segment the target is on, the whole kinematic target state is represented by a Gaussian density. A more recent example of the Rao-Blackwellized particle filter is given in [35] to solve the road target tracking problem with a bearings-only observation model. Compared to other Rao-Blackwellized and filter bank approaches [33, 34], this study treats not only the road identity, but also the position along the road as a nonlinear state. This means that the probability densities with multiple modes along a single road can be handled, and this is often the case in tracking applications with a vision sensor when buildings and vegetation are possibly occluding the road.
In the standard bootstrap version of the particle filter, the number of particles in each mode is determined by the posterior probability of that mode. In the case of some unexpected events, like a sudden on-road to off-road transition, particle degeneracy happens if the new mode has too few particles. There are already some alternatives in the literature proposed for establishing robustness against this phenomenon with road networks. An example using the VSMMPF methodology is presented in [36] where a user-selected number of particles can be used in each mode of the filter by making use of the so-called "variable-mass" idea. Another important alternative is the interacting multiple model particle filter (IMMPF) of [7] which is applied to the road target tracking case in [37] with an on-road and an off-road mode.
Recent advances in multiple target tracking [38, 39] have resulted in random set theoretic methods [40] and in [41], an instance of such methods, namely a cardinalized probability hypothesis density (CPHD) filter [42] was presented for multiple ground target tracking. An example, with two groups of targets with four single targets in each group, is given. Track extraction is shown to be faster if the road information is used with the same road network model and observation model (GMTI) as in [31].
3 Multitarget tracking
Classical multitarget tracking consists of three subproblems: detection, association, and estimation [39, 28]. The multitarget tracker used in this study follows this structure, i.e., the detections are treated by an association step where each observation is associated with a known target track. The state of each target is estimated and predicted by a single target filter, and the observations are used to improve the result. If an observation cannot be associated with a known target, a new tentative filter is initialized.
In this section the target tracking problem is described by presenting first the general estimation solution and then the particle filter that is used to compute the posterior estimates. The association problem is briefly described and, in particular, a classical association technique is tailored to the particle representation.
3.1 The general estimation solution
The aim of this section is to introduce the recursive state estimation theory. Let x_{t} denote the state of the target at time t and let y_{t} be an observation of the target at time t. Assume that the target state evolution can be represented as a hidden Markov model composed of the transition model p(x_{t+1}|x_{t}) and the observation likelihood function p(y_{t}|x_{t}). Let y_{1:t} = {y_{1}, y_{2},..., y_{t}} denote the set of all observations up to and including time t. A recursive state estimator is given by the Bayes rule and can be expressed as the well-known measurement update formula
p\left({x}_{t}|{y}_{1:t}\right)=\frac{p\left({y}_{t}|{x}_{t}\right)p\left({x}_{t}|{y}_{1:t-1}\right)}{{\alpha}_{t}} \qquad (6)
and the one step ahead prediction
p\left({x}_{t+1}|{y}_{1:t}\right)=\int p\left({x}_{t+1}|{x}_{t}\right)p\left({x}_{t}|{y}_{1:t}\right)\,d{x}_{t} \qquad (7)
The normalizing factor α_{t} can be calculated as
{\alpha}_{t}=\int p\left({y}_{t}|{x}_{t}\right)p\left({x}_{t}|{y}_{1:t-1}\right)\,d{x}_{t} \qquad (8)
The above equations represent the so-called Bayesian filter and there are only a few cases when it is possible to derive analytical solutions for them. One case is the linear Gaussian case, leading to the well-known Kalman filter (KF). In the general case, numerical approximations are necessary. One common technique is to approximate the target density p(x_{t}|y_{1:t}) by a particle mixture as in the particle filter (PF).
3.2 Particle filter
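In a PF the target density p(x_{t}|y_{1:t}) is approximated by a particle mixture, containing N particles {\left\{{x}_{t}^{\left(i\right)}\right\}}_{i=1}^{N} and their corresponding importance weights {\left\{{w}_{t}^{\left(i\right)}\right\}}_{i=1}^{N}. Thus, the approximation is expressed as
p\left({x}_{t}|{y}_{1:t}\right)\approx {\sum}_{i=1}^{N}{w}_{t}^{\left(i\right)}\delta \left({x}_{t}-{x}_{t}^{\left(i\right)}\right)
where
{\sum}_{i=1}^{N}{w}_{t}^{\left(i\right)}=1,\quad {w}_{t}^{\left(i\right)}\ge 0\ \forall i
and δ(·) is the Dirac delta distribution. This approximation is very suitable for calculating the integral in (7) and it can be shown that this approximation converges to the true solution as the number of particles goes to infinity, see [43] and [44] for the details on particle filtering. The importance weights {\left\{{w}_{t}^{\left(i\right)}\right\}}_{i=1}^{N} are computed using importance sampling where samples {\left\{{x}_{t}^{\left(i\right)}\right\}}_{i=1}^{N} are drawn from a proposal density q(x_{t}|x_{t-1},y_{t}). The filter recursion (6) and (7) can be expressed as
{x}_{t}^{\left(i\right)}\sim q\left({x}_{t}|{x}_{t-1}^{\left(i\right)},{y}_{t}\right),\quad {w}_{t}^{\left(i\right)}\propto {w}_{t-1}^{\left(i\right)}\frac{p\left({y}_{t}|{x}_{t}^{\left(i\right)}\right)p\left({x}_{t}^{\left(i\right)}|{x}_{t-1}^{\left(i\right)}\right)}{q\left({x}_{t}^{\left(i\right)}|{x}_{t-1}^{\left(i\right)},{y}_{t}\right)}
where the weights are normalized such that {\sum}_{j=1}^{N}{w}_{t}^{\left(j\right)}=1. If the proposal density is selected as the state transition model, the filter recursion is simplified to
{x}_{t}^{\left(i\right)}\sim p\left({x}_{t}|{x}_{t-1}^{\left(i\right)}\right),\quad {w}_{t}^{\left(i\right)}\propto {w}_{t-1}^{\left(i\right)}p\left({y}_{t}|{x}_{t}^{\left(i\right)}\right)
This is perhaps the simplest particle filter and is called the bootstrap particle filter (BSPF) [44].
A resampling step is needed to prevent degeneration, see [45] for details. The so-called systematic resampling algorithm was used in this study.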
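One recursion of a bootstrap particle filter with systematic resampling can be sketched as follows. This is a minimal scalar illustration: the random walk motion model, the Gaussian likelihood, and the parameters `q_std` and `r_std` are assumptions made for the example and are not the models of Section 4. Resampling at every step is also a simplification.

```python
import math
import random


def systematic_resample(weights, rng):
    """Return N indices using one uniform offset and N evenly spaced points."""
    n = len(weights)
    u0 = rng.random() / n
    indices, cumulative, i = [], weights[0], 0
    for k in range(n):
        u = u0 + k / n
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


def bootstrap_step(particles, weights, y, rng, q_std=0.5, r_std=1.0):
    """Propagate, weight by the likelihood, normalize, and resample."""
    # Proposal = transition model (assumed scalar random walk).
    particles = [x + rng.gauss(0.0, q_std) for x in particles]
    # Weight update w_t ∝ w_{t-1} p(y_t | x_t), assumed Gaussian likelihood.
    weights = [w * math.exp(-0.5 * ((y - x) / r_std) ** 2)
               for w, x in zip(weights, particles)]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resample and reset to uniform weights.
    idx = systematic_resample(weights, rng)
    n = len(particles)
    return [particles[i] for i in idx], [1.0 / n] * n
```

A practical filter would typically resample only when the effective number of particles falls below a threshold and would guard against all weights underflowing to zero.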
3.3 Association
The detector provides image coordinates of the measurements in each video frame, but it does not provide any information about the correspondence between the measurements at different times. An association method is used to handle this problem. Association is the process of assigning measurements to existing tracks, or existing tracks to measurements.
The association method used in this study is based on the global nearest neighbor (GNN) algorithm [39], but in contrast to the classical GNN where the target densities are assumed to be Gaussians, a more general approach is used here with the particle mixture approximation. Basically, the method computes the likelihood of each possible measurement-to-track correspondence and chooses the most likely global association hypothesis, which gives the origins of all the measurements in the current measurement set. The most likely association of measurements and tracks (or false alarms) is determined using the auction algorithm [39]. Letting P_{D} be the probability of detection, the log likelihood that the measurement j belongs to target k is defined as
A suitable approximation, in the particle filter context, of the predictive likelihood {p}^{k}\left({y}_{t}^{j}|{y}_{1:t-1}\right) is
{p}^{k}\left({y}_{t}^{j}|{y}_{1:t-1}\right)\approx {\sum}_{i=1}^{N}{w}_{t|t-1}^{k\left(i\right)}p\left({y}_{t}^{j}|{x}_{t|t-1}^{k\left(i\right)}\right)
where the particles {x}_{t|t-1}^{k\left(i\right)} are sampled from a proposal density q\left({x}_{t}^{k\left(i\right)}|{x}_{t-1}^{k\left(i\right)},{y}_{t}^{j}\right) and the predictive weights are
{w}_{t|t-1}^{k\left(i\right)}\propto {w}_{t-1}^{k\left(i\right)}\frac{p\left({x}_{t|t-1}^{k\left(i\right)}|{x}_{t-1}^{k\left(i\right)}\right)}{q\left({x}_{t|t-1}^{k\left(i\right)}|{x}_{t-1}^{k\left(i\right)},{y}_{t}^{j}\right)}
A similar calculation was used in [46] in a joint probabilistic data association framework. If the observation model is represented as {y}_{t}^{j}=h\left({x}_{t}\right)+{e}_{t},\ {e}_{t}\sim \mathcal{N}\left({e}_{t};0,R\right), then p\left({y}_{t}^{j}|{x}_{t|t-1}^{k\left(i\right)}\right)=\mathcal{N}\left({y}_{t}^{j};h\left({x}_{t|t-1}^{k\left(i\right)}\right),R\right). If the bootstrap particle filter is used, the weights are {w}_{t|t-1}^{k\left(i\right)}={w}_{t-1}^{k\left(i\right)} and the particles {x}_{t|t-1}^{k\left(i\right)} are obtained by simulating the particles {x}_{t-1}^{k\left(i\right)} according to the motion model.
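For the additive Gaussian case, the particle approximation of the predictive likelihood is a weighted sum of Gaussian densities evaluated at the predicted particles. The sketch below assumes a scalar state with h(x) = x, which is a simplification for illustration only. It also includes, for contrast, the value obtained after collapsing the particle mixture to a single Gaussian.

```python
import math


def normal_pdf(y, mean, std):
    """Density of N(mean, std^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def predictive_likelihood(pred_particles, pred_weights, y, r_std):
    """Sum_i w_i N(y; h(x_i), R) with the assumed scalar model h(x) = x."""
    return sum(w * normal_pdf(y, x, r_std)
               for x, w in zip(pred_particles, pred_weights))


def gaussian_fit_likelihood(pred_particles, pred_weights, y, r_std):
    """Same quantity after replacing the mixture by its mean and variance."""
    mean = sum(w * x for x, w in zip(pred_particles, pred_weights))
    var = sum(w * (x - mean) ** 2 for x, w in zip(pred_particles, pred_weights))
    return normal_pdf(y, mean, math.sqrt(var + r_std ** 2))


# Bimodal predicted density: half of the particles at -5 and half at +5.
particles = [-5.0] * 50 + [5.0] * 50
weights = [1.0 / 100] * 100
```

For this bimodal density the mixture likelihood favors a measurement at 5 over one at 0, while the single-Gaussian approximation favors 0, a location where no particle lies.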
It is assumed that the non-persistent false alarms are uniformly distributed in the image plane and their number is Poisson distributed with rate β_{FA}. The log likelihood that measurement j belongs to a non-persistent false alarm is then given as l_{j,FA} ≜ log(β_{FA}).
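Given a table of log likelihoods, the global hypothesis selection can be illustrated by exhaustive search, which is feasible when the numbers of measurements and tracks are small. Note that this sketch uses brute-force enumeration in place of the auction algorithm, ignores missed-detection terms, and takes the log-likelihood table as given; all names and numbers are hypothetical.

```python
import math
from itertools import product


def best_association(loglik, log_fa):
    """Return a list a where a[j] is the track index of measurement j,
    or None if measurement j is declared a false alarm.

    loglik[j][k] is the log likelihood that measurement j belongs to track k
    and log_fa is the log likelihood of a false alarm.
    """
    n_meas = len(loglik)
    n_tracks = len(loglik[0]) if n_meas else 0
    options = list(range(n_tracks)) + [None]
    best, best_score = None, -math.inf
    for assign in product(options, repeat=n_meas):
        used = [k for k in assign if k is not None]
        if len(used) != len(set(used)):
            continue  # a track may receive at most one measurement
        score = sum(log_fa if k is None else loglik[j][k]
                    for j, k in enumerate(assign))
        if score > best_score:
            best, best_score = list(assign), score
    return best
```

The auction algorithm finds the same optimum without enumerating every hypothesis, which matters when the scene contains many targets.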
Measurements that are not associated to any confirmed or tentative tracks are used to create new tentative tracks. A basic M/N logic [28] is used for determining when a tentative track will be considered as confirmed. If a tentative track is updated with measurements for M out of N consecutive frames, it is considered as a confirmed track. Furthermore, a target is considered as lost and the track is deleted if no measurements are associated to the track for a number of consecutive frames, or the state covariance is too large.
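The track management rules above can be sketched as a small state machine. The values of M, N, and the number of allowed consecutive misses below are illustrative assumptions, and the covariance-based deletion test is omitted for brevity.

```python
from collections import deque


class TentativeTrack:
    """M-out-of-N confirmation and deletion after consecutive misses."""

    def __init__(self, m=3, n=5, max_misses=4):
        self.m = m
        self.max_misses = max_misses
        self.history = deque(maxlen=n)   # last n frames, True if updated
        self.misses = 0                  # consecutive frames without measurement
        self.status = "tentative"

    def update(self, has_measurement):
        """Register one frame and return the resulting track status."""
        self.history.append(has_measurement)
        self.misses = 0 if has_measurement else self.misses + 1
        if self.misses >= self.max_misses:
            self.status = "deleted"
        elif self.status == "tentative" and sum(self.history) >= self.m:
            self.status = "confirmed"
        return self.status
```

With the default values, a track fed the sequence hit, miss, hit, hit is confirmed on the fourth frame, and four consecutive misses delete it.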
Remark 1 Classical target tracking also uses a gating step to exclude very unlikely measurement-to-track assignments. The main purpose is to reduce the overall computational load, since gating is much cheaper to evaluate than association. In this study the gating step is omitted since a reasonable gating criterion needs a similar amount of computational power as the log likelihood measures above. Furthermore, the numbers of detections and targets are quite low in our application; hence, the number of possible assignments is reasonably low.
Example 1 (Association: Particle mixtures vs. Gaussianity assumption) Note that classical association methods often assume Gaussian target densities. The association method presented here does not have such assumptions and will handle the possible multimodal and/or non-Gaussian target densities in a reasonable way. See the example in Figure 4 where the particle mixtures of two targets are shown. The means of the particle mixtures are indicated by a plus symbol and a circle symbol, respectively. Now assume that two detections, which are denoted by stars, have been received. The association method proposed here will associate the lower right detection with target 1 (if the measurement noise is reasonably small, and the P_{FA} is low). This is despite the fact that the mean of target 2 is very close to that detection. A Gaussian density assumption would in fact switch the association decisions, yielding an unreasonable matching.
4 Road-constrained pedestrian tracking with MMPF
In this section the on-road and off-road motion models and the observation model are described in more detail compared to the introduction in Section 2.1. After the specific models are presented, the multiple-model particle filter algorithm is described and some implementation issues are also considered.
4.1 On-road motion model
In a geographic information system (GIS) different forms of geographically referenced information can be analyzed and displayed. There are two classical methods to store GIS data: raster data (images) and vector data. Different geometrical types can be described by vector data and there are three broad categories: zero-dimensional points are used to represent points of interest, lines are used to represent linear features such as roads and topological lines, and polygons are used to represent particular areas such as lakes. There exist many approaches to store geospatial vector data and one common representation is the Environmental Systems Research Institute (ESRI) shapefile [47].
For target tracking purposes it is sometimes convenient to have a slightly different representation with redundant information to facilitate and speed up the data processing. In such a case, one data structure represents the roads and this structure contains the road stretch and the corresponding attributes. This structure is more or less the raw shape data plus an ID number for each road and an intersection ID for each road end. An additional structure is used for the intersections and it contains the location and all connected roads (IDs) of each intersection. The exact structure of the data depends on what type of additional information is included, such as travel direction and prior probabilities for roads at an intersection.
In this study the road network information {\mathcal{I}}_{RN} contains the two data structures mentioned above. The road information structures contain the following fields:

ID - unique road ID

N - number of road segments

X - (3 × N) vector with 3D coordinates

d - (1 × N) vector with the cumulative distances of all road segments

w - width of the road

i_{1} - (1 × N_{1}) vector containing the intersection ID of N_{1} roads connected to the start intersection

i_{2} - (1 × N_{2}) vector containing the intersection ID of N_{2} roads connected to the end intersection

p_{1} - (1 × N_{1}) vector containing prior probabilities of each connected road to the start intersection

p_{2} - (1 × N_{2}) vector containing prior probabilities of each connected road to the end intersection

and the intersection structure contains

ID^{i} - unique intersection ID

X^{i} - (3 × 1) 3D location of the intersection

N^{r} - number of connecting roads

ID^{r} - (1 × N^{r}) vector with IDs of the connecting roads
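The two structures can be mirrored directly in code. The following sketch uses Python dataclasses; the class and attribute names are hypothetical, the cumulative distance vector is computed from the vertices with d[0] = 0 (an assumed convention), and i1/i2 are interpreted as the IDs of the roads connected at each end, following how they are used when a new road is selected.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class Road:
    road_id: int                                          # ID
    X: List[Point3]                                       # 3D polyline vertices
    w: float                                              # road width
    i1: List[int] = field(default_factory=list)           # roads at start
    i2: List[int] = field(default_factory=list)           # roads at end
    p1: List[float] = field(default_factory=list)         # priors for i1
    p2: List[float] = field(default_factory=list)         # priors for i2
    d: List[float] = field(init=False, default_factory=list)

    def __post_init__(self):
        # Cumulative distance along the polyline, starting at zero.
        self.d = [0.0]
        for a, b in zip(self.X, self.X[1:]):
            self.d.append(self.d[-1] + math.dist(a, b))

    @property
    def length(self) -> float:
        """Total road length, i.e. the last element of d."""
        return self.d[-1]


@dataclass
class Intersection:
    intersection_id: int                                  # ID^i
    X: Point3                                             # 3D location
    road_ids: List[int] = field(default_factory=list)     # ID^r

    @property
    def n_roads(self) -> int:
        """Number of connecting roads, N^r."""
        return len(self.road_ids)
```

Storing the cumulative distances is the kind of redundancy mentioned above: it lets the segment containing a given longitudinal position be found by a search in d rather than by re-measuring the polyline.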
Assume that a road network description as defined above is available. The target is assumed to be on one of the roads all the time. A curvilinear coordinate system is defined for each road. Which road a target currently travels on is described by a mode parameter m. Let x^{r} ∈ [0, [d]_{N}] be the longitudinal position along the road relative to the road start ([d]_{N} is the last element in the cumulative distance vector d, or in other words, the total length of the road). v^{r} is the longitudinal speed and y^{r} and z^{r} are the lateral and the vertical deviation relative to the road, respectively.
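A transformation from road coordinates to a global position, in the spirit of T^{gr}(·), can be sketched by interpolating along the polyline and adding the lateral and vertical offsets. The conventions below are assumptions made for illustration: the lateral offset is positive to the left of the travel direction in the horizontal plane, the vertical offset is added along the global z-axis, and consecutive vertices are distinct.

```python
import bisect
import math


def road_to_global(vertices, d, xr, yr=0.0, zr=0.0):
    """Map road coordinates (xr, yr, zr) to a global 3D position.

    vertices is a list of (x, y, z) polyline points and d holds the
    cumulative distances along the polyline with d[0] == 0.
    """
    xr = min(max(xr, 0.0), d[-1])            # clamp to the road extent
    k = min(bisect.bisect_right(d, xr) - 1, len(vertices) - 2)
    (x0, y0, z0), (x1, y1, z1) = vertices[k], vertices[k + 1]
    s = (xr - d[k]) / (d[k + 1] - d[k])      # fraction along segment k
    # Horizontal left-pointing unit normal of the segment.
    hx, hy = x1 - x0, y1 - y0
    norm = math.hypot(hx, hy)
    nx, ny = (-hy / norm, hx / norm) if norm > 0.0 else (0.0, 0.0)
    return (x0 + s * (x1 - x0) + yr * nx,
            y0 + s * (y1 - y0) + yr * ny,
            z0 + s * (z1 - z0) + zr)
```

For an L-shaped road with vertices (0,0,0), (10,0,0), (10,10,0), the road position x^{r} = 15 with a lateral offset of 1 maps to (9, 5, 0) under these conventions.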
The onroad state vector is defined as x^{r} ≜ (x^{r} y^{r} z^{r} v^{r})^{T} and the dynamic target model $f^r(x_t^r, \mathcal{I}_{RN}, \eta_t^r, \nu_t^r)$ in (1) can, as long as the target stays on the same road, be expressed as the linear discrete-time model
where the process noise is $\eta_t^r \sim \mathcal{N}(0, Q^r)$ and $\beta_i \in \{\beta \mid 0 < \beta \le 1\}$, i = y^{r}, z^{r}, are constants.
Thus, the target state is updated according to the linear model in (18), but a feasibility check is needed after every update. If the target has passed an intersection and is outside the current road, a nonlinear state update is also needed. A new road connected to that intersection is selected randomly among the roads i_{1/2} according to a discrete random variable $\nu^r$ distributed according to the road probabilities p_{1/2}. In such a case, the mode parameter m_{t+1} is set to the new road and the longitudinal distance travelled beyond the end of the old road is used to update $\mathsf{x}_{t+1}^r$. Note that the directions of the old and new roads affect the update of $\mathsf{x}_{t+1}^r$. Furthermore, the sign of the longitudinal velocity $\mathsf{v}_{t+1}^r$ needs to be changed if the travel directions on the two roads are opposite.
The standard choice for the constants $\beta_{\mathsf{y}^r}$ and $\beta_{\mathsf{z}^r}$ is 1, but β_{i} < 1 can be used to constrain the standard deviation of the state i. In practice, if 0 < β_{i} < 1 and no observations of the target are received, the state i will decay towards zero. This is in general a reasonable behavior since we do not want the prediction to deviate too much from the actual road network.
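The effect of the β constants can be sketched numerically. The transition matrix below follows the description in the text (constant longitudinal speed, lateral and vertical deviations scaled by β). The noise gain `G` and the three-dimensional noise vector are assumptions made only for this illustration, and the function name `onroad_predict` is hypothetical.

```python
import numpy as np


def onroad_predict(x, T, beta_y, beta_z, eta=None):
    """One same-road prediction step for x = (x_r, y_r, z_r, v_r)."""
    F = np.array([[1.0, 0.0,    0.0,    T],
                  [0.0, beta_y, 0.0,    0.0],
                  [0.0, 0.0,    beta_z, 0.0],
                  [0.0, 0.0,    0.0,    1.0]])
    # Assumed noise input: longitudinal acceleration plus lateral and vertical rates.
    G = np.array([[T**2 / 2, 0.0, 0.0],
                  [0.0,      T,   0.0],
                  [0.0,      0.0, T],
                  [T,        0.0, 0.0]])
    if eta is None:
        eta = np.zeros(3)  # noise-free prediction
    return F @ x + G @ eta


# Predict 10 s ahead at 10 Hz without any measurement or noise.
x = np.array([10.0, 1.0, 0.5, 1.5])
for _ in range(100):
    x = onroad_predict(x, T=0.1, beta_y=0.96, beta_z=0.99)
```

In this noise-free run the lateral deviation shrinks by a factor of 0.96 per step, while the longitudinal position advances at the constant speed. Handling of intersections (road selection and sign changes) is deliberately left out of the sketch.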
4.2 Offroad motion model
The offroad motion model {f}^{g}\left({x}_{t}^{g},{\eta}_{t}^{g}\right) in (2) is selected to be the following constant velocity model with the state vector x^{g}= (x^{g}y^{g}z^{g}v^{g}ψ)^{T}, where x^{g},y^{g},z^{g}is the 3D location in a global Cartesian reference system, v^{g}is the translational speed in the x^{g}y^{g}plane, and ψ is the course. The model is expressed as
where $\beta_{\mathsf{z}^g} \in \{\beta \mid 0 < \beta \le 1\}$ is a constant design parameter. The process noise is distributed as $\eta_t^g \sim \mathcal{N}(0, Q^g)$ and ideally Q^{g} is state dependent, but in this study only constant covariance matrices are considered for simplicity.
Remark 2 (Incorporating the ground model) The default value of the constant $\beta_{\mathsf{z}^g}$ is 1, but in the case of a stationary bearings-only sensor the constant needs to be less than 1 to make the estimation problem observable. An implicit incorporation of a known ground model into the problem is possible here by defining the state z^{g} as the deviation from the ground model.
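A constant velocity model with the state (x^{g} y^{g} z^{g} v^{g} ψ)^{T} can be sketched as below. The course convention (ψ measured counter-clockwise from the x^{g} axis), the way the noise enters, and the name `offroad_predict` are assumptions for illustration only; the article's own equation may differ in these details.

```python
import numpy as np


def offroad_predict(x, T, beta_z, eta=None):
    """One prediction step for x = (x_g, y_g, z_g, v_g, psi)."""
    xg, yg, zg, v, psi = x
    if eta is None:
        eta = np.zeros(3)  # assumed noise: (speed rate, vertical rate, course rate)
    return np.array([
        xg + T * v * np.cos(psi),     # horizontal motion along the course
        yg + T * v * np.sin(psi),
        beta_z * zg + T * eta[1],     # vertical state pulled towards the ground model
        v + T * eta[0],               # nearly constant speed
        psi + T * eta[2],             # nearly constant course
    ])


x_next = offroad_predict(np.array([0.0, 0.0, 1.0, 2.0, np.pi / 2]), T=0.5, beta_z=0.99)
```

With z^{g} defined as the deviation from a ground model, β < 1 makes an unobserved vertical state relax back to the ground surface, which is the mechanism Remark 2 relies on.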
4.3 On/offroad transformations
As mentioned in Section 2.1 we need appropriate functions to convert the state vectors given in one of the representations into the other representation.
The function T^{gr}(·) converts a state vector given in onroad coordinates to offroad (global) coordinates. This is generally an easy task and the global 3D position is found by interpolation. The function to be interpolated is given by the array X in $\mathcal{I}_{RN}$, sampled at the cumulative distances d.
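The interpolation can be sketched with `numpy.interp`. The treatment of the lateral deviation (a horizontal normal pointing to the left of the road direction), of the vertical deviation (along the global z axis) and of the course are assumptions of this sketch, as is the function name `T_gr`.

```python
import numpy as np


def T_gr(x_road, X, d):
    """Map an onroad state (x_r, y_r, z_r, v_r) to (x_g, y_g, z_g, v_g, psi)."""
    x_r, y_r, z_r, v_r = x_road
    # Centre-line position: interpolate each coordinate of X at arc length x_r.
    p = np.array([np.interp(x_r, d, X[k]) for k in range(3)])
    # Index of the segment that contains x_r (clipped so the road end is valid).
    k = int(np.clip(np.searchsorted(d, x_r, side="right") - 1, 0, len(d) - 2))
    t = X[:, k + 1] - X[:, k]
    heading = np.arctan2(t[1], t[0])
    normal = np.array([-np.sin(heading), np.cos(heading), 0.0])  # assumed: left of the road
    p = p + y_r * normal + np.array([0.0, 0.0, z_r])
    # A negative longitudinal speed means travel against the road direction.
    psi = heading if v_r >= 0 else heading + np.pi
    return np.array([p[0], p[1], p[2], abs(v_r), psi])


X = np.array([[0.0, 10.0, 10.0], [0.0, 0.0, 10.0], [0.0, 0.0, 0.0]])
d = np.array([0.0, 10.0, 20.0])
first_leg = T_gr(np.array([5.0, 1.0, 0.0, 1.5]), X, d)
second_leg = T_gr(np.array([15.0, 0.0, 0.0, 1.0]), X, d)
```

The sketch ignores road slope when converting the longitudinal speed to a horizontal speed, which is a reasonable simplification only for nearly flat walkways.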
The function T^{rg}(·), on the other hand, has to find the closest onroad coordinate state corresponding to a state vector in global coordinates. This is more involved in that one generally has to search in the road database for the closest point on the road network to the position component of the global state vector and has to project the velocity and other quantities onto their equivalents in the road network. It might also be useful to have a feasibility test by just checking if the lateral deviation state y^{r} is smaller than the road width (denoted as w in {\mathcal{I}}_{RN}).
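The position part of this search can be sketched for a single road as a point-to-polyline projection; a full T^{rg}(·) would repeat it over all candidate roads and also project the velocity. The function names, the sign convention for the lateral deviation (positive to the left of the road direction) and the use of horizontal distance are assumptions of this sketch, and consecutive vertices are assumed to be distinct.

```python
import numpy as np


def closest_point_on_road(p, X, d):
    """Return (x_r, y_r, dist) for the horizontal projection of p onto the road."""
    best = None
    for k in range(X.shape[1] - 1):
        a, b = X[:2, k], X[:2, k + 1]
        t = b - a
        seg_len2 = float(t @ t)
        # Normalised position of the foot point on segment k, clipped to the segment.
        s = float(np.clip(((p[:2] - a) @ t) / seg_len2, 0.0, 1.0))
        diff = p[:2] - (a + s * t)
        dist = float(np.linalg.norm(diff))
        if best is None or dist < best[2]:
            y_r = float(t[0] * diff[1] - t[1] * diff[0]) / np.sqrt(seg_len2)
            best = (d[k] + s * (d[k + 1] - d[k]), y_r, dist)
    return best


def is_feasible(dist, w):
    # Feasibility test from the text: deviation smaller than the road width w.
    return dist < w


X = np.array([[0.0, 10.0, 10.0], [0.0, 0.0, 10.0], [0.0, 0.0, 0.0]])
d = np.array([0.0, 10.0, 20.0])
x_r, y_r, dist = closest_point_on_road(np.array([5.0, 1.0, 0.0]), X, d)
```

The feasibility test uses the distance to the foot point rather than y^{r} alone, so that a point beyond the end of the road is not accepted by accident.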
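A globalization function T^{g}(·,·), which returns the state in global coordinates whatever the mode q of the particle (q = 1 onroad, q = 2 offroad, as in Algorithm 1), is defined for later use as
$T^g(x, q) \triangleq \begin{cases} T^{gr}(x, \mathcal{I}_{RN}), & q = 1, \\ x, & q = 2. \end{cases}$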
4.4 Observation model
A detection consists of the image coordinate and the height and width of the detection window. In the tracking filter the location of the feet of the pedestrian is used, so a foot detector is also needed. The position of the feet is transformed to azimuth and inclination angles given the perspective projection formula and knowledge of the sensor orientation and the intrinsic camera parameters. Thus, the observation model is a bearings-only model where the azimuth and inclination describe the direction to the target relative to the sensor platform.
Let x^{s}= (x^{s}y^{s}z^{s})^{T} be the position of the sensor relative to a global Cartesian reference system. An observation at time t is the relative angles between the sensor in x^{s}and the target in x^{g}, i.e.,
where e_{t} is the measurement noise modeled according to the Student's T-distribution
where $\nu$ is the number of degrees of freedom. Note that the Gaussian probability distribution $\mathcal{N}(x; \mu, \Sigma)$ is a special case of the Student's T-distribution $\mathcal{S}t(x; \mu, \Sigma, \nu)$, obtained in the limit $\nu \to \infty$. For $1 \le \nu < \infty$ the distribution resembles a Gaussian function but with heavier tails. The Student's T-distribution was selected because early empirical trials showed that it makes the PF more robust to outliers.
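A bearings-only measurement function and a Student's T log-likelihood can be sketched as follows. The angle conventions (azimuth from the x axis, inclination positive upwards) and the function names are assumptions of this sketch; the density is the standard multivariate Student's T form.

```python
import numpy as np
from math import lgamma


def h(x_g, x_s):
    """Azimuth and inclination from the sensor at x_s to the target position in x_g."""
    dx, dy, dz = np.asarray(x_g)[:3] - np.asarray(x_s)
    return np.array([np.arctan2(dy, dx), np.arctan2(dz, np.hypot(dx, dy))])


def student_t_logpdf(e, Sigma, nu):
    """Log-density of a zero-mean multivariate Student's T with scale matrix Sigma."""
    p = len(e)
    maha = float(e @ np.linalg.solve(Sigma, e))
    logdet = np.linalg.slogdet(Sigma)[1]
    return (lgamma((nu + p) / 2) - lgamma(nu / 2) - 0.5 * p * np.log(nu * np.pi)
            - 0.5 * logdet - 0.5 * (nu + p) * np.log1p(maha / nu))


def gaussian_logpdf(e, Sigma):
    p = len(e)
    maha = float(e @ np.linalg.solve(Sigma, e))
    return -0.5 * (maha + np.linalg.slogdet(Sigma)[1] + p * np.log(2 * np.pi))


angles = h(np.array([0.0, 100.0, 0.0]), np.array([0.0, 0.0, 100.0]))
Sigma = 0.004**2 * np.eye(2)
outlier = np.array([5 * 0.004, 0.0])          # a 5-sigma azimuth error
lt, lg = student_t_logpdf(outlier, Sigma, nu=10), gaussian_logpdf(outlier, Sigma)
```

For the 5-sigma outlier the Student's T log-likelihood is clearly larger than the Gaussian one, so a single bad detection reduces the weights of otherwise good particles much less severely.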
Remark 3 (Observability) It is well known that the observability in bearings-only tracking is highly dependent on the sensor trajectory, see [6] and references therein. In particular, for a stationary camera some additional information is required, e.g., a road network or a ground elevation model, see Remark 2.
4.5 Multiplemodel PF
In a multiple-model particle filter (MMPF) one keeps the particles $\{x_t^{(i)}, q_t^{(i)}\}_{i=1}^{N_p}$ and their weights $\{w_t^{(i)}\}_{i=1}^{N_p}$, where $x_t^{(i)}$ is the state of the particle with respect to either road coordinates ($x_t^{r,(i)}$) or global coordinates ($x_t^{g,(i)}$) according to the value of the onroad/offroad hypothesis variable $q_t^{(i)}$, i.e.,
Having these particles one can always calculate the density of the state of the target in global coordinates as
Using the density function (24), the minimum mean square error estimate of the target state in global coordinates is given by
with a covariance
With each measurement, the particle filter calculates the updated particles $\{x_t^{(i)}, q_t^{(i)}\}_{i=1}^{N_p}$ and their weights $\{w_t^{(i)}\}_{i=1}^{N_p}$ from the corresponding previous particles $\{x_{t-1}^{(i)}, q_{t-1}^{(i)}\}_{i=1}^{N_p}$ and weights $\{w_{t-1}^{(i)}\}_{i=1}^{N_p}$.
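The point estimate and its covariance are the weighted sample mean and the weighted sample covariance of the particles once they are all expressed in global coordinates. A minimal sketch, assuming the particles have already been globalized and stacked in an array (the name `mmse_estimate` is hypothetical):

```python
import numpy as np


def mmse_estimate(X_glob, w):
    """Weighted mean and covariance of particles X_glob (Np, n) with weights w (Np,)."""
    x_hat = w @ X_glob                      # sum_i w_i * x_i
    D = X_glob - x_hat                      # deviations from the mean
    P = (w[:, None] * D).T @ D              # sum_i w_i * d_i d_i^T
    return x_hat, P


X_glob = np.array([[0.0, 0.0], [2.0, 0.0]])
w = np.array([0.5, 0.5])
x_hat, P = mmse_estimate(X_glob, w)
```

Averaging is only safe for components that live on a linear scale. A course angle near ±π would need a circular mean, so in practice one might restrict the computation to the position components.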
A single step of the bootstrap version of the MMPF is summarized below.
Algorithm 1 (MMPF) Suppose we have the previous particles $\{x_{t-1}^{(i)}, q_{t-1}^{(i)}\}_{i=1}^{N_p}$ and weights $\{w_{t-1}^{(i)}\}_{i=1}^{N_p}$ available and we have received a new measurement y_{t}.

1. Resampling: Sample $\{\tilde{x}_{t-1}^{(i)}, \tilde{q}_{t-1}^{(i)}\}_{i=1}^{N_p}$ from $\{x_{t-1}^{(i)}, q_{t-1}^{(i)}\}_{i=1}^{N_p}$ according to the weights $\{w_{t-1}^{(i)}\}_{i=1}^{N_p}$ such that
$P(\tilde{x}_{t-1}^{(i)} = x_{t-1}^{(j)}, \tilde{q}_{t-1}^{(i)} = q_{t-1}^{(j)}) = w_{t-1}^{(j)}$ (27)
for each i = 1, ..., N_{p}.

2. Prediction Step:
(a) Sample $q_t^{(i)}$ from $\tilde{q}_{t-1}^{(i)}$ such that
$P(q_t^{(i)} \mid \tilde{q}_{t-1}^{(i)}) = \pi_{\tilde{q}_{t-1}^{(i)} q_t^{(i)}}$ (28)
for each i = 1, ..., N_{p}.
(b) For each i = 1, ..., N_{p}, generate $x_t^{(i)}$ from $\tilde{x}_{t-1}^{(i)}$, $q_t^{(i)}$ and $\tilde{q}_{t-1}^{(i)}$ by using samples from the process noise sequences $\eta_t^{r,(i)} \sim p_{\eta_t^r}(\cdot)$, $\eta_t^{g,(i)} \sim p_{\eta_t^g}(\cdot)$ and $\nu_t^{r,(i)} \sim p_{\nu_t^r}(\cdot)$ according to:

If $\tilde{q}_{t-1}^{(i)} = 1$, $q_t^{(i)} = 1$ then
$x_t^{(i)} = f^r(\tilde{x}_{t-1}^{(i)}, \mathcal{I}_{RN}, \eta_t^{r,(i)}, \nu_t^{r,(i)})$, (29)
If $\tilde{q}_{t-1}^{(i)} = 1$, $q_t^{(i)} = 2$ then
$x_t^{(i)} = f^g(T^{gr}(\tilde{x}_{t-1}^{(i)}, \mathcal{I}_{RN}), \eta_t^{g,(i)})$, (30)
If $\tilde{q}_{t-1}^{(i)} = 2$, $q_t^{(i)} = 1$ then
$x_t^{(i)} = f^r(T^{rg}(\tilde{x}_{t-1}^{(i)}, \mathcal{I}_{RN}), \mathcal{I}_{RN}, \eta_t^{r,(i)}, \nu_t^{r,(i)})$, (31)
If $\tilde{q}_{t-1}^{(i)} = 2$, $q_t^{(i)} = 2$ then
$x_t^{(i)} = f^g(\tilde{x}_{t-1}^{(i)}, \eta_t^{g,(i)})$. (32)

3. Update Step: Set $w_t^{(i)}$ as
$w_t^{(i)} \propto p_e(y_t - h(T^g(x_t^{(i)}, q_t^{(i)})))$ (33)
such that $\sum_{i=1}^{N_p} w_t^{(i)} = 1$.
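The three steps of Algorithm 1 can be sketched in a few lines. This is an illustrative skeleton rather than the authors' code: the motion models, coordinate transformations, measurement function and noise log-density are passed in as hypothetical callables (each motion model is assumed to draw its own process noise), and the transition matrix `Pi` is taken as constant, so the state-dependent feasibility check is not included.

```python
import numpy as np


def mmpf_step(x, q, w, y, Pi, f_r, f_g, T_gr, T_rg, h, log_pe, rng):
    """One bootstrap MMPF step; q[i] is 1 (onroad) or 2 (offroad)."""
    Np = len(w)
    # 1. Resampling (27): draw ancestors with probability equal to the old weights.
    idx = rng.choice(Np, size=Np, p=w)
    x_new, q_new, logw = [], np.empty(Np, dtype=int), np.empty(Np)
    for i, j in enumerate(idx):
        x_prev, q_prev = x[j], int(q[j])
        # 2(a). Mode sampling (28) from row q_prev of the transition matrix.
        q_i = int(rng.choice([1, 2], p=Pi[q_prev - 1]))
        # 2(b). State prediction (29)-(32), converting coordinates on a mode switch.
        if q_i == 1:
            x_i = f_r(x_prev if q_prev == 1 else T_rg(x_prev))
        else:
            x_i = f_g(T_gr(x_prev) if q_prev == 1 else x_prev)
        # 3. Update (33): noise log-density at the innovation, in global coordinates.
        x_glob = T_gr(x_i) if q_i == 1 else x_i
        logw[i] = log_pe(y - h(x_glob))
        x_new.append(x_i)
        q_new[i] = q_i
    w_new = np.exp(logw - logw.max())       # subtract the maximum for numerical stability
    return x_new, q_new, w_new / w_new.sum()


# Toy usage with scalar states and identity models, only to show the call pattern.
rng = np.random.default_rng(0)
Np = 50
ident = lambda s: s
x0 = [np.array([float(i)]) for i in range(Np)]
x1, q1, w1 = mmpf_step(x0, np.ones(Np, dtype=int), np.full(Np, 1 / Np),
                       np.array([10.0]), np.array([[0.9, 0.1], [0.2, 0.8]]),
                       ident, ident, ident, ident, ident,
                       lambda e: -0.5 * float(e @ e), rng)
```

Working with log-weights avoids underflow when the measurement noise is small compared with the spread of the predicted particles.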
Remark 4 (Feasibility Check) When a particle is selected to be transformed from the offroad mode to the onroad mode, a feasibility check of the new onroad state is done according to Section 4.3 (basically, check whether the particle is close to a road or not). If the state is not feasible, that particle will not be transformed and will therefore continue being in the offroad mode. Since an onroad state can always be transformed to an offroad state, a similar feasibility test is not needed in the opposite case. This will formally mean that the transition probability matrix (5)
is state dependent where
and $\pi_{11}$, $\pi_{12}$, $\bar{\pi}_{21}$ and $\bar{\pi}_{22}$ are constants.
Remark 5 (Initialization) Measurements that are not associated to any confirmed or tentative tracks are used to create new tentative tracks. When a new filter is created, N particles are generated for both models using different Gaussian prior distributions, one for each model. The initial position is computed by projecting the observation onto the ground plane. The feasibility check in Remark 4 is here also used for all the onroad particles, so particles outside the roads are discarded. The prior should be quite flat since the initial measurement is directly used in a measurement update step plus a resampling step to set the total number of particles to N in the MMPF.
Remark 6 (Other Multiple Model Particle Filters) There are other instances of multiple model particle filters in the literature [7, 36]. The MMPF was selected in our study only because it is the best known and the earliest of its kind. In general, all of the different multiple model particle filters are expected to give similar performance results for our application, which is also confirmed by the comparison between MMPF and IMMPF of [7] we present in Section 5.3. Nevertheless, it must still be noted that there might be pathological examples (see e.g., [37]) for which these algorithms would yield significantly different performances, especially during mode transitions.
4.6 Occlusion and information from nondetections
The standard approach in target tracking is to update the filter statistics if an observation is received; if no observation is received, the target state remains intact in the update step. However, a measurement indicating no target in the field of view can also be considered as an observation and this is sometimes called "negative information" [48, 49]. Negative information represents conclusions that are drawn from expected but actually missed detections. Even though no observation data is available, these conclusions can be used to improve the current target estimate. Let $y_t = \varnothing$ denote that no detection was obtained at time t. The density $p(x_t^r \mid y_t = \varnothing, y_{1:t-1})$ is not just the prediction $p(x_t^r \mid y_{1:t-1})$; it also has to incorporate the (negative) information of a nondetection as
where $p(y_t = \varnothing \mid x_t^r) = 1 - P_D(x_t^r)$. In the particle filter this means that the weight of particle i is updated according to
where α = 1. When the possibly occluded regions in the scene are known, this information can be used as a form of negative information in the particle filter at time instants with no detection. In such a case, the (negative information) update (37) tends to increase the weights of the particles in the occluded regions and reduce the weights of particles in the nonoccluded regions.
Note that this requires that the model of the probability of detection be correct; otherwise, the risk of degeneracy increases in a particle filter with a limited number of particles. In practice a more conservative approach with 0 ≤ α < 1 is recommended when P_{D}(·) may have significant modeling errors.
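The weight update at a time instant without detection can be sketched as below. The exact way α enters the update is not recoverable from the text here, so the form w ∝ w · (1 − α P_D) is an assumption of this sketch. It reduces to the factor 1 − P_D for α = 1 and to no update for α = 0, which matches the behavior described above. The function name is hypothetical.

```python
import numpy as np


def negative_information_update(w, p_d, alpha=1.0):
    """Reweight particles after a missed detection.

    w     : (Np,) normalized weights before the update
    p_d   : (Np,) detection probability evaluated at each particle
    alpha : assumed tempering factor in [0, 1]; 1 uses the full model
    """
    w_new = w * (1.0 - alpha * p_d)
    total = w_new.sum()
    if total <= 0.0:
        return w.copy()          # every particle ruled out: keep the old weights
    return w_new / total


w = np.array([0.5, 0.5])
p_d = np.array([0.9, 0.0])       # particle 0 is visible, particle 1 is occluded
w_full = negative_information_update(w, p_d, alpha=1.0)
w_none = negative_information_update(w, p_d, alpha=0.0)
```

In the example the occluded particle ends up with about ten times the weight of the visible one, which is the intended effect: a visible pedestrian would most likely have been detected.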
5 Results
In this section some results of the proposed pedestrian tracker are presented. First, in Section 5.1 the multiple-pedestrian tracker is applied to a real-world data set from an infrared sensor placed on top of a roof and pointing at a park-like environment with some trees, buildings and walkways. In Section 5.2 the MMPF is compared with a standard offroad tracker on a similar data set with a GPS trajectory as the ground truth. A Monte-Carlo (MC) study based on synthetic data is presented in Section 5.3, where the IMMPF [7, 37] is also evaluated in order to assess the expected differences between different multiple model particle filters. Finally, in Section 5.4 an example illustrating the use and the performance gain of negative information is shown.
5.1 Pedestrian tracking field trial
The task presented in this section is to track a number of pedestrians in an infrared image sequence acquired by the infrared sensor described in Section 2.2. The pedestrians were walking both on and off the walkways and trees/buildings were occluding the pedestrians in some areas. The detector in Section 2.3 is used and the resulting detections are fed into the multitarget tracker based on the MMPF of Section 4.
The infrared sensor is located south of the surveillance area on a roof and the sampling frequency is 50 Hz, but just every 5th frame is used, i.e., the sampling frequency of the filter is 10 Hz. This makes it possible to use a time interleaved approach for increased robustness, where the same algorithm runs in parallel, each one time interleaved and operating on different data.
The number of particles in the MMPF is 1000 and the transition probabilities in (5) and (35) are
The measurement noise is assumed to be distributed as
When the standard deviation of the angle noise is set to be σ_{ e }= 0.004 as above, the projected uncertainty on the ground plane (with 68% confidence) corresponds to 4 m and 9 m for Gaussian distributions when a target is 130 m and 200 m away from the sensor, respectively. These uncertainty values become slightly larger for Student's T distributions (\nu =10) due to its heavier tails. The covariance matrices of the process noise are
for the onroad and the offroad models, respectively. The β parameters are set to $\beta_{\mathsf{y}^r} = 0.96$ and $\beta_{\mathsf{z}^r} = \beta_{\mathsf{z}^g} = 0.99$. The initial state distribution is selected as Gaussian. It has been observed that the tracking results are quite insensitive to the initial state covariance. A target must be detected for three consecutive frames after its first appearance, and then for two out of three consecutive frames (after the first three consecutive frames) in order to be confirmed. A target is deleted if no detection is received for 5 s.
The results of the experiment are illustrated in a number of figures below. (Movies are available, see [50].) A snapshot where the particle mixtures can be seen is shown in Figure 5. In Figures 6, 7, and 8 the focus is only on three selected pedestrians for the sake of clarity.
The estimated paths, based on the point estimates (25), for these three pedestrians are shown in Figure 6. One target starts offroad but ends onroad, and vice versa for another pedestrian. This mode transition can easily be seen in Figure 7 where the onroad mode probabilities are shown. Note that when a pedestrian is offroad, the onroad mode probability is very close to zero, but when the pedestrian is onroad the mode probability is only about 0.7-0.8. The reason for this is that the offroad model is valid when the pedestrian is onroad as well, but the opposite is not true if the target is too far from the road. The improvement of using a road network model can be seen in Figure 8 where the uncertainty is shown. The uncertainty is here defined as
where {P}_{t}^{pos} is the position part of the state covariance matrix (26).
5.2 Performance evaluation with GPS ground truth
In this section a real data set similar to the one described above is used to evaluate the tracking performance for a single pedestrian by using the GPS trajectory of the pedestrian as ground truth. The MMPF pedestrian tracker with both an onroad and an offroad model is compared with a standard offroad PF with no road network knowledge.
The scenario and the filter parameters of the MMPF and the PF are similar to those in Section 5.1, but the frame rate of the filters here is 12.5 Hz. The number of particles in both filters is 1000 and the transition probabilities in MMPF are
The covariance matrices of the process noise are
for the onroad and the offroad models, respectively. The β parameters are set to $\beta_{\mathsf{y}^r} = 0.95$ and $\beta_{\mathsf{z}^r} = \beta_{\mathsf{z}^g} = 0.99$. The altitudes of the roads are given by GPS measurements. Since no ground model is available, in order to get observability for the offroad model, the ground is simply assumed to be a plane. For each Monte-Carlo run, the fixed ground plane elevation is selected randomly by sampling uniformly from an interval of length 0.3 m which is determined by the altitude range for the closest road segment.
Since there is only a single set of measurements in the experiment (as opposed to Monte-Carlo simulations where a different realization of the measurement process is generated for each run) and since the results of the particle filters hardly differ between runs, 10 Monte-Carlo runs were found to be sufficient for trustworthy results. The true (GPS) path of the pedestrian, with an expected accuracy of around 0.1-0.2 m, is shown in Figure 9. In addition, Figure 9 illustrates the average path estimate of each filter over the Monte-Carlo runs. The RMS position errors corresponding to both filters are presented in Figure 10. Figure 11 shows the average onroad mode probabilities provided by MMPF. As expected, the tracking result is significantly better for the MMPF when the target is onroad. When the target switches to offroad motion, the accuracy difference between the filters starts to get smaller. The peak in the MMPF error occurs at the onroad to offroad switching of the target, during which the onroad model of MMPF pulls the overall estimate towards the road segment. As soon as the mode probabilities of MMPF converge, the MMPF estimate becomes slightly better than that of PF. The PF estimates are more erroneous than those of MMPF during the offroad motion of the pedestrian. The reason is that the initial error of PF (just after the switching occurs) takes some time to decay to the steady state level where both filters are expected to reach the same performance. The short period around 15 s where the PF error curve dips below the MMPF error curve is a scenario-specific phenomenon, which is confirmed by the average path of PF intersecting the true GPS path in Figure 9.
5.3 MonteCarlo simulation study
In order to compare the performance of different multiple model particle filters and different mode transition probabilities in a controlled manner, a Monte-Carlo simulation based on synthetic data is presented in this section. The task is to track a single pedestrian that is walking both on and off the walkways according to Figure 12. The main objective in this section is to show the advantage of using a multiple model particle filter with road network knowledge over a standard PF. In addition to MMPF another multiple model particle filter, the IMMPF [7, 37], is also evaluated. IMMPF is similar to MMPF, but the number of particles is constant and predefined for each mode, unlike the MMPF where the number of particles in each mode varies according to the posterior mode probabilities. We emphasize that the IMMPF-MMPF comparison is included only to show whether the particular selection of MMPF as the tracking algorithm in this study is critical or not. In fact, the IMMPF, although a well-known method, has not previously been used in pedestrian tracking either and could equally well have been selected as the tracking algorithm in this study.
In the MC simulation the total number of particles is 1000 for all filters: PF, MMPF, and IMMPF. For the IMMPF, the total number of particles is divided equally between the modes, i.e., each model has 500 particles. The vision sensor is running at 10 fps and is located about 200 m south of and 17 m above the surveillance area in Figure 12. To achieve a better triangulation behavior the sensor is moving slowly to the east at 1 m/s. The measurement noise of the vision sensor is distributed as
The projected uncertainty (with 68% confidence) on the ground corresponds to 9 m.
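The order of magnitude of this figure can be checked with a first-order projection of the inclination noise onto flat ground. For a sensor at height h above a target at ground distance ρ, the depression angle θ satisfies ρ = h / tan θ, so |dρ/dθ| = (ρ² + h²)/h. The sketch below assumes flat ground and an angle noise of σ_e = 0.004 (the value quoted for the field trial in Section 5.1, used here as an assumption), together with the 200 m and 17 m geometry stated above.

```python
def ground_range_sigma(sigma_angle, ground_dist, height):
    """First-order 1-sigma ground range error caused by inclination noise (flat ground)."""
    return sigma_angle * (ground_dist**2 + height**2) / height


sigma_rho = ground_range_sigma(sigma_angle=0.004, ground_dist=200.0, height=17.0)
```

Under these assumptions the result is between 9 m and 10 m, in line with the stated value of about 9 m. It also shows why the range direction dominates the error: the uncertainty grows with the square of the distance and inversely with the sensor height.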
We run two instances of each multiple model particle filter with different transition probabilities in order to see the algorithms' sensitivity. In the literature, the convention for multiple model methods is almost always to use diagonally dominant transition probability matrices (TPM). We follow the same tradition here and select the different transition probabilities for MMPF and IMMPF as
where $\bar{\Pi} \triangleq (\pi_{11}\ \pi_{12}\ \bar{\pi}_{21}\ \bar{\pi}_{22})$, whose elements are defined in (5) and (35).
In this MC simulation, the covariance matrices of the process noise are set as
for the onroad and the offroad models, respectively. A suitable value of the model parameter $\beta_{\mathsf{y}^r}$ depends on the target behavior, but also on the width of the road/walkway. A low value of $\beta_{\mathsf{y}^r}$ will decrease the state uncertainty and force the state estimate towards the center line of the road, but at the cost of a possible state bias and a decreased ability to detect onroad to offroad changes. A value between 0.95 and 0.99 is reasonable in most cases. In this MC simulation, the β parameters are set to $\beta_{\mathsf{y}^r} = 0.95$ and $\beta_{\mathsf{z}^r} = \beta_{\mathsf{z}^g} = 0.99$.
The position RMSE values for the MC simulation with 100 runs are shown in Figure 13. Five different filters are considered: one PF with a single offroad mode, two MMPFs and two IMMPFs with transition probabilities $\bar{\Pi}^1$ and $\bar{\Pi}^2$, respectively. When the pedestrian is offroad the performances of all filters are basically identical once the effects of the mode transition have died out. The only part where the single mode filter is best is at the onroad to offroad transition, which is due to the fact that PF has no particles locked to the onroad model pulling the estimates towards the road. The differences between the MMPF and the IMMPF are quite small, even during the mode transitions. When the target is onroad the MMPF onroad mode probability is about 0.5 and, hence, the number of particles in each mode is then similar to IMMPF and the behavior during the onroad to offroad transition becomes similar. In the offroad to onroad transition the IMMPF cannot benefit from the reserved onroad particles since those are infeasible; therefore the behavior of the two filters will be similar for this case too. The MMPF is slightly better when the target is offroad since it can use twice as many particles, but the difference is too small to be clearly visible in the figure. The direction of the roads significantly affects how much the multiple model filters gain from the onroad model. The more perpendicular the road stretch is to the line of sight of the sensor, the more useful the road information is. For example, compare the errors of the MMPF (or IMMPF) during the time intervals 70-80 s and 80-90 s. Although the effects of the transition probabilities on the performance of the two multiple model particle filters are quite visible, the changes due to different transition probabilities seem to be rather small compared to the gain in using the road network information (i.e., the onroad model).
5.4 Use of negative information in pedestrian tracking
In Section 4.6 the concept of negative information was introduced, i.e., how one can draw conclusions from non-detections. This section provides a simple example to illustrate the gain in using the negative information. Note that in this study we are only considering occlusions caused by stationary objects, like trees and buildings, with known locations. Occlusions caused by other pedestrians are not handled.
Two particle filters using the onroad motion model are applied to a scenario where a fictitious building is placed in the area in front of the path of one pedestrian. The detections are removed manually when the pedestrian is occluded by the hypothetical building. The only difference between the filters is that one filter is using the so-called negative information. The position RMSEs of the two filters are compared in Figure 14. The result of a filter run on the non-occluded data is used as ground truth.
The filter that is using negative information performs better since the effect of the particles that are visible from the point of view of the sensor is suppressed. An intuitive explanation for reducing the effect of the visible particles is as follows. If the visible particles represented the true state, then the pedestrian would have been detected, but he/she is not, and therefore such particles should be less probable.
6 Conclusions
The pedestrian tracker proposed in this study is a multiple-model particle filter that uses prior information about the walkways to enhance the estimation performance. The tracking is performed in 3D global coordinates by utilizing the road network information. The states of the pedestrians are estimated by separate filters. Thus, the correlations between pedestrians are neglected, but experiments show that this is a reasonable approximation. For example, cars on a road are in general much more correlated than pedestrians.
The sampling based GNN association method works very well since the detector performs well with few false detections and the measurement noise is quite small for vision/infrared sensors compared to, for instance, radar. Using the Student's T-distribution for the measurement noise makes the filter more robust against minor outliers caused by the detector.
There are a number of advantages of using a road model. The tracking performance is significantly better if the road network information is used. On the other hand, filters based only on an offroad model perform quite well too as long as the detections are received on a regular basis and a reliable ground model is available. The gains in incorporating an onroad model into the estimation are significant not only for pedestrian motion prediction (e.g., when the target is occluded or outside the field of view), but also for enhanced sensor management, track analysis, and anomaly detection.
On the other hand, there can also be some unpredicted disadvantages of using a road model. Using contextual information that is described relative to a global reference system requires that the knowledge of the location and the orientation of the sensor be very accurate; otherwise unmodeled navigation error biases can have severe effects on the tracking performance. For a sensor system in a known environment with known landmarks, the location and the orientation are usually straightforward to estimate with good accuracy. If this is not the case, algorithms that rely heavily on prior information should always be used with a fail-safe algorithm that can take over when the prior information is erroneous. In our case the offroad model provides the filter with both an offroad tracking capability and increased robustness against model and navigation errors in onroad target tracking.
Observability is always an issue in vision based target tracking. Since the infrared sensor was stationary in the field trial, the offroad filter also needs a ground elevation model. This external information can be included explicitly by computing a range measurement or implicitly in the motion model. Regardless of the method used, erroneous orientation estimate and/or ground model will cause problems as in the erroneous road model case. However, note that a road network model is in general much easier to acquire and verify than a complete ground model. If no shape data exists for the roads of interest, it is quite straightforward to use GPS or orthophotos to create the road network and then to verify the result by projecting the network onto the camera image. If the sensor platform is moving the observability improves and the robustness against road and ground plane model errors increases.
In tracking applications, the performance depends always on a number of tuning parameters which are usually scenario dependent. As usual, there is a compromise between low uncertainty and robustness against unexpected events. In the end, it is the user, with certain experiences and preferences, that decides which models and parameter values to use. Our conclusion here is that if a reliable road network model is available, it is very beneficial to use it even in a pedestrian tracking application where the apparent gains, at first sight, might be shadowed by the properties of the accurate sensor.
According to our simulation results, incorporating both onroad and offroad models into the tracking seems to be much more important than the specific multiple model particle filter (MMPF or IMMPF) that is used for implementing the incorporation. Similarly, the sensitivity to the transition probabilities used in multiple model filters proves to be less important compared to the gain obtained by using an additional onroad model.
In this article it has also been shown how a probability of detection model, e.g., 3D models of buildings etc., can be used to draw conclusions from nondetections. In practice the gain in using negative information depends on several factors, e.g., the environment (many or few buildings and trees), the target motion characteristics (highly predictable or not) etc., and the decision to use negative information must be made after taking such factors into consideration.
Acknowledgements
This study was supported by the Swedish Research Council under the Linnaeus Center CADICS and the frame project grant Extended Target Tracking (62120104301). The study was part of the graduate school Forum Securitatis in Security Link. The data acquisition was done in the FOI project "Signalbehandling for styrbara sensorsystem", funded by FM (Swedish Armed Forces), and the detector described in this work was developed by FOI. The authors would like to thank Fredrik Näsström, Fredrik Hemström, Gustav Haapalahti, Philip Engström, Karl-Göran Stenborg, Jörgen Karlholm, Joakim Rydell, Staffan Cronström and Morgan Ulvklo at FOI.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Skoglar, P., Orguner, U., Törnqvist, D. et al. Pedestrian tracking with an infrared sensor using road network information. EURASIP J. Adv. Signal Process. 2012, 26 (2012). https://doi.org/10.1186/1687-6180-2012-26
DOI: https://doi.org/10.1186/1687-6180-2012-26
Keywords
 pedestrian tracking
 infrared sensor
 road network
 particle filter
 multiple model
 occlusion