
A recursive kinematic random forest and alpha beta filter classifier for 2D radar tracks


In this work, we show that by using a recursive random forest together with an alpha beta filter classifier, it is possible to classify radar tracks from the tracks’ kinematic data. The kinematic data are from a 2D scanning radar without Doppler or height information. We use a random forest because this classifier implicitly handles the uncertainty in the position measurements. As stationary targets can have an apparently high speed because of the measurement uncertainty, we use an alpha beta filter classifier to separate stationary targets from moving targets. We show an overall classification rate of 82.6 % on simulated data and 79.7 % on real-world data. In addition to the confusion matrix, we also show recordings of real-world data.

1 Introduction

The increasing demand for protection and surveillance of the coastal areas requires modern coastal surveillance radars. These radars are designed such that small objects can be detected. Therefore, there is an increasing amount of information for the radar observer. Moreover, the number of false and unwanted objects increases as the demand for seeing small objects makes the radar more sensitive. Generally, the false objects can be avoided by using a reliable tracker. However, the tracker does not exclude unwanted objects. The difference between false and unwanted objects is that false objects do not originate from true objects but are mainly noise objects, whereas the unwanted objects originate from true objects but are unwanted in the surveillance image. These objects depend on the purpose of the radar; however, for coastal surveillance radars, the unwanted objects are normally birds, wakes from large ships, etc.

It has been shown in [1] that it is possible to classify tracks by using a recursive classifier where a Gaussian mixture model (GMM) is used to model the probability density function (PDF) of the target’s kinematic behavior. However, the classifier does not handle the uncertainty in the measurements from the radar. In [2], the position uncertainty is used as an input to the classifier, which also uses a GMM to model the PDF of the kinematic behavior of the target. The problem with this approach is that it is very computationally expensive. To obtain an easier way to handle uncertainty, joint target tracking and classification can be used, as shown in [3–5]. The problem with joint target tracking and classification is that it is difficult to achieve a high degree of freedom in the filters to separate the classes. For example, a car driving 130 km/h on the highway is not likely to accelerate but more likely to decelerate; this is very hard to model with a tracking filter. A particle filter can be used, but this is computationally expensive. In [6], the authors describe a method to classify trucks and cars from GPS measurements. The classifier consists of a support vector machine (SVM), and the features are primarily acceleration and deceleration. The classifier is non-recursive, which means that the complete length of the track is required. Moreover, the measurements from a GPS device are generally more accurate than the position measurements from a radar. In [7], a decision tree is used for a recursive classification of four different target classes. The data are from a radar with height information. The decision tree has the advantage that it in some way implicitly handles the uncertainty, that is, features that do not separate the classes will not be used as much as features that do. The disadvantage is that the classifier has a high variance of the classification results. In [8], the random forest classifier is introduced.
The random forest is a bagging classifier [9] where multiple decision trees are used to reduce the variance of the classification results. For this reason, random forest is selected in this work.

In this work, we introduce a classifier which uses position measurements to classify radar tracks from a 2D scanning radar. The classifier consists of an alpha beta filter [10] and a random forest classifier. The alpha beta filter is classifying stationary or moving targets, and the random forest classifies moving targets. The classifier is recursive such that the classification results are being updated for each scan of the radar. The classifier performance is shown by using simulated track data and real-world radar data.

In Section 2.1, we introduce the random forest classifier by describing the training of a decision tree and then explaining how this tree is used in the random forest. In Section 2.2, we explain how we utilize the probability estimates from the random forest in a recursive framework. In Section 2.3, we introduce an alpha beta filter classifier, which classifies targets as either stationary or moving. This is introduced because stationary targets can exhibit apparently high speeds, either because their positions fluctuate due to measurement uncertainty or because the main scatter points are moving, e.g., on a wind turbine. In Section 2.4, we combine the random forest and the alpha beta filter into our proposed classifier. In Section 2.5, we describe which features we use in the random forest. The simulation study is shown in Section 3, and in Section 4, the real-world results are shown. We discuss the results in Section 5 and conclude the work in Section 6.

2 Method

When using a random forest, a feature vector is needed. We define our feature vector as a set of kinematic and geographic features. The feature vector is derived from the radar position measurements. We define this set of position measurements as

$$ \{Z_{n}\}_{k} = \{Z_{n} \cdots Z_{n-k}\}, $$

where \(Z_{n} = [x_{n}, y_{n}]^{T}\), \(x\) and \(y\) are the positions in a Cartesian coordinate system with the origin at the location of the radar, \(n\) is the measurement number index, and \(k\) is the set size.

2.1 Random forest

In this section, we introduce the random forest classifier [8, 11]. The random forest is a bagging algorithm, which means that it consists of a number of weak classifiers [12] that have low bias but high variance around the true value. The weak classifiers are decision trees [9]. We start this section by describing how to grow a decision tree and then move on to the random forest.

A decision tree consists of a number of nodes, e.g., \(N_{1}\)–\(N_{3}\), and a number of leaves, e.g., \(N_{4}\)–\(N_{7}\). This is shown in Fig. 1. A node contains data from more than one class, and in every node, a decision must be made such that we go either left or right in the tree. The decision must always evaluate to true or false. A leaf is a node where all of the data belong to only one class; therefore, no more splits are required.

Fig. 1

An example of a decision tree where \(N_{1}\) to \(N_{3}\) are nodes where a decision must be made. An example could be "is the ball blue?" (true or false)

To train the tree, we start with a feature matrix F of size \(N_{s} \times D\), where \(N_{s}\) is the number of samples and \(D\) is the number of features, i.e., the dimension of the feature vector. At each node, we want to split the data such that we obtain the best separation of the classes, which means finding the best feature to split on and the best value to split at. To simplify the explanation, we assume that there are only two classes, so the split forms a binary classification problem, and that the feature values belong to a finite sample space.

We start by assuming that a split has already been made and we want to evaluate how good the split is. For this, we use a normalized entropy measure [12]. An alternative to the normalized entropy is the more common Gini index [13]; however, for this work, the normalized entropy has shown better results. We define the set of samples in the parent node as \(s_{1}\) and the number of samples in the set as \(|s_{1}|\). Similarly, we define the sets of samples in the children as \(s_{2}\) and \(s_{3}\) and the numbers of samples as \(|s_{2}|\) and \(|s_{3}|\). Further, we index the samples belonging to class \(\ell\) by the superscript \(\ell\), such as \(s_{1}^{\ell }\), where \(\ell \in \{1,2\}\). We can calculate the empirical entropy for the children as

$$ \begin{aligned} H(s_{i}) &= -P\left({s_{i}^{1}}\right)\log_{2}\left(P\left({s_{i}^{1}}\right)\right) - P\left({s_{i}^{2}}\right)\log_{2}\left(P\left({s_{i}^{2}}\right)\right),\\ &\qquad i\in\{2,3\}, \end{aligned} $$

where \(P\left ({s_{i}^{1}}\right) = |{s_{i}^{1}}|/|s_{i}|\) and \(P\left ({s_{i}^{2}}\right) = |{s_{i}^{2}}|/|s_{i}|\). It follows that \(P\left ({s_{i}^{1}}\right) = 1-P\left ({s_{i}^{2}}\right)\). As the entropy does not take into account how many samples there are in each child, we normalize the entropy as

$$ \hat{H}(s_{i}) = \frac{|s_{i}|}{|s_{1}|}H(s_{i}), \quad i\in\{2,3\}. $$

We can now calculate the information gain from the split as

$$ \tilde{H} = H(s_{1}) - \left(\hat{H}(s_{2}) + \hat{H}(s_{3})\right). $$

From (4), we now have a measure of how good a split is and are able to optimize each split of the data such that we choose the best feature to split on and the best value of that feature. We continue to split the data until all data in a node belong to the same class, i.e., the node becomes a leaf. To prevent overfitting, a single decision tree must be pruned; however, an advantage of the random forest is that pruning of the individual trees is not necessary. The random forest is a bagging classifier [9]. This means that it consists of a number of trees N t , where each tree is trained with a random subset of the samples and a random subset of the features. We train each tree with these random subsets and assume that the trees are statistically independent of each other. A decision tree classifies data by following a path through the nodes, where the path at each node is decided by the feature and feature value that made the best split during training. The data to be classified follow the path until a leaf is met; the leaf has a unique class, and the data are classified as this class. The classification by the forest is a majority vote over the individual decision trees, each of which is a unique classifier.
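The entropy-based split score above, culminating in the information gain (4), can be sketched for the binary case as follows; this is a minimal illustration under the paper's definitions, with assumed function names, not the authors' implementation:

```python
import numpy as np

def entropy(labels):
    """Empirical binary entropy H(s_i) of the class labels in a node."""
    if len(labels) == 0:
        return 0.0
    p1 = np.mean(np.asarray(labels) == 1)
    return -sum(p * np.log2(p) for p in (p1, 1.0 - p1) if p > 0)

def information_gain(parent, left, right):
    """H(s_1) minus the sample-weighted (normalized) child entropies."""
    n = len(parent)
    h_left = len(left) / n * entropy(left)     # |s_2|/|s_1| * H(s_2)
    h_right = len(right) / n * entropy(right)  # |s_3|/|s_1| * H(s_3)
    return entropy(parent) - (h_left + h_right)
```

A node split is then chosen by maximizing `information_gain` over the candidate features and feature values.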

In general, the random forest is not a probabilistic classifier but a majority vote between each of the trees. However, by counting the votes for each class and normalizing with the total number of trees, an empirical probability can be achieved.

$$ \hat{P}(c_{i}|\{Z_{n}\}_{k})=\psi_{i}/N_{t}, $$

where ψ i is the number of votes for class i.

In the next section, we explain how we use the estimate (5) from the random forest to achieve a recursive update of the probability of the class given all the measurements.

2.2 Recursive update of the random forest probability

The empirical probabilities obtained from the random forest classifier are the number of trees that predict c i divided by the total number of trees. By this definition, the resolution of the probability estimates is given by the number of trees in the random forest. To prevent a class from being assigned zero probability, we modify the estimate in the following way:

$$ P\left(c_{i}|\{Z_{n}\}_{k}\right) = \frac{\hat{P}(c_{i}|\{Z_{n}\}_{k})(1-2/N_{t}) + 1/N_{t}}{\gamma}, $$

where γ is a normalization constant such that \(\sum _{i} P(c_{i}|\{Z_{n}\}_{k})=1\). By this formula, the probability never reaches zero for any of the classes.
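The vote counting (5) and the flooring step (6) can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def forest_probabilities(votes, n_trees):
    """Empirical class probabilities psi_i / N_t, floored away from zero
    as in (6) and renormalized so they sum to one."""
    p_hat = np.asarray(votes, dtype=float) / n_trees   # Eq. (5)
    p = p_hat * (1.0 - 2.0 / n_trees) + 1.0 / n_trees  # numerator of (6)
    return p / p.sum()                                 # division by gamma
```

Even a class that receives zero votes thus retains a small nonzero probability.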

Based upon the above, we have the probability of the class given the current set of features, P(c i |{Z n } k ). However, we want the probability given all measurements, that is, P(c i |{Z n }), where {Z n }={Z n } n . We have, however, not been able to find a simple way to recursively update P(c i |{Z n }) based on the previous P(c i |{Z n−1}) that works for all n. Instead, we propose the following recursive function f(c i |{Z n }), which is everywhere non-negative and sums to one. Thus, f(c i |{Z n }) can be considered a probability mass function (PMF), which we will use as an approximation of the true P(c i |{Z n }). In particular, we define

$$ f(\{Z_{n}\}_{k}, c_{i}) \triangleq \frac{P(c_{i}|\{Z_{n}\}_{k})^{w}}{\phi_{n}}f(\{Z_{n-1}\}_{k},c_{i}), $$

where w is a weighting factor, P(c i |{Z n } k ) is given by (6), and ϕ n is the normalization constant such that \(\sum _{c_{i}} f(\{Z_{n}\}_{k}, c_{i}) = 1\). The weighting by w is inspired by the weighted Bayesian classifier used in [14]. In particular, we choose w=1/k since the features of the random forest are given by a set of measurements where only one out of k measurements is substituted at each update.
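One step of the recursive update (7), with the weighting w=1/k, can be sketched as follows; the names are illustrative:

```python
import numpy as np

def recursive_update(f_prev, p_window, k):
    """One step of (7): f(c_i|{Z_n}) is proportional to
    P(c_i|{Z_n}_k)^(1/k) * f(c_i|{Z_{n-1}}), renormalized by phi_n."""
    w = 1.0 / k
    f = np.asarray(p_window, dtype=float) ** w * np.asarray(f_prev, dtype=float)
    return f / f.sum()
```

Starting from a uniform PMF, repeated updates with window probabilities peaked on one class drive the PMF towards that class, while the exponent 1/k damps the influence of any single window.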

In the next section, we describe our alpha beta tracking filter. This filter is used to classify if a target is non-moving or moving. The reason for applying such a filter is to classify stationary targets, which have a high apparent speed due to measurement uncertainties.

2.3 Alpha beta filter

The alpha beta filter is a simple tracking filter [15]. By using the alpha beta filter, we assume that we can describe the target movements with a first-order Markov chain. We have the state vector \(X_{n} = [\hat {x}, \hat {y}]^{T}\) and the measurement Z n . The alpha beta filter predicts Z n given the speed V n−1 at time n−1 and the state X n−1 as

$$ X_{n}^{-} = X_{n-1}^{+} + \tau V_{n-1}^{+}, $$

where τ is the time between Z n−1 and Z n , the superscript − denotes the prediction before the measurement is used, and the superscript + denotes the estimate after the measurement is used. The filter assumes the speed is constant between n−1 and n, that is, V n =V n−1. The residual can be calculated as

$$ R_{n} = Z_{n} - X_{n}^{-}, $$

and with the residual, we update the estimates \(X_{n}^{-}\) and \(V_{n}^{-}\) as

$$ \begin{aligned} X_{n}^{+} &= X_{n}^{-} + \alpha R_{n}\\ V_{n}^{+} &= V_{n}^{-} + \frac{\beta}{\tau} R_{n}, \end{aligned} $$

where α and β are the constants in the alpha beta filter. To calculate the probability for Z n given \(X_{n}^{-}\), α, and β, we use a multivariate normal distribution

$$ \begin{aligned} &P_{\alpha\beta}(Z_{n}|X_{n}^{-},\alpha,\beta)\\ &= \frac{1}{2\pi\sqrt{|\Sigma_{n}|}}\exp\left(-\frac{1}{2}\left(Z_{n}-X_{n}^{-}\right){~}^{T}\Sigma_{n}^{-1}\left(Z_{n}-X_{n}^{-}\right)\right) \end{aligned} $$

where Σ n is the covariance of the position and the subscript α β emphasizes that this is the probability for the alpha beta filter. The purpose of the alpha beta filter is to separate non-moving, i.e., stationary, targets from moving targets. We therefore define two filters: a stationary filter with the parameters α=0.1 and β=0.0, which allows the position part of the state to move slightly but forces the speed to be constant at zero. The slight movement of the state is allowed because of possible false initial measurements. As the parameters α and β are given by the class c s , we use the notation \(P_{\alpha \beta }(Z_{n}|X_{n}^{-}, c_{s})\). Likewise, we define the moving alpha beta filter as \(P_{\alpha \beta }(Z_{n}|c_{m}, X_{n}^{-})\) with the parameters α=1.0 and β=1.0, i.e., the prediction holds the speed constant between updates, but both the position and the speed are allowed to change fully with the measured change. If we know {Z n−1}, which is the set of measurements up to n−1, and α and β, we can calculate \(X_{n}^{-}\), and we can therefore write P α β (Z n |c i ,{Z n−1}) instead of \(P_{\alpha \beta }(Z_{n}|c_{i}, X_{n}^{-})\). For this work, we want the alpha beta filter to classify whether the target is stationary or non-stationary, and we therefore recursively update the probability of the alpha beta filter.

$$ P_{\alpha\beta}(c_{i}|\{Z_{n}\}) = \frac{P_{\alpha\beta}(Z_{n}|c_{i},\{Z_{n-1}\})P_{\alpha\beta}(c_{i}|\{Z_{n-1}\})}{P_{\alpha\beta}(Z_{n}|\{Z_{n-1}\})}. $$

To reduce the computational complexity, we assume that the positions are governed by a first-order Markov chain, i.e., \(Z_{n} \perp \{Z_{n-2}\} \,|\, Z_{n-1}\) for all \(n\).1

$$ P_{\alpha\beta}(c_{i}|\{Z_{n}\}) = \frac{P_{\alpha\beta}(Z_{n}|c_{i},Z_{n-1})P_{\alpha\beta}(c_{i}|\{Z_{n-1}\})}{P_{\alpha\beta}(Z_{n}|Z_{n-1})}. $$
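The filter recursion (8)–(10) and the class update (13) can be sketched together as follows; this is a minimal illustration with assumed function names, not the authors' implementation:

```python
import numpy as np

def alpha_beta_step(x, v, z, tau, alpha, beta):
    """One predict/update cycle, Eqs. (8)-(10): predict X_n^- from the
    previous state and speed, form the residual R_n, and correct both."""
    x_pred = x + tau * v          # X_n^- = X_{n-1}^+ + tau V_{n-1}^+
    r = z - x_pred                # R_n = Z_n - X_n^-
    x_new = x_pred + alpha * r    # X_n^+
    v_new = v + (beta / tau) * r  # V_n^+
    return x_new, v_new, r

def alpha_beta_class_update(prior, likelihoods):
    """Bayes update of P_ab(c_i|{Z_n}), Eq. (13): multiply the prior for
    each class (stationary, moving) by its filter's likelihood of Z_n
    and normalize by P_ab(Z_n|Z_{n-1})."""
    post = np.asarray(prior, dtype=float) * np.asarray(likelihoods, dtype=float)
    return post / post.sum()
```

With the stationary parameters (α=0.1, β=0.0) the speed never changes, while the moving parameters (α=1.0, β=1.0) let the state follow the measurements; a stationary target then yields small residuals, and hence a high likelihood, in the stationary filter, so its posterior grows over the scans.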

In the next section, we describe how we combine the random forest classifier and the alpha beta filter classifier such that a classifier, which is a combination of the two classifiers, is created.

2.4 Combining the alpha beta filter with random forest

In our work, we let the alpha beta filter classify whether the target is stationary or non-stationary, i.e., the alpha beta filter has two classes. The random forest has a stationary class and multiple non-stationary classes. For the random forest, we define \(c_{0}\) to be the stationary class and \(c_{1} \cdots c_{n_{C}}\) to be the moving classes, where \(n_{C}\) is the number of moving classes. For the alpha beta filter, we have the two classes \(c_{s}\) and \(c_{m}\) for stationary and non-stationary, respectively. We want the alpha beta filter classifier to have a larger weight than the random forest on the classification of stationary vs. moving targets. We therefore use the recursively updated probability from (13), as described in (14) and (15).

$$ \hat{P}(c_{0}|\{Z_{n}\}) = f(\{Z_{n}\}_{k},c_{0}) P_{\alpha\beta}(c_{s}|\{Z_{n}\}), $$
$$ \hat{P}(c_{i}|\{Z_{n}\}) = f(\{Z_{n}\}_{k},c_{i}) P_{\alpha\beta}(c_{m}|\{Z_{n}\}), i=1\cdots n_{C} $$

We then normalize \(\hat {P}(c_{i}|\{Z_{n}\})\) as

$$ P_{c}(c_{i}|\{Z_{n}\}) = \frac{\hat{P}(c_{i}|\{Z_{n}\})}{\hat{\omega}}, $$

where \({\hat {\omega }}\) is a normalization constant such that \(\sum _{i} P_{c}(c_{i}|\{Z_{n}\}) = 1\). By including the alpha beta filter in this manner, we ensure that it decides whether a target is stationary, while it does not influence the relative probabilities of the different moving classes.
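Equations (14)–(16) can be sketched as follows, with class 0 as the stationary class; the names are illustrative:

```python
import numpy as np

def combine_classifiers(f_rf, p_stationary, p_moving):
    """Weight the random-forest PMF by the alpha beta posteriors:
    class 0 by P_ab(c_s|{Z_n}), all moving classes by P_ab(c_m|{Z_n}),
    then normalize by omega-hat."""
    p = np.asarray(f_rf, dtype=float).copy()
    p[0] *= p_stationary   # Eq. (14)
    p[1:] *= p_moving      # Eq. (15)
    return p / p.sum()     # Eq. (16)
```

Note that the alpha beta posterior rescales all moving classes by the same factor, so their relative ordering from the random forest is preserved.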

In the next section, we will describe the features we use for the random forest feature vector, and we will also describe how these are derived from the position. We only utilize position-dependent features such as speed and acceleration.

2.5 Features

For the feature vector, we draw inspiration from [16] for some of the features. In this work, we set the number of position measurements k in (1) to 10. The number of measurements used in the feature vector is a compromise between the time it takes to collect the measurements required for a full feature vector and the amount of information contained in the feature vector. A larger k requires more measurements, i.e., more time before the first classification result is available, whereas for a smaller k, the first classification result comes earlier, albeit with a greater uncertainty due to the smaller amount of available information. The features and their descriptions can be seen in Table 1. Recall that we defined \(\{Z_{n}\}_{k}\) to be \(\{Z_{n} \cdots Z_{n-k}\}\). To simplify the notation, we index each measurement in \(\{Z_{n}\}_{k}\) by i such that i represents the ith element in the set of measurements, that is, 0≤i<k. Likewise, we define the set of time stamps of the measurements as \(\{t_{n}\}_{k}\), with the individual measurement observed at time \(t_{i}\). We start by calculating the vectorial distance between the measurements as

$$ \delta_{i} = Z_{i} - Z_{i-1}, $$
Table 1 The feature vector used. The number of measurements has been chosen as k=10

with the scalar distance given by

$$ \Delta_{i} = |Z_{i} - Z_{i-1}|, $$

and the time difference between the measurements as

$$ \tau_{i} = t_{i} - t_{i-1}. $$

The 2-point velocity estimate is

$$ v_{i} = \frac {\Delta_{i}} { \tau_{i}}, $$

for 1≤i<k, and the 3-point acceleration estimate is

$$ a_{i} = \frac {2 (v_{i + 1} -v_{i})} { \tau_{i + 1}+\tau_{i}}, $$

for 1≤i<(k−1). The normal acceleration \(a_{i}^{\perp }\) is given by the product of the speed and angular velocity

$$ a^{\perp}_{i}=\left(\frac{v_{i + 1} + v_{i}} {2} \right) \left(\frac{2}{t_{i + 1} - t_{i -1}} \right) \cos^{-1} \left(\frac { \delta_{i+1} \cdot \delta_{i} } {\Delta_{i+1} \Delta_{i}} \right). $$
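The kinematic part of the feature computation, Eqs. (17)–(22), can be sketched as follows; a minimal example assuming the window is given as NumPy arrays:

```python
import numpy as np

def kinematic_features(z, t):
    """Speeds, 3-point accelerations, and normal accelerations from a
    window of positions z (k x 2) and timestamps t (length k)."""
    delta = np.diff(z, axis=0)                    # vectorial distances
    dist = np.linalg.norm(delta, axis=1)          # scalar distances
    tau = np.diff(t)                              # time differences
    v = dist / tau                                # 2-point speed estimates
    a = 2.0 * np.diff(v) / (tau[1:] + tau[:-1])   # 3-point accelerations
    # normal acceleration: mean speed times angular velocity
    cos_ang = np.clip(np.sum(delta[1:] * delta[:-1], axis=1)
                      / (dist[1:] * dist[:-1]), -1.0, 1.0)
    a_perp = (0.5 * (v[1:] + v[:-1])
              * 2.0 / (t[2:] - t[:-2])
              * np.arccos(cos_ang))
    return v, a, a_perp
```

For a straight track at constant speed, both acceleration estimates are zero.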

We also use land/sea information, which can be extracted from the SWBD (SRTM Water Body Data) database [17]. The database is a set of polygons describing the coastline. Because of errors in the database, a hard land/sea threshold cannot be used. We therefore propose to use the distance to the coastline d i for each measurement as a feature. By using these polygons, it is possible to calculate the distance from a measurement to the coastline. However, this calculation becomes increasingly computationally expensive as the distance to the nearest coastline increases. We therefore assign a maximum distance ξ from the target to the coastline; if the target is farther away than ξ, we assign the distance the value ξ. The sign of the distance indicates whether the measurement is over land or sea. We set ξ=700 m to accommodate errors in the SWBD database.
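The clamped, signed distance feature can be sketched as follows; the sign convention (negative over land) is an assumption for illustration:

```python
def coastline_feature(distance_m, over_land, xi=700.0):
    """Distance-to-coastline feature, capped at xi = 700 m to absorb
    errors in the SWBD polygons; the sign encodes land vs. sea."""
    d = min(abs(distance_m), xi)
    return -d if over_land else d
```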

In the next section, we will show some simulation results of the classifier. We will also show some real-world results of the classifier.

3 Simulation study

We start by showing the performance of the algorithm vs. the number of measurements k from which the features are extracted. The size of the feature vector changes with k; Table 1 shows the features for k=10. The data we use are simulated data from a controlled random walk. The controlled random walk consists of a three-state transition matrix with a deceleration state, a steady state, and an acceleration state. Parameters for the maximum and minimum speed are incorporated, which change the probabilities in the transition matrix if the speed is not within the permitted speed range. The data for the different targets are generated such that they have nearly the same support in speed, and the main difference is the acceleration support. The random walk creates positions \({p_{m}^{x}}\) and \({p_{m}^{y}}\), which are extrapolated from the smoothed speeds \(\hat {v}_{m}^{x}\) and \(\hat {v}_{m}^{y}\) described later.

$$\begin{array}{*{20}l} {p_{m}^{x}} = p_{m-1}^{x} + \Delta t \hat{v}_{m}^{x} + {\Sigma_{m}^{x}} \end{array} $$
$$\begin{array}{*{20}l} {p_{m}^{y}} = p_{m-1}^{y} + \Delta t \hat{v}_{m}^{y} + {\Sigma_{m}^{y}}, \end{array} $$

where Δ t is the time between the updates for m and m−1 and \({\Sigma _{m}^{x}}\) and \({\Sigma _{m}^{y}}\) are position uncertainties drawn from a distribution.

$$ \left[ \begin{array}{cc} {\Sigma_{m}^{x}}\\ {\Sigma_{m}^{y}} \end{array} \right] \sim \mathcal{N}(0,\Sigma_{e}), $$

where Σ e is the position covariance and \(\mathcal {N}\) denotes the normal distribution. The smooth speeds are speeds \({v_{m}^{x}}\) and \({v_{m}^{y}}\) which are convolved with a 25-tap moving average filter h. This is done to avoid quick changes in the speed.

$$\begin{array}{*{20}l} \hat{v}_{m}^{x} = h*{v_{m}^{x}} \end{array} $$
$$\begin{array}{*{20}l} \hat{v}_{m}^{y} = h*{v_{m}^{y}}. \end{array} $$

The speeds (27) are extrapolated from accelerations \({o_{j}^{x}}(m)\) and \({o_{j}^{y}}(m)\), where the subscript j denotes the state described in (32). The speeds are given as

$$ \left[ \begin{array}{cc} {v_{m}^{x}}\\ {v_{m}^{y}} \end{array} \right] = \left[ \begin{array}{cc} v_{m-1}^{x} + \Delta t {o_{j}^{x}}(m)\\ v_{m-1}^{y} + \Delta t {o_{j}^{y}}(m) \end{array} \right], $$

where \({o_{j}^{x}}(m)\) and \({o_{j}^{y}}(m)\) are accelerations which are drawn from two normal distributions given by

$$\begin{array}{*{20}l} {o_{j}^{x}}(m) \sim \mathcal{N}\left(\mu_{x,j},\sigma_{x,j}^{2}\right) \end{array} $$
$$\begin{array}{*{20}l} {o_{j}^{y}}(m) \sim \mathcal{N}\left(\mu_{y,j},\sigma_{y,j}^{2}\right). \end{array} $$

The parameters for the normal distributions μ x,j , μ y,j , \(\sigma _{x,j}^{2}\), and \(\sigma _{y,j}^{2}\) are given by the function ϕ j (v(m−1),Γ). This is done because we want to control the maximum and minimum allowed speed. We define this function as

$$ \phi_{j}(v(m-1),\Gamma) = \left\{ \begin{array}{ll} \psi_{j}(1) & \text{if~} v_{m-1} > \zeta^{\text{max}}, \\ \psi_{j}(2) & \text{if~} \zeta^{\text{min}} \leq v_{m-1} \leq \zeta^{\text{max}}, \\ \psi_{j}(3) & \text{else,} \end{array} \right. $$

where ψ j (1), ψ j (2), and ψ j (3) are the set of parameters \(\{\mu _{y,j}, \mu _{x,j}, \sigma _{y,j}^{2}, \sigma _{x,j}^{2}\}\) used in (30) and Γ={ζ min,ζ max}. The state machine consists of three states: deceleration (d), constant (c), and acceleration (a) states, see Fig. 2. Further, the state machine is also controlled by the speed. We define the state transition probabilities as

$$ P_{\hat{j},j}(v(m-1),\Gamma) = \left\{ \begin{array}{ll} \Psi_{\hat{j},j}(1) & \text{if~} v_{m-1} > \zeta^{\text{max}}, \\ \Psi_{\hat{j},j}(2) & \text{if~} \zeta^{\text{min}} \leq v_{m-1} \leq \zeta^{\text{max}}, \\ \Psi_{\hat{j},j}(3) & \text{else,} \end{array} \right. $$
Fig. 2

The state machine used for the data generation of the simulated data. The state machine has three states: an accelerating a, a decelerating d, and a constant-speed c state. The probability of jumping between the states is controlled by \(P_{\hat {j},j}\), which changes depending on the speed

where \(\hat {j}\) is the previous state and \(\Psi _{\hat {j},j}\) is the transition probability. An example of a track can be seen in Fig. 3. The speed PDFs can be seen in Fig. 4, and the acceleration PDFs can be seen in Fig. 5. The performance of the classifier vs. the number of measurements k can be seen in Fig. 6. Further, we show the performance of the classifier vs. the number of trees N t used in the random forest, see Fig. 7. The confusion matrix of the classification results for the four classes can be seen in Table 2, where we have used k=10 and N t =100.
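The data generation described above can be sketched as a compact controlled random walk; the transition probabilities, noise levels, and speed limits below are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_track(n_steps, dt=1.0, v_min=2.0, v_max=10.0,
                   accel=0.5, sigma_pos=5.0, taps=25):
    """Sketch of the controlled random walk: a 3-state machine
    (decelerate/constant/accelerate) drives accelerations, the speeds
    are smoothed with a moving-average filter, and Gaussian position
    noise is added."""
    mu = {0: -accel, 1: 0.0, 2: accel}  # per-state mean acceleration
    v = np.zeros((n_steps, 2))
    v[0] = [v_min, 0.0]
    for m in range(1, n_steps):
        speed = np.linalg.norm(v[m - 1])
        # bias the transition matrix towards the permitted speed range
        if speed > v_max:
            probs = [0.8, 0.15, 0.05]
        elif speed < v_min:
            probs = [0.05, 0.15, 0.8]
        else:
            probs = [0.25, 0.5, 0.25]
        state = rng.choice(3, p=probs)
        o = rng.normal(mu[state], 0.1, size=2)  # drawn acceleration
        v[m] = v[m - 1] + dt * o
    # smooth the speeds with a moving-average filter to avoid quick changes
    h = np.ones(taps) / taps
    v_smooth = np.column_stack([np.convolve(v[:, i], h, mode='same')
                                for i in range(2)])
    # integrate to positions and add Gaussian measurement noise
    p = np.cumsum(dt * v_smooth, axis=0)
    p += rng.normal(0.0, sigma_pos, size=p.shape)
    return p
```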

Fig. 3

An example of a simulated track

Fig. 4

The speed PDFs of the four different classes

Fig. 5

The acceleration PDF of the four classes

Fig. 6

The overall performance of the algorithm given the number of measurements used in the feature vector

Fig. 7

The performance of the classifier for the synthetic generated data vs. the number of trees used in the random forest

Table 2 The confusion matrix of the simulated data

4 Real-world results

The data used for this work consist of Automatic Identification System (AIS) data, which is a broadcast system used by large ships; Automatic Dependent Surveillance-Broadcast (ADS-B) data, which is a broadcast system used by commercial aircraft; GPS logs; and real-world radar data. The classes for this work are typical classes for coastal surveillance, e.g., large ships, birds, and small boats.

We show a confusion matrix for real-world data in Table 3. As a confusion matrix does not take into account how the probability develops over time, we also show some real-world scenarios. For these scenarios, extra classes are used. The scenarios are images showing all tracks within a specific time period. The scenarios have both known and unknown targets. It is therefore not possible to make a confusion matrix of the scenario; however, it is possible to have a good estimate of the performance of the classifier in real-world situations. The scenarios are recorded with different radars and antennas; further, the sampling rate can be different for the different scenarios. We show two scenarios from coastal surveillance applications. The first coastal surveillance scenario is recorded in Denmark where a rigid inflatable boat (RIB) is sailing from west to east and zigzagging back. Towards the north of the RIB, there are two unknown vessels; further, there are some sea buoys present both to the north of the RIB and also to the far south. The rest of the tracks are believed to be birds. See the scenario in Fig. 8. The second scenario is also from Denmark and shows two wind turbine farms. A commercial plane is flying in from the west to the east, and a small personal aircraft is circling over the first wind farm to the north, then the second wind farm, and finally leaving towards the east. Three vessels are present, one to the east of the wind farm in the north (above the other wind farm) and the second vessel is sailing through the wind farm in the south. The last vessel is sailing from west to east under the south wind farm. The rest of the tracks are believed to be birds, see Fig. 9. As the majority of previously published results are based on a joint tracking and classification approach, mostly on simulated data, it is not directly possible to compare the obtained classification accuracy.

Fig. 8

The scenario where a RIB is sailing out and zigzagging back again. A large number of birds are present

Fig. 9

Horns Rev

Table 3 The confusion matrix for real-world data

In the next section, we will discuss the results of the classifier.

5 Discussion

In Fig. 6, the performance of the classification results for the simulated data set is shown, where we vary the number of measurements k, in (1), used to extract the features. The performance is calculated as the mean of the diagonal of the confusion matrix. It is clear that the more measurements (a longer feature vector) are used, the better the classification results, as more information gives the classifier a better estimate of the class, making a correct classification more likely. The downside of increasing the number of measurements is that it takes longer from when a track is first seen until the first probability of the target is shown. For our results, the sampling rate varies between 0.333 and 1 Hz. For 10 measurements, this gives a maximum waiting time of 30 s, which we believe is acceptable for the application at hand. In Fig. 7, the performance can be seen when varying the number of trees used in the random forest. The plot is made with k=10. It can be seen that the performance does not improve beyond around 170 trees. Increasing the number of trees makes the random forest take longer to train and makes the classifier more computationally expensive and memory demanding at test time, which matters because the classifier is intended to run in real time. The performance for k=10 and N t =100 can be seen in Table 2. It is clear that type 2 and type 3 are confused with each other the most. This is natural if we look at the speed PDFs and the acceleration PDFs in Figs. 4 and 5, respectively, as these are very similar. In general, the off-diagonal mass in the confusion matrix lies to the left of the diagonal. This is because a class with a large allowed acceleration still contains smaller accelerations, and such tracks will therefore be classified as a lower class type.

For the real-world scenarios, we use k=10 and N t =170. As can be seen, the confusion matrix in Table 3 shows relatively good performance. Nearly all of the stationary sea targets and commercial aircraft are classified correctly. The helicopters are confused with birds; this can be because helicopters can move as slowly as birds. There is some confusion between large ships, birds, and RIBs, as all of these classes have kinematics close to each other.

In Fig. 8, one of the real coastal surveillance scenarios is shown. The scenario shows a RIB sailing out from a marina and zigzagging back again. The RIB is classified as a small fast boat. The reason that it is not classified as a Jet Ski/RIB is that it sails more like a fast boat, whereas a Jet Ski/RIB often makes turns, accelerates, and decelerates. The two slow-moving vessels to the north of the RIB are classified correctly. Some of the sea buoys are classified correctly as stationary targets. Only a few birds are classified correctly. In Fig. 9, two wind farms can be seen, and nearly all of the wind turbines are classified as stationary, while a few are misclassified as small slow-moving boats. The commercial aircraft is classified as a mixture of commercial aircraft and small aircraft; however, the target is primarily classified as a commercial aircraft. The small aircraft circling the two wind farms is classified correctly even though it is flying below stall speed; this can be due to the strong winds, which make the true airspeed much larger. The one sea vessel that is sailing between the wind turbines is misclassified as a bird, while the other sea vessels are classified as small slow boats, small fast boats, and helicopters. Unfortunately, nearly all the birds are misclassified either as unknown or as helicopters. We believe this is because the training data do not contain any birds at that distance and speed (because of the wind). Further, the radar used to record this scenario is different from the radars used for the training data.

6 Conclusions

We have shown that it is possible to use a recursive approach to classify radar tracks from kinematic data. We have also shown that an alpha beta filter can be used together with the random forest so that stationary targets are classified as stationary. The study used both simulated data, generated to behave like real targets, and real-world data. We have presented both scenario recordings and confusion matrices to give an overview of the performance.
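The alpha beta filter referred to above follows the standard position/velocity recursion (see, e.g., [10]). A minimal sketch of one update step, with gains alpha and beta and the noise sequence chosen purely for illustration, showing why the filtered speed of a stationary target stays near zero despite position measurement jitter:

```python
def alpha_beta_step(x, v, z, dt, alpha, beta):
    """One update of a standard alpha-beta tracking filter.
    x, v: current position/velocity estimates; z: new position measurement."""
    x_pred = x + v * dt           # predict position one scan ahead
    r = z - x_pred                # innovation (measurement residual)
    x_new = x_pred + alpha * r    # smoothed position estimate
    v_new = v + (beta / dt) * r   # smoothed velocity estimate
    return x_new, v_new

# Stationary target measured with small position noise: the filtered
# speed estimate remains small, so the target is not mistaken for a mover.
x, v = 0.0, 0.0
for z in [0.3, -0.2, 0.1, -0.1, 0.2]:
    x, v = alpha_beta_step(x, v, z, dt=1.0, alpha=0.5, beta=0.1)
print(abs(v) < 0.1)  # True: estimated speed stays near zero
```

Thresholding the filtered speed in this way is the intuition behind using the alpha beta filter to separate stationary from moving targets, which raw scan-to-scan speed (inflated by measurement uncertainty) cannot do reliably.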


  1. LW Jochumsen, M Pedersen, K Hansen, SH Jensen, J Østergaard, in Radar Conference (Radar), 2014 International. Recursive Bayesian classification of surveillance radar tracks based on kinematic with temporal dynamics and static features (IEEE, Lille, France, 2014), pp. 1–6.

  2. LW Jochumsen, E Nielsen, J Østergaard, SH Jensen, M Pedersen, in Radar Conference (RadarCon), 2015 IEEE. Using position uncertainty in recursive automatic target classification of radar tracks (IEEE, Washington, DC, USA, 2015), pp. 0168–0173.

  3. DH Nguyen, JH Kay, BJ Orchard, RH Whiting, Classification and tracking of moving ground vehicles. Lincoln Lab. J. 13(2), 275–308 (2002).

  4. S Challa, GW Pulford, Joint target tracking and classification using radar and ESM sensors. IEEE Trans. Aerosp. Electron. Syst. 37(3), 1039–1055 (2001).

  5. D Angelova, L Mihaylova, Joint target tracking and classification with particle filtering and mixture Kalman filtering using kinematic radar information. Digital Signal Process. 16(2), 180–204 (2006).

  6. Z Sun, XJ Ban, Vehicle classification using GPS data. Transp. Res. Part C Emerg. Technol. 37, 102–117 (2013).

  7. M Garg, U Singh, in National Conference on Research Trends in Computer Science and Technology (NCRTCST). C & R tree based air target classification using kinematics (IJECCE, Hyderabad, India, 2012).

  8. L Breiman, Random forests. Mach. Learn. 45(1), 5–32 (2001).

  9. S Theodoridis, K Koutroumbas, Pattern Recognition, 4th edn. (Academic Press, 2008).

  10. E Brookner, Tracking and Kalman Filtering Made Easy (Wiley-Interscience, 1998).

  11. J Friedman, T Hastie, R Tibshirani, The Elements of Statistical Learning, vol. 1 (Springer, 2001).

  12. CM Bishop, Pattern Recognition and Machine Learning, vol. 1 (Springer, New York, 2006).

  13. G James, D Witten, T Hastie, An Introduction to Statistical Learning: with Applications in R (Taylor & Francis, 2014).

  14. L Chen, S Wang, in Proceedings of the 21st ACM International Conference on Information and Knowledge Management. Automated feature weighting in naive Bayes for high-dimensional data classification (ACM, 2012), pp. 1243–1252.

  15. JA Lawton, RJ Jesionowski, P Zarchan, Comparison of four filtering options for a radar tracking problem. J. Guid. Control. Dyn. 21(4), 618–623 (1998).

  16. N Mohajerin, J Histon, R Dizaji, SL Waslander, in Radar Conference, 2014 IEEE. Feature extraction and radar track classification for detecting UAVs in civilian airspace, (2014), pp. 0674–0679.

  17. SWBD. Accessed 4 Dec 2014.



This work was partially supported by the Engineering and Physical Sciences Research Council (EPSRC) Grant number EP/K014307/1 and the MOD University Defence Research Collaboration in Signal Processing.

Competing interests

The authors declare that they have no competing interests.

Author information



Corresponding author

Correspondence to Lars W. Jochumsen.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Jochumsen, L.W., Østergaard, J., Jensen, S.H. et al. A recursive kinematic random forest and alpha beta filter classifier for 2D radar tracks. EURASIP J. Adv. Signal Process. 2016, 82 (2016).
