Binomial Gaussian mixture filter
 Matti Raitoharju^{1}, Simo Ali-Löytty^{2} and Robert Piché^{1}
https://doi.org/10.1186/s13634-015-0221-2
© Raitoharju et al.; licensee Springer. 2015
Received: 16 March 2014
Accepted: 20 March 2015
Published: 11 April 2015
Abstract
In this work, we present a novel method for approximating a normal distribution with a weighted sum of normal distributions. The approximation is used for splitting normally distributed components in a Gaussian mixture filter, so that the components have smaller covariances and cause smaller linearization errors when nonlinear measurements are used for the state update. Our splitting method uses weights from the binomial distribution as component weights. The method preserves the mean and covariance of the original normal distribution, and, in addition, the resulting probability density and cumulative distribution functions converge to those of the original normal distribution as the number of components is increased. Furthermore, an algorithm is presented for doing the splitting so as to keep the linearization error below a given threshold with a minimum number of components. The accuracy of the estimate provided by the proposed method is evaluated in four simulated single-update cases and one time series tracking case. In these tests, the proposed method is found to be more accurate than other Gaussian mixture filters in the literature when the same number of components is used, and to be faster and more accurate than particle filters.
1 Introduction
$$ \mathbf{y} = h(\mathbf{x}) + \varepsilon, $$(1)
where h(x) is a nonlinear measurement function and ε is the additive measurement error, assumed to be zero-mean Gaussian and independent of the prior, with nonsingular covariance matrix R. When a Kalman filter extension that linearizes the measurement function is used for the update, the linearization error involved depends on the measurement function and also on the covariance of the prior: generally, larger prior covariances give larger linearization errors. In some cases, e.g., when the posterior is multimodal, no Kalman filter extension that uses a single normal distribution as the state estimate can estimate the posterior well.
Gaussian mixture filters (GMFs) work in such a manner that the prior components are split, if necessary, into smaller components to reduce the linearization error within a component. In splitting, it is desirable that the mixture generated is similar to the original prior. Usually, the mean and covariance of the generated mixture are matched to the mean and covariance of the original component. Convergence properties are more rarely discussed, but, for example, in [2] a GM splitting that converges weakly to the prior component is presented.
We propose in this paper a way of splitting a prior component, called the Binomial Gaussian mixture (BinoGM), and show that when the number of components is increased, the probability density function (pdf) and cumulative distribution function (cdf) of the resulting mixture converge to the pdf and cdf of the prior component. Furthermore, we propose the Binomial Gaussian mixture filter (BinoGMF) for time series filtering. BinoGMF uses BinoGM in component splitting and optimizes the component covariance so that the measurement nonlinearity is kept small, while minimizing the number of components needed for a good approximation of the prior.
In our GMF implementation, we use the unscented Kalman filter (UKF) [3,4] for computing the measurement update. The UKF is used because the proposed splitting algorithm uses measurement evaluations that can be reused in the UKF update. To reduce the number of components in the GMF, we use the algorithm proposed in [5].
The rest of the paper is organized as follows. In Section 2, related work is discussed. Binomial GM is presented in Section 3, and BinoGMF algorithms are given in Section 4. Tests and results are presented in Section 5, and Section 6 concludes the article.
2 Related work
In this section, we first present the UKF algorithm that is used for updating the GM components and then the five different GM generation methods that we also use for comparison in the tests section.
2.1 UKF
where μ ^{+} is the posterior mean, P ^{+} is the posterior covariance, \(\Omega _{0,m} = \frac {\xi }{n+\xi }\), \(\Omega _{0,c} = \frac {\xi }{n+\xi } + \left(1 - \alpha _{\text {UKF}}^{2}+\beta _{\text {UKF}}\right)\), \(\Omega _{i,c} = \Omega _{i,m} = \frac {1}{2n+2\xi }\ (i>0)\), and \(\xi =\alpha _{\text {UKF}}^{2}(n+\kappa _{\text {UKF}}) - n\). The variables with subscript UKF are configuration parameters [3,4].
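As a minimal sketch, the weight formulas above can be coded directly; the function name and default parameter values (matching those used later in the tests section) are ours, not from the paper:

```python
import numpy as np

def ukf_weights(n, alpha=0.5, kappa=0.0, beta=2.0):
    """Mean and covariance weights for the 2n+1 sigma points, following
    the formulas quoted in the text:
      xi     = alpha^2 (n + kappa) - n
      W_0m   = xi / (n + xi)
      W_0c   = xi / (n + xi) + (1 - alpha^2 + beta)
      W_im,c = 1 / (2n + 2 xi) for i > 0
    """
    xi = alpha**2 * (n + kappa) - n
    wm = np.full(2 * n + 1, 1.0 / (2 * n + 2 * xi))
    wc = wm.copy()
    wm[0] = xi / (n + xi)
    wc[0] = xi / (n + xi) + (1.0 - alpha**2 + beta)
    return wm, wc

wm, wc = ukf_weights(n=2)
# The mean weights sum to one by construction (the first weight may be negative).
```

Note that the mean weights always sum to one: \(\frac{\xi}{n+\xi} + 2n \cdot \frac{1}{2n+2\xi} = 1\).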
where k is the index of the mixture component. The weights are normalized so that \(\sum _{k=1}^{m} w_{k}^{+} = 1\).
2.2 Havlak and Campbell (H&C)
In H&C [6], a GM is used to improve estimation in the case of a nonlinear state transition model, but the method can also be applied to the case of a nonlinear measurement model by using the measurement function in place of the state transition function.
For computing the direction of maximum nonlinearity, the sigma points \(\mathcal {X}_{i}\) are weighted by the norms of their associated residuals \(\left\|A\mathcal {X}_{i} + \mathbf {b} - \mathcal {Y}_{i}\right\|\), and the direction of nonlinearity is the eigenvector corresponding to the largest eigenvalue of the second moment of this set of weighted sigma points.
In the prior splitting, the first dimension of a standard multivariate normal distribution is replaced with a precomputed mixture. An affine transformation is then applied to this mixture. The affine transformation is chosen to be such that after transformation the split dimension is aligned with the direction of the maximum nonlinearity and such that the resulting mixture is a good approximation of the prior component.
2.3 Faubel and Klakow (F&K)
The splitting direction is chosen to be the eigenvector of the prior covariance closest to this direction, and the prior components are split into two or three components that preserve the mean and covariance of the original component.
2.4 Split and merge (S&M)
higher than some predefined value [8]. The splitting direction is chosen to be the eigenvector corresponding to the maximum eigenvalue of the prior covariance. Components are split into two components with equal covariances.
2.5 Box GMF (BGMF)
where H _{ i } is the Hessian of the ith component of the measurement equation. This criterion is only used to assess whether or not the splitting should be done.
If a measurement is considered highly nonlinear, the prior is split into a grid along all dimensions. Each grid cell is replaced with a normal distribution having the mean and covariance of the pdf inside the cell and having as weight the amount of probability inside the cell. It is shown that the resulting mixture converges weakly to the prior component.
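As a hedged one-dimensional illustration of this grid splitting (our own helper names, for a standard normal prior; the paper's method operates on multivariate priors), each cell is replaced by a component with the truncated-normal moments of that cell:

```python
import math

def phi(x):
    """Standard normal pdf (0 at +-infinity)."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    if math.isinf(x):
        return 0.0 if x < 0 else 1.0
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def xphi(x):
    """x * pdf(x), with the limit 0 at +-infinity."""
    return 0.0 if math.isinf(x) else x * phi(x)

def box_split_1d(edges):
    """Replace each cell (a, b] of a standard normal with a component whose
    weight is the probability mass of the cell and whose mean and variance
    are the truncated-normal moments of the cell."""
    comps = []
    for a, b in zip(edges[:-1], edges[1:]):
        w = Phi(b) - Phi(a)
        mu = (phi(a) - phi(b)) / w
        var = 1.0 + (xphi(a) - xphi(b)) / w - mu * mu
        comps.append((w, mu, var))
    return comps

comps = box_split_1d([-math.inf, -1.0, 1.0, math.inf])
```

By the law of total variance over the partition, the resulting mixture has exactly the mean and variance of the original standard normal.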
2.6 Adaptive splitting (AS)
It is shown that the direction of the maximum nonlinearity is aligned with the eigenvector corresponding to the largest absolute eigenvalue of PH. In [9], a numerical method for approximating the splitting direction that is exact for secondorder polynomial measurements is also presented.
3 Binomial Gaussian mixture
The BinoGM is based on splitting a normally distributed prior component into smaller ones using weights and transformed locations from the binomial distribution.
If a random variable having a standardized binomial distribution is scaled by σ, then its variance is σ ^{2}.
The BinoGM is constructed using a mixture of standard normal distributions along the main axes, with mixture component means and weights selected using a scaled binomial distribution. The mixture product is then transformed with an affine transformation to have the desired mean and covariance.
which contains the sets of indices to binomial distributions of all mixture components. The notation C _{l,i} denotes the ith component of the lth combination. If m _{ k }=1, the term \(\frac {2 C_{l,k} - m_{k} - 1}{\sqrt {m_{k}-1}}\) in (21b) is replaced with 0.
where \(\Sigma =\text {diag}\left ({\sigma _{1}^{2}}, \ldots, {\sigma _{n}^{2}}\right)\), to denote a random variable distributed according to a BinoGM. Parameters of the distribution are \(\mu \in \mathbb {R}^{n}\), \(T \in \mathbb {R}^{n \times n} \wedge \text {det}\,{T} \neq 0\) and \(\forall i; 1\leq i \leq n: m_{i} \in \mathbb {N}^{+} \wedge \sigma _{i} \in \mathbb {R}^{+}\).
Matrix T is a square root (5) of a component covariance P. We use the notation T instead of L(P) here, because the matrix square root L(P) is not unique and the choice of T affects the covariance of the mixture (25). BinoGM could also be parameterized using the prior covariance P _{0} instead of T. In that case, T should be replaced with \(T=L(P_{0})(\Sigma +I)^{-\frac {1}{2}}\), and the component covariance is affected by the choice of L(P _{0}).
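A minimal numerical check of this parameterization, under the assumption that the mixture covariance has the form T(Σ+I)T^T (variable names and example numbers are ours):

```python
import numpy as np

# With a diagonal Sigma of binomial variances, choosing
#   T = L(P0) (Sigma + I)^(-1/2)
# makes the mixture covariance T (Sigma + I) T^T reproduce P0 exactly.
P0 = np.array([[5.0, 2.0],
               [2.0, 5.0]])
Sigma = np.diag([3.0, 0.0])            # e.g., split only along the first axis
L = np.linalg.cholesky(P0)             # one valid square root L(P0)
T = L @ np.diag(1.0 / np.sqrt(np.diag(Sigma) + 1.0))
mixture_cov = T @ (Sigma + np.eye(2)) @ T.T   # equals P0
```

Because Σ is diagonal, the inverse square root is just an elementwise operation on the diagonal.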
The limit (26) for BinoGM is in the sense of the weak convergence and convergence of the pdf.
Because x _{B,m} converges weakly to the standard normal distribution, by the continuous mapping theorem σ x _{B,m} converges weakly to a normal distribution with variance σ ^{2} [12].
where \(w_{i}={m-1 \choose i-1} \) and \(\mu _{i}=\sigma \frac {2i - m - 1}{\sqrt {m-1}}\). This is the pdf of a GM whose variance is σ ^{2}+1. For the weak convergence, we use the following result (Theorem 105 of [13]):
Theorem 1.
If cdf Φ _{ m }(x) converges weakly to Φ(x) and cdf B _{ m }(x) converges weakly to B(x) and Φ _{ m } and B _{ m } are independent, then the convolution of Φ _{ m }(x) and B _{ m }(x) converges to the convolution of Φ(x) and B(x).
Because the cdf of σ x _{B,m} converges weakly to the cdf of a normal distribution with variance σ ^{2} and the sum of two independent normally distributed random variables is normally distributed, the cdf of the sum (27) converges to the cdf of a normal distribution.
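The one-dimensional mixture defined by the binomial weights and scaled means above can be checked numerically. This sketch (our own function name; the binomial weights are normalized so they sum to one) verifies that the mixture has mean 0 and variance σ²+1:

```python
import math

def binomial_mixture_1d(m, sigma):
    """Weights and means of the m-component mixture from the text:
    w_i proportional to C(m-1, i-1), mu_i = sigma (2i - m - 1)/sqrt(m - 1),
    each component having unit variance."""
    if m == 1:
        return [1.0], [0.0]
    ws = [math.comb(m - 1, i - 1) for i in range(1, m + 1)]
    total = sum(ws)
    ws = [w / total for w in ws]
    mus = [sigma * (2 * i - m - 1) / math.sqrt(m - 1) for i in range(1, m + 1)]
    return ws, mus

ws, mus = binomial_mixture_1d(m=5, sigma=2.0)
mean = sum(w * mu for w, mu in zip(ws, mus))
# Each component has unit variance, so the mixture variance is
# E[mu^2 + 1] - mean^2 = sigma^2 + 1.
var = sum(w * (mu**2 + 1.0) for w, mu in zip(ws, mus)) - mean**2
```

For m=5 and σ=2, the means are −4, −2, 0, 2, 4 with weights 1/16, 4/16, 6/16, 4/16, 1/16, giving mixture variance σ²+1 = 5.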
 1.
The pdf g _{ m } exists
 2.
Weak convergence
 3.
\(\sup _{m} g_{m}(x)\leq M(x) < \infty \) for all \(x\in \mathbb {R}\)
 4.
For every x and ε>0, there exist δ(ε) and m(ε) such that |x−y|<δ(ε) implies |g _{ m }(x)−g _{ m }(y)|<ε for all m≥m(ε).
we see that by choosing \(\delta = \frac {\varepsilon }{\max p_{N}^{\prime }(c)+1}\), the requirements of the fourth item are fulfilled and the pdf converges.
where ε _{ i } is an error term.
Because p _{ x } converges to a normal distribution, p _{ xBinoGM} also converges, and thus the pdf of the multidimensional mixture (35), which is the BinoGM (21), converges to a normal distribution.
4 Binomial Gaussian mixture filter
In this section, we present BinoGMF, which uses the BinoGM for splitting prior components when nonlinear measurements occur. In Section 4.1, we propose algorithms for choosing the parameters of the BinoGM in the case of one-dimensional measurements. We then extend the proposed method to multidimensional, possibly dependent, measurements in Section 4.2, and finally present the algorithm for time series filtering in Section 4.3.
4.1 Choosing the parameters for a BinoGM
The BinoGM can be constructed using (21) when the numbers of components m _{1},…,m _{ n }, the binomial variances σ _{1},…,σ _{ n }, and the parameters of the affine transformation, T and μ, are given. The goal is now to find parameters for the BinoGM such that it is a good approximation of the prior, the nonlinearity is below a desired threshold η _{limit}, and the number of components is minimized. In this subsection, we concentrate on the case of a one-dimensional measurement. The treatment of multidimensional measurements is presented in Section 4.2.
Using this relationship, matrix Σ in (67) is linked to the total number of components by the formula \(m_{\text {tot}}=\prod m_{i} = \prod \left (\Sigma _{[i,i]}+1 \right)\).
as the nonlinearity measure, which is similar to the ones used in [2,9]. The Hessian H of the measurement function is evaluated at the mean of a component and treated as a constant. Analytical evaluation of this measure requires that the second derivative of the measurement function (1) exists. The optimal component size is such that the total number of components is minimized while (40) is satisfied and the nonlinearity (41) is below η _{limit}. The nonlinearity measure (41) is defined for one-dimensional measurements only; a method for handling multidimensional measurements is presented in Section 4.2.
and Δ _{ i }=γ L(P _{0})_{[:,i]}. If γ is chosen as \(\gamma =\sqrt {n+\xi }\), the computed values of the measurement function in (47) may also be used in the UKF [9]. The Q matrix can also be used for computing the amount of nonlinearity (41), because \(\text {tr}\, PHPH \approx \frac {\text {tr}\, QQ}{\gamma ^{4}}\) [9]. Using the Q matrix instead of the analytically computed H takes measurement function values from a larger area into account, does not require analytical differentiation of the measurement function, and allows the numerical approximation to be done for any measurement function (1) that is defined everywhere. Because the approximation is based on second-order polynomials, the computed nonlinearity value may not represent the nonlinearity well when the measurement function is not differentiable.
Our proposed algorithm for choosing the integer number of components is presented in Algorithm 1. At the start of the algorithm, the nonlinearity is reduced to η _{limit}, but if this reduction produces more than m _{limit} components, the component covariance is chosen such that nonlinearity is minimized while having at most m _{limit} components. The algorithm for splitting a Gaussian with component weight w _{0} is summarized in Algorithm 2.
where p(x) is the true posterior and q(x) is the approximated posterior estimate.
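For reference, between two univariate normal distributions the KL divergence has a closed form, which is a convenient stand-in for the general integral definition when sanity-checking approximations (the helper name is ours):

```python
import math

def kl_gauss_1d(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for univariate normals p and q:
    log(sigma_q/sigma_p) + (var_p + (mu_p - mu_q)^2) / (2 var_q) - 1/2."""
    return (0.5 * math.log(var_q / var_p)
            + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q) - 0.5)
```

The divergence is zero exactly when the two distributions coincide, and, e.g., a unit mean offset at unit variance gives KL = 0.5.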
From the figure, we can see that if the number of components is quadrupled from (40), the KL divergence does not improve significantly. If the subfigures with the same numbers of components are compared, it is evident that the subfigures with η _{limit}=4 have clearly the worst KL divergence. Subfigures with η _{limit}=0.25 have slightly better KL divergence than those with η _{limit}=1 but are, by visual inspection, more peaked. The KL divergence decreases towards the bottom right corner, which indicates convergence.
4.2 Multidimensional measurements
The covariance of the transformed measurements is L(R)^{−1} R L(R)^{−T}=I. This kind of transformation does not change the posterior, which is shown for the UKF in Appendix 3.
Thus, the update of the prior with two conditionally independent measurements is \(p(\mathbf {x} \mid \mathbf {y}_{1}, \mathbf {y}_{2}) \propto p(\mathbf {x})p(\mathbf {y}_{1} \mid \mathbf {x})p(\mathbf {y}_{2} \mid \mathbf {x})\), which can be done with two separate updates: first \(p(\mathbf {x} \mid \mathbf {y}_{1}) \propto p(\mathbf {x})p(\mathbf {y}_{1} \mid \mathbf {x})\) and then, using this as the prior for the next update, \(p(\mathbf {x} \mid \mathbf {y}_{1},\mathbf {y}_{2}) \propto p(\mathbf {x} \mid \mathbf {y}_{1}) p(\mathbf {y}_{2} \mid \mathbf {x})\). Thus, the measurements can be applied one at a time. Of course, when an approximate update method such as the UKF is used, the result is not exactly the same because approximation errors are involved. Processing measurements separately can improve the accuracy; e.g., when y _{1} is a linear measurement, the prior for the second measurement becomes smaller and thus the effect of its nonlinearity is reduced.
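A minimal sketch of the decorrelation step (variable names are ours): whitening with A = L(R)^{−1} turns the measurement error covariance into the identity, after which the measurement components are independent and can be processed one at a time:

```python
import numpy as np

# Example correlated measurement error covariance (hypothetical numbers).
R = np.array([[2.0, 0.5],
              [0.5, 1.0]])
A = np.linalg.inv(np.linalg.cholesky(R))  # A = L(R)^(-1), L(R) lower triangular
R_white = A @ R @ A.T                     # A R A^T = I: components independent

def whiten(y, h_vals):
    """Apply the same linear transformation to the measurement vector and to
    the evaluated measurement function values (hypothetical helper)."""
    return A @ y, A @ h_vals
```

Since the transformation is applied consistently to the measurement and the measurement function, the posterior is unchanged (exactly for a linear update, up to approximation error for the UKF).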
4.3 Time series filtering
So far, we have discussed the update of a prior component with a measurement at a single time step. For the time series estimation, the filter requires in addition to Algorithms 1 and 2 a prediction step and a component reduction step.
If the state model is linear, the predicted mean of a component is \(\mu _{(t)} = F \mu _{(t-1)}^{+}\), where F is the state transition matrix and \(\mu _{(t-1)}^{+}\) is the posterior mean of the previous time step. The predicted covariance is \(P_{(t)} = {FP}_{(t-1)}^{+}F^{T} + W\), where W is the covariance of the state transition error. The weights of the components do not change in the prediction step. If the state model is nonlinear, a sigma point approach can also be used for the prediction [3].
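The linear prediction step for one component can be sketched as follows (function name and example numbers are ours):

```python
import numpy as np

def predict_component(mu_post, P_post, F, W):
    """Linear prediction step for one mixture component: the weight stays
    unchanged, while the mean and covariance propagate through the model."""
    mu_pred = F @ mu_post
    P_pred = F @ P_post @ F.T + W
    return mu_pred, P_pred

# Hypothetical constant-velocity example with state x = [position, velocity].
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
W = 0.1 * np.eye(2)
mu, P = predict_component(np.array([0.0, 1.0]), np.eye(2), F, W)
```

The predicted covariance stays symmetric by construction, since F P F^T + W is a sum of symmetric matrices.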
where \(P_{i,j} = \frac {w_{i}P_{i} + w_{j}P_{j}}{w_{i}+w_{j}}+(\mu _{i}\mu _{j})(\mu _{i}\mu _{j})^{T}\). Whenever the number of components is larger than m _{reduce} or B _{i,j}<B _{limit}, the component pair that has the smallest B _{i,j} is merged so that the mean and covariance of the mixture is preserved. The test for B _{limit} is our own addition to the algorithm to allow the number of components to be reduced below m _{reduce} if there are very similar components.
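A moment-matched merge of a component pair, preserving the pair's total weight, mean, and covariance, can be sketched as follows (our function name; the dissimilarity B_{i,j} in the reduction algorithm is computed from the same quantities):

```python
import numpy as np

def merge_pair(w_i, mu_i, P_i, w_j, mu_j, P_j):
    """Merge two mixture components so that the weight, mean, and covariance
    of the pair are preserved exactly (standard moment-matched merge)."""
    w = w_i + w_j
    a, b = w_i / w, w_j / w
    mu = a * mu_i + b * mu_j
    d = (mu_i - mu_j).reshape(-1, 1)
    # Weighted covariances plus the spread-of-means term.
    P = a * P_i + b * P_j + a * b * (d @ d.T)
    return w, mu, P

w, mu, P = merge_pair(0.3, np.array([0.0, 0.0]), np.eye(2),
                      0.7, np.array([2.0, 0.0]), 2 * np.eye(2))
```

For the example pair above, the merged mean is (1.4, 0) and the merged covariance matches the second moment of the two-component mixture.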
When the prior consists of multiple components, the value of m _{limit} for each component is chosen proportional to the prior weight so that they sum up to the total limit of components. Algorithm 3 shows the BinoGMF algorithm. A MATLAB implementation of BinoGMF is also available [see Additional file 1].
5 Tests
We evaluate the performance of the proposed BinoGMF method in two simulation settings. First, the splitting method is compared with other GM splitting algorithms in single-prior-component, single-measurement cases; then, BinoGMF is compared with particle filters in a time series scenario. In all cases, a single-component UKF is used as a reference.
5.1 Comparison of GM splitting methods

2D range – a range measurement in two-dimensional positioning

2D second order – the measurement consists of a second-order polynomial term aligned with a random direction and a linear measurement term aligned with another random direction

4D speed – a speedometer measurement with a highly uncertain prior

10D third order – a third-order polynomial measurement along a random direction in a ten-dimensional space
For the computation of sigma points and UKF updates, we used the parameter values α _{UKF}=0.5, κ _{UKF}=0, and β _{UKF}=2. Detailed parameters of the different cases are presented in Appendix 4.

For H&C, the one-dimensional mixture is optimized with a different method. This should not have a significant effect on the results.

For F&K, [7] gives two alternatives for choosing the number of components and for computing the direction of nonlinearity. We chose the split into three components, and the direction of nonlinearity was computed using the eigendecomposition.

In AS, splitting is not done recursively; instead, every split is applied to the component with the highest nonlinearity. This is done to obtain the desired number of components, and it probably makes the estimate more accurate than the original method.
Our implementation of UKF, H&C, F&K, and S&M might have some other inadvertent differences from the original implementations.
The nonlinearitybased stopping criterion is not tested in this paper. Splitting is done with a fixed upper limit on the number of components. We chose to test the methods with at most 3, 9, and 81 components for consistency with the number of components in BGMF. The number of components in BGMF is (2N ^{2}+1)^{ n }, where \(N \in \mathbb {N}\) is a parameter that adjusts the number of components. Since BinoGMF does the splitting into multiple directions at once, the number of components of BinoGMF is usually less than the maximum limit.
Average error on estimation of the splitting direction in degrees
        Random   F&K   H&C   BinoGMF
Δθ (°)  45       22    23    0.8
The true direction of nonlinearity is a. AS and BinoGMF use the same algorithm for computing the direction of maximum nonlinearity. S&M uses the maximum eigenvalue of the prior component as the split direction, and in this case, it gives the same results as choosing the direction randomly. In the table, Δ θ is the average error of the direction of maximum nonlinearity.
In two dimensions, choosing a random direction for splitting has a mean error of 45 degrees. The good performance of the proposed method and AS in Figure 5 is partly due to the algorithm for estimating the direction of maximum nonlinearity, because they have clearly the most accurate nonlinearity direction estimates. The 0.8 degree error of AS and BinoGMF could be reduced by choosing sigma points closer to the mean, but this would make the estimate more local. BinoGMF was also tested with α _{UKF}=10^{−3}. This resulted in splitting direction errors smaller than 10^{−5} degrees, but there was a significant accuracy drop in the 4D speed case. With a small value of α _{UKF}, the computation of Q does not take variations of the Hessian into account. This causes fewer problems with AS, because AS splits only in the direction of maximum nonlinearity and then reevaluates the nonlinearity for the resulting components.
Number of measurement function evaluations
Method  Evaluations  Order 

H&C  (m+1)(2n−1)  \(\mathcal {O}(mn)\) 
F&K  \( \frac {(m + 1)}{2}(2n+1)\)  \(\mathcal {O}(mn)\) 
S&M  (2m−1)(2n−1)  \(\mathcal {O}(mn)\) 
BGMF  m(2n+1)  \(\mathcal {O}(mn)\) 
AS  (m−1)(n ^{2}+n)+2n+1  \(\mathcal {O}(mn^{2})\) 
BinoGMF  \( \frac {n^{2} +n}{2}+ m(2n+1) \)  \(\mathcal {O}(n^{2} + mn)\) 
5.2 Time series filtering
In the time series evaluation, we simulate the navigation of an aircraft. Seen from above, the route follows circular loops with a radius of 90 m. In the vertical dimension, the aircraft first ascends until it reaches 100 m, flies at a constant altitude for a while, and finally descends. In the simulation, there are five ground stations that emit signals that the aircraft receives.

Time of arrival (TOA) [16]$$ \mathbf{y} = \left\| \mathbf{r}_{\text{receiver}} - \mathbf{r}_{\text{emitter}} \right\| + \delta_{\text{receiver}} + \varepsilon_{\text{receiver}}+\varepsilon, $$(54)where r _{receiver} is the aircraft location, r _{emitter} is the location of a ground station, δ _{receiver} is the receiver clock bias, ε _{receiver} is an error caused by receiver clock jitter that is the same for all measurements, and ε is the independent error. The covariance of a set of TOA measurements is$$ R_{\text{TOA}}=I + 5^{2}\mathbf{1}, $$(55)
where 1 is a matrix of ones.

Doppler [17]$$ \mathbf{y} = \frac{\mathbf{v}_{\text{receiver}}^{T} (\mathbf{r}_{\text{receiver}} - \mathbf{r}_{\text{emitter}})}{\left\| \mathbf{r}_{\text{receiver}} - \mathbf{r}_{\text{emitter}} \right\|}+ {\gamma}_{\text{receiver}} + \varepsilon_{\text{receiver}}+\varepsilon, $$(56)where v _{receiver} is the velocity of the aircraft and γ _{receiver} is the clock drift. The Doppler measurement error covariance used is$$ R_{\text{Doppler}} = I + \mathbf{1}. $$(57)
The probability of receiving a measurement depends on the distance from a ground station, and a TOA measurement is available with 50% probability when a Doppler measurement is available. The estimated state contains eight variables: three for position, three for velocity, and one each for the clock bias and drift. The state model used with all filters is linear. The state model, true track, and ground station parameters can be found in Appendix 4.
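The noise-free parts of the two measurement models (54) and (56) can be sketched as follows (function names and the example geometry are ours, not from the paper):

```python
import numpy as np

def toa_measurement(r_receiver, r_emitters, delta):
    """Noise-free part of the TOA measurement (54): the range to each
    ground station plus the receiver clock bias delta."""
    return np.linalg.norm(r_receiver - r_emitters, axis=1) + delta

def toa_covariance(n_emitters):
    """TOA covariance (55): unit-variance independent errors plus a common
    clock-jitter term of standard deviation 5 shared by all measurements."""
    return np.eye(n_emitters) + 25.0 * np.ones((n_emitters, n_emitters))

def doppler_measurement(r_receiver, v_receiver, r_emitter, gamma):
    """Noise-free part of the Doppler measurement (56): the range rate
    towards one ground station plus the receiver clock drift gamma."""
    d = r_receiver - r_emitter
    return v_receiver @ d / np.linalg.norm(d) + gamma

# Hypothetical example: receiver at the origin, emitters at fixed points.
y_toa = toa_measurement(np.zeros(3), np.array([[3.0, 4.0, 0.0]]), 1.0)
y_dop = doppler_measurement(np.zeros(3), np.array([1.0, 0.0, 0.0]),
                            np.array([-2.0, 0.0, 0.0]), 0.5)
```

Note how the shared jitter term makes all TOA errors correlated, so the decorrelation step of Section 4.2 applies before the measurements can be processed one at a time.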
Parameters used in filtering test for BinoGMF
            m _{total}   m _{reduce}   η _{limit}   B _{limit}
BinoGMF4    4            2             4            0.1
BinoGMF16   16           4             1            0.01
BinoGMF64   64           16            0.25         0.001
The UKF and BinoGMF results are placed according to their time usage relative to that of the PFs. The figure shows that BinoGMF achieves better positioning accuracy with less time usage than the PFs. The 95% quantile of UKF and BinoGMF4 is more than 150 m, which is caused by the multimodality of the posterior; in these cases, UKF and BinoGMF4 follow only wrong modes. The RBPF has better accuracy than the PF with a similar number of particles, but is slower. In our implementation, updating one Rao-Blackwellized particle is 6 to 10 times faster than a UKF update, depending on how many particles are updated. The RBPF requires many more particles than BinoGMF requires components. The bootstrap PF is faster than the RBPF, because our MATLAB implementation of the bootstrap PF is highly optimized.
The figure shows that the PF and BinoGMF16 estimates are multimodal at several time steps, but most of the time, BinoGMF16 has more weight on the correct mode. The RBPF starts to follow a wrong mode at the beginning and does not recover during the whole test. The UKF estimate starts out somewhere between the modes, and it takes a while to converge to the correct mode; the UKF could also converge to a wrong mode. In multimodal situations such as in Figure 7, comparing the mean to the true route is not necessarily so relevant; e.g., for the PF at time step 70, the mean is located in a low-probability area of the posterior.
This simulated test showed that there are situations where BinoGMF can outperform PFs. We also found that if the state transition model noise (87) was made smaller without changing the true track, the number of required particles increased rapidly, while the effect on BinoGMF was small.
6 Conclusions
In this paper, we have presented the BinoGMF. BinoGMF uses a binomial distribution in the generation of a GM from a normal distribution. It was shown that the pdf and cdf of the resulting mixture converge to the prior when the number of components is increased. Furthermore, we presented an algorithm for choosing the component size so that the nonlinearity is not too high, the resulting mixture is a good approximation of the prior, and the number of required components is minimized.
We compared the proposed method with the UKF and five different GMFs in several single-step estimation simulation cases and with the UKF and PFs in a time series estimation scenario. In these tests, the proposed method outperforms the other GM-based methods in accuracy, while using a similar number of or fewer measurement function evaluations. In filtering, BinoGMF provided more accurate estimates faster than the bootstrap PF or the Rao-Blackwellized PF. BinoGMF can be used instead of PFs in suitable situations to get better estimation accuracy, if the measurement error can be modeled as additive and normally distributed.
Because BinoGMF performed well in all tests, we recommend using it instead of other GM splitting methods to get better estimation accuracy. It performs especially well in situations with more than a few dimensions and in cases where an accurate estimate of the posterior pdf is essential.
7 Appendices
7.1 Appendix 1: Determining the component distance
7.2 Appendix 2: Optimization of mixture parameters
where H is the Hessian of the measurement function, then the nonlinearity associated with direction L(P _{0})V e _{ i } changes from \({\lambda _{i}^{2}}\) to \((1-\beta)^{2}{\lambda _{i}^{2}}\).
where B is a diagonal matrix having β _{1},…,β _{ n } on its diagonal.
This means that if the integer nature of m _{ i } is neglected, the optimum m _{ i } is either 1 or proportional to λ _{ i }.
7.3 Appendix 3: UKF update after a linear transformation is applied to measurement function
If the original measurement error covariance is R, then the transformed measurement error covariance is \(\hat {R}=ARA^{T}\).
and the posterior estimate is identical to the estimate computed with nontransformed measurement function.
7.4 Appendix 4: Simulation parameters
7.4.1 GMF comparison

2D range

Prior mean: [5 0]^{ T }

Prior covariance: \(\left [\begin {array}{cc}\cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end {array}\right ]\left [\begin {array}{cc}10 & 0 \\ 0 & 1\end {array}\right ]\left [\begin {array}{cc}\cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end {array}\right ]\), where θ∼U(0,2π)

Measurement function: \(\sqrt {\mathbf {x}_{[1]}^{2} + \mathbf {x}_{[2]}^{2}}\)


2D second order

Prior mean: ∼N(0,I)

Prior covariance: \(\left [\begin {array}{ll}5 & 2 \\ 2 & 5\end {array}\right ]\)

Measurement function: \(\left (\mathbf {a}_{1}^{T}\mathbf {x}\right)^{2}+\mathbf {a}_{2}^{T}\mathbf {x}\), where a _{1}∼N(0,I) and a _{2}∼N(0,I)


4D speed

Prior mean: \(\sim N\left (0, \left [ \begin {array}{ll} \frac {100^{3}}{3}I & \frac {100^{2}}{2} I \\ \frac {100^{2}}{2}I & \frac {100^{2}}{2} I \end {array} \right ]\right)\)

Prior covariance: \(\left [\begin {array}{ll} \frac {100^{3}}{3}I & \frac {100^{2}}{2} I \\ \frac {100^{2}}{2}I & \frac {100^{2}}{2} I \end {array} \right ]\)

Measurement function: \(\sqrt {\mathbf {x}_{[3]}^{2}+\mathbf {x}_{[4]}^{2}}\)


10D third order

Prior mean: ∼N(0,I)

Prior covariance: I

Measurement function: \(\left (k_{4}\mathbf {a}^{T}\mathbf {x}\right)^{3}+ \left (k_{3}\mathbf {a}^{T}\mathbf {x}\right)^{2}+k_{2}\mathbf {a}^{T}\mathbf {x}+k_{1}\), where \({k_{i}\sim N(0,1)} \left (i=1,\ldots,4\right)\) and a∼N(0,I)

7.4.2 Filtering example

Position:$$ \mathbf{r}_{i}^{\text{true}} =\left[ \begin{array}{c} 90 \sin{\theta_{i}} \\ 90 \cos{\theta_{i}} \\ h_{i} \end{array} \right], $$(88)where \(\theta _{i}= \frac {2i\pi }{100}\) and$$ h_{i}= \left\{ \begin{array}{ll} 50+50\cos \frac{\theta_{i}+2\pi}{2}, & i \leq 100 \\ 100, & 100< i \leq 300 \\ 50+50\cos \frac{\theta_{i}-2\pi}{2}, & 300< i \end{array} \right. $$(89)

Velocity: \(\mathbf {v}^{\text {true}}_{i} = r_{i+1}^{\text {true}}  r_{i}^{\text {true}}\)

Bias: \(\delta _{i}= 10\cos \frac {\theta _{i}}{3}\)

Drift: γ _{ i }=δ _{i+1}−δ _{ i }
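The track of the appendix can be generated directly from (88) and (89); in this sketch (our own function name), the sign in the last altitude branch is read as θ_i − 2π so that the altitude returns to zero at the end of the run:

```python
import math

def true_state(i):
    """Simulated aircraft position from the appendix: a 90 m-radius circle
    in the horizontal plane with the piecewise-cosine altitude profile (89)."""
    theta = 2 * i * math.pi / 100
    if i <= 100:
        h = 50 + 50 * math.cos((theta + 2 * math.pi) / 2)   # ascent: 0 -> 100 m
    elif i <= 300:
        h = 100.0                                           # cruise at 100 m
    else:
        h = 50 + 50 * math.cos((theta - 2 * math.pi) / 2)   # descent: 100 -> 0 m
    return (90 * math.sin(theta), 90 * math.cos(theta), h)

# Altitude starts at 0, reaches 100 m at i = 100, and returns to 0 at i = 400.
```

Velocities, clock biases, and drifts then follow from the difference formulas listed above.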
Declarations
Acknowledgements
This work was supported by Tampere Doctoral Programme in Information Science and Engineering, Nokia Corporation, Nokia Foundation, and Jenny and Antti Wihuri Foundation. The funding sources were not involved in the preparation of this article. The simulations were carried out using the computing resources of CSC  IT Center for Science.
Authors’ Affiliations
References
 HW Sorenson, DL Alspach, Recursive Bayesian estimation using Gaussian sums. Automatica. 7(4), 465–479 (1971).
 S Ali-Löytty, Box Gaussian mixture filter. IEEE Trans. Autom. Control. 55(9), 2165–2169 (2010). doi:10.1109/TAC.2010.2051486.
 SJ Julier, JK Uhlmann, HF Durrant-Whyte, in American Control Conference, 3. A new approach for filtering nonlinear systems (Seattle, WA, USA, 21–23 June 1995), pp. 1628–1632.
 EA Wan, R Van Der Merwe, in Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000, AS-SPCC. The unscented Kalman filter for nonlinear estimation (Lake Louise, AB, Canada, 1–4 October 2000), pp. 153–158. doi:10.1109/ASSPCC.2000.882463.
 AR Runnalls, Kullback-Leibler approach to Gaussian mixture reduction. IEEE Trans. Aerospace Electron. Syst. 43(3), 989–999 (2007). doi:10.1109/TAES.2007.4383588.
 F Havlak, M Campbell, Discrete and continuous, probabilistic anticipation for autonomous robots in urban environments. IEEE Trans. Rob. PP(99), 1–14 (2013). doi:10.1109/TRO.2013.2291620.
 F Faubel, D Klakow, in Proc. Europ. Sig. Process. Conf. (EUSIPCO). Further improvement of the adaptive level of detail transform: splitting in direction of the nonlinearity (Aalborg, Denmark, 23).
 F Faubel, J McDonough, D Klakow, The split and merge unscented Gaussian mixture filter. Signal Process. Lett., IEEE. 16(9), 786–789 (2009). doi:10.1109/LSP.2009.2024859.
 M Raitoharju, S Ali-Löytty, An adaptive derivative free method for Bayesian posterior approximation. Signal Process. Lett., IEEE. 19(2), 87–90 (2012). doi:10.1109/LSP.2011.2179800.
 AC Berry, The accuracy of the Gaussian approximation to the sum of independent variates. Trans. Am. Math. Soc. 49(1), 122–136 (1941).
 W Feller, On the normal approximation to the binomial distribution. Ann. Math. Stat. 16(4), 319–329 (1945). doi:10.1214/aoms/1177731058.
 P Billingsley, Convergence of Probability Measures. Wiley Series in Probability and Statistics (Wiley, New York, 2009).
 D McLeish, STAT 901: Probability. Lecture Notes (University of Waterloo, Waterloo, ON, Canada, 2005).
 DD Boos, A converse to Scheffé's theorem. Ann. Stat., 423–427 (1985).
 S Kullback, RA Leibler, On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951).
 F Gustafsson, F Gunnarsson, Mobile positioning using wireless networks: possibilities and fundamental limitations based on available wireless network measurements. Signal Process. Mag., IEEE. 22(4), 41–53 (2005). doi:10.1109/MSP.2005.1458284.
 D Borio, N Sokolova, G Lachapelle, in Proc. ION/GNSS, 9. Doppler measurements and velocity estimation: a theoretical framework with software receiver implementation (Savannah, GA, USA, 22), pp. 304–316.
 NJ Gordon, DJ Salmond, AFM Smith, Novel approach to nonlinear/non-Gaussian Bayesian state estimation. Radar Signal Process. IEE Proc. F. 140(2), 107–113 (1993).
 B Ristic, S Arulampalam, N Gordon, Beyond the Kalman Filter: Particle Filters for Tracking Applications (Artech House, Boston, 2004).
 G Hendeby, R Karlsson, F Gustafsson, The Rao-Blackwellized particle filter: a filter bank implementation. EURASIP J. Adv. Signal Process. 2010, 724087 (2010). doi:10.1155/2010/724087.
 HW Kuhn, AW Tucker, in Second Berkeley Symposium on Mathematical Statistics and Probability, 1. Nonlinear programming (Berkeley, CA, USA, 31 July – 12 August 1950), pp. 481–492.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.