Model set adaptive filtering algorithm using variational Bayesian approximations and Rényi information divergence

The paper presents a model set adaptive filtering algorithm based on variational Bayesian approximation (MSA-VB) for the target tracking system with the model and noise uncertainties. The Rényi information divergence, as a criterion, is to choose the best match model that has the minimum divergence between candidate models and true mode. Subsequently, the model-conditioned estimation based on variational Bayesian approximation is proposed to estimate system state and measurement noise variances. To deal with the coupled noise intractability, the moments matching technique is used to obtain the mixed statistics of measurement noise at the fusion stage. The proposed algorithm is compared with the interacting multiple models (IMM) algorithm and the variational Bayesian-interacting multiple models (IMM-VB) algorithm via two scenarios for maneuvering target tracking, and simulation results show that the MSA-VB has improved estimation and tracking performance.


Introduction
Target tracking plays an important role in a variety of practical applications, such as underwater sonar tracking [1], aircraft surveillance [2], and visual tracking [3,4]. The objective is to accurately estimate the target state for a sequence of observation sets in presence of noise. When the systems are linear and noise are Gaussian, Kalman filter provides an optimal filtering technique to estimate the target state [5]. Its variant have been studied under numerous relaxed assumptions [6][7][8][9]. Of course, this limitation in these filters is that it assumes a complete prior knowledge of the dynamic and measurement model parameters, including the noise statistics. In many practical situations, model uncertainty is caused by unknown dynamic model parameters, and noise uncertainty is caused by *Correspondence: gaos@xatu.edu.cn 2 Autonomous Systems and Intelligent Control International Joint Research Center, Xi'an Technological University, Xi'an, China 3 School of Electronic Information Engineering, Xi'an Technological University, Xi'an, China Full list of author information is available at the end of the article unknown noise statistics; both violate the abovementioned assumption.
The model uncertainty is due to the fact that the target dynamics cannot be properly modeled by a single state space model (SSM) [10]. For example, in the maneuvering target tracking, the target has different maneuvering behaviors, such as constant velocity, acceleration, and turning with different angular velocity. Thus, the interacting multiple models (IMM) algorithm was proposed in [11], where the multiple models are used to match the different maneuvering behaviors and the transition among different models is subject to a Markov process. Since the state estimation of each model is parallel and independent, the computational complexity of the algorithm increases gradually as the number of models increases and the conflict among the models are more significant. In order to deal with the problem, Li and BarShalom [12] proposed the variable structure multiple model methods (VSMM), which adjusts their model sets in real time. In VSMM, the core idea is the model set adaptation (MSA) (2020) 2020: 17 Page 2 of 15 that aims at finding the best model set for the state estimation. Different implementations for MSA approaches have been proposed. For instance, the expected mode augmentation (EMA) [13], the likely model set (LMS) [14], the minimal sub model set (MSS) [15], and the best model augmentation (BMA) [16]. In addition, the noise uncertainty also affects the performance of the tracking system. Practically, the measurement noise usually varies with interferences. To cope with the noise uncertainty, the adaptive filtering method is used to address the issue of state estimation in the case of unknown noise statistics in [17]. The adaptive filtering algorithms are classified into four categories: maximum likelihood, correlation, covariance matching, and Bayesian method. Among them, the Bayesian method can be seen as a more general case of the other three algorithms. However, most of the Bayesian algorithms are difficult to get the analytical solution-because of the complexity of the probability density function and the high dimensional integral. These adaptive filters considered that the sequence of state variable follows a first-order Markov process. For the stationary random sequence (e.g., image and video signal [18]), the denoising technique based on the intersection of confidence intervals (ICI) rule was presented to provide a noise-free image or its best possible estimate [19,20].
Recently, the variational Bayesian approximation adaptive Kalman filter (VBAKF) proposed in [17] has been introduced to estimate the target state under the case without knowledge of measurement noise variances. Its main idea is that the joint posterior of the target state and measurement noise variances can be approximated by a factored free-form distribution (for models in the conjugate-exponential class). Unfortunately, linear system and Gaussian distribution assumption do not really exist in actual applications. The VBAKF cannot achieve the demanding filtering performance. The nonlinear estimation methods, such as unscented Kalman filter (UKF) and cubature Kalman filter (CKF), were combined with VB approximation method [21,22]. Here, the measurement noise variances are approximated by the variational Bayesian approximation (VB) approach; thereafter, system states are updated by these nonlinear estimation methods. Hu [23] proposed the robust version of VBAKF, which models the measurement noise by using the Student t distribution, and [23] was extended in [24] and [25].
However, the abovementioned algorithms considered only one of these uncertainties. In real environment, the model and noise uncertainties have to be considered simultaneously. Several suggestions for dealing with this problem can be found in literature. In [26], a novel estimator was presented for the jump Markov linear systems with unknown measurement noise variance parameters. A merging scheme is adopted for the system noises in the fusion stage of the IMM approach and a fix recursive form is used to estimate the noise variance parameters. Based on the literature [26], Hong [27] presented a robust variational Bayesian-interacting multiple model (IMM-VB), which models the glint noise by using Gaussian mixture distribution. Gao [28] proposed an interacting multiple model estimation-based adaptive robust UKF, which establishes an adaptive fading UKF for the case of process model uncertainty and a robust UKF for the case of measurement model uncertainty. These approaches have obtained better performance for the problem of the absence of model and noise uncertainties, but they have a very high burden of time complexity. That is because more models have been designed in IMM algorithm for demanding results.
In this paper, we present a model set adaptive filtering algorithm based on variational Bayesian approximation to address the state estimation problem under the situation with dual uncertainties. The Rényi information divergence, as a criterion, is used to computing the divergence between the true mode and the candidate models. Subsequently, it develops a model-conditioned estimation based on variational Bayesian approximation to fuse the state and measurement noise variances. Two simulation experiments are provided to illustrate the effectiveness of the proposed algorithm.
The rest of paper is organized as follows. In Section 2, the state estimation problem with unknown noise statistics is formulated and the variational Bayesian method is briefly reviewed. The proposed algorithm is described in Section 3. In Section 4, the simulation results are presented to prove the effectiveness of the proposed algorithms. Finally, the conclusions are given in Section 5.

Methods/experimental
The main drawbacks of the interacting multiple model method for target tracking system are the high computational complexity and poor performance. A model set adaptive filtering algorithm based on variational Bayesian approximation is proposed in this paper. The proposed method is designed based on the idea of the VSMM and VB methods. The Rényi information divergence measures the "closeness" of two probability density functions. It has additional flexibility in that in allows for emphasis to be placed on specific portions of the support of the densities to be compared. Hence, the Rényi information divergence is as a criterion to choose the best match model that has the minimum divergence between candidate models and true mode. And the moments matching technique is used to obtain the mixed statistics of measurement noise and system state at the fusion stage.
The paper performs Monte Carlo simulation using MATLAB software to examine the behavior of the proposed method. The root mean square errors (RMSEs) (2020) 2020: 17 Page 3 of 15 are used to evaluate the estimation accuracy. We compare our method with IMM-VB and IMM in two different scenarios. The parameters in the experiments are introduced in Section 5. This paper does not contain any studies with human participants or animals performed by any of the authors.

Problem formulation
Consider the following state space model: where x k ∈ R n and z k ∈ R d are the target state and the measurement vectors, respectively. r k denotes the system mode which is described by a discrete-time homogenous Markov chain. The process noise w r k k−1 corresponding to mode r k and the measurement noise v k are assumed to be mutually independent zero-mean Gaussian random processes with the covariance matrices Q r k k−1 and k , respectively. Here, we denote the diagonal covariance matrix comprising of these variances by Due to the fact that the inverse Wishart distribution is the conjugate prior distribution for the variance of the Gaussian distribution [29]. For this reason, a product of inverse Wishart models is adopted to approximate the posterior distribution k . That is where the notation IW ( k ; κ k , k ) represents the inverse Wishart distribution for the variable k with the degree of freedom κ k , and the symmetric positive definite matrix k .

Remark 1
Since the inverse gamma distribution is a special case of the inverse wishart distribution in onedimensional space, the discussed model in this paper is much more general and thus more information can be utilized for filter design. Interested readers are referred to [30] and [31] for a detailed introduction.
However, k is unknown in most cases, which requires to joint estimate the posterior distribution of the target state and the measurement noise covariance. Assume that the dynamic model of the state and the covariance matrix are independent for any mode r k , that is The predicted joint distribution of the state and measurement noise are calculated by the Chapman-Kolmogorov equation.
When the measurement z k is available, the joint posterior distribution is given by the Bays rule.
Notice that the two main problems need to be solved. One is the dynamic model of the measurement noise covariance p( k | k−1 , r k ) is unknown. The other is that the posterior density is difficult to achieve due to the involved intractable integrals. To calculate the posterior density with the unknown noise covariance, the variational Bayesian approximation method, which uses a simple free-from distribution to approximate the joint posterior density, is proposed.

Variational Bayesian approximation
Assume that the state vector and measurement noise covariance are independent, and the joint posterior density can be approximated by a free-form factored distribution as follows where the probability densities Q x (x k ) and Q ( k ) are Gaussian distribution and inverse wishart distribution, respectively. The non-negative KL divergence represents the measure of the dissimilarity of the approximation and the true posterior, that can be expressed as The optimal approximation of the joint posterior density can be obtained by minimizing the KL divergence, and the mean field approximation is used to solve the calculation problem of multiple hidden variables [29]. The results are given as

Model set adaptive diltering algorithm based on variational Bayesian approximation
In this section, a model set adaptive filtering algorithm is proposed. The model set adaptive approach is used to select the best model set for multiple model estimation. Moreover, the noise statistics and state of each model are estimated by the model set conditioned estimation based on variational Bayesian approximation.

Model-set adaptation
To address the problem of the model uncertainty, IMM-VB needs a lot of models to improve the algorithm performance. This has two obvious defects [13]. First, the computation load grows with the increase of the number of the models. Second, the competition among these models lead to performance decrease greatly.
To overcome this problem, a MSA algorithm based on Rényi divergence is proposed. Rényi information divergence is a distance measure between two densities (the test density f and the reference density f 0 ) [32]. The order α Rényi information divergence of f and f 0 is defined as For any order α, the Rényi information divergence takes on its minimum value if and only f = f 0 . In our application, we wish to compute the divergence between the true mode s k and candidate model r j k ∈ M c at the time k. That is where p(z k |s k ) and p z k |r j k are the probability density functions of z k conditioned on s k and r j k , respectively. Due to the true mode s k of the system is unknown at the time k [16], it is assumed that the best online estimates of the probability density function of z k for the true mode s k can be approximated as be the Gaussian densities with vector meansz s k ,z j k and positive definite covariance matrices s k , j k . The Rényi information divergence between p(z k |s k ) and p z k |r j k is where =z s k −z j k . These means and covariance matrices of the Gaussian densities can be calculated as Remark 2 Different selections of the parameter α allow for different parts of these distributions to be emphasized. In the limiting case of α → 1 the Rényi information divergence becomes the Kullback-Liebler divergence. The effect of the order to the Rényi information divergence is detailed and analyzed in [33][34][35]. The results show that α = 0.5 emphasizes the tails of the distribution and allows for the maximum discrimination between two similar distributions. Therefore, we can obtain a better performance by choosing α = 0.5 for tracking applications [36,37].
The optimal modelr k in the candidate model set M c can be selected as the one with the minimum Rényi information divergence.
Thus, the adapted model set is the basic model set M b combine with the modelr k at the time k.

Model-set conditioned estimation based variational Bayesian approximation
Suppose that the posterior probability density function of model i at the time k-1 is described as below Note that Eq. (16) can be seen as a product of a Gaussian distribution and an inverse Wishart distribution. r i k−1 means the event that model i matches the system model in effect at time k − 1, r i k−1 ∈ M k−1 . The notation N x k−1 ;x i k−1|k−1 , P i k−1|k−1 represents the Gaussian probability density function with meanx i k−1|k−1 and covariance P i k−1|k−1 . The notation IW σ 2 k−1,u ; κ i k−1,u , i k−1,u represents an inverse Wishart distribution with parameters κ i k−1,u and i k−1,u . By using the total probability theorem, one has In the model set conditional reinitialization stage, the joint probability density function is described as where μ i|j k−1 denotes the conditional probability that model j transfers to model i at the time k. It is calculated by whereμ j k|k−1 is the normalization coefficient. Based on the model set M k , the output of each filter is merged in the fusion stage [13]. Therefore, we aim to approximate the sum term in (18) by a single one, that is wherex 0j k−1|k−1 and P 0j k−1|k−1 are mixed state and covariance matrix of the model j, respectively.
On the basis of the moment matching theory [26], the first and second moments of the variable σ 0j k−1,u in (20) can be obtained as follows The mean and variance of the inverse Wishart sum distribution in (18) are given by By solving Eqs. (23) and (24), the parameters κ 0j k−1,u and 0j k−1,u are calculated as follows

Remark 3 In the proposed algorithm, the key feature is that the Gaussian sum distribution for the estimated state is approximated by a single Gaussian distribution. Similarly, the inverse Wishart sum distribution for the measurement noise covariance matrix k is approximated by a single inverse Wishart distribution by matching the first and second moments.
Then, the mixed target state and measurement noise covariance are taken as the filter input. The predict density is computed from Eq. (3).
where, taking account into the time variation of the parameters, a forgetting factor ρ is introduced, ρ ∈ (0, 1]. Note that the closer ρ is to 0, the more instability the parameters will be in terms of time-fluctuations [38]. The system state, covariance matrix, and the parameters of the inverse wishart distribution are predicted bŷ After receiving the measurement z k , VB approximation method can be used to obtain a free-form factored approximate distribution for p x k , k |r j k , Z k [38], the analytical expression of the joint posterior probability density function is given by To calculate the distributions of state and measurement noise covariance of the ith model, VB assumes that the joint posterior distribution in (34)  log q x Through the simplified formula, p x k−1 , k−1 |r j k , Z k is approximated by The mean and covariance of the Gaussian distribution are derived by the VB approximation as followŝ Similarly, the logarithm of q j k can be calculated by keeping q x j k fixed such as Here, q j k is approximated as Note that Eqs. (37), (38), (41), and (42) are coupled. The fixed-point algorithm is used to proceeded alternatively the state and noise parameters until the convergence is reached. It has been proved that VB converge very fast and most of the time, only a few iterations in [17]. By using the Bayes rule, the probability of each model is updated Here, the likelihood function L j k of model j is calculated Finally, based on the mixed equation, the state and the covariance in the fusion stage are shown aŝ The computational complexity of the model set adaptive filter is O ln 3

Simulation results
In this section, numerical simulations are carried out in order to compare the performance of the five algorithms: IMM3 (3 basic models), IMM11 (3 basic models and 8 CT models), IMM3-VB (3 basic models), IMM11-VB (3 basic models and 8 CT models), and MSA-VB (3 basic models and 8 CT models as the candidate models) with unknown measurement noise and system model. The reference system dynamic can be described by the following state space model  [1]. These models differ only in the turn rate ω i , which belongs to the basic model set M b that consists of the initial models of the IMM algorithm and the candidate model set M c that consists of these models with different structures compared with the basic models.

Experiment 1
In this case, a single target moves in a 2-D scenario with [ −100, 100] m×[ −100, 100] m surveillance region. The turn rate in different time periods is shown in Table 1.
The measurement equation is expressed as  where measurement noise v k is the zero-mean Gaussian distribution with the unknown covariance matrix k . The initial degree of freedom and symmetric positive definite matrix are κ 0 , i = 5(i = 1, 2), 0 = diag{20, 20}. The turn rates ω i , ω j belong to the following set ω i ∈ −6.67 × 10 −4 , 0, 6.67 × 10 −4 The transition probability matrix (TPM) of the IMM11 11 = (π 11 i,j ) 11×11 is extended from the TPM of the basic models set. The where i, j = 1, . . . , 11, and a = b = 0.01. As can be seen from the RMSEs of position in Figs. 1 and 2. The RMSE of MSA-VB is lower than that of IMM3-VB. Note that the IMM3 and IMM11 perform a bit better than the IMM3-VB, IMM11-VB, and MSA-VB at the beginning (t < 25). The reason for this is VB eliminates the error between initial noise variance and true noise variance by using the iterative computation. The RMSEs of velocity in Figs. 3 and 4. From these figures it can be observed that the RMSEs of velocity of the IMM11-VB, MSA-VB outperform that of the IMM3, IMM11, and IMM3-VB. During the maneuver, MSA also outperforms the other four algorithms.
In [29] and [23], the forgetting factor ρ is chosen empirically. The average RMSEs of the state versus different value of ρ are shown in Fig. 5. Simulation results show that the proposed algorithm performs better when ρ = 0.92.
The RMSEs of the five algorithms are given in Table 2 with different measurement noise variance parameters. The variance parameters are chosen as 5, 10, 20, and 50, respectively. From Table 2, with increasing levels of noise variance parameters, the proposed algorithm and the   Table 3. The CPU time needed for IMM11-VB is three times the CPU time of MSA-VB.

Experiment 2
In this scenario, a bearings-only tracking (BOT) problem is considered. The target located at coordinate The turn rates of the basic model set M b and the candidate model set M c are The TPMs are shown in Eqs. (49) and (50). The RMSEs of X and Y position and velocity over 200 Monte Carlo runs are shown in Figs. 6, 7, 8 and 9, respectively. It can be observed that the proposed algorithm performs a bit better than the other four algorithms, and the RMSEs of IMM11 and IMM11-VB become larger around the time steps where the true measurement noise variances are jump changes. As can be seen from Fig. 7, the RMSE of IMM11 is shocked in the noise varying process. The reason is that the number of dimensions of measurement vector is lower than that of state vector. The result of variance estimation is shown in the Fig. 10. We can see that the proposed algorithm is effective in the estimation of the measurement noise statistics with some penalty of time delay. This is because the old noise value will be contained in part from the prior time step to the next time step. The RMSEs of the five algorithms are given in Table 4 with different measurement noise  variance parameters. The variance parameters are chosen as 0.0001, 0.0005, 0.001, and 0.01, respectively. It can be seen from Table 4 that the average RMSEs increase as the noise variance parameters increase. Compared to the IMM3, IMM3-VB, IMM11, and IMM11-VB, the proposed algorithm has higher tracking accuracy. The comparison of relative computational times are shown in Table 5. The CPU time of the IMM3 denotes a unit. It is obvious that MSA-VB needs less computational time than IMM11-VB.

Discussion
We have shown that the proposed algorithm can mitigates the effects of system model and noise statistics uncertainties in the target track system. We found that the proposed algorithm can effectively estimate the target state, as demonstrated in the numerical simulations. The estimation accuracy was examined by comparing the proposed algorithm, the IMM-VB and IMM methods. The average RMSEs of the positions and velocities of the proposed algorithm are smaller than that of the other algorithms. Our results show an improved estimation and tracking performance compared to the IMM-VB and IMM methods. In addition, we found that the computational complexity of the proposed method is relatively higher than the IMM3-VB and IMM methods. The reason is that most of its computational time is spent on reconstructing the adapted model set and calculating the noise parameters by using the VB method. It should be noted that this study concentrates on only the single sensor target tracking with measurement noise uncertainty. Hence, we currently focus on extending it to multi-sensor target tracking with unknown measurement and process noise.

Conclusions
In this paper, we present an adaptively robust filter to address the performance degradation of the IMM-VB in the presence of system model and noise statistics uncertainties. The main contribution of this paper is that a MSA method is designed to choose the best match model by calculating the divergence between the candidate models and true mode. Based on the chosen model, the model-conditioned estimation based on variational Bayesian approximation is proposed to estimate the system state and noise parameters. The performance of the MSA-VB is evaluated over the different target tracking scenes. The RMSE for the positions and velocities are presented which shows higher accuracy compared with the IMM-VB and IMM methods.