A Concept of Approximated Densities for Efficient Nonlinear Estimation

This paper presents the theoretical development of a nonlinear adaptive filter based on a concept of filtering by approximated densities (FAD). The most common procedures for nonlinear estimation apply the extended Kalman filter. As opposed to conventional techniques, the proposed recursive algorithm does not require any linearisation. The prediction uses a maximum entropy principle subject to constraints. Thus, the densities created are of an exponential type and depend on a finite number of parameters. The filtering yields recursive equations involving these parameters. The update applies the Bayes theorem. Through simulation on a generic exponential model, the proposed nonlinear filter is implemented and the results prove to be superior to those of the extended Kalman filter and of a class of nonlinear filters based on partitioning algorithms.


INTRODUCTION
This paper describes a recursive algorithm based on a nonlinear approach to parameter estimation. The most common procedure applied for nonlinear estimation is the extended Kalman filter (EKF) [1,2,3].
It is known that the optimal estimator for nonlinear models does not admit a finite-dimensional solution. By linearising the state equations about the conditional means x_{k|k−1} and x_{k|k}, the EKF provides an approximate solution, in the minimum mean-square-error sense, to the original nonlinear problem. A number of other procedures have also been applied, including partitioning approaches, statistical linearisation, maximum a posteriori estimation, least-squares criteria, and functional approximation of the conditional state density [3,4,5,6,7].
Crucially, finite models or filters that completely define physical systems are rare [8,9]. Nevertheless, the state distributions depend on a finite number of parameters, which are recursively computed as the observed data arrive [10,11,12]. The computation of a finite number of pertinent parameters is a common procedure for the definition of approximated probability distributions. As nonlinear state space models rarely yield explicit or analytic distributions from output measurements, the distribution normally has to be approximated [12].
This paper presents the theoretical development of a simple method of approximation [13]. In the following, the proposed nonlinear adaptive filter is referred to as the FAD filter. The approach uses a maximum entropy principle to approximate the filtering equations arising from a state model with nonlinear equations. The probability density functions (pdfs) created by applying the entropy principle are of an exponential type and depend on a finite number of parameters. The nonlinear filtering leads to recursive equations involving these parameters. The use of the maximum entropy approximation closes the nonlinear equations of the filter; that is, the exponential-type family of distributions under consideration is stable under the nonlinear equations.
At each time instant k, the FAD filter estimates the consecutive a priori and a posteriori state probability density functions given past and current observations. The logarithms of the pdfs are linear combinations of several functions ϕ_1, . . . , ϕ_n, chosen according to some specific criterion. The prediction uses maximum entropy subject to constraints related to the functions ϕ_i. The update applies the Bayes theorem.

MAXIMUM ENTROPY PRINCIPLE
Let µ be a probability over the measurable space (R^n, B^n) and P a probability law absolutely continuous with respect to µ, with Radon-Nikodym density p = dP/dµ.

Proposition 1. Let ϕ_0 = 1, ϕ_1, . . . , ϕ_n be (n + 1) real-valued functions in the Hilbert space L^2(µ). Then, the maximum entropy subject to the real-valued constraints

∫ ϕ_i(x) p(x) dµ(x) = l_i, for all i = 1, . . . , n,

gives rise to an exponential-type density function

p(x) = exp(λ_0 ϕ_0(x) + λ_1 ϕ_1(x) + · · · + λ_n ϕ_n(x)),    (4)

where the real coefficients λ_0, . . . , λ_n are Lagrange multipliers that are evaluated from the constraints.
Proof. We use the method of Lagrange multipliers and introduce adjustable constants λ_1, . . . , λ_n, called the Lagrange multipliers, in order to maximise the expression

J(p) = −∫ p(x) ln p(x) dµ(x) + Σ_{i=1}^{n} λ_i ( ∫ ϕ_i(x) p(x) dµ(x) − l_i ).

Using the calculus of variations, the solution for p(x) satisfies

− ln p(x) + λ_1 ϕ_1(x) + · · · + λ_n ϕ_n(x) − 1 = 0.    (7)
Observing that p(x) is a probability density function and denoting by ϕ_0 the constant function equal to 1, a new constraint appears,

∫ ϕ_0(x) p(x) dµ(x) = 1.

Therefore, we can substitute λ_0 ϕ_0(x) for −1 in (7), and solving for p(x) gives an exponential-type density function

p(x) = exp(λ_0 ϕ_0(x) + λ_1 ϕ_1(x) + · · · + λ_n ϕ_n(x)).

Due to the maximum entropy principle, the method inherently approximates all probability distributions by exponential-type density functions. The Lagrange multipliers λ_i vary with the family of densities under consideration and are computed at each update and prediction.
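As a numerical illustration of Proposition 1 (a minimal sketch of our own, not an implementation from the paper), the following fits the multipliers of an exponential-type density p(x) = exp(λ_0 + λ_1 x + λ_2 x²) to given moment constraints by a damped Newton iteration; the grid bounds, basis choice, and target constraints are illustrative assumptions.

```python
import numpy as np

np.seterr(over="ignore")  # overflowing trial steps are rejected by backtracking

# Discretise the line and the basis functions phi_0 = 1, phi_1 = x, phi_2 = x^2.
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
Phi = np.vstack([np.ones_like(x), x, x**2])

def moments(lam):
    """Constraint integrals: int phi_i(x) exp(lam . phi(x)) dx."""
    p = np.exp(lam @ Phi)
    return (Phi * p).sum(axis=1) * dx

def fit(l, lam, iters=60):
    """Solve moments(lam) = l by damped Newton on the convex dual objective."""
    def psi(lam):  # dual objective; its gradient is moments(lam) - l
        return np.exp(lam @ Phi).sum() * dx - lam @ l
    for _ in range(iters):
        p = np.exp(lam @ Phi)
        J = (Phi[:, None, :] * (Phi * p)[None, :, :]).sum(axis=2) * dx  # Jacobian
        step = np.linalg.solve(J, l - moments(lam))
        t = 1.0
        for _ in range(60):  # backtracking line search
            if psi(lam + t * step) <= psi(lam):
                break
            t *= 0.5
        lam = lam + t * step
    return lam

# Constraints of a Gaussian with mean m = 1 and variance sigma^2 = 0.25:
# l_0 = 1 (normalisation), l_1 = m, l_2 = sigma^2 + m^2.
l = np.array([1.0, 1.0, 1.25])
lam = fit(l, np.array([-1.0, 0.0, -1.0]))
```

The recovered multipliers agree with the closed-form Gaussian values λ_2 = −1/(2σ²) = −2 and λ_1 = m/σ² = 4, confirming that the maximum-entropy fit reproduces the Gaussian member of the exponential family.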

NONLINEAR MODEL
We choose to pursue the derivation in the scalar case and in discrete time, k ∈ N, for simplicity. Nonlinear state models are often given by equations of the form

z_{k+1} = g(z_k) + w_k,
y_k = h(z_k) + v_k,

where {z_k} and {y_k} are the state and observation processes, and g and h are nonlinear functions. The noise sequences {w_k} and {v_k} are assumed to be white, having any distribution. Let x_k be a state defined by

x_k = h(z_k).

With h locally injective, we can write at time instant k + 1

x_{k+1} = f(x_k, w_k),    (12)

where the nonlinear function f is defined by

f(x, w) = h(g(h^{−1}(x)) + w).

Thus, the filter uses the nonlinear model defined by (12) and (13),

y_k = x_k + v_k,    (13)

where x_k and y_k are the state and observation, respectively. The noise processes are mutually independent, and independent of the state, for all k ∈ N.
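The change of state variable can be checked numerically; in this sketch, g(z) = 0.8z and h(z) = z³ are illustrative choices of ours (h is injective on R), not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

g = lambda z: 0.8 * z          # illustrative state map (our assumption)
h = lambda z: z ** 3           # illustrative, injective observation map
h_inv = np.cbrt                # inverse of h

def f(x, w):
    # Transformed state equation (12): x_{k+1} = h(g(h^{-1}(x_k)) + w_k)
    return h(g(h_inv(x)) + w)

# Propagate both parameterisations with the same noise; they must agree,
# and the observation y_k = x_k + v_k is now linear in the new state.
z = 1.0
x = h(z)
for _ in range(100):
    w = rng.normal()
    z = g(z) + w               # original state: z_{k+1} = g(z_k) + w_k
    x = f(x, w)                # transformed state
assert np.isclose(x, h(z))
```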

NONLINEAR FILTER
The FAD filter approximates, at each time instant k, the conditional pdfs of the state x_k given the past or present observations through expression (4). We start with the prediction, which uses the state equation (12) to predict the conditional density p_{k+1|k}(x|y) of the state x_{k+1} given the observations up to time instant k, y = (y_1, . . . , y_k).

Prediction
The prediction introduces several constraints l_i, i ∈ N, upon the state x_{k+1} given y = (y_1, . . . , y_k). The functions ϕ_i define the information retained to describe the state distribution at time instant k + 1. For example, an indicator function over an interval defines a constraint that is the probability of observing the state x_{k+1} in that interval; a power function of x_{k+1} defines a constraint that is a moment of the state variable.
The state equation (12) is used to evaluate the constraints; it precisely defines the transfer of the random variables from time instant k to instant k + 1. Thus,

l_i = ∫∫ ϕ_i(f(x, w)) p_{k|k}(x|y) p_w(w) dx dw,    (16)

where p_w denotes the exponential-type pdf of the state noise, expressed as in (4). The determination of the constraints relies on the calculation of multiple integrals. These integrals can easily be computed by numerical integration based on polynomial interpolation. However, in most applications analytical expressions can be derived.
Note from the state equation that the dynamic behaviour is not retained in its entirety by this means. A study of the state equation without noise reveals the properties of the asymptotic, or limit, distribution as k approaches infinity. In most cases, the density has several (possibly infinitely many) maxima, that is, it is multimodal. In practice, we pay particular attention to the number of local maxima in order to elaborate the a priori exponential-type density from the functions ϕ_i, which must be chosen accordingly. Some practical examples of bi-, tri-, and five-modal density functions may be found in areas of application such as motion-compensated video compression and medical image analysis [13,14,15].
The predicted density is the density that maximises the entropy subject to the constraints resulting from (16). The density created is of an exponential type, as in (4). The Lagrange multipliers in the density are determined from the constraints by solving the system of (n + 1) nonlinear equations

∫ ϕ_i(x) exp(λ_0 ϕ_0(x) + · · · + λ_n ϕ_n(x)) dx = l_i,  i = 0, . . . , n,

with l_0 = 1. Depending on the nature of the distributions, analytical expressions may exist; the case of the Gaussian approximation is derived in the Implementation section. The Lagrange multipliers vary with the family of densities under consideration and are computed at each prediction and update.
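The constraint integrals of (16) are straightforward to evaluate on a grid. The sketch below uses an illustrative linear map f(x, w) = 0.8x + w and Gaussian densities of our own choosing, so the predicted moments can be checked against closed-form values.

```python
import numpy as np

# Two-dimensional grid over the state x and the state noise w.
x = np.linspace(-10.0, 10.0, 801)
w = np.linspace(-10.0, 10.0, 801)
dx, dw = x[1] - x[0], w[1] - w[0]
X, W = np.meshgrid(x, w, indexing="ij")

def gauss(u, m, var):
    return np.exp(-(u - m) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

p_post = gauss(X, 0.5, 0.25)   # filtered density p_{k|k}(x|y), illustrative
p_w = gauss(W, 0.0, 1.0)       # state-noise density p_w

f = lambda x, w: 0.8 * x + w   # illustrative state map (our assumption)

def constraint(phi):
    # l_i = integral of phi_i(f(x, w)) p_{k|k}(x|y) p_w(w) dx dw, as in (16)
    return (phi(f(X, W)) * p_post * p_w).sum() * dx * dw

l0 = constraint(np.ones_like)       # normalisation: 1
l1 = constraint(lambda u: u)        # predicted mean: 0.8 * 0.5 = 0.4
l2 = constraint(lambda u: u ** 2)   # 0.8^2 * 0.25 + 1 + 0.4^2 = 1.32
```

For this linear-Gaussian stand-in the integrals have closed forms, in line with the remark that analytical expressions can often be derived; the grid evaluation is the general fallback.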

Update
At each time instant k, the update uses the observation equation (13) to estimate the a posteriori density p_{k|k}(x|y) of the state x_k given the past observations up to y_{k−1} and the actual observation y_k = η, y = ((y_1, . . . , y_{k−1}), y_k).
By hypothesis, the densities of the state x_k and of the noise v_k given the past observations are of an exponential type as in (4). Applying the Bayes rule to the observation equation (13) gives the conditional density of the state, which remains of exponential type without modification of the functions ϕ_i since, in this particular case, the observation equation is linear.
The conditional density p_{k|k}(x|y), given the observations up to y_k = η, is defined by

p_{k|k}(x|y) = C^{−1} p_v(η − x) p_{k|k−1}(x|y),    (18)

where C is the normalisation constant

C = ∫ p_v(η − x) p_{k|k−1}(x|y) dx.

The densities in (18) being as in (4), equating the coefficients of the like terms in ϕ_i provides a very simple update of the Lagrange multipliers in the state density. The complete derivation is shown in the Implementation section using the Gaussian approximation.
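Numerically, the update (18) is a pointwise multiplication followed by renormalisation. The densities below are illustrative stand-ins of our own, not the paper's model.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

p_pred = np.exp(-(x - 0.4) ** 2 / 2.0)         # predicted density, unnormalised
p_v = lambda v: np.exp(-v ** 2 / (2.0 * 0.5))  # measurement-noise density, R = 0.5
eta = 1.0                                      # actual observation y_k

# Bayes update (18): posterior proportional to p_v(eta - x) * p_{k|k-1}(x|y).
post = p_v(eta - x) * p_pred
C = post.sum() * dx                            # normalisation constant
post /= C

mean_post = (x * post).sum() * dx              # posterior mean
```

With these Gaussian stand-ins (prior mean 0.4, variance 1), the posterior mean is (m/σ² + η/R)/(1/σ² + 1/R) = 0.8, which the grid computation reproduces.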

IMPLEMENTATION
In its simplest form, and for comparison purposes with techniques such as the extended Kalman filter, the approach can be applied to the particular case of the Gaussian approximation. In this case, the Lagrange multipliers in the pdf are easily obtained from the constraints. The Gaussian case is presented as a didactic example.

Gaussian approximation
A Gaussian distribution has a density function

p(x) = (1/(σ√(2π))) exp(−(x − m)²/(2σ²)),

where m and σ denote the mean and standard deviation, respectively. It can be seen from the above expression that the Gaussian density is of an exponential type. The logarithm of the pdf is linearly developed from the basis functions

ϕ_0(x) = 1,  ϕ_1(x) = x,  ϕ_2(x) = x².

In that particular case, the constraints l_i correspond to the moments up to second order of the Gaussian density. Thus, numerical computation of integrals of the form E[ϕ_i(X)], where X is a random variable, is avoided. Therefore, expressions for the constraints in terms of the Lagrange multipliers exist and are given by

l_1/l_0 = −λ_1/(2λ_2),  l_2/l_0 = λ_1²/(4λ_2²) − 1/(2λ_2),  with l_0 = 1.

Normalising the constraints yields the statistical characteristics of the Gaussian distribution,

m = l_1/l_0,  σ² = l_2/l_0 − (l_1/l_0)²,

where m is the mean and σ² is the variance. Thus, the Lagrange multipliers in the pdf are defined in terms of the moments by

λ_2 = −1/(2σ²),  λ_1 = m/σ²,  λ_0 = −m²/(2σ²) − ln(σ√(2π)).    (24)
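The Gaussian correspondence between the moments (m, σ²) and the multipliers (λ_0, λ_1, λ_2) can be transcribed directly (variable names are ours):

```python
import numpy as np

def lambdas_from_moments(m, var):
    # From ln p(x) = -(x - m)^2 / (2 var) - ln sqrt(2 pi var):
    lam2 = -1.0 / (2.0 * var)
    lam1 = m / var
    lam0 = -m * m / (2.0 * var) - 0.5 * np.log(2.0 * np.pi * var)
    return lam0, lam1, lam2

def moments_from_lambdas(lam0, lam1, lam2):
    # Invert the relations above (lam0 is fixed by normalisation).
    var = -1.0 / (2.0 * lam2)
    m = lam1 * var
    return m, var

# Round trip, and agreement with the Gaussian pdf at an arbitrary point:
lam0, lam1, lam2 = lambdas_from_moments(1.5, 0.3)
m, var = moments_from_lambdas(lam0, lam1, lam2)
x0 = 0.7
p_exp = np.exp(lam0 + lam1 * x0 + lam2 * x0 ** 2)
p_ref = np.exp(-(x0 - 1.5) ** 2 / (2.0 * 0.3)) / np.sqrt(2.0 * np.pi * 0.3)
```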

Generic exponential model
The algorithm is implemented on a generic exponential model, and the results are compared to those given in [4], where the extended Kalman filter and a class of adaptive nonlinear filters (ANLF) based on partitioning algorithms were implemented on the same exponential model. Before implementing the FAD filter, we modify the state model so as to obtain a linear observation equation. Following the transformation given in the Nonlinear Model section, the resulting nonlinear model is of the form (12) and (13), where the noise processes are mutually independent and independent of the state. The state and measurement noises are taken to be Gaussian distributed with zero mean and variances Q = 1 and R = 0.5, and their densities are written in the exponential form (4), with β and µ denoting the Lagrange multipliers of the respective densities.
For comparison purposes, the initial state x_0 is set to zero and its density is taken Gaussian. Note that, due to the model transformation used, the x-state trajectory spans a range of values equal to that of the z-state trajectory raised to the power 3. Therefore, the nonlinear filter has to track very large jumps in state value.

Update
Considering the actual observation y_k = η and (27), the conditional density p_{k|k}(x|y) given the observations up to time instant k results from (18). With the predicted density written as exp(λ_0 + λ_1 x + λ_2 x²) and the measurement-noise density as exp(µ_0 + µ v²), the expansion of the densities gives

p_{k|k}(x|y) ∝ exp(µ_0 + µ(η − x)² + λ_0 + λ_1 x + λ_2 x²).    (33)

A simple addition of the coefficients of the like powers in x yields the updated Lagrange multipliers of the conditional density function,

λ'_2 = λ_2 + µ,  λ'_1 = λ_1 − 2µη,    (34)

with λ'_0 redetermined by normalisation.
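In code, the Gaussian update of the multipliers reduces to two additions; the sketch below (our notation, with illustrative prior values) cross-checks it against the standard Gaussian posterior formulas.

```python
import numpy as np

def update(lam1, lam2, eta, R):
    # Multiply exp(mu0 + mu*(eta - x)^2) into exp(lam0 + lam1*x + lam2*x^2)
    # and collect like powers of x; mu = -1/(2R) is the noise multiplier.
    mu = -1.0 / (2.0 * R)
    return lam1 - 2.0 * mu * eta, lam2 + mu  # lam0 follows from normalisation

m_prior, var_prior, R, eta = 0.4, 1.16, 0.5, 1.0
lam1, lam2 = m_prior / var_prior, -1.0 / (2.0 * var_prior)
lam1_post, lam2_post = update(lam1, lam2, eta, R)

var_post = -1.0 / (2.0 * lam2_post)
m_post = lam1_post * var_post
```

The result agrees with the familiar forms var' = 1/(1/σ² + 1/R) and m' = var'(m/σ² + η/R), confirming that the multiplier update is the Gaussian posterior in disguise.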

Results
The x-state trajectory is shown in Figure 1, where the actual states x_k and their estimates x_{k|k} are displayed. It is evident from Figure 1 that the FAD filter can easily track jumps in the state trajectory where an EKF would fail. In [4], the authors compare the EKF to an ANLF formulation with predicted estimates for the innovations and one with filtered estimates; both involve a bank of 10 subfilters. The performances were evaluated in terms of the normalised root mean-square error (NRMSE) of 100 time samples over 50 runs. The authors of [4] found that the NRMS error for the EKF was always greater than 0.6. For the ANLF with predicted estimates for the innovations, the NRMS error varies between 0.4 and 1. Finally, for the ANLF with filtered estimates, the NRMS error varies between 0.3 and 0.6.
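For reproducibility, a per-time-step NRMSE over Monte Carlo runs can be computed as below; normalising by the RMS value of the true state is our assumption about the convention used in [4].

```python
import numpy as np

def nrmse(x_true, x_est):
    """Per-time-step NRMSE; inputs have shape (runs, samples)."""
    mse = np.mean((x_true - x_est) ** 2, axis=0)        # average over runs
    return np.sqrt(mse) / np.sqrt(np.mean(x_true ** 2, axis=0))

# Sanity checks on synthetic trajectories (50 runs of 100 samples):
rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=(50, 100))
assert np.all(nrmse(x, x) == 0.0)                    # perfect estimates
assert np.allclose(nrmse(x, np.zeros_like(x)), 1.0)  # trivial zero estimate
```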
For comparison purposes, the performance of the filter is evaluated in terms of the normalised root mean-square error, as is done in [4]. Figure 2 displays the NRMSE(k) resulting from 50 runs of the proposed filter. Overall, the performance of the proposed filter is superior to that of the EKF and both ANLFs.
In [4], the authors also give the values of the average normalised RMS error (average over 100 values) for the EKF, the ANLF (predicted) and the ANLF (filtered). We compare these values to the average NRMS error obtained with the FAD filter.
The values shown in Table 1 again confirm the superiority of the FAD filter over the other filters.

CONCLUSION
This paper has presented the theoretical development of a nonlinear adaptive filter based on a concept of filtering by approximated densities (FAD). The proposed recursive algorithm uses a maximum entropy principle subject to constraints to approximate the filtering equations arising from nonlinear state space models.
The approach approximates the state distributions through parameterised exponential-type density functions. The filter predicts and updates the conditional densities given past and current observations. The prediction uses the entropy principle, and the update uses the Bayes theorem.
The simulation results presented on a generic exponential model demonstrate the ability of the proposed approach to handle highly nonlinear system dynamics. The performance of the filter is superior to that reported in [4] on the same exponential model for the extended Kalman filter and two adaptive nonlinear filters based on partitioning algorithms, one formulated with predicted estimates for the innovations and the other with filtered estimates.
The proposed approach has also been applied to nonlinear models involving time-varying Markovian parameters for applications such as motion estimation [14] in motion compensated video compression, and mammography [15], both involving multimodal Gaussian distributions. Further research is currently being conducted for synthetic aperture radar processing.
Other developments not yet published demonstrate the applicability of the approach in a purely nonlinear, non-Gaussian, nonstationary context.