
New graphical models for sequential data and the improved state estimations by data-conditioned driving noises


A prevalent problem in statistical signal processing, applied statistics, and time series analysis is to identify the hidden state of a Markov process from a set of available noisy observations. In the context of sequential data, filtering refers to the probability distribution of the underlying Markovian system given the measurements made at or before the time of the estimated state. The smoothing distribution, in turn, is obtained by incorporating measurements made after the time of the estimated state into the filtered solution. This work proposes a number of new filters and smoothers that, in contrast to the traditional schemes, systematically make use of the process noises to enhance performance in the state estimation problem. Our approach is characterized by the application of graphical models; the graph-based framework not only provides a unified perspective on the existing filters and smoothers but also leads us to design new algorithms in a consistent and comprehensible manner. Moreover, the graph models facilitate the implementation of the suggested algorithms through message passing on the graph.

1 Introduction

In a wide range of data science applications, such as tracking [1], navigation [2], and audio signal processing [3], one deals with the hidden Markov model that governs the dynamics of the latent process \(x_n\) and that establishes the relationship between the unobserved variable \(x_n\) and the observation \(y_n\). Here \(n \in {\mathbb {N}}\) signifies the discrete time. Let \(Y_{n'}:= \lbrace {y}_1,\cdots ,{y}_{n'} \rbrace\) be the historical accumulation of data and let \({x}_{n|n'}:= {x}_n|Y_{n'}\) be the conditioned random variable. Then the Bayesian resolution for the unknown instance of \(x_n\) is given by the probability \({P}(x_{n|n'})\), which is called filtering when \(n=n'\), smoothing when \(n< n'\), and prediction when \(n> n'\) [4]. In this paper, attention is confined to the filtering and smoothing distributions.

In the simplest case of linear dynamics together with linear observations corrupted by independent Gaussian noise, the filtering/smoothing problem can be solved explicitly by the Kalman filter/smoother, which describes how the mean and covariance of the conditional probability evolve in the course of time [5,6,7]. For nonlinear systems where the conditioned measure cannot be characterized by a Gaussian probability, however, the target distributions do not permit closed-form expressions. In this case, various efforts have been made to obtain approximate solutions; see, e.g., [5, 8,9,10,11,12,13,14,15,16,17,18,19,20]. Examples of traditional algorithms for filtering include the extended Kalman filter [14], the ensemble Kalman filter [15], the unscented Kalman filter [21], the cubature Kalman filter [22], the Gaussian particle filter [23], the bootstrap filter and its variants [13, 16, 24, 25], and the Gaussian mixture filter [17, 18, 26, 27]. As for smoothing, one can solve the problem by direct application of the Kalman filtering results because the smoothing problem is a Kalman filtering problem in disguise (see, for instance, [4] and references therein). In recent years, significant attention has been directed toward filtering/smoothing for state estimation due to its remarkable success in various applications including medical tomography [28], geological tomography [29, 30], hydrology [31], petroleum engineering [32, 33], as well as a host of other physical, biological, or social systems [34,35,36,37].

Let \(w_n\) be the independent system noise that affects \(x_n\), then the dependency among the relevant variables can typically be visualized through the directed graphical model in Fig. 1. This diagrammatic representation of the relationships among the random variables allows for the exact or approximate inference via turning the complex computation into a number of reduced operations in the graph [38, 39]. Notice that the graph-based approach has been already introduced to address the problem of sequential data (see, for instance, [19, 40,41,42] and references therein). Here it is important to point out that the classical filter and smoother never pay particular attention to the system noise \(w_n\) in estimating the hidden signal, given observations. One can verify the tendency using the graph model; in general any inference method can be associated with a suitable graphical model, and the class of data assimilation algorithms that can be associated with the graph in Fig. 1 without noises \(w_n\) encompasses most of the existing filters and smoothers [38, 39].

The goal of this work is to investigate the utility of the previously ignored system noises in improving the state estimation skill of some well-known methods. Specifically, we build a new family of algorithms in which the driving noises play a more prominent and active role than in the traditional schemes. The key idea is to design a graphical model where, unlike the conventional graph, the process noises are explicitly visible and further integrated into the system variables so that they cannot simply be marginalized out but constitute an essential and integral part of the whole algorithm constructed according to the new graph. For instance, from the graph in Fig. 1, the graph model depicted in Fig. 2 can be obtained by augmenting the system variable \(x_{n-1}\) with the driving noise \(w_{n}\). The resultant graph has the same form as the original graph without noises, and thus one can obtain the counterpart of a classic algorithm by using the variable \(X_n:=(x_{n-1},w_n)^T\) in place of \(x_n\). Here the superscript T denotes the transpose.

The move to the other graphical model might appear to bring no advantage because the basic scenario is unchanged. On the contrary, we will show that the inference schemes derived from the new graph yield more accurate and stable estimates of the unobserved underlying system state. To this end, we address two principal ways of data assimilation, and the paper is organized into two parts accordingly. Section 2 discusses batch assimilation, which involves processing the entire training set in one go, while Sect. 3 studies data assimilation techniques in the sequential fashion. Unlike the batch setting, where one has the opportunity to re-use the data points many times and to obtain an answer independent of the order of the data, sequential Bayesian updating uses each data point on arrival and then discards it before receiving the next point. Section 4 contains a summary of our contributions and a discussion of future work.

Fig. 1

A diagrammatic representation of the hidden Markov model, for which the underlying process \(x_n\) forms a Markov chain and emits a discrete time series of observation \(y_n\). The dashed arrows are used to emphasize that, in the graph model associated with the classical algorithms, the independent system noises \(w_n\) influencing \(x_n\) are not shown

Fig. 2

The graphical model in Fig. 1 is transformed into the new graph, where the observation \(y_n\) is determined by \(x_{n-1}\) and \(w_n\), instead of \(x_n\)

2 Batch data assimilation

Section 2.1 introduces the variational inference scheme known as expectation propagation, or EP for short (readers familiar with EP can skip this section). We build a new smoother in Sect. 2.2, provide theoretical arguments for why the proposed method is likely to outperform the standard EP in Sect. 2.3, and present numerical simulation results supporting these arguments in Sect. 2.4.

2.1 Inference using EP

Suppose that the Markov chain \({\mathcal {X}}_n\) and the associated data \(y_n\) are governed by the transition density and the observation distribution:

$$\begin{aligned}&{\mathcal {X}}_n|{\mathcal {X}}_{n-1} \sim P(\cdot \vert {\mathcal {X}}_{n-1}), \end{aligned}$$
$$\begin{aligned}&{ y}_n|{\mathcal {X}}_n \sim P(\cdot \vert {\mathcal {X}}_n). \end{aligned}$$

Then the conditional distribution \(P({\mathcal {X}}_n |Y_N)\) can be approximated using the EP scheme as follows.

Together with the notations \({\mathcal {X}}_{p:q}:= \{ {\mathcal {X}}_p,{\mathcal {X}}_{p+1},\cdots , {\mathcal {X}}_q \}\) and \({\mathcal {X}}_{0:1}:={\mathcal {X}}_1\), one has the factorization \(P({\mathcal {X}}_{1:N}|Y_{N}) \propto \prod _{n=1}^N \varphi _n({\mathcal {X}}_{n-1:n})\) where the factor function \(\varphi _n({\mathcal {X}}_{n-1:n}):= P({\mathcal {X}}_n|{\mathcal {X}}_{n-1}) P(y_n|{\mathcal {X}}_n)\) is partially evaluated at the realization of \(y_n\). The EP method then seeks an approximation of the (conditioned) joint distribution, which is in the form

$$\begin{aligned} \begin{aligned} q({\mathcal {X}}_{1:N}) \propto \prod _{n=1}^N {\widehat{\varphi }}_{n}({\mathcal {X}}_{n-1:n}) \end{aligned} \end{aligned}$$

where \({ {\widehat{\varphi }}_n }\) is a Gaussian approximation of \({ \varphi }_n\). The algorithm further assumes a Gaussian factorization of the element \({ {\widehat{\varphi }}_n=\beta _{n-1}({\mathcal {X}}_{n-1})\alpha _n({\mathcal {X}}_n) }\) so that \(q({\mathcal {X}}_{1:N}) \propto \prod _{n=1}^N q_n( {\mathcal {X}}_n)\) is fully factorized and \(q_n = \alpha _n \beta _n\).

Suppose that all of the factors in (2) are given by some Gaussian functions. Then EP recursively updates the factor \({ {\widehat{\varphi }}_n }\) to a new Gaussian function \({\widehat{\varphi }}'_n\) (hereafter the prime will be used to denote the revised one). In order to do this, one first removes the relevant factor from approximate distribution (2) and then multiplies by the exact factor \({ {\varphi }_n }\) to obtain \({\widehat{q}}: = \varphi _n \prod _{j \ne n} {\widehat{\varphi }}_j\). One next evaluates the new posterior \(q'\) by minimizing the KL divergence of \({\widehat{q}}\) against \(q'\), given by \(\int {\widehat{q}} \ln \left( {\widehat{q}}/q' \right)\). The result is that \(q'\) comprises the product of factors in which each factor is given by the corresponding marginal of \({\widehat{q}}\). To obtain the refined factor \({\widehat{\varphi }}'_n (= \beta '_{n-1}\alpha '_n)\), one simply divides \(q'\) by \(\prod _{j \ne n} {\widehat{\varphi }}_j\).

Let \({ p'_n({\mathcal {X}}_{n-1:n}):= \alpha _{n-1} \varphi _n \beta _n }\) then the procedure involves an approximate marginalization denoted by

$$\begin{aligned} \begin{aligned} q'_m({\mathcal {X}}_m) = \text {collapse} \int p'_n ( {\mathcal {X}}_{n-1:n}) \, \textrm{d}{\mathcal {X}}_{n-1:n}\backslash {\mathcal {X}}_m \end{aligned} \end{aligned}$$

where \(\backslash\) denotes the set difference. Here m is either \(m=n-1\) or \(m=n\). The “collapse \(\int\)”-operator performs projection to a Gaussian and marginalization over the states \({\mathcal {X}}_{n}\) or \({\mathcal {X}}_{n-1}\). From \(q'_m = \alpha _{m}'\beta _m'\), the knowledge of (3) allows one to update \(\beta '_{n-1}\) and \(\alpha '_n\).

The difficulty encountered here is that, when \(p'_n\) is a non-Gaussian function, computation (3) is usually intractable. A particular way of resolving this issue leads to one instance of the EP algorithm. Here we introduce two state-of-the-art techniques for the Gaussian approximation of \(q'_m\). One method is via the use of Gaussian cubature [41, 43]. Let \({ Q_n({\mathcal {X}}_{n-1:n}):=\alpha _{n-1}\beta _{n-1} \alpha _n \beta _n }\) be the proposal distribution and let \({ \mu _{Q_n} = \sum _j \lambda _j \delta _{{\mathcal {X}}_{n-1:n}^j} }\) be a cubature rule for this Gaussian function. By virtue of the re-weighting \({ \int g {p'_n } \, \textrm{d}{\mathcal {X}}_{n-1:n} = \int g \frac{p'_n }{Q_n } Q_n \, \textrm{d}{\mathcal {X}}_{n-1:n} }\), the discrete measure

$$\begin{aligned} \begin{aligned} { \mu _{p'_n} = \sum _j \lambda '_j \delta _{{\mathcal {X}}_{n-1:n}^j} \quad \text {where} \quad \lambda '_j \propto \frac{p'_n({\mathcal {X}}_{n-1:n}^j)}{Q_n({\mathcal {X}}_{n-1:n}^j)} \lambda _j } \end{aligned} \end{aligned}$$

is an approximation of the distribution \({ {p}'_n }\). For the approximation of \({q_m'}\), we use the appropriate marginalization of the joint Gaussian distribution function whose mean and covariance are given by the ones from (4). We call this scheme EP–GC (expectation propagation–Gaussian cubature). The other method is to collapse the non-Gaussian two-slice posterior belief \(p'_n\) to a Gaussian form by Laplace approximation [40, 42, 44]. This can be achieved by the truncation of the Taylor expansion of \(p'_n\) at the second-order term [38]. We call the resulting scheme EP-Laplace.
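The re-weighting step behind EP–GC can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the degree-3 spherical cubature rule (2d equally weighted points), and the names `cubature_points`, `collapse_by_reweighting`, and `log_target` are hypothetical. The function takes an unnormalized log-density standing in for \(p'_n\) and a Gaussian proposal standing in for \(Q_n\), and returns the mean and covariance of the re-weighted discrete measure.

```python
import numpy as np

def cubature_points(mean, cov):
    """Degree-3 spherical cubature rule for N(mean, cov): 2d points, equal weights."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    unit = np.sqrt(d) * np.vstack([np.eye(d), -np.eye(d)])  # shape (2d, d)
    points = mean + unit @ chol.T
    weights = np.full(2 * d, 1.0 / (2 * d))
    return points, weights

def collapse_by_reweighting(log_target, prop_mean, prop_cov):
    """Gaussian moments of the target via weights proportional to target/proposal."""
    points, weights = cubature_points(prop_mean, prop_cov)
    diff = points - prop_mean
    prec = np.linalg.inv(prop_cov)
    # Unnormalized log-density of the Gaussian proposal at each point.
    log_prop = -0.5 * np.einsum("ij,jk,ik->i", diff, prec, diff)
    log_w = np.log(weights) + np.array([log_target(p) for p in points]) - log_prop
    log_w -= log_w.max()              # stabilize before exponentiating
    new_w = np.exp(log_w)
    new_w /= new_w.sum()              # normalization absorbs unknown constants
    mean = new_w @ points
    centered = points - mean
    cov = (centered * new_w[:, None]).T @ centered
    return mean, cov
```

When the target coincides with the proposal, the weights are unchanged and the proposal moments are recovered, which gives a simple sanity check. Marginalizing the resulting joint Gaussian to \({\mathcal {X}}_{n-1}\) or \({\mathcal {X}}_n\) then amounts to reading off the corresponding sub-blocks of the mean and covariance.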

Importantly, the update procedure can be cast in terms of local message passing on the graphical model. To be more precise, one can regard \({ \alpha _n }\) and \({ \beta _{n-1} }\) as the messages between the factor function and the variable in the graph illustrated in Fig. 3. Though there is no required order, the usual EP implementation iteratively performs the message update via forward and backward passes. During the forward pass, \(\alpha\) is updated while \(\beta\) remains fixed. During the backward pass, \(\beta\) is updated while \(\alpha\) remains fixed. The resulting algorithm reads as follows:

  1. One initializes \(\alpha _n({\mathcal {X}}_n)\) and \(\beta _n({\mathcal {X}}_n)\) by suitable Gaussian functions (\(\beta _N \equiv 1\)).

  2. Until possible convergence, one continues to perform multiple forward–backward passes:

    • Forward pass: update \({ \alpha _n }\) as \({ \alpha '_n \propto q_n'/\beta _{n} }\) in the ascending order of \({ n \in [1, N] }\),

    • Backward pass: update \({ \beta _{n-1} }\) as \({ \beta '_{n-1} \propto q_{n-1}'/\alpha _{n-1}}\) in the descending order of \({ n \in [2, N] }\).
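The message updates \(\alpha '_n \propto q_n'/\beta _n\) and \(\beta '_{n-1} \propto q_{n-1}'/\alpha _{n-1}\) are divisions of Gaussian functions, which are convenient to carry out in natural parameters (precision and precision times mean). The scalar sketch below illustrates this under that assumption; the helper names are hypothetical, and the guard against a non-positive precision is a common practical safeguard rather than something stated in the text.

```python
def to_natural(mean, var):
    """Convert (mean, variance) to (precision, precision * mean)."""
    precision = 1.0 / var
    return precision, precision * mean

def from_natural(precision, shift):
    """Convert (precision, precision * mean) back to (mean, variance)."""
    return shift / precision, 1.0 / precision

def multiply_gaussians(a, b):
    """Product of two Gaussian functions, up to a constant: parameters add."""
    a_prec, a_shift = to_natural(*a)
    b_prec, b_shift = to_natural(*b)
    return from_natural(a_prec + b_prec, a_shift + b_shift)

def divide_gaussians(q, msg):
    """Quotient q / msg, up to a constant: parameters subtract."""
    q_prec, q_shift = to_natural(*q)
    m_prec, m_shift = to_natural(*msg)
    prec = q_prec - m_prec
    if prec <= 0.0:
        # The quotient is not a proper Gaussian; a caller might skip this update.
        raise ValueError("division yields a non-positive precision")
    return from_natural(prec, q_shift - m_shift)

# Forward-pass style update: recover alpha from q = alpha * beta.
alpha = (1.0, 2.0)   # (mean, variance)
beta = (3.0, 4.0)
q = multiply_gaussians(alpha, beta)
alpha_new = divide_gaussians(q, beta)
```

In the multivariate case the same rule holds with precision matrices and precision-weighted mean vectors.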

No convergence guarantees can be given for EP. It is known, however, that if the iteration converges, the solution minimizes the Bethe free energy that takes into account two-point correlations between neighboring variables in the chain [38, 39, 45]. Eventually, the distribution function \(P({\mathcal {X}}_n |Y_N)\) is approximated by \(q_n = \alpha _{n}\beta _n\), that is, the product of the two incoming messages into the circle node associated with the variable \({\mathcal {X}}_n\).

Fig. 3

The factor graph for state space model (1) is shown. The circle node is occupied with the variable, whereas the square node is occupied with the factor function

2.2 Proposed algorithm

Let the forward map for the hidden process and the likelihood function be given by

$$\begin{aligned} {x}_{n}&=\phi ^n( {x}_{n-1},w_n ), \quad w_n \sim {\mathcal {N}}\left( {0}, Q_n \right) \end{aligned}$$
$$\begin{aligned} P(y_n|x_n)&= {\mathcal {L}}^{y_n}(x_n), \end{aligned}$$

and let the law of \(x_1:=x_{1|0}\) be known. Here we intend to make use of EP in calculating the smoothing distribution \(P(x_{n|N})\) for n ranging from 1 to N.

Notice that, in order to solve the estimation problem arising from Eq. (5), one typically applies EP to the graph in Fig. 3, for which \({\mathcal {X}}_n\) is given by \(x_n\). It is worth mentioning that this particular graph model can be viewed as the one converted from the graph in Fig. 1 without driving noises. Motivated by the relationship between Figs. 1 and 2, we develop a new version of EP by converting the graph in Fig. 2 into the one in Fig. 3, for which \({\mathcal {X}}_n\) is given by \(X_n =(x_{n-1},w_n)^T\). More precisely, using the notation \({x}_{n} =\phi ^n( {X}_{n} )\) instead of Eq. (5a), state space model (5) is reformulated into

$$\begin{aligned} {X}_{n}&= \left( \begin{array}{c} \phi ^n( {X}_{n-1}), w_{n} \end{array} \right) ^T, \end{aligned}$$
$$\begin{aligned} P(y_n|X_{n})&= {\mathcal {L}}^{y_n}\circ \phi ^n (X_{n}), \end{aligned}$$

and our proposed algorithm uses the transition kernel and likelihood governing state space model (6) in applying the EP method described in the preceding section.

One difference between the two methods is that, while the naive EP directly approximates \(P(x_n|Y_N)\), our suggested scheme yields an approximation of \(P(x_n,w_{n+1}|Y_N)\). Hence, in order to obtain the smoothing distribution of the system variable, additional marginalization needs to be performed. Because the task involves the integration against the noises conditioned on data, we refer to the new method as conditioned-noise EP or CNEP for short.
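Reformulation (6) can be read as a thin wrapper around the original forward map and likelihood. The sketch below shows one way to express it, assuming for simplicity a time-invariant scalar map; `make_augmented_model` is a hypothetical helper, and the example map and Poisson-type likelihood are borrowed from the experiment of Sect. 2.4 purely for illustration.

```python
import math

def make_augmented_model(phi, likelihood):
    """Wrap a model (phi, likelihood) on x into one on X = (x_prev, w)."""
    def transition(X_prev, w_new):
        # X_n = (phi(X_{n-1}), w_n): the state carried by X_{n-1} is pushed
        # forward, and the fresh noise becomes the second component.
        x_prev, w_prev = X_prev
        return (phi(x_prev, w_prev), w_new)

    def augmented_likelihood(y, X):
        # P(y_n | X_n) is the original likelihood composed with phi.
        x_prev, w = X
        return likelihood(y, phi(x_prev, w))

    return transition, augmented_likelihood

# Illustrative choices (assumptions): additive-noise map and an
# unnormalized Poisson likelihood with rate exp(x).
phi = lambda x, w: 0.9 * x + 0.2 * math.erf(x) + w
poisson_lik = lambda k, x: math.exp(k * x - math.exp(x))
transition, aug_lik = make_augmented_model(phi, poisson_lik)
```

Once a Gaussian approximation of the augmented variable is available, the marginal of its first component is obtained by keeping the corresponding entry of the mean and the corresponding block of the covariance.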

2.3 Discussion on the prospective performances

Based on an analytic approximation to the distribution function of interest that assumes it factorizes in a particular way, the EP method carries out variational inference through the iterated local optimization of a Kullback–Leibler (KL) divergence [19, 40]. For the problem of data assimilation, the original EP scheme seeks the approximate factorization of the probability distribution of the variables \(x_n\), given observations. By contrast, our CNEP explicitly takes into account the noise \(w_n\) and regards it as part of the system variable, so that the new method corresponds to seeking the approximate factorization of the joint distribution function expressed in terms of the augmented variable \(X_n =(x_{n-1},w_n)^T\), conditioned on data. The critical effect of CNEP is that the reference measure in the KL divergence is replaced by one containing more information on the whole system state, and that the space of candidate functions for the optimization using the KL divergence is extended from the one for EP. It is therefore our belief that CNEP enables a closer approach to the true distribution function, leading to enhanced accuracy.

Furthermore, unlike the conventional EP, where the driving noises are simply integrated out at an early stage before conditioning on data, our CNEP produces solutions through averaging with respect to the conditioned noises. This procedure reduces the potential information loss caused by the marginalization, giving rise to an effect similar to that of the fully Bayesian approach in machine learning, where all of the involved variables are modeled and estimated according to Bayes’ rule to yield a conditional-averaged result that is robust to the particular set of data [38, 39]. We thus anticipate that CNEP improves the batch assimilation performance, compared to EP, in terms of stability.

2.4 Numerical experiment

Let us consider the Poisson tracking model [41, 46]. The dynamic equation governs neural activity unfolding over time, and the spike counts within short time-bins are observed. The state space model is given by

$$\begin{aligned}&{x}_{n}| {x}_{n-1} \sim {\mathcal {N}}(\Phi ({x}_{n-1}),Q), \; \Phi (x)= \alpha x + \beta \,\text {erf}(x) \end{aligned}$$
$$\begin{aligned}&y_n| {x}_n \sim \text {Poisson}(\exp ( {x}_n )), \end{aligned}$$

where erf\((\cdot )\) represents the error function. Here the transition kernel can be obtained from the forward map \(\phi ^n(x,w)=\Phi (x)+w\). Note that the data \(y_n \in \{ 0,1,2, \cdots \}\) assume a non-negative integer according to a Poisson distribution, and that the likelihood is given by

$$\begin{aligned} \begin{aligned} {\mathcal {L}}^{y_n=k}(x) \propto \exp \left( kx - e^x \right) . \end{aligned} \end{aligned}$$

Because Eq. (8) is a non-Gaussian function, it is reasonable to apply the EP method, rather than many other algorithms, to calculate the smoothing distribution [41, 46]. Equation (7) thus serves as a good example for comparing the performances of EP and CNEP.
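For concreteness, state space model (7) can be simulated as in the sketch below. It uses the parameter values reported for the experiment (\(\alpha =0.9\), \(\beta =0.2\), noise variance 0.1, \(N=60\)); reading the second argument of \({\mathcal {N}}(0.1, 0.1)\) as a variance, the random seed, and the function names are assumptions made here for illustration.

```python
import math
import numpy as np

def simulate_poisson_tracking(n_steps=60, alpha=0.9, beta=0.2, q=0.1, seed=0):
    """Draw one hidden trajectory x_1..x_N and Poisson counts y_1..y_N."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    # Initial law x_1 ~ N(0.1, 0.1), with 0.1 taken as the variance (assumption).
    x[0] = rng.normal(0.1, math.sqrt(0.1))
    for n in range(1, n_steps):
        drift = alpha * x[n - 1] + beta * math.erf(x[n - 1])
        x[n] = drift + rng.normal(0.0, math.sqrt(q))
    # Spike counts with rate exp(x_n).
    y = rng.poisson(np.exp(x))
    return x, y

def log_likelihood(k, x):
    """Log of likelihood (8), up to an additive constant: k*x - exp(x)."""
    return k * x - math.exp(x)

x_true, y_obs = simulate_poisson_tracking()
```

The log-likelihood is concave in \(x\), which is one reason Gaussian approximations of the two-slice belief tend to behave reasonably for this model.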

For the parameter values \(\alpha =0.9\), \(\beta =0.2\), \(Q_n=0.1\), \(x_1 \sim {\mathcal {N}}(0.1, 0.1)\), and \(N=60\), we implement the EP-GC and the corresponding CNEP method, which we call CNEP-GC. To do this, we use the standard Gaussian cubature formulas of degree 3 and 5 that can be found in [47], whose support sizes are 2k and \(2k^2+2\), respectively, in dimension k. For the same problem data, we also implement EP-Laplace and the corresponding CNEP method, which we call CNEP-Laplace. We perform a total of 40 independent simulations, and we use the root-mean-square error (RMSE) to compare the various reconstructions of the evolving system state. Note that the RMSE between \(A=\{ A_i\}_{i=1}^L\) and \(B=\{ B_i\}_{i=1}^L\) is defined by

$$\begin{aligned} \textstyle \text {RMSE}(A,B) = \sqrt{ \frac{1}{L} \sum _{i=1}^L \vert A_i - B_i \vert ^2 } \end{aligned}$$

where \(A_i\) and \(B_i\) are vectors.
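The RMSE definition translates directly into code. In this small sketch, the function name `rmse` and the array layout (one row per index \(i\)) are choices made here for illustration.

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two collections of L vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # One row per index i; scalar entries become length-1 vectors.
    a = a.reshape(a.shape[0], -1)
    b = b.reshape(b.shape[0], -1)
    sq_norms = np.sum((a - b) ** 2, axis=1)   # |A_i - B_i|^2
    return float(np.sqrt(sq_norms.mean()))
```

For example, `rmse([[0, 0], [0, 0]], [[3, 4], [0, 0]])` averages the squared distances 25 and 0 over \(L=2\) and returns \(\sqrt{12.5}\).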

We first calculate the RMSE between the true trajectories and the means of the smoothing solutions for each time n and depict the resulting RMSE distances as a function of time in the top panels of Fig. 4. This result shows that CNEP yields more accurate estimates than EP. While one can see a notable improvement in the case of EP–GC and CNEP–GC, the difference between the accuracies of EP-Laplace and CNEP-Laplace is small. It also shows that the use of higher-order cubature is advantageous in the case of classical EP–GC, but the benefit in the case of CNEP–GC is not significant.

We next calculate the RMSE averaged over the entire time interval, denoted by aRMSE, for each independent simulation. The mean and variance of aRMSE are presented in the bottom panels of Fig. 4. The reduced mean further confirms the accuracy improvement of CNEP over EP. We use the variance to address the stability issue; the variance reduction due to CNEP implies that the state estimation results are more robust than the ones by EP.

Because the simulation shows that the improvement of CNEP-Laplace over EP-Laplace is not significant in terms of either accuracy or stability, our noise-conditioned framework appears to offer little advantage for EP-Laplace. By contrast, it is indeed useful to apply CNEP–GC, rather than EP–GC, for the performance improvement. Considering the increased computational complexity in the implementation of CNEP–GC using Gaussian cubature of degree 5, we conclude that CNEP–GC using Gaussian cubature of degree 3 is the best choice among the smoothing algorithms under consideration in the example of state space model (7).

Fig. 4

A realization of the hidden signal \(x_n\) over time is generated from (7a) and compared with the smoothing approximations using observations from (7b). Top panel shows the RMSE distance between these two trajectories, and the bottom panel shows the mean and variance of averaged RMSE (aRMSE)

3 Recursive Bayesian estimation

Section 3.1 describes the representative recursive data assimilation schemes called the (nonlinear) Kalman filter and smoother. Section 3.2 discusses a number of graph models and the associated filtering/smoothing algorithms. Section 3.3 provides new techniques for the sequential filter and smoother. Specifying how to implement the concerned algorithms in Sect. 3.4, their performances will be compared in Sect. 3.5.

3.1 Sequential filter and smoother

In what follows, our presentation is in the context of the state space model given by

$$\begin{aligned} {x}_{n}&=\phi ^n( {x}_{n-1},w_n ), \quad w_n \sim {\mathcal {N}}\left( {0}, Q_n \right) \end{aligned}$$
$$\begin{aligned} {y}_{n}&= g^n({x}_{n}) +\eta _{n}, \quad \eta _{n} \sim {\mathcal {N}}({0},R_{n}) \end{aligned}$$

where \(w_n\) and \(\eta _n\) are independently distributed centered Gaussians. Given the law of \(x_1:=x_{1|0}\), the goal is to estimate the distribution of the conditioned variable \(x_{n|n'}\). Here n ranges from 1 to N, and the case of either \(n'=n\) or \(n'=N\) will be considered depending on the problem of filtering and smoothing, respectively.

3.1.1 Forward filter

The typical approach adopted in most filters for a sequential estimation of the underlying system state is to recursively alternate the time update and the measurement update of the probability distribution [4, 8, 9]. Hereafter, the notation \(\nearrow\)/\(\searrow\) will be used to increase/decrease the first index of the conditioned variable by one, and the notation \(\Rightarrow\) to increase the second index by one. For instance, \(x_{n|m}\nearrow x_{n+1|m}\) and \(x_{n|m}\Rightarrow x_{n|m+1}\). Then the pseudo-code for the filter is as follows.

figure a

To achieve a Gaussian approximation of the target distribution, the forward time update is implemented in such a way that one obtains a Gaussian approximation of \((x_{n-1|n-1}, x_{n|n-1})\) and performs a suitable marginalization. As for the measurement update, one approximates the joint distribution of \((x_{n|n-1},y_{n})\) by a Gaussian and applies Bayes’ rule (A1) described in Appendix A in order to perform the conditioning \(x_{n|n-1} \vert y_n:= x_{n|n}\).
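The measurement update described above reduces to conditioning a joint Gaussian on its observed block. The sketch below uses the standard Gaussian conditioning formula, which is assumed here to be what rule (A1) states since the appendix is not reproduced in this section; `condition_gaussian` and its argument layout are hypothetical.

```python
import numpy as np

def condition_gaussian(mean, cov, dim_x, y_obs):
    """Condition a joint Gaussian of (x, y) on the observed value of y.

    mean, cov : moments of the stacked vector (x, y)
    dim_x     : number of leading components belonging to x
    """
    m_x, m_y = mean[:dim_x], mean[dim_x:]
    c_xx = cov[:dim_x, :dim_x]
    c_xy = cov[:dim_x, dim_x:]
    c_yy = cov[dim_x:, dim_x:]
    # Gain c_xy @ inv(c_yy), computed with a linear solve (c_yy is symmetric).
    gain = np.linalg.solve(c_yy, c_xy.T).T
    post_mean = m_x + gain @ (y_obs - m_y)
    post_cov = c_xx - gain @ c_xy.T
    return post_mean, post_cov
```

As a check, for a scalar prior \(x \sim {\mathcal {N}}(0,1)\) observed through \(y = x + \eta\) with unit noise variance, the joint covariance is \([[1,1],[1,2]]\), and observing \(y=2\) gives posterior mean 1 and variance 0.5.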

3.1.2 Backward smoother

While the sequential filter is in progress, the approximate distributions of \(x_{n|n}\) are recursively obtained from \(n=1\) to \(n=N\) in the increasing order. Once the process for filtering is over, the nonlinear Kalman smoother given in Appendix B can be applied to yield Gaussian approximations of \(x_{n|N}\) in the order of decreasing index, from \(n=N\) to \(n=1\). The pseudo-code reads as follows.

figure b

Note the implementation requires the knowledge of the joint Gaussian distribution of \((x_{n|n}, x_{n+1|n})\), which was obtained and stored during the previous filtering procedure.
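One backward step can be sketched from exactly these stored quantities. The code below implements the usual Rauch–Tung–Striebel-type recursion for Gaussian approximations, which is assumed here to match the smoother of Appendix B (not reproduced in this section); the function and argument names are hypothetical.

```python
import numpy as np

def smoother_step(m_filt, P_filt, m_pred, P_pred, cross, m_smooth_next, P_smooth_next):
    """Gaussian moments of x_{n|N} from those of x_{n+1|N}.

    m_filt, P_filt : moments of x_{n|n}
    m_pred, P_pred : moments of x_{n+1|n}
    cross          : Cov(x_{n|n}, x_{n+1|n}), stored during filtering
    """
    # Smoother gain cross @ inv(P_pred) via a linear solve (P_pred is symmetric).
    gain = np.linalg.solve(P_pred, cross.T).T
    m_smooth = m_filt + gain @ (m_smooth_next - m_pred)
    P_smooth = P_filt + gain @ (P_smooth_next - P_pred) @ gain.T
    return m_smooth, P_smooth
```

If the smoothed law at time \(n+1\) equals the predicted one, the correction vanishes and the filtered law at time \(n\) is returned unchanged, as expected when later data carry no additional information.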

3.2 One-step ahead filter and smoother

Notice that the forward filter and backward smoother introduced in the preceding section are directly relevant to the graph in Fig. 1 without the noises \(w_n\). More precisely, the forward–backward algorithm is the general graph-based method for the statistical inference via the message passing and reduces to the standard filter and smoother when it complies with the designated graph [38, 39].

In view of the same form of two directed graph models in Fig. 1 without noises and in Fig. 2, one is naturally interested in the forward–backward algorithm with respect to the graph in Fig. 2. Let us first describe the corresponding methods. In order to do that, state space model (9) governing \((x_n,y_n)\) is posed as the one in terms of the variables \((X_{n},y_{n})\):

$$\begin{aligned} {X}_{n}&= \left( \begin{array}{c} \phi ^n( {X}_{n-1}), w_{n} \end{array}\right) ^T, \end{aligned}$$
$$\begin{aligned} {y}_{n}&= g^n\circ \phi ^n ({X}_{n}) +\eta _{n}. \end{aligned}$$

Denoting \({X}_{n|n'}:= {X}_n|Y_{n'}\), one makes use of state space model (10) to perform the iteration as follows.

figure c

From suitable marginalizations of the outcomes \(P(X_{n+1|n})\), the filtering distributions \(P(x_{n|n})\) are obtained for \(1 \le n \le N\). Likewise, the smoothing distributions \(P(x_{n|N})\) can be calculated according to the following backward iteration.

figure d

We next remark that the forward filter making use of reformulation (10) has been proposed in the author’s prior work [48]. In the present paper the algorithm will be called the one-step ahead filter because, unlike the standard filter conducting the step \(x_{n|n-1} \Rightarrow x_{n|n}\), its variant using Eq. (10) produces the filtering law of \(x_{n|n}\) from the one-step ahead smoothing law of \(x_{n-1|n}\) (similarly, the smoother using Eq. (10) will be called the one-step ahead smoother). There is a body of work that has demonstrated the advantage of this alternative path toward the filtering distribution in addressing the estimation problems arising in the geophysical sciences [49,50,51,52]. Inspired by the practical relevance of the smoothing-based filter, we would like to improve the current version of our one-step look-ahead algorithms. This can be achieved by repeating essentially the same procedure as the one used when Fig. 1 was converted to Fig. 2, and here we can take advantage of the graph-model framework.

Specifically one can read from the graph in Fig. 2 that, in carrying out the time update of the one-step ahead filter to quantify the uncertainty propagated by the statistical model, the prediction \(x_{n-1|n} \nearrow x_{n|n}\) is pushed forward according to the law of \(w_{n}|y_{n}\). This driving noise conditioned on future observation possesses a nonzero value as the mean, giving rise to an effective form of importance sampling so that the filtered solution tends to be nudged to the true system trajectory [48]. Now the transition from Figs. 1 to 2 creates a momentum for us to proceed to a new graphical model presented in Fig. 5, and the philosophy lying behind this attempt is to strengthen the nudging effect; unlike the one-step ahead filter, the time update in the new graph drives the evolving estimate together with not only the conditioned noises \(w_{n}|y_{n}\) but \(w_{n-1}|y_{n}\) as well. Our concern is whether the noises conditioned on further future observations can give rise to a more accurate estimation of the hidden state. We address the issue after building this insight into the practical algorithm in the next section.
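The nonzero mean of the conditioned noise can be made explicit in a scalar linear-Gaussian illustration, which is an assumption introduced here and not an example from the paper. Take \(x_n = a x_{n-1} + w_n\) and \(y_n = x_n + \eta _n\) with \(w_n \sim {\mathcal {N}}(0,Q)\), \(\eta _n \sim {\mathcal {N}}(0,R)\), and \(x_{n-1|n-1} \sim {\mathcal {N}}(m,P)\). Since \(y_n = a x_{n-1} + w_n + \eta _n\), the pair \((w_n, y_n)\) given \(Y_{n-1}\) is jointly Gaussian with \(\text {Cov}(w_n,y_n)=Q\) and \(\text {Var}(y_n)=a^2P+Q+R\), so that Gaussian conditioning gives

$$\begin{aligned} {\mathbb {E}}[w_n \vert Y_n]&= \frac{Q}{a^2 P + Q + R} \left( y_n - a m \right) , \\ \text {Var}(w_n \vert Y_n)&= Q - \frac{Q^2}{a^2 P + Q + R}. \end{aligned}$$

The conditional mean is proportional to the innovation \(y_n - am\) and vanishes only when the observation agrees with the prediction, while the conditional variance is strictly smaller than \(Q\). In this simple setting the conditioned noise therefore pushes the estimate toward the data, which is the nudging effect described above.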

Fig. 5

The probabilistic graphical model that gives rise to the two-step ahead conditioned-noise filter and smoother. Here the notation \(w^{m+1}_m = (w_m,w_{m+1})\) is used

Fig. 6

Two different paths to the filtered solution are shown: (i) the one-step ahead filter (\({x}_{n-2|n-1} \rightarrow {x}_{n-1|n-1} \Rightarrow {x}_{n-1|n}\rightarrow {x}_{n|n}\)) and (ii) the two-step ahead filter (\({x}_{n-2|n-1} \Rightarrow {x}_{n-2|n} \rightarrow {x}_{n-1|n}\rightarrow {x}_{n|n}\))

3.3 Proposed algorithm: Two-step ahead filter and smoother

Here we formulate the forward filter and backward smoother associated with the graph in Fig. 5. Let \({\mathbb {X}}_n:= ({X}_{n-1}, w_{n})^T =(x_{n-2}, w_{n-1},w_{n})^T\) and let Eq. (10a) be denoted by \({X}_{n}:= \Phi ^n({\mathbb {X}}_{n})\) then state space models (9), (10) can be posed as the one governing the joint variable \(({\mathbb {X}}_{n},y_n)\):

$$\begin{aligned} {\mathbb {X}}_{n}&= \left( \begin{array}{c} \Phi ^n( {\mathbb {X}}_{n-1}), w_{n} \end{array}\right) ^T, \end{aligned}$$
$$\begin{aligned} y_n&= g^n\circ \phi ^{n} \circ \Phi ^n({\mathbb {X}}_{n})+\eta _n. \end{aligned}$$

Using the notation \({\mathbb {X}}_{n|n'}:= {\mathbb {X}}_n|Y_{n'}\), the filtering of \(x_{n|n}\) can be obtained via the following two-layer procedure. On the one hand, the recursive estimation \({\mathbb {X}}_{n|n-1} \Rightarrow {\mathbb {X}}_{n|n} \nearrow {\mathbb {X}}_{n+1|n}\) is performed. On the other hand, the time update \({\mathbb {X}}_{n+1|n} \nearrow {\mathbb {X}}_{n+2|n}\) is carried out for filtering. This additional step is necessary because the law of \(x_{n|n'}\) comes from the marginalization of \({\mathbb {X}}_{n+2|n'}\). Putting these together, the pseudo-code reads as follows.

figure e

The fixed-interval smoothing distributions \(P(x_{n|N})\) can be obtained from the following iteration and from the marginalization of \({\mathbb {X}}_{n+2|N}\).

figure f

The proposed algorithms are called the two-step ahead filter and smoother. In Fig. 6, the conditioned variables \(x_{n|n'}\) resulting from our one-step and two-step ahead filters are illustrated to emphasize the algorithmic difference. As in [53], the two-step ahead filter produces the law of \(x_{n|n}\) through the smoothing distributions of \(x_{n-2|n}\) and \(x_{n-1|n}\). However, the presence of conditioned system noises distinguishes our method from the prior works and the framework can account for the prospective out-performance of the derived algorithms in a more comprehensible fashion. Though the approach can straightforwardly be generalized to higher-order procedures, we do not proceed further due to increasing complexity.

3.4 Implementation using cubature measure

Notice that the equations in state space models (9) and (10), (11) can be cast into the common form of \({\textsf{Y}}=\Phi ({\textsf{X}})\) for an appropriate function \(\Phi (\cdot )\) and a Gaussian random variable \({\textsf{X}}\). For instance, one can regard \({\textsf{X}} = ( X_{n-1},w_n)\) and \({\textsf{Y}} = X_{n}\) in case of Eq. (10a), and \({\textsf{X}} = ( X_{n},\eta _n)\) and \({\textsf{Y}} = y_{n}\) in case of Eq. (10b). Therefore, for the implementation of the algorithms discussed so far, it is sufficient to define how to obtain a Gaussian approximation of \({\textsf{Z}}=({\textsf{X}},{\textsf{Y}})\).

The method adopted here is to pass a set of weighted points, known as cubature points, through the function and to fit a Gaussian to the transformed points. To be precise, let \(\mathsf { \mu _X = \sum _i \lambda _i \delta _{X^i} }\) be the cubature with respect to \({\textsf{X}}\), that is, a discrete measure that has the same moments as the distribution of \({\textsf{X}}\) up to a certain degree. Let \(\mathsf { Z^i=(X^i,\Phi (X^i)) }\); then the mean and covariance of \(\mathsf { \mu _{Z} = \sum _i \lambda _i \delta _{Z^i} }\) are given by

$$\begin{aligned} \mathsf{M} &= \sum _\mathsf{i} \lambda _\mathsf{i} \mathsf{Z}^\mathsf{i}, \\ \mathsf{\Sigma } &= \sum _\mathsf{i} \lambda _\mathsf{i} \left( \mathsf{Z}^\mathsf{i} - \mathsf{M} \right) \left( \mathsf{Z}^\mathsf{i} - \mathsf{M} \right) ^\mathsf{T}. \end{aligned}$$

The probability of \({\textsf{Z}}\) is approximated by the normal distribution \({\mathcal {N}}({\textsf{M}},\mathsf {\Sigma })\). The method is easy to implement and computationally cheap, and it has been reported to give a good degree of accuracy in filtering and smoothing applications [22, 47, 54, 55].
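As a minimal sketch of this construction, the following Python code assumes the third-degree spherical-radial rule of [22], which uses 2d equally weighted points for a d-dimensional Gaussian. The function names `cubature_points` and `joint_gaussian` are illustrative and do not come from the paper; a higher-degree rule such as those in [47] would only change the points and weights.

```python
import numpy as np


def cubature_points(mean, cov):
    """Third-degree spherical-radial cubature rule for N(mean, cov).

    Assumption of this sketch: 2d points with equal weights 1/(2d).
    Returns the points X^i as rows and the weights lambda_i.
    """
    d = mean.size
    chol = np.linalg.cholesky(cov)                      # cov = chol @ chol.T
    unit = np.sqrt(d) * np.hstack([np.eye(d), -np.eye(d)])  # shape (d, 2d)
    points = mean[:, None] + chol @ unit                # columns are X^i
    weights = np.full(2 * d, 1.0 / (2 * d))
    return points.T, weights


def joint_gaussian(phi, mean, cov):
    """Gaussian approximation N(M, Sigma) of Z = (X, phi(X))."""
    X, lam = cubature_points(mean, cov)
    # Z^i = (X^i, phi(X^i)), stacked as rows
    Z = np.array([np.concatenate([x, np.atleast_1d(phi(x))]) for x in X])
    M = lam @ Z                                         # sum_i lambda_i Z^i
    dev = Z - M
    Sigma = (lam[:, None] * dev).T @ dev                # sum_i lambda_i (Z^i - M)(Z^i - M)^T
    return M, Sigma
```

For a linear \(\Phi\) this rule reproduces the exact joint mean and covariance, which gives a convenient sanity check before applying it to the nonlinear models of Sect. 3.5.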

In the present paper, the filter and smoother that are introduced in Sect. 3.1 and that use the cubature measure are called the cubature Kalman filter (CKF) and the cubature Kalman smoother (CKS). Similarly, the filters that are introduced in Sects. 3.2 and 3.3 and that use the cubature measure are called the conditioned-noise cubature Kalman filter of the first order (CNCKF) and the conditioned-noise cubature Kalman filter of the second order (CN2CKF). The corresponding smoothers are named CNCKS and CN2CKS.

3.5 Numerical simulation

Here we numerically compare the algorithms defined in the preceding sections. Specifically, we are interested in the accuracy and stability of the cubature-based Gaussian approximation filters (CKF, CNCKF, CN2CKF) and smoothers (CKS, CNCKS, CN2CKS). Our test bed consists of two benchmark examples in the context of sequential filtering and smoothing.

Fig. 7 The RMSEs between the target trajectory and the filtering/smoothing estimates obtained from the average over 200 independent simulations; a position; b velocity; and c turn rate

Fig. 8 The mean and variance of time-averaged RMSEs between the target trajectory and filtered/smoothed solutions; a position; b velocity; and c turn rate

3.5.1 Target tracking

Consider a model air-traffic monitoring scenario, where an aircraft executes a maneuvering turn in a horizontal plane at an unknown turn rate \(\Omega _n\) at time n [22, 48, 55]. The dynamical system is given by

$$\begin{aligned} \begin{aligned} {x}_{n+1} =&\left( \begin{array}{lllll} 1 &{}\quad \frac{\sin (\Omega _n\Delta t)}{\Omega _n} &{}\quad 0 &{}\quad \frac{\cos (\Omega _n\Delta t)-1}{\Omega _n} &{}\quad 0 \\ 0 &{}\quad \cos (\Omega _n\Delta t) &{}\quad 0 &{}\quad -\sin (\Omega _n\Delta t) &{}\quad 0 \\ 0 &{}\quad \frac{1-\cos (\Omega _n\Delta t)}{\Omega _n} &{}\quad 1 &{}\quad \frac{\sin (\Omega _n\Delta t)}{\Omega _n} &{}\quad 0 \\ 0 &{}\quad \sin (\Omega _n\Delta t) &{}\quad 0 &{}\quad \cos (\Omega _n\Delta t) &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 \end{array}\right) {x}_n \\&+ w_{n+1} \end{aligned} \end{aligned} \tag{12}$$

where \({x}_n =( \texttt{x}_n, \dot{\texttt{x}}_n, \texttt{y}_n, \dot{\texttt{y}}_n, \Omega _n )^T\); \((\texttt{x}_n, \texttt{y}_n)\) and \((\dot{\texttt{x}}_n, \dot{\texttt{y}}_n)\) are the position and velocity at time n, respectively. The system noise \(w_n\) is distributed according to a centered Gaussian with covariance

$$\begin{aligned} Q_n = \left( \begin{array}{lllll} \frac{\Delta t^3}{3} &{}\quad \frac{\Delta t^2}{2} &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ \frac{\Delta t^2}{2} &{}\quad \Delta t &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad \frac{\Delta t^3}{3} &{}\quad \frac{\Delta t^2}{2} &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad \frac{\Delta t^2}{2} &{}\quad \Delta t &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad 1.75\times 10^{-3} \Delta t \end{array}\right) . \end{aligned}$$

The measurement equation is given by

$$\begin{aligned} {y}_n = \left( \begin{array}{l} \sqrt{ \texttt{x}_n^2+\texttt{y}_n^2} \\ \tan ^{-1}\left( {\texttt{y}_n}/{\texttt{x}_n} \right) \end{array}\right) +\eta _n \end{aligned} \tag{13}$$

and the noise covariance is \(R_n = \text {diag}(10^2, 10^{-5} )\). The inter-observation time is \(\Delta t = 1\) (hereafter the units of physical quantities are omitted for brevity).
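The tracking model above can be simulated with a few lines of Python. The sketch below is only an illustration under stated assumptions: the helper names (`turn_matrix`, `process_cov`, `measure`, `simulate`) are hypothetical, the bearing is computed with `np.arctan2` rather than a plain arctangent of the ratio so that all quadrants are handled, and the turn rate is assumed to stay away from zero because the transition matrix divides by \(\Omega _n\).

```python
import numpy as np

DT = 1.0                          # inter-observation time from the text
R = np.diag([10.0**2, 1e-5])      # measurement noise covariance R_n


def turn_matrix(omega, dt=DT):
    """Transition matrix for the state (x, xdot, y, ydot, Omega); requires omega != 0."""
    s, c = np.sin(omega * dt), np.cos(omega * dt)
    return np.array([
        [1.0, s / omega,         0.0, (c - 1.0) / omega, 0.0],
        [0.0, c,                 0.0, -s,                0.0],
        [0.0, (1.0 - c) / omega, 1.0, s / omega,         0.0],
        [0.0, s,                 0.0, c,                 0.0],
        [0.0, 0.0,               0.0, 0.0,               1.0],
    ])


def process_cov(dt=DT):
    """System noise covariance Q_n as given in the text."""
    block = np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
    Q = np.zeros((5, 5))
    Q[0:2, 0:2] = block
    Q[2:4, 2:4] = block
    Q[4, 4] = 1.75e-3 * dt
    return Q


def measure(x):
    """Noise-free range and bearing of the position (x[0], x[2])."""
    return np.array([np.hypot(x[0], x[2]), np.arctan2(x[2], x[0])])


def simulate(x1, n_steps, rng):
    """Draw one state trajectory and the matching noisy observations."""
    Q = process_cov()
    xs, ys = [np.asarray(x1, dtype=float)], []
    for _ in range(n_steps):
        x = xs[-1]
        ys.append(measure(x) + rng.multivariate_normal(np.zeros(2), R))
        xs.append(turn_matrix(x[4]) @ x + rng.multivariate_normal(np.zeros(5), Q))
    return np.array(xs[:-1]), np.array(ys)
```

A call such as `simulate(x1, 200, np.random.default_rng(0))`, with `x1` drawn from the initial distribution, would produce one data set of the kind used in a single Monte Carlo run.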

Fig. 9 The RMSEs between the target trajectory and the filtering/smoothing estimates obtained from the average over 1800 independent simulations; a position; b velocity; and c turn rate. Here EKF/EKS represents the extended Kalman filter/smoother

Fig. 10 The mean and variance of time-averaged RMSEs between the target trajectory and filtered/smoothed solutions; a position; b velocity; and c turn rate

The simulation results are based on 200 independent Monte Carlo runs. In each case, the initial state of the true signal is a draw from the normal distribution \(x_1 \sim {\mathcal {N}}\left( ( 10^3, 3 \cdot 10^2, 10^3, 0, -\frac{3\pi }{180})^T, \text {diag}( [10^2, 10, 10^2, 10, 10^{-4} ] ) \right)\), and the sequential observations over the time interval \(1 \le n \le 200\) are randomly generated.

We obtain a single trajectory of \(x_n\) over time by solving (12) and calculate the filtered/smoothed estimates using the observational data from (13). Figure 7 shows the RMSEs at each time step with respect to position, velocity and turn rate. We see that (i) the performance of CNCKF/CNCKS is less sensitive to the cubature order than that of CKF/CKS, and that (ii) the algorithms obtained by applying the conditioned-noise framework uniformly outperform the naive methods. Figure 8 depicts the mean and variance of the time-averaged RMSE (aRMSE). The means are lower for our algorithms than for the classical schemes, which indicates an improvement in accuracy. Similarly, the lower variances indicate that the new algorithms are more stable than the classical schemes. Because the performances of CNCKS and CN2CKS are similar to each other, CNCKS is preferred to the other suggested algorithms once the computational burden is taken into account.

3.5.2 Ballistic target

Let us consider the problem of tracking a ballistic target under the influence of drag and gravity acting on the target [54, 56, 57]. Let \(x_n=(x_n^1,x_n^2,x_n^3)^T\) be the state vector, where \(x_n^1\) and \(x_n^2\) are altitude and velocity, respectively, and \(x_n^3\) is a constant ballistic coefficient. The equation of motion is given by

$$\begin{aligned} \begin{aligned} x_{n+1} = \left( \begin{array}{l} x_n^1 -\delta x_n^2\\ x_n^2 -\delta e^{-\gamma x_n^1} (x_n^2)^2 x_n^3+g\\ x_n^3 \end{array}\right) . \end{aligned} \end{aligned} \tag{14}$$

Here \(\delta =0.5\) is the integration time, and \(\gamma = 1.49\times 10^{-4}\) and \(g = 9.81\) serve as the drag and gravity constants, respectively.

The measurement equation is given by

$$\begin{aligned} \begin{aligned} y_n&= \sqrt{ M^2+(x_n^1-H)^2 }+\eta _n \end{aligned} \end{aligned} \tag{15}$$

where M is the horizontal distance, and H determines the radar location. The system is characterized by the parameters \(H=10^3\), \(M=10^4\) and \(R_n=30^2\). The true initial state is \(x_1^*=(61 \cdot 10^3, 3048, 4.49 \cdot 10^{-4})^T\), and the initial state density is \({\mathcal {N}}\left( (62 \cdot 10^3, 3400, 10^{-5})^T, \text {diag}([10^6, 10^4,10^{-4}]) \right)\).
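A minimal Python sketch of this benchmark is given below for illustration. The function names `step`, `measure` and `simulate` are hypothetical, and the state update is transcribed term by term from the equation of motion as printed above, including the way the gravity constant enters.

```python
import numpy as np

DELTA, GAMMA, G = 0.5, 1.49e-4, 9.81   # integration time, drag and gravity constants
H, M = 1e3, 1e4                        # radar location and horizontal distance
R = 30.0**2                            # measurement noise variance R_n


def step(x):
    """One deterministic update of (altitude, velocity, ballistic coefficient)."""
    alt, vel, beta = x
    return np.array([
        alt - DELTA * vel,
        vel - DELTA * np.exp(-GAMMA * alt) * vel**2 * beta + G,
        beta,
    ])


def measure(x):
    """Noise-free range from the radar to the target."""
    return np.sqrt(M**2 + (x[0] - H) ** 2)


def simulate(x1, n_steps, rng):
    """Propagate the state from x1 and attach Gaussian measurement noise."""
    xs = [np.asarray(x1, dtype=float)]
    for _ in range(n_steps - 1):
        xs.append(step(xs[-1]))
    xs = np.array(xs)
    noise = rng.normal(0.0, np.sqrt(R), size=n_steps)
    ys = np.array([measure(x) for x in xs]) + noise
    return xs, ys
```

Because the dynamics carry no driving noise, all randomness in a simulated data set comes from the measurement noise and, for the filters, from the initial state density.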

We generate a single trajectory of \(x_n\) over time by solving (14) and obtain the filtered/smoothed estimates using the observational data from (15). We perform 1800 independent simulations. Figures 9 and 10 demonstrate that the conditioned-noise framework outperforms the classical methods. Figure 9 depicts the RMSE values as a function of time, showing that (i) CN2CKF is always better than CNCKF, and (ii) CN2CKS is in general better than CNCKS but occasionally does not lead to an improved performance. This numerical simulation reveals that, quite interestingly, the advantage of our framework holds even for a system without explicit driving noise. This result accords with the prior work [53].

4 Conclusion

This paper considers the design of a batch smoother and sequential filters/smoothers for discrete-time nonlinear systems with Gaussian noise. The new family of state estimation algorithms is proposed as follows:

  1. For state space model (5),

    • we develop the conditioned-noise version of expectation propagation in Sect. 2.2.

  2. For state space model (9),

    • we develop the conditioned-noise smoother of the first order in Sect. 3.2,

    • we develop the conditioned-noise filter and smoother of the second order in Sect. 3.3.

Note that the development is achieved by reformulating the original state space model into one that governs the augmented variable comprising the system variable and the driving noise. The implementation of the proposed algorithms is basically the same as that of the existing methods; the difference is that the conditioned-noise expectation propagation makes use of (6) in place of (5), and the conditioned-noise filter/smoother makes use of (10) and (11) in place of (9). The numerical simulations performed in Sects. 2.4 and 3.5 confirm that, in all of the benchmark examples studied in the areas of batch and sequential data assimilation, the filters and smoothers developed according to the conditioned-noise framework uniformly outperform the corresponding classical methods in both accuracy and stability. We emphasize that this numerical result accords with the theoretical reasoning on the role of the conditioned noise in Sects. 2.3 and 3.2.

Throughout the text, we investigate Gaussian approximations of the target probability distribution. We believe that the conditioned-noise framework remains competitive even when the distribution is parametrized in a different way. In future work, therefore, we plan to develop Gaussian mixture and sequential Monte Carlo algorithms for filtering and smoothing that are similar in spirit to the conditioned-noise Gaussian filters and smoothers proposed in this work. With the goal of extending the applicability of the graph-based method beyond the scope explored in this paper, we also plan to seek improved data assimilation performance by introducing new graphical models with particular characteristics and by considering inference schemes on those graphs.

Finally, we discuss the potential impact of this work on academia and industry. Our effort to shed new light on the driving noises can benefit the academic community by creating new research momentum in devising data assimilation techniques on the basis of graphical models. By enriching the library of filters and smoothers that are directly applicable to real-world problems, our results can also benefit industrial applications in which accurate and stable filtering/smoothing schemes are paramount.

Availability of data and materials

Not applicable.


  1. S. Särkkä, V.V. Viikari, M. Huusko, K. Jaakkola, Phase-based uhf rfid tracking with nonlinear kalman filtering and smoothing. Sens. J. IEEE 12(5), 904–910 (2012)

  2. D. Titterton, J.L. Weston, Strapdown Inertial Navigation Technology vol. 17. IET (2004)

  3. S. Godsill, P. Rayner, O. Cappé, Digital Audio Restoration (Springer, 2002)

  4. B.D.O. Anderson, J.B. Moore, Optimal Filtering, vol. 11 (Prentice-hall Englewood Cliffs, NJ, 1979)

  5. R.E. Kalman et al., A new approach to linear filtering and prediction problems. J. Basic Eng. 82(1), 35–45 (1960)

  6. R.E. Kalman, R.S. Bucy, New results in linear filtering and prediction theory. J. Basic Eng. 83(3), 95–108 (1961)

  7. S. Särkkä, Bayesian Filtering and Smoothing, vol. 3 (Cambridge University Press, 2013)

  8. H. Kushner, Approximations to optimal nonlinear filters. IEEE Trans. Autom. Control 12(5), 546–556 (1967)

  9. A. Jazwinski, Stochastic Processes and Filtering Theory, Mathematics in science and engineering, vol. 64. (Academic Press, San Diego, California, 1970)

  10. A.F. Bennett, Inverse Methods in Physical Oceanography (Cambridge University Press, 1992)

  11. E. Kalnay, Atmospheric Modeling, Data Assimilation, and Predictability (2003)

  12. D.S. Oliver, A.C. Reynolds, N. Liu, Inverse Theory for Petroleum Reservoir Characterization and History Matching (Cambridge University Press, 2008)

  13. A. Doucet, N. De Freitas, N. Gordon, Sequential Monte Carlo Methods in Practice (Springer Verlag, 2001)

  14. A. Gelb, Applied Optimal Estimation (MIT Press, 1974)

  15. G. Evensen, Data Assimilation: The Ensemble Kalman Filter (Springer Verlag, 2009)

  16. N.J. Gordon, D.J. Salmond, A.F.M. Smith, in Novel Approach to Nonlinear/Non-gaussian Bayesian State Estimation, IEE Proceedings F Radar and Signal Processing, vol. 140 (IET, 1993), pp. 107–113

  17. R. Chen, J.S. Liu, Mixture Kalman filters. J. R. Stat. Soc. Ser. B (Statistical Methodology) 62(3), 493–508 (2000)

  18. A.S. Stordal, H.A. Karlsen, G. Nævdal, H.J. Skaug, B. Vallès, Bridging the ensemble Kalman filter and particle filters: the adaptive gaussian mixture filter. Comput. Geosci. 15(2), 293–305 (2011)

  19. T.P. Minka, in Expectation Propagation for Approximate Bayesian inference, Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, (Morgan Kaufmann Publishers Inc, 2001), pp. 362–369

  20. S. Razali, K. Watanabe, S. Maeyama, K. Izumi, in An Unscented Rauch-Tung-Striebel Smoother for a Bearing only Tracking Problem, ICCAS 2010, (IEEE, 2010), pp. 1281–1286

  21. S.J. Julier, J.K. Uhlmann, Unscented filtering and nonlinear estimation. Proc. IEEE 92(3), 401–422 (2004)

  22. I. Arasaratnam, S. Haykin, Cubature Kalman filters. IEEE Trans. Autom. Control 54(6), 1254–1269 (2009)

  23. J.H. Kotecha, P.M. Djuric, Gaussian sum particle filtering. IEEE Trans. Signal Process. 51(10), 2602–2612 (2003)

  24. M.S. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50(2), 174–188 (2002)

  25. O. Cappé, E. Moulines, T. Rydén, Inference in Hidden Markov Models (Springer, 2005)

  26. I. Hoteit, D. Pham, G. Triantafyllou, G. Korres, A new approximate solution of the optimal nonlinear filter for data assimilation in meteorology and oceanography. Mon. Weather Rev. 136(1), 317–334 (2008)

  27. I. Hoteit, X. Luo, D.-T. Pham, Particle Kalman filtering: a nonlinear Bayesian framework for ensemble Kalman filters*. Mon. Weather Rev. 140, 528–542 (2012)

  28. I.S. Weir, Fully Bayesian reconstructions from single-photon emission computed tomography data. J. Am. Stat. Assoc. 92(437), 49–60 (1997)

  29. R.P. Barry, M. Jay, V. Hoef, Blackbox kriging: spatial prediction without specifying variogram models. J. Agric. Biol. Environ. Stat. 297–322 (1996)

  30. R. Glaser, G. Johannesson, S. Sengupta, B. Kosovic, S. Carle, G. Franz, R. Aines, J. Nitao, W. Hanley, A. Ramirez, et al, Stochastic engine final report: Applying markov chain monte carlo methods with importance sampling to large-scale data-driven simulation, Technical report, (Lawrence Livermore National Lab., Livermore, CA, 2004)

  31. H.K. Lee, D.M. Higdon, Z. Bi, M.A. Ferreira, M. West, Markov random field models for high-dimensional parameters in simulations of fluid flow in porous media. Technometrics 44(3), 230–241 (2002)

  32. P.S. Craig, M. Goldstein, J.C. Rougier, A.H. Seheult, Bayesian forecasting for complex systems using computer simulators. J. Am. Stat. Assoc. 96(454), 717–729 (2001)

  33. B.K. Hegstad, O. Henning et al., Uncertainty in production forecasts based on well observations, seismic data, and production history. Spe J. 6(04), 409–424 (2001)

  34. P.K. Kitanidis, Parameter uncertainty in estimation of spatial functions: Bayesian analysis. Water Resour. Res. 22(4), 499–507 (1986)

  35. F. Liu, M. Bayarri, J. Berger, R. Paulo, J. Sacks, A Bayesian analysis of the thermal challenge problem. Comput. Methods Appl. Mech. Eng. 197(29–32), 2457–2466 (2008)

  36. D.M. Schmidt, J.S. George, C.C. Wood, Bayesian inference applied to the electromagnetic inverse problem. Hum. Brain Mapp. 7(3), 195–212 (1999)

  37. J. Wang, N. Zabaras, Hierarchical Bayesian models for inverse problems in heat conduction. Inverse Probl. 21(1), 183 (2004)

  38. C.M. Bishop et al., Pattern Recognition and Machine Learning, vol. 4 (Springer, New York, 2006)

  39. K.P. Murphy, Machine Learning: A Probabilistic Perspective (MIT press, 2012)

  40. A. Ypma, T. Heskes, Novel approximations for inference in nonlinear dynamical systems using expectation propagation. Neurocomputing 69(1), 85–99 (2005)

  41. B.M. Yu, K.V. Shenoy, M. Sahani, in Expectation Propagation for Inference in Non-linear Dynamical Models with Poisson Observations, Nonlinear Statistical Signal Processing Workshop, (IEEE, 2006) pp. 83–86

  42. M. Deisenroth, S. Mohamed, Expectation propagation in gaussian process dynamical systems. Adv. Neural Inf. Process. Syst. 25 (2012)

  43. O. Zoeter, A. Ypma, T. Heskes, in Improved Unscented Kalman Smoothing for Stock Volatility Estimation, Proceedings of the 2004 14th IEEE Signal Processing Society Workshop Machine Learning for Signal Processing, (IEEE, 2004) pp. 143–152

  44. M.P. Deisenroth, R.D. Turner, M.F. Huber, U.D. Hanebeck, C.E. Rasmussen, Robust filtering and smoothing with gaussian processes. IEEE Trans. Autom. Control 57(7), 1865–1871 (2011)

  45. T.P. Minka, From hidden markov models to linear dynamical systems. Technical report, Tech. Rep. 531, (Vision and Modeling Group of Media Lab, MIT, 1999)

  46. B. Yu, A. Afshar, G. Santhanam, S.I. Ryu, K. Shenoy, M. Sahani, Extracting dynamical structure embedded in neural activity. Adv. Neural Inf. Process. Syst. 18, 1545 (2006)

  47. B. Jia, M. Xin, Y. Cheng, High-degree cubature Kalman filter. Automatica 49(2), 510–518 (2013)

  48. W. Lee, C. Farmer, Data assimilation by conditioning of driving noise on future observations. IEEE Trans. Signal Process. 62(15), 3887–3896 (2014)

  49. M.E. Gharamti, B. Ait-El-Fquih, I. Hoteit, in A One-Step-Ahead Smoothing-Based Joint Ensemble Kalman Filter for State-Parameter Estimation of Hydrological Models, Dynamic Data-Driven Environmental Systems Science, (Springer, 2015), pp. 207–214

  50. M. Gharamti, B. Ait-El-Fquih, I. Hoteit, An iterative ensemble kalman filter with one-step-ahead smoothing for state-parameters estimation of contaminant transport models. J. Hydrol. 527, 442–457 (2015)

  51. B. Ait-El-Fquih, M. El Gharamti, I. Hoteit, A bayesian consistent dual ensemble Kalman filter for state-parameter estimation in subsurface hydrology. Hydrol. Earth Syst. Sci 20, 3289–3307 (2016)

  52. N.F. Raboudi, B. Ait-El-Fquih, I. Hoteit, Ensemble Kalman filtering with one-step-ahead smoothing. Mon. Weather Rev. 146(2), 561–581 (2018)

  53. F. Desbouvries, Y. Petetin, B. Ait-El-Fquih, Direct, prediction-and smoothing-based Kalman and particle filter algorithms. Signal Process. 91(8), 2064–2077 (2011)

  54. I. Arasaratnam, S. Haykin, Cubature Kalman smoothers. Automatica 47(10), 2245–2250 (2011)

  55. B. Jia, M. Xin, Rauch-Tung-Striebel High-Degree Cubature Kalman Smoother, American Control Conference (ACC), (IEEE, 2013), pp. 2472–2477

  56. S.J. Julier, J.K. Uhlmann, A general method for approximating nonlinear transformations of probability distributions. Technical report, Robotics Research Group, Department of Engineering Science (University of Oxford, 1996)

  57. B. Ristic, S. Arulampalam, N. Gordon, Beyond the Kalman filter. IEEE Aerosp. Electron. Syst. Mag. 19(7), 37–38 (2004)


The author thanks the anonymous referees for their helpful comments and suggestions, which indeed contributed to improving the quality of the publication.


This work is supported by the Start-up Research Grant of Beijing Normal University (No. 28704-310432107), the United International College Start-up Research Fund (No. UICR0700040-22) and the National Natural Science Fund of China (NSFC) Research Fund for International Excellent Young Scientists (No. 12250610190).

Author information

Authors and Affiliations



Not applicable.

Corresponding author

Correspondence to Wonjung Lee.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix A: Bayes’ rule for conditional Gaussian

Let \(Z = \left[ \begin{array}{c} X \\ Y \end{array}\right]\) be distributed as a Gaussian with the mean \(\left[ \begin{array}{c} {\bar{x}} \\ {\bar{y}} \end{array}\right]\) and covariance \(\left[ \begin{array}{cc} \Sigma _{xx} &{} \Sigma _{xy} \\ \Sigma _{yx} &{} \Sigma _{yy} \end{array}\right]\). Then Bayes’ rule asserts that the law of \(X\vert Y\) with \(Y=y\) is Gaussian with the mean and covariance given by

$$\begin{aligned} \begin{aligned} {\bar{x}}'&= {\bar{x}}+\Sigma _{xy}\Sigma _{yy}^{-1}(y-{\bar{y}}),\\ \Sigma _{xx}'&= \Sigma _{xx} - \Sigma _{xy}\Sigma _{yy}^{-1}\Sigma _{yx}, \end{aligned} \end{aligned}$$

respectively [4].
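The conditioning formulas above translate directly into code. The following Python sketch is illustrative only: the function name `condition_gaussian` and its argument `dx` (the dimension of \(X\)) are assumptions of this example, and a linear solve is used instead of forming \(\Sigma _{yy}^{-1}\) explicitly.

```python
import numpy as np


def condition_gaussian(mean, cov, dx, y):
    """Mean and covariance of X | Y = y for a joint Gaussian Z = (X, Y).

    mean, cov: joint mean and covariance of Z; dx: dimension of X.
    """
    x_bar, y_bar = mean[:dx], mean[dx:]
    S_xx, S_yx, S_yy = cov[:dx, :dx], cov[dx:, :dx], cov[dx:, dx:]
    # gain = S_xy S_yy^{-1}, obtained from a solve and the symmetry of cov
    gain = np.linalg.solve(S_yy, S_yx).T
    new_mean = x_bar + gain @ (y - y_bar)
    new_cov = S_xx - gain @ S_yx
    return new_mean, new_cov
```

Combined with a Gaussian approximation of \((X_n, y_n)\), this single function would perform the measurement update of a Gaussian filter.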

Appendix B: Nonlinear Kalman smoother

Let \({\textsf{x}}_{n|n'}:={\textsf{x}}_{n} |Y_{n'}\) then, based on the Gaussian assumption \({\textsf{x}}_{n|n'} \sim {\mathcal {N}}(\bar{{\textsf{x}}}_{n|n'},C_{n|n'})\), one can derive

$$\begin{aligned} \begin{aligned} \bar{{\textsf{x}}}_{n|N}&=\bar{{\textsf{x}}}_{n|n}+G_n(\bar{{\textsf{x}}}_{n+1|N}-\bar{{\textsf{x}}}_{n+1|n}), \\ C_{n|N}&=C_{n|n}+G_n(C_{n+1|N}-C_{n+1|n})G_n^T, \end{aligned} \end{aligned} \tag{B2}$$

where \(G_n = D_{n+1}C_{n+1|n}^{-1}\) and \(D_{n+1} = \text {cov}({\textsf{x}}_{n|n}, {\textsf{x}}_{n+1|n})\) [4, 7]. Recurrence relation (B2) enables the backward time update \(\textsf{ x}_{n+1|N}\searrow {\textsf{x}}_{n|N}\), provided the joint Gaussian law of \(({\textsf{x}}_{n|n}, {\textsf{x}}_{n+1|n})\) is known.
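One backward step of recurrence (B2) can be sketched in Python as follows. The function name `smoother_step` and its argument names are hypothetical; the inputs are the filtered moments at time \(n\), the predicted moments at time \(n+1\), the cross-covariance \(D_{n+1}\), and the already smoothed moments at time \(n+1\).

```python
import numpy as np


def smoother_step(m_filt, C_filt, m_pred, C_pred, D, m_smooth_next, C_smooth_next):
    """Backward update x_{n+1|N} -> x_{n|N} under the Gaussian assumption.

    m_filt, C_filt: mean and covariance of x_{n|n}
    m_pred, C_pred: mean and covariance of x_{n+1|n}
    D:              cov(x_{n|n}, x_{n+1|n})
    m_smooth_next, C_smooth_next: mean and covariance of x_{n+1|N}
    """
    # G_n = D C_pred^{-1}, computed by a solve using the symmetry of C_pred
    G = np.linalg.solve(C_pred, D.T).T
    m_smooth = m_filt + G @ (m_smooth_next - m_pred)
    C_smooth = C_filt + G @ (C_smooth_next - C_pred) @ G.T
    return m_smooth, C_smooth
```

Iterating this step from \(n=N-1\) down to \(n=1\), starting from the final filtered moments, would yield the fixed-interval smoothing moments.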

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Lee, W. New graphical models for sequential data and the improved state estimations by data-conditioned driving noises. EURASIP J. Adv. Signal Process. 2024, 50 (2024).
