
# A Bayesian robust Kalman smoothing framework for state-space models with uncertain noise statistics

Roozbeh Dehghannasiri^{1}, Xiaoning Qian^{1}, and Edward R. Dougherty^{1}

**2018**:55

https://doi.org/10.1186/s13634-018-0577-1

© The Author(s) 2018

**Received:** 21 January 2018. **Accepted:** 13 August 2018. **Published:** 6 September 2018.

## Abstract

The classical Kalman smoother recursively estimates states over a finite time window using all observations in the window. In this paper, we assume that the parameters characterizing the second-order statistics of process and observation noise are unknown and propose an optimal Bayesian Kalman smoother (OBKS) to obtain smoothed estimates that are optimal relative to the posterior distribution of the unknown noise parameters. The method uses a Bayesian innovation process and a posterior-based Bayesian orthogonality principle. The optimal Bayesian Kalman smoother possesses the same forward-backward structure as that of the ordinary Kalman smoother with the ordinary noise statistics replaced by their effective counterparts. In the first step, the posterior effective noise statistics are computed. Then, using the obtained effective noise statistics, the optimal Bayesian Kalman filter is run in the forward direction over the window of observations. The Bayesian smoothed estimates are obtained in the backward step. We validate the performance of the proposed robust smoother in the target tracking and gene regulatory network inference problems.

## Keywords

- Kalman smoother
- Robust filtering
- Bayesian robustness
- Innovation process
- Orthogonality principle

## 1 Introduction

Classical Kalman filtering is defined via a set of equations that provide a recursive evaluation of the optimal linear filter output to incorporate new observations [1]. The filtering procedure assumes a state-space model consisting of a transition equation and an observation equation. There are three filtering paradigms [2]: the Kalman filter estimates the signal at the most recent observed time point, the Kalman predictor estimates the signal at a future time point, and the Kalman smoother estimates the signal at an intermediate observation time point. The equations for the filter and predictor are closely related, so that solving one provides an immediate solution for the other, whereas the smoother requires further work.

The issue that concerns us here is how to proceed when the model is not fully known. Classically, a precondition for optimal filtering is to have complete knowledge of the random process model; however, this assumption is not always realistic in many practical settings such as target tracking [3–7]. Over the years, various adaptive procedures have been developed that essentially provide improving model estimates with increasing numbers of observations [8, 9]. More recently, the problem has been addressed under the assumption that the model belongs to an uncertainty class of models governed by a prior probability distribution, thereby placing the matter in a Bayesian framework with the aim being to find a recursive filter that is optimal over the uncertainty class.

There are two existing viewpoints for designing robust filters: minimax robustness, which involves designing a filter with the best worst-case performance [10–12], and Bayesian robustness, which involves designing a robust filter with the optimal performance on average relative to a prior (or posterior) distribution governing the uncertainty class [13–16]. When designing a Bayesian robust filter, if optimization is not constrained, meaning that it is over the entire class of filters of a particular type, then the filter is called an *intrinsically Bayesian robust* filter when optimality is relative to the prior distribution, and called an *optimal Bayesian filter* when optimality is relative to the posterior distribution. This kind of uncertainty modeling has been applied to linear and morphological filtering, both with and without incorporating the information embedded in the observations into the prior distribution [15, 17]. In the case of Kalman filtering, the problem has been addressed for the filter and predictor without prior updating [16], which is called an *intrinsically Bayesian robust Kalman filter*, and with updating based on new data [18], which is called an *optimal Bayesian Kalman filter*. In this paper, we find the optimal Kalman smoother relative to the probability mass governing the uncertainty.

In a Bayesian robustness setting, the prior (posterior) distribution is on the model of the underlying random process, meaning that it refers directly to our scientific uncertainty. The general aim is to find an operator that is optimal with respect to both the stochasticity in the nominal problem, for which the underlying model is fully known, and the model uncertainty. The aim can be achieved by replacing model characteristics and statistics in the solution to the nominal problem with their *effective* counterparts, which incorporate model uncertainty in such a way that the equation structure of the nominal solution is essentially preserved in the Bayesian robust solution. This approach has been used for classification [19], linear and morphological filtering [15, 17], signal compression [20], and Kalman filtering [16]. For example, in optimal wide-sense stationary linear filtering, the power spectra are replaced by effective power spectra [15], and in Gaussian classification, the class-conditional densities are replaced by effective class-conditional densities [19].

An *intrinsically Bayesian robust Kalman filter* (IBR-KF) has been proposed in [16] that is optimal relative to the prior distribution of noise parameters. The theory of the IBR-KF is rooted in the Bayesian orthogonality principle and the Bayesian innovation process, which are the extended versions of their ordinary counterparts when applied to the prior distribution. Innovation processes have long been used for Kalman filtering, dating back to 1968 when Kailath proposed the first instance of an innovation-based approach for Kalman filtering [21]. Building on the IBR-KF theory developed in [16], an *optimal Bayesian Kalman filter* (OBKF) achieving optimality on average relative to the posterior distribution of the noise parameters when observations are incorporated into the prior distribution was proposed [18]. The OBKF shares the theoretical foundation of the IBR-KF, the difference being the distribution relative to which the Bayesian innovation process and Bayesian orthogonality principle are stated: the prior distribution for the IBR-KF [16] and the posterior distribution for the OBKF [18].

Kalman smoothing is an offline signal processing tool where both past and future observations are used for making estimations [22–29]. Kalman smoothers can be classified as fixed-point, fixed-lag, and fixed-interval smoothers [30]; however, the term Kalman smoother generally refers to the fixed-interval case in which the goal is to estimate the sequence of states over a finite time window using all observations in the same window.

In this paper, we assume that the parameters characterizing the second-order statistics of process and observation noise are unknown and propose an *optimal Bayesian Kalman smoother* (OBKS) framework to obtain smoothed estimates that are optimal relative to the posterior distribution of the unknown noise parameters. Referring to our method as an “optimal Bayesian” smoother is consistent with the terminology used in other works devoted to the design of optimal Bayesian filters when a prior distribution is assumed for the unknown parameters in the random process model [17, 18].

In a sense, this paper fills in the last block of a six-part Kalman filtering paradigm: (1) filter/predictor under known model, (2) smoother under known model, (3) adaptive filter/predictor under unknown model, (4) adaptive smoother under unknown model, (5) optimal filter/predictor relative to an uncertainty class of models, and (6) optimal smoother relative to an uncertainty class of models. This is not to say that all problems have been solved. There can be many adaptive approaches. There are also many ways in which there can be uncertainty in the state-space model, and optimality relative to that uncertainty can be defined via different cost functions. In the four uncertainty settings referred to here, the covariance matrices for the process and observation noise are assumed to be unknown (in a manner to be precisely defined in the sequel).

Similar to the IBR-KF and OBKF, the proposed smoother is rooted in an innovation process. Several ordinary Kalman smoothers have employed innovation processes: for continuous-time systems [31], the fixed-interval Kalman smoother for linear discrete-time systems when only the covariance information is available [32], and when observations might be randomly missing [33]. In this paper, we use the Bayesian innovation process and the Bayesian orthogonality principle proposed in [16] to derive the OBKS forward-backward equations. The main advantage of the proposed smoothing framework is that it possesses the same forward-backward structure as that of the ordinary Kalman smoother with the ordinary noise statistics replaced by their effective counterparts. The effective statistics incorporate the uncertainty of the parameters characterizing the observation and process noise second-order statistics in such a way that designing an OBKS relative to an uncertainty class can be reduced to designing an ordinary Kalman smoother relative to the effective statistics. Specifically, we introduce the *effective Kalman smoothing gain* for the backward step of the OBKS. The proposed smoothing framework requires two forward steps. In the first step, the *posterior effective noise statistics* are computed. Then, the optimal Bayesian Kalman filter is designed relative to the obtained posterior effective noise statistics and is run in the forward direction over the window of observations. Finally, in the backward step, the Bayesian smoothed estimates are obtained.

This paper is organized as follows. In Section 2, we provide the theoretical foundation and derive the recursive equations for the proposed optimal Bayesian Kalman smoother. Section 3 is devoted to the experimental evaluation of the proposed OBKS method using two examples: target tracking and gene regulatory network inference. Finally, concluding remarks are given in Section 4.

Here, we summarize the notation employed throughout the paper. We use uppercase and lowercase boldface letters to denote matrices and vectors, respectively. **M**^{T}, |**M**|, and Tr{**M**} represent the transpose, determinant, and trace (sum of diagonal elements) of matrix **M**, respectively. Also, diag[ ·] denotes a diagonal matrix with the given diagonal entries. The value of a time-dependent matrix at time *k* is denoted by **M**_{k}. Let \((\Omega,\mathcal {E},P)\) be a probability space; then E[ ·] denotes the expectation relative to the probability measure *P*. In a real-valued random vector **x**=[*x*(1),...,*x*(*k*)], each component is a real random variable \(x(i): \Omega \rightarrow \mathcal {R}\), 1≤*i*≤*k*. We use E[ **x**] and cov[ **x**]=E[ (**x**−E[ **x**])(**x**−E[ **x**])^{T}] to denote the mean vector and the covariance matrix, respectively. Finally, \(\mathcal {N}(\mathbf {x};\mathbf {\mu },\mathbf {\Sigma })\) denotes a multivariate Gaussian density for random vector **x** with mean vector **μ** and covariance matrix **Σ**.

## 2 Optimal Bayesian Kalman smoother

### 2.1 Problem formulation and theoretical background

Consider the discrete-time state-space model

\(\mathbf {x}^{\theta _{1}}_{k+1}=\mathbf {\Phi }_{k}\mathbf {x}^{\theta _{1}}_{k}+\mathbf {\Gamma }_{k}\mathbf {u}^{\theta _{1}}_{k},\qquad \mathbf {y}^{\boldsymbol {\theta }}_{k}=\mathbf {H}_{k}\mathbf {x}^{\theta _{1}}_{k}+\mathbf {v}^{\theta _{2}}_{k},\)

where \(\mathbf {x}^{\theta _{1}}_{k}\) and \(\mathbf {y}^{\boldsymbol {\theta }}_{k}\) are *n*×1 and *m*×1 vectors, called the state vector and observation vector, respectively. **Φ**_{k}, **H**_{k}, and **Γ**_{k} are matrices of size *n*×*n*, *m*×*n*, and *n*×*p* called the state transition matrix, observation transition matrix, and the process noise transition matrix, respectively. We let \(\mathbf {z}^{\theta _{1}}_{k}=\mathbf {H}_{k}\mathbf {x}^{\theta _{1}}_{k}\). \(\mathbf {u}_{k}^{\theta _{1}}\) and \(\mathbf {v}_{k}^{\theta _{2}}\) are *p*×1 and *m*×1 vectors representing the process noise and observation noise, respectively, both being zero-mean discrete white-noise processes. The unknown covariance matrices of the process and observation noise are given by

\(\mathrm {E}\left [\mathbf {u}^{\theta _{1}}_{k}\left (\mathbf {u}^{\theta _{1}}_{l}\right)^{T}\right ]=\mathbf {Q}^{\theta _{1}}\delta _{kl},\qquad \mathrm {E}\left [\mathbf {v}^{\theta _{2}}_{k}\left (\mathbf {v}^{\theta _{2}}_{l}\right)^{T}\right ]=\mathbf {R}^{\theta _{2}}\delta _{kl},\)

where *δ*_{kl} is the Kronecker delta, i.e., *δ*_{kl}=1 for *k*=*l* and *δ*_{kl}=0 for *k*≠*l*, and *θ*_{1} and *θ*_{2} are two unknown parameters such that θ=[*θ*_{1},*θ*_{2}]∈*Θ*, *Θ* being the collection of all possible realizations of θ, governed by a prior distribution *π*(θ). We assume that *θ*_{1} and *θ*_{2} are independent. Note that while the observation vector \(\mathbf {y}_{k}^{\boldsymbol {\theta }}\) depends on both *θ*_{1} and *θ*_{2}, the state vector \(\mathbf {x}_{k}^{\theta _{1}}\) depends only on *θ*_{1}.
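The uncertainty model above can be simulated directly: draw θ=[*θ*_{1},*θ*_{2}] from the prior, then generate a state/observation trajectory under that realization. The sketch below uses a hypothetical 2-state, 1-observation model with uniform priors; the matrices and prior ranges are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model matrices (assumptions for illustration only).
Phi = np.array([[1.0, 1.0],
                [0.0, 1.0]])   # state transition matrix (n x n)
H = np.array([[1.0, 0.0]])     # observation transition matrix (m x n)
Gamma = np.eye(2)              # process noise transition matrix (n x p)

def sample_model(L):
    """Draw theta = [theta1, theta2] from a prior, then simulate
    x_{k+1} = Phi x_k + Gamma u_k and y_k = H x_k + v_k
    with Q = theta1 * I and R = theta2 * I."""
    theta1 = rng.uniform(0.25, 2.0)   # unknown process noise level
    theta2 = rng.uniform(0.25, 5.0)   # unknown observation noise level
    x = np.zeros(2)
    xs, ys = [], []
    for _ in range(L + 1):
        y = H @ x + np.sqrt(theta2) * rng.standard_normal(1)
        xs.append(x)
        ys.append(y)
        x = Phi @ x + Gamma @ (np.sqrt(theta1) * rng.standard_normal(2))
    return (theta1, theta2), np.array(xs), np.array(ys)

theta, xs, ys = sample_model(L=15)
print(xs.shape, ys.shape)  # (16, 2) (16, 1)
```

Each call produces one model from the uncertainty class together with data generated under it, which is exactly the setting the OBKS is designed for.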

Given a sequence of observations \(\mathcal {Y}^{\boldsymbol {\theta }}_{L}=\left \{\mathbf {y}^{\boldsymbol {\theta }}_{0},\mathbf {y}^{\boldsymbol {\theta }}_{1},...,\mathbf {y}^{\boldsymbol {\theta }}_{L}\right \}\) in a window of length *L*, we desire an *optimal Bayesian Kalman smoother* (OBKS), a fixed-interval smoother that finds the estimates of the states \(\mathbf {x}^{\theta _{1}}_{0}, \mathbf {x}^{\theta _{1}}_{1},..., \mathbf {x}^{\theta _{1}}_{L}\) in the same window. In this context, the Bayesian smoothed estimate \(\widehat {\mathbf {x}}_{k|L}^{\boldsymbol {\theta }}\) of \(\mathbf {x}_{k}^{\theta _{1}}\), which is the output of the OBKS at time *k*, has the following form

\(\widehat {\mathbf {x}}_{k|L}^{\boldsymbol {\theta }}=\sum _{l=0}^{L}\mathbf {G}^{\ast }_{k,l}\,\mathbf {y}_{l}^{\boldsymbol {\theta }}, \qquad (5)\)

where

\(\mathbf {G}^{\ast }=\underset {\mathbf {G}\in \mathcal {G}}{\arg \min }\; \mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathrm {E}\left [\left \|\mathbf {x}_{k}^{\theta _{1}}-\sum _{l=0}^{L}\mathbf {G}_{k,l}\,\mathbf {y}_{l}^{\boldsymbol {\theta }}\right \|^{2}\right ]\right ], \qquad (6)\)

\(\mathcal {G}\) is the vector space of all *n*×*m* matrix-valued functions \(\mathbf {G}_{k,l}:\mathbb {N}\times \mathbb {N}\longrightarrow \mathbb {R}^{n\times m}\), and \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}[\!\cdot ]\) denotes the expectation relative to \(\pi \left (\boldsymbol {\theta }|\mathcal {Y}^{\boldsymbol {\theta }}_{L}\right)\), i.e., \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}[\!\cdot ]=\int _{\boldsymbol {\theta }}(\cdot)\pi \left (\boldsymbol {\theta }|\mathcal {Y}^{\boldsymbol {\theta }}_{L}\right)\,d\boldsymbol {\theta }\). Note that we use E_{θ}[ ·] to denote the expectation relative to the prior distribution *π*(θ). Furthermore, E[ θ] and \(\mathrm {E}\left [\boldsymbol {\theta }|\mathcal {Y}^{\boldsymbol {\theta }}_{L}\right ]\) represent the expectation of parameter θ relative to *π*(θ) and \(\pi \left (\boldsymbol {\theta }|\mathcal {Y}^{\boldsymbol {\theta }}_{L}\right)\), respectively. It is worth mentioning that the optimal Bayesian Kalman predictor and the optimal Bayesian Kalman filter proposed in [18] correspond to *L*=*k*−1 and *L*=*k* in (5), respectively. Also, if instead of \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}[\!\cdot ]\), E_{θ}[ ·] is used in (6), the estimators corresponding to *L*=*k*−1 and *L*=*k* in (5) are called the intrinsically Bayesian robust Kalman predictor and filter, respectively [16].
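The posterior-averaged cost in (6) can be made concrete with a toy scalar instance. In the sketch below, which is an illustrative assumption rather than an example from the paper, a scalar state *x*∼N(0,1) is observed as *y*=*x*+*v* with *v*∼N(0,θ); a grid search over linear gains *g* minimizing the posterior-averaged MSE recovers the closed-form minimizer 1/(1+E_{θ*}[θ]), i.e., the nominal solution with θ replaced by its effective (posterior-mean) value.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior samples of theta (standing in for MCMC output).
theta_post = rng.uniform(0.25, 5.0, size=5000)

def bayes_mse(g):
    # E_theta*[ E[(x - g*y)^2] ] = (1-g)^2 + g^2 * E_theta*[theta]
    # since x ~ N(0,1) and v ~ N(0, theta) are independent.
    return (1 - g) ** 2 + g ** 2 * theta_post.mean()

gains = np.linspace(0.0, 1.0, 1001)
g_star = gains[np.argmin([bayes_mse(g) for g in gains])]

# Closed-form minimizer: the nominal gain 1/(1+theta) with theta replaced
# by its posterior mean -- the "effective statistics" principle.
g_closed = 1.0 / (1.0 + theta_post.mean())
print(g_star, g_closed)
```

The grid minimizer and the closed form agree to grid resolution, illustrating why the OBKS reduces to the ordinary recursions with effective noise statistics.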

Before developing the OBKS equations, we state a theorem and a lemma required for deriving equations whose proofs can be found in [16].

The next theorem is a restatement of the classical orthogonality principle relative to the inner product defined by \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}[\!\mathrm {E}[\!\cdot ]]\) applied to \(\mathbf {x}_{k}^{\theta _{1}}\), \(\mathbf {y}_{l}^{\boldsymbol {\theta }}\), and \(\widehat {\mathbf {x}}_{k|L}^{\boldsymbol {\theta }}\), keeping in mind that \(\mathbf {x}_{k}^{\theta _{1}}\) depends only on *θ*_{1}, whereas \(\widehat {\mathbf {x}}_{k|L}^{\boldsymbol {\theta }}\) depends on θ=[*θ*_{1},*θ*_{2}]. As originally stated in [16], the “Bayesian orthogonality principle” involved the inner product defined by E_{θ}[ E[ ·]].

###
**Theorem 1**

(Bayesian Orthogonality Principle) The optimal Bayesian smoothed estimate \(\widehat {\mathbf {x}}_{k|L}^{\boldsymbol {\theta }}\) in (5) satisfies

\(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathrm {E}\left [\left (\mathbf {x}_{k}^{\theta _{1}}-\widehat {\mathbf {x}}_{k|L}^{\boldsymbol {\theta }}\right)\left (\mathbf {y}_{l}^{\boldsymbol {\theta }}\right)^{T}\right ]\right ]=\mathbf {0}_{n\times m}\)

for all *l*≤*L*, where **0**_{n×m} is the zero matrix of size *n*×*m*.

If \(\widehat {\mathbf {x}}_{k|k-1}^{\boldsymbol {\theta }}\) denotes the optimal Bayesian estimate of the state given the observations prior to time *k*, then the *Bayesian innovation process* is defined as [16]

\(\widetilde {\mathbf {z}}_{k}^{\boldsymbol {\theta }}=\mathbf {y}_{k}^{\boldsymbol {\theta }}-\mathbf {H}_{k}\widehat {\mathbf {x}}_{k|k-1}^{\boldsymbol {\theta }}.\)
The following lemma, which can be proved similar to the proof given in [16], helps us find the Bayesian smoothed estimates using the Bayesian innovation process.

###
**Lemma 1**

(Bayesian Information Equivalence) The Bayesian smoothed estimate \(\widehat {\mathbf {x}}_{k|L}^{\boldsymbol {\theta }}\) for \(\mathbf {x}_{k}^{\theta _{1}}\) based upon observations \(\mathbf {y}_{l}^{\boldsymbol {\theta }}\), 0≤*l*≤*L*, can be found by computing the Bayesian smoothed estimate based upon the Bayesian innovation process \(\widetilde {\mathbf {z}}_{l}^{\boldsymbol {\theta }}\), 0≤*l*≤*L*.

### 2.2 Update equation for Bayesian smoothed estimate

We now proceed to develop the recursive structure of the OBKS based on the theoretical foundation laid out in the previous subsection.

By Lemma 1, the Bayesian smoothed estimate can be expanded in terms of the Bayesian innovation process \(\widetilde {\mathbf {z}}_{l}^{\boldsymbol {\theta }}\), 0≤*l*≤*L*, yielding the update expression (15) for the smoothed estimate at time *k*. We can further simplify \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathrm {E}\left [\mathbf {x}_{k}^{\theta _{1}}\left (\widetilde {\mathbf {z}}_{l}^{\boldsymbol {\theta }}\right)^{T}\right ]\right ]\), for *k*+1≤*l*≤*L*, as in (16), where \(\widetilde {\mathbf {x}}_{k}^{\boldsymbol {\theta }}=\mathbf {x}_{k}^{\theta _{1}}-\widehat {\mathbf {x}}_{k|k-1}^{\boldsymbol {\theta }}\) is the state estimation error at time *k* and its auto-correlation \(\mathrm {E}\left [\widetilde {\mathbf {x}}_{k}^{\boldsymbol {\theta }}\left (\widetilde {\mathbf {x}}_{l}^{\boldsymbol {\theta }}\right)^{T}\right ]\) is denoted by \(\mathbf {W}_{k,l}^{\boldsymbol {\theta }}\). Note that the third equality in (16) results from the fact that \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathrm {E}\left [\widehat {\mathbf {x}}_{k|k-1}^{\boldsymbol {\theta }}\left (\widetilde {\mathbf {x}}_{l}^{\boldsymbol {\theta }}\right)^{T}\right ]\right ]=\mathbf {0}_{n\times n}\) due to the Bayesian orthogonality principle and that \(\mathbf {v}_{l}^{\theta _{2}}\) is independent of \(\widetilde {\mathbf {x}}_{k}^{\boldsymbol {\theta }}\) and \(\widehat {\mathbf {x}}_{k|k-1}^{\boldsymbol {\theta }}\) for *k*<*l*. Therefore, substituting (16) into (15) yields the simplified update expression for the Bayesian smoothed estimate.

For the one-step prediction at time *k*+1, we have the recursion (19) involving \(\mathbf {K}_{k}^{\Theta }\), which is called the *effective Kalman gain matrix* [18]. Also, we call \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\!\mathbf {Q}^{\theta _{1}}\right ]\) and \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\!\mathbf {R}^{\theta _{2}}\right ]\) the *posterior effective process noise statistics* and the *posterior effective observation noise statistics*, respectively. As has been shown in [18], \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\!\mathbf {Q}^{\theta _{1}}\right ]\) is required for updating \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}_{k|k-1}^{\mathbf {z},\boldsymbol {\theta }}\right ]\). Using (19), we find the relation between \(\widetilde {\mathbf {x}}_{l}^{\boldsymbol {\theta }}\) and \(\widetilde {\mathbf {x}}_{k}^{\boldsymbol {\theta }}\), noting that for *l*^{′}≥*k*, future process noise \(\mathbf {u}_{l^{\prime }}^{\theta _{1}}\) and observation noise \(\mathbf {v}_{l^{\prime }}^{\theta _{2}}\) are independent of \(\widetilde {\mathbf {x}}_{k}^{\boldsymbol {\theta }}\). Plugging this relation into the update equation yields the backward recursion for the Bayesian smoothed estimate, in which

\(\mathbf {A}_{k}^{\Theta }=\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}_{k|k}^{\mathbf {x},\boldsymbol {\theta }}\right ]\mathbf {\Phi }^{T}_{k}\mathrm {E}_{\boldsymbol {\theta }^{\ast }}^{-1}\left [\mathbf {P}^{\mathbf {x},\boldsymbol {\theta }}_{k+1|k}\right ]\)

is called the *effective Kalman smoothing gain*.

### 2.3 Update equation for the Bayesian smoothed error covariance matrix

Having obtained the update equation for the Bayesian smoothed estimate at time *k*, we now aim to find a recursive formulation for the average Bayesian smoothing error covariance matrix \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}_{k|L}^{\mathbf {x},\boldsymbol {\theta }}\right ]\), given in (35).

Finding the average Bayesian smoothing error covariance matrix in (35) completes all equations needed for implementing the OBKS framework.

The forward step of the OBKS involves running the OBKF, and in the backward step the Bayesian smoothed estimates are obtained. We should point out that, in practice, the OBKF updates the posterior effective noise statistics sequentially for each *k* because filtering is an online estimation scheme. However, since we here use the OBKF as the forward step of the OBKS, which is an offline estimation scheme, we use the posterior effective noise statistics conditioned on the whole observation window \(\mathcal {Y}^{\boldsymbol {\theta }}_{L}\) for the OBKF-based estimation from the beginning. In other words, the OBKF used in the forward step is in fact the IBR-KF designed relative to the posterior distribution \(\pi \left (\boldsymbol {\theta }|\mathcal {Y}^{\boldsymbol {\theta }}_{L}\right)\).

**Table 1** Comparison of the recursive equations for the classical and the proposed optimal Bayesian Kalman smoothers

**Forward step**

*Classical Kalman smoother:*

- \(\widetilde {\mathbf {z}}_{k}=\mathbf {y}_{k}-\mathbf {H}_{k}\widehat {\mathbf {x}}_{k|k-1}\)
- \(\mathbf {K}_{k}=\mathbf {P}^{\mathbf {x}}_{k|k-1}\mathbf {H}^{T}_{k}\left (\mathbf {H}_{k}\mathbf {P}^{\mathbf {x}}_{k|k-1}\mathbf {H}_{k}^{T}+\mathbf {R}\right)^{-1}\)
- \(\widehat {\mathbf {x}}_{k|k} =\widehat {\mathbf {x}}_{k|k-1} +\mathbf {K}_{k}\widetilde {\mathbf {z}}_{k}\)
- \(\widehat {\mathbf {x}}_{k+1|k}=\mathbf {\Phi }_{k}\widehat {\mathbf {x}}_{k|k-1}+\mathbf {\Phi }_{k}\mathbf {K}_{k}\widetilde {\mathbf {z}}_{k}\)
- \(\mathbf {P}^{\mathbf {x}}_{k|k} =(\mathbf {I}-\mathbf {K}_{k}\mathbf {H}_{k})\mathbf {P}_{k|k-1}^{\mathbf {x}}\)
- \(\mathbf {P}^{\mathbf {x}}_{k+1|k}=\mathbf {\Phi }_{k}\left (\mathbf {I}-\mathbf {K}_{k}\mathbf {H}_{k}\right)\mathbf {P}^{\mathbf {x}}_{k|k-1}\mathbf {\Phi }^{T}_{k} +\mathbf {\Gamma }_{k}\mathbf {Q}\mathbf {\Gamma }^{T}_{k}\)

*Optimal Bayesian Kalman smoother:*

- \(\widetilde {\mathbf {z}}^{\boldsymbol {\theta }}_{k}=\mathbf {y}^{\boldsymbol {\theta }}_{k}-\mathbf {H}_{k}\widehat {\mathbf {x}}^{\boldsymbol {\theta }}_{k|k-1}\)
- \(\mathbf {K}^{\Theta }_{k}=\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}^{\mathbf {x},\boldsymbol {\theta }}_{k|k-1}\right ]\mathbf {H}^{T}_{k}\mathrm {E}_{\boldsymbol {\theta }^{\ast }}^{-1}\left [\mathbf {H}_{k}\mathbf {P}^{\mathbf {x},\boldsymbol {\theta }}_{k|k-1}\mathbf {H}_{k}^{T}+\mathbf {R}^{\theta _{2}}\right ]\)
- \(\widehat {\mathbf {x}}_{k|k}^{\boldsymbol {\theta }}=\widehat {\mathbf {x}}_{k|k-1}^{\boldsymbol {\theta }} +\mathbf {K}_{k}^{\Theta }\widetilde {\mathbf {z}}^{\boldsymbol {\theta }}_{k}\)
- \(\widehat {\mathbf {x}}^{\boldsymbol {\theta }}_{k+1|k} =\mathbf {\Phi }_{k}\widehat {\mathbf {x}}^{\boldsymbol {\theta }}_{k|k-1}+\mathbf {\Phi }_{k}\mathbf {K}^{\Theta }_{k}\widetilde {\mathbf {z}}^{\boldsymbol {\theta }}_{k}\)
- \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}_{k|k}^{\mathbf {x},\boldsymbol {\theta }}\right ] =(\mathbf {I}-\mathbf {K}_{k}^{\Theta }\mathbf {H}_{k})\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}_{k|k-1}^{\mathbf {x},\boldsymbol {\theta }}\right ]\)
- \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}^{\mathbf {x},\boldsymbol {\theta }}_{k+1|k}\right ] =\mathbf {\Phi }_{k}\left (\mathbf {I}-\mathbf {K}^{\Theta }_{k}\mathbf {H}_{k}\right)\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}^{\mathbf {x},\boldsymbol {\theta }}_{k|k-1}\right ] \mathbf {\Phi }^{T}_{k}+\mathbf {\Gamma }_{k}\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {Q}^{\theta _{1}}\right ]\mathbf {\Gamma }^{T}_{k}\)

**Backward step**

*Classical Kalman smoother:*

- \(\mathbf {A}_{k}=\mathbf {P}_{k|k}^{\mathbf {x}}\mathbf {\Phi }_{k}^{T}\left (\mathbf {P}_{k+1|k}^{\mathbf {x}}\right)^{-1}\)
- \(\widehat {\mathbf {x}}_{k|L}=\widehat {\mathbf {x}}_{k|k}+\mathbf {A}_{k}\left (\widehat {\mathbf {x}}_{k+1|L}-\widehat {\mathbf {x}}_{k+1|k}\right)\)
- \(\mathbf {P}_{k|L}^{\mathbf {x}}=\mathbf {P}_{k|k}^{\mathbf {x}}+\mathbf {A}_{k}\left (\mathbf {P}_{k+1|L}^{\mathbf {x}}-\mathbf {P}_{k+1|k}^{\mathbf {x}}\right)\mathbf {A}^{T}_{k}\)

*Optimal Bayesian Kalman smoother:*

- \(\mathbf {A}^{\Theta }_{k}=\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}_{k|k}^{\mathbf {x},\boldsymbol {\theta }}\right ]\mathbf {\Phi }^{T}_{k}\mathrm {E}_{\boldsymbol {\theta }^{\ast }}^{-1}\left [\mathbf {P}^{\mathbf {x},\boldsymbol {\theta }}_{k+1|k}\right ]\)
- \(\widehat {\mathbf {x}}^{\boldsymbol {\theta }}_{k|L}=\widehat {\mathbf {x}}^{\boldsymbol {\theta }}_{k|k}+\mathbf {A}^{\Theta }_{k}\left (\widehat {\mathbf {x}}^{\boldsymbol {\theta }}_{k+1|L}-\widehat {\mathbf {x}}^{\boldsymbol {\theta }}_{k+1|k}\right)\)
- \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}_{k|L}^{\mathbf {x},\boldsymbol {\theta }}\right ]=\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}_{k|k}^{\mathbf {x},\boldsymbol {\theta }}\right ]+\mathbf {A}^{\Theta }_{k}\left (\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}^{\mathbf {x},\boldsymbol {\theta }}_{k+1|L}\right ]-\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}^{\mathbf {x},\boldsymbol {\theta }}_{k+1|k}\right ]\right)\left (\mathbf {A}^{\Theta }_{k}\right)^{T}\)

If the state vector **x**_{0} is characterized by E[ **x**_{0}] and cov[ **x**_{0}], then the forward step of the OBKS is initialized as \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}_{0|0}^{\mathbf {x},\boldsymbol {\theta }}\right ]=\text {cov}[\!\mathbf {x}_{0}]\), \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {P}_{1|0}^{\mathbf {x},\boldsymbol {\theta }}\right ]=\mathbf {\Phi }_{0}\text {cov}[\!\mathbf {x}_{0}]\mathbf {\Phi }^{T}_{0}+\mathbf {\Gamma }_{0}\mathrm {E}_{\boldsymbol {\theta }^{\ast }}\left [\mathbf {Q}^{\theta _{1}}\right ]\mathbf {\Gamma }^{T}_{0}\), \(\widehat {\mathbf {x}}^{\boldsymbol {\theta }}_{0|0}=\mathrm {E}[\!\mathbf {x}_{0}]\), and \(\widehat {\mathbf {x}}^{\boldsymbol {\theta }}_{1|0}=\mathbf {\Phi }_{0}\mathrm {E}[\!\mathbf {x}_{0}]\).
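Because the OBKS has the same forward-backward structure as the ordinary Kalman smoother with effective noise statistics substituted in, the recursions of Table 1 can be sketched compactly. The following is a minimal NumPy sketch (not the authors' code): it assumes the posterior effective statistics \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}[\mathbf {Q}^{\theta _1}]\) and \(\mathrm {E}_{\boldsymbol {\theta }^{\ast }}[\mathbf {R}^{\theta _2}]\) have already been computed (e.g., via MCMC) and runs the forward filtering pass followed by the backward smoothing pass with the initialization given above; the function name is our own.

```python
import numpy as np

def effective_kalman_smoother(ys, Phi, H, Gamma, Q_eff, R_eff, x0_mean, x0_cov):
    """Fixed-interval smoother in the form of Table 1: a standard forward
    (filter) / backward (smoother) pass in which the noise covariances are
    the posterior effective statistics E_theta*[Q] and E_theta*[R]."""
    L = len(ys) - 1
    n = Phi.shape[0]
    xf = np.zeros((L + 1, n))      # filtered estimates  x_{k|k}
    xp = np.zeros((L + 1, n))      # predicted estimates x_{k|k-1}
    Pf = np.zeros((L + 1, n, n))   # filtered covariances
    Pp = np.zeros((L + 1, n, n))   # predicted covariances
    xp[0], Pp[0] = x0_mean, x0_cov
    for k in range(L + 1):          # forward step
        S = H @ Pp[k] @ H.T + R_eff
        K = Pp[k] @ H.T @ np.linalg.inv(S)            # effective Kalman gain
        xf[k] = xp[k] + K @ (ys[k] - H @ xp[k])       # innovation update
        Pf[k] = (np.eye(n) - K @ H) @ Pp[k]
        if k < L:
            xp[k + 1] = Phi @ xf[k]
            Pp[k + 1] = Phi @ Pf[k] @ Phi.T + Gamma @ Q_eff @ Gamma.T
    xs, Ps = xf.copy(), Pf.copy()   # backward step
    for k in range(L - 1, -1, -1):
        A = Pf[k] @ Phi.T @ np.linalg.inv(Pp[k + 1])  # effective smoothing gain
        xs[k] = xf[k] + A @ (xs[k + 1] - xp[k + 1])
        Ps[k] = Pf[k] + A @ (Ps[k + 1] - Pp[k + 1]) @ A.T
    return xs, Ps
```

Plugging in the prior-mean statistics instead of the posterior ones turns this same routine into the IBR-KS described later, which is precisely the design-reduction the effective-statistics framework promises.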

### 2.4 Computing posterior effective noise statistics

To update the prior to the posterior, the likelihood function \(f\left (\mathcal {Y}^{\boldsymbol {\theta }}_{L}|\boldsymbol {\theta }\right)\) must be evaluated. The quantities **Σ**_{L}, **M**_{L}, and *S*_{L} are computed recursively utilizing the expressions in (39)–(46) from *k*=0 to *k*=*L*−1, in the course of which the intermediate matrices **Λ**_{k} and **W**_{k} are obtained. The initial values are *S*_{0}=1, **Σ**_{0}=cov[ **x**_{0}], and **M**_{0}=E[ **x**_{0}]. A pseudo-code outlining the computational steps needed for computing the likelihood function is available in Additional file 1.

To draw samples from the posterior distribution, a Metropolis–Hastings MCMC scheme is employed: a new sample θ^{(new)} is generated based on the last accepted sample θ^{(old)} according to a proposal distribution *f*(θ^{(new)}|θ^{(old)}). The new sample θ^{(new)} will be either accepted or rejected based on an acceptance ratio *r* computed as follows:

\(r=\min \left \{1,\ \frac {f\left (\mathcal {Y}^{\boldsymbol {\theta }}_{L}|\boldsymbol {\theta }^{(\text {new})}\right)\pi \left (\boldsymbol {\theta }^{(\text {new})}\right)f\left (\boldsymbol {\theta }^{(\text {old})}|\boldsymbol {\theta }^{(\text {new})}\right)}{f\left (\mathcal {Y}^{\boldsymbol {\theta }}_{L}|\boldsymbol {\theta }^{(\text {old})}\right)\pi \left (\boldsymbol {\theta }^{(\text {old})}\right)f\left (\boldsymbol {\theta }^{(\text {new})}|\boldsymbol {\theta }^{(\text {old})}\right)}\right \}. \qquad (47)\)

Note that \(f\left (\mathcal {Y}^{\boldsymbol {\theta }}_{L}|\boldsymbol {\theta }^{(\text {new})}\right)\) and \(f\left (\mathcal {Y}^{\boldsymbol {\theta }}_{L}|\boldsymbol {\theta }^{(\text {old})}\right)\) are computed via the set of equations given in (39)–(46). The new sample θ^{(new)} will be accepted into the sequence of MCMC samples with probability *r*; otherwise, it will be discarded and the last sample θ^{(old)} will be repeated in the sequence. When enough MCMC samples have been generated, the posterior effective noise statistics \(\mathrm {E}\left [\boldsymbol {\theta }|\mathcal {Y}^{\boldsymbol {\theta }}_{L}\right ]\) are approximated as the average of the generated samples. When a symmetric proposal distribution, i.e., *f*(θ^{(old)}|θ^{(new)})=*f*(θ^{(new)}|θ^{(old)}), such as the Gaussian distribution used in our simulations, is employed, (47) simplifies because the proposal terms cancel. Further explanation and a pseudo-code for the recursive calculations in (42)–(46) are provided in Additional file 1.
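The MCMC step above can be sketched as a standard Metropolis–Hastings loop. In the sketch below, the likelihood is a stand-in i.i.d. Gaussian model (so the example is self-contained) rather than the recursive computation of (39)–(46), and the proposal scale is an arbitrary choice; only the accept/reject logic with a symmetric Gaussian proposal mirrors the procedure described in the text.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_likelihood(theta, ys):
    """Stand-in for log f(Y_L | theta): toy i.i.d. model y_k ~ N(0, theta)."""
    if theta <= 0:
        return -np.inf
    return -0.5 * np.sum(ys ** 2 / theta + np.log(2 * np.pi * theta))

def log_prior(theta):
    # uniform prior over [0.25, 5], as in the tracking example
    return 0.0 if 0.25 <= theta <= 5.0 else -np.inf

def mh_posterior_mean(ys, n_samples=10000, prop_std=0.5):
    """Metropolis-Hastings with a symmetric Gaussian proposal, so the
    acceptance ratio (47) reduces to likelihood-times-prior ratios."""
    theta = 1.0
    lp = log_likelihood(theta, ys) + log_prior(theta)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        theta_new = theta + prop_std * rng.standard_normal()
        lp_new = log_likelihood(theta_new, ys) + log_prior(theta_new)
        if np.log(rng.uniform()) < lp_new - lp:   # accept with probability r
            theta, lp = theta_new, lp_new
        samples[i] = theta                         # repeat old sample on reject
    # posterior effective noise statistic: average of the generated samples
    return samples.mean()

ys = rng.standard_normal(200) * np.sqrt(2.0)   # data generated with theta = 2
theta_hat = mh_posterior_mean(ys)
print(theta_hat)
```

With enough samples, the chain average approximates the posterior mean of θ, which is then plugged into the smoother as the effective noise statistic.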

To study the computational complexity of the proposed OBKS, note that its recursive structure is identical to that of the ordinary Kalman smoother except for the use of the posterior effective noise statistics, which are approximated using MCMC samples; hence, we only need to analyze the complexity of the MCMC step. In the MCMC step, in order to obtain a sequence of MCMC samples, the likelihood function in (36) must be computed for each generated MCMC sample by iterating the equations given in (42)–(46) from *k*=0 to *k*=*L*−1. Therefore, the dimensions of the state vector **x** and the observation vector **y**, the size of the window *L*, and the number of generated MCMC samples all affect the complexity. We should point out that some matrix calculations, such as inversions, multiplications, and determinants, might need to be performed only once for each generated MCMC sample. For example, when the process noise covariance matrix is known and the system is stationary, it is enough to compute \(\mathbf {\Phi }_{k}^{T}\left (\widetilde {\mathbf {Q}}_{k}^{\theta _{1}}\right)^{-1}\) once and then reuse it for the rest of the calculations.

## 3 Simulation results and discussion

To compare smoothing performance across the uncertainty class, we require the error covariance matrix of a Kalman smoother designed relative to a parameter θ^{′}=[θ^{′}_{1}, θ^{′}_{2}] but applied to the model with the actual parameter θ=[*θ*_{1}, *θ*_{2}], which is given by [35] in (48). The gain matrices in (48) are computed relative to θ^{′}, and the matrices \(\mathbf {D}_{k}^{\boldsymbol {\theta }^{\prime }}\) and \(\mathbf {L}_{k}^{\boldsymbol {\theta }^{\prime }}\) can be found recursively. The initial conditions for these two matrices are \(\mathbf {D}_{L-1}^{\boldsymbol {\theta }^{\prime }}=\mathbf {K}_{L}^{\boldsymbol {\theta }^{\prime }}\mathbf {H}_{L}\) and \(\mathbf {L}_{L-1}^{\boldsymbol {\theta }^{\prime }}=\mathbf {K}_{L-1}^{\boldsymbol {\theta }^{\prime }}\mathbf {H}_{L-1}\).

Note that we focus on the state at time *L*/2, which is in the middle of the observation window, since the steady-state performance of a smoother occurs for the middle points of an observation window. The intrinsically Bayesian robust Kalman smoother (IBR-KS) is similar to the OBKS except that the optimization is relative to the prior distribution *π*(θ). To design an IBR-KS, one can use the OBKS equations in Table 1 with expectations taken relative to the prior distribution rather than the posterior distribution; hence, for the IBR-KS the MCMC step is not needed. The IBR-KS approach provides optimal smoothing performance on average relative to the prior distribution.

### 3.1 Target tracking example

Consider a vehicle moving in a two-dimensional space with the state vector **x**_{k}=[ *p*_{x} *v*_{x} *p*_{y} *v*_{y}]^{T}, where *p*_{x}, *v*_{x}, *p*_{y}, and *v*_{y} are the horizontal position, horizontal velocity, vertical position, and vertical velocity, respectively. If the vehicle possesses a constant speed and the measurements are made with intervals *τ*, then a state-space model with the following matrices can characterize the dynamics of the vehicle at each time step [36–38]:

\(\mathbf {\Phi }_{k}=\left [\begin {array}{cccc} 1 & \tau & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & \tau \\ 0 & 0 & 0 & 1 \end {array}\right ],\quad \mathbf {H}_{k}=\left [\begin {array}{cccc} 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 \end {array}\right ],\quad \mathbf {Q}=q\left [\begin {array}{cccc} \tau ^{3}/3 & \tau ^{2}/2 & 0 & 0\\ \tau ^{2}/2 & \tau & 0 & 0\\ 0 & 0 & \tau ^{3}/3 & \tau ^{2}/2\\ 0 & 0 & \tau ^{2}/2 & \tau \end {array}\right ],\)

where *q* determines the process noise intensity. In the simulations, we assume that the measurement interval is *τ*=1 and the initial conditions are E[ **x**_{0}]=[100 10 30 − 10]^{T} and cov[ **x**_{0}]=diag[25 2 25 2].

Although in this example the process noise transition matrix **Γ**_{k} is an identity matrix, the results also cover the case where **Γ**_{k} is not an identity matrix: since the effect of **Γ**_{k} on the Kalman equations is only through the covariance matrix of the process noise, a state-space model with the process noise covariance matrix **Q** and the process noise transition matrix **Γ**_{k} is equivalent to a state-space model with the process noise covariance matrix \(\mathbf {Q}^{\text {eq}}_{k}= \mathbf {\Gamma }_{k}\mathbf {Q}\mathbf {\Gamma }^{T}_{k}\) and the process noise transition matrix \(\mathbf {\Gamma }^{\text {eq}}_{k}=\mathbf {I}\). With this in mind, we can regard the simulation results of the above target tracking example as those for an equivalent state-space model with the process noise transition matrix \(\mathbf {\Gamma }_{k}^{\prime }\) and the covariance matrix **Q**^{′}=*q*×diag[0.0657 0.0657 1.2676 1.2676], where **Q**^{′} is the diagonal matrix of the eigenvalues and \(\mathbf {\Gamma }_{k}^{\prime }\) is the matrix of the eigenvectors of **Q**, i.e., **Q**=**Γ**^{′}**Q**^{′}(**Γ**^{′})^{T} is the eigen-decomposition of **Q**. Therefore, the simulations for the target tracking example can also be regarded as covering the case in which the process noise transition matrix is not an identity matrix.
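The numerical claim here is easy to verify: the eigen-decomposition of the per-axis process noise block reproduces the eigenvalues quoted for **Q**^{′}. A quick NumPy check, assuming the standard constant-velocity noise block for *τ*=1 (with *q* factored out):

```python
import numpy as np

# Per-axis block of the constant-velocity process noise covariance for
# tau = 1, with the intensity q factored out; the full Q is block-diagonal
# with two copies of this block.
B = np.array([[1/3, 1/2],
              [1/2, 1.0]])
w, V = np.linalg.eigh(B)   # eigenvalues in ascending order
print(np.round(w, 4))      # -> [0.0657 1.2676], matching the entries of Q'

# Equivalence used in the text: reconstructing B from its eigenvectors and
# eigenvalues gives Gamma' Q' (Gamma')^T = Q.
Q_reconstructed = V @ np.diag(w) @ V.T
assert np.allclose(Q_reconstructed, B)
```

Each axis contributes the eigenvalue pair (0.0657, 1.2676), which is why **Q**^{′} lists each value twice.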

*q* to 2 and assume that the diagonal element *r* of the observation noise covariance matrix is unknown and represented by a uniform random variable *θ* over [0.25, 5]. In previous work, we used the inverse-Wishart distribution as a prior for the covariance matrices [20] and could have done so here; however, for computational reasons, we limit our example to priors governing *q* and *r*. Let the observation window be of length *L*=15. For the MCMC step, we generate 10,000 MCMC samples and use a Gaussian proposal distribution whose mean is the last accepted MCMC sample and whose variance equals 4.

First, we study the average performance (MSE) over the uncertainty class for various Kalman smoothing approaches. To do so, for a given sequence of observations \(\mathcal {Y}^{\boldsymbol {\theta }}_{L}\), we find the error of each Kalman smoothing approach for each *k* by first computing the error covariance matrix using (48), where *θ*^{′} is replaced by \(\mathrm {E}\left [\theta |\mathcal {Y}^{\boldsymbol {\theta }}_{L}\right ]\), E[*θ*], and *θ*_{mm} for the OBKS, the IBR-KS, and the minimax Kalman smoother, respectively; the MSE at time *k* is then the sum of the diagonal elements of the error covariance matrix. The MSE for the optimal Kalman smoother designed relative to the actual *θ* value is obtained according to **P**_{k|L} in Table 1. The reported average MSE is taken over 30 different assumed values of *θ* and 10 different observation sequences for each value (300 simulations in total). Figure 2a presents the average MSE across the observation window for each smoothing scheme. As can be seen, the OBKS outperforms the IBR and minimax approaches, and its performance is close to the average of the optimal MSEs obtained by the optimal smoothers. Figure 2b shows the average MSE for the middle state (*k*=8) in the observation window. In addition, this figure presents the average MSE of each model-specific Kalman smoother designed relative to value *θ*^{′}. Note the difference between the optimal smoother and the model-specific smoother: the optimal smoother is designed relative to *θ* and always applied to model *θ*, whereas the model-specific smoother is designed relative to *θ*^{′} and then applied to model *θ*. This figure confirms the better performance of the OBKS compared to the IBR, minimax, and model-specific Kalman smoothers.
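The MCMC step above can be sketched as a random-walk Metropolis-Hastings sampler. The sketch below substitutes a simple i.i.d. Gaussian likelihood for the paper's state-space likelihood, so it only illustrates the sampling mechanics: a uniform prior over [0.25, 5] and a Gaussian proposal with variance 4 centered at the last accepted sample. All data values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the likelihood p(Y | theta): Y are i.i.d. N(0, theta)
# samples; in the paper the likelihood comes from the state-space model.
y = rng.normal(0.0, np.sqrt(2.0), size=200)  # synthetic data, true theta = 2

def log_posterior(theta):
    if not (0.25 <= theta <= 5.0):           # uniform prior over [0.25, 5]
        return -np.inf
    return -0.5 * np.sum(y**2 / theta + np.log(2 * np.pi * theta))

# Random-walk Metropolis-Hastings: Gaussian proposal centered at the last
# accepted sample with variance 4, as in the simulation setup.
n_samples, theta = 10_000, 1.0
samples = []
lp = log_posterior(theta)
for _ in range(n_samples):
    prop = theta + rng.normal(0.0, 2.0)      # proposal std = sqrt(4)
    lp_prop = log_posterior(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop            # accept
    samples.append(theta)

posterior_mean = np.mean(samples[2000:])     # discard burn-in
print(round(posterior_mean, 2))
```

The posterior mean of the retained samples plays the role of \(\mathrm {E}\left [\theta |\mathcal {Y}^{\boldsymbol {\theta }}_{L}\right ]\) used by the OBKS.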

*θ*=0.5 and *θ*=4, respectively. Although *θ* is fixed, since the OBKS performance depends on the generated observations, we report the average MSE taken over 300 different generated observation sequences \(\mathcal {Y}^{\boldsymbol {\theta }}_{L}\). It can be seen that the OBKS performs close to the optimal Kalman smoothers and much better than the other robust Kalman smoothers. Since the IBR approach is optimal on average relative to the prior distribution, not for each possible model within the class, the IBR-KS is not guaranteed to perform well for every model; an example is *θ*=4, where minimax outperforms the IBR approach. However, even for models for which the IBR approach does not perform well, the OBKS still gives promising results.

In Fig. 3c, d, in addition to the MSE values for different smoothers, we also present the MSEs of various Kalman filters for each time step *k*. For different filters, we compute \(\mathbf {P}_{k|k}^{\theta ^{\prime },\theta }\), as derived in [16], and report \(\text {Tr}\left \{\mathbf {P}_{k|k}^{\theta ^{\prime },\theta }\right \}\), where *θ*^{′} is replaced by *θ*_{mm}, E[*θ*], and \(\mathrm {E}\left [\theta |\mathcal {Y}^{\boldsymbol {\theta }}_{L}\right ]\) for the minimax, IBR, and OBKF approaches, respectively. Since the error covariance matrix of the filter is used as the initial value for the error covariance matrix of the corresponding smoother, for *k*=15, the MSE of each smoother equals that of its corresponding filter. As expected, for other time indices *k* within the observation window, the MSE of the smoother is always lower than that of the filter. Note that due to the range of the *y*-axis in Fig. 3c, d, the MSEs of various smoothing approaches might not be distinguishable. The difference between the performances of different smoothers is visible in Fig. 3a, b.
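The filter-smoother relationship just described can be made concrete with a minimal fixed-interval (RTS-type) smoother on a toy scalar model with known noise statistics: the smoother's error covariance equals the filter's at *k*=*L* and never exceeds it for *k*<*L*. The model constants below are illustrative, not the paper's tracking model.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, h, q, r, L = 0.9, 1.0, 1.0, 1.0, 15  # illustrative scalar model

# Simulate state and observations.
x = np.zeros(L + 1)
y = np.zeros(L + 1)
for k in range(1, L + 1):
    x[k] = phi * x[k - 1] + rng.normal(0, np.sqrt(q))
    y[k] = h * x[k] + rng.normal(0, np.sqrt(r))

# Forward Kalman filter.
xf = np.zeros(L + 1); Pf = np.zeros(L + 1)  # filtered x_{k|k}, P_{k|k}
xp = np.zeros(L + 1); Pp = np.zeros(L + 1)  # predicted x_{k|k-1}, P_{k|k-1}
Pf[0] = 1.0
for k in range(1, L + 1):
    xp[k] = phi * xf[k - 1]
    Pp[k] = phi**2 * Pf[k - 1] + q
    K = Pp[k] * h / (h**2 * Pp[k] + r)       # Kalman gain
    xf[k] = xp[k] + K * (y[k] - h * xp[k])
    Pf[k] = (1 - K * h) * Pp[k]

# Backward (RTS) pass, initialized by the filter at k = L.
xs = xf.copy(); Ps = Pf.copy()               # smoothed x_{k|L}, P_{k|L}
for k in range(L - 1, 0, -1):
    C = Pf[k] * phi / Pp[k + 1]              # smoothing gain
    xs[k] = xf[k] + C * (xs[k + 1] - xp[k + 1])
    Ps[k] = Pf[k] + C**2 * (Ps[k + 1] - Pp[k + 1])

assert Ps[L] == Pf[L]                        # smoother starts from the filter
assert all(Ps[k] <= Pf[k] + 1e-12 for k in range(1, L))
```

The OBKS follows the same forward-backward structure, with the noise statistics replaced by their posterior effective counterparts.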

*L*. Figure 4 presents the average MSE of different Kalman smoothers at time step *L*/2 (middle state) as *L* changes from 6 to 50. The average MSE for each *L* is obtained in the same way as in Fig. 2. We observe that when the number of observations is small, the performances of the OBKS and IBR-KS are close because the expectation of *θ* relative to the posterior distribution is close to the expectation relative to the prior distribution; however, as the number of observations increases, the average MSE of the OBKS approaches that of the optimal smoothers because the expectation of the unknown parameter tends to the true parameter value. Moreover, both the OBKS and IBR-KS always outperform the minimax Kalman smoother in terms of average MSE.

*q* and the observation noise parameter *r* are unknown, denoted by uniform random variables *θ*_{1} and *θ*_{2} over the intervals [3, 5] and [0.25, 5], respectively. For the MCMC step, we use a multivariate Gaussian proposal distribution whose mean vector is the vector of the last accepted samples for *θ*_{1} and *θ*_{2} and whose covariance matrix is diag[1 1.5]. Analogous to the previous set of simulations, we compare the performance of the OBKS with the other smoothing approaches in terms of the average MSE over 200 different assumed true values for *θ*_{1} and *θ*_{2} and 10 different sets of observations for each pair of true values (2000 simulations in total) in Fig. 5. As shown in the figure, the OBKS outperforms the other robust smoothers and performs close to the optimal smoother designed relative to the underlying true model.

*k*=8, the middle point in the observation window. The white surface represents the average middle-point MSE for each model-specific Kalman smoother designed relative to the process noise parameter \(\theta _{1}^{\prime }\) and the observation noise parameter \(\theta _{2}^{\prime }\). We also show, as constant planes, the average middle-point MSEs for the IBR-KS, the minimax smoother, and the OBKS, as well as the average of the optimal middle-point MSEs obtained by the optimal smoothers. This figure shows that, compared to the other robust smoothers, the OBKS achieves the average middle-point MSE closest to that of the optimal smoothers.

*θ*_{1} and *θ*_{2}. Figure 7a, c correspond to *θ*_{1}=4.5 and *θ*_{2}=1, and Fig. 7b, d correspond to *θ*_{1}=3.2 and *θ*_{2}=4. The figures in the first row report the MSE of the different smoothers for each time instance within the observation window. For both state-space models, we see the promising performance of the OBKS. In addition to the MSEs of the different smoothers, the second row gives the MSEs of the different Kalman filters. The MSE of each smoother is initialized by the MSE of the corresponding filter and then decreases as we proceed in the backward direction. The difference between the performances of the different smoothers is visible in Fig. 7a, b, which focus on a shorter range for the smoothing performance.

*L* from 6 to 50 and report the average middle-point MSEs of the various smoothers for each *L*. When *L* is small, the performances of the OBKS and IBR-KS are close, but as *L* increases, the performance of the OBKS tends to that of the optimal smoother. This is because the posterior effective noise statistics converge to the underlying true values as the number of observations increases.

*S* of generated MCMC samples is 5000, 10,000, and 15,000. Computations were performed on a machine with 16 GB RAM and an Intel^{®} Core^{™} i7 2.5 GHz CPU. As can be seen, the run time tends to grow linearly with *L* and *S*. In our simulations, we set the number of samples in the MCMC step to 10,000 to obtain acceptable estimates at tolerable computational complexity.

*θ* is unknown and its true value is 0.5. We consider three different observation window sizes, *L*=10, 15, 20, and vary the number of MCMC samples from 100 to 10,000. For each *L* and number of MCMC samples, we report the middle-point MSE (MSE at *k*=*L*/2) over 300 different observation sequences generated from the underlying true state-space model. As can be seen, the OBKS performance improves as the number of MCMC samples increases (especially when the number of MCMC samples is small), because more MCMC samples yield more accurate posterior effective noise statistics. However, once enough MCMC samples have been collected, the performance of the OBKS converges, and further increasing the number of MCMC samples yields little additional improvement. In our simulations throughout the paper, we used 10,000 MCMC samples.

### 3.2 Gene regulatory network inference

Following [39], the expression dynamics of each gene *g*_{i}, 1≤*i*≤*n*, *n* being the total number of genes in the network, is characterized as

\(\dot {g}_{i} = \eta _{i}(g_{1},\dots,g_{n}) + v_{i},\)

where \(\dot {g}_{i}\), *v*_{i}, and *η*_{i}(·) are the derivative of the gene-expression value relative to the time variable, the external noise, and the regulatory function, respectively. The regulatory function *η*_{i} is a linear combination of some nonlinear terms [39]:

\(\eta _{i}(\cdot) = \sum _{j=1}^{N_{i}} \left (\alpha _{ij} + u_{ij}\right) \Omega _{ij}(\cdot),\)

where *N*_{i} is the number of nonlinear terms in *η*_{i}, and *Ω*_{ij}(·) is the *j*th nonlinear term in *η*_{i} with corresponding coefficient *α*_{ij} and parameter noise *u*_{ij}.
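A minimal sketch of evaluating such a regulatory function, assuming the additive-noise reading *η*_{i}=Σ_{j}(*α*_{ij}+*u*_{ij})*Ω*_{ij}. The particular nonlinear terms and coefficient values below are hypothetical, not those of the yeast network:

```python
import numpy as np

def eta_i(g, alphas, omegas, u):
    """eta_i(g) = sum_j (alpha_ij + u_ij) * Omega_ij(g)."""
    return sum((a + du) * om(g) for a, om, du in zip(alphas, omegas, u))

# Hypothetical nonlinear terms Omega_ij for one gene.
omegas = [lambda g: g[0] * g[1],          # interaction term
          lambda g: g[2] / (1.0 + g[2]),  # saturating (Hill-type) term
          lambda g: 1.0]                  # constant production term
alphas = np.array([0.4, -1.2, 0.8])       # coefficients to be inferred
u = np.zeros(3)                           # parameter noise (zero here)

g = np.array([0.5, 1.0, 2.0])             # current gene-expression values
print(eta_i(g, alphas, omegas, u))
```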

The inference problem for this model involves estimating the values of coefficients *α*_{ij} from time series data, generated from the underlying true GRN model. To estimate unknown coefficients from data, following [16, 39], we build a state-space model with vectors formed by stacking coefficients *α*_{ij}, parameter noise *u*_{ij}, and external noise *v*_{i} in place of the state vector **x**_{k}, process noise vector **u**_{k}, and observation noise vector **v**_{k}, respectively. In the state-space model, we have **Φ**_{k}=**I** and **Γ**_{k}=**I**. The observation vector **y**_{k} and the observation transition matrix **H**_{k} are formed using gene-expression values *g*_{i} and the nonlinear terms *Ω*_{ij}, respectively. More details on constructing the state-space model for this inference problem can be found in [16, 39]. In this paper, we work out the inference problem for the yeast cell cycle network [39], which has *n*=12 genes and 54 coefficients to be inferred. For this network, the state vector is of size 54 and the observation vector is of size 12. To evaluate the performance of the OBKS for network inference, we use the synthetic time series data generated according to the regulatory equations given in [39].
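The construction described above can be sketched for a hypothetical two-gene network. Here the state is the stacked coefficient vector (constant in time, so **Φ**_{k}=**Γ**_{k}=**I**), and each row of **H**_{k} holds the nonlinear terms *Ω*_{ij} evaluated at the current expression values; the observation vector **y**_{k} would stack the measured derivatives \(\dot {g}_{i}\). The terms and dimensions are illustrative only, not the 54-coefficient yeast model:

```python
import numpy as np

# Hypothetical nonlinear terms: two per gene, for a two-gene network.
omega = [
    [lambda g: g[0], lambda g: g[0] * g[1]],  # terms regulating gene 1
    [lambda g: g[1], lambda g: 1.0],          # terms regulating gene 2
]
n_coeffs = 4  # stacked coefficient vector alpha_ij is the state x_k

Phi = np.eye(n_coeffs)    # coefficients are static across time
Gamma = np.eye(n_coeffs)

def build_H(g):
    """Observation matrix H_k: row i holds Omega_ij(g) in gene i's columns."""
    H = np.zeros((2, n_coeffs))
    H[0, 0:2] = [om(g) for om in omega[0]]
    H[1, 2:4] = [om(g) for om in omega[1]]
    return H

g_k = np.array([0.5, 2.0])  # expression values at time k
H_k = build_H(g_k)
print(H_k)
```

Running the (robust) Kalman smoother on this state-space model then yields smoothed estimates of the coefficients *α*_{ij}.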

Let the process and observation noise covariance matrices be **Q**=10^{−7}×**I** and **R**=*θ*×**I**, in which *θ* is unknown and belongs to [0.25, 6]. Let the initial conditions be E[**x**_{0}]=**0**_{54×1} and cov[**x**_{0}]=0.001×**I**. We set the observation window length *L* to 15. For the MCMC calculations, we use the same setting as in the first set of simulations for the target tracking example. First, we analyze the average performance over the uncertainty class. To report the average MSE for each smoothing scheme, we take the average of the MSEs obtained over 30 different assumed true values of *θ* and 20 different observation sequences for each assumed true value (600 simulations). In Fig. 11a, we compare the average MSEs of the different Kalman smoothers for each time index *k* within the observation window. The average MSE of the OBKS is lower than those of the IBR-KS and minimax smoothers. Figure 11b presents the average middle-point MSE (for *k*=8) for each model-specific Kalman smoother designed relative to value *θ*^{′} of the noise parameter, along with those for the IBR-KS, minimax, and OBKS approaches. We also show the average of the optimal MSEs obtained by the optimal smoothers. This figure verifies the promising performance of the OBKS approach.

*θ*=1.5 and *θ*=5. For each assumed true value, the results are averaged over 200 different observation sequences generated based on the underlying true value of *θ*. Figure 12a, b shows only the performance of the different smoothers, whereas Fig. 12c, d shows the performances of both the smoothing and filtering schemes. We can see that the OBKS performs much better than the other robust approaches.

*L*, we report the average MSE at *k*=*L*/2 for each Kalman smoothing strategy. For each *L*, the average MSEs are obtained in the same way as in Fig. 11. As shown in the figure, the performance of the OBKS gets closer to that of the optimal smoother for larger *L*, as expected, since the posterior effective noise statistics, relative to which the OBKS is designed, eventually converge to the underlying true values.

## 4 Conclusions

We proposed an optimal Bayesian Kalman smoothing framework that provides optimal smoothing performance relative to the posterior distribution of the unknown noise parameters. Owing to the effective smoothing gain defined relative to the posterior distribution, the structure of the proposed OBKS is analogous to that of the classical Kalman smoother. In the absence of the prior update step via the factor graph, one can employ the IBR Kalman smoother to obtain optimality relative to the prior distribution. The optimal Bayesian smoothing framework can play a major role in applications where data are rare or expensive, such as genomics.

There are several avenues along which our future work can proceed. One direction is prior construction for the proposed OBKS framework, which involves optimizing the prior distribution so that it reflects the available prior knowledge as faithfully as possible; for example, this has been done for genomic classification by utilizing gene signaling pathway knowledge to optimize prior distribution parameters [40]. Another avenue is to extend the OBKS framework to state-space models in which the noise is not white or the model is not linear, which takes the OBKS into the realm of extended Kalman filters.

## Declarations

### Funding

This work was funded in part by Award CCF-1553281 from the National Science Foundation.

### Availability of data and materials

Data and MATLAB source code are available from the corresponding author upon request.

### Authors’ contributions

RD conceived the method, developed the algorithm, performed the simulations, analyzed the results, and wrote the first draft. XQ analyzed the results and edited the manuscript. ERD conceived the method, oversaw the project, analyzed the results, and edited the manuscript. All authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## References

1. R.E. Kalman, A new approach to linear filtering and prediction problems. J. Basic Eng. **82**(1), 35–45 (1960).
2. C.K. Chui, G. Chen, *Kalman filtering* (Springer, Cham, 2017).
3. C.C. Drovandi, J.M. McGree, A.N. Pettitt, A sequential Monte Carlo algorithm to incorporate model uncertainty in Bayesian sequential design. J. Comput. Graph. Stat. **23**(1), 3–24 (2014).
4. N. Chopin, P.E. Jacob, O. Papaspiliopoulos, SMC2: an efficient algorithm for sequential analysis of state space models. J. R. Stat. Soc. Ser. B Stat. Methodol. **75**(3), 397–426 (2013).
5. D. Crisan, J. Miguez, Nested particle filters for online parameter estimation in discrete-time state-space Markov models. Bernoulli **24**(4A), 3039–3086 (2018).
6. L. Martino, J. Read, V. Elvira, F. Louzada, Cooperative parallel particle filters for online model selection and applications to urban mobility. Digit. Signal Proc. **60**, 172–185 (2017).
7. I. Urteaga, M.F. Bugallo, P.M. Djurić, in *Statistical Signal Processing Workshop (SSP), 2016 IEEE*. Sequential Monte Carlo methods under model uncertainty (IEEE, 2016), pp. 1–5.
8. K.A. Myers, B.D. Tapley, Adaptive sequential estimation with unknown noise statistics. IEEE Trans. Autom. Control **21**(4), 520–523 (1976).
9. R.K. Mehra, On the identification of variances and adaptive Kalman filtering. IEEE Trans. Autom. Control **15**(2), 175–184 (1970).
10. H.V. Poor, On robust Wiener filtering. IEEE Trans. Autom. Control **25**(3), 531–536 (1980).
11. S. Verdu, H. Poor, On minimax robustness: a general approach and applications. IEEE Trans. Inf. Theory **30**(2), 328–340 (1984).
12. V. Poor, D.P. Looze, Minimax state estimation for linear stochastic systems with noise uncertainty. IEEE Trans. Autom. Control **26**(4), 902–906 (1981).
13. A.M. Grigoryan, E.R. Dougherty, Design and analysis of robust binary filters in the context of a prior distribution for the states of nature. J. Math. Imag. Vision **11**(3), 239–254 (1999).
14. A.M. Grigoryan, E.R. Dougherty, Bayesian robust optimal linear filters. Signal Process. **81**(12), 2503–2521 (2001).
15. L. Dalton, E. Dougherty, Intrinsically optimal Bayesian robust filtering. IEEE Trans. Signal Process. **62**(3), 657–670 (2014).
16. R. Dehghannasiri, M.S. Esfahani, E.R. Dougherty, Intrinsically Bayesian robust Kalman filter: an innovation process approach. IEEE Trans. Signal Process. **65**(10), 2531–2546 (2017).
17. X. Qian, E. Dougherty, Bayesian regression with network prior: optimal Bayesian filtering perspective. IEEE Trans. Signal Process. **64**(23), 6243 (2016).
18. R. Dehghannasiri, M.S. Esfahani, X. Qian, E.R. Dougherty, Optimal Bayesian Kalman filtering with prior update. IEEE Trans. Signal Process. **66**(8), 1982–1996 (2018).
19. L.A. Dalton, E.R. Dougherty, Optimal classifiers with minimum expected error within a Bayesian framework–part I: discrete and Gaussian models. Pattern Recogn. **46**(5), 1301–1314 (2013).
20. R. Dehghannasiri, X. Qian, E.R. Dougherty, Intrinsically Bayesian robust Karhunen-Loève compression. Signal Process. **144**, 311–322 (2018).
21. T. Kailath, An innovations approach to least-squares estimation–part I: linear filtering in additive white noise. IEEE Trans. Autom. Control **13**(6), 646–655 (1968).
22. A. Bryson, M. Frazier, in *Proceedings of the Optimum System Synthesis Conference*. Smoothing for linear and nonlinear dynamic systems (DTIC Document, Ohio, 1963), pp. 353–364.
23. H. Rauch, Solutions to the linear smoothing problem. IEEE Trans. Autom. Control **8**(4), 371–372 (1963).
24. H.E. Rauch, C. Striebel, F. Tung, Maximum likelihood estimates of linear dynamic systems. AIAA J. **3**(8), 1445–1450 (1965).
25. J.S. Meditch, Orthogonal projection and discrete optimal linear smoothing. SIAM J. Control **5**(1), 74–89 (1967).
26. H. Zhao, P. Cui, W. Wang, D. Yang, *H*_{∞} fixed-interval smoothing estimation for time-delay systems. IEEE Trans. Signal Process. **61**(2), 316–326 (2013).
27. B. Ait-El-Fquih, F. Desbouvries, On Bayesian fixed-interval smoothing algorithms. IEEE Trans. Autom. Control **53**(10), 2437–2442 (2008).
28. D. Fraser, J. Potter, The optimum linear smoother as a combination of two optimum linear filters. IEEE Trans. Autom. Control **14**(4), 387–390 (1969).
29. M. Briers, A. Doucet, S. Maskell, Smoothing algorithms for state-space models. Ann. Inst. Stat. Math. **62**(1), 61–89 (2010).
30. J.S. Meditch, On optimal linear smoothing theory. J. Inf. Control **10**(6), 598–615 (1967).
31. T. Kailath, P. Frost, An innovations approach to least-squares estimation–part II: linear smoothing in additive white noise. IEEE Trans. Autom. Control **13**(6), 655–660 (1968).
32. S. Nakamori, A. Hermoso-Carazo, J. Linares-Pérez, Design of a fixed-interval smoother using covariance information based on the innovations approach in linear discrete-time stochastic systems. Appl. Math. Model. **30**(5), 406–417 (2006).
33. S. Nakamori, A. Hermoso-Carazo, J. Linares-Pérez, M. Sánchez-Rodríguez, Fixed-interval smoothing problem from uncertain observations with correlated signal and noise. Appl. Math. Comput. **154**(1), 239–255 (2004).
34. F.R. Kschischang, B.J. Frey, H.-A. Loeliger, Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory **47**(2), 498–519 (2001).
35. R.E. Griffin, A.P. Sage, Sensitivity analysis of discrete filtering and smoothing algorithms. AIAA J. **7**(10), 1890–1897 (1969).
36. S. Challa, M.R. Morelande, D. Musicki, R.J. Evans, *Fundamentals of object tracking* (Cambridge University Press, Cambridge, 2011).
37. J.L. Williams, Marginal multi-Bernoulli filters: RFS derivation of MHT, JIPDA, and association-based MeMBer. IEEE Trans. Aerosp. Electron. Syst. **51**(3), 1664–1687 (2015).
38. H. Liu, S. Zhou, H. Liu, H. Wang, in *Radar Conference (Radar), 2014 International*. Radar detection during tracking with constant track false alarm rate (IEEE, 2014), pp. 1–5.
39. L. Qian, H. Wang, E.R. Dougherty, Inference of noisy nonlinear differential equation models for gene regulatory networks using genetic programming and Kalman filtering. IEEE Trans. Signal Process. **56**(7), 3327–3339 (2008).
40. M.S. Esfahani, E.R. Dougherty, Incorporation of biological pathway knowledge in the construction of priors for optimal Bayesian classification. IEEE/ACM Trans. Comput. Biol. Bioinf. **11**(1), 202–218 (2014).