Open Access

Particle rejuvenation of Rao-Blackwellized sequential Monte Carlo smoothers for conditionally linear and Gaussian models

EURASIP Journal on Advances in Signal Processing 2017, 2017:54

Received: 18 November 2016

Accepted: 3 July 2017

Published: 25 July 2017


This paper focuses on sequential Monte Carlo approximations of smoothing distributions in conditionally linear and Gaussian state space models. To reduce the Monte Carlo variance of smoothers, it is typical in these models to use Rao-Blackwellization: a particle approximation is used to sample sequences of hidden regimes while the Gaussian states are explicitly integrated conditional on the sequence of regimes and the observations, using variants of the Kalman filter/smoother. The first successful attempt to use Rao-Blackwellization for smoothing extends the Bryson-Frazier smoother for Gaussian linear state space models using the generalized two-filter formula together with Kalman filters/smoothers. More recently, a forward-backward decomposition of smoothing distributions mimicking the Rauch-Tung-Striebel smoother for the regimes, combined with backward Kalman updates, has been introduced. This paper investigates the benefit of introducing additional rejuvenation steps in all these algorithms to sample, at each time instant, new regimes conditional on the forward and backward particles. This defines particle-based approximations of the smoothing distributions whose support is not restricted to the set of particles sampled in the forward or backward filter. These procedures are applied to commodity markets, which are described using a two-factor model based on the spot price and a convenience yield, for crude oil data.

1 Introduction

State space models are bivariate stochastic processes {(Y i ,Z i )} i≥1 where the state sequence (Z i ) i≥1 is a Markov chain which is only partially observed through the sequence (Y i ) i≥1. Conditionally on the state sequence (Z i ) i≥1, the observations are independent and, for all i≥1, the conditional distribution of Y i given (Z i ) i≥1 depends on Z i only. These models are used in a large variety of disciplines such as financial econometrics, biology, and signal processing; see [10] and the references therein. In general state space models, Bayesian filtering and smoothing problems, i.e., the computation of the posterior distributions of a sequence of states (Z i ,…,Z p ) for 1≤i≤p≤ℓ given observations (Y 1,…,Y ℓ ), are challenging tasks. Filtering refers to the estimation of the distributions of the hidden state Z i given the observations (Y 1,…,Y i ) up to time i, while fixed-interval smoothing stands for the estimation of the distribution of the sequence of states (Z i ,…,Z p ) given observations (Y 1,…,Y ℓ ) with 1≤i≤p≤ℓ. When the state and observation models are linear and Gaussian, the filtering problem can be solved explicitly using the Kalman filter [18]. Exact solutions of the fixed-interval smoothing problem can be obtained using either the Rauch-Tung-Striebel smoother [22] or the Bryson-Frazier two-filter smoother [4]. This paper focuses on conditionally linear and Gaussian models (CLGM) given for i≥2 by:
$$ Z_{i} = d_{a_{i}} + T_{a_{i}}Z_{i-1}+H_{a_{i}}\varepsilon_{i}\,, $$
  • (ε i ) i≥2 is a sequence of independent and identically distributed (i.i.d.) m-dimensional Gaussian vectors with zero mean and identity covariance.

  • (a i ) i≥1 is a homogeneous Markov chain taking values in a finite space {1,…,J}, called regimes, with initial distribution π and transition matrix Q.

  • (H j ) 1≤j≤J are m×m positive-definite matrices, (d j ) 1≤j≤J m-dimensional vectors, and (T j ) 1≤j≤J m×m positive-definite matrices.

  • Z 1 is an m-dimensional Gaussian random variable with mean μ 1 and variance Σ 1, independent of (ε i ) i≥2.

Let n be the number of observations. At each time step 1≤in, the observation Y i is given by:
$$ Y_{i} = c_{a_{i}} + B_{a_{i}}Z_{i} + G_{a_{i}}\eta_{i}\,, $$
  • (η i ) i≥1 is an i.i.d. sequence of p-dimensional Gaussian vectors with zero mean and identity covariance, independent of (ε i ) i≥2 and Z 1.

  • (G j ) 1≤j≤J are p×p positive-definite matrices, (c j ) 1≤j≤J p-dimensional vectors, and (B j ) 1≤j≤J p×m matrices.

CLGM play an important role in many applications; see [23] and the references therein for an up-to-date account. A crucial feature of these models is that, conditional on the regime sequence (a 1,…,a n ), both the state equation and the observation equation are linear and Gaussian, which implies that conditional on the sequence of regimes and on the observations, the filtering and the smoothing distributions of the continuous states (Z 1,…,Z n ) can be computed explicitly.
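As an illustration of the generative structure just described, the CLGM can be simulated directly: the regime follows a Markov chain and, given the regime, the state and observation equations are linear and Gaussian. The following sketch uses made-up scalar two-regime parameters (J=2, m=p=1); all numerical values are illustrative only and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-regime (J = 2), scalar (m = p = 1) CLGM for illustration.
J, m, p, n = 2, 1, 1, 100
d = [np.array([0.0]), np.array([0.5])]
T = [np.array([[0.9]]), np.array([[0.5]])]
H = [np.array([[0.3]]), np.array([[1.0]])]
c = [np.array([0.0]), np.array([0.0])]
B = [np.array([[1.0]]), np.array([[1.0]])]
G = [np.array([[0.2]]), np.array([[0.2]])]
Q = np.array([[0.95, 0.05], [0.10, 0.90]])  # regime transition matrix
pi = np.array([0.5, 0.5])                   # initial regime distribution
mu1, Sigma1 = np.zeros(m), np.eye(m)

def simulate(n):
    """Draw (a_{1:n}, Z_{1:n}, Y_{1:n}) from the CLGM."""
    a = np.empty(n, dtype=int)
    Z = np.empty((n, m))
    Y = np.empty((n, p))
    a[0] = rng.choice(J, p=pi)
    Z[0] = rng.multivariate_normal(mu1, Sigma1)
    Y[0] = c[a[0]] + B[a[0]] @ Z[0] + G[a[0]] @ rng.standard_normal(p)
    for i in range(1, n):
        a[i] = rng.choice(J, p=Q[a[i - 1]])   # regime: finite Markov chain
        # State equation: Z_i = d + T Z_{i-1} + H eps_i, conditionally Gaussian.
        Z[i] = d[a[i]] + T[a[i]] @ Z[i - 1] + H[a[i]] @ rng.standard_normal(m)
        # Observation equation: Y_i = c + B Z_i + G eta_i.
        Y[i] = c[a[i]] + B[a[i]] @ Z[i] + G[a[i]] @ rng.standard_normal(p)
    return a, Z, Y

a, Z, Y = simulate(n)
```

Conditional on the simulated regimes `a`, the pair (Z, Y) is exactly a linear Gaussian state space model, which is what Rao-Blackwellization exploits.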

To exploit this specific structure, it was suggested in the pioneering works of [6, 7] to solve the filtering problem by combining sequential Monte Carlo (SMC) methods, used to sample the regimes, with the Kalman filter, used to compute the distribution of the state sequence (Z i ) 1≤i≤n conditional on the regimes and the observations. This is a specific instance of Rao-Blackwellized Monte Carlo filters, often referred to as the mixture Kalman filter. Improvements of these early filtering techniques have been introduced in [8, 26].

The use of Rao-Blackwellization to solve the smoothing problem has proved to be more challenging and has received satisfactory solutions only recently. The first forward-backward smoother proposed in the literature [12] was not fully Rao-Blackwellized, as it required sampling the hidden linear states in the backward pass. An alternative approach, based on the so-called structural approximation of the model suggested in an early paper by [19], was proposed in [2] to avoid sampling a continuous state in the backward pass. This approximation is rather ad hoc, and the resulting smoother is not consistent as the number of particles goes to infinity; the inaccuracy introduced by the approximation may be difficult to control.

The first fully Rao-Blackwellized SMC smoother which should lead to consistent approximations when the number of particles grows to infinity was proposed by [3] and extends the Bryson-Frazier smoother for Gaussian linear state space models using the generalized two-filter formula with Rao-Blackwellization steps for the forward and the backward filters. This two-filter approach combines a forward filter with a backward information filter which are approximated numerically using SMC for the regime sequence and Kalman filtering techniques for the hidden linear states.

More recently, [20, 21, 24] introduced a Rao-Blackwellized smoother based on the forward-backward decomposition of the FFBS algorithm, with Rao-Blackwellization steps in both the forward and backward time directions. The update of the smoothing distribution of the regimes given the observations shares striking similarities with the Rauch-Tung-Striebel smoothing procedure, which is at the heart of the FFBS procedure. The Rao-Blackwellization requires updating, backward in time, the smoothing distribution of the states given the regimes and the observations, which is achieved using a Kalman-type backward update.

In this paper, we propose to improve the performance of the algorithms introduced in [3] and in [20, 21, 24] by using additional Rao-Blackwellization steps which allow new particles to be sampled in the backward pass. This approach may be seen as an extension of the ideas of [11] to Rao-Blackwellized smoothers. In [3], for all 1≤i≤n, the sampled forward and backward sequences are merged to approximate the posterior distribution of (a i ,z i ). This provides an approximation whose support is restricted to the particles produced at time i by the backward particle filter. As noted in ([13], Section 2.6), these two-filter smoothers are prone to degeneracy issues when the algorithm associates forward particles at time i−1 with backward particles at time i. We propose to approximate the marginal smoothing distribution of (a i ,z i ) by merging the sampled forward and backward trajectories at times i−1 and i+1 and integrating out all possible paths between times i−1 and i and between times i and i+1, instead of sampling random variables. Similarly, in the backward pass of [20, 21, 24], a regime \(\tilde {a}_{i}\) is sampled at each time 1≤i≤n−1 using the particles produced by the forward filter at time i. In this case, particle rejuvenation may be introduced by using the forward weighted samples at time i−1 and extending these trajectories at time i with a Kalman filter for all possible values of the regime. Then, \(\tilde {a}_{i}\) is sampled in {1,…,J} using an appropriately adapted weight.

The paper is organized as follows. The algorithms introduced in [3] and in [20, 21, 24], as well as the proposed rejuvenation steps associated with each method, are presented in Section 2. The performance of all these methods is illustrated in Section 3 with simulated data. In Section 4, an application to commodity markets is presented; the performance of our procedure is illustrated with crude oil data. A detailed derivation of the algorithms is provided in the “Appendix: Technical lemmas”.

2 Rao-Blackwellized smoothing algorithms

This section details the sequential Monte Carlo algorithms which can be used to approximate the conditional distribution of the states (a 1,…,a n ) or the marginal distributions of (a i ,z i ) given the observations (Y 1,…,Y n ). For any m×m matrix A, let |A| denote the determinant of A. If A is a positive-definite matrix, for all \(z\in \mathbb {R}^{m}\) define
$$\left\|z\right\|_{A}^{2} {:=} z'A^{-1}z\,, $$
where for any vector or matrix z, z′ denotes the transpose of z. Let m(a i ,z i−1;z i ) be the probability density of the conditional distribution of Z i given (a i ,Z i−1) and g(a i ,z i ;y i ) be the probability density of the conditional distribution of Y i given (a i ,Z i ):
$$ {}\begin{aligned} m_{}(a_{i},z_{i-1};z_{i}) & {:=} \left|2\pi\overline{H}_{a_{i}}\right|^{-1/2}\exp\!\left\{-\frac{1}{2}\left\|z_{i} -d_{a_{i}} \,-\,T_{a_{i}}z_{i-1}\right\|_{\overline{H}_{a_{i}}}^{2}\right\}\,, \end{aligned} $$
$$ {}\begin{aligned} g_{}(a_{i},z_{i};y_{i}) & {:=} \left|2\pi\overline{G}_{a_{i}}\right|^{-1/2}\exp\!\left\{-\frac{1}{2}\left\|y_{i} - c_{a_{i}} \,-\, B_{a_{i}}z_{i}\right\|_{\overline{G}_{a_{i}}}^{2}\right\}\,, \end{aligned} $$
$$\overline{G}_{j} {:=} G_{j}G'_{j}\;,\; \overline{H}_{j} {:=} H_{j}H'_{j}\,. $$

All the algorithms considered in this paper are based on forward-backward or two-filter decompositions of the smoothing distributions and share the same forward filter presented in Section 2.1.

2.1 Forward filter

The SMC approximation \(p^{N}_{}(a_{1:i},z_{i}|y_{1:i})\) of p(a 1:i ,z i |y 1:i ) may be obtained using a standard Rao-Blackwellized algorithm. The procedure produces a sequence of trajectories \(\left (a^{k}_{1:i}\right)_{1\le k \le N}\) associated with normalized importance weights \(\left (\omega ^{k}_{i}\right)_{1\le k \le N} \left (\sum _{k=1}^{N} \omega ^{k}_{i} = 1\right)\) used to define the following approximation of p(a 1:i ,z i |y 1:i ):
$$ p^{N}_{}(a_{1:i},z_{i}|y_{1:i}) = \sum_{k=1}^{N}\omega^{k}_{i}\,p_{}\left(z_{i}|a^{k}_{1:i},y_{1:i}\right)\,\delta_{a^{k}_{1:i}}(a_{1:i})\,, $$

where δ is the Dirac delta function. In this equation, the conditional distribution of the hidden state z i given the observations y 1:i and a trajectory \(a^{k}_{1:i}\) is a Gaussian distribution whose mean \(\mu ^{k}_{i}\) and variance \(P^{k}_{i}\) may be obtained by using the Kalman filter update.

2.1.1 Initialization

At time i=1, write, for all 1≤j≤J,
$$\mu^{j}_{1|0} = c_{j}+B_{j}\mu_{1}\;\;\text{and}\;\;P_{1|0}^{j} =B_{j}\Sigma_{1}B'_{j} + \overline{G}_{j}\,. $$
\(\left (a^{k}_{1}\right)_{1\le k \le N}\) are sampled independently in {1,…,J} with probabilities proportional to
$$\begin{array}{*{20}l} p(a_{1}&=j|y_{1}) \propto \pi_{j} \left|P_{1|0}^{j}\right|^{-1/2}\exp\left\{-\left(y_{1}-\mu^{j}_{1|0}\right)'\right.\\ &\quad \times \left.\left(P_{1|0}^{j}\right)^{-1}\left(y_{1}-\mu^{j}_{1|0}\right)/2\right\}\,. \end{array} $$
Then, \(\mu _{1}^{k}\) and \(P_{1}^{k}\) are computed using a Kalman filter:
$$\begin{array}{*{20}l} K^{k}_{1} &=\Sigma_{1}B'_{a_{1}^{k}}\left(B_{a_{1}^{k}}\Sigma_{1}B'_{a_{1}^{k}} + \overline{G}_{a_{1}^{k}}\right)^{-1}\,,\\ \mu^{k}_{1} &= \mu_{1} + K^{k}_{1}\left(Y_{1} - c_{a_{1}^{k}} - B_{a_{1}^{k}}\mu_{1}\right)\,,\\ P^{k}_{1} &=\left(I_{\mathsf{m}}-K^{k}_{1}B_{a_{1}^{k}}\right)\Sigma_{1}\,, \end{array} $$

where for any positive integer p, I p is the p×p identity matrix. Each particle \(a_{1}^{k}\) is associated with the importance weight \(\omega ^{k}_{1} = 1/N\).
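A minimal sketch of this initialization step, assuming toy scalar two-regime parameters (all numerical values are hypothetical): the initial regimes are sampled with probabilities proportional to π j times the Gaussian likelihood of y 1, and each particle's conditional Gaussian state is obtained by one Kalman update.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy scalar two-regime parameters (illustrative, not from the paper).
J, m, N = 2, 1, 50
c = [np.zeros(1), np.zeros(1)]
B = [np.eye(1), np.eye(1)]
Gbar = [0.04 * np.eye(1), 0.09 * np.eye(1)]   # G_j G_j'
pi = np.array([0.5, 0.5])
mu1, Sigma1 = np.zeros(m), np.eye(m)
y1 = np.array([0.3])

# Predictive moments of Y_1 under each regime j: mu_{1|0}^j and P_{1|0}^j.
mu_pred = [c[j] + B[j] @ mu1 for j in range(J)]
P_pred = [B[j] @ Sigma1 @ B[j].T + Gbar[j] for j in range(J)]

# Unnormalized posterior probabilities p(a_1 = j | y_1) ∝ pi_j N(y_1; ...).
w = np.array([
    pi[j] * np.linalg.det(P_pred[j]) ** -0.5
    * np.exp(-0.5 * (y1 - mu_pred[j]) @ np.linalg.solve(P_pred[j], y1 - mu_pred[j]))
    for j in range(J)
])
w /= w.sum()
a1 = rng.choice(J, size=N, p=w)               # sample initial regimes

# Kalman update for each particle's conditional Gaussian state.
mu_f, P_f = [], []
for k in range(N):
    j = a1[k]
    K = Sigma1 @ B[j].T @ np.linalg.inv(B[j] @ Sigma1 @ B[j].T + Gbar[j])
    mu_f.append(mu1 + K @ (y1 - c[j] - B[j] @ mu1))
    P_f.append((np.eye(m) - K @ B[j]) @ Sigma1)

omega = np.full(N, 1.0 / N)                   # uniform initial weights
```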

2.1.2 Iterations

Several procedures may be used to extend the trajectories \(\left (a^{k}_{1:i-1}\right)_{1\le k \le N}\) at time i. For all sampled trajectories \(\left (a_{1:i-1}^{k}\right)_{1\le k \le N}\) and all 1≤jJ, [6] used the incremental weights:
$$\gamma_{i}^{j,k} = p\left(y_{i} | a_{i} = j, a_{1:i-1}^{k}, y_{1:i-1}\right) Q\left(a_{i-1}^{k},j\right)\,. $$
The conditional distribution of Y i given \(a^{k}_{1:i-1}\), a i , and Y 1:i−1 is a Gaussian distribution with mean \(c_{a_{i}}+B_{a_{i}}\mu ^{k}_{i|i-1}(a_{i})\) and variance \(B_{a_{i}}P^{k}_{i|i-1}(a_{i})B'_{a_{i}} + \overline {G}_{a_{i}}\) where
$$\begin{array}{*{20}l} \mu^{k}_{i|i-1}(a_{i}) &= d_{a_{i}} + T_{a_{i}}\mu^{k}_{i-1}\,,\\ P^{k}_{i|i-1}(a_{i}) &= T_{a_{i}}P^{k}_{i-1}T'_{a_{i}} + \overline{H}_{a_{i}}\,. \end{array} $$
$$\begin{array}{*{20}l} \gamma_{i}^{j,k} \propto &Q\left(a_{i-1}^{k},j\right)\left|B_{j}P^{j,k}_{i|i-1}B'_{j} + \overline{G}_{j}\right|^{-1/2}\exp\left\{-\frac{1}{2}\left\|y_{i}{\vphantom{\mu^{j,k}_{i|i-1}}}\right.\right.\\ &\left.\left. \quad-c_{j}-B_{j}\mu^{j,k}_{i|i-1}\right\|_{B_{j}P^{j,k}_{i|i-1}B'_{j} + \overline{G}_{j}}^{2}\right\} \,, \end{array} $$
$$\begin{array}{*{20}l} \mu^{j,k}_{i|i-1} &= \mu^{k}_{i|i-1}(j) = d_{j} + T_{j}\mu^{k}_{i-1}\,, \end{array} $$
$$\begin{array}{*{20}l} P^{j,k}_{i|i-1} &= P^{k}_{i|i-1}(j) = T_{j}P^{k}_{i-1}T'_{j} + \overline{H}_{j}\,. \end{array} $$
In [6], for all 1≤k≤N, an ancestral path is chosen with probabilities proportional to \(\left (\omega ^{k}_{i-1}\right)_{1\le k \le N}\). Then, the new regime \(a_{i}^{k}\) is sampled in {1,…,J} with probabilities proportional to \((\gamma _{i}^{j,k})_{1\le j\le J}\). A drawback of this method is that only ancestral paths that have been selected using the importance weights \(\left (\omega ^{k}_{i-1}\right)_{1\le k \le N}\) are extended at time i. Following [5], this may be improved by considering all the offspring of all ancestral trajectories \(\left (a_{1:i-1}^{k}\right)_{1\le k \le N}\). Each ancestral path has J offspring at time i; it is thus necessary to choose a given number of trajectories at time i (for instance N) among the NJ possible paths. To obtain the weight associated with each offspring, write the following approximation of p(a 1:i |y 1:i ) based on the weighted samples at time i−1:
$$\begin{aligned} p^{N}(a_{1:i}|y_{1:i})\propto& \sum_{k=1}^{N}\omega^{k}_{i-1}Q\left(a^{k}_{i-1},a_{i}\right)\\ & p\left(y_{i}|a^{k}_{1:i-1},a_{i},y_{1:i-1}\right)\delta_{a^{k}_{1:i-1}}(a_{1:i-1})\,,\\ \propto& \sum_{k=1}^{N}\sum_{j=1}^{J}\omega^{k}_{i-1}\gamma_{i}^{j,k}\delta_{\left(a^{k}_{1:i-1},j\right)}(a_{1:i})\,. \end{aligned} $$

Therefore, each ancestral trajectory of the form \(\left (a^{k}_{1:i-1},j\right)\), 1≤k≤N, 1≤j≤J, is associated with the normalized weight \(\tilde {\omega }^{j,k}_{i} \propto \omega ^{k}_{i-1}\gamma _{i}^{j,k}\). Several random selection schemes have been proposed to discard some of the possible offspring so as to maintain an average number of N particles at each time step. Following [5], we may choose between the Kullback-Leibler optimal selection (KL-OS) and the chi-squared optimal selection (CS-OS) to associate a new weight with each of the NJ trajectories. If the new weight is 0, the corresponding particle can be removed. KL-OS:
λ is chosen as the solution of
$$\sum_{k=1}^{N}\sum_{j=1}^{J}\text{min}\left(\tilde{\omega}^{j,k}_{i}/\lambda,1\right) = N\,. $$
For all 1≤jJ and 1≤kN, if \(\tilde {\omega }^{j,k}_{i}\ge \lambda \) then the new weight \(\tilde {\Omega }^{j,k}_{i}\) is \(\tilde {\Omega }^{j,k}_{i}=\tilde {\omega }^{j,k}_{i}\) and if \(\tilde {\omega }^{j,k}_{i}< \lambda \):
$$\tilde{\Omega}^{j,k}_{i}= \left\{ \begin{array}{rl} &\lambda \;\text{with probability~} \tilde{\omega}^{j,k}_{i}/\lambda\,,\\ &0 \;\text{with probability~} 1-\tilde{\omega}^{j,k}_{i}/\lambda\,. \end{array} \right. $$
CS-OS:
λ is chosen as the solution of
$$\sum_{k=1}^{N}\sum_{j=1}^{J}\text{min}\left(\sqrt{\tilde{\omega}^{j,k}_{i}/\lambda},1\right) = N\,. $$
For all 1≤jJ and 1≤kN, if \(\tilde {\omega }^{j,k}_{i}\ge \lambda \) then the new weight \(\tilde {\Omega }^{j,k}_{i}\) is \(\tilde {\Omega }^{j,k}_{i}=\tilde {\omega }^{j,k}_{i}\) and if \(\tilde {\omega }^{j,k}_{i}< \lambda \):
$$\tilde{\Omega}^{j,k}_{i}= \left\{ \begin{array}{rl} &\sqrt{\tilde{\omega}^{j,k}_{i}\lambda} \;\text{with probability~} \sqrt{\tilde{\omega}^{j,k}_{i}/\lambda}\,,\\ &0 \;\text{with probability~} 1-\sqrt{\tilde{\omega}^{j,k}_{i}/\lambda}\,. \end{array} \right. $$

Then, in both cases, all particles such that \(\tilde {\Omega }^{j,k}_{i} = 0\) are discarded and, for all the other trajectories defined as an ancestral path \(\left (a^{k}_{1:i-1}\right)\) extended by \(a^{k}_{i} = j\), the corresponding new weight \(\omega^{k}_{i}\) in (5) is given by the normalized weight \(\tilde {\Omega }^{j,k}_{i}\). In the numerical sections of this paper, the Kullback-Leibler optimal selection (KL-OS) scheme is used.
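The KL-OS step above can be sketched as follows, assuming the NJ candidate weights are given as a flat array (the input values below are random placeholders). Since λ↦Σ min(ω/λ,1) is monotone decreasing, λ is found by bisection; small weights are then stochastically rounded to λ or 0, exactly as in the display above.

```python
import numpy as np

rng = np.random.default_rng(2)

def klos_select(w, N):
    """KL-optimal selection: keep on average N of the N*J weighted offspring.

    w : unnormalized weights of the candidate trajectories.
    Returns new weights; zeros mark discarded particles.
    """
    w = w / w.sum()
    # lambda solves sum min(w_k / lambda, 1) = N (monotone in lambda -> bisect).
    lo, hi = 1e-12, 1.0
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        if np.minimum(w / lam, 1.0).sum() > N:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    keep_large = w >= lam                      # large weights are kept as-is
    new_w = np.where(keep_large, w, 0.0)
    small = ~keep_large
    # Small weights survive with probability w/lambda and are set to lambda,
    # so the selection is unbiased: E[new_w] = w.
    u = rng.random(w.size)
    new_w[small] = np.where(u[small] < w[small] / lam, lam, 0.0)
    return new_w

w = rng.random(200)          # e.g. N*J = 50*4 candidate weights (placeholder)
new_w = klos_select(w, N=50)
```

The CS-OS variant only changes the equation solved for λ and the rounding rule, replacing min(ω/λ,1) by min(√(ω/λ),1) and λ by √(ωλ).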

2.2 FFBS-based algorithms

2.2.1 FFBS algorithms of [20, 21, 24]

Lindsten et al. [20, 21, 24] proposed a Rao-Blackwellized procedure to sample the regime backward in time following the same steps as in the forward filtering backward smoothing algorithm [7, 15]. The algorithm relies on the decomposition given, for all 1≤in−1, by:
$$p(a_{1:n}|y_{1:n}) = p(a_{1:i}|a_{i+1:n},y_{1:n})p(a_{i+1:n}|y_{1:n})\,. $$
This decomposition is similar to the Rauch-Tung-Striebel decomposition of the filtering distribution. The first factor on the right-hand side of the previous equation is nevertheless more difficult to handle because it depends on all the observations. As noted by [24], this term can be computed recursively by considering the following decomposition:
$$ {}p(a_{1:i}|a_{i+1:n},y_{1:n}) \propto p(y_{i+1:n},a_{i+1:n}|a_{1:i},y_{1:i})p(a_{1:i}|y_{1:i})\,. $$
The second factor in the last equation may be approximated using the ancestral trajectories \(\left (a^{k}_{1:i}\right)_{1\le k \le N}\) and the associated importance weights \(\left (\omega ^{k}_{i}\right)_{1\le k \le N}\) produced by the forward filter. Therefore, p(a 1:i |a i+1:n ,y 1:n ) may be approximated by:
$$\begin{array}{*{20}l} p^{N}&(a_{1:i}|a_{i+1:n},y_{1:n}) = \sum_{k=1}^{N} \tilde{\omega}^{k}_{i|n}\delta_{a^{k}_{1:i}}(a_{1:i}) \\ &\text{with} \quad \tilde{\omega}^{k}_{i|n} \propto \omega_{i}^{k} p\left(y_{i+1:n},a_{i+1:n}|a^{k}_{1:i},y_{1:i}\right) \,. \end{array} $$
Then, a trajectory \(\tilde {a}_{1:n}\) approximately distributed according to p(a 1:n |y 1:n ) may be sampled following these steps:
  • Set \(\tilde {a}_{n}= a_{n}^{k}\) with probabilities proportional to \(\left (\omega _{n}^{k}\right)_{1\le k \le N}\).

  • For i=n−1 down to 1, set \(\tilde {a}_{i} = a_{i}^{k}\) with probabilities proportional to \(\left (\tilde {\omega }_{i|n}^{k}\right)_{1\le k \le N}\).
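The two steps above amount to a standard backward-sampling skeleton over the forward particle clouds. The sketch below is illustrative only: `backward_weight` is a placeholder standing in for the Rao-Blackwellized predictive term \(\omega _{i}^{k}\, p(y_{i+1:n},a_{i+1:n}|a^{k}_{1:i},y_{1:i})\) (here it uses only a toy transition matrix Q and the already-sampled next regime, so the sketch stays runnable), and the forward particles and weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical forward-filter output: regimes a_f[i, k] and normalized
# weights omega[i, k] for n time steps and N particles (placeholders).
n, N = 20, 30
a_f = rng.integers(0, 2, size=(n, N))
omega = rng.random((n, N))
omega /= omega.sum(axis=1, keepdims=True)

def backward_weight(i, k, a_next):
    # Placeholder for omega_i^k * p(y_{i+1:n}, a_{i+1:n} | a_{1:i}^k, y_{1:i});
    # the real quantity is computed with the backward information recursion.
    Q = np.array([[0.9, 0.1], [0.2, 0.8]])
    return omega[i, k] * Q[a_f[i, k], a_next]

def sample_backward():
    """Draw one regime trajectory approximately from p(a_{1:n} | y_{1:n})."""
    traj = np.empty(n, dtype=int)
    k = rng.choice(N, p=omega[-1])            # step 1: sample a_n
    traj[-1] = a_f[-1, k]
    for i in range(n - 2, -1, -1):            # step 2: backward in time
        w = np.array([backward_weight(i, k, traj[i + 1]) for k in range(N)])
        k = rng.choice(N, p=w / w.sum())
        traj[i] = a_f[i, k]
    return traj

traj = sample_backward()
```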

This algorithm requires computing the quantity \(p\left (y_{i+1:n},a_{i+1:n}|a^{k}_{1:i},y_{1:i}\right)\). This predictive quantity is available analytically using Kalman filtering techniques. However, this has to be done for each trajectory \(\left (a^{k}_{1:i}\right)_{1\le k \le N}\), which leads to an algorithm with a prohibitive computational complexity. Lindsten et al. [21] proposed a computationally less intensive procedure by conditioning on z i and then marginalizing with respect to this variable:
$$ \begin{aligned} {}p\left(y_{i+1:n},a_{i+1:n}|a^{k}_{1:i},y_{1:i}\right) =& \int p\left(y_{i+1:n},a_{i+1:n}|z_{i},a^{k}_{i}\right)\\ &\times p\left(z_{i}|a^{k}_{1:i},y_{1:i}\right)\mathrm{d} z_{i}\,. \end{aligned} $$
This is similar to the two-filter decomposition of the smoothing distribution, see Section 2.3. By [21],
$$\begin{array}{*{20}l} {}p(y_{i+1:n},a_{i+1:n}|z_{i},a_{i}) \propto Q(a_{i},a_{i+1}) \exp\left\{-\left(z_{i}'\Omega_{i}(a_{i+1:n})z_{i}\right.\right.\\ -\left.\left.2\lambda_{i}'(a_{i+1:n})z_{i}\right)/2\right\}\,, \end{array} $$
where the proportionality is with respect to (a i ,z i ) and
$${}p(y_{i:n},a_{i+1:n}|z_{i},a_{i}) \!\propto\! \exp\left\{-\left(z_{i}'\widehat{\Omega}_{i}(a_{i:n})z_{i}-\! 2\widehat{\lambda}'_{i}(a_{i:n})z_{i}\right)/2\right\}\!, $$
where the proportionality is with respect to z i . These quantities may be computed recursively backward in time with:
$$\begin{array}{*{20}l} \widehat{\Omega}_{n}(a_{n}) &= B'_{a_{n}}\overline{G}^{-1}_{a_{n}}B_{a_{n}}\,,\\ \widehat{\lambda}_{n}(a_{n}) &=B'_{a_{n}}\overline{G}^{-1}_{a_{n}}(y_{n}-c_{a_{n}})\,. \end{array} $$
Then, for 1≤in−1, define \(m_{i+1} = \widehat {\lambda }_{i+1} - \widehat {\Omega }_{i+1}d_{a_{i+1}}\) and \(M_{i+1} = H_{a_{i+1}}'\widehat {\Omega }_{i+1}H_{a_{i+1}} + I_{\mathsf {m}}\) and write
$$\begin{array}{*{20}l} &\Omega_{i}(a_{i+1:n})\\&\quad= T'_{a_{i+1}}\left(I_{\mathsf{m}}-\widehat{\Omega}_{i+1}(a_{i+1:n})H_{a_{i+1}}M^{-1}_{i+1}H'_{a_{i+1}}\right)\\ &\qquad\times\widehat{\Omega}_{i+1}(a_{i+1:n})T_{a_{i+1}}\,,\\ &\lambda_{i}(a_{i+1:n})\\&\quad=T'_{a_{i+1}}\left(I_{\mathsf{m}}-\widehat{\Omega}_{i+1}(a_{i+1:n})H_{a_{i+1}}M^{-1}_{i+1}H'_{a_{i+1}}\right)m_{i+1}\,. \end{array} $$
As p(y i:n ,a i+1:n |z i ,a i )=p(y i |z i ,a i )p(y i+1:n ,a i+1:n |z i ,a i ),
$$\begin{array}{*{20}l} \widehat{\Omega}_{i}(a_{i:n}) &= \Omega_{i}(a_{i+1:n})+ B'_{a_{i}}\overline{G}^{-1}_{a_{i}}B_{a_{i}}\,,\\ \widehat{\lambda}_{i}(a_{i:n}) & = \lambda_{i}(a_{i+1:n}) + B'_{a_{i}}\overline{G}^{-1}_{a_{i}}(y_{i}-c_{a_{i}})\,. \end{array} $$
Then, by (9),
$$ {}\begin{aligned} p\left(y_{i+1:n},a_{i+1:n}|a^{k}_{1:i},y_{1:i}\right)\propto &Q\left(a_{i}^{k},a_{i+1}\right)\left|\Lambda^{k}_{i}(a_{i+1:n})\right|^{-1/2}\\ &\times\exp\left\{-\eta^{k}_{i}(a_{i+1:n})/2\right\}\,, \end{aligned} $$
where the proportionality is with respect to \(a^{k}_{1:i}\) and
$$\begin{array}{*{20}l} \Lambda^{k}_{i}(a_{i+1:n})&= \left(\Gamma_{i}^{k}\right)'\Omega_{i}(a_{i+1:n})\Gamma_{i}^{k} + I_{\mathsf{m}}\,,\\ \eta^{k}_{i}(a_{i+1:n}) &= \|\mu_{i}^{k}\|^{2}_{\Omega^{-1}_{i}(a_{i+1:n})} - 2\lambda'_{i}(a_{i+1:n})\mu_{i}^{k}\\ &\quad-\|\left(\Gamma_{i}^{k}\right)'(\lambda_{i}(a_{i+1:n})\\ &\quad-\Omega_{i}(a_{i+1:n})\mu_{i}^{k})\|^{2}_{\Lambda_{i}(a_{i+1:n})}\,, \end{array} $$
where \(P_{i}^{k} = \Gamma _{i}^{k}(\Gamma _{i}^{k})'\). Therefore,
$${}\tilde{\omega}_{i|n} \propto \omega_{i}^{k}Q\!\left(\!a_{i}^{k},a_{i+1}\!\right)\!\left|\Lambda^{k}_{i}(a_{i+1:n})\right|^{-1/2}\exp\!\left\{\!-\eta^{k}_{i}(a_{i+1:n})/2\!\right\}. $$
If \(\left (\tilde {a}^{k}_{1:n}\right)_{1\le k \le \tilde {N}}\) are independent copies of \(\tilde {a}_{1:n}\), the SMC approximation of [21] of the joint smoothing distribution of the regime is:
$$p^{\mathsf{Lbscg}}_{\tilde{N}}(a_{1:n}|Y_{1:n}) = \frac{1}{\tilde{N}}\sum_{k=1}^{\tilde N} \delta_{\tilde{a}^{k}_{1:n}}(a_{1:n})\,. $$
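The backward information recursion for \(\widehat {\Omega }_{i}\) and \(\widehat {\lambda }_{i}\) above can be sketched in the scalar case (m=p=1). For simplicity the sketch fixes a single regime, so the parameters are constant along the trajectory; all numerical values are illustrative, not taken from the paper.

```python
import numpy as np

# Scalar (m = p = 1) illustration of the backward information recursion
# along a fixed regime sequence; all parameters below are made up.
n = 10
y = np.linspace(-1.0, 1.0, n)
B, c, Gbar = np.array([[1.0]]), np.array([0.0]), np.array([[0.04]])
d, T, H = np.array([0.1]), np.array([[0.9]]), np.array([[0.3]])
Im = np.eye(1)

# Terminal condition at i = n.
Omega_hat = B.T @ np.linalg.inv(Gbar) @ B
lam_hat = B.T @ np.linalg.inv(Gbar) @ (y[-1] - c)

for i in range(n - 2, -1, -1):
    # m_{i+1} and M_{i+1} from the statistics at time i+1.
    m_vec = lam_hat - Omega_hat @ d
    M = H.T @ Omega_hat @ H + Im
    # Prediction step: Omega_i and lambda_i.
    W = T.T @ (Im - Omega_hat @ H @ np.linalg.inv(M) @ H.T)
    Omega = W @ Omega_hat @ T
    lam = W @ m_vec
    # Update step: add the observation at time i.
    Omega_hat = Omega + B.T @ np.linalg.inv(Gbar) @ B
    lam_hat = lam + B.T @ np.linalg.inv(Gbar) @ (y[i] - c)
```

In the scalar case one can check that the information matrix stays positive along the recursion, so the quadratic form in z i remains well defined.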

2.2.2 Particle rejuvenation of FFBS algorithms

The crucial step of the FFBS algorithm is the decomposition (8), which extends a backward trajectory \(\tilde {a}_{i+1:n}\) by choosing a particle in the set \(\left (a_{i}^{k}\right)_{1\le k \le N}\) produced by the forward filter (discarding the states \(a^{k}_{1:i-1}\)). An improved version of this FFBS algorithm, which is not constrained to sample states in the support \(\left (a_{i}^{k}\right)_{1\le k \le N}\), may be defined for all 2≤i≤n−1 by writing:
$$\begin{array}{*{20}l} {}p(a_{1:i}|a_{i+1:n},y_{1:n}) &\propto p(y_{i+1:n},a_{i+1:n}|a_{1:i},y_{1:i})p(a_{1:i}|y_{1:i})\,,\\ &\propto p(y_{i+1:n},a_{i+1:n}|a_{1:i},y_{1:i})\\ &\quad\times\int p(a_{1:i-1},z_{i-1}|y_{1:i-1})Q(a_{i-1},a_{i})\\ &\quad\times m(a_{i},z_{i-1};z_{i})g(a_{i},z_{i};y_{i})\mathrm{d} z_{i-1:i}\,. \end{array} $$
Replacing p(a 1:i−1,z i−1|y 1:i−1) in the integral by the particle approximation obtained during the forward pass and using Kalman filtering techniques for each trajectory \(\left (a^{k}_{1:i-1}\right)_{1\le k\le N}\) and each a i ∈{1,…,J} yields:
$$\begin{array}{*{20}l} \int& p^{N}(a_{1:i-1},z_{i-1}|y_{1:i-1})Q(a_{i-1},a_{i})m(a_{i},z_{i-1};z_{i})\\ &\times g(a_{i},z_{i};y_{i})\mathrm{d} z_{i-1:i} \propto \sum_{k=1}^{N} \omega_{i|i-1}^{k}(a_{i})\delta_{a^{k}_{1:i-1}}(a_{1:i-1})\,, \end{array} $$
$$\begin{array}{*{20}l} \omega_{i|i-1}^{k}(a_{i}) =&\ \omega_{i-1}^{k} Q\left(a_{i-1}^{k},a_{i}\right)|\Sigma^{k}_{i|i-1}(a_{i})|^{-1/2}\\ &\times\text{exp}\left\{-\frac{1}{2}\left\|y_{i} - y^{k}_{i|i-1}(a_{i})\right\|^{2}_{\Sigma^{k}_{i|i-1}(a_{i})}\right\}\,, \end{array} $$
$$\begin{array}{*{20}l} {}&y^{k}_{i|i-1}(a_{i}) = c_{a_{i}} + B_{a_{i}}\left(d_{a_{i}}+T_{a_{i}}\mu^{k}_{i-1}\right)\\ &\text{and}\; \Sigma^{k}_{i|i-1}(a_{i}) = B_{a_{i}}\left(T_{a_{i}}P^{k}_{i-1}T'_{a_{i}}+\overline{H}_{a_{i}}\right)B'_{a_{i}} + \overline{G}_{a_{i}}\,. \end{array} $$
On the other hand, for all 1≤k≤N, \(p\left (y_{i+1:n},a_{i+1:n}|a^{k}_{1:i-1},a_{i},y_{1:i}\right)\) is computed as in (10) for all possible values a i ∈{1,…,J}, and not only for the regimes \(\left (a_{i}^{k}\right)_{1\le k \le N}\) of the filtering pass. This means that a Kalman filter must be used for each trajectory \(a^{k}_{1:i-1}\) extended by each a i ∈{1,…,J}. Denote by \(\mu _{i|i-1}^{k}(a_{i})\) and \(P_{i|i-1}^{k}(a_{i})\) the mean and covariance matrix of the law of z i given \(\left (a^{k}_{1:i-1},a_{i}\right)\), obtained as in (6) and (7). Then,
$$ {}\begin{aligned} &p\left(y_{i+1:n},a_{i+1:n}|a^{k}_{1:i-1},a_{i},y_{1:i}\right)\\ &\qquad\propto Q(a_{i},a_{i+1})\!\left|\Lambda^{k}_{i|i-1}(a_{i:n})\right|^{-1/2}\exp\!\left\{\!-\eta^{k}_{i|i-1}(a_{i:n})/2\right\}, \end{aligned} $$
where the proportionality is with respect to \(\left (a^{k}_{1:i-1},a_{i}\right)\) and
$$\begin{array}{*{20}l} \Lambda^{k}_{i|i-1}(a_{i:n})&= \left(\Gamma_{i|i-1}^{k}(a_{i})\right)'\Omega_{i}(a_{i+1:n})\Gamma_{i|i-1}^{k}(a_{i}) + I_{\mathsf{m}}\,,\\ \eta^{k}_{i|i-1}(a_{i:n}) &= \left\|\mu_{i|i-1}^{k}(a_{i})\right\|^{2}_{\Omega^{-1}_{i}(a_{i+1:n})}\\ &\quad - 2\lambda'_{i}(a_{i+1:n})\mu_{i|i-1}^{k}(a_{i})\\ &\quad-\|\left(\Gamma_{i|i-1}^{k}(a_{i})\right)'(\lambda_{i}(a_{i+1:n})\\ &\quad-\Omega_{i}(a_{i+1:n})\mu_{i|i-1}^{k}(a_{i}))\|^{2}_{\Lambda_{i}(a_{i+1:n})}\,, \end{array} $$
where \(\Gamma _{i|i-1}^{k}(a_{i})\) is defined by \(P_{i|i-1}^{k}(a_{i}) = \Gamma _{i|i-1}^{k}(a_{i}) \left (\Gamma _{i|i-1}^{k}(a_{i})\right)'\). The distribution p(a 1:i |a i+1:n ,y 1:n ) is then approximated by:
$$\begin{array}{@{}rcl@{}} &&{}p^{N}(a_{1:i}|a_{i+1:n},y_{1:n})\\ &&{}\propto\sum_{k=1}^{N} \omega_{i|i-1}^{k}(a_{i})Q(a_{i},a_{i+1})\left|\Lambda^{k}_{i|i-1}(a_{i:n})\right|^{-1/2}\\ &&{}\quad\times\exp\left\{-\eta^{k}_{i|i-1}(a_{i:n})/2\right\}\delta_{a^{k}_{1:i-1}}(a_{1:i-1})\,. \end{array} $$

By integrating over all possible paths a 1:i−1, \(\tilde {a}_{i}\) is sampled in {1,…,J}. This particle rejuvenation step allows the exploration of states which are not in the support of the particles produced by the forward filter and improves the accuracy and reduces the variance of the original FFBS algorithm; see Section 3 for numerical illustrations.
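The rejuvenation weights \(\omega _{i|i-1}^{k}(a_{i})\) combine every forward particle at time i−1 with every possible regime at time i through a one-step Kalman prediction. A minimal sketch, with entirely hypothetical scalar parameters and placeholder forward-filter output:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy scalar setup (all values illustrative) for the rejuvenation weights
# omega_{i|i-1}^k(a_i) over all (particle, regime) pairs.
J, m, N = 2, 1, 25
Q = np.array([[0.9, 0.1], [0.2, 0.8]])
d = [np.array([0.0]), np.array([0.4])]
T = [np.array([[0.9]]), np.array([[0.6]])]
Hbar = [0.09 * np.eye(1), 0.25 * np.eye(1)]   # H_j H_j'
c = [np.zeros(1), np.zeros(1)]
B = [np.eye(1), np.eye(1)]
Gbar = [0.04 * np.eye(1), 0.04 * np.eye(1)]   # G_j G_j'

# Forward-filter output at time i-1 (placeholders for the sketch).
a_prev = rng.integers(0, J, size=N)
omega_prev = np.full(N, 1.0 / N)
mu_prev = [rng.standard_normal(m) for _ in range(N)]
P_prev = [np.eye(m) for _ in range(N)]
y_i = np.array([0.2])

w = np.empty((N, J))
for k in range(N):
    for j in range(J):
        # One-step predictive moments of y_i given (a_{1:i-1}^k, a_i = j).
        y_pred = c[j] + B[j] @ (d[j] + T[j] @ mu_prev[k])
        S = B[j] @ (T[j] @ P_prev[k] @ T[j].T + Hbar[j]) @ B[j].T + Gbar[j]
        r = y_i - y_pred
        w[k, j] = (omega_prev[k] * Q[a_prev[k], j]
                   * np.linalg.det(S) ** -0.5
                   * np.exp(-0.5 * r @ np.linalg.solve(S, r)))
w /= w.sum()   # joint weights over all (k, j) pairs
```

In the full algorithm these weights are further multiplied by the backward predictive term before \(\tilde {a}_{i}\) is drawn by summing over k for each candidate regime.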

Another modification of the FFBS algorithm, based on a Markov chain Monte Carlo (MCMC) sampling step, was introduced in ([21], Section 5.2). Instead of sampling from (12), ([21], Section 5.2) proposed to draw a forward path a 1:i−1 in \((a^{k}_{1:i-1})_{1\le k \le N}\) and a state a i in {1,…,J} according to:
$$\begin{array}{*{20}l} &\widetilde{q}(a_{1:i}|a_{i+1:n},y_{1:n}) \\ &\quad= \sum_{k=1}^{N} \widetilde{\vartheta}^{k}_{i-1}\widetilde{q}\left(a_{i}|a^{k}_{1:i-1},a_{i+1:n},y_{1:n}\right)\delta_{a^{k}_{1:i-1}}(a_{1:i-1})\,, \end{array} $$

where \(\left (\widetilde {\vartheta }^{k}_{i-1}\right)_{1\le k \le N}\) are adjustment multipliers and \(\widetilde {q}\left (a_{i}|a^{k}_{1:i-1},a_{i+1:n},y_{1:n}\right)\) is a proposal kernel chosen by the user. This means that an ancestral path \(a^{\star }_{1:i-1}\) is sampled in \(\left (a^{k}_{1:i-1}\right)_{1\le k \le N}\) with weights \(\left (\widetilde {\vartheta }^{k}_{i-1}\right)_{1\le k \le N}\) and \(a^{\star }_{i}\) is sampled from \(\widetilde {q}(\cdot |a^{\star }_{1:i-1},a_{i+1:n},y_{1:n})\). Then, the proposed sequence \(a^{\star }_{1:i}\) is accepted or rejected using the usual Metropolis-Hastings acceptance ratio. MCMC rejuvenation has interesting practical consequences, as the computation of the acceptance ratio only requires computing the posterior probability (11) for the proposed sequence \(a^{\star }_{1:i}\), while our technique requires computing (11) for all combinations of sequences \(\left (a^{k}_{1:i-1}\right)_{1\le k \le N}\) and states a i ∈{1,…,J}. Sampling from (12) is therefore computationally more intensive, especially when N is large, but our method relies on a direct approximation of p(a 1:i |a i+1:n ,y 1:n ) based on \(\left (a^{k}_{1:i-1}\right)_{1\le k \le N}\) and a i+1:n rather than on approximate MCMC draws.

2.3 Rao-Blackwellized two-filter smoother

2.3.1 Rao-Blackwellized two-filter smoother of [3]

In contrast to the previous methods, two-filter-based smoothers are designed to compute approximations of marginal smoothing distributions (usually the posterior distribution of one or two consecutive regimes given all the observations). Briers et al. [3] introduced the following decomposition of the smoothing distributions for all 2≤i≤n:
$$ p(a_{i},z_{i}|y_{1:n}) \propto p_{}(a_{i},z_{i}|y_{1:i-1})p(y_{i:n}|a_{i},z_{i})\,. $$
The first term on the right hand side may be approximated using the forward filter by noting that
$$\begin{array}{*{20}l} &p_{}(a_{i},z_{i}|y_{1:i-1})\\ &\quad=\sum_{a_{i-1}}\int_{z_{i-1}}p_{}(a_{i-1},z_{i-1}|y_{1:i-1})m_{}(a_{i},z_{i-1};z_{i})\\ &\qquad\times Q(a_{i-1},a_{i}) \mathrm{d} z_{i-1}\,. \end{array} $$
In the forward pass described in Section 2.1, a set of possible sequences of regimes \(a_{1:i-1}^{k}\), associated with importance weights \(\omega _{i-1}^{k}\), 1≤k≤N, is sampled to approximate p(a i−1,z i−1|y 1:i−1). This provides a normalized approximation p N (a i ,z i |y 1:i−1) of p(a i ,z i |y 1:i−1). Define
$$\begin{array}{*{20}l} \Omega^{k}_{i-1}(a_{i}) &= T_{a_{i}}P^{k}_{i-1}T'_{a_{i}} + \overline{H}_{a_{i}}\,,\\ \mu^{k}_{i-1}(a_{i}) &= d_{a_{i}} + T_{a_{i}}\mu^{k}_{i-1}\,,\\ r_{i-1}^{k}(a_{i}) &= \left(\Omega^{k}_{i-1}(a_{i})\right)^{-1}\mu^{k}_{i-1}(a_{i})\,,\\ \omega_{\mathsf{f},i}^{k}(a_{i}) &= \omega^{k}_{i-1}Q\left(a^{k}_{i-1},a_{i}\right) \left|2\pi \Omega^{k}_{i-1}(a_{i})\right|^{-1/2}\\ &\quad\times\exp\left\{-\frac{1}{2}\left\|\mu^{k}_{i-1}(a_{i})\right\|_{\Omega^{k}_{i-1}(a_{i})}^{2}\right\}\,. \end{array} $$
$$\begin{array}{*{20}l} &p^{N}(a_{i},z_{i}|y_{1:i-1}) \\ &\quad= \sum_{k=1}^{N} \omega_{\mathsf{f},i}^{k}(a_{i})\exp\left\{-\frac{1}{2}\left\|z_{i}\right\|_{\Omega^{k}_{i-1}(a_{i})}^{2}+z'_{i}r_{i-1}^{k}(a_{i})\right\}\,. \end{array} $$
As the function (a i ,z i )↦p(y i:n |a i ,z i ) is not a probability density function, approximating the second term of (13) using SMC samples is not straightforward. The backward filter uses artificial densities to introduce a surrogate target density which may be approximated recursively using SMC methods. Then, the forward and backward weighted samples are combined using (13) to approximate p(a i ,z i |y 1:n ). Following [3], for any probability densities \((\gamma ^{}_{i})_{1\le i \le n}\), define the following joint probability densities:
$$\begin{array}{*{20}l} \tilde{p}_{n}(a_{n},z_{n},y_{n})& {:=} \gamma^{}_{n}(a_{n},z_{n})g_{}(a_{n},z_{n};y_{n}),\\ \tilde{p}_{n}(y_{n})&{:=} \sum_{a_{n}=1}^{J}\int \gamma^{}_{n}(a_{n},z_{n})g_{}(a_{n},z_{n};y_{n})\mathrm{d} z_{n}\,, \end{array} $$
and, for all 1≤i≤n−1,
$$\begin{array}{*{20}l} &\tilde{p}_{i}(a_{i:n},z_{i:n},y_{i:n}) \\ &\quad{:=} \gamma^{}_{i}(a_{i},z_{i})p_{}(y_{i:n}|a_{i:n},z_{i:n})p_{}(a_{i+1:n},z_{i+1:n}|a_{i},z_{i})\,,\\ &\tilde{p}_{i}(y_{i:n})\\ &\quad{:=} \sum_{a_{i:n}=1}^{J}\int \gamma^{}_{i}(a_{i},z_{i})p_{}(y_{i:n}|a_{i:n},z_{i:n})\\ &\qquad \times p_{}(a_{i+1:n},z_{i+1:n}|a_{i},z_{i})\mathrm{d} z_{i:n}\,. \end{array} $$

Note that this choice differs slightly from [3], where it is advocated to set \(\gamma ^{}_{i}\) as the product of two independent densities \(\gamma _{i}^{a}(a_{i})\) and \(\gamma _{i}^{z}(z_{i})\). As the accuracy of the algorithm relies heavily on a proper tuning of this artificial density, a more general choice of \(\gamma ^{}_{i}\) is considered in this paper. By Lemma 1, these probability densities may be used to approximate the quantities p(y i:n |a i ,z i ), 1≤i≤n, in (13).

Lemma 1

For all 1≤i≤n−1,
$$\begin{array}{*{20}l}{} \tilde{p}_{i}(a_{i},z_{i}|y_{i:n}) &= p_{}(y_{i:n}|a_{i},z_{i})\gamma^{}_{i}(a_{i},z_{i})/\tilde{p}_{i}(y_{i:n})\,, \end{array} $$
$$\begin{array}{*{20}l} {}\tilde{p}_{i}(a_{i},z_{i}|y_{i:n}) &\!= \!\gamma^{}_{i}(a_{i},z_{i})\sum_{a_{i+1:n}=1}^{J}\frac{\tilde{p}_{i}(a_{i:n}|y_{i:n})p_{}(y_{i:n}|a_{i:n},z_{i})}{\int \gamma^{}_{i}(a_{i},z')p_{}(y_{i:n}|a_{i:n},z')\mathrm{d} z'}\,. \end{array} $$


The proof is postponed to “Appendix: Technical lemmas”. □

By definition of \(\tilde {p}_{i}\), for all 1≤i≤n,
$$\begin{array}{*{20}l} {}\tilde{p}_{i}(a_{i:n},z_{i}|y_{i:n}) &\propto \gamma^{}_{i}(a_{i},z_{i}) \int p_{}(y_{i:n}|a_{i:n},z_{i:n})\\ &\quad\; p(a_{i+1:n},z_{i+1:n}|a_{i},z_{i})\mathrm{d} z_{i+1:n}\,,\\ &\propto\gamma^{}_{i}(a_{i},z_{i})\!\left\{\prod_{u=i}^{n-1}Q(a_{u},a_{u+1})\right\}p(y_{i:n}|z_{i},a_{i:n})\,. \end{array} $$
This yields
$${}\tilde{p}_{i}(a_{i:n}|y_{i:n}) \!\propto\!\! \left\{\prod_{u=i}^{n-1}Q(a_{u},a_{u+1})\!\right\}\! \int\! \!\gamma^{}_{i}(a_{i},z_{i}) p(y_{i:n}|z_{i},a_{i:n})\mathrm{d} z_{i}\,. $$
A set of weighted trajectories \(\left (\tilde {a}^{\ell }_{i:n}\right)_{1\le \ell \le N}\) with importance weights \(\left (\tilde {\omega }^{\ell }_{i}\right)_{1\le \ell \le N}\), 1≤i≤n, may then be sampled recursively backward in time to produce an SMC approximation of \(\tilde {p}_{}(a_{i:n}|y_{i:n})\) as follows.
  • For 1≤ℓ≤N, sample \(\tilde {a}^{\ell }_{n}\sim \tilde {q}_{n}(\cdot)\) and set:
    $$\tilde{\omega}^{\ell}_{n} \propto \frac{\int\gamma^{}_{n}\left(\tilde{a}^{\ell}_{n},z'\right)g_{}\left(\tilde{a}^{\ell}_{n},z';y_{n}\right)\mathrm{d} z'}{\tilde{q}_{n}\left(\tilde{a}^{\ell}_{n}\right)}\,. $$
  • For all 1≤i≤n−1, resample the set \(\left (\tilde {a}^{\ell }_{i+1:n}\right)_{1\le \ell\le N}\) using the normalized weights \(\left (\tilde {\omega }^{\ell }_{i+1}\right)_{1\le \ell \le N}\). Then, for 1≤ℓ≤N, sample \(\tilde {a}^{\ell }_{i}\sim \tilde {q}_{i}\left (\tilde {a}^{\ell }_{i+1:n},\cdot \right)\) and set:
    $${}\tilde{\omega}^{\ell}_{i} \propto \frac{Q\left(\tilde{a}^{\ell}_{i},\tilde{a}^{\ell}_{i+1}\right)\int \gamma^{}_{i}\left(\tilde{a}^{\ell}_{i},z'\right)p_{}\left(y_{i:n}|\tilde{a}^{\ell}_{i:n},z'\right)\mathrm{d} z'}{\tilde{q}_{i}\left(\tilde{a}^{\ell}_{i+1:n},\tilde{a}^{\ell}_{i}\right)\int \gamma^{}_{i+1}\left(\tilde{a}^{\ell}_{i+1},z'\right)p_{}\left(y_{i+1:n}|\tilde{a}^{\ell}_{i+1:n},z'\right)\mathrm{d} z'}\,. $$
To obtain uniformly weighted samples at each time step, in the numerical experiments we use
$${} \begin{aligned} &\tilde{q}_{n}(\cdot) = \int\gamma^{}_{n}(\cdot,z')g_{}(\cdot,z';y_{n})\mathrm{d} z'\\ &\text{and}\quad \tilde{q}_{i}\left(\tilde{a}^{\ell}_{i+1:n},\cdot\right) = \frac{Q\left(\cdot,\tilde{a}^{\ell}_{i+1}\right)\int \gamma^{}_{i}(\cdot,z')p_{}(y_{i:n}|\left(\cdot,\tilde{a}^{\ell}_{i+1:n}\right),z')\mathrm{d} z'}{\int \gamma^{}_{i+1}\left(\tilde{a}^{\ell}_{i+1},z'\right)p_{}(y_{i+1:n}|\tilde{a}^{\ell}_{i+1:n},z')\mathrm{d} z'}\,. \end{aligned} $$
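The backward recursion above can be organized as a short loop; a minimal sketch, assuming the integrals ∫γ i (a i ,z)p(y i:n |a i:n ,z)dz are available through a user-supplied callable `I(i, path)` (in the paper they are computed in closed form via Lemma 3), and using a plain uniform proposal over regimes rather than the adjusted proposal \(\tilde{q}_i\) used in the experiments.

```python
import random

def backward_smc(n, N, J, Q, I, rng=random):
    """Backward SMC pass of Section 2.3.1 in skeleton form.  I(i, path) must
    return  int gamma_i(a_i, z) p(y_{i:n} | a_{i:n}, z) dz  for the regime
    path a_{i:n} given as a tuple; for i = n this is the gamma_n * g integral.
    The proposal over regimes is uniform here, so the weights below follow
    the two displays above with q-tilde = 1/J."""
    paths = [(rng.randrange(J),) for _ in range(N)]
    w = [I(n, p) * J for p in paths]                     # time-n weights
    for i in range(n - 1, 0, -1):
        tot = sum(w)
        probs = [x / tot for x in w]
        paths = [paths[_categorical(probs, rng)] for _ in range(N)]  # resample
        new_paths, new_w = [], []
        for p in paths:
            a = rng.randrange(J)                         # uniform proposal
            path = (a,) + p
            new_w.append(Q[a][p[0]] * I(i, path) / ((1.0 / J) * I(i + 1, p)))
            new_paths.append(path)
        paths, w = new_paths, new_w
    return paths, w

def _categorical(probs, rng):
    """Draw one index from a normalized probability vector."""
    u, c = rng.random(), 0.0
    for k, pk in enumerate(probs):
        c += pk
        if u < c:
            return k
    return len(probs) - 1
```

With I ≡ 1 and a uniform transition matrix, the incremental weights cancel exactly, which gives a quick sanity check of the recursion.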
By (15) and (16),
$$\begin{array}{*{20}l} {}p_{}(y_{i:n}|a_{i},z_{i}) &= \frac{\tilde{p}_{i}(y_{i:n}) \tilde{p}_{i}(a_{i},z_{i}|y_{i:n})}{\gamma^{}_{i}(a_{i},z_{i})}\\ &= \tilde{p}_{i}(y_{i:n}) \sum_{a_{i+1:n}=1}^{J}\frac{\tilde{p}_{i}(a_{i:n}|y_{i:n})p_{}(y_{i:n}|a_{i:n},z_{i})}{\int \gamma^{}_{i}(a_{i},z')p_{}(y_{i:n}|a_{i:n},z')\mathrm{d} z'}\,, \end{array} $$
which suggests the following particle approximation \(p^{N}_{}(y_{i:n}|a_{i},z_{i})\) of p(y i:n |a i ,z i ):
$$ \begin{aligned} &p^{N}_{}(y_{i:n}|a_{i},z_{i}) \\ &\quad= \tilde{p}_{i}(y_{i:n}) \sum_{\ell=1}^{N}\frac{\tilde{\omega}^{\ell}_{i} p_{}(y_{i:n}|\tilde{a}^{\ell}_{i:n},z_{i})}{\int \gamma^{}_{i}(\tilde{a}^{\ell}_{i},z')p_{}(y_{i:n}|\tilde{a}^{\ell}_{i:n},z')\mathrm{d} z'}\delta_{\tilde{a}^{\ell}_{i}}(a_{i})\,. \end{aligned} $$

The conditional likelihood of the observations given the sequence of states p(y i:n |a i:n ,z i ) can be computed explicitly using a Gaussian backward smoother; these computations are summarized in Lemma 2. In the numerical experiments, \(\gamma ^{}_{i}(a_{i},z_{i})\) is set as a mixture of Gaussian distributions. Note that for such a choice, the integral \(\int \gamma ^{}_{i}(a_{i},z')p_{}(y_{i:n}|a_{i:n},z')\mathrm {d} z'\) may be computed explicitly, see Lemma 3. Then, combining (17) and (14) with (13) provides an approximation of p(a i ,z i |y 1:n ) by merging the forward particles \(\left (a^{k}_{i-1}\right)_{1\le k \le N}\) with the backward particles \(\left (\tilde {a}^{k}_{i}\right)_{1\le k \le N}\), the support of this SMC approximation of p(a i ,z i |y 1:n ) being \(\left (\tilde {a}^{k}_{i}\right)_{1\le k \le N}\).

As noted in ([13], Section 2.6), two-filter smoothers are prone to suffer from degeneracy issues when the algorithm associates forward particles at time i−1 with backward particles at time i. The authors illustrate this issue in the case where the hidden state is an AR(2) process. To overcome the weakness of such standard two-filter approaches, the particle rejuvenation proposed in Section 2.3.2 follows the idea introduced in [13]: new particles at time i are sampled conditional on \((a^{k}_{1:i-1})_{1\le k \le N}\) and on \(\left (\tilde {a}^{k}_{i+1:n}\right)_{1\le k \le N}\) and appropriately weighted. This makes it possible to produce new particles at time i and to obtain an SMC approximation of p(a i ,z i |y 1:n ) whose support is not restricted to \(\left (\tilde {a}^{k}_{i}\right)_{1\le k \le N}\). Section 2.3.2 exploits this idea in the specific case of linear and Gaussian models, where explicit computations allow an approximation to be produced using \(\left (a^{k}_{1:i-1}\right)_{1\le k \le N}\) and \(\left (\tilde {a}^{k}_{i+1:n}\right)_{1\le k \le N}\) with support {1,…,J} and without any additional sampling steps.

2.3.2 Particle rejuvenation of two-filter based algorithms

For 2≤i≤n−1, particle rejuvenation relies on the explicit marginalization:
$$ {}\begin{aligned} &p(a_{i},z_{i}|y_{1:n}) \\ &\quad= \sum_{a_{i-1}}\sum_{a_{i+1}} \int_{z_{i-1}} \int_{z_{i+1}} \psi^{n}_{i}(a_{i-1:i+1},z_{i-1:i+1})\mathrm{d} z_{i-1}\mathrm{d} z_{i+1}\,, \end{aligned} $$
where \(\psi ^{n}_{i}(a_{i-1:i+1},z_{i-1:i+1})\) is the smoothing distribution of the hidden regimes and states between time indices i−1 and i+1. Note that the E-step of the EM algorithm requires the approximation of p(a i−1,z i−1,a i ,z i |y 1:n ); this may be obtained following the same steps, by marginalizing explicitly the linear states at times i−2 and i+1. Intermediate computations follow the same steps as for the approximation of p(a i ,z i |y 1:n ). First, \(\psi ^{n}_{i}\) may be decomposed as follows:
$${}\begin{aligned} \psi^{n}_{i}(a_{i-1:i+1},z_{i-1:i+1}) &\propto p_{}(y_{i+1:n}|a_{i+1},z_{i+1})\\ &\quad\times p_{}(a_{i-1},z_{i-1}|y_{1:i-1})Q(a_{i-1},a_{i})\\ &\quad\times m_{}(a_{i},z_{i-1};z_{i})g_{}(a_{i},z_{i};y_{i})\\ &\quad\times Q(a_{i},a_{i+1})m_{}(a_{i+1},z_{i};z_{i+1})\,, \end{aligned} $$
where the proportionality is with respect to (a i−1:i+1,z i−1:i+1). Then, by (18), the smoothing distribution p(a i ,z i |y 1:n ) may be written as
$$ {}p(a_{i},z_{i}|y_{1:n}) \propto p(a_{i},z_{i}|y_{1:i-1})g_{}(a_{i},z_{i};y_{i})t_{i}(a_{i},z_{i},y_{i+1:n})\,, $$
where m and g are defined in (3) and (4) and
$$\begin{array}{*{20}l} t_{i}(a_{i},z_{i},y_{i+1:n}) &= \sum_{a_{i+1}}\int_{z_{i+1}}m_{}(a_{i+1},z_{i};z_{i+1})Q(a_{i},a_{i+1})\\ &\quad\times p_{}(y_{i+1:n}|a_{i+1},z_{i+1}) \mathrm{d} z_{i+1}\,. \end{array} $$
The backward pass described in Section 2.3.1 produces sequences of regimes \(\tilde {a}_{i+1:n}^{\ell }\) associated with importance weights \(\tilde {\omega }_{i+1}^{\ell }\), 1≤ℓ≤N, which are used to approximate p(y i+1:n |a i+1,z i+1). Plugging this approximation into (20) provides an approximation \(t^{N}_{i}(a_{i},z_{i},y_{i+1:n})\) of t i (a i ,z i ,y i+1:n ) which integrates over all possible choices of (a i+1,z i+1). These steps are then combined to form an unnormalized SMC approximation of p(a i ,z i |y 1:n ) using (19). The normalization of the SMC approximation of p(a i ,z i |y 1:n ) is obtained by integrating over the states a i ,z i when p(a i ,z i |y 1:i−1) and t i (a i ,z i ,y i+1:n ) are replaced by p N (a i ,z i |y 1:i−1) and \(t^{N}_{i}(a_{i},z_{i},y_{i+1:n})\) in (19). Our procedure makes it possible to construct sequences of regimes with non-degenerate importance weights in the combination step. It improves significantly on [3], where no marginalization of p(a i ,z i |y 1:n ) over the states at times i−1 and i+1 is performed and the proposed forward and backward paths are directly merged, which often leads to importance weights that are close to being numerically degenerate. By Lemma 2, the SMC approximation \(p^{N}_{}(y_{i:n}|a_{i},z_{i})\) of p(y i:n |a i ,z i ) is then given by:
$$ {}\begin{aligned} p^{N}_{}(y_{i:n}|a_{i},z_{i}) &= \tilde{p}_{i}(y_{i:n}) \sum_{\ell=1}^{N} \frac{\delta_{\tilde{a}^{\ell}_{i}}(a_{i})\tilde{\omega}_{i}^{\ell}}{\int \gamma^{}_{i}\left(\tilde{a}^{\ell}_{i},z'\right)p_{}(y_{i:n}|\tilde{a}^{\ell}_{i:n},z')\mathrm{d} z'} \\ &\quad\times\exp\left\{- \frac{1}{2}\left\|z_{i}\right\|_{\tilde{P}^{\ell}_{i}}^{2} + z'_{i}\tilde{\nu}^{\ell}_{i} - \frac{1}{2} \tilde{c}^{\ell}_{i} \right\}\,, \end{aligned} $$
where \(\left (\tilde {P}^{\ell }_{i}\right)^{-1} {:=} \tilde {P}_{i}^{-1}\left (\tilde {a}^{\ell }_{i:n}\right)\), \(\tilde {\nu }^{\ell }_{i}{:=} \tilde {\nu }_{i}\left (\tilde {a}^{\ell }_{i:n}\right)\) and \(\tilde {c}^{\ell }_{i} {:=} \tilde {c}_{i}\left (\tilde {a}^{\ell }_{i:n}\right)\) are defined in Lemma 2. Define
$$\begin{array}{*{20}l} \Delta^{\ell}_{i+1} &{:=} \left(I_{\mathsf{m}} + H'_{\tilde{a}^{\ell}_{i+1}}\left(\tilde{P}^{\ell}_{i+1}\right)^{-1}H_{\tilde{a}^{\ell}_{i+1}}\right)^{-1}\,,\\ \delta^{\ell}_{i+1}&{:=}\tilde{\nu}^{\ell}_{i+1} + \overline{H}_{\tilde{a}^{\ell}_{i+1}}^{-1}\left(d_{\tilde{a}^{\ell}_{i+1}}+T_{\tilde{a}^{\ell}_{i+1}}z_{i}\right)\,. \end{array} $$
Then, by (20), the SMC approximation \(t^{N}_{i}(a_{i},z_{i},y_{i+1:n})\) of t i (a i ,z i ,y i+1:n ) is given by:
$$ {} \begin{aligned} t^{N}_{i}(a_{i},z_{i},y_{i+1:n}) &=\sum_{a_{i+1}=1}^{J}\int_{z_{i+1}}m_{}(a_{i+1},z_{i};z_{i+1})\\ &\quad\times Q(a_{i},a_{i+1})p^{N}_{}(y_{i+1:n}|a_{i+1},z_{i+1}) \mathrm{d} z_{i+1}\,,\\ &= \tilde{p}_{i+1}(y_{i+1:n}) \sum_{\ell=1}^{N} C_{i}^{-1}\left(\tilde{a}^{\ell}_{i+1:n}\right)\\ &\quad\times Q\left(a_{i}, \tilde{a}^{\ell}_{i+1}\right) \tilde{\omega}^{\ell}_{i+1} \left|\overline{H}_{\tilde{a}^{\ell}_{i+1}}\right|^{-1/2} \left|H_{\tilde{a}^{\ell}_{i+1}} \Delta^{\ell}_{i+1} H'_{\tilde{a}^{\ell}_{i+1}}\right|^{1/2} \, \\ &\quad \times \exp\left\{{\vphantom{-\frac{1}{2}\left\|d_{\tilde{a}^{\ell}_{i+1}}+T_{\tilde{a}^{\ell}_{i+1}}z_{i}\right\|_{\overline{H}_{\tilde{a}^{\ell}_{i+1}}}^{2}}}\frac{1}{2} \left(\delta^{\ell}_{i+1}\right)'H_{\tilde{a}^{\ell}_{i+1}}\Delta^{\ell}_{i+1}H'_{\tilde{a}^{\ell}_{i+1}} \delta^{\ell}_{i+1}\right. \\ &\qquad\quad\left.-\frac{1}{2}\left\|d_{\tilde{a}^{\ell}_{i+1}}+T_{\tilde{a}^{\ell}_{i+1}}z_{i}\right\|_{\overline{H}_{\tilde{a}^{\ell}_{i+1}}}^{2}\right\} \,,\\ &= \sum_{\ell=1}^{N}\tilde{\omega}_{\mathsf{b},i}^{\ell}(a_{i})\exp\left\{-\frac{1}{2}\left\|z_{i}\right\|_{\tilde{S}_{i+1}^{\ell}}^{2}+z_{i}'\tilde{s}_{i+1}^{\ell}\right\}\,, \end{aligned} $$
$${}\begin{aligned} C_{i}(\tilde{a}^{\ell}_{i+1:n}) &{:=} \exp \left\{\tilde{c}^{\ell}_{i+1}/2\right\} \int_{z_{i+1}} \gamma^{}_{i+1}\left(\tilde{a}^{\ell}_{i+1}, z\right)\tilde{p}\left(y_{i+1:n}|\tilde{a}^{\ell}_{i+1:n},z\right) \mathrm{d} z\,,\\ \tilde{\omega}_{\mathsf{b},i}^{\ell}(a_{i}) &= \tilde{p}_{i+1}(y_{i+1:n})C_{i}\left(\tilde{a}^{\ell}_{i+1:n}\right)^{-1}\\ &\quad\times Q\left(a_{i}, \tilde{a}^{\ell}_{i+1}\right) \tilde{\omega}^{\ell}_{i+1} \left|\overline{H}_{\tilde{a}^{\ell}_{i+1}}\right|^{-1/2} \left|H_{\tilde{a}^{\ell}_{i+1}} \Delta^{\ell}_{i+1} H'_{\tilde{a}^{\ell}_{i+1}}\right|^{1/2}\\ &\quad\times\exp\left\{-d'_{\tilde{a}^{\ell}_{i+1}}\overline{H}_{\tilde{a}^{\ell}_{i+1}}^{-1}d_{\tilde{a}^{\ell}_{i+1}}/2\right\}\\ &\quad\times \exp\left\{\left(\tilde{\nu}^{\ell}_{i+1} + \overline{H}_{\tilde{a}^{\ell}_{i+1}}^{-1}d_{\tilde{a}^{\ell}_{i+1}}\right)'H_{\tilde{a}^{\ell}_{i+1}}\Delta^{\ell}_{i+1}H'_{\tilde{a}^{\ell}_{i+1}}\right.\\ &\quad\left.\times\left(\tilde{\nu}^{\ell}_{i+1} + \overline{H}_{\tilde{a}^{\ell}_{i+1}}^{-1}d_{\tilde{a}^{\ell}_{i+1}}\right)/2\right\} \,,\\ \left(\tilde{S}_{i+1}^{\ell}\right)^{-1} &= T'_{\tilde{a}^{\ell}_{i+1}}\overline{H}_{\tilde{a}^{\ell}_{i+1}}^{-1}\left(T_{\tilde{a}^{\ell}_{i+1}}-H_{\tilde{a}^{\ell}_{i+1}}\Delta^{\ell}_{i+1}H'_{\tilde{a}^{\ell}_{i+1}}\overline{H}_{\tilde{a}^{\ell}_{i+1}}^{-1}T_{\tilde{a}^{\ell}_{i+1}}\right)\,,\\ \tilde{s}_{i+1}^{\ell} &=T'_{\tilde{a}^{\ell}_{i+1}}\overline{H}_{\tilde{a}^{\ell}_{i+1}}^{-1}\!\left(\!H_{\tilde{a}^{\ell}_{i+1}}\Delta^{\ell}_{i+1}H'_{\tilde{a}^{\ell}_{i+1}}\left(\tilde{\nu}^{\ell}_{i+1} + \overline{H}_{\tilde{a}^{\ell}_{i+1}}^{-1}d_{\tilde{a}^{\ell}_{i+1}}\!\right)-d_{\tilde{a}^{\ell}_{i+1}}\right)\,. \end{aligned} $$
In the numerical experiments, \(\gamma ^{}_{i}(a_{i},z_{i})\) is set as a mixture of Gaussian distributions. As explained in Section 2.3.1, the integral \(\int _{z_{i+1}} \gamma ^{}_{i+1}\left (\tilde {a}^{\ell }_{i+1}, z\right)\tilde {p}\left (y_{i+1:n}|\tilde {a}^{\ell }_{i+1:n},z\right) \mathrm {d} z\) may be computed explicitly, see Lemma 3.
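In each mixture component, the combination step of (13) merges a forward factor exp{−½‖z‖²_Ω + z′r} with a backward factor of the same form and integrates the linear state out. In the scalar case this amounts to adding the precision and linear parameters; a minimal sketch (hypothetical helper, writing each factor as exp(−½λz² + rz) with precision λ):

```python
import math

def combine_information_forms(lam_f, r_f, lam_b, r_b):
    """Merge two scalar Gaussian factors exp(-0.5*lam*z**2 + r*z) and return
    (lam, r, log_norm), where log_norm is the log of
    int exp(-0.5*(lam_f+lam_b)*z**2 + (r_f+r_b)*z) dz,
    i.e., the quantity entering the combination weights (scalar sketch)."""
    lam, r = lam_f + lam_b, r_f + r_b
    log_norm = 0.5 * math.log(2.0 * math.pi / lam) + 0.5 * r * r / lam
    return lam, r, log_norm
```

The same completion-of-squares identity, with matrices in place of scalars, underlies the closed-form expressions of Lemma 3.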

3 Simulated data

This section highlights the improvements brought by the additional Rao-Blackwellization steps for the two-filter and the FFBS approximations of the marginal smoothing distributions in the case where the number of states is J=2. The transition matrix Q is such that the probability of switching from one regime to the other is small, as expected for the WTI crude oil data, see Section 4. First, the algorithms are applied to a simple one-dimensional model with:
$$\pi_{1}=\pi_{2} = 0.5;\;d_{1} = 0.5\;\; d_{2} = 0;\;c_{1}= 0.1\;\; c_{2} = 0 \,, $$
$$\begin{aligned} Q = \left(\begin{array}{ll} 0.99 & 0.01\\ 0.03 & 0.97 \end{array}\right) \;\;T_{1} = T_{2} = 1\;\;\overline{H}_{1} = \overline{H}_{2}= 0.1\,, \end{aligned} $$
$$B_{1} = B_{2} =1\;\;\overline{G}_{1} = 0.3\;\;\overline{G}_{2} = 0.1\,. $$
The original FFBS algorithm of [21] and the FFBS algorithm with rejuvenation proposed in this paper are used with \(N = \tilde {N} = 25\). For comparable computational costs, the two-filter method of [3] and the method with rejuvenation are run with N=100. The artificial distributions are chosen as \(\gamma ^{}_{i} (a_{i}, z_{i}) = p^{N}(a_{i},z_{i}|y_{1:i-1})\) where p N (a i ,z i |y 1:i−1) is defined by (14). All these algorithms are compared to the estimation obtained with the proposed FFBS algorithm with rejuvenation and 5000 particles considered as a benchmark value. Figure 1 displays the mean estimation error over 100 independent Monte Carlo runs. The estimation error is defined as the absolute difference between the benchmark value and the estimations given by all algorithms.
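Trajectories from this toy model are easily simulated; the sketch below follows our reading of the generic CLGM equations (z i = d a + T a z i−1 + H a ε i , y i = c a + B a z i + G a η i ) with the parameters listed above. The initialization of the linear state and the exact timing of the regime switch are assumptions, as are the function and variable names.

```python
import math
import random

def simulate_clgm(n, rng=random):
    """Simulate the scalar two-regime CLGM of Section 3 (sketch).  Parameters
    are those given in the text; regimes are coded 0 (regime 1) and 1
    (regime 2).  The initial linear state is drawn N(0, 1), an assumption."""
    Q = [[0.99, 0.01], [0.03, 0.97]]
    d, c = [0.5, 0.0], [0.1, 0.0]
    T, Hbar = [1.0, 1.0], [0.1, 0.1]
    B, Gbar = [1.0, 1.0], [0.3, 0.1]
    a = 0 if rng.random() < 0.5 else 1            # pi_1 = pi_2 = 0.5
    z = rng.gauss(0.0, 1.0)
    regimes, obs = [], []
    for _ in range(n):
        z = d[a] + T[a] * z + math.sqrt(Hbar[a]) * rng.gauss(0.0, 1.0)
        y = c[a] + B[a] * z + math.sqrt(Gbar[a]) * rng.gauss(0.0, 1.0)
        regimes.append(a)
        obs.append(y)
        a = 0 if rng.random() < Q[a][0] else 1    # regime transition
    return regimes, obs
```

With Q as above, long simulated runs spend most of their time in a single regime, which is the setting the rejuvenation steps are designed for.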
Fig. 1

Posterior probabilities estimation error for all algorithms

In addition, Fig. 2 displays the empirical variance of the estimation for each algorithm. Figures 1 and 2 illustrate that in both cases the additional rejuvenation step improves the accuracy and reduces the variability of SMC smoothers. In addition, even with a sharp choice for the artificial distributions γ i , 1≤i≤n, FFBS-based methods outperform two-filter-based smoothers for this model.
Fig. 2

Empirical variances of the estimation of \(\mathbb {P}(a_{k}=1|Y_{1:n})\) for all algorithms

4 Application to CME crude oil (WTI)

4.1 Model

Modeling commodity prices is a crucial step to value contingent claims related to energy markets and to optimize storage or extraction strategies. In [14, 25], the authors proposed a model where the spot price of a commodity (S t ,t≥0) depends on a second factor (δ t ,t≥0), referred to as the instantaneous convenience yield. This factor plays the role of dividends in equity markets and models the benefit of holding the physical commodity, or the storage and maintenance costs required to keep it. In this model, the convenience yield is described as an Ornstein-Uhlenbeck process:
$$\begin{array}{*{20}l} \mathrm{d} S_{t} & = (r-\delta_{t})S_{t}\mathrm{d} t+\sigma S_{t} \mathrm{d} W_{t}^{1} \,,\\ \mathrm{d} \delta_{t} & = \kappa(\alpha-\delta_{t})\mathrm{d} t+\eta \mathrm{d} W_{t}^{2} \,, \quad \mathrm{d} \langle W_{t}^{1}, W_{t}^{2}\rangle = \rho \mathrm{d} t\,, \end{array} $$
where the parameters (r,σ,κ,α,η,ρ) are constant and \(\left (\left (W_{t}^{1},W_{t}^{2}\right), t \geq 0\right)\) are standard Brownian motions. This model appears to be too restrictive as energy markets are not likely to revert to a single equilibrium value. This assumption is relaxed using Markov switching models to allow several possible regimes for the spot price and the convenience yield. Following [1], the spot price and convenience yield are described in this paper as:
$$\begin{array}{*{20}l} {}\mathrm{d} S_{t} & = (r-\delta_{t})S_{t}\mathrm{d} t+\sigma_{a_{t}} S_{t}\mathrm{d} W_{t}^{1} \,,\\ {}\mathrm{d}\delta_{t} & = \kappa (\alpha_{a_{t}} -\delta_{t})\mathrm{d} t+\eta_{a_{t}} \mathrm{d} W_{t}^{2} \,, \quad \mathrm{d}\left\langle W_{t}^{1}, W_{t}^{2}\right\rangle = \rho_{a_{t}} \mathrm{d} t\,, \end{array} $$
where (a t ) t≥0 is a finite state space Markov process. This model captures fundamental features of commodity future prices, which typically display different regimes of volatility and/or convenience yield. A two-regime model is already sufficient to produce stylized effects such as contango (increasing future prices) and backwardation (decreasing future prices). Assuming that the switching rate between regimes is negligible compared to the inverse of the discretization period, the discretized version of the spot price and convenience yield Z i =(lnS i ,δ i ) (the sampling period is taken to be 1) is modeled as a CLGM. The explicit integration of this SDE, detailed in Lemma 4, yields the following discrete time model for (Z i ) i≥2:
$$Z_{i} = d_{a_{i-1}} + T Z_{i-1} + H_{a_{i-1}} \varepsilon_{i}\,, $$
where (with \(\overline {H}_{a_{i-1}} {:=} H_{a_{i-1}}H'_{a_{i-1}}\) and τ=t i t i−1):
$$\begin{aligned} d_{a_{i-1}} &{:=} \left(\begin{array}{c} \left[\mu- \alpha_{a_{i-1}} - \sigma^{2}_{a_{i-1}}/2 \right] \tau + \alpha_{a_{i-1}}[1-\mathrm{e}^{-\kappa \tau}]/\kappa\\ \alpha_{a_{i-1}} [1-\mathrm{e}^{-\kappa \tau}] \end{array}\right)\\ T &{:=} \left(\begin{array}{cc} 1 & -[1-\mathrm{e}^{-\kappa \tau}]/\kappa \\ 0 & \mathrm{e}^{-\kappa \tau} \end{array} \right)\\ \overline{H}_{a_{i-1}}(1,1) &= \sigma^{2}_{a_{i-1}} \tau + \eta^{2}_{a_{i-1}}\left\{\tau + (1-\mathrm{e}^{-2\kappa \tau})/(2\kappa)\right. \\ &\left.\quad- 2(1-\mathrm{e}^{-\kappa \tau})/\kappa\right\}/\kappa^{2} \\ & \quad- 2\rho_{a_{i-1}} \eta_{a_{i-1}} \sigma_{a_{i-1}}\left\{ \tau - (1-\mathrm{e}^{-\kappa \tau})/\kappa \right\}/\kappa\,, \\ \overline{H}_{a_{i-1}}(1,2) & = \left(\rho_{a_{i-1}} \eta_{a_{i-1}} \sigma_{a_{i-1}}-\eta^{2}_{a_{i-1}}/\kappa \right)\\ &\quad\times\left(1-\mathrm{e}^{-\kappa \tau} \right)/\kappa + \eta^{2}_{a_{i-1}}\left(1-\mathrm{e}^{-2\kappa \tau} \right)/(2\kappa^{2})\,,\\ \overline{H}_{a_{i-1}}(2,1) &= \overline{H}_{a_{i-1}}(1,2)\,, \quad \overline{H}_{a_{i-1}}(2,2) = \eta^{2}_{a_{i-1}}\\ &\quad\times\left(1-\mathrm{e}^{-2\kappa \tau} \right)/(2\kappa)\,. \end{aligned} $$
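For implementation, these closed-form expressions can be transcribed directly; a sketch in plain Python (the function name is ours, and the cross term of H̄(1,1) is taken as τ − (1 − e^{−κτ})/κ):

```python
import math

def discretize_two_factor(mu, kappa, alpha, sigma, eta, rho, tau):
    """Closed-form discretization over a step tau of one regime's
    (log-spot, convenience yield) dynamics: returns (d, T, Hbar) with d a
    2-vector, T and Hbar 2x2 matrices as tuples of tuples."""
    e1 = 1.0 - math.exp(-kappa * tau)
    e2 = 1.0 - math.exp(-2.0 * kappa * tau)
    d = ((mu - alpha - sigma ** 2 / 2.0) * tau + alpha * e1 / kappa,
         alpha * e1)
    T = ((1.0, -e1 / kappa),
         (0.0, math.exp(-kappa * tau)))
    h11 = (sigma ** 2 * tau
           + eta ** 2 * (tau + e2 / (2.0 * kappa) - 2.0 * e1 / kappa) / kappa ** 2
           - 2.0 * rho * eta * sigma * (tau - e1 / kappa) / kappa)
    h12 = ((rho * eta * sigma - eta ** 2 / kappa) * e1 / kappa
           + eta ** 2 * e2 / (2.0 * kappa ** 2))
    h22 = eta ** 2 * e2 / (2.0 * kappa)
    return d, T, ((h11, h12), (h12, h22))
```

For any reasonable parameter values, H̄ should be symmetric positive definite, which gives a simple sanity check of the transcription.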
The observations are Wednesday future contracts of the West Texas Intermediate crude oil (WTI) traded in the Chicago Mercantile Exchange (CME) from 11 January 1995 to 13 November 2013. The contracts are numbered F 1,F 2,…,F 36 where F 1 (or front month) is the earliest delivery future contract, F 2 is the second earliest delivery future contract and so on. Among these 36 contracts, the four future contracts F 1,F 4,F 6,F 13 are used since their trading volumes and their impacts on the term structure are the most important (F 1 is the most liquid contract, F 13 characterizes the gap between prices over a one-year period, F 4 and F 6 are intermediate future contracts that are heavily traded). As in [1], we consider that each future contract has a fixed time to maturity: F 1,F 4,F 6,F 13 have times to maturity 4, 16, 26, and 56 weeks, respectively. Our time series contains n=975 weekly data with 534 in backwardation and 441 in contango (the backwardation effect is more frequent with crude oil data). At each time t i =i τ, with τ=0.0192, the observations of the p=4 future prices are \(Y_{i} {:=} \left (\ln \left (F^{(\mathrm{market})}_{t_{i},m_{1}}\right), \ldots, \ln \left (F^{(\mathrm{market})}_{t_{i},m_{\mathsf {p}}}\right)\right)'\), where \(F_{t_{i},m}\) is the future price at t i for a maturity of m weeks. A closed form solution for \(F_{t_{i},m}\) may be written:
$$F_{t_{i},m} {:=} \exp\left(\mathsf{A}_{m}(a_{i}) + \mathsf{B}_{m} Z_{i}\right)\,, $$
where \(\mathsf {B}_{0} = \left (\begin {array}{cc}1 & 0 \end {array}\right)\) and B m =B m−1 T so that \( \mathsf {B}_{m}= \left (\begin {array}{cc} 1 & -\left (1-\mathrm {e}^{-\kappa m \tau }\right)/\kappa \end {array}\right)\) and, for all 1≤j≤J, A 0(j)=0, and
$$\begin{array}{*{20}l} \mathsf{A}_{m}(j) &= \ln \left(\sum^{J}_{k=1} Q(j,k) \exp(\mathsf{A}_{m-1}(k)) \right) \\ &\quad+ \mathsf{B}_{m-1} d_{j} + \frac{1}{2} \mathsf{B}_{m-1} \overline{H}_{j} \mathsf{B}'_{m-1}\,. \end{array} $$
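The coefficients A m (j) and B m follow directly from this recursion; a sketch (the inputs d[j] and Hbar[j] are the regime-j discretization quantities, passed in directly, and the function name is ours):

```python
import math

def futures_coeffs(m_max, Q, d, Hbar, kappa, tau):
    """Compute A_m(j) and B_m for m = 0..m_max following the recursion above.
    Q is the JxJ transition matrix, d[j] a 2-vector and Hbar[j] a 2x2 matrix;
    B_m is stored as a 2-tuple (row vector) using its closed form."""
    J = len(Q)
    B = [(1.0, 0.0)]                                   # B_0
    A = [[0.0] * J]                                    # A_0(j) = 0
    for m in range(1, m_max + 1):
        Bm = (1.0, -(1.0 - math.exp(-kappa * m * tau)) / kappa)
        b = B[m - 1]
        Am = []
        for j in range(J):
            lse = math.log(sum(Q[j][k] * math.exp(A[m - 1][k])
                               for k in range(J)))
            quad = sum(b[u] * Hbar[j][u][v] * b[v]
                       for u in range(2) for v in range(2))
            Am.append(lse + b[0] * d[j][0] + b[1] * d[j][1] + 0.5 * quad)
        B.append(Bm)
        A.append(Am)
    return A, B
```

Since B m = B m−1 T with T upper triangular, the closed form for B m can be checked against one explicit multiplication by T.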
Therefore, the observations of the log-future prices are given, for all 1≤i≤n, by
$$Y_{i} = c_{a_{i}} + B Z_{i} + G \eta_{i}\,, $$
where η i is a standard multivariate Gaussian random variable and
$$\begin{array}{*{20}l} c_{j}' &= [ \mathsf{A}_{m_{1}}(j), \dots, \mathsf{A}_{m_{\mathsf{p}}}(j) ]\,, \\ B' &= [\mathsf{B}_{m_{1}}', \dots, \mathsf{B}_{m_{\mathsf{p}}}']\,, \;\;\; G = \text{diag}(g_{1}, \dots, g_{d})\,. \end{array} $$
The model depends on the parameters:
$$\begin{array}{*{20}l} {}\theta {:=} \{\pi, Q, \mu_{1}, \Sigma_{1}, \kappa, &(\alpha_{j})_{1\leq j\leq J}, (\sigma_{j})_{1\leq j\leq J}, (\eta_{j})_{1\leq j\leq J}, (\rho_{j})_{1\leq j\leq J}, \\ &(g_{\ell})_{1\leq \ell\leq d} \}\,. \end{array} $$
The aim of this section is to estimate θ and the posterior probabilities \(\mathbb {P}(a_{k}=j|Y_{1:n})\), 1≤k≤n, 1≤j≤J. Given the observations Y 1:n , the EM algorithm introduced in [9] maximizes the incomplete data log-likelihood \(\theta \mapsto \ell _{\theta }^{n}\) defined by
$${}\begin{aligned} \ell_{\theta}^{n}(Y_{1:n}) {:=} \log\left(\sum_{a_{1}=1}^{J}\ldots\sum_{a_{n}=1}^{J}\int p_{\theta}(a_{1:n},z_{1:n},Y_{1:n})\,\mathrm{d} z_{1:n}\right)\,, \end{aligned} $$
where the complete data likelihood p θ is given by
$$\begin{aligned} {}p_{\theta}(a_{1:n},z_{1:n},Y_{1:n}) {:=}& \pi(a_{1})\phi_{\mu_{1},\Sigma_{1}}(z_{1})g_{\theta}(a_{1},z_{1};Y_{1})\\ &\times\prod^{n}_{i=2}Q(a_{i-1},a_{i})m_{\theta}\left(a_{i},z_{i-1};z_{i}\right)\\ &\times g_{\theta}(a_{i},z_{i};Y_{i})\,. \end{aligned} $$
Denote by \(\mathbb {E}_{\theta }\left [\cdot \middle |Y_{1:n}\right ]\) the conditional expectation given Y 1:n when the parameter value is set to θ. The EM algorithm iteratively builds a sequence {θ p } p≥0 of parameter estimates following the two steps:
  1. E-step: compute \(\theta \mapsto Q(\theta,\theta _{p}){:=} \mathbb {E}_{\theta _{p}}\left [\log p_{\theta }(a_{1:n},Z_{1:n},Y_{1:n})\middle |Y_{1:n}\right ]\);

  2. M-step: choose θ p+1 as a maximizer of \(\theta \mapsto Q(\theta,\theta _{p})\).


All the conditional expectations involved in Q(θ,θ p ) are approximated using our two-filter algorithm with rejuvenation to define the SMC approximation \(\theta \mapsto Q^{N}(\theta,\theta _{p})\) of \(\theta \mapsto Q(\theta,\theta _{p})\). As the function \(\theta \mapsto Q^{N}(\theta,\theta _{p})\) cannot be maximized analytically, the M-step is performed numerically using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) introduced in [17]. This derivative-free optimization procedure is known to perform well in complex multimodal optimization settings, see e.g., [16].
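The resulting procedure fits a simple generic skeleton; in the sketch below, `approx_Q` stands for the SMC approximation of the intermediate quantity and `maximize` for any derivative-free maximizer playing the role of CMA-ES (both are hypothetical placeholders supplied by the user).

```python
def em_with_numerical_mstep(theta0, approx_Q, maximize, n_iter=10):
    """Generic EM loop matching steps 1-2 above.  approx_Q(theta, theta_p)
    returns the SMC approximation Q^N(theta, theta_p) of the intermediate
    quantity (E-step output); maximize(f) returns an approximate maximizer
    of f (numerical M-step, CMA-ES in the paper)."""
    theta = theta0
    for _ in range(n_iter):
        theta_p = theta
        theta = maximize(lambda t: approx_Q(t, theta_p))  # M-step
    return theta
```

Any maximizer can be plugged in; for a one-dimensional toy objective, a grid search already recovers the maximizing parameter.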

4.2 Numerical results

The initial transition probabilities in CMA-ES are chosen as Q(1,1)=0.98, Q(2,2)=0.97, and π 1=π 2=0.5, where regime 1 represents the backwardation regime and regime 2 the contango regime. The other parameters are initialized as shown in Table 1.
Table 1

Initial values for the EM algorithm


(Columns: α 1, α 2, σ 1, σ 2, η 1, η 2, ρ 1, ρ 2, g 1, g 2, g 3, g 4; the numerical entries were not recovered from the source.)
The number of particles is set to N=100 and the time step to τ=1/52. The interest rate is r=0.0296 as in [1]. The initial guesses for the mean and variance of the initial state are
$$\begin{aligned} \mu_{1} = \left(\begin{array}{cc} \ln F^{\mathrm{(market)}}_{1,4} & r-\cfrac{\ln F^{\mathrm{(market)}}_{1,16}-\ln F^{\mathrm{(market)}}_{1,4}}{(16-4)\tau} \end{array}\right)\\ \text{and}\quad \Sigma_{1} = \left(\begin{array}{cc} 0.05 & 0 \\ 0 & 0.05 \end{array}\right). \end{aligned} $$

The CMA-ES algorithm is used with an initial standard deviation for the parameters σ cmaes =0.005, a number of selected search points μ cmaes =20 and a population size λ cmaes =100. The algorithm is stopped after 10000 iterations.

In the Gibson-Schwartz model [14], a stronger backwardation effect implies a greater value of α for the same values of the other parameters. For the CME WTI crude oil, the backwardation effect is more frequent than the contango effect, so that α 1 should be greater than α 2. Therefore, this condition is imposed in all simulations in the CMA-ES algorithm. The results after 2500 iterations of the EM algorithm are given in Table 2. The estimated values and standard deviations are obtained with 50 independent runs of the algorithm.
Table 2

Final estimates after 2500 iterations



(The table reports the estimates and standard deviations over the 50 runs for σ 1, σ 2, η 1, η 2, ρ 1, ρ 2, α 1, α 2, g 1, g 2, g 3, g 4; only scattered entries survive in the source and the layout could not be reconstructed.)
As expected, we obtain σ 1>σ 2, α 1>α 2, η 1>η 2 and ρ 1>ρ 2 at convergence of the EM algorithm. Moreover, Q(1,1)>Q(2,2), which corresponds to the prediction made from the data description. The fact that σ 1>σ 2 and α 1>α 2 indicates that the first regime (backwardation) is characterized by higher values of both the volatility and the equilibrium level of the convenience yield, while the second regime (contango) is characterized by lower values of both. This is in accordance with the theory of storage: the volatility of the commodity spot price is high when the inventory is low, and the convenience yield is all the higher as the inventory is low.

Figure 3 compares the evolution of the future 1M (the nearest contract) to the term structure observed from CME WTI crude oil, defined as the difference between the future 13M and the future 1M (to avoid seasonality). The figure shows that there is not necessarily an inverse relationship between the price of the nearest contract and the term structure. However, when a significant drop in the price of the nearest contract occurs, the term structure increases (i.e., the market is in contango).
Fig. 3

Log-price (red line) and slope of future curves (blue line)

The correlation between the spot price and the convenience yield is positive and high in both regimes. This is in accordance with what has been observed in most commodity markets, see [14]. The slope of the future curve decreases as a function of maturity.

Figures 4 and 5 display the estimated posterior probabilities of the regimes and the observed future slope. When the future curve is in backwardation (resp. contango), the model is expected to be in the first regime (resp. second regime), except in the periods where the slope of the future curve is very small and in the period from December 2008 to April 2009 (the beginning of the crisis).
Fig. 4

Posterior probability (triangle black) and slope of future curves (blue line)

Fig. 5

Posterior probability (triangle black) and slope of future curves (blue line)

5 Conclusions

This paper presents Rao-Blackwellized sequential Monte Carlo methods to approximate smoothing distributions in conditionally linear and Gaussian state spaces in a common unifying framework. It also provides different techniques that can be used in the forward filtering pass to significantly improve the usual mixture Kalman filter: the filtering distributions are approximated at each time step by considering all possible offspring of all ancestral trajectories before discarding degenerate paths, instead of resampling the ancestral paths before propagating them to the next time step. The paper investigates the benefit of additional Rao-Blackwellization steps to sample new regimes at each time step conditional on the forward and backward particles. This rejuvenation step uses explicit integration of the hidden linear states before merging the forward and backward filters for two-filter based algorithms, or before sampling new states backward in time for FFBS based methods. Monte Carlo experiments on simulated data illustrate that this additional rejuvenation step improves the performance of the smoothing algorithms at no substantial additional computational cost. The algorithms are also applied to commodity markets using WTI crude oil data.

6 Appendix: Technical lemmas

Lemmas 1, 2, and 3 are close to ([3], Propositions 5 and 6). The proofs are detailed in this appendix for completeness.

Proof of Lemma 1

For all 1≤i≤n−1,
$${} \begin{aligned} p_{}(y_{i:n}|a_{i},z_{i}) &= \sum_{a_{i+1:n}}\int p_{}(y_{i:n},a_{i+1:n},z_{i+1:n}|a_{i},z_{i})\mathrm{d} z_{i+1:n}\,,\\ &= \sum_{a_{i+1:n}}\int p_{}(a_{i+1:n},z_{i+1:n}|a_{i},z_{i})p_{}(y_{i:n}|a_{i:n},z_{i:n})\\ &\quad\times\mathrm{d} z_{i+1:n}\,,\\ &= \frac{\tilde{p}_{i}(y_{i:n})}{\gamma^{}_{i}(a_{i},z_{i})}\sum_{a_{i+1:n}}\int \frac{\gamma^{}_{i}(a_{i},z_{i})}{\tilde{p}_{i}(y_{i:n})}p_{}(a_{i+1:n},z_{i+1:n}|a_{i},z_{i})\\ &\quad\times p_{}(y_{i:n}|a_{i:n},z_{i:n})\mathrm{d} z_{i+1:n}\,,\\ &= \frac{\tilde{p}_{i}(y_{i:n})}{\gamma^{}_{i}(a_{i},z_{i})}\sum_{a_{i+1:n}}\int \tilde{p}_{i}(a_{i:n},z_{i:n}|y_{i:n})\mathrm{d} z_{i+1:n}\,,\\ &= \frac{\tilde{p}_{i}(y_{i:n})}{\gamma^{}_{i}(a_{i},z_{i})}\tilde{p}_{i}(a_{i},z_{i}|y_{i:n})\,, \end{aligned} $$
which concludes the proof of (15). To prove (16), write
$${} \begin{aligned} \tilde{p}_{i}(a_{i:n},z_{i}|y_{i:n}) &= \frac{\gamma^{}_{i}(a_{i},z_{i})}{\tilde{p}_{i}(y_{i:n})}\int p_{}(y_{i:n}|a_{i:n},z_{i:n})p_{}(a_{i+1:n},z_{i+1:n}|a_{i},z_{i})\\ &\quad\times\mathrm{d} z_{i+1:n}\,,\\ &= \frac{\gamma^{}_{i}(a_{i},z_{i})}{\tilde{p}_{i}(y_{i:n})}\int \frac{p_{}(y_{i:n}|a_{i:n},z_{i})p_{}(z_{i+1:n}|y_{i:n},a_{i:n},z_{i})}{p_{}(z_{i+1:n}|a_{i:n},z_{i})}\\ &\quad p(a_{i+1:n},z_{i+1:n}|a_{i},z_{i})\mathrm{d} z_{i+1:n}\,,\\ &=\frac{\gamma^{}_{i}(a_{i},z_{i})}{\tilde{p}_{i}(y_{i:n})}p_{}(y_{i:n}|a_{i:n},z_{i})p_{}(a_{i+1:n}|a_{i})\,. \end{aligned} $$
Therefore,
$${}\tilde{p}_{i}(a_{i},z_{i}|y_{i:n}) = \frac{\gamma^{}_{i}(a_{i},z_{i})}{\tilde{p}_{i}(y_{i:n})}\sum_{a_{i+1:n}}p_{}(y_{i:n}|a_{i:n},z_{i})p_{}(a_{i+1:n}|a_{i}) $$
and the proof is completed upon noting that
$$\tilde{p}_{i}(a_{i:n}|y_{i:n}) = \frac{p_{}(a_{i+1:n}|a_{i})}{\tilde{p}_{i}(y_{i:n})} \int \gamma^{}_{i}(a_{i},z)p_{}(y_{i:n}|a_{i:n},z)\mathrm{d} z\,. $$

Lemma 2

For all 1≤i≤n,
$$ {}p_{}(y_{i:n}|a_{i:n},z_{i}) \,=\, \exp\!\left\{-\frac{1}{2}\tilde{c}_{i}(a_{i:n}) \,-\, \frac{1}{2}\left\|z_{i}\right\|_{\tilde{P}_{i}(a_{i:n})}^{2} \,+\, z'_{i}\tilde{\nu}_{i}(a_{i:n})\!\right\}\,, $$
$$\begin{array}{*{20}l} {}\tilde{c}_{n}(a_{n}) &= \mathsf{p}\log(2\pi) + \log \left|\overline{G}_{a_{n}}\right| + \left\|y_{n}-c_{a_{n}}\right\|_{\overline{G}_{a_{n}}}^{2}\,, \end{array} $$
$$\begin{array}{*{20}l} {}\tilde{P}_{n}^{-1}(a_{n}) &=B'_{a_{n}}\overline{G}_{a_{n}}^{-1}B_{a_{n}}\,, \end{array} $$
$$\begin{array}{*{20}l} {}\tilde{\nu}_{n}(a_{n})&=B'_{a_{n}}\overline{G}_{a_{n}}^{-1}(y_{n}-c_{a_{n}}) \end{array} $$
and, for all 1≤i≤n−1,
$$\begin{array}{*{20}l} {}\tilde{c}_{i}(a_{i:n}) &= \tilde{c}_{i|i+1}(a_{i+1:n}) + \mathsf{p}\log(2\pi) + \log|\overline{G}_{a_{i}}|\\ &\quad+ \left\|y_{i}-c_{a_{i}}\right\|_{\overline{G}_{a_{i}}}^{2}\,, \end{array} $$
$$\begin{array}{*{20}l} {}\tilde{P}_{i}^{-1}(a_{i:n}) &= \tilde{P}_{i|i+1}^{-1}(a_{i+1:n}) + B'_{a_{i}}\overline{G}_{a_{i}}^{-1}B_{a_{i}}\,, \end{array} $$
$$\begin{array}{*{20}l} {}\tilde{\nu}_{i}(a_{i:n})&=\tilde{\nu}_{i|i+1}(a_{i+1:n}) + B'_{a_{i}}\overline{G}_{a_{i}}^{-1}(y_{i}-c_{a_{i}})\,, \end{array} $$
where
$${} \begin{aligned} \Delta_{i+1}(a_{i+1:n}) &= \left(I_{\mathsf{m}} + H'_{a_{i+1}}\tilde{P}_{i+1}^{-1}(a_{i+1:n})H_{a_{i+1}}\right)^{-1}\,,\\ \tilde{r}_{i|i+1}(a_{i+1:n})&= \tilde{\nu}_{i+1}(a_{i+1:n}) + \overline{H}_{a_{i+1}}^{-1}d_{a_{i+1}}\,,\\ \tilde{c}_{i|i+1}(a_{i+1:n}) &= \tilde{c}_{i+1}(a_{i+1:n}) + \log|\overline{H}_{a_{i+1}}| + d'_{a_{i+1}}\overline{H}_{a_{i+1}}^{-1}d_{a_{i+1}}\\ &\quad-\log|H_{a_{i+1}}\Delta_{i+1}(a_{i+1:n})H'_{a_{i+1}}|\\ & \quad- \tilde{r}'_{i|i+1}(a_{i+1:n})H_{a_{i+1}}\Delta_{i+1}(a_{i+1:n})H'_{a_{i+1}}\tilde{r}_{i|i+1}(a_{i+1:n})\,,\\ \tilde{P}_{i|i+1}^{-1}(a_{i+1:n}) &=T'_{a_{i+1}}\left(I_{\mathsf{m}}-\overline{H}_{a_{i+1}}^{-1}H_{a_{i+1}}\Delta_{i+1}(a_{i+1:n})H'_{a_{i+1}}\right)\\ &\quad\times\overline{H}_{a_{i+1}}^{-1}T_{a_{i+1}}\,,\\ \tilde{\nu}_{i|i+1}(a_{i+1:n})&=T'_{a_{i+1}}\overline{H}_{a_{i+1}}^{-1}\left[-d_{a_{i+1}}+H_{a_{i+1}}\Delta_{i+1}(a_{i+1:n})H'_{a_{i+1}}\right.\\ &\quad\times\left.\left(\tilde{\nu}_{i+1}(a_{i+1:n})+\overline{H}_{a_{i+1}}^{-1}d_{a_{i+1}}\right)\right]\,. \end{aligned} $$


The result is proved by backward induction. Equations (24), (25) and (26) follow directly from (2). Assume that, for a given 1≤i≤n−1, p(y i+1:n |a i+1:n ,z i+1) is given by (23). Write
$$\begin{array}{*{20}l} p(y_{i:n}|a_{i:n},z_{i}) &= \int m(a_{i+1},z_{i};z_{i+1})g(a_{i},z_{i};y_{i})\\ &\quad\times p(y_{i+1:n}|a_{i+1:n},z_{i+1})\mathrm{d} z_{i+1}\,, \end{array} $$
where
$${} \begin{aligned} m(a_{i+1},z_{i};z_{i+1}) &= \exp\left\{-\frac{\mathsf{m}}{2}\log(2\pi)-\frac{1}{2}\log|\overline{H}_{a_{i+1}}|\right.\\ &\left.\quad-\frac{1}{2}\left\|z_{i+1}-d_{a_{i+1}}-T_{a_{i+1}}z_{i}\right\|_{\overline{H}_{a_{i+1}}}^{2}\right\}\,,\\ g(a_{i},z_{i};y_{i})& = \exp\left\{-\frac{\mathsf{p}}{2}\log(2\pi)-\frac{1}{2}\log|\overline{G}_{a_{i}}|\right.\\ &\left.\quad-\frac{1}{2}\left\|y_{i}-c_{a_{i}}-B_{a_{i}}z_{i}\right\|_{\overline{G}_{a_{i}}}^{2}\right\}\,,\\ p(y_{i+1:n}|a_{i+1:n},z_{i+1})& = \exp\left\{-\frac{1}{2}\tilde{c}_{i+1}(a_{i+1:n})\right. \\ &\quad - \frac{1}{2}\left\|z_{i+1}\right\|_{\tilde{P}_{i+1}(a_{i+1:n})}^{2} \\ &\left.\quad+ z'_{i+1}\tilde{\nu}_{i+1}(a_{i+1:n}){\vphantom{\frac{1}{2}}}\right\}\,. \end{aligned} $$
Let Δ i+1 and δ i+1 be given by:
$$\begin{array}{*{20}l} \Delta_{i+1}(a_{i+1:n}) &{:=} \left(I_{\mathsf{m}} + H'_{a_{i+1}}\tilde{P}_{i+1}^{-1}(a_{i+1:n})H_{a_{i+1}}\right)^{-1}\,,\\ \delta_{i+1}(a_{i+1:n}) &{:=} \tilde{\nu}_{i+1}(a_{i+1:n}) + \overline{H}_{a_{i+1}}^{-1}(d_{a_{i+1}}+T_{a_{i+1}}z_{i})\,. \end{array} $$
Then, \(\overline {H}_{a_{i+1}}^{-1} + \tilde {P}_{i+1}^{-1}(a_{i+1:n}) = \left (H_{a_{i+1}}\Delta _{i+1}(a_{i+1:n})H'_{a_{i+1}}\right)^{-1}\) and (27), (28) and (29) follow from
$${}\begin{aligned} \int \exp&\left\{-\frac{1}{2}\left\|z_{i+1}\right\|_{H_{a_{i+1}}\Delta_{i+1}(a_{i+1:n})H'_{a_{i+1}}}^{2} + z'_{i+1}\delta_{i+1}(a_{i+1:n})\right\}\mathrm{d} z_{i+1} \\ &=\exp\left\{\frac{\mathsf{m}}{2}\log(2\pi) + \frac{1}{2}\log|H_{a_{i+1}}\Delta_{i+1}(a_{i+1:n})H'_{a_{i+1}}|\right\}\\ &\quad\times\exp\left\{\frac{1}{2}\delta_{i+1}(a_{i+1:n})'H_{a_{i+1}}\Delta_{i+1}(a_{i+1:n})H'_{a_{i+1}}\delta_{i+1}(a_{i+1:n})\right\}\,. \end{aligned} $$
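The Lemma 2 recursion is straightforward to implement. Below is a minimal scalar (m = p = 1) sketch in Python; all parameter values are illustrative (not taken from the paper's model), and the closed-form likelihood available for n = 2 is used only as a numerical check of the recursion:

```python
import numpy as np

def log_gauss(x, mean, var):
    """Log-density of a scalar Gaussian N(mean, var) at x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def backward_info_recursion(ys, c, B, Gbar, d, T, Hbar):
    """Scalar (m = p = 1) version of the backward recursion of Lemma 2.

    Returns (c_tilde, Pinv_tilde, nu_tilde) at i = 1, so that
    log p(y_{1:n}|z_1) = -0.5*c_tilde - 0.5*Pinv_tilde*z1**2 + z1*nu_tilde.
    """
    n = len(ys)
    # Initialization at i = n.
    ct = np.log(2 * np.pi) + np.log(Gbar) + (ys[-1] - c) ** 2 / Gbar
    Pinv = B ** 2 / Gbar
    nu = B * (ys[-1] - c) / Gbar
    for i in range(n - 2, -1, -1):
        # Backward prediction step: Delta_{i+1}, r_{i|i+1}, c_{i|i+1}, etc.
        Delta = 1.0 / (1.0 + Hbar * Pinv)
        S = Hbar * Delta                 # scalar analogue of H Delta_{i+1} H'
        r = nu + d / Hbar                # r_{i|i+1}
        ct = ct + np.log(Hbar) + d ** 2 / Hbar - np.log(S) - S * r ** 2
        Pinv = T ** 2 * (1.0 - S / Hbar) / Hbar
        nu = (T / Hbar) * (-d + S * r)
        # Observation update at time i.
        ct += np.log(2 * np.pi) + np.log(Gbar) + (ys[i] - c) ** 2 / Gbar
        Pinv += B ** 2 / Gbar
        nu += B * (ys[i] - c) / Gbar
    return ct, Pinv, nu

# Illustrative parameters.
c, B, Gbar = 0.1, 1.2, 0.4
d, T, Hbar = 0.3, 0.8, 0.5
ys, z1 = [0.7, -0.2], 0.5

ct, Pinv, nu = backward_info_recursion(ys, c, B, Gbar, d, T, Hbar)
lp_rec = -0.5 * ct - 0.5 * Pinv * z1 ** 2 + z1 * nu

# For n = 2 the likelihood is available in closed form:
# p(y_{1:2}|z_1) = N(y_1; c + B z_1, Gbar) N(y_2; c + B(d + T z_1), Gbar + B^2 Hbar).
lp_direct = (log_gauss(ys[0], c + B * z1, Gbar)
             + log_gauss(ys[1], c + B * (d + T * z1), Gbar + B ** 2 * Hbar))
assert np.isclose(lp_rec, lp_direct)
```

The agreement of the two log-likelihood computations confirms the backward prediction formulas, in particular that the matrix appearing in the recursion is Δ i+1 (a i+1:n ).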

Lemma 3

For all 1≤i≤n,
$$\begin{aligned} \int \phi_{\mu_{i},\Sigma_{i}}(z_{i})p_{}(y_{i:n}|a_{i:n},z_{i})\mathrm{d} z_{i} &= \exp\left\{-\frac{1}{2}\log |\Sigma_{i}| - \frac{1}{2}\mu_{i}'\Sigma^{-1}_{i}\mu_{i}\right\}\\ &\quad\times\exp\left\{-\frac{1}{2}\tilde{c}_{i}(a_{i:n})+\frac{1}{2}\log|\tilde{\Omega}_{i}(a_{i:n})|\right.\\ &\quad\left.+\frac{1}{2}\tilde{z}'_{i}(a_{i:n})\tilde{\Omega}_{i}(a_{i:n})\tilde{z}_{i}(a_{i:n})\right\}\,, \end{aligned} $$
where ϕ μ,Σ is the probability density function of an m-dimensional Gaussian random variable with mean μ and covariance matrix Σ and
$$\begin{array}{*{20}l} &\tilde{\Omega}_{i}(a_{i:n}){:=} \left(\Sigma_{i}^{-1} + \tilde{P}_{i}^{-1}(a_{i:n})\right)^{-1}\\ &\text{and}\quad \tilde{z}_{i}(a_{i:n}){:=} \Sigma_{i}^{-1}\mu_{i}+\tilde{\nu}_{i}(a_{i:n}) \end{array} $$

and where \(\tilde{c}_{i}\), \(\tilde{P}_{i}\) and \(\tilde{\nu}_{i}\) are given in Lemma 2.


By Lemma 2,
$${}\begin{aligned} \phi_{\mu_{i},\Sigma_{i}}(z_{i})p(y_{i:n}|a_{i:n},z_{i}) = \exp\left\{-\frac{\mathsf{m}}{2}\log(2\pi) -\frac{1}{2}\log|\Sigma_{i}|-\frac{1}{2}\left\|z_{i}-\mu_{i}\right\|_{\Sigma_{i}}^{2}\right\}\\ \times\exp\left\{-\frac{1}{2}\tilde{c}_{i}(a_{i:n})-\frac{1}{2}\left\|z_{i}\right\|_{\tilde{P}_{i}(a_{i:n})}^{2}+z'_{i}\tilde{\nu}_{i}(a_{i:n})\right\}\,. \end{aligned} $$
The proof is completed noting that
$${}\begin{aligned} \int &\exp\left\{-\frac{1}{2}z'_{i}\tilde{\Omega}_{i}^{-1}(a_{i:n})z_{i} + z'_{i}(\Sigma_{i}^{-1}\mu_{i}+\tilde{\nu}_{i}(a_{i:n}))\right\}\mathrm{d} z_{i}\\ &= \exp\left\{\frac{\mathsf{m}}{2}\log(2\pi) + \frac{1}{2}\log|\tilde{\Omega}_{i}(a_{i:n})|\right.\\ &\left.\quad+\frac{1}{2}\left[\Sigma_{i}^{-1}\mu_{i}\,+\,\tilde{\nu}_{i}(a_{i:n})\!\right]'\!\tilde{\Omega}_{i}(a_{i:n})\left[\Sigma_{i}^{-1}\mu_{i}+\tilde{\nu}_{i}(a_{i:n})\right]\right\}\,. \end{aligned} $$
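In the scalar case, the identity of Lemma 3 can be checked against the closed-form marginal ∫N(z; μ, Σ)N(y; c + Bz, Ḡ)dz = N(y; c + Bμ, Ḡ + B²Σ). A minimal Python sketch with illustrative values, using the i = n initialization of Lemma 2:

```python
import numpy as np

# Illustrative scalar parameters (m = p = 1), not taken from the paper.
c, B, Gbar = 0.1, 1.2, 0.4      # observation: y = c + B z + noise, Var = Gbar
mu, Sigma = -0.3, 0.6           # Gaussian weight phi_{mu, Sigma}
y = 0.7

# Lemma 2 quantities at i = n.
c_t = np.log(2 * np.pi) + np.log(Gbar) + (y - c) ** 2 / Gbar
Pinv = B ** 2 / Gbar
nu = B * (y - c) / Gbar

# Lemma 3: Omega and z, then the closed-form log-value of the integral.
Omega = 1.0 / (1.0 / Sigma + Pinv)
z_t = mu / Sigma + nu
log_integral = (-0.5 * np.log(Sigma) - 0.5 * mu ** 2 / Sigma
                - 0.5 * c_t + 0.5 * np.log(Omega) + 0.5 * Omega * z_t ** 2)

# Direct computation: int N(z; mu, Sigma) N(y; c + B z, Gbar) dz
#                   = N(y; c + B mu, Gbar + B^2 Sigma).
v = Gbar + B ** 2 * Sigma
log_direct = -0.5 * (np.log(2 * np.pi * v) + (y - c - B * mu) ** 2 / v)
assert np.isclose(log_integral, log_direct)
```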

Lemma 4

Let (X t ,δ t ) t≥0 be the solution to the following SDE:
$$\begin{array}{*{20}l} \mathrm{d} X_{t} &= \left(\mu - \delta_{t}- \sigma^{2}/2 \right) dt + \sigma d W^{1}_{t} \,,\\ \mathrm{d} \delta_{t} &= \kappa\left(\alpha - \delta_{t}\right)dt + \eta dW^{2}_{t} \,, \end{array} $$
where \(\left (W_{t}^{1}\right)_{t\ge 0}\) and \(\left (W_{t}^{2}\right)_{t\ge 0}\) are standard Brownian motions such that \(\mathrm {d} \langle W_{t}^{1},W_{t}^{2} \rangle = \rho \mathrm {d} t\). Then, for all t≥0 and h>0,
$$\left(\begin{array}{c} X_{t+h} \\ \delta_{t+h} \end{array}\right) = d_{h} + T_{h} \left(\begin{array}{c} X_{t} \\ \delta_{t} \end{array}\right) + H_{h}\varepsilon\,, $$
where ε is a standard 2-dimensional Gaussian random variable and (with \(\overline {H}_{h}{:=} H_{h}'H_{h}\)),
$$ \begin{aligned} &d_{h} {:=} \left(\begin{array}{c} \left[\mu- \alpha - \sigma^{2}/2 \right] h + \alpha[1-e^{-\kappa h}]/\kappa\\ \alpha [1-e^{-\kappa h}] \end{array}\right) \,, \\ &T_{h} {:=} \left(\begin{array}{cc} 1 & -[1-e^{-\kappa h}]/\kappa \\ 0 & e^{-\kappa h} \end{array}\right) \,, \end{aligned} $$
$${} \begin{aligned} \overline{H}_{h}(1,1) & {:=} \sigma^{2} h+ \eta^{2}\left\{h + (1-e^{-2\kappa h})/(2\kappa) - 2(1-e^{-\kappa h})/\kappa \right\}/\kappa^{2} \\ &\quad -2\rho\eta\sigma\left\{h - (1-e^{-\kappa h})/\kappa \right\}/\kappa\,, \\ \overline{H}_{h}(1,2) & {:=} \left(\rho \eta \sigma-\eta^{2}/\kappa\right)\left(1-e^{-\kappa h} \right)/\kappa\\ &\quad+ \eta^{2}\left(1-e^{-2\kappa h} \right)/(2\kappa^{2})\,,\\ \overline{H}_{h}(2,1) & {:=} \overline{H}_{h}(1,2)\,,\\ \overline{H}_{h}(2,2) & {:=} \eta^{2}\left(1-e^{-2\kappa h} \right)/(2\kappa)\,. \end{aligned} $$


For all t≥0,
$$X_{t} = X_{0} + (\mu-\sigma^{2}/2)t - \int_{0}^{t}\delta_{s}\mathrm{d} s +\sigma W_{t}^{1} $$
and, as (δ t ) t≥0 is an Ornstein-Uhlenbeck process,
$$\delta_{t} = \delta_{0}\mathrm{e}^{-\kappa t} + \alpha(1-\mathrm{e}^{-\kappa t}) + \int_{0}^{t} \eta \mathrm{e}^{\kappa(s-t)}\mathrm{d} W^{2}_{s}\,. $$
Therefore,
$${}\begin{aligned} \int_{0}^{t}\delta_{s}\mathrm{d} s &= (\delta_{0}-\alpha)(1-\mathrm{e}^{-\kappa t})/\kappa + \alpha t \\ &\quad+ \eta\int_{0}^{t}\int_{0}^{s}\mathrm{e}^{\kappa(u-s)}\mathrm{d} W^{2}_{u}\mathrm{d} s\,,\\ &=(\delta_{0}-\alpha)(1-\mathrm{e}^{-\kappa t})/\kappa + \alpha t \\ &\quad+ (\eta/\kappa) \int_{0}^{t} (1-\mathrm{e}^{-\kappa (t-s)}) \mathrm{d} W^{2}_{s}\,. \end{aligned} $$
Defining \(\tilde {W}^{1}_{t} {:=} - (\eta /\kappa) \int _{0}^{t} (1-\mathrm {e}^{-\kappa (t-s)}) \mathrm {d} W^{2}_{s}+\sigma W_{t}^{1}\) and \(\tilde {W}_{t}^{2}{:=} \int _{0}^{t} \eta \mathrm {e}^{\kappa (s-t)}\mathrm {d} W^{2}_{s}\), this yields:
$$\begin{array}{*{20}l} {}X_{t} &= X_{0} \,+\, (\mu-\sigma^{2}/2)t \,+\, (\alpha-\delta_{0})(1-\mathrm{e}^{-\kappa t})/\kappa - \alpha t \,+\, \tilde{W}_{t}^{1}\,,\\ {}\delta_{t} &= \delta_{0}\mathrm{e}^{-\kappa t} + \alpha(1-\mathrm{e}^{-\kappa t}) + \tilde{W}^{2}_{t}\,. \end{array} $$
The proof is concluded upon noting that \(\tilde {W}^{1}_{t}\) and \(\tilde {W}^{2}_{t}\) are centered Gaussian random variables such that:
  • \( \operatorname{Var}\left[\tilde{W}_{t}^{1}\right]=\sigma^{2}t+\eta^{2}\left\{t+\left(1-\mathrm{e}^{-2\kappa t}\right)/(2\kappa)-2\left(1-\mathrm{e}^{-\kappa t}\right)/\kappa\right\}/\kappa^{2}-2\rho\eta\sigma\left\{t-\left(1-\mathrm{e}^{-\kappa t}\right)/\kappa\right\}/\kappa\,, \)

  • \( \operatorname{Var}\left[\tilde{W}_{t}^{2}\right]=\eta^{2}\left(1-\mathrm{e}^{-2\kappa t}\right)/(2\kappa)\,, \)

  • \( \operatorname{Cov}\left[\tilde{W}_{t}^{1},\tilde{W}_{t}^{2}\right]=\left(\rho\eta\sigma-\eta^{2}/\kappa\right)\left(1-\mathrm{e}^{-\kappa t}\right)/\kappa+\eta^{2}\left(1-\mathrm{e}^{-2\kappa t}\right)/(2\kappa^{2})\,. \)
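The matrices d h , T h and H̄ h of Lemma 4 are easy to compute numerically. The Python sketch below (with illustrative parameter values, not the estimates reported in the paper) builds them and checks the semigroup property that two exact h-steps compose into one 2h-step:

```python
import numpy as np

def discretize(mu, alpha, kappa, sigma, eta, rho, h):
    """Exact discretization (d_h, T_h, Hbar_h) of the two-factor SDE of Lemma 4."""
    e1, e2 = np.exp(-kappa * h), np.exp(-2 * kappa * h)
    d = np.array([(mu - alpha - sigma ** 2 / 2) * h + alpha * (1 - e1) / kappa,
                  alpha * (1 - e1)])
    T = np.array([[1.0, -(1 - e1) / kappa],
                  [0.0, e1]])
    H11 = (sigma ** 2 * h
           + eta ** 2 * (h + (1 - e2) / (2 * kappa) - 2 * (1 - e1) / kappa) / kappa ** 2
           - 2 * rho * eta * sigma * (h - (1 - e1) / kappa) / kappa)
    H12 = ((rho * eta * sigma - eta ** 2 / kappa) * (1 - e1) / kappa
           + eta ** 2 * (1 - e2) / (2 * kappa ** 2))
    H22 = eta ** 2 * (1 - e2) / (2 * kappa)
    Hbar = np.array([[H11, H12], [H12, H22]])
    return d, T, Hbar

# Illustrative parameters.
pars = dict(mu=0.05, alpha=0.1, kappa=1.5, sigma=0.3, eta=0.4, rho=0.6)
d1, T1, H1 = discretize(h=0.1, **pars)
d2, T2, H2 = discretize(h=0.2, **pars)

# Semigroup consistency: two h-steps must compose into one 2h-step.
assert np.allclose(T2, T1 @ T1)
assert np.allclose(d2, d1 + T1 @ d1)
assert np.allclose(H2, H1 + T1 @ H1 @ T1.T)

# Hbar_h is a valid covariance matrix (its Cholesky factor exists), so one
# exact simulation step from (X_t, delta_t) = (0, 0) is:
L = np.linalg.cholesky(H1)
x_next = d1 + L @ np.random.standard_normal(2)
```

The semigroup check exercises every entry of d h , T h and H̄ h at once, since it holds only if those formulas are the exact first and second moments of the time-homogeneous linear SDE.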



Acknowledgements

This work was developed during a 3-year Ph.D. at Télécom ParisTech and Lunalogic.

Authors’ contributions

All the authors have contributed to the conception of the algorithms and to the redaction of the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

LTCI, CNRS and Télécom ParisTech
Laboratoire de Mathématiques d’Orsay, Univ. Paris-Sud, CNRS, Université Paris-Saclay
Centre de Mathématiques Appliquées, UMR 7641, Ecole Polytechnique


  1. A Almansour, Convenience yield in commodity price modeling: a regime switching approach. Energy Econ. 53, 238–247 (2016).
  2. D Barber, Expectation correction for smoothed inference in switching linear dynamical systems. J. Mach. Learn. Res. 7, 2515–2540 (2006).
  3. M Briers, A Doucet, S Maskell, Smoothing algorithms for state-space models. Ann. Inst. Stat. Math. 62(1), 61–89 (2010).
  4. AE Bryson, M Frazier, Smoothing for linear and nonlinear dynamic systems. Proc. Optimum Sys. Synthesis Conf. (1963).
  5. S Barembruch, A Garivier, E Moulines, On optimal sampling for particle filtering in digital communication. IEEE 9th Workshop Signal Process. Adv. Wireless, 634–638 (2008).
  6. R Chen, JS Liu, Mixture Kalman filters. J. R. Stat. Soc. B 62, 493–508 (2000).
  7. A Doucet, S Godsill, C Andrieu, On sequential Monte Carlo sampling methods for Bayesian filtering. Stat. Comput. 10, 197–208 (2000).
  8. A Doucet, N Gordon, V Krishnamurthy, Particle filters for state estimation of jump Markov linear systems. IEEE Trans. Signal Process. 49(3), 613–624 (2001).
  9. AP Dempster, NM Laird, DB Rubin, Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39(1), 1–38 (1977), with discussion.
  10. P Del Moral, Mean Field Simulation for Monte Carlo Integration (CRC Press, London, 2013).
  11. P Fearnhead, P Clifford, On-line inference for hidden Markov models via particle filters. J. R. Stat. Soc. B 65, 887–899 (2003).
  12. W Fong, SJ Godsill, A Doucet, M West, Monte Carlo smoothing with application to audio signal enhancement. IEEE Trans. Signal Process. 50(2), 438–449 (2002).
  13. P Fearnhead, D Wyncoll, J Tawn, A sequential smoothing algorithm with linear computational cost. Biometrika 97, 447–464 (2010).
  14. R Gibson, ES Schwartz, Stochastic convenience yield and the pricing of oil contingent claims. J. Financ. 45(3), 959–976 (1990).
  15. M Hürzeler, HR Künsch, Monte Carlo approximations for general state-space models. J. Comput. Graph. Stat. 7, 175–193 (1998).
  16. N Hansen, S Kern, Evaluating the CMA evolution strategy on multimodal test functions. Eighth Int. Conf. Parallel Probl. Solving Nat. 72, 337–354 (2004).
  17. N Hansen, A Ostermeier, Completely derandomized self-adaptation in evolution strategies. Evol. Comput. 9(2), 159–195 (2001).
  18. RE Kalman, A new approach to linear filtering and prediction problems. J. Basic Eng. 82, 35–45 (1960).
  19. CJ Kim, Dynamic linear models with Markov-switching. J. Econ. 60(1-2), 1–22 (1994).
  20. F Lindsten, P Bunch, SJ Godsill, TB Schon, Rao-Blackwellized particle smoothers for mixed linear/nonlinear state-space models, in Proceedings of the 38th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE Signal Processing Society, New Jersey, 2013).
  21. F Lindsten, P Bunch, S Sarkka, TB Schon, SJ Godsill, Rao-Blackwellized particle smoothers for conditionally linear Gaussian models. IEEE J. Sel. Top. Signal Process. 10(2), 353–365 (2016).
  22. AE Rauch, CT Striebel, F Tung, Maximum likelihood estimates of linear dynamic systems. Am. Inst. Aeronaut. Astronaut. J. 3(8), 1445–1450 (1965).
  23. S Sarkka, Bayesian Filtering and Smoothing (Cambridge University Press, Cambridge, 2013).
  24. S Sarkka, P Bunch, SJ Godsill, A backward-simulation based Rao-Blackwellized particle smoother for conditionally linear Gaussian models, in Proceedings of the 16th IFAC Symposium on System Identification (SYSID) (Elsevier Ltd., Amsterdam, 2012).
  25. ES Schwartz, The stochastic behaviour of commodity prices: implications for pricing and hedging. J. Financ. 3, 923–973 (1997).
  26. T Schon, F Gustafsson, P-J Nordlund, Marginalized particle filters for mixed linear/nonlinear state-space models. IEEE Trans. Signal Process. 53(7), 2279–2289 (2005).


© The Author(s) 2017