Open Access

Online sequential Monte Carlo smoother for partially observed diffusion processes

EURASIP Journal on Advances in Signal Processing 2018, 2018:9

Received: 5 March 2017

Accepted: 11 January 2018

Published: 2 February 2018


This paper introduces a new algorithm to approximate smoothed additive functionals of partially observed diffusion processes. The method relies on a new sequential Monte Carlo method that computes such approximations online, i.e., as the observations are received, with a computational complexity growing linearly with the number of Monte Carlo samples. The original algorithm cannot be used for partially observed stochastic differential equations since the transition density of the latent data is usually unknown. We prove that it may be extended to partially observed continuous processes by replacing this unknown quantity with an unbiased estimator obtained, for instance, using general Poisson estimators. The resulting estimator is proved to be consistent, and its performance is illustrated using data from two models.


Stochastic differential equations, Smoothing, Sequential Monte Carlo methods, Online estimation

1 Introduction

This paper introduces a new algorithm to solve the smoothing problem for hidden Markov models (HMMs) whose hidden state is a solution to a stochastic differential equation (SDE). These models are referred to as partially observed diffusion (POD) processes in [27]. The hidden state process \((X_{t})_{t\ge 0}\) is assumed to be a solution to an SDE, and the only information available is given by noisy observations \((Y_{k})_{0\le k\le n}\) of the states \((X_{k})_{0\le k\le n}\) (where \(X_{k}\) stands for \(X_{t_{k}}\)) at some discrete time points \((t_{k})_{0\le k\le n}\). The bivariate stochastic process \(\{(X_{k},Y_{k})\}_{0\le k\le n}\) is a state-space model such that, conditional on the state sequence \((X_{k})_{0\le k\le n}\), the observations \((Y_{k})_{0\le k\le n}\) are independent and, for all \(0\le \ell\le n\), the conditional distribution of \(Y_{\ell}\) given \(\{X_{k}\}_{0\le k\le n}\) depends on \(X_{\ell}\) only.

Statistical inference for HMMs often requires solving Bayesian filtering and smoothing problems, i.e., computing the posterior distributions of sequences of hidden states given observations. The filtering problem refers to the estimation, for each \(0\le k\le n\), of the distribution of the hidden state \(X_{k}\) given the observations \((Y_{0},\ldots,Y_{k})\). Smoothing stands for the estimation of the distributions of the sequence of states \((X_{k},\ldots,X_{p})\) given observations \((Y_{0},\ldots,Y_{\ell})\) with \(0\le k\le p\le \ell\le n\). These posterior distributions are crucial to compute maximum likelihood estimators of unknown parameters using the observations \((Y_{0},\ldots,Y_{n})\) only. For instance, the E-step of the EM algorithm introduced in [9] boils down to the computation of a conditional expectation of an additive functional of the hidden states given all the observations up to time n. Similarly, by Fisher’s identity, recursive maximum likelihood estimates may be computed using the gradient of the log likelihood, which can be written as the conditional expectation of an additive functional of the hidden states. See [7, Chapters 10 and 11], [19, 23, 24, 31] for further references on the use of these smoothed expectations of additive functionals applied to maximum likelihood parameter inference in latent data models.

However, in most cases, these expectations cannot be computed exactly. Sequential Monte Carlo (SMC) methods are popular algorithms to approximate smoothing distributions with random particles associated with importance weights. [17, 22] introduced the first particle filters and smoothers for state-space models by combining importance sampling steps to propagate particles with resampling steps to duplicate or discard particles according to their importance weights. In the case of HMMs, approximations of the smoothing distributions may be obtained using the forward filtering backward smoothing algorithm (FFBS) and the forward filtering backward simulation algorithm (FFBSi) developed respectively in [11, 18, 22], and [16]. Both algorithms first require a forward pass which produces a set of particles and weights approximating the sequence of filtering distributions up to time n. Then, a backward pass is performed to compute new weights (FFBS) or sample trajectories (FFBSi) in order to approximate the smoothing distributions. Recently, [28] proposed a new SMC algorithm, the particle-based rapid incremental smoother (PaRIS), to approximate on-the-fly (i.e., using the observations as they are received) smoothed expectations of additive functionals. Unlike the FFBS algorithm, the complexity of this algorithm grows only linearly with the number of particles N, and contrary to the FFBSi algorithm, no backward pass is required. One of the best features of the PaRIS algorithm is that it may be implemented online, using the observations \((Y_{k})_{k\ge 0}\) as they are received, without any increasing storage requirements.

Unfortunately, these methods cannot be applied directly to POD processes since some elementary quantities, such as transition densities of the hidden states, are not available explicitly. In the context of SDEs, discretization procedures may be used to approximate transition densities. Examples include the classical Euler-Maruyama method, the Ozaki discretization, which proposes a linear approximation of the drift coefficient between two observations [29, 32], and Gaussian-based approximations using Taylor expansions of the conditional mean and variance of an observation given the observation at the previous time step [20, 21, 33]. Other approaches based on Hermite polynomial expansions were also introduced by [13] and were extended in several directions recently, see [25] and all the references on the approximation of transition densities therein. However, even the most recent discretization-based approximations of the transition densities induce a systematic bias, see for instance [8].
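The bias of a discretization-based approximation can be made concrete on a case where the transition density is known. The sketch below compares the Euler-Maruyama Gaussian approximation with the exact transition density of an Ornstein-Uhlenbeck process \(\mathrm{d}X_t = -\theta X_t\mathrm{d}t + \mathrm{d}W_t\). The model, the parameter values, and the function names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def gaussian_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def euler_density(x, y, delta, theta):
    """Euler-Maruyama approximation: X_{t+delta} | X_t = x ~ N(x - theta*x*delta, delta)."""
    return gaussian_pdf(y, x - theta * x * delta, delta)

def exact_ou_density(x, y, delta, theta):
    """Exact transition density of dX = -theta * X dt + dW over a time step delta."""
    mean = x * np.exp(-theta * delta)
    var = (1.0 - np.exp(-2.0 * theta * delta)) / (2.0 * theta)
    return gaussian_pdf(y, mean, var)

# Illustrative values: the time step is fixed by the observation times,
# so the discrepancy below cannot be reduced by adding particles.
x, y, delta, theta = 1.0, 0.5, 1.0, 1.0
gap = abs(euler_density(x, y, delta, theta) - exact_ou_density(x, y, delta, theta))
print(f"|euler - exact| = {gap:.3f}")
```

The gap is a deterministic error of the approximation itself, which is the kind of bias the unbiased estimators discussed in this paper are meant to avoid.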

To overcome this difficulty, [13] proposed to solve the filtering problem by combining SMC methods with an unbiased estimate of the transition densities based on the generalized Poisson estimator (GPE). In this case, only the Monte Carlo error has to be controlled as there is no Taylor expansion to approximate unknown transition densities, i.e., no discretization scheme is used. The only existing method that solves the smoothing problem for POD processes using SMC methods without any discretization procedure was proposed in [27] and extends the fixed-lag smoother of [26]. Using forgetting properties of the hidden chain, the algorithm improves the performance of [13] to approximate smoothing distributions, but at the cost of a bias, this time due to the fixed-lag approximation, that does not vanish as the number of particles grows to infinity.

In this paper, we propose to use SMC methods to obtain consistent approximations of smoothing expectations of POD processes by extending the PaRIS algorithm. The proposed algorithm approximates smoothed expectations of additive functionals online, with a complexity growing only linearly with the number of particles and without any discretization procedure or Taylor expansion of the transition densities. The crucial and simple result (Lemma 1) of the application of the PaRIS algorithm to POD processes is that the acceptance-rejection mechanism introduced in [10], which ensures the linear complexity of the procedure, is still correct when the transition densities are replaced by unbiased estimates. The usual FFBS and FFBSi algorithms may not extend this easily since they both require the computation of weights defined as ratios involving the transition densities; replacing these unknown quantities by unbiased estimates therefore does not lead to unbiased estimators of the weights. The linear version of the FFBSi algorithm proposed in [10] could be extended in a similar way as the PaRIS algorithm, but it would still require a backward pass and would not be an online smoother. The proposed generalized random version of the PaRIS algorithm, hereafter named the GRand PaRIS algorithm, may be applied not only to POD processes but also to any general state-space model where the transition density of the hidden chain may be approximated using a positive and unbiased estimator.

Section 2 describes the model and the smoothing quantities to be estimated. Section 3 provides the algorithm to approximate smoothed additive functionals using unbiased estimates of the transition density of the hidden states. This section also details the application of this algorithm when the transition densities are approximated using a GPE. In Section 4, classical convergence results for SMC smoothers are extended to the setting of this paper, and they are illustrated with numerical experiments in Section 5. All proofs are postponed to the Appendix.

2 Model and framework

Let \((X_{t})_{t\ge 0}\) be defined as a weak solution to the following SDE in \(\mathbb {R}^{d}\):
$$ X_{0} = x_{0}\quad\text{and}\quad \mathrm{d} X_{t} = \alpha(X_{t})\mathrm{d} t + \Gamma(X_{t})\mathrm{d} W_{t}\;, \tag{1} $$

where \((W_{t})_{t\ge 0}\) is a standard Brownian motion on \(\mathbb {R}^{d}\), \(\alpha : \mathbb {R}^{d}\to \mathbb {R}^{d}\), and \(\Gamma : \mathbb {R}^{d}\to \mathbb {R}^{d\times d}\). The solution to (1) is supposed to be partially observed at times \(t_{0}=0,\ldots,t_{n}\) through an observation process \((Y_{k})_{0\le k\le n}\) in \(\left (\mathbb {R}^{m}\right)^{n+1}\). In the following, for all \(0\le k\le n\), the state \(X_{t_{k}}\) at time \(t_{k}\) is referred to as \(X_{k}\). For all \(0\le k\le n\), the distribution of \(Y_{k}\) given \(X_{k}\) has a density with respect to a reference measure λ on \(\mathbb {R}^{m}\) given by \(g(X_{k},\cdot)\). For the sake of simplicity, the shorthand notation \(g_{k}(X_{k})\) for \(g(X_{k},Y_{k})\) is used. The distribution of \(X_{0}\) has a density with respect to a reference measure μ on \(\mathbb {R}^{d}\) given by χ. For all \(0\le k\le n-1\), the conditional distribution of \(X_{k+1}\) given \(X_{k}\) has a density \(q_{k}(X_{k},\cdot)\) with respect to μ.
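To fix ideas, the following sketch generates synthetic data from a scalar model of this form. It assumes a unit diffusion coefficient, a sine drift, and additive Gaussian observation noise; these choices, the function name `simulate_pod`, and the use of a fine Euler-Maruyama grid to generate the hidden path are illustrative assumptions (the path is therefore only an approximate draw from the SDE).

```python
import numpy as np

def simulate_pod(alpha, x0, obs_times, obs_std, n_sub=100, rng=None):
    """Simulate hidden states X_k at obs_times and observations Y_k = X_k + noise.

    The hidden path uses n_sub Euler-Maruyama sub-steps per observation interval
    and assumes a scalar SDE with unit diffusion coefficient.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = float(x0)
    states = [x]
    for k in range(len(obs_times) - 1):
        dt = (obs_times[k + 1] - obs_times[k]) / n_sub
        for _ in range(n_sub):
            x = x + alpha(x) * dt + np.sqrt(dt) * rng.standard_normal()
        states.append(x)
    states = np.array(states)
    observations = states + obs_std * rng.standard_normal(states.shape)
    return states, observations

times = np.arange(0.0, 5.5, 0.5)  # t_0 = 0, ..., t_n with n = 10
X, Y = simulate_pod(np.sin, x0=0.0, obs_times=times, obs_std=0.5,
                    rng=np.random.default_rng(0))
```

Only `Y` would be available to a smoother; `X` is kept here so that estimates can be compared with the simulated hidden states.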

For all \(0\le k\le k'\le n\), the joint smoothing distributions of the hidden states are defined, for any measurable function h on \((\mathbb {R}^{d})^{k'-k + 1}\), by:
$$\phi_{k:k'|n}\left[h\right] = \mathbb{E}\left[\left.h\left(X_{k},\ldots,X_{k'}\right)\right|Y_{0:n}\right] $$
and \(\phi_{k} = \phi_{k:k|k}\) denotes the filtering distributions. The aim of this paper is to approximate expectations of the form:
$$\begin{array}{*{20}l} \phi_{0:n\vert n}\left[H_{n}\right] =&\ \mathbb{E}\left[\left.H_{n}(X_{0:n})\right|Y_{0:n}\right], \\ \text{where}\ H_{n}=&\sum_{k=0}^{n-1}h_{k}(X_{k},X_{k+1})\;, \end{array} \tag{2} $$

when \(\{h_{k}\}_{k=0}^{n-1}\) are given functions on \(\mathbb {R}^{d}\times \mathbb {R}^{d}\). Smoothed additive functionals such as (2) are crucial for maximum likelihood inference in latent data models. These quantities appear naturally when computing the Fisher score in hidden Markov models or the intermediate quantity of the expectation maximization algorithm (see Section 5). They are also pivotal to design online expectation maximization-based algorithms, which motivates the method introduced in this paper: it does not require growing storage and can process observations online.
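As a small illustration of the additive structure of \(H_n\), the sketch below evaluates such a functional along one given path. The helper name `additive_functional` and the choice \(h_k(x,x') = x x'\) are hypothetical and serve only as an example.

```python
import numpy as np

def additive_functional(states, h):
    """Return H_n(x_{0:n}) = sum_{k=0}^{n-1} h(k, x_k, x_{k+1})."""
    return sum(h(k, states[k], states[k + 1]) for k in range(len(states) - 1))

path = np.array([0.0, 1.0, 3.0, 2.0])
# Illustrative choice h_k(x, x') = x * x' (not taken from the paper).
value = additive_functional(path, lambda k, x, xp: x * xp)
print(value)  # 0*1 + 1*3 + 3*2 = 9.0
```

The smoothing problem consists in averaging such a quantity over the conditional distribution of the hidden path given the observations, rather than evaluating it on a known path.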

The algorithm proposed in this paper is based on sequential Monte Carlo methods, which offer a flexible framework to approximate such distributions with weighted empirical measures associated with random samples. At each time step, the samples are moved randomly in \(\mathbb {R}^{d}\) and associated with importance weights. In general situations, the computation of these importance weights involves the unknown transition density of the process (1). The solution introduced in Section 3 requires an unbiased estimator of these unknown transition densities. Moreover, this estimator must be almost surely positive and upper bounded. Statistical inference of stochastic differential equations is an active area of research, and several solutions have been proposed to design unbiased estimates of these transition densities. These estimators require different assumptions on the model (1); we describe below several solutions that can be investigated.

General Poisson estimators. This paper focuses mainly on GPEs, which have been widely used recently and applied in a variety of disciplines. These estimators require that the diffusion coefficient Γ is constant and equal to the identity matrix, see [13]. They may be applied to reducible SDEs, for which there exists an invertible and infinitely differentiable function η such that the process \(\{Z_{t} = \eta(X_{t})\}_{t\ge 0}\) satisfies the SDE \(z_{0}=\eta(x_{0})\) and
$$ \mathrm{d}Z_{t} = \beta(Z_{t})\mathrm{d}t + \mathrm{d}W_{t}\;. \tag{3} $$
By Ito’s formula, it is straightforward to show that, in the case of a reducible diffusion, the Jacobian matrix of η satisfies
$$\nabla \eta = \Gamma^{-1}\;, $$
and, in the case d=1,
$$\beta:u \mapsto \frac{\alpha(\eta^{-1}(u))}{\Gamma(\eta^{-1}(u))} - \frac{\Gamma'\left(\eta^{-1}(u)\right)}{2}\;. $$
In the case of a scalar diffusion, this Lamperti transform is given by
$$\eta:x\mapsto \int_{x_{0}}^{x}\Gamma^{-1}(u)\mathrm{d}u\;. $$
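As a worked example of the scalar Lamperti transform, the sketch below takes \(\alpha(x)=\mu x\) and \(\Gamma(x)=\sigma x\) on \((0,\infty)\), for which \(\eta(x)=\log(x/x_0)/\sigma\) and the transformed drift is the constant \(\mu/\sigma-\sigma/2\). This model and the numerical values are assumptions chosen because the closed forms are easy to check; they are not one of the models studied in the paper.

```python
import numpy as np

mu, sigma, x0 = 0.3, 0.8, 1.0          # illustrative values
alpha = lambda x: mu * x               # drift
gamma = lambda x: sigma * x            # diffusion coefficient
dgamma = lambda x: sigma + 0.0 * x     # derivative of the diffusion coefficient

def lamperti(x, n_grid=20001):
    """eta(x) = int_{x0}^{x} du / gamma(u), computed with the trapezoidal rule."""
    u = np.linspace(x0, x, n_grid)
    f = 1.0 / gamma(u)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(u))

def beta_at(x):
    """Drift of Z = eta(X), evaluated at the point x = eta^{-1}(u)."""
    return alpha(x) / gamma(x) - dgamma(x) / 2.0

x = 2.5
print(lamperti(x), np.log(x / x0) / sigma)    # numerical vs closed-form eta
print(beta_at(x), mu / sigma - sigma / 2.0)   # constant drift after the transform
```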
In the general case, [3, Proposition 1] shows that when Γ is non-singular, the SDE is reducible if and only if, for all \(1\le i,j,k\le d\),
$$ \frac{\partial\Gamma^{-1}_{i,j}}{\partial x_{k}} = \frac{\partial\Gamma^{-1}_{i,k}}{\partial x_{j}}\;. \tag{4} $$
In the case of a diagonal matrix Γ, (4) is equivalent to assuming that, for each \(1\le i\le d\), \(\Gamma_{i,i}\) depends on \(x_{i}\) only. [3] notes that the reducibility condition (4) also holds for some non-diagonal matrices Γ. This is true in particular in the case d=2 for stochastic volatility models where Γ is of the form:
$$\Gamma(x) = \left(\begin{array}{cc} a(x_{1}) & a(x_{1})b(x_{2})\\ 0&c(x_{2}) \end{array}\right)\;. $$
GPEs consider that the process \((X_{t})_{t\ge 0}\) satisfies the SDE (1) with Γ being the identity matrix, i.e., we consider a diffusion after the application of the Lamperti transform. In addition, designing GPEs also requires that

  i) α is of the form \(\alpha(x) = \nabla_{x} A(x)\), where \(A: \mathbb {R}^{d} \to \mathbb {R}\) is a twice continuously differentiable function;

  ii) the function \(x \mapsto \left(\|\alpha(x)\|^{2} + \triangle A(x)\right)/2\) is lower bounded, where \(\triangle\) is the Laplace operator.


Assumption (i) is somewhat restrictive as it requires α to derive from a scalar potential; however, it has natural applications in many fields such as movement ecology, see [15]. Assumption (ii) is a technical condition which ensures that exact sampling of solutions to (1) is possible using acceptance-rejection methods, see for instance [4, 5, 13]. In addition to providing an unbiased estimate of the transition density, the GPE ensures that this estimate is almost surely positive. Moreover, as detailed below, under additional conditions, a GPE that is almost surely upper bounded can be defined.
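Assumptions (i) and (ii) can be checked by hand on simple drifts. As an illustration, the sketch below takes the scalar potential \(A(x)=-\cos(x)\), so that \(\alpha(x)=\sin(x)\), and evaluates \(x\mapsto(\alpha(x)^2+A''(x))/2\) on a grid. This sine drift is an assumed example (the exact parameterization of the models of Section 5 is not reproduced here), and the variable names are hypothetical.

```python
import numpy as np

# Potential A(x) = -cos(x), hence alpha(x) = A'(x) = sin(x) and A''(x) = cos(x).
alpha = lambda x: np.sin(x)
laplacian_A = lambda x: np.cos(x)

def phi(x):
    """phi(x) = (alpha(x)^2 + A''(x)) / 2 in the scalar case."""
    return 0.5 * (alpha(x) ** 2 + laplacian_A(x))

grid = np.linspace(-np.pi, np.pi, 100001)   # phi is 2*pi-periodic
lower, upper = phi(grid).min(), phi(grid).max()
print(lower, upper)  # close to -1/2 and 5/8
```

Writing \(c=\cos(x)\), the function equals \((1-c^2+c)/2\), which ranges over \([-1/2, 5/8]\) for \(c\in[-1,1]\); the function is therefore bounded from below and from above for this drift.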

Continuous importance sampling-based estimators. When the previous assumptions are not fulfilled, in particular assumption (i), alternatives to GPEs are given by continuous importance sampling procedures for SDEs. In [34], for each \(0\le k\le n-1\), the transition density between \(t_{k}\) and \(t_{k+1}\) is expressed as an infinite expansion obtained using the Kolmogorov backward operator associated with (1). This analytical expression of the transition density is not tractable and is estimated by updating random samples at random times between \(t_{k}\) and \(t_{k+1}\) using tractable proposal distributions (for instance, based on an Euler discretization of the original SDE). Then, these samples are associated with random weights to ensure that the proposed estimator is unbiased. More recently, [14] extended the discrete time importance sampling estimator by introducing updates at random times associated with a renewal process. The random samples are weighted using the Kolmogorov forward operator associated with the SDE, which relies on the first- and second-order derivatives of the drift and diffusion coefficients (and is therefore tractable).

The unbiasedness of these procedures and the control of the variability of the estimates require moment assumptions and Hölder-type conditions on the parameters of the SDE (1). Their efficiency requires a fair amount of tuning as it depends strongly on the proposal densities used to obtain the Monte Carlo samples and on the point processes generating the underlying random times. In addition to unbiasedness, the algorithm proposed in this work requires that the estimator of the transition density is almost surely positive and upper bounded. This implies additional assumptions on the SDE depending on the chosen estimate and could lead to interesting perspectives.

3 The generalized random PaRIS algorithm

The algorithm is based on the following link between the filtering and smoothing distributions for additive functionals, see [28]:
$$ \begin{aligned} \phi_{0:n|n}\left[h\right] &= \phi_{n}\left[T_{n}[h]\right]\;,\;\text{where}\; \\ T_{n}\left[h\right](X_{n}) &= \mathbb{E}\left[h(X_{0:n})\vert X_{n},Y_{0:n}\right]\;. \end{aligned} \tag{5} $$

The approximation of (5) first requires approximating the sequence of filtering distributions. Sequential Monte Carlo methods provide an efficient and simple solution to obtain these approximations using sets of particles \(\left \{\xi ^{\ell }_{k}\right \}_{\ell =1}^{N}\) associated with weights \(\left \{\omega ^{\ell }_{k}\right \}_{\ell =1}^{N}\), \(0\le k\le n\).

At time k=0, N particles \(\left \{\xi ^{\ell }_{0}\right \}_{\ell =1}^{N}\) are sampled independently according to \(\xi ^{\ell }_{0} \sim \eta _{0}\), where η0 is a probability density with respect to μ. Then, \(\xi ^{\ell }_{0}\) is associated with the importance weights \(\omega _{0}^{\ell } = \chi \left (\xi ^{\ell }_{0}\right)g_{0} \left (\xi ^{\ell }_{0}\right)/\eta _{0}\left (\xi ^{\ell }_{0}\right)\). For any bounded and measurable function h defined on \(\mathbb {R}^{d}\), the expectation ϕ0[h] is approximated by
$$\phi^{N}_{0}[\!h] = \frac{1}{\Omega_{0}^{N}} \sum_{\ell=1}^{N} \omega_{0}^{\ell} h \left(\xi^{\ell}_{0} \right)\;, \quad \Omega_{0}^{N}:= \sum_{\ell=1}^{N} \omega_{0}^{\ell}\;. $$
Then, for \(1\le k\le n\), using \(\left \{\left (\xi ^{\ell }_{k-1},\omega ^{\ell }_{k-1}\right)\right \}_{\ell =1}^{N}\), the auxiliary particle filter of [30] samples pairs \(\left \{\left (I^{\ell }_{k},\xi ^{\ell }_{k}\right)\right \}_{\ell =1}^{N}\) of indices and particles using an instrumental transition density \(p_{k-1}\) on \(\mathbb {R}^{d}\times \mathbb {R}^{d}\) and an adjustment multiplier function \(\vartheta_{k}\) on \(\mathbb {R}^{d}\). Each new particle \(\xi ^{\ell }_{k}\) and weight \(\omega ^{\ell }_{k}\) at time k are computed following these steps:

  - choose a particle index \(I^{\ell }_{k}\) at time k−1 in {1,…,N} with probabilities proportional to \(\omega _{k-1}^{j} \vartheta _{k} \left (\xi ^{j}_{k-1}\right)\), for j in {1,…,N};

  - sample \(\xi ^{\ell }_{k}\) using this chosen particle according to \(\xi ^{\ell }_{k} \sim p_{k-1}\left (\xi ^{I^{\ell }_{k}}_{k-1},\cdot \right)\);

  - associate the particle \(\xi ^{\ell }_{k}\) with the importance weight:
    $$ \omega^{\ell}_{k} := \frac{q_{k-1}\left(\xi_{k-1}^{I^{\ell}_{k}},\xi^{\ell}_{k}\right)g_{k}\left(\xi^{\ell}_{k}\right)}{\vartheta_{k}\left(\xi^{I^{\ell}_{k}}_{k-1}\right) p_{k-1} \left(\xi_{k-1}^{I^{\ell}_{k}},\xi^{\ell}_{k}\right)}\;. \tag{6} $$

The expectation \(\phi_{k}[h]\) is approximated by
$$\phi^{N}_{k}[\!h] := \frac{1}{\Omega_{k}^{N}} \sum_{\ell=1}^{N} \omega_{k}^{\ell} h \left(\xi^{\ell}_{k} \right)\;,\quad\Omega_{k}^{N}:= \sum_{\ell=1}^{N} \omega_{k}^{\ell}\;. $$
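The three steps above can be sketched as follows for scalar states when the transition density is available in closed form (in the POD setting it would be replaced by an estimate, as discussed later in this section). The function names `apf_step` and `filter_mean`, the callable arguments, and the toy random-walk model used in the example are assumptions made for illustration.

```python
import numpy as np

def apf_step(particles, weights, y, propose, proposal_pdf, trans_pdf, obs_pdf,
             adjust, rng):
    """One auxiliary particle filter step: selection, mutation, weighting."""
    n = particles.shape[0]
    # Selection: indices drawn with probabilities prop. to omega * adjustment.
    probs = weights * adjust(particles)
    probs = probs / probs.sum()
    idx = rng.choice(n, size=n, p=probs)
    parents = particles[idx]
    # Mutation: new particles drawn from the instrumental kernel p_{k-1}.
    new_particles = propose(parents, rng)
    # Weighting: ratio q * g / (adjustment * p), as in the weight formula above.
    new_weights = (trans_pdf(parents, new_particles) * obs_pdf(new_particles, y)
                   / (adjust(parents) * proposal_pdf(parents, new_particles)))
    return new_particles, new_weights, idx

def filter_mean(particles, weights):
    """Self-normalized estimate of the filtering expectation of h(x) = x."""
    return np.sum(weights * particles) / np.sum(weights)

# Toy example (assumed model): X_k = X_{k-1} + N(0, 1), Y_k = X_k + N(0, 1),
# with the bootstrap choice p = q and adjustment multiplier equal to 1.
norm_pdf = lambda z, m, v: np.exp(-0.5 * (z - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
rng = np.random.default_rng(1)
xi, w = rng.standard_normal(1000), np.ones(1000)
xi, w, _ = apf_step(xi, w, y=0.3,
                    propose=lambda x, r: x + r.standard_normal(x.shape),
                    proposal_pdf=lambda x, xp: norm_pdf(xp, x, 1.0),
                    trans_pdf=lambda x, xp: norm_pdf(xp, x, 1.0),
                    obs_pdf=lambda xp, y: norm_pdf(y, xp, 1.0),
                    adjust=lambda x: np.ones_like(x), rng=rng)
print(filter_mean(xi, w))
```

With the bootstrap choice the transition density cancels in the weight, which is why the bootstrap filter can be run even when the transition density is unknown, provided the dynamics can be sampled.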

The simplest choice for \(p_{k-1}\) and \(\vartheta_{k}\) is the bootstrap filter proposed by [17], which sets \(p_{k-1}=q_{k-1}\) and, for all \(x\in \mathbb {R}^{d}\), \(\vartheta_{k}(x)=1\). In the case of POD processes, \(q_{k-1}\) is unknown, but since any choice of \(p_{k-1}\) is admissible, it can be replaced by an approximation to sample the particles. This approximation can be obtained using a discretization scheme such as the Euler method or a Poisson-based approximation as detailed below. A more appealing choice is the fully adapted particle filter, which sets, for all \(x,x'\in \mathbb {R}^{d}\), \(p_{k-1}(x,x')\propto q_{k-1}(x,x')g_{k}(x')\) and, for all \(x\in \mathbb {R}^{d}\), \(\vartheta_{k} (x) = \int q_{k-1}\left (x,x'\right)g_{k}\left (x'\right)\mu \left (\mathrm {d} x'\right)\). Here again, \(q_{k-1}\) has to be replaced by an approximation. In Section 5, it is replaced by the Gaussian approximation provided by an Euler scheme, which leads to a Gaussian proposal density \(p_{k-1}\) as the observation model is linear and Gaussian.
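For scalar states, the Gaussian proposal obtained from an Euler approximation and a linear Gaussian observation can be written in closed form by a standard Gaussian conjugacy argument. The sketch below assumes a unit diffusion coefficient, a time step `delta`, and observations \(Y_k = X_k + \varepsilon_k\) with \(\varepsilon_k\sim\mathcal{N}(0,\texttt{obs\_var})\); the function name and these modelling details are assumptions, not the exact setting of Section 5.

```python
import numpy as np

def fully_adapted_euler(x, y, alpha, delta, obs_var):
    """Gaussian proposal and adjustment multiplier from an Euler approximation.

    Returns the mean and variance of the proposal p_{k-1}(x, .) and the
    multiplier, i.e. the approximate predictive density of y given x.
    """
    m = x + alpha(x) * delta                    # Euler predictive mean of the state
    var = delta * obs_var / (delta + obs_var)   # conditional variance given y
    mean = var * (m / delta + y / obs_var)      # conditional mean given y
    s2 = delta + obs_var                        # predictive variance of y
    theta = np.exp(-0.5 * (y - m) ** 2 / s2) / np.sqrt(2.0 * np.pi * s2)
    return mean, var, theta

mean, var, theta = fully_adapted_euler(x=np.array([0.0, 1.0]), y=0.4,
                                       alpha=np.sin, delta=0.5, obs_var=0.25)
print(mean, var, theta)
```

Since the Euler density is only an approximation of the true transition density, this proposal is not exactly fully adapted; the importance weights still have to correct for the difference.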

The PaRIS algorithm uses the same decomposition as the FFBS algorithm introduced in [12] and the FFBSi algorithm proposed by [16] to approximate smoothing distributions. It combines the forward-only version of the FFBS algorithm with the sampling mechanism of the FFBSi algorithm. It does not produce an approximation of the smoothing distributions but of the smoothed expectation of a fixed additive functional, and thus may be used to approximate (2). Its crucial property is that it does not require a backward pass: the smoothed expectation is computed on-the-fly with the particle filter, and no storage of the past particles or weights is needed.

The PaRIS algorithm relies on the following fundamental property of \(T_{k}[H_{k}]\) when \(H_{k}\) is as in (2):
$$\begin{aligned} T_{k}\left[H_{k}\right](X_{k}) &\,=\,\mathbb{E}\left[\left.T_{k-1}\left[H_{k-1}\right]\left(X_{k-1}\right) + h_{k-1}\left(X_{k-1},X_{k}\right)\right|X_{k},Y_{0:k-1} \right]\;,\\ &\,=\, \frac{\int \phi_{k-1}\left(\!\mathrm{d} x_{k-1}\!\right)q_{k-1}\!\left(\!x_{k-1},X_{k}\!\right)\!\left\{\!T_{k-1}\!\left[\!H_{k-1}\!\right]\!\left(\!x_{k-1}\!\right) \!+ \!h_{k-1}\!\left(x_{k-1},X_{k}\!\right)\!\right\}}{\int \phi_{k-1}\left(\mathrm{d} x_{k-1}\right)q_{k-1}\left(x_{k-1},X_{k}\right)}. \end{aligned} $$
Therefore, [28] introduces sufficient statistics \(\tau ^{i}_{k}\) (starting with \(\tau ^{i}_{0} = 0\), \(1\le i\le N\)), approximating \(T_{k}\left [H_{k}\right ]\left (\xi ^{i}_{k}\right)\), for \(1\le i\le N\) and \(0\le k\le n\). First, replacing \(\phi_{k-1}\) by \(\phi ^{N}_{k-1}\) in the last equation leads to the following approximation of \(T_{k}\left [H_{k}\right ]\left (\xi ^{i}_{k}\right)\):
$$\begin{array}{*{20}l} T_{k}^{N}&\left[H_{k}\right]\left(\xi_{k}^{i}\right) =\\ &\sum_{j=1}^{N} \Lambda_{k-1}^{N}(i,j)\!\left\{\!T_{k-1}\left[H_{k-1}\right]\!\left(\xi_{k-1}^{j}\right) \,+\, h_{k-1}\left(\xi^{j}_{k-1},\xi^{i}_{k}\right)\right\}\;, \end{array} \tag{7} $$
where
$$ \Lambda_{k}^{N}(i,\ell) = \frac{\omega^{\ell}_{k} {q_{k}}\left(\xi^{\ell}_{k},\xi_{k+1}^{i}\right)}{\sum_{j=1}^{N}\omega^{j}_{k} {q_{k}}\left(\xi^{j}_{k},\xi_{k+1}^{i}\right)}\;,\quad 1\le \ell\le N\;. \tag{8} $$
Computing these approximations exactly would lead to a complexity growing quadratically with N because of the normalizing constant in (8). Therefore, the PaRIS algorithm samples particles in the set \(\left \{\xi ^{j}_{k-1}\right \}_{j=1}^{N}\) with probabilities \(\Lambda _{k-1}^{N}(i,\cdot)\) to approximate the expectation (7) and produce \(\tau ^{i}_{k}\). Choosing \(\tilde {N}\ge 1\), at each time step \(0\le k\le n-1\) these statistics are updated according to the following steps:

  (i) Run one step of a particle filter to produce \(\left \{\left (\xi ^{\ell }_{k+1}, \omega ^{\ell }_{k+1}\right)\right \}\) for \(1\le \ell\le N\).

  (ii) For all \(1\le i\le N\), sample independently \(J_{k}^{i,\ell }\) in {1,…,N} for \(1\le \ell \le \widetilde N\) with probabilities \(\Lambda _{k}^{N}(i,\cdot)\), given by (8).

  (iii) Set
    $$\tau^{i}_{k+1} := \frac{1}{\widetilde{N}} \sum^{\widetilde{N}}_{\ell=1} \left\{ \tau^{J_{k}^{i,\ell}}_{k} + h_{k} \left(\xi^{J_{k}^{i,\ell}}_{k}, \xi^{i}_{k+1}\right) \right\}\;. $$
Then, (2) is approximated by
$$\phi_{0:n\vert n}^{N}\left[\tau_{n}\right] = \frac{1}{\Omega_{n}^{N}}\sum_{i=1}^{N} \omega^{i}_{n} \tau_{n}^{i}\;. $$
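The update of the statistics and the final estimate can be sketched as follows for scalar states and a known transition density. For readability, this sketch samples the backward indices by normalizing the probabilities explicitly, which costs \(\mathcal{O}(N)\) per particle; it is therefore not the linear-complexity implementation. The function names and signatures are hypothetical.

```python
import numpy as np

def paris_update(tau, particles, weights, new_particles, trans_pdf, h, n_tilde, rng):
    """Update the statistics tau_k -> tau_{k+1} with n_tilde backward draws each.

    particles, weights: particle system at time k; new_particles: time k + 1.
    """
    n = particles.shape[0]
    new_tau = np.empty(n)
    for i in range(n):
        # Backward probabilities: prop. to omega_k^j * q_k(xi_k^j, xi_{k+1}^i).
        probs = weights * trans_pdf(particles, new_particles[i])
        probs = probs / probs.sum()
        js = rng.choice(n, size=n_tilde, p=probs)     # backward indices
        new_tau[i] = np.mean(tau[js] + h(particles[js], new_particles[i]))
    return new_tau

def smoothed_estimate(tau, weights):
    """Self-normalized estimate of the smoothed additive functional."""
    return np.sum(weights * tau) / np.sum(weights)

# Sanity check on a toy system: with h = 1 every statistic increases by one.
rng = np.random.default_rng(0)
xi_k, xi_next = rng.standard_normal(50), rng.standard_normal(50)
q = lambda x, xp: np.exp(-0.5 * (xp - x) ** 2)
tau_next = paris_update(np.zeros(50), xi_k, np.ones(50), xi_next, q,
                        lambda x, xp: np.ones_like(x), n_tilde=2, rng=rng)
print(smoothed_estimate(tau_next, np.ones(50)))  # 1.0
```

Only the current particles, weights, and statistics enter the update, so the memory footprint does not grow with the number of observations processed.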

It is clear from steps (i) to (iii) that each time a new observation \(Y_{n+1}\) is received, the quantities \(\left (\tau _{n+1}^{i}\right)_{1\le i \le N}\) can be updated using only \(Y_{n+1}\), \(\left (\tau _{n}^{i}\right)_{1\le i \le N}\) and the particle filter at time n. This means that storage requirements do not increase when processing additional data.

As proved in [28], the algorithm is asymptotically consistent (as N goes to infinity) for any precision parameter \(\tilde N\). However, there is a significant qualitative difference between the cases \(\tilde {N} = 1\) and \(\tilde {N} \geq 2\) (see [28] for a detailed discussion). As for the FFBSi algorithm, when there exists \(\sigma_{+}\) such that \(0<q_{k}<\sigma_{+}\), the PaRIS algorithm may be implemented with \(\mathcal {O}(N)\) complexity using the accept-reject mechanism of [10].

In general situations, the PaRIS algorithm cannot be used for stochastic differential equations as \(q_{k}\) is unknown. Therefore, the computation of the importance weights \(\omega _{k}^{\ell }\) and of the acceptance ratio of [10] is not tractable. Following [13, 27], filtering weights can be approximated by replacing \({q_{k}}\left (\xi ^{\ell }_{k},\xi _{k+1}^{i}\right)\) by an unbiased estimator \(\widehat {q}_{k}\left (\xi ^{\ell }_{k},\xi _{k+1}^{i};\zeta _{k}\right)\), where \(\zeta_{k}\) is a random variable in \(\mathbb {R}^{q}\) such that
$$\begin{array}{*{20}l} &\widehat{q}_{k}\left(\!\xi^{\ell}_{k},\xi_{k+1}^{i};\zeta_{k}\!\right)\!>\! 0~~\text{a.s}\quad\text{and}\\ &\mathbb{E}\left[\left.\widehat{q}_{k}\left(\xi^{\ell}_{k},\xi_{k+1}^{i};\zeta_{k}\right)\right| \mathcal{G}_{k+1}^{N}\right] = {q_{k}}\left(\xi^{\ell}_{k},\xi_{k+1}^{i}\right)\;, \end{array} $$
where for all \(0\le k\le n\),
$$\begin{array}{*{20}l} {}\mathcal{F}_{k}^{N} &= \sigma\left\{Y_{0:k};\left(\xi^{\ell}_{u},\omega^{\ell}_{u},\tau^{\ell}_{u}\right);J_{v}^{\ell,j};~1\le \ell\le N,~0\le u\le k, 1\right.\\ &\qquad\left.{\vphantom{\tau^{\ell}_{u}}}\le j \le \widetilde{N}, 0\le v< k\right\}\;,\\ {}\mathcal{G}_{k+1}^{N} &= \mathcal{F}_{k}^{N} \vee \sigma\left\{Y_{k+1};\left(\xi^{\ell}_{k+1},\omega^{\ell}_{k+1}\right);~1\le \ell\le N\right\}\;. \end{array} $$
Practical choices for \(\zeta_{k}\) are discussed below, see for instance (14), which presents the choice made for the implementation of such estimators in our context. In the case where \(q_{k}\) is unknown, the filtering weights in (6) then become
$$ \widehat{\omega}^{\ell}_{k} := \frac{\widehat{q}_{k-1}\left(\xi_{k-1}^{I^{\ell}_{k}},\xi^{\ell}_{k};\zeta_{k-1}\right)g_{k}\left(\xi^{\ell}_{k}\right)}{\vartheta_{k}\left(\xi^{I^{\ell}_{k}}_{k-1}\right) p_{k-1} \left(\xi_{k-1}^{I^{\ell}_{k}},\xi^{\ell}_{k}\right)}\;. \tag{9} $$

In Algorithm 1, M independent copies \(\left (\zeta ^{m}_{k-1}\right)_{1\le m \le M}\) of \(\zeta_{k-1}\) are sampled, and the empirical mean of the associated estimates of the transition density is used to compute \(\widehat {\omega }^{\ell }_{k}\) instead of a single realization. Therefore, to obtain a generalized random version of the PaRIS algorithm, we only need to be able to sample from the discrete probability distribution \(\Lambda _{k}^{N}(i,\cdot)\) in the case of POD processes.
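The averaging over M copies can be sketched as below. The helper `estimated_weight`, its arguments, and the toy estimator `q_hat` (the true density multiplied by an independent positive variable of mean one) are assumptions used only to illustrate that the averaged weight remains unbiased while its variance decreases with M.

```python
import numpy as np

def estimated_weight(x_prev, x_new, y, q_hat, obs_pdf, adjust, proposal_pdf,
                     n_copies, rng):
    """Filtering weight with the transition density replaced by the average of
    n_copies independent unbiased estimates q_hat(x_prev, x_new, rng)."""
    q_bar = np.mean([q_hat(x_prev, x_new, rng) for _ in range(n_copies)], axis=0)
    return q_bar * obs_pdf(x_new, y) / (adjust(x_prev) * proposal_pdf(x_prev, x_new))

# Toy setting (assumed): Gaussian transition density q and an artificial
# unbiased, positive, bounded estimator q * 2U with U uniform on (0, 1).
q = lambda x, xp: np.exp(-0.5 * (xp - x) ** 2) / np.sqrt(2.0 * np.pi)
q_hat = lambda x, xp, r: q(x, xp) * 2.0 * r.uniform()
obs = lambda xp, y: np.exp(-0.5 * (y - xp) ** 2) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(0)
w_hat = estimated_weight(0.0, 0.5, 0.2, q_hat, obs, adjust=lambda x: 1.0,
                         proposal_pdf=q, n_copies=2000, rng=rng)
w_exact = obs(0.5, 0.2)   # exact weight when the proposal equals q
print(w_hat, w_exact)
```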

Consider the following assumption: for all \(0\le k\le n-1\), there exists a random variable \(\hat {\sigma }^{k}_{+}\) measurable with respect to \(\mathcal {G}_{k+1}^{N}\) such that,
$$ \text{sup}_{x,y,\zeta}\;\widehat{q}_{k}(x,y;\zeta)\leq \hat{\sigma}^{k}_+\;. \tag{10} $$

Lemma 1

Assume that (10) holds for some \(0\le k\le n-1\). For all \(1\le i\le N\), define the random variable \(J_{k}^{i}\) as follows:

Then, the conditional probability distribution given \(\mathcal {G}_{k+1}^{N}\) of \(J_{k}^{i}\) is \(\Lambda _{k}^{N}(i,\cdot)\).


See Appendix. □
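The construction of \(J_k^i\) is not reproduced in the text above, so the sketch below implements one natural accept-reject scheme of this kind under stated assumptions: a candidate index is drawn with probabilities proportional to the filtering weights, a fresh estimate of the transition density is computed, and the candidate is accepted with probability equal to this estimate divided by the bound. The function name, its arguments, and the toy estimator are hypothetical.

```python
import numpy as np

def backward_index_ar(weights, particles, x_new, q_hat, sigma_plus, rng,
                      max_iter=100_000):
    """Draw one backward index by accept-reject with an unbiased estimate.

    q_hat(x, x_new, rng) is assumed positive, unbiased for q(x, x_new), and
    bounded above by sigma_plus.
    """
    probs = weights / weights.sum()
    for _ in range(max_iter):
        j = rng.choice(len(weights), p=probs)     # candidate from the weights
        if rng.uniform() * sigma_plus <= q_hat(particles[j], x_new, rng):
            return j                              # candidate accepted
    raise RuntimeError("no candidate accepted; the bound may be too loose")

# Toy check (assumed setting): compare empirical frequencies with the target
# probabilities, proportional to weight times true transition density.
rng = np.random.default_rng(0)
q = lambda x, xp: np.exp(-0.5 * (xp - x) ** 2) / np.sqrt(2.0 * np.pi)
q_hat = lambda x, xp, r: q(x, xp) * 2.0 * r.uniform()   # unbiased, <= 2 sup q
xi = np.array([-1.0, 0.0, 2.0])
w = np.array([1.0, 2.0, 1.0])
x_new = 0.5
draws = [backward_index_ar(w, xi, x_new, q_hat, 2.0 / np.sqrt(2.0 * np.pi), rng)
         for _ in range(5000)]
freq = np.bincount(draws, minlength=3) / 5000
target = w * q(xi, x_new)
target = target / target.sum()
print(freq, target)
```

The accepted index has probability proportional to the weight times the expected value of the estimate, which is the true transition density by unbiasedness; no normalizing sum over the N particles is ever computed.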

Note that Lemma 1 still holds if assumption (10) is relaxed and replaced by
$$ \text{sup}_{j,y,\zeta}\;\widehat{q}_{k}\left(\xi^{j}_{k},y;\zeta\right)\leq \hat{\sigma}^{k}_+\;. \tag{11} $$
It is worth noting that under assumption (10) or (11), the linear complexity property of the PaRIS algorithm is ensured. The following assumption can also be considered. For all \(1\le i\le N\),
$$ \text{sup}_{j,\zeta}\;\widehat{q}_{k}\left(\xi^{j}_{k},\xi^{i}_{k+1};\zeta\right)\leq \hat{\sigma}^{k,i}_+\;. \tag{12} $$

If only assumption (12) holds, the algorithm has a quadratic complexity. The bound of (10) is uniform (it does not depend on the particles) and can be used for every particle \(1\le i\le N\). However, this bound can be large (with respect to the simulated set of particles) for the algorithm of Lemma 1. The bound of (12) requires N computations per particle (therefore, \(N^{2}\) computations). However, this second bound is sharper than the one of (10) for the acceptance-rejection procedure and may lead to a computationally more efficient algorithm.

Bounded estimator of \(q_{k}\) using GPEs. For \(x, y \in \mathbb {R}^{d}\), by Girsanov and Itô’s formulas, the transition density \(q_{k}(x,y)\) of (1) satisfies, with \(\Delta_{k} = t_{k+1} - t_{k}\),
$$\begin{array}{*{20}l} {}q_{k}(x,y)=&\varphi_{\Delta_{k}}(x,y)\exp\left\lbrace A(y)-A(x)\right\rbrace\\ &\mathbb{E}_{\mathbb{W}^{x,y,\Delta_{k}}}\left[ \exp \left\lbrace - \int_{0}^{\Delta_{k}} \phi(\mathsf{w}_{s})\mathrm{d} s \right\rbrace \right]\;, \end{array} $$
where \(\mathbb {W}^{x,y,\Delta _{k}}\) is the law of the Brownian bridge starting at x at time 0 and hitting y at time \(\Delta_{k}\), \((\mathsf {w}_{t})_{0\leq t \leq \Delta _{k}}\) is such a Brownian bridge, \(\varphi _{\Delta _{k}}(x,y)\) is the p.d.f. of a normal distribution with mean x and variance \(\Delta_{k}\), evaluated at y, and \(\phi :\mathbb {R}^{d}\to \mathbb {R}\) is defined as
$$\phi(x) =\left(\|\alpha(x)\|^{2} + \triangle A(x)\right)/2\;. $$

Assume that there exist random variables \(\mathsf{L}_{\mathsf{w}}\) and \(\mathsf{U}_{\mathsf{w}}\) such that for all \(0\le s\le \Delta_{k}\), \(\mathsf{L}_{\mathsf{w}}\le \phi(\mathsf{w}_{s})\le \mathsf{U}_{\mathsf{w}}\). The performance of the estimator depends on the choice of \(\mathsf{L}_{\mathsf{w}}\) and \(\mathsf{U}_{\mathsf{w}}\), which is specific to the SDE. In the case of the models analyzed in Section 5, these bounds are discussed in [13] for the SINE model and in [27] for the log-growth model. Note that in the case where ϕ is not upper bounded, [5] proposed the EA3 algorithm. This layered Brownian bridge construction first samples random variables to determine in which layer the Brownian bridge lies before simulating the bridge conditional on the event that it belongs to the layer. By continuity of ϕ, \(\mathsf{L}_{\mathsf{w}}\) and \(\mathsf{U}_{\mathsf{w}}\) can then be computed easily.

Let κ be a random variable taking values in \(\mathbb {N}\) with distribution μ, let \((U_{j})_{1\le j\le \kappa}\) be independent uniform random variables on \([0,\Delta_{k}]\), and set \(\zeta_{k}=\{\kappa,\mathsf{w},U_{1},\ldots,U_{\kappa}\}\). As shown in [13], a positive unbiased estimator is given by
$$\begin{array}{*{20}l} {}\widehat{q}_{k}(x,y;\zeta_{k}) =&\ \varphi_{\Delta_{k}}(x,y) \exp \left\{A(y) - A(x)\right\}\\ &\ \times\text{exp}\left\{-\mathsf{U}_{\mathsf{w}}\Delta_{k}\right\}\frac{\Delta_{k}^{\kappa}}{\mu(\kappa)\kappa!}\prod_{j=1}^{\kappa}\left(\mathsf{U}_{\mathsf{w}}-\phi\left(\mathsf{w}_{U_{j}}\right)\right)\;. \end{array} \tag{13} $$
Interesting choices of μ are discussed in [13], and we focus here on the so-called GPE-1, where μ is a Poisson distribution with intensity \((\mathsf{U}_{\mathsf{w}}-\mathsf{L}_{\mathsf{w}})\Delta_{k}\). In that case, the estimator (13) becomes
$$\begin{array}{*{20}l} {}\widehat{q}_{k}&(x,y;\zeta_{k}) =\\ {}&\varphi_{\Delta_{k}}(x,y) \exp \left\{A(y) - A(x)- \mathsf{L}_{\mathsf{w}}\Delta_{k} \right\}\prod_{j=1}^{\kappa}\frac{\mathsf{U}_{\mathsf{w}}-\phi\left(\mathsf{w}_{U_{j}}\right)}{\mathsf{U}_{\mathsf{w}}-\mathsf{L}_{\mathsf{w}}}\;. \end{array} \tag{14} $$
On the r.h.s. of (14), the product over κ elements is bounded by 1. Therefore, a sufficient condition to satisfy one of the assumptions (10)–(12) is that the function
$$\begin{array}{*{20}l} {}\rho_{\Delta_{k}}:\; \mathbb{R}^{d}\times \mathbb{R}^{d} &\mapsto \mathbb{R}\\ (x,y)&\mapsto \varphi_{\Delta_{k}}(x,y) \exp \left\{A(y) - A(x)- \mathsf{L}_{\mathsf{w}}\Delta_{k} \right\} \end{array} $$

is upper bounded almost surely by \(\hat {\sigma }^{k}_{+}\). In particular, if \(\mathsf{L}_{\mathsf{w}}\) is bounded below almost surely, (14) always satisfies assumption (12) and Algorithm 1 can be used. This condition is always satisfied for models in the domains required for the applications of exact algorithms EA1, EA2, and EA3 defined in [6].
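A minimal scalar sketch of the GPE-1 estimate is given below. It makes a simplifying assumption: the bounds on \(\phi\) are deterministic constants `lower` and `upper` valid on the whole real line, which is possible when \(\phi\) is bounded (for instance for a sine drift), whereas the text allows path-dependent bounds. The function names, the sequential Brownian bridge sampler, and the numerical values are illustrative choices.

```python
import numpy as np

def brownian_bridge(x, y, delta, times, rng):
    """Sample a scalar Brownian bridge from (0, x) to (delta, y) at sorted times."""
    values = np.empty(len(times))
    t_prev, w_prev = 0.0, x
    for i, t in enumerate(times):
        # Law of w_t given w_{t_prev} and the end point w_delta = y.
        mean = w_prev + (t - t_prev) / (delta - t_prev) * (y - w_prev)
        var = (t - t_prev) * (delta - t) / (delta - t_prev)
        w_prev = mean + np.sqrt(var) * rng.standard_normal()
        values[i] = w_prev
        t_prev = t
    return values

def gpe1(x, y, delta, A, phi, lower, upper, rng):
    """One draw of the GPE-1 estimate with constant bounds lower <= phi <= upper."""
    kappa = rng.poisson((upper - lower) * delta)            # number of time points
    times = np.sort(rng.uniform(0.0, delta, size=kappa))    # uniform times, sorted
    w = brownian_bridge(x, y, delta, times, rng)
    gauss = np.exp(-0.5 * (y - x) ** 2 / delta) / np.sqrt(2.0 * np.pi * delta)
    rho = gauss * np.exp(A(y) - A(x) - lower * delta)       # deterministic factor
    return rho * np.prod((upper - phi(w)) / (upper - lower))

# Assumed example: A(x) = -cos(x), so phi = (sin^2 + cos) / 2 lies in [-1/2, 5/8].
rng = np.random.default_rng(0)
A_sin = lambda x: -np.cos(x)
phi_sin = lambda x: 0.5 * (np.sin(x) ** 2 + np.cos(x))
draws = np.array([gpe1(0.0, 0.3, 0.5, A_sin, phi_sin, -0.5, 0.625, rng)
                  for _ in range(2000)])
print(draws.mean(), draws.max())
```

Each factor of the product lies in \([0,1]\), so every draw is bounded by the deterministic factor `rho`; averaging several draws reduces the variance without introducing bias. For a constant drift \(c\), \(\phi\) is constant and the estimate reduces to the exact Gaussian transition density, which gives a simple correctness check.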

When (10) or (11) holds, it can nonetheless be of practical interest to choose the bounds \(\hat {\sigma }^{k,i}_{+}\), \(1\le i\le N\), corresponding to (12). Indeed, this might significantly increase the acceptance rate of the algorithm, and therefore reduce the number of draws of the random variable \(\zeta_{k}\), which has a much higher cost than the computation of \(\rho _{\Delta _{k}}\), as it requires simulations of Brownian bridges. Moreover, this option avoids numerical optimization if no analytical expression of \(\hat {\sigma }_{+}^{k}\) is available. In practice, this seems more efficient in terms of computational time when N has moderate values.

4 Convergence results

Consider the following assumptions.
  • H1 (i) For all k≥0 and all \(x\in \mathbb {R}^{d}\), g k (x)>0.
       (ii) \(\underset {k\geq 0}{\sup }|g_{k}|_{\infty } < \infty \).

Assumption H1 only involves the marginal likelihood g k of the observations and does not depend on the unbiased estimation of the transition density. When the observations are given as in Section 5, this assumption holds as soon as the variance of the observation noise is bounded away from zero.
  • H2 \(\underset {k\geq 1}{\sup }|\vartheta _{k}|_{\infty } < \infty \), \(\underset {k\geq 1}{\sup }|p_{k}|_{\infty } < \infty \), and \(\underset {k\geq 1}{\sup }|\widehat {\omega }_{k}|_{\infty } < \infty \), where
    $$\begin{array}{*{20}l} \widehat{\omega}_{0}(x) &= \frac{\chi(x)g_{0}(x)}{\eta_{0}(x)} \quad\text{and for}\; k\ge1,\\ \widehat{\omega}_{k}\left(x,x';z\right) &= \frac{\widehat{q}_{k-1}\left(x,x';z\right)g_{k}(x')}{\vartheta_{k}(x) p_{k-1} (x,x')}\;. \end{array} $$

Assumption H2 depends on the algorithm used to estimate the transition densities and on the tuning parameters of the SMC filter. The most common choice is 𝜗 k =1 so that under H1, the only requirement is to control \(\widehat {q}_{k-1}\) and \(p_{k-1}\). For instance, in the case of the GPE-1, as explained in Section 3, H2 is satisfied if ϕ is upper bounded (as for the EA1).

Lemma 2

For all 0≤k≤n−1, the random variables \(\left \{\widehat {\omega }_{k+1}^{i}\tau _{k+1}^{i}\right \}_{i=1}^{N}\) are independent conditionally on \(\mathcal {F}_{k}^{N}\) and
$$\begin{array}{*{20}l} {}\mathbb{E}&\left[\left.\widehat{\omega}^{1}_{k+1}\tau^{1}_{k+1}\right| \mathcal{F}_{k}^{N}\right] =\\ {}&\left(\phi^{N}_{k}\!\left[\!\vartheta_{k+1}\right]\right)^{-1}\!\phi^{N}_{k}\!\left[\!\int\! q_{k}(\cdot,x)g_{k+1}(x)\!\left\{\!\tau_{k}(\cdot) \,+\, h_{k+1}(\cdot,x)\right\}\!\mathrm{d} x\!\right]\!. \end{array} $$


See Appendix

Proposition 1

Assume that H1 and H2 hold and that for all 1≤k≤n, osc(h k )<+∞. For all 0≤k≤n and all \(\widetilde {N}\ge 1\), there exist b k ,c k >0 such that for all N≥1 and all \(\varepsilon \in \mathbb {R}_{+}^{\star }\),
$$\mathbb{P}\left(\left|\phi_{k}^{N}[\tau_{k}] - \phi_{k}\left[T_{k}h_{k}\right]\right|\ge \varepsilon\right)\le b_{k}\exp\left(-c_{k}N\varepsilon^{2}\right)\;. $$


See Appendix
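
Read as a sample-size rule, the deviation bound of Proposition 1 can be inverted. The confidence level δ below is a symbol introduced here for illustration; since the constants b k and c k are not explicit, the computation only describes the rate in N and ε. For δ∈(0,b k ),
$$b_{k}\exp\left(-c_{k}N\varepsilon^{2}\right)\le \delta \quad\Longleftrightarrow\quad N\ge \frac{\log\left(b_{k}/\delta\right)}{c_{k}\varepsilon^{2}}\;, $$
so that, with probability at least 1−δ,
$$\left|\phi_{k}^{N}[\tau_{k}] - \phi_{k}\left[T_{k}h_{k}\right]\right| < \sqrt{\frac{\log\left(b_{k}/\delta\right)}{c_{k}N}}\;, $$
which is the usual Monte Carlo rate \(N^{-1/2}\).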

5 Numerical experiments

This section investigates the performance of the proposed algorithm with the sine and log-growth models (Fig. 1). In both cases, the proposal distribution p k is chosen as the following approximation of the optimal filter (or the fully adapted particle filter in the terminology of [30]):
$$p_{k-1}\left(x_{k-1},x_{k}\right)\propto \tilde{q}_{k-1}\left(x_{k-1}, x_{k}\right)g_{k}(x_{k})\;, $$
where \(\tilde {q}_{k-1}(x_{k-1},x_{k})\) is the p.d.f. of a Gaussian distribution with mean \(x_{k-1}+\alpha(x_{k-1})\Delta_{k}\) and variance Δ k I d , i.e., the Euler approximation of Eq. (1). As the observation model is linear and Gaussian, the proposal distribution is therefore Gaussian with explicit mean and variance.
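
The explicit mean and variance mentioned above follow from the product of two Gaussian factors. The sketch below treats the scalar case with observation Y k =X k +ε k , as in the experiments of this section; the function name and argument names are illustrative, and the Euler mean is taken as the standard one-step value x+α(x)Δ.

```python
import numpy as np

def euler_gaussian_proposal(x_prev, y_obs, delta, sigma_obs, drift):
    """Mean and variance of the density proportional to
    N(x; x_prev + drift(x_prev) * delta, delta) * N(y_obs; x, sigma_obs**2)."""
    prior_mean = x_prev + drift(x_prev) * delta  # one Euler step of dX = drift(X) dt + dW
    prior_var = delta
    # Product of two Gaussian factors in x: the precisions add up.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / sigma_obs**2)
    post_mean = post_var * (prior_mean / prior_var + y_obs / sigma_obs**2)
    return post_mean, post_var
```

For instance, with Δ=0.5 and σobs=1 the proposal variance is 1/(2+1)=1/3, whatever the drift.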
Fig. 1

SINE model - observations. Process X solution to the SDE (balls) and observations Y (circles) at times t0=0,…,t100=50

The performance of the proposed algorithm is evaluated as follows. We compare the estimate of the EM intermediate quantity with the one obtained by the fixed-lag method of [27], for different values of the lag (namely, 1, 2, 5, 10, 50). The particle approximation of \(\mathcal {Q}(\theta,\theta)\) for each model is computed using each algorithm, see Fig. 2 for the SINE model and Fig. 3 for the log-growth model. This estimation is performed 200 times to obtain the estimates \(\widehat {Q}_{1},\dots,\widehat {Q}_{200}\), using \(\tilde {N}=2\) particles for the PaRIS algorithm, and M=30 replications for the Monte Carlo approximation \(\widehat q_{k}\) of each q k . Moreover, the E step requires the computation of a quantity such as (2) with \(h_{k} = \log g_{k} + \log q_{k}\). As \(\log q_{k}\) is not available explicitly, it is approximated using the unbiased estimator proposed in [27, Appendix B] based on 30 independent Monte Carlo simulations. In order to obtain a reference value for our study, the intermediate quantity of the EM algorithm is also estimated 30 times using the GRand PaRIS algorithm with N = 5000 particles. The reference value, denoted by \(\widehat {Q}_{\star }\), is then computed as the arithmetic mean of these 30 estimates. Figures 2 and 3 display this estimate for an example with one simulated data set. The GRand PaRIS algorithm is run with N = 400 particles in both cases and the fixed-lag technique with N = 1600, so that both estimations require similar computational times, resulting in a fair comparison. On a personal computer (i7-6600U CPU @ 2.60GHz), for the parameters mentioned above, it takes around 25 s to perform each E step.
Fig. 2

SINE model - EM intermediate quantity. Estimation of the EM intermediate quantity \(\mathcal {Q}(\theta,\theta)\) using the fixed-lag (FL) technique for five different lags, and the GRand PaRIS algorithm using 200 replicates. The whiskers represent the extent of the 95% central values. The dot represents the empirical mean over the 200 replicates. The dotted line shows the reference value, computed using the GRand PaRIS algorithm with N =5000 particles

Fig. 3

Log-growth model - EM intermediate quantity. Estimation of the EM intermediate quantity \(\mathcal {Q}(\theta,\theta)\) using the fixed lag (FL) technique for five different lags, and the GRand PaRIS algorithm using 200 replicates. The whiskers represent the extent of the 95% central values. The dot represents the empirical mean over the 200 replicates. The dotted line shows the reference value, computed using the GRand PaRIS algorithm with N=5000 particles

5.1 The SINE model

The performance of the GRand PaRIS algorithm is first illustrated using the SINE model, where (X t )t≥0 is assumed to be the solution to
$$ \mathrm{d} X_{t} = \sin \left(X_{t}-\mu\right)\mathrm{d} t + \mathrm{d} W_{t},~~X_{0}=x_{0}\;. $$
This simple model has no explicit transition density; however, GPEs may be computed by simulating Brownian bridges. The process solution to (15) is observed regularly at times t0=0,…,t100=50 through the observation process (Y k )0≤k≤100:
$$Y_{k} = X_{k} + \varepsilon_{k}\;, $$
where the (ε k )0≤k≤100 are i.i.d. \(\mathcal {N}\left (0, \sigma ^{2}_{\text {obs}}\right)\). The resulting set of model parameters is θ=(μ,σobs). In the example displayed in Fig. 1, we set μ=0 and σobs=1.
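
A data set of this kind can be generated approximately with the sketch below. It uses an Euler-Maruyama scheme on a fine grid, which is a substitute chosen for this example and not necessarily the simulation method used for Fig. 1; the initial value `x0`, the number of sub-steps and the seed are hypothetical settings.

```python
import numpy as np

def simulate_sine(mu, sigma_obs, x0=0.0, n_obs=100, t_end=50.0, substeps=100, seed=0):
    """Approximate simulation of the SINE model and of its noisy observations."""
    rng = np.random.default_rng(seed)
    delta = t_end / n_obs      # time between two observations (0.5 here)
    h = delta / substeps       # fine Euler-Maruyama step
    x = np.empty(n_obs + 1)
    x[0] = x0
    state = x0
    for k in range(1, n_obs + 1):
        for _ in range(substeps):
            state += np.sin(state - mu) * h + np.sqrt(h) * rng.standard_normal()
        x[k] = state
    # Observations Y_k = X_k + eps_k with eps_k ~ N(0, sigma_obs^2).
    y = x + sigma_obs * rng.standard_normal(n_obs + 1)
    times = np.linspace(0.0, t_end, n_obs + 1)
    return times, x, y
```

Calling `simulate_sine(mu=0.0, sigma_obs=1.0)` returns 101 time points from 0 to 50 together with the hidden states and the observations.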

In the case of the SINE model, the estimator \(\widehat {q}_{k}\) defined by Eq. (14) satisfies both (10) and (11). The corresponding bound \(\widehat {\sigma }_{+}^{k}\) can be obtained using numerical optimization. If that bound is chosen, the GRand PaRIS algorithm has linear complexity in the number of particles. As an alternative, it is worth noting here that the bounds \(\widehat {\sigma }_{+}^{k,i}\), 1≤i≤N, defined by (12) can also be used. This method has a quadratic cost in the number of particles but provides the optimal bound for the algorithm of Lemma 1. This may reduce significantly the expected time before acceptance, in particular when the time step Δ k is large. In the experiment configuration presented here, both bounds resulted in an equivalent computational time.

This same experiment was reproduced on 100 different simulated data sets. For each simulation s, the empirical absolute relative bias arb s and the empirical absolute coefficient of variation acv s are computed as
$$\begin{array}{*{20}l} \mathsf{arb}_{s} &= \frac{\left\vert m(\widehat{Q}^{s})-\widehat{Q}^{s}_{\star}\right\vert }{\left\vert \widehat{Q}^{s}_{\star}\right\vert },\\ \mathsf{acv}_{s}&=\frac{\sigma(\widehat{Q}^{s})}{\left\vert m(\widehat{Q}^{s})\right\vert }\;, \end{array} $$
where \(m\left (\widehat {Q}^{s}\right)\) and \(\sigma \left (\widehat {Q}^{s}\right)\) are the empirical mean and standard deviation of the sequence \(\widehat{Q}_{1}^{s},\dots,\widehat{Q}_{200}^{s}\). For each estimation method, the resulting distributions of arb1,…,arb100 and acv1,…,acv100 are shown in Figs. 4 and 5.
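
The two criteria above can be computed from the replicates of one data set as in the following sketch. The function name is illustrative, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the normalization of the empirical standard deviation is not specified in the text.

```python
import numpy as np

def arb_acv(q_hats, q_ref):
    """Empirical absolute relative bias and absolute coefficient of variation."""
    q_hats = np.asarray(q_hats, dtype=float)
    m = q_hats.mean()
    s = q_hats.std(ddof=1)  # assumption: sample standard deviation
    return abs(m - q_ref) / abs(q_ref), s / abs(m)
```

For example, replicates (9, 11) with reference value 8 give a mean of 10, hence arb = 2/8 = 0.25 and acv = √2/10.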
Fig. 4

SINE model - bias. Distribution of the empirical absolute relative bias

Fig. 5

SINE model - variance. Distribution of the empirical absolute coefficient of variation

The GRand PaRIS algorithm outperforms the fixed-lag methods for any value of the lag: its bias is the lowest (it is already negligible for N = 400) and its variance is lower than that of the fixed-lag estimates with negligible bias (i.e., in this case, lags larger than 10). Small lags lead to strongly biased estimates for the fixed-lag method, and estimates with negligible bias come at the cost of a large variance. It is worth noting here that the lag for which the bias is small is model dependent.

Generalized EM procedure The performance of our algorithm is also assessed in the case where the parameters θ=(μ,σobs) are unknown and estimated using a generalized EM algorithm. The study is done using a data set with n = 200 observations simulated with μ = 0 and \( \sigma ^{2}_{\text {obs}}~=~1\). The GRand PaRIS algorithm is used to perform the E step, with the same settings as before for N, \(\tilde {N}\), and M. As there is no closed form solution to compute the M step of the EM algorithm and propose new parameter estimates, we use a generalized EM procedure: given the current estimate \(\theta ^{(k)}:= \left (\mu ^{(k)}, \sigma _{\text {obs}}^{(k)} \right)\), the function \(\mathcal {Q}\left (\cdot, \theta ^{(k)}\right)\) is approximated for 50 new candidates θ1,…,θ50 chosen by the user. The new estimate is set as
$$\theta^{(k + 1)}~=~\text{argmax}_{i}\mathcal{Q}\left(\theta_{i}, \theta^{(k)}\right)\;. $$
This procedure has the nice property of using the same particle filter and the same retrospective sampling of Lemma 1 for all candidates, which avoids repeating this time-consuming step. The number of candidates and the way to choose them are problem dependent and are therefore left to the user. In our case, we sampled candidates from Gaussian distributions centered at the current estimate θ(k), with a variance decreasing as k increases. Figures 6 and 7 illustrate the performance of the estimation for 12 different initializations of μ (resp. σobs) uniformly chosen in ]−π,π[ (resp. in ]0,6[), showing convergence after only a few iterations of the EM procedure.
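
One iteration of such a candidate-based update may be sketched as follows. The callable `q_func` is a placeholder for the particle approximation of \(\mathcal {Q}\left (\cdot, \theta ^{(k)}\right)\), which is not implemented here; the decreasing schedule `scale0 / (1 + iteration)` and the absolute value used to keep the second coordinate (σobs) nonnegative are hypothetical choices made for this example.

```python
import numpy as np

def generalized_em_step(theta, q_func, iteration, rng, n_candidates=50, scale0=1.0):
    """Pick, among Gaussian candidates around theta, the one maximizing q_func(., theta)."""
    theta = np.asarray(theta, dtype=float)
    scale = scale0 / (1.0 + iteration)  # hypothetical decreasing standard deviation
    candidates = theta + scale * rng.standard_normal((n_candidates, theta.size))
    candidates[:, 1] = np.abs(candidates[:, 1])  # keep sigma_obs nonnegative (our choice)
    values = np.array([q_func(c, theta) for c in candidates])
    return candidates[np.argmax(values)]
```

In an actual implementation, all candidates would be evaluated from the same particle system, as described above, so that the cost of the E step is paid only once per iteration.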
Fig. 6

SINE model - EM. Estimation of μ

Fig. 7

SINE model - EM. Estimation of σobs

5.2 Log-growth model

Following [6] and [28], the performance of the proposed algorithm is also illustrated with the log-growth model (Fig. 8) defined by
$$ {}\mathrm{d} Z_{t}~=~\kappa Z_{t}\left(1-\frac{Z_{t}}{\gamma}\right)\mathrm{d} t + \sigma Z_{t} \mathrm{d} W_{t},~~Z_{0}~=~z_{0}\;. $$
Fig. 8

Log-growth model - observations. Process X solution to the SDE (balls) and observations Y (circles) at times t0=0,…,t100 = 50

In order to use the exact algorithms of [6] and the GPE of [13], we consider (16) after the Lamperti transform, i.e., the process defined by X t = η(Z t ), with η(z): = − log(z)/σ, which satisfies the following SDE:
$$ \begin{aligned} \mathrm{d} X_{t} \,=\, \overbrace{\left(\! \frac{\sigma}{2} \,-\, \frac{\kappa}{\sigma} \,+\, \frac{\kappa}{\gamma\sigma}\!\exp\!\left(-\sigma X_{t}\right)\!\right)}^{:=\alpha(X_{t})}\!\mathrm{d} t +\mathrm{d} W_{t},~X_{0}=x_{0}=\eta(z_{0})\;. \end{aligned} $$

In this case, the conditions of the exact Algorithm 2 defined in [6] are satisfied, as for any \(m \in \mathbb {R}\) there exists U m such that for all x≥m, ψ(x):=α²(x)+α′(x)≤U m . Moreover, ψ is lower bounded uniformly by L. Then, GPE estimators may be computed by simulating the minimum of a Brownian bridge, and simulating Bessel bridges conditionally on this minimum, as proposed by [6].
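
The quantities involved in this check can be written down explicitly. The sketch below assumes ψ=α²+α′, the usual function in exact algorithms, and uses the parameter values of the experiment; the function names are illustrative. Since ψ is a convex quadratic function of u=exp(−σx), minimized at u=γ, its uniform lower bound is σ²/4−κ.

```python
import numpy as np

KAPPA, SIGMA, GAMMA = 0.1, 0.1, 1000.0  # parameter values used in the experiment

def lamperti(z, sigma=SIGMA):
    """Lamperti transform eta(z) = -log(z) / sigma."""
    return -np.log(z) / sigma

def alpha(x, kappa=KAPPA, sigma=SIGMA, gamma=GAMMA):
    """Drift of the transformed process X_t = eta(Z_t)."""
    return sigma / 2.0 - kappa / sigma + kappa / (gamma * sigma) * np.exp(-sigma * x)

def alpha_prime(x, kappa=KAPPA, sigma=SIGMA, gamma=GAMMA):
    """Derivative of the drift with respect to x."""
    return -kappa / gamma * np.exp(-sigma * x)

def psi(x, **kw):
    """psi = alpha^2 + alpha' (assumed definition)."""
    return alpha(x, **kw) ** 2 + alpha_prime(x, **kw)

def psi_lower_bound(kappa=KAPPA, sigma=SIGMA):
    """Minimum of psi over the real line, reached at x = eta(gamma)."""
    return sigma**2 / 4.0 - kappa
```

With κ=σ=0.1, the lower bound equals 0.0025−0.1=−0.0975. In contrast, ψ grows without bound as x→−∞, which is why an upper bound is only available on half-lines x≥m.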

The process solution to (17) is observed regularly at times t0=0,…,t50=100 through the observation process (Y k )0≤k≤50 defined as
$$Y_{k} = X_{k} + \varepsilon_{k}\;, $$
where the (ε k )0≤k≤50 are i.i.d. \( \mathcal {N}\left (0,\sigma ^{2}_{obs}\right)\). The parameters are given by
$$\theta =(\kappa=0.1,\sigma=0.1,\gamma=1000,\sigma_{{\text{obs}}} = 2)\;. $$
In the case of the log-growth model, the estimator \(\widehat {q}_{k}(\cdot)\) defined by Eq. (14) satisfies (11), leading to a GRand PaRIS algorithm with linear complexity in the number of particles. However, the remarks about the bound \(\widehat {\sigma }_{+}^{k}\) made for the SINE model above still hold in this case. The intermediate quantity of the EM algorithm is evaluated as for the SINE model, see Figs. 3, 9, and 10.
Fig. 9

Log-growth model - bias. Distribution of the empirical absolute relative bias

Fig. 10

Log-growth model - variance. Distribution of the empirical absolute coefficient of variation

The results for the fixed-lag technique are similar to the ones presented in [27, Figure 1] using the same model. For small lags, the variance of the estimates is small, but the estimation is highly biased. The bias rapidly decreases as the lag increases, at the price of a large increase in variance. Again, the GRand PaRIS algorithm outperforms the fixed-lag smoother: it shows a (vanishing) bias similar to that of the fixed-lag smoother with the largest lag, and a smaller variance than the fixed-lag estimates with negligible bias.

Note that in this case, the Lamperti transform to obtain a diffusion with a unitary diffusion term depends on σ. The process (X t )t≥0 is a function of σ and is not directly observed if σ is unknown, which prevents a direct use of an EM algorithm to estimate σ. Following [6, Section 8.2], this may be overcome with a two-step transformation of the process (Z t )t≥0.

6 Conclusions

This paper presents a new online SMC smoother for partially observed diffusion processes. This algorithm relies on an acceptance-rejection procedure inspired by the recent PaRIS algorithm. The main result of the article for practical applications is that the mechanism of this procedure remains valid when the transition density is approximated by an unbiased positive estimator. The proposed procedure therefore extends the PaRIS algorithm to HMMs whose transition density is unknown but can be unbiasedly approximated. The GRand PaRIS algorithm outperforms the existing fixed-lag smoother for POD processes of [27], as it does not introduce any intrinsic and non-vanishing bias. In addition, numerical simulations on data from two different models show a lower variance. It can be implemented for the class of models for which the exact algorithms of [6] are valid, with a linear complexity in N in the best cases, or at worst in N².

7 Appendix

7.1 Proofs

Proof of Lemma 1

Let τ be the first time draws are accepted in the accept-reject mechanism. For all ℓ≥1, write
$$\mathcal{A}^{k}_{\ell} = \left\{U_{\ell}<\widehat{q}_{k}\left(\xi_{k}^{J_{\ell}},\xi_{k+1}^{i},\zeta^{\ell}_{k}\right)/\hat{\sigma}^{k}_{+}\right\}\;. $$
Let h be a function defined on {1,…,N},
$$\begin{aligned} {}\mathbb{E}&\left[\left.h\left(J^{i,j}_{k}\right)\right| \mathcal{G}_{k+1}^{N}\right]\\ &= \sum_{m\ge 1}\mathbb{E}\left[\left.h(J_{m}){1}_{\tau=m}\right| \mathcal{G}_{k+1}^{N}\right]\;,\\ & = \sum_{m\ge 1}\left(\prod_{\ell=1}^{m-1}\mathbb{E}\left[\left.{1}_{(\mathcal{A}^{k}_{\ell})^{c}}\right| \mathcal{G}_{k+1}^{N}\right]\right) \mathbb{E}\left[\left.h(J_{m}){1}_{\mathcal{A}^{k}_{m}}\right| \mathcal{G}_{k+1}^{N}\right]\;,\\ & = \sum_{m\ge 1}\left(\prod_{\ell=1}^{m-1}\mathbb{E}\left[\left.1-\frac{\widehat{q_{k}}\left(\xi_{k}^{J_{\ell}},\xi_{k+1}^{i};\zeta_{k}^{\ell}\right)}{\hat{\sigma}^{k}_{+}}\right| \mathcal{G}_{k+1}^{N}\right]\right)\\ &\quad\times\mathbb{E}\left[\left.h(J_{m})\frac{\widehat{q_{k}}\left(\xi_{k}^{J_{m}},\xi_{k+1}^{i};\zeta_{k}^{m}\right)}{\hat{\sigma}^{k}_{+}}\right| \mathcal{G}_{k+1}^{N}\right]\;,\\ & = \sum_{m\ge 1}\left(\mathbb{E}\left[\left.1-\frac{q_{k}\left(\xi_{k}^{J_{1}},\xi_{k+1}^{i}\right)}{\hat{\sigma}^{k}_{+}}\right| \mathcal{G}_{k+1}^{N}\right]\right)^{m-1}\\ &\quad\times\mathbb{E}\left[\left.h(J_{1})\frac{q_{k}\left(\xi_{k}^{J_{1}},\xi_{k+1}^{i}\right)}{\hat{\sigma}^{k}_{+}}\right| \mathcal{G}_{k+1}^{N}\right]\;,\\ & = \mathbb{E}\left.\left[\left.h(J_{1})q_{k}\left(\xi_{k}^{J_{1}},\xi_{k+1}^{i}\right)\right| \mathcal{G}_{k+1}^{N}\right]\right/\mathbb{E}\left[\left.q_{k}\left(\xi_{k}^{J_{1}},\xi_{k+1}^{i}\right)\right| \mathcal{G}_{k+1}^{N}\right]\;,\\ & = \sum_{\ell=1}^{N} \frac{h(\ell)\omega_{k-1}^{\ell}q_{k}\left(\xi_{k}^{\ell},\xi_{k+1}^{i}\right)}{\sum_{m=1}^{N}\omega_{k-1}^{m}q_{k}\left(\xi_{k}^{m},\xi_{k+1}^{i}\right)}\;,\\ &= \sum_{\ell=1}^{N} \Lambda_{k-1}^{N}(i,\ell)h(\ell) \;, \end{aligned} $$
which concludes the proof. □
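
The accept-reject mechanism analysed in this proof can be sketched as follows on a toy example. The function `backward_index` and the callable `q_hat(j, rng)`, which stands for one draw of the unbiased estimate \(\widehat{q}_{k}\left(\xi_{k}^{j},\xi_{k+1}^{i};\zeta_{k}\right)\), are names introduced for this illustration; the sketch assumes that the estimate takes values in [0, `sigma_plus`]. The cap `max_tries` is a safeguard added here and is not part of the algorithm.

```python
import numpy as np

def backward_index(weights, q_hat, sigma_plus, rng, max_tries=100_000):
    """Draw an index with probability proportional to weights[j] * E[q_hat(j)]."""
    probs = np.asarray(weights, dtype=float)
    probs = probs / probs.sum()
    for _ in range(max_tries):
        j = rng.choice(len(probs), p=probs)        # candidate drawn from the filtering weights
        u = rng.uniform()                          # uniform variable of the accept-reject test
        if u < q_hat(j, rng) / sigma_plus:         # acceptance event of the proof
            return int(j)
    raise RuntimeError("no candidate accepted within max_tries draws")
```

Each candidate j is accepted with conditional probability equal to the expectation of the estimate divided by `sigma_plus`, so the accepted index follows the normalized weights appearing in the last two lines of the display above, even though the exact transition density is never evaluated.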

Proof of Lemma 2

The independence is ensured by the mechanism of SMC methods. By (9),
$$\begin{array}{*{20}l} \mathbb{E}&\left[\left.\widehat{\omega}^{i}_{k+1}\tau^{i}_{k+1}\right| \mathcal{F}_{k}^{N}\right] =\\& \mathbb{E}\left[\left.\frac{\widehat{{q_{k}}}\left(\xi_{k}^{I^{i}_{k+1}}, \xi^{i}_{k+1};\zeta_{k}\right)g_{k+1}\left(\xi^{i}_{k+1}\right)}{\vartheta_{k+1}\left(\xi^{I^{i}_{k+1}}_{k}\right) p_{k}\left(\xi_{k}^{I^{i}_{k+1}},\xi^{i}_{k+1}\right)}\tau^{i}_{k+1}\right| \mathcal{F}_{k}^{N}\right]\;. \end{array} $$
Note that by Lemma 1,
$$\begin{array}{*{20}l} {}&\mathbb{E}\left[\!\tau^{i}_{k+1}\left|\mathcal{G}_{k+1}^{N}\right.\!\right] \,=\, \sum_{\ell=1}^{N}\frac{\omega_{k}^{\ell} {q_{k}}\left(\xi_{k}^{\ell}, \xi^{i}_{k+1}\right) \left(\tau^{\ell}_{k} + h_{k}\left(\xi_{k}^{\ell},\xi^{i}_{k+1}\right)\right)}{\sum_{\ell'=1}^{N}\omega_{k}^{\ell'} {q_{k}}\left(\xi_{k}^{\ell'},\xi^{i}_{k+1}\right)}\!,\\ {}&\mathbb{E} \left[\left.\widehat{{q_{k}}}\left(\xi_{k}^{I^{i}_{k+1}},\xi^{i}_{k+1};\zeta_{k}\right) \right| \mathcal{G}_{k+1}^{N}\right] = {q_{k}}\left(\xi_{k}^{I^{i}_{k+1}},\xi^{i}_{k+1}\right)\;. \end{array} $$
Since \(\tau ^{i}_{k+1}\) and ζ k are independent conditionally on \(\mathcal {G}_{k+1}^{N}\):
$$\begin{array}{*{20}l} {}\mathbb{E}&\left[\left.\tau^{i}_{k+1} \widehat{{q_{k}}} \left(\xi_{k}^{I^{i}_{k+1}},\xi^{i}_{k+1};\zeta_{k}\right)\right|\mathcal{G}_{k+1}^{N}\right]\\ {}& = q_{k}\!\left(\!\xi_{k}^{I^{i}_{k+1}},\xi^{i}_{k+1}\!\right)\!\!\sum_{\ell=1}^{N}\!\frac{\omega_{k}^{\ell} {q_{k}} \!\left(\xi_{k}^{\ell},\xi^{i}_{k+1}\right)\!\!\left(\tau^{\ell}_{k} + h_{k}\!\left(\!\xi_{k}^{\ell},\xi^{i}_{k+1}\!\right)\!\right)}{\sum_{\ell'=1}^{N}\omega_{k}^{\ell'} {q_{k}} \left(\xi_{k}^{\ell'},\xi^{i}_{k+1}\right)}\!. \end{array} $$
Moreover, conditionally on \(\mathcal {F}_{k}^{N}\), the probability density function of \(\left (\xi _{k+1}^{i},I_{k+1}^{i}\right)\) is given by
$$(x,j) \mapsto \frac{\omega_{k}^{j}\vartheta_{k+1}\left(\xi_{k}^{j}\right)p_{k}\left(\xi_{k}^{j},x\right)}{\Omega^{N}_{k}\phi_{k}^{N}\left[\vartheta_{k+1}\right]}\;. $$
Therefore, this yields
$$\begin{aligned} \mathbb{E}&\left[\left.\widehat{\omega}^{i}_{k+1}\tau^{i}_{k+1}\right| \mathcal{F}_{k}^{N}\right]\\ &= \left(\phi^{N}_{k}\left[\vartheta_{k+1}\right]\right)^{-1} \sum_{j=1}^{N}\frac{\omega_{k}^{j}}{\Omega_{k}} \int \vartheta_{k+1}\left(\xi^{j}_{k}\right)\frac{{q_{k}}\left(\xi_{k}^{j},x\right) g_{k+1}(x)}{\vartheta_{k+1}\left(\xi^{j}_{k}\right) p_{k}\left(\xi_{k}^{j},x\right)}\\ &\quad\times \sum_{\ell=1}^{N}\frac{\omega_{k}^{\ell} {q_{k}} \left(\xi_{k}^{\ell},x\right)\left(\tau^{\ell}_{k} + h_{k}\left(\xi_{k}^{\ell},x\right)\right)}{\sum_{\ell'=1}^{N}\omega_{k}^{\ell'}{q_{k}}\left(\xi_{k}^{\ell'},x\right)}p_{k}\left(\xi_{k}^{j},x\right)\mathrm{d} x\;,\\ &= \left(\phi^{N}_{k}\left[\vartheta_{k+1}\right]\right)^{-1}\\ &~~~~\times\sum_{\ell=1}^{N} \frac{\omega_{k}^{\ell}}{\Omega_{k}}\left[\int \frac{\sum_{j=1}^{N} \omega_{k}^{j}{q_{k}}\left(\xi_{k}^{j},x\right) }{ \sum_{\ell'=1}^{N}\omega_{k}^{\ell'}{q_{k}}\left(\xi_{k}^{\ell'},x\right)} g_{k+1}(x){q_{k}} \left(\xi_{k}^{\ell},x\right)\right.\\ &\qquad\qquad\quad\left.\left(\tau^{\ell}_{k} + h_{k}\left(\xi_{k}^{\ell},x\right)\right) \mathrm{d} x {\vphantom{\frac{\sum_{j=1}^{N} \omega_{k}^{j}{q_{k}}\left(\xi_{k}^{j},x\right) }{ \sum_{\ell'=1}^{N}\omega_{k}^{\ell'}{q_{k}}\left(\xi_{k}^{\ell'},x\right)}}}\right]\\ & =\left(\phi^{N}_{k}\left[\vartheta_{k+1}\right]\right)^{-1}\phi^{N}_{k}\left[\int {q_{k}}(\cdot,x)g_{k+1}(x)\left\{\tau_{k}(\cdot) + h_{k}(\cdot,x)\right\}\mathrm{d} x\right]\;, \end{aligned} $$
which concludes the proof. □

Proof of Proposition 1

The result is proved by induction. At time k = 0, the result holds using that for all 1≤i≤N, \(\rho _{0}^{i} = 0\) and the convention T0[h0] = 0. In addition, \(\phi _{0}^{N}\) is a standard importance sampling estimator of ϕ0 with \(\widehat \omega _{0}^{i}\le |\widehat {\omega }_{0}|_{\infty }\) so that for any bounded function h on X,
$$\mathbb{P}\left(\left|\phi_{0}^{N}[h] - \phi_{0}[h]\right|\ge \varepsilon\right)\le b_{0}\exp\left(-c_{0}N\varepsilon^{2}\right)\;. $$
Assume the result holds for k≥0 and that 𝜗k+1 = 1 for simplicity. Write
$$\phi_{k+1}^{N}\left[\tau_{k+1}\right] - \phi_{k+1}\left[T_{k+1}\left[h_{k+1}\right]\right] = a_{N}/b_{N}\;, $$
where \(a_{N} = N^{-1}\sum _{i=1}^{N} \widehat {\omega }_{k+1}^{i} \left (\!\tau _{k+1}^{i} - \phi _{k+1}\left [T_{k+1}[h_{k+1}]\right ]\right)\) and \(b_{N} =N^{-1}\sum _{i=1}^{N} \widehat {\omega }_{k+1}^{i}\). By Lemma 2, the random variables \(\{\widehat {\omega }_{k+1}^{i}\tau _{k+1}^{i}\}_{i=1}^{N}\) are independent conditionally on \(\mathcal {F}_{k}^{N}\) and by H2,
$${}\left|\widehat{\omega}_{k+1}^{i} \left(\tau_{k+1}^{i} - \phi_{k+1}\left[T_{k+1}\left[h_{k+1}\right]\right]\right)\right| \le 2|\widehat{\omega}_{k+1}|_{\infty}|H_{k+1}|_{\infty}\;. $$
Therefore, by Hoeffding's inequality,
$$\begin{array}{*{20}l} {}\mathbb{P}&\left(\left|a_{N} - \mathbb{E}\left[a_{N}\left|\mathcal{F}_{k}^{N}\right.\right]\right|\ge \varepsilon\right) = \\ {}&\mathbb{E}\left[\mathbb{P}\left(\left.\left|a_{N} \!- \mathbb{E}\left[a_{N}\left|\mathcal{F}_{k}^{N}\right.\right]\right|\ge \varepsilon\right|\mathcal{F}_{k}^{N}\right)\right]\le 2\exp\left(-c_{k}N\varepsilon^{2}\right)\;. \end{array} $$
On the other hand,
$$\mathbb{E}\left[a_{N}\left|\mathcal{F}_{k}^{N}\right.\right] = \phi^{N}_{k}\left[\Upsilon_{k}\right] \;, $$
where
$$\begin{array}{*{20}l} {}\Upsilon_{k}(x_{k}) =& \int q_{k}(x_{k},x)g_{k+1}(x)\left(\tau_{k}(x_{k}) + h_{k+1}(x_{k},x)\right.\\ &\left.- \phi_{k+1}\left[T_{k+1}[h_{k+1}]\right]\right)\mathrm{d} x\;. \end{array} $$
By [28, Lemma 11], ϕ k [Υ k ] = 0 which implies by the induction assumption that
$$\mathbb{P}\left(\left|\mathbb{E}\left[a_{N}\left|\mathcal{F}_{k}^{N}\right.\right]\right|\ge \varepsilon\right)\le b_{k}\exp\left(-c_{k}N\varepsilon^{2}\right)\;. $$
Combining this bound with the previous Hoeffding bound yields, up to a change of the constants b k and c k ,
$$\mathbb{P}\left(\left|a_{N}\right|\ge \varepsilon\right) \le b_{k}\exp\left(-c_{k}N\varepsilon^{2}\right)\;. $$
Similarly, as \(b_{N} \le |\widehat {\omega }_{k+1}|_{\infty }\), by Hoeffding's inequality,
$$\begin{array}{*{20}l} {}\mathbb{P}&\left(\left|b_{N} - \mathbb{E}\left[b_{N}\left|\mathcal{F}_{k}^{N}\right.\right]\right|\ge \varepsilon\right)\\ {}&= \mathbb{E}\!\left[\!\mathbb{P}\!\left(\left|b_{N} \,-\, \mathbb{E}\!\left[\!b_{N}\!\left|\mathcal{F}_{k}^{N}\right.\right]\right|\!\ge\! \varepsilon\left|\mathcal{F}_{k}^{N}\right.\right)\right]\le 2\exp\left(-c_{k}N\varepsilon^{2}\right)\;. \end{array} $$
Note that
$$\mathbb{E}\left[b_{N}\left|\mathcal{F}_{k}^{N}\right.\right] = \phi^{N}_{k}\left[\int q_{k}(\cdot,x)g_{k+1}(x)\mathrm{d} x\right]\;. $$
By the induction assumption,
$$\begin{array}{*{20}l} {}\mathbb{P}&\left(\left|\mathbb{E}\left[b_{N}\left|\mathcal{F}_{k}^{N}\right.\right]-\phi_{k}\left[\int q_{k}(\cdot,x)g_{k+1}(x)\mathrm{d} x\right]\right|\ge \varepsilon\right)\\ &\le b_{k}\exp\left(-c_{k}N\varepsilon^{2}\right)\;. \end{array} $$

The proof is completed using Lemma 3. □
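
For reference, the version of Hoeffding's inequality used twice in the proof above can be stated as follows, where the variables X i and the constant c are notation introduced here. If X 1 ,…,X N are independent conditionally on \(\mathcal {F}_{k}^{N}\) and satisfy |X i |≤c almost surely, then for all ε>0,
$$\mathbb{P}\left(\left.\left|\frac{1}{N}\sum_{i=1}^{N}\left(X_{i} - \mathbb{E}\left[X_{i}\left|\mathcal{F}_{k}^{N}\right.\right]\right)\right|\ge \varepsilon\right|\mathcal{F}_{k}^{N}\right)\le 2\exp\left(-\frac{N\varepsilon^{2}}{2c^{2}}\right)\;. $$
For a N , the almost sure bound displayed in the proof gives \(c = 2|\widehat{\omega}_{k+1}|_{\infty}|H_{k+1}|_{\infty}\).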

Lemma 3

Assume that a N , b N , and b are random variables defined on the same probability space such that there exist positive constants β, B, C, and M satisfying
  (i) |a N /b N |≤M, \(\mathbb {P}\)-a.s. and b≥β, \(\mathbb {P}\)-a.s.,
  (ii) For all ε>0 and all N≥1, \(\mathbb {P}\left [|b_{N}-b|>\epsilon \right ]\leq B \exp \left (-C N \epsilon ^{2}\right)\),
  (iii) For all ε>0 and all N≥1, \(\mathbb {P} \left [ |a_{N}|>\epsilon \right ]\leq B \exp \left (-C N \left (\epsilon /M\right)^{2}\right)\).

Then,
$$ \mathbb{P}\left\{ \left| \frac{a_{N}}{b_{N}} \right| > \epsilon \right\} \leq B \exp{\left(-C N \left(\frac{\epsilon \beta}{2M} \right)^{2} \right)} \;.$$


See [10]. □


(i7-6600U CPU @ 2.60GHz)




This work has been developed during a 1-year postdoc funded by Paris-Saclay Center for Data Science.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All the authors have contributed to the conception of the algorithms, the analysis of the proposed estimator, and the writing of the manuscript. PG provided the simulations displayed in the final version. All authors read and approved the final manuscript.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

AgroParistech, Paris, France
Laboratoire de Mathématiques d’Orsay, Univ. Paris-Sud, CNRS, Université Paris-Saclay, Orsay, France


  1. Y Ait-Sahalia, Transition densities for interest rate and other nonlinear diffusions. J. Financ. 54, 1361–1395 (1999).
  2. Y Ait-Sahalia, Maximum likelihood estimation of discretely sampled diffusions: a closed-form approximation approach. Econometrica. 70, 223–262 (2002).
  3. Y Ait-Sahalia, Closed-form likelihood expansions for multivariate diffusions. Ann. Stat. 36, 906–937 (2008).
  4. A Beskos, O Papaspiliopoulos, GO Roberts, Retrospective exact simulation of diffusion sample paths with applications. Bernoulli. 12(6), 1077–1098 (2006).
  5. A Beskos, O Papaspiliopoulos, GO Roberts, A factorisation of diffusion measure and finite sample path constructions. Methodol. Comput. Appl. Probab. 10(1), 85–104 (2008).
  6. A Beskos, O Papaspiliopoulos, GO Roberts, P Fearnhead, Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes (with discussion). J. Roy. Statist. Soc. Ser. B. 68(3), 333–382 (2006).
  7. O Cappé, E Moulines, T Rydén, Inference in hidden Markov models (Springer-Verlag, New York, 2005).
  8. P Del Moral, J Jacod, P Protter, The Monte Carlo method for filtering with discrete-time observations. Probab. Theory Relat. Fields. 120, 346–368 (2001).
  9. AP Dempster, NM Laird, DB Rubin, Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B. 39(1), 1–38 (1977). (with discussion).
  10. R Douc, A Garivier, E Moulines, J Olsson, Sequential Monte Carlo smoothing for general state space hidden Markov models. Ann. Appl. Probab. 21(6), 2109–2145 (2011).
  11. A Doucet, S Godsill, C Andrieu, On sequential Monte Carlo sampling methods for Bayesian filtering. Stat. Comput. 10, 197–208 (2000).
  12. A Doucet, S Godsill, C Andrieu, On sequential Monte Carlo sampling methods for Bayesian filtering. Stat. Comput. 10, 197–208 (2000).
  13. P Fearnhead, O Papaspiliopoulos, GO Roberts, Particle filters for partially observed diffusions. J. Roy. Statist. Soc. Ser. B. 70(4), 755–777 (2008).
  14. P Fearnhead, K Latuszynski, GO Roberts, G Sermaidis, Continuous-time importance sampling: Monte Carlo methods which avoid time-discretisation error (2017). Technical report.
  15. P Gloaguen, M-P Étienne, S Le Corff, Stochastic differential equation based on a multimodal potential to model movement data in ecology. To appear in the Journal of the Royal Statistical Society: Series C.
  16. SJ Godsill, A Doucet, M West, Monte Carlo smoothing for non-linear time series. J. Am. Stat. Assoc. 50, 438–449 (2004).
  17. N Gordon, D Salmond, AF Smith, Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F. Radar Sig. Process. 140, 107–113 (1993).
  18. M Hürzeler, HR Künsch, Monte Carlo approximations for general state-space models. J. Comput. Graph. Stat. 7, 175–193 (1998).
  19. N Kantas, A Doucet, SS Singh, J Maciejowski, N Chopin, On particle methods for parameter estimation in state-space models. Stat. Sci. 30(3), 328–351 (2015).
  20. M Kessler, Estimation of an ergodic diffusion from discrete observations. Scand. J. Stat. 24(2), 211–229 (1997).
  21. M Kessler, A Lindner, M Sorensen, Statistical methods for stochastic differential equations (CRC Press, Boca Raton, 2012).
  22. G Kitagawa, Monte-Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Comput. Graph. Stat. 1, 1–25 (1996).
  23. S Le Corff, G Fort, Convergence of a particle-based approximation of the block online Expectation Maximization algorithm. ACM Trans. Model. Comput. Simul. 23(1), 2 (2013).
  24. S Le Corff, G Fort, Online expectation maximization based algorithms for inference in hidden Markov models. Electron. J. Stat. 7, 763–792 (2013).
  25. C Li, Maximum-likelihood estimation for diffusion processes via closed-form density expansions. Ann. Stat. 41(3), 1350–1380 (2013).
  26. J Olsson, O Cappé, R Douc, E Moulines, Sequential Monte Carlo smoothing with application to parameter estimation in nonlinear state space models. Bernoulli. 14(1), 155–179 (2008).
  27. J Olsson, J Strojby, Particle-based likelihood inference in partially observed diffusion processes using generalised Poisson estimators. Electron. J. Stat. 5, 1090–1122 (2011).
  28. J Olsson, J Westerborn, Efficient particle-based online smoothing in general hidden Markov models: the PaRIS algorithm. Bernoulli. 3, 1951–1996 (2017).
  29. T Ozaki, A bridge between nonlinear time series models and nonlinear stochastic dynamical systems: a local linearization approach. Stat. Sin. 2, 113–135 (1992).
  30. MK Pitt, N Shephard, Filtering via simulation: Auxiliary particle filters. J. Am. Stat. Assoc. 94(446), 590–599 (1999).
  31. G Poyiadjis, A Doucet, SS Singh, Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika. 98, 65–80 (2011).
  32. I Shoji, T Ozaki, Estimation for nonlinear stochastic differential equations by a local linearization method 1. Stoch. Anal. Appl. 16(4), 733–752 (1998).
  33. M Uchida, N Yoshida, Adaptive estimation of an ergodic diffusion process based on sampled data. Stoch. Process. Appl. 122(8), 2885–2924 (2012).
  34. W Wagner, Unbiased Monte Carlo estimators for functionals of weak solutions of stochastic differential equations. Stochast. Stochast. Rep. 28, 1–20 (1989).


© The Author(s) 2018