The algorithm is based on the following link between the filtering and smoothing distributions for additive functionals, see [28]:
$$ \begin{aligned} \phi_{0:n\vert n}\left[h\right] &= \phi_{n}\left[T_{n}[h]\right]\;,\;\text{where} \\ T_{n}\left[h\right](X_{n}) &= \mathbb{E}\left[h(X_{0:n})\,\vert\, X_{n},Y_{0:n}\right]\;. \end{aligned} $$
(5)
The approximation of (5) first requires approximating the sequence of filtering distributions. Sequential Monte Carlo methods provide an efficient and simple way to obtain these approximations using sets of particles \(\left \{\xi ^{\ell }_{k}\right \}_{\ell =1}^{N}\) associated with weights \(\left \{\omega ^{\ell }_{k}\right \}_{\ell =1}^{N}\), 0≤k≤n.
At time k=0, N particles \(\left \{\xi ^{\ell }_{0}\right \}_{\ell =1}^{N}\) are sampled independently according to \(\xi ^{\ell }_{0} \sim \eta _{0}\), where η_{0} is a probability density with respect to μ. Then, \(\xi ^{\ell }_{0}\) is associated with the importance weight \(\omega _{0}^{\ell } = \chi \left (\xi ^{\ell }_{0}\right)g_{0} \left (\xi ^{\ell }_{0}\right)/\eta _{0}\left (\xi ^{\ell }_{0}\right)\). For any bounded and measurable function h defined on \(\mathbb {R}^{d}\), the expectation ϕ_{0}[h] is approximated by
$$\phi^{N}_{0}[\!h] = \frac{1}{\Omega_{0}^{N}} \sum_{\ell=1}^{N} \omega_{0}^{\ell} h \left(\xi^{\ell}_{0} \right)\;, \quad \Omega_{0}^{N}:= \sum_{\ell=1}^{N} \omega_{0}^{\ell}\;. $$
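As a rough illustration, the initialization step can be sketched as follows. The one-dimensional Gaussian choices for η_{0}, χ, and g_{0} below are illustrative assumptions, not the models of the paper; h(x)=x so that ϕ_{0}^{N}[h] estimates the filtering mean at time 0.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

# Hypothetical scalar model: eta_0 is a wide Gaussian proposal density,
# chi a standard Gaussian prior and g_0 a Gaussian likelihood at Y_0 = 0.5.
def eta_0_sample(n):
    return rng.normal(0.0, 2.0, size=n)

def eta_0_pdf(x):
    return np.exp(-x**2 / 8.0) / np.sqrt(8.0 * np.pi)

def chi(x):                      # prior density of X_0
    return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

def g_0(x, y=0.5):               # local likelihood of the observation Y_0
    return np.exp(-(y - x)**2 / 2.0) / np.sqrt(2.0 * np.pi)

xi_0 = eta_0_sample(N)                              # particles xi_0^l
w_0 = chi(xi_0) * g_0(xi_0) / eta_0_pdf(xi_0)       # importance weights omega_0^l
phi_0 = np.sum(w_0 * xi_0) / np.sum(w_0)            # phi_0^N[h] with h(x) = x
```

For this conjugate toy model the exact filtering mean is 0.25, so the self-normalized estimate should be close to that value for moderate N.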
Then, for 1≤k≤n, using \(\left \{\left (\xi ^{\ell }_{k-1},\omega ^{\ell }_{k-1}\right)\right \}_{\ell =1}^{N}\), the auxiliary particle filter of [30] samples pairs \(\left \{\left (I^{\ell }_{k},\xi ^{\ell }_{k}\right)\right \}_{\ell =1}^{N}\) of indices and particles using an instrumental transition density p_{k−1} on \(\mathbb {R}^{d}\times \mathbb {R}^{d}\) and an adjustment multiplier function 𝜗_{k} on \(\mathbb {R}^{d}\). Each new particle \(\xi ^{\ell }_{k}\) and weight \(\omega ^{\ell }_{k}\) at time k are computed following these steps:


choose a particle index \(I^{\ell }_{k}\) at time k−1 in {1,…,N} with probabilities proportional to \(\omega _{k-1}^{j} \vartheta _{k} \left (\xi ^{j}_{k-1}\right)\), for j in {1,…,N} ;


sample \(\xi ^{\ell }_{k}\) using this chosen particle according to \(\xi ^{\ell }_{k} \sim p_{k-1}\left (\xi ^{I^{\ell }_{k}}_{k-1},\cdot \right)\) ;


associate the particle \(\xi ^{\ell }_{k}\) with the importance weight:
$$ \omega^{\ell}_{k} := \frac{q_{k-1}\left(\xi_{k-1}^{I^{\ell}_{k}},\xi^{\ell}_{k}\right)g_{k}\left(\xi^{\ell}_{k}\right)}{\vartheta_{k}\left(\xi^{I^{\ell}_{k}}_{k-1}\right) p_{k-1} \left(\xi_{k-1}^{I^{\ell}_{k}},\xi^{\ell}_{k}\right)}\;. $$
(6)
The expectation ϕ_{k}[h] is approximated by
$$\phi^{N}_{k}[\!h] := \frac{1}{\Omega_{k}^{N}} \sum_{\ell=1}^{N} \omega_{k}^{\ell} h \left(\xi^{\ell}_{k} \right)\;,\quad\Omega_{k}^{N}:= \sum_{\ell=1}^{N} \omega_{k}^{\ell}\;. $$
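One iteration of the three steps above can be sketched as follows. A scalar AR(1)-type Gaussian kernel stands in for q_{k−1} (for a POD process it would itself be an approximation), and the bootstrap choices p_{k−1}=q_{k−1}, 𝜗_{k}≡1 are used; the q/p ratio in (6) then cancels but is written out to mirror the formula.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000

# Hypothetical model so that q_{k-1} is a known Gaussian kernel.
def q_pdf(x, xp):                # transition density q_{k-1}(x, x')
    return np.exp(-(xp - 0.9 * x)**2 / 2.0) / np.sqrt(2.0 * np.pi)

def p_sample(x):                 # instrumental kernel; bootstrap: p_{k-1} = q_{k-1}
    return 0.9 * x + rng.normal(size=x.shape)

def g(xp, y=1.0):                # local likelihood g_k at the observation Y_k
    return np.exp(-(y - xp)**2 / 2.0) / np.sqrt(2.0 * np.pi)

def vartheta(x):                 # adjustment multiplier; bootstrap: identically 1
    return np.ones_like(x)

# previous particle cloud {(xi_{k-1}^l, omega_{k-1}^l)}
xi_prev = rng.normal(size=N)
w_prev = np.ones(N)

# step 1: ancestor indices I_k^l with prob. proportional to omega * vartheta
probs = w_prev * vartheta(xi_prev)
I = rng.choice(N, size=N, p=probs / probs.sum())

# step 2: propagate the selected particles through the instrumental kernel
xi = p_sample(xi_prev[I])

# step 3: importance weights (6); the q/p ratio equals 1 here
w = q_pdf(xi_prev[I], xi) * g(xi) / (vartheta(xi_prev[I]) * q_pdf(xi_prev[I], xi))

phi_k = np.sum(w * xi) / np.sum(w)   # phi_k^N[h] with h(x) = x
```

With these toy choices the exact one-step filtering mean is about 0.64, which the weighted estimate should approach as N grows.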
The simplest choice for p_{k−1} and 𝜗_{k} is the bootstrap filter proposed by [17], which sets p_{k−1}=q_{k−1} and, for all \(x\in \mathbb {R}^{d}\), 𝜗_{k}(x)=1. In the case of POD processes, q_{k−1} is unknown, but since any choice of p_{k−1} can be made, it can be replaced by any approximation to sample the particles. The approximation can be obtained using a discretization scheme such as the Euler method or a Poisson-based approximation as detailed below. A more appealing choice is the fully adapted particle filter, which sets, for all \(x,x'\in \mathbb {R}^{d}\), p_{k−1}(x,x^{′})∝q_{k−1}(x,x^{′})g_{k}(x^{′}) and, for all \(x\in \mathbb {R}^{d}\), \(\vartheta_{k}(x) = \int q_{k-1}\left (x,x'\right)g_{k}\left (x'\right)\mu \left (\mathrm {d} x'\right)\). Here again, q_{k−1} has to be replaced by an approximation. In Section 5, it is replaced by the Gaussian approximation provided by an Euler scheme, which leads to a Gaussian proposal density p_{k−1} as the observation model is linear and Gaussian.
The PaRIS algorithm uses the same decomposition as the FFBS algorithm introduced in [12] and the FFBSi algorithm proposed by [16] to approximate smoothing distributions. It combines the forward-only version of the FFBS algorithm with the sampling mechanism of the FFBSi algorithm. It does not produce an approximation of the smoothing distributions but of the smoothed expectation of a fixed additive functional, and thus may be used to approximate (2). Its crucial property is that it does not require a backward pass: the smoothed expectation is computed on the fly with the particle filter, and no storage of the particles or weights is needed.
The PaRIS algorithm relies on the following fundamental property of T_{k}[H_{k}] when H_{k} is as in (2):
$$\begin{aligned} T_{k}\left[H_{k}\right](X_{k}) &= \mathbb{E}\left[\left. T_{k-1}\left[H_{k-1}\right]\left(X_{k-1}\right) + h_{k-1}\left(X_{k-1},X_{k}\right)\,\right\vert X_{k},Y_{0:k-1} \right]\;,\\ &= \frac{\int \phi_{k-1}\left(\mathrm{d} x_{k-1}\right)q_{k-1}\left(x_{k-1},X_{k}\right)\left\{T_{k-1}\left[H_{k-1}\right]\left(x_{k-1}\right) + h_{k-1}\left(x_{k-1},X_{k}\right)\right\}}{\int \phi_{k-1}\left(\mathrm{d} x_{k-1}\right)q_{k-1}\left(x_{k-1},X_{k}\right)}\;. \end{aligned} $$
Therefore, [28] introduces sufficient statistics \(\tau ^{i}_{k}\) (starting with \(\tau ^{i}_{0} = 0\), 1≤i≤N), approximating \(T_{k}\left [H_{k}\right ]\left (\xi ^{i}_{k}\right)\), for 1≤i≤N and 0≤k≤n. First, replacing ϕ_{k−1} by \(\phi ^{N}_{k-1}\) in the last equation leads to the following approximation of \(T_{k}\left [H_{k}\right ]\left (\xi ^{i}_{k}\right)\):
$$ T_{k}^{N}\left[H_{k}\right]\left(\xi_{k}^{i}\right) = \sum_{j=1}^{N} \Lambda_{k-1}^{N}(i,j)\left\{T_{k-1}\left[H_{k-1}\right]\left(\xi_{k-1}^{j}\right) + h_{k-1}\left(\xi^{j}_{k-1},\xi^{i}_{k}\right)\right\}\;, $$
(7)
where
$$ \Lambda_{k}^{N}(i,\ell) = \frac{\omega^{\ell}_{k} {q_{k}}\left(\xi^{\ell}_{k},\xi_{k+1}^{i}\right)}{\sum_{j=1}^{N}\omega^{j}_{k} {q_{k}}\left(\xi^{j}_{k},\xi_{k+1}^{i}\right)}\;,\quad 1\le \ell\le N\;. $$
(8)
Computing these approximations exactly would lead to a complexity growing quadratically with N because of the normalizing constant in (8). Therefore, the PaRIS algorithm samples particles in the set \(\left \{\xi ^{j}_{k-1}\right \}_{j=1}^{N}\) with probabilities \(\Lambda _{k-1}^{N}(i,\cdot)\) to approximate the expectation (7) and produce \(\tau ^{i}_{k}\). Choosing \(\tilde {N}\ge 1\), at each time step 0≤k≤n−1 these statistics are updated according to the following steps:

(i)
Run one step of a particle filter to produce \(\left \{\left (\xi ^{\ell }_{k}, \omega ^{\ell }_{k}\right)\right \}\) for 1≤ℓ≤N.

(ii)
For all 1≤i≤N, sample independently \(J_{k}^{i,\ell }\) in {1,…,N} for \(1\le \ell \le \widetilde N\) with probabilities \(\Lambda _{k}^{N}(i,\cdot)\), given by (8).

(iii)
Set
$$\tau^{i}_{k+1} := \frac{1}{\widetilde{N}} \sum^{\widetilde{N}}_{\ell=1} \left\{ \tau^{J_{k}^{i,\ell}}_{k} + h_{k} \left(\xi^{J_{k}^{i,\ell}}_{k}, \xi^{i}_{k+1}\right) \right\}\;. $$
Then, (2) is approximated by
$$\phi_{0:n\vert n}^{N}\left[\tau_{n}\right] = \frac{1}{\Omega_{n}^{N}}\sum_{i=1}^{N} \omega^{i}_{n} \tau_{n}^{i}\;. $$
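For concreteness, one PaRIS update (steps (ii) and (iii)) can be sketched as below. A Gaussian kernel stands in for q_{k} and a simple product stands in for h_{k}; both are illustrative assumptions, and the backward probabilities Λ_{k}^{N}(i,·) are computed exactly, giving the quadratic-cost variant rather than the accept-reject one.

```python
import numpy as np

rng = np.random.default_rng(2)
N, N_tilde = 500, 2

# Hypothetical Gaussian transition kernel standing in for q_k.
def q_pdf(x, xp):
    return np.exp(-(xp - 0.9 * x)**2 / 2.0) / np.sqrt(2.0 * np.pi)

def h_k(x, xp):                  # illustrative additive-functional increment
    return x * xp

# placeholder particle clouds at times k and k+1 (from a running filter)
xi_k = rng.normal(size=N)
w_k = rng.uniform(0.5, 1.5, size=N)
xi_next = 0.9 * xi_k + rng.normal(size=N)
tau_k = np.zeros(N)              # tau_0^i = 0

# steps (ii)-(iii): exact O(N^2) sampling of the backward indices J_k^{i,l}
tau_next = np.empty(N)
for i in range(N):
    lam = w_k * q_pdf(xi_k, xi_next[i])      # unnormalized Lambda_k^N(i, .)
    J = rng.choice(N, size=N_tilde, p=lam / lam.sum())
    tau_next[i] = np.mean(tau_k[J] + h_k(xi_k[J], xi_next[i]))

# smoothed expectation phi_{0:n|n}^N[tau_n] (here after a single update)
w_next = np.ones(N)              # placeholder filtering weights at time k+1
est = np.sum(w_next * tau_next) / np.sum(w_next)
```

Only `tau_k`, the current particle cloud, and the new observation are needed at each step, which is the online property emphasized below.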
It is clear from steps (i) to (iii) that each time a new observation Y_{n+1} is received, the quantities \(\left (\tau _{n+1}^{i}\right)_{1\le i \le N}\) can be updated using only Y_{n+1}, \(\left (\tau _{n}^{i}\right)_{1\le i \le N}\) and the particle filter at time n. This means that storage requirements do not increase when processing additional data.
As proved in [28], the algorithm is asymptotically consistent (as N goes to infinity) for any precision parameter \(\tilde N\). However, there is a significant qualitative difference between the cases \(\tilde {N} = 1\) and \(\tilde {N} \geq 2\). As for the FFBSi algorithm, when there exists σ_{+} such that 0<q_{k}<σ_{+}, the PaRIS algorithm may be implemented with \(\mathcal {O}(N)\) complexity using the accept-reject mechanism of [10].
In general situations, the PaRIS algorithm cannot be used for stochastic differential equations as q_{k} is unknown. Therefore, the computation of the importance weights \(\omega _{k}^{\ell }\) and of the acceptance ratio of [10] is not tractable. Following [13, 27], filtering weights can be approximated by replacing \({q_{k}}\left (\xi ^{\ell }_{k},\xi _{k+1}^{i}\right)\) by an unbiased estimator \(\widehat {q}_{k}\left (\xi ^{\ell }_{k},\xi _{k+1}^{i};\zeta _{k}\right)\), where ζ_{k} is a random variable in \(\mathbb {R}^{q}\) such that
$$\begin{aligned} &\widehat{q}_{k}\left(\xi^{\ell}_{k},\xi_{k+1}^{i};\zeta_{k}\right)> 0~~\text{a.s.}\quad\text{and}\\ &\mathbb{E}\left[\left.\widehat{q}_{k}\left(\xi^{\ell}_{k},\xi_{k+1}^{i};\zeta_{k}\right)\right\vert \mathcal{G}_{k+1}^{N}\right] = {q_{k}}\left(\xi^{\ell}_{k},\xi_{k+1}^{i}\right)\;, \end{aligned} $$
where for all 0≤k≤n,
$$\begin{aligned} \mathcal{F}_{k}^{N} &= \sigma\left\{Y_{0:k};\left(\xi^{\ell}_{u},\omega^{\ell}_{u},\tau^{\ell}_{u}\right);J_{v}^{\ell,j};~1\le \ell\le N,~0\le u\le k,~1\le j \le \widetilde{N},~0\le v< k\right\}\;,\\ \mathcal{G}_{k+1}^{N} &= \mathcal{F}_{k}^{N} \vee \sigma\left\{Y_{k+1};\left(\xi^{\ell}_{k+1},\omega^{\ell}_{k+1}\right);~1\le \ell\le N\right\}\;. \end{aligned} $$
Practical choices for ζ_{k} are discussed below; see for instance (14), which presents the choice made for the implementation of such estimators in our context. In the case where q_{k} is unknown, the filtering weights in (6) then become
$$ \widehat{\omega}^{\ell}_{k} := \frac{\widehat{q}_{k-1}\left(\xi_{k-1}^{I^{\ell}_{k}},\xi^{\ell}_{k};\zeta_{k-1}\right)g_{k}\left(\xi^{\ell}_{k}\right)}{\vartheta_{k}\left(\xi^{I^{\ell}_{k}}_{k-1}\right) p_{k-1} \left(\xi_{k-1}^{I^{\ell}_{k}},\xi^{\ell}_{k}\right)}\;. $$
(9)
In Algorithm 1, M independent copies \(\left (\zeta ^{m}_{k-1}\right)_{1\le m \le M}\) of ζ_{k−1} are sampled, and the empirical mean of the associated estimates of the transition density is used to compute \(\widehat {\omega }^{\ell }_{k}\) instead of a single realization. Therefore, to obtain a generalized random version of the PaRIS algorithm, we only need to be able to sample from the discrete probability distribution \(\Lambda _{k}^{N}(i,\cdot)\) in the case of POD processes.
Consider the following assumption: for all 0≤k≤n−1, there exists a random variable \(\hat {\sigma }^{k}_{+}\) measurable with respect to \(\mathcal {G}_{k+1}^{N}\) such that,
$$ \sup_{x,y,\zeta}\;\widehat{q}_{k}(x,y;\zeta)\leq \hat{\sigma}^{k}_+\;. $$
(10)
Lemma 1
Assume that (10) holds for some 0≤k≤n−1. For all 1≤i≤N, define the random variable \(J_{k}^{i}\) as follows:
Then, the conditional probability distribution given \(\mathcal {G}_{k+1}^{N}\) of \(J_{k}^{i}\) is \(\Lambda _{k}^{N}(i,\cdot)\).
Proof
See Appendix. □
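The accept-reject construction of Lemma 1 (displayed as an algorithm box in the original) can be sketched as follows. The candidate-proposal form (propose J proportionally to the weights, draw a fresh ζ at each trial, accept with probability \(\widehat{q}_{k}/\hat{\sigma}^{k}_{+}\)) follows the mechanism of [10]; the toy estimator `q_hat` and the bound `sigma_hat` below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200

# placeholder particle cloud and target particle xi_{k+1}^i
xi_k = rng.normal(size=N)
w_k = rng.uniform(0.5, 1.5, size=N)
xi_next_i = 0.3

# hypothetical noisy estimator: unbiased for a Gaussian kernel since
# zeta ~ Uniform(0, 2) has mean 1, and bounded a.s. by sigma_hat, cf. (10)
def q_hat(x, y, zeta):
    base = np.exp(-(y - 0.9 * x)**2 / 2.0) / np.sqrt(2.0 * np.pi)
    return base * zeta

sigma_hat = 2.0 / np.sqrt(2.0 * np.pi)   # a.s. upper bound on q_hat

def sample_J():
    """Accept-reject draw of J_k^i, with a fresh zeta at each proposal."""
    p = w_k / w_k.sum()
    while True:
        J = rng.choice(N, p=p)           # candidate proportional to the weights
        zeta = rng.uniform(0.0, 2.0)
        if rng.uniform() * sigma_hat <= q_hat(xi_k[J], xi_next_i, zeta):
            return J

J = sample_J()
```

Since the number of proposals per accepted index does not grow with N (for a fixed acceptance rate), this mechanism preserves the \(\mathcal{O}(N)\) complexity discussed below.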
Note that Lemma 1 still holds if assumption (10) is relaxed and replaced by
$$ \sup_{j,y,\zeta}\;\widehat{q}_{k}\left(\xi^{j}_{k},y;\zeta\right)\leq \hat{\sigma}^{k}_+\;. $$
(11)
It is worth noting that under assumption (10) or (11), the linear complexity property of the PaRIS algorithm is ensured. The following assumption can also be considered. For all 1≤i≤N,
$$ \sup_{j,\zeta}\;\widehat{q}_{k}\left(\xi^{j}_{k},\xi^{i}_{k+1};\zeta\right)\leq \hat{\sigma}^{k,i}_+\;. $$
(12)
If only assumption (12) holds, the algorithm has a quadratic complexity. The bound of (10) is uniform (it does not depend on the particles) and can be used for every particle 1≤i≤N. However, this bound can be large (relative to the simulated set of particles), which degrades the acceptance rate of the algorithm of Lemma 1. The bound of (12) requires N computations per particle (therefore, N^{2} computations overall). However, this second bound is sharper than the one of (10) for the acceptance-rejection procedure and may lead to a computationally more efficient algorithm.
Bounded estimator of q_{k} using GPEs For \(x, y \in \mathbb {R}^{d}\), by Girsanov's and Itô's formulas, the transition density q_{k}(x,y) of (1) satisfies, with Δ_{k}=t_{k+1}−t_{k},
$$q_{k}(x,y)=\varphi_{\Delta_{k}}(x,y)\exp\left\lbrace A(y)-A(x)\right\rbrace \mathbb{E}_{\mathbb{W}^{x,y,\Delta_{k}}}\left[ \exp \left\lbrace - \int_{0}^{\Delta_{k}} \phi(\mathsf{w}_{s})\mathrm{d} s \right\rbrace \right]\;, $$
where \(\mathbb {W}^{x,y,\Delta _{k}}\) is the law of a Brownian bridge starting at x at time 0 and hitting y at time Δ_{k}, \((\mathsf {w}_{t})_{0\leq t \leq \Delta _{k}}\) is such a Brownian bridge, \(\varphi _{\Delta _{k}}(x,y)\) is the p.d.f. of a normal distribution with mean x and variance Δ_{k} evaluated at y, and \(\phi :\mathbb {R}^{d}\to \mathbb {R}\) is defined as
$$\phi(x) =\left(\|\alpha(x)\|^{2} + \triangle A(x)\right)/2\;. $$
Assume that there exist random variables L_{w} and U_{w} such that for all 0≤s≤Δ_{k}, L_{w}≤ϕ(w_{s})≤U_{w}. The performance of the estimator depends on the choice of L_{w} and U_{w}, which is specific to the SDE. In the case of the models analyzed in Section 5, these bounds are discussed in [13] for the SINE model and in [27] for the log-growth model. Note that in the case where ϕ is not upper bounded, [5] proposed the EA3 algorithm. This layered Brownian bridge construction first samples random variables to determine in which layer the Brownian bridge lies before simulating the bridge conditionally on the event that it belongs to this layer. By continuity of ϕ, L_{w} and U_{w} can then be computed easily.
Let κ be a random variable taking values in \(\mathbb {N}\) with distribution μ, let (U_{j})_{1≤j≤κ} be independent uniform random variables on [0,Δ_{k}], and set ζ_{k}={κ,w,U_{1},…,U_{κ}}. As shown in [13], a positive unbiased estimator is given by
$$\begin{aligned} \widehat{q}_{k}(x,y;\zeta_{k}) =&\ \varphi_{\Delta_{k}}(x,y) \exp \left\{A(y) - A(x)\right\}\\ &\ \times\exp\left\{-\mathsf{U}_{\mathsf{w}}\Delta_{k}\right\}\frac{\Delta_{k}^{\kappa}}{\mu(\kappa)\kappa!}\prod_{j=1}^{\kappa}\left(\mathsf{U}_{\mathsf{w}}-\phi\left(\mathsf{w}_{U_{j}}\right)\right)\;. \end{aligned} $$
(13)
Interesting choices of μ are discussed in [13], and we focus here on the so-called GPE-1, where μ is a Poisson distribution with intensity (U_{w}−L_{w})Δ_{k}. In that case, the estimator (13) becomes
$$ \widehat{q}_{k}(x,y;\zeta_{k}) = \varphi_{\Delta_{k}}(x,y) \exp \left\{A(y) - A(x) - \mathsf{L}_{\mathsf{w}}\Delta_{k} \right\}\prod_{j=1}^{\kappa}\frac{\mathsf{U}_{\mathsf{w}}-\phi\left(\mathsf{w}_{U_{j}}\right)}{\mathsf{U}_{\mathsf{w}}-\mathsf{L}_{\mathsf{w}}}\;. $$
(14)
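The GPE-1 computation can be sketched as follows for a hypothetical scalar SDE with drift α(x)=−x, so that A(x)=−x²/2 and ϕ(x)=(x²−1)/2. The bounds L_{w}, U_{w} here are crude hand-picked values, not the rigorous layered (EA-type) construction the text describes; a real implementation must derive them from the simulated bridge.

```python
import numpy as np

rng = np.random.default_rng(4)

def A(x):
    return -x**2 / 2.0           # antiderivative of the drift alpha(x) = -x

def phi(x):
    return (x**2 - 1.0) / 2.0    # (alpha(x)^2 + A''(x)) / 2

def gpe1(x, y, delta, slack=4.0):
    L_w = -0.5                                 # global minimum of phi
    U_w = phi(max(abs(x), abs(y)) + slack)     # crude band around the bridge
    kappa = rng.poisson((U_w - L_w) * delta)   # kappa ~ Poisson((U-L) Delta_k)
    t = np.sort(rng.uniform(0.0, delta, size=kappa))
    # Brownian bridge from (0, x) to (delta, y), sampled sequentially at t
    w, s, vals = x, 0.0, []
    for tj in t:
        m = w + (y - w) * (tj - s) / (delta - s)
        v = (tj - s) * (delta - tj) / (delta - s)
        w = rng.normal(m, np.sqrt(v))
        vals.append(phi(w))
        s = tj
    gauss = np.exp(-(y - x)**2 / (2.0 * delta)) / np.sqrt(2.0 * np.pi * delta)
    prod = np.prod([(U_w - p) / (U_w - L_w) for p in vals])  # product in (14)
    return gauss * np.exp(A(y) - A(x) - L_w * delta) * prod

q_hat = gpe1(0.1, -0.2, 0.5)
```

Each factor of the product lies in [0, 1] as long as the bridge stays inside the band, which is what makes the bound ρ discussed next sufficient.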
On the r.h.s. of (14), the product over κ elements is bounded by 1. Therefore, a sufficient condition to satisfy one of the assumptions (10)–(12) is that the function
$$\begin{aligned} \rho_{\Delta_{k}}:\; \mathbb{R}^{d}\times \mathbb{R}^{d} &\to \mathbb{R}\\ (x,y)&\mapsto \varphi_{\Delta_{k}}(x,y) \exp \left\{A(y) - A(x) - \mathsf{L}_{\mathsf{w}}\Delta_{k} \right\} \end{aligned} $$
is upper bounded almost surely by \(\hat {\sigma }^{k}_{+}\). In particular, if L_{w} is bounded from below almost surely, (14) always satisfies assumption (12) and Algorithm 1 can be used. This condition is always satisfied for models in the domains required for the application of the exact algorithms EA1, EA2, and EA3 defined in [6].
When (10) or (11) holds, it can nonetheless be of practical interest to choose the bounds \(\hat {\sigma }^{k,i}_{+}\), 1≤i≤N, corresponding to (12). Indeed, this might significantly increase the acceptance rate of the algorithm and therefore reduce the number of draws of the random variable ζ_{k}, which has a much higher cost than the computation of \(\rho _{\Delta _{k}}\), as it requires simulations of Brownian bridges. Moreover, this option makes it possible to avoid numerical optimization if no analytical expression of \(\hat {\sigma }_{+}^{k}\) is available. In practice, this seems more efficient in terms of computational time when N takes moderate values.