4.1 Averaged estimators
We turn our attention to the analysis of the ensemble of non-interacting PFs outlined in Section 3.1. In particular, we study the accuracy of the particle approximations \(\pi _{t}^{m,N}\) and \(\pi _{t}^{M \times N}\) introduced in Eqs. (12) and (13), respectively. We adopt the mean square error (MSE) for integrals of bounded real functions,
$$\text{MSE} \equiv E\left[ \left(\left(f,\pi_{t}^{M\times N}\right) - (f,\pi_{t}) \right)^{2} \right], \quad f \in B({\mathcal X}), $$
as a performance metric. Since the underlying state-space model is the same for all filters and they are run in a completely independent manner, the measure-valued random variables \(\pi _{t}^{m,N}\), m=1,...,M, are i.i.d., and it is straightforward to show (via Lemma 1) that
$$ E\left[ \left(\left(f,\pi_{t}^{M\times N}\right) - (f,\pi_{t}) \right)^{2} \right] \le \frac{ c_{t}^{2} \| f \|_{\infty}^{2}}{MN}, $$
(15)
for some constant \(c_{t}\) independent of N and M. However, the inequality (15) does not illuminate the effect of the choice of N. In the extreme case of N = 1, for example, \(\pi _{t}^{M \times N}\) reduces to the outcome of a sequential importance sampling algorithm, with no resampling, which is known to degenerate quickly in practice. Instead of (15), we seek a bound for the approximation error that provides some indication of the trade-off between the number of independent filters, M, and the number of particles per filter, N.
With this purpose, we tackle the classical decomposition of the MSE into variance and bias terms. First, we obtain preliminary results that are needed for the analysis of the average measure \(\pi _{t}^{M \times N}\). In particular, we prove that the random non-normalised measure \(\rho _{t}^{N}\) produced by the bootstrap filter (Algorithm 1) is unbiased and attains \(L_{p}\) error rates proportional to \(\frac {1}{\sqrt {N}}\), i.e. the same as \(\xi _{t}^{N}\) and \(\pi _{t}^{N}\). We use these results to derive an upper bound for the bias of \(\pi _{t}^{N}\) which is proportional to \(\frac {1}{N}\). The latter enables us to deduce an upper bound for the MSE of the ensemble approximation \(\pi _{t}^{M \times N}\) consisting of two additive terms that depend explicitly on M and N. Specifically, we show that the variance component of the MSE decays linearly with the total number of particles, K=MN, while the bias term decreases with N^{2}, i.e. quadratically with the number of particles per filter.
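The trade-off between M and N described above can be previewed with a quick numerical sketch. The code below is not the bootstrap filter of Algorithm 1: it replaces each PF with a single self-normalised importance sampling step (the proposal, weight function and test function are illustrative choices, and the function names are hypothetical), which exhibits the same qualitative behaviour, namely a per-filter bias of order 1/N that averaging over M independent replicas cannot remove.

```python
import numpy as np

def snis_estimate(n, rng):
    """One self-normalised importance sampling estimate of (f, pi) with
    f(x) = x, proposal N(0, 1) and weight g(x) = exp(-(x - 1)^2 / 2).
    The normalised target is then N(1/2, 1/2), so the true value is 0.5."""
    x = rng.normal(size=n)
    w = np.exp(-0.5 * (x - 1.0) ** 2)
    return np.sum(w * x) / np.sum(w)

def ensemble_estimate(m, n, rng):
    """Average of m independent estimators with n samples each,
    mimicking the averaged estimator (f, pi^{M x N})."""
    return np.mean([snis_estimate(n, rng) for _ in range(m)])

rng = np.random.default_rng(0)
# Averaging over independent estimators shrinks the variance, but the
# O(1/N) bias of each estimator survives the average when N is small.
small_n = np.mean([ensemble_estimate(50, 5, rng) for _ in range(200)])
large_n = np.mean([ensemble_estimate(50, 500, rng) for _ in range(200)])
```

With these settings, the ensemble built from N = 500 samples per estimator lands much closer to the true value 0.5 than the one built from N = 5, no matter how many independent estimators are averaged.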
4.2 Assumptions on the state-space model
All the results to be introduced in the rest of Section 4 hold under the (mild) assumptions of Lemma 1, which we summarise below for convenience of presentation.
Assumption 1
The sequence of observations Y_{1:T}=y_{1:T} is arbitrary but fixed, with T<∞.
Assumption 2
The likelihood functions are bounded and positive, i.e.
$$ g_{t}^{y_{t}} \in B({\mathcal X}) \quad\text{and}\quad g_{t}^{y_{t}}>0 \quad\text{for every}\quad t= 1, 2,..., T. $$
Remark 2
Note that Assumptions 1 and 2 imply that

\((g_{t}^{y_{t}},\alpha) > 0\) for any \(\alpha \in {\mathcal P}({\mathcal X})\), and

\(\prod _{k=1}^{t} g_{k}^{y_{k}} \le \prod _{k=1}^{t} \| g_{k}^{y_{k}} \|_{\infty } < \infty \),

for every t=1,2,...,T.
Remark 3
We seek simple convergence results for a fixed time horizon T<∞, similar to Lemma 1. Therefore, no further assumptions related to the stability of the optimal filter for the state-space model [4,25] are needed. If such assumptions are imposed, then stronger (time-uniform) asymptotic convergence can be proved, similar to Theorem 1 in Section 3.2. See [11] for additional results that apply to the independent filters \(\pi _{t}^{m,N}\) and the ensemble \(\pi _{t}^{M \times N}\).
4.3 Bias and error rates
Our analysis relies on some properties of the particle approximations of the non-normalised measures ρ_{t}, t≥1. We first show that the estimate \(\rho _{t}^{N}\) in Eq. (8) is unbiased.
Lemma 2
If Assumptions 1 and 2 hold, then
$$ E\left[ \left(f,\rho_{t}^{N}\right) \right] = (f,\rho_{t}) $$
for any \(f \in B({\mathcal X})\) and every t=1,2,...,T.
Proof
See Appendix 1 for a self-contained proof. □
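The unbiasedness property is easy to check numerically in the simplest setting of a single time step, where \(\rho_{1}(f) = \int f(x)\, g_{1}^{y_{1}}(x)\, \tau(dx)\) for a prior τ. The snippet below uses illustrative choices of τ, g and f (not the model of Algorithm 1), for which the integral has a closed form:

```python
import numpy as np

# One-step toy model: prior tau = N(0, 1), weight g(x) = exp(-(x - 1)^2 / 2)
# and test function f(x) = x. For these choices, (f, rho) = int f g dtau
# has the closed form exp(-1/4) / (2 * sqrt(2)) ~ 0.2753.
true_value = np.exp(-0.25) / (2.0 * np.sqrt(2.0))

rng = np.random.default_rng(1)
n = 2                    # deliberately tiny number of particles
reps = 400_000           # independent replications of the estimator
x = rng.normal(size=(reps, n))                              # x_i ~ tau
rho_n = np.mean(np.exp(-0.5 * (x - 1.0) ** 2) * x, axis=1)  # (f, rho^n)

# Unbiasedness: the average over replications matches (f, rho) even
# though each individual estimate uses only n = 2 samples.
avg = rho_n.mean()
```

The point of the experiment is that no large-N limit is involved: the unnormalised estimator is unbiased for every N, in agreement with Lemma 2.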
Remark 4
The result in Lemma 2 was originally proved in [4]. For the case \(f=\mathbf{1}\), where \(\mathbf{1}(x)=1\), it states that the estimate \(\left (\mathbf {1},\rho _{t}^{N}\right)\) of the proportionality constant of the posterior distribution π_{t} is unbiased. This property is at the core of recent model inference algorithms such as particle MCMC [27], SMC^{2} [28] or some population Monte Carlo [29] methods.
Combining Lemma 2 with the standard result of Lemma 1 leads to an explicit convergence rate for the \(L_{p}\) norms of the approximation errors \(\left (f,\rho _{t}^{N}\right) - (f,\rho _{t})\).
Lemma 3
If Assumptions 1 and 2 hold, then, for any \(f\in B({\mathcal X})\), any p≥1 and every t=1,2,...,T, we have the inequality
$$ \left\| \left(f,\rho_{t}^{N}\right) - (f,\rho_{t}) \right\|_{p} \le \frac{ \tilde c_{t} \| f \|_{\infty} }{ \sqrt{N}}, $$
(16)
where \(\tilde c_{t} < \infty \) is a constant independent of N.
Proof
See Appendix 2. □
Finally, Lemmas 2 and 3 together enable the calculation of explicit rates for the bias of the particle approximation of (f,π_{t}). This is a key result for the decomposition of the MSE into variance and bias terms. To be specific, we can prove the following theorem.
Theorem 4
If 0<(1,ρ_{t})<∞ for t=1,2,...,T and Assumptions 1 and 2 hold, then, for any \(f\in B({\mathcal X})\) and every 0≤t≤T, we obtain
$$ \left| E\left[ \left(f,\pi_{t}^{N}\right) - \left(f,\pi_{t}\right) \right] \right| \le \frac{ \hat c_{t} \| f \|_{\infty} }{ N }, $$
where \(\hat c_{t} < \infty \) is a constant independent of N.
Proof
Let us first note that \((f,\pi_{t})=(f,\rho_{t})/(\mathbf{1},\rho_{t})\) and
$$\begin{array}{*{20}l} \left(f,\pi_{t}^{N}\right) &= \frac{ \left(f,\rho_{t}^{N}\right) }{ G_{t}^{N} } \end{array} $$
(17)
$$\begin{array}{*{20}l} &= \frac{ \left(f,\rho_{t}^{N}\right) }{ G_{t}^{N} \left(\mathbf{1},\pi_{t}^{N}\right) } \end{array} $$
(18)
$$\begin{array}{*{20}l} &= \frac{ \left(f,\rho_{t}^{N}\right) }{ \left(\mathbf{1},\rho_{t}^{N}\right) }, \end{array} $$
(19)
where (17) follows from the construction of \(\rho _{t}^{N}\), (18) holds because \(\left (\mathbf {1},\pi _{t}^{N}\right)=1\) and (19) is, again, a consequence of the definition of \(\rho _{t}^{N}\). Therefore, the difference \(\left (f,\pi _{t}^{N}\right)-\left (f,\pi _{t}\right)\) can be written as
$$\left(f,\pi_{t}^{N}\right)-\left(f,\pi_{t}\right) = \frac{ \left(f,\rho_{t}^{N}\right) }{ \left(\mathbf{1},\rho_{t}^{N}\right) } - \frac{ (f,\rho_{t}) }{ (\mathbf{1},\rho_{t}) } $$
and, since \(\left (f,\rho _{t}\right)=E\left [\left (f,\rho _{t}^{N}\right)\right ]\) (from Lemma 2), the bias can be expressed as
$$ E\left[ \left(f,\pi_{t}^{N}\right)-(f,\pi_{t}) \right] = E\left[ \frac{ \left(f,\rho_{t}^{N}\right) }{ \left(\mathbf{1},\rho_{t}^{N}\right) } - \frac{ \left(f,\rho_{t}^{N}\right) }{ (\mathbf{1},\rho_{t}) } \right]. $$
(20)
Some elementary manipulations on (20) yield the equality
$$\begin{array}{*{20}l} {}E \left[ \left(f,\pi_{t}^{N}\right)-\left(f,\pi_{t}\right) \right] = E\left[ \left(f,\pi_{t}^{N}\right) \frac{ (\mathbf{1},\rho_{t}) - \left(\mathbf{1},\rho_{t}^{N}\right) }{ (\mathbf{1},\rho_{t}) } \right]. \end{array} $$
(21)
If we realise that \(E\left [ (\mathbf {1},\rho _{t}) - \left (\mathbf {1},\rho _{t}^{N}\right) \right ]=0\) (again, a consequence of Lemma 2) and move the factor (1,ρ_{t})^{−1} out of the expectation, then we easily rewrite Eq. (21) as
$$\begin{array}{*{20}l} {}E\left[ \left(f,\pi_{t}^{N}\right)-(f,\pi_{t}) \right] &= \frac{1}{(\mathbf{1},\rho_{t})} E\left[ \left(f,\pi_{t}^{N}\right) \left((\mathbf{1},\rho_{t}) - \left(\mathbf{1},\rho_{t}^{N}\right) \right) \right] \\ &\quad - \frac{ (f,\pi_{t}) }{ (\mathbf{1},\rho_{t}) } E\left[ (\mathbf{1},\rho_{t}) - \left(\mathbf{1},\rho_{t}^{N}\right) \right] \\ &= \frac{1}{(\mathbf{1},\rho_{t})} E\left[ \left(\left(f,\pi_{t}^{N}\right)-\left(f,\pi_{t}\right) \right) \left((\mathbf{1},\rho_{t})-\left(\mathbf{1},\rho_{t}^{N}\right) \right) \right] \\ &\le \frac{1}{(\mathbf{1},\rho_{t})} \sqrt{ E\left[ \left(\left(f,\pi_{t}^{N}\right)-\left(f,\pi_{t}\right) \right)^{2} \right] } \times \sqrt{ E\left[ \left((\mathbf{1},\rho_{t})-\left(\mathbf{1},\rho_{t}^{N}\right) \right)^{2} \right] } \end{array} $$
(22)
$$\begin{array}{*{20}l} &\le \frac{1}{(\mathbf{1},\rho_{t})} \left(\frac{ c_{t} \| f \|_{\infty} }{ \sqrt{N} } \times \frac{ \tilde c_{t} }{ \sqrt{N} } \right) = \frac{ \hat c_{t} \| f \|_{\infty} }{ N }, \end{array} $$
(23)
where we have applied the Cauchy-Schwarz inequality to obtain (22), (23) follows from Lemmas 1 and 3, and
$$\hat c_{t} = \frac{ c_{t} \tilde c_{t} }{ (\mathbf{1},\rho_{t}) } < \infty $$
is a constant independent of N. □
The result in Theorem 4 was originally proved in [30], albeit by a different method.
For any \(f \in B({\mathcal X})\), let \({\mathcal E}_{t}^{N}(f)\) denote the approximation difference, i.e.
$${\mathcal E}_{t}^{N}(f) \triangleq \left(f,\pi_{t}^{N}\right) - \left(f,\pi_{t}\right). $$
This is a r.v. whose second-order moment yields the MSE of \(\left (f,\pi _{t}^{N}\right)\). It is straightforward to obtain a bound for the MSE from Lemma 1 and, by subsequently using Theorem 4, we readily find a similar bound for the variance of \({\mathcal E}_{t}^{N}(f)\), denoted \(\text {\sf Var}\left [{\mathcal E}_{t}^{N}(f)\right ]\). These results are explicitly stated in the corollary below.
Corollary 1
If 0<(1,ρ_{t})<∞ for t=1,2,...,T and Assumptions 1 and 2 hold, then, for any \(f\in B({\mathcal X})\) and any 0≤t≤T, we obtain
$$\begin{array}{*{20}l} E\left[ \left({\mathcal E}_{t}^{N}\left(f\right) \right)^{2} \right] &\le \frac{ c_{t}^{2} \| f \|_{\infty}^{2} }{ N } \quad \text{and} \end{array} $$
(24)
$$\begin{array}{*{20}l} \text{\sf Var}\left[ {\mathcal E}_{t}^{N}\left(f\right) \right] &\le \frac{ \left(c_{t}^{v} \right)^{2} \| f \|_{\infty}^{2} }{ N }, \end{array} $$
(25)
where c_{t} and \(c_{t}^{v}\) are finite constants independent of N.
Proof
The inequality (24) for the MSE is a straightforward consequence of Lemma 1. Moreover, we can write the MSE in terms of the variance and the square of the bias, which yields
$$ {}E\left[ \left({\mathcal E}_{t}^{N}(f) \right)^{2} \right] = \text{\sf Var}\left[ {\mathcal E}_{t}^{N}(f) \right] + E^{2}\left[ {\mathcal E}_{t}^{N}(f) \right] \le \frac{ c_{t}^{2} \| f \|_{\infty}^{2} }{ N }. $$
(26)
Since Theorem 4 ensures that \(\big | E\left [{\mathcal E}_{t}^{N}(f)\right ] \big | \le \frac {\hat c_{t}\| f \|_{\infty }}{N}\), the inequality (26) implies that there exists a constant \(c_{t}^{v}<\infty \) such that (25) holds. □
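The decomposition used in (26) is the elementary identity MSE = variance + squared bias, which holds exactly for empirical moments as well and is easy to check numerically. The estimator below is a hypothetical placeholder (a Gaussian with a known bias), not a particle filter:

```python
import numpy as np

rng = np.random.default_rng(2)
true_value = 0.0
# Draws of a hypothetical estimator with bias 0.1 and standard deviation 0.3.
estimates = rng.normal(loc=true_value + 0.1, scale=0.3, size=10_000)

mse = np.mean((estimates - true_value) ** 2)      # empirical MSE
variance = np.var(estimates)                      # empirical variance
bias_sq = (np.mean(estimates) - true_value) ** 2  # empirical squared bias
# mse == variance + bias_sq, up to floating-point rounding.
```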
4.4 Error rate for the averaged estimators
Let us run M independent PFs with the same (fixed) sequence of observations Y_{1:T}=y_{1:T}, T<∞, and N particles each. The random measures output by the mth filter are denoted
$$\xi_{t}^{m,N}, \quad \pi_{t}^{m,N} \quad \text{and} \quad \rho_{t}^{m,N}, \quad \text{with \(m = 1, 2,..., M\).} $$
Obviously, all the theoretical properties established in Section 4.3, as well as the basic Lemma 1, hold for each one of the M independent filters.
Definition 1
The ensemble approximation of π_{t} with M independent filters is the discrete random measure \(\pi _{t}^{M \times N}\) constructed as
$$ \pi_{t}^{M \times N} = \frac{1}{M} \sum_{m=1}^{M} \pi_{t}^{m,N}, $$
and the averaged estimator of (f,π_{t}) is \(\left (f,\pi _{t}^{M\times N}\right)\).
It is apparent that similar ensemble approximations can be given for ξ_{t} and ρ_{t}. Moreover, the statistical independence of the PFs yields the following corollary as a straightforward consequence of Theorem 4 and Corollary 1.
Corollary 2
If 0<(1,ρ_{t})<∞ for t=1,2,...,T and Assumptions 1 and 2 hold, then, for any \(f\in B({\mathcal X})\) and any 0≤t≤T, the inequality
$$ {}E \left[ \left(\left(f,\pi_{t}^{M \times N}\right) - \left(f,\pi_{t}\right) \right)^{2} \right] \le \frac{ (c_{t}^{v})^{2} \| f \|_{\infty}^{2} }{ MN } + \frac{ \hat c_{t}^{2} \| f \|_{\infty}^{2} }{N^{2}} $$
(27)
holds for some constants \(c_{t}^{v}\) and \(\hat c_{t}\) independent of N and M.
Proof
Let us denote
$$\begin{array}{*{20}l} {\mathcal E}_{t}^{M \times N}\left(f\right) &= \left(f,\pi_{t}^{M \times N}\right) - \left(f,\pi_{t}\right) \quad \text{and} \\ {\mathcal E}_{t}^{m,N}\left(f\right) &= \left(f,\pi_{t}^{m, N}\right) - \left(f,\pi_{t}\right) \end{array} $$
for m=1,2,...,M. Since \(\pi _{t}^{M \times N}\) is a linear combination of i.i.d. random measures, we easily obtain that
$$\begin{array}{*{20}l} \left| E\left[ {\mathcal E}_{t}^{M \times N}(f) \right] \right|^{2} &= \left| \frac{1}{M} \sum_{m=1}^{M} E\left[ {\mathcal E}_{t}^{m,N}(f) \right] \right|^{2} \\ &= \left| E\left[ {\mathcal E}_{t}^{m,N}(f) \right] \right|^{2} \\ &\le \frac{\hat c_{t}^{2} \| f \|_{\infty}^{2}}{N^{2}}, \quad \text{for any \(m \le M\)}, \end{array} $$
(28)
where the inequality follows from Theorem 4. Moreover, again because of the independence of the random measures, we readily calculate a bound for the variance of \({\mathcal E}_{t}^{M \times N}(f)\),
$$ {}\text{\sf Var}\left[ {\mathcal E}_{t}^{M \times N}(f) \right] = \frac{1}{M} \text{\sf Var}\left[ {\mathcal E}_{t}^{m,N}(f) \right] \le \frac{ (c_{t}^{v})^{2} \| f \|_{\infty}^{2} }{ MN }, $$
(29)
where the inequality follows from Corollary 1. Since \(E\left [ \left ({\mathcal E}_{t}^{M \times N}(f)\right)^{2} \right ] = \text {\sf Var}\left [ {\mathcal E}_{t}^{M\times N}(f) \right ] + \left | E\left [ {\mathcal E}_{t}^{M \times N}(f) \right ] \right |^{2}\), combining (29) and (28) yields (27) and concludes the proof. □
The inequality in Corollary 2 shows explicitly that the bias of the estimator \(\left (f,\pi _{t}^{M \times N}\right)\) cannot be arbitrarily reduced when N is fixed, even if M→∞. This feature is already discussed in Section 3.3. Note that the inequality (27) holds for any choice of M and N, while Theorem 2 yields asymptotic limits.
Remark 5
According to the inequality (27), the bias component of the MSE of the estimator \(\left (f,\pi _{t}^{M\times N}\right)\) is controlled by the number of particles per subset, N, and decays quadratically with it, while, for fixed N, the variance decays linearly with M. The MSE rate is \(\propto \frac {1}{MN} \) as long as N≥M. Otherwise, the term \(\frac {\hat c_{t}^{2} \| f \|_{\infty }^{2}}{N^{2}}\) becomes dominant and the resulting asymptotic error bound turns out to be higher.
Remark 6
While the convergence results presented here have been proved for the standard bootstrap filter, it is straightforward to extend them to other classes of PFs for which Lemmas 1 and 2 hold.
4.5 Comparison of parallelisation schemes via time–error indices
The advantage of parallel computation is the drastic reduction of the time needed to run the PF. Let the running time for a PF with K particles be of order \({\mathcal T}(K)\), where \({\mathcal T}:\mathbb {N}\rightarrow (0,\infty)\) is some strictly increasing function of K. The quantity \({\mathcal T}(K)\) includes the time needed to generate new particles, weight them and perform resampling. The latter step is the bottleneck for parallelisation, as it requires the interaction of all K particles. Also, a ‘straightforward’ implementation of the resampling step leads to an execution time \({\mathcal T}(K)=K\log (K)\), although efficient algorithms exist that achieve linear time complexity, \({\mathcal T}(K)=K\). We can combine the MSE rate and the time complexity to propose a time–error performance metric.
Definition 2
We define the time–error index of a particle filtering algorithm with running time of order \({\mathcal T}\) and asymptotic MSE rate \({\mathcal R}\) as \({\mathcal C} \triangleq {\mathcal T} \times {\mathcal R}.\)
The smaller the index \({\mathcal C}\) for an algorithm, the more (asymptotically) efficient its implementation. For the standard (centralised) bootstrap filter (see Algorithm 1) with K particles, the running time is of order \({\mathcal T}(K)=K\) and the MSE rate is of order \({\mathcal R}(K)=\frac {1}{K}\); hence, the time–error index becomes
$${\mathcal C}_{bf}(K) = {\mathcal T}(K) \times {\mathcal R}(K) = 1. $$
For the computation of the ensemble approximation \(\pi _{t}^{M \times N}\), we can run M independent PFs in parallel, with N=K/M particles each and no interaction among them. Hence, the execution time becomes of order \({\mathcal T}(M,N)=N\). Since the error rate for the ensemble approximation is of order \({\mathcal R}(M,N)~=~\left (\frac {1}{MN}+\frac {1}{N^{2}}\right)\), the time–error index of the ensemble approximation is
$${\mathcal C}_{ens}(M,N) = {\mathcal T}(M,N) \times {\mathcal R}(M,N) = \frac{1}{M} + \frac{1}{N} $$
and hence it vanishes as M,N→∞. In particular, since we have to choose N≥M to ensure a rate of order \(\frac {1}{MN}\), we have \({\lim }_{M \rightarrow \infty } {\mathcal C}_{ens} = 0\). In any case, it is apparent that \({\mathcal C}_{ens} < {\mathcal C}_{bf}\) whenever \(\frac{1}{M}+\frac{1}{N}<1\), i.e. for all but the smallest values of M and N.
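The two indices are trivial to encode. The helpers below are an illustrative sketch (the function names are ours), with \({\mathcal R}\) for the ensemble taken from Corollary 2:

```python
def c_bf():
    """Centralised bootstrap filter with K particles: T(K) = K and
    R(K) = 1/K, so the time-error index is constant."""
    return 1.0

def c_ens(m, n):
    """Ensemble of m independent PFs with n particles each:
    T = n and R = 1/(m*n) + 1/n**2 (Corollary 2)."""
    return n * (1.0 / (m * n) + 1.0 / n ** 2)  # simplifies to 1/m + 1/n
```

For instance, c_ens(10, 100) evaluates to 0.11, well below c_bf() = 1.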
We have described alternative ensemble approximations, where M non-independent PFs are run with N particles each, in Section 3. The overall error rates for these methods are the same as for the standard bootstrap filter; however, the time complexity depends not only on the number of particles N allocated to each of the M subsets, but also on the subsequent interactions among subsets.
Let us consider, for example, the double bootstrap algorithm with adaptive selection of [13] (namely, [13] (Algorithm 4)). This is a scheme where

M bootstrap filters (as Algorithm 1 in this paper) are run in parallel and an aggregate weight is computed for each one of them, denoted \(W_{t}^{(m)}\);

when the coefficient of variation (CV) of these aggregate weights is greater than a given threshold, the M bootstrap filters are resampled (some filters are discarded and others are replicated using a multinomial resampling procedure).

See [13] (Section 4.2) for details. Assuming that the resampling procedure in the second step above (termed island-level resampling in [13]) is performed, on average, once every L time steps, the running time for this algorithm is
$${\mathcal T}(M,N,L) = \frac{L-1}{L} N + \frac{1}{L} MN = \frac{N}{L}(M+L-1), $$
while the approximation error is \({\mathcal R}(M,N) = \frac {1}{MN}\) (see [13] (Theorem 5)). Hence, the time–error index for this double bootstrap algorithm is
$${\mathcal C}_{dbf} = {\mathcal T}(M,N,L) \times {\mathcal R}(M,N) = \frac{M+L-1}{LM}. $$
When L≪N, we readily obtain that \({\mathcal C}_{ens} < {\mathcal C}_{dbf}\). For example, for a configuration with M = 10 filters and N = 100 particles each, and assuming that island-level resampling is performed every L = 20 time steps on average, we obtain \({\mathcal C}_{dbf} = 0.145\) and \({\mathcal C}_{ens} = 0.110\). Conversely, if L is large enough (namely, if L > N(M − 1)/M), the double bootstrap algorithm becomes more efficient, meaning that \({\mathcal C}_{dbf} < {\mathcal C}_{ens}\).
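The figures in this comparison, and the crossover condition L > N(M − 1)/M, can be reproduced with a few lines (a sketch with hypothetical helper names; note that \({\mathcal C}_{dbf}\) does not depend on N, since the factor N cancels between \({\mathcal T}\) and \({\mathcal R}\)):

```python
def c_ens(m, n):
    """Time-error index of m independent PFs with n particles each."""
    return 1.0 / m + 1.0 / n

def c_dbf(m, L):
    """Time-error index of the double bootstrap filter of [13], with
    island-level resampling once every L time steps on average."""
    return (m + L - 1) / (L * m)

m, n = 10, 100
index_dbf = c_dbf(m, L=20)     # (10 + 19) / 200 ~ 0.145
index_ens = c_ens(m, n)        # 1/10 + 1/100 ~ 0.110
crossover = n * (m - 1) / m    # L above this value favours c_dbf
```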
Computing the time–error index for practical algorithms can be hard and highly dependent on the specific implementation. Different implementations of the double bootstrap algorithm, for example, may yield different time–error indices depending on how the island-level resampling step is carried out.