 Research
 Open Access
On the performance of parallelisation schemes for particle filtering
 Dan Crisan^{1},
 Joaquín Míguez^{2, 3} and
 Gonzalo Ríos-Muñoz^{2, 3}
https://doi.org/10.1186/s13634-018-0552-x
© The Author(s) 2018
 Received: 2 March 2017
 Accepted: 1 May 2018
 Published: 25 May 2018
Abstract
Considerable effort has recently been devoted to the design of schemes for the parallel implementation of sequential Monte Carlo (SMC) methods for dynamical systems, also widely known as particle filters (PFs). In this paper, we present a brief survey of recent techniques, with an emphasis on the availability of analytical results regarding their performance. Most parallelisation methods can be interpreted as running an ensemble of lower-cost PFs, and the differences between schemes depend on the degree of interaction among the members of the ensemble. We also provide some insights into the use of the simplest scheme for the parallelisation of SMC methods, which consists of splitting the computational budget into M non-interacting PFs with N particles each and then obtaining the desired estimators by averaging over the M independent outcomes of the filters. This approach minimises the parallelisation overhead yet still displays desirable theoretical properties. We analyse the mean square error (MSE) of estimators of moments of the optimal filtering distribution and show the effect of the parallelisation scheme on the approximation error rates. Following these results, we propose a time–error index to compare schemes with different degrees of parallelisation. Finally, we provide two numerical examples involving stochastic versions of the Lorenz 63 and Lorenz 96 systems. In both cases, we show that the ensemble of non-interacting PFs can attain the approximation accuracy of a centralised PF (with the same total number of particles) in just a fraction of its running time using a standard multi-core computer.
Keywords
 Particle filtering
 Parallelisation
 Convergence analysis
 Particle islands
 Lorenz 63
 Lorenz 96
1 Introduction

Monte Carlo sampling in the space of the state variables,

Computation of weights for the generated samples and, finally,

Resampling according to the weights.
While at first sight the algorithm may look straightforward to parallelise (sampling and weighting can be carried out concurrently without any constraint), the resampling step involves the interaction of the whole set of Monte Carlo samples. Several authors have proposed schemes for ‘splitting’ the resampling step into simpler tasks that can be carried out concurrently. The approaches are diverse and range from the heuristic [7–9] to the mathematically well-principled [2, 10–14]. However, the former rely on (often loose) approximations that preclude any rigorous guarantee of convergence, whereas the latter involve non-negligible overhead to ensure the proper interaction of particles.

A survey of recently proposed, and mathematically well-grounded, parallelisation schemes for particle filtering and

Analytical insights into the performance of the simplest parallelisation method, namely the averaging of statistically independent PFs.
Besides describing the various methodologies, we aim to characterise their performance analytically whenever possible. For that purpose, we need to introduce precise notation, which is unfortunately somewhat more involved than needed for the mere description of the algorithmic steps. Then, we describe the standard PF, provide a basic convergence result for it and proceed to describe four different approaches to its parallelisation: the simple averaging of statistically independent (i.e. non-interacting) low-complexity PFs, the method based on the distributed resampling with non-proportional allocation (DRNA) procedure of [2, 10, 11], the particle island model of [13, 14] and the adaptive interaction scheme termed α-sequential Monte Carlo (α-SMC) in [12].
The simplest parallelisation scheme consists of running M statistically independent PFs with N particles (i.e. Monte Carlo samples) each and then averaging the M independent estimators. This approach has the limitation that the bias of the averaged estimator depends only on N. Hence, if N is relatively small, the bias is large even if we use a very high number M of parallel filters. This drawback can be overcome by allowing some degree of interaction among the M concurrently running PFs. The DRNA, particle island and α-SMC approaches introduce this interaction in different ways. In DRNA-based ensembles of PFs, each filter runs separately but periodically exchanges a few particles with other members of the ensemble over a communication network [10]. Algorithms in the particle island class rely on two levels of resampling: conventional resampling at the particle level and island-level resampling, where complete sets of particles (associated with parallel-running PFs) are replicated or eliminated stochastically [13, 14]. Finally, the α-SMC scheme of [12] is a very flexible methodology that enables the adaptive selection of different interaction patterns (i.e. which particles are resampled together) over time. For each of these techniques, we describe the methodology and establish basic theoretical guarantees of convergence.
In the second part of the paper, we focus on the analysis of the performance of the simplest parallelisation scheme, the averaging of M statistically independent PFs with N particles each. Under mild assumptions, we analyse the mean square error (MSE) of the estimators of one-dimensional statistics of the optimal filtering distribution and show explicitly the effect of the parallelisation scheme on the convergence rate. Specifically, we study the decomposition of the MSE into variance and bias components, to show that the variance is \(O\left (\frac {1}{MN}\right)\), i.e. it decreases linearly with the total number of particles, while the bias is \(O\left (\frac {1}{N}\right)\), so that its contribution to the MSE (the squared bias) is \(O\left (\frac {1}{N^{2}}\right)\), i.e. it goes to 0 quadratically with N. These results have already been obtained, e.g. in [13], using the Feynman–Kac framework of [4]. Here, we aim to provide a self-contained analysis that illustrates the key theoretical issues in the convergence of parallel PFs. All proofs are constructed from elementary principles, and we obtain explicit error rates (for the bias, the variance and the MSE) that hold for all M and N, while the theorems in [13] are strictly asymptotic. While we have focused here on PFs for discrete-time state-space models, the analysis can be carried out similarly for continuous-time systems, and, indeed, the basic results needed for that case can be found in [15]. Finally, in order to compare different parallelisation schemes, we introduce a time–error index that combines time complexity (asymptotic order of the running time) and estimation accuracy (asymptotic error rates) into a single quantitative figure of merit that can be used to compare schemes with different degrees of interaction.
The rest of the paper is organised as follows. In Section 2, we present basic background material, and notation, for the analysis of PFs. Section 3 is devoted to a survey of parallelisation schemes for particle filtering. Our analysis of the ensemble of non-interacting PFs is presented in Section 4. In Section 5, we present numerical results for two examples, namely the filtering of stochastic versions of the Lorenz 63 and Lorenz 96 systems. The latter is often used as a simplified model of atmospheric dynamics, and it has the property that it can be scaled to an arbitrary dimension. Our simulation results show that averaged estimators computed from ensembles of non-interacting filters can be advantageous in terms of accuracy (not only running time) as the system dimension grows. Finally, Section 6 is devoted to a discussion of the obtained results, together with some concluding remarks.
2 Background
2.1 Notation and preliminaries

Functions.

The supremum norm of a real function \(f:\mathbb {R}^{d} \rightarrow \mathbb {R}\) is denoted as \(\| f \|_{\infty } = \sup _{x\in \mathbb {R}^{d}} | f(x) |\).

B(S) is the set of bounded real functions over \(S \subseteq \mathbb {R}^{d}\), i.e. \(f \in B(S)\) if, and only if, f has domain S and \(\| f \|_{\infty } = \sup _{x \in S} | f(x) | < \infty\).


Measures and integrals. Let \(S \subseteq \mathbb {R}^{d}\) be a subset of \(\mathbb {R}^{d}\).

\({\mathcal B}(S)\) is the σ-algebra of Borel subsets of S.

\({\mathcal P}(S)\) is the set of probability measures over the measurable space \((S,{\mathcal B}(S))\).

\((f,\mu) \triangleq \int f(x) \mu (dx)\) is the integral of a real function \(f:S \rightarrow \mathbb {R}\) with respect to (w.r.t.) a measure \(\mu \in {\mathcal P}(S)\).

Given a probability measure \(\mu \in {\mathcal P}(S)\), a Borel set \(A \in {\mathcal B}(S)\) and the indicator function
$$ I_{A}(x) = \left\{ \begin{array}{ll} 1, &\text{if}\ x \in A\\ 0, &\text{otherwise} \end{array} \right., $$
the quantity \(\mu (A) = (I_{A},\mu)\) is the probability of A.


Sequences, vectors and random variables (r.v.’s).

We use a subscript notation for sequences, namely \(x_{t_{1}:t_{2}} \triangleq \left \{ x_{t_{1}}, \ldots, x_{t_{2}} \right \}\).

For an element \(x~=~\left (x_{1},\ldots,x_{d}\right) \in \mathbb {R}^{d}\) of a Euclidean space, its norm is denoted as \(\| x \| = \sqrt {x_{1}^{2}+\ldots +x_{d}^{2} }\).

The L_{ p } norm of a real r.v. Z, with p ≥ 1, is written as \(\| Z \|_{p} \triangleq E[ | Z |^{p} ]^{1/p}\), where E[·] denotes expectation w.r.t. the distribution of Z.

2.2 State-space Markov models in discrete time
Consider two random sequences, {X_{ t }}_{t≥0} and {Y_{ t }}_{t≥1}, taking values in \({\mathcal X} \subseteq \mathbb {R}^{d_{x}}\) and \(\mathbb {R}^{d_{y}}\), respectively. Let \(\mathbb {P}_{t}\) be the joint probability measure for the collection of random variables {X_{0},X_{ n },Y_{ n }}_{1≤n≤t}.
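The state (or signal) sequence {X_{ t }}_{t≥0} is a Markov chain. In the usual notation, restated here for reference, its prior τ_{0} and its transition kernels τ_{ t } are
$$ \tau_{0}(A) \triangleq \mathbb{P}_{0}\left\{ X_{0} \in A \right\}, \qquad \tau_{t}(A \mid x_{t-1}) \triangleq \mathbb{P}_{t}\left\{ X_{t} \in A \mid X_{t-1} = x_{t-1} \right\}, \quad t \ge 1, $$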
where \(A \in {\mathcal B}({\mathcal X})\) is a Borel set. The sequence {Y_{ t }}_{t≥1} is termed the observation process. Each r.v. Y_{ t } is assumed to be conditionally independent of the other observations given X_{ t }; hence, the conditional distribution of the r.v. Y_{ t } given X_{ t }=x_{ t } is fully described by the probability density function (pdf) \(g_{t}(y_{t}|x_{t})>0\). We often use g_{ t } as a function of x_{ t } (i.e. as a likelihood) and hence we write \(g_{t}^{y}(x) ~\triangleq ~ g_{t}(y|x)\). The prior τ_{0}, the kernels {τ_{ t }}_{t≥1} and the functions {g_{ t }}_{t≥1} describe a stochastic Markov state-space model in discrete time.
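The object of interest is the optimal filter π_{ t }, i.e. the conditional probability measure of X_{ t } given the observations, which in the same (standard) notation reads
$$ \pi_{t}(A) \triangleq \mathbb{P}_{t}\left\{ X_{t} \in A \mid Y_{1:t} = y_{1:t} \right\}, $$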
where \(A \in {\mathcal B}({\mathcal X})\). For many practical problems, the interest actually lies in the computation of statistics of π_{ t }, e.g. the posterior mean or the posterior variance of X_{ t }. Such statistics can be written as integrals of the form (f,π_{ t }), for some function \(f:{\mathcal X}\rightarrow \mathbb {R}\). Note that, for t = 0, we recover the prior signal measure, i.e. π_{0} = τ_{0}.
and we write ξ_{ t } = τ_{ t }π_{t−1} as shorthand.
where 1(x) = 1 is the constant unit function.
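For reference, the relations among the predictive measure ξ_{ t }, the filter π_{ t } and the non-normalised measure ρ_{ t } that the particle filter approximates can be summarised in standard form (with ρ_{0} = τ_{0}) as
$$ (f,\xi_{t}) = \int\!\!\int f(x)\, \tau_{t}(dx \mid x')\, \pi_{t-1}(dx'), \qquad (f,\pi_{t}) = \frac{\left(f g_{t}^{y_{t}}, \xi_{t}\right)}{\left(g_{t}^{y_{t}}, \xi_{t}\right)}, $$
$$ (f,\rho_{t}) = \left(f g_{t}^{y_{t}}, \tau_{t}\rho_{t-1}\right), \qquad (f,\pi_{t}) = \frac{(f,\rho_{t})}{(\mathbf{1},\rho_{t})}. $$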
2.3 Standard particle filter
Assume that a sequence of observations Y_{1:T} = y_{1:T}, for some T < ∞, is given. Then, the sequences of measures {π_{ t }}_{t≥1}, {ξ_{ t }}_{t≥1} and {ρ_{ t }}_{t≥0} can be numerically approximated using particle filtering. PFs are numerical methods based on the recursive relationships (4) and (6). The simplest algorithm, often called ‘standard particle filter’ or ‘bootstrap filter’ [16] (see also [17]), can be described as follows.
Step 2.(b) is referred to as resampling or selection. In the form stated here, it reduces to the so-called multinomial resampling algorithm [18, 19], but the convergence of the filter can be easily proved for various other schemes (see, e.g. the treatment of the resampling step in [6]).
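The listing below is a minimal Python sketch of a bootstrap filter with multinomial resampling, written against generic model callbacks. The callback names (`sample_prior`, `propagate`, `likelihood`) and the 1-D linear-Gaussian example model are illustrative assumptions, not part of Algorithm 1. The function also accumulates the logarithm of the usual estimate of \((\mathbf{1},\rho_t^N)\), i.e. the product over time of the average unnormalised weights.

```python
import numpy as np


def bootstrap_filter(y, n_particles, sample_prior, propagate, likelihood, f, rng):
    """Minimal bootstrap particle filter (illustrative sketch).

    sample_prior(n, rng)  -> (n, d) array of draws from tau_0
    propagate(x, t, rng)  -> (n, d) array with x_bar^(n) ~ tau_t(. | x^(n))
    likelihood(y_t, x, t) -> (n,) array of g_t^{y_t}(x^(n)) > 0
    f(x)                  -> (n,) array, test function
    Returns estimates of (f, pi_t^N) and of log (1, rho_t^N) for t = 1, ..., T.
    """
    x = sample_prior(n_particles, rng)
    f_est, log_z, running = [], [], 0.0
    for t, y_t in enumerate(y, start=1):
        x_bar = propagate(x, t, rng)                  # sampling from the kernel
        g = likelihood(y_t, x_bar, t)                 # importance weights
        running += np.log(g.mean())                   # log of prod_k (g_k, xi_k^N)
        idx = rng.choice(n_particles, size=n_particles, p=g / g.sum())
        x = x_bar[idx]                                # multinomial resampling
        f_est.append(f(x).mean())                     # (f, pi_t^N), uniform weights
        log_z.append(running)
    return np.array(f_est), np.array(log_z)


# Hypothetical 1-D linear-Gaussian model, for illustration only:
# X_0 ~ N(0, 1), X_t = 0.9 X_{t-1} + V_t, Y_t = X_t + U_t, with V_t, U_t ~ N(0, 1).
def prior(n, rng):
    return rng.normal(size=(n, 1))


def prop(x, t, rng):
    return 0.9 * x + rng.normal(size=x.shape)


def lik(y_t, x, t):
    return np.exp(-0.5 * (y_t - x[:, 0]) ** 2) / np.sqrt(2.0 * np.pi)


rng = np.random.default_rng(0)
post_mean, log_evidence = bootstrap_filter(
    [0.5, -0.2, 1.0], 1000, prior, prop, lik, lambda x: x[:, 0], rng)
```

Here the estimate of (f,π_t) is computed after resampling, i.e. from the uniformly weighted measure \(\pi_t^N\); computing it from the weighted particles before resampling is an equally valid (and slightly less noisy) alternative.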
respectively.
The convergence of PFs has been analysed in different ways [4, 6, 20–23]. Here, we use simple results for the convergence of the L_{ p } norms (p ≥ 1) of the approximation errors. For the approximation of integrals w.r.t. ξ_{ t } and π_{ t }, we have the following standard result.
Lemma 1
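In the form used throughout the paper (restated here, consistent with the \(1/\sqrt{N}\) rates quoted in Section 4.1), the lemma states that, under the assumptions summarised in Section 4.2, for every \(f \in B({\mathcal X})\), p ≥ 1 and 1 ≤ t ≤ T,
$$ \left\| (f,\xi_{t}^{N}) - (f,\xi_{t}) \right\|_{p} \le \frac{\bar c_{t} \| f \|_{\infty}}{\sqrt{N}}, \qquad \left\| (f,\pi_{t}^{N}) - (f,\pi_{t}) \right\|_{p} \le \frac{c_{t} \| f \|_{\infty}}{\sqrt{N}}, $$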
where \(\bar c_{t}\) and c_{ t } are finite constants independent of N, \(\| f \|_{\infty }=\sup _{x \in {\mathcal X}} |f(x)|<\infty \) and the expectations are taken over the distributions of the measure-valued random variables \(\xi _{t}^{N}\) and \(\pi _{t}^{N}\), respectively.
Proof
This result is a special case of, e.g. Lemma 1 in [24]. □
3 Parallelisation schemes for particle filtering
3.1 Non-interacting particle filters
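In the simplest parallelisation scheme, M bootstrap filters with N particles each are run independently, and the m-th filter yields its own approximation \(\pi_{t}^{m,N} = \frac{1}{N}\sum_{n=1}^{N} \delta_{x_{t}^{(m,n)}}\). Consistent with the definition of \(\pi_{t}^{M \times N}\) that follows, the ensemble estimator of (f,π_{ t }) is the plain average of the M individual estimators,
$$ \left(f,\pi_{t}^{M \times N}\right) = \frac{1}{M} \sum_{m=1}^{M} \left(f,\pi_{t}^{m,N}\right), $$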
where we have denoted \(\pi _{t}^{M \times N} = \frac {1}{M} \sum _{m=1}^{M} \pi _{t}^{m,N}\).
This scheme is straightforward to implement, and it does not involve any parallelisation overhead as the M PFs do not interact. A self-contained analysis of the MSE of the ensemble estimator \(\left (f,\pi _{t}^{M \times N}\right)\) is presented in Section 4.
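A minimal sketch of the ensemble estimator follows, for a hypothetical 1-D linear-Gaussian model chosen only for illustration. The M filters are stored as the rows of an (M, N) array; each row is weighted and resampled using only its own particles, and the M per-filter estimates are averaged. In an actual parallel implementation each row would be assigned to a different core; here the rows are processed sequentially, which does not change the estimator.

```python
import numpy as np


def ensemble_nipf(y, M, N, a=0.9, sx=1.0, sy=1.0, seed=0):
    """Average of M non-interacting bootstrap filters with N particles each.

    Hypothetical model: X_0 ~ N(0, 1), X_t = a X_{t-1} + sx V_t, Y_t = X_t + sy U_t.
    Returns (f, pi_t^{M x N}) for f(x) = x, t = 1, ..., T.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(M, N))                          # row m = filter m
    estimates = []
    for y_t in y:
        x = a * x + sx * rng.normal(size=(M, N))         # propagate all filters
        logw = -0.5 * ((y_t - x) / sy) ** 2              # log-likelihood up to a constant
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)                # normalise within each filter
        for m in range(M):                               # resampling never mixes rows
            x[m] = x[m, rng.choice(N, size=N, p=w[m])]
        per_filter = x.mean(axis=1)                      # (f, pi_t^{m,N}), m = 1..M
        estimates.append(per_filter.mean())              # average over the ensemble
    return np.array(estimates)


est = ensemble_nipf([0.5, -0.2, 1.0, 0.3], M=20, N=500)
```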
A key result, to be explicitly shown in our analysis but also pointed out in [13] and [12], is that the estimation bias \(\left | E\left [ (f,\pi _{t}) - \left (f,\pi _{t}^{M \times N}\right) \right ] \right |\) decreases as O(N^{−1}) and does not depend on M. This implies that if the number of particles per subset, N, is kept fixed, then the MSE, \(E\left [ \left | (f,\pi _{t}) - \left (f,\pi _{t}^{M \times N}\right) \right |^{2} \right ]\), which can never be smaller than the squared bias, in general remains bounded away from zero even if the number of subsets is made arbitrarily large, i.e. M→∞. This can be a drawback depending on the type of parallel computing configuration to be used. In multi-core computers, for example, the number of subsets M can be expected to be moderate (of the order of the number of available cores) and N can often be made large enough to render the bias negligible. On the other hand, implementations based on graphics processing units (GPUs) or on networks of low-power processors (e.g. wireless networks) are more efficient when operating with a large number of subsets, M, and a low number of particles per subset, N. In these scenarios, the bias of the non-interacting ensemble estimator in Eq. (13) can be significant. The solution to this limitation is to introduce some degree of interaction among the M parallel-running PFs. Some relevant schemes are described below.
3.2 Distributed resampling with non-proportional allocation
The scheme termed distributed resampling with non-proportional allocation (DRNA) for the parallelisation of PFs was originally introduced in [2] (Section IV.A.3), but only recently has a theoretical characterisation of its performance been obtained [10, 11, 26].
Typically, only small subsets of particles are exchanged, hence β(m,n)=(m,n) for most values of m and n. The resulting parallel particle filtering algorithm can be outlined as shown below (adapted from [10]).
We remark that every PF operates independently of all others except for the particle exchange, step 2.(c), which is carried out every t_{0} time steps. The degree of interaction can be controlled by designing the map β(m,n) appropriately. Typically, exchanging a subset of particles with ‘neighbour’ PFs is sufficient. For example, if the parallel PFs are arranged in a ring configuration, then the m-th PF can exchange, say, two particles with PF number m−1 and another two particles with PF number m+1, in such a way that all parallel PFs retain N particles (four of them received from their neighbours) after the exchange.
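The ring configuration just described can be sketched as follows, for particles stored in an array whose m-th row holds the particles of the m-th PF. The slot assignment (slots [0, k) travel to filter m+1 and slots [k, 2k) to filter m−1) is a hypothetical choice of the map β(m,n); since the exchange follows local resampling, all particles within a filter share the same weight, so which slots are sent does not matter.

```python
import numpy as np


def ring_exchange(x, w, k=2):
    """DRNA-style particle exchange on a ring of M filters (illustrative sketch).

    x : (M, N, ...) particles; w : (M, N) weights, which travel with the particles.
    Filter m sends its slots [0, k) to filter m+1 and its slots [k, 2k) to
    filter m-1 (indices modulo M), and receives the same number of particles
    from each neighbour, so every filter still holds N particles.
    """
    if 2 * k > x.shape[1]:
        raise ValueError("need 2k <= N")
    x, w = x.copy(), w.copy()
    # Slots [0, k): filter m now holds what filter m-1 had there.
    x[:, :k] = np.roll(x[:, :k], shift=1, axis=0)
    w[:, :k] = np.roll(w[:, :k], shift=1, axis=0)
    # Slots [k, 2k): filter m now holds what filter m+1 had there.
    x[:, k:2 * k] = np.roll(x[:, k:2 * k], shift=-1, axis=0)
    w[:, k:2 * k] = np.roll(w[:, k:2 * k], shift=-1, axis=0)
    return x, w
```

With k = 2, each filter keeps N − 4 of its own particles and receives two from each neighbour, as in the example above.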
We also note that the local resampling step is carried out independently, and concurrently, for each parallel-running PF and it does not change the aggregate weights, i.e. \(\bar W_{t}^{(m)*} = \sum _{n=1}^{N} \bar w_{t}^{(m,n)*} = \sum _{n=1}^{N} \tilde w_{t}^{(m,n)*}\). We assume a multinomial resampling procedure, but other procedures can be used in a straightforward manner.
The particle estimator of (f,π_{ t }) then becomes \(\left (f,\pi _{t}^{M \times N}\right) ~=~ \sum _{m=1}^{M} \frac {W_{t}^{(m)}}{N} \sum _{n=1}^{N} f\left (x_{t}^{(m,n)}\right)\).
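The local resampling step and the estimator above can be sketched as follows, using the same (M, N) array layout; the function names are our own. Each copy produced by local resampling receives the weight W^{(m)*}/N, so the aggregate weight of every filter is preserved, as noted above.

```python
import numpy as np


def local_resample(x, w, rng):
    """Local multinomial resampling in each of M filters (DRNA-style sketch).

    x : (M, N) particles; w : (M, N) positive, unnormalised weights.
    Returns resampled particles and the weight W^(m)*/N for every copy, so the
    aggregate weight W^(m)* = sum_n w[m, n] of each filter is unchanged.
    """
    M, N = w.shape
    W = w.sum(axis=1)                                   # aggregate weights W^(m)*
    x_new = np.empty_like(x)
    for m in range(M):                                  # each filter on its own
        x_new[m] = x[m, rng.choice(N, size=N, p=w[m] / W[m])]
    return x_new, np.repeat((W / N)[:, None], N, axis=1)


def drna_estimate(x, w, f):
    """sum_m (W^(m)/N) sum_n f(x^(m,n)) with normalised aggregate weights W^(m);
    assumes equal weights within each filter, as after local resampling."""
    W = w.sum(axis=1)
    W = W / W.sum()
    return float(np.sum(W * f(x).mean(axis=1)))
```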
The scheme in Algorithm 2 has been proved to converge uniformly over time, under some standard assumptions, when the number of particles per subset, N, is kept fixed and the number of subsets (i.e. the number of parallel PFs), M, is increased. To be specific, we have the following result, which is proved in [10] (Section 3.2).
Theorem 1

The sequence of observations {y_{ t }}_{t≥1} is fixed (but otherwise arbitrary) and there exists a real constant 0<a<∞ such that \(\frac {1}{a} < g_{t}^{y_{t}}(x) < a\) for every t≥1 and every \(x \in {\mathcal X}\).

The sequence of probability measures {π_{ t }}_{t≥0} is stable (see [25]).

The particle exchange step guarantees that$$ E\left[ \left(\sup_{1 \le m \le M} W_{rt_{0}}^{(m)} \right)^{q} \right] \le \frac{c^{q}}{M^{q\epsilon}}, \quad \text{for every } r \in \mathbb{N} $$
and some constants c<∞, 0≤ε<1 and q≥4 independent of M.
Assumption (iii) of Theorem 1 indicates that none of the M subsets should accumulate too much aggregate weight compared to the other subsets. This accumulation of weight is precisely what the particle exchange steps control. In a practical implementation, the aggregate weights \(W_{t}^{(m)*}\) should be monitored, and additional particle exchange steps should be triggered when the weight of any subset exceeds some prescribed threshold.
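One simple way to implement the monitoring suggested above is a threshold test on the largest normalised aggregate weight, as sketched below; the specific rule and the function name are our own illustration, not a prescription from [10].

```python
import numpy as np


def needs_exchange(aggregate_weights, threshold):
    """Return True when the largest normalised aggregate weight exceeds
    `threshold` (a user-chosen value, e.g. a small multiple of the balanced
    value 1/M), signalling that an extra particle exchange should be run."""
    W = np.asarray(aggregate_weights, dtype=float)
    return bool(W.max() / W.sum() > threshold)
```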
3.3 Particle islands
The particle island model was introduced in [13] in order to address the parallel processing of subsets of particles in SMC methods in a systematic manner. Similar to the DRNA-based PFs of Section 3.2, the algorithms proposed in [13] are based on running M parallel PFs, each one on a disjoint subset of particles, namely \(\big \{ x_{t}^{(m,n)} \big \}_{1 \le n \le N}\) for the m-th filter, and keeping track of the non-normalised aggregate weights \(W_{t}^{(m)*}\) defined in Eq. (14).

Particle level: resampling is carried out locally within each of the M concurrently running PFs. This is equivalent to the local resampling step in Algorithm 2.

Island level: the aggregate weights \(W_{t}^{(m)}\) are used to resample the particle subsets, or islands, assigned to the individual PFs. In this step, complete subsets can be replicated or eliminated (in the same way as particles are in a conventional, or particle level, resampling step).
We now outline the double bootstrap filter, an algorithm described in [13] (Algorithm 1) that performs multinomial resampling at both the particle level and the island level. While in the version of [13] both resampling steps are taken at every time step t, we describe a slightly more general procedure where the island-level resampling steps are taken periodically, every t_{0} ≥ 1 time steps. For simplicity, we introduce the notation \({\sf X}_{t}^{m,N} ~=~ \big \{ x_{t}^{(m,n)} \big \}_{1 \le n \le N}\) for the subset of N particles assigned to the m-th island (i.e. the m-th concurrently running PF).
In Algorithm 3, a multinomial resampling procedure is employed at both the particle level and the island level. Other schemes are possible, and some of them are explored in [13], including ε-interactions and resampling conditional on the effective sample size.
The particle approximation of the optimal filter π_{ t } takes the form \(\pi _{t}^{M \times N} ~=~ \sum _{m=1}^{M} W_{t}^{(m)} \pi _{t}^{m,N}\), where \(\pi _{t}^{m,N} ~=~ \frac {1}{N} \sum _{n=1}^{N} \delta _{x_{t}^{(m,n)}}\). This is formally identical to the DRNA-based Algorithm 2, although the procedure for computing the particles and weights is different.
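The island-level operations of the double bootstrap filter can be sketched as below, on an (M, N) array whose m-th row is \({\sf X}_{t}^{m,N}\). The island weight update (each island weight multiplied by the average likelihood of its particles) is the usual rule and is stated here as an assumption; a periodic schedule would simply call `island_resample` every t_{0} steps. The function names are hypothetical.

```python
import numpy as np


def update_island_weights(W_prev, g):
    """Aggregate island weights after a weighting step (sketch).

    W_prev : (M,) normalised island weights at time t-1.
    g      : (M, N) likelihoods g_t^{y_t}(x_bar^(m,n)).
    Assumes W_t^(m)* = W_{t-1}^(m) * (1/N) sum_n g[m, n], then normalises.
    """
    W = W_prev * g.mean(axis=1)
    return W / W.sum()


def island_resample(x, W, rng):
    """Island-level multinomial resampling: whole rows of x (islands) are
    replicated or discarded; afterwards every island has weight 1/M."""
    M = x.shape[0]
    parents = rng.choice(M, size=M, p=W)
    return x[parents], np.full(M, 1.0 / M)


def island_estimate(x, W, f):
    """(f, pi_t^{M x N}) = sum_m W^(m) (f, pi_t^{m,N}), pi_t^{m,N} uniform on row m."""
    return float(np.sum(W * f(x).mean(axis=1)))
```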
The asymptotic convergence of the double bootstrap filter was proved in [13] using the Feynman–Kac machinery of [4]. Then, in the follow-up paper [14], a central limit theorem was proved and bounds on the asymptotic variance of a class of schemes that includes Algorithm 3 were derived. Here we reproduce the basic convergence result of [13], adapted to the notation of this paper.
Theorem 2
where Var[·] denotes the variance of a random variable and B(f,t) and V(f,t) are finite constants with respect to both M and N.
where the constants \(\bar B(f,t)\) and \(\bar V(f,t)\) are independent of M and N. This implies that the bias of the estimator \(\left (f,\pi _{t}^{M\times N}\right)\) with non-interacting PFs depends only on N and cannot be eliminated by taking M→∞ alone. The MSE of Algorithm 3, on the other hand, vanishes as MN→∞.
3.4 Adaptive interaction pattern: the α-SMC methodology
Rather than working with fixed subsets \({\sf X}_{t}^{m,N} \,=\, \left \{\! x_{t}^{(m,n)}\!\right \}_{1 \le n \le N}\), m = 1,...,M, the α-SMC methodology of [12] enables the construction of particle filtering algorithms with adaptive interaction patterns. In particular, it is possible to devise parallelised PFs within this framework where the subsets of particles which are resampled together can change from one time step to the next (including their size, N).
Let K be the total number of particles. The interaction pattern for resampling is specified by means of a sequence of Markov transition matrices \(\alpha _{t} ~=~ \left [\alpha _{t}^{ij}\right ]\), where 1≤i≤K and 1≤j≤K are the row and column indices, respectively. Since α_{ t } is a Markov matrix, it satisfies \(\sum _{j=1}^{K} \alpha _{t}^{ij} = 1\) for every row i. The i-th row of α_{ t } determines from which subset of particles we resample \(x_{t}^{(i)}\). The general α-SMC method is outlined below. We assume that either the sequence α_{ t } is predetermined or there is some prescribed rule to select α_{ t } given the observations y_{1:t} and the particles \(\left \{ \bar x_{t}^{(k)} \right \}_{1 \le k \le K}\).
The particle approximation of π_{ t } produced by Algorithm 4 is \(\pi _{t}^{K} ~=~ \sum _{k=1}^{K} w_{t}^{(k)} \delta _{x_{t}^{(k)}}\). The α-SMC scheme can be particularised to yield most standard particle filtering algorithms ([12] Section 2.2). Of specific interest for the purpose of parallelisation is that the DRNA-based PF (Algorithm 2) can also be described and analysed as an α-SMC procedure [11].
The convergence of α-SMC methods depends on the choice of the sequence of interaction matrices α_{ t }. Let us recursively define the matrices α_{t,t}=I_{ K } (where I_{ K } denotes the identity matrix) and α_{s,t}, constructed entrywise as \(\alpha _{s,t}^{ij} = \sum _{k=1}^{K} \alpha _{s+1,t}^{ik} \alpha _{s}^{kj}\), for i,j∈{1,...,K} and 0≤s<t. Furthermore, define \(\beta _{s,t}^{i} = \frac {1}{K} \sum _{j=1}^{K} \alpha _{s,t}^{ji}\), for i=1,...,K and 0≤s≤t. Then, we have the following result, proved in [12] (Section 3).
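The quantities α_{s,t} and \(\beta_{s,t}^{i}\) defined above can be computed directly from a list of interaction matrices, as in the sketch below. As illustrations (our own choice), it also builds the block-diagonal matrix in which K = MN particles are resampled within M fixed groups of N, which corresponds to non-interacting filters, and the matrix with all entries equal to 1/K, under which every particle is resampled from the whole population as in a standard bootstrap filter.

```python
import numpy as np


def alpha_islands(M, N):
    """K x K block-diagonal Markov matrix (K = M N): particle i is resampled
    only from its own group of N particles."""
    return np.kron(np.eye(M), np.full((N, N), 1.0 / N))


def alpha_full(K):
    """All particles resampled together (standard bootstrap filter)."""
    return np.full((K, K), 1.0 / K)


def beta(alphas, K):
    """beta_{s,t}^i = (1/K) sum_j alpha_{s,t}^{ji}, with alphas = [alpha_s, ..., alpha_{t-1}].

    Uses alpha_{t,t} = I_K and alpha_{s,t} = alpha_{s+1,t} alpha_s, so that
    alpha_{s,t} = alpha_{t-1} ... alpha_{s+1} alpha_s.
    """
    A = np.eye(K)
    for a in alphas:          # a runs over alpha_s, alpha_{s+1}, ..., alpha_{t-1}
        A = a @ A             # left-multiply by the next (later) matrix
    return A.mean(axis=0)     # column means
```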
Theorem 3
4 Error rates for ensembles of non-interacting particle filters
4.1 Averaged estimators
for some constant independent of N and M. However, the inequality (15) does not illuminate the effect of the choice of N. In the extreme case of N = 1, for example, \(\pi _{t}^{M \times N}\) reduces to the outcome of a sequential importance sampling algorithm, with no resampling, which is known to degenerate quickly in practice. Instead of (15), we seek a bound for the approximation error that provides some indication of the trade-off between the number of independent filters, M, and the number of particles per filter, N.
With this purpose, we tackle the classical decomposition of the MSE into variance and bias terms. First, we obtain preliminary results that are needed for the analysis of the average measure \(\pi _{t}^{M \times N}\). In particular, we prove that the random non-normalised measure \(\rho _{t}^{N}\) produced by the bootstrap filter (Algorithm 1) is unbiased and attains L_{ p } error rates proportional to \(\frac {1}{\sqrt {N}}\), i.e. the same as \(\xi _{t}^{N}\) and \(\pi _{t}^{N}\). We use these results to derive an upper bound for the bias of \(\pi _{t}^{N}\) which is proportional to \(\frac {1}{N}\). The latter enables us to deduce an upper bound for the MSE of the ensemble approximation \(\pi _{t}^{M \times N}\) consisting of two additive terms that depend explicitly on M and N. Specifically, we show that the variance component of the MSE decays linearly with the total number of particles, K=MN, while the squared-bias term decreases as \(\frac{1}{N^{2}}\), i.e. quadratically with the number of particles per filter.
4.2 Assumptions on the state space model
All the results to be introduced in the rest of Section 4 hold under the (mild) assumptions of Lemma 1, which we summarise below for convenience of presentation.
Assumption 1
The sequence of observations Y_{1:T}=y_{1:T} is arbitrary but fixed, with T<∞.
Assumption 2
Remark 2

\((g_{t}^{y_{t}},\alpha) > 0\), for any \(\alpha \in {\mathcal P}({\mathcal X})\), and

\(\prod _{k=1}^{T} g_{k}^{y_{k}} \le \prod _{k=1}^{T} \| g_{k}^{y_{k}} \|_{\infty } < \infty \),
for every t=1,2,...,T.
Remark 3
We seek simple convergence results for a fixed time horizon T<∞, similar to Lemma 1. Therefore, no further assumptions related to the stability of the optimal filter for the state-space model [4,25] are needed. If such assumptions are imposed, then stronger (time-uniform) asymptotic convergence can be proved, similar to Theorem 1 in Section 3.2. See [11] for additional results that apply to the independent filters \(\pi _{t}^{m,N}\) and the ensemble \(\pi _{t}^{M \times N}\).
4.3 Bias and error rates
Our analysis relies on some properties of the particle approximations of the non-normalised measures ρ_{ t }, t≥1. We first show that the estimate \(\rho _{t}^{N}\) in Eq. (8) is unbiased.
Lemma 2
Proof
See Appendix 1 for a self-contained proof. □
Remark 4
The result in Lemma 2 was originally proved in [4]. For the constant function f = 1 (i.e. 1(x)=1), it states that the estimate \(\left (\mathbf {1},\rho _{t}^{N}\right)\) of the normalising constant of the posterior distribution π_{ t } is unbiased. This property is at the core of recent model inference algorithms such as particle MCMC [27], SMC^{2} [28] and some population Monte Carlo [29] methods.
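The unbiasedness of \(\left(\mathbf{1},\rho_{t}^{N}\right)\) is easy to check numerically on a model whose exact normalising constant is available. The sketch below uses a hypothetical 1-D linear-Gaussian model, for which the Kalman filter gives \(\log(\mathbf{1},\rho_{T})\) exactly, and averages independent bootstrap-filter estimates of \((\mathbf{1},\rho_{T})\); the model, T, N and the number of replicas are our own choices.

```python
import numpy as np


def kalman_log_evidence(y, a, sx, sy):
    """Exact log p(y_{1:T}) = log (1, rho_T) for the hypothetical model
    X_0 ~ N(0, 1), X_t = a X_{t-1} + sx V_t, Y_t = X_t + sy U_t."""
    m, P, ll = 0.0, 1.0, 0.0
    for y_t in y:
        m, P = a * m, a * a * P + sx ** 2                 # prediction
        S = P + sy ** 2                                   # predictive variance of Y_t
        ll += -0.5 * (np.log(2.0 * np.pi * S) + (y_t - m) ** 2 / S)
        K = P / S                                         # Kalman gain
        m, P = m + K * (y_t - m), (1.0 - K) * P           # update
    return ll


def pf_evidence(y, N, a, sx, sy, rng):
    """Bootstrap-filter estimate (1, rho_T^N) = prod_t (1/N) sum_n g_t(x_bar_t^(n))."""
    x = rng.normal(size=N)
    z = 1.0
    for y_t in y:
        x = a * x + sx * rng.normal(size=N)
        g = np.exp(-0.5 * ((y_t - x) / sy) ** 2) / np.sqrt(2.0 * np.pi * sy ** 2)
        z *= g.mean()
        x = x[rng.choice(N, size=N, p=g / g.sum())]       # multinomial resampling
    return z


rng = np.random.default_rng(42)
a, sx, sy, T = 0.9, 1.0, 1.0, 5
x_true, y = rng.normal(), []
for _ in range(T):                                        # synthetic data from the model
    x_true = a * x_true + sx * rng.normal()
    y.append(x_true + sy * rng.normal())

z_hats = [pf_evidence(y, 50, a, sx, sy, rng) for _ in range(2000)]
ratio = np.mean(z_hats) / np.exp(kalman_log_evidence(y, a, sx, sy))
```

The ratio is expected to be close to 1. Averaging the estimates of \((\mathbf{1},\rho_{T})\) themselves matters here: the average of their logarithms would be biased downwards, by Jensen's inequality.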
Combining Lemma 2 with the standard result of Lemma 1 leads to an explicit convergence rate for the L_{ p } norms of the approximation errors \(\left (f,\rho _{t}^{N}\right)  (f,\rho _{t})\).
Lemma 3
where \(\tilde c_{t} < \infty \) is a constant independent of N.
Proof
See Appendix 2. □
Finally, Lemmas 2 and 3 together enable the calculation of explicit rates for the bias of the particle approximation of (f,π_{ t }). This is a key result for the decomposition of the MSE into variance and bias terms. To be specific, we can prove the following theorem.
Theorem 4
The result in Theorem 4 was originally proved in [30], albeit by a different method.
This is a r.v. whose second-order moment yields the MSE of \(\left (f,\pi _{t}^{N}\right)\). It is straightforward to obtain a bound for the MSE from Lemma 1 and, by subsequently using Theorem 4, we readily find a similar bound for the variance of \({\mathcal E}_{t}^{N}(f)\), denoted \(\text {\sf Var}\left [{\mathcal E}_{t}^{N}(f)\right ]\). These results are stated explicitly in the corollary below.
Corollary 1
where c_{ t } and \(c_{t}^{v}\) are finite constants independent of N.
Since Theorem 4 ensures that \(\big | E\left [{\mathcal E}_{t}^{N}(f)\right ] \big | \le \frac {\hat c_{t}\| f \|_{\infty }}{N}\), the inequality (26) implies that there exists a constant \(c_{t}^{v}<\infty \) such that (25) holds. □
4.4 Error rate for the averaged estimators
All the theoretical properties established in Section 4.3, as well as the basic Lemma 1, hold for each one of the M independent filters.
Definition 1
It is apparent that similar ensemble approximations can be given for ξ_{ t } and ρ_{ t }. Moreover, the statistical independence of the PFs yields the following corollary as a straightforward consequence of Theorem 4 and Corollary 1.
Corollary 2
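Writing \({\mathcal E}_{t}^{M \times N}(f) = \left(f,\pi_{t}^{M \times N}\right) - (f,\pi_{t})\) for the error of the ensemble estimator, the bound of the corollary, in the form used in Remark 5 below, reads, for every \(f \in B({\mathcal X})\),
$$ E\left[ \left( {\mathcal E}_{t}^{M \times N}(f) \right)^{2} \right] \le \frac{c_{t}^{v} \| f \|_{\infty}^{2}}{MN} + \frac{\hat c_{t}^{2} \| f \|_{\infty}^{2}}{N^{2}}, $$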
holds for some constants \(c_{t}^{v}\) and \(\hat c_{t}\) independent of N and M.
Proof
where the inequality follows from Corollary 1. Since \(E\left [ \left ({\mathcal E}_{t}^{M \times N}\right)^{2} \right ] = \text {\sf Var}\left [ {\mathcal E}_{t}^{M\times N} \right ] + \left | E\left [ {\mathcal E}_{t}^{M \times N} \right ] \right |^{2}\), combining (29) and (28) yields (27) and concludes the proof. □
The inequality in Corollary 2 shows explicitly that the bias of the estimator \(\left (f,\pi _{t}^{M \times N}\right)\) cannot be arbitrarily reduced when N is fixed, even if M→∞. This feature was already noted in Section 3.3. Note that the inequality (27) holds for any choice of M and N, while Theorem 2 yields asymptotic limits.
Remark 5
According to the inequality (27), the bias of the estimator \(\left (f,\pi _{t}^{M\times N}\right)\) is controlled by the number of particles per subset, N, and the squared-bias term decays quadratically with N, while, for fixed N, the variance decays linearly with M. The MSE rate is \(\propto \frac {1}{MN} \) as long as N≥M. Otherwise, the term \(\frac {\hat c_{t}^{2} \| f \|_{\infty }^{2}}{N^{2}}\) becomes dominant and the resulting error bound is larger.
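To see the trade-off described in Remark 5, the two terms of the bound can be evaluated for a fixed total budget K = MN. The constants \(c_{t}^{v}\), \(\hat c_{t}\) and \(\| f \|_{\infty}\) are unknown in general, so the sketch below sets them to 1 purely as placeholders; only the scaling in M and N is meaningful.

```python
def mse_bound_terms(M, N, c_v=1.0, c_hat=1.0, f_sup=1.0):
    """Variance and squared-bias terms of the Corollary 2 bound.

    c_v, c_hat and f_sup stand for c_t^v, hat c_t and ||f||_inf; the default
    unit values are placeholders, since the true constants are unknown.
    """
    variance_term = c_v * f_sup ** 2 / (M * N)
    bias_term = (c_hat * f_sup) ** 2 / N ** 2
    return variance_term, bias_term


# Fixed total budget K = M N = 10^4, split in different ways.
for M in (1, 10, 100, 1000):
    N = 10_000 // M
    v, b = mse_bound_terms(M, N)
    print(f"M={M:5d} N={N:6d} variance={v:.1e} bias^2={b:.1e}")
```

For K = 10^4 the variance term stays at 1/K for every split, while the squared-bias term equals M²/K²; the two coincide at M = N = 100, and the bias term dominates as soon as M > N.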
Remark 6
While the convergence results presented here have been proved for the standard bootstrap filter, it is straightforward to extend them to other classes of PFs for which Lemmas 1 and 2 hold.
4.5 Comparison of parallelisation schemes via time–error indices
The advantage of parallel computation is the drastic reduction of the time needed to run the PF. Let the running time for a PF with K particles be of order \({\mathcal T}(K)\), where \({\mathcal T}:\mathbb {N}\rightarrow (0,\infty)\) is some strictly increasing function of K. The quantity \({\mathcal T}(K)\) includes the time needed to generate new particles, weight them and perform resampling. The latter step is the bottleneck for parallelisation, as it requires the interaction of all K particles. A ‘straightforward’ implementation of the resampling step leads to an execution time \({\mathcal T}(K)=K\log (K)\), although efficient algorithms exist that achieve linear time complexity, \({\mathcal T}(K)=K\). We can combine the MSE rate and the time complexity to propose a time–error performance metric.
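As an illustration of a linear-time scheme, the sketch below implements multinomial resampling through ordered uniforms: K sorted uniforms are obtained in O(K) time as normalised partial sums of K + 1 i.i.d. exponential variables, and are then matched against the cumulative weights in a single forward pass. This is one known approach, offered as an example and not necessarily the algorithm intended in the text; a Python loop is used for clarity rather than speed.

```python
import numpy as np


def multinomial_resample_linear(w, rng):
    """O(K) multinomial resampling via ordered uniforms (illustrative sketch).

    w : K non-negative weights (need not be normalised).
    Returns K ancestor indices, in non-decreasing order.
    """
    w = np.asarray(w, dtype=float)
    K = len(w)
    s = np.cumsum(rng.exponential(size=K + 1))   # partial sums of K+1 exponentials
    u = s[:-1] / s[-1]                           # K sorted uniforms, no sorting needed
    cdf = np.cumsum(w / w.sum())
    cdf[-1] = 1.0                                # guard against rounding
    idx = np.empty(K, dtype=int)
    j = 0
    for i in range(K):                           # j only moves forward: O(K) in total
        while u[i] > cdf[j]:
            j += 1
        idx[i] = j
    return idx
```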
Definition 2
We define the time–error index of a particle filtering algorithm with running time of order \({\mathcal T}\) and asymptotic MSE rate \({\mathcal R}\) as \({\mathcal C} \triangleq {\mathcal T} \times {\mathcal R}.\)
In Section 3, we described alternative ensemble approximations in which M non-independent PFs are run with N particles each. The overall error rates for these methods are the same as for the standard bootstrap filter; however, the time complexity depends not only on the number of particles N allocated to each of the M subsets, but also on the subsequent interactions among subsets.

M bootstrap filters (as in Algorithm 1 of this paper) are run in parallel and an aggregate weight is computed for each one of them, denoted \(W_{t}^{(m)}\);

When the coefficient of variation (CV) of these aggregate weights is greater than a given threshold, the M bootstrap filters are resampled (some filters are discarded and others are replicated using a multinomial resampling procedure).
When \(L \ll N\), we readily obtain that \({\mathcal C}_{ens} < {\mathcal C}_{dbf}\). For example, for a configuration with M = 10 filters and N = 100 particles each, and assuming that island-level resampling is performed every L = 20 time steps on average, we obtain \({\mathcal C}_{dbf} ~=~ 0.145\) and \({\mathcal C}_{ens}~=~0.110\). Conversely, if L is large enough (namely, if L > N(M − 1)/M), the double bootstrap algorithm becomes more efficient, meaning that \({\mathcal C}_{dbf} ~<~ {\mathcal C}_{ens}\).
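A minimal sketch, under an assumed cost model that reproduces the figures quoted above, is given below: linear-time resampling, an MSE rate 1/(MN) + 1/N² for the non-interacting ensemble and 1/(MN) for the double bootstrap filter, and an amortised per-step running time N + (M − 1)N/L for the latter (hypothetically, copying up to M − 1 islands of N particles each time island-level resampling takes place). It should be read as one consistent interpretation of the index, not as a derivation from the text.

```python
def time_error_ens(M, N):
    """C_ens = T x R for M non-interacting filters run concurrently, assuming
    linear-time resampling (T = N) and MSE rate R = 1/(MN) + 1/N^2."""
    return N * (1.0 / (M * N) + 1.0 / N ** 2)


def time_error_dbf(M, N, L):
    """C_dbf for the double bootstrap filter, assuming MSE rate R = 1/(MN) and an
    amortised per-step running time T = N + (M - 1) N / L, with island-level
    resampling every L time steps on average (hypothetical cost model)."""
    return (N + (M - 1) * N / L) / (M * N)


print(time_error_ens(10, 100), time_error_dbf(10, 100, 20))
```

With M = 10, N = 100 and L = 20 these give 0.110 and 0.145, and under this model the two indices coincide at L = N(M − 1)/M = 90.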
Computing the time–error index for practical algorithms can be hard, and the result is highly dependent on the specific implementation. Different implementations of the double bootstrap algorithm, for example, may yield different time–error indices depending on how the island-level resampling step is carried out.
5 Numerical results and discussion
5.1 Example: Lorenz 63 model
5.1.1 The three-dimensional Lorenz system
where {V_{ t }}_{t=1,2,...} is a sequence of i.i.d. normal random variables with zero mean and variance \(\sigma ^{2} ~=~ \frac {1}{2}\).
5.1.2 Simulation setup

We have applied two algorithms, namely

the standard bootstrap filter (Algorithm 1), termed BF in the sequel, and

the ensemble of non-interacting bootstrap filters (NIBFs) that we have investigated in Section 4,

to track the sequence of probability measures \(\pi_{t}\) generated by the three-dimensional Lorenz model described in Section 5.1.1. We have generated a sequence of 200 synthetic observations, \(\{y_{t}; t=1,\ldots,200\}\), spread over an interval of 20 continuous time units, corresponding to 2×10^{4} discrete time steps in the Euler scheme (hence, one observation every 100 steps).
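The data-generation step can be sketched as below. The model equations are not reproduced in this section, so the sketch makes explicit assumptions: the classical Lorenz 63 parameters (s = 10, r = 28, b = 8/3), an Euler-Maruyama step with a hypothetical state-noise scale `SIGMA_X`, a hypothetical initial point, and a hypothetical observation of the first and third state variables corrupted by Gaussian noise with the variance 1/2 given above. Only the horizon (20 time units), the number of Euler steps (2×10^{4}) and the observation period (100 steps) come from the text.

```python
import math
import random

# Values taken from the text
T_TOTAL = 20.0           # continuous-time horizon
N_STEPS = 20_000         # Euler steps
OBS_EVERY = 100          # one observation every 100 steps -> 200 observations
SIGMA2_Y = 0.5           # observation-noise variance
DT = T_TOTAL / N_STEPS   # integration step (1e-3)

# Hypothetical values (not given in this section)
S, R, B = 10.0, 28.0, 8.0 / 3.0   # classical Lorenz 63 parameters
SIGMA_X = 1.0                     # state-noise scale


def lorenz63_drift(x):
    """Right-hand side of the deterministic Lorenz 63 equations."""
    x1, x2, x3 = x
    return (S * (x2 - x1), x1 * (R - x3) - x2, x1 * x2 - B * x3)


def simulate_lorenz63(x0, rng):
    """Euler-Maruyama path; returns noisy observations of (x1, x3) every OBS_EVERY steps."""
    x = list(x0)
    sd_x = SIGMA_X * math.sqrt(DT)
    sd_y = math.sqrt(SIGMA2_Y)
    observations = []
    for n in range(1, N_STEPS + 1):
        f = lorenz63_drift(x)
        x = [xi + DT * fi + sd_x * rng.gauss(0.0, 1.0) for xi, fi in zip(x, f)]
        if n % OBS_EVERY == 0:
            # Hypothetical observation model: first and third coordinates plus Gaussian noise.
            observations.append((x[0] + sd_y * rng.gauss(0.0, 1.0),
                                 x[2] + sd_y * rng.gauss(0.0, 1.0)))
    return observations


obs = simulate_lorenz63((-5.9, -5.5, 24.5), random.Random(0))
print(len(obs))  # 200
```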
The ensemble of NIBFs consists of M filters with N particles each, while the standard BF runs with K particles, where K=MN for a fair comparison.
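The averaging scheme can be sketched on a toy problem. The model below is a hypothetical scalar linear-Gaussian one (not the Lorenz system), chosen because exact filter means are then available from the Kalman filter for comparison. Each of the M filters uses its own random generator, so the runs are statistically independent and could be dispatched to separate cores (for instance with Python's `concurrent.futures.ProcessPoolExecutor`); here they run sequentially for simplicity. All function names are illustrative.

```python
import math
import random

# Hypothetical toy model: x_t = A x_{t-1} + u_t, u_t ~ N(0, Q); y_t = x_t + v_t, v_t ~ N(0, RV); x_0 ~ N(0, 1)
A, Q, RV = 0.9, 0.5, 0.5


def bootstrap_filter(ys, n_particles, rng):
    """Bootstrap PF; returns the estimates (f, pi_t^N) of the filter mean, f(x) = x."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in ys:
        # Propagate through the transition kernel (the prior is the proposal).
        xs = [A * x + math.sqrt(Q) * rng.gauss(0.0, 1.0) for x in xs]
        # Weight with the Gaussian likelihood, shifted by the max log-weight for stability.
        logw = [-0.5 * (y - x) ** 2 / RV for x in xs]
        top = max(logw)
        ws = [math.exp(lw - top) for lw in logw]
        means.append(sum(w * x for w, x in zip(ws, xs)) / sum(ws))
        # Multinomial resampling.
        xs = rng.choices(xs, weights=ws, k=n_particles)
    return means


def nibf_means(ys, M, N, seed=0):
    """Average the outputs of M independent bootstrap filters with N particles each."""
    runs = [bootstrap_filter(ys, N, random.Random(seed + m)) for m in range(M)]
    return [sum(run[t] for run in runs) / M for t in range(len(ys))]


def kalman_means(ys):
    """Exact filter means for the same model, used as the reference."""
    m, P, out = 0.0, 1.0, []
    for y in ys:
        mp, Pp = A * m, A * A * P + Q
        K = Pp / (Pp + RV)
        m, P = mp + K * (y - mp), (1.0 - K) * Pp
        out.append(m)
    return out


ys = [0.3, -0.1, 0.8, 0.5, 1.2]
M, N = 10, 100
ens = nibf_means(ys, M, N)
bf = bootstrap_filter(ys, M * N, random.Random(123))  # centralised BF with K = M N particles
exact = kalman_means(ys)
for t, (e, b, k) in enumerate(zip(ens, bf, exact), start=1):
    print(f"t={t}: NIBF {e:+.3f}  BF {b:+.3f}  exact {k:+.3f}")
```

Both estimators use the same total number of particles, K = MN; the ensemble version needs no communication between filters until the final average.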
We have coded both algorithms in Matlab (version 7.11.0.584 [R2010b] with the Parallel Computing Toolbox) and run the experiments on a pool of identical multiprocessor machines, each with 8 cores at 3.16 GHz and 32 GB of RAM. The standard (centralised) BF is run with K = NM particles on a single core. For the ensemble of NIBFs, we let the Parallel Computing Toolbox allocate all available cores of each server in order to run the BFs concurrently.
5.1.3 Numerical results
Next, we look into the relationship between the MSE and the running time for the two algorithms. With the number of filters fixed at M = 20, we have run 100 independent simulation trials for each value N = 100, 200, 400, 800 and 1000, and computed the empirical MSE and the average running time of the parallel scheme for each value of N. Correspondingly, we have also run the centralised BF with K = MN particles, i.e. for K = 2×10^{3}, 4×10^{3}, 8×10^{3}, 16×10^{3} and 20×10^{3}.
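The measurement procedure can be sketched as a small harness that times each trial and accumulates the squared error against a reference trajectory. The callable interface and the choice of averaging the squared error over time steps before averaging over trials are assumptions made for illustration; the grid reproduces the pairs (N, K = MN) listed above. The function name `empirical_mse_and_time` is hypothetical.

```python
import time
from statistics import mean


def empirical_mse_and_time(estimator, truth, n_trials):
    """Return (empirical MSE, mean running time in seconds) over n_trials runs.

    estimator : hypothetical callable, estimator(trial) -> list of estimates, one per time step
    truth     : list of reference values with the same length
    Assumption: the squared error is averaged over time steps and then over trials.
    """
    errors, times = [], []
    for trial in range(n_trials):
        start = time.perf_counter()
        est = estimator(trial)
        times.append(time.perf_counter() - start)
        errors.append(mean((e - x) ** 2 for e, x in zip(est, truth)))
    return mean(errors), mean(times)


# Experimental grid described in the text: M = 20 filters with N particles each,
# and a centralised BF with K = M N particles for every value of N.
M = 20
GRID = [(N, M * N) for N in (100, 200, 400, 800, 1000)]
for N, K in GRID:
    print(f"N = {N:4d} per filter  ->  K = {K:5d} for the centralised BF")
```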
5.2 Example: Lorenz 96 model
5.2.1 The J-dimensional Lorenz 96 system
where F = 8 is a constant forcing parameter^{5}, the Wiener processes {W_{ j }(s)}_{l,j≥0} are assumed independent and the scale parameter σ is known.
where j=0,…,J − 1 and {U_{j,n}}_{l,j,n≥0} are independent and identically distributed (i.i.d.) standard Gaussian r.v.’s.
where t = 1, 2, ... and \(\{V_{t}\}_{t\ge 1}\) is a sequence of i.i.d. r.v.'s with common pdf \({\mathcal N}\left(v_{t}; 0, \sigma_{y}^{2} {\mathcal I}_{\frac{J}{2}}\right)\), which denotes a \(\frac{J}{2}\)-dimensional Gaussian distribution with zero mean and covariance matrix \(\sigma_{y}^{2} {\mathcal I}_{\frac{J}{2}}\).
5.2.2 Simulation setup
We have run 100 independent simulations of the discretised Lorenz 96 model described in Section 5.2.1 over 20 continuous-time units, with integration step T_{d} = 2×10^{−4} (which amounts to 10^{5} discrete-time steps), and, for each simulation, we have obtained noisy observations according to Eq. (36), with \(\sigma_{y}^{2} = \frac{1}{2}\) and n_{0} = 10. The noise-scale parameter σ in the state Eq. (35) is set to \(\sigma = \frac{1}{\sqrt{2}}\), so that the noise variance becomes \(\sigma_{x}^{2} = \frac{T_{d}}{2}\).
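The simulation setup can be sketched with an Euler-Maruyama discretisation of the standard Lorenz 96 drift, \((x_{j+1} - x_{j-2})x_{j-1} - x_j + F\) with cyclic indices. The defaults follow the values in the text (F = 8, T_d = 2×10^{−4}, σ = 1/√2, 10^{5} steps, n_0 = 10, σ_y² = 1/2), while two details are hypothetical because Eqs. (35) and (36) are not reproduced here: the initial condition, and the choice of observing every second state variable (which yields the J/2 observed components). Function names are illustrative.

```python
import math
import random


def lorenz96_drift(x, F):
    """Lorenz 96 drift (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F with cyclic indices."""
    J = len(x)
    # Python's negative indices wrap j - 1 and j - 2; j + 1 wraps via the modulo.
    return [(x[(j + 1) % J] - x[j - 2]) * x[j - 1] - x[j] + F for j in range(J)]


def simulate_lorenz96(J=20, F=8.0, Td=2e-4, sigma=1 / math.sqrt(2), n_steps=100_000,
                      n0=10, sigma2_y=0.5, rng=None):
    """Euler-Maruyama discretisation plus noisy observations every n0 steps.

    The initial condition and the observed coordinates (every second one, J/2 in
    total) are hypothetical choices.
    """
    if rng is None:
        rng = random.Random(0)
    x = [F] * J
    x[0] += 0.01                      # hypothetical small perturbation of the fixed point
    sd_x = sigma * math.sqrt(Td)      # state-noise variance sigma^2 Td (= Td / 2 by default)
    sd_y = math.sqrt(sigma2_y)
    observations = []
    for n in range(1, n_steps + 1):
        f = lorenz96_drift(x, F)
        x = [xj + Td * fj + sd_x * rng.gauss(0.0, 1.0) for xj, fj in zip(x, f)]
        if n % n0 == 0:
            observations.append([x[j] + sd_y * rng.gauss(0.0, 1.0) for j in range(1, J, 2)])
    return x, observations


# Reduced run (10^4 steps instead of 10^5) to keep the example fast.
state, observations = simulate_lorenz96(J=20, n_steps=10_000)
print(len(observations), len(observations[0]))  # 1000 10
```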
The experiments have been carried out for a Lorenz 96 model with J=20 variables first and then for the same model with J=50 variables.
The simulations have been coded in Matlab R2016b (64-bit), with the Parallel Computing Toolbox enabled, and run on an 8-core Intel(R) Xeon(R) CPU E5-2680 v2 server with a clock frequency of 2.80 GHz and 64 GB of RAM. All the results reported are averaged over 100 independent simulation runs as described above.
5.2.3 Numerical results
6 Conclusions
We have presented a survey of methods for the parallelisation of particle filters. Specifically, we have described the basic parallelisation scheme based on ensembles of statistically independent PFs and then discussed three alternatives that introduce different degrees of interaction among the concurrently running filters. We have placed emphasis on the theoretical guarantees of the algorithms and, hence, we have stated conditions for the convergence of all the techniques, including the DRNA-based PF of [2], the particle island model of [13] and the αSMC method of [12].
In the second half of the paper, we have focused on the theoretical properties of the ensemble of non-interacting PFs. For this method, we have shown, both numerically and through the definition of time–error indices, that averaging statistically independent PFs should be preferred when N, the number of particles per independent filter, can be made sufficiently large to reduce the bias. This is often the case when using many-core computers (or computing clusters). When parallelisation is implemented using many low-power devices (such as GPUs), parallelisation with interaction is more efficient. Our numerical experiments for the stochastic Lorenz 96 model also show that averaging independent estimators can lead to lower estimation errors than a centralised bootstrap filter with the same total number of particles, as the dimension of the state space is increased.
Appendix 1: Proof of Lemma 2
We proceed by induction on the time index t. For t = 0, \(\rho_{0} = \tau_{0} = \pi_{0}\) and, since the particles \(x_{0}^{(i)}\), i = 1, ..., N, are drawn from \(\pi_{0}\), the equality \(E\left[\left(f,\rho_{0}^{N}\right)\right] = \left(f,\rho_{0}\right)\) is straightforward.
where equality (46) follows from the induction hypothesis (39), (47) is obtained by simply reordering (46), and (48) follows from the recursive definition of \(\rho_{t}\) in (5).
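The unbiasedness established by Lemma 2 can be checked numerically with f ≡ 1. Assuming, as the proof of Lemma 3 suggests, that \((f,\rho_t^N) = \frac{1}{N}\sum_{i=1}^N G_t^N f(x_t^{(i)})\), this choice of f reduces the estimator to \(G_t^N\), the estimate of the normalising constant \((1,\rho_t)\). The sketch assumes \(G_t^N = \prod_{k=1}^{t} \frac{1}{N}\sum_{i=1}^{N} g_k^{y_k}(\bar x_k^{(i)})\) for a bootstrap filter, and a hypothetical scalar linear-Gaussian model with the full Gaussian likelihood as \(g_k^{y_k}\), for which \((1,\rho_t) = p(y_{1:t})\) is available exactly from the Kalman filter. A single realisation of \(G_t^N\) can be far from the exact value, but the average over many independent runs should give a ratio close to 1. Function names are illustrative.

```python
import math
import random

# Hypothetical scalar model: x_t = A x_{t-1} + u_t, u_t ~ N(0, Q); y_t = x_t + v_t, v_t ~ N(0, RV); x_0 ~ N(0, 1)
A, Q, RV = 0.9, 0.5, 0.5


def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def pf_normalising_constant(ys, N, rng):
    """One realisation of G_t^N = prod_k (1/N) sum_i g_k^{y_k}(xbar_k^(i)) from a bootstrap PF."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(N)]            # x_0^(i) ~ pi_0
    G = 1.0
    for y in ys:
        xbar = [A * x + math.sqrt(Q) * rng.gauss(0.0, 1.0) for x in xs]
        ws = [normal_pdf(y, x, RV) for x in xbar]             # g_k^{y_k}(xbar_k^(i))
        G *= sum(ws) / N
        xs = rng.choices(xbar, weights=ws, k=N)               # multinomial resampling
    return G


def kalman_normalising_constant(ys):
    """Exact p(y_{1:t}) for the same model (prediction-error decomposition)."""
    m, P, Z = 0.0, 1.0, 1.0
    for y in ys:
        mp, Pp = A * m, A * A * P + Q
        S = Pp + RV
        Z *= normal_pdf(y, mp, S)
        K = Pp / S
        m, P = mp + K * (y - mp), (1.0 - K) * Pp
    return Z


ys = [0.5, -0.3, 1.0, 0.2]
rng = random.Random(2024)
n_reps = 1000
avg_G = sum(pf_normalising_constant(ys, 100, rng) for _ in range(n_reps)) / n_reps
ratio = avg_G / kalman_normalising_constant(ys)
print(f"average of G_t^N over {n_reps} runs / exact value = {ratio:.3f}")
```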
Appendix 2: Proof of Lemma 3
where \(Z_{t}^{(i)} = G_{t}^{N} f\left(x_{t}^{(i)}\right) - (f,\rho_{t})\), i = 1, ..., N. It is apparent that the random variables \(Z_{t}^{(i)}\), i = 1, ..., N, are conditionally independent given the σ-algebra \(\bar{\mathcal F}_{t}\) generated by the set \(\left\{ x_{0:t-1}^{(j)}, \bar x_{0:t}^{(j)} : 1 \le j \le N \right\}\). It can also be proved that every \(Z_{t}^{(i)}\) is centred and bounded, as explicitly shown in the sequel.
where (51) follows from Lemma 2 (i.e. \(\rho_{t-1}^{N}\) is unbiased) and (52) is a straightforward consequence of the definition of \(\rho_{t}\) in (5). Equation (52) states that \(E\left[ Z_{t}^{(i)} \right] = E\left[ G_{t}^{N} f\left(x_{t}^{(i)}\right) - \left(f,\rho_{t}\right) \right] = 0\).
which is finite for any finite t (indeed, for every t ≤ T).
where the constant \(\breve c_{t}\) is finite and independent of N. From (56), we easily obtain the inequality (16) in the statement of Lemma 3, with \(\tilde c_{t} = 2 \breve c_{t} \| f \|_{\infty} \prod_{k=1}^{t} \| g_{k}^{y_{k}} \|_{\infty} < \infty\) for any t ≤ T < ∞.
Note that \(G^{N}_{t}\) is an estimate of the normalising constant for π_{ t } (namely, the integral (1,ρ_{ t })) which can be shown to be unbiased under mild assumptions [4]. In Bayesian model selection, this constant is termed ‘model evidence’, while in parameter estimation problems, it is often referred to as the likelihood (of the unknown parameters) [27].
Other particle filtering algorithms can be applied in a straightforward way; however, we assume bootstrap filters (i.e. the procedure of Algorithm 1) for the sake of clarity and notational simplicity.
The deterministic Lorenz 96 system is chaotic for F > 6, with increasing turbulence of the chaotic flow as F is made larger.
Declarations
Acknowledgements
The authors thank Dr. Katrin Achutegui for her valuable assistance in obtaining and plotting the numerical results in Section 4.
Funding
This work was partially supported by Ministerio de Economía y Competitividad of Spain (TEC2012-38883-C02-01 COMPREHENSION and TEC2015-69868-C2-1-R ADVENTURE) and the Office of Naval Research Global (N62909-15-1-2011). D. C. and J. M. acknowledge the support of the Isaac Newton Institute through the program Monte Carlo Inference for High-Dimensional Statistical Models.
Authors’ contributions
DC and JM carried out the analysis and obtained the theoretical results. JM and GRM coded the algorithms and ran the computer experiments. All authors collaborated in the composition of the manuscript. The authors are listed in alphabetical order. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
[1] G Hendeby, R Karlsson, F Gustafsson, Particle filtering: the need for speed. EURASIP J. Adv. Sig. Process. 2010, 22 (2010).
[2] M Bolić, PM Djurić, S Hong, Resampling algorithms and architectures for distributed particle filters. IEEE Trans. Sig. Process. 53(7), 2442–2450 (2005).
[3] A Doucet, N de Freitas, N Gordon, Sequential Monte Carlo Methods in Practice (Springer, New York, 2001).
[4] P Del Moral, Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications (Springer-Verlag, New York, 2004).
[5] O Cappé, SJ Godsill, E Moulines, An overview of existing methods and recent advances in sequential Monte Carlo. Proc. IEEE 95(5), 899–924 (2007).
[6] A Bain, D Crisan, Fundamentals of Stochastic Filtering (Springer-Verlag, New York, 2008).
[7] A Gelencsér-Horváth, G Tornai, A Horváth, G Cserey, Fast, parallel implementation of particle filtering on the GPU architecture. EURASIP J. Adv. Sig. Process. 2013(1), 1–16 (2013).
[8] J Míguez, Analysis of selection methods for cost-reference particle filtering with applications to maneuvering target tracking and dynamic optimization. Digit. Sig. Process. 17, 787–807 (2007).
[9] O Hlinka, O Sluciak, F Hlawatsch, P Djuric, M Rupp, Likelihood consensus and its application to distributed particle filtering. IEEE Trans. Sig. Process. 60(8), 4334–4349 (2012).
[10] J Miguez, MA Vázquez, A proof of uniform convergence over time for a distributed particle filter. Sig. Process. 122, 152–163 (2016).
[11] K Heine, N Whiteley, Fluctuations, stability and instability of a distributed particle filter with local exchange. Stoch. Process. Appl. 127(8), 2508–2541 (2017).
[12] N Whiteley, A Lee, K Heine, On the role of interaction in sequential Monte Carlo algorithms. Bernoulli 22(1), 494–529 (2016).
[13] C Vergé, C Dubarry, P Del Moral, E Moulines, On parallel implementation of sequential Monte Carlo methods: the island particle model. Stat. Comput. 25(2), 243–260 (2015).
[14] P Del Moral, E Moulines, J Olsson, C Vergé, Convergence properties of weighted particle islands with application to the double bootstrap algorithm. Stoch. Syst. 6(2), 367–419 (2016).
[15] W Han, On the Numerical Solution of the Filtering Problem (Ph.D. thesis, Department of Mathematics, Imperial College London, 2013).
[16] N Gordon, D Salmond, AFM Smith, Novel approach to nonlinear and non-Gaussian Bayesian state estimation. IEE Proc. F 140(2), 107–113 (1993).
[17] A Doucet, N de Freitas, N Gordon, An introduction to sequential Monte Carlo methods, in Sequential Monte Carlo Methods in Practice, ed. by A Doucet, N de Freitas, N Gordon (Springer-Verlag, New York, 2001), chap. 1, pp. 4–14.
[18] A Doucet, S Godsill, C Andrieu, On sequential Monte Carlo sampling methods for Bayesian filtering. Stat. Comput. 10(3), 197–208 (2000).
[19] R Douc, O Cappé, Comparison of resampling schemes for particle filtering, in Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis (ISPA 2005) (IEEE, 2005).
[20] D Crisan, A Doucet, A survey of convergence results on particle filtering. IEEE Trans. Sig. Process. 50(3), 736–746 (2002).
[21] N Chopin, A sequential particle filter method for static models. Biometrika 89(3), 539–552 (2002).
[22] XL Hu, TB Schon, L Ljung, A basic convergence result for particle filtering. IEEE Trans. Sig. Process. 56(4), 1337–1348 (2008).
[23] D Crisan, J Miguez, Particle-kernel estimation of the filter density in state-space models. Bernoulli 20(4), 1879–1929 (2014).
[24] J Míguez, D Crisan, PM Djurić, On the convergence of two sequential Monte Carlo methods for maximum a posteriori sequence estimation and stochastic global optimization. Stat. Comput. 23(1), 91–107 (2013).
[25] P Del Moral, A Guionnet, On the stability of interacting processes with applications to filtering and genetic algorithms. Ann. l'Institut Henri Poincaré (B) Probab. Stat. 37(2), 155–194 (2001).
[26] J Miguez, On the uniform asymptotic convergence of a distributed particle filter, in IEEE 8th Sensor Array and Multichannel Signal Processing Workshop (SAM) (IEEE, 2014), pp. 241–244.
[27] C Andrieu, A Doucet, R Holenstein, Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. B 72, 269–342 (2010).
[28] N Chopin, PE Jacob, O Papaspiliopoulos, SMC²: an efficient algorithm for sequential analysis of state space models. J. R. Stat. Soc. Ser. B Stat. Methodol. 75(3), 397–426 (2013).
[29] J Míguez, IP Mariño, MA Vázquez, Analysis of a nonlinear importance sampling scheme for Bayesian parameter estimation in state-space models. Sig. Process. 142, 281–291 (2018).
[30] J Olsson, O Cappé, R Douc, E Moulines, Sequential Monte Carlo smoothing with application to parameter estimation in nonlinear state space models. Bernoulli 14(1), 155–179 (2008).
[31] EN Lorenz, Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–141 (1963).
[32] AJ Chorin, P Krause, Dimensional reduction for a Bayesian filter. PNAS 101(42), 15013–15017 (2004).
[33] EN Lorenz, Predictability: a problem partly solved, in Proceedings of the Seminar on Predictability, vol. 1 (European Centre on Medium Range Weather Forecasting, Reading, UK, 1996).
[34] J Hakkarainen, A Ilin, A Solonen, M Laine, H Haario, J Tamminen, E Oja, H Järvinen, On closure parameter estimation in chaotic systems. Nonlinear Proc. Geoph. 19(1), 127–143 (2012).