 Open Access
On the performance of parallelisation schemes for particle filtering
EURASIP Journal on Advances in Signal Processing volume 2018, Article number: 31 (2018)
Abstract
Considerable effort has recently been devoted to the design of schemes for the parallel implementation of sequential Monte Carlo (SMC) methods for dynamical systems, also widely known as particle filters (PFs). In this paper, we present a brief survey of recent techniques, with an emphasis on the availability of analytical results regarding their performance. Most parallelisation methods can be interpreted as running an ensemble of lower-cost PFs, and the differences between schemes depend on the degree of interaction among the members of the ensemble. We also provide some insights into the use of the simplest scheme for the parallelisation of SMC methods, which consists of splitting the computational budget into M non-interacting PFs with N particles each and then obtaining the desired estimators by averaging over the M independent outcomes of the filters. This approach minimises the parallelisation overhead yet still displays desirable theoretical properties. We analyse the mean square error (MSE) of estimators of moments of the optimal filtering distribution and show the effect of the parallelisation scheme on the approximation error rates. Following these results, we propose a time–error index to compare schemes with different degrees of parallelisation. Finally, we provide two numerical examples involving stochastic versions of the Lorenz 63 and Lorenz 96 systems. In both cases, we show that the ensemble of non-interacting PFs can attain the approximation accuracy of a centralised PF (with the same total number of particles) in just a fraction of its running time using a standard multicore computer.
Introduction
Over the past decade, there has been continued interest in the design of schemes for implementing particle filtering algorithms on parallel or distributed hardware of various types. These include general-purpose devices such as multicore CPUs or graphics processing units (GPUs) [1] and application-tailored devices such as field-programmable gate arrays (FPGAs) [2]. A particle filter (PF) is a recursive algorithm for the approximation of the sequence of posterior probability distributions that arise from a stochastic dynamical system in state-space form (see, e.g. [3–6] and references therein for a general view of the field). A typical PF includes three steps that are repeated sequentially:

Monte Carlo sampling in the space of the state variables,

Computation of weights for the generated samples and, finally,

Resampling according to the weights.
At first sight the algorithm may look straightforward to parallelise, since sampling and weighting can be carried out concurrently without any constraint. The resampling step, however, involves the interaction of the whole set of Monte Carlo samples. Several authors have proposed schemes for ‘splitting’ the resampling step into simpler tasks that can be carried out concurrently. The approaches are diverse and range from the heuristic [7–9] to the mathematically well-principled [2, 10–14]. However, the former rely largely on (often loose) approximations that preclude any rigorous guarantee of convergence, whereas the latter involve non-negligible overhead to ensure the proper interaction of particles.
The goal of this paper is to provide

A survey of recently proposed, and mathematically well-grounded, parallelisation schemes for particle filtering and

Analytical insights into the performance of the simplest parallelisation method, namely the averaging of statistically independent PFs.
Besides describing the various methodologies, we aim to characterise their performance analytically whenever possible. For that purpose, we need to introduce precise notation, which is unfortunately somewhat more involved than needed for the mere description of the algorithmic steps. We then describe the standard PF, provide a basic convergence result for it and proceed to describe four different approaches to its parallelisation:
the simple averaging of statistically independent (i.e. non-interacting) low-complexity PFs;
the method based on the distributed resampling with non-proportional allocation (DRNA) procedure of [2, 10, 11];
the particle island model of [13, 14];
and the adaptive interaction scheme termed α-sequential Monte Carlo (αSMC) in [12].
The simplest parallelisation scheme consists of running M statistically independent PFs with N particles (i.e. Monte Carlo samples) each and then averaging the M independent estimators. The limitation of this approach is that the bias of the averaged estimator depends only on N. Hence, if N is relatively small, the bias is large even if we use a very large number M of parallel filters. This drawback can be overcome by allowing some degree of interaction among the M concurrently running PFs. The DRNA, particle island and αSMC approaches introduce this interaction in different ways.

In DRNA-based ensembles of PFs, each filter runs separately but periodically exchanges a few particles with other members of the ensemble using a communication network [10]. Algorithms in the particle island class rely on two levels of resampling: conventional resampling at the particle level and island-level resampling, where complete sets of particles (associated with parallel-running PFs) are replicated or eliminated stochastically [13, 14]. Finally, the αSMC scheme of [12] is a very flexible methodology that enables the adaptive selection of different interaction patterns (i.e. which particles are resampled together) over time. For each of these techniques, we describe the methodology and establish basic theoretical guarantees of convergence.
In the second part of the paper, we focus on the analysis of the performance of the simplest parallelisation scheme: the averaging of M statistically independent PFs with N particles each. Under mild assumptions, we analyse the mean square error (MSE) of the estimators of one-dimensional statistics of the optimal filtering distribution and show explicitly the effect of the parallelisation scheme on the convergence rate. Specifically, we decompose the MSE into variance and bias components. We show that the variance is \(O\left(\frac{1}{MN}\right)\), i.e. it decreases linearly with the total number of particles. The squared bias is \(O\left(\frac{1}{N^{2}}\right)\), i.e. it goes to 0 quadratically with N.

These results have already been obtained, e.g. in [13] using the Feynman–Kac framework of [4]. Here, we aim to provide a self-contained analysis that illustrates the key theoretical issues in the convergence of parallel PFs. All proofs are constructed from elementary principles, and we obtain explicit error rates (for the bias, the variance and the MSE) that hold for all M and N, while the theorems in [13] are strictly asymptotic. While we focus here on PFs for discrete-time state-space models, the analysis can be carried out similarly for continuous-time systems; indeed, the basic results needed for that case can be found in [15].

Finally, in order to compare different parallelisation schemes, we introduce a time–error index. It combines time complexity (asymptotic order of the running time) and estimation accuracy (asymptotic error rates) into a single quantitative figure of merit that can be used to compare schemes with different degrees of interaction.
The rest of the paper is organised as follows. In Section 2, we present basic background material and notation for the analysis of PFs. Section 3 is devoted to a survey of parallelisation schemes for particle filtering. Our analysis of the ensemble of non-interacting PFs is presented in Section 4. In Section 5, we present numerical results for two examples, namely the filtering of stochastic versions of the Lorenz 63 and Lorenz 96 systems. The latter is often used as a simplified model of atmospheric dynamics, and it has the property that it can be scaled to an arbitrary dimension. Our simulation results show that averaged estimators computed from ensembles of non-interacting filters can be advantageous in terms of accuracy (not only running time) as the system dimension grows. Finally, Section 6 is devoted to a discussion of the results, together with some concluding remarks.
Background
Notation and preliminaries
We first introduce some common notation to be used throughout the paper, broadly classified by topic. Below, \(\mathbb{R}\) denotes the real line, while for an integer d ≥ 1, \(\mathbb{R}^{d} = \overbrace{\mathbb{R} \times \cdots \times \mathbb{R}}^{d\ \text{times}}\).

Functions.

The supremum norm of a real function \(f:\mathbb{R}^{d} \rightarrow \mathbb{R}\) is denoted as \(\|f\|_{\infty} = \sup_{x\in \mathbb{R}^{d}} |f(x)|\).

B(S) is the set of bounded real functions over \(S \subseteq \mathbb{R}^{d}\), i.e. \(f \in B(S)\) if, and only if, f has domain S and \(\|f\|_{\infty}<\infty\).


Measures and integrals. Let \(S \subseteq \mathbb {R}^{d}\) be a subset of \(\mathbb {R}^{d}\).

\({\mathcal B}(S)\) is the σ-algebra of Borel subsets of S.

\({\mathcal P}(S)\) is the set of probability measures over the measurable space \((S,{\mathcal B}(S))\).

\((f,\mu) \triangleq \int f(x) \mu (dx)\) is the integral of a real function \(f:S \rightarrow \mathbb {R}\) with respect to (w.r.t.) a measure \(\mu \in {\mathcal P}(S)\).

Given a probability measure \(\mu \in {\mathcal P}(S)\), a Borel set \(A \in {\mathcal B}(S)\) and the indicator function
$$ I_{A}(x) = \left\{ \begin{array}{ll} 1, &\text{if}\ x \in A\\ 0, &\text{otherwise} \end{array} \right., $$
the quantity \(\mu(A) = (I_{A},\mu)\) is the probability of A.


Sequences, vectors and random variables (r.v.’s).

We use a subscript notation for sequences, namely \(x_{t_{1}:t_{2}} \triangleq \left \{ x_{t_{1}}, \ldots, x_{t_{2}} \right \}\).

For an element \(x = (x_{1},\ldots,x_{d}) \in \mathbb{R}^{d}\) of a Euclidean space, its norm is denoted as \(\|x\| = \sqrt{x_{1}^{2}+\ldots+x_{d}^{2}}\).

The \(L_{p}\) norm of a real r.v. Z, with p ≥ 1, is written as \(\|Z\|_{p} \triangleq E[|Z|^{p}]^{1/p}\), where E[·] denotes expectation w.r.t. the distribution of Z.
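The integral notation \((f,\mu)\) and the \(L_{p}\) norm are used throughout the paper. As a small illustration, the following Python sketch does three things. It evaluates \((f,\mu)\) for a discrete measure \(\mu = \sum_{i} w_{i}\delta_{a_{i}}\). It computes \(\mu(A) = (I_{A},\mu)\). It estimates \(\|Z\|_{p}\) by Monte Carlo. The helper names (`integral`, `indicator`, `lp_norm_mc`) and the three-atom measure are hypothetical choices made for this example only.

```python
import random

def integral(f, atoms, weights):
    """(f, mu) for a discrete measure mu = sum_i weights[i] * delta_{atoms[i]}."""
    return sum(w * f(x) for x, w in zip(atoms, weights))

def indicator(in_A):
    """I_A as a function; the set A is described by a membership predicate in_A."""
    return lambda x: 1.0 if in_A(x) else 0.0

def lp_norm_mc(sample_z, p, n_samples=100_000, seed=0):
    """Monte Carlo estimate of ||Z||_p = E[|Z|^p]^(1/p) from i.i.d. draws of Z."""
    rng = random.Random(seed)
    total = sum(abs(sample_z(rng)) ** p for _ in range(n_samples))
    return (total / n_samples) ** (1.0 / p)

# Hypothetical three-atom probability measure on the real line.
atoms, weights = [-1.0, 0.5, 2.0], [0.2, 0.5, 0.3]
mu_mean = integral(lambda x: x, atoms, weights)                     # (x, mu) = 0.65
mu_positive = integral(indicator(lambda x: x > 0), atoms, weights)  # mu((0, inf)) = 0.8
z_l2 = lp_norm_mc(lambda rng: rng.gauss(0.0, 1.0), p=2)             # ||Z||_2 = 1 for Z ~ N(0, 1)
```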

State-space Markov models in discrete time
Consider two random sequences, \(\{X_{t}\}_{t\ge0}\) and \(\{Y_{t}\}_{t\ge1}\), taking values in \({\mathcal X} \subseteq \mathbb{R}^{d_{x}}\) and \(\mathbb{R}^{d_{y}}\), respectively. Let \(\mathbb{P}_{t}\) be the joint probability measure for the collection of random variables \(\{X_{0},X_{n},Y_{n}\}_{1\le n\le t}\).
We refer to the sequence \(\{X_{t}\}_{t\ge0}\) as the state (or signal) process, and we assume that it is an inhomogeneous Markov chain. It is governed by an initial probability measure \(\tau_{0} \in {\mathcal P}({\mathcal X})\) and a sequence of Markov transition kernels \(\tau_{t} : {\mathcal B}({\mathcal X}) \times {\mathcal X} \rightarrow [0,1]\). To be specific, we define
where \(A \in {\mathcal B}({\mathcal X})\) is a Borel set. The sequence \(\{Y_{t}\}_{t\ge1}\) is termed the observation process. Each r.v. \(Y_{t}\) is assumed to be conditionally independent of the other observations given \(X_{t}\). Hence, the conditional distribution of \(Y_{t}\) given \(X_{t}=x_{t}\) is fully described by the probability density function (pdf) \(g_{t}(y_{t}|x_{t})>0\). We often use \(g_{t}\) as a function of \(x_{t}\) (i.e. as a likelihood), and hence we write \(g_{t}^{y}(x) \triangleq g_{t}(y|x)\). The prior \(\tau_{0}\), the kernels \(\{\tau_{t}\}_{t\ge1}\) and the functions \(\{g_{t}\}_{t\ge1}\) describe a stochastic Markov state-space model in discrete time.
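To make the three ingredients of the model concrete, the sketch below simulates a state-space model in discrete time. The following are assumptions made only for illustration, not a model prescribed by the paper:

- the scalar linear-Gaussian choice of \(\tau_{0}\), \(\tau_{t}\) and \(g_{t}\);
- its parameters `A`, `SX` and `SY`;
- the function names.

```python
import math
import random

# Hypothetical scalar model, chosen only for illustration:
#   tau_0 = N(0, 1),  tau_t(. | x) = N(A * x, SX^2),  g_t(y | x) = N(y; x, SY^2).
A, SX, SY = 0.9, 1.0, 0.5

def sample_tau0(rng):
    """Draw X_0 from the prior tau_0."""
    return rng.gauss(0.0, 1.0)

def sample_tau(x_prev, rng):
    """Draw X_t from the Markov kernel tau_t(. | x_{t-1})."""
    return rng.gauss(A * x_prev, SX)

def likelihood(y, x):
    """g_t^y(x) = g_t(y | x): strictly positive and bounded by 1 / (sqrt(2 pi) SY)."""
    return math.exp(-0.5 * ((y - x) / SY) ** 2) / (math.sqrt(2.0 * math.pi) * SY)

def simulate(T, seed=0):
    """Simulate x_{0:T} from the signal process and y_{1:T}, each y_t depending on x_t only."""
    rng = random.Random(seed)
    xs = [sample_tau0(rng)]
    ys = []
    for _ in range(T):
        xs.append(sample_tau(xs[-1], rng))
        ys.append(rng.gauss(xs[-1], SY))
    return xs, ys

xs, ys = simulate(T=50)
```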
The stochastic filtering problem consists of computing the posterior probability measure of the state \(X_{t}\) given the sequence of observations up to time t. Specifically, for a given observation record \(\{y_{t}\}_{t\ge1}\), we seek the probability measures
where \(A \in {\mathcal B}({\mathcal X})\). For many practical problems, the interest actually lies in the computation of statistics of \(\pi_{t}\), e.g. the posterior mean or the posterior variance of \(X_{t}\). Such statistics can be written as integrals of the form \((f,\pi_{t})\), for some function \(f:{\mathcal X}\rightarrow \mathbb{R}\). Note that, for t = 0, we recover the prior signal measure, i.e. \(\pi_{0} = \tau_{0}\).
An associated problem is the computation of the one-step-ahead predictive measure
This measure can be explicitly written in terms of the kernel \(\tau_{t}\) and the filter \(\pi_{t-1}\). Indeed, for any integrable function \(f:{\mathcal X}\rightarrow \mathbb{R}\), we readily obtain (see, e.g. [6], Chapter 10)
and we write \(\xi_{t} = \tau_{t}\pi_{t-1}\) as shorthand.
The filter at time t, \(\pi_{t}\), can be obtained from the predictive measure, \(\xi_{t}\), and the likelihood, \(g_{t}^{y_{t}}\). This is done by way of the so-called projective product [6] or Boltzmann–Gibbs transformation [4], \(\pi_{t} = g_{t}^{y_{t}} \star \xi_{t}\), defined as
for any integrable function \(f:{\mathcal X}\rightarrow \mathbb {R}\). Combined with (3), this yields the recursive formula
It is key to the analysis of Section 4 to keep track of the sequence of non-normalised measures \(\{\rho_{t}\}_{t\ge0}\), where
and, for any integrable function \(f:{\mathcal X}\rightarrow \mathbb {R}\) and any measure \(\alpha \in {\mathcal P}({\mathcal X})\), we define
We remark that \(\rho_{t}\) is not a probability measure but a non-normalised version of \(\pi_{t}\), namely
where 1(x) = 1 is the constant unit function.
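On a finite state space, all integrals become sums and the recursions above can be evaluated exactly. The sketch below assumes the standard forms of these recursions:

- \(\xi_{t}(j) = \sum_{i}\pi_{t-1}(i)\,\tau_{t}(j|i)\);
- \(\pi_{t} = g_{t}^{y_{t}} \star \xi_{t}\);
- \(\rho_{t} = g_{t}^{y_{t}} \cdot (\tau_{t}\rho_{t-1})\) with \(\rho_{0} = \tau_{0}\).

It computes \(\pi_{t}\) and \(\rho_{t}\) side by side, so one can check that \(\pi_{t} = \rho_{t}/(\mathbf{1},\rho_{t})\) and \((\mathbf{1},\rho_{t}) = \prod_{k=1}^{t}(g_{k}^{y_{k}},\xi_{k})\). The two-state transition matrix, the likelihood values and the function names are hypothetical.

```python
def predict(mu, P):
    """xi = tau_t mu on a finite state space: xi(j) = sum_i mu(i) P[i][j]."""
    S = len(mu)
    return [sum(mu[i] * P[i][j] for i in range(S)) for j in range(S)]

def projective_product(g, xi):
    """g * xi: reweight xi by the likelihood g and renormalise; also return (g, xi)."""
    z = sum(gj * xj for gj, xj in zip(g, xi))  # (g, xi) > 0 when g > 0
    return [gj * xj / z for gj, xj in zip(g, xi)], z

def exact_filter(tau0, P, liks):
    """Exact recursions; liks[t-1][j] = g_t^{y_t}(j).
    Returns pi_{0:T}, rho_{0:T} and the factors (g_t^{y_t}, xi_t), t = 1..T."""
    pis, rhos, zs = [list(tau0)], [list(tau0)], []
    for g in liks:
        xi = predict(pis[-1], P)               # xi_t = tau_t pi_{t-1}
        pi, z = projective_product(g, xi)      # pi_t = g_t^{y_t} * xi_t
        pis.append(pi)
        zs.append(z)
        pred = predict(rhos[-1], P)            # same recursion, never renormalised
        rhos.append([gj * pj for gj, pj in zip(g, pred)])
    return pis, rhos, zs

# Hypothetical two-state example.
tau0 = [0.5, 0.5]
P = [[0.9, 0.1], [0.2, 0.8]]
liks = [[0.7, 0.2], [0.1, 0.6], [0.5, 0.4]]
pis, rhos, zs = exact_filter(tau0, P, liks)
```

For these numbers, \((\mathbf{1},\rho_{3}) = 0.0457065\), which equals the product of the three factors stored in `zs`.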
Standard particle filter
Assume that a sequence of observations \(Y_{1:T} = y_{1:T}\), for some T < ∞, is given. Then, the sequences of measures \(\{\pi_{t}\}_{t\ge1}\), \(\{\xi_{t}\}_{t\ge1}\) and \(\{\rho_{t}\}_{t\ge0}\) can be numerically approximated using particle filtering. PFs are numerical methods based on the recursive relationships (4) and (6). The simplest algorithm, often called the ‘standard particle filter’ or ‘bootstrap filter’ [16] (see also [17]), can be described as follows.
Step 2.(b) is referred to as resampling or selection. In the form stated here, it reduces to the so-called multinomial resampling algorithm [18, 19], but the convergence of the filter can be easily proved for various other schemes (see, e.g. the treatment of the resampling step in [6]).
Using the sets \(\left\{ \bar x_{t}^{(n)} \right\}_{1 \le n \le N}\) and \(\left\{ x_{t}^{(n)} \right\}_{1 \le n \le N}\), we construct random approximations of \(\xi_{t}\), \(\rho_{t}\) and \(\pi_{t}\), namely
where \(\delta_{x}\) is the unit delta measure located at \(x \in \mathbb{R}^{d_{x}}\) and^{Footnote 1}
For any integrable function f on the state space, it is straightforward to approximate the integrals \((f,\xi_{t})\), \((f,\pi_{t})\) and \((f,\rho_{t})\) as
respectively.
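The steps of the bootstrap filter and the three estimators above can be sketched in Python as follows. The sketch makes the following assumptions:

- multinomial resampling is used;
- \(\rho_{t}^{N}\) is normalised as \((\mathbf{1},\rho_{t}^{N}) = \prod_{k=1}^{t}(g_{k}^{y_{k}},\xi_{k}^{N})\), the usual construction;
- that quantity is tracked in log scale to avoid underflow.

The function name `bootstrap_filter`, its arguments and the linear-Gaussian test model are illustrative assumptions, not the authors' code.

```python
import math
import random

def bootstrap_filter(ys, N, sample_tau0, sample_tau, lik, f, seed=0):
    """Bootstrap PF with multinomial resampling (sketch).
    For each t returns ((f, xi_t^N), (f, pi_t^N), log (1, rho_t^N))."""
    rng = random.Random(seed)
    x = [sample_tau0(rng) for _ in range(N)]         # x_0^(n) ~ tau_0
    log_z = 0.0
    out = []
    for y in ys:
        # Sampling: xbar_t^(n) ~ tau_t(. | x_{t-1}^(n)); these particles define xi_t^N.
        xbar = [sample_tau(xp, rng) for xp in x]
        f_xi = sum(f(v) for v in xbar) / N
        # Weighting: w_t^(n) proportional to g_t^{y_t}(xbar_t^(n)).
        g = [lik(y, v) for v in xbar]
        mean_g = sum(g) / N                          # (g_t^{y_t}, xi_t^N)
        log_z += math.log(mean_g)                    # log (1, rho_t^N) = sum_k log (g_k, xi_k^N)
        # Resampling: N i.i.d. draws from sum_n w_t^(n) delta_{xbar_t^(n)}.
        x = rng.choices(xbar, weights=g, k=N)        # choices() normalises the weights
        f_pi = sum(f(v) for v in x) / N              # (f, pi_t^N)
        out.append((f_xi, f_pi, log_z))
    return out

# Hypothetical linear-Gaussian model: x_t = 0.9 x_{t-1} + N(0, 1), y_t = x_t + N(0, 0.5^2).
ys = [0.5, 1.0, 0.2, -0.3, 0.8]
res = bootstrap_filter(
    ys, N=5000,
    sample_tau0=lambda rng: rng.gauss(0.0, 1.0),
    sample_tau=lambda x, rng: rng.gauss(0.9 * x, 1.0),
    lik=lambda y, x: math.exp(-0.5 * ((y - x) / 0.5) ** 2) / (0.5 * math.sqrt(2.0 * math.pi)),
    f=lambda x: x,
)
posterior_means = [r[1] for r in res]
```

For a linear-Gaussian model like this one, the exact filter is given by the Kalman recursions. This provides a convenient check of \((f,\pi_{t}^{N})\) against \((f,\pi_{t})\).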
The convergence of PFs has been analysed in different ways [4, 6, 20–23]. Here, we use simple results for the convergence of the \(L_{p}\) norms (p ≥ 1) of the approximation errors. For the approximation of integrals w.r.t. \(\xi_{t}\) and \(\pi_{t}\), we have the following standard result.
Lemma 1
Assume that the sequence of observations \(Y_{1:T}=y_{1:T}\) is fixed (with T<∞), \(g_{t}^{y_{t}} \in B({\mathcal X})\) and \(g_{t}^{y_{t}}>0\) (in particular, \(\left(g_{t}^{y_{t}},\xi_{t}\right) > 0\)) for every t=1,2,...,T. Then, for any \(f \in B({\mathcal X})\), any p≥1 and every t=1,…,T,
where \(\bar c_{t}\) and \(c_{t}\) are finite constants independent of N, and \(\|f\|_{\infty}=\sup_{x \in {\mathcal X}} |f(x)|<\infty\). The expectations are taken over the distributions of the measure-valued random variables \(\xi_{t}^{N}\) and \(\pi_{t}^{N}\), respectively.
Proof
This result is a special case of, e.g. Lemma 1 in [24]. □
Remark 1
The constants \(\bar c_{t}\) and \(c_{t}\) can easily be shown to increase exponentially with t. Error rates independent of t can be obtained by imposing additional assumptions on the state-space model (related to the stability of the optimal filter, \(\pi_{t}\)) [4, 25].
Parallelisation schemes for particle filtering
Non-interacting particle filters
Assume we intend to run a PF with K particles. Most parallelisation schemes split the set of particles \(\left \{ x_{t}^{(k)} \right \}_{1 \le k \le K}\) into subsets and then run separate (but possibly interacting) PFs for each subset. To be specific, assume that the complete set of K particles can be divided into M subsets with N elements each, i.e. K = MN, and we construct disjoint subsets
such that
In the simplest scheme, M independent (i.e. non-interacting) PFs are run separately. Assume for simplicity that the standard PF outlined in Algorithm 1 is used on each subset. Then, at each time t, we have M estimates of the filtering measure, namely
If the goal is to approximate integrals of the form \((f,\pi_{t})\), for some integrable real function \(f:{\mathcal X}\rightarrow \mathbb{R}\), then we obtain an ensemble of M independent and identically distributed (i.i.d.) estimators
which can be averaged to yield
where we have denoted \(\pi _{t}^{M \times N} = \frac {1}{M} \sum _{m=1}^{M} \pi _{t}^{m,N}\).
This scheme is straightforward to implement, and it does not involve any parallelisation overhead since the M PFs do not interact. A self-contained analysis of the MSE of the ensemble estimator \(\left(f,\pi_{t}^{M \times N}\right)\) is presented in Section 4.
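A minimal Python sketch of the averaged estimator \((f,\pi_{t}^{M \times N})\) follows. It assumes that each subset runs the bootstrap filter with multinomial resampling on a hypothetical scalar model with parameters `a`, `sx` and `sy`. The function names are likewise illustrative. Each filter receives its own seed and shares nothing with the others, which is what makes the scheme free of parallelisation overhead.

```python
import math
import random

def pf_estimate(ys, N, seed, a=0.9, sx=1.0, sy=0.5):
    """One bootstrap PF on a hypothetical scalar model; returns (f, pi_T^{m,N}) for f(x) = x."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]
    for y in ys:
        xbar = [rng.gauss(a * v, sx) for v in x]                   # sampling
        w = [math.exp(-0.5 * ((y - v) / sy) ** 2) for v in xbar]  # unnormalised weights
        x = rng.choices(xbar, weights=w, k=N)                      # multinomial resampling
    return sum(x) / N

def ensemble_estimate(ys, M, N, base_seed=0):
    """(f, pi_T^{MxN}) = (1/M) sum_m (f, pi_T^{m,N}), averaging M independent filters."""
    # The M calls share no data (distinct seeds, no communication), so they could be
    # dispatched to separate cores as they are; here they simply run one after another.
    estimates = [pf_estimate(ys, N, seed=base_seed + m) for m in range(M)]
    return sum(estimates) / M
```

The loop in `ensemble_estimate` is sequential for simplicity. Since the calls are independent, they could be distributed over cores, e.g. with Python's `concurrent.futures` module, without changing the estimator.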
A key result, to be explicitly shown in our analysis but also pointed out in [13] and [12], concerns the estimation bias \(\left|E\left[(f,\pi_{t}) - \left(f,\pi_{t}^{M \times N}\right)\right]\right|\). This bias decreases as \(O(N^{-1})\), independently of M, so its contribution to the MSE (the squared bias) decreases as \(O(N^{-2})\). This implies that if the number of particles per subset, N, is kept fixed, then the MSE, \(E\left[\left|(f,\pi_{t}) - \left(f,\pi_{t}^{M \times N}\right)\right|^{2}\right]\), remains bounded away from zero even if the number of subsets is made arbitrarily large, i.e. M→∞.

Whether this is a drawback depends on the type of parallel computing configuration to be used. In multicore computers, for example, the number of subsets M can be expected to be moderate (of the order of the number of available cores), and N can often be made large enough to make the bias negligible. On the other hand, implementations based on low-power processors, such as graphics processing units (GPUs), or on wireless networks are more efficient when operating with a large number of subsets, M, and a low number of particles per subset, N. In these scenarios, the bias of the non-interacting ensemble estimator in Eq. (13) can be significant. The solution to this limitation is to introduce some degree of interaction among the M parallel-running PFs. Some relevant schemes are described below.
Distributed resampling with nonproportional allocation
The scheme termed distributed resampling with non-proportional allocation (DRNA) for the parallelisation of PFs was originally introduced in [2] (Section IV.A.3). A theoretical characterisation of its performance, however, has only been obtained recently [10, 11, 26].
As in Section 3.1, assume that we have a budget of K = MN particles, which are split into M subsets with N particles each. We run a standard PF for each subset^{Footnote 2} which, in addition to the particles and weights, keeps track of the aggregated non-normalised weight
Note that \(W_{t}^{(m)*}\) represents the likelihood of the mth subset of particles \(\left \{ x_{t}^{(m,n)} \right \}_{1 \le n \le N}\). The normalised aggregated weights are computed as
In this scheme, the M parallel PFs are not independent. Every \(t_{0}\) time steps, the PFs exchange subsets of particles and weights using a communication network [2]. This exchange can be formally described by means of a deterministic one-to-one map
that keeps the number of particles per subset, N, invariant. Specifically, (u,v) = β(m,n) means that the nth particle of the mth subset is transmitted to the uth subset, where it becomes particle number v. In summary, if we have the particles
then, after the exchange step, the particles are relabelled as
Typically, only small subsets of particles are exchanged, hence β(m,n)=(m,n) for most values of m and n. The resulting parallel particle filtering algorithm can be outlined as shown below (adapted from [10]).
We remark that every PF operates independently of all others except for the particle exchange, step 2.(c), which is carried out every \(t_{0}\) time steps. The degree of interaction can be controlled by designing the map β(m,n) appropriately. Typically, exchanging a subset of particles with ‘neighbour’ PFs is sufficient. For example, assume the parallel PFs are arranged in a ring configuration. Then the mth PF can exchange, say, two particles with PF number m−1 and another two particles with PF number m+1. After the exchange, every parallel PF still holds N particles, four of which were received from its neighbours.
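The ring configuration just described can be written as an explicit one-to-one map β. In the Python sketch below, indices are 0-based and the names `ring_beta` and `exchange` are hypothetical. The first E slots of each subset are sent to the previous PF and the next E slots to the following one. With E = 2, every PF sends four particles, receives four and keeps N in total. Items can be (particle, weight) pairs, so weights travel together with their particles.

```python
def ring_beta(m, n, M, E):
    """Hypothetical ring exchange map beta(m, n) -> (u, v), 0-based indices.
    Slots 0..E-1 of subset m go to subset m-1, slots E..2E-1 go to subset m+1
    (indices modulo M); every other particle stays where it is."""
    if n < E:
        return (m - 1) % M, n
    if n < 2 * E:
        return (m + 1) % M, n
    return m, n

def exchange(subsets, E):
    """Apply beta: particle (m, n) becomes particle (u, v) = beta(m, n).
    Since beta is one-to-one and onto, every slot is filled exactly once."""
    M, N = len(subsets), len(subsets[0])
    assert 2 * E <= N, "each subset must be able to send 2E particles"
    new = [[None] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            u, v = ring_beta(m, n, M, E)
            new[u][v] = subsets[m][n]
    return new

# Label each particle by its (subset, index) origin to see where it ends up.
subsets = [[(m, n) for n in range(6)] for m in range(4)]
out = exchange(subsets, E=2)
```

After the exchange, subset 0 holds particles 0 and 1 of subset 1, particles 2 and 3 of subset 3, and its own particles 4 and 5.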
We also note that the local resampling step is carried out independently, and concurrently, for each parallel-running PF, and that it does not change the aggregate weights, i.e. \(\bar W_{t}^{(m)*} = \sum _{n=1}^{N} \bar w_{t}^{(m,n)*} = \sum _{n=1}^{N} \tilde w^{(m,n)*}\). We assume a multinomial resampling procedure, but other procedures can be used in a straightforward manner.
The ensemble estimator of the optimal filter π_{ t } is now computed as the weighted average
The particle estimator of (f,π_{ t }) then becomes \(\left (f,\pi _{t}^{M \times N}\right) ~=~ \sum _{m=1}^{M} \frac {W_{t}^{(m)}}{N} \sum _{n=1}^{N} f\left (x_{t}^{(m,n)}\right)\).
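The aggregate weights and the weighted ensemble estimator can be sketched as follows. As the remark on local resampling indicates, the sketch assumes that after local resampling each particle of the mth subset carries weight \(W_{t}^{(m)*}/N\). The helper names and the numbers are hypothetical.

```python
import random

def aggregate_weights(unnorm_weights):
    """W^{(m)*} = sum_n w^{(m,n)*} per subset, and W^{(m)} = W^{(m)*} / sum_k W^{(k)*}."""
    W_star = [sum(ws) for ws in unnorm_weights]
    total = sum(W_star)
    return W_star, [Ws / total for Ws in W_star]

def local_resample(xs, ws, rng):
    """Multinomial resampling inside one subset. Each survivor gets weight W^{(m)*}/N,
    so the aggregate weight of the subset is unchanged."""
    N, W_star = len(xs), sum(ws)
    return rng.choices(xs, weights=ws, k=N), [W_star / N] * N

def drna_estimate(f, particles, W):
    """(f, pi_t^{MxN}) = sum_m W^{(m)}/N * sum_n f(x_t^{(m,n)}), equal weights within subsets."""
    return sum(Wm / len(xs) * sum(f(x) for x in xs) for Wm, xs in zip(W, particles))

rng = random.Random(0)
particles = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]   # M = 2 subsets, N = 3 (hypothetical)
weights = [[0.1, 0.2, 0.1], [0.3, 0.2, 0.1]]
W_star, W = aggregate_weights(weights)
resampled = [local_resample(xs, ws, rng) for xs, ws in zip(particles, weights)]
estimate = drna_estimate(lambda x: x, [xs for xs, _ in resampled], W)
```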
The scheme in Algorithm 2 has been proved to converge uniformly over time, under some standard assumptions, when the number of particles per subset, N, is kept fixed and the number of subsets (i.e. the number of parallel PFs), M, is increased. To be specific, we have the following result, which is proved in [10] (Section 3.2).
Theorem 1
If the following three assumptions hold:

The sequence of observations \(\{y_{t}\}_{t\ge1}\) is fixed (but otherwise arbitrary) and there exists a real constant 0<a<∞ such that \(\frac{1}{a} < g_{t}^{y_{t}}(x) < a\) for every t≥1 and every \(x \in {\mathcal X}\).

The sequence of probability measures \(\{\pi_{t}\}_{t\ge0}\) is stable (see [25]).

The particle exchange step guarantees that
$$ E\left[ \left(\sup_{1 \le m \le M} W_{rt_{0}}^{(m)} \right)^{q} \right] \le \frac{c^{q}}{M^{q\epsilon}}, \quad \text{for every } r \in \mathbb{N}, $$
and some constants c<∞, 0≤ε<1 and q≥4 independent of M.
Then, for any fixed 0<N<∞,
for any \(f \in B({\mathcal X})\) and every 1≤p≤q.
The third assumption of Theorem 1 indicates that none of the M subsets should accumulate too much aggregate weight compared to the other subsets. This accumulation of weight is precisely what the particle exchange steps control. In a practical implementation, the aggregate weights \(W_{t}^{(m)*}\) should be monitored, and additional particle exchange steps should be triggered when the weight of any subset increases beyond some prescribed threshold.
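One possible monitoring rule is sketched below. The rule and its parameter `kappa` are hypothetical; they are not taken from [2] or [10]. The rule compares the normalised aggregate weights \(W_{t}^{(m)}\), which carry the same relative information as \(W_{t}^{(m)*}\), with κ/M, i.e. κ times the weight of a perfectly balanced ensemble. It also fires on the regular schedule every \(t_{0}\) steps.

```python
def needs_exchange(t, t0, W, kappa=2.0):
    """Hypothetical trigger: exchange on the regular schedule (every t0 steps) or
    whenever some normalised aggregate weight W^{(m)} exceeds kappa / M."""
    M = len(W)
    return t % t0 == 0 or max(W) > kappa / M
```

Larger values of `kappa` tolerate more imbalance between subsets and therefore trigger fewer extra exchanges, at the price of a looser control of the largest aggregate weight.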
Particle islands
The particle island model was introduced in [13] to address, in a systematic manner, the parallel processing of subsets of particles in SMC methods. The algorithms proposed in [13] are similar to the DRNA-based PFs of Section 3.2. They run M parallel PFs, each one on a disjoint subset of particles, namely \(\big\{ x_{t}^{(m,n)} \big\}_{1 \le n \le N}\) for the mth filter, and keep track of the non-normalised aggregate weights \(W_{t}^{(m)*}\) defined in Eq. (14).
However, particle island methods do not rely on an exchange of particles between the PFs running the different subsets. Instead, a two-level resampling scheme is implemented.

Particle level: resampling is carried out locally within each of the M concurrently running PFs. This is equivalent to the local resampling step in Algorithm 2.

Island level: the aggregate weights \(W_{t}^{(m)}\) are used to resample the particle subsets, or islands, assigned to the individual PFs. In this step, complete subsets can be replicated or eliminated (in the same way as particles are in a conventional, or particle level, resampling step).
We now outline the double bootstrap filter, an algorithm described in [13] (Algorithm 1) that performs multinomial resampling at both the particle level and the island level. In the version of [13], both resampling steps are taken at every time step t. We describe a slightly more general procedure where the island-level resampling steps are taken periodically, every \(t_{0} \ge 1\) time steps. For simplicity, we introduce the notation \({\sf X}_{t}^{m,N} = \big\{ x_{t}^{(m,n)} \big\}_{1 \le n \le N}\) for the subset of N particles assigned to the mth island (i.e. the mth concurrently running PF).
In Algorithm 3, a multinomial resampling procedure is employed at both the particle level and the island level. Other schemes are possible, and some of them are explored in [13], including ε-interactions and resampling conditional on the effective sample size.
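Island-level multinomial resampling can be sketched in Python as follows. The sketch assumes that the islands are stored as lists of particles. It also assumes that, as in particle-level resampling, the normalised island weights are reset to 1/M after selection. The function names are hypothetical. The periodic variant mirrors the every-\(t_{0}\) schedule described above.

```python
import random

def island_resample(islands, W, rng):
    """Island-level multinomial resampling: draw M islands with replacement with
    probabilities W^{(m)}; each selected island is copied whole and the normalised
    island weights are reset to 1/M."""
    M = len(islands)
    idx = rng.choices(range(M), weights=W, k=M)
    return [list(islands[i]) for i in idx], [1.0 / M] * M

def maybe_island_resample(t, t0, islands, W, rng):
    """Periodic variant: island-level resampling only every t0 >= 1 time steps."""
    if t % t0 == 0:
        return island_resample(islands, W, rng)
    return islands, W
```

Selected islands are copied, so later particle-level steps on one copy do not alter the others.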
The particle approximation of the optimal filter \(\pi_{t}\) takes the form \(\pi_{t}^{M \times N} = \sum_{m=1}^{M} W_{t}^{(m)} \pi_{t}^{m,N}\), where \(\pi_{t}^{m,N} = \frac{1}{N} \sum_{n=1}^{N} \delta_{x_{t}^{(m,n)}}\). This is formally identical to the DRNA-based Algorithm 2, although the procedure for computing the particles and weights is different.
The asymptotic convergence of the double bootstrap filter was proved in [13] using the Feynman–Kac machinery of [4]. Then, in the follow-up paper [14], a central limit theorem was proved, and bounds were derived on the asymptotic variance of a class of schemes that includes Algorithm 3. Here we reproduce the basic convergence result of [13], adapted to the notation of this paper.
Theorem 2
Assume that the sequence of observations \(y_{1:T}\) is arbitrary but fixed, T is arbitrarily large but finite and the likelihood functions \(g_{t}^{y_{t}}(x)\) are positive and bounded for 1≤t≤T. Then, for any \(f \in B({\mathcal X})\) and every t=1,...,T,
where Var[·] denotes the variance of a random variable and B(f,t) and V(f,t) are finite constants with respect to both M and N.
The results in Theorem 2 can be adapted to the case where the island-level resampling step is removed from Algorithm 3, effectively converting the double bootstrap method into an ensemble of non-interacting PFs. It is proved in [13] that, in such a case,
where the constants \(\bar B(f,t)\) and \(\bar V(f,t)\) are independent of M and N. This implies that the bias of the estimator \(\left(f,\pi_{t}^{M\times N}\right)\) with non-interacting PFs depends only on N and cannot be eliminated by taking M→∞ alone. The MSE of Algorithm 3, on the other hand, vanishes as MN→∞.
Adaptive interaction pattern: the αSMC methodology
Rather than working with fixed subsets \({\sf X}_{t}^{m,N} \,=\, \left \{\! x_{t}^{(m,n)}\!\right \}_{1 \le n \le N}\), m = 1,...,M, the αSMC methodology of [12] enables the construction of particle filtering algorithms with adaptive interaction patterns. In particular, it is possible to devise parallelised PFs within this framework where the subsets of particles which are resampled together can change from one time step to the next (including their size, N).
Let K be the total number of particles. The interaction pattern for resampling is specified by means of a sequence of Markov transition matrices \(\alpha_{t} = \left[\alpha_{t}^{ij}\right]\), where 1≤i≤K and 1≤j≤K are the row and column indices, respectively. Since \(\alpha_{t}\) is a Markov matrix, it satisfies \(\sum_{j=1}^{K} \alpha_{t}^{ij} = 1\) for every row i. The ith row in \(\alpha_{t}\) determines from which subset of particles we resample \(x_{t}^{(i)}\). The general αSMC method is outlined below. We assume that either the sequence \(\alpha_{t}\) is predetermined or there is some prescribed rule to select \(\alpha_{t}\) given the observations \(y_{1:t}\) and the particles \(\left\{ \bar x_{t}^{(k)} \right\}_{1 \le k \le K}\).
The particle approximation of \(\pi_{t}\) produced by Algorithm 4 is \(\pi_{t}^{K} = \sum_{k=1}^{K} w_{t}^{(k)} \delta_{x_{t}^{(k)}}\). The αSMC scheme can be particularised to yield most standard particle filtering algorithms ([12], Section 2.2). Of specific interest for the purpose of parallelisation is that the DRNA-based PF (Algorithm 2) can also be described and analysed as an αSMC procedure [11].
The convergence of αSMC methods depends on the choice of the sequence of interaction matrices \(\alpha_{t}\). Let us recursively define the matrices \(\alpha_{t,t}=I_{K}\) (where \(I_{K}\) denotes the K×K identity matrix) and \(\alpha_{s,t}\), constructed entrywise as \(\alpha_{s,t}^{ij} = \sum_{k=1}^{K} \alpha_{s+1,t}^{ik} \alpha_{s}^{kj}\), for i,j∈{1,...,K} and 0≤s<t. Furthermore, define \(\beta_{s,t}^{i} = \frac{1}{K} \sum_{j=1}^{K} \alpha_{s,t}^{ji}\), for i=1,...,K and 0≤s≤t. Then, we have the following result, proved in [12] (Section 3).
Theorem 3
Assume that \(g_{t}^{y_{t}}\) is positive and bounded for every t≥1. If the coefficients \(\{ \beta_{s,t}^{i} \}_{1 \le i \le K}\) are measurable w.r.t. the trivial σ-algebra \(\{ {\mathcal X}, \emptyset \}\) and \(\lim_{K\rightarrow\infty} \max_{i \in \{1,...,K\}} \beta_{s,t}^{i} = 0\) for all 0≤s≤t, then
for any \(f \in B({\mathcal X})\) and p≥1.
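The quantities in Theorem 3 can be computed directly. The Python sketch below builds \(\alpha_{s,t}\) by the recursion above and obtains the coefficients \(\beta_{s,t}^{i}\) as column averages. Three hypothetical interaction matrices are included:

- full interaction, where all particles are resampled together;
- a block-diagonal matrix for M islands of N particles;
- a degenerate matrix in which every row points to the first particle.

The first two are doubly stochastic, so \(\beta_{s,t}^{i} = 1/K\) and the condition of Theorem 3 holds. The last one gives \(\beta^{1} = 1\) for every K, so the condition fails.

```python
def matmul(A, B):
    """Plain matrix product of lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(K):
    return [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]

def alpha_st(alphas, s, t):
    """alpha_{t,t} = I_K and alpha_{s,t}^{ij} = sum_k alpha_{s+1,t}^{ik} alpha_s^{kj},
    i.e. alpha_{s,t} = alpha_{s+1,t} alpha_s; alphas[r] holds alpha_r."""
    out = identity(len(alphas[0]))
    for r in range(t - 1, s - 1, -1):
        out = matmul(out, alphas[r])
    return out

def beta_st(alphas, s, t):
    """beta_{s,t}^i = (1/K) sum_j alpha_{s,t}^{ji}: column averages of alpha_{s,t}."""
    a = alpha_st(alphas, s, t)
    K = len(a)
    return [sum(a[j][i] for j in range(K)) / K for i in range(K)]

def block_diagonal(M, N):
    """M islands of N particles: particle i is resampled only within its own island."""
    K = M * N
    return [[1.0 / N if i // N == j // N else 0.0 for j in range(K)] for i in range(K)]

K = 6
full = [[1.0 / K] * K for _ in range(K)]             # all particles resampled together
islands = block_diagonal(M=2, N=3)                   # two non-interacting groups
collapse = [[1.0 if j == 0 else 0.0 for j in range(K)] for _ in range(K)]  # every row -> particle 0
```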
Error rates for ensembles of non-interacting particle filters
Averaged estimators
We turn our attention to the analysis of the ensemble of non-interacting PFs outlined in Section 3.1. In particular, we study the accuracy of the particle approximations \(\pi_{t}^{m,N}\) and \(\pi_{t}^{M \times N}\) introduced in Eqs. (12) and (13), respectively. We adopt the mean square error (MSE) for integrals of bounded real functions,
as a performance metric. The underlying state-space model is the same for all filters, and they are run in a completely independent manner. Therefore, the measure-valued random variables \(\pi_{t}^{m,N}\), m=1,...,M, are i.i.d., and it is straightforward to show (via Lemma 1) that
for some finite constant independent of N and M. However, inequality (15) does not reveal the effect of the choice of N. Consider, for example, the extreme case of N = 1. The normalised weight of a single particle is always 1, so \(\pi_{t}^{M \times N}\) reduces to an unweighted average of M independent trajectories drawn from the signal prior. The observations are then effectively ignored, and the bias does not vanish as M grows. Instead of (15), we seek a bound for the approximation error that indicates the trade-off between the number of independent filters, M, and the number of particles per filter, N.
To this end, we use the classical decomposition of the MSE into variance and bias terms. First, we obtain preliminary results that are needed for the analysis of the average measure \(\pi_{t}^{M \times N}\). In particular, we prove that the random non-normalised measure \(\rho_{t}^{N}\) produced by the bootstrap filter (Algorithm 1) is unbiased. We also prove that it attains \(L_{p}\) error rates proportional to \(\frac{1}{\sqrt{N}}\), i.e. the same as \(\xi_{t}^{N}\) and \(\pi_{t}^{N}\). We use these results to derive an upper bound for the bias of \(\pi_{t}^{N}\) which is proportional to \(\frac{1}{N}\). The latter enables us to deduce an upper bound for the MSE of the ensemble approximation \(\pi_{t}^{M \times N}\) consisting of two additive terms that depend explicitly on M and N. Specifically, we show that the variance component of the MSE decays linearly with the total number of particles, K=MN, while the squared-bias term decreases as \(N^{-2}\), i.e. quadratically with the number of particles per filter.
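The way the two terms combine can be sketched as follows. This is a sketch under stated assumptions rather than the proof given later. It takes Lemma 1 with p = 2 to read \(\left\|(f,\pi_{t}^{N}) - (f,\pi_{t})\right\|_{2} \le \frac{c_{t}\|f\|_{\infty}}{\sqrt{N}}\). It uses the bias bound of order \(\frac{1}{N}\) announced above, with a hypothetical constant \(b_{t}\). It exploits the fact that the \(\pi_{t}^{m,N}\) are i.i.d., so averaging divides the variance by M and leaves the mean unchanged:

$$ \begin{aligned} E\left[\left((f,\pi_{t}) - \left(f,\pi_{t}^{M \times N}\right)\right)^{2}\right] &= \mathrm{Var}\left[\left(f,\pi_{t}^{M \times N}\right)\right] + \left(E\left[\left(f,\pi_{t}^{M \times N}\right)\right] - (f,\pi_{t})\right)^{2} \\ &= \frac{1}{M}\mathrm{Var}\left[\left(f,\pi_{t}^{1,N}\right)\right] + \left(E\left[\left(f,\pi_{t}^{1,N}\right)\right] - (f,\pi_{t})\right)^{2} \\ &\le \frac{c_{t}^{2}\|f\|_{\infty}^{2}}{MN} + \frac{b_{t}^{2}}{N^{2}}. \end{aligned} $$

The last step uses \(\mathrm{Var}[Z] \le E[(Z-a)^{2}]\) for any constant a, applied with \(a = (f,\pi_{t})\), followed by Lemma 1. The first term vanishes as the total number of particles MN grows, whereas the second depends on N only.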
Assumptions on the state-space model
All the results to be introduced in the rest of Section 4 hold under the (mild) assumptions of Lemma 1, which we summarise below for convenience of presentation.
Assumption 1
The sequence of observations \(Y_{1:T}=y_{1:T}\) is arbitrary but fixed, with T<∞.
Assumption 2
The likelihood functions are bounded and positive, i.e.
Remark 2
Note that Assumptions 1 and 2 imply that

\((g_{t}^{y_{t}},\alpha) > 0\), for any \(\alpha \in {\mathcal P}({\mathcal X})\), and

\(\prod_{k=1}^{t} g_{k}^{y_{k}} \le \prod_{k=1}^{t} \|g_{k}^{y_{k}}\|_{\infty} < \infty\),
for every t=1,2,...,T.
Remark 3
We seek simple convergence results for a fixed time horizon T<∞, similar to Lemma 1. Therefore, no further assumptions related to the stability of the optimal filter for the state-space model [4,25] are needed. If such assumptions are imposed, then stronger (time-uniform) asymptotic convergence can be proved, similar to Theorem 1 in Section 3.2. See [11] for additional results that apply to the independent filters \(\pi _{t}^{m,N}\) and the ensemble \(\pi _{t}^{M \times N}\).
Bias and error rates
Our analysis relies on some properties of the particle approximations of the non-normalised measures \(\rho_{t}\), t≥1. We first show that the estimate \(\rho _{t}^{N}\) in Eq. (8) is unbiased.
Lemma 2
If Assumptions 1 and 2 hold, then \(E\left [\left (f,\rho _{t}^{N}\right)\right ] = \left (f,\rho _{t}\right)\)
for any \(f \in B({\mathcal X})\) and every t=1,2,...,T.
Proof
See Appendix 1 for a self-contained proof. □
Remark 4
The result in Lemma 2 was originally proved in [4]. For the choice f = 1, i.e. \(\mathbf{1}(x)=1\) for all x, it states that the estimate \(\left (\mathbf {1},\rho _{t}^{N}\right)\) of the proportionality constant of the posterior distribution \(\pi_{t}\) is unbiased. This property is at the core of recent model inference algorithms such as particle MCMC [27], SMC² [28] and some population Monte Carlo [29] methods.
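To illustrate Lemma 2 and Remark 4 numerically, the Python sketch below runs a bootstrap filter on a hypothetical one-dimensional linear-Gaussian model (the model, its parameters and all function names are our own illustrative choices, not taken from this paper), for which the normalising constant \((\mathbf{1},\rho_t)=p(y_{1:t})\) is available in closed form from the Kalman filter. We assume that the estimate \(G_t^N=(\mathbf{1},\rho_t^N)\) takes the usual product-of-average-weights form, with multinomial resampling at every step; Lemma 2 predicts that its expectation equals the exact value.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def kalman_evidence(ys, a, q, r, m0, p0):
    """Exact p(y_{1:T}) for x_t = a x_{t-1} + N(0, q), y_t = x_t + N(0, r), x_0 ~ N(m0, p0)."""
    m, p, z = m0, p0, 1.0
    for y in ys:
        m_pred, p_pred = a * m, a * a * p + q          # predictive moments of x_t
        z *= gauss_pdf(y, m_pred, p_pred + r)          # predictive density of y_t
        k = p_pred / (p_pred + r)                      # Kalman gain
        m, p = m_pred + k * (y - m_pred), (1.0 - k) * p_pred
    return z


def bootstrap_evidence(ys, a, q, r, m0, p0, n, rng):
    """Bootstrap-filter estimate G_T^N = prod_t (1/N) sum_i g_t(xbar_t^(i))."""
    xs = [rng.gauss(m0, math.sqrt(p0)) for _ in range(n)]         # x_0^(i) ~ pi_0
    g_hat = 1.0
    for y in ys:
        xs = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in xs]  # propagate: xbar_t^(i)
        ws = [gauss_pdf(y, x, r) for x in xs]                     # likelihood weights
        g_hat *= sum(ws) / n                                      # average weight
        xs = rng.choices(xs, weights=ws, k=n)                     # multinomial resampling
    return g_hat


if __name__ == "__main__":
    ys = [0.5, -0.3, 1.2, 0.8, -0.1]
    params = dict(a=0.9, q=1.0, r=1.0, m0=0.0, p0=1.0)
    rng = random.Random(1)
    exact = kalman_evidence(ys, **params)
    runs = [bootstrap_evidence(ys, n=100, rng=rng, **params) for _ in range(500)]
    print(exact, sum(runs) / len(runs) / exact)
```

Averaged over many independent runs, the ratio printed in the last line should be close to 1; individual runs can deviate noticeably, since only the expectation of \(G_t^N\) is exact, not each realisation.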
Combining Lemma 2 with the standard result of Lemma 1 leads to an explicit convergence rate for the \(L_{p}\) norms of the approximation errors \(\left (f,\rho _{t}^{N}\right) - (f,\rho _{t})\).
Lemma 3
If Assumptions 1 and 2 hold, then, for any \(f\in B({\mathcal X})\), any p≥1 and every t=1,2,...,T, we have the inequality
where \(\tilde c_{t} < \infty \) is a constant independent of N.
Proof
See Appendix 2. □
Finally, Lemmas 2 and 3 together enable the calculation of explicit rates for the bias of the particle approximation of (f,π_{ t }). This is a key result for the decomposition of the MSE into variance and bias terms. To be specific, we can prove the following theorem.
Theorem 4
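If \(0<(\mathbf{1},\rho_{t})<\infty\) for t=1,2,...,T and Assumptions 1 and 2 hold, then, for any \(f\in B({\mathcal X})\) and every 0≤t≤T, we obtain \(\left| E\left[\left(f,\pi_{t}^{N}\right)\right] - \left(f,\pi_{t}\right)\right| \le \frac{\hat c_{t}\|f\|_{\infty}}{N},\)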
where \(\hat c_{t} < \infty \) is a constant independent of N.
Proof. Let us first note that \((f,\pi_{t})=(f,\rho_{t})/(\mathbf{1},\rho_{t})\) and
where (17) follows from the construction of \(\rho _{t}^{N}\), (18) holds because \(\left (\mathbf {1},\pi _{t}^{N}\right)=1\) and (19) is, again, a consequence of the definition of \(\rho _{t}^{N}\). Therefore, the difference \(\left (f,\pi _{t}^{N}\right) - \left (f,\pi _{t}\right)\) can be written as
and, since \(\left (f,\rho _{t}\right)=E\left [\left (f,\rho _{t}^{N}\right)\right ]\) (from Lemma 2), the bias can be expressed as
Some elementary manipulations on (20) yield the equality
Noting that \(E\left [ (\mathbf {1},\rho _{t}) - \left (\mathbf {1},\rho _{t}^{N}\right) \right ]=0\) (again, a consequence of Lemma 2) and moving the factor \((\mathbf{1},\rho_{t})^{-1}\) out of the expectation, we can rewrite Eq. (21) as
where we have applied the Cauchy–Schwarz inequality to obtain (22), (23) follows from Lemmas 1 and 3 and
is a constant independent of N. □
The result in Theorem 4 was originally proved in [30], albeit by a different method.
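The \(\frac{1}{N}\) bias in Theorem 4 stems from \((f,\pi_t^N)\) being a ratio of two unbiased estimates. A minimal Python sketch of this effect, assuming a hypothetical one-step (t = 1) problem with prior \({\mathcal N}(0,1)\), a single observation y observed in unit-variance Gaussian noise and f(x) = x, so that the exact posterior mean is y/2: particles are drawn from the prior and weighted by the likelihood, as in the first step of a bootstrap filter. Names and parameter values are illustrative only.

```python
import math
import random


def one_step_estimate(n, y, rng):
    """(f, pi_1^N) for f(x) = x: prior draws weighted by g(x) = exp(-(y - x)^2 / 2)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]         # particles from the prior N(0, 1)
    ws = [math.exp(-0.5 * (y - x) ** 2) for x in xs]     # unnormalised likelihood weights
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)  # self-normalised (ratio) estimate


def empirical_bias(n, runs, y=1.0, seed=0):
    """Monte Carlo estimate of E[(f, pi_1^N)] - (f, pi_1), where (f, pi_1) = y / 2."""
    rng = random.Random(seed)
    mean = sum(one_step_estimate(n, y, rng) for _ in range(runs)) / runs
    return mean - y / 2.0


if __name__ == "__main__":
    for n in (1, 2, 32):
        print(n, empirical_bias(n, runs=20000))
```

For N = 1 the estimate is a single prior draw, so its bias is exactly −y/2; for larger N the empirical bias should shrink roughly in proportion to \(\frac{1}{N}\) until it is masked by Monte Carlo noise.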
For any \(f \in B({\mathcal X})\), let \({\mathcal E}_{t}^{N}(f)\) denote the approximation difference, i.e. \({\mathcal E}_{t}^{N}(f) = \left (f,\pi _{t}^{N}\right) - \left (f,\pi _{t}\right).\)
This is a random variable whose second-order moment is the MSE of \(\left (f,\pi _{t}^{N}\right)\). A bound for the MSE follows directly from Lemma 1 and, by subsequently using Theorem 4, we readily find a similar bound for the variance of \({\mathcal E}_{t}^{N}(f)\), denoted \(\text {\sf Var}\left [{\mathcal E}_{t}^{N}(f)\right ]\). These results are stated explicitly in the corollary below.
Corollary 1
If 0<(1,ρ_{ t })<∞ for t=1,2,...,T and Assumptions 1 and 2 hold, then, for any \(f\in B({\mathcal X})\) and any 0≤t≤T, we obtain
where c_{ t } and \(c_{t}^{v}\) are finite constants independent of N.
Proof The inequality (24) for the MSE is a straightforward consequence of Lemma 1. Moreover, we can write the MSE in terms of the variance and the square of the bias, which yields
Since Theorem 4 ensures that \(\big| E\left [{\mathcal E}_{t}^{N}(f)\right ] \big| \le \frac {\hat c_{t}\|f\|_{\infty }}{N}\), the inequality (26) implies that there exists a constant \(c_{t}^{v}<\infty \) such that (25) holds. □
Error rate for the averaged estimators
Let us run M independent PFs, each with N particles, on the same (fixed) sequence of observations \(Y_{1:T}=y_{1:T}\), \(T<\infty\). The random measures output by the mth filter are denoted \(\xi_{t}^{m,N}\), \(\pi_{t}^{m,N}\) and \(\rho_{t}^{m,N}\), m=1,...,M.
Obviously, all the theoretical properties established in Section 4.3, as well as the basic Lemma 1, hold for each one of the M independent filters.
Definition 1
The ensemble approximation of \(\pi_{t}\) with M independent filters is the discrete random measure \(\pi _{t}^{M \times N}\) constructed as \(\pi_{t}^{M\times N} = \frac{1}{M}\sum_{m=1}^{M}\pi_{t}^{m,N},\)
and the averaged estimator of (f,π_{ t }) is \(\left (f,\pi _{t}^{M\times N}\right)\).
It is apparent that similar ensemble approximations can be given for ξ_{ t } and ρ_{ t }. Moreover, the statistical independence of the PFs yields the following corollary as a straightforward consequence of Theorem 4 and Corollary 1.
Corollary 2
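If \(0<(\mathbf{1},\rho_{t})<\infty\) for t=1,2,...,T and Assumptions 1 and 2 hold, then, for any \(f\in B({\mathcal X})\) and any 0≤t≤T, the inequality \(E\left[\left({\mathcal E}_{t}^{M\times N}(f)\right)^{2}\right] \le \frac{c_{t}^{v}\|f\|_{\infty}^{2}}{MN} + \frac{\hat c_{t}^{2}\|f\|_{\infty}^{2}}{N^{2}}\)
holds for some constants \(c_{t}^{v}\) and \(\hat c_{t}\) independent of N and M, where \({\mathcal E}_{t}^{M\times N}(f) = \left(f,\pi_{t}^{M\times N}\right) - \left(f,\pi_{t}\right)\).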
Proof
Let us denote
for m=1,2,...,M. Since \(\pi _{t}^{M \times N}\) is a linear combination of i.i.d. random measures, we easily obtain that
where the inequality follows from Theorem 4. Moreover, again because of the independence of the random measures, we readily calculate a bound for the variance of \({\mathcal E}_{t}^{M \times N}(f)\),
where the inequality follows from Corollary 1. Since \(E\left [ \left ({\mathcal E}_{t}^{M \times N}\right)^{2} \right ] = \text {\sf Var}\left [ {\mathcal E}_{t}^{M\times N} \right ] + \left| E\left [ {\mathcal E}_{t}^{M \times N} \right ] \right|^{2}\), combining (29) and (28) yields (27) and concludes the proof. □
The inequality in Corollary 2 shows explicitly that the bias of the estimator \(\left (f,\pi _{t}^{M \times N}\right)\) cannot be arbitrarily reduced when N is fixed, even if M→∞. This feature has already been discussed in Section 3.3. Note that the inequality (27) holds for any choice of M and N, while Theorem 2 yields asymptotic limits.
Remark 5
According to the inequality (27), the bias of the estimator \(\left (f,\pi _{t}^{M\times N}\right)\) is controlled by the number of particles per filter (subset), N: it decays as \(\frac{1}{N}\), so its contribution to the MSE decays quadratically in N, while, for fixed N, the variance decays linearly with M. The MSE rate is \(\propto \frac {1}{MN} \) as long as N≥M. Otherwise, the term \(\frac {\hat c_{t}^{2} \|f\|_{\infty }^{2}}{N^{2}}\) becomes dominant and the resulting asymptotic error bound is higher.
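The trade-off described in Remark 5 can be observed on a simple hypothetical one-step problem (prior \({\mathcal N}(0,1)\), one observation y with unit-variance Gaussian noise, f(x) = x, exact posterior mean y/2). The sketch below, with illustrative names and parameters rather than the paper's experiment, estimates the MSE of the average of M independent N-particle estimates, with M = 1 playing the role of a centralised filter; because the bias of each member does not shrink with M, a small N leaves an error floor that averaging cannot remove.

```python
import math
import random


def one_step_estimate(n, y, rng):
    """Self-normalised estimate of the posterior mean y / 2 using n prior particles."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ws = [math.exp(-0.5 * (y - x) ** 2) for x in xs]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)


def ensemble_mse(m, n, runs, y=1.0, seed=0):
    """Empirical MSE of the average of m independent n-particle estimates
    (m = 1 corresponds to a single, centralised filter with n particles)."""
    rng = random.Random(seed)
    truth = y / 2.0
    sq_err = 0.0
    for _ in range(runs):
        est = sum(one_step_estimate(n, y, rng) for _ in range(m)) / m
        sq_err += (est - truth) ** 2
    return sq_err / runs


if __name__ == "__main__":
    # Same total budget K = M * N = 128 in every configuration.
    for m, n in ((1, 128), (4, 32), (64, 2)):
        print(m, n, ensemble_mse(m, n, runs=2000))
```

With the budget fixed at K = MN = 128, the configuration N = 2, M = 64 (N < M) should show a clearly larger MSE than the centralised estimator, while N = 32, M = 4 (N ≥ M) should be roughly comparable to it.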
Remark 6
While the convergence results presented here have been proved for the standard bootstrap filter, it is straightforward to extend them to other classes of PFs for which Lemmas 1 and 2 hold.
Comparison of parallelisation schemes via time–error indices
The main advantage of parallel computation is the drastic reduction of the time needed to run the PF. Let the running time for a PF with K particles be of order \({\mathcal T}(K)\), where \({\mathcal T}:\mathbb {N}\rightarrow (0,\infty)\) is some strictly increasing function of K. The quantity \({\mathcal T}(K)\) includes the time needed to generate new particles, weight them and perform resampling. The latter step is the bottleneck for parallelisation, as it requires the interaction of all K particles. Also, a ‘straightforward’ implementation of the resampling step leads to an execution time \({\mathcal T}(K)=K\log (K)\), although efficient algorithms exist that achieve linear time complexity, \({\mathcal T}(K)=K\). We can combine the MSE rate and the time complexity to define a time–error performance metric.
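The text does not specify which linear-time resampling algorithm is meant. Systematic resampling is one well-known scheme with O(K) cost (named here as an illustrative choice, not necessarily the one the authors had in mind): a single uniform draw generates K ordered positions, which are matched against the cumulative weights in one merged pass, whereas multinomial resampling implemented by sorting K uniforms or by K binary searches costs O(K log K). A minimal Python sketch:

```python
def systematic_resample(weights, u):
    """Return len(weights) ancestor indices by systematic resampling.

    `u` is a single uniform draw in [0, 1). The n ordered positions (u + i) / n
    are matched against the cumulative normalised weights in one merged pass,
    so the cost is O(n): no sorting and no binary search are needed.
    """
    n = len(weights)
    total = float(sum(weights))
    indices = []
    j = 0
    cumulative = weights[0] / total
    for i in range(n):
        position = (u + i) / n
        # Advance through the cumulative weights; j < n - 1 guards against round-off.
        while position > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j] / total
        indices.append(j)
    return indices


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    print(systematic_resample([0.1, 0.2, 0.3, 0.4], rng.random()))
```

Each particle i is replicated either \(\lfloor K w_i\rfloor\) or \(\lceil K w_i\rceil\) times; for example, with weights (0.1, 0.2, 0.3, 0.4) and u = 0.5 the ancestor indices are [1, 2, 3, 3].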
Definition 2
We define the time–error index of a particle filtering algorithm with running time of order \({\mathcal T}\) and asymptotic MSE rate \({\mathcal R}\) as \({\mathcal C} \triangleq {\mathcal T} \times {\mathcal R}.\)
The smaller the index \({\mathcal C}\) for an algorithm, the more (asymptotically) efficient its implementation. For the standard (centralised) bootstrap filter (see Algorithm 1) with K particles, the running time is of order \({\mathcal T}(K)=K\) and the MSE rate is of order \({\mathcal R}(K)=\frac {1}{K}\); hence, the time–error index becomes \({\mathcal C}_{bf} = K \times \frac{1}{K} = 1.\)
For the computation of the ensemble approximation \(\pi _{t}^{M \times N}\), we can run M independent PFs in parallel, with N=K/M particles each and no interaction among them. Hence, the execution time becomes of order \({\mathcal T}(M,N)=N\). Since the error rate for the ensemble approximation is of order \({\mathcal R}(M,N)=\frac {1}{MN}+\frac {1}{N^{2}}\), the time–error index of the ensemble approximation is \({\mathcal C}_{ens} = N\left(\frac{1}{MN}+\frac{1}{N^{2}}\right) = \frac{1}{M}+\frac{1}{N},\)
and hence it vanishes as M,N→∞. In particular, since we have to choose N≥M to ensure a rate of order \(\frac {1}{MN}\), we have \({\lim }_{M \rightarrow \infty } {\mathcal C}_{ens} = 0\). Moreover, since \({\mathcal C}_{ens} = \frac{1}{M}+\frac{1}{N}\) and \({\mathcal C}_{bf} = 1\), we have \({\mathcal C}_{ens} < {\mathcal C}_{bf}\) whenever \(\frac{1}{M}+\frac{1}{N} < 1\), e.g. for any M ≥ 2 and N ≥ 3.
In Section 3, we have described alternative ensemble approximations where M non-independent PFs are run with N particles each. The overall error rates for these methods are the same as for the standard bootstrap filter; however, the time complexity depends not only on the number of particles N allocated to each of the M subsets, but also on the subsequent interactions among subsets.
Let us consider, for example, the double bootstrap algorithm with adaptive selection of [13] (namely, Algorithm 4 in [13]). This is a scheme where

M bootstrap filters (as Algorithm 1 in this paper) are run in parallel and an aggregate weight is computed for each one of them, denoted \(W_{t}^{(m)}\);

when the coefficient of variation (CV) of these aggregate weights is greater than a given threshold, the M bootstrap filters are resampled (some filters are discarded and others are replicated using a multinomial resampling procedure).
See Section 4.2 of [13] for details. Assuming that the resampling procedure in the second step above (termed island-level resampling in [13]) is performed, on average, once every L time steps, the running time for this algorithm is
while the approximation error is \({\mathcal R}(M,N) = \frac {1}{MN}\) (see [13], Theorem 5). Hence, the time–error index for this double bootstrap algorithm is
When L ≪ N, we readily obtain that \({\mathcal C}_{ens} < {\mathcal C}_{dbf}\). For example, for a configuration with M = 10 filters and N = 100 particles each, and assuming that island-level resampling is performed every L = 20 time steps on average, \({\mathcal C}_{dbf} = 0.145\) and \({\mathcal C}_{ens} = 0.110\). Conversely, if L is large enough (namely, if L > N(M − 1)/M), the double bootstrap algorithm becomes more efficient, meaning that \({\mathcal C}_{dbf} < {\mathcal C}_{ens}\).
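The indices above can be evaluated directly. In the sketch below, \({\mathcal C}_{bf}\) and \({\mathcal C}_{ens}\) follow the cost and error models stated in the text, while for the double bootstrap algorithm we assume a running time \({\mathcal T}(M,N)=N+\frac{(M-1)N}{L}\). This cost model is our own reading, chosen because it reproduces the values quoted above (\({\mathcal C}_{dbf}=0.145\) for M = 10, N = 100, L = 20) and the break-even condition L > N(M − 1)/M; it is not taken from [13].

```python
def c_bf(k):
    """Centralised bootstrap filter: T(K) = K and R(K) = 1/K."""
    return k * (1.0 / k)


def c_ens(m, n):
    """M non-interacting filters run in parallel: T(M, N) = N, R(M, N) = 1/(MN) + 1/N^2."""
    return n * (1.0 / (m * n) + 1.0 / n ** 2)


def c_dbf(m, n, period):
    """Double bootstrap with island-level resampling every `period` (= L) steps on average.

    Assumed cost model: T(M, N) = N + (M - 1) N / L, with R(M, N) = 1/(MN).
    """
    return (n + (m - 1) * n / period) * (1.0 / (m * n))


if __name__ == "__main__":
    print(c_ens(10, 100), c_dbf(10, 100, 20))
    # Break-even island-resampling period L* = N (M - 1) / M (90 steps here).
    print(100 * (10 - 1) / 10)
```

Under this assumed model, \({\mathcal C}_{dbf} = \frac{1}{M}+\frac{M-1}{ML}\), so the comparison with \({\mathcal C}_{ens}=\frac1M+\frac1N\) reduces to comparing \(\frac{M-1}{ML}\) with \(\frac{1}{N}\).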
Computing the time–error index for practical algorithms can be hard and highly dependent on the specific implementation. Different implementations of the double bootstrap algorithm, for example, may yield different time–error indices depending on how the islandlevel resampling step is carried out.
Numerical results and discussion
Example: Lorenz 63 model
The three-dimensional Lorenz system
Let us consider the problem of tracking the state of a three-dimensional Lorenz system [31] with additive dynamical noise and partial observations [32]. To be specific, consider a three-dimensional stochastic process \(\{X(s)\}_{s\in(0,\infty)}\) (s denotes continuous time) taking values in \(\mathbb {R}^{3}\), whose dynamics are described by the system of stochastic differential equations
where \(\{W_{i}(s)\}_{s\in(0,\infty)}\), i = 1,2,3, are independent one-dimensional Wiener processes and
are static model parameters (Note 3) that yield chaotic dynamics. A discrete-time version of the latter system using Euler’s method with integration step \(T_{d}=10^{-3}\) is straightforward to obtain and yields the model
where \(\{U_{i,n}\}_{n=0,1,...}\), i = 1,2,3, are independent sequences of i.i.d. normal random variables with zero mean and unit variance. System (30)–(32) is partially observed every 100 discrete-time steps. Specifically, we collect a sequence of scalar observations \(\{Y_{t}\}_{t=1,2,...}\) of the form
where \(\{V_{t}\}_{t=1,2,...}\) is a sequence of i.i.d. normal random variables with zero mean and variance \(\sigma ^{2} = \frac {1}{2}\).
Let \(X_{n}=(X_{1,n},X_{2,n},X_{3,n}) \in \mathbb {R}^{3}\) be the state vector at discrete time n. The dynamic model given by Eqs. (30)–(32) yields the family of kernels \(\tau_{n,\theta}(dx|x_{n-1})\), and the observation model of Eq. (33) yields the likelihood function
both in a straightforward manner. The goal is to track the sequence of joint posterior probability measures \(\pi_{t}\), t=1,2,..., for \(\{ \hat X_{t} \}_{t=1,2,...}\), where \(\hat X_{t} = X_{100t}\). Note that one can draw a sample \(\hat X_{t} = \hat x_{t}\) conditional on \(\hat X_{t-1} = \hat x_{t-1}\) by successively simulating
where \(\tilde x_{100(t-1)} = \hat x_{t-1}\) and \(\hat x_{t} = \tilde x_{100t}\). The prior measure for the state variables is normal, namely \(\hat X_{0} \sim {\mathcal N}\left (x_{*},v_{0}^{2} {\mathcal I}_{3}\right),\) where \(x_{*}=(-10.2410, -1.3984, -23.6752)\) is the mean (Note 4) and \(v_{0}^{2}{\mathcal I}_{3}\) is the covariance matrix, with \(v_{0}^{2} = 10\) and \({\mathcal I}_{3}\) the three-dimensional identity matrix.
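The coarse transition from \(\hat X_{t-1}\) to \(\hat X_t\) is obtained by chaining 100 Euler steps. The Python sketch below illustrates this, but its drift is the classical textbook form of the Lorenz 63 equations with the common chaotic values s = 10, r = 28, b = 8/3, plus a single additive noise scale `noise_scale`; all of these are assumptions, since the exact coefficients of Eqs. (30)–(32) and the parameter values are not reproduced in this text and may follow a different convention. In particular, the prior mean \(x_*\) above should not be assumed to be compatible with this sketch.

```python
import math
import random

# Assumed parameters: the classical chaotic choice for the Lorenz 63 drift.
S, R, B = 10.0, 28.0, 8.0 / 3.0


def euler_step(x, td, noise_scale, rng):
    """One Euler-Maruyama step x_n -> x_{n+1} with additive Gaussian noise."""
    x1, x2, x3 = x
    drift = (S * (x2 - x1), R * x1 - x2 - x1 * x3, x1 * x2 - B * x3)
    return tuple(
        xi + td * di + math.sqrt(td) * noise_scale * rng.gauss(0.0, 1.0)
        for xi, di in zip(x, drift)
    )


def sample_transition(x_prev, rng, td=1e-3, steps=100, noise_scale=1.0):
    """Draw hat X_t given hat X_{t-1} = x_prev by chaining `steps` Euler steps."""
    x = tuple(x_prev)
    for _ in range(steps):
        x = euler_step(x, td, noise_scale, rng)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_transition((1.0, 1.0, 1.0), rng))
```

Only forward simulation is needed here: as in a bootstrap filter, the coarse kernel is sampled but never evaluated.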
Simulation setup
We aim to illustrate the gain in relative performance, taking into account both estimation errors and running time, that can be attained using ensembles of independent PFs. To this end, we have applied

the standard bootstrap filter (Algorithm 1), termed BF in the sequel, and

the ensemble of non-interacting bootstrap filters (NIBFs) that we have investigated in Section 4
to track the sequence of probability measures \(\pi_{t}\) generated by the three-dimensional Lorenz model described in Section 5.1.1. We have generated a sequence of 200 synthetic observations, \(\{y_{t}; t=1,...,200\}\), spread over an interval of 20 continuous-time units, corresponding to \(2\times10^{4}\) discrete time steps in the Euler scheme (hence, one observation every 100 steps).
The ensemble of NIBFs consists of M filters with N particles each, while the standard BF runs with K particles, where K=MN for a fair comparison.
We have coded the two algorithms in Matlab (version 7.11.0.584 [R2010b] with the parallel computing toolbox) and run the experiments on a pool of identical multiprocessor machines, each with 8 cores at 3.16 GHz and 32 GB of RAM. The standard (centralised) BF is run with K=NM particles on a single core. For the ensemble of NIBFs, we allow the parallel computing toolbox to allocate all available cores per server in order to run all BFs concurrently.
To assess the approximation errors, we have computed empirical MSEs for the approximation of the posterior mean, \(E[\hat X_{t} | Y_{1:t}] = (I,\pi _{t})\), where I(x) = x is the identity function, for the two algorithms at the last update step, t = 200. Note, however, that the integral \((I,\pi_{t})\) cannot be computed in closed form for this system. Therefore, we have used the ‘expensive’ estimate
computed via the standard BF, as a proxy of the true value.
Numerical results
Figure 1 displays the empirical MSE, averaged over 100 independent simulation runs, attained by the parallel scheme when the number of filters is fixed, M=20, and the number of particles per filter (particle island) ranges from N=100 to N=1000. The outcome of the centralised BF with K = MN particles, hence ranging from K=20×100 to K=20×1000, is also shown for comparison. We observe that the proposed ensemble of NIBFs performs poorly when the number of particles per filter, N, is relatively low (N = 100), while for moderate values (N ≥ 400) it nearly matches the MSE of the centralised BF.
Next, we look into the relationship between the MSE and the running time for the two algorithms. With the number of filters fixed at M=20, we have run 100 independent simulation trials for each value N=100, 200, 400, 800 and 1000 and computed the empirical MSE and the average running time for the parallel scheme and each combination of M and N. Correspondingly, we have also run the centralised BF with K = MN particles, hence for \(K=2\times10^{3}, 4\times10^{3}, 8\times10^{3}, 16\times10^{3}\) and \(20\times10^{3}\).
Figure 2 displays the resulting empirical MSE versus the running time for the two methods. If we regard an algorithm as more efficient than another when it attains a lower MSE in the same amount of time, then this set of simulations shows that the independent ensemble scheme is more efficient than the centralised BF. Indeed, a close look at Fig. 2 reveals that the ensemble of M=20 NIBFs with N=1000 particles per filter achieves an empirical MSE of \(\approx 6\times10^{-4}\) with a running time of ≈ 2.9 s, while the centralised BF attains the same performance with K = 20 × 800 particles and a running time of ≈ 27.2 s (as shown by the dashed horizontal line in the plot).
Example: Lorenz 96 model
The J-dimensional Lorenz 96 system
The Lorenz 96 model is a deterministic system of nonlinear differential equations that displays chaotic dynamics [33,34]. The system dimension, i.e. the number of dynamic variables, can be scaled arbitrarily. A stochastic version of the model can be easily obtained by converting each differential equation into a stochastic differential equation driven by an independent and additive Wiener process. In particular, a model with J variables, \(Z_{j}\), j=0,…,J−1, can be written down as the system of stochastic differential equations
where F = 8 is a constant forcing parameter (Note 5), the Wiener processes \(\{W_{j}(s)\}_{s\ge0}\), j=0,…,J−1, are assumed independent and the scale parameter σ is known.
A straightforward application of the Euler–Maruyama integration method yields a discrete-time version of the stochastic Lorenz 96 model. If we let \(T_{d} > 0\) denote the discretisation period and n denote discrete time, then we readily obtain
where j=0,…,J − 1 and \(\{U_{j,n}\}_{0\le j\le J-1,\, n\ge0}\) are independent and identically distributed (i.i.d.) standard Gaussian r.v.’s.
We assume that observations can only be collected from this system once every \(n_{0}\) discrete time steps. Moreover, only the variables with even indices (j = 0,2,4,…,J−2, for even J) are measured. Therefore, the observation process has the form
where t=1,2,... and {V_{ t }}_{t≥1} is a sequence of i.i.d. r.v.’s with common pdf \({\mathcal N}\left (v_{t}; 0, \sigma _{y}^{2} {\mathcal I}_{\frac {J}{2}}\right)\), which denotes a \(\frac {J}{2}\)dimensional Gaussian distribution with 0 mean and covariance matrix \(\sigma _{y}^{2} {\mathcal I}_{\frac {J}{2}}\).
Equations (35) and (36) describe a state-space model that can be expressed in terms of the general notation in Section 2. The state vector at discrete time n is \(\tilde X_{n}=\left [Z_{0,n}, \ldots, Z_{J-1,n} \right ]^{\top }\) and the transition kernel from time n−1 to time n has the Gaussian density \({\mathcal N}\left(\tilde x_{n}; \Psi(\tilde x_{n-1}), \sigma_{x}^{2}{\mathcal I}_{J}\right),\)
where \({\mathcal {N}}(x; \mu, \Sigma)\) is the Gaussian density with argument x, mean μ and covariance matrix Σ, \(\sigma _{x}^{2} = T_{d} \sigma ^{2}\) and \(\Psi : \mathbb {R}^{J} \rightarrow \mathbb {R}^{J}\) is the deterministic transformation that accounts for all the terms on the right-hand side of (35) except the noise contribution \(\sqrt {T_{d}}\sigma U_{j,n}\). Since we only collect observations every \(n_{0}T_{d}\) continuous-time units, we need to put the dynamics of the states on the same time scale as the observation process \(\{Y_{t}\}_{t\ge1}\) in Eq. (36). If we define \(X_{t} = \tilde X_{n_{0}t}\), then the transition kernel from \(X_{t-1}\) to \(X_{t}\) follows readily from (37),
While \(\tau_{t}(x_{t}|x_{t-1})\) cannot be evaluated in closed form, it is straightforward to draw a sample from \(X_{t}|X_{t-1}=x_{t-1}\) by simply iterating Eq. (35) \(n_{0}\) times, with starting point \(x_{t-1}\). The likelihood function is
Simulation setup
We have run 100 independent simulations of the discretised Lorenz 96 model described in Section 5.2.1 above over 20 continuous-time units, with integration step \(T_{d}=2\times10^{-4}\) (which amounts to \(10^{5}\) discrete-time steps), and, for each simulation, we have obtained noisy observations, with \(\sigma _{y}^{2}=\frac {1}{2}\) and \(n_{0}=10\), according to Eq. (36). The noise-scale parameter σ in the state Eq. (35) is set as \(\sigma =\frac {1}{\sqrt {2}}\), so that the noise variance becomes \(\sigma _{x}^{2}=\frac {T_{d}}{2}\).
The computer experiments are similar to those in Section 5.1. For each simulation, we have run M = 10 NIBFs with N particles each versus a centralised BF with K = 10N particles and used them to compute one-step-ahead predictions of the observations. In particular, at time t, we have computed predictions of the observation vector \(y_{t}\), using the measures
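The sampling step described above can be sketched in Python as follows, using the settings of this section as defaults (\(T_d = 2\times10^{-4}\), \(n_0 = 10\), \(\sigma = 1/\sqrt{2}\), \(\sigma_y^2 = 1/2\), F = 8). Two assumptions should be noted: the drift is the standard Lorenz 96 form \((Z_{j+1}-Z_{j-2})Z_{j-1}-Z_j+F\) with cyclic indices, since Eq. (34) is not reproduced in this text, and the log-likelihood assumes that \(y_t\) is the vector of even-indexed state variables plus Gaussian noise, which is consistent with the description of Eq. (36) but not copied from it. Function names are hypothetical.

```python
import math
import random


def psi(z, td, forcing=8.0):
    """Assumed Euler map: z_j + td * ((z_{j+1} - z_{j-2}) z_{j-1} - z_j + F), cyclic indices."""
    dim = len(z)
    # Python's negative indices make z[j - 2] and z[j - 1] wrap around for j = 0, 1.
    return [
        z[j] + td * ((z[(j + 1) % dim] - z[j - 2]) * z[j - 1] - z[j] + forcing)
        for j in range(dim)
    ]


def sample_transition(x_prev, rng, td=2e-4, n0=10, sigma=1.0 / math.sqrt(2.0), forcing=8.0):
    """Draw X_t given X_{t-1} = x_prev by applying the noisy Euler step n0 times."""
    z = list(x_prev)
    for _ in range(n0):
        z = [m + math.sqrt(td) * sigma * rng.gauss(0.0, 1.0) for m in psi(z, td, forcing)]
    return z


def log_likelihood(y, x, sigma_y2=0.5):
    """log g_t^{y_t}(x), assuming y_t = (x_0, x_2, x_4, ...) + N(0, sigma_y2 I)."""
    observed = x[0::2]  # even-indexed state variables
    sq = sum((yr - xr) ** 2 for yr, xr in zip(y, observed))
    return -0.5 * sq / sigma_y2 - 0.5 * len(observed) * math.log(2.0 * math.pi * sigma_y2)


if __name__ == "__main__":
    rng = random.Random(0)
    x = sample_transition([8.0] * 20, rng)
    print(log_likelihood(x[0::2], x))
```

Note that the constant state \(Z_j = F\) for all j is a fixed point of this drift, which gives a simple sanity check of `psi`.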
for the centralised BF and the NIBFs, respectively. To be specific, if \(y_{t}=\left (y_{0,t}, y_{1,t}, \ldots, y_{\frac {J}{2}-1,t}\right)\), we have computed estimates
and then we have averaged the quadratic errors \(\left (y_{r,t} - y_{r,t}^{K} \right)^{2}\) and \(\left (y_{r,t} - y_{r,t}^{M\times N} \right)^{2}\) over r, t and 100 independent simulation runs. Finally, we have normalised the resulting empirical MSE with respect to the observation power \(\frac {2}{J}\sum _{r=0}^{\frac {J}{2}-1} E\left [y_{r,t}^{2}\right ]\). Note that, in this case, we have used the actual observations generated in the simulations to obtain the errors, instead of the proxy values in Section 5.1.
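A minimal Python sketch of the normalised error metric for one simulation run follows. It assumes that the expectation in the observation power is replaced by an empirical average over the components r and the time steps t, and that the further averaging over the 100 independent runs is done outside the function; the function name is hypothetical.

```python
def normalised_mse(observations, predictions):
    """Average (y_{r,t} - hat y_{r,t})^2 over all r and t, divided by the average
    observation power. Inputs are lists (over t) of equal-length lists (over r)."""
    sq_err = [
        (y - p) ** 2
        for ys, ps in zip(observations, predictions)
        for y, p in zip(ys, ps)
    ]
    power = [y ** 2 for ys in observations for y in ys]
    return (sum(sq_err) / len(sq_err)) / (sum(power) / len(power))


if __name__ == "__main__":
    obs = [[1.0, 2.0], [3.0, 4.0]]
    pred = [[1.5, 2.5], [3.5, 4.5]]
    print(normalised_mse(obs, pred))
```

In the usage example every prediction is off by 0.5, so the mean squared error is 0.25 and the observation power is 7.5, giving a normalised MSE of 1/30.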
The experiments have been carried out for a Lorenz 96 model with J=20 variables first and then for the same model with J=50 variables.
The simulations have been coded using Matlab version R2016b (64 bits), with the parallel computing toolbox enabled, on an 8-core Intel(R) Xeon(R) CPU E5-2680 v2 server, with clock frequency 2.80 GHz and 64 GB of RAM. All the results reported are averaged over 100 independent simulation runs as described above.
Numerical results
Figure 3 plots the normalised MSE attained by the centralised BF and the ensemble of M=10 NIBFs versus N. The centralised BF is run with K=10N particles, while each one of the M=10 NIBFs is run with N particles. The figure shows results for two different state-space dimensions. The solid lines correspond to a stochastic Lorenz 96 model with J=20 variables. In this case, the outcome of the simulations is similar to the experiments with the Lorenz 63 system: for K=MN, the centralised BF attains a smaller MSE than the NIBFs, with the gap closing as N increases. The result of the experiment is different when the state-space dimension is increased to J=50. In this case, the normalised MSE of the NIBFs is slightly smaller than the error of the centralised BF for N<1600, with both estimators attaining the same performance for N=1600. Hence, for this example, the averaging of the NIBFs has a beneficial effect on the accuracy of the estimators, at least for certain combinations of the number of particles N and the dimension J. We have verified that the bias of the centralised estimator \(y_{t}^{K}\) is smaller than the bias of the estimator \(y_{t}^{M\times N}\), as predicted by the theoretical analysis, while \(y_{t}^{M\times N}\) attains a smaller empirical variance than \(y_{t}^{K}\) (at least for N<1600 and 40≤J≤100).
Figure 4 displays the results of the same computer experiment as in Fig. 3, except that instead of averaging the MSE over the 100 independent simulation runs, we display the maximum MSE, both for the centralised BF and the ensembles of NIBFs, out of the 100 simulations for each value of N. We observe that the ensemble of NIBFs is more robust than the centralised BF. While for dimension J = 20 the centralised BF attains a clearly lower average MSE than the NIBFs, the maximum MSE turns out to be similar for both algorithms. For dimension J = 50, the average MSE of the NIBFs is already lower than that of the BF (as shown in Fig. 3), and the advantage of the parallelised algorithm increases when we look at the maximum MSE.
Figure 5 plots the same normalised MSE values of Fig. 3 versus the running times of the algorithms, given in seconds, for a complete simulation with \(10^{5}\) discrete time steps. As in the experiments of Section 5.1.3, the NIBFs can attain the same MSE as the centralised BF in just a fraction of the running time. While the improvement can ideally be a factor of M (with M=10 in this case), in practice it depends on the efficiency of the computing software. With the version of Matlab (R2016b, with the parallel computing toolbox) and the 8-core Intel Xeon processor used in these experiments, the running time of the centralised BF with K=10N particles was reduced by a modest factor of 2.6 for J=20 when using M=10 parallel NIBFs with N particles each. For J=50, however, the running time was reduced by a factor of 6.6. The difference is due to the ability of the Matlab software to parallelise more efficiently when handling larger vectors. From this figure, we observe that, for J=50, the NIBFs attain the same minimum error as the centralised BF (a normalised MSE of ≈ 0.0138) with a running time that is 6.6 times smaller (464 versus 3,082 s).
Conclusions
We have presented a survey of methods for the parallelisation of particle filters. Specifically, we have described the basic parallelisation scheme based on ensembles of statistically independent PFs and then discussed three alternatives which introduce different degrees of interaction among the concurrently running filters. We have placed emphasis on the theoretical guarantees of the algorithms, and, hence, we have stated conditions for the convergence of all the techniques, including the DRNA-based PF of [2], the particle island model of [13] and the α-SMC method of [12].
In the second half of the paper, we have focused on the theoretical properties of the ensemble of non-interacting PFs. For this method, we have shown, both numerically and through the definition of time–error indices, that the averaging of statistically independent PFs should be preferred when N, the number of particles per independent filter, can be made sufficiently large to reduce the bias. This is often the case when using many-core computers (or computing clusters). When parallelisation is implemented using many low-power devices (such as GPUs), parallelisation with interaction is more efficient. Our numerical experiments for the stochastic Lorenz 96 model also show that the averaging of independent estimators can lead to lower estimation errors, compared to a centralised bootstrap filter with the same number of particles, as the dimension of the state space is increased.
Appendix 1
Proof of Lemma 2
We proceed by induction on the time index t. For t = 0, \(\rho_{0}=\tau_{0}=\pi_{0}\) and, since \(x_{0}^{(i)}\), i=1,...,N, are drawn from \(\pi_{0}\), the equality \(E\left [\left (f,\rho _{0}^{N}\right)\right ] = \left (f,\rho _{0}\right)\) is straightforward.
Let us assume that \(E\left[\left(f,\rho_{t-1}^{N}\right)\right] = \left(f,\rho_{t-1}\right)\)
for some t>0 and any \(f \in B({\mathcal X})\). If we use \(\bar {\mathcal F}_{t}\) to denote the σ-algebra generated by the set of random variables \(\left \{ x_{0:t-1}^{(i)}, \bar x_{1:t}^{(i)} : 1 \le i \le N \right \}\), then we readily find that
since \(G_{t}^{N}\) is measurable w.r.t. \(\bar {\mathcal F}_{t}\) and \(E\left [\left (f,\pi _{t}^{N}\right) \mid \bar {\mathcal F}_{t}\right ] = \left (f,\bar \pi _{t}^{N}\right)\). Moreover, if we recall that
then it is apparent from the definition of \(G_{t}^{N}\) in (9) that
Taking together (40) and (41), we have
Let \({\mathcal F}_{t-1}\) be the σ-algebra generated by the set of variables \(\left \{ x_{0:t-1}^{(i)}, \bar x_{0:t-1}^{(i)} : 1 \le i \le N \right \}\). Since \({\mathcal F}_{t-1} \subseteq \bar {\mathcal F}_{t}\), Eq. (42) yields
since \(G_{t-1}^{N}\) is measurable w.r.t. \({\mathcal F}_{t-1}\). Moreover, for any \(h \in B({\mathcal X})\), it is straightforward to show that
hence, as \(f g_{t}^{y_{t}} \in B({\mathcal X})\), we readily obtain
Substituting (44) into (43), we arrive at
where (45) follows from the definition of the estimate of \(\rho_{t-1}\), namely \(\rho _{t-1}^{N} = G_{t-1}^{N}\pi _{t-1}^{N}\). If we take unconditional expectations on both sides of Eq. (45), we obtain
where equality (46) follows from the induction hypothesis (39), (47) is obtained by simply reordering (46) and Eq. (48) follows from the recursive definition of ρ_{ t } in (5).
Appendix 2
Proof of Lemma 3
For t=0, \(\rho _{0}^{N} = \pi _{0}^{N}\), hence the result follows from Lemma 1. At any time t>0, since \(\rho _{t}^{N} = G_{t}^{N} \pi _{t}^{N}\), we readily have
where \(Z_{t}^{(i)} = G_{t}^{N} f\left (x_{t}^{(i)}\right) - (f,\rho _{t})\), i=1,...,N. It is apparent that the random variables \(Z_{t}^{(i)}\), i=1,...,N, are conditionally independent given the σ-algebra \(\bar {\mathcal F}_{t}\) generated by the set \(\left \{ x_{0:t-1}^{(j)}, \bar x_{0:t}^{(j)} : 1 \le j \le N \right \}\). It can also be proved that every \(Z_{t}^{(i)}\) is centred and bounded, as explicitly shown in the sequel.
To see that \(Z_{t}^{(i)}\) has zero mean, let us note first that
since \(G_{t}^{N}\) is measurable w.r.t. \(\bar {\mathcal F}_{t}\). Moreover, by the same argument as in the proof of Lemma 2, one can show that \(G_{t}^{N}\left (f,\bar \pi _{t}^{N}\right) = G_{t-1}^{N} \left (fg_{t}^{y_{t}},\xi _{t}^{N}\right)\) and, therefore,
where we have used the fact that, for any \(h \in B({\mathcal X})\), \(E\left [ \left (h,\xi _{t}^{N}\right) \mid {\mathcal F}_{t-1} \right ] = \left ((h,\tau _{t}),\pi _{t-1}^{N}\right)\). However, since \(\rho _{t-1}^{N} = G_{t-1}^{N} \pi _{t-1}^{N}\), Eq. (50) amounts to
and taking (unconditional) expectations on both sides of the equation above yields
where (51) follows from Lemma 2 (i.e. \(\rho _{t-1}^{N}\) is unbiased) and (52) is a straightforward consequence of the definition of \(\rho_{t}\) in (5). Eq. (52) states that \(E\left [ Z_{t}^{(i)} \right ] = E\left [ G_{t}^{N} f\left (x_{t}^{(i)}\right) - \left (f,\rho _{t}\right) \right ] = 0\).
To see that (every) \(Z_{t}^{(i)}\) is bounded, note that
whereas
Taking (53) and (54) together, we arrive at
which is finite for any finite t (indeed, for every t ≤ T).
Since the variables \(Z_{t}^{(i)}\), i = 1,...,N, in (49) are bounded, with zero mean and conditionally independent given \(\bar {\mathcal F}_{t}\), it is not difficult to show (see, e.g. [23] (Lemma A.1)) that
where the constant \(\breve c_{t}\) is finite and independent of N. From (56), we easily obtain the inequality (16) in the statement of Lemma 3, with \(\tilde c_{t} = 2 \breve c_{t} \|f\|_{\infty } \prod _{k=1}^{t} \|g_{k}^{y_{k}}\|_{\infty } < \infty \) for any t≤T<∞.
Notes
 1.
Note that \(G^{N}_{t}\) is an estimate of the normalising constant for π_{ t } (namely, the integral (1,ρ_{ t })) which can be shown to be unbiased under mild assumptions [4]. In Bayesian model selection, this constant is termed ‘model evidence’, while in parameter estimation problems, it is often referred to as the likelihood (of the unknown parameters) [27].
 2.
Other particle filtering algorithms can be applied in a straightforward way; however, we assume bootstrap filters (i.e. the procedure of Algorithm 1) for the sake of clarity and notational simplicity.
 3.
Note the difference in notation between the continuous time s and the parameter s.
 4.
Chosen from a typical trajectory of the deterministic Lorenz 63 model.
 5.
The deterministic Lorenz 96 system is chaotic for F > 6, with increasing turbulence of the chaotic flow as F is made larger.
References
 1
G Hendeby, R Karlsson, F Gustafsson, Particle filtering: the need for speed. EURASIP J. Adv. Sig. Process. 2010, 22 (2010).
 2
M Bolić, PM Djurić, S Hong, Resampling algorithms and architectures for distributed particle filters. IEEE Trans. Sig. Process. 53(7), 2442–2450 (2005).
 3
A Doucet, N de Freitas, N Gordon, Sequential Monte Carlo Methods in Practice (Springer, New York, 2001).
 4
P Del Moral, Feynman–Kac Formulae: Genealogical and Interacting Particle Systems with Applications (Springer-Verlag, New York, 2004).
 5
O Cappé, SJ Godsill, E Moulines, An overview of existing methods and recent advances in sequential Monte Carlo. Proc. IEEE. 95(5), 899–924 (2007).
 6
A Bain, D Crisan, Fundamentals of Stochastic Filtering (Springer-Verlag, New York, 2008).
 7
A Gelencsér-Horváth, G Tornai, A Horváth, G Cserey, Fast, parallel implementation of particle filtering on the GPU architecture. EURASIP J. Adv. Sig. Process. 2013(1), 1–16 (2013).
 8
J Míguez, Analysis of selection methods for cost-reference particle filtering with applications to maneuvering target tracking and dynamic optimization. Digit. Sig. Process. 17, 787–807 (2007).
 9
O Hlinka, O Sluciak, F Hlawatsch, PM Djurić, M Rupp, Likelihood consensus and its application to distributed particle filtering. IEEE Trans. Sig. Process. 60(8), 4334–4349 (2012).
 10
J Míguez, MA Vázquez, A proof of uniform convergence over time for a distributed particle filter. Sig. Process. 122, 152–163 (2016).
 11
K Heine, N Whiteley, Fluctuations, stability and instability of a distributed particle filter with local exchange. Stoch. Process. Appl. 127(8), 2508–2541 (2017).
 12
N Whiteley, A Lee, K Heine, On the role of interaction in sequential Monte Carlo algorithms. Bernoulli. 22(1), 494–529 (2016).
 13
C Vergé, C Dubarry, P Del Moral, E Moulines, On parallel implementation of sequential Monte Carlo methods: the island particle model. Stat. Comput. 25(2), 243–260 (2015).
 14
P Del Moral, E Moulines, J Olsson, C Vergé, Convergence properties of weighted particle islands with application to the double bootstrap algorithm. Stoch. Syst. 6(2), 367–419 (2016).
 15
W Han, On the Numerical Solution of the Filtering Problem (Ph.D. Thesis. Department of Mathematics, Imperial College London, 2013).
 16
N Gordon, D Salmond, AFM Smith, Novel approach to nonlinear and non-Gaussian Bayesian state estimation. IEE Proc. F 140(2), 107–113 (1993).
 17
A Doucet, N de Freitas, N Gordon, in Sequential Monte Carlo Methods in Practice, ed. by A Doucet, N de Freitas, and N Gordon. An introduction to sequential Monte Carlo methods (Springer-Verlag, New York, 2001), pp. 4–14, chapter 1.
 18
A Doucet, S Godsill, C Andrieu, On sequential Monte Carlo sampling methods for Bayesian filtering. Stat. Comput. 10(3), 197–208 (2000).
 19
R Douc, O Cappé, in Image and Signal Processing and Analysis, 2005. ISPA 2005. Proceedings of the 4th International Symposium on. Comparison of resampling schemes for particle filtering (IEEE, 2005).
 20
D Crisan, A Doucet, A survey of convergence results on particle filtering. IEEE Trans. Sig. Process. 50(3), 736–746 (2002).
 21
N Chopin, A sequential particle filter method for static models. Biometrika. 89(3), 539–552 (2002).
 22
XL Hu, TB Schön, L Ljung, A basic convergence result for particle filtering. IEEE Trans. Sig. Process. 56(4), 1337–1348 (2008).
 23
D Crisan, J Míguez, Particle-kernel estimation of the filter density in state-space models. Bernoulli. 20(4), 1879–1929 (2014).
 24
J Míguez, D Crisan, PM Djurić, On the convergence of two sequential Monte Carlo methods for maximum a posteriori sequence estimation and stochastic global optimization. Stat. Comput. 23(1), 91–107 (2013).
 25
P Del Moral, A Guionnet, On the stability of interacting processes with applications to filtering and genetic algorithms. Ann. l’Institut Henri Poincaré, (B) Probab. Stat. 37(2), 155–194 (2001).
 26
J Míguez, in IEEE 8th Sensor Array and Multichannel Sig. Process. Workshop (SAM). On the uniform asymptotic convergence of a distributed particle filter (IEEE, 2014), pp. 241–244.
 27
C Andrieu, A Doucet, R Holenstein, Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. B. 72, 269–342 (2010).
 28
N Chopin, PE Jacob, O Papaspiliopoulos, SMC2: an efficient algorithm for sequential analysis of state space models. J. R. Stat. Soc. Ser. B Stat. Methodol. 75(3), 397–426 (2013).
 29
J Míguez, IP Mariño, MA Vázquez, Analysis of a nonlinear importance sampling scheme for Bayesian parameter estimation in state-space models. Sig. Process. 142, 281–291 (2018).
 30
J Olsson, O Cappé, R Douc, E Moulines, Sequential Monte Carlo smoothing with application to parameter estimation in nonlinear state space models. Bernoulli. 14(1), 155–179 (2008).
 31
EN Lorenz, Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–141 (1963).
 32
AJ Chorin, P Krause, Dimensional reduction for a Bayesian filter. PNAS. 101(42), 15013–15017 (2004).
 33
EN Lorenz, in Proceedings of the Seminar on Predictability, vol. 1. Predictability: a problem partly solved (European Centre for Medium-Range Weather Forecasts, Reading, UK, 1996).
 34
J Hakkarainen, A Ilin, A Solonen, M Laine, H Haario, J Tamminen, E Oja, H Järvinen, On closure parameter estimation in chaotic systems. Nonlinear Proc. Geoph. 19(1), 127–143 (2012).
Acknowledgements
The authors thank Dr. Katrin Achutegui for her valuable assistance in obtaining and plotting the numerical results in Section 4.
Funding
This work was partially supported by Ministerio de Economía y Competitividad of Spain (TEC2012-38883-C02-01 COMPREHENSION and TEC2015-69868-C2-1-R ADVENTURE) and the Office of Naval Research Global (N62909-15-1-2011). D. C. and J. M. acknowledge the support of the Isaac Newton Institute through the program Monte Carlo Inference for High-Dimensional Statistical Models.
Author information
Contributions
DC and JM carried out the analysis and obtained the theoretical results. JM and GRM coded the algorithms and ran the computer experiments. All authors collaborated in the composition of the manuscript. The authors are listed in alphabetical order. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Joaquín Míguez.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Crisan, D., Míguez, J. & Ríos-Muñoz, G. On the performance of parallelisation schemes for particle filtering. EURASIP J. Adv. Signal Process. 2018, 31 (2018) doi:10.1186/s13634-018-0552-x
Keywords
 Particle filtering
 Parallelisation
 Convergence analysis
 Particle islands
 Lorenz 63
 Lorenz 96