

Open Access

On the performance of parallelisation schemes for particle filtering

EURASIP Journal on Advances in Signal Processing 2018, 2018:31

Received: 2 March 2017

Accepted: 1 May 2018

Published: 25 May 2018


Considerable effort has been recently devoted to the design of schemes for the parallel implementation of sequential Monte Carlo (SMC) methods for dynamical systems, also widely known as particle filters (PFs). In this paper, we present a brief survey of recent techniques, with an emphasis on the availability of analytical results regarding their performance. Most parallelisation methods can be interpreted as running an ensemble of lower-cost PFs, and the differences between schemes depend on the degree of interaction among the members of the ensemble. We also provide some insights on the use of the simplest scheme for the parallelisation of SMC methods, which consists in splitting the computational budget into M non-interacting PFs with N particles each and then obtaining the desired estimators by averaging over the M independent outcomes of the filters. This approach minimises the parallelisation overhead yet still displays desirable theoretical properties. We analyse the mean square error (MSE) of estimators of moments of the optimal filtering distribution and show the effect of the parallelisation scheme on the approximation error rates. Following these results, we propose a time–error index to compare schemes with different degrees of parallelisation. Finally, we provide two numerical examples involving stochastic versions of the Lorenz 63 and Lorenz 96 systems. In both cases, we show that the ensemble of non-interacting PFs can attain the approximation accuracy of a centralised PF (with the same total number of particles) in just a fraction of its running time using a standard multicore computer.


Particle filtering, Parallelisation, Convergence analysis, Particle islands, Lorenz 63, Lorenz 96

1 Introduction

Over the past decade, there has been a continued interest in the design of schemes for the implementation of particle filtering algorithms using parallel or distributed hardware of various types, including general purpose devices such as multi-core CPUs or graphics processing units (GPUs) [1] and application-tailored devices such as field-programmable gate arrays (FPGAs) [2]. A particle filter (PF) is a recursive algorithm for the approximation of the sequence of posterior probability distributions that arise from a stochastic dynamical system in state-space form (see, e.g. [3–6] and references therein for a general view of the field). A typical PF includes three steps that are repeated sequentially:
  • Monte Carlo sampling in the space of the state variables,

  • Computation of weights for the generated samples and, finally,

  • Resampling according to the weights.

While at first sight the algorithm may look straightforward to parallelise (sampling and weighting can be carried out concurrently without any constraint), the resampling step involves the interaction of the whole set of Monte Carlo samples. Several authors have proposed schemes for ‘splitting’ the resampling step into simpler tasks that can be carried out concurrently. The approaches are diverse and range from the heuristic [7–9] to the mathematically well-principled [2, 10–14]. However, the former are largely based on (often loose) approximations that prevent the claim of any rigorous guarantees of convergence, whereas the latter involve non-negligible overhead to ensure the proper interaction of particles.

The goal of this paper is to provide
  • A survey of recently proposed, and mathematically well grounded, parallelisation schemes for particle filtering and

  • Analytical insights into the performance of the simplest parallelisation method, namely the averaging of statistically independent PFs.

Besides describing the various methodologies, we aim at characterising their performance analytically whenever possible. For that purpose, we need to introduce accurate notation, which is unfortunately a bit more involved than needed for the mere description of the algorithmic steps. Then, we describe and provide a basic convergence result for the standard PF and proceed to describe four different approaches to its parallelisation: the simple averaging of statistically independent (i.e. non-interacting) low-complexity PFs, the method based on the distributed resampling with non-proportional allocation (DRNA) procedure of [2, 10, 11], the particle island model of [13, 14] and the adaptive interaction scheme termed α-sequential Monte Carlo (α-SMC) in [12].

The simplest parallelisation scheme consists in running M statistically independent PFs with N particles (i.e. Monte Carlo samples) each and then averaging the M independent estimators. This approach has the limitation that the bias of the averaged estimator depends only on N. Hence, if N is relatively small, the bias is large even if we use a very high number M of parallel filters. This drawback can be overcome by allowing some degree of interaction among the M concurrently running PFs. The DRNA, particle island and α-SMC approaches introduce this interaction in different flavours. In DRNA-based ensembles of PFs, each filter runs separately but it periodically exchanges a few particles with other members of the ensemble using a communication network [10]. Algorithms in the particle island class rely on two levels of resampling: conventional resampling at particle level and island-level resampling, where complete sets of particles (associated to parallel-running PFs) are replicated or eliminated stochastically [13, 14]. Finally, the α-SMC scheme of [12] is a very flexible methodology that enables the adaptive selection of different interaction patterns (i.e. which particles are resampled together) over time. For each one of these techniques, we describe the methodology and establish basic theoretical guarantees for convergence.

In the second part of the paper, we focus on the analysis of the performance of the simplest parallelisation scheme, the averaging of M statistically independent PFs with N particles each. Under mild assumptions, we analyse the mean square error (MSE) of the estimators of one-dimensional statistics of the optimal filtering distribution and show explicitly the effect of the parallelisation scheme on the convergence rate. Specifically, we study the decomposition of the MSE into variance and bias components, to show that the variance is \(O\left (\frac {1}{MN}\right)\), i.e. it decreases linearly with the total number of particles, while the squared bias is \(O\left (\frac {1}{N^{2}}\right)\), i.e. the bias contribution to the MSE goes to 0 quadratically with N. These results have already been obtained, e.g. in [13] using the Feynman-Kac framework of [4]. Here, we aim at providing a self-contained analysis that illustrates the key theoretical issues in the convergence of parallel PFs. All proofs are constructed from elementary principles, and we obtain explicit error rates (for the bias, the variance and the MSE) that hold for all M and N, while the theorems in [13] are strictly asymptotic. While we have focused here on PFs for discrete-time state-space models, the analysis can be similarly done for continuous-time systems, and, indeed, the basic results needed for that case can be found in [15]. Finally, in order to compare different parallelisation schemes, we introduce a time–error index that combines time complexity (asymptotic order of the running time) and estimation accuracy (asymptotic error rates) into a single quantitative figure of merit that can be used to compare schemes with different degrees of interaction.

The rest of the paper is organised as follows. In Section 2, we present basic background material, and notation, for the analysis of PFs. Section 3 is devoted to a survey of parallelisation schemes for particle filtering. Our analysis of the ensemble of non-interacting PFs is presented in Section 4. In Section 5, we present numerical results for two examples, namely the filtering of stochastic versions of the Lorenz 63 and Lorenz 96 systems, respectively. The latter is often used as a simplified model of atmospheric dynamics, and it has the property that it can be scaled to an arbitrary dimension. Our simulation results show that the use of averaged estimators computed from ensembles of non-interacting filters can be advantageous in terms of accuracy (not only running times) as the system dimension grows. Finally, Section 6 is devoted to a discussion of the obtained results, together with some concluding remarks.

2 Background

2.1 Notation and preliminaries

We first introduce some common notations to be used through the paper, broadly classified by topics. Below, \(\mathbb {R}\) denotes the real line, while for an integer d ≥ 1, \(\mathbb {R}^{d}~=~\overbrace {\mathbb {R} \times \ldots \times \mathbb {R}}^{d \text {{\tiny times}}}\).
  • Functions.
    • The supremum norm of a real function \(f:\mathbb {R}^{d} \rightarrow \mathbb {R}\) is denoted as \(\| f \|_{\infty } ~=~ \sup _{x\in \mathbb {R}^{d}} | f(x) |\).

    • B(S) is the set of bounded real functions over \(S \subseteq \mathbb {R}^{d}\), i.e. \(f \in B(S)\) if, and only if, f has domain S and \(\| f \|_{\infty } < \infty \).

  • Measures and integrals. Let \(S \subseteq \mathbb {R}^{d}\) be a subset of \(\mathbb {R}^{d}\).
    • \({\mathcal B}(S)\) is the σ-algebra of Borel subsets of S.

    • \({\mathcal P}(S)\) is the set of probability measures over the measurable space \(({\mathcal B}(S),S)\).

    • \((f,\mu) \triangleq \int f(x) \mu (dx)\) is the integral of a real function \(f:S \rightarrow \mathbb {R}\) with respect to (w.r.t.) a measure \(\mu \in {\mathcal P}(S)\).

    • Given a probability measure \(\mu \in {\mathcal P}(S)\), a Borel set \(A \in {\mathcal B}(S)\) and the indicator function
      $$ I_{A}(x) = \left\{ \begin{array}{ll} 1, &\text{if}\ x \in A\\ 0, &\text{otherwise} \end{array} \right., $$
      \(\mu (A) = (I_{A},\mu)\) is the probability of A.
  • Sequences, vectors and random variables (r.v.’s).
    • We use a subscript notation for sequences, namely \(x_{t_{1}:t_{2}} \triangleq \left \{ x_{t_{1}}, \ldots, x_{t_{2}} \right \}\).

    • For an element \(x~=~\left (x_{1},\ldots,x_{d}\right) \in \mathbb {R}^{d}\) of a Euclidean space, its norm is denoted as \(\| x \|~=~\sqrt {x_{1}^{2}+\ldots +x_{d}^{2} }\).

    • The \(L_{p}\) norm of a real r.v. Z, with p ≥ 1, is written as \(\| Z \|_{p} \triangleq E[ |Z|^{p} ]^{1/p}\), where E[·] denotes expectation w.r.t. the distribution of Z.

2.2 State-space Markov models in discrete time

Consider two random sequences, \(\{X_{t}\}_{t \ge 0}\) and \(\{Y_{t}\}_{t \ge 1}\), taking values in \({\mathcal X} \subseteq \mathbb {R}^{d_{x}}\) and \(\mathbb {R}^{d_{y}}\), respectively. Let \(\mathbb {P}_{t}\) be the joint probability measure for the collection of random variables \(\{X_{0}, X_{n}, Y_{n}\}_{1 \le n \le t}\).

We refer to the sequence \(\{X_{t}\}_{t \ge 0}\) as the state (or signal) process, and we assume that it is an inhomogeneous Markov chain governed by an initial probability measure \(\tau _{0} \in {\mathcal P}({\mathcal X})\) and a sequence of Markov transition kernels \(\tau _{t} : {\mathcal B}({\mathcal X}) \times {\mathcal X} \rightarrow \left [0,1\right ]\). To be specific, we define
$$\begin{array}{*{20}l} \tau_{0}(A) &\triangleq \mathbb{P}_{0}\left\{ X_{0} \in A \right\}, \end{array} $$
$$\begin{array}{*{20}l} \tau_{t}\left(A|x_{t-1}\right) &\triangleq \mathbb{P}_{t}\left\{ X_{t} \in A | X_{t-1}=x_{t-1} \right\}, \quad t \ge 1, \end{array} $$

where \(A \in {\mathcal B}({\mathcal X})\) is a Borel set. The sequence \(\{Y_{t}\}_{t \ge 1}\) is termed the observation process. Each r.v. \(Y_{t}\) is assumed to be conditionally independent of other observations given \(X_{t}\); hence, the conditional distribution of the r.v. \(Y_{t}\) given \(X_{t}=x_{t}\) is fully described by the probability density function (pdf) \(g_{t}(y_{t}|x_{t})>0\). We often use \(g_{t}\) as a function of \(x_{t}\) (i.e. as a likelihood) and hence we write \(g_{t}^{y}(x) ~\triangleq ~ g_{t}(y|x)\). The prior \(\tau _{0}\), the kernels \(\{\tau _{t}\}_{t \ge 1}\) and the functions \(\{g_{t}\}_{t \ge 1}\) describe a stochastic Markov state-space model in discrete time.

The stochastic filtering problem consists in the computation of the posterior probability measure of the state \(X_{t}\) given the sequence of observations up to time t. Specifically, for a given observation record \(\{y_{t}\}_{t \ge 1}\), we seek the probability measures
$$ \pi_{t}(A) \triangleq \mathbb{P}_{t}\left\{ X_{t} \in A | Y_{1:t}=y_{1:t} \right\}, \quad t=0, 1, 2,... $$

where \(A \in {\mathcal B}({\mathcal X})\). For many practical problems, the interest actually lies in the computation of statistics of \(\pi _{t}\), e.g. the posterior mean or the posterior variance of \(X_{t}\). Such statistics can be written as integrals of the form \((f,\pi _{t})\), for some function \(f:{\mathcal X}\rightarrow \mathbb {R}\). Note that, for t = 0, we recover the prior signal measure, i.e. \(\pi _{0} = \tau _{0}\).

An associated problem is the computation of the one-step-ahead predictive measure
$$ \xi_{t}(A) \triangleq \mathbb{P}_{t}\left\{ X_{t} \in A | Y_{1:t-1}=y_{1:t-1} \right\}, \quad t = 1, 2,... $$
This measure can be explicitly written in terms of the kernel \(\tau _{t}\) and the filter \(\pi _{t-1}\). Indeed, for any integrable function \(f:{\mathcal X}\rightarrow \mathbb {R}\), we readily obtain (see, e.g. [6], Chapter 10)
$$\begin{array}{@{}rcl@{}} (f,\xi_{t}) &=& \int \int f(x)\tau_{t}(dx|x')\pi_{t-1}(dx') \\ &=& \left((f,\tau_{t}), \pi_{t-1} \right), \end{array} $$

and we write \(\xi _{t} = \tau _{t} \pi _{t-1}\) as shorthand.

The filter at time t, \(\pi _{t}\), can be obtained from the predictive measure, \(\xi _{t}\), and the likelihood, \(g_{t}^{y_{t}}\), by way of the so-called projective product [6] or Boltzmann-Gibbs transformation [4], \(\pi _{t}~=~g_{t}^{y_{t}} \star \xi _{t}\), defined as
$$ \left(f,g_{t}^{y_{t}} \star \xi_{t}\right) \triangleq \frac{ \left(fg_{t}^{y_{t}},\xi_{t}\right) }{ \left(g_{t}^{y_{t}},\xi_{t}\right) } $$
for any integrable function \(f:{\mathcal X}\rightarrow \mathbb {R}\). Combined with (3), this yields the recursive formula
$$ \pi_{t} = g_{t}^{y_{t}} \star \tau_{t} \pi_{t-1}. $$
It is key to the analysis of Section 4 to keep track of the sequence of non-normalised measures \(\{\rho _{t}\}_{t \ge 0}\), where
$$ \rho_{0} = \pi_{0}, \quad \rho_{t} = g_{t}^{y_{t}} \cdot \tau_{t} \rho_{t-1} $$
and, for any integrable function \(f:{\mathcal X}\rightarrow \mathbb {R}\) and any measure \(\alpha \in {\mathcal P}({\mathcal X})\), we define
$$ \left(f,g_{t}^{y_{t}}\cdot \alpha\right) \triangleq \left(fg_{t}^{y_{t}},\alpha\right). $$
We remark that \(\rho _{t}\) is not a probability measure but a non-normalised version of \(\pi _{t}\), namely
$$ (f,\pi_{t}) = \frac{ (f,\rho_{t}) }{ (\mathbf{1},\rho_{t}) }, $$

where \(\mathbf {1}(x) = 1\) is the constant unit function.
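The recursions for the filter and for its non-normalised version can be checked numerically on a toy problem. The following Python sketch is only an illustration: it assumes a finite state space with three states (so that measures are plain lists and the kernel is a row-stochastic matrix), and the function names and all numerical values are hypothetical.

```python
# Exact filtering recursions on a finite state space {0, 1, 2}.
# Assumption for illustration only: the text works on subsets of R^d.

def predict(tau, measure):
    """Apply the kernel: (tau mu)(x) = sum_{x'} tau[x'][x] * mu[x']."""
    n = len(measure)
    return [sum(tau[xp][x] * measure[xp] for xp in range(n)) for x in range(n)]

def integral(f, measure):
    """(f, mu) = sum_x f(x) mu(x)."""
    return sum(fx * mx for fx, mx in zip(f, measure))

def filter_step(tau, g, pi_prev):
    """pi_t = g star (tau pi_{t-1}): predict, weight by the likelihood, normalise."""
    xi = predict(tau, pi_prev)
    unnorm = [gx * xix for gx, xix in zip(g, xi)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def unnormalised_step(tau, g, rho_prev):
    """rho_t = g . (tau rho_{t-1}): same recursion without the normalisation."""
    xi = predict(tau, rho_prev)
    return [gx * xix for gx, xix in zip(g, xi)]

tau = [[0.8, 0.2, 0.0],
       [0.1, 0.8, 0.1],
       [0.0, 0.2, 0.8]]                     # hypothetical transition kernel
likelihoods = [[0.9, 0.3, 0.1],             # g_1^{y_1} evaluated at each state
               [0.2, 0.7, 0.4]]             # g_2^{y_2} evaluated at each state
f = [0.0, 1.0, 2.0]                         # test function f(x) = x
one = [1.0, 1.0, 1.0]                       # constant unit function

pi = [1.0 / 3] * 3                          # pi_0 = tau_0 (uniform prior)
rho = list(pi)                              # rho_0 = pi_0
for g in likelihoods:
    pi = filter_step(tau, g, pi)
    rho = unnormalised_step(tau, g, rho)

posterior_mean = integral(f, pi)
ratio = integral(f, rho) / integral(one, rho)
```

Both `posterior_mean` and `ratio` approximate the same quantity, which mirrors the identity relating the filter to its non-normalised version.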

2.3 Standard particle filter

Assume that a sequence of observations \(Y_{1:T} = y_{1:T}\), for some \(T < \infty \), is given. Then, the sequences of measures \(\{\pi _{t}\}_{t \ge 1}\), \(\{\xi _{t}\}_{t \ge 1}\) and \(\{\rho _{t}\}_{t \ge 0}\) can be numerically approximated using particle filtering. PFs are numerical methods based on the recursive relationships (4) and (6). The simplest algorithm, often called ‘standard particle filter’ or ‘bootstrap filter’ [16] (see also [17]), can be described as follows.

Step 2.(b) is referred to as resampling or selection. In the form stated here, it reduces to the so-called multinomial resampling algorithm [18, 19], but the convergence of the filter can be easily proved for various other schemes (see, e.g. the treatment of the resampling step in [6]).
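For illustration, a bootstrap filter with multinomial resampling can be sketched in Python as follows. This is a minimal sketch and not a transcription of Algorithm 1: the function names, the scalar linear-Gaussian model and all numerical values are assumptions made only for the example, and the labels 'Step 1' and 'Step 2(a)' in the comments are an assumed numbering of the initialisation and of the sampling and weighting stage.

```python
import math
import random

def bootstrap_filter(ys, n_particles, sample_prior, sample_kernel, likelihood, f, rng):
    """Return the particle estimates of (f, pi_t) for t = 1, ..., T."""
    # Step 1 (assumed label): draw the initial particles from the prior tau_0.
    x = [sample_prior(rng) for _ in range(n_particles)]
    estimates = []
    for t, y in enumerate(ys, start=1):
        # Step 2(a) (assumed label): propagate each particle through tau_t
        # and compute its non-normalised weight with the likelihood g_t^{y_t}.
        x_bar = [sample_kernel(t, xp, rng) for xp in x]
        w = [likelihood(t, y, xb) for xb in x_bar]
        # Step 2(b): multinomial resampling; random.choices normalises the weights.
        x = rng.choices(x_bar, weights=w, k=n_particles)
        estimates.append(sum(f(xn) for xn in x) / n_particles)
    return estimates

# Hypothetical model: X_t = 0.9 X_{t-1} + U_t, Y_t = X_t + V_t, Gaussian noises.
def sample_prior(rng):
    return rng.gauss(0.0, 1.0)

def sample_kernel(t, x_prev, rng):
    return 0.9 * x_prev + rng.gauss(0.0, 0.5)

def likelihood(t, y, x):
    return math.exp(-0.5 * ((y - x) / 0.5) ** 2)

rng = random.Random(0)
ys = [0.3, 0.5, 0.1, -0.2]                  # made-up observation record
est = bootstrap_filter(ys, 500, sample_prior, sample_kernel, likelihood,
                       lambda x: x, rng)    # estimates of the posterior mean
```

The resampling line is the only place where all N particles interact, which is why it is the step that parallelisation schemes have to restructure.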

Using the sets \(\left \{ \bar x_{t}^{(n)} \right \}_{1 \le n \le N}\) and \(\left \{ x_{t}^{(n)} \right \}_{1 \le n \le N}\), we construct random approximations of \(\xi _{t}\), \(\rho _{t}\) and \(\pi _{t}\), namely
$$\begin{array}{*{20}l} {}\xi_{t}^{N} \!= \frac{1}{N} \sum_{n=1}^{N} \delta_{\bar x_{t}^{(n)}}, \quad \pi_{t}^{N} = \frac{1}{N} \sum_{n=1}^{N} \delta_{x_{t}^{(n)}}, \quad \rho_{t}^{N} = G_{t}^{N} \pi_{t}^{N} \end{array} $$
where \(\delta _{x}\) is the delta unit-measure located at \(x \in \mathbb {R}^{d_{x}}\) and
$$ G_{t}^{N} = \frac{1}{N^{t}} \prod_{k=1}^{t} \left(\sum_{j=1}^{N} g_{k}^{y_{k}}\left(\bar x_{k}^{(j)}\right) \right). $$
For any integrable function f on the state space, it is straightforward to approximate the integrals \((f,\xi _{t})\), \((f,\pi _{t})\) and \((f,\rho _{t})\) as
$$\begin{array}{*{20}l} (f,\xi_{t}) &\approx \left(f,\xi_{t}^{N}\right) = \frac{1}{N} \sum_{n=1}^{N} f\left(\bar x_{t}^{(n)}\right), \\ \left(f,\pi_{t}\right) &\approx \left(f,\pi_{t}^{N}\right) = \frac{1}{N} \sum_{n=1}^{N} f\left(x_{t}^{(n)}\right) \quad \text{and} \\ \left(f,\rho_{t}\right) &\approx \left(f,\rho_{t}^{N}\right) = G_{t}^{N} \left(f,\pi_{t}^{N}\right). \end{array} $$


The convergence of PFs has been analysed in different ways [4, 6, 20–23]. Here, we use simple results for the convergence of the \(L_{p}\) norms (p ≥ 1) of the approximation errors. For the approximation of integrals w.r.t. \(\xi _{t}\) and \(\pi _{t}\), we have the following standard result.

Lemma 1

Assume that the sequence of observations \(Y_{1:T}=y_{1:T}\) is fixed (with \(T<\infty \)), \(g_{t}^{y_{t}} \in B({\mathcal X})\) and \(g_{t}^{y_{t}}>0\) (in particular, \(\left (g_{t}^{y_{t}},\xi _{t}\right) > 0\)) for every t=1,2,...,T. Then for any \(f \in B({\mathcal X})\), any p≥1 and every t=1,…,T,
$$\begin{array}{*{20}l} \left\| \left(f,\xi_{t}^{N}\right) - \left(f,\xi_{t}\right) \right\|_{p} &\le \frac{ \bar c_{t} \| f \|_{\infty} }{ \sqrt{N} } \quad \text{and} \end{array} $$
$$\begin{array}{*{20}l} \left\| \left(f,\pi_{t}^{N}\right) - \left(f,\pi_{t}\right) \right\|_{p} &\le \frac{ c_{t} \| f \|_{\infty} }{ \sqrt{N} }, \end{array} $$

where \(\bar c_{t}\) and \(c_{t}\) are finite constants independent of N, \(\| f \|_{\infty }=\sup _{x \in {\mathcal X}} |f(x)|<\infty \) and the expectations are taken over the distributions of the measure-valued random variables \(\xi _{t}^{N}\) and \(\pi _{t}^{N}\), respectively.


Proof. This result is a special case of, e.g. Lemma 1 in [24]. □

Remark 1

The constants \(\bar c_{t}\) and \(c_{t}\) can be easily shown to increase exponentially with t. It is possible to find error rates independent of t by imposing additional assumptions on the state-space model (related to the stability of the optimal filter, \(\pi _{t}\)) [4, 25].

3 Parallelisation schemes for particle filtering

3.1 Non-interacting particle filters

Assume we intend to run a PF with K particles. Most parallelisation schemes split the set of particles \(\left \{ x_{t}^{(k)} \right \}_{1 \le k \le K}\) into subsets and then run separate (but possibly interacting) PFs for each subset. To be specific, assume that the complete set of K particles can be divided into M subsets with N elements each, i.e. K = MN, and we construct disjoint subsets
$$\left\{ x_{t}^{(m,n)} \right\}_{1 \le n \le N}, \quad \text{for \(m=1,..., M\),} $$
such that
$$\bigcup_{m=1}^{M} \left\{ x_{t}^{(m,n)} \right\}_{1 \le n \le N} = \left\{ x_{t}^{(k)} \right\}_{1 \le k \le K}. $$
In the simplest scheme, M independent (i.e. non-interacting) PFs are run separately. Assume for simplicity that the standard PF outlined in Algorithm 1 is used on each subset. Then, at each time t, we have M estimates of the filtering measure, namely
$$ \pi_{t}^{m,N} = \frac{1}{N} \sum_{n=1}^{N} \delta_{x_{t}^{(m,n)}}, \quad m=1, \ldots, M. $$
Assuming that the goal is to approximate integrals of the form \((f,\pi _{t})\), for some integrable real function \(f:{\mathcal X}\!\rightarrow \!\mathbb {R}\), then we obtain an ensemble of M independent and identically distributed (i.i.d.) estimators
$$\left(f,\pi_{t}^{m,N}\right) = \frac{1}{N} \sum_{n=1}^{N} f\left(x_{t}^{(m,n)}\right), \quad m=1, \ldots, M, $$
which can be averaged to yield
$$ {}\left(f,\pi_{t}^{M \times N}\right) \,=\, \frac{1}{M}\! \sum_{m=1}^{M} \!\left(f,\pi_{t}^{m,N}\right) = \frac{1}{MN} \sum_{m=1}^{M}\sum_{n=1}^{N} f\left(x_{t}^{(m,n)}\right), $$

where we have denoted \(\pi _{t}^{M \times N} = \frac {1}{M} \sum _{m=1}^{M} \pi _{t}^{m,N}\).

This scheme is straightforward to implement, and it does not involve any parallelisation overhead as the M PFs do not interact. A self-contained analysis of the MSE of the ensemble estimator \(\left (f,\pi _{t}^{M \times N}\right)\) is presented in Section 4.
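The averaging of non-interacting filters can be sketched in a few lines of Python. The sketch below is a minimal illustration under stated assumptions: the scalar linear-Gaussian model inside `run_pf`, the use of distinct integer seeds to obtain (approximately) independent random streams, and all names and numerical values are hypothetical.

```python
import math
import random

def run_pf(ys, n, seed):
    """One bootstrap filter with n particles; returns its estimate of E[X_T | y_{1:T}]."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]            # prior samples
    for y in ys:
        x_bar = [0.9 * xp + rng.gauss(0.0, 0.5) for xp in x]        # sampling
        w = [math.exp(-0.5 * ((y - xb) / 0.5) ** 2) for xb in x_bar]  # weighting
        x = rng.choices(x_bar, weights=w, k=n)              # multinomial resampling
    return sum(x) / n

def ensemble_estimate(ys, m_filters, n, base_seed=0):
    """Average of m_filters non-interacting PFs with n particles each."""
    # Each filter has its own random stream and never reads another filter's
    # particles, so the M calls could run on separate cores with no communication.
    per_filter = [run_pf(ys, n, base_seed + m) for m in range(m_filters)]
    return sum(per_filter) / m_filters, per_filter

ys = [0.3, 0.5, 0.1]                                        # made-up observations
avg, per_filter = ensemble_estimate(ys, m_filters=4, n=200)
```

Here the M filters are run one after another for simplicity; since `run_pf` only depends on its own arguments, the list comprehension could be replaced by a map over a pool of worker processes.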

A key result, to be explicitly shown in our analysis but also pointed out in [13] and [12], is that the estimation bias \(\left | E\left [ (f,\pi _{t}) - \left (f,\pi _{t}^{M \times N}\right) \right ] \right |\) depends only on N: it decreases as \(O(N^{-1})\), so its contribution to the MSE (the squared bias) is \(O(N^{-2})\), regardless of M. This implies that if the number of particles per subset, N, is kept fixed, then the MSE, \(E\left [ \left | (f,\pi _{t}) - \left (f,\pi _{t}^{M \times N}\right) \right |^{2} \right ]\), remains bounded away from zero even if the number of subsets is made arbitrarily large, i.e. \(M \rightarrow \infty \). This can be a drawback depending on the type of parallel computing configuration to be used. In multicore computers, for example, the number of subsets M can be expected to be moderate (of the order of the number of cores available) and N can often be made large enough to make the bias negligible. On the other hand, implementations based on low-power processors, such as graphics processing units (GPUs) or wireless networks, are more efficient when operating with a large number of subsets, M, and a low number of particles per subset, N. In these scenarios, the bias of the non-interacting ensemble estimator in Eq. (13) can be significant. The solution to this limitation is to introduce some degree of interaction among the M parallel-running PFs. Some relevant schemes are described below.
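The reason why increasing M alone cannot drive the MSE to zero can be made explicit with the usual bias-variance decomposition. The following short derivation only uses the fact, stated above, that the M estimators \(\left (f,\pi _{t}^{m,N}\right)\) are i.i.d.; no notation beyond that of this section is introduced.

$$\begin{array}{rcl} E\left[ \left( \left(f,\pi_{t}^{M \times N}\right) - (f,\pi_{t}) \right)^{2} \right] &=& \text{Var}\left[ \left(f,\pi_{t}^{M \times N}\right) \right] + \left( E\left[ \left(f,\pi_{t}^{M \times N}\right) \right] - (f,\pi_{t}) \right)^{2} \\ &=& \frac{1}{M} \text{Var}\left[ \left(f,\pi_{t}^{1,N}\right) \right] + \left( E\left[ \left(f,\pi_{t}^{1,N}\right) \right] - (f,\pi_{t}) \right)^{2}. \end{array} $$

Only the first term on the right-hand side depends on M. The second term is the squared bias of a single filter with N particles, and it is unaffected by the averaging.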

3.2 Distributed resampling with non-proportional allocation

The scheme termed distributed resampling with non-proportional allocation (DRNA) for the parallelisation of PFs was originally introduced in [2] (Section IV.A.3), but a theoretical characterisation of its performance has been obtained only recently [10, 11, 26].

As in Section 3.1, assume that we have a budget of K = MN particles, which are split into M subsets with N particles each. We run a standard PF for each subset which, in addition to the particles and weights, keeps track of the aggregated non-normalised weight
$$ W_{t}^{(m)*} = W_{t-1}^{(m)*} \sum_{n=1}^{N} g_{t}^{y_{t}}\left(x_{t}^{(m,n)}\right). $$
Note that \(W_{t}^{(m)*}\) represents the likelihood of the mth subset of particles \(\left \{ x_{t}^{(m,n)} \right \}_{1 \le n \le N}\). The normalised aggregated weights are computed as
$$W_{t}^{(m)} = \frac{ W_{t}^{(m)*} }{ \sum_{i=1}^{M} W_{t}^{(i)*} }, \quad m=1, \ldots, M. $$
In this scheme, the M parallel PFs are not independent. Every \(t_{0}\) time steps, the PFs exchange subsets of particles and weights using a communication network [2]. This exchange can be formally described by means of a deterministic one-to-one map
$$\beta:\{ 1,..., M \} \times \{ 1,..., N \} \rightarrow \{ 1,..., M \} \times \{ 1,..., N \} $$
that keeps the number of particles per subset, N, invariant. Specifically, (u,v) = β(m,n) means that the nth particle of the mth subset is transmitted to the uth subset, where it becomes particle number v. In summary, if we have the particles
$$\left\{ x_{t}^{(m,n)} \right\}_{1 \le n \le N; 1 \le m \le M}, $$
then, after the exchange step, the particles are re-labelled as
$$\left\{ x_{t}^{\beta(m,n)} \right\}_{1 \le n \le N; 1 \le m \le M}. $$

Typically, only small subsets of particles are exchanged, hence β(m,n)=(m,n) for most values of m and n. The resulting parallel particle filtering algorithm can be outlined as shown below (adapted from [10]).

We remark that every PF operates independently of all others except for the particle exchange, step 2.(c), which is carried out every \(t_{0}\) time steps. The degree of interaction can be controlled by designing the map β(m,n) in a proper way. Typically, exchanging a subset of particles with ‘neighbour’ PFs is sufficient. For example, if we assume the parallel PFs are arranged in a ring configuration, then the mth PF can exchange, say, two particles with PF number m−1 and another two particles with PF number m+1, in such a way that all parallel PFs retain N particles (four of them received from their neighbours) after the exchange.
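The ring exchange just described can be written down as an explicit one-to-one map. The Python sketch below is one possible construction, given only as an illustration: the choice of which slots are sent to each neighbour, the 0-based indices and the function names are assumptions, and it requires N to be at least twice the number of particles sent to each neighbour.

```python
def ring_exchange_map(m_filters, n_particles, n_swap=2):
    """Build beta[(m, n)] = (u, v) for PFs arranged in a ring (0-based indices).

    PF m sends its first n_swap particles to PF m-1 and its next n_swap
    particles to PF m+1; all remaining particles stay where they are.
    """
    assert n_particles >= 2 * n_swap
    beta = {}
    for m in range(m_filters):
        left, right = (m - 1) % m_filters, (m + 1) % m_filters
        for n in range(n_particles):
            if n < n_swap:
                # Sent to the left neighbour, stored in its slots n_swap..2*n_swap-1.
                beta[(m, n)] = (left, n_swap + n)
            elif n < 2 * n_swap:
                # Sent to the right neighbour, stored in its slots 0..n_swap-1.
                beta[(m, n)] = (right, n - n_swap)
            else:
                beta[(m, n)] = (m, n)
    return beta

def apply_exchange(particles, beta):
    """Move particle (m, n) to position beta[(m, n)]; weights would move the same way."""
    out = [[None] * len(row) for row in particles]
    for (m, n), (u, v) in beta.items():
        out[u][v] = particles[m][n]
    return out

M, N = 4, 6
beta = ring_exchange_map(M, N)
labels = [[(m, n) for n in range(N)] for m in range(M)]   # particle = its origin
exchanged = apply_exchange(labels, beta)
received_by_pf1 = [p for p in exchanged[1] if p[0] != 1]
```

With these values every PF still holds N = 6 particles after the exchange, four of which come from its two neighbours.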

We also note that the local resampling step is carried out independently, and concurrently, for each parallel-running PF and it does not change the aggregate weights, i.e. \(\bar W_{t}^{(m)*} = \sum _{n=1}^{N} \bar w_{t}^{(m,n)*} = \sum _{n=1}^{N} \tilde w_{t}^{(m,n)*}\). We assume a multinomial resampling procedure, but other procedures can be used in an obvious manner.

The ensemble estimator of the optimal filter \(\pi _{t}\) is now computed as the weighted average
$${}\pi_{t}^{M \times N} = \sum_{m=1}^{M} W_{t}^{(m)} \pi_{t}^{m,N}, \quad \text{where} \quad \pi_{t}^{m,N} = \frac{1}{N}\sum_{n=1}^{N} \delta_{x_{t}^{(m,n)}}. $$

The particle estimator of \((f,\pi _{t})\) then becomes \(\left (f,\pi _{t}^{M \times N}\right) ~=~ \sum _{m=1}^{M} \frac {W_{t}^{(m)}}{N} \sum _{n=1}^{N} f\left (x_{t}^{(m,n)}\right)\).
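The bookkeeping of aggregated weights and the weighted ensemble estimator can be sketched in Python as follows. This is a minimal sketch with hypothetical function names and made-up numbers; `update_aggregate` simply transcribes the displayed recursion for the aggregated non-normalised weight.

```python
def update_aggregate(w_prev, likelihood_values):
    """Aggregated weight recursion: previous aggregate times the sum of likelihoods."""
    return w_prev * sum(likelihood_values)

def normalise(agg_weights):
    """Normalised aggregated weights W^(m), m = 1, ..., M."""
    total = sum(agg_weights)
    return [w / total for w in agg_weights]

def drna_estimate(f, particles, agg_weights):
    """Weighted average over subsets of the per-subset sample means of f."""
    big_w = normalise(agg_weights)
    return sum(wm * sum(f(x) for x in subset) / len(subset)
               for wm, subset in zip(big_w, particles))

# Two subsets (M = 2) with two particles each (N = 2); values are made up.
particles = [[0.0, 2.0], [4.0, 6.0]]
agg_weights = [3.0, 1.0]                       # non-normalised aggregates
weighted = drna_estimate(lambda x: x, particles, agg_weights)
uniform = drna_estimate(lambda x: x, particles, [1.0, 1.0])
w_next = update_aggregate(1.0, [0.5, 1.0])
```

When all aggregated weights are equal, the weighted estimator reduces to the plain average used for non-interacting filters.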

The scheme in Algorithm 2 has been proved to converge uniformly over time, under some standard assumptions, when the number of particles per subset, N, is kept fixed and the number of subsets (i.e. the number of parallel PFs), M, is increased. To be specific, we have the following result, which is proved in [10] (Section 3.2).

Theorem 1

If the following three assumptions hold:
  • (i) The sequence of observations \(\{y_{t}\}_{t \ge 1}\) is fixed (but otherwise arbitrary) and there exists a real constant \(0<a<\infty \) such that \(\frac {1}{a} < g_{t}^{y_{t}}(x) < a\) for every t≥1 and every \(x \in {\mathcal X}\).

  • (ii) The sequence of probability measures \(\{\pi _{t}\}_{t \ge 0}\) is stable (see [25]).

  • (iii) The particle exchange step guarantees that
    $$ E\left[ \left(\sup_{1 \le m \le M} W_{rt_{0}}^{(m)} \right)^{q} \right] \le \frac{c^{q}}{M^{q-\epsilon}}, \quad \text{for every } r \in \mathbb{N} $$

    and some constants \(c<\infty \), \(0 \le \epsilon < 1\) and \(q \ge 4\) independent of M.

Then, for any fixed \(0<N<\infty \),
$${\lim}_{M \rightarrow \infty} \sup_{t\ge 0} \left\|\left(f,\pi_{t}^{M \times N}\right) - (f,\pi_{t}) \right\|_{p} = 0 $$
for any \(f \in B({\mathcal X})\) and every \(1 \le p \le q\).

Assumption iii. in the latter theorem indicates that none of the M subsets should accumulate too much aggregate weight compared to the other subsets. This accumulation of weight is precisely controlled by the particle exchange steps. In a practical implementation, the aggregate weights \(W_{t}^{(m)*}\) should be monitored and additional particle exchange steps should be triggered when the weight of any subset increases beyond some prescribed threshold.
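A simple way to monitor the aggregate weights is sketched below in Python. The rule used here (trigger an exchange when one subset holds more than a fixed fraction of the total aggregate weight) and the particular threshold are assumptions made for illustration; they are not prescribed by the theorem.

```python
def needs_exchange(agg_weights, threshold):
    """True if some subset holds more than `threshold` of the total aggregate weight."""
    total = sum(agg_weights)
    return max(agg_weights) / total > threshold

m_filters = 4
threshold = 2.0 / m_filters        # hypothetical rule: twice the uniform share 1/M
balanced = [1.0, 1.2, 0.9, 1.1]    # made-up aggregate weights, roughly even
skewed = [0.1, 5.0, 0.2, 0.1]      # one subset dominates

trigger_balanced = needs_exchange(balanced, threshold)
trigger_skewed = needs_exchange(skewed, threshold)
```

In this example only the skewed configuration would trigger an additional particle exchange step.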

3.3 Particle islands

The particle island model was introduced in [13] in order to address the parallel processing of subsets of particles in SMC methods in a systematic manner. Similar to the DRNA-based PFs of Section 3.2, the algorithms proposed in [13] are based on running M parallel PFs, each one on a disjoint subset of particles, namely \(\big \{ x_{t}^{(m,n)} \big \}_{1 \le n \le N}\) for the mth filter, and keep track of the non-normalised aggregate weights \(W_{t}^{(m)*}\) defined in Eq. (14).

However, particle island methods do not rely on an exchange of particles between the PFs running the different subsets. Instead, a resampling scheme in two levels is implemented.
  • Particle level: resampling is carried out locally within each of the M concurrently running PFs. This is equivalent to the local resampling step in Algorithm 2.

  • Island level: the aggregate weights \(W_{t}^{(m)}\) are used to resample the particle subsets, or islands, assigned to the individual PFs. In this step, complete subsets can be replicated or eliminated (in the same way as particles are in a conventional, or particle level, resampling step).

We now outline the double bootstrap filter, an algorithm described in [13] (Algorithm 1) that performs multinomial resampling at both the particle level and the island level. While in the version of [13] both resampling steps are taken at every time step t, we describe a slightly more general procedure where the island-level resampling steps are taken periodically, every \(t_{0} \ge 1\) time steps. For simplicity, we introduce the notation \({\sf X}_{t}^{m,N} ~=~ \big \{ x_{t}^{(m,n)} \big \}_{1 \le n \le N}\) for the subset of N particles assigned to the mth island (i.e. the mth concurrently running PF).

In Algorithm 3, a multinomial resampling procedure is employed both at the particle level and the island level. Other schemes are obviously possible and some of them are explored in [13], including ε-interactions and resampling conditional on the effective sample size.
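The island-level step can be sketched in Python as follows. This is a minimal illustration rather than a transcription of Algorithm 3: the function name is hypothetical, and resetting the aggregate weights to uniform values after resampling is an assumption based on how multinomial resampling is conventionally implemented.

```python
import random

def island_resample(islands, agg_weights, rng):
    """Multinomial resampling of complete islands according to their aggregate weights."""
    m = len(islands)
    # Draw M island indices with probabilities proportional to the aggregate weights.
    idx = rng.choices(range(m), weights=agg_weights, k=m)
    # Copy the selected islands so that replicated islands can evolve separately.
    new_islands = [list(islands[i]) for i in idx]
    # Assumed convention: aggregate weights are reset to uniform after resampling.
    new_weights = [1.0 / m] * m
    return new_islands, new_weights, idx

rng = random.Random(1)
islands = [[0.1, 0.2], [1.1, 1.2], [2.1, 2.2], [3.1, 3.2]]   # M = 4, N = 2
# Extreme case for illustration: all the aggregate weight sits on the third island.
new_islands, new_weights, idx = island_resample(islands, [0.0, 0.0, 1.0, 0.0], rng)
```

In this extreme case the third island is replicated four times and the other three are eliminated, in the same way as individual particles are in a particle-level resampling step.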

The particle approximation of the optimal filter \(\pi _{t}\) takes the form \(\pi _{t}^{M \times N} ~=~ \sum _{m=1}^{M} W_{t}^{(m)} \pi _{t}^{m,N}\), where \(\pi _{t}^{m,N} ~=~ \frac {1}{N} \sum _{n=1}^{N} \delta _{x_{t}^{(m,n)}}\). This is formally identical to the DRNA-based Algorithm 2, although the procedure for the computation of the particles and weights is obviously different.

The asymptotic convergence of the double bootstrap filter was proved in [13] using the Feynman-Kac machinery of [4]. Then, in the follow-up paper [14], a central limit theorem was proved and bounds on the asymptotic variance of a class of schemes that includes Algorithm 3 were derived. Here we reproduce the basic convergence result of [13], adapted to the notation of this paper.

Theorem 2

Assume that the sequence of observations \(y_{1:T}\) is arbitrary but fixed, T is arbitrarily large but finite and the likelihood functions \(g_{t}^{y_{t}}(x)\) are positive and bounded for \(1 \le t \le T\). Then, for any \(f \in B({\mathcal X})\) and every t=1,...,T,
$$\begin{array}{@{}rcl@{}} {\lim}_{N\rightarrow\infty} {\lim}_{M\rightarrow\infty} NM \!\times\! E\left[\! \left(f, \pi_{t}^{M \times N}\right) - (f, \pi_{t})\! \right] &=& B(f,t) < \infty, \\\quad \text{and} \\ {\lim}_{N\rightarrow\infty} {\lim}_{M\rightarrow\infty} NM \times \text{\sf Var}\left[ \left(f, \pi_{t}^{M \times N}\right) \right] &=& V(f,t) < \infty, \end{array} $$

where Var[·] denotes the variance of a random variable and B(f,t) and V(f,t) are finite constants with respect to both M and N.

The results in Theorem 2 can be adapted to the case where the island-level resampling step is removed from Algorithm 3, effectively converting the double bootstrap method into an ensemble of non-interacting PFs. It is proved in [13] that, in such case,
$$\begin{array}{*{20}l} {}{\lim}_{N\rightarrow\infty} {\lim}_{M\rightarrow\infty} N \times E\left[ \left(f, \pi_{t}^{M \times N}\right) - (f, \pi_{t}) \right] &= \bar B(f,t) < \infty,\\ \text{and} \\ {\lim}_{N\rightarrow\infty} {\lim}_{M\rightarrow\infty} NM \times \text{\sf Var}\left[ \left(f, \pi_{t}^{M \times N}\right) \right] &= \bar V(f,t) < \infty \end{array} $$

where the constants \(\bar B(f,t)\) and \(\bar V(f,t)\) are independent of M and N. This implies that the bias of the estimator \(\left (f,\pi _{t}^{M\times N}\right)\) with non-interacting PFs depends only on N and cannot be eliminated by taking \(M \rightarrow \infty \) alone. The MSE of Algorithm 3, on the other hand, vanishes as \(MN \rightarrow \infty \).

3.4 Adaptive interaction pattern: the α-SMC methodology

Rather than working with fixed subsets \({\sf X}_{t}^{m,N} \,=\, \left \{\! x_{t}^{(m,n)}\!\right \}_{1 \le n \le N}\), m = 1,...,M, the α-SMC methodology of [12] enables the construction of particle filtering algorithms with adaptive interaction patterns. In particular, it is possible to devise parallelised PFs within this framework where the subsets of particles which are resampled together can change from one time step to the next (including their size, N).

Let K be the total number of particles. The interaction pattern for resampling is specified by means of a sequence of Markov transition matrices \(\alpha _{t} ~=~ \left [\alpha _{t}^{ij}\right ]\) where \(1 \le i \le K\) and \(1 \le j \le K\) are the row and column indices, respectively. Since \(\alpha _{t}\) is a Markov matrix, it satisfies \(\sum _{j=1}^{K} \alpha _{t}^{ij} = 1\) for every row i. The ith row in \(\alpha _{t}\) determines from which subset of particles we resample \(x_{t}^{(i)}\). The general α-SMC method is outlined below. We assume that either the sequence \(\alpha _{t}\) is pre-determined or there is some prescribed rule to select \(\alpha _{t}\) given the observations \(y_{1:t}\) and the particles \(\left \{ \bar x_{t}^{(k)} \right \}_{1 \le k \le K}\).

The particle approximation of π t produced by Algorithm 4 is \(\pi _{t}^{K} ~=~ \sum _{k=1}^{K} w_{t}^{(k)} \delta _{x_{t}^{(k)}}\). The α-SMC scheme can be particularised to yield most standard particle filtering algorithms ([12] Section 2.2). Of specific interest for the purpose of parallelisation is that the DRNA-based PF (Algorithm 2) can also be described and analysed as an α-SMC procedure [11].

The convergence of α-SMC methods depends on the choice of the sequence of interaction matrices \(\alpha_{t}\). Let us recursively define the matrices \(\alpha_{t,t} = I_{K}\) (where \(I_{K}\) denotes the K×K identity matrix) and \(\alpha_{s,t}\), constructed entry-wise as \(\alpha _{s,t}^{ij} = \sum _{k=1}^{K} \alpha _{s+1,t}^{ik} \alpha _{s}^{kj}\), for i,j∈{1,...,K} and 0≤s<t. Furthermore, define \(\beta _{s,t}^{i} = \frac {1}{K} \sum _{j=1}^{K} \alpha _{s,t}^{ji}\), for i=1,...,K and 0≤s≤t. Then, we have the following result, proved in [12] (Section 3).

Theorem 3

Assume that \(g_{t}^{y_{t}}\) is positive and bounded for every t≥1. If the coefficients \(\{ \beta _{s,t}^{i} \}_{1 \le i \le K}\) are measurable w.r.t. the trivial σ-algebra \(\{ {\mathcal {X}}, \emptyset \}\) and \({\lim }_{K\rightarrow \infty } \max _{i \in \{1,..., K\}} \beta _{s,t}^{i} = 0\) for all 0≤s≤t, then
$${\lim}_{K\rightarrow\infty} E\left[ \left| (f,\pi_{t}) - \left(f,\pi_{t}^{K}\right) \right|^{p} \right] = 0 $$
for any \(f \in B({\mathcal X})\) and p≥1.
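As an illustration of the quantities that appear in Theorem 3, the following minimal sketch computes the composite matrices \(\alpha_{s,t}\) and the coefficients \(\beta_{s,t}^{i}\) for two toy interaction patterns. The helper names (`matmul`, `composite_alpha`, `beta`) and the two example patterns are our own assumptions for illustration; they are not taken from [12].

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    k = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]


def composite_alpha(alphas, s, t):
    """Return alpha_{s,t}, with alpha_{t,t} = I and alpha_{s,t} = alpha_{s+1,t} alpha_s.

    alphas[r] is assumed to hold the K x K Markov matrix alpha_r.
    """
    k = len(alphas[0])
    result = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for r in range(t - 1, s - 1, -1):
        result = matmul(result, alphas[r])
    return result


def beta(alphas, s, t):
    """Return [beta_{s,t}^i], where beta_{s,t}^i = (1/K) * sum_j alpha_{s,t}^{ji}."""
    a = composite_alpha(alphas, s, t)
    k = len(a)
    return [sum(a[j][i] for j in range(k)) / k for i in range(k)]


K = 4
# Toy pattern 1: every particle is resampled from the whole population.
uniform = [[1.0 / K] * K for _ in range(K)]
# Toy pattern 2 (degenerate): every particle is copied from particle 0.
collapse = [[1.0 if j == 0 else 0.0 for j in range(K)] for _ in range(K)]

beta_uniform = beta([uniform] * 3, 0, 3)
beta_collapse = beta([collapse] * 3, 0, 3)
print(max(beta_uniform), max(beta_collapse))
```

In the first pattern the largest coefficient equals 1/K and therefore vanishes as K grows, whereas in the degenerate pattern it stays equal to 1 for every K, so the condition on \(\max_{i} \beta_{s,t}^{i}\) in Theorem 3 is not met.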

4 Error rates for ensembles of non-interacting particle filters

4.1 Averaged estimators

We turn our attention to the analysis of the ensemble of non-interacting PFs outlined in Section 3.1. In particular, we study the accuracy of the particle approximations \(\pi _{t}^{m,N}\) and \(\pi _{t}^{M \times N}\) introduced in Eqs. (12) and (13), respectively. We adopt the mean square error (MSE) for integrals of bounded real functions,
$$\text{MSE} \equiv E\left[ \left(\left(f,\pi_{t}^{M\times N}\right) - (f,\pi_{t}) \right)^{2} \right], \quad f \in B({\mathcal X}), $$
as a performance metric. Since the underlying state-space model is the same for all filters and they are run in a completely independent manner, the measure-valued random variables \(\pi _{t}^{m,N}\), m=1,...,M, are i.i.d., and it is straightforward to show (via Lemma 1) that
$$ E\left[ \left(\left(f,\pi_{t}^{M\times N}\right) - (f,\pi_{t}) \right)^{2} \right] \le \frac{ c_{t}^{2} \| f \|_{\infty}^{2}}{MN}, $$

for some constant \(c_{t}\) independent of N and M. However, the inequality (15) does not illuminate the effect of the choice of N. In the extreme case of N = 1, for example, \(\pi _{t}^{M \times N}\) reduces to the outcome of a sequential importance sampling algorithm, with no resampling, which is known to degenerate quickly in practice. Instead of (15), we seek a bound for the approximation error that provides some indication on the trade-off between the number of independent filters, M, and the number of particles per filter, N.

With this purpose, we tackle the classical decomposition of the MSE into variance and bias terms. First, we obtain preliminary results that are needed for the analysis of the average measure \(\pi _{t}^{M \times N}\). In particular, we prove that the random non-normalised measure \(\rho _{t}^{N}\) produced by the bootstrap filter (Algorithm 1) is unbiased and attains \(L_{p}\) error rates proportional to \(\frac {1}{\sqrt {N}}\), i.e. the same as \(\xi _{t}^{N}\) and \(\pi _{t}^{N}\). We use these results to derive an upper bound for the bias of \(\pi _{t}^{N}\) which is proportional to \(\frac {1}{N}\). The latter enables us to deduce an upper bound for the MSE of the ensemble approximation \(\pi _{t}^{M \times N}\) consisting of two additive terms that depend explicitly on M and N. Specifically, we show that the variance component of the MSE decays linearly with the total number of particles, K=MN, while the squared-bias term decays as \(1/N^{2}\), i.e. quadratically with the number of particles per filter.

4.2 Assumptions on the state space model

All the results to be introduced in the rest of Section 4 hold under the (mild) assumptions of Lemma 1, which we summarise below for convenience of presentation.

Assumption 1

The sequence of observations \(Y_{1:T}=y_{1:T}\) is arbitrary but fixed, with T<∞.

Assumption 2

The likelihood functions are bounded and positive, i.e.
$$ g_{t}^{y_{t}} \in B({\mathcal X}) \quad\text{and}\quad g_{t}^{y_{t}}>0 \quad\text{for every}\quad t= 1, 2,..., T. $$

Remark 2

Note that Assumptions 1 and 2 imply that
  • \((g_{t}^{y_{t}},\alpha) > 0\), for any \(\alpha \in {\mathcal P}({\mathcal X})\), and

  • \(\prod _{k=1}^{T} g_{k}^{y_{k}} \le \prod _{k=1}^{T} \| g_{k}^{y_{k}} \|_{\infty } < \infty \),

for every t=1,2,...,T.

Remark 3

We seek simple convergence results for a fixed time horizon T<∞, similar to Lemma 1. Therefore, no further assumptions related to the stability of the optimal filter for the state-space model [4,25] are needed. If such assumptions are imposed then stronger (time uniform) asymptotic convergence can be proved, similar to Theorem 1 in Section 3.2. See [11] for additional results that apply to the independent filters \(\pi _{t}^{m,N}\) and the ensemble \(\pi _{t}^{M \times N}\).

4.3 Bias and error rates

Our analysis relies on some properties of the particle approximations of the non-normalised measures ρ t , t≥1. We first show that the estimate \(\rho _{t}^{N}\) in Eq. (8) is unbiased.

Lemma 2

If Assumptions 1 and 2 hold, then
$$ E\left[ \left(f,\rho_{t}^{N}\right) \right] = (f,\rho_{t}) $$
for any \(f \in B({\mathcal X})\) and every t=1,2,...,T.


See Appendix 1 for a self-contained proof. □

Remark 4

The result in Lemma 2 was originally proved in [4]. For the constant function \(\mathbf{1}(x)=1\), it states that the estimate \(\left (\mathbf {1},\rho _{t}^{N}\right)\) of the proportionality constant of the posterior distribution \(\pi_{t}\) is unbiased. This property is at the core of recent model inference algorithms such as particle MCMC [27], SMC\(^{2}\) [28] or some population Monte Carlo [29] methods.

Combining Lemma 2 with the standard result of Lemma 1 leads to an explicit convergence rate for the L p norms of the approximation errors \(\left (f,\rho _{t}^{N}\right) - (f,\rho _{t})\).

Lemma 3

If Assumptions 1 and 2 hold, then, for any \(f\in B({\mathcal X})\), any p≥1 and every t=1,2,...,T, we have the inequality
$$ \| \left(f,\rho_{t}^{N}\right) - (f,\rho_{t}) \|_{p} \le \frac{ \tilde c_{t} \| f \|_{\infty} }{ \sqrt{N}}, $$

where \(\tilde c_{t} < \infty \) is a constant independent of N.


See Appendix 2. □

Finally, Lemmas 2 and 3 together enable the calculation of explicit rates for the bias of the particle approximation of (f,π t ). This is a key result for the decomposition of the MSE into variance and bias terms. To be specific, we can prove the following theorem.

Theorem 4

If \(0<(\mathbf{1},\rho_{t})<\infty\) for t=1,2,...,T and Assumptions 1 and 2 hold, then, for any \(f\in B({\mathcal X})\) and every 0≤t≤T, we obtain
$$ \left| E\left[ \left(f,\pi_{t}^{N}\right) - \left(f,\pi_{t}\right) \right] \right| \le \frac{ \hat c_{t} \| f \|_{\infty} }{ N }, $$
where \(\hat c_{t} < \infty \) is a constant independent of N.
Proof Let us first note that (f,π t )=(f,ρ t )/(1,ρ t ) and
$$\begin{array}{*{20}l} \left(f,\pi_{t}^{N}\right) &= \frac{ \left(f,\rho_{t}^{N}\right) }{ G_{t}^{N} } \end{array} $$
$$\begin{array}{*{20}l} &= \frac{ \left(f,\rho_{t}^{N}\right) }{ G_{t}^{N} \left(\mathbf{1},\pi_{t}^{N}\right) } \end{array} $$
$$\begin{array}{*{20}l} &= \frac{ \left(f,\rho_{t}^{N}\right) }{ \left(\mathbf{1},\rho_{t}^{N}\right) }, \end{array} $$
where (17) follows from the construction of \(\rho _{t}^{N}\), (18) holds because \(\left (\mathbf {1},\pi _{t}^{N}\right)=1\) and (19) is, again, a consequence of the definition of \(\rho _{t}^{N}\). Therefore, the difference \(\left (f,\pi _{t}^{N}\right)-\left (f,\pi _{t}\right)\) can be written as
$$\left(f,\pi_{t}^{N}\right)-\left(f,\pi_{t}\right) = \frac{ \left(f,\rho_{t}^{N}\right) }{ \left(\mathbf{1},\rho_{t}^{N}\right) } - \frac{ (f,\rho_{t}) }{ (\mathbf{1},\rho_{t}) } $$
and, since \(\left (f,\rho _{t}\right)=E\left [\left (f,\rho _{t}^{N}\right)\right ]\) (from Lemma 2), the bias can be expressed as
$$ E\left[ \left(f,\pi_{t}^{N}\right)-(f,\pi_{t}) \right] = E\left[ \frac{ \left(f,\rho_{t}^{N}\right) }{ \left(\mathbf{1},\rho_{t}^{N}\right) } - \frac{ \left(f,\rho_{t}^{N}\right) }{ (\mathbf{1},\rho_{t}) } \right]. $$
Some elementary manipulations on (20) yield the equality
$$\begin{array}{*{20}l} {}E \left[ \left(f,\pi_{t}^{N}\right)-\left(f,\pi_{t}\right) \right] = E\left[ \left(f,\pi_{t}^{N}\right) \frac{ (\mathbf{1},\rho_{t}) - \left(\mathbf{1},\rho_{t}^{N}\right) }{ (\mathbf{1},\rho_{t}) } \right]. \end{array} $$
If we realise that \(E\left [ (\mathbf {1},\rho _{t}) - \left (\mathbf {1},\rho _{t}^{N}\right) \right ]=0\) (again, a consequence of Lemma 2) and move the factor \((\mathbf{1},\rho_{t})^{-1}\) out of the expectation, then we easily rewrite Eq. (21) as
$$\begin{array}{*{20}l} {}E\left[\! \left(f,\pi_{t}^{N}\right)-(f,\pi_{t})\! \right] &\,=\, \frac{1}{(\mathbf{1},\rho_{t})} E\left[ \!\left(f,\pi_{t}^{N}\right) \left(\!(\mathbf{1},\rho_{t}) - \left(\mathbf{1},\rho_{t}^{N}\right)\! \right)\! \right] \\ &\quad- \frac{ (f,\pi_{t}) }{ (\mathbf{1},\rho_{t}) } E\left[ (\mathbf{1},\rho_{t}) - \left(\mathbf{1},\rho_{t}^{N}\right) \right] \\ &= \frac{1}{(\mathbf{1},\rho_{t})} E\left[ \left(\left(f,\pi_{t}^{N}\right)-\left(f,\pi_{t}\right) \right)\right.\\ &\qquad\qquad\quad\left.\left((\mathbf{1},\rho_{t})-\left(\mathbf{1},\rho_{t}^{N}\right) \right) \right] \\ &\le \frac{1}{(\mathbf{1},\rho_{t})} \sqrt{ E\left[ \left(\left(f,\pi_{t}^{N}\right)-\left(f,\pi_{t}\right) \right)^{2} \right] } \\ &\quad\times \sqrt{ E\left[ \left((\mathbf{1},\rho_{t})-\left(\mathbf{1},\rho_{t}^{N}\right) \right)^{2} \right] } \end{array} $$
$$\begin{array}{*{20}l} &\le \frac{1}{(\mathbf{1},\rho_{t})} \left(\frac{ c_{t} \| f \|_{\infty} }{ \sqrt{N} } \times \frac{ \tilde c_{t} }{ \sqrt{N} } \right) = \frac{ \hat c_{t} \| f \|_{\infty} }{ N }, \end{array} $$
where we have applied the Cauchy-Schwarz inequality to obtain (22), (23) follows from Lemmas 1 and 3 (the latter applied with \(f=\mathbf{1}\), so that \(\| \mathbf{1} \|_{\infty}=1\)) and
$$\hat c_{t} = \frac{ c_{t} \tilde c_{t} }{ (\mathbf{1},\rho_{t}) } < \infty $$
is a constant independent of N. □

The result in Theorem 4 was originally proved in [30], albeit by a different method.

For any \(f \in B({\mathcal X})\), let \({\mathcal E}_{t}^{N}(f)\) denote the approximation difference, i.e.
$${\mathcal E}_{t}^{N}(f) \triangleq \left(f,\pi_{t}^{N}\right) - \left(f,\pi_{t}\right). $$

This is a r.v. whose second-order moment yields the MSE of \(\left (f,\pi _{t}^{N}\right)\). It is straightforward to obtain a bound for the MSE from Lemma 1 and, by subsequently using Theorem 4, we readily find a similar bound for the variance of \({\mathcal E}_{t}^{N}(f)\), denoted \(\text {\sf Var}\left [{\mathcal E}_{t}^{N}(f)\right ]\). These results are explicitly stated by the corollary below.

Corollary 1

If \(0<(\mathbf{1},\rho_{t})<\infty\) for t=1,2,...,T and Assumptions 1 and 2 hold, then, for any \(f\in B({\mathcal X})\) and any 0≤t≤T, we obtain
$$\begin{array}{*{20}l} E\left[ \left({\mathcal E}_{t}^{N}\left(f\right) \right)^{2} \right] &\le \frac{ c_{t}^{2} \| f \|_{\infty}^{2} }{ N } \quad \text{and} \end{array} $$
$$\begin{array}{*{20}l} \text{\sf Var}\left[ {\mathcal E}_{t}^{N}\left(f\right) \right] &\le \frac{ \left(c_{t}^{v} \right)^{2} \| f \|_{\infty}^{2} }{ N }, \end{array} $$

where c t and \(c_{t}^{v}\) are finite constants independent of N.

Proof The inequality (24) for the MSE is a straightforward consequence of Lemma 1. Moreover, we can write the MSE in terms of the variance and the square of the bias, which yields
$$ {}E\left[ \left({\mathcal E}_{t}^{N}(f) \right)^{2} \right] = \text{\sf Var}\left[ {\mathcal E}_{t}^{N}(f) \right] + E^{2}\left[ {\mathcal E}_{t}^{N} \right] \le \frac{ c_{t}^{2} \| f \|_{\infty}^{2} }{ N }. $$

Since Theorem 4 ensures that \(\big | E\left [{\mathcal E}_{t}^{N}\right ] \big | \le \frac {\hat c_{t}\| f \|_{\infty }}{N}\), then the inequality (26) implies that there exists a constant \(c_{t}^{v}<\infty \) such that (25) holds. □

4.4 Error rate for the averaged estimators

Let us run M independent PFs with the same (fixed) sequence of observations \(Y_{1:T}=y_{1:T}\), T<∞, and N particles each. The random measures output by the mth filter are denoted
$$\xi_{t}^{m,N}, \quad \pi_{t}^{m,N} \quad \text{and} \quad \rho_{t}^{m,N}, \quad \text{with \(m = 1, 2,..., M\).} $$

Obviously, all the theoretical properties established in Section 4.3, as well as the basic Lemma 1, hold for each one of the M independent filters.

Definition 1

The ensemble approximation of π t with M independent filters is the discrete random measure \(\pi _{t}^{M \times N}\) constructed as
$$ \pi_{t}^{M \times N} = \frac{1}{M} \sum_{m=1}^{M} \pi_{t}^{m,N}, $$
and the averaged estimator of (f,π t ) is \(\left (f,\pi _{t}^{M\times N}\right)\).
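The construction in Definition 1 can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: it uses a toy scalar model (\(x_{t} = 0.9x_{t-1} + u_{t}\), \(y_{t} = x_{t} + v_{t}\), with standard Gaussian noises) that does not appear in this paper, takes f(x)=x as the test function, computes each per-filter estimate from the weighted particles before resampling, and runs the M filters one after another rather than on separate cores. The function names are hypothetical.

```python
import math
import random


def bootstrap_filter(ys, n_particles, rng):
    """Bootstrap PF on an assumed toy model; returns the last estimate of (f, pi_t), f(x) = x."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimate = 0.0
    for y in ys:
        # Propagate the particles through the (assumed) state equation.
        particles = [0.9 * x + rng.gauss(0.0, 1.0) for x in particles]
        # Weight the particles with the (assumed) Gaussian likelihood.
        weights = [math.exp(-0.5 * (y - x) ** 2) for x in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimate = sum(w * x for w, x in zip(weights, particles))
        # Multinomial resampling within this filter only.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimate


def ensemble_estimate(ys, n_filters, n_particles, seed=0):
    """Average the estimates of n_filters non-interacting filters."""
    per_filter = [
        bootstrap_filter(ys, n_particles, random.Random(seed + m))  # one generator per filter
        for m in range(n_filters)
    ]
    return sum(per_filter) / n_filters, per_filter


observations = [0.3, -0.1, 0.8, 1.2, 0.5]
averaged, per_filter = ensemble_estimate(observations, n_filters=8, n_particles=200)
print(averaged)
```

Because the filters never exchange particles or weights, the loop inside `ensemble_estimate` could be distributed over independent workers without any communication other than the final average.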

It is apparent that similar ensemble approximations can be given for ξ t and ρ t . Moreover, the statistical independence of the PFs yields the following corollary as a straightforward consequence of Theorem 4 and Corollary 1.

Corollary 2

If \(0<(\mathbf{1},\rho_{t})<\infty\) for t=1,2,...,T and Assumptions 1 and 2 hold, then, for any \(f\in B({\mathcal X})\) and any 0≤t≤T, the inequality
$$ {}E \left[ \left(\left(f,\pi_{t}^{M \times N}\right) - \left(f,\pi_{t}\right) \right)^{2} \right] \le \frac{ (c_{t}^{v})^{2} \| f \|_{\infty}^{2} }{ MN } + \frac{ \hat c_{t}^{2} \| f \|_{\infty}^{2} }{N^{2}} $$

holds for some constants \(c_{t}^{v}\) and \(\hat c_{t}\) independent of N and M.


Let us denote
$$\begin{array}{*{20}l} {\mathcal E}_{t}^{M \times N}\left(f\right) &= \left(f,\pi_{t}^{M \times N}\right) - \left(f,\pi_{t}\right) \quad \text{and} \\ {\mathcal E}_{t}^{m,N}\left(f\right) &= \left(f,\pi_{t}^{m, N}\right) - \left(f,\pi_{t}\right) \end{array} $$
for m=1,2,...,M. Since \(\pi _{t}^{M \times N}\) is a linear combination of i.i.d. random measures, we easily obtain that
$$\begin{array}{*{20}l} \left| E\left[ {\mathcal E}_{t}^{M \times N}(f) \right] \right|^{2} &= \left| \frac{1}{M} \sum_{m=1}^{M} E\left[ {\mathcal E}_{t}^{m,N}(f) \right] \right|^{2} \\ &= \left| E\left[ {\mathcal E}_{t}^{m,N}(f) \right] \right|^{2} \\ &\le \frac{\hat c_{t}^{2} \| f \|_{\infty}^{2}}{N^{2}}, \quad \text{for any \(m \le M\)}, \end{array} $$
where the inequality follows from Theorem 4. Moreover, again because of the independence of the random measures, we readily calculate a bound for the variance of \({\mathcal E}_{t}^{M \times N}(f)\),
$$ {}\text{\sf Var}\left[ {\mathcal E}_{t}^{M \times N}(f) \right] = \frac{1}{M} \text{\sf Var}\left[ {\mathcal E}_{t}^{m,N}(f) \right] \le \frac{ (c_{t}^{v})^{2} \| f \|_{\infty}^{2} }{ MN }, $$

where the inequality follows from Corollary 1. Since \(E\left [ \left ({\mathcal E}_{t}^{M \times N}\right)^{2} \right ] = \text {\sf Var}\left [ {\mathcal E}_{t}^{M\times N} \right ] + \left | E\left [ {\mathcal E}_{t}^{M \times N} \right ] \right |^{2}\), combining (29) and (28) yields (27) and concludes the proof. □

The inequality in Corollary 2 shows explicitly that the bias of the estimator \(\left (f,\pi _{t}^{M \times N}\right)\) cannot be arbitrarily reduced when N is fixed, even if M→∞. This feature was already discussed in Section 3.3. Note that the inequality (27) holds for any choice of M and N, while Theorem 2 yields asymptotic limits.

Remark 5

According to the inequality (27), the bias of the estimator \(\left (f,\pi _{t}^{M\times N}\right)\) is controlled by the number of particles per subset, N, and converges quadratically, while, for fixed N, the variance decays linearly with M. The MSE rate is \(\propto \frac {1}{MN} \) as long as N≥M. Otherwise, the term \(\frac {\hat c_{t}^{2} \| f \|_{\infty }^{2}}{N^{2}}\) becomes dominant and the resulting asymptotic error bound turns out higher.
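The trade-off described in Remark 5 can be explored numerically by evaluating the two terms of the bound (27) for a fixed budget K=MN. The sketch below is only indicative: the constants \(c_{t}^{v}\), \(\hat c_{t}\) and \(\| f \|_{\infty}\) are unknown in practice and are set to 1 here as placeholders, and the function name `mse_bound` is hypothetical.

```python
def mse_bound(m, n, c_var=1.0, c_bias=1.0, f_norm=1.0):
    """Return the (variance, squared-bias) terms of the bound (27).

    c_var, c_bias and f_norm are placeholders for c_t^v, hat c_t and ||f||_inf.
    """
    variance_term = (c_var * f_norm) ** 2 / (m * n)
    bias_term = (c_bias * f_norm) ** 2 / n ** 2
    return variance_term, bias_term


K = 10_000  # total number of particles, split as K = M * N
for m in (1, 10, 100, 1000):
    n = K // m
    variance_term, bias_term = mse_bound(m, n)
    dominant = "variance" if variance_term >= bias_term else "bias"
    print(f"M={m:5d} N={n:6d} variance={variance_term:.2e} bias^2={bias_term:.2e} ({dominant})")
```

With these placeholder constants the variance term is the same for every split of K, while the squared-bias term grows as soon as M exceeds N, in agreement with the condition N≥M.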

Remark 6

While the convergence results presented here have been proved for the standard bootstrap filter, it is straightforward to extend them to other classes of PFs for which Lemmas 1 and 2 hold.

4.5 Comparison of parallelisation schemes via time–error indices

The advantage of parallel computation is the drastic reduction of the time needed to run the PF. Let the running time for a PF with K particles be of order \({\mathcal T}(K)\), where \({\mathcal T}:\mathbb {N}\rightarrow (0,\infty)\) is some strictly increasing function of K. The quantity \({\mathcal T}(K)\) includes the time needed to generate new particles, weight them and perform resampling. The latter step is the bottleneck for parallelisation, as it requires the interaction of all K particles. Also, a ‘straightforward’ implementation of the resampling step leads to an execution time \({\mathcal T}(K)=K\log (K)\), although efficient algorithms exist that achieve linear time complexity, \({\mathcal T}(K)=K\). We can combine the MSE rate and the time complexity to propose a time–error performance metric.

Definition 2

We define the time–error index of a particle filtering algorithm with running time of order \({\mathcal T}\) and asymptotic MSE rate \({\mathcal R}\) as \({\mathcal C} \triangleq {\mathcal T} \times {\mathcal R}.\)

The smaller the index \({\mathcal C}\) for an algorithm, the more (asymptotically) efficient its implementation. For the standard (centralised) bootstrap filter (see Algorithm 1) with K particles, the running time is of order \({\mathcal T}(K)=K\) and the MSE rate is of order \({\mathcal R}(K)=\frac {1}{K}\); hence, the time–error index becomes
$${\mathcal C}_{bf}(K) = {\mathcal T}(K) \times {\mathcal R}(K) = 1. $$
For the computation of the ensemble approximation \(\pi _{t}^{M \times N}\), we can run M independent PFs in parallel, with N=K/M particles each and no interaction among them. Hence, the execution time becomes of order \({\mathcal T}(M,N)=N\). Since the error rate for the ensemble approximation is of order \({\mathcal R}(M,N)~=~\left (\frac {1}{MN}+\frac {1}{N^{2}}\right)\), the time–error index of the ensemble approximation is
$${\mathcal C}_{ens}(M,N) = {\mathcal T}(M,N) \times {\mathcal R}(M,N) = \frac{1}{M} + \frac{1}{N} $$
and hence it vanishes with M,N→∞. In particular, since we have to choose N≥M to ensure a rate of order \(\frac {1}{MN}\), then \({\lim }_{M \rightarrow \infty } {\mathcal C}_{ens} = 0\). In any case, whenever N>1 it is apparent that \({\mathcal C}_{ens} < {\mathcal C}_{bf}\).

We have described alternative ensemble approximations where M non-independent PFs are run with N particles each in Section 3. The overall error rates for these methods are the same as for the standard bootstrap filter; however, the time complexity depends not only on the number of particles N allocated to each of the M subsets, but also on the subsequent interactions among subsets.

Let us consider, for example, the double bootstrap algorithm with adaptive selection of [13] (namely, [13] (Algorithm 4)). This is a scheme where
  • M bootstrap filters (as Algorithm 1 in this paper) are run in parallel and an aggregate weight is computed for each one of them, denoted \(W_{t}^{(m)}\);

  • When the coefficient of variation (CV) of these aggregate weights is greater than a given threshold, the M bootstrap filters are resampled (some filters are discarded and others are replicated using a multinomial resampling procedure).

See [13] (Section 4.2) for details. Assuming that the resampling procedure in the second step above (termed island-level resampling in [13]) is performed, on average, once every L time steps, then the running time for this algorithm is
$${\mathcal T}(M,N,L) = \frac{L-1}{L} N + \frac{1}{L} MN = \frac{N}{L}(M+L-1), $$
while the approximation error is \({\mathcal R}(M,N) = \frac {1}{MN}\) (see ([13] Theorem 5)). Hence, the time–error index for this double bootstrap algorithm is
$${\mathcal C}_{dbf} = {\mathcal T}(M,N,L) \times {\mathcal R}(M,N) = \frac{M+L-1}{LM}. $$

When L<<N, we readily obtain that \({\mathcal C}_{ens} < {\mathcal C}_{dbf}\). For example, for a configuration with M = 10 filters and N = 100 particles each and assuming that island-level resampling is performed every L = 20 time steps on average, then \({\mathcal C}_{dbf} ~=~ 0.145\) and \({\mathcal C}_{ens}~=~0.110\). On the contrary, if L is large enough (namely, if L > N(M − 1)/M), the double bootstrap algorithm becomes more efficient, meaning that \({\mathcal C}_{dbf} ~<~ {\mathcal C}_{ens}\).
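The time–error indices of this section are simple enough to evaluate directly. The following sketch (function names are ours and purely illustrative) encodes the three indices, with linear-time resampling assumed as in the text, together with the crossover value of L beyond which the double bootstrap scheme has the smaller index.

```python
def index_bf(k):
    """Centralised bootstrap filter: T(K) = K and R(K) = 1/K."""
    return k * (1.0 / k)


def index_ens(m, n):
    """Ensemble of non-interacting PFs: T = N and R = 1/(MN) + 1/N^2."""
    return n * (1.0 / (m * n) + 1.0 / n ** 2)


def index_dbf(m, n, l):
    """Double bootstrap filter with island-level resampling once every l steps on average."""
    running_time = (l - 1) / l * n + m * n / l
    return running_time * (1.0 / (m * n))


def crossover_l(m, n):
    """Value of L above which index_dbf < index_ens, i.e. N(M - 1)/M."""
    return n * (m - 1) / m


M, N, L = 10, 100, 20
print(index_bf(M * N), index_ens(M, N), index_dbf(M, N, L), crossover_l(M, N))
```

For M = 10, N = 100 and L = 20, the sketch reproduces, up to floating-point rounding, the values 0.145 and 0.110 quoted above, and the crossover occurs at L = 90.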

Computing the time–error index for practical algorithms can be hard and highly dependent on the specific implementation. Different implementations of the double bootstrap algorithm, for example, may yield different time–error indices depending on how the island-level resampling step is carried out.

5 Numerical results and discussion

5.1 Example: Lorenz 63 model

5.1.1 The three-dimensional Lorenz system

Let us consider the problem of tracking the state of a three-dimensional Lorenz system [31] with additive dynamical noise and partial observations [32]. To be specific, consider a three-dimensional stochastic process \(\{X(s)\}_{s\in (0,\infty)}\) (s denotes continuous time) taking values on \(\mathbb {R}^{3}\), whose dynamics are described by the system of stochastic differential equations
$$\begin{array}{*{20}l} dX_{1} &= -{\sf s} (X_{1}-X_{2})\, ds + dW_{1}, \\ dX_{2} &= ({\sf r} X_{1} - X_{2} - X_{1}X_{3})\, ds + dW_{2}, \\ dX_{3} &= (X_{1}X_{2} - {\sf b} X_{3})\, ds + dW_{3}, \end{array} $$
where \(\{W_{i}(s)\}_{s\in (0,\infty)}\), i = 1,2,3, are independent one-dimensional Wiener processes and
$$({\sf s,r,b}) = \left(10, 28, \frac{8}{3} \right) $$
are static model parameters that yield chaotic dynamics. A discrete-time version of the latter system using Euler’s method with integration step \(T_{d}=10^{-3}\) is straightforward to obtain and yields the model
$$\begin{array}{*{20}l} X_{1,n} &= X_{1,n-1} - T_{d} {\sf s} (X_{1,n-1}-X_{2,n-1}) \\ &\quad+ \sqrt{T_{d}} U_{1,n}, \end{array} $$
$$\begin{array}{*{20}l} X_{2,n} &= X_{2,n-1} + T_{d} ({\sf r} X_{1,n-1} - X_{2,n-1} - X_{1,n-1}X_{3,n-1}) \\ &\quad+ \sqrt{T_{d}} U_{2,n}, \end{array} $$
$$\begin{array}{*{20}l} X_{3,n} &= X_{3,n-1} + T_{d} (X_{1,n-1}X_{2,n-1} - {\sf b} X_{3,n-1}) \\ &\quad+ \sqrt{T_{d}} U_{3,n}, \end{array} $$
where {Ui,n}n=0,1,..., i = 1,2,3, are independent sequences of i.i.d. normal random variables with 0 mean and variance 1. System (30)–(32) is partially observed every 100 discrete-time steps. Specifically, we collect a sequence of scalar observations {Y t }t=1,2,..., of the form
$$ Y_{t} = X_{1,100t} + V_{t}, $$

where {V t }t=1,2,... is a sequence of i.i.d. normal random variables with zero mean and variance \(\sigma ^{2} ~=~ \frac {1}{2}\).

Let \(X_{n}=(X_{1,n},X_{2,n},X_{3,n}) \in \mathbb {R}^{3}\) be the state vector at discrete time n. The dynamic model given by Eqs. (30)–(32) yields the family of kernels τn,θ(dx|xn−1), and the observation model of Eq. (33) yields the likelihood function
$$g_{t,\theta}^{y_{t}}(x_{100t}) \propto \exp\left\{ -\frac{1}{2\sigma^{2}}\left(y_{t} - x_{1,100t} \right)^{2} \right\}, $$
both in a straightforward manner. The goal is to track the sequence of joint posterior probability measures π t , t=1,2,..., for \(\{ \hat X_{t} \}_{t=1,...}\), where \(\hat X_{t} = X_{100t}\). Note that one can draw a sample \(\hat X_{t} = \hat x_{t}\) conditional on \(\hat X_{t-1} = \hat x_{t-1}\) by successively simulating
$$\tilde x_{n} \sim \tau_{n,\theta}(dx|\tilde x_{n-1}), \quad n=100(t-1)+1,..., 100t, $$
where \(\tilde x_{100(t-1)} = \hat x_{t-1}\) and \(\hat x_{t} = \tilde x_{100t}\). The prior measure for the state variables is normal, namely \(\hat X_{0} \sim {\mathcal N}\left (x_{*},v_{0}^{2} {\mathcal I}_{3}\right),\) where \(x_{*}=(-10.2410, -1.3984, -23.6752)\) is the mean and \(v_{0}^{2}{\mathcal I}_{3}\) is the covariance matrix, with \(v_{0}^{2} = 10\) and \({\mathcal I}_{3}\) the three-dimensional identity matrix.
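The sampling mechanism described above (100 Euler steps between consecutive observations) can be sketched as follows. This is a minimal illustration, not the Matlab code used for the experiments: the function names are hypothetical and the `noise_scale` argument is an extra knob, not part of the model, added only so that the deterministic part of Eqs. (30)–(32) can be checked in isolation.

```python
import math
import random

S, R, B = 10.0, 28.0, 8.0 / 3.0   # static parameters (s, r, b)
T_D = 1e-3                        # Euler integration step
STEPS_PER_OBS = 100               # one observation every 100 discrete-time steps


def euler_step(x, rng, noise_scale=1.0):
    """One step of Eqs. (30)-(32); noise_scale=0.0 switches the noise off (illustration only)."""
    x1, x2, x3 = x
    sd = noise_scale * math.sqrt(T_D)
    return (
        x1 - T_D * S * (x1 - x2) + sd * rng.gauss(0.0, 1.0),
        x2 + T_D * (R * x1 - x2 - x1 * x3) + sd * rng.gauss(0.0, 1.0),
        x3 + T_D * (x1 * x2 - B * x3) + sd * rng.gauss(0.0, 1.0),
    )


def propagate(x_prev, rng, noise_scale=1.0):
    """Draw the state at the next observation time by iterating the Euler step."""
    x = x_prev
    for _ in range(STEPS_PER_OBS):
        x = euler_step(x, rng, noise_scale)
    return x


def log_likelihood(y, x, sigma2=0.5):
    """Log of the likelihood g_t^{y_t}(x), up to an additive constant."""
    return -0.5 * (y - x[0]) ** 2 / sigma2


rng = random.Random(1)
x_next = propagate((1.0, 1.0, 1.0), rng)
print(x_next, log_likelihood(0.0, x_next))
```

Within a bootstrap filter, `propagate` would be applied to every particle at each observation time and `log_likelihood` would supply the (unnormalised) log-weights.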

5.1.2 Simulation setup

We aim at illustrating the gain in relative performance, taking into account both estimation errors and running time, that can be attained using ensembles of independent PFs. With this purpose, we have applied
  • The standard bootstrap filter (Algorithm 1), termed BF in the sequel, and

  • The ensemble of non-interacting bootstrap filters (NIBFs) that we have investigated in Section 4

to track the sequence of probability measures \(\pi_{t}\) generated by the three-dimensional Lorenz model described in Section 5.1.1. We have generated a sequence of 200 synthetic observations, \(\{y_{t}; t=1,...,200\}\), spread over an interval of 20 continuous time units, corresponding to \(2\times 10^{4}\) discrete time steps in the Euler scheme (hence, one observation every 100 steps).

The ensemble of NIBFs consists of M filters with N particles each, while the standard BF runs with K particles, where K=MN for a fair comparison.

We have coded the two algorithms in Matlab (version [R2010b] with the parallel computing toolbox) and run the experiments using a pool of identical multi-processor machines, each one having 8 cores at 3.16 GHz and 32 GB of RAM. The standard (centralised) BF is run with K=NM particles in a single core. For the ensemble of NIBFs, we allow the parallel computing toolbox to allocate all available cores per server in order to run all BFs concurrently.

To assess the approximation errors, we have computed empirical MSEs for the approximation of the posterior mean, \(E[\hat X_{t} | Y_{1:t}] ~=~ (I,\pi _{t})\), where I(x) = x is the identity function, for the two algorithms at the last update step, t = 200. Note, however, that the integral (I,π t ) cannot be computed in closed form for this system. Therefore, we have used the ‘expensive’ estimate
$$(I,\pi_{t}) \approx \left(I,\pi_{t}^{J}\right), \quad \text{with \(J=10^{5}\) particles}, $$
computed via the standard BF, as a proxy of the true value.

5.1.3 Numerical results

Figure 1 displays the empirical MSE, averaged over 100 independent simulation runs, attained by the parallel schemes when the number of filters is fixed, M=20, and the number of particles per filter (particle island) ranges from N=100 to N=1000. The outcome of the centralised BF with K = MN particles, hence ranging from K=20×100 to K=20×1000, is also shown for comparison. We observe that the proposed ensemble of NIBFs achieves a poor performance when the number of particles per filter, N, is relatively low (N = 100), while for moderate values (N ≥ 400) it nearly matches the MSE of the centralised BF.
Fig. 1

Empirical mean of the MSE for the centralised BF, with K = MN particles, and the ensemble of NIBFs, with M = 20 constant and N = 100,200,400,800,1000. All curves have been obtained from a set of 100 independent simulation trials. Note that the centralised BF is run with K = 20N particles, where N takes values in the same way as for the parallel algorithm

Next, we look into the relationship between the MSE and the running time for the two algorithms. With the number of filters M=20 fixed, we have run 100 independent simulation trials for each value N=100,200,400,800 and 1000 and computed the empirical MSE and the average running time for the parallel scheme and each combination of M and N. Correspondingly, we have also run the centralised BF with K = MN particles, hence for \(K=2\times 10^{3}, 4\times 10^{3}, 8\times 10^{3}, 16\times 10^{3}\) and \(20\times 10^{3}\).

Figure 2 displays the resulting empirical MSE versus the running time for the two methods. If we qualify an algorithm as more efficient than another one when it is capable of attaining a lower MSE in the same amount of time, then this set of simulations shows that the independent ensemble scheme is more efficient than the centralised BF. Indeed, a close look at Fig. 2 reveals that the ensemble of M=20 NIBFs with N=1000 particles per filter achieves an empirical MSE of \(\approx 6\times 10^{-4}\) with a running time of ≈ 2.9 s, while the centralised BF attains the same performance with K = 20 × 800 particles and a running time of ≈ 27.2 s (as shown by the dashed horizontal line in the plot).
Fig. 2

Empirical mean of the MSE versus the running time for the centralised BF and the independent ensemble of BFs. The parallel scheme is run with M = 20 constant and N = 100,200,400,600,800 and 1000. The centralised BF is run with K = 20N particles, where N takes values in the same way as for the parallel algorithm. The dashed horizontal lines indicate where the mean of the MSE matches for the independent ensemble and the centralised BF. The running times for the two algorithms at that MSE level are shown as labels on the horizontal axis

5.2 Example: Lorenz 96 model

5.2.1 The J-dimensional Lorenz 96 system

The Lorenz 96 model is a deterministic system of nonlinear differential equations that displays chaotic dynamics [33,34]. The system dimension, i.e. the number of dynamic variables, can be scaled arbitrarily. A stochastic version of the model can be easily obtained by converting each differential equation into a stochastic differential equation driven by an independent and additive Wiener process. In particular, a model with J variables, Z j , j=0,…,J−1, can be written down as the system of stochastic differential equations
$$ {}dZ_{j} = \left[ -Z_{j-1} \left(Z_{j-2} - Z_{j+1}\right) - Z_{j} + F \right] ds + \sigma dW_{j}, \quad j=0, \ldots, J-1, $$

where F = 8 is a constant forcing parameter, the Wiener processes \(\{W_{j}(s)\}_{0\le j\le J-1}\) are assumed independent and the scale parameter σ is known.

A straightforward application of the Euler-Maruyama integration method yields a discrete-time version of the stochastic Lorenz 96 model. If we let \(T_{d}>0\) denote the discretisation period and n denote discrete time, then we readily obtain
$$\begin{array}{*{20}l} {}Z_{j,n} &= Z_{j,n-1} - T_{d} \left[ Z_{j-1,n-1} \left(Z_{j-2,n-1} - Z_{j+1,n-1}\right) - F \right.\\ &\left.\quad+ Z_{j,n-1} \right] + \sqrt{T_{d}} \sigma U_{j,n}, \end{array} $$

where j=0,…,J−1 and \(\{U_{j,n}\}_{j,n\ge 0}\) are independent and identically distributed (i.i.d.) standard Gaussian r.v.’s.

We assume that observations can only be collected from this system once every \(n_{0}\) discrete time steps. Moreover, only the variables with even indices (j = 0,2,4,…,J−2, for even J) are measured. Therefore, the observation process has the form
$$ Y_{t} = \left[ Z_{0,n_{0}t}, Z_{2,n_{0}t}, \ldots, Z_{J-2,n_{0}t} \right]^{\top} + V_{t}, $$

where t=1,2,... and {V t }t≥1 is a sequence of i.i.d. r.v.’s with common pdf \({\mathcal N}\left (v_{t}; 0, \sigma _{y}^{2} {\mathcal I}_{\frac {J}{2}}\right)\), which denotes a \(\frac {J}{2}\)-dimensional Gaussian distribution with 0 mean and covariance matrix \(\sigma _{y}^{2} {\mathcal I}_{\frac {J}{2}}\).

Equations (35) and (36) describe a state space model that can be expressed in terms of the general notation in Section 2. The state process at discrete time n is \(\tilde X_{n}=\left [Z_{0,n}, \ldots, Z_{J-1,n} \right ]^{\top }\) and the transition kernel from time n−1 to time n is
$$ \tilde \tau_{n}\left(d\tilde x_{n}|\tilde x_{n-1}\right) = {\mathcal N}\left(\tilde x_{n}; \Psi\left(\tilde x_{n-1}\right), \sigma_{x}^{2} {\mathcal I}_{J}\right) d\tilde x_{n}, $$
where \({\mathcal {N}}(x; \mu, \Sigma)\) is the Gaussian density with argument x, mean μ and covariance matrix Σ, \(\sigma _{x}^{2} ~=~ T_{d} \sigma ^{2}\) and \(\Psi : \mathbb {R}^{J} \rightarrow \mathbb {R}^{J}\) is the deterministic transformation that accounts for all the terms on the right hand side of (35) except the noise contribution \(\sqrt {T_{d}}\sigma U_{j,n}\). Since we only collect observations every \(n_{0}T_{d}\) continuous-time units, we need to put the dynamics of the states on the same time scale as the observation process \(\{Y_{t}\}_{t\ge 1}\) in Eq. (36). If we define \(X_{t} ~=~ \tilde X_{n_{0}t}\) then the transition kernel from \(X_{t-1}\) to \(X_{t}\) follows readily from (37),
$$ \begin{aligned} {}\tau_{t}(dx_{t}|x_{t-1}) &= \int \cdots \int \tilde \tau_{n_{0}t}(dx_{t}|\tilde x_{n_{0}t-1})\\ &\quad\prod_{i=1}^{n_{0}-2} \tilde \tau_{n_{0}t-i}(\tilde x_{n_{0}t-i} | \tilde x_{n_{0}t-i-1}) \tilde \tau_{(t-1)n_{0}+1}\\ &\quad\ (d\tilde x_{(t-1)n_{0}+1} | x_{t-1}). \end{aligned} $$
While \(\tau_{t}(dx_{t}|x_{t-1})\) cannot be evaluated in closed form, it is straightforward to draw a sample from \(X_{t}|x_{t-1}\) by simply running Eq. (35) \(n_{0}\) times, with starting point \(x_{t-1}\). The likelihood function is
$$ g_{t}^{y_{t}}(x_{t}) \propto \exp\left\{ -\frac{1}{2\sigma_{y}^{2}} \sum_{r=0}^{\frac{J}{2}} (y_{r,t} - x_{2r,t})^{2} \right\}. $$
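The sampling and weighting steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab code: Eq. (35) is not reproduced in this section, so the sketch assumes the standard Lorenz 96 drift with cyclic indices, discretised with an Euler–Maruyama step of size \(T_{d}\); the forcing value and all function names are hypothetical. The observed components are taken to be the even-indexed ones, 0, 2, ..., J−2, so that the observation has \(\frac{J}{2}\) entries, matching the stated noise dimension.

```python
import math
import random


def lorenz96_drift(z, forcing):
    """Standard Lorenz 96 drift with cyclic indices (assumed form of Eq. (35))."""
    J = len(z)
    return [
        -z[(j - 1) % J] * (z[(j - 2) % J] - z[(j + 1) % J]) - z[j] + forcing
        for j in range(J)
    ]


def euler_maruyama_step(z, T_d, sigma, forcing, rng):
    """One discrete-time step: z + T_d * drift + sqrt(T_d) * sigma * U, U ~ N(0, I)."""
    drift = lorenz96_drift(z, forcing)
    scale = math.sqrt(T_d) * sigma
    return [zj + T_d * dj + scale * rng.gauss(0.0, 1.0) for zj, dj in zip(z, drift)]


def sample_transition(x_prev, n0, T_d, sigma, forcing, rng):
    """Draw X_t given x_{t-1} by iterating the discretised dynamics n0 times."""
    z = list(x_prev)
    for _ in range(n0):
        z = euler_maruyama_step(z, T_d, sigma, forcing, rng)
    return z


def log_likelihood(y, x, sigma_y2):
    """Unnormalised Gaussian log-likelihood of the observed components."""
    observed = x[0::2]  # assumption: components 0, 2, ..., J-2 are observed
    return -sum((yr - xr) ** 2 for yr, xr in zip(y, observed)) / (2.0 * sigma_y2)


if __name__ == "__main__":
    rng = random.Random(0)
    x_prev = [rng.gauss(0.0, 1.0) for _ in range(20)]  # hypothetical starting state
    x_new = sample_transition(x_prev, 10, 2e-4, 1.0 / math.sqrt(2.0), 8.0, rng)
    print(log_likelihood(x_new[0::2], x_new, 0.5))
```

Working with the logarithm of the likelihood is a design choice of the sketch; it avoids numerical underflow when the weights are normalised for large J.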

5.2.2 Simulation setup

We have run 100 independent simulations of the discretised Lorenz 96 model described in Section 5.2.1 above over 20 continuous-time units, with integration step \(T_{d}=2\times 10^{-4}\) (which amounts to \(10^{5}\) discrete-time steps) and, for each simulation, we have obtained noisy observations, with \(\sigma _{y}^{2}=\frac {1}{2}\) and \(n_{0}=10\), according to Eq. (36). The noise-scale parameter σ in the state Eq. (35) is set as \(\sigma =\frac {1}{\sqrt {2}}\), so that the noise variance becomes \(\sigma _{x}^{2}=\frac {T_{d}}{2}\).

The computer experiments are similar to those of Section 5.1. For each simulation, we have run M = 10 NIBFs with N particles each versus a centralised BF with K = 10N particles and used them to compute one-step-ahead predictions of the observations. In particular, at discrete time t, we have computed predictions of the observation vector \(y_{t}\), using the measures
$$\begin{aligned} {}\xi_{t}^{K} &= \frac{1}{K} \sum_{k=1}^{K} \delta_{\bar x_{t}^{(k)}} \quad \text{and} \quad \xi_{t}^{M \times N} = \frac{1}{M} \sum_{m=1}^{M} \xi_{t}^{m,N} \\ &= \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} \delta_{\bar x_{t}^{(m,n)}}, \end{aligned} $$
for the centralised BF and the NIBFs, respectively. To be specific, if \(y_{t}=\left (y_{0,t}, y_{1,t}, \ldots, y_{\frac {J}{2},t}\right)\), we have computed estimates
$$y_{r,t}^{K} = \frac{1}{K}\sum_{k=1}^{K} \bar x_{2r,t}^{(k)}, \quad y_{r,t}^{M \times N} = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N} \bar x_{2r,t}^{(m,n)}; $$
and then we have averaged the quadratic errors \(\left (y_{r,t} - y_{r,t}^{K} \right)^{2}\) and \(\left (y_{r,t} - y_{r,t}^{M\times N} \right)^{2}\) over r, t and 100 independent simulation runs. Finally, we have normalised the resulting empirical MSE with respect to the observation power \(\frac {2}{J}\sum _{r=0}^{\frac {J}{2}} E\left [y_{r,t}^{2}\right ]\). Note that, in this case, we have used the actual observations generated in the simulations to obtain the errors, instead of the proxy values in Section 5.1.
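The estimators and the error metric described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the observed components are assumed to be the even-indexed ones, and the expectation in the observation power is replaced by an empirical average over the available observations, which is an assumption of the sketch rather than a detail stated in the text.

```python
def predict_observation(particles):
    """Average the even-indexed components over one set of predictive particles."""
    n = len(particles)
    dim = len(particles[0])
    return [sum(p[i] for p in particles) / n for i in range(0, dim, 2)]


def ensemble_prediction(particle_sets):
    """Average the per-filter predictions of M filters with N particles each."""
    per_filter = [predict_observation(ps) for ps in particle_sets]
    m = len(per_filter)
    return [sum(col) / m for col in zip(*per_filter)]


def normalised_mse(observations, predictions):
    """Mean squared prediction error divided by the empirical observation power."""
    sq_err = 0.0
    power = 0.0
    count = 0
    for y, y_hat in zip(observations, predictions):
        for yr, yr_hat in zip(y, y_hat):
            sq_err += (yr - yr_hat) ** 2
            power += yr ** 2
            count += 1
    return (sq_err / count) / (power / count)
```

Because every filter holds the same number N of particles, averaging the M per-filter predictions gives the same value as pooling all MN particles, which is the double sum in the expression for \(y_{r,t}^{M \times N}\).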

The experiments have been carried out for a Lorenz 96 model with J=20 variables first and then for the same model with J=50 variables.

The simulations have been coded using Matlab version R2016b (64 bits), with the parallel computing toolbox enabled, on an 8-core Intel(R) Xeon(R) CPU E5-2680 v2 server, with clock frequency 2.80 GHz and 64 GB of RAM. All the results reported are averaged over 100 independent simulation runs as described above.

5.2.3 Numerical results

Figure 3 plots the normalised MSE attained by the centralised BF and the ensemble of M=10 NIBFs versus N. The centralised BF is run with K=10N particles, while each one of the M=10 NIBFs is run with N particles. The figure shows results for two different state-space dimensions. The solid lines correspond to a stochastic Lorenz 96 model with J=20 variables. In this case, the outcome of the simulations is similar to the experiments with the Lorenz 63 system: for K=MN, the centralised BF attains a smaller MSE than the NIBFs, with the gap closing as N increases. The result of the experiment is different when the state space dimension is increased to J=50. In this case, the normalised MSE of the NIBFs is slightly smaller than the error of the centralised BF for N<1600, with both estimators attaining the same performance for N=1600. Hence, for this example, the averaging of the NIBFs has a beneficial effect on the accuracy of the estimators, at least for certain combinations of the number of particles N and the dimension J. We have verified that the bias of the centralised estimator \(y_{t}^{K}\) is smaller than the bias of the estimator \(y_{t}^{M\times N}\), as predicted by the theoretical analysis, while \(y_{t}^{M\times N}\) attains a smaller empirical variance than \(y_{t}^{K}\) (at least for N<1600 and 40≤J≤100).
Fig. 3

Empirical mean of the normalised MSE for the centralised BF, with K=10N particles, and the average of M=10 NIBFs with N=20,40,80,160,320,1600. Results are shown for a Lorenz 96 system with J=20 variables (solid lines) and a Lorenz 96 model with J=50 variables (dashed lines). All curves have been obtained from a set of 100 independent simulation trials

Figure 4 displays the results of the same computer experiment as in Fig. 3, except that instead of averaging the MSE over the 100 independent simulation runs, we display the maximum MSE, both for the centralised BF and the ensembles of NIBFs, out of the 100 simulations for each one of the values of N. We observe that the ensemble of NIBFs is more robust than the centralised BF. While for dimension J = 20 the centralised BF attains a clearly lower average MSE than the NIBFs, the maximum MSE turns out to be similar for both algorithms. For dimension J = 50, the average MSE of the NIBFs is already lower (as shown in Fig. 3) than the average MSE of the BF, and the advantage of the parallelised algorithm increases when we look at the maximum MSE.
Fig. 4

Empirical maximum (out of 100 independent simulation runs) of the normalised MSE for the centralised BF, with K=10N particles, and the average of M=10 NIBFs with N=20,40,80,160,320,1600. Results are shown for a Lorenz 96 system with J=20 variables (solid lines) and a Lorenz 96 model with J=50 variables (dashed lines)

Figure 5 plots the same normalised MSE values of Fig. 3 versus the running times of the algorithms, given in seconds, for a complete simulation with \(10^{5}\) discrete time steps. As in the experiments of Section 5.1.3, the NIBFs can attain the same MSE as the centralised BF in just a fraction of the running time. While the improvement can be, ideally, of a factor M (with M=10 in this case), in practice it depends on the efficiency of the computing software. With the version of Matlab (R2016b, with the parallel computing toolbox) and the 8-core Intel Xeon processor used in these experiments, the running time of the centralised BF with K=10N particles was reduced by a modest factor of 2.6 for J=20 when using M=10 parallel NIBFs with N particles each. For J=50, however, the running time was reduced by a factor of 6.6. The difference is due to the ability of the Matlab software to parallelise more efficiently when handling larger vectors. From this figure, we observe that, for J=50, the NIBFs attain the same minimum error as the centralised BF (a normalised MSE of ≈ 0.0138) with a running time that is 6.6 times smaller (464 versus 3,082 s).
Fig. 5

Empirical mean of the normalised MSE versus the running time for the centralised BF with K = 10N particles and the ensemble of M = 10 NIBFs with N = 20,40,80,160,320,1600 particles each. Results are shown for a Lorenz 96 system with J = 20 variables (solid lines) and a Lorenz 96 model with J = 50 variables (dashed lines). All curves have been obtained from a set of 100 independent simulation trials
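
As a worked check of the running-time figures quoted for Fig. 5, the speed-up for J=50 follows directly from the two reported running times. The ratio S/M, written here as a "parallel efficiency" relative to the ideal factor M=10, is an illustrative quantity introduced only for this check:

$$ S_{J=50} = \frac{3082~\text{s}}{464~\text{s}} \approx 6.64, \qquad \frac{S_{J=50}}{M} \approx \frac{6.64}{10} \approx 0.66, \qquad \frac{S_{J=20}}{M} \approx \frac{2.6}{10} = 0.26. $$

Under this reading, the J=50 experiment recovers roughly two thirds of the ideal gain and the J=20 experiment roughly one quarter.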

6 Conclusions

We have presented a survey of methods for the parallelisation of particle filters. Specifically, we have described the basic parallelisation scheme based on ensembles of statistically independent PFs and then discussed three alternatives which introduce different degrees of interaction among the concurrently running filters. We have placed emphasis on the theoretical guarantees of the algorithms, and, hence, we have stated conditions for the convergence of all the techniques, including the DRNA-based PF of [2], the particle island model of [13] and the α-SMC method of [12].

In the second half of the paper, we have focused on the theoretical properties of the ensemble of non-interacting PFs. For this method, we have shown, both numerically and through the definition of time–error indices, that the averaging of statistically independent PFs should be preferred when N, the number of particles per independent filter, can be made sufficiently large to reduce the bias. This is often the case when using many-core computers (or computing clusters). When parallelisation is implemented using many low-power devices (such as GPUs), parallelisation with interaction is more efficient. Our numerical experiments for the stochastic Lorenz 96 model also show that the averaging of independent estimators can lead to lower estimation errors, compared to a centralised bootstrap filter with the same number of particles, as the dimension of the state space is increased.

7 Appendix 1

8 Proof of Lemma 2

We proceed by induction in the time index t. For t = 0, ρ0=τ0=π0 and, since \(x_{0}^{(i)}\), i=1,...,N, are drawn from π0, the equality \(E\left [\left (f,\rho _{0}^{N}\right)\right ] = \left (f,\rho _{0}\right)\) is straightforward.

Let us assume that
$$ E\left[ \left(f,\rho_{t-1}^{N}\right) \right] = \left(f,\rho_{t-1}\right) $$
for some t>0 and any \(f \in B({\mathcal X})\). If we use \(\bar {\mathcal F}_{t}\) to denote the σ-algebra generated by the set of random variables \(\left \{ x_{0:t-1}^{(i)}, \bar x_{1:t}^{(i)} : 1 \le i \le N \right \}\) then we readily find that
$$ {}E\left[ \left(f,\rho_{t}^{N}\right) | \bar {\mathcal F}_{t} \right] = E\left[ G_{t}^{N} \left(f, \pi_{t}^{N}\right) | \bar {\mathcal F}_{t} \right] = G_{t}^{N} \left(f, \bar \pi_{t}^{N}\right), $$
since \(G_{t}^{N}\) is measurable w.r.t. \(\bar {\mathcal F}_{t}\) and \(E\left [\left (f,\pi _{t}^{N}\right)|\bar {\mathcal F}_{t}\right ] = \left (f,\bar \pi _{t}^{N}\right)\). Moreover, if we recall that
$$\begin{array}{*{20}l} {}\left(f,\bar \pi_{t}^{N}\right) &= \sum_{i=1}^{N} w_{t}^{(i)} f\left(\bar x_{t}^{(i)}\right) = \sum_{i=1}^{N} \frac{ g_{t}^{y_{t}}\left(\bar x_{t}^{(i)}\right) f\left(\bar x_{t}^{(i)}\right) }{ \sum_{j=1}^{N} g_{t}^{y_{t}}\left(\bar x_{t}^{(j)}\right) } \\&= \frac{ \left(fg_{t}^{y_{t}},\xi_{t}^{N}\right) }{ \left(g_{t}^{y_{t}},\xi_{t}^{N}\right) } \end{array} $$
then it is apparent from the definition of \(G_{t}^{N}\) in (9) that
$$ G_{t}^{N} \left(f, \bar \pi_{t}^{N}\right) = G_{t-1}^{N} \left(fg_{t}^{y_{t}},\xi_{t}^{N}\right). $$
Taking together (40) and (41), we have
$$ E\left[ \left(f,\rho_{t}^{N}\right) | \bar {\mathcal F}_{t} \right] = G_{t-1}^{N} \left(fg_{t}^{y_{t}},\xi_{t}^{N}\right). $$
Let \({\mathcal F}_{t-1}\) be the σ-algebra generated by the set of variables \(\left \{ x_{0:t-1}^{(i)}, \bar x_{0:t-1}^{(i)} : 1 \le i \le N \right \}\). Since \({\mathcal F}_{t-1} \subseteq \bar {\mathcal F}_{t}\), Eq. (42) yields
$$\begin{array}{*{20}l} E\left[ \left(f,\rho_{t}^{N}\right) | {\mathcal F}_{t-1} \right] &= E\left[ G_{t-1}^{N} \left(fg_{t}^{y_{t}},\xi_{t}^{N}\right) | {\mathcal F}_{t-1} \right] \\ &= G_{t-1}^{N} E\left[ \left(fg_{t}^{y_{t}},\xi_{t}^{N}\right) | {\mathcal F}_{t-1} \right], \end{array} $$
since \(G_{t-1}^{N}\) is measurable w.r.t. \({\mathcal F}_{t-1}\). Moreover, for any \(h \in B({\mathcal X})\), it is straightforward to show that
$$E\left[\left(h,\xi_{t}^{N}\right) | {\mathcal F}_{t-1} \right] = \left(h, \tau_{t}\pi_{t-1}^{N}\right) = \left((h,\tau_{t}), \pi_{t-1}^{N} \right), $$
hence, as \(f g_{t}^{y_{t}} \in B({\mathcal X})\), we readily obtain
$$ E\left[ \left(fg_{t}^{y_{t}},\xi_{t}^{N}\right) | {\mathcal F}_{t-1} \right] = \left(\left(fg_{t}^{y_{t}},\tau_{t}\right), \pi_{t-1}^{N} \right). $$
Substituting (44) into (43), we arrive at
$$\begin{array}{*{20}l} E\left[ \left(f,\rho_{t}^{N}\right) | {\mathcal F}_{t-1} \right] &= G_{t-1}^{N} \left(\left(fg_{t}^{y_{t}},\tau_{t}\right), \pi_{t-1}^{N} \right) \\ &= \left(\left(fg_{t}^{y_{t}},\tau_{t}\right), \rho_{t-1}^{N} \right), \end{array} $$
where (45) follows from the definition of the estimate of ρt−1, namely \(\rho _{t-1}^{N} ~=~ G_{t-1}^{N}\pi _{t-1}^{N}\). If we take unconditional expectations on both sides of Eq. (45), we obtain
$$\begin{array}{*{20}l} E\left[ \left(f,\rho_{t}^{N}\right) \right] &= E\left[ \left(\left(fg_{t}^{y_{t}},\tau_{t}\right), \rho_{t-1}^{N} \right) \right] \\ &= \left(\left(fg_{t}^{y_{t}},\tau_{t}\right), \rho_{t-1} \right) \end{array} $$
$$\begin{array}{*{20}l} &= \left(f, g_{t}^{y_{t}} \cdot \tau_{t} \rho_{t-1}\right) \end{array} $$
$$\begin{array}{*{20}l} &= \left(f,\rho_{t}\right), \end{array} $$

where equality (46) follows from the induction hypothesis (39), (47) is obtained by simply re-ordering (46) and Eq. (48) follows from the recursive definition of \(\rho_{t}\) in (5).
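
The unbiasedness property proved above can be checked numerically on a toy example. The sketch below is an illustration under stated assumptions, not part of the original paper: it uses a hypothetical two-state model with fixed likelihood values, computes \((f,\rho_{t})\) exactly through the recursion \(\rho_{t} = g_{t}^{y_{t}} \cdot \tau_{t}\rho_{t-1}\), and compares it with the Monte Carlo average of \(G_{t}^{N}\left(f,\pi_{t}^{N}\right)\) over many independent bootstrap filters with multinomial resampling. The factor \(G_{t}^{N}\) is taken as the running product of the averaged unnormalised weights, consistent with the identity \(G_{t}^{N}\left(f,\bar\pi_{t}^{N}\right) = G_{t-1}^{N}\left(fg_{t}^{y_{t}},\xi_{t}^{N}\right)\) used in the proof.

```python
import random

# Hypothetical two-state model, chosen only for illustration.
PRIOR = [0.5, 0.5]                                   # pi_0
TRANSITION = [[0.9, 0.1], [0.2, 0.8]]                # tau_t(x' | x), same for all t
LIKELIHOODS = [[0.7, 0.2], [0.3, 0.6], [0.5, 0.4]]   # g_t(x) for t = 1, 2, 3


def exact_rho(f):
    """Compute (f, rho_T) exactly via rho_t = g_t . tau_t rho_{t-1}, rho_0 = pi_0."""
    rho = list(PRIOR)
    for g in LIKELIHOODS:
        predicted = [sum(rho[x] * TRANSITION[x][y] for x in range(2)) for y in range(2)]
        rho = [g[y] * predicted[y] for y in range(2)]
    return sum(f(x) * rho[x] for x in range(2))


def bootstrap_rho_estimate(f, n_particles, rng):
    """One bootstrap-filter run returning (f, rho_T^N) = G_T^N (f, pi_T^N)."""
    particles = rng.choices([0, 1], weights=PRIOR, k=n_particles)
    G = 1.0
    for g in LIKELIHOODS:
        # Sampling step: propagate each particle through the transition kernel.
        proposed = [1 if rng.random() < TRANSITION[x][1] else 0 for x in particles]
        weights = [g[x] for x in proposed]
        # G_t^N = G_{t-1}^N (g_t, xi_t^N): product of averaged unnormalised weights.
        G *= sum(weights) / n_particles
        # Multinomial resampling according to the normalised weights.
        particles = rng.choices(proposed, weights=weights, k=n_particles)
    return G * sum(f(x) for x in particles) / n_particles


def monte_carlo_mean(f, n_particles, n_runs, seed):
    """Average the estimator over independent filter runs."""
    rng = random.Random(seed)
    total = sum(bootstrap_rho_estimate(f, n_particles, rng) for _ in range(n_runs))
    return total / n_runs
```

With a small number of particles per filter (say N=20) the average over many runs should still match the exact value up to Monte Carlo error, since the lemma holds for every N and not only asymptotically.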

9 Appendix 2

10 Proof of Lemma 3

For t=0, \(\rho _{0}^{N} = \pi _{0}^{N}\), hence the result follows from Lemma 1. At any time t>0, since \(\rho _{t}^{N} = G_{t}^{N} \pi _{t}^{N}\), we readily have
$$\begin{array}{*{20}l} {}E\left[ \left| \left(f,\rho_{t}^{N}\right) \,-\, (f,\rho_{t}) \right|^{p} \right] &= E\left[ \left| \frac{1}{N} \sum_{i=1}^{N} G_{t}^{N} f\left(\!x_{t}^{(i)}\!\right) \,-\, (f,\rho_{t}) \right|^{p} \right] \\ &= E\left[ \left| \frac{1}{N} \sum_{i=1}^{N} Z_{t}^{(i)} \right|^{p} \right], \end{array} $$

where \(Z_{t}^{(i)} = G_{t}^{N} f\left (x_{t}^{(i)}\right) - (f,\rho _{t})\), i=1,...,N. It is apparent that the random variables \(Z_{t}^{(i)}\), i=1,...,N, are conditionally independent given the σ-algebra \(\bar {\mathcal F}_{t}\) generated by the set \(\left \{ x_{0:t-1}^{(j)}, \bar x_{0:t}^{(j)} : 1 \le j \le N \right \}\). It can also be proved that every \(Z_{t}^{(i)}\) is centred and bounded, as explicitly shown in the sequel.

To see that \(Z_{t}^{(i)}\) has zero mean, let us note first that
$$ E\left[ G_{t}^{N} f\left(x_{t}^{(i)}\right) | \bar {\mathcal F}_{t} \right] = G_{t}^{N} \left(f,\bar \pi_{t}^{N}\right), $$
since \(G_{t}^{N}\) is measurable w.r.t. \(\bar {\mathcal F}_{t}\). Moreover, by the same argument as in the proof of Lemma 2, one can show that \(G_{t}^{N}\left (f,\bar \pi _{t}^{N}\right) ~=~ G_{t-1}^{N} \left (fg_{t}^{y_{t}},\xi _{t}^{N}\right)\) and, therefore,
$$\begin{array}{*{20}l} {}E\left[ G_{t}^{N} f\left(x_{t}^{(i)}\right) | {\mathcal F}_{t-1} \right] &= E\left[ G_{t-1}^{N} \left(fg_{t}^{y_{t}},\xi_{t}^{N}\right) | {\mathcal F}_{t-1} \right] \\ &= G_{t-1}^{N} \left(\left(fg_{t}^{y_{t}},\tau_{t}\right), \pi_{t-1}^{N}\right), \end{array} $$
where we have used the fact that, for any \(h \in B({\mathcal X})\), \(E\left [ \left (h,\xi _{t}^{N}\right) | {\mathcal F}_{t-1} \right ] = \left ((h,\tau _{t}),\pi _{t-1}^{N}\right)\). However, since \(\rho _{t-1}^{N} ~=~ G_{t-1}^{N} \pi _{t-1}^{N}\), Eq. (50) amounts to
$$ E\left[ G_{t}^{N} f\left(x_{t}^{(i)}\right) | {\mathcal F}_{t-1} \right] = \left(\left(fg_{t}^{y_{t}},\tau_{t}\right),\rho_{t-1}^{N} \right) $$
and taking (unconditional) expectations on both sides of the equation above yields
$$\begin{array}{*{20}l} E\left[ G_{t}^{N} f\left(x_{t}^{(i)}\right) \right] &= E\left[ \left(\left(fg_{t}^{y_{t}},\tau_{t}\right),\rho_{t-1}^{N} \right) \right] \\ &= \left(\left(fg_{t}^{y_{t}},\tau_{t}\right),\rho_{t-1} \right) \end{array} $$
$$\begin{array}{*{20}l} &= (f,\rho_{t}), \end{array} $$

where (51) follows from Lemma 2 (i.e. \(\rho _{t-1}^{N}\) is unbiased) and (52) is a straightforward consequence of the definition of \(\rho_{t}\) in (5). Eq. (52) states that \(E\left [ Z_{t}^{(i)} \right ] = E\left [ G_{t}^{N} f\left (x_{t}^{(i)}\right) - \left (f,\rho _{t}\right) \right ] = 0\).

To see that (every) \(Z_{t}^{(i)}\) is bounded, note that
$$ G_{t}^{N} \le \prod_{k=1}^{t} \| g_{k}^{y_{k}} \|_{\infty} < \infty, \quad \text{for any \(t<\infty\),} $$
$$\begin{array}{*{20}l} \left(f,\rho_{t}\right) &= \left(\left(fg_{t}^{y_{t}},\tau_{t}\right),\rho_{t-1}\right) \\ &= \left(\left(\left(\left(fg_{t}^{y_{t}},\tau_{t}\right)g_{t-1}^{y_{t-1}},\tau_{t-1}\right)g_{t-2}^{y_{t-2}},..., \tau_{1}\right), \pi_{0}\right) \\ &\le \|f \|_{\infty} \prod_{k=1}^{t} \| g_{k}^{y_{k}} \|_{\infty} < \infty. \end{array} $$
Taking (53) and (54) together, we arrive at
$$ | Z_{t}^{(i)} | \le 2\| f \|_{\infty} \prod_{k=1}^{t} \| g_{k}^{y_{k}} \|_{\infty} $$

which is finite for any finite t (indeed, for every t ≤ T).

Since the variables \(Z_{t}^{(i)}\), i = 1,...,N, in (49) are bounded, with zero mean and conditionally independent given \(\bar {\mathcal F}_{t}\), it is not difficult to show (see, e.g. [23] (Lemma A.1)) that
$$ {}E\left[ \left| \left(f,\rho_{t}^{N}\right) - (f,\rho_{t}) \right|^{p} \right] \le \frac{ 2^{p} \breve c_{t}^{p} \| f \|_{\infty}^{p} \prod_{k=1}^{t} \| g_{k}^{y_{k}} \|_{\infty}^{p} }{ N^{\frac{p}{2}} }, $$

where the constant \(\breve c_{t}\) is finite and independent of N. From (56), we easily obtain the inequality (16) in the statement of Lemma 3, with \(\tilde c_{t} ~=~ 2 \breve c_{t} \| f \|_{\infty } \prod _{k=1}^{t} \| g_{k}^{y_{k}} \|_{\infty } < \infty \) for any t ≤ T < ∞.
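
For completeness, the last step can be written out explicitly. Assuming, as the definition of \(\tilde c_{t}\) suggests, that inequality (16) is stated in terms of the \(L_{p}\) norm of the error, taking p-th roots on both sides of (56) gives

$$ \left\| \left(f,\rho_{t}^{N}\right) - (f,\rho_{t}) \right\|_{p} = E\left[ \left| \left(f,\rho_{t}^{N}\right) - (f,\rho_{t}) \right|^{p} \right]^{\frac{1}{p}} \le \frac{ 2 \breve c_{t} \| f \|_{\infty} \prod_{k=1}^{t} \| g_{k}^{y_{k}} \|_{\infty} }{ \sqrt{N} } = \frac{\tilde c_{t}}{\sqrt{N}}. $$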


Note that \(G^{N}_{t}\) is an estimate of the normalising constant for π t (namely, the integral (1,ρ t )) which can be shown to be unbiased under mild assumptions [4]. In Bayesian model selection, this constant is termed ‘model evidence’, while in parameter estimation problems, it is often referred to as the likelihood (of the unknown parameters) [27].


Other particle filtering algorithms can be applied in a straightforward way; however, we assume bootstrap filters (i.e. the procedure of Algorithm 1) for the sake of clarity and notational simplicity.


Note the difference in notation between the continuous time s and the parameter s.


Chosen from a typical trajectory of the deterministic Lorenz 63 model.


The deterministic Lorenz 96 system is chaotic for F > 6, with increasing turbulence of the chaotic flow as F is made larger.




The authors thank Dr. Katrin Achutegui for her valuable assistance in obtaining and plotting the numerical results in Section 4.


This work was partially supported by Ministerio de Economía y Competitividad of Spain (TEC2012-38883-C02-01 COMPREHENSION and TEC2015-69868-C2-1-R ADVENTURE) and the Office of Naval Research Global (N62909- 15-1-2011). D. C. and J. M. acknowledge the support of the Isaac Newton Institute through the program Monte Carlo Inference for High-Dimensional Statistical Models.

Authors’ contributions

DC and JM carried out the analysis and obtained the theoretical results. JM and GR-M coded the algorithms and ran the computer experiments. All authors collaborated in the composition of the manuscript. The authors are listed in alphabetical order. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

Department of Mathematics of Imperial College London, London, UK
Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Madrid, Spain
Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), Madrid, Spain


  1. G Hendeby, R Karlsson, F Gustafsson, Particle filtering: the need for speed. EURASIP J. Adv. Sig. Process. 2010, 22 (2010).
  2. M Bolić, PM Djurić, S Hong, Resampling algorithms and architectures for distributed particle filters. IEEE Trans. Sig. Process. 53(7), 2442–2450 (2005).
  3. A Doucet, N de Freitas, N Gordon, Sequential Monte Carlo Methods in Practice (Springer, New York, 2001).
  4. P Del Moral, Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications (Springer-Verlag, New York, 2004).
  5. O Cappé, SJ Godsill, E Moulines, An overview of existing methods and recent advances in sequential Monte Carlo. Proc. IEEE. 95(5), 899–924 (2007).
  6. A Bain, D Crisan, Fundamentals of Stochastic Filtering (Springer-Verlag, New York, 2008).
  7. A Gelencsér-Horváth, G Tornai, A Horváth, G Cserey, Fast, parallel implementation of particle filtering on the GPU architecture. EURASIP J. Adv. Sig. Process. 2013(1), 1–16 (2013).
  8. J Míguez, Analysis of selection methods for cost-reference particle filtering with applications to maneuvering target tracking and dynamic optimization. Digit. Sig. Process. 17, 787–807 (2007).
  9. O Hlinka, O Sluciak, F Hlawatsch, P Djuric, M Rupp, Likelihood consensus and its application to distributed particle filtering. IEEE Trans. Sig. Process. 60(8), 4334–4349 (2012).
  10. J Miguez, MA Vázquez, A proof of uniform convergence over time for a distributed particle filter. Sig. Process. 122, 152–163 (2016).
  11. K Heine, N Whiteley, Fluctuations, stability and instability of a distributed particle filter with local exchange. Stoch. Process. Appl. 127(8), 2508–2541 (2017).
  12. N Whiteley, A Lee, K Heine, On the role of interaction in sequential Monte Carlo algorithms. Bernoulli. 22(1), 494–529 (2016).
  13. C Vergé, C Dubarry, P Del Moral, E Moulines, On parallel implementation of sequential Monte Carlo methods: the island particle model. Stat. Comput. 25(2), 243–260 (2015).
  14. P Del Moral, E Moulines, J Olsson, C Vergé, Convergence properties of weighted particle islands with application to the double bootstrap algorithm. Stoch. Syst. 6(2), 367–419 (2016).
  15. W Han, On the Numerical Solution of the Filtering Problem (Ph.D. Thesis, Department of Mathematics, Imperial College London, 2013).
  16. N Gordon, D Salmond, AFM Smith, Novel approach to nonlinear and non-Gaussian Bayesian state estimation. IEE Proc.-F. 140(2), 107–113 (1993).
  17. A Doucet, N de Freitas, N Gordon, in Sequential Monte Carlo Methods in Practice, ed. by A Doucet, N de Freitas, and N Gordon. An introduction to sequential Monte Carlo methods (Springer-Verlag, New York, 2001), pp. 4–14. Chapter 1.
  18. A Doucet, S Godsill, C Andrieu, On sequential Monte Carlo sampling methods for Bayesian filtering. Stat. Comput. 10(3), 197–208 (2000).
  19. R Douc, O Cappé, in Image and Signal Processing and Analysis, 2005. ISPA 2005. Proceedings of the 4th International Symposium on. Comparison of resampling schemes for particle filtering (IEEE, 2005).
  20. D Crisan, A Doucet, A survey of convergence results on particle filtering. IEEE Trans. Sig. Process. 50(3), 736–746 (2002).
  21. N Chopin, A sequential particle filter method for static models. Biometrika. 89(3), 539–552 (2002).
  22. XL Hu, TB Schon, L Ljung, A basic convergence result for particle filtering. IEEE Trans. Sig. Process. 56(4), 1337–1348 (2008).
  23. D Crisan, J Miguez, Particle-kernel estimation of the filter density in state-space models. Bernoulli. 20(4), 1879–1929 (2014).
  24. J Míguez, D Crisan, PM Djurić, On the convergence of two sequential Monte Carlo methods for maximum a posteriori sequence estimation and stochastic global optimization. Stat. Comput. 23(1), 91–107 (2013).
  25. P Del Moral, A Guionnet, On the stability of interacting processes with applications to filtering and genetic algorithms. Ann. l’Institut Henri Poincaré, (B) Probab. Stat. 37(2), 155–194 (2001).
  26. J Miguez, in IEEE 8th Sensor Array and Multichannel Sig. Process. Workshop (SAM). On the uniform asymptotic convergence of a distributed particle filter (IEEE, 2014), pp. 241–244.
  27. C Andrieu, A Doucet, R Holenstein, Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. B. 72, 269–342 (2010).
  28. N Chopin, PE Jacob, O Papaspiliopoulos, SMC²: an efficient algorithm for sequential analysis of state space models. J. R. Stat. Soc. Ser. B Stat. Methodol. 75(3), 397–426 (2013).
  29. J Míguez, IP Mariño, MA Vázquez, Analysis of a nonlinear importance sampling scheme for Bayesian parameter estimation in state-space models. Sig. Process. 142, 281–291 (2018).
  30. J Olsson, O Cappé, R Douc, E Moulines, Sequential Monte Carlo smoothing with application to parameter estimation in nonlinear state space models. Bernoulli. 14(1), 155–179 (2008).
  31. EN Lorenz, Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–141 (1963).
  32. AJ Chorin, P Krause, Dimensional reduction for a Bayesian filter. PNAS. 101(42), 15013–15017 (2004).
  33. EN Lorenz, in Proceedings of the Seminar on Predictability, vol. 1. Predictability: a problem partly solved (European Centre on Medium Range Weather Forecasting, Reading, UK, 1996).
  34. J Hakkarainen, A Ilin, A Solonen, M Laine, H Haario, J Tamminen, E Oja, H Järvinen, On closure parameter estimation in chaotic systems. Nonlinear Proc. Geoph. 19(1), 127–143 (2012).


© The Author(s) 2018