Multi-prediction particle filter for efficient parallelized implementation
EURASIP Journal on Advances in Signal Processing volume 2011, Article number: 53 (2011)
Abstract
The particle filter (PF) is an emerging signal processing methodology that can effectively deal with nonlinear and non-Gaussian signals through a sample-based approximation of the state probability density function. Particle generation in the PF is a data-independent procedure and can be implemented in parallel. The resampling procedure, however, is sequential in nature and difficult to parallelize. By Amdahl's law, the sequential portion of a task limits the maximum speedup of a parallelized implementation. Moreover, a large particle number is usually required to obtain an accurate estimate, and the complexity of the resampling procedure is closely tied to the number of particles. In this article, we propose a multi-prediction (MP) framework with two selection approaches. The proposed MP framework reduces the particle number required for a target estimation accuracy and, with it, the sequential work of resampling. Moreover, the overhead of the MP framework is easily compensated by parallel implementation. The proposed MP-PF alleviates the global sequential operation by increasing the local parallel computation, which makes it well suited to the multi-core graphics processing unit (GPU), a popular parallel processing architecture. We give prototypical implementations of the MP-PFs on a multi-core GPU platform. For the classic bearing-only tracking experiments, the proposed MP-PF is 25.1 and 15.3 times faster than the sequential importance resampling PF (SIR-PF) with 10,000 and 20,000 particles, respectively. Hence, the proposed MP-PF enhances the efficiency of parallelization.
1. Introduction
Hidden state estimation of a dynamic system with noisy measurements is an important problem in many research areas. The Bayesian approach is a common framework for state estimation, obtaining the probability density function (PDF) of the hidden state. For linear system models with Gaussian noise, the Kalman filter (KF) can track the mean and covariance of the state PDF. However, the KF does not work well in nonlinear systems with non-Gaussian noise. The particle filter (PF) [1–5] is an emerging signal processing methodology that succeeds in dealing with nonlinear and non-Gaussian signals through a sample-based approximation of the state PDF. Because nonlinear dynamic systems with non-Gaussian noise appear widely in real-world applications, such as surveillance, object tracking, and computer and robot vision, the PF outperforms the classical KF in these applications.
The conventional sequential importance resampling (SIR) PF is composed of four operations: (1) prediction, (2) weight updating, (3) weight normalization, and (4) resampling, as shown in Figure 1a. The prediction and weight updating steps form the sampling procedure, which is data-independent and can be parallelized effectively. Since particle sampling is parallel in nature, many studies have explored and proposed parallel architectures for the PF, notably Bolić et al. [6, 7]. However, the resampling procedure of the SIR-PF needs the weight information of the whole particle set and results in global data exchange, which suppresses the efficiency of parallel SIR-PF implementations. Recently, the independent Metropolis-Hastings (IMH) algorithm [8] has been utilized to facilitate parallel designs of the resampling procedure in the PF [9, 10]. In short, to enhance the parallelized PF, the studies in [7–11] focus on modifying the resampling operation.
By Amdahl's law [12], the sequential portion of a task limits the speedup of a parallelized implementation. The resampling procedure is a sequential task that significantly limits the acceleration of the parallelized PF. In general, the complexity of the resampling procedure is proportional to the size of the posterior particle set. Traditionally, prior application-domain knowledge can be incorporated into the system model to reduce the uncertainty of the system state, as in [13, 14]. However, this approach is application-dependent and hard to transfer to other applications.
In this article, we propose a multi-prediction (MP) sampling approach to benefit the parallelized PF. The proposed MP-sampling approach consists of the MP operation, weight updating, and local particle selection, as shown in Figure 1b. In the MP operation, multiple predicted particles are generated from each basis particle; the number of predictions is denoted P. The SIR-PF with N_{1} basis particles generates N_{1} predicted particles, whereas the proposed MP-PF with N_{2} basis particles generates N_{2} × P predicted particles. When P is large, the number of basis particles the MP-PF requires for the same predicted particle count can be significantly reduced. Hence, the proposed MP-PF suppresses the complexity of the resampling procedure and benefits the parallelized PF. The MP-PF does incur the overhead of additional prediction computations from the MP operation, but because the prediction procedure is data-independent for each basis particle, the MP operation is easily implemented in parallel. In summary, the proposed MP-PF reduces the sequential global data operation of the resampling procedure at the cost of additional local computation, thereby improving the execution time of the parallelized PF. It should be noted that our approach is not intended to replace the algorithms in [7–11]; the proposed MP-PF can be combined with the modified resampling algorithms in [7–11] to further improve the efficiency of the parallelized PF. To isolate the benefit of our approach, we compare the proposed MP-PF with the regular SIR-PF.
Recently, multi-core graphics processing units (GPUs) have become popular in the signal processing domain [15–17] for their capability of massive parallel computation. The main strength of multi-core GPUs is their efficiency at processing many parallel local computations. However, the latency of global memory access on a GPU is much larger, because the GPU does not have levels of cache for global data. If the executed task consists of many sequential operations or uncoalesced global data accesses [18], the processing cores have to stall, resulting in low utilization. The proposed MP-PF trades additional local computation for a reduced amount of global data access. To verify the benefit of the proposed MP-PFs, we implement them on NVIDIA multi-core GPUs. Our prototype results show that the proposed MP-PFs can be more than 10× faster than the SIR-PF on the multi-core GPU platform.
The rest of this article is organized as follows. Section 2 reviews the conventional SIR-PF. Section 3 presents the proposed MP-PF. Section 4 shows simulation results for the proposed MP-PFs. Section 5 presents the implementation on the NVIDIA GPU and comparisons. Finally, Section 6 concludes the article.
2. Review of the SIR-PF
The basic procedures of the SIR-PF are briefly introduced in this section. The system state transition model and the measurement model are the two key models in the SIR-PF framework, as shown in Equations 1 and 2, respectively:
$${x}_{t}={f}_{t}\left({x}_{t-1},{n}_{t}\right),$$(1)
$${y}_{t}={h}_{t}\left({x}_{t},{v}_{t}\right),$$(2)
where x _{ t } is the system state vector that we want to track; n _{ t } is the random vector describing the system uncertainty; y _{ t } is the observable measurement vector; and v _{ t } is the measurement noise vector. The PF algorithm can work when f_{ t } and h_{ t } are nonlinear or n _{ t } and v _{ t } are non-Gaussian. The PF algorithm needs the following information about the system state x and observation y:

P(x_{0}): The PDF of the initial system state.

P(x_{ t }|x_{t-1}): The transition PDF of the system state.

P(y_{ t }|x_{ t }): The observation likelihood function of y_{ t } given the system state.
To track the current system state, the posterior PDF P(x_{ t }|y_{1:t}) is required. Based on Bayes' theorem, P(x_{ t }|y_{1:t}) can be represented by the likelihood function P(y_{ t }|x_{ t }), the transition prediction function P(x_{ t }|y_{1:t-1}), and the normalization term P(y_{ t }|y_{1:t-1}):
$$P\left({x}_{t}|{y}_{1:t}\right)=\frac{P\left({y}_{t}|{x}_{t}\right)P\left({x}_{t}|{y}_{1:t-1}\right)}{P\left({y}_{t}|{y}_{1:t-1}\right)}.$$(3)
The prior prediction probability P(x_{ t }|y_{1:t-1}) in Equation 3 can be represented as
$$P\left({x}_{t}|{y}_{1:t-1}\right)=\int P\left({x}_{t}|{x}_{t-1}\right)P\left({x}_{t-1}|{y}_{1:t-1}\right)\,d{x}_{t-1}.$$(4)
For the nonlinear/non-Gaussian scenario, Equations 3 and 4 cannot be obtained analytically. The SIR-PF approximates the posterior P(x_{ t }|y_{1:t}) with a particle set ${\left\{{x}_{t}^{\left(i\right)},{w}_{t}^{\left(i\right)}\right\}}_{i=1}^{N}$, where ${w}_{t}^{\left(i\right)}$ is the associated weight of each particle. The SIR-PF algorithm with N particles is described as follows.
Initialization
Generate N initial particles ${x}_{0}^{\left(1\right)},...,{x}_{0}^{\left(N\right)}$ from the predefined initial state distribution P(x_{0}). All particles have equal initial weights, ${w}_{0}^{\left(i\right)}=1/N$.
Iteration: Repeat for t = 1, 2, 3, ...:

(a)
Prediction: Draw the predicted particles ${x}_{t}^{\left(i\right)}={f}_{t}\left({x}_{t-1}^{\left(i\right)},{n}_{t}^{\left(i\right)}\right)$ through the state transition model, where the noise samples ${n}_{t}^{\left(i\right)}$, i = 1, ..., N, are independent of each other. These predicted particles approximate the prior prediction distribution P(x_{ t }|y_{1:t-1}).

(b)
Weight updating: After receiving the measurement, each particle updates its weight according to the likelihood function $P\left({y}_{t}|{x}_{t}^{\left(i\right)}\right)$, as shown in Equation 5:
$${w}_{t}^{\left(i\right)}={w}_{t-1}^{\left(i\right)}\cdot P\left({y}_{t}|{x}_{t}^{\left(i\right)}\right).$$(5)
(c)
Weight normalization: The normalization procedure makes the sum of the particle weights equal to one. The particles with normalized updated weights represent the posterior state distribution. The normalization procedure is
$${w}_{t}^{\left(i\right)}={w}_{t}^{\left(i\right)}/\sum _{i=1}^{N}{w}_{t}^{\left(i\right)}.$$(6)
(d)
Resampling: After the weight updating operation, some particle weights may degenerate to values near zero. In general, systematic resampling (SR) is widely used as the standard implementation of the resampling procedure. The SR procedure draws a new particle set with indices j _{1}, ..., j_{ N } such that $P\left({j}_{k}=i\right)\propto {w}_{t}^{\left(i\right)}$ and sets ${\widehat{x}}_{t}^{\left(k\right)}={x}_{t}^{\left({j}_{k}\right)}$. All particle weights are then reset to 1/N.
The data flow of the SIR-PF with N_{1} particles is shown in Figure 1a. The posterior particles at time (t − 1) serve as the basis particles to generate the predicted prior particles at time t. There is a tradeoff between estimation accuracy and particle number: the SIR-PF with larger N increases the estimation accuracy, but because the resampling operation is executed on the posterior particle set, larger N also raises the complexity of the resampling operation.
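The four SIR-PF steps above can be summarized in a minimal NumPy sketch. This is an illustration only, not the article's GPU implementation; `transition` and `likelihood` are hypothetical user-supplied model functions, and states are assumed scalar for brevity.

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling: one uniform offset, N evenly spaced positions,
    so that the probability of drawing index i is proportional to weights[i]."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return np.minimum(idx, n - 1)  # guard against round-off at the CDF's end

def sir_pf_step(particles, weights, y, transition, likelihood, rng):
    """One SIR-PF iteration: (a) prediction, (b) weight update (Eq. 5),
    (c) normalization (Eq. 6), (d) systematic resampling."""
    particles = transition(particles, rng)        # (a) draw from transition model
    weights = weights * likelihood(y, particles)  # (b) multiply by likelihood
    weights = weights / weights.sum()             # (c) normalize to sum to one
    idx = systematic_resample(weights, rng)       # (d) resample by weight
    n = len(particles)
    return particles[idx], np.full(n, 1.0 / n)    # resampled set, uniform weights
```

Repeated calls to `sir_pf_step` track the state; the weighted mean of the particles before resampling serves as the state estimate.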
3. Proposed MP-PF algorithm
The data flow of the proposed MP-PF with N_{2} basis particles and P predictions is shown in Figure 1b. The proposed MP-PF is developed from the SIR-PF: we replace the sampling procedure of the SIR-PF with the proposed MP-sampling approach, which introduces two modifications: (1) the MP operation and (2) the local particle selection (LPS) operation.
3.1 Proposed MP operation
The proposed MP operation is inspired by the unpredictable behavior of the target. Due to the uncertainty in the system transition model described by P(x_{ t }|x_{t-1}), the state propagation has many, even infinitely many, possible outcomes. In the SIR-PF, however, each particle makes only one prediction for the next time instant, and it is hard to predict the motion of the target perfectly; hence, the SIR-PF needs to store many particles to cover the system transition behavior. In our proposed MP operation, each basis particle makes multiple predictions according to the system model to track the uncertain system state. With the same number of basis particles, the MP-PF produces a larger predicted prior particle set than the SIR-PF and therefore provides more prediction-state diversity for tracking the system state.
In the MP-PF, P local predicted particles are generated from one basis particle according to the system transition model, as shown in Equation 7:
$${x}_{\mathsf{\text{local}}}^{\left(j\right)}={f}_{t}\left({x}_{t-1}^{\left(i\right)},{n}_{t}^{\left(j\right)}\right),\phantom{\rule{1em}{0ex}}j=1,...,P,$$(7)
where ${x}_{t-1}^{\left(i\right)}$ is a specific basis particle at time t − 1. The local predicted particle set ${\left\{{x}_{\mathsf{\text{local}}}^{\left(j\right)}\right\}}_{j=1}^{P}$ is a sample-based representation of the transition PDF $P\left({x}_{t}|{x}_{t-1}^{\left(i\right)}\right)$. In the predicted prior distribution, each predicted particle has equal weight and hence equal importance, so none of the predicted particles can be removed. After weight updating, the importance of each particle is no longer equal, and local predicted particles with low importance can be removed. To maintain the same number of basis particles for the next iteration, the MP-sampling approach uses the LPS procedure to reserve only one representative particle from each local particle set.
In each local particle set, only one particle has to be stored. For each basis particle, the local predicted particles are generated sequentially, so all local temporary particles need not be stored at once. The pseudo code of the MP operation with M basis particles and P predictions is shown in Table 1. The previously selected particle and the newly generated particle are input to the LPS procedure, which reserves the proper particle as the new selected particle based on their weights. It should be noted that the MP-PF reduces to the SIR-PF when the prediction number P = 1; for P > 1 with the same number of basis particles, the MP-PF generates a larger predicted prior particle set than the SIR-PF.
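The MP operation can be sketched sequentially as below. The function names and the `select` callback signature are our own, not the article's; `select` stands for an LPS rule that receives the currently kept particle, the new candidate, their weights, and the running local weight sum.

```python
import numpy as np

def mp_sampling(basis, weights, y, P, transition_one, likelihood, select, rng):
    """MP sampling with M basis particles and P predictions per basis particle.
    Only the currently selected particle of each local set is kept in memory."""
    out_p = np.empty_like(basis)
    out_w = np.empty_like(weights)
    for i in range(len(basis)):
        kept = transition_one(basis[i], rng)       # first local prediction
        kept_w = likelihood(y, kept)
        wsum = kept_w                              # running weight sum (used by SRS)
        for _ in range(P - 1):
            cand = transition_one(basis[i], rng)   # next local prediction
            cand_w = likelihood(y, cand)
            wsum += cand_w
            kept, kept_w = select(kept, kept_w, cand, cand_w, wsum, rng)
        out_p[i] = kept
        out_w[i] = weights[i] * kept_w
    return out_p, out_w / out_w.sum()
```

With `select = lambda k, kw, c, cw, s, r: (c, cw) if cw > kw else (k, kw)` this realizes a MIS-style rule; an SRS-style rule replaces the deterministic comparison with a probabilistic test.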
3.2 Proposed LPS mechanisms
From each basis particle, a group of predicted particles is generated. As mentioned above, the importance of each particle is not equal after weight updating; hence, fewer particles need to be stored afterward. In the proposed MP-sampling approach, the LPS procedure reserves one representative particle per group, selected based on the weight distribution of the local predicted particle set. Two LPS approaches are described in the following.
3.2.1 Maximizing importance selection scheme
In each group of particles, the maximizing importance selection (MIS) scheme selects the particle with the highest weight as the representative particle for the group, as described by Equation 8:
$${j}^{*}=\underset{j\in \left\{1,...,P\right\}}{\mathsf{\text{argmax}}}\phantom{\rule{0.3em}{0ex}}{w}_{\mathsf{\text{local}}}^{\left(j\right)}.$$(8)
Because the MIS scheme selects the particle with the maximum weight in the local distribution, the MIS procedure can be implemented sequentially. It should be noted that, for the widely used normal likelihood function, the MIS can select the representative particle based on the error distance rather than the actual likelihood value; therefore, for a normal likelihood, the MIS needs only one likelihood calculation to update the particle weight. In addition, the MIS scheme does not need a uniform random variable for the selection procedure. The pseudo code of the MIS LPS procedure is given in Table 2.
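A standalone sketch of the MIS rule over a stream of local predictions (list-based here for clarity; a real implementation would consume predictions as they are generated):

```python
def mis_select(predictions, weights):
    """MIS: sequential compare-and-replace that keeps the local prediction
    with the highest weight; only one particle is stored at any time."""
    kept, kept_w = predictions[0], weights[0]
    for x, w in zip(predictions[1:], weights[1:]):
        if w > kept_w:              # deterministic replacement condition
            kept, kept_w = x, w
    return kept, kept_w
```

For example, `mis_select([10, 20, 30], [0.1, 0.9, 0.2])` keeps the second prediction, whose weight is the maximum.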
3.2.2 Systematic resampling like selection scheme
The predicted particles from a specific basis particle can be regarded as a local distribution. In the systematic resampling like selection (SRS) scheme, the representative particle is selected based on the SR algorithm. The SRS is a probabilistic selection scheme, and the probability of the j th local predicted particle being selected is defined by
$$P\left(\mathsf{\text{select}}\phantom{\rule{0.3em}{0ex}}j\right)={w}_{\mathsf{\text{local}}}^{\left(j\right)}/\sum _{k=1}^{P}{w}_{\mathsf{\text{local}}}^{\left(k\right)}.$$
In general, the SR algorithm needs the cumulative sum information, and no predicted particle can be released until the SR procedure is complete; the conventional SR algorithm therefore requires additional memory and processing latency. Fortunately, because the LPS procedure needs to select only one particle, the CDF scanning operation can be transformed into a sequential comparing operation; the detailed explanation is given in the Appendix. The additional memory to temporarily store the local particle set is thus saved. Moreover, the SRS procedure can start without waiting for all local particles to be generated, which increases the execution efficiency of the SRS scheme. The pseudo code of the SRS LPS procedure is given in Table 3.
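The SRS rule can likewise be sketched as a streaming compare-and-replace (function names are ours): candidate j replaces the current representative with probability w_j divided by the running weight sum, so no local CDF has to be stored.

```python
import random

def srs_select(predictions, weights, rng=random):
    """SRS as a sequential probabilistic compare-and-replace: the j-th local
    prediction replaces the representative with probability
    w_j / (w_1 + ... + w_j), which makes the overall selection probability
    of each particle proportional to its weight."""
    kept, kept_w = predictions[0], weights[0]
    wsum = weights[0]
    for x, w in zip(predictions[1:], weights[1:]):
        wsum += w
        if rng.random() < w / wsum:   # probabilistic replacement condition
            kept, kept_w = x, w
    return kept, kept_w
```

Unlike `mis_select`, this rule needs one uniform random number per comparing test, matching the complexity trade-off discussed below.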
3.3 Prediction number and LPS scheme evaluation flow
Before describing the evaluation flow, we first analyze the two LPS schemes. There are two considerations when choosing an LPS scheme:
3.3.1 Complexity
Both the SRS and MIS schemes are implemented by a sequential compare-and-replace operation; the difference between the two schemes is the replacement condition. The SRS scheme needs random variables to make a probabilistic selection, whereas, as mentioned above, the MIS scheme needs only one likelihood calculation for a normal likelihood. The complexity comparison between the two schemes is given in Table 4. With the same particle number and prediction number, the MIS scheme has lower complexity than the SRS scheme.
3.3.2 Robustness to measurement noise
In the SRS scheme, the representative particle is selected based on the PDF of the whole local predicted particle set; hence, predicted particles with similar weights have a similar chance of being chosen as the representative. In the MIS scheme, by contrast, the predicted particle with the highest weight is always selected. In both cases, the weights of the local particle set determine the result of the LPS procedure. In general, the measurement contains a noise term, and since the particle weights are updated from the likelihoods of the measurement, the weights are also affected by the measurement noise. When the noise variance is high, the MIS scheme may suffer accuracy degradation, because it always selects the predicted particle with the highest weight and thus trusts the measurement too much.
In summary, for a target accuracy, we should evaluate both schemes and select the one with the lower execution time. The prediction number P and the basis particle number N are the main design parameters of the proposed MP-PF. By increasing P, the MP-PF can reduce the basis particle number and hence the global sequential operation; however, the total execution time may increase if P is too large. Therefore, for a target accuracy, a proper setting of (N, P) and the LPS scheme should be evaluated for the specific parallel architecture.
Our suggested evaluation flow is shown in Figure 2. The set of prediction numbers to evaluate and the target accuracy are predefined. For a given prediction number, the minimum particle number for the target accuracy is obtained by simulation. With the prediction number and the particle number, the total execution time is evaluated for the specific parallel architecture, and the setting of (N, P) with the minimum execution time over the prediction number set is obtained. Finally, we choose the LPS scheme with the smaller minimum execution time of the two.
4. Simulation results and discussion
The proposed MP-PF does not utilize application-specific prior knowledge. In this section, we verify the proposed MP-PF on two widely used benchmark simulation models. In Section 4.1, we use a simple system transition model to evaluate the two LPS schemes at different measurement noise strengths. In Section 4.2, we use the BOT model, which has high transition uncertainty, to demonstrate the benefit of the proposed MP-PF.
4.1 Robustness to measurement noise
This model is highly nonlinear and bimodal in nature. The system model and measurement model are described in Equations 8 and 9, respectively:
$${x}_{t}=\frac{{x}_{t-1}}{2}+\frac{25{x}_{t-1}}{1+{x}_{t-1}^{2}}+8\phantom{\rule{0.3em}{0ex}}\mathsf{\text{cos}}\left(1.2t\right)+{n}_{t},$$(8)
$${y}_{t}=\frac{{x}_{t}^{2}}{20}+{v}_{t},$$(9)
where n_{ t } ~ N($0,{\sigma}_{n}^{2}$) and v_{ t } ~ N($0,{\sigma}_{v}^{2}$); N(u, σ^{2}) denotes the normal distribution with mean u and variance σ^{2}. The initial state distribution is x_{0} ~ N(0, 10). In our simulation, the variance of the system transition noise is set to ${\sigma}_{n}^{2}=10.0$. We take the weighted sum of the posterior particles as the state estimate and calculate the mean-square error (MSE) between the state estimate and the true state. The results are averaged over 10^{4} randomly initialized experiments with 50 steps each. To evaluate the robustness of the proposed LPS schemes, Figures 3, 4, and 5 give the MSE comparisons at different noise variances, ${\sigma}_{v}^{2}=1$, 1/4, and 1/16.
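For reference, this subsection's model can be written as two NumPy functions. This sketch assumes the standard univariate benchmark from [1], with transition x_t = x/2 + 25x/(1 + x²) + 8 cos(1.2t) + n_t and measurement y_t = x_t²/20 + v_t; both functions are vectorized over a particle array.

```python
import numpy as np

def ungm_transition(x, t, rng, sigma_n=np.sqrt(10.0)):
    """Benchmark transition: x_t = x/2 + 25x/(1+x^2) + 8 cos(1.2 t) + n_t."""
    return (x / 2 + 25 * x / (1 + x ** 2)
            + 8 * np.cos(1.2 * t) + rng.normal(0.0, sigma_n, size=x.shape))

def ungm_likelihood(y, x, sigma_v=1.0):
    """Likelihood of y_t = x_t^2/20 + v_t: normal in the error distance.
    Note the bimodality: +x and -x produce the same measurement."""
    return np.exp(-0.5 * ((y - x ** 2 / 20) / sigma_v) ** 2)
```

The squared state in the measurement is what makes the posterior bimodal, and the division by 20 is why σ_v² = 1 counts as large noise in the discussion below.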
In this model, the term related to the hidden state is divided by 20, so noise with ${\sigma}_{v}^{2}=1$ is large. In Figure 3, the MIS-based MP-PF suffers severe accuracy degradation due to the high measurement noise, especially for large P. When the noise strength is large, the particle with the highest weight is not necessarily correct, and the representative particle should instead be selected according to the probability distribution. However, the MIS scheme always selects the particle with the highest weight in the local particle set, and this simple but hasty approach does not comply with the statistics of the local predicted particle set.
When the noise strength is lower, as shown in Figures 4b and 5b, the estimation accuracy of the MIS scheme improves. Nevertheless, the MIS scheme is still not robust to the measurement noise. Because the SRS scheme selects the representative particle in a probabilistic sense, it has better robustness to measurement noise than the MIS scheme, and its accuracy is always better than that of the SIR-PF, as shown in Figures 3, 4 and 5.
From Figures 3, 4 and 5, it is apparent that the SRS-based MP-PF has better estimation accuracy than the SIR-PF with the same number of basis particles. In Table 5, we compare the SRS-based MP-PF and the SIR-PF at a fixed number of predictions. The MSE performance of the SIR-PF converges at around N = 500 particles. At this convergence point, we can give a fair comparison between the SRS-based MP-PF and the SIR-PF at the same total prior prediction number, 500. Table 5 gives the MSE comparison results. For N < 50, the proposed MP-PF has too few basis particles to sample the posterior PDF sufficiently; although the MP approach can reduce the basis particle number, it cannot be made arbitrarily small. With a reasonable basis particle number, the proposed MP-PFs give similar MSE results with far fewer basis particles. This result supports our claim that the proposed MP-PFs reduce the memory requirement and the complexity of the resampling procedure.
4.2. The system model with high transition uncertainty
In this section, we use the BOT model, which has high system transition uncertainty, to further demonstrate the benefit of the proposed MP-PFs. In the BOT model, the state vector includes four state variables, i.e., ${x}_{k}={\left({P}_{{x}_{k}},{P}_{{y}_{k}},{V}_{{x}_{k}},{V}_{{y}_{k}}\right)}^{\mathsf{\text{T}}}$. In Cartesian coordinates, P_{ x } and P_{ y } are the two-dimensional position, while V_{ x } and V_{ y } are the two-dimensional velocity. The observer is assumed to be at the origin, and the position and velocity are relative to the observer. The BOT system model is given in Equation 10:
$${x}_{k}=F{x}_{k-1}+\Gamma {u}_{k},$$(10)
where ${u}_{k}={\left({u}_{{x}_{k}},{u}_{{y}_{k}}\right)}^{\mathsf{\text{T}}}\sim N\left(0,q{I}_{2}\right)$, and the matrices F and Γ are shown in Equation 11:
$$F=\left[\begin{array}{cccc}1& 0& 1& 0\\ 0& 1& 0& 1\\ 0& 0& 1& 0\\ 0& 0& 0& 1\end{array}\right],\phantom{\rule{1em}{0ex}}\Gamma =\left[\begin{array}{cc}0.5& 0\\ 0& 0.5\\ 1& 0\\ 0& 1\end{array}\right].$$(11)
The measurement is one-dimensional and consists of the bearing only, i.e., θ_{ k }. With the observer fixed at the origin, the measurement model is given in Equation 12:
$${\theta }_{k}={\mathsf{\text{tan}}}^{-1}\left({P}_{{y}_{k}}/{P}_{{x}_{k}}\right)+{v}_{k},$$(12)
where v_{ k } is additive Gaussian noise, v_{ k } ~ N(0, r). In our simulation, $\sqrt{q}=0.001$ and $\sqrt{r}=0.005$, the same settings as in [1]. We calculate the position error from the difference between the estimated position and the true position.
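A NumPy sketch of the BOT model follows. The constant-velocity F and Γ below (unit sampling interval) are the forms commonly used for this benchmark and should be treated as our assumption rather than a verbatim copy of Equation 11; the noise levels use √q = 0.001 and √r = 0.005 as stated above.

```python
import numpy as np

F = np.array([[1., 0., 1., 0.],   # position advances by velocity each step
              [0., 1., 0., 1.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
G = np.array([[0.5, 0.],          # Γ maps 2-D acceleration noise into the state
              [0., 0.5],
              [1., 0.],
              [0., 1.]])

def bot_transition(x, rng, q=0.001 ** 2):
    """x_k = F x_{k-1} + Γ u_k with u_k ~ N(0, q I_2); x = (Px, Py, Vx, Vy)."""
    return F @ x + G @ rng.normal(0.0, np.sqrt(q), size=2)

def bot_bearing(x, rng, r=0.005 ** 2):
    """Bearing-only measurement θ_k = arctan(Py / Px) + v_k, v_k ~ N(0, r)."""
    return np.arctan2(x[1], x[0]) + rng.normal(0.0, np.sqrt(r))
```

Because only the bearing is observed, range is unobservable from a single measurement, which is why this model demands so many particles from the SIR-PF.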
The position error results of the BOT model for the two LPS schemes are shown in Figure 6. In Figures 3, 4 and 5, the proposed MP-PF gives better performance than the SIR-PF, but the improvement converges after P = 6: because the MP-sampling operation is used to track uncertain system transition behavior, a huge number of predictions is unnecessary in a simple model. In the BOT model, however, the SIR-PF needs thousands of particles to obtain good estimation accuracy, which indicates that the model has high system uncertainty and the PF needs more particles to track the hidden state. In this condition, the MP operation yields further improvement with larger P. In other words, in a system model with higher transition uncertainty, the number of basis particles can be reduced by using more predictions.
In Figure 6b, it is apparent that the MIS-based MP-PF exhibits unstable estimation accuracy. When the prediction number is large, the aforementioned drawback of the MIS scheme becomes more apparent. Figure 7 compares the two LPS schemes at different prediction numbers. For a small prediction number, as shown in Figure 7a, the two LPS schemes have similar estimation accuracy. For a large prediction number, as shown in Figure 7b, the SRS scheme gives better estimation accuracy than the MIS scheme, owing to its probabilistic selection mechanism.
As mentioned in Section 3, the MIS scheme selects the representative particle deterministically. The above simulations reveal two drawbacks of the MIS scheme: (a) low robustness to measurement noise and (b) performance degradation at large prediction numbers. These drawbacks result from the simplification in the representative particle selection; the benefit of the MIS scheme is its simplicity. From the simulations, the MIS scheme is feasible for low prediction numbers and low measurement noise. In contrast, the SRS scheme follows the posterior weight distribution to select the representative particle; because the selection is probabilistic, the SRS scheme has higher stability and robustness than the MIS scheme.
5. Implementation of the MP-PFs on GPU
5.1. Parallelized MP-PF on the NVIDIA multi-core GPU
The proposed MP-PF increases the prediction computation to reduce the complexity of the resampling procedure. Because the MP-sampling operation executes independently for each basis particle, the prediction computation overhead is easily compensated by parallel execution. In this subsection, we give the architecture of the MP-PF implemented on an NVIDIA GPU, which accelerates applications with the single-instruction multiple-threads (SIMT) execution model and a hierarchical memory.
As mentioned above, the MP-sampling procedure is independent among particles and can be parallelized by mapping each particle to a parallel thread with little effort. Weight summation for normalization requires global memory access; for efficiency, shared memory can be utilized to buffer the intermediate data. In the resampling procedure, the global particle exchange requires uncoalesced global memory accesses [18], which pushes the processing time toward O(N) and slows the resampling step significantly. The thread block diagram of the SIR-PF is shown in Figure 8. Despite its superior computing capability, the SIMT parallelism suffers from inefficiency when processing uncoalesced global data exchange; a task with heavy global data transfer, like resampling, dominates the execution time on the GPU.
5.2. Implementation results of the SIR-PF on GPU
For comparison with the proposed MP-PFs, we first implement the SIR-PF of the BOT model on an NVIDIA GPU. The software interface for programming NVIDIA GPUs is the compute unified device architecture (CUDA) [18, 19]. The GPU used in this work is described in Table 6. Section 5.1 described how to map the proposed MP-PF onto the NVIDIA multi-core GPU; for the SIR-PF, the only difference is the sampling procedure, since with P = 1 that mapping reduces to the SIR-PF. Figure 9 shows the profiling results of the SIR-PF on the GPU; the profiling data is the execution time of the PF over 25 iterations of the BOT model. The global operations, weight normalization and resampling, indeed cost over 99% of the execution time, while sampling costs extremely little. Figure 9 validates that the resampling procedure dominates the execution time of the parallelized PF.
5.3. Design example for a loose target accuracy
For comparison with the SIR-PF, we first set a loose target accuracy of 0.08, which is the simulated accuracy of the SIR-PF with 10,000 particles. The prediction number set for evaluation is {10, 20, 50, 100, 200, 500}. Figure 10 shows the estimation accuracy around the target accuracy, from which the minimum particle number for each prediction number can be obtained. Table 7 lists the execution time and accuracy of the proposed MP-PF designs with different parameters; all parameter settings meet the target accuracy of 0.08.
The MP-PF can use hundreds of particles to meet the estimation accuracy of the SIR-PF with 10,000 particles. Moreover, when the particle number is small, the particle with the higher weight may be more important for representing the PDF, so the MIS scheme is a proper choice for small particle number settings. Hence, the MIS MP-PF can use fewer particles than the SRS scheme to achieve this accuracy threshold.
The profiling results of the PFs listed in Table 7 are given in Figure 12. As shown in Figure 12, we can reduce the execution time of the resampling and weight normalization procedures by using more predictions (larger P). However, once the particle number reduction slows down, larger P results in an execution time overhead for the MP-sampling operation.
5.4. Design example for a strict target accuracy
In the second design example, we set a strict target accuracy of 0.06, which is the simulated accuracy of the SIR-PF with 20,000 particles. The prediction number set for evaluation is {10, 20, 50, 100, 200, 500}, the same as in the above example. From the simulation results in Section 4.2, it should be noted that the MIS-based MP-PF can hardly achieve the threshold of 0.06 with large P. Therefore, for the accuracy threshold 0.06, the MIS MP-PF cannot use more predictions to reduce the execution time, and we skip the discussion of the MIS scheme for this target accuracy.
Figure 11 shows the estimation accuracy around the target accuracy, from which the minimum particle number for each prediction number can be obtained. Table 8 lists the execution time and accuracy of the proposed MP-PF designs with different parameters; all parameter settings meet the target accuracy of 0.06. The profiling results of the PFs listed in Table 8 are given in Figure 13.
6. Conclusions
In this article, the MP framework with two LPS schemes has been proposed to reduce the number of basis particles. Of the two proposed LPS schemes, the SRS scheme is robust to measurement noise and does not suffer from accuracy saturation, while the MIS scheme works well for a small prediction number P or particle number N. By reducing the basis particle number, the complexity of resampling, the sequential part of the PF task, is suppressed significantly. The MP framework increases the prediction computation, which is easily implemented in parallel owing to its data-independent nature. In other words, the MP-PF increases the overhead of the parallel task while significantly reducing the complexity of the sequential task. To demonstrate the benefit of the MP-PF on parallel architectures, we implemented the MP-PFs and the SIR-PF on a multi-core GPU platform. For the classic BOT experiments, the proposed MP-PF is at best 25.1 and 15.3 times faster than the SIR-PF with 10,000 and 20,000 particles, respectively.
Appendix
Derivation of the proposed SRS scheme
Using the SR algorithm for selection, the probability of the j-th local predicted particle being selected as the representative particle is defined by

$$\Pr(j) = \frac{w_j}{\sum_{k=1}^{P} w_k}, \qquad j = 1, \ldots, P, \tag{14}$$

where w_j is the importance weight of the j-th local predicted particle and P is the prediction number.
In general, the SR procedure needs to collect the information of all predicted particles, which results in additional latency and memory. Fortunately, the SRS procedure used in the proposed MP framework selects only one particle, so we modify the SR procedure into a sequential comparing operation, as shown in Table 1, to save the memory and latency overhead. In the following, we demonstrate that the proposed SRS scheme also follows the probability defined in Equation 14 when selecting the representative particle.
For the MP operation with P predictions, the SRS obtains the representative particle after (P − 1) probabilistic comparing tests. The first predicted particle is set as the initial representative particle, and it is accepted as the final representative particle only if it survives all (P − 1) comparing tests. The condition for the first predicted particle being the final representative particle is described in Equation 15:

$$u_i \ge \frac{w_{i+1}}{\sum_{k=1}^{i+1} w_k}, \qquad i = 1, \ldots, P-1, \tag{15}$$

where u_i is an independent uniform random variable drawn for the i-th probabilistic comparing test. Hence, the probability of the first particle being accepted as the representative particle is

$$\Pr(1) = \prod_{i=1}^{P-1}\left(1 - \frac{w_{i+1}}{\sum_{k=1}^{i+1} w_k}\right) = \prod_{i=1}^{P-1}\frac{\sum_{k=1}^{i} w_k}{\sum_{k=1}^{i+1} w_k} = \frac{w_1}{\sum_{k=1}^{P} w_k}. \tag{16}$$
The j-th local predicted particle (j ≥ 2) must win its own comparing test and then survive the remaining (P − j) tests, i.e., pass (P − j + 1) comparing tests in total, so its acceptance probability is

$$\Pr(j) = \frac{w_j}{\sum_{k=1}^{j} w_k}\prod_{i=j}^{P-1}\left(1 - \frac{w_{i+1}}{\sum_{k=1}^{i+1} w_k}\right) = \frac{w_j}{\sum_{k=1}^{j} w_k}\cdot\frac{\sum_{k=1}^{j} w_k}{\sum_{k=1}^{P} w_k} = \frac{w_j}{\sum_{k=1}^{P} w_k}. \tag{17}$$
From Equations 16 and 17, the SRS scheme selects the representative particle with exactly the probability defined in Equation 14.
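The sequential comparing operation above amounts to a single-pass, weighted reservoir-style selection of one particle. The following Python sketch (hypothetical function and variable names, not the authors' implementation) illustrates the scheme: the running representative is challenged by each new particle with probability equal to that particle's weight over the running weight sum, so the j-th particle is finally selected with probability w_j / Σ_k w_k, matching Equation 14.

```python
import random

def srs_select(weights, particles):
    """Sequentially select one representative particle.

    The first particle is the initial representative; the (i+1)-th
    particle replaces the current representative with probability
    w_{i+1} / (w_1 + ... + w_{i+1}).  By the telescoping product in
    Equations 16-17, each particle j is finally selected with
    probability w_j / (w_1 + ... + w_P).
    """
    representative = particles[0]
    running_sum = weights[0]                  # sum of weights seen so far
    for w, p in zip(weights[1:], particles[1:]):
        running_sum += w
        if random.random() < w / running_sum:  # probabilistic comparing test
            representative = p                 # challenger becomes representative
    return representative
```

Because only the running weight sum and the current representative are stored, the selection needs O(1) memory and a single pass over the P predicted particles, which reflects the latency and memory saving claimed for the modified SR procedure.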
References
 1.
Gordon N, Salmond D, Smith AF: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc F Radar Signal Process 1993, 140:107–113. 10.1049/ip-f-2.1993.0015
 2.
Doucet A, de Freitas N, Gordon N (eds): Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science, Springer, New York; 2001.
 3.
Ristic B, Arulampalam S, Gordon N: Beyond the Kalman Filter: Particle Filters for Tracking Applications. Artech House, Boston; 2004.
 4.
Arulampalam MS, Maskell S, Gordon N, Clapp T: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans Signal Process 2002, 50(2):174–188. 10.1109/78.978374
 5.
Cappé O, Godsill SJ, Moulines E: An overview of existing methods and recent advances in sequential Monte Carlo. Proc IEEE 2007, 95(5):899–924.
 6.
Bolić M: Architectures for Efficient Implementation of Particle Filters. Ph.D. dissertation, State University of New York at Stony Brook; 2004.
 7.
Bolić M, Djurić PM, Hong S: Resampling algorithms and architectures for distributed particle filters. IEEE Trans Signal Process 2005, 53(7):2442–2450.
 8.
Sankaranarayanan AC, Chellappa R, Srivastava A: Algorithmic and architectural design methodology for particle filters in hardware. Proc IEEE International Conference on Computer Design (ICCD) 2005, 275–280.
 9.
Sankaranarayanan AC, Srivastava A, Chellappa R: Algorithmic and architectural optimizations for computationally efficient particle filtering. IEEE Trans Image Process 2008, 17(5):737–748.
 10.
Miao L, Zhang J, Chakrabarti C, Papandreou-Suppappola A: A new parallel implementation for particle filters and its application to adaptive waveform design. Proc IEEE Workshop on Signal Processing Systems (SiPS) 2010, 19–24.
 11.
Manjunath BB, Williams AS, Chakrabarti C, Papandreou-Suppappola A: Efficient mapping of advanced signal processing algorithms on multiprocessor architectures. Proc IEEE Workshop on Signal Processing Systems (SiPS) 2008, 269–274.
 12.
Hill MD, Marty MR: Amdahl's law in the multicore era. Computer 2008, 41(7):33–38.
 13.
Chao CH, Chu CY, Wu AY: Location-constrained particle filter for RSSI-based indoor human positioning and tracking system. Proc IEEE Workshop on Signal Processing Systems (SiPS) 2008, 73–76.
 14.
Evennou F, Marx F, Novakov E: Map-aided indoor mobile positioning system using particle filter. Proc IEEE Wireless Communications and Networking Conference (WCNC) 2005, 13–17.
 15.
Shams R, Sadeghi P, Kennedy RA, Hartley RI: A survey of medical image registration on multicore and the GPU. IEEE Signal Process Mag 2010, 27(2):50–60.
 16.
Bisceglie MD, Santo MD, Galdi C, Lanari R, Ranaldo N: Synthetic aperture radar processing with GPGPU. IEEE Signal Process Mag 2010, 27(2):69–78.
 17.
Cheung NM, Fan X, Au OC, Kung MC: Video coding on multicore graphics processors. IEEE Signal Process Mag 2010, 27(2):79–89.
 18.
NVIDIA: NVIDIA CUDA Programming Guide. [http://www.nvidia.com/object/cudahomenew.html]
 19.
Lindholm E, Nickolls J, Oberman S, Montrym J: NVIDIA Tesla: a unified graphics and computing architecture. IEEE Micro 2008, 28(2):39–55.
Acknowledgements
Financial support from the NSC (grant no. NSC 97-2220-E-002-012) is greatly appreciated.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Chu, CY., Chao, CH., Chao, MA. et al. Multiprediction particle filter for efficient parallelized implementation. EURASIP J. Adv. Signal Process. 2011, 53 (2011). https://doi.org/10.1186/1687-6180-2011-53
Keywords
 particle filter
 parallelization
 GPU