Multilevel Mixture Kalman Filter

The mixture Kalman ﬁlter is a general sequential Monte Carlo technique for conditional linear dynamic systems. It generates samples of some indicator variables recursively based on sequential importance sampling (SIS) and integrates out the linear and Gaussian state variables conditioned on these indicators. Due to the marginalization process, the complexity of the mixture Kalman ﬁlter is quite high if the dimension of the indicator sampling space is high. In this paper, we address this di ﬃ culty by developing a new Monte Carlo sampling scheme, namely, the multilevel mixture Kalman ﬁlter. The basic idea is to make use of the multilevel or hierarchical structure of the space from which the indicator variables take values. That is, we draw samples in a multilevel fashion, beginning with sampling from the highest-level sampling space and then draw samples from the associate subspace of the newly drawn samples in a lower-level sampling space, until reaching the desired sampling space. Such a multilevel sampling scheme can be used in conjunction with the delayed estimation method, such as the delayed-sample method, resulting in delayed multilevel mixture Kalman ﬁlter. Examples in wireless communication, speciﬁcally the coherent and noncoherent 16-QAM over ﬂat-fading channels, are provided to demonstrate the performance of the proposed multilevel mixture Kalman ﬁlter.


INTRODUCTION
Recently there have been significant interests in the use of the sequential Monte Carlo (SMC) methods to solve online estimation and prediction problems in dynamic systems.Compared with the traditional filtering methods, the simple, flexible-yet powerful-SMC provides effective means to overcome the computational difficulties in dealing with nonlinear dynamic models.The basic idea of the SMC technique is the recursive use of the sequential importance sampling (SIS).There also have been many recent modifications and improvements on the SMC methodology [1,2,3,4,5,6,7,8,9,10,11,12].
Among these SMC methods, the mixture Kalman filter (MKF) [3] is a powerful tool to deal with conditional dynamic linear models (CDLMs) and finds important applications in digital wireless communications [3,13,14].A similar method is also discussed in [15] for CDLM system.The CDLM is a direct generalization of the dynamic linear model (DLM) [16] and it can be generally described as follows: where u t ∼ N (0, I) and v t ∼ N (0, I) are the state and observation noise, respectively, and λ t is a sequence of random indicator variables which may form a Markov chain, but are independent of u t and v t and the past x s and y s , s < t.The matrices F λt , G λt , H λt , and K λt are known, given λ t .An important feature of CDLM is that, given the trajectory of the indicator {λ t }, the system becomes Gaussian and linear, for which the Kalman filter can be used.Thus, by using the marginalization technique for Monte Carlo computation [17], the MKF focuses on the sampling of the indicator variable λ t other than the whole state variable {x t , λ t }.This method can drastically reduce Monte Carlo variances associated with a standard sequential importance sampler applied directly to the space of the state variable.Figure 1: The multilevel structure of the 16-QAM modulation used in digital communications.The set of transmitted symbol A 1 = {s i, j , i = 1, . . ., 4, j = 1, . . ., 4} is the original sampling space, and the centers A 2 = {c 1 , c 2 , c 3 , c 4 } constitute a higher-level sampling space.
However, the computational complexity of the MKF can be quite high, especially in the case of high-dimensional indicator space, due to the need of marginalizing out the indicator variables.Fortunately, often the space from which the indicator variables take values exhibits multilevel or hierarchical structures, which can be exploited to reduce the computational complexity of the MKF.For example, a multilevel structure of the 16-QAM modulation used in digital communications is shown in Figure 1.The set of transmitted symbols A 1 = {s i, j , i = 1, . . ., 4, j = 1, . . ., 4} is the original sampling space, and the centers A 2 = {c 1 , c 2 , c 3 , c 4 } constitute a higher-level sampling space.Thus, based on the observed data, for every sample stream, we first draw a sample (say c 1 ) from the higher-level sampling space A 2 and then draw a new sample from the associated subspaces s 1,1 , s 1,2 , s 1,3 , s 1,4 of c 1 in the original sampling space A 1 .In this way, we need not sample from the entire original sampling space, and many Kalman filter update steps associated with the standard MKF can be saved.
This kind of hierarchical structure imposed on the indicator space is also employed in the partitioned sampling strategy [18], which greatly improved the efficiency and the accuracy for multiple target tracking over the original SMC methods.However, in this paper, the hierarchical structure is employed to reduce the computational load associated with MKF, especially for high-dimensional indicator space, while retaining the desirable properties of MKF.
Dynamic systems often possess strong memories, that is, future observations can reveal substantial information about the current state.Therefore, it is often beneficial to make use of these future observations in sampling the current state.However, an MKF method usually does not go back to regenerate past samples in view of the new observations, although the past estimation can be adjusted by using the new importance weights.To overcome this difficulty, a delayed-sample method is developed [13].It makes use of future observations in generating samples of the current state.It is seen there that this method is especially effective in improving the performance of the MKF.However, the computational complexity of the delayed-sample method is very high.For example, for a ∆-step delayed-sample method, the algorithmic complexity is O(|A 1 | ∆ ), where |A 1 | is the cardinality of the original sampling space.Here, we also provide a delayed multilevel MKF by exploring the multilevel structure of the indicator space.Instead of exploring the original entire space of future states, we only sample the future states in a higherlevel sampling space, thus significantly reducing the dimension of the search space and the computational complexity.
In recent years, the SMC methods have been successfully employed in several important problems in communications, such as the detection in flat-fading channels [13,19,20], space-time coding [21,22], OFDM system [23], and so on.To show the good performance of the proposed novel receivers, we apply them into the problem of adaptive detection in flat-fading channels in the presence of Gaussian noise.
The remainder of the paper is organized as follows.Section 2 briefly reviews the MKF algorithm and its variants.In Section 3, we present the multilevel MKF algorithm.In Section 4, we treat the delayed multilevel MKF algorithm.In Section 5, we provide simulation examples.Section 6 concludes the paper.

Mixture Kalman filter
Consider again the CDLMs defined by (1).The MKF exploits the conditional Gaussian property conditioned on the indicator variable and utilizes a marginalization operation to improve the algorithmic efficiency.Instead of dealing with both x t and λ t , the MKF draws Monte Carlo samples only in the indicator space and uses a mixture of Gaussian distributions to approximate the target distribution.Compared with the generic SMC method, the MKF is substantially more efficient (e.g., it produces more accurate results with the same computational resources).
First we define an important concept that is used throughout the paper.A set of random samples and the corresponding weights {(η (i) , w (i) )} m i=1 is said to be properly weighted with respect to the distribution π(•) if, for any measurable function h, we have ( In particular, if η ( j) is sampled from a trial distribution g(•) which has the same support as π, and if w ( j) = π(η ( j) )/g(η ( j) ), then {(η ( j) , w ( j) )} m j=1 is properly weighted with respect to π(•).
Let Y t = (y 0 , y 1 , . . ., y t ) and Λ t = (λ 0 , λ 1 , . . ., λ t ).By recursively generating a set of properly weighted random samples {(Λ where are obtained with a Kalman filter on the system (1) for the given indicator trajectory Λ Thus, a key step in the MKF is the production at time t of the weighted samples of indicators {(Λ j=1 at the previous time (t − 1).Suppose that the indicator λ t takes values from a finite set A 1 .The MKF algorithm is as follows.
Algorithm 1 (MKF).Suppose at time (t −1), a set of property weighted samples {(Λ j=1 is available with respect to p(Λ t−1 | Y t−1 ).Then at time t, as the new data y t becomes available, the following steps are implemented to update each weighted sample.
(i) Based on the new data y t , for each a i ∈ A 1 , run a onestep Kalman filter update assuming λ t = a i to obtain κ (ii) For each a i ∈ A 1 , compute the sampling density Note that by the model (1), the first density in (7) is Gaussian and can be computed based on the Kalman filter update (6).Draw a sample λ ( j) t according to the above sampling density.Append λ The new sample {Λ (iv) Perform a resampling step as discussed below.

Resampling procedure
The importance sampling weight w ( j) t measures the "quality" of the corresponding imputed indicator sequence Λ ( j) t .A relatively small weight implies that the sample is drawn far from the main body of the posterior distribution and has a small contribution in the final estimation.Such a sample is said to be ineffective.If there are too many ineffective samples, the Monte Carlo procedure becomes inefficient.To avoid the degeneracy, a useful resampling procedure, which was suggested in [7,11], may be used.Roughly speaking, resampling is to multiply the streams with the larger importance weights, while eliminating the ones with small importance weights.A simple, but efficient, resampling procedure consists of the following two steps.
(1) Sample a new set of streams { Λ j=1 with probability proportional to the importance weights {w Resampling can be done at every fixed-length time interval (say, every five steps) or it can be conducted dynamically.The effective sample size can be used to monitor the variation of the importance weights of the sample streams and to decide when to resample as the system evolves.The effective sample size is defined as in [13]: where υ t , the coefficient of variation, is given by with wt = m j=1 w ( j) t /m.In dynamic resampling, a resampling step is performed once the effective sample size mt is below a certain threshold.
Heuristically, resampling can provide chances for good sample streams to amplify themselves and hence "rejuvenate" the sampler to produce a better result for future states as the system evolves.It can be shown that the samples drawn by the above resampling procedure are also indeed properly weighted with respect to p(Λ t | Y t ), provided that m is sufficiently large.In practice, when small to modest m is used (we use m = 50 in this paper), the resampling procedure can be seen as a tradeoff between the bias and the variance.That is, the new samples with their weights resulting from the resampling procedure are only approximately proper, and this introduces small bias in the Monte Carlo estimation.On the other hand, however, resampling significantly reduces the Monte Carlo variance for future samples.

Delayed estimation
Model (1) often exhibites strong memory.As a result, future observations often contain information about the current state.Hence a delayed estimate is usually more accurate than the concurrent estimate.In delayed estimation, instead of making inference on Λ t instantaneously with the posterior distribution p(Λ t | Y t ), we delay this inference to a later time (t + ∆), ∆ > 0, with the distribution p(Λ t | Y t+∆ ).
As discussed in [13], there are primarily two approaches to delayed estimation, namely, the delayed-weight method and the delayed-sample method.

Delayed-weight method
In the concurrent MKF algorithm, if the set {(Λ is properly weighted with respect to p(Λ t+δ | Y t+δ ), then when we focus our attention on λ t at time (t + δ), we have that {(λ j=1 is properly weighted with respect to p(λ t | Y t+δ ).Then any inference about the indicator λ t , E{h(λ t ) | Y t+δ }, can be approximated by t+δ .(11) Since the weights {w t+δ } m j=1 contain information about the future observations (y t+1 , . . ., y t+δ ), the estimate in ( 11) is usually more accurate than the concurrent estimate.Note that such a delayed estimation method incurs no additional computational cost (i.e., CPU time), but it requires some extra memory for storing {λ For most systems, this simple delayed-weight method is quite effective for improving the performance over the concurrent method.However, if this method is not sufficient for exploiting the constraint structures of the indicator variable, we must resort to the delayed-sample method, which is described next.

Delayed-sample method
An alternative method of delayed estimation is to generate both the delayed samples and the weights {(λ based on the observations Y t+∆ , hence making p(Λ t | Y t+∆ ) the target distribution at time (t + ∆).The procedure will provide better Monte Carlo samples since it utilizes the future observations (y t+1 , . . ., y t+∆ ) in generating the current samples of λ t .But the algorithm is also more demanding both analytically and computationally because of the need of marginalizing out λ t+1 , . . ., λ t+∆ .
For each possible "future" (relative to time t − 1) symbol sequence at time (t + ∆ − 1), that is, we keep the value of a ∆-step Kalman filter {κ with The delayed-sample MKF algorithm recursively propagates the samples properly weighted for p(Λ t−1 | Y t+∆−1 ) to those for p(Λ t | Y t+∆ ) and is summarized as follows.
(i) For each λ t+∆ = a i ∈ A 1 , and for each λ t+∆−1 t ∈ A ∆ 1 , perform a one-step update on the corresponding Kalman filter κ Note that the second density in ( 16) is Gaussian and can be computed based on the results of the Kalman filter updates in (15).Draw a sample λ (iv) Resample if the effective sample size is below a certain threshold, as discussed in Section 2.2.
Finally, as noted in [13], we can use the above delayedsample method in conjunction with the delayed-weight method.For example, using the delayed-sample method, we generate delayed samples and weights {(Λ j=1 based on observations Y t+∆ .Then with an additional delay δ, we can use the following delayed-weight method to approximate any inference about the indicator λ t : (18)

MULTILEVEL MIXTURE KALMAN FILTER
Suppose that the indicator space has a multilevel structure.For example, consider the following scenario of a two-level sampling space.The first level is the original sampling space Ω with Ω = {λ i , i = 1, 2, . . ., N}.We also have a higher-level sampling space Ω with Ω = {c i , i = 1, 2, . . ., N}.The higherlevel sampling space can be obtained as follows.Define the elements (say c i ) in the higher-level sampling space as the centers of subset ω i in the original sampling space.That is, where ω i , i = 1, 2, . . ., N, is the subset in the original sampling space We call c i the parent of the elements in subset ω i and ω i the child set of c i .We can also iterate the above merging procedure on the newly created higher-level sampling space to get an even higher-level sampling space.
For simplicity, we will use c i,l to represent the ith symbol value in the lth-level sampling space.Assume that there are, in total, L levels of sampling space and the number of elements at the lth level is |A l |.The original sampling space is defined as the first level.Then the multilevel MKF is summarized as follows.
Algorithm 3 (multilevel MKF).Suppose, at time (t − 1), a set of properly weighted samples {(Λ j=1 is available with respect to p(Λ t−1 | Y t−1 ).Then at time t, as the new data y t becomes available, the following steps are implemented to update each weighted sample.
(A1) Draw the sample c ( j) t,L in the Lth-level sampling space.(a) Based on the new data y t , for each c i,L ∈ A L , perform a one-step update on the corresponding Kalman filter κ (b) For each c i,L ∈ A L , compute the Lth-level sampling density Note that by the model (1), the first density in (24) is Gaussian and can be computed based on the Kalman filter update (23).(c) Draw a sample c ( j) t,L from the Lth-level sampling space A L according to the above sampling density, that is, t,l in the lth-level sampling space.First, find the child set ω ( j) l in the current lth-level sampling space for the drawn sample c ( j) t,l+1 in the (l + 1)thlevel sampling space, then proceed with three more steps.(a) For each c l , perform a one-step update on the corresponding Kalman filter κ Remark 5 (complexity).Note that the dominant computation required for the above delayed multilevel MKF is mainly in the first step.Denote J |A 1 | and M |A l |.The number of one-step Kalman filter updates in the delayed multilevel MKF can be computed as follows: Note that the delayed-sample MKF requires J ∆+1 Kalman updates at each time.Since usually M J, compared with the delayed-sample MKF, the computational complexity of the delayed multilevel MKF is significantly reduced.
Remark 6 (alternative sampling method).To further reduce the computational complexity in the first step, we can take the alternative sampling method composed of the following steps.
Algorithm 5 (alternative sampling method).(1) At time t+∆, for each c t+∆ t,l ∈ A ∆ l , perform the update on the corresponding Kalman filter κ (2) Select K paths from c t+∆ t+1,l based on the computed weight (3) For each λ t = a i ∈ A 1 , and for each path k in the selected K paths, perform the update on the corresponding Kalman filter κ (4) For each a i ∈ A 1 , compute the sampling density The weight calculation can be computed by (40) for the target distribution p(λ t | Y t ) or (45) for the target distribution p(λ t | Y t+∆ ).Besides the bias that resulted from the Gaussian approximation in Remark 3, the latter also introduces new bias because of the summation over K selected paths, other than the whole sampling space in higher level.However, the first one does not introduce any bias as in Remark 4. Denote J The number of one-step Kalman filter updates in the delayed multilevel MKF can be computed as Remark 7 (the choice of the parameters).Note that the performance of the delayed multilevel MKF is mainly dominated by the two important parameters: the number of prediction steps ∆ and the specific level of the sampling space A l .With the same computation, the delayed multilevel MKF can see further "future" steps (larger ∆) with a coarser-level sampling space (larger l), whereas it can see the "future" samples in a finer sampling space (smaller l), but with smaller ∆ steps.
Remark 8 (multiple sampling space).In Algorithm 5, we can also use different sampling spaces for different delay steps.That is, we can gradually increase the sampling space from the lower level to the higher level with an increase in the delay step.

SIMULATIONS
We consider the problem of adaptive detection in flat-fading channels in the presence of Gaussian noise.This problem is of fundamental importance in communication theory and an array of methodologies have been developed to tackle this problem.Specifically, the optimal detector for flat-fading channels with known channel statistics is studied in [24,25], which has a prohibitively high complexity.Suboptimal receivers in flat-fading channels employ a two-stage structure, with a channel estimation stage followed by a sequence detection stage.Channel estimation is typically implemented by a Kalman filter or a linear predictor, and is facilitated by per-survivor processing (PSP) [26], decision feedback [27], pilot symbols [28], or a combination of the above [29].
Other suboptimal receivers for flat-fading channels include the method based on a combination of a hidden Markov model and a Kalman filtering [30], and the approach based on the expectation-maximization (EM) algorithm [31].
In the communication system, the transmitted data symbols {s t } take values from a finite alphabet set A 1 = {a 1 , . . ., a |A1| }, and each symbol is transmitted over a Rayleigh fading channel.As shown in [32,33], the fading process is adequately represented by an ARMA model, whose parameters are chosen to match the spectral characteristics of the fading process.That is, the fading process is modelled by the output of a lowpass Butterworth filter of order r driven by white Gaussian noise where and {u t } is a white complex Gaussian noise sequence with unit variance and independent real and complex components.The coefficients {φ i } and {θ i }, as well as the order r of the Butterworth filter, are chosen so that the transfer function of the filter matches the power spectral density of the fading process, which in turn is determined by the channel Doppler frequency.On the other hand, a simpler method, which uses a two-path model to build ARMA process, can be found in [34]; the results there closely approximate more complex path models.Then such a communication system has the following state-space model form: where The fading coefficient sequence {α t } can then be written as In the state-space model, {u t } in (46) and {v t } in (47) are the white complex Gaussian noise sequences with unit variance and independent real and imaginary components: In our simulations, the fading process is specifically modeled by the output of a Butterworth filter of order r = 3 driven by a complex white Gaussian noise process.The cutoff frequency of this filter is 0.05, corresponding to a normalized Doppler frequency (with respect to the symbol rate 1/T) f d T = 0.05, which is a fast-fading scenario.That is, the fading coefficients {α t } are modeled by the following ARMA (3,3) process: where u t ∼ N c (0, 1).The filter coefficients in (51) are chosen such that Var{α t } = 1.However, a simpler method, which uses a two-path model to build ARMA process, can be found in [34].
We then apply the proposed multilevel MKF methods to the problem of online estimation of the a posteriori probability of the symbol s t based on the received signals up to time t.That is, at time t, we need to estimate Then a hard maximum a posteriori (MAP) decision on symbol s t is given by If we obtain a set of Monte Carlo samples of the transmitted symbols {(S , properly weighted with respect to p(S t | Y t ), then the a posteriori symbol probability in (52) is approximated by with W t m j=1 w ( j) t .In this paper, we use the 16-QAM modulation.Note that the 16-QAM modulation has much more phase ambiguities than the BPSK in [13].We have provided the following two schemes to resolve phase ambiguities.Scheme 1 (pilot-assisted).Pilot symbols are inserted periodically every fixed length of symbols; the similar scheme was used in [20].In this paper, 10% and 20% pilot symbols are studied.
Scheme 2 (differential 16-QAM).We view the 16-QAM as a pair of QPSK symbols.Then two differential QPSKs will be used to resolve the phase ambiguity.Given the transmitted symbol s t−1 and information symbol d t , they can be represented by the QPSK symbol pair as (r st−1,1 , r st−1,2 ) and (r dt,1 , r dt,2 ), respectively.Then the differential modulation scheme is given by The two-QPSK symbol pair (r st,1 , r st,2 ) will be mapped to the 16-QAM symbols and then transmitted through the fading channel.The traditional differential receiver performs the following steps:   We next show the performance of the multilevel MKF algorithm for detecting 16-QAM over flat-fading channels.The receiver implements the decoding algorithms discussed in Section 3 in combination with the delayed-weight method under Schemes 1 and 2 discussed above.In our simulations, we take m = 50 Markov streams.The length of the symbol sequence is 256.We first show the bit error rate (BER) performance versus the signal-to-noise (SNR) by the PSP, the multilevel MKF, and the MKF receiver under different pilot schemes without delayed weight in Figure 2 and with delayed weight in Figure 3.The numbers of bit errors were collected over 50 independent simulations at low SNR or more at high SNR.In these figures, we also plotted the "genie-aided lower bound." 1 The BER performance of PSP is far from the genie bound at SNR higher than 10 dB.But the performance of the multilevel MKF with 20% pilot symbol is very close to the genie bound.Furthermore, with the delayed-weight method, the performance of the multilevel MKF can be significantly improved.We next show the BER performance in Figure 4 under the differential coding scheme.As in the pilotassisted scheme, the BER performance of PSP is far from the genie bound at SNR higher than 15 dB.On the contrary, the   performance of the multilevel MKF is very close to that of the original MKF although there is a 5 dB gap between the genie bound and the performance of the multilevel MKF.
The BER performance of the MKF and the multilevel MKF with or without delayed weight versus the number of Markov streams is shown in Figure 5 under the differential coding scheme.The BER is gradually improved from the value 0.16 to the value about 0.08 for multilevel MKF, 0.07 for MKF, 0.065 for multilevel MKF with delayed weight, and 0.062 for MKF with delayed weight with 25 Markov streams.However, the BER performance can not be improved anymore with more than 25 streams.Therefore, the optimal number of Markov streams will be 25 in this example.
Next, we illustrate the performance of the delayed multilevel MKF.The receiver implements the decoding algorithms discussed in Section 4. We show the BER performance versus SNR in Figure 6, computed by the delayed-sample or the delayed multilevel MKF with one delayed step (∆ = 1).The BER performance of the MKF and the MKF with delayed weight is also plotted in the same figure.In the delayed multilevel MKF, we implement two schemes for choosing the sampling space for the "future" symbols.In the first scheme, we choose the second-level (l = 2) sampling space A 2 = {c 1 , c 2 , c 3 , c 4 }, where c i , i = 1, . . ., 4, are the solid circles shown in Figure 1; and in the second scheme, we choose the third-level (l = 3) sampling space A 3 = {c 1 + c 2 , c 3 + c 4 }.It is seen that the BER of the multilevel MKF is very close to that of the delayed-sample algorithm.It is also seen that the performance of the multilevel MKF method is conditioned on the specific level sampling space.The performance of the delayed multilevel MKF based on the second-level sampling space A 2 is nearly 2 dB better than that based on the thirdlevel sampling space A 3 .The BER performance of the delayed multilevel MKF with the second (l = 2)-level or the third (l = 3)-level sampling space for the differential 16-QAM system over flat-fading channels.The BER performance of the delayed-sample method is also shown.

CONCLUSION
In this paper, we have developed a new sequential Monte Carlo (SMC) sampling method-the multilevel mixture Kalman filter (MKF)-under the framework of MKF for conditional dynamic linear systems.This new scheme generates random streams using sequential importance sampling (SIS), based on the multilevel or hierarchical structure of the indicator random variables.This technique can also be used in conjunction with the delayed estimation methods, resulting in a delayed multilevel MKF.Moreover, the performance of both the multilevel MKF and the delayed multilevel MKF can be further enhanced when employed in conjunction with the delayed-weight method.
We have also applied the multilevel MKF algorithm and the delayed multilevel MKF algorithm to solve the problem of adaptive detection in flat-fading communication channels.It is seen that compared with the receiver algorithm based on the original MKF, the proposed multilevel MKF techniques offer very good performance.It is also seen that the receivers based on the delayed multilevel MKF can achieve similar performance as that based on the delayed-sample MKF, but with a much lower computational complexity. y on this sample, form κ ( j) t using the results from the previous step.(iii) Compute the importance weight.If λ ( j) t−1 = a k and λ

Figure 2 :
Figure 2: The BER performance of the PSP, MKF, and multilevel MKF for pilot-aided 16-QAM over flat-fading channels.

Figure 4 :Figure 5 :
Figure 4: The BER performance of the PSP, the MKF, and the multilevel MKF for differential 16-QAM over flat-fading channels.The delayed-weight method is used with δ = 10.

Figure 6 :
Figure6: The BER performance of the delayed multilevel MKF with the second (l = 2)-level or the third (l = 3)-level sampling space for the differential 16-QAM system over flat-fading channels.The BER performance of the delayed-sample method is also shown.