EURASIP Journal on Applied Signal Processing 2002:3, 1–11 © 2002 Hindawi Publishing Corporation

Joint Estimation and Decoding of Space-Time Trellis Codes

We explore the possibility of using an emerging tool in statistical signal processing, sequential importance sampling (SIS), for joint estimation and decoding of space-time trellis codes (STTCs). First, we provide background on SIS, and then we discuss its application to STTC systems. Simulations show that SIS is suitable for joint estimation and decoding of STTCs over time-varying flat-fading channels, provided that phase ambiguity is avoided. We used a design criterion for STTCs over temporally correlated channels that combats phase ambiguity without pilot signaling, and we validated the design by simulations.


INTRODUCTION
Space-time coding (STC), originally introduced by Foschini [1] and further developed by Tarokh et al. in [2], provides a framework for exploiting spatial and temporal diversity to increase data rates in wireless communications. A general introduction to space-time coding can be found in [3]. Among the families of space-time codes, we are interested in STTCs because of their many advantages over block STCs, as pointed out in [2].
It is generally assumed that STCs will be used in fading environments, and therefore decoding requires channel state information (CSI), that is, estimates of the fading coefficients of the channel. In the literature, it is most often assumed that CSI becomes available by periodically sending pilot sequences from the transmit to the receive side. When CSI is not available, we have to estimate the channel, which presents many challenges: the wireless channel may vary with time or frequency or both, and in systems with multiple transmit and receive antennas, many channels need to be estimated. Some design efforts have been directed toward STCs that circumvent channel estimation altogether, such as the unitary ST modulation scheme discussed in [4] or the differential space-time modulation considered in [5]. An interesting Kalman filter space-time algorithm for joint estimation and decoding of Alamouti's block ST code [6] was proposed in [7], where Kalman filtering is used for tracking the channel. We have not seen similar work for STTCs. Here we consider the problem of joint channel estimation and decoding of STTCs when no pilot signal is available and the channel is modeled as a time-varying flat-fading channel. Because STTCs use complicated modulation schemes, the observation at the receive side may not be a linear function of the transmitted data. Moreover, the CSI is hidden in the observation, and it is hard to apply conventional methods like the Kalman filter to this problem. In this paper, we address joint estimation and detection of STTCs by the sequential importance sampling (SIS) method combined with Kalman filtering.
The following is a brief summary of SIS. The statistical properties of many parameters in communication problems, such as the fading coefficients of flat-fading channels, the channel impulse response of frequency-selective fading channels, and trellis-coded user data, can be characterized by hidden Markov processes (HMPs). Consequently, many systems, including STTC systems, can be described adequately by dynamic state space models (DSSMs) [8, 9], which use state equations to describe the dynamics of the HMPs {x_t} of interest, and observation equations to represent the observations {y_t} as functions of {x_t}. In the Bayesian framework, all information about the unknowns {x_t} is included in the posterior distribution up to time T, p(x_0, ..., x_T | y_0, ..., y_T), from which the marginalized posterior distribution of interest, p(x_t | y_0, ..., y_T) for all t ≤ T, can be derived. However, the posterior density is usually high dimensional, and it is hardly ever possible to find an analytical expression for the density function, let alone to evaluate it. Due to the properties of the HMP, the posterior distribution p(x_0, ..., x_T | y_0, ..., y_T) can be factored in many ways, and each term after the factorization may be evaluated individually. For example, if the posterior density is written as

p(x_0, ..., x_T | y_0, ..., y_T) ∝ p(x_0, ..., x_{T−1} | y_0, ..., y_{T−1}) p(y_T | x_T) p(x_T | x_{T−1}),    (1)

this factorization provides a way to evaluate the desired posterior density recursively. However, except in a few special cases, such as the linear Gaussian DSSM where Kalman filtering can be applied, there is usually no analytical way to carry out the recursion. Recently, a variety of Monte Carlo sampling methods, such as importance sampling and Markov chain Monte Carlo (MCMC), including Gibbs sampling [10, 11, 12], have revived Bayesian signal processing with their ability to sample from highly complicated probability density functions, thereby allowing the computation of complex high-dimensional integrals. In the literature, SIS [13, 14, 15] was introduced for solving problems that can be described by DSSMs without the strict restrictions imposed by the Kalman filter. In communications, SIS methods have been used for blind detection of user data [16, 17, 18]. Because there are no mathematical approximations of the model, SIS methods are found to perform better than traditional methods like the extended Kalman filter (EKF) [19, 20] or the Gaussian sum filter (GSF) [21]. It should be noted, however, that the sampling-based methods use approximations too, in that they approximate relevant densities with samples from these densities. The good performance, versatility, and potential for implementation in VLSI make SIS methods very attractive.
This paper is organized as follows. Section 2 introduces the basic concept of SIS and discusses many of its aspects. Section 3 elaborates on the application of SIS to the joint estimation and decoding of STTCs. In Section 4, we analyze the problem of phase ambiguity and find that when the channel is temporally correlated, it is possible to design ST coding schemes that alleviate phase ambiguity; there we propose an ad hoc space-time coding scheme that is effective in combating phase ambiguity. Simulations are presented in Section 5, and conclusions are drawn in Section 6.

SEQUENTIAL IMPORTANCE SAMPLING
In this section, we provide some background on SIS and on the important special case of mixture SIS and Kalman filtering, which is suitable for joint estimation and decoding of STTCs in a time-varying flat-fading environment.
The SIS algorithm is applicable to systems that can be described by DSSMs, which consist of state and observation equations, that is,

x_t = f(x_{t−1}, u_t),
y_t = g(x_t, v_t),    (2)

where f(·) is the state transition function, which defines how the state vector x_t evolves with time based on the previous state vector x_{t−1} and the input noise vector u_t at time t, and g(·) is an observation function of x_t and the observation noise vector v_t. We can see that the state transition function defines a Markov process because, given the previous state vector x_{t−1}, the current state vector no longer depends on other previous state vectors or observations. Note that the state vector is not observed directly, and therefore the model describes an HMP.
From the Bayesian perspective, all information about the state vectors is contained in the full posterior density p(x_{0:t} | y_{0:t}), from which the filtering density p(x_t | y_{0:t}) and the smoothing density p(x_t | y_{0:T}) for all t < T can be obtained. Here, x_{0:t} = {x_0, ..., x_t} is the set of all state vectors from time 0 to time t, and y_{0:t} is defined similarly. When the full posterior density is available, the expected value of any function of the state vectors, ξ(x_{0:t}), can be evaluated by

E_{x_{0:t}|y_{0:t}}[ξ(x_{0:t})] = ∫ ξ(x_{0:t}) p(x_{0:t} | y_{0:t}) dx_{0:t}.    (3)

If a DSSM is linear and Gaussian, that is, both the state transition and observation functions are linear and both noise vectors u_t and v_t are Gaussian, the full posterior density of interest, p(x_{0:t} | y_{0:t}), can be obtained using the Kalman filter. In other cases, when the DSSM is not linear and Gaussian, approximations to the model must be made before applying the Kalman filter, and because of these approximations, the performance of the Kalman filter is limited. On the other hand, SIS can be used in such cases, as discussed in [13, 16]. With SIS, the goal is to generate samples from p(x_{0:t} | y_{0:t}) that are used for the computation of statistical expectations as in (3). However, it is generally hard to draw samples directly from the posterior distribution p(x_{0:t} | y_{0:t}), and instead, the importance sampling method is employed. Samples {x_{0:t}^{(j)}}_{j=1}^{J}, where j = 1, 2, ..., J is a sample index, are generated from a proposal density function π(x_{0:t} | y_{0:t}), and subsequently they are weighted according to the true posterior distribution p(x_{0:t} | y_{0:t}). The weight of a particular sample is given by

w_t^{(j)} = p(x_{0:t}^{(j)} | y_{0:t}) / π(x_{0:t}^{(j)} | y_{0:t}).    (4)

The expectation in (3) is then approximated using Monte Carlo integration, that is,

E_{x_{0:t}|y_{0:t}}[ξ(x_{0:t})] ≈ (1/W_t) Σ_{j=1}^{J} w_t^{(j)} ξ(x_{0:t}^{(j)}),    (5)

where W_t = Σ_{j=1}^{J} w_t^{(j)} is the sum of the unnormalized weights. In the literature, the weighted sample (x_{0:t}^{(j)}, w_t^{(j)}) is also called a particle, and sometimes SIS is referred to as particle filtering. Note that in this paper, x_t^{(j)} is called a sample element.
An interesting characteristic of the full posterior density function at time t is that it can be factored as a product of the full posterior density of the previous time instant t − 1 and an incremental term,

p(x_{0:t} | y_{0:t}) ∝ p(x_{0:t−1} | y_{0:t−1}) p(y_t | x_t) p(x_t | x_{t−1}).    (6)

If the incremental term p(y_t | x_t) p(x_t | x_{t−1}) can be evaluated at each time instant, we can obtain the full posterior density recursively. If the importance function is chosen in the form

π(x_{0:t} | y_{0:t}) = π(x_{0:t−1} | y_{0:t−1}) π(x_t),    (7)

we can derive an SIS algorithm which allows us to draw samples and evaluate their weights recursively. Suppose that we have obtained the sample set {x_{0:t−1}^{(j)}}_{j=1}^{J}, properly weighted with respect to p(x_{0:t−1} | y_{0:t−1}) from previous recursions. Then at time t, for the jth sample, we

(1) draw a new sample element x_t^{(j)} from the incremental proposal density π(x_t);
(2) append the new sample element x_t^{(j)} to x_{0:t−1}^{(j)} and obtain the new sample x_{0:t}^{(j)};
(3) evaluate the weight of the new sample x_{0:t}^{(j)} according to (4),

w_t^{(j)} = w_{t−1}^{(j)} p(y_t | x_t^{(j)}) p(x_t^{(j)} | x_{t−1}^{(j)}) / π(x_t^{(j)}).    (8)

The same procedure is repeated for all J samples. The path that each sample takes is also called a trajectory, and so there are in total J trajectories when running the SIS algorithm. Note that the incremental proposal density function π(x_t) is simply referred to as the proposal density function or importance function in the literature.
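The three-step recursion above can be sketched in a few lines of code. The following toy example (our own illustration, not the STTC model of this paper) runs SIS on a scalar linear Gaussian DSSM, using the prior p(x_t | x_{t−1}) as the proposal π(x_t), so that the incremental weight update reduces to multiplying by the likelihood p(y_t | x_t); a simple multinomial resampling step keeps the weights from degenerating.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy DSSM (illustrative only): x_t = 0.9 x_{t-1} + u_t, y_t = x_t + v_t,
# with u_t, v_t ~ N(0, 1).
T, J = 50, 500
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal()
y = x_true + rng.normal(size=T)

particles = rng.normal(size=J)   # x_0^(j) drawn from the prior
logw = np.zeros(J)               # log-weights of the J trajectories
est = np.zeros(T)                # estimates of E[x_t | y_{0:t}]
for t in range(T):
    if t > 0:
        # step (1): draw x_t^(j) from the proposal pi(x_t) = p(x_t | x_{t-1}^(j))
        particles = 0.9 * particles + rng.normal(size=J)
    # step (3): weight update, here w_t = w_{t-1} * p(y_t | x_t^(j))
    logw += -0.5 * (y[t] - particles) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    est[t] = np.sum(w * particles)   # Monte Carlo estimate as in (5)
    # resample when the effective sample size collapses
    if 1.0 / np.sum(w ** 2) < J / 2:
        idx = rng.choice(J, size=J, p=w)
        particles, logw = particles[idx], np.zeros(J)
```

Because this toy model is linear Gaussian, the same posterior means could be computed exactly by a Kalman filter; SIS becomes interesting precisely when that is impossible.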
The next question of interest is the selection of the importance function π(x_t). First, π(x_t) must have appropriate support. Second, we should be able to draw samples from π(x_t), and third, π(x_t) should be easily computable. There are many choices, two of which are the prior density function p(x_t | x_{t−1}) and the so-called "optimal" proposal function p(x_t | x_{0:t−1}, y_{0:t}). Intuitively, good proposal functions take into account all the available observations and past state vectors.
The optimal importance function is proportional to the product of two density functions,

π(x_t) = p(x_t | x_{0:t−1}^{(j)}, y_{0:t}) ∝ p(y_t | x_t) p(x_t | x_{t−1}^{(j)}),    (9)

and after obtaining the new sample element x_t^{(j)}, the weight of x_{0:t}^{(j)} is assigned using (8), which here reduces to

w_t^{(j)} = w_{t−1}^{(j)} p(y_t | x_{0:t−1}^{(j)}, y_{0:t−1}).    (10)

We can see that to compute w_t^{(j)} from w_{t−1}^{(j)}, we need to evaluate an integral because

p(y_t | x_{0:t−1}^{(j)}, y_{0:t−1}) = ∫ p(y_t | x_t) p(x_t | x_{t−1}^{(j)}) dx_t.    (11)

If x_t is defined on a finite discrete space, there is no problem in computing (11) when the size of the space is small, because the integration simply becomes a sum over a small number of terms. When x_t is defined on an infinite discrete space or a continuous space, there is in general no analytical solution to (11).

Mixture SIS and Kalman filtering
An important special case of SIS arises when the state vector can be partitioned into two parts, where one of them is conditionally linear and Gaussian given the other. For example, in wireless communications with flat fading, the observation is the product of the transmitted user signal and the fading coefficients, embedded in additive noise. The state vector x_t contains two parts, (h_t, s_t), where h_t is the state vector of the fading coefficients (channel state vector) and s_t is the user state vector. The user state vector and the channel state vector can both be modeled as HMPs, that is,

s_t = z(s_{t−1}, d_t),    (12)
h_t = F h_{t−1} + E u_t,    (13)

where d_t is the user data input at time t, z(·) is the state transition function of the user state vector, F is the state transition matrix, and E is a known matrix. The observation equation can be written as

y_t = C(s_t) R h_t + v_t,    (14)

where C(·) is a matrix coding and modulation function of the communication system and R is another known matrix of model coefficients (their meanings will be explained in the sequel). Given s_t, the DSSM is linear and Gaussian in h_t; that is, both noise vectors u_t and v_t are complex Gaussian, and the state transition and observation functions are linear, described by the state transition matrix F and the matrix R.
Note that in most cases one can find the unique inverse of the state transition function of the user state vector, d_t = z^{−1}(s_t, s_{t−1}), and then the inference of d_{0:t} can be derived from s_{0:t}. Here the posterior density of interest is p(s_{0:t} | y_{0:t}), and when applying the SIS algorithm, the optimal proposal density for the jth sample is p(s_t | s_{0:t−1}^{(j)}, y_{0:t}), which can be expanded according to

p(s_t | s_{0:t−1}^{(j)}, y_{0:t}) ∝ p(s_t | s_{t−1}^{(j)}) ∫ p(y_t | s_t, h_t) p(h_t | s_{0:t−1}^{(j)}, y_{0:t−1}) dh_t.    (15)

Given all the past user data s_{0:t−1}^{(j)}, the DSSM defined by (13) and (14) becomes linear Gaussian in the channel state vector h_t, and therefore the density function p(h_t | s_{0:t−1}^{(j)}, y_{0:t−1}) can be evaluated, in terms of its mean and variance, using the predictive step of Kalman filtering. In turn, because p(y_t | s_t, h_t) is a Gaussian function, the integration in (15) can be carried out analytically. In essence, when the state vector is composed of two parts, we integrate out the part that we are not interested in. This method is referred to as Rao-Blackwellization in [22].
Based on this line of reasoning, a mixture SIS and Kalman filtering algorithm was developed in [16]. Suppose that at time instant t, for the jth trajectory, we have obtained from previous recursions, via Kalman filtering and SIS, the posterior estimate of the channel state vector p(h_{t−1} | s_{0:t−1}^{(j)}, y_{0:t−1}) and the sample s_{0:t−1}^{(j)} weighted according to the posterior distribution p(s_{0:t−1} | y_{0:t−1}). The following steps are then performed:

(1) predict the current channel state vector using the Kalman filter based on the previous sample and all available observations, that is, find p(h_t | s_{0:t−1}^{(j)}, y_{0:t−1}) based on p(h_{t−1} | s_{0:t−1}^{(j)}, y_{0:t−1});
(2) sample from the posterior distribution p(s_{0:t} | y_{0:t}) using importance sampling, where the optimal importance function p(s_t | s_{0:t−1}^{(j)}, y_{0:t}) is given by (15);
(3) compute the weight of the drawn sample by (11), which involves integration; because the transmitted user signals belong to a finite signal set A, the integration becomes a summation,

w_t^{(j)} = w_{t−1}^{(j)} Σ_{s_t ∈ A} p(y_t | s_t, s_{0:t−1}^{(j)}, y_{0:t−1}) p(s_t | s_{t−1}^{(j)});    (16)

(4) compute the posterior probability of the channel state vector p(h_t | s_{0:t}^{(j)}, y_{0:t}) using the Kalman filter.
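A minimal sketch of steps (1)-(4), again as our own scalar toy rather than the full STTC system: BPSK symbols s_t ∈ {+1, −1} are sent over an AR(1) flat-fading channel h_t and observed as y_t = s_t h_t + v_t. Each particle carries the Kalman mean and variance of h_t (the Rao-Blackwellized part), only s_t is sampled, and the weight update is the summation over the symbol alphabet as in (16). The initial channel is assumed known (as if fixed by a pilot), which sidesteps the sign ambiguity discussed later in the paper; all constants are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: y_t = s_t * h_t + v_t, h_t = a h_{t-1} + u_t (AR(1) channel),
# s_t uniform on {+1, -1}.  All parameters are illustrative.
a, qu, qv = 0.999, 0.05 ** 2, 0.1 ** 2
T, J = 50, 200
h = np.empty(T)
h[0] = 1.0
for t in range(1, T):
    h[t] = a * h[t - 1] + rng.normal(scale=np.sqrt(qu))
s_true = rng.choice([-1.0, 1.0], size=T)
y = s_true * h + rng.normal(scale=np.sqrt(qv), size=T)

# Per-particle Kalman statistics of h_t; the initial channel is assumed known.
mu, P = np.full(J, h[0]), np.full(J, 1e-4)
logw = np.zeros(J)
s_hat = np.empty(T)
for t in range(T):
    mu_p, P_p = a * mu, a * a * P + qu             # step (1): Kalman prediction
    like = np.empty((2, J))
    for k, s in enumerate((-1.0, 1.0)):            # predictive likelihood per symbol
        S = s * s * P_p + qv
        like[k] = np.exp(-0.5 * (y[t] - s * mu_p) ** 2 / S) / np.sqrt(S)
    prob = like / like.sum(axis=0)                 # step (2): optimal proposal, cf. (15)
    s_j = np.where(rng.random(J) < prob[1], 1.0, -1.0)
    logw += np.log(like.sum(axis=0))               # step (3): sum over A, cf. (16)
    K = s_j * P_p / (s_j * s_j * P_p + qv)         # step (4): Kalman update of h_t
    mu = mu_p + K * (y[t] - s_j * mu_p)
    P = (1.0 - K * s_j) * P_p
    w = np.exp(logw - logw.max())
    w /= w.sum()
    s_hat[t] = np.sign(np.sum(w * s_j))            # weighted symbol estimate
    if 1.0 / np.sum(w ** 2) < J / 2:               # resample when weights degenerate
        idx = rng.choice(J, size=J, p=w)
        mu, P, logw = mu[idx], P[idx], np.zeros(J)
```

With the channel marginalized analytically, J = 200 particles suffice in this toy setting; sampling (s_t, h_t) jointly would require far more.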

Delayed estimates
When channel coding or intersymbol interference (ISI) is present, future observations also contain information about the current state vector. In such cases, a delayed estimate is more accurate. There are two methods pertaining to delayed estimation: the delayed weight method and the delayed sample method [16]. In the following discussion, we first assume that the only unknown portion of the state vector is the user state vector, which belongs to a finite discrete space, that is,

s_t ∈ A.    (17)

The distribution function of interest in the delayed weight method is p(s_t | y_{0:t+q}). It can be obtained by marginalizing the full delayed posterior distribution p(s_{0:t+q} | y_{0:t+q}). If at time t + q we have obtained samples {s_{0:t+q}^{(j)}}_{j=1}^{J} appropriately weighted with weights {w_{t+q}^{(j)}}_{j=1}^{J}, the full posterior distribution at t + q is approximated by

p(s_{0:t+q} | y_{0:t+q}) ≈ (1/W_{t+q}) Σ_{j=1}^{J} w_{t+q}^{(j)} 1(s_{0:t+q} = s_{0:t+q}^{(j)}),    (18)

where 1(·) denotes the indicator function. The second method of delayed estimation comes from the observation that we can improve sampling efficiency by incorporating future observations into the proposal distribution, which becomes p(s_t | s_{0:t−1}^{(j)}, y_{0:t+q}). This proposal distribution can be expanded according to

p(s_t | s_{0:t−1}^{(j)}, y_{0:t+q}) = Σ_{s_{t+1:t+q} ∈ A^q} p(s_t, s_{t+1:t+q} | s_{0:t−1}^{(j)}, y_{0:t+q}).    (19)

The evaluation of this proposal density is quite complex because, in essence, the algorithm is propagated over all possible future state vectors from t + 1 to t + q and then marginalized to obtain s_t. If the state vectors are continuous, the summation in (19) becomes an integration, and in most cases there is no analytical solution.
The samples drawn from the delayed proposal also need to be weighted according to the desired posterior distribution p(s_t | y_{0:t+q}), that is,

w_t^{(j)} = w_{t−1}^{(j)} [Σ_{s_{t:t+q} ∈ A^{q+1}} p(y_{t:t+q} | s_{0:t+q}) p(s_{t:t+q} | s_{0:t−1}^{(j)})] / D_t^{(j)},    (20)

where the denominator D_t^{(j)} can be expanded as

D_t^{(j)} = Σ_{s_{t:t+q−1} ∈ A^q} p(y_{t:t+q−1} | s_{0:t+q−1}) p(s_{t:t+q−1} | s_{0:t−1}^{(j)}).    (21)

Now assume that the channel state vector is also unknown and that the two parts of the state vector are conditionally Gaussian. Then the proposal distribution for the delayed sample method can be derived similarly to the non-delayed case, and it results in

p(s_t | s_{0:t−1}^{(j)}, y_{0:t+q}) = Σ_{s_{t+1:t+q} ∈ A^q} p(s_t, s_{t+1:t+q} | s_{0:t−1}^{(j)}, y_{0:t+q}).    (22)

The term inside the summation in (22) can be readily evaluated. The weights of the samples are obtained as in (20). Based on these derivations, an algorithm was developed in [16] for the online estimation of user data using the delayed sample method when the state vector can be partitioned into two parts; the details of the algorithm can be found there. The complexity of the algorithm is proportional to the size of the set A^q. If the size of this set is moderate, the algorithm is still practical; otherwise, it is too computationally intensive to be applied.
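The |A|^q cost of the delayed sample method is easy to make concrete. The short enumeration below (an illustration of ours, with 8PSK as the alphabet) lists the candidate future symbol sequences that the summation in (22) must visit:

```python
from itertools import product

A = range(8)                                # 8PSK symbol alphabet, |A| = 8
for q in (1, 2, 3):
    futures = list(product(A, repeat=q))    # all s_{t+1:t+q} in A^q
    print(q, len(futures))                  # count grows as 8**q
```

Even q = 3 already multiplies the per-particle work by 512 for an 8PSK alphabet, which is why small delays are attractive in practice.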

APPLICATION OF SIS TO JOINT ESTIMATION AND DECODING OF STTCs
In Section 2, we showed that the SIS algorithm can be applied to systems described by DSSMs and that, when the state vector consists of two parts, the mixture SIS and Kalman filtering algorithm can be used. Therefore, to use the SIS algorithm, we first represent the STTC system as a DSSM.
Suppose that a communication system employs N transmit and M receive antennas, as in Figure 1. A sequence of user data symbols s_0, ..., s_t, where s_t ∈ A, is put through a space-time trellis encoder. The new state vector of the encoder at time t is determined according to the state transition equation s_t = z(s_{t−1}, s_t), where s_{t−1} is the previous state and s_t is the new user symbol. For example, in the case of the delay diversity STTC, the state vector simply consists of the last two user symbols, s_t^T = [s_t, s_{t−1}]. Based on the current state vector, the space-time encoder then generates a set of N symbols, c^T(s_t) = [c_1(s_t), ..., c_N(s_t)], to be transmitted by the N antennas, where c_i(·) denotes the code and modulation function of the ith antenna. Let α_{nm,t} be the fading coefficient from the nth transmit antenna to the mth receive antenna at time t. Assuming ideal timing and frequency information, the received signal in the flat-fading environment at the mth antenna can be written as

y_{m,t} = Σ_{n=1}^{N} α_{nm,t} c_n(s_t) + v_{m,t},  m = 1, 2, ..., M,    (23)

where v_{m,t} ∼ N_c(0, σ_{v_m}^2) is a complex Gaussian noise process at the mth receive antenna. Let α_{m,t} = [α_{1m,t}, ..., α_{Nm,t}]^T represent the set of channel states from all transmit antennas to the mth receive antenna. If we arrange all the channel states at time t into a single NM × 1 vector

α_t = [α_{1,t}^T, ..., α_{M,t}^T]^T,    (24)

all the received signals at time t can be written in vector form as

y_t = C(s_t) α_t + v_t,    (25)

where y_t = [y_{1,t}, ..., y_{M,t}]^T is the received signal vector and v_t = [v_{1,t}, ..., v_{M,t}]^T is the noise vector. The code and modulation matrix C(s_t) is an M × NM block-diagonal matrix of the form

C(s_t) = [ c^T(s_t)   0^T       ...  0^T
           0^T        c^T(s_t)  ...  0^T
           ...
           0^T        0^T       ...  c^T(s_t) ],    (26)

where 0 is an N × 1 all-zero vector. This somewhat odd matrix representation is selected to simplify the description of our joint estimation and decoding algorithm. The fading coefficient from the nth transmit to the mth receive antenna can be modeled as an ARMA process chosen to match the power spectral density of the channel [16]. An ARMA(r_1, r_2) process can be represented as

α_{nm,t} = Σ_{i=1}^{r_1} φ_i α_{nm,t−i} + Σ_{i=0}^{r_2} ρ_i u_{nm,t−i},    (27)

where u_{nm,t} is a unit-variance i.i.d. complex Gaussian process that drives the ARMA process, and {φ_i} and {ρ_i} are known autoregressive (AR) and moving-average (MA) coefficients. To represent the ARMA process as a DSSM, we assume equal orders of the AR and MA parts of the model, that is, r_1 = r_2 = r; otherwise, we make the orders equal by zero padding. The ARMA process can be described as a DSSM if we introduce h_{nm,t} = [h_{nm,t}, ..., h_{nm,t−r}]^T as the state vector of dimension r + 1 for the AR part of the ARMA model (see the Appendix for more detailed information). The state transition equation is of the form

h_{nm,t} = F_{nm} h_{nm,t−1} + e u_{nm,t},    (28)

where F_{nm} is the (r + 1) × (r + 1) companion matrix of the AR coefficients and e = [1, 0, ..., 0]^T. The observation equation of the ARMA DSSM is

α_{nm,t} = ρ^T h_{nm,t},    (29)

where ρ = [ρ_0, ρ_1, ..., ρ_r]^T is the vector of known MA coefficients. Considering all the fading coefficients involved, one can represent the whole system in the compact DSSM form

h_t = F h_{t−1} + E u_t,    (30)
α_t = R h_t,    (31)

where h_t = [h_{11,t}^T, ..., h_{N1,t}^T, ..., h_{1M,t}^T, ..., h_{NM,t}^T]^T is the extended state vector. The other symbols, R, F, and E, in the above equations are defined accordingly (see the Appendix). Now the received signal vector can be written as

y_t = C(s_t) R h_t + v_t.    (32)
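The companion-form construction above can be checked numerically. The sketch below (with made-up real-valued ARMA coefficients, not the channel model of [16]) builds F and e for an ARMA(r, r) process and verifies that the state-space recursion h_t = F h_{t−1} + e u_t, α_t = ρ^T h_t reproduces the direct ARMA recursion:

```python
import numpy as np

# Illustrative ARMA(3, 3) coefficients (not the coefficients of [16]).
phi = np.array([0.9, -0.2, 0.05])        # AR coefficients phi_1..phi_r
rho = np.array([1.0, 0.5, 0.25, 0.1])    # MA coefficients rho_0..rho_r
r = len(phi)

# Companion matrix F and input vector e for the state
# h_t = [h_t, h_{t-1}, ..., h_{t-r}]^T of the AR part.
F = np.zeros((r + 1, r + 1))
F[0, :r] = phi                 # h_t = phi_1 h_{t-1} + ... + phi_r h_{t-r} + u_t
F[1:, :-1] = np.eye(r)         # shift the older states down
e = np.zeros(r + 1)
e[0] = 1.0

rng = np.random.default_rng(2)
T = 200
u = rng.normal(size=T)

# Direct ARMA recursion (zero initial conditions).
h_ar = np.zeros(T)
alpha_direct = np.zeros(T)
for t in range(T):
    h_ar[t] = sum(phi[i] * h_ar[t - 1 - i] for i in range(r) if t - 1 - i >= 0) + u[t]
    alpha_direct[t] = sum(rho[i] * h_ar[t - i] for i in range(r + 1) if t - i >= 0)

# Equivalent state-space recursion.
hvec = np.zeros(r + 1)
alpha_ss = np.zeros(T)
for t in range(T):
    hvec = F @ hvec + e * u[t]
    alpha_ss[t] = rho @ hvec
```

For a complex fading channel, one would drive the recursion with complex Gaussian u_{nm,t} and stack NM such blocks into the block-diagonal F, E, and R of the compact DSSM.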

Mixture Kalman filtering and sequential importance sampling for STTC
We have represented the STTC system by a DSSM, and we note that the state vector consists of two parts: the user state vector s_t, and the channel state vector h_t, which is linear Gaussian given s_t. Consequently, the mixture SIS and Kalman filtering algorithm can be applied. Define the set of all received signals up to time t as y_{0:t} = {y_0, y_1, ..., y_t}, and define s_{0:t} and h_{0:t} similarly. Our goal is to sample from the posterior distribution p(s_{0:t} | y_{0:t}). The algorithm is summarized as follows, for the recursion of the jth sample at the tth time instant:

(1) Sample from the optimal proposal distribution following (15). In the case of the STTC, the proposal probability for s_t = a_i ∈ A, or equivalently s_t = z(a_i, s_{t−1}^{(j)}), is obtained by

π_{t, s_t = z(a_i, s_{t−1}^{(j)})}^{(j)} ∝ p(y_t | s_t, s_{0:t−1}^{(j)}, y_{0:t−1}) p(s_t = a_i).    (33)

In the above equation, the term p(h_t | s_{0:t−1}^{(j)}, y_{0:t−1}) ∼ N_c(μ_{ĥ_t}^{(j)}, Σ_{ĥ_t}^{(j)}) is calculated via the prediction step of Kalman filtering given the previous elements of the jth sample, s_{0:t−1}^{(j)}. Then the likelihood p(y_t | s_t, s_{0:t−1}^{(j)}, y_{0:t−1}) ∼ N_c(μ_{y,t}, Σ_{y,t}) can be found in terms of its mean and covariance matrix,

μ_{y,t} = C(s_t) R μ_{ĥ_t}^{(j)},
Σ_{y,t} = C(s_t) R Σ_{ĥ_t}^{(j)} R^H C^H(s_t) + Q_v.    (34)

Here, Q_v is the M × M covariance matrix of the receiver noise vector. At the end of this step, we have the jth sample s_{0:t}^{(j)} and the proposal probabilities π_{t, s_t}^{(j)} for all candidate symbols.

(2) Compute the weight of the jth sample according to (16).

(3) Update the estimate of the channel state vector probability given the new observation y_t and the new sample s_{0:t}^{(j)}, to prepare for the next round of estimation using the Kalman filter. It is easy to verify that the posterior density of the channel state is proportional to the product of the likelihood and the predictive channel density,

p(h_t | y_{0:t}, s_{0:t}) ∝ p(y_t | h_t, s_t) p(h_t | y_{0:t−1}, s_{0:t−1}).    (35)

The predictive density of the channel state vector is Gaussian, N_c(μ_{ĥ_t}^{(j)}, Σ_{ĥ_t}^{(j)}), and the likelihood function is proportional to a Gaussian with mean and covariance given by (34); therefore the posterior density of the channel state vector is a complex Gaussian with mean and covariance

μ_{h_t}^{(j)} = μ_{ĥ_t}^{(j)} + K (y_t − C(s_t^{(j)}) R μ_{ĥ_t}^{(j)}),
Σ_{h_t}^{(j)} = (I − K C(s_t^{(j)}) R) Σ_{ĥ_t}^{(j)},    (36)

where

K = Σ_{ĥ_t}^{(j)} R^H C^H(s_t^{(j)}) (C(s_t^{(j)}) R Σ_{ĥ_t}^{(j)} R^H C^H(s_t^{(j)}) + Q_v)^{−1},    (37)

and (·)^H stands for Hermitian transpose. Note that C(s_t^{(j)}) need not be full rank to ensure the existence of the matrix inversion in (37). Equation (37) is equivalent to the Kalman filter equations, and K is analogous to the Kalman gain when the code matrix C(s_t^{(j)}) is known.

(4) Predict the channel state vector for the next time instant and calculate the mean and covariance matrix of the prediction,

μ_{ĥ_{t+1}}^{(j)} = F μ_{h_t}^{(j)},    (38)
Σ_{ĥ_{t+1}}^{(j)} = F Σ_{h_t}^{(j)} F^H + E Q_u E^H,    (39)

where Q_u is the NM × NM covariance matrix of the complex Gaussian noise process that drives the ARMA processes of the fading coefficients.

(5) Draw inference about the transmitted data. According to [16], the a posteriori symbol probability is estimated by

P(s_t = a_i | y_{0:t}) ≈ (1/W_t) Σ_{j=1}^{J} w_t^{(j)} 1(s_t^{(j)} = z(a_i, s_{t−1}^{(j)})).    (40)

Note that in the algorithm described above, the fading coefficients and the noise vectors at the receive antennas are not required to be independent, as they are in most other algorithms.
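Steps (3) and (4) are ordinary Kalman update and prediction steps. As a sanity check, the snippet below (with random illustrative matrices standing in for C(s_t)R, and real-valued arithmetic for brevity; the complex case replaces transposes with Hermitian transposes) performs one update in the gain form used above and verifies it against the information-form Gaussian posterior:

```python
import numpy as np

rng = np.random.default_rng(3)

# One Kalman measurement update for y = H h + v, where H plays the role
# of C(s_t) R.  Dimensions and matrices are illustrative only.
M, d = 2, 6
H = rng.normal(size=(M, d))
mu = rng.normal(size=d)        # predictive mean of h_t
Sigma = np.eye(d)              # predictive covariance of h_t
Qv = 0.1 * np.eye(M)           # observation noise covariance
y = rng.normal(size=M)

S = H @ Sigma @ H.T + Qv                   # innovation covariance, cf. (34)
K = Sigma @ H.T @ np.linalg.inv(S)         # Kalman gain, cf. (37)
mu_post = mu + K @ (y - H @ mu)            # posterior mean
Sigma_post = (np.eye(d) - K @ H) @ Sigma   # posterior covariance

# Cross-check against the information (precision) form of the posterior.
Prec = np.linalg.inv(Sigma) + H.T @ np.linalg.inv(Qv) @ H
mu_ref = np.linalg.solve(
    Prec, np.linalg.solve(Sigma, mu) + H.T @ np.linalg.inv(Qv) @ y
)
```

The agreement of the two forms also illustrates why H (that is, C(s_t)R) need not be full rank: only the M × M innovation covariance S is inverted.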
Since the space-time code is trellis coded, future observations, besides the current and previous received signals y_{0:t}, hold information about the current user state; hence it is appropriate to use the delayed importance function as well as the delayed weight method in evaluating the posterior density function. The number of delays can be chosen according to the constraint length of the trellis code.

PHASE AMBIGUITY
The fading coefficients and the modulated user data are the unknowns to be estimated, and there may be multiple pairs of solutions for the same observation. For example, if (C(s_t), α_t) is the true solution of the observation equation

y_t = C(s_t) α_t + v_t,    (41)

then a phase-shifted version

(C̃(s_t), α̃_t) = (C(s_t) Θ, Θ^{−1} α_t)    (42)

is an equally likely solution, where Θ is a phase-shifting matrix. This unfavorable condition is called phase ambiguity. It is obviously a major problem that needs consideration. In the literature, unless unitary STC is employed, pilot signals are used for channel estimation. However, unitary STC and pilot signals are not necessarily the best strategies. To reduce phase ambiguity while providing better coding gain, we opt here for an STTC designed using a new criterion proposed in [23] and summarized below.
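The ambiguity is easy to exhibit numerically. In the sketch below (our own illustration with random values), a diagonal phase shift applied to the code vector and its inverse applied to the channel vector leave the noiseless observation unchanged, so no amount of data can distinguish the two solutions:

```python
import numpy as np

rng = np.random.default_rng(4)

N = 2                                                    # transmit antennas
c = np.exp(1j * np.pi * rng.integers(0, 8, size=N) / 4)  # an 8PSK code vector
alpha = rng.normal(size=N) + 1j * rng.normal(size=N)     # fading coefficients
Theta = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, size=N)))  # phase shifts

y_true = c @ alpha                                       # noiseless observation
y_shifted = (c @ Theta) @ (np.linalg.inv(Theta) @ alpha)
```

With noise added, the pair (C(s_t)Θ, Θ^{−1}α_t) attains exactly the same likelihood as the true pair, which is why the estimator can lock onto a rotated channel as in Figure 2.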
It is desirable to design the set of code vector sequences so that, with optimal joint estimation and decoding, for any pair (C_{0:T}, C̃_{0:T}) the condition p(C_{0:T}, α_{0:T} | y_{0:T}) > p(C̃_{0:T}, α̃_{0:T} | y_{0:T}) is met. With the assumption in (42), and because the code vector sequences are independent of the channel vector sequences, the above condition can be simplified to

p(α_{0:T}) > p(α̃_{0:T}).    (43)

The last step of the derivation is based on the assumption that the user data sequences are i.i.d., and therefore the prior probability of each code vector sequence is the same. If the channel state vectors are i.i.d. in time, the condition in (43) cannot be met: the two sides are always equal, and there is nothing we can do in terms of ST design to solve the problem. On the other hand, if the channel state vector is represented by an HMP, as we have described, then the condition can be further simplified using the chain rule. It can be verified that the only case in which the condition is not satisfied is when the phase difference sequence is homogeneous, that is, when the same phase shift relates the two code vector sequences at every time instant. Therefore, in designing the code vector sequence set, we should avoid the situation in which the phase difference sequences are homogeneous between any pair of code vector sequences. If this cannot be avoided completely, one should try to limit the frequency with which code vector sequences with homogeneous phase difference sequences appear.

Consider, for example, the 8PSK delay diversity STTC with two transmit antennas described in [2]; we refer to this code as the Tarokh STTC. The constraint length of the trellis is two, and the codes c_1(s_t) and c_2(s_t) for the two transmit antennas are given in (45), where s_t = [s_t, s_{t−1}]^T and each user data symbol s_t ∈ A = {0, 1, ..., 7}. The first two constellations in Figure 4 illustrate the two codes represented by (45). The phase-shifting matrix in this case has two elements and is given by Θ = diag{θ_1, θ_2}. A phase-shifted version of the channel state estimate will produce erroneous estimates of the transmitted user data. For example, if the channel estimates over two consecutive time slots are phase shifted by π, then the estimate of s_t at time t is ŝ_{t,1} = c_1^{−1}(c_1(s_t) e^{−jπ}), and at time t + 1 it is ŝ_{t,2} = c_2^{−1}(c_2(s_t) e^{−jπ}). It can be verified that for the code and modulation functions c_1(·) and c_2(·) in (45), the two estimates of s_t are equal for all s_t ∈ A. Therefore, the phase-shifted channel estimate and the corresponding ŝ_t are equally likely estimates of s_t. In summary, for this STTC, the condition for homogeneous phase difference sequences to occur is equivalent to

ŝ_{t,1} = ŝ_{t,2}  for all s_t ∈ A.    (46)

To address the problem of phase ambiguity, one may design space-time codes such that the occurrence of the condition specified in (46) is minimized for a given coding gain. Define the degree of phase regularity as the number of pairs of user data symbols that have the same phase difference in all of the modulation constellations. For example, the phase difference between s_t = 2 and s_t = 4 is the same in constellations c_1 and c_2. We can find 12 pairs of such user data, so the degree of phase regularity is 12. Reducing the occurrence of (46) is equivalent to reducing the degree of phase regularity. We can reduce phase regularity by redesigning the constellation and/or by increasing the number of constellations at the expense of coding, spatial, or temporal efficiency. The design of temporally/spatially varying modulation constellations that best exploits the spatial and temporal diversities while balancing the need to reduce phase regularity is itself a very challenging task.
We consider the use of pilot signals to be a special case of the general idea of reducing phase regularity by using different constellations across different antennas or time slots. Pilot signaling can be viewed as employing a special constellation in which all user data are modulated onto a single point, with this constellation transmitted periodically.

SIMULATION
We simulated an STTC system with two transmit antennas and one receive antenna. The ARMA model chosen for the fading coefficients is the one described in [16], of order (3, 3). This model corresponds to a fast-fading scenario with a normalized Doppler frequency (with respect to the symbol rate 1/T) of f_d T = 0.05.
We applied the mixture SIS and Kalman filtering algorithm to the Tarokh STTC system described in Section 4. Due to phase ambiguity, the algorithm invariably breaks down. Figure 2 displays the tracking of the complex fading coefficients, where the dotted lines represent the true channel and the solid lines show the estimated channel. Consequently, the estimated user symbols are erroneous under the phase-shifted channel estimates. To alleviate phase ambiguity, we first tried the traditional approach of sending pilot signals. To accommodate them, each time slot was divided into two sub-slots (an increase in bandwidth), and it was assumed that the fading coefficients did not change between the sub-slots. The first set of sub-slots was assigned to user signals coded by the Tarokh STTC, and the second set of sub-slots was used to send pilot signals, that is, both transmit antennas sent known signals to the receiver. In our scheme, the pilot signals used the same amount of energy as the user signals, and pilots were inserted every other symbol because of the fast-fading channel considered. The tracking of the channel is shown in Figure 3; phase ambiguity no longer exists. However, the performance of the algorithm in terms of symbol error rate (SER), shown in Figure 6, is not satisfactory. In Section 4, we pointed out that if we design the code vector sequences so that the phase difference sequences are not homogeneous, phase ambiguity can be avoided. Here we propose four ad hoc designed 8PSK constellations to be used when each time slot is divided into two sub-slots, as in the case with pilot signals. In the first sub-slot, the two antennas transmit with the constellations c_1 and c_2; in the second sub-slot, they transmit with

c_3 = {1, e^{−jπ/4}, −j, e^{−j3π/4}, e^{jπ/4}, j, e^{j3π/4}, −1},
c_4 = {1, e^{j3π/4}, −j, e^{jπ/4}, e^{−j3π/4}, j, e^{−jπ/4}, −1}.    (48)

The constellations c_1 to c_4 are shown in Figure 4. It can be verified that only two data pairs, (4, 6) and (5, 7), have the same phase difference across all four constellations. The degree of phase regularity is thus reduced to 2, as opposed to 12 in the original Tarokh STTC. Unless a long string of user data composed only of the symbols 4 to 7 occurs, which has very low probability when the user data symbols are i.i.d., phase ambiguity can be avoided. The tracking of the channel estimates is shown in Figure 5, which verifies our analysis of phase ambiguity. From Figure 6, we can see that the proposed system performs much better than the system using pilot signals, although the two systems have the same time and frequency efficiency. In general, we have found that using pilot signals is not necessarily as efficient as using carefully designed space-time coding schemes.
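The degree of phase regularity of the proposed scheme can be checked mechanically. In the sketch below, the first two constellations are assumed to be the standard 8PSK mapping e^{jπs/4} (an assumption on our part; the actual c_1 and c_2 appear in Figure 4), while c_3 and c_4 are the constellations of (48). Under this assumption, the only symbol pairs with a common phase difference in all four constellations are (4, 6) and (5, 7), in agreement with the text:

```python
import cmath
import math

# c1, c2: assumed standard 8PSK mapping (illustrative stand-in for Figure 4).
c12 = [cmath.exp(1j * math.pi * s / 4) for s in range(8)]
# c3, c4: the ad hoc constellations of (48).
c3 = [1, cmath.exp(-1j * math.pi / 4), -1j, cmath.exp(-3j * math.pi / 4),
      cmath.exp(1j * math.pi / 4), 1j, cmath.exp(3j * math.pi / 4), -1]
c4 = [1, cmath.exp(3j * math.pi / 4), -1j, cmath.exp(1j * math.pi / 4),
      cmath.exp(-3j * math.pi / 4), 1j, cmath.exp(-1j * math.pi / 4), -1]

def phase_diff(c, a, b):
    """Phase of c[b] / c[a], wrapped into [0, 2*pi)."""
    return cmath.phase(c[b] / c[a]) % (2 * math.pi)

def same_phase_pairs(constellations, tol=1e-9):
    """Symbol pairs (a, b) whose phase difference is the same in every constellation."""
    pairs = []
    for a in range(8):
        for b in range(a + 1, 8):
            d = [phase_diff(c, a, b) for c in constellations]
            if all(abs(x - d[0]) < tol for x in d[1:]):
                pairs.append((a, b))
    return pairs

pairs = same_phase_pairs([c12, c12, c3, c4])
degree = len(pairs)   # degree of phase regularity of the four-constellation scheme
```

The same counting function can be used to evaluate other candidate constellation sets when trading phase regularity against coding gain.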
It is well known that in the implementation of the SIS method, as the recursions proceed, the variance of the sample weights increases. This means that only a few samples have significant weights, which leads to poor estimates of the unknowns. This problem can be addressed by resampling [14], which is a process of redrawing a new set of samples from the old set so that the variance of the new weights is smaller than that of the old weights. During our simulations, we found that we had to apply resampling frequently, which is essential for good performance of the algorithm. We used the same residual resampling process as described in [16], and resampling was performed every 5 steps. Because the constraint length of the STTC is one, the numbers of delayed weights and delayed samples were both selected as one. As a lower bound, the results were compared with the genie-aided case, in which a separate stream of known user data was space-time coded in the same proposed scheme and sent through the same channel for the estimation of the channel. The true user data were then decoded using the additional transmitted data. As we can see from Figure 6, there is a 3 dB performance gap between the two cases. If we take into account the total amount of energy used for channel estimation in the genie-aided case, the 3 dB performance loss can be explained. For every simulated point, at least 100 symbol errors were accumulated.
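Residual resampling can be sketched as follows (a minimal illustration under our own variable names, not the authors' code): each trajectory j is first copied floor(m·w̃_j) times deterministically, and the remaining slots are filled by multinomial sampling on the leftover fractional weights.

```python
import numpy as np

def residual_resample(weights, rng=None):
    """Return the indices of the trajectories to copy (residual resampling)."""
    rng = np.random.default_rng() if rng is None else rng
    m = len(weights)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize the importance weights
    counts = np.floor(m * w).astype(int)  # deterministic copies
    residual = m * w - counts             # leftover (fractional) weights
    n_rest = m - counts.sum()             # slots still to fill
    if n_rest > 0:
        residual = residual / residual.sum()
        counts += rng.multinomial(n_rest, residual)
    return np.repeat(np.arange(m), counts)

idx = residual_resample([0.5, 0.3, 0.1, 0.1], np.random.default_rng(0))
print(idx)  # m indices; heavy trajectories appear multiple times
```

After resampling, all weights are reset to 1/m, which reduces the weight variance while keeping the empirical distribution of the samples approximately unchanged.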
If the delayed-weight and delayed-sample methods are not used, the performance deteriorates considerably. In Figure 7, we compare the performance when different numbers of delayed samples (∆ = 0, 1) and delayed weights (δ = 0, 1) are used in the algorithm. The results in Figure 7 are consistent with those in [16].

CONCLUSIONS
In this paper, we proposed and showed the viability of a mixture SIS and Kalman filtering algorithm for joint estimation and decoding of STTCs. The channels in the system were modeled as ARMA processes of known orders and model coefficients, and the whole system was represented by a DSSM. Good results can be obtained only if the problem of phase ambiguity is addressed appropriately. Four ad hoc designed constellations were used in our simulations, and the results showed that the phase ambiguity was avoided.
Future research could extend to the design of STTCs that best exploit the spatial/temporal diversity while reducing phase ambiguity when the SIS algorithm is used. Another direction is to apply more computationally efficient delayed-sampling algorithms based on SIS to the joint estimation and decoding of STTCs.

If we let h_nm,t = [h_nm,t, . . ., h_nm,t−r]^T be the channel state vector, its dynamics in time are described by (29), and the fading coefficient becomes α_nm,t = ρ^T h_nm,t.
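For a single channel, this state-space form can be sketched numerically (the AR order r, the coefficients, and the noise variance below are placeholder values of our choosing, not the paper's): the state stacks the current and the r previous channel values, the transition matrix follows the AR recursion, and ρ reads out the fading coefficient.

```python
import numpy as np

r = 2                        # AR model order (placeholder)
a = np.array([1.7, -0.8])    # AR coefficients (placeholder values)
q = 0.01                     # process noise variance (placeholder)

# Companion transition matrix: h_t = F1 @ h_{t-1} + e * u_t
F1 = np.zeros((r + 1, r + 1))
F1[0, :r] = a                # AR recursion for the newest entry
F1[1:, :-1] = np.eye(r)      # shift older entries down
e = np.zeros(r + 1); e[0] = 1.0

rho = np.zeros(r + 1); rho[0] = 1.0   # read-out: alpha_t = rho^T h_t

# One Kalman prediction step for the channel state (mean and covariance)
h_mean = np.zeros(r + 1)
P = np.eye(r + 1)
h_pred = F1 @ h_mean
P_pred = F1 @ P @ F1.T + q * np.outer(e, e)

alpha_pred = rho @ h_pred             # predicted fading coefficient
print(alpha_pred, P_pred[0, 0])
```

The same prediction step, applied to the stacked state of all NM channels, is what the SIS algorithm uses to evaluate its trial distribution.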

A.2. The construction of extended matrices
In this part, we list the constructions of R, E, and F. First, the NM(r + 1) × NM matrix R is block diagonal,

R = diag(ρ, ρ, . . ., ρ),

where ρ is the (r + 1) × 1 vector as defined before, and every off-diagonal block is an all-zero vector of the same dimension. E is of dimension NM(r + 1) × NM and is written in block form, where 0 is an (r + 1) × (r + 1) all-zero matrix.
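Under a block-diagonal reading of this construction (our interpretation; the dimensions N, M, r below are placeholders), R can be assembled with a Kronecker product, and R^T then extracts all NM fading coefficients from the stacked channel state at once.

```python
import numpy as np

N, M, r = 2, 1, 2                    # antenna counts and AR order (placeholders)
rho = np.zeros(r + 1); rho[0] = 1.0  # the (r+1) x 1 read-out vector

# R stacks one copy of rho per channel along the block diagonal:
R = np.kron(np.eye(N * M), rho.reshape(-1, 1))
print(R.shape)  # (N*M*(r+1), N*M)

# R^T applied to a stacked state picks out one entry per channel block:
h = np.arange(float(N * M * (r + 1)))
alpha = R.T @ h
print(alpha)  # [0. 3.]
```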

Therefore,

p(s_{0:t+q} | y_{0:t+q}) ≈ Σ_j w_{t+q}^{(j)} I(s_{0:t+q}^{(j)} = s_{0:t+q}) / Σ_j w_{t+q}^{(j)},

where I(·) is an indicator function that equals 1 when s_{0:t+q}^{(j)} = s_{0:t+q} and 0 otherwise. Similarly, the density function of interest can be approximated as

p(s_t | y_{0:t+q}) ≈ Σ_j w_{t+q}^{(j)} I(s_t^{(j)} = s_t) / Σ_j w_{t+q}^{(j)}.
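A minimal numerical sketch of this approximation (the trajectories and weights below are invented for illustration): the posterior probability of each symbol value at time t is the normalized sum of the weights of the trajectories whose t-th entry equals that value.

```python
import numpy as np

# Hypothetical sampled trajectories s^{(j)}_{0:t+q} (rows) and their weights.
samples = np.array([
    [0, 1, 3],
    [0, 2, 3],
    [0, 1, 3],
    [1, 1, 2],
])
weights = np.array([0.4, 0.3, 0.2, 0.1])

t = 1  # marginal of interest: p(s_t | y_{0:t+q})
post = {k: weights[samples[:, t] == k].sum() / weights.sum()
        for k in np.unique(samples[:, t])}
print(post)  # p(s_1 = 1) ≈ 0.7, p(s_1 = 2) ≈ 0.3
```

A point estimate of s_t (e.g., for computing the SER) is then the value maximizing this weighted empirical distribution.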

Figure 2: Example of phase ambiguity.

Figure 5: Channel tracking of the proposed STTC.
where 0 is also an (r + 1) × 1 all-zero vector. Finally, the square matrix F of dimension NM(r + 1) × NM(r + 1) is constructed according to the channel dynamics (29), that is, by stacking the per-channel transition matrices along its block diagonal.

The sampling distribution satisfies

p(s_t, s_{t+1:t+q} | s_{0:t−1}^{(j)}, y_{0:t+q})
∝ [ ∏_{l=0}^{q} ∫ p(y_{t+l} | s_{t+l}, h_{t+l}) p(h_{t+l} | s_{t+1:t+l}, s_t, s_{0:t−1}^{(j)}, y_{0:t+l−1}) dh_{t+l} ]
× p(s_t, s_{t+1:t+q} | s_{0:t−1}^{(j)}, y_{0:t−1}),   (23)

where, for example, the l = q factor is p(y_{t+q} | s_{t+1:t+q}, s_t, s_{0:t−1}^{(j)}, y_{0:t+q−1}) = ∫ p(y_{t+q} | s_{t+q}, h_{t+q}) p(h_{t+q} | s_{t+1:t+q}, s_t, s_{0:t−1}^{(j)}, y_{0:t+q−1}) dh_{t+q}. The probability density function of the channel state, p(h_{t+l} | s_{t+1:t+l}, s_t, s_{0:t−1}^{(j)}, y_{0:t+l−1}), can be obtained using the prediction step of Kalman filtering if the channel state is Gaussian. The prediction gives the mean and covariance matrix of the channel state, with which each integral in (23) can be evaluated.
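The closed-form evaluation of such Gaussian integrals can be checked on a scalar toy model (all numbers below are our own placeholders, not the paper's system): if h ~ N(m, P) and y = s·h + n with n ~ N(0, σ²), then ∫ N(y; s·h, σ²) N(h; m, P) dh = N(y; s·m, s²P + σ²), which is exactly what the Kalman prediction supplies.

```python
import math
import numpy as np

# Scalar toy model (placeholder values): y = s*h + n, h ~ N(m, P), n ~ N(0, var_n)
s, m, P, var_n, y = 0.8, 1.0, 0.5, 0.2, 1.1

def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Closed form: integrating h out gives another Gaussian in y
closed = gauss_pdf(y, s * m, s * s * P + var_n)

# Monte Carlo check of the same integral
rng = np.random.default_rng(0)
h = rng.normal(m, math.sqrt(P), size=200_000)
mc = np.mean(np.exp(-(y - s * h) ** 2 / (2 * var_n)) / math.sqrt(2 * math.pi * var_n))

print(closed, mc)  # the two values agree to a few decimal places
```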