EURASIP Journal on Applied Signal Processing 2002:8, 787–796
© 2002 Hindawi Publishing Corporation

EM-Based Multiuser Detection in Fast Fading Multipath Environments

We address the problem of multiuser detection in fast fading multipath environments for DS-CDMA systems. In fast fading scenarios, temporal variations of the channel cause significant performance degradation even with the Rake receiver. We use a previously introduced time-frequency (TF) Rake receiver, based on a canonical formulation of the channel and signals, to combat fading and multipath effects simultaneously. This receiver exploits the Doppler spread caused by the rapidly time-varying channel as an additional source of diversity. To deal with multiaccess interference while avoiding the prohibitive computational complexity of the optimum maximum-likelihood (ML) detector, we use the expectation maximization (EM) algorithm to derive an approximate ML detector. The new detector has an iterative structure very similar to that of the well-known multistage detector, with some extra parameters. At the two extreme values of these parameters, the EM detector reduces to either the one-shot TF Rake or the generalized multistage detector. For intermediate values of the parameters, it combines the two estimates to obtain a better decision for the bits of the users. Because it is derived from the EM algorithm, this detector has better convergence properties than the multistage detector: the bit estimates always converge, and if an appropriate initial vector is used, they converge to the global maximizer of the likelihood function. As a result, the new detector provides significantly improved performance while maintaining the low complexity of the multistage detector. Our simulation results confirm the expected performance improvements over the base case of the TF Rake as well as over the multistage detector used with the TF Rake.


INTRODUCTION
Multipath, fading, and multiple-access interference are the major factors that limit the performance of the existing mobile wireless communication systems. Fading of the received signal caused by wireless channels, coupled with the interference from other transmitters using the same channel, significantly degrades the performance of the receiver.
Wideband code-division multiple access (WCDMA), the accepted technology for next-generation cellular networks, provides intrinsic protection against the multipath effects of the channel. A Rake receiver structure is used to exploit the high time resolution of the wideband signal and capture the information in its multipath components.
In fast fading scenarios, temporal variations of the channel cause significant performance degradation even with the Rake receiver. The Doppler spread caused by the rapidly time-varying channel can be used as another means of diversity in such environments. Joint multipath-Doppler diversity schemes [1,2,3] use a canonical representation of the channel and signals to capture the multipath-Doppler components of the signal.
Lupas and Verdu [5] describe a family of linear detectors called decorrelators. These detectors eliminate multiuser interference at the expense of increased noise power. Furthermore, the linear decorrelating detectors require inversion of the correlation matrix, which may be difficult to perform in real time, especially for asynchronous systems. Some suboptimal approaches have been taken to implement the decorrelating detector for asynchronous systems [6,7,8,9]. The most important advantage of the decorrelating detector is that it does not require estimation of the received amplitudes.
Madhow and Honig [10] and Xie et al. [6] describe a minimum mean-squared error (MMSE) linear detector, which minimizes the mean-squared error between the actual data and the soft outputs of the conventional detector. Because it takes the background noise into account, the MMSE detector generally performs better than the decorrelating detector, and it converges to the decorrelating detector as the background noise goes to zero.
In [11], Duel-Hallen presents a nonlinear multiuser detector called the decorrelating decision-feedback detector (DDFD), in which the users are ranked according to their signal strengths from the strongest to the weakest. This detector is based on a white noise channel model whose noise-whitening filter is obtained by the Cholesky decomposition of the cross-correlation matrix. The detector performs successive interference cancellation at the output of the noise-whitening filter using past decisions. For the strongest user, this detector performs similarly to the decorrelator, but as the user's power decreases relative to the power of the interferers, the detector outperforms the decorrelator and its performance approaches the single-user bound. Its main difficulty, however, is the need to compute the Cholesky decomposition. Other successive interference cancellation detectors are described in [12,13].
In [14,15], Varanasi and Aazhang describe a parallel interference cancellation detector called the multistage detector, in which the tentative decisions obtained from the previous stage are used to estimate and subtract the multiuser interference. The first-stage decisions are usually obtained from the conventional detector. This detector, like the DDFD of [11], outperforms the decorrelator when the interfering users are stronger than the user under consideration, but its performance degrades as the energies of the interfering users decrease.
The expectation maximization (EM) algorithm has also been applied to multiple-access interference suppression in CDMA systems [16,17,18,19], as well as to channel estimation [20,21,22]. In [16,19], an iterative interference cancellation method for additive white Gaussian noise (AWGN) channels based on the EM algorithm is proposed. Since the likelihood function is bounded above, and since the EM estimates monotonically increase in likelihood, the suggested receiver is convergent. Also, because it takes into account the previous decision about the data symbol of each user when making a new decision for that user, this detector outperforms the parallel interference cancellation detector of [15] for strong users, while having similar performance for the other users.
In [17], Nelson and Poor propose other iterative multiuser receivers for CDMA systems, based on the EM algorithm and its generalized versions, such as the space-alternating generalized EM (SAGE) algorithm and the missing-parameter space-alternating algorithm. The suggested multiuser detectors have structures similar to the parallel interference cancellation method of [14], except that the updates of the estimates are made sequentially, rather than in parallel. For the same reason mentioned above, these algorithms are also convergent. The MPEM receiver suggested in [17] has a computational complexity proportional to the square of the number of users, whereas the computational complexity of the original parallel interference cancellation method grows only linearly with the number of users.
In [18], the EM algorithm is applied to maximize the likelihood function over a nondiscrete set. The discrete sequence is obtained by quantizing the unquantized estimated sequence at convergence. Since the nondiscrete maximization problem has a closed form solution, namely, the decorrelator, the performance of this scheme is expected to be upper bounded by the performance of the decorrelating receiver. However, depending on the number of the iterations used, the computational complexity of this scheme might be lower. The proposed receiver also iterates between path component estimation and maximal-ratio combining to refine the nondiscrete sequence estimate.
In this paper, we first review the canonical representation of the signal and channel in fast fading multipath environments [1,3]. Then, in Section 3, we review some of the multiuser detection techniques in fast fading channels using this representation. These include the optimal (minimum probability of error) and the linear suboptimal decorrelating and MMSE receivers, rederived in [23] for the time-frequency (TF) Rake, as well as a generalization of the multistage detector of [15].
As mentioned earlier, we intend to use the EM algorithm to find an iterative approximate ML solution for the multiuser detection problem. For this, we first, in Section 4, review the EM algorithm, and then, in Section 5, in a similar way to [16], derive the new detection scheme for fast fading multipath environments with canonical representation. The proposed detector uses the two-dimensional TF Rake receiver [1,3] to combat the fading and multipath effects. The simulation results are reported in Section 6, and show the superior performance of the proposed detector compared to the original TF Rake, as well as the generalized multistage detector. Finally, Section 7 contains the conclusions.

CANONICAL TIME-FREQUENCY REPRESENTATION OF THE SIGNALS AND CHANNEL
The TF canonical representation [1,3] exploits the multipath and Doppler effects to obtain diversity and results in a two-dimensional Rake receiver, which extracts Doppler components in addition to multipath components. This representation reduces the channel to a set of independent channels for the different time-delayed, frequency-shifted versions of the signal of each user. Figure 1 illustrates the locations of the canonical coordinates in the time delay-Doppler shift plane used for the TF representation of the channel.

In a multiuser system, the received signal is a superposition of the signals of different users and noise. In this work, we consider a synchronous CDMA system in which the signature sequences of different users are aligned in time. With this assumption, if the delay spread of the channel is much smaller than the symbol interval, we can ignore the correlation terms between the symbols of different users in adjacent time intervals, and use a one-shot detector for estimating the data bits of different users, as in [23]. Therefore, we can restrict ourselves to only the first time interval and assume that the received signal is as follows:

$$r(t) = \sum_{k=1}^{K} b_k x_k(t) + n(t), \qquad 0 \le t < T_s, \tag{1}$$

where $K$ is the number of users, $b_k \in \{-1, 1\}$ denotes the data bit of the $k$th user, $n(t)$ is a white Gaussian noise with zero mean and variance $\sigma^2$, $T_s$ is the symbol interval, and

$$x_k(t) = \int_{0}^{T_m} h_k(t, \tau)\, s_k(t - \tau)\, d\tau. \tag{2}$$

In this equation, $s_k(t)$ and $h_k(t, \tau)$ are, respectively, the signature signal and the time-varying channel impulse response for the $k$th user, and $T_m$ denotes the multipath (delay) spread of the channel. An equivalent representation for the signal $x_k(t)$ in terms of the channel spreading function $H_k(\theta, \tau)$ [24] (i.e., the Fourier transform of $h_k(t, \tau)$ with respect to $t$) is

$$x_k(t) = \int_{0}^{T_m} \int_{-B_d}^{B_d} H_k(\theta, \tau)\, s_k(t - \tau)\, e^{j 2 \pi \theta t}\, d\theta\, d\tau, \tag{3}$$

where $\theta$ corresponds to the Doppler shifts introduced by the channel and $B_d$ denotes the Doppler spread of the channel.
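We use the wide-sense stationary uncorrelated scatterer (WSSUS) [24] model for the channel, which assumes that $H(\theta, \tau)$ is a two-dimensional uncorrelated Gaussian process. For a spread spectrum signal $s(t)$ of duration $T_s$ and chip interval $T_c$, and with the WSSUS assumption for the channel, using the canonical coordinates [1,3], we can rewrite the signal $x_k(t)$ as

$$x_k(t) \approx \sum_{m=-M}^{M} \sum_{l=0}^{L} H_k^{ml}\, s_k^{ml}(t), \tag{4}$$

where $s_k^{ml}(t) = s_k(t - l T_c)\, e^{j 2 \pi m t / T_s}$ is the signature waveform of the $k$th user delayed by $l$ chip intervals and shifted in frequency by $m/T_s$, $L = \lceil T_m / T_c \rceil$, $M = \lceil B_d T_s \rceil$, and $H_k^{ml}$ are the corresponding channel coefficients [23], with any normalization constant absorbed into them.

In order to simplify the mathematical expressions, we use the following vector notation for the time-delayed and frequency-shifted versions of the signature waveforms of the users:

$$\mathbf{s}(t) = \left[\mathbf{s}_1^T(t)\ \ \mathbf{s}_2^T(t)\ \ \cdots\ \ \mathbf{s}_K^T(t)\right]^T,$$

where

$$\mathbf{s}_k(t) = \left[s_k^{-M,0}(t)\ \cdots\ s_k^{-M,L}(t)\ \ \cdots\ \ s_k^{M,0}(t)\ \cdots\ s_k^{M,L}(t)\right]^T$$

for $k = 1, 2, \ldots, K$. Using this representation, the $K(L+1)(2M+1) \times K(L+1)(2M+1)$ cross-correlation matrix of the components of the signature waveforms of different users is

$$\mathbf{R} = \int_{0}^{T_s} \mathbf{s}^*(t)\, \mathbf{s}^T(t)\, dt = \left[\mathbf{R}_{kj}\right]_{k,j=1}^{K},$$

where

$$\mathbf{R}_{kj} = \int_{0}^{T_s} \mathbf{s}_k^*(t)\, \mathbf{s}_j^T(t)\, dt.$$

We also define the channel matrix $\mathbf{H}$ as the block diagonal matrix

$$\mathbf{H} = \operatorname{diag}\left(\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_K\right),$$

where

$$\mathbf{h}_k = \left[H_k^{-M,0}\ \cdots\ H_k^{-M,L}\ \ \cdots\ \ H_k^{M,0}\ \cdots\ H_k^{M,L}\right]^T$$

for $k = 1, 2, \ldots, K$.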
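Using the above notation, (4) and (1) can be rewritten as

$$x_k(t) = \mathbf{h}_k^T \mathbf{s}_k(t), \qquad r(t) = \mathbf{b}^T \mathbf{H}^T \mathbf{s}(t) + n(t),$$

where $\mathbf{b} = [b_1\ b_2\ \cdots\ b_K]^T$. In Section 3, we will see that the outputs of the time-frequency Rake receiver, given as

$$\mathbf{z}_k = \int_{0}^{T_s} r(t)\, \mathbf{s}_k^*(t)\, dt, \qquad k = 1, 2, \ldots, K,$$

form a set of sufficient statistics for ML multiuser detection. We collect all of these vectors in one vector $\mathbf{z} = [\mathbf{z}_1^T\ \mathbf{z}_2^T\ \cdots\ \mathbf{z}_K^T]^T$. Using (14) and (9), it can be easily shown that

$$\mathbf{z} = \mathbf{R} \mathbf{H} \mathbf{b} + \mathbf{w},$$

where $\mathbf{w}$ is a zero-mean complex Gaussian noise vector with covariance matrix $E[\mathbf{w} \mathbf{w}^H] = \sigma^2 \mathbf{R}$.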
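As a rough illustration of the correlator bank that produces the TF Rake outputs, the following minimal sketch works in discrete time. It assumes chip-rate sampling with one sample per chip, a zero-padded delay within a single symbol, and a frequency shift of $m/T_s$ represented by $e^{j 2 \pi m n / N}$ for a code of length $N$. The function names `tf_shift` and `tf_rake_outputs` and the toy length-7 code are hypothetical and are not taken from the paper.

```python
import cmath
from typing import List, Sequence


def tf_shift(signature: Sequence[float], l: int, m: int) -> List[complex]:
    """Delay the signature by l chips (zero-padded) and shift it by m/T_s in frequency."""
    n_chips = len(signature)
    return [
        (signature[n - l] if n >= l else 0.0)
        * cmath.exp(2j * cmath.pi * m * n / n_chips)
        for n in range(n_chips)
    ]


def tf_rake_outputs(r: Sequence[complex], signature: Sequence[float],
                    L: int, M: int) -> List[complex]:
    """Correlate r with every delayed, frequency-shifted copy of the signature.

    Returns the (L + 1)(2M + 1) outputs ordered by m = -M..M, then l = 0..L.
    """
    outputs = []
    for m in range(-M, M + 1):
        for l in range(L + 1):
            reference = tf_shift(signature, l, m)
            outputs.append(sum(x * c.conjugate() for x, c in zip(r, reference)))
    return outputs


# Toy example (hypothetical code): a single path delayed by one chip, no noise, b = +1.
signature = [1, -1, 1, 1, -1, -1, 1]
received = tf_shift(signature, 1, 0)
z = tf_rake_outputs(received, signature, L=1, M=1)
print(len(z), abs(z[3]))  # z[3] is the (m = 0, l = 1) branch
```

In this toy case the branch matched to the actual delay and Doppler shift collects the full energy of the delayed code, while the other branches collect only cross-correlation terms.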

REVIEW OF SOME MULTIUSER DETECTION SCHEMES
In this section, we review the optimal and linear suboptimal multiuser detectors rederived in [23] for fast fading channels. We also consider the generalization of the well-known multistage detector to fast fading channels using the TF Rake.

Conventional single user receiver
The single-user receiver assumes that there is no multiaccess interference, that is, either there are no interfering users, or the signature codes of all of the users and their shifted versions are orthogonal. It can be easily shown [1,2] that, in this case, the TF Rake receiver with maximal ratio combining (MRC), given by the following expression, is the optimal (i.e., minimum probability of error) receiver:

$$\hat{b}_k = \operatorname{sgn}\left(\operatorname{Re}\left\{\mathbf{h}_k^H \mathbf{z}_k\right\}\right).$$

This receiver coherently combines the different multipath-Doppler shifted components of the signal to achieve a diversity of order $(L + 1)(2M + 1)$. Of course, it is assumed that the receiver has complete channel state information (CSI). In practice, the channel coefficients $H_k^{ml}$ may be estimated through a pilot signal transmission.
In the presence of multiaccess interference, that is, when the signature codes of the interfering users are not completely orthogonal, the above receiver is no longer optimal and does not show acceptable performance. The optimal multiuser detector is discussed in Section 3.2 and has a much higher computational complexity.
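The maximal ratio combining rule of the single-user TF Rake can be sketched in a few lines. This is only an illustration: it assumes that the channel coefficients of user $k$ are known and stacked in a list `h_k`, that the corresponding TF Rake outputs are stacked in `z_k`, and the name `mrc_decision` is hypothetical.

```python
from typing import Sequence


def mrc_decision(h_k: Sequence[complex], z_k: Sequence[complex]) -> int:
    """Coherently combine the TF Rake branches: sgn(Re(h_k^H z_k))."""
    statistic = sum(h.conjugate() * z for h, z in zip(h_k, z_k)).real
    return 1 if statistic >= 0 else -1


# Toy example: three multipath-Doppler branches, no interference and no noise,
# so each branch output equals its channel coefficient times the bit b_k = -1.
h = [0.8 + 0.3j, -0.2 + 0.5j, 0.1 - 0.4j]
z = [-c for c in h]
print(mrc_decision(h, z))
```

Each branch is weighted by the conjugate of its own channel coefficient, so the branch contributions add in phase and stronger branches count more in the decision.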

Minimum probability of error receiver
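Initially introduced by Verdu [4], the ML multiuser receiver achieves the minimum probability of error and is optimal in this sense. For the problem under consideration, the log-likelihood function of the received signal (1) can be written as

$$\log f(r; \mathbf{b}) = A - \frac{1}{2 \sigma^2} \int_{0}^{T_s} \Big| r(t) - \sum_{k=1}^{K} b_k x_k(t) \Big|^2 dt,$$

where $A$ is a constant. The ML receiver finds the vector $\mathbf{b}_{\mathrm{opt}} = [\hat{b}_1\ \hat{b}_2\ \cdots\ \hat{b}_K]^T$ such that the above log-likelihood function is maximized for $\mathbf{b} = \mathbf{b}_{\mathrm{opt}}$. Ignoring the constant terms and the terms which do not depend on the unknown bits of the users, and using (9), (10), (13), (14), (15), and (16), we define the simplified log-likelihood function as

$$\Lambda(r; \mathbf{b}) = 2 \operatorname{Re}\left\{\mathbf{b}^T \mathbf{H}^H \mathbf{z}\right\} - \mathbf{b}^T \mathbf{H}^H \mathbf{R} \mathbf{H} \mathbf{b}. \tag{20}$$

Therefore, the decision rule for the ML receiver can be written as

$$\mathbf{b}_{\mathrm{opt}} = \arg\max_{\mathbf{b} \in \{-1, 1\}^K} \Lambda(r; \mathbf{b}).$$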
We observe that the outputs of the TF Rake, z k for k = 1, 2, . . . , K, form a set of sufficient statistics for the detection problem. We also observe that, still, maximal ratio combining of the outputs of the TF Rake is necessary, though not sufficient.
The above maximization is a $K$-dimensional discrete optimization problem and requires a search over $2^K$ possibilities. As a result, the computational complexity of the receiver increases exponentially with the number of users, which makes its real-time implementation prohibitive for a large number of users. Therefore, several suboptimal approaches have been proposed. In the next subsections, we review some of these suboptimal receivers. Later, in Section 5, we introduce a new detection scheme, which solves the above optimization problem iteratively and, even with a small number of stages, shows better performance than the existing schemes of similar complexity.
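The exponential cost of the ML search can be seen in a brute-force sketch. It assumes that the simplified log-likelihood takes the quadratic form $\Lambda(\mathbf{b}) = 2 \mathbf{b}^T \mathbf{y} - \mathbf{b}^T \mathbf{G} \mathbf{b}$, with $y_k = \operatorname{Re}\{\mathbf{h}_k^H \mathbf{z}_k\}$ the maximal ratio combined TF Rake outputs and $\mathbf{G} = \operatorname{Re}\{\mathbf{H}^H \mathbf{R} \mathbf{H}\}$. The symbols `y` and `g`, the function names, and the numbers in the toy example are introduced here for illustration only.

```python
from itertools import product
from typing import List, Sequence, Tuple


def log_likelihood(b: Sequence[int], y: Sequence[float],
                   g: Sequence[Sequence[float]]) -> float:
    """Quadratic form 2 b^T y - b^T G b (assumed simplified log-likelihood)."""
    k_users = len(b)
    linear = 2.0 * sum(b[k] * y[k] for k in range(k_users))
    quadratic = sum(b[k] * g[k][j] * b[j]
                    for k in range(k_users) for j in range(k_users))
    return linear - quadratic


def ml_detect(y: Sequence[float], g: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Exhaustive search over all 2^K candidate bit vectors."""
    candidates = product((-1, 1), repeat=len(y))
    return max(candidates, key=lambda b: log_likelihood(b, y, g))


# Toy example (hypothetical numbers): three users, noise-free outputs y = G b.
g = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.5],
     [0.3, 0.5, 1.0]]
b_true = [1, -1, 1]
y: List[float] = [sum(g[k][j] * b_true[j] for j in range(3)) for k in range(3)]
print(ml_detect(y, g))
```

In this example the second entry of `y` is positive although the second user sent $-1$, so taking signs of the combined outputs alone would misdetect that user, while the joint search recovers all three bits. The loop visits every one of the $2^K$ candidates, which is what makes the approach impractical for large $K$.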

Linear suboptimal multiuser receivers
Having established that $\mathbf{z} = [\mathbf{z}_1^T\ \mathbf{z}_2^T\ \cdots\ \mathbf{z}_K^T]^T$ is a sufficient statistic for the detection problem, we can apply other low-complexity processing to this vector to obtain suboptimal receivers. The approach is motivated by the fact that, in the absence of multiaccess interference, that is, when the noise-free output of the correlators for the $k$th user is equal to $\mathbf{h}_k b_k$, the MRC is optimal. Therefore, we first try to find a reliable estimate of the vectors $\mathbf{h}_k b_k$ for $k = 1, 2, \ldots, K$, given the observation $\mathbf{z}$, and then coherently combine them to obtain the bit estimate for each user. In [23], based on this idea, the well-known decorrelating and MMSE receivers are rederived for the TF Rake. Since the noise vector at the output of these linear operations is correlated, a whitening operation is performed before maximal ratio combining of these outputs.
If the linear operation involved in the linear detector is performed using a matrix F, the general form of the overall linear multiuser TF Rake receiver will be where D is a block diagonal whitening matrix. The entries of this matrix depend on the type of the linear processing, that is, the matrix F, as well as the correlation matrix of the signature codes, R, where In Sections 3.3.1 and 3.3.2, we will consider two special cases of the above generic linear detector, called decorrelating and linear MMSE receivers.

Decorrelating receiver
From the likelihood function (20), it is easy to show that the ML estimate for $\mathbf{u} = \mathbf{H} \mathbf{b}$ is given by

$$\hat{\mathbf{u}} = \mathbf{R}^{-1} \mathbf{z}.$$

Therefore, from (22) by letting $\mathbf{F} = \mathbf{R}^{-1}$, a generalization of the decorrelating receiver of [5] can be obtained, where $\mathbf{D}_{\mathrm{dec}}$ is defined as in (23), with $\mathbf{Q} = \mathbf{Q}_{\mathrm{dec}} = \sigma^2 \mathbf{R}^{-1}$. This detector eliminates multiuser interference at the expense of increased noise power. It also requires inversion of the correlation matrix, which may be difficult to perform in real time.
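The decorrelating step can be illustrated with a small sketch. To keep it short, it assumes a single canonical component per user ($L = 0$, $M = 0$), so that the whitening matrix reduces to a positive scalar per user and does not affect the sign decision. The helper names `solve` and `decorrelate`, and all numbers, are hypothetical.

```python
from typing import List, Sequence


def solve(a: Sequence[Sequence[complex]], y: Sequence[complex]) -> List[complex]:
    """Solve a x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [list(row) + [rhs] for row, rhs in zip(a, y)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    x: List[complex] = [0j] * n
    for row in reversed(range(n)):
        acc = m[row][n] - sum(m[row][j] * x[j] for j in range(row + 1, n))
        x[row] = acc / m[row][row]
    return x


def decorrelate(r_mat: Sequence[Sequence[complex]], h: Sequence[complex],
                z: Sequence[complex]) -> List[int]:
    """Estimate u = H b as R^{-1} z, then combine coherently for each user."""
    u_hat = solve(r_mat, z)
    return [1 if (hk.conjugate() * uk).real >= 0 else -1
            for hk, uk in zip(h, u_hat)]


# Toy example: a weak first user next to two strong interferers, no noise.
r_mat = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]]
h = [0.2 + 0.1j, 2.0 - 1.0j, 1.5 + 0.5j]
b_true = [1, -1, 1]
z = [sum(r_mat[k][j] * h[j] * b_true[j] for j in range(3)) for k in range(3)]
conventional = [1 if (hk.conjugate() * zk).real >= 0 else -1 for hk, zk in zip(h, z)]
print(conventional, decorrelate(r_mat, h, z))
```

In this noise-free example the conventional decision for the weak first user is flipped by the interference, whereas the decorrelated decisions match the transmitted bits regardless of the interferers' amplitudes. The linear solve is the costly step that the text refers to as the correlation matrix inversion.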

Linear MMSE receiver
A generalization of the linear MMSE multiuser detector of [6,10] results from employing a linear MMSE estimate for $\mathbf{u} = \mathbf{H} \mathbf{b}$. It is shown in [23] that the corresponding linear operation, $\mathbf{F}$, for this detector is given by where $\boldsymbol{\Psi} = E[\mathbf{H} \mathbf{H}^H]$. The resulting linear MMSE TF Rake receiver is given by where $\mathbf{D}_{\mathrm{MMSE}}$ is defined as in (23) (see [23]). Because it takes the background noise into account, this detector generally performs better than the decorrelating detector. However, like the decorrelating detector, it requires inversion of a correlation matrix, which may be difficult to perform in real time.

Generalized multistage receiver
In [14], Varanasi and Aazhang describe a parallel interference cancellation detector called the multistage detector, which attempts to iteratively maximize the likelihood function. At each stage, the bit estimate for each user is obtained by maximizing the likelihood function over the possible values of the data bit of that user, using the bit estimates from the previous stage for all other users. From the likelihood function (20), it is easy to show that for the system with TF Rake, the $(n + 1)$st-stage estimate of the data bit of the $k$th user using this multistage detector will be given by the following expression:

$$\hat{b}_k^{(n+1)} = \operatorname{sgn}\left(\operatorname{Re}\left\{\mathbf{h}_k^H \Big(\mathbf{z}_k - \sum_{j \ne k} \mathbf{R}_{kj} \mathbf{h}_j \hat{b}_j^{(n)}\Big)\right\}\right).$$

As can be seen from the above expression, the tentative decisions obtained from the previous stage are used to estimate and subtract the multiuser interference. The first-stage decisions are usually obtained from the conventional detector, which will be given by

$$\hat{b}_k^{(0)} = \operatorname{sgn}\left(\operatorname{Re}\left\{\mathbf{h}_k^H \mathbf{z}_k\right\}\right)$$

if the TF Rake is used. This detector outperforms the decorrelating detector when the interfering users are stronger than the user under consideration, but its performance degrades as the energies of the interfering users decrease. In this case, that is, when the interfering users are not much stronger than the user under consideration, the large errors in the estimate of the interference can make the performance of the multistage detector even worse than that of the conventional detector, and using more stages may only degrade the performance further. Examples of this situation are given in Figures 3 and 5 and discussed in Section 6.

In general, there is no guarantee that the multistage detector will converge or that, if it does converge, it will produce the global maximizer of the likelihood function. However, its low computational complexity, which is a result of its iterative nature, is a motivation to look for other iterative methods for maximizing the likelihood function that have better convergence properties.
The EM algorithm is one of these methods, and will be reviewed in the next section.
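A minimal sketch of the parallel interference cancellation performed by the generalized multistage detector is given below. As a simplifying assumption, it works with the combined quantities $y_k = \operatorname{Re}\{\mathbf{h}_k^H \mathbf{z}_k\}$ and $G_{kj} = \operatorname{Re}\{\mathbf{h}_k^H \mathbf{R}_{kj} \mathbf{h}_j\}$, so that each stage computes $\operatorname{sgn}(y_k - \sum_{j \ne k} G_{kj} \hat{b}_j^{(n)})$. The names `y`, `g`, and `multistage_detect`, and the two-user numbers, are hypothetical.

```python
from typing import List, Sequence


def sgn(value: float) -> int:
    return 1 if value >= 0 else -1


def multistage_detect(y: Sequence[float], g: Sequence[Sequence[float]],
                      stages: int) -> List[int]:
    """Parallel interference cancellation starting from the conventional decisions."""
    k_users = len(y)
    b = [sgn(v) for v in y]  # stage 0: maximal ratio combined TF Rake decisions
    for _ in range(stages):
        # All users are updated in parallel from the previous stage's decisions.
        b = [
            sgn(y[k] - sum(g[k][j] * b[j] for j in range(k_users) if j != k))
            for k in range(k_users)
        ]
    return b


# Toy example: user 1 is weak, user 2 is strong; noise-free y = G b with b = (1, -1).
g = [[1.0, 1.2],
     [1.2, 4.0]]
y = [-0.2, -2.8]
print(multistage_detect(y, g, stages=0), multistage_detect(y, g, stages=2))
```

Here the conventional decision for the weak user is wrong, and one stage of cancellation corrects it because the strong interferer is detected reliably. When the tentative decisions of the interferers are unreliable, the subtracted term is wrong and the same update can make things worse, which is the failure mode described above.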

EM ALGORITHM
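The expectation maximization algorithm is an iterative method for maximizing log-likelihood functions. The original problem is formulated as the following optimization problem:

$$\hat{\mathbf{b}} = \arg\max_{\mathbf{b}} \log f_R(r; \mathbf{b}), \tag{31}$$

where $r$ is the observed data. The vector $\mathbf{b}$ can be any set of parameters. In the problem under consideration, it is the vector of unknown data bits of different users. This is a $K$-dimensional discrete optimization problem whose real-time implementation is prohibitive because of its exponential complexity in $K$ (the number of users). To construct an iterative suboptimal solution for this problem, a set of complete data, $\mathbf{y}$, is defined such that

$$r = g(y_1, y_2, \ldots, y_K) = g(\mathbf{y}), \tag{32}$$

where $g$ is some many-to-one transformation relating the complete data set, $\mathbf{y}$, to the observation $r$. Then, instead of solving the problem given in (31), we solve the following maximization problem:

$$\hat{\mathbf{b}} = \arg\max_{\mathbf{b}} \log f_Y(\mathbf{y}; \mathbf{b}). \tag{33}$$

However, as mentioned above, $\mathbf{y}$ is related to $r$ by a many-to-one transformation and there is no unique $\mathbf{y}$ for each value of $r$. Therefore, we replace the log-likelihood function in (33) with its expected value with respect to $\mathbf{y}$ given $r$, and maximize the following expression:

$$E\left[\log f_Y(\mathbf{y}; \mathbf{b}) \mid r; \mathbf{b}\right] = \int \log f_Y(\mathbf{y}; \mathbf{b})\, f_{Y|R}(\mathbf{y} \mid r; \mathbf{b})\, d\mathbf{y}. \tag{34}$$

Since $\mathbf{b}$ is also unknown, we cannot calculate $f_{Y|R}(\mathbf{y} \mid r; \mathbf{b})$ in (34); therefore, we replace $\mathbf{b}$ in $f_{Y|R}(\mathbf{y} \mid r; \mathbf{b})$ with the current estimate of $\mathbf{b}$, that is, $\hat{\mathbf{b}}$, and maximize the following function with respect to its first argument, $\mathbf{b}$:

$$U(\mathbf{b}, \hat{\mathbf{b}}) = E\left[\log f_Y(\mathbf{y}; \mathbf{b}) \mid r; \hat{\mathbf{b}}\right] = \int \log f_Y(\mathbf{y}; \mathbf{b})\, f_{Y|R}(\mathbf{y} \mid r; \hat{\mathbf{b}})\, d\mathbf{y}.$$

Using Jensen's inequality, it can be shown that

$$U(\mathbf{b}, \hat{\mathbf{b}}) \ge U(\hat{\mathbf{b}}, \hat{\mathbf{b}}) \;\Longrightarrow\; \log f_R(r; \mathbf{b}) \ge \log f_R(r; \hat{\mathbf{b}}).$$

This provides the following iterative method for maximizing the likelihood function and guarantees that the likelihood function does not decrease along the iterations:
• E-step (expectation calculation step): compute $U(\mathbf{b}, \mathbf{b}^{(n)})$, where $\mathbf{b}^{(n)}$ is the estimate of $\mathbf{b}$ in the $n$th iteration.
• M-step (maximization step): compute $\mathbf{b}^{(n+1)} = \arg\max_{\mathbf{b}} U(\mathbf{b}, \mathbf{b}^{(n)})$.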
Since the likelihood function is bounded above, and since the above estimates monotonically increase in likelihood, we expect the algorithm to converge to at least a local maximizer. By an appropriate choice of the initial estimates, b (0) , the algorithm can produce the global maximizer of the likelihood function.
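The monotonicity property follows from a standard argument, sketched here for completeness. The auxiliary function $V$ and the notation $b'$ for the current estimate are introduced only for this sketch and do not appear in the paper.

```latex
\begin{align*}
\log f_R(r; b) &= \log f_Y(y; b) - \log f_{Y|R}(y \mid r; b)
  && \text{for every } y \text{ with } g(y) = r, \\
\log f_R(r; b) &= U(b, b') - V(b, b'),
  && V(b, b') = E\big[\log f_{Y|R}(y \mid r; b) \,\big|\, r; b'\big], \\
V(b, b') &\le V(b', b')
  && \text{(Jensen's inequality)}, \\
U(b, b') \ge U(b', b') &\;\Longrightarrow\; \log f_R(r; b) \ge \log f_R(r; b').
\end{align*}
```

The second line is obtained by taking the conditional expectation of the first line given $r$ at the parameter value $b'$; the left-hand side does not depend on $y$ and is therefore unchanged. Choosing $b = b^{(n+1)}$ as the maximizer of $U(\cdot, b^{(n)})$ then guarantees that the likelihood does not decrease from one iteration to the next.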
In most cases, if the complete data is chosen properly, the maximization step of the above algorithm can be decomposed into $K$ one-dimensional maximizations, which together have linear complexity in $K$ and can be easily implemented for real-time processing.

EM-BASED MULTIUSER DETECTOR
In order to apply the EM algorithm to the problem at hand, we define the complete data, $\mathbf{y}(t) = [y_1(t)\ \cdots\ y_K(t)]^T$, where $y_k(t) = b_k x_k(t) + n_k(t)$ for $k = 1, \ldots, K$, and $n_k(t)$, $k = 1, \ldots, K$, are independent additive white Gaussian noise processes with variances $\sigma_k^2$. Then we have $r(t) = \sum_{k=1}^{K} y_k(t)$, and the log-likelihood function of the complete data is

$$\log f_Y(\mathbf{y}; \mathbf{b}) = B - \sum_{k=1}^{K} \frac{1}{2 \sigma_k^2} \int_{0}^{T_s} \left| y_k(t) - b_k x_k(t) \right|^2 dt, \tag{40}$$

where $B$ is a constant.
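In the appendix, we will show that with this choice of complete data, the result of the E-step, that is, $U(\mathbf{b}, \mathbf{b}^{(n)})$, is given by the following equality, up to terms that do not depend on $\mathbf{b}$:

$$U(\mathbf{b}, \mathbf{b}^{(n)}) = \sum_{k=1}^{K} \frac{b_k}{\sigma_k^2} \operatorname{Re}\left\{\mathbf{h}_k^H \left[\frac{\sigma_k^2}{\sigma^2} \Big(\mathbf{z}_k - \sum_{j \ne k} \mathbf{R}_{kj} \mathbf{h}_j b_j^{(n)}\Big) + \Big(1 - \frac{\sigma_k^2}{\sigma^2}\Big) \mathbf{R}_{kk} \mathbf{h}_k b_k^{(n)}\right]\right\}. \tag{41}$$

Since the data bit of each user appears only in one of the terms in the summation in (41), we can maximize each term separately in the M-step. Therefore, defining $\beta_k = \sigma_k^2 / \sigma^2$, the iterative equation for updating the $(n + 1)$st-stage estimate of the data bit of the $k$th user will be

$$b_k^{(n+1)} = \operatorname{sgn}\left(\operatorname{Re}\left\{\mathbf{h}_k^H \left[\beta_k \Big(\mathbf{z}_k - \sum_{j \ne k} \mathbf{R}_{kj} \mathbf{h}_j b_j^{(n)}\Big) + (1 - \beta_k)\, \mathbf{R}_{kk} \mathbf{h}_k b_k^{(n)}\right]\right\}\right).$$

As mentioned in Section 4, with an appropriate choice of the initial values for the unknown parameters, the algorithm converges to the global maximizer of the log-likelihood function. As in the well-known multistage detector, a good choice for $b_k^{(0)}$ is the output of the filter matched to the signature signal of the $k$th user or, if multipath and Doppler diversities are available, as in our case, the maximal ratio combined outputs of the time-frequency Rake receiver for the $k$th user,

$$b_k^{(0)} = \operatorname{sgn}\left(\operatorname{Re}\left\{\mathbf{h}_k^H \mathbf{z}_k\right\}\right).$$

The block diagram of this multiuser detection scheme is shown in Figure 2.

Figure 2: Multiuser receiver structure.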
With the above assumption for the initial value of $\mathbf{b}$, we can consider two extreme special cases of the new detection scheme as follows:
• if $\beta_k = 1$, then the new detector for user $k$ will be the same as the multistage one;
• if $\beta_k = 0$, then the new detector for user $k$ will lose its iterative nature, and will reduce to the time-frequency Rake receiver with maximal ratio combining.
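The role of $\beta_k$ can be made concrete with a minimal sketch of the EM update. It assumes the update has the form $b_k^{(n+1)} = \operatorname{sgn}\big(\beta_k (y_k - \sum_{j \ne k} G_{kj} b_j^{(n)}) + (1 - \beta_k) G_{kk} b_k^{(n)}\big)$, where $y_k = \operatorname{Re}\{\mathbf{h}_k^H \mathbf{z}_k\}$ and $G_{kj} = \operatorname{Re}\{\mathbf{h}_k^H \mathbf{R}_{kj} \mathbf{h}_j\}$ are precomputed; this form follows from the complete data defined above, but the names `y`, `g`, `beta`, and `em_detect` and all numbers are hypothetical.

```python
from typing import List, Sequence


def sgn(value: float) -> int:
    return 1 if value >= 0 else -1


def em_detect(y: Sequence[float], g: Sequence[Sequence[float]],
              beta: Sequence[float], stages: int) -> List[int]:
    """EM-style update: blend the interference-cancelled statistic with the previous decision."""
    k_users = len(y)
    b = [sgn(v) for v in y]  # initial value: maximal ratio combined TF Rake decisions
    for _ in range(stages):
        b = [
            sgn(
                beta[k] * (y[k] - sum(g[k][j] * b[j] for j in range(k_users) if j != k))
                + (1.0 - beta[k]) * g[k][k] * b[k]
            )
            for k in range(k_users)
        ]
    return b


# Toy example: weak user 1, strong user 2, noise-free y = G b with b = (1, -1).
g = [[1.0, 1.2],
     [1.2, 4.0]]
y = [-0.2, -2.8]
print(em_detect(y, g, beta=[1.0, 1.0], stages=2))  # multistage special case
print(em_detect(y, g, beta=[0.0, 0.0], stages=2))  # stays at the TF Rake decisions
print(em_detect(y, g, beta=[0.8, 0.2], stages=2))  # larger beta for the weak user
```

With $\beta_k = 1$ the second term vanishes and the update coincides with the multistage detector; with $\beta_k = 0$ the decision never moves away from its initial value. Intermediate values weight the interference-cancelled statistic against the user's own previous decision, so a strong user with a small $\beta_k$ is less easily disturbed by unreliable decisions of the weaker users.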
With a suitable choice of the parameter $\beta$ for different users, we hope to achieve better performance than both the TF Rake and the multistage receivers. According to the discussion in Section 3.4, we expect that large (close to one) values of $\beta$ will result in good performance for weak (in terms of the signal-to-interference ratio) users, whereas for strong users, smaller values of $\beta$ will provide better performance. This parameter also determines the speed of convergence of the iterative algorithm. In our simulations discussed in Section 6, the value of this parameter for each user is chosen by simulation for the best performance. However, further simulations show that the performance of the detector is not very sensitive to the exact values of these parameters, and the values from the following heuristic expression: where $\mathrm{SIR}_k$ is a measure of the signal-to-interference ratio, calculated as provide similar performance.

SIMULATION RESULTS
We implemented the EM-based multiuser detector and compared its performance with the base case of the time-frequency Rake as well as the multistage detector. The simulations are done for a system with five users with Gold sequences of spreading length 7. For the EM and multistage detectors, we obtained the performance curves for the two- and three-stage cases. The channel was modeled as a three-path channel, with independent Jakes' models for each path. Figures 3 and 4 show the plots of bit error rate (BER) versus signal-to-noise ratio (SNR) for a case with a Doppler frequency of 100 Hz. We observe that the performance of the EM-based detector is better for both users than the base case of the TF Rake as well as the multistage detector. Notice that for the multistage detector, the performance of the three-stage detector is worse than that of the two-stage detector for user 2, and does not show much improvement for user 5. As a result, the two-stage EM detector performs better than the three-stage multistage detector, which has higher computational complexity. It should be noted that the computational complexities of these two detectors with the same number of stages are similar. Finally, we observe that the three-stage EM detector provides significant gains with respect to the multistage case.
Similarly, Figures 5 and 6 show that the performance is consistent for another value of the Doppler frequency (200 Hz). The EM detector also shows similar performance for the other users. Note that the different users have different values of $\beta$ in the different plots. An appropriate value of the parameter $\beta$ can result in rapid convergence of the EM algorithm. In our simulations, these parameters are chosen by simulation for the best performance within two or three stages. As mentioned in Section 5, however, even the values obtained from the heuristic expression (44) provide satisfactory performance.

CONCLUSIONS
We have presented a new multiuser detector for CDMA systems in fast fading multipath channels. The detector uses the time-frequency Rake receiver at the front end to exploit multipath and Doppler spreads as two sources of diversity. The multiaccess interference cancellation part of the detector is based on the EM algorithm. It has an iterative structure very similar to the generalized multistage detector but with better convergence properties. As a result, unlike the multistage detector whose performance could become very poor for strong users because of the errors in the decisions of the weak users, this detector shows good performance for all users. Our simulation results show that the new EM-based detector can provide a substantial improvement in performance compared to the generalized multistage detector as well as the TF Rake.
The improvement in the performance comes at the expense of introducing a set of new parameters, which have to be chosen appropriately. In this paper, the optimum values for these parameters were found by simulation and exhaustive search. Finding an analytical expression for the optimum values of these parameters is not addressed in this paper and requires more investigation, but we have provided an ad hoc expression which is shown to provide satisfactory performance, very close to that of optimum values found by simulation.

APPENDIX
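In this appendix, we apply the E-step of the EM algorithm to (40) to obtain (41). Expanding the squared absolute value in (40) and noting that $b_k^2 = 1$, we have

$$\log f_Y(\mathbf{y}; \mathbf{b}) = g(\mathbf{y}) + \sum_{k=1}^{K} \frac{b_k}{\sigma_k^2} \operatorname{Re}\left\{\int_{0}^{T_s} y_k(t)\, x_k^*(t)\, dt\right\}, \tag{A.1}$$

where $g(\mathbf{y})$ collects the terms that do not depend on $\mathbf{b}$. According to the definition of $U(\mathbf{b}, \mathbf{b}^{(n)})$, we have to compute the conditional expected value of the log-likelihood function in (A.1) given the observed signal $r(t)$, at a parameter value $\mathbf{b}^{(n)}$. Defining $\mathbf{C}(t) = [(b_1/\sigma_1^2)\, x_1^*(t)\ \cdots\ (b_K/\sigma_K^2)\, x_K^*(t)]^T$ and ignoring the first term $g(\mathbf{y})$, which has no effect on the maximization process, we have

$$U(\mathbf{b}, \mathbf{b}^{(n)}) = \operatorname{Re}\left\{\int_{0}^{T_s} \mathbf{C}^T(t)\, E\left[\mathbf{y}(t) \mid r; \mathbf{b}^{(n)}\right] dt\right\}. \tag{A.2}$$

Since $\mathbf{y}(t)$ and $r(t)$ are jointly Gaussian and $r(t) = \sum_{k=1}^{K} y_k(t)$, the conditional mean of each component of the complete data is

$$E\left[y_k(t) \mid r; \mathbf{b}^{(n)}\right] = b_k^{(n)} x_k(t) + \frac{\sigma_k^2}{\sigma^2} \Big(r(t) - \sum_{j=1}^{K} b_j^{(n)} x_j(t)\Big), \tag{A.6}$$

and (41) can be obtained by substituting (A.6) in (A.2) and using (10), (13), and (15).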