Frequency-Domain Equalization for OFDMA-Based Multiuser MIMO Systems with Improper Modulation Schemes

In this paper, we propose a novel transceiver structure for Orthogonal Frequency Division Multiple Access (OFDMA) based uplink multiuser MIMO systems. The numerical results show that the proposed frequency-domain equalization schemes significantly outperform conventional linear MMSE-based equalizers in terms of bit error rate (BER) performance, with a moderate increase in computational complexity.


I. INTRODUCTION
MIMO techniques in combination with Orthogonal Frequency Division Multiple Access (OFDMA) have been commonly used by most of the 4G air interfaces, e.g., WiMAX, Long Term Evolution (LTE), IEEE 802.20, Wireless Broadband, etc. In the IEEE 802.16e mobile WiMAX standard, OFDMA has been adopted for both downlink and uplink transmission [1], [2]. In 3GPP LTE, Single Carrier (SC) Frequency Division Multiple Access (FDMA) is used for uplink transmission, whereas the OFDMA signaling format is exploited for downlink transmission [3]. There are also some proposals on using OFDMA for uplink transmission in the LTE-Advanced (LTE-A) standard, in which both SC-FDMA and OFDMA can be considered for uplink transmission. This paper investigates receiver algorithms for the uplink of OFDMA-based multi-user MIMO systems.
Frequency-domain equalization (FDE) is commonly used for OFDMA. This includes frequency-domain linear equalization (FD-LE) [4], decision feedback equalization (DFE) [5], [6], and the more recent turbo equalization (TE) [7], [8]. FD-LE is analogous to time-domain LE. A zero-forcing (ZF) LE [9] eliminates intersymbol interference (ISI) completely, but degrades system performance due to noise enhancement. Superior performance can be achieved by using the minimum mean square error (MMSE) criterion [9], which accounts for additive noise in addition to ISI. In OFDMA, a DFE performs better than an LE due to its ability to remove past echo ISI. However, a DFE is prone to error propagation when incorrect decisions are fed back. Consequently, it suffers from a performance loss for long error bursts. TE improves performance by adding complexity at the receiver through an iterative process, in which feedback information obtained from the decoder is incorporated into the equalizer at the next iteration. The iterative processing allows for reduction of ISI, multistream interference, and noise by exchanging extrinsic information between the equalizer and the decoder [7], [8].
The second-order properties of a complex random process are completely characterized by its autocorrelation function together with its pseudo-autocorrelation function [10]. Most existing studies on receiver algorithms only exploit the information contained in the autocorrelation function of the observed signal. The pseudo-autocorrelation function is usually not considered and is implicitly assumed to be zero. While this is the optimal strategy when dealing with proper complex random processes [11], it turns out to be sub-optimal in situations where the transmitted signals and/or interference are improper complex random processes, for which the pseudo-autocorrelation function is non-vanishing, and the performance of a linear receiver can be improved by the use of widely linear processing (WLP) [12]. Such a scenario arises when transmitting symbols with improper modulation formats (e.g., ASK and OQPSK) over complex channels. It was shown in [10] that the performance gain of WLP compared to conventional processing in terms of mean square error can be as large as a factor of 2. MIMO transceiver design was considered in [13], [14], where it was shown that when channel information is available at both the transmitter and the receiver, joint design of the precoder and decoder using WLP yields considerable performance gains over the conventional linear transceiver, at the expense of a limited increase in computational complexity, in the scenario where real-valued symbols are transmitted over complex channels. Using the same principle, a real-valued MMSE (RV-MMSE) beamformer was developed in [15] for a binary phase shift keying (BPSK) modulated system, and was shown to offer significant enhancements over the standard complex-valued MMSE (CV-MMSE) design in terms of bit error rate performance and the number of supported users.
In this paper, we show that the conventional frequency-domain linear equalizer is suboptimal for improper signals, and that performance can be greatly improved by applying widely linear processing and utilizing complete second-order statistics of improper signals.
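The factor-of-2 gain quoted from [10] can be checked on the simplest possible case. The sketch below is not taken from the cited works: it assumes a single real-valued, unit-power symbol sent over one complex channel tap `h` with circularly symmetric noise of power `n0`, and the function names are hypothetical. It compares the MSE of the best strictly linear estimate with that of the best widely linear estimate, which also uses the conjugate of the observation.

```python
def linear_mmse_mse(h: complex, n0: float) -> float:
    """MSE of the best linear estimate of a unit-power symbol s from r = h*s + w."""
    gain = abs(h) ** 2
    return n0 / (gain + n0)


def widely_linear_mmse_mse(h: complex, n0: float) -> float:
    """MSE of the best estimate built from r and conj(r) when s is real-valued.

    Solves the 2x2 normal equations C_yy * psi = C_ys for y = [r, conj(r)].
    """
    gain = abs(h) ** 2
    # Augmented correlation matrix of y for real s with E[s^2] = 1.
    c11 = gain + n0          # E[r conj(r)]
    c12 = h * h              # E[r r], the pseudo-autocorrelation (non-zero here)
    c21 = c12.conjugate()
    c22 = c11
    # Cross-correlation between y and s.
    p1, p2 = h, h.conjugate()
    det = c11 * c22 - c12 * c21
    # Cramer's rule for psi = C_yy^{-1} C_ys.
    psi1 = (p1 * c22 - c12 * p2) / det
    psi2 = (c11 * p2 - c21 * p1) / det
    # MMSE = E[s^2] - C_ys^H psi.
    mse = 1.0 - (p1.conjugate() * psi1 + p2.conjugate() * psi2)
    return mse.real


h, n0 = 0.6 + 0.8j, 0.01
# The ratio approaches 2 as n0 tends to zero.
gain_ratio = linear_mmse_mse(h, n0) / widely_linear_mmse_mse(h, n0)
```

Under these assumptions the widely linear MSE works out to `n0 / (2*|h|^2 + n0)`, so the ratio tends to 2 at high SNR, in line with the bound quoted above.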

II. SYSTEM MODEL
The cellular multiple access system under study has $n_R$ receive antennas at the base station (BS) and a single transmit antenna at the $i$th user terminal, $i = 1, 2, \ldots, K_T$, where $K_T$ is the total number of users in the system.
We consider the multi-user MIMO case with $K$ ($K \le K_T$) users being served at each time slot and $K = n_R$. The system model for an OFDMA-based MIMO transmitter and receiver is shown in Figs. 1 and 2, respectively. On the transmitter side, the user data block containing $N$ symbols first goes through a subcarrier mapping block. These symbols are then mapped to $M$ ($M > N$) orthogonal subcarriers, followed by an $M$-point Inverse Fast Fourier Transform (IFFT) to convert them to a time-domain complex signal sequence.
There are two approaches to mapping subcarriers among Mobile Stations (MSs) [3]: localized mapping and distributed mapping. The former is usually referred to as the localized FDMA transmission scheme, while the latter is usually called the distributed FDMA transmission scheme. With the localized FDMA transmission scheme, each user's data are transmitted on consecutive subcarriers, whereas with the distributed FDMA transmission scheme, the user's data are placed on subcarriers that are distributed across the OFDM symbol [3]. Because the information symbols are spread across the entire signal band, the distributed FDMA scheme is more robust against frequency-selective fading and can thus achieve better frequency diversity gain. For localized FDMA transmission, in the presence of a frequency-selective fading channel, multiuser diversity and frequency diversity can also be achieved if each user is assigned to subcarriers with favorable transmission characteristics when the channel is known at the transmitter.
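The two mapping schemes can be illustrated by the subcarrier indices they assign to one user. This is a minimal sketch: the function names are hypothetical, and the equal spacing used for the distributed case is an assumption (one common way of spreading a block across the band), not a detail given in the text.

```python
def localized_indices(n_data: int, n_subcarriers: int, start: int = 0) -> list[int]:
    """Consecutive subcarriers: start, start + 1, ..., start + n_data - 1."""
    if start < 0 or start + n_data > n_subcarriers:
        raise ValueError("block does not fit in the available subcarriers")
    return list(range(start, start + n_data))


def distributed_indices(n_data: int, n_subcarriers: int, offset: int = 0) -> list[int]:
    """Equally spaced subcarriers spread over the whole band (assumed spacing rule)."""
    spacing = n_subcarriers // n_data
    if spacing < 1 or not 0 <= offset < spacing:
        raise ValueError("invalid block size or offset")
    return [offset + spacing * i for i in range(n_data)]


def map_block(symbols, indices, n_subcarriers):
    """Place the data symbols on the chosen subcarriers; unused ones stay at zero."""
    grid = [0j] * n_subcarriers
    for sym, idx in zip(symbols, indices):
        grid[idx] = sym
    return grid


# N = 12 data symbols on M = 256 subcarriers.
loc = localized_indices(12, 256, start=24)   # subcarriers 24..35
dist = distributed_indices(12, 256)          # every 21st subcarrier, starting at 0
grid = map_block([1 + 0j] * 12, loc, 256)
```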
In this work, we only consider localized FDMA transmission. A Cyclic Prefix (CP) is inserted into the signal sequence before it is passed to the Radio Frequency (RF) module. On the receiver side, the opposite operating procedures are performed after the noisy signals are received by the receive antennas.
A MIMO Frequency-Domain Equalizer (FDE) is applied to the frequency-domain signals after subcarrier demapping, as shown in Fig. 2. For simplicity, we employ a linear MMSE receiver, which provides a good tradeoff between noise enhancement and multiple-stream interference mitigation [16].
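In the following, we let $\mathbf{D}$ denote the FFT matrix, built from the $M$-point DFT with entries indexed by $k, m \in \{1, \ldots, M\}$, where $k$ and $m$ are the sample number and the subcarrier number, respectively. Here $\otimes$ is the Kronecker product used in this construction, and $\mathbf{I}_K$ is the $K \times K$ identity matrix.
We denote by $\mathbf{D}^{-1}$ the corresponding IFFT matrix. Furthermore, we let $\mathcal{F}_n$ represent the subcarrier mapping matrix of size $M \times N$. Then $\mathcal{F}_n^{-1}$ is the subcarrier demapping matrix of size $N \times M$. The received signal after the RF module and CP removal becomes

$$\breve{\mathbf{r}} = \breve{\mathbf{H}}\mathbf{D}^{-1}\mathcal{F}_n\mathbf{x} + \breve{\mathbf{w}},$$

where $\breve{\mathbf{H}}$ is the time-domain channel matrix, $\mathbf{x} = [\mathbf{x}_1^T \cdots \mathbf{x}_K^T]^T$ is the data sequence of all $K$ users, and $\mathbf{x}_i$, $i = 1, \ldots, K$, is the transmitted user data block for the $i$th user; $\breve{\mathbf{w}} \in \mathbb{C}^{M n_R \times 1}$ is a circularly symmetric complex Gaussian noise vector with zero mean and covariance matrix $N_0\mathbf{I}$. The signal after performing the FFT operation, subcarrier demapping and employing a MIMO FDE is given by

$$\mathbf{z} = \mathbf{G}^H\mathbf{r},$$

where $\mathbf{H} = \mathcal{F}_n^{-1}\mathbf{D}\breve{\mathbf{H}}\mathbf{D}^{-1}\mathcal{F}_n$ is the channel matrix in the frequency domain and $\mathbf{r} = \mathbf{H}\mathbf{P}\mathbf{s} + \mathbf{w}$; $\mathbf{G}$ is the $KN \times KN$ equalization matrix; $\mathbf{w} \in \mathbb{C}^{n_R N \times 1}$ is a circularly symmetric complex Gaussian noise vector with zero mean and covariance matrix $N_0\mathbf{I} \in \mathbb{R}^{n_R N \times n_R N}$, i.e., $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, N_0\mathbf{I})$. The vector $\mathbf{x}$ can be expressed as $\mathbf{x} = \mathbf{P}\mathbf{s}$, where $\mathbf{s}$ is the transmitted data symbol vector. The power loading matrix $\mathbf{P} \in \mathbb{R}^{KN \times KN}$ is a block diagonal matrix whose $i$th sub-matrix is a diagonal matrix that sets the transmitted power of the $i$th user on each subcarrier. The conventional linear MMSE equalization matrix is

$$\mathbf{G} = \mathbf{C}_{rr}^{-1}\mathbf{C}_{rs}, \tag{2}$$

where $\mathbf{C}_{rr} = E[\mathbf{r}\mathbf{r}^H] = \mathbf{H}\mathbf{P}\mathbf{P}^H\mathbf{H}^H + N_0\mathbf{I}$ is the autocorrelation matrix of the observation vector $\mathbf{r}$, and $\mathbf{C}_{rs} = E[\mathbf{r}\mathbf{s}^H] = \mathbf{H}\mathbf{P}$ is the crosscorrelation matrix between the observation vector $\mathbf{r}$ and the symbol vector $\mathbf{s}$.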
Note that the aforementioned FDE is a joint equalization algorithm, i.e., the transmitted symbols from different users are jointly equalized. To achieve spatial multiplexing gain, symbols from different users are assigned to the same subcarriers in the studied OFDMA-based multiuser MIMO system. Due to co-channel interference (causing the channel matrix $\mathbf{H}$ to be non-diagonal), we need to perform joint equalization for the transmitted symbols from different users.
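The joint linear MMSE equalizer of this section can be sketched numerically as follows. This is an illustration under stated assumptions rather than the authors' implementation: a dense random matrix stands in for the frequency-domain channel $\mathbf{H}$, the power loading is $\mathbf{P} = \mathbf{I}$, the dimensions are toy values, and the helper name `mmse_equalizer` is hypothetical.

```python
import numpy as np


def mmse_equalizer(h_eff: np.ndarray, n0: float) -> np.ndarray:
    """Return G = C_rr^{-1} C_rs for r = h_eff @ s + w, with E[s s^H] = I.

    h_eff plays the role of the product HP (channel times power loading).
    """
    n_rx = h_eff.shape[0]
    c_rr = h_eff @ h_eff.conj().T + n0 * np.eye(n_rx)  # autocorrelation of r
    c_rs = h_eff                                        # cross-correlation of r and s
    # Solve C_rr G = C_rs instead of forming the inverse explicitly.
    return np.linalg.solve(c_rr, c_rs)


rng = np.random.default_rng(0)
K, N = 2, 4                      # users (= receive antennas) and block size, toy values
dim = K * N
# Assumed stand-in for the frequency-domain channel matrix H.
H = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
P = np.eye(dim)                  # equal unit transmit power for all users
s = rng.choice([-1.0, 1.0], size=dim) + 0j   # unit-power BPSK symbols
n0 = 1e-3
w = np.sqrt(n0 / 2) * (rng.standard_normal(dim) + 1j * rng.standard_normal(dim))

r = H @ P @ s + w
G = mmse_equalizer(H @ P, n0)
z = G.conj().T @ r               # joint equalization of all users: z = G^H r
```

Because `G` is computed from the full matrix `H @ P`, the symbols of both users are equalized together, which is what the co-channel interference requires.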

III. THE PROPOSED FREQUENCY-DOMAIN RECEIVER ALGORITHM
In the previous section, we presented the conventional linear MMSE solution for the uplink of OFDMA-based multiuser MIMO systems. It is designed based on the autocorrelation matrix $\mathbf{C}_{rr}$ and the crosscorrelation matrix $\mathbf{C}_{rs}$. It is only optimal for systems with proper modulation, such as $M$-QAM and $M$-PSK, for which the pseudo-autocorrelation $\tilde{\mathbf{C}}_{rr} = E[\mathbf{r}\mathbf{r}^T]$ and the pseudo-crosscorrelation $\tilde{\mathbf{C}}_{rs}^{*} = E[\mathbf{r}^{*}\mathbf{s}^{H}]$ are zero when $M > 2$. However, for improper modulation schemes, such as $M$-ary ASK and OQPSK (for which both the pseudo-autocorrelation and the pseudo-crosscorrelation are non-zero), the conventional solution becomes suboptimal because $\tilde{\mathbf{C}}_{rr}$ and $\tilde{\mathbf{C}}_{rs}^{*}$ are not taken into consideration in the receiver design. In order to utilize $\tilde{\mathbf{C}}_{rr}$ and $\tilde{\mathbf{C}}_{rs}^{*}$, we need to apply widely linear processing [10], [12], the principle of which is to process not only $\mathbf{r}$ but also its conjugated version $\mathbf{r}^{*}$ in order to derive the filter output, i.e.,

$$\mathbf{z} = \mathbf{G}_0\mathbf{r} + \mathbf{G}_1\mathbf{r}^{*} = \boldsymbol{\Psi}^H\mathbf{y}, \tag{3}$$

where $\boldsymbol{\Psi} = [\mathbf{G}_0 \;\; \mathbf{G}_1]^H$ and $\mathbf{y} = [\mathbf{r}^T \;\; \mathbf{r}^H]^T$. It is worth noticing that the conventional linear MMSE receiver is a special case of the one expressed by (3), when $\mathbf{G}_0 = \mathbf{G}^H$ and $\mathbf{G}_1 = \mathbf{0}$.
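To derive the improved FDE, we re-define the detection error as $\boldsymbol{\epsilon} = \boldsymbol{\Psi}^H\mathbf{y} - \mathbf{s}$. According to the orthogonality principle [17], the mean-square value of the estimation error $\boldsymbol{\epsilon}$ is minimum if and only if it is orthogonal to the observation vector $\mathbf{y}$, i.e.,

$$E[\mathbf{y}\boldsymbol{\epsilon}^H] = \mathbf{0},$$

leading to the solution $\boldsymbol{\Psi} = \mathbf{C}_{yy}^{-1}\mathbf{C}_{ys}$, where

$$\mathbf{C}_{yy} = E[\mathbf{y}\mathbf{y}^H] = \begin{bmatrix} \mathbf{C}_{rr} & \tilde{\mathbf{C}}_{rr} \\ \tilde{\mathbf{C}}_{rr}^{*} & \mathbf{C}_{rr}^{*} \end{bmatrix} \quad \text{and} \quad \mathbf{C}_{ys} = E[\mathbf{y}\mathbf{s}^H] = \begin{bmatrix} \mathbf{C}_{rs} \\ \tilde{\mathbf{C}}_{rs}^{*} \end{bmatrix}. \tag{5}$$

Based on the above derivations, we can form the optimal solution for $\boldsymbol{\Psi}$ as

$$\boldsymbol{\Psi} = \begin{bmatrix} \mathbf{C}_{rr} & \tilde{\mathbf{C}}_{rr} \\ \tilde{\mathbf{C}}_{rr}^{*} & \mathbf{C}_{rr}^{*} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{C}_{rs} \\ \tilde{\mathbf{C}}_{rs}^{*} \end{bmatrix}.$$

For the proposed FDE, the augmented autocorrelation matrix $\mathbf{C}_{yy}$ and crosscorrelation matrix $\mathbf{C}_{ys}$ expressed in (5), which give a complete second-order description of the received signal, are used to derive the filter coefficient matrix $\boldsymbol{\Psi}$. On the other hand, for the conventional linear MMSE algorithm, the coefficient matrix $\mathbf{G}$ is calculated using only the autocorrelation of the observation $\mathbf{C}_{rr}$ and the crosscorrelation $\mathbf{C}_{rs}$. The pseudo-autocorrelation $\tilde{\mathbf{C}}_{rr}$ and pseudo-crosscorrelation $\tilde{\mathbf{C}}_{rs}^{*}$ are implicitly assumed to be zero, leading to sub-optimal solutions. For proper signals, the pseudo-correlation terms in (5) vanish, and the optimal solution of $\boldsymbol{\Psi}$ can be simplified to

$$\boldsymbol{\Psi} = \begin{bmatrix} \mathbf{C}_{rr}^{-1}\mathbf{C}_{rs} \\ \mathbf{0} \end{bmatrix},$$

which is exactly the same as Eq. (2) for the conventional FDE.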
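The difference between the two designs can be made concrete by evaluating the total MSE of each for real-valued symbols. The sketch below assumes $E[\mathbf{s}\mathbf{s}^T] = \mathbf{I}$ (e.g., unit-power ASK symbols), circularly symmetric noise, and a dense random matrix standing in for $\mathbf{H}\mathbf{P}$; the function names are hypothetical.

```python
import numpy as np


def conventional_mse(hp: np.ndarray, n0: float) -> float:
    """Total MSE of the linear MMSE equalizer G = C_rr^{-1} C_rs."""
    n = hp.shape[0]
    c_rr = hp @ hp.conj().T + n0 * np.eye(n)
    g = np.linalg.solve(c_rr, hp)
    return float(np.trace(np.eye(hp.shape[1]) - hp.conj().T @ g).real)


def widely_linear_mse(hp: np.ndarray, n0: float) -> float:
    """Total MSE of Psi = C_yy^{-1} C_ys for real-valued symbols, E[s s^T] = I."""
    n = hp.shape[0]
    c_rr = hp @ hp.conj().T + n0 * np.eye(n)    # E[r r^H]
    c_rr_pseudo = hp @ hp.T                      # E[r r^T], non-zero for real s
    c_yy = np.block([[c_rr, c_rr_pseudo],
                     [c_rr_pseudo.conj(), c_rr.conj()]])
    c_ys = np.vstack([hp, hp.conj()])            # [E[r s^H]; E[r^* s^H]]
    psi = np.linalg.solve(c_yy, c_ys)
    return float(np.trace(np.eye(hp.shape[1]) - c_ys.conj().T @ psi).real)


rng = np.random.default_rng(1)
hp = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
n0 = 0.1
mse_lin = conventional_mse(hp, n0)
mse_wl = widely_linear_mse(hp, n0)   # never larger than mse_lin
```

The widely linear solution can never do worse, since the conventional equalizer is the special case in which the lower block of $\boldsymbol{\Psi}$ is forced to zero.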
The improved FDE has higher computational complexity than the conventional FDE. The difference in complexity lies in the computation of the matrix $\mathbf{G}$ for the conventional equalizer and the computation of $\boldsymbol{\Psi}$ for the improved equalizer, as indicated in Table I, where we show the number of complex multiplication (×), division (÷), addition (+), and subtraction (−) operations needed to calculate $\mathbf{G}$ and $\boldsymbol{\Psi}$, respectively. In the complexity calculation, we use the fact that the inversion of an $L \times L$ matrix involves $2L^2$ divisions, $2L^3$ multiplications, and $2L^3$ subtractions. It should also be noted that the complexity increase of the improved scheme is compensated for by the significant performance improvement. Furthermore, this issue becomes less critical in slow-fading channels, for which the equalizer matrices do not need to be updated frequently.
In Fig. 3, we show the number of flops required to compute the matrix G (for the conventional FDE) and the matrix Ψ (for the improved FDE) as a function of the data block size N for a 2-user case.One flop is counted as one real operation, which can be addition, subtraction, multiplication or division [18].
A complex multiplication requires 4 real multiplications and 2 real additions. It is evident from Fig. 3 that the number of additional operations required by the improved FDE is moderate when the block size is small, e.g., $N < 10$, and increases significantly as the block size increases. For example, the number of flops required by the improved FDE is 4.5 times that required by the conventional FDE when $N = 12$.
Therefore, for efficient implementation, it is necessary to break the received data into blocks of moderate sizes before the equalization is applied.
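The matrix inversion counts quoted above can be tabulated for the two equalizers. This sketch covers the inversion step only: it does not reproduce the totals of Table I or the flop ratio read from Fig. 3, which also include the matrix products needed to form the correlation matrices. The function name is hypothetical.

```python
def inversion_operation_counts(size: int) -> dict[str, int]:
    """Complex operations for inverting a size x size matrix, per the counts in the text."""
    return {
        "divisions": 2 * size ** 2,
        "multiplications": 2 * size ** 3,
        "subtractions": 2 * size ** 3,
    }


n_r, block = 2, 12   # two receive antennas, block size N = 12
conventional = inversion_operation_counts(n_r * block)     # C_rr is (n_R N) x (n_R N)
improved = inversion_operation_counts(2 * n_r * block)     # C_yy is (2 n_R N) x (2 n_R N)
```

Doubling the matrix dimension multiplies the cubic terms by eight, which is why the cost gap widens quickly with the block size and why moderate block sizes are preferable.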

IV. THE PROPOSED ITERATIVE RECEIVER ALGORITHM
In this section, we derive an iterative FDE algorithm by applying WLP and exploiting the complete second-order statistics of the improper signals. Recall that the received signal after CP removal, FFT and subcarrier demapping can be expressed as

$$\mathbf{r} = \mathbf{H}\mathbf{P}\mathbf{s} + \mathbf{w},$$

where the symbol vector $\mathbf{s} = [s_1 \,\ldots\, s_{n-1} \;\, s_n \;\, s_{n+1} \,\ldots\, s_{NK}]^T$. Let us assume that symbol $s_n$ is to be decoded. By using the iterative interference cancellation technique [8], [19], [20], the received vector can be expressed as

$$\mathbf{r}_n = \mathbf{r} - \mathbf{H}\mathbf{P}\bar{\mathbf{s}}_n, \tag{8}$$

where $\mathbf{r}_n$ is the interference canceled version of $\mathbf{r}$, and

$$\bar{\mathbf{s}}_n = [\bar{s}_1 \,\ldots\, \bar{s}_{n-1} \;\, 0 \;\, \bar{s}_{n+1} \,\ldots\, \bar{s}_{NK}]^T, \tag{9}$$

which contains the soft estimates of the interfering symbols from the previous iteration. Note that (8) represents a decision-directed iterative scheme, where the detection procedure at the $p$th iteration uses the symbol estimates from the $(p-1)$th iteration. The performance is improved in an iterative manner because the symbols are more accurately estimated (leading to better interference cancellation) as the iterative procedure goes on. For simplicity, the iteration index is omitted whenever no ambiguity arises.
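In order to further suppress the residual interference in $\mathbf{r}_n$, an instantaneous linear filter is applied to $\mathbf{r}_n$ to obtain $z_n = \mathbf{g}_n^H\mathbf{r}_n$, where the filter coefficient vector $\mathbf{g}_n \in \mathbb{C}^{NK \times 1}$ is chosen by minimizing $e_n = E\{|\mathbf{g}_n^H\mathbf{r}_n - s_n|^2\}$ under the MMSE criterion. It can be derived as

$$\mathbf{g}_n = (\mathbf{H}\mathbf{P}\mathbf{V}_n\mathbf{P}^H\mathbf{H}^H + N_0\mathbf{I})^{-1}(\mathbf{H}\mathbf{P})_n,$$

where $(\mathbf{H}\mathbf{P})_n$ is the $n$th column of the matrix $\mathbf{H}\mathbf{P}$. The diagonal matrix $\mathbf{V}_n \in \mathbb{R}^{NK \times NK}$ is formed as

$$\mathbf{V}_n = \mathrm{diag}\{\mathrm{var}(s_1), \ldots, \mathrm{var}(s_{n-1}), 1, \mathrm{var}(s_{n+1}), \ldots, \mathrm{var}(s_{NK})\}, \tag{11}$$

where $\mathrm{var}(s_i)$ is the variance of the $i$th symbol given its soft estimate $\bar{s}_i$. Refer to [8], [19], [20] for a detailed description of this conventional iterative algorithm.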
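One detection step of the conventional iterative scheme (soft cancellation followed by MMSE filtering) can be sketched as below. The function name and argument layout are hypothetical, a dense random matrix stands in for $\mathbf{H}\mathbf{P}$, and the returned gain `mu_n` is an extra quantity introduced here for checking the output scale.

```python
import numpy as np


def sic_mmse_output(r, hp, s_soft, s_var, n, n0):
    """One conventional soft-cancellation + MMSE filtering step for symbol index n.

    r: received vector; hp: the product HP; s_soft, s_var: soft estimates and
    variances of all symbols from the previous iteration.
    """
    s_bar = np.array(s_soft, dtype=complex)
    s_bar[n] = 0.0                       # do not cancel the symbol being detected
    r_n = r - hp @ s_bar                 # interference-cancelled observation

    v = np.array(s_var, dtype=float)
    v[n] = 1.0                           # the desired symbol keeps its unit power
    c = hp @ np.diag(v) @ hp.conj().T + n0 * np.eye(hp.shape[0])
    g_n = np.linalg.solve(c, hp[:, n])   # g_n = (HP V_n P^H H^H + N0 I)^{-1} (HP)_n
    z_n = np.vdot(g_n, r_n)              # g_n^H r_n (vdot conjugates its first argument)
    mu_n = np.vdot(g_n, hp[:, n]).real   # effective gain on s_n
    return z_n, mu_n


rng = np.random.default_rng(2)
hp = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
s = rng.choice([-1.0, 1.0], size=4)
r = hp @ s                               # noise-free observation for the check below
# With perfect soft estimates and zero variances, all interference is removed.
z_n, mu_n = sic_mmse_output(r, hp, s_soft=s, s_var=np.zeros(4), n=1, n0=0.1)
```

In the first iteration no prior information is available, so one would typically call this with all soft estimates set to zero and all variances set to one, which reduces to the linear MMSE filter for that symbol.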
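The conventional scheme suffers from the problem of error propagation caused by incorrect decisions.
As will become evident in Section V, the error propagation effect can be reduced and the system performance can be improved if we process not only $\mathbf{r}_n$ but also its conjugated version $\mathbf{r}_n^{*}$ in order to derive the filter output, i.e., $z_n = \boldsymbol{\Psi}_n^H\mathbf{y}_n$ with $\mathbf{y}_n = [\mathbf{r}_n^T \;\; \mathbf{r}_n^H]^T$. Minimizing the MSE $E\{|z_n - s_n|^2\}$ and applying the orthogonality principle gives

$$E[\mathbf{y}_n(\boldsymbol{\Psi}_n^H\mathbf{y}_n - s_n)^{*}] = \mathbf{0},$$

leading to the solution

$$\boldsymbol{\Psi}_n = \mathbf{C}_{y_n y_n}^{-1}\mathbf{c}_{y_n s_n},$$

where $\mathbf{C}_{y_n y_n} = E[\mathbf{y}_n\mathbf{y}_n^H]$ and $\mathbf{c}_{y_n s_n} = E[\mathbf{y}_n s_n^{*}]$.

In what follows, we demonstrate how the vector $\bar{\mathbf{s}}_n$ in (9) and the matrix $\mathbf{V}_n$ in (11) can be derived in order to carry out the iterative process. The filter output can be expressed as

$$z_n = \mu_n s_n + \nu_n,$$

where the combined noise and residual interference $\nu_n$ is approximated as a Gaussian random variable [21], i.e., $\nu_n \sim \mathcal{CN}(0, N_\nu)$. The parameters $\mu_n$ and $N_\nu$ can be determined as described in [22]. After computing the values of $\mu_n$ and $N_\nu$, the conditional probability density function (PDF) of the filter output can be obtained as

$$f(z_n \mid s_n = x_m) = \frac{1}{\pi N_\nu}\exp\!\left(-\frac{|z_n - \mu_n x_m|^2}{N_\nu}\right).$$

For $M$-ary PSK, QAM, and ASK systems, each symbol $s_n$ corresponds to $\log_2 M$ bits, denoted as $b_n^i$, $i = 1, \ldots, \log_2 M$. The log-likelihood ratio (LLR) for the $i$th information bit $b_n^i$ can be computed as

$$\lambda(b_n^i) = \ln\frac{f(z_n \mid b_n^i = 1)}{f(z_n \mid b_n^i = 0)} \approx \ln\frac{f(z_n \mid s_n = s^{+})}{f(z_n \mid s_n = s^{-})} = \frac{|z_n - \mu_n s^{-}|^2 - |z_n - \mu_n s^{+}|^2}{N_\nu},$$

where $S_{i,1}$ ($S_{i,0}$) is the set of symbols $\{x_m\}$ whose $i$th bit takes the value of 1 (0); $s^{+}$ denotes the symbol corresponding to $\max\{f(z_n \mid s_n \in S_{i,1})\}$, and $s^{-}$ denotes the symbol corresponding to $\max\{f(z_n \mid s_n \in S_{i,0})\}$.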
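The soft estimate $\bar{s}_i$ in (9) and the variance $\mathrm{var}(s_i)$ in (11), respectively, can be calculated as [22]

$$\bar{s}_i = E[s_i] = \sum_{m} x_m \Pr(s_i = x_m), \qquad \mathrm{var}(s_i) = E[|s_i|^2] - |\bar{s}_i|^2,$$

where $E[|s_i|^2] = \sum_{m} |x_m|^2 \Pr(s_i = x_m)$. The a priori probability of each symbol $\Pr(s_i)$ can be calculated as $\Pr(s_i) = \prod_{p=1}^{\log_2 M}\Pr(b_i^p)$, where

$$\Pr(b_i^p = 1) = \frac{e^{\lambda(b_i^p)}}{1 + e^{\lambda(b_i^p)}}; \qquad \Pr(b_i^p = 0) = \frac{1}{1 + e^{\lambda(b_i^p)}}.$$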
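The soft-information exchange described above can be sketched for a 4ASK constellation. The Gray bit labelling, the unit-energy scaling, and all function names below are assumptions made for illustration; the text does not specify the bit mapping used in the simulations.

```python
import math

# Hypothetical Gray-labelled 4ASK constellation with unit average energy.
SCALE = 1.0 / math.sqrt(5.0)
CONSTELLATION = {
    (0, 0): -3.0 * SCALE,
    (0, 1): -1.0 * SCALE,
    (1, 1): +1.0 * SCALE,
    (1, 0): +3.0 * SCALE,
}


def bit_probability(llr: float, bit: int) -> float:
    """Pr(b = bit) from the LLR lambda = ln(Pr(b = 1) / Pr(b = 0))."""
    if llr >= 0:
        p_one = 1.0 / (1.0 + math.exp(-llr))   # equals e^llr / (1 + e^llr)
    else:
        e = math.exp(llr)                       # avoids overflow for very negative LLRs
        p_one = e / (1.0 + e)
    return p_one if bit == 1 else 1.0 - p_one


def soft_symbol(llrs):
    """Return (mean, variance) of a symbol given the LLRs of its bits."""
    mean = 0.0
    energy = 0.0
    for bits, point in CONSTELLATION.items():
        prob = 1.0
        for llr, bit in zip(llrs, bits):
            prob *= bit_probability(llr, bit)   # Pr(s) is the product of bit probabilities
        mean += point * prob
        energy += abs(point) ** 2 * prob
    return mean, energy - abs(mean) ** 2


def max_log_llr(z, mu, n_var, bit_index):
    """Max-log LLR of one bit from the filter output z = mu * s + noise."""
    def best_distance(bit):
        return min(abs(z - mu * point) ** 2
                   for bits, point in CONSTELLATION.items() if bits[bit_index] == bit)
    return (best_distance(0) - best_distance(1)) / n_var


no_prior = soft_symbol([0.0, 0.0])        # no prior knowledge: zero mean, unit variance
confident = soft_symbol([20.0, -20.0])    # strong belief in bits (1, 0), i.e. +3/sqrt(5)
llr_bit0 = max_log_llr(3.0 * SCALE, 1.0, 0.1, 0)
```

With zero LLRs the soft estimate is zero and the variance equals the symbol energy, so nothing is cancelled; as the LLRs grow in magnitude the variance shrinks and the cancellation in (8) becomes close to a hard-decision cancellation.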

V. SIMULATION RESULTS
We consider a WiMAX baseline antenna configuration, in which two MSs are grouped together and synchronized to form a MIMO channel between the BS and the MSs. We assume a six-path fading channel, and the channel matrix is normalized such that the average channel gain for each transmitted symbol is equal to unity. The fading coefficients for each path are modeled as independent identically distributed (i.i.d.) complex Gaussian random variables. The channel is assumed to be fully interleaved, to have a uniform power delay profile, and to be slowly time-varying, so that it remains static during the transmission of one frame of data but varies from one frame to another. The block size of the user data is 12, which is also the number of subcarriers in a resource block. The size of the FFT is 256, and the length of the Cyclic Prefix (CP) is 8. The power loss incurred by the insertion of the CP is taken into account in the SNR calculation.
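The channel and CP parameters above can be turned into numbers with a short sketch. Two conventions are assumed here because the text does not spell them out: the CP penalty is taken as the ratio of total to useful samples, and the unit average gain is obtained by giving each of the six taps a variance of 1/6. The function names are hypothetical.

```python
import math
import random


def cp_snr_loss_db(fft_size: int, cp_length: int) -> float:
    """SNR penalty from spending transmit energy on the cyclic prefix (assumed convention)."""
    return 10.0 * math.log10((fft_size + cp_length) / fft_size)


def draw_channel_taps(n_paths: int, rng: random.Random) -> list[complex]:
    """i.i.d. complex Gaussian taps with a uniform power delay profile.

    Each tap has variance 1 / n_paths so that the average total gain is one.
    """
    sigma = math.sqrt(1.0 / (2.0 * n_paths))   # standard deviation per real dimension
    return [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n_paths)]


loss_db = cp_snr_loss_db(256, 8)               # FFT size 256, CP length 8
taps = draw_channel_taps(6, random.Random(1))  # one frame's six-path channel
```

Under this convention the CP costs a little over 0.1 dB, and a new set of taps would be drawn for every frame to model the block-fading assumption.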
Fig. 4 shows the BER performance comparison between the conventional and the improved receivers for 4ASK and OQPSK systems. The improved receiver scheme significantly outperforms its conventional counterpart, especially at high SNRs, where the gap can exceed 5-6 dB. The curve for a QPSK system with the conventional receiver is also provided for a baseline comparison. Note that for the conventional receiver, the BER performance for an OQPSK system is the same as for a QPSK system [23]. The performance of the QPSK system is superior to the 4ASK system with the conventional receiver, but is inferior to the 4ASK system with the improved equalizer at high SNRs. Although QPSK modulation itself is more power efficient than 4ASK because it uses a two-dimensional signal constellation instead of a one-dimensional one, the 4ASK system can exploit the pseudo-autocorrelation function in the receiver design, whereas the QPSK system does not have this special property to utilize. The net effect favors the 4ASK system. Refer to [24] for a detailed and quantitative analysis of the performance gain that can be achieved by a widely linear transceiver.
Fig. 5 shows the BER performance comparison between the conventional and the improved FDE for 16ASK and 16QAM systems. For the 16ASK system, the improved receiver significantly outperforms its conventional counterpart, and the performance gain increases as the SNR increases. Fig. 5 also shows that the 16ASK system with the improved FDE performs better than the 16QAM system when SNR > 40 dB.
In Fig. 6, we compare the performance of the proposed iterative FDE introduced in Section IV with the conventional iterative FDE. The curves are plotted at the second iteration, since it has been observed that the major gain from the iterative process can be achieved with two iterations. The conclusions from the previous experiments also hold here: the QPSK system performs better than the 4ASK system with the conventional iterative FDE, but it is inferior to the 4ASK system with the improved iterative FDE. The performance gain can be over 4 dB at high SNR. The gain achieved by the iterative process can be determined by comparing Fig. 6 to Fig. 4. For example, in order to achieve a target BER of $10^{-3}$, an SNR of 28 dB is required for the 4ASK system with the proposed non-iterative FDE, while only 25 dB is required by the proposed iterative FDE at the second iteration.

VI. CONCLUSION
In this paper, we derived an improved FDE algorithm for an OFDMA-based multiuser MIMO system with improper signal constellations. Our simulation results reveal that the proposed scheme has superior BER performance compared to the conventional FDE. We also presented a novel iterative FDE scheme, which utilizes the complete second-order statistics of the received signal. It is shown that this scheme significantly outperforms the conventional iterative FDE.

Fig. 5. BER performance for the uplink of OFDMA system ($K = n_R = 2$) for the conventional FDE and the improved FDE for systems with high-order signal constellations. The users have equal transmit power.
Notations: We use upper bold-face letters to represent matrices and vectors. The $(n, k)$th element of a matrix $\mathbf{A}$ is represented by $[\mathbf{A}]_{n,k}$, the $n$th element of a vector $\mathbf{b}$ is denoted by $[\mathbf{b}]_n$, and the $n$th column of a matrix $\mathbf{A}$ is represented by $(\mathbf{A})_n$. Superscripts $(\cdot)^H$, $(\cdot)^T$ and $(\cdot)^{*}$ denote the Hermitian transpose, transpose and conjugate, respectively. $E[\cdot]$ denotes expectation (statistical averaging).
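In the power loading matrix $\mathbf{P}$, the $n$th diagonal entry of the $i$th sub-matrix determines the transmitted power for the $i$th user at the $n$th subcarrier; $\mathbf{s} \in \mathbb{C}^{KN \times 1}$ represents the transmitted data symbol vector from different users with $E[\mathbf{s}\mathbf{s}^H] = \mathbf{I}_{KN}$. When proper modulation schemes are employed, the conventional equalizer $\mathbf{G}$ can be derived from the cost function $e = E[\|\mathbf{z} - \mathbf{s}\|^2] = E[\|\mathbf{G}^H\mathbf{r} - \mathbf{s}\|^2]$. Minimizing this cost function leads to the optimal solution in Eq. (2).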

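For proper signals like QAM and PSK, the improved FDE converges to the conventional FDE since $E[\mathbf{s}\mathbf{s}^T] = \mathbf{0}$, leading to $\tilde{\mathbf{C}}_{rr} = E\{\mathbf{r}\mathbf{r}^T\} = \mathbf{0}$ and $\tilde{\mathbf{C}}_{rs}^{*} = E\{\mathbf{r}^{*}\mathbf{s}^H\} = \mathbf{0}$. Therefore, $\mathbf{C}_{yy} = \begin{bmatrix} \mathbf{C}_{rr} & \mathbf{0} \\ \mathbf{0} & \mathbf{C}_{rr}^{*} \end{bmatrix}$ and $\mathbf{C}_{ys} = \begin{bmatrix} \mathbf{H}\mathbf{P} \\ \mathbf{0} \end{bmatrix}$ in Eq. (5).

For the iterative receiver, the widely linear filter output is $z_n = \mathbf{a}_n\mathbf{r}_n + \mathbf{b}_n\mathbf{r}_n^{*} = \boldsymbol{\Psi}_n^H\mathbf{y}_n$, where $\boldsymbol{\Psi}_n = [\mathbf{a}_n \;\; \mathbf{b}_n]^H$ and $\mathbf{y}_n = [\mathbf{r}_n^T \;\; \mathbf{r}_n^H]^T$. $\boldsymbol{\Psi}_n$ can be derived by minimizing the MSE $E\{|e_n|^2\}$, where $e_n = z_n - s_n = \boldsymbol{\Psi}_n^H\mathbf{y}_n - s_n$, according to the orthogonality principle.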

Fig. 6. BER performance for the uplink of OFDMA system ($K = n_R = 2$) for the conventional iterative FDE and the improved iterative FDE after the second iteration. The users have equal transmit power.

TABLE I. COMPLEXITY FOR CALCULATING THE EQUALIZATION MATRICES G AND Ψ.