Multi-Carrier Block-Spread CDMA for Broadband Cellular Downlink
EURASIP Journal on Applied Signal Processing (submitted), Special Issue: Multi-Carrier Communications and Signal Processing

Effective suppression of multiuser interference (MUI) and mitigation of frequency-selective fading effects within the complexity constraints of the mobile constitute major challenges for broadband cellular downlink transceiver design. Existing wideband direct-sequence (DS) code division multiple access (CDMA) transceivers suppress MUI statistically by restoring the orthogonality among users at the receiver. However, they call for receive diversity and multichannel equalization to mitigate the effects of deep channel fades. Relying on redundant block spreading and linear precoding, we design a so-called multicarrier block-spread (MCBS) CDMA transceiver that preserves the orthogonality among users and guarantees symbol detection, regardless of the underlying frequency-selective fading channels. These properties allow for deterministic MUI elimination through low-complexity block despreading and enable full diversity gains, irrespective of the system load. Different options to perform equalization and decoding, either jointly or separately, strike the trade-off between performance and complexity. To improve the performance over multi-input multi-output (MIMO) multipath fading channels, our MCBS-CDMA transceiver combines well with space-time block-coding (STBC) techniques, to exploit both multiantenna and multipath diversity gains, irrespective of the system load. Simulation results demonstrate the superior performance of MCBS-CDMA compared to competing alternatives.


I. INTRODUCTION
The main drivers toward future broadband cellular systems, like high-speed wireless internet access and mobile multimedia, require much higher data rates in the downlink (from base to mobile station) than in the uplink (from mobile to base station) direction. Given the asymmetric nature of most of these broadband services, the capacity and performance bottlenecks clearly reside in the downlink of these future systems. Broadband cellular downlink communications poses three main challenges to successful transceiver design. First, for increasing data rates, the underlying multi-path channels become more time-dispersive, causing Inter-Symbol Interference (ISI) and Inter-Chip Interference (ICI), or equivalently [...] planning in a cellular system, compared to conventional multiple access techniques like Frequency Division Multiple Access (FDMA) and Time Division Multiple Access (TDMA) [2]. In the downlink, DS-CDMA relies on the orthogonality of the spreading codes to separate the different user signals. However, ICI destroys the orthogonality among users, giving rise to MUI. Since the MUI is essentially caused by the multi-path channel, linear chip-level equalization, followed by correlation with the desired user's spreading code, makes it possible to suppress the MUI [3], [4], [5], [6]. However, chip equalizer receivers suppress MUI only statistically, and require receive diversity to cope with the effects caused by deep channel fades [7], [8].
Multi-Carrier (MC) CDMA has recently gained increased momentum as a candidate air interface for future broadband cellular systems, because it combines the advantages of CDMA with those of Orthogonal Frequency Division Multiplexing (OFDM) [9]. Indeed, OFDM enables high data rate transmissions by combating ISI in the frequency domain [10]. Three different flavours of MC-CDMA exist, depending on the exact position of the CDMA and the OFDM component in the transmission scheme. The first variant, called MC-CDMA, performs the spreading operation before the symbol blocking (or serial-to-parallel conversion), which results in a spreading of the information symbols across the different subcarriers [11], [12], [13]. However, like classical DS-CDMA, MC-CDMA does not exploit full frequency-diversity gains, and requires receive diversity to deal with the effects caused by deep channel fades.
The second variant, called MC-DS-CDMA, executes the spreading operation after the symbol blocking, resulting in a spreading of the information symbols along the time axis of the different subcarriers [14], [15]. However, like classical OFDM, MC-DS-CDMA necessitates bandwidth-consuming Forward Error Correction (FEC) coding plus frequency-domain interleaving to mitigate frequency-selective fading. Finally, Multi-Tone (MT) DS-CDMA performs the spreading after the OFDM modulation, such that the resulting spectrum of each subcarrier no longer satisfies the orthogonality condition [16]. Hence, MT-DS-CDMA suffers from ISI, Inter-Tone Interference (ITI), as well as MUI, and requires expensive multi-user detection techniques to achieve a reasonable performance.
To tackle the challenges of broadband cellular downlink communications differently, we design a novel, so-called Multi-Carrier Block-Spread (MCBS) CDMA transceiver. By capitalizing on the general concepts of redundant block spreading and linear precoding, our transceiver possesses three nice properties (Section II). First, by Cyclic Prefixing (CP) or Zero Padding (ZP) the block-spread symbol blocks, our MCBS-CDMA transceiver preserves the orthogonality among users, regardless of the underlying time-dispersive multi-path channels. This property allows for deterministic (as opposed to statistical) MUI elimination through low-complexity block despreading. Alternative MUI-free MC transceivers, like AMOUR [17] and Generalized Multi-Carrier (GMC) CDMA [18], rely on Orthogonal Frequency Division Multiple Access (OFDMA) to retain the orthogonality among users, regardless of the multi-path channel. Unlike AMOUR and GMC-CDMA, our transceiver relies on orthogonal CDMA, and thus inherits the advantages of CDMA related to universal frequency reuse in a cellular network, like increased capacity and simplified network planning. Second, redundant linear precoding guarantees symbol detectability and full frequency-diversity gains, thus robustifying the transmission against deep channel fades. Different equalization and decoding options, ranging from linear over decision-directed to Maximum Likelihood (ML) detection, strike the trade-off between performance and complexity (Section III). Finally, our transceiver exhibits a rewarding synergy with multi-antenna techniques, to increase the spectral efficiency and/or improve the link reliability (Section IV). Simulation results demonstrate the outstanding performance of the proposed transceiver compared to competing alternatives (Section V).

II. TRANSCEIVER DESIGN
Given the asymmetric nature of broadband services requiring much higher data rates in the downlink than in the uplink direction, we focus on the downlink bottleneck of future broadband cellular systems. Our goal is to design a transceiver that can cope with the three main challenges of broadband cellular downlink communications. First, multi-path propagation gives rise to time dispersion and frequency-selective fading causing ISI and ICI, which limit the maximum data rate of a system without equalization. Second, multiple users trying to access common network resources may interfere with each other, resulting in MUI, which upper-bounds the maximum user capacity in a cellular system. Specific to DS-CDMA downlink transmission, the MUI is essentially caused by multi-path propagation, since it destroys the orthogonality of the user signals. Third, cost, size and power consumption issues put severe constraints on the receiver complexity at the mobile.
Throughout the paper, we consider a single cell of a cellular system with a Base Station (BS) serving M active Mobile Stations (MSs) within its coverage area. For now, we limit ourselves to the single-antenna case and defer the multi-antenna case to Section IV.

A. Multi-carrier block-spread CDMA transmission
The block diagram in Fig. 1 describes the Multi-Carrier Block-Spread (MCBS) CDMA downlink transmission scheme (where only the m-th user is explicitly shown), which transforms the M user data symbol sequences $\{s^m[i]\}_{m=1}^{M}$ into the multi-user chip sequence $u[n]$ with a rate $1/T_c$. Apart from the user multiplexing and the IFFT, the transmission scheme performs three major operations, namely linear precoding, block spreading, and adding transmit redundancy. Since our scheme belongs to the general class of block transmission schemes, the m-th user's data symbol sequence $s^m[i]$ is first serial-to-parallel converted into blocks of B symbols, leading to the symbol block sequence $\mathbf{s}^m[i] := [s^m[iB], \ldots, s^m[(i+1)B-1]]^T$. The blocks $\mathbf{s}^m[i]$ are linearly precoded by a $Q \times B$ matrix $\boldsymbol{\Theta}$ to yield the $Q \times 1$ precoded symbol blocks:
$$\tilde{\mathbf{s}}^m[i] := \boldsymbol{\Theta}\,\mathbf{s}^m[i], \tag{1}$$
where the linear precoding can be either redundant ($Q > B$) or non-redundant ($Q = B$). For conciseness, we limit our discussion to redundant precoding, but the proposed concepts apply equally well to non-redundant precoding. As we will show later, linear precoding guarantees symbol detection and maximum frequency-diversity gains, and thus robustifies the transmission against frequency-selective fading. Unlike the traditional approach of symbol spreading that operates on a single symbol, we apply here block spreading that operates on a block of symbols. Specifically, the block sequence $\tilde{\mathbf{s}}^m[i]$ is spread by a factor N with the user composite code sequence $c^m[n]$, which is the multiplication of a short orthogonal Walsh-Hadamard spreading code that is MS specific and a long overlay scrambling code that is BS specific.
The chip block sequences of the different active users are added, resulting in the multi-user chip block sequence:
$$\tilde{\mathbf{x}}[n] = \sum_{m=1}^{M} \tilde{\mathbf{s}}^m[i]\, c^m[n], \tag{2}$$
where the chip block index n is related to the symbol block index i by $n = iN + n'$, $n' \in \{0, \ldots, N-1\}$.
As will become apparent later, block spreading enables MUI-resilient reception, and thus effectively deals with the MUI. Subsequently, the $Q \times Q$ IFFT matrix $\mathbf{F}_Q^H$ transforms the Frequency-Domain (FD) chip block sequence $\tilde{\mathbf{x}}[n]$ into the Time-Domain (TD) chip block sequence $\mathbf{x}[n] = \mathbf{F}_Q^H\,\tilde{\mathbf{x}}[n]$. The $K \times Q$ transmit matrix $\mathbf{T}$, with $K \geq Q$, then adds some redundancy to the chip blocks: $\mathbf{u}[n] := \mathbf{T}\,\mathbf{x}[n]$. As will be clarified later, this transmit redundancy copes with the time-dispersive effect of multi-path propagation and also enables low-complexity equalization at the receiver. Finally, the resulting transmitted chip block sequence $\mathbf{u}[n]$ is parallel-to-serial converted into the corresponding scalar sequence $u[n]$ and transmitted over the air at a rate $1/T_c$.

B. Channel model
Adopting a discrete-time baseband equivalent model, the chip-sampled received signal is a channel-distorted version of the transmitted signal, and can be written as:
$$v[n] = \sum_{l=0}^{L_c} h[l]\, u[n-l] + w[n], \tag{3}$$
where $h[l]$ is the chip-sampled FIR channel that models the frequency-selective multi-path propagation between the transmitter and the receiver including the effect of transmit and receive filters, $L_c$ is the order of $h[l]$, and $w[n]$ denotes the additive Gaussian noise, which we assume to be white with variance $\sigma_w^2$. Furthermore, we define L as a known upper bound on the channel order, $L \geq L_c$, which can be well approximated by $L \approx \tau_{\max}/T_c + 1$, where $\tau_{\max}$ is the maximum delay spread within the given propagation environment.

C. MUI-resilient reception
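The block diagram in Fig. 2 describes the reception scheme for the MS of interest (which we assume to be the m-th one), which transforms the received sequence $v[n]$ into an estimate of the desired user's data symbol sequence $\hat{s}^m[i]$. After serial-to-parallel conversion of $v[n]$ into the blocks $\mathbf{v}[n] := [v[nK], \ldots, v[(n+1)K-1]]^T$, we can derive from the scalar input/output relationship in (3) the corresponding block input/output relationship:
$$\mathbf{v}[n] = \mathbf{H}[0]\,\mathbf{u}[n] + \mathbf{H}[1]\,\mathbf{u}[n-1] + \mathbf{w}[n], \tag{4}$$
where $\mathbf{w}[n] := [w[nK], \ldots, w[(n+1)K-1]]^T$ is the noise block sequence, $\mathbf{H}[0]$ is a $K \times K$ lower triangular Toeplitz matrix with entries $[\mathbf{H}[0]]_{p,q} = h[p-q]$, and $\mathbf{H}[1]$ is a $K \times K$ upper triangular Toeplitz matrix with entries $[\mathbf{H}[1]]_{p,q} = h[K+p-q]$, with $h[l] := 0$ for $l \notin \{0, \ldots, L_c\}$ (see e.g. [18] for a detailed derivation of the single-user case).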

March 6, 2003 DRAFT
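The time-dispersive nature of multi-path propagation gives rise to so-called Inter-Block Interference (IBI) between successive blocks, which is modeled by the second term in (4). The $Q \times K$ receive matrix $\mathbf{R}$ again removes the redundancy from the blocks $\mathbf{v}[n]$: $\mathbf{y}[n] := \mathbf{R}\,\mathbf{v}[n]$. The purpose of the transmit/receive pair $(\mathbf{T}, \mathbf{R})$ is twofold. First, it allows for simple block-by-block processing by removing the IBI. Second, it enables low-complexity frequency-domain equalization by making the linear channel convolution appear circulant to the received block. To guarantee perfect IBI removal, the pair $(\mathbf{T}, \mathbf{R})$ should satisfy the following condition:
$$\mathbf{R}\,\mathbf{H}[1]\,\mathbf{T} = \mathbf{0}, \tag{5}$$
where $\mathbf{H}[0]$ and $\mathbf{H}[1]$ denote the channel matrices that act on the current and the previous transmitted block in (4), respectively. To enable circulant channel convolution, the resulting channel matrix $\dot{\mathbf{H}} := \mathbf{R}\,\mathbf{H}[0]\,\mathbf{T}$ should be circulant. In this way, we obtain a simplified block input/output relationship in the TD:
$$\mathbf{y}[n] = \dot{\mathbf{H}}\,\mathbf{x}[n] + \mathbf{z}[n], \tag{6}$$
where $\mathbf{z}[n] := \mathbf{R}\,\mathbf{w}[n]$ is the corresponding noise block sequence. In general, two options for the pair $(\mathbf{T}, \mathbf{R})$ exist that satisfy the above conditions. The first option corresponds to Cyclic Prefixing (CP) in classical OFDM systems, and boils down to choosing $K = Q + L$, and selecting:
$$\mathbf{T}_{cp} := [\mathbf{I}_{cp}^T,\ \mathbf{I}_Q^T]^T, \qquad \mathbf{R}_{cp} := [\mathbf{0}_{Q \times L},\ \mathbf{I}_Q], \tag{7}$$
where $\mathbf{I}_{cp}$ consists of the last L rows of $\mathbf{I}_Q$. The circulant property is enforced at the transmitter by adding a cyclic prefix of length L to each block. Indeed, premultiplying a vector with $\mathbf{T}_{cp}$ copies its last L entries and pastes them to its top. The IBI is removed at the receiver by discarding the cyclic prefix of each received block. Indeed, premultiplying a vector with $\mathbf{R}_{cp}$ deletes its first L entries and thus satisfies (5).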
The second option corresponds to Zero Padding (ZP), and boils down to setting $K = Q + L$, and selecting:
$$\mathbf{T}_{zp} := [\mathbf{I}_Q^T,\ \mathbf{0}_{Q \times L}^T]^T, \qquad \mathbf{R}_{zp} := [\mathbf{I}_Q,\ \mathbf{I}_{zp}], \tag{8}$$
where $\mathbf{I}_{zp}$ is formed by the first L columns of $\mathbf{I}_Q$. Unlike classical OFDM systems, here the IBI is entirely dealt with at the transmitter. Indeed, premultiplying a vector with $\mathbf{T}_{zp}$ pads L trailing zeros to its bottom, and thus satisfies (5). The circulant property is enforced at the receiver by time-aliasing each received block. Indeed, premultiplying a vector with $\mathbf{R}_{zp}$ adds its last L entries to its first L entries.
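The effect of the two redundancy options can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the toy block length Q = 4, the channel taps and the redundancy L = 2 are all illustrative assumptions. It sends two consecutive blocks through a linear FIR channel and verifies that, after redundancy removal, the current block has undergone a circular convolution that is free of IBI.

```python
def linear_convolve(u, h):
    """Linear (aperiodic) convolution of the chip stream u with the FIR channel h."""
    v = [0] * (len(u) + len(h) - 1)
    for n, un in enumerate(u):
        for l, hl in enumerate(h):
            v[n + l] += un * hl
    return v

def circular_convolve(x, h):
    """Circular convolution of a length-Q block x with h (the target model)."""
    Q = len(x)
    return [sum(h[l] * x[(n - l) % Q] for l in range(len(h))) for n in range(Q)]

def cp_transmit(x, L):
    """Cyclic prefixing: copy the last L entries of the block to its top."""
    return x[-L:] + x

def cp_receive(v, L):
    """CP removal: discard the first L entries of the received block."""
    return v[L:]

def zp_transmit(x, L):
    """Zero padding: append L trailing zeros to the block."""
    return x + [0] * L

def zp_receive(v, L):
    """Time-aliasing: add the last L entries to the first L entries."""
    Q = len(v) - L
    y = list(v[:Q])
    for k in range(L):
        y[k] += v[Q + k]
    return y

def received_block(prev_block, cur_block, h, transmit, L):
    """Send two consecutive blocks and return the K samples of the second one."""
    u = transmit(prev_block, L) + transmit(cur_block, L)
    v = linear_convolve(u, h)
    K = len(cur_block) + L
    return v[K:2 * K]

# Toy example (assumed values): Q = 4, channel order 2, redundancy L = 2.
x_prev, x_cur = [1, 2, 3, 4], [5, -1, 2, 7]
h = [1, 2, -1]
L = 2
y_cp = cp_receive(received_block(x_prev, x_cur, h, cp_transmit, L), L)
y_zp = zp_receive(received_block(x_prev, x_cur, h, zp_transmit, L), L)
target = circular_convolve(x_cur, h)
print(y_cp == target, y_zp == target)
```

Both options yield the same circulant model; they differ only in where the work is done (prefix insertion at the transmitter for CP, overlap-add at the receiver for ZP).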
Referring back to (6), circulant matrices possess a nice property that enables simple per-tone equalization in the frequency-domain.
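Property 1: Circulant matrices can be diagonalized by FFT operations [19]:
$$\dot{\mathbf{H}} = \mathbf{F}_Q^H\,\tilde{\mathbf{H}}\,\mathbf{F}_Q, \tag{9}$$
with $\tilde{\mathbf{H}} := \mathrm{diag}(\tilde{\mathbf{h}})$, where $\tilde{\mathbf{h}} := [H(e^{j0}), H(e^{j2\pi/Q}), \ldots, H(e^{j2\pi(Q-1)/Q})]^T$ is the FD channel response evaluated on the FFT grid and $H(z) := \sum_{l=0}^{L} h[l] z^{-l}$. Applying the FFT matrix $\mathbf{F}_Q$ to (6) and relying on Property 1, this leads to the following FD block input/output relationship:
$$\tilde{\mathbf{y}}[n] = \tilde{\mathbf{H}}\,\tilde{\mathbf{x}}[n] + \tilde{\mathbf{z}}[n], \tag{10}$$
where $\tilde{\mathbf{y}}[n] := \mathbf{F}_Q\,\mathbf{y}[n]$ and $\tilde{\mathbf{z}}[n] := \mathbf{F}_Q\,\mathbf{z}[n]$. Stacking N consecutive chip blocks $\tilde{\mathbf{y}}[n]$ into $\tilde{\mathbf{Y}}[i] := [\tilde{\mathbf{y}}[iN], \ldots, \tilde{\mathbf{y}}[(i+1)N-1]]$, we obtain the symbol block level equivalent of (10):
$$\tilde{\mathbf{Y}}[i] = \tilde{\mathbf{H}}\,\tilde{\mathbf{X}}[i] + \tilde{\mathbf{Z}}[i], \tag{11}$$
where $\tilde{\mathbf{X}}[i]$ and $\tilde{\mathbf{Z}}[i]$ are similarly defined as $\tilde{\mathbf{Y}}[i]$. From (2), we also have that:
$$\tilde{\mathbf{X}}[i] = \sum_{m=1}^{M} \tilde{\mathbf{s}}^m[i]\,\mathbf{c}^m[i]^T, \tag{12}$$
where $\mathbf{c}^m[i] := [c^m[iN], \ldots, c^m[(i+1)N-1]]^T$ is the m-th user's composite code vector used to block spread its data symbol block $\tilde{\mathbf{s}}^m[i]$. By inspecting (11) and (12), we can conclude that our transceiver preserves the orthogonality among users, even after propagation through a (possibly unknown) frequency-selective multi-path channel. This property allows for deterministic MUI elimination through low-complexity code-matched filtering. Indeed, by block despreading (11) with the desired user's composite code vector $\mathbf{c}^m[i]$ (we assume the m-th user to be the desired one), we obtain:
$$\bar{\mathbf{y}}^m[i] := \tilde{\mathbf{Y}}[i]\,\mathbf{c}^m[i]^* = \tilde{\mathbf{H}}\,\boldsymbol{\Theta}\,\mathbf{s}^m[i] + \bar{\mathbf{z}}^m[i], \tag{13}$$
where $\bar{\mathbf{z}}^m[i] := \tilde{\mathbf{Z}}[i]\,\mathbf{c}^m[i]^*$ is the corresponding noise block sequence, and where we have used the orthogonality of the unit-norm composite code vectors of the different users.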
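The deterministic MUI elimination described above can be illustrated with a small noiseless sketch in plain Python. Everything specific here is an assumption made for illustration only: the helper names, N = 4 chips, three users, Q = 3 tones, the real-valued scrambling sequence and the channel values. Because the per-tone channel multiplies every chip of a block by the same coefficient, the code orthogonality survives the channel, and despreading returns exactly the channel-weighted block of the desired user.

```python
def walsh_hadamard(N):
    """Rows of the N x N Walsh-Hadamard matrix (N is assumed to be a power of 2)."""
    H = [[1]]
    while len(H) < N:
        H = [row + row for row in H] + [row + [-c for c in row] for row in H]
    return H

def composite_codes(N, scrambling):
    """User-specific Walsh-Hadamard code times a common (BS-specific) scrambling code."""
    return [[w * s for w, s in zip(row, scrambling)] for row in walsh_hadamard(N)]

def block_spread(precoded_blocks, codes):
    """Q x N matrix: each user's block is repeated over N chips, weighted by its code."""
    Q, N = len(precoded_blocks[0]), len(codes[0])
    return [[sum(s[q] * c[n] for s, c in zip(precoded_blocks, codes))
             for n in range(N)] for q in range(Q)]

def apply_fd_channel(X, h_tilde):
    """Per-tone (diagonal) channel: tone q of every chip block is scaled by h_tilde[q]."""
    return [[h_tilde[q] * x for x in row] for q, row in enumerate(X)]

def block_despread(Y, code):
    """Correlate the N chip blocks with the conjugated code (normalised by N)."""
    N = len(code)
    return [sum(y * complex(c).conjugate() for y, c in zip(row, code)) / N
            for row in Y]

# Toy setup (assumed values): N = 4 chips, 3 active users, Q = 3 tones, no noise.
N = 4
codes = composite_codes(N, scrambling=[1, -1, -1, 1])
blocks = [[1 + 1j, 1 - 1j, -1 + 1j],
          [-1 - 1j, 1 + 1j, 1 - 1j],
          [1 - 1j, -1 - 1j, 1 + 1j]]
h_tilde = [0.5 + 0.5j, 2 + 0j, -1j]

Y = apply_fd_channel(block_spread(blocks, codes[:len(blocks)]), h_tilde)
for m, s in enumerate(blocks):
    y_m = block_despread(Y, codes[m])
    expected = [h * sq for h, sq in zip(h_tilde, s)]
    print(m, all(abs(a - b) < 1e-12 for a, b in zip(y_m, expected)))
```

Note that the receiver needs no channel knowledge for this step; the channel only has to be handled afterwards, in the single-user equalization.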

D. Single-user equalization
After successful elimination of the MUI, we still need to detect the desired user's symbol block $\mathbf{s}^m[i]$ from (13). Ignoring for the moment the presence of $\boldsymbol{\Theta}$ (or equivalently setting $Q = B$ and selecting $\boldsymbol{\Theta} = \mathbf{I}_Q$), [...] the $Q \times B$ matrix $\tilde{\mathbf{H}}\boldsymbol{\Theta}$ must have full column rank B, irrespective of the underlying channel realization. Since an FIR channel of order L can invoke at most L zero diagonal entries in $\tilde{\mathbf{H}}$, this requires any $Q - L = B$ rows of $\boldsymbol{\Theta}$ to be linearly independent. In [20], two classes of precoders have been constructed that satisfy this condition and thus guarantee symbol detectability or, equivalently, enable full frequency-diversity gain, namely the Vandermonde precoders and the cosine precoders. For instance, a special case of the general cosine precoder is a truncated Discrete Cosine Transform (DCT) matrix.
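The row-independence condition can be tested by brute force for small dimensions. The sketch below is an illustration under stated assumptions, not the construction of [20]: the Vandermonde precoder is built on Q equispaced points of the unit circle (one simple choice with distinct generators), the rank routine is a plain Gaussian elimination written for this example, and Q = 4, B = 2 are toy values.

```python
import cmath
import itertools

def vandermonde_precoder(Q, B):
    """Q x B Vandermonde matrix with rows [1, rho, ..., rho^(B-1)]; the Q
    generators rho are distinct points on the unit circle (assumed choice)."""
    points = [cmath.exp(-2j * cmath.pi * q / Q) for q in range(Q)]
    return [[p ** b for b in range(B)] for p in points]

def rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    A = [[complex(a) for a in row] for row in rows]
    rk = 0
    for col in range(len(A[0])):
        pivot = max(range(rk, len(A)), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < tol:
            continue
        A[rk], A[pivot] = A[pivot], A[rk]
        for r in range(rk + 1, len(A)):
            factor = A[r][col] / A[rk][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[rk])]
        rk += 1
        if rk == len(A):
            break
    return rk

def guarantees_detectability(theta, B):
    """True if every selection of B rows of theta is linearly independent."""
    return all(rank([theta[k] for k in subset]) == B
               for subset in itertools.combinations(range(len(theta)), B))

# Toy check (assumed sizes): Q = 4 tones, B = 2 symbols, so up to L = 2 channel nulls.
theta_good = vandermonde_precoder(4, 2)
theta_bad = [[1, 0], [0, 1], [1, 0], [0, 1]]  # repeated rows: two nulls can erase a symbol
print(guarantees_detectability(theta_good, 2), guarantees_detectability(theta_bad, 2))
```

The exhaustive subset test grows combinatorially with Q, so it is only meant as a sanity check on small designs.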

III. EQUALIZATION OPTIONS
In this section, we discuss different options to perform equalization and decoding of the linear precoding, either jointly or separately. These options make it possible to trade off performance versus complexity, ranging from optimal Maximum-Likelihood (ML) detection with exponential complexity to linear and decision-directed detection with linear complexity. To evaluate the complexity, we distinguish between the initialization phase, where the equalizers are calculated, and the data processing phase, where the actual equalization takes place. The rate of the former is related to the channel's fading rate, whereas the latter is executed continuously at the symbol block rate.

A. ML detection
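The ML algorithm is optimal in a Maximum Likelihood sense, but has a very high complexity. The likelihood function of the received block $\bar{\mathbf{y}}^m[i]$, conditioned on the transmitted block $\mathbf{s}^m[i]$, is given by:
$$p\left(\bar{\mathbf{y}}^m[i] \,\middle|\, \mathbf{s}^m[i]\right) = \frac{1}{(\pi\sigma_w^2)^Q} \exp\left\{-\frac{\left\|\bar{\mathbf{y}}^m[i] - \tilde{\mathbf{H}}\boldsymbol{\Theta}\,\mathbf{s}^m[i]\right\|^2}{\sigma_w^2}\right\}. \tag{15}$$
Amongst all possible transmitted blocks, the ML algorithm retains the one that maximizes the likelihood function or, equivalently, minimizes the Euclidean distance:
$$\hat{\mathbf{s}}^m[i] = \arg\min_{\mathbf{s}^m[i] \in \mathcal{S}} \left\|\bar{\mathbf{y}}^m[i] - \tilde{\mathbf{H}}\boldsymbol{\Theta}\,\mathbf{s}^m[i]\right\|^2. \tag{16}$$
In other words, the ML metric is given by the Euclidean distance between the actual received block and the block that would have been received if a particular symbol block had been transmitted in a noiseless environment. The number of possible transmit vectors in $\mathcal{S}$ is the cardinality of $\mathcal{S}$, i.e. $|\mathcal{S}| = M^B$, with M here denoting the constellation size. So, the number of points to inspect during the data processing phase grows exponentially with the initial block length B. Hence, this algorithm is only feasible for a small block length B and a small constellation size M. Note that the ML algorithm does not require an initialization phase.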
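The exhaustive search behind ML detection can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not the authors' code: the function names, the toy sizes (Q = 3, B = 2, QPSK), the precoder and the channel values are hypothetical. The effective matrix passed to the detector plays the role of the per-tone channel times the precoder.

```python
import itertools

QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def ml_detect(y, A, alphabet, B):
    """Return the block of B symbols minimising ||y - A s||^2 over alphabet^B."""
    best, best_metric = None, float("inf")
    for candidate in itertools.product(alphabet, repeat=B):
        r = matvec(A, candidate)
        metric = sum(abs(yi - ri) ** 2 for yi, ri in zip(y, r))
        if metric < best_metric:
            best, best_metric = candidate, metric
    return list(best)

def search_space_size(constellation_size, B):
    """Number of candidate blocks inspected per received block."""
    return constellation_size ** B

# Toy example (assumed values): the second tone sits in a deep fade, yet the
# redundant precoder keeps the effective matrix at full column rank.
theta = [[1, 1], [1, -1], [1, 1j]]
h_tilde = [1 + 0j, 0j, 0.5j]
A = [[h * t for t in row] for h, row in zip(h_tilde, theta)]

sent = [1 - 1j, -1 + 1j]
received = matvec(A, sent)  # noiseless for clarity
print(ml_detect(received, A, QPSK, B=2) == sent, search_space_size(4, 2))
```

With QPSK and the block length B = 56 used later in the comparison with DS-CDMA, the same search would have to visit $4^{56}$ candidates, which is why ML detection serves mainly as a performance reference.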

B. Joint Linear Equalization and Decoding
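Linear equalizers that perform joint equalization and decoding combine a low complexity with medium performance. A first possibility is to apply a Zero-Forcing (ZF) linear equalizer:
$$\mathbf{G}_{ZF} = \left(\boldsymbol{\Theta}^H \tilde{\mathbf{H}}^H \tilde{\mathbf{H}} \boldsymbol{\Theta}\right)^{-1} \boldsymbol{\Theta}^H \tilde{\mathbf{H}}^H, \tag{17}$$
which completely eliminates the ISI, irrespective of the noise level. By ignoring the noise, it causes excessive noise enhancement, especially at low SNR. A second possibility is to apply a Minimum Mean-Square-Error (MMSE) linear equalizer:
$$\mathbf{G}_{MMSE} = \left(\boldsymbol{\Theta}^H \tilde{\mathbf{H}}^H \tilde{\mathbf{H}} \boldsymbol{\Theta} + \frac{\sigma_w^2}{\sigma_s^2}\,\mathbf{I}_B\right)^{-1} \boldsymbol{\Theta}^H \tilde{\mathbf{H}}^H, \tag{18}$$
which minimizes the MSE between the actual transmitted symbol block and its estimate. The MMSE linear equalizer explicitly takes into account the noise variance $\sigma_w^2$ and the information symbol variance $\sigma_s^2$, and balances ISI elimination with noise enhancement. From (17) and (18), it is also clear that $\mathbf{G}_{MMSE}$ reduces to $\mathbf{G}_{ZF}$ at high SNR.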
During the initialization phase, $\mathbf{G}_{ZF}$ and $\mathbf{G}_{MMSE}$ can be computed from the multiple sets of linear equations, implicitly shown in (17) and (18), respectively. The solution can be found from Gaussian elimination with partial pivoting, based on the LU decomposition [19], leading to an overall complexity of $O(QB^2)$. During the data processing phase, the equalizers $\mathbf{G}_{ZF}$ and $\mathbf{G}_{MMSE}$ are applied to the received block $\bar{\mathbf{y}}^m[i]$, leading to a complexity of $O(QB)$.

C. Joint Decision Feedback Equalization and Decoding
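On the one hand, the ML algorithm of Subsection III-A achieves the optimal performance but with a very high complexity. On the other hand, the linear equalizers of Subsection III-B offer a low complexity but at a relatively poor performance. The class of non-linear equalizers that perform joint decision feedback equalization and decoding lies in between the former categories, both in terms of performance and complexity. Decision feedback equalizers exploit the finite alphabet property of the information symbols to improve performance relative to linear equalizers. They consist of a feedforward section, represented by the matrix $\mathbf{W}$, and a feedback section, represented by the matrix $\mathbf{B}$:
$$\hat{\mathbf{s}}^m[i] = \mathcal{Q}\left[\mathbf{W}\,\bar{\mathbf{y}}^m[i] - \mathbf{B}\,\hat{\mathbf{s}}^m[i]\right], \tag{19}$$
where $\mathcal{Q}[\cdot]$ denotes the decision device that maps each entry to the nearest constellation point. The feedforward and feedback sections can be designed according to a ZF or MMSE criterion [21]. In either case, $\mathbf{B}$ should be a strictly upper or lower triangular matrix with zero diagonal entries, in order to feed back decisions in a causal way. To design the decision feedback counterpart of the ZF linear equalizer, we compute the Cholesky decomposition of the matrix $\boldsymbol{\Theta}^H \tilde{\mathbf{H}}^H \tilde{\mathbf{H}} \boldsymbol{\Theta}$ in (17):
$$\boldsymbol{\Theta}^H \tilde{\mathbf{H}}^H \tilde{\mathbf{H}} \boldsymbol{\Theta} = \mathbf{U}_1^H\,\boldsymbol{\Sigma}_1\,\mathbf{U}_1, \tag{20}$$
where $\mathbf{U}_1$ is an upper triangular matrix with ones along the diagonal, and $\boldsymbol{\Sigma}_1$ is a diagonal matrix with real entries. The ZF feedforward and feedback matrices then follow from:
$$\mathbf{W}_{ZF} = \mathbf{U}_1\,\mathbf{G}_{ZF} = \boldsymbol{\Sigma}_1^{-1}\,\mathbf{U}_1^{-H}\,\boldsymbol{\Theta}^H \tilde{\mathbf{H}}^H, \qquad \mathbf{B}_{ZF} = \mathbf{U}_1 - \mathbf{I}_B. \tag{21}$$
The linear feedforward section $\mathbf{W}_{ZF}$ suppresses the ISI originating from "future" symbols, the so-called pre-cursor ISI, whereas the non-linear feedback section $\mathbf{B}_{ZF}$ eliminates the ISI originating from "past" symbols, the so-called post-cursor ISI.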
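Likewise, to design the decision feedback counterpart of the MMSE linear equalizer, we compute the Cholesky decomposition of the matrix that is inverted in (18):
$$\boldsymbol{\Theta}^H \tilde{\mathbf{H}}^H \tilde{\mathbf{H}} \boldsymbol{\Theta} + \frac{\sigma_w^2}{\sigma_s^2}\,\mathbf{I}_B = \mathbf{U}_2^H\,\boldsymbol{\Sigma}_2\,\mathbf{U}_2, \tag{22}$$
where $\mathbf{U}_2$ is an upper triangular matrix with ones along the diagonal, and $\boldsymbol{\Sigma}_2$ is a diagonal matrix with real entries. The MMSE feedforward and feedback matrices can then be calculated as:
$$\mathbf{W}_{MMSE} = \mathbf{U}_2\,\mathbf{G}_{MMSE} = \boldsymbol{\Sigma}_2^{-1}\,\mathbf{U}_2^{-H}\,\boldsymbol{\Theta}^H \tilde{\mathbf{H}}^H, \qquad \mathbf{B}_{MMSE} = \mathbf{U}_2 - \mathbf{I}_B. \tag{23}$$
During the initialization phase, the feedforward and feedback filters are computed based on a Cholesky decomposition [19], leading to an overall complexity of $O(QB^2)$. During the data processing phase, the feedforward and feedback filters are applied to the received data according to (19), leading to a complexity of $O(QB)$. Note that the decision feedback equalizers involve the same order of complexity as their linear counterparts.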

D. Separate Linear Equalization and Decoding
Previously, we have only considered joint equalization and decoding of the linear precoding. However, in order to reduce the complexity even further with respect to the linear equalizers of Subsection III-B, equalization and decoding can be performed separately as well:
$$\hat{\mathbf{s}}^m[i] = \boldsymbol{\Theta}^H\,\tilde{\mathbf{G}}\,\bar{\mathbf{y}}^m[i], \tag{24}$$
where $\tilde{\mathbf{G}}$ performs linear equalization only and tries to restore $\tilde{\mathbf{s}}^m[i]$, and $\boldsymbol{\Theta}^H$ subsequently performs linear decoding only and tries to restore $\mathbf{s}^m[i]$.
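The ZF equalizer perfectly removes the amplitude and phase distortion:
$$\tilde{\mathbf{G}}_{ZF} = \left(\tilde{\mathbf{H}}^H \tilde{\mathbf{H}}\right)^{-1} \tilde{\mathbf{H}}^H, \tag{25}$$
but also causes excessive noise enhancement, especially on those tones that experience a deep channel fade. Since $\tilde{\mathbf{H}}$ is a diagonal matrix, the ZF equalizer decouples into Q parallel single-tap equalizers, acting on a per-tone basis in the FD. The MMSE equalizer balances amplitude and phase distortion with noise enhancement and can be expressed as:
$$\tilde{\mathbf{G}}_{MMSE} = \mathbf{R}_{\tilde{s}}\,\tilde{\mathbf{H}}^H \left(\tilde{\mathbf{H}}\,\mathbf{R}_{\tilde{s}}\,\tilde{\mathbf{H}}^H + \sigma_w^2\,\mathbf{I}_Q\right)^{-1}, \tag{26}$$
where $\mathbf{R}_{\tilde{s}} := E\{\tilde{\mathbf{s}}^m[i]\,\tilde{\mathbf{s}}^m[i]^H\} = \sigma_s^2\,\boldsymbol{\Theta}\boldsymbol{\Theta}^H$ is the covariance matrix of the precoded symbols. If we neglect the color in the precoded symbols, $\mathbf{R}_{\tilde{s}} \approx \sigma_s^2\,\mathbf{I}_Q$, the MMSE equalizer also decouples into Q parallel and independent single-tap equalizers.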
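The per-tone decoupling can be made concrete with a short plain-Python sketch. It assumes white precoded symbols, so that each tone gets the scalar MMSE tap $\tilde{h}^*/(|\tilde{h}|^2 + \sigma_w^2/\sigma_s^2)$; the function names and the channel and variance values are illustrative assumptions.

```python
def zf_taps(h_tilde):
    """One scalar division per tone; undefined on an exact channel null."""
    return [1 / h for h in h_tilde]

def mmse_taps(h_tilde, sigma_w2, sigma_s2):
    """Single-tap MMSE equalizer per tone, assuming white precoded symbols."""
    return [h.conjugate() / (abs(h) ** 2 + sigma_w2 / sigma_s2) for h in h_tilde]

def equalize(y_tilde, taps):
    """Apply the Q single-tap equalizers to a received FD block."""
    return [g * y for g, y in zip(taps, y_tilde)]

def noise_gain(taps):
    """Per-tone noise power amplification |g|^2."""
    return [abs(g) ** 2 for g in taps]

# Toy channel (assumed values): the second tone experiences a deep fade.
h_tilde = [1 + 0j, 0.01 + 0j, 0.5j]
zf = zf_taps(h_tilde)
mmse = mmse_taps(h_tilde, sigma_w2=0.1, sigma_s2=1.0)

print(noise_gain(zf)[1], noise_gain(mmse)[1])
```

On the faded tone the ZF tap amplifies the noise power by a factor of about $10^4$, whereas the MMSE tap backs off and leaves some residual distortion instead. The initialization cost is one scalar operation per tone, in line with the O(Q) complexity quoted for this option.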
During the initialization phase, $\tilde{\mathbf{G}}_{ZF}$ and $\tilde{\mathbf{G}}_{MMSE}$ are calculated from (25) and (26), respectively, where the matrix inversion reduces to Q parallel scalar divisions, leading to an overall complexity of $O(Q)$. During the data processing phase, the received data is separately equalized and decoded, leading to an overall complexity of $O(QB)$.

[...] MIMO systems that deploy $N_T$ transmit and $N_R$ receive antennas enable an $N_{\min}$-fold capacity increase in rich scattering environments, where $N_{\min} = \min\{N_T, N_R\}$ is called the multiplexing gain [22], [23], [24].
Besides the time, frequency and code dimensions, MIMO systems create an extra spatial dimension that makes it possible to increase the spectral efficiency and/or to improve the performance. On the one hand, Space Division Multiplexing (SDM) techniques achieve high spectral efficiency by exploiting the spatial multiplexing gain [25] (see also [26]). On the other hand, Space-Time Coding (STC) techniques achieve high Quality-of-Service (QoS) by exploiting diversity and coding gains [27], [28], [29]. Besides the leverages they offer, MIMO systems also sharpen the challenges of broadband cellular downlink communications.
First, time dispersion and ISI are now caused by $N_T N_R$ frequency-selective multi-path fading channels instead of just 1. Second, MUI originates from $N_T M$ sources instead of just M. Third, the presence of multiple antennas seriously impairs a low-complexity implementation of the MS. To tackle these challenges, we will demonstrate the synergy between our MCBS-CDMA waveform and MIMO signal processing. In particular, we focus on a space-time block coded MCBS-CDMA transmission, but the general principles apply equally well to a space-time trellis coded or a space division multiplexed MCBS-CDMA transmission.

A. Space-time block coded MCBS-CDMA transmission
The block diagram in Fig. 3 describes the Space-Time Block Coded (STBC) MCBS-CDMA downlink transmission scheme (where only the m-th user is explicitly shown), which transforms the M user data symbol sequences $\{s^m[i]\}_{m=1}^{M}$ into $N_T$ multi-user chip sequences $\{u_{n_t}[n]\}_{n_t=1}^{N_T}$ with a rate $1/T_c$. For conciseness, we limit ourselves to the case of $N_T = 2$ transmit antennas. As for the single-antenna case, the information symbols are first grouped into blocks of B symbols and linearly precoded. Unlike the traditional approach of performing ST encoding at the scalar symbol level, we perform ST encoding at the symbol block level; this was also done in e.g. [30]. Our ST encoder operates in the FD and takes two consecutive symbol blocks $\{\tilde{\mathbf{s}}^m[2i], \tilde{\mathbf{s}}^m[2i+1]\}$ to output the following $2Q \times 2$ matrix of ST coded symbol blocks:
$$\begin{bmatrix} \bar{\mathbf{s}}_1^m[2i] & \bar{\mathbf{s}}_1^m[2i+1] \\ \bar{\mathbf{s}}_2^m[2i] & \bar{\mathbf{s}}_2^m[2i+1] \end{bmatrix} := \begin{bmatrix} \tilde{\mathbf{s}}^m[2i] & -\tilde{\mathbf{s}}^m[2i+1]^* \\ \tilde{\mathbf{s}}^m[2i+1] & \tilde{\mathbf{s}}^m[2i]^* \end{bmatrix}. \tag{27}$$
At each time interval i, the ST coded symbol blocks $\bar{\mathbf{s}}_1^m[i]$ and $\bar{\mathbf{s}}_2^m[i]$ are forwarded to the first and the second transmit antenna, respectively. From (27), we can easily verify that the transmitted symbol block at time instant $2i+1$ from one antenna is the conjugate of the transmitted symbol block at time instant $2i$ from the other antenna (with a possible sign change). This corresponds to a per-tone implementation of the classical Alamouti scheme for frequency-flat fading channels [28]. As we will show later, this property allows for deterministic transmit stream separation at the receiver.
After ST encoding, the resulting symbol block sequences $\{\bar{\mathbf{s}}_{n_t}^m[i]\}_{n_t=1}^{N_T}$ are block spread and code division multiplexed with those of the other users:
$$\tilde{\mathbf{x}}_{n_t}[n] = \sum_{m=1}^{M} \bar{\mathbf{s}}_{n_t}^m[i]\, c^m[n], \qquad n_t = 1, \ldots, N_T. \tag{28}$$
At this point, it is important to note that each of the $N_T$ parallel block sequences is block spread by the same composite code sequence $c^m[n]$, guaranteeing an efficient utilization of the available code space. As will become apparent later, this property allows for deterministic user separation at every receive antenna.
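After IFFT transformation and the addition of some form of transmit redundancy:
$$\mathbf{u}_{n_t}[n] = \mathbf{T}\,\mathbf{F}_Q^H\,\tilde{\mathbf{x}}_{n_t}[n], \tag{29}$$
the corresponding scalar sequences $\{u_{n_t}[n]\}_{n_t=1}^{N_T}$ are transmitted over the air at a rate $1/T_c$.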

B. MUI-resilient MIMO reception
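The block diagram in Fig. 4 describes the reception scheme for the MS of interest, which transforms the different received sequences $\{v_{n_r}[n]\}_{n_r=1}^{N_R}$ into an estimate of the desired user's data sequence $\hat{s}^m[i]$. After transmit redundancy removal and FFT transformation, we obtain the multi-antenna counterpart of (11):
$$\tilde{\mathbf{Y}}_{n_r}[i] = \sum_{n_t=1}^{N_T} \tilde{\mathbf{H}}_{n_r,n_t}\,\tilde{\mathbf{X}}_{n_t}[i] + \tilde{\mathbf{Z}}_{n_r}[i], \tag{30}$$
where $\tilde{\mathbf{Y}}_{n_r}[i] := [\tilde{\mathbf{y}}_{n_r}[iN], \ldots, \tilde{\mathbf{y}}_{n_r}[(i+1)N-1]]$ stacks N consecutive received chip blocks at the $n_r$-th receive antenna, $\tilde{\mathbf{H}}_{n_r,n_t}$ is the diagonal FD channel matrix from the $n_t$-th transmit to the $n_r$-th receive antenna, and $\tilde{\mathbf{X}}_{n_t}[i]$ and $\tilde{\mathbf{Z}}_{n_r}[i]$ are similarly defined as $\tilde{\mathbf{Y}}_{n_r}[i]$. From (28) and (30), we can conclude that our transceiver retains the user orthogonality at each receive antenna, irrespective of the underlying frequency-selective multi-path channels. Like in the single-antenna case, a low-complexity block despreading operation with the desired user's composite code vector $\mathbf{c}^m[i]$ deterministically removes the MUI at each receive antenna:
$$\bar{\mathbf{y}}_{n_r}^m[i] := \tilde{\mathbf{Y}}_{n_r}[i]\,\mathbf{c}^m[i]^* = \sum_{n_t=1}^{N_T} \tilde{\mathbf{H}}_{n_r,n_t}\,\bar{\mathbf{s}}_{n_t}^m[i] + \bar{\mathbf{z}}_{n_r}^m[i]. \tag{31}$$
Hence, our transceiver successfully converts (through block despreading) a multi-user MIMO detection problem into an equivalent single-user MIMO equalization problem.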

C. Single-user space-time decoding
After MUI elimination, the information blocks $\mathbf{s}^m[i]$ still need to be decoded from the received block despread sequences $\{\bar{\mathbf{y}}_{n_r}^m[i]\}_{n_r=1}^{N_R}$. Our ST decoder decomposes into three steps: an initial ST decoding step and a transmit stream separation step for each receive antenna, and, finally, a receive antenna combining step.
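The initial ST decoding step considers two consecutive symbol blocks $\bar{\mathbf{y}}_{n_r}^m[2i]$ and $\bar{\mathbf{y}}_{n_r}^m[2i+1]$, both satisfying the block input/output relationship of (31). By exploiting the ST code structure of (27), we arrive at:
$$\bar{\mathbf{y}}_{n_r}^m[2i] = \tilde{\mathbf{H}}_{n_r,1}\,\tilde{\mathbf{s}}^m[2i] + \tilde{\mathbf{H}}_{n_r,2}\,\tilde{\mathbf{s}}^m[2i+1] + \bar{\mathbf{z}}_{n_r}^m[2i], \tag{32}$$
$$\bar{\mathbf{y}}_{n_r}^m[2i+1]^* = -\tilde{\mathbf{H}}_{n_r,1}^*\,\tilde{\mathbf{s}}^m[2i+1] + \tilde{\mathbf{H}}_{n_r,2}^*\,\tilde{\mathbf{s}}^m[2i] + \bar{\mathbf{z}}_{n_r}^m[2i+1]^*. \tag{33}$$
Combining (32) and (33) into a single block matrix form, we obtain:
$$\bar{\mathbf{r}}_{n_r}^m[i] := \begin{bmatrix} \bar{\mathbf{y}}_{n_r}^m[2i] \\ \bar{\mathbf{y}}_{n_r}^m[2i+1]^* \end{bmatrix} = \underbrace{\begin{bmatrix} \tilde{\mathbf{H}}_{n_r,1} & \tilde{\mathbf{H}}_{n_r,2} \\ \tilde{\mathbf{H}}_{n_r,2}^* & -\tilde{\mathbf{H}}_{n_r,1}^* \end{bmatrix}}_{\bar{\mathbf{H}}_{n_r}} \begin{bmatrix} \tilde{\mathbf{s}}^m[2i] \\ \tilde{\mathbf{s}}^m[2i+1] \end{bmatrix} + \bar{\boldsymbol{\eta}}_{n_r}^m[i], \tag{34}$$
where $\bar{\boldsymbol{\eta}}_{n_r}^m[i]$ stacks the corresponding noise blocks and the precoded symbol blocks $\tilde{\mathbf{s}}^m[2i]$ and $\tilde{\mathbf{s}}^m[2i+1]$ are those entering the ST encoder in (27). From the structure of $\bar{\mathbf{H}}_{n_r}$ in (34), we can deduce that our transceiver retains the orthogonality among transmit streams at each receive antenna for each tone separately, regardless of the underlying frequency-selective multi-path channels. A similar property was also encountered in the classical Alamouti scheme, but only for single-user frequency-flat fading multi-path channels [28].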
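The transmit stream separation step relies on this property to deterministically remove the transmit stream interference through low-complexity linear processing. Let us define the $Q \times Q$ matrix $\bar{\mathbf{D}}_{n_r}$ with non-negative diagonal entries as $\bar{\mathbf{D}}_{n_r} := [\tilde{\mathbf{H}}_{n_r,1}\tilde{\mathbf{H}}_{n_r,1}^* + \tilde{\mathbf{H}}_{n_r,2}\tilde{\mathbf{H}}_{n_r,2}^*]^{1/2}$. From the structure of $\bar{\mathbf{H}}_{n_r}$ in (34), it follows that $\bar{\mathbf{H}}_{n_r}^H \bar{\mathbf{H}}_{n_r} = \mathbf{I}_2 \otimes \bar{\mathbf{D}}_{n_r}^2$, so that (for invertible $\bar{\mathbf{D}}_{n_r}$) the $2Q \times 2Q$ matrix $\bar{\mathbf{U}}_{n_r} := \bar{\mathbf{H}}_{n_r}(\mathbf{I}_2 \otimes \bar{\mathbf{D}}_{n_r}^{-1})$ is unitary and satisfies $\bar{\mathbf{U}}_{n_r}^H \bar{\mathbf{H}}_{n_r} = \mathbf{I}_2 \otimes \bar{\mathbf{D}}_{n_r}$. Multiplying (34) with $\bar{\mathbf{U}}_{n_r}^H$ then yields:
$$\acute{\mathbf{r}}_{n_r}^m[i] := \bar{\mathbf{U}}_{n_r}^H\,\bar{\mathbf{r}}_{n_r}^m[i] = \begin{bmatrix} \bar{\mathbf{D}}_{n_r}\,\tilde{\mathbf{s}}^m[2i] \\ \bar{\mathbf{D}}_{n_r}\,\tilde{\mathbf{s}}^m[2i+1] \end{bmatrix} + \acute{\boldsymbol{\eta}}_{n_r}^m[i], \tag{35}$$
where the resulting noise $\acute{\boldsymbol{\eta}}_{n_r}^m[i] := \bar{\mathbf{U}}_{n_r}^H\,\bar{\boldsymbol{\eta}}_{n_r}^m[i]$ is still white with variance $\sigma_w^2$. Since multiplying with a unitary matrix preserves ML optimality, we can deduce from (35) that the symbol blocks $\tilde{\mathbf{s}}^m[2i]$ and $\tilde{\mathbf{s}}^m[2i+1]$ can be decoded separately in an optimal way. As a result, the different symbol blocks $\tilde{\mathbf{s}}^m[i]$ can be detected independently from:
$$\acute{\mathbf{y}}_{n_r}^m[i] = \bar{\mathbf{D}}_{n_r}\,\tilde{\mathbf{s}}^m[i] + \acute{\mathbf{z}}_{n_r}^m[i]. \tag{36}$$
Stacking the blocks from the different receive antennas $\{\acute{\mathbf{y}}_{n_r}^m[i]\}_{n_r=1}^{N_R}$ for the final receive antenna combining step, we obtain:
$$\acute{\mathbf{y}}^m[i] := \begin{bmatrix} \acute{\mathbf{y}}_1^m[i] \\ \vdots \\ \acute{\mathbf{y}}_{N_R}^m[i] \end{bmatrix} = \underbrace{\begin{bmatrix} \bar{\mathbf{D}}_1 \\ \vdots \\ \bar{\mathbf{D}}_{N_R} \end{bmatrix}}_{\acute{\mathbf{D}}} \tilde{\mathbf{s}}^m[i] + \acute{\mathbf{z}}^m[i]. \tag{37}$$
At this point, we have only collected the transmit antenna diversity at each receive antenna, but still need to collect the receive antenna diversity. Let us define the $Q \times Q$ matrix $\mathbf{D}$ with non-negative diagonal entries as $\mathbf{D} := [\sum_{n_r=1}^{N_R} \bar{\mathbf{D}}_{n_r}^2]^{1/2}$. The $N_R Q \times Q$ matrix $\acute{\mathbf{U}} := \acute{\mathbf{D}}\,\mathbf{D}^{-1}$ is then a tall unitary matrix, i.e. $\acute{\mathbf{U}}^H \acute{\mathbf{U}} = \mathbf{I}_Q$, that satisfies $\acute{\mathbf{U}}^H \acute{\mathbf{D}} = \mathbf{D}$. Multiplying (37) with $\acute{\mathbf{U}}^H$ yields:
$$\bar{\mathbf{y}}^m[i] := \acute{\mathbf{U}}^H\,\acute{\mathbf{y}}^m[i] = \mathbf{D}\,\boldsymbol{\Theta}\,\mathbf{s}^m[i] + \bar{\mathbf{z}}^m[i], \tag{38}$$
where the resulting noise $\bar{\mathbf{z}}^m[i] := \acute{\mathbf{U}}^H\,\acute{\mathbf{z}}^m[i]$ is still white with variance $\sigma_w^2$. Since the multiplication with a tall unitary matrix that does not remove information also preserves ML decoding optimality, the blocks $\mathbf{s}^m[i]$ can be optimally decoded from (38). Moreover, (38) has the same structure as its single-antenna counterpart in (13). Hence, the design of the linear precoder $\boldsymbol{\Theta}$ in Subsection II-D, and the different equalization options that we have discussed in Section III, can be applied here as well.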
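Because all matrices involved are diagonal, the ST decoding and transmit stream separation steps reduce to scalar operations on each tone. The following plain-Python sketch shows this for a single tone and a single receive antenna in the noiseless case; the function names and the numeric values are illustrative assumptions, and the encoder follows the per-tone Alamouti structure described in the text.

```python
import math

def st_encode(s0, s1):
    """Per-tone Alamouti code: returns the (time 2i, time 2i+1) symbols of
    transmit antenna 1 and of transmit antenna 2."""
    return (s0, -s1.conjugate()), (s1, s0.conjugate())

def receive(h1, h2, s0, s1):
    """Noiseless despread outputs on one tone for two consecutive blocks."""
    (a0, a1), (b0, b1) = st_encode(s0, s1)
    return h1 * a0 + h2 * b0, h1 * a1 + h2 * b1

def st_decode(h1, h2, y0, y1):
    """Multiply [y0, conj(y1)] with the Hermitian of the unitary matrix built
    from the per-tone channel pair; returns the gain d and both streams."""
    d = math.sqrt(abs(h1) ** 2 + abs(h2) ** 2)
    r0 = (h1.conjugate() * y0 + h2 * y1.conjugate()) / d
    r1 = (h2.conjugate() * y0 - h1 * y1.conjugate()) / d
    return d, r0, r1

# Toy values (assumed): one tone, two transmit antennas, one receive antenna.
h1, h2 = 0.3 + 0.4j, -1 + 0.5j
s0, s1 = 1 + 1j, -1 + 1j
y0, y1 = receive(h1, h2, s0, s1)
d, r0, r1 = st_decode(h1, h2, y0, y1)
print(abs(r0 - d * s0) < 1e-12, abs(r1 - d * s1) < 1e-12)
```

Each stream comes out scaled by the real gain $d = \sqrt{|h_1|^2 + |h_2|^2}$ and free of the other stream, so a tone is lost only if both transmit channels fade on it simultaneously. With several receive antennas, the per-antenna outputs can be combined in the same spirit, weighting each by its own gain.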

V. SIMULATION RESULTS
We consider the downlink of a single-antenna MCBS-CDMA system, operating at a carrier frequency of $F_c = 2$ GHz, and transmitting with a chip rate of $R_c = 1/T_c = 4.096$ MHz. Each user's bit sequence is QPSK modulated with $n_b = 2$ bits per symbol. We assume that the multi-path channel is FIR with, unless otherwise stated, order $L_c = 3$, and Rayleigh distributed channel taps of equal variance $1/(L_c+1)$. To satisfy the IBI removal condition $L \geq L_c$, we choose $L = 8$. Note that this specific design can handle a delay spread of $T_g = L T_c \approx 2\ \mu$s. However, a larger transmit redundancy can be used to handle more ISI. To [...]

A. Comparison of different equalization options
We test the different equalization options, discussed in Section III, for a fully-loaded system with M = 16 active users. Fig. 5 compares the performance of the different Linear Equalizers (LEs) and Decision Feedback Equalizers (DFEs) that perform joint equalization and decoding. As a reference, the performance of a system without linear precoding (uncoded) as well as the optimal ML performance are also shown. Clearly, the system without linear precoding only achieves diversity 1, whereas ML detection achieves the full frequency-diversity gain $L_c + 1 = 4$. The ZF-LE performs worse than the uncoded system at low SNR, but better at high SNR (SNR ≥ 12 dB). The MMSE-LE always outperforms the uncoded system and achieves a diversity gain between 1 and $L_c + 1 = 4$. At a BER of $10^{-3}$, it realizes a 5.5 dB gain compared to its ZF counterpart. The non-linear ZF- and MMSE-DFEs outperform their respective linear counterparts, although this effect is more pronounced for the ZF than for the MMSE criterion. At a BER of $10^{-3}$, the MMSE-DFE exhibits a 2.8 dB gain relative to the MMSE-LE, and comes within 1.4 dB of the optimal ML detector. Fig. 6 compares the performance of separate versus joint linear equalization and decoding. On the one hand, the separate ZF-LE always performs worse than the uncoded system, due to the excessive noise enhancement caused by the presence of channel nulls. On the other hand, the separate MMSE-LE almost perfectly coincides with its corresponding joint MMSE-LE, and thus achieves a diversity gain between 1 and $L_c + 1 = 4$.

B. Comparison with DS-CDMA
In the following, we compare two different CDMA transceivers:
T1. The first transceiver applies the classical downlink DS-CDMA transmission scheme of the UMTS and the IS-2000 WCDMA standards [1]. At the receiver, a time-domain MMSE chip equalizer based on perfect Channel State Information (CSI) is applied. The bandwidth efficiency of the first transceiver supporting $M_1$ users can be calculated as $\epsilon_1 = M_1/N$, where N is the length of the Walsh-Hadamard spreading codes.
T2.The second transceiver is our MCBS-CDMA transceiver, discussed in Section II.At the receiver, a frequency-domain MMSE equalizer (either jointly or separately) based on perfect CSI is used.The bandwidth efficiency of our transceiver supporting M 2 users can be calculated as 2 = M 2 B N (B+2L) , where the overhead 2L stems from the redundant linear precoding and the IBI removal.
In order to make a fair comparison between the two transceivers, we force their respective bandwidth efficiencies to be equal, $\epsilon_1 = \epsilon_2$, which leads to the following relationship between the numbers of users supported by the two transceivers: $M_2 = \frac{B + 2L}{B} M_1$. With $B = 56$ and $L = 8$, we obtain $M_2 = \frac{9}{7} M_1$. Fig. 7 compares the performance of the two transceivers for a small system load with $M_1 = 3$ and $M_2 = 4$.

Since T2 is an MUI-free CDMA transceiver, its performance remains unaffected by the MUI. So, even at large system load, T2 achieves a diversity order between 1 and $L_c + 1 = 4$. We also observe that T1 now performs poorly compared to T2: e.g., at a BER of $3 \cdot 10^{-2}$, T2 achieves a 9 dB gain compared to T1. In contrast with T2, which deterministically removes the MUI, T1 does not completely suppress the interference at high SNR. Hence, T1 suffers from a BER saturation level that increases with the system load $M_1$.
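The bandwidth-efficiency matching between T1 and T2 can be checked with exact rational arithmetic. The sketch below uses the expressions $M_1/N$ and $M_2 B/(N(B+2L))$ with $N = 16$, $B = 56$ and $L = 8$; the function names are illustrative, and rounding the user count to the nearest integer is an assumption made here to recover the small-load operating point.

```python
from fractions import Fraction

def efficiency_t1(m1: int, n: int) -> Fraction:
    """Bandwidth efficiency of the DS-CDMA transceiver T1: M1 / N."""
    return Fraction(m1, n)

def efficiency_t2(m2: int, n: int, b: int, l: int) -> Fraction:
    """Bandwidth efficiency of the MCBS-CDMA transceiver T2: M2*B / (N*(B + 2L))."""
    return Fraction(m2 * b, n * (b + 2 * l))

N, B, L = 16, 56, 8

# Equal efficiencies imply M2 = (B + 2L) / B * M1.
ratio = Fraction(B + 2 * L, B)

m1 = 3
m2_exact = ratio * m1   # generally not an integer
m2 = round(m2_exact)    # nearest integer number of users (assumption)

eps1 = efficiency_t1(m1, N)
eps2 = efficiency_t2(m2, N, B, L)

print(f"M2/M1 = {ratio}, exact M2 = {m2_exact}, rounded M2 = {m2}")
print(f"eps1 = {float(eps1):.4f}, eps2 = {float(eps2):.4f}")
```

Because the exact user count is not an integer, the two efficiencies only match approximately after rounding.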

C. Performance of Space-Time Block Coded MCBS-CDMA
We test our MIMO CDMA transceiver of Section IV, employing a cascade of STBC and MCBS-CDMA, for three different MIMO system setups $(N_T, N_R)$: the (1, 1) setup, the (2, 1) setup with TX diversity only, and the (2, 2) setup with both TX and RX diversity. The system is fully loaded, supporting $M = 16$ active users. For each setup, both the MMSE-LE and the optimal ML detector are shown.

So, the larger the number of TX and/or RX antennas, the better the proposed transceiver with linear receiver processing succeeds in extracting the full diversity of order $N_T N_R (L_c + 1)$.
Fig. 10 shows the same results but now for frequency-selective channels with channel order $L_c = 3$. Again fixing the BER at $10^{-3}$ and focusing on the MMSE-LE, the (2, 1) setup outperforms the (1, 1) setup by 4 dB, whereas the (2, 2) setup in turn achieves a 2 dB gain compared to the (2, 1) setup. So, compared to Fig. 9, the corresponding gains are now smaller because of the inherently larger underlying multi-path diversity.
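As a quick numerical companion to this comparison, the maximum diversity order $N_T N_R (L_c + 1)$ can be tabulated for the three antenna setups and the two channel orders. This is only a sketch of the formula; the function name is illustrative, and the mapping from diversity order to a dB gain at a given BER is not modeled.

```python
def max_diversity_order(n_t: int, n_r: int, l_c: int) -> int:
    """Maximum diversity order N_T * N_R * (L_c + 1)."""
    return n_t * n_r * (l_c + 1)

setups = [(1, 1), (2, 1), (2, 2)]   # (N_T, N_R)
channel_orders = [1, 3]             # L_c used for Fig. 9 and Fig. 10

table = {
    l_c: {setup: max_diversity_order(setup[0], setup[1], l_c) for setup in setups}
    for l_c in channel_orders
}

for l_c, row in table.items():
    for (n_t, n_r), order in row.items():
        print(f"L_c = {l_c}, (N_T, N_R) = ({n_t}, {n_r}): diversity order {order}")
```

With $L_c = 3$ the single-antenna setup already starts from a diversity order of 4 instead of 2, which is in line with the smaller antenna gains observed in Fig. 10.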

VI. CONCLUSION
To cope with the challenges of broadband cellular downlink communications, we have designed a novel Multi-Carrier (MC) CDMA transceiver that enables significant performance improvements compared to 3G cellular systems, yielding gains of up to 9 dB in full load situations. To this end, our so-called Multi-Carrier Block-Spread (MCBS) CDMA transceiver capitalizes on redundant block-spreading and linear precoding to preserve the orthogonality among users and to enable full multi-path diversity gains, regardless of the underlying multi-path channels. Different equalization options, ranging from linear to ML detection, strike the trade-off between performance and complexity. Specifically, the MMSE decision feedback equalizer realizes a 2.8 dB gain relative to its linear counterpart and performs within 1.4 dB of the optimal ML detector. Finally, our transceiver demonstrates a rewarding synergy with multi-antenna techniques to increase the spectral efficiency and/or improve the link reliability over MIMO channels.
Specifically, our STBC/MCBS-CDMA transceiver retains the orthogonality among users as well as transmit streams to realize both multi-antenna and multi-path diversity gains of $N_T N_R (L_c + 1)$ for every user in the system, irrespective of the system load. Moreover, a low-complexity linear MMSE detector that performs either joint or separate equalization and decoding approaches the optimal ML performance (within 0.4 dB for a (2, 2) system) and comes close to extracting the full diversity in reduced as well as full load settings.

Notation: We use roman letters to represent scalars, lower boldface letters to denote column vectors (i.e., blocks), and upper boldface letters to denote matrices (i.e., collections of blocks). $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$ represent conjugate, transpose, and Hermitian, respectively. Further, $|\cdot|$ and $\|\cdot\|$ represent the absolute value and Frobenius norm, respectively. We reserve $E\{\cdot\}$ for expectation, and $\lfloor\cdot\rfloor$ for integer flooring. Subscripts $n_t$ and $n_r$ point to the $n_t$-th transmit and the $n_r$-th receive antenna, respectively. Superscript $m$ points to the $m$-th user. Arguments $i$ and $n$ denote symbol (block) and chip (block) indices, respectively. Tilded letters $\tilde{x}$ denote frequency-domain signals; upperlined letters $\bar{x}$ denote space-time block encoded signals at the transmitter and block-despread signals at the receiver. Acuted letters $\acute{x}$ denote space-time block decoded signals at the receiver. Hatted letters $\hat{x}$ denote soft estimates, whereas hatted and underlined letters $\underline{\hat{x}}$ denote hard estimates.

March 6, 2003 DRAFT

IV. EXTENSION TO MULTIPLE ANTENNAS

As shown in Sections II and III, MCBS-CDMA successfully addresses the challenges of broadband cellular downlink communications. However, the spectral efficiency of single-antenna MCBS-CDMA is still limited by the received signal-to-noise ratio and cannot be further improved by traditional communication techniques. As opposed to single-antenna systems, Multiple-Input Multiple-Output (MIMO)

non-negative diagonal entries as: $\mathbf{D}_{n_r} := [\mathbf{H}_{n_r,1} \cdot \mathbf{H}_{n_r,1}^* + \mathbf{H}_{n_r,2} \cdot \mathbf{H}_{n_r,2}^*]^{1/2}$. From (34), we can verify that the channel matrix $\mathbf{H}_{n_r}$ satisfies: $\mathbf{H}_{n_r}^H \cdot \mathbf{H}_{n_r} = \mathbf{I}_2 \otimes \mathbf{D}_{n_r}^2$, where $\otimes$ stands for the Kronecker product. Based on $\mathbf{H}_{n_r}$ and $\mathbf{D}_{n_r}$, we can construct a unitary matrix $\bar{\mathbf{U}}_{n_r} := \mathbf{H}_{n_r} \cdot (\mathbf{I}_2 \otimes \mathbf{D}_{n_r}^{-1})$, which satisfies $\bar{\mathbf{U}}_{n_r}^H \cdot \bar{\mathbf{U}}_{n_r} = \mathbf{I}_{2Q}$ and $\bar{\mathbf{U}}_{n_r}^H \cdot \mathbf{H}_{n_r} = \mathbf{I}_2 \otimes \mathbf{D}_{n_r}$. Performing unitary combining on (34) (through $\bar{\mathbf{U}}_{n_r}^H$) collects the transmit antenna diversity at the $n_r$-th receive antenna:

(37), we can verify that: $\mathbf{H}^H \cdot \mathbf{H} = \mathbf{D}^2$. Based on $\mathbf{H}$ and $\mathbf{D}$, we can construct a tall unitary matrix $\acute{\mathbf{U}} := \mathbf{H} \cdot \mathbf{D}^{-1}$, which satisfies $\acute{\mathbf{U}}^H \cdot \acute{\mathbf{U}} = \mathbf{I}_Q$ and $\acute{\mathbf{U}}^H \cdot \mathbf{H} = \mathbf{D}$. Gathering the receive antenna diversity through multiplying (37) with $\acute{\mathbf{U}}^H$, we finally obtain:

limit the overhead, we choose the number of subcarriers $Q = 8L = 64$, leading to a transmitted block length $K = Q + L = 72$. Hence, the information symbols are parsed into blocks of $B = Q - L = 56$ symbols, and linearly precoded into blocks of size $Q = 64$. The $Q \times B$ precoding matrix $\mathbf{\Theta}$ constitutes the first $B$ columns of the DCT matrix. The precoded symbol blocks are subsequently block spread by a real orthogonal Walsh-Hadamard spreading code of length $N = 16$, along with a complex random scrambling code.
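The block sizes, the precoder and the spreading codes described above can be reproduced with a short sketch. It assumes the orthonormal DCT-II as "the DCT matrix" (the text does not specify the DCT type) and the Sylvester construction for the Walsh-Hadamard codes; the helper names are illustrative, and the complex random scrambling code is left out.

```python
import math

def dct_matrix(q: int):
    """Orthonormal DCT-II matrix of size q x q (assumed DCT type), as a list of rows."""
    rows = []
    for k in range(q):
        scale = math.sqrt(1.0 / q) if k == 0 else math.sqrt(2.0 / q)
        rows.append([scale * math.cos(math.pi * (2 * n + 1) * k / (2 * q))
                     for n in range(q)])
    return rows

def walsh_hadamard(n: int):
    """Walsh-Hadamard matrix via the Sylvester construction; n must be a power of two."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

L = 8          # overhead parameter from the simulation setup
Q = 8 * L      # number of subcarriers
K = Q + L      # transmitted block length
B = Q - L      # information symbols per block
N = 16         # spreading code length

# Precoder: first B columns of the Q x Q DCT matrix, giving a Q x B matrix.
theta = [row[:B] for row in dct_matrix(Q)]

# Check that the columns of theta are orthonormal.
cols = [[row[j] for row in theta] for j in range(B)]
max_dev = max(abs(dot(cols[i], cols[j]) - (1.0 if i == j else 0.0))
              for i in range(B) for j in range(B))

# Check that the N spreading codes are mutually orthogonal.
codes = walsh_hadamard(N)
codes_orthogonal = all(dot(codes[i], codes[j]) == (N if i == j else 0)
                       for i in range(N) for j in range(N))

print(Q, K, B, max_dev < 1e-9, codes_orthogonal)
```

The tall precoder has orthonormal columns under this assumption, so it adds $Q - B = L$ redundant symbols per block without changing the symbol energy.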

$M_2 = 4$ ($\epsilon_1 \approx \epsilon_2$). Also shown in the figure is the optimal ML performance bound. At low SNR (SNR $\leq$ 12 dB), T1 has a 1 dB advantage compared to T2. However, at high SNR (SNR $\geq$ 12 dB), the performance of T1 already starts to floor, due to ISI/ICI and the associated MUI. Hence, T2 outperforms T1 at high SNR.

Fig. 9 depicts the results for frequency-selective channels with channel order $L_c = 1$. Fixing the BER at $10^{-3}$ and focusing on the MMSE-LE, the (2, 1) setup outperforms the (1, 1) setup by 6 dB. The (2, 2) setup in turn achieves a 3.5 dB gain compared to the (2, 1) setup. Comparing the MMSE-LE with its corresponding ML detector, it incurs a 4 dB loss for the (1, 1) setup, but only a 0.4 dB loss for the (2, 2) setup.

Fig. 4. MUI-resilient STBC/MCBS-CDMA MIMO reception scheme

this requires $\mathbf{H}$ to have full column rank $Q$. Unfortunately, this condition only holds for channels that do not invoke any zero diagonal entries in $\mathbf{H}$. In other words, if the MS experiences a deep channel fade on a particular tone (corresponding to a zero diagonal entry in $\mathbf{H}$), the information symbol on that tone cannot be recovered. To guarantee symbol detectability of the $B$ symbols in $\mathbf{s}^m[i]$, regardless of the symbol constellation, we thus need to design the precoder $\mathbf{\Theta}$ such that: