Timing-Free Blind Multiuser Detection for Multicarrier DS/CDMA Systems with Multiple Antennae

The problem of blind multiuser detection for an asynchronous multicarrier DS-CDMA system employing multiple transmit and receive antennae over a Rayleigh fading channel is considered in this paper. The solutions that we develop require prior knowledge of the spreading code of the user to be decoded only, while no further information either on the user to be decoded or on the other active users is required. Several combining rules for the observables at the output of each receive antenna are proposed and assessed, and the implications of the different options are studied in depth in terms of both detection performance and computational complexity. A closed-form expression is also derived for the conditional error probability and a lower bound for the near-far resistance is provided. Results confirm that the proposed blind receivers can cope with both multiple access interference suppression and channel estimation at the price of a limited performance loss as compared to the ideal linear receivers which assume perfect channel state information.


INTRODUCTION
Multicarrier code division multiple access (MC-CDMA) has been conceived as a transmission format which retains the potentials of direct sequence CDMA (DS-CDMA), in particular its resistance to multipath effects induced by the radio channel as the communication rate grows larger and larger [1], while relaxing some very demanding requirements posed by its competitor. In particular, the efficacy of DS-CDMA on wireless channels is mainly due to the recombination of multiple rays so as to increase the average signal-to-noise ratio, but this inevitably requires tight synchronization so as to avoid heavy mismatch losses in the replica-retrieving process. MC-CDMA, instead, partitions the available bandwidth into many subbands, each no larger than the channel coherence bandwidth, and allocates independently modulated digital signals to each subband. It thereby achieves two advantages: (a) the propagation channel in each subband is frequency-flat, and (b) the symbol duration for the data signals occupying the frequency subbands grows linearly with the number of subbands, so that the need for fast electronics and high-performance synchronization schemes is less stringent. The combination of the MC concept with the CDMA technology has led to three main access schemes, namely multitone CDMA [2,3], MC-CDMA [4,5,6], and MC DS-CDMA [7,8,9,10].
On the other hand, both MC-CDMA and DS-CDMA are expected to support, in future wireless networks, extremely high data rates, which may be in contradiction with their inherent spectral inefficiency. A viable means to cope with this problem is to resort to multiple transmit and receive antennae. Indeed, recent results from information theory have shown that the capacity of a multiantenna wireless communication system in a rich scattering environment grows approximately linearly with the minimum of the number of transmit and receive antennae [11]. Roughly speaking, multiple transmit antennae generate a spatial diversity which can be successfully exploited at the receiver end to improve performance, especially if space-time coding techniques are employed at the transmitter [12]. Motivated by these considerations, many studies have been recently published for either single-user or multiuser multiantenna systems [13,14].
All of these studies, though, assume either perfect channel state information (CSI) or error-free estimation thereof. The problem of evaluating the cost of such information has been only recently considered [15], and the main results are as follows: (a) the training and the data transmission phases should be carefully designed in order to ensure reliable transmission in a multiantenna system on a wireless channel; (b) in the large signal-to-noise ratio regime, the length of the training phase should be in the order of the number of transmit antennae; (c) in the region of low signal-to-noise ratios, about half the transmission time should be devoted to training, and, moreover, the capacity of trained systems is far from the optimal one. It is also worth pointing out that in a CDMA multiaccess network, the signal-to-interference-plus-noise ratio is expected to be quite low, at least as the network load increases, whereby the task of reducing, if not nullifying, the training phase becomes more and more pressing.
Motivated by these results, the present paper deals with the problem of blind multiantenna systems employing an MC DS-CDMA modulation format (see footnote 1). Since the prior uncertainty as to the CSI results in a complete lack of knowledge of the spatial signatures of both the user of interest and of the other users, while knowledge of the spreading code of all of the active users can be reasonably assumed only at the "base station" of an isolated cell, we consider the more general scenario where the receiver cannot avail itself of any prior information beyond the spreading code of the user of interest, and is thus faced with asynchronous cochannel interference (whether from the same cell or from nearby cells); differential encoding-decoding is therefore assumed, as a result of the lack of a phase reference. For the sake of simplicity, we also consider uncoded transmission, even though the results can be extended to account for space-time block coding. The main contributions of this paper can be summarized as follows.
(1) We develop a signal model for an MC DS-CDMA system operating over a fading dispersive channel and employing multiple transmit and receive antennae that resembles the signal model developed in [16,17,18] with reference to a single-antenna DS-CDMA system operating in the same conditions.
(2) Based on the above analogy, we extend the subspace techniques developed in [16,19] to the multiantenna MC DS-CDMA system and, moreover, we propose several combining schemes to integrate the statistics observed on each receive antenna branch. It should be noted that the resulting receivers are blind and timing-free, that is, they do not require any information beyond the spreading code of the user to be detected. Interestingly, not even the propagation delay and the initial transmitter timing offset for the user of interest are required.
(3) As a by-product of the previous derivations, we also introduce a subspace-based technique which enables blind channel estimation up to a complex scaling factor.
(4) We also provide a thorough performance analysis of the proposed receivers; in particular, we derive closed-form formulas for the conditional error probability and for the near-far resistance, given the channel impulse response realization. It is worth noticing that the methodology outlined here is quite general and can be used to express the performance of any linear receiver in differentially encoded systems.

Footnote 1: The results presented here can be easily extended to the multitone CDMA and to the MC-CDMA techniques as well.
The rest of the paper is organized as follows. Section 2 outlines the system model, while Section 3 is devoted to the development of the detection structures. In Section 4, the statistical analysis of the receiver is provided, while Section 5 is devoted to the discussion of the numerical results. Finally, concluding remarks are given in Section 6.

Notation
In the following, $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$ denote conjugate, transpose, and conjugate transpose, respectively; $\mathcal{M}_{m\times n}(\mathbb{C})$ is the set of all the $m \times n$-dimensional matrices with complex-valued entries. $E[\cdot]$ denotes statistical expectation; $\Re(\cdot)$ and $\Im(\cdot)$ denote real part and coefficient of the imaginary part, respectively; column vectors and matrices are indicated through boldface lowercase and uppercase letters, respectively. The term $\mathrm{Im}(A)$ is the image of $A$, that is, its column span, while $\mathrm{Ker}(A)$ is the null space of $A$, that is, the orthogonal complement of $\mathrm{Im}(A)$; $\dim(S)$ is the dimensionality of the subspace $S$; the symbols $\langle\cdot,\cdot\rangle$, $\otimes$, and $\odot$ denote the canonical scalar product, the Kronecker product, and the Schur (i.e., component-wise) matrix product, respectively; $I_n$ denotes the identity matrix of order $n$; $O_{m,n}$ and $0_m$ are the $m \times n$-dimensional matrix and $m$-dimensional vector with null entries, respectively, and $\mathrm{diag}(a)$ is a diagonal matrix containing the elements of the vector $a$ on its diagonal; $A^+$ is the Moore-Penrose generalized inverse of $A$. $\mathrm{supp}\{f\}$ is the support of the function $f$, that is, the set of its arguments for which $f$ is not zero, and $u_T(\tau)$ is the unit-height rectangular waveform of support $(0, T)$. $\mathcal{N}(\mu, C)$ denotes the distribution of a Gaussian vector with mean $\mu$ and covariance matrix $C$, while $Q(\cdot)$ is the area under the leading tail of the standard Gaussian pdf; finally, $Q_1(\cdot,\cdot)$ and $I_0(\cdot)$ are the Marcum function and the modified Bessel function of the first kind and order zero, respectively.

SYSTEM MODEL
The general scheme of an MC communication system equipped with multiple transmit and receive antennae is shown in Figure 1. A block of $n_t$ symbols is converted from serial to parallel and each symbol feeds a (spatially) separate antenna. Thus, the $n_t$ symbols are transmitted in parallel, achieving an $n_t$-fold increase in the data rate, and received on $n_r$ spatially separated receive antennae, providing $n_r$th-order receive diversity to combat fading.
The complex envelope of the signal received on the $r$th antenna can be formally written as

where
(1) $K$ is the number of active users;
(2) $P$ is the length of the transmitted frame;
(3) $A_k$ is the amplitude of the signal transmitted by the $k$th user;
(4) $b^k_t(l)$ is the symbol transmitted by the $t$th antenna of the $k$th user at the $l$th bit interval;
(5) $\beta^k_t(\tau)$ is the signature assigned to the $t$th transmitter of the $k$th user;
(6) $T_b$ is the bit duration;
(7) $\tau_k$ is the $k$th user's overall delay, that is, the sum of the $k$th user's transmission delay and of the propagation time through the channel;
(8) $h^k_{t,r}(\tau)$ is the channel impulse response from the $t$th transmit antenna of the $k$th user to the $r$th receive antenna;
(9) $w_r(\tau)$ is the additive white Gaussian noise on the $r$th receive antenna, independent for different antennae, with power spectral density $2N_0$.
On the other hand, the signatures in (1) are

where
(1) $N$ is the number of subcarriers provided to each user;
(2) $M$ is the spreading gain on each subcarrier (hence $PG = MN$ is the overall processing gain).

For given $N$, the processing gain on each subcarrier is fixed ($M = PG/N$), and the channel frequency response can be approximated as follows:

where $f_n = (n - (N-1)/2)\Delta f$. We assume a slowly fading channel, namely, one whose coherence time exceeds the packet duration $PT_b$. As to $H^k_{t,r,n}$, it is modelled as a sequence of complex standard Gaussian random variables, independent for all $n$; additionally, due to the spatial separation, they are also independent for different $t$, $r$, and $k$.
At the receiver side, the signal observed on each antenna is converted to discrete time. According to the scheme in Figure 2, there are $N$ branches (i.e., as many as the number of carriers) in the analog-to-digital converter (ADC), each one consisting of a mixer and of a low-pass filter $\psi_{rx}(\tau)$, whose output is sampled every $T_c$ seconds. Ideally, the filter $\psi_{rx}(\tau)$ should be strictly bandlimited, with bandwidth not smaller than $B_{sc}$ and not larger than $\Delta f$; in practice, it is realized through a waveform with finite support $[0, \Delta_{rx}T_c]$ and bandwidth extending between $B_{sc}$ and $\Delta f$. It is also required to have a Nyquist autocorrelation, that is, $r_{\psi_{rx}}(jT_c) = \int_{\mathbb{R}} \psi_{rx}(\tau)\psi_{rx}(\tau - jT_c)\,d\tau = \delta(j)$: this implies that output noise samples are uncorrelated. At the $n$th branch, the output of the low-pass filter at the $r$th antenna is written as follows:

where

$$s^k_{t,r,n}(\tau) = \sum_{m=0}^{M-1} c^k_t(nM+m)\, g^k_{t,r,n}(\tau - mT_c).$$

Here, $\varphi_k(\tau) = A_k \psi_{tx}(\tau) * \psi_{rx}(\tau)$, and use has been made of the fact that the channel is flat on each subcarrier. It is worthwhile noticing that (i) in (4), the only substream surviving filtering is the $n$th one as, due to the bandlimitedness of the transmitted chip waveform, there is no intercarrier interference; (ii) all of the unknown parameters ($H^k_{t,r,n}$ and $\tau_k$) due to propagation through the channels and to the users' transmission delays have been absorbed into the unknown functions $g^k_{t,r,n}(\tau)$.
Notice that the prior uncertainty as to the delay parameter $\tau_k$ derives from the initial timing offset of the $k$th transmitter and from the propagation delay. However, while the latter contribution could be easily absorbed in the channel impulse response, the former should be explicitly accounted for in the context of an asynchronous network: this fact, coupled with the use of strictly bandlimited chip waveforms, poses some limitations on the maximum number of users that will be discussed in greater detail later on in the paper. Upon sampling at chip rate, the signal $r_{r,n}(\tau)$ is converted to the sequence $r_{r,n}(jT_c)$.

As $\varphi_k(\tau)$ has a compact support in $[0, \Delta T_c]$, with $\Delta = \Delta_{tx} + \Delta_{rx}$, according to (5), we have

where the inclusions stem from the assumption that $\tau_k + \Delta - 2T_c < T_b$. Thus, assuming that we are interested in decoding the information symbols transmitted by the 0th antenna of the 0th user, as $s^k_{t,r,n}(jT_c - iT_b) \neq 0$ only for $j = iM+1, \ldots, (i+2)M$, $b^0_0(i)$ can be detected through the windowed observables $r_{r,n}(jT_c)$, for $j = iM+1, \ldots, (i+2)M$, that can be arranged in the vector

Stacking now the discrete-time version of $g^k_{t,r,n}(\tau)$ into the vector

and defining the following matrices:

where $C^k_{t,n,0H}$ and $C^k_{t,n,0L} \in \mathcal{M}_{M\times(M+1)}(\mathbb{C})$ contain the $M$ upper and $M$ lower rows of the matrix $C^k_{t,n,0}$, respectively, the discrete-time version

Thus, the discrete-time observable $r_{r,n}(i)$ in (8) can be recast as

where

Stacking up the vectors corresponding to the $N$ subcarriers, we obtain the following discrete observable at the $r$th receive antenna:

where we have let

Notice that in (14), $s^k_{t,r,0}$ is the complete signature transmitted by the $t$th antenna of the $k$th user and received, after propagation, at the $r$th antenna (namely, it is a spatial signature related to the real one through the channel impulse response); $s^k_{t,r,-1}$ and $s^k_{t,r,+1}$ are parts of the signature related to the previous and subsequent transmitted symbols; the vectors $g^k_{t,r}$ contain both the unknown channel coefficients (through the vectors $h^k_{t,r} \sim \mathcal{N}(0_N, I_N)$) and the users' timings (through the vectors $\varphi_k$); finally, $w_r(i) \sim \mathcal{N}(0_{2MN}, 2N_0 I_{2MN})$ accounts for the thermal noise.
The above model represents the extension to the MC DS-CDMA case with multiple antennae of a well-known representation derived for single-antenna DS-CDMA systems operating over fading dispersive channels [16,17,18,19]. In this scenario, in order to allow possible joint processing of the observables at all of the receive antennae, it is useful to define the vector $r(i) = (r_0(i) \cdots r_{n_r-1}(i))^T$, which, upon defining quantities

can be also written as follows:

In (17), $s^0_{0,0}$ is the useful signature, $z(i)$ represents the self-interference, multiuser interference (MAI), and intersymbol interference (ISI) contribution, and $w(i) \sim \mathcal{N}(0_{2MNn_r}, 2N_0 I_{2MNn_r})$ is the thermal noise. Notice that the subscript "t" points out that each transmit antenna of a given user is assigned a different spreading sequence, a condition that will be shown to be necessary in blind uncoded systems. For future reference, notice that the covariance matrix of $r(i)$ is equal to

DETECTOR DESIGN
The detectors that are considered in this paper are linear, and thus uniquely specified by a suitable complex-valued vector $m$. As anticipated, differential coding/decoding is to be adopted to cope with the absence of a phase reference, whereby the desired information is contained in the relation between consecutive transmitted symbols rather than in the symbols themselves. At the receiver side, the observables $r_0(i), \ldots, r_{n_r-1}(i)$ can be either processed separately and then combined or processed jointly through the vector in (17); we refer to the former case as noncooperative detection and to the latter case as cooperative detection.

Noncooperative detection
If we adopt a noncooperative scheme, the signals at the output of the $n_r$ antennae are processed through as many detectors, whose outputs are expressed by $\vartheta_r(i) = \langle r_r(i), m_r\rangle$, $r = 0, \ldots, n_r - 1$. The vector $\vartheta(i) = (\vartheta_0(i) \cdots \vartheta_{n_r-1}(i))^T$ is then forwarded to a combining block, which makes the decisions $\hat{d}^0_0(i) = f(\vartheta(i), \vartheta(i-1))$. We consider three different scenarios.
(1) Soft integration. In this case, the decision rule assumes the form

that is, the decision takes place after the integration of the soft differential statistics $\vartheta_r(i)\vartheta^*_r(i-1)$.
(2) Hard integration (with a randomized offset):

that is, the combination takes place after one-bit quantization of the soft differential statistics. Observe that, for $n_r$ odd, the randomized offset has no effect and this decision amounts to a majority rule, which is optimal for hard-quantized statistics; on the other hand, for $n_r$ even, the possibility that $f(\vartheta(i), \vartheta(i-1)) = 0$ is warded off through the secondary threshold $u$ (see footnote 3).
(3) Maximal ratio combiner (MRC). According to (14), the vector $\vartheta(i)$ is expressed as follows:

A possible detection strategy consists of weighting the $n_r$ unquantized statistics of the vector $\vartheta(i)$ with the elements of the gain vector $a$, thus realizing an MRC; afterwards, the uncertainty on the phase can be removed through differential detection. The detection rule is thus
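The three combining rules can be illustrated with a minimal Python sketch. Since the decision rules themselves are not reproduced above, the forms used here are assumptions consistent with the verbal description: a sign decision on the real part of the differential statistics, an offset `u` of magnitude below one that only breaks ties, and MRC weights applied as `conj(a)`. All function names are hypothetical.

```python
import random


def sign(x):
    """Three-valued sign: +1, -1, or 0 for an exact tie."""
    return (x > 0) - (x < 0)


def diff_stat(cur, prev):
    """Soft differential statistic Re{cur * conj(prev)} for one branch."""
    return (cur * prev.conjugate()).real


def soft_integration(theta_cur, theta_prev):
    """Sum the soft differential statistics over the branches, then decide."""
    return sign(sum(diff_stat(c, p) for c, p in zip(theta_cur, theta_prev)))


def hard_integration(theta_cur, theta_prev, rng=random):
    """One-bit quantize each branch, sum the votes, break ties at random."""
    votes = sum(sign(diff_stat(c, p)) for c, p in zip(theta_cur, theta_prev))
    u = rng.choice((-0.5, 0.5))  # assumed offset: |u| < 1, so it only breaks ties
    return sign(votes + u)


def mrc(theta_cur, theta_prev, a):
    """Weight the branches by the gain vector a, then detect differentially."""
    z_cur = sum(g.conjugate() * c for g, c in zip(a, theta_cur))
    z_prev = sum(g.conjugate() * p for g, p in zip(a, theta_prev))
    return sign(diff_stat(z_cur, z_prev))


# Three branches; the third one disagrees with the other two.
theta_prev = [1 + 1j, 2 - 1j, -1 + 0.5j]
theta_cur = [1 + 1j, 2 - 1j, 1 - 0.5j]
print(soft_integration(theta_cur, theta_prev))
print(hard_integration(theta_cur, theta_prev))
print(mrc(theta_cur, theta_prev, a=[1, 1, 1]))
```

With an odd number of branches the vote sum is never zero, so the offset is irrelevant and `hard_integration` reduces to a majority rule, as noted in item (2).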

Cooperative detection
In this scheme, the observables are first stacked in a single vector and then jointly processed, obtaining $\vartheta(i) = \langle r(i), m\rangle$; a decision is finally made through

Obviously, the cooperative scheme is expected to achieve, at the price of some complexity increase, a substantial performance improvement with respect to the noncooperative detection schemes. Notice also that (17) reduces to (14) for $n_r = 1$; as a consequence, the synthesis of the receiver can be carried out starting from the observables in (17), and the results can then be specialized to the case $n_r = 1$. There are, of course, a number of different criteria to design $m$. The first step is to generalize the subspace-based detector, introduced in [16,21], to the new scenario and then move on to the newly proposed detector family that is referred to as "two-stage" receivers in what follows.

Subspace-based receiver
The correlation matrix $R_{rr}$ of the received signal can be decomposed as

where the diagonal matrix paired with $U_s$ contains the $3Kn_t$ largest eigenvalues of $R_{rr}$ in descending order and $U_s$ contains the corresponding orthonormal eigenvectors; $\mathrm{Im}(U_s)$ and $\mathrm{Im}(U_n)$ are the signal subspace and the noise subspace, respectively. Based on the above decomposition, the orthogonality between the noise subspace and the useful signal $s^0_{0,0}$ can be exploited to obtain an estimate, $\hat{g}^0_0$, say, of the vector $g^0_0$. In particular, under the condition (see footnote 4)

$$\dim\left\{\mathrm{Im}(R_{qq}) \cap \mathrm{Im}\left(S^0_{0,0} S^{0H}_{0,0}\right)\right\} = 1, \qquad (25)$$

$\hat{g}^0_0$ can be obtained as the unique, nontrivial solution of the equation

Since in practice the covariance matrix $R_{rr}$ is not known, it has to be replaced by its sample estimate $\hat{R}_{rr} = (1/Q)\sum_{i=0}^{Q-1} r(i) r^H(i)$, whose spectral decomposition is

Accordingly, $\hat{g}^0_0$ solves the problem

that is, it is the eigenvector corresponding to the smallest eigenvalue of the matrix $S^{0H}_{0,0} \hat{U}_n \hat{U}^H_n S^0_{0,0}$. The vector $\hat{g}^0_0$ is then used to obtain the classical minimum mean square error (MMSE) and zero-forcing (ZF) receivers, that is,

with

Footnote 3: For further details on the optimality of randomized tests, see [20].
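The subspace-based estimate can be illustrated with a minimal NumPy sketch: form the sample covariance, take the eigenvectors of its smallest eigenvalues as the noise subspace, and pick the eigenvector of the smallest eigenvalue of $S^H U_n U_n^H S$. The function names, the `signal_rank` argument (which plays the role of $3Kn_t$), and the toy dimensions are assumptions made for illustration only; the demo uses an exact covariance rather than a sample one.

```python
import numpy as np


def sample_covariance(X):
    """R_hat = (1/Q) * sum_i r(i) r(i)^H, with the r(i) stored as columns of X."""
    return X @ X.conj().T / X.shape[1]


def subspace_estimate(R_hat, S, signal_rank):
    """Return a unit-norm estimate of g, defined up to a complex scale factor."""
    _, eigvecs = np.linalg.eigh(R_hat)                 # eigenvalues ascending
    U_n = eigvecs[:, : R_hat.shape[0] - signal_rank]   # noise-subspace basis
    A = S.conj().T @ U_n @ U_n.conj().T @ S
    _, V = np.linalg.eigh(A)
    return V[:, 0]                                     # smallest eigenvalue


def demo(seed=0):
    """Toy check with an exact covariance: 1 useful signature, 3 interferers."""
    rng = np.random.default_rng(seed)

    def cplx(*shape):
        return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

    n, p = 12, 4
    S, g, Z = cplx(n, p), cplx(p), cplx(n, 3)
    s = S @ g                                          # useful signature s = S g
    R = np.outer(s, s.conj()) + Z @ Z.conj().T + 0.1 * np.eye(n)
    g_hat = subspace_estimate(R, S, signal_rank=4)
    return abs(np.vdot(g_hat, g)) / np.linalg.norm(g)  # 1.0 means parallel


print(demo())
```

The returned alignment is close to one only when the useful signature is the sole direction shared by the signal subspace and the column span of `S`, which is the role played by condition (25).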

Two-stage receiver
The subspace-based receivers exhibit a noticeable performance degradation as the number of users grows large, since the dimensionality of the noise subspace decreases and the estimate of the vector $g^0_0$ becomes worse and worse. A possible means to cope with these overloaded scenarios is to resort to the "two-stage" receivers, introduced in [18,19] with reference to single-antenna DS-CDMA networks. As a consequence, the mathematical proofs of the results in Sections 3.2.1 and 3.2.3 will be omitted so as to avoid any overlap with available literature. Two-stage detectors owe their name to a functional split of their operation into a suppression block, represented by the matrix $D$ of Figure 3, and a BER optimization block, represented by the vector $e$ of the same figure. Obviously, the two stages may collapse into the single vector $m = De$.

Footnote 4: Remember that $\mathrm{Im}(R_{qq}) = \mathrm{Im}(U_s) = \mathrm{Ker}(U^H_n)$ and $\mathrm{Im}(S^0_{0,0} S^{0H}_{0,0}) = \mathrm{Im}(S^0_{0,0})$.

Synthesis of the interference cancellation stage D
The useful signature $s^0_{0,0}$ lies in $\mathrm{Im}(S^0_{0,0})$, which, in turn, is a vector subspace of $\mathbb{C}^{(M+1)Nn_r}$. The first stage is thus a noninvertible transformation of the observables, that is,

where $D \in \mathcal{M}_{2MNn_r\times(M+1)Nn_r}(\mathbb{C})$ solves one of the following two constrained minimization problems:

The former cost function is the classical one for minimum mean output energy (MOE), while the latter involves the minimization of the energy of the noise-free observables; in both cases, the constraint ensures that the signal of interest always survives after the noninvertible transformation. Under the condition (25), the solution to the above problems can be shown to be written as follows:

where $\alpha \in \mathbb{C}^{(M+1)Nn_r}$ is an arbitrary vector with strictly positive entries and $R$ can be either $R_{rr}$ or $R_{qq}$. If $R = R_{rr}$, $D$ is the solution to the former problem in (32) and subsumes, as the special case of nonfading channel with known timing, the minimum MOE solution equivalent to the MMSE receiver; accordingly, we refer to this solution as an MMSE-like receiver. Otherwise, if $R = R_{qq}$, $D$ is the solution to the latter problem in (32) and subsumes in the same way the linear ZF receiver; we thus refer to this solution as a ZF-like receiver.
Since scalar multiplicative constants have no influence on the decision rule (see [19]), the matrix $D$ can be also expressed as follows:

Before proceeding in the system derivation, it is worth commenting on condition (25), which was invoked to support solution (33). Indeed, the constraints in (32) just ensure that the output useful signature is nonzero with probability one, but they do not offer any guarantee that all of the interference be blocked before further processing. On the other hand, defining

that is, the matrix containing all the $3Kn_t$ signatures $s^k_{t,l}$ and $S^0_{0,0}$, and noticing that

it is seen that a necessary condition for all the interferers to be nullified and the useful signal to survive is that $s^k_{t,l}$ and the columns of $S^0_{0,0}$ be linearly independent with respect to $X$ for all $(k, t, l) \neq (0, 0, 0)$ (see [19] for more details). Ensuring that $s^0_{0,0}$ is the only signature linearly dependent on the columns of $S^0_{0,0}$ with respect to $X$ amounts to forcing $s^0_{0,0} = S^0_{0,0} g^0_0$ to be the only direction which belongs both to $\mathrm{Im}(S^0_{0,0} S^{0H}_{0,0})$ and to $\mathrm{Im}(R_{qq})$, that is, to forcing (25) to hold true. This condition will be, in the following, referred to as the identifiability condition, a term we borrow from [17]. Notice however that, while in the subspace-based detectors such a condition is a necessary one in order to ensure the channel identification (and indeed its violation would result in a useless receiver), in our approach (25) is not a precondition, even though its violation usually results in a performance degradation and in the loss of the near-far resistance properties.
It is also worth pointing out here that, in the considered scenario, (25) cannot be relaxed through signal-space oversampling, as suggested in [16], and implemented in [19], where rectangular chip waveforms were adopted. The MC modulation format, instead, requires avoiding the intercarrier interference, which, for asynchronous systems, can be accomplished through the use of strictly bandlimited chip waveforms: obviously, no further sampling beyond the Nyquist rate may be advantageous in this situation.

Blind implementation of D
In order to implement the MMSE-like receiver in a blind fashion, the covariance matrix $R_{rr}$ is to be replaced in practice by its sample estimate $\hat{R}_{rr}$; the blocking matrix is then

The implementation of the ZF-like receiver requires, instead, more attention since an estimate of $R_{qq} + S^0_{0,0} S^{0H}_{0,0}$ is needed. To this end, first note that, based on (25), $\dim \mathrm{Im}(R_{qq} + S^0_{0,0} S^{0H}_{0,0}) = 3Kn_t + (M+1)Nn_r - 1$, whereby, upon eigendecomposition, we obtain

where $U = [U_1\ U_2]$, $\Lambda = \mathrm{diag}(\Lambda_1, \Lambda_2)$, $\Lambda_1 = \mathrm{diag}(\lambda_1, \ldots, \lambda_{3Kn_t+(M+1)Nn_r-1})$ contains the $3Kn_t + (M+1)Nn_r - 1$ largest eigenvalues and $U_1$ the corresponding orthonormal eigenvectors. An estimate of $R_{qq} + S^0_{0,0} S^{0H}_{0,0}$ is thus

and the blind implementation of the ZF-like filter is

Synthesis of the second stage e
Assuming that the blocking matrix $D$ has suppressed all of the interference (the term $D^H z(i)$ is very small if the MMSE-like solution is adopted, while it is exactly zero for the ZF-like one), the observables at the output of the second stage can be written as

The vector $e$ can be now chosen so as to minimize the BER, that is, it is the cascade of a whitening filter and of a filter matched to the warped useful signal. Upon considering the "economy size" singular value decomposition $D = U_D \Lambda V^H$, the whitening filter is $V\Lambda^{-1}$, with $\Lambda \in \mathcal{M}_{(M+1)Nn_r\times(M+1)Nn_r}(\mathbb{C})$ a diagonal matrix and $V \in \mathcal{M}_{(M+1)Nn_r\times(M+1)Nn_r}(\mathbb{C})$ a unitary square matrix. Accordingly, the whitened observables are given by

and the matched filter is $U^H_D S^0_{0,0} g^0_0$. The second stage is then

and the expression of the complete receiver is given by

Blind implementation of e
Since in practice the vector $g^0_0$ is not known, a further processing step is needed to obtain an estimate of the second stage (45). To this end, notice that the correlation matrix of $y_w(i)$ can be written as

that is, it consists of the sum of a full-rank matrix and of a rank-one matrix, the latter admitting $U^H_D S^0_{0,0} g^0_0$ as its unique eigenvector with nonzero eigenvalue. Consequently, the eigenvector $u_{\max}$ corresponding to the largest eigenvalue of $R_{y_w y_w}$ is parallel to $U^H_D S^0_{0,0} g^0_0$, and the receiver's second stage is $e = V\Lambda^{-1}u_{\max}$. Thus the receiver is given by

In practice, the vector $u_{\max}$ is estimated through an eigendecomposition of the sample covariance matrix $\hat{R}_{y_w y_w}$ of the whitened observables $y_w(i)$, with
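The blind second stage can be sketched in NumPy as follows, taking a blocking matrix `D` as given. The sketch relies on one step worked out here rather than quoted from the text: with $y(i) = D^H r(i)$ and $D = U_D \Lambda V^H$, applying the whitening filter $V\Lambda^{-1}$ yields $y_w(i) = U_D^H r(i)$, so the covariance of the whitened observables is $U_D^H R_{rr} U_D$. The function name and the toy dimensions are hypothetical.

```python
import numpy as np


def blind_two_stage(D, R_hat):
    """Return (e, m): the blind second stage e and the overall receiver m = D e."""
    U_D, sv, Vh = np.linalg.svd(D, full_matrices=False)  # D = U_D diag(sv) Vh
    R_yw = U_D.conj().T @ R_hat @ U_D    # covariance of y_w(i) = U_D^H r(i)
    _, Q = np.linalg.eigh(R_yw)          # eigenvalues in ascending order
    u_max = Q[:, -1]                     # principal eigenvector
    e = Vh.conj().T @ (u_max / sv)       # e = V Lambda^{-1} u_max
    return e, D @ e


def demo(seed=0):
    """Toy check: with interference assumed blocked, m aligns with P_D s."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((10, 4)) + 1j * rng.standard_normal((10, 4))
    s = rng.standard_normal(10) + 1j * rng.standard_normal(10)
    R = np.outer(s, s.conj()) + 0.1 * np.eye(10)   # useful signal plus white noise
    _, m = blind_two_stage(D, R)
    target = D @ np.linalg.pinv(D) @ s             # projection of s onto Im(D)
    return abs(np.vdot(m, target)) / (np.linalg.norm(m) * np.linalg.norm(target))


print(demo())
```

In this toy setting the overall receiver reduces to $m = U_D u_{\max}$, that is, the projection of the signature onto the column span of `D`, up to a scale factor that does not affect a differential decision.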

Channel estimation
As a by-product of the previous derivations, an estimate (up to a complex scalar factor) of the discrete-time channel impulse response $g^0_0$ can be obtained, based on the consideration that $u_{\max}$ is parallel to $U^H_D S^0_{0,0} g^0_0$. Accordingly, the estimate $\hat{g}^0_0$ of $g^0_0$ is

This estimate (and, in the same way, the subspace-based one) can be further improved based on (16), which shows that $g^0_0 = h^0_0 \otimes \varphi_0$ is a structured vector. Thus we can look for the nearest vector to the estimate having this structure, that is, we can consider the following optimization problem:

Unfortunately, the cost function in (51) can be shown to have multiple minima, and no closed-form solution can be devised to compute its global minimum. A suitable strategy is to minimize this function alternately with respect to $h$ and $\varphi$, which yields the following iterative rule:

where we have denoted by $\hat{g}^0_0(n)$ the estimate of $g^0_0$ at the $n$th iteration. Note that convergence of this procedure to the global minimum is not guaranteed; however, experimental evidence has shown that after a few iterations (i.e., 3-4), a fixed point is reached.

Gain vector estimation
If a noncooperative scheme with maximal ratio combining is adopted, after we have realized the n r receivers, one for each antenna, a further processing is needed in order to get an estimate of the gain vector a.
Assuming again complete suppression of all of the interference, (21) becomes

A simple blind method for estimating $a$ (see [21]) can be developed noticing that the correlation matrix of $\vartheta(i)$ is given by

$$R_{\vartheta\vartheta} = a a^H + 2N_0 I_{n_r}. \qquad (54)$$

Thus, the eigenvector corresponding to the largest eigenvalue of $R_{\vartheta\vartheta}$ is parallel to $a$ and so, except for a complex scaling factor, it is an estimate of the gain vector $a$ (note that the phase ambiguity introduced by this complex constant is removed by the differential detection rule). Finally, note that this estimation technique can be easily made adaptive using the tracking algorithm suggested in [21].
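A minimal NumPy sketch of this estimator is given below as a batch computation; the adaptive tracking variant of [21] is not reproduced. The function name, the synthetic data model (unit-modulus symbols plus white noise), and all numerical values are assumptions for illustration only.

```python
import numpy as np


def estimate_gain_vector(theta):
    """Principal eigenvector of the sample correlation matrix of theta.

    theta is an n_r x Q array whose columns are the detector outputs theta(i).
    The result has unit norm and is parallel to a only up to a complex factor.
    """
    R_hat = theta @ theta.conj().T / theta.shape[1]
    _, V = np.linalg.eigh(R_hat)          # eigenvalues in ascending order
    return V[:, -1]


def demo(seed=1, n_r=4, Q=2000, sigma=0.2):
    """Synthetic check: theta(i) = a * b(i) + noise, with b(i) = +/-1."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)
    b = rng.choice([-1.0, 1.0], size=Q)
    noise = sigma * (rng.standard_normal((n_r, Q)) + 1j * rng.standard_normal((n_r, Q)))
    theta = np.outer(a, b) + noise
    a_hat = estimate_gain_vector(theta)
    return abs(np.vdot(a_hat, a)) / np.linalg.norm(a)   # 1.0 means parallel


print(demo())
```

The alignment measure ignores the complex scale factor on purpose, since that ambiguity is removed by the differential detection rule.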

Maximum number of users and system complexity
The identifiability condition sets a limit on the maximum rank of $R_{qq}$ and, consequently, on the maximum number of users, $K_{\max}$ say, that the system can accommodate reliably.
Since, based on (39), we have

Recalling that each user is assigned $n_t$ spreading sequences, the maximum number of active users is

for noncooperative and cooperative detection, respectively. Note that the cooperative detection scheme, jointly processing the signals received at the $n_r$ antennae, achieves better BER performance and, at the same time, can accommodate a larger number of users than the noncooperative scheme, as expected, at the price of some complexity increase. In fact, due to the matrix inversion in the first stage and to the singular value decomposition in the second one, the receiver complexity is cubic in the dimension of $R_{rr}$, that is, the complexity is $O((MNn_r)^3)$. Noncooperative receivers, instead, rely on $n_r$ parallel operations conducted on matrices of order $2MN$ and entail a complexity $O(n_r(MN)^3)$. Note, however, that, coupling a recursive least squares (RLS) procedure with subspace tracking techniques as in [18,19], the overall complexity can be limited to be quadratic, that is, $O((n_r MN)^2)$ and $O(n_r(MN)^2)$ for cooperative and noncooperative detection, respectively. Moreover, since $n_r$ is not very large for real applications, the complexity increase involved by cooperative over noncooperative detection is often negligible.
A final key remark is now in order. Conditions (57) represent the extension to the case of MC DS-CDMA employing multiple transmit and receive antennae of the condition reported in [19] for single-antenna DS-CDMA systems employing rectangular chip pulses. As already anticipated, such an identifiability condition cannot be relaxed through signal-space oversampling, once bandlimited waveforms are employed. Indeed, adopting rectangular pulses corresponds to enlarging the bandwidth beyond 1/T c and to using infinite effective bandwidth which in turn corresponds to a theoretically infinite precision in delay estimation (see [20]). Thus, in the case of asynchronous systems with unknown delays, the DS-CDMA multiplex actually spans, in the ensemble of the delays realizations, an infinite-dimensional space whose principal directions can be in principle resolved by progressively enlarging the front-end bandwidth (i.e., "oversampling" by a factor L, which corresponds to chip-matched filtering through a unit-height pulse of duration T c /L and sampling at rate L/T c ). In the considered strictly bandlimited scenario, instead, the signal span is strictly finite, whereby there appear to be just two alternatives in order to increase the maximum user number: the former is obviously an increase of the number of receive antennae, while the latter, that we just mention here, is to enlarge the processing window.
Before moving on to the statistical analysis of the proposed detection schemes, it is worth commenting on the two-stage receiver family introduced in this section. First, notice that the functional split between the interference cancellation and the BER optimization stages results in a greater flexibility at a design level; indeed, the blocking matrix $D$ may be designed according to several different criteria, mainly depending on the intensity of the interfering users, without affecting the structure of the BER optimization stage. Additionally, even though we do not dwell on this issue here, it is natural to investigate the feasibility of adaptive (on a bit-by-bit scale) blind systems. Notice that, in our scenario, several different time scales can be envisaged for channel variations: the abrupt changes in the MAI, wherein new users may enter the network and former users may abandon it, short-term variations in the channel tap-weights, and long-term variations in the temporal and spatial signatures of the active users. Notice also that the MAI structure affects only the interference-blocking stage of the proposed receiver, and would in principle require a self-recovering updating of the blocking matrix $D$, which is indeed the focus of current research. As for the long-term variations, it is reasonable to assume that their time scale is large enough so as to allow batch processing with offline estimation of the relevant statistical measures. An open problem is, instead, the handling of short-term variations, which have an impact on both stages of the receiver. At an intuitive level, one might expect that the interference-blocking matrix design criterion should be modified in order to ensure a nonzero output signal in the ensemble of the channel tap-weight realizations, which expectedly results in a set of constraints dictated by the covariance matrix of the channel taps.
Additionally, constrained-complexity tracking procedures should be introduced in order to adapt the BER optimization stage in such a time-varying scenario. All of the above issues form the objects of current investigations.

STATISTICAL ANALYSIS
In this section, we develop a statistical performance analysis of the proposed receiver and, in particular, we derive analytical expressions for the conditional error probability and near-far resistance, given the timing and the channel realizations of all of the users, that is, conditioned on the vector $\mathbf{g}$ that collects these quantities.

Probability of error
First of all, recall that the decision rule is written as where Assuming that the MAI plus ISI contribution $\mathbf{m}^H \mathbf{z}(i)$ at the output of the filter is approximately Gaussian with zero mean (see [22]), the term $\zeta(i)$ in (60) can be modeled as a complex Gaussian random variate with zero mean. Thus, given $\mathbf{g}$ and $b_0^0(i)$, the random variable $x_i$ is itself Gaussian and where Notice also that $\mathbf{R}_{\mathbf{z}(i)\mathbf{z}(i)}$ and $\mathbf{R}_{\mathbf{z}(i)\mathbf{z}(i-1)}$ are equal to the null matrix if the ZF-like receiver is adopted. Since the probability of error can be written as and since $\frac{1}{2} x_i x_{i-1}^* + \frac{1}{2} x_i^* x_{i-1}$ is a quadratic form in correlated complex-valued Gaussian random variables, upon defining and using the results in [23], we obtain $P_{e|\mathbf{g}} = Q_1(a, b) - \alpha I_0(ab)\, e^{-(a^2+b^2)/2}$. Notice that (65) is the expression of the probability of error of any linear receiver employing differential data detection.
In order to obtain the unconditional error probability, we should carry out the expectation with respect to the vector $\mathbf{g}$; however, this task cannot be easily accomplished, so we resort to a numerical average over a finite number of random realizations of $\mathbf{g}$.
So far, the case of cooperative reception has been analyzed. Moving to the noncooperative receiving scheme with hard integration, denote by $p_e$ the conditional probability of error over each of the $n_r$ receive antennae (note that $p_e$ can be computed with the same approach as in the case of cooperative detection); since the channel gains and the thermal noise are assumed independent across the receive antennae, the hard integration strategy amounts to a Bernoulli counting and the overall probability of error is easily shown to be written as follows:
Determining an analytical expression for the error probability in the case of noncooperative reception with soft integration is quite an involved task. Indeed, in this case, the test statistic can be expressed through the quadratic form are Gaussian variates, statistically independent of each other but not identically distributed, thus implying that the results in [23] cannot be directly applied. For the sake of brevity, we do not dwell any further on this issue, and just point out that the system error probability in this scenario is lower and upper bounded by those of the cooperative scheme and of the noncooperative scheme with hard integration, respectively.
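As a numerical aid, the closed form (65) and the Bernoulli counting used for hard integration can be evaluated with a short script. The following Python sketch is illustrative only. It takes $a$, $b$, and $\alpha$ as given inputs (their definitions are those of the analysis above and are not recomputed here) and evaluates $Q_1$ through a standard Poisson-weighted series. It further assumes that hard integration is a majority vote over the $n_r$ per-antenna decisions, with ties for even $n_r$ broken by a fair coin. The tie rule and all function names are assumptions introduced for the example, not taken from the paper.

```python
import math

def marcum_q1(a, b, terms=400):
    """First-order Marcum Q function Q_1(a, b).

    Series: Q_1(a,b) = sum_n Pois(n; a^2/2) * sum_{k<=n} Pois(k; b^2/2).
    Adequate for moderate arguments (a^2/2 and b^2/2 well below ~700,
    where exp() would underflow).
    """
    x, y = a * a / 2.0, b * b / 2.0
    w_a = math.exp(-x)   # Poisson weight Pois(0; x)
    w_b = math.exp(-y)   # Poisson weight Pois(0; y)
    cdf_b = w_b          # running sum of Pois(k; y) for k <= n
    total = 0.0
    for n in range(terms):
        total += w_a * cdf_b
        w_a *= x / (n + 1)
        w_b *= y / (n + 1)
        cdf_b += w_b
    return total

def bessel_i0(x):
    """Modified Bessel function I_0(x) via its power series."""
    q = (x / 2.0) ** 2
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def cond_error_prob(a, b, alpha):
    """Conditional error probability in the form of (65); a, b, alpha are inputs."""
    return marcum_q1(a, b) - alpha * bessel_i0(a * b) * math.exp(-(a * a + b * b) / 2.0)

def hard_integration_error(p_e, n_r):
    """Majority vote over n_r independent decisions, each wrong with prob. p_e.

    ASSUMPTION: ties (possible only for even n_r) are resolved by a fair coin.
    """
    total = 0.0
    for k in range(n_r + 1):
        p_k = math.comb(n_r, k) * p_e**k * (1.0 - p_e) ** (n_r - k)
        if 2 * k > n_r:
            total += p_k
        elif 2 * k == n_r:
            total += 0.5 * p_k
    return total
```

A simple sanity check: for $a = 0$, $\alpha = 1/2$, and $b = \sqrt{2\gamma}$, the expression collapses to $\frac{1}{2}e^{-\gamma}$, the classical error probability of binary DPSK on the AWGN channel. The unconditional error probability would then be approximated by averaging `cond_error_prob` over many realizations of $\mathbf{g}$, as described above.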

Near-far resistance
For a multiuser detector, the asymptotic efficiency and the near-far resistance for the 0th transmit antenna of the 0th user are defined as follows: respectively, where $P_e^o$ is the probability of error of the optimum (maximum likelihood) receiver for an isolated system (i.e., with no other user except the 0th one); the performance measures in (66) determine the loss due to the presence of the MAI in the limit of very low background noise. We just focus on the ZF-type receiver, since the MMSE-like solution converges to the ZF-like one as $N_0$ vanishes.
First of all, note that if (25) is met, the proposed receiver achieves asymptotic multiuser efficiency, since the first stage is able to completely suppress interference (see (37)). However, as $P_e$ cannot be easily computed in a closed form, in the sequel we condition on the vector $\mathbf{g}$ and consider the following conditional near-far resistance: Note that even though $\eta(\mathbf{g})$ does not coincide with the actual system near-far resistance $\eta$, it is still a measure of the receiver's capability to combat interference of arbitrarily large strength in the low-noise region: precisely, $\eta(\mathbf{g})$ is the near-far resistance that the receiver experiences during the transmission of a frame. Now, since a closed-form expression of $P^o_{e|\mathbf{g}}$ is not available, a lower bound for $\eta(\mathbf{g})$ can be obtained by replacing $P^o_{e|\mathbf{g}}$ itself with the error probability $Q\big(\sqrt{\|\mathbf{s}^0_{0,0}\|^2/N_0}\big)$ of a synchronous single-antenna system employing binary phase-shift keying; thus, we have Now, we evaluate this parameter. For the ZF-like receiver, the quantities in (61) and (64) simplify to respectively, and the probability of error for the 0th transmit antenna of the 0th user given $\mathbf{g}$ in (65) can also be written as follows: Since $Q_1(\xi/\sqrt{N_0}, \phi/\sqrt{N_0})$ and $I_0(\xi\phi/N_0)\, e^{-(\xi^2+\phi^2)/(2N_0)}$ are both asymptotic, for $N_0 \to 0$, to $Q((\phi - \xi)/\sqrt{N_0})$ (see [24]), the conditional near-far resistance admits the following lower bound: It is understood that averaging the above quantity with respect to $\mathbf{g}$ leads to a sort of average near-far resistance, that is, the near-far resistance experienced, on average, by the receiver during the transmission of many (theoretically infinitely many) packets; in this case too, the expectation with respect to the vector $\mathbf{g}$ can be evaluated numerically.

NUMERICAL RESULTS
In this section, we discuss numerical results illustrating the performance of the proposed receivers. We use both semianalytical procedures exploiting the previously derived analytical formulas and plain Monte Carlo simulations. In both cases, the curves shown are the result of an average over $10^4$ channel and delay realizations. We assume that (a) each user is equipped with two transmit antennae; (b) the convolution $(\psi_{tx} * \psi_{rx})(\tau) = \varphi(\tau)$ is a raised cosine with duration $4T_c$ ($\Delta = 4$) and roll-off factor 0.22; (c) the number of subcarriers is $N = 4$ and the spreading over each one is $M = 8$ (the composite spreading gain is then $PG = 32$ and the spreading sequences are PN sequences in $\{-1, 1\}$ of length 31 extended with a $-1$); (d) the sample correlation matrix $\mathbf{R}_{\mathbf{r}\mathbf{r}}$ is obtained through a sample estimate over $Q = 1300$ samples. In Figure 4, the computed lower bound for the average near-far resistance of the two-stage receiver with cooperative detection is plotted versus the number of active users for different numbers of receive antennae ($n_r = 1, \ldots, 4$). Results show that the proposed receiver is near-far resistant and, also, that increasing the number of receive antennae yields a remarkable performance improvement (note that for $n_r = 4$, the limit on the number of users is no longer dictated by (25), but by the number of available spreading sequences, that is, sixteen times two). Figures 5 and 6 show the probability of error (obtained through the semianalytical procedure) of the nonblind receivers with cooperative detection versus the ratio $\gamma_0 = E_b^0/N_0$, for several values of the number of receive antennae and of active users. It is here assumed that perfect average power control has been achieved, even though, due to the near-far resistance of the proposed receivers, the system performance is only slightly degraded in a near-far scenario.
It is seen from Figure 5 that as the number of receive antennae grows, the receiver performance improves and, for a fixed error probability value, a higher number of users can be accommodated. On the other hand, Figure 6 shows the error probability for different receivers and a fixed number of users, that is, $K = 4$. It can be seen that the MMSE-like receiver behaves slightly worse than the MMSE one for $n_r = 1$, while for $n_r = 2$, all the nonblind receivers exhibit the same performance. Simulation results, not provided here for the sake of brevity, have also confirmed a perfect agreement between the semianalytical procedure and the Monte Carlo-based performance evaluation technique.
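Assumption (c) above, a length-31 PN sequence extended with a $-1$ to reach the composite gain $PG = N \cdot M = 32$, can be illustrated with a few lines of Python. The sketch below is a minimal example under stated assumptions: the paper does not name the generator polynomial, so the recurrence $s[n+5] = s[n+2] \oplus s[n]$ (polynomial $x^5 + x^2 + 1$), the all-but-one-zero seed, and the bit-to-chip mapping $1 \mapsto +1$, $0 \mapsto -1$ are hypothetical choices made only for illustration.

```python
def m_sequence_31():
    """One period (31 bits) of a binary maximal-length sequence.

    ASSUMPTION: recurrence s[n+5] = s[n+2] XOR s[n], i.e. the primitive
    polynomial x^5 + x^2 + 1; the paper does not specify the generator.
    """
    s = [1, 0, 0, 0, 0]          # any nonzero seed yields a shift of the same sequence
    for n in range(26):          # 5 seed bits + 26 generated bits = 31
        s.append(s[n + 2] ^ s[n])
    return s

def spreading_code():
    """Length-32 code in {-1, +1}: the 31 PN chips followed by a single -1."""
    chips = [1 if bit else -1 for bit in m_sequence_31()]   # assumed mapping 1 -> +1
    return chips + [-1]

N, M = 4, 8                      # subcarriers and per-subcarrier spreading, as in (c)
code = spreading_code()
print(len(code) == N * M, sum(code))
```

With this mapping, the 31-chip period contains sixteen $+1$ and fifteen $-1$ chips, so the appended $-1$ makes the length-32 code balanced.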
With regard to the performance of the blind receivers, results of Monte Carlo simulations are presented in Figures 7 to 12 for a severe near-far scenario (the interfering users are 15 dB above the user of interest) and with $K = 4$ active users, for both cooperative and noncooperative reception (observe that the maximum number of users $K_{\max}$ for the noncooperative scheme is 4, implying that the network is fully loaded). Figures 7 and 8 show the performance of the proposed subspace-based channel estimation procedure for a noncooperative and a cooperative reception scheme, respectively. In particular, the correlation coefficient is reported versus $\gamma_0$. Here, the word "mod" in the legends refers to the improved channel estimation rule in (52). Figures 9 to 11 show the system error probability for the noncooperative scheme with hard and soft integration, and for the cooperative scheme, respectively. Here, the curve labeled "MMSE-like limit" reports the performance of the MMSE-like receiver in the limit of increasingly large size $Q$ of the sample set used to estimate the covariance matrix of the data. Inspecting the figures, it is seen that in the noncooperative case, with the network fully loaded, the best channel estimation is achieved by the ZF-like receiver, immediately followed by the subspace-based one, while for the cooperative case, both the ZF-like and the MMSE-like receivers outperform the subspace-based one. This trend is confirmed in the plots showing the error probability; indeed, the ZF-like receiver performs slightly better than the ZF subspace-based one in both cases, while the MMSE-like receiver outperforms the subspace-based receiver only in the cooperative case. It is also seen that soft integration achieves superior performance with respect to hard integration and that both of them incur a loss with respect to cooperative reception.
Notice that for the noncooperative receiver, due to the full network load, the MMSE-like limit performance does not coincide with that of the ideal MMSE receiver; conversely, for cooperative reception, since the number of users is now smaller than the maximum one, the MMSE-like limit curve nearly coincides with the ideal MMSE receiver performance. Finally, Figure 12 provides a comparison between the error probability of the soft integration and MRC techniques in a noncooperative reception scheme. Notice that, at the price of some increase in complexity, the MRC scheme achieves better results than the soft integration one for the nonblind receivers; on the other hand, for the blind receivers, the performance improvement is less evident due to the imperfect estimation of the gain vector ($\mathbf{R}_{\vartheta\vartheta}$ was obtained through a sample estimate over $Q_2 = 1000$ samples).
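Both $\mathbf{R}_{\mathbf{r}\mathbf{r}}$ ($Q = 1300$) and $\mathbf{R}_{\vartheta\vartheta}$ ($Q_2 = 1000$) are sample estimates, and the finite sample size is what separates the blind curves from the "MMSE-like limit". The generic estimator $\hat{\mathbf{R}} = \frac{1}{Q}\sum_{q} \mathbf{r}_q \mathbf{r}_q^H$ can be sketched as follows. This is a plain-Python illustration with hypothetical names, and the white complex Gaussian snapshots of dimension 4 are an assumed stand-in for the actual received vectors, whose structure is not reproduced here.

```python
import random

def sample_correlation(snapshots):
    """Sample correlation matrix (1/Q) * sum_q r_q r_q^H of complex vectors."""
    Q = len(snapshots)
    n = len(snapshots[0])
    R = [[0j] * n for _ in range(n)]
    for r in snapshots:
        for i in range(n):
            for j in range(n):
                R[i][j] += r[i] * r[j].conjugate()
    return [[value / Q for value in row] for row in R]

# ASSUMPTION: toy data, unit-variance white complex Gaussian snapshots.
random.seed(0)
Q, n = 1300, 4
sigma = 0.5 ** 0.5               # per-component std so that E|r_i|^2 = 1
snapshots = [
    [complex(random.gauss(0.0, sigma), random.gauss(0.0, sigma)) for _ in range(n)]
    for _ in range(Q)
]
R_hat = sample_correlation(snapshots)
print([round(R_hat[i][i].real, 3) for i in range(n)])
```

For this toy data the true correlation matrix is the identity, and the entries of `R_hat` deviate from it by fluctuations on the order of $1/\sqrt{Q}$, which gives a feel for the estimation error that remains at $Q = 1300$.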

CONCLUSIONS
In this paper, we have considered the problem of blind multiuser detection for asynchronous MC DS-CDMA systems equipped with multiple transmit and receive antennae. This is nowadays an interesting research topic, since MC modulation formats coupled with the use of multiple antennae represent a suitable means to achieve high data rates on the wireless channel at a reasonable computational and practical implementation cost.
The receivers that have been proposed here are code-aided in the sense that they require knowledge of the spreading code for the user of interest only, while no prior knowledge of the channel state or of the timing offset is needed. Several combining rules for the statistics obtained at the output of each antenna have been considered and assessed. A thorough statistical analysis has been carried out for the proposed receivers (and for any linear receiver employing binary differential transmission), while the performance of the blind version has been evaluated through Monte Carlo simulations. Results have shown that these receivers exhibit performance levels close to those of the MMSE and ZF ones and that the use of multiple receive antennae has a beneficial impact on the system performance.
Future work on this topic includes the consideration of space-time and space-frequency codes, as well as the extension of the proposed detection strategy to the situation in which the channel is time-selective, that is, it does not remain constant over the whole transmitted frame.