Blind Identification of Out-of-Cell Users in DS-CDMA

In the context of multiuser detection for the DS-CDMA uplink, out-of-cell interference is usually treated as Gaussian noise, possibly mitigated by overlaying a long random cell code on top of symbol spreading. Different cells use statistically independent long codes, thereby providing means for statistical out-of-cell interference suppression. When the total number of (in-cell plus out-of-cell) users is less than the spreading gain, subspace identification techniques are applicable. If the base station is equipped with multiple antennas, then completely blind identification is possible via three-dimensional low-rank decomposition. This works with more users than spreading gain and antennas, but a purely algebraic solution is missing. In this paper, we develop an algebraic solution under the premise that the codes of the in-cell users are known. The codes of out-of-cell users and all array steering vectors are unknown. In this pragmatic scenario, we show that in addition to an algebraic solution, better identifiability is possible. Our approach yields the best known identifiability result for three-dimensional low-rank decomposition when one of the three component matrices is partially known, albeit noninvertible. Simulations show that the proposed identification algorithm remains close to the pertinent asymptotic (symbol-independent) Cramér-Rao bound, which is also derived herein.


INTRODUCTION
In the context of uplink reception for cellular DS-CDMA systems, interference can be classified as either (i) interchip (ICI) and intersymbol (ISI) self-interference, (ii) in-cell multiuser access interference (commonly referred to as MUI or MAI), or (iii) out-of-cell multiuser access interference. The latter is typically ignored or treated as noise; however, it has been reported [1] that in IS-95 other cells account for a large percentage of the interference relative to the interference coming from within the cell. MUI is usually a side-effect of propagation through dispersive multipath channels. The conceptual difference between in-cell and out-of-cell interference boils down to what the base station (BS) can assume about the nature of interfering signals. Typically, the codes of interfering in-cell users are known to the BS, whereas those of out-of-cell users are not. Specifically, in the presence of ICI, the receive-codes of the in-cell users can be estimated via training or subspace techniques (e.g., cf. [2]), using the fact that the transmit-codes are known. This is not the case for out-of-cell users.
Appealing to the central limit theorem, the total interference from out-of-cell users is usually treated as Gaussian noise. In IS-95, a long random cell-specific code is overlaid on top of symbol spreading, and cell despreading is used at the BS to randomize out-of-cell interference. This helps mitigate out-of-cell interference in a statistical fashion. To see how random cell codes work, consider the simplified synchronous flat-fading baseband-equivalent received data model
$$x = D_{\mathrm{in}} C_{\mathrm{in}} s_{\mathrm{in}} + D_{\mathrm{out}} C_{\mathrm{out}} s_{\mathrm{out}} + n, \qquad (1)$$
where $x$ holds the received data corresponding to one symbol period, $C_{\mathrm{in}}$ (resp. $C_{\mathrm{out}}$) is the spreading code matrix, $s_{\mathrm{in}}$ (resp. $s_{\mathrm{out}}$) is the symbol vector, $D_{\mathrm{in}}$ (resp. $D_{\mathrm{out}}$) is a diagonal matrix that holds a portion of the random cell code for the in-cell (resp. out-of-cell) users, and $n$ models receiver noise. For simplicity, assume that the in-cell symbol-periodic codes are orthogonal of length $P$, and all codes and symbols are BPSK (+1 or −1). Let $c_1$ stand for the code of an in-cell user of interest, and $s_{\mathrm{in}}(1)$ for the corresponding symbol. Then
$$\frac{1}{P} c_1^T D_{\mathrm{in}} x = s_{\mathrm{in}}(1) + \frac{1}{P} c_1^T D_{\mathrm{in}} D_{\mathrm{out}} C_{\mathrm{out}} s_{\mathrm{out}} + \frac{1}{P} c_1^T D_{\mathrm{in}} n. \qquad (2)$$
The interference term is zero-mean; under certain conditions, its variance is $O(1/P)$. This is easy to see for a single out-of-cell user. It follows that random cell codes work reasonably well in relatively underloaded systems with large spreading gain (e.g., 128 chips/symbol), but performance can suffer from near-far effects, and cell codes cannot help identify out-of-cell transmissions. Although the latter may seem of little concern in commercial applications, it can be important for tracking, handoff, and monitoring. In a way, a structured approach towards the explicit identification of out-of-cell users is the next logical step beyond in-cell multiuser detection and is motivated by considerations similar to those that took research from matched filtering to multiuser detection.
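The O(1/P) behavior for a single out-of-cell user can be checked numerically. The Monte Carlo sketch below is illustrative only: it assumes i.i.d. BPSK chips for the user codes and for both cell codes, takes the residual after cell despreading and matched filtering to be (1/P) c_1^T D_in D_out c_out s_out, and the function name, trial count, and seed are hypothetical choices that do not come from the text.

```python
import numpy as np

def despread_interference_stats(P, trials=20000, seed=0):
    """Empirical mean and variance of the despread out-of-cell interference.

    Assumption (not from the paper): all chips and symbols are i.i.d. BPSK.
    """
    rng = np.random.default_rng(seed)
    bpsk = lambda shape: rng.choice([-1.0, 1.0], size=shape)
    c1 = bpsk((trials, P))      # code of the in-cell user of interest
    d_in = bpsk((trials, P))    # diagonal of D_in (in-cell cell code)
    d_out = bpsk((trials, P))   # diagonal of D_out (other cell's code)
    c_out = bpsk((trials, P))   # code of the single out-of-cell user
    s_out = bpsk((trials, 1))   # out-of-cell symbol
    # (1/P) * c1^T D_in D_out c_out * s_out, one realization per row
    interference = np.sum(c1 * d_in * d_out * c_out, axis=1, keepdims=True) * s_out / P
    return float(interference.mean()), float(interference.var())

mean_128, var_128 = despread_interference_stats(128)
```

Under these assumptions the empirical variance should come out close to 1/P, which is the sense in which a larger spreading gain dilutes (but never removes) out-of-cell interference.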
Note that, unlike the case of in-cell interference, out-of-cell interference cannot be mitigated by power control, simply because the BS does not have the authority to exercise power control over out-of-cell users. For a power-controlled in-cell population, near-far effects may be chiefly due to out-of-cell interference.
Unfortunately, out-of-cell detection is complicated by the fact that it has to be blind, since the BS has no control and usually no prior information on out-of-cell users. This places limitations on the number and nature of out-of-cell transmissions that can be identified. The literature on out-of-cell blind identification is scarce. Assuming that (i) the codes of the in-cell users are known, (ii) the total number of (in-cell plus out-of-cell) users is less than the spreading gain and the combined spreading code matrix is full column rank, and (iii) the correlation matrix of the vector of chip samples taken over a symbol interval is given, it is possible to cancel out the effect of out-of-cell users [3] and then adopt linear or nonlinear solutions for in-cell detection. This approach is appealing, but it has two drawbacks. First, it can be unrealistic to assume that the total number of users is less than the spreading gain. This is especially so in loaded systems and urban areas. Second, in practice one uses sample estimates of the correlation matrix. This yields cancellation errors for finite samples, even in the noiseless case.
Recently, a novel code-blind identification approach has been proposed, exploiting uniqueness of low-rank decomposition of three-way arrays [4]. This requires the use of a BS antenna array, but in return allows the identification of both in-cell and out-of-cell users without requiring knowledge of the code or steering vector of any user. More users than spreading and antenna elements can be supported. There are two drawbacks to this approach. First, a direct algebraic solution is generally not possible, thus iterative estimation techniques must be employed. Although these iterative methods generally work very well, they are computationally intensive. Second, in-cell code information, which may be available, is not directly exploited (except numerically, by constraining certain parameters during the iterations). In this paper, we develop an algebraic solution that exploits the fact that the codes of the in-cell users are known. In this scenario, we show that in addition to an algebraic solution, better identifiability is possible. Our approach yields the best known identifiability result for three-dimensional low-rank decomposition when one of the three component matrices is partially known, albeit noninvertible.
Note that the group-blind multiuser detection approach of [3] can be easily extended to handle multiple BS antennas, but this requires that the array steering vectors, in addition to the spreading codes of all the in-cell users, are known. Estimating steering vectors is more difficult than estimating codes, partly because they are generally unstructured, but also due to mobility-induced fast fading. Note that the approach developed herein (see also [4]) does not assume any parameterization of the manifold vectors.
For clarity of exposition, we will begin our analysis by assuming that both in-cell and out-of-cell user transmissions are synchronized at the BS. In practice, this can be approximately true in synchronous CDMA systems, like CDMA2000. Quasisynchronism (i.e., timing offsets on the order of a few chips) can be handled by dropping a short chip prefix at the receiver. We will refer to both cases as synchronous CDMA for brevity. Synchronization is usually achieved via pilot tones emitted from the BS, or a GPS-derived timing reference for synchronous networks involving multiple cells. Out-of-cell transmissions will typically not be synchronized with in-cell transmissions. Notable exceptions include synchronous microcellular networks for "hotspot" coverage, and calls undergoing hand-off at cell boundaries (hence approximately equidistant from the two base stations). As we will see, when delay spread is small relative to the symbol duration, this can be handled by treating each out-of-cell user as two virtual users. Hence our analysis generalizes to the interesting case of a quasisynchronous in-cell population plus asynchronous out-of-cell interference, as in Wideband CDMA (WCDMA). We will refer to this situation as asynchronous CDMA.
The rest of the paper is organized as follows. The main ideas and concepts are presented in Section 2.1, which treats the idealized case of a synchronous DS-CDMA uplink subject to flat fading. This is then extended to frequency-selective multipath and quasisynchronous transmissions in Section 3, which also discusses a suitable admission protocol that avoids explicit code estimation for the in-cell users. Note that in the presence of strong out-of-cell interference and frequency selectivity, estimating the codes of the in-cell users is a difficult task in itself. Section 4 discusses issues related to our choice of a pertinent symbol-independent asymptotic Cramér-Rao Bound (CRB) to benchmark performance of steering vector and spreading code estimation. Associated derivations are deferred to the appendix. Section 5 provides analytical and simulated performance comparisons, and Section 6 summarizes our conclusions.

Notation
$(\cdot)^T$ and $(\cdot)^H$ denote transpose and Hermitian transpose, respectively; $\delta(\cdot)$ stands for Kronecker's delta. $r_A$ stands for the rank of matrix $A$, while $k_A$ stands for the k-rank (Kruskal-rank) of matrix $A$: the maximum $k \in \mathbb{Z}_+$ such that every $k$ columns of $A$ are linearly independent ($k_A \le r_A$). $\|\cdot\|_F$ stands for the Frobenius norm; $(\cdot)^{-1}$ and $(\cdot)^{\dagger}$ stand for the matrix inverse and pseudoinverse, respectively. $D_i(A)$ stands for the diagonal matrix constructed out of the $i$th row of $A$. $I_n$ stands for the $n \times n$ identity matrix. $E(\cdot)$ denotes the expectation operator. $\langle f, g \rangle$ denotes the $L_2$ inner product between functions $f$ and $g$.

Data model
Consider a DS-CDMA uplink with $M$ users (in-cell plus out-of-cell), normalized chip waveform $\psi$ of duration $T_c$, and spreading gain $P$ (chips per symbol). The $m$th user is assigned a binary chip sequence $(c_m(1), \ldots, c_m(P))$. The resulting signature waveform for the $m$th user is
$$c_m(t) = \sum_{p=1}^{P} c_m(p)\, \psi\big(t - (p-1)T_c\big), \quad t \in [0, T_s], \qquad (3)$$
where $T_s = PT_c$ is the symbol duration. All spreading codes are assumed short (symbol periodic). The baseband-equivalent signal received at the BS for a burst of $L$ transmitted symbols can be written as
$$x(t) = \sum_{m=1}^{M} \alpha_m \sqrt{E_m} \sum_{l=1}^{L} s_m(l)\, c_m\big(t - (l-1)T_s - \tau_m\big) + w(t), \qquad (4)$$
where $M$ is the total number of active users, $\alpha_m$ is the complex path gain, $E_m$ is the incident power for the $m$th user loaded at the transmitter, $s_m(l)$ is the $l$th transmitted symbol associated with the $m$th user, $\tau_m$ is the delay of the $m$th user's signal, and $w(\cdot)$ is additive white Gaussian noise (AWGN). Since in-cell users are synchronized with the BS, the delays $\tau_m$ for all in-cell users are taken to be zero. For out-of-cell users, the associated delays can be assumed to lie in $[0, T_s]$, without loss of generality.
If K receive antennas are employed at the BS, the baseband signal at the output of the chip-matched filter of the kth antenna for the pth chip in the nth symbol interval can be written as where M in (≤ P) denotes the number of in-cell users and M out the number of out-of-cell users (M = M in + M out ); β k is the antenna gain associated with the kth antenna; ν pm (n, Note that, due to asynchronism, each out-of-cell user is viewed by the BS as two synchronous users, whose symbol sequences are time-shifted versions of one another. The associated spreading codes are given by ν pm (·, ·).
From (5), in a frequency-flat block-fading scenario, the baseband-equivalent chip-rate sampled data model for a synchronous DS-CDMA system with short symbol-periodic spreading codes and $K$ receive antennas at the BS can be written as
$$x_{k,n,p} = \sum_{m=1}^{M} a_m(k)\, c_m(p)\, s_m(n) + w_{k,n,p} \qquad (6)$$
for $k = 1, \ldots, K$, $n = 1, \ldots, N$, $p = 1, \ldots, P$, where $N$ is the number of symbol snapshots, $x_{k,n,p}$ denotes the baseband output of the $k$th antenna element for symbol (time) $n$ and chip $p$, and $a_m(k)$ is the compound flat fading/antenna gain associated with the response of the $k$th antenna to the $m$th user. It is useful to recast this model in matrix form. We define $P$ received data matrices $X_p \in \mathbb{C}^{K \times N}$ with $(k, n)$-element given by $x_{k,n,p}$, and AWGN matrices $W_p \in \mathbb{C}^{K \times N}$ with $(k, n)$-element given by $w_{k,n,p}$. We also define the steering matrix $A \in \mathbb{C}^{K \times M}$ with $m$th column $[a_m(1) \cdots a_m(K)]^T$, the spreading code matrix $C \in \mathbb{C}^{P \times M}$ with $m$th column $[c_m(1) \cdots c_m(P)]^T$, and the signal matrix $S \in \mathbb{C}^{N \times M}$ with $m$th column $[s_m(1) \cdots s_m(N)]^T$. Without loss of generality, we assume that the submatrices $A_{\mathrm{in}} \in \mathbb{C}^{K \times M_{\mathrm{in}}}$, $C_{\mathrm{in}} \in \mathbb{C}^{P \times M_{\mathrm{in}}}$, $S_{\mathrm{in}} \in \mathbb{C}^{N \times M_{\mathrm{in}}}$, consisting of the first $M_{\mathrm{in}}$ columns of $A$, $C$, $S$, respectively, correspond to the in-cell users; and similarly for $A_{\mathrm{out}}$, $C_{\mathrm{out}}$, and $S_{\mathrm{out}}$. Thus, we have $A = [A_{\mathrm{in}}\ A_{\mathrm{out}}]$, $C = [C_{\mathrm{in}}\ C_{\mathrm{out}}]$, $S = [S_{\mathrm{in}}\ S_{\mathrm{out}}]$, and $X_p$ admits the factorization
$$X_p = A D_p(C) S^T + W_p \qquad (7)$$
for $p = 1, 2, \ldots, P$.
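It is also worth mentioning that we can write the above set of matrix equations in more compact form if we introduce the so-called Khatri-Rao product $\odot$ (column-wise Kronecker product, see [4] and references therein). Stacking the matrices in (7), we obtain
$$X^{(KP \times N)} := \begin{bmatrix} X_1 \\ \vdots \\ X_P \end{bmatrix} = (C \odot A) S^T + W^{(KP \times N)}, \qquad (8)$$
where $W^{(KP \times N)}$ stacks the matrices $W_p$ in the same way. Due to the symmetry of the model (6), we may also recast (8) in the following form, obtained by stacking the transposed slabs $X_p^T$:
$$X^{(PN \times K)} := \begin{bmatrix} X_1^T \\ \vdots \\ X_P^T \end{bmatrix} = (C \odot S) A^T + W^{(PN \times K)}, \qquad (9)$$
where $W^{(PN \times K)}$ is a reshuffled AWGN matrix (see [4]).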
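The equivalence between stacking the slabs of (7) and the Khatri-Rao form can be verified numerically in the noiseless case. The following is a minimal sketch; the helper name `khatri_rao`, the dimensions, and the random test data are illustrative assumptions.

```python
import numpy as np

def khatri_rao(C, A):
    """Column-wise Kronecker product: column m equals kron(C[:, m], A[:, m])."""
    P, M = C.shape
    K, M2 = A.shape
    if M != M2:
        raise ValueError("both matrices need the same number of columns")
    # Entry (p*K + k, m) is C[p, m] * A[k, m]
    return (C[:, None, :] * A[None, :, :]).reshape(P * K, M)

rng = np.random.default_rng(1)
K, N, P, M = 3, 5, 4, 2   # antennas, snapshots, chips, users (toy sizes)
A = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))
C = rng.choice([-1.0, 1.0], size=(P, M))
S = rng.choice([-1.0, 1.0], size=(N, M))

# Noiseless slabs X_p = A D_p(C) S^T, stacked on top of each other
X_stacked = np.vstack([A @ np.diag(C[p]) @ S.T for p in range(P)])
# Compact Khatri-Rao form (C kr A) S^T
X_compact = khatri_rao(C, A) @ S.T
```

`X_stacked` and `X_compact` agree entry by entry, which is all the compact notation asserts.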
In what follows, we consider detecting the signal matrix S transmitted from all active users given only knowledge of C in and M. As a byproduct, we will be able to recover the steering matrix A and the unknown spreading code matrix C out from the received data X as well.

Preliminaries
In the sequel, we will need to invoke certain preliminary results in order to prove our main identifiability result in Theorem 1. Identifiability means that, in the absence of noise, it is possible to recover the sought signals (model parameters) without error; that is, it is possible to pin down the sought parameters exactly. For this reason, we drop noise terms in the discussion that follows. The basic ideas behind preliminary results leading to Theorem 1 are due to Harshman [5]. We begin by recalling the definition of k-rank.

Definition
Definition 1. The k-rank [6] of A is equal to k A if every k A columns drawn from A are linearly independent, and either there exists a collection of k A + 1 linearly dependent columns in A or A has exactly k A columns. Note that k A ≤ rank(A), for all A.
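For small matrices the k-rank can be computed directly from the definition by exhaustive search over column subsets. The sketch below is meant only to make the definition concrete: the function name and tolerance are hypothetical, and the cost grows combinatorially with the number of columns.

```python
import numpy as np
from itertools import combinations

def k_rank(A, tol=1e-10):
    """Largest k such that EVERY set of k columns of A is linearly independent."""
    A = np.asarray(A)
    n_cols = A.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        all_independent = all(
            np.linalg.matrix_rank(A[:, list(idx)], tol=tol) == size
            for idx in combinations(range(n_cols), size)
        )
        if not all_independent:
            break
        k = size
    return k

# Two identical columns: rank is 2, but the k-rank is only 1
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
```

The example matrix `B` shows how the k-rank can be strictly smaller than the rank; a matrix with a zero column has k-rank 0.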

Eigenanalysis
Consider two matrices $X_1 = A D_1(C) S^T$, $X_2 = A D_2(C) S^T$, where both $A \in \mathbb{C}^{K \times M}$ and $S \in \mathbb{C}^{N \times M}$ are full column rank ($M$), $C \in \mathbb{C}^{2 \times M}$ contains no zero entry, and all elements on the diagonal of $D := D_2(C) D_1^{-1}(C)$ are assumed distinct. Consider the singular value decomposition (SVD) of the stacked data matrix
$$\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} A \\ AD \end{bmatrix} D_1(C) S^T = U \Sigma V^H, \qquad (10)$$
where $U \in \mathbb{C}^{2K \times M}$ holds the $M$ principal left singular vectors. The linear space spanned by the columns of $U$ is the same as the space spanned by the columns of $[A^T\ (AD)^T]^T$ since $S D_1(C)$ has full column rank; hence there exists a nonsingular matrix $P$ such that
$$U P = \begin{bmatrix} U_1 \\ U_2 \end{bmatrix} P = \begin{bmatrix} A \\ AD \end{bmatrix}. \qquad (11)$$
Next, construct the auto- and cross-product matrices
$$R_0 := U_1^H U_1 = Q P^{-1}, \qquad R_1 := U_1^H U_2 = Q D P^{-1}, \qquad Q := U_1^H A. \qquad (12)$$
Note that since both $A$ and $S$ are assumed full column rank, the matrices $R_0$, $R_1$, $Q$, $P$, and $D$ in (12) are $M \times M$ full rank matrices. Solving the first equation in (12) for $Q$, then substituting the result into the second, it follows that
$$R_0^{-1} R_1 P = P D, \qquad (13)$$
which is a standard eigenvalue problem with distinct eigenvalues. $P$ can therefore be determined up to permutation and scaling of columns based on the matrices $X_1$ and $X_2$. After that, $A$ can be obtained as $A = U_1 P$, $C D_1^{-1}(C)$ can be retrieved with all ones in the first row, and the entire second row taken from the diagonal of $D$, and finally $S D_1(C)$ can be recovered as $S D_1(C) = (A^{\dagger} X_1)^T$, all under the same permutation and scaling of columns, which carries over from the solution of the eigenvalue problem in (13). Repeated values along the diagonal of $D_2(C) D_1^{-1}(C)$ give rise to eigenvalues of multiplicity higher than one. In this case, the span of eigenvectors corresponding to each distinct eigenvalue can still be uniquely determined. This will be important when we discuss the case of asynchronous out-of-cell users later in Section 3.
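The two-slab eigenanalysis described above can be sketched in a few lines for the noiseless case. This is an illustrative sketch under stated assumptions, not a reference implementation: the function name `two_slab_eig` is hypothetical, the SVD is truncated to the M principal left singular vectors, and the test data (Gaussian A and S, a hand-picked C) are chosen only so that the diagonal of D is distinct.

```python
import numpy as np

def two_slab_eig(X1, X2, M):
    """Recover A, diag(D), and S D_1(C) (up to column permutation and scaling)
    from X1 = A D_1(C) S^T and X2 = A D_2(C) S^T, with D = D_2(C) D_1(C)^{-1}."""
    K = X1.shape[0]
    U, _, _ = np.linalg.svd(np.vstack([X1, X2]), full_matrices=False)
    U1, U2 = U[:K, :M], U[K:, :M]        # partition of the signal subspace basis
    R0 = U1.conj().T @ U1                # auto-product
    R1 = U1.conj().T @ U2                # cross-product
    d, P = np.linalg.eig(np.linalg.solve(R0, R1))   # R0^{-1} R1 P = P D
    A_hat = U1 @ P
    SD1_hat = (np.linalg.pinv(A_hat) @ X1).T
    return A_hat, d, SD1_hat

rng = np.random.default_rng(0)
K, N, M = 4, 6, 3
A = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))
S = rng.standard_normal((N, M))
C = np.array([[1.0, 1.0, 1.0],
              [2.0, 3.0, 5.0]])          # diagonal of D is (2, 3, 5)
X1 = A @ np.diag(C[0]) @ S.T
X2 = A @ np.diag(C[1]) @ S.T
A_hat, d, SD1_hat = two_slab_eig(X1, X2, M)
```

The eigenvalues `d` reproduce the ratios of the second to the first row of C, and the recovered factors rebuild both slabs exactly even though their columns are permuted and scaled relative to the true A and S.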
More generally, we have the following claim.
Claim 1. Given matrices X p = AD p (C)S T for p = 1, . . . , P ≥ 2, A, C, and S can be found up to permutation and scaling of columns provided that both A and S are full column rank, and k C ≥ 2.
Since k C ≥ 2, we know that the spreading code matrix C does not contain any zero columns. Note that k C ≥ 2 does not necessarily imply that there always exists a submatrix of C which comprises two rows of C such that the k-rank of this submatrix is 2. For instance, consider It can be seen that r C = k C = 3, whereas none of the 2 × 3 submatrices of C has k-rank greater than 1. From this example, it is evident that one cannot prove Claim 1 by eigendecomposition applied to a pair of X p 's. For this, we will need the following claim.
For a proof of Claim 2, note that the objective can be easily shown equivalent to proving that there exists a 2 × P matrix G such that the determinants of all 2 × 2 submatrices of GC are not zero. G is determined by its 2P complex entries. The determinant of each 2 × 2 submatrix of GC is a polynomial in those 2P variables, and hence analytic. Since k C ≥ 2, for each specific 2 × 2 submatrix of GC, for instance, the submatrix comprising the first two columns of GC, it is not hard to show that there always exists a G 0 such that the determinant of the corresponding submatrix of G 0 C is not zero. Invoking [7, Lemma 2], we conclude that the set of G's which yield zero determinant for any specific submatrix of GC constitutes a measure zero set in C 2P . The number of all 2×2 submatrices of GC is finite, and any finite union of measure zero sets is of measure zero. The existence of the desired G thus follows. Not only does such a G exist, but in fact a random G drawn from, for example, a Gaussian product distribution, will do with probability one. This establishes Claim 2.
The existence of such G implies that the elements on the diagonal of D 2 (GC)D −1 1 (GC) will be distinct. Therefore, the eigenanalysis steps can be carried through to solve for A and S from the two mixed slabs AD 1 (GC)S T and AD 2 (GC)S T . With the recovered A and S, C can be computed from X p . Therefore Claim 1 follows.
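The role of the random mixing matrix G can be illustrated numerically. In the sketch below, the identity code matrix is a hypothetical example chosen for convenience (it is not necessarily the example used in the text): it has k-rank 3, yet any two of its rows contain a zero column, so no pair of raw slabs supports the eigenanalysis. A complex Gaussian G is assumed, and the helper name is illustrative.

```python
import numpy as np
from itertools import combinations

def min_abs_2x2_minor(B):
    """Smallest |det| over all 2 x 2 column submatrices of a 2 x M matrix."""
    return min(abs(np.linalg.det(B[:, [i, j]]))
               for i, j in combinations(range(B.shape[1]), 2))

rng = np.random.default_rng(2)
K, N, P, M = 3, 4, 3, 3
C = np.eye(P)                            # hypothetical code matrix, k-rank 3
A = rng.standard_normal((K, M))
S = rng.standard_normal((N, M))
slabs = np.stack([A @ np.diag(C[p]) @ S.T for p in range(P)])   # shape (P, K, N)

# Random 2 x P mixing matrix; mixed slab i equals sum_p G[i, p] X_p
G = rng.standard_normal((2, P)) + 1j * rng.standard_normal((2, P))
GC = G @ C
mixed = np.tensordot(G, slabs, axes=(1, 0))                      # shape (2, K, N)
```

Each mixed slab has the form A D_i(GC) S^T, and all 2 × 2 minors of GC are nonzero with probability one, so the two mixed slabs satisfy the conditions needed for the eigenanalysis step.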

Lemma
In the proof of our main theorem, we will need the following lemma.
where * stands for a nonzero entry, it holds that for almost every (µ 1 , µ 2 ) ∈ R 2 (i.e., except for a set of Lebesgue measure zero), the matrix contains no zero entry in the second row; and the first two elements on the diagonal of D 1 (E)D −1 2 (E) are distinct and distinct from the remaining elements.
Proof. Having a zero entry in the second row occurs when (µ 1 , µ 2 ) lies on the union of M lines. Since a finite union of lines cannot cover the plane, zeros in the second row are excluded almost surely. The second claim can be proven in the same manner.

Main theorem on identifiability
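Without loss of generality, we assume that $C_{\mathrm{in}}$ is in canonical form. The general case can be reduced to canonical form as explained in the following section.
Theorem 1. Given $X_p = A D_p(C) S^T$, $p = 1, \ldots, P$, $2 \le M_{\mathrm{in}} \le P$, where $A \in \mathbb{C}^{K \times M}$, $C \in \mathbb{C}^{P \times M}$, $S \in \mathbb{C}^{N \times M}$, and $C$ in canonical form
$$C = [C_{\mathrm{in}}\ \ C_{\mathrm{out}}] = [I_P(1 : M_{\mathrm{in}})\ \ C_{\mathrm{out}}],$$
where $I_P(1 : M_{\mathrm{in}})$ denotes the first $M_{\mathrm{in}}$ columns of $I_P$, if the first $M_{\mathrm{in}}$ rows of $C_{\mathrm{out}}$ contain no zero entries, and $k_C \ge 2$, $\min\{k_A, k_S\} \ge M_{\mathrm{out}} + 2$, then the matrices $A$, $C$, and $S$ are unique up to permutation and scaling of columns.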
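Proof. We will show that we can first recover $A_{\mathrm{in}}$ and $S_{\mathrm{in}}$ up to permutation and scaling of columns from the given $X_p$, and then obtain $A_{\mathrm{out}}$, $C_{\mathrm{out}}$, and $S_{\mathrm{out}}$ afterwards. We begin by recovering the first two columns of $A_{\mathrm{in}}$ and $S_{\mathrm{in}}$. Start from
$$X_1 = A\, \mathrm{diag}\big([1\ \underbrace{0 \cdots 0}_{M_{\mathrm{in}}-1}\ \underbrace{* \cdots *}_{M_{\mathrm{out}}}]\big)\, S^T = \bar{A}\, \mathrm{diag}\big([1\ 0\ * \cdots *]\big)\, \bar{S}^T,$$
$$X_2 = A\, \mathrm{diag}\big([0\ 1\ \underbrace{0 \cdots 0}_{M_{\mathrm{in}}-2}\ \underbrace{* \cdots *}_{M_{\mathrm{out}}}]\big)\, S^T = \bar{A}\, \mathrm{diag}\big([0\ 1\ * \cdots *]\big)\, \bar{S}^T.$$
Recall that $*$ stands for a nonzero entry; $\bar{A}$ ($\bar{S}$) is a column-reduced submatrix of $A$ ($S$), which retains the first two columns and the last $M_{\mathrm{out}}$ columns. Invoking Lemma 1, we can always pick a pair $(\mu_1, \mu_2) \in \mathbb{R}^2$ such that the matrix $E$ of Lemma 1 contains no zero entry in the second row, and the first two elements on the diagonal of $D_1(E) D_2^{-1}(E)$ are distinct and distinct from the remaining elements. We also note that both $\bar{A}$ and $\bar{S}$ have $M_{\mathrm{out}} + 2$ columns from the original $A$ and $S$; by definition of k-rank, it follows that $k_{\bar{A}} \ge \min\{k_A, M_{\mathrm{out}} + 2\}$ and $k_{\bar{S}} \ge \min\{k_S, M_{\mathrm{out}} + 2\}$. Due to the fact that $\min\{k_A, k_S\} \ge M_{\mathrm{out}} + 2$, both $\bar{A}$ and $\bar{S}$ are full column rank. Therefore, eigendecomposition as in Section 2.2.2 can be applied to the mixed slabs $\bar{A} D_1(E) \bar{S}^T$ and $\bar{A} D_2(E) \bar{S}^T$, formed as linear combinations of $X_1$ and $X_2$, to recover the first two columns of $A$ and $S$ up to permutation and scaling. We can repeat this procedure with $X_i$ and $X_{i+1}$ to recover the $i$th and the $(i + 1)$th columns of $A_{\mathrm{in}}$ and $S_{\mathrm{in}}$ for $i = 2, \ldots, M_{\mathrm{in}} - 1$ until both $A_{\mathrm{in}}$ and $S_{\mathrm{in}}$ are recovered.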
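The matrices $X_p^{\mathrm{in}} := A_{\mathrm{in}} D_p(I_P(1 : M_{\mathrm{in}})) S_{\mathrm{in}}^T$ corresponding to the in-cell users can be constructed, and we thus obtain the matrices $X_p^{\mathrm{out}}$ by subtracting $X_p^{\mathrm{in}}$ from $X_p$ for $p = 1, \ldots, P$. $X_p^{\mathrm{out}}$ is nothing but $A_{\mathrm{out}} D_p(C_{\mathrm{out}}) S_{\mathrm{out}}^T$. Since $A_{\mathrm{out}}$, $C_{\mathrm{out}}$, and $S_{\mathrm{out}}$ are all $M_{\mathrm{out}}$-column submatrices of $A$, $C$, and $S$, respectively, we have
$$k_{A_{\mathrm{out}}} \ge \min\{k_A, M_{\mathrm{out}}\} = M_{\mathrm{out}}, \qquad k_{S_{\mathrm{out}}} \ge \min\{k_S, M_{\mathrm{out}}\} = M_{\mathrm{out}}, \qquad k_{C_{\mathrm{out}}} \ge \min\{k_C, M_{\mathrm{out}}\}.$$
The first two inequalities hold due to the condition that $\min\{k_A, k_S\} \ge M_{\mathrm{out}} + 2$, and imply that both $A_{\mathrm{out}}$ and $S_{\mathrm{out}}$ are full column rank matrices.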
If M out ≥ 2, we know that k Cout ≥ 2; therefore Claim 1 can be invoked, and eigenanalysis of two mixed slabs can be carried out to recover A out , C out , and S out , up to permutation and scaling of columns.
When M out = 1, it is known that rank-one matrix decomposition is unique up to scaling.
Remark 1. Note that C in Theorem 1 can be a fat matrix. A similar result can be derived for M in = 1, with slightly different conditions on C out .
Remark 2. The assumption that the first M in rows of C out contain no zero entries is posed mainly for simplicity of proof of Theorem 1. Theorem 1 holds, provided that none of the columns of the submatrix comprising the first M in rows of C out is proportional to a column of I Min . We chose to prove the slightly restricted Theorem 1 due to space considerations.
Remark 3. The model identifiability conditions of Theorem 1 are usually met in practice deterministically or statistically with proper system parameters. For instance, if we assume that A and C are drawn from a continuous distribution, and S drawn from an i.i.d. BPSK source, it can be shown that k A ≥ M out + 2, k C ≥ 2 holds almost surely, provided K ≥ M out + 2, P ≥ 2, while k S ≥ M out + 2 occurs with high probability provided that N is moderately higher than M.

Algorithms
The proof of Theorem 1 is constructive; it directly yields a sequential eigenvalue-based solution that recovers everything exactly in the noiseless case, under only the model identifiability condition in the theorem. In the noisy scenario, this eigenvalue approach can be coupled with an iterative LS-based refinement algorithm that yields good estimation performance for moderate signal-to-noise ratio (SNR) and beyond.
Assuming that C in is known, the two major steps of our algorithm are summarized next.
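(1) Algebraic initialization. Arrange the received noisy data $x_{k,n,p}$ into a set of matrices $X_k \in \mathbb{C}^{P \times N}$, for $k = 1, \ldots, K$. The $(p, n)$ entry of $X_k$ is $x_{k,n,p}$. It can be shown that
$$X_k = C D_k(A) S^T + W_k,$$
where $W_k$ is the AWGN matrix. Left multiply by the pseudoinverse of $C_{\mathrm{in}}$ to get $Z_k \in \mathbb{C}^{M_{\mathrm{in}} \times N}$:
$$Z_k := C_{\mathrm{in}}^{\dagger} X_k = C_{\mathrm{in}}^{\dagger} C D_k(A) S^T + C_{\mathrm{in}}^{\dagger} W_k.$$
Form another set of matrices $\tilde{X}_m \in \mathbb{C}^{K \times N}$, for $m = 1, \ldots, M_{\mathrm{in}}$, such that the $(k, n)$ entry of $\tilde{X}_m$ is equal to the $(m, n)$ entry of $Z_k$. It can be shown that
$$\tilde{X}_m = A D_m(C_{\mathrm{in}}^{\dagger} C) S^T + \tilde{W}_m,$$
where $\tilde{W}_m$ is the rearranged Gaussian noise matrix. Note that $C_{\mathrm{in}}^{\dagger} C = [I_{M_{\mathrm{in}}}\ \ C_{\mathrm{in}}^{\dagger} C_{\mathrm{out}}]$ is in canonical form when $C_{\mathrm{in}}$ has full column rank, and thus we may apply the approach described in the proof of Theorem 1 to estimate $A$, $C_{\mathrm{in}}^{\dagger} C_{\mathrm{out}}$, and $S$. $C$ can also be estimated as
$$C^T = (S \odot A)^{\dagger} X^{(NK \times P)},$$
where $X^{(NK \times P)}$ is obtained by stacking the matrices $X_n \in \mathbb{C}^{K \times P}$, $n = 1, \ldots, N$, and the $(k, p)$ element of $X_n$ is given by $x_{k,n,p}$ (cf. [4] for details).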
(2) Joint constrained Least Squares refinement Use the A, C out , and S obtained in the first step and the known C in as initialization for constrained trilinear alternating least squares (CTALS) regression applied to the original data x k,n,p . The basic idea behind TALS is to compute a conditional LS update of A given C, S, then repeat for S, and so forth in a circular fashion until convergence [4]. For CTALS, the C in part of C is fixed, and only C out is updated in the iterations.
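The constrained refinement in step (2) can be sketched as follows. This is a minimal illustration of alternating conditional LS updates with the in-cell codes held fixed; it is not the COMFAC code of [4]. The function names, the data layout `X[k, n, p]`, the fixed iteration count (instead of a convergence test), and the update order are all assumptions made for this sketch.

```python
import numpy as np

def khatri_rao(B, A):
    """Column-wise Kronecker product; row (i*rows_A + j) holds B[i, m] * A[j, m]."""
    return (B[:, None, :] * A[None, :, :]).reshape(B.shape[0] * A.shape[0], -1)

def fit_error(X, A, C, S):
    """Frobenius norm of X minus the trilinear model sum_m A[k,m] S[n,m] C[p,m]."""
    return np.linalg.norm(X - np.einsum('km,nm,pm->knp', A, S, C))

def ctals(X, C_in, A, C_out, S, n_iter=50):
    """Constrained trilinear ALS: C_in stays fixed; A, S, and C_out are updated."""
    K, N, P = X.shape
    M_in = C_in.shape[1]
    X_A = X.transpose(2, 1, 0).reshape(P * N, K)   # equals (C kr S) A^T
    X_S = X.transpose(2, 0, 1).reshape(P * K, N)   # equals (C kr A) S^T
    X_C = X.transpose(1, 0, 2).reshape(N * K, P)   # equals (S kr A) C^T
    for _ in range(n_iter):
        C = np.hstack([C_in, C_out])
        A = np.linalg.lstsq(khatri_rao(C, S), X_A, rcond=None)[0].T
        S = np.linalg.lstsq(khatri_rao(C, A), X_S, rcond=None)[0].T
        SA = khatri_rao(S, A)
        # Remove the contribution of the known in-cell codes, then fit C_out only
        residual = X_C - SA[:, :M_in] @ C_in.T
        C_out = np.linalg.lstsq(SA[:, M_in:], residual, rcond=None)[0].T
    return A, C_out, S
```

Since every update is a conditional least squares solution, the fit error cannot increase from one update to the next; this is why a good algebraic initialization translates directly into fewer iterations.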

EXTENSION TO QUASISYNCHRONOUS SYSTEMS AND MULTIPATH CHANNELS
There are two issues that must be addressed in order to establish the usefulness of our algorithm in a realistic cellular CDMA environment. One is synchronization; the other is frequency selectivity.
In so-called quasisynchronous CDMA (QS-CDMA) the symbol timing of the in-cell users may be off by as much as a few chips. This causes ISI, but, as already mentioned, it can be circumvented by dropping a short chip-prefix for each symbol at the receiver; the associated performance degradation is negligible when the prefix is short relative to the spreading gain.
Quasisynchronism is a reasonable assumption for the in-cell user population in the context of 3G systems (e.g., CDMA2000), but much less so for out-of-cell users, who actually attempt to synchronize with a different BS. The key here is that, as per (5), asynchronous out-of-cell users appear as two virtual synchronous users, with "split" code pieces, and symbol sequences that are offset by one symbol. Note that splitting and offset generally preserve linear independence; however, the steering vectors (spatial responses) will be collinear for each such pair of virtual users. Fortunately, by exchanging the roles of A and C and invoking the remark on repeated eigenvalues in Section 2.2.2, it can be shown that the parameters of all in-cell users can still be uniquely determined, along with the span of each pair of virtual out-of-cell users.
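The "two virtual users" view can be made concrete with a simplified chip-level sketch. It assumes a delay of an integer number of chips d (0 < d < P), a single antenna, flat fading with unit gain, and no noise; these simplifications, as well as the function name `split_code`, are illustrative assumptions and do not reproduce the general model (5).

```python
import numpy as np

def split_code(c, d):
    """Split a length-P code delayed by d chips into two virtual synchronous codes.

    tail: last d chips of c, seen at the start of the BS symbol window
          (modulated by the previous symbol)
    head: first P - d chips of c, seen in the rest of the window
          (modulated by the current symbol)
    """
    c = np.asarray(c, dtype=float)
    P = len(c)
    if not 0 < d < P:
        raise ValueError("delay must satisfy 0 < d < P")
    tail = np.concatenate([c[P - d:], np.zeros(P - d)])
    head = np.concatenate([np.zeros(d), c[:P - d]])
    return tail, head

rng = np.random.default_rng(4)
P, L, d = 8, 6, 3
c = rng.choice([-1.0, 1.0], size=P)
s = rng.choice([-1.0, 1.0], size=L)
stream = np.concatenate([np.zeros(d), np.kron(s, c)])   # delayed chip stream
tail, head = split_code(c, d)
# In BS symbol window n, the delayed user looks like two synchronous users
windows_match = all(
    np.allclose(stream[n * P:(n + 1) * P], tail * s[n - 1] + head * s[n])
    for n in range(1, L)
)
```

The two virtual codes occupy disjoint chip positions, so they are linearly independent, while both virtual users share the same spatial response, which is the collinearity noted above.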
Frequency selectivity is realistically modeled by convolution with a relatively short chip-rate FIR filter that models the discrete-time baseband-equivalent channel impulse response, including transmit chip pulse-shaping and receive chip-matched filtering. The effective spreading codes seen at the receiver are the convolution of the transmit codes with the corresponding multipath channels. This means that the in-cell receive codes must be estimated before our basic approach developed in the previous section can be applied. This estimation is complicated by the cochannel out-of-cell interference, which is not under the control of the BS. In order to deal with the problem of receive-code estimation for the in-cell users, we propose the following admission protocol: as new in-cell users come into the system, they are initially treated as out-of-cell; their receive-codes are thereby estimated blindly, and they are subsequently added to the list of in-cell users. Initially, the process is started by solving a blind problem, as in [4]. In this way, the problem of receive-code estimation for the in-cell users is never explicitly solved. Once the in-cell receive-codes have been estimated at the BS, the proposed algorithm can be carried over to quasisynchronous frequency-selective DS-CDMA systems.
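The notions of an effective receive code and of dropping a short chip prefix can be illustrated for a single user. The sketch below assumes a noiseless single-antenna link, a random complex chip-rate FIR channel of hypothetical length Lh, and a prefix of Lh − 1 chips; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
P, L, Lh = 16, 5, 3                       # chips/symbol, symbols, channel taps
c = rng.choice([-1.0, 1.0], size=P)       # transmit code
s = rng.choice([-1.0, 1.0], size=L)       # BPSK symbols
h = rng.standard_normal(Lh) + 1j * rng.standard_normal(Lh)   # chip-rate channel

# Effective receive code: transmit code convolved with the channel,
# restricted to one symbol period
g = np.convolve(c, h)[:P]

# Received chip stream, arranged as one row per symbol period
y = np.convolve(np.kron(s, c), h)[:L * P]
Y = y.reshape(L, P)

# The first Lh - 1 chips of each period carry ISI from the previous symbol;
# after dropping them, each row is the symbol times the effective code
clean = Y[:, Lh - 1:]
expected = np.outer(s, g[Lh - 1:])
```

After the prefix is removed, the data again follow a memoryless model with spreading gain P − Lh + 1 and receive code `g[Lh - 1:]`, which is why the degradation is small when the channel is short relative to P.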

ASYMPTOTIC CRAMÉR-RAO BOUND
In order to benchmark the performance of our estimation algorithm, it is useful to derive pertinent bounds. While low bit error rate (BER) is of primary concern, accurate estimates of the out-of-cell users' receive-codes and both in-cell and out-of-cell steering vectors are also of interest. CRBs can be developed for the latter, owing to the fact that unlike symbols, steering vectors and receive-codes are continuous parameters.
The conditional CRB for low-rank decomposition of multidimensional arrays has been derived in [8], assuming all matrices are fixed unknowns. In our present context, however, we are more interested in bounds that are independent of the symbol matrix S. Towards this end, we can aim for one of two options: computing an averaged (or modified) CRB, or an asymptotic CRB. The former turns out to be far more complicated to derive in closed form; we therefore opt for the latter.
In the appendix, wherein the detailed CRB derivations can be found, we begin by developing a compact form of the conditional CRB in [8]. The new compact form is much simpler to compute than the expression given in [8]. Then, following the approach developed in [9], we work out the asymptotic CRB as the number of symbols, $N$, goes to infinity. The key to this computation is that the limit and the CRB operator can be exchanged, since the latter is continuous; and when $N$ tends to infinity, the sample estimate of the correlation matrix of $S$ approaches the exact correlation matrix of $S$. For the sake of brevity, in what follows, we assume that the entries of $S$ are drawn from an i.i.d. BPSK source. This implies that
$$E\big(s_{m_1}(n_1)\, s_{m_2}(n_2)\big) = \delta_{m_1, m_2}\, \delta_{n_1, n_2}. \qquad (27)$$
Note that the asymptotic CRB derived in the appendix is valid for arbitrary $C$; it is not necessary to have $C_{\mathrm{in}}$ in canonical form. The main limitation of the asymptotic CRB is that it is valid for large enough $N$, but for small $N$ there will be some mismatch.
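The convergence of the sample correlation matrix of S, on which the asymptotic argument rests, is easy to observe for an i.i.d. BPSK source. The sketch below is illustrative; the function name, the number of users, and the snapshot counts are arbitrary choices.

```python
import numpy as np

def sample_correlation(N, M, seed=0):
    """Sample correlation (1/N) S^T S for an N x M i.i.d. BPSK symbol matrix."""
    rng = np.random.default_rng(seed)
    S = rng.choice([-1.0, 1.0], size=(N, M))
    return S.T @ S / N

# Largest deviation from the identity for increasing numbers of snapshots
deviation = {
    N: float(np.abs(sample_correlation(N, 4) - np.eye(4)).max())
    for N in (50, 5000, 500000)
}
```

The diagonal is exactly one for BPSK, while the off-diagonal entries shrink roughly as 1/sqrt(N); this is consistent with the remark that the asymptotic bound can be mismatched for small N.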

SIMULATION RESULTS
In this section, we provide computer simulation results to demonstrate the performance of the proposed algorithm.
As per Theorem 1, the scaling ambiguity for all active users and the permutation ambiguity among out-of-cell users are inherent to this blind separation problem. We remove the column scaling ambiguity in the estimated symbol matrix S via differential encoding, and assume differentially encoded symbols. The permutation ambiguity among the out-of-cell users is resolved using a greedy least squares matching algorithm [4]. This permutation ambiguity among the out-of-cell users cannot be resolved at the BS without additional side information, but this indeterminacy is irrelevant in practice. Let X p = AD p (C)S T + W p be the received noisy data, for p = 1, . . . , P, where W p are the AWGN matrices. We define the sample SNR at the input of the multiuser receiver as SNR := 10 log 10
We first show that the proposed algebraic initialization significantly accelerates the convergence of least squares refinement and improves the performance. In order to have a benchmark, we consider cases wherein the TALS-based COMFAC algorithm [4] is also applicable, but note that the approach developed herein can work well when COMFAC fails. When both methods are applicable, our simulations show that the new approach yields better performance. Figure 1 plots BER versus average SNR, without out-of-cell interference and for M in = 4, DE-BPSK, K = 2, N = 50, and P = 4. Results are averaged over 10^2 i.i.d. Rayleigh channels (A; no power control is assumed), and 10^6 realizations per Rayleigh channel. Note that total averaging is O(10^8). The spreading codes are randomly drawn from a continuous distribution and fixed throughout the simulations. the number of total active users. It is seen from those figures that, as expected, the proposed algorithm provides better BER performance than COMFAC; in particular, the improvement is significant in the high SNR regime.
In addition, the proposed algorithm has been observed to converge at least 70 percent faster (in terms of time) than the general TALS with random initialization, and comparably with respect to the computation-efficient TALS-based COMFAC, especially in the high SNR regime.
Next, the performance of the proposed algorithm and that of the linear group-blind decorrelating detector [3] with two different sample sizes is shown in Figure 3. The original group-blind multiuser detector is designed for uplink CDMA with a single receive antenna, but the approach of [3] can be easily extended to handle multiple BS antennas, provided that the array steering vectors, in addition to the spreading codes, of all the in-cell users are known. Estimating steering vectors is more difficult than estimating codes, because the former vary faster due to mobility-induced fast fading. In our simulation, in contrast to the proposed algorithm, the linear group-blind decorrelating detector assumes perfect knowledge of the in-cell users' steering matrix $A_{\mathrm{in}}$, that is, we provide the linear group-blind decorrelating detector with perfect knowledge of $(C_{\mathrm{in}} \odot A_{\mathrm{in}})$ in (8). Figure 3 depicts the performance of the two competing detectors for two different sample sizes, N = 25 and N = 50. It is observed that the linear group-blind decorrelating detector exhibits an error floor in the high SNR regime due to using sample estimates of the correlation matrix. This yields cancellation errors which persist for any finite number of samples, even in the noiseless case. However, such an error floor is acceptable when we use large sample sizes. With 50 snapshots, the linear group-blind decorrelating detector provides better BER performance than the proposed detector in the high SNR regime even though the error floor surfaces at about 24 dB. With a small sample size of N = 25, the proposed detector clearly outperforms the linear group-blind decorrelating detector, despite the fact that it uses less side information.
In both cases, the proposed detector outperforms the linear group-blind decorrelating detector in the low SNR regime. We emphasize that the proposed algorithm performs well even for very small sample sizes (e.g., $N = 10$) in the high SNR regime, whereas the group-blind approach hits the error floor at very low SNR in this case.
Our proposed detector is also robust to strong out-of-cell interference. We have compared the BER performance of user 1 under the proposed approach against that of the usual minimum mean squared error (MMSE) receiver, which assumes exact knowledge of the in-cell user codes and steering vectors but treats out-of-cell users as Gaussian interference. The soft MMSE solution for $S$ is [...] (29). Figure 4 shows that as the power of the out-of-cell users increases, the performance of the MMSE receiver deteriorates significantly, whereas the degradation of the proposed detector is marginal.
The proposed algorithm is capable of accurately estimating the steering matrix of all active users and the code matrix of the out-of-cell users. To illustrate this, we compare the mean squared error (MSE) performance of the proposed approach against the associated asymptotic CRB. Throughout, the asymptotic CRB is first normalized in an elementwise fashion; that is, each unknown parameter's CRB is weighted in proportion to the inverse squared modulus of the respective parameter. The average weighted CRB of all the unknown parameters is then used as a single performance metric. The average MSE for all free model parameters is calculated in the same fashion. The SNR is defined as $\mathrm{SNR} := 10 \log_{10}$ [...], which can be shown to be consistent with the definition (28) when we take the expectation of (28) with respect to $S$.
Figure 5 depicts simulation results comparing TALS performance to this asymptotic CRB for two different numbers of snapshots. In this simulation, $K = 4$, $P = 4$, $M = 6$, and the true parameters were used to initialize TALS.
The point here is to measure how tight the asymptotic CRB is for various $N$; for this reason, we use the sought parameters as initialization in order to ensure the best possible scenario for TALS. It can be seen that TALS with good initialization remains very close to the CRB from medium to high SNR for a relatively large sample size, $N = 64$. Note that $N = 64$ is a reasonable number of symbol snapshots in practice. When the sample size is relatively small, the MSE performance of TALS is naturally worse than what is predicted by the asymptotic CRB.
Figure 6 presents the average MSE performance of COMFAC and the proposed algorithm against the CRB. We note that the performance of the proposed algorithm exceeds that of COMFAC considerably once the SNR goes beyond the low SNR regime. This is because the new algebraic approach can provide fairly accurate initializations for CTALS, whereas COMFAC is forced to use random initializations in this case, wherein no two modes are full column rank. The average MSE of the proposed algorithm deviates from the CRB by about 2 to 3 dB. This is mainly because the initializations that the algebraic approach provides are still not perfect, and because the prespecified tolerance threshold used to terminate the iterative refinement algorithm is set higher than in previous simulations, due to complexity considerations.
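The elementwise normalization used for the MSE and CRB curves can be written out as a short Python sketch. The function names are hypothetical, and the proportionality constant of the weights is assumed to be one, so each parameter is weighted by exactly $1/|\theta_i|^2$; the paper only states that the weights are proportional to this quantity.

```python
import numpy as np

def normalized_mse(theta_true, theta_hats):
    """Average over parameters of (mean squared error) / |true parameter|^2.

    theta_true: (n_params,) complex array of true parameters (nonzero)
    theta_hats: (n_trials, n_params) complex array of estimates
    """
    theta_true = np.asarray(theta_true)
    err2 = np.abs(np.asarray(theta_hats) - theta_true) ** 2
    per_param_mse = err2.mean(axis=0)                 # Monte Carlo average
    return np.mean(per_param_mse / np.abs(theta_true) ** 2)

def normalized_crb(theta_true, crb_diag):
    """Same weighting applied to the per-parameter CRB values (diagonal entries)."""
    theta_true = np.asarray(theta_true)
    return np.mean(np.asarray(crb_diag) / np.abs(theta_true) ** 2)

def to_db(x):
    """Convert a power-like quantity to decibels."""
    return 10.0 * np.log10(x)
```

Weighting by the inverse squared modulus makes the metric a relative error, so parameters of very different magnitude contribute comparably to the single averaged figure, and the MSE and CRB curves can be compared directly on a dB scale.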

CONCLUSIONS
Out-of-cell interference in DS-CDMA systems is usually treated as noise, possibly mitigated using random cell codes. If the total number of in-cell plus out-of-cell users is smaller than the spreading gain, subspace-based suppression of out-of-cell users is possible. The assumption of more spreading than the total number of users can be quite unrealistic, even for moderately loaded cells. Completely blind reception is feasible under certain conditions (even with more users than spreading) with BS antenna arrays. We have proposed a new blind identification procedure that is capable of recovering both in-cell and out-of-cell transmissions, with sole knowledge of the in-cell user codes. The codes of the out-of-cell users and the steering vectors of all users are also recovered. The new procedure remains operational even when completely blind or subspace-based procedures fail. Interestingly, if the in-cell codes are known, then an algebraic solution is possible.

ASYMPTOTIC CRB AS N TENDS TO INFINITY
To derive a meaningful CRB, following [8], we assume that the first rows of $A$ and $S$ are fixed (or normalized) to $[1 \cdots 1]_{1 \times F}$ (this takes care of the scale ambiguity), that the first row of $C_{\mathrm{out}}$ is known and consists of distinct elements (which resolves the permutation ambiguity), and that $C_{\mathrm{in}}$ is in canonical form. In turn, the number of unknown complex parameters is $(N + K$ [...] where $a_k$ denotes the $k$th row of $A$, $c^{\mathrm{out}}_p$ denotes the $p$th row of $C_{\mathrm{out}}$, and $s_n$ denotes the $n$th row of $S$. It has been shown in [8] that the Fisher information matrix (FIM) is given by [...] where $f(\theta)$ is the log-likelihood function and [...] (The forms given here can be shown to be mathematically equivalent to those in [8].) Since we have assumed that $E[s^{*}_{n_1 m_1} s_{n_2 m_2}] = \delta_{n_1,n_2}\,\delta_{m_1,m_2}$, (A.9)