Joint CFO and DOA estimation for multiuser OFDMA uplink

In this article, we develop a new subspace-based multiuser joint carrier frequency offset (CFO) and direction-of-arrival (DOA) estimation scheme for orthogonal frequency division multiple access uplink transmissions. We leverage multi-antenna at the receiver and consider that the signals transmitted by each user arrive at the receiving antenna array from multiple DOAs after bouncing from both surrounding and far scatterers. The rank reduction approach is then exploited to estimate the multiple CFOs and DOAs. Specifically, for each user, after the CFO estimation from one-dimensional search, its multiple DOAs can be obtained simultaneously via polynomial rooting. The proposed method supports generalized subcarrier assignment scheme and fully loaded transmissions. Both performance analysis and numerical results are provided to corroborate the proposed studies.


Introduction
As has widely been studied in recent years [1][2][3], orthogonal frequency division multiple access (OFDMA) is deemed as a promising technique for next-generation multiuser wireless communications. The performance of OFDMA, however, is sensitive to multiple carrier frequency offsets (CFOs) introduced by the mismatch of the transceiver oscillators or the Doppler effect. In multiuser scenarios, the non-zero CFOs lead to both inter-carrier interference and multiple-access interference, which could severely degrade the system performance.
The CFO estimation scheme for OFDMA uplink transmissions has intensively been investigated in the past few years. Using the frequency domain embedded pilot symbols, an iterative CFO estimation approach was described in [4] for tile structure-based OFDMA transmission [5]. The CFOs can also be estimated from the maximum likelihood (ML) approach by transmitting training sequences from each user, but with very high complexity. The alternating-projection algorithm was introduced in [6] to replace the multi-dimensional search with a sequence of one-dimensional (1D) searches. An improved approach was later proposed in [7,8] to further reduce the complexity of [6] by using the divide-and-update frequency estimator. An interesting alternative to avoid the ML multi-dimensional search is to use the mean likelihood estimator combined with the importance sampling technique [9,10]. Another complexity-reduced CFO estimator was reported in [11] by approximating the inverse of a CFO-dependent matrix with that of a predetermined matrix.
Blind CFO estimation methods, on the other hand, were also developed to improve the bandwidth efficiency. The CFOs can be computed by looking for the position of null subcarriers within the signal bandwidth in the system for subband subcarrier assignment scheme (SAS) [12]. A frequency estimation scheme for uplink OFDMA with interleaved SAS that exploits the periodic structure of the signals from each user has been reported in [13], where the subspace estimation theory was utilized, which makes the scheme similar to the multiple signal classification technique [14]. Based on the observation of [13], several advancements have been proposed later [15,16]. Despite their good performance, both [13] and its variations [15,16] are only applicable for interleaved SAS and cannot be used for generalized SAS. Moreover, they must reserve null subcarriers or a much longer cyclic prefix (CP) to construct the noise space, which reduces the bandwidth efficiency.
More recently, several CFO estimation schemes have been developed for the OFDMA systems by leveraging multi-antenna at the receiver. For instances, a CFO estimation scheme for interleaved OFDMA/space division multiple access uplink systems was developed in [17] to support spatially separated users and to maximize the channel throughput. Another several schemes were proposed in [18,19] to support generalized SAS as well as fully loaded transmissions. They adopted the estimation of signal parameters via rotational invariance technique (ESPRIT)-like approach and exploited the direction-ofarrival (DOA) information to separate the signals from different users.
In this article, we develop a new subspace-based multiuser joint CFO and DOA estimation scheme for OFDMA uplink transmissions. We leverage multi-antenna at the receiver and consider that the signals transmitted by each user arrive at the receiver's antenna array from multiple DOAs, after bouncing from both surrounding and far scatterers [20]. The multiple CFOs and DOAs are then derived by a rank-reduction approach. Specifically, for each user, after the CFO estimation using 1D search, its multiple DOAs can also be obtained simultaneously by polynomial rooting, which is one unique property of our scheme. In summary, the main contributions of this article include the following: 1. With the consideration of multi-cluster channels, we design a new joint CFO and DOA estimation method for multiuser OFDMA uplink. The proposed method supports generalized SAS and fully loaded transmissions. 2. We provide the theoretical performance analysis of our method in terms of both CFO and DOA estimation. 3. Compared with [18,19], the simulation results demonstrate that our method not only has the advantage of being applicable to multi-cluster channels, but also can obtain much better performance in single cluster channels.
Notations: Superscripts (·)*, (·) T , (·) H , [·] † , and E[·] represent conjugate, transpose, Hermitian, pseudo inverse, and expectation, respectively; j = √ −1 is the imaginary unit; ||X|| denotes the Frobenius norm of X, and diag(·) is a diagonal matrix with main diagonal (·); The kronecker product is denoted by ⊗; The component-wise product is denoted by°; I N denotes the N × N identity matrix and 1 N denotes the 1 × N matrix with all entries being 1; Matlab matrix representations are adopted, for example, X(r 1 : r 2 , c 1 : c 2 ) denotes the submatrix of X with the rows from r 1 to r 2 and the columns from c 1 to c 2 .

System model
We consider a multiuser OFDMA system with K users, N subcarriers. The base station (BS) is equipped with a uniform linear array (ULA) with M antennas, which is elevated above the rooftop. All subcarriers are sequentially indexed with {0, 1, . . . , N -1}. Assume that the channel between each user and the receiver is composed of N cl clusters (N cl ≥ 1). The multipath components in each cluster exhibit similar DOAs. Among the total N cl clusters, one cluster is called surrounding cluster that corresponds to the scatterers located around each user, and the remaining N cl -1 clusters, called far clusters, correspond to high-rise buildings in urban environments and hills/mountains in rural environments [20,21]. As the BS is deployed above its surrounding scatterers, following [18,19], we further approximate that the multipath components from one cluster have a single DOA. Here we should note that the works in [18,19] considered only the surrounding cluster, but ignored the existence of far clusters. However, as has been reported in [20], in the typical Urban environment, the fractions of the cases with two and three clusters are 9 and 4%, respectively. The fractions are even higher in the bad Urban environment, which are given by 28 and 45%, respectively. We should note that the methods developed in [18,19] may not be applicable to these multi-cluster scenarios.
Denote χ = d/l¸, where d is the antenna spacing of the ULA, and l is the radio wavelength. Assume that in the gth block, the multipath channel components between the ith cluster of the kth user and the reference antenna (1st) of ULA can be modeled by a length-L p vector (1) We assume that the entries of h (k) 1,i,g are independent Gaussian variables with variance 1/L p such that the expectation of the channel vector norm is 1, i.e., denote the DOA of the ith cluster of the kth user and then the multipath channel components between the ith cluster of the kth user and the mth antenna of ULA can be expressed as where a (k) m,i = e j2πχ(m−1) cos ϕ (k) i . Correspondingly, its frequency domain channel is given by where F stands for the N × N DFT matrix with its (i, j) We denote the normalized CFO of the kth user by ξ (k) = Δf (k) /Δf, where Δf is the subcarrier spacing and Δf (k) is the CFO of the kth user. We assume ξ (k) (-0.5, 0.5). Denote the number and the index set of the subcarriers allocated to the kth user by N k and C (k) , where N k ,g ] T be the modulated symbols of the kth user in the gth block. In the noise-free environment, the received time-domain signal components after removing CP from the kth user at the mth antenna can be expressed as p,g , (5) where the term in the bracket stands for the composition frequency-domain channel response at the c (k) p th subcarrier of the kth user resulting from total N cl clusters. Then the overall received signal from K users at the mth antenna can be expressed as Stacking the received signals from all M antenna elements at the nth sample, we obtain the following spacedomain snapshot vector γ n,g = γ 1,g (n), γ 2,g (n), . . . , γ M,g (n) T .
Define the Vandermonde vector which reflects the DOA of the ith cluster of the kth user, and obtain the corresponding Vandermonde matrix by collecting N cl Vandermonde vectors. Considering the noise item, we can then rewrite g ng in the following matrix form (2) , . . . , (K) ), and n n,g is a length-M additive white Gaussian noise (AWGN) vector with variance matrix σ 2 n I M at the nth sample in the gth block.

Joint CFO and DOA estimation
Properties of the subspace Stacking L (L ≤ N) continuous space-domain snapshot vectors from the nth to the (n+L-1)th sample time, we obtain The effect of the parameter L will be discussed later.
we can rewrite A as We obtain the correlation matrix of γ g | n+L−1 n as follows where s being the average power of the transmitted signals. Then, we have R γ = σ 2 s AA H + σ 2 n I ML . In practice, using successive L s OFDMA blocks, this correlation matrix can be approximatd by Afterwards, we assume the matrix A is tall and has full column rank, and the corresponding discussion will be presented later. Performing singular value decomposition (SVD) on R g gives.
where U g and V g represent the (N cl N sum )-dimensional signal space and (ML -N cl N sum )-dimensional noise space matrices, respectively. We define the following length-L parameterized Vandermonde vector with respect to ξ: . For notational convenience, we denote 0 as the all-zero matrix with appropriate dimension. Lemma 1 gives the key properties to design our joint estimator: Lemma 1: When the matrix A − has full column rank, then for a non-zero length-M vector ω, there holds and where A − is the first M(L-1) rows of A which can be expressed as Proof. See Appendix 1. □
It implies the matrix ∏ (k) (ξ) drops rank if and only if ξ = ξ (k) . Based on above observations, we design the CFO estimation as follows. Select a trial ξ from (-0.5, 0.5) and compute the M eigenvalues of the matrix ∏ (k) (ξ), denoted by κ M (ξ ) in ascending order. The CFO for the kth user can be obtained from 1D search by minimizing the following cost function:

DOA estimation
We denote ε Thereby, after the CFO estimation for the kth user, we can further derive the N cl DOAs for the kth user by finding the N cl minimum point of the following cost function: . , e j2πχ(M−1) cos ϕ ] T . Note that the polynomial rooting approach can be used to implement this minimization problem. The basic idea is first obtaining all local minimum/maximum solutions by setting the derivative of the cost function to be zero, and then putting these solutions back to the original cost function and selecting the minimum after comparison [22]. Specifically, According to [22], we know z = e j2πχ cosφ (k) i , i = 1, 2, . . . , N cl , is one of the roots for the polynomial By putting the roots on the unit circle of Q(z) back to the original cost function g (k) () and selecting the N cl minimum points after comparisons, the solution to (22) is obtained. Afterwards, for A − with full column rank, we obtain Lemma 2: Lemma 2: Assume all the K users have distinct DOAs. According to the number of subcarriers allocated, we arrange the user in descending order, as follows e 1 , e 2 , . . . , e K , such that N e k ≥ N e k+1 , then when where v = M N c1 ≥ 2 with ⌊·⌋ denoting the integer floor operation, the matrix A − has full column rank.
Proof. See Appendix 2. □ Note that when A − has full column rank, the matrix A will be tall and also has full column rank, which guarantees the validity of the SVD operation in (15). Following Lemmas 1 and 2, we could make the following important observation; that is when M ≥ 2N cl , i.e., the number of antennas at the receivers is not less than two times of the number of channel clusters from each user, the validity of our method is guaranteed with L satisfying the condition in (25).

Performance analysis
In this section, we provide theoretical performance analysis for both the CFO and DOA estimation performance of our proposed method. Bearing in mind that we can rewrite Π (k) (ξ) as follows Then, (20) and (22) can be rewritten as follows:

CFO estimation performance
We rewrite (15) as where σ 2 i and e i , i = 1, 2, ..., N cl N sum , denote the eigenvalues and eigenvectors corresponding to signal space, respectively, while the remaining σ 2 i = σ 2 n and e i , i = N cl N sum + 1, ..., ML, correspond to the noise space. Let . Let x stand for the estimated value for x. The CFO estimation variance of the kth user can be given by [23]: We denote (k) In the following, we omit the parameterized notation (ξ) for presentation clarity. Then, we have Next, we obtain where Using and skipping some algebraic steps, we can simplify (38) as where Likewise, we can also obtain Furthermore, using we can further obtain Based on the results from [24], we know where and J tr is a submatrix of I N cl N sum which is given by J t,r = I N cl N sum (1 + (t − 1)M : 1 + (t + L − 2)M , 1+(r -1)M : 1+(r+ L-2)M).
Substituting (48) and (49) into (47) and after some algebraic steps, we arrive at By defining we can further rewrite (51) into the following more compact form where d σ = [ 1 Based on the fact that V H γ A = 0, we further obtain Then, we have Bearing in mind that (54) is infinitesimal under high signal-to-noise ratio (SNR) as compared to (53), we can then approximate (52) as (56) On the other side, we can also rewrite ∂ 2 G (k) ∂ξ 2 as follows Finally, substituting (55) and (57) into (31), we arrive at From (58), we make the following observations: First, it is seen that, the CFO estimation variance is decreased when SNR, i.e., σ 2 s /σ 2 n , increases. Second, increasing the number of blocks, i.e., L s , also improves the CFO estimation performance. Third, the relationship between the value of L and the CFO estimation variance is quite complicated. We see that, decreasing the value of L, on the one hand, will decrease the value of 1 (N−L+1) 2 ; on the other hand, it will also decrease the eigenvaluesσ 2 i , resulting in larger entries ofd σ In fact, from (11) and (14), we see that, the effect of L resembles the smoothing technique in array signal processing. That is, decreasing the value of L, on the one hand, will reduce the fluctuations of the estimated correlation matrixR ; on the other hand, it also decreases the potential for higher resolution of the subspace algorithm. We will later investigate the impact of L on the estimation performance via numerical simulations.

DOA estimation performance
Denote μ Likewise, the DOA estimation variance of the kth user is given by We have We further obtain where Let Δ x denote the perturbation of x, i.e., x =x − x.
Bearing in mind that ν (k) l , l = 1, 2, . . . , M, denotes the eigenvector of matrix (k) (ξ (k) ) , its perturbation is introduced by both the CFO estimation error of the kth user and the perturbations of the eigenvectors e i , i = 1, 2, . . . , N cl N sum . Hence, using the first-order approximation, the perturbation of ν (k) l can be expressed as Likewise, we can also rewrite T i, as By definingQ (k) = Q (k) + (Q (k) ) H , we obtain Afterwards, following similar steps from (46) to (56), we can rewrite (65) as wherẽ (k) Bearing in mind that we can then rewrite (59) as Interestingly, the DOA estimation variance in (74) exhibits a similar format with the CFO estimation variance given in (58). We can immediately observe that, the DOA estimation performance can be improved by increasing SNR or the number of blocks. Likewise, the relationship between the value of L and the DOA estimation performance is not that evident. We will leave its investigation in the simulations.

Computational complexity analysis
Let us now evaluate the computational complexity of our estimator in terms of the number of complex multiplications. We denote a as the number of trial CFO values to derive the solutions in (20). Calculation ofR γ and its  [18]. Thus, the complexities of our method and ESPRIT-2 are in the same order. Bearing in mind that the algorithm is implemented at BS rather than the users, the required complexity is not the bottleneck [18].

Simulations
In this section, we assess the performance of the proposed CFO and DOA estimation algorithm from computer simulations. The total number of subcarriers is taken as N = 64. The quadrature phase-shift keying (QPSK) constellation is adopted. The number of multipaths within each channel cluster is L p = 10. In each trial, the normalized CFO of each user is randomly generated from -0.4 to 0.4, and all subcarriers are allocated to each user uniformly with randomly generated SAS. The root mean square error is adopted to evaluated the estimation performance.
First, we consider the single cluster scenario, i.e., N cl = 1. Both the CFO and DOA estimation performance versus SNR are shown in Figure 1. In this example, we set M = 2, K = 4, L s = 64 and L = 50, and the DOAs from set {30°, 45°, 95°, 100°}. Both the simulation and analytical results of the proposed method are presented in the figure. It is demonstrated that the analytical results closely match the simulations, especially under moderate or high SNR region, which verifies the correctness of our analysis. Besides, the performance of [18,19] are also presented for comparison, which are referred to as 'ESPRIT-1' and 'ESPRIT-2', respectively. Note that there is a similar smoothing parameter L in ESPRIT-1, which will be set the same as ours in the following unless otherwise stated. The results clearly demonstrate that substantial improvement can be achieved by our proposed method compared to the other two counterparts, especially in the lower SNR region. This can be explained as follows. First, we have exploited the smoothing technique in our method, which improves the estimation of the correlation matrix; Second and more importantly, both ESPRIT-1 and ESPRIT-2 derive the CFO or DOA for each user by an indirect approach: they first estimate multiple CFO or DOA values separately for each occupied subcarrier of each user, and then combine them as the final result. Hence, lots of redundant parameters need to be estimated from the subspace algorithm, making both ESPRIT-1 and ESPRIT-2 less efficient. However, in our method, we make use of the fact that there is only one CFO and only one DOA for each user, and these two parameters are directly estimated by the designed rank reduction approach.
Next, we evaluate the performance of our method in the multi-cluster scenarios with K = 4 users. Both two clusters, i.e., N cl = 2, and three clusters, i.e., N cl = 3, are considered in this example. For   [19]; ◊: the method from [18]. effectiveness of our method for multi-cluster scenarios. The analytical results also closely match the simulations. Combing the observations from Figures 1 and 2, we can conclude that, as compared to ESPRIT-1 [19] and ESPRIT-2 [18], our method not only has the advantage of being applicable to multi-cluster channels, but also can obtain much better performance in single cluster channels.
In this example, we assume SNR = 20 dB, M = 2, K = 4, and L s = 64, while the DOAs of the users are set the same as that in Figure 1. The estimation performance evolution of our method under the single cluster scenario is shown in Figure 3, with L increasing from the lower bound 49 calculated from (25) b to N = 64. The analytical results are presented by the dashed curves. For comparison, we also include the corresponding performance of ESPRIT-1 and ESPRIT-2. Note that since ESPRIT-2 does not have the parameter L, its performance is presented as a straight line parallel to the x-axis. From the results, we observe an optional range of L that can be selected without considerable loss of performance, e.g., from 49 to 62 in this example. Moreover, we see that, our method always obtains better DOA estimation performance than the other two candidates under all values of L. On the side of CFO estimation, our method always behaves better than ESPRIT-1, while as compared to ESPRIT-2, our method behaves worse only when L = 64. In addition, some mismatch appears between the simulation and analytical results of our method at the right-hand end of the curves. This is because that, when L is very large, the estimation for the correlation matrix R g becomes worse, whose fluctuation may result in the occurrence of some outliers in simulations.
Afterwards, we investigate the performance of our method with the increasing of L s in Figure 4, where the DOA configuration is the same as that in Figure 3. Besides, we assume SNR = 20 dB, M = 2, K = 4, and L = 50 in this example. As expected, performance improvement can be observed with the increase of blocks adopted. Our method almost converges within 30 blocks. Moreover, it is seen that, in terms of both CFO and DOA estimation, our method behaves better than ESPRIT-1 under all values of L s . Note that since ESPRIT-2 does not work when L s < 64, its performance is not included in this figure. In addition, some mismatch appears between the analytical and simulation results of our method at the left-hand end of the curves. This is also because of the occurrence of outliers when a small number of blocks are adopted.   [19]; ◊: the method from [18].

Conclusions
In this article, we developed a multiuser joint CFO and DOA estimation scheme for OFDMA uplink transmissions. We employed multi-antenna at the receiver and exploited the rank reduction approach to derive the CFO and multiple DOAs simultaneously for each user. The proposed scheme supports generalized SAS and can provide fully loaded transmissions such that the bandwidth efficiency is improved. Both analysis and simulation results corroborated the proposed study and demonstrate its advantages over the existing schemes.
Further, we rewrite the above equation as Combing these two equations yields p ,i should equal to zeros. Thus, this is in contrary with the assumption of (79). The implication is that B (k) p (ξ ) ⊗ ω does not belong to the column space of A when ξ ≠ ξ (k) . Finally, we arrive at (18).

Appendix 2
Proof of Lemma 2 Define v = ⌊M/N cl ⌋ and assume v ≥ 2. Here, we consider only the case of M = vN cl . Note that the results obtained here is directly applicable to the case of M >vN cl .