PARAFAC-based frequency synchronization for SC-FDMA uplink transmissions

Frequency synchronization is a critical requirement for single-carrier frequency division multiple-access (SC-FDMA) uplink transmissions. Based on the key observation that the received signal at the base station can be formulated as the parallel factor analysis model, we present a joint carrier frequency offset (CFO) estimation and compensation scheme for SC-FDMA uplink transmissions with interleaved sub-carrier allocation. The proposed scheme, employing identical training symbols, guarantees the identifiability of CFO estimation, allows the system to operate on full load, and it is readily extendible to multi-antenna base station (BS) reception. Comparisons with existing approaches in the literature reveal that the proposed algorithms provide lower complexity and superior performance.


Introduction
Orthogonal frequency division multiple-access (OFDMA) is a multi-carrier multi-user transmission technology used in various wireless communication systems nowadays, e.g. [1][2][3]. However, high peak-to-average-power ratio (PAPR) and sensitivity to carrier frequency offsets (CFOs) are the two main challenges in implementation of OFDMA systems. High PAPR forces the power amplifier to operate with a back-off, reducing the energy efficiency of the system. Energy efficiency is a major concern in uplink (UL) transmissions for battery-operated mobile users (MUs) due to limited power resources. To resolve the high PAPR issue, a variant of OFDMA termed as single-carrier frequency division multiple-access (SC-FDMA) has been proposed [4], which employs extra discrete Fourier transform (DFT) precoding at the transmitter (TX) to reduce the PAPR of the transmitted signal. Having the advantage of lower PAPR, SC-FDMA is employed in longterm evolution (LTE) UL transmissions as opposed to OFDMA [3]. However, the reduction in PAPR is also influenced by the choice of sub-carrier allocation scheme in UL transmissions [5]. Sub-carrier allocation schemes are broadly classified into three types: grouped, interleaved, *Correspondence: usman.gul@gatech.edu School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250, USA and generalized sub-carrier allocations [6]. Among the three schemes, interleaved allocation scheme (IAS), in which the sub-carriers of different users are interleaved into each other, provides the lowest PAPR [5]. In addition, in a perfectly synchronized system, SC-FDMA with IAS outperforms OFDMA with minimum mean squared error (MMSE) equalization [7], which motivates the use of IAS in SC-FDMA UL transmissions.
However, being a variant of OFDMA, SC-FDMA transmissions are still sensitive to CFOs. CFOs arise in UL transmissions due to the residual errors in the downlink (DL) synchronization or Doppler shift induced by the mobility of MUs. The orthogonality of the sub-carriers is violated by CFOs, introducing inter-carrier interference (ICI). In addition, as different MUs have independent CFOs, multiple UL CFOs also introduce multi-user interference (MUI). Without precise CFO estimation and compensation at the base station (BS), corresponding to each MU, ICI, and MUI severely degrade the system performance. Thus, frequency synchronization is a critical step in SC-FDMA transmissions just as in OFDMA. As sub-carriers of different users as interleaved into each other, CFOs generate more MUI in IAS as compared to grouped or generalized allocation schemes. The superior performance of SC-FDMA with IAS in a perfectly synchronized system is lost in the case of non-zero CFOs [8] implying that SC-FDMA is more sensitive to CFOs as http://asp.eurasipjournals.com/content/2014/1/146 compared to OFDMA and motivating the need for an efficient frequency synchronization algorithm for SC-FDMA with IAS.
For CFO estimation in SC-FDMA with IAS, multiple signal classification (MUSIC) algorithm is employed in [18] for blind CFO estimation using a line search. A closed-form solution with better performance at lower signal-to-noise ratios (SNR) is proposed in [19] employing the estimation of signal parameters via rotational invariance technique (ESPRIT). As both MUSIC and ESPRIT are sub-space-based techniques, the CFO estimation schemes in [18,19] require a non-empty noise subspace for CFO estimation. This condition implies that the system cannot operate on full load or an extended cyclic prefix (CP) must be employed, which reduces the overall spectral efficiency. Low-complexity blind CFO estimation algorithm has been proposed in [22] while two CFO estimation methods for multi-antenna BSs, based on rank reduction and alternating projection methods are proposed in [21]. Although the schemes in [21] support fully loaded systems, the number of receive antennas must be greater than the channel order, which restricts its applicability in broadband systems. Data-aided maximum likelihood (ML) CFO estimation has been proposed in [16]. However, the complexity of the proposed scheme is too high for practical implementations. A sub-optimal algorithm with lower computational complexity but degraded performance is proposed in [17].
In addition to CFO estimation, various CFO compensation algorithms have also been proposed in the literature [24][25][26][27][28][29]. An interference cancelation scheme based on circular convolution is proposed in [24] while parallel interference cancelation is used in [25] for CFO compensation. A linear decorrelator detector is proposed in [26]. However, the method is computationally intensive as it requires inversion of a large matrix for satisfactory performance. A joint CFO compensation and channel estimation algorithm is proposed in [27] for multi-antenna SC-FDMA systems. Several time-domain CFO compensation schemes paired with successive interference cancelation and MU ordering are proposed in [24]. A low-complexity CFO compensation scheme for OFDMA transmissions has been proposed in [29] for interleaved OFDMA transmissions.
In this paper, we propose a joint CFO estimation and compensation scheme for SC-FDMA UL transmissions with IAS using parallel factor (PARAFAC) analysis method [30]. PARAFAC, an extension of matrix decomposition to tensors, has found numerous applications in signal processing including multi-user detection in CDMA systems [31,32], CFO estimation for DL OFDM systems [33], channel equalization [34], and array signal processing [35,36]. However, the scheme proposed in [33] is for single-user OFDM transmissions and it is not applicable to multi-user UL SC-FDMA transmissions, discussed in this paper. Employing identical training symbols and exploiting the structure of transmitted SC-FDMA signals with IAS, we show that in the presence of CFOs, the signal received at the BS can be formulated as the PARAFAC model. This key observation allows us to apply tensor decomposition methods to jointly retrieve the estimates of CFO and the MUI-free received training symbol corresponding to each user. The formulation also allows us to perform a low-complexity CFO compensation for subsequent data symbols. The proposed algorithm is semi-blind in the sense that it requires identical training symbols but the actual knowledge of the training symbols is not required. Thus, the training signals can be designed to optimize other receiver operations like timing synchronization or channel estimation. We prove that the proposed algorithm guarantees the identifiability of CFO estimation while allowing the system to operate on full load as opposed to [18,19,21]. In addition, the proposed algorithm can be readily extended to multi-antenna BS reception and offers better performance as compared to the existing approaches in the literature.
The rest of the paper has been organized as follows. Section 2 presents the system model and the structure of transmitted and received signals in SC-FDMA uplink transmissions. The proposed joint CFO estimation and compensation method is presented in Section 3 along with the discussion on identifiability. Section 4 presents the extension of the proposed algorithm to multi-antenna BS case while tensor decomposition algorithms and the associated computational complexity are discussed in Section 5. Section 6 presents the simulation results followed by conclusions in Section 7.

System model
We consider a baseband-equivalent system model of SC-FDMA UL transmissions where M active users transmit to a BS using their allocated sub-carriers. The maximum number of users allowed in the system is denoted as M u . The total number of sub-carriers is denoted as N http://asp.eurasipjournals.com/content/2014/1/146 while the number of sub-carriers allocated to each user is R = N M u . The transmission takes place in the form of frames where each frame contains a block of Q identical SC-FDMA training symbols followed by data symbols. The block diagram of the SC-FDMA system is shown in Figure 1.

Transmitted signal structure
As IAS is employed, the exclusive set of sub-carriers allocated to the mth user, 0 ≤ m ≤ M − 1 is denoted as where v m is the unique starting index of the mth user. The qth SC-FDMA training symbol contains R training points, which are precoded by an R point DFT and then mapped to the sub-carriers allocated to the mth user. Thus, the kth sub-carrier, 0 ≤ k ≤ N − 1, of the qth training symbol transmitted by the mth user is given as As the Q training symbols are identical, we drop the index q in x m [r ]. A N point inverse discrete Fourier trans- X m q, k e j 2π N kn , 0 n N − 1. ( As X m q, k is non-zero only for k ∈ I m , Using Equation 1, Equation 3 can be written as where δ(·) is the Kronecker delta function, and (·) R represents the modulo R operation. Equation 4 shows that the transmitted SC-FDMA signal of the mth user contains M u copies of the original signal x m [r] , 0 ≤ r ≤ R − 1 scaled by the phase term corresponding to the starting index of the user. A CP of N g samples is then appended at the start of each SC-FDMA symbol to combat inter-symbol interference (ISI) due to multi-path channel.

Received signal structure
The channel between the mth user and the BS is assumed to be of length P m and denoted ash m [p] , 0 ≤ p ≤ P m − 1.
We assume that the channel remains constant for the Q training symbols, and incorporated into its channel and the effective channel is given as Such a system is termed as quasi-synchronous system [6]. Once the TO estimate of each user is estimated at the BS, it is fed back to the user to advance its timing correspondingly and thus, the length of the CP of following symbols can be reduced.
After CP removal, the qth received symbol at the BS is the sum of the M SC-FDMA symbols transmitted by the MUs and given as where f m is the CFO between the mth user and the BS normalized by the sub-carrier spacing of the SC-FDMA system, N t = N + N g , and w[ q, n] represents the corresponding additive white Gaussian noise (AWGN) sample.
We assume that |f m | < 0.5, ∀m implying that only fractional CFOs appear in the UL transmission. It is a reasonable assumption since UL transmissions initiate after DL synchronization, which implies that the UL CFOs can only be caused by the Doppler effect or residual DL synchronization errors. Using Equation 4, Equation 6 can be written as If the effective channel length P ≤ R, the inner sum in Equation 7 is also periodic with period R. If we denote where φ m = f m + v m and ψ m = f m N g . Thus, the received signal is also periodic except for the phase terms introduced by the CFOs and the starting indices. As non-zeros CFOs introduce ICI and MUI, which severely degrade the overall system performance, CFO estimation and compensation must be performed before channel estimation, equalization, and subsequent receiver processes as shown in Figure 1.

CFO estimation and compensation for SC-FDMA UL with IAS
In this section, we present the proposed CFO estimation and compensation algorithms for SC-FDMA UL transmissions. As the proposed scheme is based on PARAFAC analysis, we start by showing that the received signal in Equation 9 can be formulated as a PARAFAC model, which will allow us to apply tensor decomposition algorithms to estimate the CFO and the MUI-free signal of each user. For an introduction to PARAFAC, refer to [30,31].

SC-FDMA UL as a PARAFAC model
If we stack the periods of the qth received signal in (9) in then Y(q) can be expressed as where and D M (q) is a M × M diagonal matrix with diagonal entries e j 2π N f m qN t 0 m M −1, and W(q) is an M u ×R matrix containing the corresponding AWGN samples.
As the Q training symbols are identical, if we stack all Y(q) matrices corresponding to the Q received training http://asp.eurasipjournals.com/content/2014/1/146 symbols, the resulting QM u × R matrix can be expressed as or where represents the Khatri-Rao product [30] and Similarly it can be shown that where Y R is a RQ × M u matrix, which is a concatenation of R matrices with rth matrix given as ⎡ where In the absence of noise, Equations 14, 16, and 17 form the PARAFAC model of three-way tensors with rank M [30], which allows us to apply tensor decomposition algorithms to obtain estimates of A, B, and C, denoted asÂ, B, andĈ, respectively. The details of the algorithms along with the discussion on computational requirements are deferred to Section 5. The CFO estimate and the MUIfree received signal estimate corresponding to each user can then be extracted fromÂ,B, andĈ as shown in the following sections. We refer to Equations 14, 16, and 17 collectively as the PARAFAC model. A key advantage of employing the PARAFAC model is that the uniqueness of the signal decomposition in the PARAFAC model and thus, identifiability of CFO estimation can be guaranteed.

Identifiability of CFO estimation
We now discuss the conditions, which guarantee the identifiability of CFO estimation by using the uniqueness properties of the PARAFAC model. Identifiability refers to the uniqueness of the CFO estimates. Guaranteeing the identifiability of CFO estimation is an important requirement for channel frequency selectivity as well as MUI can affect the identifiability of CFO estimation [13,37].
A key advantage of applying the PARAFAC model is that its uniqueness can be guaranteed. A sufficient condition is presented in [38], which says that the decomposition in Equation 14 is unique up to scaling and column permutation if where k A denotes the k-rank of A. Specifically, if rank(A) = M and every l ≤ M columns of A are linearly independent, then the k-rank of A is l [31]. We now investigate the k-rank of the matrices involved in the proposed PARAFAC model for the SC-FDMA UL in Equation 14.
Recall that φ m = v m + f m in Equation 11. As the CFO of each user is fractional, i.e., f m < 0.5 as mentioned in Section 2, and the starting index v m of each user is unique, v m − 0.5 < φ m < v m + 0.5 in Equation 11 lies in a unique interval for each user even if two or more users have CFO values close to each other. Therefore, using the fact that A is a Vandermonde matrix, it has full k-rank, i.e., As the PARAFAC decomposition is unique only up to scaling and column permutations, matching the columns to the respective users is required once the estimates of the matrices are available. Also, the scaling ofĈ must be identified and corrected. As each entry of A and B is a complex exponential, any scaling of columns ofÂ andB can be easily identified. The scaling of the columns of C can then be rectified by dividing each column with the product of the scaling of respective columns of A and B. As φ m = v m + f m lies in a unique interval for each user, http://asp.eurasipjournals.com/content/2014/1/146 as discussed earlier, any column permutation can also be easily identified throughÂ. Specifically, the index of the column ofÂ corresponding to the mth user, denoted as c m is given as (see Appendix A)

CFO estimation using PARAFAC
As both A and B depend on CFO,Â and/orB can be employed for CFO estimation after correction of scaling and column permutation. We use the entries ofB, and the CFO of the mth user is estimated as (see Appendix B) Thus, the acquisition range of CFO estimation is f max = N 2N t sub-carrier spacings.
Remark 1. The CFO estimation schemes proposed in [18,19,21] cannot operate on fully loaded systems and require either CP extension or multiple antenna BSs to support full load. However, the proposed CFO estimation is independent of the number of active user and thus, provides better spectral efficiency as compared to existing algorithms.

Remark 2.
It is now clear that the proposed CFO estimation only requires the transmissions of at least two identical training symbols. The idea of identical training signals has been employed before in timing and carrier synchronization for OFDM system, e.g., see [40]. Also, two out of four preambles employed in random access in LTE-Advanced contain two identical symbols [3]. However, it is important to note that the knowledge of actual data transmitted on the sub-carriers of the training signals is not required for CFO estimation. Thus, the proposed scheme can be referred as a semi-blind CFO estimation, and the training signal can be designed to optimize other receiver tasks such as timing synchronization and/or channel estimation.

CFO compensation for the training symbol
Apart from CFO estimates obtained throughÂ andB, the PARAFAC model also provides the estimate of C, whose columns contain MUI-free received signals corresponding to each user. Thus, another advantage of PARAFAC decomposition is that it automatically decouples user's received training signals, and CFO compensation can be performed for each user separately to recover y m [r] 0 ≤ r ≤ R − 1 in Equation 8. Specifically, Once the CFO-compensated training signal of each user is available, it can be employed for channel estimation. Standard channel estimation techniques, like least squares (LS) or MMSE channel estimation [41] are applicable to the estimate of the received training signal in Equation 21.

CFO compensation for the data symbols
In this section, we present the CFO compensation scheme for SC-FDMA data symbols following the training symbols. As the proposed PARAFAC model requires multiple identical training symbols, PARAFAC decomposition algorithms cannot be employed for CFO compensation of data symbols following the training, for data symbols cannot be identical. However, the received data symbols still satisfy Equation 10 with C replaced by a symboldependent matrix C(q) whose columns contain the MUIfree received data symbols of the users. As the receiver has already estimated the CFOs, it has the knowledge of A andD M (q) in Equation 10, and C(q) for the qth data SC-FDMA symbol can estimated aŝ whereÂ † denotes the pseudo-inverse of A. CFO compensation for each user can then be performed separately using Equation 21. Thus, the formulation of the received signal in Equation 10, leading to the PARAFAC model, also allows us to design a time-domain CFO compensation scheme for data symbols using Equations 22 and 21.
The CFO compensation scheme in [29] also formulates the received periodic signal in interleaved OFDMA transmissions as a low-dimensional matrix form similar to Y(q) for CFO compensation. However, the scheme in [29] is based on OFDMA transmissions, and thus, the scheme in [29] is not directly applicable to SC-FDMA transmissions.

Complexity analysis of CFO compensation
We now calculate the computational requirements of the proposed CFO compensation scheme under full load and compare it with some of the existing schemes in the literature. Complexity analysis of PARAFAC decomposition for CFO estimation is discussed in Section 5.
Calculation ofĈ(q) in Equation 22 requires inversion ofÂ, which can be efficiently implemented using the fact that A is Vandermonde matrix and a closed-form solution for its inverse exists with a complexity of only O M 2 u operations [42]. Thus, the total computational load of Equations 22 and 21 is O (N + 1)M u + M 2 u . In contrast, the time-domain CFO compensation scheme in [28] requires the inversion of an N × N block diagonal matrix with O M 2 u N + N 2 operations while the linear decorrelation-based frequency-domain CFO compensation scheme in [26] requires inversion of an N × N matrix with O (N 3 ) operations. The complexity of CFO compensation in [26] can be lowered by a banded matrix approximation at the cost of performance degradation. Thus, compared with the existing schemes, the proposed CFO compensation has the advantage of significantly reduced computational complexity.
This completes the discussion on the proposed CFO estimation and compensation techniques, and the block diagram of the proposed algorithms is shown in Figure 2. We now discuss the extension of the proposed algorithm to multi-antenna BS reception.

Extension to multi-antenna BS case
In this section, we show that the proposed synchronization schemes can be readily extended to the case where the BS has multiple antennas. Specifically, if the BS has N r receive antennas, we denote the parameters associated with the tth antenna by using the subscript 't' . Thus, y[q, n], y m [r] , and h m [p] in Equations 8 and 9 are replaced by y t [q, n], y t,m [r], and h t,m [p], respectively. We assume that the channels between a user and different receive antennas are independent. However, receive antennas are collocated, and all receive chains are assumed to employ a common clock, which implies that the CFO across receiver chains remains constant. This allows us to employ the PARAFAC model in Equation 14 for the multi-antenna BS case with A and B unchanged as in Equations 11 and 15, respectively, while Y Q in Equation 14 is updated as which is a concatenation of Y t (q) in Equation 10 corresponding to different receiver antennas. Correspondingly, C in Equation 12 is updated as with the element in rth row and mth column of C t given as Thus, the PARAFAC model is still applicable to the multi-antenna BS reception, and tensor decomposition can still be applied to jointly estimate the CFOs and MUIfree received signal corresponding to different users and receive antennas. A better CFO estimation performance is expected in this case due to more information brought by the additional receive chains at the cost of increased computational complexity.

Remark 3. As the dimensions of C in Equation 24
grow to N r R × M for multi-antenna BS as opposed to R × M in Equation 12 for single-antenna BS, the k-rank of C is given as k C = min(N r R, M). Thus, according to the discussion in Section 3.2, identifiability of CFO estimation in multi-antenna BS case is guaranteed if N r R ≥ M, which is a less restrictive condition as compared to R ≥ M for single-antenna BS. Moreover, the CFO compensation scheme presented in Section 3.5 is also applicable to multi-antenna reception. Since A and B remain unchanged,Â † in Equation 22 needs to be calculated only once for CFO compensation corresponding to all receive antennas. Thus, the complexity of CFO compensation increases only linearly with respect to the number of receive antennas.

Tensor decomposition algorithms and the associated computational complexity
In this section, we discuss how to estimate the matrices involved in the proposed PARAFAC model in Section 3.1.

PARAFAC decomposition using alternating least squares
ALS is one of the most popular algorithms for PARAFAC, which we briefly describe here in the context of SC-FDMA CFO estimation. Given the three equivalent forms of PARAFAC model in Equations 14, 16, and 17, A, B, and C matrices can be estimated by minimizing either of the following cost functions The idea of ALS is to iteratively solve these three cost functions and update one of the three matrices while keeping the other two constant. Thus, for iteration i, given A (i−1) andB (i−1) ,Ĉ i is calculated as the LS solution of J Q given aŝ Similarly,Â (i) andB (i) are then updated aŝ respectively, and the algorithm moves on to the next iteration. The initial values, i.e.,Â (0) andB (0) required for starting the algorithm are computed by setting the CFO of each user to zero, i.e., f m = 0 ∀m in Equations 11 and 15, respectively. The effect of number of iterations on the convergence of the algorithm will be shown through simulation results in Section 6.

Computational complexity of PARAFAC decomposition
The computational complexity of ALS is dominated by the pseudo-inverse calculations required in Equations 25, 26, and 27 for each iteration. Thus, the computational complexity is given by O N i (QM u + N + QR)(M 2 + M) + NQM operations [30], where N i is the number of iterations of ALS. As a comparison, the complexity of data-aided ML CFO estimation in [16] is O MN c N i (P 3 + PN 2 ) where N c is the number of candidate CFO values used in the line search, which is significantly higher than the proposed algorithm. On the other hand, the complexity of blind ESPRIT-based CFO estimation in [19] is O M 3 u + M 2 u N , which is about the same complexity order as the proposed algorithm. However, the performance of the proposed algorithm is significantly better than [19] as shown in Section 6.

Simulation results
In this section, we evaluate the performance of the proposed algorithms through Monte Carlo simulations and compare it with some of the existing schemes in the literature. For CFO estimation, we compare with the algorithms in [16,18,19], and for CFO compensation, we compare with the CFO compensation scheme in [26]. The simulation parameters are as follows: the total number of sub-carriers N = 128, length of CP N g = 16, maximum number of users M u = 8, active users M = 4, number of sub-carrier allocated to each user R = N 8 = 16, and channel length for each user P m = 6. For each iteration of the simulation, the CFO of each user is modeled as a random variable uniformly distributed between [ −f max , f max ] with f max = 0.4. The modulation scheme is quadrature phase shift keying (QPSK). The transmission is modeled in the form of frames, where each frame contains Q identical training symbols in the beginning followed by 20 SC-FDMA symbols carrying data. CFO and channel estimation is carried out using the training symbols, and the estimates are employed to decode all the data symbols in that frame. For comparison, Q symbols are also employed for CFO estimation using [16,18,19]. Figure 3 shows the performance of the proposed CFO estimation in terms of mean squared error (MSE) of CFO estimation with increasing value of f max , i.e., the maximum absolute CFO at an SNR of 20 dB. The rest of the simulation parameters is given in the beginning of this section. As the maximum CFO increases, ICI and MUI increase, and we expect to see degradation in CFO estimation performance. This trend is visible in Figure 3 for both values of simulated SNR as the MSE of CFO estimation increases slightly as f max increases.

Effect of number of training symbols Q on CFO estimation
We now evaluate the effect of number of training symbols Q and the number of iterations of ALS on the MSE of CFO estimation at an SNR of 20 dB. The values of Q range from 2 to 5, and the results are shown in Figure 4. As shown, the MSE of CFO estimation decreases as more training symbols and/or more iterations of ALS are employed for CFO estimation. However, the price for the increase in CFO estimation accuracy is decreased spectral efficiency due to more training symbols and/or increased complexity and latency of CFO estimation due to more iteration of ALS. Thus, there exists a trade-off between CFO estimation performance and spectral efficiency/complexity. Figure 4 also shows that the improvement in performance of CFO estimation is negligible after about 4 to 6 iterations of ALS. Thus, in the following simulations, only 5 iterations of ALS are employed.

CFO estimation performance for multi-antenna BS
In this section, we evaluate the performance of the CFO estimation for multi-antenna BS reception. As the CFO remains constant across receiver antennas at the BS, the performance of CFO estimation is expected to improve as more antennas are employed at the BS. This trend is  Figure 5, which shows the MSE of CFO estimation for multi-antenna BS reception. As the number of receive antennas N r is increased from 1 to 4, about 7 dB improvement in performance is observed.

Comparison of CFO estimation performance with existing schemes
In this section, we compare the performance of the proposed CFO estimation with the schemes in [16,18,19] denoted as 'MUSIC' and 'ESPRIT' and 'alternatingprojection frequency estimation (APFE)' with singleantenna BS. The same number of symbols Q = 2 is employed for CFO estimation in all schemes although both schemes in [18,19] are truly blind schemes and do not require knowledge of the training symbols. The proposed scheme is semi-blind as discussed in Section 3 because it does not require the knowledge of the training symbols and only requires that the Q = 2 training symbols are identical. The scheme in [16] Figure 6, shows that the performance of the proposed scheme is significantly better (8 to 10 dB improvement) than the MUSIC and ESPRIT schemes. The performance of ESPRIT is better than MUSIC for lower SNR while the opposite is true for higher SNR. As both MUSIC and ESPRIT are blind CFO estimation algorithms, CFO estimation accuracy can be increased by employing data symbols for CFO estimation. However, it will increase the complexity and latency of CFO estimation. Compared to APFE, the proposed algorithm offers about 1 dB performance improvement for SNR > 5 dB. However, it should be noted that the complexity of the proposed algorithm is significantly less than APFE as discussed in Section 5. The complexity of CFO estimation in [16] can be lowered by using sub-optimal algorithms at the cost of degradation of CFO estimation accuracy. Compared to the MCRB, the performance of the proposed algorithm is consistently close to the absolute reference provided by MCRB.

Comparison of CFO estimation performance with increasing number of users
In this section, we evaluate the performance of the proposed algorithm with increasing number of active users (M) and compare it with the existing schemes in the literature. The number of users varies from a single user (M = 1) to full load (M = 8) while the operating SNR is 20 dB. The results are shown in Figure 7.
As the number of users increases, MUI increases, and the performance of each algorithm decreases. As the subspace-based algorithms cannot operate on full load, the performance of MUSIC and ESPRIT algorithms is severely degraded on full load. The performance of the proposed algorithm is consistently better than the other algorithms while it also supports full-load transmissions.

Performance of CFO compensation and channel estimation
We now compare the performance of the proposed CFO compensation algorithm with the one proposed in [26]. The performance is evaluated in terms of MSE of channel estimation performed after CFO compensation. LS channel estimation [41] is employed in each case. A better CFO compensation scheme will result in less residual ICI and MUI, which will improve the channel estimation accuracy. We compare the performance of the proposed CFO estimation and compensation with the three different combinations of CFO estimation and compensation. CFO estimation algorithms in [18,19], combined with the CFO compensation in [26] are denoted as 'MUSIC + [26]' and 'ESPRIT + [26]'. As the CFO estimation accuracy for MUSIC and ESPRIT in worse than the proposed algorithm as shown in the previous section, we also evaluate the performance of CFO compensation in [26] combined with the proposed CFO estimation scheme for fair comparison of the proposed and [26]-based CFO compensation schemes. Figure 8 shows the MSE for channel estimation for all scenarios. As shown, the performance of MUSIC + [26] and ESPRIT + [26] is worse than the proposed schemes. Moreover, the performance of the proposed CFO estimation and compensation is better (about 2 dB improvement in performance) than the proposed CFO estimation combined with [26], which shows that the proposed CFO compensation schemes do a better job of extracting the interference-free received signal.  [19]. [18,19] combined with CFO compensation scheme in [26]. It is clear from the figure that the proposed CFO estimation and compensation schemes outperform both MUSIC and ESPRIT-based CFO estimation combined with CFO compensation in [26], which shows the effectiveness of the proposed schemes. These simulation results show that the proposed CFO estimation and compensation schemes offer significant performance improvements in both CFO estimation accuracy and estimation of MUI-free received signals corresponding to different users.

Conclusions
We have presented CFO estimation and compensation algorithms for SC-FDMA UL transmissions with interleaved sub-carrier allocation scheme. The proposed algorithm is based on PARAFAC and guarantees the identifiability of CFO estimation. Moreover, the proposed scheme allows the system to operate on full load gaining better spectral efficiency. Comparisons with the existing schemes in the literature demonstrate that the proposed algorithms have lower complexity and offer considerable performance improvements for both CFO estimation and compensation in SC-FDMA UL receivers.