Joint frequency offset, time offset, and channel estimation for OFDM/OQAM systems

Among the multicarrier modulation techniques considered as an alternative to orthogonal frequency division multiplexing (OFDM) for future wireless networks, a derivative of OFDM based on offset quadrature amplitude modulation (OFDM/OQAM) has received considerable attention. In this paper, we propose an improved joint estimation method for carrier frequency offset, sampling time offset, and channel impulse response, needed for the practical application of OFDM/OQAM. The proposed joint ML estimator instruments a pilot-based maximum-likelihood (ML) estimation of the unknown parameters, as derived under the assumptions of Gaussian noise and independent input symbols. The ML estimator formulation relies on the splitting of each received pilot symbol into contributions from surrounding pilot symbols, non-pilot symbols and additive noise. Within the ML framework, the Cramer-Rao bound on the covariance matrix of unbiased estimators of the joint parameter vector under consideration is derived as a performance benchmark. The proposed method is compared with a highly cited previous work. The improvements in the results point to the superiority of the proposed method, which also performs close to the Cramer-Rao bound.


Introduction
Due to the several benefits of multicarrier modulation (MCM) over single carrier modulation, the former has been considered as the primary choice in the physical layer implementation of telecommunication systems for quite a long time. Among the MCM family, orthogonal frequency division multiplexing (OFDM) has been largely studied and adopted in many wireless and wireline standards [1,2]. Still, as an alternative and promising form of MCM for future generations of wireless networks, a variant of OFDM based on offset quadrature amplitude modulation (OFDM/OQAM) has attracted much research interest in recent years, due to its many advantages over classical OFDM, including higher spectral efficiency and reduced sensitivity to timing and frequency mismatch [3]. In spite of these advantages, accurate carrier frequency and timing synchronization along with channel estimation (for the purpose of equalization) remain of There exist three main categories of synchronization and channel estimation methods for OFDM/OQAM systems: blind, semi-blind and pilot-based methods. While blind methods, e.g., [4][5][6][7], provide higher spectral efficiency by avoiding the overhead of training sequences, the requirement of a longer observation window for accurate estimation limits their tracking ability, rendering them less popular in most practical applications. In contrast, the semi-blind methods only require the transmission of a small number of parameters to resolve an estimation ambiguity, e.g., [8] and as such, they offer a useful tradeoff between spectral efficiency and estimation accuracy. However, since in practical scenarios, the training symbol overhead needed to obtain a better estimation performance is usually tolerated, our focus here is on pilot-based synchronization and channel estimation.
Compared to channel estimation, pilot-based carrier frequency offset (CFO), and sampling time offset (STO) estimation has received less attention in the OFDM/OQAM literature. In [9], a maximum likelihood (ML) symbol timing estimator is derived by using two training symbols per burst transmission. In [10], using the same preamble structure, the authors extend this work by proposing a joint ML-based estimator of the CFO and STO. Then, to avoid the computational complexity of a two dimensional ML search, a feasible joint estimation method, called approximate maximum-likelihood (AML), is developed by assuming a small CFO and using only one OQAM preamble symbol per burst. The same authors, in [11], propose a joint least-squares (LS) CFO and STO estimation method by using two identical OFDM/OQAM pilot symbols per burst transmission. Therein, as a timedomain method, the estimation is performed before the analysis filter bank (AFB) at the receiver. In [12], by using a polyphase network implementation of OFDM/OQAM, the preloading technique, and a conjugate-symmetric preamble, the CFO and STO are separately estimated. The CFO estimation exploits the phase difference between the adjacent pilots while the frame detection and STO estimation are derived based on the conjugate-symmetry property. Moreover, in [13,14], a joint CFO and STO estimation method is proposed by using a four-column preamble per burst transmission, which contains zeros in every other subcarrier and every other symbol time index.
In contrast to the CFO and STO estimation, during the past decade, many pilot-based channel estimation schemes have been proposed for OFDM/OQAM systems, which can be broadly classified into frequency domain and time domain methods. Frequency domain methods, e.g., [15][16][17][18][19][20][21][22][23], rely on the assumption that the symbol duration is much longer than the maximum channel delay spread. While these methods are generally characterized by a lower computational complexity, when the above condition is not satisfied, they will suffer from a performance degradation. Time domain methods, e.g., [24][25][26], attempt to estimate the channel impulse response (CIR) by using sequences of pilot tones. In [24], a time domain CIR estimator is proposed based on the multiple signal classification (MUSIC) and LS algorithms. In [25], a per-subchannel estimator is proposed in which the CIR on each subcarrier is estimated separately. In [26], the authors exploit pilot tone structures in OFDM/OQAM systems to derive two new CIR estimators, namely the linear minimum mean square error (LMMSE) and weighted least-square (WLS) estimators. The former exploits a priori knowledge of the CIR covariance matrix while the latter only requires the knowledge of the channel length; both methods are benchmarked against the Cramer-Rao bound (CRB). In a recent paper [27], based on a combination of the ideas of [15] and [28], a coded auxiliary pilot scheme is proposed for frequency domain channel estimation. The coded auxiliary pilots are carefully designed to compensate for the inherent imaginary interference of OFDM/OQAM. Although the aforementioned synchronization and equalization problems have been separately addressed throughout the literature on OFDM/OQAM systems, only a few research papers can be found, e.g., [29], that are devoted to STO, CFO and channel estimation at the same time, let alone a joint estimation approach based on a unified criterion. In fact, to the best of our knowledge, a joint estimation method for OFDM/OQAM systems, accounting for all the three error sources, i.e., CFO, STO, and CIR, has not yet been developed. Hence, our focus in this paper is to develop and investigate a general estimation method to fill this need.
Specifically, a new formulation of the joint parameter estimation problem in OFDM/OQAM system is first introduced, which is based on splitting the interference term on the desired received pilot into adjacent pilot, nonpilot and noise contributions. Then, by assuming Gaussian noise and independent input symbols, a pilot-based joint ML estimator of CFO, STO, and CIR is derived. Such a general approach offers many advantages, including a unified framework for the estimation of multiple parameters using a common preamble/burst structure and the proper treatment of different types of interference in the estimator derivation. More importantly, a significant performance improvement is expected in the joint estimation of the aforementioned error sources as opposed to their separate estimation. Through numerical simulations of wireless OFDM/OQAM transmission over multipath fading channels, the proposed estimator is evaluated by comparing the accuracy of the resulting parameter estimates to that obtained with a selected benchmark approach among a few existing works where all the three error sources of our focus are estimated 1 , as well as to the CRB. The simulation results show that, the proposed method is capable of significant improvements in parameter estimation accuracy, performing close to the CRB. In turn, this improved performance leads to a lower bit error rate (BER) of the compensated (i.e., synchronized and equalized) transceiver system. This paper is a more developed and improved version of our previous work [30] addressing the joint estimation problem under a more restrictive set of assumptions. Specifically, the new contributions of the current work include the following: • In [30], orthogonality of the OFDM/OQAM analysis/synthesis filters in the complex domain is assumed, as opposed to orthogonality in the real domain only. The former condition leads to important simplifications in the derivation of the ML estimator, especially in the statistical properties of the subband noise and data interference. As a consequence of such simplifications, the resulting estimator in [30] only qualifies as an approximate ML estimator, although it shows performance improvements compared to earlier work. In contrast, herein, by invoking the true orthogonality condition of OFDM/OQAM in the real domain, we can strive for an exact ML-based estimator, which achieve even better estimation accuracy. • Another important contribution of this paper is the analysis of the distributions of the subband noise and data interference terms in the general OFDM/OQAM framework, where the pilot tones can be scattered or appended as preamble to the data. In particular, we show that the subband noise contributions, after the real operation, are uncorrelated along the time and frequency axes, as a consequence of the exact orthogonality relation. Furthermore, we show through analysis and numerical simulations that the data interference terms are well modeled by a Gaussian distribution for which we derive the second order statistics. We further show that the data interference terms are only weakly correlated along the time and frequency axes. We conclude that with a carefully designed pilot distribution, their correlation can be confidently approximated as zero. • Based on this reformulation of the problem and subsequent derivation of an accurate log-likelihood function for the received pilot tones, we derive in detail the CRB for the joint parameter estimation problem under consideration. The former plays a key role in demonstrating the near optimality of the newly derived joint ML-based estimator, whose performance (estimation error) comes within 1 dB or less from the bound. • In addition to the above new theoretical contributions, the paper contains a number of improvements, including a computational complexity analysis and discussion of practical approaches to reduce implementation complexity.
The paper is organized as follows. Section 2 is dedicated to reviewing the OFDM/OQAM system model as implemented in this work. In Section 3, the joint ML estimator of the CFO, STO, and CIR is developed in details based on a new formulation. Several related aspects are also discussed, including: computational simplifications for efficient implementation; evaluation of computational complexity; CFO and STO compensation and channel equalization. The CRB on the unbiased estimator of the aforementioned parameters is derived in Section 4. The methodology used in our simulations and the results are provided in Section 5, while Section 6 concludes the paper. Appendices A and B provide important developments about statistical properties of the subband noise and data interference terms.
Notations: Bold-faced letters indicate vectors and matrices, e.g., A. The (i, j)th entry of a matrix is represented by [ A] i,j . The superscripts T and H stand for the transpose and Hermitian transpose of a vector or matrix, respectively. The operator * represents a linear convolution while the superscript * denotes complex conjugation. The identity and zero matrices are denoted by I and 0, respectively. The paraconjugate operation on a matrix function E(z) is defined byẼ(z) = E(1/z * ) H . The operators E{.}, R[ .] and I[ .] stand for the expected value, real part and imaginary part of their arguments, respectively. The floor operation is denoted by . while ||.|| represents the second norm operation.

Problem formulation
In this section, the OFDM/OQAM system model is presented along with its input-output relation over a frequency selective fading channel. The effects of the CFO and STO on the reconstructed signal are discussed and finally, the joint estimation problem for the CFO, STO, and CIR is stated.

OFDM/OQAM System Model
The OFDM/OQAM system model, as implemented in this work and commonly used in the literature, e.g., [31], is illustrated in Fig. 1. OFDM/OQAM makes use of a specific filter bank structure where the upsampling and downsampling factor equals half the number of subcarriers, denoted by M. At each input symbol time, with symbol duration T s , a vector of discrete input symbols is loaded on the M available subcarriers. The latter are separated in frequency by F s = 1/T s , so that the system occupies a total bandwidth of W = MF s .
On the transmitter side, let x k,n ∈ A denote the complex valued symbols at the input, where k ∈ {0, 1, . . . , M − 1} is the frequency index, n ∈ Z is the symbol time index, and A is the digital constellation from which the symbols are drawn. In the first stage of pre-processing, each x k,n is converted to a pair of real symbols, d k,n , according to the following equations, This complex-to-real (C2R) operation doubles the sampling rate of the subcarrier signals. The second stage of pre-processing involves multiplication of the real OQAM symbols, d k,n , by the sequence θ k,n = e j π 2 (k+n) , which results in complex symbols s k,n = d k,n θ k,n = d k,n e j π 2 (k+n) .
In a practical implementation of OFDM/OQAM, the output signal y[ m] is passed through a pulse shaping filter and up-converted to an appropriate frequency band for transmission over the physical medium. In this work, however, we consider an equivalent baseband channel model for simplicity. Specifically, the channel is modeled as a linear time-invariant system with FIR h[ l] of length Q, and corresponding system function H(z) = Q−1 l=0 h[ l] z −l . The filter length Q is proportional to the channel delay spread τ ds , that is, Q = Mτ ds /T s + 1. The channel coefficients h[ l] are assumed to remain constant during the transmission time of one data block of N symbols, i.e., block duration plus overall processing delay of the transceiver system. Finally, the channel output is corrupted by additive white Gaussian noise (AWGN) η[ m], assumed to be circularly complex with zero mean and variance E[ |η[ m] | 2 ] = σ 2 η . A more detailed discussion of the effects of fading channels, CFO and STO is provided in Subsection 2.2.
On the receiver side, letȳ[ m] denote the received baseband signal after transmission through the noisy channel. In the analysis filter bank (AFB), signalȳ[ m] is passed through the analysis filters with FIR g k [ m] of length L p and corresponding system functions 2 G k (z) = 0 m=−L p +1 g k [ m] z −m , whose outputs are downsampled by M/2 afterwards. The resulting symbols at the output of the AFB can be represented ass where the range of summation over m is determined by the finite support of the subband FIR filters. The symbolss k,n then pass through the first postprocessing stage, which involves multiplication by the sequence θ * k,n followed by taking the real part, i.e., The second post-processing stage is the real-to-complex (R2C) conversion, where two consecutive real valued symbols are combined into a complex one as follows, x k,n = d k,2n + jd k,2n+1 , k even , We consider a complex-valued, uniform modulated filter bank, where the subchannel filters are all generated from a common low-pass prototype filter, p[ m], by means of exponential modulation as follows, where The prototype filter used in this work is a near perfect reconstruction (NPR), real-valued linear phase (symmetric) FIR low-pass filter with length L p and support region m ∈ {0, 1, . . . , L p − 1}. It is derived by using the frequency sampling technique, as in [31], with overlap factor K, so that its non-zero coefficients can be represented in closed form as where the coefficients A[ l] satisfy A[ l] 2 +A[ K − l] 2 = 1 for l = 1, 2, . . . , K/2 and α is a normalization factor such that m p[ m] 2 = 1.
In particular, for the adopted value of the overlap factor, i.e., K = 4, we have where A[ 1] = x can be determined by using various optimization criteria. Since the LS criterion is used in this work (i.e., minimizing stopband energy), we set A[ 1] = 0.97741677 according to Table 1 in [31]. Since the prototype filter is linear-phase (symmetric), the overall processing delay of the complete OFDM/OQAM transceiver system will be L p T s /M. By using the paraconjugates of the synthesis filters as the analysis filters in the receiver, as specified in (7), the orthogonality condition of the transceiver system can be expressed as where δ kk denotes the Kronecker delta function [3, 32, 33] 3 .

Effects of fading channel, carrier frequency offset, and sampling time offset
In addition to channel fading and additive noise, as illustrated in Fig. 1, the received signalȳ[ m] at the front-end of the receiver will be affected by CFO due to oscillator mismatch or Doppler effect, as well as STO due to imperfect sampling. These effects can be mathematically modeled asȳ where τ 0 is the normalized STO 4 with respect to the sampling period at the baseband transmitter output, T s /M, and μ 0 is the normalized CFO with respect to F s , the subcarrier frequency spacing. It is worth mentioning that, similar to previous works on this subject (e.g., [12,13]) the second-order effects, i.e., those of CFO, STO, and CIR on one another, have been neglected in this model. It has been observed through simulations that these effects are, indeed, negligible.
From (12), (4), and (5), useful expressions can be obtained for the real-valued output symbolsd k,n that appear in the OQAM post-processing module on the receiver side in Fig. 2 5 . Specifically, where ζ k,n represents the contribution from the transmitted data (pilot and information symbols) as given by and η k,n represents the additive noise (i.e., the contribution from η[ m]) passed through analysis filter bank and first post-processing stage, as given by Substituting (3) into (14) and using (7), ζ k,n can be further developed as follows, where we define The term γ k ,n k,n (l, μ 0 , τ 0 ) in (18), known as ambiguity function [34], characterizes the level of the 'intrinsic interference' of the n th real input sample from the k th subband, d k ,n , on the nth output sample from the kth subband, through the lth channel tap, h[ l], in the presence of CFO, μ 0 , and STO, τ 0 . In the special case l = μ 0 = τ 0 = 0, the quantity γ k ,n k,n (0, 0, 0) describes the level of complex orthogonality of the analysis/synthesis filters of the OFDM/OQAM transceiver system up to a multiplicative factor θ k ,n θ * k,n . The values of γ k ,n k,n (0, 0, 0) for the filter bank adopted in this work, with the prototype filter p[ m] and its parameters as described in Section 2.1, are given in Table 1. We note that due to the finite length of the subband filters f k [ m], the summation in (18) is, in fact, performed over a finite range.
For the same reason, the range of summation over the symbol time index n in (17) is also finite.

Problem statement
If estimates of the CFO and STO are available, they can be compensated at the receiver front-end to avoid their degrading effects. Likewise, if estimates of the CIR coefficients are available, they can be used on the receiver side to design a set of subband equalizers to compensate for the distortion caused by the multipath fading channel 6 . The estimation and compensation of these imperfections is critical to achieve the low level of bit error rate (BER) required for the practical operation of multi-carrier modulation in broadband communication systems.
The main focus of this work, therefore, lies in the joint estimation and compensation of above imperfections for the OFDM/OQAM system. To this end, the use of pilotbased estimation is preferred over the blind approach, since the latter generally requires a longer data record to achieve a desired level of accuracy, which in turns increases the computational complexity and limits applications to static or very slowly time-varying channels. Furthermore, the framework of point estimation theory is employed here, where the parameters under estimation are modeled as unknown, yet deterministic quantities, i.e., no prior distribution is assumed. By transmitting a sequence of known pilot tones, and observing the received sequence over a given time interval, our specific interest lies in developing and investigating the properties of the joint ML estimator of the CFO μ 0 , STO τ 0 and CIR h [ l] for the generic OFDM/OQAM transceiver system illustrated in Fig. 1 and described in mathematical terms in Section 2.1. We shall denote the resulting ML estimates bŷ μ,τ andĥ[ l], respectively.

Joint estimation
In this section, we first introduce our proposed approach to the estimation problem by splitting a received pilot symbol into pilot, data, and noise contributions. Next, we formulate and develop a pilot-based joint ML estimator of the CFO, STO and CIR. We then present possible simplifications to reduce the implementation complexity of the resulting joint ML estimator and discuss its computational requirements. Finally, we explain how the jointly estimated parameters can be used to compensate the detrimental effects of STO, CTO, and CIR.

Structure of received pilots
In this work, for convenience in analysis, a real OQAM symbol at time n is defined as the ordered set of M subband symbols d k,n for k ∈ {0, 1, . . . , M − 1}, as they appear at the output of the C2R modules in the preprocessing stage of the SFB in Fig. 1 To allow for flexibility in the application of the derived pilot-based estimation method, we consider a general framework for the allocation of pilots. Specifically, within a burst of N consecutive Table 1 Transmultiplexer response γ k ,n k,n (0, 0, 0) of the OFDM/OQAM system using Bellanger's (also known as PHYDYAS) filter [41,42]  For clarity in the presentation, the data or information symbols (i.e., non-pilot) are denoted as d k,n ≡ d d k,n where (k, n) / ∈ P. These symbols, unknown to the receiver, are modeled as independent and identically distributed (i.i.d.) random variables with zero-mean and variance 1 2 σ 2 x . As mentioned earlier, a demodulated pilot symbol on the receiver side of the OFDM/OQAM system, assuming a general pilot distribution P which is scattered among the data symbols, consists of additive contributions from surrounding pilot symbols, surrounding data symbols and noise. Specifically, for the case (k, n) ∈ P, the real-valued output symbold k,n ≡d p k,n in (13) can be written as where ζ p k,n and ζ d k,n , respectively, denote the contributions from surrounding pilots and data. The pilot contribution can be expressed as wherē The data contribution, which can be interpreted as "data interference", can be expressed as In the sequel, we use the described splitting of a received symbol into pilot, data and noise contributions to develop the joint ML estimator of CFO, STO and CIR for a general pilot-data distribution. When the received symbold k,n =d p k,n corresponds to a transmitted pilot, in the aforementioned formulation in (20) and (21), the various terms d p k ,n represent the contribution from the corresponding transmitted pilot i.e. d p k,n , as well as, depending on the pilot distribution, possible contributions from surrounding pilot symbols i.e. , d p k ,n for (k , n ) ∈ P and (k , n ) = (k, n) . Since the pilots are known symbols, this part can be accounted for as a deterministic component (albeit dependent on the unknown parameters μ 0 , τ 0 and h [ l]) in the derivation of the ML estimator. To further proceed with this derivation, we therefore need to characterize the statistical properties of the noise term η k,n and the data contribution term ζ d k,n in the decomposition (19) of the received pilot symbold p k,n . From the AWGN assumption made earlier on the additive channel noise η [ m], and the linearity of the processing operations involved in the OFDM/OQAM receiver, it follows that the subband noise contribution η k,n is a jointly Gaussian (real) random process in the variables (k, n). Furthermore, on account of the orthogonality property of the analysis and synthesis filters, as stated in (10), it follows that the various random variables η k,n (for different pairs (k, n)) are uncorrelated and therefore statistically independent. Specifically, it can be shown (see Appendix A) that The data contribution term ζ d k,n in (22) is unknown to the receiver and therefore should be modeled as a (realvalued) random process. Based on the assumptions made above on d d k,n , we note that the expression (22) involves the weighted sum of a large number of statistically independent, zero-mean terms d d k ,n . Hence, invoking the central limit theorem, we shall assume that ζ d k,n in (22) can be conveniently modeled as zero-mean (real-valued) Gaussian random process. This assumption is further investigated in Appendix B. In addition, under mild assumptions usually satisfied in applications, it can be shown that the random variables ζ d k,n (for different pairs (k, n)) are nearly uncorrelated. Specifically (see Appendix B), we have that From (19), (23), and (24) by introducing v k,n = ζ d k,n +η k,n and assuming that the data and the additive noise are statistically independent, it follows that the terms v k,n are independent Gaussian random variables with variance Finally, by substituting v k,n in (19) we will haved p k,n = ζ p k,n + v k,n , which provides a convenient basis for the derivation of the coveted ML estimator.

Pilot-based joint ML estimator
Focusing on the pilot contribution in (20), we have where the superscripts R and I are used in the sequel to identify the real and imaginary parts of the underlying quantity. Hence, by letting we can obtain the relationship between the received symbol and the channel taps as By stackingd p k,n , λ R k,n (μ 0 , τ 0 ) and λ I k,n (μ 0 , τ 0 ), and v k,n over the time index n and then over the frequency index k, we arrive at the following matrix-vector equation, As a result of the AWGN assumption and the ensuing assumptions on the noise and data interference terms η k,n and ζ d k,n , V will be a real Gaussian random vector with zero mean and a nearly diagonal covariance matrix Similarly, for given values of μ 0 , τ 0 and h, the observationD p is also a Gaussian random vector with mean (μ 0 , τ 0 )h and covariance CDp ≈ σ 2 v I. Hence, the probability density function (PDF) ofD p can be presented as, Thus, the log-likelihood function (LLF) is written, up to a constant, as, The joint ML estimators of the CFO, CIR and STO can be obtained by maximizing the derived LLF with respect to the parameters μ 0 , τ 0 and h . Let the unknown search parameters for CFO and STO be denoted by μ and τ . By fixing μ and τ and varying h in C 2Q , the LLF achieves its maximum at where (μ, τ ) † = ( (μ, τ ) H (μ, τ )) −1 (μ, τ ) H is the pseudo-inverse of (μ, τ ). By substituting the resulting channel guess of (32) into the LLF we can obtain the CFO and STO estimates using a two-dimensional search according to The ML estimate of the CIR can be obtained by substituting the estimates (μ,τ ), resulting in

Computational simplifications
Herein, three simplifications are introduced in computingλ k,n (l, μ 0 , τ 0 ) in (21) to speed up the calculation of the LLF (31) significantly. To this end, we first consider the term γ k ,n k,n (l, μ 0 , τ 0 ) in (18), whose definition includes a summation over the length (pretty large) of the prototype filter p[ m]. Since for the filter banks, we have f k [ m] = p[ m] e j2πkm/M , we can write γ k ,n k,n (l, μ 0 , τ 0 ) = θ * k,n θ k ,n ϕ n,n k−k (l, μ 0 , τ 0 ) where letting α = k − k In this way, instead of calculating γ k ,n k,n (l, μ 0 , τ 0 ) for all the possible pairs of (k , k), it is sufficient to compute ϕ n,n α (l, μ 0 , τ 0 ) for only possible values of k − k = α and find the corresponding γ k ,n k,n (l, μ 0 , τ 0 ) by multiplication with a discrete phase factor as in (35). The number of possible values of α depends on the distribution of the pilots over the frequency axis.
In the second simplification, regarding the calculation of the interference from the surrounding pilots, λ k,n (l, μ 0 , τ 0 ), we might assume that, owing to the excellent spectral containment of the prototype filters, the main source of the CFO-induced interference on each subband is due to its first few neighboring subbands; i.e, the interference from more distant subbands is negligible. Hence, to derive the total interference from subbands k on the subband with index k in (21), it suffices to take into the account the interference from a few neighboring pilot-carrying subbands on each side of the kth subband. Consequently, (21) can be approximated as where in practice, the value of β can be set 7 to 2.
To reduce the complexity of the estimator even further and make the two-dimensional search for CFO-STO more practical, the third compromise is to only consider the first few channel taps in computing pilot contribution in (27). This is due to the fact that these taps contribute the most to the entire power of the channel. By considering the implemented channel as described in Section 5.1, this reduces the running time of the proposed method approximately by three times. As our experiments confirm, this simplification only introduces a marginal degradation to the performance of the estimator, while maintaining it still significantly superior to that of the benchmark.
It is notable that, from (32) to (34), a second iteration of this estimation method can be performed by running a two-dimensional search over (μ, τ ) using the obtained channel estimate and continuing to obtain a new set of estimates. However, our experiments indicate that the improvement gained by performing a second iteration is not significant enough to justify the additional complexity. Indeed, as it will be seen from the results, performing close to the CRB, a single iteration suffices to provide a significant improvement over the benchmark method.
As the LLF in (31) provides a closed form solution to the estimation problem, the multi-dimensional ML estimation is reduced to a two-dimensional search over the unknown CFO and STO, μ and τ . The search is performed in two stages, namely, a coarse search followed by a fine one in proximity of the coarse estimate. The decisive factor in the complexity of the proposed ML estimator is the number of operations required for each evaluation of the LLF. This is approximately calculated as C 8MQN 2 p + 8Q 2 N p + O(QN p ) complex-valued operations where the first term is the cost of forming (μ, τ ), the second term is the cost of QR decomposition to solvē D p = (μ, τ )h (μ, τ ) and the third term is the cost of forming the LLF. Since the first term is dominant for typical N p and Q, we conclude that the overall complexity is proportional to the squared number of pilot tones in the burst. This complexity evaluation is based on the case where none of the aforementioned practical simplifications above in Subsection (3.3) are applied. Employing these simplifications reduces the complexity of the practical implementation by a factor of O QM 2 , where the first two foregoing simplifications each reduce the complexity by a factor O(M) and the third one by a factor O(Q). Figure 2 illustrates the block diagram of the OFDM/OQAM receiver as implemented in the proposed method for estimation and compensation of CFO, STO and CIR. First, after passing through the AFB and multiplication by θ * k,n and real taking, the received symbols are used to obtain the ML estimates of the CFO, STO and CIR. The CFO and STO estimates are then fed back to correct the signal at the front-end of the receiver. The CFO-STO compensated signal can be written as

Compensation of the estimated parameters
where W I [ m;τ ] represents the hamming-windowed sinc fractional-delay interpolation filter used for STO simulation [35]. Next, the CFO-STO corrected symbols pass through the AFB again where, this time, a single-tap per subcarrier equalization is performed based on the DFT of the estimated CIR according to the following equation, where e k for k ∈ {0, . . . , M − 1} is the coefficient of the equalizer for subband k andĤ(z) = being the estimated CIR coefficients. It is notable that although a single-tap per subcarrier equalizer is used here, generalizations to other, more advanced types of equalizers are possible. This simple equalization scheme inverts the channel at the center frequency of the corresponding subcarrier and it works well in mildly selective channels as long as the number of subcarriers is sufficiently large [36]. The equalized symbols can be represented as, The final received symbols are then obtained by undergoing the OQAM post-processing stage.
It is notable that the formulation of the problem through the linear equations obtained from the LLF of the received pilots, leads to performing the channel estimation in the time domain. Furthermore, as mentioned earlier in Section 1, the channel estimation methods in time domain do not make assumption on the subchannels being almost flat. In contrast, the channel equalization has been performed in the frequency domain by using onetap-per-subcarrier scheme, which is a common equalization technique in MCM systems including OFDM and OFDM/OQAM. In addition to simplicity, this technique allows for flexibility of equalization in a multi-user scenario where different subchannels need to be equalized separately.

Joint Cramer-Rao bound analysis based on the Gaussian assumption for data interference ζ d k,n
In this section, assuming known transmitted symbols, i.e., pilots, the CRB on the covariance matrix of unbiased estimators of the CFO, STO and CIR are derived [37] 8 . Letting θ be the vector including the unknown (real) parameters we have, Therefore, θ holds (2Q + 2) real entries with indexes denoted by a and b ∈ {1, 2, . . . , 2Q + 2}. The Fisher information matrix (FIM), J , is then (2Q + 2) × (2Q + 2). For the covariance matrix C V , since for all a in the aforementioned interval ∂C V /∂θ a = 0, the entries of FIM are given by, which results in, n (μ, τ )h I , the partial derivative of A with respect to θ a for four different ranges of the indexes a and b, namely a = 1, a = 2, 3 ≤ a ≤ Q + 2 and Q + 3 ≤ a ≤ 2Q + 2 can be written as, Also, for 3 ≤ a ≤ Q + 2, where l = a − 2. In addition, for Q + 2 ≤ a ≤ 2Q + 2 n (l, μ, τ ) . (47) Thus, J (θ) can be written as where according to (42)-(47) we have, Also, is a 1 × Q vector whose entries can be written as Moreover, φ is a 1 × Q vector whose entries can be represented as In addition, χ , ψ and ζ are Q × Q matrices with the following entries, The CRB of an unbiased estimator of θ , denoted asθ, is expressed as Cov(θ ) ≥ J (θ ) −1 . As a result, we can compute the variance of the unbiased CFO estimator, μ, as Similarly, the variance of the unbiased STO estimator,τ , is derived as Finally, for the lth tap of the channel, the lower bound of the unbiased estimator can be obtained as The lower bound on the average variance of the CIR estimator over different taps can be obtained by assuming that the tap estimates are independent. Then, we can write It is worth emphasizing that in general the entries of the vectors κ, , φ, ρ and χ are not negligible, i.e., there is a coupling between the estimation errors that can be achieved for μ, τ , and h. This means that, for example, the CRB on μ with no channel knowledge will be greater than the one obtained with a known channel, which could be directly computed as the inverse of the first entry of the FIM, i.e. (J 1,1 ) −1 . This also applies to the STO and CIR estimators with or without knowledge of other parameters. Furthermore, as it has been mentioned in [38], the derivations imply that the CRB is a function of the particular channel realization. This has also been observed through simulations. Also, it should be noted that in the derivation and implementation of the CRB, the simplifications of Section 3.3 are not introduced, i.e., on a given output symbol, the impact of all the subbands are taken into account.

Performance evaluation
This section begins with the simulation setup and parameter settings for performance evaluation of the proposed method compared to the existing one, followed by presentation and discussion on their estimation results and complexity evaluation.

Methodology
The prototype filter of the transceiver system is obtained using the frequency sampling technique with overlap factor K = 4 as described in [31] and used in [29]. The data are modulated to a 4-QAM constellation. The input sampling frequency is F s = 175 kHz corresponding to a channel bandwidth of MF s = 11.2 MHz.
To obtain BER that are more representative of a practical digital communications system, a punctured convolutional channel coding scheme is applied to the information sequence with the overall rate of 2/3 by using constraint lengths vector [5 4] and vector of function generators [23 35 0; 0 5 13]. A frequency selective wireless channel is used with Q = 8 randomly generated coefficients h[ l] based on the ITU-Vehicular A channel guidelines [39]. The channel is assumed constant during the transmission of a burst but changes over different transmissions 910 . During each transmission, the STO and CFO obey a uniform distribution within the intervals − T s 4 T s 4 and − F s 4 F s 4 . The root mean squared error (RMSE) and BER results are obtained by running 500 independent Monte-Carlo simulations for given values of the SNR per bit. The latter is expressed as E b /N 0 , where E b denotes the bit energy and N 0 is the noise power spectral density level. Regarding the implementation results of [29], we follow the estimation and equalization algorithms and structure precisely as described in the paper. This method is referred to as "Stitz" in Figs. 4, 5, 6 and 7.
We compare the proposed method to [29], one of a few works that estimate all the three aforementioned parameters in their paper. To this end, two different distributions of pilots are considered in the implementation of the proposed method. In the first distribution, the pilots are scattered within each burst as mentioned in [29]. Specifically, the size of the transmitted bursts in time and frequency are according to DL-PUSC configuration as illustrated in Fig. 3, with M = 64 subcarriers and N = 54 input symbols. The other distribution, which also uses M = 64 subcarriers, adopts a full preamble of pilot tones of length T = 4, followed by N − T = 50 data symbols. In this way, the two bursts share the same data rate and hence are comparable. The consideration of these two different distributions of pilot tones is useful to assess the performance of different transmission modes.

Results and discussion
The performance and the complexity comparison of the proposed estimator vis-a-vis the benchmark are, respectively, presented in this subsection. Figure 4 compares the proposed method with [29] in terms of RMSE of CFO as a function of E b /N 0 . The average CRB of the CFO estimation is also presented in the figure. The estimation is jointly performed in the presence of other sources of error, namely, a fixed STO of 2.5% with respect to T s , and Rayleigh fading channel as described earlier. The figure indicates that the proposed method not only outperforms the other method as implemented with a preamble of full 64 × 4 pilot tones, but also, is capable of significant improvement when adopting the burst structure of DL-PUSC. Especially in the former case, the performance of the proposed estimator is very close to the average CRB as a lower bound.

Estimation results
Comparison of the two methods and the average CRB in terms of RMSE of STO is depicted in Fig. 5, where a fixed CFO of 5% and the multi-path channel are used. The superior performance of the proposed method implemented in both configurations can be seen in the figure. Again, the best result belongs to the one based on a full grid of pilot tones followed by the data; although, the proposed method remarkably reduces the estimation error with the same burst structure as in [29]. Similar to the previous figure, the average CRB runs closest to the full-preamble case.
In Fig. 6 , the RMSE of CIR, in the presence of fixed CFO and STO, is depicted for the three implementations along with the average CRB. Similar to the previous figures, It is worth mentioning that, in this work, all the CRB terms are inversely proportional to This explains the mild slope of the estimation error graphs with respect to E b /N 0 . In other words, in our range of interest of E b /N 0 , the dominant term is the power of the data interference, σ 2 ζ d rather than the noise power. However, our experiments show that the noise power effect begins to rise when applying lower SNR values with an abrupt increase in the estimation error around E b /N 0 = −10 dB.
The coded BER performance of the methods, after estimation and compensation, are compared in Fig. 7. The figure indicates that, in both configurations, the proposed method is capable of a significant decrease in BER of the system especially at high input E b /N 0 .

Complexity evaluation and discussion
The complexity of the two methods is compared in Table 2 in terms of the running time needed for the processing of a burst transmission of size 64 × 54. For the proposed method, three different implementations are considered as follows: In addition, Table 2 includes the RMSE figures of CFO, STO and CIR (for Eb/N0 = 20 dB), to illustrate the achievable trade-off between estimation accuracy and computational complexity. The RMSE figures are obtained based on the average of 500 burst transmissions of size 64 × 54 with the scattered pilot distribution as described in Section 5.1, on an ordinary quad-core PC running MATLAB 2016 b. The table indicates that the performance degradation due to the suggested simplifications is rather small, i.e., on the order of 10% for Prop 1 and 25% for Prop 2. Furthermore, by employing all the foregoing simplifications, the obtained running time of the proposed method comes within about two times that of the benchmark, which implies that, by using state-ofthe-art DSP technology, the running time of the proposed method remains practical. It should be noted that, as can be seen from the results in Subsection 5.2.1, in mid and specially low SNR regime, the performance gap between the two methods is much larger than that presented in Table 2 for E b /N 0 = 20 dB. Hence, considering the performance gain of the proposed estimator presented in Section 5.2, the complexity compromise seems justifiable.
From a theoretical perspective, a key advantage of the proposed ML-based approach is to offer a unified treatment leading to a compact solution format (i.e., near closed form) for the joint estimation of CFO, STO and CIR in OFDM/OQAM systems, in contrast to making use of different signal processing techniques (correlation, phase estimation, etc.) for the treatment of different impairment sources. Another important motivation behind the ML-based approach lies in its asymptotic optimality, as observed in many practical situations of interest, under conditions of high SNR or long observation time [37].
From a practical perspective, the proposed joint MLbased estimator leads to significant improvements in estimation accuracy compared to existing methods, as demonstrated by the simulation results in this section. Specifically, over the complete range of SNR considered (from 0 to 20dB), the proposed estimator achieves the best performance for the three types of parameters, i.e., CFO, STO and CIR coefficients. For each parameter type, the resulting RMSE obtained with the joint-ML estimators comes within 1dB of the CRB. In turn, the The main limitation of the proposed method is the additional computational burden. Indeed, the calculation of the joint ML estimator involves several matrix operations and a two-dimensional search over the CFO and STO space. However, by allowing a number of possible simplifications to reduce the processing time as discussed in Section 3.3, the proposed method offers a trade-off between complexity and performance.

Conclusions
A new general pilot-based ML joint estimation method for OFDM/OQAM systems has been developed and evaluated. The CFO, STO, and CIR effects have been jointly estimated and compensated. The CRB on the joint estimator variance was also derived and implemented as a reference to evaluate the performance of the proposed algorithm. The comparison has been made with a highlycited method among the few research papers of the same focus. The results have shown the significant improvement that the proposed method offers in both transmission modes, i.e., as scattered pilots in data and as a full preamble of pilot tones followed by the data. As it was observed on the figures, the proposed estimator, especially when used in a full-preamble setup, performs close to the CRB and can robustly estimate the CFO, STO, and CIR. Furthermore, by comparing the performance and running times of the simplified versions of the proposed method to those of the benchmark, we conclude that the former provides a significant improvement over the estimation result while maintaining computational complexity in a feasible range. This, in turn, offers a useful trade-off between performance and complexity. In addition, to examine the covariance of these data interference terms at pilot locations, let the data symbols d d k,n be i.i.d. (real) random variables with zero mean and variance 1 2 σ 2 x . By definition, we have According to (22) we can write Following the observation above in the opening of this appendix, by invoking the central limit theorem and the assumptions made on d d k,n , it follows from (75) that ζ d k,n can be modeled as a zero-mean (real-valued) Gaussian random process. For the second order moments, it follows from (75) that We note that the filter-bank response term γ k ,n k,n (l, μ 0 , τ 0 ) tend to be non-zero only in the vicinity of (k, n).
Therefore, for a given (k , n ) either γ k ,n k,n (l, μ 0 , τ 0 ) or γ k ,n k ,n (l, μ 0 , τ 0 ) tend to be zero. Hence, for which we can write It can be seen from (79) that the variance of the data interference term depends on the particular channel realization and the unknown CFO and STO parameters. However, since the filter bank response γ k ,n k,n (l, μ 0 , τ 0 ) does not profoundly vary with the aforementioned parameters, the effect of the channel has been approximated by a fixed gain. This approximation can be further inspected by the simulation result in Fig. 9. The figure illustrates the covariance of data interference contributions with respect to one another in the simulated OFDM/OQAM system with the scattered pilot-data distribution considered in this work. More specifically, it depicts the covariance of a data interference contribution at a pilot location with data interference contributions to other pilot locations in presence of 8-tap Rayleigh fading channel, 20% CFO, 25% STO with no noise. For the considered pilot-data distribution with length of 160 OQAM symbols, the second largest Covariance of data interference at a pilot location with data interference in other pilot locations covariance is less than 10% of the peak value. Although only one sample is presented here, we were able to verify the consistency of this result for different values of CFO and STO with various fading channels, indicating that the observed maximum off-center covariance value does not exceed 14% of the peak value. This result is consistent with our approximation above in (77) and (79). 1 The authors' intention in the performance comparison is not to include the existing estimators in which only one or two of the aforementioned error sources are considered. 2 For convenience in our analysis, G k (z) is assumed noncausal; although, in practice, causality can be restored simply by introducing an appropriate delay in the receiver. 3 The precise orthogonality condition corresponding to the specific configuration of OFDM/OQAM system adopted in this work does not exist in the literature. However, by applying appropriate modifications pertained to the changes in the configuration, the presented orthogonality condition can be derived. 4 In practice, non-integer STO can be modeled with the help of an ideal low-pass interpolation filter (see Section 3.4). 5 Due to the inherent property of OFDM/OQAM, as implied in (10), which only sustains real orthogonality, estimation is performed on the received pilots after extracting the real part. 6 In this work, we develop a single-tap-per-subcarrier equalizer, although generalizations to other, more advanced types of equalizers are possible. This simple equalization scheme inverts the channel at the center frequency of the corresponding subcarrier and it works well in mildly selective channels as long as the number of subcarriers is sufficiently large [36]. 7 This is only valid for the PHYDYAS filter. There are other prototype filter functions in discussion for OFDM/OQAM systems which may require a different value for β [34]. 8 It is worth emphasizing that the CRB derived in this section is based on the approximation of Gaussian distribution for the data interference term ζ d k,n ; hence, it provides a benchmark for the performance of the estimators developed on such an assumption. Nevertheless, as it can be seen later, the proposed estimator obtains satisfactory results by a significant improvement over the existing method. 9 The authors have also evaluated the performance of the proposed and the existing method by adopting a timevarying fading channel in contrast to the static channel used in the following simulations. However, the results show the same relative trend among the curves corresponding to the two methods and the two pilot distribution schemes described in this section. 10 Although, in the proposed method, the channel length is supposed to be known, the authors have investigated a mismatch in the presumed channel length with the actual one and did not find a significant performance degradation in CFO and STO estimation. Nevertheless, the effect of such a mismatch on channel estimation and BER performance is considerable. In a practical system, to mitigate this problem, as a conventional way to make the channel estimation task more robust, the channel length can be matched to a worst-case scenario and may be modified adaptively [40].

Funding
This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada, under sponsorship of InterDigital Canada.

Authors' contributions
AB has contributed to the conception and design of this research, mathematical developments, performing the simulations, analysis and interpretation of the results, and drafting the manuscript. BC has participated in the conception and design of this research, mathematical developments, analysis and interpretation of the results, and critical revision of the research and the manuscript. Both authors read and approved the final manuscript.