EM-Based Channel Estimation Algorithms for OFDM

Estimating a channel that is subject to frequency-selective Rayleigh fading is a challenging problem in an orthogonal frequency division multiplexing (OFDM) system. We propose three EM-based algorithms to e ﬃ ciently estimate the channel impulse response (CIR) or channel frequency response of such a system operating on a channel with multipath fading and additive white Gaussian noise (AWGN). These algorithms are capable of improving the channel estimate by making use of a modest number of pilot tones or using the channel estimate of the previous frame to obtain the initial estimate for the iterative procedure. Simulation results show that the bit error rate (BER) as well as the mean square error (MSE) of the channel can be signiﬁcantly reduced by these algorithms. We present simulation results to compare these algorithms on the basis of their performance and rate of convergence. We also derive Cramer-Rao-like lower bounds for the unbiased channel estimate, which can be achieved via these EM-based algorithms. It is shown that the convergence rate of two of the algorithms is independent of the length of the multipath spread. One of them also converges most rapidly and has the smallest overall computational burden.


INTRODUCTION
Orthogonal frequency division multiplexing (OFDM) [1], a spectrally efficient form of frequency division multiplexing (FDM), divides its allocated channel spectrum into several parallel subchannels. OFDM is inherently robust against frequency-selective fading since each subchannel occupies a relatively narrowband, where the channel frequency characteristic is nearly flat. OFDM has an additional advantage of being computationally efficient because the fast Fourier transform (FFT) technique can be used to implement the modulation and demodulation functions [2]. Furthermore, the OFDM system can eliminate interframe interference (IFI 1 ) through the use of a cyclic prefix (CP) that is longer than the order of the channel impulse re-sponse (CIR). OFDM has already been used in European digital audio broadcasting (DAB), digital video broadcasting (DVB) systems, high performance radio local area network (HIPERLAN) and IEEE 802.11a wireless local area networks (WLAN). It has also been shown that OFDM is an effective way of increasing data rates and simplifying the equalization in wireless communications [3]. However, it is not possible to make reliable data decisions unless a good channel estimate is available for coherent demodulation. Although differential detection could be used to detect the transmitted signal in the absence of channel information, it would result in about a 3 dB loss in signal-to-noise ratio (SNR) compared with coherent detection. A number of channel estimation algorithms have been reported in the literature [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]. For some of these algorithms, however, the channel estimate is continuously updated by transmitting pilot symbols using specified time-frequency lattices. One class of  such pilot-assisted estimation algorithms adopt an interpolation technique with fixed parameters (two-dimensional (2D) [6,7] or one-dimensional (1D) [5]) to estimate the channel frequency response by using the channel estimate obtained at the lattices assigned to the pilot tones. Linear, spline, and Gaussian filters have all been studied [5]. Another method in this category adopts a known channel frequency covariance matrix and uses a channel estimate at pilot tones to estimate the CIR in the sense of minimum mean square error (MMSE) [4,8,9,11]. Shortcomings of these algorithms include (1) a large error floor that may be incurred by a mismatch between the estimated and real CIR, (2) difficulty in obtaining the channel frequency covariance matrix and the resultant error due to channel statistics mismatch, and (3) spectrum inefficiency due to the overhead (typically 20%) associated with use of pilot symbols. In addition, several kinds of blind channel estimation algorithms have been proposed in order to improve transmission efficiency. These algorithms are based on the statistical property of received signals (e.g., second-order statistics [12,13,14,15]), the characteristic of virtual subcarriers [16], and the finitealphabet property of transmitted signals [18]. However, each of these blind estimation algorithm has its limitation. For example, second-order statistics-based algorithms cannot be used in a high mobility environment (i.e., a large Doppler spread) since they require many blocks of data to carry out the estimation procedure. A finite-alphabet-based algorithm can be applied only to a constant modulus signal. In contrast, in this paper, we extend and enhance some existing pilot-based channel estimation algorithms by substantially reducing the number of pilot symbols using the expectationmaximization (EM) algorithm.
The EM algorithm [19,20] is a technique for finding maximum likelihood (ML) estimates of system parameters in a broad range of problems where observed data are incomplete. The EM algorithm consists of two iterative steps: the expectation (E) step and the maximization (M) step. The E-step is performed with respect to the unknown underlying parameters, using current estimates of the parameters, conditioned upon the incomplete observations. The M-step then provides new estimates of the parameters that maximize the expectation of the log-likelihood function defined over complete data, conditioned on the most recent observation and the last estimate. These two steps are iterated until the estimated values converge.
The main objective of this paper is to investigate the use of the EM algorithm for channel estimation of an OFDM system that is subject to slow time-varying frequency-selective fading. Three different algorithms have been developed and compared. In each of the algorithms, the initial channel estimate is obtained either from pilot symbols (that are inserted in the OFDM frame) or from the channel estimate of the previous OFDM frame (where there is no pilot symbol in the current OFDM frame).
The rest of the paper is organized as follows. In Section 2, we will describe the baseband OFDM system model and discuss some assumptions. In Section 3, the three different EMbased channel estimation algorithms are derived and fully discussed. The Cramer-Rao lower bound (CRLB) and modified CRLB (MCRB) are discussed in Section 4 for both constant and nonconstant modulus signals. Comprehensive simulation results and discussion are given in Section 5. Finally, we draw some conclusions in Section 6.

SYSTEM MODEL AND ASSUMPTIONS
The schematic diagram ( Figure 1) is a baseband equivalent representation of an OFDM system. The input binary bits 2 are first fed into a serial-to-parallel (S/P) converter. Each data stream then modulates the corresponding subcarrier by MPSK or MQAM. The modulation scheme may vary from one subcarrier to another in order to achieve the maximum capacity or the minimum bit error rate (BER) for a given channel characteristic and total signal power constraint. In this paper, we assume, for simplicity, that only QPSK or 16 QAM is used in any of these subcarriers. We use M to denote the number of subcarriers in the OFDM system. The modulated data symbols, represented by complex variables X(0), . . . , X(M − 1), are then transformed by the inverse fast Fourier transform (IFFT). The output symbols are denoted as x(0), . . . , x(M − 1) and are given by In order to avoid IFI, CP symbols, which replicate the end part of the IFFT output symbols, are added in front of each frame, that is, where N cp denotes the length of the CP. The parallel data are converted back to a serial data stream, that is, , and transmitted over the frequency-selective channel with additive white Gaussian noise (AWGN). The received data after discarding the prefix symbols y(−N cp ), . . . , y(−1), and applying the FFT and demodulation to the remainder y(0), . . . , y(M − 1). The channel model we adopt in the present paper is a multipath slowly time-varying (unchanged in any one OFDM frame) fading channel, which can be described by The CIR h l 's (0 ≤ l ≤ L−1) are independent complex-valued Gaussian random variables (which represents a frequencyselective Rayleigh fading channel), and n(k)'s (0 ≤ k ≤ M − 1) are i.i.d. complex-valued Gaussian random variables with zero mean and variance σ 2 for both real and imaginary components. L is the length of the CIR. If we add the CP in each OFDM data frame, with its length chosen to be longer than L, there will be no IFI between OFDM frames. Thus, we only need to consider one OFDM frame at a time in deriving the system model. After discarding the CP and performing an FFT at the receiver, we can obtain the received data frame in the frequency domain: Then using the CP condition (2), we obtain the following simple expression: where H(m) is the frequency response of the channel at subcarrier m defined as follows: and the set of the transformed noise variables N(m), 0 ≤ m ≤ M − 1, are i.i.d. complex-valued Gaussian variables and have the same distribution as n(k), that is, with mean zero and variance σ 2 . In a regular OFDM system, the channel delay spread L is much smaller than the number of subcarriers. This leads to a high correlation between the channel frequency re- In this paper, we assume the CIR is constant in each OFDM frame and varies from frame to frame according to the fading rate. However, in the derivation below, we assume, for generality, that the channel is constant during D OFDM frames. Note that intercarrier interference (ICI) is also eliminated at the FFT output because of the orthogonality between the subcarriers under the assumption that the CP is longer than the channel delay spread. Furthermore, we assume the system has perfect timing and frequency synchronization.

Notation
We use the standard notation, that is, (·) T denotes the transpose, (·) * denotes the complex conjugate, (·) H denotes the Hermitian, underscore letters stand for column vectors, and bold letters stand for matrices. We denote the pth estimates of the channel response in the frequency domain as H (p) and in the time domain as h (p) , and transmitted signals as X (p) .

Introduction to the EM algorithm
The EM algorithm [19,20] is an iterative method to find the ML estimates of parameters in the presence of unobserved data. The idea behind the algorithm is to augment the observed data with latent data, which can be either missing data or parameter values, so that the likelihood function conditioned on the data and the latent data has a form that is easy to manipulate. The algorithm can be broken down into two steps: the E-step and the M-step. We assume that the data Z ("complete" data) can be separated into two components, Z = (X, Y), where X are the observed data ("incomplete" data) and Y are the missing data. We denote θ as the unknown parameter we try to estimate from Y.
The E-step finds Q(θ|θ (p) ), the expected value of the loglikelihood of θ, log f (Z|θ), where the expectation is taken with respect to Y conditioned on X and the latest estimate of θ, θ (p) : The M-step then finds θ (p+1) , the value of θ that maximizes Q(θ|θ (p) ) over all possible values of θ: This procedure is repeated until the sequence θ (0) , θ (1) , θ (2) , . . . converges. The EM algorithm is constructed in such a way that the sequence of θ (p) 's converges to the ML estimate of θ.
Applications of the EM algorithm to estimation problems in communications systems have appeared a lot in the literature. Channel estimation [21] and signal detection [22,23] are two typical applications of the EM algorithm. Georghiades and Han [22] provide a general study of data sequence estimation in the presence of random parameters. Zeger and Kobayashi [23] give a simplified algorithm to detect continuous phase modulated signals in fading channels. In the remainder of this section, we propose three different EM-based channel estimation and signal detection algorithms by defining different "complete" and "incomplete" data sets for these algorithms.

Algorithm 1: estimating the channel frequency response
OFDM divides its allocated channel spectrum into several parallel subchannels that are only subjected to frequency flat fading. Thus, we only need to estimate the individual H(m), 0 ≤ m ≤ M −1, separately, which will result in a considerable reduction in computational complexity. To simplify the expressions, we omit the subcarrier index m, and simply write Y , X, and H instead of Y (m), X(m), and H(m). We assume that the frequency-domain signal X of a given subcarrier represents a QPSK or 16 QAM signal with constellation size C(= 4 or 16, respectively). We denote the symbols in the signal constellation by Due to the Gaussian noise assumption, the probability density function (pdf) of Y given X and H is given by By assuming that all C symbols are equally likely and averaging the conditional pdf of (10) over the variable X, we obtain the pdf of Y given H as follows: Suppose the channel is static over the period of D OFDM frames. Different values of D can be applied in different applications depending on how rapidly the channel changes. We define the received signal vector Y = [Y 1 , . . . , Y D ] and the transmitted signal vector X = [X 1 , . . . , X D ] for a specific subcarrier over D frames. Then we call Y and (Y , X) "incomplete" and "complete" data, respectively, following the terminology of the EM algorithm. Assuming that additive Gaussian noise is independent from frame to frame for each subcarrier, we can write the conditional pdf of the incomplete data as follows: Thus, the log-likelihood function of the incomplete data is and the log-likelihood function of the complete data is given by In the conventional ML estimation, we try to find an estimate of H that maximizes f (Y |H). But since log f (Y |H), (11), is not easy to manipulate (summation of several exponential functions), we resort to the EM algorithm, which increases the likelihood at each step. Each iterative process p = 0, 1, 2, . . . in the EM algorithm for estimating H from Y consists of the following two steps: where (see Appendix A) H (p+1) is the tentative estimate of H directly from (16). The final (p + 1)st estimate of H, that is, H (p+1) , will be obtained through additional manipulation onH (p+1) . The conditional pdfs f (Y d |H (p) , X i ) and f (Y d |H (p) ) can be calculated from (10) and (11), where X i is the ith signal in the constellation.
The value of H that maximize (17) is found as (see Appendix B) follows: It should be pointed out that the above maximization problem is actually a weighted least square (LS) problem.

FFT
. . . In this paper, we assume that L, the delay spread in the CIR, is known. In practice, however, L is another unknown parameter. In such a case, we need to perform channel-order detection and parameter estimation. Alternatively, we may use some upper bound for L, which may be easier to obtain than trying to estimate the exact value of L. However, use of an upper bound of L would degrade the estimation performance. One obvious upper bound of L can be the length of the CP since its length is chosen to be longer than L.
The channel estimate of the form (18) obtained for the M subcarriers, which we denoteH (p+1) (m), 0 ≤ m ≤ M − 1, can be refined by taking advantage of the structure of OFDM systems and the fact that L is much smaller than M, the number of subcarriers. We will proceed as follows: where we use the notation defined in Section 3.3 for mathematical simplification and W L is an M × L matrix: Finally, we can obtain the (p+1)st estimate of the channel frequency response as follows: The above procedure can be simply realized by applying the IFFT followed by the FFT, as schematically shown in Figure 2.
obtained by the IFFT must be set to zero before performing the FFT. The reason is to eliminate the estimation noise from paths that do not actually exist.
The iterative procedure should be terminated as soon as the difference between H (p+1) and H (p) is sufficiently small, since at this point, H (p) has presumably converged to the estimate we are seeking. Once the frequency-domain channel re-sponseĤ is found, the ML estimate of the transmitted signal can be obtained by solvinĝ which leads to the final estimates of the transmitted signals as follows: For a constant modulus signal, for example, a PSK modulation signal |X(m)| 2 = A for all m, where A is a positive constant. Thus, we can simplify (18) as follows: Notice that only the noise variance σ 2 is used to calculate f (Y d |H (p) , X i ) in this algorithm. Any other statistical information about the channel is not necessary. Moreover, in Section 5, we will show that the accuracy of σ 2 will not affect the performance very much. Thus, this algorithm is fairly robust to the noise variance.

Algorithm 2: estimating the transmitted signals
In this algorithm, we try to improve the performance of the detection accuracy of the transmitted signal and H = W L h. We also use the notation X d = diag(X d ), which denotes an M × M matrix with X(m) as its (m, m) entry and zeros elsewhere. The system model can be expressed in the vector form for the dth OFDM frame as follows: We still assume that the channel is static over the period of D frames for generality. To process the channel estimation algorithm using observed data in all D frames, we define some variables: With this notation, the system model can be modified as follows: The "incomplete" and "complete" data are defined as (Y ) and (Y , h), respectively. Each iterative process p = 0, 1, 2, . . . in the EM algorithm for estimating X from Y consists of the following two steps: In the E-step at the (p + 1)st iteration, we compute the expected value of log f (Y , h|X), given Y and X (p) , the estimates obtained in the pth iteration. The M-step of the (p + 1)st iteration determines the transmitted signal X (p+1) that maximizes Q(X|X (p) ) given X (p) . After some calculations (see Appendix C), we obtain the solution of (28): where h (p) and Σ (p) are called the estimated posterior mean and posterior covariance matrix at the pth iteration. Therefore, in each iteration, the updated estimation of CIR h (p) is obtained automatically as a by-product. After quantizingX (p+1) , we obtain the (p + 1)st estimate The limitation of this algorithm is that the mean E{h} and the covariance matrix Σ of time-domain CIR are also assumed to be known. In a practical situation, these channel statistics may not be known. Fortunately, as we examine (33) and (34), we find that when σ 2 is small (i.e., SNR is high), the contribution of Σ −1 and Σ −1 E{h} is so small that we can eliminate them and still expect similar performance. Furthermore, for an MPSK modulated signal, that is, |X(m)| 2 = A for all m, the signal estimation can be performed by using only the phase information. Thus, we can simplify (35) to Consequently, only multiplication and addition operations are required. Furthermore, W LD W H LD can be calculated and stored ahead of time. Thus, the computational complexity is considerably reduced for the high SNR case.
A closer examination of (36) reveals that the simplified Algorithm 2 is a combination of ML channel estimation assuming X (p) = X and ML signal detection assuming h (p) = h. This has been proposed in [17] in a different context. To conclude, Algorithm 2 is the extension of the iterative ML channel estimation algorithm when we take advantage of the channel statistics. The corresponding simplified algorithm is the same as the iterative ML channel estimation algorithm.

Algorithm 3: estimating the channel impulse response
In this section, we try to estimate the time-domain channel response by applying the parameter estimation algorithm proposed by Feder and Weinstein [24] for the general estimation problem based on the EM algorithm. We still assume that the channel is static over the period of D frames for generality. The system model used here is the same as the previous algorithm stated in (26). We define A = XW LD which is a MD × L matrix, and rewrite the system model as follows: where A i is the ith column of the matrix A. Note from (37) that each element of Y , Y (m), consists of L superimposed signals and AWGN which can be represented by Following [24], a natural choice for the "complete" data Z m is defined by decomposing the observed data Y (m) into L components, that is, Here, a i (m) is the (m, i)th entry of the matrix A and N i (m), 0 ≤ i ≤ L − 1, are obtained by arbitrarily decomposing the total noise N(m) into L components such that Thus, the relation between the "complete" data Z m and "incomplete" data Y (m) is given by It is convenient to choose the N i (m) to be statistically independent Gaussian random variables with zero mean and variance σ 2 i , where The EM-based algorithm is used here to obtain an estimation of h that maximizes f (Y |h). The "incomplete" and "complete" data for mth element of Y , as stated before, are (Y (m)) and (Z m ), respectively. We then group all Z m for all D OFDM frames and all M subcarriers into a new vector Each iterative process p = 0, 1, 2, . . . in the EM algorithm for estimating h from Y consists of the following two steps: (i) E-step: (ii) M-step: In the E-step at the (p+1)st iteration, we compute the expected log-likelihood function log f (Z|h), given Y and h (p) , the estimates obtained in the pth iteration. The M-step of the (p + 1)st iteration determines the transmitted CIR h (p+1) that maximizes Q(Z|h (p) ).
After some calculation (see Appendix D), we obtain the solution of (44): Observe that β i , the ith decomposition factor, can be arbitrarily selected with the constraint (47) due to the arbitrary selection of the independent noise components N i (m). Different sets of β i will give different system performance and we will discuss the selection of β i with simulation results in the next section.
Note that the elements of A = XW LD are dependent on the transmitted signals X. However, we do not know all these transmitted signals in the OFDM frames except for some pilot symbols. Thus, in order to proceed, we adopt the pth estimates X (p) instead of the actual values (which are unknown) to calculate the matrix A. In this case, the elements of X (p) are given by where W m is the (m + 1)st row of matrix W LD .
Notice that we do not need any information about the channel in this algorithm except the choice of the set β i . However, we can always choose β i = 1/L which will give near optimum performance as demonstrated in the simulation results. Thus, this algorithm is also very robust.

Initialization
As is known from the general convergence property of the EM algorithm, there is no guarantee that the iterative steps converge to the global maximum. For a likelihood function with multiple local maxima, the convergence point may be one of these local maxima, depending on the initial estimates H (0) , X 0 , and h (0) . Therefore, we propose to use pilot symbols distributed at certain locations in the OFDM timefrequency lattices to find appropriate initial values of H (0) , X 0 , and h (0) if there are pilot symbols inserted in the current OFDM frame. On the other hand, if there is no pilot symbol, we just set the initial channel estimates of the current OFDM frame as the final channel estimates of the previous OFDM frame assuming the channel is changing slowly. This is more likely to lead us to the true maximum point, as can be observed in the numerical results. Another benefit of this selection of the initial estimates of the CIR is that we do not need to do time-domain filtering or interpolation. Thus, we can considerably reduce the detection latency since we can carry out channel estimation and signal detection procedures as soon as we have received signals for each OFDM frame.
For those OFDM frames with pilot symbols, we define the pilot position set S = {s 1 , . . . , s |S| }. The corresponding FFT matrix only with those rows belonging to S is denoted as W S . Thus, we use the simple LS algorithm to obtain the channel frequency response [8] at each pilot position bỹ Then, we apply the IFFT onH (0) (s i ), . . . ,H (0) (s |S| ) and obtain the initial CIR by

CRAMER-RAO LOWER BOUND
The CRLB is an important criterion to evaluate how good any unbiased estimator can be since it provides the MMSE bound among all unbiased estimators. In this section, we will derive the CRLB for the CIR in OFDM systems. In Section 5, we will show the performance of the three proposed EMbased channel estimation algorithms and compare it to the CRLB. We note that in [25], Morelli and Mengali discuss the CRLB for channel estimators in OFDM, but they only treat PSK modulation in their discussion. We will discuss below the modified and averaged CRLB for the CIR with nonconstant modulus modulation.
The CRLB for the channel estimation is given by (see Appendix E) where Clearly, the CRLB changes from a set of D frames to another due to the different sets of transmitted signals. We define the average CRLB [26] denoted CRLB(h) as follows: where the expectation is carried out with respect to the transmitted data X in D frames. Another CRLB is called the modified CRLB [27], denoted by MCRB. It is defined as We note that we use M to denote the number of subcarriers in this paper. It also could be the number of effective subcarriers which exclude the null subcarriers as the guard frequency band. Of course, in the presence of null subcarriers, we have to make some modifications on W L by deleting those rows corresponding to the null subcarriers.
It is easy to show that CRLB(h) ≥ MCRB(h) by simply applying the Cauchy-Schwarz inequality. This is equivalent to saying that the CRLB(h) is always tighter than the MCRB(h) [27]. We will discuss the specific CRLB for constant and nonconstant modulus signals in the following.

CRLB for constant modulus signals
For constant modulus signals, |X d (m)| 2 = A for all d's and m's (for instance, PSK modulated signals). Thus, we can simplify (53) as follows: It is obvious that the above CRLB is inversely proportional to the number of observed OFDM frames D, number of subcarriers M, and SNR A/2σ 2 . Note that CRLBs of different frames for OFDM channel estimation are constant and do not depend on the channel response H or h. Consequently, this CRLB can be applied to any multipath fading channel. Another important observation is that in the case of constant modulus signals.

CRLB for nonconstant modulus signals
For nonconstant modulus signals, |X d (m)| 2 is no longer constant (e.g., 16 QAM modulated signals). Thus, the CRLB in this case changes from D frames to another D frames. In addition, it is not straightforward to obtain an explicit expression for the CRLB(h) because I(h) can no longer be easily inverted. However, the MCRB(h) can be computed assuming the transmitted signals are independent. This results in Thus, the MCRB(h) can be calculated as follows: which is the same as the constant modulus CRLB in the case of the same average signal energy A. Figure 3 shows the theoretical curve of MCRB(h) and the numerically evaluated curve of 16 QAM signals. These two curves agree and justify the use of MCRB(h) as a performance measure for unbiased channel estimation algorithms in OFDM systems, both for constant modulus and nonconstant modulus signals.

SIMULATION AND DISCUSSION
We constructed an OFDM simulation model, which is similar to the specifications of 802.11a, to demonstrate the validity and effectiveness of the EM-based channel estimation and signal detection algorithms. The entire channel bandwidth is 800 kHz, and is divided into 64 subcarriers (or tones). To make the tones orthogonal to each other, the symbol duration is chosen as 80 microseconds. An additional 20 microseconds CP (N cp = 16) is used to provide protection from IFI and ICI due to channel delay spread. Thus, the total OFDM frame length is T s = 100 microseconds and subchannel symbol rate is 10 kbaud. The modulation scheme used in the system is QPSK. One OFDM frame out of 8 OFDM frames (N t = 8) has pilot symbols and 8 pilot symbols (N f = 8) are inserted into such a frame with equal space, where N t and N f denote the pilot spacing along the frequency and time domains, respectively. Thus, the overhead caused by pilot symbols is only 1/64. The simulated system can transmit uncoded data at 1.28 Mbps. The maximum Doppler frequency f d is chosen to be 100 Hz, which implies f d T s = 0.01. The CIRs used in the simulations are given by where C 2 = 4 k=0 e −2k and C 3 = 7 k=0 e −k are the normalization constants and α k , 0 ≤ k ≤ 7, are independent complex-valued Gaussian random variables with unit variance, which vary in time according to the Doppler frequency. The amplitude of α k are Rayleigh distributed. This is a conventional exponential decay multipath channel model. We set the stopping criterion as h (p+1) − h (p) 2 ≤ 10 −3 .

Simulation results of Algorithm 1
The channel model we use to test the performance of Algorithm 1 is h 3 (n). Since we normalize the average channel power, the BER performance of different channel models should be the same. However, the MSE is proportional to the channel length L as shown in (57). For those OFDM frames containing pilot symbols, the initial estimate of CIR is obtained by using these 8 equally spaced pilot symbols. For those OFDM frames without pilot symbols, the initial estimate of CIR comes from the channel estimate of the previous OFDM frame.
From Figures 4 and 5, we observe that the EM-based Algorithm 1 reduces the BER and MSE simultaneously. Furthermore, the BER can achieve performance close to the known channel case and the MSE can almost achieve the CRLB in the high SNR region. For example, the MSE is very close to the CRLB when E b /N 0 > 14 dB, which is a very favorable result since we only sacrifice 1/64 spectral efficiency ignoring the effect caused by the CP. One drawback of the algorithm is that the BER cannot be improved from the initial estimate when SNR is low. It is clear from Figure 6 that the algorithm needs more iterations in the low SNR region than in the high SNR region for the iterative procedure to converge. Indeed, for low SNR case, the BER may increase after a few iterations, while the MSE still decreases from the initial value. That is because the EM algorithm is used to obtain the true values of the CIR and better estimates of CIR (less MSE) do not necessarily lead to lower BER. Therefore, this algorithm is practical only when SNR is large. We see that the number of necessary iterations decreases rapidly as E b /N 0  increases. When E b /N 0 = 20 dB, for instance, only three or four iterations are needed to achieve the convergence in the 8-path channel. It turns out that the number of iterations does not depend on the channel delay spread L, which is not illustrated here.
For this algorithm, we need to know the Gaussian noise variance σ 2 in order to compute f i (Y d |H (p) (m)) in each iteration. In practice, the noise variance is not known directly at the receiver and the error in the noise variance estimate will degrade channel estimation accuracy. We performed such a   simulation to illustrate this effect. This is shown in Figure 7.
The exact E b /N 0 of the system is 10 dB and the horizontal axis is the E b /N 0 we adopted in the EM-based algorithm. From this figure, it is seen that the effect of noise variance error is relatively small on the system performance (BER) when the noise variance error is within −2 dB and 3 dB. Therefore, we can use the following method to estimate the Gaussian noise variance on the fly with only negligible effect on the system performance:  The E b /N 0 computed by the above equation is about 11 dB by using the initial estimates of the CIR and transmitted signals. In this way, the performance degradation caused by using the estimated noise variance would be relatively small.

Simulation results of Algorithm 2
The channel model we use to test the performance of Algorithm 2 is still h 3 (n). Figure 8 shows the BER performance of EM-based Algorithm 2 and Figure 9 displays the corresponding MSE. The initial channel estimates are obtained using the same method stated above. In the EM-based Algorithm 2, we use the estimate of the previous OFDM frame as the initial value for the current OFDM frame if there are no pilot symbols in the current frame. From these two figures, we can see that the EM-based Algorithm 2 can also achieve almost as good performance as the ideal case in terms of BER, where the channel characteristics are completely known and f d Ts = 0.01, that is, the channel does not change very fast. Furthermore, the MSE of the EM-based channel estimation 2 converges to the CRLB when SNR becomes large. Another interesting result obtained from our simulation is that the performance degradation is quite small when we use the simplified Algorithm 2 that does not use the channel statistics. Degradation occurs only when the E b /N 0 is less than 10 dB. Thus, this algorithm is well suited in practical situations.
In Figure 10, we plot the number of iterations required for the estimates X p to converge versus E b /N 0 at the receiver input. We see that the numbers of necessary iterations for both the simplified and nonsimplified algorithms are relatively small for a broad range of SNR. And the simplified algorithm causes only a very small increase in the number of iterations required to converge. This demonstrates that the algorithm can achieve a substantial performance improvement with only a modest increase in the computational complexity (details are in Section 5.5). This is due to the additional computation for the iterations. Furthermore, it turns out that the number of iterations does not depend on the channel length L, which is the same as Algorithm 1. Here, however, there are fewer iterations compared to Algorithm 1.

Simulation results of Algorithm 3
The channel model we use to test the performance BER and MSE of Algorithm 3 is also h 3 (n). However, we tested all three channel models to study the effect of the number of iterations needed to converge. Figure 11 shows the BER performance of Algorithm 3 and Figure 12 displays the corresponding MSE.  Similar conclusions can be drawn about the performance of BER and MSE as with Algorithms 1 and 2. Figure 13 shows the relationship between the number of iterations needed for convergence versus the channel delay spread L for different E b /N 0 . From this figure, we observe that the number of iterations decreases when E b /N 0 increases; the number of iterations increases when the channel delay spread increases under the same E b /N 0 . Furthermore, we see that the number of iterations is almost proportional to the channel delay spread L; for example, when L doubles, the number of  iterations approximately doubles. Therefore, this algorithm is more suitable for the case of small channel delay spread. Figure 14 shows the MSEs using different sets of β i over the 8-path channel. Scheme 1 corresponds to β i = 1/8. Scheme 2 corresponds to β i = E{h 2 i } = e −i /C 3 . Scheme 3 corresponds to β i equal to the energy of ith path obtained in each iteration. It changes from iteration to iteration. From this figure, we find that Scheme 1 has the best MSE performance and the other two schemes have larger MSE, especially in the large E b /N 0 region. Furthermore, Scheme 1 is simpler than the other two since it only needs to know L, while Scheme 2 has to know the average energy of each path of the channel and Scheme 3 has additional computation in each iteration. Based on these preliminary simulations, it appears that Scheme 1 has the best performance without additional knowledge of the channel and additional computation.

Performance in different time-varying channels
Since the three algorithms show similar performance patterns, we choose Algorithm 2 to display the channel estimation performance in various time-varying channels with f d T = 0.01, 0.03, and 0.05, respectively. Some interesting observations can be noted from Figures 15 and 16. As f d T increases, that is, the channel varies faster, the performance of our algorithms degrades. However, the degradation is not so significant, especially when f d T ≤ 0.03. The pilot pattern definitely affects the channel estimation and system performance [10]. However, our algorithms show similar performance with different number of pilot symbols in the frequency domain as long as N f ≥ M/L. Most performance degradation comes from the time-varying nature of the channel. For those fast time-varying channels (e.g., f d T = 0.05), reducing the pilot spacing along the time domain will improve the system performance as well as channel estimation accuracy. But, for slowly and moderately timevarying channels (e.g., f d T = 0.01 and 0.03), it does not help much as our algorithms have already achieved near-optimal performance. Thus, our algorithms are able to achieve nearoptimal performance by using very few pilot symbols both for slowly and fast time-varying channels.

Comparison with existing methods
In order to compare the performance with existing methods, we choose the method with MMSE in the frequency domain [8] and linear interpolation (LI) in the time domain [7] (MMSE + LI) as an example, since MMSE estimation in the frequency domain demonstrates a good and robust performance. The reason to use LI in the time domain is to make the demodulation latency and necessary memory requirement as small as possible. For example, if N t = 8, the largest demodulation latency is 7 OFDM frames' period and 9 OFDM frames' data must be stored in the memory. Comparing Figure 17 with Figure 15 and Figure 18 with Figure 16, it is obvious that MMSE + LI performs much worse under the same channel model and pilot pattern, especially in the high SNR region. Our algorithms are CRLB achievable, whereas MMSE + LI is not. Furthermore, there is an error floor for MSE in fast time-varying channels using MMSE + LI. In order to remove it, more pilot symbols in the time domain must be inserted. This will reduce the system spectrum efficiency. Another advantage of our algorithms is that they do not have demodulation latency except the processing time because our algorithms are based on the received signals of the current OFDM frame. MMSE + LI or other pilot-symbols-assisted channel estimation methods have some extent of demodulation latency as long as they apply some kinds of time-domain interpolation or filtering.
One major disadvantage of our algorithms is the large computational complexity due to the iterative nature. We      simple MMSE + LI method shown in Figure 19, especially when SNR is small. Among the three EM-based algorithms, Algorithm 2 has the least complexity and Algorithm 3 has the most. This suggests that our algorithms are suitable for a high SNR environment with comparable complexity as existing methods, while achieving much better performance.

CONCLUSION
In this paper, we proposed three EM-based iterative algorithms to efficiently estimate the CIR and demodulate the transmitted signals in an OFDM system. By defining different "complete" data sets for the EM algorithm, we are led to the three algorithms: (1) estimating the frequency response of the channel; (2) estimating the transmitted signals; and (3) estimating the CIR. Making use of a modest number of pilot symbols or the channel estimate of the previous OFDM frame to obtain the initial estimate, these algorithms can achieve near-optimal estimates after a few iterations. We also derived the CRLB and MCRB for the channel estimate for both constant and nonconstant modulus signals. Advantages and disadvantages of each algorithm have been discussed and illustrated by means of simulation. The simulations reveal that the first two (estimating the frequency response and demodulating the transmitted signals directly) converge at a rate independent of the multipath spread. Bypassing the channel estimate by demodulating the transmitted signals directly (Algorithm 2) is the fastest converging procedure and, thus, has the smallest overall computational burden. Algorithm 2 has the least complexity among the three and is comparable with MMSE + LI. However, the performance is much better than that of MMSE + LI, especially in the high SNR region. All three algorithms are able to work in a fast fading environment. These algorithms can easily be extended to estimate multiple-input multiple-output (MIMO) channels in MIMO-OFDM systems. Our results indicate that the performance is acceptable only when E b /N 0 is larger than 8 dB. In the small SNR region, these algorithms need to be utilized with channel coding schemes to further improve performance.

Q H H (p)
where we assume X i , 1 ≤ i ≤ C, and H (p) are independent random variables.

B. DERIVATION OF (18)
arg max Then differentiating the last expression with respect to H, and setting it to zero, we havẽ

C. DERIVATION OF ALGORITHM 2
Equation (27) can be rewritten as follows: where the log-likelihood function can be expressed as follows: The conditional pdf f (h|Y , X (p) ) is used in (C.1) to take the conditional expectation over the unknown parameters h. We assume that h and X are independent of each other. This is a reasonable assumption since the CIR does not depend on the transmitted signal in general. Thus, for the purpose (C.14) Since the distribution of random vector Y given h and X (p) is Gaussian with mean h (p) and covariance matrix Σ (p) , it is easy to compute Moreover, all entries of matrix G are given in terms of the signal energies, that is, |X(0)| 2 , . . . , |X(M − 1)| 2 . Thus, we can compute the third part of (C.12) as follows: In order to calculate Q(X|X (p) ) completely with respect to X, we write (C.13) as follows:

E. DERIVATION OF CRLB
In [25], the authors derive the CRLB by separating the complex vector h into real and imaginary parts. We will simplify the derivation by applying derivatives with respect to the complex vector itself. Recall the OFDM system model can be represented as in (26), where we use the same notation as in Section 3.3. We again assume for generality that the channel is static over the period of D frames. We will derive a CRLB by using all the data from the D frames. The parameter vector here is obviously θ = h. The CRLB gives a lower bound [28] for the variance of an unbiased estimate: where I(h) is the Fisher information matrix Following (26), we have the conditional pdf of Y given h: where we assume the data matrix X is known. Therefore, the pdf is not conditioned on X. After differentiating the logarithm of (E.3) with respect to h, we obtain Then the Fisher information matrix is obtained as follows: (E.5) The CRLB for the CIR of the ith tap is I −1 (h) ii . We define the CRLB for the overall CIR as follows: