Channel Frequency Response Estimation for MIMO Systems with Frequency-Domain Equalization



Introduction
The severe frequency selectivity that often characterizes wideband radio channels inevitably induces intersymbol interference (ISI), which can span many symbol intervals. High-speed broadband wireless systems targeting data rates of tens of megabits per second or beyond should therefore be designed to mitigate such intense ISI. Traditionally, time-domain equalization (TDE) has been a popular approach to compensating for ISI in single-carrier communication systems. For wideband channels, however, TDE becomes unattractive, because either its complexity grows exponentially with the channel memory or it requires very long finite impulse response filters to achieve acceptable performance. An alternative approach is single-carrier frequency-domain equalization (SC-FDE), which offers a large reduction in computational complexity through the use of the fast Fourier transform (FFT) (see [1][2][3] for a tutorial treatment). Even compared with orthogonal frequency-division multiplexing (OFDM), a well-recognized multicarrier solution for combating channel delay spread that also uses the FFT, single-carrier transmission with FDE can handle the same channels with similar performance and essentially the same overall complexity, but with a smaller peak-to-average transmitted power ratio [1]. This is particularly advantageous for mobile terminals and mobile personal assistants, as it greatly alleviates the requirements on the radio frequency hardware at the transmitter, such as the digital-to-analog converter and the power amplifier. For that reason, single-carrier frequency division multiple access (SC-FDMA), which is essentially based on SC-FDE, has been adopted for uplink transmission in the next-generation cellular systems 3GPP long-term evolution (LTE) and LTE-Advanced [4]. SC-FDE has thus attracted growing attention in both academic and industrial circles.
SC-FDE has also been applied to multiple-input multiple-output (MIMO) communication systems. This, however, is often done jointly with space-time coding (STC), so that the spatial diversity available in a MIMO system can be exploited to further mitigate the frequency selectivity; see, for example, [5][6][7][8][9]. In this case, properly designed ST block codes (STBCs) are generally required, and several designs exist. For example, a time-reversal Alamouti-like STBC scheme with FDE was first proposed in [5]. This scheme is attractive because it achieves full spatial diversity and nearly full transmit rate if the cyclic prefix (CP) overhead is ignored. For SC-FDE in MIMO systems with more than 2 transmit antennas, a general block-level STC was proposed in [6], and a method based on quasi-orthogonal STBCs was proposed in [7].
Note that when performing FDE in MIMO systems, the channel frequency response (CFR) between each transmit-receive antenna pair is usually required at the receiver to recover the transmitted signals [2,3]. One way to obtain this CFR knowledge is to first estimate the channel impulse response (CIR) and then transform it to the frequency domain through FFT processing. The CFR estimation problem then reduces to the problem of estimating the CIR in MIMO systems, which has been investigated extensively over the years; see, for example, [10] and references therein. Alternatively, one can apply the FFT first and then estimate the CFR directly. This alternative approach has been studied, for example, in [11] for systems with a single transmit and a single receive antenna, and in [12] for SC-FDE in ultrawideband communication systems. However, few works appear to explore it for MIMO systems employing both STC and SC-FDE. This line of work merits interest in its own right: not only can it advance the existing knowledge on CFR estimation, but a CFR estimation scheme designed to be integrated with STC and FDE in MIMO systems is also amenable to system implementation and has the potential to reduce hardware complexity and cost. This motivates our work, as detailed next.
In this paper, we present and investigate a CFR estimation scheme for MIMO systems with both STC and FDE. In this scheme, training sequences are encoded in space and time in the same manner as data sequences. (We note that CIR estimation for MIMO channels using ST codes was considered in [13,14].) In fact, the same coding hardware can be reused; thus, no additional hardware complexity is introduced at the transmitter, which is particularly suitable for mobile terminals. At the receiver, unlike the traditional approach in which the CIR is obtained first and then transformed to the CFR, the training sequences are simply processed in the same way as the data sequences, for example, through CP removal and FFT processing. The CFR can then be estimated directly in the frequency domain. Because the CFR estimation can reuse the existing FFT modules for FDE, less complexity and cost are required at the receiver. This scheme is illustrated in Figure 1. Further, we provide a thorough mean square error (MSE) analysis of the CFR estimation based on two criteria, least squares (LS) and minimum MSE (MMSE), which assume different a priori knowledge of the channel statistics. More specifically, for the LS-based approach, we assume that no a priori knowledge of the channel statistics is available other than the noise statistics, while for the MMSE-based method, we assume that both the channel covariance matrix and the noise statistics are known. Under both criteria, we also study the optimal training sequence design under a constraint on the transmit power of the training sequences. Finally, we investigate the adaptive implementation of the proposed CFR estimation scheme for Alamouti-like transmissions. We provide several block-wise recursive algorithms to update the adaptive filter and study their convergence behavior.
The remainder of this paper is structured as follows. In Section 2, we describe the system model and the transmission scheme of the training sequences. In Section 3, we describe in detail the CFR estimation scheme for MIMO systems with more than 2 transmit antennas. We also investigate the optimal training sequence design under both the LS and MMSE criteria. In Section 4, we focus on the special Alamouti case with 2 transmit antennas. We discuss an adaptive implementation of the CFR estimation scheme for this special case and provide a brief convergence analysis. In Section 5, we provide extensive simulation results and compare with other work to demonstrate the efficacy of this estimation approach. Section 6 concludes this paper.
Notation. Throughout this paper, we use bold uppercase letters to denote matrices and bold lowercase letters to denote column vectors. The superscripts $\{\cdot\}^H$, $\{\cdot\}^*$, and $\{\cdot\}^T$ denote the complex conjugate transpose, conjugate, and transpose of a matrix or vector, respectively. We use $\mathrm{diag}\{\mathbf{a}\}$ for a diagonal matrix whose diagonal is given by the vector $\mathbf{a}$, and $\otimes$ for the Kronecker product. $\mathbf{I}_K$ denotes the identity matrix of size $K \times K$, and $\mathbf{0}_{M \times N}$ the zero matrix of size $M \times N$. We use the subscript $\{\cdot\}_F$ to denote matrices or vectors in the frequency domain, and $(\cdot)^+$ for the nonnegative part of a real-valued scalar or matrix.

Signal and System Model
We consider an ST-coded MIMO system equipped with $N_T$ transmit and $N_R$ receive antennas. With symbol-rate sampling, let $\mathbf{h}^{(p,q)} = [h^{(p,q)}(0), \ldots, h^{(p,q)}(\nu)]^T$ denote the equivalent baseband discrete-time CIR (including the transmit and receive filters as well as the multipath effect) between the $p$th transmit antenna and the $q$th receive antenna, where $1 \le p \le N_T$, $1 \le q \le N_R$, and $\nu$ is the channel order. We assume the channel is quasistatic, that is, its response remains time invariant within one ST-coded frame but can vary from frame to frame. We define $N_S$ vectors of dimension $L \times 1$, $\{\mathbf{s}_i\}_{i=1}^{N_S}$, as the training sequences, where the symbols in $\mathbf{s}_i$ belong to the same alphabet $\mathcal{A}$, and $L$ denotes the sequence length, which is assumed to be at least equal to the number of multipaths, that is, $L \ge \nu + 1$. In the proposed CFR estimation scheme, the training sequence $\mathbf{s}_i$ is encoded in space and time using the same ST block encoder as for data sequences, as depicted in Figure 1. As a result, the same hardware can be reused without additional complexity and cost. For the ST encoder, we adopt the code design described in [6], which extends the original orthogonal STBCs in [15,16] to frequency-selective fading channels. STBCs of this type are capable of achieving full spatial diversity and are particularly amenable to FDE.
Without loss of generality, suppose the $N_S$ training blocks are ST coded such that they are transmitted over $N_c = 2N_S$ time slots, where a time slot is defined as the duration required to transmit one CP-appended training block. Thus, the code rate is $R = N_S/N_c = 1/2$. There exist some sporadic code designs that achieve a code rate higher than 1/2. For example, when $N_T = 3$ and 4, a code design with $R = 3/4$ can be found in [16]. However, it has been proved in [17] that with a complex signal constellation and under the orthogonality assumption, $R$ cannot be greater than 3/4 for $N_T > 2$. For simplicity, in this part we focus only on the case of $R = 1/2$ for $N_T > 2$. The special case of $R = 1$ for $N_T = 2$ will be discussed in detail in Section 4.
Let $\{\boldsymbol{\Pi}_i\}_{i=1}^{N_S}$ be a set of $N_S \times N_T$ real-valued matrices of a full-rate generalized orthogonal STBC design for real symbols. Entries of $\boldsymbol{\Pi}_i$ are either 0 or $\pm 1$, and $\boldsymbol{\Pi}_i$ further satisfies the conditions given in (1) [18, Chapter 7]. Then, for the block-level generalized complex orthogonal STBC employed in our work, the code matrix $\mathbf{G} \in \mathbb{C}^{N_c L \times N_T}$ can be written as in (2), where $\boldsymbol{\Gamma}_{Ai}$ and $\boldsymbol{\Gamma}_{Bi}$ are both $N_c \times N_T$ matrices, defined in (3). In (2), $\mathbf{P}_L^{(1)}$ is an $L \times L$ permutation matrix which performs a reverse cyclic shift when applied to an arbitrary $L \times 1$ vector such as $\mathbf{s} = [s(0), s(1), \ldots, s(L-1)]^T$; see (4). Given the properties of $\boldsymbol{\Pi}_i$ in (1), it can be easily verified that $\boldsymbol{\Gamma}_{Ai}$ and $\boldsymbol{\Gamma}_{Bi}$ have the properties listed in (5). Let $\mathbf{G}(:, i)$ denote the $i$th column of $\mathbf{G}$, which corresponds to the training blocks to be transmitted from the $i$th transmit antenna over $N_c$ time slots. For notational convenience, we express the $i$th column of $\mathbf{G}$ as in (6), where $i = 1, \ldots, N_T$. To give an example of $\mathbf{G}$, let us consider a code design with rate $R = 1/2$ for $N_T = 3$, where $N_S = 4$ and $N_c = 8$. For this instance, $\mathbf{G}$ is given in (7); its $k$th block row, $k = 1, \ldots, 8$, is $[\mathbf{s}_1(k), \mathbf{s}_2(k), \mathbf{s}_3(k)]$. After ST coding, the transmission structure of the training sequences is shown in Table 1.
To avoid interblock interference from preceding information or training sequences, a CP of length $\nu$ is inserted before each block prior to transmission. Then, at time slot $k$, the training sequence $\mathbf{s}_p(k)$ is forwarded to the $p$th transmit antenna after CP insertion. The total number of training symbols from each transmit antenna, denoted as $N_b$, is $N_b = N_c(L + \nu)$, and its minimum is $N_b = N_c(2\nu + 1)$, attained when $L$ is chosen equal to $\nu + 1$.
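The training overhead formula above is easy to check numerically. The following is a minimal sketch; the function name `training_overhead` and its argument names are illustrative choices and do not appear in the paper.

```python
def training_overhead(n_slots: int, block_len: int, channel_order: int) -> int:
    """Training symbols per transmit antenna: N_b = N_c * (L + nu).

    n_slots       -- N_c, number of time slots used by the ST-coded training
    block_len     -- L, training block length (must satisfy L >= nu + 1)
    channel_order -- nu, which is also the CP length
    """
    if block_len < channel_order + 1:
        raise ValueError("block length L must be at least nu + 1")
    return n_slots * (block_len + channel_order)


# Alamouti-like case (N_c = 2) with nu = 3, as used later in the simulations.
nb_min = training_overhead(2, 4, 3)    # L = nu + 1, the minimum length
nb_long = training_overhead(2, 7, 3)   # longer training blocks
# Rate-1/2 code for N_T = 3 (N_c = 8) with the minimum block length.
nb_rate_half = training_overhead(8, 4, 3)
print(nb_min, nb_long, nb_rate_half)
```

With $\nu = 3$, the two Alamouti-like values reproduce the figures $N_b = 14$ and $N_b = 20$ quoted in the simulation section.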

CFR Estimation for MIMO Transmissions (N T > 2)
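At the receiver, the symbols corresponding to the CP are discarded. Thus, the received signal at the $q$th receive antenna at time slot $k$ can be written as
$$\mathbf{x}_q(k) = \sum_{p=1}^{N_T} \mathbf{H}^{(p,q)} \mathbf{s}_p(k) + \mathbf{n}_q(k), \tag{8}$$
where $\mathbf{H}^{(p,q)}$ is an $L \times L$ channel matrix with its $(k,l)$th entry given by $h^{(p,q)}((k-l) \bmod L)$, and $\mathbf{n}_q(k)$ denotes the additive white Gaussian noise (AWGN) vector. It is easy to verify that $\mathbf{H}^{(p,q)}$ is a circulant matrix. Thus, its eigenvector matrix is the FFT matrix; in other words, its eigendecomposition can be written as
$$\mathbf{H}^{(p,q)} = \mathbf{F}_L^H \boldsymbol{\Lambda}^{(p,q)} \mathbf{F}_L, \tag{9}$$
where $\mathbf{F}_L$ is the orthonormal FFT matrix whose $(k,l)$th entry is given by
$$[\mathbf{F}_L]_{k,l} = \frac{1}{\sqrt{L}} \exp\left(-j \frac{2\pi (k-1)(l-1)}{L}\right), \tag{10}$$
and $\boldsymbol{\Lambda}^{(p,q)}$ is a diagonal matrix whose diagonal holds the CFR vector $\mathbf{h}_F^{(p,q)}$, that is,
$$[\boldsymbol{\Lambda}^{(p,q)}]_{i,i} = h_F^{(p,q)}(i) = \sum_{l=0}^{\nu} h^{(p,q)}(l) \, e^{-j 2\pi l (i-1)/L}, \tag{11}$$
where $i = 1, \ldots, L$. Applying the FFT on both sides of (8), we obtain
$$\mathbf{x}_{qF}(k) = \sum_{p=1}^{N_T} \boldsymbol{\Lambda}^{(p,q)} \mathbf{s}_{pF}(k) + \mathbf{n}_{qF}(k), \tag{12}$$
where $\mathbf{x}_{qF}(k) = \mathbf{F}_L \mathbf{x}_q(k)$, $\mathbf{s}_{pF}(k) = \mathbf{F}_L \mathbf{s}_p(k)$, and $\mathbf{n}_{qF}(k) = \mathbf{F}_L \mathbf{n}_q(k)$. Since $\boldsymbol{\Lambda}^{(p,q)}$ is diagonal, we can rewrite (12) into
$$\mathbf{x}_{qF}(k) = \sum_{p=1}^{N_T} \mathbf{S}_{pF}(k) \mathbf{h}_F^{(p,q)} + \mathbf{n}_{qF}(k), \tag{13}$$
where $\mathbf{S}_{pF}(k) = \mathrm{diag}\{\mathbf{s}_{pF}(k)\}$. Stacking $N_c$ blocks of received signals at the $q$th receive antenna, we have
$$\begin{bmatrix} \mathbf{x}_{qF}(1) \\ \vdots \\ \mathbf{x}_{qF}(N_c) \end{bmatrix} = \begin{bmatrix} \mathbf{S}_{1F}(1) & \cdots & \mathbf{S}_{N_T F}(1) \\ \vdots & \ddots & \vdots \\ \mathbf{S}_{1F}(N_c) & \cdots & \mathbf{S}_{N_T F}(N_c) \end{bmatrix} \begin{bmatrix} \mathbf{h}_F^{(1,q)} \\ \vdots \\ \mathbf{h}_F^{(N_T,q)} \end{bmatrix} + \begin{bmatrix} \mathbf{n}_{qF}(1) \\ \vdots \\ \mathbf{n}_{qF}(N_c) \end{bmatrix}, \tag{14}$$
or in a more simplified form
$$\mathbf{x}_F^q = \mathbf{S}_F \mathbf{h}_F^q + \mathbf{n}_F^q. \tag{15}$$
Collecting the received signals across all $N_R$ receive antennas, we obtain the received data matrix
$$\mathbf{X}_F = \mathbf{S}_F \mathbf{H}_F + \mathbf{N}_F, \tag{16}$$
where $\mathbf{X}_F = [\mathbf{x}_F^1, \ldots, \mathbf{x}_F^{N_R}]$, $\mathbf{H}_F = [\mathbf{h}_F^1, \ldots, \mathbf{h}_F^{N_R}]$, and $\mathbf{N}_F = [\mathbf{n}_F^1, \ldots, \mathbf{n}_F^{N_R}]$. Thus, our task is to recover the CFR $\mathbf{H}_F$ from (16).
Table 1: Transmission structure of training sequences ($N_T > 2$).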
Additionally, let us denote $\mathbf{h}^q = [\mathbf{h}^{(1,q)T}, \ldots, \mathbf{h}^{(N_T,q)T}]^T$ as the corresponding CIR associated with the $q$th receive antenna, and stack the CIRs across all $N_R$ receive antennas in a matrix. We further define the compound inverse FFT (IFFT) matrix $\mathbf{F}_{N_T}^H = \mathbf{I}_{N_T} \otimes \mathbf{F}_L^H$ and the compound transmit matrix. The corresponding CIR estimate can then be computed as in (17), where $\hat{\mathbf{H}}_F$ is the CFR estimate of $\mathbf{H}_F$. In the sequel, we discuss the linear CFR estimators based on both the LS and MMSE criteria, along with the respective optimal designs of training sequences.
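The circulant property used earlier in this section, namely that the channel matrix with entries $h((k-l) \bmod L)$ is diagonalized by the orthonormal FFT matrix, can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, the channel taps are random, and the sign convention $e^{-j2\pi kl/L}$ for the FFT matrix is an assumption (it matches `numpy.fft.fft`).

```python
import numpy as np


def circulant_from_cir(h, L):
    """L x L circulant matrix whose (k, l) entry is h((k - l) mod L)."""
    h_pad = np.zeros(L, dtype=complex)
    h_pad[: len(h)] = h                      # zero-pad the CIR to length L
    idx = (np.arange(L)[:, None] - np.arange(L)[None, :]) % L
    return h_pad[idx]


def ortho_dft_matrix(L):
    """Orthonormal DFT matrix (assumed sign convention: exp(-j*2*pi*k*l/L))."""
    n = np.arange(L)
    return np.exp(-2j * np.pi * np.outer(n, n) / L) / np.sqrt(L)


rng = np.random.default_rng(0)
L, nu = 8, 3
h = rng.standard_normal(nu + 1) + 1j * rng.standard_normal(nu + 1)

H = circulant_from_cir(h, L)
F = ortho_dft_matrix(L)
Lam = F @ H @ F.conj().T                     # should be diagonal
cfr = np.fft.fft(h, L)                       # L-point CFR of the zero-padded CIR

print(np.allclose(Lam, np.diag(cfr)))        # diagonal entries are the CFR samples
```

Under these assumptions the diagonal of `Lam` coincides with the $L$-point FFT of the zero-padded CIR, which is why the equalizer and the estimator can work tone by tone.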

LS Estimator with Power Constraint.
For the convenience of ensuing analysis, we explicitly make the following assumption.
(A1) All noise components are assumed to be complex Gaussian, independent and identically distributed, with zero mean and variance $\sigma_n^2$.
Except for the noise statistics, we assume that no a priori knowledge of the channel parameters (e.g., the covariance matrix of the CFR) is available, and we consider only the conventional LS method. Therefore, the unique LS solution $\hat{\mathbf{H}}_F$ that minimizes the cost function $\|\mathbf{X}_F - \mathbf{S}_F \mathbf{H}_F\|^2$ can be written as
$$\hat{\mathbf{H}}_F = (\mathbf{S}_F^H \mathbf{S}_F)^{-1} \mathbf{S}_F^H \mathbf{X}_F. \tag{18}$$
It should be noted that if we want to obtain the CFR with a length greater than the default length $L$, interpolation is needed.
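As a rough illustration of the LS step, the sketch below solves $\min \|\mathbf{X}_F - \mathbf{S}_F \mathbf{H}_F\|^2$ for a generic full-column-rank training matrix. The dimensions, the random `S_F`, and the function name `ls_cfr_estimate` are assumptions made for the example; the ST-coded structure of the actual training matrix is not modeled here.

```python
import numpy as np


def ls_cfr_estimate(S_F, X_F):
    """Least-squares solution of X_F ~ S_F @ H_F (S_F must have full column rank)."""
    H_hat, *_ = np.linalg.lstsq(S_F, X_F, rcond=None)
    return H_hat


def complex_gaussian(rng, shape, std=1.0):
    """Circularly symmetric complex Gaussian samples with total variance std**2."""
    return std * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


rng = np.random.default_rng(0)
n_rows, n_cols, n_rx = 16, 8, 2              # stand-ins for N_c*L, N_T*L, N_R

S_F = complex_gaussian(rng, (n_rows, n_cols))    # hypothetical training matrix
H_F = complex_gaussian(rng, (n_cols, n_rx))      # hypothetical true CFR
X_clean = S_F @ H_F

# Without noise, the LS estimate recovers the CFR exactly.
H_hat_clean = ls_cfr_estimate(S_F, X_clean)

# With noise, lstsq agrees with the closed form (S^H S)^{-1} S^H X.
X_noisy = X_clean + complex_gaussian(rng, (n_rows, n_rx), std=0.05)
H_hat_noisy = ls_cfr_estimate(S_F, X_noisy)
closed_form = np.linalg.solve(S_F.conj().T @ S_F, S_F.conj().T @ X_noisy)

print(np.allclose(H_hat_clean, H_F), np.allclose(H_hat_noisy, closed_form))
```

Using `numpy.linalg.lstsq` rather than forming the inverse explicitly is a numerical convenience; both give the same estimate when $\mathbf{S}_F$ has full column rank.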
Based on assumption (A1), it is clear that this estimate is unbiased since $E\{\hat{\mathbf{H}}_F\} = \mathbf{H}_F$. Let us define the CFR estimation error as $\mathbf{E}_F = \mathbf{H}_F - \hat{\mathbf{H}}_F$. Using (16) and (18), we obtain
$$\mathbf{E}_F = -(\mathbf{S}_F^H \mathbf{S}_F)^{-1} \mathbf{S}_F^H \mathbf{N}_F. \tag{19}$$
Its correlation matrix, $\mathbf{R}_{E_F} = E\{\mathbf{E}_F \mathbf{E}_F^H\}$, is given in (20), and the MSE of this CFR estimation is given in (21). Now we consider the problem of designing the matrix $\mathbf{S}_F$ so that the estimation error is minimized. To have a reasonable solution, it is necessary to impose a constraint that limits the power of the training sequences. Let such a constraint be $\|\mathbf{S}_F\|^2 \le P_0$, where $P_0$ is a given constant. Note that the power used in the cyclic prefix is not included in this formulation. Mathematically, this power constraint can also be written as $\mathrm{tr}\{\mathbf{S}_F^H \mathbf{S}_F\} \le P_0$. For simplicity, we start with a general problem formulation, without examining the structure of the data matrix $\mathbf{S}_F$ but only assuming that it has full rank. Our task is therefore to find the $\mathbf{S}_F$ that minimizes the MSE subject to the power constraint given above. This constrained optimization problem is cast in (22). To solve this problem, the following lemma will be useful.

Lemma 1.
For any $M \times M$ positive semidefinite Hermitian matrix $\mathbf{A}$ with its $(i,j)$th entry given by $a_{ij}$, the inequality (23) holds, where the equality is achieved if and only if $\mathbf{A}$ is diagonal.
Applying this lemma and the method of Lagrange multipliers [19], we can readily solve this optimization problem. For brevity, we omit the details and simply provide the solution
$$\mathbf{S}_F^H \mathbf{S}_F = \frac{P_0}{N_T L} \mathbf{I}_{N_T L}, \tag{24}$$
which means that the diagonal entries of $\mathbf{S}_F^H \mathbf{S}_F$ have the same value. Re-examining the matrix $\mathbf{S}_F$ as defined in (14) and its relation to $\mathbf{G}$ in (2), we find that, due to the orthogonal structure of the ST code, $\mathbf{S}_F^H \mathbf{S}_F$ is precisely diagonal. Moreover, recalling that $\{\mathbf{s}_i\}_{i=1}^{N_S}$ are the training sequences, we define their frequency-domain counterparts $\mathbf{s}_{iF} = \mathbf{F}_L \mathbf{s}_i$. Then, we arrive at the following result.

Theorem 1. The following equality holds
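Proof. $\mathbf{S}_F^H \mathbf{S}_F$ is an $N_T L \times N_T L$ matrix and can be expressed in the block matrix form (26), where $\boldsymbol{\Xi}_{i,j}$, $i = 1, \ldots, N_T$, $j = 1, \ldots, N_T$, is a square matrix of size $L \times L$. According to both (6) and (14), $\boldsymbol{\Xi}_{i,j}$ can be expressed as in (27). To simplify (27), we use the mixed-product property of the Kronecker product, that is, $(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = (\mathbf{A}\mathbf{C}) \otimes (\mathbf{B}\mathbf{D})$, where $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$ are matrices of such sizes that one can form the matrix products $\mathbf{A}\mathbf{C}$ and $\mathbf{B}\mathbf{D}$. Further, given the properties of $\boldsymbol{\Gamma}_{Am}$ and $\boldsymbol{\Gamma}_{Bn}$ in (5), we obtain corresponding properties for $\boldsymbol{\Gamma}_{Am}^T(:, i)\boldsymbol{\Gamma}_{An}(:, j)$; similar properties also hold for $\boldsymbol{\Gamma}_{Bm}^T(:, i)\boldsymbol{\Gamma}_{Bn}(:, j)$. Based on these properties, (27) can be simplified into (30). Plugging (30) into (26), we then obtain (25).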
Based on (24) and (25), we summarize the following result.

Theorem 2. The optimal training signals under the LS criterion should satisfy the following condition:
This condition is the same as (32), where $s_{iF}(j)$ denotes the $j$th element of $\mathbf{s}_{iF}$.
Note that although Theorem 2 states the conditions for training signals to be optimal in the sense of achieving the minimum MSE, it does not mean that every sequence satisfying (32) is suitable for practical applications, because practical implementation of communication systems inevitably imposes additional constraints on the sequences. As an example, consider CP-based communication systems. These systems are usually plagued by the well-known peak-to-average ratio (PAR) problem; thus, sequences with lower PAR values are generally preferred in practice, for they greatly alleviate the requirement on the power amplifier. Under this circumstance, training sequences that not only satisfy (32) but also have a constant magnitude in both the time domain and the frequency domain are a superior choice, for they avoid the PAR problem while achieving the minimum MSE. Chu sequences [20] and the class of training sequences proposed in [21] are examples of such sequences. Finally, the resulting minimum value of the MSE is given in (33).

MMSE Estimator with Power Constraint.
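In this section, we consider the linear MMSE estimation of the CFR as well as the optimal training sequence design. For simplicity, we consider only the CFR associated with the $q$th receive antenna, that is, $\mathbf{h}_F^q$, which was defined in (14). Besides assumption (A1), we make one additional assumption about the channel statistics as follows.
(A2) The CFR $\mathbf{h}_F^q$ is a Gaussian random vector with zero mean and full-rank covariance matrix $\boldsymbol{\Sigma}_q$.
For convenience, we denote $\boldsymbol{\Sigma}_q$ by $\boldsymbol{\Sigma}$. Since $\mathbf{h}_F^q = (\mathbf{I}_{N_T} \otimes \mathbf{F}_L)\mathbf{h}^q$, we have
$$\boldsymbol{\Sigma} = (\mathbf{I}_{N_T} \otimes \mathbf{F}_L) \, E\{\mathbf{h}^q (\mathbf{h}^q)^H\} \, (\mathbf{I}_{N_T} \otimes \mathbf{F}_L)^H, \tag{34}$$
where $E\{\mathbf{h}^q (\mathbf{h}^q)^H\}$ is the covariance matrix of the corresponding CIR.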
The MMSE estimate of the CFR is given by (35). We define the CFR estimation error as $\mathbf{e}_F^q = \mathbf{h}_F^q - \hat{\mathbf{h}}_F^q$, from which the resulting MSE follows. Similar to the approach that we took in Section 3.1, we also impose a power constraint and formulate the design problem as a constrained minimization. Note that $\boldsymbol{\Sigma}$ can be diagonalized through its eigenvalue decomposition, that is, $\boldsymbol{\Sigma} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^H$, where $\mathbf{V}$ is a unitary matrix whose columns are the eigenvectors of $\boldsymbol{\Sigma}$, and $\boldsymbol{\Lambda}$ is a nonnegative diagonal matrix consisting of all the eigenvalues of $\boldsymbol{\Sigma}$. Then, (36) can be reformulated in terms of $\boldsymbol{\Psi} = \mathbf{S}_F \mathbf{V}$.

Consequently, we can reformulate the optimization problem into min
Using the method of Lagrange multipliers [19], we can obtain the solution to the modified optimization problem, where $\Lambda_{ii}$ denotes the $(i,i)$th element of $\boldsymbol{\Lambda}$ and $\tau$ is a constant whose value is found by solving an accompanying equation. Alternatively, $\mathbf{Q}$ can be rewritten as in (43), from which the resulting MSE can be computed. It is worth noting that $\boldsymbol{\Psi}^H \boldsymbol{\Psi}$ is invariant to the premultiplication of $\boldsymbol{\Psi}$ by a semi-orthogonal matrix. Thus, given the optimal solution for $\mathbf{Q}$ in (43), a general solution for $\boldsymbol{\Psi}$ can be composed as $\boldsymbol{\Psi} = \mathbf{Z}\mathbf{Q}^{1/2}$, where $\mathbf{Z}$ is an $N_c L \times N_T L$ matrix whose columns form an orthonormal basis. Since $\boldsymbol{\Psi} = \mathbf{S}_F \mathbf{V}$, it is clear that the necessary condition for $\mathbf{S}_F$ to be optimum is $\mathbf{S}_F = \mathbf{Z}\mathbf{Q}^{1/2}\mathbf{V}^H$. Meanwhile, we have $\mathbf{S}_F^H \mathbf{S}_F = \mathbf{V}\mathbf{Q}\mathbf{V}^H$, and both sides are diagonal matrices. Considering the structure of $\mathbf{S}_F$ in (14) and applying Theorem 1, we are thus led to the following result.

Theorem 3. The optimal training signals under the MMSE criterion should satisfy the following condition for a specific channel statistics (i.e., Σ)
Equation (45) specifies the essential characteristics of the optimum sequence under the MMSE criterion. It indicates that the optimal design should employ a water-filling type of power allocation. Evidently, the structure of the covariance matrix $\boldsymbol{\Sigma}$ has a large impact on the optimal training signal design. For example, when $\boldsymbol{\Sigma}$ is diagonal, we can see from (34) that $E\{\mathbf{h}^q (\mathbf{h}^q)^H\}$ can be a block circulant matrix, and the optimum condition (45) represents a water-filling in power distribution with respect to the power spectral density samples of the CIR. For this special case, the optimal sequence may be generated through frequency-domain water-filling. For cases where $\boldsymbol{\Sigma}$ is not diagonal, the optimal condition (45) may need to be considered jointly with the Kronecker product approximation in [22]. We omit further discussion for brevity.
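To make the water-filling idea concrete, the following sketch computes a generic water-filling allocation. It is not a transcription of (45): it assumes that the MSE to be minimized has the standard linear-MMSE form $\sum_i (1/\Lambda_{ii} + q_i/\sigma_n^2)^{-1}$ under $\sum_i q_i \le P_0$, for which the optimum is $q_i = (\tau - \sigma_n^2/\Lambda_{ii})^+$ with a common level $\tau$. The function name and the numerical values are hypothetical.

```python
import numpy as np


def water_filling(eigvals, noise_var, total_power):
    """Generic water-filling: q_i = max(level - noise_var / eigvals[i], 0),
    with the level chosen so that sum(q) equals total_power.

    Assumes all eigenvalues are strictly positive (full-rank covariance).
    """
    lam = np.asarray(eigvals, dtype=float)
    floor = noise_var / lam                  # per-mode "floor" heights
    order = np.argsort(floor)                # strongest modes first
    floor_sorted = floor[order]
    q = np.zeros_like(lam)
    # Try the m strongest modes as the active set, largest m first.
    for m in range(len(lam), 0, -1):
        level = (total_power + floor_sorted[:m].sum()) / m
        if level > floor_sorted[m - 1]:      # every active mode gets positive power
            q[order[:m]] = level - floor_sorted[:m]
            break
    return q


# Hypothetical eigenvalues of the CFR covariance, noise variance, and power budget.
q = water_filling([4.0, 1.0, 0.01], noise_var=0.1, total_power=1.0)
print(q, q.sum())
```

In this example the weakest mode receives no training power, while the two stronger modes share the budget; the stronger of the two receives more.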

CFR Estimation for Alamouti-Like Transmissions
Here, we study the CFR estimation for the special case of $N_T = 2$ and $N_R = 1$. This corresponds to Alamouti-type transmission, where $N_S = N_c = 2$ and $R = 1$. The transmission structure of the training sequences is illustrated in Table 2. The total number of training symbols from each transmit antenna, $N_b$, is $N_b = 2(L + \nu)$, and its minimum is $N_b = 4\nu + 2$, attained when $L$ takes its minimum value $\nu + 1$. At the receiver, the CPs are removed, which yields the channel input-output relationship in matrix-vector form given in (46), where $\mathbf{x}_1(k)$ and $\mathbf{x}_1(k+1)$ denote two consecutive received blocks at the single receive antenna. Applying the orthonormal FFT matrix $\mathbf{F}_L$ to (46), we obtain the corresponding frequency-domain input-output relationship. For this special case, the CFR estimation based on both the LS and MMSE criteria can be readily obtained by following the procedures outlined in Section 3. In this section, we further demonstrate that the CFR estimation for this special case can be implemented adaptively with block-wise recursive algorithms. We also provide a brief convergence analysis of these algorithms.

Adaptive Implementation of CFR Estimation.
It is easy to show that the CFR estimator for this special case has the structure given in (49). We further define the $L \times L$ diagonal matrices $\mathbf{X}_{1F}(k) = \mathrm{diag}\{\mathbf{x}_{1F}(k)\}$ and $\mathbf{X}_{1F}(k+1) = \mathrm{diag}\{\mathbf{x}_{1F}(k+1)\}$. Then, (49) can be reformulated into (50), or the simplified form (51), where, in (50), $\boldsymbol{\Phi} = [\mathbf{X}_{1F}^H(k)\mathbf{X}_{1F}(k) + \mathbf{X}_{1F}^H(k+1)\mathbf{X}_{1F}(k+1)]^{-1}$; $\mathbf{h}_F$ is a $2L \times 1$ vector; $\mathbf{U}_F$ is an orthogonal matrix of size $2L \times 2L$; and $\mathbf{g}_F$ is a $2L \times 1$ vector that contains the elements of $\mathbf{g}_{1F}$ and $\mathbf{g}_{2F}$.
We would like to emphasize that the reformulation from (49) to (50) is possible largely because of the favorable structure of Alamouti's code. As a result, the CFR estimation can be performed adaptively, and the channel can be tracked while the adaptive filter operates. More specifically, we can view $\mathbf{U}_F$ as the tap-input data matrix, $\mathbf{g}_F$ as the output, and $\mathbf{h}_F$ as the filter coefficients. The block diagram of this adaptive filter is depicted in Figure 2. We further define the error signal $\breve{\mathbf{e}}_F$, which is generated by comparing the filter output with the desired response, as in (52). Note that because $\mathbf{g}_F$ is fixed and available beforehand at the receiver, the adaptive filter can always operate in training mode. Hence, if the channel is slowly time-varying, the adaptive method, by estimating the current channel gains from the previous channel estimate, can refine the accuracy without significantly increasing the complexity. Simulation results illustrating this can be found in Section 5. For notational convenience, we add the time index to vectors and matrices in the ensuing description. The recursive algorithms used to update the CFR estimate, namely the block least mean square (LMS) algorithm and the block recursive least squares (RLS) algorithm, are summarized in Table 3.
The block RLS algorithm usually converges faster than the block LMS algorithm (as will be shown later by simulation results), but this faster convergence comes at the cost of a heavy increase in computational complexity. To see this, let us examine the computational complexity of both algorithms. At each iteration, the block LMS algorithm requires around $O(8L)$ computations, while the block RLS algorithm requires $O(24L^3 + 20L^2 + 4L)$ operations. A fast version of the block RLS algorithm, namely the fast subsampled-updating RLS algorithm [23], can achieve some complexity reduction but may make the filter cumbersome. Fortunately, thanks to the special structure of Alamouti's code, it is easy to verify that $\mathbf{U}_F^H(k)\mathbf{U}_F(k) = \mathbf{I}_{2L}$. Furthermore, we can infer that $\mathcal{P}(k)$ (cf. Table 3) is a $2L \times 2L$ diagonal matrix, that is, $\mathcal{P}(k) = \mathbf{I}_2 \otimes \mathbf{P}(k)$, where $\mathbf{P}(k)$ denotes an $L \times L$ diagonal matrix. Then, by following a technique similar to that used in [24,25], we can avoid the matrix inversion in the block RLS algorithm and hence achieve a substantial reduction in computational complexity without losing the convergence advantage. For brevity, we summarize the simplified algorithm in Table 4. This simplified algorithm requires only $O(13L)$ operations per iteration, which is much less than the original block RLS algorithm.
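The effect of the property $\mathbf{U}_F^H \mathbf{U}_F = \mathbf{I}_{2L}$ on a block LMS recursion can be illustrated with a small simulation. The sketch below is not a reproduction of Table 3: it assumes the textbook block LMS update $\mathbf{h}_F \leftarrow \mathbf{h}_F + \mu \mathbf{U}_F^H(\mathbf{g}_F - \mathbf{U}_F \mathbf{h}_F)$, and it uses a hypothetical unitary stand-in for $\mathbf{U}_F$ built from two diagonal blocks with unit-modulus entries.

```python
import numpy as np


def alamouti_like_unitary(rng, L):
    """Hypothetical stand-in for U_F: a 2L x 2L unitary matrix
    [[A, B], [-B*, A*]] / sqrt(2) with diagonal, unit-modulus A and B."""
    A = np.diag(np.exp(2j * np.pi * rng.random(L)))
    B = np.diag(np.exp(2j * np.pi * rng.random(L)))
    return np.vstack([np.hstack([A, B]),
                      np.hstack([-B.conj(), A.conj()])]) / np.sqrt(2)


def block_lms(h_true, mu, n_iter, noise_std, rng):
    """Textbook block LMS (assumed form): h <- h + mu * U^H (g - U h)."""
    dim = h_true.size
    h_hat = np.zeros(dim, dtype=complex)     # start from an all-zero estimate
    deviation = []
    for _ in range(n_iter):
        U = alamouti_like_unitary(rng, dim // 2)
        noise = noise_std * (rng.standard_normal(dim)
                             + 1j * rng.standard_normal(dim)) / np.sqrt(2)
        g = U @ h_true + noise               # desired response for this block
        e = g - U @ h_hat                    # error signal
        h_hat = h_hat + mu * (U.conj().T @ e)
        deviation.append(np.linalg.norm(h_true - h_hat) ** 2)
    return h_hat, np.array(deviation)


rng = np.random.default_rng(1)
L = 4
h_true = (rng.standard_normal(2 * L) + 1j * rng.standard_normal(2 * L)) / np.sqrt(2)
h_hat, dev = block_lms(h_true, mu=0.5, n_iter=60, noise_std=0.01, rng=rng)
print(dev[0], dev[-1])                       # squared deviation, first vs. last block
```

Because the stand-in tap-input matrix is unitary, the weight error contracts by the factor $(1 - \mu)$ per block in this sketch, so the squared deviation settles quickly at a noise-determined floor.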
It is worth remarking that the above adaptive implementation of the CFR estimation is a special property of the Alamouti scheme with $N_T = 2$. When $N_T$ increases beyond 2, the linear CFR estimator $\mathbf{G}_F$, under both the LS and MMSE criteria (cf. (18) and (35)), no longer has the simple Alamouti structure, and so a transformation similar to that from (49) to (50) may not necessarily hold. The adaptive implementation of CFR estimation for $N_T > 2$ therefore requires further investigation. In the block LMS algorithm, $\mu$ denotes the step size, and the block-wise recursions are computed at time instants $k = 2, 4, \ldots$.

Convergence Analysis.
The convergence behavior of these block-level recursive algorithms is briefly discussed as follows. We are interested in the behavior of $\xi(k) = E\{\breve{\mathbf{e}}_F(k)\breve{\mathbf{e}}_F^H(k)\}$, particularly at the steady state, where $\breve{\mathbf{e}}_F(k)$ denotes the error signal defined in (52). For the block LMS algorithm, we define the weight-error vector $\mathbf{v}_F(k)$ as in (53), where $\mathbf{h}_{F,0}$ is the optimum tap-weight vector for the filter. The update recursion can then be expressed in terms of $\mathbf{v}_F(k)$ and $\mathbf{e}_{F,0}(k)$, the estimation error produced by the optimum filter. Let the weight-error correlation matrix $\mathbf{R}_{vv}(k)$ be given as in (56). Thus, the MSE of the weight vector error can be obtained by simply taking the trace of $\mathbf{R}_{vv}(k)$. To facilitate the convergence analysis, we make the following assumptions.
(A3) Elements of $\mathbf{e}_{F,0}(k)$ are samples of a white noise process, which implies that $E\{\mathbf{e}_{F,0}(k)\mathbf{e}_{F,0}^H(k)\} = \xi_{\min} \cdot \mathbf{I}_{2L}$, where $\xi_{\min}$ is the minimum MSE at the filter output.
(A4) $\mathbf{U}_F(k)$ and $\mathbf{e}_{F,0}(k)$ are jointly Gaussian and are uncorrelated with each other.
(A5) $\mathbf{v}_F(k)$ is independent of $\mathbf{U}_F(k)$ and $\mathbf{e}_{F,0}(k)$. Further, $\mathbf{R}_{uu}$ denotes the correlation matrix of the filter tap inputs.
Based on the above assumptions and following a procedure similar to that in [26, Appendix 8A], we can approximately compute the excess MSE, which is defined as the difference between the steady-state MSE (i.e., $\xi(k = \infty)$) and the minimum MSE $\xi_{\min}$ of an adaptive filter. The result depends on $\mathrm{tr}(\mathbf{R}_{uu})$, which is equivalent to the sum of the powers of the signal samples at the filter tap inputs. The misadjustment, a dimension-free degradation measure defined as the ratio of the steady-state value of the excess MSE to the minimum MSE, and the steady-state MSE of the block LMS algorithm then follow directly. It is clear that the convergence behavior of the block LMS algorithm is governed by the eigenvalues of the correlation matrix $\mathbf{R}_{uu}$ of the filter tap input. Therefore, similar to the conventional LMS algorithm, the block LMS algorithm is by nature also a stochastic implementation of the steepest-descent method [26].
For the block RLS algorithm, the convergence analysis is undertaken on an adaptive identification scheme [27]. We consider a linear multiple regression model characterized by $\mathbf{g}_F(k) = \mathbf{U}_F(k)\mathbf{h}_{F,0} + \mathbf{e}_{F,0}(k)$, where $\mathbf{h}_{F,0}$ is the regression parameter vector, $\mathbf{U}_F(k)$ is the tap-input matrix, $\mathbf{e}_{F,0}(k)$ is the measurement noise, and $\mathbf{g}_F(k)$ is the desired response. We define the weight-error vector $\mathbf{v}_F(k)$ as in (53) and its correlation matrix $\mathbf{R}_{vv}(k)$ as in (56). Further, we assume that the input signal vector is drawn from a stochastic process that is ergodic in the autocorrelation function, so that the time average can be used instead of the ensemble average [28]. Then, for $\lambda < 1$, following an approach similar to that described in [27] for the analysis of RLS algorithms, one can obtain the excess MSE of this block RLS algorithm at steady state, from which the misadjustment and the approximate steady-state MSE follow.

Simulation Results
In this section, we provide some simulation results to demonstrate the efficacy of our proposed scheme. In our simulations, we employ a specific block structure for both data and training sequences, which is illustrated in Figure 3, taking the case of $N_T = 2$ as an example. This structure accommodates the proposed CFR estimation scheme and various FDE techniques. We assume the channel is frequency selective with channel memory $\nu = 3$, and further assume block fading, that is, the channel fading gains are constant over one ST-coded block, including both data and training subblocks, but vary from block to block. For simplicity, we assume that no a priori knowledge is available regarding the channel second-order statistics. Hence, only the LS method is considered in our simulations. Chu sequences [20], a special case that satisfies the optimal condition given in (32), are chosen as the training sequences. We use 8-PSK for data transmission without channel coding. At the receiver, channel estimation and equalization are both performed in the frequency domain. As a result, the FFT modules for FDE can be easily reused for the CFR estimation. Several different FDE approaches that are applicable to the structure shown in Figure 3 can be found in [9] and are employed in our simulations.
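The constant-magnitude property that makes Chu sequences attractive as training sequences can be verified with a few lines of code. The sketch below uses the common textbook definition of a Chu sequence with root index `r` (an assumption; the exact variant used in [20] or in the simulations is not specified here), and the function name is illustrative.

```python
from math import gcd

import numpy as np


def chu_sequence(L, r=1):
    """Length-L Chu sequence with root index r (r must be coprime with L).

    Assumed definition: exp(j*pi*r*n^2/L) for even L,
    exp(j*pi*r*n*(n+1)/L) for odd L, n = 0, ..., L-1.
    """
    if gcd(r, L) != 1:
        raise ValueError("root index r must be coprime with L")
    n = np.arange(L)
    phase = np.pi * r * (n * n if L % 2 == 0 else n * (n + 1)) / L
    return np.exp(1j * phase)


for L in (4, 7):                              # the two block lengths used in Section 5
    s = chu_sequence(L)
    s_F = np.fft.fft(s, norm="ortho")         # orthonormal FFT, as with F_L
    print(L, np.allclose(np.abs(s), 1.0), np.allclose(np.abs(s_F), 1.0))
```

With unit-modulus time-domain samples and an orthonormal FFT, every frequency tone also has unit magnitude, so the training power is spread evenly over the tones.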
Figures 4(a) and 4(b) illustrate the BER performance of frequency-domain MMSE linear equalization and MMSE decision-feedback equalization, respectively, under both CFR estimation and perfect CFR knowledge. When $L = 4$ ($N_b = 14$), that is, the minimum length needed to estimate the CFR, we have $P_0 = 16$. The performance penalties due to inaccurate channel estimation, evaluated at BER $= 10^{-4}$, are about 2.4 dB for decision-feedback equalization and 2.8 dB for linear equalization. When $L$ is extended to 7, or equivalently $N_b$ is extended to 20 as shown in Figure 3, $P_0$ increases accordingly to 28. The BER performance penalties for decision-feedback equalization and linear equalization are then reduced to 1.1 dB and 1.9 dB, respectively.
Furthermore, we compare the performance of our approach with the method proposed in [29], which was designed for channel estimation in MIMO systems with SC-FDE. It allows the transmitted sequence to be nulled on certain frequency tones, causing the transmitted training sequences to be orthogonal in the frequency domain. Essentially, this approach is equivalent to on-off type estimation for each channel. To ensure a fair comparison, we apply the reference method [29] to the same structure depicted in Figure 3 for the case of $N_T = 2$. Then, both our scheme and the reference scheme achieve full rate, that is, $R = 1$. Since 20 symbols in total are allocated for channel parameter estimation in the structure shown in Figure 3, when implementing the approach reported in [29], we allocate 16 to the training sequences and 4 (rather than $\nu = 3$) to the CP. This is because [29] requires the length of the training sequences to be evenly divisible by $N_T$. Furthermore, Chu sequences [20] are also adopted as the training sequences for this benchmark approach, as they likewise satisfy the condition of optimality described in [29]. The BER performance of this algorithm is depicted in Figure 4 by dash-dot lines. As illustrated by Figure 4, the system using our proposed scheme performs as well as, if not better than, the system using the approach described in [29]. However, considering that the method in [29] requires the transformation from CFR to CIR and then back to CFR (see details in [29]), our approach appears much simpler and more straightforward.
Under a similar simulation setup, we also study the 2TX-2RX case, where the Alamouti-type STBC is employed at the transmitter side. At the receiver side, CFR estimation is performed based on the signals received across the two receive antennas and is followed by FDE. In particular, we consider equal gain diversity combining in the frequency domain. We further consider the 3TX-1RX case, where the code design illustrated in (7) is used. The BER performance of these scenarios under frequency-domain linear equalization is depicted in Figure 5. For the purpose of comparison, we also plot in the same figure the BER performance of the 2TX-1RX case. From these curves, we notice that the performance penalties due to inaccurate channel estimation are almost the same for the 2TX-1RX and 2TX-2RX cases, but are relatively smaller for the 3TX-1RX case. Furthermore, because of the additional receive antenna, the BER performance of 2TX-2RX is much improved over that of the 2TX-1RX case. However, as shown in Figure 5, the BER performance of 2TX-2RX is inferior to that of the 3TX-1RX case. This is largely due to the fact that the 3TX-1RX system we consider here is not a full-rate system (i.e., $R = 1/2$), in contrast to the systems employing two transmit antennas and the Alamouti-type code.
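The direct per-tone CFR estimation for two transmit antennas can be sketched as follows. This is only an illustration under assumptions that are not spelled out in this section: on each frequency tone the two received training blocks are modeled as $Y_1 = S_1 H_1 + S_2 H_2$ and $Y_2 = -S_2^* H_1 + S_1^* H_2$ (the usual Alamouti-type structure in the frequency domain), and the estimate is the LS solution of that $2 \times 2$ system. The function name `alamouti_ls_cfr` and all variable names are hypothetical.

```python
import numpy as np


def alamouti_ls_cfr(S1, S2, Y1, Y2):
    """Per-tone LS estimate of two CFRs from an Alamouti-type training pair.

    Assumed model on every tone:
        Y1 =  S1 * H1 + S2       * H2 + noise
        Y2 = -conj(S2) * H1 + conj(S1) * H2 + noise
    The training matrix is orthogonal, so the LS solution reduces to a
    matched combination divided by |S1|^2 + |S2|^2.
    """
    d = np.abs(S1) ** 2 + np.abs(S2) ** 2
    H1 = (np.conj(S1) * Y1 - S2 * Y2) / d
    H2 = (np.conj(S2) * Y1 + S1 * Y2) / d
    return H1, H2


rng = np.random.default_rng(0)
N = 16  # number of frequency tones (illustrative)

# Stand-in unit-modulus training spectra; Chu sequences would also qualify.
S1 = np.exp(2j * np.pi * rng.random(N))
S2 = np.exp(2j * np.pi * rng.random(N))

# Random "true" CFRs for one receive antenna, noise-free for clarity.
H1_true = rng.standard_normal(N) + 1j * rng.standard_normal(N)
H2_true = rng.standard_normal(N) + 1j * rng.standard_normal(N)
Y1 = S1 * H1_true + S2 * H2_true
Y2 = -np.conj(S2) * H1_true + np.conj(S1) * H2_true

H1_hat, H2_hat = alamouti_ls_cfr(S1, S2, Y1, Y2)
```

For a second receive antenna, the same function would simply be applied to that antenna's received blocks. No transformation to the CIR and back is involved, and the divisor $|S_1|^2 + |S_2|^2$ is the per-tone sum magnitude of the training signals.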
For the special case of $N_T = 2$, we also conduct simulations to study the behavior of the adaptive estimation algorithms. For simplicity, Chu sequences [20] are again used in our simulations. We set $L = \nu + 1$, $P_0 = 16$, and $\sigma_n^2 = 0.1$. Block fading is still adopted, but the channel fading is further assumed to be correlated in the time domain. This means that Doppler spread is introduced, which may affect the performance of the adaptive algorithms, as will be confirmed later. The rate of fading in our simulations is determined by $f_d T$, where $f_d$ denotes the maximum Doppler frequency shift and $T$ denotes the duration of one whole ST-coded block. A larger value of $f_d T$ implies faster fading and vice versa. The following simulation results are obtained by setting $f_d T = 10^{-4}$, unless otherwise stated.

Figure 6(a) shows a plot of the squared error $\|\breve{e}_F(k)\|^2$ versus the number of iterations for a single run or trial of the block-wise LMS and RLS algorithms. Since those algorithms iterate only once for each ST-coded frame, the number of iterations also corresponds to the number of frames. As shown in Figure 6(a), the learning curves for a single trial of both adaptive algorithms are noisy. However, it is clearly seen that the block RLS algorithm converges much faster than the block LMS algorithm. Additionally, we are interested in the behavior of the squared error deviation $\|v_F(k)\|^2$ for both algorithms. For the same realization, Figure 6(b) shows the transient behavior of $\|v_F(k)\|^2$ for both algorithms. As $\|\breve{e}_F(k)\|^2$ converges, $\|v_F(k)\|^2$ converges accordingly. Note, however, that the curves in the two figures are plotted at different vertical scales.

It is worth noting that in our simulations, we ran the filter from scratch by simply initializing the elements of the channel estimate (i.e., the filter coefficients) to zero. This is to demonstrate the convergence performance of these block-wise algorithms. In practice, it is certainly possible to speed up the convergence process and reduce the amount of training data. For example, for the first frame, we can obtain the channel estimate by using the nonadaptive approach from (49) (i.e., hot start initialization), and apply the adaptive method afterwards.
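To make the qualitative difference between the two learning curves concrete, here is a minimal sketch of frame-by-frame LMS and exponentially weighted RLS updates of the per-tone CFR estimate. It is not the paper's exact block-wise algorithm: it assumes the Alamouti-type per-tone training model, a static channel, unit-modulus training spectra (unitary FFT normalization, so the LMS recursion is stable with $\mu = 0.08$), and a small regularization constant `delta`. The values $\mu = 0.08$, $\lambda = 0.8$, and $\sigma_n^2 = 0.1$ are the ones quoted in the text; every function and variable name is hypothetical.

```python
import numpy as np


def training_matrix(S1, S2):
    """Stack the per-tone 2x2 Alamouti-type training matrices, shape (N, 2, 2)."""
    A = np.empty((S1.size, 2, 2), dtype=complex)
    A[:, 0, 0], A[:, 0, 1] = S1, S2
    A[:, 1, 0], A[:, 1, 1] = -np.conj(S2), np.conj(S1)
    return A


def lms_step(H, A, Y, mu):
    """One LMS update per tone: H <- H + mu * A^H (Y - A H)."""
    err = Y - np.einsum("nij,nj->ni", A, H)
    return H + mu * np.einsum("nji,nj->ni", np.conj(A), err)


def rls_step(Phi, theta, A, Y, lam):
    """One exponentially weighted LS (RLS-type) update per tone."""
    AH = np.conj(np.swapaxes(A, 1, 2))
    Phi = lam * Phi + AH @ A
    theta = lam * theta + np.einsum("nij,nj->ni", AH, Y)
    H = np.linalg.solve(Phi, theta[..., None])[..., 0]
    return Phi, theta, H


rng = np.random.default_rng(1)
N, frames = 16, 200
mu, lam, sigma2, delta = 0.08, 0.8, 0.1, 1e-3

S1 = np.exp(2j * np.pi * rng.random(N))  # unit-modulus stand-in spectra
S2 = np.exp(2j * np.pi * rng.random(N))
A = training_matrix(S1, S2)
H_true = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))

# Cold start: all filter coefficients initialized to zero.
H_lms = np.zeros((N, 2), dtype=complex)
Phi = delta * np.tile(np.eye(2, dtype=complex), (N, 1, 1))
theta = np.zeros((N, 2), dtype=complex)

dev_lms, dev_rls = [], []
for _ in range(frames):
    noise = np.sqrt(sigma2 / 2) * (
        rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))
    )
    Y = np.einsum("nij,nj->ni", A, H_true) + noise
    H_lms = lms_step(H_lms, A, Y, mu)
    Phi, theta, H_rls = rls_step(Phi, theta, A, Y, lam)
    dev_lms.append(np.sum(np.abs(H_lms - H_true) ** 2))
    dev_rls.append(np.sum(np.abs(H_rls - H_true) ** 2))
```

In this toy setting the RLS-type deviation drops within the first few frames while the LMS deviation decays geometrically, which mirrors the ordering of the curves described above. A hot start would correspond to replacing the zero initialization with a nonadaptive LS estimate from the first frame.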
Given the same set of parameters that lead to the results shown in Figure 6, we conduct 100 independent trials and compute the ensemble average. In Figure 7(a), we plot the learning curves of both block-wise algorithms for $E\{\|\breve{e}_F(k)\|^2\}$ versus the number of iterations. It is clearly seen that ensemble averaging helps smooth out the effects of gradient noise in the learning curves. For the same set of trials, we also compute the corresponding values of $E\{\|v_F(k)\|^2\}$ and plot them in Figure 7(b). In addition, for the purpose of comparison, MSE values obtained from the nonadaptive CFR estimation experiments based on the LS method are plotted in the same figure, together with the theoretical value. This theoretical MSE value is obtained by plugging the simulation parameters into (33), which gives $E\{\|E_F\|^2\} = 0.4$. The subplot in Figure 7(b) indicates a very good match between the simulated values and the theoretical one, which in turn corroborates the correctness of our derivations. Moreover, it is also observed that after the learning curves converge (especially for the block RLS algorithm), the MSE values attained are much smaller than those obtained by the nonadaptive method or the one computed theoretically. This demonstrates the performance advantage of the adaptive approach.
Finally, we provide some simulation results in Figure 8 to demonstrate the error performance of both adaptive estimation algorithms at higher Doppler spreads. In particular, we consider three different Doppler spreads, $f_d T = 10^{-4}$, $10^{-3}$, and $10^{-2}$, and conduct 100 independent trials for each case. For simplicity, we leave the step sizes unchanged in our simulations, that is, $\lambda = 0.8$ and $\mu = 0.08$; note, however, that it is desirable to reduce them accordingly as the frequency dispersion or Doppler spread increases. Here, we only study the behavior of $E\{\|v_F(k)\|^2\}$, and for ease of comparison, we also plot in Figure 8 the theoretical MSE value of the nonadaptive LS estimation approach. The results shown in Figure 8 indicate that as the Doppler spread increases moderately, for example, from $f_d T = 10^{-4}$ to $f_d T = 10^{-3}$, the estimation accuracy of both algorithms degrades slightly, but not severely. However, a further increase in the Doppler spread, for example, from $f_d T = 10^{-3}$ to $f_d T = 10^{-2}$, leads to a drastic degradation in the estimation accuracy of both algorithms. In fact, in this case, the estimation accuracy of each of the two adaptive algorithms is inferior to that of the nonadaptive estimation approach, indicating that they are unable to track faster channel variations and thus may no longer be usable in practice.

Conclusion
In this paper, we presented and studied a training-based CFR estimation scheme for ST-coded MIMO systems with SC-FDE. This scheme differs from the traditional one, which first obtains the CIR and then transforms it to the CFR. In this scheme, CFR estimation is implemented jointly with FDE; thus, an estimate of the CFR can be obtained directly and the hardware complexity of the transceiver can also be reduced. To be more specific, training sequences are ST block encoded at the transmitter using the same encoder as for the data sequences. At the receiver, similar procedures are applied to both data and training sequences, including CP removal and FFT processing. Estimation of the CFR is then performed immediately afterwards. Conditioning on different a priori channel knowledge, we further studied CFR estimation based on two criteria: LS and MMSE. A thorough analysis of the MSE in estimating the CFR was provided under each criterion. Moreover, imposing a constraint on the transmit power of the training sequences, we also investigated the optimal design of training signals. It is shown that under the LS criterion, training sequences having a constant sum magnitude at each frequency tone, such as Chu sequences, lead to the least MSE. For the MMSE criterion, we have shown that the optimal design of training sequences features a water-filling-type power distribution. Additionally, we demonstrated that adaptive implementation of the CFR estimation is feasible when the number of transmit antennas is equal to 2, which is due to the benign property of Alamouti's code. However, we suspect that this property may not carry over when $N_T$ increases beyond 2, although this requires further investigation.

Figure 1: Block diagram of the CFR estimation for MIMO system with STC and SC-FDE.

Figure 2: Block diagram of the adaptive filter.

Figure 3: Block structure for both data and training sequences.

Figure 4: BER performance with FDE under CFR estimation and perfect CFR knowledge.