Downlink Channel Estimation in Cellular Systems with Antenna Arrays at Base Stations Using Channel Probing with Feedback

In mobile communication systems with multisensor antennas at base stations, downlink channel estimation plays a key role because accurate channel estimates are needed for transmit beamforming. One efficient approach to this problem is channel probing with feedback. In this method, the base station array transmits probing (training) signals. The channel is then estimated from feedback reports provided by the users. This paper studies the performance of the channel probing method with feedback using a multisensor base station antenna array and single-sensor users. The least squares (LS), linear minimum mean square error (LMMSE), and a new scaled LS (SLS) approaches to the channel estimation are studied. Optimal choice of probing signals is investigated for each of these techniques and their channel estimation performances are analyzed. In the case of multiple LS channel estimates, the best linear unbiased estimation (BLUE) scheme for their linear combining is developed and studied.


INTRODUCTION
In recent years, transmit beamforming has been a topic of growing interest [1,2,3,4,5]. The aim of transmit beamforming is to send desired information signals from the base station array to each user and, at the same time, to minimize undesired crosstalk, that is, to satisfy a certain quality of service constraint for each user. This task becomes very complicated if the transmitter does not have precise knowledge of the downlink channel for each user. The beamforming performance therefore depends strongly on the quality of the channel estimates, and accurate downlink channel estimation plays a key role in transmit beamforming [6,7,8,9]. One of the most popular approaches to downlink channel estimation is channel probing with user feedback [1,2]. In this approach, the downlink channel is probed by transmitting training signals from the base station to each user, and the channel is then estimated from feedback reports provided by the users.
In this paper, we study the performance of channel probing with feedback in the case of a multisensor base station antenna array and single-sensor users [2]. We develop three channel estimators which offer different tradeoffs in terms of performance and a priori required knowledge of the channel statistical parameters. First of all, the traditional least squares (LS) method is considered which does not require any knowledge of the channel parameters. Then, a refined version of the LS estimator is proposed (which is referred to as the scaled LS (SLS) estimator). The SLS estimator offers a substantially improved performance relative to the LS method but requires that the trace of the channel covariance matrix and the receiver noise powers be known a priori. Finally, the linear minimum mean square error (LMMSE) channel estimator is developed and studied. The latter technique is able to outperform both the LS and SLS estimators, but it requires the full a priori knowledge of the channel covariance matrix and the receiver noise powers. For each of the aforementioned techniques, the optimal choices of probing signal matrices for downlink channel measurement are studied and channel estimation errors are analyzed. Moreover, in the case of multiple LS channel estimates, an optimal scheme for their linear combining is proposed using the so-called best linear unbiased estimation (BLUE) approach. The effect of such a combining on the performance of downlink channel estimation is investigated.

BACKGROUND
We assume a base station array of $L$ sensors with arbitrary geometry and consider the case of flat block fading (see footnote 1) [2]. In this case, the signal received by the $i$th mobile user can be expressed as follows:
$$r_i(k) = \mathbf{w}^H \mathbf{h}_i \, s(k) + n_i(k), \tag{1}$$
where $s(k)$ is the transmitted signal, $\mathbf{w}$ is the $L \times 1$ downlink weight vector, $\mathbf{h}_i$ is the $L \times 1$ vector which describes an unknown complex vector channel from the array to the $i$th user, $n_i(k)$ is the user zero-mean white noise, and $(\cdot)^H$ stands for the Hermitian transpose.
In order to measure the vector channel for each user, the method of [2] uses the so-called probing mode, in which $N \ge L$ training signals $s(1), \ldots, s(N)$ are transmitted from the base station antenna array using the beamforming weight vectors $\mathbf{w}_1, \ldots, \mathbf{w}_N$, respectively. The received signals at the $i$th mobile can be expressed as follows:
$$\mathbf{r}_i = \mathbf{W}^H \mathbf{h}_i + \mathbf{n}_i, \tag{2}$$
where
$$\mathbf{r}_i = [r_i(1), \ldots, r_i(N)]^T, \quad \mathbf{W} = [s^*(1)\mathbf{w}_1, \ldots, s^*(N)\mathbf{w}_N], \quad \mathbf{n}_i = [n_i(1), \ldots, n_i(N)]^T, \tag{3}$$
and $(\cdot)^*$ and $(\cdot)^T$ stand for the complex conjugate and the transpose, respectively. Then, each receiver (mobile user) employs the information mode to feed the data received in the probing mode back to the base station, where these data are used to estimate the downlink vector channels. Alternatively (to decrease the amount of feedback bits), channel estimation can be done directly at each receiver. In the latter case, receivers feed the corresponding channel estimates back to the base station.
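The probing model can be illustrated with a short NumPy sketch. It assumes that the $k$th column of the probing matrix is $s^*(k)\mathbf{w}_k$, so that the $k$th received sample is $\mathbf{w}_k^H \mathbf{h}_i s(k) + n_i(k)$; the function names `probing_matrix` and `received_block` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def probing_matrix(weights, symbols):
    """Build the L x N probing matrix whose k-th column is conj(s(k)) * w_k.

    weights: array of shape (L, N), column k is the weight vector w_k
    symbols: array of shape (N,), the training symbols s(1..N)
    """
    weights = np.asarray(weights, dtype=complex)
    symbols = np.asarray(symbols, dtype=complex)
    return weights * np.conj(symbols)[np.newaxis, :]

def received_block(W, h, noise):
    """Stack the N received samples: r = W^H h + n."""
    return W.conj().T @ h + noise
```

With zero noise, each entry of the returned vector equals $\mathbf{w}_k^H \mathbf{h}_i s(k)$, which is the sample-by-sample model written in block form.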

LS CHANNEL ESTIMATION
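Knowing $\mathbf{r}_i$, the downlink vector channel between the base station and the $i$th user can be estimated using the LS approach as [2]
$$\hat{\mathbf{h}}_i = \mathbf{W}^{\dagger} \mathbf{r}_i, \tag{4}$$
where $\mathbf{W}^{\dagger} = (\mathbf{W}\mathbf{W}^H)^{-1}\mathbf{W}$ is the pseudoinverse of $\mathbf{W}^H$. Assume that the transmitted power in the probing mode is constrained as
$$\|\mathbf{W}\|_F^2 = \mathrm{tr}\{\mathbf{W}\mathbf{W}^H\} = P, \tag{5}$$
where $P$ is a given power constant. We find $\mathbf{W}$ which minimizes the channel estimation error for the $i$th user subject to the transmitted power constraint (5). This is equivalent to the optimization problem
$$\min_{\mathbf{W}} \; E\{\|\mathbf{h}_i - \hat{\mathbf{h}}_i\|^2\} \quad \text{subject to} \quad \mathrm{tr}\{\mathbf{W}\mathbf{W}^H\} = P, \tag{6}$$
where $E\{\cdot\}$ is the statistical expectation. Using (2) and (4), we have that $\mathbf{h}_i - \hat{\mathbf{h}}_i = -\mathbf{W}^{\dagger}\mathbf{n}_i$ and, hence, the objective function in (6) can be rewritten as
$$J_{\mathrm{LS}} = E\{\|\mathbf{h}_i - \hat{\mathbf{h}}_i\|^2\} = E\{\|\mathbf{W}^{\dagger}\mathbf{n}_i\|^2\} = \sigma_i^2 \, \mathrm{tr}\{\mathbf{W}^{\dagger}(\mathbf{W}^{\dagger})^H\} = \sigma_i^2 \, \mathrm{tr}\{(\mathbf{W}\mathbf{W}^H)^{-1}\}, \tag{7}$$
where we use the fact that $E\{\mathbf{n}_i \mathbf{n}_i^H\} = \sigma_i^2 \mathbf{I}$. Here, $\sigma_i^2$ is the noise power of the $i$th user, $\mathbf{I}$ is the identity matrix, and $\mathrm{tr}\{\cdot\}$ denotes the trace of a matrix.
Footnote 1: The flat fading assumption is valid for narrowband communication systems.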
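Using (7), the optimization problem (6) can be equivalently written in the following form:
$$\min_{\mathbf{W}} \; \mathrm{tr}\{(\mathbf{W}\mathbf{W}^H)^{-1}\} \quad \text{subject to} \quad \mathrm{tr}\{\mathbf{W}\mathbf{W}^H\} = P. \tag{8}$$
We obtain the solution to this problem using the Lagrange multiplier method, that is, via minimizing the function
$$L(\mathbf{W}, \lambda) = \mathrm{tr}\{(\mathbf{W}\mathbf{W}^H)^{-1}\} + \lambda \left( \mathrm{tr}\{\mathbf{W}\mathbf{W}^H\} - P \right), \tag{9}$$
where $\lambda$ is the Lagrange multiplier.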
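To compute $\partial L(\mathbf{W}, \lambda)/\partial \mathbf{W}^H$, the following lemma will be useful.
Lemma 1. If a square matrix $\mathbf{F}$ is a function of another square matrix $\mathbf{G} = \mathbf{A} + \mathbf{B}\mathbf{X} + \mathbf{X}^H\mathbf{C}\mathbf{X}$, then the following chain rule is valid:
$$\frac{\partial \, \mathrm{tr}\{\mathbf{F}\}}{\partial \mathbf{X}} = (\mathbf{B} + \mathbf{X}^H\mathbf{C})^T \, \frac{\partial \, \mathrm{tr}\{\mathbf{F}\}}{\partial \mathbf{G}}, \tag{10}$$
where $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ are constant matrices and the dimensions of all the matrices in (10) are assumed to match.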
Proof. See Appendix A.
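Furthermore, the following expressions for the matrix derivatives of traces will be used [10]:
$$\frac{\partial \, \mathrm{tr}\{\mathbf{X}\mathbf{A}\}}{\partial \mathbf{X}} = \mathbf{A}^T, \tag{11}$$
$$\frac{\partial \, \mathrm{tr}\{\mathbf{X}^{-1}\}}{\partial \mathbf{X}} = -(\mathbf{X}^{-2})^T. \tag{12}$$
Using the chain rule (10) with $\mathbf{X} = \mathbf{W}^H$ and $\mathbf{G} = \mathbf{W}\mathbf{W}^H$, we have
$$\frac{\partial \, \mathrm{tr}\{(\mathbf{W}\mathbf{W}^H)^{-1}\}}{\partial \mathbf{W}^H} = \mathbf{W}^T \, \frac{\partial \, \mathrm{tr}\{\mathbf{G}^{-1}\}}{\partial \mathbf{G}}. \tag{13}$$
Applying (11) and (12) to (13), we can transform the latter equation as
$$\frac{\partial \, \mathrm{tr}\{(\mathbf{W}\mathbf{W}^H)^{-1}\}}{\partial \mathbf{W}^H} = -\left[ (\mathbf{W}\mathbf{W}^H)^{-2}\mathbf{W} \right]^T. \tag{14}$$
Using (14) and applying (11) to compute $\partial \, \mathrm{tr}\{\mathbf{W}\mathbf{W}^H\}/\partial \mathbf{W}^H$ in the second term of (9), we have that
$$\frac{\partial L(\mathbf{W}, \lambda)}{\partial \mathbf{W}^H} = -\left[ (\mathbf{W}\mathbf{W}^H)^{-2}\mathbf{W} \right]^T + \lambda \mathbf{W}^T. \tag{15}$$
Setting (15) to zero, we obtain that any probing matrix is optimal if it satisfies the equation
$$(\mathbf{W}\mathbf{W}^H)^{-2}\mathbf{W} = \lambda \mathbf{W}. \tag{16}$$
Since $\mathbf{W}\mathbf{W}^H$ is Hermitian and positive definite, we can write its eigendecomposition as
$$\mathbf{W}\mathbf{W}^H = \mathbf{Q}\boldsymbol{\Gamma}\mathbf{Q}^H, \tag{17}$$
where $\boldsymbol{\Gamma}$ is a diagonal matrix with positive eigenvalues on the main diagonal. Right-multiplying (16) by $\mathbf{W}^H$, using the positiveness of the eigenvalues of $\mathbf{W}\mathbf{W}^H$, and taking into account that $\mathbf{Q}$ is a unitary matrix ($\mathbf{Q}^H\mathbf{Q} = \mathbf{Q}\mathbf{Q}^H = \mathbf{I}$), we have from (16) that
$$\boldsymbol{\Gamma}^{2} = \lambda^{-1} \mathbf{I} \tag{18}$$
and, therefore,
$$\boldsymbol{\Gamma} = \lambda^{-1/2} \mathbf{I}. \tag{19}$$
Inserting (19) into (17) and using the identity $\mathbf{Q}\mathbf{Q}^H = \mathbf{I}$, we obtain that $\mathbf{W}$ is an optimal probing matrix if
$$\mathbf{W}\mathbf{W}^H = \lambda^{-1/2} \mathbf{I}. \tag{20}$$
Using the power constraint (5), we can rewrite (20) as
$$\mathbf{W}\mathbf{W}^H = \frac{P}{L} \mathbf{I}. \tag{21}$$
Therefore, any probing matrix with orthogonal rows of the same norm $\sqrt{P/L}$ is optimal. Note that a similar fact was discovered earlier from different points of view in [11,12]. With such optimal probing, the LS estimator reduces to the simple decorrelator-type estimator.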
According to (21), there is an infinite number of choices of the optimal probing matrix. It is also worth noting that each optimal choice of W is user independent. Therefore, any probing matrix that satisfies (21) is optimal for all users.
It should be stressed that additional constraints on $\mathbf{W}$ may be dictated by particular implementation issues. For example, the peak transmitted power per antenna may be limited. In this case, we have to distribute the transmitted power uniformly over the antennas and, therefore, the additional constraint is that all the elements of the optimal probing matrix should have the same magnitude. To satisfy this constraint, a properly normalized submatrix of the DFT matrix can be used, that is,
$$\mathbf{W} = \sqrt{\frac{P}{LN}} \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & W_N & \cdots & W_N^{N-1} \\ \vdots & \vdots & & \vdots \\ 1 & W_N^{L-1} & \cdots & W_N^{(L-1)(N-1)} \end{bmatrix}, \tag{22}$$
where $W_N = e^{j2\pi/N}$. Using (21) along with (7), we obtain that the minimum downlink channel mean-square estimation error becomes
$$J_{\mathrm{LS}}^{\min} = \frac{\sigma_i^2 L^2}{P}. \tag{23}$$
We stress that the error in (23) is proportional to the square of the number of transmit antennas, and this may restrict the dimension of the transmit array. However, one can compensate for this effect by increasing the total transmitted power in the probing mode.
Another interesting observation is that the error in (23) is independent of the channel realization h i and the array geometry.
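The optimal LS probing can be checked numerically with a minimal sketch. It assumes the scaled DFT submatrix with entries $\sqrt{P/(LN)}\, e^{j2\pi lk/N}$ and the LS error expression $\sigma_i^2 \, \mathrm{tr}\{(\mathbf{W}\mathbf{W}^H)^{-1}\}$; the helper names are hypothetical.

```python
import numpy as np

def dft_probing_matrix(L, N, P):
    """First L rows of the N-point DFT matrix, scaled so that tr{W W^H} = P."""
    rows = np.arange(L)[:, np.newaxis]
    cols = np.arange(N)[np.newaxis, :]
    return np.sqrt(P / (L * N)) * np.exp(2j * np.pi * rows * cols / N)

def ls_estimate(W, r):
    """LS channel estimate (W W^H)^{-1} W r, computed without an explicit inverse."""
    return np.linalg.solve(W @ W.conj().T, W @ r)

def ls_error(W, sigma2):
    """Mean squared LS estimation error sigma^2 * tr{(W W^H)^{-1}}."""
    gram = W @ W.conj().T
    return sigma2 * np.trace(np.linalg.inv(gram)).real
```

For this matrix, `W @ W.conj().T` equals `(P / L) * np.eye(L)` up to rounding, every entry has the same magnitude, and `ls_error` evaluates to `sigma2 * L**2 / P`. Any other full-rank matrix with the same total power gives an error at least as large.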

SCALED LS CHANNEL ESTIMATION
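Obviously, the LS estimate (4) does not necessarily minimize the channel estimation error because its objective is to minimize the signal estimation error rather than the channel estimation error. Therefore, it may be possible to use an additional scaling factor $\gamma$ to further reduce this error. Using this idea, applying (2) and (4), and dropping the user index $i$ for the sake of simplicity, we can write the channel estimation error in the following form:
$$E\{\|\mathbf{h} - \gamma \hat{\mathbf{h}}_{\mathrm{LS}}\|^2\} = (1 - \gamma)^2 \, \mathrm{tr}\{\mathbf{R}_h\} + \gamma^2 J_{\mathrm{LS}}, \tag{24}$$
where $\hat{\mathbf{h}}_{\mathrm{LS}}$ is the LS channel estimate of (4), $\mathbf{R}_h = E\{\mathbf{h}\mathbf{h}^H\}$ is the channel correlation matrix, and $J_{\mathrm{LS}}$ is given by (7). Clearly, (24) is minimized with
$$\gamma_0 = \frac{\mathrm{tr}\{\mathbf{R}_h\}}{J_{\mathrm{LS}} + \mathrm{tr}\{\mathbf{R}_h\}} \tag{25}$$
and the minimum of (24) with respect to $\gamma$ is given by
$$J_{\mathrm{SLS}} = \frac{J_{\mathrm{LS}} \, \mathrm{tr}\{\mathbf{R}_h\}}{J_{\mathrm{LS}} + \mathrm{tr}\{\mathbf{R}_h\}}. \tag{26}$$
Note that the optimal $\gamma$ in (25) is a function of the trace of the channel correlation matrix $\mathbf{R}_h$ and the noise variance $\sigma^2$. Therefore, these values have to be known (or estimated beforehand) when using the SLS approach. In practice, the estimate of $\mathrm{tr}\{\mathbf{R}_h\}$,
$$\mathrm{tr}\{\hat{\mathbf{R}}_h\} = \hat{\mathbf{h}}_{\mathrm{LS}}^H \hat{\mathbf{h}}_{\mathrm{LS}}, \tag{27}$$
can be used in (25) in lieu of $\mathrm{tr}\{\mathbf{R}_h\}$. Assuming that the values of $\mathrm{tr}\{\mathbf{R}_h\}$ and $\sigma^2$ are given in advance, defining the SLS channel estimate as
$$\hat{\mathbf{h}}_{\mathrm{SLS}} = \gamma_0 \hat{\mathbf{h}}_{\mathrm{LS}}, \tag{28}$$
and using (4) and (25), we have
$$\hat{\mathbf{h}}_{\mathrm{SLS}} = \frac{\mathrm{tr}\{\mathbf{R}_h\}}{\sigma^2 \, \mathrm{tr}\{(\mathbf{W}\mathbf{W}^H)^{-1}\} + \mathrm{tr}\{\mathbf{R}_h\}} \, \mathbf{W}^{\dagger}\mathbf{r}. \tag{29}$$
The optimal probing matrix for channel estimation using the SLS method can be found by solving the following optimization problem:
$$\min_{\mathbf{W}} \; J_{\mathrm{SLS}} \quad \text{subject to} \quad \mathrm{tr}\{\mathbf{W}\mathbf{W}^H\} = P. \tag{30}$$
Since $\mathrm{tr}\{\mathbf{R}_h\} > 0$, we see from (26) that $J_{\mathrm{SLS}}$ is a monotonically increasing function of $J_{\mathrm{LS}}$. Note that $\mathrm{tr}\{\mathbf{R}_h\}$ is not a function of $\mathbf{W}$, and, therefore, $J_{\mathrm{LS}}$ is the only term in (26) which depends on $\mathbf{W}$. This means that the optimization problems (6) and (30) are equivalent. Therefore, the optimal choice of probing matrix for the SLS channel estimation technique is the same as for the LS approach.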
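The effect of the scaling factor can be illustrated with a small sketch. It assumes that the error of the scaled estimate $\gamma \hat{\mathbf{h}}_{\mathrm{LS}}$ is $(1-\gamma)^2 \mathrm{tr}\{\mathbf{R}_h\} + \gamma^2 J_{\mathrm{LS}}$, which follows when the channel and the zero-mean noise are uncorrelated; the function names are hypothetical.

```python
import numpy as np

def scaled_error(gamma, trace_Rh, J_ls):
    """Error of the scaled LS estimate for a given scaling factor gamma."""
    return (1.0 - gamma) ** 2 * trace_Rh + gamma ** 2 * J_ls

def sls_gain(trace_Rh, J_ls):
    """Scaling factor that minimizes scaled_error over gamma."""
    return trace_Rh / (J_ls + trace_Rh)

def sls_error(trace_Rh, J_ls):
    """Minimum error reached at the optimal scaling factor."""
    return J_ls * trace_Rh / (J_ls + trace_Rh)
```

Because `sls_error` equals `J_ls` multiplied by a factor strictly between 0 and 1, the scaled estimator always improves on plain LS in this model, and the gain is largest when `J_ls` is comparable to or larger than `trace_Rh` (low SNR).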

LMMSE CHANNEL ESTIMATION
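In this section, we consider the LMMSE estimator of $\mathbf{h}$ which is given by [13]
$$\hat{\mathbf{h}}_{\mathrm{LMMSE}} = \mathbf{R}_h \mathbf{W} \left( \mathbf{W}^H \mathbf{R}_h \mathbf{W} + \sigma^2 \mathbf{I} \right)^{-1} \mathbf{r}. \tag{31}$$
The performance of this estimator is characterized by the error $\mathbf{e} = \mathbf{h} - \hat{\mathbf{h}}_{\mathrm{LMMSE}}$ whose mean is zero, and the covariance matrix is given by [13]
$$\mathbf{C}_e = E\{\mathbf{e}\mathbf{e}^H\} = \left( \mathbf{R}_h^{-1} + \sigma^{-2} \mathbf{W}\mathbf{W}^H \right)^{-1}. \tag{32}$$
The LMMSE estimation error is given by
$$J_{\mathrm{LMMSE}} = E\{\|\mathbf{e}\|^2\} = \mathrm{tr}\{\mathbf{C}_e\} = \mathrm{tr}\left\{ \left( \mathbf{R}_h^{-1} + \sigma^{-2} \mathbf{W}\mathbf{W}^H \right)^{-1} \right\}. \tag{33}$$
To minimize (33) subject to the transmitted power constraint $\mathrm{tr}\{\mathbf{W}\mathbf{W}^H\} = P$, we can use the Lagrange multiplier method. The problem can be written as follows:
$$\min_{\mathbf{W}} \; \mathrm{tr}\left\{ \left( \mathbf{R}_h^{-1} + \sigma^{-2} \mathbf{W}\mathbf{W}^H \right)^{-1} \right\} \quad \text{subject to} \quad \mathrm{tr}\{\mathbf{W}\mathbf{W}^H\} = P. \tag{34}$$
Using the chain rule (10), it can be readily shown that the optimal probing must satisfy
$$\mathbf{R}_h^{-1} + \sigma^{-2} \mathbf{W}\mathbf{W}^H = \frac{1}{\sigma\sqrt{\lambda}} \mathbf{I}. \tag{35}$$
Using the constraint $\mathrm{tr}\{\mathbf{W}\mathbf{W}^H\} = P$, (35) can be rewritten as follows:
$$\mathbf{W}\mathbf{W}^H = \frac{P + \sigma^2 \, \mathrm{tr}\{\mathbf{R}_h^{-1}\}}{L} \mathbf{I} - \sigma^2 \mathbf{R}_h^{-1}. \tag{36}$$
Interestingly, in the high signal-to-noise ratio (SNR) case ($\sigma^2 \to 0$), (36) transforms to (21). Therefore, in the high SNR domain, the LS, SLS, and LMMSE approaches all have the same condition on optimal probing matrices. Using (36), we obtain that in the optimal probing case,
$$\mathbf{C}_e = \frac{\sigma^2 L}{P + \sigma^2 \, \mathrm{tr}\{\mathbf{R}_h^{-1}\}} \mathbf{I}. \tag{37}$$
Therefore,
$$J_{\mathrm{LMMSE}}^{\min} = \frac{\sigma^2 L^2}{P + \sigma^2 \, \mathrm{tr}\{\mathbf{R}_h^{-1}\}}. \tag{38}$$
If the channel coefficients are all i.i.d. random variables, we have $\mathbf{R}_h = \xi^2 \mathbf{I}$, where $\xi^2$ can be viewed as the channel attenuation parameter. In this case, (36) transforms to (21) and, therefore, the optimal probing matrix for the LS estimator is also optimal for the LMMSE estimator. Furthermore, in such a situation, the minimum of the channel estimation error is given by
$$J_{\mathrm{LMMSE}}^{\min} = \frac{\xi^2 \sigma^2 L^2}{\xi^2 P + \sigma^2 L}. \tag{39}$$
Interestingly, if $\mathbf{R}_h = \xi^2 \mathbf{I}$, then (26) with optimal probing and (39) are identical, which means that the SLS and LMMSE estimators have the same performance in this case.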
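The LMMSE probing condition can be sketched numerically. The sketch assumes that the optimal Gram matrix makes $\mathbf{R}_h^{-1} + \sigma^{-2}\mathbf{W}\mathbf{W}^H$ a scaled identity while keeping $\mathrm{tr}\{\mathbf{W}\mathbf{W}^H\} = P$, and that $N = L$ so that a Hermitian square root can serve as the probing matrix. The function names are hypothetical, and the positive semidefiniteness check is an addition that the text does not discuss.

```python
import numpy as np

def lmmse_optimal_gram(R_h, sigma2, P):
    """Gram matrix W W^H that equalizes the LMMSE error covariance at power P."""
    L = R_h.shape[0]
    R_inv = np.linalg.inv(R_h)
    level = (P + sigma2 * np.trace(R_inv).real) / L
    gram = level * np.eye(L) - sigma2 * R_inv
    # Assumption of this sketch: P must be large enough for a valid Gram matrix.
    if np.linalg.eigvalsh(gram).min() < 0:
        raise ValueError("power P too small for this closed form")
    return gram

def probing_from_gram(gram):
    """Hermitian square root W of the Gram matrix, so that W W^H = gram (N = L)."""
    vals, vecs = np.linalg.eigh(gram)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def lmmse_error(R_h, sigma2, gram):
    """LMMSE error tr{(R_h^{-1} + gram / sigma^2)^{-1}}."""
    return np.trace(np.linalg.inv(np.linalg.inv(R_h) + gram / sigma2)).real
```

With this Gram matrix the error covariance is a multiple of the identity and `lmmse_error` reduces to `sigma2 * L**2 / (P + sigma2 * tr(R_h^{-1}))`. Passing `(P / L) * np.eye(L)` instead gives the LMMSE error under LS-optimal probing, which is never smaller.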

COMBINING OF MULTIPLE LS CHANNEL ESTIMATES
In Sections 3, 4, and 5, the specific case of a single channel estimate has been considered. In this section, we extend the optimal probing approach to the case of multiple LS channel estimates. If there are multiple probing periods available within the channel coherence time, it may be inefficient from the computational and buffering viewpoints to store and process the long data records formed by accumulating the received data blocks of different probing periods. A good alternative is to obtain a separate channel estimate for each probing period, to store these estimates rather than the data itself, and to compute the final channel estimate as a proper combination of these (previously obtained) estimates. Let us have $K$ estimates $\hat{\mathbf{h}}_{i,1}, \ldots, \hat{\mathbf{h}}_{i,K}$ of the downlink channel corresponding to the $i$th user. Let each estimate be computed using (4) based on some probing matrices $\mathbf{W}_1, \ldots, \mathbf{W}_K$, respectively. The channel is assumed to be quasistatic (fixed) over the interval of $K$ probings, and $P_k = \|\mathbf{W}_k\|_F^2$ is the transmitted power during the $k$th probing.
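We aim to improve the performance of downlink channel estimation by combining the estimated values $\hat{\mathbf{h}}_{i,k}$ for $k = 1, \ldots, K$ in a linear way as follows:
$$\hat{\mathbf{h}}_i = \sum_{k=1}^{K} \alpha_{i,k} \hat{\mathbf{h}}_{i,k}, \tag{40}$$
where $\alpha_{i,k}$ are unknown weighting coefficients.
Let us obtain the optimal weighting coefficients by minimizing the error of the estimate (40). These coefficients can be found by solving the following optimization problem:
$$\min_{\alpha_{i,1}, \ldots, \alpha_{i,K}} \; E\left\{ \left\| \mathbf{h}_i - \sum_{k=1}^{K} \alpha_{i,k} \hat{\mathbf{h}}_{i,k} \right\|^2 \right\} \quad \text{subject to} \quad \sum_{k=1}^{K} \alpha_{i,k} = 1, \tag{41}$$
where the constraint in (41) guarantees the unbiasedness of the final channel estimate. This problem corresponds to the so-called BLUE estimator [13].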
The solution to (41) is given by the following lemma.
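Lemma 2. The optimal weights $\{\alpha_{i,k}\}_{k=1}^{K}$ for the $i$th user are given by
$$\alpha_{i,k} = \frac{1/\mathrm{tr}\{(\mathbf{W}_k\mathbf{W}_k^H)^{-1}\}}{\sum_{l=1}^{K} 1/\mathrm{tr}\{(\mathbf{W}_l\mathbf{W}_l^H)^{-1}\}}. \tag{42}$$
It is worth noting that the optimal weighting coefficients $\alpha_{i,k}$ are user independent (i.e., they are the same for each user).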
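Choosing optimal orthogonal weighting matrices in each probing, we have
$$\mathrm{tr}\{(\mathbf{W}_k\mathbf{W}_k^H)^{-1}\} = \frac{L^2}{P_k}, \qquad \sum_{l=1}^{K} \frac{1}{\mathrm{tr}\{(\mathbf{W}_l\mathbf{W}_l^H)^{-1}\}} = \frac{P_{\mathrm{tot}}}{L^2}, \tag{43}$$
where
$$P_{\mathrm{tot}} = \sum_{k=1}^{K} P_k \tag{44}$$
is the total transmitted power during the $K$ probings. Inserting (43) into (42), we obtain that in the case of using optimal orthogonal weighting matrices, the expression for optimal weighting coefficients can be simplified to
$$\alpha_{i,k} = \frac{P_k}{P_{\mathrm{tot}}}. \tag{45}$$
In this case, the downlink channel estimation error is given by
$$E\{\|\mathbf{h}_i - \hat{\mathbf{h}}_i\|^2\} = E\left\{ \left\| \sum_{k=1}^{K} \alpha_{i,k} \mathbf{W}_k^{\dagger} \mathbf{n}_{i,k} \right\|^2 \right\} = \frac{\sigma_i^2 L^2}{P_{\mathrm{tot}}}, \tag{46}$$
where $\mathbf{n}_{i,k}$ is the zero-mean white noise vector of the $i$th user in the $k$th probing. When deriving (46), we have used the property $E\{\mathbf{n}_{i,k}\mathbf{n}_{i,l}^H\} = \sigma_i^2 \delta_{k,l} \mathbf{I}$, where $\delta_{k,l}$ is the Kronecker delta.
We observe that, similar to (23), the error in (46) is independent of the channel realization and the array geometry. Comparing (46) with (23), we see that the optimal linear combining of multiple estimates reduces the estimation error by a factor of $P_{\mathrm{tot}}/P$. For example, if each probing has the same power ($P_k = P$, $k = 1, 2, \ldots, K$), then $P_{\mathrm{tot}} = KP$ and the estimation error is reduced by a factor of $K$.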
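The combining step can be sketched as follows. The sketch assumes that the $k$th LS estimate has error $\sigma_i^2 \, \mathrm{tr}\{(\mathbf{W}_k\mathbf{W}_k^H)^{-1}\}$ and that the noise is independent across probings, so the unbiased minimum-variance weights are inversely proportional to these traces. The function names are hypothetical.

```python
import numpy as np

def inverse_gram_trace(W):
    """tr{(W W^H)^{-1}} for one probing matrix."""
    return np.trace(np.linalg.inv(W @ W.conj().T)).real

def blue_weights(probing_matrices):
    """Weights proportional to 1 / tr{(W_k W_k^H)^{-1}}, normalized to sum to one."""
    inv = np.array([1.0 / inverse_gram_trace(W) for W in probing_matrices])
    return inv / inv.sum()

def combined_error(probing_matrices, weights, sigma2):
    """Error of the weighted combination under independent white noise per probing."""
    traces = np.array([inverse_gram_trace(W) for W in probing_matrices])
    return sigma2 * float(np.sum(np.asarray(weights) ** 2 * traces))
```

For orthogonal probing matrices with powers `P_k`, the weights reduce to `P_k / P_tot` and the combined error to `sigma2 * L**2 / P_tot`. Equal weights `1/K` give a larger error whenever the probing powers differ.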

NUMERICAL EXAMPLES
In our simulations, we compare the performance of the LS, SLS, and LMMSE channel estimators in the cases of optimal and nonoptimal choices of probing matrices. Throughout all our simulation examples, we assume that N = L. The channel coefficients and the receiver noise are assumed to be circular complex Gaussian random variables with the unit variance.
We assume that the base station has a uniform linear array and the downlink channel correlation matrix $\mathbf{R}_h$ has the following structure:
$$[\mathbf{R}_h]_{n,m} = r^{|n-m|}, \tag{47}$$
where $n$ and $m$ are the indices of the array sensors. This model of the array covariance is frequently used in the literature; see [14,15,16] and references therein. The elements of the $L \times L$ probing matrices $\mathbf{W}$ in the case of nonoptimal probing have been drawn independently from a zero-mean complex Gaussian random generator in each simulation run. However, to avoid possible computational inaccuracy of the LS estimator, we have ignored all probing matrices that have resulted in a condition number of $\mathbf{W}\mathbf{W}^H$ greater than $10^4$. Each simulated point is obtained by averaging 5000 independent simulation runs.
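A simulation setup of this kind might be sketched as below. The sketch assumes the correlation model $[\mathbf{R}_h]_{n,m} = r^{|n-m|}$ with real $0 \le r < 1$, and it rescales each random probing matrix to total power $P$; the rescaling and all function names are assumptions of this illustration rather than details stated in the text.

```python
import numpy as np

def exp_correlation(L, r):
    """L x L correlation matrix with entries r**|n - m|."""
    idx = np.arange(L)
    return float(r) ** np.abs(idx[:, np.newaxis] - idx[np.newaxis, :])

def random_probing(L, P, rng, max_cond=1e4):
    """Random complex Gaussian L x L probing matrix with tr{W W^H} = P.

    Draws are rejected while the condition number of W W^H exceeds max_cond.
    """
    while True:
        W = (rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))) / np.sqrt(2)
        W = W * (np.sqrt(P) / np.linalg.norm(W))  # Frobenius normalization (assumed)
        if np.linalg.cond(W @ W.conj().T) <= max_cond:
            return W

def draw_channel(R_h, rng):
    """Zero-mean circular complex Gaussian channel with covariance R_h."""
    L = R_h.shape[0]
    g = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
    return np.linalg.cholesky(R_h) @ g
```

Averaging the squared estimation error over many calls to `draw_channel` and fresh noise realizations gives one simulated point; `r = 0` reproduces the uncorrelated unit-variance channel.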
In Figure 1, we display the mean of the norm squared of the channel estimation error (MNSE) of the LS channel estimator in the optimal and nonoptimal probing matrix cases versus $P/\sigma^2$. Note that the performance of the LS estimator is independent of the parameter $r$. The parameter $L$ is varied in this figure.
In Figure 2, the performance of the SLS estimator is tested under similar conditions. As for the LS method, the performance of the SLS estimator is independent of the parameter $r$. Figures 3 and 4 show the corresponding results for the LMMSE estimator. In both figures, the channel covariance matrix $\mathbf{R}_h$ is assumed to be known exactly. Other conditions are similar to those of Figures 1 and 2.
From Figures 1, 2, 3, and 4, it can be seen that the optimal probing improves the quality of channel estimation substantially for all methods. Note that this improvement is especially pronounced for large values of P/σ 2 if the SLS or LMMSE method is used. Comparing Figures 3 and 4, we also see that these figures give nearly the same results. This means that moderate correlation of the channel coefficients does not affect the LMMSE approach.
As it has been mentioned before, the SLS channel estimator requires the knowledge of tr{R h }. However, note that the LS estimator can be applied to estimate this parameter using (27). In Figure 5, the MNSEs of the SLS estimator with optimal probing are plotted versus P/σ 2 in the cases when the exact and estimated values of tr{R h } are used. In the latter case, the LS method is applied to obtain the estimate of tr{R h } which is then inserted into the SLS estimator. All other conditions are similar to that of the previous figures.
In the LMMSE method, the full knowledge of the channel correlation matrix $\mathbf{R}_h$ is required either at the base station or at the mobile station to estimate the channel (depending on where the channel estimation is done). Also, the base station transmitter has to know this matrix in order to compute the optimal probing matrix. However, one may use the following rank-one estimate of this matrix:
$$\hat{\mathbf{R}}_h = \hat{\mathbf{h}}_{\mathrm{LS}} \hat{\mathbf{h}}_{\mathrm{LS}}^H. \tag{48}$$
In Figure 6, the performance of the LMMSE channel estimator is tested versus $P/\sigma^2$ in the cases when $\mathbf{R}_h$ is known exactly and when its estimate (48) is used. In the latter case, the optimal LS probing is used (note, however, that in the general case, such a probing is nonoptimal for the LMMSE approach). The value of $L$ is varied in this figure and $r = 0.25$ is taken. From Figures 5 and 6, we see that there are only small performance losses caused by using the estimated values of $\mathrm{tr}\{\mathbf{R}_h\}$ and $\mathbf{R}_h$ in the SLS and LMMSE estimators, respectively, in lieu of the exact values. Also, from Figure 6, we see that the optimal LS probing becomes nearly optimal for the LMMSE approach starting from moderate values of SNR. This observation supports the theoretical results of Section 5.
Figure 7 compares the performances of the LS, SLS, and LMMSE estimators versus $P/\sigma^2$. In this figure, we assume that $r = 0.25$, and two variants of the LMMSE estimator are considered. Both variants assume that the estimator knows $\mathbf{R}_h$ exactly, but the first variant uses the optimal probing signal that satisfies (36), while the second one employs the matrix which satisfies (21) and, therefore, is optimal only for the LS and SLS estimators and/or for the uncorrelated channel case ($r = 0$). From this figure, we observe that the difference in performance between the first and second variants of the LMMSE estimator is negligible at all the tested values of SNR. Therefore, the LS/SLS probing, although not strictly optimal, appears to be a nearly optimal choice for the LMMSE estimator.
In the last example, the case of multiple LS channel estimates is considered. In Figure 8, the parameter $L = 4$ is chosen and the performance of the BLUE estimator is compared for $K = 2$ and $K = 4$. Three cases are considered in this figure: (i) both the probing matrices and the coefficients $\alpha_{i,k}$ are optimal; (ii) the probing matrices are nonoptimal but the coefficients $\alpha_{i,k}$ are optimal; (iii) both the probing matrices and the coefficients $\alpha_{i,k}$ are nonoptimal.
In the third case, the coefficients α i,k = 1/K are assumed for all i and k. Figure 8 demonstrates substantial improvements which can be achieved when the BLUE estimator is used in the case of multiple channel estimates. This figure also shows that the choice of optimal probing matrices and coefficients α i,k is critical for the estimator performance as nonoptimal choices of one or both of these parameters may cause a severe performance degradation.

CONCLUSIONS
We have studied the performance of the channel probing method with feedback using a multisensor base station antenna array and single-sensor users. Three channel estimators have been developed which offer different tradeoffs in terms of performance and a priori required knowledge of the channel statistical parameters. First of all, the traditional LS method has been considered. The LS estimator does not require any knowledge of the channel parameters. Then, a new (refined) version of the LS estimator has been proposed. This refined technique is referred to as the SLS estimator. It has been shown to offer a substantially improved channel estimation performance relative to the LS method but requires that the trace of the channel covariance matrix and the receiver noise powers be known a priori. Finally, the LMMSE channel estimator has been developed and studied. The latter technique has been shown to potentially outperform both the LS and SLS estimators, but it requires the full a priori knowledge of the channel covariance matrix and the receiver noise powers.
For each of the above mentioned techniques, the optimal choices of probing signal matrices for downlink channel measurement have been studied and channel estimation errors have been analyzed. In the case of multiple LS channel estimates, the BLUE scheme for their linear combining has been developed.
Simulation examples have demonstrated substantial performance improvements that can be achieved using optimal channel probing.

A. PROOF OF LEMMA 1
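First of all, we prove the chain rule for the particular case when $\mathbf{G} = \mathbf{B}\mathbf{X}$. Writing this equation elementwise, we have $g_{i,l} = \sum_k b_{i,k} x_{k,l}$ and, therefore,
$$\frac{\partial \, \mathrm{tr}\{\mathbf{F}\}}{\partial x_{k,l}} = \sum_i \frac{\partial \, \mathrm{tr}\{\mathbf{F}\}}{\partial g_{i,l}} \frac{\partial g_{i,l}}{\partial x_{k,l}} = \sum_i b_{i,k} \frac{\partial \, \mathrm{tr}\{\mathbf{F}\}}{\partial g_{i,l}}, \quad \text{that is,} \quad \frac{\partial \, \mathrm{tr}\{\mathbf{F}\}}{\partial \mathbf{X}} = \mathbf{B}^T \frac{\partial \, \mathrm{tr}\{\mathbf{F}\}}{\partial \mathbf{G}},$$
and the proof for the particular case $\mathbf{G} = \mathbf{B}\mathbf{X}$ is completed.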
To extend the proof to the general case G = A + BX + X H CX, we notice that this equation can be rewritten as G = A + (B + X H C)X and, therefore, the established result for the particular case G = BX can be applied taking into account that the matrix A is constant and that ∂ tr{B + X H C}/∂X = 0. In other words, replacing the matrix B by the matrix B+X H C, we straightforwardly extend our proof to the general case.