Low-mobility channel tracking for MIMO–OFDM communication systems

Pagadarai, Srikanth; Wyglinski, Alexander M; Anderson, Christopher R

doi:10.1186/1687-6180-2013-78

Research
Open access
Published: 15 April 2013

Low-mobility channel tracking for MIMO–OFDM communication systems

Srikanth Pagadarai¹,
Alexander M Wyglinski¹ &
Christopher R Anderson²

EURASIP Journal on Advances in Signal Processing volume 2013, Article number: 78 (2013) Cite this article

2483 Accesses
1 Citations
1 Altmetric
Metrics details

Abstract

It is now well understood that by exploiting the available additional spatial dimensions, multiple-input multiple-output (MIMO) communication systems provide capacity gains, compared to a single-input single-output systems without increasing the overall transmit power or requiring additional bandwidth. However, these large capacity gains are feasible only when the perfect knowledge of the channel is available to the receiver. Consequently, when the channel knowledge is imperfect, as is common in practical settings, the impact of the achievable capacity needs to be evaluated. In this study, we begin with a general MIMO framework at the outset and specialize it to the case of orthogonal frequency division multiplexing (OFDM) systems by decoupling channel estimation from data detection. Cyclic-prefixed OFDM systems have attracted widespread interest due to several appealing characteristics not least of which is the fact that a single-tap frequency-domain equalizer per subcarrier is sufficient due to the circulant structure of the resulting channel matrix. We consider a low-mobility wireless channel which exhibits inter-block channel variations and apply Kalman tracking when MIMO–OFDM communication is performed. Furthermore, we consider the signal transmission to contain a stream of training and information symbols followed by information symbols alone. By relying on predicted channel states when training symbols are absent, we aim to understand how the improvements in channel capacity are affected by imperfect channel knowledge. We show that the Kalman recursion procedure can be simplified by the optimal minimum mean square error training design. Using the simplified recursion, we derive capacity upper and lower bounds to evaluate the performance of the system.

Introduction

In the presence of a rich scattering environment, multiple-input multiple-output (MIMO) systems enable a linear increase in capacity with no increase in bandwidth or transmit power compared to single-input single-output (SISO) systems. However, the seminal work of [1] is based on the assumption that the channel is perfectly known to the receiver. In practical systems, the estimated channel using training sequences can be imperfect. As a result, there is potentially a mutual information loss between the input and the output of the channel. Given a power budget and a desired data rate, the time and power spent on training versus information symbols have to be judiciously selected since there is an interesting interplay involving information throughput and the quality of the channel estimates. If a large fraction of the time and/or power is spent on training, excellent channel estimates can be obtained at the expense of poor information throughput. Conversely, expending too little time and/or power on training results in poor channel estimates that lead to error-prone information symbol transmission. Receivers that rely on channel estimates to perform information symbol decoding are termed as “mismatched” receivers [2–5]. In this article, we study this scenario involving a transmitter with no channel state information (CSI) communicating with a receiver that relies on imperfect channel estimates. A different problem which deals with transmit and receive precoder design under the assumption that CSI is available at the transmitter has been studied extensively in the published literature e.g., see [6–8].

The problem of channel estimation has been studied in numerous contexts. Here, we list a few relevant studies. For an exhaustive survey of the area of channel estimation using known pilot sequences, see [9]. One of the earliest works in formulating training designs to obtain channel estimates for orthogonal frequency division multiplexing (OFDM) systems was [10]. In [11], optimal training designs have been designed for single-carrier and OFDM systems by maximizing a tight lower bound on the ergodic training-based independent and identically distributed (i.i.d.) capacity. Optimal pilot symbol design and their placement in a packet were addressed for both SISO and MIMO systems in [12] by minimizing the Bayesian Cramer-Rao bound (CRB) of a semi-blind channel estimator. In [13], a general affine-precoding framework [14] was considered and it was shown that decoupling channel estimation from symbol detection and optimizing a least-squares channel estimator naturally leads to an OFDM system with information and training symbols on disjoint subcarriers. Considering the same framework, Ohno and Giannakis [15] provide a link between optimal training designs and maximizing the channel capacity lower bound similar to [11]. This work was extended in [16] to include a MIMO communications setup. Furthermore, by considering block-processing of transmitted symbols with a cyclic-prefix or zero-padding, optimal training designs are provided that maximize the channel capacity lower bound when a linear minimum mean square error (LMMSE) estimator is employed.

The impact of receiver estimation error from an information theoretic viewpoint has also extensively been studied. One of the earliest studies was conducted in [4] where the relationship between lower and upper bounds on the mutual information between transmitted and estimated Gaussian symbols is derived by modeling a time-varying frequency-selective channel as a random component with a known mean and a covariance that accounts for the channel estimation error. Specifically, it was shown that signal-to-noise ratio in the mutual information lower bound is lowered as a result of imperfect channel knowledge. In [17], the achievable data rate of a flat-fading interleaved MIMO channel is related to the LMMSE covariance matrix. In [5], the transmission of Gaussian symbols through a flat-fading channel was considered and it was demonstrated that when the Gaussianity assumption on the additive noise is rendered invalid due to channel estimation errors, scaled nearest neighbor detection is suboptimal. In [18], a lower bound on the capacity of a time-multiplexed training scheme in the presence of a flat-fading channel was studied and related to the variance of an LMMSE channel estimator. In [19], two pilot arrangement schemes were considered and the impact of the receiver estimation error was analyzed when CSI is available only at the receiver and when it is also fed back to the transmitter. In both cases, maximum likelihood channel estimation was considered. The relationship between the symbol Bayesian CRB and the mutual information between estimated and transmitted symbols was shown in [20]. In this study, two strategies were considered. One, when the receiver obtains joint Bayesian channel and symbol estimates and two, when the receiver computes channel estimates followed by their utilization in obtaining symbol estimates. The model presented in [18] was generalized in [21] by considering a superimposed training scheme of which time multiplexed training can be termed as a special case. Based on the mutual information bounds derived, a comparison between the superimposed training and the conventional time multiplexed training is performed by optimizing over training design, number of transmit antennas, and a training/information symbol power budget. While Hassibi and Hochwald [18] provide the optimal noise covariance matrix that maximizes a tight lower bound on the mutual information between the input and the output when both the transmitter and the receiver have imperfect CSI, Ding and Blostein [22] provide the optimal signal covariance matrix and show that the uniform power allocation scheme is suboptimal.

Our most important contribution in this article is the result provided in Theorem 1. Although this result is similar to that provided in ([23], Lemma 1), the approach that we have taken, i.e., the application of the theory of complex-valued differentials to compute the Bayesian Fisher Information Matrix (FIM), is novel. Second, although the decoupling of channel and symbol estimation has been noted based on the structure of the least-squares channel estimator for SISO systems [13] as well as MIMO–OFDM systems [24], we arrive at this conclusion by maximizing the Bayesian FIM of a general affine-precoded MIMO system. Moreover, we extend the analysis conducted regarding Kalman channel tracking of SISO–OFDM systems in [25] to MIMO–OFDM systems. In the process, we extend the discussion in [18, 21] by considering a “slowly” time-varying frequency-selective channel. In other words, while both [18, 21] consider a block-invariant frequency-flat channel, we consider a frequency-selective channel that is correlated over successive symbol blocks

The system model considered is described in Section 2. Based on this system model, we derive the Bayesian FIM of a general MIMO communications system that employs affine precoding at the transmitter in Section 2. We then show that in order to decouple channel estimation from data detection, an orthogonality constraint has to be met between the training and linear precoder matrices. A solution to the orthogonality constraint is the MIMO–OFDM system with frequency domain multiplexing (FDM) training symbols. We consider a MIMO channel that undergoes block-wise variations according to a first-order autoregressive (AR) model. Moreover, in order to improve the information throughput while understanding the impact of imperfect channel estimates, we formulate a scheme where during the training phase, the OFDM symbol contains training and information symbols, whereas in the data transmission phase, only information symbols are transmitted. Consequently, we consider a scheme wherein during the training phase, channel tracking is performed by a Kalman filter followed by the estimation of information symbols during the data phase based on channel state prediction in Section 2. Using this setup, we derive the capacity upper and lower bounds in Section 2 based on a training scheme that has been derived in an MMSE minimizing sense. We then provide simulation examples in Section 2 to support the theoretical results.

System model

In our analysis, we consider a MIMO communications system consisting of K transmit antennas that transmit N training and information symbols over a time-varying frequency-selective block fading channel. We design super-imposed training symbols optimally such that the channel estimates from N _t consecutive blocks of training symbols are utilized in the data detection of the following N _d information symbol blocks. We assume that the receiver also has K receive antennas without loss of generality. The maximum order of the discrete-time complex baseband wireless channels, L, is assumed to be known.

Training phase

In the training phase, training symbols and information symbols are affinely-precoded [14] and transmitted over K antennas. A matrix formulation of this system for an arbitrary time index, n, is as follows. Assuming that the information symbol vector at each antenna is of size M, we stack the symbols transmitted across K transmit antennas as shown below

{\tilde{x}}_{n} ≜ vec ([{\tilde{x}}_{n, 1} {\tilde{x}}_{n, 2} \dots {\tilde{x}}_{n, K}]),

(1)

where the n th block of M symbols from k th transmit antenna is represented as

\begin{array}{lcr} {\tilde{x}}_{n, k} ≜ {[{\tilde{x}}_{n, k} (0) {\tilde{x}}_{n, k} (1) \dots {\tilde{x}}_{n, k} (M - 1)]}^{T} . \end{array}

(2)

The affine-precoder output vector is similarly arranged as

x_{n} ≜ vec ([x_{n, 1} x_{n, 2} \dots x_{n, K}]),

(3a)

x_{n, k} ≜ {[x_{n, k} (0) x_{n, k} (1) \dots x_{n, k} (P - 1)]}^{T} .

(3b)

Denoting the precoder matrix of size K P × K M as Q and the additive pilot-symbol vector of size K P × 1 as t, we can now write the equation for the transmitted symbol vector during the training mode as follows:

\begin{array}{lcr} x_{n} = t + Q {\tilde{x}}_{n}, \end{array}

(4)

where t ≜ vec ([t ₁ t ₂ … t _K]). Furthermore, the matrix Q is such that the data stream transmitted from an antenna is precoded independently of the data-streams from the other antennas. In other words, Q has a block diagonal structure and hence Q ≜ diag ([Q ₁ Q ₂ … Q _K]). Despite this restriction on the structure of Q, it is still general enough to encapsulate not only a MIMO system employing K antennas but also a multi-user system, e.g., K _U users utilizing K antennas in total and communicating with a base-station equipped with K antennas. Also, restricting the structure of Q to be block diagonal simplifies an orthogonality condition (cf., Theorem 2) that helps in the design of the linear precoder and the training vector.

After pre-multiplying the above vector by I _K ⊗ C _T where $C_{T} ≜ {[{[0_{L \times (P - L)} I_{L}]}^{T} I_{P}]}^{T}$ with $\bar{P} = P + L$ , the $K \bar{P} \times 1$ vector ( $\bar{P} = P + L$ ) undergoes a digital-to-analog conversion followed by pulse-shaping to yield a continuous-time signal. Assuming perfect timing and carrier synchronization at the receiver, the signal is sampled to obtain the received symbol vector. Subsequently, the cyclic-prefix is removed by a pre-multiplication operation with I _K ⊗ C _R ( $C_{R} ≜ {[0_{P \times L} I_{P}]}^{T}$ ) and an ISI-free received vector of size K P × 1 is available for processing:

y_{n} = H_{n} t + H_{n} Q {\tilde{x}}_{n} + z_{n},

(5)

where the channel matrix $H_{n}$ is

H_{n} ≜ [\begin{array}{l} H_{1, 1}^{(n)} & H_{1, 2}^{(n)} & \dots & H_{1, K}^{(n)} \\ H_{2, 1}^{(n)} & H_{2, 2}^{(n)} & \dots & H_{2, K}^{(n)} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ H_{K, 1}^{(n)} & H_{K, 2}^{(n)} & \dots & H_{K, K}^{(n)} \end{array}] .

(6)

Each matrix in the set, ${H_{i, j}^{(n)}}$ for 1 ≤ i ≤ K, 1 ≤ j ≤ K is circulant with the first column, ${[h_{i, j}^{(n)} [0] \dots h_{i, j}^{(n)} [L] 0]}^{T}$ and first row, $[h_{i, j}^{(n)} [0] 0 h_{i, j}^{(n)} [L] \dots h_{i, j}^{(n)} [1]]$ . We now define a channel vector h _n such that

\begin{align} h_{n} & ≜ & vec ([h_{n, 1} h_{n, 2} \dots h_{n, K}]), \end{align}

(7a)

\begin{align} h_{n, i} & ≜ & vec ([h_{i, 1}^{(n)} h_{i, 2}^{(n)} \dots h_{i, K}^{(n)}]), \end{align}

(7b)

\begin{align} h_{i, k}^{(n)} & ≜ & {[h_{i, k}^{(n)} [0] \dots h_{i, k}^{(n)} [L]]}^{T} . \end{align}

(7c)

By exploiting the commutativity property of discrete convolution, (5) can now be written in a different form in terms of the MIMO channel vector, h _n and the pilot symbol matrix, $T ≜ [T_{1} T_{2} \dots T_{K}]$ as,

y_{n} = (I_{K} \otimes T) h_{n} + H_{n} Q {\tilde{x}}_{n} + z_{n},

(8)

where the circulant matrices, {T _k} are constructed such that [t _k[0] … t _k[P − 1]]is the first column and [t _k[0] t _k[P − L] … t _k[P − 1]] is the first row. In (5) and (8), we use the subscript n in $H_{n}$ and h _n to indicate the time-dependence of the random channel. The system model described above needs to satisfy the following conditions.

(C1) The K P × K M dimensional linear-precoder matrix, Q is of full column-rank and strictly-tall, i.e., P > M. Also, V ≜ P − M.

(C2) The P × K(L+1) dimensional training matrix, t is a tall matrix, i.e., P ≥ K(L+1).

(C3) The matrix, t is of full column-rank, i.e., rank(T) = K(L + 1).

Remark: Condition (C1) is enforced as a simple way of introducing redundancy in the precoding process [7, 26]. Condition (C2) ensures that enough dimensions are available for the identification of the unknown channel coefficients in a linear least-squares sense. As we shall show in Theorem 2, the extra dimensions that are available as a result of employing a full-column rank, strictly-tall precoding matrix are useful in designing the training vector. Condition (C2) also suggests that given the knowledge of the channel order and for a fixed number of transmit antennas, the data-block size has to be at least equal to the product of the number of channel taps and the number of transmit antennas. Condition (C3) which complements (C2) implies that each element of the set, {T _k} is also of full column-rank.

Data transmission phase

Due to the fact that no training symbols are available in the data transmission phase, we can write the system model as follows:

r_{n} = H_{n} s_{n} + z_{n},

(9)

where $s_{n} = \bar{Q} {\tilde{s}}_{n}$ and ${\tilde{s}}_{n}$ are obtained in a manner similar to (1).

A few assumptions on the system model shown in (8) and (9) are now in order.

(A1) The channel vector, h _n is zero-mean, i.i.d complex Gaussian with variance $σ_{h}^{2}$ , i.e., $h_{n} \sim C N (0, σ_{h}^{2} I_{K^{2} (L + 1)})$ . Moreover, each channel tap gain is assumed to be an independent AR process. We only consider a first-order AR model (cf. Appendix 2 for a brief discussion of the general AR model) for each tap gain so that

\begin{array}{lcr} h_{n} = a h_{n - 1} + u_{n}, \end{array}

(10)

where a ∈ [0, 1] is the AR coefficient for the l th channel tap gain and the excitation noise, $u_{n} \sim C N (0, σ_{u}^{2} I_{K^{2} (L + 1)})$ . In order to match the correlation functions at lags 0 and 1 and thus make the random process WSS for n ≥ 0, we select $σ_{u}^{2} = (1 - a^{2}) σ_{h}^{2}$ .

(A2) The transmitted symbol vectors, ${\tilde{x}}_{n}$ and ${\tilde{s}}_{n}$ are i.i.d complex Gaussian with variance $σ_{x}^{2}$ and $σ_{s}^{2}$ , i.e., ${\tilde{x}}_{n} \sim C N (0, σ_{x}^{2} I_{KP})$ and ${\tilde{s}}_{n} \sim C N (0, σ_{s}^{2} I_{KP})$ respectively.

(A3) The additive noise vector, z _n is zero-mean, circularly-symmetric i.i.d complex Gaussian noise with variance $σ_{z}^{2}$ , i.e., $z_{n} \sim C N (0, σ_{z}^{2} I_{KP})$ .

Remark: Assumption (A1) indicates that the channel is modeled as Rayleigh-fading random vector. This assumption represents a standard model for a rich scattering environment in the absence of line-of-sight. An expression for a in terms of the channel Doppler spread and the transmission bandwidth was shown in [4]. However, the first-order AR model possibly incurs considerable estimation error and results in numerous erroneous symbol decisions [27, 28]. One reason for making assumption (A2) is to satisfy the regularity conditions related to the evaluation of the Bayesian FIM described below. They require that the joint distribution of $p (y_{n}, {\tilde{x}}_{n}, h_{n})$ be absolutely continuous with respect to x _n,k(p). A data vector modeled as Gaussian meets this criterion. For those transmit symbol vectors modeled on other distributions, the Theorem 1 gives an approximation. Another reason for making this assumption lies in the fact that a signal that is a zero-mean uncorrelated complex Gaussian distributed maximizes the lower bound (which is given with respect to a zero-mean uncorrelated complex Gaussian noise vector) on the mutual information between the input and the output for of MIMO channels [18, 29]. Moreover, for a block transmission scheme such as an OFDM system with large number of subcarriers, the transmit symbol vector obtained by linear-precoding the information-symbol vector with an IDFT matrix can be claimed to be Gaussian by an appeal to the central limit theorem ([30], Figure 4.21). Hence, (A2) is not a particularly restrictive assumption.

Decoupled channel and symbol estimation

An observation of (8) reveals that the knowledge of the MIMO channel vector is contained not only in the known training symbols, but also in the unknown information symbols. However, the joint estimation of the channel vector and the detection of the information symbol vector is a non-linear problem, and its solution may not exist in certain cases [13]. On the other hand, a sub-optimal approach is to decouple the channel estimation problem from the data detection process. In order to do so, we may consider the channel vector to be a deterministic unknown within the classical approach to statistical estimation or as a random vector by adopting the Bayesian viewpoint. In this study, we consider the latter approach and derive the FIM of the channel vector based on the (8). That is, we derive the Bayesian FIM concerning the estimation of the channel vector using K P × 1 observations gathered from all the receive antennas at an arbitrary time instant, n. We then maximize the Bayesian FIM, which is equivalent to minimizing the Bayesian Cramer-Rao lower bound and obtain an orthogonality criterion. Finally, we formulate an affine precoder scheme that meets this condition.

Strategy: Bayesian FIM maximization

Theorem 1.

Assuming that the likelihood function of p (y _n;h _n) for the system model given in (8) satisfies the regularity condition, the complex FIM for estimating the MIMO channel is

\begin{array}{l} I (h_{n}) & = σ_{z}^{- 2} (I_{K} \otimes T^{H} T) - σ_{z}^{- 4} σ_{x}^{2} Ξ (t, Q) \\ + σ_{x}^{4} Ξ (Q) + σ_{h}^{- 2} I_{K^{2} (L + 1)}, \end{array}

(11)

where,

\begin{align} Ξ (t, Q) & ≜ E_{h} \{{(I_{K} \otimes T)}^{H} H_{n} Q G Q^{H} H_{n}^{H} (I_{K} \otimes T)\}, \end{align}

(12a)

\begin{align} Ξ (Q) & ≜ E_{h} \{\sum_{j = 0}^{KM - 1} (Q_{j}^{H} \otimes h_{n}^{T} Q_{j}^{T}) (R_{y_{n} | h_{n}}^{- 1} \otimes R_{y_{n} | h_{n}}^{- T}) \\ \times \sum_{j = 0}^{KM - 1} (Q_{j}^{*} \otimes Q_{j} h_{n})\}, \end{align}

(12b)

\begin{align} G & ≜ {(I_{KM} + σ_{z}^{- 2} σ_{x}^{2} Q^{H} H_{n}^{H} H_{n} Q)}^{- 1}, \end{align}

(12c)

\begin{align} Q_{j} & ≜ I_{K} \otimes [Q_{j, 1} Q_{j, 2} \dots Q_{j, K}] . \end{align}

(12d)

Proof. See Appendix 1.2. □

Remark: Since we will be considering a non-decision-aided setup where any information about the channel coefficients that is contained in data symbols is discarded, the term Ξ (Q) represents potentially useful information that is not utilized. Consequently, as it is independent of t, the maximization of $I (h_{n})$ in such a scenario is possible by working with Ξ (t,Q) alone. The maximization of $I (h_{n})$ leads us to the orthogonality condition shown in Theorem 2.

Theorem 2. If the affine precoder scheme, (t, Q) satisfies conditions, (C1) and (C2), then the following orthogonality condition is necessary and sufficient for a non decision-aided training-only estimator to maximize the Bayesian FIM, $I (h_{n})$ obtained in (11):

\begin{array}{lcr} T_{i}^{H} Q_{j, m} = 0 1 \leq i, j \leq K 0 \leq m \leq M - 1 . \end{array}

(13)

Proof. See Appendix 1.3. □

The expression for the Bayesian FIM that we have obtained in Theorem 1 is analogous to the result provided in ([23], Lemma 1). Moreover, as shown in Appendix 2, we have not based this result on the block-diagonal structure of Q. Hence, the result in Theorem 1 is a general one. Also, the result that we derived in Theorem 2 was showed previously in [13] for SISO systems and in [24] for MIMO OFDM systems using minimum least-squares estimation error variance arguments. In [16, 23], the orthogonality condition was derived within a Bayesian framework with the former relying on an LMMSE channel estimator while the latter uses a Bayesian FIM expression similar to this study. However, while we focus on a block diagonal structure of the linear precoder, Vosoughi and Scaglione [23] focus on a general linear precoder matrix.

OFDM with FDM training: an orthogonal affine precoder scheme

We see from ([13], Theorem 1) for the case of a SISO system that the affine precoder scheme which uses linearly precoded OFDM along with an FDM training sequence that modulates a disjoint set of tones not used for data transmission meets the orthogonality criterion. Similarly, for MIMO systems, Theorem 3 establishes that although the training symbols and information-bearing symbols overlap in time domain, orthogonality between the subcarriers in frequency domain satisfies (13).

Theorem 3. The affine precoder scheme, (t, Q) that satisfies the orthogonality condition given in (13) irrespective of the FIR channel provides a non-data-aided channel estimator if it is selected from the class

\begin{align} Q_{k} & = W^{H} P^{(Q)} [\begin{array}{l} Θ_{M \times M} \\ 0_{V \times M} \end{array}], \end{align}

(14a)

\begin{align} t_{k} & = W^{H} P^{(t)} [\begin{array}{l} {\tilde{t}}_{k} \\ 0 \end{array}] . \end{align}

(14b)

In the above equations, Θ _M × M is any full-rank matrix and P ^(t) is a permutation matrix that places the L + 1 possible non-zero entries of ${\tilde{T}}_{k}$ on non data-bearing subcarriers, whereas $W_{m, n} = \frac{1}{P} exp (j 2 πmn / P)$ .

Proof. The proof is a straight-forward generalization of ([13], Appendix I). □

In the subsequent sections, we focus our analysis on a MIMO-OFDM communication system. That is, we assume ${\tilde{x}}_{n, \bar{k}}$ to be the result of a linear-precoding operation involving a general full-rank matrix, Θ _M × M before it is IDFT-modulated. Moreover, the same set of subcarriers are used for transmitting training symbols across all the antennas.

Training phase

By substituting the result of Theorem 3 in (5), and considering the signal at an arbitrary receive antenna, k, we write the following equation:

\begin{align} y_{n, k} & = \sum_{\bar{k} = 1}^{K} H_{k, \bar{k}}^{(n)} t_{\bar{k}} + \sum_{\bar{k} = 1}^{K} H_{k, \bar{k}}^{(n)} Q_{\bar{k}} {\tilde{x}}_{n, \bar{k}} + z_{n, k} \\ = \sum_{\bar{k} = 1}^{K} H_{k, \bar{k}}^{(n)} W^{H} P^{(t)} [\begin{array}{l} {\tilde{t}}_{\bar{k}} \\ 0 \end{array}] \\ + \sum_{\bar{k} = 1}^{K} H_{k, \bar{k}}^{(n)} W^{H} P^{(Q)} [\begin{array}{l} I_{M \times M} \\ 0_{V \times M} \end{array}] {\tilde{x}}_{n, \bar{k}} + z_{n, k} . \end{align}

(15)

By multiplying the above equation with $P_{0 : V - 1}^{(t) T} W$ , we notice that the channel estimation is decoupled from data detection so that the following expression is obtained

\begin{array}{l} P_{0 : V - 1}^{(t) T} W y_{n, k} = & \sum_{\bar{k} = 1}^{K} P_{0 : V - 1}^{(t) T} {\bar{H}}_{k, \bar{k}}^{(n)} P_{0 : V - 1}^{(t)} {\tilde{t}}_{\bar{k}} \\ + P_{0 : V - 1}^{(t) T} W z_{n, k}, \end{array}

(16)

where ${\bar{H}}_{k, \bar{k}}^{(n)} ≜ W H_{k, \bar{k}}^{(n)} W^{H}$ is a diagonal matrix and $P_{0 : V - 1}^{(t)}$ is the result of disregarding the zero entries in ${[{\tilde{t}}_{\bar{k}}^{H} 0^{H}]}^{H}$ . Moreover, we have utilized the fact that $P_{0 : V - 1}^{(t) T} P_{0 : M}^{(Q)} = 0$ i.e., the submatrices, P 0:V−1(t) and $P_{0 : M}^{(Q)}$ are orthogonal to each other. We also recognize that the following relationship holds due to the diagonal nature of ${\bar{H}}_{k, \bar{k}}^{(n)}$ :

\begin{array}{l} P_{0 : V - 1}^{(t) T} {\bar{H}}_{k, \bar{k}}^{(n)} P_{0 : V - 1}^{(t)} {\tilde{t}}_{\bar{k}} & = \sqrt{P} {\tilde{T}}_{\bar{k}} P_{0 : V - 1}^{(t) T} W_{0 : L} h_{k, \bar{k}}^{(n)} \\ = {\tilde{T}}_{\bar{k}} {\hat{W}}_{0 : L} h_{k, \bar{k}}^{(n)}, \end{array}

(17)

where ${\tilde{T}}_{\bar{k}} ≜ diag ({\tilde{t}}_{\bar{k}})$ and ${\hat{W}}_{0 : L} ≜ \sqrt{P} P_{0 : V - 1}^{(t) T} W_{0 : L}$ . As a result, (16) can be written as

\begin{array}{lcr} {\tilde{y}}_{n, k}^{(t)} = \tilde{T} h_{n, k} + {\tilde{z}}_{n, k}, \end{array}

(18)

where ${\tilde{y}}_{n, k}^{(t)} ≜ {\hat{W}}_{0 : L} y_{n, k}$ and

\begin{array}{lcr} \tilde{T} ≜ [{\tilde{T}}_{1} {\hat{W}}_{0 : L} {\tilde{T}}_{2} {\hat{W}}_{0 : L} \dots {\tilde{T}}_{K} {\hat{W}}_{0 : L}] . \end{array}

(19)

Also, it can be showed that $T_{k}^{H} T_{k} = {\hat{W}}_{0 : L}^{H} {\tilde{T}}_{k}^{*} {\tilde{T}}_{k} {\hat{W}}_{0 : L}$ . With, ${\tilde{y}}_{n}^{(t)} ≜ vec ([{\tilde{y}}_{n, 1}^{(t)} {\tilde{y}}_{n, 2}^{(t)} \dots {\tilde{y}}_{n, K}^{(t)}])$ , we can write the MIMO system model for the measured signal across all receive antennas due to the pilot tones as follows:

\begin{array}{lcr} {\tilde{y}}_{n}^{(t)} = (I_{K} \otimes \tilde{T}) h_{n} + {\tilde{z}}_{n} . \end{array}

(20)

It can be showed that enforcing conditions (C1), (C2), and (C3) naturally result in two more standard conditions regarding the structure of $\tilde{T}$ and thus satisfy the dimensionality of (20).

(C4) The V × K(L + 1) dimensional training matrix, $\tilde{T}$ is a tall matrix, i.e., V ≥ K(L + 1).

(C5) The matrix, $\tilde{T}$ is of full column-rank, i.e., $rank (\tilde{T}) = K (L + 1)$ .

By employing operations similar to those that helped in obtaining (20), the equation for the observation vector affected by the information symbols alone is as follows:

{\tilde{y}}_{n}^{(dt)} = {\tilde{H}}_{n} {\tilde{x}}_{n} + {\tilde{z}}_{n},

(21)

where the K M × K M channel matrix, ${\tilde{H}}_{n}$ is as follows:

{\tilde{H}}_{n} ≜ [\begin{array}{l} {\tilde{H}}_{1, 1}^{(n)} & {\tilde{H}}_{1, 2}^{(n)} & \dots & {\tilde{H}}_{1, K}^{(n)} \\ {\tilde{H}}_{2, 1}^{(n)} & {\tilde{H}}_{2, 2}^{(n)} & \dots & {\tilde{H}}_{2, K}^{(n)} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ {\tilde{H}}_{K, 1}^{(n)} & {\tilde{H}}_{K, 2}^{(n)} & \dots & {\tilde{H}}_{K, K}^{(n)} \end{array}] .

(22)

Each matrix in the set, ${{\tilde{H}}_{i, j}^{(n)}}$ for 1 ≤ i ≤ K, 1 ≤ j ≤ K is a diagonal matrix and is obtained by performing the operation,

\begin{array}{lcr} {\tilde{H}}_{i, k}^{(n)} = P_{0 : M - 1}^{(Q) T} {\bar{H}}_{i, k}^{(n)} P_{0 : M - 1}^{(Q)} . \end{array}

(23)

Data transmission phase

Although the linear precoder matrix, $\bar{Q}$ can be any full-column rank matrix in general, we focus on a block diagonal structure. We consider each element in the set, ${{\bar{Q}}_{k}}$ to be a P × P IDFT matrix that modulates an information symbol vector which has been linearly precoded by a general full-rank matrix, ${\bar{Θ}}_{P \times P}$ .

{\tilde{r}}_{n} = {\bar{H}}_{n} {\tilde{s}}_{n} + {\tilde{z}}_{n},

(24)

where ${\tilde{r}}_{n} ≜ W r_{n}$ and the K P × K P channel matrix, ${\bar{H}}_{n}$ is defined as follows:

{\bar{H}}_{n} ≜ [\begin{array}{l} {\bar{H}}_{1, 1}^{(n)} & {\bar{H}}_{1, 2}^{(n)} & \dots & {\bar{H}}_{1, K}^{(n)} \\ {\bar{H}}_{2, 1}^{(n)} & {\bar{H}}_{2, 2}^{(n)} & \dots & {\bar{H}}_{2, K}^{(n)} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ {\bar{H}}_{K, 1}^{(n)} & {\bar{H}}_{K, 2}^{(n)} & \dots & {\bar{H}}_{K, K}^{(n)} \end{array}] .

(25)

Remark: By enforcing the orthogonality condition and by choosing MIMO–OFDM with FDM training symbols as the affine precoder scheme, we have broken down (8) into (20) and (21). As a result, the impact of overlapping data-bearing symbols on the channel estimator has been circumvented. Moreover, we carryover the linear precoder from the training phase to the data transmission phase by introducing a simple modification on the dimensions of the IDFT matrix.

Before we study the MMSE characteristics during training and data transmission phases when a Kalman filter is employed to track the time-varying channel vector, h _n we note that the following time and power budget constraints are enforced over (20), (21), and (24),

\begin{align} N & = N_{t} + N_{d}, \end{align}

(26a)

\begin{align} P N & = (P_{t} + P_{dt}) N_{t} + P_{d} N_{d}, \end{align}

(26b)

where $P$ is the total average transmit power that is split into $P_{t}$ , the average power allocated for training, $P_{dt}$ , the average power allocated for information symbols in the training phase, and $P_{d}$ , the average power allocated for information symbols in the data transmission phase. In addition, $P_{t}$ is distributed equally among the transmit antennas, i.e.,

P_{t} = ∥ t ∥^{2} = \sum_{k = 1}^{K} ∥ t_{k} ∥^{2},

(27)

where $∥ t_{k} ∥^{2} = \sum_{v = 0}^{V - 1} | {\tilde{T}}_{k} [v] |^{2} = P_{t} / K, \forall 1 \leq k \leq K$ .

Blockwise Kalman tracking

Due to the AR(1) random process model for time-variations on the channel vector, in order to compute the channel estimator in the MMSE sense, we have to utilize the past and the current observations, ${{\tilde{y}}_{nN + k}^{(t)} : k \in [0, N_{t} - 1], n = 1, 2, \dots}$ . An MMSE channel estimator can then be given as,

\begin{array}{l} {\hat{h}}_{nN + k} = & E {h_{nN + k} | {{\tilde{y}}_{nN + j}^{(t)} : j \leq k, j \in [0, N_{t} - 1]}, \\ {{\tilde{y}}_{(n - m) N + j}^{(t)} : j \in [0, N_{t} - 1], m = 1, 2, \dots}} . \end{array}

(28)

However, a batch processing approach would necessitate the use of large datasets. A natural choice is the sequential MMSE approach and is implemented by a Kalman filter. A Kalman filter is well known for its computational efficiency which results from the fact that only the most recent estimate need to be stored in order to refine the MMSE estimate of the unknown parameter of interest based on the new observations. For the current problem at hand, we compute the channel estimate during the training phase based on (20) and utilize the predicted channel vector in the data transmission phase. The Kalman filter recursion algorithm for estimating the MIMO channel vector in the setup considered is summarized in (29a)–(29e) [31].

Prediction:

{\hat{h}}_{n | n - 1} = a {\hat{h}}_{n - 1 | n - 1},

(29a)

Minimum Prediction MSE Matrix:

M_{n | n - 1} = a^{2} M_{n - 1 | n - 1} + (1 - a^{2}) σ_{h}^{2} I_{K^{2} (L + 1)},

(29b)

Kalman Gain Matrix:

\begin{array}{l} K_{n} & = M_{n | n - 1} (I_{K} \otimes {\tilde{T}}^{H}) \\ \times {(σ_{z}^{2} I_{KP} + (I_{K} \otimes \tilde{T}) M_{n | n - 1} (I_{K} \otimes {\tilde{T}}^{H}))}^{- 1}, \end{array}

(29c)

Correction:

{\hat{h}}_{n | n} = {\hat{h}}_{n | n - 1} + K_{n} ({\tilde{y}}_{n}^{(t)} - (I_{K} \otimes \tilde{T}) {\hat{h}}_{n | n - 1}),

(29d)

Minimum MSE Matrix:

M_{n | n} = (I - K_{n} (I_{K} \otimes \tilde{T})) M_{n | n - 1} .

(29e)

Kalman filter recursion

It can be noticed that when the system converges to a steady state, the MMSE of the channel estimator is not stationary during each cycle of N blocks. In the data transmission phase, the MMSE associated with the channel estimator’s predicted state increases monotonically from the N _tth block to the (N − 1)th block. Thus, the maximum steady-state MMSE in the data transmission phase occurs at the last information symbol block of each cycle. On the other hand, since the channel estimator computed based on the observations of the 0th block in n th cycle refines the predicted channel state at the end of the last information symbol block of (N − 1)th cycle, the steady-state MMSE decreases monotonically from the 0th training block to the (N _t − 1)th training block. Before we derive the steady-state MMSE expressions for the two cases described above, we derive the steady-state MMSE when all the blocks are training symbols and make an interesting observation. The steady-state MMSE when all blocks are training symbols is given by the solution to the Ricatti equation (based on (29e)),

M^{(\infty)} = (I - K^{(\infty)} (I_{K} \otimes \tilde{T})) M_{1}^{(\infty)},

(30)

where $M^{(\infty)} ≜ lim_{n \to \infty} M_{n | n}$ , $M_{1}^{(\infty)} ≜ lim_{n \to \infty} M_{n | n - 1}$ , and $K^{(\infty)} ≜ lim_{n \to \infty} K_{n}$ . Although several techniques have been proposed in the published literature such as eigenvector solutions [32], Schur vector approaches [33], iterative solving for scalar polynomials [34], etc., to solve the system of equations obtained in (30), we will show that by utilizing the following lemma describing the optimal design of the training symbols in the MMSE sense, the above system of equations is greatly simplified for MIMO-OFDM systems.

Lemma 1

Lemma 1. For the system model shown in (20), the minimum error variance of the MMSE channel estimator is,

σ_{Δ h_{n}}^{2} = \frac{K^{3} (L + 1) σ_{z}^{2} σ_{h}^{2}}{K σ_{z}^{2} + σ_{h}^{2} P_{t}} .

(31)

The optimal $\tilde{T}$ , ${\tilde{T}}^{(opt)}$ that attains this error variance is,

\begin{array}{lcr} {\tilde{T}}^{(opt)} = {[{(E_{1} ϕ (V))}^{T} \dots {(E_{K} ϕ (V))}^{T}]}^{T}, \end{array}

(32)

where

\begin{align} {[E_{k}]}_{v, v} & = exp [\frac{j 2 πv f_{k}}{V}], \\ \forall 0 \leq v \leq V - 1, 1 \leq k \leq K \end{align}

(33a)

\begin{align} ϕ (V) & = \sqrt{\frac{P_{t}}{KV}} {[exp (j ϕ_{0}) exp (j ϕ_{1}) \dots exp (j ϕ_{V - 1})]}^{T}, \end{align}

(33b)

\begin{align} f_{k} & = k (V - L - 1), \end{align}

(33c)

{ϕ_{v}} \in [- π, π] .

(34)

Proof. See Appendix 1.4. □

Remark: By employing the training design described in (32), ${\tilde{T}}^{H} \tilde{T}$ in (D.7) is diagonal and the MMSE of (31) is attained. The time-domain training sequences can be obtained from (14b) in a straight-forward manner by using the relation, $t = (I_{K} \otimes W^{H} P_{0 : V - 1}^{(t)}) \tilde{T}$ . It can be noticed that a simple way of making the term, ${\tilde{T}}_{k_{1}}^{*} [v, v] {\tilde{T}}_{k_{2}} [v, v]$ in (D.13) equal to zero is to allow only (L+1) out of V subcarriers dedicated for training symbols to be used at any given antenna. These equispaced and equipowered training symbols occupy disjoint sets of subcarriers at each transmit antenna. Clearly, such a scheme utilizes only (L+1) out of V subcarriers dedicated for training symbols at any given antenna. On the other hand, a general training scheme design described in (32) uses all non-data-bearing subcarriers, i.e., V for channel estimation purposes.

Remark: In [16], disjoint sets of subcarriers were considered to reduce the MMSE channel estimation error. Training designs similar to ours were shown in [24] by minimizing the least-squares channel estimation error and in [20] by minimizing the MSE of the LMMSE channel estimate. In [35], several classes of training schemes are derived by minimizing the least-squares channel estimation error. In this study, the disjoint allocation of subcarriers for training symbols from different antennas is referred to as a FDM scheme and the phase-shift orthogonal design as a code-division multiplexing in the frequency domain scheme.

If we were to initialize the Kalman recursion by substituting the scaled identity covariance matrix of h _n for M _{− 1 | − 1}, the one-step prediction error matrix, M _{n|n − 1} is always a scaled identity matrix. Consequently, the matrix ${\tilde{K}}_{n} ≜ K_{n} (I_{K} \otimes \tilde{T})$ is also a scaled identity matrix since $(I_{K} \otimes {\tilde{T}}^{H} \tilde{T})$ is designed to be a scaled identity matrix. This is better understood by writing the alternative version of the Kalman gain matrix using the matrix inversion lemma^a:

\begin{align} K_{n} & = σ_{z}^{- 2} M_{n | n - 1} (I_{K} \otimes {\tilde{T}}^{H}) \\ - σ_{z}^{- 4} M_{n | n - 1} (I_{K} \otimes {\tilde{T}}^{H} \tilde{T}) \\ \times {(M_{n | n - 1}^{- 1} + (I_{K} \otimes {\tilde{T}}^{H} \tilde{T}))}^{- 1} (I_{K} \otimes {\tilde{T}}^{H}) . \end{align}

(35)

As an extension of the above fact, due to assumption, (A1) and the optimal training design described by Lemma 1, M ^( ∞ ) is also a scaled identity matrix. It can be showed that an arbitrary diagonal element, $m^{(\infty)} ≜ M^{(\infty)} [l, l]$ , 0 ≤ l ≤ K ²(L + 1) − 1 is given as follows:

\begin{array}{l} m^{(\infty)} & = \frac{σ_{z}^{2} (a^{2} m^{(\infty)} + σ_{u}^{2})}{σ_{z}^{2} + \frac{P_{t}}{K} (a^{2} m^{(\infty)} + σ_{u}^{2})} \\ = \frac{σ_{h}^{2}}{\frac{1}{2} (1 + \frac{σ_{h}^{2} P_{t}}{σ_{z}^{2} K}) + \sqrt{\frac{1}{4} {(1 + \frac{σ_{h}^{2} P_{t}}{σ_{z}^{2} K})}^{2} + \frac{a^{2}}{1 - a^{2}} \frac{σ_{h}^{2} P_{t}}{σ_{z}^{2} K}}} . \end{array}

(36)

This steady-state Ricatti solution is the lower bound on the MMSE for estimating any of the K ²(L+1) channel filter taps, irrespective of the particular phase being considered. To compute the steady-state MMSE characteristics, we let n → ∞, and define,

\begin{array}{lcr} M_{j}^{(\infty)} ≜ \lim_{n \to \infty} M_{nN + j | nN + j}, \end{array}

(37)

for j ∈ [0, N − 1]. We can now review the closed-form expressions for steady-state channel MMSEs in training and data transmission phases based on [36].

Lemma 2. When the training vectors are designed according to (32) and a Kalman filter is employed to perform channel tracking, the steady-state channel MMSEs for the system model corresponding to (20), (21), and (24) are given as follows:

m_{N - 1}^{(\infty)} = δ_{N - 1}^{(\infty)} [l, l] + m^{(\infty)},

(38a)

Training phase (j ∈[ 0, N _j − 1] ) :

m_{j}^{(\infty)} = m^{(\infty)} + \frac{(1 - α) δ_{N - 1}^{(\infty)} [l, l]}{β (1 - α^{j}) δ_{N - 1}^{(\infty)} [l, l] + α^{j} (1 - α)},

(38b)

Data transmission phase (j ∈[ 0, N _j − 1] ) :

m_{j}^{(\infty)} = \frac{m_{N - 1}^{(\infty)} - σ_{h}^{2} (1 - a^{2 (N - j)})}{a^{2 (N - j)}},

(38c)

where $m_{j}^{(\infty)} ≜ M_{j}^{(\infty)} [l, l]$ and $δ_{N - 1}^{(\infty)} [l, l]$ is computed as follows:

\begin{align} δ_{N - 1}^{(\infty)} [l, l] & = - b_{\infty} + \sqrt{b_{\infty}^{2} + c_{\infty}}, \end{align}

(39a)

\begin{align} b_{\infty} & ≜ (\frac{α^{N_{t}} - a^{2 (N - N_{t})}}{α^{N - t} - 1}) (\frac{α - 1}{2 β}) \\ - (\frac{1 - a^{2 (N - N_{t})}}{2}) (σ_{h}^{2} - m^{(\infty)}), \end{align}

(39b)

\begin{align} c_{\infty} & ≜ α^{N_{t}} (\frac{1 - a^{2 (N - N_{t})}}{α^{N_{t}} - 1}) (\frac{α - 1}{β}) (σ_{h}^{2} - m^{(\infty)}), \end{align}

(39c)

\begin{align} α & ≜ \frac{1}{a^{2}} {(1 + (σ_{h}^{2} - a^{2} (σ_{h}^{2} - m^{(\infty)})) \frac{P_{t}}{K})}^{2}, \end{align}

(39d)

\begin{align} β & ≜ \frac{P_{t}}{K} (1 + (σ_{h}^{2} - a^{2} (σ_{h}^{2} - m^{(\infty)})) \frac{P_{t}}{K}), \end{align}

(39e)

Proof. See proof of Lemma 1 in [36]. □

Capacity bounds with sequential MMSE channel estimation

Similar to [18], we adopt the definition of capacity in bits per channel use to be the maximum over the distribution of the transmit signal of the mutual information between the known training symbols and the observations and the unknown transmitted signal. In other words, for the system model shown in (20), (21), and (24), the channel capacity averaged over the random channel is defined as follows:

\begin{array}{l} C = \frac{1}{N} \times \frac{M}{P} \sum_{n = 0}^{N_{t} - 1} E [max_{p_{x} (.), E [| | {\tilde{x}}_{n} | |^{2}] = P_{dt}} I ({\tilde{y}}_{n}^{(dt)}; {\tilde{x}}_{n} | {\hat{\tilde{H}}}_{n})] \\ + \frac{1}{N} \sum_{n = N_{t}}^{N - 1} E [max_{p_{s} (.), E [| | {\tilde{s}}_{n} | |^{2}] = P_{d}} I ({\tilde{r}}_{n}; {\tilde{s}}_{n} | {\hat{\bar{H}}}_{n})] \\ bits/channel use . \end{array}

(40)

Upper bound on the channel capacity

To benchmark the maximum achievable capacity, we consider the ideal scenario where the channel estimation is perfect. We also utilize the Gaussianity assumption on the distribution of the information symbol vectors, ${\tilde{x}}_{n}$ and ${\tilde{s}}_{n}$ due to (A2) in the channel capacity expression. We now have the following result:

Theorem 4. The upper bound on the channel capacity for the system model shown in (20), (21), and (24) is obtained when the information symbol vectors, ${\tilde{x}}_{n}$ and ${\tilde{s}}_{n}$ are Gaussian distributed and is given by the expression:

\begin{align} C_{u} & = \frac{N_{t}}{N} \times \frac{M}{P} E [max_{p_{x} (.), E [| | {\tilde{x}}_{n} | |^{2}] = P_{dt}} I ({\tilde{y}}_{0}^{(dt)}; {\tilde{x}}_{0} | {\tilde{H}}_{0})] \\ + \frac{N_{d}}{N} E [max_{p_{s} (.), E [| | {\tilde{s}}_{n} | |^{2}] = P_{d}} I ({\tilde{r}}_{N_{t}}; {\tilde{s}}_{N_{t}} | {\bar{H}}_{N_{t}})] \\ bits / channel use \\ = \frac{N_{t}}{N} \times \frac{M}{P} E [log det (I_{KM} + \frac{P_{dt}}{σ_{z}^{2}} {\tilde{H}}_{0} {\tilde{H}}_{0}^{H})] \\ + \frac{N_{d}}{N} E [log det (I_{KP} + \frac{P_{d}}{σ_{z}^{2}} {\bar{H}}_{N_{t}} {\bar{H}}_{N_{t}}^{H})] \\ bits / channel use . \end{align}

(41)

Proof. See Appendix 1.5. □

Lower bound on the channel capacity

From [4] and ([18], Theorem 1), we know that the lower bound on the mutual information between the channel input and its output is obtained when the additive noise is Gaussian distributed. In other words, when imperfect channel estimates are employed for estimating information symbols, a zero-mean uncorrelated complex Gaussian noise vector minimizes the upper bound over the distribution of the information symbol vector of the mutual information between the transmitted and observed information symbols. For the problem under consideration, the following signal model can be written by expressing the estimated channel matrix, as a sum of the conditional mean, and the random error component,

\begin{align} {\tilde{y}}_{n}^{(dt)} & = {\hat{\tilde{H}}}_{n} {\tilde{x}}_{n} + {\overset{̌}{\tilde{H}}}_{n} {\tilde{x}}_{n} + {\tilde{z}}_{n} \\ = {\hat{\tilde{H}}}_{n} {\tilde{x}}_{n} + (I_{K} \otimes {\tilde{X}}_{n}) {\overset{̌}{h}}_{n} + {\tilde{z}}_{n} . \end{align}

(42a)

\begin{align} {\tilde{r}}_{n} & = {\hat{\bar{H}}}_{n} {\tilde{s}}_{n} + {\overset{̌}{\bar{H}}}_{n} {\tilde{s}}_{n} + {\tilde{z}}_{n} \\ = {\hat{\bar{H}}}_{n} {\tilde{s}}_{n} + (I_{K} \otimes {\tilde{S}}_{n}) {\overset{̌}{h}}_{n} + {\tilde{z}}_{n} . \end{align}

(42b)

In (42a), we made use of the following relationship,

\begin{align} P_{0 : M - 1}^{(Q) T} {\bar{H}}_{k, \bar{k}}^{(n)} P_{0 : M - 1}^{(Q)} {\tilde{x}}_{n, \bar{k}} & = \sqrt{P} {\tilde{X}}_{n, \bar{k}} P_{0 : M - 1}^{(Q) T} W_{0 : L} h_{k, \bar{k}}^{(n)} \\ = {\tilde{X}}_{n, \bar{k}} {\overset{̌}{W}}_{0 : L} h_{k, \bar{k}}^{(n)}, \end{align}

(43)

where ${\tilde{X}}_{n, \bar{k}} ≜ diag ({\tilde{x}}_{n, \bar{k}})$ , ${\overset{̌}{W}}_{0 : L} ≜ \sqrt{P} P_{0 : M - 1}^{(Q) T} W_{0 : L}$ and

\begin{array}{lcr} {\tilde{X}}_{n} ≜ [{\tilde{X}}_{n, 1} {\overset{̌}{W}}_{0 : L} {\tilde{X}}_{n, 2} {\overset{̌}{W}}_{0 : L} \dots {\tilde{X}}_{n, K} {\overset{̌}{W}}_{0 : L}] . \end{array}

(44)

Similarly, in (42b), we made use of the following relationship,

\begin{array}{lcr} {\bar{H}}_{k, \bar{k}}^{(n)} {\tilde{s}}_{n, \bar{k}} = \sqrt{P} {\tilde{S}}_{n, \bar{k}} W_{0 : L} h_{k, \bar{k}}^{(n)}, \end{array}

(45)

where ${\tilde{S}}_{n, \bar{k}} ≜ diag ({\tilde{s}}_{n, \bar{k}})$ and

\begin{align} {\tilde{S}}_{n} ≜ [\sqrt{P} {\tilde{S}}_{n, 1} W_{0 : L} \sqrt{P} {\tilde{S}}_{n, 2} W_{0 : L} \dots \sqrt{P} {\tilde{S}}_{n, K} W_{0 : L}] . \end{align}

(46)

It should be observed that in (21) and (24), the channel is unknown whereas in (42a) and (42b), the channel is known. Furthermore, the additive noise in the former two equations is Gaussian and independent of the information symbols whereas in the latter two, it is possibly neither. This is due to the fact that each of the effective additive noise vectors, ${\tilde{z}}_{n}^{(dt)} ≜ (I_{K} \otimes {\tilde{X}}_{n}) {\overset{̌}{h}}_{n} + {\tilde{z}}_{n}$ and ${\tilde{z}}_{n}^{(d)} ≜ (I_{K} \otimes {\tilde{S}}_{n}) {\overset{̌}{h}}_{n} + {\tilde{z}}_{n}$ appear to be a sum of a Gaussian vector and a vector whose elements are obtained by summing products of Gaussian random variables. As a result, we will merely derive the lower bound by replacing the effective noise vectors, with Gaussian noise vectors that possess the same average powers. The expressions for the average noise powers in each phase are as shown below.

Training phase

\begin{array}{l} σ_{z^{(dt)}}^{2} (n) & = \frac{1}{KM} E {trace {{\tilde{z}}_{n}^{(dt)} {\tilde{z}}_{n}^{(dt) H}}} \\ = \frac{1}{M} m_{k}^{(\infty)} E {trace {{\tilde{X}}_{n} {\tilde{X}}_{n}^{H}}} + σ_{z}^{2} \\ = \frac{1}{M} m_{n}^{(\infty)} \sum_{k = 1}^{K} E {trace {{\tilde{X}}_{n, k} {\overset{̌}{W}}_{0 : L} {\overset{̌}{W}}_{0 : L}^{H} {\tilde{X}}_{n, k}^{H}}} + σ_{z}^{2} \\ = \frac{L + 1}{M} m_{n}^{(\infty)} \sum_{k = 1}^{K} \sum_{m = 0}^{M - 1} E {| {\tilde{X}}_{n, k} [m, m] |^{2}} + σ_{z}^{2} \\ = \frac{L + 1}{M} m_{n}^{(\infty)} M σ_{x}^{2} + σ_{z}^{2} \\ = (L + 1) P_{dt} m_{n}^{(\infty)} + σ_{z}^{2} n \in [0, N_{t} - 1], \end{array}

(47)

where we substituted, $σ_{x}^{2} = P_{dt}$ to account for the power budget on the transmit symbols in the training phase.

Data transmission phase

\begin{array}{l} σ_{z^{(d)}}^{2} (n) & = \frac{1}{KP} E {trace {{\tilde{z}}_{n}^{(d)} {\tilde{z}}_{n}^{(d) H}}} \\ = \frac{1}{P} m_{n}^{(\infty)} E {trace {{\tilde{S}}_{n} {\tilde{S}}_{n}^{H}}} + σ_{z}^{2} \\ = \frac{1}{P} m_{n}^{(\infty)} \sum_{k = 1}^{K} E {trace {P {\tilde{S}}_{n, k} {\overset{̌}{W}}_{0 : L} {\overset{̌}{W}}_{0 : L}^{H} {\tilde{S}}_{n, k}^{H}}} + σ_{z}^{2} \\ = \frac{L + 1}{P} m_{n}^{(\infty)} \sum_{k = 1}^{K} \sum_{p = 0}^{P - 1} E {| {\tilde{S}}_{n, k} [p, p] |^{2}} + σ_{z}^{2} \\ = \frac{L + 1}{P} m_{n}^{(\infty)} K σ_{s}^{2} + σ_{z}^{2} \\ = (L + 1) P_{d} m_{n}^{(\infty)} + σ_{z}^{2} n \in [N_{t}, N - 1], \end{array}

(48)

where we substituted, $σ_{s}^{2} = P_{d}$ to account for the power budget on the transmit symbols in the data transmission phase.

The lower bound on the channel capacity when the estimated MIMO channels are taken to be the true channels is now given by the following result.

Theorem 5. The worst-case lower bound on the channel capacity for the system model shown in (20), (21), and (24) is obtained when the additive noise is Gaussian distributed and is maximized when the information symbol vectors, ${\tilde{x}}_{n}$ and ${\tilde{s}}_{n}$ are Gaussian distributed. It is given by the expression:

\begin{align} C_{l} & = \frac{1}{N} \times \frac{M}{P} \sum_{n = 0}^{N_{t} - 1} E [max_{p_{x} (.), E [| | {\tilde{x}}_{n} | |^{2}] = P_{dt}} I ({\tilde{y}}_{n}^{(dt)}; {\tilde{x}}_{n} | {\hat{\tilde{H}}}_{n})] \\ + \frac{1}{N} \sum_{n = N_{t}}^{N - 1} E [max_{p_{s} (.), E [| | {\tilde{s}}_{n} | |^{2}] = P_{d}} I ({\tilde{r}}_{n}; {\tilde{s}}_{n} | {\hat{\bar{H}}}_{n})] \\ bits/channel use \\ \geq \frac{1}{N} \times \frac{M}{P} \sum_{n = 0}^{N_{t} - 1} log det (I_{KM} + P_{dt} σ_{z^{(dt)}}^{2} (n) {\hat{\tilde{H}}}_{n}^{H} {\hat{\tilde{H}}}_{n}) \\ + \frac{1}{N} \sum_{n = N_{t}}^{N - 1} log det (I_{KP} + P_{d} σ_{z^{(d)}}^{2} (n) {\hat{\bar{H}}}_{n}^{H} {\hat{\bar{H}}}_{n}) . \end{align}

(49)

Proof. See Appendix 1.6. □

Simulation results

In our simulation, we selected K = 2, P = 32, L = 3, and M = 24 (since M = P − V and V is set to K(L + 1)). The training vectors are generated according to (32). We also set $P = 1$ , so that the SNR is defined as: $SNR ≜ - 10 \underset{10}{log} σ_{z}^{2}$ . In designing the optimal training vector and in generating Gaussian information symbols over each of the K transmit antennas, their variances have been appropriately scaled such that the total power constraint on the overall system is satisfied. We selected the Rayleigh channel variance to be $σ_{h}^{2} = 1 / (L + 1)$ . Thus, the Rayleigh channel adopted is an uncorrelated uniform scattering model. Moreover, we averaged the results over 500 randomly generated MIMO channel vectors. Given the fact that the channel capacity lower bound given by (49) is quite involved, we do not attempt to provide analytical results for the optimal power allocation and the optimal number of blocks out of N that carry superimposed training symbols. Consequently, we resort to numerical optimization to determine optimal $P_{dt}$ , $P_{t}$ , $P_{d},$ and N _t.

Performance evaluation of optimal training designs over non-time-varying wireless channels

In this section, we have generated the Rayleigh channels such that there is no correlation between successive block indices. In other words, each MIMO channel vector of any index is assumed to be independent of the MIMO channel vector of any other index. Moreover, we consider each block to contain training and information symbols such that channel tracking is not performed. Thus each block is represented by (20) and (21) alone. This also implies that $P_{dt} + P_{t} = 1$ and $P_{d} = 0$ are assumed.

Comparison of the MIMO channel estimator and BCRB

Figure 1 provides a comparison of the Bayesian CRBs with the corresponding variances of channel estimators for varying training powers. In this figure, we normalized the Bayesian CRB and the channel variance values by the number of MIMO channel coefficients, i.e., K ²(L + 1). From Figure 1, we notice that when a large fraction of total power is allocated for training symbols, the lower bound of the channel estimator progressively decreases. When the training symbols carry a small fraction of the total power, the under-performance of the MMSE channel estimator that is evaluated based solely on (20) w.r.t the lower bound is evident. On the other hand, as the power of training symbols increases, the difference between the MMSE channel estimator variance and the Bayesian lower bound achievable is negligible. In other words, the role played by the term, $σ_{x}^{4} Ξ (Q)$ in (11) is progressively minimized.

MMSE estimation of information symbols

Figure 2 describes the performance of an MMSE equalizer for estimating the information symbols using (21). We provide the MMSE variance characteristics for the case when the true channel was used in (21) as well as the case where estimated channels were used. Unsurprisingly, the curves corresponding to true channel values suggest that the MMSE variance of the estimated information symbols is lower than those that result when estimated channel vectors are used. However, a more interesting observation is that the performance is impaired both when $P_{t}$ is too low or too high. Specifically, when $P_{t} = 0.25$ , a small fraction of the total power is employed to gather channel estimates. Since, there are bound to be numerous errors in this scenario, data estimation suffers. Conversely, when $P_{t} = 0.95$ , only a small fraction of total power is expended for information symbols and hence the data estimation suffers. On the other hand, when $P_{t} = 0.5$ , the performance appears to be better. However, we refrain from computing the optimal power allocation by considering a capacity lower bound similar to (49) for the non-time-varying wireless channel scenario and reserve such an analysis for the next section.

Kalman tracking of time-varying wireless channels

In this section, we selected the MIMO channel vectors such that they are correlated. The excitation noise is generated with the appropriate variance so that the channel vectors are WSS. Throughout this section, we set N = 10. We then performed a numerical optimization as mentioned above and determined that the following values result maximize the lower bound obtained in (49): $P_{dt} = 0.32$ , $P_{t} = 0.41$ , $P_{d} = 0.27$ and N _t = 4. When a non-optimal value of $P_{dt}$ is allocated to training symbols, the division of the remaining power to information symbols in the training phase and the data transmission phase is arbitrarily chosen.

Steady-state MMSE of the channel estimator

In Figure 3, we provide the steady-state MMSE characteristics of the channel estimator when a Kalman filter is used to track the channel. We set a = 0.95 and fixed N _t = 4 in order to generate these characteristics. We also analyzed the characteristics for $P_{t} = 0.25, 0.5 and 0.95$ in addition to the optimal value. The MMSE lower bound shown in Figure 3 corresponds to (36) whereas the normalized MMSE corresponds to averaging (38b) and (38c) every N blocks. We notice that for small $P_{t}$ the errors committed due to channel predictions in the data transmission phase cause significant deviation of the normalized steady-state MMSE from the lower bound. Only at high values of $P_{t}$ , these errors become insignificant. Of particular importance is the fact that even at an SNR of 30 dB and with optimal training power allocation, the deterioration suffered due to prediction errors w.r.t the lower bound is close to 3 dB.

MMSE estimation of information symbols due to Kalman channel tracking

Figure 4 shows the resulting MMSE estimation error variance characteristics of information symbols. The solid curves rely on true channel estimates whereas the dashed curves depend on not only the estimated channel states, but also on Kalman predictions. Similar to Figure 2, we see that when $P_{t}$ is too small or too large the error variance of information symbol estimation suffers greatly. Even at high SNR, the non-judicious power allocation combined with Kalman predictions during the data transmission phase leads to numerous errors. On the other hand, optimal power allocation between training and information symbols leads to the lowest possible information symbol estimation error variance.

Capacity bounds

The final simulation example that we will consider is the capacity upper and lower bound characteristics. While the upper bound characteristics for varying levels of $P_{t}$ exhibit a gradual improvement toward the theoretical upper bound, the lower bound characteristics are more abrupt. This can be attributed to the prediction errors that occur in the Kalman prediction stage during the data transmission phase. When $P_{t} = 0.95$ , the fraction of the total power available for information symbols in the training phase and the data transmission phase is minuscule and hence the achievable capacity lower bound is small. In contrast, this value can be improved by more than 15 bits/channel use with an optimal allocation of training and data powers (Figure 5).

Conclusion

In this article, we have shown that similar to a SISO case, an OFDM linear precoder with an FDM training sequence satisfies the orthogonality condition and results in decoupled channel estimation and symbol detection. Furthermore, we have derived optimal training sequences such that the FDM training sequences between different antennas are phase-shift orthogonal to each other. Based on the structure of the training matrices, the Kalman filter recursion was simplified to a scalar recursion. Eventually, the upper and lower bounds on the channel capacity were obtained by utilizing the Kalman filter’s MMSE expressions to account for imperfect channel estimates. We showed that the Kalman filter predictions affect the capacity calculations substantially. Taking this degradation into account, we numerically determined the optimal training power allocation and optimal number of training blocks to achieve the best possible capacity lower bound. Finally, we provided numerical results to support the theoretical results.

Appendix 1

Modeling time-variations in low-mobility wireless channels

While the complex random variable description of a wireless channel as Rayleigh, Rician, etc., forms one aspect of characterization, another one involves taking the time-variations of the channel filter taps into consideration. A common assumption on the random process that drive the time-variations of the channel filter taps is its wide-sense stationarity. In other words, the mean and the auto-correlation functions of each filter tap are assumed to be independent of time, with the latter being a function of the time-difference alone. Further, each tap at a given time instant is assumed to be independent of every other tap at any time instant. Together these two assumptions give rise to the wide-sense stationary, uncorrelated scattering (WSSUS) model.

Autoregressive model: A widely used approach to model time variations of a WSSUS channel is by a general P th order AR random process. By considering (7a), the AR model that helps us to specify the correlation between the current state of the system and the past states is as shown below:

\begin{array}{lcr} h_{n} = \sum_{p = 1}^{P} A_{p} h_{n - p} + B u_{n} . \end{array}

(A.1)

In (A.1), each element in {A _p} is termed as an AR coefficient matrix or a state-transition matrix and u _n as the excitation or driving noise vector. The eigen values of each element in {A _p} are assumed to be less than 1 in magnitude and the driving noise is assumed to be i.i.d and complex Gaussian distributed with zero mean. The AR model admits the following Yule–Walker equations to describe the covariance function of the process [37].

R_{h} [n] = \sum_{p = 1}^{P} A_{p} R_{h} [n - p] + σ_{u}^{2} B B^{H} .

(A.2)

Assuming that R _h[0] and {A _p} are known, we can apply the fact that R _h[n] = R _h[−n], and recursively find {R _h[n]} for n = 1,2,…,P. We can also find a non-unique B by computing the square-root of (A.2) for n = 0 ([38], p. 358).

Proof of Theorem 1

From [39], we know that the complex FIM is given by the equation,

\begin{align} E \{(\frac{\partial ln p (y_{n}; h_{n})}{\partial h_{n}^{*}}) {(\frac{\partial ln p (y_{n}; h_{n})}{\partial h_{n}^{*}})}^{H}\} \\ = & E \{E [(\frac{\partial ln p (y_{n} | h_{n})}{\partial h_{n}^{*}}) {(\frac{\partial ln p (y_{n} | h_{n})}{\partial h_{n}^{*}})}^{H} | h_{n}]\} \\ + & E \{(\frac{\partial ln p (h_{n})}{\partial h_{n}^{*}}) {(\frac{\partial ln p (h_{n})}{\partial h_{n}^{*}})}^{H}\} . \end{align}

(B.1)

In the second equality of the above equation, the inner expectation in the first term is w.r.t y _n, whereas the outer expectation is w.r.t h _n. The log-likelihood function of the probability density function, p(y _n|h _n) in (B.1) and its derivative are as follows:

\begin{align} ln p (y_{n} | h_{n}) & = constant - ln (| R_{y_{n}} |) - u^{H} R_{y_{n}}^{- 1} u, \end{align}

(B.2a)

\begin{align} \frac{\partial ln p (y_{n} | h_{n})}{\partial h_{n}^{*}} & = - \frac{\partial ln | R_{y_{n} | h_{n}} |}{\partial h_{n}^{*}} - \frac{\partial u^{H} R_{y_{n} | h_{n}}^{- 1} u}{\partial h_{n}^{*}}, \end{align}

(B.2b)

where, $u ≜ (y_{n} - (I_{K} \otimes T) h_{n})$ and

\begin{align} R_{y_{n} | h_{n}} & = σ_{x}^{2} H_{n} Q Q^{H} H_{n}^{H} + σ_{z}^{2} I_{KP} \\ = σ_{x}^{2} \sum_{j = 0}^{KM - 1} Q_{j} h_{n} h_{n}^{H} Q_{j}^{H} + σ_{z}^{2} I_{KP}, \end{align}

(B.3a)

\begin{align} R_{y_{n} | h_{n}}^{- 1} & = σ_{z}^{- 2} I_{KP} - σ_{z}^{- 4} σ_{x}^{2} H_{n} Q \\ \times {(I_{KM} + σ_{z}^{- 2} σ_{x}^{2} Q^{H} H_{n}^{H} H_{n} Q)}^{- 1} Q^{H} H_{n}^{H} \\ = σ_{z}^{- 2} I_{KP} - σ_{z}^{- 4} σ_{x}^{2} H_{n} Q G Q^{H} H_{n}^{H} . \end{align}

(B.3b)

and $Q_{j} ≜ I_{K} \otimes [Q_{j, 1} Q_{j, 2} \dots Q_{j, K}]$ are obtained from each of the KM columns of Q. The matrices ${Q_{j, k}}$ are a result of applying the commutativity property of convolution. It should be noted that we have not utilized the block-diagonal structure of Q in obtaining ${Q_{j, k}}$ . In other words, the matrices ${Q_{j, k}}$ are constructed without explicit consideration of the fact that (K−1)P out of KP elements in each column of Q are zeros. In addition, we have utilized the matrix inversion lemma in obtaining (B.3b) where $G ≜ {(I_{KM} + σ_{z}^{- 2} σ_{x}^{2} Q^{H} H_{n}^{H} H_{n} Q)}^{- 1}$ .

We now evaluate the two partial derivatives in (B.2b) separately.

\frac{\partial ln | R_{y_{n} | h_{n}} |}{\partial h_{n}^{*}}

Using ([40], (9)), we note that

\begin{align} \frac{\partial ln | R_{y_{n} | h_{n}} |}{\partial h_{n}^{*}} & = D_{h_{n}^{*}} ln | R_{y_{n} | h_{n}} | \\ = (D_{R_{y_{n} | h_{n}}} ln | R_{y_{n} | h_{n}} |) D_{h_{n}^{*}} R_{y_{n} | h_{n}} \\ + (D_{R_{y_{n} | h_{n}}^{*}} ln | R_{y_{n} | h_{n}} |) D_{h_{n}^{*}} R_{y_{n} | h_{n}}^{*} . \end{align}

(B.4)

Here, $D_{R_{y_{n} | h_{n}}} ln | R_{y_{n} | h_{n}} | = R_{y_{n} | h_{n}}^{- T}$ and $D_{R_{y_{n} | h_{n}}^{*}} ln | R_{y_{n} | h_{n}} | = 0$ ([41], Table II). Moreover, from (B.3a), we see that (cf. ([40], (1)),

\begin{align} d R_{y_{n} | h_{n}} & = σ_{x}^{2} \sum_{j = 0}^{KM - 1} Q_{j} h_{n} d h_{n}^{H} Q_{j}^{H} \\ + σ_{x}^{2} \sum_{j = 0}^{KM - 1} Q_{j} d h_{n} h_{n}^{H} Q_{j}^{H}, \end{align}

(B.5a)

\begin{align} d vec R_{y_{n} | h_{n}} & = σ_{x}^{2} \sum_{j = 0}^{KM - 1} (Q_{j}^{*} \otimes Q_{n} h_{n}) d vec (h_{n}^{*}) \\ + σ_{x}^{2} \sum_{j = 0}^{KM - 1} (Q_{j}^{*} h_{n}^{*} \otimes Q_{n}) d vec (h_{n}) . \end{align}

(B.5b)

From the above equations and ([40], Table III), we notice that, $D_{h_{n}^{*}} R_{y_{n} | h_{n}} = σ_{x}^{2} \sum_{j = 0}^{KM - 1} (Q_{j}^{*} \otimes Q_{n} h_{n})$ . It should be noted that the definition of the partial derivative for the case of a scalar function w.r.t a column vector adopted by Hjørungnes and Gesbert result in a row vector ([40], Table III, 2nd row). We consider this definition to lead to transposed derivative and perform a transpose operation of the results obtained based on ([40], (9)) in order to obtain the FIM with appropriate dimensions. Consequently,

\begin{align} \frac{\partial ln | R_{y_{n} | h_{n}} |}{\partial h_{n}^{*}} \\ = & (D_{R_{y_{n} | h_{n}}} ln | R_{y_{n} | h_{n}} |) D_{h_{n}^{*}} R_{y_{n} | h_{n}} \\ = & {({vec}^{T} [\frac{\partial ln | R_{y_{n} | h_{n}} |}{\partial R_{y_{n} | h_{n}}}] [\frac{\partial vec R_{y_{n} | h_{n}}}{\partial^{T} h_{n}^{*}}])}^{T} \\ = & σ_{x}^{2} \sum_{j = 0}^{KM - 1} (Q_{j}^{H} \otimes h_{n}^{T} Q_{j}^{T}) vec (R_{y_{n} | h_{n}}^{- T}) . \end{align}

(B.6)

\frac{\partial u^{H} R_{y_{n} | h_{n}}^{- 1} u}{\partial h_{n}^{*}}

Using ([40], (9)), we can similarly show that

\begin{array}{l} \frac{\partial u^{H} R_{y_{n} | h_{n}}^{- 1} u}{\partial h_{n}^{*}} & = - σ_{x}^{2} \sum_{j = 0}^{KM - 1} (Q_{j}^{H} \otimes h_{n}^{T} Q_{j}^{T}) (R_{y_{n} | h_{n}}^{- 1} \otimes {R_{y}}_{n}^{- T}) (u \otimes u^{*}) \\ - {(I_{K} \otimes T)}^{H} R_{y_{n} | h_{n}}^{- 1} u . \end{array}

(B.7)

Hence, from (B.2b),

\begin{align} \frac{\partial ln p (y | h_{n})}{\partial h_{n}^{*}} & = σ_{x}^{2} \sum_{j = 0}^{KM - 1} (Q_{j}^{H} \otimes h_{n}^{T} Q_{j}^{T}) \\ \times [(R_{y_{n} | h_{n}}^{- 1} \otimes {R_{y}}_{n}^{- T}) (u \otimes u^{*}) - vec (R_{y_{n} | h_{n}}^{- T})] \\ + {(I_{K} \otimes T)}^{H} R_{y_{n} | h_{n}}^{- 1} u . \end{align}

(B.8)

Before we evaluate the inner expectation in the first term of (B.1), we recall that,

\begin{align} E [u \otimes u^{*}] & = E [vec (u^{*} u^{T})] = vec (E [u^{*} u^{T}]) \\ = vec (R_{y_{n} | h_{n}}^{T}) . \end{align}

(B.9)

Incidentally, by utilizing the above result, we can see that $E [\frac{\partial ln p (y_{n} | h_{n})}{\partial h_{n}^{*}}] = 0$ indicating that the regularity condition is satisfied. Employing (B.8) and (B.9), we can show that

\begin{align} E [(\frac{\partial ln p (y | h_{n})}{\partial h_{n}^{*}}) {(\frac{\partial ln p (y | h_{n})}{\partial h_{n}^{*}})}^{H} | h_{n}] \\ = {(I_{K} \otimes T)}^{H} R_{y_{n} | h_{n}}^{- 1} (I_{K} \otimes T) \\ + σ_{x}^{4} \sum_{j = 0}^{KM - 1} (Q_{j}^{H} \otimes h_{n}^{T} Q_{j}^{T}) (R_{y_{n} | h_{n}}^{- 1} \otimes R_{y_{n} | h_{n}}^{- T}) \\ \times \sum_{n = 0}^{KM - 1} (Q_{j}^{*} \otimes Q_{n} h_{n}) . \end{align}

(B.10)

Moreover, we observe that $E \{(\frac{\partial ln p (h_{n})}{\partial h_{n}^{*}}) {(\frac{\partial ln p (h_{n})}{\partial h_{n}^{*}})}^{H}\} = σ_{h}^{- 2} I_{K^{2} (L + 1)}$ . Substituting this result along with (B.10) and (B.3b) in (B.1) gives:

\begin{align} I (h_{n}) & = E_{h} \{{(I_{K} \otimes T)}^{H} R_{y_{n} | h_{n}}^{- 1} (I_{K} \otimes T)\} \\ + σ_{x}^{4} Ξ (Q) + σ_{h}^{- 2} I_{K^{2} (L + 1)} \\ = σ_{z}^{- 2} {(I_{K} \otimes T)}^{H} (I_{K} \otimes T) \\ - σ_{z}^{- 4} σ_{x}^{2} E_{h} \{{(I_{K} \otimes T)}^{H} H_{n} Q G Q^{H} H_{n}^{H} (I_{K} \otimes T)\} \\ + σ_{x}^{4} Ξ (Q) + σ_{h}^{- 2} I_{K^{2} (L + 1)} \\ = σ_{z}^{- 2} (I_{K} \otimes T^{H} T) - σ_{z}^{- 4} σ_{x}^{2} Ξ (t, Q) + σ_{x}^{4} Ξ (Q) \\ + σ_{h}^{- 2} I_{K^{2} (L + 1)} . \end{align}

(B.11)

Proof of Theorem 2

From (12c), we see that G is the inverse of sum of two full-rank positive-definite matrices. This is because, $H_{n}$ is a Rayleigh-fading channel matrix of full rank (with probability 1) due to (A1) and (C1) stipulates that Q be a full column-rank matrix. Hence, $Q^{H} H_{n}^{H} H_{n} Q$ is a matrix with strictly-positive eigenvalues. Together with the fact that I _{K
M} is also a matrix with strictly-positive eigen-values, we arrive at the result that G ≻ 0. By making a similar argument, we can show that $R_{y_{n} | h_{n}}^{- 1} ≻ 0$ . As a result of the above statements, we can claim that Ξ(t, Q)≽0 and Ξ(t)≽0. Combining the above results with (C1) leads us to conclude that $I (h_{n}) ≻ 0$ . Now, based on a previous observation that only Ξ (t, Q) is the term under the designer’s control, we see that $I^{(opt)} (h_{n}) ≽ I (h_{n})$ where the optimal Bayesian FIM for a training-based channel estimator is as follows:

\begin{align} I^{(opt)} (h_{n}) & = σ_{z}^{- 2} {(I_{K} \otimes T)}^{H} (I_{K} \otimes T) + σ_{x}^{4} Ξ (Q) \\ + σ_{h}^{- 2} I_{K^{2} (L + 1)} . \end{align}

(C.1)

It should be recalled that for any A, B ≻ 0 such that A ≽ B, we have B ^− 1≼ A ^− 1 and therefore, tr (B ^− 1)≼ tr (A ^− 1). Hence, finding the conditions under which $I^{(opt)} (h_{n}) ≽ I (h_{n})$ is equivalent to finding the conditions under which $tr (I^{(opt)} {(h_{n})}^{- 1}) ≼ tr (I {(h_{n})}^{- 1})$ . where $tr (I^{(opt)} {(h_{n})}^{- 1})$ is the Bayesian CRB of a non-decision-aided channel estimator for the system model described in (8). We can now see that $I^{(opt)} (h_{n})$ is obtained by making Ξ (t, Q) = 0 which in turn is possible by enforcing the condition:

\begin{array}{l} {(I_{K} \otimes T)}^{H} H_{n} Q \\ = [\begin{array}{l} T^{H} H_{1, 1}^{(n)} Q_{1} & T^{H} H_{1, 2}^{(n)} Q_{2} & \dots & T^{H} H_{1, K}^{(n)} Q_{K} \\ T^{H} H_{2, 1}^{(n)} Q_{1} & T^{H} H_{2, 2}^{(n)} Q_{2} & \dots & T^{H} H_{2, K}^{(n)} Q_{K} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ T^{H} H_{K, 1}^{(n)} Q_{1} & T^{H} H_{K, 2}^{(n)} Q_{2} & \dots & T^{H} H_{K, K}^{(n)} Q_{K} \end{array}] \\ = 0 . \end{array}

(C.2)

In other words,

\begin{array}{lcr} T^{H} H_{i, j}^{(n)} Q_{j} & = & 0 1 \leq i, j \leq K. \end{array}

(C.3)

We now utilize the commutativity property of convolution and in a manner similar in the construction of the matrices, {T _k}, we see that $H_{i, j}^{(n)} Q_{j} = [Q_{j, 1} h_{i, j}^{(n)} Q_{j, 2} h_{i, j}^{(n)} \dots Q_{j, M - 1} h_{i, j}^{(n)}]$ where the circulant matrices, {Q _{j, m}} are constructed such that [q _j,m[0] … q _j,m[P − 1]]is the first column and [q _j,m[0] t _k[P − L] … q _j,m[P − 1]] is the first row. The column vector q _j,m is the m th column of the j th linear precoder, Q _j. Hence,

\begin{array}{lcr} T_{i}^{H} Q_{j, m} = 0 1 \leq i, j \leq K 0 \leq m \leq M - 1 . \end{array}

(C.4)

Proof of Lemma 1

First, it is easy to see that the optimal MMSE estimator coincides with the linear MMSE estimator for the system under consideration i.e., (20), due to the joint Gaussian nature of the unknown parameter and the observation vectors. The optimal MMSE channel estimator, ${\hat{h}}_{n}$ is now ([31], (11.33) and (11.35)):

\begin{align} {\hat{h}}_{n} & = σ_{z}^{- 2} {[σ_{h}^{- 2} I_{K^{2} (L + 1)} + σ_{z}^{- 2} (I_{K} \otimes {\tilde{T}}^{H} \tilde{T})]}^{- 1} \\ \times (I_{K} \otimes {\tilde{T}}^{H}) {\tilde{y}}_{n}, \end{align}

(D.1a)

\begin{align} C_{Δ h_{n}} & = {[σ_{h}^{- 2} I_{K^{2} (L + 1)} + σ_{z}^{- 2} (I_{K} \otimes {\tilde{T}}^{H} \tilde{T})]}^{- 1} . \end{align}

(D.1b)

where, $Δ h_{n} = h_{n} - {\hat{h}}_{n}$ . The resulting channel estimator error variance is,

\begin{align} σ_{Δ h_{n}}^{2} & = trace {C_{Δ h_{n}}} \\ = trace {{[σ_{h}^{- 2} I_{K^{2} (L + 1)} + σ_{z}^{- 2} (I_{K} \otimes {\tilde{T}}^{H} \tilde{T})]}^{- 1}} . \end{align}

(D.2)

The optimal $\tilde{T}$ , ${\tilde{T}}^{(opt)}$ needs to minimize $σ_{Δ h_{n}}^{2}$ subject to (27). An equivalent representation of this pilot power constraint that will be useful for finding ${\tilde{T}}^{(opt)}$ is as follows:

\begin{align} trace ({\tilde{T}}_{k}^{H} {\tilde{T}}_{k}) & = P_{t} / K, \end{align}

(D.3a)

\begin{align} trace ({\tilde{T}}^{H} \tilde{T}) & = (L + 1) P_{t} . \end{align}

(D.3b)

As a result of (D.3b), we have

{\tilde{T}}^{(opt)} = {arg}_{\tilde{T}} \min_{trace ({\tilde{T}}^{H} \tilde{T}) = (L + 1) P_{t}} σ_{Δ h_{n}}^{2} .

(D.4)

From ([15], Appendix I), we see that for any M × M dimensional positive-definite matrix A,

\begin{array}{lcr} trace (A^{- 1}) \geq \sum_{m = 0}^{M - 1} \frac{1}{a_{m, m}}, \end{array}

(D.5)

where the equality is attained if and only if A is diagonal. Therefore, if ${\tilde{T}}^{(opt)}$ is employed to perform the MIMO–OFDM channel estimation, the resulting variance of the MMSE channel estimator is as follows:

\begin{array}{l} σ_{Δ h_{n}}^{2} & = trace {{[σ_{h}^{- 2} I_{K^{2} (L + 1)} + σ_{z}^{- 2} (I_{K} \otimes {\tilde{T}}^{H} \tilde{T})]}^{- 1}} \\ \geq \sum_{l = 0}^{K^{2} (L + 1) - 1} \frac{1}{{[σ_{h}^{- 2} I_{K^{2} (L + 1)} + σ_{z}^{- 2} (I_{K} \otimes {\tilde{T}}^{H} \tilde{T})]}_{l, l}} \\ = \frac{K^{3} (L + 1) σ_{z}^{2} σ_{h}^{2}}{K σ_{z}^{2} + σ_{h}^{2} P_{t}}, \end{array}

(D.6)

and equality in the above equation is attained when ${\tilde{T}}^{H} \tilde{T}$ is diagonal.

Optimal training design

We now design the optimal training design that achieves the minimum MSE variance shown in (D.6). We will see that in order to attain this bound, the pilot sequences of each transmit antenna as well as their relationship with the training sequences emitted from every other transmit antenna need to satisfy certain specific properties. These properties are a direct consequence of (D.3b) and (D.5).

A closer observation of ${\tilde{T}}^{H} \tilde{T}$ reveals the following:

\begin{align} {\tilde{T}}^{H} \tilde{T} = [\begin{array}{l} R_{1, 1} & \dots & R_{1, K} \\ R_{2, 1} & \dots & R_{2, K} \\ ⋮ & ⋱ & ⋮ \\ R_{K, 1} & \dots & R_{K, K} \end{array}], \end{align}

(D.7)

where the (L + 1) × (L + 1) dimensional submatrix $R_{k_{1}, k_{2}}$ is defined based on (19) as:

\begin{array}{lcr} R_{k_{1}, k_{2}} ≜ {\hat{W}}_{0 : L}^{H} {\tilde{T}}_{k_{1}}^{H} {\tilde{T}}_{k_{2}} {\hat{W}}_{0 : L} . \end{array}

(D.8)

The minimum variance as shown in (D.6) is therefore attained when

\begin{array}{lcr} R_{k_{1}, k_{2}} = {\hat{W}}_{0 : L}^{H} {\tilde{T}}_{k_{1}}^{H} {\tilde{T}}_{k_{2}} {\hat{W}}_{0 : L} = \frac{\bar{P}}{K} I_{L + 1} δ (k_{1} - k_{2}) . \end{array}

(D.9)

Case - k ₁ = k ₂ In order to understand the conditions that need to be imposed on the structure of ${\tilde{T}}_{k}$ , we examine an arbitrary element of R _k,k. From (D.9), we notice that

\begin{array}{l} {[R_{k, k}]}_{l_{1}, l_{2}} & = \sum_{v = 0}^{V - 1} | {\tilde{T}}_{k} [v, v] |^{2} exp {- j 2 π l_{v} (l_{2} - l_{1}) / P} \\ = \frac{P_{t}}{K} δ (l_{1} - l_{2}) . \end{array}

(D.10)

It can be verified that the above expression is true under three conditions:

(C6) P = V S where $S \in Z^{+}$ .

(C7) {l _v}, the index set of subcarriers that carry pilot symbols are chosen such that l _v = l _s + v S for l _s ∈ [0, S − 1] and 0 ≤ v ≤ V − 1.

(C8) The pilot tones are all equally powered so that $| {\tilde{T}}_{k} [v, v] |^{2} = \frac{P_{t}}{KV}$ .

The above conditions indicate that the pilot symbols used for channel estimation must be equispaced in the subcarrier domain and equipowered. Due to (C8), we see that ${\tilde{T}}_{k}^{*} {\tilde{T}}_{k} = \frac{P_{t}}{KV} I_{V}$ . Combined with the fact that ${\hat{W}}_{0 : L}^{H} {\hat{W}}_{0 : L} = V I_{L + 1}$ , we see that $R_{k, k} = \frac{P_{t}}{K} I_{L + 1}$ . We now see that when K = 1, the following pilot sequence design:

\begin{align} {\tilde{T}}_{k} & = ϕ (V) \\ = \sqrt{\frac{P_{t}}{KV}} {[exp (j ϕ_{0}) exp (j ϕ_{1}) \dots exp (j ϕ_{V - 1})]}^{T}, \end{align}

(D.11)

meets conditions (C6), (C7), and (C8) that mandate the usage of equipowered, equispaced pilots at each transmit antenna. In (D.11), {ϕ _v} are any arbitrary values in [− π, π].

Case - k ₁ ≠ k ₂ We now incorporate the consequences of imposing the condition, $R_{k_{1}, k_{2}} = 0$ when k ₁ ≠ k ₂, in (D.11). We again utilize (D.9) and apply (C7). We see that

\begin{align} {[R_{k_{1}, k_{2}}]}_{l_{1}, l_{2}} & = exp {- j 2 π l_{s} (l_{1} - l_{2}) / P} \\ \times \sum_{v = 0}^{V - 1} {\tilde{T}}_{k_{1}}^{*} [v, v] {\tilde{T}}_{k_{2}} [v, v] \\ \times exp {- j 2 π vS (l_{1} - l_{2}) / P}, \end{align}

(D.12)

which equals zero when

\begin{array}{l} \sum_{v = 0}^{V - 1} {\tilde{T}}_{k_{1}}^{*} [v, v] {\tilde{T}}_{k_{2}} [v, v] exp {- j 2 π vS (l_{1} - l_{2}) / P} = 0 \\ \Leftrightarrow & \sum_{v = 0}^{V - 1} {\tilde{T}}_{k_{1}}^{*} [v, v] {\tilde{T}}_{k_{2}} [v, v] exp {- j 2 π v (l_{1} - l_{2}) / V} = 0, \end{array}

(D.13)

∀ 1 ≤ k ₁, k ₂ ≤ K and ∀ (l ₁ − l ₂) ∈ [0, ±1 … ± L]. This condition clearly suggests that in the frequency-domain, phase-shift orthogonality is required between the pilot sequences of different transmit antennas over the range, (l ₁ − l ₂) ∈ [0, ±1 … ± L]. Equivalently, this translates to circular shift orthogonality in the time-domain. We now define a diagonal V × V dimensional phase-shifting matrix, $E_{k}$ as shown below:

\begin{array}{lcr} {[E_{k}]}_{v, v} = exp [\frac{j 2 πv f_{k}}{V}], \end{array}

(D.14)

∀ 0 ≤ v ≤ V − 1, 1 ≤ k ≤ K and design the pilot sequence such that

\begin{array}{lcr} {\tilde{T}}_{k} = E_{k} ϕ (V) . \end{array}

(D.15)

As a result of substituting (D.14) in (D.13), we see that

\begin{array}{lcr} \sum_{v = 0}^{V - 1} exp {- j 2 π v (f_{k_{1}} - f_{k_{2}} + l_{1} - l_{2}) / V} = 0 . \end{array}

(D.16)

We selected $E_{k}$ as shown in (D.14) so that we can exploit the property of summation of the roots of unity. In order to do so, we require that the term, $(f_{k_{1}} - f_{k_{2}} + l_{1} - l_{2})$ be a non integer-multiple of V. So, we choose f _k = k(V − L − 1). In conclusion, the training design shown in (32) meets not only conditions (C6), (C7), and (C8) but also (D.13) so that phase-shift orthogonality is maintained between the pilot sequences of any pair of transmit antennas.

Proof of Theorem 4

By denoting the entropy using (.) and applying the definition of mutual information, we can write the following expression,

I ({\tilde{y}}_{n}^{(dt)}; {\tilde{x}}_{n} | {\tilde{H}}_{n}) = H ({\tilde{x}}_{n} | {\tilde{H}}_{n}) - H ({\tilde{x}}_{n} | {\tilde{H}}_{n}, {\tilde{y}}_{n}^{(dt)}),

(E.1)

for the system model in the training phase. In the above equation, $H ({\tilde{x}}_{n} | {\tilde{H}}_{n})$ is maximized when ${\tilde{x}}_{n}$ is Gaussian. Hence, with $R_{{\tilde{x}}_{n}} = P_{dt} I_{KM}$ ,

H ({\tilde{x}}_{n} | {\tilde{H}}_{n}) = log det (π e P_{dt} I_{KM}) .

(E.2)

On the other hand,

\begin{array}{lcr} H ({\tilde{x}}_{n} | {\tilde{H}}_{n}, {\tilde{y}}_{n}^{(dt)}) = log det (π e R_{{\tilde{x}}_{n} | {\tilde{y}}_{n}^{(dt)}, {\tilde{H}}_{n}}), \end{array}

(E.3)

where $R_{{\tilde{x}}_{n} | {\tilde{y}}_{n}^{(dt)}, {\tilde{H}}_{n}} = {((1 / P_{dt}) I_{KM} + (1 / σ_{z}^{2}) {\tilde{H}}_{n}^{H} {\tilde{H}}_{n})}^{- 1}$ . In writing the covariance matrix of the random vector, ${\tilde{x}}_{n}$ conditioned on ${\tilde{y}}_{n}^{(dt)}$ and ${\tilde{H}}_{n}$ , we utilize the fact that ${\tilde{z}}_{n}$ is Gaussian distributed. Consequently,

\begin{array}{lcr} I ({\tilde{y}}_{n}^{(dt)}; {\tilde{x}}_{n} | {\tilde{H}}_{n}) = log det (I_{KM} + \frac{P_{dt}}{σ_{z}^{2}} {\tilde{H}}_{n} {\tilde{H}}_{n}^{H}) . \end{array}

(E.4)

By a similar approach, we can show that

\begin{array}{lcr} I ({\tilde{r}}_{n}; {\tilde{s}}_{n} | {\bar{H}}_{n}) = log det (I_{KP} + \frac{P_{d}}{σ_{z}^{2}} {\bar{H}}_{n} {\bar{H}}_{n}^{H}), \end{array}

(E.5)

in the data transmission phase. Due to the fact that the mutual information between the transmitted and estimated data vectors is independent of the block index, we represent the channel capacity upper bound by choosing the mutual information with respect to an arbitrary block indices as shown in (41). It can also be observed that we have included the appropriate normalization factor since only M out of P subcarriers in each OFDM symbol in the training phase, carry information symbols.

Proof of Theorem 5

We again apply the definition of mutual information and write the expression,

I ({\tilde{y}}_{n}^{(dt)}; {\tilde{x}}_{n} | {\hat{\tilde{H}}}_{n}) = H ({\tilde{x}}_{n} | {\hat{\tilde{H}}}_{n}) - H ({\tilde{x}}_{n} | {\hat{\tilde{H}}}_{n}, {\tilde{y}}_{n}^{(dt)}),

(F.1)

for the system model in the training phase. Similar to (E.2), we see that

\begin{array}{lcr} H ({\tilde{x}}_{n} | {\hat{\tilde{H}}}_{n}) = log det (π e P_{dt} I_{KM}), \end{array}

(F.2)

whereas,

\begin{array}{lcr} H ({\tilde{x}}_{n} | {\hat{\tilde{H}}}_{n}, {\tilde{y}}_{n}^{(dt)}) \leq log det (π e R_{{\tilde{x}}_{n} | {\tilde{y}}_{n}^{(dt)}, {\hat{\tilde{H}}}_{n}}), \end{array}

(F.3)

with equality if and only if ${\tilde{x}}_{n}$ given ${\tilde{y}}_{n}^{(dt)}$ and ${\hat{\tilde{H}}}_{n}$ is drawn from a Gaussian distribution with the covariance matrix, $R_{{\tilde{x}}_{n} | {\tilde{y}}_{n}^{(dt)}, {\hat{\tilde{H}}}_{n}}$ . We can now formulate an expression for $R_{{\tilde{x}}_{n} | {\tilde{y}}_{n}^{(dt)}, {\hat{\tilde{H}}}_{n}}$ by assuming that LMMSE estimator^b has been used to estimate ${\tilde{x}}_{n}$ :

\begin{array}{lcr} R_{{\tilde{x}}_{n} | {\tilde{y}}_{n}^{(dt)}, {\hat{\tilde{H}}}_{n}} = R_{{\tilde{x}}_{n}} - R_{{\tilde{x}}_{n}, {\tilde{y}}_{n}^{(dt)}} R_{{\tilde{y}}_{n}^{(dt)}} R_{{\tilde{y}}_{n}^{(dt)}, {\tilde{x}}_{n}}, \end{array}

(F.4)

where

\begin{align} R_{{\tilde{x}}_{n}, {\tilde{y}}_{n}^{(dt)}} & ≜ E {{\tilde{x}}_{n} {\tilde{y}}_{n}^{(dt) H} | {\hat{\tilde{H}}}_{n}} = P_{dt} {\hat{\tilde{H}}}_{n}^{H}, \end{align}

(F.5a)

\begin{align} R_{{\tilde{y}}_{n}^{(dt)}} & ≜ E {{\tilde{y}}_{n}^{(dt)} {\tilde{y}}_{n}^{(dt) H} | {\hat{\tilde{H}}}_{n}} \\ = P_{dt} {\hat{\tilde{H}}}_{n} {\hat{\tilde{H}}}_{n}^{H} + σ_{z^{(dt)}}^{2} (n) I_{KM} . \end{align}

(F.5b)

In (F.5a) and (F.5b), we have utilized the orthogonality property of LMMSE estimation and eliminated the covariance terms between the information symbol vector, ${\tilde{x}}_{n}$ and the effective noise vector, ${\tilde{z}}_{n}^{(dt)}$ . By substituting (F.5a) and (F.5b) in (F.4), we see that

\begin{array}{lcr} R_{{\tilde{x}}_{n} | {\tilde{y}}_{n}^{(dt)}, {\hat{\tilde{H}}}_{n}} & = P_{dt} I_{KM} - P_{dt}^{2} {\hat{\tilde{H}}}_{n}^{H} \\ \times {(P_{dt} {\hat{\tilde{H}}}_{n} {\hat{\tilde{H}}}_{n}^{H} + σ_{z^{(dt)}}^{2} (n) I_{KM})}^{- 1} {\hat{\tilde{H}}}_{n} \\ = {((1 / P_{dt}) I_{KM} + σ_{z^{(dt)}}^{2} (n) {\hat{\tilde{H}}}_{n}^{H} {\hat{\tilde{H}}}_{n})}^{- 1} . \end{array}

(F.6)

Finally, from (F.2), (F.3) and (F.6), we have,

\begin{align} I ({\tilde{y}}_{n}^{(dt)}; {\tilde{x}}_{n} | {\hat{\tilde{H}}}_{n}) \geq log det (I_{KM} + P_{dt} σ_{z^{(dt)}}^{2} (n) {\hat{\tilde{H}}}_{n}^{H} {\hat{\tilde{H}}}_{n}) . \end{align}

(F.7)

By a similar approach, we can show that

\begin{align} I ({\tilde{r}}_{n}; {\tilde{s}}_{n} | {\hat{\bar{H}}}_{n}) \geq log det (I_{KP} + P_{d} σ_{z^{(d)}}^{2} (n) {\hat{\bar{H}}}_{n}^{H} {\hat{\bar{H}}}_{n}), \end{align}

(F.8)

in the data transmission phase. Therefore, the channel capacity lower bound is obtained by combining (F.7) and (F.8) as shown in (49).

Endnotes

^a (A + B C D)^− 1 = A ^− 1− A ⁻¹ B(C ^− 1 + D A ⁻¹ B)^− 1 D A ⁻¹.

^bNotice that when ${\tilde{z}}_{n}^{(dt)}$ is Gaussian, the LMMSE estimator coincides with an MMSE estimator.

References

Foschini GJ: Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas. Bell Labs Tech. J 1996, 1(2):41-59.
Article Google Scholar
Merhav N, Kaplan G, Lapidoth A, Shamai Shitz S: On information rates for mismatched decoders. IEEE Trans. Inf. Theory 1994, 40: 1953-1967. 10.1109/18.340469
Article Google Scholar
Lapidoth A, Narayan P: Reliable communication under channel uncertainty. IEEE Trans. Inf. Theory 1998, 44: 2148-2177. 10.1109/18.720535
Article MathSciNet Google Scholar
Medard M: The effect upon channel capacity in wireless communications of perfect and imperfect knowledge of the channel. IEEE Trans. Inf. Theory 2000, 46: 933-946. 10.1109/18.841172
Article Google Scholar
Lapidoth A, Shamai S: Fading channels: how perfect need “perfect side information” be? Theory 2002, 48: 1118-1134.
MathSciNet Google Scholar
Yang J, Roy S: On joint transmitter and receiver optimization for multiple-input-multiple-output (MIMO) transmission systems. IEEE Trans. Commun 1994, 42: 3221-3231. 10.1109/26.339844
Article Google Scholar
Scaglione A, Giannakis G, Barbarossa S: Redundant filterbank precoders and equalizers. Parts I & II. IEEE Trans. Signal Process 1999, 47: 1988-2006. 10.1109/78.771047
Article Google Scholar
Palomar D, Lagunas M, Cioffi J: Optimum linear joint transmit-receive processing for MIMO channels with QoS constraints. IEEE Trans. Signal Process 2004, 52: 1179-1197. 10.1109/TSP.2004.826164
Article MathSciNet Google Scholar
Tong L, Sadler B, Dong M: Pilot-assisted wireless transmissions: general model, design criteria, and signal processing. IEEE Signal Process Mag 2004, 21: 12-25. 10.1109/MSP.2004.1359139
Article Google Scholar
Negi R, Cioffi J: Pilot tone selection for channel estimation in a mobile OFDM system. IEEE Trans. Consum. Electron 1998, 44: 1122-1128. 10.1109/30.713244
Article Google Scholar
Adireddy S, Tong L, Viswanathan H: Optimal placement of training for frequency-selective block-fading channels. IEEE Trans. Inf. Theory 2002, 48: 2338-2353. 10.1109/TIT.2002.800466
Article MathSciNet Google Scholar
Dong M, Tong L: Optimal design and placement of pilot symbols for channel estimation. IEEE TransSignal Process 2002, 50: 3055-3069. 10.1109/TSP.2002.805504
Article Google Scholar
Ohno S, Giannakis G: Optimal training and redundant precoding for block transmissions with application to wireless OFDM. IEEE Trans. Commun 2002, 50: 2113-2123. 10.1109/TCOMM.2002.806547
Article Google Scholar
Manton J, Mareels I, Hua Y: Affine precoders for reliable communications. Proceedings of the 25th IEEE International COnference on Acoustics Speech, and Signal Processing, 2000 (ICASSP ’00) 2000, 2749-2752.
Google Scholar
Ohno S, Giannakis G: Capacity maximizing mmse-optimal pilots for wireless OFDM over frequency-selective block Rayleigh-fading channels. IEEE Trans. Inf Theory 2004, 50: 2138-2145. 10.1109/TIT.2004.833365
Article MathSciNet Google Scholar
Ma X, Yang L, Giannakis G: Optimal training for MIMO frequency-selective fading channels. IEEE Trans. Wirel. Commun 2005, 4: 453-466.
Article Google Scholar
Baltersee J, Fock G, Meyr H: Achievable rate of MIMO channels with data-aided channel estimation and perfect interleaving. IEEE J. Sel. Areas Commun 2001, 19: 2358-2368. 10.1109/49.974602
Article Google Scholar
Hassibi B, Hochwald B: How much training is needed in multiple-antenna wireless links? IEEE Trans. Inf. Theory 2003, 49: 951-963. 10.1109/TIT.2003.809594
Article MathSciNet Google Scholar
Samardzija D, Mandayam N: Pilot-assisted estimation of MIMO fading channel response and achievable data rates. IEEE Trans. Signal Process. 2003, 51: 2882-2890. 10.1109/TSP.2003.818158
Article MathSciNet Google Scholar
Vosoughi A, Scaglione A: On the effect of receiver estimation error upon channel mutual information. IEEE Trans. Signal Process 2006, 54: 459-472.
Article Google Scholar
Coldrey M, Bohlin P: Training-based MIMO systems–Part, I: performance comparison. IEEE Trans. Signal Process 2007, 55: 5464-5476.
Article MathSciNet Google Scholar
Ding M, Blostein S: Maximum mutual information design for MIMO systems with imperfect channel knowledge. IEEE Trans. Inf. Theory 2010, 56: 4793-4801.
Article MathSciNet Google Scholar
Vosoughi A, Scaglione A: Everything you always wanted to know about training: guidelines derived using the afine precoding framework and the CRB. IEEE Trans. Signal Process 2006, 54: 940-954.
Article Google Scholar
Barhumi I, Leus G, Moonen M: Optimal training design for MIMO OFDM systems in mobile wireless channels. IEEE Trans. Signal Process 2003, 51: 1615-1624. 10.1109/TSP.2003.811243
Article Google Scholar
Dong M, Tong L, Sadler B: Optimal pilot placement for channel tracking in OFDM. IEEE Military Communications Conference 2002 2002, 602-606.
Google Scholar
Giannakis G: Filterbanks for blind channel identification and equalization. IEEE Signal Process. Lett 1997, 4: 184-187.
Article Google Scholar
Stojanovic M, Proakis J, Catipovic J: Analysis of the impact of channel estimation errors on the performance of a decision-feedback equalizer in fading multipath channels. IEEE Trans. Commun 1995, 43: 877-886. 10.1109/26.380120
Article Google Scholar
Komninakis C, Fragouli C, Sayed A, Wesel R: Multi-input multi-output fading channel tracking and equalization using Kalman estimation. IEEE Trans. Signal Process 2002, 50: 1065-1076. 10.1109/78.995063
Article Google Scholar
Telatar IE: Capacity of multi-antenna gaussian channels. Eur. Trans. Telecommun. 1999, 10: 585-595. 10.1002/ett.4460100604
Article Google Scholar
Schulze H, Lueders C: Theory and Applications of OFDM and CDMA: Wideband Wireless Communications. John Hoboken; 2005.
Book Google Scholar
Kay SM: Fundamentals of Statistical Signal Processing, Vol. I-Estimation Theory. Englewood Cliffs, NJ: Prentice Hall; 1993.
Google Scholar
Vaughan D: A nonrecursive algebraic solution for the discrete Riccati equation. IEEE Trans. Autom. Control 1970, 15: 597-599. 10.1109/TAC.1970.1099549
Article Google Scholar
Laub A: A Schur method for solving algebraic Riccati equations. IEEE Trans. Autom. Control 1979, 24: 913-921. 10.1109/TAC.1979.1102178
Article MathSciNet Google Scholar
Leland R: An alternate calculation of the discrete-time Kalman filter gain and Riccati equation solution. IEEE Trans. Autom. Control 1996, 41: 1817-1819. 10.1109/9.545748
Article MathSciNet Google Scholar
Minn H, Al-Dhahir N: Optimal training signals for MIMO OFDM channel estimation. IEEE Trans. Wirel. Commun 2006, 5: 1158-1168.
Article Google Scholar
Dong M, Tong L, Sadler B: Optimal insertion of pilot symbols for transmissions over time-varying flat fading channels. IEEE Trans. Signal Process. 2004, 52: 1403-1418. 10.1109/TSP.2004.826182
Article MathSciNet Google Scholar
Kay S: Modern Spectral Estimation: Theory and Application. Upper Saddle River: Prentice-Hall; 1999.
Google Scholar
Stoica P, Moses RL: Spectral Analysis of Signals. Upper Saddle River: Prentice-Hall; 2005.
Google Scholar
Van Trees HL: Detection, Estimation, and Modulation Theory, Part I. USA: Wiley-Interscience; 2001.
Book Google Scholar
Hjørungnes A, Gesbert D: Complex-valued matrix differentiation: techniques and key results. IEEE Trans. Signal Process 2007, 55: 2740-2746.
Article MathSciNet Google Scholar
Hjørungnes A, Gesbert D, Palomar D: Unified theory of complex-valued matrix differentiation. IEEE International Conference on Acoustics, Speech and Signal Processing 2007, 345-348.
Google Scholar

Download references

Acknowledgements

This study was supported by the Office of Naval Research (ONR) via grant N00014-10-1-0065.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, Worcester, MA, 01609, USA
Srikanth Pagadarai & Alexander M Wyglinski
Department of Electrical Engineering, United States Naval Academy, Annapolis, MD, 21402, USA
Christopher R Anderson

Authors

Srikanth Pagadarai
View author publications
You can also search for this author in PubMed Google Scholar
Alexander M Wyglinski
View author publications
You can also search for this author in PubMed Google Scholar
Christopher R Anderson
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Srikanth Pagadarai.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Pagadarai, S., Wyglinski, A.M. & Anderson, C.R. Low-mobility channel tracking for MIMO–OFDM communication systems. EURASIP J. Adv. Signal Process. 2013, 78 (2013). https://doi.org/10.1186/1687-6180-2013-78

Download citation

Received: 02 February 2012
Accepted: 11 February 2013
Published: 15 April 2013
DOI: https://doi.org/10.1186/1687-6180-2013-78

Low-mobility channel tracking for MIMO–OFDM communication systems

Abstract

Introduction

System model

Training phase

Data transmission phase

Decoupled channel and symbol estimation

Strategy: Bayesian FIM maximization

Theorem 1.

OFDM with FDM training: an orthogonal affine precoder scheme

Training phase

Data transmission phase

Blockwise Kalman tracking

Kalman filter recursion

Lemma 1

Capacity bounds with sequential MMSE channel estimation

Upper bound on the channel capacity

Lower bound on the channel capacity

Training phase

Data transmission phase

Simulation results

Performance evaluation of optimal training designs over non-time-varying wireless channels

Comparison of the MIMO channel estimator and BCRB

MMSE estimation of information symbols

Kalman tracking of time-varying wireless channels

Steady-state MMSE of the channel estimator

MMSE estimation of information symbols due to Kalman channel tracking

Capacity bounds

Conclusion

Appendix 1

Modeling time-variations in low-mobility wireless channels

Proof of Theorem 1

Proof of Theorem 2

Proof of Lemma 1

Optimal training design

Proof of Theorem 4

Proof of Theorem 5

Endnotes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors’ original submitted files for images

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Rights and permissions

About this article

Cite this article

Share this article

Keywords