Optimal Training for Time-Selective Wireless Fading Channels Using Cutoff Rate

We consider the optimal allocation of resources (power and bandwidth) between training and data transmissions for single-user time-selective Rayleigh flat-fading channels under the cutoff rate criterion. The transmitter exploits statistical channel state information (CSI) in the form of the channel Doppler spectrum to embed pilot symbols into the transmission stream. At the receiver, instantaneous, though imperfect, CSI is acquired through minimum mean-square estimation of the channel based on some set of pilot observations. We compute the ergodic cutoff rate for this scenario. Assuming estimator-based interleaving and M-PSK inputs, we study two special cases in depth. First, we derive the optimal resource allocation for the Gauss-Markov correlation model. Next, we validate and refine these insights by studying resource allocation for the Jakes model. Copyright © 2006 Saswat Misra et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


INTRODUCTION
In wireless communications employing coherent detection, imperfect knowledge of the fading channel state imposes limits on the achievable performance as measured by, for example, the mutual information, the bit-error rate (BER), or the minimum mean-square error (MMSE). Typically, a fraction of system resources (bandwidth and energy) is devoted to channel estimation techniques (known as training) which improve knowledge of the channel state. Such schemes give rise to a tradeoff between the allocation of limited resources to training on one hand and data on the other, and it is natural to seek the optimal allocation of resources between these conflicting requirements. Such optimization is of particular interest for rapidly varying channels, where the energy and bandwidth savings of an optimized scheme can be significant.
In this context, pilot symbol assisted modulation (PSAM) [1, 2] has emerged as a viable and robust training technique for rapidly varying fading channels. In PSAM, known pilot symbols are multiplexed with data symbols for transmission through the communications channel. At the receiver, knowledge of these pilots is used to form channel estimates, which aid the detection of the data both directly (by modifying the detection rule based on the channel estimate) and indirectly (e.g., by allowing for estimator-directed modulation, power control, and media access). PSAM has been incorporated into standards for IEEE 802.11, Global System for Mobile Communication (GSM), Wideband Code-Division Multiple-Access (WCDMA), and military protocols, and many theoretical issues are now being addressed. For example, optimized approaches to PSAM have recently been studied from the perspectives of frequency and timing offset estimation [3, 4], BER [1, 5-7], and the channel capacity or its bounds [8-11].
Most relevant to the current study are [12-14], each of which considers PSAM design for the continuously time-varying single-input single-output (SISO) time-selective Rayleigh flat-fading channel, under capacity or its bounds. In each work, the transmitter is assumed to have knowledge of the Doppler spectrum, and the receiver makes (instantaneous) MMSE estimates of the channel based on some subset of the pilot observations. In [13], three estimators (of varying complexity) are proposed and used to predict the channel state for a Gauss-Markov channel correlation model. The optimal binary inputs based on the SNR and estimator statistics are used, and it is determined that for sufficiently correlated channels (i.e., slow enough fading), PSAM provides significant gains in the achievable rates over the no-pilot approach. Analysis was carried out through numerical simulation, and the optimization of energy between pilot and data symbols was not attempted. In [12, 14], the authors assume a bandlimited Doppler spectrum and derive closed-form bounds on the channel capacity, using the estimator that exploits all past and future pilot observations. In both works the capacity and/or its bounds are seen to be parameterized by the variance of this channel estimator. Closed-form results are derived for the optimal allocation of training and bandwidth in some cases.
Here, we study optimal PSAM design for the SISO time-selective Rayleigh flat-fading channel under the cutoff rate. The cutoff rate is a lower bound on the channel capacity and provides an upper bound on the probability of block decoding error (by bounding the random coding exponent). It has been used to establish practical limits on coded performance under complexity constraints [15], and can often be evaluated in closed form when the capacity cannot (an overview of the cutoff rate for fading channels can be found in [16]). The cutoff rate with perfect receiver channel state information (CSI) has been examined in [17] (independent fading) and in [18, 19] (temporally correlated fading), and for no-CSI multiple-input multiple-output (MIMO) systems in [20]. However, we are not aware of any work in which PSAM design is considered from the cutoff rate perspective. Assuming M-PSK inputs, and a general class of MMSE estimators in which some subsets of past and future pilots are exploited at the receiver, we derive a simple expression for the interleaved cutoff rate that will be seen to facilitate analysis.
This paper is organized as follows. In Section 2 we specify the system model and derive the corresponding cutoff rate using M-PSK inputs. In Section 3 we study optimal training for the special case of the Gauss-Markov correlation model. Closed-form expressions for the optimal energy and bandwidth allocation follow in some cases. In Section 4 we validate and refine the design paradigms gained in Section 3 by studying optimal training for the well-known (though less tractable) Jakes correlation model. In Section 5 we summarize our guidelines for PSAM design in rapidly fading channels, and propose future work.
Notation. We use the following (standard) notation: (a) x ∼ CN (µ, Σ) denotes a complex Gaussian random vector x with mean µ and with independent real and imaginary parts, each having covariance matrix Σ/2, (b) E X [·] is expectation with respect to the random variable X (the subscript X is omitted where obvious), (c) superscripts " * ," "t," and "H" denote complex conjugation, transposition, and conjugate transposition, (d) I N is the N ×N identity matrix, and (e) |a| denotes the absolute value of the scalar a, |A| denotes the determinant of the matrix A, and |A| denotes the cardinality of the set A (the context will make use of | · | clear in each case).

SYSTEM MODEL
We review the channel model and PSAM-based training scheme, discuss the transmission of a codeword, and evaluate the cutoff rate.

Channel model
We consider single-user communications over a time-selective (i.e., temporally correlated) Rayleigh flat-fading channel. The sampled baseband received signal $y_k$ (assuming perfect timing) is given by the scalar observation equation
$$y_k = \sqrt{E_k}\, h_k s_k + n_k, \tag{1}$$
where $k$ denotes discrete time, $s_k \in \mathcal{S}_M \triangleq \{e^{-j2\pi\nu/M}\}_{\nu=0}^{M-1}$ represents the M-PSK input, $E_k$ is the energy in the $k$th transmission slot, $h_k \sim \mathcal{CN}(0, \sigma_h^2)$ models fading, and $n_k \sim \mathcal{CN}(0, \sigma_n^2)$ models additive white Gaussian noise (AWGN). We define the normalized channel correlation function
$$R_h(\tau) \triangleq \frac{1}{\sigma_h^2} E\left[h_{k+\tau} h_k^*\right]. \tag{2}$$

Pilot symbol assisted modulation
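In PSAM, the transmitter embeds known pilot symbols into the transmission stream. We consider periodic PSAM in which pilots are embedded with period $T$, so that $s_k = +1$ at times $k = mT$ ($m = 0, \pm 1, \ldots$). Because the allocation of energy to training versus data symbols entails a tradeoff, we allow a different energy level for each. Define
$$E_k \triangleq \begin{cases} E_P, & k = mT, \\ E_D, & \text{otherwise}, \end{cases} \tag{3}$$
where $E_P$ is the pilot symbol energy and $E_D$ the data symbol energy (footnote 1). We define the received SNR in the pilot and data slots as
$$\kappa_P \triangleq \frac{\sigma_h^2 E_P}{\sigma_n^2}, \qquad \kappa_D \triangleq \frac{\sigma_h^2 E_D}{\sigma_n^2}. \tag{4}$$
In each time slot $k = mT + \ell$ ($m = 0, \pm 1, \ldots$; $0 \le \ell \le T-1$), an MMSE (i.e., conditional-mean) estimate of the channel is made at the receiver using a selection of past, current, and future pilot symbol observations. Specifically, the estimate at the $\ell$th lag from the most recent pilot is
$$\hat{h}_{mT+\ell} = E\left[h_{mT+\ell} \mid \{y_{(m+n)T}\}_{n \in \mathcal{N}}\right], \tag{5}$$
where $\mathcal{N} \subseteq \mathbb{Z}$ is the subset of pilot indices used by the estimator (footnote 2). The cardinality $|\mathcal{N}|$ denotes the number of pilots used for estimation. Since $h_{mT+\ell}$ and $\{y_{(m+n)T}\}_{n \in \mathcal{N}}$ are jointly Gaussian, the MMSE estimate of (5) is linear in the pilot observations, and therefore also Gaussian. We get [22, pages 508-509]
$$\hat{h}_{mT+\ell} = C_{h_\ell y} C_{yy}^{-1} y, \tag{6}$$
where $C_{h_\ell y}$ is the $1 \times |\mathcal{N}|$ correlation vector between the estimate and observation, $C_{yy}$ the $|\mathcal{N}| \times |\mathcal{N}|$ observation covariance matrix, and $y$ the $|\mathcal{N}| \times 1$ observation vector, whose elements in the $i$th row and $j$th column are given by ($1 \le i, j \le |\mathcal{N}|$)
$$(C_{h_\ell y})_{1,j} = \sqrt{E_P}\,\sigma_h^2 R_h(\ell - N_j T), \qquad (C_{yy})_{i,j} = E_P \sigma_h^2 R_h\big((N_i - N_j)T\big) + \sigma_n^2 \delta(i-j), \qquad (y)_{i,1} = y_{(m+N_i)T},$$
where $N_v$ denotes the $v$th smallest element in $\mathcal{N}$ ($v = 1, \ldots, |\mathcal{N}|$), and where $\delta(\cdot)$ is the Kronecker delta. We will find it useful to write the first two of these expressions in the form
$$C_{h_\ell y} = \sqrt{E_P}\,\sigma_h^2\, r_\ell^H, \qquad C_{yy} = \sigma_n^2\left(\kappa_P R_{hh} + I_{|\mathcal{N}|}\right),$$
where $r_\ell$ is the $|\mathcal{N}| \times 1$ vector with $j$th element $R_h^*(\ell - N_j T)$ and $R_{hh}$ is the $|\mathcal{N}| \times |\mathcal{N}|$ matrix with $(i,j)$th element $R_h((N_i - N_j)T)$. The estimate of (6) and the estimation error $\tilde{h}_{mT+\ell} \triangleq h_{mT+\ell} - \hat{h}_{mT+\ell}$ are independent (by application of the orthogonality principle), and it follows that $\hat{h}_{mT+\ell} \sim \mathcal{CN}(0, \sigma_{\hat{h}_\ell}^2)$ and $\tilde{h}_{mT+\ell} \sim \mathcal{CN}(0, \sigma_h^2 - \sigma_{\hat{h}_\ell}^2)$, where $\sigma_{\hat{h}_\ell}^2 = C_{h_\ell y} C_{yy}^{-1} C_{h_\ell y}^H$ is the estimator variance $\ell$ positions from the most recent pilot. The performance of a particular estimator will be characterized by the normalized estimator variance, termed the CSI quality, and defined as
$$\omega_\ell \triangleq \frac{\sigma_{\hat{h}_\ell}^2}{\sigma_h^2} = \frac{1}{\sigma_h^2} C_{h_\ell y} C_{yy}^{-1} C_{h_\ell y}^H = \kappa_P\, r_\ell^H \left(\kappa_P R_{hh} + I_{|\mathcal{N}|}\right)^{-1} r_\ell. \tag{10}$$
Note that $\omega_\ell$ is not a function of $m$ (we assume steady-state estimation), and that $\omega_\ell = 0$ denotes no CSI, while $\omega_\ell = 1$ denotes perfect CSI. It is assumed throughout that the transmitter has knowledge of $\omega_\ell$, the statistical quality of channel estimates, but not the instantaneous values $\hat{h}_{mT+\ell}$. (For the transmitter to acquire knowledge of $\omega_\ell$ it must know the channel correlation $R_h(\tau)$, the estimation scheme $\mathcal{N}$, and the pilot SNR $\kappa_P$.) In the remainder of this paper we will consider two subclasses of estimators.
Footnote 1: The current two-dimensional energy allocation problem is easily extendable to a $T$-dimensional one, in which each of the $T-1$ data slots may be allocated a unique energy value. We report results from this approach in [21].
Footnote 2: Observations in the nonpilot slots could be used to further improve the channel estimate, as is done in semiblind estimators.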
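The CSI quality in (10) can be evaluated numerically for any pilot subset. The following is a minimal sketch, assuming a real-valued, symmetric correlation function (as holds for the two models studied later) and the quadratic form $\omega_\ell = \kappa_P\, r_\ell^H(\kappa_P R_{hh} + I)^{-1} r_\ell$; the function name `csi_quality` and its argument names are illustrative and do not come from the paper.

```python
import numpy as np

def csi_quality(corr, pilot_idx, ell, T, kappa_p):
    """Normalized estimator variance omega_ell for an MMSE estimator.

    corr      : real, even correlation function R_h(tau), vectorized over arrays
    pilot_idx : pilot indices n in the set N (0 = most recent pilot, <0 = past)
    ell       : lag from the most recent pilot, 0 <= ell <= T-1
    T         : pilot period
    kappa_p   : received pilot SNR
    """
    n = np.asarray(pilot_idx, dtype=float)
    r = corr(ell - n * T)                        # correlation of h_{mT+ell} with each pilot
    R = corr((n[:, None] - n[None, :]) * T)      # pilot-to-pilot correlation matrix R_hh
    A = kappa_p * R + np.eye(len(n))
    return float(kappa_p * r @ np.linalg.solve(A, r))

# Illustration with an exponential correlation alpha^|tau| (an assumed example).
alpha = 0.9
exp_corr = lambda tau: alpha ** np.abs(tau)
print(csi_quality(exp_corr, [0], 2, 4, 3.0))           # single most recent pilot
print(csi_quality(exp_corr, [-2, -1, 0], 2, 4, 3.0))   # three most recent pilots
```

With a single pilot at index 0 the expression collapses to $\kappa_P R_h(\ell)^2/(1+\kappa_P)$, which gives a quick sanity check of the matrix form.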

Transmission of a codeword
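The system transmits codewords of length $N(T-1)$, where $N > 0$ is a positive integer. Without loss of generality, consider the codeword that starts at time $k = 0$, denoted by
$$S \triangleq \mathrm{diag}\left(s_1, \ldots, s_{T-1}, s_{T+1}, \ldots, s_{NT-1}\right),$$
and let
$$h \triangleq [h_1, \ldots, h_{T-1}, h_{T+1}, \ldots, h_{NT-1}]^t, \quad \hat{h} \triangleq [\hat{h}_1, \ldots, \hat{h}_{T-1}, \hat{h}_{T+1}, \ldots, \hat{h}_{NT-1}]^t, \quad \tilde{h} \triangleq h - \hat{h}$$
denote the channel, the channel estimate, and the estimation error during the span of the codeword. We define normalized correlation matrices for the channel estimate and estimation error,
$$\hat{\Sigma} \triangleq \frac{1}{\sigma_h^2} E\left[\hat{h}\hat{h}^H\right], \qquad \tilde{\Sigma} \triangleq \frac{1}{\sigma_h^2} E\left[\tilde{h}\tilde{h}^H\right].$$
The observation of the codeword after transmission through the channel (9) is
$$y = \sqrt{E_D}\, S\left(\hat{h} + \tilde{h}\right) + n,$$
where $n \triangleq [n_1, \ldots, n_{T-1}, n_{T+1}, \ldots, n_{NT-1}]^t$ is the noise vector. Note that the diagonal elements of $\hat{\Sigma}$ and $\tilde{\Sigma}$ are $1_N \otimes [\omega_1, \ldots, \omega_{T-1}]$ and $1_N \otimes [1-\omega_1, \ldots, 1-\omega_{T-1}]$, respectively, where $1_N$ is a row vector of $N$ ones and where "$\otimes$" denotes the matrix Kronecker product.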

EURASIP Journal on Applied Signal Processing
The receiver employs the maximum likelihood (ML) detector, which regards $S$ as the channel input and the pair $(y, \hat{h})$ as the channel output. Among all possible input symbol sequences for $S$, denoted by $\mathcal{S}$, the detector chooses the sequence that maximizes the likelihood of the output, that is,
$$\hat{S} = \arg\max_{S \in \mathcal{S}} P(y, \hat{h} \mid S), \tag{17}$$
where $P(\cdot, \cdot \mid \cdot)$ is the probability density function (PDF) of the channel outputs, conditioned on the channel input. Noting that $P(y, \hat{h} \mid S) = P(y \mid S, \hat{h})P(\hat{h})$ and using standard simplifications under Gaussian statistics, (17) reduces to the detection rule (18).

Cutoff rate
The cutoff rate, measured in bits per channel use, is given by (19) [23, 24] (see [18] for time-selective fading channels with perfect receiver CSI), in which $Q(\cdot)$ is the probability of transmitting a particular codeword. (The normalization factor is $1/(NT)$, rather than the reciprocal of the codeword length $N(T-1)$, to account for the information loss in pilot slots.) The cutoff rate is evaluated in the appendix, and the result is given in (20). Equation (20) is seen to match [18, equation (14)] for the special case of perfect channel estimation (i.e., $\hat{\Sigma} = I$ and $\tilde{\Sigma} = 0$). Equation (20) can be used to determine optimal PSAM parameters and the resulting cutoff rate; however, the ensuing analysis would be largely based on numerical techniques. In the remainder of this paper, we focus on more tractable approaches to an analysis of optimal PSAM.

Interleaving
An interleaving-deinterleaving pair [25, pages 468-469] is an integral component of many wireless communications systems. A common assumption is that of infinite depth (i.e., perfect) interleaving, in which the correlation between channel fades at any two symbols within a codeword is completely removed. For example, this assumption has been used to study the cutoff rate of the time-selective fading channel with perfect CSI in [18]. Although interleaving discards information on the channel correlation, such a step is necessary in practice since most channel codes in use have been designed for independently fading channels. (The effect of interleaving on the cutoff rate was studied in [19] for a class of block-interference channels with memory. It was shown that the cutoff rate is generally a decreasing function of the channel memory length, with or without channel state information (this represents a different behavior than known for channel capacity). An analysis of the effect of interleaving is complicated in our setting by the fact that both the estimated channel and effective noise term (consisting of the estimation error plus AWGN) are rendered memoryless sequences by the interleaver. Thus, there exist scenarios where interleaving may either increase or decrease $R_o$.) Since channel realizations occurring exactly $\ell$ ($1 \le \ell \le T-1$) slots from the last pilot have the same estimator statistic $\omega_\ell$, we assume that these slots are interleaved only among each other (preserving the marginal statistics of the channel estimate and error). Further, it is assumed that the interleaver uses a different interleaving scheme in each subchannel, so that the correlation between any two codeword symbols is zero. Perfect interleaving renders $\hat{\Sigma}$ and $\tilde{\Sigma}$ diagonal, so that
$$\hat{\Sigma} = I_N \otimes \mathrm{diag}(\omega_1, \ldots, \omega_{T-1}), \qquad \tilde{\Sigma} = I_N \otimes \mathrm{diag}(1-\omega_1, \ldots, 1-\omega_{T-1}). \tag{21}$$
Each of the matrices in (20) is now diagonal. The cutoff rate simplifies to the double sum over input pairs in (22), where $Q_\ell(\cdot)$ is the input probability distribution $\ell$ slots from the last pilot ($1 \le \ell \le T-1$). The communications channel is symmetric in its input (M-PSK), and so the cutoff rate is maximized by the equiprobable distribution $Q_\ell(\cdot) = 1/M$. Evaluating the double sum and invoking the constant modulus property of M-PSK yields the closed-form expression (23). Equation (23) can be interpreted as follows: the $\ell$th term in its sum represents the cutoff rate of the $\ell$th data subchannel (conceptually consisting of all transmissions occurring $\ell$ slots after a pilot). Thus, (23) represents the cutoff rate of $T-1$ parallel subchannels, normalized by the factor $1/T$ to account for pilot transmissions. Because the temporal correlation of the channel is exploited for channel estimation before deinterleaving, the cutoff rate depends on the CSI quality $\{\omega_\ell\}_{\ell=0}^{T-1}$. If estimation is perfect ($\omega_\ell = 1$ for all $\ell$), (23) matches [18, equation (16)], as it must. Equation (23) represents the M-PSK cutoff rate under perfect interleaving for an arbitrary channel correlation $R_h(\tau)$, estimation scheme $\mathcal{N}$, and power and bandwidth allocation $(\kappa_P, \kappa_D, T)$. It is the basis for the subsequent analysis.
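To make the parallel-subchannel interpretation concrete, the sketch below evaluates an interleaved M-PSK cutoff rate of the form described above. The closed form coded here is an assumption, not a transcription of (23): it treats the estimation error plus AWGN as Gaussian effective noise, which gives an effective SNR $\gamma_\ell = \kappa_D\omega_\ell/(1+\kappa_D(1-\omega_\ell))$ in subchannel $\ell$ and a per-subchannel rate $\log_2 M - \log_2\big(1 + \sum_{\nu=1}^{M-1}(1+\gamma_\ell\sin^2(\pi\nu/M))^{-1}\big)$. It should be checked against (23) before being relied upon; the function name is illustrative.

```python
import numpy as np

def interleaved_cutoff_rate(omega, kappa_d, M, T):
    """Assumed form of the interleaved M-PSK cutoff rate (bits per channel use).

    omega   : CSI qualities omega_1..omega_{T-1} of the T-1 data subchannels
    kappa_d : received data SNR
    M       : PSK constellation size
    T       : pilot period (one pilot followed by T-1 data slots)
    """
    omega = np.asarray(omega, dtype=float)
    if omega.shape != (T - 1,):
        raise ValueError("omega must hold one value per data subchannel")
    # Effective SNR: estimated-channel power over (estimation error + AWGN) power.
    gamma = kappa_d * omega / (1.0 + kappa_d * (1.0 - omega))
    # Squared half-distances between PSK points: |s - s'|^2 / 4 = sin^2(pi*nu/M).
    s2 = np.sin(np.pi * np.arange(1, M) / M) ** 2
    pair_sum = np.sum(1.0 / (1.0 + gamma[:, None] * s2[None, :]), axis=1)
    per_subchannel = np.log2(M) - np.log2(1.0 + pair_sum)
    return float(per_subchannel.sum() / T)   # 1/T accounts for the pilot slot

print(interleaved_cutoff_rate([0.9, 0.8, 0.7], kappa_d=10.0, M=4, T=4))
```

Two limiting cases follow directly from this form: with no CSI ($\omega_\ell = 0$) the rate is zero, and with perfect CSI at very high SNR it approaches $((T-1)/T)\log_2 M$.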

OPTIMAL TRAINING FOR THE GAUSS-MARKOV MODEL
In this section we determine optimal PSAM parameters under energy and bandwidth constraints for the Gauss-Markov (GM) channel model, whose correlation is described by a first-order autoregressive (AR) process. It is known that second- and third-order AR models provide excellent fits to the Jakes model [26], but they are not as tractable. The GM model has previously been used to characterize the effect of imperfect channel knowledge on the performance of decision-feedback equalization [27], mutual information [28], and minimum mean-square estimation error [6] of time-selective fading channels. The correlation is given by
$$R_h(\tau) = \alpha^{|\tau|}, \tag{24}$$
where the parameter $\alpha$ is related to the normalized Doppler spread of the channel and is typically within the range $0.9 \le \alpha < 0.99$ [13, 28]. It will be seen that the GM model provides simple, closed-form, and intuitive expressions for the CSI quality of many estimators of interest (including those of infinite length) and leads to simple design rules for the optimal allocation of resources between training and data, motivating its study in this section.

Energy allocation
In one period of transmission, the total energy consumed is $\kappa_P + (T-1)\kappa_D$ (without ambiguity, we use received energies), and an energy constraint requires that
$$\kappa_P + (T-1)\kappa_D \le \kappa_{av} T, \tag{25}$$
where $\kappa_{av} > 0$ is the allowable average energy per transmission (averaged over pilots and data). The constraint will be met with equality since $R_o$ is increasing in both $\kappa_P$ and $\kappa_D$. We consider causal and noncausal estimators separately in the following.

(1) Causal estimation
For causal $(L, 0)$ estimators, it can be shown that the cutoff-rate-optimizing pilot energy $\kappa_P^*$ is the solution of a one-dimensional optimization problem, (26), that involves only the CSI quality in the pilot slot, $\omega_0(\kappa_P)$, where the argument emphasizes the dependency on $\kappa_P$. The proof follows by substituting for $\kappa_D$ in terms of the energy constraint into (23), and uses the fact that $\omega_\ell = \alpha^{2\ell}\omega_0$ for any causal estimator (footnote 3). The optimal pilot energy $\kappa_P^*$ is implicit in (26), as a particular estimator has not been specified (explicit expressions will be given in the examples below). However, when $|\mathcal{N}|$ is finite, it is clear from the last equality in (10) that $\omega_0$ is a ratio of polynomials in $\kappa_P$. Consequently, maximization of (26) involves polynomial rooting. We can write
$$\sum_{u=0}^{U} a_u \kappa_P^u = 0, \tag{27}$$
where $a_0, \ldots, a_U$ are coefficients to be determined. A sufficient condition for a closed-form solution is $U \le 4$. Next, we derive the optimal training energy at low and high SNR.

Low SNR
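To study the low SNR setting, we start from (10) and simplify the CSI quality to the form (28), where the approximations hold as $\kappa_P \to 0$. Substitution of (28) into (26) yields
$$\lim_{\kappa_{av} \to 0} \frac{\kappa_P^*}{\kappa_{av} T} = \frac{1}{2}, \tag{29}$$
which states that half of the total energy per period should be allocated to the pilot symbol.
Footnote 3: To prove this fact, note that under the GM model, we have $(C_{h_\ell y})_{1,j} = \sqrt{E_P}\,\sigma_h^2 \alpha^{|\ell - N_j T|}$. For causal estimators $N_j \le 0$, and therefore $(C_{h_\ell y})_{1,j} = \sqrt{E_P}\,\sigma_h^2 \alpha^{\ell - N_j T} = \alpha^{\ell} (C_{h_0 y})_{1,j}$. Therefore, $C_{h_\ell y} = \alpha^{\ell} C_{h_0 y}$, and from (10), $\omega_\ell = \alpha^{2\ell}\omega_0$.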

High SNR
At high SNR, the performance of any causal estimator converges to that of the $(1, 0)$ estimator. To see this, start from (10), which leads to (30); there, the approximation holds as $\kappa_P \to \infty$, and the last equality exploits the specific tridiagonal structure of $R_{hh}^{-1}$. Clearly, (30) matches (11) with (24) at high SNR. Intuitively, the channel state in the most recent pilot transmission $k = mT$ is learnt perfectly at high SNR, and this renders older pilots $k = (m-1)T, (m-2)T, \ldots$ irrelevant for prediction in the Markov model of (24).
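The convergence claim can be checked numerically. The sketch below compares a three-pilot causal estimator with the single-pilot estimator under the GM correlation of (24), using the quadratic form of the CSI quality in (10). The helper `csi_quality`, the pilot index sets `[-2, -1, 0]` and `[0]`, and the parameter values are illustrative assumptions.

```python
import numpy as np

def csi_quality(corr, pilot_idx, ell, T, kappa_p):
    """omega_ell = kappa_P r^T (kappa_P R_hh + I)^{-1} r for a real, even R_h."""
    n = np.asarray(pilot_idx, dtype=float)
    r = corr(ell - n * T)
    R = corr((n[:, None] - n[None, :]) * T)
    return float(kappa_p * r @ np.linalg.solve(kappa_p * R + np.eye(len(n)), r))

alpha, T, ell = 0.99, 4, 2
gm_corr = lambda tau: alpha ** np.abs(tau)   # Gauss-Markov correlation

def multi_pilot_gain(kappa_p):
    """Extra CSI quality from using three causal pilots instead of one."""
    three = csi_quality(gm_corr, [-2, -1, 0], ell, T, kappa_p)
    one = csi_quality(gm_corr, [0], ell, T, kappa_p)
    return three - one

for kappa_p in (1.0, 1e2, 1e4, 1e6):
    print(f"kappa_P = {kappa_p:8.0e}  gain in omega = {multi_pilot_gain(kappa_p):.3e}")
```

The printed gain shrinks toward zero as the pilot SNR grows, which is the behavior the Markov argument predicts: once the most recent pilot is observed almost noise-free, older pilots add nothing.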
The fractional training energy for any causal estimator at high SNR can now be found by substituting (11) with (24) into (26); the resulting limit of $\kappa_P^*/(\kappa_{av} T)$ as $\kappa_{av} \to \infty$ is given in (31). The general properties of $\kappa_P^*$ for causal estimators are summarized in the left half of Table 1.

Example 1.
For the $(1, 0)$ estimator, the CSI quality $\omega_0$ is given by (11) with (24). Substitution into (26) yields the closed-form solution (32), which agrees with Table 1 in the limiting cases, $\kappa_{av} \to 0$ and $\kappa_{av} \to \infty$, as it must. When $T = 2$, energy is equally divided between pilot and data, as it is in typical transmit-reference schemes.
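The two design rules stated so far (half of the energy to the pilot at low SNR, and an equal split when $T = 2$) can be illustrated with a brute-force search over the pilot energy fraction. The sketch below is not the closed-form solution of the paper: it assumes the single-pilot CSI quality $\omega_\ell = \kappa_P\alpha^{2\ell}/(1+\kappa_P)$ obtained from (10) with (24), the energy constraint met with equality, and a reconstructed form of the interleaved cutoff rate with effective SNR $\kappa_D\omega_\ell/(1+\kappa_D(1-\omega_\ell))$. All function names are hypothetical.

```python
import numpy as np

def cutoff_rate_single_pilot(kappa_p, kappa_d, alpha, M, T):
    """Assumed interleaved M-PSK cutoff rate for the single-pilot causal estimator."""
    ell = np.arange(1, T)
    omega = kappa_p * alpha ** (2 * ell) / (1.0 + kappa_p)      # CSI quality per lag
    gamma = kappa_d * omega / (1.0 + kappa_d * (1.0 - omega))   # effective SNR
    s2 = np.sin(np.pi * np.arange(1, M) / M) ** 2
    pair_sum = np.sum(1.0 / (1.0 + gamma[:, None] * s2[None, :]), axis=1)
    return float(np.sum(np.log2(M) - np.log2(1.0 + pair_sum)) / T)

def best_pilot_fraction(kappa_av, alpha, M, T, grid=2001):
    """Grid search for the fraction kappa_P / (kappa_av * T) maximizing the rate."""
    frac = np.linspace(0.0, 1.0, grid)[1:-1]     # exclude the degenerate end points
    total = kappa_av * T                         # energy budget per period
    rates = [cutoff_rate_single_pilot(f * total, (1.0 - f) * total / (T - 1),
                                      alpha, M, T) for f in frac]
    return float(frac[int(np.argmax(rates))])

print(best_pilot_fraction(0.01, 0.95, 4, 4))   # low SNR, T = 4
print(best_pilot_fraction(10.0, 0.95, 4, 2))   # T = 2
```

A grid search is used here only because it makes no assumptions about the shape of the objective; under the stated assumptions both calls should return a fraction at or very near one half.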
For the $(\infty, 0)$ estimator, the CSI quality is found from (10) and is given in (33), where inversion of the infinite-dimensional $C_{yy}$ matrix has been carried out using the spectral factorization technique [29]. Substituting (33) into (26), it can be verified that as $\alpha \to 1$, the optimal training energy $\kappa_P^* \to 0$. This is because the $(\infty, 0)$ estimator provides an infinite number of noisy observations of the time-invariant (in the $\alpha \to 1$ limit) channel. Each observation requires only a minuscule amount of energy in order to exploit the infinite (in the limit) diversity gain. As $\alpha \to 0$, $\kappa_P^*$ converges to the $\kappa_P^*$ of the $(1, 0)$ estimator in (32) (this follows since $\omega^{(\infty,0)}$ converges to $\omega^{(1,0)}$): for a rapidly fading channel, only the most recent pilot proves useful. For arbitrary $\alpha$, the optimal training energy is found by solving (26) with (33). For brevity, we use the coefficient notation of (27), with the coefficients listed in (34). Note that $U = 4$, ensuring a closed-form solution. Properties of the $(1, 0)$ and $(\infty, 0)$ estimators, representing the limiting cases of causal estimation, are summarized on the left side of Table 2.
Table 2: The optimal fractional training energy $\kappa_P^*/(\kappa_{av} T)$ for the $(1, 0)$ and $(\infty, 0)$ causal estimators, and the $(1, 1)$ and $(\infty, \infty)$ noncausal estimators, under the Gauss-Markov channel.
In Figure 1, we plot the fractional training energy for the $(1, 0)$, $(2, 0)$, $(3, 0)$, and $(\infty, 0)$ estimators as a function of the energy constraint $\kappa_{av}$ for $M = 8$, $T = 4$, and $\alpha = 0.99$ (footnote 4). It is seen that as more pilots are exploited, less training energy is required. The fractional training energy is nonmonotonic in $\kappa_{av}$ for the multipilot estimators, though $\kappa_P^*$ is monotonic (footnote 5).
Footnote 4: A closed-form solution for $\kappa_P^*$ under the $(2, 0)$ estimator also exists (i.e., $U \le 4$), but it has been omitted for brevity. For the $(3, 0)$ estimator, a sixth-order polynomial in $\kappa_P$ ensues.
Footnote 5: Using the Kuhn-Tucker conditions, it can be shown that the fractional energy allocation is nonmonotonic when the channel estimation is better (when more pilots are used, for larger $\alpha$, and/or for smaller $T$). For example, for the $(\infty, 0)$ estimator, an explicit condition for nonmonotonicity can be obtained.

(2) Noncausal estimation
The optimal energy allocation is generally not available in closed form for noncausal $(L, Z)$ estimators. In general, it can be expressed as the optimization problem (35). We start by considering $\kappa_P^*$ in the limiting SNR cases. We obtain a closed-form solution at low SNR, and simple, but useful, bounds at high SNR.

Low SNR
At low SNR, the CSI quality (10) is simplified using a technique similar to that used in (28) for causal estimators. The result is (36), where the approximation holds as $\kappa_P \to 0$. Although this expression depends on $\ell$, substitution into (35) nevertheless yields a closed-form expression for $\kappa_P^*$. After taking the limit, we get
$$\lim_{\kappa_{av} \to 0} \frac{\kappa_P^*}{\kappa_{av} T} = \frac{1}{2}, \tag{37}$$
implying once again that half of the available energy per period should be allocated to the pilot symbol at low SNR.

High SNR
At high SNR, the performance of any noncausal estimator converges to that of the $(1, 1)$ estimator (the proof is similar to the one used to derive (30) for causal estimators). Using this fact, we substitute (12) with (24) into (35), and consider the limiting cases of rapid ($\alpha \to 0$) and slow ($\alpha \to 1$) fading, which provide upper and lower bounds on $\kappa_P^*$. We obtain the bounds in (38), where the lower bound is met with equality as $\alpha \to 1$, and the upper bound as $\alpha \to 0$ (the technique used to evaluate these limits will be made clear shortly, in the arguments leading to (42)). Comparison of (38) to (31) reveals that a noncausal estimator never uses more training energy than a causal one at high SNR (for fixed $T$). General properties of $\kappa_P^*$ for noncausal estimators are summarized in the right half of Table 1.

Example 2.
We start with an analysis of the $(1, 1)$ estimator, which is valid for all SNR. Simplifying (12) for the Gauss-Markov model gives the CSI quality in (39). Next, we evaluate the CSI quality under rapid and slow fading; the resulting expressions are (40) for rapid fading and (41) for slow fading. Substitution of (40) and (41) into (35) yields the closed-form solutions in (42). Next, we consider the $(\infty, \infty)$ estimator. The CSI quality is given in (43), which follows from (10) after applying spectral factorization. To determine bounds on the optimal training energy, we again consider the cases of slow and rapid fading. For slow fading, we apply L'Hôpital's rule to (43) and obtain $\lim_{\alpha \to 1} \omega_\ell^{(\infty,\infty)} = 1$, and it follows from (35) that $\kappa_P^* \to 0$. For rapid fading ($\alpha \to 0$), it is seen that $\omega_\ell^{(\infty,\infty)}$ converges to $\omega_\ell^{(1,1)}$ (i.e., to the expression on the right-hand side of (40)). Therefore, $\kappa_P^*$ converges to the $\kappa_P^*$ of the $(1, 1)$ estimator. In Figure 2, the $(2, 2)$ estimator provides most of the reduction in the required training energy, and gains saturate with more sophisticated estimators. For large $\alpha$, the $(\infty, \infty)$ estimator takes advantage of the high-order diversity gain available over the slowly varying channel, and requires considerably less energy than the competing estimators. Properties of the $(1, 1)$ and $(\infty, \infty)$ estimators, which represent the limiting cases of noncausal estimation, are summarized on the right side of Table 2.

Training period
In this section we consider the optimal period (equivalently, frequency) with which pilot symbols should be inserted into the symbol stream. The optimal value of T depends on the normalized Doppler α, the cardinality of the input M, the energy constraint κ av , the energy allocation (e.g., the optimal allocation as in Section 3.1 or a static allocation κ D = κ P = κ av ), and the particular estimator employed at the receiver.
However, we will see that the analysis simplifies greatly in the high SNR setting. We will again find it convenient to distinguish between the cases of causal and noncausal estimation.

(1) Causal estimation
At high SNR, the optimal training period for any causal estimator is found from (23). Taking the argmax in $T$ and letting $\kappa_{av} \to \infty$, we obtain the high-SNR optimal period $T_C$, given in (45), where we have again used the convergence of all causal estimators to the $(1, 0)$ estimator at high SNR. Equation (45) depends only on $M$ and $\alpha$; it is independent of the particular estimator used and the energy allocation strategy. Although motivated by the high SNR setting, it will be seen that (45) provides a good approximation to the optimal training period over a wide range of SNR.
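The search for the best training period can also be carried out by brute force, which is useful for checking a closed-form rule at finite SNR. The sketch below is an illustration under stated assumptions rather than an implementation of (45): it uses the static allocation $\kappa_D = \kappa_P = \kappa_{av}$, the single-pilot causal estimator under the GM model, and a reconstructed form of the interleaved cutoff rate. The function names and the search limit `T_max` are hypothetical.

```python
import numpy as np

def rate_static(kappa_av, alpha, M, T):
    """Assumed interleaved cutoff rate with kappa_P = kappa_D = kappa_av."""
    ell = np.arange(1, T)
    omega = kappa_av * alpha ** (2 * ell) / (1.0 + kappa_av)      # single-pilot CSI quality
    gamma = kappa_av * omega / (1.0 + kappa_av * (1.0 - omega))   # effective SNR
    s2 = np.sin(np.pi * np.arange(1, M) / M) ** 2
    pair_sum = np.sum(1.0 / (1.0 + gamma[:, None] * s2[None, :]), axis=1)
    return float(np.sum(np.log2(M) - np.log2(1.0 + pair_sum)) / T)

def optimal_period(kappa_av, alpha, M, T_max=200):
    """Exhaustive search over pilot periods T = 2..T_max."""
    periods = np.arange(2, T_max + 1)
    rates = [rate_static(kappa_av, alpha, M, int(T)) for T in periods]
    return int(periods[int(np.argmax(rates))])

for alpha in (0.80, 0.95, 0.99):
    print(alpha, optimal_period(1e4, alpha, M=4))
```

The trade-off being searched is the one described in the text: a longer period spends fewer slots on pilots (the $1/T$ factor) but pushes data symbols to lags where the CSI quality, and hence the per-subchannel rate, is lower. A more predictable channel (larger $\alpha$) therefore tolerates a longer period.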

Example 3.
We study the applicability of the training period rule of (45) to the $(1, 0)$ and $(\infty, 0)$ estimators at finite values of SNR. A comparison is given in Table 3 for QPSK (i.e., $M = 4$). The second and third columns are the optimal training period for the $(1, 0)$ estimator under the static and optimal energy allocations, respectively (determined numerically). The fourth and fifth columns are the training period for the $(\infty, 0)$ estimator under static and optimal energy allocations (determined numerically), and the sixth column is the optimal training period at high SNR determined from (45). The optimal training period for either estimator, under either energy allocation strategy, is seen to converge to $T_C$ as the SNR increases, which is expected. It is seen that convergence occurs sooner when the fading becomes more rapid. For example, for $\alpha = 0.80$, the training period predicted by (45) is correct for SNRs as small as 0 dB (for either the $(1, 0)$ or $(\infty, 0)$ estimators and under either energy allocation strategy). For $\alpha = 0.95$, $T_C$ is exact for SNRs as low as 10 dB, and for $\alpha = 0.99$, $T_C$ is correct down to an SNR of 20 dB. For a fixed estimator, it is seen that the optimal training period can vary greatly depending on the energy allocation strategy, at least for smaller $\kappa_{av}$ and larger $\alpha$. For example, when $\alpha = 0.99$ and $\kappa_{av} = 0$ dB, the optimal training period varies from 10 (under constant allocation) to 20 (under optimal allocation).

(2) Noncausal estimation
Similarly, we find the optimal training period for any noncausal estimator by considering the high SNR setting. Letting $\kappa_{av} \to \infty$ in (23), we obtain the high-SNR optimal period $T_{NC}$, given in (46).

Example 4.
The right side of Table 3 illustrates the training period for noncausal estimators. The seventh and eighth columns of the table are the optimal training period for the $(1, 1)$ estimator under static and optimal energy allocations, respectively (determined numerically). The ninth and tenth columns are the training period for the $(\infty, \infty)$ estimator under static and optimal energy allocations (determined numerically), and the eleventh column is the optimal training period at high SNR determined from (46). Again, we note that $T_{NC}$ provides a good approximation to the optimal training period for larger SNR and for more rapid fading. The table reflects intuition: the more predictable the channel (larger $\alpha$), the less frequently training is required (larger $T$). However, the table generally indicates that more sophisticated estimators (e.g., the $(\infty, \infty)$) require more frequent training symbols than simpler ones (e.g., the $(1, 1)$). To explain this result we refer to (23), which shows that the optimal $T$ is determined not directly by the quality of the estimator, but rather by how quickly the cutoff rate in the $\ell$th subchannel diminishes in $\ell$ ($1 \le \ell \le (T-1)/2$ for noncausal estimators). If the better estimator causes the biased sum of (23) to degrade more quickly in $\ell$, then $T$ will be smaller for the better estimator.
Table 3: The left half of the table is a study for causal estimators: the $(1, 0)$ estimator under the static (1st column) and optimal (2nd) energy allocations, the $(\infty, 0)$ estimator under static (3rd) and optimal (4th) energy allocations, and the optimal training period at high SNR (5th column) determined from (45). The right half of the table is a study for noncausal estimators: the $(1, 1)$ estimator under the static (6th column) and optimal (7th) energy allocations, the $(\infty, \infty)$ estimator under static (8th) and optimal (9th) energy allocations, and the optimal training period at high SNR (10th column) determined from (46).

Performance analysis
We now examine the effect of optimal training on the cutoff rate. In Figure 3(a), the training period is fixed at the high-SNR optimal value determined from (45). The merits of optimal allocation increase with the channel predictability: when $\alpha = 0.99$ there is a gain of about 2 dB at $\kappa_{av} = 0$ dB, but when $\alpha = 0.9$ the gain is only a fraction of a dB. In each case, we find that it is the energy allocation, not the assignment of the training period, that provides most of the gain in optimized training. This is due in part to our choice of $T = T_C$, which is optimal at high SNR. In Figure 3(b) we plot the impact of an arbitrary choice of $T$ on the cutoff rate under constant energy allocation, $\kappa_D = \kappa_P = \kappa_{av} = 20$ dB. The degradation may be significant when $T$ is chosen suboptimally.
To determine the merits of more sophisticated estimators, we compare the cutoff rate under the simplest and the most complex causal ((1, 0) and (∞, 0)) and noncausal ((1, 1) and (∞, ∞)) estimators in Figure 4(a) for α = 0.98 and a constant energy allocation with unoptimized choice of T (we choose T = T C for the causal estimators or T = T NC for the noncausal estimators). Therefore, the curves in the figure represent the largest increase in cutoff rate due to the use of a sophisticated estimator in place of a simpler one. At small SNR there is a ∼ 2 dB gain in using more sophisticated estimators. However, this gain is seen to diminish at high SNR (as expected for the GM model). We repeat the figure, but with optimized energy and training assignments, in Figure 4(b). Remarkably, the energy saving for using the (∞, 0) estimator in place of the (1, 0) (or the (∞, ∞) in place of the (1, 1)) is seen to be a fraction of a dB over the entire SNR range. Energy optimization reduces the need for sophisticated estimators in the GM model.

OPTIMAL TRAINING FOR JAKES MODEL
In this section we study optimized training for the Jakes channel correlation [30]. While the GM model studied in the last section provides straightforward analytic results, the Jakes model is known to be an accurate and experimentally validated model in dense scattering environments. The analysis in this section will be used to validate and refine the design paradigms derived in the last section. For the Jakes model we have
$$R_h(\tau) = J_0(2\pi f_D T_D \tau), \tag{47}$$
where $J_0(\cdot)$ is the zero-order Bessel function of the first kind, and $f_D T_D > 0$ is the normalized Doppler parameter. The cutoff rate is given by (23) with CSI quality (10) as before, but now under the channel correlation of (47).
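Because the Jakes correlation is not Markov, older pilots can remain useful even when the most recent pilot is observed almost noise-free. The sketch below illustrates this by evaluating the CSI quality of (10) under the correlation of (47) for one-pilot and two-pilot causal estimators. To stay dependency-free it computes $J_0$ from its integral representation $J_0(x) = \frac{1}{\pi}\int_0^{\pi}\cos(x\sin\theta)\,d\theta$ with a midpoint rule; the helper names, the pilot sets, and the values $T = 10$, $\ell = 5$, $\kappa_P = 10^4$ are illustrative assumptions.

```python
import numpy as np

def bessel_j0(x, n=2000):
    """J_0(x) via the midpoint rule on (1/pi) * integral_0^pi cos(x sin(theta))."""
    theta = (np.arange(n) + 0.5) * np.pi / n
    x = np.asarray(x, dtype=float)
    return np.mean(np.cos(x[..., None] * np.sin(theta)), axis=-1)

def csi_quality(corr, pilot_idx, ell, T, kappa_p):
    """omega_ell = kappa_P r^T (kappa_P R_hh + I)^{-1} r for a real, even R_h."""
    n = np.asarray(pilot_idx, dtype=float)
    r = corr(ell - n * T)
    R = corr((n[:, None] - n[None, :]) * T)
    return float(kappa_p * r @ np.linalg.solve(kappa_p * R + np.eye(len(n)), r))

fd_td = 1.0 / 100.0                                       # normalized Doppler
jakes_corr = lambda tau: bessel_j0(2.0 * np.pi * fd_td * tau)

T, ell, kappa_p = 10, 5, 1e4
w_one = csi_quality(jakes_corr, [0], ell, T, kappa_p)       # most recent pilot only
w_two = csi_quality(jakes_corr, [-1, 0], ell, T, kappa_p)   # two most recent pilots
print(f"one pilot: {w_one:.4f}   two pilots: {w_two:.4f}")
```

Under these assumed values the second pilot yields a visible improvement in CSI quality at high pilot SNR, in contrast with the GM model, where the improvement vanishes in that regime.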
In the following simulation we consider mobile speeds of {12, 120} km/h, which correspond to the Doppler parameters $f_D T_D = \{1/1000, 1/100\}$ at a carrier frequency of 900 MHz and symbol period of $T_D = 10$ μs [31, pages 141-143]. In Figure 5, we plot the cutoff rate for both values of $f_D T_D$ under (a) optimized energy and training period assignments, $\kappa_P = \kappa_P^*$, $\kappa_D = \kappa_D^*$, $T = T^*$, and (b) the optimized training period only, $\kappa_P = \kappa_D = \kappa_{av}$, $T = T^*$. Behavior is seen to be qualitatively similar to that in Figure 3(a) for the GM model: energy-optimized training is seen to provide a noticeable increase in the cutoff rate for slower fading channels, but not for rapidly fading channels. Further, optimized training provides the largest savings at low SNR (about 3 dB at $\kappa_{av} = 0$ dB), but is of diminishing benefit as SNR increases. The vehicle speeds tested represent extreme cases. For intermediate speeds (i.e., values of $f_D T_D$), performance varies smoothly between these extremes, both in the cutoff rate and in the gains realized by optimal resource allocation.
In Section 3 we made use of the Markov property of the channel in several instances, exploiting the convergence of the (L, 0) estimator to the (1, 0) estimator (alt., the (L, Z) estimator to the (1, 1) estimator) at high SNR. To test the degree to which this property holds under the Jakes model, we plot the cutoff rate for QPSK input and optimized training (both the energy and training period have been optimized) in Figure 6 when f D T D = 1/100. In comparison to Figure 4(b), the counterpart figure for the GM model, we notice a similar qualitative behavior at low SNR. However, we notice differences at high SNR: more sophisticated estimators are seen to be useful at high SNR under optimized training (for both causal and noncausal estimation). For example, there is now a ∼ 3 dB gain in going from the (1, 0) estimator to the (2, 0) estimator at an SNR of κ av = 15 dB. Further, performance of the (L, 0) (alt., (L, Z)) estimator does not converge to that of the (1, 0) (alt., (1, 1)) estimator at high SNR. In general it is seen that the largest gain is achieved in going from the (1, 0) estimator to the (2, 0) estimator (alt., from the (1, 1) to the (2, 2)), after which adding more pilots provides diminishing returns to the cutoff rate.
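The contrast between the two correlation models at high SNR can be illustrated numerically. The following is a minimal sketch, not the simulation behind Figure 6: it computes the linear MMSE error variance (not the cutoff rate itself) of a unit-variance channel estimated from the L most recent pilots. The pilot spacing, the data-symbol offset, the pilot SNR, the unit-variance normalization, and all function names are assumptions chosen for illustration; α = 0.98 and f D T D = 1/100 are the values used in the text.

```python
import math


def bessel_j0(x, steps=400):
    """J0(x) = (1/pi) * integral over [0, pi] of cos(x sin t) dt (midpoint rule)."""
    h = math.pi / steps
    return sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(steps)) / steps


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def causal_mmse(corr, num_pilots, period, offset, pilot_snr):
    """Error variance when estimating the channel `offset` symbols after the
    latest pilot from the `num_pilots` most recent pilots, spaced `period`
    symbols apart. Each pilot observation is h + noise of variance 1/pilot_snr."""
    cov = [[corr((i - j) * period) + (1.0 / pilot_snr if i == j else 0.0)
            for j in range(num_pilots)] for i in range(num_pilots)]
    cross = [corr(offset + i * period) for i in range(num_pilots)]
    weights = solve(cov, cross)
    return 1.0 - sum(w * c for w, c in zip(weights, cross))


ALPHA, FD_TD = 0.98, 0.01                  # values quoted in the text
PERIOD, OFFSET, PILOT_SNR = 10, 5, 1e4     # hypothetical illustration values


def gm_corr(lag):
    return ALPHA ** abs(lag)


def jakes_corr(lag):
    return bessel_j0(2 * math.pi * FD_TD * abs(lag))


gm_errors = [causal_mmse(gm_corr, L, PERIOD, OFFSET, PILOT_SNR) for L in (1, 2, 3)]
jakes_errors = [causal_mmse(jakes_corr, L, PERIOD, OFFSET, PILOT_SNR) for L in (1, 2, 3)]
for L, g, j in zip((1, 2, 3), gm_errors, jakes_errors):
    print(f"L = {L}: GM error = {g:.5f}, Jakes error = {j:.5f}")
```

Under these assumed parameters, the GM error variance is essentially unchanged as pilots are added, consistent with the Markov property, while under the Jakes correlation the second pilot reduces the error variance substantially and further pilots reduce it again.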

DISCUSSION AND FUTURE WORK
We have considered cutoff rate optimal training within a PSAM framework for time-selective Rayleigh flat-fading channels. For M-PSK inputs, we have derived in (23) a simple expression for the interleaved cutoff rate that is parameterized by an arbitrary channel correlation, channel estimation scheme, and the allocation of power and bandwidth to training symbols. Using this expression, we have derived analytic rules for the optimal allocation of resources for the Gauss-Markov fading channel, the basic properties of which are summarized in Tables 1 and 2: at low SNR, half the total available energy should go to training, while at high SNR, a noncausal estimator never uses more energy than a causal one (with equality when the fading is rapid). We have provided expressions for the training period that, while optimal at high SNR, were seen to pro-  Table 3). Next, we studied optimal training for the Jakes model. Insights derived from the Gauss-Markov model were predictive at low SNR and for simpler estimators, but not at higher SNR with more sophisticated multi-pilot estimators. Under the Jakes model, multi-pilot estimators were indeed useful at high SNR, with the largest gains coming from the addition of the first few pilots. Both the Gauss-Markov and Jakes models indicated that optimized energy allocation is useful at low SNR but of little benefit at high SNR.
Among related work on optimal training for point-to-point time-selective Rayleigh flat-fading channels, only [14] offers analytic results for the optimal training period and energy allocation in a similar framework. The authors consider Gaussian inputs and derive a lower bound on channel capacity that relates estimation error directly to system performance in similar fashion to (23) (however, the capacity formulation requires an expectation over the random channel that cannot be evaluated directly). The (∞, ∞) estimator is used. As a representative example, we compare the optimal energy allocation result of [14] at high SNR for any (bandlimited) Doppler spectrum to our result at high SNR for the GM model. Our result is given by (38), where the lower and upper bounds are achieved in the limit of static and i.i.d. fading, respectively. Reference [14] derives an expression for κ P /κ av T in which β > 0 is a constant. This represents a different behavior for large T, as κ P /κ av T saturates to some value > 0 for [14], but not for our model. A comparison of optimal training period results is not as straightforward, as [14] considers pilots to be samples of an underlying continuous-time channel, an interpretation that is excluded here. It is of further interest to study cutoff rate optimal training under nonsymmetric inputs such as M-QAM (an analysis for perfect CSI appears in [32]), and generalizations to the MIMO setting and the case where the transmitter has only statistical knowledge of the Doppler spectrum.

(20) We start from (19). Note that Dividing the numerator and denominator by σ_n^2 and substituting the result into (19) yields (20).