Doubly Selective Channel Estimation Using Superimposed Training and Exponential Bases Models

Channel estimation for single-input multiple-output (SIMO) frequency-selective time-varying channels is considered using superimposed training. The time-varying channel is assumed to be described by a complex exponential basis expansion model (CE-BEM). A periodic (nonrandom) training sequence is arithmetically added (superimposed) at a low power to the information sequence at the transmitter before modulation and transmission. A two-step approach is adopted. In the first step, we estimate the channel using CE-BEM and only the first-order statistics of the data; using this channel estimate, a Viterbi detector is used to estimate the information sequence. In the second step, a deterministic maximum-likelihood (DML) approach is used to iteratively estimate the SIMO channel and the information sequence, based on CE-BEM. Three illustrative computer simulation examples are presented, including two where a frequency-selective channel is randomly generated with different Doppler spreads via Jakes' model.


INTRODUCTION
Consider a time-varying SIMO (single-input multiple-output) FIR (finite impulse response) linear channel with N outputs. Let {s(n)} denote a scalar sequence which is input to the SIMO time-varying channel with discrete-time impulse response {h(n; l)} (N-vector channel response at time n to a unit input at time n − l). The vector channel may be the result of multiple receive antennas and/or oversampling at the receiver. Then the symbol-rate channel output vector is given by

$$x(n) := \sum_{l=0}^{L} h(n; l)\, s(n-l). \tag{1}$$

In a complex exponential basis expansion representation [4] it is assumed that

$$h(n; l) = \sum_{q=1}^{Q} h_q(l)\, e^{j\omega_q n}, \tag{2}$$

where the N-column vectors $h_q(l)$ (for q = 1, 2, ..., Q) are time-invariant. Equation (2) is a basis expansion of h(n; l) in the time variable n onto complex exponentials with frequencies $\{\omega_q\}$. The noisy measurements of x(n) are given by

$$y(n) = x(n) + v(n). \tag{3}$$

Equation (2) is the complex-exponential basis expansion model (CE-BEM).

A main objective in communications is to recover s(n) given noisy {y(n)}. In several approaches this requires knowledge of the channel impulse response [11,19]. In conventional training-based approaches, for time-varying channels, one has to send a training signal frequently and periodically to keep up with the changing channel [7]. This wastes resources. An alternative is to estimate the channel based solely on noisy y(n), exploiting statistical and other properties of {s(n)} [11,19]. This is the blind channel estimation approach. More recently, a superimposed training-based approach has been explored where one takes

$$s(n) = b(n) + c(n), \tag{4}$$

where {b(n)} is the information sequence and {c(n)} is a training (pilot) sequence added (superimposed) at a low power to the information sequence at the transmitter before modulation and transmission. There is no loss in information rate. On the other hand, some useful power is wasted in superimposed training which could otherwise have been allocated to the information sequence.
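As an illustration of the signal model, the following minimal sketch generates the received sequence for a single-output (N = 1) CE-BEM channel driven by a superimposed input. It assumes the usual forms x(n) = Σ_l h(n; l) s(n − l), h(n; l) = Σ_q h_q(l) e^{jω_q n}, y(n) = x(n) + v(n), and s(n) = b(n) + c(n), with s(n) = 0 for n < 0. The function name `simulate_ce_bem` and its arguments are hypothetical and chosen only for this example.

```python
import numpy as np

def simulate_ce_bem(h_q, omega, b, c, noise_std=0.0, rng=None):
    """Return y(n) for a SISO CE-BEM channel driven by s(n) = b(n) + c(n).

    h_q   : (Q, L+1) array, h_q[q, l] = h_q(l)  (assumed layout)
    omega : (Q,) basis frequencies in rad/sample
    """
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(b) + np.asarray(c)            # superimposed training
    T = len(s)
    n = np.arange(T)
    basis = np.exp(1j * np.outer(omega, n))      # (Q, T): e^{j omega_q n}
    h = np.einsum("ql,qn->nl", h_q, basis)       # h(n; l), shape (T, L+1)
    x = np.zeros(T, dtype=complex)
    for l in range(h.shape[1]):
        x[l:] += h[l:, l] * s[:T - l]            # x(n) += h(n; l) s(n - l)
    # circular complex white Gaussian noise with variance noise_std**2
    v = noise_std * (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)
    return x + v
```

With Q = 1 and ω_1 = 0 the sketch reduces to an ordinary time-invariant FIR channel, which gives a quick sanity check.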
Periodic superimposed training for channel estimation via first-order statistics for SISO systems has been discussed in [9,16,21] (and references therein) for time-invariant channels, and in [17] (a conference version of Section 2 of this paper) for both time-invariant and time-varying (CE-BEM based) channels.

EURASIP Journal on Applied Signal Processing

CE-BEM representations/approximations of doubly selective channels have been used in [1,2,4,5,6,7,15], among others. Reference [7] deals with time-multiplexed training sequence design for block transmissions. In this paper we only deal with serial transmissions. In [5], a semiblind approach is considered with time-multiplexed training with serial transmissions and at least two receive antennas. In this paper our results hold even with one receive antenna. Reference [2] deals with time-varying equalizer design given a CE-BEM representation.
Reference [3] appears to be the first to use (periodic) superimposed training for SISO time-invariant channel estimation. Periodic training allows for use of the first-order statistics (time-varying mean) of the received signal. Since blind approaches cannot resolve a complex scaling factor ambiguity, they require differential encoding/decoding, resulting in an approximately 3 dB SNR loss. It was noted in [3] that the power loss in superimposed training would typically be much less than 3 dB. Furthermore, it was also noted in [3] that identifiability conditions for superimposed training-based methods are much less stringent than those for blind approaches. As noted earlier, periodic superimposed training for channel estimation via first-order statistics for SISO systems has been discussed in [17] for both time-invariant and time-varying (CE-BEM based) channels. While in principle aperiodic superimposed training can also be used, periodic training allows for a much simpler algorithm; for instance, for CE-BEM channels, relation (13) leads to (19) (see Section 2), which allows for a "decoupled" estimation of the coefficients $d_{mq}$ (see (10)) from data. In the CE-BEM model the exponential basis functions are orthogonal over the record length. When we use periodic training with an appropriately selected period in relation to the record length, the "composite" basis functions ($e^{j\omega_{mq} n}$ in Section 2) are still orthogonal, leading to (13). However, there is no relative advantage or disadvantage between periodic and aperiodic superimposed training when using the iterative approach to joint channel and information sequence estimation discussed in Section 3. In the simulations presented in this paper we used an m-sequence (maximal length pseudorandom binary sequence) as the superimposed training sequence.
While there exists a large class of periodic training sequences which are periodically white and/or optimal in some sense (see [9]), some of them do not have a peak-to-average power ratio of one and some of them do not have a finite alphabet, whereas an m-sequence has a finite (binary) alphabet and unity peak-to-average power ratio.
As noted earlier, compared to periodically inserted time-multiplexed training (as in [7]), there is no loss in data transmission rate in superimposed training. However, there may be an increase in bit-error rate (BER) because of an SNR loss due to the power allocated to superimposed training. Our simulation comparisons show that at "low" SNRs we also have a BER advantage (see Example 3 in Section 4). In semiblind approaches (such as that in [5]), there is periodically inserted time-multiplexed training, but one also uses the non-training-based data to improve the training-based results via a combination of training and blind cost functions. While [5] needs at least two receive antennas, in this paper our results hold even with one receive antenna; besides, in [5] there is still a loss in data transmission rate owing to the presence of time-multiplexed training.
In [17] a first-order statistics-based approach for time-invariant channel estimation using periodic superimposed training has been presented. This approach is further analyzed and enhanced in [18], where a performance analysis has been carried out and issues such as frame synchronization and training power allocation have been discussed. Neither of these papers deals with time-varying channels; moreover, they do not discuss any iterative approach to joint channel and information sequence estimation, even in the context of time-invariant channels.

Objectives and contributions
In this paper, we first present and extend the first-order statistics-based approach of [17] for time-varying (CE-BEM based) channels. Then we extend the first-order statistics-based solution to an iterative approach to joint channel and information sequence estimation, based on CE-BEM, using Viterbi detectors. The first-order statistics-based approach views the information sequence as interference, whereas in the iterative joint estimation version it is exploited to enhance channel estimation and information sequence detection. All results in this paper are developed for an SIMO formulation, since everything developed for an SISO system carries over to an SIMO model in a straightforward fashion. However, all our simulations are presented for an SISO system (for simplicity of presentation).

Notation
Superscripts H, T, and † denote the complex conjugate transpose, the transpose, and the Moore-Penrose pseudoinverse operations, respectively. δ(τ) is the Kronecker delta and $I_N$ is the N × N identity matrix. The symbol ⊗ denotes the Kronecker product. The superscript * denotes the complex conjugation operation.

CE-BEM representation
We now briefly discuss the CE-BEM representation of time-varying communications channels, following [4] and particularly [6], to consider practical situations where the basis frequencies $\omega_q$ would be known a priori. Consider a time-varying (e.g., mobile wireless) channel with complex baseband, continuous-time, received signal x(t) and transmitted complex baseband, continuous-time information signal s(t) (with symbol interval $T_s$ seconds) related by h(t; τ), which is the time-varying impulse response of the channel (response at time t to a unit impulse at time t − τ). Let $\tau_d$ denote the (multipath) delay spread of the channel and let $f_d$ denote the Doppler spread of the channel. If x(t) is sampled once every $T_s$ seconds (symbol rate), then by [6], for $t = nT_s + t_0 \in [t_0, t_0 + TT_s)$, the sampled signal $x(n) := x(t)|_{t = nT_s + t_0}$ has the representation

$$x(n) = \sum_{l=0}^{L} h(n; l)\, s(n-l), \tag{5}$$

$$h(n; l) = \sum_{q=1}^{Q} h_q(l)\, e^{j\omega_q n}, \tag{6}$$

where

$$L := \lfloor \tau_d / T_s \rfloor, \qquad \omega_q := \frac{2\pi}{T}\left(q - \frac{Q+1}{2}\right), \qquad Q := 2\lceil f_d T T_s \rceil + 1. \tag{7}$$

Jitendra K. Tugnait et al.

This is a scenario where the CE-BEM representation is appropriate. The above representation is valid over a duration of $TT_s$ seconds (T samples). Equation (1) arises if we follow (5) and consider an SIMO model arising due to multiple antennas at the receiver. Although discussed in the context of OFDM, in [12] it is shown that finite-duration observation window effects compromise the accuracy of CE-BEM, that is, CE-BEM is "accurate" only as T → ∞. One could try to improve the CE-BEM efficacy by explicitly incorporating time-domain windowing effects (as in [12]). Such modifications are outside the scope of this paper. We do note that in [8], alternative models (such as polynomial bases models) coupled with CE-BEM have been used to improve the modeling results.
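To make the choice of model order concrete, the sketch below computes the CE-BEM parameters from the physical channel parameters. It assumes the commonly used relations Q = 2⌈f_d T T_s⌉ + 1, L = ⌊τ_d / T_s⌋, and ω_q = (2π/T)(q − (Q+1)/2) mapped into [0, 2π); these forms, the function name `ce_bem_parameters`, and the small tolerance `eps` are assumptions of this example. The resulting orders agree with the values {1, 3, 5} quoted for Example 2 (T = 400, T_s = 25 μs, f_d between 0 and 200 Hz).

```python
import math

def ce_bem_parameters(f_d, tau_d, T, T_s, eps=1e-9):
    """Return (Q, L, omega) with omega[q-1] = omega_q mapped into [0, 2*pi).

    f_d   : Doppler spread in Hz
    tau_d : delay spread in seconds
    T     : record length in symbols
    T_s   : symbol interval in seconds
    """
    # eps guards against floating-point round-up at exact integer products
    Q = 2 * math.ceil(f_d * T * T_s - eps) + 1
    L = math.floor(tau_d / T_s + eps)
    q_bar = (Q + 1) // 2                       # index of the zero frequency
    omega = [((2 * math.pi / T) * (q - q_bar)) % (2 * math.pi)
             for q in range(1, Q + 1)]
    return Q, L, omega

if __name__ == "__main__":
    for f_d in (0, 60, 120, 200):
        Q, L, _ = ce_bem_parameters(f_d, 50e-6, 400, 25e-6)
        print(f_d, "Hz ->", "Q =", Q, "L =", L)
```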

A FIRST-ORDER STATISTICS-BASED SOLUTION
The solution is based on CE-BEM. Assume the following:
(H1) the time-varying channel {h(n; l)} satisfies (2), where the frequencies $\omega_q$ (q = 1, 2, ..., Q) are distinct and known with $\omega_q \in [0, 2\pi)$. Also [...] The mean vector m may be unknown;
(H4) the superimposed training sequence c(n) = c(n + mP) for all m, n is a nonrandom periodic sequence with period P.
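For model (7), we have $\bar{q} = (Q+1)/2$. Negative values of $\omega_q$ in (7) are to be interpreted as positive values after a modulo $2\pi$ operation; that is, in (7), for $1 \le q < \bar{q}$, we also have $\omega_q = 2\pi + (2\pi/T)(q - \bar{q})$.

In this section, we will exploit the first-order statistics (i.e., E{y(n)}) of the received signal. (A consequence of using the first-order statistics is that the knowledge of the noise variance $\sigma_v^2$ in (H3) is not used here.) By (H4), we have

$$c(n) = \sum_{m=0}^{P-1} c_m e^{j\alpha_m n} \quad \forall n, \tag{8}$$

where

$$c_m := \frac{1}{P} \sum_{n=0}^{P-1} c(n)\, e^{-j\alpha_m n}, \qquad \alpha_m := \frac{2\pi m}{P}. \tag{9}$$

The coefficients $c_m$ are known at the receiver since {c(n)} is known. By (1)-(3), (8)-(9), and (H3), we have

$$E\{y(n)\} = \sum_{q=1}^{Q} \sum_{m=0}^{P-1} d_{mq}\, e^{j(\omega_q + \alpha_m) n} + m, \qquad d_{mq} := c_m \sum_{l=0}^{L} h_q(l)\, e^{-j\alpha_m l}. \tag{10}$$

Suppose that we pick P to be such that the $(\omega_q + \alpha_m)$'s are all distinct for any choice of m and q. For instance, suppose that the data record length T samples (see also Section 1.1) and P are such that T = KP for some integer K > 0. In such a case, we have

$$\omega_{mq} := (\omega_q + \alpha_m) \bmod 2\pi = \frac{2\pi}{T}\,(q - \bar{q} + Km) \bmod 2\pi. \tag{12}$$

If P and K are such that K ≥ Q, then it follows from (12) that $\omega_{m_1 q_1} \ne \omega_{m_2 q_2}$ if $q_1 \ne q_2$ or $m_1 \ne m_2$. Henceforth, it is assumed that the above conditions hold true. Then we have

$$\frac{1}{T} \sum_{n=0}^{T-1} e^{j(\omega_{m_1 q_1} - \omega_{m_2 q_2}) n} = \delta(m_1 - m_2)\, \delta(q_1 - q_2). \tag{13}$$

Note that $\omega_{mq} = 0$ only when m = 0 and $q = \bar{q}$. We rewrite (10) as

$$E\{y(n)\} = \sum_{\substack{q=1,\ldots,Q;\; m=0,\ldots,P-1 \\ (q,m) \ne (\bar{q},0)}} d_{mq}\, e^{j\omega_{mq} n} + \left[ d_{0\bar{q}} + m \right]. \tag{14}$$

Given the observation sequence y(n), 0 ≤ n ≤ T − 1, our approach to estimating the $h_q(l)$'s using the first-order statistics of the data is to first estimate the $d_{mq}$'s for 0 ≤ m ≤ P − 1, 1 ≤ q ≤ Q ($(q, m) \ne (\bar{q}, 0)$), and then estimate the $h_q(l)$'s from the estimated $d_{mq}$'s. By (14), $d_{mq}$ is the coefficient of the exponential $e^{j\omega_{mq} n}$ for $(q, m) \ne (\bar{q}, 0)$, whereas $d_{0\bar{q}} + m$ is the coefficient of $e^{j\omega_{0\bar{q}} n} = 1$. Since the dc offset m is not necessarily known, we will not seek the coefficient of $e^{j\omega_{0\bar{q}} n}$ in (14). By (1)-(3) and (14), we have

$$y(n) = \sum_{\substack{q=1,\ldots,Q;\; m=0,\ldots,P-1 \\ (q,m) \ne (\bar{q},0)}} d_{mq}\, e^{j\omega_{mq} n} + \left[ d_{0\bar{q}} + m \right] + e(n), \tag{15}$$

where e(n) is a zero-mean random sequence. Define the cost function

$$J := \sum_{n=0}^{T-1} \| e(n) \|^2. \tag{16}$$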
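The decoupled estimation of the coefficients $d_{mq}$ can be sketched as follows for N = 1. The sketch assumes an odd Q, T = KP with K ≥ Q, and composite frequencies $\omega_{mq} = 2\pi(q - \bar{q} + Km)/T$ (mod 2π), so that every $\omega_{mq}$ falls exactly on a DFT bin and the correlation $(1/T)\sum_n y(n) e^{-j\omega_{mq} n}$ is a single FFT output. The function name `estimate_d` is hypothetical.

```python
import numpy as np

def estimate_d(y, P, Q):
    """Return d_hat with d_hat[m, q-1] = (1/T) sum_n y(n) exp(-j w_mq n).

    Assumes w_mq = 2*pi*(q - q_bar + K*m)/T with T = K*P, K >= Q, Q odd.
    """
    T = len(y)
    K = T // P
    assert T == K * P and K >= Q, "needs T = K*P with K >= Q"
    q_bar = (Q + 1) // 2
    Y = np.fft.fft(y) / T                     # Y[k] = (1/T) sum_n y(n) e^{-j 2 pi k n / T}
    m = np.arange(P)[:, None]
    q = np.arange(1, Q + 1)[None, :]
    bins = (q - q_bar + K * m) % T            # all distinct because K >= Q
    # Entry (0, q_bar - 1) is the zero-frequency bin; it also contains the
    # unknown dc offset and is therefore not used for channel estimation.
    return Y[bins]
```

Because the composite exponentials are orthogonal over the record, each coefficient is obtained independently of the others; with noisy data carrying information symbols the same correlation yields an unbiased but noisy estimate.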

Choose the $d_{mq}$'s (q = 1, 2, ..., Q; m = 0, 1, ..., P − 1; $(q, m) \ne (\bar{q}, 0)$) to minimize J. For optimization, we must have [...] where the partial derivative in (17) [...] Using (13), (15), and (18), it follows that (for $(q, m) \ne (\bar{q}, 0)$)

$$\hat{d}_{mq} = \frac{1}{T} \sum_{n=0}^{T-1} y(n)\, e^{-j\omega_{mq} n}. \tag{19}$$

It follows from (14) and (19) that

$$E\{\hat{d}_{mq}\} = d_{mq}. \tag{20}$$

Now we establish that, given $d_{mq}$ for 1 ≤ q ≤ Q and 0 ≤ m ≤ P − 1 but excluding $\omega_q + \alpha_m = 0$, we can (uniquely) estimate the $h_q(l)$'s if P ≥ L + 2 and $c_m \ne 0$ for all m ≠ 0. Define [...] Omitting the term m = 0 and using the definition of $d_{mq}$ from (10), it follows that [...] Notice that we have omitted all pairs (m, q) = (0, q) ($q \ne \bar{q}$) from (27). In order to include these omitted terms, we further define an [N(Q − 1)]-column vector [...] and an [N( [...] Then it follows from (10) and (28)-(30) that [...] In order to concatenate (27) and (31), we define [...] which lead to [...] Equation (33) [...] By (20) and (33), it follows that [...]

We summarize our method in the following lemma. [...]

Remark 1. A more logical approach would have been to select the $h_q(l)$'s and m jointly to minimize the cost J in (16). The resulting solution is more complicated and it couples the estimates of the $h_q(l)$'s and m. Since we do not use $d_{0\bar{q}}$, we are discarding any information about $h_q(l)$ therein.

Remark 2.
It should be emphasized that precise knowledge of the channel length L is not required; an upper bound $L_u$ suffices. Then we estimate $H_l$ for $0 \le l \le L_u$ with $E\{\hat{H}_l\} = 0$ for l ≥ L + 1. Moreover, we do not need $c_m \ne 0$ for every m. We need at least L + 2 nonzero $c_m$'s.

The cost function (16) is not novel; it also occurs in [1,15] in the context of time-multiplexed training for doubly selective channels. However, unlike these papers, as noted in Remark 1 we do not directly estimate the $h_q(l)$'s and m (there is no m in these papers); rather, we first estimate the $d_{mq}$'s, which are motivated through the time-varying mean E{y(n)}; hence the term first-order statistics. This aspect is missing from [1,15], and in this paper it is motivated by the time-invariant results of [9,16,21] (among others). The choice of periodic superimposed training is also motivated by the results of [9,16,21].
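The second stage, recovering the $h_q(l)$'s from the $d_{mq}$'s, can be sketched as a small least-squares problem for N = 1. The sketch uses the relation $d_{mq} = c_m \sum_l h_q(l) e^{-j\alpha_m l}$ with $\alpha_m = 2\pi m/P$, and, as a simplification that is an assumption of this example, uses only the rows m = 1, ..., P − 1 (the text additionally exploits the m = 0 terms with $q \ne \bar{q}$). It also assumes $c_m \ne 0$ for all m ≠ 0 and $P \ge L_u + 2$. The function name `channel_from_d` is hypothetical.

```python
import numpy as np

def channel_from_d(d_hat, c_period, L_u):
    """Least-squares h_q(l), 0 <= l <= L_u, from d_hat[m, q-1], rows m = 1..P-1.

    d_hat    : (P, Q) array of coefficient estimates
    c_period : one period of the training sequence, length P
    L_u      : upper bound on the channel length
    """
    P = len(c_period)
    c_m = np.fft.fft(c_period) / P                          # c_m = (1/P) sum_n c(n) e^{-j alpha_m n}
    m = np.arange(1, P)[:, None]
    l = np.arange(L_u + 1)[None, :]
    A = c_m[1:, None] * np.exp(-2j * np.pi * m * l / P)     # (P-1, L_u+1), scaled Vandermonde
    h, *_ = np.linalg.lstsq(A, d_hat[1:, :], rcond=None)    # solves A @ h = d_hat[1:, :]
    return h.T                                              # (Q, L_u+1)
```

The matrix `A` is a row-scaled Vandermonde matrix, so it has full column rank when the P − 1 nodes are distinct, the $c_m$ are nonzero, and P − 1 ≥ L_u + 1, which mirrors the identifiability condition stated above. Overfitting the channel length simply returns (near-)zero taps beyond the true length.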

DETERMINISTIC MAXIMUM-LIKELIHOOD (DML) APPROACH
The first-order statistics-based approach of Section 2 views the information sequence as interference. Since the training and information sequences of a given user pass through an identical channel, this fact can be exploited to enhance the channel estimation performance via an iterative approach. We now consider joint channel and information sequence estimation via an iterative DML (or conditional ML) approach, assuming that the noise v(n) is complex Gaussian. Convergence to a local maximum is guaranteed. Furthermore, if we initialize with our superimposed training-based solution, one is guaranteed the global extremum (minimum error probability sequence estimator) if the superimposed training-based solution is "good."
Using (1)-(3) we then have the following linear model: [...] where V = V + M is a column vector consisting of samples of the noise {v(n)} arranged in a manner similar to (36), H is defined in (24), T(s) is a block Hankel matrix given by [...] (a block Hankel matrix has identical block entries on its block antidiagonals), and $\Sigma_n := [\, e^{j\omega_1 n} I_N \;\; e^{j\omega_2 n} I_N \;\; \cdots \;\; e^{j\omega_Q n} I_N \,]$.
Also using (1)-(3), an alternative linear model for Y is given by [...] where $\hat{s}$ is the estimate of s. In the above we have followed a DML approach assuming no statistical model for the input sequences {s(n)}. Using (39) [...]

The finite alphabet properties of the information sequences can also be incorporated into the DML methods. These algorithms, first proposed by Seshadri [13] for time-invariant SISO systems, iterate between estimates of the channel and the input sequences. At iteration k, with an initial guess of the channel $H^{(k)}$ and the mean $m^{(k)}$, the algorithm estimates the input sequence $s^{(k)}$, and the channel $H^{(k+1)}$ and mean $m^{(k+1)}$ for the next iteration, by [...] where S is the (discrete) domain of s. The optimizations in (47) and (48) are linear least-squares problems, whereas the optimization in (46) can be achieved by using the Viterbi algorithm [11]. Note that (46)-(48) can be interpreted as a constrained alternating least-squares implementation with s ∈ S as the constraint. Since the above iterative procedure involving (46), (47), and (48) decreases the cost at every iteration, one achieves a local maximum of the DML function.
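The alternation between sequence detection and least-squares channel fitting can be sketched as below. This is a simplified illustration and not the exact algorithm of this section: it assumes N = 1, a real binary alphabet {−1, +1} for b(n), known training c(n), s(n) = 0 for n < 0, and a joint least-squares fit of the channel coefficients and the mean in one step. All function names (`viterbi_detect`, `ls_channel`, `dml_iterate`) are hypothetical.

```python
import itertools
import numpy as np

def viterbi_detect(y, h, c, mean=0.0, alphabet=(-1.0, 1.0)):
    """Minimize sum_n |y(n) - mean - sum_l h[n, l] (b(n-l) + c(n-l))|^2 over b."""
    T, L = h.shape[0], h.shape[1] - 1
    states = list(itertools.product(alphabet, repeat=L))   # (b(n-1), ..., b(n-L))
    cost = {st: 0.0 for st in states}
    path = {st: [] for st in states}
    for n in range(T):
        new_cost, new_path = {}, {}
        for st in states:
            for a in alphabet:
                u = (a,) + st                               # b(n), b(n-1), ..., b(n-L)
                # taps reaching before n = 0 are skipped (zero pre-history)
                pred = mean + sum(h[n, l] * (u[l] + c[n - l])
                                  for l in range(min(L, n) + 1))
                total = cost[st] + abs(y[n] - pred) ** 2
                ns = u[:L]                                  # next state drops b(n-L)
                if ns not in new_cost or total < new_cost[ns]:
                    new_cost[ns], new_path[ns] = total, path[st] + [a]
        cost, path = new_cost, new_path
    return np.array(path[min(cost, key=cost.get)])

def ls_channel(y, s, omega, L):
    """Joint least-squares fit of h_q(l) and a constant mean, given the input s."""
    T, Q = len(y), len(omega)
    n = np.arange(T)
    cols = []
    for q in range(Q):
        for l in range(L + 1):
            s_del = np.concatenate([np.zeros(l), s[:T - l]])   # s(n - l)
            cols.append(np.exp(1j * omega[q] * n) * s_del)
    cols.append(np.ones(T))                                    # regressor for the mean
    theta, *_ = np.linalg.lstsq(np.stack(cols, axis=1), y, rcond=None)
    return theta[:-1].reshape(Q, L + 1), theta[-1]

def dml_iterate(y, c, omega, h_q0, mean0=0.0, n_iter=2):
    """Alternate Viterbi detection and least-squares channel/mean updates."""
    h_q, mean = h_q0, mean0
    basis = np.exp(1j * np.outer(omega, np.arange(len(y))))    # (Q, T)
    for _ in range(n_iter):
        h = np.einsum("ql,qn->nl", h_q, basis)                 # h(n; l)
        b_hat = viterbi_detect(y, h, c, mean)
        h_q, mean = ls_channel(y, b_hat + c, omega, h_q.shape[1] - 1)
    return b_hat, h_q, mean
```

In use, `h_q0` would come from the first-order statistics-based estimate, so that the first Viterbi pass starts from a reasonable channel. The survivor paths are stored as lists for readability; a traceback table would be preferable for long records.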

We now summarize our DML approach in the following steps.
(1) (a) Use (34) to estimate the channel using the first-order (cyclostationary) statistics of the observations. Denote the channel estimates by $H^{(1)}$ and $h_q^{(1)}(l)$. In this method {c(n)} is known and {b(n)} is regarded as interference.

SIMULATION EXAMPLES
We now present several computer simulation examples in support of our proposed approach. Example 1 uses an exact CE-BEM representation to generate data whereas Examples 2 and 3 use a 3-tap Jakes' channel to generate data. In all examples CE-BEMs are used to process the observations; therefore, in Examples 2 and 3 we have approximate modeling.
Example 1. In this example we pick an arbitrary value of Q independent of T. In (2) take N = 1, Q = 2, and [...] We consider a randomly generated channel in each Monte Carlo run, with random channel length L ∈ {0, 1, 2} picked with equal probabilities and random channel coefficients $h_q(l)$, 0 ≤ l ≤ L, taken to be mutually independent complex random variables with independent real and imaginary parts, each uniformly distributed over the interval [−1, 1]. The normalized mean-square error (MSE) in estimating the channel coefficients $h_q(l)$, averaged over 100 Monte Carlo runs, was taken as the performance measure for channel identification. It is defined as (before Monte Carlo averaging) [...] The training sequence was taken to be an m-sequence (maximal length pseudorandom binary sequence) of length 7 (= P). The input information sequence {b(n)} is i.i.d. equiprobable 4-QAM. As in [9,16], define a power loss factor $\alpha := \sigma_b^2 / (\sigma_b^2 + \sigma_c^2)$ and power loss $-10 \log_{10}(\alpha)$ dB as a measure of the information data power loss due to the inclusion of the training sequence. Here $\sigma_b^2$ and $\sigma_c^2$ denote the average powers of {b(n)} and {c(n)}, respectively. The training sequence was scaled to achieve a desired power loss. Complex white zero-mean Gaussian noise was added to the received signal and scaled to achieve a desired signal-to-noise ratio (SNR) at the receiver (relative to the contribution of {s(n)}). Our proposed method using $L = L_u = 4$ (channel length overfit) in (34) was applied for varying power losses due to the superimposed training sequence. Figure 1 shows the simulation results. It is seen that as α decreases (i.e., training power increases relative to the information sequence power), one gets better results. Moreover, the proposed method works with overfitting. Finally, adding a nonzero mean (dc offset) to the additive noise yielded essentially identical results (differences do not show on the plotted curves).
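The training set-up of this example can be sketched as follows. The sketch generates a length-7 binary m-sequence with a 3-stage linear feedback shift register and computes the training amplitude needed for a target power loss, assuming the power loss factor is α = σ_b² / (σ_b² + σ_c²) so that σ_c² = σ_b² (1 − α)/α. The particular register seed and feedback taps, and the function names `m_sequence` and `training_scale`, are assumptions of this example; any length-7 m-sequence would serve.

```python
import math

def m_sequence(length=7):
    """+/-1 m-sequence of period 7 from a 3-stage LFSR (feedback from stages 1 and 3)."""
    state = [1, 0, 0]                       # any nonzero 3-bit seed
    out = []
    for _ in range(length):
        out.append(1 - 2 * state[2])        # bit 0 -> +1, bit 1 -> -1
        feedback = state[0] ^ state[2]
        state = [feedback, state[0], state[1]]
    return out

def training_scale(power_loss_db, sigma_b2=1.0):
    """Return (alpha, amplitude) of a unit-modulus training sequence for a given power loss."""
    alpha = 10 ** (-power_loss_db / 10)
    sigma_c2 = sigma_b2 * (1 - alpha) / alpha
    return alpha, math.sqrt(sigma_c2)

if __name__ == "__main__":
    alpha, amp = training_scale(1.0)        # 1 dB power loss
    c_period = [amp * x for x in m_sequence()]
    print(alpha, c_period)
```

The periodic autocorrelation of the generated sequence equals −1 at every nonzero lag, which is the property that makes all nonzero-frequency coefficients $c_m$ equal in magnitude.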
Example 2. Consider (1) with N = 1 and L = 2. We simulate a random time- and frequency-selective Rayleigh fading channel following [20]. For different l's, the h(n; l)'s are mutually independent, and for a given l we follow the modified Jakes' model [20] to generate h(n; l):

$$h(n; l) = X(t)\big|_{t = nT_s}, \tag{57}$$

where

$$X(t) = \frac{2}{\sqrt{M}} \sum_{i=1}^{M} e^{j\psi_i} \cos\!\left( 2\pi f_d t \cos(\alpha_i) + \phi \right), \qquad \alpha_i = \frac{2\pi i - \pi + \theta}{4M}, \quad i = 1, 2, \ldots, M,$$

the random variables θ, φ, and $\psi_i$ are mutually independent and uniformly distributed over [0, 2π), $T_s$ denotes the symbol interval, $f_d$ denotes the (max.) Doppler spread, and M = 25. For a fixed l, (57) generates a random process $\{h(n; l)\}_n$ whose power spectrum approximates the Jakes' spectrum as M ↑ ∞. We consider a system with a carrier frequency of 2 GHz and a data rate of 40 kB (kB = kilo-Bauds), so that $T_s = 25 \times 10^{-6}$ seconds, and a varying Doppler spread $f_d$ in the range 0 Hz to 200 Hz (corresponding to a maximum mobile velocity in the range 0 to 108 km/hr). We picked a data record length of 400 symbols (time duration of 10 msec). For a given Doppler spread, we pick Q as in Section 1.1 (T = 400, L = 2 in (7)). For the chosen parameters it varies within the values {1, 3, 5}. We emphasize that the CE-BEM was used only for processing at the receiver; the data were generated using (57).
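The system parameters quoted in this example can be cross-checked with a few lines of arithmetic. The sketch below uses the standard relation f_d = v f_c / c between mobile speed and maximum Doppler shift; the helper name `doppler_hz` is hypothetical.

```python
C_LIGHT = 299_792_458.0                     # speed of light in m/s

def doppler_hz(speed_kmh, carrier_hz):
    """Maximum Doppler shift f_d = v * f_c / c for a mobile speed in km/h."""
    return (speed_kmh / 3.6) * carrier_hz / C_LIGHT

symbol_rate = 40e3                          # 40 kilo-Bauds
T_s = 1.0 / symbol_rate                     # symbol interval in seconds
record_ms = 400 * T_s * 1e3                 # duration of 400 symbols in ms
f_d_max = doppler_hz(108.0, 2e9)            # close to 200 Hz at 2 GHz
normalized = {f: f * T_s for f in (60.0, 120.0, 200.0)}   # normalized Doppler f_d * T_s

if __name__ == "__main__":
    print(T_s, record_ms, f_d_max, normalized)
```

The normalized Doppler spreads obtained for 60 Hz and 120 Hz are 0.0015 and 0.003, respectively.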
We take all sequences (information and training) to be binary. For superimposed training, we take a periodic (scaled) binary sequence of period P = 7 with a training-to-information sequence power ratio (TIR) of 0.3, where $\mathrm{TIR} := \sigma_c^2 / \sigma_b^2$, and $\sigma_b^2$ and $\sigma_c^2$ denote the average power in the information sequence {b(n)} and the training sequence {c(n)}, respectively. Complex white zero-mean Gaussian noise was added to the received signal and scaled to achieve a target bit SNR at the receiver (relative to the contribution of {s(n)}).
For comparison, we consider conventional time-multiplexed training assuming time-invariant channels, as well as CE-BEM-based periodically placed time-multiplexed training with and without zero-padding, following [7]. In the former, the block of data of length 400 symbols was split into two nonoverlapping blocks of 200 symbols each. Each subblock had a training sequence of length 46 symbols in the middle of the data subblock, with 154 symbols for information; this leads to a training-to-information sequence power ratio (over the block length) of approximately 0.3. Assuming synchronization, time-invariant channels were estimated using conventional training and used for information detection via a Viterbi algorithm; this was done for each subblock. In the CE-BEM set-up, following [7], we took a training block of length 2L + 1 = 5 and a data block of length 17 bits, leading to a frame of length 22 bits. This frame was repeated over the entire record length (22 × 18). Thus, we have a training-to-information bit ratio of approximately 0.3. Two versions of training sequences were considered. In one of them, zero-padding was used with a random bit in the middle of the training block, as in [7]; this leads to a peak-to-average power ratio (PAR) of 5. In the other version we had a random binary sequence of length 5 in each training block, leading to a PAR of 1 (an ideal choice). Assuming synchronization, the CE-BEM channel was estimated using conventional training and used for information detection via a Viterbi algorithm. We also considered another variation of zero-padded training with a training block of length 2L + 1 = 5 but a data block of length 50 bits, leading to a training-to-information bit ratio of 0.1.
Thus the proposed superimposed training scheme results in a data transmission rate that is 30% higher than the data transmission rate in all of the time-multiplexed training schemes considered in this example, except for the last scheme, compared to which the data transmission rate is 10% higher.

Figure 2 shows the BER (bit error rate) based on 500 Monte Carlo runs for conventional training based on time-invariant (TI) modeling, the CE-BEM-based periodically placed time-multiplexed training for PAR = 5 and PAR = 1, the first-order statistics and superimposed training-based method, and the proposed DML approach with two iterations, under varying Doppler spreads $f_d$ and a bit SNR of 25 dB. It is seen that as the Doppler spread $f_d$ increases beyond about 60 Hz (normalized Doppler $T_s f_d$ of 0.0015), the superimposed training approach of Section 2 (step (1)) outperforms the conventional (midamble) training with time-invariant channel approximation, without decreasing the data transmission rate. Furthermore, the proposed DML enhancement can lead to a significant improvement with just one iteration. On the other hand, the CE-BEM-based periodically placed time-multiplexed training approach of [7] significantly outperforms the superimposed training-based approaches, but at the cost of a reduction in the data transmission rate.

Figure 3 shows the normalized channel mean-square error (NCMSE), defined (before averaging over runs) as [...] (59). It is seen that the proposed DML enhancement also leads to a significant improvement in channel estimation with just one iteration.

Example 3. To further compare the relative advantages and disadvantages of CE-BEM-based superimposed training and periodically placed time-multiplexed training, we now repeat Example 2 but with varying SNR; the other details remain unchanged.
Figures 4 and 5 show the simulation results for a Doppler spread of 120 Hz (normalized Doppler spread of 0.003 for a bit duration of $T_s$ = 25 μs), where we compare the results of the second iteration of the proposed DML approach based on superimposed training with those of periodically placed time-multiplexed training. There is an error floor with increasing SNR which is attributable to modeling errors in approximating the Jakes' model with CE-BEM. It is seen from Figure 4 that our proposed approach outperforms (better BER) the CE-BEM-based periodically placed time-multiplexed training approach of [7] for SNRs at or below 10 dB, and underperforms for SNRs at or above 20 dB. There is also the data transmission rate advantage at all SNRs.
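The overhead figures of the time-multiplexed schemes follow from simple ratios, which the sketch below reproduces. The helper names `overhead` and `par` are hypothetical; the rate gain is computed under the assumption that superimposed training uses every symbol slot for information, so the gain over a frame with dedicated training slots is (training + data)/data − 1.

```python
def overhead(train_len, data_len):
    """Return (training-to-information ratio, rate gain of using all slots for data)."""
    ratio = train_len / data_len
    rate_gain = (train_len + data_len) / data_len - 1
    return ratio, rate_gain

def par(block):
    """Peak-to-average power ratio of a training block."""
    powers = [abs(x) ** 2 for x in block]
    return max(powers) / (sum(powers) / len(powers))

if __name__ == "__main__":
    print(overhead(46, 154))          # midamble scheme: about 0.30
    print(overhead(5, 17))            # CE-BEM frame of 22 bits: about 0.29
    print(overhead(5, 50))            # long-frame variant: 0.10
    print(22 * 18)                    # 396 symbols covered in a 400-symbol record
    print(par([0, 0, 1, 0, 0]))       # zero-padded training block
    print(par([1, -1, 1, 1, -1]))     # binary training block
```

A zero-padded block with a single nonzero bit out of five concentrates all training power in one symbol, which is why its peak-to-average power ratio is five times that of a constant-modulus binary block.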

CONCLUSIONS
In this paper we first presented and extended the first-order statistics-based approach of [17] for time-varying (CE-BEM based) channel estimation using superimposed training. Then we extended the first-order statistics-based solution to an iterative approach to joint channel and information sequence estimation, based on CE-BEM, using Viterbi detectors. The first-order statistics-based approach views the information sequence as interference, whereas in the iterative joint estimation version it is exploited to enhance channel estimation and information sequence detection. The results were illustrated via several simulation examples, some of them involving time- and frequency-selective Rayleigh fading, where we compared the proposed approaches to some of the existing approaches. Compared to the CE-BEM-based periodically placed time-multiplexed training approach of [7], one achieves a lower BER for SNRs at or below 10 dB and a higher BER for SNRs at or above 20 dB. There is also the data transmission rate advantage at all SNRs. Further work is needed to compare the relative advantages and disadvantages of CE-BEM-based superimposed training and periodically placed time-multiplexed training.