### 5.1. Pilot Positioning Problem Formulation

Pilot placement can significantly influence equalization performance. Traditionally, pilots have been grouped in large clusters. Recent results, however, indicate that spreading small groups of pilots evenly throughout the data block is a better strategy [29, 30, 39]. Proper pilot placement is particularly important for EM-based algorithms because, in doubly selective channels, the EM objective function is highly nonlinear and has many local maxima. The purpose of the pilots is therefore to provide a sufficiently accurate initialization of the channel parameter vector, so that the probability of convergence to a local (rather than the global) maximum is minimized. First, we reformulate the initialization scheme of Section 3.3 as an equivalent Least Squares (LS) problem. Consider first the case where all transmitted symbols are known. In this case, the channel parameters can be found as the LS solution to the problem

where represents the known transmitted symbols and the BE model-based time variations. More specifically,

where and is an diagonal matrix such that and negative indices denote data symbols from the previous block that affect the observations of the current block through ISI (if no interblock interference is assumed, these can be replaced by zeros). In addition, and is an identity matrix. The LS solution to (18) is

Now consider the case where only some of the transmitted symbols (the pilots) are known, and replace the unknown symbols in with zeros. The solution to this "sparse LS" problem is

where

and is defined similarly to , with nonpilot symbols set to zero. Finally, is obtained by setting to zero all elements in the rows corresponding to nonpilot symbols in the matrix . In Appendix C, we show that the initialization method of Section 3.3 is equivalent to (21). The initialization method is thus equivalent to finding the BE model parameter vector that best fits, in the LS sense, the transmitted pilot sequence. (Note that the noise term in this model is not white, since the data are treated as part of the noise. A weighted least squares initialization would therefore be better; to use it, however, the noise level and the average channel profile need to be known or estimated.) Our goal is to position the pilots so that the initial channel estimate based on these pilots is optimal according to some criterion. Two reasonable criteria for pilot positioning are

where is a vector of the pilot positions in the block. The maximum in (23) and the expectation in (24) are taken with respect to the data symbols, noise, and channel realizations. Using these criteria, it might be possible to optimize both the pilot *positions* and the pilot *patterns*. We, however, select known pilot patterns (e.g., Barker sequences) so that the signal keeps a constant envelope, and we optimize the positioning for the given pattern. The use of these two criteria is detailed in the following sections. Interestingly, both criteria lead to the same positioning scheme for high Doppler rates.
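To make the full-knowledge LS problem (18) and the pilot-only "sparse LS" solution (21) concrete, here is a minimal Python sketch. It rests on assumptions that this section does not state: a complex-exponential BE basis with frequencies (q - Q//2)/N, no interblock interference (symbols before the block are zero), QPSK data, and a noiseless observation. The helpers `ce_bem_basis`, `regression_matrix`, and `ls_channel` are hypothetical names. The code illustrates the idea and is not the authors' implementation.

```python
import numpy as np

def ce_bem_basis(N, Q):
    """Assumed complex-exponential BE basis: column q is exp(j*2*pi*(q - Q//2)*n/N)."""
    n = np.arange(N)[:, None]
    q = np.arange(Q)[None, :] - Q // 2
    return np.exp(2j * np.pi * q * n / N)

def regression_matrix(s, L, Phi):
    """Hypothetical LS regressor: A[n, l*Q + q] = s[n - l] * Phi[n, q].

    Symbols from before the block (n - l < 0) are set to zero, i.e. no interblock interference.
    """
    N, Q = Phi.shape
    A = np.zeros((N, L * Q), dtype=complex)
    for l in range(L):
        delayed = np.concatenate([np.zeros(l, dtype=complex), s[:N - l]])
        A[:, l * Q:(l + 1) * Q] = delayed[:, None] * Phi
    return A

def ls_channel(y, s, L, Phi):
    """LS estimate of the stacked BE coefficients [c_0; ...; c_{L-1}] for symbol vector s."""
    A = regression_matrix(s, L, Phi)
    return np.linalg.lstsq(A, y, rcond=None)[0]

rng = np.random.default_rng(0)
N, L, Q = 64, 2, 3                      # block length, number of taps, BE basis size
Phi = ce_bem_basis(N, Q)
s = np.exp(1j * np.pi / 4 * (2 * rng.integers(0, 4, N) + 1))   # random QPSK block
c_true = (rng.standard_normal(L * Q) + 1j * rng.standard_normal(L * Q)) / np.sqrt(2)
y = regression_matrix(s, L, Phi) @ c_true                      # noiseless observation

# All symbols known: LS recovers the BE coefficients exactly.
c_full = ls_channel(y, s, L, Phi)

# Pilots only: nonpilot symbols are replaced by zeros ("sparse LS").
pilot_mask = np.zeros(N, dtype=bool)
pilot_mask[::8] = True
s_pilots = np.where(pilot_mask, s, 0)
c_init = ls_channel(y, s_pilots, L, Phi)
```

Even without noise, `c_init` differs from `c_true`: the data symbols are still present in `y`, and the sparse LS treats them as noise. This is the non-white "noise" term that the weighted-LS remark above refers to.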

### 5.2. Worst Case Analysis for Flat Fading Channels

In this section, we find the best positioning scheme by using (23). First, notice that the criterion may be decomposed into two terms because

where the second equality holds because, by construction of , the term is orthogonal to the span of the matrix to which belongs. Note that only the second term depends on the pilot positions. The best pilot positioning clearly depends on the channel and noise realizations. Our goal is to obtain a positioning scheme suitable for all channel, data, and noise realizations by optimizing it with respect to the worst-case realizations. Using (19) and (22), the second term in (25) may be bounded by

where and is the largest eigenvalue of the matrix . For flat fading channels (and any PSK constellation) and, therefore,

where is the vector of eigenvalues of the matrix . The second equality follows from the fact that for flat fading channels and (assume that is an eigenvalue of , that is, ; define ; then and . The matrices and therefore have the same eigenvalues).
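The worst-case argument relies on two standard linear-algebra facts. The first is the Pythagorean split behind the decomposition (25): the residual of the full LS fit is orthogonal to the column span of the regression matrix, so for any other estimate the squared error splits into a fixed term plus a term that depends only on the estimate. The second is that XX^H and X^H X share their nonzero eigenvalues, which is the pattern the parenthetical argument above appears to follow. The sketch below checks both numerically. The random complex matrices are stand-ins for the paper's matrices, which are not reproduced here, and `c_any` is a hypothetical placeholder for a pilot-based estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fact 1: for the LS solution c_ls, ||y - A c||^2 = ||y - A c_ls||^2 + ||A (c_ls - c)||^2
# for any c, because the normal equations give A^H (y - A c_ls) = 0.
A = rng.standard_normal((20, 4)) + 1j * rng.standard_normal((20, 4))
y = rng.standard_normal(20) + 1j * rng.standard_normal(20)
c_ls = np.linalg.lstsq(A, y, rcond=None)[0]
c_any = rng.standard_normal(4) + 1j * rng.standard_normal(4)   # e.g. a pilot-based estimate
lhs = np.linalg.norm(y - A @ c_any) ** 2
rhs = np.linalg.norm(y - A @ c_ls) ** 2 + np.linalg.norm(A @ (c_ls - c_any)) ** 2

# Fact 2: X X^H (5x5) and X^H X (3x3) share their nonzero eigenvalues;
# the two extra eigenvalues of the larger matrix are zero.
X = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
eig_big = np.linalg.eigvalsh(X @ X.conj().T)     # ascending order
eig_small = np.linalg.eigvalsh(X.conj().T @ X)   # ascending order
```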

It follows that the worst-case MSE is minimized by finding a pilot position vector such that

The matrix is a deterministic function of the BE functions, block size, pilot positions, and pilot pattern (sequence). It is therefore possible to find the best positioning scheme for the desired block size, BE model, and pilot sequence with a computer search. For simplicity, we limit the search to patterns in which the pilots are grouped in groups of length and these groups are spread throughout the block as evenly as possible. Consequently, the positioning found by this limited search is optimal only among positionings with evenly spaced pilot clusters. However, all previous works on pilot positioning arrived at positioning schemes that are consistent with this structure. It turns out that the best positioning scheme is obtained with for all tested block sizes. Interestingly, this result is identical to that in [29], which was obtained using a different channel model and criterion.
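The restricted search can be sketched in Python as follows. The hypothetical helper `cluster_positions` places P pilots in P/G clusters of length G and centers each cluster in an equal share of an N-symbol block. The matrix in (28) is not reproduced in this section, so the sketch scores each candidate with a stand-in, `worst_case_proxy`. This proxy is the largest eigenvalue of the inverse Gram matrix of a flat-fading, complex-exponential BE regression restricted to the pilot rows. It captures only the noise-enhancement part of the pilot-only LS error and ignores the data-interference term. For unit-modulus pilots the pilot values cancel in this proxy. Its numbers are an illustration, not the paper's results.

```python
import numpy as np

def cluster_positions(N, P, G):
    """Hypothetical helper: P pilots in P // G clusters of length G, spread evenly over N symbols."""
    if P % G:
        raise ValueError("P must be a multiple of G")
    n_groups = P // G
    spacing = N / n_groups
    if G > spacing:
        raise ValueError("clusters of length G would overlap")
    # Centre each cluster inside its share of the block.
    starts = np.floor(np.arange(n_groups) * spacing + (spacing - G) / 2).astype(int)
    return (starts[:, None] + np.arange(G)[None, :]).ravel()

def ce_bem_basis(N, Q):
    """Assumed complex-exponential BE basis: column q is exp(j*2*pi*(q - Q//2)*n/N)."""
    n = np.arange(N)[:, None]
    q = np.arange(Q)[None, :] - Q // 2
    return np.exp(2j * np.pi * q * n / N)

def worst_case_proxy(N, P, G, Q=3):
    """Stand-in score: largest eigenvalue of (Phi_p^H Phi_p)^{-1} for a flat-fading channel.

    With unit-modulus pilots, Phi_p^H D^H D Phi_p = Phi_p^H Phi_p, so only positions matter.
    """
    Phi_p = ce_bem_basis(N, Q)[cluster_positions(N, P, G)]
    gram = Phi_p.conj().T @ Phi_p
    # Largest eigenvalue of the inverse equals 1 / smallest eigenvalue of the Gram matrix.
    return float(1.0 / np.min(np.linalg.eigvalsh(gram)))

for G in (1, 2, 5, 10):
    print(G, worst_case_proxy(100, 10, G))
```

Under this proxy, evenly spread single pilots make the sampled basis columns orthogonal, which gives the smallest possible score (1/P). A single cluster of ten pilots makes the columns nearly collinear.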

### 5.3. Mean Case Analysis for Frequency Selective and Frequency Flat Channels

In this section, we optimize (24). We begin with the approximation

where

and is defined as

Note that an exact expression (with no approximation) may be obtained by replacing with . When either or is an information symbol (not a pilot), this product is a random variable, uniformly distributed over a finite set of values with zero mean. As a result, for long enough blocks, the contributions of the information symbols to the sum in (30) cancel out and the approximation is fairly accurate. Our criterion may therefore be approximated by

The analysis that follows should be considered valid only for large enough block sizes where (29) is accurate. The expectation of the approximated metric is

The autocorrelation matrix is composed of submatrices, where the submatrix is . Using the standard assumption that the channel's paths are statistically independent (assumption A2), we may express the autocorrelation matrix as a linear combination of the contributions of the channel paths, that is,

Using assumptions A1-A2 and (3), the entry in the submatrix (or equivalently, the element in the matrix ) is

where

The best pilot positioning scheme is therefore

This expression is deterministic and depends only on the BE functions, block size, noise variance, channel order (), Doppler rate, and pilot sequence. It is therefore possible to find the best pilot positioning for a particular set of parameters by evaluating (37) for various . As in the previous section, we limit the search to patterns in which the pilots are grouped in groups of length , and these groups are spread evenly throughout the block. This positioning strategy coincides with the pilot positioning in [39] for , with that in [30] for , and with that in [29] for . In addition, every group of pilots is a Barker sequence of length . Barker sequences are known to enable good channel estimation because of their autocorrelation properties. We define the positioning metric as the quantity minimized in (37). Typical behavior of this metric is shown in Figure 2 (computed from (37) and in agreement with simulation results).
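Two symbol-level properties support the mean-case analysis. First, cross terms that involve information symbols average out over a long block, which is what the approximation (29) relies on. Second, Barker pilot groups have aperiodic autocorrelation sidelobes of magnitude at most 1. The Python sketch below checks both. It assumes QPSK data and uses a generic cross term s[n] s*[n-lag] e^{j2πqn/N}. The exact products in (30) are not reproduced in this section, so `mean_cross_term` and its `lag`/`q` parameters are illustrative. The table lists one representative of each known binary Barker code (lengths 2, 3, 4, 5, 7, 11, and 13). A group length outside this set therefore has no binary Barker sequence.

```python
import numpy as np

rng = np.random.default_rng(2)
QPSK = np.exp(1j * np.pi / 4 * np.array([1, 3, 5, 7]))

# Binary Barker codes, one representative for each known length.
BARKER = {
    2: [1, -1],
    3: [1, 1, -1],
    4: [1, 1, -1, 1],
    5: [1, 1, 1, -1, 1],
    7: [1, 1, 1, -1, -1, 1, -1],
    11: [1, 1, 1, -1, -1, -1, 1, -1, -1, 1, -1],
    13: [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1],
}

def max_sidelobe(code):
    """Largest |aperiodic autocorrelation| over all nonzero lags."""
    x = np.asarray(code)
    r = np.correlate(x, x, mode="full")   # lags -(G-1)..(G-1); zero lag at index G-1
    return int(np.max(np.abs(np.delete(r, len(x) - 1))))

def mean_cross_term(N, lag=1, q=1, trials=200):
    """Average |(1/N) sum_n s[n] conj(s[n-lag]) exp(j*2*pi*q*n/N)| over random QPSK blocks.

    lag and q are illustrative; this is a generic data cross term, not the exact term in (30).
    """
    n = np.arange(lag, N)
    phase = np.exp(2j * np.pi * q * n / N)
    vals = []
    for _ in range(trials):
        s = rng.choice(QPSK, N)
        vals.append(abs(np.sum(s[n] * np.conj(s[n - lag]) * phase)) / N)
    return float(np.mean(vals))

for G, code in BARKER.items():
    print(G, max_sidelobe(code))
print(mean_cross_term(100), mean_cross_term(10_000))
```

The cross term shrinks roughly as 1/sqrt(N), which is why the approximation is restricted to large enough blocks.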

The optimal positioning strategy depends on the Doppler rate and the number of pilots in the block. As Figure 2 shows, for low Doppler rates it is better to use grouped pilots, as also indicated by [29, 30] (although the difference is not very significant, at least for short delay spreads). For high Doppler rates and a small number of pilots, however, using leads to much better results. The reason is a tradeoff between accurately estimating the multipath at specific points in time (better achieved by grouping the pilots) and tracking the channel time variations (better achieved by spreading the pilots throughout the block). Our results indicate that for high velocities, using leads to a lower metric value because it improves the ability to track time variations. Note that this result was obtained for a severe-ISI channel with three equal-energy paths (a similar result was obtained for a channel with five equal-energy paths). We have also simulated channels with less severe ISI (that is, decaying power profiles), and, as could be expected, the advantage of using was even larger. The switching point (the Doppler rate beyond which it is advantageous to use ) depends mainly on the percentage of pilots in the block: the larger the number of pilots, the higher the Doppler rate at which the switch occurs. With a large number of pilots, there are enough groups in the block to track the path time variations even when the group size is kept at , so both the multipath profile and the time variations can be estimated accurately. We, however, are interested in the smallest number of pilots that enables good performance, and under these conditions is advantageous even for moderate Doppler rates (see Figure 2). This conclusion is somewhat surprising, as it differs from the previous conclusions in [29, 30]. However, these previous works used different channel models and performance criteria. Moreover, both works considered only pilot groups equal to [29] or [30] or longer, to facilitate their analysis.
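The Doppler rate and the power profile enter (37) through the channel autocorrelation. This is why equal-energy and decaying profiles, and slow and fast fading, can favor different group lengths. Assumption A1 and equation (3) are not reproduced in this section, so the Python sketch below assumes a Jakes (Clarke) time correlation for each path. Following assumption A2, it builds the autocorrelation of the stacked path gains as a sum of independent per-path contributions. The paper's matrix may be expressed in terms of BE coefficients rather than time samples. The helpers `bessel_j0` and `path_autocorrelation` are hypothetical names, and J0 is computed from its integral representation so that only NumPy is needed.

```python
import numpy as np

def bessel_j0(x, n_angles=256):
    """J0(x) = (1/(2*pi)) * integral over [0, 2*pi) of cos(x*sin(t)) dt, via the rectangle rule.

    The integrand is smooth and periodic, so an equispaced rule is very accurate for moderate x.
    """
    t = 2 * np.pi * np.arange(n_angles) / n_angles
    x = np.asarray(x, dtype=float)
    return np.mean(np.cos(x[..., None] * np.sin(t)), axis=-1)

def path_autocorrelation(N, fd_ts, powers):
    """Autocorrelation of the stacked path gains [h_0(0..N-1); h_1(0..N-1); ...].

    Assumes independent paths with Jakes (Clarke) time correlation:
    E[h_l(n) h_k(m)^*] = powers[l] * J0(2*pi*fd_ts*(n - m)) if l == k, else 0.
    fd_ts is the normalized Doppler rate f_D * T_s.
    """
    lags = np.arange(N)[:, None] - np.arange(N)[None, :]
    R_time = bessel_j0(2 * np.pi * fd_ts * lags)
    # Sum over paths of powers[l] * (E_ll kron R_time): a block-diagonal matrix.
    return np.kron(np.diag(powers), R_time)

equal_energy = [1 / 3, 1 / 3, 1 / 3]        # three equal-energy paths
R_slow = path_autocorrelation(32, 0.01, equal_energy)
R_fast = path_autocorrelation(32, 0.05, equal_energy)
print(R_slow[0, 10], R_fast[0, 10])         # correlation 10 symbols apart, path 0
```

A faster fade decorrelates the path gains within a few symbols. This is consistent with the tradeoff described above, where spreading the pilots helps track the time variations.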