EURASIP Journal on Applied Signal Processing 2002:12, 1437–1447
© 2002 Hindawi Publishing Corporation

A Suboptimal Iterative Method for Maximum-Likelihood Sequence Estimation in a Multipath Context

We present an iterative semiblind suboptimal maximum-likelihood sequence estimation (MLSE) method for single-carrier block transmission over stationary multipath channels. This suboptimal ML detector is based on an iterative least squares with projection (ILSP) algorithm exploiting both the finite alphabet properties of the transmitted signal and its cyclic prefixed structure in order to approach ML detection in a cheap way. Since the initial channel estimate is crucial for the convergence speed of the ILSP algorithm, we propose a new low-complexity stochastic method for providing an initial channel estimate. We therefore rely on some known symbols that are provided by a variant of cyclic prefix only (CP-Only) transmission, known as the known symbol padding only (KSP-Only) technique. The resulting channel model is sufficiently accurate to be used as a starting point for the iterations. The final result is a direct symbol estimation method that is characterised by its low computational complexity and its promising results in terms of bit error rate (BER).


INTRODUCTION
The constantly increasing need for high-data-rate transmission systems has driven research in broadband communications in recent years. Multipath effects, which result in frequency-selective fading, are a major impediment to these broadband communication systems since they introduce intersymbol interference (ISI), which needs to be tackled by appropriate techniques. However, these techniques are most often computationally demanding. In this context, block transmission techniques (multicarrier or single-carrier) based on the use of a cyclic prefix (CP) have attracted a lot of attention in recent years because they allow an efficient and computationally cheap ISI cancellation procedure [1,2]. ISI can be suppressed by a single-tap "frequency-domain" equalization on blocks of data symbols using FFT and IFFT operations, relying on the fact that the blocks of transmitted data are made cyclic by the use of a CP whose length is at least equal to the channel order. More details on these techniques are presented in Section 1.1.
In this paper, we consider the case where the transmitter sends data over an unknown convolutive channel and where the receiver has no a priori information on that channel. The problem in this case consists of estimating the transmitted data without having an explicit channel model. The different approaches to solving this problem fall under two categories. The first category of methods aims at computing an equalizer that will filter the received data in order to cancel the effects of the convolutive channel. This can be done using blind or semiblind techniques that either directly search an equalizer or first compute a channel model that is then used to compute an equalizer. The second category, which is the one that will be investigated here, aims at maximizing the a posteriori probability of the transmitted sequence exploiting the finite alphabet properties of the transmitted signals. These techniques are known as maximum-likelihood (ML) techniques.
Their main advantage is their performance; when optimal ML decoding is done, the lowest achievable error floor for the estimation of the transmitted sequence is obtained. However, their major impediment is their computational complexity since most of them use exhaustive or trellis searching techniques in order to estimate the transmitted sequence. Suboptimal techniques aim at approaching the ML detector with a limited computational complexity. The problem of ML estimation of data transmitted over an unknown channel and some recently proposed techniques are briefly presented in Section 1.2.

Block transmission techniques
Many different block transmission techniques have been proposed in recent years. We briefly describe some of them here and discuss their relative advantages and drawbacks. Multicarrier block transmission techniques based on a CP (i.e., discrete multitone (DMT) techniques, also known as orthogonal frequency division multiplexing (OFDM)) perform an IFFT at the transmitter, after which the CP is added; the receiver performs an FFT followed by a one-tap frequency-domain equalization. These techniques can be used in combination with carrier loading, which allows the transmitter to optimize the power spectral density (PSD) of the transmitted signal as a function of the channel (see [1, page 7] and [3]). If the channel is unknown to the transmitter, however, this loading is not applicable and the performance of the system becomes very sensitive to deep channel fades in the frequency domain. A solution that is often used to reduce this sensitivity consists of encoding the data across the tones, at the cost of the increased complexity of the encoding and decoding operations. Another drawback of classical DMT is the occurrence of large peaks in the transmitted signal, known as the peak-to-average power ratio (PAPR) problem [4]. In this paper, we consider the case where the channel is unknown to the transmitter, making carrier loading inapplicable. We therefore focus on single-carrier block transmission techniques based on a CP (i.e., CP-Only techniques), see [2, page 36] and [5, pages 103-104], where the transmitter simply adds a CP to every block of data symbols. This solves the PAPR problem, since the transmitted data sequences now show finite alphabet properties, while keeping the advantage of computationally cheap ISI mitigation. CP-Only can be seen as a classical OFDM transmission scheme where the data are encoded across the tones by means of an FFT.
The sensitivity to channel fades in the frequency domain is therefore reduced by this technique, but channel-irrespective symbol recovery is not guaranteed (it fails if the channel has a zero on the FFT grid). A recently proposed variant of CP-Only is known as the known symbol padding only (KSP-Only) technique [6,7,8]. A sequence of known symbols is padded to every block of transmitted data symbols, which makes the data pseudocyclic and thus allows the same equalization scheme as in the classical CP-Only context. The known symbols can be further exploited as a training sequence for channel estimation while the percentage of redundant symbols remains roughly the same. Zero-padding (ZP) techniques [2], which were also proposed recently, can be seen as a special case of the KSP-Only techniques where the known symbols are all set to zero. The advantage of these KSP-Only techniques is twofold: they allow new channel estimation techniques exploiting the knowledge of the padded symbols (see, e.g., [7]) and they also allow channel-irrespective symbol recovery.

Blind maximum-likelihood sequence estimation techniques
The problem of blindly estimating a sequence that was transmitted over a convolutive channel $\mathbf{h}$ of order $L$ is stated as follows. Let $\mathbf{x}$ be the transmitted sequence and $\mathbf{y}$ the received sequence. The goal is to find, among all possible combinations of $\mathbf{x}$ and $\mathbf{h}$, the pair $\hat{\mathbf{x}}$ and $\hat{\mathbf{h}}$ that yields the highest probability of observing the received sequence $\mathbf{y}$,

$$(\hat{\mathbf{x}}, \hat{\mathbf{h}}) = \arg\max_{\mathbf{x},\,\mathbf{h}} \; p(\mathbf{y} \mid \mathbf{x}, \mathbf{h}).$$

This maximization problem has to be solved under the constraint that the elements of the transmitted sequence belong to a finite alphabet while the channel coefficients are unconstrained. When the noise at the receiver is AWGN (additive white Gaussian noise), which is a common assumption, this is equivalent to a joint least squares (LS) minimization problem

$$(\hat{\mathbf{x}}, \hat{\mathbf{h}}) = \arg\min_{\mathbf{x},\,\mathbf{h}} \; \|\mathbf{y} - \mathbf{x} * \mathbf{h}\|^2, \tag{2}$$

where $*$ denotes the linear convolution operator. Since this problem is separable in its two variables $\mathbf{x}$ and $\mathbf{h}$, it can be solved in two steps. In a first step, we compute the LS channel estimate for every possible transmitted sequence $\mathbf{x}$. In a second step, the value of the cost function (2) is computed for all possible candidates. The joint ML sequence and channel estimator chooses the pair $\mathbf{x}$, $\mathbf{h}$ with the smallest cost function.
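The two-step exhaustive procedure can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes real BPSK symbols, a hypothetical 2-tap channel (order L = 1), a noiseless received sequence, and a full linear convolution with silence before and after the sequence. The names `convolve`, `ls_channel`, and `joint_ml` are introduced here for illustration only.

```python
from itertools import product

def convolve(x, h):
    """Full linear convolution of two real sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for l, hl in enumerate(h):
            y[i + l] += xi * hl
    return y

def ls_channel(x, y):
    """LS estimate of a 2-tap channel (assumed order L = 1) for a hypothesised x."""
    # Columns of the convolution matrix: x and x delayed by one sample.
    c0 = list(x) + [0.0]
    c1 = [0.0] + list(x)
    a = sum(v * v for v in c0)
    b = sum(u * v for u, v in zip(c0, c1))
    d = sum(v * v for v in c1)
    r0 = sum(u * v for u, v in zip(c0, y))
    r1 = sum(u * v for u, v in zip(c1, y))
    det = a * d - b * b  # 2 x 2 normal equations solved by Cramer's rule
    return [(d * r0 - b * r1) / det, (a * r1 - b * r0) / det]

def joint_ml(y, n_symbols):
    """Enumerate every BPSK sequence and keep the pair with the lowest cost (2)."""
    best = None
    for x in product((1.0, -1.0), repeat=n_symbols):
        h = ls_channel(x, y)                      # step 1: LS channel for this x
        cost = sum((u - v) ** 2 for u, v in zip(y, convolve(x, h)))  # step 2
        if best is None or cost < best[0]:
            best = (cost, list(x), h)
    return best

x_true = [1.0, 1.0, 1.0, -1.0, 1.0]
h_true = [1.0, 0.5]
y = convolve(x_true, h_true)
cost, x_hat, h_hat = joint_ml(y, len(x_true))
```

The loop visits 2^5 = 32 candidates here; the count doubles with every additional symbol, which is the exponential cost discussed next. Note that the pair (−x, −h) yields the same cost as (x, h), so a blind detector of this kind can only recover the sequence up to a sign.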
This exhaustive search procedure is optimal but computationally prohibitive, its cost being exponential in the length of the transmitted sequence. Cheaper optimal decoding techniques have been proposed. A first family of solutions proposes iterative algorithms that rely upon an initial channel estimate. They find the ML transmitted sequence under the hypothesis that the real channel is equal to the available channel estimate. The channel is then re-estimated based on the decoded sequence. These algorithms proceed iteratively until convergence. The different proposed solutions differ in the way they estimate the ML sequence. The K-means algorithm [9] uses the Viterbi algorithm to decode the sequence, while the EM algorithm [10] computes the likelihoods of all possible data sequences conditioned on the available channel estimate.
In [11], noniterative blind methods are proposed. A first proposed optimal technique performs decoding using Viterbi-related trellis search techniques on several hypothesized trellises. The computational cost of this optimal technique remains prohibitive enough to discount its use in practical systems. A suboptimal blind trellis searched technique whose complexity is comparable to adaptive Viterbi decoding with exact channel knowledge was proposed in [11] as well. Here, instead of retaining a single survivor into each state as is done in classical Viterbi, the M best paths into a state are retained. The channel model is then updated for each of the retained paths as in adaptive Viterbi. Each survivor path is thus coupled to a channel estimate. The channel model associated with the path with the lowest metric converges to the real channel model whatever the initial channel estimate. Here again the computational cost, which is exponential in the channel order, remains problematic especially in broadband communications where the channels have long impulse responses.
When ISI is not present in the transmission system, the problem of blindly estimating the transmitted sequence becomes much less complex, thanks to the absence of memory in the channel. In [12], iterative methods exploiting finite alphabet properties of the transmitted signals have been proposed in a MIMO context for such systems. One of these proposed methods, known as iterative least squares with projection (ILSP), approaches ML detection of the transmitted sequence whilst maintaining the complexity at an acceptably low level.

Proposed method
In this paper, we aim at performing blind MLSE (maximum-likelihood sequence estimation) or suboptimal blind sequence estimation in the context of multipath channels whilst maintaining the computational complexity at affordable levels. When block transmission techniques using a CP are used, the ISI is totally cancelled, creating flat-fading channels in the frequency domain. This property allows us to use simpler MLSE methods that are designed for flat-fading channels, but these methods have to be applied in the frequency domain instead of the time domain, as is usually done.
We use CP-Only transmission techniques rather than classical DMT in order to reduce the sensitivity of the system to the frequency-domain fades of the unknown channel. To approach MLSE, we propose a novel ILSP algorithm that exploits both the cyclic prefixed structure of the transmitted signals and their finite alphabet properties. The idea of combining block transmission techniques with MLSE techniques designed for flat-fading channels is not new. It has already been used in [13], where a modified ILSP procedure for OFDM systems is proposed. This method reconstructs the frequency-domain channel matrix from a time-domain estimate of the channel with a limited number of taps. In the algorithm proposed here, this tap constraint on the frequency-domain channel matrix is relaxed. Experimental results show that this difference is crucial for the performance of the algorithm, allowing our method to perform much better in terms of bit error rate (BER) than the existing one. The reasons for this improved performance are discussed in more detail in Section 6.1.
Experimental results show that the starting point of the iterations is critical for the convergence speed of the ILSP algorithm. In order to reduce the number of iterations before convergence, we need a sufficiently accurate initial channel estimate. We rely therefore on KSP-Only modulation which is a special case of CP-Only. The knowledge of the padded symbols allows us to find an initial channel estimate in a cheap way using a new stochastic method that is described in this paper. This initial channel estimate is the only reason for using KSP-Only instead of CP-Only; the ILSP procedure can be used in a classical CP-Only framework provided that a sufficiently accurate initial channel estimate is made available by other means.
We end up with a semiblind iterative suboptimal MLSE method in the context of CP-Only block transmission over unknown stationary multipath channels, which is characterised by both a low computational complexity and a high BER performance.
The rest of the paper is structured as follows. In Section 2, we present the data model that will be used in this paper. For the sake of simplicity, we directly present the model for KSP-Only; the extension to CP-Only is straightforward. In Section 3, we present the ML detector in the CP-Only context. In Section 4, we propose an ILSP algorithm suited for this context whose performance approaches the ML detector. In Section 5, we propose a stochastic method that exploits the known symbols and provides a sufficiently accurate channel estimate for initialising the ILSP algorithm. In Section 6, we compare the proposed method with the existing ones, namely [11,13]. In Section 7, we present more detailed simulation results, and we finally draw some conclusions in Section 8.

DATA MODEL
Scalars are represented with small letters, vectors with boldface small letters, and matrices with boldface capital letters. We are working in a block transmission context where successive blocks of data are transmitted one after another, forming a long sequence referred to as a burst. A superscript indicates the block index within the burst; a subscript indicates the time index. Hence, $\mathbf{x}^i$ is the sequence constituting the $i$th transmitted block of the burst, whereas $x^i_j$ is the $j$th element of the $i$th data block.
The data model we present describes a known symbol padding system with ISI. We first define a transmission channel of order $L$,

$$\mathbf{h} = [h_0 \;\; h_1 \;\; \cdots \;\; h_L]^T,$$

where $(\cdot)^T$ denotes transpose, and define a training sequence $\mathbf{t}$ of length $T$,

$$\mathbf{t} = [t_1 \;\; t_2 \;\; \cdots \;\; t_T]^T.$$

Let $P$ be the transmitted block size. The sequence of transmitted data symbols is organised in blocks of length $B = P - T$. The $n$th transmitted data block is defined as

$$\mathbf{s}^n = [s_{(n-1)B+1} \;\; \cdots \;\; s_{nB}]^T,$$

where $s_i$ is the $i$th transmitted data symbol.
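The transmitter transmits a burst of $K$ blocks where every block is padded with the sequence $\mathbf{t}$. The total transmitted sequence is thus $\mathbf{x} = [\mathbf{t}^T \;\, \mathbf{s}^{1T} \;\, \mathbf{t}^T \;\cdots\; \mathbf{t}^T \;\, \mathbf{s}^{KT} \;\, \mathbf{t}^T]^T$. We consider the channel to be constant across the transmission of an entire burst. Define the matrix of transmitted data omitting the first transmitted training sequence as

$$\mathbf{X} = \begin{bmatrix} \mathbf{s}^1 & \mathbf{s}^2 & \cdots & \mathbf{s}^K \\ \mathbf{t} & \mathbf{t} & \cdots & \mathbf{t} \end{bmatrix}.$$

The received sequence is the convolution of the transmitted sequence with the channel,

$$y_i = \sum_{l=0}^{L} h_l \, x_{i-l} + n_i, \tag{7}$$

where $x_i$ is the $i$th element of $\mathbf{x}$ and $n_i$ is the AWGN at the receiver. The received symbols are organized in $K$ blocks of length $P$, with $\mathbf{y}^n = [y_{T+(n-1)P+1} \;\cdots\; y_{T+nP}]^T$ defined as the $n$th received block. These blocks are organized in a $(P \times K)$ matrix

$$\mathbf{Y} = [\mathbf{y}^1 \;\; \mathbf{y}^2 \;\; \cdots \;\; \mathbf{y}^K].$$

Define the $(P \times P)$ circulant channel matrix $\mathbf{H}_{\mathrm{circ}}$ with entries

$$[\mathbf{H}_{\mathrm{circ}}]_{i,j} = h_{(i-j) \bmod P}, \qquad h_l = 0 \;\text{ for }\; l > L.$$

Exploiting the cyclic structure of the transmitted symbols, we obtain

$$\mathbf{Y} = \mathbf{H}_{\mathrm{circ}} \mathbf{X} + \mathbf{N}, \tag{10}$$

where $\mathbf{N}$ is the AWGN matrix. Define the frequency-domain equivalent of the channel

$$\mathbf{h}_f = \mathcal{F}_P \, [\mathbf{h}^T \;\; 0 \;\cdots\; 0]^T,$$

where $\mathcal{F}_P$ is the $P$-point FFT matrix. The circulant channel matrix $\mathbf{H}_{\mathrm{circ}}$ is diagonalized by means of an FFT and an IFFT. This allows us to describe the transmission scheme in a simplified way using FFT and IFFT operations [2],

$$\mathbf{Y} = \mathcal{I}_P \mathbf{H}_f \mathcal{F}_P \mathbf{X} + \mathbf{N},$$

where $\mathcal{I}_P$ is the $P$-point IFFT matrix and $\mathbf{H}_f = \mathrm{diag}(\mathbf{h}_f)$ is a diagonal matrix with the frequency-domain description of the channel on the main diagonal.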
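The equivalence between the linear convolution (7) and the diagonal frequency-domain model can be checked numerically on one block. The sketch below is only an illustration under stated assumptions: toy sizes (B = 5, T = 3, L = 2), no noise, and an unnormalised DFT whose inverse carries the 1/P factor, so that the IFFT matrix is the exact inverse of the FFT matrix. The helper `dft` and all numerical values are hypothetical.

```python
import cmath

def dft(v, inverse=False):
    """Unnormalised P-point DFT; the inverse carries the 1/P factor."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[k] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for k in range(n))
           for m in range(n)]
    return [z / n for z in out] if inverse else out

# Hypothetical toy sizes: B = 5 data symbols, T = 3 padded symbols, L = 2.
h = [1.0, 0.5, -0.25]
t = [1.0, -1.0, 1.0]
s = [1.0, 1.0, -1.0, 1.0, -1.0]
block = s + t                      # one column of X
P = len(block)
stream = t + block                 # the preceding block also ends with t

# Linear convolution (7), noiseless, evaluated on the P samples of the block.
y_linear = [sum(h[l] * stream[len(t) + i - l] for l in range(len(h)))
            for i in range(P)]

# Same block through the diagonal frequency-domain model I_P H_f F_P x.
h_f = dft(h + [0.0] * (P - len(h)))
y_model = dft([a * b for a, b in zip(h_f, dft(block))], inverse=True)

max_error = max(abs(a - b) for a, b in zip(y_linear, y_model))
```

The two outputs agree up to rounding because the T ≥ L known symbols that precede the block are identical to the ones that close it, which is exactly what makes the data pseudocyclic.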
Note that the ratio of training symbols over the total number of transmitted symbols for this KSP-Only context is equal to $L/P$ when $T = L$. In a classical CP-Only context, the percentage of redundant symbols is $(L/P)(1 + L/P)^{-1}$, which tends to $L/P$ as the number of subcarriers increases. This shows that for a sufficiently large number of subcarriers $P$, CP-Only and KSP-Only are equivalent as far as the percentage of redundant symbols is concerned. Hence, we do not waste extra bandwidth using KSP-Only instead of CP-Only [8].

MAXIMUM-LIKELIHOOD SEQUENCE ESTIMATION
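The data model shows that the received signals can be modeled as deterministic sequences corrupted by AWGN. The log-likelihood function of the transmitted data is

$$\mathcal{L} = \alpha - \frac{\beta}{\sigma^2} \sum_{n=1}^{K} \left\| \mathbf{y}^n - \mathcal{I}_P \mathbf{H}_f \mathcal{F}_P \mathbf{x}^n \right\|^2,$$

where $\alpha$ and $\beta$ are constants, $\sigma^2$ is the noise variance, and $\mathbf{x}^n$ is the $n$th column of $\mathbf{X}$. The ML detector maximizes $\mathcal{L}$ with respect to the unknown parameters $\mathbf{H}_f$ and $\mathbf{x}^n$, $n = 1, \ldots, K$, under the following constraint set (CS1):
(i) $\mathbf{x}$ belongs to a finite alphabet,
(ii) $\mathbf{H}_f = \mathrm{diag}(\mathcal{F}_P [\mathbf{h}^T \;\; 0 \;\cdots\; 0]^T)$, that is, $\mathbf{H}_f$ is the frequency-domain description of an $L$th-order channel.

This is equivalent to the following minimization problem (CF1):

$$\min_{\mathbf{H}_f,\,\mathbf{X}} \; \left\| \mathbf{Y} - \mathcal{I}_P \mathbf{H}_f \mathcal{F}_P \mathbf{X} \right\|_F^2, \tag{14}$$

under the same constraints. Multiplying both terms by $\mathcal{F}_P$ and using the notations $\mathbf{X}_f = \mathcal{F}_P \mathbf{X}$ and $\mathbf{Y}_f = \mathcal{F}_P \mathbf{Y}$ yields the following alternative formulation of the ML problem (CF2):

$$\min_{\mathbf{H}_f,\,\mathbf{X}} \; \left\| \mathbf{Y}_f - \mathbf{H}_f \mathbf{X}_f \right\|_F^2, \tag{15}$$

with the same constraint on $\mathbf{H}_f$, and $\mathbf{X}_f$ being constrained by the finite alphabet structure of $\mathbf{X}$. This problem can be solved using exhaustive search procedures as described in Section 1.2, but the cost of this method is totally prohibitive.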
We know that the constraint on $\mathbf{H}_f$ gives it the structure of a diagonal matrix with $L + 1$ degrees of freedom. We investigate what happens if we give $\mathbf{H}_f$ $P$ degrees of freedom, that is, if we allow the elements of its diagonal to vary freely. The constraint set becomes as follows (CS2):
(i) $\mathbf{x}$ belongs to a finite alphabet,
(ii) $\mathbf{H}_f$ is a diagonal matrix with unconstrained elements on the diagonal.

The MLSE problem in our context is thus equivalent to the following:
(i) solving a minimization problem equivalently expressed by CF1 or CF2;
(ii) performing this minimization under either CS1 or CS2.
In the following, we focus on the solution of this problem under CS2. Since the problem is separable in its two variables, the optimization can be carried out in two steps [12], using the two equivalent formulations (14) and (15). We first solve the problem with respect to $\mathbf{H}_f$ using (15). Exploiting the new condition on $\mathbf{H}_f$ (CS2), we force it to have the desired diagonal structure by splitting the problem into $P$ independent subproblems,

$$\hat{\mathbf{H}}_f(j,j) = \arg\min_{\mathbf{H}_f(j,j)} \left\| \mathbf{Y}_f(j,:) - \mathbf{H}_f(j,j)\, \mathbf{X}_f(j,:) \right\|^2, \qquad j = 1, \ldots, P,$$

where $\mathbf{A}(j,:)$ is a shorthand notation for the row vector made of the $j$th row of a matrix $\mathbf{A}$. The solution of these subproblems can be expressed as

$$\hat{\mathbf{H}}_f(j,j) = \frac{\mathbf{Y}_f(j,:)\, \mathbf{X}_f(j,:)^H}{\mathbf{X}_f(j,:)\, \mathbf{X}_f(j,:)^H},$$

where $(\cdot)^H$ denotes complex conjugate transpose. Note that this solution would not be achievable under the initial constraint on $\mathbf{H}_f$ (CS1), all the subproblems being then linked to one another. We thus see that using CS2 allows us to reduce the complexity of the problem. Inserting these results, which only depend on $\mathbf{X}_f$ and $\mathbf{Y}_f$, into (14) enables us to find the optimum sequence by enumerating over all the possible values for the matrix $\mathbf{X}$, computing the value of the cost function, and choosing the matrix that minimizes this cost function. This includes the computation of $\hat{\mathbf{H}}_f$ (and thus $\mathbf{X}_f$) for every possible combination of the inputs. The computational cost of this method, which is exponential in $P$, $K$, and the alphabet size, again severely limits its practical interest.
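The per-tone LS solution is a ratio of two inner products for each tone. The following sketch is a hypothetical helper (`per_tone_ls` is not a name from the paper) that takes the frequency-domain matrices as lists of rows, one row per tone, and assumes that every tone carries some energy so that the denominator is nonzero.

```python
def per_tone_ls(x_f, y_f):
    """x_f, y_f: lists of P rows, each row holding the K values of one tone."""
    estimate = []
    for x_row, y_row in zip(x_f, y_f):
        num = sum(y * x.conjugate() for x, y in zip(x_row, y_row))
        den = sum(abs(x) ** 2 for x in x_row)
        estimate.append(num / den)   # assumes the tone carries some energy
    return estimate

# Made-up noiseless example with P = 3 tones and K = 3 blocks.
h_f_true = [1.0 + 0.5j, 0.2 - 0.7j, -0.9 + 0.1j]
x_f = [[1 + 1j, 2 - 1j, -1j], [0.5, -1 + 2j, 3], [1j, 1, -2 + 1j]]
y_f = [[hj * x for x in row] for hj, row in zip(h_f_true, x_f)]
h_f_hat = per_tone_ls(x_f, y_f)
```

In the noiseless case each tone is recovered exactly, and no tone depends on any other, which is what decouples the P subproblems under CS2.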
In the noiseless case, the solution to this modified problem (CS2) and the initial ML problem (CS1) are the same. When the received sequence is noisy, we cannot guarantee that the derived solution will be equal to the ML solution under CS1. However, we still find the solution of an ML problem, but with a modified CS2. We therefore propose to classify this method as a modified MLSE method.

ITERATIVE LEAST SQUARES WITH PROJECTION
In the previous section, we derived an expression for a modified ML detector. The associated minimization problem is separable in its continuous and discrete variables. In this section, we apply an ILSP algorithm inspired by [12], that uses this separation property to approach the solution of the modified ML problem by iteratively minimizing the cost functions (14) and (15) for one variable and then for the other.
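Assume that we have an initial channel estimate $\hat{\mathbf{H}}_f$. We first minimize the cost function (14) with respect to $\mathbf{X}$ with a fixed $\hat{\mathbf{H}}_f$,

$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X}} \left\| \mathbf{Y} - \mathcal{I}_P \hat{\mathbf{H}}_f \mathcal{F}_P \mathbf{X} \right\|_F^2.$$

This is equivalent to computing a soft estimate of the transmitted symbols, implicitly performing a classical frequency-domain equalization on the received symbols,

$$\hat{\mathbf{X}} = \mathcal{I}_P \hat{\mathbf{H}}_f^{-1} \mathcal{F}_P \mathbf{Y}.$$

Note that $\hat{\mathbf{H}}_f^{-1}$ is a diagonal matrix with $\hat{\mathbf{H}}_f^{-1}(j,j) = \hat{\mathbf{H}}_f(j,j)^{-1}$. This step is followed by a finite alphabet projection of the soft estimates, $\bar{\mathbf{X}} = \mathrm{FAP}(\hat{\mathbf{X}})$, where FAP denotes the finite alphabet projection operation.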
Note that in the special case of KSP-Only, the last $T$ symbols of every column of $\mathbf{X}$ are known in advance and do not need to be re-estimated at each step. The problem of estimating the transmitted sequence then becomes a true least-squares problem in the $B \times K$ matrix $\mathbf{S}$ of unknown symbols rather than $P$ equations in $P$ unknowns; its solution is expressed with $\mathcal{I}_{BP}$, the $B \times P$ partial IFFT matrix made of the first $B$ rows of $\mathcal{I}_P$. We then use this estimate to minimize the cost function (15) with respect to $\mathbf{H}_f$,

$$\hat{\mathbf{H}}_f = \arg\min_{\mathbf{H}_f} \left\| \mathbf{Y}_f - \mathbf{H}_f \bar{\mathbf{X}}_f \right\|_F^2,$$

where $\bar{\mathbf{X}}_f = \mathcal{F}_P \bar{\mathbf{X}}$ is derived from the finite alphabet estimate $\bar{\mathbf{X}}$ of the previous step. Again, we split this into $P$ parallel independent problems to force $\mathbf{H}_f$ to have a diagonal structure,

$$\hat{\mathbf{H}}_f(j,j) = \arg\min_{\mathbf{H}_f(j,j)} \left\| \mathbf{Y}_f(j,:) - \mathbf{H}_f(j,j)\, \bar{\mathbf{X}}_f(j,:) \right\|^2, \qquad j = 1, \ldots, P,$$

the solution of which is

$$\hat{\mathbf{H}}_f(j,j) = \frac{\mathbf{Y}_f(j,:)\, \bar{\mathbf{X}}_f(j,:)^H}{\bar{\mathbf{X}}_f(j,:)\, \bar{\mathbf{X}}_f(j,:)^H}.$$

Since $\mathbf{H}_f^{-1}$ is used in the iterations to estimate $\mathbf{X}$, we can alternatively estimate it directly by solving the following minimization problem:

$$\hat{\mathbf{H}}_f^{-1}(j,j) = \arg\min_{\mathbf{H}_f^{-1}(j,j)} \left\| \mathbf{H}_f^{-1}(j,j)\, \mathbf{Y}_f(j,:) - \bar{\mathbf{X}}_f(j,:) \right\|^2, \qquad j = 1, \ldots, P.$$

This is a simple least squares problem whose solution is

$$\hat{\mathbf{H}}_f^{-1}(j,j) = \frac{\bar{\mathbf{X}}_f(j,:)\, \mathbf{Y}_f(j,:)^H}{\mathbf{Y}_f(j,:)\, \mathbf{Y}_f(j,:)^H}.$$

Note that this modification implies that we perform an implicit MMSE equalizer estimate rather than a direct channel estimate, which also has the benefit of avoiding noise enhancement when the next estimate of $\mathbf{X}$ is computed. This $\hat{\mathbf{H}}_f^{-1}$ is used to compute a new $\hat{\mathbf{X}}$ and we proceed iteratively. The iterations are stopped when two consecutive finite alphabet estimates of the transmitted sequences are identical. The convergence point is a minimum of the cost function associated with the modified ML problem. Note that a fairly good initial channel estimate is needed to avoid an excessive number of iterations and/or the convergence of the algorithm to local minima. In the following section, we therefore provide a cheap channel estimation technique for initializing the algorithm, which allows this iterative procedure to converge to minima that are close or equal to the global minimum containing the ML sequence.

An important advantage of this ILSP method is its low computational complexity. At every iteration, one symbol estimation step and one channel estimation step are performed. The complexity of the symbol estimation step is $O(KP \log P)$. The complexity of the channel estimation step is $O(KP \log P)$, including the computation of $\mathbf{Y}_f$ from $\mathbf{Y}$. The total complexity of the proposed method is thus $O(\log P)$ per symbol and per iteration. Unlike existing methods, the complexity of the proposed one is independent of the channel order and grows only linearly with the length of the transmitted sequence.
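The iteration described in this section can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes BPSK symbols, a plain cyclic (CP-Only) model in which all P symbols of a block are unknown, the direct equalizer update for the inverse of H_f, and a naive O(P²) DFT where a practical implementation would use an FFT. The guard on tones without energy and the names `dft` and `ilsp` are additions made for this example.

```python
import cmath
import random

def dft(v, inverse=False):
    """Unnormalised DFT (naive O(P^2) form); the inverse carries the 1/P factor."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[k] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for k in range(n))
           for m in range(n)]
    return [z / n for z in out] if inverse else out

def ilsp(blocks_y, h_inv_init, max_iter=20):
    """blocks_y: K received blocks of length P (prefix removed).
    h_inv_init: initial per-tone equaliser 1 / H_f(j, j)."""
    h_inv = list(h_inv_init)
    y_f = [dft(y) for y in blocks_y]
    previous = None
    for _ in range(max_iter):
        # Symbol step: one-tap equalisation, IFFT, then BPSK projection (FAP).
        x_hat = []
        for yf in y_f:
            soft = dft([g * v for g, v in zip(h_inv, yf)], inverse=True)
            x_hat.append([1.0 if z.real >= 0 else -1.0 for z in soft])
        if x_hat == previous:      # two identical consecutive hard estimates
            break
        previous = x_hat
        # Equaliser step: independent LS fit of 1 / H_f(j, j) on every tone.
        x_f = [dft(x) for x in x_hat]
        for j in range(len(h_inv)):
            num = sum(xf[j] * yf[j].conjugate() for xf, yf in zip(x_f, y_f))
            den = sum(abs(yf[j]) ** 2 for yf in y_f)
            if den > 1e-12:        # keep the old tap if the tone carries no energy
                h_inv[j] = num / den
    return x_hat, h_inv

# Noiseless toy run with a slightly wrong initial channel estimate.
rng = random.Random(0)
P, K = 8, 6
x_true = [[rng.choice((1.0, -1.0)) for _ in range(P)] for _ in range(K)]
h_f = dft([1.0, 0.5] + [0.0] * (P - 2))
blocks_y = [dft([a * b for a, b in zip(h_f, dft(x))], inverse=True) for x in x_true]
h0_f = dft([1.0, 0.45] + [0.0] * (P - 2))
x_hat, h_inv = ilsp(blocks_y, [1 / v for v in h0_f])
```

With an initial estimate this close to the true channel, the first projection already returns the transmitted blocks and the second iteration only confirms them. Starting far from the true channel may instead lead to a local minimum, as noted above.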
The use of an iterative least squares with enumeration (ILSE) algorithm [12] is an alternative solution to ILSP that always converges. However, in our context, this ILSE procedure has a complexity $O(2^P)$, which makes it unaffordable for most practical systems where the number of subcarriers is large. For a practical system with 64 carriers using BPSK modulation, there are in total $1.84 \times 10^{19}$ different combinations that have to be enumerated. This approach will therefore not be developed here.
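The figure quoted for 64 carriers is simply the number of BPSK sequences of length 64, which can be checked in two lines:

```python
P = 64                                  # number of carriers, BPSK: 2 values each
candidates_per_block = 2 ** P
print(candidates_per_block)             # 18446744073709551616
print(f"{candidates_per_block:.2e}")    # 1.84e+19
```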

INITIAL CHANNEL ESTIMATE
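Define $\mathbf{y}_T^n$ as the $(T + L)$-long vector of received samples capturing all the energy of the $n$th transmitted training sequence, where $n = 1, \ldots, K + 1$. Note that there are $K + 1$ such vectors. Define $\mathbf{T}^n$ as the $(T + L) \times (L + 1)$ matrix of transmitted symbols that contribute to $\mathbf{y}_T^n$. Its $(r, c)$th entry is $x_{(n-1)P + r - c + 1}$, so that, in the case where $L = T$, its first $L$ rows combine training symbols with the last symbols of the preceding data block and its last $L$ rows combine training symbols with the first symbols of the following data block. Using (7), we can express $\mathbf{y}_T^n$ as

$$\mathbf{y}_T^n = \mathbf{T}^n \mathbf{h} + \mathbf{n}^n, \tag{28}$$

where $\mathbf{n}^n$ is the AWGN vector. This expression contains both deterministic and random variables. If we look at the expected value of $\mathbf{y}_T^n$ assuming that $E\{s^i_j\} = E\{n_i\} = 0$ for all $i, j$, which is the case if the noise is zero-mean and if the data symbols are equiprobable, (28) reads

$$E\{\mathbf{y}_T^n\} = \mathbf{T} \mathbf{h},$$

where $\mathbf{T} = E\{\mathbf{T}^n\}$ is the $(T + L) \times (L + 1)$ Toeplitz matrix obtained from $\mathbf{T}^n$ by setting the data symbols to zero, that is, the matrix whose $(r, c)$th entry is $t_{r-c+1}$ for $1 \le r - c + 1 \le T$ and zero elsewhere. Define the average value of $\mathbf{y}_T^n$ as

$$\bar{\mathbf{y}}_T = \frac{1}{K + 1} \sum_{n=1}^{K+1} \mathbf{y}_T^n.$$

It is clear that $\lim_{K \to \infty} \bar{\mathbf{y}}_T = E\{\mathbf{y}_T^n\}$, and we therefore use it as an approximation of $E\{\mathbf{y}_T^n\}$ even with a finite $K$. We can thus write

$$\bar{\mathbf{y}}_T \approx \mathbf{T} \mathbf{h},$$

from which we derive a channel estimate based on the knowledge of the training symbols,

$$\hat{\mathbf{h}} = \mathbf{T}^{\dagger} \bar{\mathbf{y}}_T,$$

where $(\cdot)^{\dagger}$ denotes the pseudoinverse. This channel estimate is presented here as a method for initializing $\mathbf{H}_f$ in the ILSP procedure. Experimental results show that this initial estimate yields good performance of the algorithm, dramatically reducing the number of iterations needed to reach convergence. While any other initial channel estimate can be used as a starting point for the iterations, the advantage of this one is its very low computational complexity.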
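The averaging estimate can be illustrated with a small noiseless burst. The sketch below is an assumption-laden toy, not the authors' implementation: it uses a 2-tap real channel (L = 1), T = 2 training symbols, silence before and after the burst, and data blocks deliberately chosen so that the symbols leaking into the training windows sum to zero. In that contrived case the sample average equals its expectation exactly; with random data the match is only approximate and improves as the number of blocks grows. The function names are hypothetical.

```python
def convolve(x, h):
    """Full linear convolution (silence assumed before and after the burst)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for l, hl in enumerate(h):
            y[i + l] += xi * hl
    return y

def initial_channel_estimate(y, t, n_blocks, block_size):
    """Average the K + 1 training windows and fit a 2-tap channel (L = 1)."""
    T, L = len(t), 1
    windows = [y[n * block_size:n * block_size + T + L] for n in range(n_blocks + 1)]
    y_bar = [sum(w[r] for w in windows) / len(windows) for r in range(T + L)]
    # Columns of the expected training matrix: t and t delayed by one sample.
    c0 = list(t) + [0.0]
    c1 = [0.0] + list(t)
    a = sum(v * v for v in c0)
    b = sum(u * v for u, v in zip(c0, c1))
    d = sum(v * v for v in c1)
    r0 = sum(u * v for u, v in zip(c0, y_bar))
    r1 = sum(u * v for u, v in zip(c1, y_bar))
    det = a * d - b * b  # 2 x 2 normal equations solved by Cramer's rule
    return [(d * r0 - b * r1) / det, (a * r1 - b * r0) / det]

t = [1.0, -1.0]
# Balanced toy data: first and last symbols of the blocks each sum to zero.
data = [[1.0, -1.0, 1.0], [-1.0, 1.0, -1.0], [1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
h_true = [1.0, 0.5]
x = list(t)
for s in data:
    x += s + t
y = convolve(x, h_true)
h_hat = initial_channel_estimate(y, t, n_blocks=len(data),
                                 block_size=len(data[0]) + len(t))
```

The whole estimate costs one averaging pass over the burst and one small LS solve, which reflects the low complexity claimed for the method.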

ILSP for OFDM systems
In [13], an ILSP procedure for joint blind estimation of channel and data symbols in OFDM systems is proposed. As in our method, the data are organised in blocks of $B$ symbols padded with $T$ known symbols, with the condition $T \ge L$.
The known symbols are set to zero in this case. An IFFT is then computed on the data blocks before transmission and no CP is added. This transmission scheme makes it possible to avoid interblock interference by removing $T$ samples at the receiver. The system can be described with a matrix formulation similar to (10) where $\mathbf{H}_{\mathrm{circ}}$ is replaced by some appropriate matrix $\mathbf{A}$. This channel matrix $\mathbf{A}$ is directly derived from the time-domain model of the channel $\mathbf{h}$. The proposed detection algorithm starts with an arbitrary estimate of the channel matrix $\mathbf{A}$. Based on this channel estimate, a soft estimate of the transmitted sequence is computed, followed by a hard decision. Based on this sequence estimate, the channel matrix $\mathbf{A}$ is re-estimated. We could decide to start the next iteration with this new channel matrix estimate. However, the fact that the channel matrix is derived from an $L$th-order channel model can be seen as a constraint on the structure of $\mathbf{A}$. To fulfill this constraint, [13] seeks the $L$th-order channel model that is the best LS fit to the estimated channel matrix. A new channel matrix is then computed based on this estimate of $\mathbf{h}$. The iterations proceed with this last channel matrix estimate.

We now compare this method with the one we propose. First, we slightly modify our transmission scheme by setting the padded sequence to zeros: $\mathbf{t} = [0 \;\cdots\; 0]^T$. This modification makes our transmission scheme the exact CP-Only counterpart of the OFDM scheme of [13]. It also makes the initial channel estimation procedure described in Section 5 inapplicable. As a consequence, our iterations are started with random channel estimates. If the random channel estimate is too far from the real channel, the method may converge to irrelevant local minima. When a bad local minimum is reached, we thus restart the iterations with another random channel estimate until a good convergence is achieved. As a consequence, the total number of iterations needed to converge is greatly increased.
As mentioned earlier, the most important difference between this OFDM method and the one we propose resides in the constraints that are put on the channel matrix. In the OFDM method, the channel matrix $\mathbf{A}$ is forced to be derived from an $L$th-order channel model, allowing only $L + 1$ degrees of freedom to the channel matrix. If this constraint is relaxed, $\mathbf{A}$ has $B^2$ degrees of freedom and the convergence properties of the iterative algorithm become erratic because the problem is under-constrained. In the proposed method, we could have decided to put an equivalent constraint on $\mathbf{H}_f$ by forcing it to be the frequency-domain description of a general $L$th-order channel, allowing $L + 1$ degrees of freedom to the channel matrix. However, in order to reduce the computational complexity of the algorithm, we decided to relax this constraint on $\mathbf{H}_f$, replacing it by the constraint of having a diagonal structure. This increases the number of degrees of freedom of $\mathbf{H}_f$ to $P$ instead of the $L + 1$ offered by the previous constraint. Note that this is still much less than the $P^2$ degrees of freedom we would have had if $\mathbf{H}_f$ had been totally unconstrained.
Experimental results, presented below, further show that relaxing this $L$th-order time-domain equivalent channel constraint on the channel matrix yields spectacular improvements in terms of BER. We first give an intuitive explanation for this improved performance.
When blind MLSE is performed by exhaustive search (see Section 1.2), the cost function (2), (14), or (15) is minimized for every possible input sequence. Iterative methods avoid this computationally prohibitive exhaustive search by only considering a few selected candidates among all the possible sequences. The general procedure is as follows: we start with an initial sequence candidate; the LS channel model associated with this sequence candidate is then computed (this LS channel model is also the ML channel model). The next step consists in selecting a new sequence candidate under the hypothesis that the provided channel model is exact. Finally, the best proposed sequence is selected (the one with the lowest cost function). The technique that is used to select new candidate sequences is crucial for the performance of the iterative methods. Only the candidates selected by this technique will be considered for estimating the transmitted sequence. For instance, the K-means algorithm [9] performs ML detection using the Viterbi algorithm under the hypothesis that the provided channel model is exact in order to select new candidates. The selection procedure used in the existing ILSP method for OFDM systems implicitly performs a zero-forcing (ZF) equalization of the estimated channel. The best possible candidate is obtained by performing such a ZF equalization with the real channel model. However, ZF equalization procedures are known to perform poorly for ill-conditioned channels and low SNRs. This poor selection method severely limits the performance that could possibly be achieved by such methods. The proposed method behaves somewhat differently. As we let the iterations run, $\mathbf{H}_f$ converges to the frequency-domain diagonal matrix that will best project the received sequence onto the known finite alphabet of the transmitted sequence. The $\mathbf{H}_f$ that is computed at each step is thus a sort of frequency-domain best fit between the hypothesised input and the received noisy channel output, instead of the ML channel model for the corresponding input sequence that is usually computed. The noise effects are implicitly included in the computed $\mathbf{H}_f$ since its estimate is based on noisy data observations. We end up with a frequency-domain MMSE estimator that generates much better candidate sequences than the existing ZF sequence estimator. Note that it would be impossible to find this MMSE estimator if $\mathbf{H}_f$ were constrained according to CS1. Thus, using CS2 rather than CS1 not only allows us to reduce the complexity of the method, but also yields better candidate sequences, and thereby increases the performance of the algorithm.
In order to illustrate this discussion, we performed simulations comparing the proposed method with the existing OFDM one. As mentioned earlier, the padded symbols were set to zero and the iterations were started with random initial channel estimates for our method, the initial channel estimate for the OFDM method being set to the identity matrix as recommended by the authors.
In a first experiment, we compare the OFDM method with a modified version of our algorithm that uses the initial constraint set CS1 rather than the modified one. We thus perform an LS estimation of the time-domain channel model $\mathbf{h}$ at each iteration. We then perform an implicit ZF equalization on the received symbols in order to find the next candidate input sequence. This CS1 method performs slightly better in terms of BER than the existing ILSP method for OFDM. The only reason for this is that the proposed method is a CP-Only method; it is thus less sensitive to the frequency selectivity of the transmission channel than the existing OFDM method (see the discussion in Section 1.1).
In a second experiment, we relax the constraint on H_f, only forcing it to have a diagonal structure and applying the second set of constraints CS2, as we did in the rest of the paper. This results in much better BER performance than in the previous experiment. Since the problem is now under-constrained, the convergence properties of the algorithm are much worse than in the previous case, and many more iterations and initial channel guesses are required to reach the optimum. Note that if known symbols were used as we initially proposed, convergence would no longer be problematic, since the starting point would then be accurate enough to achieve fast and cheap convergence.
The results are presented in Figure 1, where the achieved BERs are shown for different SNRs. The experiments were performed on a set of 200 different realizations of a first-order Rayleigh fading channel using BPSK signaling. We ran the three methods on the same data sequences and noise realizations. The experiments were performed with a block length ᐂ = 100 and 64 subcarriers. The upper curve is the BER obtained with the existing OFDM method. The second curve is obtained with the CS1 version of our method. The lower curve is obtained with our method when H_f is only constrained to have a diagonal matrix structure.

Trellis searched blind equalization
In Section 5, we have seen how relaxing the constraint on H_f allows our method to reach high performance. In this section, we investigate how the resulting iterative method compares with existing suboptimal blind MLSE methods. We chose to compare our method with the trellis searched blind equalization method first proposed in [11], which approaches blind ML decoding whilst keeping the computational complexity at a reasonable level. This method was already presented in Section 1.2. The KSP context we are working in allows us to make a few changes that lower both its computational complexity and its implementation cost. In [11], the M best paths into a state are retained instead of a single one as in the classical Viterbi algorithm. The main reason for this is the absence of a reliable initial channel estimate. In the context of this paper, a quite accurate initial channel estimate is available by exploiting the known symbols of the transmitted sequence (see Section 5). This allows us to retain only one surviving path into each state. Another feature of interest offered by the known padded sequence is that it forces the trellis initialization and termination. At the beginning and the end of a transmitted block, the channel memory is filled with the training sequence that is padded to the blocks of data symbols. This allows us to perform block-by-block detection without having to retain the trellis metrics and surviving paths between different blocks.
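The effect of the known padding on the trellis can be sketched with a plain Viterbi search that keeps one survivor per state and starts and ends in the state fixed by the padded symbols. This is only an illustration under assumptions: the channel estimate is held fixed (the per-path channel updates of [11] are omitted), and the function name viterbi_ksp, the BPSK padding value +1, and the example channel are hypothetical.

```python
import numpy as np

def viterbi_ksp(r, h, n_data, pad=1.0, alphabet=(-1.0, 1.0)):
    """Single-survivor Viterbi detection of one block framed by known padding.

    r      : received samples, one per data symbol plus L trailing padded symbols
    h      : channel estimate of order L (held fixed in this sketch)
    n_data : number of unknown data symbols in the block
    """
    L = len(h) - 1
    start = (pad,) * L                 # channel memory filled with known symbols
    survivors = {start: (0.0, [])}     # state -> (path metric, decided symbols)
    for k in range(n_data + L):
        # Data positions branch over the alphabet; padded positions are known.
        candidates = alphabet if k < n_data else (pad,)
        new = {}
        for state, (metric, path) in survivors.items():
            isi = sum(h[l] * state[l - 1] for l in range(1, L + 1))
            for a in candidates:
                m = metric + (r[k] - (h[0] * a + isi)) ** 2
                nxt = ((a,) + state)[:L]
                if nxt not in new or m < new[nxt][0]:
                    new[nxt] = (m, path + [a])   # keep one survivor per state
        survivors = new
    metric, path = survivors[start]    # trellis terminates in the known state
    return np.array(path[:n_data]), metric

# Hypothetical third-order channel and a short block.
rng = np.random.default_rng(2)
h = np.array([0.8, 0.5, -0.3, 0.2])
L, n_data = len(h) - 1, 20
data = rng.choice([-1.0, 1.0], size=n_data)
x = np.concatenate([np.ones(L), data, np.ones(L)])     # pad | data | pad
r = np.convolve(x, h)[L:n_data + 2 * L] + 0.05 * rng.standard_normal(n_data + L)
s_hat, metric = viterbi_ksp(r, h, n_data)
print("symbol errors:", int(np.sum(s_hat != data)))
```

Because every block starts and ends in the same known state, no metrics or survivors need to be carried from one block to the next. The number of states is the alphabet size raised to the power L, which is the source of the exponential growth in complexity with the channel order.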
The results are presented in Figure 2, where the BERs achieved for different SNRs are shown. The experiments were performed on a set of 200 different realizations of a third-order Rayleigh fading channel using BPSK signaling. We ran the different methods on the same data sequences and noise realizations. The experiments were performed with a block length ᐂ = 100 and 64 subcarriers. The two methods show similar performance in terms of BER. However, simulations showed that the proposed method requires much less computing power than the trellis searched one. As the channel order increases, the complexity of our method remains the same whilst the complexity of the trellis method grows exponentially.

SIMULATION RESULTS
In this last section, we investigate the performance of the proposed method for higher-order channels. We also study the impact of the burst length ᐂ in terms of both performance and complexity. We present simulation results for Rayleigh fading channels with L = 5, T = 5, and P = 64. The simulations were performed using BPSK modulation. We used a large number of randomly generated Rayleigh fading channels and estimated approximately 10^7 data symbols for each point of the graph. Figure 3 shows the BER of the proposed method as a function of the SNR. The different curves show the results for different values of the burst length ᐂ. The lower curve shows the theoretical BER floor for ML decoding of BPSK signals transmitted over an Lth-order Rayleigh fading SISO channel [14, page 955]. The upper curve shows the BER obtained with classical ZF equalization in the frequency domain assuming perfect channel knowledge.
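A reference curve of this kind can be computed from the standard closed-form BER of coherent BPSK with maximal-ratio combining over independent, equal-power Rayleigh branches. The sketch below evaluates that expression. Treating an Lth-order channel as L + 1 equal-power branches and splitting the total SNR evenly among them are assumptions made for illustration, and they may differ from the normalization used for Figure 3. The function name is hypothetical.

```python
from math import comb, sqrt

def bpsk_rayleigh_diversity_ber(snr_per_branch, branches):
    """BER of coherent BPSK with D independent, equal-power Rayleigh branches.

    P_b = ((1 - mu) / 2)^D * sum_{k=0}^{D-1} C(D-1+k, k) * ((1 + mu) / 2)^k,
    with mu = sqrt(g / (1 + g)) and g the average SNR per branch (linear scale).
    """
    mu = sqrt(snr_per_branch / (1.0 + snr_per_branch))
    return ((1 - mu) / 2) ** branches * sum(
        comb(branches - 1 + k, k) * ((1 + mu) / 2) ** k for k in range(branches)
    )

# Assumed mapping: channel order L = 5 gives L + 1 = 6 equal-power taps.
L = 5
for snr_db in (0, 5, 10, 15):
    snr_total = 10 ** (snr_db / 10)
    ber = bpsk_rayleigh_diversity_ber(snr_total / (L + 1), L + 1)
    print(f"{snr_db:2d} dB: {ber:.3e}")
```

With a single branch the expression reduces to the familiar flat Rayleigh fading result (1 - mu) / 2.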
The achieved BER approaches the MLSE limit as ᐂ increases. This is probably either because the cost function becomes smoother as the number of data symbols increases, which keeps the algorithm from getting stuck in local minima, or because the initial estimate becomes more accurate. Figure 4 shows the average number of iterations required for the algorithm to converge. This number is kept quite low thanks to the good initial channel estimator, especially at high SNRs. It should be noted that large values of ᐂ, which yield better BER performance, require more iterations, especially at low SNRs. Figure 5 shows the evolution of the BER performance as the number of iterations increases. The upper curve shows the performance after one iteration, the second curve shows the BER obtained after two iterations, and so on. The experiments are performed on third-order channels with a block length ᐂ set to 100. We see from these experiments that most of the gain is achieved after four iterations. Further iterations still improve the BER, but only marginally. This shows that a very small number of iterations is sufficient to closely approach the best BER achievable with this method.

CONCLUSIONS
The contribution of this paper is twofold. First, a new semiblind iterative modified MLSE method for CP-Only transmission over stationary multipath channels has been proposed. This method achieves a very low BER, approaching the ML sequence estimate while maintaining the computational complexity at very low levels.
Second, a new stochastic channel estimation method was proposed which can be used in a KSP-Only context. This method provides a sufficiently good channel estimate to initialize the iterative algorithm. Its main advantage is its low computational complexity.