EURASIP Journal on Applied Signal Processing 2004:5, 727–739 c ○ 2004 Hindawi Publishing Corporation Maximum Likelihood Turbo Iterative Channel Estimation for Space-Time Coded Systems and Its Application to Radio Transmission in Subway Tunnels

This paper presents a novel channel estimation technique for space-time coded (STC) systems. It is based on applying the maximum likelihood (ML) principle not only over a known pilot sequence but also over the unknown symbols in a data frame. The resulting channel estimator gathers both the deterministic information corresponding to the pilot sequence and the statistical information, in terms of a posteriori probabilities, about the unknown symbols. The method is suitable for Turbo equalization schemes where those probabilities are computed with more and more precision at each iteration. Since the ML channel estimation problem does not have a closed-form solution, we employ the expectation-maximization (EM) algorithm in order to iteratively compute the ML estimate. The proposed channel estimator is first derived for a general time-dispersive MIMO channel and then is particularized to a realistic scenario consisting of a transmission system based on the global system mobile (GSM) standard performing in a subway tunnel. In this latter case, the channel is nondispersive but there exists controlled ISI introduced by the Gaussian minimum shift keying (GMSK) modulation format used in GSM. We demonstrate, using experimentally measured channels, that the training sequence length can be reduced from 26 bits as in the GSM standard to only 5 bits, thus achieving a 14% improvement in system throughput.


INTRODUCTION
Recently, the so-called Turbo codes [1,2,3] have revealed themselves as a very powerful coding technique able to approach the Shannon limit in AWGN channels. A Turbo code is made up of two component codes (block or convolutional) parallely or serially concatenated via an interleaver. This simple coding scheme produces very long codewords, so each source information bit is highly spread through the transmitted coded sequence. At reception, optimum maximum likelihood (ML) decoding can be carried out by considering the hypertrellis associated with the concatenation of the two component codes. Obviously, such a decoding approach becomes impractical in most situations. The key idea behind Turbo coding is to overcome this problem by employing a suboptimal, but very powerful, decoding scheme termed iterative maximum a posteriori (MAP) decoding [3,4]. Basically, the method relies on independently decoding each of the component codes and exchanging in an iterative fashion the statistical information, that is, the a posteriori probabilities about symbols, obtained in each decoding module.
The same decoding principle has also been successfully applied, under the term Turbo equalization [5], to effectively compensate the ISI induced by the channel and/or the modulation scheme. This technique exploits the fact that ISI can be viewed as a form of rate-1, nonrecursive coding. So, whatever coding scheme is used, if an interleaver is located prior to the channel, the overall effect of coding and ISI can be treated as a concatenated code and therefore, iterative MAP decoding can be applied. Luschi et al. [6] present an in-depth review of this technique and further improvements can be found in [7,8,9,10]. In general, iterative MAP processing can be applied to a variety of situations where the overall system can be viewed as a concatenation of modules whose input/output relationship can be described as a (hidden) Markov chain. Several works have appeared in the last years exploiting this idea. For instance, Görtz [11], Garcia-Frias and Villasenor [12], and Guyader et al. [13] worked on the problem of joint source-channel decoding and Zhang and Burr [14] addressed the problem of symbol timing recovery.
In practical receivers, where the channel impulse response has to be estimated, it is convenient to have channel estimators capable of benefiting from the high performance of Turbo equalizers [15,16,17]. Moreover, secondand third-generation mobile standards consider the transmission of pilot sequences known by the receiver for channel estimation purposes. In the global system mobile (GSM) standard, this sequence is 26 bits long, which represents 17.6% of the total frame length (148 bits) [18]. Such a long training sequence is necessary if classical estimation techniques, such as least squares (LS), are used. Employing more refined channel estimators, such as the one presented in this paper, we can dramatically decrease the necessary length of the training sequence and therefore increase the overall system throughput. In [19], an ML-based channel estimator is presented where the ML principle is applied not only to the pilot sequence, but also to the whole data frame. Since the involved optimization problem had no analytical solution, the expectation-maximization (EM) algorithm [20] was used for iteratively obtaining the solution.
Also, wireless communications research has been very influenced by the discovery of the potentials of communicating through multiple-input multiple-output (MIMO) channels, which can be carried out using antenna diversity not only at reception, as classical space-diversity techniques have been doing, but also at transmission. MIMO techniques have the advantage to provide high data rate wireless services at no extra bandwidth expansion or power consumption. Telatar [21] calculated the capacity associated with a MIMO channel that in certain cases grows linearly with the number of antennas [22]. More recent progress in information theoretical properties of multiantenna channel can be found in [23].
Although MIMO channel capacity can be really high, it can only be successfully exploited by proper coding and modulation schemes. The term space-time Coding (STC) [24,25] has been adopted for such techniques. Special efforts have been made in code design [24,26] and several decoding approaches have been developed for these codes. In both fields, the Turbo principle has been applied in profusion. Turbo ST codes designs can be found in [27,28,29] and various Turbo decoding schemes are exposed in [30,31].
As in single-antenna systems, practical ST receivers must perform the operation of channel estimation. Having efficient and robust estimators is crucial to guarantee that the system performance degradation due to the channel estimation error is minimized. In this paper, we present a novel channel estimation technique that gathers both the deterministic information corresponding to the pilot sequence and the statistical information, in terms of a posteriori probabilities, about the unknown symbols. The method is suitable for Turbo equalization schemes where those probabilities are computed with more and more precision at each iteration. We derive the channel estimator for general MIMO time-dispersive channels and analyze its performance in a multiple-antenna communication system based on the GSM standard operating inside subway tunnels.
The main motivation for developing a multiple-antenna GSM-based communication system is the following. GSM is, by far, the most widely deployed radio-communication system. Since 1993, its radio interface (GSM-R) has been adopted by the European railway digital radio-communication systems. Due to the conservative nature of its market, it is expected that railway radio-communication systems will employ GSM-R for the long-term future. For this reason, when subway operators wish to deploy advanced, high data rate, digital services for security or entertainment purposes, it is very likely that they will prefer to increase the capacity of the existing GSM-R system rather than switch to another radio standard. STC and Turbo equalization are very promising ways of achieving this capacity growth [32]. In this specific application, we will show that the proposed iterative MLMIMO channel estimation method has large benefits over traditional channel estimation approaches.
The rest of the paper is organized as follows. Section 2 presents the signal model and Section 3 describes the Turbo equalization scheme for STC systems. Next, in Section 4, we derive the ML channel estimator for a general time-dispersive MIMO channel. Since direct application of the ML principle leads to an optimization problem without closed-form solution, the EM algorithm is applied for computing the actual value of the solution, resulting in the so-called ML-EM estimator. The application of the proposed channel estimator to a STC GSM-based system operating in subway tunnels is detailed in Section 5. Section 6 presents the results of computer experiments for both the general case and experimental measurements of subway tunnel MIMO channels. Finally, Section 7 is devoted to the conclusions.

SIGNAL MODEL
We consider the transmitter signal model corresponding to an STC system shown in Figure 1. The original bit sequence u(k) feeds an ST encoder whose output is a sequence of vectors c(k) = [c1(k) c 2 (k) · · · c N (k)] T , with N being the number of transmitting antennas. The specific spatiotemporal structure of the sequence of vectors c(k) depends on the particular STC technique employed. Any of the several STC methods that have been proposed in the literature could be used in our scheme. However, we have focused on ST . . . . . . Figure 1: Transmitter model. trellis codes [24,25] to elaborate our simulation results. Each component of c(k) is independently interleaved to produce a new symbol vector b(k) = [b1(k) b 2 (k) · · · b N (k)] T and these are the symbols that are afterwards modulated (waveform encoded) to yield the signals s i (t; b i ) i = 1, 2, . . . , N that will be transmitted along the radio channel. Without loss of generality, we will assume that the modulation format is linear and that the channel suffers from time-dispersive multipath fading with memory length m. It is well known that at reception, matched-filtering and symbol-rate sampling can be used to obtain a set of sufficient statistics for the detection of the transmitted symbols. Using vector notation, this set of statistics will be grouped in vectors where L is the number of receiving antennas and K is the number of total transmitted symbol vectors in a data frame. Elaborating the signal model, it can be easily shown that the sufficient statistics x(k) can be expressed as where matrix H = [H(m − 1) H(m − 2) · · · H(0)] represents the overall dispersive MIMO channel with memory length m. Each submatrix contains the fading coefficients that affect the symbol vector b(k − i). Vector z(k) results from stacking the source vectors b(k), that is, Finally, the noise component v(k) is a vector of mutually independent complex-valued, circularly symmetric Gaussian random processes, that is, the real and imaginary parts are zero-mean, mutually independent Gaussian random processes having the same variance. We will also assume that the noise is temporally white with variance σ 2 v . Figure 2 shows the block diagram of an ST Turbo detector. The MAP equalizer [4] computes L[b(k)|x] which are the a posteriori log-probabilities of the input symbols b(k) based on the available observationsx = [x T (0) x T (1) · · · x(K − 1)] T . Due to its time-dispersive nature, it is convenient to represent our MIMO channel by means of a finite-state machine (FSM) having 2 N(m−1) states. This FSM has 2 N transitions per state which implies that there is a total number of 2 Nm transitions between two time instants. Let e k = (s k−1 , b(k), s(k), s k ) be one of the 2 Nm possible transitions at time k of this FSM. This transition depends on four parameters: the incoming state s k−1 , the outgoing state s k , the input symbol vector b(k), and the output symbol vector without noise s(k) = Hz(k). It is important to point out that the incoming state is determined by the m − 1 previous symbol vectors, that is,

ST TURBO DETECTION
). On the other hand, the outgoing state is a function of the previous state and the current input symbols, that is, s k = f next (s k−1 , b(k)). For a better description of the MAP equalizer, we are going to introduce the notation b(k) = L in (e k ) and s(k) = L out (e k ) to represent the input and output symbol vectors associated to the transition e k , respectively. Note that the output vector does not depend on the outgoing state s k , so we will slightly change our notation and write The a posteriori log-probabilities L[b(k)|x] can be recursively computed by means of the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm [3,4] which is summarized in the sequel. The first stage when computing the a posteriori log-probabilities is noting that where h b is the constant that makes P[b(k)|x] a probability mass function and is the joint log-probability of the transition e k and the set of available observationsx. This joint log-probability can be expressed as where Note that the noise variance σ 2 v is needed in (9). Our simulation results assume this parameter as known. However, it could be estimated and, in particular, it can be considered as another parameter to be estimated by the ML estimator described in Section 4, as shown in [33], for the case of a decision feedback-equalizer (DFE) instead of a MAP detector. The computation of the quantities α k [s], γ k [e k ], and β k [s] can be carried out recursively by first performing a forward recursion with initial values α 0 [s = 0] = 0 and α 0 [s = 0] = −∞, and then proceeding with a backward recursion using as initial values Similarly, the decoder has to compute the a posteriori logprobabilities of the original symbols L[u(k); O] from their a priori log-probabilities L[u(k); I] = log(0.5) and the a priori log-probabilities L[c(k); I] which come from the detector. Again, the BCJR algorithm applies [3,4]. It also computes the a posteriori log-probabilities of the transmitted symbols (14) where L[c(k); I] is utilized as branch metric. These computed log-probabilities are then fed back to the detector to act as the a priori log-probabilities L[b(k)]. As reflected in Figure 2, note that it is always necessary to subtract the a priori component from the computed log-probabilities before forwarding them to the other module in order to avoid statistical dependence with the results of the previous iteration.

MAXIMUM LIKELIHOOD CHANNEL ESTIMATION
Channel estimation is often mandatory when practically implementing ST detection strategies, unless we deal with some kind of blind processing techniques. In this section, we will present a novel channel estimation method that will enable us to take full advantage from the Turbo detection scheme presented in the Section 3.
When developing our channel estimation approach, we will exploit the fact that transmitted data frames in most practical systems contain a deterministic known pilot sequence of length M for the purpose of estimating the channel at reception. For instance, in GSM, this sequence is M = 26 bits long [18].
corresponds to the information sequence. The ML estimator is thus given by where fx t |bt ;H is the probability density function (pdf) of the observations conditioned on the available information (the training sequence b t ) and the parameters to be estimated (the channel matrix H). Although, this is a problem without closed-form solution, the EM algorithm [20] can be employed to iteratively solve (15). The EM algorithm relies on defining a so-called "complete data" set formed by the observable variables and by additional unobservable variables. At each iteration of the algorithm, a more refined estimate is computed by averaging the log-likelihood of the complete data set with respect to the pdf of the unobservable variables conditioned on the available set of observations. Using the EM terminology, we define the union of the observations (which are the observable variables) and the transmitted bit sequence (which are the unobservable variables) f ] T as the complete data set, whereas the observations x f are the incomplete data set. The relationship betweenx e andx f must be given by a noninvertible linear transformation, that is,x f = Tx e . It can be easily seen that in our case, this transformation is given by With these definitions in mind, the estimate of the channel at the i + 1th iteration is obtained by solving where E f {·} denotes the expectation operator with respect to the pdf f (x). Expanding the previous expression, we have where the last equality follows from the fact that, as far as we assume AWGN, the pdf of the observations conditioned on the transmitted symbols fx |b; Hi is Gaussian. This leads to the following quadratic optimization problem: with the closed-form solution 1 where Note that for computing (22) and (23), it is necessary to know the probability mass function p z(k)|x; Hi . Towards this aim, we take benefit from the Turbo equalization process because where h z is the constant that makes p z(k)|x; Hi a probability mass function and L[e k ,x] is the joint log-probability of the transition e k and the set of available observations. Notice that this quantity has already been computed in the Turbo equalization process (see (7)). This fact makes the proposed channel estimator very suitable to be used within a Turbo equalization structure.

APPLICATION TO AN STC SYSTEM FOR SUBWAY ENVIRONMENTS
We focus now on the application of the ML-EM channel estimator described in Section 4 to an STC GSM-like system for underground railway transportation systems. Some practical considerations follow. In subway tunnel environments, propagation conditions result in flat multipath fading because its delay spread is small when compared to the GSM symbol period [35]. Nevertheless, the modulation employed by the GSM standard, Gaussian minimum shift keying (GMSK), induces controlled ISI and thus Turbo ST Equalization can be employed for the purpose of joint demodulating and decoding. In addition, experimental measurements [36] have revealed that in this environment, there exist strong spatial correlations between subchannels. These spatial correlations will be taken into account when evaluating the receivers' performance in the following section because we will use, in the computer simulations, experimental measurements of MIMO channel impulse responses obtained in subway tunnels. These field measurements have been carried out in the framework of the European project "ESCORT" [37]. We will show how the proposed channel estimator allows to reduce the necessary length of the training sequence from 26 bits in the GSM standard up to only 5 bits, while performance is maintained very close to the optimum (i.e., the bit error rate (BER) obtained when the channel is perfectly known at reception) which clearly implies a very high gain in the overall system throughput. Figure 1 can be useful again for modeling the STC transmitter under consideration (for the sake of clarity, we refer the reader to Appendix A for a detailed description). This model can be summarized as follows. Each component of b(k) is independently modulated using the GMSK modulation format. GMSK is a partial response continuous phase modulation (CMP) signal and thus a nonlinear modulation format. Nevertheless, it can be expressed in terms of its Laurent expansion [38,39,40] as the sum of 2 p−1 PAM signals, where p is the memory induced by the modulation. For the GMSK format in the GSM standard, p = 3 but the first PAM component contains 99.63% of the total GMSK signal energy [39,40], so we can approximate the signal radiated by the ith antenna as where E b is the bit energy, T the symbol period, a i (k) = ja i (k − 1)b i (k) are the transmitted symbols which belong to a QPSK constellation, b i = {b i (k)} ∞ k=−∞ is the bit sequence to be modulated, and h(t) is a pulse waveform that spans along the interval [0, pT], where p is the memory of the modulation. It is demonstrated in [38] that the transmitted symbols a i (k) are uncorrelated and have unit variance. In order to simplify the detection process at the receiver, we will assume that a differential precoder is employed prior to modulation, that is, Considering that the transmission channel inside subway tunnels suffers from flat multipath fading [35], the signal received at the lth antenna is where h li is the fading observed between the ith transmitting antenna and the lth receiving antenna and n l (t) is a continuous-time complex-valued white Gaussian process with power spectral density N 0 /2. The received signals y l (t) are passed through a bank of filters matched to the pulse waveform h(t) and sampled at the symbol rate in order to obtain a set of sufficient statistics for the detection of the transmitted symbols. Because h(t) does not satisfy the zero-ISI condition, a discrete-time whitening filter [41,42] is located after sampling. In addition, the rotation j k induced by the GMSK modulation is compensated by multiplying the received signal by j −k , resulting in the following expression for the observations: where v l (k) represents the complex-valued AWGN with variance σ 2 v and f (m) = [0.8053, −0.5853 j, −0.0704] is the equivalent discrete-time impulse response that takes into ac-count the transmitting, receiving, and whitening filters, and the derotation operation. Using vector notation, the output of the whitening filters after the derotation can be expressed as where Equation (28) can be rewritten in the form of (1) as However, this signal model for the observations does not emphasize that the ISI comes from the GMSK modulation format instead of the time-dispersion of the multipath channel. As a consequence, we prefer to rewrite (28) as where

ML channel estimation for STC GSM-like systems with flat fading
Estimating the channel according to (30) and directly applying the method described in the previous section is highly inefficient because we have to estimate an unnecessarily large number of parameters. In addition, this way we do not take into account the knowledge at reception of the controlled ISI introduced by the modulator, given by f (m). Equation (31) is preferable because it enables us to formulate the estimation of only the unknown channel coefficients h li , as it is explained in the sequel. Again, we assume that the transmitted data frames contain a known pilot sequence of length M. The ML estimator of the channel is given by This is a problem without closed-form solution, so we will apply the EM algorithm in a similar way to the general case explored in Section 4. We define the complete and incomplete data sets asx Making similar manipulations to those made for the timedispersive MIMO channel, we arrive at the following optimization problem: which is also a quadratic optimization problem whose solution is where Here we need to average with respect to the pdf f B(k)|x; Hi . Again, we take benefit from the Turbo equalization process because (38) where h B is the constant that makes p B(k)|x; Hi a probability mass function and L[e k ,x] is a quantity already computed in the Turbo equalization process.

Rayleigh MIMO channel
Computer simulations were carried out to illustrate the performance of the proposed channel estimator. Figure 3 plots the BER after decoding obtained for a 2 × 2 STC system over a nondispersive channel. Data are transmitted in blocks of 218 bits out of which the pilot sequence occupies M = 10 bits. The performance curves for both the LS method and when the channel is perfectly known are also shown for comparison. Note that there is no iteration gain when the channel is known because there is no ISI and, therefore, no "inner coding" for the Turbo processing. Nevertheless, this is not true when the ML-EM channel estimator is used because the channel is reestimated at each iteration of the Turbo equalization process. The ST encoder is a rate 1/2 full diversity convolutional binary code with generating matrix  [26]. The independent interleavers are 20800 bits long each. The modulation format is BPSK and each channel coefficient is modeled as a zeromean, complex-valued, circularly invariant Gaussian random process. Consequently, their magnitudes are Rayleigh distributed. We have also assumed that the channel coefficients are both temporally and spatially independent, having variance σ 2 h = 1/2 per complex dimension. The signal-tonoise ratio (SNR) is defined as where Tr{·} denotes the trace operator. The channel changes at each transmitted block. Figure 3 shows that, even if its result for the first iteration is very poor, the ML-EM channel estimator outperforms the classical LS method from the fourth iteration. The bad performance obtained by the ML-EM estimator at the first iteration comes from the fact that the Turbo equalizer is using an uninformative initial estimate of the channel. Specifically, (19) can be viewed as an LS estimator, where the correlation matrices R xz,t and R z,t have been modified by the addition of the matrices R xz and R z , respectively. In the first iteration, these matrices are computed by assuming that p z(k)|x; Hi is a uniform probability mass function (therefore, independent of the initial channel estimate H 0 ) in (22) and (23). This results in a degradation of the pure LS estimator and a very high symbol error rate (SER) after decoding. Such a high SER (around 0.4) can never lead the Turbo equalization process to convergence. However, in our case, convergence is achieved because, in the next iterations, a substantial improvement is obtained in channel estimation from the EM algorithm (not from the Turbo structure itself). Notice that one iteration of the EM algorithm (19) is performed only after one complete equalization and decoding step. Anyway, once the channel estimate is good enough for the Turbo  equalization structure to lie in its convergence region, both the EM algorithm and the Turbo iterative process help in reducing the error rate. Figure 3 also shows that at the eighth iteration, the performance is very close to the optimum, that is, known channel case. Only 0.5 dB separates the two curves at a BER of 10 −4 . Figure 4 shows the results (BER after decoding) obtained when a time-dispersive MIMO channel with memory m = 2 is considered. The simulation parameters are the same as in Figure 3. In particular, note that, again, each channel coefficient has variance σ 2 h = 1/2 per complex dimension. It is apparent that at the fourth iteration, the ML-EM estimator performs very similar to the LS method, which does not improve significantly through the iterations. At the eighth iteration, the performance of the ML-EM estimator is again very close to the known channel case.

GSM-based transmission over subway tunnel MIMO channels
The performance of the proposed GSM-based transmission system with a Turbo STC receiver in subway tunnel environments has also been tested through computer simulations. The channel matrices H result from experimental measurements (carried out within the framework of the European project "ESCORT") of the MIMO channel impulse response present in a subway tunnel. The experimental setup consisted of four transmitting antennas, each one having a 12 dBi gain, located at the station platform, and four patch antennas located behind the train windscreen. The complex impulse responses were measured with a channel sounder having a bandwidth of 35 MHz by switching successively the antennas and stopping the train approximately each 2 m. From the whole set of 4 × 4 measured subchannels, only those corresponding to the furthest antennas were picked up for constructing a 2 × 2 system. In [35], it was demonstrated that the mean capacity of the measured channel is less than the ca- pacity of Rayleigh fading channels, this difference being more remarkable in the case of a 4 × 4 system. The ability of our channel estimation technique to combine the deterministic information of the pilot symbols and the statistical information from the unknown symbols, thanks to the ST Turbo detector, enables us to considerably reduce the size of the training sequence in GSM systems. Indeed, by means of computer simulations, we have determined the minimum length of the training sequence for the considered GSM-based MIMO system. Figure 5 shows the channel estimation mean square error (MSE) for several values of the training sequence length (M = 4, 5, and 6 bits). The channel code is the same as in the previous simulations. The interleaver size is 20800 bits and the frame length is 148, as established in the GSM standard. There is a significant difference in the estimation error between using M = 4 bits and M = 5 bits, whereas the gap between M = 5 and M = 6 is very small. This points out that M = 5 bits is the minimum length for the training sequence. This assumption can also be corroborated in Figure 6, where the SER at the output of the decoder is plotted versus the required SNR.
Next, we compare the results obtained with the proposed estimator using a training sequence of M = 5 bits and those obtained with classical LS using a training sequence of M = 26 bits (the length standardized in GSM). The results obtained when the receiver perfectly knows the channel are also plotted for comparison. As it is shown in Figure 7, the proposed method (ML-EM) with M = 5 bits performs better than the LS with M = 26 bits beyond the sixth iteration, achieving a performance very close to the known channel case beyond the seventh iteration.

CONCLUSIONS
In this paper, we propose a novel ML-based time-dispersive MIMO channel estimator for STC systems that employ Turbo ST receivers. We formulate the ML estimation problem that takes into account the deterministic symbols corresponding to the training sequence and the statistics of the unknown symbols. These statistics can be obtained and successively refined if an ST Turbo equalizer is used at reception. This full exploitation of all the available statistical information at reception renders an extremely powerful channel estimation technique that outperforms conventional approaches based only on the training sequence. Since the involved optimization problem has no closed-form solution, the EM algorithm is employed in order to iteratively obtain the solution. The main limitation of our approach is that the computational complexity of the channel estimator grows exponentially with the number of transmitting antennas and the channel memory size, hence it is only practical for a moderate size of the transmitter antenna array. Note, however, that this complexity is inherent to the problem of optimal detection and estimation in MIMO systems. The method has been particularized for a realistic scenario in which an STC system based on the GSM standard transmits along railway subway tunnels. Simulation results show how our channel estimation technique enables us to diminish the training sequence length up to only 5 bits, instead of the 26 bits considered in the GSM standard, thus achieving a 14% increase in the system throughput.

A. SIGNAL MODEL OF AN STC GSM SYSTEM
The transmitter model depicted in Figure 1 is valid for an STC GSM system. The signal radiated by ith antenna is given by [38,40]  where E b is the bit energy, T the symbol period, b i = {b i (k)} ∞ k=−∞ the bit sequence to be modulated, and where g(t) is the convolution between a Gaussian-shaped pulse and a rectangular-shaped pulse centered at the origin [43,44], that is, where B is the 3 dB bandwidth of u(t). It is possible to derive a closed-form expression for g(t) given by [38,40] where is the Gaussian complementary error function. With the aim of simplifying subsequent analysis, we redefine g(t) ≡ g(t − p/2T), so it is limited to the interval [0, pT], where p is the number of symbol periods where the signal has significant values. For GSM (B = 0.3), a value of p = 3 is reasonable [40], as it can be verified in Figure 8, that plot the properly shifted versions of g(t) and q(t) when B = 0.3. Since GMSK is a partial response CPM, it can be expressed in terms of its Laurent expansion [38,39,40], formed by the sum of 2 p−1 PAM signals, where p is the memory induced by the modulation. Since in GSM, the first PAM component contains 99.63% of the total GMSK signal energy [39,40], we can approximate the signal radiated by the ith antenna by where a i (k) = ja i (k − 1)b i (k) are the transmitted symbols, which belong to a QPSK constellation, are uncorrelated and have unit variance [38]. In order to simplify the detection process at the receiver, we will assume that a differential precoder is employed prior to modulation, that is, where C(t) = cos(πq(|t|)). Figure 9a shows that it takes significant values over the interval [0.5T, 3.5T] because the actual and the linearized GMSK waveforms are shifted by half a symbol period. In order to detect the transmitted symbols, s i (t; b i ) is passed through a filter matched to the pulse waveform h(t) and then sampled at the symbol rate. The output of the matched filter is given by and R h (t) (see Figure 9b) denotes the autocorrelation function of h(t). After sampling, we have where the autocorrelation function of g(k) is R g (k) = (N 0 /2)R h (k). Clearly, the noise g(k) is colored because h(t) does not satisfy the zero-ISI condition. Since it is more comfortable to perform detection assuming white noise, a discrete-time whitening filter [41,42] is located after sampling where F * (z −1 ) comes from the factorization of the autocorrelation function R h (k) = F(z)F * (z −1 ). This expression for the whitening filter leads to an overall system response given by F(z). In Appendix B, we demonstrate that the maximum phase F(z) polynomial is given by where ρ 1 = −0.1522, ρ 2 = −0.5746, and r 2 = R h (−2). In addition, the rotation j k induced by the GMSK is compensated by multiplying the received signal by j −k , resulting in the following expression for the observations: where v l (k) is AWGN with variance σ 2 v and f (m) = [0.8053 −0.5853 j −0.0704] is the equivalent discrete-time impulse response that takes into account the transmitting, receiving, and whitening filters, and the derotation operation. Using vector notation, the output of the whitening filters after the derotation can be expressed as in (28), where x(k) = [x1(k) x 2 (k) · · · x L (k)] T , s(k) = [s1(k) s 2 (k) · · · s N (k)] T , and is as in (29).

B. COMPUTATION OF THE DISCRETE-TIME WHITENING FILTER
First, it is important to note that there are 2 p choices of F(z) that satisfy the desired factorization R h (z) = F(z)F * (z −1 ). The different choices yield filters 1/F * (z −1 ) that have the same magnitude but different phase response. One possible choice is to select F * (z −1 ) so that it is a minimum phase, that is, with all its roots inside the unit circle. In this way, 1/F * (z −1 ) is a realizable causal and stable discrete system. The problem of this selection is that the overall impulse response F(z) will be the maximum phase and anticausal, and the resulting ISI will be difficult to compensate. To overcome this limitation, we choose F * (z −1 ) to be the maximum phase and thus the whitening filter 1/F * (z −1 ) will be stable only if it is considered anticausal. Nevertheless, anticausal filters can be implemented if a sufficient large delay is introduced. The advantage of this approach is that now the overall impulse response F(z) is causal and minimum phase. As mentioned before, note that h(t) contains 99.63% of the actual GMSK total energy because R h (0) = 0.9963. The Ztransform of R h (k) is R h (z) = r −2 z 2 + r −1 z + r 0 + r 1 z −1 + r 2 z −2 (B.2) that we can express as Forcing |ρ 1 |, |ρ 2 | ≤ 1 in order that the resulting whitening filter exists and be stable, we have ρ 1 = −0.1522 and ρ 2 = −0.5746. Taking into account that ρ 1 and ρ 2 are real valued, we arrive at and thus the whitening filter is given by Since {|w k |} −∞ k=0 is a strictly decreasing series, we can consider only the first significant w k coefficients. Taking into account that |w −20 | < 10 −4 , we can implement w(k) as an anticausal FIR filter: