EURASIP Journal on Applied Signal Processing 2002:3, 267–274
© 2002 Hindawi Publishing Corporation

A Novel Decoder for Unknown Diversity Channels Employing Space-Time Codes

We suggest new decoding techniques for diversity channels employing space time codes (STC) when the channel coefficients are unknown to both transmitter and receiver. Most of the existing decoders for unknown diversity channels employ a training sequence in order to estimate the channel. These decoders use the estimates of the channel coefficients in order to perform maximum likelihood (ML) decoding. We suggest an efficient implementation of the generalized likelihood ratio test (GLRT) algorithm that improves the performance with only a slight increase in complexity. We also suggest an energy weighted decoder (EWD) that shows additional improvement without a further increase in computational complexity.


INTRODUCTION
Space time coding schemes have been shown to significantly improve the performance of communication over fading channels when multiple antennas are available at transmitter and receiver. Efficient space time codes (STC) for mobile cellular diversity channel are introduced by Tarokh et al. [1] where the channel coefficients are assumed to be known to the decoder. Many of the STC schemes assume full or partial receiver knowledge of the channel coefficients. When no knowledge of the channel is available, training sequences can be used in order to facilitate communication over an unknown channel. Since the sequence is known at the receiver, an estimate of the channel can be achieved at the receiver [2].
The use of training sequences has some drawbacks. First, there is a mismatch penalty. Because the training sequence is of limited length, the channel estimate formed at the receiver is imprecise which results in an increased error rate. Secondly, there is penalty in throughput, because the training sequence carries no information. This penalty is worse the longer the training sequence is, as compared to the length of the data sequence. When the channel changes rapidly over time, using training sequences might be inadequate. Since the channel can be assumed to be constant only over a very short period of time, training sequences would have to be transmitted very often, which would severely increase the throughput penalty. In broadcast multipoint communication networks, if a channel from the master to one of the receiver stations goes down at any time following the initial training period and it is desired to retain only that receiver, the use of training sequences is unsuitable. In such situations, it is desirable for the receiver to decode without having a known training sequence available.
An efficient differential detection scheme which does not require training sequences and has a linear complexity was developed in [3]. The detection scheme was developed for a simple transmit encoding design, known as the Alamouti block coding, first introduced in [4]. The detection scheme in [3] requires equal-energy signal constellation such as PSK. A different approach which requires no pilot sequences is the unitary space-time modulation introduced in [5]. The decoder in [5] assumes a Rayleigh statistical model on the channel coefficients with a known covariance matrix. The encoding and decoding in [5] have exponential complexities.
From the error probability point of view, the maximum-likelihood (ML) decoder is optimal for known channels. In the situation of interest in this work, however, the channel coefficients are unknown and do not have a known statistical model. A generalization of the ML decision rule for unknown channels is the generalized likelihood ratio test (GLRT). Unlike the ML decoder for known channels, the GLRT is generally not optimal under the error probability criterion. In [6,7] we develop a new decoder that improves the performance of the GLRT for unknown Gaussian linear channels. The improvement in [6,7] is uniform and exponential, that is, for every possible choice of channel coefficients the error probability is either smaller by an exponential factor or, for some choices only, remains the same. However, both the GLRT algorithm and the decoder we propose in [6,7] have a computational complexity that is too high for practical implementation in space time systems. In this paper, we are interested in developing a computationally efficient decoder that can be applied to a large class of codes and, at the same time, assumes no statistical model on the channel coefficients. To this end, we reduce the complexity of the GLRT by using an appropriate approximation. We further suggest a new decoder that improves the performance of the approximated GLRT decoder for non-equal-energy signal constellations while maintaining its low complexity.
The outline of the paper is as follows. In Section 2, the channel model is presented. In Section 3, an example of a space time system is given, later used for simulation. In Section 4, we introduce the GLRT for diversity channels and suggest an efficient approximation of the GLRT. We give motivation for a new decoder that improves the GLRT and propose the new energy weighted decoder (EWD), later in Section 4. In Section 5, a low complexity implementation of the approximated GLRT and the EWD is developed. In Section 6, simulation results for the performance of the new decoders will show that the GLRT decoder improves the existing decoder (that estimates the channel by the use of training sequences) and that the EWD shows further performance improvement. A summary and suggestions for further research will conclude the paper.

CHANNEL MODEL
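The model of a complex diversity channel with $L$ transmit antennas and $J$ receive antennas is given by

$$y_{n,j} = \sum_{i=1}^{L} \alpha_{i,j}\, x^{(m)}_{n,i} + z_{n,j}, \qquad n = 0, \ldots, N-1, \quad j = 1, \ldots, J, \tag{1}$$

where $\{y_{n,j}\}_{n=0}^{N-1}$ are the complex observed data samples at receive antenna $j$, $\{x^{(m)}_{n,i}\}_{n=0}^{N-1}$ are the complex symbols transmitted by the $i$th antenna for the $m$th codeword, $\alpha_{i,j}$ are the complex unknown fading coefficients from transmit antenna $i$ to receive antenna $j$, and $z_{n,j}$ are white noise samples modeled as independent zero-mean complex Gaussian random variables with variance $\sigma^2$ per dimension. We can write (1) as

$$Y = X_m \alpha + Z, \tag{2}$$

where $Y = [y_{n,j}]$ is the $N \times J$ matrix of received samples, $X_m = [x^{(m)}_{n,i}]$ is the $N \times L$ matrix of the $m$th codeword, $\alpha = [\alpha_{i,j}]$ is the $L \times J$ matrix of fading coefficients, and $Z = [z_{n,j}]$ is the $N \times J$ noise matrix. The matrices $X_m$, $m = 1, \ldots, M$, are assumed to be full rank, as is the case for some of the coding methods we have encountered in the literature. For example, in [8] the matrix $X_m$ has an orthogonal structure and in [5] the columns of $X_m$ are designed to be (scaled) orthonormal.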
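The channel model above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `simulate_channel` and the list-of-lists layout (rows indexed by time, columns by antenna) are choices made here, not taken from the paper.

```python
import random

def simulate_channel(X, alpha, sigma, rng=random):
    """Return Y = X * alpha + Z for one codeword.

    X     : N x L list of lists, symbol sent at time n from transmit antenna i
    alpha : L x J list of lists, fading gain from transmit antenna i to receive antenna j
    sigma : noise standard deviation per real dimension
    """
    N, L, J = len(X), len(alpha), len(alpha[0])
    Y = []
    for n in range(N):
        row = []
        for j in range(J):
            # Noiseless superposition of the L transmitted symbols at antenna j.
            signal = sum(alpha[i][j] * X[n][i] for i in range(L))
            # Complex Gaussian noise with variance sigma^2 per dimension.
            noise = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
            row.append(signal + noise)
        Y.append(row)
    return Y
```

Setting `sigma` to 0 returns the noiseless product of the codeword matrix and the gain matrix, which is a convenient sanity check.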

The encoding algorithm
In this section we refer to the trellis based STC proposed by Tarokh et al. [1]. In [1], the channel coefficients were modeled as independent complex Gaussian random variables. This assumption is necessary for the criteria for optimal encoding design, but not for the decoding algorithm which is the main concern of our work. Therefore, we do not assume a statistical model of the channel coefficients until Section 6 for simulations. In this example, a 16-QAM 16-state STC is used with two transmit antennas and two receive antennas (L = 2, J = 2). Figure 1 shows the 16-QAM constellation and the trellis description for this code. Each row in the matrix represents the edge labels for transitions from the corresponding state. At the beginning and end of each frame the encoder is required to be in zero state. At each time t, depending on the state of the encoder and the input bits a transition branch is chosen. If the label of this branch is s 1 s 2 then the s 1 is transmitted over the first transmit antenna and s 2 over the second one. In the general case, the label of each branch is s 1 s 2 · · · s L . We will illustrate the encoding with an example. Suppose the input stream to be encoded is  The bit stream is divided into groups of four bits and each group is mapped into one of sixteen constellation points. Thus the input to the encoder becomes The encoder is at the beginning in zero state. The first symbol to be encoded is 1. The encoder transits to state 1 in the trellis. The transmitted symbols s 1 s 2 are indicated in the zeroth row (current state) and first column (next state) of the matrix and therefore s 1 s 2 = 01. The first antenna transmits 0 and the second antenna transmits 1. Continuing the encoding process this way, the symbol sequences transmitted by the two antennas are

The decoding algorithm for known channels
In [1], the decoder is assumed to have full knowledge of the channel coefficients $\alpha_{i,j}$. When the channel is known, the optimal decoder uses the ML decision rule, which for Gaussian channels selects the codewords according to the minimal Euclidean distance criterion. For the trellis STC described above, assuming that $y_{n,j}$ is the received signal at receive antenna $j$ at time $n$, the branch metric for a transition labeled $s_1 s_2 \cdots s_L$ is given by

$$\sum_{j=1}^{J} \Bigl| y_{n,j} - \sum_{i=1}^{L} \alpha_{i,j}\, s_i \Bigr|^2. \tag{7}$$

The Viterbi algorithm is then used to compute the path with the lowest accumulated metric.
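As a minimal sketch of the known-channel branch metric (the squared Euclidean distance between the received samples and the noiseless output of a candidate branch), one might write the following. The function name and argument layout are assumptions made for illustration.

```python
def branch_metric(y_n, s, alpha):
    """Squared Euclidean distance for one trellis branch.

    y_n   : list of J received samples at time n
    s     : list of L symbols labelling the branch (one per transmit antenna)
    alpha : L x J list of lists of channel gains
    """
    L, J = len(s), len(y_n)
    total = 0.0
    for j in range(J):
        # Noiseless signal that receive antenna j would see for this branch.
        predicted = sum(alpha[i][j] * s[i] for i in range(L))
        total += abs(y_n[j] - predicted) ** 2
    return total
```

A Viterbi decoder would evaluate this for every branch leaving a state and add it to that state's accumulated metric.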

Decoding algorithm for unknown channels
The more practical situation where the channel coefficients are unknown to the decoder is investigated by Naguib et al. [2]. The decoding method suggested in [2,9] is to first estimate the channel using a training sequence. This is a mismatched decoder since the decoder operates according to the estimated channel and not according to the actual one. At the beginning of each frame, a sequence $W_i$ of $k$ pilot symbols is transmitted from transmit antenna $i$:

$$W_i = (w_{0,i}, w_{1,i}, \ldots, w_{k-1,i}), \qquad i = 1, 2, \ldots, L.$$

The sequences $W_1, W_2, \ldots, W_L$ are designed to be orthogonal to each other, that is, $W_p$ and $W_q$ are orthogonal whenever $p \neq q$. Let $y_j = (y_{0,j}, y_{1,j}, \ldots, y_{k-1,j})$ be the observed sequence of received signals at antenna $j$ during the training period, given by

$$y_{n,j} = \sum_{i=1}^{L} \alpha_{i,j}\, w_{n,i} + z_{n,j}, \qquad n = 0, 1, \ldots, k-1.$$

The goal is to estimate $\alpha_{i,j}$, $i = 1, 2, \ldots, L$, $j = 1, 2, \ldots, J$, using the observed signals.
The least squares (LS) estimate $\hat{\alpha}_{i,j}$ of $\alpha_{i,j}$ is used, which is given by the normalized inner product

$$\hat{\alpha}_{i,j} = \frac{\langle y_j, W_i \rangle}{\|W_i\|^2} = \frac{\sum_{n=0}^{k-1} y_{n,j}\, w_{n,i}^{*}}{\sum_{n=0}^{k-1} |w_{n,i}|^2}.$$

The estimated coefficient $\hat{\alpha}_{i,j}$ is then substituted in (7) in place of $\alpha_{i,j}$.
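The pilot-based estimate can be illustrated with a short sketch. It assumes mutually orthogonal pilot sequences, so that the LS estimate reduces to an inner product with each pilot divided by that pilot's energy; the function name and data layout are hypothetical.

```python
def ls_channel_estimate(pilots, received):
    """Least squares channel estimate from orthogonal pilot sequences.

    pilots   : list of L pilot sequences, pilots[i][n] sent by antenna i at time n
               (assumed mutually orthogonal)
    received : list of J sequences, received[j][n] observed at antenna j at time n
    Returns an L x J list of lists of estimated gains.
    """
    alpha_hat = []
    for W in pilots:
        energy = sum(abs(w) ** 2 for w in W)
        row = []
        for y in received:
            # Inner product of the observation with the pilot of antenna i.
            inner = sum(yn * wn.conjugate() for yn, wn in zip(y, W))
            row.append(inner / energy)
        alpha_hat.append(row)
    return alpha_hat
```

With noiseless observations the estimate recovers the true gains exactly; with noise, the estimation error shrinks as the pilot energy grows, which is the mismatch penalty discussed in the introduction.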

NOVEL DECODING METHODS
The classical decoding algorithm for unknown channels is the GLRT decision rule. This section begins by reviewing the GLRT decision rule in general and specifically for diversity channels. An approximation of the GLRT is suggested, which can be efficiently implemented. Later, we give the motivation for a new decoder that improves the GLRT and propose the new energy weighted decoder (EWD).

The generalized likelihood ratio test (GLRT)
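Consider the family of channels specified by a set of channel laws

$$\{\, p_\theta(y \mid x),\ \theta \in \Theta \,\},$$

where $p_\theta(\cdot \mid \cdot)$ denotes the conditional probability mass function (PMF) governing the channel, which depends on the unknown and deterministic parameter $\theta \in \Theta$. For a known channel, the ML decoder is optimal in the sense of minimizing the probability of error (over messages). The decision rule of this decoder is given by

$$\hat{m} = \arg\max_{i}\, p_\theta\bigl(y \mid x^{(i)}\bigr),$$

where $x^{(i)}$ is the $i$th codeword. Since the ML decision rule is in general different for different channels, it cannot be employed when the channel is unknown. A common approach in this case is the GLRT decoding rule. The GLRT decoder can be defined as follows:

$$\hat{m} = \arg\max_{i}\, \max_{\theta \in \Theta}\, p_\theta\bigl(y \mid x^{(i)}\bigr).$$

While the GLRT is intuitively appealing as joint channel and data estimation, it does not have optimal performance and lacks solid theoretical justification. For Gaussian diversity channels the decoding criterion of the GLRT becomes

$$\hat{m} = \arg\min_{m}\, \min_{\alpha}\, \|Y - X_m \alpha\|^2,$$

where $Y$, $X_m$, and $\alpha$ are defined in (2) and the norm of the matrix $A$ is defined as

$$\|A\|^2 = \sum_{i}\sum_{j} |a_{i,j}|^2 = \operatorname{tr}\bigl(A^{\dagger} A\bigr).$$

The LS solution for $\alpha$ yields

$$\hat{\alpha}_m = \bigl(X_m^{\dagger} X_m\bigr)^{-1} X_m^{\dagger}\, Y,$$

and it can be easily derived that the closed form GLRT decision rule is

$$\hat{m} = \arg\min_{m}\, \bigl\| Y - X_m \bigl(X_m^{\dagger} X_m\bigr)^{-1} X_m^{\dagger}\, Y \bigr\|^2.$$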
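The step from the LS solution to a closed form rule can be made explicit with a short derivation. The symbol $P_m$ is a shorthand introduced here for the orthogonal projection onto the column space of $X_m$; it is not notation from the paper.

```latex
\begin{aligned}
P_m &= X_m \bigl(X_m^{\dagger} X_m\bigr)^{-1} X_m^{\dagger},
\qquad P_m^{\dagger} = P_m, \qquad P_m^2 = P_m, \\
\min_{\alpha} \|Y - X_m \alpha\|^2
  &= \|Y - P_m Y\|^2 \\
  &= \|Y\|^2 - \|P_m Y\|^2 .
\end{aligned}
```

The last equality holds because $P_m Y$ and $(I - P_m) Y$ are orthogonal. Since $\|Y\|^2$ does not depend on $m$, minimizing the residual over the codebook is equivalent to maximizing the projected energy $\|P_m Y\|^2$.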

Approximated GLRT decoder
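Since the decoding process in Section 3 involves the Viterbi algorithm, an approximated GLRT decoding that uses a sequentially updated estimate of the channel coefficients can be incorporated into the existing decoder. The approximated GLRT decoder can be implemented, as we will show in Section 5, with only a slight increase in complexity over the decoder for known channels. We define the matrices

$$Y^n = \begin{pmatrix} y_{0,1} & y_{0,2} & \cdots & y_{0,J} \\ y_{1,1} & y_{1,2} & \cdots & y_{1,J} \\ \vdots & \vdots & & \vdots \\ y_{n,1} & y_{n,2} & \cdots & y_{n,J} \end{pmatrix}, \qquad X_m^n = \begin{pmatrix} x^{(m)}_{0,1} & x^{(m)}_{0,2} & \cdots & x^{(m)}_{0,L} \\ x^{(m)}_{1,1} & x^{(m)}_{1,2} & \cdots & x^{(m)}_{1,L} \\ \vdots & \vdots & & \vdots \\ x^{(m)}_{n,1} & x^{(m)}_{n,2} & \cdots & x^{(m)}_{n,L} \end{pmatrix}.$$

The LS solution of the following minimization problem:

$$\min_{\alpha}\, \|Y^n - X_m^n \alpha\|^2$$

is given by

$$\hat{\alpha}_m^n = \bigl(X_m^{n\dagger} X_m^n\bigr)^{-1} X_m^{n\dagger}\, Y^n,$$

where $n$ is the update index. Note that $\hat{\alpha}_m^n$ can be computed at instant $n$ without relying on future received signals.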
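The approximated GLRT criterion is

$$\hat{m} = \arg\min_{m} \sum_{n=0}^{N-1} \bigl\| y^n - x_m^n\, \hat{\alpha}_m^n \bigr\|^2, \tag{22}$$

where $y^n$ and $x_m^n$ denote the $n$th rows of $Y$ and $X_m$, respectively, and $\hat{\alpha}_m^n$ is the LS estimate of the channel coefficients available at instant $n$. Unlike the GLRT, where the LS estimation of the channel coefficient matrix $\alpha$ is carried out over an entire codeword, here the estimated coefficients are updated at each time interval. The estimation of $\alpha$ at instant $n$ depends only on present and past received signals.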

Improving the GLRT: motivation
For a simple scalar case of fading (a single transmit antenna and a single receive antenna), a decoder that uniformly improves the GLRT is found in the appendix of [10] and is also suggested in [11]. That is, the new decoder improves the error probability performance for each choice of fading coefficients. The (real) fading channel model is

$$y = \alpha\, x^{(m)} + z,$$

where $y = (y_0, y_1, \ldots, y_{N-1})$ is the observed vector, $x^{(m)}$ is the $m$th codeword, $\alpha$ is an unknown fading coefficient, and the components of the noise vector $z$ are i.i.d. zero-mean Gaussian random variables with variance $\sigma^2$. Suppose that our codebook consists of two codewords ($M = 2$) of length $N$ given by $x^{(1)} = (a, 0, 0, \ldots, 0)$ and $x^{(2)} = (0, b, 0, \ldots, 0)$. Any orthogonal code of two codewords can be transformed to this form. Since all of the coordinates of both codewords are zero for $n > 1$, the problem is actually two dimensional.
The decoding regions for the GLRT decoder appear in Figure 2. The GLRT projects the received signal $(y_0, y_1)$ onto the directions of the two-dimensional vectors formed by the first two coordinates of $x^{(1)}$ and $x^{(2)}$, and decides according to the smaller of the distances of $(y_0, y_1)$ to the horizontal axis and to the vertical axis of the coordinate system. The decoding rule decides $x^{(1)}$ if $|y_0| \geq |y_1|$ and decides $x^{(2)}$ if $|y_0| < |y_1|$. Thus, the boundaries between the two decision regions are straight lines through the origin at slopes of $\pm 45$ degrees. Note that the decoding rule does not depend on the specific values of $a$ and $b$. The distances of $\alpha \cdot (a, 0)$ and $\alpha \cdot (0, b)$ from the boundary lines dictate the error probability for the decoder. The distance $d_1$ of $\alpha \cdot (a, 0)$ from the boundary lines at slope $\pm 1$ is $\alpha a / \sqrt{2}$ and the distance $d_2$ of $\alpha \cdot (0, b)$ from the same lines is $\alpha b / \sqrt{2}$. The probability of error is of the exponential order of $\exp\{-\alpha^2 \min\{a^2, b^2\} / (4\sigma^2)\}$.
The decoding regions of the new decoder suggested in [10] appear in Figure 3. This decoder projects the vector formed by the first two coordinates of each $x^{(m)}$ in the direction of the first two coordinates of $y$. The decoding rule decides $x^{(1)}$ if $|y_0 / a| \geq |y_1 / b|$ and decides $x^{(2)}$ if $|y_0 / a| < |y_1 / b|$. In other words, the new decoder decides $x^{(1)}$ if $|b y_0| \geq |a y_1|$ and decides $x^{(2)}$ if $|b y_0| < |a y_1|$. That is, the new decoder multiplies the metric of the GLRT for each codeword by the square root of the energy of that codeword. The boundary between the two decision regions is a pair of straight lines with slopes $\pm b/a$. For the new decoder the distances $d_1$, $d_2$ of $\alpha \cdot (a, 0)$ and $\alpha \cdot (0, b)$ from the boundary lines are both $\alpha a b / \sqrt{a^2 + b^2}$. The probability of error is of the exponential order of $\exp\{-\alpha^2 a^2 b^2 / [2\sigma^2 (a^2 + b^2)]\}$. Unless $a = b$, the exponential order of the new decoder is strictly better than that of the GLRT for any $\alpha$. The improvement was achieved by maximizing the smaller of the distances $d_1$ and $d_2$.
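The two decision rules of this two-codeword example, together with their error exponents, are simple enough to compare numerically. The sketch below uses hypothetical function names and follows the rules and exponents as stated in the text.

```python
def glrt_decide(y0, y1):
    """GLRT rule for the two-codeword example: returns 1 or 2."""
    return 1 if abs(y0) >= abs(y1) else 2

def weighted_decide(y0, y1, a, b):
    """Energy weighted rule: compare |b*y0| with |a*y1|."""
    return 1 if abs(b * y0) >= abs(a * y1) else 2

def glrt_exponent(alpha, a, b, sigma):
    """Error exponent alpha^2 * min(a^2, b^2) / (4 sigma^2) of the GLRT."""
    return alpha ** 2 * min(a ** 2, b ** 2) / (4 * sigma ** 2)

def weighted_exponent(alpha, a, b, sigma):
    """Error exponent alpha^2 a^2 b^2 / (2 sigma^2 (a^2 + b^2)) of the new rule."""
    return alpha ** 2 * a ** 2 * b ** 2 / (2 * sigma ** 2 * (a ** 2 + b ** 2))
```

For example, with $a = 1$, $b = 3$ and the received pair $(1.0, 2.0)$, the GLRT selects the second codeword while the weighted rule selects the first. With $\alpha = \sigma = 1$, the exponents are 0.25 for the GLRT and 0.45 for the weighted rule, and the two coincide when $a = b$.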

Energy weighted decoder
Motivated by the simple fading case with a single antenna in the transmitter and receiver we can derive a new decoder for STC over a diversity channel. Instead of (22) for the approximated GLRT decoder, the decision rule for the new decoder iŝ That is, the metric for each time interval is multiplied by the square root of the transmitted energy in that time interval, analogously to the new decoder suggested for the simple fading example. Unlike the single fading coefficient case, for diversity channels this decoder does not generally improve the GLRT for all possible channel coefficient matrices α, as we have demonstrated in several examples of codebooks in [6]. Nevertheless, this decoder improves the GLRT in many cases. Indeed, our experimental results in Section 6 indicate the EWD outperforms the GLRT. An intuitive explanation for the EWD performance improvement is given as follows. For a certain decoder and a channel coefficient matrix, the error probability is given by where α denotes the channel coefficients matrix and m denotes the index of transmitted codeword matrix X (m) . The exponential order of the error probability is therefore For the GLRT decoder the exponential order in (26) is determined, for most values of α, by the error probability of one of the less energetic codewords. Since the weighting of the EWD prefers selecting the less energetic codewords (in comparison to the GLRT), this compensation improves the exponential order in (26) for most values of α.

IMPLEMENTATION
When the encoder uses a trellis code, as in the example in Section 3, the decoder uses the Viterbi algorithm. Both the approximated GLRT and the EWD can be incorporated into the Viterbi algorithm by using the recursive least squares (RLS) algorithm. At each step $n$ of the algorithm we calculate the estimator $\hat{\alpha}_r^{(n-1) \to n}$ ($r = 1, \ldots, T$, where $T$ is the number of branches of the trellis at each interval) corresponding to each of the transitions from instant $n - 1$ to instant $n$ and the associated surviving sequences. The estimated channel coefficients can be substituted in the trellis metrics in (7). Each state at instant $n$ has, in addition to the cumulative metric and the associated surviving sequence, an associated estimated channel coefficient vector $\hat{\alpha}_r^n$.
In order to avoid an increase in computational complexity, we observe that the estimated coefficients $\hat{\alpha}_r^{(n-1) \to n}$, $n = 1, \ldots, N - 1$, do not have to be calculated separately for each $n$. The estimated coefficients can be updated via the recursive least squares (RLS) algorithm. We fully describe the algorithm below.
Step 0 At the beginning of the algorithm (n = 0) the decoder is at state 0. The decoder is supplied with an initial estimate of α, denoted asα 0 0 . This estimate can be achieved, for example, by using a pilot sequence matrix with L (number of transmitters) rows. In this case we define where X p is the pilot sequence. When a pilot sequence is not availableα 0 0 can be determined, for example, by using the most updated estimation ofα 0 0 from the previous frame. When no pilot sequences are used we set Q 0 0 = 0 (L × L matrix).
Step n In addition to the cumulative metric associated with each state in step $n - 1$ and the surviving sequence, each state $q$ is also associated with estimated coefficients $\hat{\alpha}_q^{n-1}$. Consider a transition $r$ labeled $s_{n,1} s_{n,2} \cdots s_{n,L}$. Define

$$s_r^n = (s_{n,1}, s_{n,2}, \ldots, s_{n,L}), \qquad y^n = (y_{n,1}, y_{n,2}, \ldots, y_{n,J}). \tag{28}$$

The estimate for transition $r$ is then updated by the RLS recursions (29)-(32), which make use of a quantity $K_r^n$ calculated for each transition. The algorithm requires only scalar inversions at each transition of the trellis. The matrix multiplication in (29) is of order $L \times J$, which is the same order as the metric computation. The matrix multiplications in (30) and (31) or (32) are of order $L \times L$ (in practical systems $L \leq 16$, but most often $L = 2$); they do not depend on the received signal and therefore do not have to be computed in real time. Therefore, the complexity of the approximated GLRT decoder is not significantly higher than that of the decoder designed for a known channel.
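For readers unfamiliar with RLS, the following is a minimal sketch of one update step in its standard textbook form. It is not claimed to reproduce the paper's recursions term by term. The matrix `P` (the inverse of the accumulated symbol correlation matrix), the function name, and the large-diagonal initialization mentioned afterwards are assumptions made here for illustration.

```python
def rls_update(alpha, P, s, y):
    """One textbook RLS step for the row model y = s * alpha + noise.

    alpha : L x J list of lists, current channel estimate
    P     : L x L list of lists, inverse of the accumulated correlation matrix
    s     : list of L symbols sent at this instant (the branch label)
    y     : list of J samples received at this instant
    Returns the updated (alpha, P).
    """
    L, J = len(s), len(y)
    # P times the conjugate transpose of s (an L-vector).
    Ps = [sum(P[a][b] * s[b].conjugate() for b in range(L)) for a in range(L)]
    # Only a scalar is inverted: 1 + s P s^H.
    denom = 1 + sum(s[a] * Ps[a] for a in range(L))
    K = [v / denom for v in Ps]
    # Prediction error of the current estimate on the new sample.
    err = [y[j] - sum(s[i] * alpha[i][j] for i in range(L)) for j in range(J)]
    new_alpha = [[alpha[i][j] + K[i] * err[j] for j in range(J)] for i in range(L)]
    # Rank-one downdate of P; it does not involve the received signal.
    sP = [sum(s[a] * P[a][b] for a in range(L)) for b in range(L)]
    new_P = [[P[a][b] - K[a] * sP[b] for b in range(L)] for a in range(L)]
    return new_alpha, new_P
```

In a Viterbi decoder, each surviving path would carry its own `alpha` and `P` and call this function once per candidate transition. Starting from a zero estimate with a large multiple of the identity for `P` is a common generic initialization; the text instead obtains the initial estimate from a pilot sequence or from the previous frame.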

SIMULATION RESULTS
For the simulations, we used the 16-QAM 16-state STC described in Section 3 with two transmit antennas and two receive antennas. The channel coefficients were generated as samples of independent complex zero-mean Gaussian random variables with variance 0.5 per dimension. Each message frame to be encoded was 128 bits long. The channel coefficients were assumed to be constant during a frame period, and the channel coefficients during different frame periods were assumed to be statistically independent (quasi-static conditions). The pilot sequence was short, consisting of only two symbols. The three decoding algorithms examined were the mismatched decoder described in Section 3.3, the approximated GLRT, and the energy weighted decoder. The results of the simulations are shown in Figure 4. The ML decoder, employed when the channel is known to the receiver, is shown as a performance bound. The GLRT has an advantage of about 2.5 dB over the mismatched decoder, and the EWD shows an additional advantage of 0.5 dB. Both algorithms allow communication at a significantly lower SNR than the mismatched decoder for this QAM STC system.
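The simulation setup can be sketched as follows. The constants come from the description above; the function names, and the simplification of counting one trellis step per 4 input bits while ignoring any trellis termination symbols, are assumptions made here.

```python
import random

BITS_PER_FRAME = 128   # message frame length stated in the text
BITS_PER_SYMBOL = 4    # 16-QAM carries 4 bits per symbol
PILOT_SYMBOLS = 2      # length of the short pilot sequence

def draw_channel(L=2, J=2, rng=random):
    """Draw one quasi-static channel: i.i.d. complex Gaussian gains,
    zero mean, variance 0.5 per real dimension."""
    std = 0.5 ** 0.5
    return [[complex(rng.gauss(0, std), rng.gauss(0, std)) for _ in range(J)]
            for _ in range(L)]

def symbols_per_frame():
    """Number of data symbol intervals per frame (termination ignored)."""
    return BITS_PER_FRAME // BITS_PER_SYMBOL

def pilot_overhead():
    """Fraction of symbol intervals spent on pilots."""
    n = symbols_per_frame()
    return PILOT_SYMBOLS / (PILOT_SYMBOLS + n)
```

A fresh call to `draw_channel` for every frame models the quasi-static assumption. Under these simplifications the frame has 32 data symbol intervals, so a two-symbol pilot costs 2/34 of the transmitted intervals, or just under 6 percent.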

CONCLUSIONS
In this paper, we have examined the problem of decoding for STC systems when the channel coefficients are unknown to the decoder. We have pointed out the difficulties that arise from employing pilot sequences, especially if those sequences are long in comparison to the frame length. These difficulties led us to suggest the GLRT decoder and its approximation for diversity channels. We later derived the EWD, which improves the performance of the GLRT, motivated by the simple fading example in [10]. An efficient implementation of both the approximated GLRT and the EWD was developed, with only a slight increase in complexity over the decoding scheme for known channels. Simulation results demonstrated the performance improvement of the approximated GLRT and the EWD. Since the EWD shows robust performance improvement for non-equal-energy signal constellations, it may enable the use of QAM even when the antenna fading gains are unknown.
One direction for future research is adapting our decoders to a time-variant channel model. A natural generalization of the GLRT and the EWD to time-variant channels would modify the estimation of the channel coefficients involved in the algorithm. Instead of LS estimation, it could involve the weighted least squares (WLS) algorithm, where the weights are chosen to account for the changes in the channel.
The problem of encoder design for unknown diversity channels can be investigated more closely in order to achieve a complete view of a universal communication system. A general discussion of robust communication for various classes of unknown channels can be found in [12], where both encoding and decoding methods were considered. The design criteria for good codes when the channel coefficients are assumed to have a complex Gaussian distribution were investigated in [1]. Understanding the design criteria for good codes when the channel coefficients do not have a statistical model can be the subject of future research.