EURASIP Journal on Applied Signal Processing 2002:5, 459–470
© 2002 Hindawi Publishing Corporation

Space-Time Turbo Trellis Coded Modulation for Wireless Data Communications

This paper presents the design of space-time turbo trellis coded modulation (ST turbo TCM) for improving the bandwidth efficiency and the reliability of future wireless data networks. We present new recursive space-time trellis coded modulation (STTC) schemes which outperform the feedforward STTC proposed by Tarokh et al. (1998) and Baro et al. (2000) on slow and fast fading channels. A substantial improvement in performance can be obtained by constructing ST turbo TCM, which consists of concatenated recursive STTCs decoded by an iterative decoding algorithm. The proposed recursive STTCs are used as constituent codes in this scheme. They have been designed to satisfy the design criteria for STTC on slow and fast fading channels, derived for systems with the product of transmit and receive antennas larger than 3. The proposed ST turbo TCM significantly outperforms the best known STTC on both slow and fast fading channels. On fast fading channels, the scheme is less than 3 dB away from the theoretical capacity bound for multi-input multi-output (MIMO) channels.


INTRODUCTION
In present cellular mobile communication systems, multiple antennas are being considered for base station receivers with the aim of suppressing cochannel interference and minimizing fading effects on the uplink. The size of base stations allows the deployment of receive diversity on the uplink. On the downlink, however, the limited size and power of the mobile stations make it more practical to consider transmit diversity. Transmit diversity decreases the required processing power of the receivers, resulting in a simpler system structure, lower power consumption, and lower cost. Furthermore, transmit diversity can be combined with receive diversity to further improve the system performance and increase the spectral efficiency. Channel coding combined with spatial diversity is called space-time (ST) coding.
Code design criteria based on the rank and the determinant of the codeword distance matrix for trellis-based ST codes were derived in [1,2]. In this approach, multiple transmit antennas and error correction coding are combined with higher level modulation schemes. An ST encoder takes as input a block of $b$ information bits and maps them into $n_T$ modulation symbols from a signal set of $2^b$ points. Each output modulation symbol feeds a separate transmit antenna. The symbols from the $n_T$ antennas are transmitted simultaneously in one symbol interval. The scheme gives a maximum spectral efficiency of $b$ bits/s/Hz, which is equal to the spectral efficiency of the reference uncoded system.
The receiver uses a maximum likelihood decoding algorithm to recover the transmitted information. Space-time trellis coded modulation (STTC) can achieve a substantial improvement in performance, benefiting from both diversity and coding gains. However, when the number of transmit antennas gets larger, the complexity of the receiver structure and the code construction becomes prohibitive. In [1], feedforward STTCs with two transmit antennas were designed. In [3], a set of feedforward 4PSK STTCs that improve on the codes in [1] was proposed.
Recently, a new set of design criteria for STTCs on slow and fast fading channels was proposed [4,5]. These criteria are applicable to multiple-input multiple-output (MIMO) channels with a high diversity order. When the diversity order, defined as the product of the minimum rank of the distance matrices and the number of receive antennas, is small, the rank and determinant criteria proposed in [1] are valid. However, for high diversity orders (larger than 3), the minimum trace of the codeword distance matrix, or equivalently the minimum squared Euclidean distance, dominates the code performance, and its minimum value should be maximized in code design. Motivated by this design criterion, in this paper we design recursive STTCs and demonstrate that they are superior to the feedforward STTCs reported in [1,3]. Furthermore, we construct an ST turbo trellis coded modulation (TCM) scheme with the new recursive STTCs as constituent codes. The recursive structure of the constituent codes enables the full benefit of interleaver gain and iterative decoding. The proposed ST turbo TCM scheme is based on a parallel concatenation of two constituent STTCs and alternate puncturing of parity symbols, analogous to the turbo TCM scheme reported in [6]. The ST turbo trellis encoder consists of two identical recursive STTCs linked by an interleaver and followed by an MPSK signal mapper. The iterative decoder operates on the constituent code trellis and generates soft symbol estimates by a log-MAP algorithm [7,8].
Independent of the work of this paper, a similar design was done in [9], based on a recursive code obtained by converting the feedforward STTC reported in [1] into a recursive code. In [10], a turbo code is serially concatenated with a space-time block code. In [11], the concept of recursive STTCs is first suggested and the serial and parallel concatenation structures with recursive STTCs as component codes are proposed. In [10,11], full diversity is guaranteed but full rate is not achieved. In [12], a novel serial concatenation of STTC with interleaver and rate 1 simple recursive inner code is proposed.
One of the key issues with turbo codes is decoding algorithm convergence. We discuss the decoder convergence of the proposed ST turbo TCM scheme and evaluate the decoding thresholds, expressed as the minimum $E_b/N_0$ ratio for which the code can converge. Furthermore, we estimate that the proposed ST turbo TCM codes are less than 3 dB away from the MIMO theoretical channel capacity limit [13].

STTC SYSTEM MODEL
The system under consideration employs a recursive STTC with $n_T$ transmit and $n_R$ receive antennas. While the transmitter has no knowledge about the channel, it is assumed that the receiver can recover the channel state information perfectly. Information bits are encoded into $n_T$ streams of MPSK symbols by the ST encoder. A space-time symbol $x_t$ at time $t$ consists of $n_T$ MPSK symbols, and can be written as $x_t = (x_t^1, x_t^2, \ldots, x_t^{n_T})$. At any given time $t$, an MPSK symbol $x_t^i$ is transmitted through the $i$th antenna, $i = 1, 2, \ldots, n_T$.
At the receiver, each antenna receives a noisy superposition of the $n_T$ transmitted symbols, which have been subjected to independent fading. After matched filtering, assuming ideal timing information, the received signal $r_t^j$ at the $j$th receive antenna at time $t$ can be expressed as

$$r_t^j = \sqrt{E_s} \sum_{i=1}^{n_T} h_{i,j}(t)\, x_t^i + n_t^j, \qquad (1)$$

where $h_{i,j}(t)$ models the complex fading gain from transmit antenna $i$ to receive antenna $j$ at time $t$, $i = 1, 2, \ldots, n_T$, $j = 1, 2, \ldots, n_R$, and $E_s$ is the energy per symbol. On a fast fading channel, we assume that the fading coefficients change independently from symbol to symbol. On a slow fading channel, we assume that the fading coefficients remain the same over a frame and change independently from frame to frame. When the fading coefficients remain the same over more than one symbol but less than a frame, the channel undergoes block fading. Regardless of the fade rate, the fading gains are modelled as independent samples of a complex Gaussian random variable with zero mean and a variance of 0.5 per dimension. The noise $n_t^j$ at the $j$th receive antenna at time $t$ is modelled as an independent sample of a zero-mean complex Gaussian random variable with a noise spectral density of $N_0$.
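The channel model described above can be sketched in a few lines. This is a minimal illustration, not the simulator used for the reported results: it assumes the received signal is the $\sqrt{E_s}$-scaled superposition of the faded transmit symbols plus noise, and it takes the noise variance to be $N_0/2$ per dimension. The function names (`complex_gaussian`, `mpsk`, `draw_fading`, `receive`) are hypothetical.

```python
import cmath
import math
import random


def complex_gaussian(rng, var_per_dim):
    """Draw one zero-mean complex Gaussian sample with the given variance per dimension."""
    sigma = math.sqrt(var_per_dim)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def mpsk(k, m=4):
    """Map the integer k in {0, ..., m-1} to a unit-energy MPSK point."""
    return cmath.exp(2j * math.pi * k / m)


def draw_fading(n_t, n_r, rng):
    """Fading gains h[i][j] from transmit antenna i to receive antenna j (variance 0.5 per dimension)."""
    return [[complex_gaussian(rng, 0.5) for _ in range(n_r)] for _ in range(n_t)]


def receive(x_t, h, es, n0, rng):
    """Return the received samples [r_t^1, ..., r_t^{n_R}] for one space-time symbol x_t.

    Assumption of this sketch: noise variance is n0 / 2 per dimension.
    """
    n_t, n_r = len(h), len(h[0])
    received = []
    for j in range(n_r):
        signal = math.sqrt(es) * sum(h[i][j] * x_t[i] for i in range(n_t))
        received.append(signal + complex_gaussian(rng, n0 / 2.0))
    return received


rng = random.Random(1)
h = draw_fading(2, 2, rng)      # on a fast fading channel, redraw h for every symbol
x_t = [mpsk(1), mpsk(3)]        # one 4PSK symbol per transmit antenna
r_t = receive(x_t, h, es=1.0, n0=0.1, rng=rng)
```

For a slow fading channel the same `h` would be reused for every symbol of a frame; for block fading it would be redrawn once per fading block.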

PERFORMANCE ANALYSIS AND CODE DESIGN CRITERIA
A memory-$\nu$ recursive MPSK STTC can be described in terms of its $2^\nu$-state trellis. At time $t = 0$ the trellis is at the zero state. Given a particular input, the state of the trellis at any given time is indicated by the content of the $\nu$ memory taps. At time $t$, there are $M$ branches leaving each state $s_t^i$, $i \in \{0, 1, \ldots, 2^\nu - 1\}$, each of which corresponds to an incoming input $j$, $j \in \{0, 1, \ldots, M - 1\}$, and is labeled with $n_T$ MPSK symbols. These MPSK symbols are the encoder output to be transmitted simultaneously through the $n_T$ transmit antennas when the previous state is $s_t^i$ and the input is $j$. At the decoder, the received sequence is decoded using a maximum likelihood decoding algorithm based on the $M$-ary trellis.
Following the derivation in [5], consider an $n_T \times n_T$ codeword distance matrix $A(x,\hat{x}) = B(x,\hat{x}) \cdot B^H(x,\hat{x})$ between two codewords $x = (x_1, x_2, \ldots, x_t, \ldots, x_l)$ and $\hat{x} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_t, \ldots, \hat{x}_l)$ of length $l$. The matrix $B^H$ denotes the Hermitian of a matrix $B$, and $B(x,\hat{x})$ is the $n_T \times l$ codeword difference matrix, defined as

$$B(x,\hat{x}) = \begin{bmatrix} x_1^1 - \hat{x}_1^1 & x_2^1 - \hat{x}_2^1 & \cdots & x_l^1 - \hat{x}_l^1 \\ x_1^2 - \hat{x}_1^2 & x_2^2 - \hat{x}_2^2 & \cdots & x_l^2 - \hat{x}_l^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{n_T} - \hat{x}_1^{n_T} & x_2^{n_T} - \hat{x}_2^{n_T} & \cdots & x_l^{n_T} - \hat{x}_l^{n_T} \end{bmatrix}. \qquad (2)$$

For the purpose of our analysis, define $r$ as the minimum rank of the matrix $A(x,\hat{x})$ over all possible codeword pairs, and $\delta_H$, the minimum symbol Hamming distance, as

$$\delta_H = \min_{x \ne \hat{x}} \left| \upsilon(x,\hat{x}) \right|, \qquad (3)$$

taken over all codeword pairs, where $\upsilon(x,\hat{x})$ denotes the set of time instances $t \in \{1, 2, \ldots, l\}$ such that $x_t - \hat{x}_t \ne 0$.

Performance on slow fading channels
The pairwise error probability $P(x,\hat{x})$ is the probability that the decoder selects as its estimate the sequence $\hat{x}$ when the transmitted sequence was in fact $x$. When $r \cdot n_R \ge 4$, on a slow fading channel, the pairwise error probability can be upper bounded by the expression (4) derived in [5], in which $\lambda_i$, $i = 1, 2, \ldots, r$, are the nonzero eigenvalues of the matrix $A(x,\hat{x})$, $\sigma^2$ is the noise variance, and $Q(\cdot)$ is the complementary error function.
By using the inequality $Q(x) \le \tfrac{1}{2} e^{-x^2/2}$ for $x \ge 0$, at high signal-to-noise ratios the upper bound in (4) can be further approximated by the simpler bound (5). From (5), it can be seen that, in order to minimize the error probability, the minimum sum of all eigenvalues of the matrix $A(x,\hat{x})$ among all codeword pairs should be maximized. For a square matrix, the sum of the eigenvalues is equal to the sum of all elements on the main diagonal, which is called the trace of the matrix. Since the trace of $A(x,\hat{x}) = B(x,\hat{x}) B^H(x,\hat{x})$ equals $\sum_{i=1}^{n_T} \sum_{t=1}^{l} |x_t^i - \hat{x}_t^i|^2$, the performance is dominated by the minimum trace, which is equivalent to the minimum squared Euclidean distance over all codeword pairs.
When $r \cdot n_R < 4$, however, the upper bound on the pairwise error probability at high signal-to-noise ratios takes the form (6), which suggests that to achieve the best performance the minimum rank and the minimum product of all nonzero eigenvalues of $A(x,\hat{x})$ should be maximized. If full rank is achievable, this is equivalent to maximizing the minimum determinant of $A(x,\hat{x})$, as first proposed in [1].
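The equivalence between the trace of the codeword distance matrix and the squared Euclidean distance can be checked numerically. The sketch below is an illustration under stated assumptions: codewords are given as lists of per-antenna 4PSK symbol indices, the two example codewords are arbitrary (they are not taken from the code tables), and all function names are hypothetical. The closed-form eigenvalue routine only covers the two-antenna case.

```python
import cmath
import math


def mpsk(k, m=4):
    """Unit-energy MPSK point for the integer label k."""
    return cmath.exp(2j * math.pi * k / m)


def difference_matrix(x, x_hat, m=4):
    """B(x, x_hat): one row per transmit antenna, one column per time instant.

    x[t][i] is the symbol label sent from antenna i at time t.
    """
    n_t = len(x[0])
    return [[mpsk(x[t][i], m) - mpsk(x_hat[t][i], m) for t in range(len(x))]
            for i in range(n_t)]


def distance_matrix(b):
    """A = B * B^H."""
    n = len(b)
    cols = len(b[0])
    return [[sum(b[p][t] * b[q][t].conjugate() for t in range(cols))
             for q in range(n)] for p in range(n)]


def trace(a):
    """Sum of the diagonal entries (real for a Hermitian matrix)."""
    return sum(a[i][i] for i in range(len(a))).real


def squared_euclidean_distance(b):
    """Sum of |x_t^i - x_hat_t^i|^2 over all antennas and time instants."""
    return sum(abs(entry) ** 2 for row in b for entry in row)


def eigenvalues_2x2_hermitian(a):
    """Closed-form eigenvalues of a 2 x 2 Hermitian matrix (two transmit antennas)."""
    tr = (a[0][0] + a[1][1]).real
    det = (a[0][0] * a[1][1] - a[0][1] * a[1][0]).real
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return (tr + disc) / 2.0, (tr - disc) / 2.0


# Arbitrary example pair: three time instants, two transmit antennas.
x = [(0, 0), (0, 0), (0, 0)]
x_hat = [(0, 2), (2, 1), (0, 0)]

B = difference_matrix(x, x_hat)
A = distance_matrix(B)
lam1, lam2 = eigenvalues_2x2_hermitian(A)
# trace(A), lam1 + lam2 and squared_euclidean_distance(B) agree up to rounding.
```

For this pair both eigenvalues are nonzero, so the matrix has full rank and the product `lam1 * lam2` equals the determinant used by the rank and determinant criteria.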

Performance on fast fading channels
Provided that $\delta_H \cdot n_R \ge 4$ [5], the pairwise error probability on fast Rayleigh fading channels can be upper bounded by the expression (7), where $d_E^2$ is the accumulated squared Euclidean distance between two space-time symbol sequences, given by

$$d_E^2 = \sum_{t \in \upsilon(x,\hat{x})} \sum_{i=1}^{n_T} \left| x_t^i - \hat{x}_t^i \right|^2, \qquad (8)$$

while $D^4$ is given by (9). By using an approximation of the $Q(\cdot)$ function, at high signal-to-noise ratios the upper bound in (7) can be further approximated by (10). From (10), we can conclude that the pairwise error probability is dominated by the squared Euclidean distance $d_E^2$. When $\delta_H \cdot n_R < 4$, the upper bound on the pairwise error probability at high signal-to-noise ratios takes the form (11), where $d_p^2$ is the product of the squared Euclidean distances between two space-time symbol sequences, given by

$$d_p^2 = \prod_{t \in \upsilon(x,\hat{x})} \sum_{i=1}^{n_T} \left| x_t^i - \hat{x}_t^i \right|^2. \qquad (12)$$

When $r \cdot n_R \ge 4$ and $\delta_H \cdot n_R \ge 4$, the design criteria for STTC on slow and fast fading channels are identical. The design criterion in this case can be formulated as
• Maximize the minimum Euclidean distance over all codewords.
Therefore, provided that $r \cdot n_R \ge 4$ and $\delta_H \cdot n_R \ge 4$, we can construct a set of recursive STTCs which best satisfy the design criterion and perform well on both types of fading channels, and which can be directly used as constituent codes in a parallel concatenation structure.

Figure 1: Feedforward STTC encoder.
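The distance measures that drive the fast fading criteria can be computed directly for a given pair of codewords. The following sketch is illustrative only: it evaluates the symbol Hamming distance, the accumulated squared Euclidean distance, and the product distance for one arbitrary pair, whereas the design criteria take the minimum over all codeword pairs. The helper names and the returned dictionary layout are assumptions of this example.

```python
import cmath
import math


def mpsk(k, m=4):
    """Unit-energy MPSK point for the integer label k."""
    return cmath.exp(2j * math.pi * k / m)


def per_symbol_sq_distances(x, x_hat, m=4):
    """|x_t - x_hat_t|^2 for each time instant t, summed over the transmit antennas."""
    return [sum(abs(mpsk(a, m) - mpsk(b, m)) ** 2 for a, b in zip(xt, xht))
            for xt, xht in zip(x, x_hat)]


def pair_metrics(x, x_hat, m=4, tol=1e-12):
    """Symbol Hamming distance, squared Euclidean distance and product distance of one pair."""
    differing = [d for d in per_symbol_sq_distances(x, x_hat, m) if d > tol]
    return {
        "hamming": len(differing),      # number of time instants where x_t != x_hat_t
        "d_e2": sum(differing),         # accumulated squared Euclidean distance
        "d_p2": math.prod(differing),   # product of the per-symbol squared distances
    }


def dominant_metric(min_hamming, n_r):
    """Which distance dominates on a fast fading channel for a given delta_H and n_R."""
    return "euclidean" if min_hamming * n_r >= 4 else "product"


# Arbitrary example pair: three time instants, two transmit antennas, 4PSK labels.
x = [(0, 0), (0, 0), (0, 0)]
x_hat = [(0, 2), (2, 1), (0, 0)]
metrics = pair_metrics(x, x_hat)
criterion = dominant_metric(metrics["hamming"], n_r=2)
```

With two receive antennas and a symbol Hamming distance of 2 the product reaches 4, so the Euclidean distance is the relevant measure; with a single receive antenna the product distance would be used instead.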

Code structure
In this section, the structure of systematic and nonsystematic recursive STTCs is explained. A feedforward STTC encoder for 4PSK and two antennas with a memory order of $\nu = 2\nu_1$ is shown in Figure 1. If the sequence $c^1 = (c^1_0, c^1_1, c^1_2, \ldots)$ is the binary input stream to the upper row of shift registers, in polynomial form it can be represented as

$$c^1(D) = c^1_0 + c^1_1 D + c^1_2 D^2 + \cdots.$$

Similarly, the binary input sequence to the lower row of shift registers can be written as

$$c^2(D) = c^2_0 + c^2_1 D + c^2_2 D^2 + \cdots.$$

The feedforward generator polynomials for the upper and lower rows of shift registers and transmit antenna $i$, where $i \in \{1, 2\}$, are polynomials in $D$ whose coefficients are the tap weights connecting the corresponding row to antenna $i$. The encoded symbol sequence transmitted from antenna $i$ is given by (18), a relationship that can equivalently be written in the matrix form (19).
The feedforward generator matrix from (19) can be converted into an equivalent recursive matrix by dividing it by a binary polynomial $q(D)$ of degree equal to or less than $\nu_1$. If $q(D)$ is chosen to be a primitive polynomial, the resulting recursive code should have a high minimum distance. The generator for antenna $i$ is then represented by the feedforward generator polynomials for that antenna, each divided by $q(D)$. A systematic recursive STTC can be obtained by fixing the generator for the first antenna so that the output of the first antenna is obtained by directly mapping the input sequences $c^1$ and $c^2$ into a 4PSK sequence. A diagram of a recursive 4PSK STTC encoder with two transmit antennas is shown in Figure 2.
A recursive 8PSK STTC can be generated by a similar procedure, by converting a feedforward 8PSK STTC generator matrix with polynomial entries into an equivalent recursive generator matrix with rational entries. The spectral efficiency in this case is 3 bits/s/Hz.
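The effect of dividing a feedforward generator by $q(D)$ can be illustrated with a small encoder sketch. Everything specific here is an assumption made for illustration: the function name `sttc_encode`, the tap layout, the delay-diversity tap values in `DELAY_TAPS` (they are not the codes of Tables 1 and 2), and the choice $q(D) = 1 + D$. The feedback is realized in each row by forming $\hat{c}_t = c_t + \sum_{j \ge 1} q_j \hat{c}_{t-j} \pmod 2$ before the feedforward taps are applied, which is one standard way to implement a division by $q(D)$.

```python
def sttc_encode(bits, taps, q=(1,), modulus=4):
    """Encode groups of input bits with a (possibly recursive) space-time trellis encoder.

    bits    : list of bit groups, one group per time instant, one bit per shift-register row
    taps    : taps[k][j] = tuple of coefficients (one per transmit antenna) applied to the
              bit of row k delayed by j time instants
    q       : feedback polynomial coefficients (q_0 = 1, q_1, ...); q = (1,) means feedforward
    modulus : size of the PSK signal set (4 for 4PSK)
    """
    memory = len(taps[0]) - 1
    assert len(q) - 1 <= memory, "feedback degree must not exceed the row memory"
    n_antennas = len(taps[0][0])
    history = [[0] * memory for _ in taps]  # history[k][j-1]: row k register bit delayed by j
    symbols = []
    for group in bits:
        windows = []
        for k, c in enumerate(group):
            feedback = sum(q[j] * history[k][j - 1] for j in range(1, len(q)))
            c_hat = (c + feedback) % 2      # bit actually shifted into row k
            windows.append([c_hat] + history[k])
        symbols.append(tuple(
            sum(taps[k][j][i] * windows[k][j]
                for k in range(len(taps)) for j in range(memory + 1)) % modulus
            for i in range(n_antennas)
        ))
        history = [w[:memory] for w in windows]
    return symbols


# Illustrative delay-diversity taps: antenna 2 sends 2*c1 + c2 at time t,
# antenna 1 sends the same 4PSK label one time instant later.
DELAY_TAPS = [
    [(0, 2), (2, 0)],   # row fed by c1
    [(0, 1), (1, 0)],   # row fed by c2
]

impulse = [(1, 0)] + [(0, 0)] * 5
feedforward_out = sttc_encode(impulse, DELAY_TAPS)            # returns to all-zero output
recursive_out = sttc_encode(impulse, DELAY_TAPS, q=(1, 1))    # keeps producing nonzero symbols
```

A single nonzero input drives the feedforward encoder back to the zero state after the memory is flushed, while the recursive encoder keeps emitting nonzero symbols. This infinite impulse response is what allows an interleaver between two such encoders to provide interleaver gain.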

Algebraic structure of recursive space-time trellis codes
For a 4PSK recursive STTC, the output symbols s 1 t and s 2 t from Figure 2 can be expressed algebraically as where i ∈ {0, 1}. The encoder for an 8PSK recursive STTC is implemented as a feedforward shift register with a memory order of ν. The encoder output can be expressed as

A hybrid design of robust recursive STTC
In this section, we consider design of recursive STTCs which can deliver data transmission with bandwidth efficiency of 2 and 3 bits/s/Hz. Unlike previously reported feedforward STTC in [1,3], these recursive codes can be used directly as constituent codes in ST turbo TCM schemes to deliver data transmission at the same rates but at much lower signal to noise ratios than the reference uncoded systems with the same spectral efficiency. In a cellular system, a lower transmission power means lower interference to neighboring cells, thus allowing a frequency band to be reused more frequently. Section 3 discusses the design criteria for recursive STTCs on slow and fast fading channels. In reality, however, the fade rate falls somewhere between these two extremes. Therefore, it would be desirable to obtain a set of codes which satisfy the design criteria for both extreme conditions. It is expected that such codes will perform well in a wide variety of fading conditions. In [1], such codes are termed smart and greedy space-time codes because the encoder does not need to know the channel but can take advantage of the benefits offered by both the multiple transmit/receive antennas and the possible temporal channel variations.
We have stated previously that when $r \cdot n_R \ge 4$ and $\delta_H \cdot n_R \ge 4$, the design criteria for recursive STTCs on slow and fast fading channels coincide. Under these conditions, the error probability is minimized when the minimum squared Euclidean distance, $d_E^2$, of the code is maximized. Therefore, with the code structure given in Section 4.1 and assuming that at least two receive antennas ($n_R \ge 2$) are available to the system, we find, for a given memory order, the set of generator coefficients ($a$ and $b$ for recursive 4PSK STTCs; $a$, $b$, and $d$ for recursive 8PSK STTCs) which maximizes $d_E^2$. Tables 1 and 2 list recursive 4PSK and 8PSK STTCs, respectively, with two transmit antennas which best satisfy the design criterion on slow and fast fading channels, provided that $n_R \ge 2$. Each code in both tables has the minimum rank $r = 2$ and the minimum symbol Hamming distance $\delta_H \ge 2$, satisfying the condition on the design criterion. These codes were obtained through an exhaustive computer search. They were initially constructed in a feedforward form in [4].
A further investigation shows that these codes maintain their superiority in terms of their squared Euclidean distance, and thus their performance, when they are converted into a feedback recursive form as discussed in Section 4.1. Both tables list the squared Euclidean distance of each code and that of its counterparts of the same memory order reported in [1,3]. For any given memory order, the new recursive STTC has the largest $d_E^2$, indicative of a superior performance on slow and fast fading channels for a large product $r n_R$.

PERFORMANCE OF RECURSIVE STTC
In this section, we compare the performance of the new recursive STTCs with previously known feedforward STTCs on slow and fast fading channels. The performance is measured in terms of the frame error rate as a function of $E_b/N_0$, the ratio of the energy per information bit to the noise spectral density at each receive antenna. Each frame consists of 130 MPSK symbol transmissions from each transmit antenna. Figure 3 shows the performance of the new 8-state and 32-state 4PSK recursive STTCs in comparison with feedforward STTCs of the same memory order proposed in [1,3] with four receive antennas on slow fading channels. The 8-state STTCs in [1,3] achieve virtually the same performance, while the new 8-state recursive STTC offers a 0.5 dB gain over the other two STTCs at a frame error rate (FER) of $10^{-3}$. The new recursive 32-state STTC offers a 0.5 dB gain over the feedforward STTC in [1] at the same frame error rate. Figure 4 shows the performance of the new 16-state and 32-state 4PSK recursive STTCs with two receive antennas on fast fading channels. The performance curves show consistently lower error rates for the new recursive STTCs than for the feedforward STTCs of the same memory order previously proposed in [1,3]. At FER = $10^{-3}$, the new recursive 16-state 4PSK STTC offers a 2 dB and a 0.8 dB gain over the feedforward STTCs in [1,3], respectively. The new recursive 32-state 4PSK STTC outperforms the feedforward STTC of the same memory order in [1] by 0.5 dB at FER = $10^{-3}$.
All figures we have shown in this section confirm that the new recursive STTC outperforms feedforward STTC of the same memory order previously proposed in [1,3], both on slow and fast fading channels. Note however, that the recursive structure STTC by itself does not have any advantage over feedforward STTC. As stated in Section 4, recursive STTCs in Tables 1 and 2 were originally constructed in feedforward form. Figure 5 shows the performance of the 16-state 4PSK STTC in feedforward and recursive forms on quasi-static fading channels. The frame error rate performance of the code in both forms is identical.

SPACE-TIME TURBO TCM
Having designed and constructed a set of recursive STTCs with a superior performance on slow and fast fading channels, we would like to use them in a parallel concatenation to further reduce bit errors by taking advantage of interleaver gain and iterative decoding. Figure 6 shows the encoder structure of an ST turbo TCM with two transmit antennas, consisting of two recursive STTC encoders in the upper and lower branches, linked by a pairwise interleaver and a symbol deinterleaver [6]. Each encoder operates on a message block of $L$ groups of $b$ information bits, where $L$ is the interleaver size. The message sequence $c$ is given by $c = (c_1, c_2, \ldots, c_t, \ldots, c_L)$, where $c_t$ is a group of $b$ information bits at time $t$, given by $c_t = (c_{t,0}, c_{t,1}, \ldots, c_{t,b-1})$.
The upper recursive STTC encoder in Figure 6 maps the input sequence into two streams of $L$ MPSK symbols, $x_1^1$ and $x_1^2$, where $x_1^i = (x_{1,1}^i, x_{1,2}^i, \ldots, x_{1,L}^i)$, $i \in \{1, 2\}$. Prior to encoding by the lower encoder, the information bits are interleaved by a pairwise symbol interleaver. The pairwise symbol interleaver operates on groups of $b$ bits instead of on single bits. The interleaver maps even positions to even positions, and odd ones to odd ones. The interleaver ensures that the ordering of the $b$ information bits arriving at the interleaver at any time instant $t$ remains unchanged. The lower encoder also produces two streams of $L$ MPSK symbols. Each stream is deinterleaved, resulting in $x_2^1$ and $x_2^2$, where $x_2^i = (x_{2,1}^i, x_{2,2}^i, \ldots, x_{2,L}^i)$, $i \in \{1, 2\}$. Deinterleaving at this stage ensures that the $b$ information bits determining the output symbols of the upper and lower encoders at any given time instant are identical. Assuming that $L$ is even, the first streams of symbols generated by the upper and lower encoders, $x_1^1$ and $x_2^1$, are alternately punctured into $x^1 = (x_{1,1}^1, x_{2,2}^1, x_{1,3}^1, x_{2,4}^1, \ldots, x_{1,L-1}^1, x_{2,L}^1)$ and transmitted through the first transmit antenna. The second streams of symbols generated by the upper and lower encoders, $x_1^2$ and $x_2^2$, are alternately punctured into $x^2 = (x_{1,1}^2, x_{2,2}^2, x_{1,3}^2, x_{2,4}^2, \ldots, x_{1,L-1}^2, x_{2,L}^2)$ and transmitted through the second transmit antenna.
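The interleaving and puncturing steps described above can be sketched as follows. This is a minimal illustration: the parity-preserving permutation is drawn at random here, which is an assumption (the text does not specify how the interleaver is constructed), and the function names are hypothetical. Positions are 0-based in the code, so the first, third, ... time instants of the text correspond to even indices.

```python
import random


def odd_even_interleaver(length, rng):
    """Random permutation that maps even positions to even positions and odd to odd.

    perm[p] is the input position whose symbol is written to output position p.
    """
    evens = list(range(0, length, 2))
    odds = list(range(1, length, 2))
    src_even, src_odd = evens[:], odds[:]
    rng.shuffle(src_even)
    rng.shuffle(src_odd)
    perm = [0] * length
    for p, src in zip(evens, src_even):
        perm[p] = src
    for p, src in zip(odds, src_odd):
        perm[p] = src
    return perm


def interleave(seq, perm):
    """Reorder whole groups (symbols); the bits inside each group stay in order."""
    return [seq[src] for src in perm]


def deinterleave(seq, perm):
    """Inverse of interleave for the same permutation."""
    out = [None] * len(seq)
    for p, src in enumerate(perm):
        out[src] = seq[p]
    return out


def puncture(upper, lower_deinterleaved):
    """Alternate between the encoders: upper symbol at the 1st, 3rd, ... time instant,
    deinterleaved lower symbol at the 2nd, 4th, ... time instant."""
    return [upper[t] if t % 2 == 0 else lower_deinterleaved[t]
            for t in range(len(upper))]


rng = random.Random(3)
groups = [(0, 1), (1, 1), (0, 0), (1, 0), (1, 1), (0, 1)]   # L = 6 groups of b = 2 bits
perm = odd_even_interleaver(len(groups), rng)
lower_input = interleave(groups, perm)                       # fed to the lower encoder
```

Because the permutation preserves parity and the lower encoder output is deinterleaved before puncturing, every information group contributes exactly one transmitted symbol per antenna, taken from either the upper or the lower encoder.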

DECODING ALGORITHM
The decoding process of ST turbo TCM is very similar to that of binary turbo codes except that the symbol probability is used as the extrinsic information instead of the bit probability. The MAP decoding algorithm for nonbinary trellises is called the symbol-by-symbol MAP algorithm. Since the extrinsic information can become either too large or too small and cause computational overflows, a log-MAP algorithm is used instead of MAP. With a log-MAP decoder, the logarithm of probabilities is computed and passed to the next decoding stage.
The log-MAP decoder computes the log-likelihood ratio of each group of information bits $c_t = i$. The soft output $\Lambda(c_t = i)$ is given by the symbol-by-symbol log-MAP expression of [7,8]. The decoding operation proceeds as for binary turbo codes when the decoder receives the symbol generated by its own encoder. However, for every even received signal, the decoder receives the punctured symbol, which is generated by the other encoder. The decoder in this case ignores this symbol by setting the branch transition metric to zero. The only input at this step in the trellis is the a priori component obtained from the other decoder.
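Two ingredients of the log-domain processing described above can be sketched in isolation: the pairwise log-sum operation (often called max* or the Jacobian logarithm) that replaces additions of probabilities, and the treatment of punctured positions. This is not a full log-MAP decoder; the function names and the way the branch metric is split into a channel term and an a priori term are assumptions of this example.

```python
import math
from functools import reduce


def max_star(a, b):
    """Jacobian logarithm: log(exp(a) + exp(b)) computed without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def log_sum_exp(values):
    """log of the sum of exp(v) over all v, built from repeated max_star operations."""
    return reduce(max_star, values, -math.inf)


def normalize(log_probs):
    """Shift the log-probabilities of the M symbol hypotheses so they sum to one."""
    total = log_sum_exp(log_probs)
    return [v - total for v in log_probs]


def branch_metric(channel_log_likelihood, a_priori_log_prob, punctured):
    """Log-domain branch metric for one trellis transition.

    When the received symbol at this step was generated by the other encoder
    (a punctured position), the channel term is set to zero and only the
    a priori term from the other decoder remains.
    """
    channel_term = 0.0 if punctured else channel_log_likelihood
    return channel_term + a_priori_log_prob


# Example: symbol log-probabilities far outside the range where exp() is safe.
extrinsic = normalize([-1000.0, -1001.0, -1002.5, -999.0])
```

Working with `max_star` keeps all quantities in the log domain, so very small or very large symbol probabilities never have to be represented directly.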

ST TURBO TCM PERFORMANCE
This section evaluates the performance of the ST turbo TCM scheme on fast and block fading channels. In each case, it is assumed that the receiver has two receive antennas. Figure 7 shows the FER performance comparison between the 16-state recursive 4PSK STTC in Table 1 and a 16-state 4PSK ST turbo TCM. The 16-state recursive 4PSK STTC is the constituent code in the ST turbo TCM configuration. The performance curves show that the ST turbo TCM configuration offers a large improvement. At a frame error rate of $10^{-3}$, with ten iterations and an interleaver size of 1024, it achieves a gain of more than 7 dB relative to the STTC. At the same frame error rate, it achieves more than 2 dB gain compared to the ST turbo TCM with the constituent code of the same memory order proposed in [9]. The bandwidth efficiency in all cases is 2 bits/s/Hz. Figure 8 shows the performance of the 4-state 4PSK ST turbo TCM on quasi-static fading channels. The number of iterations is 10 and the interleaver size is 130. The curves show that at FER = $10^{-2}$ the ST turbo TCM offers 8.8 dB and 8.0 dB gain over the recursive STTC for fading block sizes of 100 and 200, respectively. Figure 9 shows the FER performance of the new 32-state 8PSK ST turbo TCM in comparison with that of the 32-state 8PSK STTC. In this case, with ten iterations the new 32-state 8PSK ST turbo TCM offers more than 7 dB gain at FER = $10^{-3}$, compared to the 32-state recursive 8PSK STTC in Table 2. When the number of iterations is reduced from ten to six, the performance is degraded by about 0.3 dB. Figure 10 shows the effects of increasing the number of transmit and receive antennas on the performance of the 16-state 4PSK ST turbo TCM on fast fading channels.
Following the algebraic description of a recursive 4PSK STTC in Section 4.2, the constituent recursive 4PSK STTCs with three and four transmit antennas are specified by analogous expressions, where in both cases $\hat{c}_t^k$, $k \in \{0, 1\}$, is defined in (25). The performance curves show that increasing the number of transmit antennas from two to three brings about 0.7 dB gain at FER = $10^{-3}$, while increasing the number of transmit antennas from three to four results in a negligible gain. The incremental gain resulting from increasing the number of transmit antennas stays relatively the same when the number of receive antennas increases from two to four.
The performance curves of Figures 7, 8, and 9 suggest that the parallel concatenation of STTC outperforms recursive STTC scheme. One may argue, however, that the comparison is less than fair since ST turbo TCM can take advantage of interleaver gain and iterative decoding. Thus, a fairer comparison should consider the performance of ST turbo TCM with other known turbo TCM schemes such as that proposed by Robertson [6,14]. Figure 11 shows the FER performance comparison between the new 16-state 4PSK ST turbo TCM and a 16-state turbo TCM scheme from [6]. Note that although the turbo TCM scheme uses only one transmit antenna while the ST turbo TCM scheme uses two, the total transmit power remains the same. Two receive antennas are used in both cases. With the same interleaver size of 1024 and ten iterations, the ST turbo TCM offers a 2.5 dB gain at a frame error rate of 10 −2 . The bandwidth efficiency in all cases is 2 bits/s/Hz. Note that to achieve the same bandwidth efficiency, the scheme by Robertson et al. has to use 8PSK signal set. Figure 12 shows the performance of 4-state ST turbo TCM and 4-state STTC on quasi-static fading channels with two transmit and two receive antennas. At FER = 10 −3 , the ST turbo TCM offers more than 1.5 dB improvement. The frame size is 130 symbols.

System capacity
Telatar derived the capacity of multiantenna Gaussian channels with and without fading in [13]. Assuming independent Rayleigh fading and independent noise at different receive antennas, the capacity of the channel with $n_T$ transmit and $n_R$ receive antennas under power constraint $P$ equals [13]

$$C = \int_0^\infty \log\!\left(1 + \frac{P}{\sigma^2 n_T}\,\lambda\right) \sum_{k=0}^{m-1} \frac{k!}{(k+n-m)!} \left[L_k^{n-m}(\lambda)\right]^2 \lambda^{n-m} e^{-\lambda}\, d\lambda,$$

where $\sigma^2$ is the noise variance per dimension, $m = \min\{n_R, n_T\}$, $n = \max\{n_R, n_T\}$, and $L_k^{n-m}$ are the associated Laguerre polynomials [15].
Using this formula, we plotted the theoretical capacity of a MIMO independent Rayleigh fading channel when $(n_T, n_R) = (2, 2)$ and $(2, 4)$. Figure 13 shows the spectral efficiency of the scheme against these capacity curves. In Figure 14, the performance of the 16-state ST turbo TCM with four transmit and receive antennas on a quasi-static Rayleigh fading channel is presented. For comparison, the outage probability for 2 bits/s/Hz, which is a lower bound for the FER on quasi-static fading channels, is also included. The performance curves show that the 16-state ST turbo TCM is 1.5 dB away from the outage capacity at an FER of $10^{-3}$.
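The capacity integral of [13] can be evaluated numerically with a short script. The sketch below is an illustration under several assumptions: it uses the standard form of Telatar's integrand (a logarithm weighted by a sum of squared associated Laguerre polynomials times $\lambda^{n-m} e^{-\lambda}$), takes the logarithm to base 2 so that the result is in bits/s/Hz, treats `snr` as the linear ratio $P/\sigma^2$, and truncates the integral at a finite upper limit with Simpson's rule. All function names are hypothetical.

```python
import math


def assoc_laguerre(k, alpha, x):
    """Associated Laguerre polynomial L_k^alpha(x) via the three-term recurrence."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, 1.0 + alpha - x
    for j in range(1, k):
        prev, cur = cur, ((2 * j + 1 + alpha - x) * cur - (j + alpha) * prev) / (j + 1)
    return cur


def eigenvalue_density(lam, n_t, n_r):
    """Weight multiplying the logarithm in the capacity integral (integrates to m)."""
    m, n = min(n_t, n_r), max(n_t, n_r)
    a = n - m
    total = 0.0
    for k in range(m):
        coeff = math.factorial(k) / math.factorial(k + a)
        total += coeff * assoc_laguerre(k, a, lam) ** 2
    return total * lam ** a * math.exp(-lam)


def simpson(f, lo, hi, steps):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    acc = f(lo) + f(hi)
    for i in range(1, steps):
        acc += (4 if i % 2 else 2) * f(lo + i * h)
    return acc * h / 3.0


def ergodic_capacity(n_t, n_r, snr, upper=60.0, steps=6000):
    """Capacity in bits/s/Hz for linear snr = P / sigma^2 (truncated numerical integral)."""
    def integrand(lam):
        return math.log2(1.0 + snr * lam / n_t) * eigenvalue_density(lam, n_t, n_r)
    return simpson(integrand, 0.0, upper, steps)


# Capacity of the two antenna configurations considered in the text at a 10 dB ratio.
snr_linear = 10.0 ** (10.0 / 10.0)
c_2x2 = ergodic_capacity(2, 2, snr_linear)
c_2x4 = ergodic_capacity(2, 4, snr_linear)
```

A useful sanity check on the implementation is that `eigenvalue_density` integrates to $m$, the smaller of the two antenna counts, for any configuration.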

Decoder convergence
We analyze the convergence of the ST turbo TCM decoder by approximating the density function of the extrinsic information messages as a Gaussian distribution, and calculating the mean and variance in the Gaussian density evolution. This technique was used to analyze turbo codes [16] and to obtain an $E_b/N_0$ threshold for low density parity check (LDPC) codes [17]. A threshold is the smallest $E_b/N_0$ value beyond which an iterative decoder converges and the bit error rate goes to zero as the number of iterations increases.
Assuming perfect interleaving, each extrinsic information message is independent and identically Gaussian distributed with mean $\mu_i$ and variance $\sigma_i^2$ at the end of the $i$th iteration. The mean and the variance at each iteration can be determined through simulations. The $\mathrm{SNR}_i$ of the extrinsic information at the $i$th iteration is defined as

$$\mathrm{SNR}_i = \frac{\mu_i^2}{\sigma_i^2}.$$

For a parallel concatenated code, the decoder convergence can be determined by plotting the output SNR versus the input SNR of the first decoder and the input SNR versus the output SNR of the second decoder. If the two curves intersect, the decoder does not converge. The threshold is the value of $E_b/N_0$ at which the two curves just touch. Figure 15 shows the input/output SNR curves of the ST turbo TCM scheme with the 16-state 4PSK STTC in [9] as the constituent code. Note that $\mathrm{SNR}_{in}$ denotes the SNR of the extrinsic information at the input of a decoder, and $\mathrm{SNR}_{out}$ denotes the SNR of the extrinsic information at the output of a decoder. The curves were generated for $E_b/N_0 = -0.5$ dB. The figure shows that the two curves just touch. This implies that the threshold is $-0.5$ dB. Figure 16 shows the input/output SNR curves of the ST turbo TCM scheme with the new 16-state 4PSK STTC as the constituent code when $E_b/N_0 = -0.5$ dB. The figure shows a tunnel between the two curves through which the iterative decoding progresses. This figure suggests that the threshold is less than $-0.5$ dB. A further investigation shows that the threshold for this code is $-0.8$ dB. This shows that the ST turbo TCM with the new 16-state 4PSK STTC as the constituent code performs better than that with the 16-state 4PSK STTC in [9] as the constituent code, because it converges at a lower operating $E_b/N_0$. Furthermore, Table 3 compares the $E_b/N_0$ thresholds of the new recursive 4PSK STTCs with those of the codes proposed in [9] when used as constituent codes in a parallel concatenation structure.
The entries show that for a given memory order, the new recursive STTC has the lower threshold. The entries of Table 3 also suggest that increasing the memory order does not necessarily result in a lower threshold. A similar phenomenon has been observed with binary turbo codes, for which a lower memory code has a lower threshold. This can be explained as follows. Firstly, codes with larger memory have longer paths in the trellis, and when the noise is large at low operating SNRs the decoder is more likely to diverge as the number of iterations increases. Secondly, codes with larger memory have more nearest neighbor codewords, resulting in larger error coefficients. Consequently, at low operating SNRs, it is harder for the decoder to choose the correct codeword.
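The convergence test described above can be sketched as follows. Two assumptions are made and should be read as such: the SNR of the extrinsic information is estimated as the squared sample mean divided by the sample variance, and both constituent decoders are taken to be identical with a monotonically increasing transfer curve, so that the second curve is the mirror image of the first about the diagonal and the two curves can only meet on the diagonal. The function names and the sample data are hypothetical.

```python
import statistics


def extrinsic_snr(samples):
    """Estimate the SNR of Gaussian-distributed extrinsic messages as mean^2 / variance."""
    mu = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu)
    return mu * mu / var


def tunnel_is_open(snr_in, snr_out):
    """Check whether the two transfer curves stay apart over the sampled range.

    snr_in  : sampled input SNR values of one decoder
    snr_out : measured output SNR of that decoder at each input value
    With identical decoders the curves are separated exactly when the output SNR
    exceeds the input SNR at every sampled point.
    """
    return all(out > inp for inp, out in zip(snr_in, snr_out))


# Hypothetical measurements at one E_b/N_0 value (not data from the paper).
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
measured_out = [0.4, 0.8, 1.3, 1.9, 2.6]
converges = tunnel_is_open(grid, measured_out)
```

Repeating the check while lowering $E_b/N_0$ until the tunnel closes gives an estimate of the decoding threshold for a constituent code.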

CONCLUSIONS
This paper considers the design of a space-time turbo trellis coded modulation scheme. The structure of recursive STTC is presented and new recursive STTCs which best satisfy the design criterion on slow and fast fading channels are proposed. These recursive STTCs outperform previously known feedforward STTC. Moreover, they can be used directly as constituent codes in a parallel concatenation structure, benefiting from interleaver gain and iterative decoding. This structure offers significant performance improvement compared to the traditional STTC scheme on fast and block fading channels. The new ST turbo TCM is less sensitive to any change in the fading rate compared to previously known codes, and falls within 3 dB from the theoretical MIMO channel capacity.