EURASIP Journal on Applied Signal Processing 2002:3, 236–248 © 2002 Hindawi Publishing Corporation

Space-Time Turbo Coded Modulation: Design and Applications

A design method for recursive space-time trellis codes and parallel-concatenated space-time turbo coded modulation is proposed that can be applied to an arbitrary existing space-time trellis code. The method enables a large, systematic increase in coding gain while preserving the maximum transmit diversity gain and bandwidth efficiency of the considered space-time trellis code. Applying the method to the space-time trellis codes of Tarokh et al., significant performance improvements can be obtained even with extremely short input information frames. The application of space-time turbo coded modulation to the space-frequency domain is also proposed in this paper. Exploiting bandwidth-efficient orthogonal frequency division multiplexing (OFDM), multiple transmit antennas, and the large frequency selectivity offered by typical low-mobility indoor environments, the proposed space-frequency turbo coded modulation performs within 2.5 dB of the outage capacity for a variety of practical wideband multiple-input multiple-output (MIMO) radio channels.


INTRODUCTION
That increasing the codeword length of block codes or the constraint length of convolutional codes leads to better performance has been known since Shannon's theory [1]. It is also well known that, with maximum-likelihood (ML) decoding, the price of such a performance gain is increased decoding complexity, up to the point where decoding becomes physically unrealizable. Thus, research in coding theory has over the years seen many proposals for constructing powerful codes with large equivalent codeword or constraint lengths, structured so as to permit breaking the ML decoding into simpler partial decoding steps. Turbo codes [2] are the most recent such attempt, now accepted to be the result of a clever intuition built upon several established concepts rather than a sudden apparition.
Turbo codes were originally introduced as binary error-correcting codes built from the parallel concatenation of two recursive systematic convolutional (RSC) codes, exploiting a suboptimal but very powerful iterative decoding algorithm, the so-called turbo decoding algorithm. However, it turned out that the method applied for this parallel concatenation is more general. The turbo principle is nowadays successfully applied to many detection and decoding problems, such as serial concatenation, equalization, coded modulation, multi-user detection, and joint interference suppression and decoding.
Attempts to combine turbo codes with multilevel amplitude or phase modulations in order to improve transmission spectral efficiency have brought many proposals for so-called turbo coded modulations (TuCM) [3,4,5]. Behind all these schemes is Ungerboeck's trellis coded modulation (TCM) principle [6], now a well-established technique in digital communications, where significant coding gains are achieved through signal set expansion rather than by sacrificing data rate or bandwidth efficiency.
The idea of improving wireless communication system reliability and capacity through diversity has long been an interesting and promising topic [7]. Many different diversity techniques (temporal, frequency, polarization, code, spatial) have been used in isolation, bringing significant performance improvements by neutralizing the detrimental effects of fading in wireless communication channels. If diversity techniques are combined, more independent dimensions become available for information transfer and therefore significantly more margin exists for system performance improvement.
During the past few years there has been growing interest in combining the benefits of forward error control coding and antenna diversity. Many authors [8,9,10,11] have demonstrated that, under specific configurations [12], multiple-input multiple-output (MIMO) wireless channels offer increased information-theoretic capacity compared to single-antenna systems. The so-called space-time coding (STC) schemes focus on merging antenna diversity with appropriate channel coding in order to achieve both coding and antenna diversity gains. Some of the first design criteria for such codes were derived in [13]. However, the main impetus for research in the space-time coding area came from [14], where the powerful and bandwidth-efficient space-time trellis codes of Tarokh et al. (Tarokh-STTrC) were proposed. Unlike the Ungerboeck TCM approach, where coding gain is achieved through signal set expansion, in the space-time trellis coding approach the expansion is done in antenna space. For example, TCM enables 2 bit/s/Hz with 8PSK modulation and a single transmit antenna, while with STTrCs the same bandwidth efficiency is achieved with QPSK modulation and two transmit antennas. In quasi-static fading channels with two transmit antennas, Tarokh-STTrCs were reported to perform close to the outage capacity.
With their handcrafted design for a low number of trellis states, Tarokh-STTrCs have the maximum diversity gain for a given number of transmit antennas but a very poor coding gain. More extensive code searches provided improved versions of STTrCs [15,16], but no significant breakthrough has been achieved. Further performance improvement, expected from increasing the code constraint length, comes at the cost of increased ML decoding complexity. Owing to the lack of a systematic procedure for building STTrCs with a large number of trellis states, this has also turned out to be a tedious task. Recently, such an attempt [17] resulted in highly nonoptimized codes, as we will show.
In this paper, we propose space-time turbo coded modulation (STTuCM), a signaling method that, with a limited increase in decoding complexity, enables a large, systematic increase in coding gain while preserving the maximum transmit diversity gain of the underlying space-time trellis code. The method can be applied to an arbitrary STTrC and assumes the construction of an equivalent recursive space-time trellis code (Rec-STTrC), which is then employed in a parallel concatenation with iterative decoding. Puncturing the outputs of the component codes enables a considerable improvement in power efficiency with no loss in bandwidth efficiency. We will show that STTuCM owes its good performance mainly to two important features. First, relatively simple constituent Rec-STTrCs are optimized for both multi-antenna transmission and parallel concatenation. Second, a distinctive feature of the proposed scheme is the bit-wise interleaving between the two constituent codes, resulting in an overall parallel-concatenated coding scheme that operates on the bit level, despite the fact that the constituent codes have nonbinary trellises.
In parallel, somewhat similar but independent work has been presented in [18,19]. In [18], no puncturing was applied, resulting in a turbo code with a reduced data rate compared to the constituent codes. In [19], the interleaving between the two constituent codes was performed on the symbol level; we will show that symbol-level interleaving considerably limits the performance improvements in comparison to bit-level interleaving. We also note other attempts to apply the turbo principle to MIMO systems [20,21,22], which can mainly be summarized as combinations of binary, single-antenna turbo codes with spatial multiplexing at the transmitter and suboptimal ML demodulation and turbo decoding at the receiver.

RECURSIVE SPACE-TIME TRELLIS CODES
Through the analytical upper-bounding technique in [23], it was shown that turbo codes require recursive, but not necessarily systematic, component encoders to work properly, and that the use of recursive convolutional codes is a distinctive feature of turbo codes. A parallel-concatenated scheme built from recursive constituent codes has a sparse code distance spectrum similar to that achieved by "random-like" codes [23].
The originally proposed Tarokh-STTrCs are nonrecursive and therefore not suitable for interleaved code concatenations. For binary trellis codes, building an equivalent recursive code from a nonrecursive convolutional code is straightforward and is done by closing the feedback from the output to the input of the encoder's block diagram. For symbol-level trellises, as in the STTrC case, there is more than one way to close the feedback from an output to the input. Moreover, translating the trellis diagrams into closed analytic forms and sketching the corresponding block diagrams is not always straightforward. We propose a systematic way to build an equivalent recursive code from a nonrecursive trellis code based on the code's trellis diagram only. Closing the feedback on the code's block diagram gives the transfer function an infinite impulse response. An alternative way to ensure an infinite impulse response is to reorganize the input/output transitions of the nonrecursive, nonbinary code's trellis diagram in the way described below.
Let $Z$ be the number of input bits to the STTrC encoder during each trellis transition and let $Q$ be the number of states in the nonrecursive code's trellis diagram. There are $2^Z$ branches entering each node and the same number of different input symbols, taking values in $\{0, 1, \ldots, 2^Z - 1\}$. As STTrCs are designed to have no parallel transitions, $2^Z \le Q$ is always satisfied. Let $P = Q/2^Z$ be the number of adjacent nodes in the trellis within a group of nodes, so that there are $Q/P$ such groups; see Figure 1 for a graphical illustration. For each branch in the trellis of the recursive STTrC, assign the same output symbols as for the equivalent nonrecursive STTrC. This preserves the maximum diversity gain and the frame error rate (FER) of the nonrecursive STTrC. For the corresponding input symbols, follow the algorithm depicted in Figure 1.
(1) Start with the group $G_0$ consisting of the first $P$ nodes in the trellis. Assign the array of input symbols $[0, 1, 2, \ldots, 2^Z - 1]$ to the branches consecutively departing from node 0. Assign the same array to the group of the next $P - 1$ nodes.
(2) For each of P nodes within group
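
The node grouping that the construction starts from can be sketched numerically. The helper name `trellis_groups` is hypothetical, and the sketch only computes $P$ and the node groups from $Z$ and $Q$; it does not attempt the input-symbol assignment of Figure 1.

```python
def trellis_groups(Z, Q):
    """Split Q trellis nodes into groups of P = Q / 2**Z adjacent nodes."""
    branches = 2 ** Z                     # branches entering (and leaving) each node
    if Q < branches or Q % branches != 0:
        raise ValueError("Q must be a multiple of 2**Z so that P is an integer")
    P = Q // branches                     # adjacent nodes per group
    groups = [list(range(g * P, (g + 1) * P)) for g in range(Q // P)]
    return P, groups


# Illustration: a QPSK code (Z = 2) with Q = 8 states
P, groups = trellis_groups(Z=2, Q=8)
print(P, groups)
```

For an 8-state QPSK code this yields four groups of two adjacent nodes each, the first of which is the group $G_0$ of step (1).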

The encoder
We consider a system employing $N$ transmit and $M$ receive antennas. The block diagram of the proposed encoder is depicted in Figure 2 as a parallel concatenation of two Rec-STTrCs followed by a block for puncturing and/or multiplexing. The input information bit stream is first divided into $Z$-bit blocks and encoded by the nonbinary trellis of the first encoder. After being scrambled by pseudo-random bit-wise interleaving, it is again divided into $Z$-bit blocks and encoded by the second constituent encoder. The two blocks of $N$ output symbols at each time instant (one block from each component encoder, each block having $N$ symbols from the complex constellation, one for each transmit antenna) are then punctured and/or multiplexed and assigned to the $N$ transmit antennas.
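Both puncturing and multiplexing are done in parallel across antennas, so that at any one time instant one of the encoders has full access to all $N$ transmit antennas. Without puncturing, the equivalent codeword length of the parallel-concatenated code is doubled, so the overall bandwidth efficiency is halved. With puncturing, each component encoder sends only every second $N$-symbol output block of its codeword, time-multiplexed with every other $N$-symbol output block from the other encoder. For example, if the output of the first encoder is

$$S^1_{11} S^1_{12} \cdots S^1_{1N},\ S^2_{11} S^2_{12} \cdots S^2_{1N},\ S^3_{11} S^3_{12} \cdots S^3_{1N},\ \ldots \tag{1}$$

and the output of the second encoder is

$$S^1_{21} S^1_{22} \cdots S^1_{2N},\ S^2_{21} S^2_{22} \cdots S^2_{2N},\ S^3_{21} S^3_{22} \cdots S^3_{2N},\ \ldots \tag{2}$$

then, in case of puncturing, the sequence

$$S^1_{11} S^1_{12} \cdots S^1_{1N},\ S^2_{21} S^2_{22} \cdots S^2_{2N},\ S^3_{11} S^3_{12} \cdots S^3_{1N},\ S^4_{21} S^4_{22} \cdots S^4_{2N},\ \ldots \tag{3}$$

will be transmitted. Here $S^k_{en}$ is the output symbol from encoder $e$, $e = 1, 2$, associated with transmit antenna $n$ at discrete time $k$. In this way, the full bandwidth efficiency of the component codes is preserved.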
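
The puncturing and multiplexing rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `multiplex` and the representation of each $N$-symbol block as a tuple are assumptions, and blocks are held in 0-based lists, so list index 0 corresponds to time $k = 1$.

```python
def multiplex(out1, out2, puncture=True):
    """Combine the N-symbol output blocks of two component encoders.

    out1, out2: lists of blocks (one tuple of N symbols per trellis step).
    """
    if len(out1) != len(out2):
        raise ValueError("both encoders must produce the same number of blocks")
    if puncture:
        # Odd time instants (k = 1, 3, ...) carry encoder 1, even ones encoder 2;
        # the other encoder's block at that instant is discarded.
        return [out1[i] if i % 2 == 0 else out2[i] for i in range(len(out1))]
    # No puncturing: every block is sent, alternating between the encoders,
    # so the transmitted frame is twice as long.
    tx = []
    for b1, b2 in zip(out1, out2):
        tx.extend([b1, b2])
    return tx


# Illustration with symbolic labels S{k}_{e}{n}, N = 2 antennas, 4 trellis steps
enc1 = [tuple(f"S{k}_1{n}" for n in (1, 2)) for k in range(1, 5)]
enc2 = [tuple(f"S{k}_2{n}" for n in (1, 2)) for k in range(1, 5)]
print(multiplex(enc1, enc2))
```

With puncturing the transmitted frame has as many blocks as one component codeword, which is why the bandwidth efficiency of the component codes is retained; without puncturing it has twice as many.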

The interleaving
Pseudo-random bit-wise interleaving for the nonpuncturing case has no restrictions and has the same size as the input information frame in bits. In case of puncturing, it actually consists of two half-length bit-wise interleavers, $\pi_1$ and $\pi_2$. One interleaver scrambles the input bits at odd input symbol positions; the other, independent of the first, scrambles the input bits at even input symbol positions. This ensures that, despite puncturing, every input bit contributes once and only once to the output codeword. Other than this constraint, both interleavers were chosen to be pseudo-random and bit-wise. For example, for $Z = 3$ and the input information frame given as (

The decoder
The block diagram of the decoder is depicted in Figure 3. The nonbinary component codes are decoded by a symbol-by-symbol MAP algorithm as in [24]. To enable pseudo-random bit-wise interleaving, additional symbol-to-bit and bit-to-symbol reliability transforms are applied, so that the resulting encoder and iterative (turbo) decoder operate on the bit level. The bit-wise scrambling improves the resolution of the implemented interleaving and, on a fading channel, increases the block Hamming distance.
Assume that the information frame at the input of one of the component encoders consists of $L$ bits denoted as $\mathbf{b} = (b_0, b_1, \ldots, b_{L-1})$. At the receiver, for each receive antenna $m$, $m = 1, \ldots, M$, the received signal for the whole input frame is denoted as $\mathbf{r}_m = (r^1_m, r^2_m, \ldots, r^K_m)$, where $K = 2L/Z$ in case of nonpuncturing and $K = L/Z$ in case of puncturing. At discrete time $k$, the signal received by antenna $m$ is

$$r^k_m = \sum_{n=1}^{N} \alpha^k_{nm} S^k_{en} + \eta^k_m, \tag{4}$$

where $\alpha^k_{nm}$ are the time-varying path gains from transmit antenna $n$ to receive antenna $m$, modeled as samples of independent zero-mean complex Gaussian random variables with variance 0.5 per dimension. The path gains along different paths are assumed to be uncorrelated. The noise samples $\eta^k_m$ are independent samples of zero-mean complex Gaussian random variables with variance $\sigma^2 = N_0/2$ per complex dimension. In (4), we assume $e = 1$ for $k = 2c - 1$ and $e = 2$ for $k = 2c$, $c = 1, 2, \ldots$, for both the nonpunctured and the punctured case. The complex symbols $S^k_{en}$ have an average energy of $E_s$ for each $n = 1, \ldots, N$. To conform to the definition in [14], we define the signal-to-noise ratio (SNR) per receive antenna as $\mathrm{SNR} = N E_s/N_0$. Prior to decoding, each received vector $\mathbf{r}_m$ is de-multiplexed into two vectors $\mathbf{r}_{e,m}$, $e = 1, 2$, each contributed by one of the two component encoders. In case of nonpuncturing these are $\mathbf{r}_{1,m} = (r^1_m, r^3_m, \ldots, r^{K-1}_m)$ and $\mathbf{r}_{2,m} = (r^2_m, r^4_m, \ldots, r^K_m)$; in case of puncturing, the positions that belong to the other encoder are replaced with erasures. For the sake of brevity, we repeat only the final results from [24] necessary for further understanding. We also drop the subscript $e$ denoting the component encoder, that is, $\mathbf{r}_m = \mathbf{r}_{e,m}$ for $e = 1, 2$ and $m = 1, \ldots, M$.
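
The signal model of (4) can be illustrated with a short simulation of one received sample. This is a sketch under stated assumptions: the helper names are hypothetical, the QPSK alphabet and the 10 dB operating point are chosen only for illustration, and the noise is generated with variance $N_0/2$ per real dimension, which is one reading of the noise definition above.

```python
import random


def complex_gauss(rng, var_per_dim):
    """One zero-mean complex Gaussian sample with the given variance per real dimension."""
    sigma = var_per_dim ** 0.5
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def received_sample(symbols, gains, noise):
    """r = sum_n alpha_n * S_n + eta for one receive antenna at one time instant."""
    return sum(a * s for a, s in zip(gains, symbols)) + noise


rng = random.Random(0)                      # fixed seed, illustration only
N, Es, snr_db = 2, 1.0, 10.0
N0 = N * Es / 10 ** (snr_db / 10)           # from SNR = N * Es / N0

qpsk = [complex(1, 1), complex(-1, 1), complex(-1, -1), complex(1, -1)]
symbols = [rng.choice(qpsk) * (Es / 2) ** 0.5 for _ in range(N)]   # |S|^2 = Es
gains = [complex_gauss(rng, 0.5) for _ in range(N)]                # path gains
noise = complex_gauss(rng, N0 / 2)          # assumed: N0/2 per real dimension
r = received_sample(symbols, gains, noise)
print(r)
```

On a quasi-static channel the list `gains` would be drawn once per frame and reused for every time instant $k$.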
The output of the symbol-by-symbol MAP algorithm is the a posteriori probability (APP) of the input information symbol $d_k$, given by (5), where the subscript 2 denotes the $Z$-bit binary representation of the value in brackets and $\mathbf{r}^k$ is the $k$th column of the matrix $\mathbf{r}$. The constant $\xi$ can be eliminated by straightforward normalization; $\chi_k$ and $\beta_k$ are the results of the forward and backward recursions, while $\gamma_k$ denotes the branch transition probability for step $k$, given by the three-term product (6). The first term in (6) denotes the APP of the transmitted symbols at time instant $k$; the second term is either one or zero, depending on whether or not the encoder input is associated with the transition from state $S_{k-1} = q_0$ to state $S_k = q$; the third term is the a priori probability of the information symbol $d_k$ (see (7)). In case of iterative decoding, the a priori probability is supplied by the other decoder, which makes the iterative (turbo) decoding algorithm suboptimal. This is done in all cases except the first iteration of the first decoder, where no a priori information is available and all input symbols are therefore assumed equally likely. The logarithm of the APP in (6) is calculated from (8). In case of puncturing, for time instants $k$ at which the de-multiplexed vectors $\mathbf{r}^k$ are replaced with erasures, the logarithm of the APP in (8) is set to 0.
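
For orientation, the quantities named above fit the textbook symbol-by-symbol MAP (BCJR) factorization sketched below. This is an assumed standard form written in the notation of this section, not a reproduction of the expressions in [24]; in particular the index conventions, the symbol index $i \in \{0, \ldots, 2^Z - 1\}$, and the $1/N_0$ scaling (which presumes a total complex noise variance of $N_0$) are assumptions.

```latex
\begin{align*}
\Pr\{d_k = (i)_2 \mid \mathbf{r}\}
  &= \xi \sum_{(q_0,\,q)} \chi_{k-1}(q_0)\,\gamma_k^{i}(q_0,q)\,\beta_k(q), \\
\gamma_k^{i}(q_0,q)
  &= p\bigl(\mathbf{r}^k \mid d_k=(i)_2,\ S_{k-1}=q_0,\ S_k=q\bigr)\,
     \Pr\{S_k=q \mid d_k=(i)_2,\ S_{k-1}=q_0\}\,
     \Pr\{d_k=(i)_2\}, \\
\ln p\bigl(\mathbf{r}^k \mid \cdot\bigr)
  &= -\frac{1}{N_0}\sum_{m=1}^{M}\Bigl|\, r^k_m - \sum_{n=1}^{N}\alpha^k_{nm} S^k_{n} \Bigr|^2 + \text{const}.
\end{align*}
```

In this form the second factor of $\gamma_k^{i}$ is the zero-or-one transition indicator and the third is the a priori term supplied by the other decoder; setting the log-likelihood to 0 at erased time instants leaves the a priori term as the only information at those instants.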
This means that at such time instants the channel outputs cannot be used, since they are not available. Fortunately, at those time instants the channel outputs belong to the other decoder, which is therefore capable of providing reliable a priori probabilities that dominate the term in (6). As the component encoders are nonsystematic, the output of the MAP algorithm in (5) comprises two terms rather than the three obtained when the component codes are systematic. The two terms represent the extrinsic and the a priori information. If symbol-wise interleaving between the constituent codes is employed, symbol-level extrinsic information is first extracted and symbol-wise interleaved to form the a priori probability for the other symbol-by-symbol MAP decoder in (7). The symbol-to-bit and bit-to-symbol transforms in Figure 3 are then avoided, and the exchange of log-likelihood information between the two constituent decoders is done directly at the symbol level. We propose bit-wise interleaving between the constituent encoders; therefore, prior to bit-level extrinsic information extraction and bit-wise interleaving, an additional symbol-to-bit reliability transform is applied, producing a log-likelihood ratio for each information bit $b_l$, for all $l \in \{kZ, \ldots, (k+1)Z - 1\}$ and $k = 0, \ldots, L/Z - 1$. Bit-level extrinsic information $L_{ext}(b_l)$ is then extracted by removing from this log-likelihood ratio the a priori term $L_{apri}(b_l)$ of the information bit $b_l$. After being bit-wise interleaved, the extrinsic information is passed through the bit-to-symbol reliability transform to yield the a priori probability for the second decoder, to be used in (7). In the first decoding iteration of the first decoder, $L_{1apri}$ in Figure 3 is set to all zeros, as no a priori information is available at that stage of decoding. The symbol-to-bit reliability transform converts joint probabilities into marginal probabilities. Moreover, owing to bit-wise interleaving and de-interleaving, the bit-level extrinsic reliabilities can be assumed independent, so no information is lost when converting symbol reliabilities to bit reliabilities and vice versa. The symbol-to-bit and bit-to-symbol reliability transforms add negligible complexity to the iterative decoder but, as we will see, have a crucial impact on performance.
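
The two reliability transforms can be sketched as a marginalization and a product of marginals. This is a minimal illustration under assumptions: the function names are hypothetical, bits are taken most-significant first within a symbol, the log-likelihood ratio is defined as $\ln[\Pr(b=1)/\Pr(b=0)]$, and no numerical safeguards (for example against zero probabilities) are included.

```python
import math


def symbol_to_bit_llr(sym_probs, Z):
    """Marginalize 2**Z symbol probabilities into Z bit LLRs (MSB first)."""
    llrs = []
    for j in range(Z):
        shift = Z - 1 - j
        p1 = sum(p for i, p in enumerate(sym_probs) if (i >> shift) & 1)
        p0 = sum(p for i, p in enumerate(sym_probs) if not (i >> shift) & 1)
        llrs.append(math.log(p1 / p0))
    return llrs


def bit_to_symbol_probs(llrs):
    """Form symbol a priori probabilities as the product of bit marginals."""
    Z = len(llrs)
    p1 = [1.0 / (1.0 + math.exp(-l)) for l in llrs]   # Pr(bit = 1) per position
    probs = []
    for i in range(2 ** Z):
        p = 1.0
        for j in range(Z):
            bit = (i >> (Z - 1 - j)) & 1
            p *= p1[j] if bit else 1.0 - p1[j]
        probs.append(p)
    return probs


def extrinsic(llr_post, llr_apri):
    """Bit-level extrinsic information: a posteriori LLR minus a priori LLR."""
    return [a - b for a, b in zip(llr_post, llr_apri)]


# All-zero a priori LLRs (first iteration) give equally likely symbols
print(bit_to_symbol_probs([0.0, 0.0]))
```

When the bits within a symbol are independent, applying `bit_to_symbol_probs` followed by `symbol_to_bit_llr` returns the original log-likelihood ratios, which is the sense in which the conversion loses no information under the independence assumption.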

APPLICATION TO NARROWBAND RADIO SYSTEMS
In this section, we apply the proposed method to design an STTuCM based on Tarokh-STTrCs and evaluate its performance under two narrowband frequency-flat Rayleigh fading channel models, namely the quasi-static and block fading channels. Narrowband transmission is assumed. The results therefore illustrate the performance in time division multiple access (TDMA) type systems, such as the Global System for Mobile Communications (GSM), IS-136, or Enhanced Data rates for GSM Evolution (EDGE). The performance comparisons between different schemes in all of the following figures were made in terms of power efficiency under the same bandwidth efficiencies and modulation levels. In all simulations, the output frame size (the number of discrete-time transmissions) was 66 symbols, which corresponds to an input information frame of only 132, 198, and 264 bits for the QPSK, 8PSK, and 16QAM modulation formats, respectively. For turbo codes, this falls into the region of very short frame sizes. Two transmit antennas and a single receive antenna are assumed in all simulations. The quasi-static fading channel is the worst case of fading realization, where fading is constant during the whole transmission of one frame and therefore no temporal diversity is available. To test the importance of recursive component codes to the overall performance of the parallel-concatenated code, in Figure 4 we compare the performance of the 4-state Tarokh-STTrC and Rec-STTrC. When implemented alone, both codes have the same frame error rate (FER), as shown by the two almost overlapping curves. When implemented in parallel concatenation, however, Rec-STTrCs have a 3 dB gain over Tarokh-STTrCs already after 6 iterations. The performance gain of the parallel-concatenated scheme over the single component code is more than 4.5 dB, but the penalty is halved bandwidth efficiency, as no puncturing is applied. Unlike the parallel concatenation of Tarokh-STTrCs, where the gain saturates already after the second iteration, with Rec-STTrCs additional gains are achieved by increasing the number of iterations above 6. Therefore, we applied a total of 10 iterations in all of the following simulations.
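
The quoted frame sizes follow from the relation between the number of transmissions $K$ and the number of information bits $L$. The sketch below simply inverts $K = L/Z$ (punctured) and $K = 2L/Z$ (nonpunctured); the function name `info_bits` is hypothetical.

```python
def info_bits(frame_symbols, Z, punctured=True):
    """Information bits L carried by a frame of K transmissions.

    K = L / Z with puncturing, K = 2 L / Z without, hence L = K * Z or K * Z / 2.
    """
    return frame_symbols * Z if punctured else frame_symbols * Z // 2


for name, Z in (("QPSK", 2), ("8PSK", 3), ("16QAM", 4)):
    print(name, info_bits(66, Z))
```

With 66 transmissions and puncturing this gives 132, 198, and 264 bits for $Z = 2, 3, 4$, matching the frame sizes stated above.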
In case of puncturing on quasi-static fading channels with extremely short frames, it has turned out that the best performance is achieved with the concatenation of the original Tarokh-STTrC and a Rec-STTrC. This results from the ability to terminate both component encoders in the all-zero state at the end of each frame, though not with the same tail sequence. Figures 5 and 6 present performance comparisons of STTuCM, built as a parallel concatenation of two 8-state STTrCs with one of the component codes being a Rec-STTrC, with 8-state and 16-state Tarokh-STTrCs for the QPSK and 8PSK modulation formats, respectively. Puncturing is applied this time, so that the full bandwidth efficiency of the constituent codes is preserved. The slopes of the simulation curves in Figures 4, 5, and 6 show that the proposed STTuCM preserves the maximum diversity gain with and without puncturing. The horizontal distance between parallel curves is a measure of the additional coding gain. At an FER level of $10^{-2}$, STTuCM outperforms the 16-state Tarokh-STTrCs by 1 dB in both the QPSK and 8PSK cases. At the usual target FER of $10^{-1}$ for packet data applications, STTuCM performs within 1.5 dB and 2.2 dB of the 10% outage capacity for the QPSK and 8PSK cases, respectively (see [14] or Figure 10).
For many practical wireless systems the channel can be modeled as block fading. The Enhanced Data rates for GSM Evolution (EDGE) system relies on bandwidth-efficient 8PSK modulation but has adopted the same underlying GSM frame structure, where data is organized into 4 (half rate) or 8 (full rate) bursts and where optional frequency hopping between bursts is included in the standard.
The block fading model is in general suitable for fading channels in which a certain block of adjacent transmitted symbols is affected by highly correlated fading path gains. The length of the block may be considered a first approximation of the channel's coherence time (single-carrier systems) or the channel's coherence bandwidth (multi-carrier systems).
Figures 7 and 8 present performance comparisons of STTuCM, built as a parallel concatenation of two 8-state Rec-STTrCs, with 8-state and 16-state Tarokh-STTrCs for QPSK and 8PSK modulation with two independent fading blocks per frame. In both cases, STTuCM outperforms the 16-state Tarokh-STTrC by 3 dB at an FER of $10^{-2}$. Figure 9 depicts the performance comparison of STTuCM, built as a parallel concatenation of two 16-state Rec-STTrCs, with the 16-state Tarokh-STTrC for 16QAM modulation with two and four independent fading blocks per frame.
At an FER of $10^{-2}$, STTuCM outperforms the Tarokh-STTrC by more than 3.5 dB and 6 dB for two and four independent fading blocks per frame, respectively. As seen from [25,26], an increase in the frame size severely deteriorates the performance of Tarokh-STTrCs, while increasing the frame size of STTuCM improves the performance on channels with a considerable amount of temporal diversity.

APPLICATION TO WIDEBAND RADIO SYSTEMS
Owing to its high bandwidth efficiency and suitability for high data rate wireless applications, orthogonal frequency division multiplexing (OFDM) was chosen as the modulation scheme for the physical layer in several new wireless standards, namely digital audio and video broadcasting (DAB, DVB) in Europe [27,28] and the three broadband wireless local area network (WLAN) standards, the European HIPERLAN/2, the American IEEE 802.11a, and the Japanese MMAC [29].
The application of STC to the space-frequency domain, exploiting bandwidth-efficient OFDM, was presented as a natural solution for future high data rate transmission over wideband MIMO radio channels [30]. Large bandwidth and power efficiency gains were reported as compared to single-antenna channel codes employed with OFDM transmit diversity [31]. In this section, we advocate the application of STTuCM to the space-frequency domain and demonstrate significant performance improvements over some other space-time coding schemes applied to multi-antenna OFDM.

Space-frequency coding in OFDM systems
We consider a system employing $N$ transmit and $M$ receive antennas. The equivalent sampling-rate discrete-time channel from any transmit antenna $n$ to receive antenna $m$ can be represented by an equivalent $W$th-order finite impulse response (FIR) filter with filter taps $h^k_{n,m,w}$, $w = 0, \ldots, W$. The coefficients $h^k_{n,m,w}$ are modeled as samples of independent zero-mean complex Gaussian random variables with variance $0.5 P^k_{n,m,w}$ per dimension.
The vector $P^k_{n,m} = [P^k_{n,m,0}, \ldots, P^k_{n,m,W}]$ is defined by the power delay profile of the channel and is assumed to have unit norm. As seen from [32], the system can easily be coupled with additional transmit antennas for OFDM delay diversity, but this is beyond the scope of this paper. At each discrete time instant $k$, $k = 1, \ldots, B$, the input sequence enters the STC encoder; $C$ denotes the number of sub-carriers in the OFDM symbol. The corresponding output of the STC encoder and modulator is a tall matrix whose entry for transmit antenna $n$ denotes a point in the complex constellation of $2^Z$ symbols and has an average energy of $E_s$ for each $n = 1, \ldots, N$. As in [33], the $C \times C$ fast Fourier transform (FFT) matrix is denoted $F_C$, and $R_{cp} = [0_{C \times W}\ I_{C \times C}]$ denotes the $C \times (W + C)$ cyclic prefix removal matrix; the cyclic prefix insertion matrix has dimensions $(W + C) \times C$.
After OFDM demodulation at the receiver, the complex baseband $C \times 1$ signal vector at receive antenna $m$ is the sum, over the $N$ transmit antennas, of the transmitted sub-carrier symbols weighted by the diagonal matrices $D^k_{n,m}$, plus a noise term, where $\eta^k_m$ denotes the $(C + W) \times 1$ vector of noise samples, mutually independent zero-mean complex Gaussian random variables with variance $\sigma^2 = N_0/2$ per complex dimension. The diagonal matrix $D^k_{n,m}$ is obtained from $H^k_{n,m}$ through cyclic prefix insertion and removal and the FFT matrix $F_C$, where $H^k_{n,m}$ denotes the $(C + W) \times (C + W)$ Toeplitz matrix with $(x, y)$ entry $h^k_{n,m,(x-y)}$. The signal-to-noise ratio (SNR) per receive antenna is defined as $\mathrm{SNR} = N E_s/N_0$. We assume in general that one coded information frame covers multiple ($B$) successive OFDM symbols, which gives rise to simultaneous coding across space, frequency, and time. With perfect knowledge of the channel state information (CSI) at the receiver, the maximum likelihood sequence detection (MLSD) metric for the Viterbi and maximum a posteriori (MAP) probability decoders is the squared Euclidean distance between the received vectors and the candidate codeword weighted by $D^k_{n,m}$, where the minimization is over all possible codewords of the space-time code used for transmission.
For future high data rate and low-mobility applications, the Doppler frequency normalized to the OFDM symbol interval $T_{OFDM}$ is rather small, so the channel usually offers no temporal diversity. Therefore, regardless of the parameter $B$, we model the channel within the duration of one coded frame as quasi-static. Although we are coding across multiple OFDM symbols, we will still refer to the implemented schemes as space-frequency codes rather than space-frequency-time codes.

Capacity
OFDM, and multi-carrier modulation (MCM) in general, is considered one of many information-theoretically inspired signaling methods. In calculating the capacity of a frequency-selective channel, Shannon demonstrated that slicing the bandwidth into infinitesimal, flat sub-bands represents a capacity-approaching signaling strategy [34,35].
The fading on quasi-static channels, being a nonergodic process, makes the capacity a random variable. Let $\hat{H}^k_c$, $c = 1, \ldots, C$, denote the $N \times M$ matrix with $(n, m)$ entry $\alpha^k_{n,m,c}$. When the channel state information (CSI) is unknown at the transmitter and perfectly known at the receiver, the capacity of the OFDM MIMO signaling system in bit/s/Hz is given by (14) [36], where log is the base-2 logarithm, $I_{N \times N}$ is the identity matrix, and the superscript $k$ is dropped owing to the quasi-static assumption. Note that only in the limiting case of an infinite number of OFDM sub-carriers does the capacity of the MIMO OFDM signaling system defined above approach the exact capacity of the underlying space-frequency channel. Calculating the instantaneous capacity in (14) for a large enough number of channel realizations $h_{n,m}$ and collecting the statistics is a straightforward semi-analytical method of calculating the outage capacity for specific wideband multi-path radio channels. In Figure 10, the 10% outage capacity was evaluated for two different OFDM signaling systems with 2 transmit antennas and a single receive antenna.
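
The semi-analytical procedure described above can be sketched as a small Monte Carlo experiment. The sketch rests on assumptions that should be read as such: the instantaneous capacity is taken as the mean over sub-carriers of $\log_2 \det(I + (\mathrm{SNR}/N)\,\hat{H}_c \hat{H}_c^H)$, which for a single receive antenna reduces to $\log_2(1 + (\mathrm{SNR}/N) \sum_n |\alpha_{n,c}|^2)$; cyclic-prefix overhead is ignored; the sub-carrier gains are obtained as the DFT of the channel taps; and the function names, trial count, and equal-power tap model are illustrative choices.

```python
import cmath
import math
import random


def subcarrier_gains(taps, C):
    """Frequency response of an FIR channel on C sub-carriers (DFT of the taps)."""
    return [sum(h * cmath.exp(-2j * math.pi * c * w / C) for w, h in enumerate(taps))
            for c in range(C)]


def instantaneous_capacity(gains_per_tx, snr):
    """Mean over sub-carriers of log2(1 + SNR/N * sum_n |alpha_{n,c}|^2), M = 1."""
    N = len(gains_per_tx)
    C = len(gains_per_tx[0])
    total = 0.0
    for c in range(C):
        power = sum(abs(gains_per_tx[n][c]) ** 2 for n in range(N))
        total += math.log2(1.0 + snr / N * power)
    return total / C


def outage_capacity(snr_db, N=2, C=64, paths=2, trials=2000, outage=0.10, seed=1):
    """Capacity value (bit/s/Hz) not reached in a fraction `outage` of realizations."""
    rng = random.Random(seed)
    snr = 10 ** (snr_db / 10)
    sigma = math.sqrt(0.5 / paths)          # equal-power taps, unit total power
    caps = []
    for _ in range(trials):
        gains = []
        for _n in range(N):
            taps = [complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
                    for _ in range(paths)]
            gains.append(subcarrier_gains(taps, C))
        caps.append(instantaneous_capacity(gains, snr))
    caps.sort()
    return caps[int(outage * trials)]


print(outage_capacity(10.0, trials=500))
```

Sweeping `snr_db` and plotting the returned values would give a curve of the same kind as Figure 10; other power delay profiles can be substituted by changing how `taps` is drawn.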
The MIMO-OFDM-1 system comprises a 1 MHz bandwidth, 256 sub-carriers, a sub-channel separation of 3.9 kHz, an OFDM frame duration of 256 µs, and a guard interval of 40 µs. The channel was assumed to be spatially uncorrelated and its power delay profile $P_{n,m}$ was modeled as a two-path equal-power channel from each of the two transmit antennas. Note that for equal-power multi-path channels, the capacity depends only on the order of the channel $W$ and not on the relative delays between the multi-paths.
The MIMO-OFDM-2 system adopted the physical layer parameters from the HIPERLAN/2 and IEEE 802.11a WLAN standards [37]. A total available bandwidth of 20 MHz with 64 sub-carriers per OFDM symbol, corresponding to a sub-channel separation of 312.5 kHz and an OFDM frame duration of 3.2 µs, was assumed. To each frame a guard period of 0.8 µs was added, and a total of 48 sub-carriers were used for data transmission. An additional 4 sub-carriers were assigned to pilots, though CSI was assumed to be perfectly estimated at the receiver. We apply MIMO-OFDM-2 with two different wideband channel models. In MIMO-OFDM-2a, the power delay profile $P_{n,m}$ was adopted from the six-path ITU-B indoor office channel model [38], while in MIMO-OFDM-2b $P_{n,m}$ was chosen according to the 18-path ETSI BRAN-B channel model for a large open-space office environment [39].
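
The stated numerology of both systems can be cross-checked with elementary arithmetic: the sub-channel separation is the bandwidth divided by the number of sub-carriers, and the useful OFDM frame duration is its reciprocal. The function name `ofdm_numerology` is hypothetical.

```python
def ofdm_numerology(bandwidth_hz, subcarriers, guard_s):
    """Return (sub-channel separation, useful frame duration, duration incl. guard)."""
    spacing = bandwidth_hz / subcarriers      # Hz
    useful = 1.0 / spacing                    # seconds
    return spacing, useful, useful + guard_s


for name, bw, n_sc, guard in (("MIMO-OFDM-1", 1e6, 256, 40e-6),
                              ("MIMO-OFDM-2", 20e6, 64, 0.8e-6)):
    spacing, useful, total = ofdm_numerology(bw, n_sc, guard)
    print(f"{name}: {spacing / 1e3:.4f} kHz, {useful * 1e6:.1f} us, {total * 1e6:.1f} us")
```

The first system gives 3.90625 kHz (quoted as 3.9 kHz) and 256 µs; the second gives 312.5 kHz and 3.2 µs, or 4 µs once the 0.8 µs guard period is included.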

Performance evaluation
In the following, we evaluate the performance of STTuCM applied to the two previously defined MIMO-OFDM signaling systems and compare it with some other recently proposed space-frequency codes designed for 2 bit/s/Hz bandwidth efficiency using two transmit antennas and QPSK modulation. We assume perfect frame and sample clock synchronization between the transmitter and the receiver. Prior to OFDM modulation at the transmitter, the complex codeword symbols were interleaved by a channel interleaver of length $BC$.
Based on the large effective code length, Lu and Wang recently proposed a new family of space-time trellis codes for multi-antenna OFDM systems in [17]. The codes were designed from existing trellis coded modulation schemes optimized for frequency-flat fading channels. A class of rate-2/3 8PSK TCM for single-antenna transmission was transformed into a rate-2/4 QPSK code for two transmit antennas by splitting the original 8PSK mapper into two QPSK mappers, one for each transmit antenna. Large performance gains were reported from increasing the code complexity up to 256 trellis states. We refer to this space-frequency trellis code approach as SFTrC-L to distinguish it from the application of Tarokh-STTrCs to the space-frequency domain [30], which we denote as SFTrC-T. In both cases, a Viterbi decoder is used for decoding.
We denote by SFTuCM-Dbit the application of the proposed STTuCM to the two MIMO-OFDM signaling systems. SFTuCM was built as a parallel concatenation of two 8-state Rec-STTrCs. To demonstrate the importance of bit-wise interleaving between the constituent codes, we also employ the symbol-wise interleaved STTuCM of Cui and Haimovich [19] as a space-frequency turbo coded modulation and denote it by SFTuCM-Csymb. As seen from Figures 11, 12, and 13, SFTuCM-Dbit strongly outperforms all of the space-frequency coding schemes considered above. Bit-wise interleaving between the constituent codes brings more than 2 dB of gain at an FER of $10^{-2}$ compared with the symbol-wise interleaved realization. The large effective code length design criterion applied to SFTrC-L resulted in a highly nonoptimized solution. It is rather a brute-force method of increasing the number of trellis states that does not take into account the rank criterion [13] and the transmit diversity properties of the code. Moreover, it was further demonstrated in [32] that the performance of the rather complex 256-state SFTrC-L can be achieved by applying the newly proposed STTuCM method to the simple 8-state code of the same family.
We also conclude that the newly proposed SFTuCM-Dbit performs within 2.5 dB of the 10% outage capacity for all of the considered MIMO-OFDM signaling systems. Note that we reached a similar conclusion for STTuCM on frequency-flat fading channels, which indicates the robustness of the proposed coding scheme.

CONCLUSIONS
We proposed a new method for the design of recursive space-time trellis codes and parallel-concatenated space-time turbo coded modulation that can be applied to an arbitrary existing space-time trellis code. The method enables a large, systematic increase in coding gain while preserving the maximum transmit diversity gain and bandwidth efficiency of the considered space-time trellis code. With rather limited additional complexity, related exclusively to the applied iterative decoding algorithm, the proposed method solves the problem of building powerful and optimized space-time trellis codes with a large equivalent number of trellis states. Space-time turbo coded modulation was demonstrated to owe its good performance mainly to two important features. First, relatively simple 8-state and 16-state recursive constituent space-time trellis codes are optimized for both multi-antenna transmission and parallel concatenation. Second, a distinctive feature of the proposed scheme is the bit-wise interleaving between the two constituent codes. Applying the method to the space-time trellis codes of Tarokh et al., we reported significant performance improvements even with extremely short input information frames. Finally, we advocated the application of space-time turbo coded modulation to the space-frequency domain. Exploiting bandwidth-efficient OFDM, multiple transmit antennas, and the large frequency selectivity offered by typical low-mobility indoor environments, the proposed space-frequency turbo coded modulation performs within 2.5 dB of the outage capacity for a variety of practical wideband MIMO radio channels.

Figure 2: Block diagram of the encoder.
$b_0 b_1 b_2;\ b_3 b_4 b_5;\ b_6 b_7 b_8;\ b_9 b_{10} b_{11}$), one realization of the $\pi_1$-$\pi_2$ pair will result in the sequence ($b_8 b_6 b_0;\ b_{11} b_9 b_{10};\ b_2 b_1 b_7;\ b_3 b_5 b_4$) entering the second constituent encoder. In the case of single, pseudo-random symbol-wise interleaving [19], one realization of the interleaving will result in the sequence ($b_9 b_{10} b_{11};\ b_6 b_7 b_8;\ b_3 b_4 b_5;\ b_0 b_1 b_2$) entering the second constituent encoder.
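
The $\pi_1$-$\pi_2$ pair of the example above can be sketched as two permutations acting on disjoint sets of bit positions. The function names are hypothetical, symbols are indexed from 0 in the code (so the "odd" symbol positions of the text are list indices 0, 2, ...), and the two permutations below were chosen by hand to reproduce the quoted realization rather than drawn pseudo-randomly.

```python
def class_positions(num_bits, Z, parity):
    """Bit indices belonging to symbols whose 0-based index has the given parity."""
    return [i for i in range(num_bits) if (i // Z) % 2 == parity]


def interleave_pair(bits, Z, perm1, perm2):
    """Apply two half-length bit-wise interleavers to alternating symbol positions.

    perm1 permutes the bits of symbols 0, 2, 4, ...; perm2 those of symbols 1, 3, ...
    Bits never move between the two classes, so each bit is used exactly once
    after puncturing.
    """
    out = [None] * len(bits)
    for parity, perm in ((0, perm1), (1, perm2)):
        pos = class_positions(len(bits), Z, parity)
        for dst, src in zip(pos, perm):
            out[dst] = bits[pos[src]]
    return out


bits = list(range(12))                      # stands for b_0 ... b_11, Z = 3
pi1 = [5, 3, 0, 2, 1, 4]                    # hand-picked to match the example
pi2 = [5, 3, 4, 0, 2, 1]
print(interleave_pair(bits, 3, pi1, pi2))
```

In practice `pi1` and `pi2` would be generated pseudo-randomly, for instance by shuffling `range(len(bits) // 2)` with two independent seeds.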

Figure 10: Capacity achieved in 90% of transmissions with N = 2 transmit and M = 1 receive antennas.