Iterative Reconfigurable Tree Search Detection of MIMO Systems

This paper is concerned with a reduced-complexity detection scheme, referred to as iterative reconfigurable tree search (IRTS) detection, with application in iterative receivers for multiple-input multiple-output (MIMO) systems. Instead of the optimum maximum a posteriori probability detector, which performs a brute-force search over all possible transmitted symbol vectors, the new scheme evaluates only the symbol vectors that contribute significantly to the soft output of the detector. The IRTS algorithm carries out the search on a reconfigurable tree, constructed by computing the reliabilities of the symbols based on the minimum mean-square error (MMSE) criterion and reordering the symbols according to their reliabilities. Results from computer simulations are presented, which demonstrate the good performance of the IRTS algorithm over a quasistatic Rayleigh channel even for relatively small list sizes.


INTRODUCTION
Multiple-input multiple-output (MIMO) technology, which deploys multiple transmit and receive antennas, is most likely to be the dominant solution for meeting the demand for high data rates in future wireless communication systems [1,2]. It exploits random fading and multipath propagation to greatly improve the transmission rate without increasing bandwidth or transmit power. To approach the MIMO channel capacity, a channel code is usually required to provide redundancy that guards against burst fading, interference, and noise.
It is advantageous to apply iterative receivers with space-time bit-interleaved coded modulation (ST-BICM) techniques in view of performance and computational complexity [3-6]. By applying the "turbo processing" principle, the iterative receiver is divided into two stages: a MIMO detector and a channel decoder. These two stages iteratively exchange extrinsic information until the receiver converges. The main challenge is the design of a low-complexity MIMO detector that eliminates the interference between layers. The maximum a posteriori (MAP) algorithm is optimal in the sense of minimizing the bit error rate (BER) at the detector output, but it performs an exhaustive search over the complete set of possible symbol vectors, and its complexity grows exponentially with the number of transmit antennas and the number of bits per constellation symbol [6]. To explore the tradeoff between the coding gain attained and the computational effort expended, several suboptimal methods have been presented. By modifying the null-canceling approach used in the Bell Laboratories layered space-time (BLAST) detection scheme introduced in [7], the soft cancellation minimum mean-squared error (SC-MMSE) detection scheme of [3] provides soft output using priors. Most other available schemes are essentially approximations of the MAP detector, in which transmitted symbol vectors with a relatively low likelihood are excluded from the search space. The list sphere detector (LSD) determines a list of candidate vectors for the transmitted symbols, all of which result in a small Euclidean distance between the received vector and the noiseless channel output corresponding to the candidate vector [6]. Gibbs sampling, a statistical method based on Markov chain Monte Carlo (MCMC) simulation techniques, is an alternative method for choosing the candidate list. MCMC techniques have been demonstrated to perform better than LSD with less complexity [8,9]. Via tight lower and upper bounds, the branch and bound method can considerably speed up the solution process for sphere detectors [10]. Iterative tree search (ITS) detection of [11] performs a channel triangularization procedure by Cholesky factorization, which enables a reduced search space to be selected by means of the M-algorithm [12].

This paper presents an iterative reconfigurable tree search (IRTS) algorithm based on the ITS scheme. By reconfiguring the tree structure according to the symbol reliability information, the new algorithm can further decrease the number of sequences in the search space and attain better bit error performance with lower complexity.

SYSTEM MODEL AND ITERATIVE RECEIVER
Consider a MIMO system with $N_t$ transmit and $N_r$ receive antennas. A $Q \times 1$ vector of symbols, $s = [s_1, s_2, \ldots, s_Q]^T \in S^Q$, is encoded by the ST encoder into the $N_t \times T$ ST block $C$, where the superscript $T$ indicates transpose, $S$ denotes the constellation with $2^{M_c}$ ($M_c \ge 1$) possible signal points, and $T$ is the number of symbol periods in each block. The symbol transmit rate of the ST code is $Q/T$ symbols per channel use (pcu). Let $Y$ be the $N_r \times T$ received signal matrix; then it can be written as

$$Y = HC + W, \tag{1}$$

where $H$ is the $N_r \times N_t$ channel matrix, known perfectly to the receiver, whose entries are assumed to be independent and identically distributed zero-mean complex Gaussian random variables with a common variance of 0.5 per real dimension. The channel is assumed to remain constant within each block and to change independently from one block to the next (i.e., quasistatic). The entries of the $N_r \times T$ noise matrix $W$ are assumed to be independent samples of zero-mean complex Gaussian random variables with a common variance $\sigma^2$ per real dimension.
To describe the decoding problem conveniently, let $y = \mathrm{vec}(Y)$ and $w = \mathrm{vec}(W)$, where $\mathrm{vec}(\cdot)$ denotes stacking all the columns of a matrix into one column. Then (1) can be rewritten as

$$y = (I_T \otimes H)\,[c_1^T, c_2^T, \ldots, c_T^T]^T + w, \tag{2}$$

where $\otimes$ denotes the Kronecker matrix product, $I_T$ is the $T \times T$ identity matrix, and $c_n$ ($n = 1, 2, \ldots, T$) is the $n$th column of $C$. In this paper we only consider the vertical Bell Labs layered ST (V-BLAST) multiplexer [7]; the extension to other ST block codes is straightforward. In the case of V-BLAST, $Q = N_t$ and $T = 1$, and (2) can be represented compactly as

$$y = Hs + w. \tag{3}$$

Figure 1 illustrates a block diagram of the coded MIMO system employing ST-BICM and an iterative receiver. The receiver follows the structure that was first proposed in [13] for code division multiple access (CDMA) systems and later applied to MIMO systems [3-6]. At the transmitter, the binary information bit sequence $u$ is encoded into the sequence $v$ by the predetermined error correction code; the coded sequence $v$ is bit-interleaved by a pseudorandom permuter $\Pi$ to generate $x$; based on the constellation $S$, the interleaved sequence $x$ is mapped to symbol vectors $s$, which are then sent by the multiple antennas. At the receiver, the transmitted signals are received on the $N_r$ receive antennas, and the received signal vectors $y$ are fed to the MIMO detector. The optimum decoder is the maximum-likelihood (ML) decoder, whose computational complexity grows exponentially with the length of the information bit sequence, so it is not a feasible decoding method.
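The V-BLAST signal model described above can be sketched numerically. The following is a minimal illustration, assuming the usual linear model in which the received vector is the channel matrix times the symbol vector plus noise; the QPSK bit labeling, the unit symbol energy, and all function names are assumptions made for illustration only.

```python
import random

def mat_vec(H, s):
    """Matrix-vector product H s, with H stored as a list of rows."""
    return [sum(h * x for h, x in zip(row, s)) for row in H]

def rayleigh_channel(n_r, n_t, rng):
    """n_r x n_t matrix of i.i.d. complex Gaussian entries, variance 0.5 per real dimension."""
    std = 0.5 ** 0.5
    return [[complex(rng.gauss(0, std), rng.gauss(0, std)) for _ in range(n_t)]
            for _ in range(n_r)]

def qpsk_map(bits):
    """Map pairs of +/-1 bits to unit-energy QPSK symbols (assumed labeling)."""
    return [complex(bits[i], bits[i + 1]) / 2 ** 0.5 for i in range(0, len(bits), 2)]

def receive(H, s, sigma, rng):
    """Return y = H s + w, where w has variance sigma**2 per real dimension."""
    return [v + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for v in mat_vec(H, s)]

rng = random.Random(0)
n_t = n_r = 4
H = rayleigh_channel(n_r, n_t, rng)                  # one quasistatic channel realization
x = [rng.choice((-1, 1)) for _ in range(2 * n_t)]    # interleaved coded bits, M_c = 2
s = qpsk_map(x)                                      # one symbol per transmit antenna
y = receive(H, s, sigma=0.1, rng=rng)                # input to the MIMO detector
```

In a quasistatic simulation, `H` would be redrawn once per block while `s` and `y` change every symbol period.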
The channel encoder and the ST constellation mapper are separated by an interleaver, which forms the structure of a serially concatenated code, with the channel code as the outer code and the ST mapper as the inner code [3-6]. Based on the iterative "turbo processing" principle, the concatenated code can be decoded using a low-complexity iterative method. The optimal decoding problem is divided into two stages: the MIMO detector (inner module) and the channel decoder (outer module). A soft-input soft-output (SISO) algorithm is adopted at each stage, and soft information is exchanged between the two stages. Let $L_D(\cdot)$, $L_A(\cdot)$, and $L_E(\cdot)$ denote the log-likelihood ratios (LLRs) of the a posteriori, a priori, and extrinsic information, respectively. The decoding process can then be summarized as follows.
(1) The inner module computes $L_E(x)$, conditional on $y$ and the a priori information $L_A(x)$; after deinterleaving, it is fed into the outer module as the a priori information $L_A(v)$ of $v$.
(2) The outer module processes $L_A(v)$ based on the constraints imposed by the channel code to yield $L_E(v)$, which, after interleaving, is passed to the inner module as a priori information.
The above operations (1) and (2) are repeated until a predefined termination condition is satisfied. At the end of the iterative process, the estimate of $u$ is obtained by a hard decision on $L_D(u)$.

ST MAP DETECTOR AND ITS ALGORITHM
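At the transmitter, the use of the interleaver makes the bits within $x$ statistically independent. Based on the MAP detector, the extrinsic information of the coded bits, expressed as a log-likelihood ratio [6], can be computed by

$$L_E(x_{qk} \mid y) = \ln \frac{\sum_{x \in X_{qk}^{+1}} p(y \mid x)\, P(x)}{\sum_{x \in X_{qk}^{-1}} p(y \mid x)\, P(x)} - L_A(x_{qk}), \tag{7}$$

where $x_{qk}$ denotes the $k$th bit mapped onto the symbol $s_q$, $P(x)$ is the a priori probability of the bit sequence $x$, and $X_{qk}^{+1}$ and $X_{qk}^{-1}$ are the sets of all possible bit sequences $x$ with $x_{qk} = +1$ and $x_{qk} = -1$, respectively. The likelihood function $p(y \mid x)$ can be deduced from (3):

$$p(y \mid x) = \frac{1}{(2\pi\sigma^2)^{N_r}} \exp\!\left(-\frac{\|y - Hs\|^2}{2\sigma^2}\right), \tag{8}$$

with

$$\|y - Hs\|^2 = (s - \hat{s})^H H^H H (s - \hat{s}) + y^H \left(I - H (H^H H)^{-1} H^H\right) y, \tag{9}$$

where $\hat{s} = [\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_{N_t}]^T = (H^H H)^{-1} H^H y$ is the unconstrained ML solution, and the superscript $H$ denotes Hermitian transpose. The second term on the right-hand side of (9) is independent of $s$ and can be omitted from the metric. Since $H^H H$ is a nonnegative definite matrix, it can be factored as $H^H H = L^H L$ by Cholesky factorization, where $L$ is an $N_t \times N_t$ lower triangular matrix with entries $l_{qj}$. The first term on the right-hand side of (9) can then be written as

$$(s - \hat{s})^H L^H L (s - \hat{s}) = \|L(s - \hat{s})\|^2 = \sum_{q=1}^{N_t} \left| \sum_{j=1}^{q} l_{qj} (s_j - \hat{s}_j) \right|^2. \tag{10}$$

By defining

$$\mu(s) = \ln P(s) - \frac{1}{2\sigma^2} \|L(s - \hat{s})\|^2, \tag{11}$$

the metric can be computed in a symbol-by-symbol fashion, starting with the first symbol $s_1$ and proceeding to $s_{N_t}$, by exploiting the following relations:

$$\mu_0 = 0, \qquad \mu_q = \mu_{q-1} + \ln P(s_q) - \frac{1}{2\sigma^2} \left| \sum_{j=1}^{q} l_{qj} (s_j - \hat{s}_j) \right|^2, \quad q = 1, 2, \ldots, N_t. \tag{12}$$

A symbol vector $s$ consists of $N_t$ symbols and can be uniquely represented by a path through a tree of depth $N_t$, with a single symbol on each branch and $2^{M_c}$ branches out of each node. A sequence of symbols $s_1, s_2, \ldots, s_q$ and a metric $\mu_q$ are associated with each path of the tree, where $q \le N_t$ denotes the symbol depth of the path. Each symbol vector $s$ corresponds to a path of depth $N_t$ and has the metric $\mu(s) = \mu_{N_t}$. The computational complexity of such an optimum detector is exponential in $N_t M_c$.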
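The triangular factorization and the symbol-by-symbol accumulation described above can be checked numerically. The sketch below covers only the distance part of the path metric (no a priori term and no noise scaling). Note that a standard Cholesky routine returns a lower triangular factor $G$ with $A = G G^H$, so one way to obtain a lower triangular $L$ with $A = L^H L$ is to factor the index-reversed matrix; this route, the function names, and the 2 x 2 test matrix are assumptions for illustration.

```python
def cholesky_lower(A):
    """Standard Cholesky factor G (lower triangular) with A = G G^H."""
    n = len(A)
    G = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = A[i][j] - sum(G[i][k] * G[j][k].conjugate() for k in range(j))
            G[i][j] = complex(acc.real ** 0.5) if i == j else acc / G[j][j]
    return G

def factor_lhl(A):
    """Lower-triangular L with A = L^H L, from the Cholesky factor of the index-reversed A."""
    n = len(A)
    flipped = [[A[n - 1 - i][n - 1 - j] for j in range(n)] for i in range(n)]
    G = cholesky_lower(flipped)
    return [[G[n - 1 - j][n - 1 - i].conjugate() for j in range(n)] for i in range(n)]

def partial_distances(L, s, s_hat):
    """Running sums of |sum_{j<=q} l_qj (s_j - s_hat_j)|^2 for q = 1..N_t."""
    sums, total = [], 0.0
    for q in range(len(s)):
        inner = sum(L[q][j] * (s[j] - s_hat[j]) for j in range(q + 1))
        total += abs(inner) ** 2
        sums.append(total)
    return sums

# Hypothetical 2 x 2 Hermitian positive definite matrix standing in for H^H H.
A = [[2, 1 - 1j], [1 + 1j, 3]]
L = factor_lhl(A)
s, s_hat = [1 + 1j, -1 + 1j], [0.8 + 1.1j, -0.7 + 0.6j]
d = [a - b for a, b in zip(s, s_hat)]
quadratic = sum(d[i].conjugate() * A[i][j] * d[j] for i in range(2) for j in range(2)).real
print(partial_distances(L, s, s_hat)[-1], quadratic)   # the two values agree up to rounding
```

Because row $q$ of $L$ only involves the first $q$ symbols, each partial sum depends on the path up to depth $q$ alone, which is what allows the tree search to rank partial paths.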
The M-algorithm [11,12], a reduced-complexity algorithm based on breadth-first sorting, is applied to the iterative tree search for MIMO detection. The M-algorithm searches only for the best paths through the tree, that is, those corresponding to the symbol vectors with the highest a posteriori probabilities. At each symbol depth smaller than $N_t$, the algorithm keeps a list of the best $M$ paths and then moves forward by extending the $M$ retained paths to form $M \cdot 2^{M_c}$ new paths. Metrics are computed for all the terminal branches at this depth, the best $M$ paths are kept in the updated list, and the remaining $M \cdot (2^{M_c} - 1)$ paths are deleted. In practice, near-optimum performance is often achieved when $M$ is only a small fraction of the size of the full search space.
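The breadth-first search described above can be sketched as follows. This is a simplified illustration, not the ITS detector itself: it assumes equiprobable symbols, so that the best paths are those with the smallest accumulated distance, and it uses an unnormalized QPSK alphabet together with hypothetical values for the triangular factor `L` and the unconstrained estimate `s_hat`.

```python
def full_distance(L, s, s_hat):
    """Squared norm ||L (s - s_hat)||^2 for a complete symbol vector."""
    return sum(abs(sum(L[q][j] * (s[j] - s_hat[j]) for j in range(q + 1))) ** 2
               for q in range(len(s)))

def m_algorithm(L, s_hat, constellation, M):
    """Breadth-first tree search keeping the M smallest-distance paths at each depth."""
    paths = [((), 0.0)]                        # (symbols so far, accumulated distance)
    for q in range(len(s_hat)):
        extended = []
        for symbols, dist in paths:
            for c in constellation:            # 2^{M_c} branches out of each node
                cand = symbols + (c,)
                inner = sum(L[q][j] * (cand[j] - s_hat[j]) for j in range(q + 1))
                extended.append((cand, dist + abs(inner) ** 2))
        extended.sort(key=lambda p: p[1])      # best (smallest distance) first
        paths = extended[:M]                   # delete all but the best M paths
    return paths

QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]      # unnormalized alphabet, for illustration
L = [[1.0, 0.0, 0.0], [0.5, 1.2, 0.0], [-0.3, 0.4, 0.9]]
s_hat = [0.9 + 0.8j, -1.1 + 0.2j, 0.3 - 1.0j]
survivors = m_algorithm(L, s_hat, QPSK, M=4)   # 4 candidates out of 4**3 = 64 vectors
```

With `M` equal to the size of the full search space the routine degenerates into an exhaustive search, which gives a convenient correctness check for small examples.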
After the $M$ candidate symbol sequences, denoted by the set $\mathcal{L}$, have been obtained, and using the max-log approximation [6], (7) can be written as

$$L_E(x_{qk} \mid y) \approx \max_{x \in \mathcal{L} \cap X_{qk}^{+1}} \mu(s) - \max_{x \in \mathcal{L} \cap X_{qk}^{-1}} \mu(s) - L_A(x_{qk}). \tag{13}$$

The M-algorithm considers only a fraction of all possible paths, and the set $\mathcal{L}$ is not guaranteed to contain the best $M$ candidates, but the probability that it does increases with the signal-to-noise ratio (SNR). Moreover, all bit sequences in $\mathcal{L}$ might end up having the same binary value at some positions, especially when $M$ is small. In such a case, (13) cannot be evaluated because either $\mathcal{L} \cap X_{qk}^{+1}$ or $\mathcal{L} \cap X_{qk}^{-1}$ is empty, and $L_E(x_{qk} \mid y)$ is assigned a positive or negative clipping value. The optimized value in [11], $\pm 3$, is used in the simulations of Section 5.
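The list-based soft output with clipping can be sketched as below. The sketch assumes that each candidate metric already includes the a priori contribution, so that the a priori LLR is subtracted to obtain the extrinsic value, and that the sign of the clipping value follows the bit value shared by all candidates; the function name and the example numbers are hypothetical.

```python
def max_log_extrinsic(candidates, l_a, clip=3.0):
    """Max-log extrinsic LLRs from a candidate list.

    candidates: list of (bits, metric), where bits is a tuple of +1/-1 values
    l_a:        a priori LLR of each bit position
    clip:       magnitude assigned when all candidates agree on a bit
    """
    out = []
    for k in range(len(l_a)):
        plus = [m for bits, m in candidates if bits[k] == +1]
        minus = [m for bits, m in candidates if bits[k] == -1]
        if not minus:
            out.append(clip)            # no candidate with bit -1: positive clipping value
        elif not plus:
            out.append(-clip)           # no candidate with bit +1: negative clipping value
        else:
            out.append(max(plus) - max(minus) - l_a[k])
    return out

# Three hypothetical candidates over three bit positions.
candidates = [((+1, +1, -1), -0.5), ((+1, -1, -1), -2.0), ((+1, -1, +1), -2.5)]
print(max_log_extrinsic(candidates, l_a=[0.0, 0.5, 0.0]))   # [3.0, 1.0, -2.0]
```

In the example, every candidate has the first bit equal to +1, so that position receives the clipping value rather than a difference of metrics.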

IRTS ALGORITHM
The reconfigurable trellis (tree) search algorithm has been employed in channel decoders [14,15], where it achieves near-ML performance with low complexity. The key idea is to arrange the symbol positions according to the different reliabilities of the symbols. During the search process of the previously mentioned ITS algorithm, the number of branches is decreased by exploring paths that are most likely to be part of the maximum-likelihood path (MLP), while discarding, as early in the search as possible, those paths that are unlikely to belong to the MLP. If the influence of the unexplored branch metrics on the rank order of the path metrics is insignificant, few branches need to be explored, and a reduced search algorithm can stop any further exploration of a path relatively early in the search without losing the MLP. The order is determined only at the first iteration, and a reconfigurable tree structure is constructed according to this order; during the following iterations, the detection process is based on the reconfigurable tree structure.
Let $s_k$ ($k = 1, 2, \ldots, N_t$) be the desired signal; then (3) can be written as [3]

$$y = h_k s_k + H_{\bar{k}} s_{\bar{k}} + w, \tag{14}$$

where $h_k$ is the $k$th column of $H$, $H_{\bar{k}} = [h_1, h_2, \ldots, h_{k-1}, h_{k+1}, \ldots, h_{N_t}]$, and $s_{\bar{k}} = [s_1, s_2, \ldots, s_{k-1}, s_{k+1}, \ldots, s_{N_t}]^T$. By using a linear filter $z_k$, an $N_r \times 1$ column vector, the decision statistic of the $k$th substream is

$$r_k = z_k^H y. \tag{15}$$

According to (14), (15) can be rewritten as

$$r_k = z_k^H h_k s_k + z_k^H H_{\bar{k}} s_{\bar{k}} + z_k^H w, \tag{16}$$

where the three terms on the right-hand side of (16) are the desired response obtained by the linear filter, the co-antenna interference, and the phase-rotated noise, respectively. The weights of the linear filter should be optimized. Based on the MMSE criterion, $z_k$ is the vector that minimizes the mean-squared error between $r_k$ and $s_k$:

$$z_k = \arg\min_{z} E\left\{ \left| s_k - z^H y \right|^2 \right\}, \tag{17}$$

where $E$ denotes the expectation; the closed-form expression for $z_k$, (18), is given in [3,16]. The estimate of the transmitted symbol at the $k$th antenna, $\tilde{s}_k$, is obtained by quantizing $r_k$. The reliability of the symbol, $L(\tilde{s}_k)$, is expressed as a log-likelihood ratio, (19), in terms of $p(r_k \mid s_k)$, the conditional probability density function of $r_k$ given $s_k$. Here we assume that $z_k^H w$ is still Gaussian distributed with the same variance $\sigma^2$, which gives the density (20) [17] as a function of $\mathrm{dist}(r_k, s_k)$, the Euclidean distance between $r_k$ and $s_k$. Using the max-log approximation, (19) can be simplified to (21). Based on this reliability measure, the symbols within the vector $s$ are reordered in descending order of reliability, and the columns of the channel matrix are rearranged correspondingly. The ITS algorithm is then applied to this reconfigurable tree.
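The ordering step can be illustrated with the following sketch. Two ingredients are assumptions rather than the expressions used in this paper: the filter is taken to be the textbook MMSE (Wiener) solution for uncorrelated symbols of energy `es`, $z_k = (H H^H + (2\sigma^2 / E_s) I)^{-1} h_k$, and the reliability is taken to be a max-log style margin, the difference between the squared distances from $r_k$ to its second-nearest and nearest constellation points divided by $2\sigma^2$. All function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (complex entries allowed)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def mmse_statistics(H, y, sigma2, es=1.0):
    """Decision statistics r_k = z_k^H y with the assumed Wiener filter z_k."""
    n_r, n_t = len(H), len(H[0])
    R = [[sum(H[i][k] * H[j][k].conjugate() for k in range(n_t))
          + (2 * sigma2 / es if i == j else 0) for j in range(n_r)] for i in range(n_r)]
    stats = []
    for k in range(n_t):
        z = solve(R, [H[i][k] for i in range(n_r)])
        stats.append(sum(zi.conjugate() * yi for zi, yi in zip(z, y)))
    return stats

def reliability(r, constellation, sigma2):
    """Hard decision on r and the assumed margin between the two nearest symbols."""
    d2 = sorted((abs(r - c) ** 2, i) for i, c in enumerate(constellation))
    return constellation[d2[0][1]], (d2[1][0] - d2[0][0]) / (2 * sigma2)

def detection_order(H, y, constellation, sigma2):
    """Layer indices sorted from most to least reliable."""
    stats = mmse_statistics(H, y, sigma2)
    rel = [reliability(r, constellation, sigma2)[1] for r in stats]
    return sorted(range(len(rel)), key=lambda k: -rel[k])

QPSK = [complex(a, b) / 2 ** 0.5 for a in (1, -1) for b in (1, -1)]
H = [[1, 0], [0, 2]]                  # toy channel: layer 1 has the stronger gain
s = [QPSK[0], QPSK[3]]
y = [s[0], 2 * s[1]]                  # noise-free observation, for a checkable example
print(detection_order(H, y, QPSK, sigma2=0.1))   # [1, 0]
```

In the toy example the second layer sees the larger channel gain, so it is ranked first; the columns of `H` would then be permuted in that order before the tree search is run.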
Example 1. The following example of a QPSK-modulated MIMO system with $N_t = N_r = 4$ illustrates the procedure.
For the given channel matrix, assume that the noise variance is $\sigma^2 = 2.0047$ and that the received symbol vector is as in (24). By quantizing $r_k$ and using (21), $\tilde{s}_k$ and $L(\tilde{s}_k)$ can be computed; they are listed in Table 1.
According to the computed reliability metrics, the search sequence can be arranged as $k = 3, 1, 2,$ and $4$. From (21), we find that, if the real and imaginary components of $r_k$ are separated, the exactly computed reliability metric is a tradeoff between the two components, whereas the reliability metric computed with the max-log approximation is mainly decided by the less reliable of the real and imaginary components. In both cases the more reliable component is influenced by the less reliable one. The example also illustrates this result.
For QPSK or QAM modulation, because of the independence between the real and imaginary components of each constellation symbol, the real and imaginary components can be processed separately. By defining $\bar{y} = [y_{1R}, y_{2R}, \ldots, y_{N_r R}, y_{1I}, y_{2I}, \ldots, y_{N_r I}]^T$, $\bar{s} = [s_{1R}, s_{2R}, \ldots, s_{N_t R}, s_{1I}, s_{2I}, \ldots, s_{N_t I}]^T$, $\bar{w} = [w_{1R}, w_{2R}, \ldots, w_{N_r R}, w_{1I}, w_{2I}, \ldots, w_{N_r I}]^T$, and

$$\bar{H} = \begin{bmatrix} \mathrm{real}(H) & -\mathrm{imag}(H) \\ \mathrm{imag}(H) & \mathrm{real}(H) \end{bmatrix},$$

where $\mathrm{real}(\cdot)$ and $\mathrm{imag}(\cdot)$ indicate the real and imaginary components of a complex matrix, respectively, (3) can be written as

$$\bar{y} = \bar{H} \bar{s} + \bar{w}. \tag{25}$$

When (25) is used in the IRTS algorithm, the real and imaginary components are separated, so the detection order of the real and imaginary components can be determined separately. Ordering the components by their respective reliability metrics can further improve the performance of the algorithm.
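The real-valued decomposition can be verified on a small example. The sketch below builds the stacked real matrix and vector from a complex channel and checks that they describe the same noise-free observation; the function name and the numerical values are hypothetical.

```python
def real_model(H, y):
    """Stack real and imaginary parts.

    Returns (H_real, y_real) with H_real = [[Re H, -Im H], [Im H, Re H]]
    and y_real = [Re y; Im y].
    """
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    y_real = [v.real for v in y] + [v.imag for v in y]
    return top + bottom, y_real

# Hypothetical 2 x 2 complex channel and QPSK-like symbols, noise omitted.
H = [[1 + 2j, 3 - 1j], [0.5j, -2 + 0j]]
s = [1 - 1j, -1 - 1j]
y = [sum(h * x for h, x in zip(row, s)) for row in H]

H_real, y_real = real_model(H, y)
s_real = [v.real for v in s] + [v.imag for v in s]       # 2 N_t real-valued "symbols"
recomputed = [sum(a * b for a, b in zip(row, s_real)) for row in H_real]
print(recomputed, y_real)                                 # identical up to rounding
```

The real model has $2N_t$ layers with two-level alphabets (for QPSK), so each real or imaginary component gets its own position in the detection order.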

COMPLEXITY ANALYSIS AND SIMULATION RESULTS
In this section, complexity-order estimates for MMSE detection, LSD, and exact MAP detection are provided first, and then the numbers of basic operations for exact MAP detection, ITS detection, and IRTS detection are counted. The complexity analysis of the detectors is based on one iteration of the detection/decoding loop. The matrix inversion performed by the MMSE-based detector constitutes the bulk of its total complexity, which is $O(N_r^3)$ [4]. The complexity of the LSD scheme depends on the noise, and there are different viewpoints on the complexity of the sphere decoder. References [10,18] indicate that the expected complexity of the sphere decoder is polynomial in $N_t$, that is, $O(N_t^3)$, when the SNR is high, and is predicted to be exponential when the SNR is low. Reference [19] indicates that the complexity of the sphere decoder is exponential, with a rate of the exponential function that depends on the SNR and is quite small for high SNR. As to exact MAP detection, the total number of symbol vectors that need to be processed is $2^{N_t M_c}$, and it has a complexity order of $O(2^{N_t M_c})$.

For ITS detection, compared with the metric update procedure of (12), the other complexities, associated with the computation of the unconstrained ML symbol estimate, the detection output (13) with the aid of the max-log approximation, and the Cholesky factorization of $H^H H$, are negligible and are therefore not considered in the analysis. Based on ITS detection, the IRTS detection scheme introduces the extra complexity of the MMSE preprocessing and the computation of the symbol reliabilities for the first iteration. For the following iterations, only some symbol position permutations need to be performed, whose complexity can be ignored. The numbers of floating-point additions and multiplications involved in ITS detection, IRTS detection, and exact MAP detection of the $N_t M_c$ code bits transmitted during a single symbol period are listed in Table 2.
Table 2 shows that the complexity of ITS detection is $O(M 2^{M_c} N_t^2)$, and IRTS detection only introduces the additional complexity of $O(N_r^3)$ for the first iteration, which may be ignored.
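To give a feeling for the size of the search, the sketch below counts how many path extensions the M-algorithm evaluates over the whole tree and compares this with the number of leaves visited by an exhaustive search. These are counts of visited paths, not the floating-point operation counts of Table 2, and the function names are hypothetical.

```python
def exhaustive_leaves(n_t, m_c):
    """Number of symbol vectors examined by the exact MAP detector."""
    return 2 ** (n_t * m_c)

def its_extended_paths(n_t, m_c, M):
    """Number of path extensions whose metric the M-algorithm evaluates."""
    kept, total = 1, 0
    for _ in range(n_t):
        extended = kept * 2 ** m_c     # every retained path branches 2^{M_c} times
        total += extended
        kept = min(M, extended)        # at most M paths survive to the next depth
    return total

for n_t in (4, 8):                     # QPSK (M_c = 2), M = 16 as in the simulations
    print(n_t, its_extended_paths(n_t, 2, 16), exhaustive_leaves(n_t, 2))
# prints "4 148 256" and "8 404 65536"
```

Each extension at depth $q$ costs on the order of $q$ multiplications for the inner sum, which is consistent with an overall order of $M 2^{M_c} N_t^2$.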
In the simulations, the channel code is a parallel concatenated (turbo) code with rate $R = 1/2$, whose constituent convolutional codes both have memory 2, with feedback polynomial $G_r(D) = 1 + D + D^2$ and feedforward polynomial $G_f(D) = 1 + D^2$. Frames of 1024 information bits are fed to the channel encoder and interleaver, QPSK modulated, and subsequently transmitted over a quasistatic fading channel. There are eight iterations of the MIMO detector/turbo decoder loop and four iterations within the turbo decoder. All the interleavers are pseudorandom, and no attempt was made to optimize their design. Figures 2 and 3 show the performance of iterative detection and decoding for $N_t = N_r = 4$ and $N_t = N_r = 8$ transmit/receive antennas, respectively. For the IRTS detection discussed in Section 4, the performance of the following two cases is given: the case with separated real and imaginary components, denoted as "IRTS Real," and the case without separated real and imaginary components, denoted as "IRTS Complex."

For the $4 \times 4$ MIMO system, exact MAP detection is performed, which computes the soft a posteriori values based on all 256 symbol vectors. IRTS detection with $M = 16$ outperforms MMSE detection, LSD, and ITS detection and is shown to achieve near-exact-MAP performance. At BER $= 10^{-4}$, "IRTS Real" detection achieves a gain of more than 0.3 dB over ITS detection.

For the $8 \times 8$ MIMO system, the exhaustive search space is composed of $2^{16}$ symbol vectors. Because of the relatively small number of searched symbol vectors, the performance of LSD with $L = 64$ and of ITS detection with $M = 16$ is worse than that of MMSE detection. The IRTS detection is shown to have an excellent ability to find the MLP, and "IRTS Real" detection with $M = 16$ even performs better than ITS detection with $M = 32$.
The simulation results also demonstrate the performance improvement obtained by separating the real and imaginary components. For the $4 \times 4$ MIMO system, Figure 2 shows that a gain of about 0.2 dB is achieved at BER $= 10^{-5}$. For the $8 \times 8$ MIMO system, Figure 3 shows that the performance of "IRTS Real" detection with $M = 16$ even equals that of "IRTS Complex" detection with $M = 32$.

CONCLUSIONS
This paper has proposed a novel reduced-complexity detection scheme for iterative ST-BICM MIMO receivers, named iterative reconfigurable tree search detection. The key improvement of this scheme is to use the reliability metrics computed under the MMSE criterion to order the transmitted symbols, to construct a reconfigurable tree structure, and to apply the M-algorithm to the reconfigurable tree. The IRTS detection scheme, whose complexity per bit is almost linear in the number of transmit antennas, offers the possibility of trading complexity for performance. It has been demonstrated that the scheme is capable of approaching MAP performance at considerably reduced complexity.
We have focused primarily on reduced-complexity detection schemes. Possible ways of improving performance that we have not considered include optimizing the interleaver design for a good minimum distance and improving the constellation shaping [6].

Figure 1: Block diagram of the coded MIMO system with iterative receiver.

Table 2: Operation counts for ITS, IRTS, and exact MAP detection, per symbol period ($N_t M_c$ bits).