Selected basis for PAR reduction in multi-user downlink scenarios using lattice-reduction-aided precoding

The application of OFDM within a multi-user downlink scenario is considered. Two problems occur in this setting. First, due to OFDM, the transmit signal exhibits a large peak-to-average power ratio (PAR). Second, the multi-user interferences have to be equalized (precoded) at the transmitter side. In this article, we address combined precoding and PAR reduction. As precoding schemes, sorted Tomlinson-Harashima precoding (sTHP) and its lattice-reduction-aided variant (LRA-THP) are considered. In order to reduce the PAR, we review the scheme selected sorting (SLS), which combines PAR reduction with sTHP precoding. Based on this idea, the novel PAR reduction scheme selected basis (SLB) is introduced, which combines PAR reduction with the precoding approach LRA-THP. It is shown that SLB achieves very good PAR reduction performance and hardly influences the error performance. Both schemes, SLB and SLS, are compared with simplified selected mapping (sSLM), the only PAR reduction scheme from the SLM family that can be applied in multi-user downlink scenarios. The comparison is carried out such that the respective schemes exhibit the same computational complexity. In terms of PAR reduction performance, sSLM outperforms SLS, whereas the performance of sSLM and SLB is similar. Notably, the great benefit of SLB and SLS is that, in contrast to sSLM, no side information has to be communicated to the receiver. Moreover, using SLB, full-diversity error rate performance is possible while transmitting only low-PAR signals.


Introduction
Orthogonal frequency-division multiplexing (OFDM) [1] is a very popular scheme for equalizing the temporal interferences caused by frequency-selective channels. One essential drawback of OFDM systems is large peaks in the transmit signal. This property leads to signal clipping at the nonlinear power amplifier, which in turn leads to very undesirable out-of-band radiation. In order to avoid violating spectral masks, a transmitter-sided algorithmic control of the peak power is essential. Such algorithms are denoted as peak-to-average power ratio (PAR) reduction schemes. PAR reduction techniques for single-antenna OFDM systems have been well analyzed in the literature. The most prominent are selected mapping (SLM) [2], partial transmit sequences (PTS) [3], active constellation extension (ACE) [4] or tone reservation (TR) [5].
In order to satisfy the demands for high data rates, modern communication systems use multiple antennas at transmitter and receiver to increase the channel capacity [6]. The problem of out-of-band radiation gets even more serious for such a multiple-input/multiple-output (MIMO) system. Since the transmitter is equipped with multiple antennas, out-of-band radiation is generated as soon as the signal at only one antenna is clipped. Hence, the reduction of the signal's peak power is even more relevant for such systems.
Recently, peak power reduction schemes developed for single-antenna systems have been transferred to the MIMO case. Possible extensions for the popular scheme SLM have been proposed in [7][8][9]. However, in many cases these extensions have only been discussed for multi-antenna point-to-point scenarios, where the equalization of the multi-antenna interferences can be accomplished at the receiver side. This article deals with the specific scenario of multi-user downlink transmission. Here, the transmission takes place between a central unit, equipped with multiple antennas, and independent users, each equipped with a single or multiple antennas. In this case, it is essential to apply transmitter-sided precoding [10,11] to preequalize the multi-user interferences. The combination of transmitter-sided precoding with peak-power reduction algorithms is not straightforward and may lead to a significant degradation of the error performance, to a decrease in PAR reduction capability, or to an increase of computational complexity.
Due to their very low complexity but good performance, we consider the precoding schemes sorted Tomlinson-Harashima precoding (sTHP) and, in particular, lattice-reduction-aided THP (LRA-THP). Recently, the PAR reduction scheme selected sorting (SLS) has been introduced in [12,13], which combines PAR reduction with sTHP. Based on this idea, in this article we introduce a combination of PAR reduction with LRA-THP. This scheme is denoted as selected basis (SLB). As reference PAR reduction scheme, we consider simplified SLM (sSLM) [7], the only extension of SLM which is applicable in multi-user downlink scenarios.
This article is organized as follows: the next section introduces the considered MIMO OFDM system model and the considered precoding schemes sTHP and LRA-THP. Subsequently, existing PAR reduction schemes are reviewed and the novel PAR reduction scheme SLB is introduced. Then, numerical results are shown. Finally, conclusions are drawn.

OFDM System Model
We consider downlink transmission between a central unit, equipped with N C antennas, and K independent users which are not able to cooperate in any way. For brevity, we assume that each mobile terminal has a single receive antenna; the extension to multiple antennas is easily possible by considering data streams rather than users and each user may receive multiple data streams.
The impulse response (in the equivalent complex baseband [14]) of the respective MIMO channel is given in the z domain by the matrix polynomial
$$H(z) = \sum_{k=0}^{l_H - 1} h_k z^{-k}.$$
The fading coefficient at delay step k is given by the complex K × N C matrix h k , which describes the multi-user interferences; l H is the length of the channel impulse response. Throughout this article, we assume that the transmitter has full channel state information (CSI).
In order to equalize the temporal interferences, OFDM using D subcarriers is applied. The remaining multi-user interferences at each subcarrier, described by the flat fading channel matrix $H_d = H(e^{j 2\pi d / D})$ of the dth subcarrier, have to be equalized by transmitter-sided precoding. In the following, we compare the precoding schemes (sorted) Tomlinson-Harashima precoding ((s)THP) [11] with its lattice-reduction-aided variant (LRA-THP) [15,16].
The complex-valued modulation symbols for each user k and each subcarrier d are drawn from an M-ary QAM constellation (modulation alphabet A M ) and collected in the K × D matrix A = [A k,d ], which is denoted as the frequency-domain MIMO OFDM frame. The precoding of the multi-user interferences has to be applied over the columns (vectors A d ) of A. The resulting precoded frequency-domain MIMO OFDM frame is denoted by the matrix X. The time-domain MIMO OFDM frame (matrix x) is obtained via an inverse discrete Fourier transform (IDFT) [17] along each row of X. Due to the D-wise superposition of the precoded frequency-domain symbols within the Fourier transform, the time-domain symbols x = [x k,d ] exhibit a large dynamic range, i.e., the peak-to-average power ratio (PAR) of these symbols is very high. As usual in the literature, we consider the worst-case PAR to be the relevant criterion, i.e., the maximum PAR over all antennas within one OFDM frame, which is defined as
$$\mathrm{PAR} = \frac{\max_{k,d} |x_{k,d}|^2}{\mathrm{E}\{|x_{k,d}|^2\}}.$$
For performance comparison of the PAR reduction schemes discussed in this article, we assess the complementary cumulative distribution function (ccdf) of the PAR, i.e., the probability that the PAR of a given OFDM frame exceeds a certain threshold PAR th :
$$\mathrm{ccdf}(\mathrm{PAR_{th}}) = \Pr\{\mathrm{PAR} > \mathrm{PAR_{th}}\}.$$
Under the assumption that all samples of x are Gaussian distributed (which is a very good approximation due to the central limit theorem) and under the assumption that the samples of x are statistically independent, the ccdf of the original signal can be calculated as [18]
$$\mathrm{ccdf_{orig}}(\mathrm{PAR_{th}}) = 1 - \left(1 - e^{-\mathrm{PAR_{th}}}\right)^{DK}. \quad (5)$$
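The worst-case PAR and the Gaussian reference ccdf described above can be illustrated with a minimal sketch. The function names, the unitary 1/sqrt(D) scaling of the IDFT, and the use of the frame average as power reference are assumptions made for this illustration; the threshold is taken on a linear scale (not in dB).

```python
import cmath
import math

def idft(row):
    """Naive inverse DFT of one row (O(D^2), adequate for a small illustration).
    Assumption: unitary scaling by 1/sqrt(D)."""
    D = len(row)
    return [
        sum(X * cmath.exp(2j * math.pi * n * d / D) for d, X in enumerate(row))
        / math.sqrt(D)
        for n in range(D)
    ]

def worst_case_par(freq_frame):
    """Worst-case PAR over all rows (antennas) of one frequency-domain frame."""
    time_frame = [idft(row) for row in freq_frame]
    powers = [abs(s) ** 2 for row in time_frame for s in row]
    # Assumption: the average power of this frame serves as the reference power.
    avg_power = sum(powers) / len(powers)
    return max(powers) / avg_power

def ccdf_orig(par_th, D, K):
    """Reference ccdf for Gaussian, statistically independent samples."""
    return 1.0 - (1.0 - math.exp(-par_th)) ** (D * K)
```

A threshold given in dB would first be converted via `10 ** (par_th_db / 10)` before calling `ccdf_orig`.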

Precoding Strategies
Subsequently, we consider Tomlinson-Harashima precoding [10] to preequalize the multi-user interferences caused by the channel in each subcarrier. The basic block diagram of this scheme, which has to be applied to each subcarrier, is depicted in Figure 1. First, the signal vector A d (dth column of A) is passed through one of the matrices P opt,d or Z opt,d . The matrix P opt,d is a permutation matrix, which is used with sorted THP. The matrix Z opt,d is a unimodular basis change matrix (see endnote a), which is present in LRA-THP. A detailed description of how these matrices are chosen is given subsequently.
Next, the signal is precoded within the feedback-loop, i.e., it is successively processed by the feedback matrix B d , a lower triangular matrix with unit main diagonal, taking the interferences of already encoded users into account. Then the signal is modulo reduced onto the support of A M . After that, the signal vector is passed through the feedforward matrix F d . In order to ensure constant sum power at each subcarrier, the signal is multiplied with the scalar b d . This scalar factor is given . At the receiver, the signals are scaled suitably, quantized with respect to the lattice of the constellation alphabet, and modulo reduced onto the support of A M . Due to the assumed scaling each user exhibits the same signal-to-noise ratio and therefore the same error performance.
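The successive processing in the feedback loop can be sketched as follows. This is a minimal illustration for one subcarrier, assuming 4-QAM with points {±1 ± j} and hence a modulo period of 4 per quadrature component; the function names and this normalization are assumptions, and the feedforward matrix F d and the scaling b d are not modeled.

```python
import math

def modulo_qam(z, period=4.0):
    """Reduce real and imaginary part into the interval [-period/2, period/2)."""
    def reduce(t):
        return t - period * math.floor((t + period / 2) / period)
    return complex(reduce(z.real), reduce(z.imag))

def thp_feedback(a, B, period=4.0):
    """Successively precode the data vector a.

    B is a lower triangular matrix with unit main diagonal; the interference
    of already encoded users is subtracted before the modulo reduction.
    """
    x = []
    for k, a_k in enumerate(a):
        interference = sum(B[k][l] * x[l] for l in range(k))
        x.append(modulo_qam(a_k - interference, period))
    return x
```

By construction, multiplying the output by B reproduces the data symbols up to multiples of the modulo period, which is exactly the offset the receiver-side modulo operation removes.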

Sorted Tomlinson-Harashima precoding
When considering sorted THP, the precoding order of the users is optimized in each subcarrier via the permutation matrix P opt,d . A reasonable optimization criterion is to achieve the least average error rate. This is achieved in an almost optimum way if the user exhibiting the lowest signal-to-noise ratio is encoded first (reverse V-BLAST ordering [11], see endnote b). Considering the uplink-downlink duality, e.g., [19], the calculation of the optimum permutation order and the decomposition into feedforward and feedback matrix can hence be performed by applying the V-BLAST algorithm [20] or one of its low-complexity implementations [21,22]. The result is the decomposition (6) of the channel matrix H d in terms of P opt,d , B d , and F d .

Lattice-reduction-aided Tomlinson-Harashima precoding
In order to significantly enhance the error performance of the transmission scheme, it is possible to extend sorted THP to lattice-reduction-aided THP (LRA-THP) [15,16]. The huge advantage of this scheme is that it achieves full diversity (here: diversity order N C ), i.e., the error performance is close to that of the optimum approach of vector precoding [23,24]. Applying a suitable lattice reduction algorithm, e.g., the LLL algorithm [25], it is possible to decompose the channel matrix H d into a reduced channel matrix H red,d and a unimodular matrix Z opt,d . The reduced channel matrix H red,d is then passed to the V-BLAST algorithm, which, including its sorting, leads to the decomposition (8) into feedforward and feedback matrices (see endnote c). Considering the precoding structure according to Figure 1, after processing the data vector with Z opt,d the symbols are still drawn from the underlying integer grid. The subsequent precoding equalizes the interferences caused by the reduced channel H red,d . To this end, the aim of the LLL algorithm is to find a suitable representation of the lattice spanned by the rows of H d . This representation, given by H red,d , should fulfill two properties. On the one hand, the basis vectors should be as short as possible; on the other hand, the vectors should be close to orthogonal. Since Z opt,d changes the lattice basis from H d to H red,d , it is also denoted as basis change matrix subsequently. A detailed analysis of this type of precoding scheme can be found in [11,16].

Review of selected mapping in multi-antenna environments
In the literature, selected mapping (SLM) [2] is one of the most popular techniques for PAR reduction in OFDM systems. The idea behind this scheme is, given the original OFDM frame, to generate several, say U SLM , different signal representations via U SLM different bijective mappings. Out of these signal candidates, the best one, i.e., the one exhibiting the lowest PAR, is chosen for transmission. At the receiver, after equalization the original data can be reconstructed by inverting the applied mapping. Hence, side information, in terms of an index of the applied mapping, has to be transmitted. The required redundancy has to be encoded with at least ⌈log 2 (U SLM )⌉ bits (⌈·⌉: round towards plus infinity). However, this index is extraordinarily sensitive to transmission errors as the application of the wrong inverse mapping leads to the loss of the whole OFDM frame. Possible schemes to transmit the side information have been discussed in [26][27][28][29].
Originally, SLM has been proposed for single-antenna schemes. A first extension for multi-antenna point-to-point scenarios has been presented in [7] and named ordinary SLM (oSLM). However, this approach is nothing else than a straightforward application of single-antenna SLM to each transmit antenna. A more sophisticated extension has been presented in [8,9] and named directed SLM (dSLM). Following the analysis of these schemes in [18], this approach offers very promising results in terms of PAR reduction performance compared to the ordinary SLM.

Simplified selected mapping
However, both extensions, ordinary and directed SLM, are not applicable in the multi-user point-to-multipoint scenario considered in this article. Due to the required precoding at the transmitter side, it is not possible to influence the data streams at each antenna individually. Hence, to generate different signal candidates, we have to consider the data signals of all users jointly. The corresponding extension of SLM has been originally proposed in [7] and named simplified SLM (sSLM).
With sSLM the original frequency-domain MIMO OFDM frame A has to be mapped jointly onto U SLM different signal representations, whereby each row of A has to be mapped in the same way. Afterwards, each of the resulting signal candidates has to be precoded and transformed into time domain. Out of these, the best one, i.e., the one exhibiting the lowest PAR, is chosen for transmission.
Assuming the individual signal candidates to be statistically independent, the ccdf of sSLM can be given with respect to the ccdf of the original signal (5) and reads [7,9]
$$\mathrm{ccdf_{sSLM}}(\mathrm{PAR_{th}}) = \left(\mathrm{ccdf_{orig}}(\mathrm{PAR_{th}})\right)^{U_{\mathrm{SLM}}}. \quad (9)$$
Subsequently, we consider this ccdf as reference for the PAR reduction performance.
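As a small numerical illustration of the reference ccdf and of the side-information overhead mentioned in the review of SLM, the following sketch evaluates both quantities. The function names are chosen for this example only, and the threshold is assumed to be given on a linear scale.

```python
import math

def ccdf_orig(par_th, D, K):
    """Reference ccdf of the original signal (Gaussian, independent samples)."""
    return 1.0 - (1.0 - math.exp(-par_th)) ** (D * K)

def ccdf_sslm(par_th, D, K, U_slm):
    """Reference ccdf of sSLM for U_slm statistically independent candidates."""
    return ccdf_orig(par_th, D, K) ** U_slm

def side_info_bits(U_slm):
    """Minimum number of bits to index U_slm mappings: ceil(log2(U_slm))."""
    return (U_slm - 1).bit_length()
```

For instance, `ccdf_sslm(10 ** 0.9, 512, 4, 8)` evaluates the reference at a threshold of 9 dB for the system parameters used in the numerical results (D = 512, K = 4) and eight candidates.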

Selected sorting
Another approach to generate different signal representations, named selected sorting (SLS), has been proposed in [12,13]. This approach combines mapping and precoding by applying different sortings in each subcarrier. In particular, different instances of THP are generated by considering different permutations of the users in each subcarrier. A practical advantage of this approach is that no side information needs to be communicated to the receiver.
The idea of SLS is as follows. A set of V different permutation matrices P (v) , v = 1,...,V, out of the set of K! possible ones is chosen arbitrarily (see endnote d). Starting with the optimum sorting order, each P (v) is applied in addition to the optimum permutation P opt,d , which yields an alternative permutation in the dth subcarrier. Next, the information-carrying signal A is precoded via all V different precoder instances and the resulting precoded signals are denoted as $\tilde{X}^{(v)}$, v = 1,...,V. In order to generate U SLS different signal candidates X (u) , u = 1, ..., U SLS , the respective columns (corresponding to the carriers) of $\tilde{X}^{(v)}$ are combined in U SLS different ways. Hence, every column of each of the U SLS signal candidates X (u) is drawn from one of the V possible precoded signals. This is possible as the actual choice of the sorting order of THP at the dth subcarrier influences the precoded signal only at this position.
Noteworthy, with this approach we are able to generate (much) more signal candidates than precoded candidates are present (U SLS ≫ V may hold). The principal strategy how the U SLS signal candidates are generated is depicted in Figure 2.
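The column-wise combination of the V precoded frames into U SLS candidates can be sketched as below. The random choice of the per-subcarrier selection pattern is an assumption made for illustration; the concrete combination rule of Figure 2 is not reproduced here, and the function name is hypothetical.

```python
import random

def sls_candidates(precoded, U, seed=0):
    """Build U candidate frames from V precoded frames.

    precoded: list of V frames, each a K x D list of lists.
    Returns the candidates and, for each candidate, the list that states
    which precoded frame supplied each subcarrier (column).
    """
    rng = random.Random(seed)
    V = len(precoded)
    K = len(precoded[0])
    D = len(precoded[0][0])
    candidates, patterns = [], []
    for _ in range(U):
        # Assumption: each column is drawn uniformly from the V precoded frames.
        pattern = [rng.randrange(V) for _ in range(D)]
        frame = [[precoded[pattern[d]][k][d] for d in range(D)] for k in range(K)]
        candidates.append(frame)
        patterns.append(pattern)
    return candidates, patterns
```

Each candidate would then be transformed into the time domain and the one with the lowest PAR selected; only V precoding runs are needed regardless of U.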
Moreover, SLS requires much less computational complexity compared to sSLM as the precoding has to be performed only V times to generate the U SLS signal candidates. However, to further reduce the computational complexity the SLS technique could only be applied on a subset of D i ≤ D (randomly chosen) influenced subcarriers. All other subcarriers remain unaffected and the optimum sorting order is applied. Following the results of [13], operating only on a subset of subcarriers leads to a poor PAR reduction performance compared to the case when operating on all subcarriers. For this reason, we subsequently consider only the case for D i = D.
Compared to sSLM, assuming perfect transmission of the side information, this scheme will exhibit a small loss in error performance as suboptimal sorting orders are used to generate the signal candidates. However, even if very efficient schemes exist for transmitting the side information (e.g., [28]), perfect transmission is never possible. Moreover, the transmission of the side information and the inversion of the actual applied mapping requires additional signal processing at the receiver, which is not required in SLS.

Selected basis
The idea of generating signal candidates with selected sorting may straightforwardly be extended to the case of LRA-THP as well, where the pure permutation is replaced by a unimodular matrix Z opt,d . Consequently, in this case we introduce an additional unimodular matrix Z (v) , which is combined with Z opt,d to form the effective unimodular basis change matrix in the dth subcarrier. In principle, Z (v) can be chosen to be any unimodular matrix. In the following, we construct arbitrary unimodular matrices by multiplying an upper and a lower triangular matrix (13). To guarantee that |det(Z (v) )| = 1, the diagonal elements of both matrices have to satisfy z u/l,m,m ∈ {±1, ±j} for all m. Moreover, in order to ensure that Z (v) contains only Gaussian integers, all non-zero elements of the upper and lower triangular matrix have to be Gaussian integers as well. For practical reasons we additionally restrict the magnitude of the elements by a bound z max . Subsequently, we choose z max = 1.
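The construction of a unimodular matrix as a product of an upper and a lower triangular matrix can be sketched as follows. The interpretation of the magnitude restriction (real and imaginary part of each off-diagonal element bounded by z_max) and all function names are assumptions made for this illustration.

```python
import random

UNITS = [1, -1, 1j, -1j]  # admissible diagonal elements

def random_triangular(K, upper, z_max, rng):
    """Triangular matrix with unit-magnitude diagonal and Gaussian integer entries."""
    T = [[0j] * K for _ in range(K)]
    for m in range(K):
        T[m][m] = complex(rng.choice(UNITS))
        for n in range(K):
            if (upper and n > m) or (not upper and n < m):
                # Assumption: real and imaginary part each lie in [-z_max, z_max].
                T[m][n] = complex(rng.randint(-z_max, z_max), rng.randint(-z_max, z_max))
    return T

def matmul(A, B):
    K = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(K)] for i in range(K)]

def random_unimodular(K, z_max=1, seed=None):
    """Product of an upper and a lower triangular matrix, so |det| = 1."""
    rng = random.Random(seed)
    return matmul(random_triangular(K, True, z_max, rng),
                  random_triangular(K, False, z_max, rng))

def det(M):
    """Determinant via Laplace expansion (adequate for small K)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))
```

Since the determinant of a triangular matrix is the product of its diagonal elements, both factors have a determinant of magnitude one, and so does their product.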

Numerical results
For the subsequent numerical results, we consider transmission over an (l H = 5)-tap equal gain Rayleigh fading channel. Moreover, we assume N C = K = 4 and OFDM applying D = 512 subcarriers (all of them are active). As modulation alphabet, we consider (M = 4)-ary QAM. Figure 3 shows numerical results when considering SLS as PAR reduction scheme, hence sTHP as precoding procedure. The left plot shows the respective ccdf of PAR and the right plot shows the bit error rates. The ccdf curves for Gaussian signaling ((5) or (10), depicted in gray) serve as reference.

Discussion
Considering the PAR reduction performance, it turns out that the ccdf of the original signal is not equal to the reference (5) when considering Gaussian signaling. The reason for this behavior is as follows: in the above definition of the feedforward and feedback matrices power loading over the users is included implicitly within each subcarrier. Considering the time-domain signal, i.e., after applying the IDFT, the antenna signals are no longer pairwise statistically independent. Hence, the distribution of PAR values will not exactly match the analytic result from (5) but higher PAR values will occur. Noteworthy, it is possible to overcome this issue by avoiding power loading over the users. In this case, there remains an individual scaling of each user, which can be equalized within the receiver's automatic gain control. However, in this article, we consider sTHP only with power loading over the users in order to have a fair comparison towards LRA-THP, where it is not straightforwardly possible to avoid power loading.
When considering the error performance of SLS, we can observe a small loss compared to the original signal, where the optimum permutation order is applied in each subcarrier. Notably, using sorted THP the diversity order is only one. Figure 4 shows the numerical results for the PAR reduction scheme SLB, hence LRA-THP as precoding procedure. The first row of this figure displays the results for using arbitrary additional unimodular matrices according to the construction method from section "Selected basis" (z max = 1). In terms of PAR reduction performance, the ccdf of the original signal coincides with the reference (5) and the same holds when applying SLB with U SLB = 8 or U SLB = 16 candidates. Hence, with LRA-THP, the effect due to the power loading over the users is not an issue as it is in sTHP. However, when considering the error performance of this approach, it is obvious that a large loss compared to original LRA-THP is present, even if a significant gain compared to sTHP is achieved.

Choosing suited alternative precoders
As can be seen from the numerical results of Figure 4, SLB offers excellent results in terms of PAR reduction performance but also a significant loss in terms of error performance. The reason for this behavior is the arbitrary choice of the additional unimodular matrices Z (v) . Applying such additional matrices leads to a non-optimum decomposition (with respect to the definition of LLL reduced) of the channel matrices in each subcarrier, which in turn leads to the significant loss in error rate. However, applying arbitrary additional unimodular matrices Z (v) , it is possible to generate statistically independent signal candidates, which leads to a PAR reduction performance equal to the reference (9). Subsequently, we study the influence of the additional unimodular matrix Z (v) on the effective reduced channel $\tilde{H}_{\mathrm{red},d}$ and its QR-type decomposition. The idea of the LLL algorithm is to find a more suitable representation (H red,d ) of the lattice spanned by the rows of the channel matrix H d . Thereby, the row vectors of H red,d should be as short as possible and close to orthogonal. Applying the additional unimodular matrix Z (v) , this property remains valid for $\tilde{H}_{\mathrm{red},d}$ as long as Z (v) is unitary. As a first approach, this can be achieved when allowing only pure permutation matrices for Z (v) , similar to the SLS approach. The second row of Figure 4 shows numerical results for this case. Now, there is no loss in terms of error ratios compared to the original signal. However, the ccdf curves flatten out. The reason for this effect is that the restriction to pure permutation matrices does not offer enough degrees of freedom to generate statistically independent signal candidates.
[Figure caption fragment: reference curves ((5) and (10)) when assuming Gaussian signaling and statistically independent signal candidates are depicted in gray. Right: bit error ratio over signal-to-noise ratio; inset: zoom into the BER curves; M = 4, D = 512, l H = 5.]
In order to introduce more degrees of freedom but ensure that the additional unimodular matrices Z (v) are still unitary, we allow matrices containing exactly one element from the set {±1, ±j} in each row and column and only zeros at all other positions. Such matrices are a generalization of permutation matrices and are subsequently denoted as permutation/phase matrices. In total, there exist exactly $4^K K!$ such matrices. The bottom row of Figure 4 shows numerical results when using such unimodular matrices to generate alternative signal candidates. It can be seen that there is again no loss in terms of error rates. Additionally, the flattening of the ccdf curves is significantly reduced compared to the case when using pure permutation matrices. The PAR reduction performance when allowing arbitrary unimodular matrices can almost be achieved. Hence, this kind of matrices offers sufficient degrees of freedom to generate almost statistically independent signal candidates.
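The set of permutation/phase matrices can be enumerated directly for small K, which also confirms the count and the unitarity claimed above. This is a minimal sketch; the function names are chosen for this example only.

```python
from itertools import permutations, product
from math import factorial

UNITS = [1, -1, 1j, -1j]

def permutation_phase_matrices(K):
    """Yield all K x K matrices with exactly one entry from {±1, ±j} per row and column."""
    for perm in permutations(range(K)):
        for phases in product(UNITS, repeat=K):
            Z = [[0j] * K for _ in range(K)]
            for row, (col, phase) in enumerate(zip(perm, phases)):
                Z[row][col] = complex(phase)
            yield Z

def is_unitary(Z, tol=1e-12):
    """Check that Z times its conjugate transpose equals the identity."""
    K = len(Z)
    for i in range(K):
        for j in range(K):
            s = sum(Z[i][k] * Z[j][k].conjugate() for k in range(K))
            if abs(s - (1 if i == j else 0)) > tol:
                return False
    return True

def count_matrices(K):
    """Number of permutation/phase matrices found by enumeration."""
    return sum(1 for _ in permutation_phase_matrices(K))
```

For the K = 4 users considered in the numerical results, the formula gives 4^4 · 4! = 6144 matrices, of which only V are used as alternatives.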

Analysis of computational complexity
As already mentioned above, the PAR reduction/precoding schemes SLS and SLB have two major advantages compared to sSLM. On the one hand, no side information has to be transmitted and, on the other hand, the computational complexity is reduced, as the precoding procedure has to be performed only V times to generate U SLS/SLB > V signal candidates. In the following, we compare the PAR reduction performance of sSLM with the schemes SLS and SLB, respectively, incorporating the computational complexity (see endnote e). In this context, as complexity measure we consider the number of complex operations and treat multiplications and divisions equally. However, additions and multiplications with Gaussian integers are not incorporated into the counting.
In the following, we assume that the channel remains constant for the duration of N B OFDM symbols. Hence, for this block of OFDM symbols the calculation of the precoding matrices has to be performed only once, whereas the computation of the precoded signal, the FFT, and the selection metric have to be accomplished for each of the N B OFDM symbols.
With SLS or SLB, the computational complexity (per carrier) consists of the single calculation of the optimum decomposition (factorization) of the channel matrix according to (6) or (8). This complexity is denoted as c fac . In addition to that, V − 1 alternative precoding matrices have to be determined. For each alternative, the computational complexity c QR of one QR-decomposition [30] is needed.
The V alternative precoders are now valid for N B OFDM blocks. For each of these OFDM blocks, we have to precode the MIMO OFDM frame V times. Moreover, U SLS/SLB K calculations of the inverse Fourier transform (complexity c FFT ) and of the selection metric (complexity c met ) are necessary in order to determine the best signal candidate.
Using sSLM, the complexity consists also of the calculation of the optimum decomposition of the channel (complexity c fac ) and of U SLM K transformations into time-domain (complexity c FFT ) and PAR evaluations (complexity c met ). Generating the different signal candidates is not incorporated into the considerations, as it is implemented via the multiplication of phase vectors (cf. [2]) and different candidates differ only in a change of sign or interchange of the quadrature components of the QAM symbols within each subcarrier. This operation is trivial in terms of computational complexity. Finally, the precoding of the signal has to be applied for each of the U SLM signal candidates.
For a fair comparison of sSLM with SLS or SLB, the respective schemes should exhibit the same complexity (i.e., c SLS/SLB ≈ c sSLM ). Given the parameters V and U SLS/SLB for SLS or SLB, sSLM assessing U SLM signal candidates according to (17) will exhibit approximately the same computational complexity. When the number U SLM of assessed candidates for sSLM is rounded up to the next greater integer, sSLM will exhibit a slightly larger complexity.
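The complexity matching can be sketched from the cost accounting given in the preceding paragraphs. The expressions below are derived from that verbal description and are not necessarily identical to the authors' expression (17); the function names are hypothetical, all cost values are per-carrier quantities, and c_prec denotes the cost of precoding one frame once.

```python
import math

def cost_sls_slb(V, U, K, N_B, c_fac, c_qr, c_prec, c_fft, c_met):
    """One factorization, V - 1 extra QR decompositions, then per OFDM symbol:
    V precoding runs plus U*K transforms and metric evaluations."""
    return c_fac + (V - 1) * c_qr + N_B * (V * c_prec + U * K * (c_fft + c_met))

def cost_sslm(U_slm, K, N_B, c_fac, c_prec, c_fft, c_met):
    """One factorization, then per OFDM symbol and candidate:
    one precoding run plus K transforms and metric evaluations."""
    return c_fac + N_B * U_slm * (c_prec + K * (c_fft + c_met))

def matched_u_slm(V, U, K, N_B, c_qr, c_prec, c_fft, c_met):
    """Smallest integer number of sSLM candidates with at least the SLS/SLB cost."""
    numerator = (V - 1) * c_qr + N_B * (V * c_prec + U * K * (c_fft + c_met))
    denominator = N_B * (c_prec + K * (c_fft + c_met))
    return math.ceil(numerator / denominator)
```

Because the channel factorization cost c fac appears in both totals, it cancels when the two costs are equated, and rounding up makes sSLM slightly more complex than SLS or SLB, as stated above.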
In order to evaluate this number, we have to further specify the complexities c QR , c prec , c FFT , and c met . The calculation of the feedforward and feedback matrices is usually implemented via a QR-type decomposition [30] and requires complex operations. The precoding of the transmit signal requires complex operations; the transformation into time domain (implemented as fast Fourier transform [17]) and the calculation of the decision metric (PAR) require complex multiplications, respectively. For the following numerical results we choose the block length N B = 10 and fix the number of assessed signal candidates for SLS or SLB to either U SLS/SLB = 8 or U SLS/SLB = 16. The respective numbers of assessed signal candidates for sSLM according to (17) will be U SLM = 7 and U SLM = 11. Figure 5 shows the ccdf of PAR of sSLM and SLS. In this case, sSLM outperforms SLS even though fewer signal candidates are assessed. The reason for this behavior is that SLS is not able to generate statistically independent signal candidates, as is possible with sSLM. Hence, the ccdf curves of SLS flatten out compared to sSLM, which leads to the worse performance.
Numerical results of the comparison of sSLM with SLB are depicted in Figure 6. The top plot shows the results when using arbitrary unimodular matrices (cf. section "Selected basis"). In this case, sSLM is outperformed by SLB in terms of PAR reduction. However, cf. Figure 4, when choosing arbitrary unimodular matrices in SLB the loss in error rate compared to the original signal is significant.
The middle plot of Figure 6 compares the PAR reduction performance when restricting the additional unimodular matrices in SLB to permutation matrices. Now, it is no longer possible to generate statistically independent signal candidates, which leads to some flattening of the ccdf curves. Hence, SLB is outperformed by sSLM due to the steeper ccdf curves of the latter.
The bottom plot shows results when applying permutation/phase matrices for the additional unimodular matrices. In this case, the PAR reduction performance of SLB is more or less equal to that of sSLM. Additionally, according to the numerical results of Figure 4, the loss in terms of bit error ratios is negligible. Notably, the huge benefit of SLB is that no side information has to be communicated and no error multiplication due to erroneous side information occurs, as it would with sSLM.

Conclusions
This article introduces a novel combined precoding/PAR reduction scheme for OFDM multi-user downlink scenarios. This scheme, named selected basis (SLB), is a further development of the scheme selected sorting (SLS). Both schemes are based on the idea of generating multiple redundant signal representations and selecting the one exhibiting the lowest PAR and are thus based on the philosophy of the SLM family. The multiple signal representations are generated by applying different instances of the precoder, which has to be applied within the multi-user downlink scenario. In particular, SLS generates multiple instances of the precoder by applying different permutations within the Tomlinson-Harashima precoding scheme. SLB works in combination with LRA precoding and generates different instances of the precoder by employing different additional unimodular (basis change) matrices. It turns out that the best PAR reduction performance can be achieved when using arbitrary unimodular matrices as an offset to the optimum (with respect to the definition of LLL reduced) basis change matrix. However, the error performance is quite poor in this case. The best trade-off between PAR reduction capabilities and error performance can be achieved when restricting the additional unimodular matrices to so-called permutation/phase matrices.
Finally, the PAR reduction performance of SLS and SLB is compared with that of sSLM, the only feasible extension of SLM for the multi-user downlink scenario. For a fair comparison, the parameters of the schemes are chosen such that they exhibit (almost) the same computational complexity. It turns out that sSLM offers better PAR reduction performance than SLS, because statistically independent signal candidates can be generated with sSLM but not with SLS. However, the PAR reduction performance of SLB is almost the same as that of sSLM. Notably, the huge benefit of SLS and SLB is that, in contrast to sSLM, no side information has to be communicated to the receiver. In summary, using SLB in the OFDM multi-user downlink, both very good PAR statistics and full-diversity error performance can be achieved. As the receivers do not require any side information, it is a very attractive strategy for future downlink transmission systems.

Endnotes
a A unimodular matrix Z = [z m,n ] contains only Gaussian integers, i.e., all elements z m,n are from the set {x + jy | x, y ∈ ℤ}, and its determinant has to satisfy |det(Z)| = 1.
b The V-BLAST algorithm calculates the optimum detection order for decision-feedback equalization when transmitting over MIMO channels.
c The LLL algorithm can directly perform the decomposition (8) of the channel matrix H d into the unimodular matrix Z opt,d , the feedforward matrix F d , and the feedback matrix B d [31]. However, no explicit control of the resulting sorting is possible in this case.
d In principle, it is reasonable to select V additional permutation matrices out of the set of K! ones which have only marginal influence on the error ratio. Such a suitable choice is discussed in [13], where only additional permutation matrices are used which do not change the encoding position of the last encoded user (with respect to the optimum sorting order). This strategy makes sense because no power loading of the users is applied in [13]. On the contrary, in this paper, power loading over the users is applied (cf. Figure 1), which makes the selection of suitable additional permutation matrices less straightforward. However, according to the numerical results, choosing arbitrary additional permutation matrices exhibits almost the same performance as the optimum permutation, which makes this strategy a reasonable approach.
e In this paper, the comparison of sSLM with SLS or SLB, respectively, is done in terms of the PAR reduction performance. Comparing also the error performance of the respective schemes requires incorporating a specific strategy to transmit the side information with sSLM. Certainly, there exists a wide range of different schemes to transmit the side information for the original approach of SLM (cf. [27][28][29][32][33][34]), which can easily be transferred to sSLM as well. Some of these schemes are able to transmit the side information very reliably. For the sake of brevity, we do not consider a specific scheme and omit the comparison of the error performance in this paper. Notably, even if a reliable transmission of the side information with sSLM is possible, error propagation will still occur. Moreover, the transmission of the side information leads to additional complexity within transmitter and receiver. This additional complexity is not required with SLS or SLB, which is a further advantage of these schemes.