Bandwidth-Efficient Cooperative Relaying Schemes with Multiantenna Relay

We propose coded cooperative relaying schemes in which all successfully decoded signals from multiple sources are forwarded simultaneously by a multiantenna relay to a common multiantenna destination to increase bandwidth efficiency. These schemes facilitate various retransmission strategies at the relay and single-user and multiuser iterative decoding techniques at the destination, suitable for trade-offs between performance, latency, and complexity. Simulation results show that the proposed schemes significantly outperform direct transmission under the same transmit power and bandwidth efficiency.


INTRODUCTION
Cooperative relaying has attracted a great deal of attention recently due to its capability of improving performance, increasing system capacity, extending coverage, and so forth [1,2]. Different signal processing techniques for retransmission and detection at the relays and the destination have been presented. In [3][4][5][6], the relays receive signals from the sources in one phase and simply amplify or demodulate the source signals before forwarding the processed signals to the destination in another phase. The destination can apply maximum ratio combining to the signals received in both phases to recover the original information. In [7][8][9][10][11], coded cooperative relaying schemes were proposed, in which the relays decode the source signals and re-encode the decoded information in a manner different from that of the sources (e.g., the decoded information is interleaved before being re-encoded [8]) so that the destination can use code combining techniques such as iterative decoding to recover the original information. Coded cooperative relaying schemes not only outperform those based on repetition coding under various channel conditions [1], but also provide a great degree of flexibility to adapt to channel conditions by allowing different code rates and partitions; for example, the relayed signal can include only new parity bits [9] or new parity bits together with a fraction of repeated information bits [10].
The cooperative relaying schemes in [2][3][4][5][6][7][8][9][10][11] only consider a simple scenario with a source, a relay, and a destination, each equipped with a single antenna. To increase the spatial diversity order as well as the cooperation probability between the source and the relay, several multiantenna relays were investigated using the diversity combining schemes in [12]. In general, all schemes in [2][3][4][5][6][7][8][9][10][11][12] dramatically reduce bandwidth efficiency when extended to a scenario with multiple sources, because at least one additional phase is required to relay the signal of each source.
Different from those in [2][3][4][5][6][7][8][9][10][11][12], the coded cooperative relaying scheme in [13,14] illustrates another scenario in which a relay assists the information transmission of two sources. This scheme can be extended to the case of multiple sources. However, it suffers from the same disadvantage of low bandwidth efficiency as those in [2][3][4][5][6][7][8][9][10][11][12]. It is noted that, in order to achieve high bandwidth efficiency, a single-antenna relay can detect multiple source signals and retransmit them in only one time slot as a multiplexed signal using a much higher modulation level than that of the sources, at the expense of increased complexity and transmit power. In [15], a cooperative relaying scheme is proposed in which a multiantenna relay helps multiple single-antenna sources in their information transmission to a common multiantenna destination. By relaying each source signal on one antenna of the relay, this scheme exploits the multiplexing gain of multi-input multi-output (MIMO) systems, thus improving bandwidth efficiency. Theoretical analysis in terms of outage probability shows its superiority to direct transmission. However, the choice of channel codes that can approach the theoretical limit on outage probability is not addressed. In addition, the cooperative relaying scheme under consideration is based on repetition coding and, hence, is not comparable with coded cooperative relaying schemes.
EURASIP Journal on Advances in Signal Processing
In this paper, we propose coded cooperative relaying schemes using a multiantenna relay to achieve high bandwidth efficiency and high cooperation probability between the sources and the relay (due to receive diversity), which is essential to provide spatial diversity at the destination. In addition, instead of demodulate-and-forward and zero-forcing detection as in [15], we exploit the proposed colocated multiantenna relaying and code combining structures to develop different efficient retransmission schemes at the relay and single-user and multiuser iterative decoding techniques at the destination in order to improve the system performance. As an example of channel coding, we consider a convolutional code and investigate the performance of the proposed schemes in terms of bit error rate (BER) instead of the outage probability as in [15].
The rest of this paper is organized as follows. In Section 2, we present the system model under consideration. The proposed signal processing techniques at the relay and destination are discussed in Sections 3 and 4, respectively. Simulation results are presented in Section 5 for performance evaluation and comparison of the proposed schemes. Finally, the paper is concluded in Section 6.

SYSTEM MODEL
Figure 1 shows the cooperative relaying system under consideration with T single-antenna sources, a T-antenna destination, and a T-antenna relay to assist the communication between the sources and the destination. For simplicity, we consider the number of sources equal to the number of antennas at the destination and at the relay. However, it is straightforward to extend to the general case with F single-antenna sources, a destination with U antennas, and a relay with K antennas, where U, K ≥ F as in [15]. In addition, as in [15], we do not consider cooperation between sources, although this cooperation can improve the system performance.
All terminals operate in half-duplex mode as follows. Each source S_t, t ∈ {1, . . . , T}, takes its turn to transmit its signal in its assigned time slot, as shown in Table 1. Throughout this paper, equal-length time slots are assumed. Its information bit segment I_t is first encoded and then mapped into modulation signaling elements s_t (e.g., M-PSK, M-QAM) to be transmitted, that is,
$$\mathbf{s}_t = \phi\{\Phi\{I_t\}\}, \tag{1}$$
where ϕ{·} and Φ{·} represent the modulation and encoding functions, respectively; s_t[l] is a complex symbol transmitted from the source S_t at the time instant l (l = 1, . . . , L_t); and L_t is the number of modulated symbols in the time slot t. If all sources use the same modulation and channel coding schemes, L_t = L for any t ∈ {1, . . . , T}.
During the first T time slots, the relay decodes the signals received from the T sources. Subsequently, the relay processes only the successfully decoded signals (e.g., as indicated by the cyclic redundancy check (CRC)) and forwards the processed signals to the destination in the time slot (T + 1), as shown in Table 1. The destination uses both the signals directly received from the sources and the signal from the relay to perform the signal detection.
With only one additional time slot (T + 1) required to relay all decoded signals of the T sources, the bandwidth efficiency of the proposed schemes is reduced by a factor of T/(T + 1), as compared to 1/2 for the conventional schemes in [2][3][4][5][6][7][8][9][10][11][12][13][14]. For large T, T/(T + 1) approaches 1, that is, the bandwidth loss due to relaying is negligible. In a synchronized system with a T-antenna relay and destination, simultaneous transmission from the T single-antenna sources in one time slot is possible and would further improve bandwidth efficiency, at the expense of receiver complexity and possible performance degradation at the relay and destination; this is beyond the scope of this paper.
We assume all channels experience independent block frequency-flat fading, that is, the frequency-flat fading coefficient is constant during a time slot and changes independently from one time slot to another. Furthermore, channel state information is available only at the receivers, not at the transmitters.

PROPOSED COOPERATIVE RELAYING SCHEMES
In this section, we discuss the signal processing at the relay for detection and retransmission. Figure 2 shows the simplified receiver structure at the relay. The baseband-equivalent, discrete-time received signal vector r_t[l] at the relay can be expressed as
$$\mathbf{r}_t[l] = \mathbf{a}_t\, s_t[l] + \mathbf{n}_t[l], \tag{2}$$
where a_t is the T × 1 channel vector from the transmit antenna of the source S_t to the T receive antennas of the relay (each element of a_t is modeled as a circularly symmetric zero-mean complex Gaussian random variable), and n_t[l] is the T × 1 noise vector with covariance matrix N_0 I_{T×T} (i.e., the elements of n_t[l] are modeled as circularly symmetric zero-mean complex Gaussian random variables with variance N_0/2 per dimension). Here, I_{T×T} is the identity matrix of size T × T.

Signal detection at relay
To recover I_t, maximum ratio combining is first applied to the elements of r_t[l] as
$$\tilde{r}_t[l] = \frac{\mathbf{a}_t^H \mathbf{r}_t[l]}{\sqrt{\alpha_t}} = \sqrt{\alpha_t}\, s_t[l] + \tilde{n}_t[l], \tag{3}$$
where α_t = a_t^H a_t, ñ_t[l] is the noise variable with variance N_0, and (·)^H is the complex conjugate transpose.
The resulting signals r̃_t[l] are then soft demapped to produce the log-likelihood ratios (LLRs) for all the coded bits, that is, the bit metrics, as follows:
$$\Lambda\big(b_{t,l,p} \mid \tilde{r}_t[l]\big) = \ln \frac{\sum_{s \in \chi_{1,p}} \exp\big(-|\tilde{r}_t[l] - \sqrt{\alpha_t}\, s|^2 / N_0\big)}{\sum_{s \in \chi_{0,p}} \exp\big(-|\tilde{r}_t[l] - \sqrt{\alpha_t}\, s|^2 / N_0\big)}, \tag{4}$$
where p ∈ {1, 2, . . . , m = log_2 M}, b_{t,l,p} is the pth coded bit in the group of m = log_2 M bits carried by s_t[l], and M is the constellation size. The subsets χ_{1,p} and χ_{0,p} contain the signal points in the M-ary constellation whose pth labeling bits are "1" and "0," respectively. Finally, the bit metrics are used to decode I_t (e.g., [16]), and error detection (e.g., using a CRC) is performed.
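The combining and soft-demapping steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the Gray-labelled QPSK table, and the normalization of the combiner output by the square root of a^H a (chosen so that the combined noise keeps variance N_0) are not taken from the paper.

```python
import math


def mrc_combine(a, r):
    """Maximum ratio combining of one received vector r with channel vector a.

    Returns the combined sample and the effective gain sqrt(a^H a).
    Dividing by sqrt(a^H a) is an assumed normalization that keeps the
    combined noise variance equal to N0.
    """
    alpha = sum(abs(h) ** 2 for h in a)
    gain = math.sqrt(alpha)
    z = sum(h.conjugate() * y for h, y in zip(a, r)) / gain
    return z, gain


def log_sum_exp(values):
    """Numerically stable ln(sum(exp(v))) over a list of exponents."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def bit_llrs(z, gain, n0, constellation):
    """LLR = ln P(b=1)/P(b=0) for every labeling bit of one combined sample.

    constellation maps bit-label tuples to complex signal points.
    """
    m = len(next(iter(constellation)))
    llrs = []
    for p in range(m):
        ones = [-abs(z - gain * s) ** 2 / n0
                for bits, s in constellation.items() if bits[p] == 1]
        zeros = [-abs(z - gain * s) ** 2 / n0
                 for bits, s in constellation.items() if bits[p] == 0]
        llrs.append(log_sum_exp(ones) - log_sum_exp(zeros))
    return llrs


# Hypothetical Gray-labelled, unit-energy QPSK used only for illustration.
QPSK = {
    (0, 0): complex(1, 1) / math.sqrt(2),
    (0, 1): complex(1, -1) / math.sqrt(2),
    (1, 0): complex(-1, 1) / math.sqrt(2),
    (1, 1): complex(-1, -1) / math.sqrt(2),
}

a = [1 + 1j, 0.5 - 0.5j, -1j]      # example channel vector, T = 3
s = QPSK[(1, 0)]                   # transmitted symbol
r = [h * s for h in a]             # noiseless reception for clarity
z, gain = mrc_combine(a, r)
llrs = bit_llrs(z, gain, n0=1.0, constellation=QPSK)
```

In this noiseless example the first LLR is positive and the second is negative, matching the transmitted label (1, 0).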

Signal retransmission at relay
If error detection fails, the corresponding I_t is discarded. Each successfully recovered I_t is first interleaved by a random interleaver Π and then processed for retransmission.
For low implementation complexity, the relay applies the same channel coding and modulation schemes used by the sources.
We propose the following two retransmission techniques.

Parallel transmission (PT)
For parallel transmission (PT), the N (≤ T) successfully recovered information segments I_t are processed separately and retransmitted on different antennas, as shown in Figure 3. The relay arbitrarily chooses N among its T transmit antennas (e.g., the first N out of T antennas, as in the simulations). With channel knowledge at the relay transmitter, an optimum choice of N antennas for retransmission can be derived. For notational simplicity, we assume T = N in the sequel. The equations for the case of T ≥ N are easily obtained by changing the sizes of the vectors and matrices accordingly.
The signal x_t transmitted on the antenna t can be represented as
$$\mathbf{x}_t = \phi\{\Phi\{\Pi\{I_t\}\}\}, \tag{5}$$
where Π{·} represents the interleaving function, and x_t[l] is the modulated symbol transmitted on the antenna t at the time instant l.

Multiplexing transmission (MT)
Figure 4 shows the block diagram of the proposed multiplexing transmission (MT) technique. The interleaved information segments Π{I_t} are first bit-level multiplexed as in [17], that is, the information bits of Π{I_1}, . . . , Π{I_T} are alternately selected. This introduces correlation among the I_t, which facilitates the high-performance multiuser joint iterative decoding (MUJID) performed at the destination. Although multiplexing increases the length (in bits) of the segment to be encoded, it also produces longer parity segments and hence a stronger code. Then, the multiplexed segment J = Ω{Π{I_1}, . . . , Π{I_T}} is encoded, where Ω{·, . . . , ·} represents the multiplexing function. Finally, the resulting coded bits Φ{J} are split into T parallel streams; each is modulated and transmitted on one antenna.
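The bit-level multiplexing step can be illustrated with a short sketch. It assumes equal-length segments, a single shared random permutation for the interleaver, and round-robin selection of one bit per segment; the function names are hypothetical and the multiplexing rule of [17] may differ in detail.

```python
import random


def interleave(bits, permutation):
    """Reorder bits according to a fixed permutation (the interleaver)."""
    return [bits[i] for i in permutation]


def multiplex(segments):
    """Alternately select one bit from each segment (round-robin)."""
    return [bit for group in zip(*segments) for bit in group]


def demultiplex(stream, count):
    """Undo multiplex(): split the stream back into `count` segments."""
    return [stream[i::count] for i in range(count)]


rng = random.Random(0)                      # fixed seed for repeatability
T, length = 3, 8                            # illustrative sizes only
segments = [[rng.randint(0, 1) for _ in range(length)] for _ in range(T)]
permutation = list(range(length))
rng.shuffle(permutation)

interleaved = [interleave(seg, permutation) for seg in segments]
J = multiplex(interleaved)                  # segment fed to the encoder
```

The multiplexed segment is T times longer than each source segment, which is what lengthens the codeword produced by the relay encoder.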

SIGNAL PROCESSING AT DESTINATION
The destination processes the signals from the T sources received in the first T time slots to produce their corresponding bit metrics in a similar manner to the relay. Hence, we use the same notation as in Section 3.1 to avoid duplication. In the last, (T + 1)th, time slot, the destination receives the signal from the relay. The baseband-equivalent, discrete-time received signal vector y[l] at the time instant l in the time slot (T + 1) at the destination can be modeled as
$$\mathbf{y}[l] = \mathbf{H}\,\mathbf{x}[l] + \mathbf{n}[l], \tag{6}$$
where y[l] is the T × 1 received signal vector on the T receive antennas of the destination, H is the T × T channel matrix from the T transmit antennas of the relay to the T receive antennas of the destination (the elements of H are modeled as circularly symmetric zero-mean complex Gaussian random variables), x[l] = (x_1[l], . . . , x_T[l])^T is the T × 1 symbol vector transmitted from the relay at the time instant l, and n[l] is the T × 1 noise vector with covariance matrix N_0 I_{T×T}. Here, (·)^T is the transpose operator.
In the following subsections, we will discuss the proposed bit metric calculations and iterative decoding structures.

Bit metric calculations in time slot (T + 1)
The destination also needs to calculate the bit metrics for all coded bits (retransmitted by the relay) in order to perform the iterative decoding for all T source signals. We consider three calculation techniques based on maximum likelihood (ML), zero-forcing (ZF), and QR decomposition.

ML-based bit metric calculation (MLC)
The LLRs for all coded bits transmitted from the relay are computed as
$$\Lambda\big(b_{r,t,l,p} \mid \mathbf{y}[l]\big) = \ln \frac{\sum_{\mathbf{x} \in \chi_{1,t,p}} \exp\big(-\|\mathbf{y}[l] - \mathbf{H}\mathbf{x}\|^2 / N_0\big)}{\sum_{\mathbf{x} \in \chi_{0,t,p}} \exp\big(-\|\mathbf{y}[l] - \mathbf{H}\mathbf{x}\|^2 / N_0\big)}, \tag{7}$$
where p ∈ {1, 2, . . . , m}, and b_{r,t,l,p} is the pth coded bit in the group of m bits carried by x_t[l]. The subsets χ_{1,t,p} and χ_{0,t,p} contain the symbol vectors x = (x_1, x_2, . . . , x_T)^T whose element x_t is a signal point in the M-ary constellation with pth labeling bit equal to "1" and "0," respectively. The ML-based bit metric calculation is optimum in the sense of minimum bit error probability. However, to calculate Λ(b_{r,t,l,p} | y[l]) in (7), we need to sum over 2^{mT−1} possible symbol vectors in the set χ_{1,t,p}. So, the complexity of the ML-based bit metric calculation can be prohibitive for large M and T. This problem can be remedied by applying the list slab-sphere detection method in [18], but the search range of this method depends on the received signals, so the complexity remains high. In this paper, we propose two low-complexity methods: ZF-based bit metric calculation (ZFC) and QR-based bit metric calculation (QRC).
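To give a feel for how quickly the ML search grows, the following small calculation compares the 2^(mT−1) symbol vectors per bit quoted above with the 2^(m−1) scalar symbols per bit that a per-stream calculation needs. The function names are illustrative only.

```python
def ml_terms_per_bit(m, T):
    """Number of symbol vectors in one ML subset: 2^(mT - 1)."""
    return 2 ** (m * T - 1)


def per_stream_terms_per_bit(m):
    """Number of scalar symbols in one per-stream subset: 2^(m - 1)."""
    return 2 ** (m - 1)


T = 3
table = {
    name: (ml_terms_per_bit(m, T), per_stream_terms_per_bit(m))
    for name, m in (("QPSK", 2), ("16-QAM", 4), ("64-QAM", 6))
}
for name, (joint, scalar) in table.items():
    print(f"{name}: joint ML {joint} terms, per-stream {scalar} terms")
```

For 16-QAM and T = 3, the joint search covers 2048 vectors per bit, against 8 symbols per bit for a per-stream calculation.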

ZF-based bit metric calculation (ZFC)
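The received vector y[l] is first multiplied by W = (H^H H)^{−1} H^H to suppress the interference between the symbols transmitted on different transmit antennas:
$$\mathbf{z}[l] = \mathbf{W}\,\mathbf{y}[l] = \mathbf{x}[l] + \mathbf{W}\,\mathbf{n}[l]. \tag{8}$$
Explicitly, (8) can be rewritten as
$$z_t[l] = x_t[l] + \mathbf{W}(t,:)\,\mathbf{n}[l], \quad t = 1, \ldots, T, \tag{9}$$
where W(t, :) denotes the tth row of W. Therefore, similar to (4), the LLRs for all coded bits from the relay are computed as
$$\Lambda\big(b_{r,t,l,p} \mid z_t[l]\big) = \ln \frac{\sum_{x \in \chi_{1,p}} \exp\big(-|z_t[l] - x|^2 / (N_0\, \mathbf{W}(t,:)\mathbf{W}(t,:)^H)\big)}{\sum_{x \in \chi_{0,p}} \exp\big(-|z_t[l] - x|^2 / (N_0\, \mathbf{W}(t,:)\mathbf{W}(t,:)^H)\big)}. \tag{10}$$
Although the ZF-based bit metric calculation is much simpler than the ML-based one (i.e., to calculate Λ(b_{r,t,l,p} | z_t[l]) in (10), we only need to sum over 2^{m−1} possible symbols in the set χ_{1,p}), multiplying y[l] by W enhances the noise by a factor of W(t, :)W(t, :)^H, which leads to performance degradation.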
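The noise enhancement factor W(t, :)W(t, :)^H can be made concrete with a tiny example. The sketch below is restricted to a 2 × 2 channel so that it needs no external library; for a square invertible H, the ZF matrix (H^H H)^{-1} H^H reduces to the plain inverse. The matrices and function names are illustrative assumptions.

```python
def inverse_2x2(h):
    """Inverse of a 2x2 (possibly complex) matrix given as nested lists."""
    (a, b), (c, d) = h
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def noise_enhancement(w):
    """Per-stream factor W(t,:) W(t,:)^H, i.e. the squared norm of each row."""
    return [sum(abs(x) ** 2 for x in row) for row in w]


def matmul_2x2(p, q):
    """Product of two 2x2 matrices, used only to check W H = I."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


well_conditioned = [[1, 0], [0, 1]]          # orthogonal columns
ill_conditioned = [[1, 0.9], [0.9, 1]]       # nearly parallel columns

w_good = inverse_2x2(well_conditioned)
w_bad = inverse_2x2(ill_conditioned)
factors_good = noise_enhancement(w_good)
factors_bad = noise_enhancement(w_bad)
```

With orthogonal columns the factor is 1 (no penalty), while the nearly parallel columns of the second matrix raise the noise variance by a factor of about 50 on each stream.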

QR-based bit-metric calculation (QRC)
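Using QR decomposition [19], that is, H = QR, where Q is a unitary matrix and R = [r_{i,j}] is an upper triangular matrix (i.e., r_{i,j} = 0 if i > j), (6) can be rewritten as
$$\mathbf{k}[l] = \mathbf{Q}^H \mathbf{y}[l] = \mathbf{R}\,\mathbf{x}[l] + \tilde{\mathbf{n}}[l], \tag{11}$$
where ñ[l] = Q^H n[l] has the same statistics as n[l] because Q is unitary. Explicitly,
$$k_T[l] = r_{T,T}\, x_T[l] + \tilde{n}_T[l], \tag{12}$$
$$k_t[l] = r_{t,t}\, x_t[l] + \sum_{j=1}^{T-t} r_{t,t+j}\, x_{t+j}[l] + \tilde{n}_t[l], \quad t = T-1, \ldots, 1. \tag{13}$$
The above expressions, (12)-(13), indicate that the signal element x_T[l] does not contain any interference from the other elements, and the element x_t[l] contains interference from only the elements x_{t+j}[l], where j = 1, . . . , (T − t) and t = T − 1, . . . , 1. Consequently, we propose a bit metric calculation combined with successive soft interference cancellation (e.g., [20,21]) as follows.
K. Ho-Van and T. Le-Ngoc
Based on (12), and similar to (4), the LLRs for the coded bits transmitted on the antenna T of the relay can be first computed as
$$\Lambda\big(b_{r,T,l,p} \mid k_T[l]\big) = \ln \frac{\sum_{x \in \chi_{1,p}} \exp\big(-|k_T[l] - r_{T,T}\, x|^2 / N_0\big)}{\sum_{x \in \chi_{0,p}} \exp\big(-|k_T[l] - r_{T,T}\, x|^2 / N_0\big)}. \tag{14}$$
Then, the mean m_T[l] and the variance λ_T[l] of the symbol x_T[l] are obtained as
$$m_T[l] = \sum_{x \in \chi} x \Pr\big(x_T[l] = x\big), \qquad \lambda_T[l] = \sum_{x \in \chi} |x|^2 \Pr\big(x_T[l] = x\big) - \big|m_T[l]\big|^2, \tag{15}$$
where χ denotes the M-ary constellation and
$$\Pr\big(x_T[l] = x\big) = \prod_{p=1}^{m} \Pr\big(b_{r,T,l,p} = b_p(x)\big), \tag{16}$$
with b_p(x) being the pth labeling bit of x. In (16), we assume the statistical independence of each bit b_{r,T,l,p} carried by the symbol x_T[l], and the probability of b_{r,T,l,p} is
$$\Pr\big(b_{r,T,l,p}\big) = \frac{1}{1 + \exp\big((-1)^{b_{r,T,l,p}}\, \Lambda(b_{r,T,l,p} \mid k_T[l])\big)}. \tag{17}$$
Finally, we calculate the LLRs for the coded bits on the remaining transmit antennas in the order t = T − 1, . . . , 1 in two steps. In the first step, all interference from the symbols x_j[l] on the other transmit antennas j, j = t + 1, . . . , T, on the symbol x_t[l] on the considered transmit antenna t (see (13)) is softly cancelled out from k_t[l] as
$$\hat{k}_t[l] = k_t[l] - \sum_{j=t+1}^{T} r_{t,j}\, m_j[l] = r_{t,t}\, x_t[l] + \nu_t[l]. \tag{18}$$
Based on (18) and the Gaussian assumption on the residual interference (as in [20]), ν_t[l] in (18) is a circularly symmetric zero-mean complex Gaussian random variable with variance
$$\sigma_t^2[l] = \sum_{j=t+1}^{T} |r_{t,j}|^2\, \lambda_j[l] + N_0. \tag{19}$$
In (18) and (19), m_j[l] and λ_j[l] are given by (15), with T being substituted by j.
In the second step, we compute the LLRs for the coded bits transmitted on the transmit antenna t of the relay as
$$\Lambda\big(b_{r,t,l,p} \mid \hat{k}_t[l]\big) = \ln \frac{\sum_{x \in \chi_{1,p}} \exp\big(-|\hat{k}_t[l] - r_{t,t}\, x|^2 / \sigma_t^2[l]\big)}{\sum_{x \in \chi_{0,p}} \exp\big(-|\hat{k}_t[l] - r_{t,t}\, x|^2 / \sigma_t^2[l]\big)}. \tag{20}$$
From (14) and (20), to calculate the LLRs for the coded bits we only need to sum over 2^{m−1} possible symbols in the set χ_{1,p}. Therefore, the search range of QRC is the same as that of ZFC. However, QRC avoids the noise enhancement of ZFC (see (18)).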
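The soft interference cancellation described above rests on two small computations: turning the LLRs of an already detected symbol into a soft estimate (mean and variance), and subtracting the weighted means from the next stream. The sketch below illustrates both under stated assumptions: bits are treated as independent, the LLR convention is ln P(b=1)/P(b=0), and the function names and the Gray-labelled QPSK table are hypothetical.

```python
import math


def bit_prob(llr, bit):
    """Pr(b = bit) from LLR = ln P(b=1)/P(b=0), i.e. 1/(1+exp((-1)^b * LLR))."""
    return 1.0 / (1.0 + math.exp(llr if bit == 0 else -llr))


def soft_symbol(llrs, constellation):
    """Mean and variance of a symbol whose labeling bits have the given LLRs."""
    mean = 0j
    second_moment = 0.0
    for bits, point in constellation.items():
        prob = math.prod(bit_prob(l, b) for l, b in zip(llrs, bits))
        mean += prob * point
        second_moment += prob * abs(point) ** 2
    return mean, second_moment - abs(mean) ** 2


def soft_cancel(k_t, r_row, means, variances, n0):
    """Subtract soft estimates of interfering streams from k_t.

    r_row holds the R entries r_{t,j} of the interfering streams j > t, in the
    same order as means and variances. Returns the cleaned sample and the
    variance of residual interference plus noise.
    """
    cleaned = k_t - sum(r * m for r, m in zip(r_row, means))
    noise_var = n0 + sum(abs(r) ** 2 * v for r, v in zip(r_row, variances))
    return cleaned, noise_var


# Hypothetical Gray-labelled, unit-energy QPSK used only for illustration.
QPSK = {
    (0, 0): complex(1, 1) / math.sqrt(2),
    (0, 1): complex(1, -1) / math.sqrt(2),
    (1, 0): complex(-1, 1) / math.sqrt(2),
    (1, 1): complex(-1, -1) / math.sqrt(2),
}

uncertain = soft_symbol([0.0, 0.0], QPSK)      # no information about the bits
confident = soft_symbol([20.0, -20.0], QPSK)   # strongly indicates label (1, 0)

# Two-stream example: stream 2 interferes with stream 1 through r_12.
r_11, r_12, n0 = 1.5, 0.4 - 0.3j, 0.1
x_1, x_2 = QPSK[(0, 0)], QPSK[(1, 0)]
k_1 = r_11 * x_1 + r_12 * x_2                  # noiseless for clarity
cleaned, noise_var = soft_cancel(k_1, [r_12], [confident[0]], [confident[1]], n0)
```

When the LLRs carry no information, the soft estimate is zero with unit variance, so nothing is cancelled and the full interference power is added to the noise variance; with reliable LLRs the interference is removed almost entirely.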

Figure 6: Single-user iterative decoding for source S_t.

At the end of each iteration j, the SISO decoder produces a sequence of T extrinsic segments, L_{t,e}^{(j)}, which are the soft outputs corresponding to the T information segments I_t of the T sources. They are stored to be used as inputs of the SISO decoder in the next iteration (j + 1). After a sufficient number of iterations, the T extrinsic segments L_{t,e}^{(j)} can be used to make a decision on the transmitted information bit segments.

Single-user iterative decoding (SUID)
As parallel transmission does not introduce any correlation among the T source signals, SUID can be used to recover the information bit segment of the source S_t, as shown in Figure 6. This iterative decoding is akin to standard Turbo decoding and, hence, is not described further for brevity.

SIMULATION RESULTS
Simulation is used to evaluate and compare the performance of the proposed schemes and others in an independent frequency-flat block Rayleigh fading environment under various conditions. Table 2 summarizes the six proposed schemes considered in the simulations, which result from combining the two relay retransmission techniques (PT and MT) with the three bit metric calculations (MLC, ZFC, and QRC). As references, we consider direct transmission (i.e., without the relay) using the 4-state, rate-1/2 recursive systematic convolutional code (RSCC) with generator polynomial [1, 5/7] in octal form, and the cooperative relaying scheme in [9], where T single-antenna relays help T single-antenna sources in a pairwise manner. All considered schemes use the same encoder.
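As a minimal sketch of the 4-state, rate-1/2 RSCC named above, the encoder below assumes the common reading of the [1, 5/7] notation: a systematic output, feedback polynomial 7 (1 + D + D^2), and feedforward polynomial 5 (1 + D^2). The function name and the absence of trellis termination are illustrative choices, not details taken from the paper.

```python
def rsc_encode(bits):
    """Rate-1/2 recursive systematic convolutional encoder, generator [1, 5/7].

    Returns a list of (systematic, parity) pairs. The two state bits hold the
    previous two values of the feedback sequence; the trellis is not terminated.
    """
    s1 = s2 = 0                      # s1: one step back, s2: two steps back
    coded = []
    for u in bits:
        a = u ^ s1 ^ s2              # feedback 7 (octal) = 1 + D + D^2
        parity = a ^ s2              # feedforward 5 (octal) = 1 + D^2
        coded.append((u, parity))
        s1, s2 = a, s1
    return coded


impulse = [1, 0, 0, 0, 0, 0, 0]
parity_response = [p for _, p in rsc_encode(impulse)]
systematic = [u for u, _ in rsc_encode(impulse)]
```

Because the encoder is recursive, a single input 1 produces a parity sequence that does not die out, which is the property iterative (Turbo-like) decoders rely on.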

Simulation setup
The difference in system model between our proposed schemes and the scheme in [9] lies in how the T relay antennas are deployed: T colocated antennas in our system model versus T distributed antennas in [9]. Using T colocated antennas as in our system model benefits from the high cooperation probability between the sources and the relay, which is essential to provide spatial diversity at the destination, and from high bandwidth efficiency (reduced by a factor of T/(T + 1), compared to 1/2 for [9]). On the other hand, the proposed schemes suffer from symbol interference in the time slot (T + 1), while the scheme in [9] does not. However, the low bandwidth efficiency of the scheme in [9] requires an increase in modulation level, which degrades the performance; this degradation cannot be compensated by the interference-free advantage if the cooperation probability between the source and the relay is low (i.e., the interuser channel is bad). These aspects are demonstrated by the following simulation results.
For the purpose of illustration, we investigate the case of T = 3. For a fair comparison in terms of bandwidth efficiency, the direct transmission, the proposed schemes, and that in [9] use 8-PSK, 16-QAM, and 64-QAM, respectively. We also assume equal transmitted power for all terminals and for the relay antennas (i.e., the total relay transmitted power is equally shared by its antennas, E{|x_t[l]|^2} = E{|s_t[l]|^2}/N).
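The modulation choices above can be checked with a short calculation. The sketch multiplies the bits per symbol by the code rate (1/2 for the RSCC used by all schemes) and by the slot factor (1 for direct transmission, T/(T + 1) for the proposed schemes, 1/2 for [9]); the function name is illustrative.

```python
from fractions import Fraction


def info_bits_per_channel_use(bits_per_symbol, code_rate, slot_factor):
    """Information bits delivered per source symbol period, relay slots included."""
    return bits_per_symbol * code_rate * slot_factor


T = 3
rate = Fraction(1, 2)

direct = info_bits_per_channel_use(3, rate, Fraction(1))           # 8-PSK
proposed = info_bits_per_channel_use(4, rate, Fraction(T, T + 1))  # 16-QAM
pairwise = info_bits_per_channel_use(6, rate, Fraction(1, 2))      # 64-QAM, [9]

print(direct, proposed, pairwise)
```

All three schemes come out at 3/2 information bits per channel use, which is why 16-QAM suffices for the proposed schemes while the scheme in [9] needs 64-QAM.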
We assume independent and identically distributed (iid) frequency-flat fading over any source-relay, source-destination, or relay-destination channel. For the scheme in [9], we assume that the relay t corresponds to the antenna t of the relay in our model. We denote the average signal-to-noise ratio of the channel between the source and the receive antenna of the relay as SNR_in, between the source and the receive antenna of the destination as SNR, and between the transmit antenna of the relay and the receive antenna of the destination as SNR_rd.
Each information bit segment is 180 bits long, and the CRC-16-CCITT code is used to check whether the recovered information segment of a source is error free. In addition, we examine J = 5 iterations.
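The error-detection step can be sketched with the CRC-CCITT routine from the Python standard library. Several details are assumptions made for illustration: the initial register value of 0 (the paper does not state which CRC-16-CCITT variant is used), the byte-oriented payload (180 bits do not fill a whole number of bytes, so 23 bytes are used here), and the function names.

```python
import binascii


def append_crc(payload: bytes) -> bytes:
    """Append a 16-bit CRC (polynomial 0x1021, initial value 0, big-endian)."""
    crc = binascii.crc_hqx(payload, 0)
    return payload + crc.to_bytes(2, "big")


def crc_passes(frame: bytes) -> bool:
    """A frame with its CRC appended leaves a zero remainder when error free."""
    return binascii.crc_hqx(frame, 0) == 0


payload = bytes(range(23))                 # stand-in for one information segment
frame = append_crc(payload)
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip a single bit

forward = crc_passes(frame)                # relay would forward this segment
discard = not crc_passes(corrupted)        # relay would drop this segment
```

Only segments that pass this check are interleaved and retransmitted by the relay; the others are discarded.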
Due to the above iid fading assumption, all sources in the schemes PT ZF, PT ML, MT ZF, and MT ML have identical performance. However, PT QR and MT QR offer different performance for different sources due to the nature of the soft interference cancellation. For this reason, the performance curves for PT QR and MT QR in the following results represent the BER averaged over all sources (i.e., the sum of the BERs of all sources divided by the number of sources).

Simulation results
Figure 7 shows the performance curves of the investigated schemes with SNR_in = SNR + 10 dB and SNR_rd = SNR + 5 dB. We observe that all the proposed schemes significantly outperform the others. Among the proposed schemes, those with MUJID (i.e., MT ML/MT QR/MT ZF) are considerably better than those with SUID (i.e., PT ML/PT QR/PT ZF) due to the longer codeword generated by the multiplexing operation. However, the longer codeword also increases the decoding latency of MUJID. Therefore, a performance-latency trade-off can be made for different requirements. In addition, among those with MUJID (or SUID), MT ML, MT QR, and MT ZF (or PT ML, PT QR, and PT ZF) rank in descending order of performance, while their complexities are in the reverse order. This is consistent with the previous discussions. Consequently, another trade-off between performance and complexity is also an option for different requirements. Moreover, the scheme in [9] performs even worse than the direct transmission. This is because the former (due to the nature of the two-time-slot cooperative relaying) must use a higher modulation level than the latter for the same bandwidth efficiency, while the interuser channel is of low quality, so that cooperation between the source and the relay takes place less frequently. Therefore, the scheme in [9] operates almost entirely in the direct transmission mode, and direct transmission with 64-QAM as in [9] is worse than that with 8-PSK.
Figure 8 shows the performance curves of the investigated schemes with better-quality interuser channels, SNR_in = SNR + 20 dB. Since the source-destination channel qualities are unchanged, the direct transmission has the same performance as previously shown in Figure 7, while the performance of the scheme in [9] improves drastically with the interuser channel quality. This is because, with the improved interuser channel, the cooperation probability between the source and the relay increases, thus enhancing the spatial diversity at the destination. However, it is still worse than any proposed scheme.
The simulation results in Figures 7 and 8 are combined in Figure 9 to show the impact of the interuser channel on the BER performance. It is seen that the proposed schemes are relatively insensitive to changes in the individual interuser channel, while the scheme in [9] is greatly affected. This is because multiple colocated antennas at the relay increase the spatial diversity of the received signals, providing highly reliable transmission overall over the source-relay channel. As a result, improving an individual source-relay SNR does not contribute significantly to the performance of signal detection at the relay. In contrast, the single-input, single-output source-relay channel in the scheme in [9] makes the transmission reliability over this channel heavily dependent on its channel quality (or SNR).
Figure 10 illustrates the performance of the various schemes with SNR_rd = SNR + 15 dB and SNR_in = SNR + 10 dB. The performance of the direct transmission is the same as shown in Figure 7 due to the unchanged source-destination channel qualities. With the improved relay-destination channel, the relay forwards the processed information of the sources more reliably, thus enhancing the spatial diversity at the destination. For the scheme in [9], the performance is not improved much, since cooperation between the relays and the sources is rare (due to the unchanged SNR_in = SNR + 10 dB, as for Figure 7), and as a consequence the better relay-destination channel does not contribute much to its performance improvement.
For easy comparison, we combine the results in Figures 7 and 10 into Figure 11. Figure 11 indicates that the proposed schemes benefit far more from improved relay-destination channel quality than the others do. Figure 11 also shows that MUJID is significantly better than SUID, but their performance difference decreases as SNR_rd increases.
For example, at the target BER of 10^−3, the improvement offered by MT ML as compared to PT ML is around 2 dB for SNR_rd = SNR + 5 dB and is reduced to only 0.75 dB for SNR_rd = SNR + 15 dB.
To see the effect of both the source-relay channels and the relay-destination channels on the performance of the investigated schemes, we consider the case where the source-relay channels are improved (e.g., SNR_in = SNR + 20 dB), while the relay-destination channels are the same as those in Figure 10, that is, SNR_rd = SNR + 15 dB. The simulation results are illustrated in Figure 12. Since the source-destination channel qualities are unchanged, the direct transmission has the same performance as shown in Figure 7, while the performance of the proposed schemes and that in [9] is substantially improved. In addition, the performance gap between the proposed schemes and that in [9] increases dramatically with the improvement of the source-relay channels and the relay-destination channels (compare Figures 7 and 12).
Figure 13 shows the BER performance of the six proposed schemes for different numbers of iterations, where SNR_in = SNR + 10 dB and SNR_rd = SNR + 5 dB. We see that all the proposed schemes converge after 3 iterations.

CONCLUSIONS
We proposed coded cooperative relaying schemes using a multiantenna relay to assist the information transmission of multiple sources. These schemes achieve high bandwidth efficiency as well as high performance due to different transmission techniques at the relay and diversified iterative decoding at the destination. In addition, different from the conventional cooperative relaying schemes (e.g., [9]), whose performance heavily depends on the individual source-relay channel quality, the proposed schemes are almost insensitive to the individual source-relay channel due to the diversity provided by multiple receive antennas. Therefore, the relay can help the sources improve their performance over a large range of SNR.
In the proposed schemes, we do not consider cooperation between sources. This cooperation is expected to further improve performance, but it also makes the cooperative schemes more complicated. It could be an interesting topic for further research.
For a fixed relay as considered in this paper, the channel between the relay and the destination is less time variant. Consequently, the channel state information can be made available at the relay so that techniques such as precoding and power allocation at the relay can be exploited to enhance the transmission reliability over the relay-destination channel, thus improving the overall system performance.

ACKNOWLEDGMENT
This work was partially supported by the Prompt/NSERC/CRD Grants with InterDigital Canada.