Diversity Analysis of Distributed Space-Time Codes in Relay Networks with Multiple Transmit/Receive Antennas

The idea of space-time coding devised for multiple-antenna systems is applied to the problem of communication over a wireless relay network, a strategy called distributed space-time coding, to achieve the cooperative diversity provided by antennas of the relay nodes. In this paper, we extend the idea of distributed space-time coding to wireless relay networks with multiple-antenna nodes and fading channels. We show that for a wireless relay network with $M$ antennas at the transmit node, $N$ antennas at the receive node, and a total of $R$ antennas at all the relay nodes, provided that the coherence interval is long enough, the high SNR pairwise error probability (PEP) behaves as $(1/P)^{\min\{M,N\}R}$ if $M \neq N$ and $(\log^{1/M} P/P)^{MR}$ if $M = N$, where $P$ is the total power consumed by the network. Therefore, for the case of $M \neq N$, distributed space-time coding achieves the maximal diversity. For the case of $M = N$, the penalty is a factor of $\log^{1/M} P$ which, compared to $P$, becomes negligible when $P$ is very high.

This work follows the strategy of [5], where the idea of space-time coding devised for multiple-antenna systems is applied to the problem of communication over a wireless relay network. (Though having the same name, the distributed space-time coding idea in [5] is different from that in [14]. Similar ideas for networks with one and two relays have appeared in [6,11].) In [5], the authors consider wireless relay networks in which every node has a single antenna and the channels are fading, and use a cooperative strategy called distributed space-time coding by applying a linear dispersion space-time code [25] among the relays. It is proved that without any channel knowledge at the relays, a diversity of $R(1 - \log\log P/\log P)$ can be achieved, where $R$ is the number of relays and $P$ is the total power consumed in the whole network. This result is based on the assumption that the receiver has full knowledge of the fading channels. Therefore, when the total transmit power $P$ is high enough, the wireless relay network achieves the diversity of a multiple-antenna system with $R$ transmit antennas and one receive antenna, asymptotically. That is, antennas of the relays work as antennas of the transmitter although they cannot fully cooperate and do not have full knowledge of the transmit signal. After the appearance of [5], code designs for distributed space-time coding have been proposed in [26][27][28][29][30][31] and the differential use of distributed space-time coding has been introduced in [32][33][34][35]. The references [36,37] analyze the diversity-multiplexing tradeoff of distributed space-time coding. Distributed space-time coding in asynchronous networks is discussed in [38][39][40][41][42][43]. Other related papers can be found in [44][45][46].
This paper has two main contributions. First, we extend the idea of distributed space-time coding to wireless relay networks whose nodes have multiple antennas. Second and more importantly, based on the pairwise error probability (PEP) analysis, we prove lower bounds on the diversity of this scheme. We use the same two-step transmission method as in [5], where in one step the transmitter sends signals to the relays and in the other the relays encode their received signals into a linear dispersion space-time code and transmit to the receiver. For a wireless relay network with $M$ antennas at the transmitter, $N$ antennas at the receiver, and a total of $R$ antennas at all the relay nodes, our work shows that when the coherence interval is long enough, a diversity of $\min\{M,N\}R$ if $M \neq N$ and $MR(1 - (1/M)(\log\log P/\log P))$ if $M = N$ can be achieved, where $P$ is the total power used in the network. With this two-step protocol, it is easy to see that the error probability is determined by the worse of the two steps: the transmission from the transmitter to the relays and the transmission from the relays to the receiver. Therefore, when $M \neq N$, distributed space-time coding is optimal since the diversity of the first stage cannot be larger than $MR$, the diversity of a multiple-antenna system with $M$ transmit antennas and $R$ receive antennas, and the diversity of the second stage cannot be larger than $NR$. When $M = N$, the penalty on the diversity, because the relays cannot fully cooperate and do not have full knowledge of the signal, is $R(\log\log P/\log P)$. When $P$ is very high, it is negligible. Therefore, with distributed space-time coding, wireless relay networks achieve the same diversity as multiple-antenna systems, asymptotically.
The paper is organized as follows. In the following section, the network model and the generalized distributed space-time coding are explained in detail. A training scheme is also proposed. The PEP is first analyzed in Section 3. In Section 4, the diversity for the network with an infinite number of relays is discussed. Then, the diversity for the general case is obtained in Section 5. Section 6 contains the conclusion. Proofs of some of the technical theorems are given in Appendices A-D. In Appendix E, we discuss heterogeneous networks.

Network model and distributed space-time coding
We first introduce some notation. For a complex matrix $A$, $\bar{A}$, $A^t$, and $A^*$ denote the conjugate, the transpose, and the Hermitian of $A$, respectively. $\det A$, $\operatorname{rank} A$, and $\operatorname{tr} A$ indicate the determinant, rank, and trace of $A$, respectively. $\vec{A}$ denotes the vectorization of $A$ formed by stacking the columns of $A$ into a single column vector. $I_n$ denotes the $n \times n$ identity matrix and $0_{m,n}$ is the $m \times n$ matrix with all zero entries. We often omit the subscripts when there is no confusion. $\log$ indicates the natural logarithm. $\lceil a \rceil$ is the minimal integer that is not less than $a$.
Consider a wireless network with $R + 2$ nodes which are placed randomly and independently according to some distribution. As shown in Figure 1, there are one transmit node and one receive node. All the other $R$ nodes work as relays. Transmission takes place in two steps, from time 1 to $T$ and from time $T + 1$ to $2T$. The transmitter has $M$ transmit antennas, the receiver has $N$ receive antennas, and the $i$th relay has $R_i$ antennas. Since the transmit and received signals at different antennas of the same relay can be processed and designed independently, the network can be transformed to a network with $\sum_{i=1}^{R} R_i$ single-antenna relays (a total that, with a slight abuse of notation, is again denoted by $R$) by designing the transmit signal at every antenna of every relay according to the received signal at that antenna only. This is one possible scheme. In general, the signal sent by one antenna of a relay can be designed using received signals at all antennas of the relay. However, as will be seen later, this simpler scheme achieves the optimal diversity asymptotically although a general design may improve the coding gain of the network. Therefore, to highlight the diversity results by simplifying notation and formulas, in the following, we assume that every relay has a single antenna. Denote the channel vector from the $M$ antennas of the transmitter to the $i$th relay as $f_i = [f_{1i} \cdots f_{Mi}]^t$, and the channels from the $i$th relay to the $N$ antennas at the receiver as $g_i = [g_{i1} \cdots g_{iN}]$. We use the block-fading model [2] by assuming a coherence interval $T$. From the two-step protocol that will be discussed in the following, we can see that we only need $f_i$ to stay constant for the first step of the transmission and $g_i$ to stay constant for the second step. It is thus sufficient to choose $T$ as the minimum of the coherence intervals of $f_i$ and $g_i$. Also, perfect symbol-level synchronization is assumed in this network model. For asynchronous networks, see [38][39][40][41][42][43].
The information bits are encoded into $T \times M$ matrices $s$, whose $m$th column is the signal sent by the $m$th transmit antenna. For the power analysis, $s$ is normalized as

$$\mathrm{E}\,\operatorname{tr}\, s^* s = M. \tag{1}$$

To send $s$ to the receiver, the same two-step strategy as in [5] is used, as shown in Figure 1. In step one, the transmitter sends $\sqrt{P_1 T/M}\,s$. The average total power used at the transmitter for the $T$ transmissions is $P_1 T$. The received signal vector and the noise vector at the $i$th relay are denoted as $r_i$ and $v_i$. In step two, the $i$th relay sends $t_i$. The received signal and noise at the receiver are denoted as $X$ and $w$. The noises are assumed to be i.i.d. $\mathcal{CN}(0,1)$. Clearly,

$$r_i = \sqrt{\frac{P_1 T}{M}}\, s f_i + v_i, \qquad X = \sum_{i=1}^{R} t_i g_i + w.$$

We use distributed space-time coding proposed in [5] by designing the transmit signal at relay $i$ as a linear function of its received signal:

$$t_i = \sqrt{\frac{P_2}{P_1 + 1}}\, A_i r_i,$$

where $A_i$ is a predetermined $T \times T$ unitary matrix known to both the $i$th relay and the receiver. It is fixed during training and data transmissions. For various methods on how to design the $A_i$, see [26][27][28][29][30][31]. $P_2$ can be proved to be the average transmit power for one transmission at every relay. After some calculation, the system equation can be written as

$$X = \sqrt{\frac{P_1 P_2 T}{(P_1 + 1)M}}\, S H + W, \tag{4}$$

where

$$S = [A_1 s \ \cdots \ A_R s], \qquad H = \left[(f_1 g_1)^t \ \cdots \ (f_R g_R)^t\right]^t, \qquad W = \sqrt{\frac{P_2}{P_1 + 1}} \sum_{i=1}^{R} A_i v_i g_i + w. \tag{5}$$

The received signal matrix, $X$, is $T \times N$. The covariance matrix of each row of the equivalent noise matrix $W$ can be proved to be $R_W = I_N + (P_2/(P_1 + 1))\,G^* G$, where $G = [g_1^t \cdots g_R^t]^t$ is the $R \times N$ matrix of relay-to-receiver channels. The diversity analysis in this paper is much more difficult than that in [5] because in networks with single-antenna nodes, the covariance matrix of the equivalent noise is a multiple of the identity matrix. Here, for the diversity result, we need to analyze the eigenvalues of $R_W$ or find bounds on them.
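The equivalence between the two-step protocol and the matrix system equation can be checked numerically. The following is a minimal sketch, assuming the signal model $r_i = \sqrt{P_1T/M}\,s f_i + v_i$, $t_i = \sqrt{P_2/(P_1+1)}\,A_i r_i$, and $X = \sum_i t_i g_i + w$. The random unitary $A_i$ are an illustrative choice rather than a designed code, and the helper names (`cn`, `random_unitary`, `two_step`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def cn(*shape):
    """i.i.d. CN(0, 1) samples (zero mean, unit variance)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def random_unitary(T):
    """A T x T unitary matrix taken from the QR factorisation of a Gaussian matrix."""
    q, _ = np.linalg.qr(cn(T, T))
    return q

def two_step(M=2, N=2, R=3, T=6, P1=10.0, P2=10.0 / 3):
    s = cn(T, M)
    s *= np.sqrt(M / np.trace(s.conj().T @ s).real)    # enforce tr(s* s) = M
    f = [cn(M, 1) for _ in range(R)]                    # transmitter -> relay i (M x 1)
    g = [cn(1, N) for _ in range(R)]                    # relay i -> receiver (1 x N)
    A = [random_unitary(T) for _ in range(R)]
    v = [cn(T, 1) for _ in range(R)]                    # relay noise
    w = cn(T, N)                                        # receiver noise
    c = np.sqrt(P2 / (P1 + 1))

    # Direct simulation of the two transmission steps.
    r = [np.sqrt(P1 * T / M) * s @ f[i] + v[i] for i in range(R)]
    t = [c * A[i] @ r[i] for i in range(R)]
    X_direct = sum(t[i] @ g[i] for i in range(R)) + w

    # Equivalent form X = sqrt(P1 P2 T / ((P1 + 1) M)) S H + W.
    S = np.hstack([A[i] @ s for i in range(R)])         # T x MR
    H = np.vstack([f[i] @ g[i] for i in range(R)])      # MR x N
    W = c * sum(A[i] @ v[i] @ g[i] for i in range(R)) + w
    X_system = np.sqrt(P1 * P2 * T / ((P1 + 1) * M)) * S @ H + W
    return X_direct, X_system
```

Calling `two_step()` returns the received matrix computed both ways; the two agree up to floating-point error, which illustrates why the relays jointly act like the antennas of a space-time encoder.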

Assumptions and training
In this paper, we assume that $f_{mi}$ and $g_{in}$ have independent Rayleigh distributions; that is, $f_{mi}$ and $g_{in}$ are independent circularly symmetric complex Gaussian random variables with zero mean. For simplicity, we also assume that $f_{mi}$ and $g_{in}$ have the same variance, which is 1. The heterogeneous case, in which every channel has a different variance, is discussed in Appendix E. The same diversity results can be obtained in heterogeneous networks. We make the practical assumption that the relays have no channel information. However, we do assume that the receiver has enough channel information to do coherent detection. Thus, a training process is needed. For coherent ML decoding at the receiver, the receiver needs to know $H$ and $R_W$, or equivalently, $H$ and $G$. We propose a training process that contains two steps and takes $M_p + 2N_p$ symbol periods (other training methods can also be envisioned, and the one proposed here is one possibility).
Each step mimics the training process of a multiple-antenna system [47] as its system equation has the same structure.
First, we estimate $G$, which takes $M_p$ symbol periods. Let $U_p$ be a predesigned full-rank $M_p \times R$ pilot matrix. The relays transmit simultaneously, with the $i$th relay sending the $i$th column of $U_p$. In the resulting received-signal equation, $Q_p$ is the power used at every relay and $w_p$ is the $M_p \times N$ noise matrix. Since there are $RN$ unknowns (corresponding to the components of $G$) and $\min\{M_p, R\}N$ independent equations, we need $M_p \ge R$. We could estimate $G$ from $U_p$ using ML, MMSE, or other criteria.

Then, we estimate $H$ using distributed space-time coding discussed in Section 2.1. This takes $2N_p$ symbol periods. The transmitter sends a full-rank $N_p \times M$ pilot signal matrix $s_p$ and the relays perform distributed space-time coding. From (4), the received signal can be written as in (9), where $P_{1,p}$ and $P_{2,p}$ are the powers used at the transmitter and every relay and $S_p$ is the carefully designed $N_p \times MR$ pilot space-time codeword. Now, let us discuss the number of training symbols needed in this step. Note that $G$ is known from the first training step. By stacking the columns of $X$ into one single column vector, we can rewrite (9) as a linear system in the vector $f$ of unknown channel coefficients, with coefficient matrix $H_p$. The number of independent equations in (9) equals the rank of $H_p$, which is $\min\{N_p N, N_p R, MR\}$. Since there are $MR$ unknowns (corresponding to the components of $f$), we need $\min\{N_p N, N_p R, MR\} \ge MR$, which is equivalent to

$$N_p \ge \max\{\lceil MR/N \rceil, M\}.$$

While this condition is satisfied, we could estimate $f$ from $X_p$ using ML, MMSE, or other criteria. The overall training process takes at least $R + 2\max\{\lceil MR/N \rceil, M\}$ symbol periods.
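The pilot-length conditions above reduce to simple integer arithmetic. The sketch below, with hypothetical function names, computes the smallest $N_p$ satisfying $\min\{N_pN, N_pR, MR\} \ge MR$ and the resulting minimum training length $M_p + 2N_p$ with $M_p = R$.

```python
def min_pilot_length(M, N, R):
    """Smallest N_p with min(N_p * N, N_p * R, M * R) >= M * R."""
    ceil_mr_over_n = -(-M * R // N)      # integer ceiling of M R / N
    return max(ceil_mr_over_n, M)

def min_training_length(M, N, R):
    """M_p + 2 N_p with M_p = R (first step) and the smallest admissible N_p."""
    return R + 2 * min_pilot_length(M, N, R)

def rank_condition_holds(M, N, R, N_p):
    """The rank condition on the stacked training equations, as stated in the text."""
    return min(N_p * N, N_p * R, M * R) >= M * R
```

For instance, a network with $M = 1$, $R = 2$, $N = 2$ needs $N_p = 1$ and 4 training symbol periods, while $M = 2$, $R = 2$, $N = 1$ needs $N_p = 4$ and 10 symbol periods.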
The optimal designs of U p , Q p , S p (or s p ), and P 1,p , P 2,p are interesting issues.However, they are beyond the scope of this paper.

PAIRWISE ERROR PROBABILITY AND OPTIMAL POWER ALLOCATION
To analyze the PEP, we have to determine the maximum-likelihood (ML) decoding rule. This requires the conditional probability density function (PDF) $P(X \mid s_k)$, where $s_k \in \mathcal{S}$ and $\mathcal{S}$ is the set of all possible transmit signal matrices.

Theorem 1.
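Given that $s_k$ is transmitted, define $S_k = [A_1 s_k \ \cdots \ A_R s_k]$. Then conditioned on $s_k$, the rows of $X$ are independently Gaussian distributed with the same variance $R_W$. The $t$th row of $X$ has mean $\sqrt{P_1 P_2 T/((P_1 + 1)M)}\,[S_k]_t H$, with $[S_k]_t$ being the $t$th row of $S_k$. Consequently,

$$P(X \mid s_k) = \left(\pi^N \det R_W\right)^{-T} \exp\left\{-\operatorname{tr}\left[\left(X - \sqrt{\tfrac{P_1 P_2 T}{(P_1+1)M}}\, S_k H\right) R_W^{-1} \left(X - \sqrt{\tfrac{P_1 P_2 T}{(P_1+1)M}}\, S_k H\right)^*\right]\right\}. \tag{16}$$

Proof. See Appendix A.
In view of Theorem 1, we should emphasize that for a wireless relay network with multiple antennas at the receiver, the columns of $X$ are not independent although the rows of $X$ are. (The covariance matrix of each row, $R_W$, is not diagonal in general.) That is, the received signals at different antennas are not independent, whereas the received signals at different times are. This is the main reason that the PEP analysis in the new model is much more difficult than that of the network in [5], where $X$ had only a single column.
With $P(X \mid s_k)$ in hand, we can obtain the ML decoding and thereby analyze the PEP. The result follows.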

Theorem 2 (ML decoding and the PEP Chernoff bound).
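The ML decoding of the relay network is

$$\arg\min_{s_k} \operatorname{tr}\left[\left(X - \sqrt{\tfrac{P_1 P_2 T}{(P_1+1)M}}\, S_k H\right) R_W^{-1} \left(X - \sqrt{\tfrac{P_1 P_2 T}{(P_1+1)M}}\, S_k H\right)^*\right]. \tag{17}$$

With this decoding, the PEP of mistaking $s_k$ by $s_l$, averaged over the channel realization, has the following upper bound:

$$P(s_k \to s_l) \le \mathop{\mathrm{E}}_{f_{mi},\, g_{in}} \exp\left\{-\frac{P_1 P_2 T}{4M(1 + P_1)} \operatorname{tr}\left[(S_k - S_l)^*(S_k - S_l)\, H R_W^{-1} H^*\right]\right\}. \tag{18}$$

Proof. The proof is omitted since it is the same as the proof of Theorem 1 in [5].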
As both H and R W are known at the receiver, sphere decoding can be used to perform the ML decoding in (17).
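For a small codebook, the ML rule can be illustrated by exhaustive search instead of sphere decoding. The sketch below assumes the decoding metric $\operatorname{tr}[(X - cS_kH)R_W^{-1}(X - cS_kH)^*]$ with $c = \sqrt{P_1P_2T/((P_1+1)M)}$; the function name `ml_decode` and its argument `scale` (standing for $c$) are hypothetical.

```python
import numpy as np

def ml_decode(X, codewords, H, R_W, scale):
    """Return the index k minimising tr[(X - scale S_k H) R_W^{-1} (X - scale S_k H)^*].

    X         : T x N received matrix
    codewords : list of T x MR distributed space-time codewords S_k
    H         : MR x N equivalent channel matrix
    R_W       : N x N covariance of each row of the equivalent noise
    scale     : scalar in front of S_k H in the system equation (assumed known)
    """
    R_W_inv = np.linalg.inv(R_W)
    metrics = []
    for S_k in codewords:
        E = X - scale * S_k @ H                      # residual for candidate k
        metrics.append(np.trace(E @ R_W_inv @ E.conj().T).real)
    return int(np.argmin(metrics))
```

The search cost grows linearly with the codebook size, so this form is only practical for illustration; it is a brute-force stand-in, not the sphere decoder mentioned above.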
The main purpose of this work is to analyze how the PEP decays with the total transmit power. The total power used in the whole network is $P = P_1 + RP_2$. One natural question is how to allocate power between the transmitter and the relays if $P$ is fixed. Notice that when $R \to \infty$, according to the law of large numbers, the off-diagonal entries of $(1/R)G^*G$ go to zero while the diagonal entries approach 1 with probability 1. It is thus reasonable to assume $(1/R)G^*G \approx I_N$ for large $R$. With this approximation, minimizing the PEP is now equivalent to maximizing $P_1 P_2 T/(4M(1 + P_1 + RP_2))$. This is exactly the same power allocation problem as in [5]. Therefore, we can conclude that the optimum solution is to set

$$P_1 = \frac{P}{2}, \qquad P_2 = \frac{P}{2R}. \tag{19}$$

That is, the optimum power allocation is such that the transmitter uses half the total power and the relays share the other half. As discussed in Section 2.1, for the general network where the $i$th relay has $R_i$ antennas, the antennas are treated as $R_i$ different relays. Therefore, in general, the optimum power allocation is such that the transmitter uses half the total power as before, but every relay uses a power that is proportional to its number of antennas, that is, $P_1 = P/2$ and the power used at the $i$th relay is $R_i P/2R$.
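The power-allocation claim can be checked with a short numerical search. The sketch below, with hypothetical function names, maximizes $P_1P_2T/(4M(1+P_1+RP_2))$ over $P_1$ under the constraint $P_1 + RP_2 = P$. Since the denominator equals $4M(1+P)$ under the constraint, the problem reduces to maximizing $P_1(P - P_1)$.

```python
def pep_exponent_coefficient(P1, P, R, M=1, T=1):
    """P1 P2 T / (4 M (1 + P1 + R P2)) with P2 fixed by P1 + R P2 = P."""
    P2 = (P - P1) / R
    return P1 * P2 * T / (4 * M * (1 + P1 + R * P2))

def best_split(P, R, steps=1000):
    """Grid search for the transmitter power P1 in (0, P) maximising the coefficient."""
    grid = [P * k / steps for k in range(1, steps)]
    return max(grid, key=lambda P1: pep_exponent_coefficient(P1, P, R))
```

With `P = 100` and `R = 4`, the search returns `P1 = 50`, that is, half of the total power at the transmitter, and the maximized coefficient equals $P^2T/(16MR(1+P))$, which behaves as $PT/(16MR)$ for large $P$.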

Basic results
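As mentioned earlier, to obtain the diversity, we have to compute the expectations over $f_{mi}$ and $g_{in}$ in (18). We will do this rigorously in Section 5. However, since the calculation is detailed and gives little insight, in this section, we give a simple asymptotic derivation for the case where the number of relay nodes approaches infinity, that is, $R \to \infty$. As discussed in the previous section, when $R$ is large, we can make the approximation $R_W \approx (1 + P_2 R/(P_1 + 1))I_N$. Denote the $n$th column of $H$ as $h_n$. From (5), $h_n = G_n f$, where $f = [f_1^t \cdots f_R^t]^t$ and we have defined $G_n = \operatorname{diag}\{g_{1n} I_M, \ldots, g_{Rn} I_M\}$. Therefore, from (18) and using the optimal power allocation in (19), for $P \gg 1$,

$$P(s_k \to s_l) \lesssim \mathop{\mathrm{E}}_{f_{mi},\, g_{in}} \exp\left\{-\frac{PT}{16MR}\operatorname{tr}\left[(S_k - S_l)^*(S_k - S_l) H H^*\right]\right\} = \mathop{\mathrm{E}}_{f_{mi},\, g_{in}} \exp\left\{-\frac{PT}{16MR}\, f^*\left[\sum_{n=1}^{N} G_n^* (S_k - S_l)^*(S_k - S_l) G_n\right] f\right\}. \tag{20}$$

Since $f$ is white Gaussian with mean zero and variance $I_{RM}$,

$$P(s_k \to s_l) \lesssim \mathop{\mathrm{E}}_{g_{in}} \det{}^{-1}\left[I_{RM} + \frac{PT}{16MR}\sum_{n=1}^{N} G_n^* (S_k - S_l)^*(S_k - S_l) G_n\right]. \tag{21}$$

Similar to the multiple-antenna case [4,48] and the case of wireless relay networks with single-antenna nodes [5], to achieve full diversity, $S_k - S_l$ must be full rank. Since the distributed space-time codes $S_k$ and $S_l$ are $T \times MR$, in the following, we will assume $T \ge MR$ and that the code is fully diverse. Denote the minimum singular value of $(S_k - S_l)^*(S_k - S_l)$ by $\sigma^2_{\min}$. From the full diversity of the code, $\sigma^2_{\min} > 0$. Therefore, the right side of (21) can be further upper bounded as

$$\mathop{\mathrm{E}}_{g_{in}} \det{}^{-1}\left[I_{RM} + \frac{PT\sigma^2_{\min}}{16MR}\sum_{n=1}^{N} G_n^* G_n\right] = \mathop{\mathrm{E}}_{g_{in}} \prod_{i=1}^{R}\left(1 + \frac{PT\sigma^2_{\min}}{16MR}\, g_i\right)^{-M},$$

where $g_i = \sum_{n=1}^{N} |g_{in}|^2$. Since $g_{in}$ are i.i.d. $\mathcal{CN}(0,1)$, the $g_i$ are i.i.d. gamma distributed with PDF $p(g_i) = g_i^{N-1} e^{-g_i}/(N-1)!$, and therefore

$$P(s_k \to s_l) \lesssim \left[\frac{1}{(N-1)!}\int_0^\infty \left(1 + \frac{PT\sigma^2_{\min}}{16MR}\, x\right)^{-M} x^{N-1} e^{-x}\, dx\right]^R.$$

By defining $y = 1 + (PT\sigma^2_{\min}/16MR)x$ and calculating the resulting integral, the following theorem can be obtained.
Theorem 3 (diversity for $R \to \infty$). Assume that $R \to \infty$, $T \ge MR$, and the distributed space-time code is fully diverse. For large total transmit power $P$, by looking at only the highest-order term of $P$, the PEP of mistaking $s_k$ by $s_l$ is upper bounded as in (25). Therefore, the diversity of the wireless relay network is

$$d = \begin{cases} \min\{M, N\}R & \text{if } M \neq N, \\ MR\left(1 - \dfrac{1}{M}\dfrac{\log\log P}{\log P}\right) & \text{if } M = N. \end{cases} \tag{26}$$

Proof. See Appendix B.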

Discussion
With the two-step protocol, it is easy to see that regardless of the cooperative strategy used at the relay nodes, the error probability is determined by the worse of the two transmission stages: the transmission from the transmitter to the relays and the transmission from the relays to the receiver. The PEP of the first stage cannot be better than the PEP of a multiple-antenna system with $M$ transmit antennas and $R$ receive antennas, whose optimal diversity is $MR$, while the PEP of the second stage can have diversity not larger than $NR$. Therefore, when $M \neq N$, according to the decay rate of the PEP, distributed space-time coding is optimal. For the case of $M = N$, the penalty on the decay rate is just $R(\log\log P/\log P)$, which is negligible when $P$ is high.
If we use the diversity definition in [49], since $\lim_{P\to\infty}(\log\log P/\log P) = 0$, diversity $\min\{M, N\}R$ can be obtained.
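To see how slowly the $M = N$ penalty vanishes, the diversity expression can be evaluated for a few power levels. The sketch below uses a hypothetical function name and assumes the diversity is $\min\{M,N\}R$ for $M \neq N$ and $MR(1 - (1/M)\log\log P/\log P)$ for $M = N$, as stated in the text.

```python
import math

def diversity(M, N, R, P):
    """Achievable diversity at total power P (natural logarithms, P > e)."""
    if P <= math.e:
        raise ValueError("P must exceed e so that log log P is positive")
    if M != N:
        return min(M, N) * R
    return M * R * (1 - (1 / M) * math.log(math.log(P)) / math.log(P))
```

For $M = N = 2$ and $R = 2$, the value is about 3.62 at $P = 10^6$ and about 3.93 at $P = 10^{60}$, so the limit $\min\{M,N\}R = 4$ is approached only at very high power.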
The results in Theorem 3 are obtained by considering only the highest-order term of $P$ in the PEP formula. In brief, we call the $r$th highest-order term of $P$ in the PEP formula the $r$th term. When analyzing the diversity, not only is the first term important, but also how dominant it is. Therefore, we should analyze the contributions of the second and also other terms of $P$ compared to those of the first one. This is equivalent to analyzing how large the total transmit power $P$ should be for the terms in (25) to dominate. The following remarks are on this issue. They can be observed from the proof of Theorem 3 in Appendix B.
(1) If $|M - N| > 1$, from (B.13) and (B.22), the second term behaves as $P^{-(\min\{M,N\}R+1)}$. The difference between the first and second terms is a $P$ factor. Therefore, the first term is dominant when $P \gg 1$. In other words, contributions of the second and other terms are negligible when $P \gg 1$.
(2) If $M = N$, from (B.16), the second term has one less $\log P$ than the first one. Therefore, the first term is dominant if and only if $\log P \gg 1$, which is a much stronger condition than $P \gg 1$. When $P$ is not very large, contributions of the second and even other terms are not negligible.
(3) If $|M - N| = 1$, from (B.11) and (B.24), the second term behaves as $P^{-\min\{M,N\}R}(\log P/P)$. The difference between the first and second terms is a $\log P/P$ factor. Therefore, the first term given in (25) is dominant if and only if $P \gg \log P$. This condition is weaker than the condition $\log P \gg 1$ in the previous case; however, it is still stronger than the normally used condition $P \gg 1$.

A simple derivation
The diversity analysis in the previous section is based on the assumption that the number of relays is very large. In this section, analysis on the PEP and diversity for networks with any number of relays is given.
As discussed in Section 3, the main difficulty of the PEP analysis lies in the fact that the noise covariance matrix $R_W$ is not diagonal. From (18), we can see that one way of upper bounding the PEP is to upper bound $R_W$. Since $R_W \ge 0$, $R_W \le (\operatorname{tr} R_W) I_N$. Therefore, from (18) and using the power allocation given in (19), a PEP upper bound in terms of $\operatorname{tr} R_W$ is obtained when $P \gg 1$. If the space-time code is fully diverse, an argument similar to that in the previous section bounds the PEP by an integral over the channel gains, where, as before, $\sigma^2_{\min}$ is the minimum singular value of $(S_k - S_l)^*(S_k - S_l)$. Calculating this integral, the following theorem can be obtained.
Theorem 4 (diversity for wireless relay network). Assume that $T \ge MR$ and the distributed space-time code is fully diverse. For large total transmit power $P$, by looking at the highest-order terms of $P$, the PEP of mistaking $s_k$ by $s_l$ satisfies (31). Therefore, the same diversity as in (26) is obtained.
Proof. See Appendix C.
Although the same diversity is obtained as in the $R \to \infty$ case, there is a factor of $N$ in (31), which does not appear in (25). This is because we upper bound $R_W$ by $(\operatorname{tr} R_W)I_N$, whose expectation is $N$ times the expectation of $R_W$, while in the previous subsection we approximate $R_W$ by its expectation. This factor of $N$ can be avoided by tighter upper bounds of $R_W$. In the following subsection, we analyze the maximum eigenvalue of $R_W$. Then in Section 5.3, a PEP upper bound using the maximum eigenvalue of $R_W$ is obtained.

The maximum eigenvalue of Wishart matrix
Denote the maximum eigenvalue of $(1/R)G^*G$ as $\lambda_{\max}$. Since $G$ is a random matrix, $\lambda_{\max}$ is a random variable. We first analyze the PDF and the cumulative distribution function (CDF) of $\lambda_{\max}$.
If entries of $G$ are independent Gaussian distributed with mean zero and variance one, or equivalently, both the real and imaginary parts of every entry in $G$ are Gaussian with mean zero and variance 1/2, $(1/R)G^*G$ is known as the Wishart matrix. While there exist explicit formulas for the distribution of the minimum eigenvalue of a Wishart matrix, we could not find a nonasymptotic formula for the maximum eigenvalue. Therefore, we calculate the PDF and CDF of $\lambda_{\max}$ from the joint distribution of all the eigenvalues of $(1/R)G^*G$ in this section. The following theorem has been proved.

Theorem 5.
Assume that $R \ge N$.
(1) The PDF of the maximum eigenvalue of $(1/R)G^*G$ is given by (32), where $F$ is an $(N-1) \times (N-1)$ Hankel matrix whose $(i,j)$th entry equals $f_{ij} = \int_0^\lambda (\lambda - t)^2\, t^{R-N+i+j-2} e^{-Rt}\, dt$.
(2) The CDF of the maximum eigenvalue of $(1/R)G^*G$ is given by (33), where $F$ is an $N \times N$ Hankel matrix whose $(i,j)$th entry equals $f_{ij} = \int_0^\lambda t^{R-N+i+j-2} e^{-Rt}\, dt$.

Proof. See Appendix D.
A theoretical analysis of the PDF and CDF from (32) and (33) appears to be quite difficult. To understand $\lambda_{\max}$, we plot the two functions in Figures 2 and 3 for different $R$ and $N$. Figure 2 shows that the PDF has a peak at a value a bit larger than 1. As $R$ increases, the peak becomes sharper. An increase in $N$ shifts the peak right. However, the effect is smaller for larger $R$. From Figure 3, the CDF of $\lambda_{\max}$ grows rapidly around $\lambda = 1$ and becomes very close to 1 soon after. The larger $R$ is, the faster the CDF grows. Similar to the PDF, an increase in $N$ results in a right shift of the CDF. However, as $R$ grows, the effect diminishes. This verifies the validity of the approximation $G^*G \approx RI_N$ in Section 4 for large $R$.
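The concentration of $\lambda_{\max}$ near 1 can also be observed by Monte Carlo simulation, independently of the closed-form PDF and CDF. The following sketch (hypothetical function name, arbitrary trial count and seed) draws $R \times N$ matrices $G$ with i.i.d. $\mathcal{CN}(0,1)$ entries and averages the largest eigenvalue of $(1/R)G^*G$.

```python
import numpy as np

def mean_lambda_max(R, N, trials=2000, seed=0):
    """Monte Carlo mean of the largest eigenvalue of (1/R) G* G, G being R x N CN(0, 1)."""
    rng = np.random.default_rng(seed)
    shape = (trials, R, N)
    G = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    wishart = np.conj(np.swapaxes(G, 1, 2)) @ G / R    # trials x N x N, Hermitian
    eigenvalues = np.linalg.eigvalsh(wishart)           # ascending along the last axis
    return eigenvalues[:, -1].mean()
```

Comparing, for example, `mean_lambda_max(4, 2)` with `mean_lambda_max(256, 2)` shows the average moving toward 1 from above as $R$ grows, in line with the behaviour described for Figures 2 and 3.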
In the following corollary, we give an upper bound on the PDF. This result is used to derive the diversity result for general $R$ in the next subsection.
Corollary 1. When $R \ge N$, the PDF of the maximum eigenvalue of $(1/R)G^*G$ can be upper bounded as in (34), where the constant involved depends only on $R$ and $N$.
Proof. From the proof of Theorem 5, $F$ is a positive semidefinite matrix. Therefore, $\det F \le \prod_{n=1}^{N-1} f_{nn}$. From (32), each $f_{nn}$ can be upper bounded, which in turn gives an upper bound on $\det F$. Thus, (34) is obtained.

Bound on PEP from bound on eigenvalues
If the maximum eigenvalue of $(1/R)G^*G$ is $\lambda_{\max}$, the maximum eigenvalue of $R_W$ is $1 + (P_2R/(P_1+1))\lambda_{\max}$, and therefore $R_W \le (1 + (P_2R/(P_1+1))\lambda_{\max})I_N$. From (18) and using the power allocation given in (19), we have

$$P(s_k \to s_l) \le \mathop{\mathrm{E}}_{f_{mr},\, g_{rn}} e^{-\frac{P_1P_2T}{4M(1+P_1+P_2R\lambda_{\max})}\operatorname{tr}(S_k-S_l)^*(S_k-S_l)HH^*} \lesssim \mathop{\mathrm{E}}_{f_{mr},\, g_{rn}} e^{-\frac{PT}{8(1+\lambda_{\max})MR}\operatorname{tr}(S_k-S_l)^*(S_k-S_l)HH^*}.$$

The only difference between the above formula and formula (20) is that the constant in the denominator of the exponent is now $8(1 + \lambda_{\max})$ instead of 16. This makes sense since $\lambda_{\max} \to 1$ as $R \to \infty$. Therefore, using an argument similar to the proof of Theorem 3, a PEP upper bound can be derived at high total transmit power by looking at the highest-order terms of $P$. The following theorem can thus be obtained.
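The two ways of bounding $R_W$ can be compared on a random channel draw. This sketch assumes $R_W = I_N + (P_2/(P_1+1))G^*G$, which is consistent with the maximum-eigenvalue expression above, together with the allocation $P_1 = P/2$, $P_2 = P/(2R)$; the function name and default parameters are hypothetical.

```python
import numpy as np

def noise_covariance_bounds(R=3, N=2, P=100.0, seed=1):
    """Return (largest eigenvalue of R_W, eigenvalue-based bound, trace-based bound)."""
    rng = np.random.default_rng(seed)
    G = (rng.standard_normal((R, N)) + 1j * rng.standard_normal((R, N))) / np.sqrt(2)
    P1, P2 = P / 2, P / (2 * R)
    gram = G.conj().T @ G                                # N x N, Hermitian
    R_W = np.eye(N) + P2 / (P1 + 1) * gram
    lam_max = np.linalg.eigvalsh(gram / R)[-1]           # largest eigenvalue of (1/R) G* G
    top_eig = np.linalg.eigvalsh(R_W)[-1]
    eig_bound = 1 + P2 * R / (P1 + 1) * lam_max          # scalar c with R_W <= c I_N
    trace_bound = np.trace(R_W).real                     # scalar c with R_W <= c I_N
    return top_eig, eig_bound, trace_bound
```

The eigenvalue-based scalar coincides with the largest eigenvalue of $R_W$, while the trace is never smaller, which illustrates why the trace bound of the simple derivation is looser.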
Theorem 6 (diversity for wireless relay network). Assume that $T \ge MR$ and the distributed space-time code is fully diverse. For large total transmit power $P$, by looking at the highest-order terms of $P$, the PEP of mistaking $s_k$ by $s_l$ can be upper bounded as in (40). Therefore, the same diversity as in (26) is obtained.
For the case of $R < N$, $G^*$ is an $N \times R$ ($N > R$) matrix whose entries are i.i.d. $\mathcal{CN}(0,1)$. Denote the maximal eigenvalue of $(1/N)GG^*$ as $\lambda_{\max}$. Its PDF and CDF are given in Theorem 5 with $R$ and $N$ being switched. Using the fact that the maximal eigenvalue of $(1/R)G^*G$ equals $N/R$ times the maximal eigenvalue of $(1/N)GG^*$, we can finish the proof of this theorem.

SIMULATION RESULTS
In this section, we show simulated block error rates of three networks with multiple transmit/receive antennas and compare them with the three PEP bounds we derived in (25), (31), and (40). These bounds are also addressed as PEP bound 1, PEP bound 2, and PEP bound 3 for the sake of presentation. The main purpose of this section is to verify the diversity results in (26). The optimal code design is not an issue. In the simulations, we use the power allocation in (19) and the ML decoding in (17). It is known that with the ML metric, a factor of 1/2 can be applied to Chernoff bounds on the two-signal error rate, which is the block error rate when there are two possible signals. Thus, the PEP bounds shown in Figures 4-6 are calculated from (25), (31), and (40) with a factor of 1/2. In all figures, the horizontal axis indicates $P$, the total transmit power used in the whole network.
Our first example, whose performance is shown in Figure 4, is a network with one transmit antenna, two relay antennas, and two receive antennas, that is, $M = 1$, $R = 2$, $N = 2$. We set $T = MR = 2$. The transmit signal is designed as $s = [s_1 \ s_2]^t$, where $s_1$ and $s_2$ are chosen as BPSK signals (normalized according to (1)). The matrices used at the relays are designed so that the distributed space-time codeword formed at the receiver, $S$, is a $2 \times 2$ real orthogonal design [50].

Then, we show performance of a network with $M = 2$, $R = 2$, $N = 1$ in Figure 5. We set $T = MR = 4$. The transmit signal is designed from $s_1, s_2, s_3, s_4$, which are also BPSK signals (normalized according to (1)). The matrices used at the relays are designed so that the distributed space-time codeword formed at the receiver, $S$, is a $4 \times 4$ real orthogonal design [50].

Finally, in Figure 6, we show performance of a third network. The transmit signal is designed from $s_1$ and $s_2$, which are BPSK signals (normalized according to (1)). The matrices used at the relay are set to be $I_2$. The distributed space-time codeword formed at the receiver, $S$, is again a $2 \times 2$ real orthogonal design [50]. The transmission rate of all three networks can be calculated to be 1/2. For comparison, we also show the 2-signal error rates of the three networks by fixing $s_2, \ldots, s_T$.

Figures 4-6 indicate that when the transmit power is high, all three networks achieve the diversities shown by the PEP bounds. This verifies our diversity result in (26). PEP bound 1 is the tightest of the three. This is because PEP bound 1 is obtained by approximating $R_W$ by its asymptotic ($R \to \infty$) limit, which is also its mean; however, strict upper bounds on $R_W$ are used in the calculations of bound 2 and bound 3. In Figure 5, the three bounds are very close to each other and, actually, bounds 1 and 2 are the same.
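The structure of the first example can be illustrated with a small sketch. The relay matrices below are an assumed choice (the identity and a 90-degree rotation) that turns $s = [s_1 \ s_2]^t$ into a $2 \times 2$ real orthogonal design; they are not necessarily the matrices used in the simulations, and all names are hypothetical.

```python
import itertools
import math

A1 = [[1, 0], [0, 1]]       # assumed relay matrix: identity
A2 = [[0, -1], [1, 0]]      # assumed relay matrix: rotation by 90 degrees (orthogonal)

def matvec(A, x):
    """Product of a 2 x 2 matrix A and a length-2 vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def codeword(s1, s2):
    """S = [A1 s, A2 s] for the 2 x 1 transmit signal s = [s1, s2]^t."""
    s = [s1, s2]
    c1, c2 = matvec(A1, s), matvec(A2, s)
    return [[c1[0], c2[0]], [c1[1], c2[1]]]

def gram(S):
    """S^t S for a real 2 x 2 matrix S."""
    return [[sum(S[k][i] * S[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(S):
    """Determinant of a 2 x 2 matrix."""
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]

a = 1 / math.sqrt(2)        # BPSK amplitude so that s1^2 + s2^2 = 1 = M
codebook = [codeword(x, y) for x, y in itertools.product((a, -a), repeat=2)]
```

Each codeword has the form $[[s_1, -s_2], [s_2, s_1]]$, so $S^tS = (s_1^2 + s_2^2)I_2$, and the difference of any two distinct codewords is again of this form and hence full rank, which is the full-diversity property assumed in the analysis.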

CONCLUSIONS
In this paper, we generalize the idea of distributed space-time coding to wireless relay networks whose transmitter, receiver, and/or relays can have multiple antennas. We assume that the channel information is only available at the receiver. The ML decoding at the receiver and PEP of the network are analyzed. We have shown that for a wireless relay network with $M$ antennas at the transmitter, $N$ antennas at the receiver, a total of $R$ antennas at all the relay nodes, and a coherence interval not less than $MR$, an achievable diversity is $\min\{M,N\}R$ if $M \neq N$ and $MR(1 - (1/M)(\log\log P/\log P))$ if $M = N$, where $P$ is the total power used in the whole network. This result shows the optimality of distributed space-time coding according to the diversity gain. Simulation results are exhibited to justify our diversity analysis.

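APPENDICES

A. PROOF OF THEOREM 1
Proof. It is obvious that since $H$ is known and $W$ is Gaussian, the rows of $X$ are Gaussian. We only need to show that the rows of $X$ are uncorrelated and that the mean and variance of the $t$th row are $\sqrt{P_1P_2T/((P_1+1)M)}\,[S_k]_t H$ and $R_W$, respectively. The $(t,n)$th entry of $X$ can be written as

$$x_{tn} = \sqrt{\frac{P_1P_2T}{(P_1+1)M}} \sum_{i=1}^{R}\sum_{\tau=1}^{T}\sum_{m=1}^{M} a_{i,t\tau}\, s_{k,\tau m}\, f_{mi}\, g_{in} + \sqrt{\frac{P_2}{P_1+1}} \sum_{i=1}^{R}\sum_{\tau=1}^{T} a_{i,t\tau}\, v_{i\tau}\, g_{in} + w_{tn},$$

where $a_{i,t\tau}$ is the $(t,\tau)$th entry of $A_i$, $s_{k,\tau m}$ is the $(\tau,m)$th entry of $s_k$, and $v_{i\tau}$ is the $\tau$th entry of $v_i$. With full channel information at the receiver, the mean of $x_{tn}$ is the first term of this expression. Therefore, the mean of the $t$th row is $\sqrt{P_1P_2T/((P_1+1)M)}\,[S_k]_t H$. In the computation of the covariance of $x_{t_1n_1}$ and $x_{t_2n_2}$, the fourth equality is true since $A_i$ are unitary. Therefore, the rows of $X$ are independent since the covariance of $x_{t_1n_1}$ and $x_{t_2n_2}$ is zero when $t_1 \neq t_2$. It is also easy to see that the variance matrix of each row is $R_W$. Therefore, (16) can be obtained.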

B. PROOF OF THEOREM 3
Proof. Let $I$ denote the integral to be calculated. Three integral equalities are used in the calculation; they involve the exponential integral function [51]. To calculate $I$, we discuss the following cases separately.
Case 1 ($M < N$). By only looking at the highest-order term of $P$, which is in the first term with $l = N - 1$, the bound for this case is obtained. While analyzing the performance of the system at high transmit power $P$, not only is the highest-order term of $P$ important, but also how fast other terms decay with respect to it. Therefore, we should also look at the second highest-order term of $P$. To do this, we have to consider two different cases.
If $N = M + 1$, the second highest-order term of $P$ in the PEP behaves as $\log P/P^{MR+1} = P^{-(MR+1-\log\log P/\log P)}$.
Case 2 ($M = N$). In this case, the second highest-order term of $P$ in the PEP behaves as $\log^{R-1} P/P^{RM}$ and the next term has one $\log P$ less and so on.
Case 3 ($M > N$). In this case, the expansion consists of the highest-order term plus lower-order terms of $P$ (B.18). We can further upper bound the PEP to get a simpler formula. Notice that $1/(M - l - 1) \le 1/(M - N)$. As discussed before, we also want to see how dominant the highest-order term of $P$ given in the above formula is. If $M > N + 1$, the second highest-order term in the PEP behaves as $1/P^{NR+1}$. If $M = N + 1$, the second highest-order term in the PEP behaves as $\log P/P^{NR+1} = P^{-(NR+1-\log\log P/\log P)}$.

C. PROOF OF THEOREM 4
Proof. Since the $g_i$ have a known PDF $p(g_i)$, the PEP upper bound can be written as a sum of terms $T_{i_1,\ldots,i_r}$, where $x$ is any positive real number. Let us calculate $T_{1,\ldots,r}$ first; the result involves $\gamma(n, x)$, the incomplete gamma function [51]. We should choose $x$ so that the diversity is maximized. Define $x = \beta P^{\alpha}$, where $\beta$ is a positive constant and $\alpha$ is any real constant. The value of $\beta$ does not affect the diversity. Here, to have the PEP result consistent with formula (25) in Section 6, we set $\beta = (T\sigma^2_{\min}/8MNR)^{\alpha}$. Therefore, choosing the optimal (in the sense of maximizing the diversity) $x$ is equivalent to choosing the optimal $\alpha$. If $\alpha > 0$, the $r = 0$ term in the PEP upper bound shows that having $\alpha$ positive is not optimal according to diversity. Similarly, if $\alpha = 0$, $x = 1$. The $r = 0$ term in the PEP upper bound, $(1/((N-1)!)^R)\,\gamma^R(N, 1)$, is a constant. Therefore, $\alpha$ should be negative. We are only interested in the highest-order term of $P$. When $P$ is large, $((R - r)/NR)x$ is negligible compared with 1. We consider the expansion of $(A + \sum_{i=1}^{k} \lambda_i)^a$ into monomial terms, where $j$ denotes how many $g_i$ are present, $l_1, \ldots, l_j$ are the subscripts of the $g_i$ that appear, $i_m \ge 1$ indicates that $g_{l_m}$ is taken to the $i_m$th power, and finally a coefficient counts how many times the term $g_{l_1}^{i_1} g_{l_2}^{i_2} \cdots g_{l_j}^{i_j}$ appears in the expansion. The resulting expression is denoted by $\Lambda$. The highest-order term of $P$ in $\Lambda$ is the $j = 0$ term, and we only keep this term. From the symmetry of $g_1, \ldots, g_R$, we have $T_{i_1,\ldots,i_r} = T_{1,\ldots,r}$. We should choose a negative $\alpha$ such that the exponent of the highest-order term of $P$ in the resulting bound is minimized.
In other words, if we denote the exponent of the $r$th term as $f(r)$, choose an $\alpha < 0$ such that $\max_r f(r)$ is minimized.

D. PROOF OF THEOREM 5
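Proof. We first give a theorem that will be needed later.
Theorem 7 is an integral identity stated for $\Lambda = (\lambda_1, \ldots, \lambda_N)$ and any functions $f$, $g$, and $h$. The PDF of the maximum eigenvalue $\lambda_1$ is obtained by integrating the joint eigenvalue PDF over the remaining eigenvalues, where in the second equality we have changed the integral space from ordered $\lambda_i$ to unordered ones. From the symmetry of the $\lambda_i$, we only need to divide the new value by $(N-1)!$. Applying Theorem 7 gives the determinant of a matrix whose $(i,j)$th entry is $f_{ij} = \int_0^\lambda (\lambda - t)^2\, t^{R-N+i+j-2} e^{-Rt}\, dt$. The CDF of $\lambda_1$ can be obtained similarly.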

E. DISCUSSION ON HETEROGENEOUS NETWORKS
In Section 2.2, it is assumed that $f_{mi}$ and $g_{in}$ have the same variance. Physically, this means that the distances between the transmitter/receiver and all relays are about the same, which may not be a practical assumption for networks with scattered nodes. In this appendix, we extend our diversity analysis to heterogeneous networks whose channels have different variances. We assume that the distributions of $f_{mi}$ and $g_{in}$ are $\mathcal{CN}(0, \sigma^2_{f_{mi}})$ and $\mathcal{CN}(0, \sigma^2_{g_{in}})$, respectively. By following the derivation in Section 4, a PEP upper bound analogous to (21) can be derived for the heterogeneous case. Thus, the same diversity results as in (26) can be obtained. Similarly, the rigorous analysis in Section 5 also applies to this heterogeneous case.

Figure 1: Wireless relay network with multiple-antenna nodes.