Maximum MIMO System Mutual Information with Antenna Selection and Interference

Maximum system mutual information is considered for a group of interfering users employing single-user detection and antenna selection of multiple transmit and receive antennas, for flat Rayleigh fading channels with independent fading coefficients for each path. In the case considered, the only feedback of channel state information to the transmitter is that required for antenna selection, but channel state information is assumed at the receiver. The focus is on extreme cases with very weak or very strong interference. It is shown that the optimum signaling covariance matrix is sometimes different from the standard scaled identity matrix. In fact, this is true even for cases without interference if the SNR is sufficiently weak. Further, the scaled identity matrix is actually the covariance matrix that yields the worst performance if the interference is sufficiently strong.


INTRODUCTION
Multiple-input multiple-output (MIMO) channels formed using transmit and receive antenna arrays are capable of providing very high data rates [1,2]. Implementation of such systems can require additional hardware to implement the multiple RF chains used in a standard multiple transmit and receive antenna array MIMO system. Employing antenna selection [3,4] is one promising approach for reducing complexity while retaining a reasonably large fraction of the high potential data rate of a MIMO approach. One antenna is selected for each available RF chain. In this case, only the best set of antennas is used, while the remaining antennas are not employed, thus reducing the number of required RF chains. For cases with only a single transmit antenna where standard diversity reception is to be employed, this approach, known as "hybrid selection/maximum ratio combining," has been shown to lead to relatively small reductions in performance, as compared with using all receive antennas, for considerable complexity reduction [3,4]. Clearly, antenna selection can be simultaneously employed at the transmitter and at the receiver in a MIMO system leading to larger reductions in complexity.
Employing antenna selection both at the transmitter and the receiver in a MIMO system has been studied very recently [5,6,7]. Cases with full and limited feedback of information from the receiver to the transmitter have been considered. The cases with limited feedback are especially attractive in that they allow antenna selection at the transmitter without requiring a full description of the channel or its eigenvector decomposition to be fed back. In particular, the only information fed back is the selected subset of transmit antennas to be employed. While cases with this limited feedback of information from the receiver to the transmitter have been studied in these papers, each assumes that the transmitter sends a different (independent) equal-power signal out of each selected antenna. Transmitting a different equal-power signal out of each antenna is the optimum approach for the case where selection is not employed [8], but it is not optimum if antenna selection is used. The purpose of this paper is to find the optimum signaling. To date, this problem has remained unsolved. For simplicity, we ignore any delay or error that might actually be present in the feedback signal; we assume the feedback signal is accurate and instantly follows any changes in the environment.
Consider a system where cochannel interference is present from L − 1 other users. We focus on the Lth user and assume each user employs n_t transmit antennas and n_r receive antennas. In this case, the vector of received complex baseband samples after matched filtering becomes

y_L = √ρ_L H_{L,L} x_L + Σ_{j=1}^{L−1} √η_{L,j} H_{L,j} x_j + n,   (1)

where H_{L,j} and x_j represent the normalized channel matrix and the normalized transmitted signal of user j, respectively, and n is the noise vector. The signal-to-noise ratio (SNR) of user L is ρ_L and the interference-to-noise ratio (INR) for user L due to interference from user j is η_{L,j}. For simplicity, we assume all of the interfering signals x_j, j = 1, …, L − 1, are unknown to the receiver, and we model each of them as being complex Gaussian distributed, the usual form of the optimum signal in MIMO problems. Then, if we condition on H_{L,1}, …, H_{L,L}, the interference-plus-noise from (1), Σ_{j=1}^{L−1} √η_{L,j} H_{L,j} x_j + n, is complex Gaussian distributed with the covariance matrix R_L = Σ_{j=1}^{L−1} η_{L,j} H_{L,j} S_j H_{L,j}^H + I_{n_r}, where S_j denotes the covariance matrix of x_j and I_{n_r} is the covariance matrix of n. Under this conditioning, the interference-plus-noise is whitened by multiplying y_L by R_L^{−1/2}. After performing this multiplication, we can use results from [2, 8, 9] (see also [10, pp. 12–23, 250, 256]) to express the ergodic mutual information between the input and output for the user of interest as

I_L = E_H { log₂ det ( I_{n_r} + ρ_L R_L^{−1} H_{L,L} S_L H_{L,L}^H ) },   (2)

where E_H reminds us of the assumed model for H_{L,1}, …, H_{L,L}. In (2), the identity det(I + AB) = det(I + BA) was used. If we wish to compute total system mutual information, we should find S_1, …, S_L to maximize

Ψ(S_1, …, S_L) = Σ_{i=1}^{L} I_i.   (3)

Now, assume that each receiver selects n_sr < n_r receive antennas and n_st < n_t transmit antennas based on the channel conditions and feeds back this information to the transmitter. Then the observations from the selected antennas follow the model in (1) with n_t and n_r replaced by n_st and n_sr, respectively, and H_{i,j} replaced by H̃_{i,j}.
The matrix H̃_{i,j} is obtained by eliminating those columns and rows of H_{i,j} corresponding to unselected transmit and receive antennas. Thus we can write H̃_{i,j} = g(H_{i,j}), where the function g chooses H̃_{i,j} to maximize the instantaneous (and thus also the ergodic) mutual information (or some related quantity for the signaling approach employed). To promote brevity, we restrict attention in the rest of this paper to the case where n_st = n_sr, so we will use only the notation n_st. We note that the majority of the results given carry over immediately to the case of n_st ≠ n_sr, and since this will be obvious in these cases, we will not discuss it further.
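As an illustration, the selection function g can be sketched numerically by exhaustive search over antenna subsets. This is a minimal sketch for the no-interference case (R_L = I); `mutual_info` and `select_antennas` are hypothetical helper names, not from the paper.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def mutual_info(H, S, rho, R_inv=None):
    # log2 det(I + rho * R^{-1} H S H^H); R_inv defaults to identity (no interference)
    n_r = H.shape[0]
    if R_inv is None:
        R_inv = np.eye(n_r)
    M = np.eye(n_r) + rho * R_inv @ H @ S @ H.conj().T
    return np.linalg.slogdet(M)[1] / np.log(2)

def select_antennas(H, S, rho, n_s):
    # exhaustive search over all row/column subsets: the function g(.)
    n_r, n_t = H.shape
    best, best_val = None, -np.inf
    for rows in combinations(range(n_r), n_s):
        for cols in combinations(range(n_t), n_s):
            Hs = H[np.ix_(rows, cols)]
            v = mutual_info(Hs, S, rho)
            if v > best_val:
                best, best_val = Hs, v
    return best, best_val

n_t = n_r = 4
n_s = 2
# i.i.d. flat Rayleigh fading: CN(0,1) entries
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
S = np.eye(n_s) / n_s            # equal-power signaling on the selected antennas
Hs, I_sel = select_antennas(H, S, rho=1.0, n_s=n_s)
```

The exhaustive search is exponential in the subset size; it is used here only to make the definition of g concrete.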
It is important to note that we restrict attention to narrowband systems using single user detection, equal power (constant over time) for each user, and fixed definitions of the transmitting and receiving users. Future extensions which remove some assumptions are of great interest. However, as we will show, these assumptions lead to interesting closed form results which we believe give insight into the fundamental properties of MIMO with antenna selection.
In Section 2, we give a general discussion and some useful relationships used to study the convexity and concavity properties of the system mutual information. In Section 3, we study cases with weak interference. We follow this, in Section 4, with our results for strong interference. The results in Sections 3 and 4 are general for any n_st = n_sr, n_t, n_r, and L. Section 5 is devoted to numerical studies for the particular case of n_r = n_t = 8, n_sr = n_st = L = 2 to illustrate the agreement with the theory from Sections 3 and 4. The results in Section 5 also show that our asymptotic results give useful information for nonasymptotic cases as well. The paper concludes with Section 6.

GENERAL ANALYSIS OF SYSTEM MUTUAL INFORMATION
For 0 ≤ t ≤ 1 a scalar, consider the combination

S_i(t) = t S_i + (1 − t) Ŝ_i,   i = 1, …, L.   (4)

Then Ψ(S_1, …, S_L) is a convex function of (S_1, …, S_L) if [12]

(d²/dt²) Ψ(S_1(t), …, S_L(t)) ≥ 0   (5)

for all valid (S_1, …, S_L), (Ŝ_1, …, Ŝ_L), and t. Similarly, Ψ(S_1, …, S_L) is a concave function of (S_1, …, S_L) if

(d²/dt²) Ψ(S_1(t), …, S_L(t)) ≤ 0   (6)

under the same conditions. There are several useful known relationships for the derivative of a function of a matrix Φ with respect to a scalar parameter t. In particular, we note that [13]

(d/dt) ln det Φ(t) = trace[ Φ(t)^{−1} (d/dt)Φ(t) ],   (d/dt) Φ(t)^{−1} = −Φ(t)^{−1} [ (d/dt)Φ(t) ] Φ(t)^{−1}.   (7)

Assuming selection is employed, we can use (3) and (7) to find (interchanging a derivative and an expected value) an explicit expression for (d/dt)Ψ, denoted (8), in which the interference-plus-noise covariance of (9) appears. A second derivative yields the expression denoted (12), a sum of expected traces whose matrix argument is given in (13).
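The two derivative identities in (7) can be checked numerically with a central difference. This is a sketch under the assumption Φ(t) = I + tA for a nonnegative definite A (so Φ(t) is invertible for t ≥ 0); the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = B @ B.conj().T               # nonnegative definite, so I + tA is positive definite

def logdet(M):
    return np.linalg.slogdet(M)[1]

Phi = lambda t: np.eye(n) + t * A

t, h = 0.3, 1e-6
# central-difference estimate of (d/dt) ln det Phi(t)
numeric = (logdet(Phi(t + h)) - logdet(Phi(t - h))) / (2 * h)
# closed form: trace[Phi^{-1} (d/dt)Phi] with (d/dt)Phi = A
analytic = np.trace(np.linalg.inv(Phi(t)) @ A).real
```

The second identity in (7), for (d/dt)Φ⁻¹, follows from differentiating Φ(t)Φ(t)⁻¹ = I and can be verified the same way.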

OPTIMUM SIGNALING FOR WEAK INTERFERENCE
We can use (12) to investigate convexity and concavity for any particular set of SNRs ρ_i, i = 1, …, L, and INRs η_{i,j}, i, j = 1, …, L, i ≠ j. We investigate extreme cases, weak or strong interference, to gain insight. The following lemma considers the case of very weak interference.

Lemma 1. Assuming sufficiently weak interference, the best (S_1, …, S_L) (that maximizes the ergodic system mutual information) must be of the form

S_i = α [ γ_i I_{n_st} + (1 − γ_i) O_{n_st} ],   i = 1, …, L,   (14)

where O_{n_st} is an n_st by n_st matrix of all ones, α = 1/n_st, and 0 ≤ γ_i ≤ 1.

Outline of the proof. For the case of very weak interference, we ignore terms which are multiples of η_{i,j} (essentially, we set η_{i,j} → 0 for all i ≠ j), and we find (d/dt)Q_i = 0 so that (d²/dt²)Q_i = 0, which allows the matrix inside the expected value in (12) to be factored as −BB^H [13]; we denote the resulting expression by (15). We see trace[BB^H] must be nonnegative since the matrix inside the trace is nonnegative definite, so that (15) implies that Ψ(S_1, …, S_L) is concave. This will be true for sufficiently small η_{i,j}. To recognize the significance of the concavity, we note that given any permutation matrix Π, we know [8] that H̃_{i,j} has the same distribution as H̃_{i,j}Π (switching the ordering or names of selected antennas cannot change the physical problem), so Ψ(ΠS_1Π^H, …, ΠS_LΠ^H) = Ψ(S_1, …, S_L). Let Σ_Π denote the sum over all the distinct permutation matrices and let N denote the number of terms in the sum. From concavity [8],

Ψ( (1/N) Σ_Π ΠS_1Π^H, …, (1/N) Σ_Π ΠS_LΠ^H ) ≥ (1/N) Σ_Π Ψ(ΠS_1Π^H, …, ΠS_LΠ^H) = Ψ(S_1, …, S_L),

which implies that the optimum (S_1, …, S_L) must be invariant to transformation by permutation matrices. This implies that the best (S_1, …, S_L) must be of the form given in (14). We refer the interested reader to [14] for a rigorous proof of this (taken from a single-user case).
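The permutation-averaging step can be illustrated directly: averaging ΠSΠ^H over all permutation matrices always produces a combination of the identity and the all-ones matrix, i.e., the form in (14). A minimal real-valued sketch (names are illustrative):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
n = 3
B = rng.standard_normal((n, n))
S = B @ B.T
S = S / np.trace(S)              # normalize to unit total power

# all n! permutation matrices
perms = [np.eye(n)[list(p)] for p in permutations(range(n))]
S_avg = sum(P @ S @ P.T for P in perms) / len(perms)

# the average puts the mean diagonal value on the diagonal and the mean
# off-diagonal value everywhere else: a combination of I and the all-ones O
O = np.ones((n, n))
d = np.trace(S) / n                              # common diagonal value
o = (S.sum() - np.trace(S)) / (n * (n - 1))      # common off-diagonal value
model = (d - o) * np.eye(n) + o * O
```

Since (PSPᵀ)_{ij} = S_{p(i),p(j)}, every diagonal entry of S visits every diagonal position equally often under the averaging, and likewise for the off-diagonal entries, which is why only two distinct values survive.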
Before considering specific assumptions on the SNR, we note the similarity of (14) to (4): each S_i in (14) is a convex combination, with parameter γ_i playing the role of t, of the matrices (1/n_st)I_{n_st} and (1/n_st)O_{n_st}.

Small SNR
Thus we have determined the best signaling except for the unknown scalar parameters γ_1, …, γ_L, which we now investigate. Generally, the best approach will change with SNR. First, consider the case of weak SNR, for which the following lemma applies (recall we have already focused on very weak or no interference).

Lemma 2. Let h̃(p,p)_{i,j} denote the (i,j)th entry of the matrix H̃_{p,p} and define S̃_1, …, S̃_L from (14). Assuming sufficiently weak interference and sufficiently weak SNR, (d/dγ_p)Ψ is directly proportional, through a negative constant, to

E{ Σ_{i=1}^{n_st} Σ_{j=1}^{n_st} Σ_{j'=1, j'≠j}^{n_st} h̃*(p,p)_{i,j} h̃(p,p)_{i,j'} }.   (16)

Outline of the proof. Using the similarity of (14) to (4), (d/dγ_p)Ψ can be seen to be the pth component of the sum in (8) with S_p = (1/n_st)I_{n_st}, Ŝ_p = (1/n_st)O_{n_st}, and t = γ_p. To assert the weak-signal and weak-interference assumptions, we set η_{i,j} → 0 for all i ≠ j and ρ_i → 0 for all i, and in this case using (8) gives an expression, (18), in which the n_st × n_st matrix involved can be explicitly written as

(d/dγ_p) S̃_p = (1/n_st) ( I_{n_st} − O_{n_st} ).

Explicitly carrying out the operations in (18) gives (16).
Notice that without selection (in this case H̃_{p,p} = H_{p,p}), the quantity in (16) becomes zero under the assumed model for H_{p,q} (i.i.d. complex Gaussian entries). Thus selection turns out to be an important aspect of the analysis. The following lemmas will be used with the result in Lemma 2 to develop the main result of this section.

Lemma 3. Let h̃(p,p)_{i,j} denote the (i,j)th entry of the matrix H̃_{p,p} and define S̃_1, …, S̃_L from (14). Assuming sufficiently weak interference and sufficiently weak SNR,

Ψ ≈ (1/ln 2) Σ_{p=1}^{L} (ρ_p/n_st) [ E{ Σ_{i=1}^{n_st} Σ_{j=1}^{n_st} |h̃(p,p)_{i,j}|² } + (1 − γ_p) E{ Σ_{i=1}^{n_st} Σ_{j=1}^{n_st} Σ_{j'=1, j'≠j}^{n_st} h̃*(p,p)_{i,j} h̃(p,p)_{i,j'} } ].   (20)

Outline of the proof. Consider an n_st × n_st nonnegative definite matrix A and let λ_1(A), …, λ_{n_st}(A) denote the eigenvalues of A. For sufficiently weak SNR ρ_i, we can approximate log₂ det(I + ρ_i A) = Σ_k log₂(1 + ρ_i λ_k(A)) ≈ (ρ_i/ln 2) trace(A). Apply this approximation to (3), for the set of covariance matrices in (14), and assume that selection is employed. Thus we consider the resulting Ψ as a function of (γ_1, …, γ_L) and we see

Ψ ≈ (1/ln 2) Σ_{p=1}^{L} ρ_p E{ trace( H̃_{p,p} S̃_p H̃_{p,p}^H ) }.   (21)

Note that the n_st × n_st matrix S̃_p can be explicitly written as

S̃_p = (1/n_st) [ γ_p I_{n_st} + (1 − γ_p) O_{n_st} ].   (22)

Using (22) in (21) with further simplification gives (20).
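The simplification from (21) to (20) rests on the identity trace(H̃ S̃ H̃^H) = (1/n_st)[ Σ|h̃_{i,j}|² + (1 − γ)·(cross terms) ], which can be verified numerically. A sketch for a single realization (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
gamma = 0.4
S = (gamma * np.eye(n) + (1 - gamma) * np.ones((n, n))) / n   # the form in (14)/(22)

lhs = np.trace(H @ S @ H.conj().T).real

sq_sum = (np.abs(H) ** 2).sum()                 # sum of squared magnitudes
row_sums = H.sum(axis=1)                        # sum_j h_{i,j} for each row i
# sum_i sum_{j != j'} h*_{i,j} h_{i,j'} = sum_i |sum_j h_{i,j}|^2 - sq_sum
cross = (np.abs(row_sums) ** 2).sum() - sq_sum
rhs = (sq_sum + (1 - gamma) * cross) / n
```

At γ = 1 only the squared-magnitude part survives, while at γ = 0 the cross terms contribute fully, which is exactly the structure exploited in (20).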
Lemma 4. Assuming sufficiently weak interference and sufficiently weak SNR, the antenna selection that maximizes the ergodic system mutual information will make the expected value in (16) positive.
Outline of the proof. First, consider the antenna selection approach for the pth link which maximizes the ergodic system mutual information in (20) when γ_p = 1 in (14). This selection approach maximizes the pth term in the first sum in (20) by selecting antennas, for each set of instantaneous channel matrices, to make the terms inside the expected value as large as possible. It is important to note that this choice (if γ_p = 1) depends only on the squared magnitudes of the elements of the channel matrices. If we use this selection approach when γ_p ≠ 1, then the terms multiplied by (1 − γ_p) in (20) will average to zero due to the symmetry in the selection criterion. To see this, note that the channel realizations obtained by flipping the signs of entries among h(p,p)_{1,1}, …, h(p,p)_{n_r,n_t} are all equally likely and all lead to exactly the same selection, since the squared magnitudes are unchanged; the cross terms h̃*(p,p)_{i,j} h̃(p,p)_{i,j'}, however, take positive and negative values of equal magnitude over these equal-probability regions of channel coefficient space, so a complete cancellation results and the term multiplied by (1 − γ_p) averages to zero. Thus, if we use the selection approach that maximizes (20) with γ_p = 1, this is the best we can do. However, if γ_p ≠ 1, we can do better by modifying the selection approach. To understand the basic idea, let H̃ denote the matrix H̃_{p,p} for a particular selection of antennas and H̃' denote the same quantity for a different selection of antennas. Now consider two selection approaches which are the same except that the second approach will choose H̃' in cases where

Σ_{i,j} |h̃'_{i,j}|² = Σ_{i,j} |h̃_{i,j}|²

and (in the sum, both a term and its conjugate appear, giving a real quantity)

Σ_i Σ_{j≠j'} h̃'*_{i,j} h̃'_{i,j'} > Σ_i Σ_{j≠j'} h̃*_{i,j} h̃_{i,j'}.

Assume the first selection approach is the one trying to maximize (20) with γ_p = 1, so it selects arbitrarily in these tie cases, since it ignores the cross terms in its selection.
From (20), the second selection approach will give larger instantaneous mutual information for each event where the selections differ. Since the probability of the event that makes the two approaches differ is greater than zero under our assumed model, the second antenna selection approach will lead to improvement (if γ_p ≠ 1), and it will do this by making the term multiplied by (1 − γ_p) in (20) positive. Clearly the optimum selection scheme will be at least as good or better, so it must also give improvement by making the term multiplied by (1 − γ_p) in (20) positive.
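The effect described in Lemmas 2–4 can be observed in a small Monte Carlo experiment: a selection rule that looks only at squared magnitudes leaves the cross terms with zero mean, while a rule that favors the cross terms makes their mean clearly positive. This is a simplified sketch selecting transmit columns only; the variable names are illustrative.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n_r, n_t, n_s, trials = 2, 4, 2, 2000

def cross_term(Hs):
    # sum_i sum_{j != j'} h*_{i,j} h_{i,j'}, a real quantity
    row_sums = Hs.sum(axis=1)
    return (np.abs(row_sums) ** 2).sum() - (np.abs(Hs) ** 2).sum()

acc_mag = acc_cross = 0.0
for _ in range(trials):
    H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    pairs = list(combinations(range(n_t), n_s))
    # selection that looks only at squared magnitudes (the gamma_p = 1 criterion)
    best_mag = max(pairs, key=lambda c: (np.abs(H[:, c]) ** 2).sum())
    # selection that instead maximizes the cross terms
    best_crs = max(pairs, key=lambda c: cross_term(H[:, c]))
    acc_mag += cross_term(H[:, best_mag])
    acc_cross += cross_term(H[:, best_crs])

mean_mag = acc_mag / trials      # expected to average near zero (sign symmetry)
mean_cross = acc_cross / trials  # expected to be clearly positive
```

The near-zero average under magnitude-only selection is the cancellation argument in the proof; the positive average under the modified rule is the improvement Lemma 4 exploits.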
We are now ready to give the main result of this section.

Theorem 1. Assuming sufficiently weak interference, sufficiently weak SNRs, and optimum antenna selection, the best (S_1, …, S_L) (that maximizes the ergodic system mutual information) uses γ_p = 0 for all p = 1, …, L in (14), so that S_p = (1/n_st) O_{n_st}.

Outline of the proof. The assumption of weak SNRs implies that ρ_p is small for all 1 ≤ p ≤ L. In this case, optimum selection will attempt to make E{ Σ_{i=1}^{n_st} Σ_{j=1}^{n_st} Σ_{j'≠j} h̃*(p,p)_{i,j} h̃(p,p)_{i,j'} } as large as possible, as shown in Lemma 3. Lemma 4 builds on Lemma 3 to show that optimum selection can always make this expected value positive. Lemma 2 shows that (d/dγ_p)Ψ is directly proportional to the negative of this same expected value, which the selection is making positive and large. Thus it follows that (d/dγ_p)Ψ is always negative, which implies that the best solution employs γ_p = 0, since any increase in γ_p away from γ_p = 0 causes a decrease in Ψ. Since ρ_p is small for all p, the theorem follows.

Large SNR
Now consider the case of large SNR, where the following theorem applies.

Theorem 2. Assuming sufficiently weak interference, sufficiently large SNRs, and optimum antenna selection, the best (S_1, …, S_L) (that maximizes the ergodic system mutual information) uses γ_p = 1 for all p = 1, …, L in (14), so that S_p = (1/n_st) I_{n_st}.

Outline of the proof. Asserting the weak-interference, large-SNR assumptions in (8) gives

(d/dγ_p)Ψ ≈ (d/dγ_p) log₂ det S̃_p = (1/ln 2) [ (n_st − 1)/γ_p − (n_st − 1)/( n_st − (n_st − 1)γ_p ) ],   (33)

which is positive for 0 < γ_p < 1 (note (n_st − 1)γ_p < n_st, so the second denominator is positive) and zero if γ_p = 1. In (33), we used trace[CD] = trace[DC] [13]. Thus for the large SNR case (large ρ_p for all p) when the interference is very weak, the best signaling uses (14) with γ_p = 1. Since this is true for all p, the theorem follows.
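The derivative in (33) and the monotonicity of log det S̃_p in γ_p can be checked numerically. The eigenvalues of S̃_p(γ) are γ/n_st (with multiplicity n_st − 1) and (n_st − (n_st − 1)γ)/n_st, which is where the closed form comes from; this sketch verifies it by central difference:

```python
import numpy as np

n = 4
S = lambda g: (g * np.eye(n) + (1 - g) * np.ones((n, n))) / n   # S-tilde(gamma)

def logdet(M):
    return np.linalg.slogdet(M)[1]

# closed-form derivative of ln det S(g): (n-1)/g - (n-1)/(n - (n-1)g)
g, h = 0.6, 1e-6
numeric = (logdet(S(g + h)) - logdet(S(g - h))) / (2 * h)
analytic = (n - 1) / g - (n - 1) / (n - (n - 1) * g)

# ln det S(g) should be strictly increasing on (0, 1), flattening at g = 1
gs = np.linspace(0.05, 1.0, 20)
vals = [logdet(S(x)) for x in gs]
```

The positive derivative for 0 < γ < 1 is exactly the reason the large-SNR optimum sits at γ_p = 1 (the scaled identity).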
As a further comment on Theorem 2, we note that the proof makes it clear that if ρ_p is large only for certain p, then γ_p = 1 for those p only. Likewise, it is clear from Theorem 1 that if ρ_p is small only for certain p, then γ_p = 0 for those p only. Of course, this assumes weak interference. Thus we can imagine a case where the best signaling uses γ_p = 1 for some p and γ_{p'} = 0 for some p' ≠ p, with proper assumptions on the corresponding ρ_p, ρ_{p'}. One can construct similar cases where only some of the η_{i,j} are small and extend the results given here in a straightforward way.

STRONG INTERFERENCE
Now consider the other extreme of dominating interference, where each η_{i,j}, i ≠ j, is large (compared to ρ_1, …, ρ_L). The following lemma addresses the worst signaling to use.

Lemma 5. Assuming sufficiently strong interference, the worst (S_1, …, S_L) (that minimizes the ergodic system mutual information) must be of the form

S_i = α [ γ_i I_{n_st} + (1 − γ_i) O_{n_st} ],   i = 1, …, L,   (34)

where O_{n_st} is an n_st by n_st matrix of all ones, α = 1/n_st, and 0 ≤ γ_i ≤ 1.

Outline of the proof. Provided η_{i,j} is sufficiently large, we can approximate the interference-plus-noise covariance in (9) by dropping the identity term. After applying this to (12) and using (13) for large η_{i,j}, we find the first term inside the trace in (12) depends inversely on η_{i,j}, while the second term inside the trace in (12) depends inversely on η²_{i,j}, so that the first term dominates for large η_{i,j}. Further, we can interchange the expected value and the trace in (12), so we are concerned with the expected value of (13). Now note that the first term in (13) consists of the product of a term A = H̃_{i,i} S_i H̃_{i,i}^H and another term depending on H̃_{i,j} for j ≠ i. Consider the expected value of (13) computed first as an expected value conditioned on {H̃_{i,j}, j ≠ i} and then averaged over {H̃_{i,j}, j ≠ i}. The conditional expected value of the first term in (13) becomes the zero matrix. Thus the contribution from the first term in (13) averages to zero, so that the second derivative reduces to the expected value of the second term, which is nonnegative. To see this, we can use a few of the same simplifications used previously: expand the nonnegative definite matrices 2ρ_i H̃_{i,i} S_i H̃_{i,i}^H and ( Σ_{j=1, j≠i}^{L} η_{i,j} H̃_{i,j} S_j H̃_{i,j}^H )^{−1} using the unitary matrix/eigenvalue expansions as done after (15). Then the matrix inside the expected value in (36) can be factored into BB^H after manipulations similar to those used after (15). Thus Ψ(S_1, …, S_L) is convex. Using the same permutation argument as used for the weak interference case, the result stated in the lemma follows.
The following theorem builds on Lemma 5 to specify the exact γ_1, …, γ_L giving worst performance.

Theorem 3. Assuming sufficiently strong interference and optimum antenna selection, the worst (S_1, …, S_L) (that minimizes the ergodic system mutual information) uses γ_p = 1 for all p = 1, …, L in (34), so that S_p = (1/n_st) I_{n_st}.

Outline of the proof. Consider Ψ(S_1, …, S_L) for (S_1, …, S_L) of the form given by Lemma 5, which (from (2) and (3)) can be simplified, first using large η_{i,j} and the same simplifications used in (21), and then using those in (20), to

Ψ ≈ (1/ln 2) Σ_{p=1}^{L} (ρ_p/n_st) [ E{ Σ_{i,j} |ĥ(p,p)_{i,j}|² } + (1 − γ_p) E{ Σ_i Σ_{j≠j'} ĥ*(p,p)_{i,j} ĥ(p,p)_{i,j'} } ],   (38)

where ĥ(p,p)_{i,j} denotes the (i,j)th entry of the matrix R_p^{−1/2} H̃_{p,p}. Now note that antenna selection will attempt to make the second term in the last line of (38), which multiplies the positive constant 1 − γ_p, as large and positive as it possibly can. In fact, it is easy to argue that antenna selection can always make this term positive, as done previously for (20); we skip this since the problems are so similar. Thus we see that the best performance for (S_1, …, S_L) of the form given by Lemma 5 must be obtained for γ_p = 0 and the worst performance must occur at γ_p = 1. Since this is true for all p, the result in the theorem follows.
The result in Theorem 3 tells us that the signaling that is best for cases without interference and without selection is the worst for strong interference and selection. It appears that the best signaling for (S_1, …, S_L) of the form given by Lemma 5 (see the discussion in Theorem 3) may be the best signaling overall; however, it appears difficult to show this generally.
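The claim of Theorem 3 can be illustrated with a small Monte Carlo comparison. For simplicity, the sketch below omits antenna selection (the rank-deficiency effect that favors the all-ones covariance under strong interference appears even without it) and evaluates one user's conditional mutual information; `sys_mi` is an illustrative helper, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
n, rho, eta, trials = 2, 1.0, 100.0, 500

I2 = np.eye(n) / n        # scaled identity, trace 1
O2 = np.ones((n, n)) / n  # scaled all-ones (rank one), trace 1

def sys_mi(S, rng):
    # symmetric two-user setup: both users transmit with covariance S
    H11 = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    H12 = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    R = np.eye(n) + eta * H12 @ S @ H12.conj().T     # interference-plus-noise covariance
    M = np.eye(n) + rho * np.linalg.inv(R) @ H11 @ S @ H11.conj().T
    return np.linalg.slogdet(M)[1] / np.log(2)

mi_identity = np.mean([sys_mi(I2, rng) for _ in range(trials)])
mi_ones = np.mean([sys_mi(O2, rng) for _ in range(trials)])
```

With the all-ones covariance the interference is rank one, so the whitening matrix R⁻¹ leaves a nearly interference-free direction; with the identity, strong interference occupies every direction and the mutual information collapses, matching the theorem's prediction that the scaled identity is worst under strong interference.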
The following intuitive discussion gives some further insight. Due to convexity, the best performance will occur at a point as far away from the point giving worst performance as possible, that is, at an extreme point of the set of allowed covariance matrices.

NUMERICAL RESULTS

Consider the case of n_st = n_sr = L = 2, n_t = n_r = 8, η_{1,2} = η_{2,1} = η, and ρ_1 = ρ_2 = ρ, and assume that the optimum antenna selection (to optimize system mutual information) is employed. First consider the case of no interference and assume a set of covariance matrices of the form in (14), S_1 = S_2 = (1/2)[γ I_2 + (1 − γ)O_2]; since ρ_1 = ρ_2 = ρ and η_{1,2} = η_{2,1} = η, we set γ_1 = γ_2 = γ. Figure 1 shows a plot of the γ giving the largest mutual information versus SNR, for SNR (ρ) ranging from −10 dB to +10 dB. We see that the best performance for very small ρ is obtained for γ = 0, which is in agreement with our analytical results given previously. For large ρ, the best signaling uses γ = 1, which is also in agreement with our analytical results given previously. Figure 1 shows that the switch from where γ = 0 is optimum to where γ = 1 is optimum is very rapid and occurs near ρ = −3 dB.

Figure 1: Optimum γ versus ρ_1 = ρ_2 = SNR for cases with no interference and n_st = n_sr = 2, n_t = n_r = 8. Note that γ = 0 is the best for −10 dB < SNR < −3 dB and γ = 1 is the best for −2 dB < SNR < 10 dB.

Now consider cases with possible interference. Again consider the case of n_st = n_sr = L = 2, n_t = n_r = 8, η_{1,2} = η_{2,1} = η, and ρ_1 = ρ_2 = ρ, and assume that the optimum antenna selection (to optimize system mutual information) is employed. To simplify matters, we constrain S_1 = S_2 in all cases shown. First we considered three specific signaling covariance matrices: (1/2)I_2, (1/2)O_2, and the matrix with a single nonzero (unit) diagonal entry. We tried each of these for SNRs and INRs between −10 dB and +10 dB. Then we recorded which of the approaches provided the smallest and the largest system mutual information. These results can be compared with the analytical results given in Sections 3 and 4 of this paper for weak and strong interference and SNR.
Figure 2 shows the worst signaling we found versus SNR and INR for ρ_1 = ρ_2 = SNR and η_{1,2} = η_{2,1} = INR. For large INR, Figure 2 indicates that S_1 = S_2 = (1/2)I_2 leads to worst performance, which is in agreement with our analytical results given previously. Figure 2 also shows that for weak interference, either S_1 = S_2 = (1/2)I_2 (for weak SNR) or, for large SNR, S_1 = S_2 with only one nonzero entry (a one which must be along the diagonal) leads to worst performance.
For weak interference, Figure 3 shows that the best performance is achieved by either S_1 = S_2 = (1/2)O_2 (for weak SNR) or S_1 = S_2 = (1/2)I_2 (for large SNR). This agrees with our analytical results presented previously. Figure 3 also shows that the best performance is achieved by S_1 = S_2 = (1/2)O_2 for large interference, and this also agrees with our analytical results presented previously. We note that in the cases of interest (those for which we give analytical results), the difference in mutual information between the best and the worst approach in Figures 2 and 3 was about 1 to 3 bits/s/Hz. We selected a few SNR-INR points sufficiently far (greater than 2 dB) from the dividing curves in Figures 2 and 3. For these points, we attempted to obtain further information on whether the approaches shown to be the best and worst in Figures 2 and 3 are actually the best and the worst of all valid approaches under the assumption that S_1 = S_2. We did this by evaluating the system mutual information for

S_1 = S_2 = [ a  b ; b*  1 − a ],   (40)

for various values of the real constant a and the complex constant b on a grid. When we evaluated (40) for real a and b on a grid over a range of values consistent with the trace (power) and nonnegative definiteness constraints on S_1 = S_2, we found that the approaches in Figures 2 and 3 did indicate the overall best and worst approaches for the few cases we tried. Limited investigations involving complex b (here the extra dimension complicated matters, making strong conclusions difficult) indicated that these conclusions appear to generalize to complex b also.
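Figure 1's no-interference experiment can be sketched at a reduced scale (n_t = n_r = 4 rather than 8, to keep the exhaustive selection search cheap; all helper names are illustrative). At very weak SNR the all-ones covariance (γ = 0) should win, and at high SNR the scaled identity (γ = 1) should win:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)
n_t = n_r = 4
n_s = 2
O = np.ones((n_s, n_s))

def mi(Hs, S, rho):
    M = np.eye(n_s) + rho * Hs @ S @ Hs.conj().T
    return np.linalg.slogdet(M)[1] / np.log(2)

def selected_mi(H, S, rho):
    # optimum selection: exhaustive search maximizing instantaneous mutual information
    return max(mi(H[np.ix_(r, c)], S, rho)
               for r in combinations(range(n_r), n_s)
               for c in combinations(range(n_t), n_s))

def avg_mi(gamma, rho, trials=400):
    S = (gamma * np.eye(n_s) + (1 - gamma) * O) / n_s   # the form in (14)
    acc = 0.0
    for _ in range(trials):
        H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        acc += selected_mi(H, S, rho)
    return acc / trials

low_g0, low_g1 = avg_mi(0.0, 0.05), avg_mi(1.0, 0.05)    # rho = 0.05 (weak SNR)
high_g0, high_g1 = avg_mi(0.0, 10.0), avg_mi(1.0, 10.0)  # rho = 10 (large SNR)
```

The crossover SNR will generally differ from the −3 dB observed in the paper's 8-antenna case, since fewer antennas provide less selection gain.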

Partitioning the SNR-INR Plane
Based on Sections 3 and 4, we see that generally the space of all SNRs ρ_i, i = 1, …, L, and INRs η_{i,j}, i, j = 1, …, L, i ≠ j, can be divided into three regions: one where the interference is considered weak (where Figure 1 and its generalization apply), one where the interference dominates (where Figure 3 and its generalization apply), and a transition region between the two. For the case with n_st = n_sr = L = 2, n_t = n_r = 8, η_{1,2} = η_{2,1} = η, and ρ_1 = ρ_2 = ρ, we have used (12) to study the three regions. We evaluated (12) numerically using Monte Carlo simulations over a very fine grid covering −10 dB ≤ ρ ≤ 10 dB and −10 dB ≤ η ≤ 10 dB. For each given point in SNR-INR space, we evaluated (12) for many different choices of (S_1, …, S_L), (Ŝ_1, …, Ŝ_L), and the scalar t, and checked for a consistently positive or negative value of (12) over all choices on the discrete grid (quantizing each scalar variable, including those in each entry of each matrix). In this way, we obtained an approximate view of the three regions. We found that generally, for points sufficiently far (more than 2 dB from the closest curve) from the two dividing curves in Figures 2 and 3, the convexity or concavity follows that of the asymptotic case (strong or weak INR) in the given region. Thus the asymptotic results appear to give valuable conclusions about finite SNR and INR cases. Limited numerical investigations suggest this is true in other cases as well, but the high dimensionality of the problem (especially for n_st, n_sr, L > 2) makes strong conclusions difficult.

CONCLUSIONS
We have analyzed the (mutual information) optimum signaling for cases where multiple users interfere while using single-user detection and antenna selection. We concentrated on extreme cases with very weak or very strong interference. We found that the best signaling is sometimes different from the scaled identity matrix that is best with no interference and no antenna selection. In fact, this is true even for cases without interference if the SNR is sufficiently weak. Further, the scaled identity matrix is actually the covariance matrix that yields the worst performance if the interference is sufficiently strong.