Multimode Transmission in Network MIMO Downlink with Incomplete CSI

1 Department of Signals and Systems, Chalmers University of Technology, 412 96 Gothenburg, Sweden
2 Wireless Networking and Communications Group (WNCG), Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712-0240, USA
3 Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology (HKUST), Clear Water Bay, Kowloon, Hong Kong
4 Ericsson Research, Ericsson AB, 417 56 Gothenburg, Sweden


Introduction
Recently, cooperative multicell transmission (also called network MIMO) has been proposed as an efficient way to suppress the intercell interference and increase the downlink capacity of cellular systems [1][2][3]. In one way of realizing a network MIMO system, multiple base stations (BSs) are connected to a central unit via backhaul links. The central unit coordinates the BSs and performs joint scheduling and signal processing operations. Assuming no limitations with regard to the capacity, error, and delay in the backhaul, and given perfect channel state information (CSI) of all users at the central unit, the network MIMO system in the downlink is equivalent to a MIMO broadcast channel with per-BS power constraint (PBPC) [4][5][6][7].
Many of the previous studies on network MIMO assume full coordination over the whole system, which is impractical, if not impossible. First, the backhaul links connecting the BSs and the central unit are subject to transmission error [8] and delay [9]. They also have limited capacity, which confines the amount of data and CSI sharing [10][11][12]. Second, connecting a large number of BSs for joint processing is highly complex, which has motivated the development of coordination strategies at a local scale [13,14]. Third, obtaining perfect CSI from all the users at the central unit, which is indispensable to achieve the full diversity or multiplexing gains, results in a substantial training and feedback overhead [15][16][17].
In this paper, we consider local BS cooperation and focus on the third limitation, that is, the substantial overhead to obtain the CSI for each active user at the central unit. To this end, we assume the backhaul links to be perfect and leave the study of the effect of imperfect backhaul to future work. We propose a framework that enables scheduling based only on the knowledge of the average received SNR at each user from all the cooperating BSs, denoted as incomplete CSI. This reduces the overhead both on the feedback channel and the backhaul, since only the selected users (usually a small number) have to feed back their instantaneous CSI for precoder design.

Related Work.
The scheduling problem in the single-cell multiuser MIMO downlink has been widely investigated under various precoding and beamforming strategies [18][19][20][21][22]. The total number of transmitted data streams in such systems is upper bounded by the number of BS antennas under a linear precoding framework. Therefore, if the total number of receive antennas in the system is greater than the total number of transmit antennas, the scheduling consists of selecting both the users and the number of data streams, or modes, for each user. (Note that the term "mode" in this paper denotes the number of data streams for a given user, rather than the number of active users as in [22] or different MIMO transmission techniques such as spatial multiplexing/diversity modes [23].) Such multimode transmission improves performance by allowing a dynamic allocation of the transmission resources among the users [24]. User and mode selection in a network MIMO system is more challenging than in its single-cell counterpart. The increased number of users and BSs in the network MIMO makes the CSI requirement daunting.
Acquiring CSI at the central unit is one of the limiting factors for a practical network MIMO system. The availability of perfect CSI, however, has a cardinal role in exploiting the spatial degrees of freedom in such systems [13]. In practice, CSI for the downlink is obtained through some form of training and feedback. In time-division duplexing (TDD) systems, the CSI is obtained at each BS using the channel reciprocity (see, e.g., [25]). In frequency-division duplexing (FDD) systems, since uplink and downlink take place in widely separated frequency bands, the downlink CSI is fed back via some explicit feedback channels (see e.g., [26]). This places a significant burden on the uplink feedback channel. The feedback overhead increases with the number of BSs, users, antennas, and subcarriers and can easily occupy the whole uplink resources. Furthermore, in both TDD and FDD systems, the CSI should be forwarded from the BSs to the central unit which limits the backhaul resources for data transmission.
The tradeoff between the resources dedicated to CSI overhead and data transmission in the backhaul has been recently studied in [15][16][17], where several multicell system architectures were compared. It was further shown that the downlink performance of network MIMO systems is mainly limited by the inevitable acquisition of CSI rather than by limited backhaul capacity.
As a solution to this limitation, some authors [27,28] have proposed strategies based on local CSI at the BSs and statistical CSI at the central unit, whereas others [29,30] consider serving only certain subsets of users with multiple BSs. In [31], a decentralized cooperation framework has been proposed in which all the necessary processing is performed in a truly distributed manner among the BSs without the need for any CSI exchange with the central unit. Several BS cooperation strategies have been studied which consider the combination of limited-capacity backhaul and imperfect CSI [32,33].

Contributions.
In this paper, we develop a scheduling algorithm for a network MIMO system with multiantenna users. To reduce the feedback overhead, we adopt a two-step scheduling process: the first step is joint user and mode selection, and the second step is the feedback and precoder design, which only involves the users selected in the first step. The two-step multimode transmission strategy was also proposed for the single-cell MIMO downlink in [22], with single-antenna users and imperfect CSI at the BS. The main contributions are as follows.
Ergodic Rate Analysis. We propose an analytical framework to compute an accurate approximation for the ergodic rate of each user with different number of data streams, based only on the knowledge of incomplete CSI for any given location. Essentially, the aggregate channel from multiple distributed cooperating BSs can be well approximated as coming from a single super BS. This enables an efficient method to evaluate the performance of network MIMO systems without the need for extensive and computationally intensive Monte-Carlo simulations.
Joint User and Mode Selection Algorithm. We use the derived ergodic user rates as a metric to perform user and mode selection, which suits data applications without stringent delay constraints. Since the ergodic rate for each user is obtained based only on incomplete CSI, the small-scale fading is not exploited in the proposed strategy. Therefore, it does not provide small-scale multiuser diversity gain; instead, it removes the need for instantaneous CSI feedback from a large number of users for scheduling, which reduces the feedback load by more than 93% in the scenario studied in Section 6. It is also shown that the performance of the proposed user and mode-selection strategy is very close to that of opportunistic scheduling based on instantaneous CSI feedback from all the users.

Organization.
The rest of the paper is organized as follows: the system model and transmission strategy are described in Section 2. In Section 3, some mathematical preliminaries, which are useful throughout the paper, are presented. An analytical framework to derive an approximation for the ergodic rate of each user at different modes is proposed in Section 4. A greedy joint user and mode selection algorithm based on the derived ergodic rates of each user is described in Section 5. The performance of the proposed user and mode-selection algorithm is evaluated in Section 6. Finally, Section 7 concludes the paper and discusses the future work.
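$[\boldsymbol{\Phi}]_{(m,n)}$ denotes the $m \times n$ upper-left corner of a square matrix $\boldsymbol{\Phi}$. $\lambda_i(\boldsymbol{\Phi})$ and $\lambda_{\min}(\boldsymbol{\Phi})$ denote the $i$th ordered and the smallest eigenvalue of $\boldsymbol{\Phi}\boldsymbol{\Phi}^*$, respectively.
$\|\mathbf{x}\|$ denotes the Euclidean norm of a complex vector $\mathbf{x}$, and $|\mathcal{S}|$ is the cardinality of a set $\mathcal{S}$; $\dim(\cdot)$ is the dimensionality operator. Further, $\lfloor \cdot \rfloor$ denotes the floor operation, $\boldsymbol{\Phi} \otimes \boldsymbol{\Psi}$ denotes the Kronecker product of the two matrices $\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}$, and $C^n_m$ denotes the number of combinations of $m$ elements chosen from $n$.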

System Model
2.1. Network MIMO Structure. The network MIMO system considered in this paper comprises $B$ cells, each of which has a BS with $N_t$ antennas and $K_b$ users, each equipped with $N_r$ antennas, for $K_b \geq 1$ and $b = 1, 2, \ldots, B$. The total number of active users in the system is denoted as $K = \sum_{b=1}^{B} K_b$. Users in different locations of the cellular coverage are subject to distance-dependent pathloss and shadowing. A narrowband frequency-flat fading channel is considered. We consider the downlink transmission. The following are the key assumptions made in this paper.
Assumption 1. All the $B$ cooperating BSs are interconnected via a central unit with the use of backhaul links with infinite capacity such that they can fully share CSI and user data.
With this assumption, all the cooperating BSs form a distributed antenna array that can perform joint scheduling and transmission.
Assumption 2. The number of antennas at each BS is no smaller than that of each user, that is, $N_t \geq N_r$.
Due to space constraints, user terminals can only have a small number of antennas, which makes $N_t \geq N_r$ a reasonable assumption. Therefore, each user can have at most $N_r$ data streams. This assumption on the scheduling CSI significantly reduces the feedback and backhaul signaling overhead for scheduling. The transmission CSI assumption is due to the transmission strategy employed in this paper (see Section 2.3), which reduces the transmission CSI required for precoder design relative to strategies that require the complete channel matrix of each selected user. The transmission CSI can be reduced even further using limited feedback techniques [34][35][36], which we do not explore in this paper.

Received Signal Model.
The aggregate channel matrix of user $k$ from all the $B$ cooperating BSs can be written as
$$\mathbf{H}_k = \left[\sqrt{\rho_{k,1}}\,\mathbf{H}_{k,1}, \ldots, \sqrt{\rho_{k,B}}\,\mathbf{H}_{k,B}\right], \tag{1}$$
where $\mathbf{H}_{k,b} \in \mathbb{C}^{N_r \times N_t}$ represents the small-scale fading channel matrix and $\rho_{k,b}$ is the large-scale fading channel coefficient that captures the distance-dependent pathloss including shadowing for user $k$ from the $b$th BS. We denote the $N_t \times 1$ transmit signal vector from the $b$th BS as $\mathbf{x}_b$. Therefore, the $BN_t \times 1$ aggregate transmit signal vector from all the $B$ cooperating BSs can be written as
$$\mathbf{x} = \left[\mathbf{x}_1^T, \ldots, \mathbf{x}_B^T\right]^T. \tag{2}$$
The discrete-time complex baseband signal received by the $k$th user is given by
$$\mathbf{y}_k = \mathbf{H}_k \mathbf{x} + \mathbf{n}_k, \tag{3}$$
where $\mathbf{n}_k$ is the noise vector at the $k$th user, with entries that are independent and identically distributed complex Gaussian with zero mean and unit variance, denoted as i.i.d. $\mathcal{CN}(0, 1)$.
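The received signal model can be illustrated with a short numerical sketch. It assumes that each per-BS block is an i.i.d. $\mathcal{CN}(0,1)$ matrix scaled by the square root of its large-scale coefficient, so that $\rho_{k,b}$ plays the role of an average received SNR; the function names, the example values of $\rho_{k,b}$, and the all-ones transmit vector are illustrative and not taken from the paper.

```python
import numpy as np

def aggregate_channel(rho, n_r, n_t, rng):
    """Stack the per-BS channels of one user side by side.

    rho: large-scale coefficients rho_{k,b}, one per cooperating BS.
    Returns an (n_r x B*n_t) matrix with blocks sqrt(rho_b) * H_{k,b}.
    """
    blocks = []
    for rho_b in rho:
        # Small-scale fading block with i.i.d. CN(0, 1) entries.
        h = (rng.standard_normal((n_r, n_t))
             + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        blocks.append(np.sqrt(rho_b) * h)
    return np.hstack(blocks)

def received_signal(h_k, x, rng):
    """y_k = H_k x + n_k with unit-variance complex Gaussian noise."""
    n_r = h_k.shape[0]
    noise = (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)) / np.sqrt(2)
    return h_k @ x + noise

rng = np.random.default_rng(0)
H = aggregate_channel([1.0, 0.25, 0.01], n_r=2, n_t=4, rng=rng)  # B = 3 BSs
x = np.ones(12, dtype=complex)  # placeholder aggregate transmit vector
y = received_signal(H, x, rng)
```

With $B = 3$, $N_t = 4$, and $N_r = 2$, the aggregate channel is $2 \times 12$, which is the dimension seen by the precoder in the following subsection.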

Transmission Strategy.
To simultaneously transmit multiple spatially multiplexed streams to multiple users, we adopt a linear precoding strategy called multiuser eigenmode transmission (MET) [37]. (The framework developed in this paper, however, can be used with any other linear precoding in which the precoding matrix for each user depends only on the other users' channels.) The MET approach enables the number of data streams for each user to be adaptively selected and at the same time avoids the complexity of joint iterative precoder/equalizer design [38]. Denote $\mathcal{K}$ as the set of served users at a given time interval and assign indices $k = 1, \ldots, |\mathcal{K}|$. Denote $\mathcal{L}_k$ as the set of eigenmodes selected for transmission to user $k$, which are indexed from 1 to $\ell_k$, where $\ell_k = |\mathcal{L}_k|$. Under a linear precoding framework, the total number of data streams in the downlink, denoted as the system transmission mode (STM), is upper bounded by the number of transmit antennas and can be written as
$$L = \sum_{k=1}^{|\mathcal{K}|} \ell_k, \tag{4}$$
where $1 \leq L \leq BN_t$. The aggregate transmitted signal is given by
$$\mathbf{x} = \sum_{k=1}^{|\mathcal{K}|} \mathbf{T}_k \mathbf{d}_k, \tag{5}$$

EURASIP Journal on Advances in Signal Processing
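where $\mathbf{T}_k \in \mathbb{C}^{BN_t \times \ell_k}$ is the precoding matrix and $\mathbf{d}_k$ denotes the $\ell_k$-dimensional signal vector for user $k$. It is assumed that each user $k$ at a given time slot is able to perfectly estimate its channel matrix $\mathbf{H}_k$ without any error. Furthermore, each user $k$ performs a singular value decomposition (SVD) on its channel as $\mathbf{H}_k = \mathbf{U}_k \boldsymbol{\Sigma}_k \mathbf{V}_k^*$. We denote the $i$th singular value and the corresponding left and right singular vectors of $\mathbf{H}_k$ as $\sigma_{k,i}$, $\mathbf{u}_{k,i}$, and $\mathbf{v}_{k,i}$, respectively. We also assume that the singular values in $\boldsymbol{\Sigma}_k$ are arranged in descending order, that is, $\sigma_{k,1} \geq \sigma_{k,2} \geq \cdots \geq \sigma_{k,N_r}$. The $k$th user's receiver is a linear equalizer given by $[\mathbf{U}_k]^*_{:,(1:\ell_k)}$. Using (3) and (5), the postprocessed signal $\mathbf{r}_k$ after applying the linear equalizer is given by
$$\mathbf{r}_k = \mathbf{F}_k \mathbf{T}_k \mathbf{d}_k + \mathbf{F}_k \sum_{j=1,\, j \neq k}^{|\mathcal{K}|} \mathbf{T}_j \mathbf{d}_j + \mathbf{w}_k, \tag{6}$$
where $\mathbf{F}_k = [\mathbf{U}_k]^*_{:,(1:\ell_k)} \mathbf{H}_k$ and $\mathbf{w}_k$ is the processed noise, which is still white since the columns of the equalizer are orthonormal. In the case of perfect knowledge of $\mathbf{F}_1, \ldots, \mathbf{F}_{|\mathcal{K}|}$, denote the aggregate interference matrix as
$$\tilde{\mathbf{H}}_k = \left[\mathbf{F}_1^T \cdots \mathbf{F}_{k-1}^T \;\; \mathbf{F}_{k+1}^T \cdots \mathbf{F}_{|\mathcal{K}|}^T\right]^T.$$
To suppress the interuser interference, the constraint $\mathbf{F}_j \mathbf{T}_k = \mathbf{0}$ must be satisfied for $k \neq j$ [39]. This requires that $\mathbf{T}_k$ lies in the null space of $\tilde{\mathbf{H}}_k$. With this constraint satisfied, the second term on the right-hand side of (6) becomes zero. Denote the total number of interfering data streams for user $k$ from the other $(|\mathcal{K}| - 1)$ selected users as $\tilde{\ell}_k = \sum_{j=1,\, j \neq k}^{|\mathcal{K}|} \ell_j$. As a result, there are only $BN_t - \tilde{\ell}_k$ spatial degrees of freedom available at the transmitter side to support spatial multiplexing for user $k$, and therefore $\ell_k \leq BN_t - \tilde{\ell}_k$. In [40], it was shown that the precoding matrix $\mathbf{T}_k$ can be written as a cascade of two precoding matrices $\mathbf{B}_k$ and $\mathbf{D}_k$, that is,
$$\mathbf{T}_k = \mathbf{B}_k \mathbf{D}_k,$$
where the $BN_t \times (BN_t - \tilde{\ell}_k)$ matrix $\mathbf{B}_k$ removes the interuser interference. Let $\tilde{\mathbf{V}}_k^{(0)}$ denote the right singular vectors of $\tilde{\mathbf{H}}_k$ associated with the null modes. One natural choice is $\mathbf{B}_k = \tilde{\mathbf{V}}_k^{(0)}$. As a matter of fact, $\mathbf{F}_k \mathbf{B}_k$ is the effective interuser interference-free channel for user $k$. The $(BN_t - \tilde{\ell}_k) \times \ell_k$ matrix $\mathbf{D}_k$ is used for parallelization.
Denote by $\mathbf{V}_k^{(1)}$ the right singular vectors of the effective channel $\mathbf{F}_k \mathbf{B}_k$ corresponding to its first $\ell_k$ nonzero singular values. The optimum choice of $\mathbf{D}_k$ is then $\mathbf{D}_k = \mathbf{V}_k^{(1)}$.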
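The two-stage construction of the MET precoder can be sketched numerically as follows. This is an illustrative implementation under the assumptions stated in the text (null-space projection for $\mathbf{B}_k$, dominant right singular vectors of the effective channel for $\mathbf{D}_k$); the function name `met_precoders`, the rank tolerance, and the example dimensions are hypothetical choices, and every user is assumed to have at least one selected mode.

```python
import numpy as np

def met_precoders(channels, modes):
    """Compute F_k and T_k = B_k D_k for each served user.

    channels: list of (N_r x B*N_t) aggregate channel matrices H_k.
    modes:    list of ell_k >= 1, the number of streams per user.
    """
    # Receive processing: keep the ell_k strongest left singular vectors.
    F = []
    for H, ell in zip(channels, modes):
        U, _, _ = np.linalg.svd(H)
        F.append(U[:, :ell].conj().T @ H)

    T = []
    n_tx = channels[0].shape[1]
    for k, ell in enumerate(modes):
        others = [F[j] for j in range(len(F)) if j != k]
        if others:
            H_tilde = np.vstack(others)           # aggregate interference matrix
            _, s, Vh = np.linalg.svd(H_tilde)     # Vh is (B*N_t x B*N_t)
            rank = int(np.sum(s > 1e-10 * s[0]))
            B_k = Vh[rank:].conj().T              # basis of the null space
        else:
            B_k = np.eye(n_tx, dtype=complex)
        # Parallelize the interference-free effective channel F_k B_k.
        _, _, Vh_eff = np.linalg.svd(F[k] @ B_k)
        D_k = Vh_eff[:ell].conj().T
        T.append(B_k @ D_k)
    return F, T

rng = np.random.default_rng(1)
B, N_t, N_r = 3, 4, 2
channels = [(rng.standard_normal((N_r, B * N_t))
             + 1j * rng.standard_normal((N_r, B * N_t))) / np.sqrt(2)
            for _ in range(3)]
modes = [2, 1, 2]
F, T = met_precoders(channels, modes)
# Largest residual interuser interference ||F_j T_k|| over all j != k.
leak = max(np.linalg.norm(F[j] @ T[k])
           for j in range(3) for k in range(3) if j != k)
```

In this example the leakage is at the level of numerical precision, which reflects the constraint $\mathbf{F}_j \mathbf{T}_k = \mathbf{0}$ for $k \neq j$.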

Mathematical Preliminaries
In this section, we present some mathematical preliminaries from matrix variate distributions and random matrix theory which prove useful in the analysis to follow. For more detailed discussions, the readers are referred to [42][43][44]. Definition 1. Let Z denote a q × p complex matrix with q ≤ p and a common covariance matrix C = E{z j z * j } for all j, where z j is the jth column vector of Z. The elements of two columns z i and z j are assumed to be mutually independent. If the elements of Z are identically distributed as CN (0, 1) such that E{Z} = 0, then the Hermitian matrix ZZ * is a central Wishart matrix with p degrees of freedom and covariance matrix C, denoted as ZZ * ∼ CW q (p, C).

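Approximation of a Linear Combination of Wishart Matrices.
Let $\mathbf{Y}_s \sim \mathcal{CW}_q(p_s, \mathbf{C}_s)$ for $s = 1, 2, \ldots, S$ be mutually independent central Wishart matrices. Consider a linear combination
$$\mathbf{Y} = \sum_{s=1}^{S} \alpha_s \mathbf{Y}_s. \tag{9}$$
The distribution of $\mathbf{Y}$ can be approximated by the distribution of another Wishart matrix as $\mathbf{Y} \sim \mathcal{CW}_q(\tilde{p}, \tilde{\mathbf{C}})$ [42, page 124], where $\tilde{p}$ is the equivalent degrees of freedom given by
$$\tilde{p} = \left[\frac{\det\left(\sum_{s=1}^{S} \alpha_s p_s \mathbf{C}_s\right)^{2q}}{\det\left(\sum_{s=1}^{S} \alpha_s^2 p_s (\mathbf{C}_s \otimes \mathbf{C}_s)\right)}\right]^{1/q^2} \tag{10}$$
and $\tilde{\mathbf{C}}$ denotes the equivalent covariance matrix written as
$$\tilde{\mathbf{C}} = \frac{1}{\tilde{p}} \sum_{s=1}^{S} \alpha_s p_s \mathbf{C}_s. \tag{11}$$
Now, if $p_1 = \cdots = p_S = p$ and $\mathbf{C}_1 = \cdots = \mathbf{C}_S = \mathbf{C}$, then (10) can be rewritten as
$$\tilde{p} = \left[\frac{\left(p \sum_{s=1}^{S} \alpha_s\right)^{2q^2} \det(\mathbf{C})^{2q}}{\left(p \sum_{s=1}^{S} \alpha_s^2\right)^{q^2} \det(\mathbf{C} \otimes \mathbf{C})}\right]^{1/q^2}. \tag{12}$$
Using the determinant property of the Kronecker product [42, Chapter 3], that is, $\det(\mathbf{C} \otimes \mathbf{C}) = \det(\mathbf{C})^{2q}$, in (12), we can obtain
$$\tilde{p} = p\,\frac{\left(\sum_{s=1}^{S} \alpha_s\right)^2}{\sum_{s=1}^{S} \alpha_s^2}. \tag{13}$$
By substituting (13) in (11), it then holds that
$$\tilde{\mathbf{C}} = \frac{\sum_{s=1}^{S} \alpha_s^2}{\sum_{s=1}^{S} \alpha_s}\,\mathbf{C}. \tag{14}$$
Finally, recall from Definition 1 that the condition $q \leq \tilde{p}$ should hold for the Wishart distribution $\mathcal{CW}_q(\tilde{p}, \tilde{\mathbf{C}})$ to be meaningful. In the following theorem, the upper and lower bounds for $\tilde{p}$ are obtained.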
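A small numerical check may help to see how the equivalent parameters behave. The sketch below assumes the moment-matching form of the approximation, in which the mean $\sum_s \alpha_s p_s \mathbf{C}_s$ and the determinant of $\sum_s \alpha_s^2 p_s (\mathbf{C}_s \otimes \mathbf{C}_s)$ are matched to those of a single Wishart matrix, and compares it with the closed form for equal $p_s$ and $\mathbf{C}_s$. The function names and the example values are illustrative.

```python
import numpy as np

def equivalent_wishart(alphas, dofs, covs):
    """Equivalent degrees of freedom and covariance of sum_s alpha_s Y_s."""
    q = covs[0].shape[0]
    mean = sum(a * p * C for a, p, C in zip(alphas, dofs, covs))
    second = sum(a**2 * p * np.kron(C, C) for a, p, C in zip(alphas, dofs, covs))
    # Log-determinants avoid overflow when raising det(.) to the power 2q.
    _, logdet_mean = np.linalg.slogdet(mean)
    _, logdet_second = np.linalg.slogdet(second)
    p_eq = np.exp((2 * q * logdet_mean - logdet_second) / q**2)
    return p_eq, mean / p_eq

def equivalent_dof_equal(alphas, p):
    """Closed form for p_1 = ... = p_S = p and C_1 = ... = C_S."""
    alphas = np.asarray(alphas, dtype=float)
    return p * alphas.sum() ** 2 / np.sum(alphas**2)

C = np.array([[1.0, 0.3], [0.3, 1.0]])
alphas = [1.0, 0.5, 0.1]
p_general, C_general = equivalent_wishart(alphas, [4, 4, 4], [C, C, C])
p_closed = equivalent_dof_equal(alphas, 4)
```

Both routes give the same equivalent degrees of freedom in this equal-parameter case, and the equivalent covariance is a scaled copy of $\mathbf{C}$.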

Theorem 1.
Assume that $\alpha_s \geq 0$ for $s = 1, \ldots, S$ and also that at least one of the $\alpha_s$'s is nonzero. If $\tilde{p}$ is defined in (13), then $p \leq \tilde{p} \leq Sp$. Furthermore, the upper bound holds with equality when $\alpha_1 = \alpha_2 = \cdots = \alpha_S$, while the lower bound holds with equality when $\exists! s : \alpha_s > 0$ ($\exists!$ means there exists one and only one).
Proof. See Appendix A.

Truncation of Random Unitary Matrices and Jacobi Ensemble
Definition 2. If X ∼ CW m (n 1 , C) and Y ∼ CW m (n 2 , C) are independent complex Wishart matrices, then J = X(X + Y) −1 is called a complex Jacobi matrix.
It is shown in [45,Proposition 4.1] that J has the same distribution as that of [U] (q,p) [U] * (q,p) , where U ∈ U(n, n) with q = m, p = n 1 , and n = n 1 + n 2 . Therefore, the eigenvalues of J are the same as those of [U] (q,p) [U] * (q,p) . The distribution of the extreme eigenvalues of the complex Jacobi ensemble is derived in [44].

Ergodic Rate Analysis
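In this section, we derive an approximation for the ergodic rate of each user $k$ at different modes $\ell_k$. To assist the analysis, we assume that the elements of $\mathbf{H}_{k,b}$ are distributed such that
$$\mathbf{H}_{k,b}\mathbf{H}_{k,b}^* \sim \mathcal{CW}_{N_r}(N_t, \mathbf{C}).$$
Let the precoding matrix for user $k$ be written as
$$\mathbf{T}_k = \left[\mathbf{T}_{k,1}^T, \ldots, \mathbf{T}_{k,B}^T\right]^T,$$
where $\mathbf{T}_{k,b}$ denotes the precoding applied at the $b$th BS for user $k$, such that the transmitted signal from the $b$th BS can be written as
$$\mathbf{x}_b = \sum_{k=1}^{|\mathcal{K}|} \mathbf{T}_{k,b}\mathbf{d}_k.$$
Assuming MET and the practical per-BS power constraint (PBPC) with STM equal to $L$, the ergodic rate of a user $k$ with $\ell_k$ data streams using (6) can be expressed as
$$R_k(\ell_k, L) = E\left[\log\det\left(\mathbf{I}_{\ell_k} + \mathbf{F}_k\mathbf{T}_k\mathbf{Q}_k\mathbf{T}_k^*\mathbf{F}_k^*\right)\right] \tag{16}$$
subject to
$$\sum_{k=1}^{|\mathcal{K}|} \mathrm{Tr}\left(\mathbf{T}_{k,b}\mathbf{Q}_k\mathbf{T}_{k,b}^*\right) \leq P, \quad b = 1, \ldots, B, \tag{17}$$
where $\mathbf{Q}_k = E\{\mathbf{d}_k\mathbf{d}_k^*\}$ is the power allocation matrix for user $k$ and $P$ is the power constraint at each BS. Since the total power constraint (TPC) over all the BSs is less restrictive, the performance under TPC is equal to or better than that under PBPC. It has also been shown that there is only a marginal rate loss of PBPC with respect to TPC [13]. Therefore, for simplicity and analytical tractability, we assume TPC and equal power allocation among all the $L$ data streams in the downlink, that is, $\mathbf{Q}_k = (BP/L)\mathbf{I}_{\ell_k}$. The ergodic rate in (16) can then be written as [37]
$$R_k(\ell_k, L) = E\left[\log\det\left(\mathbf{I}_{\ell_k} + \frac{BP}{L}\mathbf{F}_k\mathbf{T}_k\mathbf{T}_k^*\mathbf{F}_k^*\right)\right] \overset{(a)}{=} E\left[\log\det\left(\mathbf{I}_{\ell_k} + \frac{BP}{L}\mathbf{T}_k^*\mathbf{F}_k^*\mathbf{F}_k\mathbf{T}_k\right)\right] = \sum_{i=1}^{\ell_k} E\left[\log\left(1 + \frac{BP}{L}\lambda_i(\mathbf{F}_k\mathbf{B}_k)\right)\right], \tag{18}$$
where (a) follows using the matrix identity $\det(\mathbf{I} + \mathbf{A}\mathbf{B}) = \det(\mathbf{I} + \mathbf{B}\mathbf{A})$. Therefore, the ergodic rate of a user $k$ at mode $\ell_k$ depends on the distributions of $\lambda_i(\mathbf{F}_k\mathbf{B}_k)$ for all $i$. In order to compute the distribution of $\lambda_i(\mathbf{F}_k\mathbf{B}_k)$ in the network MIMO case, we provide the following result.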
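Lemma 1. Suppose that $\mathbf{B}_k \in \mathbb{C}^{BN_t \times (BN_t - \tilde{\ell}_k)}$ is the matrix that projects the channel of user $k$ onto the null space of other users and is independent of $\mathbf{F}_k$. Assume $\mathbf{F}_k$ and $\mathbf{B}_k$ have SVDs given by $\mathbf{F}_k = \mathbf{U}_{F_k}\boldsymbol{\Sigma}_{F_k}\mathbf{V}_{F_k}^*$ and $\mathbf{B}_k = \mathbf{U}_{B_k}\boldsymbol{\Sigma}_{B_k}\mathbf{V}_{B_k}^*$, respectively. It then holds that
$$\lambda_i(\mathbf{F}_k\mathbf{B}_k) \geq \lambda_i(\mathbf{F}_k)\,\lambda_{\min}\left(\left[\mathbf{V}_{F_k}^*\mathbf{U}_{B_k}\right]_{(i,(BN_t - \tilde{\ell}_k))}\right). \tag{19}$$
Proof. See Appendix B.
We denote $\lambda_{\min}([\mathbf{V}_{F_k}^*\mathbf{U}_{B_k}]_{(i,(BN_t - \tilde{\ell}_k))})$ by $\lambda'_{\min}$ hereafter for ease of notation. Denote the joint probability density function (pdf) of $\lambda_i(\mathbf{F}_k)$ and $\lambda'_{\min}$ as $f_{(\lambda_i(\mathbf{F}_k),\lambda'_{\min})}(\lambda, \lambda')$, and let $f_{\lambda_i(\mathbf{H}_k)}(\lambda)$ and $f_{\lambda'_{\min}}(\lambda')$ denote the marginal pdfs of $\lambda_i(\mathbf{F}_k)$ and $\lambda'_{\min}$, respectively. Using the result of Lemma 1 in (18) and the approximation $\log(1 + x) \approx \log(x)$, we can get an approximation for the ergodic rate for user $k$ as
$$R_k(\ell_k, L) \approx \hat{R}_k(\ell_k, L) = \sum_{i=1}^{\ell_k} \int_0^1\!\!\int_0^\infty \log\left(\frac{BP}{L}\lambda\lambda'\right) f_{(\lambda_i(\mathbf{F}_k),\lambda'_{\min})}(\lambda, \lambda')\,d\lambda\,d\lambda' \overset{(a)}{=} \sum_{i=1}^{\ell_k}\left[\log\frac{BP}{L} + \int_0^\infty \log(\lambda)\, f_{\lambda_i(\mathbf{H}_k)}(\lambda)\,d\lambda + \int_0^1 \log(\lambda')\, f_{\lambda'_{\min}}(\lambda')\,d\lambda'\right], \tag{20}$$
where (a) follows from the fact that $\lambda_i(\mathbf{F}_k) = \lambda_i(\mathbf{H}_k)$ for all $i$, which results from (6).
Since $\mathbf{H}_k\mathbf{H}_k^* = \sum_{b=1}^{B} \rho_{k,b}\mathbf{H}_{k,b}\mathbf{H}_{k,b}^*$ is a linear combination of independent Wishart matrices, its distribution can be approximated by that of another Wishart matrix $\tilde{\mathbf{H}}_k\tilde{\mathbf{H}}_k^* \sim \mathcal{CW}_{N_r}(\tilde{N}_{t,k}, \tilde{\rho}_k\mathbf{C})$, where $\tilde{N}_{t,k}$ and $\tilde{\rho}_k$ are obtained using (13) and (14) as (since the dimensions of $\tilde{\mathbf{H}}_k$ must be integers, we use $\lfloor x + 0.5 \rfloor$ to round $x$ to the nearest integer)
$$\tilde{N}_{t,k} = \left\lfloor N_t\,\frac{\left(\sum_{b=1}^{B}\rho_{k,b}\right)^2}{\sum_{b=1}^{B}\rho_{k,b}^2} + 0.5 \right\rfloor \tag{21}$$
and
$$\tilde{\rho}_k = \frac{\sum_{b=1}^{B}\rho_{k,b}^2}{\sum_{b=1}^{B}\rho_{k,b}}. \tag{22}$$
Remark 1. We note that $\tilde{N}_{t,k}$ is a function of $\rho_{k,b}$ for $b = 1, \ldots, B$, which depends on the position of user $k$. Therefore, for user $k$ at any given position in the cell, the $N_r \times \tilde{N}_{t,k}$ i.i.d. channel matrix $\tilde{\mathbf{H}}_k \sim \mathcal{CN}(\mathbf{0}, \tilde{\rho}_k\mathbf{C})$ can be interpreted as if the user is communicating with one super BS with $\tilde{N}_{t,k}$ transmit antennas and the equivalent large-scale channel coefficient $\tilde{\rho}_k$. Furthermore, according to Theorem 1, the maximum of $\tilde{N}_{t,k}$ is $BN_t$, which corresponds to positions where $\rho_{k,1} = \cdots = \rho_{k,B}$. At other positions, however, where user $k$ experiences larger $\rho_{k,b}$ values from some of the BSs and smaller values from the others, $\tilde{N}_{t,k}$ will be smaller than $BN_t$. It can be concluded that $\tilde{N}_{t,k}$ is determined mainly by those BSs to which the user has the largest $\rho_{k,b}$ values, and those are the ones that help the cooperation and are actually seen by the user.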
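The super-BS interpretation in Remark 1 can be made concrete with a few lines of code. The sketch assumes that the equivalent number of antennas is $N_t (\sum_b \rho_{k,b})^2 / \sum_b \rho_{k,b}^2$ rounded to the nearest integer and that the equivalent large-scale coefficient is $\sum_b \rho_{k,b}^2 / \sum_b \rho_{k,b}$, which is what the equal-parameter Wishart approximation gives with $\alpha_s = \rho_{k,b}$ and $p = N_t$; the function name and the example coefficients are illustrative.

```python
import math

def super_bs_parameters(rho, n_t):
    """Equivalent antenna count and large-scale coefficient of the super BS.

    rho: large-scale coefficients rho_{k,b} of one user, one per BS.
    n_t: number of antennas per BS.
    """
    s1 = sum(rho)
    s2 = sum(r * r for r in rho)
    n_eq = math.floor(n_t * s1**2 / s2 + 0.5)  # round to the nearest integer
    rho_eq = s2 / s1
    return n_eq, rho_eq

# User equidistant from three BSs: all antennas are effectively seen.
n_center, rho_center = super_bs_parameters([1.0, 1.0, 1.0], n_t=4)
# User close to one BS: only that BS is effectively seen.
n_interior, rho_interior = super_bs_parameters([1.0, 1e-4, 1e-4], n_t=4)
```

For equal coefficients the equivalent array has $BN_t = 12$ antennas, whereas a user dominated by a single BS sees only $N_t = 4$.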
Since the distribution of $\mathbf{H}_k\mathbf{H}_k^*$ is approximated with the distribution of another Wishart matrix $\tilde{\mathbf{H}}_k\tilde{\mathbf{H}}_k^*$, we have $f_{\lambda_i(\mathbf{H}_k)}(\lambda) \approx f_{\lambda_i(\tilde{\mathbf{H}}_k)}(\lambda)$. The distribution $f_{\lambda_i(\tilde{\mathbf{H}}_k)}(\lambda)$ for $i = 1, \ldots, N_r$ for the uncorrelated central case is given in [46]; we denote that expression by (23). It involves a constant $G_{uc}$ and a matrix $\boldsymbol{\Omega}$ whose $(i, j)$th element is specified in [46]. (The general framework developed in this paper, however, is applicable to arbitrarily correlated channels. We only express the result for the uncorrelated case for simplicity.)
In order to find $f_{\lambda'_{\min}}(\lambda')$, we note that the product of two unitary matrices is another unitary matrix [47], so that $\mathbf{V}_{F_k}^*\mathbf{U}_{B_k}$ is unitary. Therefore, $[\mathbf{V}_{F_k}^*\mathbf{U}_{B_k}]_{(i,(BN_t - \tilde{\ell}_k))}$ is a truncated unitary matrix. As mentioned in Section 3.2, for any Haar-distributed unitary matrix $\mathbf{A} \in \mathcal{U}(n, n)$, the product $[\mathbf{A}]_{(q,p)}[\mathbf{A}]^*_{(q,p)}$ for $q \leq p \leq n$ has the same distribution as a complex Jacobi ensemble [45, Proposition 4.1]. The distribution of the minimum eigenvalue of the complex Jacobi ensemble is obtained in [44, Equation 3.2]; we denote it by (28). In that expression, $\Gamma_m(c) = \pi^{m(m-1)/2}\prod_{j=1}^{m}\Gamma(c - j + 1)$ denotes the multivariate Gamma function, $\Gamma(a) = \int_0^\infty x^{a-1}e^{-x}\,dx$ is the Gamma function, and ${}_2F_1^{(1)}(\tilde{\ell}_k, i - BN_t + \tilde{\ell}_k; \tilde{\ell}_k + i; (1 - \lambda')\mathbf{I}_i)$ is a hypergeometric function of a matrix argument [44,48]. Based on (23) and (28), we can evaluate (20) numerically.
To verify the accuracy of the approximation in (20), we consider a hexagonal cellular layout with cell sectoring. By using 120-degree sectoring in each cell, every 3 neighboring cells can coordinate with each other to serve users in the shaded area shown in Figure 1. The number of transmit antennas is chosen to be $N_t = 4$, which is the value currently implemented in wireless standards such as 3GPP LTE, and $N_r = 2$. We randomly place two users in each cell sector. The pathloss model is based on scenario C2 of the WINNER II specifications [49]. The large-scale fading is modeled as lognormal with a standard deviation of 8 dB. The edge SNR is defined to be the received SNR at the edge of the cell, assuming that one BS transmits at full power while all other BSs are off, accounting for pathloss but ignoring shadowing and small-scale fading. The comparison corresponds to one random snapshot of user locations in which, for any given $L$ and $\ell_k$, the other users are assigned 1 or 2 data streams. It is also assumed that no user is within a normalized distance of 0.2 from its closest BS for the pathloss model to be valid. It is observed that the lower bound approximation in (20) is very close to the results obtained by Monte-Carlo simulations using (16) over the full range of edge SNR. The difference between the approximation and the achieved rate is small enough for the approximation to be used for scheduling, as explained in the next section.
To justify the argument in Remark 1, in Figure 3, we plot $\tilde{N}_{t,k}$ versus the normalized distance from the home BS for a sample user $k$ moving along the line that connects BS 3 to the center of the shaded hexagon in Figure 1. It is observed that within a normalized distance of 0.5 from BS 3, $\tilde{N}_{t,k} = 4$, which results from the fact that $\rho_{k,3}$ is much larger than $\rho_{k,1}$ and $\rho_{k,2}$, and therefore, only BS 3 is seen by this user. As the user moves toward the center of the shaded hexagon, $\rho_{k,1}$ and $\rho_{k,2}$ increase but $\rho_{k,3}$ decreases, resulting in an increase in $\tilde{N}_{t,k}$. Indeed, at the center of the shaded hexagon $\tilde{N}_{t,k} = 12$, which means all the 3 BSs are seen by the user and can actually be helpful in the cooperation. Therefore, BS cooperation is not very helpful for cell-interior users, and edge users get most of the benefit. This can be used to design BS cooperation.

Downlink Scheduling: Joint User and Mode Selection
In this section, downlink scheduling for multimode transmission is discussed. The total number of streams in the system under study is upper bounded by $BN_t$, and normally $KN_r \gg BN_t$, so at each scheduling phase a subset of users and the preferred mode of each user must be selected for transmission. In multimode transmission, the number of data streams for each user is adaptively selected, which allows efficient exploitation of the available degrees of freedom in the channel using multimode diversity [24]. (Multimode diversity is a form of selection diversity among users with multiple antennas, which enables the scheduler to perform selection not only among the users (multiuser diversity) but also among the different eigenmodes of each user.)
In a system with heterogeneous users, the goal of the downlink scheduling is to make the system operate at a rate point of its ergodic achievable rate region such that a suitable concave and increasing network utility function $g(\cdot)$ of the user individual ergodic rates is maximized [50].
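Let $\mathcal{M} = \{1, 2, \ldots, KN_r\}$ denote the set consisting of all possible modes for all users, and let $\mathcal{S}_i$ be a subset of $\mathcal{M}$ with $|\mathcal{S}_i| \leq BN_t$. Let $\mathcal{K}_i$ denote the set of users with at least one selected mode in $\mathcal{S}_i$. The downlink scheduling problem we wish to solve is defined as
$$\max_{\mathcal{S}_i \subseteq \mathcal{M},\ |\mathcal{S}_i| \leq BN_t}\ g\left(\{R_k(\ell_k, |\mathcal{S}_i|)\}_{k \in \mathcal{K}_i}\right). \tag{29}$$
To solve (29) through brute-force exhaustive search over $\mathcal{S}_i$, $\sum_{m=1}^{BN_t} C^{KN_r}_m$ combinations must be checked. Furthermore, for each combination $\mathcal{S}_i$, the knowledge of $\{R_k(\ell_k, |\mathcal{S}_i|)\}_{k \in \mathcal{K}_i}$ is required, which is not easy to compute in general. This is a computationally complex problem if $KN_r \gg BN_t$ and is very difficult to implement.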
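To give a sense of scale for the exhaustive search, the number of candidate subsets $\sum_{m=1}^{BN_t} C^{KN_r}_m$ can be evaluated directly. The sketch below is only a counting aid; the function name is hypothetical, and the example uses the simulation setting of this paper ($K = 30$ users in total, $N_r = 2$, $B = 3$, $N_t = 4$).

```python
from math import comb

def exhaustive_search_size(num_users, n_r, num_bs, n_t):
    """Number of mode subsets with between 1 and B*N_t elements."""
    total_modes = num_users * n_r   # |M| = K * N_r
    max_streams = num_bs * n_t      # |S_i| <= B * N_t
    return sum(comb(total_modes, m) for m in range(1, max_streams + 1))

# Toy case: 2 single-antenna users, one BS with 2 antennas.
small = exhaustive_search_size(2, 1, 1, 2)
# Setting of the simulations: 10 users per cell in 3 cells.
large = exhaustive_search_size(30, 2, 3, 4)
```

In the simulation setting the count exceeds $10^{12}$, which motivates the greedy approach of the next subsection.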

Low-Complexity User and Mode Selection.
To reduce the computational complexity and at the same time exploit the benefits of multimode transmission, we propose a low-complexity joint user and mode selection algorithm. To simplify the computation of $\{R_k(\ell_k, |\mathcal{S}_i|)\}_{k \in \mathcal{K}_i}$ for any given $\mathcal{S}_i$, we propose to use the approximations $\hat{R}_k(\ell_k, |\mathcal{S}_i|)$ obtained in (20) instead of the exact formula in (16). This enables the analytical computation of $g(\{\hat{R}_k(\ell_k, |\mathcal{S}_i|)\}_{k \in \mathcal{K}_i})$ for any given $\mathcal{S}_i$, with only the knowledge of the average SNR of each user from all the BSs, and avoids the complexity of precoding matrix computations. Therefore, it not only reduces the computational complexity at the central unit but also removes the need for any instantaneous CSI feedback for scheduling, at the expense of sacrificing the small-scale fading multiuser diversity.
One way to reduce the complexity associated with exhaustive search is to treat (29) as a relaxed optimization problem, that is, to greedily select the data streams which maximize the network utility function. Toward this goal, we gradually increase $L$ from 1 to $BN_t$. For any given $L$, the approximate ergodic rate for the next unselected eigenmode of all users is computed using (20). The algorithm continues until either $L = BN_t$ or the network utility function starts to decrease. Once the user and mode selection is done, only the selected users need to feed back the singular values and the corresponding singular vectors of their selected eigenmodes for precoder design. Therefore, the proposed scheduling algorithm is of low complexity and is suitable for applications in which delay is not a stringent constraint or in which the feedback resources are limited. The resulting algorithm is summarized in Algorithm 1.
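The greedy procedure described above can be sketched in a few lines. This is a simplified illustration rather than a transcription of Algorithm 1: the callable `mode_rate(k, i, L)` stands in for the per-eigenmode term of the approximation in (20) and is a hypothetical placeholder, as is the toy rate model used in the example.

```python
import math

def greedy_user_mode_selection(mode_rate, num_users, n_r, max_streams, utility=sum):
    """Greedy joint user and mode selection.

    mode_rate(k, i, L): approximate rate of the i-th eigenmode of user k
                        when L streams are transmitted in total.
    Returns the per-user mode vector and the utility of the best allocation.
    """
    best_modes, best_value = [0] * num_users, 0.0
    for L in range(1, min(num_users * n_r, max_streams) + 1):
        modes = [0] * num_users
        for _ in range(L):
            # Rate of the next unselected eigenmode of every eligible user.
            candidates = [(mode_rate(k, modes[k] + 1, L), k)
                          for k in range(num_users) if modes[k] < n_r]
            _, k_max = max(candidates)
            modes[k_max] += 1
        rates = [sum(mode_rate(k, i, L) for i in range(1, modes[k] + 1))
                 for k in range(num_users) if modes[k] > 0]
        value = utility(rates)
        if value < best_value:
            break  # utility started to decrease: keep the previous allocation
        best_modes, best_value = modes, value
    return best_modes, best_value

# Toy rate model (illustrative only): power is split over L streams and
# weaker eigenmodes see a smaller effective SNR.
snr = [30.0, 3.0]
toy_rate = lambda k, i, L: math.log2(1 + snr[k] / (i * L))
modes, value = greedy_user_mode_selection(toy_rate, num_users=2, n_r=2, max_streams=4)
```

In this toy example the search stops at $L = 2$ and assigns both streams to the stronger user, since adding a third stream lowers the sum rate.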

Network Utility Function.
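We focus on two special cases of the network utility function, namely, the ergodic sum rate and the sum log ergodic rate. To perform maximum sum rate scheduling (MSRS) for a given $\mathcal{S}_i$, the per-cell ergodic sum rate utility function is defined as
$$g\left(\{R_k(\ell_k, |\mathcal{S}_i|)\}_{k \in \mathcal{K}_i}\right) = \frac{1}{B}\sum_{k \in \mathcal{K}_i} R_k(\ell_k, |\mathcal{S}_i|). \tag{30}$$
To introduce fairness by performing proportional fairness scheduling (PFS) for a given $\mathcal{S}_i$, the per-cell sum log ergodic rate utility function is defined as [50]
$$g\left(\{R_k(\ell_k, |\mathcal{S}_i|)\}_{k \in \mathcal{K}_i}\right) = \frac{1}{B}\sum_{k \in \mathcal{K}_i} \log R_k(\ell_k, |\mathcal{S}_i|). \tag{31}$$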
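The difference between the two utilities can be seen on a small example. The sketch below assumes that both utilities are normalized by the number of cells $B$ and that the natural logarithm is used for PFS; the function names and the rate values are illustrative.

```python
import math

def msrs_utility(rates, num_bs):
    """Per-cell ergodic sum rate (maximum sum rate scheduling)."""
    return sum(rates) / num_bs

def pfs_utility(rates, num_bs):
    """Per-cell sum of log ergodic rates (proportional fairness)."""
    return sum(math.log(r) for r in rates) / num_bs

# Two allocations with the same sum rate but different balance.
unbalanced = [6.0, 0.5]
balanced = [4.0, 2.5]
msrs_gap = msrs_utility(balanced, 3) - msrs_utility(unbalanced, 3)
pfs_gap = pfs_utility(balanced, 3) - pfs_utility(unbalanced, 3)
```

The sum rate utility does not distinguish between the two allocations, while the sum log utility prefers the balanced one, which is the mechanism by which PFS favors users with lower rates.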

Simulation Results and Key Observations
In this section, the performance of the proposed user and mode-selection strategy is evaluated via Monte-Carlo simulations. The assumptions for the cellular layout, pathloss, shadowing, and the number of antennas are given in Section 4. We drop $K_1 = K_2 = K_3 = 10$ users randomly according to a uniform distribution in each cell. Inspired by [49], we follow a drop-based simulation. In this approach, a drop corresponds to one realization of user locations, during which the large-scale fading parameters, as well as the velocity and direction of travel of the users, are practically constant. Therefore, each user only undergoes small-scale fading at each location. Furthermore, large-scale fading parameters are realized independently from drop to drop. This method does not take into account the time evolution of the channel. Its main advantage is the simplicity of the simulation. We run 1000 drops for user locations. At the beginning of each drop, all the users feed back their average SNR from all the cooperating BSs (in real systems, such an update is not frequent and only occurs when users move around) and the set $\mathcal{S}$ is obtained using Algorithm 1. For each obtained $\mathcal{S}$ at each drop, 1000 realizations are simulated with independent small-scale channel states.
Figure 4 compares the ergodic sum rate of the proposed strategy with both MSRS and PFS to that of single-user transmission (SUT) and opportunistic scheduling based on instantaneous CSI (OSICSI). In SUT, only the one user with the best ergodic rate is selected and served at each scheduling interval. For detailed information about the OSICSI algorithm, see [37]. It is shown in Figure 4 that the approximate sum rate is quite close to the achieved one with MSRS. It can also be observed that the achieved sum rate for PFS is very close to that of MSRS.
It is further shown that the proposed algorithm for both MSRS and PFS performs much better than SUT and achieves a large fraction of the sum rate of OSICSI over a practical range of edge SNR values. For example, at an edge SNR of 10 dB, it achieves 80% and 68% of the sum rate of opportunistic scheduling with MSRS and PFS, respectively. In Figure 5, the sum log ergodic rate versus edge SNR is plotted. It is observed that the approximate and simulated curves are in good agreement.

Sum Rates for Different Systems.
(1) Initialization: $L = 1$, $\mathcal{S} = \emptyset$, $g(\emptyset) = 0$
(2) while $L \leq \min(KN_r, BN_t)$ do
(3)   $r(L) = 0$, $\nu = 0$, $\mathcal{S}_L = \emptyset$, $\mathcal{K}_L = \emptyset$, $\hat{R}_k(i, L) = 0$, $\ell_k = 0$, $u_k = 0$ for $k = 1, \ldots, K$ and $i = 1, \ldots, N_r$
(4)   while $\nu \leq L$ do
(5)     for $k = 1$ to $K$ do
(6)       Compute $s_k$ using (20)
(7)     end for
(8)     $k_{\max} \leftarrow \arg\max_k s_k$, $\ell_{k_{\max}} \leftarrow \ell_{k_{\max}} + 1$
(9)     $r(L) \leftarrow s(k_{\max})$, $u_{k_{\max}} \leftarrow 1$, $\nu \leftarrow \nu + 1$
(10)  end while
(11)  for $k = 1$ to $K$ do
(12)    if $u_k \neq 0$ then
(13)
(14)    end if
(15)  end for
(16)  if $r(L) < r(L - 1)$ then
(17)    $\mathcal{S} = \mathcal{S}_{L-1}$, break
(18)  end if
(19)  $L \leftarrow L + 1$
(20) end while
Algorithm 1: Pseudocode for the proposed algorithm.
To compare the performance of MSRS and PFS, we have divided the distance between users and their home BS into bins of width 0.1 of the cell radius, and for each bin, the fraction of times over all the 1000 drops that a user in that bin has been scheduled is plotted in Figure 6. It is observed that with PFS, the activity fraction of users at farther distances from the BS is higher than with MSRS. On the other hand, the simulation results show that the PFS algorithm almost always chooses 1 data stream for each served user, while in MSRS 2 data streams are selected about 50% as often as 1 stream. Therefore, in PFS, choosing the users without considering the problem of mode selection (dominant-mode transmission) seems to be the relevant strategy, at least in the chosen scenario. The sum rate of PFS with only dominant-mode selection (PFS-DMS) is also plotted in Figure 4. It can be seen that the sum rate of PFS-DMS matches very well with that of PFS with user and mode selection.

Feedback Analysis.
In this section, we compare the amount of feedback required by the proposed algorithm with that required by opportunistic scheduling based on instantaneous CSI. We define the average feedback load (AFL) as the average number of real coefficients fed back to the central unit during each drop, normalized by the number of small-scale realizations within that drop.

For the proposed algorithm, once the set S is obtained at the beginning of a drop, each selected user k ∈ K sends back, at each small-scale realization, one real-valued singular value and one complex-valued right singular vector of size BN_t for each of its selected eigenmodes, for use in the precoding design. The results are almost the same for MSRS and PFS, since the AFL depends on the total number of transmitted data streams (STM), which is the same on average for both schemes; we therefore plot only the MSRS result. With T small-scale channel realizations within each drop (T = 1000 in this paper), the AFL of a drop is this per-realization count plus the average SNR values fed back once at the beginning of the drop, normalized by T. Since the set S changes from drop to drop, the AFL is obtained by averaging over many drops.

For opportunistic scheduling based on instantaneous CSI, however, all K users must feed back their respective N_r × BN_t complex channel matrices in all T realizations of each drop, so the AFL is 2KN_rBN_t real coefficients.

In Figure 7, the AFL per cell versus the number of users per cell is compared for the proposed scheduling algorithm and opportunistic scheduling based on instantaneous CSI. For the proposed algorithm, the AFL increases very slowly with the number of users, since regardless of how many users are in the cell, the number of served users is limited by the maximum number of transmit antennas. The only overhead of increasing the number of users is the average SNR values that must be fed back at the beginning of each drop, which is negligible relative to the amount fed back during a drop.
For opportunistic scheduling, however, the AFL increases linearly with the number of users, since the CSI of all users must be fed back at each realization. For K_1 = K_2 = K_3 = 30, the AFL is reduced by more than 93%, which makes the proposed algorithm attractive in such scenarios.
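The feedback counts described in this section can be sketched for a single drop as follows. This is a rough illustration, not the paper's exact expressions. The function and parameter names are hypothetical, the antenna and user numbers are illustrative rather than the simulated configuration, and the per-drop overhead of one average SNR value per user and BS is an assumption. Each complex coefficient is counted as two real coefficients.

```python
from typing import Sequence


def afl_proposed(
    streams_per_user: Sequence[int],
    num_bs: int,
    n_t: int,
    num_users: int,
    num_realizations: int,
) -> float:
    """Per-drop AFL of the proposed scheme (illustrative count).

    streams_per_user holds the number of selected eigenmodes of each
    served user in this drop.
    """
    # Per realization and eigenmode: 1 real singular value plus a complex
    # right singular vector of size B*N_t (2*B*N_t real coefficients).
    per_realization = sum(l * (1 + 2 * num_bs * n_t) for l in streams_per_user)
    # Assumption: each user reports one average SNR per BS once per drop.
    snr_overhead = num_users * num_bs
    return per_realization + snr_overhead / num_realizations


def afl_opportunistic(num_users: int, n_r: int, num_bs: int, n_t: int) -> int:
    """AFL when every user feeds back its full N_r x B*N_t complex matrix."""
    return 2 * num_users * n_r * num_bs * n_t


# Illustrative numbers only: B = 3, N_t = 2, N_r = 2, K = 90, T = 1000,
# with six served users carrying one stream each.
proposed = afl_proposed([1] * 6, num_bs=3, n_t=2, num_users=90, num_realizations=1000)
full_csi = afl_opportunistic(num_users=90, n_r=2, num_bs=3, n_t=2)
print(proposed, full_csi)
```

In this count the opportunistic load grows linearly with the number of users, while the proposed load grows only through the per-drop SNR term once the number of served streams saturates. Averaging `afl_proposed` over many drops would give the quantity plotted against the number of users.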

Conclusion
In this paper, we propose an analytical framework to approximate the ergodic rates of users with different modes in a network MIMO system, based only on knowledge of the average SNR received at each user from all the cooperating BSs, called incomplete CSI in this paper. Based on the derived approximate ergodic rates, the problem of downlink scheduling with both MSRS and PFS is addressed. The proposed scheduling algorithm significantly reduces the amount of feedback and performs close to opportunistic scheduling based on instantaneous CSI. It is of particular interest for applications where there is a total feedback overhead constraint and/or no stringent delay constraint. It is also shown that by introducing fairness, the probability of selecting higher modes for each user decreases significantly, which results in dominant-mode transmission (beamforming) to each user.

A. Proof of Theorem 1
In order to prove the lower bound for p, we apply the Cauchy-Schwarz inequality. Its equality condition is satisfied only when the vectors α = [α_1 · · · α_S] and β = [β_1 · · · β_S] are linearly dependent; that is, α = xβ for some scalar x. Since in our case β = [1 · · · 1], it follows that equality holds only when α_1 = · · · = α_S. This completes the proof of the theorem.
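For reference, the Cauchy-Schwarz step with β = [1 · · · 1] can be written out as below. This is a sketch for real-valued α_s. How the two sums relate to p depends on the statement of Theorem 1 and is not reproduced here.

```latex
\left(\sum_{s=1}^{S}\alpha_s\right)^{2}
  = \left(\sum_{s=1}^{S}\alpha_s\beta_s\right)^{2}
  \le \left(\sum_{s=1}^{S}\alpha_s^{2}\right)\left(\sum_{s=1}^{S}\beta_s^{2}\right)
  = S\sum_{s=1}^{S}\alpha_s^{2},
\qquad \beta_s = 1,\quad s = 1,\dots,S,
```

with equality if and only if α_1 = · · · = α_S.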

B. Proof of Lemma 1
First, notice that for values of i > k or i > BN t − k , λ i (F k ) or λ i (B k ) are defined as zero. Hence, the claim of the lemma is immediate in these cases. Now, we prove the claim for i ≤ k . For notational convenience, we define m = BN t − k . According to the "max-min" half of the Courant-Fischer Theorem [47], for any subspace M ⊂ C BNt− k with dimension i, we have (B.5). Now, let us define V = ⟨v 1 (B k ), v 2 (B k ), . . . , v i (B k )⟩, where v j (B k ) denotes the jth column of V Bk and ⟨a 1 , a 2 , . . . , a n ⟩ is the span of a 1 , a 2 , . . . , a n . In the resulting derivation, (e) follows from the fact that with q = [V * Fk U Bk ] (i,m) x , we have [Σ Fk ] :(1:i) q 2 ≥ λ i (F k ) q 2 . Finally, (f) follows from the Rayleigh-Ritz Theorem [47]. Equation (B.6) completes the proof of the lemma.
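For readers who want the two named results at hand, their standard statements are sketched below for a generic Hermitian matrix A of size n × n. The matrix A and the decreasing eigenvalue ordering are assumptions made here for illustration; the proof applies these results to its own matrices.

```latex
% Eigenvalues ordered as \lambda_1(A) \ge \lambda_2(A) \ge \dots \ge \lambda_n(A).
% Courant-Fischer, "max-min" form:
\lambda_i(A) \;=\; \max_{\substack{\mathcal{M}\subset\mathbb{C}^{n} \\ \dim\mathcal{M}=i}}
  \;\min_{\substack{x\in\mathcal{M} \\ x\neq 0}} \frac{x^{*}Ax}{x^{*}x},
\qquad i = 1,\dots,n.
% Rayleigh-Ritz:
\lambda_n(A) \;\le\; \frac{x^{*}Ax}{x^{*}x} \;\le\; \lambda_1(A)
\qquad \text{for all } x \neq 0 .
```

In particular, for any fixed subspace of dimension i, the minimum of the Rayleigh quotient over that subspace is a lower bound on λ_i(A), which is how a specific choice of subspace yields an eigenvalue inequality.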