A Limited Feedback SDMA for Downlink of Multiuser MIMO Communication System

This paper proposes a limited feedback SDMA scheme that combines opportunistic scheduling with codebook-based multiuser precoding. A new systematic construction for an SDMA codebook, which comprises a set of precoders for multiple simultaneously active users, is first presented. Different from the conventional Grassmannian codebook, the proposed codebook is designed in terms of array processing and has a cluster-based structure, with each cluster generated using a perturbation method. To tackle the intractable interuser interference issue inherent in limited feedback SDMA, this paper further proposes two novel opportunistic scheduling algorithms that fully exploit the cluster structure of the proposed codebook. The first algorithm schedules the simultaneous users and their preferred precoders successively and is implemented in a Markov-like fashion. The second algorithm rapidly finds a group of channel-matching users along with their preferred precoders. Simulation results demonstrate that, in sparse networks, the proposed SDMA achieves a higher throughput than the conventional limited feedback SDMA, while the two schemes have a comparable feedback overhead.


INTRODUCTION
Space division multiple access (SDMA) is capable of considerably improving the throughput of multiple antenna broadcast channels, in comparison with time division multiple access (TDMA). Thus, SDMA has been adopted by IEEE 802.20 and other standards bodies. Through multiuser precoding, SDMA enables the base station to communicate simultaneously with multiple users using the same time-frequency resource. The optimal performance of SDMA can be achieved by combining dirty paper coding (DPC) [1] with appropriate user scheduling. However, DPC is infeasible in practice since it is extremely computationally intensive; even simplified versions of DPC, such as Tomlinson-Harashima precoding [2][3][4], remain very difficult to implement due to their high computational complexity. In contrast, several low-complexity multiuser precoding techniques have been developed, including zero-forcing precoding [5], block diagonalized precoding [6], MMSE precoding [7], the generalized eigenvalue-based solutions [8], and the iterative algorithms [9], which can achieve a large portion of the DPC capacity. To implement these schemes, perfect channel state information (CSI) of each user must be available at the base station. In FDD systems, the downlink CSI has to be fed back to the base station, which is infeasible in practice due to the limited feedback bandwidth. In TDD systems, CSI can be obtained by exploiting the reciprocity between the downlink and uplink channels. However, for reasons such as the different RF circuits at the base station and the user terminal, the obtained CSI may suffer from severe estimation error. Thus, it is important to improve the robustness of the existing multiuser precoding techniques to CSI estimation error [10], which, however, dramatically increases the complexity. Recently, a type of SDMA based on quantized CSI feedback has been extensively studied [11][12][13].
Most of these works concentrate on MISO-SDMA; their extensions to MIMO-SDMA commonly require many more feedback bits to guarantee a reasonable performance. Alternatively, a type of opportunistic SDMA using a random unitary matrix as the multiuser precoder is proposed in [14], which can asymptotically achieve the optimal scaling law of the sum capacity when the number of users is sufficiently large.
EURASIP Journal on Advances in Signal Processing
However, in sparse networks where the number of users is small, the performance of this scheme degrades severely due to excessive mutual interference between the simultaneously active users. To overcome this problem, a modified opportunistic SDMA is proposed in [15], which requires the scheduled users to feed back full CSI in an extra feedback round. Although this scheme is able to deal with the interference problem in sparse networks, the feedback overhead is dramatically increased as well. By comparison, the MIMO-SDMA scheme adopted by the IEEE 802.20 standard [16] uses a set of predefined precoding matrices, that is, an SDMA codebook, to guarantee a reasonable performance in sparse networks. This scheme needs only a small amount of feedback information and has a simple user scheduling algorithm. However, this codebook-based SDMA scheme is oversimplified, since its codebook is essentially packed with only two Grassmannian subspaces. Therefore, the performance improvement obtained from the codebook is still limited.
In this paper, we first present a systematic design for a new SDMA codebook. Unlike the conventional precoder codebook used for point-to-point communication systems [16][17][18][19], the SDMA codebook design should take into account more performance metrics, such as the ease of user scheduling and the degree of interuser interference suppression [20,21]. Aiming at an SDMA codebook that is more general than the one adopted in IEEE 802.20, this paper proposes a systematic construction for the SDMA codebook from the viewpoint of array processing. In terms of subspace packing [18], the proposed SDMA codebook can be packed with an arbitrary number of subspaces. Instead of using the conventional criterion of maximizing the minimum distance, the employed subspace packing is designed with a uniform separation of beam directions, which is more suitable for multiuser precoding in terms of interuser interference suppression. In addition, each subspace included in the SDMA codebook undergoes a number of unitary perturbations [22], with each perturbation generating a distinct precoding matrix. The set of the resultant precoding matrices from each subspace forms a cluster, and the collection of all the clusters forms the codebook. The perturbation diversity associated with each subspace is capable of compensating for the performance loss caused by a linear receiver [22]. Based on the proposed SDMA codebook, we further propose two opportunistic scheduling algorithms, both of which require only limited feedback information from the users. The first algorithm is called Markov opportunistic scheduling (MOS); it utilizes the feedback information from both the current and the previous scheduling intervals, and is implemented in a Markov-like fashion such that the interuser interference is well suppressed.
In this proposed algorithm, the scheduled users and the preferred precoders will be selected successively, and the feedback information in each scheduling can be divided into two parts, with one part used in the current scheduling and the other part used in the next scheduling. The second algorithm is called quick matching opportunistic scheduling (QMOS), which is able to rapidly find a group of best-matching users along with their preferred precoders. The channels of the matching users have a potential of good interference suppression. Thus, a simultaneous transmission to the matching users can fully exploit the spatial degree of freedom.
The rest of this paper is organized as follows. Section 2 briefly describes the proposed system model. Section 3 presents the construction for a new SDMA codebook. Section 4 proposes two novel opportunistic scheduling algorithms. Simulation results are given in Section 5 and conclusions are drawn in Section 6.

SYSTEM MODEL
We consider the downlink of a multiuser MIMO communication system. The base station is equipped with $n_T$ transmit antennas and each user terminal is equipped with $n_R$ receive antennas. It is assumed that $n_T \geq n_R$ and that $U$ users are served by the base station. The base station schedules $K$ out of the $U$ users together and communicates with them simultaneously, that is, in an SDMA mode. The signal intended for the $k$th user is precoded with $W_k \in \mathbb{C}^{n_T \times M}$, $M \leq \min(n_T, n_R)$; thus the base station simultaneously transmits $M$ independent data streams to this user. The overall transmit signal at the base station can be expressed as
$$x = \sum_{k=1}^{K} W_k s_k, \quad (1)$$
where $s_k$ denotes the transmit signal vector for the $k$th user, and we assume that $E[s_k s_k^H] = E_s I_M$. The received signal at the $k$th user can be written as
$$y_k = H_k W_k s_k + H_k \sum_{i=1,\, i \neq k}^{K} W_i s_i + n_k, \quad (2)$$
where $H_k$ denotes the $n_R \times n_T$ flat channel matrix whose entry $h_{nm,k}$ represents the channel response from transmit antenna $m$ of the base station to receive antenna $n$ of the $k$th scheduled user, and $n_k$ denotes the noise vector whose entries are complex white noise with zero mean and variance $N_0$. In order to reduce the amount of feedback information, we predefine an SDMA codebook which includes all the precoding matrices $\{W_k\}$. Once the base station gathers the limited feedback information from all the users, the scheduler rapidly selects multiple simultaneously active users together with their preferred precoders. Different from the precoder selection in a point-to-point communication system, the opportunistic SDMA scheduler has to tackle the extra issue of interuser interference suppression.

IEEE 802.20 SDMA codebook
An SDMA codebook is presented in the IEEE 802.20 standard [16]; it is defined as a set of precoders for the individual active users in SDMA. This codebook consists of two clusters, each defined as a set of precoders whose columns span the same subspace. This section provides a quick review of this SDMA codebook construction and shows some of its advantages.

The IEEE 802.20 SDMA codebook focuses on the configuration of 4 transmit antennas at the base station and 2 receive antennas at the terminal; thus the codewords in the SDMA codebook belong to $\mathbb{C}^{4 \times 2}$. As stated in [16], the subspaces represented by the two clusters are constructed from a preset DFT-based matrix $B$, which is given in (3). Next, we introduce the detailed construction of the two clusters, each including 14 precoders. The clusters $G_1$ and $G_2$ are generated by right multiplying $B(:, 1:2)$ and $B(:, 3:4)$, respectively, by the matrices $\Lambda_i D$, where $D$ is the $2 \times 2$ DFT matrix with entries $[D]_{mn} = (1/\sqrt{2})\, e^{j2\pi(m-1)(n-1)/2}$, $m, n = 1, 2$. Actually, the matrix $\Lambda_i D$ can be viewed as a random unitary matrix distributed on $U(2, 2)$ [23].

(ii) Advantages of the two-cluster construction
From the Grassmannian subspace definition [18,24], the subspace spanned by $B(:, 1:2)$ is equivalent to the one spanned by $B(:, 1:2)A$, $\forall A \in U(2, 2)$. This is because the distance between these two subspaces always equals zero, no matter which distance definition is used. Therefore, the different precoders in the same cluster span the same subspace, which implies that the SDMA codebook is packed with the two subspaces spanned by $B(:, 1:2)$ and $B(:, 3:4)$. We can also find that any two precoders from different clusters are orthogonal to each other, that is,
$$F_i^H F_j = 0, \quad \forall F_i \in G_1,\ F_j \in G_2.$$
Assuming that each individual user in SDMA supports $\min(n_T, n_R)$ multiplexed substreams, the 4-2 antenna configuration implies that the base station can schedule up to two users on the same time-frequency resource. To suppress the interuser interference, it is required that the two selected precoders be from different clusters, meaning that the resultant multiuser precoder, that is, the collection of the precoders for all the active users, has to be a unitary matrix. For a given precoder $F_i$, we define its unitary perturbation as
$$T_i = F_i A_i,$$
where $A_i \in U(2, 2)$ is called the perturbation matrix. Accordingly, the construction of the clusters $G_1$ and $G_2$ can be viewed as a finite number of unitary perturbations to $B(:, 1:2)$ and $B(:, 3:4)$, using the perturbation matrices $A_i = \Lambda_i D$. A unitary perturbation does not affect the original precoder from the subspace perspective, since the subspace spanned by $T_i$ is identical to that spanned by $F_i$. However, different perturbations may have distinct impacts on the sum throughput achieved by the SDMA system, which will be described in detail in Section 4. This property reveals the significance of including different perturbations in the codebook construction. In the SDMA implementation, the scheduled user $k$ receives not only its own signal but also the signal intended for the other scheduled user (denoted as the interfering user).
As shown in Section 4, when F i is used as the precoder of the interfering user, any unitary perturbation to F i will not change the interference imposed on the user k, which means the use of any precoder in the same cluster/subspace by the interfering user will impose the same interference on the user k. Since the codebook only consists of two clusters/subspaces, the user can easily estimate the possible interference from the other scheduled users. As a result, the scheduling algorithm can easily take into account the suppression of the interuser interference.
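As a quick numerical check of the two properties discussed above (a unitary perturbation leaves the spanned subspace unchanged, and precoders taken from the two clusters remain mutually orthogonal), the following minimal sketch may be helpful. It assumes NumPy, uses the unitary 4 × 4 DFT matrix as a stand-in for the preset matrix B (the exact B of [16] is not reproduced here), and draws a random perturbation matrix; the helper names `dft_matrix`, `random_unitary`, and `projector` are hypothetical.

```python
import numpy as np

def dft_matrix(n):
    """Unitary n x n DFT matrix with entries exp(j*2*pi*m*k/n) / sqrt(n)."""
    idx = np.arange(n)
    return np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def random_unitary(n, rng):
    """Random n x n unitary matrix from the QR decomposition of a Gaussian matrix."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, _ = np.linalg.qr(z)
    return q

def projector(F):
    """Orthogonal projector onto the column space of F (orthonormal columns assumed)."""
    return F @ F.conj().T

rng = np.random.default_rng(0)
B = dft_matrix(4)                 # stand-in for the preset matrix B (assumption)
F1, F2 = B[:, :2], B[:, 2:]       # base precoders of the two clusters
A = random_unitary(2, rng)        # an arbitrary perturbation matrix
T1 = F1 @ A                       # unitary perturbation of F1

# Same subspace: the projectors onto span(T1) and span(F1) coincide.
same_subspace = np.allclose(projector(T1), projector(F1))
# Cross-cluster orthogonality holds before and after the perturbation.
cross_orthogonal = (np.allclose(F1.conj().T @ F2, 0)
                    and np.allclose(T1.conj().T @ F2, 0))
```

Comparing projectors rather than the matrices themselves is a deliberate choice: two precoders span the same subspace exactly when their projectors are equal, regardless of the basis chosen inside the subspace.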

Proposed SDMA codebook
Although the SDMA codebook adopted in IEEE 802.20 is able to simplify the design of the scheduling algorithm and provides a reasonable performance even in sparse networks, the following issues have not been addressed.
(i) The IEEE 802.20 SDMA codebook actually restricts the resultant multiuser precoder into a unitary matrix, which is optimal in the case of a large number of users [14]. However, it has been shown that in sparse networks, especially when the number of users is equal to that of the supported simultaneously active users, the optimal linear multiuser precoder designed with full CSI is not a unitary matrix in most cases [5][6][7][8][9]. How can we design the SDMA codebook while taking into account both the above two scenarios?
(ii) The SDMA codebook is packed with two subspaces/clusters, which leads to a low density of subspace packing. Since the design of the SDMA codebook should consider both the optimization of the interference-free performance of the individual users and the suppression of interuser interferences, the criterion of maximizing the minimum distance [18] is not suitable to increase the density of subspace packing. We need to find a new method to extend the SDMA codebook for any arbitrary number of subspaces/clusters.
(iii) The feedback overhead mainly depends on the size of the SDMA codebook, and is as such determined by the density of subspace packing as well as the cluster size. Is it possible to find a good tradeoff between the number of packed subspaces and the cluster size such that the SDMA performance is improved with a feedback overhead comparable to that required by the IEEE 802.20 SDMA codebook?
[Figure 1. Panel labels: the packing subspaces; SDMA codebook of size $2n_U \times L_c$.]
With the aim of addressing the above questions, we now design a new SDMA codebook with precoders of size $n_T \times n_T/2$, for a multiuser network in which each terminal is equipped with no more than $n_T/2$ antennas. Different from the conventional Grassmannian subspace packing, which uses the criterion of maximizing the minimum distance, we construct a general number of packed subspaces from the beam direction perspective. In our scheme, each column of a precoder is regarded as a beam, and the codebook construction is completed in two phases.
In the first phase, we design a set of beams and then group them into several subsets. The beams in each subset constitute a precoding matrix. Let each beam be given by
$$b(\theta_l) = \frac{1}{\sqrt{n_T}} \left[1, e^{j\theta_l}, \ldots, e^{j(n_T - 1)\theta_l}\right]^T,$$
where $\theta_l$ is the phase of the beam, which indicates the beam direction. Obviously, this type of beam is periodic in $\theta_l$ with period $2\pi$. By equally separating the phase values between 0 and $2\pi$, that is, $\theta_l = 2\pi(l - 1)/n_b$, $l = 1, \ldots, n_b$, a general number of $n_b$ beams are generated and written as
$$b_l = b\big(2\pi(l - 1)/n_b\big), \quad l = 1, \ldots, n_b.$$
From the viewpoint of array processing, the resultant $n_b$ beams convey uniform power at intervals of $\pi/(n_b d_T)$ in the angular domain [25,26], where $d_T$ denotes the spacing between two neighboring transmit antennas in wavelengths.
Assuming that the selected number $n_b$ satisfies $n_b = n_U n_T$, where $n_U$ is an integer, the following unitary matrices can be constructed by subset grouping, and are given by
$$B_s = \left[b_s, b_{s + n_U}, \ldots, b_{s + (n_T - 1) n_U}\right], \quad s = 1, \ldots, n_U.$$
In the second phase of the codebook construction, the above unitary matrices are used to generate multiple clusters. Each $B_s$ is first partitioned into a pair of submatrices denoted as $B_s(1 : n_T/2)$ and $B_s(n_T/2 + 1 : n_T)$. It is seen that the submatrices $\{B_s(1 : n_T/2), B_s(n_T/2 + 1 : n_T)\}$ actually follow a DFT-based structure. However, it should be mentioned that these submatrices are different from those generated by the DFT-based subspace packing, where the criterion of maximizing the minimum distance is used. The key difference is that the subspaces spanned by each pair of submatrices are orthogonal to each other, because each pair of submatrices is a partition of a unitary matrix. Actually, this orthogonality provides the potential of achieving the asymptotically optimal SDMA performance [14]. To construct a codebook comprising multiple clusters, each submatrix is used to generate a particular cluster by right multiplying it by a finite number of unitary perturbation matrices denoted as $\{\Lambda_i D, i = 1, \ldots, L_c\}$ (an alternative type of perturbation matrix is also available; interested readers are referred to [22]), where $L_c$ denotes the cluster size and can be chosen as any integer.
The two-phase codebook construction process is illustrated in Figure 1. It is seen that the constructed SDMA codebook consists of $N_c = 2n_U$ clusters/subspaces, with each cluster comprising $L_c$ precoders. Essentially, the IEEE 802.20 SDMA codebook is a special case of the proposed SDMA codebook, with the configuration $n_U = 1$, $n_T = 4$. It is worth mentioning that the proposed codebook not only extends the number of clusters, but also extends the pattern of the resultant multiuser precoder. Since the clusters from different pairs are generally nonorthogonal, the resultant multiuser precoder is no longer always a unitary matrix.
The results in [14] have shown that the use of an arbitrary unitary matrix as the multiuser precoder is capable of achieving the optimal capacity scaling in the case of a very large number of users. In our proposed SDMA codebook, if the active users are scheduled on a pair of orthogonal clusters/subspaces, the resultant multiuser precoder is a unitary matrix, and thus the asymptotically optimal performance is preserved. In sparse networks, the optimal multiuser precoder is generally not a unitary matrix, so it is quite possible that scheduling users on two nonorthogonal clusters brings lower interference than scheduling them on a pair of orthogonal clusters. In this sense, the proposed SDMA codebook provides a better performance than the IEEE 802.20 codebook does in terms of interference suppression. This advantage can also be confirmed from the viewpoint of array processing, since the multiple clusters are constructed from a set of beams with a uniform separation of beam directions, and the simultaneous transmission with these clusters has a good potential for interuser interference suppression. Moreover, the increased density of subspace packing in the proposed SDMA codebook considerably improves the performance of the individual user. So far, the first two problems mentioned above have been addressed to some degree by the extension of the SDMA codebook. As for the third problem, we use computer simulations to seek a reasonable tradeoff between the number of clusters and the cluster size. The simulation results given in Section 5 show that a four-cluster SDMA codebook provides a better tradeoff between the cluster number and the cluster size than the IEEE 802.20 SDMA codebook does.
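The two-phase construction described in this section can be sketched numerically as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes NumPy, models each beam as a unit-norm vector with a linear phase progression across the antennas, groups every $n_U$-th beam into one unitary matrix, and takes each $\Lambda_i$ to be a diagonal matrix of random phases (the source only states that $\Lambda_i D$ behaves like a random unitary matrix). The function names are hypothetical.

```python
import numpy as np

def dft_matrix(n):
    """Unitary n x n DFT matrix."""
    idx = np.arange(n)
    return np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def beam(theta, n_t):
    """Unit-norm beam with phase theta across n_t antennas (assumed beam model)."""
    return np.exp(1j * theta * np.arange(n_t)) / np.sqrt(n_t)

def build_codebook(n_t, n_u, l_c, seed=0):
    """Return a list of 2*n_u clusters, each a list of l_c precoders of size n_t x n_t/2."""
    rng = np.random.default_rng(seed)
    n_b = n_u * n_t
    half = n_t // 2
    # Phase 1: n_b beams with uniformly separated phases in [0, 2*pi).
    beams = np.stack([beam(2 * np.pi * l / n_b, n_t) for l in range(n_b)], axis=1)
    # Perturbation matrices Lambda_i @ D, with Lambda_i a random diagonal
    # phase matrix (assumption; the source does not specify Lambda_i).
    d = dft_matrix(half)
    perturbations = [np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, half))) @ d
                     for _ in range(l_c)]
    clusters = []
    for s in range(n_u):
        b_s = beams[:, s::n_u]            # every n_u-th beam: an n_t x n_t unitary matrix
        # Phase 2: split B_s into two orthogonal halves and perturb each half.
        for sub in (b_s[:, :half], b_s[:, half:]):
            clusters.append([sub @ a for a in perturbations])
    return clusters

codebook = build_codebook(n_t=4, n_u=2, l_c=4)   # four clusters of four precoders
```

In this sketch, clusters 0 and 1 (and likewise 2 and 3) come from the same unitary matrix and are therefore orthogonal, whereas clusters from different pairs, such as 0 and 2, are not.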

SCHEDULING ALGORITHM
The goal of an SDMA scheduling algorithm is to fully exploit the multiuser diversity [27][28][29] and the spatial multiplexing gain. With the extended SDMA codebook, the scheduling algorithm adopted in IEEE 802.20 is no longer applicable. In this section, we propose two opportunistic scheduling schemes which require only limited feedback information. The proposed algorithms carefully schedule multiple users on a fixed number of $n_T$ active substreams, aiming at a maximum sum throughput. Although a fixed number of active substreams is not always optimal, adapting the number of active substreams would dramatically increase the feedback overhead and the scheduling complexity. This section first focuses on the configuration $n_T = 2n_R$, in which each individual user is multiplexed with $n_R$ substreams. Thus, the scheduling algorithm needs to select two MIMO-transmission users each time. The extension of the scheduling algorithms to a more general configuration is introduced in Section 4.3.

Markov opportunistic scheduling
To implement SDMA scheduling, each user should first derive the feedback information based on the proposed codebook and send it to the base station. After collecting the feedback information from all the users, the base station should then select several simultaneously active users and assign them specific precoders. The selection scheme should guarantee that the users in good channel conditions be selected and the matched precoders be assigned, such that the multiuser diversity is fully exploited. Also, the selection scheme should guarantee some degree of orthogonality among the effective channels of the simultaneously active users. Unfortunately, these two requirements usually conflict with each other, thus we need to find a good tradeoff between them.
In this section, we propose a novel opportunistic scheduling algorithm which is implemented in a Markov-like way. In the proposed algorithm, each feedback information from a user falls into two parts denoted as part I and part II.
For the $k$th scheduling, the part II information from the current feedback, along with the part I information from the previous feedback, that is, the $(k - 1)$th feedback, is used to select the active users and assign their precoders. In particular, as depicted in Figure 2, three scheduling stages are employed before the actual SDMA transmission to complete the user selection and precoder assignment. In the first stage, the part I information from the previous feedback is utilized to select the first active user, henceforth denoted as the main user, and to assign its preferred precoder. At the same time, the expected interfering cluster is determined to restrict the range of precoder selection for the second active user (henceforth denoted as the secondary user), so that the interference from the secondary user to the main user is controlled. In order that all the users can respond to these selection results, the indices of the preferred precoder for the main user and of the expected interfering cluster, termed the preschedule information, are broadcast immediately, which completes the first stage. In the second stage, each user derives the feedback information (both parts) based on the preschedule information and transmits it to the base station. In the last stage, the base station utilizes the part II information from the current feedback to select the second active user and its preferred precoder within the expected interfering cluster. Note that in this stage the interference from the main user to the secondary user is taken into account, since each candidate derives its part II information with the precoder of the main user already known. If the channels remain quasi-static between these two consecutive scheduling intervals, the above Markov OS scheme succeeds in suppressing the mutual interference between the two successively scheduled users, that is, the main and secondary users, and the multiuser diversity is achieved as well through the above successive user selection.
To present the above scheduling algorithm in more detail, we will first introduce the derivation for the two parts of feedback information, and then present the related selection/assignment scheme.

Feedback information part I
This part of feedback information includes the user's maximum supported rate in SDMA mode, the indices of its preferred precoder, and the expected interfering cluster.
We assume perfect channel state information (CSI) is available at the user terminal. For the $k$th user, in order to obtain its maximum supported rate in SDMA mode, we assume the base station transmits the signal $s_1$ to this user using a precoder $U_1$ and simultaneously transmits $s_2$ with equal power to the other user using a precoder $U_2$, where $U_1$ and $U_2$ can be any two different precoders selected from the SDMA codebook, and $[U_1, U_2]$ forms the resultant multiuser precoder. Thus, a virtual receive model for user $k$ can be expressed as
$$y_k = H_k U_1 s_1 + H_k U_2 s_2 + n_k. \quad (8)$$
Then, the maximum supported rate of user $k$ in SDMA mode is obtained by searching all the possible receive models with respect to different $U_1$ and $U_2$. If an MMSE linear receiver is employed, the output of its linear filtering for the virtual receive signal is written as
$$\hat{s}_1 = G_k y_k,$$
where
$$G_k = (H_k U_1)^H \left[H_k U_1 U_1^H H_k^H + H_k U_2 U_2^H H_k^H + \frac{N_0}{E_s} I_{n_R}\right]^{-1}.$$
For the $i$th possible value of $(U_1, U_2)$, denoted as $(U_1^{(i)}, U_2^{(i)})$, the corresponding $G_k$ is denoted as $G_k^i$, and the signal to interference plus noise ratio (SINR) of the $m$th data stream can be represented as
$$\mathrm{SINR}_{k,m}^{(i)} = \frac{E_s \left|g_m^i H_k u_{1,m}^{(i)}\right|^2}{E_s \sum_{j \neq m} \left|g_m^i H_k u_{1,j}^{(i)}\right|^2 + E_s \left\|g_m^i H_k U_2^{(i)}\right\|^2 + N_0 \left\|g_m^i\right\|^2}, \quad (12)$$
where $g_m^i$ denotes the $m$th row of $G_k^i$ and $u_{1,j}^{(i)}$ denotes the $j$th column of $U_1^{(i)}$. The ideal supported rate associated with $(U_1^{(i)}, U_2^{(i)})$ is then written as
$$C_k^{(i)} = \sum_{m=1}^{M} \log_2\left(1 + \mathrm{SINR}_{k,m}^{(i)}\right). \quad (13)$$
To seek the maximum supported rate of user $k$, we need to search all the possible $(U_1, U_2)$. Mathematically, the maximum supported rate is expressed as
$$C_k^1 = \max_{i = 1, \ldots, L(L-1)} C_k^{(i)},$$
where $L = N_c L_c$ denotes the size of the codebook, and $L(L - 1)$ represents the size of the search space for $(U_1, U_2)$. The specific values of $U_1$ and $U_2$ which achieve $C_k^1$ are viewed as the user's preferred precoder (whose index is henceforth denoted as $I_k^1$) and the expected interfering precoder in SDMA mode, respectively.
Yongming Huang et al.
As for the computational complexity, the exhaustive elementwise search of both $U_1$ and $U_2$ over the SDMA codebook obviously requires $L(L - 1)$ rate calculations using (12)-(13). It is to be noted that the cluster-based structure of the proposed codebook can be exploited to greatly reduce the search complexity. To see this, we first investigate the impact of a unitary perturbation to $U_1$ and $U_2$ on the sum rate supported by the SDMA system with a linear receiver. We define the unitary perturbed precoders as $\tilde{U}_1 = U_1 A_1$ and $\tilde{U}_2 = U_2 A_2$, where $A_1$ and $A_2$ denote unitary matrices. It follows from (12) that the SINR of the $m$th stream in the SDMA system using the precoders $(\tilde{U}_1, \tilde{U}_2)$ is expressed as
$$\widetilde{\mathrm{SINR}}_{k,m} = \frac{E_s \left|\tilde{g}_m H_k U_1 a_{1,m}\right|^2}{E_s \sum_{j \neq m} \left|\tilde{g}_m H_k U_1 a_{1,j}\right|^2 + E_s \left\|\tilde{g}_m H_k U_2\right\|^2 + N_0 \left\|\tilde{g}_m\right\|^2},$$
where $a_{1,j}$ denotes the $j$th column of $A_1$, $\tilde{g}_m = a_{1,m}^H G_k$, and $G_k$ is the MMSE filter associated with the unperturbed precoders $(U_1, U_2)$. It is seen that a perturbation to the precoder $U_1$ affects the value of the SINR, which in turn affects the system performance; actually, the unitary perturbation has the potential of compensating for the performance loss caused by a linear receiver, and more details can be found in [22]. In contrast, a perturbation to the interfering precoder $U_2$ has no impact on the SINR value, since $A_2$ does not appear in the above expression. This means that different precoders included in the same cluster have the same influence on the rate performance of the receive model expressed in (8). Thus, in order to search for the maximum supported rate, we need only a clusterwise search of $U_2$ together with an elementwise search of $U_1$, instead of an exhaustive elementwise search of both $U_1$ and $U_2$. Moreover, the case in which $U_1$ and $U_2$ fall into the same cluster can be omitted due to the excessive interuser interference. Hence, we finally need only $L(N_c - 1)$ rate calculations. Also, the specific interfering cluster associated with the maximum rate is denoted as the expected interfering cluster (whose index is henceforth expressed as $J_k^2$).
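The invariance of the supported rate to a unitary perturbation of the interfering precoder, and the resulting reduced search, can be checked with a short numerical sketch. The code below is a sketch under stated assumptions rather than the authors' implementation: it assumes NumPy, a standard linear MMSE receiver with per-stream decoding, a toy two-cluster codebook built from the unitary 4 × 4 DFT matrix with random unitary perturbations, and a randomly drawn 2 × 4 channel. The names `mmse_rate` and `part_one_search` are hypothetical.

```python
import numpy as np

def mmse_rate(H, U1, U2, snr):
    """Sum rate (bits/s/Hz) of the streams precoded by U1 when U2 interferes.

    snr is E_s / N_0; a linear MMSE receiver with per-stream decoding is assumed.
    """
    He, Hi = H @ U1, H @ U2                         # desired / interfering effective channels
    R = He @ He.conj().T + Hi @ Hi.conj().T + np.eye(H.shape[0]) / snr
    G = He.conj().T @ np.linalg.inv(R)              # MMSE filter, one row per stream
    GHe, GHi = G @ He, G @ Hi
    signal = np.abs(np.diag(GHe)) ** 2
    interstream = np.sum(np.abs(GHe) ** 2, axis=1) - signal
    interuser = np.sum(np.abs(GHi) ** 2, axis=1)
    noise = np.sum(np.abs(G) ** 2, axis=1) / snr
    sinr = signal / (interstream + interuser + noise)
    return float(np.sum(np.log2(1.0 + sinr)))

def part_one_search(H, clusters, snr):
    """Elementwise search over U1 combined with a clusterwise search over U2."""
    best_rate, best_precoder, best_cluster, n_calls = -1.0, None, None, 0
    for c1, cluster in enumerate(clusters):
        for i1, U1 in enumerate(cluster):
            for c2, other in enumerate(clusters):
                if c2 == c1:
                    continue                        # same-cluster case is skipped
                r = mmse_rate(H, U1, other[0], snr) # any member represents cluster c2
                n_calls += 1
                if r > best_rate:
                    best_rate, best_precoder, best_cluster = r, (c1, i1), c2
    return best_rate, best_precoder, best_cluster, n_calls

def random_unitary(n, rng):
    """Random unitary matrix from the QR decomposition of a Gaussian matrix."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return np.linalg.qr(z)[0]

rng = np.random.default_rng(1)
idx = np.arange(4)
B = np.exp(2j * np.pi * np.outer(idx, idx) / 4) / 2.0   # unitary 4x4 DFT (toy basis)
clusters = [[B[:, cols] @ random_unitary(2, rng) for _ in range(3)]
            for cols in (slice(0, 2), slice(2, 4))]     # N_c = 2, L_c = 3
H = (rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))) / np.sqrt(2)

U1, U2 = clusters[0][0], clusters[1][0]
rate_ref = mmse_rate(H, U1, U2, snr=10.0)
rate_perturbed_interferer = mmse_rate(H, U1, U2 @ random_unitary(2, rng), snr=10.0)
best_rate, best_precoder, best_cluster, n_calls = part_one_search(H, clusters, snr=10.0)
```

With $N_c = 2$ clusters of $L_c = 3$ precoders, the reduced search performs $L(N_c - 1) = 6$ rate calculations instead of the $L(L - 1) = 30$ needed by the exhaustive search.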

Feedback information part II
Part II feedback information is derived based on the broadcast preschedule information, and includes the user's maximum supported rate (denoted as $C_k^2$) and the index of its preferred precoder (denoted as $I_k^2$), both in prescheduled mode. Here the prescheduled mode means that the user is viewed as a candidate for the second active user, with the precoder for the main user already determined and its own possible precoders restricted to the expected interfering cluster.
The virtual model of (8) is used again to find $C_k^2$ and $I_k^2$ by searching the expected interfering cluster. Note that $I_k^2$ actually denotes the relative index of the preferred precoder within the restricting cluster. As for the computational complexity, since $U_2$ in (8) is now determined by the preschedule information, and the search with respect to $U_1$ is restricted to a given cluster, we need only $L_c$ rate calculations to obtain $C_k^2$ and $I_k^2$.

Selection scheme
As stated above, the two parts of feedback information are denoted as $C_k^1$, $I_k^1$, and $J_k^2$ (part I) and $C_k^2$ and $I_k^2$ (part II). As clarified in the second paragraph of Section 4.1, in the first stage a main active user and its preferred precoder are selected by using the part I feedback $\{C_k^1, I_k^1, J_k^2\}$ from the previous feedback, and in the third stage the main task is to select the second active user and its preferred precoder. If we define
$$\bar{k} = \arg\max_{k} C_k^1,$$
then $\bar{k}$ is the index of the main active user, $I_{\bar{k}}^1$ is the index of its preferred precoder, and $J_{\bar{k}}^2$ is the index of its expected interfering cluster. Noting that some individual users may terminate their traffic requests at the current scheduling, the $C_k^1$ associated with these users should first be removed from the set $\{C_k^1\}$ in the above process. In order to control the interference from the secondary active user to the main active user, we restrict the precoder of the second active user to the cluster indexed by $J_{\bar{k}}^2$. Note that the third stage is implemented only after all the users have responded to the broadcast preschedule information and completed their feedback at the current scheduling. The set $\{C_k^2, I_k^2\}$ from the current feedback is then utilized to select the second active user and its preferred precoder.
If we define
$$\tilde{k} = \arg\max_{k \neq \bar{k}} C_k^2,$$
then $\tilde{k}$ is the index of the secondary user, and $I_{\tilde{k}}^2$ indicates the relative position of its preferred precoder in the given cluster indexed by $J_{\bar{k}}^2$.
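The two selection steps can be illustrated with a minimal sketch in plain Python. The record types `PartOne` and `PartTwo`, the function names, and the numeric values are hypothetical; excluding the main user from the second selection is an assumption made here so that two distinct users are scheduled.

```python
from dataclasses import dataclass

@dataclass
class PartOne:
    rate: float               # maximum supported rate in SDMA mode
    precoder: int             # index of the preferred precoder
    interfering_cluster: int  # index of the expected interfering cluster

@dataclass
class PartTwo:
    rate: float               # maximum supported rate in prescheduled mode
    relative_precoder: int    # relative index inside the restricting cluster

def select_main_user(prev_part_one, active_users):
    """Stage 1: pick the user with the largest part I rate from the previous feedback.

    Users that no longer request traffic are removed before the comparison.
    """
    candidates = {k: fb for k, fb in prev_part_one.items() if k in active_users}
    main = max(candidates, key=lambda k: candidates[k].rate)
    fb = candidates[main]
    return main, fb.precoder, fb.interfering_cluster   # preschedule information

def select_secondary_user(curr_part_two, main_user):
    """Stage 3: pick the user with the largest part II rate from the current feedback."""
    candidates = {k: fb for k, fb in curr_part_two.items() if k != main_user}
    secondary = max(candidates, key=lambda k: candidates[k].rate)
    return secondary, candidates[secondary].relative_precoder

# Toy feedback for three users (made-up numbers).
prev_part_one = {0: PartOne(3.2, 5, 1), 1: PartOne(4.1, 12, 0), 2: PartOne(2.7, 3, 1)}
curr_part_two = {0: PartTwo(2.0, 3), 1: PartTwo(9.9, 0), 2: PartTwo(2.4, 1)}

main = select_main_user(prev_part_one, active_users={0, 1, 2})
secondary = select_secondary_user(curr_part_two, main_user=main[0])
```

Both steps are simple maximum searches, so the base station's work grows only linearly with the number of users.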

Feedback overhead and scheduling complexity
We assume the SDMA codebook consists of $N_c = 2n_U$ clusters and each cluster has $L_c$ codewords; namely, the size of the codebook is $L = N_c \cdot L_c$. The total number of feedback bits per user is
$$Q(C_k^1) + \log_2 L + \log_2 N_c + Q(C_k^2) + \log_2 L_c,$$
where $Q(x)$ denotes the number of bits used to quantize the scalar quantity $x$. This amount is about twice the feedback amount required by the IEEE 802.20 SDMA. In addition, the number of broadcast bits used for the preschedule information is $\log_2 L + \log_2 N_c$. The majority of the scheduling complexity lies in the calculation of the feedback information and the comparison of the supported rates among all the users. Since the comparison operation aims to find the maximum, its complexity increases linearly with the number of users. The complexity of the feedback information calculation mainly depends on the number of rate calculations using (12)-(13). In the proposed MOS algorithm, each user needs $N_c L_c (N_c - 1) + L_c$ rate calculations.
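As a worked example of these counts, the short sketch below evaluates the per-user number of rate calculations stated above, together with a bit budget that follows from the listed feedback contents. The bit budget is an assumption made for illustration: each index is charged the ceiling of the base-2 logarithm of its range, and `rate_bits` stands for the quantizer resolution $Q(\cdot)$, whose value is not given in the text. The function names are hypothetical.

```python
from math import ceil, log2

def mos_rate_calculations(n_c, l_c):
    """Per-user rate calculations in MOS: N_c*L_c*(N_c - 1) for part I plus L_c for part II."""
    return n_c * l_c * (n_c - 1) + l_c

def mos_feedback_bits(n_c, l_c, rate_bits):
    """Per-user feedback bits, assuming ceil(log2(.)) bits per index (assumption)."""
    part_one = rate_bits + ceil(log2(n_c * l_c)) + ceil(log2(n_c))  # rate, precoder, cluster
    part_two = rate_bits + ceil(log2(l_c))                          # rate, relative precoder
    return part_one + part_two

def mos_broadcast_bits(n_c, l_c):
    """Broadcast bits: main user's precoder index plus the expected interfering cluster index."""
    return ceil(log2(n_c * l_c)) + ceil(log2(n_c))

# Example: four clusters of eight precoders, 4-bit rate quantization (made-up values).
n_calcs = mos_rate_calculations(4, 8)
n_feedback = mos_feedback_bits(4, 8, rate_bits=4)
n_broadcast = mos_broadcast_bits(4, 8)
```

For this example the sketch gives 104 rate calculations, 18 feedback bits, and 7 broadcast bits per scheduling interval.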

Quick matching opportunistic scheduling
The above MOS requires each user to feed back two parts of information. In this section, we propose another opportunistic scheduling algorithm which requires less feedback information, where the cluster structure of the proposed SDMA codebook is fully exploited to reduce the feedback information and simplify the scheduling algorithm.

Feedback information
In every scheduling interval, each user $k$ feeds back the following information: (1) the maximum supported rate in SDMA mode, denoted as $C_k$; (2) the index of its preferred precoder, denoted as $I_k$; and (3) the index of its expected interfering cluster, denoted as $J_k$. The way of obtaining the above feedback information is similar to that introduced in Section 4.1 and is not repeated here.

Selection scheme
After gathering feedback information from all the users, the base station will first classify the users according to the feedback indices, every two classes with good orthogonality between their effective channels are paired together. Through a pairwise comparison, the users in favorable channels and with good interuser interference suppression can be easily found. The detailed processes are provided as follows.
(1) We define the cluster that includes the preferred precoder as the preferred cluster. For user $k$, we denote the index of its preferred cluster as $\hat{J}_k$. Based on this information plus the index of the expected interfering cluster $J_k$, the users can be classified as follows: the users with the same preferred cluster $m$ and the same expected interfering cluster $n$ are classified into one class, denoted by $(m, n)$, namely,
$$(m, n) = \{k : \hat{J}_k = m,\ J_k = n\}.$$
If we assume that two active users must be scheduled on different clusters, there will be $N_g = 2n_U \cdot (2n_U - 1)$ different classes.
(2) We regard class $(m, n)$ and class $(n, m)$ as one pair, denoted as $n \sim m$. Since the users in the paired classes have their preferred cluster and expected interfering cluster pointing to each other, the effective channels of these two classes of users have good orthogonality, which means that the mutual interference between the two classes can be suppressed.
(3) For the pair $n \sim m$, we find one user in class $(m, n)$ and one user in class $(n, m)$ such that the sum of their supported rates is maximum for that pair. In this way, multiuser diversity is exploited.
(4) Among all the possible pairs, we find the pair with the maximum sum rate. The two users that achieve this maximum sum rate are scheduled as the current active users, and their preferred precoders are scheduled as well.
In essence, this algorithm aims at finding two users whose channel conditions match and are both favorable; hence we call it quick matching opportunistic scheduling (QMOS). Note that a user in good channel condition may be unable to find a matching user while its supported rate is larger than the sum rate of any two matched users. In this case, the algorithm schedules this single user alone, and no other simultaneously active user is scheduled with it. Compared with MOS, QMOS depends only on the feedback information from the current scheduling interval and requires no broadcast of preschedule information at the very beginning of scheduling.
It is worth mentioning that the proposed algorithm reduces to the one adopted in IEEE 802.20 when the proposed SDMA codebook consists of only two clusters, that is, $n_U = 1$. In this case the number of available classes is $N_g = 2n_U \cdot (2n_U - 1) = 2$, so only a single pair exists in the scheduling algorithm and the fourth step can be omitted. The feedback of the index of the expected interfering cluster also becomes redundant, since the expected interfering cluster must be the one opposite to the preferred cluster.
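The four selection steps, together with the single-user fallback, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `qmos_select` and the dictionary layout of the feedback are hypothetical, each user is assumed to report its rate $C_k$, its preferred cluster, and its expected interfering cluster directly, and ties are resolved in favor of the matched pair.

```python
def qmos_select(feedback):
    """Illustrative QMOS selection (hypothetical helper, not from the paper).

    feedback: dict mapping user -> (rate, preferred_cluster, interfering_cluster)
    Returns (tuple_of_scheduled_users, scheduled_sum_rate).
    """
    # Steps (1) and (3): classify users into classes (m, n) and remember
    # only the best-rate user of each class, since the best pair of two
    # paired classes is formed by the best user of each class.
    best = {}
    for user, (rate, m, n) in feedback.items():
        if m == n:
            continue  # active users are assumed to sit on different clusters
        if (m, n) not in best or rate > best[(m, n)][1]:
            best[(m, n)] = (user, rate)

    # Steps (2) and (4): pair class (m, n) with class (n, m) and keep the
    # pair with the largest sum rate.
    chosen, chosen_rate = (), 0.0
    for (m, n), (u1, r1) in best.items():
        if m < n and (n, m) in best:
            u2, r2 = best[(n, m)]
            if r1 + r2 > chosen_rate:
                chosen, chosen_rate = (u1, u2), r1 + r2

    # Fallback: a single user whose rate exceeds every matched pair
    # is scheduled alone.
    for user, (rate, _, _) in feedback.items():
        if rate > chosen_rate:
            chosen, chosen_rate = (user,), rate
    return chosen, chosen_rate
```

Keeping one candidate per class makes the search linear in the number of users plus the number of classes $N_g$, which reflects the "quick matching" idea.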

Feedback overhead and scheduling complexity
The feedback information consists of $\{C_k, I_k, J_k\}$, and the total number of feedback bits is $Q(C_k) + \log(N_c \cdot L_c)$, which is half of that required by MOS.
We focus on the computational complexity of the feedback information calculation for each user. In order to obtain the maximum rate, each user needs $N_c L_c (N_c - 1)$ rate calculations using (12)-(13), which is slightly fewer than required in MOS.
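To make these counts concrete, the following sketch evaluates the per-user number of rate calculations for MOS and QMOS and the QMOS feedback size. The function names are hypothetical, and the logarithm is assumed to be base 2 and rounded up to a whole number of bits, which the text does not state explicitly.

```python
import math


def rate_calcs_mos(n_c, l_c):
    # N_c * L_c * (N_c - 1) + L_c rate calculations per user in MOS
    return n_c * l_c * (n_c - 1) + l_c


def rate_calcs_qmos(n_c, l_c):
    # N_c * L_c * (N_c - 1) rate calculations per user in QMOS
    return n_c * l_c * (n_c - 1)


def qmos_feedback_bits(q_bits, n_c, l_c):
    # Q(C_k) + log(N_c * L_c); base-2 log rounded up is an assumption
    return q_bits + math.ceil(math.log2(n_c * l_c))
```

For a $4 \times 4$ codebook ($N_c = 4$, $L_c = 4$), this gives 52 rate calculations for MOS against 48 for QMOS, and 8 feedback bits for QMOS when the rate is quantized with 4 bits.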

Extension of proposed scheduling algorithms
The above scheduling algorithms only work in the case of two simultaneously active users. Here, we extend the proposed algorithms to a more general case. We assume that the antenna configuration satisfies $n_R \in \{n_T/2, n_T/4, n_T/8, \ldots, 1\}$, such that the SDMA may include more than two simultaneously active users. To extend the proposed scheduling algorithms to this case, we still employ the SDMA codebook with precoders of size $n_T \times n_T/2$, where one precoder may carry signals intended for more than one user. Instead of selecting two preferred active users and their preferred precoders, the extended OS algorithms each time select two preferred user groups and two preferred precoders, with each user group transmitting with one preferred precoder. If we view one user group as one virtual user and provide it with the equivalent feedback information, the extension of the OS algorithms introduced in Sections 4.1 and 4.2 is straightforward. Thus, the key to the extension lies in the definition of the virtual user.
If the number of multiplexing substreams of user $k$ is less than $n_T/2$, that is, $M < n_T/2$, only a submatrix of one precoder will be used by such a user. Thus, the virtual receive model of (8) should be modified to find its maximum supported rate, together with both its preferred precoder and preferred submatrix. To this end, $U_1$ is first partitioned as $U_1 = [\Phi_1, \ldots, \Phi_P]$, where each $\Phi_p \in \mathbb{C}^{n_T \times M}$ and $P = n_T/M$. The virtual receive model of user $k$ is modified as where $s_1$ denotes the transmit signal of user $k$, and $\Phi_p$ is its transmit matrix; $\bar{\Phi}_p$ denotes the matrix constructed by removing $\Phi_p$ from $U_1$; $\bar{s}_1$ denotes the signal transmitted to the other users with the beams included in $\bar{\Phi}_p$. This model assumes that two precoders $U_1$ and $U_2$ are selected each time, and the users associated with the transmit signals $s_1$ and $\bar{s}_1$ are viewed as a user group scheduled on the same precoder $U_1$. Similar to the method introduced in Section 4.1, the SINR of the $m$th substream of user $k$ can be easily calculated. In order to obtain the maximum supported rate in SDMA mode, in the modified model we need to search not only all the possible values of $(U_1, U_2)$ but also those of $\Phi_p$. If we express one specific SINR value of the $m$th substream as $\mu_{k,m,i,p}$, where the indices $i$ and $p$ indicate the specific selection of $(U_1, U_2)$ and that of $\Phi_p$, respectively, the maximum supported rate can then be calculated as Similarly, we define the specific selection of $U_1$ achieving $C_k$ as the preferred precoder, denoted by $I_k$ (with a slight abuse of notation). Also, the definition of the expected interfering cluster follows that introduced in Section 4.1. Since user $k$ now uses only a submatrix of its preferred precoder as the transmit matrix, a new index $P_k$ must be fed back to indicate the position of the preferred submatrix.
We classify the users reporting the same preferred precoder into one set, which is expressed as $S_i = \{k : I_k = i\}$, where $i$ denotes the index of a specific precoder and the elements of the set are user indices. In order to define the virtual user, we further divide the set into multiple subsets $S_{i,j}$, where $j$ denotes the subset index. The criterion is that the collection of all the preferred transmit matrices of the user subset exactly constitutes the preferred precoder, namely, $S_{i,j}$ satisfies the following property: Such a user subset is then defined as a virtual user. Based on the feedback information, the maximum supported sum capacity of a virtual user can be calculated as $\sum_{k \in S_{i,j}} C_k$. Provided with the definitions of all the possible virtual users, their corresponding maximum supported sum rates, their preferred precoders, and their expected interfering clusters, the extension of MOS and QMOS reduces to replacing the selection of a preferred user with that of a preferred virtual user.
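One way to form virtual users from the feedback is sketched below. It is a minimal illustration rather than the procedure of the paper: the function name and data layout are hypothetical, submatrix positions are assumed to be numbered $0, \ldots, P-1$, users are additionally grouped by expected interfering cluster so that the virtual user has a single well-defined interfering cluster (an assumption), and only the best virtual user per group is kept.

```python
def best_virtual_users(feedback, num_submatrices):
    """Build the best virtual user per (precoder, interfering cluster).

    feedback: dict mapping user -> (rate, precoder, interfering_cluster, position)
    num_submatrices: P, the number of submatrices in one precoder
    Returns dict (precoder, interfering_cluster) -> (users, sum_rate).
    """
    # For each group, keep the best-rate user on every submatrix position.
    slots_by_group = {}
    for user, (rate, precoder, cluster, pos) in feedback.items():
        slots = slots_by_group.setdefault((precoder, cluster), {})
        if pos not in slots or rate > slots[pos][1]:
            slots[pos] = (user, rate)

    virtual = {}
    for key, slots in slots_by_group.items():
        # A virtual user exists only if the chosen submatrices together
        # make up the whole precoder, each position used exactly once.
        if all(p in slots for p in range(num_submatrices)):
            users = tuple(slots[p][0] for p in range(num_submatrices))
            sum_rate = sum(slots[p][1] for p in range(num_submatrices))
            virtual[key] = (users, sum_rate)
    return virtual
```

Each returned entry can then be passed to the two-user scheduling algorithms as if it were a single user with rate $\sum_{k \in S_{i,j}} C_k$.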

SIMULATION RESULTS
We assume a multiuser network where multiple users are randomly distributed around the base station and have the same distance to the base station. The channels between the users and the base station experience time-varying flat fading, and the 3GPP spatial channel model is used to simulate the outdoor multiuser MIMO channels, in which the time variation of the channels is assumed to depend on the mobile speed. We consider the configuration $(n_T, n_R) = (4, 2)$ and a scheduling interval of 5 milliseconds. At the beginning of the interval, the users calculate their feedback information and transmit it to the base station. After collecting the feedback information, the base station schedules two simultaneously active users and implements the SDMA transmission at the end of the interval. Therefore, a delay error of 5 milliseconds arises between the channels at SDMA scheduling and at SDMA implementation. The Monte Carlo simulations take this channel delay error into consideration but assume no channel estimation error and no feedback error. We assume all the receivers have the same noise variance, and define $\mathrm{SNR} = n_T E_s / N_0$. The system throughput is calculated as the sum of the throughputs of all the simultaneously active users, where the throughput of each active user is calculated with (12)-(13), in which $H_k$, $U_1$, and $U_2$ are replaced with the actual ones in the simulation.
Figures 3 and 4 illustrate the system throughput of the proposed SDMA schemes under channels with 3 km/h mobile speed. For comparison, the IEEE 802.20 SDMA [16] and the opportunistic SDMA with an arbitrary unitary precoder [14] are simulated as well. Note that the IEEE 802.20 SDMA is also based on a codebook, and the required feedback information includes a CQI (such as the maximum supported rate) and an index of the preferred precoder. Thus the feedback overhead is comparable to that in the proposed QMOS SDMA.
The simulated IEEE 802.20 SDMA employs a codebook of size 28 (2 clusters, 14 precoders in each cluster). Results show that both proposed SDMAs outperform the IEEE 802.20 SDMA in terms of system throughput, even with a codebook of size $4 \times 4$ ($m \times n$ means $N_c = m$ and $L_c = n$). In particular, the proposed SDMAs exhibit a gain of 1 to 1.5 bps/Hz at an SNR of 10 dB and 1.5 to 2 bps/Hz at an SNR of 20 dB, with the number of users ranging from 15 to 40. As the codebook size increases, more gain is observed, especially when the number of users is large. This gain is obtained by exploiting the potential for performance enhancement offered by the proposed SDMA codebook. Simulation results also show that in sparse networks the SDMA with an arbitrary unitary precoder [14] performs much worse than the SDMA employing the SDMA codebook. In addition, the MOS SDMA outperforms the QMOS SDMA when the number of users is small, while the QMOS SDMA outperforms the MOS SDMA when the number of users is large. Since the MOS selects active users in a successive way, it always succeeds in scheduling two active users for simultaneous transmission, while the QMOS may fail to schedule two simultaneously active users, especially when the number of users is insufficient. On the other hand, the successive scheduling may lose some multiuser diversity at the second step, where the selection of the secondary scheduled user is restricted by the already scheduled main user. These characteristics explain why the throughput curves of the two proposed SDMAs cross.
Figures 5 and 6 illustrate the system throughput of the SDMA schemes under channels with 30 km/h mobile speed. Simulation results show that the proposed SDMAs still outperform the IEEE 802.20 SDMA under this channel environment.
Compared with the case of 3 km/h mobile speed, the performance of all the SDMAs degrades slightly, which is caused by the more severe delay error of the feedback information [30]. When the number of users ranges from 20 to 40, the MOS SDMA has a throughput loss of 6 to 7%, while the other two SDMAs have a loss of 2 to 3%. Note that, unlike the other two SDMAs, the MOS SDMA depends not only on the feedback information at the current scheduling, but also on that at the previous scheduling. Thus, its performance loss is determined by the time variation of the channels over two consecutive scheduling intervals, which is more severe than for the other SDMAs.
In the previous simulations of MOS SDMA, we have assumed that the traffic requests of all the users remain static between two consecutive scheduling intervals. In practical scenarios, some users may terminate their traffic request at a certain scheduling interval, while some new users with traffic requests may be added at the same time. Figure 7 provides the system throughput performance of the MOS SDMA in such a practical scenario: at each scheduling time a certain percentage of users are newly added while some users terminate their traffic request, where we assume that the number of newly added users is equal to the number of users terminating their traffic request. Simulation results show that the performance of the MOS SDMA degrades as the percentage of newly added users increases. That is because the MOS SDMA cannot utilize the feedback information of the recently added users in the process of scheduling a main user, which results in a multiuser diversity loss. However, under channel conditions with 30 km/h mobile speed, the performance loss with 10% newly added users is very small. Even with 70% of the users newly added, the MOS SDMA employing the $4 \times 4$ codebook still outperforms the IEEE 802.20 SDMA.
Figure 8 illustrates the average rate distortion of the proposed QMOS SDMA, in comparison with that of the IEEE 802.20 SDMA. We have assumed no channel delay error in this simulation. If we use $\mathrm{Th}_{\mathrm{QMOS}}(H)$ to denote the throughput obtained by the QMOS SDMA for one channel realization, and use $\mathrm{Th}_{\mathrm{BD}}(H)$ to denote the throughput obtained by the block diagonalized precoding [6] with full channel knowledge of all the active users, the average rate distortion is defined as $E[\mathrm{Th}_{\mathrm{BD}}(H) - \mathrm{Th}_{\mathrm{QMOS}}(H)]$, where $E[\cdot]$ denotes the expectation over the channel realization. The simulation results show that the average distortion increases with the number of users, which implies that the SDMA schemes employing the SDMA codebook are more effective in sparse networks.
It is also seen that the average distortion decreases as the cluster number $N_c$ and the cluster size $L_c$ increase. It is worth mentioning that, even with a smaller codebook size ($L = N_c \cdot L_c$), the SDMA codebook packed with 4 clusters exhibits a lower distortion than the codebook packed with 2 clusters.
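In a Monte Carlo simulation the expectation in the average rate distortion is replaced by a sample mean over channel realizations. A minimal sketch is given below; the function name is hypothetical, and the two input lists are assumed to hold the block diagonalization and QMOS throughputs obtained for the same sequence of channel realizations.

```python
def average_rate_distortion(th_bd, th_qmos):
    """Sample-mean estimate of E[Th_BD(H) - Th_QMOS(H)].

    th_bd, th_qmos: equal-length sequences of per-realization throughputs
    (bps/Hz), paired by channel realization.
    """
    if len(th_bd) != len(th_qmos) or len(th_bd) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(bd - q for bd, q in zip(th_bd, th_qmos))
    return total / len(th_bd)
```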
Finally, the effect on the throughput performance of quantizing the feedback scalar quantity, that is, the maximum supported rate, in the proposed opportunistic scheduling algorithms is investigated by simulation. As shown in Figure 9, when a simple linear quantization with 4 bits is employed, the performance loss of the two proposed scheduling algorithms is minor and can be ignored when the number of users is small. It is also found that the throughput advantage of the proposed schemes over the IEEE 802.20 SDMA scheme is clearly preserved in the quantized case.
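A simple linear (uniform) quantizer of the supported rate could look as follows. This is a sketch under assumptions that are not specified in the text: the rate is quantized over the range $[0, C_{\max}]$ with a hypothetical parameter `max_rate`, values outside the range are clipped, and the base station reconstructs the rate at the midpoint of the quantization cell.

```python
def quantize_rate(rate, max_rate, bits=4):
    """Uniformly quantize a supported rate to 2**bits levels.

    Returns (index, reconstructed_rate); index is what the user would feed back.
    """
    levels = 2 ** bits
    step = max_rate / levels
    clipped = min(max(rate, 0.0), max_rate)      # clip to [0, max_rate]
    index = min(int(clipped / step), levels - 1)  # top edge maps to last cell
    return index, (index + 0.5) * step            # midpoint reconstruction
```

With `bits=4` and `max_rate=16.0` the step is 1 bps/Hz, so the quantization error of the fed-back rate is at most 0.5 bps/Hz inside the range.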

CONCLUSION
In this paper, we have presented a novel design method for limited feedback SDMA that combines codebook-based multiuser precoding with opportunistic scheduling.
We first proposed an SDMA codebook construction from the perspective of array processing and unitary perturbation. The proposed codebook provides good interuser interference suppression, and its cluster-based structure helps to simplify the scheduling algorithm. We then proposed two codebook-related opportunistic scheduling algorithms, namely, a Markov OS (MOS) and a quick matching OS (QMOS). The MOS schedules active users and their precoders in a successive way, while the QMOS schedules by classifying and then matching users. Simulation results have shown that in sparse networks the proposed SDMAs outperform the IEEE 802.20 SDMA in terms of throughput, while having a comparable feedback overhead.