Mode Switching for MIMO Broadcast Channel Based on Delay and Channel Quantization

Imperfect channel state information degrades the performance of multiple-input multiple-output (MIMO) communications; its effects on single-user (SU) and multi-user (MU) MIMO transmissions are quite different. In particular, MU-MIMO suffers from residual inter-user interference due to imperfect channel state information while SU-MIMO only suffers from a power loss. This paper compares the throughput loss of both SU and MU MIMO on the downlink due to delay and channel quantization. Accurate closed-form approximations are derived for the achievable rates for both SU and MU MIMO. It is shown that SU-MIMO is relatively robust to delayed and quantized channel information, while MU MIMO with zero-forcing precoding loses spatial multiplexing gain with a fixed delay or fixed codebook size. Based on derived achievable rates, a mode switching algorithm is proposed that switches between SU and MU MIMO modes to improve the spectral efficiency, based on the average signal-to-noise ratio (SNR), the normalized Doppler frequency, and the channel quantization codebook size. The operating regions for SU and MU modes with different delays and codebook sizes are determined, which can be used to select the preferred mode. It is shown that the MU mode is active only when the normalized Doppler frequency is very small and the codebook size is large.


I. INTRODUCTION
Over the last decade, the point-to-point multiple-input multiple-output (MIMO) link (SU-MIMO) has been extensively researched and has transitioned from a theoretical concept to a practical technique [1], [2]. Due to space and complexity constraints, however, current mobile terminals only have one or two antennas, which limits the performance of the SU-MIMO link. Multi-user MIMO (MU-MIMO) provides the opportunity to overcome such a limitation by communicating with multiple mobiles simultaneously. It effectively increases the number of equivalent spatial channels and provides spatial multiplexing gain proportional to the number of transmit antennas at the base station even with single-antenna mobiles. In addition, MU-MIMO has higher immunity to propagation limitations faced by SU-MIMO, such as channel rank loss and antenna correlation [3].
There are many technical challenges that must be overcome to exploit the full benefits of MU-MIMO. A major one is the requirement of channel state information at the transmitter (CSIT), which is difficult to obtain, especially for the downlink/broadcast channel. For the MIMO downlink with $N_t$ transmit antennas and $N_r$ receive antennas, with full CSIT the sum throughput can grow linearly with $N_t$ even when $N_r = 1$, but without CSIT the spatial multiplexing gain is the same as for SU-MIMO, i.e. the throughput grows linearly with $\min(N_t, N_r)$ at high SNR [4].
Limited feedback is an efficient way to provide partial CSIT, which feeds back the quantized channel information to the transmitter via a low-rate feedback channel [5], [6]. However, such imperfect CSIT will greatly degrade the throughput gain provided by MU-MIMO [7], [8]. Besides quantization, there are other imperfections in the available CSIT, such as estimation error and feedback delay. With imperfect CSIT, it is not clear whether, or more to the point when, MU-MIMO can outperform SU-MIMO. In this paper, we compare SU and MU-MIMO transmissions in the MIMO downlink with CSI delay and channel quantization, and propose to switch between SU and MU MIMO modes based on the achievable rate of each technique with practical receiver assumptions.

A. Related Work
For the MIMO downlink, CSIT is required to separate the spatial channels for different users.
To obtain the full spatial multiplexing gain for the MU-MIMO system employing zero-forcing (ZF) or block-diagonalization (BD) precoding, it was shown in [7], [9] that the quantization codebook size for limited feedback needs to increase linearly with SNR (in dB) and the number of transmit antennas. Zero-forcing dirty-paper coding and channel inversion systems with limited feedback were investigated in [8], where a sum rate ceiling due to a fixed codebook size was derived for both schemes. In [10], it was shown that to exploit multiuser diversity for ZF, both channel direction and information about signal-to-interference-plus-noise ratio (SINR) must be fed back. More recently, a comprehensive study of the MIMO downlink with ZF precoding was done in [11], which considered downlink training and explicit channel feedback and concluded that significant downlink throughput is achievable with efficient CSI feedback. For a compound MIMO broadcast channel, the information theoretic analysis in [12] showed that scaling the CSIT quality such that the CSIT error is dominated by the inverse of the SNR is both necessary and sufficient to achieve the full spatial multiplexing gain.
Although previous studies show that the spatial multiplexing gain of MU-MIMO can be achieved with limited feedback, it requires the codebook size to increase with SNR and the number of transmit antennas. Even if such a requirement is satisfied, there is an inevitable rate loss due to quantization error, plus other CSIT imperfections such as estimation error and delay.
In addition, most prior work has focused on the achievable spatial multiplexing gain, mainly based on the analysis of the rate loss due to imperfect CSIT, which is usually a loose bound [7], [9], [12]. Such analysis cannot accurately characterize the throughput loss, and no comparison with SU-MIMO has been made. In this paper, we derive good approximations for the achievable throughput for both SU and MU MIMO systems with fixed channel information accuracy, i.e. with a fixed delay and a fixed quantization codebook size. We are interested in the following question: With imperfect CSIT, including delay and channel quantization, when can MU-MIMO actually deliver a throughput gain over SU-MIMO? Based on the answer, we can select the mode with the higher throughput as the transmission technique.

B. Contributions
In this paper, we investigate SU and MU-MIMO in the broadcast channel with CSI delay and limited feedback. The main contributions of this paper are as follows.
• SU vs. MU Analysis. We investigate the impact of imperfect CSIT due to delay and channel quantization. We show that the SU mode is more robust to imperfect CSIT as it only suffers a constant rate loss, while MU-MIMO suffers more severely from the residual inter-user interference. We characterize the residual interference due to delay and channel quantization, which shows these two effects are equivalent. Based on an independence approximation of the interference terms and the signal term, accurate closed-form approximations are derived for the ergodic rates for both SU and MU MIMO modes.
• Mode Switching Algorithm. A SU/MU mode switching algorithm is proposed based on the ergodic sum rate as a function of the average SNR, normalized Doppler frequency, and the quantization codebook size. This transmission technique only requires a small number of users to feed back instantaneous channel information. The mode switching points can be calculated from the previously derived approximations for ergodic rates.
• Operating Regions. The operating regions for SU and MU modes are determined, from which we can determine the active mode and find the condition that activates each mode. With a fixed delay and codebook size, if the MU mode is possible at all, there are two mode switching points, with the SU mode preferred at both low and high SNRs. The MU mode will only be activated when the normalized Doppler frequency is very small and the codebook size is large. From the numerical results, the minimum feedback bits per user to get the MU mode activated grow approximately linearly with the number of transmit antennas.
The rest of the paper is organized as follows. The system model and some assumptions are presented in Section II. The transmission techniques for both SU and MU MIMO modes are described in Section III. The rate analysis for both SU and MU modes and the mode switching are presented in Section IV. Numerical results and conclusions are in Sections V and VI, respectively.

II. SYSTEM MODEL
We consider a MIMO downlink, where the transmitter (the base station) has $N_t$ antennas and each mobile user has a single antenna. The system parameters are listed in Table I. During each transmission period, which is shorter than the channel coherence time so that the channel can be assumed constant, the base station transmits to one (SU-MIMO mode) or multiple (MU-MIMO mode) users. The discrete-time complex baseband received signal at the u-th user at time n is given as¹
$$y_u[n] = \mathbf{h}_u^*[n] \sum_{u'=1}^{U} \mathbf{f}_{u'}[n]\, x_{u'}[n] + z_u[n], \qquad (1)$$
where $\mathbf{h}_u[n]$ is the $N_t \times 1$ channel vector from the transmitter to the u-th user, and $z_u[n]$ is the normalized complex Gaussian noise, i.e. $z_u[n] \sim \mathcal{CN}(0, 1)$. $x_u[n]$ and $\mathbf{f}_u[n]$ are the transmit signal and the $N_t \times 1$ precoding vector for the u-th user, respectively. The total transmit power is constrained to $P$. As the noise is normalized, $P$ is also the average transmit SNR.
To assist the analysis, we assume that the channel $\mathbf{h}_u[n]$ is well modeled as a spatially white Gaussian channel, with entries $h_{i,j}[n] \sim \mathcal{CN}(0, 1)$, and the channels are i.i.d. over different users.
The results will be different for different channel models. For example, a limited feedback system with line of sight MIMO channel requires fewer feedback bits compared to the Rayleigh channel [13]. The investigation of other channel models is left to future work.
We consider two of the main sources of CSIT imperfection, namely delay and quantization error², specified as follows.

A. CSI Delay Model
We consider a stationary ergodic Gauss-Markov block fading process [14], where the channel stays constant for a symbol duration and changes from symbol to symbol according to
$$\mathbf{h}[n] = \rho\, \mathbf{h}[n-1] + \mathbf{e}[n], \qquad (2)$$
where $\mathbf{e}[n]$ is the channel error vector, with i.i.d. entries $e_i[n] \sim \mathcal{CN}(0, \epsilon_e^2)$, and it is uncorrelated with $\mathbf{h}[n-1]$. We assume the CSI delay is one symbol. It is straightforward to extend the results to a delay of multiple symbols. For the numerical analysis, the classical Clarke's isotropic scattering model will be used as an example, for which the correlation coefficient is $\rho = J_0(2\pi f_d T_s)$ with Doppler spread $f_d$ [15], where $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind. The variance of the error vector is $\epsilon_e^2 = 1 - \rho^2$. Therefore, both $\rho$ and $\epsilon_e$ are determined by the normalized Doppler frequency $f_d T_s$.
The channel model in (2) is widely used to model time-varying channels. For example, it is used to investigate the impact of feedback delay on the performance of closed-loop transmit diversity in [16] and on the system capacity and bit error rate of the point-to-point MIMO link in [17]. It simplifies the analysis, and the results can be easily extended to other scenarios. Essentially, this model is [...], where $\gamma_p$ is the SNR of the pilot symbol [18].
b) Analog Feedback: For analog feedback, the error variance is $\epsilon_e^2 = \frac{1}{1 + \tau_{ul}\gamma_{ul}}$, where $\tau_{ul}$ is the number of channel uses per channel coefficient and $\gamma_{ul}$ is the SNR on the uplink feedback channel [19].
c) Analog Feedback with Prediction: As shown in [20], for analog feedback with a d-step MMSE predictor and the Gauss-Markov model, the error variance is $\epsilon_e^2 = \rho^{2d}\epsilon_0 + (1 - \rho^2)\sum_{l=0}^{d-1}\rho^{2l}$, where $\rho$ is the same as in (2) and $\epsilon_0$ is the Kalman filtering mean-square error.
Therefore, the results in this paper can be easily extended to these systems. In the following parts, we focus on the effect of CSI delay.

¹ In this paper, we use uppercase boldface letters for matrices ($\mathbf{X}$) and lowercase boldface for vectors ($\mathbf{x}$). $E[\cdot]$ is the expectation operator. The conjugate transpose of a matrix $\mathbf{X}$ (vector $\mathbf{x}$) is $\mathbf{X}^*$ ($\mathbf{x}^*$). Similarly, $\mathbf{X}^\dagger$ denotes the pseudo-inverse, $\tilde{\mathbf{x}}$ denotes the normalized vector of $\mathbf{x}$, i.e. $\tilde{\mathbf{x}} = \mathbf{x}/\|\mathbf{x}\|$, and $\hat{\mathbf{x}}$ denotes the quantized vector of $\tilde{\mathbf{x}}$.
² For a practical system, the number of feedback bits for each user is usually fixed, and there will inevitably be delay in the available CSI, both of which are difficult or even impossible to adjust. Other effects such as channel estimation error can be made small, for example by increasing the transmit power or the number of pilot symbols.
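As a minimal sketch of the delay model, the following Python code evaluates the Clarke correlation coefficient $\rho = J_0(2\pi f_d T_s)$, the error variance $\epsilon_e^2 = 1 - \rho^2$, and one Gauss-Markov update of the form $\mathbf{h}[n] = \rho\mathbf{h}[n-1] + \mathbf{e}[n]$. The function names, the truncated power series for $J_0$, and the rounded speed of light are assumptions made for illustration only; they are not taken from the paper.

```python
import math
import random


def bessel_j0(x, terms=40):
    """Zeroth-order Bessel function of the first kind, via its power series.

    J0(x) = sum_k (-1)^k (x/2)^(2k) / (k!)^2. The truncated series is meant
    for the small arguments 2*pi*f_d*T_s that arise here, not for large x.
    """
    total, term = 0.0, 1.0  # term holds the k-th summand, starting at k = 0
    for k in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / ((k + 1) ** 2)
    return total


def delay_parameters(speed_kmh, carrier_hz, symbol_s):
    """Return (f_d, rho, eps2) for Clarke's model with a one-symbol delay."""
    speed_of_light = 3.0e8  # m/s, rounded value (assumption)
    f_d = (speed_kmh / 3.6) * carrier_hz / speed_of_light
    rho = bessel_j0(2.0 * math.pi * f_d * symbol_s)
    return f_d, rho, 1.0 - rho ** 2


def gauss_markov_step(h_prev, rho, rng):
    """One update h[n] = rho*h[n-1] + e[n], with e_i[n] ~ CN(0, 1 - rho^2)."""
    sigma = math.sqrt((1.0 - rho ** 2) / 2.0)  # std of real and imaginary parts
    return [rho * h + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for h in h_prev]
```

For instance, `delay_parameters(20.0, 2.0e9, 1.0e-3)` corresponds to the 20 km/hr, 2 GHz, 1 msec setting used later in the numerical results.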

B. Channel Quantization Model
We consider frequency-division duplexing (FDD) systems, where limited feedback techniques provide partial CSIT through a dedicated feedback channel from the receiver to the transmitter.
The channel direction information for the precoder design is fed back using a quantization codebook known at both the transmitter and receiver.
The quantization is chosen from a codebook of unit norm vectors of size $L = 2^B$. We assume each user uses a different codebook to avoid the same quantization vector. The codebook for user u is $\mathcal{C}_u = \{\mathbf{c}_{u,1}, \mathbf{c}_{u,2}, \cdots, \mathbf{c}_{u,L}\}$. Each user quantizes its channel to the closest codeword, where closeness is measured by the inner product. Therefore, the index of the channel for user u is
$$I_u = \arg\max_{1 \le \ell \le L} |\tilde{\mathbf{h}}_u^*[n]\, \mathbf{c}_{u,\ell}|.$$
Each user needs to feed back B bits to denote this index, and the transmitter has the quantized channel information $\hat{\mathbf{h}}_u = \mathbf{c}_{u,I_u}$. As the optimal vector quantizer for this problem is not known in general, random vector quantization (RVQ) [21] is used, where each quantization vector is independently chosen from the isotropic distribution on the $N_t$-dimensional unit sphere. It has been shown in [7] that RVQ can facilitate the analysis and provide performance close to the optimal quantization. In this paper, we analyze the achievable rate averaged over both RVQ-based random codebooks and fading distributions.
An important metric for the limited feedback system is the squared angular distortion, defined as $\sin^2\theta_u = 1 - |\tilde{\mathbf{h}}_u^*[n]\hat{\mathbf{h}}_u[n]|^2$. It was shown in [7], [22] that the expectation in i.i.d. Rayleigh fading is given by
$$E_\theta[\sin^2\theta_u] = 2^B \cdot \beta\left(2^B, \frac{N_t}{N_t - 1}\right),$$
where $\beta(\cdot)$ is the beta function. It can be tightly bounded as [7]
$$\frac{N_t - 1}{N_t}\, 2^{-\frac{B}{N_t - 1}} \le E_\theta[\sin^2\theta_u] \le 2^{-\frac{B}{N_t - 1}}.$$
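The quantization step described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' simulation code: the function names are hypothetical, codewords are drawn by normalizing i.i.d. complex Gaussian vectors (which gives the isotropic distribution on the unit sphere), and the codeword is chosen by maximizing the magnitude of the inner product with the normalized channel.

```python
import math
import random


def random_unit_vector(n_t, rng):
    """Isotropic unit vector in C^n_t: normalize an i.i.d. complex Gaussian vector."""
    v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_t)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def rvq_codebook(n_t, bits, rng):
    """Random vector quantization codebook with L = 2^B unit-norm codewords."""
    return [random_unit_vector(n_t, rng) for _ in range(2 ** bits)]


def inner(a, b):
    """Inner product a^* b (conjugate on the first argument)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def quantize(h, codebook):
    """Return (index, sin2): the closest codeword and the squared angular distortion."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in h))
    h_dir = [x / norm for x in h]  # normalized channel direction
    index = max(range(len(codebook)),
                key=lambda i: abs(inner(h_dir, codebook[i])))
    sin2 = 1.0 - abs(inner(h_dir, codebook[index])) ** 2
    return index, sin2
```

Only the index needs to be fed back, which costs B bits per user; the exhaustive search over $2^B$ codewords limits this sketch to moderate B.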
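The expected distortion and its bounds can be checked numerically. The sketch below assumes the RVQ result reported in [7], namely $E[\sin^2\theta] = 2^B\beta(2^B, N_t/(N_t-1))$ with lower bound $\frac{N_t-1}{N_t}2^{-B/(N_t-1)}$ and upper bound $2^{-B/(N_t-1)}$; the function names are hypothetical, and the beta function is evaluated through log-gamma to avoid overflow for large codebooks.

```python
import math


def expected_sin2(n_t, bits):
    """E[sin^2(theta)] = 2^B * Beta(2^B, N_t/(N_t-1)), computed via log-gamma."""
    size = 2.0 ** bits
    a = n_t / (n_t - 1.0)
    log_beta = math.lgamma(size) + math.lgamma(a) - math.lgamma(size + a)
    return math.exp(math.log(size) + log_beta)


def sin2_bounds(n_t, bits):
    """Return the (lower, upper) bounds on the expected distortion."""
    upper = 2.0 ** (-bits / (n_t - 1.0))
    return (n_t - 1.0) / n_t * upper, upper
```

For $N_t = 2$ and $B = 1$ the expression reduces to $2\beta(2, 2) = 1/3$, which lies between the bounds 0.25 and 0.5.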

III. TRANSMISSION TECHNIQUES
In this section, we describe the transmission techniques for both SU and MU MIMO systems with perfect CSIT, which will be used in the subsequent sections for imperfect CSIT systems.
By doing this, we focus on the impacts of imperfect CSIT on the conventional transmission techniques. Designing imperfect CSIT-aware precoders is left to future work. Throughout this paper, we use the achievable ergodic rate as the performance metric for both SU and MU-MIMO systems. The base station transmits to a single user (U = 1) for the SU-MIMO system and to N t users (U = N t ) for the MU-MIMO system. The SU/MU mode switching algorithm is also described.

A. SU-MIMO System
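With perfect CSIT, it is optimal for the SU-MIMO system to transmit along the channel direction [1], i.e. to select the beamforming (BF) vector as $\mathbf{f}[n] = \tilde{\mathbf{h}}[n]$, denoted as eigen-beamforming in this paper. The ergodic capacity of this system is the same as that of a maximal ratio combining diversity system, given by [23]
$$R_{BF}(P) = \log_2(e)\, e^{1/P} \sum_{k=0}^{N_t - 1} \frac{\Gamma(-k, 1/P)}{P^k},$$
where $\Gamma(\cdot, \cdot)$ is the complementary incomplete gamma function defined as
$$\Gamma(\alpha, x) = \int_x^\infty t^{\alpha - 1} e^{-t}\, dt.$$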
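The perfect-CSIT beamforming rate can be evaluated in closed form. The sketch below assumes the standard maximal ratio combining expression $R_{BF}(P) = \log_2(e)\,e^{1/P}\sum_{k=0}^{N_t-1}\Gamma(-k, 1/P)/P^k$; the helper names are hypothetical. $\Gamma(-k, x)$ is obtained from $\Gamma(0, x) = E_1(x)$ with the recurrence $\Gamma(a, x) = (\Gamma(a+1, x) - x^a e^{-x})/a$, and $E_1$ is computed from its power series, which is only intended for moderate arguments (roughly $P \ge 1$ here).

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp_integral_e1(x, terms=200):
    """E_1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!), for 0 < x <~ 10."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0
    for k in range(1, terms):
        term *= -x / k          # term now equals (-x)^k / k!
        total -= term / k
    return total


def upper_gamma_neg_int(k, x):
    """Gamma(-k, x) for integer k >= 0, by downward recurrence from E_1(x)."""
    g = exp_integral_e1(x)      # Gamma(0, x) = E_1(x)
    for j in range(1, k + 1):
        a = -j
        g = (g - x ** a * math.exp(-x)) / a
    return g


def bf_ergodic_capacity(p, n_t):
    """Ergodic rate in bps/Hz of eigen-beamforming with perfect CSIT."""
    x = 1.0 / p
    series = sum(upper_gamma_neg_int(k, x) / p ** k for k in range(n_t))
    return math.log2(math.e) * math.exp(x) * series
```

The recurrence loses precision through cancellation when $1/P$ is large, so a different evaluation would be needed at low SNR.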

B. MU-MIMO System
For MIMO broadcast channels, although dirty-paper coding (DPC) [24] is optimal [25]- [29], it is difficult to implement in practice. As in [7], [11], ZF precoding is used in this paper, which is a linear precoding technique that precancels inter-user interference at the transmitter.
There are several reasons for us to use this simple transmission technique. First, due to its simple structure, it is possible to derive closed-form results, which can provide helpful insights.
Second, the ZF precoding is able to provide full spatial multiplexing gain and only has a power offset compared to the optimal DPC system [30]. In addition, it was shown in [30] that the ZF precoding is optimal among the set of all linear precoders at asymptotically high SNR. In Section V, we will show that our results for the ZF system also apply for the regularized ZF precoding [31], which provides a higher throughput than the ZF precoding at low to moderate SNRs.
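With precoding vectors $\mathbf{f}_u[n]$, $u = 1, 2, \cdots, U$, assuming equal power allocation³, the received SINR for the u-th user is given as
$$\gamma_{ZF,u} = \frac{\frac{P}{U}|\mathbf{h}_u^*[n]\mathbf{f}_u[n]|^2}{1 + \frac{P}{U}\sum_{u' \neq u}|\mathbf{h}_u^*[n]\mathbf{f}_{u'}[n]|^2}.$$
With perfect CSIT, the ZF precoding vectors satisfy $\mathbf{h}_u^*[n]\mathbf{f}_{u'}[n] = 0$ for all $u' \neq u$, so there is no inter-user interference. The received SINR for the u-th user becomes
$$\gamma_{ZF,u} = \frac{P}{U}|\mathbf{h}_u^*[n]\mathbf{f}_u[n]|^2.$$
As $\mathbf{f}_u[n]$ is independent of $\mathbf{h}_u[n]$, and $\|\mathbf{f}_u[n]\|^2 = 1$, the effective channel for the u-th user is a single-input single-output (SISO) Rayleigh fading channel. Therefore, the achievable sum rate for the ZF system is given by
$$R_{ZF}(P) = \sum_{u=1}^{U} E\left[\log_2(1 + \gamma_{ZF,u})\right]. \qquad (9)$$
Each term on the right hand side of (9) is the ergodic capacity of a SISO system in Rayleigh fading, given in [23] as
$$R_{ZF,u} = \log_2(e)\, e^{U/P} E_1(U/P),$$
where $E_1(\cdot)$ is the exponential-integral function of the first order, $E_1(x) = \int_1^\infty \frac{e^{-xt}}{t}\, dt$.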
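The perfect-CSIT ZF sum rate can be cross-checked by simulation. The sketch below assumes that each user sees a SISO Rayleigh channel with average SNR $P/U$, so the per-user rate is $\log_2(e)\,e^{U/P}E_1(U/P)$ and the effective channel gain is exponential with unit mean. Function names are hypothetical, and the series for $E_1$ is only intended for $U/P$ up to roughly 10.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def exp_integral_e1(x, terms=200):
    """E_1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!), for 0 < x <~ 10."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0
    for k in range(1, terms):
        term *= -x / k          # term now equals (-x)^k / k!
        total -= term / k
    return total


def zf_sum_rate_perfect_csit(p, u):
    """Closed-form ZF sum rate in bps/Hz with equal power P/U per user."""
    x = u / p
    return u * math.log2(math.e) * math.exp(x) * exp_integral_e1(x)


def zf_sum_rate_monte_carlo(p, u, trials, rng):
    """Monte Carlo estimate: each effective gain is exponential with unit mean."""
    total = 0.0
    for _ in range(trials):
        total += sum(math.log2(1.0 + (p / u) * rng.expovariate(1.0))
                     for _ in range(u))
    return total / trials
```

With a fixed seed, for example `zf_sum_rate_monte_carlo(10.0, 4, 20000, random.Random(0))`, the estimate should land close to the closed-form value, up to Monte Carlo noise.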

C. SU/MU Mode Switching
Imperfect CSIT will degrade the performance of the MIMO communication. In this case, it is unclear whether and when the MU-MIMO system can actually provide a throughput gain over the SU-MIMO system. Based on the analysis of the achievable ergodic rates in this paper, we propose to switch between SU and MU modes and select the one with the higher achievable rate.
The channel correlation coefficient ρ, which captures the CSI delay effect, usually varies slowly. The quantization codebook size is normally fixed for a given system. Therefore, it is reasonable to assume that the transmitter has knowledge of both delay and channel quantization, and can estimate the achievable ergodic rates of both SU and MU MIMO modes. Then it can determine the active mode and select one (SU mode) or N t (MU mode) users to serve. This is a low-complexity transmission strategy, and can be combined with random user selection, round-robin scheduling, or scheduling based on queue length rather than channel status. It only requires the selected users to feed back instantaneous channel information. Therefore, it is suitable for a system that has a constraint on the total feedback bits and only allows a small number of users to send feedback, or a system with a strict delay constraint that cannot employ opportunistic scheduling based on instantaneous channel information.
To determine the transmission rate, the transmitter sends pilot symbols, from which the active users estimate the received SINRs and feed them back to the transmitter. In this paper, we assume the transmitter knows perfectly the actual received SINR at each active user. In practice, there will inevitably be errors in such information due to estimation error and feedback delay, which will result in rate mismatch, i.e. the transmission rate based on the estimated SINR does not match the actual SINR on the channel, so there will be outage events. How to deal with such rate mismatch is of practical importance, and we mention several possible approaches as follows. The full investigation of this issue requires further research and is out of the scope of this paper. Considering the outage events, the transmission strategy can be designed based on the actual information symbols successfully delivered to the receiver, denoted as goodput in [32], [33]. With the estimated SINR, another approach is to back off on the transmission rate based on the variance of the estimation error, as done in [34], [35] for the single-antenna opportunistic scheduling system and in [36] for the multiple-antenna opportunistic beamforming system. Combined with user selection, the transmission rate can also be determined based on some lower bound of the actual SINR to make sure that no outage occurs, as done in [37] for the limited feedback system.
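The selection rule described above amounts to comparing two rate estimates. The following sketch is only an illustration of that comparison: `rate_su` and `rate_mu` are hypothetical callables standing in for whatever ergodic-rate estimates the transmitter has for the current delay and codebook size, and the returned dictionary layout is an assumption.

```python
def select_mode(snr_linear, rate_su, rate_mu, n_t):
    """Pick the mode with the higher estimated ergodic rate at the given average SNR.

    rate_su and rate_mu are caller-supplied functions of the linear SNR
    (hypothetical placeholders for the rate approximations of each mode).
    """
    r_su = rate_su(snr_linear)
    r_mu = rate_mu(snr_linear)
    if r_mu > r_su:
        # MU mode serves N_t users simultaneously
        return {"mode": "MU", "num_users": n_t, "rate": r_mu}
    # SU mode serves a single user; ties default to the SU mode
    return {"mode": "SU", "num_users": 1, "rate": r_su}
```

Because the decision depends only on slowly varying quantities, it can be recomputed rarely, and only the selected users need to feed back instantaneous channel information.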

IV. PERFORMANCE ANALYSIS AND MODE SWITCHING
In this section, we investigate the achievable ergodic rates for both SU and MU MIMO modes.
We first analyze the average received SNR for the BF system and the average residual interference for the ZF system, which provide insights on the impact of imperfect CSIT. To select the active mode, accurate closed-form approximations for both SU and MU modes are then derived.

A. SU Mode: Eigen-Beamforming
Based on (11), we get the following theorem on the average received SNR for the SU mode.
Theorem 1: The average received SNR for a BF system with channel quantization and CSI delay is [...], where $\Delta^{(Q)}_{BF}$ and $\Delta^{(D)}_{BF}$ show the impact of channel quantization and feedback delay, respectively, given by [...].
Proof: See Appendix B.
From Jensen's inequality, an upper bound of the achievable rate for the BF system with both quantization and delay is given by [...].
Remark 1: Note that $\rho^2 = 1 - \epsilon_e^2$, so the average SNR decreases with $\epsilon_e^2$. With a fixed B and a fixed delay, the SNR degradation is a constant factor independent of P. At high SNR, the imperfect CSIT introduces a constant rate loss $\log_2 \rho^2 \Delta^{(Q)}_{BF}$.
The upper bound provided by Jensen's inequality is not tight. To get a better approximation for the achievable rate, we first make the following approximation on the instantaneous received SNR: [...], i.e. we remove the term with $\mathbf{e}[n]$, as it is normally insignificant compared to $\rho\mathbf{h}[n-1]$. This will be verified later by simulation. In this way, the system is approximated as one with limited feedback and with equivalent SNR $\rho^2 P$.
From [22], the achievable rate of the limited feedback BF system is given by [...], where [...] can be approximated as [...].
As a special case, consider a system with delay only, e.g. the time-division duplexing (TDD) system, which can estimate the CSI from the uplink using channel reciprocity but with propagation and processing delay. The BF vector is then based on the delayed channel direction, i.e. $\mathbf{f}[n] = \tilde{\mathbf{h}}[n-1]$.
We provide a good approximation for the achievable rate for such a system as follows.
The instantaneous received SNR is given as

Theorem 3:
The average noise plus interference for the u-th user of the ZF system with both channel quantization and CSI delay is [...], where $\Delta^{(Q)}_{ZF,u}$ and $\Delta^{(D)}_{ZF,u}$ are the degradations brought by channel quantization and feedback delay, respectively, given by [...] and $\Delta^{(D)}_{ZF,u} = \epsilon_{e,u}^2$.

Proof:
The proof is similar to the one for Theorem 1 in appendix B.

Remark 2:
From Theorem 3 we see that the average residual interference for a given user consists of three parts:
(i) The number of interferers, $U - 1$. The more users the system supports, the higher the mutual interference.
(ii) The transmit power of the other active users, $P/U$. As the transmit power increases, the system becomes interference-limited. It is possible to improve performance through power allocation, which is left to future work.
(iii) The CSIT accuracy for this user, which is reflected in the degradation terms $\Delta^{(Q)}_{ZF,u}$ and $\Delta^{(D)}_{ZF,u}$. The user with a larger delay or a smaller codebook size suffers a higher residual interference.
From this remark, the interference term due to delay, $\frac{P}{U}(U - 1)\epsilon_{e,u}^2$, equivalently comes from $U - 1$ virtual interfering users, each with equivalent SNR $\frac{P}{U}\Delta^{(D)}_{ZF,u}$. With a high P and a fixed $\epsilon_{e,u}$ or B, the system is interference-limited and cannot achieve the full spatial multiplexing gain. Therefore, to keep a constant rate loss, i.e. to sustain the spatial multiplexing gain, the channel error due to both quantization and delay needs to be reduced as the SNR increases. Similar to the result for the limited feedback system in [7], for the ZF system with both delay and channel quantization, we can get the following corollary for the condition to achieve the full spatial multiplexing gain.

Corollary 1:
To keep a constant rate loss of $\log_2 \delta_0$ bps/Hz for each user, the codebook size and CSI delay need to satisfy the following condition: [...].
Proof: As shown in [7], [11], the rate loss for each user due to imperfect CSIT is upper bounded by $\log_2 \Delta^{(QD)}_{ZF,u}$. The corollary follows from solving $\log_2 \Delta^{(QD)}_{ZF,u} = \log_2 \delta_0$.
Equivalently, this means that for a given $\rho^2$, the number of feedback bits per user needs to scale as [...]. As $\rho_u^2 \to 1$, i.e. there is no CSI delay, the condition becomes $B = (N_t - 1)\log_2\left(\frac{P}{\delta_0 - 1}\right)$, which agrees with the result in [7] with limited feedback only.
2) Achievable Rate: For the ZF system with imperfect CSI, the genie-aided upper bound for the ergodic achievable rate⁴ is given by [11] [...]. We assume the mobile users can perfectly estimate the noise and interference and feed them back to the transmitter, so the upper bound is chosen as the performance metric, i.e. $R_{ZF,ub}$, as in [7], [8], [10].
The following lower bound based on the rate loss analysis is used in [7], [11]: [...], where $R_{ZF}$ is the achievable rate with perfect CSIT, given in (9). However, this lower bound is very loose. In the following, we will derive a more accurate approximation for the achievable rate for the ZF system.
To get a good approximation for the achievable rate for the ZF system, we first approximate the instantaneous SINR as [...], i.e. we eliminate the interference terms that contain both $\mathbf{h}_u[n-1]$ and $\mathbf{e}_u[n]$, as $\mathbf{e}_u[n]$ is normally very small, so we get two separate interference sums due to delay and quantization, respectively.
For the interference term due to delay, [...]. For the interference term due to quantization, it was shown in [...] that [...] is equivalent to the product of the quantization error $\sin^2\theta_u$ and an independent $\beta(1, N_t - 2)$ random variable. Therefore, we have [...]. In [10], with a quantization cell approximation⁵ [38], [39], it was shown that $\|\mathbf{h}_u[n-1]\|^2 \sin^2\theta_u$ has a Gamma distribution with parameters $(N_t - 1, \delta)$, where $\delta = 2^{-\frac{B}{N_t - 1}}$. As shown in [10], the analysis based on the quantization cell approximation is close to the performance of random vector quantization, so we use this approach to derive the achievable rate.
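The delay-free limit of the scaling condition is easy to evaluate. The sketch below covers only that limiting case, $B = (N_t - 1)\log_2(P/(\delta_0 - 1))$ as $\rho^2 \to 1$; the function name is hypothetical, and the general condition with delay is not implemented here.

```python
import math


def feedback_bits_no_delay(snr_db, n_t, delta0):
    """Feedback bits per user for a rate loss of log2(delta0) bps/Hz, no CSI delay.

    Implements B = (N_t - 1) * log2(P / (delta0 - 1)); requires delta0 > 1.
    The result is a real number; a practical system would round it up.
    """
    if delta0 <= 1.0:
        raise ValueError("delta0 must exceed 1 (a strictly positive rate loss)")
    p = 10.0 ** (snr_db / 10.0)  # linear SNR
    return (n_t - 1) * math.log2(p / (delta0 - 1.0))
```

With $N_t = 4$ and $\delta_0 = 2$ (a 1 bps/Hz loss per user), every additional 3 dB of SNR requires about 3 more feedback bits per user, which illustrates why a fixed codebook eventually becomes the bottleneck.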
The following lemma gives the distribution of the interference term due to quantization.
Lemma 1: Based on the quantization cell approximation, the interference term due to quantization in (27) is an exponential random variable with mean δ, i.e. its probability density function (pdf) is $p(x) = \frac{1}{\delta} e^{-x/\delta}$, $x \ge 0$.
Proof: See Appendix D.
Remark 3: From this lemma, we see that the residual interference terms due to both delay and quantization are exponential random variables, which means the delay and quantization error have equivalent effects, only with different means. By comparing the means of these two terms, i.e. comparing $\epsilon_e^2$ and $2^{-\frac{B}{N_t - 1}}$, we can find the dominant one. In addition, with this result, we can approximate the achievable rate of the ZF limited feedback system, which will be provided later in this section.
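The comparison suggested in Remark 3 can be sketched numerically. The code below assumes Clarke's model, $\rho = J_0(2\pi f_d T_s)$, for the delay term and compares $\epsilon_e^2 = 1 - \rho^2$ with $2^{-B/(N_t-1)}$; the function names and the series evaluation of $J_0$ are assumptions made for illustration.

```python
import math


def bessel_j0(x, terms=40):
    """J0(x) via its power series; intended for small arguments only."""
    total, term = 0.0, 1.0  # term holds the k-th summand, starting at k = 0
    for k in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / ((k + 1) ** 2)
    return total


def dominant_impairment(fd_ts, bits, n_t):
    """Compare the mean interference factors due to delay and quantization.

    Returns (label, delay_mean, quant_mean), where delay_mean = 1 - rho^2
    and quant_mean = 2^(-B/(N_t-1)).
    """
    rho = bessel_j0(2.0 * math.pi * fd_ts)
    delay_mean = 1.0 - rho ** 2
    quant_mean = 2.0 ** (-bits / (n_t - 1.0))
    label = "delay" if delay_mean > quant_mean else "quantization"
    return label, delay_mean, quant_mean
```

For example, with $f_d T_s = 0.037$ and $N_t = 4$, quantization has the larger mean for B = 10, while delay has the larger mean for B = 18.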
Based on the distribution of the interference terms, the approximation for the achievable rate for the MU mode is given in the following theorem.

Theorem 4:
The ergodic achievable rate for the u-th user in the MU mode with both delay and channel quantization can be approximated as [...], and $I_3(\cdot, \cdot, \cdot)$ is given in (38) in Appendix A.
Proof: See Appendix E.
The ergodic sum throughput is [...].
⁵ The quantization cell approximation is based on the ideal assumption that each quantization cell is a Voronoi region on a spherical cap with surface area $2^{-B}$ of the total area of the unit sphere for a B-bit codebook. The details can be found in [10], [38], [39].
As a special case, for a ZF system with delay only, we can get the following approximation for the ergodic achievable rate.

Corollary 2:
The ergodic achievable rate for the u-th user in the ZF system with delay is approximated as [...], where $\alpha = \frac{P}{U}$, $\beta = \epsilon_{e,u}^2 \frac{P}{U}$, $M = N_t - 1$, and $I_3(\cdot, \cdot, \cdot)$ is given in (38) in Appendix A.
Proof: Following the same steps as in Appendix E with $\delta_1 = 0$.
Remark 4: As shown in Lemma 1, the effects of delay and channel quantization are equivalent, so the approximation in (32) also applies for the limited feedback system. This is verified by simulation in Fig. 1, which shows that this approximation is very accurate and can be used to analyze the limited feedback system.

C. Mode Switching
We first verify the approximation (30) in Fig. 2, which compares the approximation with simulation results and the lower bound (26), with B = 10, v = 20 km/hr, f c = 2 GHz, and T s = 1 msec. We see that the lower bound is very loose, while the approximation is accurate especially for N t = 2. In fact, the approximation turns out to be a lower bound. Note that due to the imperfect CSIT, the sum rate reduces with N t .
In Fig. 3, we compare the BF and ZF systems, with B = 18, f c = 2 GHz, v = 10 km/hr, and T s = 1 msec. We see that the approximation for the BF system almost matches the simulation exactly. The approximation for the ZF system is accurate at low to medium SNRs, and becomes a lower bound at high SNR, which is approximately 0.7 bps/Hz in total, or 0.175 bps/Hz per user, lower than the simulation. The throughput of the ZF system is limited by the residual inter-user interference at high SNR, where it is lower than that of the BF system. This motivates switching between the SU and MU MIMO modes. The approximations (17) and (30) will be used to calculate the mode switching points. There may be two switching points for the system with delay, as the SU mode will be selected at both low and high SNR. These two points can be calculated by providing different initial values to a nonlinear equation solver, such as fsolve in MATLAB.
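The search for the two switching points can be sketched without a dedicated solver by scanning the SNR axis for sign changes of the rate gap and refining each one by bisection. The two rate functions below are toy stand-ins invented for illustration (an array-gain-limited SU rate and an interference-limited MU rate with a ceiling); they are not the approximations (17) and (30), and all parameter values are assumptions.

```python
import math


def toy_rate_su(p, n_t=4, gain=0.9):
    """Illustrative SU rate: log2(1 + gain * N_t * P). Not the paper's (17)."""
    return math.log2(1.0 + gain * n_t * p)


def toy_rate_mu(p, u=4, residual=0.02):
    """Illustrative MU sum rate with residual interference. Not the paper's (30)."""
    return u * math.log2(1.0 + (p / u) / (1.0 + residual * p))


def find_switching_points(rate_su, rate_mu, db_min=-10.0, db_max=60.0,
                          step=1.0, tol=1e-9):
    """Return the SNR values in dB where rate_mu - rate_su changes sign."""
    def gap(db):
        p = 10.0 ** (db / 10.0)
        return rate_mu(p) - rate_su(p)

    points = []
    for i in range(int(round((db_max - db_min) / step))):
        lo, hi = db_min + i * step, db_min + (i + 1) * step
        g_lo, g_hi = gap(lo), gap(hi)
        if g_lo == 0.0:
            points.append(lo)
        elif g_lo * g_hi < 0.0:
            while hi - lo > tol:           # bisection inside the bracket
                mid = 0.5 * (lo + hi)
                if gap(lo) * gap(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            points.append(0.5 * (lo + hi))
    return points
```

With these stand-ins, `find_switching_points(toy_rate_su, toy_rate_mu)` returns two crossings, with the MU mode preferred between them; substituting the actual rate approximations would follow the same pattern.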

V. NUMERICAL RESULTS
In this section, numerical results are presented. First, the operating regions for different modes are plotted, which show the impact of different parameters, including the normalized Doppler frequency, the codebook size, and the number of transmit antennas. Then the extension of our results for the ZF precoding to the MMSE precoding is demonstrated.

A. Operating Regions
As shown in Section IV-C, finding mode switching points requires solving a nonlinear equation, which does not have a closed-form solution and gives little insight. However, it is easy to evaluate numerically for different parameters, from which insights can be drawn. In this section, with the calculated mode switching points for different parameters, we plot the operating regions for both SU and MU modes. The active mode for the given parameter and the condition to activate each mode can be found from such plots.
In Fig. 4, the operating regions for both SU and MU modes are plotted, for different normalized Doppler frequencies and different numbers of feedback bits in Fig. 4(a) and Fig. 4(b), respectively, and with $U = N_t = 4$. There are analogies between the two plots. Some key observations are as follows:
(i) For the delay plot Fig. 4(a), comparing the two curves for B = 16 and B = 20, we see that the smaller the codebook size, the smaller the operating region for the ZF mode. For the ZF mode to be active, $f_d T_s$ needs to be small; specifically, we need $f_d T_s < 0.055$ and $f_d T_s < 0.046$ for B = 20 and B = 16, respectively. These conditions are not easily satisfied in practical systems. For example, with carrier frequency $f_c = 2$ GHz and mobility v = 20 km/hr, the Doppler frequency is 37 Hz, and then to satisfy $f_d T_s < 0.055$ the delay should be less than 1.5 msec.
(ii) For the codebook size plot Fig. 4(b), comparing the two curves with v = 10 km/hr and v = 20 km/hr, as f d T s increases (v increases), the ZF operating region shrinks. For the ZF mode to be active, we should have B ≥ 12 and B ≥ 14 for v = 10 km/hr and v = 20 km/hr, respectively, which means a large codebook size. Note that for BF we only need a small codebook size to get the near-optimal performance [5].
(iii) For a given f d T s and B, the SU mode will be active at both low and high SNRs, which is due to its array gain and the robustness to imperfect CSIT, respectively.
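The arithmetic behind the example in observation (i) can be reproduced directly. The sketch below assumes the usual maximum Doppler relation $f_d = v f_c / c$ with the rounded value $c = 3 \times 10^8$ m/s; the function names are hypothetical.

```python
def doppler_hz(speed_kmh, carrier_hz, speed_of_light=3.0e8):
    """Maximum Doppler frequency f_d = v * f_c / c, with v converted to m/s."""
    return (speed_kmh / 3.6) * carrier_hz / speed_of_light


def max_delay_s(speed_kmh, carrier_hz, fd_ts_limit):
    """Largest delay T_s that keeps the normalized Doppler f_d * T_s below the limit."""
    return fd_ts_limit / doppler_hz(speed_kmh, carrier_hz)
```

For v = 20 km/hr and a 2 GHz carrier, `doppler_hz` gives about 37 Hz, and `max_delay_s(20.0, 2.0e9, 0.055)` gives just under 1.5 msec, consistent with the figures quoted above.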
The operating regions for different N t are shown in Fig. 5. We see that as N t increases, the operating region for the MU mode shrinks. Specifically, we need B > 12 for N t = 4, B > 19 for N t = 6, and B > 26 for N t = 8 to get the MU mode activated. Note that the minimum required feedback bits per user for the MU mode grow approximately linearly with N t .
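The roughly linear growth can be quantified with a least-squares line through the three thresholds quoted above (B > 12, 19, 26 for $N_t$ = 4, 6, 8). This is a simple curve fit on the reported numbers, not a result derived in the paper, and extrapolating it beyond $N_t = 8$ would be an assumption.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Thresholds read from the text: (N_t, minimum B for the MU mode to be active)
antennas = [4, 6, 8]
min_bits = [12, 19, 26]
slope, intercept = linear_fit(antennas, min_bits)
```

The three points lie exactly on a line with a slope of 3.5 bits per additional transmit antenna.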

B. ZF vs. MMSE Precoding
It is shown in [31] that the regularized ZF precoding, denoted as MMSE precoding in this paper, can significantly increase the throughput at low SNR. In this section, we show that our results on mode switching with ZF precoding can also be applied to MMSE precoding. [...]
From this, we see that the MMSE precoders converge to ZF precoders at high SNR. Therefore, our derivations for the ZF system also apply to the MMSE system at high SNR.
In Fig. 6, we compare the performance of ZF and MMSE precoding systems with delay⁶. We see that the MMSE precoding outperforms ZF at low to medium SNRs, and converges to ZF at high SNR while it converges to BF at low SNR. In addition, it has the same rate ceiling as the ZF system, and crosses the BF curve roughly at the same point, after which we need to switch to the SU mode. Based on this, we can use the second predicted mode switching point (the one at higher SNR) of the ZF system for the MMSE system. We compare the simulation results and the calculation results by (19) and (32) for the mode switching points in Table II. [...]
For future work, the MU-MIMO mode studied in this paper is designed with the zero-forcing criterion, which is shown to be sensitive to CSI imperfections, so robust precoding design is needed, and the impact of imperfect CSIT on non-linear precoding should be investigated.
As power control is an effective way to combat interference, it is interesting to consider the efficient power control algorithm rather than equal power allocation to improve the performance, especially in the heterogeneous scenario. It is also of practical importance to investigate possible approaches to improve the quality of the available CSIT with a fixed codebook size, e.g. through channel prediction.

A. Useful Results for Rate Analysis
In this Appendix, we present some useful results that are used for rate analysis in this paper.
The following lemma will be used frequently in the derivation of the achievable rate.
Lemma 2: For a random variable x with probability density function (pdf) $f_X(x)$ and cumulative distribution function (cdf) $F_X(x)$, we have [...].
Proof: The proof follows from integration by parts: [...], where $g'$ is the derivative of the function $g$, and step (a) follows from integration by parts.
The following lemma provides some useful integrals for rate analysis, which can be derived from the results in [40].
Lemma 3: [...], where $E_1(x)$ is the exponential-integral function of the first order.

B. Proof of Theorem 1
The average SNR is [...]. As $\mathbf{e}[n]$ is independent of $\mathbf{h}[n-1]$, it is also independent of $\hat{\mathbf{h}}[n-1]$, which gives (a).

D. Proof of Lemma 1
[...], and $x$ is independent of $y$.
Then the interference term due to quantization is $z = xy$. The cdf of $z$ is [...].

E. Proof of Theorem 4
Assuming the interference terms in (27) are independent of each other and of the signal power term, denote [...] $= \epsilon_{e,u}^2 y_2$; then from Lemma 1 we have $y_1 \sim \chi^2_{2(N_t - 1)}$, and $y_2 \sim \chi^2_{2(N_t - 1)}$ as $\mathbf{e}_u[n]$ is complex Gaussian with variance $\epsilon_{e,u}^2$ and independent of the normalized vector $\mathbf{f}$. In addition, the [...]. Then the received SINR for the u-th user is approximated as [...], and $y_1$, $y_2$, $z$ are independent of each other.
Let $y = \delta_1 y_1 + \delta_2 y_2$; then the pdf of $y$, which is the sum of two independent chi-square random variables, is given as [41] [...]. The cdf of $x$ is [...], where step (a) follows from the equality [...]. [...], where step (a) follows from Lemma 2 and step (b) follows from the expression of $I_3(\cdot, \cdot, \cdot)$ in (38).