DFT codebook-based hybrid precoding for multiuser mmWave massive MIMO systems

In millimeter wave (mmWave) massive MIMO (multiple-input multiple-output) systems, conventional fully digital precoding is difficult to apply because of hardware constraints. Fortunately, hybrid precoding can be used to reduce power consumption and hardware cost. In this paper, a codebook-based hybrid precoding scheme for downlink multiuser mmWave massive MIMO systems is proposed. Our main idea is to design the analog and digital precoders separately to maximize the achievable sum rate (SR). In the analog domain, we take the potential multiuser conflict and the angular-domain representation of the channel matrix into consideration and propose an efficient conflicting-aware (CA) beam-column selection method to obtain a discrete Fourier transform (DFT) codebook-based analog precoder. According to the CA method, all users are classified into two groups, i.e., conflicting users (CUs) and non-conflicting users (NCUs), and different beam-column selection criteria are applied to the two groups. Then, a zero-forcing (ZF) digital precoder is directly used in the digital domain. Simulation results illustrate that the proposed low-complexity algorithm achieves satisfactory SR performance, which approaches that of fully digital precoding (the upper bound), and outperforms other existing hybrid algorithms.


Introduction
Millimeter-wave (mmWave) massive MIMO has been considered a promising candidate for fifth-generation (5G) mobile communications [1], since the large bandwidth available in the mmWave spectrum enables data rates of gigabits per second. To achieve high-quality communication in mmWave massive MIMO systems, large antenna arrays must be deployed at base stations (BSs), and each BS needs to serve multiple mobile stations (MSs) simultaneously to use the system efficiently. In conventional lower-frequency systems, the precoding that generates the transmitted signals at the BS is performed in the baseband. However, conventional digital precoding requires an energy-intensive radio frequency (RF) chain for each antenna, and each RF chain (about 250 mW [2]) accounts for a large portion of the total energy consumption. If traditional fully digital precoding were used in mmWave massive MIMO systems, the correspondingly large number of RF chains would lead to high energy consumption, and their high hardware cost makes fully digital precoders unsuitable for such systems [3]. Consequently, current research tends to split precoding in these systems into an analog domain and a digital domain.
The authors of [3] investigated hybrid precoding, which combines analog and digital precoding, for the single-user scenario. They showed that hybrid precoding can reduce the number of required RF chains while performing almost as well as fully digital precoding. The scheme in [3] was designed to obtain spatial multiplexing gain, and it achieves good performance only when numerous antennas are deployed at both the transmitter and the receiver. A beam steering scheme for the multiuser scenario was proposed in [4]; it first chooses the best beam steering vectors as the analog precoding matrix for each user and then applies zero-forcing (ZF) precoding in the digital domain. This algorithm performs well in single-path scenarios but deteriorates significantly in multi-path scenarios, because beam steering cannot make full use of the multi-path channel gains.
Another related line of work considers codebook-based hybrid precoding, where the precoder is selected from a fixed codebook known a priori. The complexity of codebook-based designs is generally much lower than that of non-codebook-based designs. Codebooks have been designed for the IEEE 802.15.3c standard [5] for indoor systems and for the IEEE 802.11ad standard [6] for wireless local area networks (WLANs). A Q-bit resolution codebook using the optimal precoding weight vector was proposed in [7] to achieve a uniform maximum gain in all directions. However, the column vectors of the codebooks in [5][6][7] are not orthogonal, which may degrade performance in multiuser scenarios because of multiuser interference. For mmWave systems, an efficient DFT codebook-based precoding training scheme was proposed in [8]. In such schemes, the analog precoding matrix is produced by selecting column vectors from the DFT codebook, whose columns are orthogonal. The authors of [9] proposed a hybrid precoding design in which mutually unbiased bases (MUB) were applied for the digital precoder and the DFT codebook was used for the analog precoder. A single-user scenario was considered in [9], and the design achieved good performance with low complexity. For the multiuser scenario, the DFT codebook-based beam selection scheme in [10] formulated the hybrid precoding design as an optimization problem solved by exhaustive search, which traverses all possible beam combinations. The joint optimization of DFT codebook-based hybrid precoding was investigated in [11] for multiuser mmWave systems under two performance measures: sum rate (SR) and energy efficiency (EE). The joint codeword selection and precoder design (JWSPD) scheme in [11] is flexible and achieves satisfactory performance in terms of both SR and EE.
In this paper, we propose a new codebook-based hybrid precoding scheme to maximize the achievable SR of multiuser mmWave massive MIMO systems, in which the analog and digital precoders are designed separately. In the analog domain, the constant modulus constraint of the analog precoder must be taken into account, so the analog precoder is derived from the DFT codebook, whose column vectors are orthogonal and whose entries all have constant modulus. We propose an efficient conflicting-aware (CA) beam selection scheme that selects columns of the DFT codebook for the analog precoder while accounting for potential multiuser conflict. Note that the same DFT beam column may be the best choice for different users, which would cause severe multiuser interference; moreover, some RF chains would then be wasted because they do not contribute to the SR. Hence, all users are first classified into two groups, i.e., non-conflicting users (NCUs) and conflicting users (CUs). A user whose dominant beam index differs from those of all other users is defined as an NCU; conversely, a user whose dominant beam index coincides with that of at least one other user is defined as a CU. Different beam-column selection criteria are then applied to the two groups: for NCUs, the beams with the largest signal-to-leakage-plus-noise ratio (SLNR) are picked out, while for CUs, the best beams are chosen by a low-complexity incremental algorithm that maximizes the achievable SR. The proposed CA scheme performs similarly to the exhaustive search algorithm in [10]. In the digital domain, the ZF technique [12] is adopted to obtain the digital precoder. An arbitrary number of RF chains is considered in this paper, without the restriction that the number of RF chains equal the number of users.
Finally, numerical results validate that the proposed codebook-based hybrid precoding scheme achieves satisfactory SR performance, which approaches that of fully digital precoding (the upper bound), and outperforms other existing hybrid precoding algorithms such as those in [4,5,7,11]. The contributions of this paper are summarized as follows.
The hybrid precoding design is investigated for multiuser mmWave massive MIMO systems as an SR maximization problem with a non-convex constraint. Our scheme is based on the idea that the analog and digital precoders are designed separately. Considering the constant modulus constraint of the analog precoder, we propose a DFT codebook-based analog precoder whose orthogonal beam columns (i.e., analog precoding vectors) are specified by DFT codewords. We propose a novel CA beam-column selection method for designing the analog precoder, which classifies users into two groups, i.e., CUs and NCUs, and applies different beam-column selection criteria to the two groups. The proposed codebook-based hybrid precoding scheme has low computational complexity. The channel matrix is mapped from the spatial domain to the angular domain; since the channel matrix in the angular domain is sparse, this mapping decreases the complexity of beam selection. In addition, using the Sherman-Morrison-Woodbury identity [13], the matrix inversion can be computed recursively, which further decreases the complexity.
The remainder of this paper is organized as follows. Section 2 introduces the system model and formulates the SR maximization problem. The proposed DFT codebook-based hybrid precoding design with the CA beam selection method, together with its complexity analysis, is presented in Section 3. In Section 4, simulations are provided to evaluate the performance of the proposed scheme. Conclusions are presented in Section 5.

Notation
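In this paper, boldface letters denote vectors and matrices; $E(\cdot)$ denotes the expectation; $(\cdot)^T$, $(\cdot)^H$, and $(\cdot)^{-1}$ denote the transpose, conjugate transpose, and inverse, respectively; $|\cdot|$ denotes the determinant of a matrix; $\|\cdot\|_F$ denotes the Frobenius norm of a matrix; $\mathrm{diag}(\mathbf{p})$ denotes the diagonal matrix whose diagonal entries are the elements of $\mathbf{p}$. $\mathbb{C}^{M\times M}$ represents the space of $M\times M$ matrices with complex entries; $\mathcal{CN}(\mathbf{m}, \mathbf{R})$ denotes the complex Gaussian distribution with mean $\mathbf{m}$ and covariance $\mathbf{R}$; $\mathbf{I}_U$ is the $U\times U$ identity matrix. $A\backslash B$ denotes the set obtained by removing the elements of set $B$ from set $A$; $\mathrm{tr}(\cdot)$ denotes the trace of a matrix and $\mathrm{Card}(\cdot)$ the cardinality of a set. Finally, $\lfloor\cdot\rfloor$ denotes rounding down to the nearest integer.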
2 System model and problem statement

System and signal model
In this paper, the fully connected downlink mmWave system shown in Fig. 1 is considered, in which every RF chain is connected to all antennas of the BS. A general scenario with one BS serving $U$ users simultaneously is considered. The BS is equipped with $N_{RF}$ RF chains and $N_{BS}$ antennas, and each MS is equipped with $N_{MS}$ antennas. The BS communicates with each MS through only one stream, so the total number of transmitted streams is $N_S = U$. In addition, the number of MSs cannot exceed the number of RF chains at the BS [7], i.e., $N_S \le N_{RF}$, and the number of RF chains satisfies $N_{RF} \le N_{BS}$, which limits the spatial multiplexing gain of this multiuser hybrid system.
In the downlink multiuser mmWave MIMO system shown in Fig. 1, the BS applies an $N_{RF}\times N_S$ digital precoding matrix $\mathbf{F}_{BB} = [\mathbf{f}_1^{BB}, \mathbf{f}_2^{BB}, \ldots, \mathbf{f}_U^{BB}]$, where $\mathbf{f}_u^{BB}$ $(u = 1, 2, \ldots, U)$ is an $N_{RF}\times 1$ vector, followed by an $N_{BS}\times N_{RF}$ RF precoding matrix $\mathbf{F}_{RF}$. The transmitted signal is therefore

$$\mathbf{x} = \mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{s}, \tag{1}$$

where $\mathbf{s} = [s_1, s_2, \ldots, s_U]^T$ is the $N_S\times 1$ ($U\times 1$) message vector, normalized as $E[\mathbf{s}\mathbf{s}^H] = (1/N_S)\mathbf{I}_U$. Since $\mathbf{F}_{RF}$ is the analog precoding matrix implemented with analog phase shifters, its entries have constant modulus and are normalized to satisfy $|\mathbf{F}_{RF}[i,j]| = 1/\sqrt{N_{BS}}$, where $\mathbf{F}_{RF}[i,j]$ is the $(i,j)$th element of $\mathbf{F}_{RF}$. The total power constraint of the hybrid precoder is enforced by normalizing $\mathbf{F}_{BB}$ such that $\|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2 = N_S$. The received signal observed by the $u$th MS is

$$\mathbf{r}_u = \sqrt{\rho}\,\mathbf{H}_u\mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{s} + \mathbf{n}_u, \tag{2}$$

where $\mathbf{H}_u$ is the $N_{MS}\times N_{BS}$ channel matrix between the BS and the $u$th user, $\mathbf{n}_u \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I}_{N_{MS}})$ is the complex Gaussian noise vector, and $\rho$ represents the average received power. At the $u$th MS, the received signal is processed by the combiner $\mathbf{w}_u$, giving

$$y_u = \sqrt{\rho}\,\mathbf{w}_u^H\mathbf{H}_u\mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{s} + \mathbf{w}_u^H\mathbf{n}_u, \tag{3}$$

where $\mathbf{w}_u \in \mathbb{C}^{N_{MS}\times 1}$ is the digital combining vector of the $u$th user.
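As a minimal numerical sketch of the signal model in (1), assuming NumPy, the code below builds $\mathbf{F}_{RF}$ from a few DFT columns, rescales $\mathbf{F}_{BB}$ so that $\|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2 = N_S$, and checks the constant modulus property. The helper names `dft_codebook` and `hybrid_transmit` and the chosen beam indices are hypothetical; this illustrates the constraints under these assumptions and is not the authors' implementation.

```python
import numpy as np

def dft_codebook(n_bs):
    """Unitary N_BS x N_BS DFT matrix; entry (k, l) = exp(j*2*pi*k*l/N_BS) / sqrt(N_BS), 0-indexed."""
    k = np.arange(n_bs)
    return np.exp(2j * np.pi * np.outer(k, k) / n_bs) / np.sqrt(n_bs)

def hybrid_transmit(beams, f_bb, s, n_bs):
    """Return (F_RF, normalized F_BB, x) with x = F_RF F_BB s, eq. (1).

    F_RF is built from the DFT columns listed in `beams` (a hypothetical choice), so every
    entry has modulus 1/sqrt(N_BS); F_BB is rescaled so that ||F_RF F_BB||_F^2 = N_S.
    """
    f_rf = dft_codebook(n_bs)[:, beams]
    n_s = f_bb.shape[1]
    f_bb = f_bb * np.sqrt(n_s) / np.linalg.norm(f_rf @ f_bb, "fro")
    return f_rf, f_bb, f_rf @ f_bb @ s

rng = np.random.default_rng(0)
n_bs, n_users = 16, 4                       # N_RF = N_S = U = 4
beams = [1, 5, 9, 12]                       # hypothetical beam indices (0-indexed)
f_bb0 = rng.standard_normal((n_users, n_users)) + 1j * rng.standard_normal((n_users, n_users))
# Symbols with E[s s^H] = I / N_S
s = (rng.standard_normal(n_users) + 1j * rng.standard_normal(n_users)) / np.sqrt(2 * n_users)
f_rf, f_bb, x = hybrid_transmit(beams, f_bb0, s, n_bs)
print(np.abs(f_rf).min() * np.sqrt(n_bs), np.linalg.norm(f_rf @ f_bb, "fro") ** 2)
```

Because the DFT columns are orthonormal, the normalization of $\mathbf{F}_{BB}$ here is equivalent to setting $\|\mathbf{F}_{BB}\|_F^2 = N_S$.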

Millimeter-wave channel model
An extended Saleh-Valenzuela model, which is commonly applied in mmWave massive MIMO channel modeling [14][15][16], is used in this paper to capture the high antenna correlation and the limited spatial selectivity of mmWave channels.
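We consider the scenario in which each scatterer contributes a single propagation path, and adopt a geometric channel model with $L_u$ scatterers for the channel of the $u$th user. Under this model, the channel matrix of the $u$th user $\mathbf{H}_u \in \mathbb{C}^{N_{MS}\times N_{BS}}$ can be expressed as

$$\mathbf{H}_u = \sqrt{\frac{N_{BS}N_{MS}}{L_u}}\sum_{l=1}^{L_u}\alpha_{u,l}\,\mathbf{a}_{u,MS}(\theta_{u,l})\,\mathbf{a}_{u,BS}^H(\phi_{u,l}), \tag{4}$$

where $N_{BS}$ is the number of antennas at the BS and $N_{MS}$ is the number of antennas at the $u$th MS. $\alpha_{u,l}$ is the complex gain of the $l$th path, including the path loss, with $E[|\alpha_{u,l}|^2] = \bar{\alpha}$, where $\bar{\alpha}$ is a normalization constant. The variable $\phi_{u,l} \in [0, 2\pi]$ is the angle of departure (AoD) of the $l$th path, and $\theta_{u,l} \in [0, 2\pi]$ is its angle of arrival (AoA). Finally, $\mathbf{a}_{u,BS}(\phi_{u,l})$ is the antenna array response vector of the BS, and $\mathbf{a}_{u,MS}(\theta_{u,l})$ is the antenna array response vector of the $u$th MS. $L_u$ is the number of propagation paths of the channel of the $u$th MS, with $l \in \{1, 2, \ldots, L_u\}$. We assume that the BS and every MS know the structure of their antenna arrays. A typical uniform linear array (ULA) is used in the simulations. For a ULA, $\mathbf{a}_{u,BS}(\phi_{u,l})$ is defined as

$$\mathbf{a}_{u,BS}(\phi_{u,l}) = \frac{1}{\sqrt{N_{BS}}}\left[1, e^{j\frac{2\pi}{\lambda}d\sin(\phi_{u,l})}, \ldots, e^{j(N_{BS}-1)\frac{2\pi}{\lambda}d\sin(\phi_{u,l})}\right]^T,$$

where $d$ is the distance between adjacent antenna elements and $\lambda$ is the mmWave wavelength. $\mathbf{a}_{u,MS}(\theta_{u,l})$ of the MS can be described in a similar way. If each MS has a single antenna, there are no AoAs, and (4) simplifies to

$$\mathbf{h}_u = \sqrt{\frac{N_{BS}}{L_u}}\sum_{l=1}^{L_u}\alpha_{u,l}\,\mathbf{a}_{u,BS}^H(\phi_{u,l}). \tag{5}$$

The collective channel matrix of all users is denoted as

$$\mathbf{H} = \left[\mathbf{H}_1^T, \mathbf{H}_2^T, \ldots, \mathbf{H}_U^T\right]^T, \tag{6}$$

which reduces to $\mathbf{H} = [\mathbf{h}_1^T, \ldots, \mathbf{h}_U^T]^T$ for single-antenna MSs. The received signal vector $\mathbf{y} = [y_1, y_2, \ldots, y_U]^T$ of all $U$ users can be expressed as

$$\mathbf{y} = \sqrt{\rho}\,\mathbf{W}^H\mathbf{H}\mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{s} + \mathbf{W}^H\mathbf{n}, \tag{7}$$

where $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I}_{UN_{MS}})$ is the complex Gaussian noise vector and $\mathbf{W} = \mathrm{blkdiag}(\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_U)$ is the block-diagonal combining matrix. The channel model can be transformed into the beam space, or angular domain [17], via a beamforming matrix $\mathbf{U}_t$, which is an $N_{BS}\times N_{BS}$ unitary matrix whose columns are the uniformly spaced vectors of the DFT matrix. The $(k,l)$th entry of $\mathbf{U}_t$ is defined as

$$[\mathbf{U}_t]_{k,l} = \frac{1}{\sqrt{N_{BS}}}e^{j\frac{2\pi}{N_{BS}}(k-1)(l-1)}, \quad k, l = 1, \ldots, N_{BS}. \tag{8}$$

This DFT codebook divides the space into $N_{BS}$ orthogonal beams.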
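An $N_{BS}$-dimensional DFT matrix can therefore be written as

$$\mathbf{U}_t = \frac{1}{\sqrt{N_{BS}}}\begin{bmatrix}1 & 1 & \cdots & 1\\ 1 & \omega & \cdots & \omega^{N_{BS}-1}\\ \vdots & \vdots & \ddots & \vdots\\ 1 & \omega^{N_{BS}-1} & \cdots & \omega^{(N_{BS}-1)^2}\end{bmatrix}, \quad \omega = e^{j2\pi/N_{BS}}. \tag{9}$$

Hence, $\mathbf{H}_a$ is defined as an equivalent representation of the channel in the angular domain [17]:

$$\mathbf{H}_a = \mathbf{H}\mathbf{U}_t, \tag{10}$$

and (7) can be expressed in the angular domain as

$$\mathbf{y} = \sqrt{\rho}\,\mathbf{W}^H\mathbf{H}_a\mathbf{U}_t^H\mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{s} + \mathbf{W}^H\mathbf{n}. \tag{11}$$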
The relationship between the angular-domain channel matrix $\mathbf{H}_a$ and the spatial-domain channel matrix $\mathbf{H}$ in (6) is

$$\mathbf{H} = \mathbf{H}_a\mathbf{U}_t^H, \qquad \mathbf{H}_a = \mathbf{H}\mathbf{U}_t. \tag{12}$$

The channel matrix $\mathbf{H}_a$ maps the signals of each MS onto a new basis of orthogonal vectors; that is, the spatial-domain channel $\mathbf{H}$ and the angular-domain channel $\mathbf{H}_a$ are related by the discrete Fourier transform. The matrix $\mathbf{U}_t$ defined in (8) is unitary, i.e., $\mathbf{U}_t\mathbf{U}_t^H = \mathbf{U}_t^H\mathbf{U}_t = \mathbf{I}_{N_{BS}}$. Because mmWave channels contain only a few propagation paths, the number of non-zero elements in each row of the angular channel is much smaller than $N_{BS}$; namely, the angular-domain channel matrix $\mathbf{H}_a$ is sparse [18]. This sparse structure can be exploited to design a dimension-reduced $\mathbf{F}_{RF}$ by selecting column vectors from the DFT codebook $\mathbf{U}_t$, whose columns are orthogonal. Ideal performance can only be achieved when the users are located in orthogonal subspaces. According to (11), the product $\mathbf{U}_t^H\mathbf{F}_{RF}$ yields parallel sub-channels for transmitting signals after analog precoding. Moreover, $\mathbf{U}_t$ satisfies the constant modulus constraint imposed on the feasible domain of the analog precoder $\mathbf{F}_{RF}$. The problem of designing $\mathbf{F}_{RF}$ is thus transformed into selecting proper columns from the DFT codebook.
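The following sketch, assuming NumPy, generates single-antenna channels according to (5) and (6), maps them to the angular domain via (10), and checks the unitarity of $\mathbf{U}_t$ and the single-beam mapping of an on-grid path. The path statistics follow the simulation setup of Section 4 (5 paths, uniform AoDs, uniform path powers, half-wavelength spacing), while the uniformly random path phases, the normalization of the path powers to unit sum, and all function names are assumptions made for illustration.

```python
import numpy as np

def ula_response(n, phi, d_over_lambda=0.5):
    """ULA response a(phi) with n elements and spacing d = d_over_lambda * wavelength."""
    return np.exp(2j * np.pi * d_over_lambda * np.arange(n) * np.sin(phi)) / np.sqrt(n)

def single_antenna_channel(n_bs, n_paths, rng):
    """Row vector h_u of eq. (5).

    Assumed statistics (hypothetical generator): AoDs uniform on [0, 2*pi], path powers
    uniform on [0, 1] normalized to sum to one, and uniformly random path phases.
    """
    phis = rng.uniform(0, 2 * np.pi, n_paths)
    powers = rng.uniform(0, 1, n_paths)
    gains = np.sqrt(powers / powers.sum()) * np.exp(2j * np.pi * rng.uniform(0, 1, n_paths))
    h = sum(g * ula_response(n_bs, p).conj() for g, p in zip(gains, phis))
    return np.sqrt(n_bs / n_paths) * h

def dft_codebook(n_bs):
    """U_t of eq. (8): entry (k, l) = exp(j*2*pi*k*l/N_BS) / sqrt(N_BS), 0-indexed."""
    k = np.arange(n_bs)
    return np.exp(2j * np.pi * np.outer(k, k) / n_bs) / np.sqrt(n_bs)

rng = np.random.default_rng(1)
n_bs, n_users, n_paths = 64, 4, 5
U_t = dft_codebook(n_bs)
H = np.vstack([single_antenna_channel(n_bs, n_paths, rng) for _ in range(n_users)])  # eq. (6)
H_a = H @ U_t                                   # angular-domain channel, eq. (10)

# Fraction of each user's energy captured by its 10 strongest beams (approximate sparsity)
energy = np.abs(H_a) ** 2
top10 = np.sort(energy, axis=1)[:, -10:].sum(axis=1) / energy.sum(axis=1)
print(np.round(top10, 2))

# An on-grid path (sin(phi) = 2m/N_BS with half-wavelength spacing) maps to exactly one beam m
m = 5
h_grid = ula_response(n_bs, np.arcsin(2 * m / n_bs)).conj()[None, :]
h_grid_a = h_grid @ U_t
```

With off-grid AoDs, each path leaks into neighboring beams, so the angular channel is only approximately sparse; the on-grid case shows the exactly sparse limit.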

Problem formulation
The objective is to efficiently design the hybrid precoding matrices at the BS for the mmWave massive MIMO system. For this purpose, the performance measure considered is the achievable sum rate (SR) of the system, and our target is to maximize the SR in the downlink multiuser scenario. Assume that the BS has perfect channel state information (CSI). According to the processed signal received by the $u$th user in (3), the achievable data rate of the $u$th user is

$$R_u = \log_2\left(1 + \frac{\frac{\rho}{N_S}\left|\mathbf{w}_u^H\mathbf{H}_u\mathbf{F}_{RF}\mathbf{f}_u^{BB}\right|^2}{\frac{\rho}{N_S}\sum_{n\neq u}\left|\mathbf{w}_u^H\mathbf{H}_u\mathbf{F}_{RF}\mathbf{f}_n^{BB}\right|^2 + \sigma^2\|\mathbf{w}_u\|^2}\right), \tag{13}$$

and the SR of the system is

$$R_{sum} = \sum_{u=1}^{U} R_u. \tag{14}$$

Although other criteria, such as the max-min fairness criterion studied in [19], are also of interest, we aim to maximize the achievable SR in this paper, since it is a crucial criterion in mmWave systems. In the practical operation of BS precoders, SR is often preferred over fairness because it brings greater system capacity and a better overall user experience. Hence, we choose the SR as the optimization objective.
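To make (13) and (14) concrete, the sketch below evaluates the SR for given channels, combiners, and precoders, assuming NumPy and $E[\mathbf{s}\mathbf{s}^H] = (1/N_S)\mathbf{I}_U$. The function name `sum_rate` and the two-user toy example are hypothetical.

```python
import numpy as np

def sum_rate(H_list, W, F_RF, F_BB, rho, sigma2):
    """Achievable SR of eqs. (13)-(14), assuming E[s s^H] = I / N_S.

    H_list[u] is the N_MS x N_BS channel H_u, W[:, u] is the combiner w_u,
    and F_BB[:, n] is the digital precoder f_n^BB of stream n.
    """
    n_s = F_BB.shape[1]
    total = 0.0
    for u, H_u in enumerate(H_list):
        w = W[:, u]
        gains = w.conj() @ H_u @ F_RF @ F_BB          # w_u^H H_u F_RF f_n^BB for every n
        powers = (rho / n_s) * np.abs(gains) ** 2
        interference = powers.sum() - powers[u]
        noise = sigma2 * np.linalg.norm(w) ** 2
        total += np.log2(1 + powers[u] / (interference + noise))
    return total

# Toy check: two single-antenna users on orthogonal channels, no interference
H_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
W = np.ones((1, 2))
print(sum_rate(H_list, W, np.eye(2), np.eye(2), rho=2.0, sigma2=1.0))   # 2.0
```

Replacing `F_BB` with a matrix that mixes the streams introduces the interference term in (13) and lowers the SR accordingly.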
In this paper, we assume that each MS is equipped with a small number of antennas, since $N_{MS}$ is usually much smaller than $N_{BS}$ in massive MIMO systems [16]. Because the BS communicates with each MS through only one stream, digital combining is used at the MSs to obtain the multi-antenna gain. Hybrid combining cannot reduce the number of RF chains when the number of MS antennas is smaller than the number of propagation paths in the channel, whereas a digital combiner is practical when each MS has only a few antennas, since it does not require many RF chains or incur high hardware cost. We choose the matched filter (MF) as the combiner at the MSs, since it is the optimal linear filter for maximizing the output signal-to-noise ratio, which contributes substantially to the system sum rate. The MF combiner is defined in (15), where $\hat{\mathbf{H}}_u$ is the effective channel matrix of the $u$th user given in (16). The design problem of $\mathbf{F}_{RF}$ and $\mathbf{F}_{BB}$ can then be stated as

$$\max_{\mathbf{F}_{RF},\,\mathbf{F}_{BB}}\ \sum_{u=1}^{U}R_u \quad \text{s.t.}\quad \mathbf{F}_{RF}\in\mathcal{W},\quad \|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2 = N_S, \tag{17}$$

where $\mathcal{W}$ represents the feasible domain of analog precoders, i.e., the set of $N_{BS}\times N_{RF}$ matrices whose entries all have modulus $1/\sqrt{N_{BS}}$. The total power constraint of the BS is enforced by normalizing $\mathbf{F}_{BB}$ such that $\|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2 = N_S$. No other hardware-related constraints are imposed on the digital precoding matrix.
The optimal solution to (17) is very difficult to obtain because of several difficulties, including the constant modulus RF constraint and the coupling between $\mathbf{F}_{RF}$ and $\mathbf{F}_{BB}$. Near-optimal solutions can be found through iterative algorithms, but these require a large number of training processes and substantial feedback overhead. Therefore, directly solving the SR maximization problem (17) is neither tractable nor practical. In view of the difficulties of applying previous precoding algorithms in mmWave systems, we develop in Section 3 a new DFT codebook-based hybrid algorithm suitable for multiuser mmWave massive MIMO systems, which achieves better performance than the other algorithms considered.

Methods
In this section, we propose a codebook-based hybrid precoding algorithm to solve the optimization problem and analyze its computational complexity. The main idea of our scheme is to divide the computation of the hybrid precoder into the analog and digital domains.

The proposed hybrid algorithm
In the digital domain, zero-forcing (ZF) technique is adopted to obtain the digital precoder, based on the equivalent channel matrix.
In the analog domain, the analog precoder $\mathbf{F}_{RF}$ is formed by picking columns from the DFT codebook [9], which is also called beam selection; here, beams are the column vectors of the DFT codebook. Our target is to select the $N_{RF}$ best unshared beams out of the $N_{BS}$ beams for $U$ users to maximize the achievable SR. Hence, choosing the proper beams for the analog precoder is crucial; although the optimum beams can be identified by an exhaustive search over all possible beam combinations, its time cost is high. We therefore propose a more efficient CA beam selection method that balances performance and time consumption for selecting the column vectors of the analog precoder, after which ZF precoding is directly used for the digital precoder. The CA method consists of two stages: (1) a grouping stage, which identifies NCUs and CUs by considering the potential multiuser conflict, and (2) a searching stage, which fixes the beam indexes of the NCUs and seeks the best exclusive beam indexes for the CUs in order to maximize the SR. Since $\mathbf{F}_{RF}$ is extracted from the DFT matrix $\mathbf{U}_t$, designing it amounts to obtaining the column indexes of the DFT codebook to be used in $\mathbf{F}_{RF}$. For simplicity, the scenario in which each MS has one antenna and $N_{RF} = U$ is explained first.

Stage 1: Grouping stage to identify CUs and NCUs
Our goal is to pick the $N_{RF}$ best dedicated beams from the $N_{BS}$ beams of $\mathbf{U}_t$ to maximize the achievable SR. The optimal solution can be obtained by exhaustive search, which has the best performance but suffers from high complexity, since it must examine $\binom{N_{BS}}{N_{RF}}$ beam combinations. For a mmWave massive MIMO system with $N_{BS} = 128$ and $N_{RF} = 16$, the number of combinations reaches $128!/(16!\,(128-16)!) \approx 9.3\times 10^{19}$. Hence, a more practical selection scheme that achieves near-optimal performance is necessary.
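The combination count quoted above can be checked with Python's standard library. The greedy bound in the second half is only a rough illustration of why selecting one beam per step is cheaper; it is not the exact cost of the proposed method.

```python
from math import comb

n_bs, n_rf = 128, 16
exhaustive = comb(n_bs, n_rf)            # C(N_BS, N_RF) beam combinations
print(f"{exhaustive:.2e}")               # 9.33e+19

# For contrast: a greedy pass that adds one beam per step and scores every remaining
# candidate evaluates at most N_RF * N_BS candidates (a rough upper bound).
greedy_evals = n_rf * n_bs
print(greedy_evals)                      # 2048
```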
Firstly, we sort the elements of each row of $\mathbf{H}_a$ in ascending order of the ratio $T_{u,i}$ defined in (18), where $\mathbf{H}_a(u,i)$ is the $(u,i)$th element of $\mathbf{H}_a$. The beam $i$ with the maximum value of $T_{u,i}$ in the $u$th row of the angular channel $\mathbf{H}_a$ is the dominant beam of the $u$th user, and its index is denoted as $b_u^* \in \{1, 2, \ldots, N_{BS}\}$. The dominant beams $\{b_u^*\}_{u=1}^{U}$ contribute directly to the achievable SR, as shown in the Appendix. Hence, if the dominant beam indexes $b_1^*, b_2^*, \ldots, b_U^*$ are all distinct, the beam set $\mathcal{B} = \{b_1^*, b_2^*, \ldots, b_U^*\}$ can achieve a near-optimal SR. However, if two users share the same dominant beam, both suffer severe multiuser interference. Unfortunately, when $U$ is large and $N_{BS}$ is small, the probability that different users have the same dominant beam is high. Therefore, users should be grouped. In [20], users are categorized into line-of-sight (LOS) and non-line-of-sight (NLOS) groups based on their channel conditions. In this paper, all $U$ users are classified into two groups, i.e., NCUs and CUs, according to their dominant beam indexes in $\mathbf{H}_a$, and different beam selection criteria are adopted for the two groups. An example of the grouping is given as follows.
(1) User $u$ is defined as an NCU if its dominant beam index differs from those of all other users, i.e., $b_u^* \notin \{b_1^*, \ldots, b_{u-1}^*, b_{u+1}^*, \ldots, b_U^*\}$. The NCU group $\mathcal{F}_{NCU}$ consists of all NCUs. An example is shown in Fig. 2a, in which the dominant beam index of each user is obtained according to (18). The dominant beam index can be directly selected for a user $u \in \mathcal{F}_{NCU}$, because this beam not only contributes directly to the achievable SR but also produces little interference to other users.
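(2) User $u$ is defined as a CU if its dominant beam index is the same as that of at least one other user, i.e., $b_u^* \in \{b_1^*, \ldots, b_{u-1}^*, b_{u+1}^*, \ldots, b_U^*\}$. The CU group $\mathcal{F}_{CU}$ consists of all CUs. Clearly, $\mathcal{F}_{CU}\cup\mathcal{F}_{NCU} = \{1, 2, \ldots, U\}$ and $\mathcal{F}_{CU}\cap\mathcal{F}_{NCU} = \emptyset$. In the example in Fig. 2a, users 2 and 3 share the same dominant beam (beam 5) and are therefore CUs. For a user $u \in \mathcal{F}_{CU}$, a suitable beam index is picked from the set $\{1, 2, \ldots, N_{BS}\}\backslash\{b_u^* \mid u \in \mathcal{F}_{NCU}\}$, as introduced in the next stage and shown in Fig. 2b.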
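The grouping rule in (1) and (2) can be sketched in plain Python as follows. The score matrix `T` stands in for the ratio in (18), which is assumed to be precomputed from $\mathbf{H}_a$; the helper names and the dominant beam indexes in the example (chosen to match Fig. 2, with users 1 and 4 assumed to be the NCUs and $N_{BS} = 8$) are hypothetical.

```python
from collections import Counter

def dominant_beams(T):
    """b*_u = argmax_i T[u][i], returned 1-indexed.

    T is a per-user score matrix standing in for the ratio of eq. (18),
    which is assumed to be precomputed from H_a.
    """
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in T]

def group_users(dominant):
    """Split users (1-indexed) into NCUs (unique dominant beam) and CUs (shared beam)."""
    counts = Counter(dominant)
    ncu = [u for u, b in enumerate(dominant, start=1) if counts[b] == 1]
    cu = [u for u, b in enumerate(dominant, start=1) if counts[b] > 1]
    return ncu, cu

# Hypothetical dominant beams matching the Fig. 2 example (N_BS = 8):
# users 2 and 3 share beam 5, the NCUs own beams 2 and 8.
dominant = [2, 5, 5, 8]
ncu, cu = group_users(dominant)
fixed = {dominant[u - 1] for u in ncu}
candidates = [b for b in range(1, 9) if b not in fixed]   # search set for the CUs
print(ncu, cu, candidates)   # [1, 4] [2, 3] [1, 3, 4, 5, 6, 7]
```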
The simple single-antenna example above is given to help understand the proposed scheme. If each MS has several antennas, the angular channel of the $u$th user is not a vector but a matrix, denoted as $\mathbf{H}_{a,u}$. In that case, we search all elements of $\mathbf{H}_{a,u}$ instead of the row elements of $\mathbf{H}_a$, and the ratio $T$ is sorted in ascending order as well.

Stage 2: Best beam indexes searching stage
For a user $u \in \mathcal{F}_{NCU}$, the best beam index is directly selected as its dominant beam index. In the above example, the best beam indexes of the NCUs are $\{b_u^* \mid u \in \mathcal{F}_{NCU}\} = \{2, 8\}$. Another search is then performed for $u \in \mathcal{F}_{CU}$ over the remaining $N_{BS} - \mathrm{Card}(\mathcal{F}_{NCU})$ beams, i.e., $\{1, 3, 4, 5, 6, 7\}$, to maximize the SR. A classical ZF precoder is adopted in the digital domain as follows:
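$$\mathbf{F}_{BB} = \alpha\,\mathbf{H}_{eq}^H\left(\mathbf{H}_{eq}\mathbf{H}_{eq}^H\right)^{-1}, \tag{19}$$

where $\mathbf{H}_{eq} = \mathbf{H}_a\mathbf{U}_t^H\mathbf{F}_{RF}$ is the equivalent channel matrix seen from the digital domain, and $\alpha$ is a scaling factor that ensures $E\{\|\mathbf{F}_{BB}\mathbf{s}\|_2^2\} = \rho$. Since $E[\mathbf{s}\mathbf{s}^H] = (1/N_S)\mathbf{I}_U$, the factor $\alpha$ is given by

$$\alpha = \sqrt{\frac{N_S\,\rho}{\mathrm{tr}\left(\left(\mathbf{H}_{eq}\mathbf{H}_{eq}^H\right)^{-1}\right)}}. \tag{20}$$

If equal power allocation is used at the BS, the average data rate of the $u$th user is given by

$$R_u = \log_2\left(1 + \frac{\rho}{\sigma^2\,\mathrm{tr}\left(\left(\mathbf{H}_{eq}\mathbf{H}_{eq}^H\right)^{-1}\right)}\right). \tag{21}$$

Then, the SR maximization problem in (17) can be written as

$$\max_{\mathcal{D}}\ U\log_2\left(1 + \frac{\rho}{\sigma^2\,\mathrm{tr}\left(\left(\mathbf{H}_{eq}^{r}(\mathbf{H}_{eq}^{r})^H\right)^{-1}\right)}\right), \tag{22}$$

where $\mathcal{D}$ is the candidate beam set for the CUs, consisting of $\mathrm{Card}(\mathcal{F}_{CU})$ unshared columns selected from the set $\{1, 2, \ldots, N_{BS}\}\backslash\{b_u^* \mid u \in \mathcal{F}_{NCU}\}$, and $\mathbf{H}_{eq}^{r} = \mathbf{H}_{eq}(:, s)$, $s \in \mathcal{D}\cup\{b_u^* \mid u \in \mathcal{F}_{NCU}\}$.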
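A minimal NumPy sketch of the ZF precoder and rate in (19) to (21) is given below. It assumes $\mathbf{H}_{eq}$ has full row rank, and the helper names `zf_precoder` and `zf_rate` and the random toy channel are hypothetical. The construction gives $\mathbf{H}_{eq}\mathbf{F}_{BB} = \alpha\mathbf{I}_U$, and the scaling in (20) yields $E\{\|\mathbf{F}_{BB}\mathbf{s}\|_2^2\} = \mathrm{tr}(\mathbf{F}_{BB}\mathbf{F}_{BB}^H)/N_S = \rho$.

```python
import numpy as np

def zf_precoder(H_eq, rho, n_s):
    """ZF digital precoder of eqs. (19)-(20): F_BB = alpha * H_eq^H (H_eq H_eq^H)^{-1}."""
    gram_inv = np.linalg.inv(H_eq @ H_eq.conj().T)
    alpha = np.sqrt(n_s * rho / np.trace(gram_inv).real)
    return alpha * H_eq.conj().T @ gram_inv, alpha

def zf_rate(H_eq, rho, sigma2):
    """Per-user rate of eq. (21) under equal power allocation."""
    gram_inv = np.linalg.inv(H_eq @ H_eq.conj().T)
    return np.log2(1 + rho / (sigma2 * np.trace(gram_inv).real))

rng = np.random.default_rng(3)
n_users, n_rf = 4, 4
H_eq = rng.standard_normal((n_users, n_rf)) + 1j * rng.standard_normal((n_users, n_rf))
rho, sigma2 = 10.0, 1.0
F_BB, alpha = zf_precoder(H_eq, rho, n_users)
print(np.round(np.abs(H_eq @ F_BB), 6))      # alpha on the diagonal, zeros elsewhere
print(n_users * zf_rate(H_eq, rho, sigma2))  # SR of eq. (22) for this beam set
```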
Since $\log_2(\cdot)$ is monotonically increasing, maximizing the SR is equivalent to minimizing $\mathrm{tr}\big((\mathbf{H}_{eq}^{r}(\mathbf{H}_{eq}^{r})^H)^{-1}\big)$. Hence, (22) can be reformulated as

$$\min_{\mathcal{D}}\ \mathrm{tr}\left(\left(\mathbf{H}_{eq}^{r}(\mathbf{H}_{eq}^{r})^H\right)^{-1}\right). \tag{23}$$

Next, the beam columns with the greatest contribution to the SR are picked out step by step. The optimal solution requires an exhaustive search over $\binom{N_{BS}}{N_{RF}}$ possible combinations, whose complexity is prohibitively high. Therefore, a low-complexity incremental algorithm [21] is applied to beam-column selection to obtain a near-optimal solution.
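In the example in Fig. 2, the CUs are users 2 and 3, since $b_2^* = b_3^* = 5$. Therefore, in the first step, the unshared beam column $b_2$ of user 2 is selected as

$$b_2 = \arg\min_{b\in\mathcal{S}}\ \mathrm{tr}\left(\left(\mathbf{G} + \mathbf{g}_b\mathbf{g}_b^H\right)^{-1}\right), \quad \mathbf{G} = \mathbf{A}\mathbf{A}^H + \varepsilon\mathbf{I}_U, \tag{24}$$

where $\mathbf{A} = \mathbf{H}_{eq}(:, s)$, $s \in \{b_u^* \mid u \in \mathcal{F}_{NCU}\}$, is the matrix of beam columns selected for the NCUs, and $\mathbf{g}_b = \mathbf{H}_{eq}(:, b)$, $b \in \mathcal{S} = \{1, 2, \ldots, N_{BS}\}\backslash\{b_u^* \mid u \in \mathcal{F}_{NCU}\}$, is a beam column that may be selected for the CUs. $\varepsilon$ is a small positive number (e.g., $10^{-2}$) that makes $\mathbf{G}$ nonsingular, which guarantees that the matrix inversion in (24) exists.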
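Based on the Sherman-Morrison-Woodbury identity [13], the matrix inversion can be computed recursively with complexity $O(U^2)$ instead of $O(U^3)$. The identity gives a convenient expression for the inverse of $(\mathbf{B} + \mathbf{U}\mathbf{V}^H)$, where $\mathbf{B}$ is $n\times n$ and $\mathbf{U}$ and $\mathbf{V}$ are $n\times k$:

$$\left(\mathbf{B} + \mathbf{U}\mathbf{V}^H\right)^{-1} = \mathbf{B}^{-1} - \mathbf{B}^{-1}\mathbf{U}\left(\mathbf{I}_k + \mathbf{V}^H\mathbf{B}^{-1}\mathbf{U}\right)^{-1}\mathbf{V}^H\mathbf{B}^{-1}. \tag{25}$$

In (25), both $\mathbf{B}$ and $(\mathbf{I}_k + \mathbf{V}^H\mathbf{B}^{-1}\mathbf{U})$ are assumed to be nonsingular; a rank-$k$ correction to a matrix results in a rank-$k$ correction of its inverse. In particular, for $k = 1$ and nonsingular $\mathbf{B}$,

$$\left(\mathbf{B} + \mathbf{u}\mathbf{v}^H\right)^{-1} = \mathbf{B}^{-1} - \frac{\mathbf{B}^{-1}\mathbf{u}\mathbf{v}^H\mathbf{B}^{-1}}{1 + \mathbf{v}^H\mathbf{B}^{-1}\mathbf{u}}. \tag{26}$$

According to (26), since $\mathbf{G}$ is nonsingular and Hermitian, $\mathrm{tr}\big((\mathbf{G} + \mathbf{g}_b\mathbf{g}_b^H)^{-1}\big) = \mathrm{tr}(\mathbf{G}^{-1}) - \frac{\mathbf{g}_b^H\mathbf{G}^{-2}\mathbf{g}_b}{1 + \mathbf{g}_b^H\mathbf{G}^{-1}\mathbf{g}_b}$, and because $\mathrm{tr}(\mathbf{G}^{-1})$ does not depend on $b$, (24) can be equivalently formulated as

$$b_2 = \arg\max_{b\in\mathcal{S}}\ \frac{\mathbf{g}_b^H\mathbf{G}^{-2}\mathbf{g}_b}{1 + \mathbf{g}_b^H\mathbf{G}^{-1}\mathbf{g}_b}. \tag{27}$$

By evaluating the objective function in (27), $b_2$ is selected; in the example in Fig. 2, user 2 selects beam 5. Then, $\mathbf{G}$ and $\mathcal{S}$ are updated as $\mathbf{G} = \mathbf{G} + \mathbf{g}_{b_2}\mathbf{g}_{b_2}^H$ and $\mathcal{S} = \mathcal{S}\backslash\{b_2\}$. User 3 in the CU group then uses the same method to pick another beam; we assume that user 3 selects beam 6 after computation. If there are more CUs, the incremental algorithm continues until all $\mathrm{Card}(\mathcal{F}_{CU})$ beams have been picked. Consequently, the set of all selected beam columns is

$$\mathcal{B}^* = \{b_u^* \mid u \in \mathcal{F}_{NCU}\}\cup\{b_u \mid u \in \mathcal{F}_{CU}\}. \tag{28}$$

In this paper, the number of RF chains is arbitrary. Many methods, such as [4], solve the hybrid precoding problem but strictly require $N_{RF} = U$, which does not cover the general case $N_{RF} \ge U$. The proposed algorithm handles this general case without the constraint $N_{RF} = U$, so it can use the additional RF chains to exploit all potential SR gains. The per-user CA beam selection method introduced above assigns one RF chain to each user; when $N_{RF} \ge U$, each user is assigned multiple RF chains. For fairness, we strive to give every user the same number of RF chains: denoting $C = \lfloor N_{RF}/U\rfloor$, each user is first allocated $C$ RF chains, which are obtained by executing the CA method for $C$ rounds.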
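The incremental CU beam selection in (24) to (28) can be sketched as below, assuming NumPy. Each step scores the remaining candidates with (27) and then updates $\mathbf{G}^{-1}$ through the rank-one identity (26), so no full matrix inversion is needed inside the loop. The function name `select_cu_beams`, the 0-indexed beams, the fixed NCU beams, and the random toy channel are hypothetical; this is an illustrative sketch, not the authors' code.

```python
import numpy as np

def select_cu_beams(H_a, fixed, n_select, eps=1e-2):
    """Greedily pick n_select extra beam columns of H_a (0-indexed) for the CUs.

    Each step picks b = argmax g_b^H G^{-2} g_b / (1 + g_b^H G^{-1} g_b), eq. (27),
    i.e. the column that most reduces tr((G + g_b g_b^H)^{-1}), then updates G^{-1}.
    """
    U = H_a.shape[0]
    A = H_a[:, fixed]
    G_inv = np.linalg.inv(A @ A.conj().T + eps * np.eye(U))   # G of eq. (24)
    candidates = [b for b in range(H_a.shape[1]) if b not in fixed]
    chosen = []
    for _ in range(n_select):
        best, best_score = None, -np.inf
        for b in candidates:
            g = H_a[:, b]
            x = G_inv @ g                                    # G^{-1} g_b
            score = np.vdot(x, x).real / (1 + np.vdot(g, x).real)
            if score > best_score:
                best, best_score = b, score
        g = H_a[:, best]
        x = G_inv @ g
        # Rank-one update of the inverse, eq. (26): (G + g g^H)^{-1}
        G_inv = G_inv - np.outer(x, x.conj()) / (1 + np.vdot(g, x).real)
        candidates.remove(best)
        chosen.append(best)
    return chosen, G_inv

rng = np.random.default_rng(2)
H_a = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))   # toy U x N_BS channel
fixed = [1, 7]                  # hypothetical NCU beams (0-indexed)
chosen, G_inv = select_cu_beams(H_a, fixed, n_select=2)
print(fixed + chosen)           # final beam set B* of eq. (28)
```

Comparing `G_inv` with `np.linalg.inv` of the final $\mathbf{G}$ is a simple way to confirm the recursive update; each update costs $O(U^2)$ operations.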
The number of remaining RF chains is $T = N_{RF} - \lfloor N_{RF}/U\rfloor\cdot U$. The remaining RF chains are preferentially allocated to NCUs, since users without multiuser interference bring a higher SR. Two cases are distinguished according to the number of NCUs, denoted as $N_{\mathcal{F}_{NCU}}$: (1) if $N_{\mathcal{F}_{NCU}} \ge T$, each of the first $T$ NCUs is allocated one additional RF chain; (2) if $N_{\mathcal{F}_{NCU}} < T$, every NCU is allocated one additional RF chain, and each of the first $T - N_{\mathcal{F}_{NCU}}$ CUs is also allocated one additional RF chain.
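For $N_{RF} \ge U$, the allocation rule reduces to integer division with remainder. The plain-Python sketch below uses a hypothetical helper name and lists NCUs before CUs, so that the leftover chains reach the NCUs first, covering both cases above.

```python
def allocate_rf_chains(n_rf, ncu, cu):
    """Number of RF chains per user for N_RF >= U (hypothetical helper).

    Every user gets C = floor(N_RF / U) chains; the T = N_RF - C*U leftover chains go
    one each to the first T NCUs, and then to the first (T - #NCU) CUs if NCUs run out.
    """
    users = list(ncu) + list(cu)          # NCUs first, so they are prioritized
    c, t = divmod(n_rf, len(users))       # C and T
    alloc = {u: c for u in users}
    for u in users[:t]:
        alloc[u] += 1
    return alloc

print(allocate_rf_chains(10, [1, 4], [2, 3]))   # {1: 3, 4: 3, 2: 2, 3: 2}
```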

Computational complexity
Here, we compare the computational complexity of the proposed algorithm with those of the fully digital algorithm, the exhaustive search algorithm, and the algorithms in [4,5,7,11], as summarized in Table 1. The complexity of the proposed algorithm consists of two parts; in total, its computational complexity is $O(N_{BS}U^2 + U^2)$, whereas the complexity of the fully digital algorithm is $O(N_{BS}^3)$. Compared with exhaustive search, the proposed CA scheme reduces the beam search cost from $\binom{N_{BS}}{N_{RF}}$ to $U^2$, which is much more acceptable; accordingly, the complexity of the exhaustive search hybrid algorithm is $O(N_{BS}U^2 + \binom{N_{BS}}{N_{RF}})$. The complexity of the beam steering hybrid algorithm in [4] is $O(N_{BS}^2U^2)$. The complexities of the hybrid algorithms in [5,7] are also $O(N_{BS}U^2 + U^2)$, since, apart from their codebooks, they use the same beam selection method in the simulations of the next section. The complexity of the JWSPD hybrid algorithm in [11] is $O(N_{BS}^4U^4)$. The proposed hybrid precoding algorithm thus greatly reduces the computational complexity compared with the fully digital algorithm and the exhaustive search algorithm. Moreover, when $N_{BS}$ is much larger than $U$, the complexity of the proposed algorithm is much lower than that of the algorithms in [4,11]. Although the complexity of the proposed algorithm equals that of the algorithms in [5,7], the proposed hybrid precoding achieves better performance.

Table 1 Computational complexity of different algorithms
Algorithm | Complexity
Full digital | $O(N_{BS}^3)$
Exhaustive search DFT codebook-based hybrid | $O(N_{BS}U^2 + \binom{N_{BS}}{N_{RF}})$
Beam steering hybrid in [4] | $O(N_{BS}^2U^2)$
802.15.3c hybrid in [5] | $O(N_{BS}U^2 + U^2)$
Q-bit resolution hybrid in [7] | $O(N_{BS}U^2 + U^2)$
JWSPD hybrid in [11] | $O(N_{BS}^4U^4)$
Proposed | $O(N_{BS}U^2 + U^2)$
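For a rough sense of scale, the sketch below plugs the simulation values $N_{BS} = 128$ and $U = N_{RF} = 16$ into the complexity orders listed above. Constants hidden by the $O(\cdot)$ notation are ignored, so the numbers indicate relative growth only, not actual run times; the dictionary labels are illustrative.

```python
from math import comb

n_bs, u = 128, 16     # simulation values, with N_RF = U
costs = {
    "proposed, O(N_BS U^2 + U^2)": n_bs * u ** 2 + u ** 2,
    "full digital, O(N_BS^3)": n_bs ** 3,
    "beam steering [4], O(N_BS^2 U^2)": n_bs ** 2 * u ** 2,
    "JWSPD [11], O(N_BS^4 U^4)": n_bs ** 4 * u ** 4,
    "exhaustive, O(N_BS U^2 + C(N_BS, N_RF))": n_bs * u ** 2 + comb(n_bs, u),
}
# Print from cheapest to most expensive
for name, c in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name:45s} {c:.2e}")
```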

Results and discussion
In this section, we compare the performance of the proposed algorithm with that of fully digital precoding, the DFT codebook-based algorithm with the exhaustive search scheme mentioned in [9], the beam steering hybrid algorithm in [4], IEEE 802.15.3c codebook-based hybrid precoding [5], Q-bit resolution codebook-based hybrid precoding [7], and the JWSPD hybrid algorithm in [11], all under the fully connected architecture for fairness. The simulation parameters are as follows. For the system model described in Section 2, a ULA with half-wavelength antenna spacing is used at the BS. The channel of every MS has 5 propagation paths, the AoD of every propagation path is uniformly distributed in $[0, 2\pi]$, and the average power of every propagation path is drawn from a uniform distribution on $[0, 1]$. The total received power of every MS is normalized to satisfy the power constraint. The signal-to-noise ratio (SNR) is defined as $\mathrm{SNR} = \rho/\sigma^2$. The simulation results are averaged over 1000 Monte Carlo trials. The performance of the precoding algorithms is shown in terms of the achievable SR and EE in the following subsections.

Achievable sum rate
Figure 3 compares the achievable SR versus the average SNR for $N_{BS} = 128$ BS antennas and $U = 16$ users. For simplicity, the number of RF chains is assumed to equal the number of MSs, and each MS is equipped with one antenna. The performance of the proposed algorithm is close to that of the fully digital algorithm (the upper bound), while the number of RF chains is significantly reduced. The proposed algorithm also outperforms the algorithms in [5,7]. The proposed hybrid precoding and the JWSPD scheme of [11] are compared under two configurations, i.e., $\lambda \ne 0$ and $\lambda = 0$.
A tunable sparsity parameter $\lambda$ is introduced in [11] to control the sparsity of the optimization solution: the larger $\lambda$ is, the sparser the solution. Thus, $\lambda$ is first chosen properly in the JWSPD scheme to balance maximizing the system SR against minimizing the number of selected virtual antennas (codewords); the remaining (unselected) RF chains, together with their phase shifter networks, can then be turned off to save power. In other words, $\lambda$ represents a trade-off between system SR and EE, whereas the proposed algorithm optimizes only the SR. Hence, the SR of the proposed scheme is higher than that of the SRmax JWSPD scheme when $\lambda$ is properly chosen and $\lambda \ne 0$, as shown in Fig. 3a. If $\lambda$ is forced to zero, as in Fig. 3b, the JWSPD scheme achieves slightly better SR performance than the proposed scheme, but with higher computational complexity. This is because $\lambda = 0$ removes the sparsity of the solution, and hence EE, from the optimization, so the JWSPD scheme maximizes the system SR alone.
The performance of the proposed algorithm is also close to that of the exhaustive-search DFT codebook-based hybrid algorithm, which traverses all possible beam combinations and jointly selects the optimal beams. However, exhaustive search becomes extremely time-consuming as N_BS grows, so the proposed scheme is more practical, at the cost of an acceptable performance loss. At a fixed SR of 40 bits/s/Hz, the proposed hybrid algorithm achieves a 5 dB gain over [5]. This demonstrates that the CA beam selection method and the DFT codebook of the proposed algorithm are efficient and attractive means of solving the optimization problem (17).

Figure 4 compares the achievable sum rate against the average SNR with U = 4 and N_BS = 128 or N_BS = 256, where each MS is equipped with two antennas. The BS serves the U MSs simultaneously using N_RF RF chains, with the same simplifying assumption on the number of RF chains as before.
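To make the cost of exhaustive search concrete, the sketch below counts the candidate beam sets it would have to evaluate. The exact search space of [9] is not given in this section, so two hypothetical models are shown: choosing U distinct beams out of N_BS DFT columns, and letting every user pick any of the N_BS beams independently.

```python
from math import comb


def exhaustive_combinations(n_bs, n_users):
    """Unordered sets of n_users distinct DFT beams (assumed search model)."""
    return comb(n_bs, n_users)


def exhaustive_assignments(n_bs, n_users):
    """Every user independently picks one of n_bs beams (assumed search model)."""
    return n_bs ** n_users


# Fig. 4 setting, U = 4 users:
small = exhaustive_combinations(128, 4)   # 10,668,000 beam sets for N_BS = 128
large = exhaustive_combinations(256, 4)   # 174,792,640 beam sets for N_BS = 256
```

Doubling N_BS multiplies the search space by roughly 2^U, which is why a per-user selection rule such as the CA method scales far better.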
As shown in Fig. 4a, the proposed algorithm outperforms the algorithms in [4,5,7] when N_BS = 128, for three reasons. First, the columns of the beam steering codebook in [4], the 802.15.3c codebook in [5], and the Q-bit resolution codebook in [7] are not orthogonal, which may degrade performance through multiuser interference, whereas the DFT codebook used here has orthogonal columns. Second, the proposed CA method for selecting columns from the codebook is more flexible than the selection method in [4]: the CA method exploits the sparse angular-domain channel matrix and accounts for potential multiuser conflicts, whereas the selection method in [4] assumes a single-path channel. Third, the algorithm in [4] performs well in single-path scenarios but deteriorates considerably in multipath scenarios, because beam steering cannot fully exploit multipath channel gains. In practice, mmWave channels are mainly multipath, whereas the proposed algorithm performs well in both single-path and multipath scenarios.
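The orthogonality argument can be checked numerically. The sketch below builds the normalized DFT codebook and, for contrast, a hypothetical angle-quantized steering codebook with beams at θ_k = πk/K. The latter is only an illustrative stand-in, not the exact codebook of [4], [5], or [7]. It reports the largest cross-correlation |f_iᴴ f_j| between distinct columns.

```python
import cmath
import math


def dft_codebook(n):
    """Columns of the normalized n-point DFT matrix: f_k[m] = exp(-j*2*pi*m*k/n)/sqrt(n)."""
    return [[cmath.exp(-2j * math.pi * m * k / n) / math.sqrt(n) for m in range(n)]
            for k in range(n)]


def steering_codebook(n, n_beams):
    """Hypothetical angle-quantized codebook: beam k points at theta_k = pi*k/n_beams."""
    return [[cmath.exp(1j * math.pi * m * math.cos(math.pi * k / n_beams)) / math.sqrt(n)
             for m in range(n)]
            for k in range(n_beams)]


def max_cross_correlation(columns):
    """Largest |f_i^H f_j| over distinct column pairs (0 means mutually orthogonal)."""
    worst = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            ip = sum(a.conjugate() * b for a, b in zip(columns[i], columns[j]))
            worst = max(worst, abs(ip))
    return worst


dft_worst = max_cross_correlation(dft_codebook(8))             # ~0 up to rounding
steer_worst = max_cross_correlation(steering_codebook(8, 8))   # about 0.86 for adjacent beams
```

With orthogonal DFT columns, two users assigned different beams see no beam-level overlap, which is the property the first reason above relies on.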
Several observations follow from Fig. 4b. The sum rates of all algorithms grow as N_BS increases from 128 to 256, and the sum rate of the proposed algorithm reaches almost 70 bits/s/Hz at an SNR of 30 dB. However, the performance gaps between the proposed algorithm and the other algorithms remain almost unchanged as N_BS increases, since all of them benefit from the spatial multiplexing gain provided by the large number of BS antennas.

Figure 5 compares the achievable sum rate against the average SNR for different numbers of users at N_BS = 256. The number of users U equals the number of RF chains N_RF, and three configurations are considered: U = 32, U = 16, and U = 8. Fig. 5 shows that the SR of the proposed algorithm increases with the number of users, but the gap to the full digital algorithm also widens, because inter-user interference grows with the number of users.

Figure 6 shows the achievable sum rate against the number of RF chains, which ranges from 8 to 128, at N_BS = 128, U = 16, and SNR = 25 dB. The SR of the proposed algorithm increases with the number of RF chains owing to the diversity gain they provide, so the algorithm takes full advantage of additional RF chains. Three further observations follow from Fig. 6. First, at low N_RF there is a noticeable gap between the full digital algorithm and the proposed algorithm. Second, as N_RF increases, the performance converges to that of the full digital algorithm (the upper bound), and the gap vanishes at N_RF = 128. Third, at N_RF = 64 the proposed algorithm already achieves 99% of the full digital sum rate.
Figure 7 compares the achievable sum rate against the average SNR for two user power allocation schemes, namely water-filling power allocation [22] and equal power allocation. For water-filling, power allocation is performed after hybrid precoding: the effective channel matrix Ĥ_u is first decomposed by singular value decomposition (SVD) into parallel subchannels, and the power is then distributed among the subchannels using the Lagrangian method. Water-filling outperforms equal power allocation, and the proposed hybrid precoding with water-filling comes even closer to full digital precoding (the upper bound). This is because, for a given precoder, the Lagrangian-based water-filling allocation maximizes the capacity of the parallel subchannels and hence the achievable sum rate of the system.
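The Lagrangian step can be sketched as follows. Setting the derivative of the Lagrangian of Σ log2(1 + p_i g_i) under Σ p_i = P to zero gives p_i = max(μ − 1/g_i, 0), and the water level μ can be found by bisection. Here g_i is assumed to be the squared i-th singular value of Ĥ_u divided by the noise power; `water_filling` and `sum_rate` are hypothetical names, and all gains must be positive.

```python
import math


def water_filling(gains, total_power, iters=200):
    """Water-filling over parallel subchannels with positive gains g_i.

    Maximizes sum log2(1 + p_i * g_i) s.t. sum p_i = total_power, p_i >= 0.
    The KKT conditions give p_i = max(mu - 1/g_i, 0); mu is found by bisection.
    """
    floors = [1.0 / g for g in gains]
    lo, hi = 0.0, max(floors) + total_power  # at mu = hi every p_i >= total_power
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if sum(max(mu - f, 0.0) for f in floors) > total_power:
            hi = mu
        else:
            lo = mu
    return [max(lo - f, 0.0) for f in floors]


def sum_rate(gains, powers):
    """Sum rate in bits/s/Hz of the parallel subchannels."""
    return sum(math.log2(1.0 + g * p) for g, p in zip(gains, powers))


gains = [4.0, 1.0, 0.25]
p_wf = water_filling(gains, 2.0)  # -> approximately [1.375, 0.625, 0.0]
```

In this toy case, the weakest subchannel is switched off and water-filling yields about 3.40 bits/s/Hz versus about 2.83 bits/s/Hz for equal allocation, matching the qualitative ordering reported for Fig. 7.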

Fig. 6 The achievable sum rate vs. the number of RF chains

Energy efficiency
The energy efficiency ζ is defined in [2] as the ratio of the achievable sum rate to the total power consumption, i.e.,

$$\zeta = \frac{R_{\mathrm{SUM}}}{P + N_{\mathrm{RF}} P_{\mathrm{RF}} + P_{\mathrm{BB}}} \quad (\mathrm{bps/Hz/W}),$$

where P is the transmit power of the BS, P_RF is the power consumption of each RF chain, and P_BB is the power consumption of the digital precoder. We adopt the typical values P_RF = 250 mW and P_BB = 300 mW [23]. Figure 8 shows the EE against the average SNR, again with N_BS = 128 BS antennas and U = 16 users. The proposed algorithm achieves a higher EE than the other five compared algorithms. In particular, it performs much better than full digital precoding, in which N_RF = N_BS: the large number of RF chains, each consuming 250 mW, causes very high energy consumption. In hybrid precoding, the RF chains are far fewer than the transmit antennas, so the hybrid precoding algorithm greatly reduces the energy consumed by the RF chains compared with the full digital scheme. Moreover, in terms of EE, the proposed algorithm outperforms the JWSPD scheme [11] with λ = 0, which achieves an SR similar to ours in Fig. 3b, because EE is not considered in the JWSPD scheme when λ = 0. Figure 9 shows the EE against the number of users, which increases from 5 to 25, at an SNR of 15 dB. The EE of the proposed algorithm decreases as the number of users grows, since more users require more RF chains at the BS and hence more energy consumption. The EE of the proposed algorithm is higher than that of the other five algorithms, especially when the number of users U is small.
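The EE definition is simple arithmetic; the sketch below evaluates it with the typical P_RF and P_BB values above. The transmit power P = 1 W and the common sum rate of 40 bps/Hz are hypothetical numbers chosen only to isolate the effect of N_RF; they are not results from the figures.

```python
def energy_efficiency(sum_rate, p_tx, n_rf, p_rf=0.25, p_bb=0.3):
    """zeta = R_SUM / (P + N_RF * P_RF + P_BB) in bps/Hz/W (all powers in watts).

    Defaults follow the typical values P_RF = 250 mW and P_BB = 300 mW.
    """
    return sum_rate / (p_tx + n_rf * p_rf + p_bb)


# Hypothetical numbers: same 40 bps/Hz and P = 1 W for both schemes.
hybrid = energy_efficiency(40.0, p_tx=1.0, n_rf=16)         # 40 / (1 + 4 + 0.3) W
full_digital = energy_efficiency(40.0, p_tx=1.0, n_rf=128)  # 40 / (1 + 32 + 0.3) W
```

Even with equal sum rates, the 128 RF chains of full digital precoding (32 W) dominate the denominator, giving roughly 1.2 bps/Hz/W versus about 7.5 bps/Hz/W for 16 RF chains.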

Conclusion
In this paper, a low-complexity hybrid precoding scheme is proposed for multiuser mmWave massive MIMO systems. The main idea is to design the digital and analog precoders separately to achieve a near-optimal achievable sum rate. In the analog domain, we employ a DFT codebook-based analog precoder. The core of the analog precoder design is the proposed conflicting-aware (CA) beam-column selection from the DFT codebook, which is derived from sum-rate maximization and exploits the angular-domain channel matrix. Compared with exhaustive search, the CA selection method dramatically reduces the computational complexity. In the digital domain, the digital precoder is then obtained by applying the ZF criterion to the effective channel matrix. With the proposed hybrid precoding, the achievable sum rate of the system is near-optimal. Simulation results under various configurations show the SR and EE performance of the schemes: the proposed algorithm, with its low computational complexity, approaches full digital performance and outperforms other existing hybrid algorithms.
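The ZF digital step summarized above can be sketched in plain Python as F_BB = Hᴴ(H Hᴴ)⁻¹ applied to the U × N_RF effective channel H = H_ch F_RF, followed by power scaling. Scaling F_BB alone is valid only if F_RF has orthonormal columns. This holds here under the assumption that F_RF consists of distinct columns of a normalized DFT matrix, since then ‖F_RF F_BB‖_F = ‖F_BB‖_F. The helper names are hypothetical.

```python
import math


def conj_transpose(a):
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(a):
    """Gauss-Jordan inverse of a square complex matrix with partial pivoting."""
    n = len(a)
    m = [[complex(x) for x in row] + [complex(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("effective channel is (numerically) rank deficient")
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [x / pivot for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def zf_digital_precoder(h_eff, total_power=1.0):
    """ZF precoder F_BB = H^H (H H^H)^{-1}, scaled so that ||F_BB||_F^2 = total_power.

    h_eff is the U x N_RF effective channel (U <= N_RF).
    """
    h_h = conj_transpose(h_eff)
    f = matmul(h_h, inverse(matmul(h_eff, h_h)))
    norm = math.sqrt(sum(abs(x) ** 2 for row in f for x in row))
    scale = math.sqrt(total_power) / norm
    return [[x * scale for x in row] for row in f]


# Example: U = 2 users, N_RF = 3 RF chains (arbitrary effective channel).
h_eff = [[1 + 0.5j, 0.2, 0.1j], [0.3, 0.8 - 0.2j, 0.4]]
f_bb = zf_digital_precoder(h_eff, total_power=2.0)
```

Multiplying h_eff by f_bb gives a scaled identity, i.e., the inter-user interference seen through the effective channel is nulled.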

Appendix
In this appendix, we introduce the criterion for selecting the dominant beam of the u-th user.
Unlike previous beam selection methods, which retain most of the channel power, we use a metric T based on the signal-to-leakage-plus-noise ratio (SLNR) in order to maximize the SR. The SLNR of the u-th user is defined in terms of H^a_u, the N_MS × N_BS matrix denoting the angular channel between the BS and the u-th user, and w_u = F_RF f^u_BB, the hybrid precoder of the u-th user. With the leakage channel matrix of the u-th user defined accordingly, the resulting SLNR serves as the metric for dominant beam selection. Physically, a high SLNR means that the user's equivalent channel gain is large while the power it leaks to other users remains small, i.e., it causes little interference to other users. When every user leaks little power to the others, the interference each user experiences is also small, so the SINR of every user can be large, and by (14) the achievable sum rate is high. Beams selected for the u-th user under this criterion therefore contribute directly to the SR.
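The sketch below illustrates SLNR-based dominant-beam selection. It assumes the standard SLNR form SLNR_u = ‖H^a_u w_u‖² / (N_MS σ² + Σ_{k≠u} ‖H^a_k w_u‖²), which may differ in detail from the paper's metric T. It further assumes that each user is served by a single DFT beam with unit digital weight, so that in the angular domain H^a_u w_u is simply the k-th column of H^a_u. All names and channel values are hypothetical.

```python
def column_energy(h, k):
    """Squared norm of column k of an angular channel matrix (list of rows)."""
    return sum(abs(row[k]) ** 2 for row in h)


def slnr_of_beam(angular_channels, u, k, noise_var):
    """SLNR of user u served by angular beam k alone (assumed standard SLNR form)."""
    n_ms = len(angular_channels[u])  # number of MS antennas = rows of H^a_u
    signal = column_energy(angular_channels[u], k)
    leakage = sum(column_energy(h, k) for i, h in enumerate(angular_channels) if i != u)
    return signal / (n_ms * noise_var + leakage)


def dominant_beam(angular_channels, u, noise_var):
    """Index of the beam that maximizes the SLNR of user u."""
    n_beams = len(angular_channels[u][0])
    return max(range(n_beams), key=lambda k: slnr_of_beam(angular_channels, u, k, noise_var))


# Two single-antenna users, 4 angular beams (illustrative values).
channels = [[[0.8, 0.9, 0.1, 0.0]], [[0.0, 0.85, 0.2, 0.05]]]
beam_u0 = dominant_beam(channels, 0, 0.01)  # 0, although beam 1 carries more power
beam_u1 = dominant_beam(channels, 1, 0.01)  # 2, avoiding the shared beam 1
```

In this toy example, a power-only rule would give beam 1 to user 0 even though that beam leaks strongly to user 1; the SLNR rule steers both users away from the shared beam, which is the interference-avoiding behavior described above.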