Neighbor-based joint spatial division and multiplexing in massive MIMO: user scheduling and dynamic beam allocation

Two-stage precoding schemes have been developed to reduce the channel estimation overhead in FDD systems. By integrating user scheduling into these schemes, it becomes possible to meet the quality-of-service requirements of high-density wireless communication systems, despite the limitations on spatial resources and transmit power budget. In this paper, we propose a user scheduling and dynamic beam allocation method for neighbor-based joint spatial division and multiplexing (N-JSDM) transmission. The user scheduling problem is formulated as a 0–1 quadratic programming problem to maximize effective spectral efficiency (ESE) using directional channel properties. To address the complexity issue, convex relaxation and linearization methods are employed to transform the problem into a 0–1 mixed integer linear programming problem, and a dimensionality reduction method is introduced. The proposed user scheduling-aided N-JSDM scheme reduces downlink training length and feedback of channel state information. Additionally, the pre-beamforming matrix is designed with a dynamic beam configuration based on the user distribution, which outperforms conventional approaches. Simulation results demonstrate that the proposed scheme achieves higher ESE than previous methods.


Introduction
Over the past three decades, the data rates of wireless communication have doubled every eighteen months, and they are projected to reach terabits per second in the near future [1]. Massive multiple-input multiple-output (MIMO) has been a crucial technology for enhancing system throughput and providing reliable communication [2]. By employing a large-scale antenna array at the base station (BS), with the number of BS antennas significantly surpassing the number of served user terminals, massive MIMO achieves higher data transmission rates. It utilizes spatial resources and capitalizes on multipath propagation to establish a parallel transmission mechanism, multiplying system capacity without the need for additional spectrum resources or transmit power [3]. In forthcoming communication systems, massive MIMO will continue to play a pivotal role.
Massive MIMO relies on the channel state information (CSI), which is the communication link state information from the transmitter to the receiver [4]. When the CSI is perfect, the performance of massive MIMO scales linearly with the smaller of the numbers of transmit and receive antennas [5], underscoring the critical importance of obtaining instantaneous CSI. In previous research on massive MIMO systems, time division duplex (TDD) mode has been widely adopted. TDD leverages channel reciprocity, enabling the estimation of downlink CSI through the uplink channel, thereby reducing spectral overhead [6][7][8][9]. However, the prevailing wireless standards predominantly employ frequency division duplex (FDD) systems, which offer more mature industrial products and a larger market share [10]. Furthermore, in the extensively studied millimeter-wave frequency band, FDD systems may exhibit similarly impressive performance in cell-free massive MIMO systems [11]. Nonetheless, due to the absence of channel reciprocity, FDD massive MIMO systems necessitate substantial downlink training length (DTL) and CSI feedback during the downlink communication to acquire CSI at the transmitter [12]. Additionally, the cost of reconfiguring frequency bands to accommodate TDD in FDD systems is considerably high [13]. Consequently, for FDD massive MIMO systems, acquiring CSI presents a significant challenge, particularly for telecom operators compelled to upgrade their existing FDD systems to 5G wireless communications.
Considerable research effort has been devoted to reducing DTL and channel feedback in FDD massive MIMO systems. Similar to the CSI acquisition in TDD mode, several studies (e.g., [11] and [13]) leverage angle reciprocity by transmitting uplink pilots to obtain CSI, thereby eliminating the need for CSI feedback. The minimum number of pilots required corresponds to the number of terminals. Moreover, some works focus on spatially correlated MIMO channels and utilize the structure of CSI to reduce DTL and CSI feedback. Specifically, compressed sensing techniques are used to exploit channel sparsity [14,15]. Expanding on the consideration of spatial correlation, additional studies have taken into account temporal correlation and leveraged the spatial and temporal common sparsity of massive MIMO channels to acquire CSI with reduced overhead [4,16]. Additionally, a two-stage beamforming scheme called joint spatial division and multiplexing (JSDM) based on statistical CSI is proposed in [12]. The JSDM beamforming scheme comprises two stages. In the first stage, the pre-beamformer uses the channel covariance matrix (CCM) to mitigate inter-group interference. In the second stage, the instantaneous CSI of each group is used to design a precoding scheme for intra-group interference suppression. Obtaining the statistical CSI is easier than obtaining instantaneous CSI since it varies at a slower rate [17,18].
Extensive research attention has been paid to enhancing the performance of JSDM [19][20][21][22][23]. Some works consider the pre-beamformer design to achieve a better spectral efficiency [19][20][21]. Specifically, due to the non-convexity caused by using the signal-to-interference-plus-noise ratio (SINR) as an optimization criterion, Kim et al. proposed to use the signal-to-leakage-plus-noise ratio (SLNR) as the optimization objective and simplified the SLNR-based pre-beamformer design problem to the trace quotient problem encountered in the field of machine learning [19]. In [20], Jeon et al. used the minimum mean squared error criterion to design the pre-beamformer and multi-user precoder sequentially. However, none of the above works considered the impact of user grouping. Since the channel covariance matrices of users differ, and the goal of user grouping is to make users in each group have a common eigen-subspace, there will inevitably be overlapping signal spaces between groups. Eliminating inter-group interference by pre-beamforming will reduce the signal space and result in a loss of system performance. Recently, a scheme called neighbor-based JSDM (N-JSDM) was proposed in [21], which avoids the user grouping problem by adopting the neighbor scheme to fully utilize the signal space. N-JSDM is still a two-stage scheme. In the first stage, a pre-beamforming matrix is designed according to the CCMs to reduce the interference of non-neighbors, and the effective channel matrix becomes a band matrix. Neighbor interference is removed in the second stage. Besides, Khalilsarai et al. proposed a method to approximate the downlink CCM of users as the columns of the discrete Fourier transformation matrix, particularly when the number of antennas at the BS is large [22]. This approximation enables the BS to utilize codebook-based beam selection for designing the pre-beamforming matrix, thereby reducing the computational complexity. There are also works that improve the performance of JSDM from the aspects of antenna structures [23] and BS selection [24]. Tang et al. provided an analysis of two-stage precoding designs under different antenna structures, offering guidelines for antenna structure selection to achieve a better balance between performance and cost [23]. Considering that the overlap of the angle-spreading-ranges (ASR) of different user clusters may seriously degrade the performance of two-stage precoding, Ma et al. proposed a solution to minimize ASR using BS selection [24].
As the number of users increases in the system, inter-user interference becomes severe, and a portion of the degrees of freedom is used to mitigate inter-group interference, resulting in a degradation of desired signal energy [21]. Therefore, it is necessary to schedule users to improve the spectral efficiency. User scheduling in conventional JSDM schemes is divided into two parts: user grouping and intra-group user scheduling. Before implementing JSDM beamforming, users need to be grouped, and the users in each group share a common eigen-subspace, i.e., the group eigen-space, where the group eigen-spaces of different groups are orthogonal or non-overlapping. Several user grouping methods have been proposed [25][26][27][28]. For example, the K-means clustering algorithm based on chordal distance and a fixed quantization algorithm based on the discrete Fourier transform are proposed in [25]. Xu et al. presented a K-means algorithm based on a weighted likelihood metric in [26]. Nam et al. transformed user grouping into a subspace packing problem on the Grassmann manifold [27], while a recent work [28] proposes a hierarchical clustering algorithm that considers both the number of groups and the chordal distance threshold. Besides, intra-group user scheduling has also been studied. A scheduling algorithm based on average SLNR has been proposed [28]. The iterative SLNR-based group scheduling combines the outer precoder and group scheduling to achieve better performance. Xu et al. proposed an optimized scheduling algorithm based on the channel quality indicator (CQI). The algorithm assumes that the users cannot achieve the maximum value on two or more beams and assigns a specific beam to each user based on CQI, allowing the user to obtain maximum gain on that beam [26].
Considering the advantages of N-JSDM, incorporating user scheduling into the N-JSDM transmission scheme enables better integration of precoding techniques, further optimizing system performance and enhancing communication quality. In this paper, we propose a user scheduling and dynamic beam allocation method for the N-JSDM transmission scheme to maximize effective spectral efficiency (ESE) subject to limited pilot length. Specifically, considering the challenges in acquiring complete CSI, we formulate the user scheduling problem as a 0-1 quadratic programming problem by leveraging the channel directional features. Since the users are randomly distributed, we propose dynamically allocating the number of beams serving each user. This idea is incorporated into the optimization problem as a constraint, and the pre-beamformer is designed accordingly. Additionally, we transform the optimization problem into a 0-1 mixed integer programming problem using convex relaxation and linearization techniques. Simulation results demonstrate the validity of the theoretical analysis. The primary contributions of this paper can be summarized as follows:
• We analyze the factors that impact ESE and formulate the user scheduling problem as a 0-1 quadratic programming problem with linear constraints, leveraging the channel directional features. These features are more stable over larger time scales compared to instantaneous CSI, which varies according to the channel coherence time.
• To simplify the 0-1 quadratic programming problem, we employ convex relaxation and linearization techniques to transform it into a mixed integer linear programming problem. Additionally, to further reduce computational complexity, we propose a dimensionality reduction method.
• A pre-beamformer design using a dynamic allocation scheme is proposed. Since the number of beams serving each user is determined by the interference between the user and its neighbors, it can be well applied in realistic scenarios where users are randomly distributed and/or DTL is limited.
The rest of the paper is organized as follows. Section 2 describes the system model and the N-JSDM scheme. The problem formulation of N-JSDM user scheduling is provided in Sect. 3. The beam allocation method based on overlap density, the linearization method of 0-1 quadratic programming, and the dimensionality reduction method are presented in Sect. 4. In Sect. 5, we propose a pre-beamformer design with dynamic beam configuration. Simulation results and discussion are given in Sect. 6. Finally, we conclude this paper in Sect. 7.

Notation
Bold

System model
We consider a typical single-cell FDD massive MIMO system where a BS is equipped with a uniform linear array (ULA) of $M$ elements serving $K$ single-antenna users. The BS applies a precoder $V \in \mathbb{C}^{M \times K}$ in the downlink to transmit symbols. Then, the received signal at the users can be written as

$$y = H^{H} V s + n,$$

where $y = [y_1, y_2, \ldots, y_K]^T \in \mathbb{C}^{K \times 1}$ with $y_k$ being the received signal of user $k$, $H = [h_1, h_2, \ldots, h_K] \in \mathbb{C}^{M \times K}$ is the channel matrix with $h_k$ being the channel vector of user $k$, $s \in \mathbb{C}^{K \times 1}$ is the transmitted signal satisfying the power constraint $E(ss^{H}) = I_K$, and $n = [n_1, n_2, \ldots, n_K]^T \in \mathbb{C}^{K \times 1}$ denotes the additive white Gaussian noise vector with $n \sim \mathcal{CN}(0, I_K)$.
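In this paper, we adopt a one-ring (OR) channel model [29], in which user $k$ has an azimuth angle $\theta_k$ and an angular spread (AS) $\Delta$, and $\theta_k$ is randomly distributed (footnote 1). Here, we sort the users as

$$\theta_1 \le \theta_2 \le \cdots \le \theta_K.$$

Under the OR model, the $(m, p)$-th element of the CCM $C_k$ of user $k$ is

$$[C_k]_{m,p} = \frac{1}{2\Delta} \int_{\theta_k - \Delta}^{\theta_k + \Delta} e^{-j 2\pi \frac{D}{\lambda_C} (m - p) \sin\theta} \, d\theta,$$

where $\lambda_C$ is the carrier wavelength and $D = \lambda_C / 2$ is the spacing between two antenna elements. According to the Karhunen-Loeve representation, we can write the channel vector of user $k$ as $h_k = C_k^{1/2} z_k$, where $z_k$ is the small-scale fading with $z_k \sim \mathcal{CN}(0, I_M)$. Letting $C = \sum_{k=1}^{K} C_k$, $\mathrm{span}(H)$ can be any subspace of $\mathrm{span}(C)$. Since statistical CSI varies much more slowly than the instantaneous CSI, the BS can accurately obtain the statistical CSI through long-term feedback [32,33].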
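The covariance structure of the OR model can be illustrated numerically. The following is a minimal sketch, assuming the commonly used uniform-angular-spread form of the one-ring CCM for a ULA with half-wavelength spacing; the sign convention of the exponent, the function name `one_ring_ccm`, and the midpoint-rule integration with `n_steps` cells are illustrative assumptions, not taken from the paper.

```python
import cmath
import math


def one_ring_ccm(theta_k, delta, M, d_over_lambda=0.5, n_steps=2000):
    """Approximate the M x M one-ring CCM by midpoint-rule integration.

    Assumed form: [C]_{m,p} = 1/(2*delta) * integral over
    [theta_k - delta, theta_k + delta] of
    exp(-1j * 2*pi * d_over_lambda * (m - p) * sin(theta)) d(theta).
    Angles are in radians.
    """
    step = 2.0 * delta / n_steps
    # Midpoints of the integration cells covering the angular support.
    thetas = [theta_k - delta + (i + 0.5) * step for i in range(n_steps)]
    # Each entry depends only on the lag m - p, so compute every lag once.
    lag_values = {}
    for lag in range(-(M - 1), M):
        acc = sum(
            cmath.exp(-2j * math.pi * d_over_lambda * lag * math.sin(t))
            for t in thetas
        )
        lag_values[lag] = acc * step / (2.0 * delta)
    return [[lag_values[m - p] for p in range(M)] for m in range(M)]


# Example: azimuth 30 degrees, angular spread 5 degrees, 8 antennas.
C = one_ring_ccm(theta_k=math.radians(30.0), delta=math.radians(5.0), M=8)
```

The resulting matrix is Hermitian Toeplitz with unit diagonal, which is why only the lags need to be integrated.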

The description of N-JSDM scheme
N-JSDM uses a neighbor scheme instead of a grouping scheme to fully utilize the signal space and thus provide better performance. A brief introduction to N-JSDM follows.
For user $k$, if $|\theta_k - \theta_j| > \omega$, then user $j$ is called a non-neighbor of user $k$, and the index set of user $k$'s non-neighbors is $\bar{\Omega}_k = \{j \mid |\theta_k - \theta_j| > \omega\}$, where $\omega$ is called the neighbor angular spread (NAS) (in [21], $\omega$ is chosen to be $2\Delta$). The index set of user $k$ and its neighbors is $\Omega_k = \{j \mid |\theta_k - \theta_j| \le \omega\}$. Because the users are sorted by azimuth angle, the elements in $\Omega_k$ are consecutive numbers, and $\Omega_k$ can be represented as $\Omega_k = \{k_l, \ldots, k, \ldots, k_u\}$. In the following, we refer to the set $\Omega_k$ as the neighbor domain of user $k$.

Footnote 1: We assume that the information about the CCM is known and accurate. This assumption is reasonable because there have been research studies on CCM estimation, as detailed in [30,31]. These works leverage the angular reciprocity between the uplink and downlink channels in FDD systems to improve channel estimation.
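The neighbor domains follow directly from the sorted azimuth angles and the NAS. A minimal sketch, assuming 0-based user indices and angles in degrees (the function name `neighbor_domains` and the sample values are illustrative):

```python
def neighbor_domains(thetas, omega):
    """Return, for each user k, the index list of user k and its neighbors.

    thetas must be sorted in ascending order; a user j belongs to the
    neighbor domain of user k when |theta_k - theta_j| <= omega.
    """
    K = len(thetas)
    domains = []
    for k in range(K):
        domains.append(
            [j for j in range(K) if abs(thetas[k] - thetas[j]) <= omega]
        )
    return domains


thetas = sorted([10.0, 14.0, 17.0, 40.0, 43.0])  # azimuth angles, degrees
omega = 2 * 2.5                                   # NAS = 2 * AS, AS = 2.5 degrees
domains = neighbor_domains(thetas, omega)
# domains == [[0, 1], [0, 1, 2], [1, 2], [3, 4], [3, 4]]
```

Because the angles are sorted, every returned list is a run of consecutive indices, which is what produces the band structure of the effective channel matrix.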
N-JSDM is a two-stage beamforming scheme. In the first stage, user $k$'s CCM $C_k$ ($k = 1, 2, \ldots, K$) is used to design the pre-beamforming matrix $B_k$ to reduce non-neighbor interference, so that for each $k$

$$h_j^{H} B_k = 0, \quad \forall j \in \bar{\Omega}_k. \qquad (3)$$

The effective channel matrix after the pre-beamforming stage is $H^{H} B$, where $B = [B_1, B_2, \ldots, B_K]$. By (3), the $k$-th row of $H^{H} B$ can be written as

$$h_k^{H} B = [0, \ldots, 0, \; h_k^{H} B_{\Omega_k}, \; 0, \ldots, 0], \qquad (4)$$

where $B_{\Omega_k} = [B_{k_l}, \ldots, B_k, \ldots, B_{k_u}]$ is defined as the matrix composed of the pre-beamforming matrices of user $k$ and its neighbors. Equation (4) indicates that $h_k^{H} B$ has a continuous sequence of $\mathrm{col}(B_{\Omega_k})$ nonzero values, where $\mathrm{col}(\cdot)$ refers to the number of columns. It should be noted that since the azimuth angles of users are sorted, when $\theta_k > \theta_j$, there must be $k_l \ge j_l$ and $k_u \ge j_u$, so the effective channel matrix $H^{H} B$ is a band matrix.
In the second stage of N-JSDM, to eliminate the interference from neighbors, $W = (\hat{H})^{\dagger} \Gamma$ is designed using the zero forcing criterion [34]. Here, $\hat{H}$ represents the estimate of the effective channel matrix $H^{H} B$, and $\Gamma = \mathrm{diag}(\gamma_1, \gamma_2, \ldots, \gamma_K)$ is used to normalize each column of $(\hat{H})^{\dagger}$. As a result, the SINR at user $k$ is given by [21]

$$\mathrm{SINR}_k = \frac{|h_k^{H} B w_k|^2}{\sum_{j \neq k} |h_k^{H} B w_j|^2 + \sigma^2},$$

where $w_k$ is the $k$-th column of $W$ and $\sigma^2$ is the noise power. It should be noted that the precoding matrix of N-JSDM is written as $V = BW$.
In more practical scenarios, users are randomly distributed. Conventional JSDM schemes divide users into $G$ groups, and when the users are distributed randomly, there is always common space between the signal spaces of adjacent groups. The signal space of the $g$-th group is denoted by $\mathrm{span}(H_g)$. To mitigate the inter-group interference, $\mathrm{span}(B_g)$ is made orthogonal to all the signal spaces $\mathrm{span}(H_j)$, $j \neq g$. This means that $\mathrm{span}(B_g)$ is orthogonal to all the overlapped signal space, and hence $\cup_{g=1,2,\ldots,G} \, \mathrm{span}(B_g)$ (i.e., $\mathrm{span}(B)$) is orthogonal to all the overlapped signal space. Consequently, $\mathrm{span}(H) \not\subset \mathrm{span}(B)$, resulting in a lower-dimensional utilized signal space $\mathrm{span}(B^{H} H)$ compared to the full signal space $\mathrm{span}(H)$, thereby decreasing the performance of JSDM.

Compared to conventional JSDM schemes, N-JSDM offers the following advantages. First, it achieves higher spectral efficiency. N-JSDM employs a neighbor grouping approach to further divide users into subgroups, eliminating the requirement for users in the same neighbor domain to share the same common subspace. This allows for the use of more refined precoding techniques to reduce interference, thereby improving the system's spectral efficiency. Second, N-JSDM exhibits better interference mitigation capabilities. When designing the pre-beamforming scheme, N-JSDM takes into account the mutual interference between subgroups. By optimizing the pre-beamforming matrix, interference between subgroups can be more effectively reduced, enhancing the system's interference mitigation performance. Therefore, N-JSDM is considered a more promising and feasible beamforming scheme.

User scheduling in N-JSDM
N-JSDM with user scheduling involves three stages. The first stage is user scheduling, which determines the azimuth angles of the scheduled users and the number of beams serving each user, denoted by $\theta_u$ and $g_u$, respectively. The second and third stages are pre-beamforming and multi-user precoding, which are used to obtain matrix $B$ and matrix $W$, respectively. In a more realistic scenario in which only a limited number of users are randomly distributed in the cell, it is more reasonable to dynamically allocate the number of beams serving each user. Therefore, in the user scheduling stage, we propose to allocate beams according to the interference between users and their neighbors. The idea of dynamic allocation is also extended to the pre-beamforming stage, where the obtained $\theta_u$ and $g_u$ of the scheduled users are used to design the pre-beamformer. Further details regarding the design of the pre-beamformer can be found in Sect. 5. The transformation process for addressing the user scheduling problem is outlined below.
Because there is no concept of a user group, the user scheduling approach in N-JSDM is fundamentally different from that of conventional JSDM. To address this, we propose a user scheduling algorithm that relies solely on user directional features. Specifically, we use two channel directional features [35]: azimuth angle and AS. Compared to the instantaneous CSI, these features are more stable [32,33] and easier to obtain.

Problem formulation
The objective of scheduling is to maximize the ESE of the system. Assuming a coherence block with $T_C$ symbols and pilot length $P$, the ESE of user $k$ can be expressed as

$$\mathrm{ESE}_k = \left(1 - \frac{P}{T_C}\right) \log_2\left(1 + \mathrm{SINR}_k\right).$$

Note that there is overlapping signal space between some users in the system, and such overlapping signal space represents the inter-user interference (IUI). In the OR channel model, the angle region of user $k$ is defined as

$$\Theta_k = [\theta_k - \Delta, \; \theta_k + \Delta].$$

Based on the angle region, we introduce the overlap angle (OA) to represent the degree of overlap between users. If there is an intersection between the angle regions of user $k$ and user $j$, i.e., $\Theta_k \cap \Theta_j \neq \emptyset$, the intersection is called an OA. The OA of user $k$ and user $j$ is defined as

$$A_{kj} = \begin{cases} |\Theta_k \cap \Theta_j|, & j \neq k, \\ 0, & j = k, \end{cases} \qquad j, k = 1, 2, \ldots, K,$$

where $|\Theta_k \cap \Theta_j|$ denotes the angular width of the intersection. These angles depict the interference among users. Since $\omega = 2\Delta$, the OA between user $k$ and user $j$ is nonzero if they are neighbors, and zero otherwise. By using the OA, we can construct an angle matrix $A$, where $A_{kj}$ is the $(k, j)$-th element of matrix $A$. The $k$-th row of the matrix $A$ can be written as

$$A_k = (0, \ldots, 0, \; A_{k\Omega_k}, \; 0, \ldots, 0), \qquad (9)$$

where $A_{k\Omega_k} = [A_{kk_l}, \ldots, A_{kk_u}]$ is composed of the OAs of user $k$ and its neighbors. From (9), it can be observed that $A_{k\Omega_k}$ has $|\Omega_k| - 1$ nonzero elements, where $|\Omega_k|$ refers to the number of elements in the index set $\Omega_k$. To distinguish the neighbors and non-neighbors, we introduce an unweighted matrix $\hat{A}$, whose $k$-th row can be written as

$$\hat{A}_k = (0, \ldots, 0, \; 1, 1, \ldots, 1, \; 0, \; 1, \ldots, 1, \; 0, \ldots, 0), \qquad (10)$$

in which the single 0 inside the run of ones is at the position of user $k$. Since the denominator of SINR contains the interference term, there is a strong correlation between IUI and SINR. By reducing interference among users, SINR increases. Furthermore, in practical systems, the length of pilot sequences is often limited. As a result, the problem of maximizing the ESE is transformed into minimizing interference while adhering to the constraint of maximum pilot length.

To describe the problem, we use $x_i$ to denote whether user $i$ is selected, i.e.,

$$x_i = \begin{cases} 1, & \text{selected}; \\ 0, & \text{not selected}. \end{cases} \qquad (11)$$

Then, the problem of minimizing the sum of OAs with pilot constraints is formulated as the 0-1 quadratic programming problem $P_1$, whose objective is the quadratic form $x^T A x$ with $x = [x_1, x_2, \ldots, x_K]^T$. In $P_1$, $U$ represents the number of scheduled users, $\beta_i$ denotes the weighting factor used to adjust the number of beams allocated to each user, $g_s$ refers to the average number of beams assigned to each user in the system after scheduling, and $P_C$ represents the maximum pilot constraint. For a more efficient system, we aim to allocate a total number of beams close to $M$ when the number of scheduled users is high. Conversely, when the number of scheduled users is low, increasing the number of beams allocated to each user beyond a certain point will not improve system performance. Therefore, an upper bound value $\xi$ is set for the number of beams assigned to each scheduled user. Based on these considerations, the value of $g_s$ is set to $\min(\xi, M/U)$. The specific design details of $\beta_i$ can be found in Sect. 4.

It is evident that the angle matrix $A$ serves as the coefficient matrix in the objective function of $P_1$. In convex quadratic programming, the Hessian matrix of the objective function is positive definite. In the case of $P_1$, the Hessian matrix of the objective function is $L = 2A$. If matrix $A$ is positive definite, then according to the properties of eigenvalues and the necessary and sufficient conditions for a positive definite matrix, matrix $L$ is also positive definite.
However, since all diagonal elements of matrix $A$ are 0, the leading principal minors of $A$ are not all positive (the first one is 0, and others may be negative). Therefore, matrix $A$ cannot be a positive definite matrix. To address this, we add a scalar matrix to matrix $A$, transforming it into a positive definite matrix $A_P$, which can be expressed as

$$A_P = A + \alpha I_K.$$

If $\alpha$ surpasses the absolute value of the minimum eigenvalue of matrix $A$, $A_P$ is positive definite [36]. For simplicity, we set $\alpha$ as the smallest positive integer that ensures the positive definiteness of matrix $A_P$. By replacing the coefficient matrix in the objective function of $P_1$ with matrix $A_P$, the optimization problem can be transformed into the matrix form $P_2$ with objective $x^T A_P x$, where $e = [1, 1, \ldots, 1]^T \in \mathbb{N}^{K \times 1}$, $\hat{A}_f$ is a matrix formed by setting the diagonal elements of matrix $\hat{A}$ to 1, $n_B = [\beta_1 g_s, \beta_2 g_s, \ldots, \beta_K g_s]^T$, and $\beta_k g_s$ is the average number of beams serving each user in the neighbor domain of user $k$. It is worth noting that $P_1$ and $P_2$ are equivalent, as they share the same optimal solution and their optimal values differ by the constant $\alpha \cdot U$ (because $x^T x = U$ for any selection of $U$ users).
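The diagonal shift can be checked on a small instance. The sketch below searches for the smallest positive integer shift by attempting a Cholesky factorization, and confirms that the quadratic objective changes by the shift times the number of selected users; the toy matrix, the tolerance `tol`, and the helper names are illustrative assumptions.

```python
import math


def is_positive_definite(mat, tol=1e-12):
    """Check positive definiteness of a real symmetric matrix via Cholesky."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = mat[i][j] - sum(low[i][t] * low[j][t] for t in range(j))
            if i == j:
                if acc <= tol:
                    return False  # non-positive pivot: not positive definite
                low[i][i] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    return True


def shift_to_positive_definite(A):
    """Return (alpha, A + alpha*I) for the smallest positive integer alpha
    that makes the shifted matrix positive definite."""
    n = len(A)
    alpha = 1
    while True:
        A_P = [[A[i][j] + (alpha if i == j else 0.0) for j in range(n)]
               for i in range(n)]
        if is_positive_definite(A_P):
            return alpha, A_P
        alpha += 1


def quad_form(mat, x):
    """Evaluate x^T mat x for a list-of-lists matrix and a 0/1 vector."""
    n = len(x)
    return sum(x[i] * mat[i][j] * x[j] for i in range(n) for j in range(n))


# Toy angle matrix: symmetric, zero diagonal, banded (values in degrees).
A = [[0.0, 3.0, 0.0],
     [3.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]
alpha, A_P = shift_to_positive_definite(A)
x = [1, 0, 1]  # two scheduled users, so U = 2
shift = quad_form(A_P, x) - quad_form(A, x)  # equals alpha * U
```

For this toy matrix the minimum eigenvalue is $-\sqrt{13} \approx -3.61$, so the search stops at a shift of 4, and the objective of the shifted problem exceeds the original one by $4 \cdot 2 = 8$.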

Remark
Matrix $A$ (or matrix $A_P$) is derived from two directional features of the users, namely the azimuth angle $\theta$ and the angular spread $\Delta$. As a result, the proposed user scheduling algorithm only requires these two directional features to perform the scheduling task.
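To make the remark concrete, the angle matrix can be built from the azimuth angles and the angular spread alone. This is a minimal sketch, assuming the OA of two distinct users is the width of the intersection of their angle regions of half-width equal to the angular spread, with zero diagonal; the function name `angle_matrices` and the sample angles are illustrative.

```python
def angle_matrices(thetas, delta):
    """Build the OA matrix A and the unweighted 0/1 matrix A_hat.

    The OA of two distinct users k and j is taken as the width of the
    intersection of [theta_k - delta, theta_k + delta] and
    [theta_j - delta, theta_j + delta]; diagonal entries are 0.
    """
    K = len(thetas)
    A = [[0.0] * K for _ in range(K)]
    A_hat = [[0] * K for _ in range(K)]
    for k in range(K):
        for j in range(K):
            if j == k:
                continue
            overlap = max(0.0, 2.0 * delta - abs(thetas[k] - thetas[j]))
            A[k][j] = overlap
            A_hat[k][j] = 1 if overlap > 0.0 else 0
    return A, A_hat


thetas = [10.0, 14.0, 17.0, 40.0, 43.0]  # sorted azimuth angles, degrees
delta = 2.5                               # angular spread, degrees
A, A_hat = angle_matrices(thetas, delta)
```

With sorted angles both matrices are symmetric and banded, so only the entries inside each neighbor domain need to be stored in practice.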

Linearization of 0-1 quadratic programming
In this section, we propose methods to determine $\beta_i$ and to solve the scheduling problem $P_2$.

Beam allocation based on overlap density
Before scheduling, only the azimuth angle $\theta$, the AS $\Delta$, and the NAS $\omega$ can be determined. It is crucial to note that the pre-beamforming matrix $B$ of N-JSDM is solved iteratively. This implies that the azimuth angles of the users must be known during the process of solving $B$, making it challenging to obtain matrix $B$ during the user scheduling process. Therefore, we propose a beam allocation method based on the overlap density of neighbor domains. The method is aimed at determining the number of columns of the pre-beamforming matrix $B_k$.
Note that when the local distribution of users is dense, it is advisable to use a small number of beams to serve these users and use more beams to serve other users.This approach is based on the fact that the number of beams in a particular angle area is limited, and it can not only reduce the pilot overhead but also enable the system to serve more users.
The overlap density of the neighbor domain $\Omega_k$ is used to describe the average degree of overlap between any two users in set $\Omega_k$ and can be calculated as

$$\rho_k = \frac{\frac{1}{2} \sum_{i \in \Omega_k} \sum_{j \in \Omega_k} A_{ij}}{2\Delta \cdot \dfrac{|\Omega_k|!}{2! \, (|\Omega_k| - 2)!}}, \qquad (15)$$

where the numerator represents the sum of OAs between users in $\Omega_k$, the coefficient $\frac{1}{2}$ is due to the real symmetry of the angle matrix $A$, and the factorial expression in the denominator is the combination number $\binom{|\Omega_k|}{2}$. Since the OA between any two users in set $\Omega_k$ lies in $(0, 2\Delta]$, the denominator of Eq. (15) represents the upper bound of the sum of OAs between users in $\Omega_k$, which is equal to the superposition of the maximum OAs of any two users in $\Omega_k$. It should be noted that the value range of $\rho_k$ is $(0, 1]$.

Next, $\rho_k$ is used to determine the average number of beams allocated to each user within the set $\Omega_k$. As explained in Sect. 3, the average number of beams serving each user in $\Omega_k$ is $\beta_k g_s$. Assuming that the value range of $\beta_k g_s$ is $[g_s - \tau, g_s + \tau]$, the expression of $\beta_k g_s$ is given by (16). As seen in (16), when some users in the system are densely distributed, i.e., the overlap density of their neighbor domains is high, the number of beams serving these users will decrease, and vice versa. The details of how to obtain $n_B$ are given in Algorithm 1.
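The overlap density and the resulting beam allocation can be sketched as follows. The density follows the description above (half the sum of OAs over the neighbor domain, divided by twice the angular spread times the number of user pairs). The mapping from density to beams in `average_beams` is an assumed linear map onto the interval from $g_s - \tau$ to $g_s + \tau$, chosen only for illustration; the paper's Eq. (16) may have a different form. The handling of a user without neighbors is also an assumption.

```python
from math import comb


def overlap_density(A, domain, delta):
    """Overlap density of one neighbor domain (list of user indices).

    Numerator: half the sum of OAs over ordered pairs in the domain.
    Denominator: 2*delta times the number of unordered pairs.
    """
    pairs = comb(len(domain), 2)
    if pairs == 0:
        return 0.0  # assumption: a user with no neighbors has zero density
    total = 0.5 * sum(A[i][j] for i in domain for j in domain)
    return total / (2.0 * delta * pairs)


def average_beams(rho, g_s, tau):
    """ASSUMED linear map from rho in [0, 1] onto [g_s - tau, g_s + tau].

    Higher overlap density gives fewer beams; Eq. (16) may differ.
    """
    return g_s + tau * (1.0 - 2.0 * rho)


delta = 2.5                   # angular spread, degrees
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]        # toy OA matrix, degrees
domain = [0, 1, 2]            # neighbor domain of the middle user
rho = overlap_density(A, domain, delta)      # 3 / (5 * 3) = 0.2
beams = average_beams(rho, g_s=4, tau=1)     # 4 + 1 * (1 - 0.4) = 4.6
```

Under this assumed map, a fully overlapped domain (density 1) receives $g_s - \tau$ beams per user on average and a sparsely overlapped one approaches $g_s + \tau$, matching the qualitative behaviour described in the text.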

Remark
The value of $\tau$ should not be too large, because when the overlap density of the neighbor domain $\Omega_k$ is small, the average number of beams serving these users will be close to $g_s + \tau$. This implies that the total number of beams serving users in this neighbor domain will increase by $\tau |\Omega_k|$. Additionally, it is essential to emphasize that while solving problem $P_2$ (which will later be transformed into problem $P_5$), we do not have knowledge of the exact number of beams serving each user, but only the average value in the neighbor domain.

Linearization
Note that the user scheduling problem in $P_2$ is a 0-1 quadratic programming problem whose computational complexity increases exponentially with the problem size. To solve $P_2$ with low computational complexity, we further transform it into a 0-1 mixed integer linear programming problem as follows. Consider the optimization problem $P_3$ with auxiliary variables $z \in \mathbb{R}^{K \times 1}$ and $s \in \mathbb{R}^{K \times 1}$.

Theorem 1 $P_2$ has an optimal solution $x^*$ if and only if there are $z^*$ and $s^*$ such that $(x^*, z^*, s^*)$ is an optimal solution to $P_3$, and $P_2$ and $P_3$ have the same optimal solution.

Proof
See Appendix A.

It can be observed that the constraint (17b) in $P_3$, $z^T x = 0$, is quadratic, so $P_3$ is not a linear programming problem. To further process $P_3$, we proceed as follows. From (17b), we can deduce that if $x_i = 1$, then $z_i$ must be 0, but if $x_i = 0$, then $z_i$ is not necessarily 0. Moreover, from (17a), we have $z \le A_P x$, implying an upper bound on $z$. Thus, we have $z \le A_P x \le \|A_P\|_{\infty} \cdot e$, where $\|A_P\|_{\infty} = \max_i \sum_{j=1}^{K} |a_{ij}|$ is the infinity norm of the matrix $A_P$. By letting $M_T = \|A_P\|_{\infty}$ and using $z \le M_T (e - x)$ to replace $z^T x = 0$, we can transform $P_3$ into problem $P_4$.

Theorem 2 $(x, z, s)$ is a feasible solution of $P_3$ if and only if $(x, z, s)$ is a feasible solution of $P_4$; $(x, z, s)$ is an optimal solution of $P_3$ if and only if $(x, z, s)$ is an optimal solution of $P_4$.

Proof
When $(x, z, s)$ is a feasible solution of $P_3$, obviously $(x, z, s)$ is a feasible solution of $P_4$. Conversely, assume that $P_4$ has a feasible solution $(x, z, s)$. Because $0 \le z \le M_T (e - x)$, when $x_i = 1$ there must be $z_i = 0$, while $x_i = 0$ only implies $z_i \le M_T$. Therefore, $z_i x_i = 0$ for every $i$, and we obtain $z^T x = 0$, indicating that $(x, z, s)$ is also a feasible solution of $P_3$. Similarly, it can be proven that $(x, z, s)$ is an optimal solution of $P_3$ if and only if $(x, z, s)$ is an optimal solution of $P_4$.
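The effect of the big-M replacement can be checked by enumeration on a toy instance. The sketch below assumes that the linearized problem minimizes $e^T s$ subject to $A_P x - z - s = 0$, $0 \le z \le M_T (e - x)$, and $s \ge 0$; this objective and the non-negativity of $s$ follow the standard linearization of 0-1 quadratic programs and are assumptions here, as is the toy matrix. For a fixed binary $x$, the inner minimization over $z$ and $s$ has a closed form, which the code uses instead of calling a solver.

```python
from itertools import combinations


def linearized_value(A_P, x, M_T):
    """Smallest e^T s for a fixed binary x under
    A_P x - z - s = 0, 0 <= z <= M_T (e - x), s >= 0.

    Each z_i is pushed to its upper limit, leaving s_i = (A_P x)_i - z_i.
    Requires non-negative entries in A_P so that A_P x >= 0.
    """
    K = len(x)
    y = [sum(A_P[i][j] * x[j] for j in range(K)) for i in range(K)]
    z = [min(y[i], M_T * (1 - x[i])) for i in range(K)]
    return sum(y[i] - z[i] for i in range(K))


def quadratic_value(A_P, x):
    """Evaluate the original objective x^T A_P x."""
    K = len(x)
    return sum(x[i] * A_P[i][j] * x[j] for i in range(K) for j in range(K))


A_P = [[4.0, 3.0, 0.0, 0.0],
       [3.0, 4.0, 2.0, 0.0],
       [0.0, 2.0, 4.0, 1.0],
       [0.0, 0.0, 1.0, 4.0]]
K, U = 4, 2
M_T = max(sum(abs(v) for v in row) for row in A_P)  # infinity norm of A_P

results = []
for chosen in combinations(range(K), U):
    x = [1 if i in chosen else 0 for i in range(K)]
    results.append((x, quadratic_value(A_P, x), linearized_value(A_P, x, M_T)))

best_quadratic = min(results, key=lambda r: r[1])
best_linearized = min(results, key=lambda r: r[2])
```

For every selection the two values coincide, because $z_i$ absorbs the whole of $(A_P x)_i$ whenever $x_i = 0$ and is forced to zero whenever $x_i = 1$, so $e^T s$ reduces to $x^T A_P x$.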

The algorithm to obtain scheduled users and beams
It is worth noting that the solution space dimension of $P_4$ is $3K$. This implies that if the number of original users in the system is large, the solution space dimension of $P_4$ will also be large. As the computational complexity grows with the size of the problem, $P_4$ still has a high complexity when the user scale is large. Thus, we simplify $P_4$ as follows. Since $A_P x - z - s = 0 \Leftrightarrow A_P x - s = z$ and $0 \le z \le M_T (e - x)$, the constraints (18a) and (18b) can be transformed into $0 \le A_P x - s \le M_T (e - x)$. Hence, $P_4$ can be transformed into problem $P_5$, in which $z$ is eliminated, (18a) and (18b) are replaced by $0 \le A_P x - s \le M_T (e - x)$, and the remaining constraints, including (19c) $e^T x = U$, are kept. $P_5$ is a mixed integer linear programming problem that can be solved using the branch and bound algorithm. Here, we implemented the branch and bound algorithm using the MOSEK optimization solver [37] in the CVX toolbox.

Remark
In practical scenarios, local users may be densely distributed, and/or the pilot requirements P C may be too strict, leading to situations where problem P 5 has no solution.In such cases, we choose to gradually reduce the number of scheduled users U until they can be effectively served.To achieve this, we reduce one user at a time, update g s , and then recalculate the solution of P 5 based on the updated conditions.
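The fallback procedure in this remark can be sketched as a loop around the solver. In the sketch below, exhaustive search stands in for branch and bound on a toy instance, and the pilot constraint is replaced by a HYPOTHETICAL stand-in (scheduled users in a neighbor domain times $g_s$ must not exceed $P_C$); the function names and all numerical values are illustrative assumptions.

```python
from itertools import combinations


def schedule(A, U, is_feasible):
    """Exhaustively pick U users minimizing x^T A x among feasible selections.

    Returns the chosen index tuple, or None when no selection is feasible.
    """
    best, best_cost = None, None
    for chosen in combinations(range(len(A)), U):
        if not is_feasible(chosen):
            continue
        cost = sum(A[i][j] for i in chosen for j in chosen)
        if best_cost is None or cost < best_cost:
            best, best_cost = chosen, cost
    return best


def schedule_with_fallback(A, U, M, xi, make_feasibility):
    """Reduce U one user at a time, updating g_s, until a solution exists."""
    while U >= 1:
        g_s = min(xi, M / U)  # average beams per scheduled user
        chosen = schedule(A, U, make_feasibility(g_s))
        if chosen is not None:
            return U, g_s, chosen
        U -= 1
    return 0, None, ()


A = [[0.0, 3.0, 1.0, 0.0],
     [3.0, 0.0, 2.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]]   # toy OA matrix
P_C = 4                       # toy pilot budget


def make_feasibility(g_s):
    # HYPOTHETICAL stand-in for the pilot constraint: for every scheduled
    # user, (scheduled users in its neighbor domain, itself included) * g_s
    # must not exceed P_C.
    def is_feasible(chosen):
        for k in chosen:
            in_domain = sum(1 for j in chosen if j == k or A[k][j] > 0.0)
            if in_domain * g_s > P_C:
                return False
        return True
    return is_feasible


U_eff, g_s, chosen = schedule_with_fallback(
    A, U=3, M=8, xi=4, make_feasibility=make_feasibility
)
```

In this toy case no selection of three users satisfies the stand-in constraint, so the loop drops to two users, recomputes $g_s$, and returns a pair of users that are not neighbors.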
Once $P_5$ has been solved, we can determine the average number of beams serving each user in each neighbor domain and the scheduled users. However, the exact number of beams serving each user remains unknown. To address this issue, we utilize a linear system of equations to calculate $g_k$. Firstly, we sort the scheduled users in ascending order based on their azimuth angle and obtain the angle matrix $A_S$ of the scheduled users. We set its diagonal elements to 1 and convert it into an unweighted matrix $\hat{A}_S$. Then, we sum the rows of the matrix and place the row sums on the diagonal of a matrix $D_S$. The form of matrix $D_S$ is as follows

$$D_S = \mathrm{diag}\left( \sum_{j=1}^{U} [\hat{A}_S]_{1j}, \; \sum_{j=1}^{U} [\hat{A}_S]_{2j}, \; \ldots, \; \sum_{j=1}^{U} [\hat{A}_S]_{Uj} \right).$$

We can also get the $\beta_k g_s$ ($k = 1, 2, \ldots, K$) corresponding to the remaining users and sort them in ascending order, i.e., $\check{n}_B = (\check{\beta}_1 g_s, \check{\beta}_2 g_s, \ldots, \check{\beta}_U g_s)^T$. Considering that some users are neighbors with each other but non-neighbors with other users, we take the average of $\check{\beta}_u$ for these neighbor users. The system of equations for solving $g_u$ ($u = 1, 2, \ldots, U$) is as follows

$$\hat{A}_S \, g = D_S \, \check{n}_B.$$

The solution of $g$ is $g = [g_1, g_2, \ldots, g_U]^T = (\hat{A}_S)^{\dagger} D_S \check{n}_B$. As the solution for $g_u$ may contain decimal values, we round it down, i.e., $g_u = \lfloor g_u \rfloor$, and set $g_u$ to 1 if it is less than 1. Please refer to Algorithm 2 for the details of solving $P_5$ and determining the number of beams.
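The last step, recovering an integer number of beams per scheduled user, can be sketched with NumPy. The code follows the description above (row sums on the diagonal, pseudo-inverse solution, rounding down, minimum of one beam); the small tolerance added before rounding and the toy inputs are assumptions made for this illustration.

```python
import numpy as np


def beams_per_user(A_hat_S, n_check_B):
    """Solve A_hat_S g = D_S n_check_B via the pseudo-inverse, then round down.

    A_hat_S: 0/1 neighbor matrix of the scheduled users with unit diagonal.
    n_check_B: average beams per user in each scheduled user's neighbor domain.
    """
    A_hat_S = np.asarray(A_hat_S, dtype=float)
    D_S = np.diag(A_hat_S.sum(axis=1))        # row sums on the diagonal
    g = np.linalg.pinv(A_hat_S) @ D_S @ np.asarray(n_check_B, dtype=float)
    g = np.floor(g + 1e-9)                    # tolerance against round-off
    return np.maximum(g, 1).astype(int)       # every user gets at least 1 beam


# Users 1 and 2 are mutual neighbors; user 3 has no neighbors.
A_hat_S = [[1, 1, 0],
           [1, 1, 0],
           [0, 0, 1]]
n_check_B = [3.0, 3.0, 5.0]
g = beams_per_user(A_hat_S, n_check_B)
```

The pseudo-inverse is needed because the unweighted matrix is singular whenever two scheduled users share exactly the same neighbor domain, as in this example.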

Algorithm 2 Acquisition of Scheduling users and the number of beams
A total of three benchmark algorithms are considered in this paper. It should be noted that user scheduling and pre-beamforming in the active channel sparsification method [22] are coupled, requiring the solution of a mixed integer linear programming problem for joint beam and user selection. However, because the optimization toolkit used there is not specified, it is not possible to determine its computational complexity. Therefore, we conduct a brief analysis and comparison of the computational complexity of the proposed algorithm and the other two benchmark algorithms. The user scheduling in conventional JSDM schemes consists of two stages: user grouping and intra-group user scheduling. In the user grouping stage, the computational complexity of the K-means user grouping method with chordal distance in [25] is $O(N_{it} K G (2M^3 + M^2))$, where $N_{it}$ is the default number of iterations. The computational complexity of the agglomerative user clustering method with chordal distance in [28] is $O\left(\frac{K(K-1)}{2} (2M^3 + M^2)\right)$. Since intra-group user scheduling is often coupled with beamforming design, it would be unfair to compare its computational complexity with that of our proposed algorithm. The computational complexity of the greedy intra-group user scheduling algorithm in [26] is $O(UK)$ after modifying the termination condition to scheduling $U$ users and ignoring the complexity of beamforming design. It can be observed that the user grouping stage is the main contributor to the complexity.
In contrast, the computational complexity of our proposed algorithm (Algorithm 2) depends on the complexity of two sub-processes: Algorithm 1 and optimization problem $P_5$. The computational complexity of Algorithm 1 is $O(K(K-1))$, where $K-1$ is the number of times the edge neighbors of each user are determined. Optimization problem $P_5$ is solved using the branch and bound method, with a computational complexity of $O(2^{2K})$, where $2K$ represents the problem scale. Therefore, the overall computational complexity of our proposed algorithm is $O(U(K(K-1) + 2^{2K}))$.

Discussion on proposed user scheduling algorithm
This scheme has three advantages. First, it proposes a beam allocation method that considers the overlap density in the neighboring domain, which guarantees that all scheduled users can be served. This addresses the problem that the pre-beamforming design method with constrained DTL in N-JSDM cannot be implemented when the local user distribution is dense. Second, the scheme takes into account the influence of the pilot. Third, the scheme is adaptive: if the optimization problem has no solution, the number of scheduled users is gradually reduced until they can be served. In practical scenarios, gradually reducing the number of scheduled users until they can be effectively served brings benefits such as reduced system load, improved user experience, and decreased interference levels [25].
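The adaptive behavior described above can be illustrated with a short sketch. The callback `solve`, which stands in for solving the scheduling problem for a given target number of users, and the toy feasibility rule are hypothetical and are used only to show the fallback loop.

```python
from typing import Callable, List, Optional, Sequence, Tuple

def schedule_adaptively(
    target_users: int,
    solve: Callable[[int], Optional[Sequence[int]]],
) -> Tuple[int, List[int]]:
    """Try to schedule `target_users`; if infeasible, reduce the target by one and retry."""
    for u in range(target_users, 0, -1):
        selected = solve(u)            # hypothetical solver; returns None if infeasible
        if selected is not None:
            return u, list(selected)   # actual number of scheduled users and their indices
    return 0, []                       # no user could be served

# Toy solver (assumption): the pilot budget only allows up to three users.
def toy_solver(u: int) -> Optional[Sequence[int]]:
    return list(range(u)) if u <= 3 else None

print(schedule_adaptively(5, toy_solver))
```

Here the returned count plays the role of the actual number of scheduled users, which may be smaller than the expected number when the pilot constraint is tight.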
Figures 1 and 2 illustrate two examples of user scheduling results in a macro-cell scenario. Figures 1a and 2a are drawn from the same initial distribution of users, as are Figs. 1b and 2b. In Fig. 1, the hollow diamond at the center represents the massive MIMO base station, and the large circle indicates the coverage area with a radius of 50 km. Other markers represent users, where hollow circles denote unscheduled users and solid circles denote scheduled users. In certain scenarios, the actual number of scheduled users, denoted as $U'$, may fall short of the expected number due to unfavorable initial user distributions and stringent pilot conditions (for example, Fig. 1b). In Fig. 2, we use bar graphs to illustrate the scheduling status of users. The vertical axis represents the average number of beams serving each user in their respective neighborhoods, while the horizontal axis represents the user indices. Unfilled bars indicate unscheduled users, while filled bars indicate scheduled users. It can be observed that when the system imposes a limited pilot length, the desired number of scheduled users cannot be achieved, resulting in $U' < U$. In such cases, the relationship between the average number of beams serving users in their neighborhoods and their scheduling status is not evident. A high average number of beams serving each user in the neighborhood can be attributed to two factors: low overlap density in user neighborhoods and users having fewer neighbors. According to the expression of the pilot (22), the pilot is related not only to the number of neighbors but also to the total number of beams serving users in their neighborhoods. Hence, even if the average number of beams per user is relatively small, a subset of users will still be scheduled to ensure the scheduling of $U'$ users within the limited pilot length. Users 16 and 36 in Fig. 2a and users 20 and 21 in Fig. 2b serve as examples of this scenario.
Figure 3 displays the ESEs under different values of $P_C$. As $P_C$ increases, the ESE initially increases and then levels off. This indicates that a small $P_C$ often leads to a failure in scheduling the expected number of users, resulting in performance degradation. Furthermore, it can be observed that a small value of parameter $\xi$ primarily helps to maintain a better level of ESE when $P_C$ is relatively small. Additionally, as shown in Fig. 3b, when the number of scheduled users is small, the value of $P_C$ at which the ESE levels off also decreases proportionally. From Fig. 3b, we can also observe that when the number of scheduled users is small and $P_C$ is large, the ESE for $\xi = 2$ is significantly lower than for other values. This discrepancy occurs because a larger $P_C$ generally leads to a higher likelihood of achieving the expected number of scheduled users. Given the condition $M/U > 2$, fewer beams are allocated to each user with $\xi = 2$ than with other values. This limitation restricts the number of columns of $\mathbf{B}$ to a value significantly smaller than the number of antennas $M$, resulting in a larger discrepancy between the column space of $\mathbf{B}$ and the column space of $\mathbf{C}$ than for other values. Consequently, the ESE is lower for $\xi = 2$. Therefore, the parameter $\xi$ should be set based on both $P_C$ and the number of scheduled users $U$ to optimize system performance.
Fig. 2 Average number of beams under different pilot constraints

Pre-beamformer design with dynamic beam configuration
We now present the dynamic pre-beamformer design for scheduled users, which differs from previous pre-beamformer designs in N-JSDM by considering the specific user distribution to dynamically configure the beams. The previous designs include the optimal design and the design method with constrained DTL [21]. While the optimal design achieves good performance with a large DTL, it is not suitable for pilot-constrained scenarios. To reduce the DTL, the constrained DTL design limits the number of columns of the pre-beamforming matrix for each user [21]. Specifically, the number of columns of the pre-beamforming matrix $\mathbf{B}$ is set to $\lfloor gK \rfloor$, where $g = M/K$ and $\lfloor \cdot \rfloor$ is the round-down operation. The number of columns in $\mathbf{B}_k$ is $\lfloor gk \rfloor - \lfloor g(k-1) \rfloor$.
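As a small illustration of the fixed column allocation in the constrained DTL design, the following sketch evaluates $\lfloor gk \rfloor - \lfloor g(k-1) \rfloor$ for every user. The function name is hypothetical, and integer arithmetic is used for the floor of $Mk/K$ to avoid floating-point rounding; the values $M = 64$ and $K = 36$ are the ones used later in the simulations.

```python
def constrained_dtl_columns(M: int, K: int) -> list:
    """Number of columns of B_k for k = 1..K under the constrained DTL design.

    With g = M / K, floor(g * k) equals (M * k) // K exactly in integer arithmetic.
    """
    return [(M * k) // K - (M * (k - 1)) // K for k in range(1, K + 1)]

cols = constrained_dtl_columns(M=64, K=36)
print(cols)
print(sum(cols))  # total number of columns of B, i.e. floor(g * K)
```

The allocation depends only on the user index, not on how the users are distributed, which is the limitation that motivates the dynamic configuration.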
However, the constrained DTL design method has a fixed number of columns for $\mathbf{B}_k$, which makes it unsuitable for scenarios with randomly distributed users. Therefore, we propose dynamically configuring the number of columns in $\mathbf{B}_k$. Notably, in scenarios with harsh pilot conditions, the optimal design may not meet the transmission requirement, while our proposed method can satisfy it. In the following, we describe how to implement this method using the obtained $\theta_u$ and $g_u$.
Assume that $\theta_u$ and $g_u$ of the scheduled users are given. For consistency of notation, we still use the subscript $k$ to denote the parameters related to user $k$ in this section. From Eq. (4), user $k$ only needs to feed back $\mathbf{h}_k^H \mathbf{B}_k$ to the BS. The feedback length $d_k$ equals the number of elements of $\mathbf{h}_k^H \mathbf{B}_k$, i.e., the number of columns of $\mathbf{B}_k$. The minimum DTL is $L = \max_k d_k$. In this work, we consider the case where the pilot length equals the minimum DTL.
In this paper, the pilot length is limited. Since the number of columns of matrix $\mathbf{B}_k$ is $g_k$, the index set of user $k$ and its neighbors has a linear relationship with the number of columns of $\mathbf{B}_k$, i.e., the number of neighbors of the user has a linear relationship with the number of pilots. The pilot length $P$ is given by Eq. (22). To mitigate the interference from the non-neighbors of user $k$, the pre-beamforming matrix $\mathbf{B}$ needs to be designed to satisfy Eq. (3). If user $i$ is a neighbor of user $k$, then user $k$ is also a neighbor of user $i$. Therefore, we can regard Eq. (3) as the problem of designing matrix $\mathbf{B}_k$ to satisfy
$$\mathbf{h}_i^H \mathbf{B}_k = \mathbf{0} \quad \text{for every non-neighbor } i \text{ of user } k \qquad (23)$$
for each $k$. According to the Karhunen-Loeve representation, we can express the channel vector of user $k$ as $\mathbf{h}_k = \mathbf{C}_k^{1/2} \mathbf{z}_k$, where $\mathbf{C}_k^{1/2}$ is a Hermitian matrix. Substituting this into Eq. (23), we obtain the equivalent form
$$\mathbf{z}_i^H \mathbf{C}_i^{1/2} \mathbf{B}_k = \mathbf{0}. \qquad (24)$$
During the pre-beamforming stage, only the CCMs $\mathbf{C}_k$ are available at the BS. Without knowledge of $\mathbf{z}_i$, Eq. (24) can be reformulated as
$$\mathbf{C}_i^{1/2} \mathbf{B}_k = \mathbf{0}. \qquad (25)$$
This implies $\mathrm{span}(\mathbf{B}_k) \subseteq \mathrm{span}^{\perp}(\mathbf{C}_i^{1/2})$ for each non-neighbor $i$ of user $k$. Based on Lemma 1 in [21], we have $\mathrm{span}(\mathbf{B}_k) \subseteq \mathrm{span}^{\perp}(\bar{\mathbf{C}}_k)$, where $\bar{\mathbf{C}}_k$ is the sum of the CCMs $\mathbf{C}_i$ over all non-neighbors $i$ of user $k$.
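The null-space condition above can be checked numerically with a small sketch. It assumes toy rank-one CCMs for two non-neighbors and a threshold `eps` for deciding which eigenvalues count as zero; the helper name `null_space_basis` and all numerical values are illustrative assumptions, not the design procedure of [21].

```python
import numpy as np

def null_space_basis(C_bar, eps=1e-8):
    """Orthonormal basis of the eigenvectors of Hermitian PSD C_bar with eigenvalue < eps."""
    w, V = np.linalg.eigh(C_bar)   # eigenvalues in ascending order
    return V[:, w < eps]

rng = np.random.default_rng(0)
M = 6
# Toy CCMs of two non-neighbors (assumption: rank one, C_i = a_i a_i^H).
a1 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
a2 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
C1 = np.outer(a1, a1.conj())
C2 = np.outer(a2, a2.conj())
C_bar = C1 + C2                    # summed CCM of the non-neighbors

N = null_space_basis(C_bar)        # basis of the orthogonal complement of span(C_bar)
B = N @ rng.standard_normal((N.shape[1], 2))  # any B_k built from N lies in that space

# Each non-neighbor CCM annihilates B_k, so their channels see no signal from B_k.
print(np.linalg.norm(C1 @ B), np.linalg.norm(C2 @ B))
```

Because the CCMs are positive semidefinite, lying in the null space of their sum is enough to lie in the null space of each one, which is the content of the lemma cited above.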
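To fully utilize the signal space and achieve high spectral efficiency, we design $\mathrm{span}(\mathbf{B})$ to be close to $\mathrm{span}(\mathbf{C})$. This is because the spectral efficiency of the system is maximized when $\mathbf{B}$ is designed to satisfy $S_C \cap \mathrm{span}(\mathbf{H}) \subseteq \mathrm{span}(\mathbf{B})$, where $S_C = \mathrm{span}^{\perp}(\bar{\mathbf{C}}_k)$ and $\mathrm{span}(\mathbf{H}) \subseteq \mathrm{span}(\mathbf{C}_k) \subseteq S_C$ [21]. The difference between two spaces is represented by the chordal distance [25], and the chordal distance between the spaces $\mathrm{span}(\mathbf{C})$ and $\mathrm{span}(\mathbf{B})$ is
$$D_C(\mathrm{span}(\mathbf{C}), \mathrm{span}(\mathbf{B})) = \left\| \mathbf{U}_C \mathbf{U}_C^H - \mathbf{U}_B \mathbf{U}_B^H \right\|_F^2,$$
where $\| \cdot \|_F$ is the Frobenius norm, and $\mathbf{U}_C$ and $\mathbf{U}_B$ are orthonormal bases of the spaces $\mathrm{span}(\mathbf{C})$ and $\mathrm{span}(\mathbf{B})$, respectively. To make $\mathrm{span}(\mathbf{B})$ approach $\mathrm{span}(\mathbf{C})$, the chordal distance between $\mathrm{span}(\mathbf{C})$ and $\mathrm{span}(\mathbf{B})$ should be minimized. The problem of designing $\mathbf{B}$ is formalized as problem $P_6$, which minimizes $D_C(\mathrm{span}(\mathbf{B}), \mathrm{span}(\mathbf{C}))$ subject to constraints (27a) and (27b). Constraint (27a) ensures that the effective channel matrix is a band matrix, with $\bar{\mathbf{C}}_k$ being the sum of the CCMs $\mathbf{C}_i$ over all non-neighbors $i$ of user $k$, and constraint (27b) ensures that the design of matrix $\mathbf{B}$ meets the pilot requirements. $P_6$ is solved iteratively using a greedy algorithm. First, the space $\mathrm{span}(\mathbf{C})$ is divided into $K$ subspaces, i.e., $S_k = \mathrm{span}(\mathbf{C}_k)$, $k = 1, 2, \ldots, K$. Then, the pre-beamforming matrix $\mathbf{B}_k$ is solved iteratively such that the chordal distance between $S_k$ and the space spanned by $\mathbf{B}_1, \ldots, \mathbf{B}_k$ is minimized. When the iteration is complete, $D_C(\mathrm{span}(\mathbf{B}), \mathrm{span}(\mathbf{C}))$ will be small. In the $k$-th iteration, $\mathbf{B}_k$ is designed to minimize $D_C(S_k, \mathrm{span}(\mathbf{G}_k))$ under the corresponding constraints, where $\mathbf{G}_k = [\mathbf{B}_1, \mathbf{B}_2, \ldots, \mathbf{B}_k]$. Setting the number of columns of the matrix $\mathbf{B}_k$ to the value $g_k$ obtained during the user scheduling stage ensures that the actual pilot length of the system is less than or equal to $P_C$. This follows from constraint (12b) of the user scheduling problem $P_1$.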
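Let $\mathbf{U}_{S_k}$ be the orthogonal basis of $S_k$. Since $\mathbf{G}_k$ is the orthogonal basis of $\mathrm{span}(\mathbf{G}_k)$, we have $D_C(S_k, \mathrm{span}(\mathbf{G}_k)) = \| \mathbf{U}_{S_k} \mathbf{U}_{S_k}^H - \mathbf{G}_k \mathbf{G}_k^H \|_F^2$. Taking into account the non-negativity property of the Frobenius norm, we now just focus on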

we have
Based on the property of trace, we can derive B = B� k U ε N, where B k is the orthogonal basis of the space span ⊥ (B � k), and U ε is the matrix composed of the eigenvectors corresponding to the eigenvalues of the matrix B k Rk less than ε. For detailed derivations, please refer to [21]. Unlike the design method with constrained DTL, N is a unitary matrix composed of the eigenvectors of
Figure 4 illustrates the chordal distance over the iterations. It should be noted that, given the dimensions of the two spaces (e.g., $N_1$ and $N_2$), a chordal distance of 0 indicates that the spaces are the same. When the chordal distance reaches its maximum value of $N_1 + N_2$, the spaces are orthogonal to each other. From Fig. 4, we can observe that in each iteration, the chordal distance between $\mathbf{B}_k$ and $\mathbf{C}_k$ remains small but nonzero. This is because $\mathbf{B}_k$ is designed not only to approximate $\mathbf{C}_k$, but also to lie in the null space of the CCM $\bar{\mathbf{C}}_k$. The chordal distance between $\mathrm{span}(\mathbf{B})$ and $\mathrm{span}(\mathbf{C})$ gradually increases with the number of iterations. This can be attributed to the increasing dimension of $\mathrm{span}(\mathbf{B})$ over the course of the iterations. Therefore, even though $\mathrm{span}(\mathbf{B})$ is designed to approach $\mathrm{span}(\mathbf{C})$ incrementally, their chordal distance still increases.
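The two limiting cases of the chordal distance mentioned above (identical spaces and mutually orthogonal spaces) can be verified with a short numerical sketch. It assumes the squared Frobenius-norm form of the chordal distance between projection matrices and obtains orthonormal bases with a reduced QR decomposition; the function name and the toy matrices are illustrative.

```python
import numpy as np

def chordal_distance(A, B):
    """Chordal distance between span(A) and span(B) for full-column-rank A and B."""
    Ua, _ = np.linalg.qr(A)        # reduced QR: columns of Ua are an orthonormal basis of span(A)
    Ub, _ = np.linalg.qr(B)
    Pa = Ua @ Ua.conj().T          # orthogonal projector onto span(A)
    Pb = Ub @ Ub.conj().T          # orthogonal projector onto span(B)
    return np.linalg.norm(Pa - Pb, "fro") ** 2

I4 = np.eye(4)
A = I4[:, :2]                      # span of the first two coordinate axes (N1 = 2)
B_orth = I4[:, 2:]                 # span of the last two axes, orthogonal to span(A) (N2 = 2)
A_mixed = A @ np.array([[1.0, 1.0], [0.0, 1.0]])  # different basis, same space as A

d_same = chordal_distance(A, A_mixed)
d_orth = chordal_distance(A, B_orth)
print(d_same, d_orth)
```

The distance depends only on the subspaces, not on the particular bases, which is why a change of basis within the same space leaves it at zero.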

Simulation results
In this section, we provide the simulation results of the proposed algorithm. A ULA with $M = 64$ antennas at the BS is considered, and $K = 36$ single-antenna users are served. The azimuth center angle of each user is uniformly distributed in $[-\frac{\pi}{3}, \frac{\pi}{3}]$ and the angular spread is $5^{\circ}$. For JSDM, the users are partitioned into $G$ groups, and $G$ is proportional to the number of users, i.e., $G = \lfloor K/6 \rfloor$. The value $\tau$ in Sect. 4 is set to 1. The parameters in the design of the pre-beamforming matrix $\mathbf{B}$ are consistent with those in [21].
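For readers who wish to reproduce the user drop, the setup above can be sketched as follows. Only the stated parameters are used; the function name `draw_users`, the seed, and the sorting of the angles are assumptions made for illustration, and the channel covariance model itself is not reproduced here.

```python
import math
import random

M = 64                          # BS antennas (ULA)
K = 36                          # single-antenna users
G = K // 6                      # number of JSDM groups, G = floor(K / 6)
ANGULAR_SPREAD = math.radians(5.0)

def draw_users(num_users: int, seed: int = 0) -> list:
    """Draw azimuth center angles uniformly in [-pi/3, pi/3], sorted in ascending order."""
    rng = random.Random(seed)   # the seed is an arbitrary choice for reproducibility
    return sorted(rng.uniform(-math.pi / 3, math.pi / 3) for _ in range(num_users))

angles = draw_users(K)
print(G, len(angles), min(angles), max(angles))
```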
The user scheduling in conventional JSDM consists of two parts: user grouping and intra-group user scheduling. The user grouping stage uses the K-means algorithm with chordal distance [25] and the agglomerative user clustering algorithm [28] for grouping users (where the effective channel dimension in the $g$-th group is $\lfloor M/G \rfloor$). The user scheduling stage employs the algorithm from [26]. Given the user grouping, the ESE $\eta$ of scheduled user $k$ in group $g$ is computed from $\mathrm{SINR}_{g,k}$, the SINR of user $k$ $(k = 1, 2, \ldots, K_g)$ in group $g$, and the overall ESE then follows from the per-user values. The expected number of active users to be scheduled each time is $U$. Figure 5 illustrates the ESEs of all algorithms under different SNRs. Our proposed algorithm (denoted by N-JSDM Mixed integer) is compared with two benchmark algorithms of conventional JSDM user scheduling (denoted by JSDM Agglomerative & Greedy and JSDM K-means & Greedy, respectively), the active channel sparsification method (denoted by ACS), and N-JSDM with random user scheduling (denoted by N-JSDM Random). The number of scheduled users in Fig. 5a, b is 30, while the number of scheduled users in Fig. 5c is 24. $T_C$ in Fig. 5a, c is 100, and $T_C$ in Fig. 5b is 50. All algorithms exhibit increasing ESE with higher SNR values. It can be seen that our proposed algorithm achieves higher ESE than the other algorithms. The performance difference between JSDM Agglomerative & Greedy and JSDM K-means & Greedy stems from their user grouping schemes: the agglomerative user clustering method does not depend on the initial choices of the cluster centers [28]. The ACS consistently exhibits lower ESE than the other algorithms. This is because, unlike the other algorithms, the ACS method approximates the downlink CCM of users using the columns of the discrete Fourier transform matrix. This approximation enlarges the energy of both the received signal and the interference, and the inter-user interference is directly proportional to the transmission power. When the transmission power is low, noise dominates over inter-user interference, and due to the large received signal power, the ACS method achieves a large SINR, resulting in a high ESE. However, as the transmission power increases, inter-user interference also increases, so the ESE no longer improves with increasing SNR. All algorithms except ACS exhibit similar performance at low SNR. This similarity arises because at lower SNR the impact of DTL on ESE is not significant, and ESE is primarily influenced by spectral efficiency. As the SNR increases, our algorithm achieves higher ESE by minimizing interference and considering spectral overhead. From Fig. 5, it can be observed that the performance gap between N-JSDM and JSDM widens as the number of users increases. This widening gap is attributed to the larger loss of signal space caused by JSDM grouping when the number of users is high, whereas N-JSDM, using the neighbor strategy, can fully leverage the signal space, resulting in more significant advantages. Since the ACS method is primarily designed for scenarios where the number of antennas tends to infinity, a detailed analysis of ACS performance is not included in the following simulations.
Figure 6 shows the ESEs of all algorithms under different values of $T_C$. The weight of spectral overhead in ESE varies with $T_C$. When $T_C$ is small, the influence of spectral overhead becomes significant since the DTL is positioned in the fractional numerator. As $T_C$ gradually increases, the influence of spectral overhead diminishes, and the significance of spectral efficiency becomes more pronounced. Consequently, the ESE exhibits a gradual upward trend.
Figure 7 depicts the ESEs of all algorithms under different numbers of scheduled users. Several noteworthy observations can be made. First, the ESEs of the N-JSDM algorithms increase as the number of scheduled users grows, albeit at a gradually slowing rate. This is because increasing the number of users can enhance spectral efficiency, but it also leads to an increase in interference between users. Second, when the number of scheduled users reaches a certain threshold, the performance of conventional JSDM schemes with user scheduling begins to decline. This indicates that as the number of users in the system becomes larger, the performance degradation caused by JSDM grouping becomes more pronounced. Third, due to the approximation used for the CCM, the performance of the ACS method is consistently lower than that of the other algorithms.
In addition, we have conducted further simulations to evaluate the performance of our proposed algorithm under extreme conditions. These simulations aim to assess the algorithm's robustness and its behavior in challenging scenarios, including scenarios with extremely low SNR and non-uniform user distributions. The performance of N-JSDM Random is not shown in the following, as random user scheduling is expected to perform worse than our proposed algorithm. Our focus is on the performance differences between the proposed algorithm and the other benchmark algorithms.
Figure 8 presents the ESEs of all algorithms under extremely low SNR conditions. It is evident from Fig. 8 that our proposed algorithm consistently achieves higher ESE than the other algorithms. This superiority stems from our algorithm's scheduling objective of minimizing system interference, which enables better reduction of inter-user interference in low SNR scenarios. The performance of JSDM Agglomerative & Greedy and JSDM K-means & Greedy is similar, as they both use the same scheduling criterion, namely maximizing SINR. The slight performance differences arise from their distinct user grouping methods. On the other hand, the ACS method exhibits the poorest performance due to the approximation employed for the CCM.
Figure 9 illustrates the ESEs of all algorithms under different user distributions, with a standard deviation of 20 for the normal distribution. Comparing it with Fig. 5a, it is evident that the performance of all algorithms declines significantly. This decrease in performance can be attributed to the extreme user distribution, which leads to densely populated local user clusters, making it challenging to achieve the desired number of scheduled users. Furthermore, the interference among the scheduled users is substantial, further contributing to the degradation in performance. To enhance visual clarity, we have omitted the curve for JSDM K-means & Greedy, which exhibits marginally lower performance than JSDM Agglomerative & Greedy.

Conclusion
We proposed a user scheduling method for massive MIMO systems that uses channel directional characteristics, together with a dynamic beam allocation method matched to the proposed user scheduling. Compared with schemes based on complete CSI, the two directional features used in this paper, i.e., the azimuth angle and the AS, are generally stable over large time scales. The proposed method schedules users using mixed integer programming, aiming to improve system performance. Simulations validated the superiority of the proposed method. In future work, we will extend our method to more channel models, such as the Saleh-Valenzuela geometric model and the multiple scatterer clusters model.
However, $(\mathbf{x}^*, \mathbf{z}^*, \mathbf{s}^*)$ is the optimal solution of $P_3$, so the result from the proof of necessity applies. Since $\mathbf{x}^T \mathbf{A}_P \mathbf{x} < \mathbf{x}^{*T} \mathbf{A}_P \mathbf{x}^*$, we obtain $\mathbf{e}^T \mathbf{s}^* > \mathbf{e}^T \mathbf{s}$, which contradicts the fact that $(\mathbf{x}^*, \mathbf{z}^*, \mathbf{s}^*)$ is the optimal solution of $P_3$. Hence, Theorem 1 is proved.

Fig. 6 ESEs of all algorithms under different values of $T_C$