Optimum beamforming for MIMO multicasting

This paper investigates the transmit (Tx) beamforming design to maximize the throughput of a multiple-input multiple-output multicast channel, where common information is sent from the base station to K users simultaneously. This so-called max-min fair beamforming problem is known to be NP-hard. When the base station is equipped with two Tx antennas, we prove that the original complex-valued beamforming problem can be transformed into a real-valued problem and the globally optimal solution can be found by exhausting at most CK1+CK2+CK3 hypothesis tests. Moreover, a prune and search algorithm (PASA) is proposed for searching the optimal beamformer with computational complexity O(K3) in the worst case. When the base station has more than two Tx antennas, we develop an efficient algorithm named iterative two-dimensional optimization which converts the original beamforming problem into a series of two-antenna subproblems by iterations and hence, the beamformer is improved using PASA iteratively. Simulations results are presented to demonstrate the superior performance of the proposed algorithms.


Introduction
In the next generation of wireless networks, spectrally efficient multicasting techniques are required to support applications such as web TV, online gaming, and software updates, where common messages are sent to a group of users simultaneously. Under the assumption that the channel state information of all users is available at the multi-antenna base station (BS), transmit (Tx) beamforming can be used to improve the performance of multicasting. Consequently, the problem of multicast beamforming has received significant attentions recently [1][2][3][4][5].
In [1], Lopez formulated the multicast beamforming problem as maximizing the average signal-to-noise ratio (SNR) of all users, for which the optimal beamformer can be obtained by an eigendecomposition. However, this approach does not guarantee satisfactory performance for all users. In general, the performance of multicasting is determined by the user(s) with the lowest SNR. From this point of view, a practical criterion is to maximize the minimal SNR of all users. This optimization problem is referred to as max-min fair beamforming and is known http://asp.eurasipjournals.com/content/2013/1/121 vectors can be obtained by the semidefinite relaxation when the base station is equipped with two antennas. Yet, it cannot be applied to the multicast scenario we investigate here.
For the max-min fair beamforming problem, to the best of our knowledge, previous methods are not guaranteed to obtain the globally optimal beamforming vector when the base station serves arbitrary number of users. In addition, most of the methods are only applicable to the scenario of single-antenna users. In this paper, we intend to address the general case for max-min fair beamforming where the users are equipped with one or multiple receive antennas. The main contributions of this paper are listed below: (1) For the case that the BS has two Tx antennas, we derive that the feasible set of the SNR vector of all users is a two-dimensional ellipsoid in K -dimensional Euclidean space, where K is the number of users. With this geometrical property, we prove that the original complex-valued optimization problem can be simplified as a real-valued problem and the optimal beamforming vector can be found by exhausting C 1 K + C 2 K + C 3 K hypothesis tests of the bottleneck users, which are defined as the users with the lowest SNR. (2) In order to reduce the complexity of exhausting, we propose a prune and search algorithm (PASA) which is guaranteed to find the globally optimal beamformer. By analyzing the probabilities for three cases of bottleneck users, we prove that the worst-case computational complexity of PASA is O(K 3 ). It is showed that PASA is computationally more efficient than most of the existing schemes. (3) For the general case that the BS is equipped with more than two Tx antennas, we propose an iterative two-dimensional optimization (I2DO) algorithm which iteratively transforms the problem of beamformer design into a sequence of two-antenna subproblems and then PASA can be used to improve the beamformer at each iteration. When the number of users is large, the throughput achieved by the proposed beamformer has considerable improvement over the state-of-the-art multicasting techniques.
The remainder of this paper is organized as follows. In Section 2, we introduce the multiple-input multipleoutput (MIMO) multicast channel and formulate the transmit beamforming problem. In Section 3, we analyze the special case of two Tx antennas and deduce a new formulation for the beamformer design. Based on the new formulation, the PASA is proposed to obtain a globally optimal beamforming vector. For the case of more than two Tx antennas, we propose the I2DO algorithm in Section 4. Section 5 presents simulation results to verify the effectiveness of the proposed approaches. Finally, conclusions are drawn in Section 6.
Notations: We use uppercase and lowercase bold letters to denote matrices and vectors. The superscripts (·) T , (·) * , and (·) † stand for transpose, conjugate transpose, and pseudo-inverse, respectively. Re(·) and Im(·) mean the real part and imaginary part, respectively. · and · F denote the vector Euclidean norm and the Frobenius norm. I n represents an n × n identity matrix and 1 n is an all-one column vector with length n, while 0 n is an all-zero column vector with length n.

System model and problem statement
We consider a MIMO system consisting of one base station and K users, where the base station is equipped with M transmit antennas and the k-th user has N k receive antennas. The multicast scenario is investigated, that is, the base station broadcasts common messages to K users. Then, the received signal of the k-th user, i.e., y k ∈ C N k is whereH k ∈ C N k ×M is the channel between the base station and the k-th user, s ∈ C M is the transmit signal, and n k ∈ C N k is the additive complex Gaussian noise vector at the k-th user. We assume thatH k , k = 1, . . . , K are known to the base station by exploiting channel reciprocity or through a feedback channel. Moreover, we consider the block fading channel model, i.e., the multicast channel remains constant during the transmission block and changes from one block to another. Without loss of generality (w.l.o.g), we also assume that the additive noise follows the distribution: n k ∼ CN (0, I N k ) a . Let w ∈ C M , w 2 = 1 denote the beamforming vector. Then, the transmit signal is s = √ Pws, where s is the information symbol with zero mean and unit variance and P denotes the transmit power. Hence, the received signal at the k-th user can be rewritten as For the sake of clarity, H k = √ PH k can be referred to as the equivalent channel throughout this paper.
With transmit beamforming, the achievable rate of the k-th user is b To maximize the throughput of the multicast channel, which is the minimum achievable rate of all users, we have an optimization problem with respect to w max w∈C M min k=1,...,K log 2 w * H * k H k w + 1 subject to w 2 ≤ 1.
(4) http://asp.eurasipjournals.com/content/2013/1/121 Since the 'logarithm' function is monotonically increasing and the optimal beamforming vector w opt in Eq. (4) must satisfy w opt 2 = 1, Eq. (4) is equivalent to It is proven in [2] that the max-min fair beamforming problem (5) is non-convex and NP-hard in general. To solve this problem, we first analyze the special case of two transmit antennas before addressing the general cases.

Case of two Tx antennas
In this section, the special case that the base station has two Tx antennas is investigated, i.e., M = 2. The results obtained here will be used for the general case in Section 4. It is worth mentioning that the early forms of some lemmas in this section are formerly established in the conference version of this paper. For the sake of completeness, we recall the refined lemmas here.

Feasible set of the SNR vector
Defining γ (w) as the SNR vector of K users we have the following theorem.
Theorem 1 ([7]). For the case of M = 2, the feasible set of the SNR vector can be equivalently expressed as where and with a k , b k , and c k from the matrix Proof. The proof is omitted. Please see [7] for details.
For a given real matrix A, a hyper-ellipsoid is defined by {Ax : x 2 = 1} [8]. From Eq. (8), we can see that the feasible set of the SNR vector is a two-dimensional ellipsoid embedded in K-dimensional space with center z c .
Let h k = H k 2 F 2 and g k ∈ R 3 be the k-th column of matrix G T . With Theorem 1, problem (5) can be reformulated as As stated in the proof of Theorem 1, vector x in Eq. (12) has an one-to-one relationship with the beamforming vector w in Eq. (5) where θ ∈[0, π/2), φ ∈[0, 2π). Let (x opt , λ opt ) denote the optimal solution to Eq. (12). If (x opt , λ opt ) is given, then we can calculate θ opt and φ opt from Eq. (13a) and henceforth the optimal beamforming vector w opt follows in Eq. (13b). Due to the constraint x = 1, Eq. (12) is a non-convex problem [9]. Nevertheless, it is easier than problem (5) since Eq. (12) is an optimization problem in the real space. In the following, we develop a so-called prune and search algorithm (PASA) to find a globally optimal solution to Eq. (12).

Prune and search algorithm
To gain more insights of problem (12), firstly we derive the Fritz John necessary conditions (FJ conditions). Let f (λ) = −λ, g k (x, λ) = λ − h k − g T k x, k = 1, . . . , K, and h(x) = x T x − 1. Now, problem (12) can be written in the standard form According to the Fritz John necessary conditions [10], we have where μ 0 , ν, μ k , ∀k are scalars and they cannot be all equal to zero. For a feasible solution x, by condition (15a), we have x = 1 2ν K k=1 μ k g k . Let K ⊆ {1, · · · , K} denote the set of active http://asp.eurasipjournals.com/content/2013/1/121 constraints, i.e., g k (x, λ) = 0, k ∈ K. With Eq. (15b), we know that μ k = 0 for k / ∈ K. Hence, we can see that where ω k = μ k 2ν , k ∈ K are of the same sign. For the sake of clarity, we refer to the user(s) with the lowest SNR as bottleneck user(s). With this definition and the FJ conditions, we can see that the optimal solution to Eq. (12) must have form as x = k∈K ω k g k where {ω k } are of the same sign and K also denotes the set of bottleneck users.
In the lemma below, we show that the globally optimal solution to Eq. (12) can be found by a series of hypothesis tests.

Lemma 1 ([7]
). The globally optimal solution to Eq. (12) can be found through exhausting C 1 K + C 2 K + C 3 K hypothesis tests of the bottleneck users.
Proof. Defining {1, 2, . . . , L} as the indexes of the bottleneck users, we have To facilitate the analysis, we write Eq. (17) into the matrix form where Note that G a ∈ R L×4 , x a ∈ R 4×1 , and h a ∈ R L×1 . Since H k , k = 1, . . . , K are independent random fading channels, elements of g l and h l are random variables, and G a is full-rank with probability 1 c . For L > = 5, Eq. (18a) has no solutions in general; for L = 4, there is a single solution x a = G −1 a h a to Eq. (18a); however, Eq. (18b) is not fulfilled in general. If L ≤ 3, by letting the orthogonal basis in the null space of G a be N G a = [η 1 , η 2 , · · · ] , N G a ∈ C 4×(4−L) , we can write the general form of solutions to Eq. (18a) as where ρ ∈ C (4−L)×1 is an arbitrary vector. With the degree of freedom introduced by ρ, it is possible to find a solution which satisfies Eq. (18b) d . Since the number of bottleneck users is at most three, therefore, the globally optimal solution to Eq. (12) can be found by exhausting hypothesis tests of the bottleneck users.

Remark 1.
Even if the number of bottleneck users happens to be more than three, i.e., L > 3, albeit with zero probability, the optimal solution can still be calculated by hypothesis tests of three bottleneck users, since the solution to Eq. (17) is uniquely determined by any three of the L bottleneck users.
Although the optimal solution can be obtained by exhausting all possible combinations of bottleneck users, the computational complexity can be rather high especially when the number of users is large. In the following, we develop several lemmas to dramatically reduce the number of hypothesis tests before identifying the optimal solution. Lemma 2 (Sufficient condition for optimality [7]). Given a set of L ≤ min(3, K) constraints with indexes Then, we have where λ opt is an optimum solution to Eq. (12). Furthermore, if x 0 satisfies h k + g T k x 0 ≥ λ 0 for k = 1, . . . , K, then λ 0 = λ opt , and (x 0 , λ 0 ) is also an optimum solution to Eq. (12).
Proof. The proof is omitted. Please refer to [7] for details.
From Lemma 2, we can see that the optimal solution to Eq. (20) always yields an upper bound to problem (12). Letting UB denote the upper bound to problem (12), we have Besides, in the process of verifying the hypothesis tests, assuming we have tested N candidate solutions denoted as: {x n , x n 2 = 1, n = 1, . . . , N}, then we also have a lower bound of λ opt The lower bound LB and upper bound UB can be updated and become more and more tight as the search proceeds. With these bounds, most hypothesis tests can be pruned, and hence, the complexity of exhausting is http://asp.eurasipjournals.com/content/2013/1/121 reduced drastically. This is also the meaning of the prune and search algorithm.
In PASA, the hypothesis tests of one bottleneck user are checked firstly, then two bottleneck users and so on.
Lemma 3 (Case of one bottleneck user [7]). If there is only one bottleneck user in Eq. (12), then the bottleneck user must be the one with index and the optimum solution to Eq. (12) is Proof. Please see Appendix 1.
If we have found a globally optimum solution to Eq. (12) in the hypothesis tests of one bottleneck user, obviously there is no need to check the rest of hypothesis tests, and the PASA routine terminates. Otherwise, we turn to check the case of two bottleneck users.
If there are two bottleneck users in Eq. (12) with indexes i, j, then the optimum solution to Eq. (12) must be of form x = αg i +βg j with α, β being of the same sign according to the FJ conditions. Note that In the next, we show the process of calculating the candi- From the assumption of bottleneck users we have With Eq. (24), α, β can be calculated by Hence, x is Recalling the norm constraint x = 1, we have a quadratic equation with regard to λ and If b 2 − ac ≥ 0, from Eqs. (26), (27), and (28), the SNR of bottleneck users is Note that there are two solutions to λ, hence we need to verify both of them. With α, β in Eq. (25), the candidate solution is x = αg i + βg j .
Proof. Please see Appendix 2.
x always holds true for all unit-length vector x, then the i-th user must not be a bottleneck user since the SNR of the j-th user is always lower. Consequently, the i-th user can be eliminated from problem (12) without loss of optimality.
If there are three bottleneck users in Eq. (12) with indexes i, j, k, then the optimum solution to Eq. (12) must be of form x = αg i + βg j + γ g k with α, β, γ being of the same sign according to the FJ conditions. Note that Similarly from the assumption of three bottleneck users i, j, k, we have equations as below where With the norm constraint x = 1, we have Defining and (by a slight abuse of notation) we can rewrite Eq. (31) as If b 2 −ac < 0, it means that the SNR of user i, j, k cannot be equal, and the three users are surely not the real bottleneck users. If b 2 − ac ≥ 0, the solutions to Eq. (31) are and α, β, γ are given by Finally, the candidate solution is x = αg i + βg j + γ g k .
Proof. Please see Appendix 3.
Combining the calculation of candidate solutions with above lemmas and remarks, the full procedure of PASA follows straightforwardly and it is formerly proposed in [7]. To make this paper self-contained and facilitate the complexity evaluation of PASA, we include an improved version of PASA in Appendix 5 e . As we can see in the pseudo-code, the algorithm constantly updates the bounds UB and LB to prune the hypothetical combinations of bottleneck users and henceforth the name PASA.
Remark 3. The users with weak channels are more likely to be bottleneck users in general. So before the hypothesis tests, the users can be sorted by the strength of their channels, i.e., the Frobenius norm of channels (see Eq. (9)) and the users with weaker channels should be tested first.

Remark 4.
In the process of searching, the lower bound and upper bound are recorded in LB and UB. For a random generated multicast channel with K = 32 single-antenna users, we display the process of updating for LB, UB in Figure 1. We can see that the upper bound and lower bound are getting tighter along with the hypothesis tests. When the gap between LB and UB is less than a given threshold: UB − LB ≤ δ, we can conclude that the gap between the best solution we have found and the optimal solution to Eq. (12) is no greater than δ. If δ is small, we can early terminate the PASA searching procedure. This scheme, which can be referred to as the truncated PASA, offers a desirable tradeoff between performance and complexity.

Complexity evaluation of PASA
Let P 1 , P 2 , P 3 denote the probability for case of one bottleneck user, case of two bottleneck users, and case of three bottleneck users, respectively. W.l.o.g, we assume independent and identically distributed Rayleigh fading between the BS and users. When the BS is equipped with two antennas while all users are single-antenna users, we have derived the bounds for P 1 , P 2 , P 3 .
Theorem 2. For this multicast system, the probability for case of one bottleneck user is P 1 = 1 K ; the lower bound of P 2 is P L 2 = 2(K−1) and the upper bound of P 2 is P U 2 = 16K(K−1) (K+2) 3 ; consequently, P 3 can be bounded as Proof. Please see Appendix 4.
From Theorem 2, we can see that P 3 tends to be 1 as K increases. Therefore, the computational complexity of PASA depends only on the third case for asymptotically large K. Based on this observation, we can obtain a first-order estimation of the computational complexity of PASA.
With knowledge of bounds and form of feasible solution, most of combinations are pruned in PASA. In other words, most of hypothesis tests will be terminated before the division line marked in part III of PASA (Please see Appendix 5). Let P s denote the probability of combinations which survive after the division line. For a large scale of users, we count the probability P s in Table 1. It shows that P s ≈ 0 when K is large. For the hypothesis tests which are terminated before the division line (with probability close to 1), the complexity involves a constant number of vector multiplications. There are at most C 3 K hypothesis tests in the third part of PASA; therefore, the worst-case complexity of PASA is O(K 3 ).
Furthermore, when users are equipped with multiple receive antennas, some of the users can be dropped out http://asp.eurasipjournals.com/content/2013/1/121  Figure 2 shows the average number of dropped users when the users are equipped with multiple antennas. We can see that about half of users can be excluded; therefore, the complexity of PASA is substantially reduced.

Iterative two-dimension optimization for M > 2
When the base station is equipped with more than two transmit antennas, the max-min fair beamforming problem (5) becomes much more complicated due to the NPhardness. In this section, an I2DO algorithm is developed for the general case, i.e., M > 2.

The I2DO algorithm
For a problem that cannot be solved directly, a well-known approach is decomposing it into solvable subproblems. The main idea of I2DO is transforming the problem (5) into subproblems of M = 2. Considering the beamforming vector w is in the column space of matrix P ∈ C M×2 , P * P = I 2 , we can write w as We can see that Eq. (37) is exactly the max-min fair beamforming problem of two Tx antennas. As showed in Section 3, the optimal solution to Eq. (37) can be obtained efficiently by PASA. Based on above discussions, an iterative strategy can be adopted to handle the original problem (5). Let w n ∈ C M , w n = 1 denote the beamforming vector at the n-th iteration. The beamforming vector at the n + 1-th step is updated as where v n ∈ C M denotes the direction of updating which is orthogonal to w n and of unit-length. Note that u 1 , u 2 can be referred to as the complex-valued step size. After defining P = [w n , v n ] and u = [ u 1 , u 2 ] T , we can obtain the optimal step size by solving Eq. (37). In other words, we find the optimal beamforming vector on the plane spanned by w n and v n . With this scheme, w n+1 always outperforms w n , except that u 2 = 0. Hence, the objective function of Eq. (5) is monotonically increasing across iterations, and it shall converge to a stationary point. In the iterations, if min k γ k (w n+1 ) − min k γ k (w n ) is less than a preset threshold , then the I2DO algorithm can be terminated. Here, γ k (w) = w * H * k H k w denotes the SNR of the k-th user.
To accelerate the convergence of I2DO, the direction of updating v n should be carefully chosen. Firstly, we consider a rotation from w n to v n , thus the SNR of the k-th user can be expressed as a function of v n and the angle of rotation θ γ k (θ, v n ) = (cos θ w n + sin θ v n ) * H * k H k (cos θ w n + sin θv n ).
After differentiating γ k (θ, v n ) with respect to θ, we have If Re w * n H * k H k v n > 0, then a tiny rotation from w n to v n will increase the SNR of the k-th user.
Let B n ⊆ {1, · · · , K} denote the indexes of users whose SNRs are the lowest at the n-th iteration. To improve the SNR of these users, we can rotate w n to v n which satisfies For convenience, we turn it into real-valued form. Construct matrices W ∈ C 2M×2 and R ∈ R 2M×(|B n |) as

W =
Re{w T n } Im{w T n } −Im{w T n } Re{w T n } T R = [ r 1 , r 2 , · · · ] , http://asp.eurasipjournals.com/content/2013/1/121 where With above definitions and Eq. (40), we have where the vectorṽ n is a length-relaxed version of v n . Note that we choose Re{w * n H * k H kṽn } = 1, ∀k ∈ B n , w.l.o.g. It is clear that Eq. (41) are linear equations. Therefore, if |B n | ≤ 2M − 2,ṽ n can be obtained by pseudo-inverse, while the unit-length vector v n is v n =ṽ n / ṽ n . With this scheme, the convergence rate of I2DO is satisfactory.
In summary, the full procedure of the I2DO algorithm is summarized in Algorithm 1. Remark 5. In general, there is no need to choose the step size (u 1 , u 2 ) optimally [3,8]. Suboptimal algorithms such as the truncated PASA can be used here to obtain an approximate step size in Eq. (37).

Initialization of I2DO
In the iterations of I2DO, the minimum SNR is nondecreasing, and it will converge to a stable point. However, I2DO is not guaranteed to find the globally optimal beamformer, and it may get trapped into a local optimum. As a result, proper initialization of I2DO is of great significance [9].
In [11], Lozano's alternating gradient method is proposed which is quite simple and easily for implementation. It is further improved by choosing Lopez's initialization and adaptive step size in [12]. Hence, this modified Lozano's algorithm is called damped Lozano with Lopez's initialization (dLLI). Considering its low complexity and feasibility, dLLI can be employed to yield initialization points for I2DO. In order to obtain good performance, dLLI can be executed M times in parallel. Similar to Lopez's initialization, the M eigenvectors of k H * k H k can be used as the initialization points of dLLI. For each eigenvector of k H * k H k , we invoke dLLI and obtain an improved vectorŵ m . Finally, the best one is simply chosen to be the initialization point of I2DO

Implementation complexity of I2DO
The I2DO algorithm is comprised of the I2DO iterations and computation of initialization vector which requires calling dLLI M times. Let J 1 , J 2 denote the numbers of iterations in I2DO and dLLI. The worst-case computational complexity of the I2DO iterations is O (J 1 K 3 ), while the complexity involved in the generation of initialization vector is O(J 2 KM 2 ). So the entire complexity of I2DO is It is worthy to note that the actual complexity of I2DO can be further reduced if the truncated PASA is used instead of PASA. The worst-case complexity of the SDR-based scheme is O((K + M 2 ) 3.5 ), excluding the additional process of randomization [2]. The overall complexity of the GS-DL method proposed in [4] is O(K(M 3 + IKM)), where I denotes the number of iterations in local refinement. Therefore, the SDR-based scheme is less efficient than I2DO and GS-DL.

Simulation results
In this section, simulation results are presented to demonstrate the effectiveness of the proposed approaches: PASA and I2DO. The following results are based on 2, 000 Monte-Carlo trials, where independent and identically distributed Rayleigh fading channels between the base station and users are assumed. Besides, the transmit power is set as P = 1.

Performance of PASA
We consider that the base station is equipped with two transmit antennas and all users are single-antenna users, i.e., N k = 1, ∀k, since the GS-DL technique in [4] and the RCC2-SOR method proposed in [3] cannot handle the case of multi-antenna users. Figure 3 displays the achievable rates of different approaches with respect to the number of users. In the SDR-based scheme, the CVX [13] is used to solve the semidefinite programming problem. In the subsequent process of randomization, three randomization techniques: RandA, RandB, and RandC are used to generate 300 candidate vectors where 100 vectors are generated for each (see also [2]). We can see that both the SDR-based scheme and GS-DL perform quite close to PASA which yields the optimal beamformer. While for the RCC2-SOR method, it has about 0.05 bps/Hz performance loss compared to PASA. In Figure 4, the number of users is set to be K = 16, and an in-depth comparison between these methods is provided. The SNR loss of the SDR-based scheme and GS-DL compared to PASA is displayed in the form of cumulative http://asp.eurasipjournals.com/content/2013/1/121 distribution function (CDF). It is showed that the SNR loss of the SDR-based scheme and GS-DL are more than 1.5 dB in the worst case, even though their average achievable rates are close to the optimal value. Figure 5 gives a comparison of average runtime for different approaches, where all of them are implemented based on MATLAB. When the number of users is relatively small, for instance, K ≤ 12, the average runtime of PASA is less than those of the other three methods, and therefore, PASA is computationally more efficient. For the case of K > 12, RCC2-SOR is computationally faster than PASA; however, it is a suboptimal algorithm and may suffer from severe performance loss. Figure 6 shows the average rates achieved by the I2DO algorithm, the SDR-based scheme and GS-DL for M = 8.  For the SDR-based scheme, 3, 000 candidate vectors are generated by randomization, and the best one is chosen as an approximate solution. In the I2DO algorithm, numbers of iterations in I2DO and dLLI are J 1 = 20, J 2 = 100, respectively. When all users are equipped with single antenna (N k = 1), both I2DO and GS-DL have superior performance over the SDR-based scheme, and the gap between them is more than 0.25 bps/Hz for K ≥ 16. Moreover, I2DO also outperforms GS-DL especially when the number of users is large. In multipleantenna user scenario (N k = 2), GS-DL is not applicable anymore. Hence, we only compare the performance of I2DO and the SDR-based scheme. Similarly, we can see that I2DO has about 0.3 bps/Hz improvement over the latter.

Performance of I2DO
In the following scenario, the number of users is set as K = 36, where all of them are single-antenna users. The average achievable rates of different methods are  Figure 7. Note that 30MK candidate vectors are generated in the randomization process of SDR, since the dimensionality of beamforming problem gets larger as the number of transmit antennas increases [2]. Here, the multicast capacity is achieved by a multi-rank precoder [14]. Obviously, the multicast capacity is an upper bound of all achievable rates. From Figure 7, we can see that I2DO is superior to the SDR-based scheme and GS-DL. Moreover, the beamformer designed by I2DO can achieve a large majority of the multicast capacity, while the complexity of encoding and decoding is significantly reduced.
We further provide a detailed comparison for the scenario of M = 24, K = 36, N k = 1. The cumulative distribution functions of the multicast rate achieved by above three methods are displayed in Figure 8. Again, I2DO has the best performance and it can achieve almost twice the multicast rate of the SDR-based scheme. As shown in [15], the approximation accuracy of semidefinite relaxation is a decreasing function of the number of users. In other words, the performance of the SDR-based scheme degrades when the number of users increases.

Conclusions
For the well-known max-min fair beamforming problem, two efficient algorithms, PASA and I2DO, are proposed to handle the case of two Tx antennas and the general case of more than two Tx antennas, respectively. In the two-antenna case, PASA is guaranteed to obtain a globally optimal beamformer with worstcase complexity O(K 3 ). While in the general cases, I2DO can decompose the original beamforming problem into a series of two-antenna subproblems and iteratively improve the solution by PASA. The superior performance of the proposed algorithms is demonstrated Endnotes a If n k is colored noise and the covariance matrix is known, it can be pre-whitened at the receiver side. b The formula w * H * k H k w in Eq. (3) can be interpreted as the received SNR of the k-th user.
c It also means that every three of {g k } K k=1 are linearly independent.
d The detailed process of computing these solutions is demonstrated in the following three cases of bottleneck users. e There are some slight modifications in part II of PASA and the division line marked in part III of PASA is to be used for the complexity analysis.

Appendix 1 Proof of Lemma 3
In the case of one bottleneck user, w.l.o.g, we assume the i-th user is the bottleneck user. Hence, we have To maximize the SNR of the bottleneck user, the optimal solution to Eq. (12) must be x = g i / g i according to the Cauchy-Schwarz inequality. Besides, it yields an upper bound of λ opt : λ opt ≤ h i + g i by Lemma 2.
There are K possible cases of the bottleneck user, so we obtain K upper bounds of λ opt : h k + g k , k = 1, . . . , K.
We can see that only the index corresponding to the lowest upper bound is possible to be the bottleneck user. Therefore, instead of exhausting K hypothesis tests, we need only to verify if user i = arg min k h k + g k is the bottleneck user. http://asp.eurasipjournals.com/content/2013/1/121

Appendix 2 Proof of Lemma 4
In the case of two bottleneck users, w.l.o.g, we assume the i-th and j-th users are the bottleneck users. Then, we denote SNRs of the i-th and j-th users, respectively. According to the FJ necessary conditions, an optimal solution to Eq. (12) must have form as x = αg i + βg j , where α, β have the same sign. In particular, if α > 0, β > 0, it can be proven that x = αg i + βg j is the optimal solution to problem From geometrical perspective, we can see that the optimal solution to Eq. (43) must be a vector which is in the cone generated by g i and g j .
Assuming v is an arbitrary unit-length vector which is orthogonal to x, we define a rotation from x to v Taking the first derivative of γ i (θ), γ j (θ) with respect to θ, we have Then, v and x can be denoted as ρ is a scalar, and n is an unit-length vector in the null space of G T ij . Recalling that v and x are orthogonal to each other, we have x T v = αξ i + βξ j = 0. It can be divided in two cases: For case 1, it is impossible to improve the solution x as the rotation always decreases the SNR of bottleneck users. For case 2, we need to consider the second derivative of γ i (θ), γ j (θ) with respect to θ Consider the weighted sum of the second derivative We conclude that at least one of σ i , σ j is negative, w.l.o.g, we assume σ i < 0. Then, γ i (θ) is a concave function about θ and rotating x to v will decrease the SNR of the i-th user. Therefore, it is impossible to improve the solution x by rotating to v. From analysis of both cases, we can see that solution x = αg i + βg j , α > 0, β > 0 is the optimal solution to Eq. (43). According to Lemma 2, we know the correspond- is an upper bound of λ opt . If λ < UB, then λ is a tighter upper bound and we can update UB = λ. Besides, if the solution x satisfies h k + g T k x ≥ λ, k = 1, . . . , K, then from Lemma 2, it is also an optimal solution to Eq. (12).

Appendix 3 Proof of Lemma 5
In case of three bottleneck users, w.l.o.g, we assume the i-th user, j-th user, and k-th user are the bottleneck users. That is, γ i (x) = γ j (x) = γ k (x) < γ l (x), l ∈ {1, . . . , K}\{i, j, k}. According to the FJ necessary conditions, an optimal solution to Eq. (12) must have form as x = αg i + βg j + γ g k , where α, β, γ have the same sign. In particular, if α > 0, β > 0, γ > 0, then x = αg i +βg j +γ g k is the optimum solution to problem max x∈R 3 From geometrical perspective, it is easy to see that the optimal solution to Eq. (47) must be a vector in the cone generated by g i , g j , and g k .
Defining v as an arbitrary unit-length vector which is orthogonal to x, we consider the rotation from x to v and obtain γ i (θ), γ j (θ), γ k (θ) as defined in Eq. (44). Taking the first derivative of them with respect to θ, we have Recalling that v and x are orthogonal to each other, we have x T v = αξ i + βξ j + γ ξ k = 0. Note that ξ i , ξ j , ξ k cannot all equal to zero since g i , g j , g k are linearly independent. Consequently, at least one of ξ i , ξ j , ξ k is negative, and it is impossible to improve one of the SNRs without reducing another one. Now, we conclude that solution x = αg i + βg j + γ g k , α > 0, β > 0, γ > 0 is the optimal solution to Eq. (47). According to Lemma 2, the corresponding SNR λ = h i + is an upper bound of λ opt which is the optimal solution of the original problem (12). If λ < UB, then we obtain a tighter upper bound UB = λ. Also, if this solution x satisfies h k + g T k x ≥ λ, ∀k, then it is an optimal solution to Eq. (12) by Lemma 2.

Appendix 4 Proof of Theorem 2
When all users are single-antenna users, the channels can be denoted by In one bottleneck user case, w.l.o.g, we assume the i-th user is the bottleneck user. Then, the optimal beamforming vector should be w opt = h i / h i [3,5]. Hence, the SNR of the i-th user is γ i = h i 2 which is a chi-squared distributed random variable with four degrees of freedom [16], while the SNRs of other users are γ l = |h * l w opt | 2 , l ∈ {1, . . . , K}\{i}. Since the beamforming vector w opt is of unit length and is independent of h l ; therefore, h * l w opt is a complex Gaussian variable with zero mean and unit variance. Consequently, γ l is a chi-squared distributed random variable with two degrees of freedom. The probability density functions (pdf ) of γ i , γ l are given by [16] Finally, the probability of one bottleneck user case P 1 can be calculated by In two bottleneck user case, we assume that the i-th and j-th users are bottleneck users. Let γ ij denote the SNR of bottleneck users and γ l , l ∈ {1, . . . , K}\{i, j} denote the SNRs of other users. Similarly, the probability of two bottleneck user case P 2 can be expressed as To calculate P 2 , firstly we have to derive the closed form of γ ij and analyze its pdf.
Assuming h i < h j , then from the fact that both of them are bottleneck users, we have (see also [5]) Define H ij as and consider the QR decomposition of H ij H ij = QR, Q = [q 1 , q 2 ] , R = r 11 r 12 0 r 22 .
Then, h i , h j can also be expressed as h i = r 11 q 1 , h j = r 12 q 1 + r 22 q 2 .
To maximize the SNRs, φ must be φ = −∠r 12 . However, it is too complicate to derive the pdf of γ ij due to its complex expression. Instead of computing the precise expression of P 2 , we try to provide the bounds of P 2 . From the expression of γ ij , we can see that γ ij ≤ r 2 11 . Moreover, with Eq. (52), we have With above results, P 2 can be bounded as where P L 2 , P U 2 denote the lower bound and upper bound of P 2 , respectively.
As mentioned before, the beamforming vector w opt is only determined by the channels of bottleneck users and hence is independent of h l , l = i, j. Therefore, the SNRs of other users γ l = |h * l w opt | 2 still follow chi-squared distribution with two degrees of freedom and its pdf is f γ l (x) = e −x , l ∈ {1, . . . , K}\{i, j}.
To derive the lower bound and upper bound of P 2 , we turn to analyze the pdf of r 2 11 = h i 2 . Let υ denote the squared normalized inner product of h i and h j υ = |h * i h j | 2 h i 2 h j 2 , υ ∈ [ 0, 1] .
Then, Eq. (50) can also be expressed as Noting that υ follows uniform distribution [17,18], that is, Hence, the cumulative distribution function (cdf ) of h i 2 is where factor 2 before the integral denotes that there are two possible case of orders. After differentiation, we obtain the pdf of r 2 11 in the following Consequently, the pdf of r 2 11 2 is f r 2 11 2 (x) = 16x 2 e −4x . Now, the upper bound and lower bound of P 2 can be calculated by 3 . Since there are only three cases, we have P 1 + P 2 + P 3 = 1. From above results, it is straightforward to get the bounds corresponding to the probability of three bottleneck user case P L 3 ≤ P 3 ≤ P U 3 , where P L 3 = 1 − P 1 − P U 2 , P U 3 = 1 − P 1 − P L 2 are the lower bound and upper bound of P 3 , respectively.