An improved block diagonal precoding scheme for MIMO multicast channel with two users

Matrix theory plays an important role in precoding methodology for multiple input multiple output (MIMO) systems. In this paper, an improved block diagonal (BD) precoding scheme is proposed for a MIMO multicast channel with two users, where the unitary precoding matrix is constructed in a block-wise form by joint triangularization decomposition. To reduce the large signal-to-noise ratio (SNR) spread across different transmitted data streams and users, a combination of joint equi-diagonal triangularization (JET) and joint geometric mean decomposition (JGMD) is applied to submatrix construction in the inner process of this precoding scheme. An elaborate implementation is presented, and the existence condition of JGMD is investigated for two complex-valued matrices with two columns, where the analytical result reveals the connection with the particular channel realization and essentially determines when to consider JGMD for submatrix construction. In addition, the properties of the diagonal elements generated by joint triangularization decomposition are discussed, as well as the computational complexity of the proposed scheme. Simulation results indicate that, in general, JGMD is employed with high probability in the hybrid model, and the proposed scheme outperforms the JET scheme in terms of bit error rate (BER) performance in the moderate to high SNR regimes.


Introduction
Precoding techniques have attracted extensive research interest because they can achieve the ergodic capacity and improve link reliability for wireless communication over multiple input multiple output (MIMO) channels (see, e.g., [1] and the references therein). Under the assumption of full channel state information (CSI) at both the transmit side and the receive side, matrix theory, including several decomposition approaches, is often used as a basic tool to facilitate precoder design and performance analysis for different types of communication scenarios. For instance, singular value decomposition (SVD) is a prevailing way to construct the capacity-achieving precoder for a single-user scenario by diagonalizing the MIMO channel matrix. To avoid introducing rather complicated bit loading strategies at the transmit side, the precoding scheme in [2] using geometric mean decomposition (GMD) [3,4], and later the unitary channel decomposition (UCD) scheme [5] and the root-mean-square decomposition (RMSD) scheme [6], were successively proposed to achieve optimum bit error rate (BER) and channel throughput simultaneously.

*Correspondence: xdxu@ustc.edu.cn. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, Anhui 230027, People's Republic of China.
Another attractive precoding scheme was recently introduced in [7] and [8] for MIMO multicast channels. Relying on the so-called joint equi-diagonal triangularization (JET) or, equivalently, the joint unitary triangularization (JUT) with fixed diagonal ratio, the research effort therein focuses primarily on constructing one common unitary precoding matrix for two users by jointly triangularizing their channel matrices. However, the JET scheme usually results in subchannels with vastly different signal-to-noise ratios (SNRs) for every user, and the BER performance is dominated by the subchannels with the smallest gain on the diagonal, provided that the same modulation and coding scheme is used for all subchannels. Intuitively, one may conjecture that if there exists a joint GMD (JGMD) of two matrices, then each user can obtain multiple identical subchannels with equal SNRs, and the problem arising from JET can be eliminated. However, the existence condition of such a decomposition is available only for two 2 × 2 real-valued matrices [9]. To the authors' best knowledge, the existence condition of JGMD is still unknown for more general cases.

http://asp.eurasipjournals.com/content/2014/1/39
In this paper, we aim to alleviate the gain spread of the subchannels for each user and to improve the SNR of the worst subchannels, under the assumption that the same constellation is employed over all the subchannels and bit loading is ignored. Inspired by the successful application of GMD to multiuser MIMO broadcast scenarios [10], we propose an improved block diagonal (BD) precoding scheme for a MIMO multicast channel with two users based on the hybrid usage of JET and JGMD, which can obtain better BER performance than its JET counterpart. Given a prescribed dimension for every block, the idea behind the proposed precoding scheme is that each submatrix of the block-wise unitary precoder can be selectively constructed by JET or JGMD according to the particular channel realization. The main contributions are as follows:
1. The application of JGMD is considered, and the sufficient and necessary condition for the existence of JGMD is derived for the case of two complex-valued matrices with two columns. The analytical result reveals the connection with the two-antenna channel realization, so that JGMD can be conditionally used to perform submatrix construction for the proposed precoding scheme with more than two transmit antennas.
2. We reformulate and elaborately implement the BD precoding scheme by using JET as the only submatrix construction algorithm, namely the pure BD-JET scheme in this paper, which can be seen as supplemental work to [8]. New insight into the property of the diagonal elements of the upper triangular matrices is also given.
3. Since JET may introduce a large SNR spread across different transmitted data streams and users, the hybrid usage of JET and JGMD is proposed to construct submatrices in the inner process of the precoding scheme. The existence condition of JGMD is used to determine whether JGMD is employed or not.
The computational burden of the proposed precoding scheme is briefly discussed and compared with the JET scheme counterpart. Benefiting from the high probability of using JGMD instead of JET, the proposed scheme has the capability to guarantee a satisfactory BER performance for each user.
The remainder of this paper is structured as follows. Section 2 briefly describes the MIMO multicast channel model. In Section 3, the existence of JGMD is studied and the proposed scheme is presented based on the combination of JET and JGMD. Section 4 presents several numerical examples to validate the advantage of the proposed scheme over the JET scheme. Finally, Section 5 summarizes this paper.

Physical-layer multicasting system
Consider a MIMO wireless physical-layer multicasting system consisting of one base station and two users. The base station is equipped with $M$ transmit antennas, and the $i$th user is equipped with $N_i \ge M$, $i = 1, 2$, receive antennas. Assume that linear precoding is employed at the base station with a unitary precoding matrix $P \in \mathbb{C}^{M \times M}$, and a common message $s \in \mathbb{C}^{M}$ is transmitted through the multicast channel. At the receive side, the $i$th user receives

$$y_i = H_i P s + z_i, \quad i = 1, 2, \tag{1}$$

where $H_i \in \mathbb{C}^{N_i \times M}$ is the channel matrix with full column rank for the $i$th user and $z_i \in \mathbb{C}^{N_i}$ is the circularly symmetric Gaussian white noise vector at the $i$th user. If the base station knows the instantaneous channel state in advance, i.e., has full knowledge of the matrices $\{H_i\}_{i=1}^{2}$, JET is applicable to construct the unitary precoder $P$ by jointly decomposing $\{H_i\}$ as follows:

$$H_i = a_i U_i D_i P^{\dagger}, \quad i = 1, 2, \tag{2}$$

with $a_i = \sqrt[M]{\prod_{j=1}^{M} \sigma_{i,j}}$, where $\{\sigma_{i,j}\}_{j=1}^{M}$ are the singular values of $H_i$. $U_1$ and $U_2$ are unitary matrices of dimensions $N_1 \times N_1$ and $N_2 \times N_2$, respectively, $D_1$ and $D_2$ are generalized upper triangular matrices with equal diagonal elements, and $(\cdot)^{\dagger}$ denotes the conjugate transpose.
Suppose a zero-forcing VBLAST (ZF-VBLAST) detector [11] is used at the receive side. The $i$th user then obtains a nulling output by multiplying both sides of (1) by $U_i^{\dagger}$, i.e.,

$$\hat{y}_i = U_i^{\dagger} y_i = a_i D_i s + \hat{z}_i. \tag{3}$$

Ignoring the error propagation effect, a sequential signal detector using successive interference cancellation (SIC) obtains $M$ parallel scalar subchannels, which can be expressed as

$$\hat{y}_{i,j} = a_i d_{i,jj} s_j + \hat{z}_{i,j}, \quad j = 1, \ldots, M, \tag{4}$$

where $\hat{y}_{i,j}$, $s_j$, and $\hat{z}_{i,j}$ represent the $j$th entry of the vectors $\hat{y}_i$, $s$, and $\hat{z}_i$, respectively, and $d_{i,jj}$ denotes the $j$th diagonal entry of the upper triangular matrix $D_i$. Assume that the same modulation constellation is used in all the subchannels to reduce the system complexity, which is consistent with the HIPERLAN/2 and IEEE 802.11 standards. As a result, the overall BER performance of the JET scheme is restricted by the subchannels with the smallest gain. Notice that the JET precoding scheme adopts the unitary triangularization decomposition in (2) to jointly triangularize two matrices with one common unitary matrix. However, a large spread of the diagonal elements $\{d_{i,jj}\}_{j=1}^{M}$ can easily be observed even for ordinary channel conditions, which inevitably leads to BER performance degradation. Based on this observation, the motivation of this paper is to develop a joint triangularization decomposition approach that substantially reduces the spread or, equivalently, enlarges the smallest diagonal elements. Khina et al. [9] investigated this situation, but the sufficient and necessary condition of JGMD is still unknown, not only for the case of more than two users but also for the case of more than two transmit antennas, except for the special case of two real-valued 2 × 2 matrices.
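The nulling and successive interference cancellation steps described above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the authors' implementation: the triangular factor comes from a plain QR decomposition of the effective channel `H @ P` rather than from JET, the precoder is a placeholder identity matrix, and the names `zf_sic_detect` and `qpsk` are introduced here for illustration.

```python
import numpy as np

def zf_sic_detect(y, H_eff, constellation):
    """Detect s from y = H_eff @ s + z by QR nulling followed by SIC."""
    Q, R = np.linalg.qr(H_eff)              # reduced QR, R is M x M upper triangular
    y_hat = Q.conj().T @ y                  # nulling: y_hat = R @ s + Q^H z
    M = R.shape[1]
    s_hat = np.zeros(M, dtype=complex)
    for j in range(M - 1, -1, -1):          # detect the last stream first
        interference = R[j, j + 1:] @ s_hat[j + 1:]
        soft = (y_hat[j] - interference) / R[j, j]
        s_hat[j] = constellation[np.argmin(np.abs(constellation - soft))]
    return s_hat, np.abs(np.diag(R))        # hard decisions and subchannel gains

rng = np.random.default_rng(1)
M = N = 4
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
P = np.eye(M)                               # placeholder unitary precoder (assumption)
s = rng.choice(qpsk, size=M)
s_hat, gains = zf_sic_detect(H @ P @ s, H @ P, qpsk)   # noise-free sanity check
print(np.allclose(s_hat, s), gains)
```

The returned `gains` are the magnitudes of the diagonal of the triangular factor, so their spread is the quantity that the text identifies as limiting the BER when one constellation is used on all subchannels.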

The proposed precoding scheme
To meet the requirement of high-speed data transmission, practical multicasting systems are usually expected to involve a large number of transmit antennas at the base station. Therefore, the equivalent channel matrices generally have more than two columns. In particular, when the multicasting system encounters MIMO frequency-selective fading channels for all users, the dimensions of the equivalent channel matrices increase dramatically, which makes JET inefficient in at least two respects [12]: (1) the multiplication complexity equals $O(M^2)$ when a full-size unitary precoding matrix is used directly, which grows with $M$, and (2) the number of feedback bits required from the receiver to encode the precoding matrix also grows with $M$.
In the sequel, we construct the block diagonal precoder for multicast channels with more than two columns. In this way, the aforementioned drawbacks are mitigated to a great extent. Moreover, via matrix segmentation, it is possible to employ JGMD for reduced-dimensional submatrices to achieve better performance. Before that, let us first consider the existence of JGMD in the next subsection.

JGMD for complex-valued matrix case
For the case of $M = 2$, without loss of generality, denote the unitary matrix $P = [p_1 \; p_2]$ as in (5), where $p_i$ is the $i$th column vector of $P$ and $\theta \in [0, \pi/2)$, $\psi \in [0, 2\pi)$. Then the following lemma states the existence condition of JGMD for two complex-valued matrices.

Lemma 1. [...] (7) with $h_{i,jk}$ denoting the entry in the $j$th row and $k$th column of $H_i$.

Proof. For any unitary $P$, one can obtain the QR decomposition of $H_i P$ as follows:

$$H_i P = Q_i R_i. \tag{8}$$

In view of (5), the first column of both sides of (8) becomes $H_i p_1 = r_{i,11} q_{i,1}$, with $q_{i,1}$ the first column of $Q_i$ and $r_{i,11}$ denoting the first diagonal element of $R_i$. Subsequently, the squared form of $r_{i,11}$ is obtained by

$$r_{i,11}^2 = p_1^{\dagger} H_i^{\dagger} H_i p_1. \tag{9}$$

Resorting to the statement of Lemma 1 in [13], the right-hand side of (9) can be transformed into the matrix form $c + Gx$, while the left-hand side of (9) essentially represents the SNR of the first subchannel for the $i$th user. Therefore, if there exists a JGMD for the matrix sequence $\{H_i\}_{i=1}^{2}$, then both $r_{i,11}^2$ and $r_{i,22}^2$ should be equal to $\sigma_{i,1}\sigma_{i,2}$ from the determinant viewpoint, which means that (6) has solutions, and vice versa.

Remark 1. Lemma 1 reveals the relationship between the existence condition of JGMD and the solution of a constraint equation for two complex-valued matrices. It follows directly from this lemma that, if it exists, the solution of (6) has the form given in (10), where $g$ is chosen from the unit-length basis vectors in the null space of $G$, $\rho$ is a real value chosen to meet the norm constraint $\|x\| = 1$, and $(\cdot)^{\ddagger}$ denotes the pseudoinverse. The sufficient and necessary condition stated in Lemma 1 is therefore equivalent to the one in (11). When the solution $x$ is obtained, the corresponding angular parameters $\theta$ and $\psi$ can be uniquely acquired through the definition. In this case, the unitary matrix $P$ is finally obtained.

Remark 2. Consider the more general case of $K > 2$ users in one multicasting system, with extended versions $c \in \mathbb{R}^{K}$, $e \in \mathbb{R}^{K}$, and $G \in \mathbb{R}^{K \times 3}$, respectively. It is clear from (6) that, with probability 1, no proper solution $x \in \mathbb{R}^{3}$ exists. Therefore, we focus only on precoding for multicasting systems with two users. For complex-valued matrices with column dimension $M > 2$, the existence condition of JGMD is still an open issue.
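The existence test for two matrices with two columns can be illustrated numerically. The sketch below is not the paper's implementation and rests on assumptions introduced here: the first precoder column is parametrized as $p_1 = [\cos\theta, \ \sin\theta\, e^{j\psi}]^T$, which gives $\|H_i p_1\|^2 = c_i + g_i^T x$ with $x = [\cos 2\theta, \ \sin 2\theta \cos\psi, \ \sin 2\theta \sin\psi]^T$, $c_i$ half the trace of $H_i^{\dagger} H_i$, and the target value $e_i = \sigma_{i,1}\sigma_{i,2}$. A unit-norm solution exists when the minimum-norm solution of $Gx = e - c$ has norm at most 1. The function name `jgmd_2col` and the test matrices are hypothetical.

```python
import numpy as np

def jgmd_2col(H1, H2, tol=1e-10):
    """Return (exists, P) for two complex matrices with two columns each.

    Assumed parametrization (not taken from the paper):
    p1 = [cos t, sin t * exp(j psi)]^T, which gives
    ||H_i p1||^2 = c_i + G[i] @ x,  x = [cos 2t, sin 2t cos psi, sin 2t sin psi].
    """
    c, e, rows = [], [], []
    for H in (H1, H2):
        A = H.conj().T @ H                          # 2 x 2 Gram matrix
        a, d, b = A[0, 0].real, A[1, 1].real, A[0, 1]
        c.append((a + d) / 2)
        rows.append([(a - d) / 2, b.real, -b.imag])
        e.append(np.sqrt(max(np.linalg.det(A).real, 0.0)))   # sigma_1 * sigma_2
    c, e, G = np.array(c), np.array(e), np.array(rows)
    x0 = np.linalg.pinv(G) @ (e - c)                # minimum-norm solution
    if not np.allclose(G @ x0, e - c, atol=tol) or np.linalg.norm(x0) > 1 + tol:
        return False, None
    g = np.linalg.svd(G)[2][-1]                     # unit vector with G @ g = 0
    x = x0 + np.sqrt(max(1.0 - x0 @ x0, 0.0)) * g   # rho restores ||x|| = 1
    cos2t = np.clip(x[0] / np.linalg.norm(x), -1.0, 1.0)
    ct, st = np.sqrt((1 + cos2t) / 2), np.sqrt((1 - cos2t) / 2)
    phase = np.exp(1j * np.arctan2(x[2], x[1]))
    P = np.array([[ct, -st * np.conj(phase)], [st * phase, ct]])
    return True, P

rng = np.random.default_rng(0)

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Build a pair for which JGMD exists by construction: H_i = Q_i R_i P0^H,
# with R_i upper triangular and a constant diagonal (1.3 and 0.7).
P0, _ = np.linalg.qr(crandn(2, 2))
pair = []
for r in (1.3, 0.7):
    Q, _ = np.linalg.qr(crandn(5, 2))
    R = np.array([[r, crandn(1)[0]], [0.0, r]])
    pair.append(Q @ R @ P0.conj().T)

exists, P = jgmd_2col(*pair)
diags = [np.abs(np.diag(np.linalg.qr(H @ P)[1])) for H in pair]
print(exists, diags)
```

When `exists` is true, the QR factor of each `H @ P` has a constant-magnitude diagonal, which is the equal-gain property that motivates JGMD. The other sign of $\rho$ gives a second valid precoder.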

BD-JET decomposition with discussions about diagonal elements
With Lemma 1 in hand, we now turn to jointly decomposing two matrices into upper triangular forms with equal elements on each diagonal. Before proceeding with the design procedure, we first reformulate BD-JET carefully with a full implementation for completeness, since this is not explicitly described in [8]. The following lemma states the principle of this decomposition approach.

Lemma 2. Full and equal column rank matrices $\{H_i\}_{i=1}^{2}$ can be jointly triangularized into $B$ blocks as in (12), where [...]

Proof. Before starting the proof, we consider an equivalent structure of the regular-form JET in (2), which is referred to as the alternative-form JET in this paper. In other words, applying JET to $\{H_i\}_{i=1}^{2}$ results in two equivalent forms [...] with $H_{i,1} = Q_{i,1} T_{i,1} P_1^{\dagger}$, where for any matrix $X_i$, $X_{i,1}$ contains the first $b_1$ columns of $X_i$, while $X_{i,r}$ contains the rest of the columns. For the first block, the alternative-form JET via (13) can be used to manipulate $H_{i,1}$ as in (15). From (15), the exact solution of $P_1$, $T_{i,1}$, and $Q_{i,1}$ is obtained by setting $P_1 = V_{1,\mathrm{JET}}$, $T_{i,1} = \alpha_{i,1} D_{i,1}$, and $Q_{i,1} = U_{i,1}$, respectively, and the diagonal ratio between $T_{1,1}$ and $T_{2,1}$ is fixed at $\alpha_{1,1}/\alpha_{2,1}$. Due to the orthogonality between $Q_{i,1}$ and $Q_{i,r}$, the remainder of $H_i$ satisfies (16), which has the same decomposition pattern as that of $H_{i,1}$. Thus, we can continue the above operation recursively until all $B$ blocks have been factorized. Consequently, $Q_i$, $T_i$, and $P$ consist of the output submatrices from every iteration, and finally, (12) is reached.
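The recursive, block-by-block structure of this proof can be sketched as follows. This is a minimal illustration under assumptions made here, not the authors' procedure: the per-block unitary factor is supplied by a hypothetical callback `choose` (a random unitary matrix stands in for the JET or JGMD construction, which is not implemented), and the residual of each block is formed by projecting out the span of the earlier blocks.

```python
import numpy as np

def bd_triangularize(H_list, block_sizes, choose=None):
    """Block-wise joint triangularization H_i = Q_i T_i P^H with block-diagonal P."""
    M = sum(block_sizes)
    P = np.zeros((M, M), dtype=complex)
    Qs = [np.zeros((H.shape[0], 0), dtype=complex) for H in H_list]
    start = 0
    for b in block_sizes:
        cols = slice(start, start + b)
        # Residual of the current block after removing the span of earlier blocks.
        resid = [H[:, cols] - Q @ (Q.conj().T @ H[:, cols])
                 for H, Q in zip(H_list, Qs)]
        Pk = np.eye(b, dtype=complex) if choose is None else choose(*resid)
        P[cols, cols] = Pk
        # Orthonormal basis of the precoded residual extends each Q_i.
        Qs = [np.hstack([Q, np.linalg.qr(Rk @ Pk)[0]]) for Q, Rk in zip(Qs, resid)]
        start += b
    Ts = [Q.conj().T @ H @ P for Q, H in zip(Qs, H_list)]
    return Qs, Ts, P

rng = np.random.default_rng(2)

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def random_unitary(*resid):
    """Stand-in for the per-block JET/JGMD step (assumption for illustration)."""
    b = resid[0].shape[1]
    return np.linalg.qr(crandn(b, b))[0]

H_list = [crandn(6, 4), crandn(5, 4)]
Qs, Ts, P = bd_triangularize(H_list, [2, 2], choose=random_unitary)
print(np.round(np.abs(Ts[0]), 3))
```

Whatever unitary blocks the callback returns, each `T` comes out upper triangular and `Q @ T @ P^H` reproduces the channel, so the choice of the per-block factor only shapes the diagonal entries inside each block.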
Remark 3. Lemma 2 allows for a block diagonal decomposition approach, named BD-JET, to triangularize two matrices simultaneously. The proof is constructive and results in a detailed decomposition procedure, which is not pointed out in [8]. BD-JET can also be seen as a relaxed application of JET, since JET is employed in the inner iteration process of BD-JET. Moreover, if there is only one block, i.e., $B = 1$, then BD-JET reduces to JET, and if $B = M$, the unitary matrix $P$ degenerates to an identity matrix. However, there are at least two differences between BD-JET and JET. The first is that the unitary matrix $P$ may have a different structure in the two modes, although its dimension is the same: BD-JET yields a block diagonal matrix $P$ that contains several unitary submatrices. The second is that in JET mode, the ratio of the diagonal elements of $D_i$ is fixed, whereas in BD-JET mode, the ratio of the diagonal elements of $T_i$, as mentioned in the next remark, is equal within a block and in general not identical between blocks.
The following corollary describes the property of the $k$th block diagonal elements of $T_{i,k}$, defined as $t_{i,k,1}, t_{i,k,2}, \ldots, t_{i,k,b_k}$.

Corollary 1. [...]

Proof. For the first block, i.e., $k = 1$, it is apparent that [...]. As a consequence, the connection between the determinants of adjacent blocks reads [...]. In other words, the product of the $k$th block diagonal entries of the upper triangular matrices $T_{i,k}$ is characterized by (17).

Remark 4. From Corollary 1, the ratio of the $k$th block elements between the upper triangular matrices $\{T_i\}$ is given by $r_k = \sqrt[2b_k]{\dfrac{a_{1,k}\, a_{2,k-1}}{a_{1,k-1}\, a_{2,k}}}$. In fact, $r_k$ is inherently controlled by the determinants of the current block as well as the former one. As for JET, the ratio is kept at $r_{\mathrm{JET}} = \sqrt[2M]{\dfrac{a_{1,B}}{a_{2,B}}}$ for all the diagonal elements. For the case of $B = 1$, $r_1 = r_{\mathrm{JET}}$ holds trivially. However, as $B$ increases, this relationship is invalid in general unless $\{H_i\}$ have special matrix structures. In addition, we note that the ratio $r_k$ majorizes $r_{\mathrm{JET}}$ by majorization theory [14], and the numerical evaluations in the next section show that $T_i$ has a relatively smaller gain spread on the diagonal than $D_i$ with high probability, which leads to an improved smallest gain for the worst subchannels.
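The relation between the per-block ratios and the single JET ratio can be checked numerically. The sketch below assumes that $a_{i,k}$ denotes the determinant of the Gram matrix of the first $b_1 + \cdots + b_k$ columns of $H_i$, with $a_{i,0} = 1$; this reading is an assumption made here because the defining equation is not reproduced in the text, and the function name `block_ratios` is hypothetical. Under that reading the block ratios telescope, so that $\prod_k r_k^{2 b_k} = r_{\mathrm{JET}}^{2M}$.

```python
import numpy as np

def block_ratios(H1, H2, block_sizes):
    """Per-block diagonal ratios r_k and the single JET ratio r_JET.

    Assumption: a_{i,k} is the determinant of the Gram matrix of the first
    b_1 + ... + b_k columns of H_i, with a_{i,0} = 1.
    """
    def gram_dets(H):
        dets, n = [1.0], 0
        for b in block_sizes:
            n += b
            Hk = H[:, :n]
            dets.append(np.linalg.det(Hk.conj().T @ Hk).real)
        return dets

    a1, a2 = gram_dets(H1), gram_dets(H2)
    r = [((a1[k] * a2[k - 1]) / (a1[k - 1] * a2[k])) ** (1.0 / (2 * b))
         for k, b in enumerate(block_sizes, start=1)]
    M = sum(block_sizes)
    r_jet = (a1[-1] / a2[-1]) ** (1.0 / (2 * M))
    return np.array(r), r_jet

rng = np.random.default_rng(5)
H1 = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
H2 = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
sizes = [2, 2, 2, 2]
r, r_jet = block_ratios(H1, H2, sizes)
lhs = np.prod(r ** (2 * np.array(sizes)))   # product of r_k^(2 b_k)
rhs = r_jet ** (2 * sum(sizes))             # r_JET^(2M)
r_single, r_jet_single = block_ratios(H1, H2, [8])
print(r, r_jet)
```

With a single block the function returns `r_1` equal to `r_JET`, which matches the $B = 1$ case stated in the remark.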
Assuming $\hat{H}_{i,k}$ consists of the last $b_k$ columns of $H_{i,k}$, we can use the properties of the block matrix determinant to further obtain the following relationship: [...]

The proposed scheme
Although the BD-JET decomposition has no specific constraint on the size of submatrices, the use of JET may still produce a large gain spread across different subchannels in the decomposition procedure. In this subsection, we put forward an improved block diagonal precoding scheme for a multicasting system with two users, where the unitary precoding matrix is constructed based on a combination of JET and JGMD. In fact, JGMD acts as the major decomposition approach for every pair of submatrices, thanks to its high existence probability, which is validated by numerical evaluations in the next section. Algorithm 1 summarizes the details of the proposed scheme. [...] JET is automatically assigned for the joint triangularization decomposition of the last block in the current situation, which is omitted in Algorithm 1 for conciseness. Besides, in contrast with the proposed scheme, we hereafter refer to the aforementioned BD-JET scheme, which uses JET as the only algorithm for submatrix construction, as the pure BD-JET scheme.

Algorithm 1: The proposed scheme
Remark 6. For simplicity, let $N_1 = N_2 = N$ and let each block have equal size, i.e., $b_j = L$, $\forall j$. Then the computational complexity of the proposed scheme, in terms of the number of flops, is approximately given by (21). In contrast, the number of flops for JET is given by (22) for the same $\{H_i\}_{i=1}^{2}$. As can be seen from (21) and (22), the computational complexity of the proposed scheme is somewhat larger than that of its JET counterpart since, typically, $L < M$. Nevertheless, owing to the block diagonal decomposition, the resulting block diagonal precoder $P$ reduces the multiplication complexity from $O(M^2)$ to $O(M^2/B)$ in general.
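The saving in precoding multiplications can be made concrete with a small count. The sketch below, with the hypothetical helper `apply_block_precoder`, applies a block diagonal precoder block by block and counts one complex multiplication per matrix entry used; the random unitary blocks are placeholders, not the blocks produced by the proposed scheme.

```python
import numpy as np

def apply_block_precoder(blocks, s):
    """Compute P @ s for block-diagonal P and count complex multiplications."""
    out, mults, start = [], 0, 0
    for Pk in blocks:
        b = Pk.shape[0]
        out.append(Pk @ s[start:start + b])   # only the diagonal block is used
        mults += b * b
        start += b
    return np.concatenate(out), mults

rng = np.random.default_rng(4)
M, B = 8, 4
L = M // B
blocks = [np.linalg.qr(rng.standard_normal((L, L))
                       + 1j * rng.standard_normal((L, L)))[0] for _ in range(B)]
P_dense = np.zeros((M, M), dtype=complex)
for k, Pk in enumerate(blocks):
    P_dense[k * L:(k + 1) * L, k * L:(k + 1) * L] = Pk
s = rng.standard_normal(M) + 1j * rng.standard_normal(M)
x_bd, mults = apply_block_precoder(blocks, s)
print(mults, M * M)   # block-wise count versus dense count
```

For $M = 8$ and $B = 4$ the block-wise product needs $B L^2 = M^2 / B$ multiplications, compared with $M^2$ for a dense precoder of the same size.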

Simulation results
In this section, several numerical examples are presented to demonstrate the efficiency of the proposed scheme. A single-group multicasting system with two users is employed and, without loss of generality, the number of receive antennas is set equal for both users, i.e., $N_1 = N_2 = N$. Transmit symbols are chosen from the quadrature phase shift keying (QPSK) constellation throughout this paper.
Example 1. For the proposed scheme, it is worth noting that the existence probability of JGMD strongly affects the gain spread along the diagonal of the upper triangular matrices. Too low a probability would prevent the proposed scheme from having any advantage over the JET scheme. Theoretical analysis of this stochastic characteristic is very cumbersome owing to the complicated probability model. For this reason, we start by estimating the probability of interest by Monte Carlo trials. Assuming that $\{H_i\}_{i=1}^{2}$ are $N \times 2$ independent and identically distributed (i.i.d.) Rayleigh fading channels, Table 1 reports the existence probability of JGMD versus $N$, where it is clear that JGMD exists in most cases, especially when $N$ is large. Table 2 shows the probability that the smallest diagonal elements produced by the proposed scheme are greater than those of the pure BD-JET scheme. From this table, it appears that the proposed scheme always has a high probability of producing subchannels with better SNRs than the pure BD-JET scheme. Since JGMD is essentially unused for the case of $N = M \le 3$, which means that the proposed scheme is equivalent to the pure BD-JET scheme in this case, we do not consider this situation in the table. Assuming $N = M$, further evidence can be observed in Table 3, where the multicasting system experiences MIMO frequency-selective channels with $\mu$ fading paths.
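A Monte Carlo estimate of this kind can be sketched as follows. The code is an illustration under assumptions introduced here and does not reproduce Table 1: the existence test uses the parametrization $p_1 = [\cos\theta, \ \sin\theta\, e^{j\psi}]^T$ and declares that JGMD exists when the minimum-norm solution of the resulting two linear equations has norm at most 1; the function names, trial count, and seed are arbitrary choices.

```python
import numpy as np

def jgmd_exists(H1, H2):
    """Existence test for two complex matrices with two columns (assumed form)."""
    rows, rhs = [], []
    for H in (H1, H2):
        A = H.conj().T @ H                       # 2 x 2 Gram matrix
        a, d, b = A[0, 0].real, A[1, 1].real, A[0, 1]
        rows.append([(a - d) / 2, b.real, -b.imag])
        target = np.sqrt(max(np.linalg.det(A).real, 0.0))   # sigma_1 * sigma_2
        rhs.append(target - (a + d) / 2)
    # lstsq returns the minimum-norm solution of the underdetermined system.
    x0 = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    return bool(np.linalg.norm(x0) <= 1.0)

def estimate_probability(N, trials, rng):
    """Fraction of i.i.d. Rayleigh N x 2 channel pairs for which the test passes."""
    hits = 0
    for _ in range(trials):
        pair = (rng.standard_normal((2, N, 2))
                + 1j * rng.standard_normal((2, N, 2))) / np.sqrt(2)
        hits += jgmd_exists(pair[0], pair[1])
    return hits / trials

rng = np.random.default_rng(0)
probs = {N: estimate_probability(N, 2000, rng) for N in (2, 4, 8)}
print(probs)
```

The printed estimates depend on the seed and the trial count, so they should be read as rough indications rather than as values to compare digit by digit with the paper's table.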
Example 2. In this example, the BER performance of the proposed scheme is investigated for multicasting in MIMO frequency-flat fading channels, where the system parameters are set as $M = N = 8$, $B = 4$, and $b_j = 2$, $\forall j$. Figure 1 depicts the average BER of each user versus SNR. As can be seen from this figure, the two users have almost the same BER behavior, and the proposed scheme achieves superior performance over the JET scheme and the pure BD-JET scheme in the moderate to high SNR regimes. Owing to the unitary precoding, the proposed scheme is expected to have the same throughput as that of the JET scheme for the multicasting system. However, the smallest gain of the worst subchannels is improved by the proposed scheme at the cost of a small increase in computational complexity. The resulting performance gap reflects the effective application of JGMD as the major decomposition approach in the inner process of the proposed scheme.
Notice that the unitary precoder $P$ consists of four 2 × 2 unitary submatrices in this case. Therefore, the block diagonal structure of $P$ reduces the matrix multiplication complexity from $O(M^2)$ to $O(M^2/4)$ in comparison with the JET scheme.

Example 3. Another multicast scenario of interest is block transmission through MIMO frequency-selective fading channels with $\mu$ distinguishable paths. In this example, the system uses a zero-padded transmission mechanism, which means that the system transmits $\mu$ zero vectors after $\tau$ symbol vectors in every $\tau + \mu$ symbol duration. In this situation, the $(\tau + \mu)N \times \tau M$ equivalent channel matrices $\{H_i\}$ have a block Toeplitz structure.
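The block Toeplitz equivalent channel for zero-padded transmission can be assembled as in the sketch below. The helper name `zp_equivalent_channel` and the tap ordering (one $N \times M$ matrix per path, indexed by delay) are assumptions made for illustration; the only facts taken from the text are the padding of $\mu$ zero vectors after $\tau$ symbol vectors and the resulting $(\tau + \mu)N \times \tau M$ size.

```python
import numpy as np

def zp_equivalent_channel(taps, tau, n_pad):
    """Equivalent channel for zero-padded block transmission.

    taps  : list of N x M matrices, taps[l] is the path with delay l (assumed order)
    tau   : number of symbol vectors per block
    n_pad : number of zero vectors appended after each block
    """
    N, M = taps[0].shape
    if len(taps) - 1 > n_pad:
        raise ValueError("padding is shorter than the channel memory")
    H_eq = np.zeros(((tau + n_pad) * N, tau * M), dtype=complex)
    for t in range(tau):                       # index of the transmitted vector
        for l, Hl in enumerate(taps):          # path delay
            r = t + l                          # received block row hit by this path
            H_eq[r * N:(r + 1) * N, t * M:(t + 1) * M] = Hl
    return H_eq

rng = np.random.default_rng(3)
N = M = 2
mu, tau = 4, 4
taps = [(rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
        for _ in range(mu)]
H_eq = zp_equivalent_channel(taps, tau, n_pad=mu)
print(H_eq.shape)
```

Each block column of `H_eq` is the previous one shifted down by one block row, which is the block Toeplitz structure mentioned in the example. With one tap per path and $\mu$ padding vectors, the last block row stays zero in this sketch.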
Figure 2 illustrates the BER performance versus SNR for two users with $M = N = 2$, $\mu = 4$ effective paths, and a symbol block size of $\tau = 4$. In this case, the equivalent channel matrix has a dimension of 16 × 8. Similar behavior is observed: the proposed scheme performs reliably better than the JET scheme in the moderate to high SNR regimes. The pure BD-JET scheme is also examined and plotted in this figure. Compared with the pure BD-JET scheme, the proposed one still yields a performance improvement of about 1 dB in the high SNR case. The BER performance versus the symbol block size $\tau$ is shown in Figure 3 at SNR = 15 dB. It is clear from this figure that although the bandwidth efficiency increases as $\tau$ increases, the BER performance degrades considerably for the proposed scheme as well as for the JET scheme. Similar performance properties have also been reported for the BD-GMD scheme in [12] for unicast scenarios.

Conclusions
In this paper, we present an improved block diagonal precoding scheme to construct a unitary precoder for a MIMO multicast channel with two users by using joint triangularization decomposition. An elaborate implementation of the pure BD-JET scheme is provided by means of JET. Although the pure BD-JET scheme achieves relative advantages without an explicit constraint on the size of submatrices, the use of JET may suffer from a large SNR spread across different transmitted data streams and users. Based on the existence condition of JGMD for two transmit antennas, we introduce the hybrid application of JGMD and JET to construct submatrices in the inner process of the precoding scheme. The high existence probability of JGMD is confirmed by numerical results, so that in the proposed scheme, JGMD is observed to be the major decomposition algorithm for submatrix construction, as compared with the JET counterpart. Besides the analytical result on the computational complexity, properties of the diagonal elements are also derived to provide useful insights into the proposed scheme. Several numerical examples demonstrate that the proposed scheme can guarantee satisfactory BER performance over the JET