
A limited feedback scheme for massive MIMO systems based on principal component analysis

Abstract

Massive multiple-input multiple-output (MIMO) is becoming a key technology for future 5G cellular networks. Channel feedback for massive MIMO is challenging due to the substantially increased dimension of the channel matrix. This motivates us to explore a novel feedback reduction scheme based on the theory of principal component analysis (PCA). The proposed PCA-based feedback scheme exploits the spatial correlation characteristics of the massive MIMO channel models, since the transmit antennas are deployed compactly at the base station (BS). In the proposed scheme, the mobile station (MS) generates a compression matrix by operating PCA on the channel state information (CSI) over a long-term period, and utilizes the compression matrix to compress the spatially correlated high-dimensional CSI into a low-dimensional representation. Then, the compressed low-dimensional CSI is fed back to the BS in a short-term period. In order to recover the high-dimensional CSI at the BS, the compression matrix is refreshed and fed back from MS to BS at every long-term period. The information distortion of the proposed scheme is also investigated and a closed-form expression for an upper bound to the normalized information distortion is derived. The overhead analysis and numerical results show that the proposed scheme can offer a worthwhile tradeoff between the system capacity performance and implementation complexity including the feedback overhead and codebook search complexity.

1 Introduction

The massive multiple-input multiple-output (MIMO) system which deploys large numbers of transmit antennas at the base station (BS) has been listed as one of the key techniques for fifth generation (5G) cellular networks [1]. The deployment of numerous antennas enables massive MIMO systems to achieve not only higher system capacity, but also higher spectrum and energy efficiency than conventional MIMO systems [2, 3].

The superior performance of massive MIMO systems relies on spatial multiplexing and low multi-user interference. As is the case for conventional MIMO systems, this in turn requires the BS to have perfect knowledge of the downlink channel state information (CSI) [4]. In a time division duplexing (TDD) system, channel reciprocity can be exploited to acquire the downlink CSI at the BS [5]. However, things become more challenging when the system operates in a frequency division duplexing (FDD) mode, where channel reciprocity no longer holds. A mobile station (MS) then needs to feed back the downlink CSI through a rate-limited uplink channel. The authors in [6] concluded that the required feedback rate per user must grow in proportion to the number of transmit antennas in order to obtain the full multiplexing gain. Feedback overhead therefore becomes a key challenge in massive MIMO systems.

Work on feedback overhead reduction for MIMO systems builds on the correlation structure of MIMO channels. Limited feedback techniques for correlated MIMO channels were designed in [7–9]. A modified Grassmannian line packing codebook was proposed in [7], and the authors in [8, 9] rotated the codebook for i.i.d. channels with a unitary matrix to obtain a codebook for correlated MIMO channels. A systematic codebook was designed for quantized beamforming in [10], implemented by maps that rotate and scale spherical caps on the Grassmannian manifold.

Furthermore, a codebook for uniform rectangular arrays (URA) in massive MIMO systems was designed in [11, 12]; it was derived as the Kronecker product of two uniform linear array (ULA) codebooks. The authors in [13] proposed a feedback framework for FDD massive MIMO systems that divides the coverage area into sub-sectors, where each sub-sector is formed by a set of narrow beams covering a pre-assigned area in azimuth and elevation. Non-coherent trellis-coded quantization and trellis-extended codebooks for massive MIMO systems were proposed in [14, 15], which exploited a Viterbi decoder for CSI quantization and a convolutional encoder for CSI reconstruction. A projection-based feedback compression was utilized to project the high-dimensional channel space into a lower-dimensional subspace [16]. However, [16] did not explain how to feed back the projection matrix.

Compressive sensing (CS)-based limited feedback schemes for massive MIMO were proposed to reduce the feedback overhead by exploiting the spatial correlation of the CSI [17–20]. The authors in [17] introduced CS to massive MIMO for limited feedback, providing the unique insight that strong spatial correlation is exhibited in massive closely-packed antenna arrays, so channel vectors can be represented in sparse form in the spatial-frequency domain. Subsequently, a compressed analog feedback strategy for spatially correlated massive MIMO channels was proposed in [18]. In contrast to the strategy in [18], the low-dimensional CSI was quantized with a codebook and the preferred index was fed back in [19] and [20].

The choice of orthogonal basis, which is intended for the sparse representation of the original signal, plays an important role in the recovery of the original high-dimensional signal at the BS. Two such constructions, the discrete cosine transform (DCT) and the Karhunen-Loeve transform (KLT), are usually employed [21]. If the channel correlation matrix is known at neither the MS nor the BS, the signal-independent DCT basis is the better option. On the one hand, because of its signal-independent nature, the DCT basis does not require the MS to inform the BS of the channel correlation matrix. On the other hand, this makes the DCT basis incapable of tracking real-time changes of the channel state, which has a negative effect on system capacity. In contrast, the KLT basis adapts excellently to CSI changes. Therefore, when the MS and the BS both know the instantaneous channel correlation matrix, the KLT basis provides the optimal sparse representation, which promises accurate recovery even if only a small number of measurements are available. Unfortunately, the signal-dependent nature of the KLT basis requires the MS to feed back the channel correlation matrix instantaneously [22], which can hardly be implemented in practical systems because of the heavy feedback overhead.

In this case, principal component analysis (PCA) can offer a tradeoff between system capacity and practical implementation [23, 24]. Compared with a DCT basis, PCA can be more adaptive to changes of the channel state, since PCA is signal-dependent [23]; this gives PCA better system capacity than a DCT basis. Compared with a KLT basis, PCA only requires the MS and the BS to know the channel correlation matrix on a long-term basis, so PCA reduces feedback overhead far more effectively than a KLT basis. What is more, the most attractive characteristic of PCA is that it is effective for dimensionality reduction of high-dimensional data whose elements are correlated [24]. Inspired by this, PCA has great potential for compressing high-dimensional CSI with strong spatial correlation to reduce feedback overhead in massive MIMO systems. To the best of our knowledge, no existing work has addressed a practical feedback scheme based on PCA.

This paper proposes a PCA-based feedback scheme for massive MIMO systems. In the proposed scheme, the MS utilizes a compression matrix, obtained by operating PCA on the CSI observed over a long-term period, to compress the spatially correlated high-dimensional CSI into a low-dimensional representation. After quantizing the low-dimensional CSI with a random vector quantization (RVQ) codebook, the index of the preferred codeword is fed back to the BS in each short-term period. In order to track the channel changes and enable the BS to recover the high-dimensional CSI, the MS refreshes and feeds back the compression matrix at every long-term period. Through the dimensionality reduction performed by PCA, the feedback overhead and codebook search complexity can both be reduced. The contributions of the paper are summarized as follows.

  • A PCA-based feedback scheme for FDD massive MIMO systems is proposed. The operation procedures at the BS and the MS are divided into two types, which are long-term period operations and short-term period operations. In more detail, the exact operation procedures both at the BS and the MS, as well as the derivation of the compression matrix at the MS, are presented. The distortion of the proposed scheme is analyzed. An upper bound to the normalized distortion is derived.

  • System performance comparisons of the PCA-based feedback scheme, the DCT-based CS scheme, and the KLT-based CS scheme are presented. The feedback overhead and the codebook search complexity are analyzed, and the system capacity performance is simulated. Taking the simulation results and the feedback overhead analysis together, we conclude that our proposed scheme achieves a compromise between system capacity and implementation complexity (feedback overhead and codebook search complexity).

The remainder of this paper is organized as follows. In Section 2, the massive MIMO system model is described. Section 3 first reviews the PCA method in Subsection 3.1 and then provides the details of the proposed scheme in Subsection 3.2. Moreover, the distortion of the proposed scheme is analyzed in Subsection 3.3. The feedback overhead and codebook search complexity comparison and the numerical results follow in Section 4 and Section 5, respectively. Finally, the conclusion of this paper is presented in Section 6.

Notation: Throughout this paper, upper and lower case boldface letters denote a matrix A and a vector a, respectively. We denote the transpose and the conjugate transpose of a matrix A or a vector a by A T (a T) and A H (a H), respectively. In addition, A −1 denotes the inverse of a square matrix A.

2 System model

We consider a downlink massive MIMO system, where there is a single cell, in which the BS equipped with N t antennas serves K single-antenna MSs.

2.1 Spatially correlated massive MIMO channel

A massive MIMO broadcast channel is modeled in this section. For simplicity, but without loss of generality, a large-scale uniform linear array (ULA) with an enormous number of compactly deployed antenna elements is assumed. Spatial correlation is exhibited in the massive MIMO channel model because of the insufficient inter-element spacing. Additionally, a poor scattering environment may also contribute to the spatial correlation. Different from previous works, which consider either insufficient inter-element spacing or a poor scattering environment but not both, this paper combines the well-known Kronecker correlation model [25] with the geometrical one-ring model [26, 27], so as to describe the spatial correlation properties of the massive MIMO channel more precisely. Since the MS is equipped with a single antenna, the channel between the k th MS and the BS is denoted by a 1×N t row vector h k (k=1,2,…,K). Based on the Kronecker correlation model, h k can be modeled as

$$ {\textbf{h}_{k}} = {\textbf{h}_{{\mathrm{one - ring}}}}\textbf{R}_{\text{Tx}}^{\frac{1}{2}}, $$
(1)

where \(\textbf {R}_{\text {Tx}}^{\frac {1}{2}}\) is the square root of the correlation matrix at the transmitter depicting the impact of insufficient inter-element spacing and h one−ring is derived from the one-ring model describing the spatial correlation caused by a scattering environment. Note that the correlation of the channel is time varying, due to the change of both the relative positions of scatterers and the correlation matrix at the transmitter.

In more detail, the entry in the u th row and v th column of R Tx (the correlation coefficient between the u th and the v th elements of the BS transmit antenna array) obeys the zeroth-order Bessel function of the first kind correlation model [18], that is

$$ {r_{uv}} = {J_{0}}\left({\frac{{2\pi {d_{uv}}}}{\lambda }} \right), $$
(2)

where d uv is the distance between the two antenna elements and λ denotes the carrier wavelength.
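As a concrete illustration, the correlation model (2) is straightforward to generate numerically. The sketch below is our own minimal example, not code from the paper: the antenna spacing and carrier wavelength are assumed values, and the function name is illustrative. It builds R Tx for a ULA and obtains the square root needed in (1) via eigendecomposition.

```python
import numpy as np
from scipy.special import j0  # zeroth-order Bessel function of the first kind

def transmit_correlation(n_t, spacing, wavelength):
    """Eq. (2): r_uv = J0(2*pi*d_uv/lambda), with d_uv = |u - v|*spacing for a ULA."""
    idx = np.arange(n_t)
    d_uv = np.abs(idx[:, None] - idx[None, :]) * spacing
    return j0(2.0 * np.pi * d_uv / wavelength)

# illustrative values: 64 antennas with lambda/3 inter-element spacing
R_tx = transmit_correlation(n_t=64, spacing=0.1, wavelength=0.3)

# square root R_Tx^(1/2) for Eq. (1) via eigendecomposition; J0 is a
# positive-definite function, so negative eigenvalues are only rounding noise
w, V = np.linalg.eigh(R_tx)
R_sqrt = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
```

Since J 0 is a positive-definite function, the sampled matrix is positive semidefinite up to rounding, and the clipped eigendecomposition yields a valid square root.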

As to the one-ring model, we assume that each MS is surrounded by Q scatterers, which are uniformly distributed on a circle of radius r, as shown in Fig. 1. Then h one−ring is given as follows [27],

$$ {\textbf{h}_{{\mathrm{one - ring}}}} \buildrel \Delta \over = \frac{1}{{\sqrt Q }}\sum\limits_{q = 1}^{Q} {{\textbf{h}_{kq}}}. $$
(3)
Fig. 1 Illustration of the one-ring model with Q scatterers uniformly distributed on a circle of radius r

In (3), h kq is the channel vector of the MS k over the q th scattering path, as given by

$$ {}{\textbf{h}_{kq}} \buildrel \Delta \over = \left[ {\sqrt {{\beta_{kq1}}} {e^{- j2\pi \frac{{{d_{kq1}} + r}}{\lambda }}}, \ldots,\sqrt {{\beta_{kq{N_{\mathrm{t}}}}}} {e^{- j2\pi \frac{{{d_{kq{N_{\mathrm{t}}}}} + r}}{\lambda }}}} \right] {e^{j{\varphi_{kq}}}}, $$
(4)

where d kqm is the distance between the q th scatterer of the k th MS and the m th (m=1,2…,N t) BS antenna, while d kqm +r denotes the path length from the k th MS to the m th antenna via the q th path. Also, \({{e^{j{\varphi _{kq}}}}}\phantom {\dot {i}\!}\) represents the random common phase resulting from either the random perturbations of the MS location or the phase shift due to the reflection of the scatterer, and β kqm denotes the path loss of the q th scattering path, which is modeled by

$$ {\beta_{kqm}} = \frac{\alpha }{{{{\left({{d_{kqm}} + r} \right)}^{\gamma} }}}, $$
(5)

where α is a constant and γ is the path loss exponent.
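A minimal sketch of the one-ring channel (3)–(5) could look as follows. This is our own illustration, not the paper's code: the 2-D geometry, the scatterer count, and the constants α, γ, and wavelength are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_ring_channel(ant_pos, ms_pos, n_scatter, ring_radius,
                     wavelength=0.3, alpha=1.0, gamma=2.0):
    """Sketch of Eqs. (3)-(5): average of Q per-path vectors h_kq, each with
    per-antenna path loss beta_kqm and a phase set by the path length d_kqm + r."""
    n_t = ant_pos.shape[0]
    angles = rng.uniform(0.0, 2.0 * np.pi, n_scatter)  # scatterers uniform on ring
    scat = ms_pos + ring_radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    h = np.zeros(n_t, dtype=complex)
    for q in range(n_scatter):
        d = np.linalg.norm(ant_pos - scat[q], axis=1)        # d_kqm for all antennas
        beta = alpha / (d + ring_radius) ** gamma            # path loss, Eq. (5)
        phase = np.exp(-2j * np.pi * (d + ring_radius) / wavelength)
        common = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi))  # random common phase
        h += np.sqrt(beta) * phase * common                  # one term of Eq. (4)
    return h / np.sqrt(n_scatter)                            # Eq. (3)

ant_pos = np.stack([0.1 * np.arange(32), np.zeros(32)], axis=1)  # ULA on the x-axis
h = one_ring_channel(ant_pos, ms_pos=np.array([0.0, 100.0]),
                     n_scatter=20, ring_radius=5.0)
```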

2.2 Downlink signal model

In the downlink transmission, \({{s_{k}} \in \mathbb {C} }\) and \({{\textbf {w}_{k}} \in {\mathbb {C}^{{N_{\mathrm {t}}} \times 1}}}\) denote the transmit signal with power constraint \({{\mathbb {E}}{\left | {{s_{k}}} \right |^{2}} = 1}\) and the column precoding vector intended for the k th MS, respectively. In this paper, zero-forcing precoding is adopted to eliminate multiuser interference [28]. Also, let n k be additive Gaussian noise with zero mean and unit variance at the MS k. Then the received signal of the k th MS can be expressed as

$$ {y_{k}} = \underbrace {\sqrt {\frac{{{P_{\mathrm{t}}}}}{K}} {\textbf{h}_{k}}{\textbf{w}_{k}}{s_{k}}}_{{\mathrm{desired~signal}}} + \underbrace {\sum\limits_{k' \ne k} {\sqrt {\frac{{{P_{\mathrm{t}}}}}{K}} {\textbf{h}_{k}}{\textbf{w}_{k'}}{s_{k'}}} + {n_{k}}}_{{\mathrm{interfering~signal~and~noise}}}, $$
(6)

where P t is the total transmit power of the BS. Equal power allocation is assumed with \({\frac {{{P_{\mathrm {t}}}}}{K}}\) being the power distributed to each MS.

As seen in (6), y k contains two main terms. The first term is the desired signal, while the other is the interfering signal and noise. From (6), we can derive the system capacity as

$$ \mathrm{C} = \sum\limits_{k = 1}^{K} {{{\log }_{2}}\left({1 + \frac{{\frac{{{P_{t}}}}{K}{{\left| {{\textbf{h}_{k}}{\textbf{w}_{k}}} \right|}^{2}}}}{{\frac{{{P_{t}}}}{K}\sum\limits_{k' \ne k} {{{\left| {{\textbf{h}_{k}}{\textbf{w}_{k'}}} \right|}^{2}} + 1} }}} \right)}. $$
(7)
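To make the signal model concrete, the following sketch (our own minimal implementation, with an i.i.d. Rayleigh test channel standing in for the correlated model above) computes the zero-forcing precoders and evaluates the sum capacity (7) under equal power allocation and unit noise variance.

```python
import numpy as np

rng = np.random.default_rng(1)

def zf_sum_capacity(H, p_total):
    """Sum capacity (7) under zero-forcing precoding and equal power allocation.
    H is the K x N_t channel matrix whose k-th row is h_k; noise variance is 1."""
    K = H.shape[0]
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T)  # unnormalized ZF: H @ W = I
    W /= np.linalg.norm(W, axis=0, keepdims=True)   # unit-norm precoding vectors w_k
    G = np.abs(H @ W) ** 2                          # G[k, k'] = |h_k w_k'|^2
    p = p_total / K
    signal = p * np.diag(G)
    interference = p * (G.sum(axis=1) - np.diag(G))  # ~0 for perfect-CSI ZF
    return float(np.sum(np.log2(1.0 + signal / (interference + 1.0))))

# i.i.d. Rayleigh test channel: K = 4 users, N_t = 16 antennas
H = (rng.standard_normal((4, 16)) + 1j * rng.standard_normal((4, 16))) / np.sqrt(2.0)
C = zf_sum_capacity(H, p_total=10.0)
```

With perfect CSI, the zero-forcing interference term in (7) vanishes; the residual interference in the code is only numerical noise.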

3 Feedback scheme for massive MIMO

3.1 Review of principal component analysis

We suppose that there are a data samples, each of which contains b characteristics. The b characteristics are correlated with each other in complicated ways, which makes dimensionality reduction with PCA possible. For convenience of description, let the a×b matrix X denote the original data containing the a data samples. The key point of PCA is how to derive a b×l (l<b) compression matrix \({\bar {\boldsymbol {\Psi }}}\), which is utilized to compress the high-dimensional a×b data X into the low-dimensional a×l matrix \({\bar {\mathbf {X}}}\) as follows,

$$ \bar{\mathbf{X}} = {\mathbf{X}\bar{\boldsymbol{\Psi}}}, $$
(8)

in which \({\bar {\boldsymbol {\Psi }}}\) is composed of the l dominating eigenvectors, the so-called principal components, selected from all b eigenvectors of the covariance matrix of X.

For the sake of determining which components are to be selected, the concept of contribution rate is introduced. Consider a descending ordering of the b eigenvalues λ 1,λ 2…,λ b . Then, the contribution rate of the g th eigenvalue λ g is defined as \({\frac {{{\lambda _{g}}}}{{{\sum \nolimits }_{g = 1}^{b} {{\lambda _{g}}} }}}\), while the cumulative contribution rate of the top l eigenvalues can be expressed by \({\frac {{{\sum \nolimits }_{g = 1}^{l} {{\lambda _{g}}} }}{{{\sum \nolimits }_{g = 1}^{b} {{\lambda _{g}}} }}}\). Generally, when the cumulative contribution rate of the chosen l principal components exceeds a certain level, the information loss is acceptable.

Finally, the original X can be recovered from \({\bar {\mathbf {X}}}\) by

$$ \hat{\mathbf{X}} = \bar{\mathbf{X}}{\bar{\boldsymbol{\Psi}}^{H}}. $$
(9)
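The compression and recovery steps (8)–(9), together with the contribution-rate rule for choosing l, can be sketched as follows. The synthetic low-rank data and the 0.95 threshold are assumptions for illustration; the function name is ours.

```python
import numpy as np

rng = np.random.default_rng(2)

def pca_compression_matrix(X, target_rate=0.95):
    """Select the fewest principal components whose cumulative contribution
    rate exceeds target_rate; return the b x l compression matrix of Eq. (8)."""
    # eigendecomposition of the sample covariance of the (zero-mean) a x b data X
    C = X.conj().T @ X / X.shape[0]
    lam, U = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]                 # descending eigenvalues
    lam, U = lam[order], U[:, order]
    cum = np.cumsum(lam) / lam.sum()              # cumulative contribution rate
    l = int(np.searchsorted(cum, target_rate) + 1)
    return U[:, :l]

# correlated toy data: b = 16 characteristics driven by 3 latent factors
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 16))
Psi = pca_compression_matrix(X)
X_bar = X @ Psi                # compression, Eq. (8)
X_hat = X_bar @ Psi.conj().T   # recovery, Eq. (9)
```

Because the selected components carry at least 95 % of the total eigenvalue mass, the reconstruction error of X_hat is at most the discarded 5 %.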

3.2 Proposed PCA-based feedback scheme

A PCA-based feedback scheme for massive MIMO is proposed in this subsection. In the proposed scheme, different operations at the MS and the BS have different time periods, the long-term period T l and the short-term period T s. Every long-term period T l contains several short-term periods T s. In every T s, the MS utilizes the compression matrix to compress the high-dimensional CSI into a low-dimensional representation. Then, the compressed low-dimensional CSI is quantized by the RVQ codebook and the index of the preferred codeword is fed back to the BS. Because of the signal-dependent nature of PCA, the compression matrix is derived by executing PCA on the CSI obtained through continuous channel estimation during a whole long-term period.

3.2.1 Compression matrix derivation

First of all, the detailed procedure for deriving the compression matrix in the n th long-term period T l (n) with the PCA method is given in Table 1. We assume MS k can obtain S high-dimensional channel vectors \(\left ({\textbf {h}_{k}^{\left ({n,1} \right)},\textbf {h}_{k}^{\left ({n,2} \right)} \ldots,\textbf {h}_{k}^{\left ({n,S} \right)}}\right)\) through ideal channel estimation in T l (n). Here, the S channel vectors can be viewed as S data samples, each of which contains N t characteristics. In order to compress the high-dimensional CSI (channel vectors), we choose M (M≪N t) dominating eigenvectors to compose the compression matrix \({{\bar {\mathbf {U}}^{\left (n \right)}} \in {\mathbb {C}^{{N_{\mathrm {t}}} \times M}}}\).

Table 1 Procedure of deriving the compression matrix in each long-term period with the PCA method at the MS

The compression matrix obtained in the long-term period T l (n) will be used by the MS to compress 1×N t channel vectors into 1×M vectors, as well as by the BS to perform recovery in the period T l (n+1).
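The long-term derivation of Table 1 amounts to an eigendecomposition of the sample channel covariance. A sketch, with synthetic correlated channel samples and illustrative dimensions of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(3)

def derive_compression_matrix(H_samples, M):
    """Long-term operation (Table 1 sketch): H_samples stacks the S estimated
    1 x N_t channel vectors; return the N_t x M matrix whose columns are the
    M dominating eigenvectors of the sample channel covariance."""
    R = H_samples.conj().T @ H_samples / H_samples.shape[0]
    lam, U = np.linalg.eigh(R)          # ascending real eigenvalues
    order = np.argsort(lam)[::-1]       # descending
    return U[:, order[:M]]

# S = 100 spatially correlated samples over one long-term period (rank-M toy model)
N_t, S, M = 64, 100, 8
A = rng.standard_normal((N_t, M)) + 1j * rng.standard_normal((N_t, M))
H_samples = (rng.standard_normal((S, M)) + 1j * rng.standard_normal((S, M))) @ A.conj().T
U_bar = derive_compression_matrix(H_samples, M)
h_low = H_samples[0] @ U_bar            # 1 x M compressed CSI, ratio M / N_t
```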

3.2.2 MS operation

The main operations at the MS can be classified into two types: long-term period operations and short-term period operations.

In the long-term period \({T_{\mathrm {l}}^{\left (n \right)}}\), the MS performs the continuous channel estimation to obtain S high-dimensional channel vectors, and derives the compression matrix \({{\bar {\mathbf {U}}^{\left (n \right)}}}\). Then, each column of the compression matrix is quantized by another RVQ codebook. After quantization, the compression matrix \({{\bar {\mathbf {U}}^{\left (n \right)}}}\) is fed back to the BS at the end of \({T_{\mathrm {l}}^{\left (n \right)}}\).

Operation in the s th (s=1,2…,S) short-term period of the n th long-term period, \({T_{\mathrm {s}}^{\left ({n,s} \right)}}\), is described as follows:

Step 1. Channel estimation is performed to obtain a 1×N t channel vector \({\textbf {h}_{k}^{\left ({n,s} \right)}}\).

Step 2. Multiply \(\textbf {h}_{k}^{\left ({n,s} \right)}\) by the compression matrix derived in the previous long-term period

$$ \bar{\mathbf{h}}_{k}^{\left({n,s} \right)} = \textbf{h}_{k}^{\left({n,s} \right)}{\bar{\mathbf{U}}^{\left({n - 1} \right)}}. $$
(10)

By this step, the original high-dimensional CSI (1×N t) is compressed into a low-dimensional representation (1×M). The compression ratio is \({\frac {M}{{{N_{\mathrm {t}}}}}}\).

Step 3. Quantize the low-dimensional CSI \({\bar {\mathbf {h}}_{k}^{\left ({n,s} \right)}}\) with the RVQ codebook and obtain the index of the codeword that best fits \({\bar {\mathbf {h}}_{k}^{\left ({n,s} \right)}}\), that is

$$ j^{\left({n,s} \right)} = \mathop {\arg \max }\limits_{j} \left| {{\bar{\mathbf{h}}}_{k}^{\left({n,\,s} \right)} {\mathbf{c}}_{j}} \right|, $$
(11)

where c j is the j th codeword of the codebook. Compared with quantizing the high-dimensional CSI directly, the RVQ codebook used above can be designed to be much smaller. This not only reduces the feedback overhead, but also decreases the codebook search complexity.

Step 4. The index of the preferred codeword is fed back to the BS.
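Steps 2–4, together with the BS-side codeword lookup from the shared codebook, can be sketched as follows. The codebook size and dimensions are illustrative; we apply the inner-product criterion of (11) with an explicit conjugate, a common convention for complex RVQ.

```python
import numpy as np

rng = np.random.default_rng(4)

def rvq_codebook(bits, dim):
    """Random vector quantization codebook: 2^bits i.i.d. unit-norm vectors."""
    C = rng.standard_normal((2 ** bits, dim)) + 1j * rng.standard_normal((2 ** bits, dim))
    return C / np.linalg.norm(C, axis=1, keepdims=True)

def quantize(h_low, codebook):
    """Step 3, Eq. (11): index of the codeword maximizing |h_bar c_j|."""
    return int(np.argmax(np.abs(codebook.conj() @ h_low)))

M, B1 = 8, 10
codebook = rvq_codebook(B1, M)          # shared offline by MS and BS
h_low = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # compressed CSI
j = quantize(h_low, codebook)           # MS feeds back only this index (Step 4)
h_hat = codebook[j]                     # BS looks up the quantized low-dim CSI
```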

3.2.3 BS operation

Similarly, the main operation at the BS can also be classified into long-term period operation and short-term period operation. At the end of \({T_{\mathrm {l}}^{\left (n \right)}}\), the BS receives the compression matrix \({{\bar {\mathbf {U}}^{\left (n \right)}}}\) to perform high-dimensional CSI recovery in the next long-term period \({T_{\mathrm {l}}^{\left ({n + 1} \right)}}\). Meanwhile, the short-term period operation follows the steps below:

Step 1. The codeword index j (n,s) is received in each short-term period;

Step 2. As the BS and the MS share the same codebook, it is easy for the BS to find the quantized low-dimensional CSI \({\hat {\mathbf {h}}_{k}^{\left ({n,s} \right)}}\) by letting \({\hat {\mathbf {h}}_{k}^{\left ({n,s} \right)} = {\textbf {c}_{{j^{\left ({n,s} \right)}}}}}\);

Step 3. The high-dimensional CSI \({\stackrel {\frown }{\mathbf {h}}}_{k}^{\left ({n,s} \right)}\) can be recovered by

$$ {\stackrel{\frown}{\mathbf{h}}}_{k}^{\left({n,\,s} \right)} = \hat{\mathbf{h}}_{k}^{\left({n,\,s} \right)}{\left({{{\bar{\mathbf{U}}}^{\left({n - 1} \right)}}} \right)^{H}}, $$
(12)

where \({{\bar {\mathbf {U}}^{\left ({n - 1} \right)}}}\) is derived from the period \({T_{\mathrm {l}}^{\left ({n - 1} \right)}}\).

3.3 Distortion analysis of proposed scheme

The distortion of our proposed scheme consists of three components: the distortion resulting from the PCA processing, the quantization of the low-dimensional CSI \({\bar {\mathbf {h}}}\), and the quantization of the compression matrix \({\bar {\mathbf {U}}}\). To facilitate the distortion analysis below, the different representations of the CSI at different stages are enumerated in Table 2.

Table 2 Different representations of CSI in different stages

3.3.1 Distortion analysis of PCA

First, we consider the distortion caused by the PCA method itself, with no quantization errors from the low-dimensional CSI or the compression matrix taken into account. That is, we measure the mean square error between h and \({\tilde {\mathbf {h}}}\), where \({\tilde {\mathbf {h}} = {\mathbf {h}\bar {\mathbf {U}}}{\bar {\mathbf {U}}^{H}}}\). In this paper, h is an N t-dimensional row vector, which can be viewed as a point in the N t-dimensional space. Therefore, h can be expressed as a linear combination of a set of orthogonal basis vectors u i,

$$ \textbf{h} = \sum\limits_{i = 1}^{{N_{\mathrm{t}}}} {{\alpha_{i}}{\textbf{u}_{i}}}, $$
(13)

where u i denotes the i th basis vector.

In the PCA method, the high-dimensional CSI h is compressed into the low-dimensional (M-dimensional) \({\bar {\mathbf {h}}}\), whose components are derived by projecting h onto the M dominating basis vectors. Given this, the reconstructed high-dimensional CSI \({\tilde {\mathbf {h}}}\) can be modeled as a combination of the M dominating basis vectors and the other N t−M less dominating vectors,

$$ \tilde{\mathbf{h}} = \sum\limits_{i = 1}^{M} {{z_{i}}{\textbf{u}_{i}}} + \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {{b_{i}}{\textbf{u}_{i}}}. $$
(14)

Therefore, the information distortion caused by the PCA itself, which is defined as the mean square error between h and \({\tilde {\mathbf {h}}}\), can be expressed by

$$ J = \frac{1}{S}\sum\limits_{n = 1}^{S} {{{\left\| {{\textbf{h}_{n}} - {{\tilde{\mathbf{h}}}_{n}}} \right\|}^{2}}}, $$
(15)

where S denotes the number of short-term periods in a long-term period.

Proposition 1.

The PCA-caused information distortion J can be expressed as the sum of the N t−M least dominant eigenvalues of the channel covariance matrix, that is,

$$J = \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {{\lambda_{i}}}. $$

Proof.

See Appendix A.
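This proposition is easy to verify numerically: compressing synthetic correlated samples with the M dominating eigenvectors of their sample covariance and measuring (15) reproduces the sum of the discarded eigenvalues. A sketch, with dimensions and data of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(5)

# synthetic correlated channel samples: S rows sharing the covariance L^T L
N_t, S, M = 16, 5000, 4
L = rng.standard_normal((N_t, N_t))
H = rng.standard_normal((S, N_t)) @ L

R = H.T @ H / S                                 # sample covariance
lam, U = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]                # descending eigenvalues

U_bar = U[:, :M]                                # M dominating eigenvectors
H_tilde = H @ U_bar @ U_bar.T                   # compress then reconstruct
J = np.mean(np.linalg.norm(H - H_tilde, axis=1) ** 2)   # Eq. (15)
```

Here J matches the sum of the N t−M smallest eigenvalues of the sample covariance exactly, since the compression matrix is built from that same covariance.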

3.3.2 Distortion analysis of quantization

According to [6], to measure the quantization error, \({\bar {\mathbf {h}}}\) can be modeled as

$$ \bar{\mathbf{h}} = \sqrt {1 - {d^{2}}} \hat{\mathbf{h}} + d\textbf{e}, $$
(16)

where \({\hat {\mathbf {h}}}\) is the quantization of \({\bar {\mathbf {h}}}\) and e is a unit norm vector isotropically distributed in the nullspace of \({\hat {\mathbf {h}}}\). Parameter d denotes the quantization error, independent of \({\hat {\mathbf {h}}}\), which satisfies \({\mathbb {E}\left [ {{d^{2}}} \right ] \le {2^{- \frac {{{B_{1}}}}{{M - 1}}}}}\). Here, M represents the number of principal components and B 1 is the number of feedback bits for \({\bar {\mathbf {h}}}\). Similarly, \({\bar {\mathbf {U}}}\) can be modeled as

$$ \bar{\mathbf{U}} = \sqrt {1 - {D^{2}}} \hat{\mathbf{U}} + D\textbf{E}, $$
(17)

where \({\hat {\mathbf {U}}}\) is the quantization of \({\bar {\mathbf {U}}}\) and E is composed of M unit norm vectors isotropically distributed in the nullspace of \({\hat {\mathbf {U}}}\). Moreover, quantization error D is independent of \({\hat {\mathbf {U}}}\) satisfying

$$ \mathbb{E}\left[ {{D^{2}}} \right] \le {2^{- \frac{{{B_{2}}/M}}{{{N_{\mathrm{t}}} - 1}}}}, $$
(18)

where B 2 denotes the number of feedback bits of the compression matrix.
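The quantization-error bound \({\mathbb {E}}[d^{2}] \le 2^{-B_{1}/(M-1)}\) can be checked empirically for RVQ. The Monte Carlo sketch below (trial counts and dimensions are our choices) measures the mean squared chordal error 1−cos²θ between a random unit vector and its nearest codeword.

```python
import numpy as np

rng = np.random.default_rng(6)

def rvq_sq_error(M, B, trials=2000):
    """Empirical E[d^2] = E[1 - max_j |h_bar c_j^H|^2] for unit-norm RVQ."""
    errs = []
    for _ in range(trials):
        C = rng.standard_normal((2 ** B, M)) + 1j * rng.standard_normal((2 ** B, M))
        C /= np.linalg.norm(C, axis=1, keepdims=True)   # fresh random codebook
        h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
        h /= np.linalg.norm(h)                          # random unit-norm CSI
        cos2 = np.max(np.abs(C.conj() @ h) ** 2)        # best codeword match
        errs.append(1.0 - cos2)                         # squared chordal error d^2
    return float(np.mean(errs))

M, B1 = 4, 8
d2 = rvq_sq_error(M, B1)
bound = 2.0 ** (-B1 / (M - 1))
```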

Before analyzing the distortion between the original high-dimensional CSI h and the reconstructed CSI \({\stackrel{\frown}{\mathbf{h}}}\), we first focus on how to express \({\stackrel{\frown}{\mathbf{h}}}\) in terms of h.

Proposition 2.

The reconstructed high-dimensional CSI \({\stackrel{\frown}{\mathbf{h}}}\) can be expressed in terms of h as

$$ {\stackrel{\frown}{\mathbf{h}}} = \frac{{{\mathbf{h}\bar{\mathbf{U}}}{{\bar{\mathbf{U}}}^{H}} - {D^{2}} \cdot \textbf{h}{\textbf{I}_{M}} - \left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\| \cdot d\sqrt {1 - {D^{2}}} \cdot \textbf{e}{{\hat{\mathbf{U}}}^{H}}}}{{\sqrt {1 - {d^{2}}} \cdot \sqrt {1 - {D^{2}}} }}, $$
(19)

where \({{\textbf {I}_{M}} = \left [ {\begin {array}{*{20}{c}} {{\textbf {I}_{M \times M}}}&{{{\textbf {0}}_{M \times \left ({{N_{\mathrm {t}}} - M} \right)}}}\\ {{{\textbf {0}}_{\left ({{N_{\mathrm {t}}} - M} \right) \times M}}}&{{{\textbf {0}}_{\left ({{N_{\mathrm {t}}} - M} \right) \times \left ({{N_{\mathrm {t}}} - M} \right)}}} \end {array}} \right ]}\).

Proof.

See Appendix B.

Having derived an expression for \({\stackrel{\frown}{\mathbf{h}}}\) in terms of h, our purpose is to analyze the distortion of the proposed scheme. We derive an upper bound to the normalized distortion (denoted by δ) between h and \({\stackrel{\frown}{\mathbf{h}}}\). Instead of calculating δ directly, we first calculate the normalized similarity (denoted by ρ) between h and \({\stackrel{\frown}{\mathbf{h}}}\). Then, δ can be conveniently obtained by δ=1−ρ.

Before calculating the normalized similarity ρ between h and \({\stackrel{\frown}{\mathbf{h}}}\), it is insightful to look at the non-normalized similarity \({\mathbb {E}\left [ {\left | {\textbf {h}{{\stackrel {\frown }{\mathbf {h}}}^{H}}} \right |} \right ]}\).

Proposition 3.

A lower bound to the non-normalized similarity between h and \({\stackrel{\frown}{\mathbf{h}}}\) is given by

$$ \mathbb{E}\left[ {\left| {\textbf{h}{{\stackrel{\frown}{\mathbf{h}}}}^{H}} \right|} \right] \ge A - M \cdot {2^{- \frac{{{B_{2}}/M}}{{{N_{\mathrm{t}}} - 1}}}}- A\cdot {2^{- \frac{{{B_{1}}}}{{M - 1}}}}, $$
(20)

where \(A={\left ({{N_{\mathrm {t}}} - \sum \limits _{i = M + 1}^{{N_{\mathrm {t}}}} {{\lambda _{i}}}} \right)}\).

Proof.

Since we have derived an expression for \({\stackrel{\frown}{\mathbf{h}}}\) in terms of h, as given in Proposition 2, the non-normalized similarity can be expressed by

$$ \begin{array}{l} \mathbb{E}\left[ {\left| {\textbf{h}{{\stackrel{\frown}{\mathbf{h}}}^{H}}} \right|} \right]\\ = \mathbb{E}\left[ {\left| {\textbf{h}{{\left({\frac{{{\mathbf{h}\bar{\mathbf{U}}}{{\bar{\mathbf{U}}}^{H}} - {D^{2}} \cdot \textbf{h}{\textbf{I}_{M}} - \left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\| \cdot d\sqrt {1 - {D^{2}}} \cdot \textbf{e}{{\hat{\mathbf{U}}}^{H}}}}{{\sqrt {1 - {d^{2}}} \cdot \sqrt {1 - {D^{2}}} }}} \right)}^{H}}} \right|} \right]\\ = \mathbb{E}\left[ {\left| {\frac{{{{\left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\|}^{2}} - {D^{2}} \cdot \textbf{h}{\textbf{I}_{M}}^{H}{\textbf{h}^{H}} - \left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\| \cdot d \cdot \textbf{h}\left({\bar{\mathbf{U}} - D\textbf{E}} \right){\textbf{e}^{H}}}}{{\sqrt {1 - {d^{2}}} \cdot \sqrt {1 - {D^{2}}} }}} \right|} \right]. \end{array} $$
(21)

Now, because d 2<1 and D 2<1, \({\sqrt {1 - {d^{2}}} }\) and \({\sqrt {1 - {D^{2}}} }\) are also smaller than 1. Additionally, since the quantization of the low-dimensional CSI and that of the compression matrix are independent, \({\mathbb {E}\left [ {\textbf {E}{\textbf {e}^{H}}} \right ] = 0}\). Therefore, the non-normalized similarity can be lower bounded as in (22).

$$ {{\begin{aligned} {}\mathbb{E}\left[ {\left| {\textbf{h}{{\stackrel{\frown}{\mathbf{h}}}^{H}}} \right|} \right] &\ge \mathbb{E}\left[ {\left| {{{\left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\|}^{2}} - {D^{2}} \cdot \textbf{h}{\textbf{I}_{M}}^{H}{\textbf{h}^{H}} - \left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\| \cdot d \cdot \textbf{h}\left({\bar{\mathbf{U}} - D\textbf{E}} \right){\textbf{e}^{H}}} \right|} \right]\\ &\ge \mathbb{E}\left[ {{{\left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\|}^{2}} - {D^{2}} \cdot \textbf{h}{\textbf{I}_{M}}^{H}{\textbf{h}^{H}} - \left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\| \cdot d \cdot \textbf{h}\left({\bar{\mathbf{U}} - D\textbf{E}} \right){\textbf{e}^{H}}} \right]\\ &= \mathbb{E}\left[ {{{\left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\|}^{2}} - {D^{2}} \cdot \textbf{h}{\textbf{I}_{M}}^{H}{\textbf{h}^{H}} - \left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\| \cdot d \cdot {\mathbf{h}\bar{\mathbf{U}}}{\textbf{e}^{H}}} \right]\\ &= \mathbb{E}\left[ {{{\left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\|}^{2}} - {D^{2}} \cdot \textbf{h}{\textbf{I}_{M}}^{H}{\textbf{h}^{H}} - {{\left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\|}^{2}}{d^{2}}} \right]\\ &= \mathbb{E}\left[ {{{\left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\|}^{2}}} \right] - \mathbb{E}\left[ {{D^{2}}} \right] \cdot \mathbb{E}\left[ {\textbf{h}{\textbf{I}_{M}}^{H}{\textbf{h}^{H}}} \right] - \mathbb{E}\left[ {{{\left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\|}^{2}}} \right] \cdot \mathbb{E}\left[ {{d^{2}}} \right] \end{aligned}}} $$
(22)

Further, we take advantage of the equations

$$ \mathbb{E}\left[{{{\left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\|}^{2}}} \right] = {N_{\mathrm{t}}} - J, $$
(23)
$$ \mathbb{E}\left[ {\textbf{h}{\textbf{I}_{M}}^{H}{\textbf{h}^{H}}} \right] = M, $$
(24)

which are proved in Appendix C. Moreover, the upper bounds of \({\mathbb {E}\left [ {{D^{2}}} \right ]}\) and \({\mathbb {E}\left [ {{d^{2}}} \right ]}\) are \({{2^{- \frac {{{B_{2}}/M}}{{{N_{\mathrm {t}}} - 1}}}}}\) and \({{2^{- \frac {{{B_{1}}}}{{M - 1}}}}}\), respectively. Consequently, we obtain (20).

From (20) and (23), we can observe that when there is no distortion caused by PCA (J=0), and no quantization error from the low-dimensional CSI (\({{2^{- \frac {{{B_{1}}}}{{M - 1}}}} \to 0}\)) or the compression matrix (\({{2^{- \frac {{{B_{2}}/M}}{{{N_{\mathrm {t}}} - 1}}}} \to 0}\)), the lower bound attains its maximum value N t. Normalizing by N t, the normalized similarity can be bounded as

$$ \rho \ge \frac{{A - M \cdot {2^{- \frac{{{B_{2}}/M}}{{{N_{\mathrm{t}}} - 1}}}} - A\cdot {2^{- \frac{{{B_{1}}}}{{M - 1}}}}}}{{{N_{\mathrm{t}}}}}. $$
(25)

Finally, the upper bound of the normalized distortion of our proposed scheme can be obtained,

$$ \delta = 1 - \rho \le 1 - \frac{{A - M \cdot {2^{- \frac{{{B_{2}}/M}}{{{N_{\mathrm{t}}} - 1}}}} - A\cdot {2^{- \frac{{{B_{1}}}}{{M - 1}}}}}}{{{N_{\mathrm{t}}}}}. $$
(26)
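The bound in (26) is easy to evaluate numerically. The sketch below treats the constant A from (25) as an input (presumably \(A = N_{\mathrm{t}} - J\) for a given PCA truncation loss J; all numbers are illustrative assumptions) and shows how the bound tightens as the bit budgets B1 and B2 grow:

```python
def distortion_upper_bound(n_t, m, b1, b2, a):
    """Upper bound on the normalized distortion, Eq. (26).

    a  : the constant A in (25) (treated here as a given input),
    b1 : bits for quantizing the low-dimensional CSI,
    b2 : total bits for quantizing the M compression-matrix columns.
    """
    q_mat = 2.0 ** (-(b2 / m) / (n_t - 1))   # compression-matrix quantization term
    q_csi = 2.0 ** (-b1 / (m - 1))           # low-dimensional CSI quantization term
    return 1.0 - (a - m * q_mat - a * q_csi) / n_t

# More feedback bits tighten the bound toward the pure PCA loss 1 - A / N_t.
print(distortion_upper_bound(n_t=128, m=4, b1=40, b2=400, a=128))
```

Increasing either B1 or B2 strictly decreases the bound, matching the behavior plotted in Fig. 2.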

Based on expression (26), Fig. 2 compares the theoretical upper bound of the distortion with the simulated distortion.

Fig. 2

Upper bound of distortion versus the number of principal components

4 Implementation complexity analysis

This section analyzes the feedback overhead and the codebook search complexity of the proposed scheme. For comparison, the existing CS-based schemes utilizing the KLT basis and the DCT basis are also taken into account.

The number of feedback bits per user increases linearly with the number of transmit antennas, as modeled by [6]

$$ B = \left({{N_{\mathrm{t}}} - 1} \right){\log_{2}}P \approx \frac{{{N_{\mathrm{t}}} - 1}}{3}\rho, $$
(27)

where P denotes the received signal-to-noise ratio (SNR) at the MS and ρ is the same SNR expressed in decibels; the approximation follows from \({\log_{2}}{10^{\rho /10}} = \rho \cdot {\log_{2}}10/10 \approx \rho /3\).
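The scaling in (27) can be sketched with a hypothetical helper (SNR supplied in dB):

```python
def feedback_bits(n_t, snr_db):
    """Per-user feedback bits that keep the rate gap bounded, Eq. (27):
    B = (N_t - 1) * log2(P) with P the linear SNR, i.e. about (N_t - 1) * snr_db / 3."""
    return (n_t - 1) * snr_db / 3.0

# At 20 dB, a 128-antenna BS already needs roughly 847 bits per feedback instant,
# which is why dimension reduction matters for massive MIMO.
print(feedback_bits(128, 20))
```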

Consider a long-term period containing S short-term periods. When the proposed scheme is adopted, the number of feedback bits can be represented by

$$ {B_{{\text{pro}}}} = S\frac{{\left({M - 1} \right)\rho }}{3} + M\frac{{\left({{N_{\mathrm{t}}} - 1} \right)\rho }}{3}, $$
(28)

where M is the number of principal components, the first term denotes the number of feedback bits for quantizing the low-dimensional CSI, and the second term is due to the quantization of the compression matrix.

For the DCT-based CS scheme, there is no need for the MS to inform the BS of the channel correlation matrix, due to the signal-independent nature of the DCT basis. So the number of feedback bits for DCT-based CS is given by \({{B_{\text {DCT}}} = S \cdot \frac {{\left ({M - 1} \right)\rho }}{3}}\).

In the case of the KLT-based CS scheme, the number of feedback bits increases dramatically. Since the complete correlation matrix must be fed back in every short-term period, the second term of Eq. (28) is modified accordingly, giving the number of feedback bits of the KLT-based CS scheme as

$$ {B_{\text{KLT}}} = S \cdot \frac{{\left({M - 1} \right)\rho }}{3} + S \cdot {N_{\mathrm{t}}} \cdot \frac{{\left({{N_{\mathrm{t}}} - 1} \right)\rho }}{3}. $$
(29)

As for the codebook search complexity, it is proportional to the number of conjugate multiplications when searching for the best codeword. So the search complexity of the proposed scheme, DCT-based CS and KLT-based scheme in a long-term period can be expressed as \({S \cdot M \cdot {2^{\frac {{\left ({M - 1} \right)\rho }}{3}}} + M \cdot {N_{\mathrm {t}}} \cdot {2^{\frac {{\left ({{N_{\mathrm {t}}} - 1} \right)\rho }}{3}}}}\), \({S \cdot M \cdot {2^{\frac {{\left ({M - 1} \right)\rho }}{3}}}}\) and \({S \cdot M \cdot {2^{\frac {{\left ({M - 1} \right)\rho }}{3}}} + S \cdot {N_{\mathrm {t}}}^{2} \cdot {2^{\frac {{\left ({{N_{\mathrm {t}}} - 1} \right)\rho }}{3}}}}\), respectively.
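The three overhead expressions above can be compared directly. The sketch below (hypothetical helper name; S = 10 short-term periods, \(N_{\mathrm{t}} = 128\), M = 4 and 20 dB SNR are assumed values) reproduces the ordering summarized in Table 3:

```python
def overhead_bits(scheme, s, n_t, m, snr_db):
    """Feedback bits per long-term period (S short-term periods), Eqs. (28)-(29)."""
    short = s * (m - 1) * snr_db / 3.0                # quantized low-dimensional CSI
    if scheme == "pro":                               # proposed: matrix fed back once
        return short + m * (n_t - 1) * snr_db / 3.0
    if scheme == "dct":                               # DCT basis is signal-independent
        return short
    if scheme == "klt":                               # correlation matrix every period
        return short + s * n_t * (n_t - 1) * snr_db / 3.0
    raise ValueError(scheme)

s, n_t, m, snr = 10, 128, 4, 20
bits = {k: overhead_bits(k, s, n_t, m, snr) for k in ("dct", "pro", "klt")}
assert bits["dct"] < bits["pro"] < bits["klt"]        # proposed sits in between
print(bits)
```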

Table 3 illustrates the comparison in detail. Based on the analysis above, we can observe that the number of feedback bits and the codebook search complexity of the proposed scheme both fall between those of the DCT-based and KLT-based CS schemes.

Table 3 Comparison of feedback overhead and codebook search complexity

5 Simulation results

In this section, we present simulation results. A single cell scenario is considered, where the BS deploys a uniform linear array with \(N_{\mathrm{t}} = 128\) antennas serving K = 6 single-antenna MSs. Table 4 lists the detailed simulation parameters.

Table 4 Simulation parameters

5.1 Feasibility validation

We verify whether PCA can be utilized to compress spatially correlated high-dimensional CSI into a low-dimensional representation. To this end, we simulate the eigenvalue distribution of the spatially correlated channels defined in (1). As shown in Fig. 3, the eigenvalue distribution of the spatially correlated channel is far from uniform. The eigenvalues are sorted by their contribution rate in descending order. The contribution rate of the largest eigenvalue exceeds 50 %, while the fourth largest eigenvalue contributes only 5.9 %. The cumulative contribution rate of the top four eigenvalues exceeds 95 %. We can conclude that spatially correlated channel vectors can be represented by a few principal components with low information distortion.
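This experiment can be approximated with a few lines of linear algebra. The sketch below substitutes an exponential correlation matrix \(R_{ij} = r^{|i-j|}\) for the paper's channel model (1), so the exact contribution rates will differ from the figures quoted above, but it exhibits the same strongly non-uniform eigenvalue distribution:

```python
import numpy as np

# Assumed stand-in for the correlated channel model (1): exponential correlation
# for a uniform linear array with N_t = 128 antennas and coefficient r = 0.95.
n_t, r = 128, 0.95
R = np.fromfunction(lambda i, j: r ** np.abs(i - j), (n_t, n_t))

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]        # eigenvalues, descending
contrib = eigvals / eigvals.sum()                     # contribution rate of each one
print("top-1 rate:", contrib[0])
print("top-4 cumulative rate:", contrib[:4].sum())
```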

Fig. 3

Eigenvalue distribution of the spatially correlated channel

5.2 Evaluations of the proposed scheme

We show the simulation results of channel compression of the proposed scheme in Figs. 4 and 5. It is assumed that the SNR is 20 dB and there is no quantization error of the low-dimensional CSI.

Fig. 4

System capacity versus compression ratio η (SNR = 20 dB, perfect quantization of low-dimensional CSI)

Fig. 5

Recovery performance comparison between PCA and DCT-based CS

Figure 4 shows the effect of the compression ratio \(\left ({\eta = \frac {M}{{{N_{\mathrm {t}}}}}}\right)\) on the system capacity, comparing the proposed scheme, DCT-based CS and KLT-based CS. In both low and high compression ratio regimes, the KLT-based CS has the best performance, while the DCT-based CS performs the worst. It should be emphasized that the superior performance of the KLT-based CS comes at the cost of increased feedback overhead, as shown in Table 3. In this sense, the proposed scheme offers a useful tradeoff. Additionally, as Fig. 4 shows, the proposed scheme performs much better than DCT-based CS in low compression ratio regimes.

Figure 5 illustrates the recovery performance of the high-dimensional CSI at the BS under the assumption that the BS has perfect knowledge of the low-dimensional CSI without quantization. We take the proposed scheme and DCT-based CS for comparison. The \(1 \times N_{\mathrm{t}}\) original CSI is compressed into \(1 \times M_{\text{CS}}\) low-dimensional information, where \(M_{\text{CS}} = 20\) and the compression ratio is \({{\eta _{\text {CS}}} = \frac {{{M_{\text {CS}}}}}{{{N_{\mathrm {t}}}}} \approx 0.16}\), while in the proposed scheme, the number of principal components is \(M_{\text{PCA}} = 4\) with the compression ratio being \({{\eta _{{\text {PCA}}}} = \frac {{{M_{\text {PCA}}}}}{{{N_{\mathrm {t}}}}} \approx 0.03}\).

As can be seen, the reconstructed high-dimensional CSI is remarkably close to the original data when PCA is utilized. Some distortion remains, however, because PCA itself inevitably introduces information loss. The recovery performance of the DCT-based CS is noticeably poorer. The reason is that the proposed scheme exploits the signal-dependent nature of PCA, which allows the compression matrix to adapt in every long-term period to the variation of the original data.
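The PCA compression-and-recovery loop of Fig. 5 can be sketched end to end. The snippet below again uses an exponential correlation matrix as an assumed stand-in for model (1); it also confirms numerically that the normalized reconstruction error equals the discarded fraction of eigenvalue energy, in line with Proposition 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, m, s = 128, 4, 200

# Stand-in for the correlated channel model (1): exponential correlation matrix.
R = np.fromfunction(lambda i, j: 0.95 ** np.abs(i - j), (n_t, n_t))
w, V = np.linalg.eigh(R)
sqrtR = (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T
G = (rng.standard_normal((s, n_t)) + 1j * rng.standard_normal((s, n_t))) / np.sqrt(2)
H = G @ sqrtR                                         # S correlated 1 x N_t channels

# Long-term period: PCA at the MS yields the compression matrix U_bar.
C = H.conj().T @ H / s
lam, U = np.linalg.eigh(C)
U_bar = U[:, np.argsort(lam)[::-1][:m]]               # M dominating eigenvectors

# Short-term period: compress to 1 x M, then recover the 1 x N_t CSI at the BS.
Z = H @ U_bar                                         # low-dimensional CSI
H_rec = Z @ U_bar.conj().T

nmse = np.linalg.norm(H - H_rec) ** 2 / np.linalg.norm(H) ** 2
print("normalized reconstruction error:", nmse)
```

The residual error equals the sum of the discarded sample eigenvalues divided by the total, which is exactly the PCA loss characterized in Appendix A.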

Figure 6 shows a system capacity (defined as the sum of all the users’ rates in the system) comparison. We choose four principal components to form the compression matrix, so the compression ratio of PCA is 0.03. For reference, we first consider the ideal situation, where the BS acquires perfect CSI with neither recovery distortion nor quantization error. As illustrated in Fig. 6, the best system capacity is achieved only when the BS acquires perfect CSI. Meanwhile, the proposed scheme outperforms the existing DCT-based CS scheme whether or not there is quantization error in the low-dimensional CSI, although it performs slightly worse than the KLT-based CS scheme.

Fig. 6

System capacity comparison between PCA and other CS schemes

When we utilize the RVQ codebook to quantize the low-dimensional CSI, the system capacity decreases in both cases because the quantization error must be taken into account. Based on the results in Fig. 6 and the feedback overhead analysis in Section 4, we can conclude that the proposed scheme offers a worthwhile design tradeoff between system capacity and feedback overhead.
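For concreteness, RVQ of the normalized low-dimensional CSI in (43) can be sketched as follows (a toy codebook of \(2^{B_1}\) random unit-norm codewords; the sizes are illustrative assumptions, not the paper's simulation settings):

```python
import numpy as np

rng = np.random.default_rng(2)
m, b1 = 4, 8                                          # M components, B1 feedback bits

# RVQ codebook: 2^B1 random unit-norm M-dimensional codewords, known to BS and MS.
codebook = rng.standard_normal((2 ** b1, m)) + 1j * rng.standard_normal((2 ** b1, m))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

h_bar = rng.standard_normal(m) + 1j * rng.standard_normal(m)
h_bar /= np.linalg.norm(h_bar)                        # normalized low-dim CSI, Eq. (43)

idx = int(np.argmax(np.abs(codebook.conj() @ h_bar))) # B1-bit index fed back to the BS
h_hat = codebook[idx]
d2 = 1 - np.abs(np.vdot(h_hat, h_bar)) ** 2           # quantization error, cf. d in (16)
print("chosen codeword:", idx, "distortion d^2:", d2)
```

The MS only feeds back the B1-bit index; the BS looks up the same codeword from the shared codebook.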

6 Conclusions

In this paper, a PCA-based feedback scheme for massive MIMO was proposed. In the proposed scheme, two kinds of feedback information, the quantized low-dimensional CSI index and the compression matrix utilized to perform both compression and recovery, are fed back hierarchically. Moreover, we obtained a closed-form expression for an upper bound on the normalized information distortion, and we analyzed the feedback overhead and codebook search complexity of the proposed scheme. Simulation results showed that, without considering low-dimensional CSI quantization, the proposed scheme outperforms the existing DCT-based CS scheme even at a much lower compression ratio. When an RVQ codebook is adopted to quantize the low-dimensional CSI, worthwhile system capacity and recovery performance are still achieved. We conclude that the proposed scheme achieves a useful tradeoff between system capacity and feedback overhead, which gives it high potential for implementation in practical massive MIMO systems.

7 Appendix

7.1 A Proof of Proposition 1

The PCA-caused information distortion J is

$$ \begin{aligned} J &= \frac{1}{S}\sum\limits_{n = 1}^{S} {{{\left\| {{\textbf{h}_{n}} - {{\tilde{\mathbf{h}}}_{n}}} \right\|}^{2}}} \\ &= \frac{1}{S}\sum\limits_{n = 1}^{S} {{{\left\| {\sum\limits_{i = 1}^{{N_{\mathrm{t}}}} {{\alpha_{ni}}{\textbf{u}_{i}}} - \sum\limits_{i = 1}^{M} {{z_{ni}}{\textbf{u}_{i}}} - \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {{b_{i}}{\textbf{u}_{i}}}} \right\|}^{2}}} \\ &= \frac{1}{S}\sum\limits_{n = 1}^{S} {{{\left\| {\sum\limits_{i = 1}^{M} {\left({{\alpha_{ni}} - {z_{ni}}} \right){\textbf{u}_{i}}} + \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {\left({{\alpha_{ni}} - {b_{i}}} \right){\textbf{u}_{i}}}} \right\|}^{2}}} \\ &= \frac{1}{S}\sum\limits_{n = 1}^{S} {\left[ {\sum\limits_{i = 1}^{M} {{{\left({{\alpha_{ni}} - {z_{ni}}} \right)}^{2}} + \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {{{\left({{\alpha_{ni}} - {b_{i}}} \right)}^{2}}}} } \right]}. \end{aligned} $$
(30)

In order to minimize J, we take partial derivatives with respect to z ni and b i separately, as given by

$$ \frac{{\partial J}}{{\partial {z_{ni}}}} = - \frac{2}{S}\left({{\alpha_{ni}} - {z_{ni}}} \right); $$
(31)
$$ \begin{aligned} \frac{{\partial J}}{{\partial {b_{i}}}} &= \frac{1}{S}\sum\limits_{n = 1}^{S} {\left[ { - 2\left({{\alpha_{ni}} - {b_{i}}} \right)} \right]} \\ &= - \frac{2}{S}\left({\sum\limits_{n = 1}^{S} {{\alpha_{ni}}} - S \cdot {b_{i}}} \right). \end{aligned} $$
(32)

Letting \({\frac {{\partial J}}{{\partial {z_{ni}}}} = 0}\), we obtain \({z_{ni}} = {\alpha_{ni}}\), that is

$$ {z_{ni}} = {\textbf{h}_{n}}{\textbf{u}_{i}}^{H} \left({i = 1,2\ldots,M;n = 1,2\ldots,S} \right). $$
(33)

Similarly, letting \({\frac {{\partial J}}{{\partial {b_{i}}}} = 0}\) yields

$$ {b_{i}} = \frac{1}{S}\sum\limits_{n = 1}^{S} {{\alpha_{ni}}} = {\stackrel{\smile}{\mathbf{h}}}{\textbf{u}_{i}}^{H} \left({i = M + 1,M + 2\ldots,{N_{\mathrm{t}}}} \right), $$
(34)

where \({\stackrel{\smile}{\mathbf{h}}}\) denotes the mean of the S high-dimensional channel vectors estimated in a long-term period, as given by

$$ {\stackrel{\smile}{\mathbf{h}}} = \frac{1}{S}\sum\limits_{n = 1}^{S} {{\textbf{h}_{n}}}. $$
(35)

Substituting (33) and (34) into (14), \({\tilde {\mathbf {h}}_{n}}\) can be rewritten as

$$ {\tilde{\mathbf{h}}_{n}} = \sum\limits_{i = 1}^{M} {{\textbf{h}_{n}}{\textbf{u}_{i}}^{H}{\textbf{u}_{i}}} + \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {{\stackrel{\smile}{\mathbf{h}}}{\textbf{u}_{i}}^{H}{\textbf{u}_{i}}}. $$
(36)

Then,

$$ \begin{array}{l} {\textbf{h}_{n}} - {{\tilde{\mathbf{h}}}_{n}} \\ = \sum\limits_{i = 1}^{{N_{\mathrm{t}}}} {{\textbf{h}_{n}}{\textbf{u}_{i}}^{H}{\textbf{u}_{i}}} - \left({\sum\limits_{i = 1}^{M} {{\textbf{h}_{n}}{\textbf{u}_{i}}^{H}{\textbf{u}_{i}}} + \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {{\stackrel{\smile}{\mathbf{h}}}{\textbf{u}_{i}}^{H}{\textbf{u}_{i}}}} \right)\\ = \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {{\textbf{h}_{n}}{\textbf{u}_{i}}^{H}{\textbf{u}_{i}}} - \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {{\stackrel{\smile}{\mathbf{h}}}{\textbf{u}_{i}}^{H}{\textbf{u}_{i}}} \\ = \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {\left[ {\left({{\textbf{h}_{n}} - {\stackrel{\smile}{\mathbf{h}}}} \right){\textbf{u}_{i}}^{H}} \right]{\textbf{u}_{i}}}. \end{array} $$
(37)

Therefore, J can be re-expressed by

$$ \begin{aligned} J &= \frac{1}{S}\sum\limits_{n = 1}^{S} {{{\left\| {{\textbf{h}_{n}} - {{\tilde{\mathbf{h}}}_{n}}} \right\|}^{2}}} \\ &= \frac{1}{S}\sum\limits_{n = 1}^{S} {\sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {{{\left({{\textbf{h}_{n}}{\textbf{u}_{i}}^{H} - \stackrel{\smile}{\mathbf{h}}{\textbf{u}_{i}}^{H}} \right)}^{2}}}} \\ &= \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {\frac{1}{S}} \sum\limits_{n = 1}^{S} {{{\left({{\textbf{h}_{n}}{\textbf{u}_{i}}^{H} - \stackrel{\smile}{\mathbf{h}}{\textbf{u}_{i}}^{H}} \right)}^{2}}}. \end{aligned} $$
(38)

In (38), the expression \({\frac {1}{S}\sum \limits _{n = 1}^{S} {{{\left ({{\textbf {h}_{n}}{\textbf {u}_{i}}^{H} - \stackrel {\smile }{\mathbf {h}}{\textbf {u}_{i}}^{H}} \right)}^{2}}} }\) can be viewed as the variance of \({\textbf{h}_{n}}{\textbf{u}_{i}}^{H}\). Letting \(\textbf{C}_{h}\) denote the covariance matrix of \(\textbf{h}_{n}\), J can be further expressed as

$$ J = \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {{\textbf{u}_{i}}{\textbf{C}_{h}}{\textbf{u}_{i}}^{H}}. $$
(39)

Our target is to minimize the PCA-caused information distortion J, which can be solved by the Lagrange multiplier (LM) method. Applying the LM method, we find that each basis vector must satisfy

$$ {\textbf{C}_{h}}{\textbf{u}_{i}}^{H} = {\lambda_{i}}{\textbf{u}_{i}}^{H}. $$
(40)

Equation (40) indicates that the basis vector \(\textbf{u}_{i}\) should be chosen as an eigenvector of the channel covariance matrix \(\textbf{C}_{h}\), with \(\lambda_{i}\) the corresponding eigenvalue. As a result, the PCA-caused information distortion J is

$$ J = \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {{\textbf{u}_{i}}{\textbf{C}_{h}}{\textbf{u}_{i}}^{H}} = \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {{\textbf{u}_{i}}{\lambda_{i}}{\textbf{u}_{i}}^{H} =} \sum\limits_{i = M + 1}^{{N_{\mathrm{t}}}} {{\lambda_{i}}}. $$
(41)
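Proposition 1 is straightforward to check numerically. The toy sketch below (real-valued channels for simplicity, with assumed sizes) builds the mean-corrected reconstruction (36) and verifies that the residual distortion equals the sum of the discarded eigenvalues, Eq. (41):

```python
import numpy as np

rng = np.random.default_rng(1)
s, n_t, m = 500, 16, 4

H = rng.standard_normal((s, n_t))                     # toy real-valued channel vectors
h_mean = H.mean(axis=0)                               # mean vector of Eq. (35)
C = (H - h_mean).T @ (H - h_mean) / s                 # covariance matrix C_h

lam, U = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]                      # eigenpairs, descending

# Reconstruction (36): keep the first M scores, replace the rest by mean scores b_i.
scores = H @ U                                        # alpha_{ni} = h_n u_i^T
scores[:, m:] = h_mean @ U[:, m:]                     # b_i of Eq. (34)
H_tilde = scores @ U.T

J_direct = np.mean(np.sum((H - H_tilde) ** 2, axis=1))
J_eig = lam[m:].sum()                                 # Eq. (41)
print(J_direct, J_eig)
```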

7.2 B Proof of Proposition 2

As has been mentioned, \({\stackrel {\frown}{\mathbf {h}}}\) is the reconstructed high-dimensional CSI recovered from \({\hat {\mathbf {h}}}\), which is given by

$$ {\stackrel{\frown}{\mathbf{h}}} = \left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\| \cdot \hat{\mathbf{h}}{\hat{\mathbf{U}}^{H}}, $$
(42)

where \({\left \| {{\mathbf {h}\bar {\mathbf {U}}}} \right \|}\) represents the modulus of the low-dimensional CSI, since normalization was performed at the very beginning, that is

$$ \bar{\mathbf{h}} = \frac{{{\mathbf{h}\bar{\mathbf{U}}}}}{{\left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\|}}. $$
(43)

Substituting (16), (17) and (43) into (42), we can rewrite \({\stackrel {\frown}{\mathbf {h}}}\) as

$$ \begin{aligned} {\stackrel{\frown}{\mathbf{h}}} &= \left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\| \cdot \hat{\mathbf{h}}{{\hat{\mathbf{U}}}^{H}}\\ &= \left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\| \cdot \frac{{\bar{\mathbf{h}} - d\textbf{e}}}{{\sqrt {1 - {d^{2}}} }} \cdot {\left({\frac{{\bar{\mathbf{U}} - D\textbf{E}}}{{\sqrt {1 - {D^{2}}} }}} \right)^{H}}\\ &= \left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\| \cdot \frac{{\frac{{{\mathbf{h}\bar{\mathbf{U}}}}}{{\left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\|}} - d\textbf{e}}}{{\sqrt {1 - {d^{2}}} }} \cdot {\left({\frac{{\bar{\mathbf{U}} - D\textbf{E}}}{{\sqrt {1 - {D^{2}}} }}} \right)^{H}}\\ &= \frac{{{\mathbf{h}\bar{\mathbf{U}}}{{\bar{\mathbf{U}}}^{H}} - {\mathbf{h}\bar{\mathbf{U}}}{{\left({D\textbf{E}} \right)}^{H}} - \left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\| \cdot d\textbf{e}{{\left({\bar{\mathbf{U}} - D\textbf{E}} \right)}^{H}}}}{{\sqrt {1 - {d^{2}}} \cdot \sqrt {1 - {D^{2}}} }}. \end{aligned} $$
(44)

Moreover, because \({\hat {\mathbf {U}}}\) is orthogonal to E (so that \(\hat{\mathbf{U}}{\textbf{E}^{H}} = \textbf{0}\)), multiplying \({\bar {\mathbf {U}}}\) in (17) by \({\textbf{E}^{H}}\) yields

$$ \begin{array}{l} \begin{aligned} \bar{\mathbf{U}}{\textbf{E}^{H}} &= \sqrt {1 - {D^{2}}} \hat{\mathbf{U}}{\textbf{E}^{H}} + D\textbf{E}{\textbf{E}^{H}} = D\textbf{E}{\textbf{E}^{H}}\\ &= D\left[ {\begin{array}{*{20}{c}} {{\textbf{I}_{M \times M}}}&{{{\textbf{0}}_{M \times \left({{N_{\mathrm{t}}} - M} \right)}}}\\ {{{\textbf{0}}_{\left({{N_{\mathrm{t}}} - M} \right) \times M}}}&{{{\textbf{0}}_{\left({{N_{\mathrm{t}}} - M} \right) \times \left({{N_{\mathrm{t}}} - M} \right)}}} \end{array}} \right], \end{aligned} \end{array} $$
(45)

where \({\textbf{I}_{M \times M}}\) represents the M×M identity matrix; \({{{\textbf {0}}_{M \times \left ({{N_{\mathrm {t}}} - M} \right)}}}\), \({{{\textbf {0}}_{\left ({{N_{\mathrm {t}}} - M} \right) \times M}}}\) and \({{{\textbf {0}}_{\left ({{N_{\mathrm {t}}} - M} \right) \times \left ({{N_{\mathrm {t}}} - M} \right)}}}\) denote the \({M \times \left ({{N_{\mathrm {t}}} - M} \right)}\), \({\left ({{N_{\mathrm {t}}} - M} \right) \times M}\) and \({\left ({{N_{\mathrm {t}}} - M} \right) \times \left ({{N_{\mathrm {t}}} - M} \right)}\) zero matrices, respectively. Defining \({{\textbf {I}_{M}} = \left [ {\begin {array}{*{20}{c}} {{\textbf {I}_{M \times M}}}&{{{\textbf {0}}_{M \times \left ({{N_{\mathrm {t}}} - M} \right)}}}\\ {{{\textbf {0}}_{\left ({{N_{\mathrm {t}}} - M} \right) \times M}}}&{{{\textbf {0}}_{\left ({{N_{\mathrm {t}}} - M} \right) \times \left ({{N_{\mathrm {t}}} - M} \right)}}} \end {array}} \right ]}\), the following expression results,

$$ \bar{\mathbf{U}}{\textbf{E}^{H}} = D \cdot {\textbf{I}_{M}}. $$
(46)

Substituting (46) into (44), we can rewrite (44) as

$$ \begin{aligned} {\stackrel{\frown}{\mathbf{h}}} &= \frac{{{\mathbf{h}\bar{\mathbf{U}}}{{\bar{\mathbf{U}}}^{H}} - D{\mathbf{h}\bar{\mathbf{U}}}{\textbf{E}^{H}} - \left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\| \cdot d\textbf{e}{{\left({\bar{\mathbf{U}} - D\textbf{E}} \right)}^{H}}}}{{\sqrt {1 - {d^{2}}} \cdot \sqrt {1 - {D^{2}}} }}\\ \\ &= \frac{{{\mathbf{h}\bar{\mathbf{U}}}{{\bar{\mathbf{U}}}^{H}} - {D^{2}} \cdot \textbf{h}{\textbf{I}_{M}} - \left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\| \cdot d\sqrt {1 - {D^{2}}} \cdot \textbf{e}{{\hat{\mathbf{U}}}^{H}}}}{{\sqrt {1 - {d^{2}}} \cdot \sqrt {1 - {D^{2}}} }}. \end{aligned} $$
(47)

7.3 C Proof of Eqs. (23) and (24)

First, we focus on Eq. (23). Assume \({\textbf {U} = \left [ {\bar {\mathbf {U}} \vdots \Delta \textbf {U}} \right ]}\), where \({\bar {\mathbf {U}}}\) is composed of the M dominating eigenvectors, while Δ U is composed of the remaining \(N_{\mathrm{t}} - M\) less dominant eigenvectors. In the proposed scheme, we choose only the M dominating eigenvectors to compose the compression matrix \({\bar {\mathbf {U}}}\), which is utilized to compress the high-dimensional CSI into a low-dimensional representation. In particular, if all \(N_{\mathrm{t}}\) eigenvectors are chosen, that is \({\bar {\mathbf {U}} = \textbf {U}}\), the distortion disappears, as given by

$$ \textbf{h} = {\textbf{hU}}{\textbf{U}^{H}}. $$
(48)

Substituting \({\textbf {U} = \left [ {\bar {\mathbf {U}} \vdots \Delta \textbf {U}} \right ]}\) into (48), we obtain

$$ \begin{aligned} \textbf{h} &= \textbf{h}\left[ {\bar{\mathbf{U}} \vdots \Delta \textbf{U}} \right]{\left[ {\bar{\mathbf{U}} \vdots \Delta \textbf{U}} \right]^{H}}\\ &= \textbf{h}\left({\bar{\mathbf{U}}{{\bar{\mathbf{U}}}^{H}} + \Delta \textbf{U}\Delta {\textbf{U}^{H}}} \right). \end{aligned} $$
(49)

As mentioned above, the choice of M dominating eigenvectors inevitably leads to information distortion. According to (49), the distortion caused by PCA can be expressed by

$$ J = \mathbb{E}\left[ {{{\left\| {\textbf{h} - {\mathbf{h}\bar{\mathbf{U}}}{{\bar{\mathbf{U}}}^{H}}} \right\|}^{2}}} \right] = \mathbb{E}\left[ {{{\left\| {\textbf{h}\Delta \textbf{U}\Delta {\textbf{U}^{H}}} \right\|}^{2}}} \right]. $$
(50)

Meanwhile,

$$ \begin{array}{l} \mathbb{E}\left[ {{{\left\| \textbf{h} \right\|}^{2}}} \right] = \mathbb{E}\left[ {{{\left\| {\textbf{h}\left({\bar{\mathbf{U}}{{\bar{\mathbf{U}}}^{H}} + \Delta \textbf{U}\Delta {\textbf{U}^{H}}} \right)} \right\|}^{2}}} \right]\\ = \mathbb{E}\left[ {\left({{\mathbf{h}\bar{\mathbf{U}}}{{\bar{\mathbf{U}}}^{H}} + \textbf{h}\Delta \textbf{U}\Delta {\textbf{U}^{H}}} \right){{\left({{\mathbf{h}\bar{\mathbf{U}}}{{\bar{\mathbf{U}}}^{H}} + \textbf{h}\Delta \textbf{U}\Delta {\textbf{U}^{H}}} \right)}^{H}}} \right]\\ = \mathbb{E}\left[ {{\mathbf{h}\bar{\mathbf{U}}}{{\bar{\mathbf{U}}}^{H}}\bar{\mathbf{U}}{{\bar{\mathbf{U}}}^{H}}{\textbf{h}^{H}} + {{\left\| {\textbf{h}\Delta \textbf{U}\Delta {\textbf{U}^{H}}} \right\|}^{2}} +} \right.\\ \left. {{\mathbf{h}\bar{\mathbf{U}}}{{\bar{\mathbf{U}}}^{H}}\Delta \textbf{U}\Delta {\textbf{U}^{H}}{\textbf{h}^{H}} + \textbf{h}\Delta \textbf{U}\Delta {\textbf{U}^{H}}\bar{\mathbf{U}}{{\bar{\mathbf{U}}}^{H}}{\textbf{h}^{H}}} \right]. \end{array} $$
(51)

Because of the orthogonality between \({\bar {\mathbf {U}}}\) and Δ U, one has

$$ {\bar{\mathbf{U}}^{H}}\Delta \textbf{U} = {{\textbf{0}}_{M \times \left({{N_{t}} - M} \right)}}; $$
(52)
$$ \Delta {\textbf{U}^{H}}\bar{\mathbf{U}} = {{\textbf{0}}_{\left({{N_{t}} - M} \right) \times M}}. $$
(53)

Therefore, Eq. (51) can be rewritten as

$$ \begin{array}{l} \begin{aligned} \mathbb{E}\left[ {{{\left\| \textbf{h} \right\|}^{2}}} \right] &= \mathbb{E}\left[ {{\mathbf{h}\bar{\mathbf{U}}}{{\bar{\mathbf{U}}}^{H}}\bar{\mathbf{U}}{{\bar{\mathbf{U}}}^{H}}{\textbf{h}^{H}} + {{\left\| {\textbf{h}\Delta \textbf{U}\Delta {\textbf{U}^{H}}} \right\|}^{2}} +} \right.\\ &\left. {\underbrace {{\mathbf{h}\bar{\mathbf{U}}}{{\bar{\mathbf{U}}}^{H}}\Delta \textbf{U}\Delta {\textbf{U}^{H}}{\textbf{h}^{H}}}_{0} + \underbrace {\textbf{h}\Delta \textbf{U}\Delta {\textbf{U}^{H}}\bar{\mathbf{U}}{{\bar{\mathbf{U}}}^{H}}{\textbf{h}^{H}}}_{0}} \right]\\ &= \mathbb{E}\left[ {{\mathbf{h}\bar{\mathbf{U}}}{{\bar{\mathbf{U}}}^{H}}\bar{\mathbf{U}}{{\bar{\mathbf{U}}}^{H}}{\textbf{h}^{H}} + {{\left\| {\textbf{h}\Delta \textbf{U}\Delta {\textbf{U}^{H}}} \right\|}^{2}}} \right]\\ &= \mathbb{E}\left[ {{{\left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\|}^{2}} + {{\left\| {\textbf{h}\Delta \textbf{U}\Delta {\textbf{U}^{H}}} \right\|}^{2}}} \right]\\ &= \mathbb{E}\left[ {{{\left\| {{\mathbf{h}\bar{\mathbf{U}}}} \right\|}^{2}}} \right] + J. \end{aligned} \end{array} $$
(54)

Since each element of the channel vector h obeys the Gaussian distribution with unit variance, \({\mathbb {E}\left [ {{{\left \| \textbf {h} \right \|}^{2}}} \right ] = {N_{\mathrm{t}}}}\). It follows that \({\mathbb {E}\left [ {{{\left \| {{\mathbf {h}\bar {\mathbf {U}}}} \right \|}^{2}}} \right ] = {N_{\mathrm{t}}} - J}\).

As for Eq. (24), \({\mathbb {E}\left [ {\textbf {h}{\textbf {I}_{M}}^{H}{\textbf {h}^{H}}} \right ]}\) can be expressed by

$$ \mathbb{E}\left[ {\textbf{h}{\textbf{I}_{M}}^{H}{\textbf{h}^{H}}} \right] = \mathbb{E}\left[ {{{\left| {{\textbf{h}_{\left(1 \right)}}} \right|}^{2}} + {{\left| {{\textbf{h}_{\left(2 \right)}}} \right|}^{2}} + \ldots + {{\left| {{\textbf{h}_{\left(M \right)}}} \right|}^{2}}} \right], $$
(55)

where \({\textbf{h}_{\left(m \right)}}\) (m = 1, 2, …, M) denotes the m-th element of h. As mentioned above, each element of the channel vector h obeys the Gaussian distribution with unit variance. Therefore,

$$ \mathbb{E}\left[ {\textbf{h}{\textbf{I}_{M}}^{H}{\textbf{h}^{H}}} \right] = M. $$
(56)
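Both expectations can be confirmed by a short Monte Carlo run (illustrative sizes assumed; each complex entry is drawn with unit variance):

```python
import numpy as np

rng = np.random.default_rng(3)
n_t, m, trials = 64, 8, 20000

# h ~ CN(0, I_{N_t}): each complex entry has unit variance (1/2 per real part).
H = (rng.standard_normal((trials, n_t)) + 1j * rng.standard_normal((trials, n_t))) / np.sqrt(2)

e_norm = np.mean(np.sum(np.abs(H) ** 2, axis=1))       # estimates E[||h||^2] = N_t
e_im = np.mean(np.sum(np.abs(H[:, :m]) ** 2, axis=1))  # h I_M^H h^H keeps first M entries
print(e_norm, e_im)
```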

References

  1. F Boccardi, RW Heath Jr., A Lozano, TL Marzetta, P Popovski, Five disruptive technology directions for 5G. IEEE Comm. Mag. 52(2), 74–80 (2014).


  2. F Rusek, D Persson, BK Lau, EG Larsson, TL Marzetta, O Edfors, F Tufvesson, Scaling up MIMO: Opportunities and challenges with very large arrays. IEEE Sig. Proc. Mag. 30(1), 40–60 (2013).


  3. X Su, J Zeng, L-P Rong, Y-J Kuang, Investigation on key technologies in large-scale MIMO. J. Comput. Sci. Technol. 28(3), 412–419 (2013).


  4. X Rao, VKN Lau, Interference alignment with partial CSI feedback in MIMO cellular networks. IEEE Trans. Sig. Proc. 62(8), 2100–2110 (2014).


  5. J Hoydis, S ten Brink, M Debbah, Massive MIMO in the UL/DL of cellular networks: how many antennas do we need? IEEE J. Select. Areas Commun. 31(2), 160–171 (2013).


  6. N Jindal, MIMO broadcast channels with finite-rate feedback. IEEE Trans. Info. Th. 52(11), 5045–5060 (2006).


  7. DJ Love, RW Heath Jr., Limited feedback diversity techniques for correlated channels. IEEE Trans. Vehicular Technol. 55(2), 718–722 (2006).


  8. P Xia, GB Giannakis, Design and analysis of transmit beamforming based on limited-rate feedback. IEEE Trans. Sig. Proc. 54(5), 1853–1863 (2006).


  9. J Choi, V Raghavan, DJ Love, Limited feedback design for the spatially correlated multi-antenna broadcast channel, 2013 IEEE Global Communications Conference (GLOBECOM), vol. 1, (Atlanta, GA, 2013).

  10. V Raghavan, RW Heath Jr., AM Sayeed, Systematic codebook designs for quantized beamforming in correlated MIMO channels. IEEE J. Select. Areas Commun. 25(7), 1298–1310 (2007).


  11. J Li, X Su, Z Zeng, Y Zhao, S Yu, L Xiao, X Xu, Codebook design for uniform rectangular arrays of massive antennas, Vehicular Technology Conference (VTC Spring), 2013 IEEE 77th, (Dresden, 2013).

  12. X Su, J Zeng, J Li, L Rong, L Liu, X Xu, J Wang, International Journal of Antennas and Propagation, vol. 2013 (Hindawi Publishing Corporation, New York, US, 2013).


  13. D Ying, FW Vook, T Thomas, DJ Love, Sub-sector-based codebook feedback for massive MIMO with 2D antenna arrays, 2014 IEEE Global Communications Conference, (Austin, TX, 2014).

  14. J Choi, Z Chance, DJ Love, U Madhow, Noncoherent trellis coded quantization: a practical limited feedback technique for massive MIMO systems. IEEE Trans. Commun. 61(12), 5016–5029 (2013).


  15. J Choi, DJ Love, T Kim, Trellis-extended codebooks and successive phase adjustment: a path from LTE-advanced to FDD massive MIMO systems. IEEE Trans. Wireless Commun. 14(4), 2007–2016 (2015).


  16. Y Han, S Wonjae, L Jungwoo, Projection based feedback compression for FDD massive MIMO systems, 2014 IEEE Globecom Workshops (GC Wkshps), (Austin, TX, 2014).

  17. PH Kuo, HT Kung, PA Ting, Compressive sensing based channel feedback protocols for spatially-correlated massive antenna arrays, 2012 IEEE Wireless Communications and Networking Conference (WCNC), (Shanghai, 2012).

  18. J Lee, SH Lee, A Compressed Analog Feedback Strategy for Spatially Correlated Massive MIMO Systems, Vehicular Technology Conference (VTC Fall), 2012 IEEE, (Quebec City, QC, 2012).

  19. P Cao, E Jorswieck, DCT and VQ based limited feedback in spatially-correlated massive MIMO systems, 2014 IEEE 8th Sensor Array and Multichannel Signal Processing Workshop (SAM), (A Coruna, 2014).

  20. MS Sim, CB Chae, Compressed channel feedback for correlated massive MIMO systems, 2014 IEEE Globecom Workshops (GC Wkshps), (Austin, TX, 2014).

  21. W Lu, X Tan, Q Liu, Y Liu, D Wang, Compressive Channel Feedback Schemes Based on Redundant Dictionary in MIMO Communication Systems. Wireless Personal Communications. 82(4), 2215–2229 (2015).


  22. MS Sim, C-B Chae, in Proc. IEEE Globecom Workshops (GC Wkshps). Compressed channel feedback for correlated massive MIMO systems (Austin, TX, 2014), pp. 327–332.

  23. JE Fowler, Compressive-projection principal component analysis. IEEE Trans. Image Process. 18(10), 2230–2242 (2009).


  24. LI Smith, A tutorial on principal components analysis. Technical report (Cornell University, USA, 2002).


  25. D Shiu, G Foschini, M Gans, J Kahn, Fading correlation and its effect on the capacity of multi-element antenna systems. IEEE Trans. Comm. 48(3), 502–513 (2000).


  26. J Nam, J-Y Ahn, A Adhikary, G Caire, Joint spatial division and multiplexing: realizing massive MIMO gains with limited channel state information, 46th Annual Conference on Information Sciences and Systems(CISS 2012) (Princeton, NJ, USA, 2012).


  27. H Yin, D Gesbert, L Cottatellucci, Dealing with interference in distributed large-scale MIMO systems: a statistical approach. IEEE Jour. Select. Topics Sig. Proc. 8(5), 942–953 (2014).


  28. A Wiesel, YC Eldar, S Shamai, Zero-forcing precoding and generalized inverses. IEEE Trans. on Sig. Proc. 56(9), 4409–4418 (2008).



Author information


Corresponding author

Correspondence to Tiankui Zhang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by the NSF of China (No. 61271177 and No. 61461029) and Fundamental Research Funds for the Central Universities (2014ZD03-01).

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Zhang, T., Ge, A., Beaulieu, N.C. et al. A limited feedback scheme for massive MIMO systems based on principal component analysis. EURASIP J. Adv. Signal Process. 2016, 64 (2016). https://doi.org/10.1186/s13634-016-0364-9

