Feedback Quantization for Linear Precoded Spatial Multiplexing

This paper gives an overview and a comparison of recent feedback quantization schemes for linear precoded spatial multiplexing systems. In addition, feedback compression methods are presented that exploit the time correlation of the channel. These methods can be roughly divided into two classes. The first class tries to minimize the data rate on the feedback link while keeping the performance constant. This class is novel and relies on entropy coding. The second class tries to optimize the performance while using the maximal data rate on the feedback link. This class is presented within the well-developed framework of finite-state vector quantization. Within this class, existing as well as novel methods are presented and compared.


INTRODUCTION
An attractive scheme to make spatial multiplexing more robust against rank-deficient channels, and to reduce the receiver complexity, is linear precoding. The linear precoding matrix is a function of the channel state information (CSI), which is, in general, only available at the receiver. Thus, the information required to calculate the precoding matrix must be fed back to the transmitter over a feedback link, which is assumed to be data-rate limited. An important way to improve the performance of linear precoded spatial multiplexing is therefore to make the best possible use of the limited data rate on the feedback link.
The notion of linear precoding was introduced in [1], where the optimal linear precoder that minimizes the symbol mean square error for linear receivers under different constraints was derived. The bit-error-rate (BER) optimal precoder was introduced in [2], and the capacity optimal precoder in [3]. The first use of partial CSI at the transmitter was presented in [4], where the Lloyd algorithm is used to quantize the CSI. Other approaches focused on feeding back the mean of the channel [5], or the covariance matrix of the channel [6]. An overview of the achievable channel capacity with limited channel knowledge can be found in [7]. Schemes that directly select a quantized precoder from a codebook at the receiver, and feed back the precoder index to the transmitter, have been independently proposed in [8,9]. There, the authors proposed to design the precoder codebooks to maximize a subspace distance between two codebook entries, a problem which is known as the Grassmannian line packing problem. The advantage of directly quantizing the precoder is that the unitary precoder matrix [1] has fewer degrees of freedom than the full CSI matrix, and is thus more efficient to quantize. Several subspace distances to design the codebooks were proposed in [10], where the selected subspace distance depends on the function used to quantize the precoding matrix. In [11], a precoder quantization design criterion that maximizes the capacity of the system was presented, together with the corresponding codebook design. A quantization function that directly minimizes the uncoded BER was proposed in [12].
This paper presents existing and novel schemes for linear precoding in the well-known vector quantization framework. We present the most popular selection and distortion criteria used for linear precoding, but also novel techniques like entropy coding and finite-state vector quantization. Further, we show how these schemes can be adapted to changing channel statistics, that is, to nonstationary sources.

Notation
We use capital boldface letters to denote matrices, for example, $\mathbf{A}$, and small boldface letters to denote vectors, for example, $\mathbf{a}$. The Frobenius norm and the 2-norm of a matrix $\mathbf{A}$ are denoted as $\|\mathbf{A}\|_F$ and $\|\mathbf{A}\|_2$, respectively. $E(\cdot)$ denotes expectation and $P(\cdot)$ probability.
$[\mathbf{A}]_{m,n}$ is the element in the $m$th row and $n$th column of $\mathbf{A}$. The $n \times n$ identity matrix is denoted as $\mathbf{I}_n$, and $\mathcal{U}^{m \times n}$ is the set of unitary $m \times n$ matrices. $\mathrm{tr}(\mathbf{A})$ is the trace of $\mathbf{A}$, and $\det(\mathbf{A})$ the determinant of $\mathbf{A}$.

SYSTEM MODEL
Throughout the paper, we assume a narrowband spatial multiplexing MIMO system with $N_T$ transmit and $N_R$ receive antennas, transmitting $N_S \le \min(N_T, N_R)$ symbol streams, as depicted in Figure 1. The system equation at time instant $n$ is
$$\mathbf{y}[n] = \mathbf{H}[n]\mathbf{F}[n]\mathbf{s}[n] + \boldsymbol{\nu}[n], \tag{1}$$
where $\mathbf{y}[n] \in \mathbb{C}^{N_R \times 1}$ is the received vector, $\boldsymbol{\nu}[n] \in \mathbb{C}^{N_R \times 1}$ is the additive noise vector, $\mathbf{s}[n] \in \mathbb{C}^{N_S \times 1}$ is the data symbol vector, $\mathbf{H}[n] \in \mathbb{C}^{N_R \times N_T}$ is the channel matrix, and $\mathbf{F}[n] \in \mathbb{C}^{N_T \times N_S}$ is the linear precoding matrix. We assume that the data symbol vector $\mathbf{s}[n]$ is zero mean, spatially and temporally white, and distributed over a complex finite alphabet, for example, the entries belong to a QAM alphabet $\mathcal{A}$, and that the noise vector $\boldsymbol{\nu}[n]$ is zero mean, spatially and temporally white, and complex Gaussian distributed. The channel matrix $\mathbf{H}[n]$ is zero mean, possibly spatially and temporally correlated, and complex Gaussian distributed. The spatial correlation can be modeled using [13],
$$\mathbf{H}[n] = \mathbf{R}_r^{1/2}\,\mathbf{H}_w[n]\,\mathbf{R}_t^{1/2},$$
where $\mathbf{R}_r$ is the receive covariance matrix, $\mathbf{R}_t$ the transmit covariance matrix, and $\mathbf{H}_w[n]$ the spatially white, possibly temporally correlated channel matrix. We assume without loss of generality that the symbols and the noise have unit variance.
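The singular value decomposition (SVD) of $\mathbf{H}[n]$ is defined as
$$\mathbf{H}[n] = \mathbf{U}[n]\boldsymbol{\Sigma}[n]\mathbf{V}^H[n].$$
Many studies have been carried out to derive the optimal precoding matrix for a certain performance measure, see [1][2][3]14]. In general, the optimal precoding matrix looks like
$$\mathbf{F}[n] = \mathbf{V}[n]\boldsymbol{\Theta}[n]\mathbf{M}[n],$$
where $\mathbf{V}[n]$ is restricted to the $N_S$ right singular vectors related to the largest singular values, $\boldsymbol{\Theta}[n] \in \mathbb{C}^{N_S \times N_S}$ is a diagonal power loading matrix, and $\mathbf{M}[n] \in \mathcal{U}^{N_S \times N_S}$ is a unitary mixing matrix. For some performance measures, the mixing matrix is arbitrary, whereas for other performance measures its value matters.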
In any case, it has been shown that for low-rate feedback channels, it is better not to feed back the power loading matrix and to stick to feeding back a unitary precoder [15]. That is why we will limit the precoding matrix $\mathbf{F}[n]$ to be unitary, that is, $\mathbf{F}[n] \in \mathcal{U}^{N_T \times N_S}$. The maximum data rate on the feedback link is $R$ bits per channel use, and the feedback is assumed to be instantaneous and error free. We consider two different types of feedback channels: a dedicated feedback channel and a nondedicated feedback channel. A dedicated feedback channel is only used to transmit the precoder index to the transmitter, whereas a nondedicated feedback channel is also used for data transmission. The transmission is organized in a blockwise fashion, that is, feedback is only possible at the beginning of each new block, and every block has a duration of $T_f$. We assume the channel is perfectly known at the beginning of every block.

VECTOR QUANTIZATION
The data-rate-limited feedback link requires quantization of the channel matrix, resulting in a unitary precoder. The simplest approach is to use memoryless VQ, which quantizes every channel matrix $\mathbf{H}[n]$ separately. Hence, we can drop the time index $n$ everywhere in this section. In memoryless VQ, we select a unitary $N_T \times N_S$ matrix $\mathbf{F}_i$ from a codebook $\mathcal{C} = \{\mathbf{F}_1, \ldots, \mathbf{F}_K\}$ that minimizes or maximizes a given selection function $S$. We will denote $Q(\mathbf{H})$ as the quantized version of the channel matrix, but note that it actually represents the unitary precoder. More specifically, for a given selection function $S$ and a given codebook $\mathcal{C}$, $Q(\mathbf{H})$ can be defined as
$$Q(\mathbf{H}) = \arg\operatorname*{min/max}_{\mathbf{F} \in \mathcal{C}} S(\mathbf{H}, \mathbf{F}),$$
where we take the minimum or the maximum depending on the selection function $S$. The quantization process can be further separated into an encoding step and a decoding step. The encoder $\alpha$ maps the channel into one of $K$ precoder indices, which for simplicity can be represented by the set $\mathcal{I} = \{1, 2, \ldots, K\}$:
$$\alpha : \mathbb{C}^{N_R \times N_T} \to \mathcal{I}.$$
The decoder $\beta$ simply maps the precoder index into one of the $K$ precoders:
$$\beta : \mathcal{I} \to \mathcal{C} : i \mapsto \mathbf{F}_i.$$
So we actually have $Q(\mathbf{H}) = \beta(\alpha(\mathbf{H}))$. Note that the index $i \in \mathcal{I}$ is transmitted over the feedback channel as a bitword $w_i$. What type of bitwords we have to feed back strongly depends on the type of feedback link: dedicated or nondedicated. In case of a nondedicated feedback channel, the transmitter has to be able to differentiate between a bitword and the data. This means the bitwords should be instantaneously decodable and thus prefix-free (PF), that is, a bitword cannot contain any other bitword as a prefix. This is not required for a dedicated feedback channel, where we can use non-prefix-free (NPF) bitwords. If the quantizer is well designed, all precoders $\mathbf{F}_i$ have more or less the same probability. Under that assumption, we can think of two ways to design our bitwords $w_i$. For a nondedicated feedback link, we can take $K$ equal-length PF bitwords, leading to a feedback rate of $\log_2 K$ bits per channel use. For a dedicated feedback link, however, we can take any $K$ bitwords with the smallest average length, leading to an average feedback rate of $\frac{1}{K}\sum_{i=1}^{K} \lfloor \log_2 i \rfloor$ bits per channel use. An example is given in Table 1, where we assume a codebook with $K = 4$ entries. Next we focus on a number of selection functions for linear precoding, and we discuss the design of precoder codebooks.
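The two bitword assignments described above can be illustrated with a short sketch. It assumes that the $K$ shortest NPF bitwords include the empty word, which is what an average length of $\frac{1}{K}\sum_i \lfloor \log_2 i \rfloor$ implies; the function names are hypothetical and the contents of Table 1 are not reproduced here.

```python
from math import ceil, log2

def pf_bitwords(K):
    """K equal-length bitwords; equal length makes them prefix-free."""
    width = max(1, ceil(log2(K)))
    return [format(i, f"0{width}b") for i in range(K)]

def npf_bitwords(K):
    """The K shortest bitwords: '', '0', '1', '00', '01', ...

    The i-th word (i = 1..K) is the binary expansion of i without its
    leading 1, so its length is floor(log2(i)).
    """
    return [format(i, "b")[1:] for i in range(1, K + 1)]

def average_length(words):
    """Average number of bits, assuming equiprobable precoders."""
    return sum(len(w) for w in words) / len(words)

K = 4
pf = pf_bitwords(K)
npf = npf_bitwords(K)
print(pf, average_length(pf))    # ['00', '01', '10', '11'] 2.0
print(npf, average_length(npf))  # ['', '0', '1', '00'] 1.0
```

For $K = 4$, the dedicated link thus needs on average 1 bit per channel use instead of 2, at the price of bitwords that are not instantaneously decodable.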

Precoder selection
In this section, we will give an overview of some common selection functions $S$ that have been proposed in recent literature. Whether we have to minimize or to maximize the selection function will be clear from the context. In [10], selection criteria are derived based on different performance measures. Optimizing the performance of the maximum likelihood (ML) receiver is related to maximizing the minimum Euclidean distance between any two possible noiseless received vectors:
$$S_{ML}(\mathbf{H}, \mathbf{F}) = \min_{\mathbf{s}_1, \mathbf{s}_2 \in \mathcal{A}^{N_S},\ \mathbf{s}_1 \ne \mathbf{s}_2} \|\mathbf{H}\mathbf{F}(\mathbf{s}_1 - \mathbf{s}_2)\|_2.$$
For linear receivers, two performance measures are considered in [10]: the minimum SNR on the substreams and the trace or determinant of the MSE matrix. Maximizing the first measure for the zero forcing (ZF) receiver is related to maximizing the minimal singular value (MSV) of the effective channel $\mathbf{H}\mathbf{F}$:
$$S_{MSV}(\mathbf{H}, \mathbf{F}) = \lambda_{\min}\{\mathbf{H}\mathbf{F}\},$$
where $\lambda_{\min}\{\mathbf{A}\}$ denotes the MSV of the matrix $\mathbf{A}$. Minimizing the second measure for the minimum mean square error (MMSE) receiver leads to minimizing the following selection function:
$$S_{MSE}(\mathbf{H}, \mathbf{F}) = m\big[(\mathbf{I}_{N_S} + \mathbf{F}^H\mathbf{H}^H\mathbf{H}\mathbf{F})^{-1}\big],$$
where $m = \mathrm{tr}$ or $m = \det$. Finally, [10] also proposes to maximize the mutual information (MI) between the transmitted symbol vector $\mathbf{s}$ and the received symbol vector $\mathbf{y}$ over the effective channel $\mathbf{H}\mathbf{F}$:
$$S_{MI}(\mathbf{H}, \mathbf{F}) = \log_2 \det(\mathbf{I}_{N_S} + \mathbf{F}^H\mathbf{H}^H\mathbf{H}\mathbf{F}).$$
It has been shown in [10] that the above performance measures can be associated with a subspace distance between the right singular vectors of $\mathbf{H}$, collected in $\mathbf{V}$, and $\mathbf{F}$.
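As such, this subspace distance could also be used as a selection function to be minimized. The performance of the ML receiver, the minimum SNR on the substreams for the ZF receiver, and the trace of the MSE matrix for the MMSE receiver are all related to the projection 2-norm distance:
$$d_{P2}(\mathbf{V}, \mathbf{F}) = \|\mathbf{V}\mathbf{V}^H - \mathbf{F}\mathbf{F}^H\|_2,$$
whereas the determinant of the MSE matrix for the MMSE receiver and the MI criterion can be connected to the Fubini-Study distance:
$$d_{FS}(\mathbf{V}, \mathbf{F}) = \arccos\big|\det(\mathbf{V}^H\mathbf{F})\big|.$$
Next to minimizing those subspace distances, minimizing the chordal distance is also used as a selection criterion,
$$d_C(\mathbf{V}, \mathbf{F}) = \frac{1}{\sqrt{2}}\|\mathbf{V}\mathbf{V}^H - \mathbf{F}\mathbf{F}^H\|_F = \sqrt{N_S - \mathrm{tr}(\mathbf{V}^H\mathbf{F}\mathbf{F}^H\mathbf{V})}. \tag{13}$$
This function is related to the performance of an orthogonal space-time block code (OSTBC) that is used on top of the precoder [16]. For all the above selection criteria (for the ML criterion this is only approximately true), the optimal unitary precoder is given by $\mathbf{V}\mathbf{M}$, where $\mathbf{M}$ is an arbitrary $N_S \times N_S$ unitary matrix, that is, $\mathbf{M} \in \mathcal{U}^{N_S \times N_S}$. This unitary ambiguity can be a problem when we are interested in other performance measures, such as the uncoded bit-error-rate (BER). We know that in that case, the actual structure of the ambiguity matrix becomes important [12]. One solution could of course be to simply minimize the BER:
$$S_{BER}(\mathbf{H}, \mathbf{F}) = \mathrm{BER}(\mathbf{H}, \mathbf{F}). \tag{14}$$
However, this is often difficult to compute. A simpler solution might be to encode $\mathbf{V}$ using VQ and to adopt the optimal (or a suboptimal) unitary mixing matrix $\mathbf{M}$ according to [12]. Hence, in that case we do not use $\mathbf{F}_i$ but $\mathbf{F}_i\mathbf{M}$ as a precoder at the transmitter. We could encode $\mathbf{V}$ for instance by minimizing the Frobenius norm between $\mathbf{V}$ and $\mathbf{F}$ [16]:
$$d_F(\mathbf{V}, \mathbf{F}) = \|\mathbf{V} - \mathbf{F}\|_F = \sqrt{2N_S - 2\,\mathrm{Re}\,\mathrm{tr}(\mathbf{V}^H\mathbf{F})}.$$
This selection function is however not invariant to a phase shift of the singular vectors collected in $\mathbf{V}$. That is why the Frobenius norm has been extended to the so-called modified Frobenius norm [17],
$$d_{MF}(\mathbf{V}, \mathbf{F}) = \min_{\boldsymbol{\Phi} \in \mathcal{D}_{N_S}} \|\mathbf{V}\boldsymbol{\Phi} - \mathbf{F}\|_F = \sqrt{2N_S - 2\sum_{i=1}^{N_S}\big|[\mathbf{V}^H\mathbf{F}]_{i,i}\big|},$$
where $\mathcal{D}_n \subset \mathcal{U}^{n \times n}$ is the set of all diagonal unitary $n \times n$ matrices. Notice how through the use of the real or absolute value of $\mathbf{V}^H\mathbf{F}$, instead of the product $\mathbf{V}^H\mathbf{F}\mathbf{F}^H\mathbf{V}$ in (13), we truly encode $\mathbf{V}$ instead of its subspace. Let us now discuss the codebook design.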
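As an illustration of how a receiver could evaluate some of these selection functions, the following NumPy sketch implements the MSV, MI, and chordal distance criteria together with the memoryless quantizer $Q(\mathbf{H})$. It assumes unit-variance symbols and noise as in the system model; the function names and the random codebook are hypothetical and only serve the example.

```python
import numpy as np

def random_semi_unitary(nt, ns, rng):
    """Random N_T x N_S matrix with orthonormal columns (QR of a Gaussian matrix)."""
    A = rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))
    return np.linalg.qr(A)[0]

def dominant_right_singular_vectors(H, ns):
    """The N_S right singular vectors of H with the largest singular values."""
    return np.linalg.svd(H)[2].conj().T[:, :ns]

def s_msv(H, F):
    """Minimal singular value of the effective channel HF (to be maximized)."""
    return np.linalg.svd(H @ F, compute_uv=False).min()

def s_mi(H, F):
    """Mutual information log2 det(I + F^H H^H H F) (to be maximized)."""
    G = H @ F
    _, logdet = np.linalg.slogdet(np.eye(F.shape[1]) + G.conj().T @ G)
    return logdet / np.log(2)

def s_chordal(H, F):
    """Chordal distance between the subspaces of V and F (to be minimized)."""
    V = dominant_right_singular_vectors(H, F.shape[1])
    return np.linalg.norm(V @ V.conj().T - F @ F.conj().T, "fro") / np.sqrt(2)

def quantize(H, codebook, S, maximize):
    """Memoryless VQ: index of the codebook entry optimizing S(H, F)."""
    scores = [S(H, F) for F in codebook]
    return int(np.argmax(scores)) if maximize else int(np.argmin(scores))

rng = np.random.default_rng(0)
n_r, n_t, n_s = 2, 4, 2
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
V = dominant_right_singular_vectors(H, n_s)
# Seven random precoders plus the unquantized optimum V as the last entry.
codebook = [random_semi_unitary(n_t, n_s, rng) for _ in range(7)] + [V]
idx_msv = quantize(H, codebook, s_msv, maximize=True)
idx_mi = quantize(H, codebook, s_mi, maximize=True)
idx_chordal = quantize(H, codebook, s_chordal, maximize=False)
```

Because the codebook in this toy example contains $\mathbf{V}$ itself, all three criteria select the last entry, which reflects the statement that $\mathbf{V}\mathbf{M}$ is optimal for these criteria.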

Codebook design
In general, a codebook design aims at finding a set of precoders $\mathcal{C}$ that minimizes some average distortion,
$$\min_{\mathcal{C}} E\big[D(\mathbf{H}, Q(\mathbf{H}))\big] = \min_{\mathcal{C}} \int D(\mathbf{H}, Q(\mathbf{H}))\, p(\mathbf{H})\, d\mathbf{H}, \tag{17}$$
where $D(\mathbf{H}, Q(\mathbf{H}))$ is the distortion between $\mathbf{H}$ and $Q(\mathbf{H})$, and $p(\mathbf{H})$ is the probability density function (PDF) of the channel matrix $\mathbf{H}$. The distortion function $D$ can take many different forms depending on the performance measure we are interested in (as was the case for the selection function). In [10], it has been shown that if we are interested in the performance of the ML receiver, the minimum SNR on the substreams for the ZF receiver, or the trace of the MSE matrix for the MMSE receiver, we can take as distortion function the squared projection 2-norm distance between $\mathbf{V}$ and $Q(\mathbf{H})$:
$$D_{P2}(\mathbf{H}, Q(\mathbf{H})) = \|\mathbf{V}\mathbf{V}^H - Q(\mathbf{H})Q(\mathbf{H})^H\|_2^2.$$
On the other hand, if we care about the determinant of the MSE matrix for the MMSE receiver or the MI, we should take the squared Fubini-Study distance between $\mathbf{V}$ and $Q(\mathbf{H})$ as distortion function,
$$D_{FS}(\mathbf{H}, Q(\mathbf{H})) = \arccos^2\big|\det(\mathbf{V}^H Q(\mathbf{H}))\big|.$$
Finally, the distortion function related to the performance of an orthogonal space-time block code (OSTBC) that is used on top of the precoder is presented in [16] as the squared chordal distance,
$$D_C(\mathbf{H}, Q(\mathbf{H})) = \frac{1}{2}\|\mathbf{V}\mathbf{V}^H - Q(\mathbf{H})Q(\mathbf{H})^H\|_F^2.$$
The reason why squared subspace distances are used as distortion functions (and not the performance measures themselves) is that they lead to simpler design procedures, as detailed later on.
In [11], an alternative and more exact distortion measure for the MI is proposed, namely, the capacity loss introduced by quantization (18), whose expression is given in [11]. Note that this distortion function converges to the squared chordal distance $D_C$ when the diagonal elements of $\boldsymbol{\Sigma}^2$ go to infinity. All the above distortion functions are invariant to a right multiplication of the precoder with a unitary matrix. As already indicated in the previous section, this could create a problem when performance measures like the uncoded BER are considered. Taking the distortion function equal to the BER, that is, $D_{BER}(\mathbf{H}, Q(\mathbf{H})) = \mathrm{BER}(\mathbf{H}, Q(\mathbf{H}))$, leads to a difficult codebook design. But as before, we could take the squared Frobenius norm or squared modified Frobenius norm between $\mathbf{V}$ and $Q(\mathbf{H})$ as a distortion function to solve this complexity problem,
$$D_F(\mathbf{H}, Q(\mathbf{H})) = \|\mathbf{V} - Q(\mathbf{H})\|_F^2, \qquad D_{MF}(\mathbf{H}, Q(\mathbf{H})) = \min_{\boldsymbol{\Phi} \in \mathcal{D}_{N_S}} \|\mathbf{V}\boldsymbol{\Phi} - Q(\mathbf{H})\|_F^2.$$
In this case, our goal is again to feed back $\mathbf{V}$, and we will not use the precoder $Q(\mathbf{H})$ but $Q(\mathbf{H})\mathbf{M}$ at the transmitter, where $\mathbf{M}$ is the optimal (or a suboptimal) unitary mixing matrix [12]. Now, the question is how we can solve (17) for a certain distortion function. We can basically distinguish between three different approaches: Grassmannian subspace packing, the generalized Lloyd (GL) algorithm, and the Monte-Carlo (MC) algorithm.

Grassmannian subspace packing
In case the distortion function is a subspace distance and the channel is spatially white, we can simplify (17) by means of a Grassmannian subspace packing problem. In such a problem, the objective is to find a set of unitary precoders that maximizes the minimal subspace distance between them [10,16],
$$\mathcal{C} = \arg\max_{\mathcal{C}}\ \min_{\mathbf{F}_i, \mathbf{F}_j \in \mathcal{C},\ i \ne j} d(\mathbf{F}_i, \mathbf{F}_j), \tag{19}$$
where $d$ is any of the subspace distances we discussed above.
Of course, such a codebook can also be used when the channel is not spatially white, but the performance will decrease with an increased spatial correlation of the channel.

Generalized Lloyd algorithm
The generalized Lloyd (GL) algorithm tries to solve (17) by iteratively optimizing the encoder and the decoder [18,19]. For a given decoder $\beta$, the encoder is optimized by taking the precoder index leading to the smallest distortion (the so-called nearest neighbor condition):
$$\alpha(\mathbf{H}) = \arg\min_{i \in \mathcal{I}} D(\mathbf{H}, \beta(i)),$$
thereby splitting the space of channel matrices into $K$ channel regions $\mathcal{R}_i$, $i \in \mathcal{I}$:
$$\mathcal{R}_i = \{\mathbf{H} : \alpha(\mathbf{H}) = i\}.$$
On the other hand, for a given encoder $\alpha$, the decoder $\beta$ is optimized by taking the centroid of the related channel region (the so-called centroid condition),
$$\beta(i) = \arg\min_{\mathbf{F} \in \mathcal{U}^{N_T \times N_S}} E\big[D(\mathbf{H}, \mathbf{F}) \mid \mathbf{H} \in \mathcal{R}_i\big].$$
Although this has not been rigorously proven, the GL algorithm converges to a local minimum, which is not necessarily the global minimum. To avoid working with the continuous channel distribution, the GL algorithm makes use of a set of training channels $\mathcal{T} = \{\mathbf{H}^{(r)}\}$, where $r$ is the realization index. This set can be interpreted as the discrete channel distribution that approximates the continuous one. The more training vectors in the set, the better the approximation.
Computing the exact centroid based on T is not always easy [20].For the squared subspace distances as well as the capacity loss distortion function in (18), closed form expressions for the centroid exist.However, for the BER and even the squared Frobenius norm or squared modified Frobenius norm, a closed form expression does not exist.
For those distortion functions, we simply apply a brute force (approximate) centroid computation by exhaustively searching the best possible candidate among the set of matrices V (r) for which H (r) belongs to the related region.
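The two GL conditions can be sketched for the squared chordal distance, for which the centroid of a region is known to be given by the $N_S$ dominant eigenvectors of the sample sum $\sum_r \mathbf{V}^{(r)}\mathbf{V}^{(r)H}$ over that region. The code below is a minimal illustration under that choice of distortion and for a spatially white training set; the function names, the training set size, and the handling of empty regions (the old codeword is kept) are assumptions of this sketch.

```python
import numpy as np

def random_semi_unitary(nt, ns, rng):
    A = rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))
    return np.linalg.qr(A)[0]

def sq_chordal(V, F):
    """Squared chordal distance: N_S - ||V^H F||_F^2."""
    return V.shape[1] - np.linalg.norm(V.conj().T @ F, "fro") ** 2

def avg_distortion(train, codebook):
    """Average distortion over the training set under nearest neighbor encoding."""
    return float(np.mean([min(sq_chordal(V, F) for F in codebook) for V in train]))

def generalized_lloyd(train, codebook, iterations=10):
    codebook = list(codebook)
    ns = codebook[0].shape[1]
    for _ in range(iterations):
        # Nearest neighbor condition: assign every training point to a region.
        labels = [int(np.argmin([sq_chordal(V, F) for F in codebook])) for V in train]
        # Centroid condition: dominant eigenvectors of the region's sample sum.
        for i in range(len(codebook)):
            region = [V for V, label in zip(train, labels) if label == i]
            if not region:
                continue  # empty region: keep the old codeword
            R = sum(V @ V.conj().T for V in region)
            _, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
            codebook[i] = eigvecs[:, -ns:]
    return codebook

rng = np.random.default_rng(0)
n_r, n_t, n_s, K = 2, 4, 2, 4
train = []
for _ in range(200):
    H = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
    train.append(np.linalg.svd(H)[2].conj().T[:, :n_s])
initial = [random_semi_unitary(n_t, n_s, rng) for _ in range(K)]
designed = generalized_lloyd(train, initial)
d_before = avg_distortion(train, initial)
d_after = avg_distortion(train, designed)
```

Since each of the two steps cannot increase the average distortion on the training set, `d_after` is never larger than `d_before`, although the result is only a local optimum that depends on the initial codebook.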

Monte-Carlo algorithm
Another interesting approach is the pure Monte-Carlo based design. Instead of trying to optimize an existing codebook, this design randomly generates codebooks, checks the average distortion (17) of these codebooks, and keeps the best one. As for the GL algorithm, we will make use of the set of training channels $\mathcal{T}$ to approximate the continuous channel distribution. Although this algorithm becomes computationally expensive for large dimensions, for small dimensions we have observed that the MC algorithm is a very good alternative to Grassmannian subspace packing or the GL algorithm.
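A minimal sketch of such a Monte-Carlo design is given below. It uses the squared chordal distance as distortion and draws candidate precoders from the QR decomposition of complex Gaussian matrices; both choices, as well as the function names and the number of trials, are assumptions made for illustration only.

```python
import numpy as np

def random_semi_unitary(nt, ns, rng):
    A = rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))
    return np.linalg.qr(A)[0]

def sq_chordal(V, F):
    return V.shape[1] - np.linalg.norm(V.conj().T @ F, "fro") ** 2

def avg_distortion(train, codebook):
    return float(np.mean([min(sq_chordal(V, F) for F in codebook) for V in train]))

def monte_carlo_design(train, K, trials, rng):
    """Generate random codebooks and keep the one with the lowest average distortion."""
    nt, ns = train[0].shape
    best, best_d = None, np.inf
    for _ in range(trials):
        candidate = [random_semi_unitary(nt, ns, rng) for _ in range(K)]
        d = avg_distortion(train, candidate)
        if d < best_d:
            best, best_d = candidate, d
    return best, best_d

rng = np.random.default_rng(0)
train = []
for _ in range(200):
    H = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))
    train.append(np.linalg.svd(H)[2].conj().T[:, :2])

# Same seed for both runs, so the first candidate codebook is identical.
_, d_one_trial = monte_carlo_design(train, K=4, trials=1, rng=np.random.default_rng(1))
best, d_many_trials = monte_carlo_design(train, K=4, trials=20, rng=np.random.default_rng(1))
```

The cost grows linearly with the number of trials and with the size of the training set, which is consistent with the remark that the approach is mainly attractive for small dimensions.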

FEEDBACK COMPRESSION THROUGH ENTROPY CODING
This section explores methods to compress the feedback requirements on the feedback link, without sacrificing performance. It uses variable-rate codes to encode highly probable precoder matrices with short bitwords and less probable precoder matrices with longer bitwords. This is called entropy coding [18]. However, as we already indicated in Section 3, if the memoryless VQ is well designed, all precoders $\mathbf{F}_i$ have more or less the same probability. We therefore try to exploit the time correlation of the channel and make use of the transition probabilities between precoders instead of the occurrence probabilities. Hence, instead of assigning a bitword $w_i$ to a precoder $\mathbf{F}_i$, we assign a bitword $w_{i,j}$ to a precoder $\mathbf{F}_i$ if the previous precoder was the precoder $\mathbf{F}_j$. Our goal then is to minimize, for every previous precoder $\mathbf{F}_j$, the average length
$$\sum_{i=1}^{K} p_{i|j}\, l(w_{i,j}), \tag{23}$$
where $l(w_{i,j})$ is the length of the bitword $w_{i,j}$ and $p_{i|j} = P\big(Q(\mathbf{H}[n]) = \mathbf{F}_i \mid Q(\mathbf{H}[n-1]) = \mathbf{F}_j\big)$ is the transition probability from $\mathbf{F}_j$ to $\mathbf{F}_i$. Depending on the type of feedback channel, we obtain a different solution for (23). For a nondedicated feedback link, or in other words for PF bitwords, the solution of (23) is given by the Huffman code [21]. For a dedicated feedback link, or in other words for NPF bitwords, the solution of (23) is simply given by selecting any $K$ bitwords with the smallest possible average length, and assigning the longest (shortest) bitwords to the lowest (highest) transition probabilities. An example of a codebook for a dedicated feedback link and a nondedicated feedback link is depicted in Table 2. The transition probabilities are estimated through Monte-Carlo simulations. This example assumes that the previous quantized precoder is $Q(\mathbf{H}[n-1]) = \mathbf{F}_8$. Due to the time correlation of the channel, the most probable precoder in this example at time instant $n$ is then again $\mathbf{F}_8$. Thus, the most probable precoder matrix $\mathbf{F}_8$ gets a short bitword assigned, whereas the precoders with lower probabilities get longer bitwords assigned.
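Both bitword assignments can be sketched for one row of transition probabilities, that is, for one fixed previous precoder. The probabilities below are invented for illustration and are not the values of Table 2; the function names are hypothetical.

```python
import heapq
from itertools import count
from math import log2

def huffman_code(probs):
    """Prefix-free bitwords (nondedicated link) from a standard Huffman merge."""
    tie = count()  # tie-breaker so that dictionaries are never compared
    heap = [(p, next(tie), {i: ""}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, words0 = heapq.heappop(heap)
        p1, _, words1 = heapq.heappop(heap)
        merged = {i: "0" + w for i, w in words0.items()}
        merged.update({i: "1" + w for i, w in words1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

def npf_code(probs):
    """Shortest bitwords ('', '0', '1', '00', ...) by decreasing probability."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    return {i: format(rank + 1, "b")[1:] for rank, i in enumerate(order)}

def average_length(probs, code):
    return sum(p * len(code[i]) for i, p in enumerate(probs))

# Hypothetical transition probabilities from F_8 (index 7) to F_1..F_8.
p_row = [0.05, 0.05, 0.10, 0.10, 0.05, 0.10, 0.05, 0.50]
pf = huffman_code(p_row)
npf = npf_code(p_row)
entropy = -sum(p * log2(p) for p in p_row)
print(average_length(p_row, pf), average_length(p_row, npf), entropy)
```

In this toy row, both codes spend far fewer than the $\log_2 8 = 3$ bits of the equal-length assignment, and the NPF code is shorter still because it does not need to be instantaneously decodable.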
Please note that for OFDM, where several precoder matrices for different tones are transmitted at the same time instant, the individual precoding matrices do not need to be instantaneously decodable.They can be jointly encoded, for example, through the use of arithmetic coding.
The scheme can be extended to incorporate error correcting codes to make it robust against errors on the feedback channel.
The above techniques rely on the exact knowledge or the knowledge of the order of the transition probabilities between the past precoder $Q(\mathbf{H}[n-1])$ and the actual precoder $Q(\mathbf{H}[n])$. Unfortunately, a closed form expression of the transition probabilities is not known, and is difficult to derive due to the nonlinearity of the quantization. For the special case of known channel statistics, they can be estimated offline through a Monte-Carlo approach [22]. However, in practice the underlying channel statistics are unknown, or are changing at runtime. The next section provides a solution to this problem.

Adaptive entropy coding
In [23], we introduced a novel scheme to adaptively estimate the transition probabilities. The presented scheme is able to estimate the transition probabilities at runtime, and to adapt to changing channel statistics. The algorithm starts by assuming that all the different transitions are equiprobable. Then it counts the different transitions at both the decoder and the encoder, and updates the transition probabilities after each new feedback. Assuming a transition from the precoder $\mathbf{F}_j$ to the precoder $\mathbf{F}_k$ happens, the estimate of the corresponding transition probability is increased and the estimates of the other transitions from $\mathbf{F}_j$ are decreased, with a step size governed by a factor $N$ [23]. The factor $N$ controls how fast or how accurately the probabilities are estimated. Larger values of $N$ lead to a smaller increase or decrease after each iteration, and thus, to a slower, but more accurate estimation. Instead of updating the transition probabilities, one can also directly update the Huffman code, in the case of a nondedicated feedback link [24][25][26]. However, the effect is very similar to the two-step approach of first updating the transition probabilities and then computing the new Huffman code.
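The behavior described above can be mimicked with a simple recursive estimator. The update rule used below, $p_{i|j} \leftarrow (N p_{i|j} + \delta_{i,k})/(N+1)$, is an assumption chosen because it starts from equiprobable transitions, keeps every row normalized, and takes smaller steps for larger $N$; it is not claimed to be the exact rule of [23]. The class name is hypothetical.

```python
class TransitionEstimator:
    """Runtime estimate of the transition probabilities between precoders.

    The encoder and the decoder each hold one instance and apply the same
    update after every feedback, so both sides keep identical estimates.
    """

    def __init__(self, K, N):
        self.N = N
        # p[j][i] estimates P(current = F_i | previous = F_j); start equiprobable.
        self.p = [[1.0 / K] * K for _ in range(K)]

    def update(self, j, k):
        """Record an observed transition from precoder j to precoder k."""
        row = self.p[j]
        for i in range(len(row)):
            hit = 1.0 if i == k else 0.0
            # Assumed rule: the observed transition grows, the others shrink.
            row[i] = (self.N * row[i] + hit) / (self.N + 1)

encoder_side = TransitionEstimator(K=4, N=10)
decoder_side = TransitionEstimator(K=4, N=10)
for _ in range(200):
    # Both sides see the same index sequence, here always 0 -> 2.
    encoder_side.update(0, 2)
    decoder_side.update(0, 2)
```

After each update, the bitwords for the affected row can be recomputed from the new estimates, for example with a Huffman code on a nondedicated link.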

FINITE-STATE VECTOR QUANTIZATION (FSVQ)
In this section, we will look at a number of methods to improve the performance by exploiting the maximal data rate of $R$ bits per channel use on the feedback channel. We will present the different methods in the well-developed framework of finite-state vector quantization (FSVQ), and we closely follow [18].
Before introducing FSVQ, let us consider a so-called switched VQ, consisting of a finite number of memoryless VQs and a classifier that periodically decides which memoryless VQ is best and feeds back the index of this VQ to the decoder. The decision of the classifier is generally based on an estimate of the statistics of the channel. An example of this approach is given in [27], where the different memoryless VQ codebooks are constructed by rotating and scaling a specific root codebook. The drawback of this approach is of course the additional feedback overhead due to the fact that the classifier periodically feeds back the index of the best memoryless VQ.
FSVQ solves this problem since it does not require any additional side information. An FSVQ has a built-in mechanism to determine which of the memoryless VQs should be used to transform the current channel into a quantization index. It is the current state that determines which memoryless VQ to employ, and that is why the related codebook is called the state codebook. The current state together with the obtained quantization index then determines the next state through the so-called next-state function. This is explained in more detail next.
Suppose we have a set of $K$ states, which without loss of generality can be denoted as $\mathcal{S} = \{1, 2, \ldots, K\}$. Every state $s \in \mathcal{S}$ is related to a state codebook $\mathcal{C}_s = \{\mathbf{F}_{1,s}, \mathbf{F}_{2,s}, \ldots, \mathbf{F}_{N,s}\}$. The encoder $\alpha$ maps the current channel and state into one of $N$ quantization indices, which for simplicity can be represented by the set $\mathcal{I} = \{1, 2, \ldots, N\}$. Assume for instance that at time instant $n$ the channel and state are given by $\mathbf{H}[n]$ and $s[n]$, respectively; then we can describe our encoder as
$$i[n] = \alpha(\mathbf{H}[n], s[n]) = \arg\operatorname*{min/max}_{i \in \mathcal{I}} S(\mathbf{H}[n], \mathbf{F}_{i,s[n]}), \tag{25}$$
where $S$ is one of the selection functions described in Section 3.1. The decoder $\beta$ simply maps the current quantization index and state into one of the $N$ precoders of the related state codebook. Assume for instance that at time instant $n$ the quantization index and state are given by $i[n]$ and $s[n]$, respectively; then our decoder can be expressed as
$$\beta(i[n], s[n]) = \mathbf{F}_{i[n],s[n]}.$$
So the overall quantization procedure can be written as
$$Q(\mathbf{H}[n], s[n]) = \beta\big(\alpha(\mathbf{H}[n], s[n]), s[n]\big).$$
Finally, we need a mechanism that tells us how to go from one state to the next. This is obtained by the next-state function.
Keeping in mind that both the encoder and decoder should be able to track the state, the next-state function $f$ can only be guided by the quantization index. Assume that at time instant $n$ the current quantization index and state are given by $i[n]$ and $s[n]$, respectively; then the next-state function can be expressed as follows:
$$s[n+1] = f(i[n], s[n]).$$
An FSVQ is now completely determined by the state space $\mathcal{S} = \{1, 2, \ldots, K\}$, the state codebooks $\mathcal{C}_s = \{\mathbf{F}_{1,s}, \mathbf{F}_{2,s}, \ldots, \mathbf{F}_{N,s}\}$ for all $s \in \mathcal{S}$, the next-state function $f$, and the initial state $s[0]$. Note that the union of all state codebooks is called the super codebook $\mathcal{C} = \bigcup_{s \in \mathcal{S}} \mathcal{C}_s$, which contains no more than $KN$ precoders.
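The interplay between encoder, decoder, and next-state function can be illustrated with a small skeleton. To keep the example short, scalars stand in for channels and precoders and the selection function is an absolute difference to be minimized; these toy choices, the class name, and the 0-based states are assumptions of this sketch. What matters is that only the index is fed back while both sides track the same state.

```python
class FSVQ:
    """One side (encoder or decoder) of a finite-state vector quantizer."""

    def __init__(self, state_codebooks, next_state, initial_state=0):
        self.state_codebooks = state_codebooks  # state_codebooks[s][i] = F_{i,s}
        self.next_state = next_state            # next_state[s][i] = f(i, s)
        self.state = initial_state

    def encode(self, H, selection):
        """Pick the index minimizing the selection function in the current state."""
        codebook = self.state_codebooks[self.state]
        i = min(range(len(codebook)), key=lambda k: selection(H, codebook[k]))
        self.state = self.next_state[self.state][i]
        return i

    def decode(self, i):
        """Map the received index to a precoder and move to the next state."""
        F = self.state_codebooks[self.state][i]
        self.state = self.next_state[self.state][i]
        return F

# Toy labeled-state example: K = 4 states, N = 3 entries per state codebook.
labels = [0.0, 1.0, 2.0, 3.0]
next_state = [[s, (s + 1) % 4, (s - 1) % 4] for s in range(4)]
state_codebooks = [[labels[t] for t in row] for row in next_state]

encoder = FSVQ(state_codebooks, next_state)
decoder = FSVQ(state_codebooks, next_state)
channels = [0.1, 0.9, 2.2, 2.8, 3.1]
indices = [encoder.encode(H, lambda H, F: abs(H - F)) for H in channels]
decoded = [decoder.decode(i) for i in indices]
print(indices, decoded)  # [0, 1, 1, 1, 0] [0.0, 1.0, 2.0, 3.0, 3.0]
```

Each index costs at most $\log_2 N$ bits with equal-length bitwords, while the decoder effectively reconstructs entries of the larger super codebook.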
As in memoryless VQ, we can consider two ways to assign bitwords $w_i$ to the indices $i \in \mathcal{I}$. We can use $N$ equal-length PF bitwords (for a nondedicated feedback link), with a feedback rate of $\log_2 N$ bits per channel use, or $N$ increasing-length NPF bitwords (for a dedicated feedback link), with an average feedback rate of $\frac{1}{N}\sum_{i=1}^{N} \lfloor \log_2 i \rfloor$ bits per channel use. This assignment is again based on the assumption that for a certain state $s$, the precoders $\mathbf{F}_{i,s}$ have more or less the same probability.
Two special classes of FSVQs are the labeled-state and the labeled-transition FSVQs. Every FSVQ can always be represented in either form and as a result, these classes are not restrictive. In a labeled-state FSVQ, the states are labeled by the quantized precoders, and the quantized precoder that is produced depends on the arrival state. In other words, the labeled-state FSVQ decoder $\beta$ only depends on the next state $s[n+1] = f(i[n], s[n])$. In a labeled-transition FSVQ, not the states but the state transitions are labeled by the quantized precoders, and the selected quantized precoder is determined not by the arrival state but by both the departure state and the arrival state. Hence, the labeled-transition FSVQ decoder $\beta$ depends on the current state $s[n]$ as well as on the next state $s[n+1]$. As will be illustrated later on, the design of an FSVQ is often based on an initial classifier that classifies channels into states. Such a classifier could for instance be a simple memoryless VQ with a codebook $\mathcal{C}_{class} = \{\mathbf{F}_1, \ldots, \mathbf{F}_K\}$,
$$g(\mathbf{H}) = \arg\operatorname*{min/max}_{s \in \mathcal{S}} S_{class}(\mathbf{H}, \mathbf{F}_s),$$
where the selection function $S_{class}$ is one of the functions introduced in Section 3.1, and could possibly be different from the selection function $S$ chosen in the encoder (25). We will come back to this issue in Section 5.2.
In the next few subsections, we will describe a few methodologies to design the state codebooks and the next-state functions based on the initial classifier. In the first subsection, we will discuss some labeled-state FSVQ designs. These are basically existing designs, although they have not always been introduced in the framework of FSVQ or in the context of time-correlated channels. In the second subsection, we describe the so-called omniscient design, which is a completely novel feedback compression method. Note that it is still possible to iteratively improve the obtained state codebooks, given the next-state function, as illustrated in [18, page 536]. However, this generally only shows marginal performance gains over the initial designs, and thus we will not consider it in this work.

Labeled-state FSVQ designs
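In this section, we discuss a few labeled-state FSVQ feedback designs, where each state $s \in \mathcal{S}$ is labeled with the precoder $\mathbf{F}_s$ from the classifier codebook $\mathcal{C}_{class}$. Hence, the decoder $\beta$ is then simply given by
$$\beta(i[n], s[n]) = \mathbf{F}_{s[n+1]} = \mathbf{F}_{f(i[n], s[n])}.$$
In that case, the super codebook $\mathcal{C}$ corresponds to the classifier codebook $\mathcal{C}_{class}$, and the state codebooks $\mathcal{C}_s$ are subsets of the classifier codebook $\mathcal{C}_{class}$. Below we describe two such designs: the conditional histogram design and the nearest neighbor design.

Conditional histogram design
For the conditional histogram design, the next states of a current state $s$ are the $N$ states $s'$ that have the highest transition probability from $s$. Note that the transition probabilities can be computed as in Section 4, but the adaptive approach cannot be used here because the decoder does not have knowledge about the current channel. An example is given in Table 3, where we assume that the current state is $s = 8$. Assuming the state codebooks have size $N = 4$, the state codebook $\mathcal{C}_8$ is given by $\mathcal{C}_8 = \{\mathbf{F}_8, \mathbf{F}_6, \mathbf{F}_1, \mathbf{F}_4\}$. Although presented in a different framework, a similar approach has been proposed in [22].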

Nearest neighbor design
For the nearest neighbor design, the next states of a current state $s$ are not the $N$ states $s'$ that have the highest transition probability, but the $N$ states $s'$ that have the closest precoder to the precoder of state $s$ in terms of some distance $d$, which could be a subspace distance, the Frobenius norm $d_F$, or the modified Frobenius norm $d_{MF}$, although the latter are not, strictly speaking, distances. Hence, the state codebook $\mathcal{C}_s$ is the set of $N$ precoders $\mathbf{F}_{s'}$ that have the smallest distance $d(\mathbf{F}_s, \mathbf{F}_{s'})$. If we define, without loss of generality, $\mathbf{F}_{i,s}$ as the precoder $\mathbf{F}_{s'}$ of the state $s'$ with the $i$th smallest distance $d(\mathbf{F}_s, \mathbf{F}_{s'})$, then the next-state function $f(i, s)$ is simply given by this state $s'$. Again looking at the example in Table 3, we now see that the state codebook $\mathcal{C}_8$ is given by $\mathcal{C}_8 = \{\mathbf{F}_8, \mathbf{F}_5, \mathbf{F}_4, \mathbf{F}_6\}$.
In the context of orthogonal frequency division multiplexing (OFDM), this approach has already been proposed in [28] to compress the feedback of the precoders on the different subcarriers.
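The nearest neighbor design reduces to sorting pairwise distances within the classifier codebook. The sketch below uses the chordal distance and a randomly generated classifier codebook; both are assumptions for illustration, as are the function names and the 0-based state numbering.

```python
import numpy as np

def random_semi_unitary(nt, ns, rng):
    A = rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))
    return np.linalg.qr(A)[0]

def chordal(F1, F2):
    return np.linalg.norm(F1 @ F1.conj().T - F2 @ F2.conj().T, "fro") / np.sqrt(2)

def nearest_neighbor_design(classifier_codebook, N):
    """Return next_state[s][i]: the state whose precoder is i-th closest to F_s."""
    K = len(classifier_codebook)
    next_state = []
    for s in range(K):
        dists = [chordal(classifier_codebook[s], classifier_codebook[t]) for t in range(K)]
        order = np.argsort(dists, kind="stable")[:N]
        next_state.append([int(t) for t in order])
    return next_state

rng = np.random.default_rng(0)
K, N = 8, 4
classifier_codebook = [random_semi_unitary(4, 2, rng) for _ in range(K)]
next_state = nearest_neighbor_design(classifier_codebook, N)
# Labeled-state FSVQ: the state codebook C_s collects the labels of the next states.
state_codebooks = [[classifier_codebook[t] for t in row] for row in next_state]
```

Since the distance of a precoder to itself is zero, the first entry of every state codebook is the label of the current state, so remaining in the same state is always possible.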

Discussion
The problem of both the conditional histogram design and the nearest neighbor design is that if $K/N$ is large and the time correlation of the channel is small, the optimal transition might not be one of the $N$ most likely ones or not one of the $N$ transitions with the smallest distance between precoders. This could lead to a so-called derailment problem. Taking a smaller $K/N$ is a possible solution, but it either leads to a lower performance (decreasing $K$) or a higher feedback rate (increasing $N$). As suggested in [18, page 540], the derailment problem could also be solved by periodic reinitialization.

Omniscient design
In this section, we present a novel feedback compression method, based on what in the field of vector quantization is known as the omniscient design [18]. In general, the omniscient design provides the best performance of all the FSVQ design approaches [18].
To explain the omniscient design, let us assume that the next-state function is not determined by the current quantization index and state, but simply by the current channel, for instance by means of the classifier function $g$,
$$s[n+1] = g(\mathbf{H}[n]).$$
The state codebook $\mathcal{C}_s$ for a state $s$ can then be designed by minimizing some average distortion:
$$\min_{\mathcal{C}_s} E\big[D(\mathbf{H}[n], Q(\mathbf{H}[n], s)) \mid g(\mathbf{H}[n-1]) = s\big], \tag{34}$$
where $D(\mathbf{H}, Q(\mathbf{H}, s))$ is the distortion between $\mathbf{H}$ and $Q(\mathbf{H}, s)$, and the expectation is taken given $g(\mathbf{H}[n-1]) = s$, or equivalently, given the current state $s[n] = s$. Any of the distortion functions presented in Section 3.2 can be considered. We can now solve (34) by the GL algorithm or the MC algorithm, as was done in Sections 3.2.2 and 3.2.3. This requires a set of training channels $\mathcal{T}_s$. To construct $\mathcal{T}_s$, we first generate a large set of pairs of consecutive channels based on the channel statistics,
$$\mathcal{P} = \big\{\big(\mathbf{H}^{(r)}[n-1], \mathbf{H}^{(r)}[n]\big)\big\},$$
where $r$ is the realization index. From this set $\mathcal{P}$ we construct $\mathcal{T}_s$ as the set of channels $\mathbf{H}^{(r)}[n]$ for which $g(\mathbf{H}^{(r)}[n-1]) = s$. As in the omniscient design of [18], the channel in the next-state function is finally replaced by its quantized version, so that the decoder can also track the state. This way we obtain an FSVQ. When $K/N$ gets smaller and the time correlation of the channel gets larger, that is, when the regions related to the classifier codebook $\mathcal{C}_{class}$ get larger compared to the regions related to the state codebooks $\mathcal{C}_s$, the approximation gets better. On the other hand, however, for a fixed $N$, it is sometimes worthwhile to increase $K$ to benefit from an increased knowledge about the past.
In [18], it is mentioned that the omniscient design leads to a labeled-transition FSVQ, because given a current state, every possible quantization index leads to a different next state. However, this is not necessarily true. Different quantization indices could sometimes lead to the same next state, and thus in general we do not have a labeled-transition FSVQ.
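The construction of the per-state training sets $\mathcal{T}_s$ can be sketched as follows. The first-order autoregressive model used to generate pairs of consecutive channels, the correlation coefficient `rho`, the chordal distance classifier, and all function names are assumptions made for this illustration; the text above does not prescribe a particular temporal model.

```python
import numpy as np

def random_semi_unitary(nt, ns, rng):
    A = rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))
    return np.linalg.qr(A)[0]

def complex_gaussian(nr, nt, rng):
    """Spatially white channel with unit-variance complex Gaussian entries."""
    return (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)

def classify(H, classifier_codebook):
    """Classifier g(H): state whose label is closest to V in chordal distance."""
    ns = classifier_codebook[0].shape[1]
    V = np.linalg.svd(H)[2].conj().T[:, :ns]
    d = [np.linalg.norm(V @ V.conj().T - F @ F.conj().T, "fro") for F in classifier_codebook]
    return int(np.argmin(d))

def omniscient_training_sets(classifier_codebook, n_pairs, rho, rng, nr=2, nt=4):
    """T_s = { H[n] : g(H[n-1]) = s }, built from pairs of consecutive channels."""
    sets = {s: [] for s in range(len(classifier_codebook))}
    for _ in range(n_pairs):
        H_prev = complex_gaussian(nr, nt, rng)
        # Assumed AR(1) time correlation between consecutive blocks.
        H_cur = rho * H_prev + np.sqrt(1 - rho**2) * complex_gaussian(nr, nt, rng)
        sets[classify(H_prev, classifier_codebook)].append(H_cur)
    return sets

rng = np.random.default_rng(0)
classifier_codebook = [random_semi_unitary(4, 2, rng) for _ in range(4)]
training_sets = omniscient_training_sets(classifier_codebook, n_pairs=400, rho=0.9, rng=rng)
```

Each set `training_sets[s]` can then be passed to a GL or MC design to obtain the state codebook $\mathcal{C}_s$ of size $N$.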

Adaptive FSVQ
Unfortunately, it is not trivial to extend the FSVQ to adapt to changing channel characteristics, that is, to a nonstationary source. The adaptation of the state codebooks $\mathcal{C}_s$ has to rely on information that is available both at the encoder and the decoder. This shared information can for instance consist of the last $l$ states $s[n], s[n-1], \ldots, s[n-l+1]$ and the last $l$ quantized precoders $Q(\mathbf{H}[n], s[n]), \ldots, Q(\mathbf{H}[n-l+1], s[n-l+1])$. We restrict our approach to such a window of $l$ samples due to memory restrictions, and we forget past samples for which the channel might have different characteristics. Whenever the precoder is $Q(\mathbf{H}[n], s[n]) = \mathbf{F}_{i,s[n]}$, we know that the channel matrix $\mathbf{H}[n]$ lies in some region $\mathcal{R}_{i,s[n]}$. Assuming a realistic channel distribution, we can then define one or more random channel matrices that also lie in the region $\mathcal{R}_{i,s[n]}$. Finally, the FSVQ design algorithms mentioned previously can be used with the new training sequence to design the new state codebooks. Note that the state codebooks, and thus the quantizer regions, are recalculated from scratch after each feedback. Instead, we could also consider updating the codebook as done in competitive learning [29]. However, such techniques still have to be adapted to take the unitary constraint of the precoding matrix into account, and they are considered future work.

SIMULATIONS
In this section, we provide numerical results for the different schemes and design approaches presented so far. We assume that $N_S = 2$ data streams are transmitted over $N_T = 4$ transmit antennas. The receiver is equipped with $N_R = 2$ receive antennas, and QPSK modulation is used. We start in Section 6.1 by comparing the BER performance for different codebooks using the BER criterion as selection function. Section 6.2 then shows the performance of Monte-Carlo and subspace packing codebooks for spatially correlated channels. In Section 6.3, the possible feedback compression gains of entropy coding over memoryless VQ are shown for time-correlated channels. Section 6.4 shows how fast the adaptive entropy coding schemes adapt to changing channel statistics. Section 6.5 then compares FSVQ to memoryless VQ, and it also compares the different FSVQ design approaches. Finally, Section 6.6 shows the duality between FSVQ and entropy coding.

Memoryless VQ
Figure 2 compares the performance of the different codebook designs presented in Section 3.2. The BER is used as selection function (14). The Frobenius norm, the modified Frobenius norm, and the chordal distance codebooks are designed using the Monte-Carlo algorithm to solve (17), with the respective squared distances as distortion function. The BER codebook is also designed using the Monte-Carlo algorithm. The Love-Heath codebook [10] and the Zhou-Li codebook [12] are designed to optimize (19) with the chordal distance as subspace distance. Love and Heath used techniques from [30], and Zhou and Li used the generalized Lloyd algorithm. The simulation shows that the performance of the different codebooks is similar, and even using the BER as a distortion function in the codebook design does not yield a noticeable performance gain.

For the spatially correlated channel of Figure 3, one codebook is designed using the Grassmannian subspace packing approach with the chordal distance, and the other codebook is designed using the Monte-Carlo algorithm with the squared modified Frobenius norm as distortion function. The channel is modeled using the measurements in [31], and the BER selection function (14) is used to choose the best codebook entry. We see that the Monte-Carlo codebook, which takes the channel correlation into account, outperforms the Grassmannian subspace packing codebook, which aims at spatially white channels.

Entropy coding
Figure 4 depicts the compression gains possible through entropy coding. The channel is modeled through Jakes' model with the Doppler spread fixed. The mean feedback rate is depicted as a function of the frame duration $T_f$. A small frame duration implies a highly correlated channel, whereas a longer frame duration implies a less correlated channel. The Huffman code is used as the prefix-free code, and the simple binary numbering from Table 2 is used as the non-prefix-free code. The modified Frobenius norm (16) is used as selection function and the squared modified Frobenius norm as distortion function to design the codebook using the Monte-Carlo algorithm. The transition probabilities used to design the entropy codes are estimated through Monte-Carlo simulations.
We see that the prefix-free code achieves a mean feedback rate of 1 bit for highly correlated channels, whereas the non-prefix-free code can even achieve 0 bits, that is, no feedback is necessary. For longer frame durations, that is, uncorrelated channels, the mean feedback rate for the Huffman-encoded bitwords converges to 4 bits, since the transitions between the different codewords become equiprobable, and then the Huffman code assigns equal-length bitwords to all the precoders. The non-prefix-free code converges to 2.375 bits for uncorrelated channels since the transitions between the different codewords become equiprobable as well, and thus it assigns the binary numbering bitwords randomly.
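The two limits for uncorrelated channels can be checked with a few lines of Python. The sketch below assumes a codebook of 16 precoders, which is inferred from the 4-bit limit, and assumes that the binary numbering assigns the bitwords (empty), 0, 1, 00, 01, 10, 11, 000, and so on, in order of decreasing transition probability. The function names are illustrative.

```python
import heapq
from fractions import Fraction
from itertools import count

K = 16  # codebook size inferred from the 4-bit limit (assumption)

def huffman_lengths(probs):
    """Return the Huffman bitword length assigned to each symbol."""
    tie = count()  # tie-breaker so that the symbol lists are never compared
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:          # every merged symbol gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths

def binary_numbering_lengths(k):
    """Lengths of the words '', 0, 1, 00, 01, ...: rank r uses floor(log2(r + 1)) bits."""
    return [(r + 1).bit_length() - 1 for r in range(k)]

probs = [Fraction(1, K)] * K   # equiprobable transitions (uncorrelated channel)
mean_huffman = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
mean_nonprefix = sum(p * l for p, l in zip(probs, binary_numbering_lengths(K)))
print(float(mean_huffman), float(mean_nonprefix))  # prints 4.0 2.375
```

Under these assumptions the 16 binary numbering bitwords have lengths 0, 1, 1, 2, 2, 2, 2, eight times 3, and once 4, which sums to 38 bits and gives 38/16 = 2.375 bits on average. The most probable transition receives the empty bitword, which is why the non-prefix-free code can approach 0 bits for highly correlated channels, while a Huffman code needs at least 1 bit per symbol.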

Adaptive entropy coding
The tradeoff between adaptation speed and accuracy for adaptive entropy coding is depicted in Figures 5 and 6.
To depict the adaptation of the adaptive entropy coding to changing channel statistics, we changed the frame duration $T_f$ from $10^{-3}$ seconds to $10^{-2}$ seconds after 3000 frames, and back after another 3000 frames. The remaining simulation parameters are identical to those in the previous subsection. Figure 5 assumes a nondedicated feedback channel. We see how the selection of the weighting factor $N$ controls the tradeoff between performance and speed of the adaptive encoding process. For small $N$, the transition probabilities are estimated faster but less accurately, and for higher $N$, the estimation is slower but more accurate. Figure 6 shows a similar scenario, but for a dedicated feedback channel, where the bitwords are designed using the non-prefix-free code from Table 2. We see that the system quickly adapts to the changing frame lengths for both values of $N$, since the encoding of the bitwords no longer depends on the exact transition probabilities but only on their order.
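The role of the weighting factor can be illustrated with a small Python sketch. The exact estimator is not reproduced in this section, so the update below is an assumption: an exponentially weighted estimate with step size 1/N. The function names and the starting probabilities are hypothetical as well. The sketch counts how many frames pass until a newly dominant transition moves to the top of the ranking, which is the only quantity that matters for the non-prefix-free code.

```python
def update(p, i, N):
    """One exponentially weighted update after observing index i (assumed estimator)."""
    a = 1.0 / N
    return [(1.0 - a) * q + (a if j == i else 0.0) for j, q in enumerate(p)]

def ranking(p):
    """Indices ordered from most to least probable; ties are broken by index."""
    return sorted(range(len(p)), key=lambda j: (-p[j], j))

def frames_until_first(p, i, N, max_frames=10000):
    """Number of consecutive observations of i needed before i is ranked first."""
    for t in range(1, max_frames + 1):
        p = update(p, i, N)
        if ranking(p)[0] == i:
            return t
    return None

p0 = [0.7, 0.1, 0.1, 0.1]            # hypothetical estimate before the statistics change
fast = frames_until_first(p0, 2, N=10)
slow = frames_until_first(p0, 2, N=100)
print(fast, slow)
```

With this estimator a small N reorders the ranking after fewer frames than a large N, at the price of noisier probability estimates, which mirrors the tradeoff described for Figure 5.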

FSVQ
The performance of different state codebook designs is depicted in Figure 7. The FSVQs are created using the omniscient design. The different codebooks are designed with the squared modified Frobenius norm as distortion function, and the modified Frobenius norm (16) is used as selection function for the classifier (33) as well as for the quantization (27).
We see that the performance of the FSVQ highly depends on the time correlation of the channel. If the time correlation between the channels is high, the 2-bit feedback of an FSVQ has the same BER performance as the 4-bit memoryless VQ. However, for less correlated channels the performance drops to that of the 2-bit memoryless VQ.
Different design approaches for FSVQ codebooks are shown in Figure 8. For each design approach, we simulate the performance after 1 transmission and after 100 transmissions. We use the same distortion and selection functions as in the previous simulations.
We see that the omniscient design performs best after 1 transmission, but it also suffers the most from the derailment problem, that is, its performance after 100 transmissions is worse than that of the nearest neighbor and the conditional histogram designs. This effect can be counteracted through periodic reinitialization.

Figure 1: System model of the linear precoded spatial multiplexing MIMO system with limited feedback.
and $\Sigma[n]$ is a real nonnegative diagonal $N_R \times N_T$ matrix (the diagonal starts in the top left corner) with nonincreasing diagonal entries. The columns of $U[n]$ and $V[n]$ are called the left and right singular vectors, respectively, whereas the diagonal entries of $\Sigma[n]$ are the corresponding singular values. Focusing only on the $N_S$ strongest modes of the channel (the ones with the largest singular values), let us define $\bar{U}[n] = [U[n]]_{:,1:N_S} \in \mathcal{U}^{N_R \times N_S}$, $\bar{V}[n] = [V[n]]_{:,1:N_S} \in \mathcal{U}^{N_T \times N_S}$, and $\bar{\Sigma}[n] = [\Sigma[n]]_{1:N_S,1:N_S}$, where $[A]_{a:b,c:d}$ selects the submatrix of $A$ on the rows $a$ to $b$ and the columns $c$ to $d$, and the range indices are omitted when all rows or columns should be selected.

Figure 3 compares the performance of two codebooks for a spatially correlated channel.

Table 1: Example of a 4-entry ($K = 4$) codebook for a nondedicated and dedicated feedback link.

Table 2: Example of feedback compression through entropy coding.

Table 3: Example of transition probabilities and precoder distances assuming the previous state was $s = 8$.
5.1.1. Conditional histogram design
For the conditional histogram design, the next states of a current state $s$ are the $N$ states $s'$ that have the highest probability to be reached from state $s$ in terms of the initial classifier. Hence, the state codebook $C_s$ is the set of $N$ precoders $F_{s'}$ corresponding to the $N$ states $s'$ that have the highest transition probability $P(g(H[n]) = s' \mid g(H[n-1]) = s)$. If we define, without loss of generality, $F_{i,s}$ as the precoder $F_{s'}$ of the state $s'$ with the $i$th highest transition probability $P(g(H[n]) = s' \mid g(H[n-1]) = s)$, then the next-state function $f(i, s)$ is simply given by this state $s'$.
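The conditional histogram rule can be sketched in a few lines of Python. The transition matrix `P` below is a made-up toy example with four classifier states, and the function name is hypothetical. Only the selection rule, keeping the N most probable successor states of each state, follows the description above.

```python
def conditional_histogram_design(P, N):
    """Build the next-state table f[s][i] from transition probabilities.

    P[s][t] is the probability that the classifier outputs t given that it
    output s for the previous channel. f[s][i] is the state with the
    (i + 1)-th highest transition probability from s (ties broken by index).
    """
    K = len(P)
    table = []
    for s in range(K):
        order = sorted(range(K), key=lambda t: (-P[s][t], t))
        table.append(order[:N])
    return table

# Toy transition matrix (illustrative values only); each row sums to 1.
P = [
    [0.70, 0.15, 0.10, 0.05],
    [0.20, 0.60, 0.15, 0.05],
    [0.05, 0.25, 0.50, 0.20],
    [0.10, 0.05, 0.25, 0.60],
]

f = conditional_histogram_design(P, N=2)
print(f)  # prints [[0, 1], [1, 0], [2, 1], [3, 2]]
```

Here the state codebook of state `s` consists of the classifier precoders indexed by `f[s]`, and feeding back index `i` moves both encoder and decoder to state `f[s][i]`.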
The problem of this approach is that the decoder cannot track the state, because it does not have access to the current channel. Hence, it is assumed here that the decoder is omniscient and we actually do not have an FSVQ. Thus, we should replace $H[n]$ in the next-state function by its estimate $\hat{H}[n]$ that is computed based on the quantized precoder $Q(H[n], s[n])$ known to the decoder. As an estimate, we could for instance consider

$$\hat{H}[n] = [\,Q(H[n], s[n]),\; 0_{N_T \times (N_R - N_S)}\,]^H.$$

This is of course not a good channel estimate for equalization, but it is good in terms of the $N_S$ dominant right singular vectors collected in $\bar{V}[n]$. Hence, if the classifier $g$ is designed based on a selection function $S_{class}$ that only depends on $\bar{V}[n]$, then $g(\hat{H}[n])$ is a good approximation of $g(H[n])$. That is why we often choose $S_{class}$ based on a subspace distance ($S_{P2}$, $S_{FS}$, or $S_C$), the Frobenius norm ($S_F$), or the modified Frobenius norm ($S_{MF}$), irrespective of what is chosen as selection function $S$ in the encoder (25). So, we keep the idealized state codebooks $C_s$ but we change the next-state function into

$$s[n+1] = g(\hat{H}[n]).$$
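A short numerical check, written as a Python sketch, shows why a channel estimate built from the quantized precoder and a zero block preserves the dominant right singular subspace. The dimensions are chosen with $N_R > N_S$ so that the zero block is not empty, which differs from the simulation setup, and the random matrix `F` merely stands in for a quantized precoder. The chordal distance is used here as one possible subspace distance.

```python
import numpy as np

rng = np.random.default_rng(1)
N_R, N_T, N_S = 3, 4, 2   # illustrative sizes with N_R > N_S (assumption)

# Stand-in for the quantized precoder: N_T x N_S with orthonormal columns.
A = rng.standard_normal((N_T, N_S)) + 1j * rng.standard_normal((N_T, N_S))
F, _ = np.linalg.qr(A)

# Estimate [F, 0]^H, an N_R x N_T matrix.
H_hat = np.hstack([F, np.zeros((N_T, N_R - N_S))]).conj().T

# Dominant right singular vectors of the estimate.
_, sv, Vh = np.linalg.svd(H_hat)
V_hat = Vh[:N_S].conj().T

# Chordal distance between the subspaces spanned by F and V_hat.
d_chordal = np.linalg.norm(F @ F.conj().T - V_hat @ V_hat.conj().T) / np.sqrt(2)
print(H_hat.shape, np.round(sv, 6), d_chordal)
```

The estimate has $N_S$ singular values equal to one and the remaining ones equal to zero, and its dominant right singular vectors span the same subspace as the precoder, so the chordal distance is zero up to rounding. A classifier that depends only on this subspace therefore sees the quantized precoder itself, which is available at both ends of the feedback link.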