Low-Complexity Banded Equalizers for OFDM Systems in Doppler Spread Channels

Recently, several approaches have been proposed for the equalization of orthogonal frequency-division multiplexing (OFDM) signals in challenging high-mobility scenarios. Among them, a minimum mean-squared error (MMSE) block linear equalizer (BLE), based on a band LDL factorization, is particularly attractive for its good tradeoff between performance and complexity. This paper extends this approach in two directions. First, we boost the BER performance of the BLE by designing a receiver window specially tailored to the band LDL factorization. Second, we design an MMSE block decision-feedback equalizer (BDFE) that can be modified to support receiver windowing. All the proposed banded equalizers share a similar computational complexity, which is linear in the number of subcarriers. Simulation results show that the proposed receiver architectures are effective in reducing the BER performance degradation caused by the intercarrier interference (ICI) generated by time-varying channels. We also consider a basis expansion model (BEM) channel estimation approach, to establish its impact on the BER performance of the proposed banded equalizers.


INTRODUCTION
Orthogonal frequency-division multiplexing (OFDM) is a well-established modulation scheme, which mainly owes its success to the capability of converting a time-invariant (TI) frequency-selective channel into a set of parallel (orthogonal) frequency-flat channels, thus simplifying equalization [1]. Conversely, a time-variant (TV) channel destroys the orthogonality among OFDM subcarriers, introducing intercarrier interference (ICI) [2, 3], and therefore making the OFDM BER performance particularly sensitive to Doppler-affected channels. Thus, the widespread use of OFDM in several communication standards (e.g., DVB-T, 802.11a, 802.16, etc.) and the increasing request for communication capabilities in high-mobility environments have recently renewed the interest in OFDM equalizers that are able to cope with significant Doppler spreads [4-10]. Among those, a low-complexity MMSE block linear equalizer (BLE) has been recently proposed in [9], which, similarly to other equalizers, exploits the observation that ICI generated by TV channels is mainly induced by adjacent subcarriers [8]. Thus, assuming that the ICI induced by faraway subcarriers can be neglected, the BLE in [9] takes advantage of a band LDL factorization algorithm to reduce complexity, which turns out to be linear in the number of subcarriers. However, the neglected ICI introduces an error floor on the BER performance of the equalizer in [9].
In this paper we analyze two techniques to reduce this error floor while maintaining linear complexity. The first technique we consider takes advantage of receiver windowing [11] to reduce the spectral sidelobes of each subcarrier, and hence the ICI. This approach has been previously proposed in [10] to minimize the neglected ICI. The scheme of [10] does not only rely on receiver windowing, but it also adopts an ICI cancellation technique guided by an MMSE serial linear equalizer (SLE). Our approach differs from that of [10] in two aspects. First, we slightly modify the window design of [10] to consider block linear equalization. Second, we do not consider ICI cancellation techniques, because this paper is focused on assessing the performance of low-complexity one-shot equalizers, which could possibly be employed as the first step of any iterative cancellation approach. In this view, we show by simulation results that receiver windowing for the BLE is more beneficial than for the SLE when no ICI cancellation is adopted.
The second technique we investigate is based on the MMSE approach of [12, 13] for decision-feedback equalization. Specifically, we incorporate the band LDL factorization of [9] in the design of a banded block decision-feedback equalizer (BDFE), and we show by performance analysis and simulations that the proposed BDFE outperforms the BLE of [9], while preserving exactly the same complexity. In addition, we join receiver windowing and decision-feedback equalization, thereby boosting the BER performance while keeping linear complexity in the number of subcarriers.
The proposed low-complexity equalizers have to be aware of the TV channel in order to perform equalization. Thus, in order to prove the usefulness of those equalizers in fast TV scenarios, channel estimation as well as its effect on the BER performance has to be considered. Recently, several authors [7, 14-16] proposed pilot-assisted channel estimation techniques. All these techniques model the channel by means of a basis expansion model (BEM), in order to minimize the number of parameters to be estimated, while preserving accuracy. More specifically, for block transmissions in underspread TV channels modeled by a complex exponential (CE) BEM, [15] proved the MSE optimality of a time-domain training with equally-spaced, equally-loaded, and zero-guarded pilot symbols. Its natural dual in the frequency domain, with equally-spaced, equally-loaded, and zero-guarded pilot carriers, has been considered in [14]. In this paper, we focus on the frequency-domain version, because it seems more natural for OFDM block transmissions. Indeed, this choice of embedding training in each OFDM block does not force us to insert pilot blocks in the time domain between OFDM blocks. Furthermore, current OFDM-based standards generally employ equally-spaced (not zero-guarded) pilot subcarriers for channel estimation purposes in TI environments. Thus, conventional OFDM systems could adopt the proposed strategy with minor modifications, and could be employed in fast TV channels.
We show that the frequency-domain training, coupled with a general BEM, provides LS and LMMSE estimates that are sufficiently accurate to enable the use of the proposed low-complexity equalizers, also in scenarios with high Doppler spread.
The rest of the paper is organized as follows. We consider the OFDM system model in TV channels in Section 2, while Section 3 illustrates a BEM-based channel estimation technique. We develop the design of banded equalizers and of receiver windowing in Section 4. In Section 5 we comment on simulation results for the BER performance of the proposed receivers, with and without channel estimation. Finally, in Section 6, some conclusions are drawn.

OFDM SYSTEM MODEL
Firstly, we introduce some basic notations. We use lower (upper) boldface letters to denote column vectors (matrices), and superscripts $^{*}$, $^{T}$, $^{H}$, and $^{\dagger}$ to represent complex conjugate, transpose, Hermitian, and pseudoinverse operators, respectively. We employ $E\{\cdot\}$ to represent the statistical expectation, and $\lceil x \rceil$ and $\lfloor x \rfloor$ to denote the smallest integer greater than or equal to $x$, and the greatest integer smaller than or equal to $x$, respectively. $\mathbf{0}_{M \times N}$ is the $M \times N$ all-zero matrix, $\mathbf{I}_N$ is the $N \times N$ identity matrix, $\delta(i)$ is the Kronecker delta function, and $\|\cdot\|$ is the Frobenius norm. We use the symbol $\circ$ to denote the Hadamard (elementwise) product between matrices, and the symbol $\otimes$ to denote the Kronecker product. We define $[\mathbf{A}]_{m,n}$ as the $(m,n)$th entry of matrix $\mathbf{A}$, $[\mathbf{a}]_n$ as the $n$th entry of the column vector $\mathbf{a}$, $(a)_{\bmod N}$ as the remainder after division of $a$ by $N$, $\operatorname{diag}(\mathbf{a})$ as the diagonal matrix with $(n,n)$th entry equal to $[\mathbf{a}]_n$, and $\operatorname{vec}(\mathbf{A})$ as the vector obtained by stacking the columns of matrix $\mathbf{A}$.
An OFDM system with $N$ subcarriers and a cyclic prefix of length $L$ is considered. Using a notation similar to [1], the $k$th transmitted block can be expressed as

$$\mathbf{u}[k] = \mathbf{T}_{CP}\mathbf{F}^{H}\bar{\mathbf{a}}[k], \tag{1}$$

where $\mathbf{u}[k]$ is a vector of dimension $P = N + L$, $\mathbf{F}$ is the $N \times N$ unitary FFT matrix, $\bar{\mathbf{a}}[k]$ is the $N$-dimensional vector that contains the transmitted symbols, and $\mathbf{T}_{CP} = [\mathbf{I}_{CP}^{T}\ \mathbf{I}_{N}^{T}]^{T}$ is the $P \times N$ matrix that inserts the cyclic prefix, where $\mathbf{I}_{CP}$ contains the last $L$ rows of the identity matrix $\mathbf{I}_N$. Assuming that $N_A$ subcarriers are active and $N_V = N - N_A$ are used as frequency guard bands, we can write

$$\bar{\mathbf{a}}[k] = [\mathbf{0}_{1 \times N_V/2}\ \ \mathbf{a}[k]^{T}\ \ \mathbf{0}_{1 \times N_V/2}]^{T}, \tag{2}$$

where $\mathbf{a}[k]$ is the $N_A \times 1$ data vector. For simplicity, we assume that the data symbols contained in $\mathbf{a}[k]$ are drawn from a finite constellation, and are independent and identically distributed (i.i.d.), with power $\sigma_a^2$. After the parallel-to-serial conversion, the signal stream $u[kP + n - 1] = [\mathbf{u}[k]]_n$ is transmitted through a time-varying multipath channel, whose discrete-time impulse response $h[n,l]$ represents the $l$th path at time $nT_S$, where $T_S = T/N$ is the sampling period, $T$ is the useful duration of an OFDM block (i.e., without considering the cyclic prefix duration), and $\Delta_f = 1/T$ is the subcarrier spacing. Throughout the paper, we assume that the channel amplitudes are complex Gaussian distributed, giving rise to Rayleigh fading, and that the maximum delay spread is smaller than or equal to the cyclic prefix duration $L$, that is, $h[n,l]$ may have nonzero entries only for $0 \le l \le L$. We will also assume a wide-sense stationary uncorrelated scattering (WSSUS) model, characterized by

$$E\{h[n,l]\,h^{*}[n-m,\,l-i]\} = \sigma_l^2\,R_h(mT_S)\,\delta(i),$$

where all the taps are subject to the same Doppler spectrum, and $\sigma_l^2 R_h(0) = \sigma_l^2$ is the average power of the $l$th tap. For instance, the classical Jakes' power spectral density is characterized by the Clarke autocorrelation function $R_h(t) = J_0(2\pi f_D t)$, where $f_D$ is the maximum Doppler frequency.
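By assuming time and frequency synchronization at the receiver side, the received samples can be expressed as

$$x[n] = \sum_{l=0}^{L} h[n,l]\,u[n-l] + n_t[n],$$

where $n_t[n]$ represents the AWGN with average power $\sigma_{n_t}^2 = E\{|n_t[n]|^2\}$. The $P$ received samples relative to the $k$th OFDM block are grouped in the vector $\mathbf{x}[k]$, thus obtaining

$$\mathbf{x}[k] = \mathbf{H}_0^{(k)}\mathbf{u}[k] + \mathbf{H}_1^{(k)}\mathbf{u}[k-1] + \mathbf{n}_t[k], \tag{6}$$

where $\mathbf{x}[k] = [x[kP], \ldots, x[kP+P-1]]^{T}$, $\mathbf{n}_t[k]$ is defined likewise, and $\mathbf{H}_0^{(k)}$ and $\mathbf{H}_1^{(k)}$ are $P \times P$ matrices defined by

$$[\mathbf{H}_0^{(k)}]_{m,n} = h[kP+m-1,\ m-n], \qquad [\mathbf{H}_1^{(k)}]_{m,n} = h[kP+m-1,\ P+m-n].$$

By applying the matrix $\mathbf{R}_{CP} = [\mathbf{0}_{N \times L}\ \mathbf{I}_N]$ to $\mathbf{x}[k]$ in (6), the cyclic prefix (and hence the interblock interference) is eliminated, and introducing windowing we obtain, by (1), the $N \times 1$ vector

$$\mathbf{y}_W[k] = \boldsymbol{\Delta}_W\mathbf{R}_{CP}\mathbf{x}[k] = \boldsymbol{\Delta}_W\mathbf{H}^{(k)}\mathbf{F}^{H}\bar{\mathbf{a}}[k] + \boldsymbol{\Delta}_W\mathbf{R}_{CP}\mathbf{n}_t[k], \tag{8}$$

where $\mathbf{H}^{(k)} = \mathbf{R}_{CP}\mathbf{H}_0^{(k)}\mathbf{T}_{CP}$ is the equivalent $N \times N$ channel matrix in the time domain, defined by

$$[\mathbf{H}^{(k)}]_{m,n} = h[kP+L+m-1,\ (m-n)_{\bmod N}], \tag{9}$$

and $\boldsymbol{\Delta}_W = \operatorname{diag}(\mathbf{w})$ is an $N \times N$ diagonal matrix representing a time-domain receiver window. For conventional OFDM, which does not employ receiver windowing, $\boldsymbol{\Delta}_W = \mathbf{I}_N$. By applying the DFT at the receiver, we obtain $\mathbf{z}_W[k] = \mathbf{F}\mathbf{y}_W[k]$, which by (8) can be rearranged as

$$\mathbf{z}_W[k] = \mathbf{C}_W\boldsymbol{\Lambda}^{(k)}\bar{\mathbf{a}}[k] + \mathbf{n}_W[k] = \boldsymbol{\Lambda}_W^{(k)}\bar{\mathbf{a}}[k] + \mathbf{n}_W[k], \tag{10}$$

where $\boldsymbol{\Lambda}^{(k)} = \mathbf{F}\mathbf{H}^{(k)}\mathbf{F}^{H}$ is the Doppler-frequency channel matrix that introduces ICI, $\mathbf{C}_W = \mathbf{F}\boldsymbol{\Delta}_W\mathbf{F}^{H}$ is the circulant matrix used to possibly reduce the ICI, $\boldsymbol{\Lambda}_W^{(k)} = \mathbf{C}_W\boldsymbol{\Lambda}^{(k)}$ is the windowed channel matrix, and $\mathbf{n}_W[k] = \mathbf{C}_W\mathbf{F}\mathbf{R}_{CP}\mathbf{n}_t[k]$ represents the (possibly colored) noise, with covariance matrix expressed by

$$\mathbf{R}_{n_W n_W} = \sigma_{n_t}^2\,\mathbf{C}_W\mathbf{C}_W^{H}. \tag{11}$$

For conventional OFDM, $\mathbf{C}_W = \mathbf{I}_N$, and the noise is white with $\mathbf{R}_{n_W n_W} = \sigma_{n_t}^2\mathbf{I}_N$. The elements of $\boldsymbol{\Lambda}^{(k)}$ are obtained by the 2D-DFT of the time-varying channel impulse response, as expressed by

$$[\boldsymbol{\Lambda}^{(k)}]_{p+q,\,p} = \frac{1}{N}\sum_{n=0}^{N-1}\sum_{l=0}^{L} h^{(k)}[n,l]\,e^{-j(2\pi/N)(qn + l(p-1))}, \tag{12}$$

where $h^{(k)}[n,l] = h[kP+L+n,\ l]$, $q$ is the discrete Doppler index, $p$ is the discrete frequency index, and the row index $p+q$ is understood modulo $N$. It can be observed that the channel frequency response, for each Doppler component, is stored diagonally on $\boldsymbol{\Lambda}^{(k)}$.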
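The relation between the time-domain channel matrix and the Doppler-frequency matrix can be checked numerically. The following is a minimal sketch, assuming a toy channel (random taps, and a single illustrative Doppler shift of 0.2 subcarrier spacings that is not taken from the paper); the function names are hypothetical. It builds $\mathbf{H}$ with tap $l$ on the $l$th circular subdiagonal, forms $\mathbf{F}\mathbf{H}\mathbf{F}^{H}$, and measures how much energy falls off the main diagonal.

```python
import numpy as np

def doppler_frequency_matrix(h):
    """Return Lambda = F H F^H for channel taps h[n, l] (n = 0..N-1, l = 0..L)."""
    N, num_taps = h.shape
    rows = np.arange(N)
    H = np.zeros((N, N), dtype=complex)
    for l in range(num_taps):
        # After cyclic-prefix removal, tap l lies on the l-th circular subdiagonal.
        H[rows, (rows - l) % N] = h[:, l]
    F = np.fft.fft(np.eye(N)) / np.sqrt(N)  # unitary DFT matrix
    return F @ H @ F.conj().T

def offdiag_energy_fraction(Lam):
    """Fraction of the energy of Lam that lies off the main diagonal (the ICI)."""
    total = np.sum(np.abs(Lam) ** 2)
    return 1.0 - np.sum(np.abs(np.diag(Lam)) ** 2) / total

rng = np.random.default_rng(0)
N, L = 32, 3
taps = rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)
h_ti = np.tile(taps, (N, 1))                       # time-invariant channel
n = np.arange(N)[:, None]
h_tv = h_ti * np.exp(2j * np.pi * 0.2 * n / N)     # toy time variation (illustrative only)

ici_ti = offdiag_energy_fraction(doppler_frequency_matrix(h_ti))
ici_tv = offdiag_energy_fraction(doppler_frequency_matrix(h_tv))
print(f"off-diagonal energy fraction: TI {ici_ti:.2e}, TV {ici_tv:.2e}")
```

For the time-invariant taps the matrix is diagonal up to rounding, whereas the time-varying taps leak energy onto the neighboring diagonals, which is the ICI discussed in the text.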
From now on, we consider a generic OFDM block, and hence we drop the block index $k$. Due to the TV nature of the channel, $\boldsymbol{\Lambda}$ in (10) is not diagonal. However, as shown in [8] for relatively high Doppler spread and in [5] for high Doppler spread, $\boldsymbol{\Lambda}$ is nearly banded, and each diagonal is associated, by means of (12), with a discrete Doppler frequency that introduces ICI. Hence, $\boldsymbol{\Lambda}$ can be approximated by the band matrix $\mathbf{B}$ (Figure 1), thereby neglecting the ICI that comes from faraway subcarriers. We denote with $Q$ the number of subdiagonals and superdiagonals retained from $\boldsymbol{\Lambda}$, so that the total bandwidth of $\mathbf{B}$ is $2Q + 1$. Thus, $\mathbf{B} = \boldsymbol{\Lambda} \circ \mathbf{T}^{(Q)}$, where $\mathbf{T}^{(Q)}$ is an $N \times N$ Toeplitz matrix with lower and upper bandwidth $Q$ [17] and all ones within its band (see Figure 1). The integer parameter $Q$, which can be chosen according to some rules of thumb in [10], is very small when compared with the number of subcarriers $N$, for example, $1 \le Q \le 5$.
In the windowed case, the banded approximation is expressed by $\boldsymbol{\Lambda}_W \approx \mathbf{B}_W$, with $\mathbf{B}_W = \boldsymbol{\Lambda}_W \circ \mathbf{T}^{(Q)}$. Hence, the window design can be tailored to make the channel matrix "more banded," so that $\|\boldsymbol{\Lambda}_W - \mathbf{B}_W\| < \|\boldsymbol{\Lambda} - \mathbf{B}\|$ [10]. Indeed, it was shown in [10] that receiver windowing reduces the band approximation error. In this view, the band approximation is even more justified.
Due to the band approximation of the channel $\boldsymbol{\Lambda}_W \approx \mathbf{B}_W$, the ICI has a finite support. Consequently, it is possible to design the transmitted vector $\bar{\mathbf{a}}$ by partitioning training and data in such a way that they will emerge from the channel (almost) orthogonal. Specifically, as proposed in [15] for time-domain training, and in [14] for the frequency-domain counterpart, we can design the transmitted vector as

$$\bar{\mathbf{a}} = [\mathbf{0}_{1 \times U}\ \ s_0\ \ \mathbf{0}_{1 \times 2U}\ \ \mathbf{d}_0^{T}\ \ \mathbf{0}_{1 \times 2U}\ \ s_1\ \ \mathbf{0}_{1 \times 2U}\ \ \mathbf{d}_1^{T}\ \ \cdots\ \ \mathbf{0}_{1 \times 2U}\ \ s_L\ \ \mathbf{0}_{1 \times 2U}\ \ \mathbf{d}_L^{T}\ \ \mathbf{0}_{1 \times U}]^{T}, \tag{13}$$

where $s_l$ represents the $l$th pilot tone, and $\mathbf{d}_l$ is a $D \times 1$ column vector containing the $l$th portion of the data. By comparing (13) with (2), it is clear that $U = N_V/2$. The parameter $U$ represents the maximum value of $Q$ that preserves at the receiver the orthogonality between data and pilots, in the banded channel. Thus, the choice of $U$ at the transmitter can be made according to the maximum Doppler spread allowed at the receiver. It is interesting to observe that the transmitted vector in (13) contains equispaced pilots, which is an optimal choice also in channels that are not doubly selective [18]. Specifically, for $U = 0$, the pilot pattern of (13) reduces to the optimal pilot placement for OFDM in TI frequency-selective channels [19].
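The band approximation is a simple elementwise masking operation. The sketch below, with hypothetical helper names, builds the all-ones Toeplitz mask with lower and upper bandwidth $Q$ and applies it through a Hadamard product; the random matrix only stands in for a channel matrix.

```python
import numpy as np

def band_mask(n, Q):
    """n x n Toeplitz matrix with ones on the 2Q+1 central diagonals, zeros elsewhere."""
    idx = np.arange(n)
    return (np.abs(idx[:, None] - idx[None, :]) <= Q).astype(float)

def band_approximation(Lam, Q):
    """Keep only the 2Q+1 central diagonals of Lam (Hadamard product with the mask)."""
    return Lam * band_mask(Lam.shape[0], Q)

rng = np.random.default_rng(0)
Lam = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
B = band_approximation(Lam, Q=1)
neglected = np.linalg.norm(Lam - B)   # Frobenius norm of the discarded part
```

The quantity `neglected` is the band approximation error that the receiver window of Section 4 is designed to reduce.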

PILOT-AIDED CHANNEL ESTIMATION
Among the possible channel estimation techniques, training-based techniques seem preferable in time-varying environments, because the channel has to be estimated within a single block. For instance, pilot-aided channel estimation techniques for block transmissions over doubly selective channels have been proposed and analyzed in [7, 14-16]. A common characteristic of all these approaches is the parsimonious modeling of the TV channel by a limited number of parameters that can capture the time variation of the channel within one transmitted data block. The basic idea is to express each TV channel tap as a linear combination of deterministic time-varying functions defined over a limited time span. Hence, the time variability of each channel tap is captured by a limited number of coefficients. This approach is known in the literature as the basis expansion model (BEM), and further details can be found in [20, 21]. The evolution of each channel tap in the time domain during the considered OFDM block is stored diagonally in the matrix $\mathbf{H}$, as summarized by (9), or in the equivalent windowed channel matrix $\mathbf{H}_W = \boldsymbol{\Delta}_W\mathbf{H}$.
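More precisely, the $l$th tap evolution is contained in the vector $\mathbf{h}_l = [h[0,l], \ldots, h[N-1,l]]^{T}$, where $h[n,l]$ represents the $l$th discrete-time channel path at time $n$. The BEM expresses each channel tap vector $\mathbf{h}_l$ as

$$\mathbf{h}_l = \sum_{p=0}^{P}\eta_{l,p}\,\boldsymbol{\xi}_p = \boldsymbol{\Xi}\,\boldsymbol{\eta}_l, \tag{14}$$

where $\boldsymbol{\xi}_p$ represents the $(p+1)$th deterministic basis function of size $N \times 1$, which is the same for all taps and all OFDM blocks, $\eta_{l,p}$ is the $(p+1)$th stochastic parameter for the $(l+1)$th tap during the considered OFDM block, $P + 1$ is the number of basis functions (not to be confused with the block length $P$ of Section 2), $\boldsymbol{\Xi} = [\boldsymbol{\xi}_0, \ldots, \boldsymbol{\xi}_P]$, and $\boldsymbol{\eta}_l = [\eta_{l,0}, \ldots, \eta_{l,P}]^{T}$. Since the channel has been modeled by the BEM, the possibly windowed channel matrix $\mathbf{H}_W$ can be expressed as

$$\mathbf{H}_W = \boldsymbol{\Delta}_W\sum_{l=0}^{L}\operatorname{diag}(\mathbf{h}_l)\mathbf{Z}_l = \sum_{l=0}^{L}\sum_{p=0}^{P}\eta_{l,p}\,\boldsymbol{\Delta}_W\operatorname{diag}(\boldsymbol{\xi}_p)\mathbf{Z}_l,$$

where $\mathbf{Z}_l$ represents the $N \times N$ circulant shift matrix with ones in the $l$th lower diagonal (i.e., $[\mathbf{Z}_l]_{n,(n-l)_{\bmod N}} = 1$) and zero elsewhere. Clearly, $\mathbf{Z}_l$ represents the $l$th delay in the lag domain. Consequently,

$$\boldsymbol{\Lambda}_W = \mathbf{F}\mathbf{H}_W\mathbf{F}^{H} = \sum_{l=0}^{L}\sum_{p=0}^{P}\eta_{l,p}\,\boldsymbol{\Gamma}_{l,p}, \qquad \boldsymbol{\Gamma}_{l,p} = \mathbf{C}_W\mathbf{X}_p\mathbf{F}\mathbf{Z}_l\mathbf{F}^{H}, \tag{16}$$

where $\mathbf{X}_p = \mathbf{F}\operatorname{diag}(\boldsymbol{\xi}_p)\mathbf{F}^{H}$ is a circulant matrix with circulant vector $N^{-1/2}\mathbf{F}\boldsymbol{\xi}_p$, which represents the discrete spectrum of the $(p+1)$th basis function, $\mathbf{F}\mathbf{Z}_l\mathbf{F}^{H}$ is a diagonal matrix, $\boldsymbol{\eta} = [\boldsymbol{\eta}_0^{T}, \ldots, \boldsymbol{\eta}_L^{T}]^{T}$ contains the $(L+1)(P+1)$ BEM parameters, and $\boldsymbol{\Gamma} = [\boldsymbol{\Gamma}_{0,0}, \ldots, \boldsymbol{\Gamma}_{0,P}, \boldsymbol{\Gamma}_{1,0}, \ldots, \boldsymbol{\Gamma}_{1,P}, \ldots, \boldsymbol{\Gamma}_{L,0}, \ldots, \boldsymbol{\Gamma}_{L,P}]$. By (10) and (16), assuming a general BEM, the received vector becomes

$$\mathbf{z}_W = \sum_{l=0}^{L}\sum_{p=0}^{P}\eta_{l,p}\,\boldsymbol{\Gamma}_{l,p}\bar{\mathbf{a}} + \mathbf{n}_W, \tag{17}$$

which can be rewritten as

$$\mathbf{z}_W = \boldsymbol{\Psi}^{(a)}\boldsymbol{\eta} + \mathbf{n}_W,$$

where $\boldsymbol{\Psi}^{(a)} = \boldsymbol{\Gamma}(\mathbf{I}_{(L+1)(P+1)} \otimes \bar{\mathbf{a}})$ is the data-dependent matrix that couples the channel parameters with the received vector. Whatever the choice of the deterministic basis $\{\boldsymbol{\xi}_p\}$, and assuming that the transmitted vector $\bar{\mathbf{a}}$ can be partitioned as the sum of a known training vector $\bar{\mathbf{s}}$ and an unknown data vector $\bar{\mathbf{d}}$, that is, $\bar{\mathbf{a}} = \bar{\mathbf{s}} + \bar{\mathbf{d}}$, where $\bar{\mathbf{s}}$ contains only the pilot symbols and $\bar{\mathbf{d}} = \bar{\mathbf{a}} - \bar{\mathbf{s}}$ (see (13)), the received vector becomes

$$\mathbf{z}_W = \boldsymbol{\Psi}^{(s)}\boldsymbol{\eta} + \boldsymbol{\Lambda}_W\bar{\mathbf{d}} + \mathbf{n}_W,$$

where $\boldsymbol{\Lambda}_W\bar{\mathbf{d}} = \boldsymbol{\Psi}^{(d)}\boldsymbol{\eta}$. Now we introduce the $(2U+1)(L+1) \times N$ matrix $\mathbf{P}_S$ obtained by selecting from the $N \times N$ identity matrix only those rows that correspond to the pilot symbols, that is, the rows with indices from $(4U+D+1)l + 1$ to $(4U+D+1)l + 2U + 1$, for $l = 0, \ldots, L$, as expressed by

$$\mathbf{P}_S = \mathbf{I}_{L+1} \otimes [\mathbf{I}_{2U+1}\ \ \mathbf{0}_{(2U+1) \times (2U+D)}].$$

We obtain

$$\mathbf{P}_S\mathbf{z}_W = \boldsymbol{\Phi}\boldsymbol{\eta} + \mathbf{P}_S\boldsymbol{\Lambda}_W\bar{\mathbf{d}} + \mathbf{P}_S\mathbf{n}_W, \tag{22}$$

where $\boldsymbol{\Phi} = \mathbf{P}_S\boldsymbol{\Psi}^{(s)}$ is a matrix with size $(2U+1)(L+1) \times (P+1)(L+1)$. Note that the pilot pattern design in (13) takes advantage of the (almost) banded nature of the channel. Indeed, we observe that if $\boldsymbol{\Lambda}_W$ is exactly banded with $Q \le U$, $\mathbf{P}_S\boldsymbol{\Lambda}_W\bar{\mathbf{d}}$ in (22) is equal to $\mathbf{0}_{(2U+1)(L+1) \times 1}$, and hence the interference produced by the data is eliminated. However, in general $\boldsymbol{\Lambda}_W$ is not exactly banded, and hence we consider $\mathbf{i} = \mathbf{P}_S\boldsymbol{\Lambda}_W\bar{\mathbf{d}}$ in (22) as an interference term. Consequently, we can estimate the BEM parameters in the least squares (LS) sense, as expressed by

$$\hat{\boldsymbol{\eta}}_{LS} = \boldsymbol{\Phi}^{\dagger}\mathbf{P}_S\mathbf{z}_W, \tag{23}$$

where $\boldsymbol{\Phi}^{\dagger} = (\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{H}$ and $P \le 2U$. Alternatively, if the receiver is aware of the channel statistics, the channel can be estimated in the linear MMSE (LMMSE) sense, as expressed by [22]

$$\hat{\boldsymbol{\eta}}_{LMMSE} = \mathbf{R}_{\eta\eta}\boldsymbol{\Phi}^{H}(\boldsymbol{\Phi}\mathbf{R}_{\eta\eta}\boldsymbol{\Phi}^{H} + \mathbf{R}_{ii} + \mathbf{R}_{nn})^{-1}\mathbf{P}_S\mathbf{z}_W, \tag{24}$$

where $\mathbf{R}_{nn} = \sigma_{n_t}^2\mathbf{P}_S\mathbf{C}_W\mathbf{C}_W^{H}\mathbf{P}_S^{H}$ is the covariance matrix of the selected windowed noise (which reduces to $\mathbf{R}_{nn} = \sigma_{n_t}^2\mathbf{P}_S\mathbf{P}_S^{H} = \sigma_{n_t}^2\mathbf{I}_{(2U+1)(L+1)}$ for rectangular windowing), $\mathbf{R}_{ii} = \mathbf{P}_S\boldsymbol{\Psi}^{(d)}\mathbf{R}_{\eta\eta}\boldsymbol{\Psi}^{(d)H}\mathbf{P}_S^{H}$ is the covariance matrix of the interference, and $\mathbf{R}_{\eta\eta} = E\{\boldsymbol{\eta}\boldsymbol{\eta}^{H}\}$ is the covariance matrix of the $(P+1)(L+1)$ channel parameters, composed of square submatrices $\{\mathbf{R}_{\eta_l\eta_j} = E\{\boldsymbol{\eta}_l\boldsymbol{\eta}_j^{H}\}\}$ of size $P + 1$. Bearing in mind (14), it is easy to show that $\mathbf{R}_{\eta_l\eta_j}$ can be obtained from the knowledge of the channel statistics, as expressed by $\mathbf{R}_{\eta_l\eta_j} = \boldsymbol{\Xi}^{\dagger}E\{\mathbf{h}_l\mathbf{h}_j^{H}\}\boldsymbol{\Xi}^{\dagger H}$. After estimating the BEM parameter vector $\boldsymbol{\eta}$, for example, by (23) or (24), we can recover the channel matrix $\boldsymbol{\Lambda}_W$ by (16).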
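The step from the double sum over BEM coefficients to a linear model in the parameter vector can be verified numerically. The sketch below assumes that the data-dependent matrix has the Kronecker form $\boldsymbol{\Gamma}(\mathbf{I}_K \otimes \bar{\mathbf{a}})$, with $K$ playing the role of $(L+1)(P+1)$; random matrices stand in for the blocks $\boldsymbol{\Gamma}_{l,p}$, and no pilot structure or noise is modeled. It also shows a noiseless LS estimate through the pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 8, 6   # K stands in for (L+1)(P+1); illustrative sizes only

def crandn(*shape):
    """Complex Gaussian array (helper for this sketch)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

Gammas = [crandn(N, N) for _ in range(K)]   # stand-ins for the blocks Gamma_{l,p}
a = crandn(N)                               # transmitted vector
eta = crandn(K)                             # BEM parameters

# Channel-matrix form: (sum_k eta_k Gamma_k) a
lhs = sum(e * G for e, G in zip(eta, Gammas)) @ a

# Parameter form: Psi eta, with Psi = Gamma (I_K kron a)
Gamma = np.hstack(Gammas)                        # N x (N K)
Psi = Gamma @ np.kron(np.eye(K), a[:, None])     # N x K, column k equals Gamma_k a
rhs = Psi @ eta

# Noiseless LS estimate of the parameters via the pseudoinverse
eta_ls = np.linalg.pinv(Psi) @ rhs
```

Because the model is linear in the parameters, the same pseudoinverse step applies when only the pilot rows are retained, provided that the resulting matrix has full column rank.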
Depending on the chosen basis matrix $\boldsymbol{\Xi}$, the channel matrix $\boldsymbol{\Lambda}_W$ obtained by (16) could be banded or nonbanded. A popular choice for the basis functions is represented by complex exponentials (CE) [20], which is also suggested by the banded assumption for the channel matrix $\boldsymbol{\Lambda}_W$. Indeed, for CE with $P = 2Q$, the $p$th basis function is $\boldsymbol{\xi}_p = \mathbf{f}_{p-Q}$, which represents a discrete Doppler frequency shift. Consequently, each circulant matrix $\mathbf{X}_p$ in (16) becomes a circular shift by $p - Q$ positions (up to a scale factor), which clearly reveals the banded nature of the channel matrix. However, for the sake of generality, other bases that do not lead to a perfectly banded channel matrix could be considered. A possibility is the use of discrete prolate spheroidal (DPS) sequences as basis functions [23]. Another basis is the polynomial (POL) basis, where $[\boldsymbol{\xi}_p]_n = ((n-1)/N)^{p}$, similarly to that proposed in [24]. A third option is based on generalized complex exponentials (GCE), where $[\boldsymbol{\xi}_p]_n = e^{j2\pi(p-Q)(n-1)/(KN)}$, which represents a truncated oversampled Fourier basis [25]. Orthonormal and/or windowed versions of these bases are also possible. In all these cases, except for the CE, the estimated channel matrix $\hat{\boldsymbol{\Lambda}}_W$ is not perfectly banded. However, we have already discussed the nearly banded structure of the true channel matrix. Hence, we select only the $2Q + 1$ main diagonals of $\hat{\boldsymbol{\Lambda}}_W$, thus obtaining $\hat{\mathbf{B}}_W = \hat{\boldsymbol{\Lambda}}_W \circ \mathbf{T}^{(Q)}$.
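To make the role of the basis concrete, the following sketch builds the CE, GCE, and POL bases from the definitions in the text and fits a single tap trajectory in the LS sense. The test tap (a complex exponential with a fractional normalized Doppler of 0.3) and the sizes are illustrative assumptions, not values from the paper; the function names are hypothetical.

```python
import numpy as np

def ce_basis(N, Q):
    """Complex exponentials at integer Doppler indices -Q..Q (P = 2Q)."""
    n = np.arange(N)[:, None]
    q = np.arange(-Q, Q + 1)[None, :]
    return np.exp(2j * np.pi * q * n / N)

def gce_basis(N, Q, K=2):
    """Generalized complex exponentials: Fourier basis oversampled by K."""
    n = np.arange(N)[:, None]
    q = np.arange(-Q, Q + 1)[None, :]
    return np.exp(2j * np.pi * q * n / (K * N))

def pol_basis(N, P):
    """Polynomial basis with entries (n / N)^p for zero-based n, p = 0..P."""
    n = np.arange(N)[:, None]
    return (n / N) ** np.arange(P + 1)[None, :]

def bem_fit_error(Xi, h):
    """Relative LS modeling error of the tap trajectory h on the basis Xi."""
    eta = np.linalg.pinv(Xi) @ h
    return np.linalg.norm(h - Xi @ eta) / np.linalg.norm(h)

N, Q = 64, 1
n = np.arange(N)
h_frac = np.exp(2j * np.pi * 0.3 * n / N)          # fractional Doppler shift (toy tap)
h_in_span = ce_basis(N, Q) @ np.array([0.5, 1.0, -0.2j])

err_ce = bem_fit_error(ce_basis(N, Q), h_frac)
err_gce = bem_fit_error(gce_basis(N, Q), h_frac)
err_pol = bem_fit_error(pol_basis(N, 2 * Q), h_frac)
err_exact = bem_fit_error(ce_basis(N, Q), h_in_span)
```

A trajectory that lies in the span of the basis is fitted exactly, while a fractional Doppler shift leaves a modeling error that depends on the chosen basis.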

BANDED EQUALIZERS
In this section, we present some low-complexity equalizers obtained by exploiting the band approximation of the Doppler-frequency channel matrix. We start by summarizing some results derived in [9], where we proposed a banded MMSE block linear equalizer (BLE) without considering the potential benefit of receiver windowing. Subsequently, we focus on the window design and derive the windowed MMSE-BLE (W-MMSE-BLE). Finally, we extend the proposed approach to consider the MMSE-BDFE and the windowed MMSE-BDFE (W-MMSE-BDFE).
In our equalizer designs, we assume that the $2U$ subcarriers at the edges of the received block $\mathbf{z}$ are removed. Indeed, because of the edge guard bands in the transmitted block (13), the received block $\mathbf{z}$ contains little transmitted power in its edge subcarriers, which could also be affected by adjacent channel interference (ACI). In any case, similar equalizer designs without guard band removal can be obtained with minor modifications.
As a consequence of the edge guard band removal, we denote by $\tilde{\mathbf{z}}_W$ the $N_A \times 1$ middle block of $\mathbf{z}_W$, by $\tilde{\boldsymbol{\Lambda}}_W$ the $N_A \times N_A$ middle block of $\boldsymbol{\Lambda}_W$, and $\tilde{\mathbf{B}}_W = \tilde{\boldsymbol{\Lambda}}_W \circ \tilde{\mathbf{T}}^{(Q)}$, where $\tilde{\mathbf{T}}^{(Q)}$ is an $N_A \times N_A$ Toeplitz matrix defined like $\mathbf{T}^{(Q)}$. In addition, when no windowing is applied, we omit the subscript $W$ for the sake of clarity, and hence use $\tilde{\mathbf{z}}$, $\tilde{\boldsymbol{\Lambda}}$, and $\tilde{\mathbf{B}}$, instead of $\tilde{\mathbf{z}}_W$, $\tilde{\boldsymbol{\Lambda}}_W$, and $\tilde{\mathbf{B}}_W$, respectively.

MMSE-BLE
The band approximation $\tilde{\boldsymbol{\Lambda}} \approx \tilde{\mathbf{B}}$ has been exploited in [9] to design a low-complexity MMSE-BLE, as expressed by

$$\tilde{\mathbf{a}}_{\text{MMSE-BLE}} = \mathbf{G}_{\text{MMSE-BLE}}\,\tilde{\mathbf{z}}, \tag{26}$$

$$\mathbf{G}_{\text{MMSE-BLE}} = \tilde{\mathbf{B}}^{H}(\tilde{\mathbf{B}}\tilde{\mathbf{B}}^{H} + \gamma^{-1}\mathbf{I}_{N_A})^{-1}, \tag{27}$$

where the SNR $\gamma = \sigma_a^2/\sigma_{n_t}^2$ is assumed known to the receiver. By exploiting a band LDL factorization of the band matrix $\tilde{\mathbf{B}}\tilde{\mathbf{B}}^{H} + \gamma^{-1}\mathbf{I}_{N_A}$, (26) requires approximately $(8Q^2 + 22Q + 4)N_A$ complex operations [9]. The bandwidth parameter $Q$ can be chosen to trade off performance for complexity. Since $Q \ll N_A$, the computational complexity of the banded MMSE-BLE (26)-(27) is $O(N_A)$, that is, significantly smaller than that for other linear MMSE equalizers previously proposed, whose complexity is quadratic [5] or even cubic [6] in the number of subcarriers. In addition, as shown in [19], the complexity of the MMSE-BLE is lower than that for a noniterative banded MMSE-SLE, that is, the MMSE-SLE used to initialize the iterative ICI cancellation technique in [10].
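A minimal sketch of how a band LDL factorization yields linear complexity is given below. It is not the optimized algorithm of [9]: matrices are stored densely for readability and the product of the banded matrix with its Hermitian is formed by a dense multiplication, but the factorization and substitution loops only visit entries inside the band. The function names and the random test data are assumptions made for illustration.

```python
import numpy as np

def band_ldl(M, bw):
    """LDL^H factorization of a Hermitian positive-definite matrix with half-bandwidth bw."""
    n = M.shape[0]
    L = np.eye(n, dtype=complex)
    d = np.zeros(n)
    for j in range(n):
        s = slice(max(0, j - bw), j)
        d[j] = (M[j, j] - np.sum(np.abs(L[j, s]) ** 2 * d[s])).real
        for i in range(j + 1, min(n, j + bw + 1)):
            s = slice(max(0, i - bw), j)          # columns where both rows are nonzero
            L[i, j] = (M[i, j] - np.sum(L[i, s] * L[j, s].conj() * d[s])) / d[j]
    return L, d

def solve_ldl(L, d, b, bw):
    """Solve (L diag(d) L^H) x = b by banded forward and backward substitution."""
    n = len(d)
    y = b.astype(complex)
    for i in range(n):                            # L y = b
        lo = max(0, i - bw)
        y[i] -= L[i, lo:i] @ y[lo:i]
    y /= d                                        # diagonal scaling
    for i in range(n - 1, -1, -1):                # L^H x = y
        hi = min(n, i + bw + 1)
        y[i] -= L[i + 1:hi, i].conj() @ y[i + 1:hi]
    return y

def banded_mmse_ble(B, z, gamma, Q):
    """Soft estimate B^H (B B^H + I / gamma)^{-1} z using the band LDL factorization."""
    M = B @ B.conj().T + np.eye(B.shape[0]) / gamma   # Hermitian, half-bandwidth 2Q
    L, d = band_ldl(M, 2 * Q)
    return B.conj().T @ solve_ldl(L, d, z, 2 * Q)

rng = np.random.default_rng(2)
n_act, Q, gamma = 16, 2, 10.0
idx = np.arange(n_act)
mask = np.abs(idx[:, None] - idx[None, :]) <= Q
B = (rng.standard_normal((n_act, n_act)) + 1j * rng.standard_normal((n_act, n_act))) * mask
z = rng.standard_normal(n_act) + 1j * rng.standard_normal(n_act)

a_band = banded_mmse_ble(B, z, gamma, Q)
a_ref = B.conj().T @ np.linalg.solve(B @ B.conj().T + np.eye(n_act) / gamma, z)
```

Each row of the factor has at most $2Q$ off-diagonal entries, so both the factorization and the two substitutions cost a number of operations proportional to the number of active subcarriers.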

Banded MMSE-BLE with windowing
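We now investigate a time-domain windowing technique that makes the channel matrix $\boldsymbol{\Lambda}_W$ more banded than $\boldsymbol{\Lambda}$. Our aim is to improve the performance of the banded MMSE-BLE by reducing the band approximation error.
It is clear that the main difference with respect to Section 4.1 is the noise coloring produced by the windowing operation, as expressed by (11). By neglecting the edge null subcarriers, (10) can be rewritten as

$$\tilde{\mathbf{z}}_W = \underset{\sim}{\mathbf{C}}_W\boldsymbol{\Lambda}\bar{\mathbf{a}} + \underset{\sim}{\mathbf{C}}_W\mathbf{n},$$

where $\mathbf{n} = \mathbf{F}\mathbf{R}_{CP}\mathbf{n}_t$, and $\underset{\sim}{\mathbf{C}}_W$ is the middle block of $\mathbf{C}_W$ with size $N_A \times N$. Hence, by the band approximation $\tilde{\boldsymbol{\Lambda}}_W \approx \tilde{\mathbf{B}}_W$, the banded W-MMSE-BLE is expressed by

$$\mathbf{G}_{\text{W-MMSE-BLE}} = \tilde{\mathbf{B}}_W^{H}(\tilde{\mathbf{B}}_W\tilde{\mathbf{B}}_W^{H} + \gamma^{-1}\underset{\sim}{\mathbf{C}}_W\underset{\sim}{\mathbf{C}}_W^{H})^{-1}. \tag{30}$$

In this view, we consider the minimum band approximation error (MBAE) sum-of-exponentials (SOE) window, which is expressed by

$$[\mathbf{w}]_n = \sum_{q=-Q}^{Q} b_q\,e^{j2\pi q(n-1)/N}, \tag{31}$$

where the coefficients $\{b_q\}$ are designed in order to minimize $\|\boldsymbol{\Lambda}_W - \mathbf{B}_W\|$. Thanks to the SOE constraint, the covariance matrix of the windowed noise is banded with total bandwidth $4Q + 1$. This leads to linear MMSE equalization algorithms characterized by a very low complexity, which is linear in the number of subcarriers, as detailed in Section 4.2.2.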

Window design
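Our goal is to design a receiver window with two features: it should make the windowed channel matrix as close to banded as possible, and it should keep the covariance matrix of the windowed noise banded, so that the band LDL factorization can still be exploited.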
We point out that, without the band approximation, the application of a time-domain window at the receiver does not change the MSE of the MMSE-BLE. This is why we adopt the minimum band approximation error (MBAE) criterion, which can be mathematically expressed as follows. Choose $\mathbf{w}$ that minimizes $E\{\|\mathbf{E}_W\|^2\}$, where $\mathbf{E}_W = \boldsymbol{\Lambda}_W - \mathbf{B}_W$, subject to the energy constraint $\operatorname{tr}(\boldsymbol{\Delta}_W^2) = N$. (Equivalently, $E\{\|\mathbf{B}_W\|^2\}$ can be maximized subject to the same constraint.) Note that this criterion is similar to the max Average-SINR criterion of [10]. Indeed, also in [10] the goal is to make the channel matrix more banded, in order to facilitate an iterative ICI cancellation receiver. Differently, in our case, we want to exploit the band LDL factorization, and hence we also require the matrix $\underset{\sim}{\mathbf{C}}_W\underset{\sim}{\mathbf{C}}_W^{H}$ in (30) to be banded. Since this matrix is the middle block of $\mathbf{C}_W\mathbf{C}_W^{H} = \mathbf{F}\boldsymbol{\Delta}_W^2\mathbf{F}^{H}$, we impose the SOE constraint, that is, the elements of the window $\mathbf{w}$ should satisfy (31). Indeed, when $\mathbf{w}$ is a sum of $2Q + 1$ complex exponentials, the diagonal of $\boldsymbol{\Delta}_W^2$ can be expressed as the sum of $4Q + 1$ exponentials, and consequently, by the properties of the FFT matrix, $\mathbf{F}\boldsymbol{\Delta}_W^2\mathbf{F}^{H}$ is exactly banded with lower and upper bandwidth $2Q$. Obviously, the class of SOE windows includes some common cosine-based windows such as Hamming, Hann, and Blackman. The SOE constraint (31) can also be expressed by

$$\mathbf{w} = \tilde{\mathbf{F}}\mathbf{b}, \tag{32}$$

where $\tilde{\mathbf{F}}$ is the $N \times (2Q+1)$ matrix with entries $[\tilde{\mathbf{F}}]_{n,q+Q+1} = e^{j2\pi q(n-1)/N}$, and $\mathbf{b} = [b_{-Q}, \ldots, b_Q]^{T}$ is a vector of size $2Q + 1$ that contains the design parameters. By applying the MBAE criterion, by [10, Appendix], we obtain

$$E\{\|\mathbf{B}_W\|^2\} = \mathbf{w}^{H}(\mathbf{R}_{\bar{H}\bar{H}} \circ \mathbf{A})\mathbf{w}, \tag{33}$$

where $\mathbf{R}_{\bar{H}\bar{H}} = E\{\bar{\mathbf{H}}\bar{\mathbf{H}}^{H}\}$, $\bar{\mathbf{H}}$ is an $N \times N$ matrix obtained from $\mathbf{H}$ by rearranging the diagonals as columns, that is, $[\bar{\mathbf{H}}]_{n,l+1} = h[n-1,l]$, and $\mathbf{A}$ is the $N \times N$ Dirichlet-kernel matrix with entries proportional to $\sin(\pi(2Q+1)(m-n)/N)/\sin(\pi(m-n)/N)$. By maximizing (33) with the SOE constraint (32), the window parameters in $\mathbf{b}$ are obtained by the eigenvector that corresponds to the largest eigenvalue of $\tilde{\mathbf{F}}^{H}(\mathbf{R}_{\bar{H}\bar{H}} \circ \mathbf{A})\tilde{\mathbf{F}}$. Note that this maximization leads to $b_q = b_{-q}^{*}$, and consequently the MBAE-SOE window is real and symmetric.
We remark that the window design depends not only on the selected $Q$, but also on the time-domain channel autocorrelation $\mathbf{R}_{\bar{H}\bar{H}}$, and hence on the maximum Doppler frequency $f_D$. Therefore, even if we assume a specific Doppler spectrum (e.g., Jakes), the designed window will be different for each pair $(f_D, Q)$. Nevertheless, we will show that for reasonable values of $f_D$ the designed window changes only slightly. Consequently, a small set of window parameters can be designed and stored at the receiver, and chosen depending on $(f_D, Q)$.
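The banded noise covariance produced by an SOE window can be checked on a familiar example. The sketch below uses the periodic Hamming window, written as a sum of three exponentials ($Q = 1$), as an assumed stand-in for the MBAE-SOE design; it is not the optimized window of the paper. It verifies that $\mathbf{F}\boldsymbol{\Delta}_W^2\mathbf{F}^{H}$ vanishes outside circular distance $2Q$ from the main diagonal.

```python
import numpy as np

def soe_window(b, N):
    """w[n] = sum_q b_q exp(j 2 pi q n / N), with b listed from q = -Q to q = Q."""
    Q = (len(b) - 1) // 2
    n = np.arange(N)[:, None]
    q = np.arange(-Q, Q + 1)[None, :]
    return np.exp(2j * np.pi * q * n / N) @ np.asarray(b, dtype=complex)

N, Q = 64, 1
b = np.array([-0.23, 0.54, -0.23])        # periodic Hamming: 0.54 - 0.46 cos(2 pi n / N)
w = soe_window(b, N).real                 # symmetric real b gives a real window
w *= np.sqrt(N / np.sum(w ** 2))          # energy constraint tr(Delta_W^2) = N

F = np.fft.fft(np.eye(N)) / np.sqrt(N)    # unitary DFT matrix
noise_cov = F @ np.diag(w ** 2) @ F.conj().T   # proportional to the windowed noise covariance

idx = np.arange(N)
diff = (idx[:, None] - idx[None, :]) % N
circ_dist = np.minimum(diff, N - diff)    # circular distance from the main diagonal
max_outside = np.abs(noise_cov[circ_dist > 2 * Q]).max()
ratio = 2 * abs(b[2]) / b[1]              # the ratio 2|b_1| / b_0 of the window
```

Since the matrix is circulant, the band wraps around the corners; these corner entries fall outside the middle block once at least $Q$ subcarriers are removed at each edge, so the middle block is banded in the ordinary sense.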

Computational complexity
We show that the windowing operation produces a minimal increase in terms of computational complexity, measured in complex additions (CA) and complex multiplications (CM). In this computation, we neglect the complexity of the window design, which can be performed offline. For the same reason, we also neglect the computation of $\gamma^{-1}\underset{\sim}{\mathbf{C}}_W\underset{\sim}{\mathbf{C}}_W^{H}$, which depends only on the window and on the SNR. Moreover, due to the SOE constraint, only $4Q + 1$ entries in each row of this matrix are different from zero. Consequently, since $\tilde{\mathbf{B}}_W\tilde{\mathbf{B}}_W^{H}$ is Hermitian, only $(2Q + 1)N_A$ CA are required to add to it the matrix $\gamma^{-1}\underset{\sim}{\mathbf{C}}_W\underset{\sim}{\mathbf{C}}_W^{H}$, which is also Hermitian. In the absence of windowing, only $N_A$ CA were necessary. Hence, $2QN_A$ extra CA are required. In addition, $N$ extra CM are needed to obtain $\boldsymbol{\Delta}_W\mathbf{H}$ in $\boldsymbol{\Lambda}_W$. We do not consider the complexity of the FFT, which should be performed also in the absence of windowing. As a result, the complexity increase of the banded MMSE-BLE due to windowing is roughly $(2Q + 1)N_A$ complex operations, for a total of $(8Q^2 + 24Q + 5)N_A$ complex operations.
For the SLEs, the complexity increase is nearly equal to that for the BLEs. Hence, the W-MMSE-BLE is less complex than the noniterative MMSE-SLE with windowing.

Equalizer design
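We design a banded BDFE that exploits the low complexity offered by the band LDL factorization algorithm of [9]. To design the feedforward filter $\mathbf{F}_F$ and the feedback filter $\mathbf{F}_B$ (see Figure 2), we adopt the MMSE approach of [12]. This approach minimizes the quantity $\text{MSE} = \operatorname{tr}(\mathbf{R}_{ee})$, where $\mathbf{R}_{xy} = E\{\mathbf{x}\mathbf{y}^{H}\}$ and $\mathbf{e} = \tilde{\mathbf{a}} - \mathbf{a}$, with $\tilde{\mathbf{a}}$ the soft estimate at the input of the decision device (Figure 2). We also impose the constraint that $\mathbf{F}_B$ is strictly upper triangular, so that the feedback process can be performed by successive cancellation [13].
By the standard assumption of correct past decisions, that is, $\hat{\mathbf{a}} = \mathbf{a}$, the error vector can be expressed by $\mathbf{e} = \mathbf{F}_F\tilde{\mathbf{z}} - (\mathbf{F}_B + \mathbf{I}_{N_A})\mathbf{a}$. By the orthogonality principle, it holds $\mathbf{R}_{ez} = \mathbf{0}_{N_A \times N_A}$, which leads to

$$\mathbf{F}_F = (\mathbf{F}_B + \mathbf{I}_{N_A})\tilde{\boldsymbol{\Lambda}}^{H}(\tilde{\boldsymbol{\Lambda}}\tilde{\boldsymbol{\Lambda}}^{H} + \gamma^{-1}\mathbf{I}_{N_A})^{-1}.$$

We now apply the band approximation $\tilde{\boldsymbol{\Lambda}} \approx \tilde{\mathbf{B}}$, which by (27) leads to

$$\mathbf{F}_F = (\mathbf{F}_B + \mathbf{I}_{N_A})\mathbf{G}_{\text{MMSE-BLE}}. \tag{36}$$

This result points out that the feedforward filter is the cascade of the low-complexity MMSE-BLE $\mathbf{G}_{\text{MMSE-BLE}}$, and an upper triangular matrix $\mathbf{F}_B + \mathbf{I}_{N_A}$ with unit diagonal. To design $\mathbf{F}_B$, we observe that $\mathbf{R}_{ee}$ can be expressed as

$$\mathbf{R}_{ee} = (\mathbf{F}_B + \mathbf{I}_{N_A})(\mathbf{R}_{aa} - \mathbf{R}_{az}\mathbf{R}_{zz}^{-1}\mathbf{R}_{za})(\mathbf{F}_B + \mathbf{I}_{N_A})^{H}.$$

After standard calculations that also involve the matrix inversion lemma, we obtain

$$\mathbf{R}_{ee} = \sigma_{n_t}^2(\mathbf{F}_B + \mathbf{I}_{N_A})(\tilde{\boldsymbol{\Lambda}}^{H}\tilde{\boldsymbol{\Lambda}} + \gamma^{-1}\mathbf{I}_{N_A})^{-1}(\mathbf{F}_B + \mathbf{I}_{N_A})^{H}.$$

To exploit the computational advantages given by the LDL factorization, we make the band approximation $\tilde{\boldsymbol{\Lambda}}^{H}\tilde{\boldsymbol{\Lambda}} \approx \tilde{\mathbf{B}}^{H}\tilde{\mathbf{B}}$, thus obtaining

$$\mathbf{R}_{ee} = \sigma_{n_t}^2(\mathbf{F}_B + \mathbf{I}_{N_A})(\tilde{\mathbf{B}}^{H}\tilde{\mathbf{B}} + \gamma^{-1}\mathbf{I}_{N_A})^{-1}(\mathbf{F}_B + \mathbf{I}_{N_A})^{H}. \tag{39}$$

By using the LDL factorization

$$\mathbf{M}_2 = \tilde{\mathbf{B}}^{H}\tilde{\mathbf{B}} + \gamma^{-1}\mathbf{I}_{N_A} = \mathbf{L}_2\mathbf{D}_2\mathbf{L}_2^{H}, \tag{40}$$

we obtain $\mathbf{R}_{ee} = \sigma_{n_t}^2(\mathbf{F}_B + \mathbf{I}_{N_A})\mathbf{L}_2^{-H}\mathbf{D}_2^{-1}\mathbf{L}_2^{-1}(\mathbf{F}_B + \mathbf{I}_{N_A})^{H}$, and hence $\operatorname{tr}(\mathbf{R}_{ee})$ can be simply minimized by setting

$$\mathbf{F}_B = \mathbf{L}_2^{H} - \mathbf{I}_{N_A}, \tag{41}$$

which renders $\mathbf{R}_{ee}$ diagonal. By (27), (36), (40), and (41), we obtain

$$\mathbf{F}_F = \mathbf{L}_2^{H}\mathbf{G}_{\text{MMSE-BLE}} = \mathbf{D}_2^{-1}\mathbf{L}_2^{-1}\tilde{\mathbf{B}}^{H}. \tag{42}$$

Since $\tilde{\mathbf{B}}$ is banded, $\mathbf{L}_2$ is lower triangular and banded, and $\mathbf{D}_2$ is diagonal, it turns out that the banded MMSE-BDFE is characterized by a very low complexity, as detailed in the following.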
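The filter design can be illustrated with a short numerical sketch. For brevity it obtains the LDL factors from a dense Cholesky factorization, which is a stand-in for the band LDL algorithm of [9] and does not have linear complexity; the banded matrix is random and the function name is hypothetical. The sketch checks that the feedback filter is strictly upper triangular and that the resulting error covariance is diagonal.

```python
import numpy as np

def banded_bdfe_filters(B, gamma):
    """Feedforward and feedback filters from M2 = B^H B + I / gamma = L2 D2 L2^H."""
    n = B.shape[1]
    M2 = B.conj().T @ B + np.eye(n) / gamma
    C = np.linalg.cholesky(M2)                 # M2 = C C^H, with C lower triangular
    diag_c = np.real(np.diag(C))
    L2 = C / diag_c                            # scale each column: unit-diagonal factor
    d2 = diag_c ** 2                           # diagonal of D2
    F_B = L2.conj().T - np.eye(n)              # strictly upper triangular feedback filter
    F_F = np.linalg.solve(L2, B.conj().T) / d2[:, None]   # D2^{-1} L2^{-1} B^H
    return F_F, F_B, M2, d2

rng = np.random.default_rng(3)
n_act, Q, gamma = 12, 1, 10.0
idx = np.arange(n_act)
mask = np.abs(idx[:, None] - idx[None, :]) <= Q
B = (rng.standard_normal((n_act, n_act)) + 1j * rng.standard_normal((n_act, n_act))) * mask

F_F, F_B, M2, d2 = banded_bdfe_filters(B, gamma)
T = F_B + np.eye(n_act)
R_ee_scaled = T @ np.linalg.inv(M2) @ T.conj().T     # error covariance divided by the noise power
G_ble = np.linalg.solve(M2, B.conj().T)              # MMSE-BLE in the form M2^{-1} B^H
```

With this choice of feedback filter, `R_ee_scaled` equals the inverse of the diagonal factor, and the feedforward filter coincides with the cascade of the BLE and the unit-diagonal upper triangular matrix.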

Complexity analysis
We now compute the number of complex operations necessary to perform the proposed banded MMSE-BDFE. By means of (41) and (42), the soft output of the MMSE-BDFE, expressed by $\tilde{\mathbf{a}} = \mathbf{F}_F\tilde{\mathbf{z}} - \mathbf{F}_B\hat{\mathbf{a}}$, can be rewritten as

$$\tilde{\mathbf{a}} = \mathbf{D}_2^{-1}\mathbf{L}_2^{-1}\tilde{\mathbf{B}}^{H}\tilde{\mathbf{z}} - (\mathbf{L}_2^{H} - \mathbf{I}_{N_A})\hat{\mathbf{a}}.$$

Since $\tilde{\mathbf{B}}$ is banded, we need $(2Q + 1)N_A$ CM and $2QN_A$ CA to obtain $\boldsymbol{\mu} = \tilde{\mathbf{B}}^{H}\tilde{\mathbf{z}}$. The matrices $\mathbf{L}_2$ and $\mathbf{D}_2$ are obtained by band LDL factorization of $\mathbf{M}_2$. From [9], $(2Q^2 + 3Q + 1)N_A$ CM and $(2Q^2 + Q + 1)N_A$ CA are necessary to obtain $\mathbf{M}_2$.
It is worth noting that, thanks to the banded approach, the proposed MMSE-BDFE is characterized by exactly the same complexity as the MMSE-BLE, which is linear in the number of subcarriers. Therefore, the proposed banded MMSE-BDFE is less complex than other nonbanded DFE schemes. Just to consider a few, the serial DFE [5] has quadratic complexity, while the complexity of the V-BLAST-like successive detection [6] is $O(N_A^4)$.
EURASIP Journal on Applied Signal Processing

Performance analysis
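We compare the mean-squared error (MSE) performance of the banded BDFE with the banded BLE of [9]. By (39) and (41), it is easy to verify that

$$\text{MSE}_{\text{BDFE}} = \sigma_{n_t}^2\operatorname{tr}(\mathbf{D}_2^{-1}). \tag{44}$$

Moreover, the MMSE-BLE can be obtained from the MMSE-BDFE by setting the feedback filter to zero. Thus, from (39) with $\mathbf{F}_B = \mathbf{0}_{N_A \times N_A}$, we obtain

$$\text{MSE}_{\text{BLE}} = \sigma_{n_t}^2\operatorname{tr}(\mathbf{M}_2^{-1}) = \sigma_{n_t}^2\operatorname{tr}(\mathbf{L}_2^{-H}\mathbf{D}_2^{-1}\mathbf{L}_2^{-1}),$$

which is greater than $\text{MSE}_{\text{BDFE}}$ in (44). Hence, we expect that the bit error rate (BER) of the proposed MMSE-BDFE will be lower than that for the MMSE-BLE. However, we still expect a BER floor, due to the band approximation of the channel matrix. This fact will be confirmed later by simulations.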
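The ordering of the two MSE values can be verified directly from the factorization. The following short derivation is a sketch that assumes $\mathbf{M}_2 = \mathbf{L}_2\mathbf{D}_2\mathbf{L}_2^{H}$ with $\mathbf{L}_2$ unit-diagonal lower triangular and $\mathbf{D}_2$ diagonal with positive entries, as in the text; it compares the diagonal entries of $\mathbf{M}_2^{-1}$ and $\mathbf{D}_2^{-1}$.

```latex
\begin{align}
[\mathbf{M}_2^{-1}]_{i,i}
  &= [\mathbf{L}_2^{-H}\mathbf{D}_2^{-1}\mathbf{L}_2^{-1}]_{i,i}
   = \sum_{k \ge i} \frac{\bigl|[\mathbf{L}_2^{-1}]_{k,i}\bigr|^2}{[\mathbf{D}_2]_{k,k}} \\
  &= \frac{1}{[\mathbf{D}_2]_{i,i}}
   + \sum_{k > i} \frac{\bigl|[\mathbf{L}_2^{-1}]_{k,i}\bigr|^2}{[\mathbf{D}_2]_{k,k}}
   \;\ge\; \frac{1}{[\mathbf{D}_2]_{i,i}}, \\
\operatorname{tr}(\mathbf{M}_2^{-1}) &\ge \operatorname{tr}(\mathbf{D}_2^{-1}).
\end{align}
```

The sum starts at $k = i$ because $\mathbf{L}_2^{-1}$ is lower triangular, and its $k = i$ term equals $1/[\mathbf{D}_2]_{i,i}$ because $\mathbf{L}_2^{-1}$ has unit diagonal. Equality holds only when $\mathbf{L}_2$ is the identity, that is, when $\mathbf{M}_2$ is diagonal and the banded channel introduces no ICI.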

Banded MMSE-BDFE with windowing
In Sections 4.2 and 4.3, we have presented two low-complexity equalizers that exploit either MBAE-SOE windowing or decision feedback. In this section, we combine the banded BDFE with MBAE-SOE windowing.

Equalizer design
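The equalizer design follows the same MMSE approach of Section 4.3, hence we highlight the main differences introduced by windowing. In the windowed case, the error vector is expressed by $\mathbf{e} = \mathbf{F}_F\tilde{\mathbf{z}}_W - (\mathbf{F}_B + \mathbf{I}_{N_A})\mathbf{a}$, and the orthogonality principle leads to

$$\mathbf{F}_F = (\mathbf{F}_B + \mathbf{I}_{N_A})\tilde{\boldsymbol{\Lambda}}_W^{H}(\tilde{\boldsymbol{\Lambda}}_W\tilde{\boldsymbol{\Lambda}}_W^{H} + \gamma^{-1}\underset{\sim}{\mathbf{C}}_W\underset{\sim}{\mathbf{C}}_W^{H})^{-1}.$$

We can apply $\tilde{\boldsymbol{\Lambda}}_W \approx \tilde{\mathbf{B}}_W$, thereby obtaining

$$\mathbf{F}_F = (\mathbf{F}_B + \mathbf{I}_{N_A})\mathbf{G}_{\text{W-MMSE-BLE}}.$$

To design $\mathbf{F}_B$, we observe that $\mathbf{R}_{ee} = (\mathbf{F}_B + \mathbf{I}_{N_A})(\mathbf{R}_{aa} - \mathbf{R}_{az}\mathbf{R}_{zz}^{-1}\mathbf{R}_{za})(\mathbf{F}_B + \mathbf{I}_{N_A})^{H}$. By the matrix inversion lemma, we obtain

$$\mathbf{R}_{ee} = \sigma_{n_t}^2(\mathbf{F}_B + \mathbf{I}_{N_A})\bigl(\tilde{\boldsymbol{\Lambda}}_W^{H}(\underset{\sim}{\mathbf{C}}_W\underset{\sim}{\mathbf{C}}_W^{H})^{-1}\tilde{\boldsymbol{\Lambda}}_W + \gamma^{-1}\mathbf{I}_{N_A}\bigr)^{-1}(\mathbf{F}_B + \mathbf{I}_{N_A})^{H}.$$

We now make the approximation

$$\mathbf{R}_{ee} \approx \sigma_{n_t}^2(\mathbf{F}_B + \mathbf{I}_{N_A})\mathbf{M}_W^{-1}(\mathbf{F}_B + \mathbf{I}_{N_A})^{H}, \tag{49}$$

where

$$\mathbf{M}_W = \underset{\sim}{\boldsymbol{\Lambda}}_W^{H}(\mathbf{C}_W\mathbf{C}_W^{H})^{-1}\underset{\sim}{\boldsymbol{\Lambda}}_W + \gamma^{-1}\mathbf{I}_{N_A}, \tag{50}$$

and $\underset{\sim}{\boldsymbol{\Lambda}}_W$ is the $N \times N_A$ middle block of $\boldsymbol{\Lambda}_W$. Note that the approximation (49) is equivalent to the approximation $\tilde{\boldsymbol{\Lambda}}_W^{H}(\underset{\sim}{\mathbf{C}}_W\underset{\sim}{\mathbf{C}}_W^{H})^{-1}\tilde{\boldsymbol{\Lambda}}_W \approx \underset{\sim}{\boldsymbol{\Lambda}}_W^{H}(\mathbf{C}_W\mathbf{C}_W^{H})^{-1}\underset{\sim}{\boldsymbol{\Lambda}}_W$, that is, the equality in (49) holds true if we design the feedback filter by including the edge guard bands in the correlation matrices.
Since $\mathbf{C}_W$ is circulant, $\underset{\sim}{\boldsymbol{\Lambda}}_W = \mathbf{C}_W\underset{\sim}{\boldsymbol{\Lambda}}$, where $\underset{\sim}{\boldsymbol{\Lambda}}$ is the $N \times N_A$ middle block of the unwindowed channel matrix $\boldsymbol{\Lambda}$. Consequently, (50) reduces to

$$\mathbf{M}_W = \underset{\sim}{\boldsymbol{\Lambda}}^{H}\underset{\sim}{\boldsymbol{\Lambda}} + \gamma^{-1}\mathbf{I}_{N_A}.$$

Henceforth, we can exploit the computational advantages given by the LDL factorization algorithm in [9] by applying the band approximation $\underset{\sim}{\boldsymbol{\Lambda}}^{H}\underset{\sim}{\boldsymbol{\Lambda}} \approx \underset{\sim}{\mathbf{B}}^{H}\underset{\sim}{\mathbf{B}}$, where $\underset{\sim}{\mathbf{B}}$ is the $N \times N_A$ middle block of $\mathbf{B}$, and $\mathbf{B}$ is the banded version of $\boldsymbol{\Lambda}$. Consequently, we obtain

$$\mathbf{R}_{ee} \approx \sigma_{n_t}^2(\mathbf{F}_B + \mathbf{I}_{N_A})(\underset{\sim}{\mathbf{B}}^{H}\underset{\sim}{\mathbf{B}} + \gamma^{-1}\mathbf{I}_{N_A})^{-1}(\mathbf{F}_B + \mathbf{I}_{N_A})^{H},$$

which is formally similar to (39). Hence, $\operatorname{tr}(\mathbf{R}_{ee})$ can be minimized by using the band LDL factorization

$$\mathbf{M}_3 = \underset{\sim}{\mathbf{B}}^{H}\underset{\sim}{\mathbf{B}} + \gamma^{-1}\mathbf{I}_{N_A} = \mathbf{L}_3\mathbf{D}_3\mathbf{L}_3^{H}, \tag{53}$$

which leads to

$$\mathbf{F}_B = \mathbf{L}_3^{H} - \mathbf{I}_{N_A}, \tag{54}$$

$$\mathbf{F}_F = \mathbf{L}_3^{H}\mathbf{G}_W, \tag{55}$$

where $\mathbf{G}_W = \mathbf{G}_{\text{W-MMSE-BLE}}$ is expressed by (30). We highlight that $\mathbf{G}_W$ can also take advantage of a band LDL factorization, as in (53). However, these two band LDL factorizations are applied to different matrices, whereas in the unwindowed MMSE-BDFE case they are applied to the same matrix $\mathbf{M}_2$ expressed by (40). Consequently, in the windowed case, the complexity advantage is smaller than that in the unwindowed case, as detailed in Section 4.4.2. We also observe that the design of the feedforward and feedback filters does not consider the presence of pilot symbols used for channel estimation purposes (see (13)). However, we can always reinsert the known pilot symbols when performing the successive cancellation in the feedback path. This partially prevents the error propagation, because the pilots are equispaced. Alternatively, we can design $(L + 1)$ smaller DFEs, each one for a single portion $\mathbf{d}_l$ of the data in (13).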

Complexity analysis
The performance and complexity analyses of the W-MMSE-BDFE can be obtained similarly to those of the unwindowed MMSE-BDFE case. However, the result of the complexity analysis turns out to be slightly different. In the following, we use the same approach of Section 4.3.2 to evaluate the number of complex operations required by the W-MMSE-BDFE. By (54) and (55), the soft output of the W-MMSE-BDFE, expressed by $\tilde{\mathbf{a}} = \mathbf{F}_F\tilde{\mathbf{z}}_W - \mathbf{F}_B\hat{\mathbf{a}}$, can be rewritten as

$$\tilde{\mathbf{a}} = \mathbf{L}_3^{H}\mathbf{G}_W\tilde{\mathbf{z}}_W - (\mathbf{L}_3^{H} - \mathbf{I}_{N_A})\hat{\mathbf{a}}.$$

The computation of $\mathbf{G}_W\tilde{\mathbf{z}}_W$ is equivalent to applying the banded W-MMSE-BLE and hence requires roughly $(8Q^2 + 24Q + 5)N_A$ complex operations. As a result, the proposed banded W-MMSE-BDFE requires approximately $(16Q^2 + 42Q + 7)N_A$ complex operations. Hence, with MBAE-SOE windowing, the complexity of the banded W-MMSE-BDFE is nearly doubled with respect to the banded W-MMSE-BLE. However, thanks to the banded approach, the complexity of the banded W-MMSE-BDFE is also linear in the number of subcarriers.
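The operation counts quoted in the text can be tabulated with a few lines of code. The sketch below only evaluates the approximate formulas stated above; the function names are hypothetical, and the values $Q = 2$ and $N_A = 96$ are chosen for illustration.

```python
def ops_ble(Q, n_active):
    """Approximate complex operations of the banded MMSE-BLE (same count for the MMSE-BDFE)."""
    return (8 * Q**2 + 22 * Q + 4) * n_active

def ops_windowed_ble(Q, n_active):
    """Approximate complex operations of the banded W-MMSE-BLE."""
    return (8 * Q**2 + 24 * Q + 5) * n_active

def ops_windowed_bdfe(Q, n_active):
    """Approximate complex operations of the banded W-MMSE-BDFE."""
    return (16 * Q**2 + 42 * Q + 7) * n_active

Q, n_active = 2, 96
counts = {
    "MMSE-BLE / MMSE-BDFE": ops_ble(Q, n_active),
    "W-MMSE-BLE": ops_windowed_ble(Q, n_active),
    "W-MMSE-BDFE": ops_windowed_bdfe(Q, n_active),
}
for name, value in counts.items():
    print(f"{name}: {value} complex operations")
```

All three counts scale linearly with the number of active subcarriers, and the windowed BDFE stays below twice the windowed BLE for every $Q$, since $2(8Q^2 + 24Q + 5) = 16Q^2 + 48Q + 10$.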

SIMULATION RESULTS
The aim of this section is twofold. First, assuming perfect channel knowledge, we compare the BER performance of the proposed equalizers with the MMSE-BLE of [9], in order to establish the performance gain obtained by decision feedback and by windowing. Second, we show how the pilot-aided channel estimation of Section 3 affects the BER performance.
In the first set of simulations (i.e., with perfect channel knowledge), we consider an OFDM system with $N = 128$, a single block of $N_A = 96$ active and contiguous data subcarriers, a cyclic prefix with $L = 8$, and QPSK modulation. We also assume Rayleigh fading channels with exponential power delay profile and Jakes' Doppler spectrum. The root-mean-square delay spread of the channel, normalized to the sampling period $T_S$, is $\sigma = 3$.
Figure 3 shows the BER performance of the MMSE-BDFE for different values of $Q$ when the normalized Doppler frequency is $f_D/\Delta f = 0.15$. This value generally represents a high Doppler spread condition. For instance, for a carrier frequency $f_C = 10$ GHz and a subcarrier spacing $\Delta f = 20$ kHz, it corresponds to a mobile speed $V = 324$ km/h. We can deduce from Figure 3 that the performance gain obtained by the BDFE tends to increase for high values of $Q$. However, the banded MMSE-BDFE still presents an error floor, which is due to the band approximation of the channel.
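The quoted mobile speed follows from the usual relation between maximum Doppler frequency and speed, $f_D = V f_C / c$. The sketch below reproduces the arithmetic, assuming $c = 3 \times 10^8$ m/s; the function name is hypothetical.

```python
C_LIGHT = 3.0e8  # speed of light in m/s (rounded value, assumed here)


def mobile_speed_kmh(norm_doppler, subcarrier_spacing_hz, carrier_hz):
    """Speed in km/h that yields the given normalized Doppler frequency."""
    f_d = norm_doppler * subcarrier_spacing_hz   # maximum Doppler shift in Hz
    v_ms = f_d * C_LIGHT / carrier_hz            # from f_D = v * f_C / c
    return 3.6 * v_ms                            # m/s to km/h


# f_D / Delta_f = 0.15, Delta_f = 20 kHz, f_C = 10 GHz
print(mobile_speed_kmh(0.15, 20e3, 10e9))  # about 324 km/h
```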
Figure 4 shows the results obtained by the MBAE-SOE window design when $Q = 1$ for several values of $f_D/\Delta f$. In this case, since $Q = 1$, the window design reduces to the optimization of a single amplitude parameter, which is the ratio $2|b_1|/b_0$ plotted in Figure 4. This figure clearly shows that, for a large range of Doppler spreads, the optimum ratio is close to 0.852, which is the ratio that characterizes the Hamming window [11]. However, for very high normalized Doppler spreads, the optimum ratio tends to decrease, that is, less energy should be allocated to the cosine component. Figure 5 presents the BER of the MMSE-BLE with SOE windowing when $Q = 1$ and $f_D/\Delta f = 0.15$. The best performance is obtained for the ratio $2|b_1|/b_0 = 0.844$, which corresponds to our MBAE-SOE design. It should be pointed out that other suboptimum SOE windows also outperform the rectangular window, which represents the case of no windowing and can be considered as a degenerate SOE window with ratio $2|b_1|/b_0$ equal to zero.
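The value 0.852 can be recovered from the standard Hamming coefficients, $w_n = 0.54 - 0.46\cos(2\pi n/N)$, for which $b_0 = 0.54$ and $2|b_1| = 0.46$. The sketch below assumes that a $Q = 1$ SOE window with real, symmetric coefficients takes the raised-cosine form $w_n = b_0\,(1 - r\cos(2\pi n/N))$ with $r = 2|b_1|/b_0$; this parameterization and the function name are assumptions made for illustration.

```python
import math


def soe_window_q1(n_fft, ratio, b0=1.0):
    """Q = 1 window b0 * (1 - ratio * cos(2*pi*n/N)), with ratio = 2|b1|/b0."""
    return [b0 * (1.0 - ratio * math.cos(2.0 * math.pi * n / n_fft))
            for n in range(n_fft)]


hamming_ratio = 0.46 / 0.54          # Hamming: b0 = 0.54, 2|b1| = 0.46
print(round(hamming_ratio, 3))       # 0.852

rectangular = soe_window_q1(8, 0.0)  # ratio 0: no windowing
mbae_like = soe_window_q1(8, 0.844)  # ratio reported for f_D / Delta_f = 0.15
print(rectangular[0], min(mbae_like), max(mbae_like))
```

With this form, a smaller ratio gives a flatter window, which matches the observation that less energy goes to the cosine component at very high Doppler spreads.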
Figure 6 shows the BER for some linear equalizers with windowing when $Q = 2$ and $f_D/\Delta f = 0.15$. As far as the MMSE-BLE is concerned, the Hamming window, which is near optimum for $Q = 1$, outperforms the rectangular window. Nevertheless, the BER performance of the MMSE-BLE with the MBAE-SOE window is even better, thus confirming the effectiveness of our window design. Among the BLE approaches, the nonbanded MMSE-BLE of [6] has the lowest BER, but its computational complexity is cubic instead of linear in the number of subcarriers. Figure 6 also displays the BER of some noniterative MMSE-SLEs, with and without windowing, obtained from [5,10]. In the SLE case, windowing is less effective than for the BLE. The Hamming window slightly worsens the BER performance with respect to the rectangular window, and the MBAE-SOE window even more so. This indicates that, for SLEs, windowing alone is not effective and should be coupled with iterative ICI cancellation techniques as in [10].
From Figure 6, we can also note that the proposed banded MMSE-BLE with the MBAE-SOE window outperforms the nonbanded MMSE-SLE of [5], which has the lowest BER among the considered noniterative SLE approaches. In addition, the proposed banded MMSE-BLE with the MBAE-SOE window has linear complexity in the number of subcarriers, whereas the nonbanded MMSE-SLE of [5] has quadratic complexity.
It is also interesting to observe that MBAE-SOE windowing allows for a complexity reduction by simply reducing the parameter $Q$, without any performance penalty. Indeed, by comparing Figure 5 with Figure 6, it is evident that the W-MMSE-BLE with $Q = 1$ (i.e., that with $2|b_1|/b_0 = 0.844$ in Figure 5) outperforms the unwindowed MMSE-BLE with $Q = 2$ (i.e., that identified by the rectangular window in Figure 6). In addition, the complexity of the W-MMSE-BLE with $Q = 1$ is roughly 46% of the complexity of the unwindowed MMSE-BLE with $Q = 2$.
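The 46% figure can be reproduced with a short calculation. The windowed count $(8Q^2 + 24Q + 5)N_A$ is the one used in the complexity analysis; the unwindowed count $(8Q^2 + 22Q + 4)N_A$ is not stated in this section and is taken here as an assumption about the banded MMSE-BLE of [9]. The function name is hypothetical.

```python
def banded_ble_ops_per_subcarrier(q, windowed):
    """Complex operations per active subcarrier for the banded MMSE-BLE."""
    if windowed:
        return 8 * q**2 + 24 * q + 5
    # Assumed count for the unwindowed equalizer (not given in this section).
    return 8 * q**2 + 22 * q + 4


windowed_q1 = banded_ble_ops_per_subcarrier(1, windowed=True)     # 37
unwindowed_q2 = banded_ble_ops_per_subcarrier(2, windowed=False)  # 80
ratio = windowed_q1 / unwindowed_q2
print(windowed_q1, unwindowed_q2, ratio)  # 37 80 0.4625
```

Under that assumption the ratio is 37/80, that is, about 46%.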
Figure 7 plots the shapes of the windows designed for $Q = 2$ and $f_D/\Delta f = 0.15$. It is evident that the MBAE-SOE window and the Schniter window [10] are very similar. The Schniter window, which is designed without the SOE constraint (32), produces an almost-banded noise covariance matrix. This means that the SOE constraint (32) does not exclude good windows. Moreover, it is interesting to note that for $Q = 2$ both the Schniter window and the MBAE-SOE window are very similar to the Blackman window [11]. We also recall that for $Q = 1$ the MBAE-SOE window and the Schniter window are similar to the Hamming window (at least for reasonable values of normalized Doppler spread). Although the Hamming and Blackman windows have been derived in a different context, we feel that this is not merely a coincidence. Indeed, many common windows, such as Hamming and Blackman, have been derived with the purpose of reducing the spectral sidelobes of the Fourier transform of the window [11]. Similarly, in our case, we want to mitigate the ICI outside the band of the channel matrix, and this ICI is caused by the spectral sidelobes of the Fourier transform of the window. However, in our scenario, the window design also depends on other factors, such as the Doppler spectrum and the maximum Doppler frequency.
In the second set of simulations, we also take into account the effect of channel estimation. We consider an OFDM system with $N = 256$, $U = Q$, $Q = 2$ unless otherwise stated, $L = 4$, and QPSK modulation. We assume Rayleigh fading channels with uniform power delay profile and Jakes' Doppler spectrum with $f_D/\Delta f = 0.256$. As far as channel estimation is concerned, we choose $P + 1 = 2Q + 1$ GCE basis functions with oversampling factor $K = 2$ [25]. The channel is estimated by using the LMMSE criterion (24). The power ratio $\rho \approx 3.316$ between data and pilots has been chosen according to [26]. The SNR is defined as the ratio between total signal power (including pilot power) and noise power.
Figure 8 illustrates the MSE of the channel estimation, defined as $\mathrm{MSE} = E\{\|\hat{H} - H\|^2\}/E\{\|H\|^2\}$ for the unwindowed channel and as $\mathrm{MSE} = E\{\|\hat{H}_W - H_W\|^2\}/E\{\|H_W\|^2\}$ for the windowed channel, assuming $Q = 2$ and using orthogonalized GCE (O-GCE) basis functions (i.e., $\Xi$ is obtained after the QR decomposition of the GCE basis matrix) and orthogonalized windowed GCE (OW-GCE) basis functions (i.e., $\Xi$ is obtained after the QR decomposition of the windowed GCE basis matrix). Specifically, with O-GCE we first estimate $H$ and then we reconstruct $H_W = \Delta_W H$ from the knowledge of the MBAE-SOE window, whereas with OW-GCE we first estimate $H_W$ and then we reconstruct $H = \Delta_W^{-1} H_W$. It is shown that in both cases it is better to estimate the windowed channel rather than the unwindowed channel. Moreover, the O-GCE basis produces a better estimate of the unwindowed channel than the OW-GCE basis.
Figure 9 compares the BER performance of the banded W-MMSE-BDFE with the banded W-MMSE-BLE and the banded MMSE-BDFE. It is evident that the W-MMSE-BDFE outperforms the other two equalizers. Specifically, the W-MMSE-BDFE is able to reduce the error floor. This reduction is more pronounced for high values of $Q$. It is also worth noting that the degradation produced by channel estimation is quite small for both the W-MMSE-BLE and the W-MMSE-BDFE, especially at high SNR. Due to the good channel estimation, the BER floor is caused mainly by the band approximation. Similar conclusions can be drawn for different Doppler spreads.

CONCLUSIONS
In this paper, we have designed banded MMSE equalizers for OFDM systems in high Doppler spread channels. Thanks to a band LDL factorization algorithm, these MMSE equalizers are characterized by a low complexity. To enhance BER performance, both decision feedback and optimum (in the MBAE sense) receiver windowing have been investigated. Moreover, by means of a BEM channel estimation approach, we validated the effectiveness of the proposed equalizers also in the presence of channel estimation errors. We remark that the values of $Q$ used in the various band approximations could also be different. However, due to space constraints, we used the same value for all the band approximations. A deeper analysis of the impact of different values of $Q$ could be the subject of future work.

Figure 1: Effect of the band approximation. In this example, we show only the active part of the matrix ($N_A = 8$, $Q = 1$).
(a) The approximation $\Lambda_W \approx B_W$ should be as good as possible, and possibly better than the approximation $\Lambda \approx B$. This would reduce the residual ICI of the banded MMSE-BLE.
(b) The noise covariance matrix $\tilde{C}_W \tilde{C}_W^H$ in (30) should be banded, so that the equalization can be performed by band LDL factorization of
$(8Q^2 + 24Q + 5)N_A$ complex operations. The band LDL factorization of $M_4$ needs $(8Q^2 + 10Q + 2)N_A$ complex operations. To perform $L_4^H G_W z_W$, we need $2QN_A$ CM and $2QN_A$ CA. To perform $(L_4^H - I_{N_A}) \hat{a}$, $2QN_A$ CM and $(2Q - 1)N_A$ CA are required. Moreover, $N_A$ CA are necessary to perform the subtraction between $L_4^H G_W z_W$ and $(L_4^H - I_{N_A}) \hat{a}$.