Multiuser Detection Using Adaptive Multistage Matrix Wiener Filtering Schemes with Stage-Selection Criteria in DS-UWB

Adaptive reduced-rank (RR) multistage matrix Wiener filtering (MMWF) techniques, based on the minimum mean-square error (MMSE) criterion, are proposed for direct-sequence (DS) ultra-wideband (UWB) communication systems. These RR-MMWF-based algorithms employ a filter stage that is selected adaptively by fuzzy inference. As a consequence, the proposed schemes achieve a substantial saving in complexity without compromising system performance or dynamic convergence/tracking capability. Additionally, a fuzzy-logic-controlled matrix conjugate gradient (MCG) algorithm is developed for a robust, reduced-rank implementation of the full-rank MMWF. Simulations are conducted to illustrate the convergence/tracking superiority of the proposed algorithms and to compare them with MMWF-based schemes that use other adaptive stage-selection criteria.


INTRODUCTION
Over the past few years, ultra-wideband (UWB) systems have drawn considerable attention as a means of indoor, short-range, high-data-rate wireless transmission. Equalization of UWB signals [1,2] based on the conventional RAKE receiver has been addressed for both additive white Gaussian noise (AWGN) and multipath-rich channels [3-11]. However, RAKE reception has limited capability to suppress multiple-access interference (MAI). It is well known that the linear minimum mean-square error (MMSE) receiver [12] is capable of suppressing the MAI efficiently. In [13,14], MMSE-based detectors are proposed for direct-sequence (DS) UWB communication systems. Moreover, it is shown that the MMSE decision-feedback detection (DFD) receiver is able to provide better performance than the MMSE receiver alone, even when error propagation occurs [15]. The MMSE-DFD usually consists of one MMSE receiver in the forward path and one feedback filter. Unfortunately, the computation of the MMSE-based filter weights starts with the inversion of the input signal autocorrelation matrix, which is computationally expensive. This cost is exacerbated when the MMSE-based receiver operates in a nonstationary environment. To alleviate the computational complexity, the authors in [16-20] propose a considerably lower-complexity version of the MMSE receiver that utilizes the reduced-rank multistage vector Wiener filter (MVWF). The MVWF technique obviates the need for either a covariance matrix inversion or an eigen-decomposition. Additionally, there exist other iterative matrix inversion techniques, among which the conjugate gradient (CG) scheme [21] is able to provide fast initial convergence of the iterative procedure. It can also be shown that the CG scheme and the MVWF technique produce an MMSE approximation in the same Krylov subspace [22].
In this paper, an adaptive fuzzy-inference (FI) multistage matrix Wiener filtering (MMWF) technique, based on the MMSE performance criterion, is proposed to detect DS-UWB signals. A reduced-rank DFD scheme based on the MMWF is also considered. The MMWF, which is the matrix analogue of the MVWF, is introduced to implement the MMWF-DFD receiver without a direct matrix inversion or eigen-decomposition. The feedforward and feedback filters of the MMWF-DFD receiver are capable of sharing the same calculation basis, which alleviates the computational burden without affecting system performance. Moreover, the reduced-rank MMWF-based receivers [23] provide a significant performance gain and rapid adaptive convergence, relative to the conventional full-rank MMSE-based receivers, when observation-data support is limited [24]. In addition, the matrix conjugate gradient (MCG) algorithm [25] is developed for a robust implementation of the full-rank MMWF. It should be pointed out that the filter-stage selection of the MMSE-based detectors governs both the steady-state performance and the convergence characteristic. In general, a small stage leads to rapid convergence but results in a large steady-state MSE. The opposite occurs when a large stage is chosen. To achieve better convergence/tracking capability and steady-state MSE performance of the MMWF-based receivers, we propose a fuzzy-inference controlled stage-selection mechanism in this paper. It can be shown that the fuzzy-inference system (FIS) [26] offers an effective and robust means to monitor instantaneous fluctuations of a dense multipath channel and is thus able to assist the MMWF-based receivers in selecting a proper time-varying filter stage M.
The rest of the paper is organized as follows. Section 2 describes the channel and system model. Sections 3 and 4 present the reduced-rank MMWF and the MMWF-DFD schemes, respectively. The reduced-rank MCG scheme is developed in Section 5. The details of the fuzzy-inference controlled filter-stage selection mechanism are given in Section 6. Section 7 analyzes the computational complexity of the proposed mechanism. Section 8 describes three existing filter stage-selection criteria. Numerical results and conclusions are presented in Sections 9 and 10, respectively.
Matrices (vectors) are denoted by boldface uppercase (lowercase) letters. The subscripts $(\cdot)_{\lfloor x \rfloor}$ and $(\cdot)_{[x/y]}$ represent the integer floor of x and the remainder of the integer division x/y, respectively. The superscripts $(\cdot)^T$ and $(\cdot)^H$ stand for transposition and Hermitian transposition, respectively. $E\{\cdot\}$ denotes the expected-value operator. $|\cdot|$ and $\|\cdot\|$ indicate, respectively, the absolute value and the matrix/vector Frobenius norm. $\mathbf{I}$ is the identity matrix. sgn denotes the sign operator. $\mathrm{tr}\{\cdot\}$ is the trace of a matrix. $\mathrm{Re}(\cdot)$ denotes the real part. Finally, round[·] indicates rounding to the nearest integer.

SIGNAL AND SYSTEM MODEL
In a K-user DS-UWB communication system using BPSK modulation, the transmitted signal from user k can be expressed as follows [27-30]:

$$s_k(t) = \sqrt{E_k} \sum_{n=-\infty}^{\infty} b_{k,\lfloor n/N_c \rfloor}\, c_{k,[n/N_c]}\, p(t - nT_c), \tag{1}$$

where $E_k$ denotes the kth user's energy per pulse at the transmitter end, $c_{k,[n/N_c]}$ is the corresponding chip of the kth user's spreading code, and $p(t)$ is the short-duration UWB pulse with unit energy [1]. $b_{k,\lfloor n/N_c \rfloor} \in \{\pm 1\}$ denotes the $\lfloor n/N_c \rfloor$th BPSK modulated data symbol of duration $T_s$. Each symbol interval consists of $N_c$ transmission chips of duration $T_c$, that is, $T_s = N_c T_c$. The UWB multipath channel of user k can be described by its complex impulse response [6, 31-34]:

$$h_k(t) = \sum_{j=0}^{J_k-1} \alpha_{kj}\, \delta(t - \tau_{kj}), \tag{2}$$

where $J_k$ is the number of resolvable multipaths of user k, $\alpha_{kj}$ is the complex multipath gain coefficient, and $\tau_{kj}$ is the propagation delay, both associated with the jth path of user k. The probability distribution of $\alpha_{kj}$ is given by $N(0, \sigma_{kj}^2/2) + \sqrt{-1}\,N(0, \sigma_{kj}^2/2)$, where $N(0, \sigma_{kj}^2/2)$ is a zero-mean Gaussian random variable with variance $\sigma_{kj}^2/2$, $j = 0, 1, \ldots, J_k - 1$. The energy of the jth channel path of user k, $\sigma_{kj}^2$, is given by [equation (3) missing in source], where $\sigma_0^2$ is chosen to ensure that the average received energy is unity and $\tau_{\mathrm{RMS}}$ denotes the RMS delay spread. In addition, a chip-synchronous DS-UWB system is considered with $\tau_{kj} = l_{kj} T_c$, where $l_{kj} \in [0, J_k - 1]$ is selected randomly. In this paper, the parameters of CM4 [35] are used to generate the energy of each channel tap for the non-line-of-sight (NLOS) multipath channel.
After propagation through the multipath fading channel, the total received signal at the receiver is the superposition of the propagated signals from all K users and the background channel noise. The received signal r(t) can be written as

$$r(t) = \sum_{k=1}^{K} \sum_{j=0}^{J_k-1} \alpha_{kj}\, s_k(t - \tau_{kj}) + n(t), \tag{4}$$

where n(t) denotes AWGN.
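The chip-rate model described in this section can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the simulator used for the reported results: the function names, the small default sizes, the unit pulse energy ($E_k = 1$), the exponentially decaying tap-energy profile with consecutive chip delays, and the noise scaling are all hypothetical choices made for the example. It relies on NumPy.

```python
import numpy as np

def simulate_received_block(num_users=4, n_c=16, num_paths=6, num_symbols=50,
                            snr_db=20.0, tau_rms=2.0, seed=0):
    """Chip-rate DS-UWB sketch: BPSK symbols, +-1/sqrt(Nc) codes, multipath, AWGN."""
    rng = np.random.default_rng(seed)
    # Normalized spreading codes: each chip is -1/sqrt(Nc) or +1/sqrt(Nc).
    codes = rng.choice([-1.0, 1.0], size=(num_users, n_c)) / np.sqrt(n_c)
    # BPSK data symbols in {+1, -1}.
    bits = rng.choice([-1.0, 1.0], size=(num_users, num_symbols))
    # ASSUMPTION: exponentially decaying tap energies, scaled to unit total energy.
    profile = np.exp(-np.arange(num_paths) / tau_rms)
    profile /= profile.sum()
    total_len = num_symbols * n_c + num_paths - 1
    rx = np.zeros(total_len, dtype=complex)
    for k in range(num_users):
        # Chip n carries symbol floor(n/Nc) multiplied by code chip (n mod Nc).
        chips = np.repeat(bits[k], n_c) * np.tile(codes[k], num_symbols)
        # Complex Gaussian gains: real and imaginary parts each have variance sigma^2/2.
        taps = np.sqrt(profile / 2) * (rng.standard_normal(num_paths)
                                       + 1j * rng.standard_normal(num_paths))
        rx += np.convolve(chips, taps)
    # ASSUMPTION: noise variance per chip sample is set directly from snr_db.
    noise_var = 10 ** (-snr_db / 10)
    rx += np.sqrt(noise_var / 2) * (rng.standard_normal(total_len)
                                    + 1j * rng.standard_normal(total_len))
    return rx, bits, codes

def symbol_windows(rx, n_c, num_paths, num_symbols):
    """Stack the multipath-extended (Nc + J - 1)-chip window of every symbol as columns."""
    n = n_c + num_paths - 1
    return np.stack([rx[i * n_c: i * n_c + n] for i in range(num_symbols)], axis=1)
```

Calling `symbol_windows(rx, 16, 6, 50)` on the output of `simulate_received_block()` yields a 21 × 50 array whose columns play the role of the received N-vectors used by the detectors in the following sections.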

REDUCED-RANK MMWF SCHEME
The received signal r(t) in (4) is passed through the chip-matched filter and is then sampled at the chip rate over the multipath-extended $(N_c + J_k - 1)$-chip period [36]. For simplicity of notation, let N denote $N_c + J_k - 1$ in what follows. Denote by $\mathbf{r}(i)$ the column N-vector of the discrete-time received samples corresponding to the ith information symbol interval. For the purpose of analysis, the desired users, Users 1 to J, are assumed to be perfectly synchronized at the receiver [36].
Let $\mathbf{b}(i)$ be the desired data J-vector and let $\mathbf{R}_{rb} \triangleq E\{\mathbf{r}(i)\mathbf{b}^H(i)\}$ denote the corresponding steering matrix. The MMSE receiver is the N × J matrix $\mathbf{W}$ chosen to minimize the MSE, that is, $\mathrm{MSE}(\mathbf{W}) = E\{\|\mathbf{b}(i) - \mathbf{W}^H\mathbf{r}(i)\|^2\}$. The weight matrix is given by

$$\mathbf{W}_{\mathrm{MMSE}} = \mathbf{R}_{rr}^{-1}\mathbf{R}_{rb}, \tag{6}$$

where $\mathbf{R}_{rr} \triangleq E\{\mathbf{r}(i)\mathbf{r}^H(i)\}$. Evidently, the computation of the matrix $\mathbf{W}_{\mathrm{MMSE}}$ in (6) requires the inversion of the matrix $\mathbf{R}_{rr}$. To avoid the computation of $\mathbf{R}_{rr}^{-1}$, the MMWF decomposes the observation vector through a series of orthogonal projections. Define the nonsingular linear transformation $\mathbf{T}_1$ with the structure [37]

$$\mathbf{T}_1 = \begin{bmatrix} \mathbf{U}_1^H \\ \mathbf{B}_1 \end{bmatrix}, \tag{7}$$

where $\mathbf{U}_1 = \mathbf{R}_{rb}$ is an N × J matrix and $\mathbf{B}_1$ is an (N − J) × N blocking matrix with $\mathbf{B}_1\mathbf{U}_1 = \mathbf{0}$. Hence, the transformation of the vector $\mathbf{r}(i)$ by the operator $\mathbf{T}_1$ in (7) yields a vector $\mathbf{z}_1(i)$ of the form

$$\mathbf{z}_1(i) = \mathbf{T}_1\mathbf{r}(i) = \begin{bmatrix} \mathbf{b}_1(i) \\ \mathbf{r}_1(i) \end{bmatrix},$$

where $\mathbf{b}_1(i) = \mathbf{U}_1^H\mathbf{r}(i)$ and $\mathbf{r}_1(i) = \mathbf{B}_1\mathbf{r}(i)$. Subsequently, the correlation matrix of $\mathbf{z}_1(i)$, $\mathbf{R}_{z_1 z_1}$, and its inverse $\mathbf{R}_{z_1 z_1}^{-1}$ can be computed [equations missing in source]. Consequently, the linear MMSE receiver of (6) can be re-expressed in the form of (11) and (12) [equations (11) and (12) missing in source]. The first-stage (M = 1) orthogonal decomposition process of the MMWF receiver in (12) is illustrated in Figure 1. Subsequently, the decomposition procedure applied to $\mathbf{W}_{\mathrm{MMSE}}$ in (11) is applied to $\mathbf{W}_1$ and continued until the minimum dimension of both the data vector and the corresponding Wiener filter is reached. Evidently, the maximum number of stages in the MMWF receiver is $M_{\mathrm{MAX}} = \lfloor N/J \rfloor$. This results in a set of recursion equations with the number of stages M, as shown in Algorithm 1. Rank reduction is realized by truncating the multistage decomposition process at the Mth stage, where $MJ \ll N$ (full rank). Thus, the stage-M output, denoted by $\mathbf{W}_{\mathrm{MMWF},M}(\mathbf{r}(i))$, can be obtained by the following equation: [equation missing in source].

Algorithm 1: Recursion equations for the M-stage MMWF/MMWF-DFD schemes.
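To make the idea of rank reduction concrete, the sketch below computes a reduced-rank MMSE solution by projecting onto the block Krylov subspace spanned by $\mathbf{R}_{rb}, \mathbf{R}_{rr}\mathbf{R}_{rb}, \ldots, \mathbf{R}_{rr}^{M-1}\mathbf{R}_{rb}$. This is a swapped-in stand-in, not the recursion of Algorithm 1: it targets the same subspace that the multistage filter is stated to operate in, but it solves a small MJ × MJ system directly. The function names and the synthetic statistics (with $E\{\mathbf{b}\mathbf{b}^H\} = \mathbf{I}$) are assumptions made for illustration.

```python
import numpy as np

def krylov_reduced_rank_mmse(r_rr, r_rb, num_stages):
    """MMSE approximation restricted to the M-block Krylov subspace of (R_rr, R_rb)."""
    blocks = [r_rb]
    for _ in range(num_stages - 1):
        blocks.append(r_rr @ blocks[-1])           # next block: R_rr^m R_rb
    # Orthonormalize the basis for numerical stability (the subspace is unchanged).
    basis, _ = np.linalg.qr(np.hstack(blocks))
    reduced_rr = basis.conj().T @ r_rr @ basis     # (MJ x MJ) instead of (N x N)
    reduced_rb = basis.conj().T @ r_rb
    return basis @ np.linalg.solve(reduced_rr, reduced_rb)

def mse(w, r_rr, r_rb, r_bb):
    """MSE(W) = tr{R_bb - W^H R_rb - R_rb^H W + W^H R_rr W}."""
    cross = w.conj().T @ r_rb
    return float(np.real(np.trace(r_bb - cross - cross.conj().T
                                  + w.conj().T @ r_rr @ w)))

def toy_statistics(n=12, j=2, num_interferers=3, noise_var=0.1, seed=1):
    """HYPOTHETICAL second-order statistics: J desired users, a few interferers, noise."""
    rng = np.random.default_rng(seed)
    desired = rng.standard_normal((n, j)) + 1j * rng.standard_normal((n, j))
    interf = (rng.standard_normal((n, num_interferers))
              + 1j * rng.standard_normal((n, num_interferers)))
    r_rb = desired                                 # E{r b^H} when E{b b^H} = I
    r_rr = (desired @ desired.conj().T + interf @ interf.conj().T
            + noise_var * np.eye(n))
    return r_rr, r_rb, np.eye(j)
```

With these toy statistics, evaluating `mse` for M = 1, 2, ... gives a nonincreasing sequence that reaches the full-rank value of `np.linalg.solve(r_rr, r_rb)` once the subspace covers the signal-plus-interference directions, which is the trade-off that stage selection exploits.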

REDUCED-RANK MMWF-DFD SCHEME
The MMSE-DFD receiver is known to outperform a linear MMSE detector. It usually consists of one MMSE receiver in the forward path and one feedback filter. The former is used for MAI suppression and the latter for self-interference cancellation. Here, the parallel decision feedback detector (P-DFD) [38] based on the MMSE criterion is considered for multiuser detection in DS-UWB communication systems. The feedforward filter of the P-DFD consists of the linear MMSE filter followed by an error estimation filter, as shown in Figure 2. Following the derivation in [38], the feedforward and feedback filters of the P-DFD receiver can be expressed in closed form [equations missing in source]. Here, $\mathbf{Q}$ defines the J × J error covariance matrix [equation (15) missing in source], and a J × J matrix [definition missing in source] is adopted to normalize the matrix $\mathbf{Q}^{-1}$. Note that the MMSE receiver in the forward path, $\mathbf{W}_{\mathrm{MMSE}}$, can be computed by the MMWF in a reduced-rank form with the use of $\mathbf{U}_1 = \mathbf{R}_{rb}$. Fortunately, the feedback filter can be computed efficiently by sharing the information from the MMWF. Specifically, by applying $\mathbf{T}_1$ to both sides of $\mathbf{R}_{rr}^{-1}$ at the first stage of decomposition, we have [equations missing in source]. Subsequently, a sequence of $\mathbf{T}_2, \ldots, \mathbf{T}_M$ is applied to perform successive orthogonal decompositions [operand missing in source]. Neglecting the term $\mathbf{R}_{r_M b_M}^H \mathbf{R}_{r_M r_M}^{-1} \mathbf{R}_{r_M b_M}$, $\boldsymbol{\Sigma}_1$ becomes [39,40] [equation (18) missing in source]. Therefore, the matrix $\boldsymbol{\Sigma}_1$ in (18) can be utilized to estimate $\mathbf{Q}$ in (15) [equation missing in source]. Note that the MMWF-DFD scheme eliminates the need for a large matrix inversion, that is, $\mathbf{R}_{rr}^{-1}$; thus, a substantial reduction of the computational cost relative to the MMSE-DFD receiver can be achieved. The set of recursion equations of the MMWF-DFD scheme and the estimate of the desired data vector are summarized in Algorithm 1.

REDUCED-RANK MCG SCHEME
The MCG algorithm can be applied to the common problem encountered in adaptive transversal filters. In other words, this algorithm is well suited to solving a system of linear equations such as $\mathbf{R}_{rr}\mathbf{W} = \mathbf{R}_{rb}$. It does so indirectly by minimizing a cost function ξ [definition missing in source]. Note that the method of CG is simply the method of conjugate directions [41] in which the search directions are constructed by conjugation of the residuals. In addition, it is worth emphasizing that the CG scheme cures a problem of the steepest descent (SD) method, which often finds itself taking steps in the same direction as earlier steps. In the CG algorithm, a set of $\mathbf{R}_{rr}$-orthogonal, or conjugate, search directions is picked and exactly one step is taken in each search direction. Moreover, the CG method overcomes a difficulty of Gram-Schmidt conjugation in the method of conjugate directions, namely that all the old search vectors must be kept in memory to construct each new one.
It is readily shown that the minimum MSE can be written in closed form [equation missing in source]. The MCG algorithm for implementing the MMWF starts with the initial matrix $\mathbf{W}_{\mathrm{MCG},0}$, the initial search direction matrix $\mathbf{D}_0$, and the initial residual matrix $\mathbf{G}_0 = \mathbf{D}_0 = \mathbf{R}_{rb} - \mathbf{R}_{rr}\mathbf{W}_{\mathrm{MCG},0}$. At the (j + 1)th iteration, the MCG algorithm updates, in turn, the filter matrix $\mathbf{W}_{\mathrm{MCG},j+1}$ by means of the step matrix $\mathbf{V}_j$, the residual matrix $\mathbf{G}_{j+1}$, and the $\mathbf{R}_{rr}$-conjugate direction matrix $\mathbf{D}_{j+1}$ [update equations missing in source]. To sum up, the MCG algorithm is an iterative method for solving the Wiener-Hopf equation in a finite number of iterations. It can be shown that both the MCG and the MMWF schemes produce an MMSE approximation in the same Krylov subspace [22]. Additionally, both algorithms are based on optimization with identical cost functions and thus compute the same approximate solution. The MCG algorithm is guaranteed to converge in at most N steps and converges more quickly when the eigenvalues of $\mathbf{R}_{rr}$ are clustered together. Furthermore, the MCG scheme does not need to compute an estimate of $\mathbf{R}_{rr}^{-1}$. At every iteration step, the algorithm provides an improved approximation to the exact solution. Finally, the steps of the robust MCG algorithm with a fuzzy-inference controlled M-iteration are listed in Algorithm 2.

Algorithm 2: Recursion equations for the MCG scheme.
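The iteration described above can be illustrated with the standard block conjugate gradient recursion for $\mathbf{R}_{rr}\mathbf{W} = \mathbf{R}_{rb}$. The sketch assumes the textbook block-CG forms of the step matrix, $\mathbf{V}_j = (\mathbf{D}_j^H\mathbf{R}_{rr}\mathbf{D}_j)^{-1}\mathbf{G}_j^H\mathbf{G}_j$, and of the direction update, $\boldsymbol{\Gamma}_{j+1} = (\mathbf{G}_j^H\mathbf{G}_j)^{-1}\mathbf{G}_{j+1}^H\mathbf{G}_{j+1}$; whether these coincide exactly with Algorithm 2 cannot be confirmed from the text, and the function name and stopping tolerance are hypothetical.

```python
import numpy as np

def matrix_conjugate_gradient(r_rr, r_rb, num_iters, w0=None, tol=1e-10):
    """Block CG for R_rr W = R_rb; stopping after M iterations gives a reduced-rank W."""
    n, j = r_rb.shape
    w = np.zeros((n, j), dtype=complex) if w0 is None else w0.astype(complex)
    g = r_rb - r_rr @ w                     # residual matrix G_0
    d = g.copy()                            # initial search directions D_0 = G_0
    gg = g.conj().T @ g
    ref = np.linalg.norm(r_rb)
    for _ in range(num_iters):
        if np.linalg.norm(g) <= tol * ref:  # already converged: avoid singular solves
            break
        rd = r_rr @ d
        v = np.linalg.solve(d.conj().T @ rd, gg)      # step matrix V_j (assumed form)
        w = w + d @ v                                 # filter update
        g = g - rd @ v                                # residual update
        gg_next = g.conj().T @ g
        gamma = np.linalg.solve(gg, gg_next)          # Gamma_{j+1} (assumed form)
        d = g + d @ gamma                             # new R_rr-conjugate directions
        gg = gg_next
    return w
```

For a well-conditioned Hermitian positive-definite $\mathbf{R}_{rr}$ of size N with J right-hand sides, running N/J iterations of this sketch reproduces the direct solution to numerical precision, while fewer iterations give a coarser approximation. The J × J solves can become ill-conditioned when the residual columns grow nearly dependent, which a practical implementation would need to guard against.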

FUZZY-INFERENCE FILTER-STAGE SELECTION
The 2-to-1 fuzzy inference system (FIS) [26], based on the principle of fuzzy logic [42], uses the squared error $e^2(i)$ and the squared error variation $\Delta e^2(i)$ as the input variables at time i to assign the number of filter stages M(i + 1) [mapping equation missing in source], where $\mathbf{e}_0(i) = \mathbf{b}(i) - \mathbf{W}_{\mathrm{MMWF/MCG},M}^H(i)\mathbf{r}(i)$ and $e^2(i) = (\mathbf{e}_0^H(i)\mathbf{e}_0(i))/J$ [definition of $\Delta e^2(i)$ missing in source]. In blind-mode algorithms, a substitute quantity [missing in source] is used to compute the vector $\mathbf{e}_0(i)$. In essence, the basic configuration of the FIS comprises four essential procedures, namely, (i) fuzzy sets for parameters, (ii) fuzzy rules, (iii) fuzzy operators, and (iv) defuzzification processes, which map a two-input vector, $(e^2(i), \Delta e^2(i))$, into a single-output parameter M for the adaptive time-varying stage selection. The function of each procedure in the FIS is introduced briefly as follows:
(1) Fuzzy sets for parameters: The input variables of the FIS are transformed to the respective degrees to which they belong to each of the appropriate fuzzy sets via membership functions (MBFs). In what follows, the $(e^2, \Delta e^2)$-FIS system with the (8, 4)-partitioned regions of the fuzzy I/O domains [26] is employed, owing to its excellent performance and moderate complexity. Eight triangular MBFs with centroids of the ultra-large (UL), very large (VL), large (L), medium (M), small medium (SM), small (S), very small (VS), and ultra-small (US), respectively, are selected to cover the entire universe of discourse for the variables $e^2$ and M. Four triangular MBFs with centroids of the VL, L, M, and S, respectively, are utilized for the variable $\Delta e^2$ in this paper. The output of the fuzzification process is a fuzzy degree of membership between 0 and 1.
(2) Fuzzy control rules: This procedure is focused on constructing a set of fuzzy IF-THEN rules. Here, we claim that convergence has just begun when $e^2$ is "UL" and $\Delta e^2$ is "VL," and thus a "UL" value for M is used to speed up the convergence rate. On the other hand, the filter is assumed to operate in the steady state when $e^2$ is "US" and $\Delta e^2$ is "S," and then a "US" M is adopted to lower the steady-state MSE. In particular, we may declare that a huge estimation error has occurred when $e^2$ is "US" and $\Delta e^2$ is "VL," and the "US" value of the parameter M is assigned to the system in order to stabilize system performance.
(3) Fuzzy operators: The fuzzified input variables are combined using the fuzzy "OR" operator, which selects the maximum value of the two, to obtain a single value. This is followed by the implication process, which defines the reshaping task of the consequent (THEN-part) of the fuzzy rule based on the antecedent (IF-part). A min (minimum) operation is generally employed to truncate the output fuzzy set for each rule. Since decisions are based on the testing of all of the rules in an FIS, the rules need to be combined in some manner in order to make a decision. Aggregation is the process by which the fuzzy sets that represent the outputs of each rule are combined into a single fuzzy set. The input of the aggregation process is the list of truncated output functions returned by the implication process for each rule. The output of the aggregation process is one fuzzy set for each output variable.
(4) Defuzzification processes: The defuzzification process converts fuzzy control decisions into nonfuzzy control signals. These control signals are applied to adjust the variable M in order to improve the convergence/tracking capability of the receiver. The crisp, physical control command is computed by the centroid-defuzzification method. The centroid-defuzzification output M is calculated by [43] [equation (27) missing in source], where Υ is the number of discrete samples of the output MBF, $M^{(l)}(i)$ is the value at the lth location used in approximating the area under the aggregated MBF, and [symbol missing in source] indicates the MBF value at location $M^{(l)}(i)$. To reduce the computational load of the centroid calculation, a small number of points Υ should be used. The calculation of M(i + 1) in (27) returns the center of the area under the aggregated MBFs.
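The four procedures above can be sketched as a small Mamdani-style inference routine. Everything specific in it is an assumption made for illustration: the evenly spaced triangular sets, the input ranges, the number of output samples, and in particular the rule table, which simply maps the label of $e^2$ to the same label of M (this happens to agree with the three rules quoted in the text, but the full table is not given there). The text combines the two antecedents with a max-based OR; the sketch exposes the operator as the `combine` argument and defaults to the min-based AND, because with this complete hypothetical table every rule would fire under OR. That default is a swapped-in choice for the example, not necessarily the authors' setting.

```python
def triangle(x, left, peak, right):
    """Triangular membership value of x."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def partitions(lo, hi, count):
    """Evenly spaced triangular sets on [lo, hi] as (left, peak, right) triples."""
    step = (hi - lo) / (count - 1)
    return [(lo + n * step - step, lo + n * step, lo + n * step + step)
            for n in range(count)]

def fuzzify(x, sets, lo, hi):
    x = min(max(x, lo), hi)               # clip to the universe of discourse
    return [triangle(x, *s) for s in sets]

def select_stage(e2, de2, m_min=2, m_max=8, e2_max=1.0, de2_max=1.0,
                 samples=61, combine=min):
    """Return the next filter stage from e^2 and delta e^2 (illustrative only)."""
    mu_e2 = fuzzify(e2, partitions(0.0, e2_max, 8), 0.0, e2_max)     # US ... UL
    mu_de2 = fuzzify(de2, partitions(0.0, de2_max, 4), 0.0, de2_max)  # S ... VL
    m_sets = partitions(m_min, m_max, 8)                              # US ... UL
    grid = [m_min + n * (m_max - m_min) / (samples - 1) for n in range(samples)]
    aggregated = [0.0] * samples
    for a, mu_a in enumerate(mu_e2):
        for mu_b in mu_de2:
            strength = combine(mu_a, mu_b)     # antecedent firing strength
            out_set = m_sets[a]                # HYPOTHETICAL rule: M label = e^2 label
            for n, m in enumerate(grid):
                clipped = min(strength, triangle(m, *out_set))   # min implication
                aggregated[n] = max(aggregated[n], clipped)      # max aggregation
    area = sum(aggregated)
    if area == 0.0:
        return m_min                           # no rule fired on the sampled grid
    centroid = sum(m * mu for m, mu in zip(grid, aggregated)) / area
    return round(centroid)                     # centroid defuzzification, then rounding
```

Under these assumptions, `select_stage(1.0, 1.0)` selects the largest stage in the range and `select_stage(0.0, 0.0)` the smallest, mirroring the first two rules described above.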

COMPUTATIONAL COMPLEXITY ANALYSIS
For real-time applicability, a computationally efficient version of the M-stage MMWF scheme is derived and summarized in Algorithm 3, using the blocking matrix $\mathbf{B}_j = \mathbf{I} - \mathbf{U}_j\mathbf{U}_j^H$ and the estimated cross-correlation matrix $\mathbf{R}_{r_j b_j} = (1 - \mu)\mathbf{R}_{r_{j-1} b_{j-1}} + \mathbf{r}_j\mathbf{b}_j^H$. The quantity $\mu \in (0, 1]$ is referred to as the forgetting factor. The computationally heavy operations null(·) and E{·} are thereby avoided. It can easily be seen from Algorithm 3 that the M-stage MMWF receiver has a complexity of $O(J^2MN)$. Here, the big O(·) (order of) notation indicates that the complexity in number of operations is proportional to the argument. The complexity of the feedback filter of the MMWF-DFD scheme is at most $O(J^3)$ (that is, the computation of the matrix [symbol missing in source]), which is relatively small compared to that of the MMWF scheme. Consequently, the computational complexity of the MMWF/MMWF-DFD systems is reduced substantially from $O(N^3)$ to $O(J^2MN)$ per computing cycle, where $J^2M \ll N^2$. The primary complexity cost of the M-iteration MCG algorithm in Algorithm 2 is the calculation of the step matrix $\mathbf{V}_j$, which involves $O(JN^2) + O(J^3) + O(J^2N) \approx O(JN^2)$ operations per iteration. The computational complexities of $\mathbf{W}_{\mathrm{MCG},j+1}$, $\mathbf{G}_{j+1}$, $\boldsymbol{\Gamma}_{j+1}$, and $\mathbf{D}_{j+1}$, in terms of multiplications, can easily be shown to be $O(J^2N)$, $O(JN^2)$, $O(J^3) + O(JN^2) \approx O(JN^2)$, and $O(J^2N)$ per iteration, respectively. Hence, the M-iteration MCG algorithm has a complexity of roughly $O(JMN^2)$.

Algorithm 3: Recursion equations for the simplified M-stage MMWF scheme.
The additional computational load introduced by the (2-to-1)-FIS, in terms of multiplications, is $I + J + 3$ at each sample time, in which the preparation of $e^2(i)$ requires $J + 2$ multiplications and the centroid-defuzzification output process costs $I + 1$ multiplications. Furthermore, some special instructions (with a total of 44 lookups + 32 compares + $32 I_{\mathrm{MAX}}$ operations) are required to perform the FIS, which come primarily from the fuzzification of the two input variables (12 lookups), fuzzy OR operations (32 compares), fuzzy minimum implication (32 lookups), and aggregation of the output ($32 I_{\mathrm{MAX}}$ operations). Fortunately, these operations can be done very efficiently in the latest range of DSPs, which provide single-cycle multiply-and-add, table lookup, and comparison instructions [44,45].
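As a rough numerical illustration of the orders quoted in this section, the helper below evaluates the leading terms for a given configuration. The function name is hypothetical, and the outputs are order-of-magnitude arguments with all constant factors ignored, not exact operation counts.

```python
def complexity_orders(n_c, num_paths, num_desired, num_stages):
    """Leading-order operation counts; constant factors are deliberately ignored."""
    n = n_c + num_paths - 1                                # N = Nc + Jk - 1
    return {
        "N": n,
        "full_rank_mmse": n ** 3,                          # O(N^3)
        "mmwf": num_desired ** 2 * num_stages * n,         # O(J^2 M N)
        "mcg": num_desired * num_stages * n ** 2,          # O(J M N^2)
    }

orders = complexity_orders(n_c=310, num_paths=100, num_desired=5, num_stages=8)
```

For the simulation parameters used later in the paper ($N_c = 310$, $J_k = 100$, $J = 5$) and the largest stage considered (M = 8), this gives N = 409, about 6.8 × 10^7 for the full-rank inversion, about 8.2 × 10^4 for the M-stage MMWF, and about 6.7 × 10^6 for the M-iteration MCG.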

EXISTING STAGE-SELECTION CRITERIA
In this section, three filter-stage adaptation schemes used in [22,24] are briefly reviewed. The first stage-selection method was originally introduced in [46] for the rank selection of an auxiliary-vector (AV) estimator. The time-varying stage M of the AV filter is determined by the stopping rule (28) [equation (28) missing in source], where $P_S^{\perp}(\mathbf{x})$ is the orthogonal projection of the vector $\mathbf{x}$ onto the subspace S and the small positive constant η is computed by (37) in [24]. Note that the subspace $S_m$ denotes the Krylov space spanned by the basis vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$, where $\mathbf{v}_i = \mathrm{Vec}\{\mathbf{R}_{rr}^{i-1}\mathbf{R}_{rb}\}$. The second stage-selection technique is based on minimizing the cumulative exponentially weighted squared error ξ, which is also known as the a posteriori LS method (29) [equation (29) missing in source], where $(\cdot)_M$ denotes the dynamic filter stage at time i. For each i, the value of M is chosen to minimize $\xi_M(i)$ defined in (29).
The third stage-selection scheme is the well-known white noise gain constraint (WNGC) [22] technique, in which the filter-vector norm $\|\mathbf{w}\|$ is utilized as a rank-selection tool. The criterion used for the rank selection of the WNGC is $10\log_{10}\|\mathbf{w}\|^2 \le 1$ dB in this paper.
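The second and third criteria lend themselves to a short sketch. Both helpers below are hypothetical illustrations rather than the rules of [22,24]: the LS cost is assumed to accumulate as $\xi_M(i) = \mu\,\xi_M(i-1) + \|\mathbf{e}_M(i)\|^2$, with each candidate stage's squared error computed once when the sample arrives, and the WNGC helper assumes that the largest stage still satisfying the norm constraint is the one selected.

```python
import math

def update_ls_costs(costs, squared_errors, weight):
    """Exponentially weighted cumulative squared error for every candidate stage M."""
    return {m: weight * costs.get(m, 0.0) + err for m, err in squared_errors.items()}

def select_stage_ls(costs):
    """Pick the stage whose accumulated cost is smallest."""
    return min(costs, key=costs.get)

def select_stage_wngc(filter_norms_sq, limit_db=1.0):
    """ASSUMPTION: choose the largest stage with 10*log10(||w||^2) <= limit_db."""
    allowed = [m for m, n2 in filter_norms_sq.items()
               if 10 * math.log10(n2) <= limit_db]
    return max(allowed) if allowed else min(filter_norms_sq)

# Two update steps with made-up squared errors for candidate stages 2, 4, and 8.
costs = update_ls_costs({}, {2: 0.5, 4: 0.3, 8: 0.9}, weight=0.99)
costs = update_ls_costs(costs, {2: 0.4, 4: 0.35, 8: 0.2}, weight=0.99)
best_ls = select_stage_ls(costs)
best_wngc = select_stage_wngc({2: 1.0, 4: 1.2, 8: 1.5})
```

Tracking a cost for every candidate stage means running every candidate filter at each sample, which is consistent with the remark in the numerical results that the a posteriori LS method carries a heavy extra load.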

NUMERICAL RESULTS
A DS-UWB communication system with K = 20 users is considered in multipath fading channels. The parameters $N_c = 310$ and $J_k = 100$ are used in the computer simulations. Users 1 to 5 are the users of interest to be acquired, that is, J = 5. Additionally, the $(e^2, \Delta e^2)$-FIS system with the (8, 4)-partitioned regions of the fuzzy I/O domains [26] is employed owing to its superior performance. The threshold level of the WNGC is set to 1 dB. All experimental curves are obtained using $10^3$ independent trials with μ = 0.99 and η = 0.01.
Figure 3 compares the convergence rate of various reduced-rank MMWF-based algorithms with the use of training symbols for SNR = 20 dB. Results of dynamic-stage MMWF algorithms using the adaptation criteria of (28) and (29) (i.e., [24, equations (73) and (75)]) and the WNGC are provided and compared. The figure demonstrates that with a small stage (M = 2) the MMWF algorithm produces a faster convergence rate, while a large stage (M = 8) achieves a lower steady-state MSE. Thus, the proposed FI-MMWF algorithm, which performs the fuzzy-logic filter-stage selection over the range [2,8], takes advantage of both small and large stages in its convergence and steady-state characteristics. Note that the extra computational load incurred by both stage-selection criteria in [24] is heavy, especially for the a posteriori LS method. Figure 4 evaluates the convergence behavior of various blind-mode reduced-rank MMWF-based algorithms. The blind FI-MMWF-based algorithms can be obtained by simply replacing $\mathbf{R}_{rb}$ with the "spreading" code matrix of the desired users. The filter-stage selection of the FI-MMWF-based algorithms is conducted over the range [2,5]. The experimental results in Figure 4 are similar to those in Figure 3. It should be pointed out that the convergence rate of the low-stage MMWF is much faster than that of the high-stage MMWF in the blind version. Consequently, the advantage of the fuzzy-stage-selection MMWF-based algorithms in the blind version is quite pronounced.

Simulation results in Figure 5 show the convergence behavior of the blind-mode FI-MCG algorithm in terms of the number of iterations. Other parameters used in Figure 5 are set as in Figure 4. Evidently, the FI-MCG algorithm produces better convergence/tracking capability and steady-state MSE performance than MMWF schemes with a fixed stage. Additionally, the results in Figure 5 demonstrate that the FI-MCG algorithm achieves an improvement in MSE performance over the MCG scheme, presumably because of the use of a fuzzy variable stage in response to the time-varying fading channels. These results also show that the FI-MCG algorithm is able to achieve performance similar to that of the FI-MMWF-based approaches. Figure 5 also shows the convergence behavior of the MMWF-based algorithms using the linear interpolation (LI) filter-stage selection criterion. With the linear interpolation technique, the filter-stage update is described by a set of update equations [equations missing in source]. The FI-MMWF-based algorithms achieve better convergence/tracking capability and steady-state MSE performance than the LI-MMWF algorithm because they make full use of the 2-to-1 fuzzy-inference-based filter-stage adaptation criterion.

CONCLUSIONS
Reduced-rank FI-MMWF-based receivers are proposed for data demodulation in DS-UWB communication systems. The computational complexity of the forward path of the MMSE-DFD receiver is reduced by introducing the reduced-rank MMWF scheme. With the computation basis shared between the forward and backward filters of the MMWF-DFD receiver, the extra complexity incurred by the decision feedback mechanism is alleviated. Moreover, the MMWF-DFD receiver is able to achieve an improvement in convergence rate and offer an additional performance gain over the MMWF receiver. In addition, the FI-MMWF-based receivers provide convergence/tracking and MSE performance benefits in multipath fading channels. Notably, the fuzzy-based MCG receiver is able to provide performance similar to that of the FI-based MMWF and MMWF-DFD receivers. Furthermore, the LI-MMWF algorithm does not outperform the FI-MMWF-based approaches, but it does have a lower complexity cost. As a consequence, these merits make the FI-based MMWF, MMWF-DFD, and MCG receivers well suited for applications in UWB wireless communications.

Figure 1: Block diagram of the first-stage orthogonal decomposition process of the MMWF receiver.

Figure 3: Mean square error versus the number of training symbols for reduced-rank MMWF-based algorithms.

Figure 4: Mean square error versus the number of iterations for blind reduced-rank MMWF-based algorithms.
The pseudorandom code of length $N_c$, $\{c_{k,[n/N_c]}\}$, denotes the normalized spreading code sequence of the kth user, where $c_{k,[n/N_c]}$ takes the value $-1/\sqrt{N_c}$ or $+1/\sqrt{N_c}$ with equal probability.