Frequency-Domain Block Signal Detection with QRM-MLD for Training Sequence-Aided Single-Carrier Transmission



Introduction
In next-generation mobile communication systems, broadband data services are demanded. Since the mobile wireless channel is composed of many propagation paths with different time delays, the channel becomes severely frequency selective as the transmission data rate increases. When single-carrier (SC) transmission is used without any equalization technique, the bit error rate (BER) performance degrades significantly due to strong intersymbol interference (ISI) [1]. The computational complexity of maximum likelihood (ML)-based equalization, that is, ML sequence estimation (MLSE), depends on the number of propagation paths and becomes extremely high in a severely frequency-selective channel [2]. Therefore, several suboptimal linear detection schemes, such as time-domain and frequency-domain linear equalization schemes, have been proposed to reduce the computational complexity [3][4][5]. A simple one-tap frequency-domain equalization based on the minimum mean square error criterion (MMSE-FDE) can significantly improve the BER performance of cyclic prefix inserted SC (CP-SC) block transmission in a frequency-selective fading channel. However, a large performance gap from the matched-filter (MF) bound still exists due to the presence of residual ISI after FDE. To narrow the performance gap, an MMSE-FDE combined with iterative ISI cancellation was proposed [6][7][8]. However, the achievable BER performance is still a few dB away from the MF bound, particularly when high-level data modulation (e.g., 16QAM and 64QAM) is used. Near-ML reduced-complexity time-domain equalization schemes have been proposed in [9,10].
Recently, we proposed a near-ML reduced-complexity frequency-domain equalization scheme, called frequency-domain block signal detection (FDBD) using QR decomposition with M-algorithm ML detection (QRM-MLD), for the reception of CP-SC signals transmitted over a frequency-selective channel [11]. QRM-MLD was originally proposed as a signal detection scheme for multi-input multi-output (MIMO) spatial multiplexing in [12]. In FDBD with QRM-MLD, QR decomposition is applied to a concatenation of the propagation channel and discrete Fourier transform (DFT). We showed [11] that FDBD with QRM-MLD can significantly improve the BER performance when compared to the MMSE-FDE and achieve the BER performance close to the MF bound even if high-level data modulation is used. However, the use of a fairly large number M of surviving paths in the M-algorithm is required, leading to high computational complexity. If a smaller M is used, the achievable BER performance degrades because of the increased probability of removing the correct path at early stages. This probability greatly affects the achievable BER performance of FDBD with QRM-MLD.

[Figure 1: Block structures of (a) TA-SC and (b) CP-SC. Labels: TS or CP ($N_g$ symbols), data symbols ($N_c$ symbols), DFT block.]
In this paper, we show that the use of training sequence-aided SC (TA-SC) block transmission [13,14] instead of CP-SC block transmission can significantly reduce the probability of removing the correct path at early stages in QRM-MLD and hence improve the achievable BER performance of FDBD with QRM-MLD. In TA-SC, the CP is replaced by a known training sequence (TS), which is a part of the DFT block at the receiver, and the TS in the previous block acts as the CP in the present block. When TA-SC is used, since the symbols to be detected at early stages belong to the known TS, the achievable BER performance of FDBD with QRM-MLD can be improved. The performance improvement of TA-SC over CP-SC when using FDBD with QRM-MLD is confirmed by computer simulation.
The remainder of this paper is organized as follows. In Section 2, TA-SC using FDBD with QRM-MLD is presented. In Section 3, we show by computer simulation that TA-SC transmission using FDBD with QRM-MLD can achieve BER performance close to the MF bound while reducing the number of surviving paths when compared to CP-SC. We also discuss the computational complexity of FDBD with QRM-MLD and show that TA-SC can reduce the overall complexity of FDBD with QRM-MLD while achieving almost the same performance as CP-SC. Section 4 offers some concluding remarks.

TA-SC Using FDBD with QRM-MLD
2.1. TA-SC versus CP-SC. The TA-SC block structure is illustrated and compared with that of CP-SC in Figure 1. In TA-SC, the CP is replaced by a TS. For the TS to play the role of the CP, the DFT size at the receiver must be the sum of the number of useful data symbols and the TS length. In the case of CP-SC, the data symbol block length and the CP length are denoted by $N_c$ and $N_g$, respectively. For TA-SC, to keep the same data rate as CP-SC, the data symbol block length and the TS length need to be set to $N_c$ and $N_g$, respectively. The difference between TA-SC and CP-SC is the size of the DFT used at the receiver: the DFT size is $N_c + N_g$ symbols for TA-SC, while it is $N_c$ symbols for CP-SC.

2.2. TA-SC Signal Transmission Model.
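The TA-SC transmission model using FDBD with QRM-MLD is illustrated in Figure 2. Throughout the paper, the symbol-spaced discrete-time representation is used. At the transmitter, a binary information sequence to be transmitted is data-modulated, and then the data-modulated symbol sequence is divided into a sequence of symbol blocks of $N_c$ symbols each. The data symbol block can be expressed in vector form as $\mathbf{d} = [d(0), \ldots, d(n), \ldots, d(N_c-1)]^T$. Before the transmission, the TS of length $N_g$ symbols is appended at the end of each block. The block $\mathbf{s}$ to be transmitted is expressed in vector form as

$$\mathbf{s} = [\mathbf{d}^T, \mathbf{u}^T]^T = [d(0), \ldots, d(N_c-1), u(0), \ldots, u(N_g-1)]^T, \qquad (1)$$

where $\mathbf{u} = [u(0), \ldots, u(n), \ldots, u(N_g-1)]^T$ denotes the TS vector, which is identical for all blocks. We assume a symbol-spaced frequency-selective fading channel composed of $L$ propagation paths with different time delays. The channel impulse response $h(\tau)$ is given by

$$h(\tau) = \sum_{l=0}^{L-1} h_l\, \delta(\tau - \tau_l), \qquad (2)$$

where $h_l$ and $\tau_l$ are, respectively, the complex-valued path gain and the time delay of the $l$th path. The $l$th path time delay is assumed to be $l$ symbols, that is, $\tau_l = l$. The elements of the received signal block $\mathbf{y}^{(TA)} = [y^{(TA)}(0), \ldots, y^{(TA)}(t), \ldots, y^{(TA)}(N_c+N_g-1)]^T$ can be expressed as

$$y^{(TA)}(t) = \sqrt{\frac{2E_s}{T_s}} \sum_{l=0}^{L-1} h_l\, s(t-l) + n^{(TA)}(t), \quad t = 0, \ldots, N_c+N_g-1, \qquad (3)$$

where $s(t)$ is the $t$th element of $\mathbf{s}$, with $s(t)$ for $t < 0$ taken from the TS at the end of the previous block (assuming $L - 1 \le N_g$), $E_s$ and $T_s$ are, respectively, the symbol energy and duration, and $\mathbf{n}^{(TA)} = [n^{(TA)}(0), \ldots, n^{(TA)}(t), \ldots, n^{(TA)}(N_c+N_g-1)]^T$ is the noise vector. The $t$th element, $n^{(TA)}(t)$, of $\mathbf{n}^{(TA)}$ is zero-mean additive white Gaussian noise (AWGN) having the variance $2N_0/T_s$, with $N_0$ being the one-sided noise power spectrum density. Since the identical TS is used for all blocks, the received signal block can be rewritten, similar to CP-SC transmission, as

$$\mathbf{y}^{(TA)} = \sqrt{\frac{2E_s}{T_s}}\, \mathbf{h}^{(TA)} \mathbf{s} + \mathbf{n}^{(TA)}, \qquad (4)$$

where $\mathbf{h}^{(TA)}$ is the $(N_c+N_g) \times (N_c+N_g)$ circulant channel impulse response matrix, whose $(t, t')$th element is given as

$$[\mathbf{h}^{(TA)}]_{t,t'} = h_{(t-t') \bmod (N_c+N_g)}, \quad h_l = 0 \ \text{for} \ l \ge L. \qquad (5)$$

At the receiver, the $(N_c+N_g)$-point DFT is applied to transform the received signal block into the frequency-domain signal vector $\mathbf{Y}^{(TA)} = [Y^{(TA)}(0), \ldots, Y^{(TA)}(k), \ldots, Y^{(TA)}(N_c+N_g-1)]^T$. $\mathbf{Y}^{(TA)}$ is expressed as

$$\mathbf{Y}^{(TA)} = \mathbf{F}_{(N_c+N_g)} \mathbf{y}^{(TA)} = \sqrt{\frac{2E_s}{T_s}}\, \mathbf{F}_{(N_c+N_g)} \mathbf{h}^{(TA)} \mathbf{s} + \mathbf{F}_{(N_c+N_g)} \mathbf{n}^{(TA)}, \qquad (6)$$

where $\mathbf{F}_{(J)}$ is the DFT matrix of size $J \times J$, whose $(k, t)$th element is given as

$$[\mathbf{F}_{(J)}]_{k,t} = \frac{1}{\sqrt{J}} \exp\left(-j 2\pi \frac{k t}{J}\right), \quad k, t = 0, \ldots, J-1. \qquad (7)$$

Due to the circulant property of $\mathbf{h}^{(TA)}$, we have [15]

$$\mathbf{F}_{(N_c+N_g)} \mathbf{h}^{(TA)} \mathbf{F}_{(N_c+N_g)}^H = \mathbf{H}^{(TA)} = \mathrm{diag}\left[H^{(TA)}(0), \ldots, H^{(TA)}(k), \ldots, H^{(TA)}(N_c+N_g-1)\right], \qquad (8)$$

where $H^{(TA)}(k) = \sum_{l=0}^{L-1} h_l \exp\left(-j 2\pi k l/(N_c+N_g)\right)$ and $(\cdot)^H$ is the Hermitian transpose. Using (8), (6) can be rewritten as

$$\mathbf{Y}^{(TA)} = \sqrt{\frac{2E_s}{T_s}}\, \bar{\mathbf{H}}^{(TA)} \mathbf{s} + \mathbf{N}^{(TA)}, \qquad (9)$$

where $\bar{\mathbf{H}}^{(TA)} = \mathbf{H}^{(TA)} \mathbf{F}_{(N_c+N_g)}$ and $\mathbf{N}^{(TA)} = [N^{(TA)}(0), \ldots, N^{(TA)}(k), \ldots, N^{(TA)}(N_c+N_g-1)]^T$ are, respectively, the equivalent channel matrix and the frequency-domain noise vector.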
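The step from the received block to the circulant model, and from the circulant model to a diagonal frequency-domain channel, can be checked numerically. The following is a minimal sketch under stated assumptions: the path gains, TS, and data symbols are arbitrary illustrative values, noise and the amplitude factor $\sqrt{2E_s/T_s}$ are omitted, and the helper names (`unitary_dft`, `received`) are hypothetical and not taken from the paper.

```python
import cmath

def unitary_dft(x):
    """J-point DFT scaled by 1/sqrt(J), i.e. multiplication by a unitary DFT matrix."""
    J = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / J) for t in range(J)) / J ** 0.5
            for k in range(J)]

# Illustrative sizes and values (assumptions, not the paper's simulation settings).
Nc, Ng = 4, 2
J = Nc + Ng                           # DFT size used for TA-SC
h = [0.8, 0.5 - 0.2j, 0.3j]           # L = 3 path gains, path l delayed by l symbols (L - 1 <= Ng)
u = [1, -1]                           # training sequence, identical for all blocks
data_blocks = [[1, -1, -1, 1], [-1, -1, 1, 1]]

# Transmitted stream: every block is the data followed by the TS.
stream = [x for d in data_blocks for x in d + u]

def received(t):
    """Noise-free linear convolution of the stream with the channel."""
    return sum(h[l] * stream[t - l] for l in range(len(h)) if t - l >= 0)

# Received samples inside the DFT window of the second block.
y = [received(J + t) for t in range(J)]

# Circulant model: the TS of the previous block plays the role of a CP.
s = data_blocks[1] + u
y_circ = [sum(h[l] * s[(t - l) % J] for l in range(len(h))) for t in range(J)]
circ_err = max(abs(a - b) for a, b in zip(y, y_circ))

# Frequency domain: the circulant channel becomes one complex gain H(k) per frequency.
H = [sum(h[l] * cmath.exp(-2j * cmath.pi * k * l / J) for l in range(len(h))) for k in range(J)]
Y, S = unitary_dft(y), unitary_dft(s)
diag_err = max(abs(Y[k] - H[k] * S[k]) for k in range(J))

print(circ_err, diag_err)             # both should be at rounding-error level
```

Both error terms vanish up to rounding only because the channel memory $L - 1$ does not exceed the TS length; with a longer channel the previous block's data symbols would leak into the DFT window.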

2.3. FDBD with QRM-MLD.
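The conditional joint probability density function (pdf), $p(\mathbf{Y}^{(TA)} \mid \mathbf{s})$, of $\mathbf{Y}^{(TA)}$ for the given $\mathbf{s}$ can be given, from (9), as

$$p(\mathbf{Y}^{(TA)} \mid \mathbf{s}) = \frac{1}{(2\pi\sigma^2)^{N_c+N_g}} \exp\left(-\frac{1}{2\sigma^2} \left\| \mathbf{Y}^{(TA)} - \sqrt{\frac{2E_s}{T_s}}\, \bar{\mathbf{H}}^{(TA)} \mathbf{s} \right\|^2\right), \qquad (10)$$

where $\sigma^2 = N_0/T_s$. The MLD is represented, from (10), as

$$\hat{\mathbf{d}}^{(TA)} = \arg\min_{\bar{\mathbf{d}}} \left\| \mathbf{Y}^{(TA)} - \sqrt{\frac{2E_s}{T_s}}\, \bar{\mathbf{H}}^{(TA)} \left[\bar{\mathbf{d}}^T, \mathbf{u}^T\right]^T \right\|^2, \qquad (11)$$

where $\bar{\mathbf{d}}$ is the symbol-candidate vector. MLD requires a prohibitively high computational complexity. QRM-MLD [12], which was proposed for signal detection in MIMO multiplexing, can achieve BER performance near that of MLD with much reduced complexity. In this paper, we apply QRM-MLD to TA-SC. QRM-MLD consists of two steps: QR decomposition and the M-algorithm. In the case of SC transmission, the signal-to-interference plus noise power ratio (SINR) is identical for all symbols in a block, and hence no ordering is necessary in the QR decomposition. First, the QR decomposition is applied to the equivalent channel matrix $\bar{\mathbf{H}}^{(TA)}$ to obtain $\bar{\mathbf{H}}^{(TA)} = \mathbf{Q}^{(TA)} \mathbf{R}^{(TA)}$, where $\mathbf{Q}^{(TA)}$ is an $(N_c+N_g) \times (N_c+N_g)$ matrix satisfying $\mathbf{Q}^{(TA)H} \mathbf{Q}^{(TA)} = \mathbf{I}$ ($\mathbf{I}$ is the identity matrix) and $\mathbf{R}^{(TA)}$ is an $(N_c+N_g) \times (N_c+N_g)$ upper triangular matrix. The transformed frequency-domain received signal $\hat{\mathbf{Y}}^{(TA)} = [\hat{Y}^{(TA)}(0), \ldots, \hat{Y}^{(TA)}(N_c+N_g-1)]^T$ is obtained as

$$\hat{\mathbf{Y}}^{(TA)} = \mathbf{Q}^{(TA)H} \mathbf{Y}^{(TA)} = \sqrt{\frac{2E_s}{T_s}}\, \mathbf{R}^{(TA)} \mathbf{s} + \mathbf{Q}^{(TA)H} \mathbf{N}^{(TA)}. \qquad (12)$$

From (12), the ML solution $\hat{\mathbf{d}}^{(TA)}$ can be obtained by searching for the best path having the minimum Euclidean distance in the tree diagram composed of $N_c + N_g$ stages. However, in TA-SC, the $N_c$, $N_c+1$, ..., $(N_c+N_g-1)$th elements of $\hat{\mathbf{Y}}^{(TA)}$ contain the training symbols only, and therefore only one path exists at the $n = 0, 1, \ldots, (N_g-1)$th stages, and the M-algorithm [16] can be started from the $n = N_g$th stage.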
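An example of the QRM-MLD is shown in Figure 3 assuming $N_c = 4$ and $N_g = 2$, binary phase shift keying (BPSK) modulation, and $M = 3$. In the $n = N_g$th stage, all possible symbol-candidates for the last symbol $d(N_c-1)$ in a data symbol block are generated (the number of all possible symbol-candidates is $X$ for $X$-QAM). The path metric based on the squared Euclidean distance between $\hat{Y}^{(TA)}(N_c-1)$ and each symbol-candidate is calculated as

$$e_{n=N_g} = \left| \hat{Y}^{(TA)}(N_c-1) - \sqrt{\frac{2E_s}{T_s}}\, R_{N_c-1,N_c-1}\, \bar{d}(N_c-1) - \sqrt{\frac{2E_s}{T_s}} \sum_{i=0}^{N_g-1} R_{N_c-1,N_c+i}\, u(i) \right|^2, \qquad (13)$$

where $\bar{d}(N_c-1)$ is the symbol-candidate for $d(N_c-1)$ and $R_{i,j}$ denotes the $(i, j)$th element of $\mathbf{R}^{(TA)}$. Next, $M$ ($M \le X$) paths having the smallest path metric are selected as surviving paths. In the next stage ($n = N_g + 1$), there are a total of $X$ branches for $d(N_c-2)$ leaving from each selected surviving path. Therefore, there are in total $M \cdot X$ possible paths for the two-symbol sequence of $d(N_c-1)$ and $d(N_c-2)$. The path metrics are calculated for all possible $M \cdot X$ paths using

$$e_{n=N_g+1} = e_{n=N_g} + \left| \hat{Y}^{(TA)}(N_c-2) - \sqrt{\frac{2E_s}{T_s}} \sum_{i=0}^{1} R_{N_c-2,N_c-2+i}\, \bar{d}(N_c-2+i) - \sqrt{\frac{2E_s}{T_s}} \sum_{i=0}^{N_g-1} R_{N_c-2,N_c+i}\, u(i) \right|^2. \qquad (14)$$

Similar to the $n = N_g$th stage, $M$ surviving paths are selected from the $M \cdot X$ paths. This procedure is repeated until the last stage ($n = N_c + N_g - 1$). The path metric at the $n$th stage ($n = N_g, N_g+1, \ldots, N_c+N_g-1$) is calculated, with $m = N_c + N_g - 1 - n$, using

$$e_n = e_{n-1} + \left| \hat{Y}^{(TA)}(m) - \sqrt{\frac{2E_s}{T_s}} \sum_{i=0}^{n-N_g} R_{m,m+i}\, \bar{d}(m+i) - \sqrt{\frac{2E_s}{T_s}} \sum_{i=0}^{N_g-1} R_{m,N_c+i}\, u(i) \right|^2. \qquad (15)$$

The most likely transmitted symbol sequence is found by tracing back the path with the smallest path metric at the last stage ($n = N_c + N_g - 1$). QRM-MLD requires $X\{1 + M(N_c-1)\}$ squared Euclidean distance calculations, which is significantly smaller than the $X^{N_c}$ squared Euclidean distance calculations required by the original MLD.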
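The M-algorithm stage of the procedure above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it assumes the QR decomposition has already been done and takes an upper triangular matrix directly (here a hypothetical deterministic `R`), it is noise-free, the factor $\sqrt{2E_s/T_s}$ is absorbed into `R`, and the function name `m_algorithm` is made up for the example. The known TS contributes a fixed term in every row, so the search starts directly at the last data symbol.

```python
def m_algorithm(R, Y_hat, u, Nc, alphabet, M):
    """Detect the Nc data symbols from Y_hat = R [d; u] with known TS u.

    Returns the best symbol sequence and the number of squared Euclidean
    distance calculations that were performed.
    """
    Ng = len(u)
    paths = [(0.0, [])]                      # (accumulated metric, symbols d(row+1..Nc-1))
    n_dist = 0
    for row in range(Nc - 1, -1, -1):        # stage n = Nc + Ng - 1 - row
        # Contribution of the known training symbols: no candidates needed.
        ts_part = sum(R[row][Nc + i] * u[i] for i in range(Ng))
        candidates = []
        for metric, tail in paths:
            known = ts_part + sum(R[row][row + 1 + i] * tail[i] for i in range(len(tail)))
            for c in alphabet:               # X branches per surviving path
                e = abs(Y_hat[row] - R[row][row] * c - known) ** 2
                candidates.append((metric + e, [c] + tail))
                n_dist += 1
        candidates.sort(key=lambda p: p[0])  # keep the M paths with the smallest metric
        paths = candidates[:M]
    return paths[0][1], n_dist

def r_element(i, j):
    """Hypothetical upper triangular matrix entries used only for this example."""
    if j < i:
        return 0.0
    if j == i:
        return 1.0 + 0.1 * i
    return complex(0.3 * (i + 1) - 0.1 * j, 0.2 * (j - i))

Nc, Ng, M = 4, 2, 2
J = Nc + Ng
R = [[r_element(i, j) for j in range(J)] for i in range(J)]
d = [1, -1, -1, 1]                           # BPSK data symbols (X = 2)
u = [1, -1]                                  # known TS
s = d + u
Y_hat = [sum(R[i][j] * s[j] for j in range(i, J)) for i in range(J)]

detected, n_dist = m_algorithm(R, Y_hat, u, Nc, [1, -1], M)
print(detected, n_dist)
```

With $M \le X$ the distance count returned by the sketch equals $X\{1 + M(N_c-1)\}$, which for this toy setting is $2\,(1 + 2 \cdot 3) = 14$.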

2.4. Advantage of TA-SC over CP-SC. The received signal power associated with the symbol d(N
is the sum of the squared values of the $(N_c-1)$, $(N_c-2)$, ..., $(N_c-1-i)$th elements in the $(N_c-1-i)$th column of $\mathbf{R}$. In the case of SC transmission, the channel impulse response matrix is circulant, and therefore the magnitude of a lower right element of $\mathbf{R}$ drops with large probability [17]. Therefore, the probability of removing the correct path is greater at early stages.
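In the case of CP-SC transmission, the transformed frequency-domain received signal vector $\hat{\mathbf{Y}}^{(CP)}$ is given, in the same manner as (12), by

$$\hat{\mathbf{Y}}^{(CP)} = \mathbf{Q}^{(CP)H} \mathbf{Y}^{(CP)} = \sqrt{\frac{2E_s}{T_s}}\, \mathbf{R}^{(CP)} \mathbf{d} + \mathbf{Q}^{(CP)H} \mathbf{N}^{(CP)}, \qquad (16)$$

where $\mathbf{R}^{(CP)}$ is the $N_c \times N_c$ upper triangular matrix obtained by the QR decomposition of the CP-SC equivalent channel matrix. The lower right elements of $\mathbf{R}^{(CP)}$ are relevant to the selection of the surviving paths. Since the received signal power is lower at early stages, the probability of removing the correct path at early stages may increase when a smaller M is used. This probability significantly affects the achievable BER performance of FDBD with QRM-MLD. A fairly large M must be used to achieve BER performance close to the MF bound. For example, M = 256 is necessary for the case of $N_c = 64$ and 16QAM data modulation [11]. The use of a larger M increases the computational complexity.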
In the case of TA-SC, it can be understood from (12) that the lower right elements of $\mathbf{R}^{(TA)}$ are associated with the TS, and therefore they are not relevant to the selection of the surviving paths. The M-algorithm can start from the $n = N_g$th stage, and therefore the probability of removing the correct path at early stages can be significantly reduced even if a small M is used. This suggests that a smaller M can be used for TA-SC than for CP-SC.

Computer Simulation
The simulation condition is summarized in Table 1. The data symbol block length is $N_c = 64$ for both TA-SC and CP-SC, and the TS length of TA-SC is $N_g = 16$, which is equal to the CP length of CP-SC. A partial sequence taken from a PN sequence with a repetition period of 4095 bits is used as the TS. The same data modulation is used for the TS and the useful data. The channel is assumed to be a frequency-selective block Rayleigh fading channel having a symbol-spaced L-path uniform power delay profile. Ideal channel estimation is assumed.

3.1. Average BER Performance.
The BER performance of TA-SC using FDBD with QRM-MLD is plotted in Figure 4 as a function of the average received bit energy-to-noise power spectrum density ratio $E_b/N_0$ $(= (E_s/N_0)(1 + N_g/N_c)/\log_2 X)$ for M = 1, 4, and 16. For comparison, the BER performance of CP-SC [11] and the MF bound [18] are also plotted. It can be seen from Figure 4 that when a small M is used, the achievable BER performance of CP-SC degrades. On the other hand, TA-SC can achieve better BER performance even if a small M is used. The required value of M in TA-SC is 16, 16, and 4 for QPSK, 16QAM, and 64QAM, respectively, to achieve BER performance similar to CP-SC using M = 256. The reason for this is discussed in the following.

Figure 5 shows the pdf of the received signal power $P_{N_c-1,n}$ associated with the symbol $d(N_c-1)$ at the $n$th stage, which, as described above, is a sum of squared magnitudes of elements in the corresponding column of $\mathbf{R}$. It is seen from Figure 5(a) that when CP-SC is used, the probability that the received signal power drops is high at early stages. Therefore, the probability of removing the correct path at early stages increases when a smaller M is used. This is shown in Figure 6, which plots the probability of removing the correct path at the $n$th stage ($n = N_g, N_g+1, N_g+2$ for TA-SC and $n = 0, 1, 2$ for CP-SC) when $E_b/N_0 = 10$ dB and 16QAM is used. The use of a larger M can reduce the probability of removing the correct path and hence improve the achievable BER performance; however, the computational complexity increases. The computational complexity of FDBD with QRM-MLD will be discussed in the next subsection.
In the case of TA-SC, the lower right elements of $\mathbf{R}$ are not used in QRM-MLD. Therefore, the probability that the received signal power at early stages drops is very low (see Figure 5(b)). As a consequence, the probability of removing the correct path at early stages is reduced. This is clearly seen in Figure 6.
Figure 7 plots the required $E_b/N_0$ for achieving BER = $10^{-4}$ as a function of M. For comparison, the required $E_b/N_0$ for the MF bound is also plotted. In the case of CP-SC, the value of M required to achieve BER performance close to the MF bound is 64 for QPSK and 256 for 16QAM and 64QAM. In the case of TA-SC, however, a much smaller M suffices, that is, M = 8 for QPSK and M = 16 for 16QAM and 64QAM. The performance gap of 1 dB from the MF bound is due to the insertion of the TS and CP ($10\log_{10}(1 + N_g/N_c) \approx 0.97$ dB for $N_c = 64$ and $N_g = 16$).

Figure 8 shows the influence of the number L of propagation paths on the M required to reduce the $E_b/N_0$ gap from the MF bound for achieving BER = $10^{-4}$ to 1.5, 2.5, and 3.0 dB for QPSK, 16QAM, and 64QAM, respectively. It can be seen from Figure 8 that the required M increases with L in the case of CP-SC. This is because the number of elements of $\mathbf{R}$ in the lower right positions whose magnitudes are likely to drop increases with L [17], and therefore the probability of removing the correct path at early stages also increases. In the case of TA-SC, however, the required M is almost independent of L.
Below, we examine the transmission performance of coded CP-SC and TA-SC systems. 16QAM is assumed as the data modulation scheme. We employ a rate-1/3 turbo encoder using two (13,15) recursive systematic convolutional (RSC) component encoders. The two parity sequences from the turbo encoder are punctured to obtain rate-1/2, 3/4, and 8/9 turbo codes. Log-MAP decoding with 6 iterations is assumed. The packet length is set to 8 blocks ($8N_c$ symbols) in all simulations. The log-likelihood ratio (LLR) is used as the soft input to the turbo decoder. When FDBD with QRM-MLD is used, however, the LLR values cannot be computed directly, since the surviving paths at the last stage do not necessarily contain both 1 and 0 for every coded bit. Therefore, how to estimate reliable LLR values is an important issue for FDBD with QRM-MLD. In this paper, we apply the LLR estimation scheme proposed in [19]. The BER performance of turbo-coded TA-SC using FDBD with QRM-MLD is plotted in Figure 9 as a function of the average received $E_b/N_0$ $(= (E_s/N_0)(1 + N_g/N_c)/(R \log_2 X))$, where $R$ is the code rate, for M = 1, 4, and 16. For comparison, the BER performance of CP-SC is also plotted. It can be seen from Figure 9 that when a small M is used, the achievable BER performance of CP-SC degrades. On the other hand, TA-SC can achieve better BER performance even if a small M is used. The required value of M in TA-SC is 1, 16, and 16 for R = 1/2, 3/4, and 8/9, respectively, to achieve BER performance similar to CP-SC using M = 256.
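The $E_b/N_0$ normalization used on the horizontal axes of the BER figures can be written as a small helper. This is a sketch under the assumption that $E_b$ is the energy per information bit, so that the code rate $R$ and the number of bits per symbol $\log_2 X$ both divide $E_s$, while the TS or CP overhead multiplies it by $1 + N_g/N_c$; the function name `ebn0_db` is hypothetical.

```python
import math

def ebn0_db(esn0_db, X, Nc, Ng, R=1.0):
    """Convert Es/N0 in dB to Eb/N0 in dB for X-ary modulation.

    Nc: data symbols per block, Ng: TS (or CP) length, R: code rate (1.0 if uncoded).
    """
    factor = (1 + Ng / Nc) / (R * math.log2(X))
    return esn0_db + 10 * math.log10(factor)

# Eb/N0 penalty caused by the TS or CP alone for Nc = 64 and Ng = 16.
overhead_db = 10 * math.log10(1 + 16 / 64)

# Uncoded 16QAM and rate-1/2 coded 16QAM at the same Es/N0 of 10 dB.
uncoded = ebn0_db(10.0, 16, 64, 16)
coded = ebn0_db(10.0, 16, 64, 16, R=0.5)

print(round(overhead_db, 2), round(uncoded, 2), round(coded, 2))
```

The overhead term is slightly below 1 dB for the block sizes used in the simulations, and halving the code rate raises $E_b/N_0$ by about 3 dB at a fixed $E_s/N_0$.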

3.2. Complexity.
The computational complexities of FDBD with QRM-MLD required for TA-SC and CP-SC are discussed here. The complexity is defined as the number of complex multiplications. The required number of multiplications is shown in Table 2. First, we discuss the number of multiplications required for the squared Euclidean distance calculations. In FDBD with QRM-MLD, this number is $2X + XM\sum_{n=1}^{N_c-1}(n+2)$ when $M \le X$. When $M > X$, the expression is slightly different. For example, when $M = X^2$, the number of multiplications is $2X + 3X^2 + MX\sum_{n=2}^{N_c-1}(n+2)$. It can be seen from Figure 7 that the required value of M in TA-SC is 8, 4, and 2 for QPSK, 16QAM, and 64QAM, respectively, to achieve BER performance similar to CP-SC with M = 256 when L = 16 (uncoded case). Therefore, the computational complexity required for the squared Euclidean distance calculations in TA-SC is reduced to about 3.1, 1.6, and 0.8% of that in CP-SC.
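The quoted percentages can be reproduced with a short stage-by-stage count. The sketch below assumes that stage $n$ costs $n + 2$ multiplications per examined path, that each surviving path spawns $X$ branches, and that the number of survivors is capped at $M$; this generalizes the two closed-form expressions above to any $M$. It also assumes the same count is applied to both TA-SC and CP-SC with $N_c = 64$, which is what the quoted ratios imply. The function name `distance_multiplications` is hypothetical.

```python
def distance_multiplications(X, M, Nc):
    """Multiplications for the squared Euclidean distance calculations.

    X: modulation level, M: number of surviving paths, Nc: number of data symbols.
    """
    survivors, total = 1, 0
    for n in range(Nc):
        paths = survivors * X          # X branches leave every surviving path
        total += paths * (n + 2)       # n + 2 multiplications per examined path
        survivors = min(M, paths)      # at most M paths survive the stage
    return total

Nc = 64
# Modulation: (X, M needed by TA-SC); CP-SC is assumed to use M = 256.
cases = {"QPSK": (4, 8), "16QAM": (16, 4), "64QAM": (64, 2)}
ratios = {
    name: 100 * distance_multiplications(X, m_ta, Nc) / distance_multiplications(X, 256, Nc)
    for name, (X, m_ta) in cases.items()
}

for name, ratio in ratios.items():
    print(f"{name}: TA-SC needs {ratio:.1f}% of the CP-SC distance multiplications")
```

For $M \le X$ the loop reduces to $2X + XM\sum_{n=1}^{N_c-1}(n+2)$, and for $M = X^2$ to $2X + 3X^2 + MX\sum_{n=2}^{N_c-1}(n+2)$, so the ratio is close to the ratio of the two values of M.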
Next, we discuss the overall computational complexity, which is the sum of the complexities required for the DFT, the QR decomposition, the multiplication by $\mathbf{Q}^H$, and the squared Euclidean distance calculations. When the DFT size at the receiver is $J$, the number of complex multiplications is $J^2$ for the DFT in general (efficient algorithms for the DFT also exist [20]), $J^3 + J^2$ for the QR decomposition, and $J^2$ for the multiplication by $\mathbf{Q}^H$. In TA-SC, the CP is replaced by a known TS, which is a part of the DFT block at the receiver, and the TS in the previous block acts as the CP in the present block, as shown in Figure 1. For the TS to play the role of the CP, the DFT size at the receiver must be the sum of the data symbol block length and the TS length. In this paper, for TA-SC to keep the same data rate as CP-SC, we have set the data symbol block length and the TS length to $N_c$ and $N_g$, respectively. Therefore, the DFT requires $(N_c+N_g)^2$ multiplications in the TA-SC case. Furthermore, the equivalent channel matrix is larger than that of CP-SC, resulting in higher complexity for the QR decomposition and the multiplication by $\mathbf{Q}^H$. However, TA-SC significantly reduces the computational complexity required for the squared Euclidean distance calculations, as mentioned above. As a result, the overall computational complexity of TA-SC is smaller than that of CP-SC. The overall computational complexity of TA-SC is about 24, 7.4, and 2.3% of that of CP-SC for QPSK, 16QAM, and 64QAM, respectively, when L = 16 (uncoded case).

performance than MMSE-FDE. However, when a low-rate (R = 1/2) turbo code is used, a fairly large M (M ≥ 512) must be used to achieve better BER performance than MMSE-FDE even if TA-SC is used. FDBD with iterative QRM-MLD may significantly improve the achievable BER performance. FDBD with iterative QRM-MLD for a low-rate turbo-coded TA-SC system is left as an interesting future study. The use of TA-SC can reduce the computational complexity required for the M-algorithm, but high computational complexity is still required in the QR decomposition of the equivalent channel matrix. Another important future study is the further complexity reduction of FDBD with QRM-MLD.

Figure 5: Pdf of the received signal power associated with the symbol $d(N_c-1)$ at the $n$th stage.
recursive systematic convolutional (RSC) component encoders.The two parity sequences from the turbo encoder are punctured to obtain rate-1/2, 3

Figure 8: Required M as a function of the number L of propagation paths.
the achievable BER performance of CP-SC degrades.On the other hand, TA-SC can achieve better BER performance even if small M is used.The required value of M in TA-SC is 16, 16, and 4 for QPSK, 16QAM, and 64QAM, respectively, to achieve the BER performance similar to CP-SC using M = 256.The reason for this is discussed in the following.