Reducing the PAPR in FBMC-OQAM systems with low-latency trellis-based SLM technique
EURASIP Journal on Advances in Signal Processing volume 2016, Article number: 132 (2016)
Abstract
Filter-bank multicarrier (FBMC) modulations, and more specifically FBMC-offset quadrature amplitude modulation (OQAM), are seen as an interesting alternative to orthogonal frequency division multiplexing (OFDM) for the 5th generation radio access technology. In this paper, we investigate the problem of peak-to-average power ratio (PAPR) reduction for FBMC-OQAM signals. Recently, it has been shown that FBMC-OQAM with the trellis-based selected mapping (TSLM) scheme is not only superior to any scheme based on a symbol-by-symbol approach but also outperforms OFDM with the classical SLM scheme. This paper is an extension of that work, in which we analyze the TSLM in terms of computational complexity, required hardware memory, and latency. We propose an improvement to the TSLM that requires far less hardware memory than the originally proposed TSLM and also has low latency. Additionally, the impact of the time duration of the partial PAPR on the performance of TSLM is studied, and its lower bound is identified by proposing a suitable time duration. A thorough and fair comparison of performance is also made with an existing trellis-based scheme proposed in the literature. The simulation results show that the proposed low-latency TSLM yields better PAPR reduction performance with relatively low hardware memory requirements.
Introduction
Filter-bank multicarrier (FBMC)-based systems, combined with offset quadrature amplitude modulation (OQAM), are being seriously considered for future communication systems. FBMC-OQAM has many attractive features such as excellent frequency localization, a power spectral density (PSD) with very low side lobes, and improved robustness to time-variant channel characteristics and carrier frequency offsets. With these properties, FBMC-OQAM seems to be a more suitable candidate as a radio waveform for 5G radio access technology (RAT) than orthogonal frequency division multiplexing (OFDM), especially for asynchronous devices [1]. However, FBMC-OQAM, as a multicarrier technique, has a high peak-to-average power ratio (PAPR), so PAPR reduction methods suited to it are needed. In this paper, we mainly focus on PAPR reduction using probabilistic schemes.
Although several classifications of the PAPR reduction methods for OFDM exist, a notable one has five categories: clipping effect transformations [2]; coding [3]; frame superposition, i.e., tone reservation (TR) [4]; expansible constellation point, i.e., tone injection (TI) [5] and active constellation extension (ACE) [6]; and probabilistic schemes, i.e., selected mapping (SLM) [7] and partial transmit sequence (PTS) [8]. The classical schemes proposed for OFDM cannot be directly applied to FBMC-OQAM, owing to its overlapping symbol structure. Of late, some PAPR schemes have been suggested for FBMC-OQAM systems, namely, ACE [9], iterative clipping [10, 11], ACE combined with TR [12], and TR [13, 14].
Coming to recently proposed probabilistic schemes, three symbol-by-symbol-based schemes have been proposed in [15–17]. In [18], a trellis-based PTS scheme with multi-block joint optimization (MBJO) has been introduced. Inspired by this trellis-based approach, a novel trellis-based SLM (TSLM) scheme has been presented in [19]. However, the existing TSLM technique needs a very large hardware memory, which also impacts the latency. So, in this paper, we propose a low-latency TSLM, which needs very little hardware memory and thereby avoids latency issues. A thorough and fair comparison of performance is made with existing probabilistic schemes: overlapped SLM (OSLM) [16], dispersive SLM (DSLM) [17], and MBJO-PTS [18]. The simulation results show that there is a trade-off between hardware memory and PAPR reduction and that low-latency TSLM yields better performance with relatively low computational complexity, low latency, and less hardware memory.
The rest of the paper is organized as follows: Section 2 gives a brief overview of the FBMC-OQAM signal structure and the impact of its overlapping nature. Section 3 presents the analysis of PAPR in FBMC-OQAM signals, along with a brief introduction to the classical SLM scheme. In Section 3.3, we briefly discuss the exhaustive search. Section 4 presents the idea of the trellis-based approach and its capability to achieve an optimal PAPR reduction performance, along with the TSLM algorithm. In the same section, we propose the low-latency TSLM algorithm. In Section 5, the computational complexity of the probabilistic schemes is derived. In Section 6, the simulation results are presented, and the conclusion of the paper is given in Section 7.
Overview of FBMC-OQAM system
Let us consider that we need to transmit M×N complex symbols in an FBMC-OQAM system over N tones. Then, we transmit real symbols at interval \(\frac {T}{2}\), where T is the symbol period [20]. In OQAM mapping, the M complex input symbol vectors {X _{0},X _{1},…,X _{ M−1}} are mapped into 2M real symbols {a _{0,n },a _{1,n },…,a _{2M−1,n }}. After this OQAM mapping, the real symbols undergo poly-phase filtering that involves IFFT transformations along with filtering by a synthesis filter bank. The obtained continuous-time baseband FBMC-OQAM signal x(t) can be written as [21]
where

x(t)≠0 for \(t\in \left [0, \left (M-\frac {1}{2}\right)T+4T\right)\)

\(\mathcal G\{.\}\) is the FBMC-OQAM modulation function

\(a_{m^{\prime },n}\) are the OQAM-mapped real symbols from X _{ m }

h(t) is the prototype filter impulse response

\(\varphi _{m^{\prime },n}\) is the phase term, equal to \(\frac {\pi }{2}(m'+n)-\pi m'n\)
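For reference, the symbols listed above fit the usual FBMC-OQAM synthesis expression sketched below. This form is assumed from the standard literature rather than quoted from [21]; the summation limits follow from the 2M real symbols and N tones introduced earlier, and the support of x(t) stated above follows from a prototype filter of duration 4T.

$$ x(t)=\mathcal G\{\mathbf {X}_{0},\ldots,\mathbf {X}_{M-1}\}=\sum \limits_{m'=0}^{2M-1}\sum \limits_{n=0}^{N-1}a_{m',n}\,h\!\left (t-m'\frac {T}{2}\right)e^{j\frac {2\pi }{T}nt}\,e^{j\varphi _{m',n}} $$

Under this form, the last real symbol (m'=2M−1) starts at \(\left (M-\frac {1}{2}\right)T\) and lasts 4T, which agrees with the stated support.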
The prototype filter used in this paper is the one designed in the European PHYDYAS project, whose most significant parameter is the duration of its impulse response, also known as the overlapping factor, K. For K=4, h(t) is given in [22]. FBMC-OQAM signals have an overlapping nature. We can see in Fig. 1 that the duration of the impulse response of the rectangular filter used in OFDM is T, whereas the duration of h(t) spreads beyond one symbol period, causing adjacent FBMC-OQAM symbols to overlap.
Probabilistic PAPR reduction schemes for OFDM and their adaptation for FBMC-OQAM
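As an illustration of why the symbols overlap, the following minimal sketch samples a K=4 prototype filter over its full duration of 4T. The frequency-domain coefficients are the values commonly quoted for the PHYDYAS design and are an assumption here, as are the function name `phydyas_filter`, the unit-energy normalization, and the sampling grid; the exact expression used in the paper is the one in [22].

```python
import numpy as np

# Coefficients commonly quoted for the PHYDYAS prototype filter with K = 4
# (assumed here; not taken from this paper).
PHYDYAS_K4 = (1.0, 0.971960, np.sqrt(2) / 2, 0.235147)


def phydyas_filter(samples_per_symbol, coeffs=PHYDYAS_K4):
    """Return the sampled impulse response h[n] of length K * samples_per_symbol."""
    K = len(coeffs)
    length = K * samples_per_symbol          # the filter spans K symbol periods
    n = np.arange(length)
    h = np.full(length, coeffs[0])
    for k in range(1, K):
        h += 2.0 * (-1) ** k * coeffs[k] * np.cos(2.0 * np.pi * k * n / length)
    return h / np.sqrt(np.sum(h ** 2))       # normalize to unit energy


h = phydyas_filter(samples_per_symbol=64)
print(len(h))  # 256 samples, i.e. 4 symbol periods of 64 samples each
```

Because the response is four symbol periods long while a new symbol starts every T (and a new real symbol every T/2), several consecutive symbols contribute to the signal at any instant.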
PAPR
For a continuous-time baseband FBMC-OQAM signal x(t) that is transmitted during a symbol period T, the PAPR is defined by
The complementary cumulative distribution function (CCDF) of the PAPR of a signal quantifies how frequently the PAPR exceeds a given threshold value γ, and it is defined as Pr{PAPR_{ x[n]}≥γ}.
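The two quantities above can be estimated numerically from sampled signals. The sketch below assumes the usual definition of PAPR as peak instantaneous power divided by mean power, expressed in dB; the helper names `papr_db` and `ccdf` are illustrative.

```python
import numpy as np


def papr_db(x):
    """PAPR of a sampled signal: peak power over mean power, in dB."""
    power = np.abs(np.asarray(x)) ** 2
    return 10.0 * np.log10(power.max() / power.mean())


def ccdf(papr_values_db, thresholds_db):
    """Empirical Pr{PAPR >= gamma} for each threshold gamma (in dB)."""
    papr_values_db = np.asarray(papr_values_db)
    return np.array([(papr_values_db >= g).mean() for g in thresholds_db])


# A constant-envelope signal has a PAPR of 0 dB.
print(papr_db([1.0, 1.0, 1.0, 1.0]))
```

In a simulation, `papr_db` would be evaluated once per transmitted block and the resulting values passed to `ccdf` to draw curves such as those discussed in the simulation results.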
Selected mapping for OFDM signals
SLM was introduced in [7], where we generate U complex phase rotation vectors ϕ ^{(u)}, for 0≤u≤U−1, of length N as:
where \(\phi _{k}^{(u)}\) is the kth element of ϕ ^{(u)} defined as
The frequency-domain input symbols X with N tones are phase rotated by U phase rotation vectors of size N as given below
where ⊙ denotes the carrier-wise point-to-point multiplication. By applying the IFFT operation, we obtain the U time-domain signal patterns {x ^{(0)}(t),x ^{(1)}(t),…,x ^{(U−1)}(t)}. The target of the optimization problem is to identify the signal \(x^{(u_{\text {min}})}(t)\) that has the least PAPR so that
The index of the respective phase rotation vector, u _{min}, is sent to the receiver as side information (SI), comprising \(\log _{2}U\) bits. If the SI is error-protected, then the BER of SLM is the same as that of the original OFDM.
Recently, some symbol-by-symbol-based schemes have been proposed for FBMC-OQAM, such as OSLM [16] and DSLM [17]. The suboptimality of any symbol-by-symbol approach is dealt with in [19], where it has been shown that whatever improvement has been achieved for one symbol can be hampered by its immediate next symbol.
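The classical SLM procedure described above can be sketched for a single OFDM symbol as follows. This is a minimal illustration, not the paper's simulation code: the phase elements are drawn from {1,−1}, candidate 0 is kept unrotated (an assumption made so the result is never worse than the original symbol), and the oversampling by zero-padding and all function names are illustrative choices.

```python
import numpy as np


def papr_db(x):
    power = np.abs(x) ** 2
    return 10.0 * np.log10(power.max() / power.mean())


def oversampled_ifft(X, oversampling=4):
    """IFFT with zeros inserted in the middle of the spectrum (oversampling)."""
    X = np.asarray(X, dtype=complex)
    n_tones = len(X)
    half = n_tones // 2
    zeros = np.zeros((oversampling - 1) * n_tones, dtype=complex)
    return np.fft.ifft(np.concatenate([X[:half], zeros, X[half:]]))


def slm_ofdm(X, U, rng, oversampling=4):
    """Return (u_min, x_min, papr_min_db, phases) for one OFDM symbol X."""
    X = np.asarray(X, dtype=complex)
    phases = rng.choice([1.0, -1.0], size=(U, len(X)))
    phases[0, :] = 1.0                       # assumption: candidate 0 is unrotated
    candidates = [oversampled_ifft(X * phases[u], oversampling) for u in range(U)]
    paprs = [papr_db(x) for x in candidates]
    u_min = int(np.argmin(paprs))            # index sent as side information
    return u_min, candidates[u_min], paprs[u_min], phases


rng = np.random.default_rng(0)
X = (rng.choice([-1.0, 1.0], 64) + 1j * rng.choice([-1.0, 1.0], 64)) / np.sqrt(2)
u_min, x_min, papr_min, _ = slm_ofdm(X, U=4, rng=rng)
print(u_min, round(papr_min, 2))
```

The receiver only needs `u_min` and the shared table of phase vectors to undo the rotation, which is why the side information costs \(\log _{2}U\) bits per symbol.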
Exhaustive search
In order to achieve the optimal performance in PAPR reduction, one needs to consider all the possible U phase rotations for all M symbols and pick the best one out of the U ^{M} different combinations. In practice, this exhaustive search is infeasible, since its complexity grows exponentially with M. To deal with a similar problem in the case of PTS, a trellis-based PTS scheme with multi-block joint optimization (MBJO) has been introduced in [18]. Nevertheless, for small values of U and M, simulation results will be presented in order to quantify the gap between the proposed method, TSLM, and the optimal exhaustive search.
Overview on trellis-based approach and TSLM algorithm
In order to circumvent the high computational complexity of exhaustive search, we opt for dynamic programming, which can substantially reduce the number of paths one needs to examine [23]. At any transition between two stages, we have U ^{2} paths to compare, and for M FBMC-OQAM symbols, we have M−1 transitions in total. Therefore, the TSLM scheme needs to search only U ^{2}(M−1) paths, because certain paths are eliminated by evaluating them against a metric. If we have to transmit M input symbol vectors {X _{0},X _{1},…,X _{ M−1}}, then we need to find Θ, which is the optimal set of M phase rotation vectors that gives the best PAPR
where \(\left \{u_{\text {min}}^{0}, u_{\text {min}}^{1},\ldots,u_{\text {min}}^{M-1}\right \}\) are the indices of the optimal phase rotation vectors for the M input symbol vectors, which are to be sent to the receiver as SI. With M FBMC-OQAM symbols and U phase rotation vectors, we need to find the best path in the trellis of Fig. 2 that gives the lowest PAPR. Choosing an optimal path in the trellis means finding the multiplicative vectors by solving (6), with the help of a trellis diagram.
For 0≤m≤M−1, every mth FBMC-OQAM symbol x _{ m }(t), obtained from modulation of input symbol vector X _{ m }, is represented as the mth stage in the trellis at time instant mT. At each stage, there are U different states, representing the rotated FBMC-OQAM symbols. Among these states, the ith trellis state indicates rotation by phase vector ϕ ^{(i)}. Between every two stages, there exist U ^{2} possible paths. The joint FBMC-OQAM modulation of the mth and (m+1)th rotated input symbol vectors \(\mathbf {X}_{m}^{(u)}\) and \(\mathbf {X}_{m+1}^{(v)}\), respectively, is represented in the trellis by the path \(\zeta ^{(u,v)}_{(m\Rightarrow m+1)}\) between the uth state in the mth stage and the vth state in the (m+1)th stage, where ⇒ represents a transition between two successive stages.
The partial PAPR calculated between two stages with multiple states serves as the basis of the path metric, which helps identify the U optimal paths that arrive at successive stages. Unlike a full PAPR, a partial PAPR of a signal x(t) is computed over a particular time interval T _{0}. For the path \(\zeta ^{(u,v)}_{(m\Rightarrow m+1)}\), its path metric \(\Gamma ^{(u,v)}_{({m},{m}+1)}\) can be written as
where f(.) is any convex function and \(\text {PPAPR}^{(u,v)}_{{m},{m}+1}\) is the partial PAPR of the path \(\zeta ^{(u,v)}_{(m\Rightarrow m+1)}\), which is to be computed over duration T _{0} as
where T _{0}=[m T+T _{ a },m T+T _{ b }) is an arbitrary interval within [m T,m T+4.5T). It has to be noted that T _{ a }≥0 and T _{ b }<4.5T. Similarly, we define a state metric Ψ _{(u,m)} at the mth stage as a measure of the cumulative path metrics of the optimal paths that arrived at this state from previous stages through various transitions. It can be evaluated simply by adding the path metric \(\Gamma ^{(w,u)}_{(m-1\Rightarrow m)}\) of the arriving optimal path \(\zeta ^{(w,u)}_{(m-1\Rightarrow m)}\) from the wth state of the previous (m−1)th stage to the state metric Ψ _{(w,m−1)} of the wth state from which this optimal path departs.
The whole optimization problem in this regard can be viewed as a continuum of overlapping optimization subproblems, i.e., finding an FBMC-OQAM signal with the least PAPR is equivalent to obtaining the accumulation of the least peaks. This is reflected in the state metric of a given state at any stage.
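The path metric described above can be illustrated on a sampled signal as follows. This is a sketch under stated assumptions: the partial PAPR is taken as the peak power inside the sample window corresponding to T _{0} divided by a mean power supplied by the caller, and f defaults to the exponential function; the names `partial_papr` and `path_metric` are illustrative.

```python
import numpy as np


def partial_papr(x, start, stop, mean_power):
    """Peak power of x[start:stop] relative to a reference mean power (linear scale)."""
    window_power = np.abs(np.asarray(x)[start:stop]) ** 2
    return window_power.max() / mean_power


def path_metric(ppapr, f=np.exp):
    """Path metric Gamma = f(PPAPR); f is assumed convex (exponential by default)."""
    return f(ppapr)


x = np.array([1.0, 1.0, 3.0, 1.0])
print(partial_papr(x, 0, 2, mean_power=1.0))   # window without the peak
print(partial_papr(x, 2, 4, mean_power=1.0))   # window containing the peak
```

A linear metric corresponds to passing `f=lambda p: p`; an exponential metric penalizes large peaks more strongly when the metrics are accumulated into state metrics.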
TSLM algorithm
In the TSLM, every two symbols are rotated with different phase rotation vectors that are i.i.d. and are FBMC-OQAM modulated. The two optimal states between the two successive stages are chosen among the others, based on the least PAPR criterion computed over a given time interval T _{0}. The TSLM algorithm involves the following steps:

Step 1—Initialization: Firstly, we generate M complex input symbol vectors {X _{0},X _{1},…,X _{ M−1}} and U phase rotation vectors {ϕ ^{(0)},ϕ ^{(1)},…,ϕ ^{(U−1)}} of length N as per (3). We initialize the counter m and the state metrics for all states of the first stage as below.
$$ m=0, $$ (10)
$$ \Psi_{(u,0)}=0,~u=0,\ldots,U-1. $$ (11)
As long as the condition 0≤m≤M−2 is satisfied, we perform steps 2, 3, 4, 5, and 6 in a repeated manner.

Step 2—Phase rotation: Two input symbol vectors X _{ m },X _{ m+1} are phase rotated with U different phase rotation vectors, as per (5), giving \(\left \{\mathbf {X}_{m}^{(0)},\mathbf {X}_{m}^{(1)},\ldots,\mathbf {X}_{m}^{(U-1)}\right \}\) and \(\left \{\mathbf {X}_{m+1}^{(0)},\mathbf {X}_{m+1}^{(1)},\ldots,\mathbf {X}_{m+1}^{(U-1)}\right \}\), respectively.

Step 3—FBMC-OQAM modulation: For 0≤u,v≤U−1, FBMC-OQAM modulation is done jointly for all combinations of the patterns of the mth and (m+1)th input symbols, along with the preceding symbols, such that
$$ x_{m,m+1}^{(u,v)}(t)=\mathcal G\left\{ \ldots,\mathbf{X}_{m-2}^{\left(\boldsymbol{\lambda}\left(\boldsymbol{\lambda}(u,m-1),m-2\right)\right)}, \mathbf{X}_{m-1}^{\left(\boldsymbol{\lambda}(u,m-1)\right)},\mathbf{X}_{m}^{(u)}, \mathbf{X}_{m+1}^{(v)}\right\}, $$ (12)
where λ(u,m−1) is the surviving phase rotation at the uth state of stage m.

Step 4—Path metric calculation: For each of the U ^{2} patterns of the modulated FBMC-OQAM signal \(x_{m,m+1}^{(u,v)}(t)\), we compute the partial PAPR as per Eq. (9). For the path \(\zeta ^{(u,v)}_{(m\Rightarrow m+1)}\), we calculate its path metric \(\Gamma ^{(u,v)}_{({m},{m}+1)}\) according to (8).

Step 5—Survivor path identification: The states of stage m that are related to the survivor paths leading to stage m+1 are stored in a state matrix λ(v,m) of order U×M, as given below
$$ \boldsymbol{\lambda}(v,m)=\underset{u\in [0,U-1]}{\arg\min}\left[ \Psi_{(u,m)}+\Gamma^{(u,v)}_{(m,m+1)} \right],~v=0,\ldots,U-1. $$ (13)
Step 6—State metric update: The state metric Ψ _{(v,m+1)}, for the stage m+1, can be updated as follows:
$$ \Psi_{(v,m+1)}=\Psi_{\left(\boldsymbol{\lambda}(v,m),m\right)}+\Gamma^{\left(\boldsymbol{\lambda}(v,m),v\right)}_{(m,m+1)},~v=0,\ldots,U-1. $$ (14)
Step 7—Incrementation: Increment the value of m by 1. If m≤M−2, go to step 2; otherwise (m=M−1), go to step 8.

Step 8—Traceback: Once the state metrics for all M stages have been computed, identify the state that has the least state metric, as shown below
$$ \boldsymbol{\Theta}(M-1)=\underset{u\in [0,U-1]}{\arg\min}\left[\Psi_{(u,M-1)}\right]. $$ (15)
Then, start tracing back from the last stage to the first one in order to find the unique survivor path Θ by identifying the optimal states at each stage as below
$$ \boldsymbol{\Theta}(k)=\boldsymbol{\lambda}(\boldsymbol{\Theta}(k+1),k), $$ (16)
where k=M−2,M−3,…,1,0. This survivor path Θ is the set of optimal phase rotation vectors obtained after solving the optimization problem by dynamic programming, and its indices \(\{u_{\text {min}}^{0}, u_{\text {min}}^{1},\ldots,u_{\text {min}}^{M-1}\}\) are to be transmitted to the receiver as SI.
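The bookkeeping of steps 5 to 8 can be sketched as a small dynamic program. The sketch makes one important simplification, stated here as an assumption: the path metrics are supplied as a fixed table `gamma[m, u, v]`, whereas in TSLM each metric is computed on the fly from the joint modulation in step 3 and therefore depends on the survivors of earlier stages. With a fixed table the trellis search is exactly optimal, which the brute-force helper confirms on a small instance; the function names are illustrative.

```python
import itertools
import numpy as np


def trellis_search(gamma):
    """gamma[m, u, v]: metric of the path from state u (stage m) to state v (stage m+1)."""
    gamma = np.asarray(gamma, dtype=float)
    n_transitions, U, _ = gamma.shape
    M = n_transitions + 1
    psi = np.zeros(U)                            # initial state metrics, as in (11)
    lam = np.zeros((U, n_transitions), dtype=int)
    for m in range(n_transitions):
        total = psi[:, None] + gamma[m]          # total[u, v]
        lam[:, m] = total.argmin(axis=0)         # survivor into each state v, as in (13)
        psi = total.min(axis=0)                  # state metric update, as in (14)
    theta = np.zeros(M, dtype=int)
    theta[M - 1] = psi.argmin()                  # best final state, as in (15)
    for k in range(M - 2, -1, -1):
        theta[k] = lam[theta[k + 1], k]          # traceback, as in (16)
    return theta, float(psi.min())


def exhaustive_search(gamma):
    """Brute force over all U**M state sequences (only feasible for tiny cases)."""
    gamma = np.asarray(gamma, dtype=float)
    n_transitions, U, _ = gamma.shape
    best_cost, best_path = np.inf, None
    for path in itertools.product(range(U), repeat=n_transitions + 1):
        cost = sum(gamma[m, path[m], path[m + 1]] for m in range(n_transitions))
        if cost < best_cost:
            best_cost, best_path = cost, path
    return np.array(best_path), float(best_cost)


rng = np.random.default_rng(0)
gamma = rng.random((5, 3, 3))                    # M = 6 stages, U = 3 states
theta, cost = trellis_search(gamma)
print(theta, np.isclose(cost, exhaustive_search(gamma)[1]))
```

The survivor table here has one column per transition; it plays the role of the state matrix λ of order U×M in the text.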
Proposed low-latency TSLM in terms of hardware memory and latency
When we consider implementation complexity, we need to take two things into account: computational complexity and hardware memory. The former is dealt with in our analysis in the next section. The originally proposed TSLM [19] needs a state matrix λ of order U×M, which means we need to store a total of MNU time-domain complex samples in memory before we start tracing back. This adds a latency of M stages and requires a very large hardware memory. A latency of M stages means that we have to trace back through M stages to identify the survivor paths. Hardware memory can significantly impact the implementation cost, and high latency is undesirable in some critical communication systems.
We have studied the impact of the traceback depth parameter ∂, which heavily impacts not only the PAPR reduction performance but also the latency and hardware memory requirements. It has to be noted that the choice of ∂ depends upon the prototype filter overlapping factor K. So, in this paper, we propose a low-latency TSLM that requires less hardware memory and also has lower latency than the originally proposed TSLM. In the new proposal, only the indices of the survivor paths, i.e., the indices of the optimal states, are stored, reducing the memory requirement to MU. When a new FBMC symbol pair (m,m+1) is processed (step 2 to step 6), we definitively freeze the rotation vector at stage m−∂. It is then possible to compute the modulated signal from (m−∂)T to (m−∂+1)T. Thus, we can progressively accumulate the modulated signal related to individual symbols, in order to obtain the total signal.
Later, in the simulation results, we shall show that for any value of ∂>K, the PAPR reduction performance of the low-latency TSLM is the same as that of the originally proposed TSLM. In our analysis, we have realized that there is a trade-off between latency and PAPR reduction performance. The PAPR reduction performance of low-latency TSLM varies from suboptimal to quasi-optimal, depending upon the choice of ∂. However, it has to be noted that both the original TSLM and the low-latency TSLM have the same computational complexity.
Computational complexity analysis of trellis-based probabilistic schemes
This section aims at a fair comparison of the PAPR reduction performances of the TSLM and MBJO-PTS [18] schemes in terms of computational complexity. A fair comparison of any PTS and SLM scheme is not possible if the two schemes do not exhibit the same computational complexity [24]. The complexity analysis in this paper includes both complex multiplications and additions. The following considerations hold generally for any SLM and PTS schemes that are applied in FBMC-OQAM systems. However, in the performance comparison between the two schemes, only the complex multiplications are considered, since they dominate the overall complexity in common hardware implementations [25]. We give general expressions for the computational complexity, so that they can be readily derived for any given probabilistic scheme.
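The effect of a finite traceback depth can be sketched with a sliding window of survivor indices. This is an illustration under stated assumptions, not the paper's implementation: the path metrics come from a fixed table `gamma[m, u, v]` instead of being recomputed from the jointly modulated signal, the decision for a stage is frozen `depth` transitions after that stage (the exact offset is an illustrative choice), and the names `low_latency_search` and `path_cost` are hypothetical. Only the last `depth` columns of survivor indices are kept in memory.

```python
from collections import deque

import numpy as np


def path_cost(gamma, theta):
    """Accumulated metric of the state sequence theta under the table gamma."""
    return float(sum(gamma[m, theta[m], theta[m + 1]] for m in range(len(theta) - 1)))


def low_latency_search(gamma, depth):
    """Trellis search that freezes each decision after `depth` further transitions."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    gamma = np.asarray(gamma, dtype=float)
    n_transitions, U, _ = gamma.shape
    M = n_transitions + 1
    psi = np.zeros(U)
    window = deque(maxlen=depth)                 # survivor columns of the last `depth` transitions
    theta = np.zeros(M, dtype=int)
    for m in range(n_transitions):
        total = psi[:, None] + gamma[m]
        window.append(total.argmin(axis=0))
        psi = total.min(axis=0)
        if len(window) == depth:
            state = int(psi.argmin())            # currently best state at stage m + 1
            for column in reversed(window):
                state = int(column[state])       # walk back `depth` stages
            theta[m + 1 - depth] = state         # freeze this decision
    # Flush: decide the remaining stages from the best final state.
    state = int(psi.argmin())
    k = M - 1
    theta[k] = state
    for column in reversed(window):
        k -= 1
        state = int(column[state])
        theta[k] = state
    return theta


rng = np.random.default_rng(1)
gamma = rng.random((7, 3, 3))
for depth in (1, 2, 3, 100):
    print(depth, round(path_cost(gamma, low_latency_search(gamma, depth)), 3))
```

When `depth` is at least the number of transitions, nothing is frozen early and the result coincides with a full traceback; smaller depths trade some optimality for a bounded decision delay and a survivor memory of at most `depth` columns of U indices.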
Derivation of computational complexity in TSLM for multiplications
In any SLM-based scheme, the complexity in implementation will be due to phase rotation, FBMC-OQAM modulation, and metric calculation. Let us denote the complexity related to these three operations in the TSLM scheme as c _{rot}, c _{mod}, and c _{met}, respectively. The phase rotation of the mth input symbol vector X _{ m } needs complex multiplications equal to the number of its tones N. So, the complexity c _{rot} is given as
In the poly-phase filtering operation, we perform IFFT and filtering with h(t). OQAM mapping involves complex-to-real symbol mapping. It may seem that one has to perform two real IFFT operations. Nevertheless, it is possible to compute two real IFFTs simultaneously as a single complex IFFT operation without increasing the number of complex multiplications [26]. The same can be applied in filtering with h(t). Thus, the computational complexity involved in FBMC-OQAM modulation c _{mod} is given as
In the metric calculation operation, we need N complex multiplications to find the peak. The c _{met} depends on T _{0} and is given as
where T _{0} is the duration of time in terms of N and d is a constant that represents the number of successive symbol intervals, considered for metric calculation.
The computational complexity in the case of TSLM is summarized in Table 1, and its general expression is given below
Derivation of computational complexity in MBJO-PTS for multiplications
In the PTS scheme, we individually perform phase rotation in the time domain on the V sub-blocks and then add them, leading to W ^{V} different signal patterns, where W is the total number of candidate phases from which one is chosen for a sub-block. The MBJO-PTS scheme is a trellis-based adaptation of the classical PTS scheme to FBMC-OQAM systems by multi-block joint optimization and is presented in [18]. Unlike SLM, in any PTS-based scheme, we can perform phase rotation in the time domain. This avoids the need for multiple FBMC-OQAM modulation operations. Thus, the complexity due to FBMC-OQAM modulation in a PTS-based scheme \(\hat {c}_{\text {mod}}\) can be reduced to that of an \(\frac {N}{V}\)-point IFFT.
Since we consider a certain time duration T _{0} for partial PAPR calculation, we need dN complex multiplications within that time duration. The computational complexity involved in the phase rotation operation for the MBJO-PTS scheme \(\hat {c}_{\text {rot}}\) is given by
The computational complexity involved in metric calculation for the MBJO-PTS scheme \(\hat {c}_{\text {met}}\) is given by
The general expression for the MBJO-PTS computational complexity for M FBMC-OQAM symbols has been derived similarly, based on the information in Table 2
From (20) and (25), it is clear that, in FBMC-OQAM with TSLM and MBJO-PTS, the complexities involved in rotation and metric calculation are linear w.r.t. N, whereas the modulation complexities with TSLM and MBJO-PTS are of order \(\mathcal {O}(\frac {N}{2}\log _{2}(N))\) and \(\mathcal {O}(\frac {N}{2V}\log _{2}(\frac {N}{V}))\), respectively. This implies that the modulation operation has a much more significant complexity than the remaining ones. From the point of view of the number of phase rotations, the complexity is solely dominated by U in TSLM. On the contrary, it is distributed between V and W in MBJO-PTS.
Condition for identical computational complexity
To enable a fair comparison, the condition for identical computational complexity in both the TSLM and MBJO-PTS schemes is given by
By substituting (20) and (25) in (27), we obtain
For a large value of M, the term \(1-\frac {1}{M}\to 1\), and therefore, it can be neglected. Eq. (28) then simplifies to
The admissible root U of the quadratic function (29) in the ideal case, denoted by U _{root}, is given by
where Δ is the discriminant, which is given by
Derivation of addition computational complexity in TSLM and MBJO-PTS
The computational complexity due to complex additions for M FBMC-OQAM symbols, in the TSLM and MBJO-PTS schemes, is summarized in Table 3. The expressions for computational complexity due to complex additions for TSLM and MBJO-PTS can be derived in a similar fashion to those for complex multiplications. However, in the case of MBJO-PTS, we need to take into account the extra V additions needed per symbol, due to sub-block re-addition.
Simulation results
The objective of the simulations is to analyze the performance of the low-latency TSLM scheme in comparison with OFDM when the classical SLM scheme is used. Simulations are done for an FBMC-OQAM signal that has been generated from 10^{5} 4-QAM symbols with 64 tones. The PHYDYAS prototype filter [22], which spans 4T, was used by default unless specified otherwise. The elements of the complex phase rotation vector were chosen such that ϕ ^{(u)}∈{1,−1}. In general, most of the PAPR reduction schemes are implemented over discrete-time signals. So, we need to sample the continuous-time FBMC-OQAM signal x(t), thereby obtaining its discrete-time signal s[n]. In order to approximate the PAPR well, we have oversampled the modulated signal by a factor of 4 [27] and then implemented the TSLM scheme on the discrete-time signal s[n]. The exponential function has been used as the function f in (8) when calculating the path metrics. We have tried to see the impact of a higher-order constellation on PAPR reduction with TSLM but found 16-QAM to be more or less the same as 4-QAM.
Impact of variation of T _{0} duration
When step 3 of the TSLM algorithm is performed, we are interested in the PAPR related to the mth and (m+1)th input symbols over the duration T _{0}=[m T+T _{ a },m T+T _{ b }). Looking at Fig. 1, we can notice that these two symbols have an impact on the overall signal mainly in the interval [m T+T,m T+4T] (i.e., T _{ a } _{min}=T and T _{ b } _{max}=4T). As shown in Fig. 3, choosing T _{ a }=2T and T _{ b }=4T seems to be the lower bound, as it yields better performance than the remaining intervals. If we choose intervals with T _{ a }>2T or T _{ b }<3T, then there is a significant degradation in performance. In conclusion, it was found that T _{ a }=2T and T _{ b }=4T are a quasi-optimal choice that also has a lower complexity.
Comparison of TSLM and exhaustive search approach
In an exhaustive search over M symbols, all U ^{M} possible phase rotations are tested and the best one is chosen. With the trellis-based approach, only U ^{2} possible phase rotations are tested in step 3 of the TSLM algorithm, and U of them are kept as surviving paths. By avoiding exhaustive search, we sacrifice optimality in trellis-based approaches. Thus, any trellis-based approach lags behind the exhaustive search approach.
So, we analyzed how well the TSLM fares in terms of PAPR reduction w.r.t. the exhaustive search approach. Since it is not possible to simulate an exhaustive search over 10^{5} symbols, we have considered 10 symbols with U=2 and performed 10,000 Monte Carlo runs, which we found to be sufficient. This means that, in each run, we search over 2^{10}=1024 different patterns and pick the one with the least PAPR. In Fig. 4, we have plotted the CCDF of PAPR for TSLM and exhaustive search. We can notice from this figure that TSLM is indeed a quasi-optimal approach, because we lose a mere 0.65 dB at 10^{−3} of the CCDF of PAPR while reducing the computational complexity from \(\mathcal {O}(U^{M})\) to \(\mathcal {O}\left ((M-1)U^{2}\right)\).
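To make the size of this reduction concrete, the short sketch below counts the candidates for the setting used in this comparison (U=2, M=10). The function names are illustrative; the two formulas are the U ^{M} and U ^{2}(M−1) counts given in the text.

```python
def exhaustive_candidates(U, M):
    """Number of rotation patterns tested by an exhaustive search."""
    return U ** M


def trellis_paths(U, M):
    """Number of paths compared by the trellis search: U^2 per transition."""
    return U ** 2 * (M - 1)


print(exhaustive_candidates(2, 10), trellis_paths(2, 10))  # 1024 36
```

The gap widens quickly: the exhaustive count grows exponentially with M, whereas the trellis count grows only linearly.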
Impact of the size of U
As in any SLM scheme, the size of U, i.e., the number of phase rotation vectors, impacts the PAPR reduction performance. With OFDM, we have only U possible phase rotations for PAPR reduction in the time interval T because we have a symbol-by-symbol approach, whereas with FBMC-OQAM, we have U ^{M} possible phase rotations for reducing the PAPR in the time interval (M+3.5)T. The ratio of the number of possible phase rotations to the impacted time interval is always better for FBMC-OQAM, which explains why the trellis-based approach can outperform OFDM for the same number of phase rotation vectors U.
To illustrate the impact of U, we have considered T _{0}=[m T+2T,m T+4T). The different sizes considered are U={2,4,8}. The values at 10^{−3} of the CCDF of PAPR in Fig. 5 have been summarized in Table 4. We can see from this table that FBMC-OQAM with TSLM outperforms OFDM with classical SLM by 0.35, 0.24, and 0.02 dB at the 10 ^{−3} value of the CCDF of PAPR when U=2, U=4, and U=8, respectively. It is worth noting that at the 10 ^{−3} value of the CCDF of PAPR when U=2, we are able to achieve a 1.73-dB PAPR reduction from the original signal with an SI of 1 bit. Such exploitation is possible with the trellis-based approach instead of symbol-by-symbol optimization. Another observation is that the gap between the CCDF curves of OFDM and FBMC-OQAM narrows as U increases.
Impact of traceback depth ∂ on latency and hardware memory
Even though TSLM is quasi-optimal, as mentioned earlier, it is important to take into account the hardware memory and latency induced by this algorithm. We can observe in Fig. 1 that most of the energy of an FBMC-OQAM symbol lies in its succeeding two symbols rather than in its own period interval. This is due to the fact that the prototype filter overlapping factor is K=4, and it is the reason for considering the cases ∂={1,2,3}. For example, ∂=2 means that, at any mth stage, we have to trace back to the (m−2)th stage in order to identify survivor paths. If we do not alter ∂, then we have to wait for the processing of all M symbols (10^{5} in our simulation). For these values along with ∂=10^{5}, we have plotted the CCDF of PAPR in Fig. 6. In the legend of that figure, “∂=10^{5}” indicates the original TSLM and “∂={1,2,3}” indicate low-latency TSLM with different traceback depths.
The case of ∂=1 may seem like that of DSLM [17], but it is different. In the case of DSLM, the choice of the optimal rotation of a given mth input symbol vector X _{ m } depends only on the past input symbol vectors X _{ m−1},…,X _{0}, whose optimal rotations have already been fixed, whereas for low-latency TSLM with ∂=1, at the mth stage, it depends not only on past input symbol vectors but also on one succeeding future input symbol vector X _{ m+1}, as we perform joint modulation in step 3 of the TSLM algorithm. So, when we move to the next (m+1)th stage in the trellis, the optimal choice (i.e., the survivor path) may change, and this would have affected the decision in the previous stage. The choice at the mth stage then has to bear with the incorrect decision, and this in turn impacts the PAPR reduction. Also, the possibility of an incorrect decision increases with U, leading to a more suboptimal performance for higher values of U. As seen in Fig. 6, the PAPR reduction performance of low-latency TSLM with ∂=1 lags the TSLM with ∂=10^{5} by around 0.8 dB at the 10 ^{−3} value of the CCDF of PAPR.
However, for ∂=2, this gap is closed to a large extent. Even though its performance is suboptimal, it is worth noting that low-latency TSLM with ∂=2 lags the TSLM with ∂=10^{5} by only around 0.37 dB at the 10 ^{−3} value of the CCDF of PAPR. Finally, we have observed that low-latency TSLM with ∂=3 reaches the quasi-optimal performance of TSLM. In this case, the latency is substantially reduced from 10^{5} stages to 3 stages, and we need far less hardware memory, since we store just 2NU complex time samples instead of 10^{5}NU. The latency and the number of complex time samples that need to be stored for different values of ∂ have been summarized in Table 5, where we can see the trade-off between latency and PAPR reduction performance. If there is a constraint on latency or hardware memory, then a low-latency TSLM with ∂={2,3} can be considered, which have tolerable suboptimal and quasi-optimal performances, respectively.
Impact of choice of the metric function
The choice of the metric function f(.) in Eq. (8) seems to have some impact on the performance in terms of PAPR mitigation. Two different functions, namely, linear and exponential functions, have been chosen to understand the impact of the choice of the metric function f(.) on the performance of the TSLM scheme. As shown in Fig. 7, for low values of the PAPR, the PAPR reduction performance with the exponential function is almost the same as that with the linear one, albeit lagging minutely. A very small performance gain can be seen at high values of the PAPR. This can be explained by the fact that the exponential function gives more weight to higher peaks than the linear one in identifying the set of optimal phase rotation vectors. Although we use the exponential function in all our simulations, we suggest that a linear metric function can be sufficient.
Comparison of TSLM with existing probabilistic schemes
Among the SLM-based schemes, TSLM has already been compared with DSLM in [19], where it has been shown to be superior to any scheme based on the symbol-by-symbol approach. DSLM in turn outperforms OSLM, as shown in [28]. MBJO-PTS is a trellis-based scheme that yields quasi-optimal performance among the PTS schemes. A fair comparison of a PTS scheme and an SLM scheme is not possible unless both schemes exhibit the same computational complexity [24]. So, we compare the computational complexity, in terms of multiplications, of MBJO-PTS with that of TSLM, keeping the number of tones, the type of modulation, the prototype filter, and the T _{0} duration identical. The value of W is 2, as per the proposed MBJO-PTS scheme [18]. The values of U _{root} calculated for V={2,4} according to (30) are found to be 3 and 14, respectively. The performance of MBJO-PTS for V={2,4} and W=2 is compared with that of the TSLM scheme for the corresponding values U={3,14} in Fig. 8. The number of complex multiplications and additions needed to implement the TSLM and MBJO-PTS algorithms over 10^{5} FBMC-OQAM symbols is summarized in Table 6.
At a CCDF of PAPR equal to 10^{−3} in Fig. 8, FBMC-OQAM with TSLM leads the MBJO-PTS scheme in PAPR reduction by roughly 0.7 and 0.2 dB for U=3 and U=14, respectively.
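A gap such as the one quoted above is obtained by comparing the PAPR thresholds at which the empirical CCDFs of two schemes reach the same level. The sketch below shows one way to compute this. The function names are hypothetical, and the PAPR values are made up for illustration and are not simulation results.

```python
def empirical_ccdf(papr_db, threshold_db):
    """Fraction of symbols whose PAPR (in dB) exceeds threshold_db."""
    return sum(1 for v in papr_db if v > threshold_db) / len(papr_db)


def papr_at_ccdf(papr_db, level):
    """Smallest observed PAPR value whose empirical CCDF is at most `level`."""
    ordered = sorted(papr_db)
    n = len(ordered)
    # Number of samples allowed to lie strictly above the returned value.
    allowed = min(int(level * n), n - 1)
    return ordered[n - 1 - allowed]


# Hypothetical per-symbol PAPR values in dB (illustration only).
scheme_a = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
scheme_b = [v - 0.5 for v in scheme_a]

level = 0.25  # a real comparison would use 1e-3 with many more symbols
gap_db = papr_at_ccdf(scheme_a, level) - papr_at_ccdf(scheme_b, level)
```

A level of 10^{−3} only becomes meaningful when the number of simulated symbols is much larger than 10^{3}, which is consistent with the 10^{5} symbols used in the comparison.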
We assume that one complex multiplication costs as much as three complex additions. With this equivalence, the relative reduction in computational complexity of TSLM w.r.t. MBJO-PTS can be computed from Table 6. We find that the proposed TSLM method with U={3,14} reduces the overall complexity, in terms of complex additions, by 19.65 and 23.42% compared with the MBJO-PTS method with V={2,4} and W=2, respectively.
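The arithmetic behind such a relative reduction can be sketched as follows, using the equivalence of one complex multiplication to three complex additions given in the text. The operation counts below are hypothetical placeholders and are not the values of Table 6. The function names are likewise assumptions.

```python
ADDS_PER_COMPLEX_MULT = 3  # equivalence stated in the text


def equivalent_additions(mults, adds):
    """Express a (multiplications, additions) count in complex additions only."""
    return ADDS_PER_COMPLEX_MULT * mults + adds


def relative_reduction_percent(cost_new, cost_ref):
    """Percentage by which cost_new is lower than cost_ref."""
    return 100.0 * (cost_ref - cost_new) / cost_ref


# Hypothetical operation counts (NOT taken from Table 6).
tslm_cost = equivalent_additions(mults=800, adds=1600)
mbjo_pts_cost = equivalent_additions(mults=1000, adds=2000)
saving = relative_reduction_percent(tslm_cost, mbjo_pts_cost)
```

With these placeholder counts, the two costs are 4000 and 5000 equivalent additions, so the saving evaluates to 20%.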
Conclusions
Since FBMC-OQAM signals have a high PAPR, suitable PAPR reduction schemes are needed. This paper extends the recently proposed TSLM. We have derived the computational complexity of the TSLM scheme and proposed a low-latency TSLM, which yields tolerably suboptimal or even the same performance as TSLM while having very low latency and needing less hardware memory. We have also studied the impact of the time duration of partial PAPR on the performance of TSLM and identified its lower bound by proposing a suitable time duration. A thorough and fair comparison of performance has been made with an existing trellis-based scheme proposed in the literature, and the simulation results show that low-latency TSLM yields better performance with relatively low latency.
References
 1
BF Boroujeny, OFDM Versus Filter Bank Multicarrier. IEEE Signal Proc. Mag. 8(3), 92–112 (2006).
 2
X Li, LJ Cimini, Effects of clipping and filtering on the performance of OFDM. IEEE Commun. Lett. 2(5), 131–133 (1998).
 3
TA Wilkinson, AE Jones, in 45th IEEE Veh. Technol. Conf., 2. Minimisation of the peak-to-mean envelope power ratio of multicarrier transmission schemes by block coding (Chicago, 1995), pp. 825–829.
 4
J Tellado, J Cioffi, in IEEE CTMC, GLOBECOM. Peak power reduction for multicarrier transmission (IEEE Publication, Sydney, 1998).
 5
H Ochiai, A novel trellis shaping design with both peak and average power reduction for OFDM systems. IEEE Trans. Commun. 52(11), 1916–1926 (2004).
 6
BS Krongold, DL Jones, PAR reduction in OFDM via active constellation extension. IEEE Trans. Broadcast. 49, 258–268 (2003).
 7
RW Bauml, RFH Fischer, JB Huber, Reducing the peak-to-average power ratio of multicarrier modulation by selected mapping. IEE Electron. Lett. 32(22), 2056–2057 (1996).
 8
SH Muller, JB Huber, OFDM with reduced peak-to-average power ratio by optimum combination of partial transmit sequences. IEE Electron. Lett. 33, 368–369 (1997).
 9
N van der Neut, B Maharaj, F de Lange, G Gonzalez, F Gregorio, J Cousseau, in EURASIP Journal on Advances in Signal Processing, 2014, no. 172. PAPR reduction in FBMC using an ACE-based linear programming optimization (Springer Publications, 2014).
 10
Z Kollar, P Horvath, in Hindawi Journal of Computer Networks and Communications, 2012, no. 382736. PAPR Reduction of FBMC by Clipping and its Iterative Compensation (Hindawi Publications, 2012).
 11
Z Kollar, L Varga, B Horvath, P Bakki, J Bito, in Hindawi Scientific World Journal, 2014, no. 841680. Evaluation of Clipping Based Iterative PAPR Reduction Techniques for FBMC Systems (Hindawi Publications, 2014).
 12
B Horvath, P Horvath, in IEEE European Wireless Conference. Establishing Lower Bounds on the Peak-to-Average-Power Ratio in Filter Bank Multicarrier Systems (Budapest, 2015), pp. 1–6.
 13
S Lu, D Qu, Y He, Sliding Window Tone Reservation Technique for the Peak-to-Average Power Ratio Reduction of FBMC-OQAM Signals. IEEE Wireless Commun. Lett. 1(4), 268–271 (2012).
 14
KC Bulusu, H Shaiek, D Roviras, in IEEE International Symposium on Wireless Communication Systems. Reduction of PAPR of FBMC-OQAM Signals by Dispersive Tone Reservation Technique (Brussels, 2015), pp. 561–565.
 15
G Cheng, H Li, B Dong, S Li, An improved selective mapping method for PAPR reduction in OFDM/OQAM system. Scientific Research Communications and Network Journal. 5(3C), 53–56 (2013).
 16
A Skrzypczak, P Siohan, JP Javaudin, in 63rd IEEE Veh. Technol. Conf., 4. Reduction of the peak-to-average power ratio for OFDM-OQAM modulation (Melbourne, 2006), pp. 2018–2022.
 17
KC Bulusu, H Shaiek, D Roviras, R Zayani, in 11th IEEE International Symposium on Wireless Communication Systems. PAPR Reduction for FBMC-OQAM Systems Using Dispersive SLM Technique (Barcelona, 2014), pp. 568–572.
 18
D Qu, S Lu, T Jiang, Multi-Block Joint Optimization for the Peak-to-Average Power Ratio Reduction of FBMC-OQAM Signal. IEEE Trans. Signal Process. 61(7), 1605–1613 (2013).
 19
KC Bulusu, H Shaiek, D Roviras, in IEEE International Conference on Communications. Potency of Trellis-Based SLM over the Symbol-by-Symbol Approach in Reducing PAPR for FBMC-OQAM Signals (London, 2015), pp. 4757–4762.
 20
P Siohan, C Siclet, N Lacaille, Analysis and design of OFDM/OQAM systems based on filter bank theory. IEEE Trans. Signal Process. 50, 1170–1183 (2002).
 21
BL Floch, M Alard, C Berrou, Coded orthogonal frequency division multiplex. IEEE Proc. 83(6), 982–996 (1995).
 22
M Bellanger, in IEEE International Conference on Acoustic, Speech and Signal Processing. Specification and design of prototype filter for filter bank based multicarrier transmission (Salt Lake City, 2001), pp. 2417–2420.
 23
R Bellman, Applied Dynamic Programming (Princeton University Press, New Jersey, 1962).
 24
C Siegl, RFH Fischer, in IEEE International ITG Workshop on Smart Antennas. Comparison of partial transmit sequences and selected mapping for peak-to-average power ratio reduction in MIMO OFDM (Darmstadt, 2008), pp. 324–331.
 25
A Burg, VLSI Circuits for MIMO Communication Systems, Ph.D. Thesis, ETH Zurich (2006).
 26
E Chu, A George, Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms, Computational Mathematics Series (CRC Press, Boca Raton, 1999).
 27
J Tellado, Peak to Average Ratio Reduction for Multicarrier Modulation, Ph.D. Thesis, Stanford University, Stanford, CA, USA (1999).
 28
KC Bulusu, Performance Analysis and PAPR Reduction Techniques for Filter-Bank based Multi-Carrier Systems with Non-Linear Power Amplifiers, Ph.D. Thesis, Conservatoire National des Arts et Métiers, Paris, France (2016).
Acknowledgements
The work done in this paper is financially supported by the French National Research Agency (ANR) project ACCENT5 with grant agreement code: ANR-14-CE28-0026-02.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Bulusu, S.K., Shaiek, H. & Roviras, D. Reducing the PAPR in FBMC-OQAM systems with low-latency trellis-based SLM technique. EURASIP J. Adv. Signal Process. 2016, 132 (2016) doi:10.1186/s13634-016-0429-9
Keywords
 5G
 Dynamic programming
 Computational complexity
 FBMC-OQAM
 PAPR
 SLM
 Trellis-based