An orthogonal wavelet division multiple-access processor architecture for LTE-advanced wireless/radio-over-fiber systems over heterogeneous networks
EURASIP Journal on Advances in Signal Processing volume 2014, Article number: 77 (2014)
Abstract
The increase in internet traffic, number of users, and availability of mobile devices poses a challenge to wireless technologies. In long-term evolution (LTE) advanced systems, heterogeneous networks (HetNet) using centralized coordinated multipoint (CoMP) transmission of radio over optical fibers (LTE A-ROF) have provided a feasible way of satisfying user demands. In this paper, an orthogonal wavelet division multiple-access (OWDMA) processor architecture is proposed, which is shown to be better suited to LTE advanced systems than the orthogonal frequency division multiple access (OFDMA) used in LTE systems as of 3GPP rel. 8 (3GPP, http://www.3gpp.org/DynaReport/36300.htm). ROF systems are a viable alternative to satisfy large data demands; hence, the performance in ROF systems is also evaluated. To validate the architecture, the circuit is designed and synthesized on a Xilinx Virtex-6 field-programmable gate array (FPGA). The synthesis results show that the circuit performs with a clock period as short as 7.036 ns (i.e., a maximum clock frequency of 142.13 MHz) for a transform size of 512. A pipelined version of the architecture reduces the power consumption by approximately 89%. We compare our architecture with similar available architectures for resource utilization and timing and provide a performance comparison with OFDMA systems for various quality metrics of communication systems. The OWDMA architecture is found to perform better than OFDMA in bit error rate (BER) versus signal-to-noise ratio (SNR) in the wireless channel as well as in ROF media. It also gives higher throughput and mitigates the adverse effect of the peak-to-average-power ratio (PAPR).
1 Introduction
The diversity of applications used over the internet has resulted in a demand for increased speed (data rate) over the network and a need to accommodate more users per unit area. This demand has urged research communities to provide greener and more cost-efficient networks. Several research studies have been conducted over the last decade, proposing cost-efficient broadband architectures. Today, next-generation long-term evolution (LTE) systems using radio signals over optical fibers are evolving towards centralized architectures as a promising solution to meet the ever-increasing demand for high-speed wireless connectivity. Centralized architectures, epitomized by micro base stations, femto- and pico-cell base-station/access-point architectures, and mesh networking solutions, promise several benefits, including reduced power consumption, enhanced radio spectrum utilization capacity, and diversity of next-generation wireless communication networks [1].
As radio spectrum is expensive and band-limited, centralized LTE advanced-radio over fiber (ROF) has attracted significant research interest. It focuses on the optimum construction and utilization of the hardware resources to cater to an area of high traffic. A typical design uses optical fiber to move analog or digitized radio-frequency (RF) signals between the central facility and the remote sites [2]. Choosing optical fiber over conventional coaxial cables enables the use of the enormous bandwidth provided by the fiber as well as almost error-free transmission over short ranges in a metro area network (MAN). Software-defined radio (SDR) provides an efficient, cost-effective, and easy-to-handle deployment architecture for the LTE A-ROF system. It follows a normal server/multi-client IT network and provides flexibility in architecture deployment. It also provides large savings in operational and infrastructure cost for service providers.
In current LTE and Wi-Fi systems, orthogonal frequency division multiple-access (OFDMA) is the technology of choice [3]. OFDMA uses the inverse fast Fourier transform (IFFT) at the transmitter and the fast Fourier transform (FFT) at the receiver and allocates fixed resources to users for a given set of operating parameters. Despite its several advantages, when coupled with other components of LTE A, the use of OFDMA increases the cost and utilization overhead of system resources. Moreover, it suffers from large implementation complexity, requires a fixed allocation of resources to all users regardless of the present traffic, and exhibits a high peak-to-average-power ratio (PAPR) [4].
Orthogonal wavelet division multiple access (OWDMA) has been proposed as a viable alternative to OFDMA in communication systems. Previous work concentrated on digital video broadcast, and results were only plotted for the BPSK modulation scheme [5, 6]. Raajan et al. [7] provided bit error rate (BER) performance graphs for all the wavelets and modulation schemes, but no hardware architecture was provided for the proposed system. Similarly, Tao et al. [8] and Liew et al. [9] analyzed orthogonal wavelet division multiplexing (OWDM) for signaling over wideband linear time-varying (LTV) channels but, again, did not provide any architecture for deployment. 1D orthogonal wavelets have also been used for image processing applications [10–14].
The paper is organized as follows. Section 3 provides a brief description of previous work, the definition of the wavelet transform, and the reasons for choosing the 9/7 Daubechies lifting scheme for evolving the architecture. Section 4 describes the proposed OWDMA processor architecture and explains the different building blocks. In Section 5, pipelining is introduced in the architecture to reduce power consumption. Section 6 presents the synthesis results and a comparison of resources and timing with other similar architectures. Section 7 gives the performance comparison for OWDMA and OFDMA, based on quality of service (QoS) metrics for LTE A-ROF systems. Finally, conclusions are offered in Section 8, followed by references.
2 Key contributions
A sequential output-based parallel processing (SBPP) architecture for OWDM was proposed and evaluated for BER and PAPR [15]. Its deployment in future LTE A systems (3GPP rel. 10 and above) requires that its structure be flexible enough to adapt, according to channel conditions, to different values of transform size in order to serve the same number of users uniformly. The structure needs to accommodate both forward and inverse operations through a common control input. The architecture should be power efficient and easily controllable through a single control, and it should have input-output ports matching those of the other system sub-blocks so that the timing requirements of the whole system are satisfied. Moreover, it is important for it to offer improved performance in terms of spectral efficiency (throughput) and quality of service (better BER at the same signal-to-noise ratio (SNR)), and it should fit well in radio-over-fiber systems. In this paper, an OWDMA architecture is developed that has significantly better performance, is easy to deploy, and consumes fewer resources than similar architectures available in the literature.
Analyzing the approaches described in [5–9] gives insight into the extensive research performed on the various solution approaches to the problems of LTE OFDMA systems and provides evidence that orthogonal wavelets are a better and viable alternative for existing wireless systems. Although these analyses and evaluations were done for BER and PAPR, they lack a unified system implementation, a resource analysis, and a thorough performance evaluation for current LTE systems. Our major contribution in this paper is to address these shortcomings in the present knowledge and to present an overall system-level solution. Moreover, we also provide a performance analysis for the optical fiber medium.
3 Orthogonal wavelet division multiplexing
Fast fluctuations in the time domain or frequency-specific information in the time domain can only be revealed through a time/frequency analysis. The wavelet transform maps a time function into a two-dimensional function of ‘a’ (the scale) and ‘τ’ (the translation) of the wavelet function along the time axis [16, 17]. The continuous wavelet transform (CWT) of a signal s(t) has been defined as
$$\mathrm{CWT}(a,\tau) = \frac{1}{\sqrt{a}} \int s(t)\,\psi\!\left(\frac{t-\tau}{a}\right) dt \qquad (1)$$
where t is the time, ψ(t) is the basic (or mother) wavelet, and ψ((t − τ)/a) is the translated baby wavelet [6] created by either stretching or compressing the mother wavelet.
3.1 Formulation of OWDM from the 9/7 filter using lifting
From the CWT, it is possible to construct the discrete wavelet transform (DWT) and the inverse DWT from banks of matched high-pass filters (HPF) and low-pass filters (LPF) [18]. Single-carrier systems tend to have high bit rates but low frequency resolution, whereas OFDM has many sublevels, each transferring at a low bit rate. Since the wavelet transform contains both time and frequency information, it is possible to effectively send different data rates in different sublevels, according to channel conditions. When considering the DWT, there are a number of mother wavelet families that need to be evaluated. To replace OFDM systems in a multipath environment having carrier and symbol interference, the wavelets need to be orthogonal and periodic. Also, realizability using discrete structures is important for the purpose of implementation. Only three families of wavelets satisfy all the above-mentioned constraints: Daubechies, Symlet, and Coiflet [19].
Figure 1 presents a comparison of signal-to-noise ratio vs. bit error rate for the Symlet 1, Coiflet 2, and Daubechies 2 filters of similar order [19]. It can be seen that the least resilient wavelet family is the Symlet, followed by the Coiflet, then the Daubechies, which appears to be better suited for implementation.
The lifting scheme is used for the development of the architecture for a 9/7 Daubechies 1D wavelet filter with two stages of lifting (N = 2), i.e., predict1 and update1, followed by predict2 and update2 in a second stage, followed by scaling [20, 21]. The basic idea of the lifting scheme is first to compute a trivial wavelet transform (or lazy wavelet transform) by splitting the original 1D signal into odd- and even-indexed subsequences and then to modify their values using alternating prediction and updating steps [22, 23]. The lifting algorithm consists of the following three steps:

Split step
The original signal, X(n), is split into odd and even samples (lazy wavelet transform).

Lifting step
This step is executed as N substeps (depending on the type of the filter), where the odd and even samples are alternately filtered by the prediction and update filters.

Scaling step
After N lifting steps, scaling coefficients K and 1/K are applied to the odd and even samples, respectively, in order to obtain the low-pass subband and the high-pass subband.
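The three steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the numerical lifting constants are the values commonly quoted for the 9/7 filter pair and are an assumption here, as are the function names, the 0-based indexing, the restriction to even-length inputs, and the symmetric handling of the two boundary samples.

```python
import numpy as np

# Lifting constants commonly quoted for the 9/7 filter pair (assumed here;
# the paper's own coefficient tables are not reproduced in this section).
ALPHA = -1.586134342059924   # predict 1
BETA = -0.052980118572961    # update 1
GAMMA = 0.882911075530934    # predict 2
DELTA = 0.443506852043971    # update 2
K = 1.149604398860241        # scaling constant


def _next_even(even):
    # even[i + 1], with symmetric extension at the right boundary
    return np.append(even[1:], even[-1])


def _prev_odd(odd):
    # odd[i - 1], with symmetric extension at the left boundary
    return np.insert(odd[:-1], 0, odd[0])


def forward_lifting_97(x):
    """Forward transform of an even-length 1D signal; returns (low, high)."""
    x = np.asarray(x, dtype=float)
    if x.size < 2 or x.size % 2:
        raise ValueError("signal length must be even and at least 2")
    even, odd = x[0::2].copy(), x[1::2].copy()   # split step (lazy wavelet)
    odd += ALPHA * (even + _next_even(even))     # predict 1
    even += BETA * (_prev_odd(odd) + odd)        # update 1
    odd += GAMMA * (even + _next_even(even))     # predict 2
    even += DELTA * (_prev_odd(odd) + odd)       # update 2
    # Scaling step: K on the odd branch, 1/K on the even branch, as in the text.
    return even / K, odd * K


def inverse_lifting_97(low, high):
    """Undo forward_lifting_97 by running the steps backwards with flipped signs."""
    even = np.asarray(low, dtype=float) * K
    odd = np.asarray(high, dtype=float) / K
    even -= DELTA * (_prev_odd(odd) + odd)
    odd -= GAMMA * (even + _next_even(even))
    even -= BETA * (_prev_odd(odd) + odd)
    odd -= ALPHA * (even + _next_even(even))
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd                 # merge the two branches
    return x
```

Because every lifting step is undone by the same step with the opposite sign, the inverse reconstructs the input exactly (up to floating-point rounding), regardless of the particular constants chosen.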
Orthogonal wavelet division multiple access (OWDMA) is a system in which the wavelet domain is used to separate the subband components in the same way as OFDMA. The big difference between OFDMA and OWDMA is that in OFDMA, the FFT performs subband decomposition with a specific number of subbands at well-defined intervals, while OWDMA may dynamically allocate the number of subbands and the bandwidth of each [24].
4 OWDMA processor architecture for LTE A and LTE A-ROF
From the SBPP-OWDM scheme presented in the previous section, it is found that the final scaling and dilation coefficients depend on the predict and update outputs at each stage; this introduces a delay and also affects throughput. The structure requires two update and predict blocks to be implemented. The OWDMA scheme requires that the structure be flexible enough to adapt to different values of N, according to the channel conditions. The structure needs to accommodate both forward and inverse operations through a common control. The multiplicative coefficients for the filter need to be stored in a hardware-friendly format that reduces the number of multiplication operations. Thus, a new OWDMA processor architecture has been developed that caters to all the requirements of a multiple-access system mentioned above. Moreover, parallelism is exploited in the architecture, along with pipelining, to formulate an efficient, low-power, and resource-friendly processor.
Predict and update block1 and predict and update block2 are combined together along with the scaling. In the forward operation, the wavelet coefficients for odd and even samples are calculated using (2) and (3). The odd values are calculated using a structure implementing a 9-tap finite impulse response (FIR) filter, and the even values are found with a 7-tap FIR filter structure, as shown in Figure 2. The odd-indexed wavelet coefficients at positions 1, 3, n − 3, and n − 1 and the even-indexed wavelet coefficients at positions 2, n − 2, and n are computed from input values adjusted using the symmetry property of the filter. Input samples with index k before the first value ‘X[1]’ and after the last value ‘X[N]’ are replaced by their symmetric counterparts using x[k] = x[k + 2i], where i takes the value −k + 1 with respect to the input index k on the left side of the axis and N − k on the right side. FO_{C}(j) and FE_{C}(l) are the forward odd and even filter coefficients, respectively, where j = 1, 2, …, 9 and l = 1, 2, …, 7.
In the inverse operation, the wavelet coefficients for odd and even samples are calculated using (4) and (5). The odd values are calculated using a structure implementing a 7-tap FIR filter, and the even values are found with a 9-tap FIR filter structure. The odd-indexed wavelet coefficients at positions 1, 3, and n − 1 and the even-indexed wavelet coefficients at positions 2, 4, n − 2, and n are computed from input values adjusted by symmetry, in the same way as described in the previous paragraph for the forward operation. The boundary conditions are formulated using a state machine control logic implementation elaborated in the control unit section. IO_{C}(l) and IE_{C}(j) are the inverse odd and even filter coefficients, respectively, where j = 1, 2, …, 9 and l = 1, 2, …, 7.
The proposed OWDMA processor consists of a core unit to multiply filter coefficients with delayed input, accumulate with previous values, and compute the wavelet coefficients. Its control unit controls which coefficients are to be applied at the complex multiplier input. A coefficient generator unit reads the appropriate coefficients from memory. The OWDMA unit acts as a slave to a master scheduler unit that feeds it with clock, address, input data, and variables. Figure 3 shows the high-level architecture of the OWDMA processor unit with the scheduler. The architecture is a two-parallel structure due to the simultaneous calculation of odd and even data. The scheduler and the three major units of the proposed system, namely, the core unit, the control unit, and the coefficient generator unit, are discussed below.
4.1 Scheduler
The proposed OWDMA processor can be interfaced with the scheduler according to the scheme presented in Figure 3. In this scheme, the scheduler communicates with the OWDMA processor using a set of dedicated handshaking signals. The scheduler acts as the master, sets the address of the processor, and provides the clock (CLK) to it. First, the scheduler requests the control unit block to initiate a new transform using the START signal. The control unit sets the BUSY signal low if it is ready to start the process for the new transform, or high if it is in the middle of an ongoing process. When the control unit is ready, it sends a data request (D_REQ) signal to the scheduler, which then responds with the input data. If the control unit receives the input correctly, it sends an acknowledgement (ACK) signal; otherwise, it sends $\overline{\mathrm{NACK}}$, and the scheduler retransmits. Along with the data input, the scheduler sends the size of the OWDM (N_OWDM) as well as the forward/inverse operation ($\mathrm{FW}/\overline{\mathrm{INV}}$) signal. The OWDMA processor uses the RST signal to indicate the end of data when it completes the transform. At the same time, it sets the BUSY signal low to indicate to the scheduler that it is ready to start a new transform.
4.2 Core unit
The core unit consists of two FIR filter units, one 9-tap and the other 7-tap, as shown in Figure 2. They both have CLK, CLK_EN, IN_EN, G1_EN, G2_EN, D_IN, and $\mathrm{FW}/\overline{\mathrm{INV}}$ as common inputs, and Y_{ODD} and Y_{EVEN} as the odd and even filter outputs. The only difference is in the coefficients supplied to the multiply and accumulate units inside the FIR filters. FO_{C}(1…9) and IE_{C}(1…9) are the coefficient inputs for the 9-tap filter block, and FE_{C}(1…7) and IO_{C}(1…7) are the coefficient inputs for the 7-tap filter block. These coefficients are explained in more detail in Section 4.4. The clock count runs from 0 to N + 4 and is then reset. The extra 5 clock cycles after the normal N cycles are for flushing the output to 0. D_IN is the data input. IN_EN is the enable signal for the data input. G1_EN and G2_EN are enable signals for the switches that switch the input gates and are asserted by the control logic. The $\mathrm{FW}/\overline{\mathrm{INV}}$ signal selects forward (a ‘0’) or inverse (a ‘1’) operation. The outputs of both filters are fed to a parallel-to-serial converter block that downsamples the data and rearranges the coefficients to give the final wavelet coefficients W_{C}. It has an OUT_EN (output enable) signal to start calculating the output W_{C}. Figure 4 shows all the logic signals in a timing diagram.
4.3 Control unit
The control unit consists of two separate logic units for forward and inverse computation and is implemented using a finite state machine having five states: S0, S1, S2, S3, and S4. It toggles on the positive edge of the CLK input, and at each state, the output controls IN_EN, G1_EN, G2_EN, OUT_EN, $\mathrm{FW}/\overline{\mathrm{INV}}$_COEF_EN (0/1), and $\mathrm{FW}/\overline{\mathrm{INV}}$. The $\mathrm{FW}/\overline{\mathrm{INV}}$ signal selects which control logic unit is used (forward or inverse). G1_EN and G2_EN are gate control signals that switch the inputs of the delay registers at the boundary conditions.
The input has to be symmetrically extended at the boundaries to avoid distortion. ‘X[1]’ is the first input value, and no previous value is available. Using the symmetric property of the lifting scheme [20], as shown in Figure 5, the next input value ‘X[2]’ is extended to the left of ‘X[1]’ and is used to perform the filter operation. Similarly, at the end of the row, the input value ‘X[N]’ is the last one. By copying the input value ‘X[N − 1]’ to the right of ‘X[N]’, the boundary condition at the right end can be satisfied. The control logic is shown in Figure 6. At the positive edge with count value 4, G1_EN is enabled for 1 clock cycle. G2_EN is enabled at the nth clock count. The output is calculated from the fifth clock count to the (n + 4)th count, and then everything resets back to state S0.
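The boundary handling described above corresponds to a mirror extension in which the edge sample itself is not repeated. A minimal Python sketch, using NumPy's `reflect` padding mode as a software stand-in for the G1_EN/G2_EN switching, is given below; the helper name `symmetric_extend` and the padding width are illustrative assumptions.

```python
import numpy as np


def symmetric_extend(x, pad):
    """Mirror `pad` samples about the first and last sample (edge not repeated)."""
    x = np.asarray(x)
    if pad >= x.size:
        raise ValueError("pad must be smaller than the signal length")
    # mode="reflect" places X[2] to the left of X[1] and X[N-1] to the right of X[N]
    return np.pad(x, pad, mode="reflect")


extended = symmetric_extend([1, 2, 3, 4, 5], 2)
# extended is [3, 2, 1, 2, 3, 4, 5, 4, 3]
```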
4.4 Coefficient generator unit
The coefficient generator block is a memory block that contains the odd and even filter coefficients to be multiplied during forward/inverse operation. By providing the appropriate constant to the multiplier, it implements the desired multiplication. The width of the multipliers is determined by the accuracy of the constants and the data path bit width. The drawback of this implementation is that the multipliers occupy a great amount of area and restrict the throughput of the processing unit. Using shift-add operations to replace the multiplications by constants optimizes the implementation and results in an improved processing block.
To perform shift and add operations, the coefficients are converted to two's complement Q.15 format. That is, they are shifted 15 bits to the left and converted to their respective two's complement binary value. The filter constants are quantized, taking into account the number of bits with value ‘1’ in their positive representation, because each ‘1’ yields a term to be summed. For example, the sets of odd and even coefficients for the forward path are shown in Tables 1 and 2, respectively. The coefficients for the reverse path can be defined similarly.
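The conversion and the shift-add replacement can be illustrated with a short Python sketch. It is a software model under stated assumptions, not the synthesized logic: the function names are hypothetical, coefficients are assumed to have magnitude below 1, and truncation after the final shift is one possible rounding choice.

```python
Q = 15  # number of fractional bits in the Q.15 format


def to_q15(coeff):
    """Quantize a real coefficient with |coeff| < 1 to a signed Q.15 integer."""
    if not -1.0 <= coeff < 1.0:
        raise ValueError("coefficient out of Q.15 range")
    # Scale by 2**15 (a 15-bit left shift) and clamp to the 16-bit signed range.
    return max(-32768, min(32767, round(coeff * (1 << Q))))


def popcount_magnitude(q):
    """Number of '1' bits in the positive representation (one adder term each)."""
    return bin(abs(q)).count("1")


def shift_add_multiply(sample, q):
    """Multiply an integer sample by a Q.15 constant using shifts and adds only."""
    magnitude, acc, bit = abs(q), 0, 0
    while magnitude:
        if magnitude & 1:
            acc += sample << bit   # one shifted copy of the sample per '1' bit
        magnitude >>= 1
        bit += 1
    acc >>= Q                      # drop the 15 fractional bits (truncation)
    return -acc if q < 0 else acc
```

The count returned by `popcount_magnitude` is the number of shifted terms the adder tree has to sum, which is why constants with few ‘1’ bits are cheaper to implement.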
5 Pipelining the parallel architecture
Power dissipation is a major drawback of the system in the downlink and especially the uplink of an LTE A network. In the proposed OWDMA processor architecture, pipelining the stages of the 9-tap and 7-tap filters, along with the two-stage parallel structure, can help reduce the power budget. The saved power can be used to accommodate more users or to increase the range of the system. Pipelining reduces the effective critical path by introducing latches along the critical data path. The critical path (or the minimum time required for processing a new sample) is limited by 1 multiply and 8 add times in the 9-tap filter structure and by 1 multiply and 6 add times in the 7-tap filter structure of the OWDMA processor, as shown in Figure 2. Thus, the ‘sample period’ is given by (6) and (7), where T_{sample}(9-tap) and T_{sample}(7-tap) are the sample periods of the respective filters.
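Reading the critical path off the description above, and writing T_{M} for one multiplier delay and T_{A} for one adder delay (symbols assumed here for illustration), the two sample periods would presumably take the following form:

$$
T_{\mathrm{sample}}(\text{9-tap}) = T_{M} + 8\,T_{A}, \qquad
T_{\mathrm{sample}}(\text{7-tap}) = T_{M} + 6\,T_{A}
$$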
Pipelining is accomplished by introducing 14 and 10 additional latches in the feed-forward path of the 9-tap and the 7-tap filter structure, respectively, thereby reducing the critical path to T_{M} + T_{A} for both filters. In an M-level pipelined system, the number of delay elements in any path from input to output is (M − 1) greater than that in the same path in the original sequential circuit. Thus, we apply eight-level pipelining to the 9-tap filter circuit and six-level pipelining to the 7-tap filter circuit. When the sample speed does not need to be increased, this can be used for lowering the power consumption. The power dissipation (P_{CMOS}) in any circuit depends on the total capacitance C_{total} of the CMOS logic, the supply voltage V_{CC}, and the clock frequency f. The total power depends on static and dynamic power consumption:
With increase in propagation/gate delay, leakage current decreases, thereby reducing the static power consumption of the system [25]. The propagation delay (T_{pd}) depends on the charging capacitance C_{charge} in a clock cycle and the difference (V_{CC} − V_{t})^{2}, where V_{t} is the threshold voltage:
Applying pipelining reduces the capacitance to be charged/discharged in one clock period, while the inherent parallel processing allows for increasing the clock period for charging/discharging the original capacitance. In an L-parallel system (L = 2 in our case), the clock period of the circuit is increased to LT_{pd}. This implies that the supply voltage can be reduced to βV_{o} (0 < β < 1). Hence, the power consumption, compared with the original system, is reduced by a factor β^{2}. The propagation delay of the L-parallel, M-pipelined filter is obtained as
Finally, we can obtain the following equations to compute β and power dissipated in OWDM circuit, respectively
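For orientation, the first-order CMOS relations commonly used in textbook treatments of combined pipelining and parallel processing for low power are sketched below. This is an assumed model, not necessarily the exact form or numbering of the authors' equations: k is a technology-dependent constant and V_{0} denotes the original supply voltage, both introduced here for illustration, and only the dynamic term of the power is written out.

$$
\begin{aligned}
P_{\mathrm{dyn}} &= C_{\mathrm{total}}\,V_{0}^{2}\,f, \\
T_{pd} &= \frac{C_{\mathrm{charge}}\,V_{0}}{k\,(V_{0}-V_{t})^{2}}, \\
L\,T_{pd} &= \frac{(C_{\mathrm{charge}}/M)\,\beta V_{0}}{k\,(\beta V_{0}-V_{t})^{2}}
\;\Longrightarrow\; M L\,(\beta V_{0}-V_{t})^{2} = \beta\,(V_{0}-V_{t})^{2}, \\
P_{\mathrm{new}} &= \beta^{2}\,P_{\mathrm{dyn}}.
\end{aligned}
$$

Under this model, β is the root of the quadratic in the third line that keeps βV_{0} above V_{t}.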
6 Performance results and comparisons
6.1 Synthesis of the proposed architecture and resource utilization
In order to evaluate the performance of the architecture, certain metrics are required that characterize the architecture in terms of the hardware resources used and the computation time. The hardware resources used for filtering are measured by the number of multipliers and the number of adders, while those used for the storage of data and filter coefficients are measured by the number of registers. In general, the computation time is technology dependent. However, a metric that is technology independent and can be used to determine the computation time (T) is the number of clock cycles (N_{CLK}) elapsed between the first and the last samples input to the architecture. Assuming that the clock period is T_{c}, the total computation time can then be obtained as T = N_{CLK} × T_{c}.
To validate the circuit design based on the proposed architecture, the implementation is done on a test bed that includes one central processor with multiple distributed antenna nodes and multiple mobile stations. The test bed operates in the 2.4-GHz ISM band for its license-exempt convenience. The central processor consists of RF front ends with 20- to 80-MHz bandwidth and a number of 125- to 250-MHz 14-bit ADC/DACs mounted on the latest Xilinx Virtex-6 FPGA (Xilinx, San Jose, CA, USA) digital signal processing (DSP)/FPGA processing unit to preprocess data samples. All the carry propagation adders of the architecture have a 16-bit word length and use a structure that combines the carry-skip and carry-select adders [26]. The FPGAs inside the platforms are XC6VLX75T2 devices, which are capable of operating at a clock frequency of 650 MHz at a supply voltage of V_{CC} = 2.5 V and a quiescent voltage of V_{t} = 1.5 V. The resources utilized by the FPGA implementation in terms of the numbers of configuration logic block (CLB) slices, flip-flop slices, DSP48s, input/output blocks (IOBs), and block RAMs (BRAMs) are given in Table 3.
The implemented circuit is found to perform well with a clock period as short as 7.036 ns (i.e., a maximum clock frequency of 142.13 MHz) for a transform size of N = 512. By substituting the values of V_{CC}, V_{t}, L, and M in (18) and (19), it can be found that the power consumption of the chip on which the circuit is implemented is reduced to approximately one-ninth of its original value, i.e., a reduction of about 89%. The new power usage is only 143 mW per antenna.
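The figures quoted above can be cross-checked with a few lines of Python. The clock period, transform size, and one-ninth power ratio are taken from the text; the cycle count N_CLK = N + 5 is an assumption made here (N input cycles plus the five flush cycles mentioned in Section 4.2), and the function names are illustrative.

```python
def max_clock_mhz(clock_period_ns):
    """Maximum clock frequency in MHz for a clock period given in ns."""
    return 1e3 / clock_period_ns


def computation_time_us(n_clk, clock_period_ns):
    """Total computation time T = N_CLK x T_c, returned in microseconds."""
    return n_clk * clock_period_ns / 1e3


def power_saving_percent(remaining_fraction):
    """Percentage saved when power drops to `remaining_fraction` of the original."""
    return (1.0 - remaining_fraction) * 100.0


T_C_NS = 7.036   # clock period reported by synthesis
N = 512          # transform size
N_CLK = N + 5    # assumption: N input cycles plus 5 flush cycles

f_max = max_clock_mhz(T_C_NS)
t_total = computation_time_us(N_CLK, T_C_NS)
saving = power_saving_percent(1 / 9)
print(f"{f_max:.2f} MHz, {t_total:.3f} us, {saving:.1f} %")
```

The first value reproduces the 142.13 MHz stated in the text, and the last one the approximately 89% power reduction quoted in the abstract.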
6.2 Comparison with other architectures
The proposed OWDMA architecture is compared next with various 1D wavelet architectures, as well as with available commercial OFDMA chips. Computation time (T), CLB slices or area occupied on the FPGA, maximum clock frequency, and area/speed ratio are some of the key performance metrics that are compared. Table 4 gives the comparison results between different 1D wavelet architectures present in the literature. For a fair comparison, N = 512 is taken as the size of the input data in all architectures. It can be inferred from the table that although our proposed architecture consumes a few more hardware resources than the recursive architecture [10], parallel FDWT [12], pipelined [13], and Arch1DII [14], it performs significantly better in terms of maximum clock frequency and computation time. It is also seen that the area-to-speed ratio is the second lowest for our architecture. Although the parallel FDWT implementations [12] present a better area/speed ratio, high computation times make them unsuitable for high-speed applications.
Current advanced 4G systems deploy an OFDMA architecture, so it is imperative to compare our proposed OWDMA processor architecture with the state-of-the-art implementation. The OFDMA core uses the Radix-4 and Radix-2 decompositions for computing the discrete Fourier transform (DFT) [27, 28]. When using the Radix-4 decomposition, the N-point FFT consists of log_{4}(N) stages, with each stage containing N/4 Radix-4 butterflies. Point sizes that are not a power of 4 need an extra Radix-2 stage for combining data. An N-point FFT using Radix-2 decomposition has log_{2}(N) stages, with each stage containing N/2 Radix-2 butterflies. The comparison between the two architectures is depicted in Table 5. It can be seen that there is at least a 90% improvement in the computation time when the OWDMA core is used as opposed to the OFDMA core, for a 10% increase in DSP resources on the FPGA, whereas the total area occupied on the FPGA remains comparable for the respective N-point computation. Furthermore, it has been shown [6] that the OWDMA, which implements wavelet transforms, has a complexity of O(N), as opposed to the OFDMA, which contains FFT operations and has a complexity of O(N log_{2}N).
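The stage and butterfly counts described above can be tabulated with a short Python sketch. The helper names are illustrative, and the mixed-radix count simply follows the rule stated in the text (Radix-4 stages, plus one Radix-2 stage when N is not a power of 4); it is not a model of any particular commercial core.

```python
def _log2_exact(n):
    """Return log2(n) for a power of two n >= 2, otherwise raise ValueError."""
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("n must be a power of two and at least 2")
    return stages


def radix2_butterflies(n):
    """log2(n) stages, each with n/2 Radix-2 butterflies."""
    return _log2_exact(n) * (n // 2)


def mixed_radix_butterflies(n):
    """Return (Radix-4 butterflies, Radix-2 butterflies) for an n-point FFT."""
    radix4_stages, extra_radix2_stage = divmod(_log2_exact(n), 2)
    return radix4_stages * (n // 4), extra_radix2_stage * (n // 2)


for size in (512, 1024):
    print(size, radix2_butterflies(size), mixed_radix_butterflies(size))
```

For N = 512, which is not a power of 4, the mixed decomposition uses four Radix-4 stages and one extra Radix-2 stage.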
7 Quality comparison in 4G LTE A/LTE A-ROF system
In any communication system, BER versus SNR and spectral efficiency (throughput) are standard quality of service (QoS) parameters, which give a measure of system performance. Therefore, the proposed OWDMA architecture has been compared to existing OFDMA architectures in 4G LTE A systems with respect to these two QoS parameters. Figure 7 shows the difference between the two implementations from a practical system perspective. As can be seen in Figure 7a,b, the difference is that the IFFT/FFT blocks of OFDMA are replaced by the OWDM modulator/OWDM demodulator blocks. The number of samples modulated and demodulated in the respective paths is the same, i.e., N_{OFDM} equals N_{OWDM}. The inherent processing is similar for both OFDM and OWDM, the only difference being the way the samples are treated in the respective processors. So, there is no need to make any significant changes to the overall processing of samples; replacing the IFFT/FFT blocks of OFDMA with the OWDM modulator/OWDM demodulator blocks is sufficient.
Figure 8a gives an overview of the proposed centralized CoMP system based on an existing LTE backbone with fiber connectivity. The system consists of a central processing unit (CPU) that contains the Xilinx FPGA and DSPs for all the data processing for uplink and downlink. There are four separate processing modules inside the CPU model in Figure 8a, each having transmit (Tx) and receive (Rx) data processing capability. The four modules are connected to a 4 × 4 hub that, in turn, is connected to an RF switch capable of switching in the Tx and Rx directions. The transmit and receive directions have an electrical-to-optical converter (laser diode) to convert the electrical signal to an optical signal, which is transmitted via the fiber to a remote antenna unit (RAU); the RAUs are located at different places in a cell site. The laser diode is modulated by the RF signal in the downlink path. The resulting intensity-modulated optical signal is then transmitted through the single-mode fiber towards a photodiode. The received optical signal is converted to an RF signal (optical-to-electrical converter) by direct detection through a PIN photodetector. The signal is then amplified and radiated by the antenna. In practice, the optical fibers cover a distance of a few hundred meters to a few kilometers, enough to cover a building or a small area. The RAU is a passive unit containing only an optical-to-electrical converter, an amplifier, and an RF antenna at the 2.4/5-GHz band to transmit or receive radio signals. The RAUs are relatively close to the user equipment, generally within a few hundred meters. So, an International Telecommunication Union (ITU) pedestrian multipath channel with Doppler frequency f_{d} = 5 Hz is chosen for simulations.
Figure 8b,c shows the transmitter and receiver units, respectively, of the proposed architecture. In the first step of the transmitter processing, the user data are generated, depending on the previous acknowledgement (ACK) signal. If the previous user data transport block (TB) was not acknowledged, the stored TB is retransmitted using a hybrid automatic repeat request (HARQ) scheme. Then, a cyclic redundancy check (CRC) is calculated and appended to each user's TB. The data of each user are independently encoded using a turbo encoder with quadratic permutation polynomial (QPP)-based interleaving [29]. Each block of coded bits is then interleaved and rate-matched with a target rate, depending on the received channel quality indicator (CQI) user feedback. The encoding process is followed by data modulation, which maps the channel-encoded TB to complex modulation symbols. Depending on the CQI, a modulation scheme is selected for the corresponding resource block. The modulation schemes used for the downlink-shared channel (DL-SCH) here are QPSK, 16-QAM, and 64-QAM. The modulated transmit symbols are then mapped to a multiple-input and multiple-output (MIMO) precoding matrix. The optimum precoding matrix is selected from a code book, depending on the precoding control information (PCI) that is fed back from the user equipment (UE) to the transmitter. Finally, the individual symbols to be transmitted are mapped to the resource elements. Downlink reference symbols and synchronization symbols are also inserted into the OFDM/OWDM time-frequency grid. The assignment of a set of resource blocks (RBs) to UEs is carried out by the scheduler based on the CQI reports from the UEs.
The receiver structure is shown in Figure 8c. Each UE receives the signal transmitted by the evolved node B (eNB) and performs the reverse physical-layer processing of the transmitter. First, the receiver has to identify the RBs that carry its designated information.
The estimation of the channel is performed using the reference signals available in the resource grid. Based on this channel estimation, the quality of the channel may be evaluated, and the appropriate feedback information calculated. The channel knowledge is also used for the demodulation and soft decoding of the OFDM/OWDM signal. In case of MIMO, a MMSE decoder is used. Finally, the UE performs HARQ combining and channel decoding. In order to cut down the processing time after the end of every turbo iteration, a CRC check of the decoded block is performed and, if correct, decoding is stopped.
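The CRC-based early stopping of the turbo decoder described above can be sketched as follows. This is only an illustration under stated assumptions: `zlib.crc32` stands in for the LTE transport-block CRC, and `iterate` is a hypothetical callback representing one turbo iteration; neither is taken from the paper's implementation.

```python
import zlib

def attach_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 (a stand-in for the LTE transport-block CRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def crc_ok(block: bytes) -> bool:
    """Return True if the trailing 4 bytes match the CRC of the rest."""
    payload, tail = block[:-4], block[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == tail

def decode_with_early_stop(iterate, state, max_iters=8):
    """Run a hypothetical iterative decoder and stop once the CRC passes.

    `iterate(state)` performs one iteration and returns
    (new_state, hard_decision_bytes).
    Returns (decoded_block_or_None, iterations_used).
    """
    for it in range(1, max_iters + 1):
        state, block = iterate(state)
        if crc_ok(block):
            return block, it  # remaining iterations are skipped
    return None, max_iters

def toy_iterate_factory(tx_block: bytes):
    """Toy decoder (assumption): each iteration repairs one corrupted byte."""
    def iterate(remaining_errors: int):
        remaining = max(remaining_errors - 1, 0)
        block = bytearray(tx_block)
        for i in range(remaining):
            block[i] ^= 0xFF  # corrupt the first `remaining` bytes
        return remaining, bytes(block)
    return iterate

tx = attach_crc(b"transport block")
decoded, used = decode_with_early_stop(toy_iterate_factory(tx), state=3)
print(used, decoded == tx)
```

In this toy setting the decoder stops as soon as the hard decision passes the CRC instead of always running `max_iters` iterations, which is where the processing-time saving comes from.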
The path from UE to RAU, i.e., the uplink, uses single-carrier frequency division multiple access (SC-FDMA) because OFDM has a high PAPR. A high PAPR requires expensive and inefficient power amplifiers with stringent linearity requirements, which increases the cost of the user terminal and also drains the battery faster. Since OWDM has a better PAPR, it can be used in the uplink as well.
7.1 Bit error rate comparison
Using the above system model, the standard QoS parameters BER and throughput are compared for the OFDM and OWDM architectures. In addition, a performance evaluation of a radio-over-single-mode-fiber system using coded OFDM and OWDM is presented, and the relation of fiber length to BER is discussed. A comparison is also drawn between the peak-to-average power ratios of the two systems.
Figure 9a,b,c shows the simulation results for bit error rate versus signal-to-noise ratio for OWDMA/OFDMA systems for different modulation formats. Table 6 shows the parameters used for the simulations. The plots are drawn for modulation schemes of QPSK, 16QAM, and 64QAM for an ITU-defined extended pedestrian A (EPA) model with Doppler frequency f_{d} = 5 Hz. The size of the transform for both architectures is taken as 1,024, for a total of 50 resource blocks at 20-MHz bandwidth. Comparisons are drawn for both uncoded and rate-1/2 turbo-coded OWDMA/OFDMA systems. We can infer from the graphs that OWDMA performs better than OFDMA for both coded and uncoded schemes. At QPSK modulation, as in Figure 9a, OWDMA shows a performance improvement close to 1.5 dB, which increases to nearly 2.5 dB for 64QAM modulation, as shown in Figure 9c. The gain in SNR can be used to either increase reach or accommodate more users.
The relation of optical fiber length to BER in terms of performance (SNR of radio-over-fiber) is shown in Figure 9d. Table 7 defines the simulation parameters for the optical link. The SNR of radio-over-fiber is defined as the ratio of the probe power (P_{FIBER}) of the fiber to the total noise in the ROF system. The total noise of the ROF system is the sum of the additive white Gaussian noise (AWGN) and the fiber nonlinearities, and the transmission passes through an ITU multipath fading channel (EPA channel), as defined in the previous case. The SNR of ROF is given by (13), where h is the radio channel coefficient, and N_{AWGN} and N_{FIBERNL} are the AWGN noise and the degradation from fiber nonlinearities, respectively.
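One form of (13) that is consistent with these definitions can be written as below. It is a sketch rather than the authors' exact expression: in particular, the assumption that the channel coefficient h scales the probe power as |h|^2 is ours.

```latex
\mathrm{SNR}_{\mathrm{ROF}} = \frac{|h|^{2}\, P_{\mathrm{FIBER}}}{N_{\mathrm{AWGN}} + N_{\mathrm{FIBERNL}}}
```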
We find no significant degradation in BER performance until the fiber length reaches 12 km, where considerable modal dispersion sets in. OWDMA shows slightly better performance than the OFDMA system. This shows that the OWDM-ROF system ensures high service availability over long distances of up to 8 km, which is in accordance with standard distances between the indoor (baseband) and outdoor (radio) units.
7.2 Throughput of the OWDMA system
In this section, we compare the throughput of the OWDMA system with that of the OFDMA system. In a fading system, the capacity of a massive MIMO system is given by [30]
Here, SNR is the signal-to-noise ratio, B is the bandwidth occupied by the data subcarriers, β defines the combined path loss and log-normal shadowing at any RAU, M is the number of RAUs, and F is a correction factor. The transmission of an OFDM signal also requires the transmission of a cyclic prefix (CP) to avoid intersymbol interference (ISI) and of the reference symbols for channel estimation. The correction factor F represents the loss due to the cyclic prefix. To avoid any ISI, the symbol time should be greater than the channel delay spread. The substreams should also be orthogonal to each other, and thus, OFDM is used. Assume the transmission of an N-sample sequence, x[n] = {x[0], x[1], . . ., x[N − 1]}, through a channel with L multipaths, h[l] = {h[0], h[1], . . ., h[L − 1]}. We assume that the channel consists of L distinct and resolvable paths, and v[n] is assumed to be. So, the discrete sampled received signal, r[n], at the output of the channel can be written as
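With the symbols defined above, the standard discrete-time multipath model takes the following form. Treating v[n] as an additive noise term is an assumption made here for illustration.

```latex
r[n] = \sum_{l=0}^{L-1} h[l]\, x[n-l] + v[n], \qquad n = 0, 1, \ldots, N + L - 2
```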
It is well known that multiplication in the DFT domain corresponds to circular convolution in the time domain. To obtain a circular convolution from the linear convolution performed by the channel, a prefix, the 'cyclic prefix', must be added to the transmitted signal. This cyclic prefix makes the linear convolution appear as a circular convolution, but it represents a loss in the achievable data rate that becomes significant in highly fading channels. In the case of OWDM, which uses the wavelet transform, the operations involve shift and multiply operations with filter coefficients. The shift by two for subsequent pairs of rows produces a downsampling operation within the matrix transformation and also makes the matrix orthogonal and circulant [31]. Therefore, a cyclic prefix is not required in the case of OWDM. This gives a significant throughput advantage, particularly in highly dispersive channels.
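The role of the cyclic prefix can be checked numerically with a minimal NumPy sketch. It assumes a noise-free channel, a prefix of length L − 1, and illustrative sizes (N = 64, L = 5) that are not taken from the paper; the helper names are hypothetical.

```python
import numpy as np

def add_cyclic_prefix(x, cp_len):
    """Prepend the last cp_len samples of x (cp_len >= 1 assumed)."""
    return np.concatenate([x[-cp_len:], x])

def channel(tx, h):
    """Noise-free multipath channel: linear convolution with taps h."""
    return np.convolve(tx, h)

def remove_cyclic_prefix(rx, cp_len, n):
    """Drop the prefix and keep the n samples of the useful block."""
    return rx[cp_len:cp_len + n]

def circular_convolution(x, h):
    """Circular convolution computed as a product in the DFT domain."""
    n = len(x)
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, n))

def cp_efficiency(n, cp_len):
    """Fraction of transmitted samples that carry data (CP loss only)."""
    return n / (n + cp_len)

rng = np.random.default_rng(0)
N, L = 64, 5          # illustrative block length and number of channel taps
cp_len = L - 1        # shortest prefix that absorbs the channel memory

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)

rx = remove_cyclic_prefix(channel(add_cyclic_prefix(x, cp_len), h), cp_len, N)

# After prefix removal, the linear convolution equals the circular one.
print(np.allclose(rx, circular_convolution(x, h)))
print(cp_efficiency(N, cp_len))
```

The `cp_efficiency` helper shows the cost side of the trade: every prefix sample is transmitted but carries no new data, which is the kind of loss the correction factor F accounts for.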
Figure 10 shows the throughput (Mb/s) versus SNR (dB) for OWDM and OFDM systems, with the number of RAUs, M, ranging from 1 to 4. Shannon's limit for the capacity at each M is also drawn as the upper bound. The red circles depict the throughput of OWDM/OFDM at a particular modulation scheme and SNR. The OWDM system achieves better throughput under the same channel conditions while placing less burden on the overall resources in terms of the modulation used at a given SNR. For instance, OWDM with QPSK provides 25% better throughput than OFDM with 16QAM between 12 and 14 dB SNR. Data rates up to 760 Mb/s can be achieved using OWDM systems, and the results are very close to Shannon's limit. Thus, OWDM is not only more efficient than OFDM but also performs close to Shannon's limit.
7.3 Peak-to-average power ratio
PAPR is a main disadvantage of OFDM. In this paper, we show that the PAPR performance of OWDMA is better than that of OFDMA systems. PAPR depends on the bandwidth efficiency of the system. The problem of high PAPR is usually associated with OFDM because it is much easier to reach high bandwidth efficiency with OFDM. A large number of publications in the literature are devoted to the PAPR problem in OFDM, mostly claiming only a slight improvement on the existing architecture. The results of our tests indicate that OWDM is the ideal candidate to solve the existing PAPR issues while still delivering high bandwidth efficiency. Our PAPR simulation and analysis are based on the complementary cumulative distribution (CCD). For a given PAPR_{0} (dB), the percentage of combinations for which PAPR > PAPR_{0} is a meaningful criterion for analysis. In general, the PAPR of the output wavelet coefficients is defined in (16) as the ratio between the maximum instantaneous power and the average power, where E[|W_{C}(t)|^{2}] is the average power of W_{C}(t) (the wavelet coefficients).
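Written out from the verbal definition above, the PAPR and the complementary cumulative distribution take the following form. The notation follows the text; the exact typesetting of (16) in the original is an assumption.

```latex
\mathrm{PAPR} = \frac{\max_{t} \left| W_{C}(t) \right|^{2}}{E\!\left[ \left| W_{C}(t) \right|^{2} \right]},
\qquad
\mathrm{CCDF}(\mathrm{PAPR}_{0}) = \Pr\left( \mathrm{PAPR} > \mathrm{PAPR}_{0} \right)
```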
The PAPR performance metric we consider is the complementary cumulative distribution function (CCDF), which is plotted in Figure 11. The OWDMA system reduces the PAPR by around 2 dB compared with OFDMA systems, which is favorable for RF amplifier operation.
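An empirical PAPR CCDF of the kind plotted in Figure 11 can be estimated with a short NumPy sketch. The example below covers only a QPSK-OFDM baseline with 512 subcarriers, 2,000 random symbols, and no oversampling; these settings and the function names are illustrative assumptions, and the OWDM curve is not reproduced here.

```python
import numpy as np

def papr_db(signal):
    """PAPR in dB: peak instantaneous power over average power."""
    power = np.abs(signal) ** 2
    return 10.0 * np.log10(power.max() / power.mean())

def ofdm_symbols(n_symbols, n_sub, rng):
    """Random unit-energy QPSK on every subcarrier, then an IFFT per symbol."""
    bits = rng.integers(0, 2, size=(n_symbols, n_sub, 2))
    qpsk = ((2 * bits[..., 0] - 1) + 1j * (2 * bits[..., 1] - 1)) / np.sqrt(2)
    return np.fft.ifft(qpsk, axis=1)

def ccdf(values, thresholds):
    """Empirical Pr(value > threshold) for each threshold."""
    values = np.asarray(values)
    return np.array([(values > t).mean() for t in thresholds])

rng = np.random.default_rng(1)
symbols = ofdm_symbols(2000, 512, rng)
paprs = np.array([papr_db(s) for s in symbols])

thresholds = np.arange(4.0, 13.0, 1.0)   # PAPR_0 values in dB
prob = ccdf(paprs, thresholds)
for t, p in zip(thresholds, prob):
    print(f"PAPR0 = {t:4.1f} dB  Pr(PAPR > PAPR0) = {p:.4f}")
```

Because the signal is sampled at the Nyquist rate, this estimate tends to understate the PAPR of the continuous-time waveform; oversampling the IFFT output would tighten it.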
8 Conclusion
In this paper, we have developed a flexible, hardware-friendly, and low-power OWDMA architecture design for deployment in ROF systems having an LTE-advanced configuration. The key contribution of the paper is the architecture derived for an LTE AROF system, with an interface of input and output ports that can replace the OFDMA block while offering added benefits.
We first derived an architecture based on previous 9/7 lifting-scheme wavelet filters. The computation of the method is described using filters, a controller, and parallel-to-serial units. The scheduler is also implemented for easy interfacing of the subblock with other blocks of the system. The architecture is validated on a centralized processor having Xilinx Virtex-6 FPGAs at N = 512. We compare our architecture with various other 1-D 9/7 wavelet implementations available in the literature as well as with existing OFDMA implementations. We also compute the quality parameters BER, throughput, and PAPR for OWDMA and compare them with those of existing OFDMA systems.
We found that our architecture runs at 142.13 MHz while consuming only 143 mW of power per antenna. It is better, in terms of resource consumption, than other similar 1-D 9/7 implementations. We also found that it is significantly better than OFDMA systems in terms of resource utilization and of BER, throughput, and PAPR performance for ROF systems. Hence, OWDMA systems are shown to be well suited for high-data-rate communications and can also accommodate more users.
References
1. Haoming L, Hajipour J, Attar A, Leung VCM: Efficient HetNet implementation using broadband wireless access with fiber-connected massively distributed antennas architecture. Wirel. Comm. IEEE 2011, 18(3):72-78.
2. Attar A, Haoming L, Leung VCM: Green last mile: how fiber-connected massively distributed antenna systems can save energy. Wirel. Comm. IEEE 2011, 18(5):66-74.
3. 3GPP: Technical specification group radio access network; (E-UTRA) and (E-UTRAN); overall description; stage 2. 2008. http://www.3gpp.org/DynaReport/36300.htm. Accessed 26 Nov 2013
4. Jiang T, Imai Y: An overview: peak-to-average power ratio reduction techniques for OFDM signals. IEEE Trans. Wirel. Comm. 2008, 57: 56-57.
5. Linfoot SL: Wavelet families for orthogonal wavelet division multiplex. Electron. Lett. 2008, 44(18):1101-1102. 10.1049/el:20081681
6. Linfoot SL, Ibrahim MK, Al-Akaidi MM: Orthogonal wavelet division multiplex: an alternative to OFDM. Consum. Electron. IEEE Trans. 2007, 53(2):278-284.
7. Raajan NR, Monisha B, Kumar MR, Philomina AJ, Priya MV, Parthiban D, Suganya S: Design and implementation of orthogonal wavelet division multiplexing (OHWDM) with minimum bit error rate. Paper presented at the 3rd international conference on trends in information sciences and computing (TISC), Chennai; 2011:122-127.
8. Tao X, Leus G, Mitra U: Orthogonal wavelet division multiplexing for wideband time-varying channels. Paper presented at the IEEE international conference on acoustics, speech and signal processing (ICASSP), Prague; 2011:3556-3559.
9. Liew BA, Berber SM, Sandhu GS: Performance of a multiple access orthogonal wavelet division multiplexing system. Volume 2. Paper presented at the third international conference on information technology and applications (ICITA), Sydney; 2005:350-353.
10. Liao HY, Mandal MK, Cockburn BF: Efficient architectures for 1-D and 2-D lifting-based wavelet transforms. IEEE Trans. Signal Process. 2004, 52(5):1315-1326. 10.1109/TSP.2004.826175
11. McCanny P, Masud S, McCanny J: Design and implementation of the symmetrically extended 2-D wavelet transform. ICASSP 2002, 3: 3108-3111.
12. Raghunath S, Aziz SM: High speed area efficient multi-resolution 2-D 9/7 filter DWT processor. Paper presented at the IFIP international conference on very large scale integration, Nice; 2006:210-215.
13. Masud S, McCanny J: Reusable silicon IP cores for discrete wavelet transform applications. IEEE Trans. Circuits Syst. I, Reg. Papers 2004, 51(6):1114-1124. 10.1109/TCSI.2004.829236
14. Uzun IS, Amira A: Rapid prototyping - framework for FPGA-based discrete biorthogonal wavelet transforms implementation. IEEE Vision Image Signal Process 2006, 153(6):721-734. 10.1049/ipvis:20045080
15. Mahapatra C, Ramakrishnan A, Stouraitis T, Leung VCM: A novel implementation of sequential output based parallel processing - orthogonal wavelet division multiplexing for DAS on SDR platform. Paper presented at the 19th IEEE international conference on electronics, circuits and systems (ICECS), Seville; 2012:320-323.
16. Chan YT: Wavelet Basics. Kluwer Academic Publishers, Dordrecht; 1994.
17. Daubechies I: Ten Lectures on Wavelets. 3rd edition. SIAM, Philadelphia; 1994.
18. Mallat SG: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Machine Intell. 1989, 11(7):674-693. 10.1109/34.192463
19. Linfoot SL: A study of different wavelets in orthogonal wavelet division multiplex for DVB-T. IEEE Trans. Consum. Electron. 2008, 54(3):1042-1047.
20. Jamin A, Mahonen P: Wavelet packet modulation for wireless communications. J. Wirel. Commun. Mob. Comput. 2005, 5(2):123-137. 10.1002/wcm.201
21. Cheng C, Parhi KK: High-speed VLSI implementation of 2-D discrete wavelet transform. Signal Process. IEEE Trans. 2008, 56(1):393-403.
22. Sweldens W, Daubechies I: Factoring wavelet transforms into lifting steps. J. Fourier Anal. Appl. 1998, 4: 247-270. 10.1007/BF02476026
23. Sweldens W: Lifting scheme: a new philosophy in biorthogonal wavelet constructions. In Proceedings of the SPIE Conference on Wavelet Application in Signal and Image Processing III. Volume 2569. Edited by: Laine AF, Unser M. SPIE, Bellingham; 1995:68-79. 10.1117/12.217619
24. Akansu N, Medley MJ: Wavelet and subband transforms: fundamentals and communication application. IEEE Commun. Mag. 1997, 35: 104-115.
25. Qi W, Vrudhula SBK: An investigation of power delay trade-offs for dual Vt CMOS circuits. Paper presented at the international conference on computer design (ICCD), Austin; 1999:556-562.
26. Zhang C, Wang C, Ahmad MO: A pipelined VLSI architecture for high-speed computation of the 1-D discrete wavelet transform. IEEE Trans. Circuits Syst. I, Reg. Papers 2010, 57(10):2729-2740.
27. Lin YW, Lee CY: Design of an FFT/IFFT processor for MIMO OFDM systems. IEEE Trans. Circuits Syst. I, Reg. Papers 2007, 54(4):807-815.
28. Liu H, Lee H: A high performance four-parallel 128/64-point radix-2^4 FFT/IFFT processor for MIMO-OFDM systems. Paper presented at the IEEE Asia Pacific conference on circuits and systems, Macao; 2008:834-837.
29. 3GPP: Technical specification group radio access network; evolved universal terrestrial radio access (E-UTRA); multiplexing and channel coding (release 8). 2008. http://www.3gpp.org/ftp/Specs/archive/36_series/36.212/. Accessed 26 Nov 2013
30. Mahboob S, Mahapatra C, Leung VCM: Energy-Efficient Multiuser MIMO Downlink Transmissions in Massively Distributed Antenna Systems with Predefined Capacity Constraints. Paper presented at the seventh international conference on broadband, wireless computing, communication and applications (BWCCA), Victoria, Canada; 2012:208-211.
31. Dilmaghani R, Ghavami M: Comparison between wavelet-based and Fourier-based multicarrier UWB systems. Commun. IET 2008, 2(2):353-358. 10.1049/ietcom:20070181
Acknowledgements
Research performed and documented in this thesis was supported by the Canadian Natural Sciences and Engineering Research Council (NSERC) through grant STPGP 396756.
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Mahapatra, C., Leung, V.C. & Stouraitis, T. An orthogonal wavelet division multiple-access processor architecture for LTE-advanced wireless/radio-over-fiber systems over heterogeneous networks. EURASIP J. Adv. Signal Process. 2014, 77 (2014) doi:10.1186/16876180201477
Keywords
 Heterogeneous networks (HetNet)
 Coordinated multipoint (CoMP)
 LTE advanced radio over fiber (LTE AROF)
 Orthogonal wavelet division multiple-access (OWDMA) processor
 Orthogonal frequency division multiple access (OFDMA)
 Xilinx Virtex-6 FPGA
 Bit error rate (BER)
 Signal-to-noise ratio (SNR)
 Peak-to-average power ratio (PAPR)