Efficient Implementation of Complex Modulated Filter Banks Using Cosine and Sine Modulated Filter Banks

The recently introduced exponentially modulated filter bank (EMFB) is a 2M-channel uniform, orthogonal, critically sampled, and frequency-selective complex modulated filter bank that satisfies the perfect reconstruction (PR) property if the prototype filter of an M-channel PR cosine modulated filter bank (CMFB) is used. The purpose of this paper is to present various implementation structures for EMFBs in a unified framework. The key idea is to use cosine and sine modulated filter banks as building blocks; therefore, polyphase, lattice, and extended lapped transform (ELT) type implementation structures are studied. The ELT-based EMFBs are observed to be very competitive with the existing modified discrete Fourier transform filter banks (MDFT-FBs) in terms of the number of multiplications and additions and of structural simplicity. In addition, the EMFB provides an alternative channel stacking arrangement that could be more natural in certain subband processing applications and data transmission systems.


INTRODUCTION
In many practical applications, the signals under consideration are real-valued. However, in communications signal processing, complex-valued in-phase/quadrature (I/Q) signals are commonly used. I/Q signals are obtained in a natural way when the baseband equivalent of a (modulated) bandpass signal is used in analysis or actual signal processing tasks. Another approach is to build an artificial complex-valued signal from two independent real-valued signals by mapping them into real and imaginary parts, respectively [1].
Complex modulated filter banks are widely used as computationally efficient and versatile building blocks whenever subband processing or transmission of complex-valued signals is needed. However, certain applications related to audio/video processing and adaptive filtering can also utilize complex modulated filter banks even if the input signals are real-valued [2,3]. In this way, the main aliasing terms are avoided, and both magnitude and phase information are available.
The desired filter bank properties depend highly on the application under consideration. The emphasis of this paper is on 2M-channel finite impulse response (FIR) complex modulated filter banks that are orthogonal, critically sampled, and frequency selective. Moreover, they provide the PR property if the real-valued FIR linear-phase lowpass prototype filter of an M-channel PR CMFB is used. Due to the exponential modulation, the resulting analysis and synthesis filters have single-sided magnitude responses that divide the whole frequency range [−π, π] uniformly.
An important class of filter banks is formed by the discrete Fourier transform filter banks (DFT-FBs) [4]. A key reason for the wide success of DFT-FBs is their efficient implementation, which is based on polyphase filters and fast Fourier transform (FFT) blocks. It is well known that the critically sampled 2M-channel DFT-FB with FIR analysis and synthesis filters satisfies the PR property if the prototype filters are simple 2M-length rectangular windows [5]. Consequently, the stopband attenuation of the resulting channel filters is only about 13 dB.
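To put the 13 dB figure in perspective, the short NumPy sketch below measures the peak sidelobe of a 2M-length rectangular prototype from a zero-padded FFT. It is only an illustration: the FFT size is arbitrary, and the first-null shortcut assumes that the FFT length is a multiple of the window length (true here because both are powers of two).

```python
import numpy as np

def rect_peak_sidelobe_db(L, nfft=1 << 16):
    """Peak sidelobe (dB relative to DC gain) of an L-length rectangular window."""
    w = np.ones(L)
    mag = np.abs(np.fft.rfft(w, nfft))   # zero-padded magnitude response
    first_null = nfft // L               # first zero of the response lies at 2*pi/L
    return 20 * np.log10(mag[first_null:].max() / mag[0])

M = 8
print(rect_peak_sidelobe_db(2 * M))      # roughly -13.3 dB for a 2M = 16 window
```

The sidelobes of a rectangular window decay away from the main lobe, so the maximum beyond the first null is the first sidelobe, which tends to about −13.5 dB for long windows.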
More frequency-selective filter banks can be obtained by using longer and smoother prototype filters. It has been shown in [6,7,8] that highly frequency-selective PR CMFBs can be designed if the order of the prototype filter is N = 2KM − 1 and the overlapping factor K is sufficiently large. (Other order selections do not significantly improve the stopband attenuation of the prototype filter, as observed in [9,10].) However, a critically sampled PR complex modulated filter bank system is possible only if certain additional modifications are introduced for the subband signals, as in the case of MDFT-FBs [11] and EMFBs [12].
In MDFT-FBs and EMFBs, critical sampling is accomplished differently, and their channel stacking arrangements also differ. The MDFT-FB is derived from a DFT-FB with an oversampling factor of 2 by introducing several changes to the subband downsampling and upsampling stages. The EMFB concept is closely related to the modulated complex lapped transform (MCLT), and it relies on real-valued subband signals. The main advantage of EMFBs is a very efficient implementation based on M-channel CMFBs and sine modulated filter banks (SMFBs) [13,14]. It is well known that critically sampled PR CMFBs have efficient implementations based on polyphase structures [5], lattice structures [7], and fast ELT structures [6], but efficient implementation structures for SMFBs have received little attention in the literature.

This paper extends our previous work in [14] by providing a more detailed derivation of the ELT and polyphase SMFB structures, introducing our lattice structures for SMFBs, presenting an alternative approach for obtaining an SMFB from the original ELT structure, and comparing the arithmetic complexity of the ELT-based EMFBs with that of MDFT-FBs. Section 2 introduces the key ideas of EMFBs. Section 3 briefly reviews the efficient implementation structures for CMFBs and, following the same ideas, develops fast implementation structures for SMFBs. Section 4 gives the computational complexity of the ELT-based EMFBs in terms of the number of multiplications and additions. Section 5 reviews the MDFT-FB; based on the number of arithmetic operations, the ELT-based EMFBs are shown to be less computationally complex and to have simpler implementation structures than the MDFT-FBs.

EMFBs
The EMFB is a further development of the MCLT. The MCLT is a 2x oversampled system for processing real-valued signals, whereas the EMFB is a critically sampled complex modulated filter bank suited to complex-valued signals. The MCLT uses subfilters whose order is restricted to N = 2M − 1, but EMFBs can utilize longer subfilters. Therefore, the EMFB can be considered a complex extension of the ELT. The odd-stacked synthesis filters $f^e_k(n)$ and analysis filters $h^e_k(n)$ are generated from a linear-phase lowpass FIR prototype filter by using exponential modulation sequences, where k = 0, 1, ..., 2M − 1, n = 0, 1, ..., 2KM − 1, and $j = \sqrt{-1}$. This means that each analysis filter is simply a time-reversed and complex-conjugated version of the corresponding synthesis filter. Here and later on, the superscripts e, c, and s denote exponential, cosine, and sine modulations, respectively. Figure 1 shows the EMFB system, where the analysis filter bank decomposes a complex-valued high-rate signal into low-rate subband signals. There are 2M subbands, twice the downsampling factor M, but the overall sample rate is preserved because only the real parts are used in the subband processing unit: each block of M complex-valued input samples (2M real values) yields 2M real-valued subband samples. The synthesis filter bank can reconstruct the complex-valued output signal perfectly from the real-valued subband signals, as verified in [13]. The output signal is a delayed version of the input signal, and the total system delay equals the filter order N.
The key idea behind the efficient implementation of the critically sampled EMFB system is that the EMFB channel filters can be represented using cosine and sine modulated channel filters as follows:

Figure 2: Efficient implementation for the EMFB.
These definitions enable the efficient implementation of Figure 2 because the real-valued subband signals can be simplified accordingly. The filter bank structures of Figures 1 and 2 are equivalent, but the latter is clearly preferable in practice: Figure 1 suggests that the imaginary parts of the subband signals are also computed and then discarded, whereas in Figure 2 these useless imaginary parts are not computed at all.
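The saving can be illustrated with a minimal NumPy sketch: the real part of a complex filtering followed by downsampling equals a difference of two purely real filterings, so the imaginary part never has to be formed. The filter h is random and hypothetical; in the EMFB, the real and imaginary parts of the channel filters correspond to cosine and sine modulated filters, up to signs and scaling that this sketch does not try to reproduce.

```python
import numpy as np

def real_subbands_direct(x, h, M):
    """Complex filtering, downsampling by M, then keep only the real part (as in Figure 1)."""
    y = np.convolve(x, h)        # imaginary part is computed here ...
    return y[::M].real           # ... and thrown away here

def real_subbands_real_only(x, h, M):
    """Same samples from two real filterings; the imaginary part is never formed."""
    # Re{(xr + j xi) * (hr + j hi)} = xr * hr - xi * hi
    y_re = np.convolve(x.real, h.real) - np.convolve(x.imag, h.imag)
    return y_re[::M]

rng = np.random.default_rng(0)
M, K = 4, 2
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
h = rng.standard_normal(2 * K * M) + 1j * rng.standard_normal(2 * K * M)  # hypothetical filter
print(np.allclose(real_subbands_direct(x, h, M), real_subbands_real_only(x, h, M)))  # True
```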

COSINE AND SINE MODULATED FILTER BANKS
In the literature, there are two widely used modulation schemes for odd-stacked PR CMFBs [15]. Their modulation sequences differ slightly in scaling factors and phase terms. Here, the ELT definitions are used, and the impulse responses of the sine modulated synthesis filters are obtained when the cosine term is simply replaced by the sine, where k = 0, 1, ..., M − 1 and n = 0, 1, ..., 2KM − 1. The kth analysis filters are simply time-reversed versions of the corresponding synthesis filters, that is, $h^c_k(n) = f^c_k(N-n)$ and $h^s_k(n) = f^s_k(N-n)$. Moreover, the sine and cosine modulated channel filters are related by $h^s_k(n) = (-1)^{k+K} f^c_k(n)$ and $f^s_k(n) = (-1)^{k+K} h^c_k(n)$. Because only the phases of the modulating sinusoids differ, SMFBs are not commonly used alone, but they can cooperate with CMFBs in various applications. In [12], it has already been shown that the SMFB also satisfies the PR conditions if the same prototype filter as in the PR CMFB is used.
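These relations can be checked numerically. The sketch below assumes the standard ELT modulation, $f^c_k(n) = \sqrt{2/M}\,p(n)\cos[(n + (M+1)/2)(k + 1/2)\pi/M]$ with the sine replacing the cosine for $f^s_k(n)$, and a hypothetical symmetric prototype p(n) of length 2KM. A sine window is used only because it is symmetric, not because it is a PR prototype; the sign relations depend only on the symmetry p(N − n) = p(n).

```python
import numpy as np

def elt_filter_banks(p, M, K):
    """Cosine/sine modulated synthesis (f) and analysis (h) filters, ELT-style modulation (assumed).

    Rows index the channel k, columns index the tap n.
    """
    N = 2 * K * M - 1
    assert len(p) == N + 1, "prototype must have length 2KM"
    n = np.arange(N + 1)[None, :]
    k = np.arange(M)[:, None]
    arg = (n + (M + 1) / 2) * (k + 0.5) * np.pi / M
    f_c = np.sqrt(2 / M) * p * np.cos(arg)
    f_s = np.sqrt(2 / M) * p * np.sin(arg)
    # Analysis filters are the time-reversed synthesis filters: h(n) = f(N - n).
    return f_c, f_s, f_c[:, ::-1], f_s[:, ::-1]

M, K = 8, 3
N = 2 * K * M - 1
p = np.sin(np.pi * (np.arange(N + 1) + 0.5) / (N + 1))   # hypothetical symmetric prototype
f_c, f_s, h_c, h_s = elt_filter_banks(p, M, K)
sign = ((-1.0) ** (np.arange(M) + K))[:, None]             # (-1)^(k+K) per channel
print(np.allclose(h_s, sign * f_c), np.allclose(f_s, sign * h_c))  # True True
```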
In efficient implementations, M-channel CMFBs and SMFBs can be divided into prototype filter and modulation parts. Comparing the modulation sequences with the basis functions of the discrete cosine/sine transforms of type IV (DCT-IV/DST-IV) [16] shows that the required modulation parts can be realized using the M × M DCT-IV and DST-IV matrices $\Phi^c$ and $\Phi^s$. These transform matrices are symmetric and satisfy $\Phi^c\Phi^c = \Phi^s\Phi^s = I$, where I is the identity matrix. Moreover, there is a simple connection between these matrices: $\Phi^s = I_{\pm}\Phi^c J$, where $I_{\pm} = \mathrm{diag}(1, -1, 1, -1, \ldots, 1, -1)$ and J denotes the reversing matrix, which has ones on its antidiagonal and zeros elsewhere.
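A small NumPy check of these properties, with the DCT-IV and DST-IV written as explicit orthonormal matrices (a fast algorithm would be used in an actual implementation):

```python
import numpy as np

def dct4(M):
    """Orthonormal M x M DCT-IV matrix Phi^c."""
    k, n = np.arange(M)[:, None], np.arange(M)[None, :]
    return np.sqrt(2 / M) * np.cos((k + 0.5) * (n + 0.5) * np.pi / M)

def dst4(M):
    """Orthonormal M x M DST-IV matrix Phi^s."""
    k, n = np.arange(M)[:, None], np.arange(M)[None, :]
    return np.sqrt(2 / M) * np.sin((k + 0.5) * (n + 0.5) * np.pi / M)

M = 8
Phi_c, Phi_s = dct4(M), dst4(M)
I = np.eye(M)
J = np.fliplr(I)                          # reversing matrix
I_pm = np.diag((-1.0) ** np.arange(M))    # diag(1, -1, 1, -1, ...)

print(np.allclose(Phi_c, Phi_c.T), np.allclose(Phi_c @ Phi_c, I))       # True True
print(np.allclose(Phi_s @ Phi_s, I), np.allclose(Phi_s, I_pm @ Phi_c @ J))  # True True
```

The last check follows from $\cos[(k + 1/2)\pi - \theta] = (-1)^k \sin\theta$: reversing the columns of the DCT-IV turns each cosine row into a sine row with sign $(-1)^k$.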

ELT-type of structures
The modulating cosines in (4) have the same frequencies as the basis functions of the DCT-IV. However, certain prototype filter coefficients need sign changes because of the relationship between the modulation sequences and the DCT-IV [17]. Nevertheless, a DCT-IV-based fast algorithm for the ELT can be expected. A key point of the fast ELT implementation is that the PR conditions imply an orthogonal butterfly implementation. The derivations showing this for K = 1, K = 2, and the general case are presented in detail in [15].
The basic idea of Figure 3 is that the prototype filter, multiplied by the sign-changing sequence, can be implemented with K cascaded orthogonal butterflies $D^c_t$ (t = 0, 1, ..., K − 1) and pure delays, which are connected to the outputs 0, 1, ..., M/2 − 1 of the butterfly matrices [6]. These symmetric matrices have nonzero values only on their diagonals and antidiagonals. The last element of the fast ELT structure is the DCT-IV. Because the DCT-IV matrix and the matrices $D^c_t$ are their own inverses, the transform and the butterflies in the inverse ELT structure are identical to those in the direct ELT structure.

The proposed ELT-type structure for SMFBs is a generalization of the SMFB structure for K = 1 that is implicit in the basic implementation of the MCLT in [2]. To obtain a fast implementation of the analysis SMFB, one might expect that the butterfly matrices of the CMFB structure could be used directly, with only the transform part changed. Unfortunately, this does not work directly, because the impulse response of the prototype filter multiplied by the sign-changing sequence is not perfectly symmetric. Therefore, when the sine modulation sequence is realized using the DST-IV, the reversed version of the prototype filter is needed. The sine modulated analysis filters and the cosine modulated synthesis filters are linked by the factor $(-1)^{k+K}$. Hence, the inverse ELT structure contains the required reversed version of the prototype filter, and it also suggests how to form the modulation part.
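To make the butterfly structure concrete, the sketch below builds one M × M matrix of this kind from M/2 angles, using the block form [[−C, SJ], [JS, JCJ]] with diagonal cosine and sine matrices C and S. This parametrization is an assumption for illustration and not necessarily the exact $D^c_t$ of Figure 3; it shows why such a matrix can be symmetric, its own inverse, and nonzero only on the diagonal and antidiagonal.

```python
import numpy as np

def elt_butterfly(theta):
    """M x M butterfly from M/2 angles (assumed parametrization, for illustration only)."""
    h = len(theta)
    C = np.diag(np.cos(theta))
    S = np.diag(np.sin(theta))
    J = np.fliplr(np.eye(h))   # reversing matrix of size M/2
    # -C and JCJ sit on the main diagonal; SJ and JS sit on the antidiagonal.
    return np.block([[-C, S @ J], [J @ S, J @ C @ J]])

M = 8
D = elt_butterfly(np.linspace(0.1, 1.2, M // 2))
i, j = np.indices(D.shape)
off_pattern = (i != j) & (i + j != M - 1)   # positions off both diagonals
print(np.allclose(D, D.T), np.allclose(D @ D, np.eye(M)), np.all(D[off_pattern] == 0))
```

Since $C^2 + S^2 = I$ and diagonal matrices commute, the product D·D collapses to the identity, which is why the same butterflies can be reused in the inverse structure.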
First, the inverse ELT can be viewed as flipped left to right and upside down: the direction of the lines is reversed, upsamplers are replaced by downsamplers, and summations are replaced by connection points, and vice versa. The butterfly stages and the DCT-IV are then flipped upside down. If these new butterfly matrices are used instead of the $D^c_t$ matrices in the ELT structure, the resulting prototype filter impulse response is a reversed version of the one obtained from the direct ELT structure. If the DST-IV replaces the flipped DCT-IV, the impulse responses of the channel filters are reversed versions of the corresponding filters obtained from the original ELT structure. Moreover, every other channel filter is multiplied by −1; that is, channel k is multiplied by $(-1)^k$. This is sufficient when K is even, but when K is odd, an extra multiplication by −1 is needed for every channel filter due to the factor $(-1)^K$. This multiplication can be absorbed into every butterfly matrix, since the K sign changes then combine into $(-1)^K$, which yields the modified butterfly matrices $D^s_t$. In summary, SMFBs can be implemented using K cascaded orthogonal butterflies $D^s_t$, delays, and DST-IV transforms.

SMFBs using the original ELT structure
The relationship between the DCT-IV and the DST-IV, and the relationship between the modified DCT and the modified DST presented in [18], suggest how either of the two transforms can be computed using only one fast algorithm. Here, it is shown that this method also yields an alternative approach for obtaining a sine modulated analysis filter bank. The mathematical proof can be found in the appendix. Let the top path after the delay chain be numbered k = 0 and the bottom path k = M − 1. The scheme is then as follows.
(1) Change the signs of the odd-numbered elements of the input data sequence. After the delay chain and downsamplers, the input values arriving at the odd-numbered paths are then sign-changed.
(2) Feed this modified sequence through the butterflies of the ELT. The input sequence to the transform block is then almost correct compared with the sequence obtained from the SMFB structure: the values coming from the even-numbered paths are correct, but the values from the odd-numbered paths have opposite signs.
(3) Compensate for these opposite signs by using the DCT-IV matrix instead of the DST-IV matrix.
(4) After the modulation block, all the subband signals are correct, but in reverse order.

Thus, using the above procedure, it is possible to compute cosine and sine modulated sequences using only one fast algorithm originally designed for ELT computation. It should also be pointed out that the sine modulated synthesis system is obtained when the above steps are performed in reverse order.
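At the transform level, the procedure above rests on the identity $\Phi^s = J\Phi^c I_{\pm}$, which follows from $\Phi^s = I_{\pm}\Phi^c J$ because all three matrices are symmetric: flipping the signs of the odd-indexed inputs, applying the DCT-IV, and reversing the output order gives the DST-IV. The sketch uses explicit matrices for clarity; dst4_via_dct4 is a hypothetical helper name, and a practical implementation would use a fast DCT-IV.

```python
import numpy as np

def dct4(M):
    """Orthonormal M x M DCT-IV matrix."""
    k, n = np.arange(M)[:, None], np.arange(M)[None, :]
    return np.sqrt(2 / M) * np.cos((k + 0.5) * (n + 0.5) * np.pi / M)

def dst4(M):
    """Orthonormal M x M DST-IV matrix (reference only)."""
    k, n = np.arange(M)[:, None], np.arange(M)[None, :]
    return np.sqrt(2 / M) * np.sin((k + 0.5) * (n + 0.5) * np.pi / M)

def dst4_via_dct4(v):
    """Hypothetical helper: DST-IV of v computed with a DCT-IV only."""
    M = len(v)
    alt = (-1.0) ** np.arange(M)          # I_pm: flip the signs of odd-indexed entries
    return (dct4(M) @ (alt * v))[::-1]    # DCT-IV, then reverse the output order (J)

v = np.random.default_rng(1).standard_normal(8)
print(np.allclose(dst4_via_dct4(v), dst4(8) @ v))  # True
```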

Polyphase and lattice structures
In [19,20], it is indicated on a general level that cosine and sine modulated filter banks can be implemented so that they share the same polyphase filters. This fact has already been verified in [14], where polyphase structures for SMFBs are derived. Here, it is shown exactly which modifications are needed when using the ELT-type cosine modulation sequence and its sine modulated counterpart.

Ari Viholainen et al.
For the cosine modulation sequence, where n = 0, 1, ..., 2KM − 1 and k = 0, 1, ..., M − 1, shifting n by 2M only changes the sign of the sequence, so it is periodic in n with period 2M up to a sign. Therefore, it is straightforward to use the direct 2M polyphase decomposition of the prototype filter with this modulation sequence. In matrix notation, the synthesis filters ($f^c_k(n) = [P]_{n,k}$) can be expressed by multiplying a diagonal prototype filter matrix H by the modulation matrix $\Psi^c$. These matrices can be further partitioned into 2M × 2M matrices $H_l$ and 2M × M matrices $\Psi^c_l$ (l = 0, 1, ..., K − 1), and it can be noticed that $\Psi^c_l = (-1)^l \Psi^c_0$. Moreover, the matrix $\Psi^c_0$ can be written using the DCT-IV matrix as $\Psi^c_0 = J^c\Phi^c$, where $J^c$ is a 2M × M matrix that consists of M/2 × M/2 submatrices. The matrix P can then be written as a decomposition into the prototype filter matrices, the $J^c$ matrices, and DCT-IV matrices. This system of matrices describes a synthesis filter bank, and the resulting analysis CMFB is shown in Figure 4. The corresponding SMFB can be obtained by replacing the DCT-IV with the DST-IV and using the mapping matrix $J^s$ instead of $J^c$. The required mapping matrices are built from the identity matrix I, the reversing matrix J, and the zero matrix 0. In Figure 4, the prototype filter is expressed in terms of 2M type-1 polyphase components $G_i$, i = 0, 1, ..., 2M − 1. To make the signs match the matrix decomposition, the polyphase filters are written in the form $-G_i(-z^2)$. Moreover, the matrix $J^c$ has to be transposed and multiplied by $(-1)^{K-1}$ so that the 2M signals from the polyphase filters are mapped properly to the DCT-IV. This extra multiplication is not needed in the synthesis structure. The polyphase filter structure can be further simplified by forming M filter pairs, as shown in Figure 5, because each polyphase component pair $\{G_i(z^2), G_{i+M}(z^2)\}$ can share a common delay line.
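The decomposition can be checked numerically. The sketch assumes the ELT modulation used in this section and one particular choice of the 2M × M mapping matrices, written in M/2 × M/2 blocks as $J^c = [0\ I;\ 0\ {-J};\ {-J}\ 0;\ {-I}\ 0]$ and $J^s = [0\ I;\ 0\ J;\ J\ 0;\ {-I}\ 0]$. These blocks were derived here from the symmetries of the DCT-IV and DST-IV kernels; the paper's figures may arrange the signs differently. The check confirms that the full modulation matrices are stacks of $(-1)^l J^c\Phi^c$ and $(-1)^l J^s\Phi^s$.

```python
import numpy as np

def dct4(M):
    """Orthonormal M x M DCT-IV matrix Phi^c."""
    k, n = np.arange(M)[:, None], np.arange(M)[None, :]
    return np.sqrt(2 / M) * np.cos((k + 0.5) * (n + 0.5) * np.pi / M)

def dst4(M):
    """Orthonormal M x M DST-IV matrix Phi^s."""
    k, n = np.arange(M)[:, None], np.arange(M)[None, :]
    return np.sqrt(2 / M) * np.sin((k + 0.5) * (n + 0.5) * np.pi / M)

def mapping_matrices(M):
    """Assumed 2M x M mapping matrices J^c and J^s built from M/2 x M/2 blocks."""
    h = M // 2
    I, Z = np.eye(h), np.zeros((h, h))
    J = np.fliplr(I)
    Jc = np.block([[Z, I], [Z, -J], [-J, Z], [-I, Z]])
    Js = np.block([[Z, I], [Z, J], [J, Z], [-I, Z]])
    return Jc, Js

def modulation(M, K, trig):
    """2KM x M matrix sqrt(2/M) * trig[(n + (M+1)/2)(k + 1/2) pi / M] (ELT form assumed)."""
    n, k = np.arange(2 * K * M)[:, None], np.arange(M)[None, :]
    return np.sqrt(2 / M) * trig((n + (M + 1) / 2) * (k + 0.5) * np.pi / M)

def stacked_fast(M, K, Jmap, Phi):
    """Stack the K blocks Psi_l = (-1)^l * Jmap @ Phi for l = 0, ..., K-1."""
    return np.vstack([(-1) ** l * (Jmap @ Phi) for l in range(K)])

M, K = 8, 3
Jc, Js = mapping_matrices(M)
print(np.allclose(modulation(M, K, np.cos), stacked_fast(M, K, Jc, dct4(M))),
      np.allclose(modulation(M, K, np.sin), stacked_fast(M, K, Js, dst4(M))))  # True True
```

With a prototype vector p of length 2KM, the synthesis filter matrix P = HΨ^c is then simply `p[:, None] * modulation(M, K, np.cos)`.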
In the case of PR filter banks, each polyphase component pair can be efficiently implemented by using a two-channel lattice structure. Our lattice structures are formed slightly differently from those in [7] because definitions (4)-(5) for the cosine and sine modulated channel filters are used. Moreover, the presented lattice structures mimic the ELT structure: the transform part is fixed, and the same butterfly angles as in the ELT are used. The resulting lattice sections are in reverse order, and some coefficient signs differ from those in the structures presented earlier in the literature. In Figure 6, the lattice coefficients have been chosen so that the CMFB is obtained directly when a proper mapping is applied. Because of this choice of lattice coefficients, the SMFB can be obtained simply by multiplying certain paths by −1. The required mapping matrices are
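As an illustration of why lattices suit PR polyphase pairs, the sketch below builds a pair of K-tap polyphase components from one two-multiplier section followed by K − 1 delay-plus-rotation (four-multiplier) sections, a generic lossless two-channel lattice. The angles are arbitrary, and the ordering and signs are not those of Figure 6. Because every section is either orthogonal or a pure delay, the pair is power complementary, $|G_0(e^{j\omega})|^2 + |G_1(e^{j\omega})|^2 = 1$, which is the kind of condition the polyphase pairs of a PR CMFB must satisfy.

```python
import numpy as np

def lattice_pair(angles):
    """Polyphase pair (g0, g1), each with len(angles) taps, from a lossless two-channel lattice."""
    # First section: maps [1, 0] to [cos a0, sin a0] (two multipliers).
    g0 = np.array([np.cos(angles[0])])
    g1 = np.array([np.sin(angles[0])])
    for a in angles[1:]:
        g0d = np.append(g0, 0.0)       # branch 0: undelayed (zero-padded to the new length)
        g1d = np.insert(g1, 0, 0.0)    # branch 1: delayed by one sample
        c, s = np.cos(a), np.sin(a)
        g0, g1 = c * g0d - s * g1d, s * g0d + c * g1d   # rotation: four multipliers
    return g0, g1

g0, g1 = lattice_pair(np.array([0.3, -1.1, 0.7, 0.25]))   # K = 4 sections
G0, G1 = np.fft.fft(g0, 64), np.fft.fft(g1, 64)
print(len(g0), np.allclose(np.abs(G0) ** 2 + np.abs(G1) ** 2, 1.0))  # 4 True
```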

COMPUTATIONAL COMPLEXITY OF EMFBs
The DCT-IV and DST-IV can be computed by using the fast algorithm presented in [15]. The polyphase structure consists of 2M polyphase filters, each requiring K multipliers and K − 1 adders, and the mapping matrix, which requires M adders. The lattice structure is realized using cascaded lattice sections, delays, and a mapping matrix. There are M two-channel lattices, each having one two-multiplier section and K − 1 four-multiplier sections with two adders. In the fast ELT structure, the prototype filter is realized using K cascaded butterfly stages and pure delays. Each butterfly stage consists of M/2 butterflies, each realized using four multipliers and two adders. The number of multiplications can be further reduced because all the coefficients in the butterfly matrices $D^c_1$ to $D^c_{K-1}$ and in the lattice matrices can be scaled so that their diagonal entries are equal to 1 or −1 or their antidiagonal entries are equal to 1 [6,7]. To compensate for these modifications, the resulting scaling factors have to be applied to $D^c_0$ or, in the case of the lattice structure, to the scaling multipliers. Furthermore, in the ELT structure, the four-multiplier butterfly matrix $D^c_0$ requires only three multiplications and three additions because it can be realized using a special trick presented in [15]. The final computational complexities are summarized in Table 1. As can be seen, the ELT structures require (M/2)(2K − 1) fewer multiplications and (M/2)(2K − 3) fewer additions than the direct 2M polyphase or the lattice structures.
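The differences quoted above can be reproduced by simple counting. The sketch below tallies only the prototype-filter part (the DCT-IV/DST-IV is common to the structures and is excluded), and it assumes that each scaled butterfly keeps two multiplications and two additions while $D^c_0$ uses the three-multiplication, three-addition form.

```python
def prototype_part_ops(M, K):
    """(multiplications, additions) of the prototype-filter part, per the block counts above."""
    poly = (2 * M * K,                    # 2M polyphase filters, K multipliers each
            2 * M * (K - 1) + M)          # K - 1 adders each, plus M mapping adders
    half = M // 2                         # butterflies per stage
    elt = ((K - 1) * half * 2 + half * 3,  # scaled stages: 2 mults each; D_0: 3 mults
           (K - 1) * half * 2 + half * 3)  # 2 adders each; D_0: 3 adders
    return poly, elt

for M in (8, 16, 64):
    for K in (2, 3, 5):
        (pm, pa), (em, ea) = prototype_part_ops(M, K)
        assert pm - em == (M // 2) * (2 * K - 1)   # multiplications saved by the ELT
        assert pa - ea == (M // 2) * (2 * K - 3)   # additions saved by the ELT
print(prototype_part_ops(8, 3))  # ((48, 40), (28, 28))
```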
The efficient implementation structure of the analysis EMFB uses the CMFB and SMFB as building blocks. To form the correct subband signals, the EMFB structure also needs simple butterflies requiring just 2M adders. PR CMFBs and SMFBs can be realized by using fast ELT-type structures. By applying the computational complexity formulas of the fast ELT and noting that the input signals to the butterfly stages are real-valued, the numbers of real multiplications and additions per M complex-valued input samples for the analysis EMFB are obtained [14]; the number of multiplications is $\mu_{\mathrm{EMFB}}(M) = M\,(2K + \log_2 M + 3)$. It should also be pointed out that all operations use real-valued signals and real arithmetic rather than complex-valued ones.

COMPARISON BETWEEN EMFBs AND MDFT-FBs
Our reference model is a 2M-channel even-stacked MDFT-FB system.¹ The key idea of the analysis MDFT-FB is to use two-step downsampling for each subband [21]. After a complex-valued input signal is filtered using 2M analysis filters, the complex-valued subband signals are first downsampled by a factor of M. The resulting subband signals are further downsampled by a factor of 2, with and without a unit delay. Critical sampling is obtained by taking the real part of one polyphase component and the imaginary part of the other in each subband, alternating this from one subband to the next. Similar modifications are performed in the synthesis filter bank. The price to be paid for these modifications is that the total system delay increases from N to N + M.
In [22], the first realization of the MDFT-FB consisted of two DFT polyphase filter banks, one without delay and another delayed by M samples. Instead of calculating the complex-valued subband signals with two 2M × 2M IDFTs and discarding one of the real or imaginary parts, the required 2M real parts and 2M imaginary parts can be calculated using only a single 2M × 2M IDFT. Although the two IDFTs are reduced to a single one, each polyphase filter still has to be realized twice. However, apart from a possible delay, the same input signals are fed to the polyphase filters $G_i(z)$ and $G_{(i+M) \bmod 2M}(z)$. Therefore, the same delay chain can be used for both polyphase filters. As an example, the analysis part of a 4-channel MDFT-FB is shown in Figure 7. In the case of a PR MDFT-FB, the polyphase filter pairs can be efficiently realized using the lattice structure.
The simplified version of the analysis filter bank consists of 2M two-channel lattices, a 2M × 2M IDFT block, 2(2M − 2) extra multiplications by 0.5, two Re-operations, two j · Im-operations, and 2M + 2(2M − 2) extra additions [22]. The input signals of the lattices are complex-valued and, after scaling, each lattice can be realized using 2K multipliers and 2(K − 1) adders. Except for two adders whose input signals are purely real or purely imaginary, the input signals of the other blocks are still complex-valued. According to [15], a 2M-length complex-valued DFT/IDFT via the "split-radix"

Both the MDFT-FB and the EMFB have 2M subbands, but the EMFB takes in only M complex-valued input samples at a time, whereas the MDFT-FB takes in 2M complex-valued input samples. To compare the MDFT-FB and EMFB systems properly, two M-length complex-valued input sequences therefore have to be processed in the case of the EMFB. This results in the complexities shown in Table 2. The difference in computational complexity favors the EMFB structure: it requires 2M(2K − 5) + 4 fewer multiplications (a positive saving for K ≥ 3) and 2M(2K − 1) − 4 fewer additions than the MDFT-FB structure. Table 3 summarizes the number of multiplications for certain values of K and M. For example, the optimization method in [23] can be used to generate PR prototype filters whose highest stopband ripple is attenuated by about 38 dB and 50 dB for K = 3 and K = 5, respectively. The number of channels in many subband processing applications is typically in the tens, whereas in audio coding and efficient data transmission systems it can be in the hundreds or even thousands. Hence, if a large number of highly frequency-selective channels is desired, the EMFB structure offers significant savings over the MDFT-FB structure. Another advantage of EMFBs is their clear and simple implementation structure. Moreover, the EMFB structure does not increase the total system delay.

¹ In [1,11,21,22], M stands for the number of complex-valued channels. This paper uses 2M for the same purpose because M already denotes the number of real-valued channels.
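The Table 3 entries can be regenerated from the formulas given in this section. In the sketch, the EMFB count is twice $\mu_{\mathrm{EMFB}}(M) = M(2K + \log_2 M + 3)$, because two M-length blocks are processed, and the MDFT-FB count is obtained indirectly by adding the stated difference 2M(2K − 5) + 4 rather than from an independent operation count. M is assumed to be a power of two.

```python
import math

def mults_emfb(M, K):
    """EMFB multiplications for 2M complex input samples (two M-sample blocks)."""
    return 2 * M * (2 * K + int(math.log2(M)) + 3)

def mults_mdft(M, K):
    """MDFT-FB multiplications = EMFB count + the stated difference 2M(2K - 5) + 4."""
    return mults_emfb(M, K) + 2 * M * (2 * K - 5) + 4

for M in (16, 32, 64, 128, 256, 512):
    # columns: M, MDFT-FB K=3, MDFT-FB K=5, EMFB K=3, EMFB K=5
    print(M, mults_mdft(M, 3), mults_mdft(M, 5), mults_emfb(M, 3), mults_emfb(M, 5))
# first row printed: 16 452 708 416 544
```

Note that for K = 2 the stated difference is negative for M > 2, so the multiplication saving of the EMFB appears from K = 3 upward.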
The ELT-based EMFB cannot be used with biorthogonal low-delay filter banks, whereas the MDFT-FB realization with polyphase filters is directly valid for biorthogonal filter banks. It should also be pointed out that only PR CMFBs and SMFBs can be implemented using the ELT structures or lattice structures. Naturally, the direct 2M polyphase structures can be used to implement the prototype filter part of nearly PR filter banks. In [24], it is shown that the number of multiplications can be reduced by 25% compared with the direct 2M polyphase structure if two polyphase branches are combined into one, as in the case of the ELT. This improvement comes from the same trick that can be used to compute a complex multiplication with three multipliers and three adders.
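The three-multiplier trick mentioned above can be written out as follows (plain Python). For a fixed coefficient a + jb, the sums a + b and b − a are constants that can be precomputed, so each product costs three real multiplications and three real additions instead of four multiplications and two additions.

```python
def cmul3(a, b, x, y):
    """Return (re, im) of (a + jb)(x + jy) using three real multiplications.

    a + b and b - a depend only on the fixed coefficient and can be precomputed.
    """
    k1 = a * (x + y)          # multiplication 1 (addition 1: x + y)
    k2 = y * (a + b)          # multiplication 2 (a + b precomputed)
    k3 = x * (b - a)          # multiplication 3 (b - a precomputed)
    return k1 - k2, k1 + k3   # additions 2 and 3

print(cmul3(3, 4, 5, -2))  # (23, 14), the same as (3+4j)*(5-2j) = 23+14j
```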

CONCLUSION
In this paper, efficient CMFB- and SMFB-based EMFB implementations were studied and compared with the MDFT-FB implementation. It was shown that critically sampled PR CMFB structures (ELT, polyphase, and lattice) require only small changes for SMFB implementations. Furthermore, it is possible to compute cosine and sine modulated sequences using only one fast algorithm originally designed for ELT computation. Based on the number of arithmetic operations, the proposed ELT-based EMFBs were shown to be less computationally complex and to have simpler implementation structures than the MDFT-FBs. Thus, the EMFB can be considered a computationally efficient building block for the processing of complex-valued signals in various subband processing and data transmission systems.

APPENDIX
This appendix shows how a sine modulated sequence can be obtained from the original ELT structure (Figure 3). Let x(n),

Table 3: Number of multiplications for the MDFT-FB and the EMFB.

M      MDFT-FB, K = 3   MDFT-FB, K = 5   EMFB, K = 3   EMFB, K = 5
16     452              708              416           544
32     964              1476             896           1152
64     2052             3076             1920          2432
128    4356             6404             4096          5120
256    9220             13316            8704          10752
512    19460            27652            18432

So when feeding the modified input data sequence through the first butterfly stage of the CMFB structure, the input sequence to the next stage is almost correct compared with the sequence obtained from the SMFB structure. The even-numbered values are correct and the odd-numbered