Unified commutation-pruning technique for efficient computation of composite DFTs
EURASIP Journal on Advances in Signal Processing volume 2015, Article number: 100 (2015)
Abstract
An efficient computation of a composite length discrete Fourier transform (DFT), as well as a fast Fourier transform (FFT) of both time and space data sequences in uncertain (nonsparse or sparse) computational scenarios, requires specific processing algorithms. Traditional algorithms typically employ some pruning methods without any commutations, which prevents them from attaining the potential computational efficiency. In this paper, we propose an alternative unified approach with automatic commutations between three computational modalities aimed at efficient computations of the pruned DFTs adapted for variable composite lengths of the nonsparse input-output data. The first modality is an implementation of the direct computation of a composite length DFT, the second one employs the second-order recursive filtering method, and the third one performs the new pruned decomposed transform. The pruned decomposed transform algorithm performs the decimation in the time or space (DIT) data acquisition domain and, then, the decimation in frequency (DIF). The unified combination of these three algorithms is addressed as the DFT_{COMM} technique. Based on the treatment of the combinational-type hypotheses testing optimization problem of preferable allocations between all feasible commuting-pruning modalities, we have found the global optimal solution to the pruning problem that always requires fewer or, at most, the same number of arithmetic operations as the other feasible modalities. The DFT_{COMM} method outperforms the existing competing pruning techniques in the sense of attainable savings in the number of required arithmetic operations: it requires fewer or, at most, the same number of arithmetic operations for its execution as any of the competing pruning methods reported in the literature. Finally, we provide the comparison of the DFT_{COMM} with the recently developed sparse fast Fourier transform (SFFT) algorithmic family.
We show that, in sensing scenarios with sparse/nonsparse data Fourier spectra, the DFT_{COMM} technique is robust against such model uncertainties in the sense of insensitivity to sparsity/nonsparsity restrictions and to the variability of the operating parameters.
Introduction
Motivation
Many signal processing applications require computation of the so-called pruned discrete Fourier transform (DFT), i.e., an efficient alternative for computing the required DFT when the input sequence and/or the required output sequence is shorter than the full DFT (a full DFT means that all the output components are to be computed, and all the input elements are used to compute the transform); in the literature, those are referred to as pruned fast Fourier transforms (FFTs) or pruned DFTs [1]. Common practical examples relate to, e.g., the least mean squared (LMS) optimal DFT-based pruned signal filtering [2] and the complexity-reduced computational implementation of orthogonal frequency division multiplexing systems [3]. Another practical example relates to efficient implementation of the matched spatial filtering (MSF) algorithm for performing the range and azimuth data compression in unfocused or fractionally focused synthetic aperture radar (SAR) systems, both of which employ the pruned DFT-based MSF processing of the trajectory data signals performed in a factorized fashion in the so-called slow time and fast time data acquisition scales [4–6]. Other examples relate to DFT-based analysis of remote sensing (RS) data acquired with a variety of sensor systems, ranging from seismology [7] to multispectral radiometry [8]. Zhu et al. [9] proposed an algorithm for performing SAR polar format regridding interpolation suited for the logic-in-memory paradigm (a hardware/architecture solution), together with the design automation tool chain needed to implement their algorithm (e.g., FFTs for image formation) in advanced silicon technology. It is important to note that a majority of real-world RS data acquisition and processing problems can be qualified as sensing in harsh environments [4–8, 10, 11] in the sense of intrinsic problem model uncertainties peculiar to such RS modalities.
In the context of pruned DFTs, realistic harsh sensing scenarios are characterized by the uncertainties attributed to zero-padded input data acquisition modes with variable composite length windowing of the input and/or output Fourier transform sequences, in general cases, with nonsparse Fourier spectra [10–12]. These specifics motivate the development of efficient pruned DFT/FFT techniques particularly adapted for computational implementation with uncertain data acquired in harsh sensing scenarios.
Related work
Traditional DFT algorithms adapted for such uncertain scenarios typically employ some pruning methods without any commutations, which prevents them from attaining the potential computational efficiency. Most of the proposals reported in the literature are based on construction of pruning modalities of specific FFT-related algorithms. Some of them prune the input of a specific FFT algorithm, others prune the output, and just a few can prune the input and output (input-output) at the same time. Markel [1] and Skinner [13] proposed input pruning methods based on radix-2 FFTs, while Yuan et al. [14] proposed an input pruning of a split-radix FFT. The approaches of Bouguezel et al. [15] and Fan et al. [16] are applicable for output pruning of radix-2 FFTs, while the proposal of Xu et al. [3] suggests pruning the output of a split-radix FFT. In addition, Sreenivas et al. [17], Roche [18], and Wang et al. [19] developed methods for pruning the input-output at the same time. The first one is based on a radix-2 FFT, the second one employs the split-radix FFT, and the third one performs the mixed-radix FFT. A majority of those methods are applicable only for computing DFTs with a power-of-two length, which drastically restricts their applicability to general uncertain sensing scenarios.
On the other hand, a family of novel so-called sparse FFT (SFFT) algorithms adapted to computing the FFTs when only a few Fourier spectrum coefficients of the input signal are different from zero (the few largest coefficients of the Fourier transform spectrum) has been developed recently [20, 21]. The celebrated SFFT-related algorithms, so-called SFFTv1 and SFFTv2, were reported by Hassanieh et al. in [20]. Later, in [21], the improved SFFT-related versions, addressed as SFFTv3 and SFFTv4, were reported. Another algorithm that considers the Fourier spectrum sparsity restrictions is the so-called FADFT2 reported and implemented in the AAFFT library [22]. However, the SFFT-related algorithms significantly outperform the AAFFT, as was corroborated in [20].
It is worthwhile to mention that the SFFT-related techniques are applicable only for sparse sensing scenarios; e.g., in [20, 21], the authors exemplified the sparsity level by imposing the restriction that up to 89 % of the Fourier coefficients are zero or negligible and thus can be discarded. Such a restriction could be valid in a variety of data compression applications, e.g., compression and recovery of video data not degraded by noise and/or the imaging system instrumental function [20]. Nevertheless, such a sparsity restriction is not valid for many real-world operational scenarios, e.g., processing of the RS data acquired in harsh sensing environments [4–8, 10–12]. For example, in SAR imaging of nonhomogeneous scenes, e.g., urban areas, nonuniformly textured zones, etc., a majority of the Fourier transform coefficients should be considered for feature-enhanced MSF-based imaging [5, 6]; thus, an 89 % sparsity level restriction is not a feasible model assumption there.
In this paper, we are interested in developing pruned DFT algorithms (for DFTs of highly composite length) applicable for near-real-time signal processing and analysis in uncertain sensing scenarios (i.e., with nonguaranteed sparsity of the data Fourier spectra); that is why the family of the SFFT-related techniques is beyond our detailed study here. Nevertheless, for the purpose of generality, in Section 4, we perform a comparative analysis of our developed methods with the SFFT under the same conditions and constraints for different combinations of the specified processing/operational parameters.
In the related literature, in which the feasible nonsparse scenarios are considered, two competing approaches for pruning the composite (non-prime) length DFTs were addressed. Sorensen et al. [23] proposed two methods to prune composite length DFTs, one to prune the input and another to prune the output. Next, the methodology of Medina-Melendrez et al. [24] merges the methods developed originally in [23] to obtain a composite structure that is capable of pruning the input and/or output of a general decomposed transform at the same time. It was demonstrated in [24] that such a computational structure could be as efficient as the one based on specific FFT algorithms [15–17]. In [24], a new methodology for decomposition over a composite length DFT has been proposed as a modification of Sorensen’s approach [23]. Furthermore, the approach of [24] suggests, first, to perform a decimation in frequency (DIF) and, second, a decimation in time (DIT). For processing of spatial data, the corresponding decimation in the space domain should be performed similarly to the DIT operation for time data processing. To avoid misunderstandings, in the rest of the paper, we will use the same abbreviation (DIT) for both processing models and consider the time data processing as the principal model. Nevertheless, all developments are directly transferable to the space data processing scenario.
Hence, the three basic stages to compute the composite length DFTs of nonsparse data encompass the input, the intermediate, and the output stages. The decomposed transform is then pruned by eliminating, from the input and output stages, additions and multiplications by zero, multiplications by one, and all other computations not needed to obtain the required Fourier transform coefficients. In [24], such a multistage decomposed and pruned transform is referred to as FFT_{DIF−DIT−TD} (here, that method is referred to as DFT_{DIF−DIT−Pr}). Nevertheless, neither of the methods addressed in [23, 24] achieves the lowest attainable number of the required arithmetic operations. A possible alternative for computing few Fourier coefficients from few input elements (all nonzero, thus nonsparse) can be addressed based on the application of the second-order Goertzel algorithm [23] modified to accept the input elements in a reverse order.
Novel contributions
The main contribution of this paper consists in the development of a new alternative method for efficient computing of a composite length DFT, when the input sequence and/or the required output sequence are smaller than the length of the full transform. Our proposal guarantees the same or smaller number of arithmetic operations in comparison with the competing methods in the literature. Moreover, it manifests robustness against sparsity/nonsparsity restrictions and the variability of the operating parameters as detailed in Sections 3 and 4.
The innovative idea is to automatically commute among three modalities to implement the DFT: the direct method, the recursive method, and the pruned decomposed transform. Thus, our new proposed composite approach unifies the decomposition of the DFT with its pruning. First, we develop an alternative technique to compute the pruned decomposed transform, in which the DIT is performed at the first stage followed by the DIF. We address this method as DFT_{DIT−DIF−Pr}. An analysis of the two alternatives (DFT_{DIT−DIF−Pr} and DFT_{DIF−DIT−Pr}) verifies that the DFT_{DIT−DIF−Pr} requires a smaller or, at most, an equal number of arithmetic operations compared with the DFT_{DIF−DIT−Pr}, so the use of the DFT_{DIT−DIF−Pr} is strongly recommended when the decomposed and pruned transforms are required. Next, we demonstrate that our proposal requires a lower number of arithmetic operations than any of the pruning-based competing methods [3, 14, 23, 24]. Further, we demonstrate that both decomposed transforms (DFT_{DIF−DIT−Pr} and DFT_{DIT−DIF−Pr}) can be obtained from a general decomposition methodology. Also, the proposed technique manifests robustness in sparse and nonsparse sensing scenarios (i.e., operability for an arbitrary number of consecutive input elements (L _{ i }), number of consecutive outputs that should be computed (L _{ o }), and length of the full transform (N)), in contrast to the recently developed most prominent SFFT family-related methods [20, 21], which are operable in sparse scenarios only.
It is worth noting that, in the majority of practical computational scenarios, the proposed technique achieves significant savings in the number of arithmetic operations; e.g., in Section 4.1, the DFT_{COMM} technique compared with the split-radix FFT (SRFFT) algorithm produces savings of 42 to 92 %.
The rest of the paper is organized as follows: in Section 2, the general decomposition transform methodology is described and explained. An analysis of all feasible transform decomposition methods is presented next in Section 3, followed by the combinational hypotheses testing optimization-based selection of the best decomposition transform permutation modality that yields the unified commutation-pruning DFT_{COMM} technique. In Section 4, comparisons between the developed unified commutation-pruning technique and other competing algorithms in the sense of savings in the number of required arithmetic operations are presented. Also, the proposed DFT_{COMM} method is compared in detail with the most prominent competing SFFT-related algorithms in the context of computing the DFTs in both sparse and nonsparse (harsh) sensing scenarios for different values of the operational parameters (L _{ i }, L _{ o }, and N). Concluding remarks in Section 5 summarize the study. The Appendix provides a pseudocode for implementing the proposed method.
DFT transform decomposition
The definition of the DFT of a sequence of length N (DFT_{ N }) is given by
where \( {W}_N^{nk}={e}^{j2\pi nk/N} \) is the kernel of the transform. Let us define L _{ i } as the number of consecutive input elements different from zero and L _{ o } as the number of consecutive outputs that should be computed. If N is a composite number formed as a product of several integer factors, the DFT_{ N } can be decomposed into smaller DFTs. In particular, the DFT_{ N } can be decomposed into three stages of DFTs (an input stage, an intermediate stage, and an output stage) in order to avoid the arithmetic operations involving zeros, multiplications by one, and the operations not required to compute the final outputs. Below, we briefly describe such feasible decompositions. Assuming that there are two integer factors, D _{ ip } and D _{ op }, of N such that N/(D _{ ip } D _{ op }) ≡ P is an integer, the indexes n and k can be re-expressed as
Substituting n and k in (1) by (2), (3), the original DFT_{ N } is decomposed into
Here, it is assumed that D _{ ip } and D _{ op } are chosen in such a way that N/D _{ ip } ≥ L _{ i } and N/D _{ op } ≈ L _{ o }. Thus, index n _{3} is always equal to zero; k _{3} is near 0, hence (4) can be next rewritten as follows
The computation of (5) is more efficient than the direct computation of the DFT_{ N } since the complex arithmetic operations dependent on n _{3} have been pruned. The complex exponential in (5) can next be grouped in different ways, resulting in different structures for the pruned decomposed transform. The methodology of [24] suggests expressing the pruned decomposed transform as
The pruned decomposed transform of (6) can be interpreted as follows: first, apply DIF to the DFT_{ N } with D _{ ip } as a decomposition factor; then, apply DIT to the resulting DFTs with D _{ op } as a decomposition factor; and, finally, perform the pruning. In [24], the pruned decomposed transform of (6) was addressed as an FFT_{DIF−DIT−TD} modality, which, in our notation, we refer to as DFT_{DIF−DIT−Pr}. A computational diagram of this technique (6) is presented in Fig. 1.
An alternative grouping of the complex exponentials in (5) yields
The computing of the pruned decomposed transform (7) requires, first, application of DIT to the DFT_{ N } with D _{ op } as a decomposition factor and, then, application of DIF to the resulting DFTs with D _{ ip } as a decomposition factor.
Hence, we refer to the pruned decomposed transform of (7) as a DFT_{DIT−DIF−Pr} modality. A computational diagram of this technique (7) is presented in Fig. 2.
The DFT_{DIT−DIF−Pr} involves three processing stages: an input stage (computation of y(n _{1}, n _{2}, k _{1})), an intermediate stage (computation of D _{ ip } D _{ op } DFTs of length P), and an output stage (computation of the complex multiplications and additions dependent on index n _{1}).
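The three stages described above can be illustrated with a small numerical sketch. It is not a reproduction of (2) to (7): it assumes one index mapping that is consistent with the constraints stated in this section, namely n = D_op n_2 + n_1 (with n_3 = 0) and k = (N/D_op) k_3 + D_ip k_2 + k_1, and the names `kernel`, `direct_pruned_dft`, and `dit_dif_pruned_dft` are hypothetical. The kernel uses the sign printed in the text, and the length-P DFTs are evaluated by plain summation rather than by a fast algorithm.

```python
import cmath

def kernel(N, e):
    # W_N^e with the sign printed in the text: exp(+j*2*pi*e/N)
    return cmath.exp(2j * cmath.pi * (e % N) / N)

def direct_pruned_dft(x, N, L_o):
    # Reference: first L_o outputs computed from the L_i = len(x) leading inputs
    return [sum(x[n] * kernel(N, n * k) for n in range(len(x)))
            for k in range(L_o)]

def dit_dif_pruned_dft(x, N, D_ip, D_op, L_o):
    # Assumed mapping: n = D_op*n2 + n1, k = (N/D_op)*k3 + D_ip*k2 + k1
    P = N // (D_ip * D_op)
    if P * D_ip * D_op != N or len(x) > N // D_ip:
        raise ValueError("need D_ip*D_op dividing N and L_i <= N/D_ip")
    xp = list(x) + [0j] * (P * D_op - len(x))  # zero-pad up to N/D_ip samples
    inter = {}
    for n1 in range(D_op):
        for k1 in range(D_ip):
            # Input stage: twiddle multiplication depending on (n2, k1)
            y = [xp[D_op * n2 + n1] * kernel(N, D_op * n2 * k1)
                 for n2 in range(P)]
            # Intermediate stage: one length-P DFT per (n1, k1) pair.
            # For brevity every DFT_P output is computed, needed or not.
            inter[n1, k1] = [sum(y[n2] * kernel(P, n2 * k2) for n2 in range(P))
                             for k2 in range(P)]
    out = []
    for k in range(L_o):
        k1, k2 = k % D_ip, (k // D_ip) % P
        # Output stage: sum over n1 with twiddle W_N^(n1*k)
        out.append(sum(kernel(N, n1 * k) * inter[n1, k1][k2]
                       for n1 in range(D_op)))
    return out
```

With N = 24, D_ip = 2, and D_op = 3 (so P = 4), the sketch uses D_ip D_op = 6 length-4 DFTs, and its first L_o outputs can be compared against the direct summation for any input with L_i ≤ 12.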
Proposed method
Our method employs three different alternatives to compute the DFT_{ N }: a direct method, a recursive method, and/or a pruned decomposed transform. Admissible permutations/allocations of all feasible decomposition-pruning modalities compose all possible hypotheses regarding the feasible alternative schemes for computing the composite DFTs.
All feasible commuting-pruning implementation structures are listed in Table 1. Those could be addressed as possible search “hypotheses” to be tested. Thus, the problem of selection of an optimal computing-pruning implementation structure can be recast as a hypotheses testing task. All feasible hypotheses relate to formal implementation structures specified in Table 1. Four of them prescribe cascade computational implementation involving cascade combinations of structures (hypotheses H_{4},…, H_{7}), while four others (hypotheses H_{9},…, H_{12}) prescribe combinational unions of the previous hypotheses. It is important to remark that (1), (6), and (7) are the mathematical definitions of H_{8}, H_{4}, and H_{5}, respectively. Due to the composite combinations (hypotheses over hypotheses with cascade interlaces, as in the cases of hypotheses H_{9},…, H_{12}), the decision-making process, that is, the selection of the preferable implementation structure from those feasible operational prescriptions, cannot be formalized as a minimization of some cost function subject to relevant restrictions/constraints specified in a closed analytical form. Hence, it should be treated as a test of combinations of hypotheses, sometimes referred to as a combinational (or combinatorial-type) hypothesis testing problem [23, 24]. The global optimal solution to such problems presumes a test of all feasible hypotheses in the list, making the decision in favor of the best one (in the prescribed quality measure), and rejection of all other competing hypotheses [23, 24].
In our particular case, only 12 hypotheses are admissible/feasible; thus, the (global) optimal selection of the best possible implementation structure can be found simply via the so-called brute-force search over the complete list of hypotheses specified in Table 1.
Sorensen et al. [23] sketched how to prune the input and output of DFTs using the independent allocations listed in Table 1 as H_{1}, H_{2}, and H_{3} and featured in Fig. 3a. However, the authors of [23] concluded that their pruning method is less efficient than other pruning methods in the cases when the numbers of both input and output elements are bounded. They recommended turning to the method proposed by Sreenivas et al. in [17], i.e., to prune the input and output of power-of-two length FFTs. Furthermore, an efficient input-output pruning method for power-of-two length FFTs was proposed by Roche in [18].
Later, a more efficient input-output pruning method for composite length DFTs was developed in [24]. Such commuting between H_{4} ∪ H_{6} leads to hypothesis H_{9} as featured in Fig. 3a. In [24], such a technique was constructed as a modification of the transform decomposition proposed originally by Sorensen et al. in [23], but with the extra capability to perform the input-output pruning at the same time. Additionally, the computation of each final output employs a commutation between a direct method and the 2BF filtering algorithm; the 2BF filtering algorithm is an efficient method for computing a subset of final outputs from their decomposition transform [23, 24].
In our study, two additional feasible hypotheses are devised to perform unified commutation-pruning techniques for efficient computations of composite length DFTs (hypotheses H_{11} and H_{12}) as reported in Fig. 3b. Therefore, our proposal relates to an adaptive commuting between feasible implementation structures specified by the union of hypotheses H_{10} ∪ H_{3} ∪ H_{8} that is included in Table 1 as an alternative composite hypothesis H_{12}. A comparison of computational complexities related to implementation of the competing computational structures formalized by hypotheses H_{9} and H_{10} (in the number of required arithmetic operations) is reported in Table 2. Also, the relevant comparisons between two other feasible structures specified by hypotheses H_{11} and H_{12} (referred to here as DFT_{COMM−DIF−DIT−Pr} and DFT_{COMM−DIT−DIF−Pr}, respectively) are reported in Tables 2 and 3 and Figs. 5a–f (in the sense of the number of required arithmetic operations).
The selection of the proper permutation/allocation structure directly relates to the problem considered above of selecting an optimal commutation-pruning implementation structure, cast and treated as a combinational hypotheses testing task. All feasible hypotheses \( {\left\{{\mathrm{H}}_h\right\}}_{h=1}^{12} \) relate to formal implementation structures specified in Table 1. Now, we are ready to find the best permutation/allocation structure in the sense of the imposed quality measure (in our case, in the sense of the lowest possible number of required arithmetic operations).
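A brute-force search over a short list of candidates amounts to evaluating each candidate's cost at the operating point and keeping the minimum. The sketch below shows only that selection step; the function name `brute_force_select` and the two toy cost models are hypothetical placeholders and do not correspond to any entry of Table 1.

```python
def brute_force_select(cost_models, L_i, L_o, N):
    """Evaluate every candidate cost model and return the cheapest one.

    cost_models maps a label to a callable (L_i, L_o, N) -> operation count.
    Ties are resolved in favor of the candidate listed first.
    """
    evaluated = {label: model(L_i, L_o, N)
                 for label, model in cost_models.items()}
    best = min(evaluated, key=evaluated.get)
    return best, evaluated

# Placeholder cost models for illustration only (not taken from the paper).
toy_models = {
    "A": lambda L_i, L_o, N: L_i * L_o,
    "B": lambda L_i, L_o, N: N + L_i + L_o,
}
```

Because the list of hypotheses is short and fixed, the exhaustive evaluation costs a constant number of formula evaluations per operating point (L_i, L_o, N).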
Analysis of the hypotheses
Let us analyze, first, the pruned decomposed transform and deduce whether the direct or recursive method would be preferable. The total number of arithmetic operations (OPER_{tot}) required by the DFT_{DIF−DIT−Pr} and the DFT_{DIT−DIF−Pr} depends on the number of operations needed to be performed to implement the input stage (OPER_{input}), the output stage (OPER_{output}), and the intermediate stage (D _{ ip } D _{ op }OPER_{DFTP}), correspondingly. Thus, one could express OPER_{tot} of both pruned decomposed transforms as
According to (8), OPER_{tot} depends on L _{ i }, L _{ o }, N, D _{ ip }, D _{ op }, and the algorithm employed to implement the D _{ ip } D _{ op } DFT_{ P } blocks (OPER_{DFTP}).
At the input and output stages, some of the multiplications are multiplications by one, and our approach skips them altogether. Which multiplications by one can be skipped depends on whether the DFT_{DIF−DIT−Pr} or the DFT_{DIT−DIF−Pr} modality is executed.
If the DFT_{DIF−DIT−Pr} modality is employed (see the general diagram in Fig. 1), then:

At the input stage, the multiplications by one are excluded when n _{1} = n _{2} = 0 and k _{1} = 0.

Furthermore, the multiplications by one at the output stage are also avoided when n _{1} = 0 or k _{2} = 0.
Therefore, the DFT_{DIF−DIT−Pr} modality always requires fewer complex multiplications to compute the output stage than the DFT_{DIT−DIF−Pr} modality (this is reported in Tables 2 and 3).
On the other hand, if the DFT_{DIT−DIF−Pr} modality is used (see Fig. 2), then:

At the input stage, the multiplications by one are excluded when n _{2} = 0 or k _{1} = 0.

Also, at the output stage, the multiplications by one are avoided when k _{1} = k _{2} = 0 or n _{1} = 0.
Therefore, the DFT_{DIT−DIF−Pr} modality always requires fewer complex multiplications at the input stage than the DFT_{DIF−DIT−Pr} modality (as corroborated by the analysis reported in Tables 2 and 3).
The output stage of both pruned decomposed transform modalities can be computed by the direct addition of complex multiplications or by a recursive algorithm such as the one proposed in [23] (referred to as the 2BF filtering method), which reduces the number of required multiplications by about half. The number of arithmetic multiplications required by the output stage of the DFT_{DIF−DIT−Pr} algorithm is equal to 4 (L _{ o } − D _{ ip }) (D _{ op } − 1) when (L _{ o } > D _{ ip }) and (D _{ op } < 4). Next, the number of arithmetic multiplications is equal to (L _{ o } − D _{ ip }) (2D _{ op } + 2) when (L _{ o } > D _{ ip }) and (D _{ op } ≥ 4). Thus, the 2BF filtering algorithm can be effectively used to compute the output stage.
On the other hand, the number of arithmetic multiplications required to compute the output stage of the DFT_{DIT−DIF−Pr} algorithm is equal to 4(L _{ o } − 1) (D _{ op } − 1) when (D _{ op } < 4), and it is equal to (L _{ o } − 1) (2D _{ op } + 2) when (D _{ op } ≥ 4). Thus, the 2BF filtering algorithm can also be effectively employed to compute the output stage.
In [23], it was proven that the 2BF filtering method is more efficient than the direct addition of complex multiplications when the number of input elements is larger than 4 (when the number of input elements is equal to 4, both methods manifest the same operational complexity). The output stages of both pruned decomposed transforms have the same structure, so the same sort of commutation is required to efficiently compute the output stage of the DFT_{DIT−DIF−Pr}. The expressions for OPER_{input} and OPER_{output} for the DFT_{DIF−DIT−Pr} and the DFT_{DIT−DIF−Pr} are listed in Table 2, where it is implicitly assumed that each complex multiplication requires six arithmetic operations (four real multiplications and two real additions), and each complex addition requires two arithmetic operations (two real additions).
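The counting conventions and the output-stage multiplication counts quoted above can be collected in a short sketch. The function names and the string labels "DIF-DIT" and "DIT-DIF" are illustrative choices; the formulas are those stated in the text, and the DIF-DIT expression is applied only in the range L_o > D_ip for which it is quoted.

```python
REAL_OPS_PER_COMPLEX_MULT = 6  # four real multiplications + two real additions
REAL_OPS_PER_COMPLEX_ADD = 2   # two real additions

def total_real_ops(complex_mults, complex_adds):
    """Convert complex operation counts into real arithmetic operations."""
    return (REAL_OPS_PER_COMPLEX_MULT * complex_mults
            + REAL_OPS_PER_COMPLEX_ADD * complex_adds)

def output_stage_real_mults(modality, L_o, D_ip, D_op):
    """Real multiplications at the output stage, as quoted in the text."""
    if modality == "DIF-DIT":
        if L_o <= D_ip:
            raise ValueError("the quoted formula assumes L_o > D_ip")
        outputs = L_o - D_ip
    elif modality == "DIT-DIF":
        outputs = L_o - 1
    else:
        raise ValueError("unknown modality label")
    # Direct addition for D_op < 4, 2BF filtering for D_op >= 4
    per_output = 4 * (D_op - 1) if D_op < 4 else 2 * D_op + 2
    return outputs * per_output
```

The two branches coincide at D_op = 3, where 4(D_op − 1) and 2D_op + 2 both equal 8, and the 2BF branch is cheaper in multiplications from D_op = 4 onward.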
The performances of the pruned decomposed transforms depend on the decomposition factors, D _{ ip } and D _{ op }. A simple analysis can be carried out to deduce which decomposition factors are preferable to be used. Our unified commutationpruning method performs the decomposition of the DFT_{ N } into three stages of smaller dimension DFTs and pruning part of those inputs that are equal to zero and/or part of those outputs that are not needed to compute the final Fourier coefficients.
Thus, the decomposed transform algorithm always selects a pair (D _{ ip }, D _{ op }) for which the largest DFTs could be successfully pruned, or equivalently, a pair (D _{ ip }, D _{ op }) for which the intermediate stage results in the smallest dimension DFTs.
The DFTs of the intermediate stage have a size of N/(D _{ ip } D _{ op }) ≡ P, so D _{ ip } and D _{ op } should be chosen as large as possible. Furthermore, the values for the decomposition factors should satisfy the bound N/D _{ ip } ≥ L _{ i } (where N/D _{ ip } must be close to, but not lower than, L _{ i }) and N/D _{ op } ≈ L _{ o }, as was considered in the derivation of (5). Hence, the decomposed transform algorithm uses the pair of decomposition factors (D _{ ip }, D _{ op }) that is closest, in the sense of the Euclidean distance, to (N/L _{ i }, N/L _{ o }) and satisfies D _{ ip } ≤ N/L _{ i }.
Let us now consider the cases when the number of input elements (L _{ i }) or the number of the required Fourier coefficients (L _{ o }) is too small. In these cases, for both modalities, the general diagrams presented in Figs. 2 and 1 clarify the following features of the DFT_{DIT−DIF−Pr} and the DFT_{DIF−DIT−Pr} algorithms, respectively.
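The rough selection rule just described can be sketched as a search over divisor pairs. The function names `divisors` and `rough_dp` are hypothetical, and the tie-breaking rule (smaller D_ip, then smaller D_op) is an arbitrary assumption, since the text does not specify one.

```python
from math import hypot

def divisors(n):
    """All positive divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def rough_dp(N, L_i, L_o):
    """Pick (D_ip, D_op) closest to (N/L_i, N/L_o) in Euclidean distance.

    Constraints taken from the text: D_ip * D_op divides N (so that P is an
    integer) and N / D_ip >= L_i.
    """
    target_ip, target_op = N / L_i, N / L_o
    best = None
    for d_ip in divisors(N):
        if d_ip * L_i > N:  # would violate N / D_ip >= L_i
            continue
        for d_op in divisors(N // d_ip):  # guarantees an integer P
            candidate = (hypot(d_ip - target_ip, d_op - target_op), d_ip, d_op)
            if best is None or candidate < best:
                best = candidate
    return best[1], best[2]
```

For example, with N = 8192 and L_i = L_o = 100, the target point is (81.92, 81.92); the largest admissible power of two for D_ip is 64, and 64 is also the power of two nearest to the target for D_op, so the sketch selects (64, 64) with P = 2.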

If L _{ i } ≤ D _{ op }, at most one input of each DFT_{ P } (i.e., the first one) in the intermediate stage would be applied; therefore, their P outputs would be replicas of that single input.

For L _{ o } ≤ D _{ ip }, only the first output of each DFT_{ P } (this corresponds to a simple addition of the input elements) is required to compute the final Fourier coefficients.
Thus, the DFT_{ P }s are implemented inefficiently whenever the inequality-type constraints L _{ i } ≤ D _{ op } or L _{ o } ≤ D _{ ip } hold. In these cases, our method commutes to the direct computation of the DFT_{ N } or to an efficient recursive alternative (the 2BF filtering technique).
Sorensen et al., in [23], proposed a method to compute a subset of the output components of their proposed specific DFT decomposition; this algorithm was referred to as a 2BF filtering method. The 2BF filtering method [23] was derived as a modification of the previously addressed Goertzel algorithm [25]. The 2BF filtering method takes advantage of the periodicity and the shifted cyclic convolution shape between the input sequence and the \( {W}_N^{nk}={e}^{j\left(2\pi /N\right)kn} \) factor.
The transfer function H(z) of a system that performs the 2BF filtering method is given by the equation
The corresponding algorithmic diagram of the second-order 2BF method is presented in Fig. 4. Thus, (9) is the mathematical definition of H_{3}.
The poles of the system transfer function (the roots of the polynomial in the denominator of H(z)) have to be evaluated L times (n = 0, 1, 2,…, L − 1), while the zeros of the system transfer function (the roots of the numerator of H(z)) only once. Here, L represents the number of consecutive nonzero input elements of the 2BF filter; i.e., in the opposite case, it represents the number of consecutive nonzero output components of the employed pruned decomposed transform modality (DFT_{DIF−DIT−Pr} or DFT_{DIT−DIF−Pr}).
The computation of each pole of (9) requires two arithmetic multiplications (two real multiplications) and two arithmetic additions (two real additions). Furthermore, the computation of the zeros of (9) requires four arithmetic multiplications and four arithmetic additions only.
The Q _{1} node in Fig. 4 is initialized with f(L − 1); therefore, the computation starts from n = L − 2. When n = 0, only the complex addition of the input is required; then, the zero is computed after such a delay. Such computational organization saves two arithmetic multiplications and six arithmetic additions for each required output component.
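To make the recursion concrete, the sketch below implements a generic second-order Goertzel-type recurrence that consumes the input samples in reverse order. It is not the paper's (9) nor the Appendix pseudocode; the function name `goertzel_reversed` is hypothetical, and the kernel sign follows the one printed in the text. Each step of the loop costs two real multiplications for a complex sample (the real coefficient 2cos θ times a complex state), and the final combination costs one complex multiplication, which matches the pole and zero counts quoted above.

```python
import cmath
import math

def goertzel_reversed(x, k, N):
    """Compute sum_{n=0}^{L-1} x[n] * exp(+j*2*pi*n*k/N) for L = len(x).

    Second-order recurrence b_n = x[n] + 2*cos(theta)*b_{n+1} - b_{n+2},
    run from n = L-1 down to n = 0 with b_L = b_{L+1} = 0, followed by
    the single final combination b_0 - conj(W) * b_1.
    """
    theta = 2.0 * math.pi * k / N
    coeff = 2.0 * math.cos(theta)  # real coefficient of the feedback path
    w = cmath.exp(1j * theta)
    b_next = 0j   # holds b_{n+1}
    b_next2 = 0j  # holds b_{n+2}
    for sample in reversed(x):
        # The first pass yields b_{L-1} = x[L-1], i.e. the initialization
        b_now = sample + coeff * b_next - b_next2
        b_next2, b_next = b_next, b_now
    # After the loop: b_next = b_0 and b_next2 = b_1
    return b_next - w.conjugate() * b_next2
```

The final expression follows from the fact that W = e^{jθ} is a root of z² − 2cos(θ) z + 1, so all terms involving b_n with n ≥ 2 cancel when the recurrence is substituted into the sum.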
The 2BF filtering method employed to compute the output components required by the pruned decomposed transform performed by the DFT_{COMM−DIF−DIT−Pr} or the DFT_{COMM−DIT−DIF−Pr} algorithm can be featured as the following multistage procedure:

The structure of the DFT_{DIF−DIT−Pr} contains D _{ ip } sets of D _{ op } DFT_{ P }s from which the final outputs are computed (see the general diagram in Fig. 1).

The DFT_{COMM−DIF−DIT−Pr} algorithm employs the 2BF filtering method to implement the output stage of DFT_{DIF−DIT−Pr} with L = L _{ o }, if ( (L _{ i } > D _{ op }) & (L _{ o } > D _{ ip }) ) & ( (L _{ o } > D _{ ip })&(D _{ op } ≥ 4) ) (as featured in Tables 2 and 3). Here, the required arithmetic operations are specified as follows: the number of arithmetic multiplications are equal to NumArithMult _{ 2BF } = (L _{ o } − D _{ ip })(2D _{ op } + 2) and the number of arithmetic additions are equal to NumArithAdd _{ 2BF } = 2 D _{ ip }(D _{ op } − 1) + (L _{ o } − D _{ ip })(4D _{ op } − 2).

Furthermore, the DFT_{COMM−DIF−DIT−Pr} algorithm employs the 2BF filtering method exclusively with L = L _{ i }, if ( (L _{ i } ≤ D _{ op }) | (L _{ o } ≤ D _{ ip }) ) & (L _{ i } ≥ 4) (as featured in Table 3) to compute the required Fourier coefficients. Here, the required arithmetic operations are specified as follows: NumArithMult _{ 2BF } = (L _{ o } − 1)(2L _{ i } + 2) and NumArithAdd _{ 2BF } = 2(L _{ i } − 1) + (L _{ o } − 1)(4L _{ i } − 2).
In contrast, the DFT_{COMM−DIT−DIF−Pr} algorithm differs from the above-mentioned one in the following features:

The structure of the DFT_{DIT−DIF−Pr} contains D _{ op } sets of D _{ ip } DFT_{ P }s from which the final outputs are computed (as featured in Fig. 2).

The DFT_{COMM−DIT−DIF−Pr} algorithm employs the 2BF filtering method to implement the output stage of DFT_{DIT−DIF−Pr} with L = L _{ o }, if ( (L _{ i } > D _{ op }) & (L _{ o } > D _{ ip }) ) & (D _{ op } ≥ 4) (as featured in Tables 2 and 3). Here, the required arithmetic operations are specified as follows: NumArithMult _{ 2BF } = (L _{ o } − 1)(2D _{ op } + 2) and NumArithAdd _{ 2BF } = 2 (D _{ op } − 1) + (L _{ o } − 1)(4D _{ op } − 2).

On the other hand, the DFT_{COMM−DIT−DIF−Pr} algorithm employs the 2BF filtering method exclusively with L = L _{ i }, if ( (L _{ i } ≤ D _{ op }) | (L _{ o } ≤ D _{ ip }) ) & (L _{ i } ≥ 4) (as reported in Table 3) to compute the required Fourier coefficients. Here, the required arithmetic operations are specified as follows: NumArithMult _{ 2BF } = (L _{ o } − 1)(2L _{ i } + 2) and NumArithAdd _{ 2BF } = 2(L _{ i } − 1) + (L _{ o } − 1)(4L _{ i } − 2).
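The operation counts listed in the two procedures above can be collected in a small helper for quick numerical checks. The Python sketch below only transcribes the stated expressions for NumArithMult _{ 2BF } and NumArithAdd _{ 2BF }; the function name ops_2bf and the variant labels "DIF-DIT" and "DIT-DIF" are hypothetical, the missing connective in the second condition is assumed to be a logical OR, and the counts of the input and intermediate stages (Tables 2 and 3) are not included.

```python
def ops_2bf(L_i, L_o, D_ip, D_op, variant):
    """Return (multiplications, additions) of the 2BF part, or None.

    variant is "DIF-DIT" or "DIT-DIF" (labels chosen for this sketch).
    None means 2BF is not selected under the stated conditions.
    """
    if variant not in ("DIF-DIT", "DIT-DIF"):
        raise ValueError("unknown variant")
    decomposed = L_i > D_op and L_o > D_ip
    if decomposed and D_op >= 4:
        # 2BF implements the output stage, with L = L_o.
        if variant == "DIF-DIT":
            mult = (L_o - D_ip) * (2 * D_op + 2)
            add = 2 * D_ip * (D_op - 1) + (L_o - D_ip) * (4 * D_op - 2)
        else:
            mult = (L_o - 1) * (2 * D_op + 2)
            add = 2 * (D_op - 1) + (L_o - 1) * (4 * D_op - 2)
        return mult, add
    if not decomposed and L_i >= 4:
        # 2BF is used exclusively, with L = L_i (same counts for both variants).
        mult = (L_o - 1) * (2 * L_i + 2)
        add = 2 * (L_i - 1) + (L_o - 1) * (4 * L_i - 2)
        return mult, add
    return None
```

For instance, with the illustrative values L _{ i } = 8, L _{ o } = 50, D _{ ip } = 8, and D _{ op } = 16, the second condition applies and the helper returns 882 multiplications and 1484 additions.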
The computation of each input and/or output element in both cases detailed above is executed according to the diagram presented in Fig. 4. In closing, we note that the pseudocode presented in the Appendix (see Fig. 9) contains all scripts needed to compute each Fourier coefficient employing the 2BF filtering method.
Note once again that the 2BF filtering method has to be employed if L _{ i } is greater than or equal to 4, in which case it manifests a higher efficiency than the direct method for computing the DFT_{ N } in (1). The total number of arithmetic operations required by our proposed method is reported in Table 3.
Selection of the permutation/allocation structure
In Fig. 5, the total numbers of arithmetic operations required to compute the DFT_{DIF−DIT−Pr} from [24] (H_{9}), DFT_{COMM−DIF−DIT−Pr} (H_{11}), and DFT_{COMM−DIT−DIF−Pr} (H_{12}) modalities are plotted for different values of L _{ i } and L _{ o } for the test examples with N = 8192 and N = 6561. (It is assumed that the DFT_{ P }s are implemented employing the split-radix algorithm from [26] for N = 8192 and the radix-3 algorithm from [27] for N = 6561.) All the competing alternatives corresponding to the three feasible arrangements (H_{9}, H_{11}, and H_{12}) in the considered permutation/allocation structure are featured in Fig. 5. The DFT_{DIF−DIT−Pr} or the DFT_{DIT−DIF−Pr} could be used to implement the pruned decomposed transform in the DFT_{COMM−DIF−DIT−Pr} and DFT_{COMM−DIT−DIF−Pr} techniques. Here, the D _{ ip } and D _{ op } values are either the pair specified by the rough selection method (the proximity is evaluated by its Euclidean distance; this selection method is referred to as roughDP) or the pair obtained by an exhaustive search method (the total numbers of operations required to implement the DFT_{DIF−DIT−Pr} and the DFT_{DIT−DIF−Pr} are evaluated for each possible pair (D _{ ip }, D _{ op }), and the pair with the best performance metric is then selected; this selection method is referred to as exhDP). Fig. 5a–f demonstrates that the two commutation-pruning techniques (related to hypotheses H_{11} and H_{12}) require the same or a smaller number of arithmetic operations than that specified by hypothesis H_{9}. Next, it is necessary to make a choice between H_{11} and H_{12}.
Graphs in Fig. 5 indicate that the numbers of operations required to perform our commutation-pruning technique (DFT_{COMM−DIF−DIT−Pr} and DFT_{COMM−DIT−DIF−Pr}) with the decomposition factors selected using the roughDP method are equal to or slightly greater than those obtained when the decomposition factors are specified employing exhDP. The differences correspond to the regions where the commutation conditions prescribe performing the pruned decomposed transform instead of the 2BF filtering method.
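The exhaustive selection (exhDP) described above amounts to evaluating a cost for every admissible pair of decomposition factors and keeping the minimizer. The Python sketch below illustrates that search pattern only. It assumes that a pair is admissible when D _{ ip } D _{ op } divides N (so that P = N/(D _{ ip } D _{ op }) is an integer); the function names feasible_pairs and exhaustive_pair are hypothetical, and the actual operation-count expressions of Tables 2 and 3 are not reproduced here, so the cost is passed in as a user-supplied callable.

```python
def feasible_pairs(N):
    """All (D_ip, D_op) such that D_ip * D_op divides N (assumed admissibility rule)."""
    divisors = [d for d in range(1, N + 1) if N % d == 0]
    return [(d_ip, d_op)
            for d_ip in divisors
            for d_op in divisors
            if N % (d_ip * d_op) == 0]


def exhaustive_pair(N, cost):
    """Return the admissible pair that minimizes cost(D_ip, D_op).

    cost is a placeholder for the total operation count of the
    pruned decomposed transform, which this sketch does not model.
    """
    return min(feasible_pairs(N), key=lambda pair: cost(*pair))
```

A call such as exhaustive_pair(64, lambda a, b: (a - 4) ** 2 + (b - 8) ** 2) uses a toy cost purely to exercise the search; in practice the callable would return the operation count for the given L _{ i } and L _{ o }.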
The DFT_{DIT−DIF−Pr} modality requires the same or a smaller number of arithmetic operations than the competing DFT_{DIF−DIT−Pr} for all the cases where the pruned decomposed transform is performed (as follows from the data reported in Fig. 5). Since the same decomposition factors (D _{ ip }, D _{ op }) are used in both pruned decomposed transforms, it is sufficient to compare the numbers of operations required by their input and output stages (OPER_{input} + OPER_{output}) reported in Table 2 to determine which one is the most efficient. The comparison for the cases L _{ i } ≤ D _{ op } and L _{ o } ≤ D _{ ip } is not needed, since in such scenarios a direct or recursive method is employed instead of a pruned decomposed transform. For scenarios with D _{ op } < 4, both pruned decomposed transforms require the same number of arithmetic operations for their execution. Otherwise, for D _{ op } ≥ 4, the execution of DFT_{DIF−DIT−Pr} requires 2D _{ ip } D _{ op } − 8D _{ ip } − 2D _{ op } + 8 more arithmetic operations than DFT_{DIT−DIF−Pr}, demonstrating that the latter always manifests the same or a better performance. Thus, from the combinational permutation analysis, it follows that it is always desirable to perform the DFT_{DIT−DIF−Pr} when a pruned decomposed transform is required. In the following section, an efficient implementation of the proposed unified commutation-pruning technique is detailed considering that the pruned decomposed transform is implemented using the DFT_{DIT−DIF−Pr}. In summary, we conclude that the performed combinational hypothesis testing-based optimal selection of the preferable computational structure of the decomposed DFTs made the decision in favor of hypothesis H_{12}; this yields the proposed DFT_{COMM−DIT−DIF−Pr} method (referred to further on, for simplicity, as DFT_{COMM}) with the highest possible computational efficiency.
Being the optimal decision of the performed “brute force search” based testing of all feasible hypotheses, this method is guaranteed to be the globally optimal one and is thus strongly recommended for performing the required commuting between the three techniques used to implement the overall composite DFT, namely: the direct method, the recursive method, and the pruned decomposed transform implemented via DFT_{DIT−DIF−Pr}.
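The excess 2D _{ ip } D _{ op } − 8D _{ ip } − 2D _{ op } + 8 stated above factors as 2(D _{ ip } − 1)(D _{ op } − 4), which makes its nonnegativity for D _{ ip } ≥ 1 and D _{ op } ≥ 4 evident. The short Python sketch below checks this identity numerically; the function names excess_operations and excess_factored are hypothetical and introduced only for this illustration.

```python
def excess_operations(D_ip, D_op):
    """Extra operations of DFT_{DIF-DIT-Pr} over DFT_{DIT-DIF-Pr} for D_op >= 4."""
    return 2 * D_ip * D_op - 8 * D_ip - 2 * D_op + 8


def excess_factored(D_ip, D_op):
    """Equivalent factored form, 2 (D_ip - 1)(D_op - 4)."""
    return 2 * (D_ip - 1) * (D_op - 4)
```

For example, with the illustrative values D _{ ip } = 8 and D _{ op } = 16, both expressions give 168; the excess vanishes when D _{ ip } = 1 or D _{ op } = 4.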
Comparison with other competing algorithms
A variety of competing methods for pruning the DFTs in arbitrary (nonsparse) computational scenarios have been addressed in the literature (see [1, 3, 13–19, 23, 24]). In [24], the FFT_{DIF−DIT−TD} modality (that we here refer to as DFT_{DIF−DIT−Pr}) was proposed as an alternative technique for pruning the input and/or the output of DFTs. That method [24] was compared with other pruning techniques reported in the literature until 2009. Comparisons of the methods proposed by Bouguezel et al. [15], Fan et al. [16], Sreenivas et al. [17], Roche [18], and the DFT_{DIF−DIT−Pr} reported in [24] demonstrated that the DFT_{DIF−DIT−Pr} modality requires fewer arithmetic operations than those of [15–17], while attaining operational performances similar to that of [18]. Additionally, in Section 3, it was corroborated that our proposed DFT_{COMM} technique requires the same number of or fewer arithmetic operations than [24]. Below, we compare our approach with the most prominent recently reported competing pruning methods.
Comparisons with pruning-based algorithms
The first competing algorithm for pruning the output of a SRFFT was reported in [3]. That so-called SRFFT_{pruning} algorithm was developed under the implicit restriction that only a few consecutive output components (a number L equal to a power of two) are required. Fig. 6 reports the number of arithmetic operations required to perform SRFFT_{pruning} in comparison with our unified DFT_{COMM} method for multiple output pruning examples using the decomposition factors (D _{ ip }, D _{ op }) evaluated via the roughDP method and those specified by the exhDP method, respectively.
In both cases, it is considered that the DFTs of length P required by the intermediate stage of the pruned decomposed transform have been implemented by applying the split-radix FFT, e.g., [26]. The total number of arithmetic operations required by our proposed DFT_{COMM} method in comparison with the competing pruning-based algorithms can be found in Table 4. The savings in the number of arithmetic operations attained with the newly developed DFT_{COMM} technique are reported in Tables 5 and 6.
From Fig. 6, one can deduce that our proposed DFT_{COMM} method requires fewer arithmetic operations than the competing SRFFT_{pruning} method in almost all the test cases (the only exceptions being the cases L _{ o } = N/2 and L _{ o } = N/4). Next, Tables 5 and 6 report the savings in the number of arithmetic operations attained with our DFT_{COMM} in comparison with the competing SRFFT and the SRFFT_{pruning} techniques. In the scenarios with L _{ o } = N and L _{ i } = {2^{1}, 2^{2},…, N}, the DFT_{COMM} algorithm manifests 2.96 and 2.73 % savings in the number of arithmetic operations in comparison with the SRFFT_{pruning} for N = {262,144, 1024}, respectively.
In other cases, from Table 5, it follows that in the scenarios with L _{ i } = 1027, L _{ i } = 33, and L _{ o } = {2^{1}, 2^{2},…, N}, the SRFFT_{pruning} method fails to deliver a result at all. Thus, from Table 5, it follows that in the cases when L _{ i } = N = 262,144, L _{ i } = 1027, L _{ i } = 33, and L _{ o } = {2^{1}, 2^{2},…, N}, the DFT_{COMM} algorithm produces savings of 42.76, 75.02, and 91.35 %, respectively, in the number of arithmetic operations required to compute the composite length DFT in comparison with the competing SRFFT algorithm. Furthermore, from Table 6, it follows that in the scenarios with L _{ i } = 90, L _{ i } = 13, and L _{ o } = {2^{1}, 2^{2},…, N}, the SRFFT_{pruning} method fails to deliver a result at all. Thus, from Table 6, it follows that in the cases when L _{ i } = N = 1024, L _{ i } = 90, L _{ i } = 13, and L _{ o } = {2^{1}, 2^{2},…, N}, the DFT_{COMM} algorithm produces savings of 36.48, 59.30, and 81.65 %, respectively, in the number of arithmetic operations required to compute the composite length DFT in comparison with the competing SRFFT algorithm.
Yuan et al. [14] proposed another competing method, the so-called SRFFT_{pruning−time−shift}, by modifying the SRFFT_{pruning} with a time-shifting approach; this yields an input pruning algorithm based on the SRFFT methodology for L consecutive nonzero input elements. It is worth stressing that the SRFFT_{pruning−time−shift} approach implicitly assumes that the lengths L and N may take only values equal to a power of two.
Figure 7 reports the number of arithmetic operations required to execute our proposed unified DFT_{COMM} method and those required by the competing pruned DFTs of [14]. These results verify that our approach requires fewer arithmetic operations than those required to perform the SRFFT_{pruning−time−shift} algorithm in all the reported tests. Again, it is implicitly assumed that the DFTs of length P involved in the DFT_{DIT−DIF−Pr} used by our DFT_{COMM} have been computed using the split-radix FFT [26], as reported in Table 4.
Next, Tables 7 and 8 report the savings in the number of arithmetic operations attained with our DFT_{COMM} in comparison with the competing SRFFT and the SRFFT_{pruning−time−shift} techniques. In the scenarios with L _{ o } = N and L _{ i } = {2^{1}, 2^{2},…, N}, the DFT_{COMM} algorithm manifests 5.11 and 8.71 % savings in the number of arithmetic operations in comparison with the SRFFT_{pruning−time−shift} for N = {262,144, 1024}, respectively.
In other test cases, from Tables 7 and 8, it follows that for L _{ o } = {1027, 90}, L _{ o } = {33, 13}, and L _{ i } = {2^{1}, 2^{2},…, N}, the SRFFT_{pruning−time−shift} algorithm fails to deliver a result at all. Furthermore, from Table 7, it follows that in the scenarios with L _{ o } = {N, 1027, 33} and L _{ i } = {2^{1}, 2^{2},…, N}, our DFT_{COMM} attains 43.26, 76.24, and 92.11 % savings for N = 262,144, respectively, in the number of arithmetic operations required to compute the composite length DFT. In addition, from Table 8, it follows that in the scenarios with L _{ o } = {N, 90, 13} and L _{ i } = {2^{1}, 2^{2},…, N}, our DFT_{COMM} attains 38.22, 59.22, and 82.45 % savings for N = 1024, respectively, in the number of arithmetic operations required to compute the composite length DFT.
Note that our DFT_{COMM} always requires fewer arithmetic operations than the competing SRFFT_{pruning} and SRFFT_{pruning−time−shift} algorithms due to the different butterfly schemes employed to implement the split-radix FFT algorithms [26] and the unified commutation-pruning technique employed (see Section 3). The SRFFT_{pruning} and SRFFT_{pruning−time−shift} algorithms perform the two-butterfly scheme [26], while our DFT_{DIT−DIF−Pr} algorithm employs the three-butterfly scheme to achieve a reduction in the number of arithmetic operations required to implement the DFT_{ P } blocks. Furthermore, the graphs of Fig. 6 show that the SRFFT_{pruning} algorithm fails to deliver a result at all in the scenarios with L equal to N due to its algorithmic construction, as reported by the authors of [3]. For this reason, this algorithm cannot present a valid value for the last test of L _{ o } (it is simply unable to stop pruning). In addition, Fig. 6 reports minimal differences between the numbers of arithmetic operations attained by the DFT_{COMM} evaluated using the roughDP- or exhDP-based selection for specifying D _{ ip } and D _{ op }. In summary, the number of arithmetic operations required to compute the SRFFT_{pruning}, SRFFT_{pruning−time−shift}, and DFT_{COMM} algorithms can be found in Table 4.
Comparison with the SFFT-related algorithms
In the context of pruned DFTs, real-world sensing scenarios are characterized by the uncertainties attributed to zero-padded input data acquisition modes with variable composite length windowing of the input and/or output Fourier transform sequences, in general cases, with a nonsparse Fourier spectrum [10–12]. In contrast, the celebrated SFFT method developed and featured in [20] presumes “sparsity” of the Fourier spectrum, which requires that the majority of the Fourier coefficients be zeros or negligible; e.g., the authors of [20] exemplified such a sparsity level at approximately 89 %, i.e., up to 89 % of the Fourier transform coefficients are to be zeros or negligible for operability of their SFFT. Otherwise, the DFT should be specified and treated as a nonsparse transform.
Currently, a family of novel efficient algorithms for computing FFTs in sparse sensing scenarios, in which only a few Fourier transform coefficients (the k _{s} largest coefficients of the N-length Fourier transform) of the input signal x are different from zero, has been developed [20, 21]; these compose the family of the so-called SFFT methods. To compute a reliable SFFT for typical high N > 2^{10}, the sparsity level constraint requires that the majority of the Fourier coefficients be zeros [20] (or negligible, so that they can be discarded). Such model assumptions are valid, for example, in video compression applications [20]. Therefore, if the majority of the Fourier transform coefficients are supposed to be zeros or can be discarded, then efficient computing techniques from the SFFT family can be employed. The celebrated algorithms from this family are the SFFTv1 and the SFFTv2 developed and featured in [20], where the sparsity level was exemplified at 89 % of zero (negligible) Fourier coefficients. In [21], the SFFTv3 and SFFTv4 algorithms were proposed, where some computational improvements were introduced. SFFTv3 was implemented in [28], while the program code for implementation of the SFFTv4 algorithm is not available at this time. Another competing technique for computing the FFT of signals that are sparse in the frequency domain was addressed in [22] as the so-called FADFT2 algorithm from the AAFFT library [22]. However, in [20, 21], it was corroborated that the SFFT-related algorithms manifest better operational performances than FADFT2 of [22].
To perform valid test comparisons between the SFFTv1, SFFTv2, SFFTv3, and the DFT_{COMM} algorithms, those should be tested under the same conditions and constraints. Here, we use the following feasible constraints: the values of N vary as follows: N = {2^{6}, 2^{7},…, 2^{20}} and k _{s} = L _{ o }, where L _{ o } represents the number of consecutive output coefficients to be calculated. In different test scenarios, the SFFTv1, SFFTv2, and SFFTv3 algorithms deliver successful results: the first of them for N = {2^{13}, 2^{14},…, 2^{20}} and k _{s} = L _{ o } = 50, the second of them for N = {2^{13}, 2^{14},…, 2^{20}} and k _{s} = L _{ o } = 50, and finally, the third of them for N = {2^{10}, 2^{11},…, 2^{20}} and k _{s} = L _{ o } = 50, respectively. Furthermore, it was experimentally corroborated that the DFT_{COMM} algorithm was able to deliver efficient results in all such tested sparse scenarios, as reported in Table 9.
In addition, DFT computations for other sparse test scenarios with different values of N and k _{s} were run, in particular, for N = L _{ i } = {2^{13}, 2^{14},…, 2^{17}} and k _{s} = L _{ o } = {1, 2, …, k _{smax}} with k _{smax} = 11 % of N. The test scenarios for the SFFT algorithms delivered successful results only for a few tested values of k _{s}. For example, the SFFTv1 algorithm is executed successfully for N = {2^{13}, 2^{15}} and k _{s} = {1, 2,…, 50}, for N = 2^{14} and k _{s} = {1, 2, …, 50} ∪ {56, 57, …, 63}, for N = 2^{16} and k _{s} = {1, 2, …, 50} ∪ {64, 65, …, 97}, and for N = 2^{17} and k _{s} = {1, 2,…, 74}.
The SFFTv2 algorithm is executed successfully for N = {2^{13}, 2^{14},…, 2^{17}} and k _{s} = {1, 2,…, 50}, while the SFFTv3 algorithm performed successfully for N = 2^{13} and k _{s} = {4, 5,…, 673}, for N = 2^{14} and k _{s} = {4, 5,…, 1346}, for N = 2^{15} and k _{s} = {4, 5,…, 2692}, for N = 2^{16} and k _{s} = {4, 5,…, 5385}, and for N = 2^{17} and k _{s} = {4, 5,…, 10,771}. Furthermore, the DFT_{COMM} algorithm is executed successfully for all test cases (for N = {2^{13}, 2^{14},…, 2^{17}} in combination with all k _{s} = {1, 5,…, k _{smax}}), as follows from the data reported in Table 10.
Table 11 reports the absolute average errors attained with the SFFTv1, SFFTv2, SFFTv3, and DFT_{COMM} algorithms, for N = {2^{13}, 2^{14},…, 2^{18}} and k _{s} = L _{ o } = 50. In all test cases, the FFTW algorithm from [29] was used as a reference for computing the absolute error measures.
From the data reported in Table 11, it follows that for N = 8192 and k _{s} = L _{ o } = 50, the SFFTv1 and SFFTv2 algorithms manifest very close absolute error values; in particular, the attained average absolute error values were 5.6162 × 10^{−5} and 5.0689 × 10^{−5}, respectively. However, the SFFTv3 attains lower absolute average error values than the other SFFT versions. It is worth mentioning that the lowest absolute average error was attained with the DFT_{COMM} algorithm at a value of 2.7642 × 10^{−10}.
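An error measure of this kind can be sketched as the mean of the magnitudes of the differences between the computed coefficients and reference coefficients. The Python sketch below is an assumption about the metric, since its exact definition is not restated here; the function names reference_dft and mean_absolute_error are hypothetical, and a direct evaluation of the DFT definition stands in for the FFTW reference [29] so that the example remains self-contained.

```python
import cmath
import math


def reference_dft(x, L_o):
    """First L_o DFT coefficients of x, computed directly from the definition.

    Used here only as a stand-in reference; the paper uses FFTW [29].
    """
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(L_o)]


def mean_absolute_error(computed, reference):
    """Average of |computed[k] - reference[k]| over all compared coefficients."""
    if len(computed) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(abs(c - r) for c, r in zip(computed, reference)) / len(reference)
```

With this definition, an algorithm that reproduces the reference exactly yields an error of zero, and the value scales with the magnitude of the coefficients, so errors are comparable only for the same test signal.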
In addition, Fig. 8 reports the absolute values of errors of the compared tested SFFTv3 and the DFT_{COMM} algorithms for N = 8192 and k _{s} = L _{ o } = 50 under the same sparse computing scenarios.
On the other hand, the SFFT-related algorithms demonstrate reliable operation only for specific input parameter combinations, i.e., they are dependent on the combination of the dimension N of the input signal x and the sparsity factor k _{s}. In contrast, the DFT_{COMM} algorithm manifests operational robustness in the sense that it is not subject to any such dimensional limitation and demonstrated perfect operational performances in all tested harsh (nonsparse) computational scenarios. Furthermore, all SFFT-related algorithms are probabilistic-type techniques [20, 21], in which the desired k _{s} largest coefficients of the Fourier spectrum of the input sequence are reconstructed (approximated) with a high probability (not necessarily with probability one). In contrast, the DFT_{COMM} algorithm is a deterministic technique, and it produces more reliable and accurate results than the family of the SFFT-related algorithms (as demonstrated in Fig. 8 and Tables 10 and 11).
It is also worthwhile to note that presently (in the sparsity-guaranteed computational scenarios only), the SFFT-related algorithms outperform the DFT_{COMM} in computational speed due to their specially devised execution parallelism [20, 21, 28]. From the family of the SFFT-related algorithms, the SFFTv3 [28] manifests the highest computational speed-up for any input sequence dimension N and any feasible value k _{s} in the sparsity-guaranteed scenarios only; in particular, when only approximately 8.2 % (or fewer) of the Fourier coefficients of the input signal are significant and thus not discarded (as shown in Table 10). In contrast, in all comparable (sparse or nonsparse) computational scenarios, the DFT_{COMM} algorithm manifested superior accuracy performances (lower absolute error values) than those attained with the SFFT-related algorithms.
In closing, it is worth mentioning that in a majority of practical computational scenarios, the savings in the number of arithmetic operations achievable with the optimized unified DFT_{COMM} technique are significant. As a concluding example, refer to the test scenario with N = 8192 and L _{ i } = L _{ o } = 307, in which case the savings in the total number of required arithmetic operations attainable with the DFT_{COMM} algorithm in comparison with the most prominent competing split-radix FFT algorithm [3, 14, 23, 24] constitute 45 %.
Conclusions
We have developed a new technique that carries out an efficient computation of the DFTs of composite lengths of the input and/or output data sequences smaller than the dimension N of the full DFT/FFT. The addressed methodology unifies the commuting, filtering, and pruning paradigms, yielding the new DFT_{COMM} method that outperforms the existing competing pruning-decomposition-based techniques in the sense of attainable savings in the number of required arithmetic operations.
Furthermore, our DFT_{COMM} method admits computing the DFT_{ P } blocks at the intermediate stage of the pruned decomposed transform using any existing FFT algorithm. Based on the performed treatment of the combinational hypotheses testing-type problem regarding all feasible allocation-pruning modalities, the decision in favor of the preferable hypothesis was made that yields the proposed DFT_{COMM} method. Being the globally optimal decision-making result of testing the complete list of all feasible hypotheses, the DFT_{COMM} method is guaranteed to require fewer or at most the same number of arithmetic operations for its execution than any other of the competing pruning-decomposition-based methods reported in the literature.
In addition, we have corroborated that, in the scenarios with nonguaranteed sparsity of the data Fourier spectra, the DFT_{COMM} method manifests better reliability and accuracy than the family of the celebrated competing SFFT-related algorithms, while in scenarios with severe Fourier spectrum nonsparsity (i.e., when the majority of the data Fourier spectrum coefficients take nonzero values and thus cannot be discarded), the DFT_{COMM} technique always outperforms the celebrated SFFT-related algorithms because all of those simply fail to execute the program code in such uncertain computational scenarios.
References
 1.
J Markel, FFT pruning. Audio and Electroacoustics, IEEE Transactions on 19(4), 305, 311 (1971). doi:10.1109/TAU.1971.1162205
 2.
V Raghavan, KMM Prabhu, PCW Sommen, Complexity of pruning strategies for the frequency domain LMS algorithm. Signal Processing 86(10), 2836–2843 (2006). ISSN 0165–1684, http://dx.doi.org/10.1016/j.sigpro.2005.11.015
 3.
Y Xu, MS Lim, Split-radix FFT pruning for the reduction of computational complexity in OFDM based cognitive radio system, in Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), 69–72, May 30–June 2 2010. doi: 10.1109/ISCAS.2010.5537048.
 4.
FM Henderson, AV Lewis (eds.), Principles and applications of imaging radar, manual of remote sensing, vol. 3, 3rd edn. (Wiley, NY, 1998)
 5.
HH Barrett, KJ Myers, Foundations of image science (Wiley, NY, 2004)
 6.
YV Shkvarko, Unifying experiment design and convex regularization techniques for enhanced imaging with uncertain remote sensing data––part I: theory, part II: adaptive implementation and performance issues. IEEE Trans. Geoscience and Remote Sensing 48(1), 82–111 (2010)
 7.
A Moni, CJ Bean, I Lokmer, S Rickard, Source separation on seismic data. IEEE Signal Processing Magazine 29(3), 16–28 (2012)
 8.
RM Willet, MF Duarte, MA Davenport, RG Baraniuk, Sparsity and structure in hyperspectral imaging. IEEE Signal Processing Magazine 31(1), 116–126 (2014)
 9.
Q Zhu, CR Berger, EL Turner, L Pileggi, F Franchetti, Polar format synthetic aperture radar in energy efficient application-specific logic-in-memory, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 1557–1560, 25–30 March 2012. doi: 10.1109/ICASSP.2012.6288189.
 10.
YV Shkvarko, J Tuxpan, SR Santos, I Yaniez, High-resolution imaging with uncertain radar measurement data: a doubly regularized compressive sensing experiment design approach, in IEEE Intern. Symposium on Geoscience and Remote Sensing (IGRSS’2012), Munich, Germany, 6976–6970. (2012). ISBN: 97814673115951/12
 11.
YV Shkvarko, J Tuxpan, SR Santos, l _{2}l _{1} Structured descriptive experiment design regularization based enhancement of fractional SAR imagery. Signal Processing 93, 3553–3566 (2013). http://dx.doi.org/10.1016/j.sigpro.2013.03.024
 12.
S Foucart, H Rauhut, A mathematical introduction to compressive sensing (Springer, NYHeidelberg, 2013)
 13.
DP Skinner, Pruning the decimation in-time FFT algorithm, in IEEE Transactions on Acoustics, Speech and Signal Processing, 24(2), 193–194 (1976). doi:10.1109/TASSP.1976.1162782
 14.
L Yuan, X Tian, Y Chen, Pruning split-radix FFT with time shift, International Conference on Electronics, Communications and Control (ICECC), 2011, 1581–1586, 9–11 Sept. 2011. doi: 10.1109/ICECC.2011.6066654.
 15.
S Bouguezel, MO Ahmad, MNS Swamy, Efficient pruning algorithms for the DFT computation for a subset of output samples, in Proceedings of the 2003 International Symposium on Circuits and Systems, 2003. ISCAS '03, vol.4, pp. IV97, IV100 vol.4, 25–28 May 2003. doi: 10.1109/ISCAS.2003.1205782.
 16.
CP Fan, GA Su, Pruning fast Fourier transform algorithm design using group-based method, Signal Processing 87(11), 2781–2798 (2007), ISSN 0165-1684, http://dx.doi.org/10.1016/j.sigpro.2007.05.012
 17.
TV Sreenivas, P Rao, FFT algorithm for both input and output pruning, in IEEE Transactions on Acoustics, Speech and Signal Processing, 27(3), 291–292 (1979). doi:10.1109/TASSP.1979.1163246
 18.
C Roche, A split-radix partial input/output fast Fourier transform algorithm, in IEEE Transactions on Signal Processing, 40(5), 1273, 1276 (1992). doi:10.1109/78.134493
 19.
L Wang, X Zhou, GE Sobelman, R Liu, Generic mixed-radix FFT pruning, in IEEE Signal Processing Letters, 19(3), 167, 170 (2012). doi:10.1109/LSP.2012.2184283
 20.
H Hassanieh, P Indyk, D Katabi, E Price, Simple and practical algorithm for sparse Fourier transform, in Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms (SODA '12) Kyoto, Japan, 17–19 Jan, 1183–1194, (2012)
 21.
H Hassanieh, P Indyk, D Katabi, E Price, Nearly optimal sparse Fourier transform, in Proceedings of the forty-fourth annual ACM symposium on Theory of computing (STOC '12), ACM, New York, (2012), 563–578. doi:10.1145/2213977.2214029. http://doi.acm.org/10.1145/2213977.2214029
 22.
M Iwen, A Gilbert, M Strauss et al., Empirical evaluation of a sublinear time sparse DFT algorithm, Communications in Mathematical Sciences 5(4), 981–998 (2007)
 23.
HV Sorensen, CS Burrus, Efficient computation of the DFT with only a subset of input or output points, in IEEE Transactions on Signal Processing, 41(3), 1184–1200 (1993). doi:10.1109/78.205723
 24.
M Medina-Melendrez, M Arias-Estrada, A Castro, Input and/or output pruning of composite length FFTs using a DIF-DIT transform decomposition, in IEEE Transactions on Signal Processing, 57(10), 4124, 4128 (2009). doi:10.1109/TSP.2009.2024855
 25.
AV Oppenheim, RW Schafer, Discrete-time signal processing, (Prentice Hall, 2nd Edition, U.S., 1999)
 26.
HV Sorensen, M Heideman, CS Burrus, On computing the split-radix FFT, in IEEE Transactions on Acoustics, Speech and Signal Processing, 34(1), 152–156 (1986). doi:10.1109/TASSP.1986.1164804
 27.
Y Suzuki, S Toshio, K Kido, A new FFT algorithm of radix 3,6, and 12, in IEEE Transactions on Acoustics, Speech and Signal Processing, 34(2), 380–383 (1986). doi:10.1109/TASSP.1986.1164826
 28.
J Schumacher, M Püschel, High performance sparse fast Fourier transform, Master's thesis, ETH Zurich, Department of Computer Science (2013).
 29.
M Frigo, SG Johnson, The design and implementation of FFTW3, Proceedings of the IEEE 93(2), 216–231 (2005) doi:10.1109/JPROC.2004.840301
Acknowledgements
The authors would like to thank the anonymous reviewers for their constructive criticism and comments that helped to improve the presentation of the paper.
Additional information
Competing interests
The authors declare that they have no competing interests.
Appendix
Main function
Fig. 9 presents the pseudocode of the main function that commutes among the different alternatives to compute the DFT_{ N } (DFT_{COMM}). When the pruned decomposed transform is not required (L _{ i } ≤ D _{ op } or L _{ o } ≤ D _{ ip }), the direct method or the 2BF filtering method could be employed. In both cases, the Fourier coefficient X(0) is computed as a simple addition of the elements in the input sequence x(n).
The directFourier function is used in the scenarios with L _{ i } < 4 to compute the remaining Fourier coefficients (k = 1:1:L _{ o } − 1). The 2BF filtering method is implemented when L _{ i } ≥ 4. The directFourier function carries out the addition of complex multiplications of elements in x(n) by the complex exponential W _{ N } ^{nk} defined in (1). The filterFourier function computes each Fourier coefficient by implementing a recursive algorithm similar to the second-order Goertzel algorithm of [25]. In the filterFourier function, the feedback signal is multiplied by the real part of the complex exponentials W _{ N } ^{k} and, next, by the conjugate of W _{ N } ^{m}. In our modification, the array of complex exponentials W _{ N } ^{m} is precomputed for m = 0:1:N − 1 and stored twice in the vector W of length 2N (W = [W _{ N } ^{m}, W _{ N } ^{m}]), in such a way that W _{ N } ^{nk} and W _{ N } ^{k} can be read from it using nk and k as indexes, respectively. An out-of-range access to the vector W cannot occur in these cases, as verified next. L _{ i } is less than 4 (or equivalently L _{ i } ≤ 3) when the direct method is used; thus n ≤ L _{ i } − 1 ≤ 2 and k ≤ L _{ o } − 1 ≤ N − 1, and consequently nk ≤ 2(N − 1) < 2 N. This ensures that each element of W _{ N } ^{nk} can be extracted from W just by accessing the element indexed by nk. Similarly, for k ≤ L _{ o } − 1 ≤ N − 1, each element of W _{ N } ^{k} is directly extracted from W by accessing the element indexed by k. In order to avoid multiplications in the generation of the index nk, the latter is computed by adding k to nk in each iteration of the loop over n (inside the function directFourier).
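The table-lookup scheme described above can be sketched in Python as follows. This is a minimal illustration rather than the pseudocode of Fig. 9: the helper name twiddle_table and the snake_case name direct_fourier are hypothetical, W _{ N } is assumed to be exp(−j2π/N), and the input length is restricted to at most three samples so that the index nk never exceeds 2(N − 1).

```python
import cmath
import math


def twiddle_table(N):
    """Complex exponentials W_N^m for m = 0..N-1, stored twice (length 2N)."""
    base = [cmath.exp(-2j * math.pi * m / N) for m in range(N)]
    return base + base


def direct_fourier(x, k, W):
    """Direct computation of X(k) for an input with at most three samples.

    The table index nk is advanced by adding k on each iteration,
    so no index multiplication is needed.
    """
    N = len(W) // 2
    if len(x) > 3:
        raise ValueError("the direct method is used only for L_i < 4")
    if not 0 <= k < N:
        raise ValueError("k must satisfy 0 <= k <= N - 1")
    acc = 0j
    nk = 0
    for sample in x:
        acc += sample * W[nk]   # nk <= 2 * (N - 1) < 2N, always in range
        nk += k
    return acc
```

Duplicating the table trades N extra stored values for the removal of a modulo operation on the index, which is the design choice the text motivates.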
In the scenarios with L _{ i } > D _{ op } and L _{ o } > D _{ ip }, the DFT_{DIT−DIF−Pr} is performed to compute the DFT_{ N }. As it was explained previously, the DFT_{DIT−DIF−Pr} is performed in three commuting stages: the input stage, the intermediate stage, and the output stage. These stages are executed in a sequential order by calling the InputStage function, next the IntermediateStage function, and, finally, the OutputStage function.
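The commutation logic described in this subsection reduces to a few comparisons. The Python sketch below restates it; the function name select_modality and the returned string labels are hypothetical, and the sketch covers only the top-level choice among the three modalities, not the choice of method inside the output stage of the pruned decomposed transform.

```python
def select_modality(L_i, L_o, D_ip, D_op):
    """Pick the modality used to compute the pruned DFT_N.

    Returns one of the labels "direct", "2BF", or "pruned-decomposed"
    (labels chosen for this sketch).
    """
    if L_i <= D_op or L_o <= D_ip:
        # The pruned decomposed transform is not required.
        return "direct" if L_i < 4 else "2BF"
    # L_i > D_op and L_o > D_ip: input, intermediate, and output stages.
    return "pruned-decomposed"
```

For instance, with the illustrative factors D _{ ip } = 8 and D _{ op } = 16, an input of L _{ i } = 3 samples selects the direct method, L _{ i } = 8 selects the 2BF filtering method, and L _{ i } = 100 with L _{ o } = 50 selects the pruned decomposed transform.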
InputStage function
The InputStage function generates the inputs to the intermediate D _{ ip } D _{ op } DFTs of length P (DFT_{ P }s), resulting in a three-dimensional array y(n _{1}, n _{2}, k _{1}). The pseudocode for implementing the InputStage function is listed in Fig. 10. The indexes n _{1}, n _{2}, and k _{1} are varied using three nested loops (“for” instructions), ordered so that the number of accesses to each element of x(n) is reduced. This is achieved by assigning k _{1} to the inner loop, n _{1} to the intermediate loop, and n _{2} to the outer loop. With this order, once an element x(n _{1} + D _{ op } n _{2}) is loaded, all the inputs of the DFT_{ P }s that depend on it are generated. To minimize the required computations, the nested loops have been split so that multiplications by one and if-clauses are avoided.
To avoid overhead in the generation of the indexes, they are generated using additions only. After the InputStage function has been executed, the intermediate stage should be called.
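The loop ordering and the addition-only indexing can be sketched in Python as follows. This is a sketch under stated assumptions, not a transcription of Fig. 10: it assumes the index mapping n = n _{1} + D _{ op } n _{2}, an input pruned to L _{ i } ≤ D _{ op } P nonzero samples, and the twiddle factor W _{ N } ^{D_op n_2 k_1}, all of which follow from a standard DIT-then-DIF decomposition of the DFT rather than from the figure. The name `input_stage` is hypothetical, and the loop splitting that removes the multiplications by one is omitted for brevity.

```python
import cmath

def input_stage(x, N, D_ip, D_op):
    """Build y[n1][n2][k1] from the L_i = len(x) nonzero inputs (assumed mapping)."""
    P = N // (D_ip * D_op)
    L_i = len(x)
    if L_i > D_op * P:
        raise ValueError("sketch assumes pruned input: L_i <= D_op * P")
    W = [cmath.exp(-2j * cmath.pi * m / N) for m in range(N)]
    y = [[[0j] * D_ip for _ in range(P)] for _ in range(D_op)]
    base = 0                          # D_op * n2, advanced by addition
    for n2 in range(P):               # outer loop
        for n1 in range(D_op):        # intermediate loop
            n = base + n1             # n = n1 + D_op * n2
            if n >= L_i:
                break                 # remaining inputs are zero
            xn = x[n]                 # each input sample is loaded once
            idx = 0                   # D_op * n2 * k1, advanced by addition
            for k1 in range(D_ip):    # inner loop
                y[n1][n2][k1] = xn * W[idx]
                idx += base
        base += D_op
    return y
```

Because D _{ op }(P − 1)(D _{ ip } − 1) < N, the index `idx` never leaves the table at the point where it is read, so no modulo reduction is needed in this sketch.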
IntermediateStage function
The intermediate stage consists of computing D _{ ip } D _{ op } DFTs of length P = N/(D _{ ip } D _{ op }). This stage can be implemented with any algorithm for computing a DFT. For instance, the split-radix algorithm could be used if P is a power of two [26], or the radix-3 algorithm if P is a power of three [27]. For the general case, we recommend using the FFTW (the fastest Fourier transform in the west) reported in [29] to compute the D _{ ip } D _{ op } DFT_{ P }s, since it is a highly efficient implementation for DFTs of arbitrary length. The selected algorithm should be applied to each vector y(n _{1}, 0 : 1 : P − 1, k _{1}), for each value of n _{1} and k _{1}; the result is a vector with output index k _{2} that is stored in the array z(n _{1}, k _{2}, k _{1}). This array is then processed by the OutputStage function.
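The data movement of this stage can be sketched in Python as follows. The sketch uses a naive O(P²) DFT purely as a stand-in for the split-radix, radix-3, or FFTW routines mentioned above; the names `naive_dft` and `intermediate_stage` and the `dft_p` parameter are hypothetical, and the nested-list layout of y and z is an assumption.

```python
import cmath

def naive_dft(v):
    """Reference length-P DFT; a stand-in for any fast DFT_P routine."""
    P = len(v)
    return [sum(v[n] * cmath.exp(-2j * cmath.pi * n * k / P) for n in range(P))
            for k in range(P)]

def intermediate_stage(y, dft_p=naive_dft):
    """Apply a length-P DFT along n2 for every (n1, k1); return z[n1][k2][k1]."""
    D_op, P, D_ip = len(y), len(y[0]), len(y[0][0])
    z = [[[0j] * D_ip for _ in range(P)] for _ in range(D_op)]
    for n1 in range(D_op):
        for k1 in range(D_ip):
            column = [y[n1][n2][k1] for n2 in range(P)]   # y(n1, 0:P-1, k1)
            spectrum = dft_p(column)
            for k2 in range(P):
                z[n1][k2][k1] = spectrum[k2]
    return z
```

Passing the transform as a parameter keeps the stage independent of the algorithm chosen for the DFT_{ P }s, so a faster routine can be substituted without touching the loops.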
OutputStage function
The OutputStage function computes the final Fourier coefficients from the outputs of the D _{ ip } D _{ op } DFT_{ P }s stored in z(n _{1}, k _{2}, k _{1}). This function is listed in Fig. 11. In effect, the OutputStage function computes another stage of DFTs, although with only a few outputs. As previously mentioned, each Fourier coefficient can be computed from z(n _{1}, k _{2}, k _{1}) in two ways: by a direct computation or by the 2BF filtering method. Thus, the OutputStage function can employ either the directFourier or the filterFourier function listed in the pseudocode of Fig. 9 to compute the final Fourier coefficients.
Each Fourier coefficient depends on D _{ op } inputs (obtained from z(n _{1}, k _{2}, k _{1}) by varying n _{1}), so for D _{ op } < 4, the direct method is desirable; otherwise, the 2BF filtering method is to be executed.
The nested loops of Fig. 11 should be implemented in the indicated order to specify the indexes of the final Fourier coefficients. Those indexes are obtained by incrementing the index k by one in each iteration of the loop indexed by k _{1}. The directFourier function uses the complex exponential W _{ N } ^{nk}, while the filterFourier function uses the complex exponential W _{ N } ^{k}. All elements W _{ N } ^{k} and W _{ N } ^{nk} are read from W using k and nk as indexes, respectively. To avoid redundant copies of data and thus improve the efficiency of the algorithm, it is strongly recommended to implement the functions inline and to pass array elements by reference instead of by value.
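A minimal Python sketch of the output combination is given next. It is not the pseudocode of Fig. 11: it assumes the output index mapping k = k _{1} + D _{ ip } k _{2}, a pruned output with L _{ o } ≤ D _{ ip } P, and a reversed-order Goertzel-type recursion for the filtering path. The names `combine_direct`, `combine_filter`, and `output_stage` are hypothetical.

```python
import cmath

def combine_direct(values, k, W):
    """Direct sum of values[n1] * W_N^(n1*k); the index grows by additions only."""
    acc = complex(values[0])
    nk = 0
    for n1 in range(1, len(values)):
        nk += k
        acc += values[n1] * W[nk]
    return acc

def combine_filter(values, k, W):
    """Same sum through a second-order recursion with a real feedback coefficient."""
    wk = W[k]
    c = 2.0 * wk.real
    s1 = s2 = 0j
    for v in reversed(values):
        s1, s2 = v + c * s1 - s2, s1
    return s1 - wk.conjugate() * s2

def output_stage(z, N, L_o):
    """Final coefficients X(0..L_o-1) from z[n1][k2][k1] (assumed k = k1 + D_ip*k2)."""
    D_op, P, D_ip = len(z), len(z[0]), len(z[0][0])
    if L_o > D_ip * P:
        raise ValueError("sketch assumes pruned output: L_o <= D_ip * P")
    W = [cmath.exp(-2j * cmath.pi * m / N) for m in range(N)]
    combine = combine_direct if D_op < 4 else combine_filter   # rule stated above
    X = []
    k = 0
    for k2 in range(P):
        for k1 in range(D_ip):        # k increases by one per inner iteration
            if k == L_o:
                return X
            X.append(combine([z[n1][k2][k1] for n1 in range(D_op)], k, W))
            k += 1
    return X
```

In this sketch the largest table index is (D _{ op } − 1)(L _{ o } − 1) < N, so a table of length N suffices; the list comprehension that gathers the D _{ op } inputs copies data and would be replaced by in-place access in an implementation that follows the inlining advice above.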
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Castro-Palazuelos, D., Medina-Melendrez, M., Torres-Roman, D. et al. Unified commutation-pruning technique for efficient computation of composite DFTs. EURASIP J. Adv. Signal Process. 2015, 100 (2015) doi:10.1186/s13634-015-0285-z
Keywords
 Composite length discrete Fourier transform
 Decimation
 Decomposition
 Fast Fourier transform
 Pruning