Unified commutation-pruning technique for efficient computation of composite DFTs
 David E. Castro-Palazuelos^{1, 2},
 Modesto Gpe. Medina-Melendrez^{1},
 Deni L. Torres-Roman^{2} and
 Yuriy V. Shkvarko^{2}
https://doi.org/10.1186/s136340150285z
© Castro-Palazuelos et al. 2015
Received: 30 March 2015
Accepted: 11 November 2015
Published: 26 November 2015
Abstract
An efficient computation of a composite length discrete Fourier transform (DFT), as well as a fast Fourier transform (FFT), of both time and space data sequences in uncertain (nonsparse or sparse) computational scenarios requires specific processing algorithms. Traditional algorithms typically employ some pruning methods without any commutations, which prevents them from attaining the potential computational efficiency. In this paper, we propose an alternative unified approach with automatic commutations between three computational modalities aimed at efficient computation of the pruned DFTs adapted for variable composite lengths of the nonsparse input-output data. The first modality is an implementation of the direct computation of a composite length DFT, the second one employs the second-order recursive filtering method, and the third one performs the new pruned decomposed transform. The pruned decomposed transform algorithm performs decimation in the time or space (DIT) data acquisition domain and then decimation in frequency (DIF). The unified combination of these three algorithms is addressed as the DFT_{COMM} technique. By treating the allocation among all feasible commuting-pruning modalities as a combinational-type hypothesis testing optimization problem, we have found the globally optimal solution to the pruning problem, which always requires fewer or, at most, the same number of arithmetic operations as the other feasible modalities. The DFT_{COMM} method outperforms the existing competing pruning techniques in terms of attainable savings in the number of required arithmetic operations: it requires fewer or, at most, the same number of arithmetic operations for its execution as any of the competing pruning methods reported in the literature. Finally, we provide a comparison of the DFT_{COMM} with the recently developed sparse fast Fourier transform (SFFT) algorithmic family.
We show that, in sensing scenarios with sparse or nonsparse data Fourier spectra, the DFT_{COMM} technique is robust against such model uncertainties, in the sense of insensitivity to sparsity/nonsparsity restrictions and to the variability of the operating parameters.
Keywords
1 Introduction
1.1 Motivation
Many signal processing applications require computation of the so-called pruned discrete Fourier transform (DFT), i.e., an efficient alternative for computing the required DFT when the input sequence and/or the required output sequence are shorter than the length of the full DFT (a full DFT means that all the output components are to be computed, and all the input elements are used to compute the transform); in the literature, those are referred to as pruned fast Fourier transforms (FFTs) or pruned DFTs [1]. Common practical examples relate to, e.g., the least mean squared (LMS) optimal DFT-based pruned signal filtering [2] and the complexity-reduced computational implementation of orthogonal frequency division multiplexing systems [3]. Another practical example relates to efficient implementation of the matched spatial filtering (MSF) algorithm for performing the range and azimuth data compression in unfocused or fractionally focused synthetic aperture radar (SAR) systems, both of which employ the pruned DFT-based MSF processing of the trajectory data signals performed in a factorized fashion in the so-called slow time and fast time data acquisition scales [4–6]. Other examples relate to DFT-based analysis of remote sensing (RS) data acquired with a variety of sensor systems, ranging from seismology [7] to multispectral radiometry [8]. Zhu et al. [9] proposed an algorithm for performing SAR polar format regridding interpolation suited for the logic-in-memory paradigm (a hardware/architecture solution), together with the design automation tool chain needed to implement their algorithm (e.g., FFTs for image formation) in advanced silicon technology. It is important to note that a majority of real-world RS data acquisition and processing problems can be qualified as sensing in harsh environments [4–8, 10, 11] in the sense of intrinsic problem model uncertainties peculiar to such RS modalities.
In the context of pruned DFTs, realistic harsh sensing scenarios are characterized by the uncertainties attributed to zero-padded input data acquisition modes with variable composite length windowing of the input and/or output Fourier transform sequences and, in general, with nonsparse Fourier spectra [10–12]. Those specifics motivate the development of efficient pruned DFT/FFT techniques particularly adapted for computational implementation with uncertain data acquired in harsh sensing scenarios.
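To make the notion of a pruned DFT concrete, the following minimal Python sketch computes only the first L_{o} coefficients of a length-N DFT from L_{i} nonzero input samples and can be checked against the full transform of the zero-padded sequence. The function names, and the assumption that both the nonzero inputs and the required outputs start at index 0, are illustrative choices and are not taken from the paper.

```python
import cmath

def pruned_dft(x_nonzero, n_full, l_o):
    """First l_o coefficients of the length-n_full DFT of x_nonzero,
    treated as if it were zero-padded to n_full samples (assumed layout)."""
    return [
        sum(v * cmath.exp(-2j * cmath.pi * n * k / n_full)
            for n, v in enumerate(x_nonzero))
        for k in range(l_o)
    ]

def full_dft(x):
    """Reference full DFT computed directly from the definition."""
    n_full = len(x)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * n * k / n_full)
            for n, v in enumerate(x))
        for k in range(n_full)
    ]
```

Here the pruned version touches only L_{i} L_{o} input-output pairs instead of N^2, which is the kind of saving the pruning techniques discussed in this paper aim to exploit in a more structured way.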
1.2 Related work
Traditional DFT algorithms adapted for such uncertain scenarios typically employ some pruning methods without any commutations, which prevents them from attaining the potential computational efficiency. Most of the proposals reported in the literature are based on the construction of pruning modalities of specific FFT-related algorithms. Some of them prune the input of a specific FFT algorithm, others prune the output, and just a few can prune the input and output (input-output) at the same time. Markel [1] and Skinner [13] proposed input pruning methods based on radix-2 FFTs, while Yuan et al. [14] proposed input pruning of a split-radix FFT. The approaches of Bouguezel et al. [15] and Fan et al. [16] are applicable for output pruning of radix-2 FFTs, while the proposal of Xu et al. [3] suggests pruning the output of a split-radix FFT. In addition, Sreenivas et al. [17], Roche [18], and Wang et al. [19] developed methods for pruning the input and output at the same time. The first one is based on a radix-2 FFT, the second one employs the split-radix FFT, and the third one performs the mixed-radix FFT. A majority of those methods are applicable only for computing DFTs whose length is a power of two, which drastically restricts their applicability to general uncertain sensing scenarios.
On the other hand, a family of novel so-called sparse FFT (SFFT) algorithms, adapted to computing FFTs when only a few Fourier spectrum coefficients of the input signal are different from zero (the few largest coefficients of the Fourier transform spectrum), has been developed recently [20, 21]. The celebrated SFFT-related algorithms, the so-called SFFTv1 and SFFTv2, were reported by Hassanieh et al. in [20]. Later, in [21], the improved SFFT-related versions, addressed as SFFTv3 and SFFTv4, were reported. Another algorithm that considers the Fourier spectrum sparsity restrictions is the so-called FADFT2, reported and implemented in the AAFFT library [22]. However, the SFFT-related algorithms significantly outperform the AAFFT, as was corroborated in [20].
It is worthwhile to mention that the SFFT-related techniques are applicable only to sparse sensing scenarios; e.g., in [20, 21], the authors exemplified the sparsity level by imposing the restriction that up to 89 % of the Fourier coefficients are zero or negligible and thus can be discarded. Such a restriction could be valid in a variety of data compression applications, e.g., compression and recovery of video data not degraded by noise and/or by the imaging system instrumental function [20]. Nevertheless, such a sparsity restriction is not valid for many real-world operational scenarios, e.g., processing of RS data acquired in harsh sensing environments [4–8, 10–12]. For example, in SAR imaging of nonhomogeneous scenes, e.g., urban areas, nonuniformly textured zones, etc., a majority of the Fourier transform coefficients should be considered for feature-enhanced MSF-based imaging [5, 6]; thus, an 89 % sparsity-level restriction is never a feasible model assumption.
In this paper, we are interested in developing pruned DFT algorithms (DFTs of highly composite length) applicable to near-real-time signal processing and analysis in uncertain sensing scenarios (i.e., with non-guaranteed sparsity of the data Fourier spectra); that is why the family of SFFT-related techniques is beyond our detailed study here. Nevertheless, for the purpose of generality, in Section 4, we perform a comparative analysis of our developed methods with the SFFT under the same conditions and constraints for different combinations of the specified processing/operational parameters.
In the related literature in which feasible nonsparse scenarios are considered, two competing approaches for pruning composite (non-prime) length DFTs were addressed. Sorensen et al. [23] proposed two methods to prune composite length DFTs, the first one to prune the input and the other one to prune the output. Next, the methodology of Medina-Melendrez et al. [24] merges the methods developed originally in [23] to obtain a composite structure that is capable of pruning the input and/or output of a general decomposed transform at the same time. It was demonstrated in [24] that such a computational structure could be as efficient as the one based on specific FFT algorithms [15–17]. In [24], a new methodology for decomposition of a composite length DFT was proposed as a modification of Sorensen’s approach [23]. Furthermore, [24] suggests performing, first, decimation in frequency (DIF) and, second, decimation in time (DIT). For processing of spatial data, the corresponding decimation in the space domain should be performed similarly to the DIT operation for time data processing. To avoid misunderstandings, in the rest of the paper, we use the same abbreviation (DIT) for both processing models and consider time data processing as the principal model. Nevertheless, all developments are directly transferable to the space data processing scenario.
Hence, the three basic stages to compute the composite length DFTs of nonsparse data encompass the input, the intermediate, and the output stages. The decomposed transform is then pruned by eliminating, from the input and output stages, additions and multiplications by zero, multiplications by one, and all other computations not needed to obtain the required Fourier transform coefficients. In [24], such a multistage decomposed and pruned transform is referred to as FFT_{DIF−DIT−TD} (here, that method is referred to as DFT_{DIF−DIT−Pr}). Nevertheless, neither of the methods addressed in [23, 24] achieves the lowest attainable number of required arithmetic operations. A possible alternative for computing a few Fourier coefficients from a few input elements (all nonzero, thus nonsparse) is the second-order Goertzel algorithm [23] modified to accept the input elements in reverse order.
1.3 Novel contributions
The main contribution of this paper consists in the development of a new alternative method for efficient computing of a composite length DFT, when the input sequence and/or the required output sequence are smaller than the length of the full transform. Our proposal guarantees the same or smaller number of arithmetic operations in comparison with the competing methods in the literature. Moreover, it manifests robustness against sparsity/nonsparsity restrictions and the variability of the operating parameters as detailed in Sections 3 and 4.
The innovative idea is to automatically commute among three modalities to implement the DFT: the direct method, the recursive method, and the pruned decomposed transform. Thus, our new proposed composite approach unifies the decomposition of the DFT with its pruning. First, we develop an alternative technique to compute the pruned decomposed transform, in which the DIT is performed at the first stage followed by the DIF. We address this method as DFT_{DIT−DIF−Pr}. An analysis of the two alternatives (DFT_{DIT−DIF−Pr} and DFT_{DIF−DIT−Pr}) verifies that the DFT_{DIT−DIF−Pr} requires a smaller or, at most, equal number of arithmetic operations compared with the DFT_{DIF−DIT−Pr}, so the use of the DFT_{DIT−DIF−Pr} is strongly recommended when the decomposed and pruned transforms are required. Next, we demonstrate that our proposal requires a lower number of arithmetic operations than any of the pruning-based competing methods [3, 14, 23, 24]. Further, we demonstrate that both decomposed transforms (DFT_{DIF−DIT−Pr} and DFT_{DIT−DIF−Pr}) can be obtained from a general decomposition methodology. Also, the proposed method manifests robustness in sparse and nonsparse sensing scenarios (i.e., operability for an arbitrary number of consecutive input elements (L_{i}), number of consecutive outputs that should be computed (L_{o}), and length of the full transform (N)), in contrast to the recently developed most prominent SFFT family-related methods [20, 21], which are operable in sparse scenarios only.
It is noteworthy that, in the majority of practical computational scenarios, the proposed technique achieves significant savings in the number of arithmetic operations; e.g., in Section 4.1, the DFT_{COMM} technique compared with the split-radix FFT (SRFFT) algorithm produces savings of 42 to 92 %.
The rest of the paper is organized as follows: in Section 2, the general decomposition transform methodology is described and explained. An analysis of all feasible transform decomposition methods is presented next in Section 3, followed by the combinational hypotheses testing optimization-based selection of the best decomposition transform permutation modality that yields the unified commutation-pruning DFT_{COMM} technique. In Section 4, comparisons between the developed unified commutation-pruning technique and other competing algorithms, in the sense of savings in the number of required arithmetic operations, are presented. Also, the proposed DFT_{COMM} method is compared in detail with the most prominent competing SFFT-related algorithms in the context of computing the DFTs in both sparse and nonsparse (harsh) sensing scenarios for different values of the operational parameters (L_{i}, L_{o}, and N). Concluding remarks in Section 5 summarize the study. The Appendix provides a pseudocode for implementing the proposed method.
2 DFT transform decomposition
The computing of the pruned decomposed transform (7) requires, first, application of DIT to the DFT_{ N } with D _{ op } as a decomposition factor and, then, application of DIF to the resulting DFTs with D _{ ip } as a decomposition factor.
The DFT_{DIT−DIF−Pr} involves three processing stages: an input stage (computation of y(n _{1}, n _{2}, k _{1})), an intermediate stage (computation of D _{ ip } D _{ op } DFTs of length P), and an output stage (computation of the complex multiplications and additions dependent on index n _{1}).
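The three-stage structure can be sketched in Python as follows. This is an unpruned reconstruction under stated assumptions: Eq. (7) is not reproduced in this section, so the index mapping (n = D_{op}(P q + n_{2}) + n_{1} for the inputs and k mod (N/D_{op}) = D_{ip} k_{2} + k_{1} for the outputs), the helper names, and the butterfly index q are illustrative choices made so that the input-stage array matches the y(n_{1}, n_{2}, k_{1}) notation used above.

```python
import cmath

def w(n, e):
    """Twiddle factor W_n^e = exp(-2j*pi*e/n)."""
    return cmath.exp(-2j * cmath.pi * e / n)

def dft(x):
    """Direct DFT, used for the length-P blocks and as a reference."""
    n = len(x)
    return [sum(x[m] * w(n, m * k) for m in range(n)) for k in range(n)]

def dft_dit_dif(x, d_ip, d_op):
    """Unpruned three-stage DFT of length N = d_ip * d_op * P (sketch)."""
    n_full = len(x)
    p = n_full // (d_ip * d_op)
    assert d_ip * d_op * p == n_full, "d_ip * d_op must divide N"
    m_len = n_full // d_op  # length of each DIT sub-sequence

    blocks = {}
    for n1 in range(d_op):       # DIT branch: samples x[d_op * m + n1]
        for k1 in range(d_ip):   # DIF branch: outputs d_ip * k2 + k1
            # Input stage: length-d_ip butterfly over q, then twiddle
            # W_M^(n2 * k1), which equals one when n2 == 0 or k1 == 0.
            y = [
                w(m_len, n2 * k1)
                * sum(x[d_op * (p * q + n2) + n1] * w(d_ip, q * k1)
                      for q in range(d_ip))
                for n2 in range(p)
            ]
            # Intermediate stage: one of the d_ip * d_op DFTs of length P.
            blocks[n1, k1] = dft(y)

    # Output stage: combine the d_op branches with twiddles W_N^(n1 * k).
    out = []
    for k in range(n_full):
        k2, k1 = divmod(k % m_len, d_ip)
        out.append(sum(w(n_full, n1 * k) * blocks[n1, k1][k2]
                       for n1 in range(d_op)))
    return out
```

Pruning would then skip the terms of the input stage that involve zero-valued samples and the output-stage sums for coefficients that are not required; the sketch keeps every term so that it can be compared directly with the full DFT.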
3 Proposed method
Our method employs three different alternatives to compute the DFT_{N}: a direct method, a recursive method, and/or a pruned decomposed transform. Admissible permutations/allocations of all feasible decomposition-pruning modalities compose all possible hypotheses regarding the feasible alternative schemes for computing the composite DFTs.
Table 1 Complete list of hypotheses \( {\left\{{\mathrm{H}}_h\right\}}_{h=1}^{12} \) regarding feasible commuting-pruning implementation structures
Hypotheses

H_{1} : y = Ax
H_{2} : y = Bx
H_{3} : y = Fx
H_{4} : y = ABx
H_{5} : y = BAx
H_{6} : y = ABFx
H_{7} : y = BAFx
H_{8} : y = Dx
H_{9} : H_{4} ∪ H_{6}
H_{10} : H_{5} ∪ H_{7}
H_{11} : H_{9} ∪ H_{3} ∪ H_{8}
H_{12} : H_{10} ∪ H_{3} ∪ H_{8}
Specifications

x = input
y = output
A = DIF
B = DIT
F = 2BF filtering
D = DFT_{N}
Later, a more efficient input-output pruning method for composite length DFTs was developed in [24]. Such commuting between H_{4} and H_{6} (H_{4} ∪ H_{6}) leads to hypothesis H_{9}, as featured in Fig. 3a. In [24], such a technique was constructed as a modification of the transform decomposition proposed originally by Sorensen et al. [23], but with the extra capability to perform the input-output pruning at the same time. Additionally, the computation of each final output employs a commutation between a direct method and the 2BF filtering algorithm, i.e., the 2BF filtering algorithm is an efficient method for computing a subset of final outputs from their decomposition transform [23, 24].
Table 2 Total number of arithmetic operations required to compute the input and output stages of DFT_{DIF−DIT−Pr} and DFT_{DIT−DIF−Pr}
Pruned decomposed transform  Corresponding term in (8)  Number of required real arithmetic operations  Limiting constraints

DFT_{DIF−DIT−Pr}  OPER_{input}  6(L_{i} − 1)(D_{ip} − 1)  –
DFT_{DIF−DIT−Pr}  OPER_{output}  2L_{o}(D_{op} − 1)  (L_{o} ≤ D_{ip})
DFT_{DIF−DIT−Pr}  OPER_{output}  2L_{o}(D_{op} − 1) + 6(L_{o} − D_{ip})(D_{op} − 1)  (L_{o} > D_{ip}) & (D_{op} < 4)
DFT_{DIF−DIT−Pr}  OPER_{output}  (L_{o} − D_{ip})(2D_{op} + 2) + 2D_{ip}(D_{op} − 1) + (L_{o} − D_{ip})(4D_{op} − 2)^{a}  (L_{o} > D_{ip}) & (D_{op} ≥ 4)
DFT_{DIT−DIF−Pr}  OPER_{input}  0  (L_{i} ≤ D_{op})
DFT_{DIT−DIF−Pr}  OPER_{input}  6(L_{i} − D_{op})(D_{ip} − 1)  (L_{i} > D_{op})
DFT_{DIT−DIF−Pr}  OPER_{output}  6(L_{o} − 1)(D_{op} − 1) + 2L_{o}(D_{op} − 1)  (D_{op} < 4)
DFT_{DIT−DIF−Pr}  OPER_{output}  (L_{o} − 1)(2D_{op} + 2) + 2(D_{op} − 1) + (L_{o} − 1)(4D_{op} − 2)^{a}  (D_{op} ≥ 4)
Table 3 Total number of arithmetic operations required to compute the input and output stages of DFT_{COMM−DIF−DIT−Pr} and DFT_{COMM−DIT−DIF−Pr} modalities
Method to compute the DFT_{N}  Number of arithmetic operations  Limiting conditions

Direct method  6(L_{o} − 1)(L_{i} − 1) + 2L_{o}(L_{i} − 1)  ((L_{i} ≤ D_{op}) | (L_{o} ≤ D_{ip})) & (L_{i} < 4)
2BF filtering method  (L_{o} − 1)(2L_{i} + 2) + 2(L_{i} − 1) + (L_{o} − 1)(4L_{i} − 2)  ((L_{i} ≤ D_{op}) | (L_{o} ≤ D_{ip})) & (L_{i} ≥ 4)
Pruned decomposed transform  OPER_{tot} of (8) using Table 2  (L_{i} > D_{op}) & (L_{o} > D_{ip})
The selection of the proper permutation/allocation structure directly relates to the problem considered above of selecting an optimal commutation-pruning implementation structure, cast and treated as a combinational hypotheses testing task. All feasible hypotheses \( {\left\{{\mathrm{H}}_h\right\}}_{h=1}^{12} \) relate to the formal implementation structures specified in Table 1. Now, we are ready to find the best permutation/allocation structure in the sense of the imposed quality measure (in our case, the lowest possible number of required arithmetic operations).
3.1 Analysis of the hypotheses
According to (8), OPER_{tot} depends on L _{ i }, L _{ o }, N, D _{ ip }, D _{ op }, and the algorithm employed to implement the D _{ ip } D _{ op } DFT_{ P } blocks (OPER_{DFTP}).
At the input and output stages, there are multiplications by one; those multiplications are avoided altogether in our approach. Which multiplications by one are avoided depends on whether the DFT_{DIF−DIT−Pr} or the DFT_{DIT−DIF−Pr} modality is executed as the pruned decomposed transform.
For the DFT_{DIF−DIT−Pr} modality:

At the input stage, the multiplications by one are excluded when n_{1} = n_{2} = 0 and k_{1} = 0.

Furthermore, the multiplications by one at the output stage are also avoided when n_{1} = 0 or k_{2} = 0.
Therefore, the DFT_{DIF−DIT−Pr} modality always requires fewer complex multiplications to compute the output stage than the DFT_{DIT−DIF−Pr} modality (this is reported in Tables 2 and 3).
For the DFT_{DIT−DIF−Pr} modality:

At the input stage, the multiplications by one are excluded when n_{2} = 0 or k_{1} = 0.

Also, at the output stage, the multiplications by one are avoided when k_{1} = k_{2} = 0 or n_{1} = 0.
Therefore, the DFT_{DIT−DIF−Pr} modality always requires fewer complex multiplications at the input stage than the DFT_{DIF−DIT−Pr} modality (as corroborated by the analysis reported in Tables 2 and 3).
The output stage of both pruned decomposed transform modalities can be computed by the direct addition of complex multiplications or by a recursive algorithm of the kind proposed in [23] (referred to as the 2BF filtering method), which reduces the number of required multiplications by about half. The number of arithmetic multiplications required by the output stage of the DFT_{DIF−DIT−Pr} algorithm is equal to 4(L_{o} − D_{ip})(D_{op} − 1) when (L_{o} > D_{ip}) and (D_{op} < 4), and to (L_{o} − D_{ip})(2D_{op} + 2) when (L_{o} > D_{ip}) and (D_{op} ≥ 4). Thus, the 2BF filtering algorithm can be effectively used to compute the output stage.
On the other hand, the number of arithmetic multiplications required to compute the output stage of the DFT_{DIT−DIF−Pr} algorithm is equal to 4(L_{o} − 1)(D_{op} − 1) when (D_{op} < 4), and to (L_{o} − 1)(2D_{op} + 2) when (D_{op} ≥ 4). Thus, the 2BF filtering algorithm can also be effectively employed to compute the output stage.
In [23], it was proven that the 2BF filtering method is more efficient than the direct addition of complex multiplications when the number of input elements is larger than 4 (when the number of input elements is equal to 4, both methods have the same operational complexity). The output stages of both pruned decomposed transforms have the same structure, so the same sort of commutation is required to efficiently compute the output stage of the DFT_{DIT−DIF−Pr}. The expressions for OPER_{input} and OPER_{output} for the DFT_{DIF−DIT−Pr} and the DFT_{DIT−DIF−Pr} are listed in Table 2, where it is implicitly assumed that each complex multiplication requires six arithmetic operations (four real multiplications and two real additions), and each complex addition requires two arithmetic operations (two real additions).
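As a worked check of the crossover at four input elements, the direct-method and 2BF expressions of Table 3 (with L_{i} nonzero inputs and L_{o} required outputs) can be compared term by term. The labels OPER_{direct} and OPER_{2BF} are introduced here only for illustration:
\( {\mathrm{OPER}}_{\mathrm{direct}} = 6\left(L_o-1\right)\left(L_i-1\right)+2L_o\left(L_i-1\right) = 8\left(L_o-1\right)\left(L_i-1\right)+2\left(L_i-1\right) \)
\( {\mathrm{OPER}}_{2\mathrm{BF}} = \left(L_o-1\right)\left(2L_i+2\right)+2\left(L_i-1\right)+\left(L_o-1\right)\left(4L_i-2\right) = 6L_i\left(L_o-1\right)+2\left(L_i-1\right) \)
\( {\mathrm{OPER}}_{\mathrm{direct}}-{\mathrm{OPER}}_{2\mathrm{BF}} = 2\left(L_o-1\right)\left(L_i-4\right) \)
The difference vanishes for L_{i} = 4 and is positive for L_{i} > 4 whenever more than one output is required, which agrees with the threshold stated above.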
The performance of the pruned decomposed transforms depends on the decomposition factors, D_{ip} and D_{op}. A simple analysis can be carried out to deduce which decomposition factors are preferable. Our unified commutation-pruning method decomposes the DFT_{N} into three stages of smaller dimension DFTs and prunes those inputs that are equal to zero and/or those outputs that are not needed to compute the final Fourier coefficients.
Thus, the decomposed transform algorithm always selects a pair (D_{ip}, D_{op}) for which the largest DFTs can be successfully pruned or, equivalently, a pair (D_{ip}, D_{op}) for which the intermediate stage results in the smallest dimension DFTs.
The DFTs of the intermediate stage have a size of N/(D_{ip} D_{op}) ≡ P, so D_{ip} and D_{op} should be chosen as large as possible. Furthermore, the values of the decomposition factors should satisfy the bound N/D_{ip} ≥ L_{i} (where N/D_{ip} must be close to, but not lower than, L_{i}) and N/D_{op} ≈ L_{o}, as was considered in the derivation of (5). Hence, the decomposed transform algorithm uses the pair of decomposition factors (D_{ip}, D_{op}) closest to (N/L_{i}, N/L_{o}), in the sense of the Euclidean distance, that satisfies D_{ip} ≤ N/L_{i}.
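The selection rule just described can be sketched as a brute-force search. This is a minimal illustration, not the authors' implementation: the function name, the exhaustive enumeration of divisor pairs, and the tie-breaking by first occurrence are assumptions, as is the requirement that D_{ip} D_{op} divide N so that P is an integer.

```python
import math

def choose_factors(n_full, l_i, l_o):
    """Return the divisor pair (d_ip, d_op) closest, in Euclidean distance,
    to (N/L_i, N/L_o), subject to d_ip <= N/L_i and d_ip * d_op dividing N."""
    target_ip = n_full / l_i
    target_op = n_full / l_o
    best = None
    for d_ip in range(1, n_full + 1):
        if n_full % d_ip or d_ip > target_ip:
            continue  # not a divisor of N, or violates the bound on d_ip
        rest = n_full // d_ip
        for d_op in range(1, rest + 1):
            if rest % d_op:
                continue  # d_ip * d_op would not divide N
            dist = math.hypot(d_ip - target_ip, d_op - target_op)
            if best is None or dist < best[0]:
                best = (dist, d_ip, d_op)
    return best[1], best[2]
```

For example, with N = 64, L_{i} = 8, and L_{o} = 16, the target point is (8, 4), which is itself an admissible pair and yields intermediate DFTs of length P = 2.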

If L _{ i } ≤ D _{ op }, at most one input of each DFT_{ P } (i.e., the first one) in the intermediate stage would be applied; therefore, their P outputs would be replicas of that single input.

For L _{ o } ≤ D _{ ip }, only the first output of each DFT_{ P } (this corresponds to a simple addition of the input elements) is required to compute the final Fourier coefficients.
Thus, the inequality-type constraints L_{i} ≤ D_{op} or L_{o} ≤ D_{ip} correspond to inefficient implementations of the DFT_{P}s. In these cases, our method commutes to the direct computation of the DFT_{N} or to an efficient recursive alternative (the 2BF filtering technique).
Sorensen et al. [23] proposed a method to compute a subset of the output components of their specific DFT decomposition; this algorithm was referred to as the 2BF filtering method. The 2BF filtering method [23] was derived as a modification of the previously addressed Goertzel algorithm [25]. The 2BF filtering method takes advantage of the periodicity and the shifted cyclic convolution shape between the input sequence and the \( {W}_N^{nk}={e}^{-j\left(2\pi /N\right)kn} \) factor.
The poles of the system transfer function (the roots of the polynomial in the denominator of H(z)) have to be evaluated L times (n = 0, 1, 2,…, L − 1), while the zeros of the system transfer function (the roots of the numerator of H(z)) have to be evaluated only once. Here, L represents the number of consecutive nonzero input elements of the 2BF filter or, in the opposite case, the number of consecutive nonzero output components of the employed pruned decomposed transform modality (DFT_{DIF−DIT−Pr} or DFT_{DIT−DIF−Pr}).
The computation of each pole of (9) requires two arithmetic multiplications (two real multiplications) and two arithmetic additions (two real additions). Furthermore, the computation of the zeros of (9) requires four arithmetic multiplications and four arithmetic additions only.
The Q_{1} node in Fig. 4 is initialized with f(L − 1); therefore, the computation starts from n = L − 2. When n = 0, only the complex addition of the input is required; then, the zero is computed after such a delay. Such computational organization saves two arithmetic multiplications and six arithmetic additions in finding each required output component.
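The recursion described above can be illustrated with a generic second-order Goertzel-type evaluation that consumes the input in reverse order. This sketch is not the exact structure of Fig. 4: the function name and the variable names are hypothetical, and the code only reproduces the general idea of running a real-coefficient pole section over the samples n = L − 1, …, 1 and applying the complex zero once, together with the addition of the sample at n = 0.

```python
import cmath
import math

def goertzel_reversed(x, k, n_full):
    """Evaluate X[k] = sum_{n < L} x[n] * exp(-2j*pi*k*n/N) for L = len(x) <= N.

    The feedback (pole) section uses only the real coefficient 2*cos(theta);
    the complex factor exp(-1j*theta) (the zero) is applied once at the end.
    """
    theta = 2.0 * math.pi * k / n_full
    c = 2.0 * math.cos(theta)
    z = cmath.exp(-1j * theta)
    b1 = 0j  # state holding b_{n+1}
    b2 = 0j  # state holding b_{n+2}
    for n in range(len(x) - 1, 0, -1):  # reverse order, stops before n = 0
        b1, b2 = x[n] + c * b1 - b2, b1
    return x[0] + z * b1 - b2  # only an addition of x[0], then the zero
```

Because the loop multiplies complex states by a real coefficient only, each iteration costs two real multiplications, whereas a first-order (Horner-type) recursion would need a full complex multiplication per sample.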

The structure of the DFT_{DIF−DIT−Pr} contains D_{ip} sets of D_{op} DFT_{P}s from which the final outputs are computed (see the general diagram in Fig. 1).

The DFT_{COMM−DIF−DIT−Pr} algorithm employs the 2BF filtering method to implement the output stage of DFT_{DIF−DIT−Pr} with L = L_{o}, if ((L_{i} > D_{op}) & (L_{o} > D_{ip})) & ((L_{o} > D_{ip}) & (D_{op} ≥ 4)) (as featured in Tables 2 and 3). Here, the required arithmetic operations are specified as follows: the number of arithmetic multiplications is equal to NumArithMult_{2BF} = (L_{o} − D_{ip})(2D_{op} + 2), and the number of arithmetic additions is equal to NumArithAdd_{2BF} = 2D_{ip}(D_{op} − 1) + (L_{o} − D_{ip})(4D_{op} − 2).

Furthermore, the DFT_{COMM−DIF−DIT−Pr} algorithm employs the 2BF filtering method exclusively with L = L_{i}, if ((L_{i} ≤ D_{op}) | (L_{o} ≤ D_{ip})) & (L_{i} ≥ 4) (as featured in Table 3) to compute the required Fourier coefficients. Here, the required arithmetic operations are specified as follows: NumArithMult_{2BF} = (L_{o} − 1)(2L_{i} + 2) and NumArithAdd_{2BF} = 2(L_{i} − 1) + (L_{o} − 1)(4L_{i} − 2).

The structure of the DFT_{DIT−DIF−Pr} contains D_{op} sets of D_{ip} DFT_{P}s from which the final outputs are computed (as featured in Fig. 2).

The DFT_{COMM−DIT−DIF−Pr} algorithm employs the 2BF filtering method to implement the output stage of DFT_{DIT−DIF−Pr} with L = L_{o}, if ((L_{i} > D_{op}) & (L_{o} > D_{ip})) & (D_{op} ≥ 4) (as featured in Tables 2 and 3). Here, the required arithmetic operations are specified as follows: NumArithMult_{2BF} = (L_{o} − 1)(2D_{op} + 2) and NumArithAdd_{2BF} = 2(D_{op} − 1) + (L_{o} − 1)(4D_{op} − 2).

On the other hand, the DFT_{COMM−DIT−DIF−Pr} algorithm employs the 2BF filtering method exclusively with L = L_{i}, if ((L_{i} ≤ D_{op}) | (L_{o} ≤ D_{ip})) & (L_{i} ≥ 4) (as reported in Table 3) to compute the required Fourier coefficients. Here, the required arithmetic operations are specified as follows: NumArithMult_{2BF} = (L_{o} − 1)(2L_{i} + 2) and NumArithAdd_{2BF} = 2(L_{i} − 1) + (L_{o} − 1)(4L_{i} − 2).
The computation of each input and/or output element in both cases detailed above is executed according to the diagram presented in Fig. 4. In closing, we note that the pseudocode presented in the Appendix (see Fig. 9) contains all scripts needed to compute each Fourier coefficient employing the 2BF filtering method.
Note once again that the 2BF filtering method has to be employed if L_{i} is larger than or equal to 4, in which case it is at least as efficient as (and, for L_{i} > 4, more efficient than) the direct method for computing the DFT_{N} in (1). The total number of arithmetic operations required by our proposed method is reported in Table 3.
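The commutation rule and the associated operation counts can be summarized in a short Python sketch. It follows the limiting conditions of Table 3 and the DFT_{DIT−DIF−Pr} rows of Table 2; the function name and the returned labels are hypothetical, and the logical "or" between (L_{i} ≤ D_{op}) and (L_{o} ≤ D_{ip}) is an assumption based on the constraints stated earlier in this section. For the pruned decomposed transform, only the input and output stages are counted here (the intermediate DFT_{P} blocks are excluded).

```python
def comm_stage_ops(l_i, l_o, d_ip, d_op):
    """Return (modality, real-operation count) for the commutation rule.

    The count for the pruned decomposed transform covers only the input
    and output stages of the DIT-DIF modality.
    """
    if l_i <= d_op or l_o <= d_ip:
        if l_i < 4:
            # Direct method: complex multiplications plus complex additions.
            return "direct", 6 * (l_o - 1) * (l_i - 1) + 2 * l_o * (l_i - 1)
        # 2BF filtering method applied with L = L_i.
        ops = (l_o - 1) * (2 * l_i + 2) + 2 * (l_i - 1) + (l_o - 1) * (4 * l_i - 2)
        return "2BF", ops
    oper_input = 6 * (l_i - d_op) * (d_ip - 1)
    if d_op < 4:
        oper_output = 6 * (l_o - 1) * (d_op - 1) + 2 * l_o * (d_op - 1)
    else:
        # Output stage implemented with the 2BF filtering method.
        oper_output = ((l_o - 1) * (2 * d_op + 2) + 2 * (d_op - 1)
                       + (l_o - 1) * (4 * d_op - 2))
    return "pruned DIT-DIF", oper_input + oper_output
```

Calling the function with L_{i} = 3 and with L_{i} = 4 for otherwise identical parameters shows the switch from the direct method to the 2BF filtering method at the threshold discussed above.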
3.2 Selection of the permutation/allocation structure
The graphs in Fig. 5 indicate that the number of operations required to perform our commutation-pruning technique (DFT_{COMM−DIF−DIT−Pr} and DFT_{COMM−DIT−DIF−Pr}) with the decomposition factors selected using the roughDP method is equal to or slightly greater than that obtained when the decomposition factors are specified employing exhDP. The differences correspond to the regions where the commutation conditions prescribe performing the pruned decomposed transform instead of the 2BF filtering method.
The DFT_{DIT−DIF−Pr} modality requires the same or a smaller number of arithmetic operations than the competing DFT_{DIF−DIT−Pr} in all the cases where the pruned decomposed transform is performed (as follows from the data reported in Fig. 5). Since the same decomposition factors (D_{ip}, D_{op}) are used in both pruned decomposed transforms, it is sufficient to compare the number of operations required by their input and output stages (OPER_{input} + OPER_{output}) reported in Table 2 to determine which one is the most efficient. The comparison for the cases L_{i} ≤ D_{op} and L_{o} ≤ D_{ip} is not needed since, in such scenarios, a direct or recursive method is employed instead of a pruned decomposed transform. For scenarios with D_{op} < 4, both pruned decomposed transforms require the same number of arithmetic operations for their execution. Otherwise, for D_{op} ≥ 4, the execution of DFT_{DIF−DIT−Pr} requires 2D_{ip} D_{op} − 8D_{ip} − 2D_{op} + 8 more arithmetic operations than DFT_{DIT−DIF−Pr}, demonstrating that the latter always manifests the same or a better performance. Thus, from the combinational permutation analysis, it follows that it is always desirable to perform the DFT_{DIT−DIF−Pr} when a pruned decomposed transform is required. In the following section, an efficient implementation of the proposed unified commutation-pruning technique is detailed, considering that the pruned decomposed transform is implemented using the DFT_{DIT−DIF−Pr}. In summary, we conclude that the performed combinational hypothesis testing-based optimal selection of the preferable computational structure of the decomposed DFTs made the decision in favor of hypothesis H_{12}; this yields the proposed DFT_{COMM−DIT−DIF−Pr} method (referred to further on, for simplicity, as DFT_{COMM}) with the highest possible computational efficiency.
Being the optimal decision of the performed “brute force search”-based testing of all feasible hypotheses, this method is guaranteed to be the globally optimal one and thus is strongly recommended for performing the required commuting between the three techniques that implement the overall composite DFT: the direct method, the recursive method, and the pruned decomposed transform implemented via DFT_{DIT−DIF−Pr}.
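The stated difference between the two pruned decomposed transforms can be checked numerically from the Table 2 expressions. The sketch below is only a verification aid under the assumption that L_{i} > D_{op} and L_{o} > D_{ip} (the region where a pruned decomposed transform is performed); the function names are hypothetical. Note that 2D_{ip} D_{op} − 8D_{ip} − 2D_{op} + 8 factors as 2(D_{ip} − 1)(D_{op} − 4), which is nonnegative for D_{op} ≥ 4.

```python
def ops_dif_dit(l_i, l_o, d_ip, d_op):
    """OPER_input + OPER_output of DFT_DIF-DIT-Pr (Table 2), for l_o > d_ip."""
    oper_input = 6 * (l_i - 1) * (d_ip - 1)
    if d_op < 4:
        oper_output = 2 * l_o * (d_op - 1) + 6 * (l_o - d_ip) * (d_op - 1)
    else:
        oper_output = ((l_o - d_ip) * (2 * d_op + 2) + 2 * d_ip * (d_op - 1)
                       + (l_o - d_ip) * (4 * d_op - 2))
    return oper_input + oper_output

def ops_dit_dif(l_i, l_o, d_ip, d_op):
    """OPER_input + OPER_output of DFT_DIT-DIF-Pr (Table 2), for l_i > d_op."""
    oper_input = 6 * (l_i - d_op) * (d_ip - 1)
    if d_op < 4:
        oper_output = 6 * (l_o - 1) * (d_op - 1) + 2 * l_o * (d_op - 1)
    else:
        oper_output = ((l_o - 1) * (2 * d_op + 2) + 2 * (d_op - 1)
                       + (l_o - 1) * (4 * d_op - 2))
    return oper_input + oper_output

def extra_ops_dif_dit(l_i, l_o, d_ip, d_op):
    """Additional operations of DFT_DIF-DIT-Pr relative to DFT_DIT-DIF-Pr."""
    return ops_dif_dit(l_i, l_o, d_ip, d_op) - ops_dit_dif(l_i, l_o, d_ip, d_op)
```

The difference does not depend on L_{i} or L_{o}, which is why comparing the input and output stages alone is sufficient.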
4 Comparison with other competing algorithms
A variety of competing methods for pruning the DFTs in arbitrary (nonsparse) computational scenarios have been addressed in the literature (see [1, 3, 13–19, 23, 24]). In [24], the FFT_{DIF−DIT−TD} modality (that we here refer to as DFT_{DIF−DIT−Pr}) was proposed as an alternative technique for pruning the input and/or the output of DFTs. That method [24] was compared with other pruning techniques reported in the literature until 2009. Comparisons of the methods proposed by Bouguezel et al. [15], Fan et al. [16], Sreenivas et al. [17], Roche [18], and the DFT_{DIF−DIT−Pr} reported in [24] demonstrated that the DFT_{DIF−DIT−Pr} modality requires fewer arithmetic operations than those of [15–17], while attaining an operational performance similar to that of [18]. Additionally, in Section 3, it was corroborated that our proposed DFT_{COMM} technique requires an equal or smaller number of arithmetic operations than [24]. Below, we compare our approach with the recently reported most prominent competing pruning methods.
4.1 Comparisons with pruningbased algorithms
Table 4 Total number of arithmetic operations required to compute the SRFFT_{pruning}, SRFFT_{pruning−time−shift}, and DFT_{COMM} algorithms
Algorithm  Number of required real arithmetic operations  Conditions

SRFFT_{pruning}  \( 6\left\{ \sum_{k=1}^{d}\left[{N_B}_{(k)}{N_W}_{(k)}\right] + \frac{1}{2}\left({N_B}_{(d+1)}{N_W}_{(d+1)}\right) \right\} + 2\left\{ N\cdot d + \sum_{k=0}^{r-(d+1)}\left[L{2}^k\right] \right\} \)  –
SRFFT_{pruning−time−shift}  \( 6\left\{ \sum_{k=1}^{r-d}\left[{N_B}_{(k)}2L\right] + \sum_{k=r-d+1}^{r-1}\left\{{N_B}_{(k)}\left[{N_W}_{(k)}+2\right]\right\} \right\} + 2\left\{ N\left(d-1\right)+N \right\} \)  –
DFT_{COMM}  6(L _{ o } − 1)(L _{ i } − 1) + 2L _{ o }(L _{ i } − 1)  {(L _{ i } ≤ D _{ op })(L _{ o } ≤ D _{ ip })} & (L _{ i } < 4)
DFT_{COMM}  (L _{ o } − 1)(2L _{ i } + 2) + 2(L _{ i } − 1) + (L _{ o } − 1)(4L _{ i } − 2)  {(L _{ i } ≤ D _{ op })(L _{ o } ≤ D _{ ip })} & (L _{ i } ≥ 4)
DFT_{COMM}  6(L _{ i } − D _{ op })(D _{ ip } − 1) + D _{ ip } D _{ op }(4P log_{2}(P) − 6P + 8) + 6(L _{ o } − 1)(D _{ op } − 1) + 2L _{ o }(D _{ op } − 1)  {(L _{ i } > D _{ op }) & (L _{ o } > D _{ ip })} & (D _{ op } < 4)
DFT_{COMM}  6(L _{ i } − D _{ op })(D _{ ip } − 1) + D _{ ip } D _{ op }(4P log_{2}(P) − 6P + 8) + (L _{ o } − 1)(2D _{ op } + 2) + 2(D _{ op } − 1) +  {(L _{ i } > D _{ op }) & (L _{ o } > D _{ ip })} & (D _{ op } ≥ 4)
Table 5 Savings in the number of arithmetic operations attained with the DFT_{COMM} algorithm in comparison with the competing SRFFT(no-prun) and SRFFT_{pruning} methods for N = 262,144
DFT_{COMM} in comparison with:  L _{ o }  L _{ i }  Savings

SRFFT(no-prun)  {2^{1}, 2^{2},…, N}  N = 2^{18}  DFT_{COMM}, 42.76 % with output pruning
SRFFT_{pruning}  {2^{1}, 2^{2},…, N}  N = 2^{18}  DFT_{COMM}, 2.96 % with output pruning
SRFFT(no-prun)  {2^{1}, 2^{2},…, N}  1027  DFT_{COMM}, 75.02 % with input-output pruning at the same time
SRFFT_{pruning}  {2^{1}, 2^{2},…, N}  1027  SRFFT_{pruning} fails to deliver a result
SRFFT(no-prun)  {2^{1}, 2^{2},…, N}  33  DFT_{COMM}, 91.35 % with input-output pruning at the same time
SRFFT_{pruning}  {2^{1}, 2^{2},…, N}  33  SRFFT_{pruning} fails to deliver a result
Table 6 Savings in the number of arithmetic operations attained with the DFT_{COMM} algorithm in comparison with the competing SRFFT(no-prun) and SRFFT_{pruning} methods for N = 1024
DFT_{COMM} in comparison with:  L _{ o }  L _{ i }  Savings

SRFFT(no-prun)  {2^{1}, 2^{2},…, N}  N = 2^{10}  DFT_{COMM}, 36.48 % with output pruning
SRFFT_{pruning}  {2^{1}, 2^{2},…, N}  N = 2^{10}  DFT_{COMM}, 2.73 % with output pruning
SRFFT(no-prun)  {2^{1}, 2^{2},…, N}  90  DFT_{COMM}, 59.30 % with input-output pruning at the same time
SRFFT_{pruning}  {2^{1}, 2^{2},…, N}  90  SRFFT_{pruning} fails to deliver a result
SRFFT(no-prun)  {2^{1}, 2^{2},…, N}  13  DFT_{COMM}, 81.65 % with input-output pruning at the same time
SRFFT_{pruning}  {2^{1}, 2^{2},…, N}  13  SRFFT_{pruning} fails to deliver a result
From Fig. 6, one can deduce that our proposed DFT_{COMM} method requires fewer arithmetic operations than the competing SRFFT_{pruning} method in almost all the test cases (the only exceptions being the cases L _{ o } = N/2 and L _{ o } = N/4). Next, Tables 5 and 6 report the savings in the number of arithmetic operations attained with our DFT_{COMM} in comparison with the competing SRFFT and SRFFT_{pruning} techniques. In the scenarios with L _{ i } = N and L _{ o } = {2^{1}, 2^{2},…, N}, the DFT_{COMM} algorithm achieves 2.96 and 2.73 % savings in the number of arithmetic operations in comparison with the SRFFT_{pruning} for N = {262,144, 1024}, respectively.
In other cases, from Table 5, it follows that in the scenarios with L _{ i } = 1027, L _{ i } = 33, and L _{ o } = {2^{1}, 2^{2},…, N}, the SRFFT_{pruning} method fails to deliver a result at all. Thus, from Table 5, it follows that in the cases when L _{ i } = N = 262,144, L _{ i } = 1027, L _{ i } = 33, and L _{ o } = {2^{1}, 2^{2},…, N}, the DFT_{COMM} algorithm produces savings of 42.76, 75.02, and 91.35 %, respectively, in the number of arithmetic operations required to compute the composite length DFT in comparison with the competing SRFFT algorithm. Furthermore, from Table 6, it follows that in the scenarios with L _{ i } = 90, L _{ i } = 13, and L _{ o } = {2^{1}, 2^{2},…, N}, the SRFFT_{pruning} method fails to deliver a result at all. Thus, from Table 6, it follows that in the cases when L _{ i } = N = 1024, L _{ i } = 90, L _{ i } = 13, and L _{ o } = {2^{1}, 2^{2},…, N}, the DFT_{COMM} algorithm produces savings of 36.48, 59.30, and 81.65 %, respectively, in the number of arithmetic operations required to compute the composite length DFT in comparison with the competing SRFFT algorithm.
Yuan et al. [14] proposed another competing method, the so-called SRFFT_{pruning−time−shift}, by modifying the SRFFT_{pruning} with a time-shifting approach; this yields an input pruning algorithm based on the SRFFT methodology for L consecutive nonzero input elements. It is worth stressing that the SRFFT_{pruning−time−shift} approach implicitly assumes that the lengths L and N are powers of two.
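The power-of-two restriction on L and N mentioned above can be illustrated with a minimal Python sketch. The helper name `is_power_of_two` is hypothetical, and the example only checks the length constraint itself; it does not reproduce any part of the SRFFT_{pruning−time−shift} algorithm, nor does it claim to explain every failing test case.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive integer power of two."""
    return n > 0 and (n & (n - 1)) == 0


# Lengths that appear in the test scenarios of this section.
for length in (13, 33, 90, 1027, 1024, 262144):
    status = "power of two" if is_power_of_two(length) else "not a power of two"
    print(f"{length}: {status}")
```

Lengths such as 13, 33, 90, and 1027 are not powers of two, whereas N = 1024 and N = 262,144 are.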
Table 7 Savings in the number of arithmetic operations attained with the DFT_{COMM} algorithm in comparison with the competing SRFFT(no-prun) and SRFFT_{pruning−time−shift} methods for N = 262,144
DFT_{COMM} in comparison with:  L _{ i }  L _{ o }  Savings

SRFFT(no-prun)  {2^{1}, 2^{2},…, N}  N = 2^{18}  DFT_{COMM}, 43.26 % with input pruning
SRFFT_{pruning−time−shift}  {2^{1}, 2^{2},…, N}  N = 2^{18}  DFT_{COMM}, 5.11 % with input pruning
SRFFT(no-prun)  {2^{1}, 2^{2},…, N}  1027  DFT_{COMM}, 76.24 % with input-output pruning at the same time
SRFFT_{pruning−time−shift}  {2^{1}, 2^{2},…, N}  1027  SRFFT_{pruning−time−shift} fails to deliver a result
SRFFT(no-prun)  {2^{1}, 2^{2},…, N}  33  DFT_{COMM}, 92.11 % with input-output pruning at the same time
SRFFT_{pruning−time−shift}  {2^{1}, 2^{2},…, N}  33  SRFFT_{pruning−time−shift} fails to deliver a result
Table 8 Savings in the number of arithmetic operations attained with the DFT_{COMM} algorithm in comparison with the competing SRFFT(no-prun) and SRFFT_{pruning−time−shift} methods for N = 1024
DFT_{COMM} in comparison with:  L _{ i }  L _{ o }  Savings

SRFFT(no-prun)  {2^{1}, 2^{2},…, N}  N = 2^{10}  DFT_{COMM}, 38.22 % with input pruning
SRFFT_{pruning−time−shift}  {2^{1}, 2^{2},…, N}  N = 2^{10}  DFT_{COMM}, 8.71 % with input pruning
SRFFT(no-prun)  {2^{1}, 2^{2},…, N}  90  DFT_{COMM}, 59.22 % with input-output pruning at the same time
SRFFT_{pruning−time−shift}  {2^{1}, 2^{2},…, N}  90  SRFFT_{pruning−time−shift} fails to deliver a result
SRFFT(no-prun)  {2^{1}, 2^{2},…, N}  13  DFT_{COMM}, 82.45 % with input-output pruning at the same time
SRFFT_{pruning−time−shift}  {2^{1}, 2^{2},…, N}  13  SRFFT_{pruning−time−shift} fails to deliver a result
In other test cases, from Tables 7 and 8, it follows that for L _{ o } = {1027, 90}, L _{ o } = {33, 13}, and L _{ i } = {2^{1}, 2^{2},…, N}, the SRFFT_{pruning−time−shift} algorithm fails to deliver a result at all. Furthermore, from Table 7, it follows that in the scenarios with L _{ o } = {N, 1027, 33} and L _{ i } = {2^{1}, 2^{2},…, N}, our DFT_{COMM} attains 43.26, 76.24, and 92.11 % savings for N = 262,144, respectively, in the number of arithmetic operations required to compute the composite length DFT. In addition, from Table 8, it follows that in the scenarios with L _{ o } = {N, 90, 13} and L _{ i } = {2^{1}, 2^{2},…, N}, our DFT_{COMM} attains 38.22, 59.22, and 82.45 % savings for N = 1024, respectively, in the number of arithmetic operations required to compute the composite length DFT.
Note that the savings of our DFT_{COMM} over the competing SRFFT_{pruning} and SRFFT_{pruning−time−shift} algorithms are due to the different butterfly schemes employed to implement the split-radix FFT algorithms [26] and to the unified commutation-pruning technique employed (see Section 3). The SRFFT_{pruning} and SRFFT_{pruning−time−shift} algorithms perform the two-butterfly scheme [26], while our DFT_{DIT−DIF−Pr} algorithm employs the three-butterfly scheme to reduce the number of arithmetic operations required to implement the DFT_{ P } blocks. Furthermore, the graphs of Fig. 6 show that the SRFFT_{pruning} algorithm fails to deliver a result in the scenarios with L equal to N due to its algorithmic construction, as reported by the authors of [3]. For this reason, this algorithm cannot present a valid value for the last test of L _{ o } (it is simply unable to stop pruning). In addition, Fig. 6 reports minimal differences between the numbers of arithmetic operations attained by the DFT_{COMM} evaluated using the roughDP-based or exhDP-based selection for specifying D _{ ip } and D _{ op }. In summary, the number of arithmetic operations required to compute the SRFFT_{pruning}, SRFFT_{pruning−time−shift}, and DFT_{COMM} algorithms can be found in Table 4.
4.2 Comparison with the SFFT-related algorithms
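The first two DFT_{COMM} entries of Table 4 (the direct and the recursive modalities) can be evaluated with the following Python sketch. The function names are hypothetical, and only these two complete rows of the table are transcribed; the pruned decomposed transform rows are deliberately left out. The check at the end is our own illustration of why the switch at L _{ i } = 4 is sensible: the difference between the two counts works out to 2(L _{ o } − 1)(L _{ i } − 4).

```python
def ops_direct(l_i: int, l_o: int) -> int:
    """Real operations of the direct modality (Table 4, case L_i < 4)."""
    return 6 * (l_o - 1) * (l_i - 1) + 2 * l_o * (l_i - 1)


def ops_recursive(l_i: int, l_o: int) -> int:
    """Real operations of the recursive modality (Table 4, case L_i >= 4)."""
    return (l_o - 1) * (2 * l_i + 2) + 2 * (l_i - 1) + (l_o - 1) * (4 * l_i - 2)


def ops_direct_or_recursive(l_i: int, l_o: int) -> int:
    """Commutation rule between the two modalities as stated in Table 4."""
    return ops_direct(l_i, l_o) if l_i < 4 else ops_recursive(l_i, l_o)


# Illustrative check on an arbitrary grid (an assumption of this sketch):
# the rule always selects the cheaper of the two counts.
for l_i in range(1, 40):
    for l_o in range(1, 40):
        diff = ops_direct(l_i, l_o) - ops_recursive(l_i, l_o)
        assert diff == 2 * (l_o - 1) * (l_i - 4)
        assert ops_direct_or_recursive(l_i, l_o) == min(
            ops_direct(l_i, l_o), ops_recursive(l_i, l_o)
        )
```

For example, with L _{ i } = 8 and L _{ o } = 5, the direct count is 238 and the recursive count is 206 real operations.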
In the context of pruned DFTs, real-world sensing scenarios are characterized by uncertainties attributed to zero-padded input data acquisition modes with variable composite-length windowing of the input and/or output Fourier transform sequences and, in general, with a nonsparse Fourier spectrum [10–12]. In contrast, the celebrated SFFT method developed and featured in [20] presumes sparsity of the Fourier spectrum, i.e., that the majority of the Fourier coefficients are zero or negligible; e.g., the authors of [20] exemplified such a sparsity level at approximately 89 %, i.e., up to 89 % of the Fourier transform coefficients are to be zero or negligible for their SFFT to be operable. Otherwise, the DFT should be specified and treated as a nonsparse transform.
Currently, a family of novel efficient algorithms, the so-called SFFT methods, has been developed [20, 21] for computing FFTs in sparse sensing scenarios in which only a few Fourier transform coefficients of the input signal x (the k _{s} largest coefficients of the N-length Fourier transform) differ from zero. To compute a reliable SFFT for typical large N > 2^{10}, the sparsity level constraint requires that the majority of the Fourier coefficients are zero [20] (or negligible enough to be discarded). Such model assumptions are valid, for example, in video compression applications [20]. Therefore, if the majority of the Fourier transform coefficients are supposed to be zero or can be discarded, then efficient computing techniques from the SFFT family can be employed. The celebrated algorithms from this family are the SFFTv1 and the SFFTv2 developed and featured in [20], where the sparsity level was exemplified at 89 % of zero (negligible) Fourier coefficients. In [21], the SFFTv3 and SFFTv4 algorithms were proposed, introducing some computational improvements. SFFTv3 was implemented in [28], while the program code implementing the SFFTv4 algorithm is not available at this time. Another competing technique for computing the FFT of signals that are sparse in the frequency domain was addressed in [22] as the so-called FADFT2 algorithm from the AAFFT library [22]. However, in [20, 21], it was corroborated that the SFFT-related algorithms show better operational performance than FADFT2 of [22].
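To make the notion of a sparsity level concrete, the following Python sketch measures the fraction of negligible DFT coefficients of a short test signal. Everything here is an illustrative assumption rather than part of the cited SFFT implementations: the plain O(N^2) `dft` helper, the `sparsity_level` function, the relative threshold `rel_tol`, and the two-tone test signal.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) DFT, adequate for a short illustrative signal."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def sparsity_level(spectrum, rel_tol=1e-9):
    """Fraction of coefficients whose magnitude is negligible relative to
    the largest one (rel_tol is an assumed threshold)."""
    peak = max(abs(c) for c in spectrum)
    negligible = sum(1 for c in spectrum if abs(c) <= rel_tol * peak)
    return negligible / len(spectrum)


N = 64
# Two real tones on exact DFT bins: only bins 3, 10, 54, and 61 are nonzero.
x = [
    math.cos(2 * math.pi * 3 * t / N) + 0.5 * math.cos(2 * math.pi * 10 * t / N)
    for t in range(N)
]
level = sparsity_level(dft(x))
print(f"sparsity level: {100 * level:.2f} %")
```

In this toy case, 60 of the 64 coefficients are negligible, so the spectrum is well above the roughly 89 % sparsity level quoted from [20]; a signal whose energy spreads over most bins would fall far below it.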
Table 9 Comparisons of the SFFTv1, SFFTv2, SFFTv3, and DFT_{COMM} algorithms for different sizes (N) of the signal x, with N = L _{ i } = {2^{6}, 2^{7},…, 2^{20}} and k _{s} = L _{ o } = 50
L _{ i } = N  k _{s} = L _{ o }  SFFTv1  SFFTv2  SFFTv3  DFT_{COMM} 

2^{6} = 64  50  *  *  *  ✓ 
2^{7} = 128  50  *  *  *  ✓ 
2^{8} = 256  50  *  *  *  ✓ 
2^{9} = 512  50  *  *  *  ✓ 
2^{10} = 1024  50  *  *  ✓  ✓ 
2^{11} = 2048  50  *  *  ✓  ✓ 
2^{12} = 4096  50  *  *  ✓  ✓ 
2^{13} = 8192  50  ✓  ✓  ✓  ✓ 
2^{14} = 16,384  50  ✓  ✓  ✓  ✓ 
2^{15} = 32,768  50  ✓  ✓  ✓  ✓ 
2^{16} = 65,536  50  ✓  ✓  ✓  ✓ 
2^{17} = 131,072  50  ✓  ✓  ✓  ✓ 
2^{18} = 262,144  50  ✓  ✓  ✓  ✓ 
2^{19} = 524,288  50  ✓  ✓  ✓  ✓ 
2^{20} = 1,048,576  50  ✓  ✓  ✓  ✓ 
In addition, DFT computations for other sparse test scenarios with different values of N and k _{s} were run, in particular, for N = L _{ i } = {2^{13}, 2^{14},…, 2^{17}} and k _{s} = L _{ o } = {1, 2, …, k _{smax}} with k _{smax} = 11 % of N. The test scenarios for the SFFT algorithms delivered successful results only for a few tested values of k _{s}. For example, the SFFTv1 algorithm is executed successfully for N = {2^{13}, 2^{15}} and k _{s} = {1, 2,…, 50}, for N = 2^{14} and k _{s} = {1, 2, …, 50} ∪ {56, 57, …, 63}, for N = 2^{16} and k _{s} = {1, 2, …, 50} ∪ {64, 65, …, 97}, and for N = 2^{17} and k _{s} = {1, 2,…, 74}.
Table 10 Comparisons of the SFFTv1, SFFTv2, SFFTv3, and DFT_{COMM} algorithms for different sizes (N) of the signal x, with N = L _{ i } = {2^{13}, 2^{14},…, 2^{17}} and k _{s} = L _{ o } = {1, 2,…, k _{smax}} in the tested sparse scenarios with k _{smax} ≈ 11 % of N
L _{ i } = N  1 ≤ k _{s} ≤ k _{smax}  SFFTv1 

2^{13}  1 ≤ k _{s} ≤ 901  k _{s} = {1, 2,…, 50} 
2^{14}  1 ≤ k _{s} ≤ 1802  k _{ s } = {1, 2, …, 50} ∪ {56, 57, …, 63} 
2^{15}  1 ≤ k _{s} ≤ 3604  k _{s} = {1, 2,…, 50} 
2^{16}  1 ≤ k _{s} ≤ 7208  k _{s} = {1, 2, …, 50} ∪ {64, 65, …, 97} 
2^{17}  1 ≤ k _{s} ≤ 14,417  k _{s} = {1, 2,…, 74} 
L _{ i } = N  1 ≤ k _{ s } ≤ k _{ smax }  SFFTv2 
2^{13}  1 ≤ k _{s} ≤ 901  k _{s} = {1, 2,…, 50} 
2^{14}  1 ≤ k _{s} ≤ 1802  k _{s} = {1, 2,…, 50} 
2^{15}  1 ≤ k _{s} ≤ 3604  k _{s} = {1, 2,…, 50} 
2^{16}  1 ≤ k _{s} ≤ 7208  k _{s} = {1, 2,…, 50} 
2^{17}  1 ≤ k _{s} ≤ 14,417  k _{s} = {1, 2,…, 50} 
L _{ i } = N  1 ≤ k _{ s } ≤ k _{ smax }  SFFTv3 
2^{13}  1 ≤ k _{s} ≤ 901  k _{s} = {4, 5,…, 673} 
2^{14}  1 ≤ k _{s} ≤ 1802  k _{s} = {4, 5,…, 1346} 
2^{15}  1 ≤ k _{s} ≤ 3604  k _{s} = {4, 5,…, 2692} 
2^{16}  1 ≤ k _{s} ≤ 7208  k _{s} = {4, 5,…, 5385} 
2^{17}  1 ≤ k _{s} ≤ 14,417  k _{s} = {4, 5,…, 10,771} 
L _{ i } = N  1 ≤ k _{ s } ≤ k _{ smax }  DFT _{ COMM } 
2^{13}  1 ≤ k _{s} ≤ 901  k _{s} = {1, 2,…, k _{smax}} 
2^{14}  1 ≤ k _{s} ≤ 1802  k _{s} = {1, 2,…, k _{smax}} 
2^{15}  1 ≤ k _{s} ≤ 3604  k _{s} = {1, 2,…, k _{smax}} 
2^{16}  1 ≤ k _{s} ≤ 7208  k _{s} = {1, 2,…, k _{smax}} 
2^{17}  1 ≤ k _{s} ≤ 14,417  k _{s} = {1, 2,…, k _{smax}} 
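The upper bounds k _{smax} listed in the table above can be reproduced with a short Python sketch. Rounding down (floor) is our assumption; it happens to match every listed bound. The helper name `ks_max` and the dictionary of SFFTv3 upper limits (copied from the table) are introduced only for this illustration.

```python
def ks_max(n: int, percent: int = 11) -> int:
    """Largest tested k_s, taken as floor(percent % of n) (assumed rounding)."""
    return (percent * n) // 100


# Largest k_s for which SFFTv3 is listed as executing, copied from the table.
sfftv3_upper = {13: 673, 14: 1346, 15: 2692, 16: 5385, 17: 10771}

for p in range(13, 18):
    n = 2 ** p
    share = 100 * sfftv3_upper[p] / n
    print(f"N = 2^{p}: k_smax = {ks_max(n)}, SFFTv3 limit = {share:.2f} % of N")
```

The SFFTv3 limits all correspond to about 8.2 % of N.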
Table 11 Average absolute errors attained in sparse scenarios with the SFFTv1, SFFTv2, SFFTv3, and DFT_{COMM} algorithms for N = {2^{13}, 2^{14},…, 2^{18}} and k _{s} = L _{ o } = 50
L _{ i } = N  k _{s} = L _{ o }  SFFTv1 AbsError  SFFTv2 AbsError  SFFTv3 AbsError  DFT_{COMM} AbsError

2^{13}  50  5.6162 × 10^{−5}  5.0689 × 10^{−5}  2.4973 × 10^{−5}  2.7642 × 10^{−10} 
2^{14}  50  7.0000 × 10^{−4}  6.3012 × 10^{−4}  4.9943 × 10^{−5}  1.8228 × 10^{−9} 
2^{15}  50  2.8526 × 10^{−4}  2.5407 × 10^{−4}  9.9883 × 10^{−5}  1.6704 × 10^{−8} 
2^{16}  50  3.7305 × 10^{−4}  3.6885 × 10^{−4}  1.9976 × 10^{−4}  1.5567 × 10^{−7} 
2^{17}  50  4.9801 × 10^{−7}  4.8631 × 10^{−7}  1.5437 × 10^{−7}  2.6652 × 10^{−10} 
2^{18}  50  0.0023  0.0023  7.9905 × 10^{−4}  8.2515 × 10^{−6} 
From the data reported in Table 11, it follows that for N = 8192 and k _{s} = L _{ o } = 50, the SFFTv1 and SFFTv2 algorithms show very close absolute error values; in particular, the attained average absolute error values were 5.6162 × 10^{−5} and 5.0689 × 10^{−5}, respectively. The SFFTv3, however, attains lower average absolute error values than the other SFFT versions. It is worth mentioning that the lowest average absolute error was attained with the DFT_{COMM} algorithm, at a value of 2.7642 × 10^{−10}.
On the other hand, the SFFT-related algorithms demonstrate reliable operation only for specific input parameter combinations, i.e., they depend on the combination of the dimension N of the input signal x and the sparsity factor k _{s}. In contrast, the DFT_{COMM} algorithm is operationally robust in the sense that it is not subject to any such dimensional limitation, and it operated successfully in all tested harsh (nonsparse) computational scenarios. Furthermore, all SFFT-related algorithms are probabilistic-type techniques [20, 21], in which the desired k _{s} largest coefficients of the Fourier spectrum of the input sequence are reconstructed (approximated) with a high probability (not necessarily with probability one). In contrast, the DFT_{COMM} algorithm is a deterministic technique, and it produces more reliable and accurate results than the family of the SFFT-related algorithms (as demonstrated in Fig. 8 and Tables 10 and 11).
It is also worth noting that presently (in the sparsity-guaranteed computational scenarios only), the SFFT-related algorithms outperform the DFT_{COMM} in computational speed due to their specially devised execution parallelism [20, 21, 28]. From the family of the SFFT-related algorithms, the SFFTv3 [28] shows the highest speedup for any input sequence dimension N and any feasible value k _{s} in the sparsity-guaranteed scenarios only; in particular, when only approximately 8.2 % (or fewer) of the Fourier coefficients of the input signal are significant and thus not discarded (as shown in Table 10). In contrast, in all comparable (sparse or nonsparse) computational scenarios, the DFT_{COMM} algorithm showed superior accuracy (lower absolute error values) compared with the SFFT-related algorithms.
In closing, it is worth mentioning that in a majority of practical computational scenarios, the savings in the number of arithmetic operations achievable with the optimized unified DFT_{COMM} technique are significant. As a concluding example, consider the test scenario with N = 8192 and L _{ i } = L _{ o } = 307, in which the savings in the total number of required arithmetic operations attainable with the DFT_{COMM} algorithm in comparison with the most prominent competing split-radix FFT algorithm [3, 14, 23, 24] amount to 45 %.
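The exact error metric behind the reported values is not spelled out in this section. One plausible reading, stated here purely as an assumption, is the mean of |X̂[k] − X[k]| over the retained output coefficients. The Python sketch below applies that definition to a deterministic, pruned direct DFT; the helper names, the toy signal, and the small sizes are all hypothetical and chosen only to keep the example short.

```python
import cmath


def pruned_direct_dft(x, l_i, out_bins):
    """Direct DFT that reads only the first l_i inputs and evaluates only
    the requested output bins (a sketch of input/output pruning)."""
    n = len(x)
    return {
        k: sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(l_i))
        for k in out_bins
    }


def average_abs_error(estimate, reference):
    """Mean absolute deviation over the retained bins (assumed metric)."""
    return sum(abs(estimate[k] - reference[k]) for k in estimate) / len(estimate)


N, L_I, L_O = 32, 8, 5  # toy sizes, not taken from the experiments
x = [float(t + 1) for t in range(L_I)] + [0.0] * (N - L_I)  # zero-padded input
bins = range(L_O)

pruned = pruned_direct_dft(x, L_I, bins)  # skips the zero-valued inputs
full = pruned_direct_dft(x, N, bins)      # same bins, all N inputs
error = average_abs_error(pruned, full)
print(f"average absolute error: {error:.3e}")
```

Because the skipped inputs are exactly zero, the pruned and unpruned sums agree up to floating-point rounding, which is the sense in which a deterministic pruned transform differs from a probabilistic approximation.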
5 Conclusions
We have developed a new technique for the efficient computation of DFTs in which the composite lengths of the input and/or output data sequences are smaller than the dimension N of the full DFT/FFT. The methodology unifies the commuting, filtering, and pruning paradigms, yielding the new DFT_{COMM} method, which outperforms the existing competing pruning-decomposition-based techniques in terms of attainable savings in the number of required arithmetic operations.
Furthermore, our DFT_{COMM} method admits computing the DFT_{ P } blocks at the intermediate stage of the pruned decomposed transform using any existing FFT algorithm. Based on the performed treatment of the combinational hypothesis-testing-type problem regarding all feasible allocation-pruning modalities, the decision was made in favor of the preferable hypothesis, which yields the proposed DFT_{COMM} method. Being the globally optimal result of testing the complete list of all feasible hypotheses, the DFT_{COMM} method is guaranteed to require fewer or, at most, the same number of arithmetic operations as any other competing pruning-decomposition-based method reported in the literature.
In addition, we have corroborated that, in the scenarios with nonguaranteed sparsity of the data Fourier spectra, the DFT_{COMM} method shows better reliability and accuracy than the family of the celebrated competing SFFT-related algorithms, while in scenarios with severe Fourier spectrum nonsparsity (i.e., when the majority of the data Fourier spectrum coefficients take nonzero values and thus cannot be discarded), the DFT_{COMM} technique always outperforms the SFFT-related algorithms because they simply fail to execute the program code in such uncertain computational scenarios.
Declarations
Acknowledgements
The authors would like to thank the anonymous reviewers for their constructive criticism and comments that helped to improve the presentation of the paper.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. J Markel, FFT pruning. IEEE Transactions on Audio and Electroacoustics 19(4), 305–311 (1971). doi:10.1109/TAU.1971.1162205
2. V Raghavan, KMM Prabhu, PCW Sommen, Complexity of pruning strategies for the frequency domain LMS algorithm. Signal Processing 86(10), 2836–2843 (2006). ISSN 0165-1684, http://dx.doi.org/10.1016/j.sigpro.2005.11.015
3. Y Xu, MS Lim, Split-radix FFT pruning for the reduction of computational complexity in OFDM based cognitive radio system, in Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), 69–72, May 30–June 2, 2010. doi:10.1109/ISCAS.2010.5537048
4. FM Henderson, AV Lewis (eds.), Principles and applications of imaging radar, manual of remote sensing, vol. 3, 3rd edn. (Wiley, NY, 1998)
5. HH Barrett, KJ Myers, Foundations of image science (Wiley, NY, 2004)
6. YV Shkvarko, Unifying experiment design and convex regularization techniques for enhanced imaging with uncertain remote sensing data, part I: theory, part II: adaptive implementation and performance issues. IEEE Trans. Geoscience and Remote Sensing 48(1), 82–111 (2010)
7. A Moni, CJ Bean, I Lokmer, S Rickard, Source separation on seismic data. IEEE Signal Processing Magazine 29(3), 16–28 (2012)
8. RM Willett, MF Duarte, MA Davenport, RG Baraniuk, Sparsity and structure in hyperspectral imaging. IEEE Signal Processing Magazine 31(1), 116–126 (2014)
9. Q Zhu, CR Berger, EL Turner, L Pileggi, F Franchetti, Polar format synthetic aperture radar in energy efficient application-specific logic-in-memory, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 1557–1560, 25–30 March 2012. doi:10.1109/ICASSP.2012.6288189
10. YV Shkvarko, J Tuxpan, SR Santos, I Yaniez, High-resolution imaging with uncertain radar measurement data: a doubly regularized compressive sensing experiment design approach, in IEEE Intern. Symposium on Geoscience and Remote Sensing (IGARSS'2012), Munich, Germany, 6976–6970 (2012). ISBN: 97814673115951/12
11. YV Shkvarko, J Tuxpan, SR Santos, l _{2}-l _{1} Structured descriptive experiment design regularization based enhancement of fractional SAR imagery. Signal Processing 93, 3553–3566 (2013). http://dx.doi.org/10.1016/j.sigpro.2013.03.024
12. S Foucart, H Rauhut, A mathematical introduction to compressive sensing (Springer, NY-Heidelberg, 2013)
13. DP Skinner, Pruning the decimation in-time FFT algorithm, in IEEE Transactions on Acoustics, Speech and Signal Processing, 24(2), 193–194 (1976). doi:10.1109/TASSP.1976.1162782
14. L Yuan, X Tian, Y Chen, Pruning split-radix FFT with time shift, in International Conference on Electronics, Communications and Control (ICECC), 2011, 1581–1586, 9–11 Sept. 2011. doi:10.1109/ICECC.2011.6066654
15. S Bouguezel, MO Ahmad, MNS Swamy, Efficient pruning algorithms for the DFT computation for a subset of output samples, in Proceedings of the 2003 International Symposium on Circuits and Systems, 2003. ISCAS '03, vol. 4, pp. IV-97–IV-100, 25–28 May 2003. doi:10.1109/ISCAS.2003.1205782
16. CP Fan, GA Su, Pruning fast Fourier transform algorithm design using group-based method. Signal Processing 87(11), 2781–2798 (2007). ISSN 0165-1684, http://dx.doi.org/10.1016/j.sigpro.2007.05.012
17. TV Sreenivas, P Rao, FFT algorithm for both input and output pruning, in IEEE Transactions on Acoustics, Speech and Signal Processing, 27(3), 291–292 (1979). doi:10.1109/TASSP.1979.1163246
18. C Roche, A split-radix partial input/output fast Fourier transform algorithm, in IEEE Transactions on Signal Processing, 40(5), 1273–1276 (1992). doi:10.1109/78.134493
19. L Wang, X Zhou, GE Sobelman, R Liu, Generic mixed-radix FFT pruning, in IEEE Signal Processing Letters, 19(3), 167–170 (2012). doi:10.1109/LSP.2012.2184283
20. H Hassanieh, P Indyk, D Katabi, E Price, Simple and practical algorithm for sparse Fourier transform, in Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms (SODA '12), Kyoto, Japan, 17–19 Jan, 1183–1194 (2012)
21. H Hassanieh, P Indyk, D Katabi, E Price, Nearly optimal sparse Fourier transform, in Proceedings of the forty-fourth annual ACM symposium on Theory of computing (STOC '12), ACM, New York, (2012), 563–578. doi:10.1145/2213977.2214029. http://doi.acm.org/10.1145/2213977.2214029
22. M Iwen, A Gilbert, M Strauss et al., Empirical evaluation of a sublinear time sparse DFT algorithm. Communications in Mathematical Sciences 5(4), 981–998 (2007)
23. HV Sorensen, CS Burrus, Efficient computation of the DFT with only a subset of input or output points, in IEEE Transactions on Signal Processing, 41(3), 1184–1200 (1993). doi:10.1109/78.205723
24. M Medina-Melendrez, M Arias-Estrada, A Castro, Input and/or output pruning of composite length FFTs using a DIF-DIT transform decomposition, in IEEE Transactions on Signal Processing, 57(10), 4124–4128 (2009). doi:10.1109/TSP.2009.2024855
25. AV Oppenheim, RW Schafer, Discrete-time signal processing, 2nd edn. (Prentice Hall, U.S., 1999)
26. HV Sorensen, M Heideman, CS Burrus, On computing the split-radix FFT, in IEEE Transactions on Acoustics, Speech and Signal Processing, 34(1), 152–156 (1986). doi:10.1109/TASSP.1986.1164804
27. Y Suzuki, S Toshio, K Kido, A new FFT algorithm of radix 3, 6, and 12, in IEEE Transactions on Acoustics, Speech and Signal Processing, 34(2), 380–383 (1986). doi:10.1109/TASSP.1986.1164826
28. J Schumacher, M Püschel, High performance sparse fast Fourier transform, Master's thesis, ETH Zurich, Department of Computer Science (2013)
29. M Frigo, SG Johnson, The design and implementation of FFTW3. Proceedings of the IEEE 93(2), 216–231 (2005). doi:10.1109/JPROC.2004.840301