Goertzel algorithm generalized to non-integer multiples of fundamental frequency

The article deals with the Goertzel algorithm, used to establish the modulus and phase of harmonic components of a signal. The advantages of the Goertzel approach over the DFT and the FFT in cases of a few harmonics of interest are highlighted, with the article providing deeper and more accurate analysis than can be found in the literature, including the memory complexity. But the main emphasis is placed on the generalization of the Goertzel algorithm, which allows us to use it also for frequencies which are not integer multiples of the fundamental frequency. Such an algorithm is derived at the cost of negligibly increasing the computational and memory complexity.


Introduction
In the case of discrete-time signals, the discrete Fourier transform (DFT) is widely used for spectral analysis. The frequencies of the harmonics in the DFT always depend on the length of the transform, N, and they are integer multiples of the fundamental frequency f = f s N ,where f s represents the sampling frequency. Thus, Δf gives the frequency resolution of the DFT. In the case, when the transform length N is not a multiple of the signal period, the signal is a sum of harmonic components whose frequencies are not integral multiples of the fundamental frequency. Such components are not expressible in the N-point DFT spectrum by a single spectral line-this effect is called "leakage" into the neighboring DFT spectral coefficients, placed at the integer multiples of Δ f [1]. Therefore, the transform length N always needs to be chosen with respect to the desired accuracy of the frequency resolution. The computational complexity of the DFT increases quadratically with the number of samples/frequencies, and thus in practice we use almost exclusively the fast Fourier transform algorithm (FFT), whose computational complexity is linearithmic (linearlogarithmic). When the task is to identify the modulus and/or phase of a single or of just a few of the frequency components, even the FFT is of no advantage, because it always computes all the frequency components, most of which are discarded, as being of no interest. In such situations, methods specialized in computing a subset of output frequencies can be exploited with great benefit. Besides the Goertzel algorithm, which deals with single frequencies separately, it is worth mentioning the so-called pruned-FFT [2,3], which is also connected to the zoom-FFT algorithm [1], and the transform decomposition of Sorensen and Burrus [3], which efficiently combines the ideas of the splitradix FFT and the Goertzel approach. However, the essential disadvantage of rounding frequencies to the nearest integer multiples (and the inaccuracy thus introduced) remains if using any of these methods, including the classical version of the Goertzel algorithm.
In Section 2, we first show the derivation of the common Goertzel algorithm in detail. While this may seem superfluous, it will be necessary to refer back to a number of its particular steps in the later sections. Then the computational and memory complexity of the Goertzel algorithm and the FFT is compared. In Section 3, we provide the announced generalization of the Goertzel algorithm, so that it is possible to use it also for the non-integral multiples of the fundamental frequency. Such a generalization has been mentioned before, e.g., in [4,1], but just for the computation of the modulus, not the phase of a harmonic component.

Example of utilization of Goertzel algorithm-DTMF
The Goertzel algorithm is typically used for frequency detection in the telephone tone dialing (dual-tone multifrequency, DTMF), where the meaning of the signaling is determined by two out of a total of eight frequencies being simultaneously present [5]. The frequencies of each of the two groups of four signaling tones were chosen such that the frequencies of their higher harmonics or intermodulation products were sufficiently distant. The frequencies chosen for the DTFM have a big least common multiple. Hence, using a digital receiver with a sampling frequency of 8 kHz, the period of DTMF signal amounts to several tens of thousands of samples. In practice, however, the transform length N must be much smaller, so naturally the effect of spectrum leakage will appear. For example, with N = 205, instead of the accurate frequency 770 Hz the modulus at approximately 780.5 Hz (= 20·8000/205) is computed. This situation is illustrated in Figure 1, where it is evident that the maximum occurs at the non-integer multiple of the fundamental frequency.
The value N = 205 is often used in practice [6], because one of the local minima of the sum of squared relative deviations of the signaling frequencies is experienced precisely for this length. In this situation, the deviation is approximately equal to 1.4%, while the transmitter frequency tolerance is 1.8%. Nevertheless, in some applications of the Goertzel algorithm the deviation from the exact frequency can exceed a prescribed tolerance, and thus both the DFT and the Goertzel algorithm would be of little use.
Using the approach presented in this article it is not necessary to round the frequencies at which detection is desired; it is possible to determine the modulus and phase of a component at an arbitrary (even non-integer) frequency. The number of operations and memory requirements increases only negligibly with this approach.

Notation
In the following text, we assume a discrete signal x of length N, whose samples can be complex

Derivation of standard Goertzel algorithm
The algorithm invented by Goertzel [7] serves to compute the kth DFT component of the signal {x[n]} of length N, i.e., Multiplying the right side of this equation by 1 = e j2π k N N leads to its equivalent which can be rearranged into The right side of (3) which can be rewritten as where the compact support of the signal {x[n]} is taken into consideration.
Comparing (3) and (5), it is clear that the desired X[k] is the Nth sample of the convolution, i.e., for an arbitrary but fixed k = 0,..., N -1. This means that the required value can be obtained as the output sample in time N of an IIR linear system with the impulse response {h k [n]}.
The transfer function H k (z) of this system will now be derived; it is the L -transform of its impulse response [8], thus which can be viewed as a geometric series with the first term being equal to e j2π k 0 N z −0 = 1 and with the quotient q = e j2π k N z −1 . For |q| < 1, i.e., |z| > 1, the series is convergent and its sum equals the desired transfer function: The corresponding difference equation is This first order difference equation contains a complex multiplication factor, which is computationally demanding. To save the computational cost, the transmission function can be extended in both the numerator and the denominator by the conjugate of The respective difference equation of this second order IIR system is Such a structure can be described using the state variables: while the output is given by and we set s[-1] = s[-2] = 0. The signal flow graph representing the system is depicted in Figure 2.
The state-space description is advantageous because only the output sample y[N] is of interest. The algorithm iterates the real-number-only system (16) for (N + 1) times (beginning with the sample with the time index 0; in the last iteration the input sample x[N] is put equal to zero). Only in the last step is the output y k [N] calculated according to (17) using only a single complex multiplication. As mentioned earlier, the value in The Goertzel algorithm can hence be considered as an IIR filtering process, while only a single output sample is of interest. The algorithm is presented step by step in Figure 3.

Properties
The Goertzel algorithm in fact performs the computation of a single DFT coefficient. Compared to the DFT, it has several advantages, because of which it is used.
-First of all, the Goertzel algorithm is advantageous in situations when only values of a few spectral components are required (as in the DTMF example in Section 1.1), not the whole spectrum. In such a case the algorithm can be significantly faster.
-The efficiency of using the FFT algorithm for the computation of DFT components is strongly determined by the signal length N. The most effective case is when N is a power of two. On the contrary, N can be arbitrary in the case of the Goertzel algorithm, and the computational complexity does not vary.
-The computation can be initiated at an arbitrary moment, even at the very time of the arrival of the very first input sample; it is not necessary to wait for the whole data block as in the case of the FFT. Thus, the Goertzel algorithm can be less demanding from the viewpoint of the memory capacity and it can perform at a very low latency. Also, the Goertzel algorithm does not need any reordering of input or output data in the bit-reverse order [1]. -Finally, as will be shown later in the article, the modulus and phase can be established also for the non-integral spectral indexes k, raising the computational effort only negligibly. Therefore the Goertzel algorithm is convenient in cases when, for some reason, it is required to detect harmonic signals of nonintegral frequencies, or, signals with a limited number of samples which causes a decrease of the DFT frequency resolution.

Computational and memory complexities
In the following analysis, operations which can be performed before the first data sample has been received are not considered. Specifically, the constants A, B, C in Figure  3 can be precomputed. The memory performance is handled in a minimalist scenario, i.e., such that it would not be possible to implement the algorithm with fewer storage locations.
The FFT algorithm used with N being a power of two has computational demands proportional to N log 2 N, the absolute number depends on the particular implementation. Usually the number of real-number operations found in the literature is approximately 6N log 2 N (taking one complex multiplication as a combination of four multiplications and two summations). When working with real signals, a number of operations can be avoided; however, it is at the cost of increased complexity of the algorithm, and, it is not true that the demands can be reduced by half, as can be read, for example in [9]. For this reason, we consider the standard "complex" FFT even for real signals.
If we analyze the number of operations of the standard Goertzel algorithm, we realize that for a real input  Thus, if N frequencies were of interest, the Goertzel algorithm would be of quadratic complexity as the DFT is.
To answer the question "for how many frequencies K is it more advantageous to exploit the Goertzel algorithm than the FFT" we compare which represents a more accurate result than for example [[8], p. 635], where the sharper inequality K < log 2 N, based solely on a comparison of the order of magnitude, is presented. Such a result, however, holds only for N being a power of two; otherwise the inequality (18) can even be more favorable for the Goertzel algorithm.
The formula (18) says that the computation should be faster than the FFT as long as the number of frequencies does not exceed 2 log 2 N. For example, with a signal of length N = 32 the Goertzel algorithm is preferable if K ≤ 9. In the case of N = 128 Goertzel dominates over the FFT if K ≤ 13.
In fact, the algorithm introduced in [3] can be even more efficient than this. It combines the good properties of both the FFT and the Goertzel algorithm, producing a DFT decomposition similar to the one used in the splitradix FFT. The dominance of the algorithm of Sorensen and Burrus over the FFT is guaranteed even for K <N/2. An experimental comparison of this approach with the Goertzel algorithm showed that the Goertzel algorithm performs actually better than their algorithm when K ≤ 4 or K ≤ 5 for a wide range of N. It should be noticed, however, that the algorithm from [3] has to work with a whole data block, and also the complexity being compared does not include rearrangement of the input data sequence.
Using the FFT algorithm requires a memory space of at least 2N, which contains the real and imaginary parts of signal samples. Also the N values of the transformation kernel, sin and cos (so-called twiddle factors), are often precomputed and stored. The FFT calculation itself can be performed with no values being moved in memory (i.e., in-place), however, with regard to the impossibility of starting the computation until the last sample of a block of data is received, a buffer of at least 2N in size must be used. In the case of real signals, N memory locations are enough. Thus, the overall FFT memory demand is 4N for real signals.
For each considered frequency, the Goertzel algorithm requires: locations for saving two state variables, the real constant B, the real and imaginary parts of the precomputed C, and the real and imaginary parts of the final result. There is no need to implement input buffering, because the computation can be run as the new signal samples arrive. Similarly, the output signal can be overwritten after the last sample has arrived. In many cases it will therefore not be necessary to use buffering at the output side either. The total memory complexity of the Goertzel algorithm is thus 7K positions.
Combining all the above together, the Goertzel algorithm will be less memory-demanding than the FFT if A comparison of (19) and (18) leads to the conclusion that, if we look for a number K for which the Goertzel algorithm dominates over the FFT from both the memory and the computational viewpoints, then: for N ≥ 13 formula (18) is decisive, because for these N it holds 4 7 N > 2log 2 N; on the other hand, for N {2,..., 12} (which is unusual in practice), the decisive formula is (19), because for these N it holds 4 7 N > 2log 2 N; nevertheless, as the difference of the right and the left sides does not exceed 2 in this case, we can conclude, with a small loss of generality, that the comparison of the effectiveness of the two algorithms can be based just on relation (18).

Generalized Goertzel algorithm
Formula (2) holds for integer-valued k only. In such a case, the integer number of periods of the transformation kernel, e −j2π k n N , corresponds to the signal length N. In the case of k ℝ, formulas (1) and (2) are generally no longer in agreement. (The period of the transformation kernel no longer corresponds to N, hence the standard approach cannot be used.) In Sections 3.1 and 3.2, we will generalize the algorithm such that it includes also the non-integral-valued multiples of the fundamental frequency. The complexity of the novel approach is analyzed in Section 3.3. And, as shown in Section 3.4, the non-integer case can be treated by the standard algorithm using a small trick; however, this is at the cost of increased computational effort.

Generalizing to non-integer k
In fact, when k is not integer-valued, we can no longer speak of the DFT (1), rather of the discrete-time Fourier transform (DTFT), which is defined by (20) With the notation ω k = 2π k N we can write that where we exploited the compactness of the support of the signal {x[n]}.
The derivation of the generalized Goertzel algorithm is analog to the technique presented in Section 2. Compared to that, however, we extend formula (22) at the very beginning by unity in the form of leading to Since the sum in (26) is identical to (3), the derivation of the generalized algorithm can proceed using the same steps as in Section 2.1, with one noteworthy change: the equation that characterizes the output using state variables (17) will now be of the form Indeed this is so, since the "correction constant", e -j2πk , depends only on the index of the frequency component, which remains constant throughout the computation. The complex constant is equal to one for k Z, which shows that this is indeed a generalization. In fact, the only variation compared to the standard Goertzel algorithm is the multiplication by this constant at the very end of the algorithm.
The constant e -j2πk affects only the phase of the result, not the module. Among other things, this means that the interest in the modules of the components with non-integer k can be satisfied using the standard algorithm. Indeed, for example [4] uses it in this way. In cases when the phase plays a role (the delay of a signal is detected, for example), however, the use of this "correction constant" is necessary. A short remark can be found in [ [1], p. 531], describing the possiblility of computing the Goertzel results also for non-integer-valued k; however, it misleads the reader in that the phase case is not distinguished at all.

Reducing number of iterations
It will be shown in this section that the last iteration of the Goertzel algorithm can be substituted by merely a single complex multiplication, instead of performing it in the usual manner.
From equation (5) A comparison of (28) and (29) leads to the formula which characterizes the relationship between the last two samples of the convolution: This way the shortened generalized algorithm is obtained, as is summarized in Figure 4.

Computational and memory complexities
The computational complexity of the generalized Goertzel algorithm described in Section 3.1 (without the shortening in Section 3.2) grows by one complex multiplication (i.e., four real multiplications and two real additions) compared to the traditional approach. The memory requirements increase by two positions, which contain the real and imaginary parts of the correction constant e -j2πk .
Although saving one iteration in the main loop according to Section 3.2 results in lowering the computational effort by two additions and one multiplication, the need for the final complex multiplication cancels such a benefit. This means: there is no advantage in shortening the main loop in case of integer-valued k; in such a case the traditional algorithm as defined in Section 2 is the most efficient one.
However, in the case of non-integer-valued k the iteration reduction does make sense, since joining the correction constants into a single one (31) leads to the overall growth of computation complexity by three real multiplications (it would be four real multiplications and two real additions if the reduction was not exploited.) Considering the memory, such a case requires two more positions for the real and imaginary parts of (31), compared to the standard algorithm.
It is evident that the computational and memory complexities of the generalized case are only negligibly greater. The main advantage of shortening the loop according to Section 3.2 can be seen in that, for example, in continuous operation, it is not necessary to perform the last iteration and it is possible to start processing the input sample x[N] in the time spared.

Yet another approach utilizing standard Goertzel algorithm
It will be shown that, by a trick, the computation required for k ℝ can be transformed into integervalued problem, where the standard Goertzel algorithm can be utilized-so no modifications are needed. However, it is at the cost of raising the computational complexity, which is even greater than with the generalized Goertzel algorithm (Figure 4).
Starting from (22) again, the k ℝ can be divided into its integer part ⌊k⌋ Z and the remainder k ∈ [0, 1), i.e., k = k +k. This way, (22) can be rewritten as whose right side is a usual DFT of signalx (which is complex!) and thus can be computed by the standard Goertzel algorithm. Inputs: frequency "index" k ∈ R; signal x of length N Output: y, representing X(ω k ) according to eq. (20)  Regarding the memory complexity, in case of real-time processing it is of advantage to precompute and store the signal {e −j2πk n N } . It is complex and therefore requires 2N memory locations. Furthermore, in contrast to the traditional algorithm, we also need 2N positions to storex , which is complex (instead of N in the traditional, real case).
To computex we need 2N real multiplications. In the standard Goertzel algorithm (Figure 3) the N iterations of the main loop work with real numbers only (supposing the input signal to be real). In the approach presented above, the number of real operations is doubled.
It is therefore clear that the generalized algorithm according to Figure 4 beats the above alternative approach from both the computational and the memory viewpoints.

Software
Two Matlab functions are available for download at URL [10].
The function named goertzel_classic.m realizes the standard Goertzel algorithm for k Z; the generalized (and shortened) algorithm for k ℝ is implemented in the function goertzel_general_shortened.m. The structure of the functions corresponds to the pseudocodes in Figures 3 and 4. Indexing the vector elements, however, starts with "1" in Matlab, which differs from our theoretical description, where it starts with "0".

Conclusion
The article presented the generalization of the Goertzel algorithm. The novel approach allows us to employ also the non-integer-valued multiples of the fundamental frequency, making it possible to compute the Fourier transform in discrete-time (DTFT) this way. The main advantage consists in that in various applications where the Goertzel algorithm is utilized, it is no longer necessary to round the frequencies of desire, thus obtaining more accurate results. The article shows that this is reached at the cost of only a negligible rise in computational and memory complexities. Furthermore, it has been shown that the very last iteration of the algorithm can be substituted with a multiplication which is little more effective.