EURASIP Journal on Applied Signal Processing 2005:7, 1127–1136 c ○ 2005 Hindawi Publishing Corporation Adaptive Peak Frequency Estimation Using a Database of PARCOR Coefficients

This paper presents an adaptive peak frequency estimation method using a database that stores PARCOR coefficients as key attributes and the corresponding peak frequencies as nonkey attributes. The least-square lattice algorithm is used to recursively estimate the PARCOR coefficients to adapt to changing circumstances. The nearest neighbor to the current PARCOR coefficient is retrieved from the database, and the corresponding peak frequency is regarded as the estimation. A simultaneous execution of database construction and peak estimation with database update is performed to accelerate the processing time and to improve the estimation accuracy.


INTRODUCTION
Estimation of peak frequency of the power spectrum plays an important role in direction-of-arrival estimation, communication system, fault diagnosis, and speech processing. Direction-of-arrival of multiple wave sources arriving on a linear array antenna can be estimated by finding local maxima of the space spectrum obtained from the array inputs [1]. Also in biological signals such as electroencephalogram, blood pressure, heart rate, and laser Doppler flow, peak frequencies provide important information on the diagnosis of diseases [2,3,4,5]. Peak frequencies of speech signals are an important cue in the characterization of speech sounds since they have a close relation to the phonetic content. Therefore, extensive studies on peak frequency estimation have been conducted for many years [6,7,8,9,10,11,12,13].
There are two approaches to power spectrum estimation: nonparametric estimation methods and parametric estimation methods. The FFT (fast Fourier transform) method is a representative one of the nonparametric methods, and the AR (autoregressive) method is a representative one of the nonparametric methods. The FFT method estimates the power spectrum directly from the data. The FFT method is very fast, however, the estimation error variance increases as the number of observations decreases. Moreover, the computationally expensive peak picking is necessary for extracting peak frequencies from the FFT spectrum. The AR method estimates the AR parameters and characterizes the power spectrum by the AR parameters. The AR method provides high-resolution spectrum even with a small number of observations [14]. However, polynomial root finding or peak picking is necessary for extracting peak frequencies from the AR spectrum.
When signal statistics change with time as often happen in real applications, adaptive estimation of peak frequencies is required to adapt to the change. The short-time FFT uses a sliding data window with a constant duration to accommodate nonstationary data. The sliding window technique can also be employed in the AR method. However, the iterative computation of FFT spectrum or AR spectrum is computationally expensive.
In the AR spectrum method, the peak frequency is usually estimated by tracking roots of the AR polynomial rather than AR parameters. Several root-finding algorithms have been derived based on the LMS (least mean squares) algorithm [15,16,17] and the RLS (recursive least squares) algorithm [18]. The AR model constructed as a cascade of 2nd-order sections has also been designed to estimate the roots of the AR polynomial, because the roots can be easily obtained by solving 2nd-order polynomial equations [19].
Several adaptive algorithms for estimating coefficients of the cascade AR model have been derived [20,21]. However, the adaptive algorithms require much computation time. Moreover, the cost function to be minimized is unimodal, and thus the adaptive algorithms may get stuck at local minima.
There is a one-to-one correspondence between AR coefficients and partial autocorrelation (PARCOR) coefficients, and therefore we can characterize the AR spectrum by the PARCOR coefficients instead of the AR coefficients. The LSL (least-square lattice) algorithm can recursively estimate the PARCOR coefficient in a computationally efficient manner [22]. The LSL algorithm is derived based on the least-squarecriterion, nevertheless, the computation time is very fast. The PARCOR coefficients are well suited for fixed-point arithmetic, because they are robust against rounding errors and the absolute value is assured to be less than unity.
This paper proposes a fast and recursive peak frequency estimation method using a database of PARCOR coefficients. In database construction process, we estimate the PARCOR coefficients from real speech signals by the LSL algorithm, and compute the corresponding peak frequencies by using either root-finding or peak-picking technique. We put the quantized PARCOR coefficients as key attributes and the corresponding peak frequencies as nonkey attributes, and then enter a pair of the quantized PARCOR coefficients and the peak frequencies into a database. In estimation process, we estimate the PARCOR coefficients from new observations by the LSL algorithm, retrieve a record with a key nearest to the current key from the database, and use the corresponding peak frequency as the estimation. This estimation method requires neither root finding nor peak picking.
The estimation accuracy strongly depends on the database contents. The database size would become intractably large if we would store a large number of records obtained under many different circumstances. Such a large database requires large computation time and memory. We thus perform database construction and peak frequency estimation simultaneously. More precisely, we estimate the current PARCOR coefficient and enter it into the database only when the database does not contain a record with a key close to the current one. The simultaneous execution of database construction and estimation is useful for decreasing processing time and increasing estimation accuracy. Moreover, we update database contents according to timestamp so that the number of records does not exceed the predetermined threshold. The database update procedure prevents a monotonous increase of database size and thus resulting in reduction of processing time. We apply the proposed estimation method to real speech signals to evaluate the estimation accuracy, the processing speed, and the storage space.

AR model and AR spectrum
We define a signal at time n as s n , a white noise with mean zero and variance σ 2 e as e n , and the ith AR coefficient as a i . The Pth AR model is then represented by The AR model can be regarded as the system that inputs e n , outputs s n , and has the transfer function with The AR spectrum of the signal s n is described by where the sampling period is assumed to be unity for simplicity. It is evident that the peak frequencies depend only on the values of the AR coefficients {a i } P i=1 . We here consider the problem of finding peak frequencies that provide local maxima of P(ω).

Conventional peak frequency estimation method
There are two methods for finding peak frequencies from AR coefficients: peak picking and root finding. The peak-picking method finds local maxima of P(ω) by regularly sampling it with a constant sampling interval. Therefore, it is computationally expensive. The root-finding method computes peak frequencies from roots of the AR polynomial equation, defined by A(z) = 0. Putting the roots as z p (p = 1, 2, . . . , P), we can compute the resonance frequency and the bandwidth by [14] respectively, where T is the sampling period. The rootfinding method is also computationally expensive and is not suited for real-time processing. Using 2nd-order polynomials, we can express the Pthorder AR polynomial as where a (i) 1 and a (i) 2 are the 2nd-order coefficients in the ith section. Using the AR model constructed as a cascade of 2nd-order sections, we can easily get the roots of A(z) = 0 by solving 2nd-order polynomial equations 1 + a (i) 1 z −1 + a (i) 2 z −2 = 0 [19]. Several adaptive algorithms for estimating the 2nd-order coefficients have been proposed [20,21], but they require much more computation time than the standard AR parameter estimation algorithms. Moreover, the cost function to be minimized is unimodal, and thus they may get stuck at local minima.

Recursive estimation of model coefficients
There are batch-type and recursive-type algorithms for estimating the AR coefficients {a i } P i=1 . Recursive-type algorithms are suited for processing nonstationary signals. The representative ones are the LMS and RLS algorithms. The LMS algorithm updates the AR coefficients based on the gradient-descent technique. Unfortunately, it has a drawback that the convergence performance is very sensitive to the eigenvalue spread of the input data covariance matrix [23]. The RLS algorithm updates them based on the least-square method, and therefore it converges faster than the LMS algorithm. However, the RLS algorithm requires O(P 2 ) operations per iteration, while the LMS algorithm requires only O(P) operations per iteration.
There is a one-to-one correspondence between AR coefficients (a 1 , a 2 , . . . , a P ) and PARCOR coefficients (ρ 1 , ρ 2 , . . . , ρ P ). Therefore, the peak frequencies can be uniquely characterized by the PARCOR coefficients {ρ p } P p=1 . The LSL algorithm recursively computes the optimal PAR-COR coefficients, in least-squares sense, in O(P) operations. The convergence speeds of the LSL and RLS algorithms are exactly the same since they are derived based on the same least-square criterion. Nevertheless, the computational complexity of the LSL algorithm is almost equal to that of the LMS algorithm. This is an advantage of updating the PARCOR coefficients with the LSL algorithm.
Algorithm 1 summarizes the LSL algorithm, where λ is the exponential forgetting parameter such that 1 − λ 1. The influence of past signal values decays exponentially faster as the size of λ is smaller. The LSL algorithm recursively updates the PARCOR coefficient ρ p+1 n from the observation x n . The absolute value of the PARCOR coefficient is assured to be less than unity. The PARCOR coefficients can be converted into the AR coefficients by using the following recursion [23]: where a (p) i denotes the ith AR coefficient of order p. In the following, we simply denote a (P) i as a i .

PEAK FREQUENCY ESTIMATION USING A DATABASE
We will explain a peak frequency estimation method that uses a database of PARCOR coefficients. We first describe how to construct the database from observations, and then show how to estimate peak frequencies by database searching.

Quantization of PARCOR coefficients
The LSL algorithm is used to recursively estimate the PAR-COR coefficient ρ p n from observation x n . Since the PARCOR coefficients are assured to be less than unity in magnitude, we can efficiently quantize them with a uniform quantizer. It is known that the AR spectrum becomes more sensitive to changes in PARCOR coefficient as |ρ p | approaches to unity [24,25]. Moreover, the first and second PARCOR coefficients of speech signals are relatively larger in magnitude than the others. We thus transform (ρ 1 , ρ 2 ) into (g 1 , g 2 ) by and quantize g p uniformly, so that the quantization interval of ρ p becomes smaller as |ρ p | approaches to unity. On the other hand, we quantize ρ 3 , ρ 4 , . . . , ρ P uniformly. We denote the quantized value of ρ p as ρ p , and then define a vector of the quantized PARCOR coefficients as ρ = ( ρ 1 , ρ 2 , . . . , ρ p ). We further define the number of bits allocated to the pth PARCOR coefficient as α p . It is known that spectrum distortion caused by rounding errors of high-order coefficients is relatively small [24,25]. We thus put P = 10, and choose α p as so that higher-order coefficients are more coarsely quantized.

Computation of peak frequencies
We transform the quantized PARCOR coefficients { ρ p } P p=1 into the AR coefficients { a (P) i } P i=1 by using (8), and then define a vector of the AR coefficients as a = ( a (P) 1 , a (P) 2 , . . . , a (P) P ). We put the peak frequency of the AR spectrum P(ω) as f i ,  and then define a vector of the peak frequencies as f = ( f 1 , f 2 , . . .). We here use the peak-picking (PP) method or the Jenkins-Traub (JT) method to compute f from a.
The PP method finds local maxima of P(ω) by sampling it with a constant sampling interval. We applied the PP method with a sampling interval of 1 [Hz] to a set of real speech signals. Then we have f = (507, 1114, 2467, 3349) (Hz). Next we quantized the AR coefficient by using a quantizer with α = 6. Then we have f = (462, 2429, 3349) (Hz). Figure 1 shows the AR spectrums, where the solid line denotes the true AR spectrum and the dashed line denotes the AR spectrum obtained with the quantized AR coefficients. We see that the second peak labeled " f 2 " has disappeared due to rounding errors of the AR coefficients. This phenomenon often occurs in a dull peak such as f 2 .
The JT method finds the roots of the AR polynomial equation A(z) = 0. The resonance frequency f p and the bandwidth B p are then computed from the roots by using (5) and (6), respectively. As B p is smaller, the corresponding peak at frequency f p becomes sharper. Therefore, in the JT method, we can easily find sharp peaks according to the bandwidth. Therefore, the peak selection method is effective in overcoming the problem of peak disappearance.

Database construction
We put ρ and f as key and data (nonkey), respectively, and enter a record consisting of key and data items into a database at each time step. The kd trie (k-dimensional digital search tree) is used as the data structure to allow efficient range searching [26,27]. The average computation time for range searching is only O(log N), where N is the number of records contained in the database. When the database has contained a record with a key equal to the current key, the current record is discarded without being entered into the database.

Peak frequency estimation using test data
When evaluating the estimation performance, we used a different set of speech signals from the ones used for database construction. We recursively compute ρ p n from x n by the LSL algorithm to adapt to changing statistics of signals. We then obtain a coefficient vector ρ by quantizing ρ p n in the same way as in Section 3.1. We choose ρ as a query, and retrieve a coefficient vector nearest to ρ, defined by ρ, from the database by range search. More concretely, we retrieve records with keys lying in the space {(ρ 1 , ρ 2 , . . . , ρ p ) | |ρ p − ρ p | ≤ d, ∀p} from the database, where d is the range size, and increase the value of d from 0 by one until more than or equal to one record is retrieved. When more than one record is retrieved, the nearest neighbor to the query ρ is selected out of them and the corresponding data value (peak frequency) is used as the estimation. In this way, we can recursively estimate peak frequencies without solving A(z) = 0 nor sampling P(ω).

Experimental results
A speech signal of L seconds from a male is sampled at 8 [kHz] to obtain 8000L records. We then constructed the database consisting of 8000L records. Throughout the simulations, we put λ = 0.995 in the LSL algorithm so that good performance is achieved. We evaluated the estimation performance by using ten sets of speech signals of one second.
We denote the peak frequency estimation method that directly computes f from a by using the PP and JT methods as PP1 and JT1, respectively. Algorithm 2 summarizes the JT1 and PP1 methods without using a database. We denote the proposed estimation method that uses the PP and JT methods to compute f from a as PP2 and JT2, respectively. Algorithm 3 summarizes the JT2 and PP2 methods with using a database. We can estimate degradation of estimation accuracy due to rounding errors of PARCOR coefficients by comparing the results of the PP1 (JT1) method and PP2 (JT2) method.
We generated several databases by putting α = 5, 6, 7 [bit] and changing L from 10 [s] to 500 [s], and measured the estimation accuracies of the PP2 and JT2 methods. Figures 2  and 3 show the results of the PP2 and JT2 methods, respectively. We can measure the estimation accuracy of the PP2 and JT2 methods by the difference betweenf k and f k , where f k is the peak frequency estimated by the PP1 (JT1) method andf k is the one by the PP2 (JT2) method. Spectrum distortion caused by rounding errors of high order coefficients is relatively small. We thus evaluated the estimation accuracy by checking whether the relative error |f k − f k |/ f k < 0.1 is satisfied or not. When using the JT2 method, we selected sharp peaks of bandwidth less than 500 [Hz]. In both methods, better results are obtained by putting α = 6 and L = 500, and the PP2 and JT2 methods achieve accuracies of 78.8% and 76.8%, respectively. The estimation accuracy seems to be improved as L is larger.
One may think that the use of a finer quantizer improves the estimation accuracy, but Figures 2 and 3 show that the estimation accuracies for α = 6 and α = 7 are almost the   same. We can explain the reason as follows: if a record of ( ρ, f ) is stored for each bin distributed in ρ-space, a finer quantization of PARCOR coefficients results in a higher estimation accuracy. However, the database size would become extremely large, because the number of bins is increased by a factor of 2 P as α is increased by one. The estimation accuracy improves with an increase of the number of records, only under the assumption that we store a record for each bin. When the assumption is not satisfied, as is the current case, the improvement is not expected.
As mentioned earlier, the current record is discarded when a record with a key equal to the current key has been stored in the database. We thus evaluated the number of records in the database, the rate to the total number of records, and the storage space. Table 1 summarizes the results for L = 500. We see that the database size for α = 7 is large enough, and the number of records stored in the database increases as α is larger. Figures 4a and 4b plot peak frequencies for every 50 time steps estimated by the JT1 and JT2 methods. We put L = 500 and α = 6 in the JT2 method, because this choice gave good result as shown in Figures 2 and 3. We see that the results of the JT1 and JT2 methods are very close to each other, but differ in some snapshots.
Finally we measured the computation time of the PP1, PP2, JT1, and JT2 methods on a personal computer with an Intel Pentium III 1 GHz. Most of the computation time is spent in range searching in the PP2 and JT2 methods, and peak finding in the PP1 and JT1 methods. In other words, the computational complexities of the LSL algorithm and (8) are negligibly small as compared to range searching and peak finding. The computation times of the PP1, JT1, and JT2 (PP2) methods required for processing signals of one second are 84.0 seconds, 5.3 seconds, and 5.0 seconds, respectively. The JT2 (PP2) method is much faster than the PP1 method, but the computation time is almost the same as that of the JT1 method, and real-time processing is not possible in either case.

Simultaneous execution of database construction and estimation
Both of the PP2 and JT2 methods are not satisfactory in terms of processing speed and estimation accuracy. This is  due to the fact that searching in the large database of size 108 (MB) is computationally expensive, and that the range size is repeatedly increased by one until more than or equal to one record is retrieved. The repeated range search increases the processing time, and degrades the estimation accuracy with an increase of range size, because records with keys far apart from a query are extracted from the database. The estimation accuracies of the PP2 and JT2 methods strongly depend on the database contents, for example, whether training and test data are from the same speaker or not, whether acoustic environments in training and testing phases are the same or not. We have to build a large database if we would store a large number of records obtained from different speakers in different acoustic environments. However, searching in the large database requires large computation time and memory requirements. As can be seen from Table 1, the database size is large enough even when we store speech signals from only one speaker. A further data insertion into the database should be avoided from the view point of processing speed and storage space.
For the solution, we simultaneously execute database construction and peak frequency estimation. We first set the maximum range size as d max so that records with keys far apart from a query are not retrieved. We then compute the peak frequency vector f associated with the current PAR-COR coefficient ρ by using the JT method, only when no record is retrieved by range search with d = d max , we choose f as the estimation, and then enter a record of ( f , ρ) into the Database construction & peak frequency estimation (a) Estimate ρ from x n by the LSL algorithm. (b) Quantize ρ into ρ.
(c) Find ρ by range search, where the range size is increased from 0 to d max by one until more than or equal to one record is retrieved. (d) When more than or equal to one record is retrieved, choose f corresponding to ρ as the estimation, and go to step (a). (e) When no record is retrieved by range search with d = d max , (1) quantize ρ into ρ, and then transform ρ into a, (2) compute f from ρ with the JT method, (3) enter ( ρ, f ) into the database, (4) go to step (a). database. The simultaneous execution technique decreases processing time and increases estimation accuracy by appropriately choosing d max . We here denote the improved estimation method using the simultaneous execution as JT3, and summarize it in Algorithm 4. In JT3, the JT method needs to be performed when there is no record close to the current query in the database. Fortunately, the JT method is seldom performed, because statistical properties of speech signals slowly vary with time, and the current key is usually very close to the previous ones.

Experimental results
We have applied the JT3 method to ten sets of speech signals of one second, and have measured the estimation accuracy of four peak frequencies ( f 1 , f 2 , f 3 , f 4 ), the number of records in database, and the processing time per 8000 samples (one second) for different values of α and d max . Tables 2, 3, 4, and 5 summarize the results for α = 6, 7, 8, 9, respectively. Since the range size in ρ-space is represented by δρ = d max /2 α , the actual range sizes for (α = a, d max = b) and (α = a + 1, d max = 2b) are the same. Figure 5 shows the estimation accuracy and the processing time for different values of δρ. We see that the database size and the processing time decrease as d max is larger, while the estimation accuracy decreases. If we put d max = 0 and α = ∞, a perfect accuracy of 100% is achieved. However, instead of achieving perfect accuracy, the time-consuming polynomial rooting is frequently done. A choice of α = 9 and d max = 0 achieves the best accuracy of 99.0%, but requires a large processing time of 3.58 seconds. A choice of α = 6 and d max = 4 achieves the fastest processing speed of 0.52 second, but provides poor accuracy of 71.2%. We have to make a tradeoff between processing time and estimation accuracy in choosing α and d max .
When the sampling interval is large, we can make α large and d max small to improve estimation accuracy. When the sampling interval is small, we should make α small and d max small to reduce processing time. The optimal choice depends on the application. In the current application, 8000 sample data must be processed within one second. Therefore, we should put α = 8 and d max = 6, because this choice gave a satisfactory accuracy of 95.2% with a processing time of 0.64 second per 8000 samples. Figure 6 plots the estimated peak frequencies. It is evident that the estimation result is better than the previous one in Figure 4b.

Database update procedure
In the JT3 method, the current record is entered into the database when no record is retrieved by range search with d = d max . Therefore, the database size monotonously increases as time passes, and therefore the processing speed gets slower. This property is not suitable for real-time processing. We thus update the database contents according to timestamp. More precisely, we determined the maximum number of records in the database as N max , and deleted the oldest record from the database if the number of records in the database exceeds N max . The oldest record in the kd trie can be easily deleted without adding the time index into the key field [28]. Table 4 shows that the number of records is about 3200 when we put α = 8 and d max = 6. We thus put N max = 4000, and measured the processing time per 8000 samples. Figure 7 plots the result using speech signals of 500 seconds, where the solid and dashed lines denote the results with and without database update, respectively. We see that the processing time of the JT3 method without database update becomes larger as time passes, while the processing time with database update is almost constant, because the maximum number of records in the database is fixed at 4000. The estimation ac-  curacies with and without database update are 92.7% and 92.3%, respectively. They are found to be almost the same. We see that the database update procedure can make the processing time almost constant with maintaining high estimation accuracy.
Finally, we combined several set of speech signals, recorded from a radio, into a single file of speech signals of 500 seconds. Figure 8 shows the result. The estimation accuracies with and without database update are 92.9% and 92.4%, respectively, which are almost the same as the previous ones. We see that the performance of the proposed method with database update is fairly robust against changing statistics of signals and environments.

Application to LSP estimation
Also LSP (line spectrum pair) can be uniquely characterized by the PARCOR coefficients, and polynomial root finding is required to compute LSPs from PARCOR coefficients. We thus replaced peak frequencies by LSPs in the key field, and estimated LSPs by using the JT3 method with database update. Figure 9b plots the estimated LSPs for every 50 time steps, where we put p = 10 and α = 6. The result of the JT1 method is shown in Figure 9a for comparison purpose. The processing times of the JT1 and JT3 methods per 8000 samples are 9.2 seconds and 0.96 second, respectively, and the JT3 method achieves a very good accuracy of 98.4%.

CONCLUSION
We have developed the fast and recursive peak frequency estimation method using a database of PARCOR coefficients. We have investigated searching range size and quantization interval of PARCOR coefficients so that a good tradeoff between estimation accuracy and processing speed is achieved. We have simultaneously executed database construction and peak frequency estimation for decreasing processing time and increasing estimation accuracy. Moreover, we have updated the database contents according to timestamp so that processing time is not monotonously increased. We have also applied the database-based method to LSP estimation, and have shown the effectiveness.
The concept of the database-based method originally comes from an intelligent landing system designed previously by one of authors [29]. It is very difficult to model a human skill with a simple mathematical equation. We thus built a database that stores aircraft states as key and control commands provided by a human expert as nonkey, and succeeded to generate a control command close to human operation by database searching. The PARCOR coefficient and the peak frequency presented in this paper correspond to the aircraft state and the control command, respectively.