- Research Article
- Open Access
A Ramp Cosine Cepstrum Model for the Parameter Estimation of Autoregressive Systems at Low SNR
© Shaikh Anowarul Fattah et al. 2010
- Received: 28 September 2009
- Accepted: 13 April 2010
- Published: 20 May 2010
A new cosine cepstrum model-based scheme is presented for the parameter estimation of a minimum-phase autoregressive (AR) system under low levels of signal-to-noise ratio (SNR). A ramp cosine cepstrum (RCC) model for the one-sided autocorrelation function (OSACF) of an AR signal is first proposed by considering both white noise and periodic impulse-train excitations. Using the RCC model, a residue-based least-squares optimization technique that guarantees the stability of the system is then presented in order to estimate the AR parameters from noisy output observations. For implementation, the discrete cosine transform, which can efficiently handle the phase unwrapping problem and offers computational advantages over the discrete Fourier transform, is employed. Extensive experiments on AR systems of different orders show that the proposed method estimates the parameters more accurately and consistently than some of the existing methods at SNR levels as low as −5 dB. As a practical application of the proposed technique, simulation results are also provided for the identification of a human vocal tract system from noise-corrupted natural speech signals, demonstrating superior estimation performance in terms of the power spectral density of the synthesized speech signals.
- Power Spectral Density
- Discrete Cosine Transform
- Discrete Fourier Transform
- Vocal Tract
- Noisy Observation
The parameter estimation of autoregressive (AR) systems under noisy conditions has been extensively studied in signal processing, communication, and control. For example, estimating the AR or linear predictive coding (LPC) parameters of a vocal tract (VT) system from observed noisy speech plays an important role in speech coding, synthesis, and recognition [1]. Numerous system identification methods have been developed for both noise-free and noisy AR systems. The maximum likelihood (ML) methods are asymptotically consistent, but their convergence performance relies heavily on how they are initialized [2, 3]. In [3], Xie and Leung employed a genetic algorithm to solve the ML estimation problem at a low signal-to-noise ratio (SNR) for an AR system driven by chaos. The Yule-Walker (YW) methods have been widely employed to identify AR systems . The estimation performance of noise compensation-based identification schemes, such as the low-order Yule-Walker (LOYW) method, depends heavily on the accuracy of the a priori knowledge of the noise corrupting the signal . Although the high-order Yule-Walker (HOYW) method does not require an a priori estimate of the noise variance, it suffers from a singularity problem and has a large estimation variance . To reduce the estimation variance, a least-squares HOYW (LSYW) method can be used . However, in the presence of a reasonable level of noise, the estimation variance of the LSYW method is still large. To overcome this problem, Davila [5] proposed a signal/noise subspace YW (SSYW) method by introducing a noise compensation into the LOYW method. Zheng [6] derived a way of removing the noise-induced bias from the standard least-squares (LS) estimator, resulting in the improved least-squares fast-converging (ILSF) algorithm.
Both the SSYW and ILSF methods are computationally fast and provide quite satisfactory estimation results at low levels of SNR.
Identifying AR systems from cepstral coefficients has been attempted by only a few researchers [7, 8]. In [7], a homomorphic LPC (HLPC) method was proposed for both white noise and periodic impulse-train excitations in a noise-free environment. The HLPC method cannot guarantee the stability of the estimated AR model. The ramp-cepstrum method proposed in [8] overcomes this problem, even in a noisy environment. That method employs the conventional cepstrum of a correlation function to formulate a ramp cepstrum for the estimation of the AR parameters. Since the conventional cepstrum is based on Fourier analysis, its implementation via the discrete Fourier transform (DFT) and inverse DFT (IDFT) involves complex arithmetic and a phase unwrapping operation, even when the data are real. Compared with the DFT, the discrete cosine transform (DCT) is often preferred in applications dealing with real signals, for example speech enhancement and speech recognition [1, 9], since it involves only real arithmetic. In addition, the DCT usually requires fewer coefficients than the DFT to represent signal/image data. Moreover, because the DCT of a real sequence is real, its phase takes only the values 0 and π, which makes phase unwrapping very simple. Nevertheless, the DCT has rarely been employed for system identification problems . In [10], the real spectrum computed by the DCT is employed to obtain an AR model describing the squared Hilbert temporal envelope of a sequence. In this paper, motivated by these advantageous features of the DCT, we develop an AR system identification technique in the cepstral domain that employs the DCT rather than the DFT. To this end, unlike the conventional cepstrum, which is determined via Fourier and inverse Fourier transforms, a cosine cepstrum is first formally defined through the cosine and inverse cosine transforms, and then utilized to develop a theory for the AR parameter estimation.
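As a rough illustration of the energy-compaction point above (not taken from the paper), the sketch below compares how much of a ramp signal's energy falls in its four largest orthonormal DCT-II coefficients versus its four largest orthonormal DFT coefficients. The ramp test signal, the helper names `dct2_matrix` and `top_k_energy_fraction`, and the choice of k = 4 are illustrative assumptions.

```python
import numpy as np

def dct2_matrix(N):
    """Orthonormal DCT-II matrix C, so that X = C @ x and x = C.T @ X."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    C[0, :] = np.sqrt(1.0 / N)   # the k = 0 row uses the 1/N normalization
    return C

def top_k_energy_fraction(coeffs, k):
    """Fraction of the total energy carried by the k largest-magnitude coefficients."""
    energy = np.sort(np.abs(coeffs) ** 2)[::-1]
    return energy[:k].sum() / energy.sum()

N = 64
x = np.arange(N, dtype=float)    # smooth, but its periodic extension has a jump

dct_frac = top_k_energy_fraction(dct2_matrix(N) @ x, 4)
dft_frac = top_k_energy_fraction(np.fft.fft(x, norm="ortho"), 4)
print(f"DCT: {dct_frac:.4f}  DFT: {dft_frac:.4f}")
```

Both transforms are orthonormal here, so the fractions are directly comparable. The DCT's implicit even extension avoids the jump that the DFT's periodic extension creates at the frame boundary, which is why its coefficients decay faster for this signal.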
The objective of this paper is to develop an effective cosine-cepstrum-based methodology for the identification of AR systems from heavily noise-corrupted samples of the output observations. The main idea is to fit a model to a transformed version of the corrupted signal, where the transformation itself is noise-robust, and to develop a corresponding target model for the fitting. In the proposed technique, the noise-robust transformation is the ramp cosine cepstrum (RCC) of the one-sided autocorrelation function (OSACF) of an AR signal, for both white noise and periodic impulse-train excitations. With this transformation, we develop the corresponding target model, referred to as the RCC model, for the estimation of the system parameters. The OSACF, rather than the observed signal itself, is used for the cosine cepstrum computation in order to reduce the effect of the noise. Unlike conventional methods, we deal with both white noise and periodic impulse-train excitations. Using the RCC model, a residue-based least-squares (RBLS) optimization scheme is presented for the estimation of the AR parameters. For implementation, the DCT, which simplifies the phase unwrapping problem and offers computational advantages over the DFT, is employed. The proposed method is tested on the estimation of the AR parameters of different synthetic AR systems and on the identification of a human vocal tract system from natural speech signals.
The paper is organized as follows. In Section 2, the problem of AR system identification in the presence of noise is formulated in the cepstral domain. In Section 3, first, a ramp cosine cepstrum model based on a one-sided ACF of an AR signal for the two types of input excitations is derived and then the DCT is employed for the realization of the derived model. Section 4 presents a residue-based least-squares optimization scheme for the AR parameter estimation using the proposed ramp cosine cepstrum model under noisy conditions. The performance of the proposed method is demonstrated in Section 5 through extensive computer simulations for both synthetic and natural speech signals. Finally, in Section 6, salient features of the proposed algorithm are summarized with some concluding remarks.
where is the AR polynomial and represents the th pole with a magnitude and angle . In most system identification problems, is modeled as a stationary zero-mean white Gaussian noise with an unknown variance . For some practical applications, such as speech signal processing, seismology, and communication, however, the excitation may take other forms [1, 11–13]. For example, in speech signal processing, a periodic impulse-train is often used as the excitation of the vocal tract system [1, 11, 13]. Therefore, in this paper, both white Gaussian noise and periodic impulse-train excitations are considered as inputs to the AR system.
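A minimal sketch of generating the two kinds of AR signals considered here. Because equations (1) to (3) are not reproduced in this text, the sketch assumes the common convention A(z) = 1 + a_1 z^{-1} + ... + a_p z^{-p}, so that x(n) = u(n) − Σ a_i x(n−i); the function names and the example pole values are hypothetical.

```python
import numpy as np

def ar_synthesize(a, u):
    """Filter u through 1/A(z), assuming A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p.

    Implements x[n] = u[n] - sum_{i=1}^{p} a[i-1] * x[n-i] with zero initial state.
    """
    a = np.asarray(a, dtype=float)
    p = len(a)
    x = np.zeros(len(u))
    for n in range(len(u)):
        past = x[max(0, n - p):n][::-1]       # x[n-1], x[n-2], ... (at most p values)
        x[n] = u[n] - np.dot(a[:len(past)], past)
    return x

def impulse_train(length, period):
    """Periodic unit impulse train with the given period (in samples)."""
    u = np.zeros(length)
    u[::period] = 1.0
    return u

# Hypothetical AR(2) with a conjugate pole pair 0.9 * exp(+/- j * 1.0).
poles = [0.9 * np.exp(1j * 1.0), 0.9 * np.exp(-1j * 1.0)]
a = np.real(np.poly(poles))[1:]               # drop the leading 1 of A(z)

rng = np.random.default_rng(0)
x_white = ar_synthesize(a, rng.standard_normal(1000))   # white Gaussian excitation
x_pulse = ar_synthesize(a, impulse_train(1000, 80))     # periodic impulse-train excitation
```

For an AR(1) system with a_1 = −0.5 driven by an impulse train of period 4, the output starts 1, 0.5, 0.25, 0.125 and then jumps to 1 + 0.5^4 = 1.0625 when the next pulse arrives, which shows how the periodic excitation overlaps the decaying impulse responses.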
Definition in (4) is valid provided is deterministic. Since a numerical computation of (5) provides only the principal or wrapped phase, a phase unwrapping algorithm is necessary to restore the phase continuity [11, 14].
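To make the phase-unwrapping step concrete, the sketch below computes the complex cepstrum of a minimum-phase AR system numerically, as the IDFT of ln|H| plus j times the unwrapped phase, and compares it with the closed form ĥ(n) = (1/n) Σ_k p_k^n for n ≥ 1, which follows from the power-series expansion of −ln(1 − p_k z^{−1}). The pole values and the FFT length are assumptions; `np.unwrap` is a simple adjacent-sample unwrapper that is adequate here only because the true phase changes slowly between the densely spaced DFT bins.

```python
import numpy as np

# Hypothetical minimum-phase AR(4): two conjugate pole pairs inside the unit circle.
poles = [r * np.exp(s * 1j * th) for r, th in [(0.9, 1.0), (0.8, 2.2)] for s in (1, -1)]
a = np.real(np.poly(poles))      # A(z) coefficients [1, a1, ..., a4]

nfft = 1024
H = 1.0 / np.fft.fft(a, nfft)    # samples of H(e^{jw}) = 1 / A(e^{jw})

# np.angle returns the principal (wrapped) phase; unwrap it before forming the log.
log_H = np.log(np.abs(H)) + 1j * np.unwrap(np.angle(H))
c_dft = np.real(np.fft.ifft(log_H))              # complex cepstrum (aliasing negligible)

n = np.arange(1, 31)
c_true = np.real(sum(p ** n for p in poles)) / n  # closed form for n >= 1
print(np.max(np.abs(c_dft[1:31] - c_true)))
```

Replacing `np.unwrap(np.angle(H))` by the wrapped `np.angle(H)` would introduce 2π jumps wherever the phase crosses ±π, which is exactly what the unwrapping step has to remove.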
where is the cepstrum of the impulse response and represents the cepstrum of one realization of the input signal. Exploiting this property of homomorphic deconvolution, cepstrum-domain methods have been proposed for system identification in [7, 15, 16]. For example, in [7], the AR parameters are estimated by a mean-squared error minimization involving (6), solved using the Cholesky decomposition. However, as mentioned in , the problem with this method is that the stability of the estimated AR model is not guaranteed. Note that all the cepstral-domain methods mentioned above deal only with the noise-free environment.
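A small numerical check of the homomorphic property in (6), written for the real cepstrum (the IDFT of ln|DFT|), which needs no phase unwrapping: when the DFT length is at least the length of the linear convolution, the cepstrum of y = h * u equals the sum of the cepstra of h and u. The two test sequences are arbitrary choices made for illustration.

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Real cepstrum: IDFT of ln|DFT(x)| (assumes the DFT has no exact zeros)."""
    return np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(x, nfft)))))

rng = np.random.default_rng(0)
m = np.arange(50)
h = 0.8 ** m * np.cos(0.5 * m)   # an arbitrary decaying "system" response
u = rng.standard_normal(64)      # one realization of a random excitation
y = np.convolve(h, u)            # linear convolution, length 50 + 64 - 1 = 113

nfft = 256                       # >= 113, so DFT(y) = DFT(h) * DFT(u) exactly
lhs = real_cepstrum(y, nfft)
rhs = real_cepstrum(h, nfft) + real_cepstrum(u, nfft)
print(np.max(np.abs(lhs - rhs)))  # difference between the two sides
```

If `nfft` were shorter than the convolution length, the DFT product would correspond to a circular convolution and the identity would no longer hold for the linear convolution y.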
where arises because of the noise. The term determines how the noise affects and vanishes altogether in the absence of noise. In order to estimate the AR system parameters from , the effect of has to be reduced. It is difficult to obtain an accurate estimate of from , since cepstrum decomposition techniques are very sensitive to the noise level [17, 18]. In this paper, to reduce the effect of noise in extracting the AR parameters, we first avoid computing the cepstrum directly from the noise-corrupted observations by using a one-sided ACF, and then develop a ramp cosine cepstrum (RCC) model for a model-fitting-based least-squares optimization in the cepstral domain. Moreover, in the proposed method, the DCT, instead of the conventional DFT, is employed for computing the cepstrum so as to overcome the problem of phase unwrapping and to achieve computational savings in dealing with real signals.
The OSACF exhibits a higher noise immunity than the conventional ACF does [20]. Compared with that of the conventional two-sided ACF, the spectral envelope of the OSACF of noisy observations strongly enhances the highest-power frequency bands corresponding to the spectral peaks, so the noise components lying outside these enhanced bands are largely attenuated.
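The sketch below computes a one-sided ACF estimate (the biased ACF at lags m ≥ 0, with the negative lags set to zero) and checks a simple identity that links it to the periodogram: twice the real part of its DFT equals the periodogram plus the zero-lag value. The biased estimator and the helper name `osacf` are assumptions made here; the paper's own OSACF computation is given by its equations (10) and (48).

```python
import numpy as np

def osacf(x):
    """One-sided ACF: biased estimate r[m] = (1/N) sum_n x[n] x[n+m] for m = 0..N-1.

    Negative lags are implicitly zero, which is what makes the sequence one-sided.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    return np.array([np.dot(x[:N - m], x[m:]) / N for m in range(N)])

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
r = osacf(x)

nfft = 512                                        # >= len(x), so no time aliasing
S_plus = np.fft.fft(r, nfft)                      # spectrum of the one-sided ACF
periodogram = np.abs(np.fft.fft(x, nfft)) ** 2 / len(x)
print(np.allclose(2 * S_plus.real, periodogram + r[0]))
```

The identity follows because the periodogram equals Σ r(m) e^{−jωm} over both positive and negative lags, and the biased ACF is even, so the one-sided sum counts every lag except m = 0 exactly once.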
It is observed from (16) that for a constant value of for all . Thus for , the last term on the right side of (19) vanishes. Let us now consider the remaining third term of (19) that depends on the characteristics of the input excitation . In the following section we consider separately the white Gaussian noise and a periodic impulse-train as an input excitation.
3.1. White Noise Excitation
The model given by (28) is termed as the AR ramp cosine cepstrum (RCC) model for the OSACF of . This model will be used in the next section to formulate an objective function for the least-squares fitting problem in a noisy environment.
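For intuition about why a ramp-weighted cepstrum is convenient, the following standard derivation, written for the conventional complex cepstrum of the AR impulse response rather than for the DCT-based RCC of the OSACF in (28), shows that ramp weighting turns the cepstrum into a sum of damped cosines, one per conjugate pole pair, each determined by only a magnitude and an angle. The notation (p_k, r_i, θ_i) is chosen here for illustration, and the exact form of (28) may contain additional terms.

```latex
% Minimum-phase AR system with poles p_k, |p_k| < 1
H(z) = \frac{1}{\prod_{k=1}^{p} (1 - p_k z^{-1})}
\quad\Longrightarrow\quad
\ln H(z) = -\sum_{k=1}^{p} \ln(1 - p_k z^{-1})
         = \sum_{n=1}^{\infty} \Big( \frac{1}{n} \sum_{k=1}^{p} p_k^{n} \Big) z^{-n}

% Hence the cepstrum and its ramp-weighted version are
\hat{h}(n) = \frac{1}{n} \sum_{k=1}^{p} p_k^{n}, \qquad
n\,\hat{h}(n) = \sum_{k=1}^{p} p_k^{n}
             = \sum_{i=1}^{p/2} 2\, r_i^{n} \cos(n \theta_i), \qquad n \ge 1

% The last form assumes p even and poles in conjugate pairs r_i e^{\pm j\theta_i}.
```

Each term depends on a single pair (r_i, θ_i), which is consistent with the term-by-term, two-variable fitting described in Section 4.2.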
3.2. Periodic Impulse-Train Excitation
In the derivation of the RCC model with the white noise excitation, it was observed that the term containing the effect of white noise excitation becomes zero for , since the PSD of the input is a constant. However, the situation is more complicated in the case of a periodic impulse-train excitation where the corresponding PSD is no longer a constant. Next, we analyze the effect of the third term of (19), which is now denoted as , on .
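A small numerical illustration of why an impulse-train excitation behaves differently from white noise, and why the number of RCC samples is later chosen smaller than the period: a sequence that is nonzero only at multiples of P has a (real) cepstrum that is also nonzero only at multiples of P, so cepstral lags 1 to P − 1 are untouched by the excitation. The geometric pulse weights β^m are an assumption made only to keep the spectrum away from exact zeros (an equal-amplitude finite train has spectral zeros on the unit circle, where the logarithm is undefined).

```python
import numpy as np

P, K, beta, nfft = 16, 40, 0.9, 1024   # period, number of pulses, weight, DFT length
u = np.zeros(P * K)
u[::P] = beta ** np.arange(K)           # pulses at n = 0, P, 2P, ... with weights beta^m

# Real cepstrum: IDFT of ln|DFT(u)|; nfft is a multiple of P.
c = np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(u, nfft)))))

print(np.max(np.abs(c[1:P])))  # lags strictly between 0 and P
print(c[P])                     # first nonzero lag; close to beta / 2 for this train
```

Because the spectrum of u is periodic in frequency with period 2π/P, its log-magnitude is too, and its IDFT can be nonzero only at multiples of P.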
3.3. Computation of RCC Model via DCT/IDCT
Thus, this representation clearly supports a simple phase unwrapping. In contrast, when the DFT is used to compute the cepstrum, complicated phase unwrapping algorithms such as those proposed in [11, 14] are needed, since the phase is then no longer binary-valued.
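The binary-phase property can be checked directly. The sketch below builds an orthonormal DCT-II matrix (it matches `scipy.fft.dct(x, type=2, norm='ortho')`, but plain NumPy keeps the example self-contained), transforms a real test sequence, and forms the complex logarithm ln|X(k)| + jφ(k), with φ(k) ∈ {0, π} read off the sign of X(k), so no unwrapping algorithm is needed. The test sequence is arbitrary, and the sketch illustrates only the phase handling, not the paper's full cosine-cepstrum definition.

```python
import numpy as np

def dct2_matrix(N):
    """Orthonormal DCT-II matrix C, so that X = C @ x and x = C.T @ X."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    C[0, :] = np.sqrt(1.0 / N)
    return C

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
C = dct2_matrix(len(x))
X = C @ x                                   # real-valued DCT coefficients

phase = np.where(X < 0, np.pi, 0.0)         # the "phase" is binary: 0 or pi
log_X = np.log(np.abs(X)) + 1j * phase      # complex log without any unwrapping

print(np.unique(phase))
```

Exponentiating `log_X` returns the signed coefficients X(k) exactly, since e^{jπ} = −1, and `C.T @ X` recovers x because the matrix is orthonormal.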
4.1. Effect of Additive Noise
Here, the term arises because of the noise. As in (8), would vanish in the absence of noise. The RCC model derived in Section 3 can now be used in (52) for a ramp cosine cepstral model fitting that minimizes the error between and . In this way, the RCC model parameters, and thus the AR parameters, are estimated.
In the presence of additive white Gaussian noise, the zero lag of the noisy ACF is the most severely corrupted lag, because, in expectation, the variance of the white noise adds to the zero lag only. Keeping the zero lag as it is during the computation of the RCC of the OSACF may therefore result in a more erroneous RCC. On the other hand, excluding the zero lag, although it may reduce the effect of noise, would remove the average power of the observed data . Since for , we replace by with in order to reduce the effect of noise. This is particularly useful in the difficult situation where the noise variance and/or noise-only data are not available. The process can efficiently suppress the level of while keeping the shape of similar to that of .
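The following sketch illustrates the observation above with a simulated AR(2) signal: adding white noise of variance σ_v² = 1 shifts the zero-lag ACF value by roughly σ_v², while the other lags change only by estimation fluctuations. The AR(2) pole values, the record length, and the noise level are assumptions, and the sketch does not implement the paper's zero-lag replacement itself.

```python
import numpy as np

def biased_acf(x, max_lag):
    """Biased ACF estimate r[m] = (1/N) sum_n x[n] x[n+m] for m = 0..max_lag."""
    N = len(x)
    return np.array([np.dot(x[:N - m], x[m:]) / N for m in range(max_lag + 1)])

rng = np.random.default_rng(1)
N = 20000
a1, a2 = -1.8 * np.cos(1.0), 0.81   # A(z) = 1 + a1 z^-1 + a2 z^-2, poles 0.9 exp(+/- j)

u = rng.standard_normal(N)
x = np.zeros(N)
for n in range(N):                   # x[n] = u[n] - a1 x[n-1] - a2 x[n-2]
    x[n] = u[n] - (a1 * x[n - 1] if n >= 1 else 0.0) - (a2 * x[n - 2] if n >= 2 else 0.0)

sigma_v2 = 1.0
y = x + np.sqrt(sigma_v2) * rng.standard_normal(N)   # noisy observation

diff = biased_acf(y, 5) - biased_acf(x, 5)
print(np.round(diff, 3))   # diff[0] should sit near sigma_v2, diff[1:] near zero
```

This is why the zero lag needs special treatment, whereas the nonzero lags of the noisy ACF remain close to those of the clean signal.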
4.2. Ramp Cosine Cepstral Fitting: Residue-Based Least-Squares Optimization
Note that and are independent variables and depends on as seen from (29). We seek the optimal solution for and by a search algorithm based on the computation of (53) and (54). To reduce the computational burden, a two-step search algorithm is adopted. In the first step, a coarse search based on the DCT spectrum of the OSACF of the observed data is employed to find the initial estimates of and . In the second step, a fine search is carried out around each initially estimated pair of and to obtain a more accurate estimate. In the fine search, a neighborhood centered at each initial estimate of and is searched with a prescribed search resolution in a bounded region. The pair of and that globally minimizes is selected as the estimate of a desired pole. It can be observed from (54) that the computed values of and are utilized in determining the th residual function . Proceeding in this manner, the AR parameters can be determined using (3) once all the poles have been estimated. In the proposed search scheme, restricting the search range of to the stable region inherently guarantees the stability of the estimated AR system. Another advantage of the RBLS scheme is that in each fine search only one term of the RCC model, instead of the entire model with all constituent terms, is estimated in (53); in this fashion, the terms of the RCC model are obtained sequentially. This converts a multivariable optimization problem into a sequence of two-variable optimization problems, which makes the problem much simpler.
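A simplified sketch of a two-step, residue-based search of this kind, under strong assumptions. The target sequence is the noise-free ramp cepstrum n·ĥ(n) = Σ 2 r_i^n cos(nθ_i) of an AR impulse response, not the paper's RCC of the OSACF. The coarse step picks initial angles from the peaks of a zero-padded FFT magnitude of that sequence, whereas the paper uses the smoothed DCT spectrum of the OSACF. The grid limits, window width, and function names are hypothetical. Each pass fits one conjugate pole pair by a two-variable grid search restricted to r < 1 (so every estimated pole is stable), keeps the global minimizer over the remaining candidates, and subtracts the fitted term to form the residue for the next pass.

```python
import numpy as np

def ramp_cepstrum_ar(poles, L):
    """n * c[n], n = 1..L, of the minimum-phase AR impulse response 1/A(z).

    Because c[n] = sum_k p_k**n / n, the ramp-weighted cepstrum is sum_k p_k**n.
    """
    n = np.arange(1, L + 1)
    return np.real(sum(p ** n for p in poles))

def pair_term(r, theta, n):
    """Contribution 2 r^n cos(n theta) of one conjugate pole pair r exp(+/- j theta)."""
    return 2.0 * r ** n * np.cos(n * theta)

def rbls_sketch(d, n_pairs, r_grid=np.arange(0.05, 0.996, 0.005),
                half_width=0.2, step=0.002):
    """Greedy residue-based grid search: one (r, theta) pair per pass (hypothetical)."""
    n = np.arange(1, len(d) + 1)
    # Coarse step: candidate angles at the largest local maxima of |FFT(d)|.
    spec = np.abs(np.fft.rfft(d, 4096))
    w = np.linspace(0.0, np.pi, len(spec))
    peaks = [i for i in range(1, len(spec) - 1)
             if spec[i] > spec[i - 1] and spec[i] >= spec[i + 1]]
    peaks.sort(key=lambda i: spec[i], reverse=True)
    candidates = [w[i] for i in peaks[:n_pairs]]

    residual = np.asarray(d, dtype=float).copy()
    estimates = []
    for _ in range(n_pairs):
        best = None
        for idx, th0 in enumerate(candidates):
            # Fine step: 2-D grid around the candidate; r_grid < 1 keeps poles stable.
            thetas = np.arange(th0 - half_width, th0 + half_width + step / 2, step)
            models = 2.0 * r_grid[:, None, None] ** n * np.cos(thetas[None, :, None] * n)
            cost = np.sum((residual - models) ** 2, axis=2)
            i, j = np.unravel_index(np.argmin(cost), cost.shape)
            if best is None or cost[i, j] < best[0]:
                best = (cost[i, j], idx, r_grid[i], thetas[j])
        _, idx, r_hat, th_hat = best
        estimates.append((r_hat, th_hat))
        residual -= pair_term(r_hat, th_hat, n)   # residue for the next pass
        candidates.pop(idx)
    return estimates

# Usage on a hypothetical AR(4) with pole pairs (0.9, 1.0 rad) and (0.8, 2.2 rad).
true_pairs = [(0.9, 1.0), (0.8, 2.2)]
poles = [r * np.exp(s * 1j * th) for r, th in true_pairs for s in (1, -1)]
d = ramp_cepstrum_ar(poles, L=40)
est = sorted(rbls_sketch(d, n_pairs=2), key=lambda e: e[1])
est_poles = [r * np.exp(s * 1j * th) for r, th in est for s in (1, -1)]
a_hat = np.real(np.poly(est_poles))   # [1, a1, ..., a4] from the estimated poles
a_true = np.real(np.poly(poles))
print(est)
```

In this noise-free example the estimated pairs should land close to (0.9, 1.0) and (0.8, 2.2), limited by the grid resolution and by a small bias from the overlap between terms; the AR coefficients then follow from the product of the pole factors, in the spirit of (3).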
In this section, extensive simulations are carried out in order to demonstrate the effectiveness of the proposed technique in identifying AR systems in the presence of noise. We investigate the identification performance for synthetic AR signals as well as natural speech signals corrupted by additive noise. The estimation performance of the proposed method, in terms of the accuracy and consistency of the estimated parameters, is compared with that of existing methods, including the improved least-squares fast-converging (ILSF) method [6], the signal/noise subspace Yule-Walker (SSYW) method [5], and the modified least-squares Yule-Walker (MLSYW) method .
5.1. Results on Synthetic AR Systems
5.1.1. White Noise Excitation
From the noisy observations, the OSACF is first computed using (48) and (10). Note that, to implement the cosine cepstrum, the continuous frequency is generally sampled as for , resulting in a -point DCT. Following the description provided in Section 3.3 for the noise-free observations, the DCT-IDCT-based ramp cosine cepstrum (RCC) is computed using . The RCC model parameters are then determined using the residue-based least-squares optimization technique introduced in Section 4.2. In the proposed optimization scheme, the search range for is chosen as , which allows the identification of systems even with very fast-decaying autocorrelations. The initial estimates of are obtained from the locations of the peaks of the smoothed DCT of the OSACF of . The search range for is a range of chosen symmetrically around the initial estimates. Search resolutions of and are used for and , respectively. It has been found experimentally that, to obtain a better estimate of the unknown AR coefficients, the number of RCC samples considered in the model-fitting operation should be higher than . In our experiment, the number of RCC samples is taken as .
5.1.2. Impulse-Train Excitation
We now consider the problem of AR system identification with periodic impulse-train excitations of different periods for various levels of noise. An impulse-train is generated using (30) with a known value of . We choose the number of RCC samples less than ; thus, . A noisy AR signal is generated according to (1) and (7) with . The simulations are carried out for independent trials and the results averaged.
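Experiments such as this one need observations at a prescribed SNR. A minimal helper, assuming the SNR is defined as the ratio of the mean signal power to the noise variance (the paper's exact definition, tied to (7), is not reproduced in this text); the function name and the stand-in sinusoidal signal are hypothetical.

```python
import numpy as np

def add_white_noise(x, snr_db, rng):
    """Return x + v, with v zero-mean white Gaussian noise whose variance is chosen so
    that 10*log10(mean(x**2) / sigma_v**2) equals snr_db."""
    x = np.asarray(x, dtype=float)
    sigma_v = np.sqrt(np.mean(x ** 2) / 10.0 ** (snr_db / 10.0))
    return x + sigma_v * rng.standard_normal(len(x))

rng = np.random.default_rng(0)
x = np.sin(0.3 * np.arange(20000))       # stand-in for a clean AR output
y = add_white_noise(x, -5.0, rng)
measured_snr = 10 * np.log10(np.mean(x ** 2) / np.mean((y - x) ** 2))
print(round(measured_snr, 2))
```

At −5 dB the noise variance is about 3.16 times the signal power, so the measured SNR of a single realization fluctuates slightly around the target; averaging over independent trials, as done above, smooths out this variation.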
We have also compared the proposed ramp cosine cepstrum (RCC) method with our ramp cepstrum (RC) method previously developed in [8], which employs the conventional cepstrum of a correlation function via the DFT and IDFT. The estimation performance of the RCC method was observed to be slightly better than that of the RC method at a very low SNR of around −5 dB, and comparable at other SNR levels. Although the two methods exhibit quite similar estimation performance, the RCC method, being based on the DCT-IDCT implementation, offers significant computational advantages over the RC approach.
5.2. An Application for Vocal Tract System Identification
In this paper, a new technique for the parameter estimation of an AR system from its noise-corrupted output observations has been proposed. A comprehensive and accurate ramp cosine cepstrum (RCC) model of the one-sided ACF of an AR signal, valid for both white noise and periodic impulse-train excitations, has been developed in a unified fashion in order to identify AR systems. A residue-based least-squares ramp cosine cepstral fitting scheme employing the RCC model has been presented. It has been shown that the proposed method provides more accurate estimates of the AR parameters than the existing methods it was compared with. It combines the attractive features of correlation- and cepstral-domain system identification, and offers the flexibility to incorporate a priori knowledge of the parameters, if available, to facilitate the parameter estimation. Extensive experiments on different AR systems have demonstrated that the proposed method is accurate and consistent in estimating the parameters of AR signals at very low levels of SNR. The method has also been applied to noise-corrupted natural speech signals for the estimation of human vocal tract system parameters, and its accuracy has been demonstrated in terms of the PSD of the resulting synthesized speech. The simulation results have revealed that the proposed method is superior to some of the existing methods in handling the parameter estimation problem of natural speech signals under white or real-life babble noise degradation.
This work was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada and the Regroupement Stratégique en Microsystèmes du Québec (ReSMiQ).
- [1] O'Shaughnessy D: Speech Communications: Human and Machine. 2nd edition. IEEE Press, New York, NY, USA; 2000.
- [2] Kay SM: Modern Spectral Estimation: Theory and Application. Prentice-Hall, Englewood Cliffs, NJ, USA; 1988.
- [3] Xie N, Leung H: Blind identification of autoregressive system using chaos. IEEE Transactions on Circuits and Systems I 2005, 52(9):1953-1964.
- [4] Vergara-Dominguez L: New insights into the high-order Yule-Walker equations. IEEE Transactions on Acoustics, Speech, and Signal Processing 1990, 38(9):1649-1651. doi:10.1109/29.60088
- [5] Davila CE: A subspace approach to estimation of autoregressive parameters from noisy measurements. IEEE Transactions on Signal Processing 1998, 46(2):531-534. doi:10.1109/78.655442
- [6] Zheng WX: Fast identification of autoregressive signals from noisy observations. IEEE Transactions on Circuits and Systems II 2005, 52(1):43-48.
- [7] Huang Z, Yang X, Zhu X, Kuh A: Homomorphic linear predictive coding: a new estimation algorithm for all-pole speech modelling. IEE Proceedings, Part I: Communications, Speech and Vision 1990, 137(2):103-108. doi:10.1049/ip-i-2.1990.0014
- [8] Fattah SA, Zhu W-P, Ahmad MO: Identification of autoregressive systems in noise based on a ramp-cepstrum model. IEEE Transactions on Circuits and Systems II 2008, 55(10):1051-1055.
- [9] Wang F, Yip P: Cepstrum analysis using discrete trigonometric transforms. IEEE Transactions on Signal Processing 1991, 39(2):538-541. doi:10.1109/78.80852
- [10] Athineos M, Ellis DPW: Autoregressive modeling of temporal envelopes. IEEE Transactions on Signal Processing 2007, 55(11):5237-5245.
- [11] Oppenheim AV, Schafer RW: Discrete-Time Signal Processing. Prentice-Hall, Englewood Cliffs, NJ, USA; 1989.
- [12] de Almeida SJM, Bermudez JCM, Bershad NJ, Costa MH: A statistical analysis of the affine projection algorithm for unity step size and autoregressive inputs. IEEE Transactions on Circuits and Systems I 2005, 52(7):1394-1405.
- [13] Fattah SA, Zhu W-P, Ahmad MO: A novel technique for the identification of ARMA systems under very low levels of SNR. IEEE Transactions on Circuits and Systems I 2008, 55(7):1988-2001.
- [14] Long DG: Exact computation of the unwrapped phase of a finite-length time series. IEEE Transactions on Acoustics, Speech, and Signal Processing 1988, 36(11):1787-1790. doi:10.1109/29.9019
- [15] Verhelst W, Steenhaut O: A new model for the short-time complex cepstrum of voiced speech. IEEE Transactions on Acoustics, Speech, and Signal Processing 1986, 34(1):43-51. doi:10.1109/TASSP.1986.1164787
- [16] Kobayashi T, Imai S: Spectrum analysis using generalized cepstrum. IEEE Transactions on Acoustics, Speech, and Signal Processing 1984, 32(5):1087-1089. doi:10.1109/TASSP.1984.1164416
- [17] Hwang T-H, Lee L-M, Wang H-C: Cepstral behaviour due to additive noise and a compensation scheme for noisy speech recognition. IEE Proceedings Vision, Image and Signal Processing 1998, 145(5):316-321. doi:10.1049/ip-vis:19982319
- [18] Kim HK, Rose RC: Cepstrum-domain acoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments. IEEE Transactions on Speech and Audio Processing 2003, 11(5):435-446. doi:10.1109/TSA.2003.815515
- [19] Byrnes CI, Enqvist P, Lindquist A: Cepstral coefficients, covariance lags, and pole-zero models for finite data strings. IEEE Transactions on Signal Processing 2001, 49(4):677-693. doi:10.1109/78.912912
- [20] Hernando J, Nadeu C: Linear prediction of the one-sided autocorrelation sequence for noisy speech recognition. IEEE Transactions on Speech and Audio Processing 1997, 5(1):80-84. doi:10.1109/89.554273
- [21] Garofolo JS, Lamel LF, Fisher WM, et al.: TIMIT acoustic-phonetic continuous speech corpus. Proceedings of Linguistic Data Consortium, 1993, Philadelphia, PA, USA.
- [22] Hillenbrand JM, Getty LA, Clark MJ, Wheeler K: Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America 1995, 97(5 I):3099-3111.
- [23] Yegnanarayana B, Veldhuis RNJ: Extraction of vocal-tract system characteristics from speech signals. IEEE Transactions on Speech and Audio Processing 1998, 6(4):313-327. doi:10.1109/89.701359
- [24] Varga A, Steeneken HJM: Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication 1993, 12(3):247-251. doi:10.1016/0167-6393(93)90095-3
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.