  • Research Article
  • Open Access

A Comprehensive Noise Robust Speech Parameterization Algorithm Using Wavelet Packet Decomposition-Based Denoising and Speech Feature Representation Techniques

EURASIP Journal on Advances in Signal Processing 2007, 2007:064102

https://doi.org/10.1155/2007/64102

  • Received: 22 May 2006
  • Accepted: 11 April 2007
  • Published:

Abstract

This paper addresses automatic speech recognition in noisy and otherwise adverse environments. Its main goal is the definition, implementation, and evaluation of a novel noise robust speech signal parameterization algorithm. The proposed procedure is based on a time-frequency representation of the speech signal obtained by wavelet packet decomposition. A new modified soft thresholding algorithm based on time-frequency adaptive threshold determination was developed to reduce the level of additive noise in the noisy input speech signal. A two-stage Gaussian mixture model (GMM)-based classifier was developed to perform speech/nonspeech as well as voiced/unvoiced classification. An adaptive topology of the wavelet packet decomposition tree, driven by voiced/unvoiced detection, was introduced so that voiced and unvoiced segments of the speech signal are analyzed separately. The main feature vector combines log-root compressed wavelet packet parameters and autoregressive parameters. The final output feature vector is produced by a two-stage feature vector postprocessing procedure. In the experimental framework, the noisy speech databases Aurora 2 and Aurora 3 were used together with the corresponding standardized acoustic model training/testing procedures. The automatic speech recognition performance achieved with the proposed noise robust speech parameterization procedure was compared with that of the standardized mel-frequency cepstral coefficient (MFCC) feature extraction procedures ETSI ES 201 108 and ETSI ES 202 050.
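The denoising idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' modified algorithm: the abstract does not specify the time-frequency adaptive threshold, so the sketch substitutes the classical soft thresholding of Donoho [25] with a per-subband universal threshold, and it assumes a Haar filter and a fixed, full decomposition tree. All function names are hypothetical.

```python
import math
import statistics


def haar_split(x):
    """One Haar analysis step: return (approximation, detail) half-bands."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_merge(approx, detail):
    """Inverse of haar_split: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def wp_decompose(x, depth):
    """Full wavelet packet tree; returns the 2**depth leaf subbands.

    Assumes len(x) is divisible by 2**depth.
    """
    nodes = [list(x)]
    for _ in range(depth):
        # Unlike the plain wavelet transform, every node is split again.
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes


def wp_reconstruct(nodes):
    """Merge sibling subbands pairwise until one signal remains."""
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]


def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr (classical soft rule)."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def universal_threshold(coeffs):
    """sigma * sqrt(2 ln N), with sigma estimated from the median magnitude."""
    sigma = statistics.median(abs(c) for c in coeffs) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(len(coeffs)))


def denoise(x, depth=2, thr=None):
    """Decompose, soft-threshold every leaf subband, and reconstruct."""
    leaves = wp_decompose(x, depth)
    shrunk = [
        soft_threshold(band, universal_threshold(band) if thr is None else thr)
        for band in leaves
    ]
    return wp_reconstruct(shrunk)
```

Calling `denoise(frame, depth=3)` on a frame whose length is a multiple of 8 returns a frame of the same length. Thresholding each leaf separately only loosely mirrors the subband-dependent behaviour described in the abstract; a voiced/unvoiced-dependent tree topology would require replacing `wp_decompose` with a tree selected per segment.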

Keywords

  • Speech Signal
  • Gaussian Mixture Model
  • Wavelet Packet
  • Automatic Speech Recognition
  • Noisy Speech

[1-54]

Authors’ Affiliations

(1) Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova ul. 17, Maribor, 2000, Slovenia

References

  1. Junqua J-C, Haton JP: Robustness in Automatic Speech Recognition. Kluwer Academic, Boston, Mass, USA; 1996.
  2. Allen JB: How do humans process and recognize speech? IEEE Transactions on Speech and Audio Processing 1994,2(4):567-577. 10.1109/89.326615
  3. Gales MJF: Model-based techniques for noise robust speech recognition, Ph.D. thesis. University of Cambridge, Cambridge, UK; 1996.
  4. Gong Y: Speech recognition in noisy environments: a survey. Speech Communication 1995,16(3):261-291. 10.1016/0167-6393(94)00059-J
  5. ETSI standard document - ETSI ES 201 108 v1.1.1 : Speech Processing, Transmission and Quality aspects (STQ), Distributed speech recognition, Front-end feature extraction algorithm, Compression algorithm. 2000.
  6. ETSI standard document - ETSI ES 202 050 v1.1.1 : Speech Processing, Transmission and Quality aspects (STQ), Distributed speech recognition, Advanced front-end feature extraction algorithm, Compression algorithm. 2002.
  7. Davis SB, Mermelstein P: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing 1980,28(4):357-366. 10.1109/TASSP.1980.1163420
  8. Bourlard H, Dupont S: Subband-based speech recognition. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), April 1997, Munich, Germany 2: 1251-1254.
  9. Gowdy JN, Tufekci Z: Mel-scaled discrete wavelet coefficients for speech recognition. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '00), June 2000, Istanbul, Turkey 3: 1351-1354.
  10. Gupta M, Gilbert A: Robust speech recognition using wavelet coefficient features. Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '01), December 2001, Madonna di Campiglio, Trento, Italy 445-448.
  11. Hermansky H: Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America 1990,87(4):1738-1752. 10.1121/1.399423
  12. Paliwal KK: On the use of line spectral frequency parameters for speech recognition. Digital Signal Processing 1992,2(2):80-87. 10.1016/1051-2004(92)90028-W
  13. Deller JR, Proakis JG, Hansen JHL: Discrete-Time Processing of Speech Signals. Macmillan, New York, NY, USA; 1993.
  14. Rabiner L, Juang B-H: Fundamentals of Speech Recognition. Prentice Hall, Upper Saddle River, NJ, USA; 1993. Section 4.5.
  15. Coifman RR, Wickerhauser MV: Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory 1992,38(2, part 2):713-718. 10.1109/18.119732
  16. Daubechies I: Ten Lectures on Wavelets. SIAM, Philadelphia, Pa, USA; 1997.
  17. Mallat SG: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 1989,11(7):674-693. 10.1109/34.192463
  18. Strang G, Nguyen T: Wavelets and Filter Banks. Wellesley-Cambridge Press, Wellesley, Mass, USA; 1997.
  19. Lu C-T, Wang H-C: Enhancement of single channel speech based on masking property and wavelet transform. Speech Communication 2003,41(2-3):409-427. 10.1016/S0167-6393(03)00011-6
  20. Sarikaya R, Pellom BL, Hansen JHL: Wavelet packet transform features with application to speaker identification. Proceedings of the 3rd IEEE Nordic Signal Processing Symposium (NORSIG '98), June 1998, Vigsø, Denmark 81-84.
  21. Sheikhzadeh H, Abutalebi HR: An improved wavelet-based speech enhancement system. Proceedings of the 7th European Conference on Speech Communication and Technology (EUROSPEECH '01), September 2001, Aalborg, Denmark 1855-1858.
  22. Ramchandran K, Vetterli M, Herley C: Wavelets, subband coding, and best bases. Proceedings of the IEEE 1996,84(4):541-560. 10.1109/5.488699
  23. Reyes NR, Zurera MR, Ferreras FL, Amores PJ: Adaptive wavelet-packet analysis for audio coding purposes. Signal Processing 2003,83(5):919-929. 10.1016/S0165-1684(02)00489-9
  24. Bahoura M, Rouat J: Wavelet speech enhancement based on the Teager energy operator. IEEE Signal Processing Letters 2001,8(1):10-12. 10.1109/97.889636
  25. Donoho DL: De-noising by soft-thresholding. IEEE Transactions on Information Theory 1995,41(3):613-627. 10.1109/18.382009
  26. Jafer E, Mahdi AE: Wavelet-based perceptual speech enhancement using adaptive threshold estimation. Proceedings of the 8th European Conference on Speech Communication and Technology (EUROSPEECH '03), September 2003, Geneva, Switzerland 569-572.
  27. Jansen M: Wavelet thresholding and noise reduction, Ph.D. thesis. Katholieke Universiteit Leuven, Leuven, Belgium; 2000.
  28. Andrassy B, Vlaj D, Beaugeant C: Recognition performance of the Siemens front-end with and without frame dropping on the Aurora 2 database. Proceedings of the 7th European Conference on Speech Communication and Technology (EUROSPEECH '01), September 2001, Aalborg, Denmark 193-196.
  29. Hirsch H-G, Pearce D: The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. Proceedings of the Automatic Speech Recognition: Challenges for the New Millennium (ISCA ITRW ASR '00), September 2000, Paris, France 181-188.
  30. Pearce D: Enabling new speech driven services for mobile devices: an overview of the ETSI standards activities for distributed speech recognition front-ends. Proceedings of Applied Voice Input/Output Society Conference (AVIOS '00), May 2000, San Jose, Calif, USA.
  31. AU/225/00 : Baseline Results for Subset of SpeechDat-Car Finnish Database for ETSI STQ WI008 Advanced Front-end Evaluation. Nokia, January 2000.
  32. AU/271/00 : Spanish SDC-Aurora Database for ETSI STQ Aurora WI008 Advanced DSR Front-End Evaluation: Description and Baseline Results. UPC, November 2000.
  33. AU/273/00 : Description and Baseline Results for the Subset of the SpeechDat-Car German Database used for ETSI STQ Aurora WI008 Advanced DSR Front-end Evaluation. Texas Instruments, December 2001.
  34. AU/378/01 : Danish SpeechDat-Car Digits Database for ETSI STQ-Aurora Advanced DSR. Aalborg University, January 2001.
  35. Macho D, Mauuary L, Noe B, et al.: Evaluation of a noise-robust DSR front-end on Aurora database. Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP '02), September 2002, Denver, Colo, USA 17-20.
  36. Ephraim Y, Malah D: Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing 1984,32(6):1109-1121. 10.1109/TASSP.1984.1164453
  37. Kotnik B, Vlaj D, Horvat B: Efficient noise robust feature extraction algorithms for distributed speech recognition (DSR) systems. International Journal of Speech Technology 2003,6(3):205-219. 10.1023/A:1023410018862
  38. Martin R: Spectral subtraction based on minimum statistics. Proceedings of the European Signal Processing Conference (EUSIPCO '94), September 1994, Edinburgh, UK 1182-1185.
  39. O'Shaughnessy D: Enhancing speech degraded by additive noise or interfering speakers. IEEE Communications Magazine 1989,27(2):46-52. 10.1109/35.17653
  40. McClellan JH, Parks TW: A unified approach to the design of optimum FIR linear-phase digital filters. IEEE Transactions on Circuit Theory 1973,20(6):697-701.
  41. Rioul O, Duhamel P: A Remez exchange algorithm for orthonormal wavelets. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 1994,41(8):550-560. 10.1109/82.318943
  42. Boll SF: Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing 1979,27(2):113-120. 10.1109/TASSP.1979.1163209
  43. Hillenbrand JM, Gayvert RT: Vowel classification based on fundamental frequency and formant frequencies. Journal of Speech and Hearing Research 1993,36(4):694-700.
  44. Klein M: A Study of Voice Activity Detectors. Speech Communications 304-523B, McGill University, Computer and Electrical Engineering Department, April 2000.
  45. Mak B, Junqua J-C, Reaves B: A robust speech/non-speech detection algorithm using time and frequency-based features. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '92), March 1992, San Francisco, Calif, USA 1: 269-272.
  46. Nemer E, Goubran R, Mahmoud S: Robust voice activity detection using higher-order statistics in the LPC residual domain. IEEE Transactions on Speech and Audio Processing 2001,9(3):217-231. 10.1109/89.905996
  47. Sohn J, Sung W: A voice activity detector employing soft decision based noise spectrum adaptation. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), May 1998, Seattle, Wash, USA 1: 365-368.
  48. Sohn J, Kim NS, Sung W: A statistical model-based voice activity detection. IEEE Signal Processing Letters 1999,6(1):1-3. 10.1109/97.736233
  49. Young S, Kershaw D, Odell J, Ollason D, Valtchev V, Woodland P: The HTK Book, Version 3.0. Microsoft, Redmond, Wash, USA; 2000.
  50. Atal BS: Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. The Journal of the Acoustical Society of America 1974,55(6):1304-1312. 10.1121/1.1914702
  51. de Wet F, Cranen B, de Veth J, Boves L: A comparison of LPC and FFT-based acoustic features for noise robust ASR. Proceedings of the 7th European Conference on Speech Communication and Technology (EUROSPEECH '01), September 2001, Aalborg, Denmark 865-868.
  52. Sarikaya R, Hansen JHL: Analysis of the root-cepstrum for acoustic modeling and fast decoding in speech recognition. Proceedings of the 7th European Conference on Speech Communication and Technology (EUROSPEECH '01), September 2001, Aalborg, Denmark 687-690.
  53. Kotnik B, Kačič Z, Horvat B: Development and integration of the LDA-toolkit into the COST249 SpeechDat (II) SIG reference recognizer. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC '04), May 2004, Lisbon, Portugal 2083-2086.
  54. Welling L: Merkmalsextraktion in Spracherkennungssystemen für grossen Wortschatz, Ph.D. thesis. Rheinisch-Westfälische Technische Hochschule, Aachen, Germany; 1999.
