A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition

Abstract

The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible under the assumption of a low-rank model for speech, provided that an estimate of the noise correlation matrix is available. We present an extensive overview of the available estimators, and derive a theoretical estimator to experimentally assess an upper bound to the performance that can be achieved by any subspace-based method. Automatic speech recognition experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing essential signal information for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
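The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes white noise with a known variance (`noise_var`), which is a simplification of the coloured-noise case that the paper handles with a noise correlation matrix estimate. The function names, the frame length, and the `num_cols` parameter are hypothetical. The gain used here is a minimum-variance-style weighting of the singular values, one of several possible estimators. With the default `rank=None`, no explicit rank reduction is applied.

```python
import numpy as np


def hankel_matrix(frame, num_cols):
    """Stack sliding windows of a 1-D frame into an (N - q + 1) x q Hankel matrix."""
    frame = np.asarray(frame, dtype=float)
    num_rows = len(frame) - num_cols + 1
    idx = np.arange(num_rows)[:, None] + np.arange(num_cols)[None, :]
    return frame[idx]


def average_antidiagonals(matrix):
    """Map a matrix back to a signal by averaging along each anti-diagonal."""
    num_rows, num_cols = matrix.shape
    total = np.zeros(num_rows + num_cols - 1)
    count = np.zeros(num_rows + num_cols - 1)
    for j in range(num_cols):
        total[j:j + num_rows] += matrix[:, j]
        count[j:j + num_rows] += 1
    return total / count


def subspace_enhance(frame, noise_var, num_cols=20, rank=None):
    """Enhance one frame by weighting the singular values of its Hankel matrix.

    Assumption (not from the source): the noise is white with variance
    noise_var, so it adds roughly num_rows * noise_var to each squared
    singular value of the noisy Hankel matrix.
    """
    H = hankel_matrix(frame, num_cols)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)

    noise_energy = H.shape[0] * noise_var
    # Minimum-variance-style gain, clipped at zero for noise-dominated directions.
    gains = np.maximum(1.0 - noise_energy / np.maximum(s ** 2, 1e-12), 0.0)

    # Optional explicit rank reduction; rank=None keeps all weighted directions.
    if rank is not None:
        gains[rank:] = 0.0

    H_hat = (U * (gains * s)) @ Vt
    return average_antidiagonals(H_hat)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = np.arange(400)
    clean = np.sin(2 * np.pi * 0.05 * n)          # low-rank stand-in for speech
    noisy = clean + rng.normal(scale=0.5, size=n.size)
    enhanced = subspace_enhance(noisy, noise_var=0.25)
    print("noisy MSE:   ", np.mean((noisy - clean) ** 2))
    print("enhanced MSE:", np.mean((enhanced - clean) ** 2))
```

A single sinusoid stands in for speech here because its Hankel matrix has rank two, which makes the low-rank assumption easy to see. Real speech frames are only approximately low rank.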

Author information

Corresponding author

Correspondence to Kris Hermus.

About this article

Cite this article

Hermus, K., Wambacq, P. & Van hamme, H. A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition. EURASIP J. Adv. Signal Process. 2007, 045821 (2006). https://doi.org/10.1155/2007/45821

Keywords

  • Speech Recognition
  • Automatic Speech Recognition
  • Speech Enhancement
  • Hankel Matrix
  • Noisy Speech