A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition

Hermus, Kris; Wambacq, Patrick; Van hamme, Hugo

doi:10.1155/2007/45821

Published: 01 December 2006

EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 045821 (2006)

Abstract

The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible under the assumption of a low-rank model for speech, and on the availability of an estimate of the noise correlation matrix. We present an extensive overview of the available estimators, and derive a theoretical estimator to experimentally assess an upper bound to the performance that can be achieved by any subspace-based method. Automatic speech recognition experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing essential signal information for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
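The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes white noise with a known variance (the paper addresses coloured noise via a noise correlation matrix estimate), it uses a minimum-variance style gain per singular value, and the names `hankel_matrix`, `average_antidiagonals`, `subspace_filter`, `num_cols`, `noise_var`, and `rank` are hypothetical. Leaving `rank=None` corresponds to filtering without explicit rank reduction of the noisy Hankel matrix.

```python
import numpy as np

def hankel_matrix(frame, num_cols):
    """Stack overlapping windows of `frame` into an (m x num_cols) Hankel matrix."""
    num_rows = len(frame) - num_cols + 1
    idx = np.arange(num_rows)[:, None] + np.arange(num_cols)[None, :]
    return frame[idx]

def average_antidiagonals(H):
    """Map a matrix back to a 1-D signal by averaging each anti-diagonal."""
    m, n = H.shape
    out = np.zeros(m + n - 1)
    counts = np.zeros(m + n - 1)
    for i in range(m):
        out[i:i + n] += H[i]      # row i holds samples i .. i+n-1
        counts[i:i + n] += 1
    return out / counts

def subspace_filter(frame, num_cols, noise_var, rank=None):
    """Illustrative SVD-based subspace filter (assumes white noise, known variance)."""
    H = hankel_matrix(frame, num_cols)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    m = H.shape[0]
    # Assumption: white noise adds roughly m * noise_var to each squared singular value.
    clean_sq = np.maximum(s**2 - m * noise_var, 0.0)
    gains = clean_sq / np.maximum(s**2, 1e-12)   # minimum-variance style weighting
    if rank is not None:
        gains[rank:] = 0.0                       # optional explicit rank reduction
    H_hat = (U * (gains * s)) @ Vt
    return average_antidiagonals(H_hat)
```

A possible usage on a synthetic frame: `enhanced = subspace_filter(noisy_frame, num_cols=20, noise_var=0.25)`, where `noisy_frame` is a 1-D NumPy array. Passing `rank=2` would additionally discard all but the two dominant components, which suits a single sinusoid but, as the abstract notes for recognition, risks removing signal information in real speech.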

Author information

Authors and Affiliations

Department of Electrical Engineering - ESAT, Katholieke Universiteit Leuven, Leuven-Heverlee, 3001, Belgium
Kris Hermus, Patrick Wambacq & Hugo Van hamme

Corresponding author

Correspondence to Kris Hermus.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Hermus, K., Wambacq, P. & Van hamme, H. A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition. EURASIP J. Adv. Signal Process. 2007, 045821 (2006). https://doi.org/10.1155/2007/45821

Received: 24 October 2005
Revised: 07 March 2006
Accepted: 30 April 2006
