Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 32(6), 1109–1121 (1984).
Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 33(2), 443–445 (1985).
P. Scalart, J. V. Filho, in Proc. of ICASSP. Speech enhancement based on a priori signal to noise estimation (IEEE, Atlanta, 1996), pp. 629–632.
T. Lotter, P. Vary, Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model. EURASIP J. Adv. Sig. Process. 2005(7), 1110–1126 (2005).
C. Breithaupt, T. Gerkmann, R. Martin, in Proc. of ICASSP. A novel a priori SNR estimation approach based on selective cepstro-temporal smoothing (IEEE, Las Vegas, 2008), pp. 4897–4900.
S. Elshamy, N. Madhu, W. Tirry, T. Fingscheidt, Instantaneous a priori SNR estimation by cepstral excitation manipulation. IEEE/ACM Trans. Audio Speech Lang. Process. 25(8), 1592–1605 (2017).
R. Martin, Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. Speech Audio Process. 9(5), 504–512 (2001).
I. Cohen, Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging. IEEE Trans. Speech Audio Process. 11(5), 466–475 (2003).
T. Gerkmann, R. C. Hendriks, Unbiased MMSE-based noise power estimation with low complexity and low tracking delay. IEEE/ACM Trans. Audio Speech Lang. Process. 20(4), 1383–1393 (2012).
S. Rangachari, P. C. Loizou, A noise-estimation algorithm for highly non-stationary environments. Speech Commun. 48(2), 220–231 (2006).
P. C. Loizou, Speech enhancement: theory and practice (CRC Press, Boca Raton, 2007).
Y. Wang, D. L. Wang, Towards scaling up classification-based speech separation. IEEE/ACM Trans. Audio Speech Lang. Process. 21(7), 1381–1390 (2013).
Y. Xu, J. Du, L. R. Dai, C. H. Lee, An experimental study on speech enhancement based on deep neural networks. IEEE Sig. Process. Lett. 21(1), 65–68 (2014).
Y. Xu, J. Du, L. R. Dai, C. H. Lee, A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Trans. Audio Speech Lang. Process. 23(1), 7–19 (2015).
Y. Wang, A. Narayanan, D. L. Wang, On training targets for supervised speech separation. IEEE/ACM Trans. Audio Speech Lang. Process. 22(12), 1849–1858 (2014).
F. Weninger, J. R. Hershey, J. Le Roux, B. Schuller, in Proc. of GlobalSIP Machine Learning Applications in Speech Processing Symposium. Discriminatively trained recurrent neural networks for single-channel speech separation (IEEE, Atlanta, 2014), pp. 577–581.
D. S. Williamson, Y. Wang, D. L. Wang, Complex ratio masking for monaural speech separation. IEEE/ACM Trans. Audio Speech Lang. Process. 24(3), 483–492 (2016).
Z. Zhao, H. Liu, T. Fingscheidt, Convolutional neural networks to enhance coded speech. IEEE/ACM Trans. Audio Speech Lang. Process. 27(4), 663–678 (2019).
S. Elshamy, N. Madhu, W. Tirry, T. Fingscheidt, DNN-supported speech enhancement with cepstral estimation of both excitation and envelope. IEEE/ACM Trans. Audio Speech Lang. Process. 26(12), 2460–2474 (2018).
N. Takahashi, N. Goswami, Y. Mitsufuji, in Proc. of IWAENC. MMdenseLSTM: an efficient combination of convolutional and recurrent neural networks for audio source separation (IEEE, Tokyo, 2018), pp. 106–110.
T. Gao, J. Du, L. -R. Dai, C. -H. Lee, in Proc. of ICASSP. Densely connected progressive learning for LSTM-based speech enhancement (IEEE, Calgary, 2018), pp. 5054–5058.
H. Erdogan, J. R. Hershey, S. Watanabe, J. Le Roux, in Proc. of ICASSP. Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks (IEEE, Brisbane, 2015), pp. 708–712.
T. Fingscheidt, S. Suhadi, in Proc. of INTERSPEECH. Quality assessment of speech enhancement systems by separation of enhanced speech, noise, and echo (ISCA, Antwerpen, 2007).
ITU-T Rec. P.1100, Narrow-band hands-free communication in motor vehicles (2015).
J. Chen, D. L. Wang, Long short-term memory for speaker generalization in supervised speech separation. J. Acoust. Soc. Am. 141(6), 4705–4714 (2017).
S. -W. Fu, T. Hu, Y. Tsao, X. Lu, in Proc. of MLSP. Complex spectrogram enhancement by convolutional neural network with multi-metrics learning (IEEE, Tokyo, 2017), pp. 1–6.
S. R. Park, J. Lee, in Proc. of INTERSPEECH. A fully convolutional neural network for speech enhancement (ISCA, Stockholm, 2017), pp. 1993–1997.
X. Mao, C. Shen, Y. -B. Yang, in Proc. of NIPS. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections (Curran Associates, Inc., Barcelona, 2016), pp. 2802–2810.
V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017).
H. Noh, S. Hong, B. Han, in Proc. of ICCV. Learning deconvolution network for semantic segmentation (IEEE, Santiago, 2015), pp. 1520–1528.
H. Zhao, S. Zarar, I. Tashev, C. Lee, in Proc. of ICASSP. Convolutional-recurrent neural networks for speech enhancement (IEEE, Calgary, 2018), pp. 2401–2405.
K. Tan, D. L. Wang, in Proc. of INTERSPEECH. A convolutional recurrent neural network for real-time speech enhancement (ISCA, Hyderabad, 2018), pp. 3229–3233.
Z. Xu, M. Strake, T. Fingscheidt, Concatenated identical DNN (CI-DNN) to reduce noise-type dependence in DNN-based speech enhancement. arXiv:1810.11217 (2018).
M. Tinston, Y. Ephraim, in Proc. of CISS. Speech enhancement using the multistage Wiener filter (IEEE, Baltimore, 2009), pp. 55–60.
D. S. Williamson, Y. Wang, D. L. Wang, Reconstruction techniques for improving the perceptual quality of binary masked speech. J. Acoust. Soc. Am. 136(2), 892–902 (2014).
E. M. Grais, H. Erdogan, in Proc. of INTERSPEECH. Spectro-temporal post-enhancement using MMSE estimation in NMF based single-channel source separation (ISCA, Lyon, 2013).
E. M. Grais, G. Roma, A. J. R. Simpson, M. D. Plumbley, Two-stage single-channel audio source separation using deep neural networks. IEEE/ACM Trans. Audio Speech Lang. Process. 25(9), 1773–1783 (2017).
Z. Zhao, H. Liu, T. Fingscheidt, Convolutional neural networks to enhance coded speech. IEEE/ACM Trans. Audio Speech Lang. Process. 27(4), 663–678 (2019).
M. Strake, B. Defraene, K. Fluyt, W. Tirry, T. Fingscheidt, in Proc. of WASPAA. Separated noise suppression and speech restoration: LSTM-based speech enhancement in two stages (IEEE, New Paltz, 2019), pp. 234–238.
T. Fingscheidt, S. Suhadi, in ITG-Fachtagung Sprachkommunikation. Data-driven speech enhancement (ITG, Kiel, 2006).
T. Fingscheidt, S. Suhadi, S. Stan, Environment-optimized speech enhancement. IEEE/ACM Trans. Audio Speech Lang. Process. 16(4), 825–834 (2008).
R. Pascanu, C. Gulcehre, K. Cho, Y. Bengio, How to construct deep recurrent neural networks. arXiv:1312.6026 (2013).
V. Nair, G. E. Hinton, in Proc. of ICML. Rectified linear units improve restricted Boltzmann machines (Omnipress, Haifa, 2010), pp. 807–814.
S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997).
M. D. Zeiler, G. W. Taylor, R. Fergus, in Proc. of ICCV. Adaptive deconvolutional networks for mid and high level feature learning (IEEE, Barcelona, 2011), pp. 2018–2025.
A. L. Maas, A. Y. Hannun, A. Y. Ng, in Proc. of ICML Workshop on Deep Learning for Audio, Speech, and Language Processing. Rectifier nonlinearities improve neural network acoustic models (Omnipress, Atlanta, 2013).
V. Dumoulin, F. Visin, A guide to convolution arithmetic for deep learning. arXiv:1603.07285 (2016).
J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, TIMIT acoustic-phonetic continuous speech corpus (Linguistic Data Consortium, Philadelphia, 1993).
NTT Advanced Technology Corporation, Super wideband stereo speech database, San Jose, CA, USA.
D. B. Dean, S. Sridharan, R. J. Vogt, M. W. Mason, in Proc. of INTERSPEECH. The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms (ISCA, Makuhari, 2010), pp. 3110–3113.
H. -G. Hirsch, D. Pearce, in Proc. of ASR2000 - Automatic Speech Recognition: Challenges for the New Millennium, ISCA Tutorial and Research Workshop. The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions (ISCA, Paris, 2000), pp. 181–188.
ETSI EG 202 396-1, Speech Processing, Transmission and Quality Aspects (STQ); Speech Quality Performance in the Presence of Background Noise; Part 1: Background Noise Simulation Technique and Background Noise Database (2008).
P. J. Werbos, Generalization of backpropagation with application to a recurrent gas market model. Neural Netw. 1(4), 339–356 (1988).
D. P. Kingma, J. Ba, Adam: a method for stochastic optimization. arXiv:1412.6980 (2014).
D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning representations by back-propagating errors. Nature. 323(6088), 533–536 (1986).
H. Yu, Post-filter optimization for multichannel automotive speech enhancement. PhD thesis, Technische Universität Braunschweig (2013).
X. Shi, Z. Chen, H. Wang, D. Y. Yeung, W. Wong, W. Woo, in Proc. of NIPS. Convolutional LSTM network: a machine learning approach for precipitation nowcasting (Curran Associates, Inc., Montreal, 2015), pp. 802–810.
ITU-T Rec. G.160 Appendix II, Objective measures for the characterization of the basic functioning of noise reduction algorithms (2012).
V. Mai, D. Pastor, A. Aïssa-El-Bey, R. Le-Bidan, Robust estimation of non-stationary noise power spectrum for speech enhancement. IEEE/ACM Trans. Audio Speech Lang. Process. 23(4), 670–682 (2015).
M. Rahmani, A. Akbari, B. Ayad, B. Lithgow, Noise cross PSD estimation using phase information in diffuse noise field. Sig. Process. 89(5), 703–709 (2009).
A. Sugiyama, R. Miyahara, in Proc. of ICASSP. A directional noise suppressor with a specified beamwidth (IEEE, Brisbane, 2015), pp. 524–528.
ITU-T Rec. P.862, Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs (2001).
C. H. Taal, R. C. Hendriks, R. Heusdens, J. Jensen, in Proc. of ICASSP. A short-time objective intelligibility measure for time-frequency weighted noisy speech (IEEE, Dallas, 2010), pp. 4214–4217.
S. Gustafsson, R. Martin, P. Vary, in Proc. of Workshop on Quality Assessment in Speech, Audio, and Image Communication. On the optimization of speech enhancement systems using instrumental measures (ITG/EURASIP, Darmstadt, 1996), pp. 36–40.