G Hinton, L Deng, D Yu, G Dahl, A Mohamed, N Jaitly, A Senior, V Vanhoucke, T Sainath, B Kingsbury, Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Proc. Mag. 29(6), 82–97 (2012).
JT Geiger, JF Gemmeke, B Schuller, G Rigoll, in Proc. INTERSPEECH. Investigating NMF speech enhancement for neural network based acoustic models (IEEE Singapore, Singapore, 2014).
S Thomas, S Ganapathy, H Hermansky, Recognition of reverberant speech using frequency domain linear prediction. IEEE Signal Proc. Let. 15, 681–684 (2008).
B Kingsbury, N Morgan, S Greenberg, Robust speech recognition using the modulation spectrogram. Speech Commun. 25, 117–132 (1998).
KJ Palomäki, GJ Brown, JP Barker, Techniques for handling convolutional distortion with ‘missing data’ automatic speech recognition. Speech Commun. 43(1–2), 123–142 (2004).
F Weninger, S Watanabe, J Le Roux, JR Hershey, Y Tachioka, J Geiger, B Schuller, G Rigoll, in Proc. REVERB Workshop (REVERB’14). The MERL/MELCO/TUM system for the REVERB Challenge using deep recurrent neural network feature enhancement (Florence, Italy, 2014).
JT Geiger, E Marchi, B Schuller, G Rigoll, in Proc. REVERB Workshop (REVERB’14). The TUM system for the REVERB Challenge: recognition of reverberated speech using multi-channel correlation shaping dereverberation and BLSTM recurrent neural networks (Florence, Italy, 2014).
A Sehr, R Maas, W Kellermann, Reverberation model-based decoding in the logmelspec domain for robust distant-talking speech recognition. IEEE Trans. Audio, Speech, Language Process. 18(7), 1676–1691 (2010).
DD Lee, HS Seung, in Adv. Neur. In. 13, ed. by TK Leen, TG Dietterich, and V Tresp. Algorithms for non-negative matrix factorization (MIT Press, Cambridge, 2001), pp. 556–562.
T Virtanen, Monaural sound source separation by non-negative matrix factorization with temporal continuity and sparseness criteria. IEEE T. Audio Speech. 15(3), 1066–1074 (2007).
P Smaragdis, JC Brown, in IEEE Workshop Applicat. Signal Process. Audio and Acoust. Non-negative matrix factorization for polyphonic music transcription (IEEE, New Paltz, NY, USA, 2003), pp. 177–180.
KW Wilson, B Raj, P Smaragdis, A Divakaran, in Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP). Speech denoising using nonnegative matrix factorization with priors (IEEE, Las Vegas, NV, USA, 2008), pp. 4029–4032.
JF Gemmeke, T Virtanen, A Hurmalainen, Exemplar-based sparse representations for noise robust automatic speech recognition. IEEE T. Audio Speech. 19(7), 2067–2080 (2011).
P Smaragdis, in Independent Component Analysis and Blind Signal Separation. Lecture Notes in Computer Science, 3195, ed. by CG Puntonet, A Prieto. Non-negative matrix factor deconvolution; extraction of multiple sound sources from monophonic inputs (Springer, Berlin Heidelberg, 2004), pp. 494–499.
H Kameoka, T Nakatani, T Yoshioka, in Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP). Robust speech dereverberation based on non-negativity and sparse nature of speech spectrograms (IEEE, Taipei, Taiwan, 2009), pp. 45–48.
K Kumar, R Singh, B Raj, R Stern, in Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP). Gammatone sub-band magnitude-domain dereverberation for ASR (IEEE, Prague, Czech Republic, 2011), pp. 4604–4607.
H Kallasjoki, JF Gemmeke, KJ Palomäki, AV Beeston, GJ Brown, in Proc. REVERB Workshop (REVERB’14). Recognition of reverberant speech by missing data imputation and NMF feature enhancement (Florence, Italy, 2014).
K Palomäki, H Kallasjoki, in Proc. REVERB Workshop (REVERB’14). Reverberation robust speech recognition by matching distributions of spectrally and temporally decorrelated features (Florence, Italy, 2014).
U Remes, in Proc. INTERSPEECH. Bounded conditional mean imputation with an approximate posterior (ISCA, Lyon, France, 2013), pp. 3007–3011.
AV Beeston, GJ Brown, in UK Speech Conf. Modelling reverberation compensation effects in time-forward and time-reversed rooms (Cambridge, UK, 2013).
S Dharanipragada, M Padmanabhan, in Proc. Int. Conf. Spoken Lang. Process. (ICSLP). A non-linear unsupervised adaptation technique for speech recognition (ISCA, Beijing, 2000).
K Kinoshita, M Delcroix, T Yoshioka, T Nakatani, E Habets, R Haeb-Umbach, V Leutnant, A Sehr, W Kellermann, R Maas, S Gannot, B Raj, in Proc. IEEE Workshop Applicat. Signal Process. Audio and Acoust. (WASPAA). The REVERB challenge: a common evaluation framework for dereverberation and recognition of reverberant speech (IEEE, New Paltz, NY, USA, 2013).
D Povey, A Ghoshal, G Boulianne, L Burget, O Glembek, N Goel, M Hannemann, P Motlicek, Y Qian, P Schwarz, J Silovsky, G Stemmer, K Vesely, in IEEE Automat. Speech Recognition and Understanding Workshop. The Kaldi speech recognition toolkit (IEEE, Waikoloa, HI, USA, 2011).
SM Pizer, EP Amburn, JD Austin, R Cromartie, A Geselowitz, T Greer, JB Zimmerman, K Zuiderveld, Adaptive histogram equalization and its variations. Comput. Vision Graph. 39(3), 355–368 (1987).
G Saon, S Dharanipragada, D Povey, in Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), 1. Feature space Gaussianization (IEEE, Montreal, Canada, 2004), pp. 329–332.
CB Moler, Numerical Computing with MATLAB, Revised Reprint Paperback (Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania, 2008).
KJ Palomäki, GJ Brown, JP Barker, in Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP). Recognition of reverberant speech using full cepstral features and spectral missing data (IEEE, Toulouse, France, 2006).
T Robinson, J Fransen, D Pye, J Foote, S Renals, in Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP). WSJCAM0: a British English speech corpus for large vocabulary continuous speech recognition (IEEE, Detroit, MI, USA, 1995).
M Lincoln, I McCowan, J Vepa, HK Maganti, in IEEE Automat. Speech Recognition and Understanding Workshop. The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): Specification and initial experiments (IEEE, Cancún, Mexico, 2005).
Y Tachioka, T Narita, F Weninger, S Watanabe, in Proc. REVERB Workshop (REVERB’14). Dual system combination approach for various reverberant environments with dereverberation techniques (Florence, Italy, 2014).
D Povey, K Yao, in Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP). A basis method for robust estimation of constrained MLLR (IEEE, Prague, Czech Republic, 2011), pp. 4460–4463.
D Povey, D Kanevsky, B Kingsbury, B Ramabhadran, G Saon, K Visweswariah, in Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP). Boosted MMI for model and feature-space discriminative training (IEEE, Las Vegas, NV, USA, 2008), pp. 4057–4060.
H Xu, D Povey, L Mangu, J Zhu, Minimum Bayes risk decoding and system combination based on a recursion for edit distance. Comput. Speech Lang. 25(4), 802–828 (2011).
X Zhang, J Trmal, D Povey, S Khudanpur, in Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP). Improving deep neural network acoustic models using generalized maxout networks (IEEE, Florence, Italy, 2014).
B Kingsbury, in Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP). Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling (IEEE, Taipei, Taiwan, 2009), pp. 3761–3764.
MF Font, Multi-microphone signal processing for automatic speech recognition in meeting rooms. Master’s thesis, Universitat Politècnica de Catalunya, Spain, 2005.
CH Knapp, GC Carter, The generalized correlation method for estimation of time delay. IEEE T. Acoust. Speech. 24(4), 320–327 (1976).
M Delcroix, T Yoshioka, A Ogawa, Y Kubo, M Fujimoto, N Ito, K Kinoshita, M Espi, T Hori, T Nakatani, A Nakamura, in Proc. REVERB Workshop (REVERB’14). Linear prediction-based dereverberation with advanced speech enhancement and recognition technologies for the REVERB Challenge (Florence, Italy, 2014).