K Kinoshita, M Delcroix, T Yoshioka, T Nakatani, E Habets, R Haeb-Umbach, V Leutnant, A Sehr, W Kellermann, R Maas, S Gannot, B Raj, in Proceedings of WASPAA. The REVERB Challenge: A common evaluation framework for dereverberation and recognition of reverberant speech (IEEE, 2013).
Y Tachioka, T Hanazawa, T Iwasaki, Dereverberation method with reverberation time estimation using floored ratio of spectral subtraction. Acoust. Sci. Technol. 34(3), 212–215 (2013).
D Johnson, D Dudgeon, Array Signal Processing (Prentice-Hall, New Jersey, 1993).
C Knapp, G Carter, The generalized correlation method for estimation of time delay. IEEE Trans. Acoust. Speech Signal Process. 24, 320–327 (1976).
Y Tachioka, T Narita, T Iwasaki, Direction of arrival estimation by cross-power spectrum phase analysis using prior distributions and voice activity detection information. Acoust. Sci. Technol. 33, 68–71 (2012).
D Povey, P Woodland, in Proceedings of ICASSP, I. Minimum phone error and I-smoothing for improved discriminative training (IEEE, 2002), pp. 105–108.
E McDermott, T Hazen, J Le Roux, A Nakamura, S Katagiri, Discriminative training for large-vocabulary speech recognition using minimum classification error. IEEE Trans. Audio Speech Lang. Process. 15, 203–223 (2007).
R Haeb-Umbach, H Ney, in Proceedings of ICASSP. Linear discriminant analysis for improved large vocabulary continuous speech recognition (IEEE, 1992), pp. 13–16.
R Gopinath, in Proceedings of ICASSP. Maximum likelihood modeling with Gaussian distributions for classification (IEEE, 1998), pp. 661–664.
M Gales, Semi-tied covariance matrices for hidden Markov models. IEEE Trans. Speech Audio Process. 7, 272–281 (1999).
T Anastasakos, J McDonough, R Schwartz, J Makhoul, in Proceedings of ICSLP. A compact model for speaker-adaptive training (ISCA, 1996), pp. 1137–1140.
D Povey, B Kingsbury, L Mangu, G Saon, H Soltau, G Zweig, in Proceedings of ICASSP. fMPE: Discriminatively trained features for speech recognition (IEEE, 2005), pp. 961–964.
G Hinton, L Deng, D Yu, G Dahl, A Mohamed, N Jaitly, A Senior, V Vanhoucke, P Nguyen, T Sainath, B Kingsbury, Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29, 82–97 (2012).
E Vincent, J Barker, S Watanabe, J Le Roux, F Nesta, M Matassoni, in Proceedings of ICASSP. The second ‘CHiME’ speech separation and recognition challenge: Datasets, tasks and baselines (IEEE, 2013), pp. 126–130.
Y Tachioka, S Watanabe, J Hershey, in Proceedings of ICASSP. Effectiveness of discriminative training and feature transformation for reverberated and noisy speech (IEEE, 2013), pp. 6935–6939.
Y Tachioka, S Watanabe, J Le Roux, J Hershey, in Proceedings of the 2nd CHiME Workshop on Machine Listening in Multisource Environments. Discriminative methods for noise robust speech recognition: A CHiME challenge benchmark, (2013), pp. 19–24.
H Christensen, J Barker, N Ma, P Green, in Proceedings of INTERSPEECH. The CHiME corpus: a resource and a challenge for computational hearing in multisource environments (ISCA, 2010), pp. 1918–1921.
G Saon, S Dharanipragada, D Povey, in Proceedings of ICASSP, I. Feature space Gaussianization (IEEE, 2004), pp. 329–332.
K Palomäki, H Kallasjoki, in Proceedings of REVERB Workshop. Reverberation robust speech recognition by matching distributions of spectrally and temporally decorrelated features, (2014).
D Povey, K Yao, A basis representation of constrained MLLR transforms for robust adaptation. Comput. Speech Lang. 26, 35–51 (2012).
A Mohamed, G Hinton, G Penn, in Proceedings of ICASSP. Understanding how deep belief networks perform acoustic modelling (IEEE, 2012), pp. 4273–4276.
J Fiscus, in Proceedings of ASRU. A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER) (IEEE, 1997), pp. 347–354.
G Evermann, P Woodland, in Proceedings of NIST Speech Transcription Workshop. Posterior probability decoding, confidence estimation and system combination, (2000).
B Hoffmeister, T Klein, R Schlüter, H Ney, in Proceedings of ICSLP. Frame based system combination and a comparison with weighted ROVER and CNC (ISCA, 2006), pp. 537–540.
F Diehl, P Woodland, in Proceedings of INTERSPEECH. Complementary phone error training (ISCA, 2012).
K Audhkhasi, A Zavou, P Georgiou, S Narayanan, Theoretical analysis of diversity in an ensemble of automatic speech recognition systems. IEEE/ACM Trans. Audio Speech Lang. Process. 22(3), 711–726 (2014).
Y Tachioka, S Watanabe, in Proceedings of INTERSPEECH. Discriminative training of acoustic models for system combination (ISCA, 2013), pp. 2355–2359.
Y Tachioka, S Watanabe, J Le Roux, J Hershey, in Proceedings of ASRU. A generalized framework of discriminative training for system combination (IEEE, 2013), pp. 43–48.
D Povey, L Burget, M Agarwal, P Akyazi, K Feng, A Ghoshal, O Glembek, N Goel, M Karafiát, A Rastrow, R Rose, P Schwarz, S Thomas, The subspace Gaussian mixture model: a structured model for speech recognition. Comput. Speech Lang. 25(2), 404–439 (2011).
Y Tachioka, T Narita, S Watanabe, F Weninger, in Proceedings of REVERB Challenge. Dual system combination approach for various reverberant environments, (2014), pp. 1–8.
T Suzuki, Y Kaneda, Sound source direction estimation based on subband peak-hold processing. J. Acoust. Soc. Japan. 65(10), 513–522 (2009).
T Nishiura, T Yamada, T Nakamura, K Shikano, in Proceedings of ICASSP, 2. Localization of multiple sound sources based on a CSP analysis with a microphone array (IEEE, 2000), pp. 1053–1056.
E Habets, in Speech Dereverberation, ed. by P Naylor, N Gaubitch. Speech dereverberation using statistical reverberation models (Springer, London, 2010).
S Boll, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process. 27(2), 113–120 (1979).
AH Sayed, Adaptive Filters (John Wiley & Sons, New Jersey, 2008).
D Povey, D Kanevsky, B Kingsbury, B Ramabhadran, G Saon, K Visweswariah, in Proceedings of ICASSP. Boosted MMI for model and feature-space discriminative training (IEEE, 2008), pp. 4057–4060.
D Povey, in Proceedings of INTERSPEECH. Improvements to fMPE for discriminative training of features (ISCA, 2005), pp. 2977–2980.
K Veselý, A Ghoshal, L Burget, D Povey, in Proceedings of INTERSPEECH. Sequence-discriminative training of deep neural networks, (2013).
T Robinson, J Fransen, D Pye, J Foote, S Renals, in Proceedings of ICASSP. WSJCAM0: a British English speech corpus for large vocabulary continuous speech recognition (IEEE, 1995), pp. 81–84.
D Povey, A Ghoshal, G Boulianne, L Burget, O Glembek, N Goel, M Hannemann, P Motlíček, Y Qian, P Schwarz, J Silovský, G Stemmer, K Veselý, in Proceedings of ASRU. The Kaldi speech recognition toolkit (IEEE, 2011), pp. 1–4.
H Xu, D Povey, L Mangu, J Zhu, in Proceedings of ICASSP. An improved consensus-like method for minimum Bayes risk decoding and lattice combination (IEEE, 2010), pp. 4938–4941.
J Snoek, H Larochelle, R Adams, in Proceedings of Neural Information Processing Systems. Practical Bayesian optimization of machine learning algorithms, (2012).
G Dahl, T Sainath, G Hinton, in Proceedings of ICASSP. Improving deep neural networks for LVCSR using rectified linear units and dropout (IEEE, 2013), pp. 8609–8613.
S Watanabe, J Le Roux, in Proceedings of ICASSP. Black box optimization for automatic speech recognition (IEEE, 2014), pp. 3280–3284.
F Weninger, S Watanabe, Y Tachioka, B Schuller, in Proceedings of ICASSP. Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition (IEEE, 2014), pp. 4656–4660.
F Weninger, S Watanabe, J Le Roux, J Hershey, Y Tachioka, JT Geiger, BW Schuller, G Rigoll, in Proceedings of REVERB Challenge. The MERL/MELCO/TUM system using deep recurrent neural network speech enhancement, (2014), pp. 1–8.
X Xiao, S Zhao, DHH Nguyen, X Zhong, D Jones, E-S Chng, H Li, in Proceedings of REVERB Challenge. The NTU-ADSC systems for reverberation challenge 2014, (2014), pp. 1–8.
MJ Alam, V Gupta, P Kenny, P Dumouchel, in Proceedings of REVERB Challenge. Use of multiple front-ends and i-vector-based speaker adaptation for robust speech recognition, (2014), pp. 1–8.
M Delcroix, T Yoshioka, A Ogawa, Y Kubo, M Fujimoto, N Ito, K Kinoshita, M Espi, T Hori, T Nakatani, A Nakamura, in Proceedings of REVERB Challenge. Linear prediction-based dereverberation with advanced speech enhancement and recognition technologies for the REVERB challenge, (2014), pp. 1–8.