From: Strategies for distant speech recognition in reverberant environments
| Component | Setting |
|---|---|
| Input features | 40 log mel filterbank coefficients + Δ + ΔΔ (120 coefficients) |
| | Global mean and variance normalization + utterance-level CMN |
| | Context: 5 left + 5 right frames (11 frames in total) |
| Acoustic model | DNN-HMM |
| | 7 hidden layers, 2048 hidden units per layer |
| | 1320 visible (input) units, 3129 output units (HMM states) |
| Training data | (1) Baseline multi-condition training data (17 h) |
| | (2) Extended multi-condition training data (85 h) |
| Language model | TRI: trigram language model distributed with the WSJ corpus [28] |
| | RNNLM: RNN-based language model (interpolation coefficient 0.5) |
| Decoding parameters | Language model weight: 11 |
| | Beam: 400 |
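The input-feature rows of the table can be illustrated with a minimal NumPy sketch of how the 1320-dimensional DNN input (11 spliced frames × 120 coefficients) is assembled. The table does not specify the delta window, the order of the normalization steps, or how edge frames are handled, so these are assumptions: regression-based deltas over ±2 frames (the common HTK/Kaldi formula), utterance-level CMN applied to the static log mel features before computing deltas, global mean and variance normalization applied last to the spliced vector, and edge frames repeated as padding. The function names are hypothetical.

```python
import numpy as np


def compute_deltas(feats: np.ndarray, n: int = 2) -> np.ndarray:
    """Regression deltas: d_t = sum_k k (c_{t+k} - c_{t-k}) / (2 sum_k k^2), k = 1..n.

    Assumption: edge frames are repeated so the output keeps the input length.
    """
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((n, n), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, n + 1))
    deltas = np.zeros(feats.shape, dtype=float)
    for k in range(1, n + 1):
        ahead = padded[n + k : n + k + num_frames]   # frames t + k
        behind = padded[n - k : n - k + num_frames]  # frames t - k
        deltas += k * (ahead - behind)
    return deltas / denom


def splice(feats: np.ndarray, left: int = 5, right: int = 5) -> np.ndarray:
    """Concatenate each frame with `left` past and `right` future frames."""
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    # Column blocks are ordered from frame t - left to frame t + right.
    return np.concatenate(
        [padded[i : i + num_frames] for i in range(left + right + 1)], axis=1
    )


def make_dnn_input(logmel: np.ndarray, global_mean: np.ndarray,
                   global_std: np.ndarray) -> np.ndarray:
    """(T, 40) log mel features -> (T, 1320) normalized, spliced DNN input."""
    static = logmel - logmel.mean(axis=0, keepdims=True)  # utterance-level CMN
    delta = compute_deltas(static)                        # first-order deltas
    delta2 = compute_deltas(delta)                        # second-order deltas
    frame_feats = np.concatenate([static, delta, delta2], axis=1)  # (T, 120)
    spliced = splice(frame_feats)                         # (T, 11 * 120 = 1320)
    # Global statistics would normally be estimated on the training data.
    return (spliced - global_mean) / global_std


rng = np.random.default_rng(0)
logmel = rng.standard_normal((300, 40))  # stand-in for 300 frames of log mel features
x = make_dnn_input(logmel, np.zeros(1320), np.ones(1320))
print(x.shape)  # (300, 1320)
```

Repeating edge frames keeps the number of output frames equal to the number of input frames, so every frame still receives one DNN input vector and one HMM-state posterior.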
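For a sense of scale of the acoustic-model rows, the sketch below counts the weights and biases of a fully connected network with 1320 inputs, seven hidden layers of 2048 units and 3129 outputs (about 34.3 million parameters). It also shows the usual hybrid DNN-HMM step of turning state posteriors into scaled likelihoods by subtracting log state priors. The sigmoid activation, the random initialization, the tiny demo network, and all function names are assumptions for illustration; the table does not state them.

```python
import numpy as np

# Layer sizes from the table: 1320 inputs, 7 hidden layers of 2048 units, 3129 HMM states.
LAYER_SIZES = [1320] + [2048] * 7 + [3129]


def count_parameters(sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


def init_dnn(sizes, rng):
    """Hypothetical small random initialization (the table gives no init details)."""
    return [(0.01 * rng.standard_normal((n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def log_posteriors(params, x):
    """Sigmoid hidden layers (assumption) followed by a log-softmax over HMM states."""
    h = x
    for w, b in params[:-1]:
        h = 1.0 / (1.0 + np.exp(-(h @ w + b)))
    w, b = params[-1]
    logits = h @ w + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))


def scaled_log_likelihoods(log_post, log_prior):
    """Hybrid DNN-HMM: log p(x|s) ~ log p(s|x) - log p(s), up to a constant."""
    return log_post - log_prior


print(count_parameters(LAYER_SIZES))  # 34294841

# Forward pass on a tiny network of the same form, to keep memory use small.
rng = np.random.default_rng(0)
tiny = init_dnn([12, 8, 8, 5], rng)
lp = log_posteriors(tiny, rng.standard_normal((4, 12)))
prior = np.full(5, 1.0 / 5)  # hypothetical uniform state prior
ll = scaled_log_likelihoods(lp, np.log(prior))
```

The 1320 input units follow directly from the feature rows: 120 coefficients per frame times 11 spliced frames.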
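The language-model and decoding rows of the table meet in the hypothesis score. The sketch below assumes that the RNNLM is linearly interpolated with the trigram at the word level with coefficient 0.5 (for example when rescoring N-best lists), that the language model weight of 11 scales the LM log probability relative to the acoustic log-likelihood, and that the beam of 400 is a score beam relative to the best hypothesis. The table does not say whether the beam is a score threshold or a limit on active hypotheses, nor whether a word insertion penalty is used, so treat these as assumptions. The hypotheses, probabilities and function names are made up for illustration.

```python
import math

LM_WEIGHT = 11.0   # language model weight from the table
RNN_LAMBDA = 0.5   # interpolation coefficient from the table
BEAM = 400.0       # beam from the table, interpreted here as a score beam (assumption)


def interpolated_lm_log_prob(p_tri, p_rnn, lam=RNN_LAMBDA):
    """Sentence log probability with per-word linear interpolation of the two LMs."""
    return sum(math.log(lam * r + (1.0 - lam) * t) for t, r in zip(p_tri, p_rnn))


def total_score(am_log_lik, lm_log_prob, lm_weight=LM_WEIGHT):
    """Log-domain score: acoustic log-likelihood + weighted LM log probability."""
    return am_log_lik + lm_weight * lm_log_prob


def prune(scores, beam=BEAM):
    """Keep hypotheses whose score is within `beam` of the best one."""
    best_score = max(scores.values())
    return {hyp: s for hyp, s in scores.items() if s >= best_score - beam}


# Hypothetical hypotheses: (AM log-likelihood, trigram word probs, RNNLM word probs).
hyps = {
    "hyp_a": (-1000.0, [0.1, 0.2], [0.3, 0.2]),
    "hyp_b": (-990.0, [0.01, 0.05], [0.02, 0.04]),
}
scores = {
    name: total_score(am, interpolated_lm_log_prob(tri, rnn))
    for name, (am, tri, rnn) in hyps.items()
}
best = max(scores, key=scores.get)
print(best)  # hyp_a
```

In this toy case hyp_b has the better acoustic score by 10, but its interpolated LM log probability is lower by about 4.08, which the weight of 11 scales to about 44.9, so hyp_a wins. A large LM weight like this makes the language model dominate close acoustic decisions.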