
Table 2 Settings for the ASR back-end

From: Strategies for distant speech recognition in reverberant environments

Input features

40 log mel filterbank coefficients + Δ + ΔΔ (120 coef.)

Global mean and variance normalization + utterance level CMN

5 left and 5 right context (11 frames)
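The context setting above can be illustrated with a minimal sketch of frame splicing, in which each 120-coefficient frame is concatenated with its 5 left and 5 right neighbours to give 11 × 120 = 1320 values. The function name `splice` and the edge handling (repeating the first and last frame at utterance boundaries) are assumptions made for illustration; the table does not say how boundaries are treated.

```python
from typing import List

Frame = List[float]


def splice(frames: List[Frame], left: int = 5, right: int = 5) -> List[Frame]:
    """Concatenate each frame with its left/right context frames.

    Assumption: indices outside the utterance are clamped, so the first
    and last frames are repeated at the boundaries.
    """
    n = len(frames)
    spliced: List[Frame] = []
    for t in range(n):
        window: Frame = []
        for k in range(t - left, t + right + 1):
            idx = min(max(k, 0), n - 1)  # clamp to a valid frame index
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


if __name__ == "__main__":
    # 20 dummy frames of 120 coefficients (40 static + 40 delta + 40 delta-delta)
    utterance = [[float(t)] * 120 for t in range(20)]
    out = splice(utterance)
    print(len(out), len(out[0]))
```

The spliced frame length matches the 1320 visible units listed for the acoustic model.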

Acoustic model

DNN-HMM

7 hidden layers, 2048 hidden units, 1320 visible units, 3129 output units (HMM states)
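As a rough size check, the layer dimensions above can be turned into a parameter count. This sketch assumes that every layer is fully connected with a bias vector and that each of the 7 hidden layers has 2048 units; the table gives only the layer sizes, so these are assumptions rather than confirmed details of the model.

```python
from typing import List


def count_parameters(layer_sizes: List[int]) -> int:
    """Count weights and biases of a fully connected feed-forward network.

    Assumption: dense layers with one bias per output unit.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # weight matrix
        total += n_out         # bias vector
    return total


# 1320 visible units, 7 hidden layers of 2048 units, 3129 output units
LAYER_SIZES = [1320] + [2048] * 7 + [3129]

if __name__ == "__main__":
    print(count_parameters(LAYER_SIZES))
```

Under these assumptions the network has roughly 34 million parameters, most of them in the six 2048 × 2048 hidden-to-hidden matrices.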

Training data

(1) Baseline multi-condition training data (17 h)

(2) Extended multi-condition training data (85 h)

Language model

TRI: Trigram language model available with the WSJ corpus [28]

RNNLM: RNN-based language model (interpolation coef. 0.5)
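The interpolation coefficient can be read as follows, assuming the common scheme of linearly interpolating the word probabilities of the RNN-based model and the trigram model. The table does not state the interpolation form, so this is a sketch under that assumption, and the function name `interpolate` is hypothetical.

```python
def interpolate(p_trigram: float, p_rnnlm: float, coef: float = 0.5) -> float:
    """Linearly interpolate two language model probabilities for one word.

    Assumption: P(w | h) = coef * P_rnnlm(w | h) + (1 - coef) * P_trigram(w | h).
    With coef = 0.5 both models contribute equally.
    """
    if not 0.0 <= coef <= 1.0:
        raise ValueError("coef must lie in [0, 1]")
    return coef * p_rnnlm + (1.0 - coef) * p_trigram


if __name__ == "__main__":
    print(interpolate(p_trigram=0.2, p_rnnlm=0.4))
```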

Decoding parameters

Language model weight: 11

Beam: 400
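The language model weight can be illustrated with a minimal sketch, assuming the usual log-domain combination in which the language model log-probability is scaled by the weight and added to the acoustic log-likelihood. The table does not name the decoder or its exact scoring rule, so the function names and the toy scores below are hypothetical. The beam is left out because the table does not say whether 400 is a score width or a hypothesis count.

```python
from typing import List, Tuple

# (word sequence, acoustic log-likelihood, language model log-probability)
Hypothesis = Tuple[str, float, float]


def combined_score(acoustic_logprob: float, lm_logprob: float,
                   lm_weight: float = 11.0) -> float:
    """Assumed scoring rule: acoustic score plus weighted LM score."""
    return acoustic_logprob + lm_weight * lm_logprob


def best_hypothesis(hyps: List[Hypothesis], lm_weight: float = 11.0) -> str:
    """Return the word sequence with the highest combined score."""
    return max(hyps, key=lambda h: combined_score(h[1], h[2], lm_weight))[0]


if __name__ == "__main__":
    # Toy values chosen only to show the effect of the weight.
    hyps = [("a", -100.0, -5.0), ("b", -98.0, -6.0)]
    print(best_hypothesis(hyps, lm_weight=11.0))
    print(best_hypothesis(hyps, lm_weight=1.0))
```

In this toy example the larger weight favours the hypothesis preferred by the language model, while a weight of 1 favours the one with the better acoustic score.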