Table 1 Settings for the SE Front-end

From: Strategies for distant speech recognition in reverberant environments

WPE
T = 3, T = 40, 30, 7 for 1ch, 2ch and 8ch, respectively
Window length: 32 ms, frame shift: 8 ms
Number of FFT points: 512 (number of frequency bands: 257)
MVDR
Window length: 32 ms, frame shift: 8 ms
DOLPHIN (see [22], Sections V.A.2 and V.B.2, for details)
Window length: 100 ms, frame shift: 25 ms
Spectral feature model:
GMM with 256 components, features: MFCC (13 dimensions)
Source location model:
Watson mixture model (4 components), features: normalized complex spectrum
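
The window and shift settings above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the table does not state the sampling rate, so 16 kHz is an assumption here, and the names stft_params and n_frequency_bands are hypothetical helpers, not part of any of the listed methods. Under that assumption, a 32 ms window is 512 samples, which agrees with the 512 FFT points listed for WPE, and a 512-point FFT of a real signal has 257 non-redundant frequency bands.

```python
# Assumption: 16 kHz sampling rate (not stated in the table).
SAMPLE_RATE_HZ = 16_000


def stft_params(window_ms, shift_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert window length and frame shift from ms to samples."""
    window = round(window_ms * sample_rate_hz / 1000)
    shift = round(shift_ms * sample_rate_hz / 1000)
    return {
        "window_samples": window,
        "shift_samples": shift,
        # Fraction of each window shared with the next one.
        "overlap": 1 - shift / window,
    }


def n_frequency_bands(n_fft):
    """Non-redundant bins of a real-input FFT (DC up to Nyquist)."""
    return n_fft // 2 + 1


# WPE and MVDR rows: 32 ms window, 8 ms shift.
wpe_mvdr = stft_params(32, 8)
# DOLPHIN row: 100 ms window, 25 ms shift.
dolphin = stft_params(100, 25)

print(wpe_mvdr)
print(dolphin)
print(n_frequency_bands(512))
```

Both window and shift pairs in the table use the same ratio, so each analysis window overlaps the next by 75% regardless of the assumed sampling rate.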