Table 1 Settings for the SE Front-end

From: Strategies for distant speech recognition in reverberant environments

WPE

T = 3, T = 40, 30, 7 for 1ch, 2ch, and 8ch, respectively

Window length: 32 ms, frame shift: 8 ms

Number of FFT points: 512 (number of frequency bands: 257)

MVDR

Window length: 32 ms, frame shift: 8 ms

DOLPHIN: see [22], Sections V.A.2 and V.B.2, for details

Window length: 100 ms, frame shift: 25 ms

Spectral feature model:

GMM with 256 components, features: MFCC (13 dimensions)

Source location model:

Watson mixture model (4 components), features: normalized complex spectrum
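
As a rough check on the framing settings above, the window lengths and frame shifts can be converted from milliseconds to samples. The table does not state a sampling rate; the sketch below assumes 16 kHz, and the helper names `ms_to_samples` and `one_sided_bins` are hypothetical, introduced only for illustration. Under that assumption, a 32 ms window spans 512 samples, which is consistent with the 512-point FFT and 257 frequency bands listed for WPE.

```python
# Assumption: 16 kHz sampling rate (not stated in Table 1).
SAMPLE_RATE_HZ = 16000


def ms_to_samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(duration_ms * sample_rate_hz / 1000)


def one_sided_bins(n_fft):
    """Number of non-redundant frequency bins of a real-input FFT."""
    return n_fft // 2 + 1


# (window length in ms, frame shift in ms) per front-end, as listed in Table 1.
settings = {"WPE": (32, 8), "MVDR": (32, 8), "DOLPHIN": (100, 25)}

for name, (window_ms, shift_ms) in settings.items():
    print(name, ms_to_samples(window_ms), ms_to_samples(shift_ms))

# Frequency bands for the 512-point FFT listed under WPE.
print(one_sided_bins(512))
```

With a different sampling rate the sample counts scale proportionally, so the match between the 32 ms window and the 512-point FFT holds only under the 16 kHz assumption.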