From: Environment-dependent denoising autoencoder for distant-talking speech recognition
Dereverberation method | Acoustic model | SimData Room 1 Near | SimData Room 1 Far | SimData Room 2 Near | SimData Room 2 Far | SimData Room 3 Near | SimData Room 3 Far | SimData Ave. | RealData Room 1 Near | RealData Room 1 Far | RealData Ave.
---|---|---|---|---|---|---|---|---|---|---|---
CMVN | SGMM | 5.47 | 5.88 | 6.59 | 12.68 | 8.29 | 16.77 | 9.28 | 44.84 | 44.53 | 44.69
CMVN | DNN | 6.05 | 6.71 | 7.89 | 13.29 | 9.13 | 17.74 | 10.14 | 43.82 | 43.55 | 43.69
CMVN | SGMM+DNN | 4.90 | 5.39 | 6.33 | 11.77 | 7.68 | 15.57 | 8.61 | 43.40 | 42.98 | 43.19
MSLP | SGMM | 7.05 | 7.95 | 8.42 | 14.34 | 10.46 | 19.70 | 11.32 | 42.45 | 43.65 | 43.05
MSLP | DNN | 7.95 | 9.10 | 9.48 | 15.98 | 11.72 | 21.39 | 12.60 | 43.50 | 44.80 | 44.15
MSLP | SGMM+DNN | 4.71 | 5.18 | 5.95 | 9.95 | 7.32 | 14.45 | 7.93 | 35.52 | 36.09 | 35.81
DAE | SGMM | 5.07 | 5.51 | 6.11 | 9.53 | 7.64 | 12.11 | 7.66 | 32.26 | 32.58 | 32.42
DAE | DNN | 5.86 | 6.45 | 7.06 | 10.85 | 7.92 | 12.78 | 8.49 | 31.62 | 32.88 | 32.25
DAE | SGMM+DNN | 4.79 | 5.40 | 5.64 | 9.00 | 7.06 | 10.85 | 7.12 | 30.02 | 31.09 | 30.56
Two-step environment-dependent DAE | SGMM | 4.61 | 6.73 | 5.47 | 10.01 | 7.83 | 12.32 | 7.83 | 30.57 | 33.05 | 29.84
Two-step environment-dependent DAE | DNN | 5.57 | 8.37 | 6.16 | 10.90 | 7.85 | 13.14 | 8.66 | 31.49 | 33.59 | 30.47
Two-step environment-dependent DAE | SGMM+DNN | 4.25 | 6.32 | 5.19 | 8.95 | 7.08 | 11.50 | 7.22 | 29.16 | 31.03 | 30.10
One-step environment-dependent DAE | SGMM | 4.93 | 5.30 | 5.82 | 8.47 | 7.25 | 10.47 | 7.04 | 28.65 | 28.66 | 28.66
One-step environment-dependent DAE | DNN | 5.29 | 6.05 | 6.43 | 8.81 | 7.20 | 10.97 | 7.46 | 27.95 | 28.26 | 28.11
One-step environment-dependent DAE | SGMM+DNN | 4.54 | 5.05 | 5.37 | 7.62 | 6.50 | 9.40 | 6.41 | 26.38 | 27.28 | 26.83