From: Environment-dependent denoising autoencoder for distant-talking speech recognition
Dereverberation method | Acoustic model | SimData Room 1 Near | SimData Room 1 Far | SimData Room 2 Near | SimData Room 2 Far | SimData Room 3 Near | SimData Room 3 Far | SimData Ave. | RealData Room 1 Near | RealData Room 1 Far | RealData Ave.
---|---|---|---|---|---|---|---|---|---|---|---
CMVN | SGMM | 3.83 | 5.51 | 6.33 | 12.60 | 7.72 | 14.71 | 8.45 | 42.92 | 44.77 | 43.85
CMVN | DNN | 5.43 | 6.88 | 6.73 | 13.73 | 9.05 | 16.37 | 9.70 | 41.30 | 43.20 | 42.25
CMVN | SGMM+DNN | 3.86 | 5.21 | 5.62 | 11.63 | 7.20 | 13.30 | 7.80 | 41.42 | 42.58 | 42.00
MSLP | SGMM | 5.17 | 5.39 | 6.77 | 10.95 | 7.93 | 15.90 | 8.69 | 36.22 | 38.08 | 37.15
MSLP | DNN | 5.74 | 6.30 | 6.91 | 12.02 | 8.19 | 16.56 | 9.29 | 37.43 | 36.97 | 37.20
MSLP | SGMM+DNN | 4.01 | 5.04 | 5.23 | 10.25 | 5.93 | 12.17 | 7.11 | 35.50 | 36.91 | 36.21
DAE | SGMM | 4.30 | 5.41 | 5.45 | 9.71 | 5.24 | 10.63 | 6.79 | 26.51 | 30.08 | 28.30
DAE | DNN | 5.41 | 6.69 | 6.24 | 10.67 | 6.50 | 11.55 | 7.84 | 28.70 | 29.19 | 28.95
DAE | SGMM+DNN | 4.20 | 4.87 | 5.23 | 9.24 | 4.87 | 9.74 | 6.36 | 26.08 | 28.84 | 27.46
Two-step environment-dependent DAE | SGMM | 3.79 | 5.90 | 4.91 | 9.22 | 6.08 | 8.85 | 6.46 | 28.45 | 31.24 | 29.84
Two-step environment-dependent DAE | DNN | 5.01 | 6.91 | 5.45 | 10.38 | 6.87 | 11.35 | 7.66 | 28.95 | 31.99 | 30.47
Two-step environment-dependent DAE | SGMM+DNN | 3.76 | 4.89 | 4.63 | 8.48 | 6.26 | 9.20 | 6.20 | 27.45 | 29.12 | 28.28
One-step environment-dependent DAE | SGMM | 4.08 | 4.67 | 4.61 | 8.80 | 4.70 | 8.75 | 5.94 | 28.95 | 27.61 | 28.28
One-step environment-dependent DAE | DNN | 4.97 | 6.47 | 5.72 | 10.13 | 5.98 | 9.40 | 7.11 | 28.95 | 27.20 | 28.08
One-step environment-dependent DAE | SGMM+DNN | 3.88 | 4.79 | 5.10 | 7.94 | 4.60 | 8.28 | 5.77 | 25.83 | 27.48 | 26.66