Table 3 Word error rates (%) of different dereverberation methods on the Dev. dataset

From: Environment-dependent denoising autoencoder for distant-talking speech recognition

Dereverberation method              Acoustic model  SimData                                          RealData
                                                    Room 1        Room 2        Room 3        Ave.   Room 1        Ave.
                                                    Near   Far    Near   Far    Near   Far           Near   Far
CMVN                                SGMM            3.83   5.51   6.33   12.60  7.72   14.71  8.45   42.92  44.77  43.85
                                    DNN             5.43   6.88   6.73   13.73  9.05   16.37  9.70   41.30  43.20  42.25
                                    SGMM+DNN        3.86   5.21   5.62   11.63  7.20   13.30  7.80   41.42  42.58  42.00
MSLP                                SGMM            5.17   5.39   6.77   10.95  7.93   15.90  8.69   36.22  38.08  37.15
                                    DNN             5.74   6.30   6.91   12.02  8.19   16.56  9.29   37.43  36.97  37.20
                                    SGMM+DNN        4.01   5.04   5.23   10.25  5.93   12.17  7.11   35.50  36.91  36.21
DAE                                 SGMM            4.30   5.41   5.45   9.71   5.24   10.63  6.79   26.51  30.08  28.30
                                    DNN             5.41   6.69   6.24   10.67  6.50   11.55  7.84   28.70  29.19  28.95
                                    SGMM+DNN        4.20   4.87   5.23   9.24   4.87   9.74   6.36   26.08  28.84  27.46
Two-step environment-dependent DAE  SGMM            3.79   5.90   4.91   9.22   6.08   8.85   6.46   28.45  31.24  29.84
                                    DNN             5.01   6.91   5.45   10.38  6.87   11.35  7.66   28.95  31.99  30.47
                                    SGMM+DNN        3.76   4.89   4.63   8.48   6.26   9.20   6.20   27.45  29.12  28.28
One-step environment-dependent DAE  SGMM            4.08   4.67   4.61   8.80   4.70   8.75   5.94   28.95  27.61  28.28
                                    DNN             4.97   6.47   5.72   10.13  5.98   9.40   7.11   28.95  27.20  28.08
                                    SGMM+DNN        3.88   4.79   5.10   7.94   4.60   8.28   5.77   25.83  27.48  26.66
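The table does not define its "Ave." columns, but every reported value agrees, to two decimals, with an unweighted mean over the six SimData conditions and over the two RealData conditions respectively. As a minimal sketch under that assumption, the Python script below transcribes Table 3 (the `Row` type and the function names are hypothetical, introduced only for illustration), checks each reported average against the plain mean, and computes the relative WER reduction of the lowest-WER system. The baseline used for that reduction, CMVN with SGMM+DNN, is a choice made here for illustration and is not necessarily the comparison the paper itself draws.

```python
from statistics import mean
from typing import NamedTuple


class Row(NamedTuple):
    """One row of Table 3 (word error rates in %), transcribed by hand."""
    method: str
    model: str
    sim: tuple       # SimData: Room 1 Near/Far, Room 2 Near/Far, Room 3 Near/Far
    real: tuple      # RealData: Room 1 Near/Far
    sim_ave: float   # reported SimData Ave.
    real_ave: float  # reported RealData Ave.


TWO = "Two-step environment-dependent DAE"
ONE = "One-step environment-dependent DAE"
ROWS = [
    Row("CMVN", "SGMM", (3.83, 5.51, 6.33, 12.60, 7.72, 14.71), (42.92, 44.77), 8.45, 43.85),
    Row("CMVN", "DNN", (5.43, 6.88, 6.73, 13.73, 9.05, 16.37), (41.30, 43.20), 9.70, 42.25),
    Row("CMVN", "SGMM+DNN", (3.86, 5.21, 5.62, 11.63, 7.20, 13.30), (41.42, 42.58), 7.80, 42.00),
    Row("MSLP", "SGMM", (5.17, 5.39, 6.77, 10.95, 7.93, 15.90), (36.22, 38.08), 8.69, 37.15),
    Row("MSLP", "DNN", (5.74, 6.30, 6.91, 12.02, 8.19, 16.56), (37.43, 36.97), 9.29, 37.20),
    Row("MSLP", "SGMM+DNN", (4.01, 5.04, 5.23, 10.25, 5.93, 12.17), (35.50, 36.91), 7.11, 36.21),
    Row("DAE", "SGMM", (4.30, 5.41, 5.45, 9.71, 5.24, 10.63), (26.51, 30.08), 6.79, 28.30),
    Row("DAE", "DNN", (5.41, 6.69, 6.24, 10.67, 6.50, 11.55), (28.70, 29.19), 7.84, 28.95),
    Row("DAE", "SGMM+DNN", (4.20, 4.87, 5.23, 9.24, 4.87, 9.74), (26.08, 28.84), 6.36, 27.46),
    Row(TWO, "SGMM", (3.79, 5.90, 4.91, 9.22, 6.08, 8.85), (28.45, 31.24), 6.46, 29.84),
    Row(TWO, "DNN", (5.01, 6.91, 5.45, 10.38, 6.87, 11.35), (28.95, 31.99), 7.66, 30.47),
    Row(TWO, "SGMM+DNN", (3.76, 4.89, 4.63, 8.48, 6.26, 9.20), (27.45, 29.12), 6.20, 28.28),
    Row(ONE, "SGMM", (4.08, 4.67, 4.61, 8.80, 4.70, 8.75), (28.95, 27.61), 5.94, 28.28),
    Row(ONE, "DNN", (4.97, 6.47, 5.72, 10.13, 5.98, 9.40), (28.95, 27.20), 7.11, 28.08),
    Row(ONE, "SGMM+DNN", (3.88, 4.79, 5.10, 7.94, 4.60, 8.28), (25.83, 27.48), 5.77, 26.66),
]

# Reported averages are rounded to two decimals, so allow half a unit in the
# last place plus a little floating-point slack.
TOL = 0.006


def mismatched_averages(rows):
    """Return (method, model) pairs whose reported Ave. is not the plain mean."""
    return [
        (r.method, r.model)
        for r in rows
        if abs(mean(r.sim) - r.sim_ave) > TOL or abs(mean(r.real) - r.real_ave) > TOL
    ]


def relative_reduction(baseline, system):
    """Relative WER reduction of `system` over `baseline`, in percent."""
    return 100.0 * (baseline - system) / baseline


if __name__ == "__main__":
    print("Rows whose Ave. is not the unweighted mean:", mismatched_averages(ROWS))
    # Illustrative baseline (hypothetical choice): CMVN with SGMM+DNN.
    base = next(r for r in ROWS if r.method == "CMVN" and r.model == "SGMM+DNN")
    for name, key in (("SimData", "sim_ave"), ("RealData", "real_ave")):
        best = min(ROWS, key=lambda r: getattr(r, key))
        gain = relative_reduction(getattr(base, key), getattr(best, key))
        print(f"{name}: lowest Ave. {getattr(best, key):.2f} "
              f"({best.method}, {best.model}), {gain:.1f}% relative reduction")
```

With the values as transcribed, the script reports no mismatched averages and selects the one-step environment-dependent DAE with the SGMM+DNN acoustic model as the lowest-WER system on both SimData and RealData.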