Table 4 Word error rate of different dereverberation methods for Eval. dataset (%)

From: Environment-dependent denoising autoencoder for distant-talking speech recognition

| Dereverberation method | Acoustic model | SimData Room 1 Near | SimData Room 1 Far | SimData Room 2 Near | SimData Room 2 Far | SimData Room 3 Near | SimData Room 3 Far | SimData Ave. | RealData Room 1 Near | RealData Room 1 Far | RealData Ave. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CMVN | SGMM | 5.47 | 5.88 | 6.59 | 12.68 | 8.29 | 16.77 | 9.28 | 44.84 | 44.53 | 44.69 |
| | DNN | 6.05 | 6.71 | 7.89 | 13.29 | 9.13 | 17.74 | 10.14 | 43.82 | 43.55 | 43.69 |
| | SGMM+DNN | 4.90 | 5.39 | 6.33 | 11.77 | 7.68 | 15.57 | 8.61 | 43.40 | 42.98 | 43.19 |
| MSLP | SGMM | 7.05 | 7.95 | 8.42 | 14.34 | 10.46 | 19.70 | 11.32 | 42.45 | 43.65 | 43.05 |
| | DNN | 7.95 | 9.10 | 9.48 | 15.98 | 11.72 | 21.39 | 12.60 | 43.50 | 44.80 | 44.15 |
| | SGMM+DNN | 4.71 | 5.18 | 5.95 | 9.95 | 7.32 | 14.45 | 7.93 | 35.52 | 36.09 | 35.81 |
| DAE | SGMM | 5.07 | 5.51 | 6.11 | 9.53 | 7.64 | 12.11 | 7.66 | 32.26 | 32.58 | 32.42 |
| | DNN | 5.86 | 6.45 | 7.06 | 10.85 | 7.92 | 12.78 | 8.49 | 31.62 | 32.88 | 32.25 |
| | SGMM+DNN | 4.79 | 5.40 | 5.64 | 9.00 | 7.06 | 10.85 | 7.12 | 30.02 | 31.09 | 30.56 |
| Two-step environment-dependent DAE | SGMM | 4.61 | 6.73 | 5.47 | 10.01 | 7.83 | 12.32 | 7.83 | 30.57 | 33.05 | 29.84 |
| | DNN | 5.57 | 8.37 | 6.16 | 10.90 | 7.85 | 13.14 | 8.66 | 31.49 | 33.59 | 30.47 |
| | SGMM+DNN | 4.25 | 6.32 | 5.19 | 8.95 | 7.08 | 11.50 | 7.22 | 29.16 | 31.03 | 30.10 |
| One-step environment-dependent DAE | SGMM | 4.93 | 5.30 | 5.82 | 8.47 | 7.25 | 10.47 | 7.04 | 28.65 | 28.66 | 28.66 |
| | DNN | 5.29 | 6.05 | 6.43 | 8.81 | 7.20 | 10.97 | 7.46 | 27.95 | 28.26 | 28.11 |
| | SGMM+DNN | 4.54 | 5.05 | 5.37 | 7.62 | 6.50 | 9.40 | 6.41 | 26.38 | 27.28 | 26.83 |
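
As a reading aid, the sketch below shows one way the SimData "Ave." column can be reproduced from the six per-condition word error rates. It assumes the average is an unweighted mean over the Near and Far conditions of the three rooms; the source does not state whether the conditions are weighted by utterance count. The function name `average_wer` is hypothetical and used only for illustration.

```python
def average_wer(wers):
    """Unweighted mean of per-condition WERs (%), rounded to two decimals.

    Assumption: every condition contributes equally to the average.
    """
    return round(sum(wers) / len(wers), 2)


# SimData WERs (%) copied from Table 4, in the order
# Room 1 Near, Room 1 Far, Room 2 Near, Room 2 Far, Room 3 Near, Room 3 Far.
cmvn_sgmm = [5.47, 5.88, 6.59, 12.68, 8.29, 16.77]
one_step_sgmm_dnn = [4.54, 5.05, 5.37, 7.62, 6.50, 9.40]

print(average_wer(cmvn_sgmm))          # CMVN, SGMM row
print(average_wer(one_step_sgmm_dnn))  # One-step environment-dependent DAE, SGMM+DNN row
```

Under this assumption the two rows shown give 9.28 and 6.41, matching the SimData averages listed in the table for those rows.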