Table 4 Word error rate of different dereverberation methods for Eval. dataset (%)

From: Environment-dependent denoising autoencoder for distant-talking speech recognition

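All values are word error rates in %. "Ave." is the average over the preceding columns of the same dataset.

| Dereverberation method | Acoustic model | SimData Room 1 Near | SimData Room 1 Far | SimData Room 2 Near | SimData Room 2 Far | SimData Room 3 Near | SimData Room 3 Far | SimData Ave. | RealData Room 1 Near | RealData Room 1 Far | RealData Ave. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CMVN | SGMM | 5.47 | 5.88 | 6.59 | 12.68 | 8.29 | 16.77 | 9.28 | 44.84 | 44.53 | 44.69 |
| CMVN | DNN | 6.05 | 6.71 | 7.89 | 13.29 | 9.13 | 17.74 | 10.14 | 43.82 | 43.55 | 43.69 |
| CMVN | SGMM+DNN | 4.90 | 5.39 | 6.33 | 11.77 | 7.68 | 15.57 | 8.61 | 43.40 | 42.98 | 43.19 |
| MSLP | SGMM | 7.05 | 7.95 | 8.42 | 14.34 | 10.46 | 19.70 | 11.32 | 42.45 | 43.65 | 43.05 |
| MSLP | DNN | 7.95 | 9.10 | 9.48 | 15.98 | 11.72 | 21.39 | 12.60 | 43.50 | 44.80 | 44.15 |
| MSLP | SGMM+DNN | 4.71 | 5.18 | 5.95 | 9.95 | 7.32 | 14.45 | 7.93 | 35.52 | 36.09 | 35.81 |
| DAE | SGMM | 5.07 | 5.51 | 6.11 | 9.53 | 7.64 | 12.11 | 7.66 | 32.26 | 32.58 | 32.42 |
| DAE | DNN | 5.86 | 6.45 | 7.06 | 10.85 | 7.92 | 12.78 | 8.49 | 31.62 | 32.88 | 32.25 |
| DAE | SGMM+DNN | 4.79 | 5.40 | 5.64 | 9.00 | 7.06 | 10.85 | 7.12 | 30.02 | 31.09 | 30.56 |
| Two-step environment-dependent DAE | SGMM | 4.61 | 6.73 | 5.47 | 10.01 | 7.83 | 12.32 | 7.83 | 30.57 | 33.05 | 29.84 |
| Two-step environment-dependent DAE | DNN | 5.57 | 8.37 | 6.16 | 10.90 | 7.85 | 13.14 | 8.66 | 31.49 | 33.59 | 30.47 |
| Two-step environment-dependent DAE | SGMM+DNN | 4.25 | 6.32 | 5.19 | 8.95 | 7.08 | 11.50 | 7.22 | 29.16 | 31.03 | 30.10 |
| One-step environment-dependent DAE | SGMM | 4.93 | 5.30 | 5.82 | 8.47 | 7.25 | 10.47 | 7.04 | 28.65 | 28.66 | 28.66 |
| One-step environment-dependent DAE | DNN | 5.29 | 6.05 | 6.43 | 8.81 | 7.20 | 10.97 | 7.46 | 27.95 | 28.26 | 28.11 |
| One-step environment-dependent DAE | SGMM+DNN | 4.54 | 5.05 | 5.37 | 7.62 | 6.50 | 9.40 | 6.41 | 26.38 | 27.28 | 26.83 |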
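
A common way to read a table like this is as relative WER reduction over a baseline. The sketch below is illustrative only and not part of the original article. It assumes CMVN is the baseline and uses only the "Ave." values of the SGMM+DNN rows, copied from the table. The names `avg_wer`, `relative_reduction`, and `reductions` are hypothetical.

```python
# Average WER (%) for the SGMM+DNN acoustic model, copied from the table.
# Tuple order: (SimData Ave., RealData Ave.)
avg_wer = {
    "CMVN": (8.61, 43.19),
    "MSLP": (7.93, 35.81),
    "DAE": (7.12, 30.56),
    "Two-step environment-dependent DAE": (7.22, 30.10),
    "One-step environment-dependent DAE": (6.41, 26.83),
}


def relative_reduction(baseline: float, system: float) -> float:
    """Relative WER reduction of `system` over `baseline`, in percent."""
    return 100.0 * (baseline - system) / baseline


# Assumption: CMVN is treated as the baseline for the comparison.
base_sim, base_real = avg_wer["CMVN"]

reductions = {
    method: (
        relative_reduction(base_sim, sim),
        relative_reduction(base_real, real),
    )
    for method, (sim, real) in avg_wer.items()
}

for method, (sim_red, real_red) in reductions.items():
    print(f"{method:36s} SimData {sim_red:6.2f}%  RealData {real_red:6.2f}%")
```

Relative reduction is used here rather than the absolute difference because the SimData and RealData error rates are on very different scales, so absolute differences are not directly comparable across the two datasets.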