Table 3 Word error rate of different dereverberation methods for Dev. dataset (%)

From: Environment-dependent denoising autoencoder for distant-talking speech recognition

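| Dereverberation method | Acoustic model | SimData Room 1, Near | SimData Room 1, Far | SimData Room 2, Near | SimData Room 2, Far | SimData Room 3, Near | SimData Room 3, Far | SimData Ave. | RealData Room 1, Near | RealData Room 1, Far | RealData Ave. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CMVN | SGMM | 3.83 | 5.51 | 6.33 | 12.60 | 7.72 | 14.71 | 8.45 | 42.92 | 44.77 | 43.85 |
| CMVN | DNN | 5.43 | 6.88 | 6.73 | 13.73 | 9.05 | 16.37 | 9.70 | 41.30 | 43.20 | 42.25 |
| CMVN | SGMM+DNN | 3.86 | 5.21 | 5.62 | 11.63 | 7.20 | 13.30 | 7.80 | 41.42 | 42.58 | 42.00 |
| MSLP | SGMM | 5.17 | 5.39 | 6.77 | 10.95 | 7.93 | 15.90 | 8.69 | 36.22 | 38.08 | 37.15 |
| MSLP | DNN | 5.74 | 6.30 | 6.91 | 12.02 | 8.19 | 16.56 | 9.29 | 37.43 | 36.97 | 37.20 |
| MSLP | SGMM+DNN | 4.01 | 5.04 | 5.23 | 10.25 | 5.93 | 12.17 | 7.11 | 35.50 | 36.91 | 36.21 |
| DAE | SGMM | 4.30 | 5.41 | 5.45 | 9.71 | 5.24 | 10.63 | 6.79 | 26.51 | 30.08 | 28.30 |
| DAE | DNN | 5.41 | 6.69 | 6.24 | 10.67 | 6.50 | 11.55 | 7.84 | 28.70 | 29.19 | 28.95 |
| DAE | SGMM+DNN | 4.20 | 4.87 | 5.23 | 9.24 | 4.87 | 9.74 | 6.36 | 26.08 | 28.84 | 27.46 |
| Two-step environment-dependent DAE | SGMM | 3.79 | 5.90 | 4.91 | 9.22 | 6.08 | 8.85 | 6.46 | 28.45 | 31.24 | 29.84 |
| Two-step environment-dependent DAE | DNN | 5.01 | 6.91 | 5.45 | 10.38 | 6.87 | 11.35 | 7.66 | 28.95 | 31.99 | 30.47 |
| Two-step environment-dependent DAE | SGMM+DNN | 3.76 | 4.89 | 4.63 | 8.48 | 6.26 | 9.20 | 6.20 | 27.45 | 29.12 | 28.28 |
| One-step environment-dependent DAE | SGMM | 4.08 | 4.67 | 4.61 | 8.80 | 4.70 | 8.75 | 5.94 | 28.95 | 27.61 | 28.28 |
| One-step environment-dependent DAE | DNN | 4.97 | 6.47 | 5.72 | 10.13 | 5.98 | 9.40 | 7.11 | 28.95 | 27.20 | 28.08 |
| One-step environment-dependent DAE | SGMM+DNN | 3.88 | 4.79 | 5.10 | 7.94 | 4.60 | 8.28 | 5.77 | 25.83 | 27.48 | 26.66 |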