
Table 1 Speech recognition performance on the REVERB Challenge 2014 test set (WER (%))

From: Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with a phone-class feature

 

| No. | System | SimData Room 1 Near | SimData Room 1 Far | SimData Room 2 Near | SimData Room 2 Far | SimData Room 3 Near | SimData Room 3 Far | SimData Ave. | RealData Room 1 Near | RealData Room 1 Far | RealData Ave. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (1) | GMM-HMM (cln) | 10.79 | 16.11 | 33.62 | 81.72 | 43.97 | 88.35 | 45.92 | 87.93 | 85.92 | 86.95 |
| (2) | GMM-HMM (mc) | 14.38 | 14.06 | 15.03 | 28.85 | 19.06 | 35.57 | 21.16 | 46.79 | 45.44 | 46.13 |
| (3) | GMM-HMM (mc, w MLLR) | 12.39 | 12.71 | 14.23 | 26.23 | 17.11 | 33.92 | 19.43 | 42.89 | 42.27 | 42.59 |
| (4) | DNN-HMM (cln) | 6.85 | 10.22 | 16.18 | 45.52 | 23.12 | 60.25 | 27.05 | 65.25 | 66.78 | 65.99 |
| (5) | DNN-HMM (cln) + DAE | 6.25 | 6.78 | 7.65 | 13.67 | 9.04 | 16.75 | 10.03 | 30.66 | 31.87 | 31.25 |
| (6) | DNN-HMM (cln) + pDAE (\(PC_{\textit {soft}}\)) | 5.51 | 6.44 | 7.06 | 12.74 | 8.17 | 14.26 | 9.04 | 27.37 | 26.60 | 27.00 |
| (7) | DNN-HMM (cln) + pDAE (\(PC^{decode}_{\textit {hard}}\)) | 5.18 | 6.12 | 7.14 | 12.57 | 7.66 | 12.42 | 8.54 | 27.75 | 26.60 | 27.20 |
| (8) | DNN-HMM (mc) | 5.42 | 6.37 | 7.27 | 12.56 | 7.85 | 12.90 | 8.74 | 28.59 | 30.87 | 29.67 |
| (9) | DNN-HMM (mc) + DAE | 9.30 | 9.69 | 8.36 | 11.92 | 9.30 | 15.25 | 10.62 | 24.37 | 25.52 | 24.93 |
| (10) | DNN-HMM (mc) + pDAE (\(PC_{\textit {soft}}\)) | 8.59 | 9.13 | 7.77 | 11.53 | 8.74 | 13.53 | 9.87 | 23.47 | 23.09 | 23.29 |
| (11) | DNN-HMM (mc) + pDAE (\(PC^{decode}_{\textit {hard}}\)) | 7.29 | 7.86 | 7.48 | 10.87 | 8.09 | 11.06 | 8.78 | 22.74 | 22.96 | 22.85 |
| (12) | DNN-HMM (retrain) + DAE | 6.10 | 6.32 | 7.04 | 13.04 | 6.89 | 13.50 | 8.83 | 31.30 | 32.14 | 31.71 |
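
When comparing rows of a WER table like this one, a common convenience is the relative WER reduction between a baseline and a candidate system. The short Python sketch below illustrates that arithmetic using two RealData average values copied from Table 1, rows (8) and (11). The function name `relative_wer_reduction` and the dictionary keys are hypothetical, chosen only for illustration. This calculation is not taken from the source article, and the choice of row (8) as the baseline is an assumption made for the example.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction of `system_wer` over `baseline_wer`, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# RealData average WER (%) copied from Table 1, rows (8) and (11).
# The short keys are illustrative labels, not names used in the article.
realdata_ave = {
    "row_8_dnn_hmm_mc": 29.67,
    "row_11_dnn_hmm_mc_pdae_hard": 22.85,
}

reduction = relative_wer_reduction(
    realdata_ave["row_8_dnn_hmm_mc"],
    realdata_ave["row_11_dnn_hmm_mc_pdae_hard"],
)
print(f"Relative WER reduction on RealData: {reduction:.2f}%")
```

The same helper can be applied to any pair of cells in the same column; comparing cells from different columns (for example SimData against RealData) would mix test conditions and is not meaningful.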