Table 4 WER (%) obtained with the non-reverberant part of the evaluation data set

From: Reverberant speech recognition exploiting clarity index estimation

 

| Method | Clean, R1 | Clean, R2 | Clean, R3 |
|---|---|---|---|
| Clean-cond. | 12.83 | 12.20 | 11.62 |
| Multi-cond. | 30.29 | 30.00 | 30.10 |
| NIRA-CART | | | |
| Clean&Multi cond. | 13.98 | 13.76 | 12.81 |
| C50FV | 28.87 | 28.80 | 28.29 |
| C50HLDA | 25.84 | 24.97 | 25.76 |
| MS3 | 22.31 | 21.64 | 22.59 |
| MS3+C50HLDA | 19.91 | 19.87 | 19.95 |
| MS5 | 22.72 | 21.39 | 22.86 |
| MS5+C50HLDA | 20.18 | 19.57 | 20.47 |
| MS8 | 21.94 | 20.69 | 22.11 |
| MS8+C50HLDA | 20.62 | 19.07 | 19.38 |
| MS11 | 21.70 | 20.04 | 21.58 |
| MS11+C50HLDA | 20.67 | 19.76 | 19.06 |
| MS14 | 21.57 | 20.63 | 21.84 |
| MS14+C50HLDA | 19.77 | 19.07 | 19.31 |
| MS18 | 22.26 | 21.13 | 22.52 |
| MS18+C50HLDA | 21.47 | 20.31 | 20.41 |
| NIRA-BLSTM | | | |
| Clean&Multi cond. | 12.98 | 12.32 | 11.76 |
| C50FV | 28.80 | 29.02 | 28.44 |
| C50HLDA | 26.45 | 25.28 | 25.87 |
| MS3 | 20.89 | 20.13 | 21.02 |
| MS3+C50HLDA | 18.84 | 18.40 | 18.87 |
| MS5 | 21.62 | 20.68 | 21.70 |
| MS5+C50HLDA | 19.16 | 19.15 | 19.96 |
| MS8 | 20.35 | 19.39 | 20.20 |
| MS8+C50HLDA | 19.04 | 18.33 | 18.39 |
| MS11 | 19.03 | 18.04 | 18.87 |
| MS11+C50HLDA | 18.26 | 17.80 | 17.23 |
| MS14 | 19.37 | 18.75 | 18.87 |
| MS14+C50HLDA | 17.55 | 17.74 | 17.23 |
| MS18 | 18.50 | 18.20 | 18.51 |
| MS18+C50HLDA | 17.38 | 16.82 | 16.69 |

  1. The first two rows correspond to the baseline methods, and the remainder are the methods proposed in this work. R1, R2 and R3 represent room numbers 1, 2 and 3, respectively. The best result in each column is shown in italics.