Enhancing the magnitude spectrum of speech features for robust speech recognition

EURASIP Journal on Advances in Signal Processing

Table 5 Recognition accuracy (%) achieved by various approaches on the Aurora-2 multi-condition training task, averaged over SNRs from 0 to 20 dB. AVG (%) is the average accuracy and RR (%) is the relative error rate reduction with respect to the MFCC baseline

Method	Set A	Set B	Set C	AVG	RR
MFCC baseline	86.10	86.05	83.88	85.64	–
SS_Berouti	83.66	84.00	82.93	83.65	-13.86
WF_PSNR	83.96	84.50	83.03	83.99	-11.49
MMSE log-STSA	81.21	82.62	80.82	81.69	-27.48
MSE	87.91	87.41	82.21	86.57	6.49
MVN	90.38	90.41	89.82	90.28	32.31
SS_Berouti+MVN	86.89	87.91	85.72	87.06	9.91
WF_PSNR+MVN	84.78	85.24	84.57	84.92	-5.00
MMSE log-STSA+MVN	86.99	86.82	85.99	86.72	7.53
MSE+MVN	90.00	89.59	87.01	89.24	25.07
HEQ	89.98	90.05	89.59	89.93	29.87
SS_Berouti+HEQ	87.38	88.21	86.23	87.48	12.83
WF_PSNR+HEQ	84.77	85.16	84.16	84.80	-5.84
MMSE log-STSA+HEQ	86.42	86.53	85.27	86.24	4.15
MSE+HEQ	89.78	90.03	87.74	89.47	26.67
MVA	90.97	91.04	90.85	90.98	37.19
SS_Berouti+MVA	88.14	88.69	87.37	88.21	17.90
WF_PSNR+MVA	85.80	85.68	85.38	85.67	0.20
MMSE log-STSA+MVA	87.40	87.17	86.59	87.14	10.46
MSE+MVA	90.69	89.75	88.28	89.83	29.17
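
The AVG and RR columns can be reproduced from the other entries. The sketch below is a minimal illustration, not the authors' code. It assumes that RR is the usual relative error rate reduction, 100 × (baseline error − method error) / baseline error, with error = 100 − accuracy, and that AVG weights Sets A, B, and C as 2:2:1, which is the common Aurora-2 convention. The table does not state either formula explicitly, but both assumptions match the tabulated values to within rounding. The function names are hypothetical.

```python
def relative_error_reduction(acc, baseline_acc):
    """Relative error rate reduction (%) of a method over a baseline.

    Both arguments are accuracies in percent. Assumed definition:
    RR = 100 * (baseline_error - method_error) / baseline_error.
    """
    baseline_err = 100.0 - baseline_acc
    method_err = 100.0 - acc
    return 100.0 * (baseline_err - method_err) / baseline_err


def weighted_average(set_a, set_b, set_c):
    """Average accuracy with Sets A, B, C weighted 2:2:1 (assumed weighting)."""
    return (2.0 * set_a + 2.0 * set_b + set_c) / 5.0


BASELINE_AVG = 85.64  # AVG of the MFCC baseline row

# AVG values copied from three rows of the table.
rows = {"SS_Berouti": 83.65, "MVN": 90.28, "MVA": 90.98}

for name, avg in rows.items():
    rr = relative_error_reduction(avg, BASELINE_AVG)
    print(f"{name}: RR = {rr:.2f}")
```

A negative RR, as for SS_Berouti, means the method yields more errors than the MFCC baseline. Because the tabulated accuracies are already rounded to two decimals, recomputed values may differ from the table in the last digit.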