Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with a phone-class feature

EURASIP Journal on Advances in Signal Processing

Table 1 Speech recognition performance on the REVERB Challenge 2014 test set (WER (%))

		SimData							RealData
		Room 1		Room 2		Room 3		Ave.	Room 1		Ave.
		Near	Far	Near	Far	Near	Far		Near	Far
(1)	GMM-HMM (cln)	10.79	16.11	33.62	81.72	43.97	88.35	45.92	87.93	85.92	86.95
(2)	GMM-HMM (mc)	14.38	14.06	15.03	28.85	19.06	35.57	21.16	46.79	45.44	46.13
(3)	GMM-HMM (mc, w MLLR)	12.39	12.71	14.23	26.23	17.11	33.92	19.43	42.89	42.27	42.59
(4)	DNN-HMM (cln)	6.85	10.22	16.18	45.52	23.12	60.25	27.05	65.25	66.78	65.99
(5)	DNN-HMM (cln) + DAE	6.25	6.78	7.65	13.67	9.04	16.75	10.03	30.66	31.87	31.25
(6)	DNN-HMM (cln) + pDAE (\(PC_{\textit {soft}}\))	5.51	6.44	7.06	12.74	8.17	14.26	9.04	27.37	26.60	27.00
(7)	DNN-HMM (cln) + pDAE (\(PC^{decode}_{\textit {hard}}\))	5.18	6.12	7.14	12.57	7.66	12.42	8.54	27.75	26.60	27.20
(8)	DNN-HMM (mc)	5.42	6.37	7.27	12.56	7.85	12.90	8.74	28.59	30.87	29.67
(9)	DNN-HMM (mc) + DAE	9.30	9.69	8.36	11.92	9.30	15.25	10.62	24.37	25.52	24.93
(10)	DNN-HMM (mc) + pDAE (\(PC_{\textit {soft}}\))	8.59	9.13	7.77	11.53	8.74	13.53	9.87	23.47	23.09	23.29
(11)	DNN-HMM (mc) + pDAE (\(PC^{decode}_{\textit {hard}}\))	7.29	7.86	7.48	10.87	8.09	11.06	8.78	22.74	22.96	22.85
(12)	DNN-HMM (retrain) + DAE	6.10	6.32	7.04	13.04	6.89	13.50	8.83	31.30	32.14	31.71