Mapping Speech Spectra from Throat Microphone to Close-Speaking Microphone: A Neural Network Approach
EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 087219 (2007)
Speech recorded from a throat microphone is robust to surrounding noise but sounds unnatural, unlike speech recorded from a close-speaking microphone. This paper addresses the problem of improving the perceptual quality of throat microphone speech by mapping its speech spectra to those of close-speaking microphone speech. A neural network model is used to capture the speaker-dependent functional relationship between the feature vectors (cepstral coefficients) of the two speech signals. A method is proposed to ensure the stability of the all-pole synthesis filter. Objective evaluations indicate the effectiveness of the proposed mapping scheme. The advantage of this method is that the model gives a smooth estimate of the spectra of the close-speaking microphone speech. No distortions are perceived in the reconstructed speech. This mapping technique is also used for bandwidth extension of telephone speech.
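The abstract does not say how the stability of the all-pole synthesis filter is ensured. As an illustration only, the sketch below shows one generic textbook technique, pole reflection. This is an assumption made here and not necessarily the authors' method. The function names `is_stable` and `reflect_poles`, the `max_radius` parameter, and the coefficient convention A(z) = 1 + a1 z^-1 + ... + ap z^-p are hypothetical choices for this example.

```python
import numpy as np


def is_stable(a):
    """Return True if every pole of 1/A(z) lies strictly inside the unit circle.

    `a` holds the denominator coefficients [1, a1, ..., ap] (assumed convention).
    """
    return bool(np.all(np.abs(np.roots(a)) < 1.0))


def reflect_poles(a, max_radius=0.999):
    """Reflect poles on or outside the unit circle to their reciprocal positions.

    Illustrative assumption, not the paper's procedure. Poles that still sit
    on or very near the unit circle are pulled in to `max_radius`.
    """
    a = np.asarray(a, dtype=float)
    poles = np.roots(a / a[0]).astype(complex)

    # Move each pole with |p| >= 1 to 1 / conj(p), which lies inside the circle
    # (or stays on it when |p| == 1 exactly).
    outside = np.abs(poles) >= 1.0
    poles[outside] = 1.0 / np.conj(poles[outside])

    # Clamp any pole whose radius still exceeds max_radius.
    radius = np.abs(poles)
    too_close = radius > max_radius
    poles[too_close] *= max_radius / radius[too_close]

    # Rebuild the monic polynomial; conjugate pairs give real coefficients.
    return np.real(np.poly(poles))


if __name__ == "__main__":
    unstable = [1.0, -2.5, 1.0]      # poles at 2.0 and 0.5
    fixed = reflect_poles(unstable)  # poles at 0.5 and 0.5
    print(is_stable(unstable), is_stable(fixed), fixed)
```

Reflecting a pole to its reciprocal position keeps the shape of the magnitude response up to a constant gain, which is why this is a common repair for an all-pole filter. The extra clamping step changes the response slightly for poles near the unit circle.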
Rights and permissions
Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Shahina, A., Yegnanarayana, B. Mapping Speech Spectra from Throat Microphone to Close-Speaking Microphone: A Neural Network Approach. EURASIP J. Adv. Signal Process. 2007, 087219 (2007). https://doi.org/10.1155/2007/87219
- Neural Network
- Feature Vector
- Neural Network Model
- Speech Signal
- Mapping Technique