Mapping Speech Spectra from Throat Microphone to Close-Speaking Microphone: A Neural Network Approach

Shahina, A.; Yegnanarayana, B.

doi:10.1155/2007/87219

Research Article
Open access
Published: 01 December 2007

A. Shahina¹ &
B. Yegnanarayana²

EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 087219 (2007)

Abstract

Speech recorded from a throat microphone is robust to surrounding noise, but it sounds unnatural compared with speech recorded from a close-speaking microphone. This paper addresses the problem of improving the perceptual quality of throat microphone speech by mapping its spectra to those of close-speaking microphone speech. A neural network model captures the speaker-dependent functional relationship between the feature vectors (cepstral coefficients) of the two speech signals. A method is proposed to ensure the stability of the all-pole synthesis filter. Objective evaluations indicate the effectiveness of the proposed mapping scheme. An advantage of the method is that the model gives a smooth estimate of the spectra of the close-speaking microphone speech, and no distortions are perceived in the reconstructed speech. The same mapping technique is also used for bandwidth extension of telephone speech.
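
The abstract does not describe how the stability of the all-pole synthesis filter is ensured. As a generic illustration only, and not the authors' method, the sketch below shows one textbook way to test and enforce stability: an all-pole filter 1/A(z) is stable when every root of A(z) lies strictly inside the unit circle, and any root outside can be reflected to its conjugate-reciprocal position. The function names `is_stable` and `stabilize` and the coefficient layout `[1, a1, ..., ap]` are assumptions made for this example.

```python
import numpy as np

def is_stable(a):
    """Return True if all poles of 1/A(z) lie strictly inside the unit circle.

    a = [1, a1, ..., ap] holds the coefficients of
    A(z) = 1 + a1*z^-1 + ... + ap*z^-p (assumed layout for this sketch).
    """
    poles = np.roots(a)  # roots of z^p + a1*z^(p-1) + ... + ap
    return bool(np.all(np.abs(poles) < 1.0))

def stabilize(a):
    """Reflect poles outside the unit circle to 1/conj(pole).

    Generic pole-reflection technique, not the method proposed in the paper.
    Poles lying exactly on the unit circle are left unchanged.
    """
    poles = np.roots(a)
    outside = np.abs(poles) > 1.0
    poles[outside] = 1.0 / np.conj(poles[outside])
    # Rebuild the monic polynomial; drop any tiny imaginary residue.
    return np.real(np.poly(poles))

# Hypothetical second-order filter with poles at z = 2 and z = 0.5.
a_unstable = [1.0, -2.5, 1.0]
a_fixed = stabilize(a_unstable)
print(is_stable(a_unstable), is_stable(a_fixed))
```

Reflecting a pole preserves the shape of the magnitude response up to a constant gain factor, which is why this approach is common when only the spectral envelope matters.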

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, 600036, India
A. Shahina
International Institute of Information Technology, Gachibowli, Hyderabad, 500032, India
B. Yegnanarayana

Corresponding author

Correspondence to A. Shahina.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Shahina, A., Yegnanarayana, B. Mapping Speech Spectra from Throat Microphone to Close-Speaking Microphone: A Neural Network Approach. EURASIP J. Adv. Signal Process. 2007, 087219 (2007). https://doi.org/10.1155/2007/87219

Received: 04 October 2006
Accepted: 25 March 2007
Published: 01 December 2007
DOI: https://doi.org/10.1155/2007/87219
