Open Access

Mapping Speech Spectra from Throat Microphone to Close-Speaking Microphone: A Neural Network Approach

EURASIP Journal on Advances in Signal Processing 2007, 2007:087219

https://doi.org/10.1155/2007/87219

Received: 4 October 2006

Accepted: 25 March 2007

Published: 16 July 2007

Abstract

Speech recorded from a throat microphone is robust to surrounding noise, but it sounds unnatural compared with speech recorded from a close-speaking microphone. This paper addresses the issue of improving the perceptual quality of throat microphone speech by mapping its speech spectra to those of the close-speaking microphone. A neural network model is used to capture the speaker-dependent functional relationship between the feature vectors (cepstral coefficients) of the two speech signals. A method is proposed to ensure the stability of the all-pole synthesis filter. Objective evaluations indicate the effectiveness of the proposed mapping scheme. The advantage of this method is that the model gives a smooth estimate of the spectra of the close-speaking microphone speech, and no distortions are perceived in the reconstructed speech. The mapping technique is also applied to bandwidth extension of telephone speech.
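To make the mapping concrete, the sketch below illustrates the two ingredients the abstract describes: a neural network that maps per-frame cepstral coefficients of throat microphone speech to those of close-speaking microphone speech, and a stability safeguard for the all-pole synthesis filter. The data arrays, network size, and the pole-reflection stabilizer are illustrative assumptions; in particular, the paper proposes its own stability method, which this sketch does not claim to reproduce.

```python
# A minimal sketch of spectral mapping with a stability safeguard.
# Names, shapes, and the random stand-in data are assumptions, not the
# paper's implementation.
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical training data: each row is a per-frame cepstral vector.
# X_throat holds features from throat-microphone speech; Y_close holds
# time-aligned features from close-speaking-microphone speech.
rng = np.random.default_rng(0)
X_throat = rng.standard_normal((5000, 13))   # stand-in for real cepstra
Y_close = rng.standard_normal((5000, 13))

# One hidden layer learns the speaker-dependent nonlinear mapping
# between the two feature spaces.
net = MLPRegressor(hidden_layer_sizes=(40,), max_iter=500, random_state=0)
net.fit(X_throat, Y_close)
Y_est = net.predict(X_throat)                # smooth close-mic estimate

def stabilize_allpole(a):
    """Reflect poles of A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p
    inside the unit circle.

    This is one standard stabilization step (pole reflection), used
    here only for illustration.
    """
    poles = np.roots(np.concatenate(([1.0], a)))
    outside = np.abs(poles) >= 1.0
    poles[outside] = poles[outside] / np.abs(poles[outside]) ** 2  # mirror
    return np.poly(poles)[1:].real           # back to filter coefficients

# Usage: an unstable denominator (one pole outside the unit circle)
# is mapped to a stable one with the same magnitude response shape.
a_unstable = np.array([-2.5, 1.2])
a_stable = stabilize_allpole(a_unstable)
assert np.all(np.abs(np.roots(np.concatenate(([1.0], a_stable)))) < 1.0)
```

In a full system, the predicted cepstra would be converted back to all-pole filter coefficients before such a stability check, and the stabilized filter excited to synthesize the enhanced speech; those conversion steps are omitted here for brevity.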


Authors’ Affiliations

(1) Department of Computer Science and Engineering, Indian Institute of Technology Madras
(2) International Institute of Information Technology

References

  1. Fuemmeler JA, Hardie RC, Gardner WR: Techniques for the regeneration of wideband speech from narrowband speech. EURASIP Journal on Applied Signal Processing 2001, 2001(4): 266-274. doi:10.1155/S1110865701000300
  2. Hu R, Krishnan V, Anderson DV: Speech bandwidth extension by improved codebook mapping towards increased phonetic classification. Proceedings of the 9th European Conference on Speech Communication and Technology (INTERSPEECH-ICSLP '05), September 2005, Lisbon, Portugal, 1501-1504.
  3. Seltzer ML, Acero A, Droppo J: Robust bandwidth extension of noise-corrupted narrowband speech. Proceedings of the 9th European Conference on Speech Communication and Technology (INTERSPEECH-ICSLP '05), September 2005, Lisbon, Portugal, 1509-1512.
  4. Makhoul J, Berouti M: High-frequency regeneration in speech coding systems. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '79), April 1979, Washington, DC, USA, 4: 428-431.
  5. Geiser B, Jax P, Vary P: Artificial bandwidth extension of speech supported by watermark-transmitted side information. Proceedings of the 9th European Conference on Speech Communication and Technology (INTERSPEECH-ICSLP '05), September 2005, Lisbon, Portugal, 1497-1500.
  6. Epps J, Holmes WH: A new technique for wideband enhancement of coded narrowband speech. Proceedings of IEEE Workshop on Speech Coding, June 1999, Porvoo, Finland, 174-176.
  7. Park K-Y, Kim HS: Narrowband to wideband conversion of speech using GMM based transformation. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '00), June 2000, Istanbul, Turkey, 3: 1843-1846.
  8. Chen G, Parsa V: HMM-based frequency bandwidth extension for speech enhancement using line spectral frequencies. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), May 2004, Montreal, Quebec, Canada, 1: 709-712.
  9. Iser B, Schmidt G: Bandwidth extension of telephony speech. EURASIP Newsletter 2005, 16(2): 2-24.
  10. Uncini A, Gobbi F, Piazza F: Frequency recovery of narrow-band speech using adaptive spline neural networks. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), March 1999, Phoenix, Ariz, USA, 2: 997-1000.
  11. Graciarena M, Franco H, Sonmez K, Bratt H: Combining standard and throat microphones for robust speech recognition. IEEE Signal Processing Letters 2003, 10(3): 72-74. doi:10.1109/LSP.2003.808549
  12. Zhang Z, Liu Z, Sinclair M, et al.: Multi-sensory microphones for robust speech detection, enhancement and recognition. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), May 2004, Montreal, Quebec, Canada, 3: 781-784.
  13. Deller JR, Proakis JG, Hansen JHL: Discrete-Time Processing of Speech Signals. Macmillan, New York, NY, USA; 1993.
  14. Yegnanarayana B: On timing in time-frequency analysis of speech signals. Sadhana 1996, 21(part 1): 5-20.
  15. Shahina A, Yegnanarayana B: Recognition of consonant-vowel units in throat microphone speech. Proceedings of International Conference on Natural Language Processing, December 2005, Kanpur, India, 85-92.
  16. Ladefoged P: A Course in Phonetics. Harcourt College Publishers, Orlando, Fla, USA; 2001.
  17. Shahina A, Yegnanarayana B: Mapping neural networks for bandwidth extension of narrowband speech. Proceedings of the 9th International Conference on Spoken Language Processing (INTERSPEECH-ICSLP '06), September 2006, Pittsburgh, Pa, USA.
  18. Yegnanarayana B: Artificial Neural Networks. Prentice-Hall, New Delhi, India; 1999.
  19. Misra H, Ikbal S, Yegnanarayana B: Speaker-specific mapping for text-independent speaker recognition. Speech Communication 2003, 39(3-4): 301-310. doi:10.1016/S0167-6393(02)00046-8
  20. Haykin S: Neural Networks: A Comprehensive Foundation. Prentice-Hall, Englewood Cliffs, NJ, USA; 1999.

Copyright

© A. Shahina and B. Yegnanarayana 2007

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.