Mapping Speech Spectra from Throat Microphone to Close-Speaking Microphone: A Neural Network Approach

Abstract

Speech recorded from a throat microphone is robust to surrounding noise, but it sounds unnatural compared with speech recorded from a close-speaking microphone. This paper addresses the problem of improving the perceptual quality of throat microphone speech by mapping its spectra onto those of close-speaking microphone speech. A neural network model is used to capture the speaker-dependent functional relationship between the feature vectors (cepstral coefficients) of the two speech signals. A method is proposed to ensure the stability of the resulting all-pole synthesis filter. Objective evaluations indicate the effectiveness of the proposed mapping scheme. An advantage of this method is that the model yields a smooth estimate of the spectra of the close-speaking microphone speech, and no distortions are perceived in the reconstructed speech. The mapping technique is also applied to bandwidth extension of telephone speech.
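To illustrate the stability issue the abstract raises: when the coefficients of an all-pole synthesis filter 1/A(z) are estimated indirectly (here, via a mapped feature vector), some poles of the filter may fall outside the unit circle, making synthesis unstable. One standard remedy, sketched below with NumPy, is to reflect each unstable pole z to 1/z*, which keeps the magnitude spectrum unchanged while forcing all poles inside the unit circle. This is an illustrative, commonly used technique; the paper itself proposes its own method, which is not reproduced here.

```python
import numpy as np

def stabilize_all_pole(a):
    """Reflect poles of the all-pole filter 1/A(z) that lie outside
    the unit circle back inside it.

    `a` holds the denominator coefficients [1, a1, ..., ap] of A(z).
    Reflecting a pole z to 1/conj(z) preserves |A(e^{jw})| up to a
    constant gain, so the spectral envelope shape is retained.
    """
    poles = np.roots(a)
    fixed = np.where(np.abs(poles) > 1.0, 1.0 / np.conj(poles), poles)
    return np.real(np.poly(fixed))

# Example: a second-order filter with an unstable pole at z = 1.25
a_unstable = np.poly([1.25, 0.5])        # coefficients [1, -1.75, 0.625]
a_stable = stabilize_all_pole(a_unstable)
print(a_stable)                          # pole 1.25 reflected to 0.8
```

After stabilization, all poles have magnitude strictly less than one, so the synthesis filter driven by the excitation signal cannot diverge.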


Author information

Correspondence to A. Shahina.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Shahina, A., Yegnanarayana, B. Mapping Speech Spectra from Throat Microphone to Close-Speaking Microphone: A Neural Network Approach. EURASIP J. Adv. Signal Process. 2007, 087219 (2007). https://doi.org/10.1155/2007/87219

Keywords

  • Neural Network
  • Feature Vector
  • Neural Network Model
  • Speech Signal
  • Mapping Technique