Interface for Barge-in Free Spoken Dialogue System Based on Sound Field Reproduction and Microphone Array

Miyabe, Shigeki; Hinamoto, Yoichi; Saruwatari, Hiroshi; Shikano, Kiyohiro; Tatekura, Yosuke

doi:10.1155/2007/57470

Research Article
Open access
Published: 01 December 2007

EURASIP Journal on Advances in Signal Processing, volume 2007, Article number: 057470 (2007)

Abstract

This paper proposes a barge-in free spoken dialogue interface that combines sound field control with a microphone array. A conventional spoken dialogue system that relies on an acoustic echo canceller must estimate the room transfer function, particularly when the transfer function changes because of various interferences. This estimation is difficult, however, when the user and the system speak simultaneously. To resolve the problem, we propose a sound field control technique that prevents the system's response sound from being observed at the microphone. Combined with a microphone array, the proposed method achieves high elimination performance without any adaptive process. Experiments evaluating sound elimination and speech recognition confirm the efficacy of the proposed interface.
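
The abstract does not spell out how the loudspeaker filters are designed, so the following is only a minimal single-frequency sketch of the general idea of sound field control, not the authors' method. It assumes a free-field point-source model, three loudspeakers, and three control points (two ears and one microphone). The geometry, the function names, and the target values are all hypothetical. The filters are chosen so that the response sound arrives at the ears with unit gain while it cancels at the microphone.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def free_field_gain(src, dst, freq_hz):
    """Assumed free-field point-source transfer function at one frequency."""
    r = math.dist(src, dst)
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return cmath.exp(-1j * k * r) / (4.0 * math.pi * r)


def solve_linear(a, b):
    """Solve the square complex system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def transfer_matrix(speakers, points, freq_hz):
    """Rows are control points, columns are loudspeakers."""
    return [[free_field_gain(s, p, freq_hz) for s in speakers] for p in points]


def apply_filters(g, h):
    """Signal observed at each control point when the loudspeakers are driven by h."""
    return [sum(gij * hj for gij, hj in zip(row, h)) for row in g]


# Hypothetical 2-D geometry in metres (not taken from the paper).
speakers = [(-0.6, 1.2), (0.1, 1.3), (0.7, 1.1)]
points = [(-0.08, 0.0), (0.08, 0.0), (0.3, 0.6)]  # left ear, right ear, microphone
targets = [1.0, 1.0, 0.0]  # reproduce at both ears, silence at the microphone

g = transfer_matrix(speakers, points, freq_hz=1000.0)
filters = solve_linear(g, targets)
observed = apply_filters(g, filters)

print("response at ears:", abs(observed[0]), abs(observed[1]))
print("response at microphone:", abs(observed[2]))
```

In this toy setting the cancellation at the microphone is exact only because the same transfer functions are used for design and for evaluation. In a real room the transfer functions would have to be measured and would drift over time.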

Author information

Authors and Affiliations

Shigeki Miyabe, Hiroshi Saruwatari & Kiyohiro Shikano: Graduate School of Information Science, Nara Institute of Science and Technology, Takayama-Cho 8916-5, Ikoma-Shi, Nara, 630-0192, Japan
Yoichi Hinamoto: Department of Control Engineering, Takuma National College of Technology, Takuma-Cho Koda 551, Mitoyo-Shi, Kagawa, 769-1192, Japan
Yosuke Tatekura: Faculty of Engineering, Shizuoka University, Johoku 3-5-1, Hamamatsu-Shi, Shizuoka, 432-8561, Japan

Corresponding author

Correspondence to Shigeki Miyabe.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Miyabe, S., Hinamoto, Y., Saruwatari, H. et al. Interface for Barge-in Free Spoken Dialogue System Based on Sound Field Reproduction and Microphone Array. EURASIP J. Adv. Signal Process. 2007, 057470 (2007). https://doi.org/10.1155/2007/57470

Received: 01 May 2006
Revised: 17 October 2006
Accepted: 29 October 2006
