  • Research Article
  • Open Access

Speaker Separation and Tracking System

EURASIP Journal on Advances in Signal Processing 2006, 2006:029104

https://doi.org/10.1155/ASP/2006/29104

  • Received: 26 January 2005
  • Accepted: 8 December 2005
  • Published:

Abstract

Replicating human hearing in electronics is nontrivial under two constraints: using only two microphones (even with more than two speakers), and the user carrying the device at all times (i.e., a mobile device weighing less than 100 g). Our novel contribution in this area is a two-microphone system that incorporates both blind source separation and speaker tracking. This system handles more than two speakers and overlapping speech in a mobile environment. The system also supports the case in which a feedback loop from the speaker tracking step to the blind source separation can improve performance. In order to develop and optimize this system, we have established a novel benchmark, which we present here. Using the introduced complexity metrics, we present the tradeoffs between system performance and computational load. Our results show that, in our case, source separation was significantly more dependent on frame duration than on sampling frequency.

Keywords

  • System Performance
  • Information Technology
  • Complexity Metrics
  • Feedback Loop
  • Mobile Device

[1-41]

Authors’ Affiliations

(1)
The Wearable Computing Lab, ETH Zurich, Zurich, 8097, Switzerland

References

  1. Moore D: The IDIAP smart meeting room. IDIAP-COM 07, IDIAP, 2002.
  2. Wooters C, Mirghafori N, Stolcke A, et al.: The 2004 ICSI-SRI-UW meeting recognition system. Lecture Notes in Computer Science, January 2005 3361: 196-208.
  3. Kern N, Schiele B, Junker H, Lukowicz P, Tröster G: Wearable sensing to annotate meeting recordings. Personal Ubiquitous Computing 2003, 7(5):263-274.
  4. Choudhury T, Pentland A: The sociometer: a wearable device for understanding human networks. Proceedings of the Conference on Computer Supported Cooperative Work (CSCW '02), Workshop on Ad hoc Communications and Collaboration in Ubiquitous Computing Environments, November 2002, New Orleans, La, USA.
  5. Kwon S, Narayanan S: A method for on-line speaker indexing using generic reference models. Proceedings of the 8th European Conference on Speech Communication and Technology, September 2003, Geneva, Switzerland 2653-2656.
  6. Nishida M, Kawahara T: Speaker model selection using Bayesian information criterion for speaker indexing and speaker adaptation. Proceedings of the 8th European Conference on Speech Communication and Technology, September 2003, Geneva, Switzerland 1849-1852.
  7. Lu L, Zhang H-J: Speaker change detection and tracking in realtime news broadcasting analysis. Proceedings of the 10th ACM International Conference on Multimedia, December 2002, Juan les Pins, France 602-610.
  8. Lathoud G, McCowan IA, Odobez J-M: Unsupervised location based segmentation of multi-party speech. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing – Meeting Recognition Workshop (ICASSP-NIST '04), May 2004, Montreal, Canada IDIAP-RR 04-14.
  9. Siracusa M, Morency LP, Wilson K, Fisher J, Darrell T: A multi-modal approach for determining speaker location and focus. Proceedings of the International Conference on Multi-modal Interfaces (ICMI '03), November 2003, Vancouver, BC, Canada 77-80.
  10. Ajmera J, Lathoud G, McCowan IA: Clustering and segmenting speakers and their locations in meetings. Research Report IDIAP-RR 03-55 December 2003.
  11. Busso C, Hernanz S, Chu C-W, et al.: Smart room: participant and speaker localization and identification. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), March 2005, Philadelphia, Pa, USA 2: 1117-1120.
  12. Amft O, Lauffer M, Ossevoort S, Macaluso F, Lukowicz P, Tröster G: Design of the QBIC wearable computing platform. Proceedings of 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP '04), September 2004 398-410.
  13. Mann S: Wearable computing as means for personal empowerment. 1st International Conference on Wearable Computing (ICWC '98), May 1998, Fairfax, Va, USA.
  14. Pentland A: Wearable intelligence. Scientific American 1998, 276(1es1).
  15. Shriberg E, Stolcke A, Baron D: Observations on overlap: findings and implications for automatic processing of multi-party conversation. Proceedings of 7th European Conference on Speech Communication and Technology Eurospeech, September 2001, Aalborg, Denmark 2: 1359-1362.
  16. Ferber R: Information Retrieval. dpunkt, Germany; 2003.
  17. Yilmaz O, Rickard S: Blind separation of speech mixtures via time-frequency masking. IEEE Transactions on Signal Processing 2004, 52(7):1830-1847. 10.1109/TSP.2004.828896
  18. Rickard S, Balan R, Rosca J: Blind source separation based on space-time-frequency diversity. Proceedings of 4th International Symposium on Independent Component Analysis and Blind Source Separation, April 2003, Nara, Japan 493-498.
  19. Rickard S, Yilmaz Z: On the approximate W-disjoint orthogonality of speech. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 1: 529-532.
  20. Aarabi P, Mahdavi A: The relation between speech segment selectivity and source localization accuracy. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 1: 273-276.
  21. Basu S, Schwartz S, Pentland A: Wearable phased arrays for sound localization enhancement. Proceedings of the IEEE International Symposium on Wearable Computing (ISWC '00), 2000, Atlanta, Ga, USA 103-110.
  22. Tritschler A, Gopinath R: Improved speaker segmentation and segments clustering using the Bayesian information criterion. Proceedings of the 6th European Conference on Speech Communication and Technology (EUROSPEECH '99), September 1999, Budapest, Hungary 679-682.
  23. Lu L, Jiang H, Zhang HJ: A robust audio classification and segmentation method. Proceedings of the 9th ACM International Conference on Multimedia, September-October 2001, Ottawa, Ontario, Canada 203-211.
  24. Peltonen V, Tuomi J, Klapuri A, Huopaniemi J, Sorsa T: Computational auditory scene recognition. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 2: 1941-1944.
  25. Scheirer E, Slaney M: Construction and evaluation of a robust multifeature speech/music discriminator. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), April 1997, Munich, Germany 2: 1331-1334.
  26. Atal BS: Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. The Journal of the Acoustical Society of America 1974, 55(6):1304-1312. 10.1121/1.1914702
  27. Schwarz G: Estimating the dimension of a model. The Annals of Statistics 1978, 6(2):461-464. 10.1214/aos/1176344136
  28. Delacourt P, Kryze D, Wellekens C: Speaker-based segmentation for audio data indexing. Proceedings of the ESCA Tutorial and Research Workshop (ITRW '99). Accessing Information in Spoken Audio, April 1999, Cambridge, UK 78-83.
  29. Cettolo M, Vescovi M: Efficient audio segmentation algorithms based on the BIC. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), April 2003, Hong Kong 6: 537-540.
  30. Ajmera J, McCowan I, Bourlard H: BIC revisited and applied to speaker change detection. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), April 2003, Hong Kong.
  31. Campbell JP: Speaker recognition: a tutorial. Proceedings of the IEEE 1997, 85(9):1437-1462. 10.1109/5.628714
  32. Reynolds DA, Rose RC: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 1995, 3(1):72-83. 10.1109/89.365379
  33. Nishida M, Kawahara T: Unsupervised speaker indexing using speaker model selection based on Bayesian information criterion. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), April 2003, Hong Kong 1: 172-175.
  34. Matsui T, Furui S: Comparison of text-independent speaker recognition methods using VQ-distortion and discrete/continuous HMMs. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '92), March 1992, San Francisco, Calif, USA 2: 157-160.
  35. Bimbot F, Bonastre J, Fredouille C, et al.: A tutorial on text-independent speaker verification. EURASIP Journal on Applied Signal Processing 2004, 2004(4):430-451. 10.1155/S1110865704310024
  36. Kawahara H, Irino T: Exploring temporal feature representations of speech using neural networks. Tech. Rep. SP88-31 1988.
  37. Aoki M, Okamoto M, Aoki S, Matsui H, Sakurai T, Kaneda Y: Sound source segregation based on estimating incident angle of each frequency component of input signals acquired by multiple microphones. Acoustical Science and Technology 2001, 22(2):149-157. 10.1250/ast.22.149
  38. Baeck M, Zölzer U: Real-time implementation of a source separation algorithm. Proceedings of the 6th International Conference on Digital Audio Effects (DAFx '03), September 2003, London, UK.
  39. van Rijsbergen CJ: Information retrieval. Butterworths, London, UK; 1979.
  40. Anliker U, Beutel J, Dyer M: A systematic approach to the design of distributed wearable systems. IEEE Transactions on Computers 2004, 53(8):1017-1033. 10.1109/TC.2004.36
  41. He JL, Liu L, Palm G: A text-independent speaker identification system based on neural networks. Proceedings of the International Conference on Spoken Language Processing (ICSLP '94), September 1994, Yokohama, Japan 1851-1854.

Copyright

© Anliker et al. 2006
