
  • Research Article
  • Open Access

Using Pitch, Amplitude Modulation, and Spatial Cues for Separation of Harmonic Instruments from Stereo Music Recordings

EURASIP Journal on Advances in Signal Processing 2007:086369 (2006)

https://doi.org/10.1155/2007/86369

Received: 2 December 2005

Accepted: 10 September 2006

Published: 27 December 2006

Abstract

Recent work in blind source separation applied to anechoic mixtures of speech allows for improved reconstruction of sources that rarely overlap in a time-frequency representation. While the assumption that speech mixtures do not overlap significantly in time-frequency is reasonable, music mixtures rarely meet this constraint, requiring new approaches. We introduce a method that uses spatial cues from anechoic, stereo music recordings and assumptions regarding the structure of musical source signals to effectively separate mixtures of tonal music. We discuss existing techniques to create partial source signal estimates from regions of the mixture where source signals do not overlap significantly. We use these partial signals within a new demixing framework, in which we estimate harmonic masks for each source, allowing us to determine the number of active sources in important time-frequency frames of the mixture. We then propose a method for distributing energy from time-frequency frames of the mixture to multiple source signals. This allows us to handle mixtures that contain time-frequency frames in which multiple harmonic sources are active, without requiring prior knowledge of source characteristics.
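To make the energy-distribution idea concrete, here is a minimal Python/NumPy sketch, not the authors' algorithm: given per-source pitch estimates, it builds soft harmonic masks over the STFT frequency axis and splits each time-frequency frame of the mixture among the sources in proportion to those masks. The function names (harmonic_masks, distribute_energy) and all parameters (number of harmonics, Gaussian tolerance in Hz) are illustrative assumptions, not taken from the paper.

    import numpy as np

    def harmonic_masks(f0s, freqs, n_harmonics=20, tol_hz=40.0):
        """Soft harmonic masks, one per source, over the STFT frequency axis.

        f0s   : (n_sources, n_frames) estimated fundamentals in Hz (0 = inactive).
        freqs : (n_bins,) STFT bin centre frequencies in Hz.
        Returns a (n_sources, n_bins, n_frames) array of non-negative weights.
        Illustrative sketch only; the paper derives its masks differently.
        """
        n_sources, n_frames = f0s.shape
        masks = np.zeros((n_sources, freqs.size, n_frames))
        for s in range(n_sources):
            for t in range(n_frames):
                f0 = f0s[s, t]
                if f0 <= 0:
                    continue  # source judged inactive in this frame
                for h in range(1, n_harmonics + 1):
                    # Gaussian bump centred on the h-th predicted harmonic.
                    masks[s, :, t] += np.exp(-0.5 * ((freqs - h * f0) / tol_hz) ** 2)
        return masks

    def distribute_energy(mix_stft, masks, eps=1e-12):
        """Split each time-frequency frame of the mixture spectrogram among the
        sources, proportionally to their harmonic-mask weights (soft masking)."""
        weights = masks / (masks.sum(axis=0, keepdims=True) + eps)
        return weights * mix_stft[np.newaxis, :, :]

Under this toy scheme, frames dominated by a single source are assigned almost entirely to that source, while frames where several harmonic combs coincide are shared among the corresponding sources, which is the behaviour the abstract targets for overlapping tonal sources.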

Keywords

Amplitude Modulation, Source Signal, Active Source, Source Separation, Blind Source Separation


Authors’ Affiliations

(1)
Music Technology Program, School of Music, Northwestern University, Evanston, USA
(2)
Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, USA


Copyright

© Woodruff and Pardo 2007
