Using Pitch, Amplitude Modulation, and Spatial Cues for Separation of Harmonic Instruments from Stereo Music Recordings

Abstract

Recent work in blind source separation applied to anechoic mixtures of speech allows for improved reconstruction of sources that rarely overlap in a time-frequency representation. While the assumption that speech mixtures do not overlap significantly in the time-frequency domain is reasonable, music mixtures rarely meet this constraint, so new approaches are required. We introduce a method that uses spatial cues from anechoic stereo music recordings, together with assumptions about the structure of musical source signals, to separate mixtures of tonal music effectively. We discuss existing techniques for creating partial source signal estimates from regions of the mixture where source signals do not overlap significantly. We use these partial signals within a new demixing framework in which we estimate harmonic masks for each source, allowing us to determine the number of active sources in important time-frequency frames of the mixture. We then propose a method for distributing energy from time-frequency frames of the mixture among multiple source signals. This makes it possible to handle mixtures containing time-frequency frames in which multiple harmonic sources are active, without requiring knowledge of source characteristics.
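The core steps named in the abstract (estimating a harmonic mask for each source, counting the active sources in each time-frequency bin, and distributing mixture energy among those sources) can be illustrated with a toy sketch for a single STFT frame. This is not the authors' algorithm. It assumes each source's fundamental frequency and its anechoic stereo mixing parameters (a right-channel gain and a delay in samples, relative to a unit-gain left channel) are already known, and it swaps in a simple rule: a bin claimed by one harmonic mask is assigned wholly to that source, and a bin claimed by two is split by solving the 2x2 linear system given by the two channels. The function names `harmonic_mask`, `mixing_vector`, and `demix_frame`, and the defaults `n_harmonics=10` and `tol_hz=20.0`, are hypothetical.

```python
import numpy as np


def harmonic_mask(freqs, f0, n_harmonics=10, tol_hz=20.0):
    """Return True for frequency bins lying within tol_hz of one of the
    first n_harmonics multiples of the fundamental f0 (all in Hz)."""
    harmonics = f0 * np.arange(1, n_harmonics + 1)
    # Distance from each bin centre to its nearest harmonic.
    dist = np.min(np.abs(freqs[:, None] - harmonics[None, :]), axis=1)
    return dist <= tol_hz


def mixing_vector(omega, gain, delay):
    """Assumed anechoic stereo model: unit gain in the left channel,
    `gain` and a `delay` (in samples) in the right channel."""
    return np.array([1.0, gain * np.exp(-1j * omega * delay)])


def demix_frame(XL, XR, freqs, f0s, gains, delays, fs):
    """Split one stereo STFT frame (XL, XR) among len(f0s) harmonic sources.

    Returns the estimated source spectra, shape (n_sources, n_bins), and the
    number of harmonic masks claiming each bin.
    """
    masks = np.array([harmonic_mask(freqs, f0) for f0 in f0s])
    active_count = masks.sum(axis=0)
    S = np.zeros((len(f0s), len(freqs)), dtype=complex)
    for k in range(len(freqs)):
        active = np.flatnonzero(masks[:, k])
        omega = 2 * np.pi * freqs[k] / fs  # bin frequency in radians per sample
        if len(active) == 1:
            # One source claims this bin: assign it the left-channel value (unit gain).
            S[active[0], k] = XL[k]
        elif len(active) == 2:
            # Two equations (left, right) and two unknowns: solve the 2x2 system.
            # lstsq avoids an exception if the mixing vectors are nearly parallel.
            A = np.column_stack([mixing_vector(omega, gains[j], delays[j]) for j in active])
            S[active, k] = np.linalg.lstsq(A, np.array([XL[k], XR[k]]), rcond=None)[0]
        # Bins claimed by no source, or by three or more (underdetermined with
        # two channels), are left at zero in this sketch.
    return S, active_count
```

Because a stereo recording supplies only two equations per bin, this direct solve works only for bins with at most two active sources whose mixing parameters differ; bins where more harmonics overlap need further assumptions about the sources.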

Author information

Correspondence to John Woodruff.

About this article

Cite this article

Woodruff, J., Pardo, B. Using Pitch, Amplitude Modulation, and Spatial Cues for Separation of Harmonic Instruments from Stereo Music Recordings. EURASIP J. Adv. Signal Process. 2007, 086369 (2006). https://doi.org/10.1155/2007/86369

Keywords

  • Amplitude Modulation
  • Source Signal
  • Active Source
  • Source Separation
  • Blind Source Separation