Using Pitch, Amplitude Modulation, and Spatial Cues for Separation of Harmonic Instruments from Stereo Music Recordings
EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 086369 (2006)
Abstract
Recent work in blind source separation applied to anechoic mixtures of speech allows for improved reconstruction of sources that rarely overlap in a time-frequency representation. While the assumption that speech mixtures do not overlap significantly in time-frequency is reasonable, music mixtures rarely meet this constraint, requiring new approaches. We introduce a method that uses spatial cues from anechoic, stereo music recordings and assumptions regarding the structure of musical source signals to effectively separate mixtures of tonal music. We discuss existing techniques to create partial source signal estimates from regions of the mixture where source signals do not overlap significantly. We use these partial signals within a new demixing framework, in which we estimate harmonic masks for each source, allowing the determination of the number of active sources in important time-frequency frames of the mixture. We then propose a method for distributing energy from time-frequency frames of the mixture to multiple source signals. This makes it possible to handle mixtures containing time-frequency frames in which multiple harmonic sources are active, without requiring knowledge of source characteristics.
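The idea of harmonic masks and active-source counting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`harmonic_mask`, `distribute_frame`), the fixed frequency tolerance, and in particular the equal split of energy among active sources are assumptions made only for illustration. The abstract indicates that the actual method relies on pitch, amplitude modulation, and spatial cues to decide how energy is shared.

```python
import numpy as np

def harmonic_mask(f0_hz, bin_freqs_hz, tol_hz):
    """Boolean mask: True where a bin lies within tol_hz of a harmonic of f0_hz."""
    if f0_hz <= 0:
        # Hypothetical convention: a non-positive f0 marks a silent source.
        return np.zeros(bin_freqs_hz.shape, dtype=bool)
    # Nearest harmonic number for each bin (at least the fundamental).
    harmonic_number = np.maximum(np.round(bin_freqs_hz / f0_hz), 1)
    return np.abs(bin_freqs_hz - harmonic_number * f0_hz) <= tol_hz

def distribute_frame(mix_frame, f0s_hz, bin_freqs_hz, tol_hz=20.0):
    """Split one mixture frame among sources using their harmonic masks.

    Returns (estimates, active): estimates has shape (n_sources, n_bins),
    active[b] is the number of sources whose mask covers bin b.
    """
    masks = np.array([harmonic_mask(f0, bin_freqs_hz, tol_hz) for f0 in f0s_hz])
    active = masks.sum(axis=0)
    # Placeholder rule (an assumption): share each bin equally among the
    # sources that claim it. Bins claimed by no source are left unassigned.
    weights = masks / np.maximum(active, 1)
    return weights * mix_frame, active

# Toy frame: 40 bins spaced 50 Hz apart, two sources at 200 Hz and 300 Hz.
bin_freqs = np.arange(40) * 50.0
mix = np.ones(40, dtype=complex)
estimates, active = distribute_frame(mix, [200.0, 300.0], bin_freqs, tol_hz=10.0)
```

In this toy frame the 600 Hz bin is a harmonic of both sources, so `active` reports two sources there and the bin is split between them, while the 200 Hz bin belongs to the first source only. A mask-based method that assumed non-overlapping sources would have to give the shared bin to a single source.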
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Woodruff, J., Pardo, B. Using Pitch, Amplitude Modulation, and Spatial Cues for Separation of Harmonic Instruments from Stereo Music Recordings. EURASIP J. Adv. Signal Process. 2007, 086369 (2006). https://doi.org/10.1155/2007/86369
DOI: https://doi.org/10.1155/2007/86369