Using Pitch, Amplitude Modulation, and Spatial Cues for Separation of Harmonic Instruments from Stereo Music Recordings

Woodruff, John; Pardo, Bryan

doi:10.1155/2007/86369

Research Article
Open access
Published: 01 December 2006

John Woodruff¹ &
Bryan Pardo²

EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 086369 (2006)

Abstract

Recent work in blind source separation applied to anechoic mixtures of speech allows for improved reconstruction of sources that rarely overlap in a time-frequency representation. While the assumption that speech sources do not overlap significantly in time-frequency is reasonable, music mixtures rarely meet this constraint, requiring new approaches. We introduce a method that uses spatial cues from anechoic, stereo music recordings and assumptions regarding the structure of musical source signals to effectively separate mixtures of tonal music. We discuss existing techniques to create partial source signal estimates from regions of the mixture where source signals do not overlap significantly. We use these partial signals within a new demixing framework, in which we estimate harmonic masks for each source, allowing us to determine the number of active sources in important time-frequency frames of the mixture. We then propose a method for distributing energy from time-frequency frames of the mixture to multiple source signals. This makes it possible to handle mixtures that contain time-frequency frames in which multiple harmonic sources are active, without requiring knowledge of source characteristics.
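The idea of a harmonic mask, and of counting how many sources are active in each time-frequency frame, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `harmonic_mask` and `active_source_count`, the fixed tolerance in Hz, and the number of harmonics are all assumptions chosen for illustration, and the sketch assumes a fundamental frequency estimate is already available for each source in each analysis frame (0 marks a frame with no pitch).

```python
import numpy as np

def harmonic_mask(f0_hz, n_bins, sample_rate, n_fft, n_harmonics=10, tol_hz=30.0):
    """Boolean mask (frames x bins) marking bins near integer multiples of f0.

    Hypothetical helper: tol_hz and n_harmonics are illustrative parameters,
    not values taken from the article.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    # Centre frequency of each STFT bin, in Hz.
    bin_freqs = np.arange(n_bins) * sample_rate / n_fft
    mask = np.zeros((f0_hz.size, n_bins), dtype=bool)
    voiced = f0_hz[:, None] > 0  # frames with f0 == 0 get an empty mask
    for h in range(1, n_harmonics + 1):
        # Distance from every bin to the h-th harmonic of every frame.
        dist = np.abs(bin_freqs[None, :] - h * f0_hz[:, None])
        mask |= (dist <= tol_hz) & voiced
    return mask

def active_source_count(masks):
    """Number of sources whose harmonic mask covers each time-frequency frame."""
    return np.sum(np.stack(masks), axis=0)

# Two hypothetical sources, two analysis frames; both are silent in frame 1.
sr, n_fft = 8000, 800            # 10 Hz bin spacing
n_bins = n_fft // 2 + 1
mask_a = harmonic_mask([100.0, 0.0], n_bins, sr, n_fft, n_harmonics=3, tol_hz=10.0)
mask_b = harmonic_mask([150.0, 0.0], n_bins, sr, n_fft, n_harmonics=3, tol_hz=10.0)
count = active_source_count([mask_a, mask_b])
```

Frames where `count` is 1 can be assigned to a single source, while frames where it is 2 or more are the ones whose energy has to be shared among sources. How that sharing is done is the subject of the article and is deliberately left out of this sketch.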

Author information

Authors and Affiliations

Music Technology Program, School of Music, Northwestern University, Evanston, IL, 60208, USA
John Woodruff
Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, 60208, USA
Bryan Pardo

Corresponding author

Correspondence to John Woodruff.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Woodruff, J., Pardo, B. Using Pitch, Amplitude Modulation, and Spatial Cues for Separation of Harmonic Instruments from Stereo Music Recordings. EURASIP J. Adv. Signal Process. 2007, 086369 (2006). https://doi.org/10.1155/2007/86369

Received: 02 December 2005
Revised: 30 July 2006
Accepted: 10 September 2006
Published: 01 December 2006
DOI: https://doi.org/10.1155/2007/86369
