

  • Research Article
  • Open Access

Multiple Scale Music Segmentation Using Rhythm, Timbre, and Harmony

EURASIP Journal on Advances in Signal Processing 2006, 2007:073205

  • Received: 30 November 2005
  • Accepted: 27 August 2006
  • Published:


Segmenting music into sections such as intro, verse, chorus, and outro is a difficult task. A method for automatic segmentation based on features related to rhythm, timbre, and harmony is presented. The segmentations obtained from the three features are compared with each other and with manual segmentations of a database of 48 songs. Standard information retrieval performance measures are used in the comparison, and it is shown that the timbre-related feature performs best.
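As an illustration of how standard information retrieval measures can be applied to segmentation, the sketch below scores a list of estimated segment boundaries against a manual reference using precision, recall, and F-measure. It is a minimal sketch under stated assumptions, not necessarily the evaluation procedure used in the article: the function name `boundary_scores`, the 3-second tolerance, the greedy one-to-one matching rule, and the example boundary times are all hypothetical.

```python
def boundary_scores(estimated, reference, tolerance=3.0):
    """Return (precision, recall, f_measure) for boundary times in seconds.

    Assumption: an estimated boundary counts as a hit if it lies within
    `tolerance` seconds of a reference boundary that has not already been
    matched. The tolerance value is illustrative only.
    """
    unmatched = sorted(reference)
    hits = 0
    for t in sorted(estimated):
        # Find the closest still-unmatched reference boundary within tolerance.
        best = None
        for r in unmatched:
            if abs(r - t) <= tolerance and (best is None or abs(r - t) < abs(best - t)):
                best = r
        if best is not None:
            hits += 1
            unmatched.remove(best)  # each reference boundary is matched at most once

    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical boundary times (seconds) for one song.
estimated = [0.0, 31.0, 62.5, 90.0]
reference = [0.0, 30.0, 60.0, 120.0]
precision, recall, f_measure = boundary_scores(estimated, reference)
print(precision, recall, f_measure)
```

Precision penalizes spurious boundaries and recall penalizes missed ones, so reporting both (or their harmonic mean) makes it possible to compare features that over-segment with features that under-segment.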


  • Information Technology
  • Information Retrieval
  • Quantum Information
  • Multiple Scale
  • Automatic Segmentation


Authors’ Affiliations

Department of Medialogy, Aalborg University Esbjerg, Niels Bohrs Vej 6, Esbjerg, 6700, Denmark




© Jensen 2007