
Multimodal Semantic Analysis and Annotation for Basketball Video


This paper presents a new multimodal method for extracting semantic information from basketball video. Visual, motion, and audio information is first extracted from the video to perform low-level segmentation and classification, and domain knowledge is then exploited to detect events of interest. For the video stream, both visual and motion-prediction information are used by the shot and scene boundary detection algorithm, which is followed by scene classification. For the audio stream, audio keysounds are defined as specific sounds related to semantic events, and a classification method based on hidden Markov models (HMMs) is used to identify them. By analyzing the multimodal information together with additional domain knowledge, the method then locates the positions of potential semantic events such as "foul" and "shot at the basket." Finally, a video annotation is generated according to the MPEG-7 multimedia description schemes (MDSs). Experimental results demonstrate the effectiveness of the proposed method.
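The HMM-based keysound identification step can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not describe their features, model topology, or training, so this sketch assumes one discrete-observation HMM per keysound over a hypothetical codebook of quantized audio-feature symbols, and classifies a clip by the maximum log-likelihood computed with the forward algorithm. The class `DiscreteHMM`, the function `classify_keysound`, the keysound labels, and all probabilities are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _log(p: float) -> float:
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates all -inf inputs."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


class DiscreteHMM:
    """Hypothetical discrete-observation HMM over a codebook of audio symbols.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability that state i emits codebook symbol k
    """

    def __init__(self, pi: List[float], A: List[List[float]], B: List[List[float]]):
        self.pi, self.A, self.B = pi, A, B
        self.n = len(pi)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm in the log domain: returns log P(obs | model)."""
        if not obs:
            raise ValueError("observation sequence must be non-empty")
        # Initialization: alpha_1(i) = pi_i * b_i(o_1)
        alpha = [_log(self.pi[i]) + _log(self.B[i][obs[0]]) for i in range(self.n)]
        # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
        for o in obs[1:]:
            alpha = [
                _logsumexp([alpha[i] + _log(self.A[i][j]) for i in range(self.n)])
                + _log(self.B[j][o])
                for j in range(self.n)
            ]
        # Termination: P(obs) = sum_i alpha_T(i)
        return _logsumexp(alpha)


def classify_keysound(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Return the label whose model assigns obs the highest log-likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(obs))


# Toy two-state models over a 3-symbol codebook; labels and numbers are made up.
models = {
    "whistle": DiscreteHMM(
        pi=[1.0, 0.0],
        A=[[0.9, 0.1], [0.1, 0.9]],
        B=[[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]],
    ),
    "applause": DiscreteHMM(
        pi=[1.0, 0.0],
        A=[[0.9, 0.1], [0.1, 0.9]],
        B=[[0.1, 0.2, 0.7], [0.1, 0.1, 0.8]],
    ),
}

print(classify_keysound([0, 0, 0, 1, 0], models))  # whistle
print(classify_keysound([2, 2, 1, 2, 2], models))  # applause
```

Working in the log domain keeps long clips from underflowing to zero probability. In practice each per-keysound model would be trained (for example with Baum-Welch re-estimation) on labeled audio clips rather than written by hand, and continuous-density emissions over spectral features are a common alternative to a discrete codebook.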



Author information



Corresponding author

Correspondence to Song Liu.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Liu, S., Xu, M., Yi, H. et al. Multimodal Semantic Analysis and Annotation for Basketball Video. EURASIP J. Adv. Signal Process. 2006, 032135 (2006).



Keywords

  • Hidden Markov Model
  • Domain Knowledge
  • Boundary Detection
  • Motion Prediction
  • Semantic Event