  • Research Article
  • Open Access

Multimodal Semantic Analysis and Annotation for Basketball Video

EURASIP Journal on Advances in Signal Processing 2006, 2006:032135

https://doi.org/10.1155/ASP/2006/32135

  • Received: 1 September 2004
  • Accepted: 14 March 2005
  • Published:

Abstract

This paper presents a new multimodal method for extracting semantic information from basketball video. Visual, motion, and audio information is first extracted from the video to produce a low-level segmentation and classification. Domain knowledge is then exploited to detect interesting events in the basketball video. For video, both visual and motion prediction information is used for shot and scene boundary detection, which is followed by scene classification. For audio, audio keysounds are defined as sets of specific sounds related to semantic events, and a classification method based on the hidden Markov model (HMM) is used to identify them. Subsequently, by analyzing the multimodal information together with additional domain knowledge, the positions of potential semantic events, such as "foul" and "shot at the basket," are located. Finally, a video annotation is generated according to the MPEG-7 multimedia description schemes (MDSs). Experimental results demonstrate the effectiveness of the proposed method.
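As an illustration of the HMM-based keysound identification mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes one discrete-observation HMM per keysound class and assigns a clip to the class whose model gives the highest forward-algorithm likelihood. The class names (`whistle`, `cheer`), the three-symbol quantized audio alphabet, and all probabilities are hypothetical toy values chosen only to make the example runnable; a real system would typically use continuous audio features and trained parameters.

```python
import math


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM via the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n = len(start)
    # Initialisation: first observation.
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    # Induction: fold in each remaining observation.
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    # Termination: sum over final states.
    return log_sum_exp(alpha)


def classify(obs, models):
    """Pick the class whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical toy models: (start, trans, emit) per keysound class.
MODELS = {
    "whistle": (
        [0.9, 0.1],
        [[0.8, 0.2], [0.3, 0.7]],
        [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]],
    ),
    "cheer": (
        [0.5, 0.5],
        [[0.6, 0.4], [0.4, 0.6]],
        [[0.1, 0.2, 0.7], [0.2, 0.3, 0.5]],
    ),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 0], MODELS))
    print(classify([2, 2, 1, 2], MODELS))
```

Working in log space keeps the likelihoods from underflowing on long observation sequences, which matters when clips span many audio frames.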

Keywords

  • Hidden Markov Model
  • Domain Knowledge
  • Boundary Detection
  • Motion Prediction
  • Semantic Event


Authors’ Affiliations

(1)
School of Computer Engineering, Nanyang Technological University, Block N4, 02A-32, Nanyang Avenue, Singapore, 639798

