Multimodal Semantic Analysis and Annotation for Basketball Video
EURASIP Journal on Advances in Signal Processing, volume 2006, Article number: 032135 (2006)
This paper presents a new multimodal method for extracting semantic information from basketball video. Visual, motion, and audio information is first extracted from the video to produce a low-level segmentation and classification, and domain knowledge is then exploited to detect interesting events in the basketball video. For the video stream, both visual and motion-prediction information are used for shot and scene boundary detection, which is followed by scene classification. For the audio stream, audio keysounds are defined as sets of specific sounds related to semantic events, and a classification method based on hidden Markov models (HMMs) is used to identify them. Subsequently, the multimodal information is analyzed together with additional domain knowledge to locate potential semantic events, such as "foul" and "shot at the basket." Finally, a video annotation is generated according to the MPEG-7 multimedia description schemes (MDSs). Experimental results demonstrate the effectiveness of the proposed method.
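The abstract only names the HMM-based keysound classifier. As a hedged illustration of how such a classifier typically reaches its decision, the following Python sketch scores an audio segment against one HMM per keysound class using the forward algorithm and picks the class with the highest log-likelihood. It assumes the models are already trained and that audio frames have been vector-quantized into discrete symbols; the class names, symbol meanings, parameter values, and the helpers `DiscreteHMM` and `classify` are all hypothetical and are not taken from the paper, whose models may instead use continuous-density audio features.

```python
import math
from typing import Dict, List, Sequence

NEG_INF = -math.inf


def _log(p: float) -> float:
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


class DiscreteHMM:
    """Hypothetical discrete-observation HMM for one keysound class."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[List[float]]):
        # Store all parameters as log-probabilities.
        self.log_start = [_log(p) for p in start]
        self.log_trans = [[_log(p) for p in row] for row in trans]
        self.log_emit = [[_log(p) for p in row] for row in emit]

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm in log space: returns log P(obs | model)."""
        if not obs:
            raise ValueError("observation sequence must be non-empty")
        n = len(self.log_start)
        # alpha[j] = log P(o_1..o_t, state_t = j)
        alpha = [self.log_start[j] + self.log_emit[j][obs[0]] for j in range(n)]
        for o in obs[1:]:
            alpha = [
                _logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)])
                + self.log_emit[j][o]
                for j in range(n)
            ]
        return _logsumexp(alpha)


def classify(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Return the keysound label whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(obs))


# Hypothetical two-state left-to-right models over three quantized symbols:
# 0 = low energy, 1 = broadband noise, 2 = high-pitched tone.
models = {
    "whistle": DiscreteHMM(
        start=[1.0, 0.0],
        trans=[[0.7, 0.3], [0.0, 1.0]],
        emit=[[0.1, 0.1, 0.8], [0.2, 0.1, 0.7]],
    ),
    "cheer": DiscreteHMM(
        start=[1.0, 0.0],
        trans=[[0.6, 0.4], [0.0, 1.0]],
        emit=[[0.2, 0.7, 0.1], [0.3, 0.6, 0.1]],
    ),
}

print(classify([2, 2, 2, 2], models))  # -> whistle
print(classify([1, 1, 0, 1], models))  # -> cheer
```

Working in log space keeps the forward recursion from underflowing on long frame sequences. Estimating the per-class parameters (for example with Baum-Welch re-estimation) is outside the scope of this sketch.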