Semantic Context Detection Using Audio Event Fusion

Chu, Wei-Ta; Cheng, Wen-Huang; Wu, Ja-Ling

doi:10.1155/ASP/2006/27390

Research Article
Open access
Published: 01 December 2006

Wei-Ta Chu¹,
Wen-Huang Cheng² &
Ja-Ling Wu¹,²

EURASIP Journal on Advances in Signal Processing volume 2006, Article number: 027390 (2006)

Abstract

Semantic-level content analysis is crucial to efficient content retrieval and management. We propose a hierarchical approach that models audio events over a time series in order to accomplish semantic context detection. Two levels of modeling, audio event modeling and semantic context modeling, are devised to bridge the gap between physical audio features and semantic concepts. In this work, hidden Markov models (HMMs) are used to model four representative audio events in action movies: gunshot, explosion, engine, and car braking. At the semantic context level, a generative approach (an ergodic hidden Markov model) and a discriminative approach (a support vector machine, SVM) are investigated for fusing the characteristics of, and correlations among, audio events; the fused event cues are then used to detect gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and provide a preliminary framework for information mining using audio characteristics.
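As a rough illustration of the two-level idea described in the abstract, the following Python sketch labels each audio segment with the most likely of several per-event HMMs and then scores the resulting event-label sequence against one ergodic HMM per semantic context. It is not the authors' implementation. The discrete 4-symbol frame codebook, the 2-state topologies, every probability, and the names `DiscreteHMM`, `peaked`, `classify`, `detect_context`, `event_models`, and `context_models` are hypothetical assumptions chosen for readability. Only the generative (ergodic HMM) fusion path is sketched; the SVM variant is not shown. Both levels use the standard forward algorithm in the log domain.

```python
import math
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def _log(p: float) -> float:
    # log(0) is mapped to -inf so forbidden transitions contribute nothing.
    return math.log(p) if p > 0 else NEG_INF


def logsumexp(values: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


class DiscreteHMM:
    """HMM over integer observation symbols, scored with the forward algorithm."""

    def __init__(self, start, trans, emit):
        self.log_start = [_log(p) for p in start]
        self.log_trans = [[_log(p) for p in row] for row in trans]
        self.log_emit = [[_log(p) for p in row] for row in emit]

    def log_likelihood(self, obs: Sequence[int]) -> float:
        if not obs:
            raise ValueError("observation sequence must be non-empty")
        n = len(self.log_start)
        # alpha[j] = log P(o_1..o_t, state_t = j)
        alpha = [self.log_start[j] + self.log_emit[j][obs[0]] for j in range(n)]
        for o in obs[1:]:
            alpha = [
                logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)])
                + self.log_emit[j][o]
                for j in range(n)
            ]
        return logsumexp(alpha)


def classify(models: Dict[str, DiscreteHMM], obs: Sequence[int]) -> str:
    # Maximum-likelihood decision among competing models.
    return max(models, key=lambda name: models[name].log_likelihood(obs))


def peaked(k: int, size: int, p: float = 0.7) -> List[float]:
    # Distribution with mass p on index k and the rest spread uniformly.
    rest = (1.0 - p) / (size - 1)
    return [p if i == k else rest for i in range(size)]


EVENTS = ["gunshot", "explosion", "engine", "car_braking"]

# Level 1 (hypothetical parameters): a 2-state left-to-right HMM per audio event
# over a 4-symbol frame codebook in which symbol k is typical of event k.
event_models = {
    name: DiscreteHMM([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [peaked(k, 4), peaked(k, 4)])
    for k, name in enumerate(EVENTS)
}

# Level 2 (hypothetical parameters): a 2-state ergodic HMM per semantic context
# (every transition allowed), observing the sequence of detected event indices.
ERGODIC = [[0.7, 0.3], [0.3, 0.7]]
context_models = {
    "gunplay": DiscreteHMM([0.5, 0.5], ERGODIC, [peaked(0, 4, 0.6), peaked(1, 4, 0.6)]),
    "car_chase": DiscreteHMM([0.5, 0.5], ERGODIC, [peaked(2, 4, 0.6), peaked(3, 4, 0.6)]),
}


def detect_context(segments: List[List[int]]) -> Tuple[str, List[int]]:
    # Label each segment with its most likely event, then classify the event sequence.
    event_seq = [EVENTS.index(classify(event_models, seg)) for seg in segments]
    return classify(context_models, event_seq), event_seq
```

With these toy parameters, `detect_context([[0, 0, 0, 1], [1, 1, 0, 1], [0, 0, 2, 0]])` labels the segments gunshot, explosion, gunshot and returns `("gunplay", [0, 1, 0])`. In a practical system the event models would be trained on continuous audio features (for example with Baum-Welch re-estimation) rather than set by hand.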

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 106, Taiwan
Wei-Ta Chu & Ja-Ling Wu
Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, 106, Taiwan
Wen-Huang Cheng & Ja-Ling Wu

Corresponding author

Correspondence to Wei-Ta Chu.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Chu, WT., Cheng, WH. & Wu, JL. Semantic Context Detection Using Audio Event Fusion. EURASIP J. Adv. Signal Process. 2006, 027390 (2006). https://doi.org/10.1155/ASP/2006/27390

Received: 31 August 2004
Revised: 20 February 2005
Accepted: 05 April 2005
Published: 01 December 2006
DOI: https://doi.org/10.1155/ASP/2006/27390