Discovering Recurrent Image Semantics from Class Discrimination

Abstract

Supervised statistical learning has become a critical means of designing and learning visual concepts (e.g., faces, foliage, buildings) in content-based indexing systems. The drawback of this approach is the need for manual labeling of regions. While several recently proposed automatic image annotation methods are very promising, they usually rely on the availability and analysis of associated text descriptions. In this paper, we propose a hybrid learning framework that discovers local semantic regions and generates their samples for training local detectors with minimal human intervention. A multiscale, segmentation-free framework is proposed to embed the soft presence of discovered semantic regions and local class patterns in an image independently for indexing and matching. On 2400 heterogeneous consumer images with 16 semantic queries, similarity matching based on an individual index and integrated similarity matching outperformed a feature fusion approach by 26% and 37% in average precision, respectively.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Lim, JH., Jin, J.S. Discovering Recurrent Image Semantics from Class Discrimination. EURASIP J. Adv. Signal Process. 2006, 076093 (2006). https://doi.org/10.1155/ASP/2006/76093

  • DOI: https://doi.org/10.1155/ASP/2006/76093
