Discovering Recurrent Image Semantics from Class Discrimination

Abstract

Supervised statistical learning has become a critical means of designing and learning visual concepts (e.g., faces, foliage, buildings) in content-based indexing systems. The drawback of this approach is the need for manual labeling of regions. While several recently proposed automatic image annotation methods are very promising, they usually rely on the availability and analysis of associated text descriptions. In this paper, we propose a hybrid learning framework that discovers local semantic regions and generates their samples for training local detectors with minimal human intervention. A multiscale, segmentation-free framework is proposed to embed the soft presence of discovered semantic regions and local class patterns in an image independently for indexing and matching. In experiments on 2400 heterogeneous consumer images with 16 semantic queries, similarity matching based on an individual index and integrated similarity matching outperformed a feature fusion approach by 26% and 37% in average precision, respectively.

Author information

Correspondence to Joo-Hwee Lim.

Keywords

  • Image Annotation
  • Fusion Approach
  • Feature Fusion
  • Similarity Match
  • Visual Concept