An Attention-Driven Model for Grouping Similar Images with Image Retrieval Applications


Recent work in the computational modeling of visual attention has demonstrated that a purely bottom-up approach to identifying salient regions within an image can be successfully applied to diverse and practical problems, from target recognition to the placement of advertisements. This paper applies a combination of computational models of visual attention to the image retrieval problem. We demonstrate that certain shortcomings of existing content-based image retrieval solutions can be addressed by a biologically motivated, unsupervised way of grouping together images whose salient regions of interest (ROIs) are perceptually similar, regardless of the visual contents of other (less relevant) parts of the image. We propose a model in which only the salient regions of an image are encoded as ROIs, whose features are then compared against previously seen ROIs and assigned cluster membership accordingly. Experimental results show that the proposed approach works well for several combinations of feature extraction techniques and clustering algorithms, suggesting a promising avenue for future improvements, such as the addition of a top-down component and the inclusion of a relevance feedback mechanism.
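The grouping step described above can be sketched as incremental, unsupervised clustering of ROI feature vectors: each newly seen ROI joins the nearest existing cluster if it is close enough, otherwise it seeds a new cluster. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the saliency stage (e.g., an Itti-Koch saliency map selecting the ROI patches) is omitted, `roi_feature` is a toy stand-in for the feature extraction techniques evaluated in the paper, and the distance threshold is a hypothetical parameter.

```python
import numpy as np

def roi_feature(patch):
    """Toy ROI descriptor: per-channel mean and standard deviation of a
    salient image patch (a stand-in for the color/texture features
    compared in the paper)."""
    return np.concatenate([patch.mean(axis=(0, 1)), patch.std(axis=(0, 1))])

class ROIClusterer:
    """Incremental grouping of ROI features: a new ROI is assigned to the
    nearest existing cluster if its distance to that cluster's centroid
    is within `threshold`; otherwise it starts a new cluster."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.centroids = []  # running mean feature vector per cluster
        self.counts = []     # number of ROIs assigned to each cluster

    def assign(self, feat):
        """Return the cluster index assigned to the ROI feature `feat`."""
        if self.centroids:
            dists = [np.linalg.norm(feat - c) for c in self.centroids]
            k = int(np.argmin(dists))
            if dists[k] <= self.threshold:
                # Update the running centroid with the new member.
                n = self.counts[k]
                self.centroids[k] = (self.centroids[k] * n + feat) / (n + 1)
                self.counts[k] += 1
                return k
        # Too far from every existing cluster: seed a new one.
        self.centroids.append(feat.astype(float))
        self.counts.append(1)
        return len(self.centroids) - 1
```

Because clusters are created on demand, the scheme needs no prior knowledge of the number of image groups, which matches the unsupervised setting described above; the threshold controls how perceptually similar two ROIs must be to share a cluster.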



Author information



Corresponding author

Correspondence to Oge Marques.


About this article

Cite this article

Marques, O., Mayron, L.M., Borba, G.B. et al. An Attention-Driven Model for Grouping Similar Images with Image Retrieval Applications. EURASIP J. Adv. Signal Process. 2007, 043450 (2006).


  • Feature Extraction
  • Visual Attention
  • Image Retrieval
  • Similar Image
  • Relevance Feedback