Investigating the Bag-of-Words Method for 3D Shape Retrieval
© X. Li and A. Godil. 2010
Received: 1 December 2009
Accepted: 3 March 2010
Published: 15 April 2010
This paper investigates the capabilities of the Bag-of-Words (BWs) method in the 3D shape retrieval field. The contributions of this paper are as follows: (1) the 3D shape retrieval task is categorized from different points of view: specific versus generic, partial-to-global retrieval (PGR) versus global-to-global retrieval (GGR), and articulated versus nonarticulated; (2) spatial information, represented as concentric spheres, is integrated into the framework to improve its discriminative ability; (3) analysis of the experimental results on the Purdue Engineering Benchmark (PEB) reveals that some properties of the BWs approach make it perform better on the PGR task than on the GGR task; (4) the BWs approach is evaluated on the nonarticulated PEB database and the articulated McGill Shape Benchmark (MSB) database and compared with other methods.
With recent advances in scanning and modeling technologies, a large number of 3D models are created and stored in databases. For these databases to be used effectively, methods for indexing, retrieval, and clustering are required. Therefore, retrieval and classification of 3D objects are becoming increasingly important tasks in modern applications such as computer vision, computer-aided design/computer-aided manufacturing, multimedia, molecular biology, biometrics, security, and robotics.
Because of its simplicity, flexibility, and effectiveness, the Bag-of-Words (BWs) method, which originated in the document retrieval field, has recently attracted a great deal of interest in the computer vision field. It has been applied to tasks such as image/video classification, 3D shape analysis, and retrieval [2–5]. In this paper, we explore its performance specifically for the 3D shape retrieval task.
A typical 3D shape retrieval task can be defined as follows: given a query 3D shape, obtain a list of 3D shapes ordered by their similarity to the query. Several methods have been proposed to solve this problem, such as Light Field descriptors, the spherical harmonics descriptor, the D2 shape distribution, Reeb graph-based descriptors, and local feature-based methods [4, 5]. The performance of these methods varies mainly according to the specific task. In fact, from different points of view, the 3D shape retrieval task can be further refined as follows.
(1) Depending on the breadth of the object categories, the task can be considered in a "specific" or a "generic" domain, according to the purpose and interest of the specialists. Representative benchmarks for the generic domain include the Princeton Shape Benchmark and the NIST 3D Benchmark, while CAD, protein, and biometrics analysis are several important "specific" domains, each with its own properties. For example, CAD models have more complicated structures with holes and other local features. If only global information is used, these subtle details may be neglected, leading to poorer retrieval results.
(2) Based on the completeness of the query shape, the task can be divided into two subtasks: "Partial-to-Global Retrieval (PGR)" and "Global-to-Global Retrieval (GGR)". In PGR, every query shape is regarded as an incomplete object, which is used to obtain similar complete objects from the database. This happens in many cases. For example, when 3D range scanners are used to capture 3D data in real time, only parts of the object can be captured because of the limited view angle, occlusion in the scene, and the real-time requirement. The resulting incomplete point cloud may then be used as the query shape to retrieve the corresponding complete model from an existing database. Solving this problem will also benefit several other applications, such as data registration and model fixing. Most global shape retrieval methods [6, 8], which require the complete geometry of a 3D object, cannot be applied directly to PGR. To our knowledge, only a few contributions in the literature [3, 17] address the PGR problem.
(3) Based on the deformability of the shape, there exist "Articulated Shape Retrieval (ASR)" and "Nonarticulated Shape Retrieval (NASR)". Many natural and man-made objects are deformable. For instance, in CAESAR, each person is scanned in three different postures: standing, sitting with arms open, and sitting with arms down. When performing shape retrieval using a sitting model of person A as the query model, the preferred result is to obtain the models of person A in the other two postures rather than the sitting models of other persons. According to the results in , Light Field method, which performs very well on the NASR problem, produces poor results for the ASR task.
To some extent, the above three tasks are discussed within the BWs framework in [3, 4, 17], but a thorough investigation is still lacking. Several open problems remain, such as how to integrate spatial information into the BWs framework to improve its performance. In this paper, we investigate these three cases in depth within the framework of the BWs method, with spin images as the low-level features, and provide extensive experimental results to support the discussion.
The paper is organized as follows. Related work is summarized in Section 2. The performance measure is discussed in Section 3. In Section 4, we first introduce the ordinary procedure for the BWs framework in the 3D domain. Then, taking the CAD database as an example, we propose a Concentric Bag-of-Words (CBWs) approach to enhance the discriminative ability of the original BWs method. Several interesting phenomena are studied for the PGR problem in Section 5. For the ASR task, the McGill articulated shape benchmark is adopted to test the effectiveness of our approach in Section 6. Finally, we conclude the paper in Section 7.
2. Related Work
Much effort has recently been devoted to 3D shape retrieval. Among the proposed approaches, the BWs method, which represents a 3D shape as an orderless collection of local features, has demonstrated an impressive level of performance.
In [2, 3], the BWs method is explored for the PGR task, with a visual feature dictionary constructed by clustering spin images. Then, Kullback-Leibler divergence is proposed as a similarity measurement in , while a probabilistic framework is introduced in .
For the ASR task, Ohbuchi et al. apply the SIFT algorithm to depth buffer images of the model, captured from uniformly sampled locations on a view sphere, to collect visual words. After vector quantization, Kullback-Leibler divergence measures the similarities of the models. Their work also demonstrates that (a) given enough samples, the BWs method can reach retrieval results comparable to those of a vision-based method such as Light Field when dealing with the NASR task, and (b) the BWs method performs better than Light Field when dealing with the ASR task. In this paper, spin images are used as local features; they can be extracted directly in the 3D domain in any desired number. Moreover, according to , dense features, such as spin images, perform better than sparse features, such as SIFT.
Although the BWs method has many advantages, it suffers from a lack of spatial information. Some methods focus on integrating spatial layout information into the BWs method. Lazebnik et al. propose a spatially enriched Bag-of-Words approach. It works by partitioning the image into increasingly fine subregions and computing histograms of the local features found inside each subregion. Implicit geometric correspondences between the subregions are built in the pyramid matching scheme. In , the object is an ensemble of canonical parts linked together by an explicit homographic relationship. Through an optimization procedure, the model corresponding to the lowest residual error gives the class label to the query object, along with the localization and pose estimation. Yuan and Wu describe a context-aware clustering method, which captures the contextual information between data. For the BWs method, this means that the visual dictionary is constructed based on both the primitive visual features and their spatial contexts. Li et al. propose to treat the model in two different domains, named the feature domain and the spatial domain. The visual word dictionary is built in the feature domain as in the ordinary BWs method. In the spatial domain, the whole model is partitioned into several pieces. Each piece of the model is then represented as a word histogram. The whole model is recorded as several word histograms along with a geometry matrix, which stores the relative distances between every pair of pieces. The weighted sum of the dissimilarity measurements from these two domains is used to measure the differences between models.
3. Performance Measure
The performance measure used in this study is the precision-recall curve, the most common metric for evaluating 3D shape retrieval systems. Precision is the ratio of relevant retrieved objects to all retrieved objects in the ranked list. Recall is the ratio of relevant objects retrieved in the ranked list to all relevant objects.
Basically, recall evaluates how well a retrieval algorithm finds what we want, and precision evaluates how well it weeds out what we do not want. There is a tradeoff between recall and precision: one can increase recall by retrieving more objects, but doing so can decrease precision.
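The two definitions above can be illustrated with a short sketch. The function name, the object identifiers, and the choice to sample the curve at each relevant hit are assumptions made for illustration, not details taken from the experiments in this paper.

```python
def precision_recall(ranked_ids, relevant_ids):
    """Return (recall, precision) pairs, one per relevant object in the ranked list.

    ranked_ids: retrieved object ids, ordered from most to least similar.
    relevant_ids: ids of all objects that belong to the query's class.
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    curve = []
    hits = 0
    for rank, obj_id in enumerate(ranked_ids, start=1):
        if obj_id in relevant:
            hits += 1
            # recall: fraction of all relevant objects found so far
            # precision: fraction of the objects retrieved so far that are relevant
            curve.append((hits / len(relevant), hits / rank))
    return curve


# Hypothetical ranked list for a gear query.
ranked = ["gear_03", "bolt_11", "gear_07", "nut_02", "gear_01"]
relevant = ["gear_01", "gear_03", "gear_07"]
for recall, precision in precision_recall(ranked, relevant):
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy list, precision falls as the list is read further down to reach full recall, which is the tradeoff described above.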
4. Bag-of-Words and Concentric Bag-of-Words Methods
4.1. Bag-of-Words Descriptor
Local feature descriptors, such as spin images, are applied to the 3D model to acquire low-level features.
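As a rough illustration of such a low-level feature, the sketch below accumulates a spin image around one oriented-basis point. It uses the standard spin-image coordinates (radial distance alpha from the normal line and signed height beta above the tangent plane). The function name, the parameters `bin_size` and `image_width`, and the nearest-bin accumulation (instead of bilinear interpolation) are simplifying assumptions, not the settings used in this paper.

```python
import math


def spin_image(basis_point, normal, points, bin_size, image_width):
    """Accumulate a square 2D histogram of (alpha, beta) around an oriented point.

    alpha: distance from the line through basis_point along normal.
    beta: signed height above the tangent plane at basis_point.
    """
    length = math.sqrt(sum(c * c for c in normal))
    n = [c / length for c in normal]  # unit normal
    image = [[0] * image_width for _ in range(image_width)]
    half_height = image_width * bin_size / 2.0  # beta range is [-half, +half]
    for x in points:
        d = [xi - pi for xi, pi in zip(x, basis_point)]
        beta = sum(di * ni for di, ni in zip(d, n))
        alpha_sq = sum(di * di for di in d) - beta * beta
        alpha = math.sqrt(max(alpha_sq, 0.0))  # guard against rounding below zero
        col = int(alpha // bin_size)
        row = int((half_height - beta) // bin_size)
        if 0 <= row < image_width and 0 <= col < image_width:
            image[row][col] += 1  # points outside the support are ignored
    return image


# Toy usage: a basis point at the origin with its normal along +z.
cloud = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.5), (5.0, 5.0, 5.0)]
image = spin_image((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), cloud, bin_size=1.0, image_width=4)
```

Because alpha and beta depend only on the oriented point, the resulting image does not change under rigid rotation or translation of the model, which is what makes it usable as a local descriptor.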
4.2. Concentric Bag-of-Words Method
After extracting a set of spin images for each model, we construct a shape dictionary of predetermined size N, as shown in the second block, by clustering all spin images acquired from the whole training dataset with the k-means method.
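The dictionary construction can be sketched as follows, assuming each spin image has been flattened into a plain list of numbers. This is a minimal Lloyd-style k-means written with the standard library only. The function names, the random initialization from the data, and the fixed iteration count are illustrative assumptions rather than the configuration used in the experiments.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(feature, dictionary):
    """Index of the dictionary word closest to the feature (vector quantization)."""
    return min(range(len(dictionary)),
               key=lambda k: squared_distance(feature, dictionary[k]))


def build_dictionary(features, n_words, n_iters=20, seed=0):
    """Cluster the features into n_words centers with plain k-means."""
    rng = random.Random(seed)
    centers = [list(f) for f in rng.sample(features, n_words)]
    for _ in range(n_iters):
        clusters = [[] for _ in range(n_words)]
        for f in features:
            clusters[nearest_word(f, centers)].append(f)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def word_histogram(features, dictionary):
    """Normalized count of how often each word is the nearest one."""
    counts = [0] * len(dictionary)
    for f in features:
        counts[nearest_word(f, dictionary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy usage with 2D stand-ins for flattened spin images.
training = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
dictionary = build_dictionary(training, n_words=2)
histogram = word_histogram([[0, 0], [1, 1], [10, 10]], dictionary)
```

The histogram returned by `word_histogram` corresponds to the ordinary BWs descriptor of one model: the spatial arrangement of the features is discarded and only word frequencies remain.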
Instead of representing a model with a single histogram of dictionary words, we partition it into M regions by grouping the oriented-basis points with M concentric spheres, as demonstrated in the third block. The model is then recorded as a set of histograms, one per region.
where , are two objects A and B, respectively; measures the dissimilarity between two feature vectors, which can be KL divergence, cosine distance, or distance. Thus, for every query object, each object in the database is assigned a dissimilarity value based on (3), which yields a sorted retrieval list.
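The concentric representation and a per-region comparison can be sketched as follows. The equal-width shells centered on the centroid of the oriented-basis points, the unweighted sum of per-shell cosine distances, and all function names are assumptions made for illustration; the radii and the exact form of (3) used in this paper may differ.

```python
import math


def shell_index(point, center, max_radius, n_shells):
    """Index of the concentric shell (0 = innermost) that contains the point."""
    r = math.dist(point, center)
    return min(int(n_shells * r / max_radius), n_shells - 1)


def concentric_histograms(points, words, n_words, n_shells):
    """One normalized word histogram per shell.

    points: positions of the oriented-basis points of one model.
    words: dictionary index assigned to the spin image at each point.
    """
    center = [sum(c) / len(points) for c in zip(*points)]
    max_radius = max(math.dist(p, center) for p in points) or 1.0
    hists = [[0.0] * n_words for _ in range(n_shells)]
    for p, w in zip(points, words):
        hists[shell_index(p, center, max_radius, n_shells)][w] += 1.0
    for h in hists:
        total = sum(h)
        if total:
            h[:] = [v / total for v in h]
    return hists


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:  # at least one histogram is empty
        return 0.0 if u == v else 1.0
    return 1.0 - dot / norm


def cbw_dissimilarity(hists_a, hists_b, dist=cosine_distance):
    """Assumed combination rule: sum the per-shell distances."""
    return sum(dist(ha, hb) for ha, hb in zip(hists_a, hists_b))


# Toy usage: five basis points, two dictionary words, two shells.
pts = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 2, 0), (0, -2, 0)]
model_a = concentric_histograms(pts, [0, 1, 1, 0, 0], n_words=2, n_shells=2)
model_b = concentric_histograms(pts, [1, 0, 0, 1, 1], n_words=2, n_shells=2)
score = cbw_dissimilarity(model_a, model_b)
```

Sorting the database by this score for a given query yields the ranked retrieval list. In this toy case the two models have the same outer-shell histogram and differ only in the inner shell, which is exactly the kind of spatial difference that a single global histogram would blur.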
4.3. Experimental Results
5. Partial-to-Global Retrieval
6. Articulated Shape Retrieval
Articulated Shape Retrieval requires that the shape descriptor be deformation invariant, a requirement that several previous methods [6–8] do not satisfy. These methods perform well on rigid objects but perform poorly on deformable ones. The BWs method can still be used effectively for the ASR task. The descriptors for the models are constructed by following the procedure shown in blocks 1 and 2 of Figure 2.
We applied the BWs method to the ASR task on the McGill Shape Benchmark (MSB). The parameter configuration is almost the same as that listed in Section 4.3, except for (a) the width and the height h of the spin plane: , (b) the number of oriented-basis points for one model : . Since all of the models in MSB are regarded as complete, distance is chosen to measure the dissimilarity.
7. Conclusion and Discussion
In this paper, we explore the BWs framework to solve several different tasks in the 3D shape retrieval field, which are classified as specific versus generic, partial-to-global versus global-to-global retrieval, and articulated versus nonarticulated. For each type, the effectiveness of the BWs method is discussed in detail. First, the CBWs method is introduced to improve the discriminative ability of the original BWs representation. Second, BWs is applied to PEB to perform the partial-to-global retrieval task. Several results revealed that, for some shapes (for example, gear-like shapes), PGR performs better than GGR. Finally, we compared the results of BWs with several other methods on the McGill articulated shape database. Our results are comparable to the best results in . More experiments are needed to verify the influence of the parameters listed in Section 4.3.
The authors would like to thank the SIMA program and the IDUS program for supporting this work. This work has also been partially supported by NSF Grants of China (60873218) and NSF Grants of Zhejiang (Z1080232).
- Fei-Fei L, Perona P: A Bayesian hierarchical model for learning natural scene categories. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), June 2005 2: 524-531.
- Shan Y, Sawhney HS, Matei B, Kumar R: Shapeme histogram projection and matching for partial object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 2006, 28(4):568-577.
- Liu Y, Zha H, Qin H: Shape topics: a compact representation and new algorithms for 3D partial shape retrieval. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), 2006 2: 2025-2032.
- Ohbuchi R, Osada K, Furuya T, Banno T: Salient local visual features for shape-based 3D model retrieval. Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '08), 2008, Stony Brook, NY, USA 93-102.
- Li X, Godil A, Wagan A: Spatially enhanced bags of words for 3D shape retrieval. Proceedings of the 4th International Symposium on Advances in Visual Computing (ISVC '08), 2008, Las Vegas, Nev, USA, Lecture Notes in Computer Science 5358: 349-358.
- Chen D-Y, Ouhyoung M, Tian X-P, Shen Y-T: On visual similarity based 3D model retrieval. Computer Graphics Forum 2003, 22(3):223-232. 10.1111/1467-8659.00669
- Kazhdan M, Funkhouser T, Rusinkiewicz S: Rotation invariant spherical harmonic representation of 3D shape descriptors. Proceedings of the ACM International Conference Symposium on Geometry Processing, June 2003, Aachen, Germany 43: 156-164.
- Osada R, Funkhouser T, Chazelle B, Dobkin D: Shape distributions. ACM Transactions on Graphics 2002, 21(4):807-832. 10.1145/571647.571648
- Biasotti S: Reeb graph representation of surfaces with boundary. Proceedings of the Shape Modeling International (SMI '04), 2004 371-374.
- Shilane P, Min P, Kazhdan M, Funkhouser T: The Princeton shape benchmark. Proceedings of the International Conference on Shape Modeling and Applications (SMI '04), June 2004, Genova, Italy 167-178.
- Fang R, Godil A, Li X, Wagan A: A new shape benchmark for 3D object retrieval. Proceedings of the 4th International Symposium on Visual Computing, 2008, Las Vegas, Nev, USA, Lecture Notes in Computer Science 5358: 381-392.
- Jayanti S, Kalyanaraman Y, Iyer N, Ramani K: Developing an engineering shape benchmark for CAD models. Computer Aided Design 2006, 38(9):939-953. Shape Similarity Detection and Search for CAD/CAE Applications 10.1016/j.cad.2006.06.007
- Berman HM, Westbrook J, Feng Z, et al.: The protein data bank. Nucleic Acids Research 2000, 28(1):235-242. 10.1093/nar/28.1.235
- CAESAR Anthropometric Database http://store.sae.org/caesar/
- Mitra NJ, Guibas L, Giesen J, Pauly M: Probabilistic fingerprints for shapes. Proceedings of the 4th Eurographics Symposium on Geometry Processing, 2006, Sardinia, Italy 256: 121-130.
- Funkhouser T, Kazhdan M, Shilane P, et al.: Modeling by example. Proceedings of the 31st International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '04), 2004 652-663.
- Li X, Godil A, Wagan A: 3D part identification based on local shape descriptors. Proceedings of the Performance Metrics for Intelligent Systems Workshop (PerMIS '08), August 2008, Gaithersburg, Md, USA
- Johnson AE, Hebert M: Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence 1999, 21(5):433-449. 10.1109/34.765655
- Winn J, Criminisi A, Minka T: Object categorization by learned universal visual dictionary. Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), 2005 2: 1800-1807.
- Lazebnik S, Schmid C, Ponce J: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. Proceedings of the Computer Vision and Pattern Recognition (CVPR '06), 2006 2: 2169-2178.
- Grauman K, Darrell T: The pyramid match kernel: discriminative classification with sets of image features. Proceedings of the IEEE International Conference on Computer Vision (ICCV '05), October 2005 2: 1458-1465.
- Savarese S, Fei-Fei L: 3D generic object categorization, localization and pose estimation. Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), October 2007, Rio de Janeiro, Brazil 1-8.
- Yuan J, Wu Y: Context aware clustering. Proceedings of the Computer Vision and Pattern Recognition (CVPR '08), June 2008, Anchorage, Alaska, USA 1-8.
- Gal R, Cohen-Or D: Salient geometric features for partial shape matching and similarity. ACM Transactions on Graphics 2006, 25(1):130-150. 10.1145/1122501.1122507
- Jain V, Zhang H: A spectral approach to shape-based retrieval of articulated 3D models. Computer Aided Design 2007, 39(5):398-407. 10.1016/j.cad.2007.02.009
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.