A method of 3D object recognition and localization in a cloud of points
 Jerzy Bielicki^{1} and
 Robert Sitnik^{1}
https://doi.org/10.1186/1687-6180-2013-29
© Bielicki and Sitnik; licensee Springer. 2013
Received: 12 February 2012
Accepted: 28 January 2013
Published: 21 February 2013
Abstract
The method proposed in this article is designed for the analysis of data in the form of a cloud of points obtained directly from 3D measurements. It is intended for end-user applications that can be integrated directly with 3D scanning software. The method utilizes locally calculated feature vectors (FVs) in point cloud data. Recognition is based on comparison of the analyzed scene with a library of reference objects. A global descriptor in the form of a set of spatially distributed FVs is created for each reference model. During the detection process, the correlation of subsets of reference FVs with FVs calculated in the scene is computed. Features utilized in the algorithm are based on parameters that qualitatively estimate mean and Gaussian curvatures. Replacing differentiation with averaging in the curvature estimation makes the algorithm more resistant to discontinuities and poor quality of the input data. Using FV subsets allows partially occluded and cluttered objects to be detected in the scene, while additional spatial information keeps the false positive rate at a reasonably low level.
Keywords
Introduction
The prevalence of optical 3D data acquisition systems demands the development of dedicated algorithms for efficient analysis of such data in various fields of civil engineering, entertainment, and industry. Automated monitoring systems increasingly require end-user applications that are able to perform basic analysis or to detect unexpected object presence or behavior. Monitoring a large amount of visual data (e.g., representing terrain after a flood, hurricane or another natural catastrophe) to find predefined objects (e.g., human bodies or, more generally, 3D objects) that may be only partially visible is a difficult task for people. It demands constant focus, which usually varies over time. The efficiency of such analysis depends on the human factor, and mistakes may cost human lives. The proposed method is suitable for such tasks because it processes large datasets automatically and indicates areas (those whose shape is similar to the requested one, e.g., with a probability higher than a defined threshold) that should be analyzed carefully by a human operator (to avoid false-positive mistakes). Such a solution applies robust, automated analysis to the whole dataset, while only limited areas require the more thorough and expensive analysis performed by a well-trained human operator.
Most of the existing algorithms are based on either a local or a global object description. The local approach allows partially occluded objects to be detected effectively. The use of 3D curves to describe object edges and of splashes to describe variations of surface normals within the local neighbourhood is described in [1]. Other local representations are point signatures (similar to splashes, but avoiding first-order derivative calculations), which describe the neighbouring surface by analysis of sphere–surface intersections [2]; point fingerprints (where geodesic circles are projected onto surface tangent planes) [3]; and keypoints with a quality measure based on values of principal curvatures (more sensitive to noise and discontinuities, as it requires second-order derivative calculations) [4]. Locally based recognition is relatively fast, as it does not require computation of any features dependent on the size of the model. The disadvantage is that information provided by only a small part of the object may easily be corrupted, e.g., by noise, and mislead the detection process.
The global approach may be less dependent on the local quality of the analyzed surface. The use of histograms of oriented surface–point pairs (a point and the surface normal at this point), together with a review of their statistical comparison criteria, is presented in [5]. Other methods apply a 3D modification of the Hough transform for object similarity retrieval [6], or exploit reference object information from a range image for efficient, low-level comparison implemented on the GPU [7]. Important drawbacks include sensitivity to the loss of information (e.g., occlusion) and to misinformation (e.g., clutter), and a data processing time that is usually proportional to the object size.
Some existing algorithms require segmentation of the analyzed scene, which uses either a priori knowledge about the input data or simple segmentation rules. Treating horizontally oriented planes as background to extract objects easily [8], or clustering the scene using a distance-related feature [9], limits the generality of the algorithm. It often leads to under- or over-segmentation, which may result in significant errors; this issue is considered in the smoothness constraint segmentation problem [10].
A popular approach to the 3D recognition problem is to exploit range images, also known as depth maps. These maps make data processing significantly faster, as they convert the most time-consuming problems (e.g., neighbourhood finding) from 3D into 2D space. Such a conversion into the deeply studied image comparison problem makes it possible to calculate local surface descriptors more efficiently (2D local histograms describing the distribution of shape index [11] versus scanning angle) [12], to match multidimensional object histograms (distributions of surface normals and shape indexes) [13], or to focus on the implementation of efficient algorithms directly on the GPU [7]. An interesting approach, although it requires a preceding conversion from range image data into a mesh model, is the tensor-based representation of local surface patches, with a voting scheme using a 4D hash table [14]. However, the drawback of the range image approach is that it cannot analyze surfaces representing complete volumes. A range image is a projection of a single directional point cloud (a point cloud acquired from a single direction, so-called 2.5D data) onto a plane; therefore uniqueness cannot be maintained for full 3D objects (formed from numerous directional clouds of points integrated into a single model).
Another approach that converts the 3D recognition problem into a 2D image correlation task is to exploit spin-images, presented in [15, 16]. A spin-image is a 2D accumulator located at an oriented point (a point with the surface normal n_{s} at this point), which collects all points lying at the intersection of the analyzed object and a rectangle rotated around the surface normal. Only points whose surface normals deviate from the direction of n_{s} by less than a certain angle threshold are taken into consideration. Such a solution is time-efficient with regard to scene-model point matching. The drawback of this approach is that, for optimal object description, setting the angle threshold requires knowledge about the shape complexity of the reference objects and of the objects analyzed in the scene.
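The accumulation described above can be sketched as follows. This is a simplified illustration, not the implementation from [15, 16]: it uses plain nearest-bin counting, and the function name, the parameters `bin_size`, `n_bins` and `max_angle_deg`, and the default threshold are assumptions made for this example.

```python
import numpy as np

def spin_image(points, normals, p, n_s, bin_size, n_bins, max_angle_deg=60.0):
    """Accumulate a simplified spin-image at the oriented point (p, n_s).

    Hypothetical helper: nearest-bin counting, no interpolation.
    """
    points = np.asarray(points, dtype=float)
    normals = np.asarray(normals, dtype=float)
    p = np.asarray(p, dtype=float)
    n_s = np.asarray(n_s, dtype=float)
    n_s = n_s / np.linalg.norm(n_s)

    # Keep only points whose normals deviate from n_s by less than the threshold.
    unit_normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    keep = unit_normals @ n_s > np.cos(np.radians(max_angle_deg))

    diff = points[keep] - p
    beta = diff @ n_s  # signed height along the normal
    # Radial distance from the line through p along n_s.
    alpha = np.sqrt(np.maximum(np.sum(diff ** 2, axis=1) - beta ** 2, 0.0))

    half_height = n_bins * bin_size / 2.0
    row = np.floor((half_height - beta) / bin_size).astype(int)
    col = np.floor(alpha / bin_size).astype(int)
    inside = (row >= 0) & (row < n_bins) & (col >= 0) & (col < n_bins)

    image = np.zeros((n_bins, n_bins))
    np.add.at(image, (row[inside], col[inside]), 1)
    return image
```

The angle test is the step whose threshold, as noted above, has to be tuned to the shape complexity of the objects.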
The topology of the analysed data is exploited by another group of algorithms, related to non-rigid shapes [17], where a statistical significance measure uses geodesic metrics as a partial similarity criterion [18], or where a 3D object is characterized by a set of signatures (histograms of geometric distances, diffusion distances, the ratio of diffusion and geodesic distances, and two curvature-related histograms), which allows similarity between objects to be determined as a product of the pairwise histogram comparison results (with the χ^{2} measure) [19]. Unfortunately, these algorithms are not suitable for data in the form of single directional point clouds. The topology of an object represented by a directional point cloud may easily be changed by the presence of noise or occlusion. This would introduce significant errors into a recognition process that uses this type of algorithm.
During the development of the proposed algorithm, the advantages of the local and global approaches were considered. The proposed algorithm exploits locally calculated descriptors based on point cloud parameters (which are equivalent to curvatures [20, 21], often utilized in the problem of face recognition [22–24]) for the purpose of scene-model matching, and then considers combinations of spatially distributed scene-model matched candidates. Segmentation, as a step that may introduce errors and decrease algorithm efficiency, has been avoided. After considering various types of data representation and processing, it was decided to keep the whole 3D information, although for the analysis of a single directional point cloud the range image representation would suffice. Since the input data for each reference object is a set of directional point clouds (acquired from different directions), there is no need to perform any preliminary segmentation or to use iterative closest point (ICP-type) [25] algorithms to obtain a full 3D reference object, as in [4]. This assumption makes it easy to expand the reference object library, simply by scanning a new 3D object and performing the training phase, without any additional software and with practically no effort. As a result, the proposed algorithm is more general, and may successfully be applied in various fields where 3D data are exploited: industry (automatic machine part recognition), civil engineering (large area monitoring) and entertainment (applications for stereo imaging mobile phones). In the cases mentioned above, the proposed algorithm points out areas geometrically similar to reference objects, which may significantly improve the detection speed and the effectiveness of search and rescue. As a part of built-in 3D scanner software, it may be optional, but it can provide significant operator support in critical situations.
Main text
Method
Preprocessing (PP)
The first stage, common to both phases of the proposed algorithm (training and recognition), consists of five steps. They are preceded by a simple calculation of the average point distance in the cloud, which enables further automatic calculations on the 3D input data.
Average point-to-point distance calculation
This step precedes PP and is used to estimate the average distance between points in the input data. A set of random points (approximately 1000, or fewer if the model contains fewer than 1000 points) is selected; for each point the distance to its closest neighbour is calculated, and these distances are averaged. This allows the neighbourhood radius r for each calculation step to be set as a multiple of the average point distance, so it is not necessary to know the dimensions of the measured objects. The value of the neighbourhood radius is set arbitrarily, so as to assure a non-empty neighbourhood even in the presence of fluctuations of the local cloud density. In all calculations presented in the article, the neighbourhood radius is r = 6.
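A minimal sketch of this estimation step is given below. It uses a brute-force nearest-neighbour search for clarity; the function names and the `seed` parameter are assumptions introduced for this example, and reading r = 6 as a multiplier of the average point distance is an interpretation of the text above.

```python
import numpy as np

def average_point_distance(points, n_samples=1000, seed=0):
    """Estimate the mean nearest-neighbour distance of a point cloud.

    points: (M, 3) array with M >= 2. Brute-force search, for illustration only.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    m = len(points)
    # Use at most n_samples random points (all of them for small clouds).
    idx = rng.choice(m, size=min(n_samples, m), replace=False)
    nearest = np.empty(len(idx))
    for out, i in enumerate(idx):
        d = np.linalg.norm(points - points[i], axis=1)
        d[i] = np.inf  # exclude the point itself
        nearest[out] = d.min()
    return nearest.mean()

def neighbourhood_radius(points, multiplier=6.0):
    # Assumption: the radius is expressed as a multiple of the average distance.
    return multiplier * average_point_distance(points)
```

For large clouds a spatial index (for example a k-d tree) would replace the inner distance computation; the sampling logic stays the same.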
Data PP consists of the following steps:
 1. surface normal vectors estimation,
 2. border points detection,
 3. C_{1} parameter computation [26],
 4. C_{2} parameter computation,
 5. surface-type estimation.
Surface normal vectors estimation
Border points detection
C_{1}
Parameter C_{1} is defined as follows [26].
C_{2}
Parameter C_{2} is defined as below [26].
Local surface type
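Local surface type estimation using H and K (mean and Gaussian curvature, respectively) or, equivalently, using the C_{1} and C_{2} parameters

| Surface type | H | K | C_{1} | C_{2} |
|---|---|---|---|---|
| Plane | 0 | 0 | 0 | 0 |
| Convex cylinder | >0 | 0 | >0 | 0 |
| Concave cylinder | <0 | 0 | <0 | 0 |
| Convex ellipsoid | >0 | >0 | >0 | >0 |
| Concave ellipsoid | <0 | >0 | <0 | >0 |
| Saddle-like surface | >0 | <0 | >0 | <0 |
| Saddle-like surface | =0 | <0 | =0 | <0 |
| Saddle-like surface | <0 | <0 | <0 | <0 |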
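The sign table above can be turned into a lookup, sketched below under two assumptions: the saddle-like rows are read as C_{2} < 0 with any sign of C_{1}, and values within a small tolerance `eps` of zero are treated as zero. The tolerance, its default value and the function names are hypothetical and not taken from the article.

```python
def sign_with_tolerance(value, eps):
    """Return +1, -1 or 0, treating |value| <= eps as zero."""
    if value > eps:
        return 1
    if value < -eps:
        return -1
    return 0

# Keys are (sign of C1, sign of C2), following the table of surface types.
SURFACE_TYPES = {
    (0, 0): "plane",
    (1, 0): "convex cylinder",
    (-1, 0): "concave cylinder",
    (1, 1): "convex ellipsoid",
    (-1, 1): "concave ellipsoid",
    (1, -1): "saddle-like surface",
    (0, -1): "saddle-like surface",
    (-1, -1): "saddle-like surface",
}

def classify_surface(c1, c2, eps=1e-3):
    """Map a (C1, C2) pair to a surface type; eps is an assumed tolerance."""
    key = (sign_with_tolerance(c1, eps), sign_with_tolerance(c2, eps))
    # The combination C1 = 0, C2 > 0 does not appear in the table.
    return SURFACE_TYPES.get(key, "undefined")
```

Because only signs are used, the classification does not depend on the absolute scale of the parameters, only on the choice of the zero tolerance.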
Training phase
The descriptor of a reference object consists of a set of local FVs, each of which is calculated for a sampling point p_{n}. Since a local FV is calculated for every nth point of the data, the number of FVs depends on the sampling resolution of the reference model. The descriptor also contains the radius R of the smallest bounding sphere of the model. The radius R is utilized in the recognition phase to specify the neighbourhood for local FV calculations.
The last step of the training phase is principal component analysis (PCA), which reduces the dimensionality of the created feature space. The feature space is reduced from the initial N = 108 dimensions to a significantly smaller space, usually fewer than 10 dimensions. The condition imposed during PCA is to preserve 99% of the initial energy. This ensures that only the most meaningful linear combinations of the initial features are considered in the recognition phase.
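The dimensionality reduction can be sketched as below, assuming that "energy" means the fraction of total variance carried by the retained principal components. The function names are illustrative, and the SVD-based formulation is one standard way to compute PCA rather than necessarily the one used by the authors.

```python
import numpy as np

def fit_pca(features, energy=0.99):
    """Fit PCA and keep the fewest components preserving `energy` of the variance.

    features: (num_vectors, N) array of local FVs, e.g. N = 108.
    Returns the feature mean and a (k, N) matrix of principal axes.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    centred = features - mean
    # Rows of vt are the principal axes; s holds the singular values.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    variance = s ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    # First index at which the cumulative share reaches the requested energy.
    k = int(np.searchsorted(cumulative, energy) + 1)
    return mean, vt[:k]

def project(fv, mean, components):
    """Project one feature vector onto the reduced feature space."""
    return (np.asarray(fv, dtype=float) - mean) @ components.T
```

The same `mean` and `components` obtained in the training phase would then be reused to project scene FVs in the recognition phase, so that both sets live in one feature space.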
Recognition phase
In this online phase, scene-model matching is performed. Based on the PP results, a local FV is created. The local FV is then correlated with the reference object descriptors. The thresholding step leads to the final decision about object presence.
Local model-scene correlation
where d = [d_{1},…,d_{N}] is the projection of the local FV_{n} onto the feature space FS, and d_{ref} = [d_{r1},…,d_{rN}] is the projection of one of the reference model’s local FV_{kl} onto the feature space FS.
As a result, a set of probability distributions P_{ikl} is obtained, where i is the point index, k is the index of the reference object and l is the index of one of the reference model’s local FV_{kl}.
Thresholding
At this stage, the final decision about the presence of one or multiple objects is made. The sensitivity of the algorithm is controlled by three parameters:
T_{p} – probability threshold; all points with probability P_{ikl} below T_{p} are ignored,
D_{p} – distance difference threshold; a pair of scene sampling points is accepted only if the difference between the distances |p_{i}p_{j}| and |p_{kl}p_{km}| is smaller than the threshold value, where |p_{i}p_{j}| is the distance between neighbouring points p_{i} and p_{j}, with probabilities P_{ikl} and P_{jkm}, respectively, and |p_{kl}p_{km}| is the distance between the reference object sampling points p_{kl} and p_{km}, for which the local FV_{kl} and FV_{km} were calculated,
N_{p} – threshold on the number of accepted points; for N_{p} = 3 similar triangles are searched for, and for N_{p} = 4 similar quadrangles.
 1. A point p_{1} with probability P_{1kl} higher than T_{p} is searched for. If it is found, k (object index) and l (FV index) are remembered (Figure 8a).
 2. A point p_{2} with probability P_{2km} higher than T_{p} is searched for. The additional condition to satisfy is
$\left|\,\left|p_{1}p_{2}\right| - \left|p_{kl}p_{km}\right|\,\right| < D_{p}\cdot\left|p_{kl}p_{km}\right| \quad \text{for } l \ne m$ (6)
 3. If N_{p} = 3, the third and last point p_{3} is searched for. This point has to satisfy condition (6) with respect to both points p_{1} and p_{2} (Figure 8c). To guarantee similarity between the detected and the reference triangle, the direction of the vector product in relation to the average direction of the normal vectors is checked. If this succeeds, a triangle similar to the triangle from the reference model has been found in the scene, and the average probability is calculated. If N_{p} = 4, a fourth point is searched for to find a similar quadrangle. All probabilities are sorted in descending order, so the algorithm performs well even if the search does not succeed.
At the end of the recognition phase all small, adjacent groups of selected points are converted into consistent objects.
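The three-point search for N_{p} = 3 can be sketched as follows for a single reference object k. This is an illustration under stated assumptions: condition (6) is read as a relative distance test, all triples are enumerated exhaustively instead of being found by the incremental search described above, and the orientation check based on the vector product is omitted. The function names and the candidate data layout are hypothetical.

```python
import itertools
import numpy as np

def distance_consistent(p_a, p_b, r_a, r_b, d_p):
    """Condition (6): scene and reference distances differ by less than d_p * reference."""
    scene = np.linalg.norm(np.asarray(p_a, dtype=float) - np.asarray(p_b, dtype=float))
    ref = np.linalg.norm(np.asarray(r_a, dtype=float) - np.asarray(r_b, dtype=float))
    return abs(scene - ref) < d_p * ref

def find_triangles(candidates, ref_points, t_p, d_p):
    """Find scene triangles similar to reference triangles.

    candidates: list of (scene_point, l, probability) for one reference object.
    ref_points: mapping from FV index l to the reference sampling point p_kl.
    Returns (average_probability, scene_indices) tuples, best first.
    """
    # Keep only candidates whose probability exceeds the threshold T_p.
    kept = [(i, c) for i, c in enumerate(candidates) if c[2] > t_p]
    found = []
    for combo in itertools.combinations(kept, 3):
        labels = {c[1] for _, c in combo}
        if len(labels) < 3:
            continue  # l != m: every vertex must match a different reference FV
        pairs_ok = all(
            distance_consistent(a[1][0], b[1][0],
                                ref_points[a[1][1]], ref_points[b[1][1]], d_p)
            for a, b in itertools.combinations(combo, 2)
        )
        if pairs_ok:
            average = sum(c[2] for _, c in combo) / 3.0
            found.append((average, tuple(sorted(i for i, _ in combo))))
    found.sort(reverse=True)
    return found
```

Since pure distance matching cannot tell a triangle from its mirror image, the omitted orientation check is what would reject mirrored matches in a fuller implementation.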
Results
Assessment of the proposed method was performed using synthetic and real input data. Synthetic data have mostly been used to evaluate the influence of data quality on detection results. Real data were captured using the OGX3DMADMAC scanner [30, 31].
In Figure 10a, one of the test scenes with objects from five different classes is presented. Consecutive figures present detection results (after the thresholding procedure, where colour of the object depends on the detection probability) of the following objects: (b) ball, (c) plasticine figures, (d) elephant, (e) dinosaur and (f) dog. All objects apart from the plasticine figures are detected with probability of nearly 100%. Plasticine figures are detected with lower, 80–90% probabilities. It is important to notice that only one of these figures was scanned as a reference object, and that all the figures differ from each other in shape, as they were handmade.
Figure 11 shows some of the worst-case detection results for (a) torus, (b) ball, (c) plasticine figures, (d) elephant, (e) dinosaur and (f) dog. Figure 11a,c,d shows that in some cases local descriptors may not be descriptive enough to distinguish between similar objects. On the other hand, this property reduces the sensitivity of the algorithm to clutter and occlusion.
In Figure 12, a synthetic scene is presented. All the complex objects, such as (a) Porsche, aircraft and (b–d) Mercedes models, have been converted from full 3D models (composed of multiple solids), so the obtained surfaces may locally be confusing for the algorithm. In Figure 12c, detection of the dog model from the Princeton Shape Benchmark (PSB) [32] is presented. Figure 12a,b,d presents detection results for the human 1 PSB model. Figure 12b,d shows the influence of the value of the distance difference threshold D_{p} on detection results for the probability threshold T_{p} = 90%. For Figure 12b, with D_{p} = 0.04, only the standing pose is detected. Increasing the parameter value to D_{p} = 0.1 results in detection of both human figures and in one false positive detection (Figure 12d). It can be seen that the presence of obstacles in the neighbourhood does not strongly affect detection, since local descriptors are utilized. On the other hand, it is noticeable that, for a constant value of the D_{p} threshold, the larger the maximum distance between reference points (which results from the value of the radius R of the analysed neighbourhood; see 1.2 for details), the larger the tolerance for the difference between the reference point distance and the candidate point distance. For better performance, the value of the D_{p} threshold should vary as a function of distance, to avoid a linear increase of the distance difference tolerance with increasing distance between reference points.
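As a worked illustration of this linear growth, the right-hand side of condition (6) can be evaluated for the two D_{p} values used above. The symbol \Delta_{\max} for the absolute tolerance and the reference distances of 10 and 100 (arbitrary units) are assumptions made purely for this example; they are not taken from the experiments.

```latex
\Delta_{\max} = D_{p}\cdot\left|p_{kl}p_{km}\right|
\qquad
\begin{aligned}
D_{p} = 0.04:&\quad \Delta_{\max}(10) = 0.4, &\quad \Delta_{\max}(100) = 4,\\
D_{p} = 0.1:&\quad \Delta_{\max}(10) = 1, &\quad \Delta_{\max}(100) = 10.
\end{aligned}
```

A tenfold increase of the reference distance therefore admits a tenfold larger absolute deviation, which is the effect a distance-dependent D_{p} would counteract.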
Robustness to clutter results from the fact that a varying number of points representing other objects or the background affects neither the locally calculated FVs nor the detection results. Nevertheless, the impact of the spatial distribution of clutter points is not easy to estimate.
The worst case presented in Figure 14 is the recognition of the dog PSB model. The reason seems to be that even small-amplitude noise added to, e.g., the thin legs of the dog rapidly changes their shape (two legs may be joined into a single one). The second worst case is the cylinder. In this case, the poor recognition result is caused by the high ratio between the length and the radius of the cylinder. A large length implies a high amplitude of random noise, which, compared with the small radius, significantly modifies the shape of the object. The rest of the objects are detected efficiently, and increasing the noise amplitude decreases the recognition accuracy at an acceptable rate. The use of BFPs instead of spatial derivatives in the process of local surface type estimation improves the resistance of the algorithm to noise and discontinuities within the input data.
Conclusion
In this article, an efficient and noise-robust algorithm has been presented. The comparisons performed in the recognition process are based on an approach that avoids noise-sensitive calculations, making the algorithm suitable for low-quality data. The assumption was to exploit the minimum number of parameters that completely describe the local geometry of the surface. The main step of the algorithm is to consider a set of spatially distributed surface parts as a representation of a reference object stored in the database. Such an approach should allow even strongly occluded objects to be detected, since detection is the result of finding a similar structure during the scene-model correlation process. As a consequence, there is no significant difference for the algorithm between occlusion and clutter, as long as the noise is not similar in shape to the missing parts of the detected objects.
The main drawback is the limited descriptiveness of the proposed descriptor in some cases, which may result in a high false positive ratio. This ratio may be reduced by improving the geometrical representation of the reference object and by utilizing quadrangles instead of triangles in the thresholding process.
The results show that the proposed algorithm can be used with incomplete and noisy data. The impact of clutter on the recognition rate is also minimized by the use of local FVs.
In the future, a sampling algorithm similar to Farthest Point Sampling [33] will be implemented to assure uniform data sampling. Modification of the spatial representation of the reference objects may also improve descriptiveness and thereby reduce the false positive recognition ratio. The most promising option seems to be a two-level hierarchical model representation, which will be suitable for highly detailed and point-dense objects. At the moment, the thresholding process results in finding a triangle (or quadrangle) in the scene whose vertices represent surface parts similar to the surface of the reference object. Adding a top level to the hierarchy, resulting in detection of a “triangle of triangles” of similar surface parts, will increase the accuracy of detection, especially for large models (e.g., large animals) that contain highly detailed elements (e.g., head, legs, tail) which may be significant for distinguishing one from another (e.g., a horse from a rhinoceros). Another advantage of the proposed modification is that it can represent each object as a spatial distribution of basic shapes, which is much more intuitive and easier for humans to analyse than, e.g., a set of keypoints [4], spin-images [16] or histograms [5], and which can further be exploited in a more abstract way (e.g., in cognitive systems). Calculation time should not increase significantly, as the number of the most time-consuming calculations (scene-model similarity comparison) remains the same; only the thresholding process changes.
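For orientation, the basic greedy form of farthest point sampling is sketched below. It uses Euclidean distances on raw point coordinates, which is a simplifying assumption (a geodesic metric may be preferable for surfaces), and the function name and the `start` parameter are illustrative rather than part of the planned implementation.

```python
import numpy as np

def farthest_point_sampling(points, n_samples, start=0):
    """Greedy farthest point sampling with Euclidean distances.

    Returns indices of the selected points; `start` is the first sample.
    """
    points = np.asarray(points, dtype=float)
    n_samples = min(n_samples, len(points))
    selected = [start]
    # Distance from every point to its nearest already-selected sample.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(n_samples - 1):
        nxt = int(np.argmax(min_dist))  # the point farthest from all samples so far
        selected.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.array(selected)
```

Each new sample is the point farthest from all samples chosen so far, which spreads the sampling points evenly regardless of local variations in cloud density.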
One of the advantages is that the same algorithm can add reference objects to an existing database, which allows the database to be extended whenever necessary. The presented method may efficiently assist the operators of automated maintenance and monitoring systems.
References
 1. Stein F, Medioni G: Structural indexing: efficient 3D object recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14(2):125-145. 10.1109/34.121785
 2. Chua CS, Jarvis R: Point signatures: a new representation for 3D object recognition. Int. J. Comput. Vis. 1997, 25(1):63-85. 10.1023/A:1007981719186
 3. Sun Y, Paik J, Koschan A, Page DL, Abidi MA: The point fingerprint: a new 3D object representation scheme. IEEE Trans. Syst. Man Cybern. 2003, 33(4):712-717. 10.1109/TSMCB.2003.814295
 4. Mian A, Bennamoun M, Owens R: On the repeatability and quality of keypoints for local feature-based 3D object retrieval from cluttered scenes. Int. J. Comput. Vis. 2009, 89: 348-361.
 5. Wahl E, Hillenbrand U, Hirzinger G: Surflet-pair-relation histograms: a statistical 3D-shape representation for rapid classification. Proc. Fourth International Conference on 3D Digital Imaging and Modeling 2003, 474-481.
 6. Zaharia T, Preteux F: Hough transform-based 3D mesh retrieval. Proc. SPIE 2001, 4476: 175-185. 10.1117/12.447283
 7. Germann M, Breitenstein MD, Pfister H, Park IK: Automatic Pose Estimation for Range Images on the GPU. 2005.
 8. Rusu RB, Holzbach A, Beetz M, Bradski G: Detecting and segmenting objects for mobile manipulation. Proc. IEEE 12th International Conference on Computer Vision Workshops 2009, 47-54.
 9. Klasing K: A clustering method for efficient segmentation of 3D laser data. Proc. IEEE International Conference on Robotics and Automation 2008, 4043-4048.
 10. Rabbani T, van den Heuvel FA, Vosselman G: Segmentation of point clouds using smoothness constraint. ISPRS 2006, XXXVI(5):248-253.
 11. Koenderink JJ, van Doorn AJ: Surface shape and curvature scales. Image Vis. Comput. 1992, 10(8):557-564. 10.1016/0262-8856(92)90076-F
 12. Chen H, Bhanu B: 3D free-form object recognition in range images using local surface patches. Pattern Recognit. Lett. 2007, 28(10):1252-1262. 10.1016/j.patrec.2007.02.009
 13. Leibe B, Hetzel G, Levi P, Schiele B: 3D object recognition from range images using local feature histograms. Proc. CVPR(II) 2001, 394-399.
 14. Mian AS, Bennamoun M, Owens RA: 3D recognition and segmentation of objects in cluttered scenes. Proc. ACV 2005, 8-13.
 15. Johnson AE, Hebert M: Surface matching for object recognition in complex 3D scenes. Image Vis. Comput. 1998, 16: 635-651. 10.1016/S0262-8856(98)00074-2
 16. Johnson AE, Hebert M: Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Trans. Pattern Anal. Mach. Intell. 1999, 21(5):433-449. 10.1109/34.765655
 17. Ovsjanikov M, Huang QX, Guibas L: A condition number for non-rigid shape matching. Computer Graphics Forum 2011, 30(5):1503-1512. 10.1111/j.1467-8659.2011.02024.x
 18. Bronstein AM, Bronstein MM, Carmon Y, Kimmel R: Partial Similarity of Shapes Using a Statistical Significance Measure. Department of Computer Science, Technion, Israel Institute of Technology, Haifa; 2009.
 19. Mahmoudi M, Sapiro G: Three-dimensional point cloud recognition via distributions of geometric distances. Graphical Models 2009, 71(1):22-31. 10.1016/j.gmod.2008.10.002
 20. Goldfeather J, Interrante V: A novel cubic-order algorithm for approximating principal direction vectors. ACM Trans. Graph. 2004, 23: 45-63. 10.1145/966131.966134
 21. Mustafa A, Shapiro L, Ganter M: 3D object identification with color and curvature signatures. Pattern Recognit. 1999, 32: 339-355. 10.1016/S0031-3203(98)00075-2
 22. Colombo A, Cusano C, Schettini R: 3D face detection using curvature analysis. Pattern Recognit. 2006, 39: 444-455. 10.1016/j.patcog.2005.09.009
 23. Szeptycki P, Ardabilian M, Chen L: A coarse-to-fine curvature analysis-based rotation invariant 3D face landmarking. Proc. International Conference of Biometrics: Theory, Applications and Systems 2009, 1-6.
 24. Crosilla F, Visintini D, Sepic F: Reliable automatic classification and segmentation of laser point clouds by statistical analysis of surface curvature values. Appl. Geomat. 2009, 1(1–2):17-30.
 25. Li C, Shaonyi D, Nanning Z: A fast multi-resolution iterative closest point algorithm. Proc. CCPR 2010, 1-5.
 26. Witkowski M, Sitnik R: Locating and tracing of anatomical landmarks based on full-field four-dimensional measurement of human body surface. J. Biomed. Opt. 2008, 13(4):044039. 10.1117/1.2960017
 27. Björck A: Numerical Methods for Least Squares Problems. SIAM, Philadelphia; 1997.
 28. Kalogerakis E, Nowrouzezahrai D, Simari P, Singh K: Extracting lines of curvature from noisy point clouds. Elsevier J. Comput.-Aided Design 2009, 41(Special Issue on Point-Based Computational Techniques):282-292.
 29. Jolliffe IT: Principal Component Analysis. 2nd edition. Springer, New York; 2002.
 30. Sitnik R, Kujawińska M, Woźnicki J: Digital fringe projection system for large-volume 360-deg shape measurement. Opt. Eng. 2002, 41: 443-449.
 31. OGXOptographx. http://ogx.mchtr.pw.edu.pl/projects/3dmadmac/
 32. Princeton Shape Benchmark. http://shape.cs.princeton.edu/benchmark/
 33. Bronstein AM, Bronstein MM, Kimmel R: Numerical Geometry of Non-Rigid Shapes. Springer Verlag, New York; 2009.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.