
A method of 3D object recognition and localization in a cloud of points


The method proposed in this article is designed for analysis of data in the form of point clouds obtained directly from 3D measurements. It is intended for end-user applications that can be integrated directly with 3D scanning software. The method utilizes locally calculated feature vectors (FVs) in point cloud data. Recognition is based on comparison of the analyzed scene with a library of reference objects. A global descriptor in the form of a set of spatially distributed FVs is created for each reference model. During the detection process, the correlation of subsets of reference FVs with FVs calculated in the scene is computed. The features utilized in the algorithm are based on parameters that qualitatively estimate mean and Gaussian curvatures. Replacing differentiation with averaging in the curvature estimation makes the algorithm more resistant to discontinuities and poor quality of the input data. The use of FV subsets allows detection of partially occluded and cluttered objects in the scene, while additional spatial information keeps the false positive rate at a reasonably low level.


The prevalence of optical 3D data acquisition systems demands the development of dedicated algorithms for efficient analysis of such data in various fields of civil engineering, entertainment, and industry. Automated monitoring systems increasingly require end-user applications that are able to perform basic analysis or to detect unexpected object presence or behavior. Monitoring a large amount of visual data (e.g., representing terrain after a flood, hurricane or another natural catastrophe) to find predefined objects (e.g., human bodies or 3D objects in general) that may be only partially visible is a difficult task for people. It demands constant focus, which usually varies over time. The efficiency of such analysis depends on the human factor, and may lead to serious mistakes that may cost human lives. The proposed method is suitable for such tasks because it processes large datasets automatically and indicates areas (those with a shape similar to the requested one, e.g., with probability higher than a defined threshold) that should be analyzed precisely by a human operator (to avoid false-positive mistakes). Such a solution allows robust, automated analysis to be applied to the whole dataset, while only limited areas require the more thorough and expensive analysis performed by a well-trained human operator.

Most of the existing algorithms are based on either a local or a global object description. The local approach allows effective detection of partially occluded objects. The use of 3D curves to describe edges of the objects and splashes to describe variations of surface normals within the local neighbourhood is described in [1]. Other local representations are point signatures (similar to splashes, but avoiding first-order derivative calculations), which describe the neighbouring surface by analysis of sphere–surface intersections [2]; point fingerprints (where geodesic circles are projected onto surface tangent planes) [3]; and keypoints with a quality measure based on values of principal curvatures (more sensitive to noise and discontinuities, as it requires second-order derivative calculations) [4]. Locally based recognition is relatively fast, as it does not require computation of any features dependent on the size of the model. The disadvantage is that information provided by only a small part of the object may easily be corrupted, e.g., by noise, and mislead the detection process.

The global approach may be less dependent on the local quality of the analyzed surface. The use of histograms of oriented surface–point pairs (a point and the surface normal at this point) and a review of criteria for their statistical comparison are presented in [5]. Other methods apply a 3D modification of the Hough transform for object similarity retrieval [6], or exploit reference object information from a range image for efficient, low-level comparison implemented on a GPU [7]. Important drawbacks include sensitivity to loss of information (e.g., occlusion) and to misinformation (e.g., clutter), and a processing time that is usually proportional to the object size.

Some existing algorithms require segmentation of the analyzed scene, which uses either a priori knowledge about the input data or simple segmentation rules. Treating horizontally oriented planes as background to easily extract objects [8], or clustering the scene using a distance-related feature [9], limits the generality of the algorithm. It often leads to under- or over-segmentation, which may result in significant errors; this issue is considered in the smoothness constraint segmentation problem [10].

A popular approach to the 3D recognition problem is to exploit range images, also known as depth maps. These maps make data processing significantly faster, as they convert the most time-consuming problems (e.g., neighbourhood finding) from 3D into 2D space. Such a conversion into the deeply studied image comparison problem makes it possible to calculate local surface descriptors more efficiently (2D local histograms describing the distribution of shape index [11] versus scanning angle) [12], to match multidimensional object histograms (distributions of surface normals and shape indexes) [13], or to focus on implementation of efficient algorithms directly on the GPU [7]. An interesting approach, although it requires a preceding conversion from range image data into a mesh model, is the tensor-based representation of local surface patches with a voting scheme using a 4D hash table [14]. However, the drawback of the range image approach is that it cannot analyze surfaces representing complete volumes. A range image is a projection of a single, directional point cloud (a point cloud acquired from a single direction, so-called 2.5D data) onto a plane; therefore, uniqueness cannot be maintained for full 3D objects (formed from numerous directional point clouds integrated in a single model).

Another approach that converts the 3D recognition problem into a 2D image correlation task is to exploit spin-images, presented in [15, 16]. A spin-image is a 2D accumulator located at an oriented point (a point with the surface normal n s at this point), which collects all points that lie in the intersection of the analyzed object and a rectangle rotated around the surface normal. Only those points are taken into consideration whose surface normals deviate from the direction of the surface normal n s by less than a certain angle threshold. Such a solution is time-efficient with regard to scene-model point matching. The drawback of this approach is that, for optimal object description, setting the angle threshold requires knowledge about the shape complexity of the reference objects and of the objects analyzed in the scene.

The topology of the analysed data is exploited by another group of algorithms, related to non-rigid shapes [17], where a statistical significance measure uses geodesic metrics as a partial similarity criterion [18], or where a 3D object is characterized by a set of signatures (histograms of geometric distances, diffusion distances, the ratio of diffusion and geodesic distances, and two curvature-related histograms), which allows similarity between objects to be determined as a product of the pairwise histogram comparison results (with the χ² measure) [19]. Unfortunately, they are not suitable for data in the form of single directional point clouds. The topology of an object represented by a directional point cloud may easily be changed by the presence of noise or occlusion. This would introduce significant errors in a recognition process that uses this type of algorithm.

During the development of the proposed algorithm, the advantages of local and global approaches were considered. The proposed algorithm exploits locally calculated descriptors based on point cloud parameters (which are equivalent to curvatures [20, 21], often utilized in the problem of face recognition [22–24]) for the purpose of scene-model matching, and then considers combinations of spatially distributed scene-model matched candidates. Segmentation, as a step which may introduce errors and decrease algorithm efficiency, has been avoided. After considering various types of data representation and processing, it was decided to keep the whole 3D information, although for the analysis of a single, directional point cloud the range image representation would suffice. Since the input data for each reference object is a set of directional point clouds (acquired from different directions), there is no need to perform any preliminary segmentation or to use iterative closest point (ICP-type) [25] algorithms to obtain a full 3D reference object, as in [4]. This assumption makes it quite easy to expand the reference object library, simply by scanning a new 3D object and performing the training phase, without using any additional software and with practically no effort. As a result, the proposed algorithm is more general, and may successfully be applied in various fields where 3D data are exploited: industry (automatic machine part recognition), civil engineering (large area monitoring) and entertainment (applications for stereo imaging mobile phones). In the cases mentioned above, the proposed algorithm points out areas geometrically similar to reference objects, which may significantly improve the speed and effectiveness of detection, for example in search and rescue. As a part of built-in 3D scanner software it may be optional, but it can provide significant operator support in critical situations.

Main text


The proposed algorithm is divided into two phases (Figure 1). The first is the training phase, performed off-line, in which the reference object database is created. The second phase is the recognition process, performed on-line. Both phases share common pre-processing (PP) at the beginning of the data processing flow. PP extracts geometrical features from the input data, which are utilized in the further steps. Detection results are presented as a probability distribution of the object presence in the scene.

Figure 1. Algorithm scheme.


The first, common stage of both phases of the proposed algorithm consists of five steps, which are preceded by a simple calculation of the average point distance in the cloud, to allow further automatic calculations on 3D input data.

Average point-to-point distance calculation

This step precedes PP and is used to estimate the average distance between points in the input data. A set of random points (approximately 1000, or fewer if the model is smaller than 1000 points) is selected; for each of them the distance to the closest neighbour point is calculated, and these distances are averaged. This allows the neighbourhood radius r for each calculation step to be set as a multiple of the average point distance, so it is not necessary to know the dimensions of the measured objects. The value of the neighbourhood radius is set arbitrarily, assuring a non-empty neighbourhood even in the presence of fluctuations of the local cloud density. In all calculations presented in the article the neighbourhood radius is r = 6, expressed as a multiple of the average point distance.
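The estimate described above can be sketched in a few lines. This is only an illustration: the function name `average_point_distance`, the brute-force neighbour search and the fixed random seed are assumptions of this example, not details given in the article.

```python
import numpy as np

def average_point_distance(points, n_samples=1000, seed=0):
    """Mean nearest-neighbour distance estimated from a random subset of points."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    k = min(n_samples, len(points))          # use fewer samples for small clouds
    idx = rng.choice(len(points), size=k, replace=False)
    nearest = np.empty(k)
    for out, i in enumerate(idx):
        d = np.linalg.norm(points - points[i], axis=1)
        d[i] = np.inf                        # exclude the point itself
        nearest[out] = d.min()
    return nearest.mean()

# Regular grid with unit spacing: every nearest neighbour is at distance 1.
grid = np.array([[x, y, 0.0] for x in range(10) for y in range(10)])
avg = average_point_distance(grid)
radius = 6 * avg   # neighbourhood radius as a multiple of the average distance
```

For real scans a spatial index (for example a k-d tree) would replace the brute-force search; the loop is kept here only to make the definition explicit.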

Data PP consists of the following steps:

  • surface normal vectors estimation,

  • border points detection,

  • C 1 parameter computation [26],

  • C 2 parameter computation,

  • surface-type estimation.

Surface normal vectors estimation

In the next step, normal vectors for the directional point cloud data are calculated. For each point, a best fit plane (BFP) to the surface within the neighbourhood is estimated using the root mean square (RMS) minimization criterion [27]. The obtained BFP is described by the coefficients A, B, C and D in plane Equation (1). The normal vector is given by the normalized coefficients (A, B, C) from Equation (1), and its orientation is determined by the location of the scanning system (the normal vector is oriented towards the scanning system, which is equivalent to C > 0).

BFP : Ax + By + Cz + D = 0
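A minimal sketch of the plane fit follows. It assumes an orthogonal least-squares fit computed with a singular value decomposition, which may differ in detail from the RMS criterion of [27], and it assumes the scanner looks along the negative z axis so that C > 0 points towards it. The function name `fit_plane_normal` is hypothetical.

```python
import numpy as np

def fit_plane_normal(neighbours):
    """Least-squares plane through the neighbours; returns (unit normal, D)."""
    pts = np.asarray(neighbours, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    if normal[2] < 0:          # orient towards the scanner (C > 0)
        normal = -normal
    d = -normal.dot(centroid)  # plane: A x + B y + C z + D = 0
    return normal, d

# Five points lying in the plane z = 1.
patch = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1], [0.5, 0.5, 1]])
normal, d = fit_plane_normal(patch)
```

For this patch the fitted plane is z − 1 = 0, so the normal is the z axis and D = −1.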

Border points detection

The surface of the BFP is divided radially into six equal zones (each zone spans 60°), with the projection of the currently analysed point p p used as the division centre (Figure 2) [28]. The division is made using vector v 1, which connects the projection of an arbitrary neighbour point p 1 with the division centre p p , and vector v 2, which is the vector product of the BFP normal and vector v 1. Each zone accumulates the orthogonal projections of the neighbour points p i onto the BFP. Lack of points in any of the zones indicates that the considered point belongs to an edge. The division orientation for each BFP is arbitrary, but it does not affect the result of the border detection stage.
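The zone test can be illustrated as follows. The sketch assumes that the BFP normal is already known and uses the first neighbour as the arbitrary point p 1; the function name `is_border_point` and the handling of degenerate neighbourhoods are choices made for this example.

```python
import numpy as np

def is_border_point(point, neighbours, normal, n_zones=6):
    """True if any angular zone around `point` on the BFP holds no neighbour."""
    point = np.asarray(point, dtype=float)
    normal = np.asarray(normal, dtype=float)
    normal = normal / np.linalg.norm(normal)
    rel = np.asarray(neighbours, dtype=float) - point
    # In-plane offsets between the projections of the neighbours and of `point`.
    proj = rel - np.outer(rel @ normal, normal)
    proj = proj[np.linalg.norm(proj, axis=1) > 1e-12]
    if len(proj) == 0:
        return True                              # no usable neighbours at all
    v1 = proj[0] / np.linalg.norm(proj[0])       # arbitrary in-plane axis
    v2 = np.cross(normal, v1)                    # second in-plane axis
    angles = np.arctan2(proj @ v2, proj @ v1) % (2 * np.pi)
    zones = np.floor(angles / (2 * np.pi / n_zones)).astype(int) % n_zones
    return len(np.unique(zones)) < n_zones

angles = np.deg2rad(np.arange(0, 360, 30))
ring = np.c_[np.cos(angles), np.sin(angles), np.zeros_like(angles)]
inside = is_border_point([0, 0, 0], ring, [0, 0, 1])       # full ring of neighbours
on_edge = is_border_point([0, 0, 0], ring[:7], [0, 0, 1])  # half ring only
```

With a full ring every zone is occupied and the point is treated as interior; with half of the ring removed three zones stay empty and the point is flagged as a border point.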

Figure 2. Border points estimation. The BFP is divided into six equal zones, each of which collects orthogonal projections of neighbour points. An empty zone indicates that the analysed point lies on the border.

C 1

Parameter C 1 is defined as follows [26].

For each sampling point p p , a signed, weighted average distance to the BFP is calculated. Signed distance means that the sign of the value depends on the location of the neighbour point p i with respect to the BFP normal vector (Figure 3). Points located on the same side of the BFP as its normal vector contribute a positive value d i , whereas points on the opposite side contribute a negative value d j . The weight w i for each neighbour point is proportional to a Gaussian function of the distance s i between the orthogonal projections of the sampling point p p and the neighbour point p i :

$$ w_i \propto e^{-\frac{s_i^2}{2 r^2}}, $$

where r is the neighbourhood radius. The value of C 1 is greater than zero for convex surfaces, smaller than zero for concave surfaces and equal to zero for planes or saddle points. The C 1 parameter corresponds qualitatively to the mean curvature, which is the arithmetic mean of the principal surface curvatures.
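The definition of C 1 can be sketched as below. The weight is read as w_i = exp(−s_i² / (2r²)), and the weighted average is normalized by the sum of the weights; the normalization, the function name `c1_parameter` and the test shapes are assumptions of this example.

```python
import numpy as np

def c1_parameter(point, neighbours, normal, d_coeff, r):
    """Signed, Gaussian-weighted mean distance of the neighbours to the BFP."""
    point = np.asarray(point, dtype=float)
    nb = np.asarray(neighbours, dtype=float)
    normal = np.asarray(normal, dtype=float)
    dist = nb @ normal + d_coeff                    # signed distances d_i to the plane
    rel = nb - point
    in_plane = rel - np.outer(rel @ normal, normal)
    s = np.linalg.norm(in_plane, axis=1)            # in-plane distances s_i
    w = np.exp(-s**2 / (2.0 * r**2))                # Gaussian weights w_i
    return float(np.sum(w * dist) / np.sum(w))

def cap(sign):
    """Points on a unit-sphere cap (sign=+1, convex) or bowl (sign=-1, concave)."""
    pts = [(0.0, 0.0, sign * 1.0)]
    for theta in (0.1, 0.2, 0.3):
        for phi in np.linspace(0, 2 * np.pi, 12, endpoint=False):
            pts.append((np.sin(theta) * np.cos(phi),
                        np.sin(theta) * np.sin(phi),
                        sign * np.cos(theta)))
    return np.array(pts)

normal = np.array([0.0, 0.0, 1.0])   # by symmetry, the BFP of each cap is z = mean(z)
convex, concave = cap(+1), cap(-1)
c1_convex = c1_parameter(convex[0], convex, normal, -convex[:, 2].mean(), r=0.3)
c1_concave = c1_parameter(concave[0], concave, normal, -concave[:, 2].mean(), r=0.3)
```

Because the weights emphasize neighbours close to the sampling point, the cap seen from outside yields a positive value and the bowl a negative one, matching the sign convention stated in the text.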

Figure 3. C 1 parameter estimation. The C 1 value is a signed, weighted average distance of points to the BFP, where the weight is proportional to a Gaussian function of the point distance s i and the neighbourhood radius r.

C 2

Parameter C 2 is defined as below [26].

For each sampling point p p , a best fit plane (BFP N ) is fitted to the tips of the normal vectors of all its neighbours (such as n i ), and the average distance between BFP N and the tips of the normal vectors is calculated (Figure 4c,f). As can be seen, for cylindrical surfaces (Figure 4d–f) the C 2 value is equal to zero, while for spherical surfaces (Figure 4a–c) the value of C 2 is positive. A negative value of C 2 denotes a saddle-like surface. To detect it, for each neighbour point p i the inner product of the normal n i and vector v i is also calculated. Vector v i connects the orthogonal projections of the sampling point p p and the neighbour point p i onto the BFP (Figure 5). The sign of each inner product v i · n i indicates whether the surface surrounding the neighbour point p i is concave or convex. Noise can cause local fluctuations of the direction of normal vectors and change the sign of particular inner products. Taking this into consideration, the presence of a saddle-like surface is indicated when the number of inner products with opposite signs exceeds 35%. The C 2 parameter corresponds qualitatively to the Gaussian curvature, which is the product of the principal surface curvatures.
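A possible reading of the C 2 definition is sketched below. Two points are interpretations rather than statements from the article: the magnitude is taken as the mean absolute distance of the normal tips to their fitted plane, and "opposite signs exceed 35%" is read as the minority sign among the inner products exceeding a 35% share. The name `c2_parameter` and the synthetic surfaces are hypothetical.

```python
import numpy as np

def c2_parameter(point, neighbours, normals, bfp_normal, saddle_share=0.35):
    """Mean distance of normal tips to their fitted plane, negated for saddles."""
    point = np.asarray(point, dtype=float)
    nb = np.asarray(neighbours, dtype=float)
    tips = np.asarray(normals, dtype=float)
    bfp_normal = np.asarray(bfp_normal, dtype=float)
    # Plane fitted to the tips of the unit normals (BFP_N) and mean distance to it.
    centred = tips - tips.mean(axis=0)
    _, _, vt = np.linalg.svd(centred)
    magnitude = float(np.mean(np.abs(centred @ vt[-1])))
    # Convexity votes: sign of v_i . n_i, with v_i the in-plane offset to p_i.
    rel = nb - point
    v = rel - np.outer(rel @ bfp_normal, bfp_normal)
    dots = np.sum(v * tips, axis=1)
    pos, neg = np.sum(dots > 1e-12), np.sum(dots < -1e-12)
    minority = min(pos, neg) / max(pos + neg, 1)
    return -magnitude if minority > saddle_share else magnitude

phi = np.linspace(0, 2 * np.pi, 12, endpoint=False)
z_axis = np.array([0.0, 0.0, 1.0])

# Sphere cap: on a unit sphere the normal equals the position vector.
sphere = np.array([[0, 0, 1.0]] + [[np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)]
                                   for t in (0.1, 0.2) for p in phi])
c2_sphere = c2_parameter(sphere[0], sphere, sphere, z_axis)

# Cylinder with its axis along y: the normals have no y component.
cyl = np.array([[np.sin(a), y, np.cos(a)]
                for a in (-0.2, -0.1, 0.0, 0.1, 0.2) for y in (-0.2, 0.0, 0.2)])
c2_cylinder = c2_parameter([0, 0, 1.0], cyl, cyl * np.array([1.0, 0.0, 1.0]), z_axis)

# Saddle z = x^2 - y^2 with upward normals proportional to (-2x, 2y, 1).
xy = np.array([[0.0, 0.0]] + [[rho * np.cos(p), rho * np.sin(p)]
                              for rho in (0.1, 0.2) for p in phi])
saddle = np.c_[xy, xy[:, 0]**2 - xy[:, 1]**2]
grad = np.c_[-2 * xy[:, 0], 2 * xy[:, 1], np.ones(len(xy))]
saddle_normals = grad / np.linalg.norm(grad, axis=1, keepdims=True)
c2_saddle = c2_parameter(saddle[0], saddle, saddle_normals, z_axis)
```

The three synthetic patches reproduce the qualitative behaviour described in the text: a positive value for the sphere, a value of zero (up to rounding) for the cylinder, and a negative value for the saddle.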

Figure 4. C 2 parameter calculation. (a) Normal vectors on a sphere; (b) normal vectors from a sphere translated into the origin of the coordinate system; (c) normal vectors from a sphere and their BFP N ; (d) normal vectors on a cylinder; (e) normal vectors from a cylinder translated into the origin of the coordinate system; (f) normal vectors from a cylinder and their BFP N [26].

Figure 5. Estimation of surface convexity. A positive inner product denotes a convex surface; otherwise the surface is concave.

Local surface type

The local surface type is estimated using parameters C 1 and C 2 (Table 1). Positive values of both parameters indicate a convex ellipsoidal surface. A negative value of C 2 indicates that one of the principal curvatures is negative, which means that the surface is saddle-like. When the C 2 value is zero, the surface is planar if C 1 is equal to zero, and cylindrical otherwise. The sign of the C 1 parameter then indicates cylinder convexity (or ellipsoid convexity for a positive C 2 value). The local surface type corresponds to the shape index introduced by Koenderink and van Doorn [11] and utilized in [6, 12, 13], which is basically a mapping of the principal curvatures into a polar system, where the angle denotes the surface type and the distance from the origin denotes the magnitude of the curvatures. The shape index s is given by the equation

$$ s = \frac{2}{\pi} \arctan \frac{\kappa_2 + \kappa_1}{\kappa_2 - \kappa_1}, $$

where κ 1 and κ 2 are the principal curvatures. As can be seen in Table 1, the local surface type parameter can take eight different values, which are utilized in further calculations.
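The sign rules spelled out in the prose can be written as a small decision function. This sketch covers only the cases described in the text and lumps all saddle-like surfaces into one label, so it returns fewer than the eight values of Table 1; the tolerance `eps` and the label strings are assumptions of this example.

```python
def local_surface_type(c1, c2, eps=1e-6):
    """Classify a point from the signs of C1 and C2 (values within eps count as zero)."""
    if c2 < -eps:
        return "saddle"                      # one principal curvature is negative
    if c2 <= eps:                            # C2 is (numerically) zero
        if abs(c1) <= eps:
            return "plane"
        return "convex cylinder" if c1 > 0 else "concave cylinder"
    # C2 is positive: ellipsoidal surface, convexity given by the sign of C1
    return "convex ellipsoid" if c1 > 0 else "concave ellipsoid"

labels = [local_surface_type(c1, c2)
          for c1, c2 in [(0.2, 0.1), (-0.2, 0.1), (0.2, 0.0), (0.0, 0.0), (0.1, -0.3)]]
```

With noisy data the choice of `eps` decides how readily a point is called planar or cylindrical, so in practice it would have to be tuned to the scale of C 1 and C 2.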

Table 1 Local surface type estimation using H and K (mean and Gaussian curvature, respectively) or using C 1 and C 2 parameters, equivalently

Training phase

In this off-line phase, a description of the reference object is created. Based on the PP results the local feature vectors (FVs) are computed, and then collected to form a global descriptor for each reference object (Figure 6). Further, reduction of the feature space dimensionality using principal component analysis (PCA) is performed [29].

Figure 6. Reference object descriptor generation flow.

Descriptor of a reference object consists of a set of local FVs, each of which is calculated for a sampling point p n . Since the local FV is calculated for every n th point of the data, the number of FVs depends on sampling resolution of the reference model. The descriptor also contains the radius R of the smallest bounding sphere for the model. The radius R is utilized in recognition phase to specify neighbourhood for local FV calculations.

Each local FV contains two histograms. The first one is the 2D distribution of C 1 versus C 2 (Figure 7):

$$ H_1 : D_1(i, j), \quad (i, j) \in \{1, 2, \ldots, d_{C_1} = 10\} \times \{1, 2, \ldots, d_{C_2} = 10\}, $$

where d C 1 is the number of intervals for the C 1 distribution and d C 2 is the number of intervals for the C 2 distribution. The other one is the local surface type distribution:

$$ H_2 : D_2(i), \quad i \in \{1, 2, \ldots, d_S = 8\}, $$

where d S is the number of intervals. The dimensionality of the feature space FS is then N = d C 1 · d C 2 + d S = 108. Each local FV also contains the minimal and maximal values of parameters C 1 and C 2 and the coordinates of the sampling point p n for which the vector is calculated. During this process all edge points are ignored, as the information obtained for these areas may be misleading. For each local FV only the points located closer than 60% of radius R to the sampling point are considered. This implies that only approximately 30% of the reference model surface is considered for a single FV, so partial occlusion or clutter affects only a few FVs. As a result, the algorithm is more resistant to these adverse conditions.
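A minimal sketch of how such a 108-element vector could be assembled is given below. The division of the counts by the number of points is an assumption added so that neighbourhoods of different size stay comparable; the article does not state how the histograms are normalized. The function name and the random input data are hypothetical.

```python
import numpy as np

def local_feature_vector(c1, c2, surface_type, c1_range, c2_range, bins=10, n_types=8):
    """Concatenate the 10 x 10 C1-vs-C2 histogram with the 8-bin surface type histogram."""
    h1, _, _ = np.histogram2d(c1, c2, bins=bins, range=[c1_range, c2_range])
    h2 = np.bincount(surface_type, minlength=n_types)[:n_types]
    fv = np.concatenate([h1.ravel(), h2]).astype(float)
    return fv / max(len(c1), 1)          # normalization is an assumption of this sketch

# Stand-in values for the non-border points within 0.6 R of a sampling point.
rng = np.random.default_rng(0)
n = 500
c1 = rng.normal(size=n)
c2 = rng.normal(size=n)
types = rng.integers(0, 8, size=n)       # surface type codes 0..7
fv = local_feature_vector(c1, c2, types, (c1.min(), c1.max()), (c2.min(), c2.max()))
```

The resulting vector has 10 · 10 + 8 = 108 entries, and each of the two histograms sums to one after normalization.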

Figure 7. 2D distribution of parameters C 1 versus C 2. This distribution, together with the local surface type distribution, forms the FV.

The last step of the training phase is PCA, which reduces dimensionality of the created feature space. Feature space is reduced from initial N = 108 dimensions into significantly smaller space, usually less than 10D. Condition taken into consideration during PCA is to preserve 99% of initial energy. It guarantees that only the most meaningful linear combinations of initial features will be considered in the recognition phase.
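The 99% energy criterion can be illustrated with a PCA computed from a singular value decomposition. Here "energy" is interpreted as the share of total variance, which is an assumption of this sketch, and the names `fit_pca` and `project` as well as the synthetic data are hypothetical.

```python
import numpy as np

def fit_pca(fvs, energy=0.99):
    """Return the mean and the leading principal axes that keep `energy` of the variance."""
    x = np.asarray(fvs, dtype=float)
    mean = x.mean(axis=0)
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    var = s**2
    cum = np.cumsum(var) / var.sum()
    k = int(np.searchsorted(cum, energy) + 1)   # smallest k with cum[k-1] >= energy
    return mean, vt[:k]

def project(fv, mean, components):
    """Project a feature vector onto the reduced feature space."""
    return (np.asarray(fv, dtype=float) - mean) @ components.T

# Synthetic 108-D feature vectors that really live in a 3-D subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 108))
fvs = latent @ mixing
mean, components = fit_pca(fvs)
reduced = project(fvs[0], mean, components)
```

Because the synthetic data have only three degrees of freedom, at most three components are kept, which mirrors the reduction from 108 dimensions to fewer than ten reported in the text.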

Recognition phase

In this on-line phase, scene-model matching is performed. Based on the PP results, local FV is created. Then the local FV is correlated with reference object descriptors. The thresholding step leads to the final decision about object presence.

Local model-scene correlation

The step following the PP stage is introduced to compute the local FV for each sampling point. Similarly to the training phase, every n th point is sampled. The size of neighbourhood (denoted by the radius R) utilized for the calculation of local FVs depends on the size of the reference object that is looked for (see 1.2 paragraph for details). Each of the scene’s local FVs is projected onto feature space FS and then correlated with all FVs of the reference object. Correlation degree is denoted as a local probability P of object existence, and is calculated as follows:

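$$ P = e^{-\frac{\lVert \mathbf{d} - \mathbf{d}_{ref} \rVert^2}{\lVert \mathbf{d}_{ref} \rVert^2}} = e^{-\frac{\sum_{o=1}^{N} (d_o - d_{ro})^2}{\sum_{o=1}^{N} d_{ro}^2}}, $$

where d = [d 1, …, d N ] is the projection of the local FV n onto the feature space FS, and d ref = [d r1, …, d rN ] is the projection of one of the reference model’s local FV kl onto the feature space FS.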
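The local probability can be computed directly from the two projected vectors. The sketch reads the formula as P = exp(−‖d − d_ref‖² / ‖d_ref‖²); the minus sign in the exponent is an assumption, chosen because it makes P equal to 1 for a perfect match and smaller otherwise. The function name is hypothetical.

```python
import numpy as np

def local_probability(d, d_ref):
    """P = exp(-||d - d_ref||^2 / ||d_ref||^2) for two projected feature vectors."""
    d = np.asarray(d, dtype=float)
    d_ref = np.asarray(d_ref, dtype=float)
    return float(np.exp(-np.sum((d - d_ref) ** 2) / np.sum(d_ref ** 2)))

p_same = local_probability([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # identical vectors
p_far = local_probability([0.0, 0.0], [1.0, 1.0])              # exp(-2 / 2) = exp(-1)
```

Dividing by the squared norm of the reference vector makes the measure relative, so the same absolute difference counts less for a reference FV with larger components.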

As a result a set of probability distributions P ikl is obtained, where i is point index, k is the index of the reference object and l is the index of one of the reference model’s local FV kl .


At this stage, final decision about the presence of one or multiple objects is made. Sensitivity of the algorithm is modified using three parameters:

T p – probability threshold; all points with probability P ikl below T p are ignored,

D p – distance difference threshold; a pair of scene sampling points is accepted only if the difference between distance |P i P j | and |P kl P km | is smaller than the threshold value, where |P i P j | – distance between neighbour points p i and p j , with probabilities P ikl and P jkm , respectively, |P kl P km | – distance between reference object sampling points p kl and p km , for which local FV kl and FV km were calculated.

N p – number of accepted points threshold; for N p = 3 similar triangles, and for N p = 4 similar quadrangles, are searched.

Thresholding algorithm (Figure 8) is shown below.

  1. A point p 1 with probability P 1kl higher than T p is searched for. If it is found, k (object index) and l (FV index) are remembered (Figure 8a).

  2. A point p 2 with probability P 2km higher than T p is searched for. The additional condition to satisfy is

    $$ \bigl|\, |p_1 p_2| - |p_{kl} p_{km}| \,\bigr| < D_p \, |p_{kl} p_{km}| \quad \text{for } l \neq m, $$

    which corresponds to the fact that a line segment |P i P j | similar to the reference model segment |P kl P km | is found in the scene (Figure 8b).

  3. If N p = 3, then the last, third point p 3 is searched for. That point has to satisfy condition (6) against both points p 1 and p 2 (Figure 8c). To guarantee similarity between the detected and the reference triangle, the direction of the vector product in relation to the average direction of the normal vectors is checked. If this succeeds, a triangle similar to a triangle from the reference model has been found in the scene, and the average probability is calculated. If N p = 4, a fourth point is searched for to find a similar quadrangle. All probabilities are sorted in descending order, so even if the search does not succeed, performance of the algorithm remains high.
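The search for a similar triangle (N p = 3) can be sketched as follows for a single reference object. This is a simplified illustration: it tests all triples by brute force, it omits the orientation check based on the vector product and the normal vectors, and the function names, the candidate tuple layout and the example data are hypothetical.

```python
import numpy as np
from itertools import combinations

def segments_match(p_a, p_b, q_a, q_b, d_p):
    """Relative length test: | |p_a p_b| - |q_a q_b| | < d_p * |q_a q_b|."""
    scene_len = np.linalg.norm(p_a - p_b)
    ref_len = np.linalg.norm(q_a - q_b)
    return abs(scene_len - ref_len) < d_p * ref_len

def find_triangle(candidates, scene_pts, ref_pts, t_p=0.9, d_p=0.04):
    """candidates: (scene_idx, ref_fv_idx, probability) triples for one reference object."""
    kept = sorted((c for c in candidates if c[2] > t_p), key=lambda c: -c[2])
    for trio in combinations(kept, 3):
        # All three reference FVs and all three scene points must be distinct.
        if len({c[1] for c in trio}) < 3 or len({c[0] for c in trio}) < 3:
            continue
        if all(segments_match(scene_pts[a[0]], scene_pts[b[0]],
                              ref_pts[a[1]], ref_pts[b[1]], d_p)
               for a, b in combinations(trio, 2)):
            return [c[0] for c in trio], float(np.mean([c[2] for c in trio]))
    return None

ref_pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
scene_pts = np.vstack([ref_pts + [5.0, 5.0, 0.0], [[9.0, 9.0, 9.0]]])  # shifted copy + outlier
candidates = [(0, 0, 0.95), (1, 1, 0.93), (2, 2, 0.97), (3, 1, 0.99), (1, 2, 0.50)]
match = find_triangle(candidates, scene_pts, ref_pts)
```

In this example the outlier has the highest probability but fails the distance test, so the triangle formed by scene points 0, 1 and 2 is returned together with the mean of their probabilities.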

Figure 8. Thresholding algorithm. (a) The first point with probability greater than the threshold is found (the one with the highest probability, as all points are sorted by probability in descending order); (b) a second point with sufficiently high probability, satisfying the additional distance requirement, is found; (c) a similar triangle is found when there exists a third point which satisfies both distance requirements and guarantees that the permutation of triangle vertices is maintained. If needed (N p = 4), a fourth point is searched for to obtain a similar quadrangle.

At the end of the recognition phase all small, adjacent groups of selected points are converted into consistent objects.


Assessment of the proposed method was performed using synthetic and real input data. Synthetic data have mostly been used to evaluate the influence of data quality on detection results. Real data were captured using the OGX|3DMADMAC scanner [30, 31].

For the purpose of algorithm evaluation, an object database was created (red objects in Figure 9). Each reference object is represented as a set of directional point clouds acquired from various points of view. The direction of data acquisition changes horizontally, and the angle between any two adjacent scans is approximately 45°. All objects apart from two human figures and a dog, located at the bottom right of Figure 9, were scanned with the OGX|3DMADMAC scanner. The human figures and the dog were imported from the Princeton Shape Benchmark [32] (further called human 1 PSB, human 2 PSB and dog PSB), re-sampled and converted into directional point clouds to maintain consistency within the input data. Conversion from a full 3D model usually imitates properly the point surface acquired during the scanning process. If a 3D model consists of a set of volumes (e.g., a computer-designed model built as a set of sub-volumes), the created surface may resemble one of poor quality, but it still allows the calculation to be performed. All presented results (Figures 10, 11, 12) use a colour map in which red denotes probability close to 100%, green denotes 50% and blue denotes 0%. Grey denotes probability below the threshold.

Figure 9. Reference objects database. Objects marked in black (plasticine figurines) are not present in the created library.

Figure 10. Recognition results in scenes scanned with OGX|3DMADMAC system.

Figure 11. Recognition results in scenes scanned with OGX|3DMADMAC system.

Figure 12. Recognition results in synthetic scenes.

In Figure 10a, one of the test scenes with objects from five different classes is presented. Consecutive figures present detection results (after the thresholding procedure, where colour of the object depends on the detection probability) of the following objects: (b) ball, (c) plasticine figures, (d) elephant, (e) dinosaur and (f) dog. All objects apart from the plasticine figures are detected with probability of nearly 100%. Plasticine figures are detected with lower, 80–90% probabilities. It is important to notice that only one of these figures was scanned as a reference object, and that all the figures differ from each other in shape, as they were hand-made.

Figure 11 shows one of the worst case detection results for (a) torus, (b) ball, (c) plasticine figures, (d) elephant, (e) dinosaur and (f) dog. Figure 11a,c,d shows that in some cases local descriptors may not be descriptive enough to distinguish between similar objects. On the other hand, such a property reduces algorithm sensitivity to clutter and occlusion.

In Figure 12, a synthetic scene is presented. All the complex objects, such as (a) the Porsche and aircraft and (b–d) the Mercedes models, have been converted from full 3D models (composed of multiple solids), so the obtained surfaces may locally be confusing for the algorithm. In Figure 12c, detection of the dog PSB is presented. Figure 12a,b,d presents detection results for human 1 PSB. Figure 12b,d shows the influence of the distance difference threshold D p on detection results for probability threshold T p = 90%. For Figure 12b, with D p = 0.04, only the standing pose is detected. Increasing the parameter value to D p = 0.1 results in detection of both human figures and in one false positive detection (Figure 12d). It can be seen that the presence of obstacles in the neighbourhood does not strongly affect detection, since local descriptors are utilized. On the other hand, it is noticeable that, with a constant value of the D p threshold, the bigger the maximum distance between reference points (which results from the value of radius R of the analysed neighbourhood; see 1.2 for details), the bigger the tolerance for the real difference between the reference point distance and the candidate point distance. For better performance, the value of the D p threshold should vary as a function of distance, to avoid a linear increase of the distance difference tolerance with increasing distance between reference points.

Figure 13 presents the relation between recognition rate and occlusion for all the analysed objects. For most of the objects a recognition rate above 80% is reached for 70% occlusion, since the detection process utilizes a combination of a small number of local FVs. At the same time, spatial information (analysis of the distribution of at least three descriptors for each reference object) keeps the false positive ratio at a reasonably low level. The achieved results show that the proposed algorithm is resistant to occlusion. Algorithms based on histogram comparison [5] seem to be more sensitive to clutter and occlusion, as incompleteness of data may change the comparison result significantly. Algorithms based on keypoints [4] with further ICP-type surface registration perform better, but they may be less effective for objects that are similar in general but differ from each other over the whole surface (e.g., plasticine figures), which is the most common case in a real environment.

Figure 13. Recognition rate versus occlusion.

Robustness to clutter results from the fact that varying number of points representing other objects/background does not affect the locally calculated FVs, nor the detection results. Nevertheless, the impact of spatial distribution of clutter points is not easy to estimate.

Another important factor to consider is noise. The presence of noise was simulated by adding randomly generated noise of increasing amplitude to each object (synthetic or real). To maintain consistency across all reference objects, the amplitude value depends on the object’s diameter. The values of all detection parameters are the same as in the previous assessments. The influence of noise on recognition efficiency is presented in Figure 14. Most of the objects are recognized with efficiency above 75% for noise amplitude reaching 5% of the object diameter. The presence of noise seems to be crucial for the descriptiveness of the histograms presented in [5], as they require complete data of good quality for proper representation of the object. The point fingerprint approach [3] also appears to be sensitive to noise, as it utilizes geodesic curves, whose length and position may be strongly disrupted by noisy points. For the ICP-type algorithms [4], the presence of noise may slightly increase the RMS error value (quality of surface registration), but should not strongly affect the recognition rate.

Figure 14. Recognition rate versus noise.

The worst case presented in Figure 14 is recognition of the dog PSB. The reason seems to be that even small-amplitude noise added to, e.g., the thin legs of the dog rapidly changes their shape (two legs may be joined into a single one). The second worst case is the cylinder. In this case, the poor recognition result is caused by the high ratio between the length and the radius of the cylinder. A large length implies a high amplitude of random noise, which, compared with the small radius, significantly modifies the shape of the object. The rest of the objects are detected efficiently, and increasing noise amplitude decreases the recognition accuracy at an acceptable rate. The use of BFPs instead of spatial derivatives in the process of local surface type estimation improves the algorithm’s resistance to noise and discontinuities within the input data.


In this article, an efficient and noise-robust algorithm has been presented. The comparisons performed in the recognition process are based on an approach that avoids noise-sensitive calculations, to make the algorithm suitable for low-quality data. The assumption was to exploit a minimum number of parameters that completely describe the local geometry of the surface. The main step of the algorithm is to consider a set of spatially distributed surface parts as a representation of the reference object stored in the database. Such an approach should allow detection of even strongly occluded objects, where detection is a result of finding a similar structure during the scene-model correlation process. Thanks to that, there is no significant difference for the algorithm between occlusion and clutter, as long as the noise is not similar in shape to the missing parts of the detected objects.

The main drawback is, in some cases, the limited descriptiveness of the proposed descriptor, which may result in a high false positive ratio. This ratio may be reduced by improving the geometrical representation of the reference object, and by utilizing quadrangles instead of triangles in the thresholding process.

The results show that the proposed algorithm can be used with incomplete and noisy data. The impact of clutter on the recognition rate is also minimized by the use of local FVs.

In the future, a sampling algorithm similar to Farthest Point Sampling [33] will be implemented to ensure uniform data sampling. Modifying the spatial representation of the reference objects may also improve descriptiveness and thereby reduce the false positive recognition ratio. The most promising option seems to be a two-level hierarchical model representation, which will be suitable for highly detailed, point-dense objects. At present, the thresholding process finds a triangle (or quadrangle) in the scene whose vertices represent surface parts similar to the surface of the reference object. Adding a top hierarchy level, which results in detection of a “triangle of triangles” of similar surface parts, will increase the accuracy of detection, especially for large models (e.g., large animals) that contain highly detailed elements (e.g., head, legs, tail) which may be significant for distinguishing one object from another (e.g., a horse from a rhinoceros). Another advantage of the proposed modification is that it represents each object as a spatial distribution of basic shapes, which is much more intuitive and easier for humans to analyse than, e.g., a set of keypoints [4], spin-images [16] or histograms [5], and can be further exploited in a more abstract way (e.g., in cognitive systems). Calculation time should not increase significantly, because the number of the most time-consuming calculations (scene-model similarity comparisons) remains the same and only the thresholding process changes.
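As a rough illustration of the sampling step mentioned above, the following Python sketch implements the standard greedy form of Farthest Point Sampling. It is not the authors’ implementation: the function name `farthest_point_sampling`, the `start` parameter, and the toy point set are assumptions made only for this example.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy Farthest Point Sampling (illustrative sketch).

    points: sequence of equal-length coordinate tuples, e.g. (x, y, z)
    k:      number of samples to return
    start:  index of the first sample (hypothetical parameter; any point works)

    Returns the indices of the k selected points in selection order.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]

    while len(selected) < k:
        # Pick the point that is farthest from everything selected so far.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        # Update nearest-selected distances with the newly added sample.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy cloud: a dense cluster near the origin plus two isolated points.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
         (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
print(farthest_point_sampling(cloud, 3))  # [0, 3, 4]
```

In this toy case the two isolated points are chosen before any second point from the dense cluster, which is the behaviour that makes the resulting sample close to spatially uniform. The sketch costs O(N·k) distance evaluations; a production version for large scans would likely need a spatial index or vectorised distances.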

One of the advantages is that the same algorithm can be used to add reference objects to an existing database, which allows such a database to be extended whenever necessary. The presented method may efficiently assist the operators of automated maintenance monitoring systems.


  1. Stein F, Medioni G: Structural indexing: efficient 3-D object recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14(2):125-145. 10.1109/34.121785

  2. Chua CS, Jarvis R: Point signatures: a new representation for 3D object recognition. Int. J. Comput. Vis. 1997, 25(1):63-85. 10.1023/A:1007981719186

  3. Sun Y, Paik J, Koschan A, Page DL, Abidi MA: Point fingerprint: a new 3-D object representation scheme. IEEE Trans. Syst. Man Cybern. 2003, 33(4):712-717. 10.1109/TSMCB.2003.814295

  4. Mian A, Bennamoun M, Owens R: On the repeatability and quality of keypoints for local feature-based 3D object retrieval from cluttered scenes. Int. J. Comput. Vis. 2009, 89: 348-361.

  5. Wahl E, Hillenbrand U, Hirzinger G: Surflet-pair-relation histograms: a statistical 3D-shape representation for rapid classification. Proc. Fourth International Conference on 3-D Digital Imaging and Modeling 2003, 474-481.

  6. Zaharia T, Preteux F: Hough transform-based 3D mesh retrieval. Proc. SPIE 2001, 4476: 175-185. 10.1117/12.447283

  7. Germann M, Breitenstein MD, Park IK, Pfister H: Automatic pose estimation for range images on the GPU. 2005.

  8. Rusu RB, Holzbach A, Beetz M, Bradski G: Detecting and segmenting objects for mobile manipulation. Proc. IEEE 12th International Conference on Computer Vision Workshops 2009, 47-54.

  9. Klasing K: A clustering method for efficient segmentation of 3D laser data. Proc. IEEE International Conference on Robotics and Automation 2008, 4043-4048.

  10. Rabbani T, van den Heuvel FA, Vosselman G: Segmentation of point clouds using smoothness constraint. ISPRS 2006, XXXVI(5):248-253.

  11. Koenderink JJ, van Doorn AJ: Surface shape and curvature scales. Image Vis. Comput. 1992, 10(8):557-564. 10.1016/0262-8856(92)90076-F

  12. Chen H, Bhanu B: 3D free-form object recognition in range images using local surface patches. Pattern Recognit. Lett. 2007, 28(10):1252-1262. 10.1016/j.patrec.2007.02.009

  13. Leibe B, Hetzel G, Levi P, Schiele B: 3D object recognition from range images using local feature histograms. Proc. CVPR(II) 2001, 394-399.

  14. Mian AS, Bennamoun M, Owens RA: 3D recognition and segmentation of objects in cluttered scenes. Proc. ACV 2005, 8-13.

  15. Johnson AE, Hebert M: Surface matching for object recognition in complex 3-D scenes. Image Vis. Comput. 1998, 16: 635-651. 10.1016/S0262-8856(98)00074-2

  16. Johnson AE, Hebert M: Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Trans. Pattern Anal. Mach. Intell. 1999, 21(5):433-449. 10.1109/34.765655

  17. Ovsjanikov M, Huang Q-X, Guibas L: A condition number for non-rigid shape matching. Computer Graphics Forum 2011, 30(5):1503-1512. 10.1111/j.1467-8659.2011.02024.x

  18. Bronstein AM, Bronstein MM, Carmon Y, Kimmel R: Partial Similarity of Shapes Using a Statistical Significance Measure. Department of Computer Science, Technion Israel Institute of Technology, Haifa; 2009.

  19. Mahmoudi M, Sapiro G: Three-dimensional point cloud recognition via distributions of geometric distances. Graphical Models 2009, 71(1):22-31. 10.1016/j.gmod.2008.10.002

  20. Goldfeather J, Interrante V: A novel cubic-order algorithm for approximating principal direction vectors. ACM Trans. Graph. 2004, 23: 45-63. 10.1145/966131.966134

  21. Mustafa A, Shapiro L, Ganter M: 3D object identification with color and curvature signatures. Pattern Recognit. 1999, 32: 339-355. 10.1016/S0031-3203(98)00075-2

  22. Colombo A, Cusano C, Schettini R: 3D face detection using curvature analysis. Pattern Recognit. 2006, 39: 444-455. 10.1016/j.patcog.2005.09.009

  23. Szeptycki P, Ardabilian M, Chen L: A coarse-to-fine curvature analysis-based rotation invariant 3D face landmarking. Proc. International Conference of Biometrics: Theory, Applications and Systems 2009, 1-6.

  24. Crosilla F, Visintini D, Sepic F: Reliable automatic classification and segmentation of laser point clouds by statistical analysis of surface curvature values. Appl. Geomat. 2009, 1(1–2):17-30.

  25. Li C, Shaonyi D, Nanning Z: A fast multi-resolution iterative closest point algorithm. Proc. CCPR 2010, 1-5.

  26. Witkowski M, Sitnik R: Locating and tracing of anatomical landmarks based on full-field four-dimensional measurement of human body surface. J. Biomed. Opt. 2008, 13(4):044039. 10.1117/1.2960017

  27. Björck A: Numerical Methods for Least Squares Problems. SIAM, Philadelphia; 1997.

  28. Kalogerakis E, Nowrouzezahrai D, Simari P, Singh K: Extracting lines of curvature from noisy point clouds. Elsevier J. Comput.-Aided Design 2009, 41(Special Issue on Point-Based Computational Techniques):282-292.

  29. Jolliffe IT: Principal Component Analysis. 2nd edition. Springer, New York; 2002.

  30. Sitnik R, Kujawińska M, Woźnicki J: Digital fringe projection system for large-volume 360-deg shape measurement. Opt. Eng. 2002, 41(2):443-449.

  31. OGXOptographx.

  32. Princeton Shape Benchmark.

  33. Bronstein AM, Bronstein MM, Kimmel R: Numerical Geometry of Non-Rigid Shapes. Springer Verlag, New York; 2009.

Author information

Corresponding author

Correspondence to Jerzy Bielicki.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Bielicki, J., Sitnik, R. A method of 3D object recognition and localization in a cloud of points. EURASIP J. Adv. Signal Process. 2013, 29 (2013).
