
Robust surface registration using N-points approximate congruent sets

Abstract

Scans acquired by 3-D sensors are typically represented in a local coordinate system. When multiple scans taken from different locations represent the same scene, they must be registered to a common reference frame. We propose a fast and robust registration approach that automatically aligns two scans by finding two sets of N-points that are approximately congruent under rigid transformation and lead to a good estimate of the transformation between the corresponding point clouds. Given two scans, our algorithm randomly searches for the best sets of congruent groups of points using a RANSAC-based approach. To successfully and reliably align two scans with only a small overlap, we improve the basic RANSAC random selection step by employing a weight function that approximates the probability of each pair of points in one scan matching a pair in the other. The search time to find pairs of congruent sets of N-points is greatly reduced by employing a fast search codebook based on both binary and multi-dimensional lookup tables. Moreover, we introduce a novel indicator of the overlapping region quality, which is used to verify the estimated rigid transformation and to improve the alignment robustness. Our framework is general enough to incorporate and efficiently combine different point descriptors derived from geometric and texture-based feature points or scene geometrical characteristics. We also present a method to improve the matching effectiveness of texture feature descriptors by extracting them from an atlas of rectified images recovered from the scan reflectance image. Our algorithm is robust with respect to different sampling densities and resilient to noise and outliers. We demonstrate its robustness and efficiency on several challenging scan datasets with varying degrees of noise, outliers, and extent of overlap, acquired in indoor and outdoor scenarios.

1 Introduction

In the past decade, there has been growing interest in 3-D reconstruction and realistic 3-D modelling of large-scale scenes such as urban structures. Applications of such models include virtual reality, cultural heritage, urban planning, and architecture. Commonly, these applications require a combination of laser sensing technology with traditional digital photography.

Applications that employ only digital images extract 3-D information using either a single moving camera or a multi-camera system, such as a stereo rig. In both cases, the system extracts and matches distinctive features (typically points) among the available images and estimates both their 3-D positions and the camera parameters [1, 2]. It is then possible to exploit the result of this first step to perform a dense point reconstruction by estimating a depth map for each image [3, 4]. On one hand, these approaches are useful for applications requiring a robust and low-cost acquisition system. On the other hand, laser sensing technology yields much higher precision and resolution, which makes it an effective and powerful tool for achieving accurate geometric representations of complex surfaces of real scenes.

In recent years, 3-D laser scanners able to provide satisfactory measurement accuracy for different applications have become commercially available. These sensors acquire a complex real scene through multiple scans taken from different positions, so as to fully describe the scene while reducing the number of occluded surfaces. It is therefore important to employ a systematic and automatic way to align, or register, multiple 3-D scans so that they can be represented and visualized in a common coordinate system. Geometrically, given a point cloud Q considered as reference and a second point cloud P, the registration problem consists of finding the rigid transformation T that optimally aligns P to Q in its coordinate system.

1.1 Related works

The iterative closest point (ICP) algorithm [5] is the de facto standard to compute the rigid transformation T between two point clouds. It is basically an optimization method that starts from an initial estimate of T and iteratively refines it by generating pairs of corresponding points on the scans and minimizing an error metric, e.g., the sum of squared distances between corresponding points. Although several variants of ICP were presented [6] to improve its efficiency, the main problem is to achieve a good initial estimate of T, since the ICP optimization can easily get stuck in local minima.
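To make the loop concrete, the following is a minimal point-to-point ICP sketch in C++ with Eigen. It uses a brute-force nearest-neighbour search and Eigen's umeyama() for the closed-form rigid update; production implementations (and the variants surveyed in [6]) add kd-trees, outlier rejection and other refinements. The function and type names are ours, not from [5].

```cpp
#include <Eigen/Dense>
#include <Eigen/Geometry>
#include <limits>

using Cloud = Eigen::Matrix3Xd;  // one 3-D point per column

// Point-to-point ICP: refines an initial rigid estimate T (4x4 homogeneous).
Eigen::Matrix4d icp(const Cloud& P, const Cloud& Q, Eigen::Matrix4d T,
                    int maxIter = 50, double tol = 1e-8) {
  double prevErr = std::numeric_limits<double>::max();
  for (int it = 0; it < maxIter; ++it) {
    // 1) Transform P with the current estimate.
    Eigen::Matrix3d R = T.topLeftCorner<3, 3>();
    Eigen::Vector3d t = T.topRightCorner<3, 1>();
    Cloud Pt = (R * P).colwise() + t;
    // 2) Pair each transformed point with its closest point in Q (brute force).
    Cloud M(3, Pt.cols());
    double err = 0.0;
    for (int i = 0; i < Pt.cols(); ++i) {
      Eigen::Index j;
      (Q.colwise() - Pt.col(i)).colwise().squaredNorm().minCoeff(&j);
      M.col(i) = Q.col(j);
      err += (Q.col(j) - Pt.col(i)).squaredNorm();
    }
    err /= double(Pt.cols());
    // 3) Closed-form rigid update minimising the sum of squared distances.
    T = Eigen::umeyama(Pt, M, /*with_scaling=*/false) * T;
    if (prevErr - err < tol) break;  // converged, possibly to a local minimum
    prevErr = err;
  }
  return T;
}
```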

The problem of automatically registering two scans has been addressed with a wide variety of methods [7]. Most of these extract sets of feature points, which are automatically matched to recover a good approximation of T. Aiger et al. [8] proposed to automatically match congruent sets of four roughly coplanar points to solve the largest common point (LCP) set problem. Congruent sets of points have similar shapes defined in terms of point distances and normal deviations. The best match between congruent sets is randomly found by following the RANdom SAmple Consensus (RANSAC) approach [9]. Other approaches use shape descriptors to identify sets of candidate feature points to be matched. Gelfand et al. [10] use a 3-D integral invariant shape descriptor to detect feature points, which are matched in sets of three items using a branch-and-bound algorithm. Other interesting shape descriptors invariant with respect to rigid transformation are used to identify feature points, such as scale invariant feature transforms (SIFTs) [11, 12] or Harris corners [12] extracted from reflectance images, 3-D SIFT-like descriptors extracted from triangle meshes approximating the point clouds [13], wavelet features [14], intensity-based range features [15], spin images [16, 17], and extended Gaussian images [18].

Methods to automatically recover the rigid transformations from matching sets of higher-level features were also presented. The advantage of these approaches is the reduction of the search space to two small sets of features, which results in efficient matching, at the cost of extra computation time for scene segmentation or feature detection. Among the feature types presented, the most interesting are lines [19, 20], planes [19–22], circles [23], spheres [24] and other fitted geometric primitives [25].

Other studies proposed to formulate the registration as an energy optimization problem that does not need any explicit set of point correspondences. Silva et al. [26] proposed to use an enhanced genetic algorithm to solve the range image registration problem using a robust surface interpenetration measure. Boughorbel et al. [27] defined an energy minimization function based on Gaussian fields to solve the 3-D automatic registration.

The last relevant class of registration approaches is based on modelling the alignment of two point sets as an assignment problem, where the probability of a point in one set to have a correspondence in the other set is estimated and maximized with expectation maximization (EM) algorithms. Popular methods following this approach are known as SoftAssign [28] and EM-ICP [29], which are both based on entropy maximization principles but impose different constraints for problem optimization, i.e., a two-way constraint embedded into the deterministic annealing scheme for SoftAssign and a one-way constraint for EM-ICP. A detailed review and analysis of these methods was provided in [30], where Liu proposed a method to overcome SoftAssign and EM-ICP limitations based on modelling the registration problem as a Markov chain of thermodynamic systems and on an entropy model derived from the Lyapunov function. Furthermore, fast GPU implementations of the SoftAssign and EM-ICP algorithms were recently presented by Tamaki et al. [31].

1.2 Our algorithm

Our method utilizes 3-D points (possibly associated with point descriptors, as described in Section 6) to achieve automated registration. It automatically aligns two scans by finding two N-points approximate congruent sets leading to a good estimate of the transformation T between the corresponding point clouds. T is then further refined via the ICP algorithm.

Given two scans P and Q, our algorithm randomly searches for sets of congruent groups of points in P and Q. Corresponding groups are then used to estimate a rigid transformation T aligning P to Q. The optimal transformation is recovered following a RANSAC optimization [9], which iterates the following steps until a good solution to the problem is found or the number of iterations exceeds a predefined threshold $I_{max}$:

1) Random selection of an N-points base $\mathbf{p}$ in P.

2) Approximate congruent group selection of N-points bases in Q. The definition of approximate point set congruence is given in Section 2. This selection is achieved by using a general codebook to efficiently find approximately congruent point bases under rigid transformation, exploiting combinations of feature point descriptors when available (see Section 3).

3) Estimation of the transformation T between P and Q, given the randomly selected N-points base $\mathbf{p}$ in P and each extracted approximately congruent N-points base $\mathbf{q}$ in Q.

4) Verification of the transformation. T is verified using all possibly corresponding points after the alignment. The verification employs our proposed quality-based largest common pointset (QLCP) measure described in Section 4.4.

The best transformation is selected as the one yielding the best QLCP measure and then further refined with the ICP algorithm. As in [20], we present a variant of this RANSAC-based algorithm that improves the random selection step by employing a weight function approximating the probability of each pair of features in P being matched with one in Q. We call this variant the probability-based RANSAC approach and describe it in Section 4.1.
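The overall loop can be summarised by the following C++ skeleton. All helper functions and the Result struct are hypothetical placeholders for the components detailed in Sections 3 and 4; the skeleton only fixes the control flow, including the two termination conditions.

```cpp
#include <Eigen/Dense>
#include <vector>

using Cloud = Eigen::Matrix3Xd;
using Base = std::vector<int>;  // indices of the N points forming a base

// Hypothetical helpers standing in for Sections 3-4.
Base selectBase(const Cloud& P, int N);                        // step 1
std::vector<Base> findCongruentBases(const Cloud& P, const Base& p,
                                     const Cloud& Q);          // step 2
Eigen::Matrix4d estimateRigid(const Cloud& P, const Base& p,
                              const Cloud& Q, const Base& q);  // step 3
double qlcpScore(const Eigen::Matrix4d& T, const Cloud& P,
                 const Cloud& Q, double delta);                // step 4

struct Result { Eigen::Matrix4d T; double qlcp = -1.0; };

Result registerNPCS(const Cloud& P, const Cloud& Q, int N,
                    int Imax, int Inou, double delta) {
  Result best;
  int sinceImproved = 0;
  for (int it = 0; it < Imax && sinceImproved < Inou; ++it, ++sinceImproved) {
    Base p = selectBase(P, N);  // random or probability-based (Section 4.1)
    for (const Base& q : findCongruentBases(P, p, Q)) {
      Eigen::Matrix4d T = estimateRigid(P, p, Q, q);
      double score = qlcpScore(T, P, Q, delta);
      if (score > best.qlcp) { best = {T, score}; sinceImproved = 0; }
    }
  }
  return best;  // best.T is then refined with ICP
}
```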

Our algorithm is robust with respect to different sampling densities and the typical noise introduced by laser scanner acquisition. This is achieved by employing suitable point sampling approaches described in Section 5, and by using feature points and their descriptors to effectively constrain rigid transformations on noisy point sets.

Through our proposed matching framework presented in Sections 2 and 3, we efficiently match points and point pairs in a multi-dimensional space defined by the available geometric and texture feature descriptors and by geometrical constraints on a set of sampled points. The matching is performed by combining suitable metric functions to compare the provided descriptors.

Any type of feature with a suitable distance function can be easily integrated into our matching framework. The major benefit of this approach is that it enables efficient customization for specific applications, significantly improving the registration performance in terms of robustness, accuracy, and execution time.

We also present a method to improve the matching effectiveness of texture features extracted from the typical spherical reflectance images acquired by laser scanners. It consists of extracting features from atlases of rectified perspective images constructed by sampling the spherical field of view of the reflectance image at suitable angles. This approach mitigates the effect of spherical distortion on the resulting feature signatures, so that they can be matched with higher reliability.

The robustness, accuracy and efficiency of our method were evaluated on several challenging scan datasets acquired from indoor and outdoor scenes, as described in Section 7.

2 Approximate point set congruence

Given two point sets P and Q, we assume $\mathbf{p} = \{p_i \mid p_i \in P\}_{i=1}^{N}$ and $\mathbf{q} = \{q_i \mid q_i \in Q\}_{i=1}^{N}$ to be two corresponding N-points bases from P and Q, respectively. This means that for each point $p_i \in \mathbf{p}$ there exists one and only one corresponding point $q_i \in \mathbf{q}$. We consider the two sets to be congruent if they are approximately similar in shape and have a similar distribution in 3-D space. We define both a similarity score function and a binary similarity score function to measure the congruency of two matching N-points bases, as follows.

Given a point $p_i \in \mathbf{p}$, we characterize it using a set of L local descriptors, i.e., $\{f_l(p_i)\}_{l=1}^{L}$. Similarly, for each pair of points $(p_i, p_j)$ in $\mathbf{p}$, we define a set of K measurements $\{m_k(p_i, p_j)\}_{k=1}^{K}$. The L descriptors and the K measurements characterize an N-points base in terms of point features and point pair relations (see Figure 1). These values are then used to define the congruency of two different N-points bases $\mathbf{p}$ and $\mathbf{q}$.

Figure 1. An example of two congruent N-points bases (left from P and right from Q). For each point, we evaluate the set of L local features and, for each point pair, the set of K measurements. Features and measurements between the two sets are compared using the corresponding similarity differences.

For each type of local descriptor or point pair measurement, we define a similarity difference function d(·, ·) that is invariant under rigid transformation of each single N-points base. In particular, given two descriptors or measurements $v_p$ and $v_q$, $d(v_p, v_q)$ is a positive real value that states how different the two descriptors or measurements are.

We also define a set of boolean similarity measures

$$s(v_p, v_q) = b\big(d(v_p, v_q), t\big), \qquad (1)$$

where t is a threshold value associated with the particular feature descriptor or measurement and b(·, ·) is a boolean function defined as:

$$b(x, t) = \begin{cases} 1 & x \le t, \\ 0 & x > t. \end{cases} \qquad (2)$$

The set of functions {d(·, ·)} is then composed to define a similarity score $s_c(\mathbf{p}, \mathbf{q})$ between two congruent N-points bases as follows:

$$s_c(\mathbf{p}, \mathbf{q}) = s_c^f(\mathbf{p}, \mathbf{q}) + s_c^m(\mathbf{p}, \mathbf{q}), \qquad (3)$$

where $s_c^f$ is the term related to the local descriptors and $s_c^m$ is the term related to the similarity measures of point pairs. These two terms are defined as:

$$s_c^f(\mathbf{p}, \mathbf{q}) = \frac{1}{N_f} \sum_{i=1}^{N} \sum_{l=1}^{L} w_l^f \left( 1 - \frac{\min\big(d(f_l(p_i), f_l(q_i)),\, t_l^f\big)}{t_l^f} \right), \qquad (4)$$

$$s_c^m(\mathbf{p}, \mathbf{q}) = \frac{1}{N_m} \sum_{\substack{i,j=1 \\ i \ne j}}^{N} \sum_{k=1}^{K} w_k^m \left( 1 - \frac{\min\big(d(m_k(p_i, p_j), m_k(q_i, q_j)),\, t_k^m\big)}{t_k^m} \right), \qquad (5)$$

where $\{w_l^f\}$ and $\{w_k^m\}$ are user-defined weights, and $N_f = 2N \sum_{l=1}^{L} w_l^f$ and $N_m = 2N(N-1) \sum_{k=1}^{K} w_k^m$ are normalization factors. Notice that $s_c$ is defined such that its values fall in the range [0, 1], where a higher value represents a higher similarity between the two N-points bases.

Similarly, we define the binary similarity score $s(\mathbf{p}, \mathbf{q})$ between two N-points bases as:

$$s(\mathbf{p}, \mathbf{q}) = \prod_{i=1}^{N} \prod_{l=1}^{L} s_l\big(f_l(p_i), f_l(q_i)\big) \cdot \prod_{\substack{i,j=1 \\ i \ne j}}^{N} \prod_{k=1}^{K} s_k\big(m_k(p_i, p_j), m_k(q_i, q_j)\big), \qquad (6)$$

i.e., $s(\mathbf{p}, \mathbf{q})$ is the product of all boolean similarities associated with the matching points of the two sets. We consider $\mathbf{p}$ and $\mathbf{q}$ to be approximately congruent only if $s(\mathbf{p}, \mathbf{q}) = 1$.

In order to evaluate the N-points base congruence, we need to define which local point descriptors $\{f_l(\cdot)\}$ and point pair measurement functions $\{m_k(\cdot, \cdot)\}$ to employ, together with their corresponding similarity differences {d(·, ·)} and scalar thresholds {t}. As the first point pair measurement we consider the Euclidean distance $m_1(p_i, p_j) = \|p_i - p_j\|$ and define its similarity difference as:

$$d_1\big(m_1(p_i, p_j), m_1(q_i, q_j)\big) = 1 - \frac{\min\big(m_1(p_i, p_j), m_1(q_i, q_j)\big)}{\max\big(m_1(p_i, p_j), m_1(q_i, q_j)\big)}. \qquad (7)$$

If the surface normal at each point is available, we define the second point pair measurement $m_2(p_i, p_j) = \mathrm{nangle}\big(n(p_i), n(p_j)\big)$, where n(p) denotes the surface normal at the point p and nangle(·, ·) denotes the minimal angle between two surface normals. Its similarity difference is defined as:

$$d_2\big(m_2(p_i, p_j), m_2(q_i, q_j)\big) = \big| m_2(p_i, p_j) - m_2(q_i, q_j) \big|. \qquad (8)$$

If the reflectance or colour images associated with the range scans are available, we can extract the corresponding feature points (e.g., SIFT or SURF feature points [32]) associated with each 3-D point of $\mathbf{p}$ and $\mathbf{q}$. The corresponding local feature descriptors can be used to define a suitable similarity difference.

In some applications, it is possible to exploit information about the environment to define additional descriptors. This is the case for scans representing structural scenes with one common main normal direction (ground floor scene) or environments with three common orthogonal normal directions (orthogonal scene). For instance, in an indoor/outdoor scene with a common main ground floor plane, all points lying on the ground plane have roughly the same normal direction. The type of structural scene can be automatically detected and classified by clustering the surface normals.

In the case of a ground floor scene, we initially transform all points in P and Q to align their corresponding ground floor normals with the z-axis. Then, for each point $p = (p_x, p_y, p_z)$, an additional local descriptor can be defined as $f_z(p) = \mathrm{nangle}\big(n(p), n_z\big)$, where $n_z$ denotes the direction of the z-axis, i.e., $n_z = (0, 0, 1)$. $f_z(p)$ represents the inclination w.r.t. the ground of the surface passing through p. Its similarity difference is defined as:

$$d_{f_z}\big(f_z(p_i), f_z(q_i)\big) = \big| f_z(p_i) - f_z(q_i) \big|. \qquad (9)$$

In addition, we can introduce another point pair measurement $m_z(p_i, p_j) = p_i^z - p_j^z$, i.e., the height difference between the two points. Its corresponding similarity difference can be defined as:

$$d_{m_z}\big(m_z(p_i, p_j), m_z(q_i, q_j)\big) = \big| m_z(p_i, p_j) - m_z(q_i, q_j) \big|. \qquad (10)$$

If both P and Q are acquired from an orthogonal scene, P and Q are first transformed to align their three common orthogonal normal directions with the x-, y- and z-axes, respectively. We then exploit two more local descriptors defined as $f_x(p) = \mathrm{nangle}\big(n(p), n_x\big)$ and $f_y(p) = \mathrm{nangle}\big(n(p), n_y\big)$, where $n_x = (1, 0, 0)$ and $n_y = (0, 1, 0)$. These descriptors represent the inclinations of the surface passing through p w.r.t. the additional main axes $n_x$ and $n_y$, respectively. In addition, we introduce two other point pair measurements $m_x(p_i, p_j) = p_i^x - p_j^x$ and $m_y(p_i, p_j) = p_i^y - p_j^y$. The corresponding similarity differences of $f_x$, $f_y$, $m_x$ and $m_y$ are defined in the same way as in Equations (9) and (10), respectively.
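As a concrete illustration, the following C++ fragment sketches the boolean test of Equations (1)-(2), the two basic similarity differences of Equations (7)-(8), and the clamped score term used inside Equations (4)-(5). The function names are ours.

```cpp
#include <algorithm>
#include <cmath>

// Eq. (2): boolean threshold function b(x, t).
inline bool b(double x, double t) { return x <= t; }

// Eq. (7): relative difference of the Euclidean pair distances m1.
inline double d1(double m1p, double m1q) {
  return 1.0 - std::min(m1p, m1q) / std::max(m1p, m1q);
}

// Eq. (8): absolute difference of the normal-deviation angles m2 (radians).
inline double d2(double m2p, double m2q) { return std::fabs(m2p - m2q); }

// Eq. (1): boolean similarity of a measurement difference under threshold t.
inline bool similar(double d, double t) { return b(d, t); }

// One addend of Eqs. (4)-(5): clamped, normalised similarity in [0, 1]
// (before the user-defined weight is applied).
inline double scoreTerm(double d, double t) {
  return 1.0 - std::min(d, t) / t;
}
```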

3 Fast search codebook

Using the criteria defined in the previous section, we can evaluate the congruency of two given N-points bases. To perform the registration, we need to couple an N-points base $\mathbf{p} \subset P$ with the N-points base $\mathbf{q} \subset Q$ having the highest similarity score. This task requires a search over all possibly congruent N-points bases in Q; an exhaustive search is impractical due to the large number of candidates in Q. To solve this problem, we build a codebook from P and Q composed of two different data structures used to quickly search for possibly corresponding points (Section 3.1) and point pairs (Section 3.2). In particular, we employ a boolean table $S_f$ used to detect candidate matches in Q of a selected point $p_i \in P$, and a multi-dimensional table $S_m$ used to detect candidate matches of a selected point pair $(p_i, p_j) \in P$. If the number of detected congruent N-points bases is still large, we additionally compute a similarity score between $\mathbf{p}$ and each detected base in Q in order to sort them and consider only the best ones.

The codebook is thus composed of a boolean m × n table and a floating-point n × n × K table, where m = |P|, n = |Q| and K denotes the number of point pair measurements used. The required memory grows as O(mn) for $S_f$ and as O(n²) for $S_m$ (assuming K ≪ n).

Our algorithm detects candidate congruent N-points bases incrementally. Given $\mathbf{p} \subset P$, we start by selecting two points in $\mathbf{p}$ and collect all congruent 2-points bases in Q. We then iteratively add points to the current selection and grow the set of candidate bases until we reach a set of N-points bases.

3.1 Point features lookup table

We build $S_f$ as an m × n boolean similarity measure table according to the local descriptors used, where m = |P| and n = |Q| are the sizes of P and Q, respectively. Each element of $S_f$ is defined as:

$$S_f(p_i, q_j) = \prod_{l=1}^{L} s_l\big(f_l(p_i), f_l(q_j)\big), \qquad (11)$$

where $p_i \in P$, $q_j \in Q$, i = 1...m and j = 1...n. Thus, given a point $p_i \in P$, we can recover all possibly matching points in Q by considering the i-th row of $S_f$. The set of candidate matches in Q for $p_i$ is:

$$f(p_i) = \{ q_j \mid S_f(p_i, q_j) = 1,\ q_j \in Q \}. \qquad (12)$$

Notice that, to build S f , we only make use of the local feature descriptors. Its size depends on the number of points of both point clouds. In Section 5, we describe several techniques to sample the input acquisitions in order to reduce their sizes.
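A sketch of the construction of $S_f$ (Equation 11) follows. For brevity, descriptors are stored as plain scalars and each similarity difference is an absolute difference; in our framework every descriptor type carries its own distance function.

```cpp
#include <cmath>
#include <vector>

// f[i][l] = l-th scalar descriptor of point i (simplified representation).
using Descriptors = std::vector<std::vector<double>>;

std::vector<std::vector<bool>>
buildSf(const Descriptors& fP, const Descriptors& fQ,
        const std::vector<double>& t) {  // one threshold per descriptor
  const size_t m = fP.size(), n = fQ.size(), L = t.size();
  std::vector<std::vector<bool>> Sf(m, std::vector<bool>(n, true));
  for (size_t i = 0; i < m; ++i)
    for (size_t j = 0; j < n; ++j)
      for (size_t l = 0; l < L && Sf[i][j]; ++l)  // product of the s_l terms
        Sf[i][j] = std::fabs(fP[i][l] - fQ[j][l]) <= t[l];
  return Sf;
}
```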

3.2 Point pairs lookup table

Given a point pair $(p_i, p_j) \in P$, we need to efficiently find candidate matching pairs in Q. A blind exhaustive search would require a comparison with $\frac{n(n-1)}{2}$ point pairs. To reduce the search time, we build a lookup table $S_m$ for Q by uniformly quantising the K-dimensional space formed by the K point pair measurements $\{m_k(\cdot, \cdot)\}$ used, i.e., the Euclidean distance, the surface normal minimal angle, the gap difference(s) in the x-, y- or z-axis for structural scenes, etc. The quantisation is achieved by uniformly dividing their corresponding value ranges into $B_1, B_2, \dots, B_K$ bins, respectively. The range of the Euclidean distance is $[d_{\min}^q, d_{\max}^q]$, where $d_{\min}^q$ and $d_{\max}^q$ denote the minimal and maximal distances of point pairs in Q. The surface normal minimal angle falls in the range [0, π]. The gap differences fall in the ranges $[-b_x, b_x]$, $[-b_y, b_y]$ and $[-b_z, b_z]$, respectively, where $b_x$, $b_y$ and $b_z$ are the lengths along the x-, y- and z-axes of the minimal bounding box covering all points in Q. Each K-dimensional bin contains all point pairs $(q_i, q_j)$ of Q whose measurements $\{m_k(q_i, q_j)\}$ fall within the bin ranges.

In order to detect the matching point pairs of $(p_i, p_j)$, we initially evaluate the set of measurements $\{m_k(p_i, p_j)\}$. We then consider the thresholds $\{t_k^m\}$ associated with each measurement function to build the set of ranges $\{(m_k(p_i, p_j) - t_k^m,\ m_k(p_i, p_j) + t_k^m)\}$. We select all K-dimensional bins of $S_m$ that are covered or partially covered by this set of ranges and recover the associated point pairs of Q. In particular, point pairs belonging to partially covered bins are checked by verifying whether their measurements fall within the estimated set of ranges. Each extracted candidate matching pair $(q_i, q_j)$ is further verified by exploiting the point feature table $S_f$ to keep only pairs whose point features correspond. In particular, we test that:

$$S_f(p_i, q_i) \cdot S_f(p_j, q_j) = 1. \qquad (13)$$

Finally, using Equation 3, we evaluate the similarity score of each remaining candidate pair with $(p_i, p_j)$ and keep only the best $K_p$ pairs. For very distinctive points $p_i$ and $p_j$, there are few correspondences in Q with similar local features. For such point pairs, it is more convenient to first select the set of matching points using $S_f$ and then verify each point pair using the set of measurements $\{m_k(p_i, p_j)\}$. This initial test is conducted by evaluating $|f(p_i)| \times |f(p_j)|$, i.e., the largest possible number of candidate point pair matches for $p_i$ and $p_j$ given their local features. When this value is lower than a threshold, we employ the latter selection method. Our codebook-based search allows one to efficiently range-search candidate matching point pairs using adaptive ranges for each query. If we regard the K point pair difference measurements as a K-dimensional vector, other fast search methods could be used, e.g., the approximate nearest neighbour search based on kd-trees [33]. However, these methods cannot handle the threshold constraints in each dimension, which may produce more candidates to test while discarding valid ones.
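The following sketch illustrates the quantised point-pair table and its range query. It hashes each pair of Q into a K-dimensional bin keyed by a flattened index and, at query time, walks the hyper-box of bins overlapped by the ranges $[m_k - t_k, m_k + t_k]$; pairs from partially covered bins must still pass the exact range test, omitted here. The class layout is illustrative, not our actual implementation.

```cpp
#include <algorithm>
#include <unordered_map>
#include <utility>
#include <vector>

struct PairTable {
  std::vector<double> lo, hi;  // value range per measurement k
  std::vector<int> bins;       // B_k bins per measurement k
  std::unordered_map<long long, std::vector<std::pair<int, int>>> cells;

  int binOf(int k, double v) const {
    int b = int((v - lo[k]) / (hi[k] - lo[k]) * bins[k]);
    return std::min(std::max(b, 0), bins[k] - 1);  // clamp to valid bins
  }
  long long key(const std::vector<int>& b) const {  // flattened mixed-radix index
    long long h = 0;
    for (size_t k = 0; k < b.size(); ++k) h = h * bins[k] + b[k];
    return h;
  }
  void insert(int i, int j, const std::vector<double>& m) {  // pair (q_i, q_j)
    std::vector<int> b(m.size());
    for (size_t k = 0; k < m.size(); ++k) b[k] = binOf(k, m[k]);
    cells[key(b)].push_back({i, j});
  }
  // Candidate pairs whose bins intersect [m_k - t_k, m_k + t_k] for all k.
  std::vector<std::pair<int, int>> query(const std::vector<double>& m,
                                         const std::vector<double>& t) const {
    const size_t K = m.size();
    std::vector<int> b0(K), b1(K), cur(K);
    for (size_t k = 0; k < K; ++k) {
      b0[k] = binOf(k, m[k] - t[k]);
      b1[k] = binOf(k, m[k] + t[k]);
      cur[k] = b0[k];
    }
    std::vector<std::pair<int, int>> out;
    while (true) {  // odometer-style walk over the bin hyper-box
      auto it = cells.find(key(cur));
      if (it != cells.end())
        out.insert(out.end(), it->second.begin(), it->second.end());
      size_t k = 0;
      while (k < K && ++cur[k] > b1[k]) { cur[k] = b0[k]; ++k; }
      if (k == K) break;
    }
    return out;
  }
};
```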

3.3 Iterative search of matching N-points bases

Finding the best corresponding base of an N-points base $\mathbf{p} \subset P$ requires testing $O(n^N)$ N-points bases in Q with a blind exhaustive search, which is often impractical due to the size of the search space. To efficiently search for approximately congruent N-points bases in Q given a base $\mathbf{p} \subset P$, we employ an iterative approach that makes use of the codebook defined in the previous sections. We start by selecting a query set composed of a point pair of $\mathbf{p}$ and search for candidate congruent 2-points bases using $S_f$ and $S_m$. We then iteratively add points of $\mathbf{p}$ to the query set and extend the corresponding candidate congruent bases, by grouping point pairs or adding single points to the previous candidate bases, until the query set corresponds to $\mathbf{p}$ and all candidate bases are N-points bases. Algorithm 1 describes the procedure in detail, which is also illustrated in Figure 2. Algorithm 1 uses two approaches to gradually expand a congruent base in Q. The first is to add an approximately congruent point pair having a common point with a previous base and satisfying the congruence constraints used. The second is to add a single point from $f(p_{i+1})$ in accordance with the congruence constraints. Since it is difficult to select the best approach a priori, we use a simple strategy based on the product of set sizes $|f(p_i)| \times |f(p_{i+1})|$: if this product is large, we use the former method, otherwise the latter.

Figure 2. The procedure of N-points approximate congruent set searching.

4 RANSAC pose optimization

To find the best transformation T that aligns the two point sets P and Q, we employ a variant of the RANSAC algorithm [9], a widely used general technique for robustly fitting models to data corrupted by noise and outliers. The RANSAC-based alignment procedure is straightforward: randomly pick a base $\mathbf{p}$ of N non-collinear points from P; detect the corresponding best congruent bases $\{\mathbf{q}_k\}_{k=1}^{K_g}$ and for each one compute the candidate transformation that aligns the points in $\mathbf{p}$ with the points in $\mathbf{q}_k$; finally, verify the recovered transformations and detect the best one using a best-fit criterion. To achieve a certain probability of success, this procedure is repeated for different choices of bases from P. Over all such trials, we select the transformation T with the best fit measure. Our RANSAC algorithm terminates when one of the following two conditions is reached:

1. The number of iterations reaches a predefined maximal iteration number $I_{max}$;

2. The best transformation has not been updated for $I_{nou}$ consecutive iterations.

Our method makes use of the codebook-based search scheme defined in Section 3, which is constructed before the optimization. The following sections describe in detail each step of the RANSAC iteration.

4.1 Random selection

Assuming that k points in P have corresponding points in Q, the probability of selecting N points from P that all have correspondences in Q is $p(N) \approx (k/m)^N$, where m = |P|. To successfully recover the transformation, we generally employ a base size of N = 3, 4 or 5 points, because the probability of success decreases rapidly as the base size N increases. Moreover, to make the estimated transformation more robust, we select the N-points base $\mathbf{p}$ from P as decentralized as possible in 3-D space.

Notice that when the overlap between two scans is small, only a very small subset of points in P has corresponding points in Q. In this case, the probability of uniformly selecting an N-points base in P that has a corresponding N-points base in Q is very low. To improve the selection probability, we propose a probability-based RANSAC approach, described as follows. We initially build an m × m pairwise matching probability table $S_p$ for all point pairs in P. Given a point pair $(p_i, p_j)$ in P, its matching probability is defined by

$$S_p(p_i, p_j) = \frac{1}{C} \exp\left( -\varphi_1 \frac{|f(p_i)|\,|f(p_j)|}{n^2} - \varphi_2 \frac{\|p_i - p_j\|}{d_{\max}^q} - \varphi_3\, s_c\big((p_i, p_j), (\tilde{q}_i, \tilde{q}_j)\big) \right), \qquad (14)$$

where C is a normalization factor and $\{\varphi_i\}_{i=1}^{3}$ are three positive constants. $(\tilde{q}_i, \tilde{q}_j)$ is the point pair of Q matching $(p_i, p_j)$ with the best similarity score in Equation 3. Notice that if no approximately congruent match $(\tilde{q}_i, \tilde{q}_j)$ is found, we set the probability value to zero, i.e., $S_p(p_i, p_j) = 0$. This probability is high if:

1. Both points $p_i$ and $p_j$ potentially have several matches in Q based on their local descriptors (see Equation 12),

2. They are well spaced, and

3. There exists a very similar 2-points base $(\tilde{q}_i, \tilde{q}_j)$ in Q according to the similarity measure $s_c(\cdot, \cdot)$ defined in Equation 3.

The selection of an N-points base $\mathbf{p} = \{p_{s_k}\}_{k=1}^{N}$ from P proceeds iteratively by adding points to a selected point set $\mathbf{p}_c$. This is done as follows:

1. We randomly select the first two points $(p_{s_1}, p_{s_2})$ based on the probability values $\{S_p(p_i, p_j)\}_{i>j,\ 1 \le i,j \le m}$ of the upper triangular part of the symmetric pairwise matching probability table $S_p$. These points are added to the initially empty $\mathbf{p}_c$.

2. The next point $p_{s_{k+1}}$, $k \ge 2$, is randomly selected based on the joint probability values $\prod_{p_{s_k} \in \mathbf{p}_c} S_p(p_i, p_{s_k})$, for $p_i \in P$, $p_i \notin \mathbf{p}_c$.

In this way, there is a high probability of selecting an N-points base $\mathbf{p}$ with corresponding points in Q.
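A sketch of this weighted selection follows, using std::discrete_distribution to draw indices with probability proportional to the supplied weights; the table layout is simplified to a dense matrix and the function names are ours.

```cpp
#include <random>
#include <utility>
#include <vector>

using ProbTable = std::vector<std::vector<double>>;  // dense S_p, m x m

// Draw the first two base points from the upper triangle of S_p.
std::pair<int, int> drawFirstPair(const ProbTable& Sp, std::mt19937& rng) {
  std::vector<double> w;
  std::vector<std::pair<int, int>> idx;
  for (size_t i = 0; i < Sp.size(); ++i)
    for (size_t j = i + 1; j < Sp.size(); ++j) {
      w.push_back(Sp[i][j]);
      idx.push_back({int(i), int(j)});
    }
  std::discrete_distribution<size_t> dist(w.begin(), w.end());
  return idx[dist(rng)];
}

// Draw the next point with weight prod_k S_p(p_i, p_{s_k}) over the points
// already in the base; points already chosen get zero weight.
int drawNext(const ProbTable& Sp, const std::vector<int>& chosen,
             std::mt19937& rng) {
  std::vector<double> w(Sp.size(), 1.0);
  for (size_t i = 0; i < Sp.size(); ++i)
    for (int s : chosen)
      w[i] *= (int(i) == s) ? 0.0 : Sp[i][s];
  std::discrete_distribution<size_t> dist(w.begin(), w.end());
  return int(dist(rng));
}
```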

At the end of a RANSAC iteration, if a candidate transformation T was successfully recovered using the selected N-points base, the probability table $S_p$ is updated. In particular, we update the elements corresponding to the points $p_{s_k} \in \mathbf{p}$ by suitably increasing their probability values: for each point $p_i \in P$, the new probability value $S_p(p_i, p_{s_k})$ (and its symmetric value $S_p(p_{s_k}, p_i)$) is evaluated as

$$S_p(p_i, p_{s_k}) = S_p(p_i, p_{s_k}) \exp\big(\nu\, f_{QLCP}(T, \delta)\big), \qquad (15)$$

where $f_{QLCP}(T, \delta)$ is the transformation fitting criterion described in Section 4.4 and ν is a positive constant. Updating the probabilities increases the chance of selecting good samples in P, which is very useful for aligning two scans with a small overlap. To avoid unbalanced values in $S_p$, we decrease the probabilities of elements that have been updated too frequently during the RANSAC iterations. In particular, we decrease the probability value as follows:

$$S_p(p_i, p_{s_k}) = S_p(p_{s_k}, p_i) = \psi\, S_p(p_i, p_{s_k}), \qquad (16)$$

where $\psi \in (0, 1)$ is a constant.

4.2 Approximate congruent group selection

After selecting an N-points base $\mathbf{p}$ from P, we need to detect a set of approximately congruent N-points bases in Q. This is done by exploiting the fast codebook structures $S_f$ and $S_m$ defined in Section 3. In particular, following Algorithm 1, we iteratively recover the set of congruent N-points bases as we select points of $\mathbf{p}$ from P. We keep only the first $K_g$ candidates according to Equation 3. These $K_g$ N-points bases $\{\mathbf{q}_k\}_{k=1}^{K_g}$ are used in the following step to estimate a set of point cloud transformations.

4.3 Transformation estimation

Given two point sets P and Q with overlapping regions in arbitrary initial positions, we recover the transformation, from a prescribed family of transformations (typically rigid transformations), that best aligns the overlapping regions of P and Q. In the case of a rigid transformation, a base size of at least three points is needed to uniquely determine the aligning transformation. This means that our algorithm requires at least a pair of matching 3-points bases from P and Q. In particular, for any given pair of N-points bases $\mathbf{p}$ and $\mathbf{q}$, we recover the corresponding transformation T using the closed-form solution of [34].
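For reference, a standard SVD-based closed form (Kabsch/Arun style) is sketched below in place of the specific solution of [34], which serves the same purpose; the function name is ours.

```cpp
#include <Eigen/Dense>

// Rigid transformation q ~= R p + t from two corresponding N-points bases,
// given as 3xN matrices with one point per column (N >= 3, non-collinear).
Eigen::Matrix4d rigidFromBases(const Eigen::Matrix3Xd& p,
                               const Eigen::Matrix3Xd& q) {
  const Eigen::Vector3d cp = p.rowwise().mean(), cq = q.rowwise().mean();
  // Cross-covariance of the centred bases.
  const Eigen::Matrix3d H =
      (p.colwise() - cp) * (q.colwise() - cq).transpose();
  Eigen::JacobiSVD<Eigen::Matrix3d> svd(
      H, Eigen::ComputeFullU | Eigen::ComputeFullV);
  Eigen::Matrix3d V = svd.matrixV();
  if ((V * svd.matrixU().transpose()).determinant() < 0)
    V.col(2) *= -1.0;  // keep a proper rotation, not a reflection
  const Eigen::Matrix3d R = V * svd.matrixU().transpose();
  Eigen::Matrix4d T = Eigen::Matrix4d::Identity();
  T.topLeftCorner<3, 3>() = R;
  T.topRightCorner<3, 1>() = cq - R * cp;
  return T;
}
```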

4.4 Transformation verification

To determine the best transformation, Aiger et al. [8] employ a best-fit criterion called the largest common pointset (LCP) measure $f_{LCP}(T, \delta)$, which favours the transformation bringing the maximum number of points from P to within some δ-distance of points in Q. Unfortunately, this criterion depends entirely on the choice of the distance threshold δ. On one hand, if this threshold is too large, wrong transformations may produce large LCP values. On the other hand, if δ is too small, in some cases no transformation can be found. The main problem is that the LCP measure only considers the quantity of matched points in the overlapping regions, not their matching quality. To mitigate this problem, we propose to integrate a suitable matching quality measure into the LCP measure. Suppose that we have two transformations $T_a$ and $T_b$, computed from two different selected N-points congruent group pairs, that result in the same LCP measure under the same distance threshold δ. Assume that the histograms of the point distances under the two transformations correspond to those shown in Figure 3a, b, respectively. Intuitively, $T_b$ is a better solution than $T_a$ because most of the corresponding point pairs have shorter distances in Figure 3b than in Figure 3a, i.e., the mean distance of the corresponding point pairs in Figure 3b is smaller. We expect that better point matches (with shorter distances) yield a better transformation. We thus define a matching score based on the normalized accumulated histogram $n(T, \delta)$ (see Figure 3c, d) corresponding to a given transformation T as follows:

Figure 3. Histograms of point distances under two transformations between two point sets resulting in the same LCP measure under the same distance threshold δ, and their normalized accumulated histograms.

$$m_s(T, \delta) = \exp\left( -\lambda \left( 1 - \int_0^1 n(T, \delta) \right) \right), \qquad (17)$$

where λ is a positive parameter and $\int_0^1 n(T, \delta)$ denotes the integral of $n(T, \delta)$, i.e., the area below its cumulative curve. We use a quality-based LCP (QLCP) measure, defined by $f_{QLCP}(T, \delta) = m_s(T, \delta)\, f_{LCP}(T, \delta)$, as our best-fit criterion. By weighting $f_{LCP}(T, \delta)$ with the quality estimate $m_s(T, \delta)$, the QLCP measure is made less sensitive to the choice of δ than the LCP measure.

After the transformation estimation step, we evaluate each recovered transformation $T_k$ between $\mathbf{p}$ and $\mathbf{q}_k$ w.r.t. the mean alignment error. In particular, we test whether the error $\frac{1}{N} \sum_{p_i \in \mathbf{p},\, q_i^k \in \mathbf{q}_k} \|T_k p_i - q_i^k\|^2$ is less than some predefined threshold, where $T_k p_i$ denotes the point $p_i$ transformed via $T_k$. We further verify each remaining transformation by detecting how many points in P have correspondences in Q under $T_k$ and then measuring the matching score of Equation 17. A point $p \in P$ has a corresponding point in Q under $T_k$ if there exists some point in Q within δ-distance of the transformed point $T_k p$, i.e., $\exists q \in Q : \|q - T_k p\| \le \delta$. For efficiency, we use the approximate nearest neighbours [33] for neighbourhood queries in $\mathbb{R}^3$. We first select a fixed number of points $\{p_i\} \subset P$ and apply the transformation $T_k$. Then, for each transformed point $T_k p_i$, we query the nearest neighbour in Q. If enough points of $\{p_i\}$ are matched, we perform a similar test for the remaining points in P and assign to $T_k$ a score based on our QLCP measure.

Finally, we update the current best transformation found with the best QLCP measure and start the next RANSAC iteration.
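A sketch of the QLCP evaluation follows: count the points of P matched within δ (the LCP part), accumulate their distance histogram, approximate the integral of the normalized accumulated histogram, and weight by the quality score of Equation (17). nearestDistance() is a placeholder for the approximate nearest-neighbour query of [33], and the bin count is an assumption of the sketch.

```cpp
#include <Eigen/Dense>
#include <algorithm>
#include <cmath>
#include <vector>

using Cloud = Eigen::Matrix3Xd;

double nearestDistance(const Eigen::Vector3d& x, const Cloud& Q);  // e.g. ANN [33]

double qlcp(const Cloud& P, const Cloud& Q, const Eigen::Matrix4d& T,
            double delta, double lambda = 1.0, int nbins = 32) {
  std::vector<double> hist(nbins, 0.0);
  int matched = 0;  // f_LCP: points of P within delta of Q after T
  for (int i = 0; i < P.cols(); ++i) {
    const Eigen::Vector3d x =
        T.topLeftCorner<3, 3>() * P.col(i) + T.topRightCorner<3, 1>();
    const double d = nearestDistance(x, Q);
    if (d <= delta) {
      ++matched;
      hist[std::min(int(d / delta * nbins), nbins - 1)] += 1.0;
    }
  }
  if (matched == 0) return 0.0;
  // Integral of the normalised accumulated histogram over [0, 1].
  double acc = 0.0, area = 0.0;
  for (double h : hist) {
    acc += h / matched;
    area += acc / nbins;
  }
  const double ms = std::exp(-lambda * (1.0 - area));  // Eq. (17)
  return ms * matched;  // f_QLCP = m_s * f_LCP
}
```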

5 Point sampling approaches

Given two large point sets P and Q, matching approximately congruent sets of points over the entire data set is not feasible. Thus, we need efficient point sampling strategies to quickly search for corresponding sets and to effectively estimate and verify their transformation on a limited number of meaningful candidate points. The reliability of the proposed registration approach depends, to some extent, on the sampling strategy used. If we sample too many points from scarcely meaningful regions, the registration might converge slowly, find the wrong transformation (such as solutions showing sliding effects produced by samples that poorly constrain the transformation), or even diverge, especially in the presence of noise and outliers. Several point sampling techniques for point cloud alignment were recently proposed [6, 35–37]. In [6], random, uniform (over the surface area of a model) and normal-space sampling are considered to evaluate the convergence of the ICP algorithm. The normal-space sampling algorithm tries to uniformly spread the normals of the selected points over the sphere of directions. The aim is to consider points that sufficiently constrain the estimated rigid transformation and improve the alignment quality by reducing surface sliding effects. Gelfand et al. [35] proposed a variant of this algorithm that makes the transformation estimation geometrically stable by selecting points that reduce both translational and rotational uncertainties in the ICP algorithm. This technique samples points so as to equally constrain all eigenvectors of the covariance matrix estimated from the points and normals of the overlapping region of the two point clouds. A similar approach was used in [36] to design a probability function that guides the selection of stable sample points, which are also constrained by specific features. Torsello et al. [37] proposed a sampling technique that selects feature points with high local distinctiveness, which is inversely proportional to the average local radius of curvature and related to the area formed by similar points in the neighbourhood of each point. Nehab and Shilane [38] discussed the limitation of area-based uniform sampling, where the probability of a surface point being sampled is equal for all surface points. This type of sampling might produce points very close to each other and miss important surface features that could successfully constrain the transformation. To overcome these drawbacks, they proposed a stratified point sampling strategy ensuring an even distribution of the sample points over the whole surface, which implies a higher probability of catching important surface regions. This algorithm uses a voxelization of the model to generate random samples with controlled intra-distances. Other sampling strategies providing a uniform distribution of sample points on a surface are based on farthest point [39] and Poisson disk sampling [40], which require the computation of geodesic distances.

In this section, we investigate four sampling approaches: random, uniform, probabilistic and coupled sampling, which are described in detail in the following.

5.1 Random sampling

Random sampling is the simplest and most widely used sampling technique: each element of the population has an equal chance of being selected at each draw. A sample is random if the method for obtaining it meets this criterion of randomness; the actual composition of the sample itself does not determine whether or not it was a random sample.

5.2 Uniform sampling

To achieve a high probability of success when registering two overlapping point sets, we expect the sampled point sets to have similar point densities. If the acquired surfaces present similar point densities, the random sampling mentioned above is an acceptable choice, as it also results in similar point densities in the sampled overlapping parts. However, this assumption does not hold in general. Point density depends on the distances and on the incidence angles of the scanned surfaces with respect to the scanner position and orientation. Normally, short distances and small incidence angles lead to surfaces with high point densities and accuracies, which we consider high-resolution regions. Random sampling does not guarantee an equal spread of the generated points either on the surface or in the volume of the scanned model, and can sample points very close to each other. Thus, it is more likely to miss important surface features than an evenly distributed sampling [38]. This effect is particularly evident for scanned data with non-uniform point densities.

Many approaches can be employed to sample uniformly distributed points on the model surface [38–40]. We propose a simple and efficient variant of the method presented in [38], which is based on a cubic voxelization of the point cloud and provides samples evenly spread over typical scanned surfaces. Given a point set P, we assign its points to a set of 3-D cubic voxels of equal size, which partition the 3-D space. For each such voxel, we select the point closest to the voxel centre as the sampled point. To obtain a sampled point set of a given size $N_s$, we start by splitting the minimal bounding box of the point cloud into a small set of 3-D cubic voxels. We then iteratively split each voxel into eight smaller voxels until enough sampled points are found. With this strategy, however, we cannot obtain an exactly fixed-size set of sampled points, since most of the voxels do not contain points. Let $N_l$ be the number of sampled points at the l-th level. The final level L is such that its number of points $N_L$ is not less than $N_s$, while the number of points at the previous level is $N_{L-1} < N_s$. To obtain a sampling of the expected size $N_s$, the simplest way is to randomly select $N_s$ points from the $N_L$ points of level L. To achieve a more uniform distribution, we propose, instead, to re-split all points in P into a set of cubic voxels of size $\left(\frac{8 N_s^2}{N_{L-1} N_L}\right)^{\frac{1}{3}} S_{L-1}$, where $S_{L-1}$ denotes the voxel size at the (L-1)-th level. In this way, the number of uniformly sampled points is very close to $N_s$, but still not exact. If the number is larger than $N_s$, we randomly select $N_s$ points from them. Otherwise, we add new points from the sampled point set at the L-th level.
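One level of the voxel sampler can be sketched as follows: points are hashed into cubic voxels of size s and, per occupied voxel, the point closest to the voxel centre is kept. Halving s and repeating yields the levels described above; the hash constants are a common spatial-hashing choice, not part of our method.

```cpp
#include <Eigen/Dense>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Cloud = Eigen::Matrix3Xd;

// Keep, for every occupied voxel of size s, the point closest to its centre.
std::vector<int> voxelSample(const Cloud& P, double s) {
  struct Best { int idx = -1; double d2 = 1e300; };
  std::unordered_map<std::uint64_t, Best> grid;
  const Eigen::Vector3d lo = P.rowwise().minCoeff();
  for (int i = 0; i < P.cols(); ++i) {
    const Eigen::Vector3d c =
        ((P.col(i) - lo) / s).array().floor().matrix();  // voxel coordinates
    const std::uint64_t key = (std::uint64_t(c.x()) * 73856093ULL) ^
                              (std::uint64_t(c.y()) * 19349663ULL) ^
                              (std::uint64_t(c.z()) * 83492791ULL);
    const Eigen::Vector3d centre = lo + (c.array() + 0.5).matrix() * s;
    const double d2 = (P.col(i) - centre).squaredNorm();
    Best& b = grid[key];
    if (d2 < b.d2) b = {i, d2};
  }
  std::vector<int> out;
  out.reserve(grid.size());
  for (const auto& kv : grid) out.push_back(kv.second.idx);
  return out;  // halve s and re-run until enough samples are obtained
}
```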

5.3 Probabilistic sampling

If the acquired surfaces are very similar in structure, e.g., the surfaces of an indoor environment composed of a main flat wall and several small objects in front of it, the above two sampling approaches may not be efficient for our registration. One reason is that we select the best transformation based on the degree of overlap of the point sets, not on the whole scene structure. In the above example, points from the main wall weakly constrain translations and generate sliding effects in the final alignment, as already discussed in [6, 35–37]. Another reason is that an N-points base selected from the wall would have a large number of approximately congruent bases, requiring expensive searches, tests, estimates and verifications of candidate transformations over a large set of candidates. To avoid these problems, we want to consider more points from the objects in front of the wall, which reduces the computation and better constrains the rigid transformation. Similarly to [37], this is achieved with a probabilistic sampling technique, which selects points based on likelihoods computed from a specific weight function. The weight function determines how relevant each point is for registration and is associated with the local geometrical properties of the surface at each point. We experimented with two different weight functions, $\omega_{SV}$ and $\omega_{APD}$, based on surface variation and adjacent point distance, respectively.

The surface variation of a point p is defined as:

$$\omega_{SV}(p) = \frac{3 \lambda_1}{\lambda_1 + \lambda_2 + \lambda_3}, \qquad (18)$$

where $\lambda_1 \le \lambda_2 \le \lambda_3$ are the eigenvalues corresponding to the principal components of a set of k points in the neighbourhood of p. $\omega_{SV}(p) \in [0, 1]$ indicates how much the surface at p locally deviates from the tangent plane [41]. In practice, $\omega_{SV}(p)$ roughly approximates the mean curvature at p: a value close to zero indicates that the surface is locally planar, while a large $\omega_{SV}(p)$ identifies an interesting feature such as a corner or a bump.
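The surface variation can be computed from the eigenvalues of the 3 × 3 covariance of a point's k-neighbourhood, e.g. as in the following sketch (the neighbour search itself is omitted):

```cpp
#include <Eigen/Dense>
#include <vector>

// Surface variation of Eq. (18) from a point's k nearest neighbours.
double surfaceVariation(const std::vector<Eigen::Vector3d>& nbrs) {
  Eigen::Vector3d mean = Eigen::Vector3d::Zero();
  for (const auto& x : nbrs) mean += x;
  mean /= double(nbrs.size());
  Eigen::Matrix3d C = Eigen::Matrix3d::Zero();
  for (const auto& x : nbrs) C += (x - mean) * (x - mean).transpose();
  Eigen::SelfAdjointEigenSolver<Eigen::Matrix3d> es(C);
  const Eigen::Vector3d ev = es.eigenvalues();  // ascending: l1 <= l2 <= l3
  return 3.0 * ev(0) / ev.sum();  // ~0 on planar patches, large on features
}
```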

The adjacent point distance [42] is defined by exploiting the grid structure of the range image related to the acquired point cloud. Let p be a valid point associated with a pixel of the range image; its adjacent point distance $A_d(p)$ is defined as the median of the distances between p and its adjacent valid points $p_k$ in a 3 × 3 neighbourhood of p, i.e., $A_d(p) = \mathrm{median}_k \|p - p_k\|$. Then, to reduce the effect of measurement noise, a median filter of size 5 × 5 is applied to the resulting adjacent point distance map $A_d$ to get a filtered map $\tilde{A}_d$. The weight function $\omega_{APD}$ of a point p is then defined as:

$$\omega_{APD}(p) = \begin{cases} \hat{A}_d^2 & \text{if } \tilde{A}_d(p) \ge \hat{A}_d, \\ \tilde{A}_d(p)^2 & \text{otherwise}, \end{cases} \qquad (19)$$

where $\hat{A}_d$ denotes the 95th percentile of the adjacent point distances of all points in $\tilde{A}_d$, sorted in ascending order. The use of $\hat{A}_d$ effectively suppresses estimation errors caused by outliers. $\omega_{APD}(p)$ estimates the local sampling sparsity of the scanned surface at p. High values of $\omega_{APD}$ characterize points with a sparsely sampled neighbourhood, typical of corner and edge points and of regions scanned at large incidence angles, which are likely located in the overlapping area of the models.

5.4 Coupled sampling

Besides the three sampling approaches above, we also consider coupling different sampling approaches in sequence. For example, a point set $P_1$ is selected from the initial point set P by probabilistic sampling; then another point set $P_2$ is selected from $P_1$ by uniform sampling. The final sampled point set can thus be selected from the initial point set P via two or more sampling processes applied in some order with given sampling ratios. The sampling ratios of the final point set $P_S$ obtained from P via S sampling processes are denoted as $|P_1| : |P_2| : \cdots : |P_S|$, where |·| denotes the set size. For different scans to be aligned, we can select a suitable sampling approach.

6 Integrating texture features

If the acquired models lack geometric details to be used as good anchor points for correct registration, we can still employ features extracted from other visual sources provided by the laser range scanners, e.g., the reflectance image. Laser range scanners are non-contact 3-D scanners that measure the distance from the sensor to points in the scene, typically in a regular grid pattern. A natural byproduct of this acquisition process is the reflectance image. A reflectance image (shown in Figure 4) stores in each pixel the portion of laser light reflected from the corresponding surface point, providing important information about its texture. Both the 3-D spatial distribution and the texture characteristics of features extracted from reflectance images can effectively constrain a rigid transformation in 3-D space. As shown in [11, 12], texture features can be effectively used to identify anchor points leading to a well-constrained rigid transformation.

Figure 4. Atlas of rectified images generated from a reflectance image: (top) reflectance image; (bottom) atlas of 14 rectified images generated by sampling a sphere every 60°.

A feature is accompanied by a descriptor, which locally and compactly describes the texture around the feature pixel. In our application, we are interested in good local feature descriptors, which should have high local distinctiveness, be invariant w.r.t. affine transformations, and possibly be robust w.r.t. illumination changes and local deformations. Several local feature descriptors have been presented in the literature [32, 43]. Among them, the most suitable for our application are the SIFT [44], SURF [45] and FAST [46] descriptors, which we extract from the reflectance images of our scanned models and consider as relevant sample points to be matched by our registration algorithm.

6.1 Reflectance image rectification

The aforementioned features cannot be extracted effectively directly from the reflectance image associated with a scanned model, since its intrinsic spherical format strongly affects the quality of the descriptors, which are not designed to be robust w.r.t. spherical distortions. Indeed, typical laser scanner acquisition systems are usually composed of a fixed platform and a rotating head, which are naturally modelled by simple spherical projections. The resulting images are obtained by mapping spherical images onto single image planes. The meridians of a spherical image are mapped to vertical viewing planes, and the parallels are mapped to viewing cones with their vertex at the sensor position.

In order to reduce the distortion induced by the spherical projection, we compute the above mentioned feature descriptors on an atlas of rectified images (shown in Figure 4). This is possible since both the spherical projection and the atlas perspective projections will share the same point of view.

To recover the set of rectified images, we initially select a field of view value $\alpha_{fov}$ (in our experiments $\alpha_{fov}$ = 60°). We then calculate the width $w_r$ and height $h_r$ of each rectified image by matching the pixel resolution to the resolution of the spherical image equator, i.e., given the width $w_s$ of the spherical image: $w_r = h_r = w_s \frac{\alpha_{fov}}{360°}$. Given the pixel dimensions of the image plane, we define a standard perspective projection whose principal point is the central pixel of the image plane. The only missing intrinsic parameter, the focal length, can be easily recovered from $\alpha_{fov}$, $w_r$ and $h_r$. To determine the extrinsic parameters of each projection, we fix the camera point of view to the spherical projection point of view, i.e., the local origin of the point cloud. We then sample the sphere with v equally distributed directions. The value of v depends on the required field of view and is estimated such that the images contained in the atlas completely cover the initial spherical image. Given the camera direction $d_i$, we recover the remaining extrinsic parameters of the i-th image by aligning the camera principal direction with $d_i$ and the image vertical direction with the vertical direction of the spherical image. Exploiting each projection matrix, we can associate to each final image pixel its corresponding viewing ray, and from this the pixel's corresponding coordinates in the original spherical image. These correspondences are used to perform a bilinear interpolation of the spherical image to recover each rectified image (see Figure 4).
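A sketch of the intrinsics derivation follows: the rectified image size comes from matching the equator resolution, and the focal length from the field of view; the mapping from pixels to viewing rays (and hence to spherical coordinates for the bilinear lookup) is summarised in the trailing comment. The struct layout is illustrative.

```cpp
#include <cmath>

struct Intrinsics { int w, h; double focal, cx, cy; };

// Intrinsics of one rectified perspective image, given the spherical image
// width ws (pixels along the equator) and the chosen field of view (degrees).
Intrinsics rectifiedCamera(int ws, double fovDeg /* e.g. 60 */) {
  Intrinsics cam;
  cam.w = cam.h = int(ws * fovDeg / 360.0);  // match equator pixel resolution
  const double fov = fovDeg * 3.14159265358979323846 / 180.0;
  cam.focal = 0.5 * cam.w / std::tan(0.5 * fov);  // from the field of view
  cam.cx = 0.5 * cam.w;  // principal point at the image centre
  cam.cy = 0.5 * cam.h;
  return cam;
}
// Each pixel (u, v) then maps to the viewing ray
//   r = R_i * normalize((u - cx, v - cy, focal)),
// where R_i aligns the camera with the sampled direction d_i; converting r
// to spherical coordinates gives the source position in the reflectance
// image for the bilinear interpolation.
```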

The atlas generation only depends on the field of view used. Small values generate many small images, whereas large values generate fewer images with higher perspective distortion.

6.2 Texture features extraction and integration

From the original reflectance image or the atlas of rectified images generated from a reflectance image, we extract a set of texture features for registration by using the following methods:

1. SIFT [44]: this feature descriptor encodes the trend of the local image gradient around a pixel as a histogram of typically 128 bins. It is invariant w.r.t. scaling and rigid 2-D transformations, and robust w.r.t. affine distortion, addition of noise, and changes in illumination. This descriptor is very accurate in identifying relevant interest points, but its computation is usually slow unless the GPU is exploited [47].

2. SURF [45]: this method efficiently detects features by computing a rough approximation of the Hessian matrix using integral images. The resulting descriptor is based on sums of approximated 2-D Haar wavelet responses, and is more compact and much faster to compute than the SIFT descriptor.

3. FAST [48]: this technique classifies a pixel as a corner if there is a sufficiently large set of significantly brighter (or darker) pixels in a circular pixel neighbourhood of fixed radius. This feature detection algorithm is very fast, up to 30 times faster than SIFT, but it is not invariant w.r.t. scaling and does not provide effective descriptors [43], which are usually computed using other techniques (e.g., the SURF method in this paper).

4. Harris corner detector [49]: widely used in image processing and computer vision, it computes corner features by analysing the local changes of image intensity with patches shifted by a small amount in different directions. It is neither scale nor affine invariant, and usually generates a high number of features.

The 3-D points corresponding to the extracted texture features are then considered as sampled points and used with their descriptors by our registration algorithm to match congruent sets of points, as described in Sections 2 and 3.

7 Experimental results

7.1 Test data and evaluation criteria

We tested our point-based registration algorithm on a variety of input data with varying amounts of noise, outliers, and extents of overlap. Our test dataset includes some small object models, shown in Figure 5a, b, c, which are selected from the data provided with the demo application of the 4PCS algorithm [8]. The other test data are models of large indoor/outdoor scenes acquired by different types of scanners, mostly selected from [20], as shown in Figure 5d, e, f, g, h, i, j, k. In total, we tested 11 pairs of scans with different extents of overlap, as shown in Figure 5. These 11 scan pairs are denoted $\{G_n\}_{n=a,\dots,k}$, corresponding to the models shown in Figure 5a-k, respectively. The small object models in $G_a$-$G_c$ consist of around 20,000-30,000 points. The indoor scan data ($G_d$-$G_i$) were captured by Z+F IMAGER 5003/5006/5006i laser range-scanners. The outdoor scan data ($G_j$ and $G_k$) were captured by the RIEGL LMS-Z420i laser range-scanner. The accuracy of a point acquired by the Z+F IMAGER 5003 (5006 and 5006i) laser scanner is 3 mm (1 mm) along the laser-beam direction at a maximal distance of 50 m from the scanner. The accuracy of the RIEGL LMS-Z420i laser scanner is 10 mm at 50 m. The resolution of all indoor scans is about 2,530 × 1,080 and the resolution of all outdoor scans is 3,000 × 666. No surface normals were provided for the points in the scan data $G_a$, $G_b$ and $G_c$; for the other scans, we always fed the surface normals to our registration algorithm. The scan data in $G_a$-$G_c$ were scaled so that the bounding box diagonal lengths of the first scans in these scan pairs equal 100 units. The bounding box diagonal lengths of the first scans of the other eight scan pairs $G_d$-$G_k$ are around 10, 28, 26, 37, 77, 142, 1,860, and 398 m, respectively. The overlap rates shown in Figure 5 were computed as follows. The overlap rate of two point sets P and Q is defined as $o(P, Q) = \min\left(\frac{|P \cap Q|}{|P|}, \frac{|Q \cap P|}{|Q|}\right)$, where |·| denotes the size of a point set and P ∩ Q denotes the subset of P in which for each point there exists at least one point in Q whose distance to it is below a given threshold. To produce a reasonable overlap rate in 3-D space, we compute the overlap rate using large point subsets selected from P and Q with our voxel-based uniform sampler instead of the original point sets. Our registration algorithm was implemented in C++ on a Windows XP system and integrated into our commercial software JRC 3-D Reconstructor. All experiments were executed on a 2.67 GHz Intel machine.

Figure 5. All test pairs of scan data, shown with aligned point clouds (different colours represent different scans), approximate overlap percentage, and one reflectance image for each pair of scans, representing: a, b, c small objects; d, e, f, g, h indoor scenes; i large indoor scenes; j, k outdoor scenes.

We fixed the poses of all reference scans (i.e., {Q}) to an identity rotation matrix and zero translation. To evaluate the transformation estimation accuracy of our registration algorithm, we first ran our algorithm on each tested pair of scans to recover a good initial transformation, and then applied the ICP optimization algorithm [5] to get a well-aligned transformation. We observed that the mean residual error after the ICP optimization was always comparable to the laser scanner measurement error, which is much lower than the estimation errors of our proposed N-points congruent sets (NPCS) registration algorithm. For this reason, we regarded this ICP-optimized transformation as the ground truth transformation $T_g$ for evaluation. Thus, given an estimated transformation T from P to Q, the estimation error is defined as the median of the point distances after applying T and $T_g$ to P, i.e., $\mathrm{median}_{p \in P} \|Tp - T_g p\|$.

The transformation estimation accuracy was statistically evaluated by running our registration algorithm $N_{\mathrm{run}} = 20$ times on each tested pair of scans. In each run, we refreshed the input data by setting a random pose for the moving scan followed by re-sampling. For each scan pair, we set a suitable maximal estimation error $\Delta_{\max}$ in advance. If the estimation error was above $\Delta_{\max}$, we considered the run as failed. The maximal estimation errors were set as $\Delta_{\max} = 5$ units for the small object models $G_a$-$G_c$, $\Delta_{\max} = 1$ m for the indoor models $G_d$-$G_h$, $\Delta_{\max} = 2$ m for the large indoor model $G_i$, and $\Delta_{\max} = 5$ m for the outdoor models $G_j$-$G_k$. Based on the number $N_{\mathrm{suc}}$ of successful estimations and the number $N_{\mathrm{run}}$ of runs, the following three indicators were used to evaluate our method: (1) the successful estimation rate $S_r = N_{\mathrm{suc}}/N_{\mathrm{run}}$, to evaluate its robustness; (2) the median estimation error $\Delta$ among all $N_{\mathrm{suc}}$ successful estimations, to evaluate its accuracy; (3) the median estimation time t over all $N_{\mathrm{run}}$ runs, to evaluate its efficiency. Note that the estimation time does not include pre-processing computation time (i.e., texture feature detection/matching, point sampling, etc.), but it incorporates the codebook building time, which took less than 1 s with the basic RANSAC and around 2 s with the probabilistic RANSAC using our parameter setting. The feature detectors, listed by increasing processing time, are Harris, FAST, SURF and SIFT. The point sampling approaches, listed by increasing processing time, are random, probabilistic and uniform sampling.

7.2 Performance evaluation

The performance of our proposed NPCS registration algorithm was evaluated on the aforementioned test data. The main parameters were set as follows. We employed congruent point sets of size N = 4. The two main parameters for searching for the best N-points congruent sets in Algorithm 1 were set as $K_p = 2{,}000$ and $K_g = 50$. The exponential parameter for computing our proposed QLCP measure in Equation 17 was set to λ = 1. Given two point sets P and Q, we tried to recover the transformation from P (moving point set) to Q (reference point set). We selected 500 points from P and 1,000 points from Q for searching for N-points congruent sets between them. However, for transformation verification, we selected larger subsets, i.e., 1,000 points from P and 2,000 points from Q. This allows the estimation of a more accurate and robust transformation. The voxel-based uniform sampling approach was used to select these points. Our registration algorithm always used the surface normals of points when provided. The maximal normal deviation for corresponding point pairs was set to 30°. In our experiments, we used the RANSAC-based approach with $I_{\max} = 1{,}000$ and $I_{\mathrm{nou}} = 200$, which are the allowed maximal number of iterations and the number of consecutive iterations in which no better transformation is found, respectively. The estimation errors of the first three scan pairs $G_a$-$G_c$ are reported in canonical units (i.e., 100 units equal the bounding box diagonal length), while those of the other eight scan pairs $G_d$-$G_k$ are reported in meters. Note that the same parameter values were used in all experiments described below, unless clearly stated otherwise for particular experiments.
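As an illustration of the sampling step, the sketch below keeps at most one point per cubic voxel, which is one straightforward way to realize a voxel-based uniform sampler; the key packing and the `voxelUniformSample` name are our assumptions, not the paper's implementation.

```cpp
// Sketch of voxel-based uniform sampling: partition space into cubic voxels
// of edge `voxelSize` and keep at most one point per voxel, yielding roughly
// uniform sample density regardless of the original scan density.
#include <array>
#include <cmath>
#include <cstdint>
#include <unordered_set>
#include <vector>

using Point = std::array<double, 3>;

std::vector<Point> voxelUniformSample(const std::vector<Point>& cloud, double voxelSize) {
    // Maps a coordinate to a 21-bit voxel index (assumes the cloud extent fits).
    auto index = [voxelSize](double v) -> std::uint64_t {
        const auto i = static_cast<std::int64_t>(std::floor(v / voxelSize));
        return static_cast<std::uint64_t>(i) & 0x1FFFFFu;
    };
    std::unordered_set<std::uint64_t> occupied;
    std::vector<Point> samples;
    for (const Point& p : cloud) {
        // Pack the three voxel indices into one 64-bit hash key.
        const std::uint64_t key = (index(p[0]) << 42) | (index(p[1]) << 21) | index(p[2]);
        if (occupied.insert(key).second)   // first point seen in this voxel
            samples.push_back(p);
    }
    return samples;
}
```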

Figure 6 shows the performance comparison of our NPCS algorithm with different $K_p$ and $K_g$ on three scan pairs $G_a$, $G_e$ and $G_i$. First, we fixed $K_g = 50$ and tested the effect of different $K_p$ on the registration accuracy (median estimation errors), efficiency (median estimation times) and robustness (successful estimation rates), as shown in Figure 6a, b, c, respectively. We can observe that larger values of $K_p$ led to an improvement of the registration accuracy and robustness, but also required longer execution times. In Figure 6c, we can notice that 100% successful estimation rates were achieved for all tested scan pairs when $K_p \geq 1{,}000$. Second, we fixed $K_p = 2{,}000$ and tested the effect of different $K_g$ on the registration performance, shown in Figure 6d, e, f. Varying $K_g$ has similar effects as varying $K_p$. When $K_g \geq 50$, the successful estimation rates were always $S_r = 100\%$ for all tested scan pairs.

Figure 6

Performance comparison with different $K_p$ and $K_g$ on three scan pairs $G_a$, $G_e$ and $G_i$.

Table 1 illustrates the performance evaluation of our NPCS algorithm on three scan pairs $G_c$, $G_d$ and $G_j$ when the size N of the congruent point sets was set to N = 3, ..., 6. As explained before, large values of N (≥ 6) result in a low probability of successfully selecting N-points congruent sets between two point sets and in longer estimation times. Small values of N (= 3) lead to an increase in the number of candidate N-points congruent sets in Q given N points in P, but sometimes the matched sets found are not sufficient to well constrain the transformation and the algorithm falls into local minima.

Table 1 Performance evaluation of our NPCS algorithm with different sizes N of congruent point sets

The performance comparison of different sampling approaches on five scan pairs is shown in Table 2. In this experiment, we tested the following sampling strategies: random sampling; voxel-based uniform sampling; two probabilistic sampling approaches (probAPD, based on adjacent point distances, and probSurfVar, based on surface variations); and four texture feature-based samplings using the SIFT, SURF, FAST and Harris feature detectors as samplers. Some sampling strategies were coupled and are reported in Table 2 combined with the character '+'. For instance, the coupled sampling "random+uniform" denotes that we first applied random sampling to get a point subset $P_1$ from the initial point set P and then applied uniform sampling to get the final point subset $P_2$ from $P_1$. Here, the sampling ratios of all coupled sampling approaches were set to $|P_2| : |P_1| = 0.5$. All texture feature-based sampling methods were coupled with random or uniform sampling. From Table 2, we observe that uniform sampling performs better than random sampling, both when considered alone and when coupled with other sampling strategies, i.e., with probAPD-based sampling and with the feature detectors. Among the probabilistic sampling methods, probAPD-based sampling proved to be a very good choice when available, since it considerably improved the registration robustness and also increased its accuracy. On the contrary, probSurfVar-based sampling showed a lack of robustness, but worked very well for models rich in geometric features, captured from geometrically complex scenes such as those in $G_g$. In some scenes, texture feature detectors turned out to be a good first sampler. In those cases, they improved the transformation accuracy, as shown by "FAST+uniform" in $G_e$ (Δ = 0.0672 m) and by "HARRIS+random" in $G_g$ (Δ = 0.0121 m) and $G_i$ (Δ = 0.0820 m). Among the texture feature-based samplings, "SIFT+uniform" and "FAST+uniform" also demonstrated very good robustness by always scoring $S_r = 100\%$, like probAPD-based sampling, but with improved accuracy for all models apart from those in $G_j$. Note that whenever it was not possible to extract enough texture features, the remaining points were selected by the coupled follow-up sampling approach.

Table 2 Performance evaluation of different sampling approaches on five pairs of scans

Our QLCP measure for verifying the estimated transformation depends on two main parameters, i.e., λ in Equation 17 and the δ-distance. In this paper, we computed $\delta = \alpha\, v_{\mathrm{ref}}$, where α is a positive constant and $v_{\mathrm{ref}} = \mathrm{median}_{q \in Q_v}\,\|q - n_c(q)\|$, where $Q_v \subseteq Q$ represents the set of points used for transformation verification and $n_c(q)$ denotes the closest point to q in $Q_v$. Table 3 shows the comparison of our NPCS algorithm when using the LCP or the QLCP measure with different parameters on three scan pairs: $G_b$, $G_i$, and $G_j$. In practice, the LCP measure corresponds to the QLCP measure with λ = 0. From Table 3, we observe that our proposed QLCP measure led to higher successful estimation rates and more accurate transformations than the LCP one. In addition, notice that a large λ slightly increased the estimation robustness when a large α was used. In the rest of the experiments reported in this paper, we used λ = 1 and α = 2.
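The following sketch illustrates, under our own naming assumptions, how the δ-distance can be derived from the verification subset; nearest neighbours are found by brute force for clarity, whereas a real implementation would use a spatial index.

```cpp
// Sketch of the delta-distance used for QLCP verification:
// v_ref = median distance from each verification point to its nearest other
// point in Q_v, and delta = alpha * v_ref (alpha = 2 in the experiments).
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::array<double, 3>;

static double dist2(const Point& a, const Point& b) {
    const double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
    return dx * dx + dy * dy + dz * dz;
}

double deltaDistance(const std::vector<Point>& Qv, double alpha) {
    std::vector<double> nn;
    nn.reserve(Qv.size());
    for (std::size_t i = 0; i < Qv.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t j = 0; j < Qv.size(); ++j)   // brute-force nearest neighbour
            if (i != j) best = std::min(best, dist2(Qv[i], Qv[j]));
        nn.push_back(std::sqrt(best));
    }
    std::nth_element(nn.begin(), nn.begin() + nn.size() / 2, nn.end());
    return alpha * nn[nn.size() / 2];   // delta = alpha * v_ref
}
```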

Table 3 Performance comparison between the LCP and QLCP measures on different models

The integration of scene structure information into our NPCS algorithm can greatly reduce the estimation time, as described in Section 2 and illustrated in Table 4. It can also slightly improve the accuracy of the recovered transformation. Here, the scans in $G_d$ and $G_f$ were automatically classified as orthogonal scenes.

Table 4 Performance evaluation of our algorithm using scene structure information

Table 5 shows the performance evaluation of our NPCS algorithm on $G_g$ and $G_h$ using texture features detected from the original reflectance images and from atlases of rectified reflectance images. In this experiment the texture features were used first to sample the models, and their descriptors were then used to improve the matching of congruent sets, as described in Section 2. These techniques were compared with the pure uniform sampling method, which worked well in $G_g$ ($S_r = 100\%$) but failed completely in $G_h$ ($S_r = 0\%$).

Table 5 Performance evaluation of the texture feature-based variants of our algorithm, with features detected on the original reflectance images and on atlases of rectified reflectance images

The complete failure in $G_h$ is due to the presence of wrong transformations with a better QLCP measure than the ground truth transformation. This example clearly shows how the integration of texture features can considerably improve the robustness of our registration algorithm, especially when they are extracted from rectified images. Recall that there are two ways of integrating texture features. The first is to consider the detected features only as sampled points, as in the experiments reported in Table 2. The second (reported with the prefix "match") is to rank the matched features using their descriptors and keep in P the best k correspondences for each feature point. This experiment showed that: (1) the SURF features were more robust than the SIFT features in both $G_g$ and $G_h$; (2) the accuracy of the transformation obtained with SURF features is much better than the one obtained with SIFT features in $G_g$; (3) the features detected from the atlas of rectified reflectance images led to a more robust and generally more accurate registration than those extracted from the original reflectance images (the improvement in terms of accuracy is particularly evident in $G_g$, where the very low estimation error of 1-2 cm is about 0.5‰ of the bounding box diagonal length); (4) decreasing the number k of best feature correspondences based on their descriptors greatly improves the efficiency (i.e., shorter computation times), but may deteriorate the robustness (i.e., lower success rates); (5) in some experiments the features detected from the atlas of rectified reflectance images led to slightly higher estimation errors than those extracted from the original reflectance images. This happens when k is not large enough (e.g., k = 50, 100) and is mainly related to features located close to the rectified image boundaries. Features extracted in these regions are less reliable due to the lack of sufficient texture information around them. This effect can be overcome by imposing a given degree of overlap between neighbouring rectified images.
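A minimal sketch of the best-k correspondence filtering described above, with an illustrative `Match` record (in practice the descriptor distances and feature indices would come from the SIFT/SURF matching stage):

```cpp
// Sketch: keep, for every feature in P, only its k lowest-distance matches in Q.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Match {
    int featureP;     // index of the feature in P (illustrative)
    int featureQ;     // index of the matched feature in Q (illustrative)
    double descDist;  // descriptor distance; lower is better
};

std::vector<Match> bestKPerFeature(std::vector<Match> matches, std::size_t k) {
    // Sort by P-feature, then by ascending descriptor distance.
    std::sort(matches.begin(), matches.end(), [](const Match& a, const Match& b) {
        return a.featureP != b.featureP ? a.featureP < b.featureP
                                        : a.descDist < b.descDist;
    });
    std::vector<Match> kept;
    std::size_t runLen = 0;
    int current = -1;
    for (const Match& m : matches) {
        if (m.featureP != current) { current = m.featureP; runLen = 0; }
        if (runLen++ < k) kept.push_back(m);   // first k matches of this feature
    }
    return kept;
}
```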

As explained in Section 4.1, the proposed probability-based RANSAC is especially useful for aligning two scans with a small overlap. This advantage is evident when aligning the scan pair $G_k$, which has a very small overlap (12%), as reported in Table 6, where $S_r = 95\%$ for the probabilistic RANSAC but $S_r = 65\%$ for the basic RANSAC. The probability-based RANSAC led to similar results in both $G_a$ and $G_e$, which have large overlaps, but with slightly better robustness in $G_e$. Note that $K_p = 5{,}000$ and $K_g = 100$ were used for $G_k$ in this experiment.
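The weighted selection at the heart of the probability-based RANSAC can be sketched with `std::discrete_distribution`, which draws indices with probability proportional to the supplied weights; the function below is an illustrative stand-in for the paper's selection step, not its actual code.

```cpp
// Sketch: draw candidate indices with probability proportional to a weight
// approximating each candidate's chance of matching across scans
// (the basic RANSAC would instead draw uniformly).
#include <cstddef>
#include <random>
#include <vector>

std::vector<std::size_t> weightedDraw(const std::vector<double>& weights,
                                      std::size_t count, std::mt19937& rng) {
    std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
    std::vector<std::size_t> chosen;
    chosen.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        chosen.push_back(pick(rng));   // sampled with replacement
    return chosen;
}
```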

Table 6 Performance comparison between the basic RANSAC and the probabilistic RANSAC

Figure 7 shows seven outdoor scans acquired at the Gare de Lyon train station in Paris, which were automatically aligned by applying our NPCS algorithm followed by ICP registration. These seven scans were paired by proximity to generate a set of scan pairs that were registered one after another. The generated pairs of scans have overlap rates of 39, 51, 31, 56, 25 and 29%. The first two of these seven scans comprise the tested scan pair $G_j$.

Figure 7

Automatically aligned point clouds of seven outdoor scans obtained by applying our NPCS algorithm followed by ICP registration: (left) rendered by colour (each scan is represented in a different colour); (right) rendered by blending colour images.

Furthermore, our algorithm was tested on two scans acquired from the same scene at different time instants with some changes in between, as shown in Figure 8: the two scans were acquired from almost the same position, before and after some people moved. In this example, the overlap rate is around 71%. Our algorithm always registered these two scans successfully and accurately (i.e., $S_r = 100\%$ and Δ = 2.86 cm).

Figure 8

An example of a scene environment that changed between two acquisition time instants: the first scan is rendered with its reflectance and the second one is rendered with a colour map based on its changes w.r.t. the first one.

Finally, we also compared our algorithm with the 4PCS algorithm [8] on five scan pairs $G_a$, $G_c$, $G_i$, $G_j$ and $G_k$. Initially, we extracted 2,000 points from each scan using our proposed voxel-based uniform sampling approach. To make the comparison fair, we ran both the 4PCS algorithm and our proposed NPCS one with N = 4 using 600 points randomly selected from the pre-extracted 2,000 points of each scan. Point surface normals (not available in $G_a$ and $G_c$) and the same normal difference thresholds were used in both algorithms. The estimated overlap rates of these five scan pairs shown in Figure 5 were used in 4PCS, except for $G_a$, where a 90% overlap rate was used. In NPCS, by contrast, the other parameters were fixed for all five tested scan pairs, as mentioned before. Table 7 shows the performance comparison of both algorithms. The 4PCS algorithm sometimes worked successfully on the first four tested scan pairs ($S_r$ = 30-90%), but always failed on $G_k$ ($S_r = 0\%$). In contrast, our NPCS algorithm almost always succeeded on the first four tested scan pairs and also succeeded on $G_k$ with a high success rate ($S_r = 60\%$). By utilizing the probability-based RANSAC, an even higher success rate can be obtained on $G_k$ ($S_r = 95\%$, see Table 6). Note that the same maximal estimation errors were used for all tested scan pairs in the 4PCS algorithm when computing the estimation success rate. The success of the 4PCS algorithm, however, depends to some extent on the provided estimate of the overlap rate of the two point sets. In our application, this overlap rate cannot be robustly approximated in advance. Thus, a feasible strategy would be to try different overlap rates until success, which obviously increases the overall execution time. In this comparison experiment, however, we directly used the overlap rates pre-estimated with the recovered ground truth transformations, except that a 90% overlap rate was used for $G_a$. Another drawback of the 4PCS algorithm is related to the co-planarity constraint applied to the 4-points bases. Although this greatly reduces the search space, it may mislead the algorithm into false transformation estimations, especially in case of a very low overlap rate. We also noticed that in this case the computation time of the 4PCS algorithm increases significantly, since no successful transformations can be found efficiently. Finally, the NPCS algorithm achieved a higher estimation accuracy than the 4PCS one, especially on $G_c$ and $G_j$, mainly thanks to our proposed QLCP measure, as reported in Table 3.

Table 7 Performance comparison of 4PCS and NPCS algorithms

8 Conclusions

In this paper, we presented NPCS registration, a robust and efficient approach for automatically aligning two overlapping point sets. Given two point sets P and Q, our algorithm randomly searches for sets of congruent groups of N-points in P and Q which lead to the best estimate of the transformation. This is achieved by employing a RANSAC-based algorithm, which can also estimate the matching probability of each point to drive the search towards likely successful congruent bases of N-points. This probabilistic RANSAC approach improves the registration robustness, especially when aligning two point sets with a small overlap. The search for congruent sets is performed efficiently by using a fast search codebook inspired by [20], which considerably reduces the execution time. Our proposed search method can efficiently combine different metric functions to match points and point pairs in a multidimensional search space, which can be defined by geometric and texture feature descriptors and by geometrical constraints on sets of sampled points. This makes our framework general and flexible. The efficient combination of feature descriptors can considerably improve the registration performance, accuracy, and robustness for models with known specific characteristics. Moreover, we proposed a method to extract texture features from an atlas of rectified images recovered by sampling the spherical field of view of the reflectance image. The resulting feature descriptors were less sensitive to spherical distortions, yielding more precise matches than those extracted from the original reflectance image.

To further improve the registration robustness, we introduced a new measure called QLCP, which is used to efficiently verify the rigid transformation estimated at each RANSAC iteration. The QLCP considers both the quantity and the quality of the overlapping point set, and improved the verification robustness w.r.t. the LCP measure utilized in the 4PCS algorithm [8].

Our proposed NPCS algorithm was evaluated on a variety of input data with varying amounts of noise, outliers, and extent of overlap (12-100%), acquired from small objects and from indoor and outdoor scenes. The experiments focused on three main aspects: robustness (successful estimation rate), accuracy (median estimation error), and efficiency (median estimation time). In almost all cases, our NPCS algorithm (executed with the same parameters) successfully aligned two point clouds in less than a minute, accurately recovering their rigid transformations (the estimation errors are mostly within 0.5-5‰ of the bounding box diagonal lengths). We tested our registration method with different sampling techniques and experimentally demonstrated the benefits of methods generating uniformly distributed point samples w.r.t. random-based sampling strategies. Our proposed voxel-based uniform sampling approach was robust and efficient in almost all tested cases. We showed that the proposed QLCP measure is more reliable than the LCP measure. Still, it is possible to obtain wrong estimations, e.g., the failure related to the scan pair $G_h$. Nevertheless, these wrong estimations can be successfully recovered by using texture-based features detected from the atlas of rectified reflectance images, as shown by our experiments, or by using other suitable geometric features such as those presented in [10, 13, 50]. In our experiments, NPCS registration performed on average better than the 4PCS method in terms of robustness and accuracy. In some experiments, in particular in the presence of large overlaps, the execution time of our algorithm was slightly higher than that of 4PCS. When aligning two scans with a small overlap, our NPCS algorithm took similar computation time (mostly less than a minute) to reach a good solution, whereas 4PCS sometimes took more than 10 min and still failed (e.g., on $G_k$).

Future work will focus on improving the performance of our method by exploiting parallel hardware and on improving the robustness of our QLCP measure for inspection applications, which typically require automatically aligning models captured at different times from scenes possibly containing changes.

Endnotes

a. Only the upper triangular part is stored in practice, due to the symmetry of the point pairs lookup table.

b. The demo application of the 4PCS algorithm [8] is available at http://graphics.stanford.edu/~niloy/research/fpcs/4PCS_demo.html.

Algorithm 1

Find the best $K_g$ N-points bases $\{\mathbf{q}^k\}_{k=1}^{K_g}$ in Q which are approximately congruent to the N-points base $\mathbf{p}$ in P.

$\mathcal{B} \leftarrow \{(q_1^k, q_2^k)\}_{k=1}^{K_p}$, i.e., the best $K_p$ point pairs in Q possibly matching the point pair $(p_1, p_2)$ of $\mathbf{p}$, according to the similarity score in Equation 3.

for i = 2 to N - 1 do

   case 1: $S \leftarrow \{(q_i^k, q_{i+1}^k)\}_{k=1}^{K_p}$, the best $K_p$ point pairs in Q possibly matching the point pair $(p_i, p_{i+1})$ of $\mathbf{p}$.

   case 2: collect $f(p_{i+1})$, the set of candidate points in Q possibly matching the point $p_{i+1}$.

   $\mathcal{B}' \leftarrow \emptyset$

   for each group $(q_1^m, \ldots, q_i^m)$ of points in $\mathcal{B} = \{(q_1^m, \ldots, q_i^m)\}_{m=1}^{|\mathcal{B}|}$ do

      case 1: group a point pair

         for each point pair $(q_i^k, q_{i+1}^k)$ in S do

            if $q_i^m = q_i^k$ and $s_f(p_{i+1}, q_{i+1}^k)\,\prod_{j=1}^{i-1} \prod_{k=1}^{K} s_k\big(m_k(p_j, p_{i+1}),\, m_k(q_j^m, q_{i+1}^k)\big) = 1$ then

               $\mathcal{B}' \leftarrow \{\mathcal{B}', (q_1^m, \ldots, q_i^m, q_{i+1}^k)\}$

            end if

         end for

      case 2: add a single point

         for each point $q_s \in f(p_{i+1})$ do

            if $\prod_{j=1}^{i} \prod_{k=1}^{K} s_k\big(m_k(p_j, p_{i+1}),\, m_k(q_j^m, q_s)\big) = 1$ then

               $\mathcal{B}' \leftarrow \{\mathcal{B}', (q_1^m, \ldots, q_i^m, q_s)\}$

            end if

         end for

   end for

   $\mathcal{B} \leftarrow \mathcal{B}'$

end for

return the best $K_g$ bases $\{\mathbf{q}^k\}_{k=1}^{K_g}$ obtained by filtering $\mathcal{B}$ according to the similarity score in Eq. (3)
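To make the congruency test in Algorithm 1 concrete, the sketch below checks whether a candidate point can extend a partial base, assuming Euclidean distance is the only pairwise metric $m_k$ and a fixed tolerance plays the role of the similarity score $s_k$; names and types are illustrative, not the paper's code.

```cpp
// Sketch of the "add a single point" expansion step: a partial base
// (q_1..q_i) in Q, matching p[0..i-1], is extended by `candidate` only if
// every pairwise distance between `candidate` and the already-selected
// points matches the corresponding distance in the base p within `eps`.
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::array<double, 3>;

static double dist(const Point& a, const Point& b) {
    return std::hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}

bool congruentExtension(const std::vector<Point>& p,        // base in P
                        const std::vector<Point>& partial,  // q_1..q_i in Q
                        const Point& candidate, double eps) {
    const std::size_t i = partial.size();  // candidate stands in for p[i]
    for (std::size_t j = 0; j < i; ++j)
        if (std::abs(dist(p[j], p[i]) - dist(partial[j], candidate)) > eps)
            return false;   // pairwise distance not congruent
    return true;
}
```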

References

  1. Hartley RI, Zisserman A: Multiple View Geometry in Computer Vision. 2nd edition. Cambridge University Press; 2004. ISBN 0521540518

  2. Yao J, Cham WK: Robust multi-view feature matching from multiple unordered views. Pattern Recognition 2007, 40(11):3081-3099. doi:10.1016/j.patcog.2007.02.011

  3. Pollefeys M, Van Gool L, Vergauwen M, Verbiest F, Cornelis K, Tops J, Koch R: Visual modeling with a hand-held camera. Int J Comput Vis 2004, 59(3):207-232.

  4. Zhang W, Yao J, Cham WK: 3D modeling from multiple images. The Seventh International Symposium on Neural Networks (ISNN 2010) 2010, 97-103.

  5. Besl PJ, McKay ND: A method for registration of 3-D shapes. IEEE Trans Pattern Anal Mach Intell 1992, 14(2):239-256. doi:10.1109/34.121791

  6. Rusinkiewicz S, Levoy M: Efficient variants of the ICP algorithm. International Conference on 3-D Digital Imaging and Modeling (3DIM) 2001.

  7. Matabosch C, Salvi J, Fofi D, Meriaudeau F: Range image registration for industrial inspection. Machine Vision Applications in Industrial Inspection XIII 2005, 216-227.

  8. Aiger D, Mitra NJ, Cohen-Or D: 4-points congruent sets for robust pairwise surface registration. ACM SIGGRAPH 2008.

  9. Fischler MA, Bolles RC: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 1981, 24(6):381-395.

  10. Gelfand N, Mitra NJ, Guibas LJ, Pottmann H: Robust global registration. Third Eurographics Symposium on Geometry Processing (SGP) 2005.

  11. Yu L, Zhang D, Holden E: A fast and fully automatic registration approach based on point features for multi-source remote-sensing images. Computers & Geosciences 2008, 34(7):838-848. doi:10.1016/j.cageo.2007.10.005

  12. Kang Z: Automatic registration of terrestrial point cloud using panoramic reflectance images. International Society for Photogrammetry and Remote Sensing 2008.

  13. Zaharescu A, Boyer E, Varanasi K, Horaud R: Surface feature detection and description with applications to mesh matching. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2009.

  14. Chalfant JS, Patrikalakis NM: Three-dimensional object registration using wavelet features. Engineering with Computers 2009, 25(3):303-318. doi:10.1007/s00366-009-0126-5

  15. Smith ER, Radke RJ, Stewart CV: Physical scale intensity-based range keypoints. 3D Data Processing, Visualization, and Transmission (3DPVT) 2010.

  16. Johnson AE, Hebert M: Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Trans Pattern Anal Mach Intell 1999, 21(5):433-449. doi:10.1109/34.765655

  17. Huber DF, Hebert M: Fully automatic registration of multiple 3D data sets. Image Vis Comput 2003, 21(7):637-650. doi:10.1016/S0262-8856(03)00060-X

  18. Makadia A, Patterson AI, Daniilidis K: Fully automatic registration of 3D point clouds. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2006.

  19. Stamos I, Leordeanu M: Automated feature-based range registration of urban scenes of large scale. IEEE Conference on Computer Vision and Pattern Recognition 2003.

  20. Yao J, Ruggeri MR, Taddei P, Sequeira V: Automatic scan registration using 3D linear and planar features. 3D Research 2010, 1(3):1-18.

  21. Chao C, Stamos I: Semi-automatic range to range registration: a feature-based method. International Conference on 3-D Digital Imaging and Modeling (3DIM) 2005.

  22. Dold C, Brenner C: Registration of terrestrial laser scanning data using planar patches and image data. International Society for Photogrammetry and Remote Sensing 2006.

  23. Chao C, Stamos I: Range image registration based on circular features. Proceedings of the International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT) 2006.

  24. Franaszek M, Cheok GS, Witzgall C: Fast automatic registration of range images from 3D imaging systems using sphere targets. Automation in Construction 2009, 18(3):265-274. doi:10.1016/j.autcon.2008.08.003

  25. Rabbani T, van den Heuvel F: Automatic point cloud registration using constrained search for corresponding objects. 7th Conference on Optical 3-D Measurement Techniques 2005.

  26. Silva L, Bellon OR, Boyer KL: Precision range image registration using a robust surface interpenetration measure and enhanced genetic algorithms. IEEE Trans Pattern Anal Mach Intell 2005, 27:762-776.

  27. Boughorbel F, Mercimek M, Koschan A, Abidi MA: A new method for the registration of three-dimensional point-sets: the Gaussian fields framework. Image Vis Comput 2010, 28(1):124-137. doi:10.1016/j.imavis.2009.05.003

  28. Gold S, Rangarajan A, Lu CP, Pappu S, Mjolsness E: New algorithms for 2D and 3D point matching: pose estimation and correspondence. Pattern Recognition 1998, 31(8):1019-1031. doi:10.1016/S0031-3203(98)80010-1

  29. Granger S, Pennec X: Multi-scale EM-ICP: a fast and robust approach for surface registration. Proc of the 7th European Conference on Computer Vision (ECCV '02), Part IV. Springer-Verlag, London; 2002:418-432.

  30. Liu Y: Automatic range image registration in the Markov chain. IEEE Trans Pattern Anal Mach Intell 2010, 32(1):12-29.

  31. Tamaki T, Abe M, Raytchev B, Kaneda K: Softassign and EM-ICP on GPU. International Conference on Networking and Computing (ICNC) 2010, 179-183.

  32. Juan L, Gwon O: A comparison of SIFT, PCA-SIFT and SURF. Int J Image Process (IJIP) 2009, 3(4):143-152.

  33. Arya S, Mount DM, Netanyahu NS, Silverman R, Wu AY: An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. ACM-SIAM Symposium on Discrete Algorithms 1994.

  34. Zhang Z, Faugeras OD: Determining motion from 3D line segment matches: a comparative study. Image Vis Comput 1991, 9:10-19. doi:10.1016/0262-8856(91)90043-O

  35. Gelfand N, Ikemoto L, Rusinkiewicz S, Levoy M: Geometrically stable sampling for the ICP algorithm. Fourth International Conference on 3D Digital Imaging and Modeling (3DIM) 2003.

  36. Brown BJ, Rusinkiewicz S: Global non-rigid alignment of 3-D scans. ACM Trans Graph 2007, 26(3):21. doi:10.1145/1276377.1276404

  37. Torsello A, Rodola E, Albarelli A: Sampling relevant points for surface registration. International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT) 2011.

  38. Nehab D, Shilane P: Stratified point sampling of 3D models. Proc of the Symposium on Point-Based Graphics 2004, 49-56.

  39. Ruggeri MR, Patanè G, Spagnuolo M, Saupe D: Spectral-driven isometry-invariant matching of 3D shapes. Int J Comput Vis 2010, 89:248-265. doi:10.1007/s11263-009-0250-0

  40. Bowers J, Wang R, Wei LY, Maletz D: Parallel Poisson disk sampling with spectrum analysis on surfaces. ACM Trans Graph 2010, 29:166:1-166:10.

  41. Pauly M, Gross M, Kobbelt LP: Efficient simplification of point-sampled surfaces. Proc of IEEE Visualization 2002, 163-170.

  42. Yao J, Taddei P, Ruggeri MR, Sequeira V: Complex and photo-realistic scene representation based on range planar segmentation and model fusion. Int J Robotics Res 2011, 30(10):1263-1283. doi:10.1177/0278364911410754

  43. Tuytelaars T, Mikolajczyk K: Local invariant feature detectors: a survey. Foundations and Trends in Computer Graphics and Vision 2008, 3(3):177-280.

  44. Lowe DG: Distinctive image features from scale-invariant keypoints. Int J Comput Vis 2004, 60(2):91-110.

  45. Bay H, Ess A, Tuytelaars T, Van Gool L: Speeded-up robust features (SURF). Comput Vis Image Underst 2008, 110(3):346-359. doi:10.1016/j.cviu.2007.09.014

  46. Rosten E, Drummond T: Machine learning for high-speed corner detection. European Conference on Computer Vision 2006, 1:430-443.

  47. Sinha SN, Frahm JM, Pollefeys M, Genc Y: GPU-based video feature tracking and matching. Technical report, Workshop on Edge Computing Using New Commodity Architectures 2006.

  48. Rosten E, Porter R, Drummond T: Faster and better: a machine learning approach to corner detection. IEEE Trans Pattern Anal Mach Intell 2010, 32(1):105-119.

  49. Harris C, Stephens M: A combined corner and edge detector. The Fourth Alvey Vision Conference 1988, 147-151.

  50. Albarelli A, Rodola E, Torsello A: Loosely distinctive features for robust surface alignment. European Conference on Computer Vision (ECCV 2010) 2010, 519-532.


Author information

Correspondence to Jian Yao.


Competing interests

The authors declare that they have no competing interests.


Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Cite this article

Yao, J., Ruggeri, M.R., Taddei, P. et al. Robust surface registration using N-points approximate congruent sets. EURASIP J. Adv. Signal Process. 2011, 72 (2011). https://doi.org/10.1186/1687-6180-2011-72