Robust surface registration using N-points approximate congruent sets
EURASIP Journal on Advances in Signal Processing volume 2011, Article number: 72 (2011)
Abstract
Scans acquired by 3D sensors are typically represented in a local coordinate system. When multiple scans taken from different locations represent the same scene, they must be registered to a common reference frame. We propose a fast and robust registration approach that automatically aligns two scans by finding two sets of N points that are approximately congruent under rigid transformation and that lead to a good estimate of the transformation between the corresponding point clouds. Given two scans, our algorithm randomly searches for the best sets of congruent groups of points using a RANSAC-based approach. To successfully and reliably align two scans when the overlap is small, we improve the basic RANSAC random selection step by employing a weight function that approximates the probability of each pair of points in one scan matching a pair in the other. The search time to find pairs of congruent sets of N points is greatly reduced by a fast search codebook based on both binary and multidimensional lookup tables. Moreover, we introduce a novel indicator of the overlapping region quality, which is used to verify the estimated rigid transformation and to improve the alignment robustness. Our framework is general enough to incorporate and efficiently combine different point descriptors derived from geometric and texture-based feature points or scene geometrical characteristics. We also present a method to improve the matching effectiveness of texture feature descriptors by extracting them from an atlas of rectified images recovered from the scan reflectance image. Our algorithm is robust with respect to different sampling densities and resilient to noise and outliers. We demonstrate its robustness and efficiency on several challenging scan datasets, with varying degrees of noise, outliers and overlap, acquired from indoor and outdoor scenarios.
1 Introduction
In the past decade, there has been growing interest in 3D reconstruction and realistic 3D modelling of large-scale scenes such as urban structures. Applications of such models include virtual reality, cultural heritage, urban planning, and architecture. Commonly, these applications require a combination of laser sensing technology with traditional digital photography.
Applications that employ only digital images extract 3D information using either a single moving camera or a multi-camera system, such as a stereo rig. In both cases, the system extracts and matches distinctive features (typically points) among the available images and estimates both their 3D positions and the camera parameters [1, 2]. It is then possible to exploit the result of this first step to perform a dense point reconstruction by estimating a depth map for each image [3, 4]. On one hand, these approaches are useful for applications requiring a robust and low-cost acquisition system. On the other hand, laser sensing technology yields much higher precision and resolution. Thus, laser sensing represents an effective and powerful tool for achieving accurate geometric representations of the complex surfaces of real scenes.
In recent years, 3D laser scanners providing satisfactory measurement accuracy for different applications have become commercially available. These sensors acquire a complex real scene through multiple scans taken from different positions, so as to fully describe the scene while reducing the number of occluded surfaces. For this reason, it is important to employ a systematic and automatic way to align, or register, multiple 3D scans in order to represent and visualize them in a common coordinate system. Geometrically, given a point cloud $\mathcal{Q}$ considered as reference and a second point cloud $\mathcal{P}$, the registration problem consists in finding the rigid transformation $\mathcal{T}$ that optimally aligns $\mathcal{P}$ to $\mathcal{Q}$ in its coordinate system.
1.1 Related works
The iterative closest point (ICP) algorithm [5] is the de facto standard for computing the rigid transformation $\mathcal{T}$ between two point clouds. It is essentially an optimization method that starts from an initial estimate of $\mathcal{T}$ and iteratively refines it by generating pairs of corresponding points on the scans and minimizing an error metric, e.g., the sum of squared distances between corresponding points. Although several variants of ICP have been presented [6] to improve its efficiency, the main problem remains obtaining a good initial estimate of $\mathcal{T}$, since the ICP optimization can easily get trapped in local minima.
The problem of automatically registering two scans has been addressed with a wide variety of methods [7]. Most of these extract sets of feature points, which are automatically matched to recover a good approximation of $\mathcal{T}$. Aiger et al. [8] proposed to automatically match congruent sets of four roughly coplanar points to solve the largest common pointset (LCP) problem. Congruent sets of points have similar shapes defined in terms of point distances and normal deviations. The best match between congruent sets is found randomly following the RANdom SAmple Consensus (RANSAC) approach [9]. Other approaches use shape descriptors to identify sets of candidate feature points to be matched. Gelfand et al. [10] use a 3D integral invariant shape descriptor to detect feature points, which are matched in sets of three using a branch-and-bound algorithm. Other interesting shape descriptors invariant under rigid transformation are used to identify feature points, such as scale invariant feature transform (SIFT) features [11, 12] or Harris corners [12] extracted from reflectance images, 3D SIFT-like descriptors extracted from triangle meshes approximating the point clouds [13], wavelet features [14], intensity-based range features [15], spin images [16, 17], and extended Gaussian images [18].
Methods that automatically recover the rigid transformation from matching sets of higher-level features have also been presented. The advantage of these approaches is the reduction of the search space to two small sets of features, which results in efficient matching, but at the cost of extra computation time for scene segmentation or feature detection. Among the feature types presented, the most interesting are lines [19, 20], planes [19–22], circles [23], spheres [24] and other fitted geometric primitives [25].
Other studies proposed to formulate the registration as an energy optimization problem that does not need any explicit set of point correspondences. Silva et al. [26] proposed to use an enhanced genetic algorithm to solve the range image registration problem using a robust surface interpenetration measure. Boughorbel et al. [27] defined an energy minimization function based on Gaussian fields to solve the 3D automatic registration.
The last relevant class of registration approaches is based on modelling the alignment of two point sets as an assignment problem, where the probability of a point in one set having a correspondence in the other set is estimated and maximized with expectation maximization (EM) algorithms. Popular methods following this approach are SoftAssign [28] and EM-ICP [29], both based on entropy maximization principles but imposing different constraints for the optimization, i.e., a two-way constraint embedded into the deterministic annealing scheme for SoftAssign and a one-way constraint for EM-ICP. A detailed review and analysis of these methods is provided in [30], where Liu proposed a method to overcome the limitations of SoftAssign and EM-ICP based on modelling the registration problem as a Markov chain of thermodynamic systems and on an entropy model derived from the Lyapunov function. Furthermore, fast GPU implementations of the SoftAssign and EM-ICP algorithms were recently presented by Tamaki et al. [31].
1.2 Our algorithm
Our method uses 3D points (possibly associated with point descriptors, as described in Section 6) to achieve automated registration. It automatically aligns two scans by finding two approximately congruent N-points sets that lead to a good estimate of the transformation $\mathcal{T}$ between the corresponding point clouds. $\mathcal{T}$ is then further refined via the ICP algorithm.
Given two scans $\mathcal{P}$ and $\mathcal{Q}$, our algorithm randomly searches for sets of congruent groups of points in $\mathcal{P}$ and $\mathcal{Q}$. Corresponding groups are then used to estimate a rigid transformation $\mathcal{T}$ aligning $\mathcal{P}$ to $\mathcal{Q}$. The optimal transformation is recovered following a RANSAC optimization [9], which iterates the following steps until a good solution is found or the number of iterations exceeds a predefined threshold I_{max}:

1) Random selection of an N-points base ${\mathcal{B}}_{p}$ in $\mathcal{P}$.

2) Approximate congruent group selection of N-points bases in $\mathcal{Q}$. The definition of approximate point set congruence is given in Section 2. This selection is achieved by using a general codebook to efficiently find approximately congruent point bases under rigid transformation, exploiting combinations of feature point descriptors when available (see Section 3).

3) Estimation of the transformation $\mathcal{T}$ between $\mathcal{P}$ and $\mathcal{Q}$ given the randomly selected N-points base ${\mathcal{B}}_{p}$ in $\mathcal{P}$ and each extracted approximately congruent N-points base ${\mathcal{B}}_{q}$ in $\mathcal{Q}$.

4) Verification of the transformation. $\mathcal{T}$ is verified using all possibly corresponding points after the alignment. The verification employs our proposed quality-based largest common pointset (QLCP) measure described in Section 4.4.
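The four steps above can be sketched as a generic loop. This is a schematic only: every helper (`select_base`, `find_congruent`, `estimate_T`, `qlcp`) is a caller-supplied stand-in for the components described in the later sections, not code from the paper.

```python
def ransac_register(P, Q, select_base, find_congruent, estimate_T, qlcp,
                    N=4, I_max=1000):
    """Schematic RANSAC loop over steps 1-4; all helpers are stand-ins."""
    best_T, best_score = None, float("-inf")
    for _ in range(I_max):
        Bp = select_base(P, N)                # 1) random N-points base in P
        for Bq in find_congruent(Bp, Q):      # 2) congruent bases in Q
            T = estimate_T(Bp, Bq)            # 3) candidate rigid transform
            score = qlcp(T, P, Q)             # 4) QLCP verification measure
            if score > best_score:
                best_T, best_score = T, score
    return best_T, best_score
```

The winning transformation would then be refined with ICP, as described below.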
The best transformation is selected as the one yielding the best QLCP measure and is then further refined with the ICP algorithm. As in [20], we present a variant of this RANSAC-based algorithm that improves the random selection step by employing a weight function approximating the probability of each pair of features in $\mathcal{P}$ being matched with one in $\mathcal{Q}$. We call this variant the probability-based RANSAC approach and describe it in Section 4.1.
Our algorithm is robust with respect to different sampling densities and the typical noise introduced by laser scanner acquisition. This is achieved by employing suitable point sampling approaches described in Section 5, and by using feature points and their descriptors to effectively constrain rigid transformations on noisy point sets.
Through our matching framework, presented in Sections 2 and 3, we efficiently match points and point pairs in a multidimensional space defined by a set of available geometric and texture feature descriptors and by geometrical constraints on a set of sampled points. The matching is performed by combining suitable metric functions to compare the provided descriptors.
Any type of feature equipped with a suitable distance function can be easily integrated into our matching framework. The major benefit of this approach is that it enables efficient customization for specific applications, significantly improving the registration performance in terms of robustness, accuracy, and execution time.
We also present a method to improve the matching effectiveness of texture features extracted from the typical spherical reflectance images acquired by laser scanners. It consists in extracting features from atlases of rectified perspective images constructed by sampling the spherical field of view of the reflectance image at suitable angles. This approach mitigates the effect of spherical distortion on the resulting feature signatures, so that they can be matched with higher reliability.
The robustness, accuracy and efficiency of our method were evaluated on several challenging scan datasets acquired from indoor and outdoor scenes, as described in Section 7.
2 Approximate point set congruence
Given two point sets $\mathcal{P}$ and $\mathcal{Q}$, we assume ${\mathcal{B}}_{p}={\left\{{p}_{i} \mid {p}_{i}\in \mathcal{P}\right\}}_{i=1}^{N}$ and ${\mathcal{B}}_{q}={\left\{{q}_{i} \mid {q}_{i}\in \mathcal{Q}\right\}}_{i=1}^{N}$ to be two corresponding N-points bases from $\mathcal{P}$ and $\mathcal{Q}$, respectively. This means that for each point ${p}_{i}\in {\mathcal{B}}_{p}$ there exists one and only one corresponding point ${q}_{i}\in {\mathcal{B}}_{q}$. We consider the two sets to be congruent if they are approximately similar in shape and have a similar distribution in 3D space. We define both a similarity score function and a binary similarity score function to measure the congruency of two matching N-points bases, as follows.
Given a point ${p}_{i}\in {\mathcal{B}}_{p}$, we characterize it using a set of L local descriptors, i.e., ${\left\{{f}_{l}\left({p}_{i}\right)\right\}}_{l=1}^{L}$. Similarly, for each pair of points (p_i, p_j) in ${\mathcal{B}}_{p}$, we define a set of K measurements ${\left\{{m}_{k}\left({p}_{i},{p}_{j}\right)\right\}}_{k=1}^{K}$. The L descriptors and the K measurements characterize an N-points base in terms of point features and point pair relations (see Figure 1). These values are then used to define the congruency of two different N-points bases ${\mathcal{B}}_{p}$ and ${\mathcal{B}}_{q}$.
For each type of local descriptor or point pair measurement, we define a similarity difference function d(·, ·) that is invariant under rigid transformation of each single N-points base. In particular, given two descriptors or measurements v_p and v_q, d(v_p, v_q) is a positive real value that states how different the two descriptors or measurements are.
We also define a set of boolean similarity measures

$$\left\{\, b\!\left(d\!\left(v_p, v_q\right),\, t\right) \,\right\}$$

where t is a threshold value associated with the particular feature descriptor or measurement and b(·, ·) is a boolean function defined as:

$$b(x, t) = \begin{cases} 1 & \text{if } x \le t \\ 0 & \text{otherwise.} \end{cases}$$
The set of functions {d(·, ·)} is then composed to define a similarity score ${s}_{c}\left({\mathcal{B}}_{p},{\mathcal{B}}_{q}\right)$ between two congruent N-points bases as follows:

$$s_c\left(\mathcal{B}_p, \mathcal{B}_q\right) = s_c^f\left(\mathcal{B}_p, \mathcal{B}_q\right) + s_c^m\left(\mathcal{B}_p, \mathcal{B}_q\right)$$

where ${s}_{c}^{f}$ is the term related to the local descriptors and ${s}_{c}^{m}$ is the term related to the similarity measures of point pairs. These two terms are weighted sums of the per-descriptor and per-measurement similarities, where $\left\{{w}_{l}^{f}\right\}$ and $\left\{{w}_{k}^{m}\right\}$ are user-defined weights, and ${N}_{f}=2N{\sum}_{l=1}^{L}{w}_{l}^{f}$ and ${N}_{m}=2N\left(N-1\right){\sum}_{k=1}^{K}{w}_{k}^{m}$ are normalization factors. Notice that s_c is defined such that its values fall in the range [0, 1], where a higher value indicates a higher similarity between the two N-points bases.
Similarly, we define the binary similarity score $s\left({\mathcal{B}}_{p},{\mathcal{B}}_{q}\right)$ between two N-points bases as:

$$s\left(\mathcal{B}_p, \mathcal{B}_q\right) = \prod_{i=1}^{N}\prod_{l=1}^{L} b\!\left(d\!\left(f_l(p_i), f_l(q_i)\right),\, t_l^f\right) \cdot \prod_{i<j}\prod_{k=1}^{K} b\!\left(d\!\left(m_k(p_i,p_j), m_k(q_i,q_j)\right),\, t_k^m\right)$$

i.e., the product of all boolean similarities associated with the matching points of the two sets. We consider ${\mathcal{B}}_{p}$ and ${\mathcal{B}}_{q}$ to be approximately congruent only if $s\left({\mathcal{B}}_{p},{\mathcal{B}}_{q}\right)=1$.
In order to evaluate the N-points base congruence, we need to define which local point descriptors {f_l(·)} and point pair measurement functions {m_k(·, ·)} to employ, together with their corresponding similarity differences {d(·, ·)} and scalar thresholds {t}. As first point pair measurement we consider the Euclidean distance $m_1(p_i, p_j) = \|p_i - p_j\|$ and define its similarity difference as:

$$d\!\left(m_1(p_i,p_j),\, m_1(q_i,q_j)\right) = \left|\, m_1(p_i,p_j) - m_1(q_i,q_j) \,\right|$$
If the surface normal at each point is available, we define the second point pair measurement m_2(p_i, p_j) = n_{angle}(n(p_i), n(p_j)), where n(p) denotes the surface normal of the point p and n_{angle}(·, ·) denotes the minimal angle between the two surface normals. Its similarity difference is defined as:

$$d\!\left(m_2(p_i,p_j),\, m_2(q_i,q_j)\right) = \left|\, m_2(p_i,p_j) - m_2(q_i,q_j) \,\right|$$
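A minimal sketch of the two measurements m_1 and m_2 and their scalar similarity differences, together with the boolean test b(·, ·); function names are illustrative and points/normals are plain tuples of floats:

```python
import math

def m1(p, q):
    """Euclidean distance between two 3D points."""
    return math.dist(p, q)

def m2(n_p, n_q):
    """Angle in [0, pi] between two unit surface normals."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(n_p, n_q))))
    return math.acos(dot)

def sim_diff(v_p, v_q):
    """Similarity difference for scalar measurements: absolute gap."""
    return abs(v_p - v_q)

def b(v_p, v_q, t):
    """Boolean similarity: 1 iff the difference is within threshold t."""
    return 1 if sim_diff(v_p, v_q) <= t else 0
```

Both measurements are invariant under a rigid transformation of the base, which is what makes them usable for congruence tests.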
If the reflectance or colour images associated with the range scans are available we can extract the corresponding feature points (e.g., SIFT or SURF feature points [32]) associated with each 3D point of ${\mathcal{B}}_{p}$ and ${\mathcal{B}}_{q}$. The corresponding local feature descriptors can be used to define a suitable similarity difference.
In some applications, it is possible to exploit information about the environment to define additional descriptors. This is the case for scans representing structural scenes with one common main normal direction (ground floor scene) or with three common orthogonal normal directions (orthogonal scene). For instance, in an indoor/outdoor scene with a common main ground floor plane, all points lying on the ground plane have roughly the same normal direction. The type of structural scene can be automatically detected and classified by clustering the surface normals.
In the case of a ground floor scene, we initially transform all points in $\mathcal{P}$ and $\mathcal{Q}$ to align their corresponding ground floor normals to the z-axis. Then, for each point p = (p^x, p^y, p^z)^⊤ with surface normal n(p), an additional local descriptor can be defined as $f_z(p) = n_{angle}(n(p), n_z)$, where n_z denotes the direction of the z-axis, i.e., n_z = (0, 0, 1)^⊤. f_z(p) represents the inclination of the surface passing through p w.r.t. the ground. Its similarity difference is defined as:

$$d\!\left(f_z(p_i),\, f_z(q_i)\right) = \left|\, f_z(p_i) - f_z(q_i) \,\right|$$
In addition, we can introduce another point pair measurement $m_z\left(p_i, p_j\right) = p_i^z - p_j^z$, i.e., the height difference between the two points. Its corresponding similarity difference can be defined as:

$$d\!\left(m_z(p_i,p_j),\, m_z(q_i,q_j)\right) = \left|\, m_z(p_i,p_j) - m_z(q_i,q_j) \,\right|$$
If both $\mathcal{P}$ and $\mathcal{Q}$ are acquired from an orthogonal scene, they are first transformed to align their three orthogonal point normals to the x-, y- and z-axis, respectively. Then, we exploit two more local descriptors defined as $f_x(p) = n_{angle}(n(p), n_x)$ and $f_y(p) = n_{angle}(n(p), n_y)$, where n_x = (1, 0, 0)^⊤ and n_y = (0, 1, 0)^⊤. These descriptors represent the inclinations of the surface passing through p w.r.t. the additional main axes n_x and n_y, respectively. In addition, we introduce two other point pair measurements $m_x\left(p_i, p_j\right) = p_i^x - p_j^x$ and $m_y\left(p_i, p_j\right) = p_i^y - p_j^y$. The corresponding similarity differences of f_x, f_y, m_x and m_y are defined in the same way as in Equations (9) and (10), respectively.
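The ground-floor descriptors reduce to elementary geometry once the cloud is rotated so that the ground normal coincides with z. A short sketch (names illustrative; normals are tuples, not necessarily unit length):

```python
import math

def f_z(normal):
    """Inclination w.r.t. the ground: angle between the surface normal at
    a point and the z-axis. Assumes the cloud has already been rotated so
    that the ground normal coincides with z."""
    nx, ny, nz = normal
    n = math.sqrt(nx * nx + ny * ny + nz * nz)
    return math.acos(max(-1.0, min(1.0, nz / n)))

def m_z(p_i, p_j):
    """Height difference of a point pair along the z-axis."""
    return p_i[2] - p_j[2]
```

For an orthogonal scene, f_x, f_y, m_x and m_y follow the same pattern with the other two axes.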
3 Fast search codebook
Using the criteria defined in the previous section, we are able to evaluate the congruency of two given N-points bases. To perform the registration, we need to couple an N-points base ${\mathcal{B}}_{p}\in \mathcal{P}$ with an N-points base ${\mathcal{B}}_{q}\in \mathcal{Q}$ having a high similarity score. This task requires a search over all possibly congruent N-points bases in $\mathcal{Q}$; exhaustive search is impractical due to the large number of candidates. To solve this problem, we build a codebook from $\mathcal{P}$ and $\mathcal{Q}$ composed of two data structures used to perform a fast search of possibly corresponding points (Section 3.1) and point pairs (Section 3.2). In particular, we employ a boolean table S_f used to detect candidate point matches in $\mathcal{Q}$ of a selected point ${p}_{i}\in \mathcal{P}$, and a multidimensional table S_m used to detect candidate point pair matches of a selected pair of points $\left({p}_{i},{p}_{j}\right)\in \mathcal{P}$. If the number of detected congruent N-points bases is still large, we further compute a similarity score between ${\mathcal{B}}_{p}$ and each detected base in $\mathcal{Q}$ in order to sort them and consider only the best ones.
The codebook is thus composed of a boolean m × n table and a floating-point n × n × K table^{a}, where $m=\left|\mathcal{P}\right|$, $n=\left|\mathcal{Q}\right|$ and K denotes the number of point pair measurements used. The required memory for S_f grows as O(mn) and for S_m as O(n^2) (assuming K ≪ n).
Our algorithm detects candidate congruent N-points bases incrementally. Given ${\mathcal{B}}_{p}\in \mathcal{P}$, we start by selecting two points in ${\mathcal{B}}_{p}$ and collect all congruent 2-points bases in $\mathcal{Q}$. We then iteratively add points to the current selection and grow the set of candidate bases until we reach a set of N-points bases.
3.1 Point features lookup table
We build S_f as an m × n boolean similarity measure table according to the used local descriptors, where $m=\left|\mathcal{P}\right|$ and $n=\left|\mathcal{Q}\right|$, i.e., the sizes of $\mathcal{P}$ and $\mathcal{Q}$, respectively. Each element in S_f is defined as:

$$S_f(i, j) = \prod_{l=1}^{L} b\!\left(d\!\left(f_l(p_i), f_l(q_j)\right),\, t_l^f\right)$$

where ${p}_{i}\in \mathcal{P}$ and ${q}_{j}\in \mathcal{Q}$, i = 1...m, and j = 1...n. Thus, given a point ${p}_{i}\in \mathcal{P}$, we can recover all possibly matching points in $\mathcal{Q}$ by considering the i-th row of S_f. The set of candidate matches in $\mathcal{Q}$ for p_i is represented by:

$$\mathcal{M}_f\left(p_i\right) = \left\{\, q_j \in \mathcal{Q} \mid S_f(i, j) = 1 \,\right\}$$
Notice that, to build S_{ f }, we only make use of the local feature descriptors. Its size depends on the number of points of both point clouds. In Section 5, we describe several techniques to sample the input acquisitions in order to reduce their sizes.
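A minimal sketch of building S_f and reading off the candidate set $\mathcal{M}_f(p_i)$; here each point carries a tuple of scalar descriptors compared by absolute difference, which is an illustrative simplification of the general b(d(·,·), t) test:

```python
def build_Sf(P_desc, Q_desc, thresholds):
    """Boolean m x n table: S_f[i][j] = 1 iff every local descriptor of
    p_i matches the corresponding descriptor of q_j within its threshold.
    P_desc[i][l] holds the l-th (scalar) descriptor of p_i."""
    Sf = []
    for dp in P_desc:
        row = []
        for dq in Q_desc:
            ok = all(abs(a - c) <= t for a, c, t in zip(dp, dq, thresholds))
            row.append(1 if ok else 0)
        Sf.append(row)
    return Sf

def matches_of(Sf, Q, i):
    """M_f(p_i): candidate matches of p_i in Q, read off row i of S_f."""
    return [Q[j] for j, v in enumerate(Sf[i]) if v == 1]
```

Retrieval of all candidates for a point is then a single row scan, which is what makes the table fast to query.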
3.2 Point pairs lookup table
Given a point pair $\left({p}_{i},{p}_{j}\right)\in \mathcal{P}$, we need to efficiently find candidate matching pairs in $\mathcal{Q}$. A blind exhaustive search would require a comparison with $\frac{n\left(n-1\right)}{2}$ point pairs. To reduce the search time, we build a lookup table S_m for $\mathcal{Q}$ by uniformly quantising the K-dimensional space formed by the K point pair measurements {m_k(·, ·)} used, i.e., the Euclidean distance, the surface normal minimal angle, the gap difference(s) along the x-, y- or z-axis for structural scenes, etc. The quantisation is achieved by uniformly dividing their corresponding value ranges into B_1, B_2, ..., B_K bins, respectively. The range of the Euclidean distance is $\left[{d}_{min}^{q},{d}_{max}^{q}\right]$, where ${d}_{min}^{q}$ and ${d}_{max}^{q}$ denote the minimal and maximal distances of point pairs in $\mathcal{Q}$. The surface normal minimal angle falls in the range [0, π]. The gap differences fall in the ranges [−b_x, b_x], [−b_y, b_y] and [−b_z, b_z], respectively, where b_x, b_y and b_z represent the lengths along the x-, y- and z-axis of the minimal bounding box covering all points in $\mathcal{Q}$. Each K-dimensional bin contains all point pairs $\left({q}_{i},{q}_{j}\right)\in \mathcal{Q}$ whose measurements {m_k(q_i, q_j)} fall within the bin ranges.
In order to detect the matching point pairs of (p_i, p_j), we initially evaluate the set of measurements {m_k(p_i, p_j)}. We then consider the thresholds $\left\{{t}_{k}^{m}\right\}$ associated with each measurement function in order to estimate a set of ranges $\left\{\left({m}_{k}\left({p}_{i},{p}_{j}\right)-{t}_{k}^{m},\ {m}_{k}\left({p}_{i},{p}_{j}\right)+{t}_{k}^{m}\right)\right\}$. We select all K-dimensional bins of S_m that are covered or partially covered by the estimated set of ranges and recover the associated point pairs of $\mathcal{Q}$. In particular, point pairs that belong to partially covered bins are checked by verifying whether their measurements fall within the estimated set of ranges. Each extracted candidate matching pair (q_i, q_j) is further verified by exploiting the point feature table S_f to keep only pairs whose point features correspond. In particular, we test that:

$$S_f\left(p_i, q_i\right) = 1 \quad \text{and} \quad S_f\left(p_j, q_j\right) = 1$$
Finally, using Equation 3, we evaluate the similarity score of each remaining candidate pair with (p_i, p_j) and keep only the best K_p pairs. For very distinctive points p_i and p_j, there are few correspondences in $\mathcal{Q}$ with similar local features. For such point pairs, it is more convenient to first select the set of matching points using S_f and then verify each point pair using the set of measurements {m_k(p_i, p_j)}. This initial test is conducted by evaluating the value of $\left|{\mathcal{M}}_{f}\left({p}_{i}\right)\right| \times \left|{\mathcal{M}}_{f}\left({p}_{j}\right)\right|$, i.e., the largest number of candidate point pair matches w.r.t. p_i and p_j due to their local features. When this value is lower than a threshold, we employ this latter selection method. Our codebook-based search method allows one to efficiently range-search candidate matching point pairs using adaptive ranges for each query. If we regard the K point pair difference measurements as a K-dimensional vector, other fast search methods could be used, e.g., the approximate nearest neighbor search based on kd-trees [33]. However, these methods cannot handle the threshold constraints in each dimension, which may produce more candidates to test while discarding valid ones.
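The binned lookup can be sketched as a K-dimensional grid hash: point pairs of $\mathcal{Q}$ are bucketed by their quantised measurement vector, and a query collects every bucket overlapped by the per-dimension ranges [v_k − t_k, v_k + t_k]. All names are illustrative, and pairs from partially covered bins would still need the exact range check described above:

```python
from collections import defaultdict
from itertools import combinations, product

def build_Sm(Q, measures, ranges, bins):
    """Hash every point pair of Q into a K-dimensional uniform grid over
    the measurement space (a sketch of the multidimensional table S_m)."""
    def cell(vals):
        # per-dimension uniform quantisation, clamped to valid bin indices
        return tuple(
            min(b - 1, max(0, int((v - lo) / (hi - lo) * b)))
            for v, (lo, hi), b in zip(vals, ranges, bins))
    table = defaultdict(list)
    for i, j in combinations(range(len(Q)), 2):
        table[cell([m(Q[i], Q[j]) for m in measures])].append((i, j))
    return table, cell

def candidate_pairs(table, cell, vals, thresholds):
    """Gather pairs from every bin overlapping [v_k - t_k, v_k + t_k]."""
    lo = cell([v - t for v, t in zip(vals, thresholds)])
    hi = cell([v + t for v, t in zip(vals, thresholds)])
    out = []
    for c in product(*[range(a, b + 1) for a, b in zip(lo, hi)]):
        out.extend(table.get(c, []))
    return out
```

Unlike a kd-tree nearest-neighbor query, this retrieval respects an independent threshold per dimension, which matches the per-measurement thresholds of the congruence test.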
3.3 Iterative search of matching Npoints bases
Finding the best corresponding base of an N-points base ${\mathcal{B}}_{p}\in \mathcal{P}$ would require testing O(n^N) N-points bases in $\mathcal{Q}$ with a blind exhaustive search, which is often impractical due to the size of the search space. To efficiently search approximately congruent N-points bases in $\mathcal{Q}$ given a base ${\mathcal{B}}_{p}\in \mathcal{P}$, we employ an iterative approach that makes use of the codebook defined in the previous sections. We start by selecting a query set ${\mathcal{B}}_{i}$ composed of a point pair of ${\mathcal{B}}_{p}$ and search candidate congruent 2-points bases using S_f and S_m. We then iteratively add points of ${\mathcal{B}}_{p}$ to ${\mathcal{B}}_{i}$ and build the corresponding candidate congruent bases, by grouping point pairs or adding single points to the previous candidate bases, until ${\mathcal{B}}_{i}$ corresponds to ${\mathcal{B}}_{p}$ and all candidate bases are N-points bases. Algorithm 1 describes the procedure in detail, which is also illustrated in Figure 2. In Algorithm 1, we describe two approaches to gradually expand the size of a congruent base in $\mathcal{Q}$. The first is to add an approximately congruent point pair having a common point with a previous base and satisfying the used congruence constraints. The second is to add a single point from ${\mathcal{M}}_{f}\left({p}_{i+1}\right)$ according to the used congruence constraints. Since it is difficult to select the best approach, we use a simple strategy based on the product of set sizes $\left|{\mathcal{M}}_{f}\left({p}_{i}\right)\right| \times \left|{\mathcal{M}}_{f}\left({p}_{i+1}\right)\right|$: if this product is large, we use the former method, otherwise the latter.
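A much-simplified sketch of the single-point expansion path of this iterative search: candidate bases in $\mathcal{Q}$ are grown one point of ${\mathcal{B}}_{p}$ at a time, and a new point survives only if every pair it forms stays congruent with the corresponding pair in ${\mathcal{B}}_{p}$. Here `candidates_of` stands in for the S_f lookup and `pair_ok` for the S_m/threshold checks; this is not Algorithm 1 itself:

```python
def grow_congruent_bases(Bp, candidates_of, pair_ok, K_cap=64):
    """Iteratively extend candidate congruent bases in Q, one point of
    Bp at a time (simplified sketch of the paper's Algorithm 1)."""
    bases = [(q,) for q in candidates_of(Bp[0])]
    for i in range(1, len(Bp)):
        grown = []
        for base in bases:
            for q in candidates_of(Bp[i]):
                if q in base:
                    continue
                # every pair (Bp[j], Bp[i]) must remain congruent with
                # the corresponding pair (base[j], q)
                if all(pair_ok(Bp[j], Bp[i], base[j], q) for j in range(i)):
                    grown.append(base + (q,))
        bases = grown[:K_cap]  # keep the candidate set bounded
    return bases
```

With a distance-preservation test as `pair_ok`, only bases with the same pairwise distances as ${\mathcal{B}}_{p}$ survive the expansion.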
4 RANSAC pose optimization
To find the best transformation $\mathcal{T}$ that aligns the two point sets $\mathcal{P}$ and $\mathcal{Q}$, we employ a variant of the RANSAC algorithm [9], a widely used general technique for robust fitting of models to data corrupted with noise and outliers. The RANSAC-based alignment procedure is straightforward: randomly pick a base ${\mathcal{B}}_{p}$ of N non-collinear points from $\mathcal{P}$; detect the corresponding best congruent bases ${\left\{{\mathcal{B}}_{q}^{k}\right\}}_{k=1}^{{K}_{g}}$ and for each one compute the candidate transformation that aligns points in ${\mathcal{B}}_{p}$ with points in ${\mathcal{B}}_{q}^{k}$; finally, verify the recovered transformations and detect the best one using a best-fit criterion. To achieve a certain probability of success, this procedure is repeated for different choices of bases from $\mathcal{P}$. Over all trials in all iterations, we select the transformation $\mathcal{T}$ with the best fit measure. Our RANSAC algorithm terminates when one of the following two conditions is reached:

1. The number of iterations reaches a predefined maximum I_{max};

2. The best transformation has not been updated for I_{nou} consecutive iterations.
Our method makes use of the codebook-based search scheme defined in Section 3, which is constructed before the optimization. The following sections describe in detail each step of the RANSAC iteration.
4.1 Random selection
Assuming that k points in $\mathcal{P}$ have corresponding points in $\mathcal{Q}$, the probability of selecting N points from $\mathcal{P}$ that all have correspondences in $\mathcal{Q}$ is p(N) ≈ (k/m)^{N}, where $m=\left|\mathcal{P}\right|$. To successfully recover the transformation we generally employ a base size of N = 3, 4 or 5 points, because the probability of success decreases rapidly as the base size N increases. Moreover, to make the estimated transformation more robust, we select the N-points base ${\mathcal{B}}_{p}$ from $\mathcal{P}$ as decentralized as possible in 3D space.
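The cost of larger bases can be made concrete with the standard RANSAC trial-count bound combined with the approximation p(N) ≈ (k/m)^N from the text: reaching confidence P requires about log(1 − P)/log(1 − p(N)) iterations. A small worked sketch (the bound is textbook RANSAC, not a formula from this paper):

```python
import math

def iterations_needed(inlier_ratio, N, confidence=0.99):
    """Trials required so that, with the given confidence, at least one
    drawn N-points base is all-inlier, using p(N) ~ (k/m)**N."""
    p = inlier_ratio ** N
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))
```

For example, with half the points having correspondences (k/m = 0.5), N = 3 needs 35 trials at 99% confidence, while N = 5 already needs 146, illustrating why small base sizes are preferred.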
Notice that when the overlap between two scans is small, only a very small subset of points in $\mathcal{P}$ has corresponding points in $\mathcal{Q}$. In this case, the probability of selecting, with a uniform distribution, an N-points base in $\mathcal{P}$ having a corresponding N-points base in $\mathcal{Q}$ is very low. To improve the selection probability, we propose the following probability-based RANSAC approach. We initially build an m × m pairwise matching probability table S_p for all point pairs in $\mathcal{P}$. Given a point pair (p_i, p_j) in $\mathcal{P}$, its matching probability is defined by
where C is a normalization factor and ${\left\{{\phi}_{i}\right\}}_{i=1}^{3}$ are three positive constants. $\left({\tilde{q}}_{i},{\tilde{q}}_{j}\right)$ is the point pair of $\mathcal{Q}$ matching (p_i, p_j) with the best similarity score in Equation 3. Notice that if no approximately congruent match $\left({\tilde{q}}_{i},{\tilde{q}}_{j}\right)$ is found, we set the probability to zero, i.e., S_p(p_i, p_j) = 0. This probability is high if:

1. both points p_i and p_j potentially have several matches in $\mathcal{Q}$ based on their local descriptors (see Equation 12);

2. they are well spaced; and

3. there exists a 2-points base in $\mathcal{Q}$ very similar to (p_i, p_j) according to the similarity measure s_c(·, ·) defined in Equation 3.
The selection of an N-points base ${\mathcal{B}}_{p}={\left\{{p}_{{s}_{k}}\right\}}_{k=1}^{N}$ from $\mathcal{P}$ proceeds iteratively by adding points to a selected point set ${\mathcal{B}}_{p}^{c}$, as follows:

1. We randomly select the first two points $\left({p}_{{s}_{1}},{p}_{{s}_{2}}\right)$ based on the probability values {S_p(p_i, p_j)}_{i>j, 1≤i,j≤m} of the upper triangular part of the symmetric pairwise matching probability table S_p. These points are added to the initially empty ${\mathcal{B}}_{p}^{c}$.

2. Each next point ${p}_{{s}_{k+1}}$, k ≥ 2, is randomly selected based on the joint probability values ${\left\{{\prod}_{{p}_{{s}_{k}}\in {\mathcal{B}}_{p}^{c}}{S}_{p}\left({p}_{i},{p}_{{s}_{k}}\right)\right\}}_{{p}_{i}\in \mathcal{P},\ {p}_{i}\notin {\mathcal{B}}_{p}^{c}}$.
In this way, there is a high probability of selecting an N-points base ${\mathcal{B}}_{p}$ with corresponding points in $\mathcal{Q}$.
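A minimal sketch of this probability-driven base selection, assuming a symmetric table `Sp` indexed by point indices (all names illustrative). The first pair is drawn with weights from S_p, and each further point with weight proportional to the product of its pairwise probabilities with the points already selected:

```python
import math
import random

def select_base(num_points, Sp, N, rng=random.Random(0)):
    """Probability-based selection sketch: draw the first pair weighted by
    S_p, then add points by joint pairwise probability."""
    pairs = [(i, j) for i in range(num_points)
             for j in range(i + 1, num_points)]
    i, j = rng.choices(pairs, weights=[Sp[a][b] for a, b in pairs])[0]
    base = [i, j]
    while len(base) < N:
        rest = [x for x in range(num_points) if x not in base]
        joint = [math.prod(Sp[x][s] for s in base) for x in rest]
        base.append(rng.choices(rest, weights=joint)[0])
    return base
```

Pairs with zero probability (no approximately congruent match found) are simply never drawn, which concentrates the sampling on the likely overlap region.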
At the end of a RANSAC iteration, if we successfully recover a candidate transformation $\mathcal{T}$ using the selected N-points base, the probability table S_p is updated. In particular, we update the elements corresponding to points ${p}_{{s}_{k}}\in {\mathcal{B}}_{p}$ by suitably increasing the probability values: for each point ${p}_{i}\in \mathcal{P}$, the new probability value ${S}_{p}^{\prime}\left({p}_{i},{p}_{{s}_{k}}\right)$ (and its symmetric value ${S}_{p}^{\prime}\left({p}_{{s}_{k}},{p}_{i}\right)$) is evaluated as
where ${f}_{\mathsf{\text{QLCP}}}\left(\mathcal{T},\delta \right)$ is the transformation fitting criterion described in Section 4.4 and ν is a positive constant. The update of probabilities increases the chance to select good samples in $\mathcal{P}$, which is very useful for aligning two scans with a small overlap. To avoid unbalanced values in S_{ p }, we decrease the probabilities of some elements if these have been updated too frequently during the RANSAC iterations. In particular, we decrease the probability value as follows:
where ψ ∈ (0, 1) is a positive constant.
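A minimal sketch of the update and damping of S_{ p }: the additive increase by ν·f_QLCP and the multiplicative decrease by ψ after a fixed number of updates are assumptions standing in for the paper's exact update equations, which are not reproduced in this excerpt:

```python
def update_S_p(S_p, base, f_qlcp, nu=0.1, psi=0.5, max_updates=5, counts=None):
    """Illustrative update of the pairwise probability table S_p after a
    successful RANSAC iteration (additive/multiplicative form assumed).
    """
    m = len(S_p)
    if counts is None:                       # per-entry update counters
        counts = [[0] * m for _ in range(m)]
    for s in base:                           # base points of the recovered T
        for i in range(m):
            if i == s:
                continue
            # increase: reward pairs involving successful base points
            S_p[i][s] = S_p[s][i] = S_p[i][s] + nu * f_qlcp
            counts[i][s] += 1
            counts[s][i] += 1
            # decrease: damp entries updated too frequently
            if counts[i][s] > max_updates:
                S_p[i][s] = S_p[s][i] = psi * S_p[i][s]
    return S_p, counts
```

The damping keeps the table from being dominated by a few frequently rewarded pairs, preserving exploration across RANSAC iterations.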
4.2 Approximate congruent group selection
After selecting an N-points base ${\mathcal{B}}_{p}$ from $\mathcal{P}$, we need to detect a set of approximately congruent N-points bases. This is done by exploiting the fast codebook structures S_{ f } and S_{ m } defined in Section 3. In particular, following Algorithm 1, we iteratively recover the set of congruent N-points bases as we select points of ${\mathcal{B}}_{p}$ from $\mathcal{P}$. We keep only the first K_{ g } candidates according to Equation 3. These K_{ g } N-points bases ${\left\{{\mathcal{B}}_{q}^{k}\right\}}_{k=1}^{{K}_{g}}$ are used in the following step to estimate a set of point cloud transformations.
4.3 Transformation estimation
Given two point sets $\mathcal{P}$ and $\mathcal{Q}$ with overlapping regions in arbitrary initial positions, we recover the transformation from a prescribed family of transformations, typically rigid transformations, that best aligns the overlapping regions of $\mathcal{P}$ and $\mathcal{Q}$. In the case of rigid transformations, a base size of at least three points is needed to uniquely determine the aligning transformation. This means that our algorithm requires at least a pair of matching 3-points bases from $\mathcal{P}$ and $\mathcal{Q}$, respectively. In particular, for any given pair of N-points bases ${\mathcal{B}}_{p}$ and ${\mathcal{B}}_{q}$, we recover the corresponding transformation $\mathcal{T}$ using the closed-form solution [34].
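For intuition, the closed-form estimation step can be sketched with the SVD-based Kabsch solution; the cited closed-form method [34] may differ in formulation (e.g., quaternion-based), but it yields the same optimal least-squares rigid transformation for matched point sets:

```python
import numpy as np

def rigid_transform(P, Q):
    """Closed-form least-squares rigid alignment of matched point sets.

    P, Q: (N, 3) arrays of corresponding points. Returns (R, t) such that
    Q is approximately P @ R.T + t, i.e., q_i = R p_i + t.
    """
    P = np.asarray(P, float)
    Q = np.asarray(Q, float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)      # centroids
    H = (P - cp).T @ (Q - cq)                    # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # reflection guard: force det(R) = +1 so R is a proper rotation
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```

With exact correspondences the recovered (R, t) reproduces the generating transformation; with noisy matches it minimizes the summed squared residuals.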
4.4 Transformation verification
To determine the best transformation, Aiger et al. [8] employ a best-fit criterion called the largest common point-set (LCP) measure ${f}_{\mathsf{\text{LCP}}}\left(\mathcal{T},\delta \right)$, which selects the transformation bringing the maximum number of points from $\mathcal{P}$ to within some δ-distance of points in $\mathcal{Q}$. Unfortunately, this criterion completely depends on the choice of the distance threshold δ. On the one hand, if this threshold is too large, wrong transformations may obtain large LCP measure values. On the other hand, if δ is too small, in some cases no transformation can be found. The main problem is that the LCP measure only considers the quantity of matched points in overlapping regions, but not their matching quality. To alleviate this problem, we propose to integrate a suitable matching quality measure into the LCP measure. Suppose that we have two transformations ${\mathcal{T}}_{a}$ and ${\mathcal{T}}_{b}$ computed from two different selected N-points congruent group pairs and resulting in the same LCP measure under the same distance threshold δ. Assume that the histograms of the point distances of the two transformations correspond to the ones shown in Figure 3a, b, respectively. Intuitively, ${\mathcal{T}}_{b}$ is a better solution than ${\mathcal{T}}_{a}$ because most of the corresponding point pairs have shorter distances in Figure 3b than in Figure 3a, i.e., the mean distance of the corresponding point pairs in Figure 3b is smaller than that in Figure 3a. We expect that better point matches (with shorter distances) result in a better transformation. We thus define a suitable matching score based on a normalized accumulated histogram ${\mathscr{H}}_{n}\left(\mathcal{T},\delta \right)$ (see Figure 3c, d) corresponding to some given transformation $\mathcal{T}$ as follows:
where λ is a positive parameter and ${\int}_{0}^{1}{\mathscr{H}}_{n}\left(\mathcal{T},\delta \right)$ denotes the integral of ${\mathscr{H}}_{n}\left(\mathcal{T},\delta \right)$, i.e., the area below the cumulative curve of ${\mathscr{H}}_{n}\left(\mathcal{T},\delta \right)$. We use a quality-based LCP (QLCP) measure defined by ${f}_{\mathsf{\text{QLCP}}}\left(\mathcal{T},\delta \right)={m}_{s}\left(\mathcal{T},\delta \right)\cdot {f}_{\mathsf{\text{LCP}}}\left(\mathcal{T},\delta \right)$ as our best-fit criterion. By weighting ${f}_{\mathsf{\text{LCP}}}\left(\mathcal{T},\delta \right)$ with the quality estimate ${m}_{s}\left(\mathcal{T},\delta \right)$, the QLCP measure is made less sensitive to the choice of δ than the LCP measure.
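A hedged sketch of the QLCP computation: since Equation 17 is not reproduced in this excerpt, the matching score m_{ s } is assumed here to be the area under the normalized cumulative histogram of d/δ raised to the power λ, which matches the intuition above (shorter matching distances push the score towards 1):

```python
import numpy as np

def qlcp(distances, delta, lam=1.0):
    """Sketch of the quality-based LCP measure f_QLCP = m_s * f_LCP.

    distances: nearest-neighbour distances of points of P under a
    candidate transformation. The exact form of m_s is an assumption.
    """
    d = np.asarray(distances, float)
    inliers = d[d <= delta]                  # points matched within delta
    if inliers.size == 0:
        return 0.0
    hist, edges = np.histogram(inliers / delta, bins=np.linspace(0.0, 1.0, 21))
    H_n = np.cumsum(hist) / inliers.size     # normalized accumulated histogram
    area = H_n.mean()                        # rectangle-rule estimate of the integral
    f_lcp = inliers.size                     # LCP: number of matched points
    return (area ** lam) * f_lcp
```

Two transformations with the same inlier count are thus ranked by how early their cumulative distance histogram saturates.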
After the transformation estimation step, we evaluate each recovered transformation ${\mathcal{T}}_{k}$ between ${\mathcal{B}}_{p}$ and ${\mathcal{B}}_{q}^{k}$ w.r.t. the mean alignment error. In particular, we test whether the error $\frac{1}{N}{\sum}_{{p}_{i}\in {\mathcal{B}}_{p},{q}_{l}^{k}\in {\mathcal{B}}_{q}^{k}}{\left\Vert {\mathcal{T}}_{k}{p}_{i}-{q}_{l}^{k}\right\Vert}^{2}$ is less than some predefined threshold, where ${\mathcal{T}}_{k}{p}_{i}$ denotes the transformed point of p_{ i } via ${\mathcal{T}}_{k}$. We further verify each remaining transformation by detecting how many points in $\mathcal{P}$ have correspondences in $\mathcal{Q}$ under ${\mathcal{T}}_{k}$ and then measuring the matching score as described in Equation 17. We say that a point $p\in \mathcal{P}$ has a corresponding point in $\mathcal{Q}$ under ${\mathcal{T}}_{k}$ if there exists some point in $\mathcal{Q}$ within δ-distance of the transformed point ${\mathcal{T}}_{k}p$, i.e., $\exists q\in \mathcal{Q}:\left\Vert q-{\mathcal{T}}_{k}p\right\Vert \le \delta $. For efficiency, we use approximate nearest neighbours [33] for neighbourhood querying in ℝ^{3}. We first select a fixed number of points $\left\{{p}_{i}\right\}\subset \mathcal{P}$ and apply the transformation ${\mathcal{T}}_{k}$. Then, for each transformed point ${\mathcal{T}}_{k}{p}_{i}$, we query the nearest neighbour in $\mathcal{Q}$. If enough points of {p_{ i }} are matched, we perform a similar test for the remaining points in $\mathcal{P}$ and assign to ${\mathcal{T}}_{k}$ a score based on our QLCP measure.
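The two-stage verification can be sketched as follows; the subset size and acceptance ratio are illustrative thresholds, and a brute-force nearest-neighbour search stands in for the approximate nearest neighbours of [33]:

```python
import numpy as np

def verify_transformation(R, t, P, Q, delta, quick_n=100, quick_ratio=0.5):
    """Two-stage verification sketch of a candidate rigid transform (R, t).

    Returns the inlier distances (input for the QLCP score), or None when
    the cheap test on a fixed subset already rejects the candidate.
    """
    P = np.asarray(P, float)
    Q = np.asarray(Q, float)

    def nn_dists(pts):
        tp = pts @ R.T + t                           # apply T_k
        diff = tp[:, None, :] - Q[None, :, :]
        return np.sqrt((diff ** 2).sum(-1)).min(axis=1)

    d_sub = nn_dists(P[:min(quick_n, len(P))])       # stage 1: fixed subset
    if (d_sub <= delta).mean() < quick_ratio:
        return None                                  # early reject
    d_all = nn_dists(P)                              # stage 2: all points
    return d_all[d_all <= delta]
```

The early reject keeps the expensive full pass for the few candidates whose subset already matches well.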
Finally, we update the current best transformation found with the best QLCP measure and start the next RANSAC iteration.
5 Point sampling approaches
Given two large point sets $\mathcal{P}$ and $\mathcal{Q}$, matching approximately congruent sets of points over the entire data set is not feasible. Thus, we need efficient point sampling strategies to quickly search for corresponding sets and to effectively estimate and verify their transformation on a limited number of meaningful candidate points. The reliability of the proposed registration approach depends, to some extent, on the sampling strategy used. If we sample too many points from scarcely meaningful regions, the registration might converge slowly, find the wrong transformation (such as solutions showing sliding effects produced by samples poorly constraining the transformation), or even diverge, especially in the presence of noise and outliers. Several point sampling techniques for point cloud alignment were recently proposed [6, 35–37]. In [6], random, uniform (over the surface area of a model) and normal-space sampling are considered to evaluate the convergence of the ICP algorithm. The normal-space sampling algorithm tries to uniformly spread the normals of the selected points on the sphere of directions. The aim is to consider points that sufficiently constrain the estimated rigid transformation and improve the alignment quality by reducing surface sliding effects. Gelfand et al. [35] proposed a variant of this algorithm to make the transformation estimation geometrically stable by selecting points that reduce both translational and rotational uncertainties in the ICP algorithm. This technique samples points in order to equally constrain all eigenvectors of the covariance matrix estimated from the points and the normals of the overlapping region of two point clouds. A similar approach was used in [36] to conceive a probability function to guide the selection of stable sample points, which are also constrained by specific features. Torsello et al.
[37] proposed a sampling technique to select feature points with high local distinctiveness, which is inversely proportional to the average local radius of curvature and related to the area formed by similar points in the neighbourhood of each point. Nehab and Shilane [38] discussed the limitation of area-based uniform sampling, where the probability of a surface point being sampled is equal for all surface points. This type of sampling might produce points very close to each other and miss important surface features, which could successfully constrain the transformation. To overcome these drawbacks, they proposed a stratified point sampling strategy ensuring an even distribution of the sample points over the whole surface, which implies a higher probability of catching important surface regions. This algorithm uses a voxelization of the model to generate random samples with controlled intra-distances. Other sampling strategies providing a uniform distribution of sample points on a surface are based on farthest point [39] and Poisson disk sampling [40], which require the computation of geodesic distances.
In this section, we investigate four sampling approaches: random, uniform, probabilistic and coupled sampling, which are described in detail in the following.
5.1 Random sampling
Random sampling is the simplest and most widely used sampling technique: each element of the population has an equal chance of being selected at each draw. A sample is random if the method for obtaining it meets this criterion of randomness; the actual composition of the sample itself does not determine whether or not it was a random sample.
5.2 Uniform sampling
To achieve a high probability of success in registering two overlapping point sets, we expect the sampled point sets to have similar point densities. If the acquired surfaces present similar point densities, the above-mentioned random sampling is an acceptable choice, since it will also result in similar point densities in the sampled overlapping parts. However, this assumption does not hold in general. Point density depends on the distances and on the incident angles of the scanned surfaces with respect to the scanner sensor position and orientation, respectively. Normally, short distances and small incident angles lead to surfaces with high point densities and accuracies, which we consider as high-resolution regions. Random sampling does not guarantee an equal spread of the generated points, either on the surface or in the volume of the scanned model, and can sample points very close to each other. Thus, it is more likely to miss important surface features than an evenly distributed sampling [38]. This effect is particularly evident in the case of scanned data with non-uniform point densities.
Many approaches to perform a sampling of uniformly distributed points on the model surface can be employed [38–40]. We propose a simple and efficient variant of the method presented in [38], which is based on a cubic voxelization of point clouds and provides samples evenly distributed on typical scanned surfaces. Given a point set $\mathcal{P}$, we can assign its points to a set of 3D cubic voxels of equal size, which partition the 3D space. For each such voxel, we select the point closest to the voxel centre as the sampled point. To obtain a sampled point set of a given size N_{ s }, we start by splitting the minimal bounding box of the point cloud into a small set of 3D cubic voxels. We then iteratively split each voxel into eight smaller voxels until enough sampled points are found. With this strategy, however, we cannot obtain an exactly fixed-size set of sampled points, since many voxels do not contain points. Let N_{ l } be the number of sampled points at the l-th level. The final level L is such that its number of points N_{ L } is not less than N_{ s }, while the number of points at the previous level is N_{L-1} < N_{ s }. To obtain a sampling of the expected size N_{ s }, the simplest way is to randomly select N_{ s } points from the N_{ L } points of the L-th level. To achieve a more uniform distribution, we propose, instead, to re-split all points in $\mathcal{P}$ into a set of cubic voxels of size ${\left(\frac{8{N}_{s}^{2}}{{N}_{L-1}{N}_{L}}\right)}^{\frac{1}{3}}{S}_{L-1}$, where S_{L-1} denotes the voxel size at the (L-1)-th level. In this way, the number of uniformly sampled points is very close to N_{ s }, but still not exact. If the number is larger than N_{ s }, we randomly select N_{ s } points from them. Otherwise, we add some new points from the sampled point set at the L-th level.
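One level of the voxel-based selection can be sketched as follows; the iterative splitting into eight sub-voxels and the final re-split to reach exactly N_{ s } samples are omitted from this sketch:

```python
import numpy as np

def voxel_uniform_sample(P, voxel_size):
    """One level of the voxel-based uniform sampling: per occupied cubic
    voxel, keep the index of the point closest to the voxel centre.
    """
    P = np.asarray(P, float)
    keys = np.floor(P / voxel_size).astype(int)      # voxel index per point
    chosen = {}
    for k, (key, p) in enumerate(zip(map(tuple, keys), P)):
        centre = (np.asarray(key) + 0.5) * voxel_size
        d = np.linalg.norm(p - centre)
        if key not in chosen or d < chosen[key][0]:
            chosen[key] = (d, k)                     # closest point wins
    return np.array(sorted(k for _, k in chosen.values()))
```

Halving `voxel_size` corresponds to one splitting step: each voxel becomes eight, so the number of occupied voxels (and hence of samples) grows until it reaches N_{ s }.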
5.3 Probabilistic sampling
If the acquired surfaces are very similar in structure, e.g., the surfaces of an indoor environment composed of a main flat wall and several small objects in front of it, the above two sampling approaches may not be efficient for our proposed registration. One reason is that we select the best transformation based on the degree of overlap of the point sets, not on the whole scene structure. In the above example, points from the main single wall weakly constrain translations and generate sliding effects in the final alignment, as already discussed in [6, 35–37]. Another reason is that a selected N-points base from the wall would have a large number of approximately congruent bases. This would require expensive searches, tests, estimates and verifications of candidate transformations over a large set of approximately congruent bases. To avoid these problems, we prefer to consider more points from the objects in front of the wall, which reduces the computation and better constrains the rigid transformation. Similarly to [37], this is achieved by utilizing a probabilistic sampling technique, which selects points based on their likelihoods computed from a specific weight function. The weight function determines how relevant each point is for registration and is basically associated with the local geometrical properties of the surface at each point. We experimented with two different weight functions ω_{SV} and ω_{APD} based on surface variation and adjacent point distance, respectively.
The surface variation of a point p is defined as:
where λ_{1} ≤ λ_{2} ≤ λ_{3} are the eigenvalues corresponding to the principal components of a set of k points in the neighbourhood of p. ω_{SV}(p) ∈ [0, 1] indicates how much the surface at p locally deviates from the tangent plane [41]. In practice, ω_{SV}(p) roughly approximates the mean curvature at p: when its value is close to zero it indicates that the surface is locally planar, while, when ω_{SV}(p) is large, p identifies an interesting feature like corners, bumps, etc.
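The surface variation can be computed from the eigenvalues of the local covariance matrix. The sketch below uses the standard definition of [41], λ_{1}/(λ_{1} + λ_{2} + λ_{3}); note that this raw value lies in [0, 1/3], so the paper's ω_{SV} ∈ [0, 1] may include an additional normalization not reproduced here:

```python
import numpy as np

def surface_variation(neighbourhood):
    """Surface variation of a point from its k-neighbourhood (standard
    definition of [41]): lambda_1 / (lambda_1 + lambda_2 + lambda_3),
    with lambda_1 <= lambda_2 <= lambda_3.
    """
    X = np.asarray(neighbourhood, float)
    C = np.cov(X.T)                       # 3x3 covariance of the neighbourhood
    lam = np.linalg.eigvalsh(C)           # eigenvalues in ascending order
    s = lam.sum()
    return float(lam[0] / s) if s > 0 else 0.0
```

A locally planar neighbourhood yields a value near zero (the smallest eigenvalue vanishes), while corners and bumps spread variance across all three eigenvectors and yield larger values.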
The adjacent point distance [42] is defined by exploiting the grid structure of the range image $\mathcal{I}$ related to the acquired point cloud. Let p be a valid point associated with a pixel of $\mathcal{I}$; its adjacent point distance ${\mathcal{A}}_{d}\left(p\right)$ is defined as the median of the distances between p and its adjacent valid points p_{ k } in a 3 × 3 neighbourhood of p, i.e., ${\mathcal{A}}_{d}\left(p\right)={\mathsf{\text{median}}}_{k}\left\Vert p-{p}_{k}\right\Vert $. Then, to reduce the effect of measurement noise, a median filter of size 5 × 5 is applied to the resulting adjacent point distance map ${\mathcal{A}}_{d}$ to get a filtered map ${\tilde{\mathcal{A}}}_{d}$. The weight function ω_{APD} of a point p is then defined as:
where ${\widehat{\mathcal{A}}}_{d}$ denotes the 95th percentile of the adjacent point distances of all points in ${\tilde{\mathcal{A}}}_{d}$, taken in ascending order. The use of ${\widehat{\mathcal{A}}}_{d}$ effectively suppresses estimation errors from outliers. ω_{APD}(p) estimates the local sampling sparsity of the scanned surface at p. High values of ω_{APD} characterize points with a sparsely sampled neighbourhood, typical of corner and edge points and of regions scanned with low incident angle, which are likely located in the overlapping area of the models.
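An illustrative implementation of ω_{APD} on a dense range-image grid; the final normalization min(${\tilde{\mathcal{A}}}_{d}$/${\widehat{\mathcal{A}}}_{d}$, 1) is an assumption, since the weight equation itself is not reproduced in this excerpt (real range images also contain invalid pixels, ignored here):

```python
import numpy as np

def apd_weight(grid):
    """Sketch of omega_APD on a dense (H, W, 3) grid of 3D points.

    A_d(p): median distance to the 3x3 neighbours; a 5x5 median filter
    suppresses measurement noise; A_hat is the 95th percentile.
    """
    H, W, _ = grid.shape
    A = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            ds = [np.linalg.norm(grid[y, x] - grid[y + dy, x + dx])
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (dy, dx) != (0, 0)
                  and 0 <= y + dy < H and 0 <= x + dx < W]
            A[y, x] = np.median(ds)
    At = np.empty_like(A)                       # 5x5 median filter
    for y in range(H):
        for x in range(W):
            At[y, x] = np.median(A[max(0, y - 2):y + 3, max(0, x - 2):x + 3])
    A_hat = np.percentile(At, 95)               # robust to outliers
    return np.minimum(At / A_hat, 1.0) if A_hat > 0 else np.zeros_like(At)
```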
5.4 Coupled sampling
Besides the aforementioned three sampling approaches, we also consider coupling different sampling approaches in some order. For example, a point set ${\mathcal{P}}_{1}$ is selected from the initial point set $\mathcal{P}$ based on probabilistic sampling; after that, another point set ${\mathcal{P}}_{2}$ is selected from ${\mathcal{P}}_{1}$ based on uniform sampling. The final sampled point set can thus be obtained from the initial point set $\mathcal{P}$ via two or more samplings applied in some order with given sampling ratios. The sampling ratios of the final sampled point set ${\mathcal{P}}_{S}$ from $\mathcal{P}$ via S sampling processes are denoted as $\left|{\mathcal{P}}_{1}\right|:\left|{\mathcal{P}}_{2}\right|:\cdots :\left|{\mathcal{P}}_{S}\right|$, where $\left|\cdot \right|$ denotes the set size. For different scans to be aligned, we can select a suitable sampling approach.
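Coupled sampling is simply sequential composition with per-stage ratios; the function name `coupled_sample` and the sampler interface are illustrative, not the paper's API:

```python
def coupled_sample(P, samplers, ratios):
    """Coupled sampling as sequential composition (sketch of Section 5.4).

    Each sampler is a function (points, n) -> points; each stage keeps the
    given fraction of the previous stage's points.
    """
    current = P
    for sampler, r in zip(samplers, ratios):
        n = max(1, int(round(r * len(current))))
        current = sampler(current, n)
    return current
```

For instance, composing a probabilistic sampler with a uniform one reproduces the "probabilistic + uniform" coupling described above.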
6 Integrating texture features
If the acquired models lack the geometric details needed as good anchor points for correct registration, we can still employ features extracted from other visual sources provided by the laser range scanners, e.g., the reflectance image. Laser range scanners are non-contact 3D scanners that measure the distance from the sensor to points in the scene, typically in a regular grid pattern. A natural by-product of this acquisition process is the reflectance image. A reflectance image (shown in Figure 4) stores in each pixel the portion of laser light reflected from the corresponding surface point, providing important information about its texture. Both the 3D space distribution and the texture characteristics of the texture features extracted from the reflectance images can well constrain a rigid transformation in 3D space. As shown in [11, 12], texture features can be effectively used to identify anchor points leading to a well-constrained rigid transformation.
A feature is accompanied by a descriptor, which locally and compactly describes the texture around the feature pixel. In our application, we are interested in good local feature descriptors, which should have high local distinctiveness, be invariant w.r.t. affine transformations, and possibly be robust w.r.t. illumination changes and local deformations. Several local feature descriptors were presented in the literature [32, 43]. Among them, the most suitable for our application are the SIFT [44], SURF [45] and FAST [46] descriptors, which we extract from the reflectance images of our scanned models and consider as relevant sample points to be matched by our registration algorithm.
6.1 Reflectance image rectification
The aforementioned features cannot be efficiently extracted directly from the reflectance image associated with a scanned model, since its intrinsic spherical format strongly affects the quality of their descriptors, which are not designed to be robust w.r.t. spherical distortions. Indeed, typical laser scanner acquisition systems are usually composed of a fixed platform and a rotating head, which are naturally modelled by simple spherical projections. The resulting acquired images are then obtained by mapping spherical images onto single image planes. The meridians of a spherical image are mapped to vertical viewing planes, and the parallels are mapped to viewing cones with the vertex at the sensor position.
In order to reduce the distortion induced by the spherical projection, we compute the above mentioned feature descriptors on an atlas of rectified images (shown in Figure 4). This is possible since both the spherical projection and the atlas perspective projections will share the same point of view.
To recover the set of rectified images, we initially select a field of view value α_{fov} (in our experiments α_{fov} = 60°). We then calculate the width w_{ r } and height h_{ r } of each rectified image by constraining the pixel resolution to the resolution of the spherical image equator, i.e., the width w_{ s } of the spherical image: ${w}_{r}={h}_{r}={w}_{s}\cdot \frac{{\alpha}_{\mathsf{\text{fov}}}}{{360}^{\circ}}$. Given the pixel dimensions of the image plane, we define a standard perspective projection whose principal point is the central pixel of the image plane. The only missing intrinsic parameter, the focal length of the projection, can be easily recovered given α_{fov}, w_{ r } and h_{ r }. To determine the extrinsic parameters of each projection, we fix the camera point of view to the spherical projection point of view, i.e., the local origin of the point cloud. We then sample the sphere with v equally distributed directions. The value of v depends on the required field of view and is estimated such that the images contained in the atlas completely cover the initial spherical image. Given the camera direction d_{ i }, we recover the remaining extrinsic parameters of the i-th image by aligning the camera principal direction to d_{ i } and its vertical direction to the vertical direction of the spherical image. Exploiting each projection matrix, we can associate to each final image pixel its corresponding viewing ray, and from this the pixel's corresponding coordinates in the original spherical image. These correspondences are used to perform a bilinear interpolation of the spherical image to recover each final rectified image (see Figure 4).
The atlas generation only depends on the field of view used. Small values generate many small images, whereas large values generate fewer images but with higher perspective distortions.
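The atlas image dimensions and the recovered focal length follow directly from α_{fov} and the spherical image width, as described above. The helper `rectified_intrinsics` is an illustrative name; choosing the v view directions is not covered here:

```python
import math

def rectified_intrinsics(w_s, alpha_fov_deg=60.0):
    """Intrinsics of one atlas image (illustrative helper).

    The image side matches the spherical image equator resolution,
    w_r = h_r = w_s * alpha_fov / 360; the focal length in pixels follows
    from the field of view; the principal point is the image centre.
    """
    w_r = h_r = w_s * alpha_fov_deg / 360.0
    f = (w_r / 2.0) / math.tan(math.radians(alpha_fov_deg) / 2.0)
    return w_r, h_r, f, (w_r / 2.0, h_r / 2.0)
```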
6.2 Texture features extraction and integration
From the original reflectance image, or from the atlas of rectified images generated from it, we extract a set of texture features for registration using the following methods:

1.
SIFT [44]: this feature descriptor encodes the trend of the local image gradient around a pixel as a histogram of typically 128 bins. This descriptor is invariant w.r.t. scaling and rigid 2D transformations and robust w.r.t. affine distortion, addition of noise, and changes in illumination. It is very accurate in identifying relevant interest points, but its computation is usually slow without exploiting the GPU [47].

2.
SURF [45]: this method efficiently detects features by computing a rough approximation of the Hessian matrix using integral images. The resulting descriptor is based on sums of approximated 2D Haar wavelet responses, which is more compact and much faster to compute than the SIFT descriptor.

3.
FAST [48]: this technique classifies a pixel as a corner if there is a sufficiently large set of relevantly brighter (darker) pixels in a circular pixel neighbourhood of fixed radius. This feature detection algorithm is very fast, up to 30 times faster than SIFT, but it is not invariant w.r.t. scaling and does not provide effective descriptors [43], which are usually computed with other techniques (e.g., with the SURF method in this paper).

4.
Harris corner detector [49]: it has been widely used in image processing and computer vision, and computes corner features by analysing the local changes of the image intensity when patches are shifted by a small amount in different directions. It is neither scale nor affine invariant, and usually generates a high number of features.
The 3D points corresponding to the extracted texture features are then considered as sampled points and used with their descriptors by our registration algorithm to match congruent sets of points, as described in Sections 2 and 3.
7 Experimental results
7.1 Test data and evaluation criteria
We tested our point-based registration algorithm on a variety of input data with varying amounts of noise, outliers, and extent of overlap. Our test dataset includes some small object models, shown in Figure 5a, b, c, which are selected from the data provided in the demo application of the 4PCS algorithm [8]^{b}. The other test data are models of large indoor/outdoor scenes acquired by different types of scanners, mostly selected from [20], as shown in Figure 5d, e, f, g, h, i, j, k. In total, we tested 11 pairs of scans with different extents of overlap as shown in Figure 5. These 11 pairs of scans are denoted by ${\left\{{\mathcal{G}}_{n}\right\}}_{n=\mathsf{\text{a}},\dots ,\mathsf{\text{k}}}$, corresponding to the models shown in Figure 5a, b, c, d, e, f, g, h, i, j, k, respectively. The small object models in ${\mathcal{G}}_{\mathsf{\text{a}}}$–${\mathcal{G}}_{\mathsf{\text{c}}}$ consist of around 20,000–30,000 points. Indoor scan data (${\mathcal{G}}_{\mathsf{\text{d}}}$–${\mathcal{G}}_{\mathsf{\text{i}}}$) were captured by Z+F IMAGER 5003/5006/5006i laser range scanners. Outdoor scan data (${\mathcal{G}}_{\mathsf{\text{j}}}$ and ${\mathcal{G}}_{\mathsf{\text{k}}}$) were captured by the RIEGL LMS-Z420i laser range scanner. The accuracy of a point acquired by the Z+F IMAGER 5003 (5006 and 5006i) laser scanner is 3 mm (1 mm) along the laser-beam direction at a maximal distance of 50 m from the scanner. The accuracy of the RIEGL LMS-Z420i laser scanner is 10 mm at 50 m. The resolution of all indoor scans is about 2,530 × 1,080 and the resolution of all outdoor scans is 3,000 × 666. No surface normals were provided for points in the scan data ${\mathcal{G}}_{\mathsf{\text{a}}}$, ${\mathcal{G}}_{\mathsf{\text{b}}}$ and ${\mathcal{G}}_{\mathsf{\text{c}}}$. For the other scans, we always employed the surface normals in our registration algorithm.
The scan data in ${\mathcal{G}}_{\mathsf{\text{a}}}$–${\mathcal{G}}_{\mathsf{\text{c}}}$ were scaled so that the bounding box diagonal lengths of the first scans in these scan pairs equal 100 units. The bounding box diagonal lengths of the first scans of the other eight scan pairs ${\mathcal{G}}_{\mathsf{\text{d}}}$–${\mathcal{G}}_{\mathsf{\text{k}}}$ are around 10, 28, 26, 37, 77, 142, 1,860, and 398 m, respectively. The overlap rates shown in Figure 5 were computed as follows. The overlap rate of two point sets $\mathcal{P}$ and $\mathcal{Q}$ is defined as $o\left(\mathcal{P},\mathcal{Q}\right)=\min \left(\frac{\left|{\mathcal{P}}_{\mathcal{Q}}\right|}{\left|\mathcal{P}\right|},\frac{\left|{\mathcal{Q}}_{\mathcal{P}}\right|}{\left|\mathcal{Q}\right|}\right)$, where $\left|\cdot \right|$ denotes the size of a point set and ${\mathcal{P}}_{\mathcal{Q}}$ denotes the subset of $\mathcal{P}$ whose points each have at least one point in $\mathcal{Q}$ at a distance below a given threshold. To produce a reasonable overlap rate in 3D space, we compute the overlap rate using large point subsets selected from $\mathcal{P}$ and $\mathcal{Q}$ with our voxel-based uniform sampler instead of using the original point sets. Our registration algorithm was implemented in C++ on a Windows XP system and integrated into our commercial software JRC 3D Reconstructor. All experiments were executed on a 2.67 GHz Intel machine.
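The overlap rate can be sketched with a brute-force distance test; the paper evaluates it on voxel-based uniformly sampled subsets for efficiency:

```python
import numpy as np

def overlap_rate(P, Q, threshold):
    """Overlap rate o(P, Q) = min(|P_Q|/|P|, |Q_P|/|Q|), where P_Q is the
    subset of P whose points have at least one point of Q within the
    threshold distance (brute-force version of the definition above).
    """
    P = np.asarray(P, float)
    Q = np.asarray(Q, float)
    d = np.sqrt(((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1))  # all pairs
    frac_p = (d.min(axis=1) <= threshold).mean()                 # |P_Q| / |P|
    frac_q = (d.min(axis=0) <= threshold).mean()                 # |Q_P| / |Q|
    return min(frac_p, frac_q)
```

Taking the minimum of the two fractions makes the rate symmetric and conservative when the two scans have very different sizes.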
We fixed the poses of all reference scans (i.e., $\left\{\mathcal{Q}\right\}$) to the identity rotation and zero translation. To evaluate the transformation estimation accuracy of our registration algorithm, we first employed our algorithm on all tested pairs of scans to recover a good initial transformation for each scan pair and then applied the ICP optimization algorithm [5] to get a well-aligned transformation. We observed that the mean residual error after the ICP registration optimization was always comparable with the laser scanner measurement error, which is much lower than the estimation errors of our proposed N-points congruent sets (NPCS) registration algorithm. For this reason, we regarded this ICP-optimized transformation as the ground truth transformation ${\mathcal{T}}_{g}$ for evaluation. Thus, given an estimated transformation $\mathcal{T}$ from $\mathcal{P}$ to $\mathcal{Q}$, the estimation error is defined as the median of the point distances after applying $\mathcal{T}$ and ${\mathcal{T}}_{g}$ to $\mathcal{P}$, i.e., ${\mathsf{\text{median}}}_{p\in \mathcal{P}}\left\Vert \mathcal{T}p-{\mathcal{T}}_{g}p\right\Vert $.
The transformation estimation accuracy was statistically evaluated by running our registration algorithm N_{run} = 20 times on each tested pair of scans. In each run, we refreshed the input data by setting a random pose for the moving scan, followed by resampling. For each scan pair, we set a suitable maximal estimation error Δ_{max} in advance. If the estimation error was above Δ_{max}, we considered the run failed. The maximal estimation errors were set as Δ_{max} = 5 units, Δ_{max} = 1 m, Δ_{max} = 2 m and Δ_{max} = 5 m for the small object models ${\mathcal{G}}_{\mathsf{\text{a}}}$–${\mathcal{G}}_{\mathsf{\text{c}}}$, the indoor models ${\mathcal{G}}_{\mathsf{\text{d}}}$–${\mathcal{G}}_{\mathsf{\text{h}}}$, the large indoor model ${\mathcal{G}}_{\mathsf{\text{i}}}$ and the outdoor models ${\mathcal{G}}_{\mathsf{\text{j}}}$–${\mathcal{G}}_{\mathsf{\text{k}}}$, respectively. Based on the number N_{suc} of successful estimations and the number N_{run} of runs, the following three indicators were used to evaluate our method: (1) the successful estimation rate ${S}_{r}=\frac{{N}_{\mathsf{\text{suc}}}}{{N}_{\mathsf{\text{run}}}}$ to evaluate its robustness; (2) the median estimation error Δ among all N_{suc} successful estimations to evaluate its accuracy; (3) the median estimation time t over all N_{run} runs to evaluate its efficiency. Note that the estimation time does not include the preprocessing time (i.e., texture feature detection/matching, point sampling, etc.), but it incorporates the codebook building time, which took < 1 s with the basic RANSAC and around 2 s with the probabilistic RANSAC using our parameter setting. The feature detectors listed by increasing processing time are Harris, FAST, SURF and SIFT. The point sampling approaches listed by increasing processing time are random, probabilistic and uniform sampling.
7.2 Performance evaluation
The performance of our proposed NPCS registration algorithm was evaluated on the aforementioned test data. The main parameters were set as follows. We employed congruent point sets of size N = 4. The two main parameters for searching the best N-points congruent sets in Algorithm 1 were set to K_{ p } = 2,000 and K_{ g } = 50. The exponential parameter for computing our proposed QLCP measure in Equation 17 was set to λ = 1. Given two point sets $\mathcal{P}$ and $\mathcal{Q}$, we tried to recover the transformation from $\mathcal{P}$ (moving point set) to $\mathcal{Q}$ (reference point set). We selected 500 points from $\mathcal{P}$ and 1,000 points from $\mathcal{Q}$ for searching N-points congruent sets between them. However, for transformation verification, we selected larger subsets, i.e., 1,000 points from $\mathcal{P}$ and 2,000 points from $\mathcal{Q}$. This allows the estimation of a more accurate and robust transformation. The voxel-based uniform sampling approach was used to select these points. Our registration algorithm always used the surface normals of points when provided. The maximal normal deviation for corresponding point pairs was set to 30°. In our experiments, we used the RANSAC-based approach with I_{max} = 1,000 and I_{nou} = 200, which are the maximal allowed number of iterations and the number of consecutive iterations in which no better transformation is found, respectively. The estimation errors of the first three scan pairs ${\mathcal{G}}_{\mathsf{\text{a}}}$–${\mathcal{G}}_{\mathsf{\text{c}}}$ are reported in canonical units (i.e., 100 units equal the bounding box diagonal length), while those of the other eight scan pairs ${\mathcal{G}}_{\mathsf{\text{d}}}$–${\mathcal{G}}_{\mathsf{\text{k}}}$ are in meters. Notice that the same parameter values were used in all experiments described below, unless clearly stated otherwise for particular experiments.
Figure 6 shows the performance comparison of our NPCS algorithm with different K_{ p } and K_{ g } on the three scan pairs $\mathcal{G}_{\text{a}}$, $\mathcal{G}_{\text{e}}$ and $\mathcal{G}_{\text{i}}$. First, we fixed K_{ g } = 50 and tested the effect of different K_{ p } on the registration accuracy (median estimation errors), efficiency (median estimation times) and robustness (successful estimation rates), as shown in Figure 6a, b, c, respectively. We can observe that larger values of K_{ p } led to an improvement of the registration accuracy and robustness, but also required longer execution times. In Figure 6c, we can notice that 100% successful estimation rates were achieved for all tested scan pairs when K_{ p } ≥ 1,000. Second, we fixed K_{ p } = 2,000 and tested the effect of different K_{ g } on the registration performance, as shown in Figure 6d, e, f. The variation of K_{ g } results in effects similar to those of K_{ p }. When K_{ g } ≥ 50, the successful estimation rates were always S_{ r } = 100% for all tested scan pairs.
Table 1 illustrates the performance evaluation of our NPCS algorithm on the three scan pairs $\mathcal{G}_{\text{c}}$, $\mathcal{G}_{\text{d}}$ and $\mathcal{G}_{\text{j}}$ when the size N of the congruent point sets was set as N = 3, ..., 6, respectively. As explained before, large values of N (≥ 6) result in a low probability of successfully selecting N-points congruent sets between two point sets and in longer estimation times. Small values of N (= 3) led to an increase of the candidate N-points congruent sets in $\mathcal{Q}$ given N points in $\mathcal{P}$, but sometimes the matched sets found are not enough to well constrain the transformation and the algorithm falls into local minima.
The performance comparison of different sampling approaches on five scan pairs is shown in Table 2. In this experiment, we tested the following sampling strategies: random sampling; voxel-based uniform sampling; two probabilistic sampling approaches (probAPD, based on adjacent point distances, and probSurfVar, based on surface variations); and four texture feature-based sampling approaches using the SIFT, SURF, FAST and HARRIS feature detectors as samplers. Some sampling strategies were coupled and reported in Table 2 combined with the character '+'. For instance, the coupled sampling "random+uniform" denotes that we first applied random sampling to get a point subset $\mathcal{P}_{1}$ from the initial point set $\mathcal{P}$ and then applied uniform sampling to get the final point subset $\mathcal{P}_{2}$ from $\mathcal{P}_{1}$. Here, the sampling ratios of all coupled sampling approaches were set to $\mathcal{P}_{2}:\mathcal{P}_{1}=0.5$. All texture feature-based sampling methods were coupled with random or uniform sampling. From Table 2, we observe that uniform sampling performs better than random sampling, both when considered alone and when coupled with other sampling strategies, i.e., with the probAPD-based sampling and the feature detectors. Among the probabilistic sampling methods, the probAPD-based sampling proved to be a very good choice when applicable, since it markedly improved the registration robustness and also increased its accuracy. On the contrary, the probSurfVar-based sampling showed a lack of robustness, but worked very well for models rich in geometric features, captured from geometrically complex scenes such as those in $\mathcal{G}_{\text{g}}$. In some scenes, texture feature detectors turned out to be a good first sampler.
In those cases, they improved the transformation accuracy, as shown by "FAST+uniform" in $\mathcal{G}_{\text{e}}$ (Δ = 0.0672 m) and by "HARRIS+random" in $\mathcal{G}_{\text{g}}$ (Δ = 0.0121 m) and $\mathcal{G}_{\text{i}}$ (Δ = 0.0820 m). Among the texture feature-based sampling approaches, "SIFT+uniform" and "FAST+uniform" also demonstrated very good robustness by always scoring S_{ r } = 100%, like the probAPD-based sampling, but with improved accuracy for all models apart from those in $\mathcal{G}_{\text{j}}$. Notice that when it was not possible to extract enough texture features, the remaining points were selected by the coupled sampling approach that follows the feature detector.
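The voxel-based uniform sampling and the '+' coupling of samplers can be sketched as follows (a simplified illustration, not the paper's implementation; a hash grid over voxel indices keeps one point per occupied voxel):

```python
import random

def voxel_uniform_sample(points, voxel_size):
    """Voxel-based uniform sampling sketch: partition space into cubic
    voxels of the given size and keep one (the first-seen) point per
    occupied voxel, so the selected subset is roughly uniformly
    distributed regardless of the original scan density."""
    kept, occupied = [], set()
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        if key not in occupied:
            occupied.add(key)
            kept.append((x, y, z))
    return kept

# Coupled "random+uniform" sampling: a random subset P1 of P first, then
# voxel-based uniform sampling to obtain P2 from P1.
random.seed(1)
cloud = [(random.random(), random.random(), 0.0) for _ in range(1000)]
p1 = random.sample(cloud, 400)
p2 = voxel_uniform_sample(p1, voxel_size=0.1)   # at most one point per voxel
```

In practice, the voxel size would be chosen so that the number of occupied voxels matches the desired subset size (e.g., the 500/1,000 points used above).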
Our employed QLCP measure for verifying the estimated transformation depends on two main parameters, i.e., λ in Equation 17 and the δ-distance. In this paper, we computed $\delta = \alpha\,\bar{d}_{\text{vref}}$, where α is a positive constant and $\bar{d}_{\text{vref}} = \mathrm{median}_{q\in\mathcal{Q}_{v}}\,\|q - n_{c}(q)\|$, where $\mathcal{Q}_{v}\subset \mathcal{Q}$ represents the set of points used for transformation verification and $n_{c}(q)$ denotes the closest point of q in $\mathcal{Q}_{v}$. Table 3 shows the comparison of our NPCS algorithm when using the LCP or the QLCP measure with different parameters on three scan pairs: $\mathcal{G}_{\text{b}}$, $\mathcal{G}_{\text{i}}$, and $\mathcal{G}_{\text{j}}$. In practice, the LCP measure corresponds to the QLCP measure with λ = 0. From Table 3, we observe that our proposed QLCP measure led to higher successful estimation rates and more accurate transformations than the LCP one. In addition, notice that a large λ slightly increased the estimation robustness when a large α was used. In the rest of the experiments reported in the paper, we used λ = 1 and α = 2.
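The computation of the δ-distance can be sketched as follows (brute force for clarity; on real scans a k-d tree would be used for the closest-point queries):

```python
import math
import statistics

def delta_distance(q_v, alpha=2.0):
    """Sketch of the delta-distance used for transformation verification:
    delta = alpha * d_vref, with d_vref the median distance from each
    point of Q_v to its closest *other* point in Q_v."""
    nearest = [min(math.dist(q, r) for r in q_v if r is not q) for q in q_v]
    return alpha * statistics.median(nearest)

# Points spaced 1 unit apart on a line: every closest-point distance is 1,
# so delta = alpha * 1 with the setting alpha = 2 used in the paper.
pts = [(float(i), 0.0, 0.0) for i in range(10)]
delta = delta_distance(pts, alpha=2.0)
```

Since $\bar{d}_{\text{vref}}$ is a median of nearest-neighbour distances, δ automatically adapts to the sampling density of the verification subset $\mathcal{Q}_v$.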
The integration of scene structure information into our NPCS algorithm can greatly reduce the estimation time, as described in Section 2 and illustrated in Table 4, and can also slightly improve the accuracy of the recovered transformation. Here, the scans in $\mathcal{G}_{\text{d}}$ and $\mathcal{G}_{\text{f}}$ were automatically classified as orthogonal scenes.
Table 5 shows the performance evaluation of our NPCS algorithm on $\mathcal{G}_{\text{g}}$ and $\mathcal{G}_{\text{h}}$ using texture features detected from the original reflectance images and from atlases of rectified reflectance images. In this experiment, the texture features were first used to sample the models, and their descriptors were then used to improve the matching of congruent sets, as described in Section 2. These techniques were compared with the pure uniform sampling method, which worked well in $\mathcal{G}_{\text{g}}$ (S_{ r } = 100%) but failed completely in $\mathcal{G}_{\text{h}}$ (S_{ r } = 0%).
The complete failure in $\mathcal{G}_{\text{h}}$ is due to the presence of wrong transformations with a better QLCP measure than the ground truth transformation. This example clearly shows how the integration of texture features can markedly improve the robustness of our registration algorithm, especially when they are extracted from rectified images. We recall that there are two ways of integrating texture features. The first one is to consider the detected features only as sampled points, as in the experiments reported in Table 2. The second one (reported with the prefix "match") is to rank the matched features using their descriptors and keep in $\mathcal{P}$ the best k correspondences for each feature point. This experiment showed that: (1) the SURF features were more robust than the SIFT features in both $\mathcal{G}_{\text{g}}$ and $\mathcal{G}_{\text{h}}$; (2) the accuracy of the transformation obtained with SURF features is much better than the one obtained with SIFT features in $\mathcal{G}_{\text{g}}$; (3) the features detected from the atlas of rectified reflectance images led to a more robust and generally more accurate registration than those extracted from the original reflectance images (the improvement in terms of accuracy is particularly evident in $\mathcal{G}_{\text{g}}$, where the very low estimation error of 1–2 cm is about 0.5‰ of the bounding box diagonal length); (4) decreasing the number k of best feature correspondences based on their descriptors greatly improves the efficiency (i.e., shorter computation times), but possibly deteriorates the robustness (i.e., lower success rates); (5) in some experiments, the features detected from the atlas of rectified reflectance images led to slightly higher estimation errors than those extracted from the original reflectance images.
This happens when k is not large enough (e.g., k = 50, 100) and is mainly related to features located close to the rectified image boundaries. Features extracted in these regions are less reliable due to the lack of texture information around them. This effect can be overcome by imposing a given degree of overlap between neighbouring rectified images.
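The "match" variant, which keeps the best k correspondences per feature point ranked by descriptor distance, can be sketched as follows (hypothetical names and toy 2-D descriptors; real SURF/SIFT descriptors would be 64- or 128-dimensional):

```python
def best_k_matches(desc_p, desc_q, k):
    """Rank candidate correspondences by squared descriptor distance and
    keep, for each feature of P, the indices of its k best matches in Q
    (brute force; an approximate nearest-neighbour index would be used
    for large feature sets)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return {i: sorted(range(len(desc_q)), key=lambda j: sqdist(dp, desc_q[j]))[:k]
            for i, dp in enumerate(desc_p)}

# Two toy descriptors in P matched against three in Q, keeping k = 2.
desc_p = [(0.0, 0.0), (1.0, 1.0)]
desc_q = [(0.1, 0.0), (5.0, 5.0), (0.9, 1.1)]
matches = best_k_matches(desc_p, desc_q, k=2)
```

As observed in the experiment, a smaller k shrinks the candidate set searched at each iteration (shorter computation times), at the risk of discarding the true correspondence (lower success rates).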
As explained in Section 4.1, the proposed probability-based RANSAC is especially useful for aligning two scans with a small overlap. This advantage is evident when aligning the scan pair $\mathcal{G}_{\text{k}}$, which has a very small overlap (12%), as reported in Table 6, where S_{ r } = 95% for the probabilistic RANSAC but S_{ r } = 65% for the basic RANSAC. The probability-based RANSAC led to similar results in both $\mathcal{G}_{\text{a}}$ and $\mathcal{G}_{\text{e}}$, which have large overlaps, but with slightly better robustness in $\mathcal{G}_{\text{e}}$. Note that K_{ p } = 5,000 and K_{ g } = 100 were used for $\mathcal{G}_{\text{k}}$ in this experiment.
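A minimal sketch of the weighted selection step (the weight function itself, described in Section 4.1, is replaced here by fixed toy weights):

```python
import random

def draw_weighted_pair(pairs, weights, rng):
    """Probability-based selection sketch: instead of drawing a point pair
    of P uniformly as in basic RANSAC, draw it with probability
    proportional to a weight approximating how likely the pair is to match
    some pair in Q.  With a small overlap, this concentrates the search on
    pairs that can actually be matched."""
    return rng.choices(pairs, weights=weights, k=1)[0]

rng = random.Random(0)
pairs = [("p1", "p2"), ("p3", "p4"), ("p5", "p6")]
weights = [0.0, 0.0, 1.0]        # only the last pair has matching support
picked = draw_weighted_pair(pairs, weights, rng)
```

With uniform weights this reduces to the basic RANSAC selection, which explains the similar results observed on the large-overlap pairs.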
Figure 7 shows seven outdoor scans acquired at the Gare de Lyon train station in Paris, which were automatically aligned by applying our NPCS algorithm followed by ICP registration. These seven scans were coupled by proximity to generate a set of scan pairs to register one after another. The generated pairs of scans have overlap rates of 39, 51, 31, 56, 25 and 29%. The first two of these seven scans comprise the tested scan pair $\mathcal{G}_{\text{j}}$.
Furthermore, our algorithm was tested on two scans acquired from the same scene at different times, with some changes between them, as shown in Figure 8: the two scans were acquired from almost the same position, before and after some people moved. In this example, the overlap rate is around 71%. Our algorithm always successfully and accurately registered these two scans (i.e., S_{ r } = 100% and Δ = 2.86 cm).
Finally, we also compared our algorithm with the 4PCS algorithm [8] on the five scan pairs $\mathcal{G}_{\text{a}}$, $\mathcal{G}_{\text{c}}$, $\mathcal{G}_{\text{i}}$, $\mathcal{G}_{\text{j}}$ and $\mathcal{G}_{\text{k}}$. Initially, we extracted 2,000 points from each scan using our proposed voxel-based uniform sampling approach. To make the comparison fair, we ran both the 4PCS algorithm and our NPCS algorithm with N = 4 on 600 points randomly selected from the pre-extracted 2,000 points of each scan. Point surface normals (not available in $\mathcal{G}_{\text{a}}$ and $\mathcal{G}_{\text{c}}$) and the same normal deviation thresholds were used in both algorithms. The estimated overlap rates of these five scan pairs shown in Figure 5 were used in 4PCS, except for $\mathcal{G}_{\text{a}}$, where a 90% overlap rate was used. In NPCS, instead, the other parameters were fixed for all five tested scan pairs, as mentioned before. Table 7 shows the performance comparison of both algorithms. The 4PCS algorithm sometimes worked successfully on the first four tested scan pairs (S_{ r } = 30–90%), but always failed in $\mathcal{G}_{\text{k}}$ (S_{ r } = 0%). In contrast, our NPCS algorithm almost always succeeded on the first four tested scan pairs and also succeeded in $\mathcal{G}_{\text{k}}$ with a fairly high success rate (S_{ r } = 60%). By utilizing the probability-based RANSAC, an even higher success rate can be obtained in $\mathcal{G}_{\text{k}}$ (S_{ r } = 95%, see Table 6). Note that the same maximal estimation errors for all tested scan pairs were used in the 4PCS algorithm for computing the success rate. However, the success of the 4PCS algorithm depends to some extent on the provided estimate of the overlap rate of the two point sets. In our application, this overlap rate cannot be robustly approximated in advance.
Thus, a feasible strategy would be to try it with different overlap rates until success, which obviously increases the overall execution time. In this comparison experiment, however, we directly used the overlap rates pre-estimated from the recovered ground truth transformations, except that a 90% overlap rate was used in $\mathcal{G}_{\text{a}}$. Another drawback of the 4PCS algorithm is related to the coplanarity constraint applied to the 4-point bases. Although this greatly reduces the search space, it may mislead the algorithm towards false transformation estimations, especially in the case of a very low overlap rate. We also noticed that in this case the computation time of the 4PCS algorithm increases significantly, since no successful transformations can be found efficiently. Finally, the NPCS algorithm achieved a higher estimation accuracy than the 4PCS one, especially in $\mathcal{G}_{\text{c}}$ and $\mathcal{G}_{\text{j}}$, mainly due to our proposed QLCP measure, as reported in Table 3.
8 Conclusions
In this paper, we presented the NPCS registration, a robust and efficient approach for automatically aligning two overlapping point sets. Given two point sets $\mathcal{P}$ and $\mathcal{Q}$, our algorithm randomly searches for sets of congruent groups of N-points in $\mathcal{P}$ and $\mathcal{Q}$ which lead to the best estimate of the transformation. This is achieved by employing a RANSAC-based algorithm, which can also estimate the matching probability of each point to drive the search towards possibly successful congruent bases of N-points. This probabilistic RANSAC approach improves the registration robustness, especially when aligning two point sets with a small overlap. The search for congruent sets is efficiently performed by using a fast search codebook inspired by [20], which markedly reduces the execution time. Our proposed search method can efficiently combine different metric functions to match points and point pairs in a multi-dimensional search space, which can be defined by geometric and texture feature descriptors and by geometrical constraints on sets of sampled points. This makes our framework general and flexible. The efficient combination of feature descriptors can markedly improve the registration performance, accuracy, and robustness for models with known specific characteristics. Moreover, we proposed a method to extract texture features from an atlas of rectified images recovered by sampling the spherical field of view of the reflectance image. The resulting feature descriptors were less sensitive to spherical distortions, yielding more precise matches than those extracted from the original reflectance image.
To further improve the registration robustness, we introduced a new measure called QLCP, which is used to efficiently verify the rigid transformation estimated at each RANSAC iteration. The QLCP considers both the quantity and the quality of the overlapping point set, and improved the verification robustness w.r.t. the LCP measure utilized in the 4PCS algorithm [8].
Our proposed NPCS algorithm was evaluated on a variety of input data with varying amounts of noise, outliers, and extents of overlap (12–100%), acquired from small objects and from indoor and outdoor scenes. The experiments focused on three main aspects: robustness (successful estimation rate), accuracy (median estimation error), and efficiency (median estimation time). In almost all cases, our NPCS algorithm (executed with the same parameters) successfully aligned two point clouds in less than a minute by accurately recovering their rigid transformations (the estimation errors are mostly within 0.5–5‰ of their bounding box diagonal lengths). We tested our registration method with different sampling techniques and experimentally demonstrated the benefits of methods generating uniformly distributed point samples w.r.t. random-based sampling strategies. Our proposed voxel-based uniform sampling approach was robust and efficient in almost all tested cases. We showed that the proposed QLCP measure is more reliable than the LCP measure. Still, it is possible to obtain wrong estimations, e.g., the failure related to the scan pair $\mathcal{G}_{\text{h}}$. Nevertheless, these wrong estimations can be successfully recovered by using texture-based features detected from the atlas of rectified reflectance images, as shown by our experiments, or by using other suitable geometric features such as those presented in [10, 13, 50]. In our experiments, the NPCS registration performed, on average, better than the 4PCS method in terms of robustness and accuracy. In some experiments, in particular in the presence of large overlaps, the execution time of our algorithm was slightly higher than that of 4PCS. When aligning two scans with a small overlap, our NPCS algorithm took a similar computation time (mostly less than a minute) to achieve a good solution, whereas 4PCS sometimes took more than 10 min and still failed (e.g., in $\mathcal{G}_{\text{k}}$).
Future work will focus on improving the performance of our method by exploiting parallel hardware, and on improving the robustness of our QLCP measure for inspection applications, which typically require automatically aligning models captured at different times from scenes possibly containing changes.
Endnotes
^{a} Only the upper triangular part is stored in practice, due to the symmetry of the point pairs lookup table. ^{b} The demo application of the 4PCS algorithm [8] is available at http://graphics.stanford.edu/~niloy/research/fpcs/4PCS_demo.html.
Algorithm 1
Find the best K_{ g } N-points bases $\{\mathcal{B}_{q}^{k}\}_{k=1}^{K_{g}}$ in $\mathcal{Q}$ which are approximately congruent to the N-points base $\mathcal{B}_{p}$ in $\mathcal{P}$.
$\mathcal{M}\leftarrow \{(q_{1}^{k},q_{2}^{k})\}_{k=1}^{K_{p}}$, i.e., the best K_{ p } point pairs in $\mathcal{Q}$ possibly matching the point pair (p_{1}, p_{2}) in $\mathcal{B}_{p}$, according to the similarity score in Equation 3.
for i = 2 to N − 1 do
case 1: $\mathcal{S}\leftarrow \{(q_{i}^{k},q_{i+1}^{k})\}_{k=1}^{K_{p}}$, i.e., the point pairs in $\mathcal{Q}$ possibly matching the point pair (p_{ i }, p_{i+1}) in $\mathcal{B}_{p}$.
case 2: collect $\mathcal{M}_{f}(p_{i+1})$
$\mathcal{M}' \leftarrow \emptyset$
for each group $(q_{1}^{m},\dots,q_{i}^{m})$ of points in $\mathcal{M}=\{(q_{1}^{m},\dots,q_{i}^{m})\}_{m=1}^{|\mathcal{M}|}$ do
case 1: group point pair
for each point pair $\left({q}_{i}^{k},{q}_{i+1}^{k}\right)$ in $\mathcal{S}$ do
if $q_{i}^{m}=q_{i}^{k}$ and $S_{f}(p_{i+1},q_{i+1}^{k})\,\prod_{j=1}^{i-1}\prod_{l=1}^{K} s_{l}\big(m_{l}(p_{j},p_{i+1}),\, m_{l}(q_{j}^{m},q_{i+1}^{k})\big)=1$ then
$\mathcal{M}' \leftarrow \{\mathcal{M}',\,(q_{1}^{m},\dots,q_{i}^{m},\,q_{i+1}^{k})\}$
end if
end for
case 2: add a single point
for each point ${q}_{s}\in {\mathcal{M}}_{f}\left({p}_{i+1}\right)$ do
if $\prod_{j=1}^{i}\prod_{l=1}^{K} s_{l}\big(m_{l}(p_{j},p_{i+1}),\, m_{l}(q_{j}^{m},q_{s})\big)=1$ then
$\mathcal{M}' \leftarrow \{\mathcal{M}',\,(q_{1}^{m},\dots,q_{i}^{m},\,q_{s})\}$
end if
end for
end for
$\mathcal{M}\leftarrow {\mathcal{M}}^{\prime}$
end for
return the best K_{ g } bases $\{\mathcal{B}_{q}^{k}\}_{k=1}^{K_{g}}$ by filtering $\mathcal{M}$ according to Equation 3
References
 1.
Hartley RI, Zisserman A: Multiple View Geometry in Computer Vision. 2nd edition. Cambridge University Press; 2004. ISBN: 0521540518
 2.
Yao J, Cham WK: Robust multi-view feature matching from multiple unordered views. Pattern Recogn 2007,40(11):3081–3099. 10.1016/j.patcog.2007.02.011
 3.
Pollefeys M, Van Gool L, Vergauwen M, Verbiest F, Cornelis K, Tops J, Koch R: Visual modeling with a hand-held camera. Int J Comput Vis 2004,59(3):207–232.
 4.
Zhang W, Yao J, Cham WK: 3D modeling from multiple images. The Seventh International Symposium on Neural Networks (ISNN 2010) 2010, 97–103.
 5.
Besl PJ, McKay HD: A method for registration of 3D shapes. IEEE Trans Pattern Anal Mach Intell 1992,14(2):239–256. 10.1109/34.121791
 6.
Rusinkiewicz S, Levoy M: Efficient Variants of the ICP Algorithm. International Conference on 3D Digital Imaging and Modeling (3DIM) 2001.
 7.
Matabosch C, Salvi J, Fofi D, Meriaudeau F: Range image registration for industrial inspection. Mach Vis Appl Ind Insp XIII 2005, 216–227.
 8.
Aiger D, Mitra NJ, Cohen-Or D: 4-Points Congruent Sets for Robust Pairwise Surface Registration. ACM SIGGRAPH 2008.
 9.
Fischler MA, Bolles RC: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 1981,24(6):381–395.
 10.
Gelfand N, Mitra NJ, Guibas LJ, Pottmann H: Robust Global Registration. Third Eurographics Symposium on Geometry Processing (SGP) 2005.
 11.
Yu L, Zhang D, Holden E: A fast and fully automatic registration approach based on point features for multi-source remote-sensing images. Computers Geosci 2008,34(7):838–848. 10.1016/j.cageo.2007.10.005
 12.
Kang Z: Automatic Registration of Terrestrial Point Cloud Using Panoramic Reflectance Images. International Society for Photogrammetry and Remote Sensing 2008.
 13.
Zaharescu A, Boyer E, Varanasi K, Horaud R: Surface Feature Detection and Description with Applications to Mesh Matching. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2009.
 14.
Chalfant JS, Patrikalakis NM: Three-dimensional object registration using wavelet features. Eng Computers 2009,25(3):303–318. 10.1007/s00366-009-0126-5
 15.
Smith ER, Radke RJ, Stewart CV: Physical Scale Intensity-Based Range Keypoints. 3D Data Processing, Visualization, and Transmission (3DPVT) 2010.
 16.
Johnson AE, Hebert M: Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Trans Pattern Anal Mach Intell 1999,21(5):433–449. 10.1109/34.765655
 17.
Huber DF, Hebert M: Fully automatic registration of multiple 3D data sets. Image Vis Comput 2003,21(7):637–650. 10.1016/S0262-8856(03)00060-X
 18.
Makadia A, Patterson AI, Daniilidis K: Fully Automatic Registration of 3D Point Clouds. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2006.
 19.
Stamos I, Leordeanu M: Automated Feature-based Range Registration of Urban Scenes of Large Scale. IEEE Conference on Computer Vision and Pattern Recognition 2003.
 20.
Yao J, Ruggeri MR, Taddei P, Sequeira V: Automatic scan registration using 3D linear and planar features. 3D Res 2010,1(3):1–18.
 21.
Chao C, Stamos I: Semi-automatic Range to Range Registration: a Feature-based Method. International Conference on 3D Digital Imaging and Modeling (3DIM) 2005.
 22.
Dold C, Brenner C: Registration of Terrestrial Laser Scanning Data Using Planar Patches and Image Data. International Society for Photogrammetry and Remote Sensing 2006.
 23.
Chao C, Stamos I: Range Image Registration Based on Circular Features. Proceedings of International Symposium on 3D Data Processing Visualization and Transmission (3DPVT) 2006.
 24.
Franaszek M, Cheok GS, Witzgall C: Fast automatic registration of range images from 3D imaging systems using sphere targets. Autom Constr 2009,18(3):265–274. 10.1016/j.autcon.2008.08.003
 25.
Rabbani T, van den Heuvel F: Automatic Point Cloud Registration Using Constrained Search for Corresponding Objects. 7th Conference on Optical 3D Measurement Techniques 2005.
 26.
Silva L, Bellon OR, Boyer KL: Precision range image registration using a robust surface interpenetration measure and enhanced genetic algorithms. IEEE Trans Pattern Anal Mach Intell 2005, 27: 762–776.
 27.
Boughorbel F, Mercimek M, Koschan A, Abidi MA: A new method for the registration of three-dimensional point-sets: the Gaussian fields framework. Image Vis Comput 2010,28(1):124–137. 10.1016/j.imavis.2009.05.003
 28.
Gold S, Rangarajan A, Lu CP, Pappu S, Mjolsness E: New algorithms for 2D and 3D point matching: pose estimation and correspondence. Pattern Recogn 1998,31(8):1019–1031. 10.1016/S0031-3203(98)80010-1
 29.
Granger S, Pennec X: Multi-scale EM-ICP: A Fast and Robust Approach for Surface Registration. In Proc. of the 7th European Conference on Computer Vision, Part IV, ECCV '02. London, UK: Springer-Verlag; 2002:418–432.
 30.
Liu Y: Automatic range image registration in the Markov chain. IEEE Trans Pattern Anal Mach Intell 2010,32(1):12–29.
 31.
Tamaki T, Abe M, Raytchev B, Kaneda K: Softassign and EM-ICP on GPU. International Conference on Networking and Computing (ICNC) 2010, 179–183.
 32.
Juan L, Gwon O: A Comparison of SIFT, PCA-SIFT and SURF. Int J Image Process IJIP 2009,3(4):143–152.
 33.
Arya S, Mount DM, Netanyahu NS, Silverman R, Wu AY: An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions. ACM-SIAM Symposium on Discrete Algorithms 1994.
 34.
Zhang Z, Faugeras OD: Determining motion from 3D line segment matches: a comparative study. Image Vis Comput 1991, 9: 10–19. 10.1016/0262-8856(91)90043-O
 35.
Gelfand N, Ikemoto L, Rusinkiewicz S, Levoy M: Geometrically Stable Sampling for the ICP Algorithm. Fourth International Conference on 3D Digital Imaging and Modeling (3DIM) 2003.
 36.
Brown BJ, Rusinkiewicz S: Global nonrigid alignment of 3D scans. ACM Trans Graph 2007,26(3):21. 10.1145/1276377.1276404
 37.
Torsello A, Rodola E, Albarelli A: Sampling Relevant Points for Surface Registration. International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT2011) 2011.
 38.
Nehab D, Shilane P: Stratified Point Sampling of 3D Models. Proc of the Symposium on Point-Based Graphics 2004, 49–56.
 39.
Ruggeri MR, Patanè G, Spagnuolo M, Saupe D: Spectral-driven isometry-invariant matching of 3D shapes. Int J Comput Vis 2010, 89: 248–265. 10.1007/s11263-009-0250-0
 40.
Bowers J, Wang R, Wei LY, Maletz D: Parallel Poisson disk sampling with spectrum analysis on surfaces. ACM Trans Graph 2010, 29: 166:1–166:10.
 41.
Pauly M, Gross M, Kobbelt LP: Efficient simplification of point-sampled surfaces. Proc of Vis 2002, 163–170.
 42.
Yao J, Taddei P, Ruggeri MR, Sequeira V: Complex and photorealistic scene representation based on range planar segmentation and model fusion. Int J Robotics Res 2011,30(10):1263–1283. 10.1177/0278364911410754
 43.
Tuytelaars T, Mikolajczyk K: Local invariant feature detectors: a survey. Found Trends Comput Graph Vis 2008,3(3):177–280.
 44.
Lowe DG: Distinctive image features from scale-invariant keypoints. Int J Comput Vis 2004,60(2):91–110.
 45.
Bay H, Ess A, Tuytelaars T, Van Gool L: Speeded-up robust features (SURF). Comput Vis Image Underst 2008,110(3):346–359. 10.1016/j.cviu.2007.09.014
 46.
Rosten E, Drummond T: Machine Learning for High-speed Corner Detection. European Conference on Computer Vision 2006, 1: 430–443.
 47.
Sinha SN, Frahm JM, Pollefeys M, Genc Y: GPU-based Video Feature Tracking and Matching. Tech. rep., Workshop on Edge Computing Using New Commodity Architectures 2006.
 48.
Rosten E, Porter R, Drummond T: Faster and better: a machine learning approach to corner detection. IEEE Trans Pattern Anal Mach Intell 2010,32(1):105–119.
 49.
Harris C, Stephens M: A Combined Corner and Edge Detector. The Fourth Alvey Vision Conference 1988, 147–151.
 50.
Albarelli A, Rodola E, Torsello A: Loosely Distinctive Features for Robust Surface Alignment. European Conference on Computer Vision (ECCV 2010) 2010, 519–532.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Yao, J., Ruggeri, M.R., Taddei, P. et al. Robust surface registration using Npoints approximate congruent sets. EURASIP J. Adv. Signal Process. 2011, 72 (2011). https://doi.org/10.1186/16876180201172
Keywords
 Point Cloud
 Scale Invariant Feature Transform
 Iterative Close Point
 Registration Algorithm
 Point Pair