Robust surface registration using N-points approximate congruent sets
- Jian Yao^{1},
- Mauro R Ruggeri^{1},
- Pierluigi Taddei^{1} and
- Vítor Sequeira^{1}
DOI: 10.1186/1687-6180-2011-72
© Yao et al; licensee Springer. 2011
Received: 26 May 2011
Accepted: 23 September 2011
Published: 23 September 2011
Abstract
Scans acquired by 3D sensors are typically represented in a local coordinate system. When multiple scans taken from different locations represent the same scene, they must be registered to a common reference frame. We propose a fast and robust registration approach that automatically aligns two scans by finding two sets of N-points that are approximately congruent under rigid transformation and lead to a good estimate of the transformation between their corresponding point clouds. Given two scans, our algorithm randomly searches for the best sets of congruent groups of points using a RANSAC-based approach. To successfully and reliably align two scans when the overlap is small, we improve the basic RANSAC random selection step by employing a weight function that approximates the probability of each pair of points in one scan matching a pair in the other. The search time to find pairs of congruent sets of N-points is greatly reduced by employing a fast search codebook based on both binary and multi-dimensional lookup tables. Moreover, we introduce a novel indicator of overlapping-region quality, which is used to verify the estimated rigid transformation and to improve the alignment robustness. Our framework is general enough to incorporate and efficiently combine different point descriptors derived from geometric and texture-based feature points or scene geometrical characteristics. We also present a method to improve the matching effectiveness of texture feature descriptors by extracting them from an atlas of rectified images recovered from the scan reflectance image. Our algorithm is robust with respect to different sampling densities and resilient to noise and outliers. We demonstrate its robustness and efficiency on several challenging scan datasets, with varying degrees of noise, outliers and extent of overlap, acquired from indoor and outdoor scenarios.
1 Introduction
Over the past decade, there has been growing interest in 3-D reconstruction and realistic 3-D modelling of large-scale scenes such as urban structures. Applications of such models include virtual reality, cultural heritage, urban planning, and architecture. Commonly, these applications require a combination of laser sensing technology with traditional digital photography.
Applications that employ only digital images extract 3-D information using either a single moving camera or a multi-camera system, such as a stereo rig. In both cases, the system extracts and matches distinctive features (typically points) among the available images and estimates both their 3-D positions and the camera parameters [1, 2]. It is then possible to exploit the result of this first step to perform a dense point reconstruction by estimating a depth map for each image [3, 4]. On the one hand, these approaches are useful for applications requiring a robust and low-cost acquisition system. On the other hand, laser sensing technology yields much higher precision and resolution. Laser sensing thus represents an effective and powerful tool for achieving accurate geometric representations of complex surfaces in real scenes.
In recent years, 3-D laser scanners able to provide satisfactory measurement accuracy for different applications have become commercially available. These sensors are used to acquire a complex real scene through multiple scans taken from different positions, so as to fully describe the scene while reducing the number of occluded surfaces. For this reason, it is important to employ a systematic and automatic way to align, or register, multiple 3-D scans in order to represent and visualize them in a common coordinate system. Geometrically, given a point cloud $\mathcal{Q}$ considered as reference and a second point cloud $\mathcal{P}$, the registration problem consists of finding the rigid transformation $\mathcal{T}$ that optimally aligns $\mathcal{P}$ to $\mathcal{Q}$ in its coordinate system.
1.1 Related works
The iterative closest point (ICP) algorithm [5] is the de facto standard for computing the rigid transformation $\mathcal{T}$ between two point clouds. It is essentially an optimization method that starts from an initial estimate of $\mathcal{T}$ and iteratively refines it by generating pairs of corresponding points on the scans and minimizing an error metric, e.g., the sum of squared distances between corresponding points. Although several variants of ICP have been presented [6] to improve its efficiency, the main problem is to achieve a good initial estimate of $\mathcal{T}$, since the ICP optimization can easily get stuck in local minima.
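For concreteness, a minimal ICP iteration can be sketched as follows (a NumPy sketch with brute-force nearest neighbours and an SVD-based pose update; the function names are illustrative and none of the efficiency variants surveyed in [6] are included):

```python
import numpy as np

def best_rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping the rows of P onto the
    matched rows of Q, via the SVD (Kabsch) closed form."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                      # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                             # proper rotation (det = +1)
    t = cq - R @ cp
    return R, t

def icp(P, Q, iters=20):
    """Naive ICP: pair each point of P with its nearest neighbour in Q,
    re-estimate (R, t), and repeat with the updated correspondences."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        Pt = P @ R.T + t
        d2 = ((Pt[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
        nn = np.argmin(d2, axis=1)                 # closest point in Q for each p
        R, t = best_rigid_transform(P, Q[nn])
    return R, t
```

As the surrounding text notes, this scheme only refines a transformation; without a good initial estimate the nearest-neighbour correspondences can lock onto a local minimum.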
The problem of automatically registering two scans has been addressed with a wide variety of methods [7]. Most of these extract sets of feature points, which are automatically matched to recover a good approximation of $\mathcal{T}$. Aiger et al. [8] proposed to automatically match congruent sets of four roughly coplanar points to solve the largest common pointset (LCP) problem. Congruent sets of points have similar shapes defined in terms of point distances and normal deviations. The best match between congruent sets is found randomly following the RANdom SAmple Consensus (RANSAC) approach [9]. Other approaches use shape descriptors to identify sets of candidate feature points to be matched. Gelfand et al. [10] use a 3-D integral invariant shape descriptor to detect feature points, which are matched in sets of three using a branch-and-bound algorithm. Other interesting shape descriptors invariant under rigid transformation are used to identify feature points, such as scale invariant feature transform (SIFT) features [11, 12] or Harris corners [12] extracted from reflectance images, 3-D SIFT-like descriptors extracted from triangle meshes approximating the point clouds [13], wavelet features [14], intensity-based range features [15], spin images [16, 17], and extended Gaussian images [18].
Methods to automatically recover the rigid transformation from matching sets of higher-level features have also been presented. The advantage of these approaches is the reduction of the search space to two small sets of features, which results in efficient matching, at the cost of extra computation time for scene segmentation or feature detection. Among the feature types presented, the most interesting are lines [19, 20], planes [19–22], circles [23], spheres [24] and other fitted geometric primitives [25].
Other studies proposed to formulate the registration as an energy optimization problem that does not need any explicit set of point correspondences. Silva et al. [26] proposed to use an enhanced genetic algorithm to solve the range image registration problem using a robust surface interpenetration measure. Boughorbel et al. [27] defined an energy minimization function based on Gaussian fields to solve the 3-D automatic registration.
The last relevant class of registration approaches is based on modelling the alignment of two point sets as an assignment problem, where the probability of a point in one set to have a correspondence in the other set is estimated and maximized with expectation maximization (EM) algorithms. Popular methods following this approach are known as SoftAssign [28] and EM-ICP [29], which are both based on entropy maximization principles but impose different constraints for problem optimization, i.e., a two-way constraint embedded into the deterministic annealing scheme for SoftAssign and a one-way constraint for EM-ICP. A detailed review and analysis of these methods was provided in [30], where Liu proposed a method to overcome SoftAssign and EM-ICP limitations based on modelling the registration problem as a Markov chain of thermodynamic systems and on an entropy model derived from the Lyapunov function. Furthermore, fast GPU implementations of the SoftAssign and EM-ICP algorithms were recently presented by Tamaki et al. [31].
1.2 Our algorithm
Our method utilizes 3-D points (possibly associated with point descriptors, as described in Section 6) to achieve automated registration. It automatically aligns two scans by finding two N-points approximate congruent sets leading to a good estimate of the transformation $\mathcal{T}$ between the corresponding point clouds. $\mathcal{T}$ is then further refined via the ICP algorithm. Each RANSAC iteration consists of the following four steps:
- 1) Random selection of an N-points base ${\mathcal{B}}_{p}$ in $\mathcal{P}$.
- 2) Approximate congruent group selection of N-points bases in $\mathcal{Q}$. The definition of approximate point set congruence is given in Section 2. This selection is achieved by using a general codebook to efficiently find approximately congruent point bases under rigid transformation, exploiting combinations of feature point descriptors when available (see Section 3).
- 3) Estimation of the transformation $\mathcal{T}$ between $\mathcal{P}$ and $\mathcal{Q}$ given the randomly selected N-points base ${\mathcal{B}}_{p}$ in $\mathcal{P}$ and each extracted approximately congruent N-points base ${\mathcal{B}}_{q}$ in $\mathcal{Q}$.
- 4) Verification of the transformation. $\mathcal{T}$ is verified using all possibly corresponding points after the alignment. The verification employs our proposed quality-based largest common pointset (QLCP) measure described in Section 4.4.
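The four steps above can be sketched as a RANSAC-style skeleton. The four callables below stand in for the components described in Sections 2-4 and are assumptions of this sketch, not the paper's actual interfaces:

```python
def ransac_register(P, Q, select_base, find_congruent_bases,
                    estimate_transform, qlcp_score, i_max=100, i_nou=20):
    """RANSAC skeleton following steps 1)-4): returns the transformation with
    the best QLCP score found, together with that score."""
    best_T, best_score, stale = None, -1.0, 0
    for _ in range(i_max):                          # stop rule 1: I_max iterations
        Bp = select_base(P)                         # step 1: random N-points base
        improved = False
        for Bq in find_congruent_bases(Bp, Q):      # step 2: congruent bases in Q
            T = estimate_transform(Bp, Bq)          # step 3: rigid transform
            score = qlcp_score(T, P, Q)             # step 4: QLCP verification
            if score > best_score:
                best_T, best_score, improved = T, score, True
        stale = 0 if improved else stale + 1
        if stale >= i_nou:                          # stop rule 2: no update for I_nou
            break
    return best_T, best_score
```

The two stopping rules mirror the termination criteria listed in Section 4.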
The best transformation is selected as the one yielding the best QLCP measure and is then further refined with the ICP algorithm. As in [20], we present a variant of this RANSAC-based algorithm that improves the random selection step by employing a weight function approximating the probability of each pair of features in $\mathcal{P}$ being matched with one in $\mathcal{Q}$. We call this variant the probability-based RANSAC approach and describe it in Section 4.1.
Our algorithm is robust with respect to different sampling densities and the typical noise introduced by laser scanner acquisition. This is achieved by employing suitable point sampling approaches described in Section 5, and by using feature points and their descriptors to effectively constrain rigid transformations on noisy point sets.
Through our proposed matching framework presented in Sections 2 and 3, we efficiently match points and point pairs in a multi-dimensional space defined by a set of available geometric and texture feature descriptors and geometrical constraints of a set of sampled points. The matching is performed by combining suitable metric functions to compare the provided descriptors.
Any type of feature carrying a suitable distance function for comparison can be easily integrated into our matching framework. The major benefit of this approach is that it enables efficient customizations for specific applications, aimed at significantly improving the registration performance in terms of robustness, accuracy, and execution time.
We also present a method to improve the matching effectiveness of texture features extracted from typical spherical reflectance images acquired by laser scanners. It consists in extracting features from atlases of rectified perspective images constructed by sampling the reflectance image spherical field of view at suitable angles. This approach mitigates the effect of spherical distortion on the resulting feature signatures so that they can be matched with higher reliability.
The robustness, accuracy and efficiency of our method were overall evaluated on several challenging scan datasets acquired from indoor and outdoor scenes as described in Section 7.
2 Approximate point set congruence
Given two point sets $\mathcal{P}$ and $\mathcal{Q}$, we assume ${\mathcal{B}}_{p}={\left\{{p}_{i}|{p}_{i}\in \mathcal{P}\right\}}_{i=1}^{N}$ and ${\mathcal{B}}_{q}={\left\{{q}_{i}|{q}_{i}\in \mathcal{Q}\right\}}_{i=1}^{N}$ to be the two corresponding N-points bases from $\mathcal{P}$ and $\mathcal{Q}$, respectively. This means that for each point ${p}_{i}\in {\mathcal{B}}_{p}$ there exists one and only one corresponding point ${q}_{i}\in {\mathcal{B}}_{q}$. We consider the two sets to be congruent if they are approximately similar in shape and have a similar distribution in 3-D space. We define both a similarity score function and a binary similarity score function in order to measure the congruency of two matching N-points bases as follows.
For each type of local descriptor or points pair measurement, we define a similarity difference function d(·, ·) that is invariant under rigid transformation of each single N-points base. In particular, given two descriptors or measurements v_{ p } and v_{ q }, then d(v_{ p }, v_{ q }) is represented by a real positive value that states how different the two descriptors or measurements are.
where $\left\{{w}_{l}^{f}\right\}$ and $\left\{{w}_{k}^{m}\right\}$ are user-defined weights, ${N}_{f}=2N{\sum}_{l=1}^{L}{w}_{l}^{f}$ and ${N}_{m}=2N\left(N-1\right){\sum}_{k=1}^{K}{w}_{k}^{m}$ are normalization factors. Notice that s_{ c } is defined such that its values fall in the range [0, 1], where a higher value represents a higher similarity between the two N-points bases.
where $s\left({\mathcal{B}}_{p},{\mathcal{B}}_{q}\right)$ represents the product of all boolean similarities associated with the matching points of the two sets. We consider ${\mathcal{B}}_{p}$ and ${\mathcal{B}}_{q}$ to be approximately congruent only if $s\left({\mathcal{B}}_{p},{\mathcal{B}}_{q}\right)=1$.
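Under these definitions, the binary congruence test amounts to checking every per-point descriptor difference and every points-pair measurement difference against a threshold, and multiplying the resulting boolean factors. A minimal sketch (the difference functions and thresholds passed in are illustrative stand-ins for the descriptors and measurements of this section):

```python
def approx_congruent(Bp, Bq, feat_diffs, pair_diffs, feat_thresh, pair_thresh):
    """Binary congruence test for two N-points bases with matched ordering.
    feat_diffs: difference functions d(p_i, q_i) on per-point descriptors;
    pair_diffs: difference functions d(p_i, p_j, q_i, q_j) on points-pair
    measurements. Returns 1 (approximately congruent) or 0."""
    N = len(Bp)
    for i in range(N):                       # per-point descriptor factors
        for d, tau in zip(feat_diffs, feat_thresh):
            if d(Bp[i], Bq[i]) > tau:
                return 0                     # one boolean factor is 0 -> product is 0
    for i in range(N):                       # points-pair measurement factors
        for j in range(i + 1, N):
            for d, tau in zip(pair_diffs, pair_thresh):
                if d(Bp[i], Bp[j], Bq[i], Bq[j]) > tau:
                    return 0
    return 1
```

With the Euclidean pair-distance difference as the only measurement, a rotated copy of a base passes the test while a scaled copy fails it, as expected for a rigid-invariant criterion.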
If the reflectance or colour images associated with the range scans are available we can extract the corresponding feature points (e.g., SIFT or SURF feature points [32]) associated with each 3-D point of ${\mathcal{B}}_{p}$ and ${\mathcal{B}}_{q}$. The corresponding local feature descriptors can be used to define a suitable similarity difference.
In some applications, it is possible to exploit information about the environment to define additional descriptors. This is the case for scans representing structural scenes with one common main normal direction (ground floor scene) or environments with three common orthogonal normal directions (orthogonal scene). For instance, in an indoor/outdoor scene with a common main ground floor plane, all points lying on the ground plane have roughly the same normal direction. The type of a structural scene can be automatically detected and classified by clustering the surface normals.
If both $\mathcal{P}$ and $\mathcal{Q}$ are acquired from an orthogonal scene, $\mathcal{P}$ and $\mathcal{Q}$ are first transformed to align their corresponding three orthogonal point normals to the x-, y- and z-axis, respectively. Then, we exploit two more local descriptors defined as f_{ x }(p) = n_{angle}(p, n_{ x }) and f_{ y }(p) = n_{angle}(p, n_{ y }), where n_{ x } = (1, 0, 0)^{⊤} and n_{ y } = (0, 1, 0)^{⊤}. These descriptors represent the inclinations of the surface passing through p w.r.t. the additional main axes n_{ x } and n_{ y }, respectively. In addition, we introduce two other points pair measurements ${m}_{x}\left({p}_{i},{p}_{j}\right)={p}_{i}^{x}-{p}_{j}^{x}$, and ${m}_{y}\left({p}_{i},{p}_{j}\right)={p}_{i}^{y}-{p}_{j}^{y}$. The corresponding similarity differences of f_{ x }, f_{ y }, m_{ x } and m_{ y } are defined in the same way as in Equations (9) and (10), respectively.
3 Fast search codebook
Using the criteria defined in the previous section, we are able to evaluate the congruency of two given N-points bases. To perform the registration, we need to couple an N-points base ${\mathcal{B}}_{p}\in \mathcal{P}$ with the N-points base ${\mathcal{B}}_{q}\in \mathcal{Q}$ having a high similarity score. This task requires a search over all possibly congruent N-points bases in $\mathcal{Q}$. Exhaustive search approaches are impractical due to the large number of candidates in $\mathcal{Q}$. To solve this problem, we build a codebook from $\mathcal{P}$ and $\mathcal{Q}$ composed of two different data structures used to perform a fast search of possibly corresponding points (as described in Section 3.1) and point pairs (as described in Section 3.2). In particular, we employ a boolean table S_{ f } used to detect candidate point matches in $\mathcal{Q}$ of a selected point ${p}_{i}\in \mathcal{P}$ and a multi-dimensional table S_{ m } used to detect candidate point pair matches of a selected pair of points $\left({p}_{i},{p}_{j}\right)\in \mathcal{P}$. If the number of all detected congruent N-points bases is still large, we further compute a similarity score between ${\mathcal{B}}_{p}$ and each detected base in $\mathcal{Q}$ in order to sort them and then consider only the best ones.
The codebook is thus composed of a boolean m × n table and a floating-point n × n × K table^{a}, where $m=\phantom{\rule{2.77695pt}{0ex}}\left|\mathcal{P}\right|,\phantom{\rule{2.77695pt}{0ex}}n=\phantom{\rule{2.77695pt}{0ex}}\left|\mathcal{Q}\right|$ and K denotes the number of points pair measurements used. The required memory for S_{ f } increases as O(mn) and for S_{ m } as O(n^{2}) (assuming K ≪ n).
Our algorithm detects candidate congruent N-points bases incrementally. Given ${\mathcal{B}}_{p}\in \mathcal{P}$, we start by selecting two points in ${\mathcal{B}}_{p}$ and collect all congruent 2-points bases in ${\mathcal{B}}_{q}$. We then iteratively add points to the current selection and grow the set of candidate bases until we reach a set of N-points bases.
3.1 Point features lookup table
Notice that, to build S_{ f }, we only make use of the local feature descriptors. Its size depends on the number of points of both point clouds. In Section 5, we describe several techniques to sample the input acquisitions in order to reduce their sizes.
3.2 Point pairs lookup table
Given a point pair $\left({p}_{i},{p}_{j}\right)\in \mathcal{P}$, we need to efficiently find candidate matching pairs in $\mathcal{Q}$. A blind exhaustive search would require a comparison with $\frac{n\left(n-1\right)}{2}$ point pairs. To reduce the search time, we build a lookup table S_{ m } for $\mathcal{Q}$ by uniformly quantising the K-dimensional space formed by the K points pair measurements used {m_{ k }(·, ·)}, i.e., the Euclidean distance, the surface normal minimal angle, the gap difference(s) in the x-, y- or z-axis for structural scenes, etc. The quantisation is achieved by uniformly dividing their corresponding value ranges into B_{1}, B_{2}, ..., B_{ K } bins, respectively. The range of the Euclidean distance is $\left[{d}_{min}^{q},{d}_{max}^{q}\right]$, where ${d}_{min}^{q}$ and ${d}_{max}^{q}$ denote the minimal and maximal distances of point pairs in $\mathcal{Q}$. The surface normal minimal angle falls in the range [0, π]. The gap differences fall in the ranges [-b_{ x }, b_{ x }], [-b_{ y }, b_{ y }] and [-b_{ z }, b_{ z }], respectively, where b_{ x }, b_{ y } and b_{ z } represent the lengths along the x-, y- and z-axis of the minimal bounding box covering all points in $\mathcal{Q}$. Each K-dimensional bin contains all point pairs $\left({q}_{i},{q}_{j}\right)\in \mathcal{Q}$ whose measurements {m_{ k }(q_{ i }, q_{ j })} fall within the bin ranges.
Finally, using Equation 3, we evaluate the similarity score of each remaining candidate pair with (p_{ i }, p_{ j }) and keep only the best K_{ p } pairs. For very distinctive points p_{ i } and p_{ j }, there are few correspondences in $\mathcal{Q}$ with similar local features. For such point pairs, it is more convenient to first select the set of matching points using S_{ f } and then verify each points pair using the set of measurements {m_{ k }(p_{ i }, p_{ j })}. This initial test is conducted by evaluating the value of $\left|{\mathcal{M}}_{f}\left({p}_{i}\right)\right|\times \left|{\mathcal{M}}_{f}\left({p}_{j}\right)\right|$, i.e., the largest number of candidate point pair matches w.r.t. p_{ i } and p_{ j } due to their local features. When this value is lower than a threshold, we employ this latter selection method. Our codebook-based search method allows one to efficiently range-search candidate matching point pairs using adaptive ranges for each query. If we regard the K points pair difference measurements as a K-dimensional vector, other fast search methods could be used, e.g., the approximate nearest neighbour based on a kd-tree [33]. However, these methods cannot handle the threshold constraints in each dimension, which may produce more candidates to test while discarding valid ones.
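The uniform quantisation behind S_{ m } can be sketched as follows. The dense n × n × K table is stored sparsely here as a hash map from bin keys to point-pair lists, which is an implementation choice of this sketch; `measurements`, `ranges` and `bins` stand for the {m_{ k }}, their value ranges and the {B_{ k }}:

```python
import numpy as np
from collections import defaultdict

def build_pair_table(Q, measurements, ranges, bins):
    """Build the points-pair lookup table for Q: every pair (q_i, q_j) is
    indexed by the K-dimensional bin of its quantised measurements."""
    table = defaultdict(list)
    n = len(Q)
    for i in range(n):
        for j in range(i + 1, n):
            key = tuple(
                min(max(int((m(Q[i], Q[j]) - lo) / (hi - lo) * B), 0), B - 1)
                for m, (lo, hi), B in zip(measurements, ranges, bins))
            table[key].append((i, j))
    return table
```

At query time, a pair (p_{ i }, p_{ j }) is mapped by the same quantisation to its bin (and, with adaptive ranges, to neighbouring bins), and only the pairs stored there need to be compared.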
3.3 Iterative search of matching N-points bases
4 RANSAC pose optimization
- 1. The number of iterations reaches a predefined maximal iteration number I_{max};
- 2. The best transformation is not updated after I_{nou} consecutive iterations.
Our method makes use of the codebook-based search scheme defined in Section 3, which is constructed before the optimization. The following sections describe each step of the RANSAC iteration in detail.
4.1 Random selection
Assuming that k points in $\mathcal{P}$ have corresponding points in $\mathcal{Q}$, the probability of successfully selecting N points from $\mathcal{P}$ that all have correspondences in $\mathcal{Q}$ is p(N) ≈ (k/m)^{ N }, where $m=\phantom{\rule{2.77695pt}{0ex}}\left|\mathcal{P}\right|$. To successfully recover the transformation, we generally employ a base size of N = 3, 4 or 5 points, because the probability of success decreases sharply as the base size N increases. Moreover, to make the estimated transformation more robust, we select the N-points base ${\mathcal{B}}_{p}$ from $\mathcal{P}$ with its points spread as widely as possible in 3-D space.
- 1. Both points p_{ i } and p_{ j } potentially have several matches in $\mathcal{Q}$ based on their considered local descriptors (see Equation 12),
- 2. They are well spaced, and
- 3. There exists a very similar 2-points base $\left({q}_{i},{q}_{j}\right)\in \mathcal{Q}$ according to the similarity measure s_{ c }(·, ·) defined in Equation 3.
- 1. We randomly select the first two points $\left({p}_{{s}_{1}},{p}_{{s}_{2}}\right)$ based on the probability values {S_{ p }(p_{ i }, p_{ j })}_{i > j, 1 ≤ i, j ≤ m} of the upper triangular part of the symmetric pairwise matching probability table S_{ p }. These points are added to the initially empty ${\mathcal{B}}_{p}^{c}$.
- 2. The next point ${p}_{{s}_{k+1}},\phantom{\rule{2.77695pt}{0ex}}k\ge 2$ is randomly selected based on the joint probability values ${\left\{{\prod}_{{p}_{{s}_{k}}\in {\mathcal{B}}_{p}^{c}}p\left({p}_{i},{p}_{{s}_{k}}\right)\right\}}_{{p}_{i}\in \mathcal{P},{p}_{i}\notin {\mathcal{B}}_{p}^{c}}$.
In this way, there is a high probability of selecting an N-points base ${\mathcal{B}}_{p}$ with corresponding points in $\mathcal{Q}$.
where ψ ∈ (0, 1) is a positive constant.
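The two-step probability-based selection can be sketched as follows, assuming the pairwise matching probability table S_{ p } has already been computed (the sparse growth of ${\mathcal{B}}_{p}^{c}$ and the variable names are illustrative):

```python
import numpy as np

def select_base(Sp, N, rng=None):
    """Probability-based selection of an N-points base (sketch of Section 4.1).
    Sp is the symmetric m x m pairwise matching-probability table; point i of
    P is identified with row/column i of Sp."""
    if rng is None:
        rng = np.random.default_rng()
    m = Sp.shape[0]
    rows, cols = np.triu_indices(m, k=1)
    w = Sp[rows, cols]                              # upper-triangular probabilities
    k = rng.choice(len(w), p=w / w.sum())           # step 1: draw the first pair
    base = [int(rows[k]), int(cols[k])]
    while len(base) < N:                            # step 2: grow by joint probability
        joint = np.ones(m)
        for s in base:
            joint *= Sp[:, s]                       # product over the current base
        joint[base] = 0.0                           # never reselect a chosen point
        base.append(int(rng.choice(m, p=joint / joint.sum())))
    return base
```

Pairs (and then points) with higher estimated matching probability are drawn more often, which is what raises the per-iteration success rate over plain uniform selection.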
4.2 Approximate congruent group selection
After selecting an N-points base ${\mathcal{B}}_{p}$ from $\mathcal{P}$, we need to detect a set of approximately congruent N-points bases in $\mathcal{Q}$. This is done by exploiting the fast codebook structures S_{ f } and S_{ m } defined in Section 3. In particular, following Algorithm 1, we are able to iteratively recover the set of congruent N-points bases as we select points of ${\mathcal{B}}_{p}$ from $\mathcal{P}$. We keep only the first K_{ g } candidates according to Equation 3. These K_{ g } N-points bases ${\left\{{\mathcal{B}}_{q}^{k}\right\}}_{k=1}^{{K}_{g}}$ are used in the following step to estimate a set of point cloud transformations.
4.3 Transformation estimation
Given two point sets $\mathcal{P}$ and $\mathcal{Q}$ with overlapping regions in arbitrary initial positions, we recover the transformation from a prescribed family of transformations, typically rigid transformations, that best aligns the overlapping regions of $\mathcal{P}$ and $\mathcal{Q}$. In the case of a rigid transformation, we need a base size of at least three points to uniquely determine the aligning transformation. This means that our algorithm requires at least a pair of matching 3-points bases from $\mathcal{P}$ and $\mathcal{Q}$, respectively. In particular, for any given pair of N-points bases ${\mathcal{B}}_{p}$ and ${\mathcal{B}}_{q}$, we recover the corresponding transformation $\mathcal{T}$ using the closed-form solution [34].
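The cited closed-form solution is not reproduced in the text; one standard closed form is the unit-quaternion absolute-orientation method in the spirit of Horn, sketched below under that assumption:

```python
import numpy as np

def closed_form_rigid(Bp, Bq):
    """Closed-form rigid transform via the unit-quaternion method: returns
    (R, t) such that R @ p + t ~= q for matched rows of Bp and Bq."""
    cp, cq = Bp.mean(0), Bq.mean(0)
    M = (Bp - cp).T @ (Bq - cq)                       # 3x3 correlation matrix
    Sxx, Sxy, Sxz = M[0]
    Syx, Syy, Syz = M[1]
    Szx, Szy, Szz = M[2]
    # Horn's symmetric 4x4 matrix; its top eigenvector is the rotation quaternion.
    Nm = np.array([
        [Sxx + Syy + Szz, Syz - Szy,        Szx - Sxz,        Sxy - Syx],
        [Syz - Szy,       Sxx - Syy - Szz,  Sxy + Syx,        Szx + Sxz],
        [Szx - Sxz,       Sxy + Syx,       -Sxx + Syy - Szz,  Syz + Szy],
        [Sxy - Syx,       Szx + Sxz,        Syz + Szy,       -Sxx - Syy + Szz]])
    _, V = np.linalg.eigh(Nm)
    q0, qx, qy, qz = V[:, -1]                         # eigenvector, largest eigenvalue
    R = np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - q0*qz),     2*(qx*qz + q0*qy)],
        [2*(qx*qy + q0*qz),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - q0*qx)],
        [2*(qx*qz - q0*qy),     2*(qy*qz + q0*qx),     1 - 2*(qx*qx + qy*qy)]])
    t = cq - R @ cp
    return R, t
```

With N ≥ 3 non-collinear matched points, the result is the least-squares rigid alignment of the base pair.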
4.4 Transformation verification
where λ is a positive parameter and ${\int}_{0}^{1}{\mathscr{H}}_{n}\left(\mathcal{T},\delta \right)$ denotes the integral of ${\mathscr{H}}_{n}\left(\mathcal{T},\delta \right)$, i.e., the area below the cumulative curve of ${\mathscr{H}}_{n}\left(\mathcal{T},\delta \right)$. We use a quality-based LCP (QLCP) measure defined by ${f}_{\mathsf{\text{QLCP}}}\left(\mathcal{T},\delta \right)={m}_{s}\left(\mathcal{T},\delta \right)\cdot {f}_{\mathsf{\text{LCP}}}\left(\mathcal{T},\delta \right)$ as our best-fit criterion. By weighting ${f}_{\mathsf{\text{LCP}}}\left(\mathcal{T},\delta \right)$ with the quality estimate ${m}_{s}\left(\mathcal{T},\delta \right)$, the QLCP measure is made less sensitive to the choice of δ than the LCP measure.
After the transformation estimation step, we evaluate each recovered transformation ${\mathcal{T}}_{k}$ between ${\mathcal{B}}_{p}$ and ${\mathcal{B}}_{q}^{k}$ w.r.t. the mean alignment error. In particular, we test whether the error $\frac{1}{N}{\sum}_{{p}_{i}\in {\mathcal{B}}_{p},{q}_{i}^{k}\in {\mathcal{B}}_{q}^{k}}\left|\right|{\mathcal{T}}_{k}{p}_{i}-{q}_{i}^{k}|{|}^{2}$ is less than some predefined threshold, where ${\mathcal{T}}_{k}{p}_{i}$ denotes the transformed point of p_{ i } via ${\mathcal{T}}_{k}$. We further verify each remaining transformation by detecting how many points in $\mathcal{P}$ have correspondences in $\mathcal{Q}$ under ${\mathcal{T}}_{k}$ and then measuring the matching score as described in Equation 17. We say that a point $p\in \mathcal{P}$ has a corresponding point in $\mathcal{Q}$ under ${\mathcal{T}}_{k}$ if there exists some point in $\mathcal{Q}$ within δ-distance of the transformed point ${\mathcal{T}}_{k}p$, i.e., ${\exists}_{q\in \mathcal{Q}}\left|\right|q-{\mathcal{T}}_{k}p\left|\right|\phantom{\rule{2.77695pt}{0ex}}\le \delta $. For efficiency, we use the approximate nearest neighbours [33] for neighbourhood querying in ℝ^{3}. We first select a fixed number of points $\left\{{p}_{i}\right\}\in \mathcal{P}$ and apply the transformation ${\mathcal{T}}_{k}$. Then, for each transformed point ${\mathcal{T}}_{k}{p}_{i}$ we query the nearest neighbour in $\mathcal{Q}$. If enough points of {p_{ i }} are matched, we perform a similar test for the remaining points in $\mathcal{P}$ and assign to ${\mathcal{T}}_{k}$ a score based on our QLCP measure.
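The δ-correspondence counting underlying ${f}_{\mathsf{\text{LCP}}}$ can be sketched as follows (brute-force nearest neighbour instead of the approximate NN of [33]; the quality weight ${m}_{s}$ of the QLCP measure is not reproduced here):

```python
import numpy as np

def lcp_fraction(T, P, Q, delta):
    """Fraction of points of P with a neighbour in Q within delta after
    applying T = (R, t): the LCP ingredient of the verification step."""
    R, t = T
    Pt = P @ R.T + t                                   # transformed points T_k p_i
    d2 = ((Pt[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
    return float((d2.min(axis=1) <= delta * delta).mean())
```

A perfect alignment of identical clouds gives a fraction of 1, and a grossly wrong transformation gives 0, which is what makes the measure usable for ranking candidate transformations.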
Finally, we update the current best transformation found with the best QLCP measure and start the next RANSAC iteration.
5 Point sampling approaches
Given two large point sets $\mathcal{P}$ and $\mathcal{Q}$, matching approximately congruent sets of points over the entire data set is not feasible. Thus, we need efficient point sampling strategies to quickly search corresponding sets and to effectively estimate and verify their transformation on a limited number of meaningful candidate points. The reliability of the proposed registration approach depends, to some extent, on the sampling strategy used. If we sample too many points from scarcely meaningful regions, the registration might converge slowly, find the wrong transformation (such as solutions showing sliding effects produced by samples poorly constraining the transformation), or even diverge, especially in the presence of noise and outliers. Several point sampling techniques for point cloud alignment were recently proposed [6, 35–37]. In [6], random, uniform (over the surface area of a model) and normal-space sampling are considered to evaluate the convergence of the ICP algorithm. The normal-space sampling algorithm tries to uniformly spread the normals of the selected points over the sphere of directions. The aim is to consider points that sufficiently constrain the estimated rigid transformation and improve the alignment quality by reducing surface sliding effects. Gelfand et al. [35] proposed a variant of this algorithm that makes the transformation estimation geometrically stable by selecting points that reduce both translational and rotational uncertainties in the ICP algorithm. This technique samples points so as to equally constrain all eigenvectors of the covariance matrix estimated from the points and normals of the overlapping region of two point clouds. A similar approach was used in [36] to conceive a probability function guiding the selection of stable sample points, which are also constrained by specific features. Torsello et al. [37] proposed a sampling technique that selects feature points with high local distinctiveness, which is inversely proportional to the average local radius of curvature and related to the area formed by similar points in the neighbourhood of each point. Nehab and Shilane [38] discussed the limitation of area-based uniform sampling, where the probability of a surface point being sampled is equal for all surface points. This type of sampling might produce points very close to each other and miss important surface features that could successfully constrain the transformation. To overcome these drawbacks, they proposed a stratified point sampling strategy ensuring an even distribution of the sample points over the whole surface, which implies a higher probability of capturing important surface regions. This algorithm uses a voxelization of the model to generate random samples with controlled intra-distances. Other sampling strategies providing a uniform distribution of sample points on a surface are based on farthest point [39] and Poisson disk sampling [40], which require the computation of geodesic distances.
In this section, we investigate four sampling approaches: random, uniform, probabilistic and combined sampling, which are described in detail in the following.
5.1 Random sampling
Random sampling is the simplest and most widely used sampling technique: each element of the population has an equal chance of being selected at each draw. A sample is random if the method for obtaining it meets this criterion of randomness; the actual composition of the sample itself does not determine whether or not it was a random sample.
5.2 Uniform sampling
To achieve a high probability of success when registering two overlapping point sets, we expect the sampled point sets to have similar point densities. If the acquired surfaces present similar point densities, the above-mentioned random sampling is an acceptable choice, as it also results in similar point densities in the sampled overlapping parts. However, this assumption does not hold in general. Point density depends on the distances and on the incident angles of the scanned surfaces with respect to the scanner sensor position and orientation, respectively. Normally, short distances and small incident angles lead to surfaces with high point densities and accuracies, which we consider as high-resolution regions. Random sampling does not guarantee an equal spread of the generated points either on the surface or in the volume of the scanned model, and can sample points very close to each other. Thus, it is more likely to miss important surface features than an evenly distributed sampling [38]. This effect is particularly evident for scanned data with non-uniform point densities.
Many approaches can be employed to sample uniformly distributed points on the model surface [38–40]. We propose a simple and efficient variant of the method presented in [38], which is based on a cubic voxelization of point clouds and provides samples evenly distributed over typical scanned surfaces. Given a point set $\mathcal{P}$, we assign its points to a set of 3-D cubic voxels of equal size, which partition the 3-D space. For each such voxel, we select the point closest to the voxel centre as the sampled point. To obtain a sampled point set of a given size N_{ s }, we start by splitting the minimal bounding box of the point cloud into a small set of 3-D cubic voxels. We then iteratively split each voxel into eight smaller voxels until enough sampled points are found. With this strategy, however, we cannot obtain an exactly fixed-size set of sampled points, since many voxels do not contain points. Let N_{ l } be the number of sampled points at the l th level. The final level L is such that its number of points N_{ L } is not less than N_{ s }, while the number of points at the previous level is N_{L-1}< N_{ s }. To obtain a sampling of the expected size N_{ s }, the simplest way is to randomly select N_{ s } points from the N_{ L } points of the L th level. To achieve a more uniform distribution, we propose, instead, to re-split all points in $\mathcal{P}$ into a set of cubic voxels of size ${\left(\frac{8{N}_{s}^{2}}{{N}_{L-1}{N}_{L}}\right)}^{\frac{1}{3}}{S}_{L-1}$, where S_{L-1} denotes the voxel size at the (L - 1)th level. In this way, the number of uniformly sampled points is very close to N_{ s }, but still not exact. If the number is larger than N_{ s }, we randomly select N_{ s } points from them. Otherwise, we add some new points from the sampled point set at the L th level.
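The core of this voxel-based selection, omitting the iterative splitting that targets an exact sample size N_{ s }, can be sketched as:

```python
import numpy as np

def voxel_sample(P, voxel_size):
    """Uniform sampling via cubic voxelization: for each occupied voxel keep
    the point closest to the voxel centre."""
    P = np.asarray(P, float)
    lo = P.min(axis=0)
    idx = np.floor((P - lo) / voxel_size).astype(int)      # voxel index per point
    centres = (idx + 0.5) * voxel_size + lo                # centre of each point's voxel
    d = np.linalg.norm(P - centres, axis=1)
    best = {}                                              # voxel -> closest point index
    for i, key in enumerate(map(tuple, idx)):
        if key not in best or d[i] < d[best[key]]:
            best[key] = i
    return P[sorted(best.values())]
```

One sample per occupied voxel guarantees a minimum spacing between samples, which is what gives the even spread over the surface discussed above.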
5.3 Probabilistic sampling
If the acquired surfaces are very similar in structure, e.g., the surfaces of an indoor environment composed of a main flat wall and several small objects in front of it, the above two sampling approaches may not be efficient for our proposed registration. One reason is that we select the best transformation based on the degree of overlap of the point sets, not on the whole scene structure. In the above example, points from the single main wall weakly constrain translations and generate sliding effects in the final alignment, as already discussed in [6, 35–37]. Another reason is that an N-points base selected from the wall would have a large number of approximately congruent bases, requiring expensive searches, tests, estimates, and verifications of candidate transformations over a large set of approximately congruent bases. To avoid these problems, we prefer to consider more points from objects in front of the wall, which reduces the computation and better constrains the rigid transformation. Similarly to [37], this is achieved with a probabilistic sampling technique that selects points based on likelihoods computed from a specific weight function. The weight function determines how relevant each point is for registration and is essentially associated with the local geometrical properties of the surface at each point. We experimented with two different weight functions, ω_{SV} and ω_{APD}, based on surface variation and adjacent point distance, respectively.
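The selection step itself reduces to weighted sampling without replacement; a sketch, assuming the per-point weights ω(p) have already been computed:

```python
import numpy as np

def probabilistic_sample(points, weights, n, rng=None):
    """Draw n points without replacement, with probability proportional
    to a per-point weight (e.g. omega_SV or omega_APD)."""
    rng = np.random.default_rng(rng)
    w = np.asarray(weights, dtype=float)
    p = w / w.sum()
    chosen = rng.choice(len(points), size=n, replace=False, p=p)
    return np.asarray(points)[chosen]
```

Points with zero weight (e.g. perfectly planar regions under ω_SV) are never selected.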
where λ_{1} ≤ λ_{2} ≤ λ_{3} are the eigenvalues corresponding to the principal components of a set of k points in the neighbourhood of p. ω_{SV}(p) ∈ [0, 1] indicates how much the surface at p locally deviates from the tangent plane [41]. In practice, ω_{SV}(p) roughly approximates the mean curvature at p: a value close to zero indicates that the surface is locally planar, while a large value of ω_{SV}(p) identifies an interesting feature such as a corner or a bump.
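The equation defining ω_{SV} was lost in this copy; the sketch below uses the standard surface-variation formula of Pauly et al., σ(p) = λ_1/(λ_1 + λ_2 + λ_3), as an assumption (note that this raw ratio lies in [0, 1/3]; the paper's normalisation to [0, 1] is not reproduced here):

```python
import numpy as np

def surface_variation(neighbours):
    """Surface variation of a k-neighbourhood:
    lambda_1 / (lambda_1 + lambda_2 + lambda_3), with
    lambda_1 <= lambda_2 <= lambda_3 the PCA eigenvalues
    of the neighbourhood covariance. Close to 0 on locally
    planar surfaces, larger at corners and bumps."""
    pts = np.asarray(neighbours, dtype=float)
    cov = np.cov(pts.T)
    lam = np.sort(np.linalg.eigvalsh(cov))  # ascending: l1 <= l2 <= l3
    return lam[0] / lam.sum()
```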
where ${\widehat{\mathcal{A}}}_{d}$ denotes the 95th percentile of the adjacent point distances of all points in ${\stackrel{\u0303}{\mathcal{A}}}_{d}$, sorted in ascending order. The use of ${\widehat{\mathcal{A}}}_{d}$ effectively suppresses estimation errors caused by outliers. ω_{APD}(p) estimates the local sampling sparsity of the scanned surface at p. High values of ω_{APD} characterize points with a sparsely sampled neighbourhood, typical of corner and edge points and of regions scanned with a low incident angle, which are likely located in the overlapping area of the models.
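The equation defining ω_{APD} was likewise lost in this copy; the sketch below assumes the adjacent point distance A_d(p) is the mean distance from p to its k nearest neighbours, normalised and clipped by the 95th percentile as the text describes — the exact form is an assumption:

```python
import numpy as np

def apd_weights(points, k=6):
    """Hedged sketch of the adjacent-point-distance weight: A_d(p) is
    taken as the mean distance to the k nearest neighbours, divided by
    the 95th percentile over all points (to suppress outliers) and
    clipped to [0, 1]."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    d.sort(axis=1)
    a = d[:, 1:k + 1].mean(axis=1)      # skip the zero self-distance
    a_hat = np.percentile(a, 95)
    return np.minimum(a / a_hat, 1.0)
```

Sparsely sampled points receive weights close to 1, densely sampled ones weights close to 0.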
5.4 Coupled sampling
Besides the aforementioned three sampling approaches, we also consider coupling different sampling approaches in sequence. For example, a point set ${\mathcal{P}}_{1}$ is first selected from the initial point set $\mathcal{P}$ by probabilistic sampling; another point set ${\mathcal{P}}_{2}$ is then selected from ${\mathcal{P}}_{1}$ by uniform sampling. In general, the final sampled point set can be selected from the initial point set $\mathcal{P}$ via two or more sampling steps applied in some order with given sampling ratios. The sampling ratios of the final point set ${\mathcal{P}}_{S}$ obtained from $\mathcal{P}$ via S sampling steps are denoted as $\left|{\mathcal{P}}_{1}\right| : \left|{\mathcal{P}}_{2}\right| : \cdots : \left|{\mathcal{P}}_{S}\right|$, where | · | denotes the set size. For different scans to be aligned, we can select a suitable sampling approach.
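Coupled sampling is simply a composition of stages; a minimal sketch with hypothetical stage functions (any sampler taking a point set and a target size fits):

```python
def coupled_sample(points, stages):
    """Apply sampling stages in order; each stage is a pair
    (sample_fn, target_size) and draws from the previous stage's
    output, e.g. probabilistic sampling followed by uniform sampling."""
    current = points
    for sample_fn, size in stages:
        current = sample_fn(current, size)
    return current
```

A call such as `coupled_sample(P, [(probabilistic, 5000), (uniform, 1000)])` (hypothetical sampler names) realises the ratio chain described above.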
6 Integrating texture features
A feature is accompanied by a descriptor, which locally and compactly describes the texture around the feature pixel. In our application, we are interested in good local feature descriptors, which should be highly distinctive locally, invariant w.r.t. affine transformations, and possibly robust w.r.t. illumination changes and local deformations. Several local feature descriptors have been presented in the literature [32, 43]. Among them, the most suitable for our application are the SIFT [44], SURF [45] and FAST [46] descriptors, which we extract from the reflectance images of our scanned models and consider as relevant sample points to be matched by our registration algorithm.
6.1 Reflectance image rectification
The aforementioned features cannot be efficiently extracted directly from the reflectance image associated with a scanned model, since its intrinsic spherical format strongly degrades the quality of their descriptors, which are not designed to be robust w.r.t. spherical distortions. Indeed, typical laser scanner acquisition systems are usually composed of a fixed platform and a rotating head, which are naturally modelled by simple spherical projections. The acquired images are then obtained by mapping spherical images onto single image planes. The meridians of a spherical image are mapped to vertical viewing planes, and the parallels are mapped to viewing cones with the vertex at the sensor position.
In order to reduce the distortion induced by the spherical projection, we compute the above mentioned feature descriptors on an atlas of rectified images (shown in Figure 4). This is possible since both the spherical projection and the atlas perspective projections will share the same point of view.
To recover the set of rectified images, we initially select a field-of-view value α_{fov} (in our experiments α_{fov} = 60°). We then calculate the width w_{ r } and height h_{ r } of each rectified image by constraining the pixel resolution to the resolution of the spherical image equator, i.e., the width w_{ s } of the spherical image: ${w}_{r}={h}_{r}={w}_{s}\cdot \frac{{\alpha}_{\mathsf{\text{fov}}}}{{360}^{\circ}}$. Given the pixel dimensions of the image plane, we define a standard perspective projection whose principal point is the central pixel of the image plane. The only missing intrinsic parameter, the focal length, can be easily recovered given α_{fov}, w_{ r } and h_{ r }. To determine the extrinsic parameters of each projection, we fix the camera point of view to the spherical projection point of view, i.e., the local origin of the point cloud. We then sample the sphere with v equally distributed directions. The value of v depends on the required field of view and is estimated such that the images contained in the atlas completely cover the initial spherical image. Given the camera direction d_{ i }, we recover the remaining extrinsic parameters of the i-th image by aligning the camera principal direction to d_{ i } and the camera vertical direction to the vertical direction of the spherical image. Exploiting each projection matrix, we associate to each final image pixel its corresponding viewing ray, and from this the pixel's corresponding coordinates in the original spherical image. These correspondences are used to perform a bilinear interpolation of the spherical image to recover each rectified image (see Figure 4).
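The intrinsic computation and the pixel-to-sphere lookup can be sketched as follows; the per-image rotation to the sampled direction d_i is omitted (the camera here looks along +z), so the mapping shown covers a single atlas image:

```python
import numpy as np

def rectified_camera(w_s, fov_deg=60.0):
    """Side length and focal length (in pixels) of one rectified image,
    matching the equator resolution of a spherical image of width w_s."""
    w_r = int(round(w_s * fov_deg / 360.0))
    f = (w_r / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    return w_r, f

def pixel_to_spherical(u, v, w_r, f, w_s, h_s):
    """Map a rectified pixel (u, v) to (column, row) in the spherical
    image, for a camera looking along +z from the scan origin."""
    # Viewing ray of the pixel through a pinhole at the origin.
    x, y, z = u - w_r / 2.0, v - w_r / 2.0, f
    lon = np.arctan2(x, z)                              # azimuth
    lat = np.arcsin(y / np.sqrt(x * x + y * y + z * z))  # elevation
    col = (lon / (2 * np.pi) + 0.5) * w_s
    row = (lat / np.pi + 0.5) * h_s
    return col, row
```

These (col, row) coordinates are where the bilinear interpolation of the spherical image is evaluated.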
The atlas generation depends only on the chosen field of view: small values generate many small images, whereas large values generate fewer images with stronger perspective distortion.
6.2 Texture features extraction and integration
- 1.
SIFT [44]: this feature descriptor encodes the trend of the local image gradient around a pixel as a histogram of typically 128 bins. It is invariant w.r.t. scaling and rigid 2-D transformations and robust w.r.t. affine distortion, addition of noise, and changes in illumination. This descriptor is very accurate in identifying relevant interest points, but its computation is usually slow unless the GPU is exploited [47].
- 2.
SURF [45]: this method efficiently detects features by computing a rough approximation of the Hessian matrix using integral images. The resulting descriptor, based on sums of approximated 2-D Haar wavelet responses, is more compact and much faster to compute than the SIFT descriptor.
- 3.
FAST [48]: this technique classifies a pixel as a corner if there is a sufficiently large set of relevantly brighter (darker) pixels in a circular pixel neighbourhood of fixed radius. This feature detection algorithm is very fast, up to 30 times faster than SIFT, but it is not invariant w.r.t. scaling and does not provide effective descriptors [43], which are usually computed using other techniques (e.g., with the SURF method in this paper).
- 4.
Harris corner detector [49]: widely used in image processing and computer vision, it computes corner features by analysing the local changes of the image intensity with patches shifted by a small amount in different directions. It is neither scale nor affine invariant, and usually generates a large number of features.
The 3-D points corresponding to the extracted texture features are then considered as sampled points and used with their descriptors by our registration algorithm to match congruent sets of points, as described in Sections 2 and 3.
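As an illustration of the last detector in the list, a minimal Harris response sketch; it uses finite differences and a flat 3 × 3 window, whereas library implementations smooth the gradient products with a Gaussian:

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 per pixel,
    where M is the 3x3-window sum of the gradient outer products.
    Positive R marks corners, negative R edges, near-zero R flat areas."""
    I = np.asarray(img, dtype=float)
    Iy, Ix = np.gradient(I)                  # row- and column-derivatives
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box(a):                              # 3x3 box filter via shifts
        p = np.pad(a, 1, mode="edge")
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3))

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)
    det = Sxx * Syy - Sxy * Sxy
    return det - k * (Sxx + Syy) ** 2
```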
7 Experimental results
7.1 Test data and evaluation criteria
We fixed the poses of all reference scans (i.e., $\left\{\mathcal{Q}\right\}$) to an identity rotation matrix and no translation. To evaluate the transformation estimation accuracy of our registration algorithm, we first employed our algorithm on all tested pairs of scans to recover a good initial transformation for each scan pair and then applied the ICP optimization algorithm [5] to get a well-aligned transformation. We observed that the mean residual error after the ICP registration optimization was always comparable with the laser scanner measurement error, which is much lower than the estimation errors of our proposed N-points congruent sets (NPCS) registration algorithm. For this reason, we regarded this ICP-optimized transformation as the ground truth transformation ${\mathcal{T}}_{g}$ for evaluation. Thus, given an estimated transformation $\mathcal{T}$ from $\mathcal{P}$ to $\mathcal{Q}$, the estimation error is defined as the median of the point distances after applying $\mathcal{T}$ and ${\mathcal{T}}_{g}$ onto $\mathcal{P}$, i.e., ${\mathrm{median}}_{p\in \mathcal{P}}\left|\left|\mathcal{T}p-{\mathcal{T}}_{g}p\right|\right|$.
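With 4 × 4 homogeneous transforms, this error metric is a one-liner; a sketch:

```python
import numpy as np

def estimation_error(points, T, T_g):
    """Median over P of ||T p - T_g p||, with T and T_g given as
    4x4 homogeneous rigid-transformation matrices."""
    P = np.c_[points, np.ones(len(points))]   # to homogeneous coords
    diff = (P @ T.T - P @ T_g.T)[:, :3]
    return np.median(np.linalg.norm(diff, axis=1))
```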
The transformation estimation accuracy was statistically evaluated by running our registration algorithm N_{run} = 20 times on each tested pair of scans. In each run, we refreshed the input data by setting a random pose for the moving scan, followed by re-sampling. For each scan pair, we set a suitable maximal estimation error Δ_{max} in advance; runs with an estimation error above Δ_{max} were considered failed. The maximal estimation errors were set as Δ_{max} = 5 units, Δ_{max} = 1 m, Δ_{max} = 2 m and Δ_{max} = 5 m for the small object models ${\mathcal{G}}_{\mathsf{\text{a}}}-{\mathcal{G}}_{\mathsf{\text{c}}}$, the indoor models ${\mathcal{G}}_{\mathsf{\text{d}}}-{\mathcal{G}}_{\mathsf{\text{h}}}$, the large indoor models ${\mathcal{G}}_{\mathsf{\text{i}}}$ and the outdoor models ${\mathcal{G}}_{\mathsf{\text{j}}}-{\mathcal{G}}_{\mathsf{\text{k}}}$, respectively. Based on the number N_{suc} of successful estimations and the number N_{run} of runs, the following three indicators were used to evaluate our method: (1) the successful estimation rate ${S}_{r}=\frac{{N}_{\mathsf{\text{suc}}}}{{N}_{\mathsf{\text{run}}}}$, to evaluate robustness; (2) the median estimation error Δ over all N_{suc} successful estimations, to evaluate accuracy; (3) the median estimation time t over all N_{run} runs, to evaluate efficiency. Note that the estimation time does not include pre-processing time (i.e., texture feature detection/matching, point sampling, etc.), but it incorporates the codebook building time, which took less than 1 s with the basic RANSAC and around 2 s with the probabilistic RANSAC using our parameter setting. The feature detectors listed by increasing processing time are Harris, FAST, SURF and SIFT. The point sampling approaches listed by increasing processing time are random, probabilistic and uniform sampling.
7.2 Performance evaluation
The performance of our proposed NPCS registration algorithm was evaluated on the aforementioned test data. The main parameters were set as follows. We employed congruent point sets of size N = 4. The two main parameters for searching the best N-points congruent sets in Algorithm 1 were set as K_{ p } = 2,000 and K_{ g } = 50. The exponent λ in our proposed QLCP measure (Equation 17) was set to 1. Given two point sets $\mathcal{P}$ and $\mathcal{Q}$, we recover the transformation from $\mathcal{P}$ (moving point set) to $\mathcal{Q}$ (reference point set). We selected 500 points from $\mathcal{P}$ and 1,000 points from $\mathcal{Q}$ for searching N-points congruent sets between them. For transformation verification, however, we selected larger subsets, i.e., 1,000 points from $\mathcal{P}$ and 2,000 points from $\mathcal{Q}$, which allows the estimation of a more accurate and robust transformation. The voxel-based uniform sampling approach was used to select these points. Our registration algorithm always used the surface normals of points when provided; the maximal normal deviation for corresponding point pairs was set to 30°. In our experiments, we used the RANSAC-based approach with I_{max} = 1,000 and I_{nou} = 200, which are the allowed maximal number of iterations and the number of consecutive iterations in which no better transformation is found, respectively. The estimation errors of the first three scan pairs ${\mathcal{G}}_{\mathsf{\text{a}}}-{\mathcal{G}}_{\mathsf{\text{c}}}$ are reported in canonical units (i.e., 100 units equal the bounding box diagonal length), while those of the other eight scan pairs ${\mathcal{G}}_{\mathsf{\text{d}}}-{\mathcal{G}}_{\mathsf{\text{k}}}$ are in meters. Notice that the same parameter values were used in all experiments described below, unless clearly stated otherwise.
Performance evaluation of our NPCS algorithm with different sizes N of congruent point sets
| N | ${\mathcal{G}}_{c}$ Δ | ${\mathcal{G}}_{c}$ S_r (%) | ${\mathcal{G}}_{c}$ t (s) | ${\mathcal{G}}_{d}$ Δ | ${\mathcal{G}}_{d}$ S_r (%) | ${\mathcal{G}}_{d}$ t (s) | ${\mathcal{G}}_{j}$ Δ | ${\mathcal{G}}_{j}$ S_r (%) | ${\mathcal{G}}_{j}$ t (s) |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 0.7060 | 100 | 61.5 | 0.0770 | 100 | 29.4 | 0.5903 | 100 | 18.5 |
| 4 | 0.3906 | 100 | 35.1 | 0.0508 | 100 | 45.4 | 0.7819 | 100 | 16.6 |
| 5 | 0.4532 | 100 | 39.2 | 0.0484 | 100 | 53.9 | 0.6114 | 100 | 20.6 |
| 6 | 0.4277 | 100 | 40.8 | 0.0462 | 100 | 90.6 | 0.9586 | 95 | 77.1 |
Performance evaluation of different sampling approaches on five pairs of scans
| Sampling approach | ${\mathcal{G}}_{c}$ Δ | ${\mathcal{G}}_{c}$ S_r (%) | ${\mathcal{G}}_{e}$ Δ | ${\mathcal{G}}_{e}$ S_r (%) | ${\mathcal{G}}_{g}$ Δ | ${\mathcal{G}}_{g}$ S_r (%) | ${\mathcal{G}}_{i}$ Δ | ${\mathcal{G}}_{i}$ S_r (%) | ${\mathcal{G}}_{j}$ Δ | ${\mathcal{G}}_{j}$ S_r (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | 0.4561 | 100 | ★ | 0 | 0.0606 | 95 | ★ | 0 | ★ | 0 |
| Uniform | 0.4465 | 100 | 0.2160 | 95 | 0.0854 | 100 | 0.5146 | 95 | 0.4231 | 100 |
| probAPD | 0.2045 | 100 | 0.0501 | 100 | 0.3260 | 100 | 0.3548 | 100 | | |
| probSurfVar | 0.0774 | 10 | 0.0329 | 100 | ★ | 0 | ★ | 0 | | |
| Random+uniform | 0.4719 | 100 | 0.1547 | 100 | 0.0410 | 100 | 0.1961 | 5 | ★ | 0 |
| Random+probAPD | 0.1670 | 100 | 0.0679 | 100 | 0.4719 | 65 | 0.8053 | 45 | | |
| Uniform+probAPD | 0.2001 | 100 | 0.0629 | 100 | 0.4211 | 100 | 0.4343 | 100 | | |
| SIFT+random | 0.1020 | 100 | 0.0276 | 100 | 0.6754 | 5 | ★ | 0 | | |
| SURF+random | 0.0840 | 85 | 0.0264 | 100 | 0.3329 | 10 | 0.2675 | 30 | | |
| FAST+random | 0.1331 | 75 | 0.0252 | 100 | 0.4970 | 55 | ★ | 0 | | |
| HARRIS+random | 0.0679 | 100 | 0.0121 | 100 | 0.0820 | 100 | 0.3756 | 100 | | |
| SIFT+uniform | 0.0647 | 100 | 0.0349 | 100 | 0.1491 | 100 | 0.1959 | 100 | | |
| SURF+uniform | 0.0848 | 100 | 0.0542 | 100 | 0.3064 | 100 | 0.3426 | 100 | | |
| FAST+uniform | 0.0672 | 100 | 0.0446 | 100 | 0.1118 | 100 | 0.2719 | 100 | | |
| HARRIS+uniform | 0.1131 | 100 | 0.0194 | 100 | 0.1077 | 100 | 0.5959 | 100 | | |
Performance comparison between the LCP and QLCP measures on different models
| Measure | ${\mathcal{G}}_{b}$ Δ | ${\mathcal{G}}_{b}$ S_r (%) | ${\mathcal{G}}_{b}$ t (s) | ${\mathcal{G}}_{i}$ Δ | ${\mathcal{G}}_{i}$ S_r (%) | ${\mathcal{G}}_{i}$ t (s) | ${\mathcal{G}}_{j}$ Δ | ${\mathcal{G}}_{j}$ S_r (%) | ${\mathcal{G}}_{j}$ t (s) |
|---|---|---|---|---|---|---|---|---|---|
| LCP (α = 1) | 0.3738 | 100 | 21.4 | 0.7377 | 95 | 25.6 | 0.7141 | 100 | 6.6 |
| LCP (α = 2) | 0.7252 | 100 | 8.3 | 0.7264 | 90 | 26.5 | 2.4107 | 100 | 15.2 |
| LCP (α = 4) | 1.2058 | 100 | 16.8 | 0.9550 | 30 | 24.8 | 3.7339 | 10 | 27.1 |
| LCP (α = 8) | 0.5809 | 100 | 47.9 | 1.3048 | 20 | 22.6 | 4.5258 | 10 | 37.8 |
| QLCP (λ = 1, α = 1) | 0.3524 | 100 | 31.3 | 0.5304 | 95 | 33.2 | 0.4041 | 100 | 11.7 |
| QLCP (λ = 1, α = 2) | 0.2622 | 100 | 30.8 | 0.5225 | 100 | 43.0 | 0.5107 | 100 | 14.6 |
| QLCP (λ = 1, α = 4) | 0.2895 | 100 | 62.0 | 0.7329 | 95 | 43.9 | 0.7857 | 100 | 28.6 |
| QLCP (λ = 1, α = 8) | 0.3934 | 100 | 81.5 | 0.8767 | 80 | 85.1 | 0.9193 | 100 | 97.1 |
| QLCP (λ = 2, α = 1) | 0.2127 | 100 | 49.3 | 0.3264 | 95 | 29.1 | 0.3782 | 100 | 10.9 |
| QLCP (λ = 2, α = 2) | 0.3055 | 100 | 35.9 | 0.4783 | 100 | 61.2 | 0.5394 | 100 | 23.5 |
| QLCP (λ = 2, α = 4) | 0.2904 | 100 | 59.3 | 0.4910 | 100 | 97.8 | 0.6343 | 100 | 33.6 |
| QLCP (λ = 2, α = 8) | 0.3537 | 100 | 142.3 | 0.8897 | 85 | 79.5 | 0.7970 | 100 | 64.0 |
Performance evaluation of our algorithm using scene structure information
| Sampling | ${\mathcal{G}}_{d}$ Δ | ${\mathcal{G}}_{d}$ S_r (%) | ${\mathcal{G}}_{d}$ t (s) | ${\mathcal{G}}_{f}$ Δ | ${\mathcal{G}}_{f}$ S_r (%) | ${\mathcal{G}}_{f}$ t (s) |
|---|---|---|---|---|---|---|
| Uniform | 0.0498 | 100 | 42.2 | 0.2269 | 100 | 30.3 |
| Uniform (scene structure) | 0.0326 | 100 | 13.4 | 0.1500 | 100 | 10.5 |
Performance evaluation on the texture feature-based variants of our algorithm with features detected on the original reflectance images and on atlases of rectified reflectance images
| Method | ${\mathcal{G}}_{g}$ Δ | ${\mathcal{G}}_{g}$ S_r (%) | ${\mathcal{G}}_{g}$ t (s) | ${\mathcal{G}}_{h}$ Δ | ${\mathcal{G}}_{h}$ S_r (%) | ${\mathcal{G}}_{h}$ t (s) |
|---|---|---|---|---|---|---|
| Uniform | 0.0727 | 100 | 32.2 | ★ | 0 | 53.1 |
| SURF + uniform | 0.0349 | 100 | 26.2 | 0.2632 | 60 | 11.0 |
| matchSURF(k = 500) + uniform | 0.0353 | 100 | 15.9 | 0.2555 | 55 | 4.6 |
| matchSURF(k = 200) + uniform | 0.0568 | 100 | 7.7 | 0.2532 | 40 | 4.6 |
| matchSURF(k = 100) + uniform | 0.0952 | 100 | 4.7 | 0.3667 | 50 | 4.4 |
| matchSURF(k = 50) + uniform | 0.1592 | 90 | 4.5 | 0.3201 | 25 | 3.1 |
| Atlas: SURF + uniform | 0.0148 | 100 | 17.8 | 0.2018 | 100 | 8.7 |
| Atlas: matchSURF(k = 500) + uniform | 0.0199 | 100 | 6.2 | 0.1560 | 85 | 6.1 |
| Atlas: matchSURF(k = 200) + uniform | 0.0248 | 100 | 5.1 | 0.2299 | 95 | 4.3 |
| Atlas: matchSURF(k = 100) + uniform | 0.0318 | 100 | 4.9 | 0.3095 | 75 | 4.0 |
| Atlas: matchSURF(k = 50) + uniform | 0.0367 | 100 | 2.5 | 0.3354 | 60 | 3.3 |
| SIFT + uniform | 0.0433 | 100 | 34.4 | 0.2351 | 30 | 11.5 |
| matchSIFT(k = 500) + uniform | 0.0548 | 100 | 15.3 | 0.2222 | 10 | 10.4 |
| matchSIFT(k = 200) + uniform | 0.0849 | 100 | 9.5 | 0.2801 | 10 | 3.9 |
| matchSIFT(k = 100) + uniform | 0.1674 | 100 | 5.9 | ★ | 0 | 3.4 |
| matchSIFT(k = 50) + uniform | 0.2842 | 85 | 5.5 | ★ | 0 | 1.5 |
| Atlas: SIFT + uniform | 0.0287 | 100 | 28.7 | 0.0494 | 30 | 10.8 |
| Atlas: matchSIFT(k = 500) + uniform | 0.0321 | 100 | 29.4 | 0.1584 | 25 | 4.3 |
| Atlas: matchSIFT(k = 200) + uniform | 0.0484 | 100 | 10.2 | 0.3768 | 15 | 5.3 |
| Atlas: matchSIFT(k = 100) + uniform | 0.1716 | 100 | 5.2 | 0.5138 | 10 | 4.4 |
| Atlas: matchSIFT(k = 50) + uniform | 0.3017 | 90 | 5.6 | ★ | 0 | 1.9 |
The complete failure in ${\mathcal{G}}_{\mathsf{\text{h}}}$ is due to the presence of wrong transformations with a better QLCP measure than the ground truth transformation. This example clearly shows how the integration of texture features can significantly improve the robustness of our registration algorithm, especially when they are extracted from rectified images. We recall that there are two ways of integrating texture features. The first is to consider detected features only as sampled points, as in the experiments reported in Table 2. The second (reported with the prefix "match") is to rank the matched features using their descriptors and keep in $\mathcal{P}$ the best k correspondences for each feature point. This experiment showed that: (1) the SURF features were more robust than the SIFT features in both ${\mathcal{G}}_{\mathsf{\text{g}}}$ and ${\mathcal{G}}_{\mathsf{\text{h}}}$; (2) the accuracy of the transformation obtained with SURF features is much better than the one obtained with SIFT features in ${\mathcal{G}}_{\mathsf{\text{g}}}$; (3) the features detected from the atlas of rectified reflectance images led to a more robust and generally more accurate registration than those extracted from the original reflectance images (the improvement in terms of accuracy is particularly evident in ${\mathcal{G}}_{\mathsf{\text{g}}}$, where the very low estimation error of 1-2 cm is about 0.5‰ of the bounding box diagonal length); (4) decreasing the number k of best feature correspondences, ranked by their descriptors, greatly improves the efficiency (i.e., shorter computation times) but possibly deteriorates the robustness (i.e., lower success rates); (5) in some experiments, the features detected from the atlas of rectified reflectance images led to slightly higher estimation errors than those extracted from the original reflectance images.
This happens when k is not large enough (e.g., k = 50, 100) and is mainly related to features located close to the rectified image boundaries. Features extracted in these regions are less reliable due to the lack of sufficient texture information around them. This effect can be overcome by imposing a given degree of overlap between neighbouring rectified images.
Performance comparison between the basic RANSAC and the probabilistic RANSAC
| Type of RANSAC | ${\mathcal{G}}_{a}$ Δ | ${\mathcal{G}}_{a}$ S_r (%) | ${\mathcal{G}}_{a}$ t (s) | ${\mathcal{G}}_{e}$ Δ | ${\mathcal{G}}_{e}$ S_r (%) | ${\mathcal{G}}_{e}$ t (s) | ${\mathcal{G}}_{k}$ (12% overlap) Δ | ${\mathcal{G}}_{k}$ S_r (%) | ${\mathcal{G}}_{k}$ t (s) |
|---|---|---|---|---|---|---|---|---|---|
| Basic RANSAC | 0.8910 | 100 | 50.1 | 0.2160 | 95 | 25.3 | 0.9393 | 65 | 118.4 |
| Probabilistic RANSAC | 0.9013 | 100 | 55.7 | 0.2071 | 100 | 29.4 | 0.9543 | 95 | 102.7 |
Performance comparison of 4PCS and NPCS algorithms
| Algorithm | ${\mathcal{G}}_{a}$ Δ | ${\mathcal{G}}_{a}$ S_r (%) | ${\mathcal{G}}_{a}$ t (s) | ${\mathcal{G}}_{c}$ Δ | ${\mathcal{G}}_{c}$ S_r (%) | ${\mathcal{G}}_{c}$ t (s) | ${\mathcal{G}}_{i}$ Δ | ${\mathcal{G}}_{i}$ S_r (%) | ${\mathcal{G}}_{i}$ t (s) | ${\mathcal{G}}_{j}$ Δ | ${\mathcal{G}}_{j}$ S_r (%) | ${\mathcal{G}}_{j}$ t (s) | ${\mathcal{G}}_{k}$ Δ | ${\mathcal{G}}_{k}$ S_r (%) | ${\mathcal{G}}_{k}$ t (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4PCS | 2.845 | 85 | 10.6 | 1.953 | 90 | 22.1 | 0.775 | 40 | 56.3 | 3.762 | 30 | 48.8 | ★ | 0 | 356.3 |
| NPCS | 0.875 | 100 | 46.3 | 0.420 | 100 | 31.2 | 0.458 | 95 | 45.5 | 0.517 | 100 | 13.5 | 0.980 | 60 | 76.2 |
8 Conclusions
In this paper, we presented NPCS registration, a robust and efficient approach for automatically aligning two overlapping point sets. Given two point sets $\mathcal{P}$ and $\mathcal{Q}$, our algorithm randomly searches for sets of congruent groups of N-points in $\mathcal{P}$ and $\mathcal{Q}$ which lead to the best estimate of the transformation. This is achieved by employing a RANSAC-based algorithm, which can also estimate the matching probability of each point to drive the search towards possibly successful congruent bases of N-points. This probabilistic RANSAC approach improves the registration robustness, especially when aligning two point sets with a small overlap. The search for congruent sets is efficiently performed by using a fast search codebook inspired by [20], which significantly reduces the execution time. Our proposed search method can efficiently combine different metric functions to match points and point pairs in a multidimensional search space, which can be defined by geometric and texture feature descriptors and geometrical constraints of sets of sampled points. This makes our framework general and flexible. The efficient combination of feature descriptors can significantly improve the registration performance, accuracy, and robustness for models with known specific characteristics. Moreover, we proposed a method to extract texture features from an atlas of rectified images recovered by sampling the spherical field of view of the reflectance image. The resulting feature descriptors were less sensitive to spherical distortions, yielding more precise matches than those extracted from the original reflectance image.
To further improve the registration robustness, we introduced a new measure called QLCP. This is used to efficiently verify the rigid transformation estimated at each RANSAC iteration. The QLCP considers both the quantity and quality of the overlapping point set, and improved the verification robustness w.r.t. the LCP measure utilized in the 4PCS algorithm [8].
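Equation 17 is not reproduced in this excerpt; the sketch below contrasts a plain LCP score with a quality-weighted variant in which each overlapping point contributes exp(-λ d/δ) rather than 1 — that weighting form is an illustrative assumption, not the paper's exact QLCP:

```python
import numpy as np

def lcp(Pt, Q, delta):
    """Largest-common-pointset fraction: share of transformed points
    of P with a neighbour in Q closer than delta."""
    d = np.linalg.norm(Pt[:, None, :] - Q[None, :, :], axis=2).min(axis=1)
    return np.mean(d < delta)

def qlcp(Pt, Q, delta, lam=1.0):
    """Quality-weighted variant (assumed form): each overlapping point
    contributes exp(-lam * d / delta), so closer matches count more."""
    d = np.linalg.norm(Pt[:, None, :] - Q[None, :, :], axis=2).min(axis=1)
    return np.mean(np.where(d < delta, np.exp(-lam * d / delta), 0.0))
```

Under such a weighting, two transformations with the same overlap count are distinguished by how tightly the overlapping points coincide.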
Our proposed NPCS algorithm was evaluated on a variety of input data with varying amounts of noise, outliers, and extents of overlap (12-100%), acquired from small objects and from indoor and outdoor scenes. The experiments focused on three main aspects: robustness (successful estimation rate), accuracy (median estimation error), and efficiency (median estimation time). In almost all cases, our NPCS algorithm (executed with the same parameters) successfully aligned two point clouds in less than a minute by accurately recovering their rigid transformations (the estimation errors are mostly within 0.5-5‰ of the bounding box diagonal length). We tested our registration method with different sampling techniques and experimentally demonstrated the benefits of methods generating uniformly distributed point samples w.r.t. random-based sampling strategies. Our proposed voxel-based uniform sampling approach was robust and efficient in almost all tested cases. We showed that the proposed QLCP measure is more reliable than the LCP measure. Still, wrong estimations are possible, e.g., the failure related to the scan pair ${\mathcal{G}}_{\mathsf{\text{h}}}$. Nevertheless, these wrong estimations can be recovered by using texture-based features detected from the atlas of rectified reflectance images, as shown by our experiments, or by using other suitable geometric features such as those presented in [10, 13, 50]. In our experiments, NPCS registration performed, on average, better than the 4PCS method in terms of robustness and accuracy. In some experiments, in particular in the presence of large overlaps, the execution time of our algorithm was slightly higher than that of 4PCS. When aligning two scans with a small overlap, our NPCS algorithm took similar computation time (mostly less than a minute) to reach a good solution, whereas 4PCS sometimes took more than 10 min and still failed (e.g., in ${\mathcal{G}}_{\mathsf{\text{k}}}$).
Future work will focus on improving the performance of our method by exploiting parallel hardware, and on improving the robustness of our QLCP measure for inspection applications, which typically require automatically aligning models captured at different times from scenes possibly containing changes.
Endnotes
^{a}Only the upper triangular part is stored in practice, due to the symmetry of the point pairs lookup table. ^{b}The demo application of the 4PCS algorithm [8] is available at http://graphics.stanford.edu/~niloy/research/fpcs/4PCS_demo.html.
Algorithm 1
Find the best K_{ g } N-points bases ${\left\{{\mathcal{B}}_{q}^{k}\right\}}_{k=1}^{{K}_{g}}$ in $\mathcal{Q}$ which are approximate congruent to the N-points base ${\mathcal{B}}_{p}$ in $\mathcal{P}$.
$\mathcal{M}\leftarrow {\left\{\left({q}_{1}^{k},{q}_{2}^{k}\right)\right\}}_{k=1}^{{K}_{p}}$, i.e., the best K_{ p } point pairs in $\mathcal{Q}$ possibly matching the point pair (p_{1}, p_{2}) in ${\mathcal{B}}_{p}$, according to the similarity score in Equation 3.
for i = 2 to N - 1 do
case 1: $\mathcal{S}\leftarrow {\left\{\left({q}_{i}^{k},{q}_{i+1}^{k}\right)\right\}}_{k=1}^{{K}_{p}}$, i.e., the best K_{ p } point pairs in $\mathcal{Q}$ possibly matching the point pair (p_{ i }, p_{i+1}) in ${\mathcal{B}}_{p}$.
case 2: collect ${\mathcal{M}}_{f}\left({p}_{i+1}\right)$
${\mathcal{M}}^{\prime}\leftarrow \varnothing$
for each group $\left({q}_{1}^{m},\dots ,{q}_{i}^{m}\right)$ of points in $\mathcal{M}={\left\{\left({q}_{1}^{m},\dots ,{q}_{i}^{m}\right)\right\}}_{m=1}^{\left|\mathcal{M}\right|}$ do
case 1: group point pair
for each point pair $\left({q}_{i}^{k},{q}_{i+1}^{k}\right)$ in $\mathcal{S}$ do
if ${q}_{i}^{m}={q}_{i}^{k}$ and ${S}_{f}\left({p}_{i+1},{q}_{i+1}^{k}\right){\prod}_{j=1}^{i-1}{\prod}_{k=1}^{K}{s}_{k}\left({m}_{k}\left({p}_{j},{p}_{i+1}\right),\phantom{\rule{0.3em}{0ex}}{m}_{k}\left({q}_{j}^{m},{q}_{i+1}^{k}\right)\right)=1$ then
${\mathcal{M}}^{\prime}\leftarrow \left\{{\mathcal{M}}^{\prime},\left({q}_{1}^{m},\dots ,{q}_{i}^{m},\phantom{\rule{2.77695pt}{0ex}}{q}_{i+1}^{k}\right)\right\}$
end if
end for
case 2: add a single point
for each point ${q}_{s}\in {\mathcal{M}}_{f}\left({p}_{i+1}\right)$ do
if ${\prod}_{j=1}^{i}{\prod}_{k=1}^{K}{s}_{k}\left({m}_{k}\left({p}_{j},{p}_{i+1}\right),\phantom{\rule{0.3em}{0ex}}{m}_{k}\left({q}_{j}^{m},{q}_{s}\right)\right)=1$ then
${\mathcal{M}}^{\prime}\leftarrow \left\{{\mathcal{M}}^{\prime},\left({q}_{1}^{m},\dots ,{q}_{i}^{m},\phantom{\rule{2.77695pt}{0ex}}{q}_{s}\right)\right\}$
end if
end for
end for
$\mathcal{M}\leftarrow {\mathcal{M}}^{\prime}$
end for
return ${\left\{{\mathcal{B}}_{q}^{k}\right\}}_{k=1}^{{K}_{g}}$ by filtering $\mathcal{M}$ according to Equation 3
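A much-simplified, brute-force Python rendering of Algorithm 1's "add a single point" path, using only pairwise-distance congruence: the codebook acceleration, the K_p/K_g pruning and the descriptor similarity products s_k are all omitted, so this is a sketch of the control flow rather than the paper's implementation:

```python
import numpy as np

def extend_congruent_bases(P_base, Q, eps=1e-2):
    """Grow groups of points of Q whose pairwise distances match those
    of the N-points base in P within eps (brute force, small inputs)."""
    P_base = np.asarray(P_base, dtype=float)
    Q = np.asarray(Q, dtype=float)
    groups = [(j,) for j in range(len(Q))]   # candidates for the first base point
    for i in range(1, len(P_base)):
        new_groups = []
        for g in groups:
            for s in range(len(Q)):
                if s in g:
                    continue
                # all distances to already-matched points must agree
                ok = all(abs(np.linalg.norm(Q[s] - Q[g[j]])
                             - np.linalg.norm(P_base[i] - P_base[j])) < eps
                         for j in range(i))
                if ok:
                    new_groups.append(g + (s,))
        groups = new_groups
    return groups
```

The surviving groups correspond to the set $\mathcal{M}$ from which the best K_g bases would then be filtered.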
References
- Hartley RI, Zisserman A: Multiple View Geometry in Computer Vision. 21st edition. Cambridge University Press; 2004. ISBN: 0521540518View ArticleMATHGoogle Scholar
- Yao J, Cham WK: Robust multi-view feature matching from multiple unordered views. Pattern Recogn 2007,40(11):3081-3099. 10.1016/j.patcog.2007.02.011View ArticleMATHGoogle Scholar
- Pollefeys M, Van Gool L, Vergauwen M, Verbiest F, Cornelis K, Tops J, Koch R: Visual modeling with a hand-Held camera. Int J Comput Vis 2004,59(3):207-232.View ArticleGoogle Scholar
- Zhang W, Yao J, Cham WK: 3D modeling from multiple images. The Seventh International Symposium on Neural Networks (ISNN 2010) 2010, 97-103.Google Scholar
- Besl PJ, McKay HD: A method for registration of 3-D shapes. IEEE Trans Pattern Anal Mach Intell 1992,14(2):239-256. 10.1109/34.121791View ArticleGoogle Scholar
- Rusinkiewicz S, Levoy M: Efficient Variants of the ICP Algorithm. International Conference on 3-D Digital Imaging and Modeling (3DIM) 2001.Google Scholar
- Matabosch C, Salvi J, Fofi D, Meriaudeau F: Range image registration for industrial inspection. Mach Vis Appl Ind Insp XIII 2005, 216-227.Google Scholar
- Aiger D, Mitra NJ, Cohen-Or D: 4-Points Congruent Sets for Robust Pairwise Surface Registration. ACM SIGGRAPH 2008.Google Scholar
- Fischler MA, Bolles RC: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 1987,24(6):726-740.MathSciNetGoogle Scholar
- Gelfand N, Mitra NJ, Guibas LJ, Pottmann H: Robust Global Registration. Third Eurographics Symposium on Geometry Processing (SGP) 2005.Google Scholar
- Yu L, Zhang D, Holden E: A fast and fully automatic registration approach based on point features for multi-source remote-sensing images. Computers Geosci 2008,34(7):838-848. 10.1016/j.cageo.2007.10.005View ArticleGoogle Scholar
- Kang Z: Automatic Registration of Terrestrial Point Cloud Using Panoramic Reflectance Images. International Society for Photogrammetry and Remote Sensing 2008.Google Scholar
- Zaharescu A, Boyer E, Varanasi K, Horaud R: Surface Feature Detection and Description with Applications to Mesh Matching. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2009.Google Scholar
- Chalfant JS, Patrikalakis NM: Three-dimensional object registration using wavelet features. Eng Computers 2009,25(3):303-318. 10.1007/s00366-009-0126-5View ArticleGoogle Scholar
- Smith ER, Radke RJ, Stewart CV: Physical Scale Intensity-Based Range Keypoints. 3D Data Processing, Visualization, and Transmission (3DPVT) 2010.
- Johnson AE, Hebert M: Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Trans Pattern Anal Mach Intell 1999, 21(5):433-449. doi:10.1109/34.765655
- Huber DF, Hebert M: Fully automatic registration of multiple 3D data sets. Image Vis Comput 2003, 21(7):637-650. doi:10.1016/S0262-8856(03)00060-X
- Makadia A, Patterson AI, Daniilidis K: Fully Automatic Registration of 3D Point Clouds. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2006.
- Stamos I, Leordeanu M: Automated Feature-based Range Registration of Urban Scenes of Large Scale. IEEE Conference on Computer Vision and Pattern Recognition 2003.
- Yao J, Ruggeri MR, Taddei P, Sequeira V: Automatic scan registration using 3D linear and planar features. 3D Res 2010, 1(3):1-18.
- Chen C, Stamos I: Semi-automatic Range to Range Registration: a Feature-based Method. International Conference on 3-D Digital Imaging and Modeling (3DIM) 2005.
- Dold C, Brenner C: Registration of Terrestrial Laser Scanning Data Using Planar Patches and Image Data. International Society for Photogrammetry and Remote Sensing 2006.
- Chen C, Stamos I: Range Image Registration Based on Circular Features. Proceedings of the International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT) 2006.
- Franaszek M, Cheok GS, Witzgall C: Fast automatic registration of range images from 3D imaging systems using sphere targets. Autom Constr 2009, 18(3):265-274. doi:10.1016/j.autcon.2008.08.003
- Rabbani T, van den Heuvel F: Automatic Point Cloud Registration Using Constrained Search for Corresponding Objects. 7th Conference on Optical 3-D Measurement Techniques 2005.
- Silva L, Bellon OR, Boyer KL: Precision range image registration using a robust surface interpenetration measure and enhanced genetic algorithms. IEEE Trans Pattern Anal Mach Intell 2005, 27:762-776.
- Boughorbel F, Mercimek M, Koschan A, Abidi MA: A new method for the registration of three-dimensional point-sets: the Gaussian fields framework. Image Vis Comput 2010, 28(1):124-137. doi:10.1016/j.imavis.2009.05.003
- Gold S, Rangarajan A, Lu CP, Pappu S, Mjolsness E: New algorithms for 2D and 3D point matching: pose estimation and correspondence. Pattern Recogn 1998, 31(8):1019-1031. doi:10.1016/S0031-3203(98)80010-1
- Granger S, Pennec X: Multi-scale EM-ICP: a Fast and Robust Approach for Surface Registration. Proc of the 7th European Conference on Computer Vision, Part IV (ECCV '02). London, UK: Springer-Verlag; 2002:418-432.
- Liu Y: Automatic range image registration in the Markov chain. IEEE Trans Pattern Anal Mach Intell 2010, 32(1):12-29.
- Tamaki T, Abe M, Raytchev B, Kaneda K: Softassign and EM-ICP on GPU. International Conference on Networking and Computing (ICNC) 2010, 179-183.
- Juan L, Gwon O: A Comparison of SIFT, PCA-SIFT and SURF. Int J Image Process (IJIP) 2009, 3(4):143-152.
- Arya S, Mount DM, Netanyahu NS, Silverman R, Wu AY: An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions. ACM-SIAM Symposium on Discrete Algorithms 1994.
- Zhang Z, Faugeras OD: Determining motion from 3D line segment matches: a comparative study. Image Vis Comput 1991, 9:10-19. doi:10.1016/0262-8856(91)90043-O
- Gelfand N, Ikemoto L, Rusinkiewicz S, Levoy M: Geometrically Stable Sampling for the ICP Algorithm. Fourth International Conference on 3D Digital Imaging and Modeling (3DIM) 2003.
- Brown BJ, Rusinkiewicz S: Global non-rigid alignment of 3-D scans. ACM Trans Graph 2007, 26(3):21. doi:10.1145/1276377.1276404
- Torsello A, Rodola E, Albarelli A: Sampling Relevant Points for Surface Registration. International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT 2011) 2011.
- Nehab D, Shilane P: Stratified Point Sampling of 3D Models. Proc of the Symposium on Point-Based Graphics 2004, 49-56.
- Ruggeri MR, Patanè G, Spagnuolo M, Saupe D: Spectral-driven isometry-invariant matching of 3D shapes. Int J Comput Vis 2010, 89:248-265. doi:10.1007/s11263-009-0250-0
- Bowers J, Wang R, Wei LY, Maletz D: Parallel Poisson disk sampling with spectrum analysis on surfaces. ACM Trans Graph 2010, 29:166:1-166:10.
- Pauly M, Gross M, Kobbelt LP: Efficient simplification of point-sampled surfaces. Proc of Vis 2002, 163-170.
- Yao J, Taddei P, Ruggeri MR, Sequeira V: Complex and photo-realistic scene representation based on range planar segmentation and model fusion. Int J Robotics Res 2011, 30(10):1263-1283. doi:10.1177/0278364911410754
- Tuytelaars T, Mikolajczyk K: Local invariant feature detectors: a survey. Found Trends Comput Graph Vis 2008, 3(3):177-280.
- Lowe DG: Distinctive image features from scale-invariant keypoints. Int J Comput Vis 2004, 60(2):91-110.
- Bay H, Ess A, Tuytelaars T, Van Gool L: Speeded-up robust features (SURF). Comput Vis Image Underst 2008, 110(3):346-359. doi:10.1016/j.cviu.2007.09.014
- Rosten E, Drummond T: Machine Learning for High-speed Corner Detection. European Conference on Computer Vision 2006, 1:430-443.
- Sinha SN, Frahm JM, Pollefeys M, Genc Y: GPU-based Video Feature Tracking and Matching. Workshop on Edge Computing Using New Commodity Architectures 2006.
- Rosten E, Porter R, Drummond T: Faster and better: a machine learning approach to corner detection. IEEE Trans Pattern Anal Mach Intell 2010, 32(1):105-119.
- Harris C, Stephens M: A Combined Corner and Edge Detector. The Fourth Alvey Vision Conference 1988, 147-151.
- Albarelli A, Rodola E, Torsello A: Loosely Distinctive Features for Robust Surface Alignment. European Conference on Computer Vision (ECCV 2010) 2010, 519-532.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.