Point matching based on affine invariant centroid trees

Object detection can be formulated as a point matching problem when objects are modeled by point sets. Moments, which have been widely used for point matching, are limited under affine transformations because their support point sets are not affine invariant. To address this problem, we develop an affine invariant centroid tree (AICT) to obtain a rigorously affine invariant support point set (SPS). The algorithm is recursive. First, the point set is divided by the vector from a given point to the centroid of the point set. The centroids of the resulting subsets are then used to generate vectors for renewed partitions. The centroids of the subsets are stored to form an AICT, which represents the inherent structure of the point set. The AICT is highly tolerant to noise and outliers because every partition is applied to the whole point set. More importantly, it is affine invariant because the partition is affine invariant. Therefore, rigorously affine invariant descriptors can be obtained by combining moments with the AICT. Experimental results on synthetic and real data show that the proposed algorithm outperforms state-of-the-art point matching methods, including shape context, iterative closest point, and thin plate spline robust point matching (TPS-RPM).

estimation problem as the model points are fitted to the target points by maximizing the likelihood. The above iterative estimation algorithms are time consuming. In contrast, several methods establish correspondences from invariant descriptors. Belongie et al. [11] proposed a descriptor named shape context (SC), which represents the coarse distribution of the remaining points with respect to a given point. SC is invariant to rotation when a relative frame is used. Restricted spatial order constraints (RSOC) [12] were developed to generate an affine invariant descriptor based on the preserved spatial order of adjacent points. Similarly, assuming that neighborhoods are preserved, Zheng et al. [13] proposed a robust point matching algorithm for nonrigid shapes. In practice, however, neighborhoods may change considerably because of transformations, noise and outliers. Moments [14][15][16], which have also been widely used for point matching under affine transformations, share a similar limitation. These approaches compute descriptors by applying algebraic methods to SPSs, which are assumed to be affine invariant [17]. Unfortunately, this assumption is not always valid. SPSs are generally composed of neighboring points sampled by uniform spacing, arc length, affine length [18], and so on. In this paper, we develop an AICT to enforce the invariance of moments. The algorithm is recursive. First, the point set is divided into two subsets by the vector from a given point to the centroid of the point set. The centroids of these subsets are rigorously affine invariant, and they are used to generate vectors for renewed partitions. The centroids are stored to form an AICT. Hence, an SPS constructed from the points in the AICT is affine invariant, and a descriptor can be extracted from this SPS by applying a moment to it.
Because both the SPS and the moment are affine invariant, the descriptor is invariant as well. Point matching under affine transformations can therefore be formulated as descriptor matching.
The remainder of this paper is organized as follows. Sect. 2 introduces the AICT and its application to point matching. Sect. 3 compares the performance of our algorithm with three state-of-the-art algorithms. Sect. 4 concludes the paper.

Affine transformation and its properties
Before introducing the AICT, we briefly review the affine transformation and the properties on which the AICT is based. A general 2D affine transformation T = {A, b} maps a point p in the model point set to its corresponding point q in the target point set by q = Ap + b. Here the 2×1 vector b is the translation vector and the 2×2 matrix A is the affine transformation matrix. A can be decomposed into rotation, scaling, and shearing transformations, each represented by its own matrix. Affine transformations have many properties. Two notable ones, centroid invariance and relative position invariance, underlie the AICT. (1) Centroid invariance: the centroid of the transformed point set is the image under T of the centroid of the original point set. (2) Relative position invariance: if a point lies on the positive (left) side of a vector v, its image lies on the positive side of the transformed vector, provided that det(A) > 0. Note that this property is premised on the assumption that the unit vector of v is affine invariant. For simplicity, this paper restricts the assumption to the affine invariance of v itself. In other words, the start and end points of v are affine invariant.
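The two properties can be checked numerically. The Python sketch below uses a random point set. It assumes one common way of composing A from a rotation angle θ, scaling factors s_x and s_y, and a shearing factor k. The helper `build_A` and its shear-scale-rotation order are hypothetical and not taken from the paper. The sketch also makes explicit that the cross product which decides the side of a vector is multiplied by det(A). Sides are therefore preserved only when det(A) > 0.

```python
import numpy as np

def affine(points, A, b):
    """Apply q = A p + b to a single point (2,) or to every row of an (N, 2) array."""
    return points @ A.T + b

def signed_side(start, end, points):
    """Cross product of (end - start) with (points - start); > 0 means left (positive) side."""
    v = end - start
    w = points - start
    return v[0] * w[:, 1] - v[1] * w[:, 0]

def build_A(theta, sx, sy, k):
    """Hypothetical parameterization: A = shear(k) @ scale(sx, sy) @ rotation(theta)."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([sx, sy])
    K = np.array([[1.0, k], [0.0, 1.0]])
    return K @ S @ R

rng = np.random.default_rng(0)
P = rng.uniform(-10, 10, size=(100, 2))
A = build_A(np.deg2rad(40), 2.0, 0.8, 1.5)   # det(A) = 2.0 * 0.8 = 1.6 > 0
b = np.array([3.0, -7.0])
Q = affine(P, A, b)

# (1) Centroid invariance: the centroid of T(P) is T(centroid of P).
centroid_ok = np.allclose(Q.mean(axis=0), affine(P.mean(axis=0), A, b))

# (2) Relative position: for the vector from P[0] to the centroid, every cross
# product is multiplied by det(A), so the signs (sides) survive when det(A) > 0.
s_model = signed_side(P[0], P.mean(axis=0), P)
s_target = signed_side(Q[0], Q.mean(axis=0), Q)
scaled_ok = np.allclose(s_target, np.linalg.det(A) * s_model)
sides_ok = np.array_equal(np.sign(s_target), np.sign(s_model))

print(centroid_ok, scaled_ok, sides_ok)  # True True True
```

A reflection, with det(A) < 0, would flip every sign. In that case the positive and negative subsets would swap roles.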

The construction of AICT
Relative position invariance implies that an affine invariant vector v partitions a point set P into three affine invariant subsets. These are the positive subset $P^+_v$, the negative subset $P^-_v$, and the subset $P^0_v$ of points lying on the line of v. The key problem of the partition is how to obtain affine invariant vectors. Owing to the centroid invariance of affine transformations, the vector from the centroid of one subset to that of another fits the bill. Consequently, the AICT is built by a recursive operation. The point set is first partitioned by an affine invariant vector. The centroids of the positive subset $P^+_v$ and the negative subset $P^-_v$ then generate further vectors, which induce new partitions of the point set, and so on. The centroids of $P^+_v$ and $P^-_v$ obtained in each partition are stored in order to construct the AICT. Figure 2 illustrates this operation on the point set $P = \{p_1, p_2, \dots, p_N\}$. To build the AICT of $p_i$, the key is to find an affine invariant vector for the first iteration. Because the centroid is affine invariant, the centroid of P (c in Fig. 2) is adopted. The vector from $p_i$ to c divides P into three subsets $P^{i0}_{1}$, $P^{i+}_{11}$ and $P^{i-}_{12}$. The centroids $c^i_{11}$ and $c^i_{12}$ of $P^{i+}_{11}$ and $P^{i-}_{12}$ are clearly both affine invariant. They are stored as the left and right sons of $p_i$, respectively. In the AICT of $p_i$, $p_i$ is the root node, and $c^i_{11}$ and $c^i_{12}$ are at the second level; the level of the root node is defined as one. The recursive partition then continues. The vector from node $c^i_{tj}$ to its father $c^i_{(t-1)\lceil j/2 \rceil}$ induces a new partition. The centroids $c^i_{(t+1)(2j-1)}$ and $c^i_{(t+1)(2j)}$ of the positive subset $P^{i+}_{(t+1)(2j-1)}$ and the negative subset $P^{i-}_{(t+1)(2j)}$ are stored as the left and right sons of $c^i_{tj}$, respectively. Here $\lceil \xi \rceil$ denotes the smallest integer greater than or equal to ξ. The partition process continues until the AICT reaches a given depth.
Here, the depth is defined as the number of levels in the AICT. For example, the depth of the AICT in Fig. 2c
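The construction above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. Every partition is applied to the whole point set P. Points lying exactly on the line of a vector (the zero subset) are ignored. If one side of a vector is empty, the node itself is reused as that son, which is a hypothetical choice because the text does not cover this case. The helper names `build_aict` and `aict_sps` are illustrative. `aict_sps` reads the tree level by level, left to right. The usage lines check the paper's sense of affine invariance: under a transformation with det(A) > 0, the tree of the transformed set is the transformed tree.

```python
import numpy as np

def signed_side(start, end, points):
    """Cross product of (end - start) with (points - start) for every row.

    > 0: positive (left) side of the vector, < 0: negative side, 0: on the line.
    """
    v = end - start
    w = points - start
    return v[0] * w[:, 1] - v[1] * w[:, 0]

def build_aict(P, i, depth):
    """Return the AICT of P[i] as a list of levels (level 1 holds only the root).

    Each partition is applied to the whole point set P. Hypothetical choice:
    if one side of a vector is empty, the node itself is reused as that son.
    """
    root = P[i]
    levels = [[root]]
    # Partition vector of each node in the current level, as (start, end):
    # the root uses p_i -> centroid of P; deeper nodes use node -> father.
    vectors = [(root, P.mean(axis=0))]
    for _ in range(depth - 1):
        sons, son_vectors = [], []
        for node, (start, end) in zip(levels[-1], vectors):
            s = signed_side(start, end, P)
            pos, neg = P[s > 0], P[s < 0]      # points on the line are dropped
            left = pos.mean(axis=0) if len(pos) else node
            right = neg.mean(axis=0) if len(neg) else node
            sons += [left, right]
            son_vectors += [(left, node), (right, node)]
        levels.append(sons)
        vectors = son_vectors
    return levels

def aict_sps(P, i, depth):
    """Read the AICT top to bottom, left to right: [p_i, c_11, c_12, c_21, ...]."""
    return np.array([p for level in build_aict(P, i, depth) for p in level])

# Usage: the SPS of a transformed set equals the transformed SPS.
rng = np.random.default_rng(1)
P = rng.uniform(-10, 10, size=(100, 2))
A = np.array([[1.5, 0.7], [-0.4, 0.9]])   # det(A) = 1.63 > 0
b = np.array([2.0, -1.0])
Q = P @ A.T + b

sps_P = aict_sps(P, i=0, depth=5)
sps_Q = aict_sps(Q, i=0, depth=5)
print(sps_P.shape)                          # (31, 2)
print(np.allclose(sps_Q, sps_P @ A.T + b))  # True
```

A depth of n yields $2^n - 1$ nodes, so depth 5 gives 31 SPS points.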

AICT for point matching
Our point matching algorithm consists of two stages: descriptor calculation and matching. The first stage computes a descriptor for each point. Since the points in the AICT are affine invariant, arranging them in a fixed order yields an affine invariant SPS. An affine invariant descriptor is then obtained by applying a moment to the SPS. For example, if the points in the AICT of $p_i$ are arranged from top to bottom and from left to right, we obtain the SPS $\{p_i, c^i_{11}, c^i_{12}, c^i_{21}, c^i_{22}, \dots, c^i_{n(2^n)}\}$. A moment can then be computed from the SPS as the descriptor of $p_i$. Candidate moments include the cross weighted (CW) moment, the affine invariant Fourier moment (AIFM), and the diagonals of orthogonal projection matrices (DOPM).
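The paper does not define DOPM in this section. The Python sketch below therefore assumes one plausible reading, under which DOPM takes the diagonal of the orthogonal projection matrix $H = X(X^T X)^{-1} X^T$, where X stacks the ordered SPS in homogeneous form [x, y, 1]. Under that assumption, invariance follows directly. An affine map sends X to XM with an invertible 3×3 matrix M, and M cancels inside H. The function name `dopm_descriptor` is hypothetical, and a random ordered point set stands in for a real SPS.

```python
import numpy as np

def dopm_descriptor(sps):
    """Diagonal of the orthogonal projection matrix H = X (X^T X)^{-1} X^T.

    X stacks the SPS in homogeneous form, one row [x, y, 1] per point. An
    affine map q = A p + b sends X to X M with M = [[A^T, 0], [b^T, 1]]
    invertible, and H is unchanged because M cancels; so is its diagonal.
    Requires the SPS points not to be collinear (X of full column rank).
    """
    X = np.column_stack([sps, np.ones(len(sps))])
    Qm, _ = np.linalg.qr(X)          # X = Qm R, Qm has orthonormal columns
    return np.sum(Qm ** 2, axis=1)   # diag(Qm Qm^T) equals diag(H)

rng = np.random.default_rng(3)
sps = rng.uniform(-5, 5, size=(31, 2))   # stand-in for an ordered SPS
A = np.array([[0.6, -1.1], [0.9, 0.4]])  # det(A) = 0.24 + 0.99 = 1.23
b = np.array([-3.0, 8.0])

d_model = dopm_descriptor(sps)
d_target = dopm_descriptor(sps @ A.T + b)
print(np.allclose(d_model, d_target))  # True
print(round(d_model.sum(), 6))         # 3.0 (trace of a rank-3 projection)
```

The QR form avoids inverting $X^T X$ explicitly. The entries lie in [0, 1], so they are nonnegative, which suits a χ²-type comparison. The invariance argument also holds for det(A) < 0. In the full method, any sign-dependence therefore comes from the AICT ordering and not from the moment.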
In the matching stage, correspondences between points are established by descriptor matching, and target recognition is achieved through point-to-point matching. Let $p_i$ and $q_j$ be points from the model and target point sets, respectively. The cost of matching them, denoted $c(p_i, q_j)$, is measured by the $\chi^2$ test statistic between their descriptors, that is,

$$c(p_i, q_j) = \frac{1}{2}\sum_{k} \frac{\left[d_{p_i}(k) - d_{q_j}(k)\right]^2}{d_{p_i}(k) + d_{q_j}(k)}, \qquad (3)$$

where $d_{p_i}$ and $d_{q_j}$ denote the descriptors of $p_i$ and $q_j$, respectively, and k indexes their entries. Given the costs between all point pairs, we measure the cost of object matching by

$$C = \min_{\rho} \sum_{i} c\left(p_i, q_{\rho(i)}\right). \qquad (4)$$

Equation (4) is a weighted bipartite matching problem. It can be solved by the Hungarian method subject to the constraint that the matching is one-to-one, i.e., ρ is a permutation. To handle outliers robustly, we add "dummy" points to each point set with a constant matching cost of $\varepsilon_d$. C is also treated as the matching cost between two objects: the smaller C is, the better the corresponding object pair is localized.

Results and discussion
In this section, AICT is coupled with DOPM to obtain a descriptor named AICT-DOPM. We perform experiments on both synthetic and real data to compare the performance of AICT-DOPM with state-of-the-art algorithms, including SC, ICP, and TPS-RPM.

Fundamental experiments on synthetic data
In the synthetic data experiments, we generate a point set of 100 points uniformly distributed in 2D space, with the mean distance between neighboring points normalized to 1. This point set is treated as the model. Target point sets are obtained under different levels of affine transformation, noise and outliers, respectively. To obtain noisy target point sets, the model points are first transformed by a random affine transformation and then perturbed by noise. The noise level is measured by the noise-to-signal ratio

$$\mathrm{NSR} = \frac{e}{2d}, \qquad (5)$$

where d is the minimum distance between a point and the other points in the point set. The outlier level, denoted ODR, is defined as the ratio of the number of outliers to the number of original points. Figure 3 shows the distortions between the model and target point sets; the model is taken from the Chui database. The matching accuracy of a descriptor is evaluated as the ratio of the number of correct matches to the number of matches that actually exist. In addition, the correspondences between two point sets are used to estimate the affine transformation T′. The matching error is quantified as the average Euclidean distance between the model points transformed by T′ and those transformed by T. All results in this subsection are averaged over 100 independent trials.

Effect of depth on the performance
As described in Sect. 2, an AICT of depth n contains $2^n - 1$ affine invariant points. Intuitively, the deeper the AICT, the better it captures the inherent structure of the point set. This raises two questions. Does the performance keep improving as the depth increases? And which depth is the best choice when both performance and computational complexity are considered? To answer these questions, we test the effect of depth on the performance of AICT-DOPM when all points in the AICT are used to construct the SPS. Figure 4 gives the performance of AICT with various depths under affine transformations, outliers and noise. Figure 4a1, a2 plot the matching accuracy and matching error of AICT-DOPM. They show that the descriptor performs excellently when the depth of the AICT is larger than 3: the matching accuracy nearly reaches 100%, while the matching error drops to 0. However, an AICT of depth 3 performs poorly when the target point set is polluted by outliers (Fig. 4b1, b2). Performance degrades as ODR increases, but it improves markedly once the depth of the AICT reaches 5. Beyond that depth, it fluctuates only slightly, because an AICT of depth 5 already represents the global structure of the point set well. Therefore, we prefer 5 over larger values as the depth of the AICT for point matching under outliers. For point matching under noise (Fig. 4c1, c2), 5 is also a suitable choice.
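The matching stage of the algorithm and the evaluation protocol of the synthetic experiments can be sketched together in Python. The χ² cost follows the ½-scaled form used by shape context. The one-to-one assignment uses SciPy's `scipy.optimize.linear_sum_assignment`. The dummy-point layout is a hypothetical padding scheme. Every real point may be matched to a dummy at cost `eps_d`, and dummy-to-dummy pairs are free. T′ is estimated by least squares, and the matching error is the mean distance between the model mapped by T′ and by T. The descriptors in the usage lines are random stand-ins that travel with their points. Real AICT descriptors are not computed here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def chi2_costs(D1, D2, tiny=1e-12):
    """c(p_i, q_j) = 0.5 * sum_k (d1_k - d2_k)^2 / (d1_k + d2_k) for all pairs.

    D1: (N1, K), D2: (N2, K), nonnegative descriptors; `tiny` guards 0/0.
    """
    a, b = D1[:, None, :], D2[None, :, :]
    return 0.5 * np.sum((a - b) ** 2 / (a + b + tiny), axis=2)

def match_with_dummies(cost, eps_d):
    """Hungarian matching in which any point may match a dummy at cost eps_d."""
    n1, n2 = cost.shape
    big = np.full((n1 + n2, n2 + n1), eps_d)
    big[:n1, :n2] = cost          # real model point vs. real target point
    big[n1:, n2:] = 0.0           # dummy vs. dummy costs nothing
    rows, cols = linear_sum_assignment(big)
    pairs = [(r, c) for r, c in zip(rows, cols) if r < n1 and c < n2]
    return pairs, big[rows, cols].sum()

def estimate_affine(src, dst):
    """Least-squares (A, b) with dst ≈ src @ A.T + b."""
    X = np.column_stack([src, np.ones(len(src))])
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)   # M has shape (3, 2)
    return M[:2].T, M[2]

def matching_error(P, A, b, A_est, b_est):
    """Mean distance between the model mapped by T' = (A_est, b_est) and by T = (A, b)."""
    return np.mean(np.linalg.norm(P @ A_est.T + b_est - (P @ A.T + b), axis=1))

# Usage with stand-in invariant descriptors that travel with their points.
rng = np.random.default_rng(2)
P = rng.uniform(-10, 10, size=(20, 2))
A, b = np.array([[1.2, 0.5], [-0.3, 0.8]]), np.array([4.0, 1.0])
perm = rng.permutation(20)
Q = (P @ A.T + b)[perm]                      # Q[j] corresponds to P[perm[j]]
D_model = rng.uniform(0.1, 1.0, size=(20, 8))
D_target = D_model[perm]

pairs, total = match_with_dummies(chi2_costs(D_model, D_target), eps_d=0.25)
accuracy = sum(perm[j] == i for i, j in pairs) / len(P)   # correct / existing
A_est, b_est = estimate_affine(P[[i for i, _ in pairs]], Q[[j for _, j in pairs]])
print(len(pairs), accuracy)                          # 20 1.0
print(matching_error(P, A, b, A_est, b_est) < 1e-9)  # True
```

Padding with dummies makes the cost matrix square. A real pair is preferred over two dummy assignments only when its cost is below $2\varepsilon_d$.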

Performance under affine transformations, outliers and noise
In this subsection, AICT-DOPM with an AICT depth of 5 is compared with SC, ICP and TPS-RPM.

Performance under affine transformations
An affine transformation includes rotation, scaling, and shearing transformations. The behavior of the algorithms under rotation is tested first. The target point sets are generated by rotating the model point set from 0° to 180° in 20° steps. The experimental results in Fig. 5a1, a2 show that AICT-DOPM finds nearly all correspondences. In contrast, the matching accuracy and error of the other algorithms fluctuate as the rotation angle changes. ICP and TPS-RPM in particular depend heavily on the initial correspondence, and their performance worsens as the rotation angle grows. Next, the sensitivity of the algorithms to scaling is evaluated. The target point sets are obtained by transforming the model point set with non-uniform scaling ratios $s_x/s_y$ from 1.2 to 3 in steps of 0.2. The results compared in Fig. 5b1, b2 show that AICT-DOPM is more robust to non-uniform scaling. To evaluate the behavior of the algorithms under shearing, the target point sets are obtained by transforming the model point set with shearing factors k of −3, −2, −1, 0, 1, 2 and 3. The matching results summarized in Fig. 5c1, c2 confirm the invariance of AICT-DOPM to shearing.
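The three parameter sweeps can be generated as in the Python sketch below. The text names only the rotation angle, the ratio $s_x/s_y$, and the shearing factor k. The sketch therefore assumes $s_y = 1$ and a horizontal shear [[1, k], [0, 1]]; both choices are hypothetical. The last lines confirm that every matrix in these sweeps has a positive determinant. This means the positive and negative sides used by the AICT partitions are never swapped in these tests.

```python
import numpy as np

def rotation(theta_deg):
    t = np.deg2rad(theta_deg)
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

def nonuniform_scaling(ratio):
    """Assumption: s_y = 1, so that s_x / s_y equals `ratio`."""
    return np.diag([ratio, 1.0])

def shearing(k):
    """Assumption: horizontal shear [[1, k], [0, 1]]; the text only names k."""
    return np.array([[1.0, k], [0.0, 1.0]])

angles = np.arange(0, 181, 20)        # 0°, 20°, ..., 180°
ratios = np.linspace(1.2, 3.0, 10)    # s_x/s_y = 1.2, 1.4, ..., 3.0
shears = np.arange(-3, 4)             # k = -3, -2, ..., 3

sweeps = {
    "rotation": [rotation(a) for a in angles],
    "scaling": [nonuniform_scaling(r) for r in ratios],
    "shearing": [shearing(k) for k in shears],
}

def make_targets(model, matrices):
    """One target point set per matrix (translation omitted in this sketch)."""
    return [model @ A.T for A in matrices]

model = np.random.default_rng(0).uniform(-1, 1, size=(100, 2))
shear_targets = make_targets(model, sweeps["shearing"])

# All matrices in these sweeps keep det(A) > 0.
min_det = min(np.linalg.det(A) for mats in sweeps.values() for A in mats)
print({name: len(mats) for name, mats in sweeps.items()}, min_det > 0)
# {'rotation': 10, 'scaling': 10, 'shearing': 7} True
```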

Performance under outliers
The sensitivity of the algorithms to outliers is tested by adding different numbers of outliers to the randomly affine-transformed model. Figure 5d1, d2 show the matching accuracy and error against outliers, respectively. The accuracy of all algorithms decreases as ODR increases, and AICT-DOPM performs best in the presence of outliers.

Performance under noise
Finally, the effect of noise is examined. The matching accuracy and error in Fig. 5e1, e2 show that AICT-DOPM is the most robust to noise.

Extended experiments on real data
The proposed algorithm can be used for object recognition once objects are represented by point sets. In this subsection, a template image (Fig. 6a) and an input image (Fig. 6b) are used to test the performance of the proposed algorithm on water region recognition. The two images are real data covering areas of Taiwan and were acquired by different sensors. Each of the 11 water regions in the input image has a correspondence in the template image, which contains 24 regions. The closed water regions are extracted automatically by simple threshold segmentation and numbered, and their contours are shown in white in Fig. 6. In the experiment, each water region is treated as a point set by sampling its contour with 100 points at uniform spacing. For each region pair, each algorithm finds correspondences between contour points, which are used to estimate the transformation between the two images. The input region is then transformed to align with the template region. Finally, the registration accuracy is measured as the ratio of the area of the common domain of the template and transformed input regions to the area of the template region. The larger the registration accuracy, the more similar the two regions. Figure 7 shows the matching results between water region No. 7 in the template image (blue plus signs) and water region No. 2 in the input image (red circles) for the different algorithms. The top row gives the point matching results. The bottom row plots the contours of the transformed regions on the templates to show the performance of the algorithms visually. Table 1 summarizes the water region recognition results of the algorithms. It demonstrates that AICT-DOPM is much better at point-based object recognition than SC, ICP and TPS-RPM.
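The registration accuracy defined above is the area of the common domain divided by the area of the template region. It can be computed on rasterized regions. The Python sketch below is an illustration under assumptions. It rasterizes polygonal contours with a hypothetical even-odd ray-casting helper, `polygon_mask`, instead of using the segmented image regions. It then takes the ratio of pixel counts. The toy example uses two squares that overlap on half of the template, so the expected accuracy is 0.5.

```python
import numpy as np

def polygon_mask(poly, xs, ys):
    """Rasterize a closed polygon (M, 2) on the grid xs x ys (even-odd ray casting)."""
    X, Y = np.meshgrid(xs, ys)
    inside = np.zeros(X.shape, dtype=bool)
    x0, y0 = poly[:, 0], poly[:, 1]
    x1, y1 = np.roll(x0, -1), np.roll(y0, -1)
    for xa, ya, xb, yb in zip(x0, y0, x1, y1):
        straddles = (ya > Y) != (yb > Y)    # edge crosses the horizontal ray
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = xa + (Y - ya) * (xb - xa) / (yb - ya)
        inside ^= straddles & (X < x_cross)
    return inside

def registration_accuracy(template_mask, input_mask):
    """Area of the common domain divided by the area of the template region."""
    common = np.logical_and(template_mask, input_mask).sum()
    return common / template_mask.sum()

xs = ys = np.arange(0.05, 7.0, 0.1)                   # pixel centres, 0.1-unit grid
template = np.array([[0, 0], [4, 0], [4, 4], [0, 4]], dtype=float)
transformed_input = template + np.array([2.0, 0.0])   # input region after alignment
acc = registration_accuracy(polygon_mask(template, xs, ys),
                            polygon_mask(transformed_input, xs, ys))
print(acc)  # 0.5
```

Pixel counting approximates areas. A finer grid approaches the exact polygon-overlap ratio.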

Conclusion
In this paper, a novel AICT is proposed to generate an affine invariant SPS that improves the performance of moments. For a given point, the point set is first partitioned into two subsets by the affine invariant vector from the point to the centroid of the point set. The centroids of the subsets are stored as the sons of the point. Each son and its father then form a vector that induces a renewed partition. The process continues until the AICT reaches a given depth. The points in the AICT are arranged in order to construct a rigorously affine invariant SPS, and a descriptor of the point is obtained by applying a moment to this SPS. Finally, the similarity between two points is measured by comparing their descriptors, which enables point-matching-based object recognition. We compared the proposed algorithm against three state-of-the-art algorithms, SC, ICP and TPS-RPM, on synthetic and real data. The results show that it outperforms the others in the presence of affine transformations, outliers and noise.

Table 1 Water region recognition results
Input     I1   I2   I3   I4   I5   I6   I7   I8   I9   I10  I11
Matched   T6   T7   T10  T9   T13  T12  T14  T15  T16  T17  T21

Proof. As shown in Fig. 8, we describe the relative position of $p_k$ with respect to the vector from $p_i$ to $p_j$ (i.e., $v_{ij}$). To do so, we first establish a positively oriented orthonormal frame $(O_{ij}, x_{ij}, y_{ij})$ in which $v_{ij}$ gives the direction of $x_{ij}$. The y-axis coordinate of $p_k$ in $(O_{ij}, x_{ij}, y_{ij})$ can then be computed by

$$y^{ij}_k = \frac{1}{\lVert v_{ij} \rVert} \det \begin{pmatrix} x_i & y_i & 1 \\ x_j & y_j & 1 \\ x_k & y_k & 1 \end{pmatrix}. \qquad (6)$$

Availability of data and materials
The datasets used during the current study are available from the corresponding author on reasonable request.

Declarations
Ethics approval and consent to participate This work does not involve human participants, human data or human tissue.