Robustly building keypoint mappings with global information on multispectral images
- Yong Li^{1},
- Hongbin Jin^{1},
- Wei Qiao^{1},
- Jing Jing^{1} and
- Hang Yu^{1}
https://doi.org/10.1186/s13634-015-0240-z
© Li et al. 2015
Received: 16 January 2015
Accepted: 12 May 2015
Published: 1 July 2015
Abstract
This paper proposes an approach to robustly building keypoint mappings on multispectral images. The distinctiveness and repeatability of descriptors often decrease significantly on multispectral images, which leads to unreliable keypoint mappings. To compensate for this decrease, this work introduces global information over the entire images to evaluate keypoint mappings. Initial keypoint mappings are established with descriptors. A pair of keypoint mappings determines a similarity transformation T, which is then evaluated with the introduced global information, defined as the similarity metric between the reference image and the test image transformed by T. An iterative process considers the pairs of keypoint mappings and searches for the best-matched reference keypoint for every test keypoint. Experimental results show that the proposed approach can provide more reliable keypoint mappings than SIFT, ORB, FREAK, and ISS on multispectral images.
1 Introduction
Multispectral imaging has been widely applied in areas such as natural disaster monitoring and battlefield surveillance. Fusing images taken in different spectral bands can often provide more information about objects of interest and scenes than a single-spectrum image. Satisfactory fusion usually requires image registration as a building block, and the registration performance strongly affects the fusion quality.
1.1 Related work
Registering multispectral images is a challenging problem because there is no explicit or implicit relationship between the values of corresponding pixels. In the literature, registration methods fall into two categories: those based on image features and those based on image intensity [1]. Intensity-based methods include mutual information [2], MIND [3], and maximum likelihood (ML) [4]. Let \(I_{r}(x,y)\) and \(I_{t}(x,y)\) denote the reference and test images. Intensity-based methods typically construct an objective (registration) function \(f(I_{r}(x,y), {I_{t}^{T}}(x,y))\) of the transformation parameter T between the images, where \({I_{t}^{T}}(x,y)\) is the test image transformed by T. Aligning \(I_{r}(x,y)\) and \(I_{t}(x,y)\) then amounts to searching for the T at which \(f(I_{r}(x,y), {I_{t}^{T}}(x,y))\) reaches its extremum.
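To make the intensity-based formulation concrete, the sketch below takes f to be mutual information estimated from a joint histogram and restricts T to integer translations, which it searches exhaustively. Both choices and the helper names are illustrative assumptions, not any of the cited methods; [2] and similar work optimize continuous transformations rather than scanning a grid.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Mutual information of two equally sized images, estimated from their joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal distribution of a
    py = pxy.sum(axis=0, keepdims=True)  # marginal distribution of b
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def register_translation_mi(ref, test, max_shift=5, bins=32):
    """Search integer shifts (dy, dx) that maximize f = MI(ref, shifted test)."""
    best, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            moved = np.roll(test, shift=(dy, dx), axis=(0, 1))  # circular shift for simplicity
            score = mutual_information(ref, moved, bins)
            if score > best:
                best, best_shift = score, (dy, dx)
    return best_shift, best
```

Because mutual information depends only on the joint intensity distribution, the search still recovers the shift when the test image is an intensity-inverted copy of the reference, a crude stand-in for the missing pixel-value relationship between spectral bands.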
The problem with intensity-based methods is that the optimization may fail to find the ground-truth transformation parameters [5]. To improve convergence, the misalignment is often assumed to be small, e.g., several pixels. This assumption is equivalent to the following: an initial estimate \(\tilde T\) of the ground truth is available that falls within the basin of convergence of \(f(I_{r}(x,y), {I_{t}^{T}}(x,y))\), so that the optimization algorithm can reach the global extremum. When the misalignment is relatively large, the optimization can easily be trapped in a local extremum, and the registration fails.
Fourier methods form another category of intensity-based methods. A translation between two images in the spatial domain corresponds to the peak of the inverse Fourier transform of the product of one image's Fourier transform with the complex conjugate of the other's (normalized to unit magnitude in phase correlation). Tzimiropoulos et al. [6] propose an FFT-based approach to scale-invariant image registration in which a log-polar representation of the Fourier transform is used to estimate scaling and rotation. Pan et al. [7] propose the multilayer fractional Fourier transform (MLFFT) to improve the accuracy of registering images related by rotation and scaling. The difficulty with Fourier methods is that translation, rotation, and scaling generally cannot be handled simultaneously.
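For the translation-only case, the Fourier relationship above underlies phase correlation. The NumPy sketch below is a minimal integer-pixel version that assumes circular shifts and same-modality images; it is not the log-polar method of [6] or the MLFFT of [7], and the function name is illustrative.

```python
import numpy as np

def phase_correlation(ref, test):
    """Estimate the integer circular shift (dy, dx) such that test ~= np.roll(ref, (dy, dx))."""
    cross = np.fft.fft2(test) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12             # normalized cross-power spectrum
    corr = np.real(np.fft.ifft2(cross))        # ideally a delta located at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    # Peaks past the midpoint correspond to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```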
Other intensity-based techniques include the region-based confidence-weighted M-estimators of [8], which handle image sets with arbitrarily shaped local illumination variations caused by changes in and movement of light sources. Zosso et al. [9] propose geodesic active fields, which couple the registration and regularization terms; the energy of the deformation field is measured with the Polyakov energy weighted by a suitable image distance. Xing and Qiu [10] propose using nonparametric local smoothing to estimate the underlying transformation, which avoids assuming a particular parametric form for the mapping. Liu et al. [11] propose the mean local phase angle (MLPA) and frequency spread phase congruency (FSPC), which use local frequency information to emphasize common structural information while suppressing sensor-dependent information.
Feature-based registration methods first build feature mappings and then compute the transformation parameters from them without resorting to optimization techniques. A variety of image features have been proposed; among the most commonly used are keypoints and their descriptors. Lowe [12] proposed the scale-invariant feature transform (SIFT), which detects keypoints and computes descriptors invariant to scale and rotation: a main orientation is assigned to each keypoint, and the local gradient pattern relative to that orientation is computed as its descriptor. Bay et al. [13] proposed Speeded-Up Robust Features (SURF), which offer repeatability and distinctiveness comparable to SIFT but are computed faster by using integral images. Alahi et al. [14] propose the Fast Retina Keypoint (FREAK), a cascade of binary strings computed by comparing image intensities over a retinal sampling pattern. Ambai and Yoshida [15] propose compact and real-time descriptors (CARD), which can be computed rapidly by using lookup tables to extract histograms of oriented gradients.
SIFT, SURF, FREAK, and CARD are suited to monomodal images. To use descriptors for building keypoint mappings on multispectral images, the partial intensity invariant feature descriptor (PIIFD) was proposed, which adapts the gradient pattern to gradient reversal and region reversal [16]. Saleem and Sablatnig [17] proposed computing descriptors from normalized gradients to achieve robustness against intensity changes between multispectral images. Wang et al. [18] proposed a modified SIFT feature extraction algorithm with a shape-context descriptor (MSSCD), which computes a 3D histogram of edge point locations and orientations around a keypoint as its shape-context descriptor.
1.2 The proposed approach
Although MSSCD and PIIFD improve the matching ability of descriptors on multispectral images, they still generate a high ratio of incorrect mappings, because multispectral images share less common information. Our previous work [19] considered affine transformations and used global information to evaluate triplets of keypoint mappings. To obtain the best-matched reference keypoint for each test keypoint, it employs an iterative process that exhausts all triplets of possible keypoint mappings, which is computationally expensive. For many multispectral images, however, a translation [20] or a similarity transformation [21] may be enough to account for the misalignment. Based on this observation, this paper proposes using global information together with descriptors to establish keypoint mappings between two images whose misalignment can be modeled by a similarity transformation. Since two keypoint mappings suffice to calculate a similarity transformation, exhausting pairs of keypoint mappings is far less costly than exhausting triplets.
The contribution of this paper is to use global information to build keypoint mappings. The proposed method has a much lower computational cost than exhausting triplets of keypoint mappings, yet it still builds keypoint mappings robustly on multispectral images. Because the matching ability of descriptors decreases on multispectral images, the ratio of correct keypoint mappings is lower than on monomodal images, so additional information must be employed to build robust keypoint mappings. One option is to enlarge the local window used to compute a descriptor, so that the descriptor encodes more information. In most existing descriptors for single-spectrum images, one main orientation suffices to characterize the local geometric transformation, since within a sufficiently small region the transformation reduces approximately to rotation, translation, and scaling. For a larger window on multispectral images, however, correctly assigning a main orientation is itself a challenging task [22].
To enhance the matching ability of descriptors, this work proposes using information over the entire images. Two keypoint mappings are needed to determine a similarity transformation, which comprises scaling, rotation, and translation. The determined rotation and scaling in effect serve as a main orientation (and scale) for the entire image when the entire image is treated as one large descriptor. The proposed method resembles RANSAC in that both sample combinations of keypoint mappings and then evaluate the sampled combinations. It differs essentially from RANSAC, however, in that RANSAC uses only keypoint positions: each sampled combination is scored by the number of remaining mappings consistent with it. Because the ratio of correct mappings on multispectral images is low, correct (good) combinations are often misjudged as incorrect (bad). The proposed method instead uses global information, encoded by the similarity metric, so that good mappings “conform to” the content of the entire images; keypoint mappings with a high similarity metric are therefore more likely to be correct than those selected by RANSAC.
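For contrast, the RANSAC-style score described above can be sketched as an inlier count: map every test keypoint with the candidate T and count the mappings that land within a tolerance of their reference keypoint. The similarity parametrization and the 3-pixel tolerance are illustrative assumptions. With a low ratio of correct mappings, even the correct T collects only a few votes, which is the weakness the global similarity metric is meant to address.

```python
import numpy as np

def inlier_count(params, test_pts, ref_pts, tol=3.0):
    """RANSAC-style score of T = params: mappings whose mapped test point lies within tol of its reference point."""
    a, b, tx, ty = params
    p = np.asarray(test_pts, dtype=float)
    q = np.asarray(ref_pts, dtype=float)
    # Assumed similarity model: x_r = a*x - b*y + t_x, y_r = b*x + a*y + t_y.
    mapped = np.stack([a * p[:, 0] - b * p[:, 1] + tx,
                       b * p[:, 0] + a * p[:, 1] + ty], axis=1)
    return int(np.count_nonzero(np.linalg.norm(mapped - q, axis=1) < tol))
```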
The rest of this paper is organized as follows: Section 2 describes the proposed method, Section 3 analyzes its computational complexity, Section 4 presents the experimental results, and Section 5 concludes the paper.
2 Proposed approach
This section presents the registration approach to aligning multispectral images. The misalignment is assumed to be small (i.e., not wide-baseline) and to be accounted for by a similarity transformation. For each test keypoint, a distance constraint narrows the set of its mapping candidates. Given a pair of keypoint mappings, a similarity transformation T is determined, and the similarity metric between \(I_{r}(x,y)\) and \({I_{t}^{T}}(x,y)\) is then calculated over the entire images. Intuitively, the greater the similarity metric, the better the pair “conforms to” the entire image content. The insight of this paper comes from the following observation. Descriptors encode the local information around keypoints, and two keypoints are matched if their surrounding regions share the most common information or structure. Multispectral images, however, contain less common information than images of the same band (monomodal images). The local information around keypoints, i.e., the descriptors, therefore cannot provide as many correct keypoint mappings, especially when the spectral difference is large. In other words, on multispectral images the local information becomes insufficient to decide whether a keypoint mapping is correct, and information complementary to descriptors is needed to build reliable keypoint mappings.
To build keypoint mappings with descriptors on multispectral images, this paper uses global information to evaluate keypoint mappings. A keypoint mapping is judged correct if its resulting T yields a large similarity metric between \(I_{r}(x,y)\) and \({I_{t}^{T}}(x,y)\). This paper deals with similarity transformations, which require at least two keypoint mappings (i.e., a pair) to determine the misalignment. Since there are many such pairs of keypoint mappings, an iterative process is employed to search for the best-matched reference keypoint for every test keypoint. Evaluating T involves computing the similarity metric over the entire images, and we call this global information.
2.1 Distance constraints
Note that although the affine transformation in Equation 1 has six parameters (the four entries \(a_{11}, a_{12}, a_{21}, a_{22}\) of A plus the translation), the similarity transformation in Equation 3 has only four unknowns, \(a, b, t_{x}, t_{y}\). Since each keypoint mapping provides two equations (one per coordinate), a similarity transformation needs only two keypoint mappings, whereas an affine transformation needs three.
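Equation 3 is not reproduced in this section, so the sketch below assumes the common parametrization of a similarity transformation, \(x_r = a x_t - b y_t + t_x\), \(y_r = b x_t + a y_t + t_y\), with \(a = s\cos\theta\) and \(b = s\sin\theta\) for scale s and rotation θ. Each mapping contributes two linear equations, so two mappings give a 4×4 linear system in a, b, t_x, t_y. The function names are hypothetical.

```python
import numpy as np

def similarity_from_two_mappings(p_t, p_r):
    """Solve (a, b, t_x, t_y) from two test -> reference point mappings.

    Assumed model: x_r = a*x_t - b*y_t + t_x,  y_r = b*x_t + a*y_t + t_y.
    p_t, p_r: two (x, y) points each; row k of p_t maps to row k of p_r.
    """
    rows, rhs = [], []
    for (x, y), (xr, yr) in zip(p_t, p_r):
        rows.append([x, -y, 1.0, 0.0])  # x-equation coefficients of (a, b, t_x, t_y)
        rhs.append(xr)
        rows.append([y, x, 0.0, 1.0])   # y-equation coefficients of (a, b, t_x, t_y)
        rhs.append(yr)
    a, b, tx, ty = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return float(a), float(b), float(tx), float(ty)

def apply_similarity(params, pts):
    """Map (x, y) points with the assumed similarity model."""
    a, b, tx, ty = params
    pts = np.asarray(pts, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return np.stack([a * x - b * y + tx, b * x + a * y + ty], axis=1)

def scale_and_angle(params):
    """Recover scale s = sqrt(a^2 + b^2) and rotation theta = atan2(b, a)."""
    a, b, _, _ = params
    return float(np.hypot(a, b)), float(np.arctan2(b, a))
```

The system is singular when the two test points coincide and poorly conditioned when they are close, which is consistent with requiring a minimum separation between the two test keypoints of a pair.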
When the misalignment is relatively small, the spatial distance between a test keypoint \(\mathbf{p}_{t}\) in \(I_{t}(x,y)\) and its corresponding point \(\mathbf{p}_{r}\) in \(I_{r}(x,y)\) is small [19, 23].
Formally, writing the transformation as \(\mathbf{p}_{r} = A\mathbf{p}_{t} + \mathbf{t}\), we have \(\|\mathbf{p}_{t} - \mathbf{p}_{r}\|_{2} \leq \sqrt{2}\, \|A - I\|_{\infty} \cdot \|\mathbf{p}_{t}\|_{2} + \|\mathbf{t}\|_{2} < T_{trd}\), where \(\|\cdot\|_{\infty}\) is the induced (maximum row-sum) matrix norm, \(T_{trd}\) is a threshold to be set, and the last inequality expresses the small-misalignment assumption. In this work, \(T_{trd}\) is set to one quarter of the larger of the image height and width, i.e., \(T_{trd} = \frac{1}{4}\max\{H, W\}\), where H and W are the height and width of the images to be aligned. This value is used because, given that the unknown misalignment is relatively small, a point \(\mathbf{p}_{t}\) is not expected to move farther than this distance. Under this assumption, the distance constraint easily rules out a large number of wrong keypoint mappings.
Here, a wrong mapping refers to two matched keypoints that are spatially far from each other.
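The distance constraint reduces to a vectorized filter. The sketch assumes keypoint positions are (x, y) pixel coordinates in a common frame and returns a boolean mask over candidate mappings; the function names are illustrative.

```python
import numpy as np

def distance_threshold(height, width):
    """T_trd = max(H, W) / 4, as set in the text."""
    return max(height, width) / 4.0

def filter_by_distance(test_pts, ref_pts, height, width):
    """Mask of mappings whose test and reference keypoints are closer than T_trd.

    test_pts[k] and ref_pts[k] are the (x, y) positions of the k-th candidate mapping.
    """
    d = np.linalg.norm(np.asarray(test_pts, dtype=float) - np.asarray(ref_pts, dtype=float), axis=1)
    return d < distance_threshold(height, width)
```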
2.2 Building initial keypoint mappings
Because of gradient reversal and region reversal, the repeatability and distinctiveness of descriptors decrease significantly on multispectral images, so the initial keypoint mappings contain a high ratio of incorrect ones [24]. The set of initial keypoint mappings is used in Section 2.4 to search for the best-matched reference keypoint for every test keypoint.
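This section does not specify how the initial mappings are formed, so the following is only one plausible sketch: brute-force Euclidean nearest neighbours in descriptor space, keeping the n_candidates closest reference keypoints per test keypoint (n_candidates plays the role of N_c in Section 3). Descriptors are assumed to be real-valued vectors; binary descriptors such as ORB or FREAK would normally be compared with the Hamming distance instead.

```python
import numpy as np

def initial_mappings(desc_t, desc_r, n_candidates=1):
    """For each test descriptor, indices of the n_candidates closest reference descriptors (closest first)."""
    desc_t = np.asarray(desc_t, dtype=float)
    desc_r = np.asarray(desc_r, dtype=float)
    # Pairwise squared Euclidean distances, shape (N_t, N_r).
    d2 = ((desc_t[:, None, :] - desc_r[None, :, :]) ** 2).sum(axis=2)
    # Sort each row by distance and keep the first n_candidates columns.
    return np.argsort(d2, axis=1, kind="stable")[:, :n_candidates]
```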
2.3 Evaluating a pair of keypoint mappings
Given a keypoint mapping \(\left(K_{t}^{i_{1}} \sim K_{r}^{j_{1}}\right)\), \(K_{r}^{j_{1}}\) is the best-matched keypoint to \(K_{t}^{i_{1}}\): the local region around \(K_{r}^{j_{1}}\) is more similar to the region around \(K_{t}^{i_{1}}\) than that of any other keypoint on \(I_{r}(x,y)\). This mapping is usually evaluated further by applying a consistency check to the set of initial mappings. RANSAC [25] is a commonly used technique for separating out correct mappings, but it often fails when the ratio of wrong mappings is high. This paper therefore proposes using global information to compensate for the decreased matching ability of descriptors.
Once \(a, b, t_{x}, t_{y}\) are determined, \(I_{t}(x,y)\) is transformed by T with Equation 3 to obtain \({I_{t}^{T}}(x,y)\), and the similarity metric \(S(I_{r}(x,y), {I_{t}^{T}}(x,y))\) between \(I_{r}(x,y)\) and \({I_{t}^{T}}(x,y)\) is computed. The greater the similarity metric, the more likely the two keypoint mappings are to be correct. The metric is the number of overlapped edge pixels (NOEP),
\[ S\left(I_{r}(x,y), {I_{t}^{T}}(x,y)\right) = \sum_{x,y} E_{r}(x,y)\, {E_{t}^{T}}(x,y), \qquad (5) \]
where \(E_{r}(x,y)\) and \({E_{t}^{T}}(x,y)\) are the binary edge maps of \(I_{r}(x,y)\) and \({I_{t}^{T}}(x,y)\), respectively; \(S\left(I_{r}(x,y), {I_{t}^{T}}(x,y)\right)\) simply counts the number of overlapped edge pixels.
The NOEP measures the similarity of two edge maps. It acts as a descriptor in the sense that it encodes the distribution of edge points and hence characterizes the structure of the image content. Alternatively, NOEP can be viewed as a simplified version of the edge oriented histogram (EOH) [26], calculated over an entire image instead of a local window. Because of gradient reversal [27], gradient orientation is unreliable, but edge positions tend to be stable and have been used to compute similarity metrics on multispectral images [20]. Note that the superscript T of \({E_{t}^{T}}(x,y)\) in (5) plays the role of a generalized main orientation. The main orientation of a keypoint accounts for the local geometric difference between views with a rotation; for an entire image (i.e., a much larger window, or descriptor), a transformation richer than rotation alone is required for alignment.
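A minimal sketch of the NOEP, assuming binary edge maps (produced by any edge detector; the detector is not specified here) and the similarity parametrization \(x_r = a x - b y + t_x\), \(y_r = b x + a y + t_y\) mapping test to reference coordinates. The test edge map is warped by inverse mapping with nearest-neighbour sampling, and overlapping edge pixels are then counted. The function names are hypothetical.

```python
import numpy as np

def warp_edges_similarity(edges_t, params, out_shape):
    """Nearest-neighbour warp of a binary test edge map into the reference frame."""
    a, b, tx, ty = params
    h, w = out_shape
    yr, xr = np.mgrid[0:h, 0:w].astype(float)
    # Inverse similarity: p_t = A^{-1} (p_r - t), with A = [[a, -b], [b, a]].
    s2 = a * a + b * b
    dx, dy = xr - tx, yr - ty
    xt = np.rint((a * dx + b * dy) / s2).astype(int)
    yt = np.rint((-b * dx + a * dy) / s2).astype(int)
    inside = (xt >= 0) & (xt < edges_t.shape[1]) & (yt >= 0) & (yt < edges_t.shape[0])
    warped = np.zeros(out_shape, dtype=bool)
    warped[inside] = edges_t[yt[inside], xt[inside]]
    return warped

def noep(edges_r, edges_t, params):
    """Number of overlapped edge pixels between E_r and the warped test edge map E_t^T."""
    warped = warp_edges_similarity(edges_t, params, edges_r.shape)
    return int(np.count_nonzero(edges_r & warped))
```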
2.4 Searching for the best matched keypoint
Because of the multimodality, some test keypoints may have no corresponding reference keypoint on \(I_{r}(x,y)\). To rule out incorrect mappings from the initial keypoint mappings, this section computes, for every test keypoint \({K_{t}^{i}}, i=1, \ldots, N_{t}\), the maximum similarity metric \(S_{\max }^{t}(i)\) it can yield. The values \(S_{\max }^{t}(i), i=1, \ldots, N_{t}\), are then sorted, and the test keypoints ranked in the top 15 % are retained to calculate the final transformation parameters. In short, this section consists of two steps: the first computes the maximum similarity metric for every test keypoint, and the second chooses the test keypoints used to compute the transformation parameters.
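Formally, \(S_{\max }^{t}(i) = \max_{i_{0} \neq i} S\left(I_{r}(x,y), I_{t}^{T_{i_{0}, i}}(x,y)\right)\), where \(T_{i_{0}, i}\) is determined by (\(K_{t}^{i_{0}}, {K_{t}^{i}}\)) and their initial mapping reference keypoints.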
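The selection step can be sketched as follows. The text does not say how 15 % of N_t is rounded, so this sketch assumes rounding up with at least one keypoint kept; test keypoints that never formed a valid pair can be given a score of -inf so that they rank last.

```python
import numpy as np

def select_top_keypoints(s_max, fraction=0.15):
    """Indices of test keypoints whose S_max^t(i) ranks in the top `fraction`, best first."""
    s_max = np.asarray(s_max, dtype=float)
    k = max(1, int(np.ceil(fraction * len(s_max))))  # assumed rounding rule
    order = np.argsort(-s_max, kind="stable")        # descending by score
    return order[:k]
```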
To compute the maximum similarity metric for every test keypoint, an iterative process is employed that exhausts all pairs of keypoint mappings.
The iterative process picks a pair of test keypoints \(\left(K_{t}^{i_{1}}, K_{t}^{i_{2}}\right)\) and their mapped reference keypoints \(\left(K_{r}^{k_{i_{1}}}, K_{r}^{k_{i_{2}}}\right)\). The distance constraint of Section 2.1 is applied to \(\left(K_{t}^{i_{1}}, K_{r}^{k_{i_{1}}}\right)\) and \(\left(K_{t}^{i_{2}}, K_{r}^{k_{i_{2}}}\right)\) to remove keypoint mappings whose distance exceeds the threshold \(T_{trd}\). Additionally, the distance between the two test keypoints in a pair is required to exceed a threshold \(T_{ttd}\), because a pair of nearby keypoints often yields an unreliable transformation. In this work, \(T_{ttd}=10\).
The similarity transformation T is determined with \(\left (K_{t}^{i_{1}}, K_{t}^{i_{2}}\right) \sim \left (K_{r}^{k_{i_{1}}}, K_{r}^{k_{i_{2}}}\right)\), and the similarity metric is calculated. The iterative process considers all pairs of keypoint mappings and stores the maximum similarity metric for every test keypoint. It is summarized in Algorithm ??.
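Algorithm ?? itself is not reproduced here, so the following is a sketch reconstructed from the prose of Sections 2.1 to 2.4, not the authors' implementation. It assumes the similarity parametrization \(x_r = a x - b y + t_x\), \(y_r = b x + a y + t_y\), keeps mappings strictly closer than T_trd, skips pairs that would reuse one reference keypoint, and takes the global metric as a caller-supplied score_fn (for example, a NOEP evaluation); candidates, score_fn, and the other names are hypothetical.

```python
import itertools
import numpy as np

def solve_similarity(p1, p2, q1, q2):
    """(a, b, t_x, t_y) mapping test points p1, p2 onto reference points q1, q2."""
    M = np.array([[p1[0], -p1[1], 1.0, 0.0],
                  [p1[1], p1[0], 0.0, 1.0],
                  [p2[0], -p2[1], 1.0, 0.0],
                  [p2[1], p2[0], 0.0, 1.0]])
    rhs = np.array([q1[0], q1[1], q2[0], q2[1]], dtype=float)
    return tuple(float(v) for v in np.linalg.solve(M, rhs))

def max_similarity_per_keypoint(test_pts, ref_pts, candidates, score_fn, t_trd, t_ttd=10.0):
    """Exhaust pairs of keypoint mappings; return S_max per test keypoint and its best reference index.

    candidates[i] lists indices of reference keypoints initially mapped to test keypoint i.
    score_fn(params) returns the global similarity metric for T = params.
    """
    test_pts = np.asarray(test_pts, dtype=float)
    ref_pts = np.asarray(ref_pts, dtype=float)
    n_t = len(test_pts)
    s_max = np.full(n_t, -np.inf)
    best_ref = np.full(n_t, -1)
    for i1, i2 in itertools.combinations(range(n_t), 2):
        # Nearby test keypoints give an unreliable transformation (T_ttd).
        if np.linalg.norm(test_pts[i1] - test_pts[i2]) <= t_ttd:
            continue
        for k1 in candidates[i1]:
            for k2 in candidates[i2]:
                if k1 == k2:
                    continue  # assumption: one reference keypoint cannot match both test keypoints
                # Distance constraint (T_trd) on both mappings of the pair.
                if (np.linalg.norm(test_pts[i1] - ref_pts[k1]) >= t_trd or
                        np.linalg.norm(test_pts[i2] - ref_pts[k2]) >= t_trd):
                    continue
                params = solve_similarity(test_pts[i1], test_pts[i2], ref_pts[k1], ref_pts[k2])
                s = score_fn(params)
                # Each test keypoint keeps the best score of any pair it took part in.
                for i, k in ((i1, k1), (i2, k2)):
                    if s > s_max[i]:
                        s_max[i] = s
                        best_ref[i] = k
    return s_max, best_ref
```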
3 Complexity analysis
This section analyzes the computational complexity of the proposed method. We first discuss the computational cost without the distance constraints and then report the actual running time with the constraints applied. With \(N_{t}\) test keypoints, the number of combinations of two test keypoints is \(\binom{N_{t}}{2}\). For affine transformations, at least three keypoint mappings are needed to determine the transformation; three keypoint mappings form a triplet, and there are \(\binom{N_{t}}{3}\) such triplets in total. Since \(\binom{N_{t}}{3} / \binom{N_{t}}{2} = (N_{t}-2)/3\), the number of triplets of keypoint mappings is on the order of \(N_{t}\) times the number of pairs.
On multispectral images, the closest reference keypoint may not be the correct one, so multiple mapping candidates are assigned to each test keypoint [28]. If \(N_{c}\) mapping candidates are assigned to every test keypoint, as in [19], the computational cost of the proposed method is \(\binom{N_{t}}{2} \cdot {N_{c}^{2}}\), whereas that of the affine approach in [19] is \(\binom{N_{t}}{3} \cdot {N_{c}^{3}}\), i.e., \((N_{t}-2)N_{c}/3\) times, or on the order of \(N_{t} \cdot N_{c}\) times, that of the presented algorithm. The similarity transformation used in this paper suffices for a wide variety of images, e.g., remote sensing images and slices of medical images. When the misalignment involves little skew, the presented method therefore reduces the computational cost by a factor on the order of \(N_{t} \cdot N_{c}\) compared with the affine transformation model.
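The counts above can be checked numerically; the ratio of the two costs is exactly \((N_t-2)N_c/3\). The values N_t = 200 and N_c = 3 below are arbitrary examples, not figures from the experiments.

```python
from math import comb

def pair_cost(n_t, n_c):
    """Pairs of keypoint mappings examined (similarity model, no distance constraints)."""
    return comb(n_t, 2) * n_c ** 2

def triplet_cost(n_t, n_c):
    """Triplets of keypoint mappings examined (affine model of [19], no distance constraints)."""
    return comb(n_t, 3) * n_c ** 3

n_t, n_c = 200, 3  # illustrative values only
ratio = triplet_cost(n_t, n_c) / pair_cost(n_t, n_c)
print(pair_cost(n_t, n_c), triplet_cost(n_t, n_c), ratio)  # 179100 35461800 198.0
```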
Table 1 Mean (μ_{t}) and standard deviation (σ_{t}) of the running time, in seconds, of the proposed method and of the affine transformation model
Dataset | Proposed method μ_{t} (s) | Proposed method σ_{t} (s) | Affine transformations μ_{t} (s) | Affine transformations σ_{t} (s)
---|---|---|---|---
EOIR | 6.23 | 10.29 | 63.35 | 480.56
Visible_nir | 5.02 | 4.59 | 155.49 | 228.39
Country | 43.27 | 31.97 | 109.82 | 141.35
Field | 55.92 | 62.10 | 946.60 | 2540.73
Forest | 80.78 | 47.24 | 938.17 | 1181.63
Indoor | 46.57 | 48.84 | 875.28 | 1274.22
Mountain | 44.01 | 43.74 | 1316.91 | 2601.72
Oldbuilding | 72.93 | 74.19 | 1954.95 | 3306.19
Street | 43.38 | 38.65 | 954.92 | 3242.05
Urban | 94.24 | 67.59 | 5585.69 | 6886.24
Water | 26.08 | 25.32 | 259.07 | 444.64
However, even with the distance constraints, the mean running time of the affine transformation model is still about 10 to 30 times that of the proposed method on most datasets, ranging from about 2.5 times on Country to about 59 times on Urban. For image pairs whose misalignment contains little skew, a similarity transformation is sufficient to account for the misalignment.
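The speed-up factors quoted above follow directly from the mean running times in Table 1; the short script below recomputes the ratio of the affine model's mean time to that of the proposed method for each dataset.

```python
# Mean running times in seconds from Table 1: (proposed method, affine model).
MEAN_TIMES = {
    "EOIR": (6.23, 63.35),
    "Visible_nir": (5.02, 155.49),
    "Country": (43.27, 109.82),
    "Field": (55.92, 946.60),
    "Forest": (80.78, 938.17),
    "Indoor": (46.57, 875.28),
    "Mountain": (44.01, 1316.91),
    "Oldbuilding": (72.93, 1954.95),
    "Street": (43.38, 954.92),
    "Urban": (94.24, 5585.69),
    "Water": (26.08, 259.07),
}

ratios = {name: affine / proposed for name, (proposed, affine) in MEAN_TIMES.items()}
for name, r in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:<12s} {r:5.1f}x")
```

With these numbers the median ratio is about 19 (Indoor).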
Table 1 also shows that the proposed method does not achieve real-time registration on any dataset. Two factors contribute to the computational cost: the number of pairs of keypoint mappings and the complexity of calculating the similarity metric in (5) between two images. Accordingly, the running speed can be improved in two ways. The first is to reduce the number of pairs of keypoint mappings. One direction here is to improve the matching ability of descriptors; otherwise the ratio of correct keypoint mappings remains low, and many pairs of keypoint mappings must be examined.
The second is to replace (5) with a similarity metric of lower complexity. An image feature that effectively represents the entire image for registration, together with a fast similarity metric, can improve the running speed. A multiresolution technique is another option for lowering the computational expense of the presented method. A heuristic calculation of the similarity metric would also save computation, similar in spirit to the search for extremum points in SIFT [12]: edge points are assigned different priorities, those with high priority are used first to compute the similarity metric, and if they contribute little to it, the calculation stops early.
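The early-termination idea can be sketched in a lossless form: process the test edge points in priority order and stop as soon as even counting every remaining point as a hit could not exceed a given score best_so_far (for example, the best score found so far). This bound-based rule is a swapped-in variant of the heuristic described above, not the authors' procedure; it also counts hits of forward-mapped test edge points, which only approximates the pixel overlap in (5). The priority order and chunk size are assumptions.

```python
import numpy as np

def noep_early_stop(ref_edges, mapped_pts, best_so_far, chunk=256):
    """Count mapped test edge points that hit reference edge pixels, stopping early.

    ref_edges:   2-D boolean reference edge map E_r.
    mapped_pts:  integer (x, y) positions of test edge points after applying T,
                 assumed already sorted by priority (highest first).
    best_so_far: score the caller must beat; stop once that is impossible.
    Returns (count, stopped_early).
    """
    h, w = ref_edges.shape
    pts = np.asarray(mapped_pts, dtype=int).reshape(-1, 2)
    count = 0
    for start in range(0, len(pts), chunk):
        block = pts[start:start + chunk]
        x, y = block[:, 0], block[:, 1]
        ok = (x >= 0) & (x < w) & (y >= 0) & (y < h)  # ignore points mapped outside E_r
        count += int(np.count_nonzero(ref_edges[y[ok], x[ok]]))
        remaining = len(pts) - (start + len(block))
        # Upper bound: even if every remaining point were a hit, could we exceed best_so_far?
        if count + remaining <= best_so_far:
            return count, True
    return count, False
```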
4 Experimental results
This section presents the experimental results. The proposed method is compared with SIFT [12], FREAK [14], improved symmetric-SIFT (ISS) [27], and ORB [29]. SIFT and ORB are designed mainly for single-mode images, e.g., visible images, on which they are expected to perform well. FREAK is (partly) designed for multispectral images, and ISS is designed specifically for multispectral images.
4.1 Datasets used to test the performance
Three datasets are used to test the performance of the proposed approach. In general, the larger the spectral difference, the stronger the multimodality, and consequently the less repeatable the keypoints and local gradient patterns [30]. We therefore investigate the performance of the proposed method on multispectral images with varying spectral differences.
Dataset 1 (EOIR) includes 101 image pairs acquired by the authors; one image of each pair was taken with a visible camera and the other with a mid-wave infrared camera (3–5 μm).
Dataset 2 (Visible_nir) is built from the real-world hyperspectral images (RWHI) of [31], which cover 50 scenes. The images in this dataset were acquired by sequentially tuning a filter through a series of 31 narrow wavelength bands, each with approximately 10-nm bandwidth, centered at steps of 10 nm from 420 to 720 nm (see [31] for details). We use the 50 image pairs formed by the 420-nm and 720-nm bands. Dataset 3 is from [32] and includes 477 images in 9 scene categories: Country, Field, Forest, Indoor, Mountain, Oldbuilding, Street, Urban, and Water. The image pairs in dataset 3 were taken with a visible (RGB) camera and a near-infrared (NIR) camera. Since the image pairs in dataset EOIR have a larger spectral difference than those in Visible_nir and in the 9 categories of dataset 3, they share less common information, and hence the repeatability of descriptors decreases more on EOIR.
Texture information in images is important for establishing keypoint mappings. The image content in dataset EOIR covers 2D indoor scenes, 2D outdoor scenes (e.g., building walls), 3D outdoor scenes, and Landsat images. The 9 categories of dataset 3 also cover different scenes, so the performance of keypoint mappings can be evaluated effectively.
4.2 Results of keypoint mappings
This section analyzes the performance of the keypoint mappings built with SIFT [12], FREAK [14], ISS [27], ORB [29], and the presented method. We first present visual results of keypoint mappings on image pairs from dataset EOIR, since this dataset is the most challenging, and then analyze quantitatively the keypoint mappings built with the different methods.
For the quantitative analysis, the number of correct mappings is calculated for each method on every dataset. A keypoint mapping \(({K_{t}^{i}}, {K_{r}^{j}})\) is regarded as correct if \(d(T({K_{t}^{i}}), {K_{r}^{j}}) < d_{M}\), where T is the ground-truth transformation between the two images and \(d_{M}\) is a threshold to be set. Different thresholds on the distance between mapped keypoints, such as 2, 3, 4, and 5, have been used in the literature to decide whether keypoint mappings are correct. To eliminate the effect of the threshold on the evaluation, this work uses the histogram of the distances between mapped keypoints: \(d_{M}\) is set to multiple values, and we count the number of keypoint mappings whose two keypoints are closer than \(d_{M}\).
The histogram of the distances between mapped keypoints is generated as follows. The bins are [0,2], [2,5], [5,10], [10,20], and [20,∞]. For example, the bin [2,5] counts the keypoint mappings whose distance is greater than 2 but less than 5, and the bin [20,∞] counts the mappings whose distance is greater than 20. Other bin settings could also be used if they allow a better comparison of the methods.
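The evaluation protocol can be sketched as follows: map each test keypoint with the ground-truth transformation, measure its distance to the matched reference keypoint, and count the distances into the Table 2 bins. Two assumptions are made here: the ground truth is a similarity transformation in the parametrization \(x_r = a x - b y + t_x\), \(y_r = b x + a y + t_y\), and the bins are half-open, [0,2), [2,5), [5,10), [10,20), and [20,∞), so a distance equal to a bin edge falls into the upper bin.

```python
import numpy as np

INNER_EDGES = [2.0, 5.0, 10.0, 20.0]  # boundaries between the five distance bins

def mapping_error_histogram(test_pts, ref_pts, gt_params):
    """Count mappings per distance bin, using d(T(K_t), K_r) under the ground-truth T."""
    a, b, tx, ty = gt_params
    p = np.asarray(test_pts, dtype=float)
    mapped = np.stack([a * p[:, 0] - b * p[:, 1] + tx,
                       b * p[:, 0] + a * p[:, 1] + ty], axis=1)
    d = np.linalg.norm(mapped - np.asarray(ref_pts, dtype=float), axis=1)
    # side="right": a distance equal to an edge goes to the bin starting at that edge.
    bin_idx = np.searchsorted(INNER_EDGES, d, side="right")
    return np.bincount(bin_idx, minlength=len(INNER_EDGES) + 1)
```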
Table 2 The distribution of the distances (in pixels) between matched keypoints: number of keypoint mappings per distance bin
Dataset | Method | [0–2] | [2–5] | [5–10] | [10–20] | >20
---|---|---|---|---|---|---
EOIR | Proposed | 234 | 102 | 54 | 244 | 0
EOIR | SIFT | 30 | 7 | 9 | 6 | 187
EOIR | ISS | 31 | 16 | 5 | 13 | 295
EOIR | ORB | 328 | 87 | 50 | 122 | 4616
EOIR | FREAK | 11 | 11 | 5 | 10 | 358
Visible_nir | Proposed | 5732 | 0 | 0 | 0 | 0
Visible_nir | SIFT | 4204 | 44 | 1 | 0 | 21
Visible_nir | ISS | 1879 | 27 | 8 | 7 | 106
Visible_nir | ORB | 12,946 | 563 | 3 | 0 | 3
Visible_nir | FREAK | 9 | 2 | 4 | 11 | 189
Country | Proposed | 875 | 35 | 9 | 2 | 0
Country | SIFT | 379 | 87 | 47 | 14 | 224
Country | ISS | 213 | 54 | 18 | 12 | 854
Country | ORB | 195 | 95 | 59 | 56 | 6745
Country | FREAK | 12 | 10 | 15 | 35 | 9517
Field | Proposed | 2480 | 99 | 40 | 60 | 0
Field | SIFT | 691 | 116 | 28 | 15 | 183
Field | ISS | 246 | 73 | 37 | 10 | 747
Field | ORB | 295 | 168 | 65 | 49 | 2805
Field | FREAK | 11 | 9 | 11 | 25 | 3946
Forest | Proposed | 2814 | 24 | 0 | 0 | 0
Forest | SIFT | 6045 | 1829 | 328 | 9 | 199
Forest | ISS | 6 | 2 | 0 | 4 | 1229
Forest | ORB | 2316 | 763 | 22 | 6 | 373
Forest | FREAK | 0 | 0 | 2 | 5 | 876
Indoor | Proposed | 4803 | 0 | 0 | 0 | 0
Indoor | SIFT | 464 | 23 | 7 | 13 | 108
Indoor | ISS | 299 | 26 | 16 | 7 | 155
Indoor | ORB | 391 | 57 | 17 | 19 | 598
Indoor | FREAK | 69 | 36 | 36 | 103 | 5351
Mountain | Proposed | 4676 | 20 | 1 | 38 | 0
Mountain | SIFT | 742 | 258 | 31 | 14 | 65
Mountain | ISS | 179 | 98 | 25 | 5 | 170
Mountain | ORB | 269 | 186 | 17 | 3 | 359
Mountain | FREAK | 97 | 13 | 28 | 58 | 5112
Oldbuilding | Proposed | 6783 | 31 | 0 | 0 | 0
Oldbuilding | SIFT | 696 | 81 | 17 | 0 | 11
Oldbuilding | ISS | 281 | 32 | 10 | 2 | 46
Oldbuilding | ORB | 304 | 103 | 30 | 1 | 27
Oldbuilding | FREAK | 43 | 24 | 20 | 65 | 6779
Street | Proposed | 3369 | 70 | 5 | 0 | 0
Street | SIFT | 356 | 94 | 29 | 1 | 13
Street | ISS | 179 | 54 | 18 | 9 | 273
Street | ORB | 209 | 144 | 49 | 10 | 731
Street | FREAK | 37 | 6 | 20 | 82 | 6193
Urban | Proposed | 12,685 | 0 | 0 | 0 | 0
Urban | SIFT | 735 | 8 | 4 | 3 | 16
Urban | ISS | 389 | 14 | 0 | 7 | 67
Urban | ORB | 366 | 17 | 10 | 0 | 8
Urban | FREAK | 111 | 34 | 35 | 115 | 9629
Water | Proposed | 1714 | 30 | 6 | 66 | 0
Water | SIFT | 425 | 44 | 17 | 11 | 101
Water | ISS | 263 | 35 | 11 | 16 | 515
Water | ORB | 300 | 122 | 69 | 78 | 2354
Water | FREAK | 32 | 20 | 26 | 75 | 6831
One observation from Table 2 is that dataset EOIR is the most challenging: in most cases, all methods, including the proposed one, perform worse on EOIR than on the other datasets. Take ORB as an example and consider the number of keypoint mappings with a distance greater than 20 (the worst case). ORB yields 4616 such mappings on EOIR, only 3 on Visible_nir, 6745 on Country, 2805 on Field, 373 on Forest, 598 on Indoor, 359 on Mountain, 27 on Oldbuilding, 731 on Street, 8 on Urban, and 2354 on Water; only Country exceeds EOIR. As mentioned above, the reason is that the multimodality of the image pairs in EOIR is greater than in the other datasets, and hence the matching performance of descriptors decreases on EOIR.
Another observation is that ISS does not perform better than SIFT, although ISS is designed to adapt the SIFT descriptor to multispectral images. On dataset EOIR, SIFT performs slightly better than or comparably to ISS, and on the other datasets, where the multimodality is weaker, SIFT performs evidently better than ISS. The mechanism underlying this phenomenon needs further investigation on more types of multispectral images.
Table 2 clearly shows that the matching performance of descriptors decreases as the multimodality of the image data increases. On single-mode images, SIFT and ORB perform fairly well for integer-pixel alignment. On multispectral images, the common information around keypoints may not be enough to establish keypoint mappings robustly. Either more common and distinctive information near keypoints should be encoded to enhance the matching ability of descriptors, or complementary information from regions farther from the keypoints is needed to build keypoint mappings correctly. This is left for future work.
5 Conclusions
This work presents a registration approach for multispectral images. A similarity transformation is used to account for the misalignment between two images, and global information over the entire images is introduced to help evaluate the quality of keypoint mappings. Compared with methods that use descriptors alone to build keypoint mappings, the proposed approach effectively compensates for the insufficient repeatability and distinctiveness of descriptors and hence provides more correct mappings.
Several future research directions could further improve the performance. The matching ability of descriptors could be improved by analyzing, extracting, and encoding the common information between multispectral images; although this is not the focus of this work, better matching would improve the overall registration accuracy. Another direction concerns the similarity metric used to evaluate the quality of keypoint mappings: it carries the global information of the entire images, and a more effective characterization of it is expected to yield more precise keypoint mappings.
Declarations
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. NSFC-61170176), the Fund for the Doctoral Program of Higher Education of China (Grant No. 20120005110002), the Fund for Beijing University of Posts and Telecommunications (Grant Nos. 2013XD-04 and 2013XZ10), and the Fund for National Great Science Specific Project (Grant No. 2014ZX03002002-004).
References
- LG Brown, A survey of image registration techniques. ACM Comput. Surv. 24(4), 325–376 (1992).
- P Viola, WM Wells III, in Proceedings of the Fifth International Conference on Computer Vision. Alignment by Maximization of Mutual Information (IEEE, Cambridge, MA, 1995), pp. 16–23.
- MP Heinrich, M Jenkinson, M Bhushan, T Matin, FV Gleeson, SM Brady, JA Schnabel, MIND: Modality independent neighbourhood descriptor for multi-modal deformable registration. Med. Image Anal. 16(7), 1423–1435 (2012).
- S Chen, Q Guo, H Leung, É Bossé, A maximum likelihood approach to joint image registration and fusion. IEEE Trans. Image Process. 20(5), 1363–1372 (2011).
- P Thévenaz, M Unser, Optimization of mutual information for multiresolution image registration. IEEE Trans. Image Process. 9(12), 2083–2099 (2000).
- G Tzimiropoulos, V Argyriou, S Zafeiriou, T Stathaki, Robust FFT-based scale-invariant image registration with image gradients. IEEE Trans. Pattern Anal. Mach. Intell. 32(10), 1899–1906 (2010).
- W Pan, K Qin, Y Chen, An adaptable-multilayer fractional Fourier transform approach for image registration. IEEE Trans. Pattern Anal. Mach. Intell. 31(3), 400–413 (2009).
- MM Fouad, RM Dansereau, AD Whitehead, Image registration under illumination variations using region-based confidence weighted M-estimators. IEEE Trans. Image Process. 21(3), 1046–1060 (2012).
- D Zosso, X Bresson, J-P Thiran, Geodesic active fields: a geometric framework for image registration. IEEE Trans. Image Process. 20(5), 1300–1312 (2011).
- C Xing, P Qiu, Intensity-based image registration by nonparametric local smoothing. IEEE Trans. Pattern Anal. Mach. Intell. 33(10), 2081–2092 (2011).
- X Liu, Z Lei, Q Yu, X Zhang, Y Shang, W Hou, Multi-modal image matching based on local frequency information. EURASIP J. Adv. Signal Process. 2013(3), 1–11 (2013).
- DG Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004).
- H Bay, A Ess, T Tuytelaars, L Van Gool, Speeded up robust features (SURF). Comput. Vis. Image Underst. 110(3), 346–359 (2008).
- A Alahi, R Ortiz, P Vandergheynst, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition. FREAK: Fast Retina Keypoint (IEEE, Providence, RI, 2012), pp. 510–517.
- M Ambai, Y Yoshida, in IEEE International Conference on Computer Vision. CARD: Compact And Real-time Descriptors (IEEE, Barcelona, 2011), pp. 97–104.
- J Chen, J Tian, N Lee, J Zheng, RT Smith, AF Laine, A partial intensity invariant feature descriptor for multimodal retinal image registration. IEEE Trans. Biomed. Eng. 57(7), 1707–1718 (2010).
- S Saleem, R Sablatnig, A robust SIFT descriptor for multispectral images. IEEE Signal Process. Lett. 21(4), 400–403 (2014).
- W Bingjian, L Quan, L Yapeng, L Fan, B Liping, L Gang, L Rui, Image registration method for multimodal images. Appl. Opt. 50(13), 1861–1867 (2011).
- Y Li, R Stevenson, Incorporating global information in feature-based multimodal image registration. J. Electron. Imaging 23(2), 023013-1–023013-14 (2014).
- KM Simonson, SM Drescher Jr, FR Tanner, A statistics-based approach to binary image registration with uncertainty analysis. IEEE Trans. Pattern Anal. Mach. Intell. 29(1), 112–125 (2007).
- G Yang, CV Stewart, M Sofka, C-L Tsai, Registration of challenging image pairs: initialization, estimation, and decision. IEEE Trans. Pattern Anal. Mach. Intell. 29(11), 1973–1989 (2007).
- S Gauglitz, M Turk, T Höllerer, in British Machine Vision Conference. Improving Keypoint Orientation Assignment (BMVC Press, University of Dundee, 2011).
- Y Wu, W Ma, M Gong, A novel point-matching algorithm based on fast sample consensus for image registration. IEEE Geosci. Remote Sens. Lett. 12(1), 43–47 (2015).
- MT Hossain, SW Teng, G Lu, in International Conference on Digital Image Computing: Techniques and Applications (DICTA). Achieving High Multi-Modal Registration Performance Using Simplified Hough-Transform with Improved Symmetric-SIFT (IEEE, Fremantle, WA, 2012), pp. 1–7.
- MA Fischler, RC Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981).
- C Aguilera, F Barrera, F Lumbreras, AD Sappa, R Toledo, Multispectral image feature points. Sensors 12, 12661–12672 (2012).
- MT Hossain, G Lv, SW Teng, G Lu, M Lackmann, in International Conference on Digital Image Computing: Techniques and Applications (DICTA). Improved Symmetric-SIFT for Multi-modal Image Registration (IEEE, Noosa, QLD, 2011), pp. 197–202.
- Y Wu, W Ma, M Gong, L Su, L Jiao, A novel point-matching algorithm based on fast sample consensus for image registration. IEEE Geosci. Remote Sens. Lett. 12(1), 43–47 (2015).
- E Rublee, V Rabaud, K Konolige, G Bradski, in IEEE International Conference on Computer Vision (ICCV). ORB: An Efficient Alternative to SIFT or SURF (IEEE, Barcelona, 2011), pp. 2564–2571.
- Z Ghassabi, J Shanbehzadeh, A Sedaghat, E Fatemizadeh, An efficient approach for robust multimodal retinal image registration based on UR-SIFT features and PIIFD descriptors. EURASIP J. Image Video Process. 2013(25), 1–15 (2013).
- A Chakrabarti, T Zickler, in IEEE Conference on Computer Vision and Pattern Recognition. Statistics of Real-World Hyperspectral Images (IEEE, Providence, RI, 2011), pp. 193–200.
- M Brown, S Süsstrunk, in IEEE Conference on Computer Vision and Pattern Recognition. Multi-Spectral SIFT for Scene Category Recognition (IEEE, Providence, RI, 2011), pp. 177–184.
- Y Ke, R Sukthankar, in 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. PCA-SIFT: A More Distinctive Representation for Local Image Descriptors (IEEE, Washington, DC, 2004), pp. 506–513.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.