A novel information transferring approach for the classification of remote sensing images
- Jianqiang Gao^{1},
- Lizhong Xu^{1},
- Jie Shen^{1},
- Fengchen Huang^{1} and
- Feng Xu^{1}
https://doi.org/10.1186/s13634-015-0223-0
© Gao et al.; licensee Springer. 2015
Received: 30 April 2014
Accepted: 2 April 2015
Published: 24 April 2015
Abstract
Traditional remote sensing image classification methods focus on using a large amount of labeled target data to train an efficient classification model. However, these approaches are generally based on the target data alone and do not consider the abundant auxiliary data or the additional information they carry. If valuable information from auxiliary data could be successfully transferred to the target data, the performance of the classification model would improve. Moreover, from the perspective of practical application, this valuable auxiliary information should be fully used. Therefore, based on the idea of transfer learning, we propose a novel information transferring approach to improve remote sensing image classification performance. The rationale of this approach is as follows: first, the information of the same areas associated with each pixel is modeled as the intra-class set, and the information of different areas associated with each pixel is modeled as the inter-class set; then, the texture feature information of each area obtained from the auxiliary data is transferred to the target data set such that the inter-class set is separated and the intra-class set is gathered as far as possible. Experiments on the Pavia University and Hohai University data sets show that the proposed approach is effective and feasible.
Keywords
1 Introduction
Remote sensing image classification is a complex process that may be affected by many factors, such as the availability of high-quality images, the choice of a proper classification method, and the analytical ability of the scientists involved. For a particular problem, it is often difficult to identify the best classifier, because there is no guideline for selection and suitable classification approaches may not be at hand. Therefore, researchers have proposed many algorithms for remote sensing image classification. In [1], the authors built a textural information model that exploits spatial information: a wavelet-based multi-scale strategy characterizes local texture while taking the physical nature of the data into account, the extracted textural information is used as a new feature to build a texture kernel, and the final kernel is the weighted sum of a kernel built on the spectral information and the texture kernel. In [2], the authors proposed applying kernels on a segmentation graph. Fauvel et al. [3] proposed a spatial-spectral kernel-based approach in which spatial and spectral information are jointly used for classification. Gao et al. [4] proposed a kernel-based block matrix decomposition approach for the classification of remotely sensed images. Tuia et al. [5] used active learning to adapt remote sensing image classifiers; their goal is to select training pixels intelligently so as to minimize their number while maximizing their information content, and two strategies, based on uncertainty and on clustering of the data space, are considered for active selection. Dos Santos et al. [6] proposed a method for interactive classification of remote sensing images based on multiscale segmentation, aiming to improve the selection of training samples by using features from the most appropriate scales of representation.
They use a boosting-based active learning strategy to select regions at various scales for the user's relevance feedback. However, these approaches do not take auxiliary data into account in the classification model. In this paper, we aim to transfer texture feature information from auxiliary data to the target data in order to improve the classification performance of remote sensing images.
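The weighted-sum kernel described for [1] can be sketched as follows. This is a minimal illustration, not the implementation of [1]: it assumes Gaussian (RBF) kernels on both the spectral and the texture features, and the mixing weight `mu` and bandwidths `gamma_s`, `gamma_t` are hypothetical parameters.

```python
import math


def rbf(u, v, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def composite_kernel(spectral, texture, mu=0.5, gamma_s=1.0, gamma_t=1.0):
    """Weighted sum mu * K_spectral + (1 - mu) * K_texture over all sample pairs.

    spectral[i] and texture[i] are the spectral and texture feature vectors of
    sample i. mu, gamma_s and gamma_t are hypothetical tuning parameters.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    n = len(spectral)
    return [[mu * rbf(spectral[i], spectral[j], gamma_s)
             + (1.0 - mu) * rbf(texture[i], texture[j], gamma_t)
             for j in range(n)]
            for i in range(n)]


# Two samples: 2-band spectral vectors and 1-D texture descriptors.
K = composite_kernel([[0.0, 0.0], [1.0, 0.0]], [[0.0], [2.0]], mu=0.5)
```

Because a nonnegative combination of valid kernels is itself a valid kernel, a matrix built this way can be handed to any kernel classifier such as an SVM; `mu` controls how much the texture cue contributes relative to the spectral one.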
In the traditional classification learning framework, a classification model is first trained on labeled training data, and the learned model is then used to classify a test data set. Under such a framework, the learning method therefore relies on the availability of a large amount of labeled data. In practice, high-quality labeled data are often hard to obtain, especially for learning tasks in a new region: labeling data in a new region requires much human labor and is time-consuming, as discussed in [5,6]. Fortunately, some auxiliary data, such as texture information, are easy to obtain. It is therefore natural to ask how to make full use of the valuable texture information in auxiliary data to improve classification performance.
Recently, transfer learning [7] has become a popular machine learning approach that uses auxiliary data for learning. Transfer learning is concerned with adapting knowledge acquired in a source domain to solve problems in a different but related target domain [8]. Traditional machine learning models generally assume that previously collected training samples share the same feature space and distribution as the new data samples encountered during operation [9]. In many real-world cases, however, this assumption does not hold. In non-stationary environments in particular, the training data may well follow a different distribution from the data encountered during operation. For example, in communication channels, discrete signals generated from a specific source sequence can be corrupted by Gaussian noise during transmission, so the received signals may deviate from the original sequence [10]. In such cases, traditional machine learning models may perform poorly on new samples from the target domain, and transferring and adapting knowledge learned in one domain to a related but different domain can greatly improve their robustness. Moreover, training an effective machine learning model normally requires a large set of samples from the particular task [11], which may not be available in the target domain. The main principle of transfer learning is that, even though the data distributions in the source and target domains differ, some common knowledge shared by both domains can be adapted for learning [12].
Many methods have been proposed to transfer information or knowledge from auxiliary data. In [13], the authors proposed the TrAdaBoost transfer learning framework, which builds a high-quality classification model for the target domain from a small amount of labeled data together with auxiliary data. In [14], the authors proposed MultiSource-TrAdaBoost, which extends the TrAdaBoost framework to multiple sources. In [15], the authors proposed a matrix factorization framework that builds two mapping matrices, one for the training images and one for the auxiliary text data. Qi et al. [16] introduced a correlative principle, based on co-occurrence data, to transfer knowledge from text to images. The authors of [17] use an auxiliary data set to construct pseudo text for each target image and then, by exploiting the semantic structure of the pseudo text data, map the visual features to a semantic space that respects the text structure. In general, these methods transfer information from a large amount of auxiliary data to train a more effective model for the target data. In this paper, we use the texture feature information of an auxiliary data set to build the similarity matrices for the target data set; then, by exploiting the texture information structure encoded in these matrices, the valuable features are mapped to the spectral space and the textural space. Finally, the original spectral information is combined with the texture information to improve the performance of the classification model. To address the scale sensitivity and high time cost of SVM, Zhang et al. [22] proposed the potential support vector machine (PSVM) algorithm, which uses a novel objective function to overcome the problem of scale sensitivity in SVM.
The remainder of this paper is organized as follows. Section 2 briefly reviews the relevant background. In Section 3, the derivation of the proposed method is described in detail. The effectiveness of the proposed method is demonstrated in Section 4 by experiments on remote sensing images. Finally, Section 5 concludes this paper.
2 Relevant knowledge
2.1 Transferring knowledge of feature representations
An iterative algorithm for solving the optimization problem (2) was given in [18].
2.2 Fisher linear discriminant analysis (FLDA)
The main goal of FLDA is to perform dimensionality reduction while preserving as much class-discriminatory information as possible. Linear discriminant analysis seeks the optimal transformation matrix such that the class structure of the original high-dimensional space is preserved in the low-dimensional space. In hyperspectral remote sensing image classification, however, the dimension of the feature vectors is generally very high relative to the number of feature vectors (the small sample size problem). In this subsection, we briefly review the two-dimensional Fisher linear discriminant analysis (2DFLDA) method proposed by Kong et al. [19] to handle this dimensionality reduction problem. The main content can be summarized as follows:
where \(S_{b}\) and \(S_{w}\) are the inter-class and intra-class scatter matrices, respectively: \(S_{b}=\sum_{i=1}^{c}(m_{i}-m_{0})^{T}(m_{i}-m_{0})\) and \(S_{w}=\sum_{i=1}^{c}\sum_{j=1}^{N_{i}}(A_{j}^{i}-m_{i})^{T}(A_{j}^{i}-m_{i})\). Here \(A_{j}^{i}\) is the jth image (sample matrix) of class i, \(N_{i}\) is the number of samples in class i, \(m_{i}\) is the mean image of class i, and \(m_{0}=\frac{1}{c}\sum_{i=1}^{c}m_{i}\) is the mean of the c class-mean images, which coincides with the global mean image when all classes contain the same number of samples.
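To make the two scatter matrices concrete, the following minimal pure-Python sketch evaluates \(S_{b}\) and \(S_{w}\) exactly as defined above (unweighted between-class sum, \(m_{0}\) taken as the mean of the class means) for small matrices stored as lists of rows. The helper names are hypothetical and the toy 1×2 "images" are illustrative only.

```python
def mat_mean(mats):
    """Element-wise mean of equally sized matrices (lists of rows)."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[r][c] for m in mats) / len(mats) for c in range(cols)]
            for r in range(rows)]


def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gram(d):
    """Return d^T d for a matrix d given as a list of rows."""
    cols = len(d[0])
    return [[sum(row[p] * row[q] for row in d) for q in range(cols)]
            for p in range(cols)]


def scatter_matrices(classes):
    """classes[i] is the list of image matrices A_j^i belonging to class i.

    Returns (S_b, S_w), both n x n where n is the number of image columns.
    """
    class_means = [mat_mean(images) for images in classes]   # m_i
    m0 = mat_mean(class_means)                                # mean of class means
    n = len(m0[0])
    s_b = [[0.0] * n for _ in range(n)]
    s_w = [[0.0] * n for _ in range(n)]
    for images, m_i in zip(classes, class_means):
        s_b = mat_add(s_b, gram(mat_sub(m_i, m0)))            # (m_i - m_0)^T (m_i - m_0)
        for a in images:
            s_w = mat_add(s_w, gram(mat_sub(a, m_i)))         # (A_j^i - m_i)^T (A_j^i - m_i)
    return s_b, s_w


classes = [
    [[[1.0, 0.0]], [[3.0, 0.0]]],   # class 1: two 1x2 "images"
    [[[0.0, 2.0]], [[0.0, 4.0]]],   # class 2: two 1x2 "images"
]
s_b, s_w = scatter_matrices(classes)
```

In 2DFLDA the projection directions are then typically obtained as the leading generalized eigenvectors of \(S_{b}w=\lambda S_{w}w\); that step is omitted here.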
3 Learning for information transferring
3.1 Notations
In this paper, we consider two data sets. One is the target data set (i.e., the original image), which includes only spectral information. The other is the auxiliary data set, which consists of texture information (see Figure 1). Both data sets include c classes. Let \(\mathbb{R}^{k}\) and \(\mathbb{R}^{m}\) be the spectral and texture feature spaces, respectively. Without loss of generality, we use \(S^{(t)}\) and \(S^{(a)}\) to denote the target and auxiliary data sets, respectively. Denote the feature matrix of the target data set as \(\textbf{X}^{(t)}\in\mathbb{R}^{k\times n^{(t)}}\), the spectral feature matrix of the auxiliary data set as \(\textbf{X}^{(a)}\in\mathbb{R}^{k\times n^{(a)}}\), and the texture feature matrix of the auxiliary data set as \(\textbf{T}^{(a)}\in\mathbb{R}^{m\times n^{(a)}}\). For the target data set, we assume that each sample corresponds to particular auxiliary information. The target data set \(S^{(t)}\) is represented as in Equation 5:
Equation 7 expresses that \(\textbf{x}_{i}^{(t)}\) and \(\textbf{x}_{j}^{(t)}\) belong to the same class in the target data set (the intra-class case), whereas Equation 8 expresses that they belong to different classes (the inter-class case).
3.2 Construct the similarity matrices of \(S^{(t)}\) and \(S^{(a)}\)
Samples from the same region (or field) generally share similar spectral and texture information. Therefore, the similarity matrices that carry this information for the target data set are constructed from the similarities between samples in \(S^{(t)}\) and \(S^{(a)}\). For a sample \(x_{i}^{(t)}\) in \(S^{(t)}\), the most similar sample in \(S^{(a)}\) is defined by Equation 9:
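Equation 9 is not reproduced in this version of the text. The sketch below assumes that the "most similar" auxiliary sample is the one with the smallest Euclidean distance in the shared k-dimensional spectral space (both \(\textbf{X}^{(t)}\) and \(\textbf{X}^{(a)}\) have k rows), and that the matched texture vector is simply concatenated to the spectral vector to form a fused feature. Both choices, and the function names, are illustrative assumptions rather than the authors' exact procedure. Samples are stored as lists of vectors, one per column of the paper's matrices.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def match_auxiliary(target_spectra, aux_spectra, aux_textures):
    """Pair every target sample with its nearest auxiliary sample (assumed Eq. 9).

    target_spectra: k-dim spectral vectors x_i^(t)
    aux_spectra:    k-dim spectral vectors x_j^(a)
    aux_textures:   m-dim texture vectors t_j^(a), aligned with aux_spectra
    Returns (indices, fused): the index j* chosen for each target sample and
    the concatenated vector [x_i^(t), t_{j*}^(a)] (hypothetical fusion rule).
    """
    indices, fused = [], []
    for x in target_spectra:
        j_star = min(range(len(aux_spectra)),
                     key=lambda j: euclidean(x, aux_spectra[j]))
        indices.append(j_star)
        fused.append(list(x) + list(aux_textures[j_star]))
    return indices, fused


idx, fused = match_auxiliary(
    target_spectra=[[0.1, 0.2], [0.9, 0.8]],
    aux_spectra=[[0.0, 0.0], [1.0, 1.0]],
    aux_textures=[[5.0], [7.0]],
)
```

The brute-force search costs O(n^(t) · n^(a) · k); for large images a spatial index (for example a k-d tree) would usually replace the inner `min`.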
In Equations 10 and 11, \(w_{ij}^{(w)}\) and \(w_{ij}^{(b)}\) are the elements of \(\textbf{W}_{w}\) and \(\textbf{W}_{b}\), respectively, and d(·,·) is the Euclidean distance between two texture feature vectors. To simplify the computation of \(\textbf{W}_{w}\) and \(\textbf{W}_{b}\), we approximate them as follows:
where \(\overline{S}_{\widehat{\textbf{f}}_{i}}^{(t)}\) is the mean of the ith row of \(S_{\widehat{\textbf{f}}}^{(t)}\) and \(\overline{S}_{\textbf{f}_{j}}^{(a)}\) is the mean of the jth row of \(S_{\textbf{f}}^{(a)}\).
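Equations 10 and 11 and the subsequent approximation are not reproduced here, so the following sketch only illustrates the general structure described in the text: pairs of target samples in the same class (Equation 7) contribute to \(\textbf{W}_{w}\), pairs in different classes (Equation 8) contribute to \(\textbf{W}_{b}\), and each weight decreases with the Euclidean distance d(·,·) between the texture vectors attached to the two samples. The heat-kernel form exp(-d²/σ), the bandwidth `sigma`, and the zero diagonal are assumptions, not the paper's formulas, and the row-mean approximation is not implemented.

```python
import math


def class_weight_matrices(textures, labels, sigma=1.0):
    """Build intra-class (W_w) and inter-class (W_b) weight matrices.

    textures[i] is the texture vector attached to target sample i and
    labels[i] its class. The heat-kernel weight exp(-d^2 / sigma) is an
    assumed stand-in for the paper's Equations 10 and 11.
    """
    n = len(textures)
    w_w = [[0.0] * n for _ in range(n)]
    w_b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops (assumption)
            d2 = sum((a - b) ** 2 for a, b in zip(textures[i], textures[j]))
            weight = math.exp(-d2 / sigma)
            if labels[i] == labels[j]:
                w_w[i][j] = weight   # pair in the intra-class set (Eq. 7)
            else:
                w_b[i][j] = weight   # pair in the inter-class set (Eq. 8)
    return w_w, w_b


w_w, w_b = class_weight_matrices([[0.0], [1.0], [3.0]], labels=[0, 0, 1])
```

Both matrices come out symmetric and nonnegative, which is what the normalized Laplacian in Section 3.3 expects.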
3.3 Information transferring of auxiliary data
where \(\textbf{P}_{b}=\textbf{I}-\textbf{D}_{b}^{-\frac{1}{2}}\textbf{W}_{b}\textbf{D}_{b}^{-\frac{1}{2}}\) is the normalized Laplacian matrix and \(\textbf{D}_{b}=\mathrm{diag}(\textbf{W}_{b}\cdot\textbf{1})\) is the diagonal degree matrix whose diagonal elements are \(\textbf{D}_{b}^{ii}=\sum_{j=1}^{n^{(t)}}w_{ij}^{(b)}\).
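The normalized Laplacian \(\textbf{P}_{b}\) can be computed directly from any symmetric, nonnegative weight matrix \(\textbf{W}_{b}\). The sketch below follows the definition above; setting \(\textbf{D}_{b}^{-1/2}\) to zero for rows with zero degree (where it is undefined) is an assumption added so the function never divides by zero.

```python
import math


def normalized_laplacian(w):
    """Return P = I - D^{-1/2} W D^{-1/2}, with D = diag(W 1).

    Zero-degree rows get D^{-1/2} = 0 here (assumption), so their row of P
    reduces to the corresponding identity row.
    """
    n = len(w)
    degree = [sum(row) for row in w]                      # D_b^{ii}
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * w[i][j] * inv_sqrt[j]
             for j in range(n)]
            for i in range(n)]


W = [[0.0, 2.0, 0.0],
     [2.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]
P = normalized_laplacian(W)

# Sanity check: for a graph without isolated nodes, D^{1/2} 1 lies in the
# null space of P, so this residual should be numerically zero.
sqrt_deg = [math.sqrt(sum(row)) for row in W]
residual = [sum(P[i][j] * sqrt_deg[j] for j in range(3)) for i in range(3)]
```

The null-space check is a cheap way to catch indexing or scaling mistakes before the matrix is passed to an optimizer.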
The above optimization is a convex problem, so it can be solved with existing convex optimization tools such as the MATLAB functions fminunc and fmincon [20] or SeDuMi [21]. The overall procedure is summarized in Algorithm 1.
4 Experimental results and analysis
4.1 Pavia University data set (PUD)
Information classes and training and test samples for PUD
Class no. | Name | Train | Test | Auxiliary data |
---|---|---|---|---|
1 | Asphalt | 548 | 6304 | 300 |
2 | Meadow | 540 | 18146 | 300 |
3 | Gravel | 392 | 1815 | 300 |
4 | Tree | 524 | 2912 | 300 |
5 | Metal sheet | 265 | 1113 | 300 |
6 | Bare soil | 532 | 4572 | 300 |
7 | Bitumen | 375 | 981 | 300 |
8 | Bricks | 514 | 3364 | 300 |
9 | Shadow | 231 | 795 | 300 |
– | Total | 3921 | 40002 | 2700 |
Confusion matrices, κ and time (s) of PUD
Class no. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | UA (%) |
---|---|---|---|---|---|---|---|---|---|---|
Spectral space | ||||||||||
1 | 5,244 | 37 | 130 | 22 | 19 | 16 | 368 | 460 | 8 | 83.19 |
2 | 0 | 12,230 | 0 | 2,223 | 0 | 3,675 | 0 | 18 | 0 | 67.40 |
3 | 29 | 8 | 1,194 | 0 | 0 | 3 | 1 | 580 | 0 | 65.79 |
4 | 0 | 36 | 0 | 2,858 | 1 | 17 | 0 | 0 | 0 | 98.15 |
5 | 0 | 1 | 2 | 2 | 1,105 | 0 | 0 | 0 | 3 | 99.28 |
6 | 5 | 188 | 0 | 40 | 99 | 4,216 | 0 | 24 | 0 | 92.21 |
7 | 96 | 0 | 1 | 0 | 0 | 0 | 872 | 12 | 0 | 88.89 |
8 | 31 | 19 | 185 | 3 | 0 | 31 | 8 | 3,087 | 0 | 91.77 |
9 | 21 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 767 | 96.48 |
PA(%) | 96.65 | 97.69 | 78.60 | 55.52 | 90.28 | 52.98 | 69.82 | 73.83 | 98.59 | |
OA(%) = 78.93 | ||||||||||
κ = 0.7340 | ||||||||||
t=1.26s | ||||||||||
Fusion space | ||||||||||
1 | 5,286 | 5 | 4 | 41 | 3 | 1 | 403 | 547 | 14 | 83.85 |
2 | 0 | 18,145 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 99.99 |
3 | 0 | 0 | 1,782 | 0 | 0 | 33 | 0 | 0 | 0 | 98.18 |
4 | 0 | 0 | 0 | 2,911 | 1 | 0 | 0 | 0 | 0 | 99.97 |
5 | 0 | 0 | 0 | 4 | 1,109 | 0 | 0 | 0 | 0 | 99.64 |
6 | 0 | 0 | 11 | 26 | 33 | 4,502 | 0 | 0 | 0 | 98.47 |
7 | 95 | 0 | 0 | 0 | 0 | 0 | 872 | 14 | 0 | 88.89 |
8 | 32 | 0 | 0 | 0 | 0 | 0 | 8 | 3,324 | 0 | 98.81 |
9 | 21 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 771 | 96.98 |
PA(%) | 97.28 | 99.97 | 99.17 | 97.59 | 96.77 | 99.25 | 67.97 | 85.49 | 98.22 | |
OA(%) = 96.75 | ||||||||||
κ = 0.9562 | ||||||||||
t=0.51s |
According to Table 2, the proposed approach gives better results in the fusion space than in the spectral space on PUD. The proposed method achieves a higher overall accuracy (OA) (96.75% versus 78.93%) and kappa value (0.9562 versus 0.7340) than the original spectral space, so it improves both the overall classification accuracy and the kappa value. It is also worth noting that the elapsed time of the proposed method (0.51 s) is less than that of the original method (1.26 s).
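The OA, κ and per-class accuracies in Table 2 can be recomputed from a confusion matrix. The sketch below does this for the PUD fusion-space matrix using the standard definitions OA = trace/N and κ = (p_o − p_e)/(1 − p_e), where p_e is the chance agreement obtained from the row and column totals. Rows are reference classes (each row sums to the test count in Table 1); the function name is hypothetical.

```python
def accuracy_metrics(cm):
    """cm[i][j]: number of test samples of reference class i assigned to class j.

    Returns (OA in %, kappa, row-wise accuracies in %, column-wise accuracies in %).
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    diag = [cm[i][i] for i in range(k)]
    p_o = sum(diag) / n                                        # observed agreement
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    row_acc = [100 * d / r for d, r in zip(diag, row_tot)]  # column "UA (%)" in the tables
    col_acc = [100 * d / c for d, c in zip(diag, col_tot)]  # row "PA(%)" in the tables
    return 100 * p_o, kappa, row_acc, col_acc


fusion_pud = [
    [5286, 5, 4, 41, 3, 1, 403, 547, 14],
    [0, 18145, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1782, 0, 0, 33, 0, 0, 0],
    [0, 0, 0, 2911, 1, 0, 0, 0, 0],
    [0, 0, 0, 4, 1109, 0, 0, 0, 0],
    [0, 0, 11, 26, 33, 4502, 0, 0, 0],
    [95, 0, 0, 0, 0, 0, 872, 14, 0],
    [32, 0, 0, 0, 0, 0, 8, 3324, 0],
    [21, 0, 0, 0, 0, 0, 0, 3, 771],
]
oa, kappa, row_acc, col_acc = accuracy_metrics(fusion_pud)
print(f"OA = {oa:.2f}%, kappa = {kappa:.4f}")  # prints: OA = 96.75%, kappa = 0.9562
```

Recomputing the summary statistics this way is a quick consistency check between a confusion matrix and the figures quoted in the text.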
4.2 Hohai University data set (HUD)
Information classes and training and test samples for HUD
Class no. | Name | Train | Test | Auxiliary data |
---|---|---|---|---|
1 | Road | 488 | 64,650 | 100 |
2 | Roof | 242 | 56,560 | 100 |
3 | Tree | 266 | 24,765 | 100 |
4 | Bare soil | 230 | 36,650 | 100 |
5 | Water | 182 | 36,600 | 100 |
6 | Shadow | 304 | 21,220 | 100 |
– | Total | 8,220 | 240,445 | 600 |
Confusion matrices, κ and time (s) of HUD
Class no. | 1 | 2 | 3 | 4 | 5 | 6 | UA (%) |
---|---|---|---|---|---|---|---|
Spectral space | |||||||
1 | 44,609 | 19,748 | 0 | 188 | 0 | 105 | 69.00 |
2 | 16,283 | 37,422 | 250 | 777 | 4 | 1,824 | 66.16 |
3 | 184 | 1,160 | 22,429 | 11 | 662 | 319 | 90.57 |
4 | 189 | 4,576 | 52 | 31,832 | 0 | 1 | 86.85 |
5 | 33 | 4 | 47 | 0 | 36,484 | 32 | 99.68 |
6 | 176 | 3,041 | 1,165 | 2 | 4,358 | 12,478 | 58.80 |
PA(%) | 72.57 | 56.74 | 93.68 | 97.02 | 87.90 | 84.55 | |
OA(%) = 77.05 | |||||||
κ = 0.7145 | |||||||
t=18s | |||||||
Fusion space | |||||||
1 | 64,634 | 16 | 0 | 0 | 0 | 0 | 99.98 |
2 | 0 | 56,560 | 0 | 0 | 0 | 0 | 100.0 |
3 | 0 | 638 | 22,843 | 0 | 662 | 622 | 92.24 |
4 | 0 | 1,250 | 0 | 35,400 | 0 | 0 | 96.59 |
5 | 0 | 33 | 46 | 0 | 36,485 | 36 | 99.69 |
6 | 0 | 267 | 1,457 | 0 | 4,366 | 15,130 | 71.30 |
PA(%) | 100.0 | 96.25 | 93.83 | 100.0 | 87.89 | 95.83 | |
OA(%) = 96.09 | |||||||
κ = 0.9515 | |||||||
t=8s |
According to Table 4, the OA obtained by the proposed approach (96.09%) is much higher than that obtained by the original method (77.05%). It is also worth noting that the elapsed time of the proposed algorithm (8 s) is less than that of the original algorithm (18 s). Moreover, the fusion space gives better results than the spectral space in terms of κ (0.9515 versus 0.7145) and of the per-class accuracies, which are higher for every class except Water, whose accuracies are essentially unchanged.
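As a worked check of the HUD figures, OA is the sum of the diagonal of each confusion matrix in Table 4 divided by the 240,445 test samples:

```latex
\mathrm{OA}_{\text{spectral}}
  = \frac{44609 + 37422 + 22429 + 31832 + 36484 + 12478}{240445}
  = \frac{185254}{240445} \approx 77.05\%,
\qquad
\mathrm{OA}_{\text{fusion}}
  = \frac{64634 + 56560 + 22843 + 35400 + 36485 + 15130}{240445}
  = \frac{231052}{240445} \approx 96.09\%.
```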
OA (%), κ and time (s) for SVM, PSVM, Mbsvd, Mbqrcp and proposed method with SVM
From Table 5, the proposed method outperforms the other approaches in terms of OA, κ, and running time. We attribute this improvement to the valuable texture information that is exploited in the classification process.
5 Conclusions
In this paper, we proposed an information transferring approach to enhance remote sensing image classification performance. The main idea of the proposed method is that the texture feature information of the auxiliary data set is transferred to the target data set, and the classification model is then trained using an SVM or KNN classifier. Experimental results on the PUD and HUD data sets show that the approach is feasible: in the fusion space, OA rises from 78.93% to 96.75% on PUD and from 77.05% to 96.09% on HUD.
More work remains to be done to further improve the classification results, for example, choosing a suitable classification method for a given remote sensing task. How to avoid negative transfer is another important open issue that is attracting increasing attention. Finally, how to determine the parameter λ and how to transfer other valuable information remain interesting open questions for this work.
Declarations
Acknowledgements
This work is supported partly by the National Natural Science Foundation of PR China (No. 61271386) and by the Graduates’ Research Innovation Program of Higher Education of Jiangsu Province of PR China (No. CXZZ13-0239), and the Industrialization Project of Universities in Jiangsu Province PR China (No. JH10-9).
Authors’ Affiliations
References
- G Mercier, F Girard-Ardhuin, Partially supervised oil-slick detection by SAR imagery using kernel expansion. IEEE Trans. Geosci. Remote Sensing. 44(10), 2839–2846 (2006).
- Z Harchaoui, F Bach, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Image classification with segmentation graph kernels, (2007), pp. 1–8.
- M Fauvel, J Chanussot, JA Benediktsson, A spatial-spectral kernel-based approach for the classification of remote-sensing images. Pattern Recognit. 45(1), 381–392 (2012).
- J Gao, L Xu, A Shi, F Huang, A kernel-based block matrix decomposition approach for the classification of remotely sensed images. Appl. Math. Comput. 228, 531–545 (2014).
- D Tuia, E Pasolli, WJ Emery, Using active learning to adapt remote sensing image classifiers. Remote Sensing Environ. 115(9), 2232–2242 (2011).
- JA Dos Santos, PH Gosselin, S Philipp-Foliguet, RDS Torres, AX Falcao, Interactive multiscale classification of high-resolution remote sensing images. IEEE J. Selected Topics Appl. Earth Observations Remote Sensing. 99, 1–15 (2013).
- SJ Pan, Q Yang, A survey on transfer learning. IEEE Trans. Knowledge Data Eng. 22(10), 1345–1359 (2010).
- G Boutsioukis, I Partalas, I Vlahavas, in Recent Advances in Reinforcement Learning, 9th European Workshop EWRL. Transfer learning in multi-agent reinforcement learning domains (Athens, Greece, 2011).
- B Kocer, A Arslan, Genetic transfer learning. Expert Syst. Appl. 37, 6997–7002 (2010).
- DI Ostry, Synthesis of accurate fractional Gaussian noise by filtering. IEEE Trans. Inf. Theory. 52(4), 1609–1623 (2006).
- Z Xu, S Sun, in Neural Information Processing, 19th International Conference ICONIP. Multi-source transfer learning with multiview Adaboost (Doha, Qatar, 2012).
- S Yang, M Lin, C Hou, C Zhang, Y Wu, A general framework for transfer sparse subspace learning. Neural Comput. Appl. 21(7), 1801–1817 (2012).
- Y Yao, G Doretto, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boosting for transfer learning with multiple sources, (2010), pp. 1855–1862.
- W Dai, Y Chen, G Xue, Q Yang, Y Yu, in Proceedings of Advances in Neural Information Processing Systems (NIPS). Translated learning: transfer learning across different feature spaces, (2008), pp. 353–360.
- Y Zhu, Y Chen, Z Lu, et al., in Special Track on AI and the Web, associated with The Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI). Heterogeneous transfer learning for image classification, (2011).
- GJ Qi, C Aggarwal, T Huang, in Proceedings of the 20th International Conference on World Wide Web. Towards semantic knowledge propagation from text corpus to web images (ACM, 2011), pp. 297–306.
- Y Wei, Y Zhao, Z Zhu, Y Xiao, in Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP). Knowledge transferring for image classification (IEEE, 2012), pp. 347–350.
- W Dai, Q Yang, G Xue, Y Yu, in Proceedings of the 25th International Conference on Machine Learning. Self-taught clustering, (2008), pp. 200–207.
- H Kong, EK Teoh, JG Wang, R Venkateswarlu, Two dimensional Fisher discriminant analysis: forget about small sample size problem. Proc. IEEE Int. Conf. Acoustics Speech Signal Process. 2, 761–764 (2005).
- MathWorks (2013). [Online]. Available: http://www.mathworks.com.
- JF Sturm, Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods Softw. 11(1–4), 625–653 (1999).
- R Zhang, J Ma, An improved SVM method P-SVM for classification of remotely sensed data. Int. J. Remote Sensing. 29(20), 6029–6036 (2008).
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.