Selecting pseudo supervision for unsupervised domain adaptive SAR target classification
EURASIP Journal on Advances in Signal Processing volume 2022, Article number: 84 (2022)
Abstract
In recent years, deep learning has brought significant progress to the problem of synthetic aperture radar (SAR) target classification. However, SAR image characteristics are highly sensitive to changes in imaging conditions. Inconsistency of imaging parameters (especially the depression angle) leads to a distribution shift between the training and test data and severely degrades classification performance. To address this problem, we propose an unsupervised domain adaptation method based on selective pseudo-labelling for SAR target classification. Our method directly trains a deep model on data from the target domain by generating pseudo-labels for the target samples. The key idea is to iteratively select valuable samples from the target domain and optimize the classifier. In each iteration, the breaking ties (BT) criterion is adopted to select the samples with the highest relative confidence scores. In addition, to avoid error accumulation in the iterative process, class confusion regularization is used to improve the accuracy of pseudo-labelling. Our method is compared with state-of-the-art methods, including supervised classification and unsupervised domain adaptation methods, on the moving and stationary target acquisition and recognition (MSTAR) dataset. The experimental results demonstrate that the proposed method achieves better classification performance, especially when the difference between the depression angles of the source and target domain images is large. Our method also shows its superiority under limited-sample conditions.
1 Introduction
Synthetic aperture radar (SAR) has been widely used in military and civilian applications since it can provide high-resolution imagery in all weather conditions, day and night. SAR target recognition is a typical image pattern recognition problem, which aims to classify target images into different classes or types. Traditionally, a target classification algorithm consists of three steps: preprocessing, feature extraction, and classification. Deep learning (DL) has been widely used in the computer vision field in recent years, and its automatic feature-extraction ability has attracted much attention in SAR automatic target recognition (ATR). Chen et al. [1] used a convolution layer instead of a fully connected layer to realize feature classification in a convolutional neural network (CNN), and proposed an all-convolutional neural network called A-ConvNets, which achieved a classification accuracy of 99.13% under the standard operating conditions of the MSTAR data. Wagner et al. [2] used a support-vector machine (SVM) to classify features extracted by a CNN and proposed the CNN-SVM structure. At present, most DL-based methods can achieve an accuracy of more than 99% under the standard operating conditions of the MSTAR dataset. Target classification using limited data is also a research hotspot in SAR ATR. Many techniques based on adjusting the model structure [3], transfer learning [4, 5], and data augmentation [6] have been adopted to mitigate the overfitting caused by limited training data.
With the development of machine learning in the field of SAR target recognition, existing methods can achieve good target recognition performance with both sufficient and insufficient training samples. However, most methods use only the labelled data in the source domain to train a classifier in a supervised way, which assumes that the training data and test data come from the same or similar distributions. When this assumption fails, a model learned on the training set struggles to perform well on the test set because of the inconsistent distributions. SAR images are highly sensitive to imaging conditions. Inconsistency of imaging parameters such as depression angle, azimuth angle, and radar band leads to a distribution mismatch between the training and test samples [7]. As shown in Fig. 1, depression angle differences cause visual discrepancies for the same vehicle target. In this case, a classifier trained on the source domain can hardly obtain accurate classification results on the test set. Therefore, overcoming the degradation of model generalization caused by different imaging conditions is an urgent problem.
Deep unsupervised domain adaptation techniques can be employed to address such domain shift problems. Existing deep unsupervised domain adaptation approaches mainly learn domain-invariant features from the labelled source domain data and the unlabelled target domain data by explicitly aligning the source and target data distributions [8,9,10,11,12,13,14,15]. Although impressive performance has been achieved in prior works, we argue that such methods, designed for optical image classification tasks, cannot achieve good results in SAR ATR tasks for two reasons. First, due to the high cost and difficulty of manually annotating SAR data, deep neural networks tend to overfit to a certain extent with limited labelled training data. Second, unsupervised alignment between the source and target domains is not effective enough to handle the large imaging discrepancy of SAR images. Furthermore, it has been shown that domain shift can be handled by a classifier trained on both the source and target domain data in a high-dimensional feature space, owing to the blessing of dimensionality [16]. Hence, it is beneficial to consider pseudo-labelled target domain data as auxiliary training data.
To improve the domain adaptation ability for SAR image targets under different imaging conditions, an unsupervised domain adaptation approach based on selective pseudo-labelling is proposed in this paper. The idea of pseudo-labelling is to first train an initial classification model on the source domain and then classify the samples in the target domain to generate pseudo-labels. In each iteration, the BT criterion is adopted to select the samples with the highest relative confidence scores. Next, the target samples with pseudo-labels are added to the training set for supervised training. This process is repeated until the generated pseudo-labels no longer change. On the one hand, the pseudo-labelled samples in the target domain expand the training set and inhibit overfitting. On the other hand, they directly adapt the model to the underlying feature distribution of the target domain. However, the pseudo-labelling strategy depends heavily on the initial pseudo-labels. If the initial pseudo-labels are wrongly assigned, errors easily accumulate and cause the model to fall into a local optimum. To avoid error accumulation in the iterative process, class confusion regularization is used to improve the accuracy of pseudo-labelling. We conduct experiments on SAR images with different depression angles to explore the cross-domain classification capability of our method. Based on the MSTAR dataset, six configurations of depression angles are considered to construct the source domain and target domain data for target classification. Our method is compared with state-of-the-art methods, including supervised classification and unsupervised domain adaptation methods.
Classification results and feature visualization based on t-distributed stochastic neighbor embedding (t-SNE) [17] demonstrate that the proposed method achieves better classification performance, especially when the difference between the depression angles of the source and target domain images is large. Our method also shows its superiority under few-sample conditions.
2 Unsupervised deep domain adaptation
In computer vision, a common assumption behind the image classification task is that the source domain and the target domain data have similar or identical support [18]. However, in many real-world scenarios, this assumption fails since there may be no overlapping features across the source domain and target domain. We introduce formal notation to define the problem. In supervised learning, given a source training sample set \(X^{s} = \{ (x_{i}^{s} )\}_{i = 1}^{N}\) and the corresponding label set \(Y^{s} = \{ (y_{i}^{s} )\}_{i = 1}^{N}\), the goal of a learner is usually to find a good hypothesis function \(h\) that captures the relation between \(X\) and \(Y\) as well as possible. This relationship often extends beyond the training instances to test instances \(X^{t} = \{ (x_{i}^{t} )\}_{i = 1}^{M}\) drawn from the same probability distribution. However, when the two distributions are not identical, a classifier trained on the labelled source domain suffers from a significant performance drop when directly applied to the target domain, as shown in Fig. 2. Specifically, the marginal distributions of the source and target domains are different, i.e., \(p(x^{s} ) \ne p(x^{t} )\), but the conditional probability distributions are identical, i.e., \(p(y^{s} \mid x^{s} ) = p(y^{t} \mid x^{t} )\).
From this perspective, unsupervised deep domain adaptation approaches learn domain-invariant features from labelled source data and unlabelled target data in an end-to-end framework. DDC (deep domain confusion) [8] aims to learn transferable features by matching the kernel embeddings of the two distributions in a reproducing kernel Hilbert space (RKHS). DAN (deep adaptation networks) [9] improves the DDC domain metric by replacing it with a multi-kernel variant. Deep Coral [10] utilizes the idea of Coral to learn features with the same second-order statistics. JAN (joint adaptation network) [11] aligns the features activated from multiple layers using the joint maximum mean discrepancy. These methods are based on different domain discrepancy metrics. Meanwhile, the adversarial training strategy of generative adversarial networks [19] is employed in domain adaptation to learn domain-invariant features. The key is to play a ‘min-max’ game over the features from the output activations of multiple layers. DaNN (domain adversarial neural network) [20] proposes the gradient reversal layer, through which features from the two domains are made as indistinguishable as possible during gradient backpropagation. The success of the gradient reversal layer lies in the fact that the ‘min-max’ process of the feature extraction network can be carried out in a single backpropagation pass. CDAN (conditional domain adversarial network) [12] improves the adversarial training by capturing the cross-covariance between the feature representations and the classifier’s predictions. More recently, MCD (maximum classifier discrepancy) [13] attempts to align the distributions of the source and target domains by utilizing the task-specific decision boundaries. Two classifiers are trained to maximize their discrepancy in order to detect target samples that are far from the support of the source data, and a feature generator learns to generate target features near the support to minimize the discrepancy.
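To make the kernel-embedding idea behind DDC, DAN and JAN concrete, the following minimal Python sketch computes a biased estimate of the squared maximum mean discrepancy (MMD) between two sets of feature vectors with a single Gaussian kernel. It is only an illustration under our own assumptions: the function names and the fixed `bandwidth` are hypothetical, and the cited methods apply the discrepancy to deep network activations rather than to raw vectors.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples.

    source, target: lists of equal-length feature vectors.
    """
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in a for y in b)
        return total / (len(a) * len(b))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))
```

The estimate is zero when the two samples coincide and grows as the target features move away from the source features, which is the quantity that MMD-based methods drive down during training.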
Note that unsupervised domain adaptation assumes access to labelled data only from the source domain and unlabelled data from the target domain; hence, it follows the transductive learning paradigm. One effective approach is to train networks with samples from both domains in a supervised manner. Pseudo-labelling has been used in many approaches to address the lack of labelled data, such as semi-supervised learning [21] and few-shot learning [16, 22]. Two categories, i.e., hard labelling [23, 24] and soft labelling [25], have been employed in many existing works. The main idea of hard pseudo-labelling is to assign a pseudo-label to each unlabelled instance and then train classifiers with the augmented labelled training set. Networks trained with hard labelling tend to get stuck in local optima, since hard labelling does not consider the confidence of each target sample. Soft labelling instead assigns to each target sample the conditional probability of each class [25]. To address the mislabelling issue, selective pseudo-labelling is another effective method, which selects a subset of the unlabelled samples in a certain order for pseudo-labelling. One key factor is the criterion used to select samples. An easy-to-hard strategy is employed in [26]: target samples whose similarity scores are higher than a certain threshold are selected for pseudo-labelling, and this threshold is updated after each iteration of learning so that more unlabelled target samples can be selected.
Although unsupervised deep domain adaptation has achieved impressive performance in optical image classification, it is rarely used for SAR target recognition. In this paper, we propose a novel domain adaptation approach based on selective pseudo-labelling to address the imaging discrepancy in SAR target classification. The main contributions of this article are as follows.

(1) SAR images are highly susceptible to imaging conditions, which degrades the recognition performance of deep learning models. We first investigate the model’s generalization ability across images captured at different depression angles, which has been ignored in previous studies on SAR-ATR.
(2) A selective pseudo-labelling strategy is introduced into the domain adaptation method. This strategy not only implicitly aligns features without moment matching or adversarial learning, but also boosts the model’s generalization under limited training data.
(3) To avoid error accumulation in pseudo-labelling, a class confusion loss is introduced into the iterative process as a regularization term, which gradually enhances the pseudo-labelling accuracy in each iteration.
(4) Our proposed method obtains an obvious improvement over the compared domain adaptation methods across data with different depression angles in the MSTAR dataset. Furthermore, our method is more suitable for SAR-ATR when training samples are limited.
3 Method
Classical domain adaptation methods usually learn domain-invariant features by directly or indirectly aligning the distributions of the source domain and target domain. However, for high-dimensional classification neural networks, it is possible to learn a classifier with good generalization performance when labels in both the source domain and target domain are available. Since unsupervised domain adaptation assumes that no labels are available for the target domain, we propose a SAR image domain adaptation approach based on selective pseudo-labelling with class confusion regularization (SPLCCR).
3.1 Overall architecture of SPLCCR
The diagram of SPLCCR is shown in Fig. 3. SPLCCR organizes two main steps in an iterative learning strategy to generate pseudo-labels for the target domain. In the first step, we train the classification network with the labelled source data and classify the unlabelled target samples. Then, a fraction of the target samples is assigned pseudo-labels according to the selection strategy. In the second step, the classification network is trained with two input streams. One is composed of the source domain samples and the selected target domain samples. The other is composed of the unselected, unlabelled target domain samples. The classifier becomes stronger after learning from the former training set in a supervised way. The second training set is also sent to the network to calculate the class confusion loss, which acts as a regularizer to alleviate the mislabelling issue. Specifically, we adopt ResNet18 [27] as the feature extraction network and a custom linear layer as the classifier.
The key idea of pseudo-labelling is to iteratively build a set of valuable samples from the target domain and optimize the classifier. Unlike in traditional supervised learning and domain adaptation, the classifier is trained on data from both the source domain and the target domain. To better describe the learning process, we introduce the model (\(T,X_{s} ,X_{t} ,G,Q\)). \(G\) is a supervised classification network, which is trained using the training set \(T\). \(X_{s}\) is the labelled source domain dataset. \(X_{t}\) is the unlabelled target domain dataset used to provide pseudo-labels. \(Q\) refers to the pseudo-labelling strategy, which is used to select valuable samples and automatically assign pseudo-labels.
As shown in Table 1, pseudo-labelling-based domain adaptation is trained in an iterative way. In the beginning, the training set is initialized as the source domain data \(X_{s}\), which is used to train the classifier \(G\). Next, in each iteration, the most valuable samples from the unlabelled target domain are selected and each is assigned a pseudo-label. The selected samples with pseudo-labels are added to the training set \(X_{l}\), and the updated training set is used to retrain the classifier \(G\). The process is repeated until the iteration limit is reached or the pseudo-labels no longer change.
3.2 Pseudo-labelling selection strategy
Wang et al. [28] showed that it is beneficial to select some valuable pseudo-labelled samples from the target domain as part of the training set instead of using all the pseudo-labelled samples. When domain discrepancy exists, the classifier trained on the initial training set, i.e., the labelled source domain data, usually shows low accuracy on the target domain. Therefore, it is better to use only a small fraction of the target samples at the beginning.
To alleviate the mislabelling issue, the samples that are most likely to be correctly classified by the current network should be selected. We adopt the BT criterion [29], which is inspired by multiclass-level uncertainty for classification with SVMs. The main idea of the BT criterion is that the best sample has the least uncertainty between the two classes to which it is most likely to belong. The relative confidence of the target sample \(x_{i}\) given by the current network is defined as follows
\[U_{i} = \mathop {\max }\limits_{c \in \Omega } p(\widehat{y}_{i} = c \mid x_{i} ) - \mathop {\max }\limits_{{c \in \Omega /c^{ + } }} p(\widehat{y}_{i} = c \mid x_{i} ) \qquad (1)\]
where \(\Omega = \left( {1,2, \cdots ,C} \right)\); \(c^{ + } = \mathop {\arg \max }\limits_{c \in \Omega } (p(\widehat{y}_{i} = c \mid x_{i} ))\) represents the class to which \(x_{i}\) is most likely to belong; \(\Omega /c^{ + }\) denotes the set of all the class labels in \(\Omega\) except \(c^{ + }\). If \(\mathop {\max }\limits_{c \in \Omega } p(\widehat{y}_{i} = c \mid x_{i} ) \gg \mathop {\max }\limits_{{c \in \Omega /c^{ + } }} p(\widehat{y}_{i} = c \mid x_{i} )\), the probability that \(x_{i}\) belongs to class \(c^{ + }\) is high. If \(\mathop {\max }\limits_{c \in \Omega } p(\widehat{y}_{i} = c \mid x_{i} )\) and \(\mathop {\max }\limits_{{c \in \Omega /c^{ + } }} p(\widehat{y}_{i} = c \mid x_{i} )\) are close, \(x_{i}\) is more likely to be misclassified. Therefore, selecting the pseudo-labelled target samples with the top relative confidence scores helps to prevent mislabelled samples from being added to the training set. Unlike other works [26, 28], which use the hard confidence, i.e., \(U_{i} = \mathop {\max }\limits_{c \in \Omega } p(\widehat{y}_{i} = c \mid x_{i} )\), we consider the relative confidence a more reasonable criterion. At the beginning of training, the classification ability of the network is weak, so it cannot produce a confident and reliable output over the categories. For example, we regard a sample with the output (0.2, 0.6, 0.1, 0.1) as more reliable than a sample with the output (0.65, 0.35, 0, 0), although the latter has a higher top-class probability: the relative confidence is 0.4 for the former and 0.3 for the latter. In real-world applications, the relative confidence can be used with any probabilistic model rather than a pure decision model, since it is calculated from the two highest probabilities of the model’s prediction. Besides neural networks, other machine learning methods, such as SVMs and regression models, can provide the relative confidence by subtracting the second-highest class probability from the highest.
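As a small illustration of the difference between the two criteria, the following Python sketch computes the relative (breaking ties) confidence and the hard confidence of a class-probability vector. The function names and the two probability vectors are chosen only for this example.

```python
def relative_confidence(probs):
    """Breaking-ties score: highest class probability minus the second highest."""
    if len(probs) < 2:
        raise ValueError("at least two classes are required")
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def hard_confidence(probs):
    """Hard confidence: the highest class probability alone."""
    return max(probs)


# Two illustrative softmax outputs over four classes.
sample_a = [0.2, 0.6, 0.1, 0.1]
sample_b = [0.65, 0.35, 0.0, 0.0]

for name, probs in (("a", sample_a), ("b", sample_b)):
    print(name, round(hard_confidence(probs), 2), round(relative_confidence(probs), 2))
```

Under the hard confidence, sample b ranks above sample a, whereas the relative confidence reverses the order because the two leading classes of sample b are much closer to each other.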
We adopt the following selective pseudo-labelling strategy. In the \(k\)th iteration, the relative confidence of all the target domain samples is calculated. For class \(c\), the number of selected pseudo-labelled target domain samples \(N(c,k)\) is determined by
\[N(c,k) = \min \left( {N_{c} \times k/T,\;n_{t} (c,k)} \right) \qquad (2)\]
where \(T\) is the number of iterations, \(N_{c}\) is the average number of samples per class over the \(C\) classes in the target domain, and \(n_{t} (c,k)\) is the number of target domain samples that are classified into the \(c\)th class in the \(k\)th iteration. Our pseudo-labelling selection keeps the pseudo-labelled target samples balanced across classes. The number of predicted pseudo-labels \(n_{t} (c,k)\) increases as the iterations proceed. Without a cap, there can be a large number of selected pseudo-labelled samples for an ‘easy’ class and very few for the other classes. We therefore take the minimum with \(N_{c} \times k/T\) to prevent the network from becoming biased towards the ‘easy’ class, so that the pseudo-labelled target samples contribute to the alignment of the distribution for each class during learning.
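The class-balanced selection described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, \(N_{c}\) is taken as the number of target samples divided by the number of classes, and rounding the quota down to an integer is our assumption.

```python
def quota(n_avg, k, total_iters, n_pred):
    """Per-class quota: min(N_c * k / T, n_t(c, k)), rounded down (assumption)."""
    return min(int(n_avg * k / total_iters), n_pred)


def select_pseudo_labels(probs, k, total_iters):
    """Select target samples for pseudo-labelling in iteration k.

    probs: one class-probability list per target sample.
    Returns a sorted list of (sample_index, pseudo_label) pairs.
    """
    num_classes = len(probs[0])
    n_avg = len(probs) / num_classes  # N_c: average number of samples per class

    # Group samples by predicted class and score them with the BT criterion.
    by_class = {c: [] for c in range(num_classes)}
    for i, p in enumerate(probs):
        label = max(range(num_classes), key=p.__getitem__)
        top, second = sorted(p, reverse=True)[:2]
        by_class[label].append((top - second, i))

    selected = []
    for c, scored in by_class.items():
        scored.sort(reverse=True)  # highest relative confidence first
        n_sel = quota(n_avg, k, total_iters, len(scored))
        selected.extend((i, c) for _, i in scored[:n_sel])
    return sorted(selected)


target_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.45, 0.55]]
print(select_pseudo_labels(target_probs, k=1, total_iters=2))
print(select_pseudo_labels(target_probs, k=2, total_iters=2))
```

In this toy case the first iteration keeps only the most confident sample of each class, and the last iteration admits every sample, which mirrors the gradual growth of the quota \(N_{c} \times k/T\).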
3.3 Loss function of class confusion
After the pseudo-labelled target domain samples are added to the training set, the classifier can directly learn knowledge from the target domain. However, the accumulated error caused by mislabelling still exists. According to Eq. (1), in each iteration the target domain samples with high relative confidence are more likely to be selected as training samples for the next iteration than those with low relative confidence scores. Consequently, samples on which the classifier is ambiguous and unconfident across classes may not be fully utilized in domain adaptation learning. As shown in Fig. 4, the selected target domain samples correspond to a sparse classification confusion matrix, while the classification confusion matrix of the unselected target domain samples is more dispersed.
Minimum class confusion (MCC) [14] is a general loss function that can be characterized as a domain adaptation method without explicit domain alignment, since it only uses the target domain data. When the selected target domain samples are added to the training set for the next iteration, the unselected samples are used to calculate the minimum class confusion as a regularization term for classifier training. Introducing MCC has the following advantages. Firstly, it can be used as a general regularizer that prevents the network from getting stuck in local optima. Secondly, in each iteration, the samples farthest from the classification hyperplane are selected, as shown in Fig. 4. Using the class confusion term of the unselected target domain samples as part of the loss function in the current iteration provides a complementary improvement from the target domain. After the current iteration, these samples are more likely to be predicted with higher certainty and selected in the next iteration. Thirdly, MCC can largely accelerate convergence and achieve high domain adaptation performance with a limited number of iterations.
We add class confusion as a regularization term on the unlabelled target domain samples. The confusion between different classes can be naturally described by an inner product between the classifier predictions and their responses. Firstly, temperature rescaling [30] is applied to the softmax output of the classifier to alleviate overconfident predictions. The probability that the \(i\)th instance belongs to the \(j\)th class is expressed as
where \(\widehat{Y}_{i,j}\) is the logit produced by the network of the ith instance. The class correlation between classes \(j\) and \(j^{\prime}\) is defined as
where \(\widehat{z}_{j}\) denotes the probabilities that the samples in each batch come from the jth class. The class correlation measures the probability that the classifier simultaneously classifies the examples into the \(j\)th and \(j^{\prime}\)th class.
Examples with higher certainty in the class predictions given by the classifier are more reliable and should contribute more to the pairwise class confusion [14]. Therefore, a weighting mechanism based on uncertainty is added such that the class confusion highlights the samples with higher certainty in their class predictions and ignores the samples that show little category tendency. The entropy function is used as the measure of uncertainty, which is defined as
With the weighting mechanism, the class confusion is preliminarily defined as
where
In Eq. (7), \(W_{ii}\) is the weight quantifying the importance of the \(i\)th sample for modeling the class confusion. \(W\) is the corresponding diagonal matrix in Eq. (6). Finally, the formal class confusion loss function that is suited to mini-batch SGD optimization is written as
In Eq. (8), it is noted that a category normalization technique [31] is adopted to prevent a severe class imbalance when the number of classes is large.
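The computation described in this subsection can be sketched in plain Python as follows. This is a minimal illustration that follows the general MCC formulation [14] as we read it from the description above: a temperature-scaled softmax, entropy-based weights normalized to sum to the batch size, a weighted class correlation matrix, row-wise category normalization, and the mean off-diagonal mass as the loss. The function names, the `temperature=2.5` default and the small constant inside the logarithm are assumptions made for the example, not values taken from this paper.

```python
import math


def softmax(logits, temperature):
    """Temperature-rescaled softmax of one logit vector."""
    peak = max(logits)
    exps = [math.exp((z - peak) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_confusion_loss(batch_logits, temperature=2.5):
    """Class confusion of a batch of unlabelled target samples."""
    batch = len(batch_logits)
    classes = len(batch_logits[0])
    probs = [softmax(z, temperature) for z in batch_logits]

    # Entropy-based weights: certain predictions receive larger weights.
    entropy = [-sum(p * math.log(p + 1e-12) for p in y) for y in probs]
    raw = [1.0 + math.exp(-h) for h in entropy]
    weights = [batch * r / sum(raw) for r in raw]

    # Weighted class correlation between every pair of classes.
    corr = [[sum(weights[i] * probs[i][j] * probs[i][jp] for i in range(batch))
             for jp in range(classes)] for j in range(classes)]

    # Category normalization: every row sums to one.
    norm = [[v / sum(row) for v in row] for row in corr]

    # Loss: average off-diagonal (between-class) confusion.
    off_diag = sum(norm[j][jp] for j in range(classes)
                   for jp in range(classes) if jp != j)
    return off_diag / classes


confident = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
ambiguous = [[1.0, 0.9, 0.8], [0.9, 1.0, 0.8], [0.8, 0.9, 1.0]]
print(class_confusion_loss(confident), class_confusion_loss(ambiguous))
```

A batch of confident predictions yields a loss close to zero, while a batch of near-uniform predictions yields a loss close to \((C - 1)/C\), so minimizing the term pushes the unselected target samples towards unambiguous predictions.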
3.4 Total loss function
The total loss function of the classifier \(L_{{{\text{total}}}}\) consists of two parts, the classification loss \(L_{cls}\) and the class confusion loss \(L_{CC}\).
where \(X_{s}\) refers to the source domain samples, \(X_{t}^{{{\text{pse}}}}\) to the pseudo-labelled target domain samples in the current iteration, and \(X_{t}^{{{\text{unselected}}}}\) to the unselected target domain samples. \(L_{{{\text{cls}}}}\) is given by
where \(N_{s}\) and \(N_{t}^{pse}\) are the numbers of source domain samples and pseudo-labelled target domain samples, respectively. \(y_{ic}\) and \(y_{jc}^{{{\text{pse}}}}\) are the sign functions of the \(i\)th source domain sample and the \(j\)th pseudo-labelled target domain sample, respectively, as given by Eq. (11) and Eq. (12). \(p_{ic}\) and \(p_{jc}\) refer to the probabilities that the \(i\)th source domain sample and the \(j\)th pseudo-labelled target domain sample belong to class \(c\).
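One plausible reading of the total loss can be sketched in Python as below. The sketch assumes that the classification loss is a cross-entropy averaged separately over the \(N_{s}\) source samples and the \(N_{t}^{pse}\) pseudo-labelled target samples, and that the class confusion loss is added with a weight; the function names and the `cc_weight` parameter are hypothetical and are not specified in the text.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-likelihood; labels are class indices (one-hot y)."""
    if not labels:
        return 0.0  # e.g. no pseudo-labelled samples before the first selection
    total = -sum(math.log(p[y] + 1e-12) for p, y in zip(probs, labels))
    return total / len(labels)


def total_loss(src_probs, src_labels, pse_probs, pse_labels, cc_loss, cc_weight=1.0):
    """Classification loss on both labelled streams plus class confusion."""
    l_cls = cross_entropy(src_probs, src_labels) + cross_entropy(pse_probs, pse_labels)
    return l_cls + cc_weight * cc_loss
```

Here `cc_loss` stands for the class confusion value computed on the unselected target samples in the same training step.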
3.5 Algorithm of the proposed SPLCCR for SAR target recognition
To summarize, the proposed SPLCCR method for SAR target recognition is shown in Algorithm 1. In the beginning, the training set is initialized as the source domain data \(X_{s}\) and used to train the classifier \(G\). Next, in each iteration, the samples from the unlabelled target domain that are most likely to be correctly classified are assigned pseudo-labels and selected. The selected samples are used to form a new training set and retrain the classifier \(G\). The process is repeated until the iteration limit is reached or the pseudo-labels no longer change.
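The iterative procedure can be illustrated with the following self-contained Python sketch. It is a toy under stated assumptions, not Algorithm 1 itself: a one-dimensional nearest-centroid model stands in for the deep classifier \(G\), the class confusion term is omitted, the stopping rule compares the selected pseudo-label sets of consecutive iterations, and all function names and data are hypothetical.

```python
import math


def fit_centroids(xs, ys, num_classes):
    """Toy stand-in for training G: one centroid per class."""
    cents = []
    for c in range(num_classes):
        members = [x for x, y in zip(xs, ys) if y == c]
        cents.append(sum(members) / len(members))
    return cents


def predict_proba(cents, x):
    """Softmax over negative squared distances to the centroids."""
    scores = [-(x - m) ** 2 for m in cents]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select(probs, k, total_iters):
    """BT-based, class-balanced selection; returns {sample_index: pseudo_label}."""
    num_classes = len(probs[0])
    n_avg = len(probs) / num_classes
    buckets = {c: [] for c in range(num_classes)}
    for i, p in enumerate(probs):
        ranked = sorted(range(num_classes), key=lambda c: p[c], reverse=True)
        buckets[ranked[0]].append((p[ranked[0]] - p[ranked[1]], i))
    chosen = {}
    for c, scored in buckets.items():
        scored.sort(reverse=True)
        n_sel = min(int(n_avg * k / total_iters), len(scored))
        for _, i in scored[:n_sel]:
            chosen[i] = c
    return chosen


def self_train(xs_src, ys_src, xs_tgt, num_classes=2, total_iters=3):
    """Iteratively pseudo-label target samples and retrain the toy classifier."""
    cents = fit_centroids(xs_src, ys_src, num_classes)  # initial source-only model
    previous = None
    for k in range(1, total_iters + 1):
        probs = [predict_proba(cents, x) for x in xs_tgt]
        chosen = select(probs, k, total_iters)
        if chosen == previous:  # pseudo-labels no longer change
            break
        train_x = xs_src + [xs_tgt[i] for i in chosen]
        train_y = ys_src + [chosen[i] for i in chosen]
        cents = fit_centroids(train_x, train_y, num_classes)
        previous = chosen
    labels = []
    for x in xs_tgt:
        p = predict_proba(cents, x)
        labels.append(max(range(num_classes), key=p.__getitem__))
    return labels


source_x = [0.0, 0.2, -0.2, 4.0, 4.2, 3.8]
source_y = [0, 0, 0, 1, 1, 1]
target_x = [1.0, 1.3, 0.8, 5.0, 5.2, 4.9]  # same classes, shifted by about +1
print(self_train(source_x, source_y, target_x))
```

In the toy data the target samples are a shifted copy of the source samples, and each retraining step moves the centroids towards the target distribution as more pseudo-labelled samples are admitted.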
4 Results and discussion
In this section, we describe our experiments on the MSTAR dataset to evaluate SPLCCR for SAR image classification. Our method is first compared with state-of-the-art unsupervised domain adaptation methods to evaluate its effectiveness. Then, we use the t-SNE technique to visualize the features extracted from the source and target domains. We also quantitatively measure the domain discrepancy across different methods using the \({\mathcal{A}}\)-distance [32]. Finally, we investigate the performance under conditions with limited source and target samples.
4.1 Dataset description
The MSTAR dataset [33] contains SAR images of ten targets, including tanks, armored vehicles, weapon systems and military engineering vehicles (armored personnel carrier: BMP-2, BRDM-2, BTR-60, and BTR-70; tank: T-62, T-72; weapon system: 2S1; air defense unit: ZSU-23-4; truck: ZIL-131; bulldozer: D7). The data were collected with a Sandia X-band radar. The range and cross-range resolutions are identical and equal to 0.30 m.
Considering the sensitivity of SAR images to depression angles, we conduct the experiments on images with different depression angles to explore the cross-domain classification capability of our method. The MSTAR dataset contains SAR images with four depression angles, i.e., \({15}^{ \circ }\), \({17}^{ \circ }\), \({30}^{ \circ }\) and \({45}^{ \circ }\). Four classes of targets (2S1, BRDM-2, T-72 and ZSU-23-4) cover all these depression angles. Figure 5 shows some optical and SAR images of the four classes of targets with different depression angles. The numbers of SAR images of the different targets under different depression angles are given in Table 2. We construct six domain adaptation tasks by setting different source and target data configurations, i.e., \(17^{ \circ } \to 30^{ \circ }\), \(30^{ \circ } \to 17^{ \circ }\), \(17^{ \circ } \to 45^{ \circ }\), \(45^{ \circ } \to 17^{ \circ }\), \(30^{ \circ } \to 45^{ \circ }\) and \(45^{ \circ } \to 30^{ \circ }\). Note that the data with a depression angle of \({15}^{ \circ }\) are not involved in our experiment since they show little difference from the data with a depression angle of \({17}^{ \circ }\).
4.2 Experimental setting
Each sample in the MSTAR dataset is cropped to 128 × 128 pixels, and no image augmentation or preprocessing is applied. The algorithms are implemented in PyTorch 1.7. The classification model is trained iteratively by stochastic gradient descent (SGD) with a momentum of 0.9. The learning rate is adjusted using an annealing strategy with the following schedule
where \(\eta_{0}\) is the initial learning rate, \(\alpha = 0.001\), \(\beta = 0.75\), and \(p\) is the ratio of the current epoch to the total number of epochs, gradually increasing from 0 to 1.
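For illustration, the sketch below assumes that the schedule takes the inverse-decay form \(\eta_{p} = \eta_{0} /(1 + \alpha p)^{\beta }\) that is widely used in domain adaptation code. This form is our assumption based on the symbols defined above, and the function name is hypothetical.

```python
def annealed_lr(eta0, p, alpha=0.001, beta=0.75):
    """Learning rate at training progress p in [0, 1] (assumed inverse-decay form)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return eta0 / (1.0 + alpha * p) ** beta


# Learning rate at the start, middle and end of training for eta0 = 0.01.
for progress in (0.0, 0.5, 1.0):
    print(progress, annealed_lr(0.01, progress))
```

Under this assumed form the learning rate starts at \(\eta_{0}\) and decreases monotonically as \(p\) grows.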
The pseudo-labels of the target domain samples in the last epoch are used as the predictions of the classification model. Unlike most traditional domain adaptation methods, pseudo-labelling is implemented in an iterative way. Therefore, the number of epochs in each pseudo-labelling iteration is set to 10 and the maximum number of iterations is set to 10. The number of epochs for the other domain adaptation methods is set to 100, so that the total number of training epochs is comparable with that of the pseudo-labelling-based methods.
4.3 Comparison with other approaches
We compare our method with the most competitive classification methods, including a supervised-learning-based method (ResNet18) and unsupervised domain adaptation methods (SPL, DDC, DaNN, CDAN, MCC, JAN, MCD). SPL refers to the selective pseudo-labelling method without class confusion regularization. The hyperparameters of all compared methods are set following the same scheme for experimental fairness. All methods are built on ResNet18 to evaluate their performance. The average classification accuracy of each method is reported over three random runs, and each method is trained for 80 epochs per run. The batch size is set to 8. For the MMD-based methods, i.e., DDC and JAN, we adopt a Gaussian kernel with the bandwidth set to the median of the pairwise squared distances. Each method is optimized using SGD with a momentum of 0.9 and a weight decay of \(5 \times 10^{-4}\), under the same learning rate schedule as SPL and SPLCCR. The classification results of the six domain adaptation tasks with the different methods are given in Table 3. The classification accuracy, namely the ratio of the number of correctly classified samples to the total number of samples, is chosen as the evaluation metric in the following tables.
According to Table 3, the differences in depression angle result in large differences in classification performance. We should first emphasize that all methods achieve above 99% recognition accuracy on the source domain due to the strong fitting ability of the ResNet18 network, which is not shown in the table. However, for all the methods, the classification performance decreases as the depression angle difference increases. Our SPL method achieves an average accuracy of 93.17% over the six domain adaptation tasks, which outperforms all the compared methods except SPLCCR. This indicates that generating pseudo-labels can directly capture the information of the target domain data. Although the feature distributions are not strictly aligned in the hidden space of the middle layers of the network, the SPL method is able to comprehensively represent the target domain data. As for the proposed SPLCCR method, its average accuracy over the six tasks increases to 96.56%. On the one hand, class confusion regularization is used as a domain adaptation loss function, which can narrow the discrepancy between the output distributions of the source and target domains. Thus, the network generates more accurate pseudo-labels in the next iteration. On the other hand, class confusion regularization improves the convergence of the method, so higher domain adaptation performance can be achieved with a limited number of iterations.
In the task of \(17^{ \circ } \to 45^{ \circ }\), SPLCCR, SPL and MCC reach the top three classification accuracies, which are 91.87%, 84.28% and 82.10%, respectively. The accuracy of SPLCCR is higher than that of MCC by 9.77 percentage points. In the task of \(45^{ \circ } \to 17^{ \circ }\), SPLCCR, SPL and CDAN achieve the top three classification accuracies, which are 91.98%, 87.06% and 86.52%, respectively. The accuracy of SPLCCR is higher than that of CDAN by 5.46 percentage points. It is noted that JAN and CDAN perform worse in the tasks of \(17^{ \circ } \to 45^{ \circ }\) and \(45^{ \circ } \to 17^{ \circ }\) than in the other experimental setups. In particular, JAN performs worst among the compared methods. Due to the huge domain discrepancy, hard-to-transfer examples with uncertain predictions may deteriorate the conditional adversarial adaptation procedure. Hence, features cannot be aligned by capturing the cross-covariance between feature representations and classification predictions. In the task of \(30^{ \circ } \to 45^{ \circ }\), SPLCCR, CDAN and SPL achieve the top three classification accuracies, which are 96.49%, 93.23% and 90.47%, respectively. CDAN outperforms SPL, but its accuracy is 3.26 percentage points lower than that of SPLCCR. In the other three tasks, our SPLCCR method does not significantly outperform the other methods. This indicates that feature-based domain adaptation methods are sufficient to improve model generalization when the difference of depression angles is small. However, if the difference of depression angles is large, e.g., the \(17^{ \circ } \to 45^{ \circ }\) or \(45^{ \circ } \to 17^{ \circ }\) tasks in our experiment, classic unsupervised domain adaptation methods fail to align the feature distributions well. Our SPLCCR method transfers the knowledge of the source domain to the target domain by iteratively generating pseudo-labels and performs better even when the difference of depression angles is large.
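The selection step behind the iterative pseudo-labelling can be sketched as follows. The breaking ties (BT) score is taken here as the gap between the two largest class posteriors of each target sample, and the samples with the largest gaps are kept together with their predicted labels. The function name and the fixed `num_select` budget are hypothetical; how many samples are admitted per iteration in the actual method is not specified in this section.

```python
import numpy as np

def select_by_breaking_ties(probs, num_select):
    """Pick the target samples with the largest top-1 / top-2 posterior gap.

    probs: array of shape (num_samples, num_classes) with class posteriors.
    num_select: number of samples to pseudo-label (hypothetical budget).
    Returns the chosen indices, their pseudo-labels and all BT scores.
    """
    probs = np.asarray(probs, dtype=float)
    top_two = np.sort(probs, axis=1)[:, -2:]       # ascending: [second, first]
    bt_score = top_two[:, 1] - top_two[:, 0]       # relative confidence
    order = np.argsort(-bt_score, kind="stable")   # most confident first
    chosen = order[:num_select]
    pseudo_labels = probs[chosen].argmax(axis=1)
    return chosen, pseudo_labels, bt_score

if __name__ == "__main__":
    posteriors = np.array([
        [0.90, 0.05, 0.05],
        [0.40, 0.35, 0.25],
        [0.10, 0.20, 0.70],
        [0.34, 0.33, 0.33],
    ])
    idx, labels, scores = select_by_breaking_ties(posteriors, num_select=2)
    print(idx, labels, scores)
```

In an iterative scheme, the selected samples and their pseudo-labels would be added to the training set, the classifier retrained, and the posteriors of the remaining target samples recomputed before the next selection round.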
4.4 Feature visualization
For a qualitative comparison of different methods, we use the t-SNE technique to visualize the features of the source domain and target domain over the tasks of \(17^{ \circ } \to 30^{ \circ }\), \(17^{ \circ } \to 45^{ \circ }\) and \(30^{ \circ } \to 45^{ \circ }\). Our SPLCCR method is compared with ResNet18, DDC, DaNN and CDAN, since they represent different types of methods and most of them achieve the highest accuracy in a certain task.
Firstly, we visualize the features in the domain adaptation layer of each network. Here, the domain adaptation layer refers to the layer before the output layer. Figure 6 shows the visualization results. Blue and red dots represent the source domain samples and target domain samples, respectively. Figure 6a shows the feature distribution of the domain adaptation layer of ResNet18. The network is trained using only the source domain data. The misalignment between the source domain and target domain samples is quite severe, which leads to poor classification performance. According to Fig. 6b–d, domain adaptation methods such as DDC, DaNN and CDAN can align samples from the source domain and target domain over the tasks of \(17^{ \circ } \to 30^{ \circ }\) and \(30^{ \circ } \to 45^{ \circ }\). However, the source domain samples and target domain samples are not well aligned in the task of \(17^{ \circ } \to 45^{ \circ }\). This indicates that the domain adaptation loss usually plays a regularization role in the training process and improves the classification performance in the target domain by reducing the domain discrepancy. However, it deteriorates the classification in the source domain and results in a non-negligible generalization error in the target domain. As for our SPLCCR method, according to Fig. 6e, the distribution differences still exist, since the proposed method does not use metric criteria or adversarial learning for domain adaptation. Nevertheless, both the source and target domain data show strong separability, which indicates that our method is able to capture the data modality of the target domain and adapt to its data distribution.
Secondly, we visualize the features in the output layer of each network, as shown in Fig. 7. The feature distribution in the output layer of a network directly reflects the generalization error in the source and target domains as well as the domain adaptation ability. According to Fig. 7, our SPLCCR method extracts features that are both domain-discriminative and class-discriminative over the three tasks. It should be pointed out that other domain adaptation methods such as DaNN and CDAN can also achieve good adaptation ability in the tasks of \(17^{ \circ } \to 30^{ \circ }\) and \(30^{ \circ } \to 45^{ \circ }\). Especially in the task of \(30^{ \circ } \to 45^{ \circ }\), the source domain and target domain samples are better aligned by DaNN or CDAN than by SPLCCR. However, in the task of \(17^{ \circ } \to 45^{ \circ }\), SPLCCR shows a stronger ability in feature alignment than DaNN and CDAN. Moreover, although our SPLCCR method does not perfectly align the features from the source and target domains, each class is still distinguishable in both domains over the three tasks.
4.5 Domain discrepancy comparison
\({\mathcal{A}}\)-distance is often used to measure domain discrepancy in domain adaptation research. It is defined as
\[d_{{\mathcal{A}}} \left( {D_{s} ,D_{t} } \right) = 2\left( {1 - 2L_{g} } \right)\]
where \(D_{s}\) and \(D_{t}\) denote the source and target domain samples, respectively, and \(L_{g}\) is the generalization error of a two-sample classifier trained to distinguish them. Here we use a single-layer network with a sigmoid function as the binary classifier.
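A rough way to compute such a domain discrepancy score is sketched below, assuming the commonly used proxy form \(2(1 - 2L_{g})\): a single-layer sigmoid classifier (logistic regression) is trained to tell source features from target features, and its held-out error is plugged in as \(L_{g}\). The function name, the half/half split, the feature standardization and the optimizer settings are all assumptions made for illustration.

```python
import numpy as np

def proxy_a_distance(source_feats, target_feats, epochs=500, lr=0.1, seed=0):
    """Estimate 2 * (1 - 2 * error) with a single-layer sigmoid domain classifier."""
    rng = np.random.default_rng(seed)
    x = np.vstack([source_feats, target_feats]).astype(float)
    y = np.concatenate([np.zeros(len(source_feats)), np.ones(len(target_feats))])

    # Standardize features so plain gradient descent behaves reasonably.
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

    # Random half/half split: train on one half, measure error on the other.
    perm = rng.permutation(len(x))
    half = len(x) // 2
    train, test = perm[:half], perm[half:]

    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        logits = np.clip(x[train] @ w + b, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-logits))       # sigmoid output
        grad = p - y[train]                     # gradient of the log-loss
        w -= lr * x[train].T @ grad / len(train)
        b -= lr * grad.mean()

    pred = (x[test] @ w + b) > 0.0
    error = np.mean(pred != (y[test] > 0.5))    # held-out domain error
    return 2.0 * (1.0 - 2.0 * error)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    src = rng.normal(size=(200, 2))
    print(proxy_a_distance(src, rng.normal(loc=10.0, size=(200, 2))))  # separable domains
    print(proxy_a_distance(src, rng.normal(size=(200, 2))))            # matching domains
```

When the two domains are easy to separate the error approaches 0 and the score approaches 2; when they are indistinguishable the error approaches 0.5 and the score approaches 0, so lower values indicate smaller cross-domain divergence.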
The \({\mathcal{A}}\)-distance results of different methods over all six tasks (\(17^{ \circ } \to 30^{ \circ }\), \(30^{ \circ } \to 17^{ \circ }\), \(17^{ \circ } \to 45^{ \circ }\), \(45^{ \circ } \to 17^{ \circ }\), \(30^{ \circ } \to 45^{ \circ }\) and \(45^{ \circ } \to 30^{ \circ }\)) are shown in Fig. 8. As we can see, the \({\mathcal{A}}\)-distance of our SPLCCR is significantly lower than those of the other methods. This indicates that class confusion regularization and pseudo-labelling can reduce cross-domain divergence more effectively.
4.6 Classification with limited samples
This subsection compares the classification performance of our SPLCCR method with that of other methods under limited-sample conditions. Since the training data include both the source and target domain data, \(N_{t}\) samples are randomly selected for each class as the training data in both domains. \(N_{t}\) is set to 10, 20 and 30. The test data are all samples of the target domain. The numbers of training and test samples in each task are given in Table 4.
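The per-class subsampling described above can be sketched as follows. The helper name and the fixed random seed are assumptions; the sketch simply draws \(N_{t}\) indices without replacement from each class and would be applied separately to the source and target training sets.

```python
import numpy as np

def sample_per_class(labels, n_per_class, seed=0):
    """Return sorted indices containing n_per_class random samples of each class.

    Raises ValueError (from rng.choice) if a class has fewer than
    n_per_class samples.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        picked.append(rng.choice(idx, size=n_per_class, replace=False))
    return np.sort(np.concatenate(picked))

if __name__ == "__main__":
    # Toy label vector: 3 classes with 50 samples each (illustrative only).
    toy_labels = np.repeat(np.arange(3), 50)
    for n_t in (10, 20, 30):
        subset = sample_per_class(toy_labels, n_t)
        print(n_t, len(subset), np.bincount(toy_labels[subset]))
```

Drawing the same number of samples from every class keeps the reduced training sets class-balanced, so differences between the 10, 20 and 30 sample settings reflect the amount of data rather than a change in class proportions.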
Tables 5, 6 and 7 show the classification results of different methods over all six tasks with limited samples, i.e., with 10, 20 and 30 samples per class randomly selected from the training sets of the source domain and target domain, respectively. It can be clearly seen that the classification accuracy improves as the number of training samples increases, since more source domain samples with different imaging environments and target poses are added to the training process.
The classification results with limited samples are consistent with the results in Sect. 4.3, where the training samples are more sufficient. Although the improvement is reduced, our SPLCCR method still outperforms the other methods, which demonstrates its superiority under limited-sample conditions. It also suggests that domain adaptation methods based on pseudo-labelling are effective in SAR target recognition with limited samples, since the pseudo-labels complement the scarce labelled data and help prevent the network from overfitting.
With very limited samples, neither SPLCCR nor SPL achieves the best performance in some tasks, e.g., 30° → 45° with 10 samples selected per class and 45° → 30° with 20 samples selected per class. The reason may be that the mislabelling issue is more severe in the limited-sample cases, which causes error accumulation and a drop in classification performance. Furthermore, according to Tables 5, 6 and 7, combining class confusion regularization with SPL yields noticeable accuracy improvements in most tasks, which supports the effectiveness of class confusion regularization under limited-sample conditions.
5 Conclusions
In this paper, we proposed a novel method for SAR target classification from the perspective of domain adaptation to tackle the performance degradation caused by varying imaging conditions. A selective pseudo-labelling strategy based on the BT criterion and class confusion regularization is designed. Part of the target domain samples are assigned pseudo-labels and added to the training set in an iterative way. Therefore, the information in the target domain data can be learned directly. To address the error accumulation of pseudo-labelling, a class confusion loss is introduced into the iterative process as a regularization term, which improves the network's adaptation to the target domain samples. We conducted experiments on SAR images with different depression angles to explore the cross-domain classification capability of our method. Based on the MSTAR dataset, six configurations of depression angles are considered to construct the source domain and target domain data for target classification. The proposed SPLCCR method achieved an average accuracy of 96.56% over all six tasks, which is significantly higher than those of the comparative methods such as ResNet18, DDC, DaNN, CDAN, MCC, JAN and MCD. The t-SNE feature visualization results show that the proposed method has a strong ability in feature alignment across the two domains and extracts features that maintain good separability in the target domain. Besides, our method also shows its superiority under limited-sample conditions. At present, our work mainly focuses on the variation of depression angle. In the future, we will further study SAR ATR tasks with other types of imaging condition variations.
Availability of data and materials
Please contact the authors for data requests.
Abbreviations
SAR: Synthetic aperture radar
BT: Breaking ties
MSTAR: Moving and stationary target acquisition and recognition
DL: Deep learning
ATR: Automatic target recognition
CNN: Convolutional neural network
SVM: Support-vector machine
DDC: Deep domain confusion
DAN: Deep adaptation networks
DaNN: Domain adversarial neural network
CDAN: Conditional domain adversarial network
MCC: Minimum class confusion
JAN: Joint adaptation networks
MCD: Maximum classifier discrepancy
t-SNE: t-distributed stochastic neighbor embedding
RKHS: Reproducing kernel Hilbert space
SPLCCR: Selective pseudo-labelling with class confusion regularization
SPL: Selective pseudo-labelling
SGD: Stochastic gradient descent
References
S. Chen, H. Wang, F. Xu, Y. Jin, Target classification using the deep convolutional networks for SAR images. IEEE Trans. Geosci. Remote Sens. 54(8), 4806–4817 (2016)
S.A. Wagner, SAR ATR by a combination of convolutional neural network and support vector machines. IEEE Trans. Aerosp. Electron. Syst. 52(6), 2861–2872 (2016)
Z. Lin, K. Ji, M. Kang, X. Leng, H. Zou, Deep convolutional highway unit network for SAR target classification with limited labeled training data. IEEE Geosci. Remote Sens. Lett. 14(7), 1091–1095 (2017)
R. Qin, X. Fu, J. Dong, W. Jiang, A semi-greedy neural network CAE-HL-CNN for SAR target recognition with limited training data. Int. J. Remote Sens. 41(20), 7889–7911 (2020)
M. Rostami, S. Kolouri, E. Eaton, K. Kim, Deep transfer learning for few-shot SAR image classification. Remote Sens. 11(11), 1374 (2019)
Y. Ma, Y. Liang, W. Zhang, S. Yan, SAR target recognition based on transfer learning and data augmentation with LSGANs. Paper presented at 2019 Chinese Automation Congress (CAC), HangZhou, China, 22–24 Nov, (2019)
Q. He, L. Zhao, K. Ji, G. Kuang, SAR target recognition based on task-driven domain adaptation using simulated data. IEEE Geosci. Remote Sens. Lett. 19, 1–5 (2022)
E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, T. Darrell, Deep domain confusion: maximizing for domain invariance, Computer Science, (2014)
M. Long, Y. Cao, J. Wang, M. Jordan, Learning transferable features with deep adaptation networks. Paper presented at 2015 International Conference on Machine Learning (ICML), Miami, Florida, USA, 9–11 Dec, (2015)
B. Sun, K. Saenko, Deep Coral: Correlation alignment for deep domain adaptation. Paper presented at 2016 European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 Oct, (2016)
M. Long, H. Zhu, J. Wang, M.I. Jordan, Deep transfer learning with joint adaptation networks. Paper presented at 2017 International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 Aug, (2017)
M. Long, Z. Cao, J. Wang, M.I. Jordan, Conditional adversarial domain adaptation. Paper presented at 2018 Neural Information Processing Systems (NIPS), Montréal, Canada, 03–06 Dec, (2018)
K. Saito, K. Watanabe, Y. Ushiku, T. Harada, Maximum classifier discrepancy for unsupervised domain adaptation. Paper presented at 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah, USA, 18–22 June, (2018)
Y. Jin, X. Wang, M. Long, J. Wang, Minimum class confusion for versatile domain adaptation. Paper presented at 2020 European Conference on Computer Vision (ECCV), 23–28 Aug, (2020)
W. Deng, Q. Liao, L. Zhao, D. Guo, G. Kuang, D. Hu, L. Liu, Joint clustering and discriminative feature alignment for unsupervised domain adaptation. IEEE Trans. Image Process. 30, 7842–7855 (2021)
E. Schonfeld, S. Ebrahimi, S. Sinha, T. Darrell, Z. Akata, Generalized zero- and few-shot learning via aligned variational autoencoders. Paper presented at IEEE Computer Vision and Pattern Recognition (CVPR), Long Beach, California, USA, 16–20 June, (2019)
L. Van der Maaten, G. Hinton, Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), (2008)
S. Ben-David, J. Blitzer, K. Crammer, F. Pereira, Analysis of representations for domain adaptation. Paper presented at Neural Information Processing Systems (NIPs), (2007)
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets. Paper presented at 2014 Neural Information Processing Systems (NIPs), Montreal, Canada, (2014)
Y. Ganin, V. Lempitsky, Unsupervised domain adaptation by backpropagation. Paper presented at 2015 International Conference on Machine Learning (ICML), Lille, France, 6–11 July, (2015)
X. Zhu, A.B. Goldberg, Introduction to semi-supervised learning. Synth. Lectures Artif. Intell. Mach. Learn. 3(1), 1–130 (2009)
X. Li, Q. Sun, Y. Liu, Q. Zhou, S. Zheng, T.S. Chua, B. Schiele, Learning to self-train for semi-supervised few-shot classification. Paper presented at 2019 Neural Information Processing Systems (NIPs), Jaipur, India, (2019)
M. Long, J. Wang, G. Ding, J. Sun, P.S. Yu, Transfer Feature Learning with Joint Distribution Adaptation. Paper presented at 2013 IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 3–6 Dec, (2013)
J. Zhang, W. Li, P. Ogunbona, Joint geometrical and statistical alignment for visual domain adaptation. Paper presented at 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, USA, 21–26 July, (2017)
Z. Pei, Z. Cao, M. Long, J. Wang, Multi-adversarial domain adaptation. Paper presented at 2018 AAAI Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, USA, 2–7 Feb, (2018)
C. Chen, W. Xie, W. Huang, Y. Rong, X. Ding, Y. Huang, T. Xu, J. Huang, Progressive feature alignment for unsupervised domain adaptation. Paper presented at 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, California, USA, 16–20 June, (2019)
K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition. Paper presented at IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, USA, 26 June–1 July, (2016)
Q. Wang, T. Breckon, Unsupervised Domain Adaptation via Structured Prediction Based Selective Pseudo-labeling. Paper presented at 2020 AAAI Conference on Artificial Intelligence, Hilton New York Midtown, New York, USA, 7–12 Feb, (2020)
B. Demir, C. Persello, L. Bruzzone, Batch-mode active-learning methods for the interactive classification of remote sensing images. IEEE Trans. Geosci. Remote Sens. 49(3), 1014–1031 (2010)
R. Min, H. Lan, Z. Cao, Z. Cui, A gradually distilled CNN for SAR target recognition. IEEE Access 7, 42190–42200 (2019)
U. Von Luxburg, A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, J.W. Vaughan, A theory of learning from different domains. Mach. Learn. 79(1), 151–175 (2010)
T.D. Ross, S.W. Worrell, V.J. Velten, J.C. Mossing, M.L. Bryant, Standard SAR ATR Evaluation Experiments Using the MSTAR Public Release Data Set. Paper presented at Algorithms for Synthetic Aperture Radar Imagery, (1998)
Acknowledgements
The authors would like to thank the handling Associate Editor and the anonymous reviewers for their valuable comments and suggestions for this paper.
Funding
This work was supported by the Natural Science Foundation of Hunan Province, China under Grant 2021JJ30780.
Author information
Contributions
Lingjun Zhao designed the work, analyzed and interpreted the data, and drafted the manuscript. Qishan He participated in the design of the study, performed the experiments and analysis, and helped to draft the manuscript. Ding Ding and Siqian Zhang contributed to the literature investigation. Gangyao Kuang and Li Liu contributed to revising the manuscript. All the authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhao, L., He, Q., Ding, D. et al. Selecting pseudo supervision for unsupervised domain adaptive SAR target classification. EURASIP J. Adv. Signal Process. 2022, 84 (2022). https://doi.org/10.1186/s13634-022-00906-y
DOI: https://doi.org/10.1186/s13634-022-00906-y
Keywords
 SAR image
 Target classification
 Unsupervised domain adaptation
 Selective pseudo-labelling