Selecting pseudo supervision for unsupervised domain adaptive SAR target classification

In recent years, deep learning has brought significant progress to synthetic aperture radar (SAR) target classification. However, SAR image characteristics are highly sensitive to changes in imaging conditions. Inconsistent imaging parameters (especially the depression angle) lead to a distribution shift between the training and test data and severely degrade classification performance. To address this problem, in this paper we propose an unsupervised domain adaptation method based on selective pseudo-labelling for SAR target classification. Our method directly trains a deep model on target domain data by generating pseudo-labels in the target domain. The key idea is to iteratively select valuable samples from the target domain and optimize the classifier. In each iteration, the breaking ties (BT) criterion is adopted to select the samples with the highest relative confidence scores. In addition, to avoid error accumulation in the iterative process, class confusion regularization is used to improve the accuracy of pseudo-labelling. Our method is compared with state-of-the-art methods, including supervised classification and unsupervised domain adaptation methods, on the moving and stationary target acquisition and recognition (MSTAR) dataset. The experimental results demonstrate that the proposed method achieves better classification performance, especially when the depression angle difference between the source and target domain images is large. Moreover, our method also shows its superiority under limited-sample conditions.

of a fully connected layer to realize feature classification in a convolutional neural network (CNN), and proposed a fully convolutional network called A-Convnet, which achieved a classification accuracy of 99.13% under the standard operating conditions of the MSTAR data. Wagner et al. [2] used a support-vector machine (SVM) to classify features extracted by a CNN and proposed the CNN-SVM structure. At present, most DL-based methods achieve an accuracy of more than 99% under the standard operating conditions of the MSTAR dataset. Target classification with limited data is also a research hotspot in SAR ATR. Many techniques based on adjusting the model structure [3], transfer learning [4,5] and data augmentation [6] have been adopted to mitigate the overfitting caused by limited training data.
With the development of machine learning in SAR target recognition, existing methods achieve good recognition performance with both sufficient and insufficient training samples. However, most methods train a classifier in a supervised way using only labelled source domain data, which assumes that the training and test data come from the same or similar distributions. When this assumption fails, the distribution mismatch makes it difficult for a model learned on the training set to perform well on the test set. SAR images are highly sensitive to imaging conditions: inconsistent imaging parameters such as depression angle, azimuth angle and radar band lead to a distribution mismatch between the training and test samples [7]. As shown in Fig. 1, differences in depression angle cause visual discrepancies in the same vehicle target. In this case, a classifier trained on the source domain can hardly obtain accurate classification results on the test set. Therefore, overcoming the degradation of model generalization caused by different imaging conditions is an urgent problem.
Deep unsupervised domain adaptation techniques can be employed to address such domain shift problems. Existing deep unsupervised domain adaptation approaches mainly learn domain-invariant features from labelled source domain data and unlabelled target domain data by explicitly aligning the source and target data distributions [8][9][10][11][12][13][14][15]. Although prior works have achieved impressive performance, we argue that such methods, designed for optical image classification tasks, cannot achieve good results in SAR ATR tasks for two reasons. First, because manual annotation of SAR data is costly and difficult, deep neural networks tend to overfit the limited labelled training data. Second, aligning the source and target domains in an unsupervised manner is not effective enough to handle the large imaging discrepancy of SAR images. Furthermore, it has been shown that domain shift can be handled by a classifier trained on both source and target domain data in a high-dimensional feature space, owing to the blessing of dimensionality [16]. Hence, it is beneficial to treat target domain data, once labelled (e.g., with pseudo-labels), as auxiliary training data.
To improve domain adaptation for SAR image targets under different imaging conditions, this paper proposes an unsupervised domain adaptation approach based on selective pseudo-labelling. The idea of pseudo-labelling is to first train an initial classification model on the source domain and then apply it to the target domain samples to generate pseudo-labels. In each iteration, the BT criterion is adopted to select the samples with the highest relative confidence scores. Next, target samples with pseudo-labels are added to the training set for supervised training. This process is repeated until the generated pseudo-labels no longer change. On the one hand, the pseudo-labelled target domain samples expand the training set and inhibit overfitting. On the other hand, they directly adapt the model to the underlying feature distribution of the target domain. However, the pseudo-labelling strategy depends heavily on the initial pseudo-labels. If the initial pseudo-labels are wrongly assigned, errors can easily accumulate and cause the model to fall into a local optimum. To avoid error accumulation in the iterative process, class confusion regularization is used to improve the accuracy of pseudo-labelling. We conduct experiments on SAR images with different depression angles to explore the cross-domain classification capability of our method. Based on the MSTAR dataset, six depression angle configurations are considered to construct the source and target domain data for target classification. Our method is compared with state-of-the-art methods, including supervised classification and unsupervised domain adaptation methods. Classification results and feature visualization based on t-distributed stochastic neighbor embedding (t-SNE) [17] demonstrate that the proposed method achieves better classification performance, especially when the depression angle difference between the source and target domain images is large. Moreover, our method also shows its superiority under few-sample conditions.

Unsupervised deep domain adaptation
In computer vision, a common assumption behind image classification tasks is that the source domain and target domain data have similar or identical support [18]. However, in many real-world scenarios this assumption fails, since there may be no overlapping features across the source and target domains. We formalize the problem as follows. In supervised learning, given a source training sample set $X_s = \{x_i^s\}_{i=1}^{N}$ and the corresponding label set $Y_s = \{y_i^s\}_{i=1}^{N}$, the goal of a learner usually consists of finding a hypothesis function h that best captures the relation between X and Y. This relationship is expected to extend beyond the training instances to test instances $X_t = \{x_i^t\}_{i=1}^{M}$ drawn from the same probability distribution. However, when the distributions differ, a classifier trained on the labelled source domain suffers a significant performance drop when directly applied to the target domain, as shown in Fig. 2. To be specific, the marginal distributions of the source and target domains are different, i.e., $p(x^s) \neq p(x^t)$, but the conditional probability distributions are identical, i.e., $p(y^s \mid x^s) = p(y^t \mid x^t)$.
From this perspective, unsupervised deep domain adaptation approaches learn domain-invariant features from labelled source data and unlabelled target data in an end-to-end framework. DDC (deep domain confusion) [8] learns transferable features by matching kernel embeddings in a reproducing kernel Hilbert space (RKHS) computed from the two distributions. DAN (deep adaptation networks) [9] improves the DDC domain metric by replacing it with a multi-kernel variant. Deep CORAL [10] uses correlation alignment (CORAL) to learn features with the same second-order statistics. JAN (joint adaptation network) [11] aligns the activations of multiple layers using the joint maximum mean discrepancy. These methods rely on different domain discrepancy metrics. Meanwhile, the adversarial training strategy of generative adversarial networks [19] is employed in domain adaptation to learn domain-invariant features. The key is to play a 'minimax' game over the output activations of multiple layers. DaNN (domain adversarial neural network) [20] proposes the gradient reversal layer, through which features from the two domains are made as indistinguishable as possible during backpropagation. The success of the gradient reversal layer lies in the fact that the minimax optimization of the feature extraction network can be carried out in a single backpropagation pass. CDAN (conditional domain adversarial network) [12] improves adversarial training by capturing the cross-covariance between features and classifier predictions. More recently, MCD (maximum classifier discrepancy) [13] aligns the distributions of the source and target domains by utilizing task-specific decision boundaries. Two classifiers are trained to maximize their discrepancy in order to detect target samples that are far from the support of the source data, and a feature generator learns to generate target features near the support to minimize the discrepancy.
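To make the second-order alignment of Deep CORAL concrete, the NumPy sketch below computes the CORAL loss between two feature batches: the squared Frobenius distance between their covariance matrices, scaled by 1/(4d²), where d is the feature dimension. It is an illustrative forward computation only, not a training implementation; the function name `coral_loss` is hypothetical.

```python
import numpy as np


def coral_loss(feat_src: np.ndarray, feat_tgt: np.ndarray) -> float:
    """CORAL loss: ||C_s - C_t||_F^2 / (4 d^2) for (n, d) feature batches."""
    d = feat_src.shape[1]
    # rowvar=False: each row is one sample, each column one feature
    cov_src = np.cov(feat_src, rowvar=False)
    cov_tgt = np.cov(feat_tgt, rowvar=False)
    return float(np.sum((cov_src - cov_tgt) ** 2) / (4.0 * d * d))


rng = np.random.default_rng(0)
fs = rng.normal(size=(256, 8))
print(coral_loss(fs, fs + 3.0))  # mean shift only: covariances match, loss is close to 0
print(coral_loss(fs, 2.0 * fs))  # rescaling changes the covariance, loss is positive
```

The first call illustrates why CORAL is described as matching second-order statistics: a pure shift of the mean leaves the loss essentially unchanged.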
Note that unsupervised domain adaptation assumes access to labelled data only from the source domain and unlabelled data from the target domain; hence it follows the transductive learning paradigm. One effective approach is to train networks with samples from both domains in a supervised manner. Pseudo-labelling has been used in many settings to address the lack of labelled data, such as semi-supervised learning [21] and few-shot learning [16,22]. Two categories, i.e., hard labelling [23,24] and soft labelling [25], have been employed in many existing works. The main idea of hard pseudo-labelling is to assign a pseudo-label to each unlabelled instance and then train classifiers with the augmented labelled training set. Networks trained with hard labelling are thought to be prone to getting stuck in local optima, since hard labelling does not consider the confidence of each target sample. Soft labelling instead assigns the conditional probability of each class given a target sample [25]. To address the mis-labelling issue, selective pseudo-labelling is another effective method, which selects only part of the unlabelled samples, in some order, to assign pseudo-labels. A key factor is how to design the sample selection criterion for pseudo-labelling. An easy-to-hard strategy is employed in [26]: target samples whose similarity scores are higher than a certain threshold are selected for pseudo-labelling, and this threshold is updated after each iteration of learning so that more unlabelled target samples can be selected.
Although unsupervised deep domain adaptation has achieved impressive performance in optical image classification, it is rarely used for SAR target recognition. In this paper, we propose a novel domain adaptation approach based on selective pseudo-labelling that aims to address the imaging discrepancy in SAR target classification. The main contributions of this article are as follows.
(1) SAR images are highly susceptible to imaging conditions, which degrades the recognition performance of deep learning models. We investigate the model's generalization ability across images captured at different depression angles, which has been ignored in previous SAR-ATR studies. (2) A selective pseudo-labelling strategy is introduced into the domain adaptation method. This strategy not only aligns features implicitly, without moment matching or adversarial learning, but also improves the model's generalization under limited training data. (3) To avoid error accumulation in pseudo-labelling, a class confusion loss is introduced into the iterative process as a regularization term, gradually improving pseudo-labelling accuracy in each iteration. (4) Our proposed method obtains a clear improvement over the compared domain adaptation methods across data at different depression angles in the MSTAR dataset. Furthermore, our method is well suited to SAR-ATR with limited training samples.

Method
Classical domain adaptation methods usually learn domain-invariant features by directly or indirectly aligning the distributions of the source and target domains. However, for high-dimensional classification neural networks, it is possible to learn a classifier that generalizes to both domains when labels are available in both the source and target domains. Since unsupervised domain adaptation assumes that no labels are available for the target domain, we propose a SAR image domain adaptation approach based on selective pseudo-labelling with class confusion regularization (SPL-CCR).

Overall architecture of SPL-CCR
The diagram of SPL-CCR is shown in Fig. 3. SPL-CCR organizes two main steps in an iterative learning strategy to generate pseudo-labels for the target domain. In the first step, we train the classification network with the labelled source data and classify the unlabelled target samples. Then, a fraction of the target samples is assigned pseudo-labels according to the selection strategy. In the second step, the classification network is trained with two input streams. One consists of the source domain samples and the selected target domain samples; the other consists of the unselected, unlabelled target domain samples. The classifier becomes stronger by learning from the former in a supervised way. The latter is also fed to the network to calculate the class confusion loss, which acts as a regularizer to alleviate the mis-labelling issue. Specifically, we adopt ResNet18 [27] as the feature extraction network and a custom linear layer as the classifier.
The key idea of pseudo-labelling is to iteratively build a set of valuable samples from the target domain and optimize the classifier. Unlike traditional supervised learning and domain adaptation, the classifier is trained on data from both the source and target domains. To better describe the learning process, we introduce the model $(T, X_s, X_t, G, Q)$. G is a supervised classification network trained on the training set T. $X_s$ is the labelled source domain dataset. $X_t$ is the unlabelled target domain dataset used to provide pseudo-labels. Q refers to the pseudo-labelling strategy, which selects valuable samples and automatically assigns pseudo-labels.
As shown in Table 1, pseudo-labelling-based domain adaptation is trained iteratively. In the beginning, the training set is initialized as the source domain data $X_s$, which is used to train the classifier G. Next, in each iteration, the most valuable samples from the unlabelled target domain are selected and each is assigned a pseudo-label. The selected samples with pseudo-labels are added to the training set $X_l$, and the updated training set is used to retrain the classifier G. The process is repeated until the iteration condition is reached or the pseudo-labels no longer change.
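The iterative procedure can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: a nearest-centroid classifier (`fit_centroids` and `predict_proba`, both hypothetical) stands in for the ResNet18-based network G, a simple global rule that grows the selected set linearly with the iteration stands in for the strategy Q, and the class confusion regularizer is omitted.

```python
import numpy as np


def fit_centroids(X, y, num_classes):
    """Hypothetical stand-in for training G: one centroid per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(num_classes)])


def predict_proba(centroids, X):
    """Softmax over negative squared distances to the class centroids."""
    logits = -((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def iterative_pseudo_labelling(Xs, ys, Xt, num_classes, max_iters=10):
    G = fit_centroids(Xs, ys, num_classes)  # initial training set T = X_s
    prev_labels = None
    for k in range(1, max_iters + 1):
        probs = predict_proba(G, Xt)
        pseudo = probs.argmax(axis=1)
        top2 = np.sort(probs, axis=1)[:, -2:]
        conf = top2[:, 1] - top2[:, 0]  # relative (BT) confidence
        n_sel = int(np.ceil(len(Xt) * k / max_iters))  # assumed linear growth
        sel = np.argsort(-conf)[:n_sel]  # most confident target samples
        X_l = np.concatenate([Xs, Xt[sel]])  # updated training set
        y_l = np.concatenate([ys, pseudo[sel]])
        G = fit_centroids(X_l, y_l, num_classes)  # retrain G
        labels = predict_proba(G, Xt).argmax(axis=1)
        if prev_labels is not None and np.array_equal(labels, prev_labels):
            break  # pseudo-labels no longer change
        prev_labels = labels
    return G, labels
```

Swapping the centroid model for a neural network only changes the two helper functions; the select, augment and retrain structure of the loop stays the same.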

Pseudo-labelling selection strategy
Wang et al. [28] showed that it is beneficial to select some valuable pseudo-labelled samples from the target domain as part of the training set instead of using all the pseudo-labelled samples. When a domain discrepancy exists, the classifier trained on the initial training set, i.e., the labelled source domain data, usually shows low accuracy on the target domain. Therefore, it is better to use only a small fraction of the target samples at the beginning.
To alleviate the mis-labelling issue, we aim to select the samples that are most likely to be correctly classified by the current network. We adopt the BT criterion [29], which is inspired by multiclass-level uncertainty for classification with SVMs. The main idea of the BT criterion is that the best sample has the least uncertainty between the two classes to which it is most likely to belong. The relative confidence of the target sample $x_i$ given by the current network is defined as

$$U_i = \max_{c \in \Omega} p(y_i = c \mid x_i) - \max_{c \in \Omega \setminus c^{+}} p(y_i = c \mid x_i), \tag{1}$$

where Ω is the set of all class labels, $c^{+} = \arg\max_{c \in \Omega} p(y_i = c \mid x_i)$ represents the class to which $x_i$ is most likely to belong, and $\Omega \setminus c^{+}$ denotes the set of all class labels in Ω except $c^{+}$. If $\max_{c \in \Omega} p(y_i = c \mid x_i) \gg \max_{c \in \Omega \setminus c^{+}} p(y_i = c \mid x_i)$, the probability that $x_i$ belongs to class $c^{+}$ is high.
If $\max_{c \in \Omega} p(y_i = c \mid x_i)$ and $\max_{c \in \Omega \setminus c^{+}} p(y_i = c \mid x_i)$ are close, $x_i$ is more likely to be misclassified. Therefore, selecting pseudo-labelled target samples with the top relative confidence scores can prevent mislabelled samples from being added to the training set. Note that, unlike other works [26,28] that use hard confidence, i.e., $U_i = \max_{c \in \Omega} p(y_i = c \mid x_i)$, we consider relative confidence a more reasonable criterion. At the beginning of training, the classification ability of the network is weak, so it cannot produce confident and reliable outputs over categories. For example, we consider a sample with the output (0.2, 0.6, 0.1, 0.1) more reliable than a sample with the output (0.65, 0.35, 0, 0), because its relative confidence is higher (0.6 - 0.2 = 0.4 versus 0.65 - 0.35 = 0.3), even though the latter has the higher maximum probability. In real-world applications, relative confidence requires a probabilistic model rather than a pure decision model, since it is calculated from the two highest predicted probabilities. Besides neural networks, other machine learning methods that output class probabilities, such as SVMs and regression models, can compute relative confidence by subtracting the second-highest class probability from the highest one.
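The relative confidence of Eq. (1) is straightforward to compute from a matrix of class probabilities. The NumPy sketch below (function names are illustrative, not from the paper) contrasts it with the hard confidence used in [26,28] on two illustrative softmax outputs modelled on the example in the text.

```python
import numpy as np


def relative_confidence(probs: np.ndarray) -> np.ndarray:
    """BT score per row: highest class probability minus the second-highest."""
    top2 = np.sort(probs, axis=1)[:, -2:]  # ascending sort, so column 1 is the max
    return top2[:, 1] - top2[:, 0]


def hard_confidence(probs: np.ndarray) -> np.ndarray:
    """Hard confidence: the maximum class probability only."""
    return probs.max(axis=1)


outputs = np.array([[0.2, 0.6, 0.1, 0.1],
                    [0.65, 0.35, 0.0, 0.0]])
print(relative_confidence(outputs))  # about [0.4, 0.3]: the first sample ranks higher
print(hard_confidence(outputs))      # [0.6, 0.65]: hard confidence prefers the second
```

The two scores rank the same pair of samples in opposite order, which is exactly the case the BT criterion is meant to handle.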
We adopt the following selective pseudo-labelling strategy. In the k-th iteration, the relative confidence of all target domain samples is calculated. For class c, the number of selected pseudo-labelled target domain samples N(c, k) is determined by

$$N(c, k) = \min\left( N_c \times \frac{k}{T},\; n_t(c, k) \right), \tag{2}$$

where T is the number of iterations, $N_c$ is the average number of samples per class over the C classes in the target domain, and $n_t(c, k)$ is the number of target domain samples classified into the c-th class in the k-th iteration. Our pseudo-labelling selection keeps the pseudo-labelled target samples balanced across classes. The number of predicted pseudo-labels $n_t(c, k)$ increases as the iterations proceed. Without a cap, a large number of pseudo-labelled samples could be selected for an 'easy' class while very few are selected for the other classes. We therefore take the minimum with $N_c \times k/T$ to prevent the network from being biased toward the 'easy' class, so that pseudo-labelled target samples contribute to the distribution alignment of every class during learning.
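The class-balanced selection rule can be sketched as follows. This is an illustrative NumPy version, not the authors' code: `select_pseudo_labels` is a hypothetical name, and flooring $N_c \times k/T$ to an integer is an assumption, since the text does not say how the quota is rounded.

```python
import numpy as np


def relative_confidence(probs: np.ndarray) -> np.ndarray:
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


def select_pseudo_labels(probs: np.ndarray, k: int, T: int):
    """Class-balanced selection for iteration k of T.

    probs: (n_samples, num_classes) predicted probabilities on the target domain.
    Returns the selected target indices and their pseudo-labels.
    """
    n_samples, num_classes = probs.shape
    preds = probs.argmax(axis=1)
    conf = relative_confidence(probs)
    n_c = n_samples / num_classes  # N_c: average samples per class
    selected = []
    for c in range(num_classes):
        idx = np.flatnonzero(preds == c)  # n_t(c, k) = len(idx)
        quota = min(int(n_c * k / T), len(idx))  # N(c, k); flooring is assumed
        ranked = idx[np.argsort(-conf[idx], kind="stable")]  # most confident first
        selected.extend(ranked[:quota].tolist())
    selected = np.array(sorted(selected), dtype=int)
    return selected, preds[selected]
```

Because each class has its own quota, a class that attracts many confident predictions cannot crowd out the others, and the quota grows linearly until the final iteration.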

Loss function of class confusion
After the pseudo-labelled target domain samples are added to the training set, the classifier can learn knowledge directly from the target domain. However, errors caused by mis-labelling still accumulate. According to Eq. (1), in each iteration the target domain samples with high relative confidence are more likely to be selected as training samples for the next iteration than those with low relative confidence scores. Consequently, samples that make the classifier ambiguous and unconfident across classes may not be fully utilized in domain adaptation learning. As shown in Fig. 4, the selected target domain samples correspond to a sparse classification confusion matrix, while the confusion matrix of the unselected target domain samples is more dispersed.
Minimum class confusion (MCC) [14] is a general loss function that can be characterized as a domain adaptation method without explicit domain alignment, since it uses only target domain data. When the selected target domain samples are added to the training set for the next iteration, the unselected samples are used to calculate the minimum class confusion as a regularization term for classifier training. Introducing MCC has the following advantages. Firstly, it can be used as a general regularizer that prevents the network from getting stuck in local optima. Secondly, in each iteration, the samples farthest from the classification hyperplane are selected, as shown in Fig. 4. Using the class confusion of the unselected target domain samples as part of the loss function in the current iteration therefore provides a complementary improvement from the target domain: after the current iteration, these samples are more likely to be predicted with higher certainty and selected in the next iteration. Thirdly, MCC can largely accelerate convergence and achieve high domain adaptation performance within a limited number of iterations.
Fig. 4 Classification confusion matrices of selected and unselected samples in the target domain
We add class confusion as a regularization term on the unlabelled target domain samples. The confusion between different classes can be naturally described by the inner product between the classifier's predictions on a mini-batch and their transpose. First, temperature rescaling [30] is applied to the softmax output of the classifier to alleviate overconfident predictions. The probability that the i-th instance belongs to the j-th class is expressed as

$$\hat{Y}_{ij} = \frac{\exp(Y_{ij}/\tau)}{\sum_{j'=1}^{C} \exp(Y_{ij'}/\tau)}, \tag{3}$$

where $Y_{ij}$ is the logit produced by the network for the i-th instance and class j, and τ is the temperature. The class correlation between classes j and j′ is defined as

$$\mathbf{C}_{jj'} = \mathbf{z}_j^{\top} \mathbf{z}_{j'}, \tag{4}$$

where $\mathbf{z}_j = [\hat{Y}_{1j}, \dots, \hat{Y}_{Bj}]^{\top}$ denotes the probabilities that the B samples in a mini-batch come from the j-th class. The class correlation measures the probability that the classifier simultaneously classifies the examples into the j-th and j′-th classes.
Examples with higher certainty in their class predictions are more reliable and should contribute more to the pairwise class confusion [14]. Therefore, an uncertainty-based weighting mechanism is added so that the class confusion highlights samples with higher prediction certainty and down-weights samples that show little tendency toward any category. The entropy function is used as the measure of uncertainty:

$$H(\hat{Y}_{i\cdot}) = -\sum_{j=1}^{C} \hat{Y}_{ij} \log \hat{Y}_{ij}. \tag{5}$$

With the weighting mechanism, the preliminary class confusion is defined as

$$\mathbf{C}_{jj'} = \mathbf{z}_j^{\top} \mathbf{W} \mathbf{z}_{j'}, \tag{6}$$

where

$$W_{ii} = \frac{B\,\bigl(1 + \exp(-H(\hat{Y}_{i\cdot}))\bigr)}{\sum_{i'=1}^{B} \bigl(1 + \exp(-H(\hat{Y}_{i'\cdot}))\bigr)}. \tag{7}$$

In Eq. (7), $W_{ii}$ is the weight quantifying the importance of the i-th sample for modelling the class confusion, and $\mathbf{W}$ is the corresponding diagonal matrix in Eq. (6). Finally, the class confusion loss, which is natively suited to mini-batch SGD optimization, is written as

$$L_{CC} = \frac{1}{C} \sum_{j=1}^{C} \sum_{j' \neq j}^{C} \bigl|\tilde{\mathbf{C}}_{jj'}\bigr|, \qquad \tilde{\mathbf{C}}_{jj'} = \frac{\mathbf{C}_{jj'}}{\sum_{j''=1}^{C} \mathbf{C}_{j''j'}}. \tag{8}$$

In Eq. (8), a category normalization technique [31] is adopted to prevent severe class imbalance when the number of classes is large.
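The forward value of the class confusion loss can be traced step by step in NumPy. This is a minimal sketch of the MCC-style computation described above, assuming a mini-batch of raw logits; the function names and the default temperature τ = 2.5 are assumptions chosen for illustration, and a training implementation would express the same steps in an autograd framework.

```python
import numpy as np


def temperature_softmax(logits: np.ndarray, tau: float) -> np.ndarray:
    """Softmax of logits / tau, computed in a numerically stable way."""
    z = logits / tau
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def class_confusion_loss(logits: np.ndarray, tau: float = 2.5, eps: float = 1e-12) -> float:
    """Class confusion loss for one mini-batch of (B, C) logits."""
    batch, num_classes = logits.shape
    probs = temperature_softmax(logits, tau)  # (B, C); column j is z_j
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)  # per-sample uncertainty
    weights = 1.0 + np.exp(-entropy)
    weights = batch * weights / weights.sum()  # diagonal entries W_ii
    confusion = probs.T @ (weights[:, None] * probs)  # z_j^T W z_j'
    confusion = confusion / confusion.sum(axis=0, keepdims=True)  # category normalization
    off_diagonal = confusion.sum() - np.trace(confusion)  # entries are non-negative
    return float(off_diagonal / num_classes)
```

As a sanity check, all-zero logits for four classes give uniform predictions and a loss of (C - 1)/C = 0.75, while confident, well-separated predictions drive the loss toward zero.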

Total loss function
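The total loss function of the classifier $L_{total}$ consists of two parts, the classification loss $L_{cls}$ and the class confusion loss $L_{CC}$:

$$L_{total} = L_{cls}\bigl(X_s, X_t^{\mathrm{pse}}\bigr) + L_{CC}\bigl(X_t^{\mathrm{unselected}}\bigr), \tag{9}$$

where $X_s$ refers to the source domain samples, $X_t^{\mathrm{pse}}$ to the pseudo-labelled target domain samples in the current iteration, and $X_t^{\mathrm{unselected}}$ to the unselected target domain samples. $L_{cls}$ is given by

$$L_{cls} = -\frac{1}{N_s + N_t^{\mathrm{pse}}} \left( \sum_{i=1}^{N_s} \sum_{c=1}^{C} y_{ic} \log p_{ic} + \sum_{j=1}^{N_t^{\mathrm{pse}}} \sum_{c=1}^{C} y_{jc}^{\mathrm{pse}} \log p_{jc} \right), \tag{10}$$

where $N_s$ and $N_t^{\mathrm{pse}}$ are the numbers of source domain samples and pseudo-labelled target domain samples, respectively. $y_{ic}$ and $y_{jc}^{\mathrm{pse}}$ are the indicator (sign) functions of the i-th source domain sample and the j-th pseudo-labelled target domain sample, respectively:

$$y_{ic} = \begin{cases} 1, & \text{if the label of the } i\text{-th source sample is } c, \\ 0, & \text{otherwise}, \end{cases} \tag{11}$$

$$y_{jc}^{\mathrm{pse}} = \begin{cases} 1, & \text{if the pseudo-label of the } j\text{-th target sample is } c, \\ 0, & \text{otherwise}. \end{cases} \tag{12}$$

$p_{ic}$ and $p_{jc}$ refer to the probabilities that the i-th source domain sample and the j-th pseudo-labelled target domain sample belong to class c.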
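The combination of the two loss terms can be sketched in NumPy. The sketch assumes, as one reading of the text, that the cross-entropy is averaged over the pooled source and pseudo-labelled target samples and that the two terms are added with equal weight; `cross_entropy` and `total_loss` are hypothetical names, and the class confusion value is passed in as a number computed separately on the unselected target batch.

```python
import numpy as np


def cross_entropy(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """Mean of -log p_{i, y_i}; a one-hot indicator simply picks one column per row."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + eps).mean())


def total_loss(p_src, y_src, p_pse, y_pse, l_cc: float) -> float:
    """L_cls over source and pseudo-labelled target samples, plus L_CC."""
    probs = np.concatenate([p_src, p_pse])   # source + pseudo-labelled target predictions
    labels = np.concatenate([y_src, y_pse])  # ground-truth labels + pseudo-labels
    return cross_entropy(probs, labels) + l_cc
```

With uniform predictions over four classes the classification term equals log 4, so the total is log 4 plus whatever class confusion value is supplied.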

Algorithm of the proposed SPL-CCR for SAR target recognition
To summarize, the proposed SPL-CCR method for SAR target recognition is shown in Algorithm 1. In the beginning, the training set, initialized as the source domain data $X_s$, is used to train the classifier G. Next, in each iteration, the samples from the unlabelled target domain with the highest probability of being correctly classified are selected and assigned pseudo-labels. The selected samples are used to form a new training set to retrain the classifier G. The process is repeated until the iteration condition is reached or the pseudo-labels no longer change.

Results and discussion
In this section, we evaluate SPL-CCR for SAR image classification on the MSTAR dataset. Our method is first compared with state-of-the-art unsupervised domain adaptation methods to evaluate its effectiveness. Then, we use the t-SNE technique to visualize the features extracted from the source and target domains. In addition, we quantitatively measure the domain discrepancy of different methods using the A-distance [32]. Finally, we investigate the performance under conditions with limited source and target samples.
Considering the sensitivity of SAR images to depression angle, we conduct experiments on images with different depression angles to explore the cross-domain classification capability of our method. The MSTAR dataset contains SAR images at four depression angles, i.e., 15°, 17°, 30° and 45°. Four target classes (2S1, BRDM2, T72 and ZSU234) cover all of these depression angles. Figure 5 shows some optical and SAR images of the four target classes at different depression angles. The numbers of SAR images of each target at each depression angle are given in Table 2. We construct six domain adaptation tasks by setting different source and target data configurations, i.e., 17

Experimental setting
Each sample in the MSTAR dataset is cropped to 128 × 128 pixels, and no image augmentation or pre-processing is applied. The algorithms are implemented in PyTorch 1.7. The classification model is trained iteratively with stochastic gradient descent (SGD) with a momentum of 0.9. The learning rate is adjusted with the following annealing schedule:

$$\eta_p = \frac{\eta_0}{(1 + \alpha p)^{\beta}}, \tag{13}$$

where $\eta_0$ is the initial learning rate, α = 0.001, β = 0.75, and p is the ratio of the current epoch to the total number of epochs, increasing gradually from 0 to 1.
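The learning-rate schedule is a one-line function of the training progress p. The sketch below evaluates it per epoch using the α and β values stated above; the initial learning rate is not given in the text, so `eta0 = 0.01` is a hypothetical placeholder, and treating p as progress over the whole 100-epoch run (10 iterations of 10 epochs) is an assumption.

```python
def annealed_lr(eta0: float, p: float, alpha: float = 0.001, beta: float = 0.75) -> float:
    """eta_p = eta0 / (1 + alpha * p) ** beta, with p the fraction of training done."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be the fraction of training completed, in [0, 1]")
    return eta0 / (1.0 + alpha * p) ** beta


total_epochs = 100  # assumed: 10 pseudo-labelling iterations x 10 epochs each
eta0 = 0.01         # hypothetical initial learning rate (not stated in the text)
schedule = [annealed_lr(eta0, epoch / total_epochs) for epoch in range(total_epochs)]
```

In PyTorch, the same factor could be applied per epoch through a multiplicative scheduler such as `torch.optim.lr_scheduler.LambdaLR`.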
The pseudo-labels of the target domain samples in the last epoch are used as the predictions of the classification model. Unlike most traditional domain adaptation methods, pseudo-labelling is implemented iteratively. The number of epochs in each pseudo-labelling iteration is set to 10, and the maximum number of iterations is set to 10. For a fair comparison in terms of the total number of training epochs, the number of epochs for the other domain adaptation methods is set to 100.

Comparison with other approaches
We compare our method with the most competitive classification methods, including a supervised-learning baseline (ResNet18) and unsupervised domain adaptation methods (SPL, DDC, DaNN, CDAN, MCC, JAN and MCD). SPL refers to the selective pseudo-labelling method without class confusion regularization. For experimental fairness, all compared methods share the same hyper-parameter scheme, and all of them are built on ResNet-18. The average classification accuracy of each method is reported over three random runs, and each method is trained for 80 epochs per run. The batch size is set to 8. For the MMD-based methods, i.e., DDC and JAN, we adopt a Gaussian kernel with the bandwidth set to the median pairwise squared distance. Each method is optimized using SGD with a momentum of 0.9 and a weight decay of 5 × 10⁻⁴, under the same learning rate schedule as SPL and SPL-CCR. The classification results of the six domain adaptation tasks are given in Table 3. Classification accuracy, namely the ratio of the number of correctly classified samples to the total number of samples, is used as the evaluation metric in the following tables.
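For readers unfamiliar with the MMD baselines, the sketch below computes a biased estimate of the squared MMD between two feature batches with a single Gaussian kernel whose bandwidth is the median pairwise squared distance of the pooled samples. It assumes the kernel form exp(-‖a - b‖²/σ²) with σ² equal to that median; the exact kernel parameterization, the multi-kernel and joint variants, and the function names are assumptions for illustration.

```python
import numpy as np


def squared_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)


def gaussian_mmd2(feat_src: np.ndarray, feat_tgt: np.ndarray) -> float:
    """Biased squared-MMD estimate with one Gaussian kernel (median heuristic)."""
    pooled = np.concatenate([feat_src, feat_tgt])
    d2 = squared_distances(pooled, pooled)
    iu = np.triu_indices(len(pooled), k=1)  # distinct pairs only
    bandwidth = np.median(d2[iu])           # median pairwise squared distance
    kernel = np.exp(-d2 / bandwidth)
    n = len(feat_src)
    k_ss, k_tt, k_st = kernel[:n, :n], kernel[n:, n:], kernel[:n, n:]
    return float(k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean())
```

The median heuristic ties the kernel width to the scale of the data, so the estimate stays informative without manual tuning of σ.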
According to Table 3, differences in depression angle lead to large differences in classification performance. We should first emphasize that all methods achieve above 99% recognition accuracy on the source domain owing to the strong fitting ability of the ResNet-18 network; these results are not shown in the table. However, for all the methods, classification performance decreases as the depression angle difference increases. Our SPL method achieves an average accuracy of 93.17% over the six domain adaptation tasks, which outperforms all the compared methods except SPL-CCR. This suggests that generating pseudo-labels can directly capture the information of the target domain data. Although the feature distributions are not strictly aligned in the hidden space of the intermediate layers of the network, the SPL method is able to represent the target domain data comprehensively. With the proposed SPL-CCR method, the average accuracy over the six tasks increases to 96.56%. On the one hand, class confusion regularization acts as a domain adaptation loss that narrows the discrepancy between the output distributions of the source and target domains, so the network generates more accurate pseudo-labels in the next iteration. On the other hand, class confusion regularization speeds up convergence, so higher domain adaptation performance can be achieved within a limited number of iterations.
In the task of 17° → 45°, SPL-CCR, SPL and MCC achieve the top three classification accuracies, which are 91.87%, 84.28% and 82.10%, respectively. The accuracy of SPL-CCR is 9.77% higher than that of MCC. In the task of 45° → 17°, SPL-CCR, SPL and CDAN achieve the top three classification accuracies, which are 91.98%, 87.06% and 86.52%, respectively. The accuracy of SPL-CCR is 5.46% higher than that of CDAN. Note that JAN and CDAN perform worse in the tasks of 17° → 45° and 45° → 17° than in the other experimental setups, and JAN in particular performs the worst among all the methods. Under such a large domain discrepancy, hard-to-transfer examples with uncertain predictions may deteriorate the conditional adversarial adaptation procedure. Hence, features cannot be aligned by capturing the cross-covariance between feature representations and classifier predictions. In the task of 30° → 45°, SPL-CCR, CDAN and SPL achieve the top three classification accuracies, which are 96.49%, 93.23% and 90.47%, respectively. CDAN outperforms SPL, but its accuracy is 3.26% lower than that of SPL-CCR. In the other three tasks, our SPL-CCR method does not significantly outperform the other methods. This indicates that feature-based domain adaptation methods are sufficient to improve model generalization when the difference of depression angles is small. However, if the difference
Firstly, we visualize the features in the domain adaptation layer of each network. Here, the domain adaptation layer refers to the layer before the output layer. Figure 6 shows the visualization results. Blue and red dots represent the source domain samples and target domain samples, respectively. Figure 6a shows the feature distribution of the domain adaptation layer of ResNet18, which is trained using only the source domain data. The misalignment between the source and target domain samples is quite severe, which leads to poor classification performance. According to Fig. 6b-d, domain adaptation methods such as DDC, DaNN and CDAN can align samples from the source and target domains in the tasks of 17° → 30° and 30° → 45°. However, the source and target domain samples are not well aligned in the task of 17° → 45°. This indicates that the domain adaptation loss usually plays a regularization role in training and improves classification performance in the target domain by reducing domain discrepancy. However, it also degrades classification in the source domain and results in a non-negligible generalization error in the target domain. As for our SPL-CCR method (Fig. 6e), since it does not use metric criteria or adversarial learning for domain adaptation, distribution differences still exist. However, both the source and target domain data show strong separability, which indicates that our method is able to capture the data modality of the target domain and adapt to its data distribution.
Second, we visualize the features in the output layer of each network, as shown in Fig. 7. The feature distribution in the output layer directly reflects the generalization error in the source and target domains as well as the domain adaptation ability. According to Fig. 7, SPL-CCR extracts features that are class-discriminative in both domains over the three tasks. It should be pointed out that other domain adaptation methods such as DaNN and CDAN also adapt well in the 17° → 30° and 30° → 45° tasks. In the 30° → 45° task in particular, DaNN and CDAN align the source-domain and target-domain samples better than SPL-CCR does. In the 17° → 45° task, however, SPL-CCR aligns features better than DaNN and CDAN. Moreover, although SPL-CCR does not perfectly align the features from the source and target domains, each class remains distinguishable in both domains over the three tasks.

Domain discrepancy comparison
A-distance is often used to measure domain discrepancy in domain adaptation research. It is defined as

$$d_{\mathcal{A}}(D_s, D_t) = 2\,(1 - 2L_g),$$

where $D_s$ and $D_t$ denote the source-domain and target-domain samples, respectively, and $L_g$ is the generalization error of a binary classifier trained to distinguish the two sets of samples. Here we use a single-layer network with a sigmoid output as the binary classifier.
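The A-distance computation can be sketched in plain Python: label source samples 0 and target samples 1, train a single-layer sigmoid classifier on part of the pooled data, and plug its held-out error into $d_{\mathcal{A}} = 2(1 - 2L_g)$. This is a minimal sketch under stated assumptions: the 50/50 train/test split, the feature centering, the batch gradient-descent settings and all function names are hypothetical choices, not the authors' settings.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def _train_single_layer(xs, ys, epochs=300, lr=0.1):
    """Single-layer network with a sigmoid output, fitted by batch gradient
    descent on the binary cross-entropy loss (hypothetical settings)."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = _sigmoid(_logit(w, b, x)) - y  # dLoss/dlogit for BCE
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def proxy_a_distance(source, target, train_frac=0.5, seed=0):
    """d_A = 2 * (1 - 2 * L_g), with L_g the held-out error of a classifier
    separating source samples (label 0) from target samples (label 1)."""
    data = [(list(x), 0) for x in source] + [(list(x), 1) for x in target]
    random.Random(seed).shuffle(data)
    split = int(len(data) * train_frac)
    train, test = data[:split], data[split:]
    # Center the features with training-set statistics only.
    dim = len(train[0][0])
    mean = [sum(x[j] for x, _ in train) / len(train) for j in range(dim)]

    def center(x):
        return [xi - mi for xi, mi in zip(x, mean)]

    w, b = _train_single_layer([center(x) for x, _ in train], [y for _, y in train])
    wrong = sum((_logit(w, b, center(x)) >= 0) != (y == 1) for x, y in test)
    l_g = wrong / len(test)
    return 2.0 * (1.0 - 2.0 * l_g)


rng = random.Random(42)
src = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
far = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(100)]
same = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
print(proxy_a_distance(src, far))   # near 2: the domains are easy to tell apart
print(proxy_a_distance(src, same))  # near 0: the domains are indistinguishable
```

A larger A-distance therefore means the two feature distributions are easier to separate. The value is left unclipped, so it can dip slightly below zero when the held-out error exceeds 0.5 by chance.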
The A-distance results of different methods over all six tasks (17

Classification with limited samples
This subsection compares the classification performance of SPL-CCR with that of other methods when training samples are limited. Since the training data include both source-domain and target-domain data, $N_t$ samples per class are randomly selected as training data in each domain, with $N_t$ set to 10, 20 and 30. The test data consist of all samples of the target domain. The numbers of training and test samples in each task are given in Table 4.
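The limited-sample protocol can be sketched as class-balanced random subsampling in each domain. The sketch assumes that samples are `(image, label)` pairs and that target labels are used only to draw the balanced subset and are then discarded, since the target domain stays unlabelled for training; the function names and seeds are hypothetical.

```python
import random
from collections import defaultdict


def sample_per_class(samples, n_per_class, seed=0):
    """Randomly draw n_per_class (image, label) pairs from every class."""
    by_class = defaultdict(list)
    for image, label in samples:
        by_class[label].append((image, label))
    rng = random.Random(seed)
    subset = []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(pool)} samples")
        subset.extend(rng.sample(pool, n_per_class))
    return subset


def limited_sample_split(source, target, n_t, seed=0):
    """Training data for one limited-sample run; the test set is the whole target domain.

    Assumption: target labels only guide the balanced draw and are then dropped.
    """
    src_train = sample_per_class(source, n_t, seed)
    tgt_train = [image for image, _ in sample_per_class(target, n_t, seed + 1)]
    tgt_test = list(target)
    return src_train, tgt_train, tgt_test


# Toy example: 4 classes; run once for each N_t in (10, 20, 30).
source = [(f"s{i}", i % 4) for i in range(200)]
target = [(f"t{i}", i % 4) for i in range(120)]
for n_t in (10, 20, 30):
    src_train, tgt_train, tgt_test = limited_sample_split(source, target, n_t)
    print(n_t, len(src_train), len(tgt_train), len(tgt_test))
```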
Tables 5, 6 and 7 show the classification results of different methods over all six tasks when 10, 20 and 30 samples per class, respectively, are randomly selected from the source-domain and target-domain training sets. It can be seen that as the number of training samples increases, the classification

The classification results with limited samples are consistent with those in Sect. 4.3, where the training samples are more sufficient. Although the improvement is smaller, SPL-CCR still outperforms the other methods, which demonstrates its advantage under limited-sample conditions. The results also suggest that domain adaptation methods based on pseudo-labelling are effective for SAR target recognition with limited samples, since pseudo-labels supplement the scarce labelled data and help prevent the network from overfitting.
With very limited samples, SPL-CCR and SPL do not achieve the best performance on some tasks, e.g., 30° → 45° with 10 samples per class and 45° → 30° with 20 samples per class. This may be because mislabelling is more severe in limited-sample cases, which causes error accumulation and a drop in classification performance. Furthermore, Tables 5, 6 and 7 show that combining class confusion regularization with SPL noticeably improves accuracy in most tasks, which supports the effectiveness of class confusion regularization under limited-sample conditions.
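Class confusion regularization can be sketched as a penalty on how much probability mass a batch of target predictions spreads across pairs of different classes. The sketch below assumes the regularizer follows the Minimum Class Confusion (MCC) formulation, which the experiments also evaluate as a baseline: temperature-scaled softmax, entropy-based sample weights, a class correlation matrix $C = P^\top W P$, row normalization, and the mean off-diagonal mass. The temperature value and the function names are illustrative assumptions, not the authors' settings.

```python
import math


def _softmax(logits, temperature):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def class_confusion_loss(batch_logits, temperature=2.0):
    """MCC-style class confusion penalty for a batch of target-domain logits."""
    probs = [_softmax(row, temperature) for row in batch_logits]
    b, k = len(probs), len(probs[0])
    # Entropy-based sample weights: confident (low-entropy) predictions weigh more.
    entropies = [-sum(p * math.log(p + 1e-12) for p in row) for row in probs]
    raw = [1.0 + math.exp(-h) for h in entropies]
    weights = [b * r / sum(raw) for r in raw]
    # Class correlation matrix C = P^T W P, of shape k x k.
    conf = [[sum(weights[i] * probs[i][j] * probs[i][jj] for i in range(b))
             for jj in range(k)] for j in range(k)]
    # Category normalization: rescale every row of C to sum to one.
    conf = [[c / sum(row) for c in row] for row in conf]
    # Average between-class (off-diagonal) confusion over the k classes.
    return sum(conf[j][jj] for j in range(k) for jj in range(k) if j != jj) / k


confident = [[20.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 20.0]]
uncertain = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
print(class_confusion_loss(confident))  # close to 0
print(class_confusion_loss(uncertain))  # (k - 1) / k = 0.75 for k = 4
```

Added to the cross-entropy loss on labelled and pseudo-labelled data, such a term discourages the network from hedging between classes on target samples, which is one plausible mechanism for the reduced error accumulation described above.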

Conclusions
In this paper, we proposed a novel method for SAR target classification from the perspective of domain adaptation, to tackle the performance degradation caused by varying imaging conditions. A selective pseudo-labelling strategy based on the BT criterion and class confusion regularization is designed. Part of the target-domain samples are assigned pseudo-labels and added to the training set iteratively, so that information in the target-domain data can be learned directly. To counter the error accumulation of pseudo-labelling, a class confusion loss is introduced into the iterative process as a regularization term, which improves the network's adaptation to the target-domain samples. We conducted experiments on SAR images with different depression angles to explore the cross-domain classification capability of our method. Based on the MSTAR dataset, six configurations of depression angles are considered to construct the source-domain and target-domain data for target classification. The proposed SPL-CCR method achieved an average accuracy of 96.56% over all six tasks, which is significantly higher than those of the comparative methods such as ResNet18, DDC, DaNN, CDAN, MCC, JAN and MCD. The t-SNE feature visualization results show that the proposed method aligns features across the two domains well and extracts target-domain features with good separability. Besides, our method also shows its advantage under limited-sample conditions. At present, our work mainly focuses on the variation of depression angle; in the future, we will further study SAR ATR tasks with other types of imaging-condition variation.

Fig. 1
Fig. 1 SAR images of the 2S1 vehicle target at different depression angles.a Optical image.b SAR image with 15° depression angle.c SAR image with 17° depression angle.d SAR image with 30° depression angle.e SAR image with 45° depression angle

Fig. 2
Fig. 2 Illustration of the classifier and samples in two cases. a The source and target domains have the same distribution. b The distributions of the source and target domains are quite different

Fig. 5
Fig. 5 Optical and SAR images of four classes of targets in the MSTAR dataset

Fig. 8
Fig. 8 A-distance results of different methods over all six tasks

Table 1
Domain adaptation model based on pseudo-labelling
$X_s$: dataset in the source domain
$X_t$: dataset in the target domain
$X_l$: labelled training set, initialized as the source data $X_s$
$G$: classifier trained on $T$
$Q$: selective pseudo-labelling strategy
Initialization:
(1) Train the classifier $G$ on the labelled training set $X_l$
(2) Assign pseudo-labels to all the data in $X_t$ using the classifier $G$
Repeat:
(3) Select pseudo-labelled samples in $X_t$ according to $Q$
(4) Add the samples selected in (3) to $X_l$
(5) Retrain the classifier $G$
Until the iteration condition is met or the pseudo-labels no longer change
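The loop in Table 1 can be sketched with the BT criterion as the selection strategy $Q$. The sketch assumes that "relative confidence" is the margin between the two largest class probabilities, so samples with the largest margin are selected first, and that the selected fraction grows over a hypothetical schedule. To keep it self-contained, a nearest-centroid model stands in for the deep classifier $G$; that model, the schedule, the choice to rebuild $X_l$ from $X_s$ plus the currently selected samples (so earlier pseudo-labels can be revised) and all function names are illustrative assumptions.

```python
import math


def bt_score(probs):
    """Breaking-ties margin: gap between the two largest class probabilities."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def select_by_bt(prob_rows, ratio):
    """Indices of the ceil(ratio * n) samples with the largest BT margin."""
    n_keep = math.ceil(ratio * len(prob_rows))
    ranked = sorted(range(len(prob_rows)),
                    key=lambda i: bt_score(prob_rows[i]), reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical stand-in for the classifier G: a nearest-centroid model.
def fit_centroids(xs, ys):
    centroids = {}
    for label in sorted(set(ys)):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, xs):
    """Return (probability rows, hard labels) via softmax over -squared distance."""
    labels = sorted(centroids)
    prob_rows, hard = [], []
    for x in xs:
        scores = [-sum((a - c) ** 2 for a, c in zip(x, centroids[k])) for k in labels]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        probs = [e / sum(exps) for e in exps]
        prob_rows.append(probs)
        hard.append(labels[probs.index(max(probs))])
    return prob_rows, hard


def selective_pseudo_labelling(xs_src, ys_src, xs_tgt, schedule=(0.25, 0.5, 0.75, 1.0)):
    # Steps (1)-(2): train G on X_l = X_s, then pseudo-label all of X_t.
    model = fit_centroids(xs_src, ys_src)
    probs, pseudo = predict(model, xs_tgt)
    for ratio in schedule:
        # Steps (3)-(4): select confident target samples and add them to X_l.
        chosen = select_by_bt(probs, ratio)
        x_l = list(xs_src) + [xs_tgt[i] for i in chosen]
        y_l = list(ys_src) + [pseudo[i] for i in chosen]
        # Step (5): retrain G and refresh the pseudo-labels.
        model = fit_centroids(x_l, y_l)
        probs, new_pseudo = predict(model, xs_tgt)
        if new_pseudo == pseudo:  # stop once the pseudo-labels no longer change
            break
        pseudo = new_pseudo
    return model, pseudo


# Toy 1-D example: the target domain is the source domain shifted by +1.
xs_src = [(0.0,), (0.2,), (-0.2,), (4.0,), (4.2,), (3.8,)]
ys_src = [0, 0, 0, 1, 1, 1]
xs_tgt = [(1.0,), (1.3,), (0.8,), (5.0,), (5.2,), (4.9,)]
model, pseudo = selective_pseudo_labelling(xs_src, ys_src, xs_tgt)
print(pseudo)  # [0, 0, 0, 1, 1, 1]
```

Selecting by margin rather than by the top probability alone favours samples whose prediction is not nearly tied with a second class, which is the intent of using BT as a relative-confidence score.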

Table 2
Numbers of SAR images of different targets under different depression angles

Table 3
Classification results of different methods on MSTAR dataset

If the difference of depression angles is large, e.g., in the 17° → 45° or 45° → 17° tasks of our experiment, classic unsupervised domain adaptation methods fail to align the feature distributions well because of the large domain discrepancy. By contrast, SPL-CCR transfers the knowledge of the source domain to the target domain by iteratively generating pseudo-labels, and it performs better even when the difference of depression angles is large. For a qualitative comparison of different methods, we use the t-SNE technique to visualize the source-domain and target-domain features in the 17° → 30°, 17° → 45° and 30° → 45° tasks. SPL-CCR is compared with ResNet18, DDC, DaNN and CDAN, since they represent different types of methods and most of them achieve the highest accuracy in at least one task.

Table 4
Experimental data for classification with limited samples

Table 5
Classification results of different methods with 10 samples randomly selected for each class