Radar signal recognition based on triplet convolutional neural network

Recently, owing to the wide application of low probability of intercept (LPI) radar, many approaches for recognizing LPI radar signal modulations have been proposed. However, in the increasingly complex electromagnetic environment, most existing methods perform poorly at identifying different modulation types at low signal-to-noise ratio (SNR). This paper proposes an automatic recognition method for different LPI radar signal modulations. First, time-domain signals are converted into time-frequency images (TFIs) by the smooth pseudo-Wigner–Ville distribution (SPWVD). Then, these TFIs are fed into a designed triplet convolutional neural network (TCNN) to obtain high-dimensional feature vectors. In essence, TCNN is a CNN whose parameters are optimized with triplet loss during training. Triplet loss encourages the distance between samples of different classes to exceed the distance between samples with the same label, which improves the discriminability of TCNN. Finally, a fully connected neural network (FCNN) is employed as the classifier to recognize different modulation types. Simulations show that the overall recognition success rate (RSR) reaches 94% at −10 dB, indicating that the proposed method discriminates well between different LPI radar signal modulations, even at low SNR.

methods [7], frequency-domain methods [8,9], time-frequency domain methods [5,10], and transform-domain methods [11]. For the classifier, both traditional machine learning (ML) [12,13] and prevalent deep learning (DL) [14,15] are widely applied. DL methods in particular have attracted increasing attention recently because of their superb performance. Compared with other DL models such as the Stacked AutoEncoder (SAE) [16,17] and the Deep Belief Network (DBN) [18,19], CNN has been shown to perform better in many areas, such as time series prediction [20], target detection [21], and object identification [22,23]. Among these approaches, the combination of TFIs and CNN stands out. Compared with the single-domain methods mentioned above, time-frequency techniques are more robust to noise [24]. Meanwhile, employing a CNN as the encoder module removes the need for manual feature design, which makes the recognition process more reliable.
Existing recognition methods for LPI radar signal modulations are mostly based on time-frequency analysis and DL. Lunden and Koivunen [5] extracted a large set of features from the TFIs of radar signals and fed them into an MLP classifier. However, selecting these features requires prior information and human intervention. Zhang et al. [25] first explored an automatic recognition system for radar waveforms based on the Choi-Williams Distribution (CWD) and CNN, which required neither prior information nor manual intervention. For 8 kinds of radar waveforms (LFM, BPSK, Costas, Frank, and T1-T4), the overall RSR exceeded 93.7% when the SNR was greater than −2 dB. In [26], the authors also used CWD to process received radar signals and fed features extracted from the TFIs into an Elman neural network (ENN) for classification. Unlike [25], they included the P1, P2, P3, and P4 polyphase codes, expanding the set of recognizable waveforms. The overall RSR for 8 radar waveforms (LFM, BPSK, Costas, Frank code, and P1-P4 codes) was 94.7% at an SNR of −2 dB. However, their feature extraction still required manual design and was cumbersome. Guo and Chen [27] used an improved AlexNet to classify LPI radar signals and successfully classified 10 types of radar signals at −6 dB, including CW, NLFM, LFM, BPSK, Costas, Frank, and T1-T4. Their research not only expanded the types of identification but also achieved better results at a lower SNR. With the rise of transfer learning, an increasing number of methods have been explored for radar waveform recognition. Guo et al. [28] adopted a transferred CNN to recognize the TFIs of radar signals. By virtue of transfer learning, their system recognized radar waveforms with a small number of training samples, providing a solution when training samples are insufficient.
In addition, Xiao et al. [29] combined a feature fusion algorithm with transfer learning to achieve good recognition results at −4 dB. These methods have gradually improved the ability to recognize radar signals, and their progress encourages further radar signal identification methods to be explored and applied in electronic reconnaissance.
To improve classification accuracy at lower SNR, this paper proposes an automatic approach for accurately recognizing LPI radar signal modulations. The approach involves analyzing radar signals in the time-frequency domain, designing a feature encoder named TCNN, and constructing an FCNN as the classifier.
For simplicity, TCNN-FCNN is used to denote the proposed method in the following. The specific contributions of this paper are summarized as follows:
• This paper proposes a TCNN-FCNN structure to address LPI radar signal modulation recognition at low SNR. As an end-to-end model, TCNN-FCNN can identify different modulation types accurately even when the SNR is −10 dB, which provides a solid basis for further research on modulation recognition in complicated electromagnetic environments.
• The proposed method employs triplet loss for LPI radar modulation identification. By enforcing a margin between each positive pair and negative pair, triplet loss minimizes the distance between samples with the same label and maximizes the distance between samples with different labels. Experiments show that the discriminability of the model trained with triplet loss is effectively enhanced.
• Unlike other existing approaches, the proposed method emphasizes the role of the objective function in the training process, which may provide new ideas for LPI radar signal modulation recognition.
The rest of this paper is organized as follows. Section 2 presents the overall structure of the recognition system. Section 3 briefly introduces the groundwork of the proposed method, including the signal model and the SPWVD technique. Section 4 introduces the main methods of the system, including the specific structure of the models, triplet loss, and the t-Distributed Stochastic Neighbor Embedding (t-SNE) technique. Section 5 presents and analyzes the performance of the proposed recognition system. Finally, Sect. 6 concludes the paper.

System overview
In this section, the automatic recognition method for LPI radar signal modulations is described in detail. The structure of the system is shown in Fig. 1. First, all received LPI radar signals are converted into TFIs by SPWVD. Since SPWVD describes the distribution of signal energy over time and frequency on a two-dimensional plane, TFIs can reflect the distinctions between different modulation types of LPI radar signals, even at low SNR. Next, the signal dataset is split into a training dataset and a testing dataset. Then, a CNN is designed as the feature encoder to extract features automatically. Triplet loss plays a key role in the iterative training of this CNN: with its assistance, the features extracted from the same class are more concentrated in the high-dimensional space than features from different classes. Finally, the FCNN classifier is trained with cross-entropy loss to perform accurate multi-class classification. In addition, t-SNE is adopted to visualize the 2-D distribution of the high-dimensional feature vectors obtained by the designed TCNN, which verifies that triplet loss actually works in the identification process and provides visual evidence for the classification results.
Note that the two loss functions are used separately: triplet loss optimizes the parameters of the CNN encoder, and cross-entropy loss updates the parameters of the FCNN classifier. As a metric loss function, triplet loss aims to maximize within-class similarity and minimize between-class similarity, that is, to narrow the distance between intra-class samples and increase the distance between inter-class samples in the high-dimensional space. Accordingly, it is computed in the embedding space from the feature vectors extracted by the CNN encoder. Cross-entropy loss, by contrast, is a common classification loss computed by comparing the target labels with the predicted outputs of the last dense layer; by minimizing it, the FCNN learns to assign the correct label to each training sample. More details about triplet loss and cross-entropy loss are given in Sect. 4.2.
Fig. 1 Framework of the proposed method

Signal model
In general, the filtered radar signal $r(t)$ consists of the radar modulated signal $s(t)$ and additive white Gaussian noise (AWGN) $n(t)$ [26]. The corresponding signal model can be expressed as
$$ r(t) = s(t) + n(t) = A e^{j\phi(t)} + n(t) \tag{1} $$
where $A$ is the amplitude and $\phi(t)$ represents the modulation phase. For simplicity, we assume $A = 1$. Different SNR values are used to mimic the complexity of real application environments. The SNR in this paper is defined as
$$ \mathrm{SNR} = 10 \log_{10} \frac{E\left[\lVert s(t) \rVert_2^2\right]}{E\left[\lVert n(t) \rVert_2^2\right]} \ (\mathrm{dB}) \tag{2} $$
where $\lVert \cdot \rVert_2$ is the L2-norm, and $E[\lVert s(t) \rVert_2^2]$ and $E[\lVert n(t) \rVert_2^2]$ denote the means of $\lVert s(t) \rVert_2^2$ and $\lVert n(t) \rVert_2^2$, respectively.

Smooth pseudo-Wigner-Ville distribution
As a member of Cohen's class of time-frequency distributions, SPWVD applies smoothing in both the time and frequency domains. Therefore, it can suppress cross-term interference distributed along both the time axis and the frequency axis [30]. The SPWVD of the received signal can be written as
$$ \mathrm{SPWVD}_r(t, f) = \iint g(u)\, h(\tau)\, r\left(t - u + \frac{\tau}{2}\right) r^{*}\left(t - u - \frac{\tau}{2}\right) e^{-j 2 \pi f \tau}\, du\, d\tau \tag{3} $$
where $^{*}$ denotes the complex conjugate, $r(t)$ is the complex signal received by the radar given in Eq. 1, and $t$ and $f$ represent the time and frequency variables, respectively. The product $g(u)h(\tau)$ is the separable kernel function of SPWVD, in which $g$ and $h$ are independent low-pass windows: $g$ smooths along time, while $h$ acts on the time delay $\tau$ and thereby smooths along frequency. For the TFI generated by SPWVD, cross-term interference is suppressed at the cost of reduced time-frequency concentration; that is, the smoothing operation lowers the time-frequency resolution and loses some useful information. To keep the time-frequency concentration and resolution of TFIs high, the window function must be selected properly. In this paper, we choose the Gaussian window as the smoothing filter, since it has no negative sidelobes and no sidelobe fluctuation, so spectral energy leakage can be suppressed to a certain extent.
Another critical parameter of SPWVD is the window length. Several related works [31][32][33] address parameter selection for time-frequency distributions. Inspired by [31], we define three levels of window length (small, medium, and large) and use 33, 133, and 233 as their concrete values, respectively. Figure 2 shows TFIs generated by SPWVD under different combinations of window lengths, where $L_g$ and $L_h$ denote the lengths of the Gaussian windows $g$ and $h$. As shown in Fig. 2, energy leakage occurs in Fig. 2a, d, g, which demonstrates that severe energy leakage exists in TFIs when $L_h$ is small. Meanwhile, as $L_g$ increases, the time resolution becomes worse, so that some useful information cannot be displayed in TFIs. This is verified by Fig. 2e … Fig. 2c is larger than Fig. 2b, which results in a lower frequency resolution in Fig. 2c. Therefore, to trade off low energy leakage against high resolution, the combination $L_g = 33$ and $L_h = 133$ is chosen in this paper. In fact, the window lengths are not strictly restricted: any parameters that fully reflect the mutation features of the signals without severe spectral energy leakage in the TFIs are acceptable.
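As an illustration of the two windows discussed above, the following NumPy sketch computes a discrete SPWVD in the common time-smoothing/lag-window form: $g$ averages the local autocorrelation over neighbouring time instants, and $h$ weights the time delay before an FFT over the delay. It is not the authors' implementation. The Gaussian shape (value 0.005 at the window edges), the helper names `gaussian_window` and `spwvd`, and the FFT size `n_freq` are assumptions; only the window lengths 33 and 133 come from the text.

```python
import numpy as np


def gaussian_window(length, end_value=0.005):
    """Symmetric Gaussian window (odd length assumed): 1 at the centre, end_value at both ends."""
    x = np.linspace(-1.0, 1.0, length)
    return np.exp(np.log(end_value) * x ** 2)


def spwvd(x, n_freq, g, h):
    """Discrete smoothed pseudo-Wigner-Ville distribution (illustrative sketch).

    x: complex (analytic) signal of length N
    g: time-smoothing window of odd length L_g
    h: lag window of odd length L_h (controls frequency smoothing)
    Returns a real array of shape (n_freq, N); row k corresponds to the
    normalized frequency k / (2 * n_freq) cycles per sample.
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    Lg, Lh = (len(g) - 1) // 2, (len(h) - 1) // 2
    g = np.asarray(g, dtype=float) / np.sum(g)   # unit-gain time smoothing
    p = np.arange(-Lg, Lg + 1)                   # time offsets covered by g
    tau_max = min(Lh, n_freq // 2 - 1)           # keep +tau and -tau in distinct FFT bins
    tfr = np.empty((n_freq, N))
    for t in range(N):
        R = np.zeros(n_freq, dtype=complex)
        for tau in range(-tau_max, tau_max + 1):
            i1, i2 = t - p + tau, t - p - tau
            ok = (i1 >= 0) & (i1 < N) & (i2 >= 0) & (i2 < N)
            # Time-smoothed local autocorrelation x[t-p+tau] x*[t-p-tau], weighted by h(tau)
            R[tau % n_freq] = h[Lh + tau] * np.sum(g[ok] * x[i1[ok]] * np.conj(x[i2[ok]]))
        # R is Hermitian in tau (g and h are real and symmetric), so its FFT is real up to rounding
        tfr[:, t] = np.fft.fft(R).real
    return tfr
```

With the window lengths chosen in the text, a call such as `spwvd(x, 256, gaussian_window(33), gaussian_window(133))` returns a 256 × N image. The explicit double loop keeps the correspondence with the time-lag definition visible at the cost of speed; a production version would vectorize it.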
As shown in Fig. 3, each TFI clearly describes how the instantaneous frequency of the signal changes over time, and different TFIs intuitively reflect different modulation types. Therefore, it is feasible and dependable to recognize different modulation types from TFIs.

Structure of designed models
In this section, the architecture of the encoder module TCNN presented in Fig. 4a and the classifier FCNN shown in Fig. 4b will be introduced in detail.
As shown in Fig. 4a, the encoder module has 2 convolutional blocks, each consisting of a convolutional layer, a batch normalization layer, an activation function, and a pooling layer. The convolutional layer in Conv Block 1 has 128 kernels of size 3 × 3 and extracts feature maps from the TFIs. To reduce internal covariate shift, avoid vanishing gradients, and accelerate convergence, a batch normalization layer [34] is added; it normalizes the inputs of the activation units to zero mean and unit variance over each mini-batch and then applies a learnable scale and shift. The rectified linear unit (ReLU) is adopted as the activation function to provide nonlinearity and alleviate overfitting. To retain the major features and reduce the complexity of the network, a max-pooling layer with a kernel size of 2 × 2 is employed. Conv Block 2 has the same structure as Conv Block 1, except that its convolutional layer has 64 kernels. A dense layer with ReLU activation then integrates the learned "distributed features." After forward propagation, a 128-dimensional feature vector is obtained for each input TFI. During back-propagation, triplet loss is employed as the objective function; it is explained in detail in Sect. 4.2. We choose Adam [35] instead of traditional stochastic gradient descent (SGD) to minimize triplet loss, because it requires only first-order gradients, is computationally efficient, and has low memory requirements.
Fig. 3 TFIs of different modulation types
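The paper does not state the TFI size H × W, the convolution padding, or whether the feature maps are flattened before the dense layer. Under hypothetical choices for these (64 × 64 × 3 inputs, "same" padding, and flattening), the pure-Python sketch below traces the output shape and trainable-parameter count of each stage of the encoder in Fig. 4a. The helper names `conv_block` and `encoder_summary` are illustrative only.

```python
def conv_block(h, w, c_in, c_out, k=3, pool=2):
    """Shape/params of one block: k x k conv ('same' padding assumed) -> BN -> ReLU -> max-pool."""
    conv_params = k * k * c_in * c_out + c_out  # kernel weights + biases
    bn_params = 2 * c_out                       # learnable scale (gamma) and shift (beta)
    # ReLU and max pooling have no trainable parameters; pooling halves H and W (floor).
    return (h // pool, w // pool, c_out), conv_params + bn_params


def encoder_summary(h, w, c=3, embed_dim=128):
    """Trace output shapes and trainable parameter counts through the assumed encoder."""
    rows = []
    for name, c_out in (("conv_block_1", 128), ("conv_block_2", 64)):
        (h, w, c), n_params = conv_block(h, w, c, c_out)
        rows.append((name, (h, w, c), n_params))
    flat = h * w * c                            # feature maps flattened before the dense layer
    rows.append(("dense_embedding", (embed_dim,), flat * embed_dim + embed_dim))
    return rows


# Hypothetical 64 x 64 RGB TFIs:
for row in encoder_summary(64, 64):
    print(row)
# ('conv_block_1', (32, 32, 128), 3840)
# ('conv_block_2', (16, 16, 64), 73920)
# ('dense_embedding', (128,), 2097280)
```

Under these assumptions, the dense embedding layer holds most of the roughly 2.18 million trainable parameters, so the TFI resolution fed to the encoder largely determines the model size.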
As illustrated in Fig. 4b, the FCNN model consists entirely of dense layers. Dense 1 and Dense 2 have 128 and 10 neurons, respectively. Dense 1 uses ReLU as the activation function, and Dense 2 uses the softmax function to perform multi-class classification. The FCNN is also optimized with Adam, but cross-entropy loss is employed as the objective function during back-propagation.
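A minimal NumPy sketch of the classifier's forward pass (Dense 1 with 128 ReLU units, Dense 2 with 10 softmax units) and of the cross-entropy objective that trains it is given below. The He-normal initialization in `init_params` is a hypothetical choice, since the paper does not specify one, and the Adam update itself is omitted; in the described scheme the 128-D inputs would come from the already-trained TCNN encoder.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)       # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fcnn_forward(f, params):
    """f: (batch, 128) embeddings; returns (batch, 10) class probabilities."""
    W1, b1, W2, b2 = params
    h = np.maximum(f @ W1 + b1, 0.0)           # Dense 1: 128 units, ReLU
    return softmax(h @ W2 + b2)                # Dense 2: 10 units, softmax


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability of the true class (integer labels)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + eps)))


def init_params(rng, d_in=128, d_hidden=128, n_classes=10):
    # Hypothetical He-normal initialization; the paper does not specify one.
    W1 = rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_hidden))
    W2 = rng.normal(0.0, np.sqrt(2.0 / d_hidden), size=(d_hidden, n_classes))
    return W1, np.zeros(d_hidden), W2, np.zeros(n_classes)
```

A classifier that outputs uniform probabilities over the 10 classes has a cross-entropy of ln 10 ≈ 2.30, which is a useful sanity check for the starting loss.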

Triplet loss
Two different objective functions are mentioned in Sect. 4.1. Both triplet loss [36][37][38][39] and cross-entropy loss are widely used in deep neural networks. Cross-entropy loss is usually employed in multi-class classification tasks [40]. In the high-dimensional embedding space, cross-entropy loss aims to project samples with the same label to the same region and map samples with different labels to other regions. However, it does not take the distance between different classes into account [41]. This may lead to an undesirable situation in which the distance between samples with the same label, $d_{\mathrm{intra}}$, is larger than the distance between samples of different classes, $d_{\mathrm{inter}}$. The difference between triplet loss and cross-entropy loss is shown in Fig. 5, where the same shape represents the same class and different colors represent different samples of each class. Figure 5a illustrates … This undesirable situation can be addressed by using triplet loss as the objective function to optimize the model; its effect is displayed in Fig. 5c. Specifically, triplet loss updates the parameters of the model by enforcing a margin between each pair of samples from one class and all samples from other classes [36]. It therefore not only minimizes $d_{\mathrm{intra}}$ but also maximizes $d_{\mathrm{inter}}$.
More specifically, as shown in Fig. 4a, TCNN maps the input TFIs into a high-dimensional Euclidean space with the embedding function $M_\theta: \mathbb{R}^{H \times W \times 3} \rightarrow \mathbb{R}^{D}$, where $\theta$ denotes the parameters of the encoder module. Each TFI of size $H \times W$ is represented by the encoder as a $D$-dimensional feature vector $f_i \in \mathbb{R}^{D}$, $i = 1, 2, \ldots, m$ ($D = 128$ in this paper), where $f_i$ is the output of TCNN.
Among all these $f_i$, an anchor feature vector $f_i^a$ is chosen randomly. Then, a positive feature vector $f_i^p$ with the same label as $f_i^a$ and a negative feature vector $f_i^n$ with a different label are selected to construct a valid triplet. For each anchor $f_i^a$, triplet loss should ensure that $f_i^a$ is closer to every positive $f_i^p$ than to any negative $f_i^n$. The main purpose of triplet loss is to satisfy the following condition:
$$ \lVert f_i^a - f_i^p \rVert_2^2 + \alpha < \lVert f_i^a - f_i^n \rVert_2^2, \quad \forall \left(f_i^a, f_i^p, f_i^n\right) \in \mathcal{T} \tag{4} $$
where $\alpha$ is the margin enforced between positive and negative pairs and $\mathcal{T}$ is the set of all valid triplets. The objective function of triplet loss can be written as:
$$ L_{\mathrm{tri}} = \sum_{i} \max\left( \lVert f_i^a - f_i^p \rVert_2^2 - \lVert f_i^a - f_i^n \rVert_2^2 + \alpha,\; 0 \right) \tag{5} $$
In summary, with the assistance of triplet loss, the discriminative ability of the encoder module is effectively enhanced during training.
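The hinge form of triplet loss and the random triplet construction described above can be sketched in NumPy as follows. The margin value 0.2 is an assumption (the paper does not report its margin), distances are squared Euclidean, and anchors are drawn uniformly at random as the text describes; practical systems often use harder (for example semi-hard) triplet mining instead. In a real pipeline, a deep-learning framework would back-propagate this loss through the encoder; the function names here are hypothetical.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Sum over triplets of max(||a - p||^2 - ||a - n||^2 + margin, 0); inputs are (k, D)."""
    d_ap = np.sum((anchor - positive) ** 2, axis=1)   # squared anchor-positive distances
    d_an = np.sum((anchor - negative) ** 2, axis=1)   # squared anchor-negative distances
    return float(np.sum(np.maximum(d_ap - d_an + margin, 0.0)))


def sample_triplets(labels, n_triplets, rng):
    """Randomly draw (anchor, positive, negative) index triples from labeled samples."""
    labels = np.asarray(labels)
    idx = np.arange(len(labels))
    # An anchor is valid only if its class has another member and some other class exists.
    valid = [a for a in idx if np.sum(labels == labels[a]) > 1 and np.any(labels != labels[a])]
    if not valid:
        raise ValueError("need at least two classes and two samples in some class")
    triplets = []
    for _ in range(n_triplets):
        a = rng.choice(valid)
        p = rng.choice(idx[(labels == labels[a]) & (idx != a)])
        n = rng.choice(idx[labels != labels[a]])
        triplets.append((a, p, n))
    return np.array(triplets)
```

Given encoder outputs `f` of shape (m, 128) and labels `y`, `t = sample_triplets(y, 256, rng)` followed by `triplet_loss(f[t[:, 0]], f[t[:, 1]], f[t[:, 2]])` gives the loss of one batch of triplets. Triplets that already satisfy the margin contribute zero, so only violating triplets drive the update.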

Visualization by t-SNE
To further demonstrate the effect of triplet loss and provide an intuitive explanation for the classification results in subsequent experiments, t-SNE is adopted as the visualization tool in this paper. The basic theory of t-SNE is discussed in this section. t-SNE [42][43][44] is a variant of the Stochastic Neighbor Embedding (SNE) technique [45,46]. It visualizes high-dimensional data by assigning each datapoint a location in a two- or three-dimensional space [42].
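Compared with SNE, t-SNE employs a Student t-distribution in the low-dimensional space instead of a Gaussian distribution. Since the Student t-distribution is closely related to the Gaussian distribution but has much heavier tails, it can alleviate the crowding problem to some extent. The principle of t-SNE is as follows:
• For high-dimensional feature vectors $f_1, f_2, \ldots, f_m$, t-SNE converts the Euclidean distance between $f_i$ and $f_j$ into a joint probability $p_{ij}$ based on a Gaussian distribution:
$$ p_{j|i} = \frac{\exp\left(-\lVert f_i - f_j \rVert_2^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert f_i - f_k \rVert_2^2 / 2\sigma_i^2\right)}, \qquad p_{ij} = \frac{p_{j|i} + p_{i|j}}{2m} \tag{6} $$
where $\sigma_i$ is the standard deviation of the Gaussian centered on $f_i$ (chosen so that the conditional distribution has a user-specified perplexity) and $p_{ii} = 0$.
• In the low-dimensional space, a similar probability $q_{ij}$ is computed using a Student t-distribution with a single degree of freedom:
$$ q_{ij} = \frac{\left(1 + \lVert m_i - m_j \rVert_2^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert m_k - m_l \rVert_2^2\right)^{-1}} \tag{7} $$
where $m_i$ and $m_j$ are the low-dimensional mapping points of the high-dimensional feature vectors $f_i$ and $f_j$.
• t-SNE seeks the low-dimensional representation for which $q_{ij}$ matches $p_{ij}$ as closely as possible. The objective function of t-SNE is shown in Eq. 8:
$$ C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i} \sum_{j \neq i} p_{ij} \log \frac{p_{ij}}{q_{ij}} \tag{8} $$
where $\mathrm{KL}(P \,\Vert\, Q)$ denotes the Kullback-Leibler divergence between $P$, the joint probability distribution over the high-dimensional feature vectors, and $Q$, the joint probability distribution over the low-dimensional mapping points.
• By minimizing Eq. 8, t-SNE finds the optimal low-dimensional representation.
The gradient of Eq. 8 is given by
$$ \frac{\partial C}{\partial m_i} = 4 \sum_{j} \left(p_{ij} - q_{ij}\right)\left(m_i - m_j\right)\left(1 + \lVert m_i - m_j \rVert_2^2\right)^{-1} \tag{9} $$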
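To make these steps concrete, the NumPy sketch below evaluates the high-dimensional joint probabilities, the Student-t probabilities, the KL objective, and its gradient for a given embedding; the rows of `Y` play the role of the mapping points. For brevity it uses one shared bandwidth `sigma` for every point, whereas t-SNE proper tunes each per-point bandwidth by binary search to reach the target perplexity. The function names are hypothetical. In practice, `sklearn.manifold.TSNE(perplexity=30)` provides a full implementation, including the optimization loop.

```python
import numpy as np


def sq_dists(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    s = np.sum(X ** 2, axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * X @ X.T, 0.0)


def joint_p(F, sigma=1.0):
    """Symmetric joint probabilities p_ij from high-dimensional features F of shape (m, D).

    Simplification: one shared sigma instead of a per-point sigma_i tuned to a perplexity.
    """
    m = F.shape[0]
    logits = -sq_dists(F) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)             # p_{i|i} = 0
    logits -= logits.max(axis=1, keepdims=True)   # stabilize the row-wise normalization
    cond = np.exp(logits)
    cond /= cond.sum(axis=1, keepdims=True)       # row i holds p_{j|i}
    return (cond + cond.T) / (2.0 * m)            # p_ij, sums to 1


def joint_q(Y):
    """Student-t (one degree of freedom) probabilities q_ij for low-dimensional points Y."""
    num = 1.0 / (1.0 + sq_dists(Y))
    np.fill_diagonal(num, 0.0)
    return num / num.sum(), num


def kl_and_grad(P, Y):
    """KL(P || Q) and its gradient with respect to the low-dimensional points Y."""
    Q, num = joint_q(Y)
    mask = P > 0
    kl = float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))
    W = (P - Q) * num                             # (p_ij - q_ij) (1 + ||y_i - y_j||^2)^-1
    grad = 4.0 * (W.sum(axis=1, keepdims=True) * Y - W @ Y)
    return kl, grad
```

A plain gradient-descent step `Y -= lr * grad` would already reduce the KL divergence; full t-SNE adds momentum and early exaggeration on top of this gradient.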

Experiments and analysis
To evaluate the performance of the proposed TCNN-FCNN method, some experiments and analyses are presented in this section.

Dataset
The dataset includes the 10 LPI radar signal modulations mentioned in Sect. 3.3. The parameters of the simulated signals vary over specified ranges to verify the generalization performance of the designed framework; the corresponding parameters are listed in Table 1.
For each class and each SNR, there are 1000 samples in the dataset. We randomly choose 800 samples of each class as the training dataset $D_{\mathrm{train}}$ and use the rest as the testing dataset $D_{\mathrm{test}}$. Besides, 11 SNR values, ranging from −12 to 8 dB in steps of 2 dB, are used to mimic different situations. In total, 110,000 simulated signals (10 classes × 11 SNR values × 1000 samples) are used for the subsequent training and testing.
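The SNR sweep and per-class split described above can be sketched as follows. This is not the authors' simulation code: it assumes complex white Gaussian noise whose power is set from the mean signal power according to the SNR definition of the signal model, the helper names `add_awgn` and `split_per_class` are hypothetical, and the modulation parameters of Table 1 are not reproduced.

```python
import numpy as np

SNRS_DB = np.arange(-12, 9, 2)  # -12, -10, ..., 8 dB: the 11 SNR levels


def add_awgn(s, snr_db, rng):
    """Add complex white Gaussian noise so that 10*log10(P_s / P_n) equals snr_db."""
    p_signal = np.mean(np.abs(s) ** 2)           # mean power of the clean signal
    p_noise = p_signal / 10 ** (snr_db / 10)     # required noise power
    # Split the noise power equally between the real and imaginary parts.
    noise = np.sqrt(p_noise / 2) * (rng.standard_normal(s.shape)
                                    + 1j * rng.standard_normal(s.shape))
    return s + noise


def split_per_class(labels, n_train, rng):
    """Randomly assign n_train indices of every class to training and the rest to testing."""
    labels = np.asarray(labels)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(train), np.array(test)
```

At one SNR level, `split_per_class(np.repeat(np.arange(10), 1000), 800, rng)` yields 8000 training and 2000 testing indices, matching the 800/200 split per class.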

Feasibility experiments
The feasibility and validity of the proposed TCNN-FCNN method are verified by several experiments in this section. First, to check whether the encoder module can extract representative features from the input TFIs, we randomly choose a single TFI of each modulation type and display several of its feature maps in Fig. 6. Most intermediate feature maps generated by TCNN are highly similar to the input TFIs, which suggests that these features are sufficient to identify different LPI radar signal modulations and that the TCNN encoder module is effective.
To show the difference between triplet loss and cross-entropy loss more intuitively, we employ t-SNE to visualize the distribution of the 128-D feature vectors in 2-D space, as displayed in Fig. 7. At SNR = 8 dB, 200 samples of the 10 LPI radar modulations (20 samples per class) are shown in Fig. 7a, b. The parameters of t-SNE are set as follows: the perplexity is 30 and the number of iterations is 5000. Compared with Fig. 7a, samples with the same label in Fig. 7b are highly aggregated and samples with different labels are far from each other. This means that the CNN trained with triplet loss is more discriminative than that trained with cross-entropy loss, and that triplet loss is feasible for the recognition of LPI radar signal modulations.

Results and discussions
To evaluate the performance of the proposed approach, several methods are compared in the following experiments. Figure 8 presents the RSR-versus-SNR curves of these methods. In the legend, TCNN-FCNN (red curve) represents our proposed method. CNN-FCNN (blue curve) has the same structure as TCNN-FCNN, except that cross-entropy loss is the only loss function used to update the parameters of the CNN encoder and the FCNN classifier. In addition, three other methods are included: Lunden (dashed magenta curve) proposed in [5], Zhang (dotted green curve) proposed in [25], and Guo (dash-dot cyan curve) proposed in [28]. Figure 8 conveys several important messages. First, TCNN-FCNN has a striking advantage over Lunden, meaning that the features designed in Lunden's method are not applicable to all signal classes; that is, extracting features automatically with the TCNN encoder is more reliable. Second, both TCNN-FCNN and CNN-FCNN are superior to Zhang. Since TCNN-FCNN and CNN-FCNN share the same network structure, this implies that the model structure designed in this paper is appropriate and effective. Third, Guo loses its advantage when the SNR drops below −4 dB, which means that the transferred network does not perform well at lower SNR. Last, the gap between CNN-FCNN and TCNN-FCNN widens as the SNR decreases from −2 to −10 dB, meaning that the strength of triplet loss is highlighted at lower SNR. In summary, compared with the other methods, the proposed TCNN-FCNN performs better, especially at lower SNR.
Besides, additional experiments are provided for an in-depth analysis of the overall RSR shown in Fig. 8. Since Zhang and Lunden perform poorly, they are omitted from the following experiments. Table 2 uses the macro $F_1$-score to evaluate the performance of TCNN-FCNN, CNN-FCNN, and Guo. Based on Fig. 8, we focus on SNRs below 0 dB, because the three methods perform almost identically when the SNR is above 0 dB. According to Table 2, the result of Guo deteriorates steadily from −4 to −12 dB. Therefore, considering both the RSR and the macro $F_1$-score, Guo is more suitable when the SNR is higher than −4 dB. In contrast, the macro $F_1$-score of TCNN-FCNN exceeds 0.9 when the SNR is higher than −10 dB. The gap between TCNN-FCNN and CNN-FCNN widens, especially at −8 dB and −10 dB. To investigate this phenomenon, confusion matrices are displayed in Fig. 9 to examine the classification details of TCNN-FCNN and CNN-FCNN.
Figure 9 shows the confusion matrices of TCNN-FCNN and CNN-FCNN at −8 dB and −10 dB. Since these SNRs are outside Guo's best range of application, Guo is not analyzed in the following experiments. According to Fig. 9, CNN-FCNN does not perform well in recognizing BPSK at −8 dB and −10 dB, and it fails completely on T1 at −10 dB, where most T1 samples are classified as T3 or other classes. For TCNN-FCNN, although the RSR decreases at −8 dB and −10 dB, most signals are still identified correctly. This means that TCNN-FCNN is effective for all classes, even when the SNR is −10 dB. This is demonstrated more clearly by visualizing the feature vectors with t-SNE in Fig. 10. Figure 10 not only explains the classification results more intuitively, but also emphasizes the effectiveness of triplet loss through the comparison between TCNN-FCNN and CNN-FCNN at −10 dB. In Fig. 10a, most samples of T1 and T3 are mixed and difficult to distinguish, consistent with the result in Fig. 9c. Some samples of NS and BPSK form a separate cluster, which increases the probability of misjudgment. In contrast, the boundaries between classes are clear in Fig. 10b, which means that most testing samples will be recognized correctly. A few misplaced samples are acceptable given the SNR. The slight overlap between LFM and Frank, and between T1 and T3, is also consistent with their recognition results in Fig. 9d. Moreover, the features extracted by TCNN (Fig. 10b) show higher within-class similarity and lower between-class similarity, verifying that triplet loss optimizes the parameters of the encoder module better than cross-entropy loss.
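Table 2 and Fig. 9 rely on macro $F_1$-scores and confusion matrices. As a reminder of how these quantities relate, the NumPy sketch below builds a confusion matrix (rows: true class, columns: predicted class), computes the macro $F_1$-score as the unweighted mean of per-class $F_1$ scores, and computes the overall RSR as the fraction of correct decisions. These definitions are the usual ones and are assumed rather than taken from the paper; the function names are hypothetical, and `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same macro score.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def macro_f1(cm):
    """Unweighted mean of per-class F1 scores; undefined precision/recall counts as 0."""
    tp = np.diag(cm).astype(float)
    pred_totals = cm.sum(axis=0)   # TP + FP per class
    true_totals = cm.sum(axis=1)   # TP + FN per class
    precision = np.divide(tp, pred_totals, out=np.zeros_like(tp), where=pred_totals > 0)
    recall = np.divide(tp, true_totals, out=np.zeros_like(tp), where=true_totals > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(f1.mean())


def overall_rsr(cm):
    """Overall recognition success rate: fraction of correctly classified samples."""
    return float(np.trace(cm) / cm.sum())
```

Because every class counts equally in the macro average, a single collapsed class (such as T1 for CNN-FCNN at −10 dB) lowers the macro $F_1$-score more visibly than it lowers the overall RSR.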
In summary, the TCNN-FCNN proposed in this paper has a strong discriminative ability even in harsh low-SNR environments. This is supported quantitatively by the RSR and macro $F_1$-score, and intuitively by the confusion matrices and t-SNE visualizations.

Conclusion
An automatic recognition method named TCNN-FCNN is proposed in this paper to recognize 10 different modulations of LPI radar signals. Unlike other existing methods, the proposed method pays particular attention to the objective function used in optimization, which provides a new way to recognize LPI radar signal modulations. Simulation results show that the RSR is 0.94 at −10 dB and close to 1 when the SNR is greater than −4 dB, indicating that TCNN-FCNN performs remarkably well, especially at low SNR. The results also show that training the encoder with triplet loss yields more discriminative features than training with cross-entropy loss, which improves the classification of different LPI radar modulations, particularly in harsh conditions. Successful recognition of LPI radar signal modulations lays the groundwork for subsequent tracking, localization, and interference. Therefore, the proposed method has significant application value in electronic reconnaissance systems.