Time-Domain Anti-Interference Method for Ship Radiated Noise Signal

Ship radiated noise is one of the most important cues for detecting and identifying ships, and emitting interference noise to mask a ship's own radiated noise is a common countermeasure. In this paper, we use signal enhancement to recover the ship radiated noise signal at extremely low signal-to-noise ratio (SNR), and thereby resist interference from explosion signals. We propose a deep learning model that enhances the ship radiated noise signal by learning a mask in the time domain. Our approach is an encoder–decoder structure with a U-net built from 1d-conv layers with skip connections. To improve the learning ability of the model, we connect several U-nets directly in series. To let the model learn temporal information, we adopt the Transformer attention mechanism. We propose a combined loss function of the scale-invariant source-to-noise ratio (SI-SNR) and the mean squared error (MSE) in the time domain. Finally, we conduct experiments on actually collected data. The results verify that our algorithm can raise the signal-to-noise ratio of the ship radiated noise signal to 2 dB from extremely low input signal-to-noise ratios of −20 dB to −25 dB.

interfering signals, an intuitive idea is that if we can effectively enhance the target signal, we can eliminate the influence of the interfering signals and thereby achieve anti-interference. Therefore, the anti-interference problem for ship radiated noise can be transformed into a signal enhancement problem. This paper focuses on single-channel enhancement of ship radiated noise signals at low signal-to-noise ratio.
The traditional signal enhancement methods mainly include spectral subtraction, Wiener filtering, mask-based methods and so on [1][2][3]. In tests and simulations on our data, we found that the traditional methods perform very poorly on ship radiated noise at extremely low signal-to-noise ratio. The specific simulation results are shown in the experiment section. This paper therefore focuses on data-driven deep learning methods for signal enhancement. Because deep learning depends on massive data, current signal enhancement methods are concentrated in the field of speech enhancement [4][5][6][7][8][9][10][11][12][13]. Among them, single-channel time-domain signal enhancement based on deep learning establishes the mapping between the mixture signal and the enhanced signal through a deep learning model and realizes the enhancement directly in the time domain [14][15][16][17][18][19][20][21][22][23]. Working in the time domain has certain advantages. It avoids the computation of converting between the time domain and the frequency domain. Moreover, through training, the model can learn to extract better signal features, and features obtained through data-driven training may be more suitable for the current signal enhancement task. In addition, a time-frequency representation imposes a requirement on time resolution, which the time-domain method avoids; in theory, signal enhancement in the time domain can use any resolution.
Recently, end-to-end signal enhancement in the time domain has shown state-of-the-art results in a variety of signal enhancement tasks [24][25][26][27][28][29][30][31][32]. WaveNet, a CNN originally used for the text-to-speech problem, has also been applied to time-domain signal enhancement [33,34]. Conv-TasNet demonstrates the superiority of signal enhancement in the time domain [35]. Dual-Path RNN builds a bidirectional RNN deep learning model to realize the modeling and learning of long-sequence signals [36]. In order to improve the learning ability of the model and effectively obtain sequence and local signal features, WaveCRN establishes a flexible signal enhancement strategy by combining CNN and LSTM [37]. In addition to adopting different deep learning paradigms, some other deep learning techniques are applied to the time-domain signal enhancement problem. For example, various attention mechanisms, such as self-attention and the transformer, are combined in the model [38,39].
In this paper, we propose a deep learning model for signal enhancement in the time domain, aimed mainly at the enhancement of ship radiated noise signals. Our method is inspired by Conv-TasNet. It is an encoder-decoder architecture with a U-net, where the U-net is built from 1d-conv layers and combined with the transformer attention mechanism. In this paper, the enhancement target is the ship radiated noise signal, and the interference is mainly an explosion signal. The final experiments show that our method works well. The contributions of this paper are twofold. First, we propose the use of deep learning for anti-interference of ship radiated noise signals for the first time, and verify that it is feasible to use the deep learning paradigm for signal enhancement of ship radiated noise. Second, our experiments set the signal-to-noise ratio at an extremely low level, and the final experimental results show that the ship radiated noise signal can still be effectively enhanced.
The rest of this paper is organized as follows: Sect. 2 defines the problem of signal enhancement of ship radiated noise. Section 3 presents the proposed method and related technical details. Section 4 presents the experimental results and analysis. Section 5 concludes the paper.

Problem definition
In this paper, we focus on the enhancement of ship radiated noise signals, whose application scenarios differ from general speech enhancement, so we first state the problem. Let the ship radiated noise signal be s and the interference signal be n. The interfered signal can then be expressed as:
$$y = s + n$$
where $\{y, s, n\} \in \mathbb{R}^{1 \times m}$ and m is the number of sampling points of the signal. We want to use signal enhancement to reduce the influence of the interference in the received signal, which has a very low signal-to-noise ratio, and obtain from y a signal ŷ with a higher signal-to-noise ratio:
$$\hat{y} = f_\theta(y)$$
where $f_\theta$ is the deep learning model with parameters θ.

Proposed method
The method proposed in this paper is inspired by Conv-TasNet. It adopts an end-to-end signal enhancement strategy and builds a 1dconv-Unet model based on the encoder-decoder structure. Our idea is to split the input signal into frames and reduce it through the input block, train the 1dconv-Unet model to obtain a mask for the signal [40], and multiply the mask by the time-domain signal vector to obtain the enhanced signal. The output block then restores the signal to the same time resolution as the input. The structure of the entire model is shown in Fig. 1. We elaborate on it below.

Input block
The input signal is $x \in \mathbb{R}^{1 \times m}$, where m is the length of the signal. The input signal is chunked into overlapping frames $s \in \mathbb{R}^{T \times L}$ and processed at the frame level, where T is the number of frames and L is the length of each frame. The frames are passed through a 1d-conv layer followed by a nonlinear activation function H(·), which transforms them into a vector representation with C channels. Here we use the PReLU function for H(·).
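The chunking step can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function name `frame_signal` and the choice to drop a trailing partial frame are assumptions made for illustration. The example values (80,000 samples, frames of 4000 samples, 50% overlap) are the ones reported in the experiment setting.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames of shape (T, frame_len)."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Number of full frames; a trailing partial frame is dropped (assumption).
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row t holds samples [t * hop, t * hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

x = np.arange(80_000, dtype=np.float64)   # stand-in for a 10 s clip at 8 kHz
frames = frame_signal(x, frame_len=4000, hop=2000)  # hop = 50% of the frame
print(frames.shape)  # (39, 4000)
```

With these settings T = 1 + (80000 − 4000) / 2000 = 39, and the second half of each frame equals the first half of the next one.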

1d-Conv Unet block
This block is mainly composed of two modules: the downsample block and the upsample block. Stacking multiple downsample blocks reduces the dimension of the input vector. The upsample blocks increase the dimension of the vector again, with skip connections during upsampling. Letting information from the two transformation paths interact at the same resolution is beneficial to model performance.
The main components of the downsample block are shown in Fig. 2. Among them, the 1d-conv is mainly responsible for dimensionality reduction at different resolutions. Many other signal enhancement works give the model better temporal learning ability through recurrent deep learning paradigms such as LSTM and RNN. However, such typical time series models greatly increase the model complexity and the computational cost. Therefore, we choose the transformer attention mechanism here, so that the model has the ability to learn time series information [41].
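As a rough illustration of what an attention layer computes over the time axis, the following NumPy sketch implements single-head scaled dot-product self-attention, the core operation of the transformer. The source does not specify the number of heads, the projection sizes or where the layer sits inside the block, so the shapes and the names `self_attention`, `w_q`, `w_k` and `w_v` are assumptions.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over T time steps.

    x: (T, C) features; w_q, w_k, w_v: (C, D) projection matrices.
    Returns the attended features (T, D) and the attention weights (T, T).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # pairwise similarity, (T, T)
    scores -= scores.max(axis=-1, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
T, C, D = 6, 8, 4                                  # hypothetical sizes
x = rng.standard_normal((T, C))
w_q, w_k, w_v = (rng.standard_normal((C, D)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Every output time step is a weighted sum over all time steps, so long-range temporal dependencies are reachable in one layer without recurrence.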
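In actual training, we found that the use of global layer normalization (gLN) greatly improves the performance of the model. Following the definition in Conv-TasNet [35]:
$$gLN(F) = \frac{F - E[F]}{\sqrt{Var[F] + \epsilon}} \odot \gamma + \beta$$
$$E[F] = \frac{1}{CT} \sum_{C,T} F, \qquad Var[F] = \frac{1}{CT} \sum_{C,T} (F - E[F])^2$$
where $F \in \mathbb{R}^{C \times T}$ is the input, γ and β are learnable parameters, and ε is a small constant for numerical stability.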
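A minimal NumPy sketch of global layer normalization is given below, assuming the definition used in Conv-TasNet, where the mean and variance are taken over both the channel and the time dimension. The per-channel shape (C, 1) for γ and β and the value of `eps` are assumptions.

```python
import numpy as np

def global_layer_norm(f, gamma, beta, eps=1e-8):
    """Global layer norm of f with shape (C, T).

    Statistics are computed over all C * T entries, then a per-channel
    scale gamma and shift beta (both of shape (C, 1)) are applied.
    """
    mean = f.mean()
    var = f.var()            # population variance over channel and time
    return gamma * (f - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
C, T = 4, 100                # hypothetical sizes
f = 3.0 + 2.0 * rng.standard_normal((C, T))
gamma = np.ones((C, 1))      # learnable in a real model, fixed here
beta = np.zeros((C, 1))
normed = global_layer_norm(f, gamma, beta)
```

With γ = 1 and β = 0 the output has approximately zero mean and unit variance over the whole feature map, regardless of the scale of the input.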
The main components of the upsample block are shown in Fig. 3. Among them, the 1d-conv is mainly responsible for expanding the input vector at the channel level. The vector is then expanded in temporal resolution by pixel shuffle. Pixel shuffle is mainly used in the field of image super-resolution [42] and is usually applied to 2-dimensional data. Here, we apply it to a 1-dimensional signal. Let the input vector be $v \in \mathbb{R}^{T \times C \times K}$. For an upsample rate r, we first expand the vector at the channel level through the 1d-conv to obtain $v' \in \mathbb{R}^{T \times rC \times K}$, and then expand it at the temporal resolution level to obtain $o \in \mathbb{R}^{T \times C \times rK}$. The pixel shuffle operation is shown in Fig. 3.
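The rearrangement from $\mathbb{R}^{T \times rC \times K}$ to $\mathbb{R}^{T \times C \times rK}$ can be sketched in NumPy as follows. The channel ordering is an assumption: it mirrors the convention of 2-D pixel shuffle, where each group of r consecutive channels is interleaved along the expanded axis. The function name `pixel_shuffle_1d` is hypothetical.

```python
import numpy as np

def pixel_shuffle_1d(v, r):
    """Rearrange (T, r*C, K) into (T, C, r*K) by interleaving channels in time."""
    t, rc, k = v.shape
    if rc % r != 0:
        raise ValueError("channel count must be divisible by the upsample rate")
    c = rc // r
    v = v.reshape(t, c, r, k)      # split channels into C groups of r
    v = v.transpose(0, 1, 3, 2)    # (T, C, K, r): the r values follow each step
    return v.reshape(t, c, k * r)  # flatten to the longer time axis

v = np.arange(6).reshape(1, 2, 3)  # T=1, rC=2, K=3
o = pixel_shuffle_1d(v, r=2)
print(o.shape)   # (1, 1, 6)
print(o[0, 0])   # [0 3 1 4 2 5]
```

In the toy example the two input channels `[0 1 2]` and `[3 4 5]` are interleaved sample by sample, which doubles the temporal length while halving the channel count.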
In order to improve the learning ability of the model at different time resolutions, we connect multiple Unet networks in series and use a different dilation for the 1d-conv layers of each. In actual training, we found that setting different dilation parameters significantly improves the performance of the model. We believe that, when multiple Unet networks are connected in series, different dilation parameters allow the model to extract time series information at different time resolutions.
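One way to see why different dilations expose different time scales is to compute the receptive field of a stack of stride-1 dilated convolutions, which grows by (kernel_size − 1) × dilation per layer. The kernel size and the dilation values below are hypothetical, since the source does not list them.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated 1-D convolutions."""
    rf = 1
    for d in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation samples.
        rf += (kernel_size - 1) * d
    return rf

same = receptive_field(3, [1, 1, 1, 1, 1])     # identical dilations
mixed = receptive_field(3, [1, 2, 4, 8, 16])   # increasing dilations
print(same, mixed)  # 11 63
```

With the same number of layers and parameters, the increasing dilations cover a context almost six times longer, which is consistent with the intuition given above.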

Output block
The output block ensures that the time resolution of the signal is restored to the length of the original signal. The input $v \in \mathbb{R}^{T \times C \times K}$ is converted to $m \in \mathbb{R}^{T \times L}$ by a 1d-transconv layer, and is finally converted to the same size as the input signal by the overlap-and-add method.
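The overlap-and-add step can be sketched in NumPy as below. Averaging the overlapping samples is an assumption made here so that unmodified frames reconstruct the input exactly; the source does not say whether the authors sum, average or window the frames.

```python
import numpy as np

def overlap_add(frames, hop):
    """Rebuild a 1-D signal from frames of shape (T, L) spaced hop samples apart."""
    n_frames, frame_len = frames.shape
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    count = np.zeros_like(out)
    for t in range(n_frames):
        start = t * hop
        out[start:start + frame_len] += frames[t]
        count[start:start + frame_len] += 1.0
    # Average samples covered by more than one frame (assumption, see lead-in).
    return out / count

x = np.arange(10, dtype=np.float64)
frames = np.stack([x[0:4], x[2:6], x[4:8], x[6:10]])  # L=4, hop=2 (50% overlap)
y = overlap_add(frames, hop=2)
```

Here `y` has the same length as `x` and reproduces it sample for sample, which is the property the output block relies on to return to the input resolution.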

Loss function
The model proposed in this paper adopts an end-to-end training strategy, and the training target is the scale-invariant source-to-noise ratio (SI-SNR). This metric is often used to evaluate the performance of signal enhancement methods. It is defined as:
$$s_{target} = \frac{\langle \hat{s}, s \rangle s}{\|s\|^2}, \qquad e_{noise} = \hat{s} - s_{target}, \qquad \text{SI-SNR} = 10 \log_{10} \frac{\|s_{target}\|^2}{\|e_{noise}\|^2}$$
where ŝ and s are the estimated and original clean sources, and $\langle \cdot, \cdot \rangle$ is the inner product. A mean squared error (MSE) loss in the time domain is defined as:
$$\text{MSE} = \frac{1}{m} \sum_{i=1}^{m} (\hat{s}_i - s_i)^2$$
where ŝ and s are as defined for SI-SNR and m is the number of sampling points. Finally, we combine the two loss functions, with a hyperparameter α controlling the combination. The combined loss function has two parts. The first part, SI-SNR, is our task objective: we want the SI-SNR of the enhanced signal to improve significantly. The second part encourages the signal output by our model to match the clean signal as closely as possible in the time domain. We found that the combined loss function improves the effect of signal enhancement.
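The following NumPy sketch shows one plausible way to combine the two terms. The exact form of the combination is not recoverable from the text, so the expression `-SI-SNR + alpha * MSE`, the value of `alpha` and the stabilising constant `eps` are all assumptions; only the SI-SNR and MSE terms themselves follow the standard definitions.

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant source-to-noise ratio in dB for 1-D signals."""
    s_target = np.dot(est, ref) * ref / (np.dot(ref, ref) + eps)
    e_noise = est - s_target
    return 10 * np.log10((np.dot(s_target, s_target) + eps)
                         / (np.dot(e_noise, e_noise) + eps))

def mse(est, ref):
    """Mean squared error in the time domain."""
    return np.mean((est - ref) ** 2)

def combined_loss(est, ref, alpha=0.5):
    # Assumed form: maximise SI-SNR (hence the minus sign) and penalise
    # time-domain mismatch, weighted by the hyperparameter alpha.
    return -si_snr(est, ref) + alpha * mse(est, ref)

rng = np.random.default_rng(0)
ref = rng.standard_normal(8000)
noisy = ref + 0.5 * rng.standard_normal(8000)
loss_noisy = combined_loss(noisy, ref)
loss_clean = combined_loss(ref, ref)
```

A perfect estimate gives a much lower loss than a noisy one. Note that SI-SNR alone would not change if the estimate were rescaled, while the MSE term does, which is one reason for adding it.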

Data
Our ship radiated noise signal data come from DeepShip [43]. Our interference signal data were collected in the South China Sea. The explosion signals are simulated by air guns fired periodically for scientific research, with periods ranging from seconds to minutes. The recordings were made in two different sea areas and have a total length of 8 hours. During this period, ships occasionally pass at long distance for short times, and some of the noise recordings also contain operating noise from an oil exploration platform. The waveform of the interference signal is shown in Fig. 4. We split the air gun recordings into ten-second segments. When building the dataset, we clean the data to make sure that each segment contains an air gun signal, and then mix it with the ship radiated noise signal at a signal-to-noise ratio of −20 dB to −25 dB. The final dataset has a total of 32,312 signals, and the sampling frequency is 8 kHz.
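Mixing at a prescribed signal-to-noise ratio can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: it scales the interference rather than the ship signal, uses mean power over the whole segment, and substitutes random arrays for the DeepShip and air gun recordings. The name `mix_at_snr` is hypothetical.

```python
import numpy as np

def mix_at_snr(signal, interference, snr_db):
    """Scale the interference so that the mixture has the requested SNR in dB."""
    p_sig = np.mean(signal ** 2)
    p_int = np.mean(interference ** 2)
    # Solve 10 * log10(p_sig / (gain**2 * p_int)) == snr_db for gain.
    gain = np.sqrt(p_sig / (p_int * 10 ** (snr_db / 10)))
    scaled = gain * interference
    return signal + scaled, scaled

rng = np.random.default_rng(0)
ship = rng.standard_normal(80_000)          # stand-in for a 10 s ship segment
airgun = 5.0 * rng.standard_normal(80_000)  # stand-in for an air gun segment
mixture, scaled = mix_at_snr(ship, airgun, snr_db=-20.0)
achieved = 10 * np.log10(np.mean(ship ** 2) / np.mean(scaled ** 2))
```

At −20 dB the interference carries 100 times the power of the ship signal, which gives a sense of how deeply the target is buried in the mixtures used here.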
Two samples in our dataset are shown in Fig. 4. In the figure, raw_sig is the ship radiated noise signal, noi_sig is the air gun signal, and mix_sig is the mixed signal; the legend on the mixed signal shows its signal-to-noise ratio. Panels (a) and (c) are the time-domain waveforms of the two samples, and show that both samples are seriously distorted after mixing with the noise signal. Panels (b) and (d) are the time-frequency representations of the two samples. The ship radiated noise signal is mainly distributed in the 0-1200 Hz range, and our air gun noise signal is mainly distributed in this frequency band too. After mixing, the ship radiated noise signal is submerged in the noise and is very difficult to identify.

Experiment setting
Each Unet-1dconv block in our model has 10 layers, namely 5 downsample blocks and 5 upsample blocks, and we cascade 5 Unet-1dconv blocks. The default batch size during training is 1. The optimization algorithm is SGD, and the learning rate is 0.0001. We train for up to 50 epochs and stop training early if the loss does not decrease for 5 epochs. Our dataset has a total of 32,312 signals. We divide it into three parts: 50% is used as the training set, 20% as the validation set, and 30% as the test set. Each signal has 80,000 samples and is sliced into frames of length 4000 with 50% overlap. We use the scale-invariant source-to-noise ratio (SI-SNR) and the source-to-noise ratio (SNR) as objective evaluation metrics. The comparison methods are the Ideal Amplitude Mask (IAM), Ideal Ratio Mask (IRM), Ideal Binary Mask (IBM), Wiener filter and Conv-TasNet. We conduct multiple experiments and report the best results. We implement our model in PyTorch and train it on two NVIDIA 2080 Ti GPUs. The SNR is defined as:
$$\text{SNR} = 10 \log_{10} \frac{\|s_{reference}\|^2}{\|\hat{s} - s_{reference}\|^2}$$
where ŝ and $s_{reference}$ are the estimated and reference signals.
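Because both metrics are used for evaluation, it helps to see how they react to the output level. The NumPy sketch below uses the standard definitions of SNR and SI-SNR on synthetic data (the random signals and the 0.1 gain are illustrative assumptions) and compares an estimate with an attenuated copy of itself.

```python
import numpy as np

def snr(est, ref):
    """Source-to-noise ratio in dB; sensitive to the gain of the estimate."""
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((est - ref) ** 2))

def si_snr(est, ref):
    """Scale-invariant SNR in dB; the estimate is first projected onto ref."""
    s_target = np.dot(est, ref) / np.dot(ref, ref) * ref
    e_noise = est - s_target
    return 10 * np.log10(np.sum(s_target ** 2) / np.sum(e_noise ** 2))

rng = np.random.default_rng(0)
ref = rng.standard_normal(8000)
est = ref + 0.1 * rng.standard_normal(8000)   # good estimate at the right level
quiet = 0.1 * est                             # same waveform, 20 dB lower gain

print(snr(est, ref), snr(quiet, ref))         # SNR drops sharply for `quiet`
print(si_snr(est, ref), si_snr(quiet, ref))   # SI-SNR is unchanged
```

An output with the right shape but a low gain therefore scores well on SI-SNR and poorly on SNR, so the two metrics can rank methods differently.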

Experiment results
The objective evaluation results of our experiments are shown in Table 1. We use SI-SNR and SNR for verification. We find that traditional methods such as IAM, IRM, IBM and the Wiener filter do not perform well on the task of ship radiated noise signal enhancement. The two metrics also differ considerably for these methods. We believe this is because SI-SNR reduces the influence of signal strength on the signal-to-noise ratio through an orthogonal projection. Wiener filtering can filter out part of the frequency components but has a lower signal gain, so it has a high SI-SNR and a low SNR; the reverse holds for the other mask methods. The data-driven deep learning methods have a clear advantage on both objective metrics. Our method outperforms the state-of-the-art Conv-TasNet on all of the objective metrics.
Fig. 5 The output of two models
Next, we compare the methods by visualizing the signals. Since the traditional methods are too far behind on the objective evaluation metrics, we only compare with Conv-TasNet. We visualize the outputs of the two signal enhancement models, plotting both the time-domain waveform and the time-frequency spectrum. The results are shown in Fig. 5, where "mix_sig" is the input of the model, that is, the signal to be enhanced, "clean_sig" is the reference signal, "tas_est_sig" is the output of Conv-TasNet, and "our_est_sig" is the output of our method. By comparing (a) and (c), we find that our enhancement in the time domain is better than that of Conv-TasNet. By comparing (b) and (d), we find that our enhancement is better in the low frequency band of 0-1000 Hz, and the line spectrum is clearer.

Conclusion and discussion
We propose a data-driven deep learning method for ship radiated noise signal enhancement. We enhance the signal directly in the time domain, eliminating the need for conversion between different domains. We build the encoder-decoder structure of the Unet network at different temporal resolutions. We introduce a transformer attention mechanism to enable our model to learn temporal information. We conduct experiments with actually collected data and verify that our method can effectively enhance the signal in the time domain at extremely low signal-to-noise ratios of −20 to −25 dB. In the future, we would like to propose suitable anti-interference methods for ship radiated noise signals that cover more interference types.

Table 1
Objective evaluation result