VMD and self-attention mechanism-based Bi-LSTM model for fault detection of optical fiber composite submarine cables


As the main electrical equipment of offshore power grids, optical fiber composite submarine cables undertake both power transmission and data communication. To ensure that a submarine cable functions properly, its working state must be analyzed and fault events identified. This paper proposes a fault detection method for submarine cables, namely the VMD and self-attention-based Bi-LSTM model. First, we use ANSYS software to generate the vibration waveforms of three main fault events of optical fiber composite submarine cables. Then, a detection matrix built from background noise and the vibration waveforms is used to locate and detect fault events along a single submarine cable. Next, the vibration signal is decomposed into intrinsic mode function (IMF) components using variational mode decomposition (VMD) for feature extraction. The IMF components are then fed to the self-attention layer for feature fusion and to the Bi-LSTM module for further feature extraction. Finally, the fault detection result is output through the classification layer. Comparative and ablation experiments show that the proposed model outperforms the benchmark models and remains robust and stable under different signal-to-noise ratios.

1 Introduction

As the main electrical equipment of offshore power grids, optical fiber composite submarine cables undertake the dual functions of power transmission and data communication [1]. However, human activities and the severe working environment expose the submarine cable to strong currents, eventually leading to irreversible damage to its structure. To ensure the proper functioning of the optical fiber composite submarine cable, it is necessary to analyze its working state and identify possible fault events. However, because submarine cables are laid over long distances on the seabed, on-site physical experiments incur high labor and material costs, and the experimental conditions are relatively harsh. Therefore, fault data are generally obtained through software simulation.

Finite element analysis (FEA) [2] is a widely applied approach to model the complex system of submarine cables. In FEA, by using mathematical approximation, a finite number of known variables are used to approximate a real system containing an infinite number of unknown variables. The current capacity finite element software was used in [2] to perform the simulation analysis and derive the direct current (DC) resistance requirements of the submarine cable. To meet the practical needs of the project laying environment, the final design of the suitable conductor and structure size for the submarine cable was obtained. By combining the external impedance analysis method and the FEA, reference [3] proposed a hybrid method for calculating the impedance of a submarine cable. The hybrid method uses analytical expressions to calculate the external impedance of the submarine cable, which can reduce the computational effort of the finite element analysis and also overcome the limitations of the finite element software. In [4], FEA was employed to calculate the ground return impedance of a submarine power cable when inserted into a tri-media environment, i.e., air, seawater, and soil. The authors also compared the results with dual-media and single-media analytical formulations. For the deformation of submarine cables caused by the ship anchor, a three-dimensional dual nonlinear model was established in [5] using a FEA software called ABAQUS.

Existing research on submarine cables mainly focuses on engineering applications [6,7,8], whereas studies on the fault detection of submarine cables are fairly scarce. When a fault event occurs in the surrounding environment, it causes the submarine cable to vibrate, so the vibration characteristics are of great significance for determining the type and location of fault events in optical fiber composite submarine cables. Therefore, extracting robust and subtle features from vibration signals and studying detection methods greatly help the detection of such fault events.

The processing of vibration signals mainly involves two aspects, i.e., feature extraction and classification. Because optical fiber vibration signals are nonlinear and non-stationary, time-domain waveforms are too coarse to describe their characteristics, which makes accurate classification and detection difficult. Transform-domain methods are widely applied to extract salient features from time-domain signals. The short-time Fourier transform (STFT) is the most widely used signal processing method and provides the frequency information of the signal [9]. However, STFT has insufficient processing ability for non-stationary signals. Based on the mathematical theory of STFT, J. Morlet proposed the wavelet transform (WT), which is a local transformation in both the time and frequency domains. For abrupt signals, WT performs better than STFT. The key to using WT is to find a suitable wavelet function; since the wavelet basis must be selected manually, the accuracy of the analysis depends on this choice [10]. Another time-frequency processing method, empirical mode decomposition (EMD), decomposes a non-stationary signal into a series of stationary components at different time scales [11]. However, EMD also has shortcomings, such as mode mixing and end-point effects. To overcome these shortcomings, K. Dragomiretskiy et al. proposed variational mode decomposition (VMD) [12]. In essence, VMD is a bank of adaptive Wiener filters that decomposes the signal by solving a variational problem. Therefore, VMD can adaptively partition the components of the signal in the frequency domain and effectively overcome the mode mixing produced by EMD.

As for the classification of time sequences, the solutions are divided into traditional statistical methods, machine learning (ML) methods and deep learning (DL) methods. Commonly used traditional statistical methods, such as regression analysis [13,14,15], cluster analysis [16,17,18], support vector machine (SVM) [19,20,21] and K-nearest neighbor (KNN) [22,23,24,25], require manual participation in the design. Additionally, their classification efficiency is relatively low and their operational complexity is high. Typical ML methods include decision trees, case-based learning and genetic algorithms. However, considerable time and computing resources are required to extract features in the feature engineering stage of ML, whereas DL does not need manual feature extraction [26]. Driven by massive data and neural networks, DL can automatically perform high-dimensional abstract learning and self-iteratively update the network. The recurrent neural network (RNN) is good at processing time-dependent sequences because its recurrent structure has memory ability and extracts information through parameter sharing across time steps [27]. However, RNN can only keep short-term memory because of vanishing gradients, so Hochreiter and Schmidhuber [28] added a cell state to form the long short-term memory (LSTM) network. LSTM combines short-term memory with long-term memory through gate control to mitigate vanishing gradients. Although LSTM can process events with long intervals and delays in time sequences, it can only utilize historical data in the task. The bidirectional LSTM (Bi-LSTM) network is trained using both past and future time sequences [29], so its accuracy on context-related events is significantly higher than that of the LSTM network.

In deep learning, models often receive and process large amounts of data, but only a small portion of the data contributes to learning. The attention mechanism is a special structure embedded in deep learning models to automatically learn and calculate the contribution of the input data to the output data [30]. Introducing the attention mechanism in feature engineering enables the model to select effective features of appropriate scale, which helps it complete the task. The self-attention mechanism [31] starts from the internal correlation of the data and uses this correlation to optimize the local representation of the data, thereby achieving feature optimization. It has been used for different types of tasks, such as natural language processing [32, 33] and multi-target tracking [34,35,36,37].

From the aforementioned analysis, the detection of abnormal events is of great significance to optical fiber composite submarine cables. In addition, the key to precise classification of abnormal events lies in feature determination and extraction. Therefore, in this paper, we put forward the VMD and self-attention mechanism-based Bi-LSTM model for the fault detection of optical fiber composite submarine cables. The main contributions of this paper are threefold:

  • Proposal of a fault detection method for optical fiber composite submarine cables based on the pseudo vibration signals. By generating the background noise arrays and fault signal added background noise arrays, the detection matrix is formed for fault detection. Through the threshold of signal-to-noise ratio (SNR), proper functioning states are filtered out and fault states can be detected for the following classification of fault events. Therefore, the specific spatial position of the fault state in the submarine cable can be located and the amount of data to be processed has been reduced greatly.

  • Proposal of the VMD and self-attention mechanism-based Bi-LSTM model for the classification of fault events of optical fiber composite submarine cables. At the stage of feature engineering, the VMD method is employed to decompose the waveform of the fault event into representative IMF components. Then, through the self-attention mechanism, the grouped IMF components are input into the Bi-LSTM modules for further feature extraction. In addition, shallower features and deeper features undergo different layers of Bi-LSTM modules and are fused together through pyramid-like feature fusion structure. Finally, all the IMF components finish the feature extraction and the classification result can be output.

  • Proposal of the pyramid-like feature fusion structure to integrate features of different depths. Compared with the conventional feature fusion scheme, which integrates all features simultaneously at the same stage, the proposed structure extracts the hidden features carried by the IMF components stage by stage. The ablation experiment shows that the pyramid-like feature fusion structure is more stable and achieves higher detection accuracy than the conventional scheme.

The rest of this paper is organized as follows. Section 2 introduces the relevant techniques used in the proposed detection model and explains the methodology. Then, numerical results and discussions are presented in Sect. 3. Finally, Sect. 4 concludes the paper.

2 Methodology

In this section, the relevant techniques used in the proposed detection model are illustrated first. Based on those, we then propose the structure of VMD and self-attention mechanism-based Bi-LSTM model for the fault detection of optical fiber composite submarine cables.

2.1 VMD

Fig. 1 The flowchart of VMD

VMD is an adaptive and completely non-recursive variational mode decomposition method, in which a multi-component signal is decomposed at one time into multiple single-component amplitude-modulated-frequency-modulated (AM-FM) signals, which are defined as the intrinsic mode functions (IMFs) [12]:

$$\begin{aligned} u_{k}(t)=A_{k}(t) \cos \left( \phi _{k}(t)\right) \end{aligned}$$

where t is time, \(A_{k}(t)\ge 0\) is the envelope and \(\phi _{k}(t)\) is the instantaneous phase. The core idea of VMD is to seek K IMFs \(u_{k}\) \((k\in [1,K])\) and their respective central frequencies \(\omega _k\) so that these IMFs together can reproduce the original input signal f(t). Therefore, the solution of the VMD problem can be correspondingly transformed into the construction and solution of the variational problem. The specific solution process is as follows:

  1. (1)

    The analytic signal of \(u_{k}\) and its single-sided spectrum are obtained by the Hilbert transform. Then, by multiplying by the operator \(\textrm{e}^{-j \omega _{k} t}\), the analytic signal is demodulated to the baseband:

    $$\begin{aligned} \left[ \left( \delta (t)+\frac{j}{\pi t}\right) * u_{k}(t)\right] \textrm{e}^{-j \omega _{k} t}. \end{aligned}$$
  2. (2)

    Calculate the squared 2-norm of the gradient of the demodulated signal to estimate the bandwidth of each IMF. The constrained variational model is given by

    $$\begin{aligned} \left\{ \begin{array}{l} \min _{\left\{ u_{k}\right\} ,\left\{ \omega _{k}\right\} }{\left\{ \sum _{k=1}^{K}\left\| \partial _{\textrm{t}}\left[ \left( \delta (t)+\frac{j}{\pi t}\right) * u_{k}(t)\right] \textrm{e}^{-j \omega _{k} t}\right\| _2^{2}\right\} } \\ \text{ s.t. } \quad \sum _{k=1}^K u_{k}=f(t). \end{array}\right. \end{aligned}$$
  3. (3)

    In order to find the optimal solution of the constrained variational problem, the Lagrange multiplier \(\uplambda (t)\) and the quadratic penalty factor \(\alpha\) are introduced to transform it into an unconstrained variational problem. The Lagrange multiplier \(\uplambda (t)\) enforces the constraint strictly, and the quadratic penalty factor \(\alpha\) ensures the accuracy of signal reconstruction in the presence of Gaussian noise. The augmented Lagrangian is as follows:

    $$\begin{aligned} \begin{aligned} \textrm{L}\left( \left\{ u_{k}\right\} ,\left\{ \omega _{k}\right\} , \uplambda (t) \right)&= \alpha \sum _{k=1}^K\left\| \partial _{\textrm{t}}\left[ \left( \delta (t)+\frac{j}{\pi t}\right) * u_{k}(t)\right] \textrm{e}^{-j \omega _{k} t}\right\| _2^{2} +\left\| f(t)-\sum _{k=1}^K u_{k}(t)\right\| _2^{2} \\&\quad +\left\langle \uplambda ({t}), f(t)-\sum _{k=1}^K u_{k}(t)\right\rangle \end{aligned} \end{aligned}$$
  4. (4)

    The alternating direction method of multipliers (ADMM) is used to update each IMF component and its central frequency. Finally, the saddle point of the unconstrained model is obtained, that is, the optimal solution of the original problem. The IMF components and the central frequency of the solution are shown as

    $$\begin{aligned} \begin{aligned} {\hat{u}}_{k}^{n+1}(\omega )&=\frac{{\hat{f}}(\omega )-\sum _{i \ne k} {\hat{u}}_{i}(\omega )+({\hat{\uplambda }} (\omega ) / 2)}{1+2 \alpha \left( \omega -\omega _{k}\right) ^{2}} \\ \omega _{k}^{n+1}&=\frac{\int _{0}^{\infty } \omega \left| {\hat{u}}_{k}(\omega )\right| ^{2} d \omega }{\int _{0}^{\infty }\left| {\hat{u}}_{k}(\omega )\right| ^{2} d \omega } \end{aligned} \end{aligned}$$

    where \({\hat{u}}_{k}^{n+1}(\omega )\), \({\hat{f}}(\omega )\) and \({\hat{\uplambda }} (\omega )\) are the Fourier transforms of \(u_{k}^{n+1}(t)\), f(t) and \(\uplambda (t)\), respectively, and \({\hat{u}}_{k}^{n+1}(\omega )\) is the result of Wiener filtering the residual \({\hat{f}}(\omega )-\sum _{i \ne k} {\hat{u}}_{i}(\omega )\). The procedure for solving each IMF mode is shown in Fig. 1.
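The update rules in step (4) can be illustrated with a minimal NumPy sketch. It is a simplified reading of the procedure, not the implementation used in this paper: the function name `vmd_sketch`, the uniform initialization of the center frequencies, the use of normalized frequencies (cycles per sample), the omission of boundary mirroring, and the toy two-tone signal with `alpha=2000` are all assumptions made for illustration.

```python
import numpy as np

def vmd_sketch(signal, K, alpha, tau=0.0, n_iter=200, tol=1e-7):
    """Simplified VMD: frequency-domain ADMM updates, no boundary mirroring."""
    f = np.asarray(signal, dtype=float)
    T = f.size
    freqs = np.fft.fftfreq(T)                 # normalized frequency, cycles/sample
    pos = freqs >= 0                          # non-negative half of the spectrum
    f_hat = np.where(pos, np.fft.fft(f), 0)   # one-sided spectrum of f(t)
    u_hat = np.zeros((K, T), dtype=complex)   # spectra of the K modes
    omega = (np.arange(K) + 0.5) / (2 * K)    # assumed initial center frequencies
    lam_hat = np.zeros(T, dtype=complex)      # Lagrange multiplier spectrum

    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Residual left after removing all other modes from the signal.
            residual = f_hat - (u_hat.sum(axis=0) - u_hat[k])
            # Wiener-filter update of mode k around its center frequency.
            u_hat[k] = (residual + lam_hat / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency: power-weighted mean frequency of the mode.
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / (np.sum(power) + 1e-12)
        # Dual ascent on the reconstruction constraint (inactive when tau = 0).
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break

    # Back to real time-domain modes: double the one-sided spectrum except DC.
    spec = 2 * u_hat
    spec[:, 0] = u_hat[:, 0]
    modes = np.real(np.fft.ifft(spec, axis=1))
    order = np.argsort(omega)                 # ascending order, a choice of this sketch
    return modes[order], omega[order]


# Toy usage: two tones at 0.05 and 0.2 cycles/sample (hypothetical test signal).
n = np.arange(1000)
toy = np.cos(2 * np.pi * 0.05 * n) + 0.5 * np.cos(2 * np.pi * 0.2 * n)
modes, centers = vmd_sketch(toy, K=2, alpha=2000.0)
```

On this toy signal the two recovered center frequencies should settle near the two tone frequencies, and the modes should sum back to the input. Production implementations usually mirror the signal at both ends to reduce boundary effects, which this sketch leaves out for brevity.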

2.2 Classification method

To process time-dependent sequences, the Bi-LSTM unit is adopted in the proposed model. Since the Bi-LSTM unit is built on the LSTM unit, we first briefly introduce LSTM and then describe the structure and principle of Bi-LSTM.

2.2.1 LSTM

Fig. 2 The cell structure of LSTM

On the basis of the recurrent structure of RNN, LSTM introduces a gating mechanism to control the flow and forgetting of features. The structure of the LSTM unit is presented in Fig. 2. It consists of a memory cell and three control gates, i.e., the input gate, forget gate and output gate. The input and output gates control the flow of information into and out of the memory cell, and the forget gate controls how much of the previous cell state is carried over to the next time step. Which information is held in the memory cell depends on the gate activations: if the input gate has high activation, the information is stored in the memory cell; if the output gate has high activation, the cell passes the information on to the next neuron; otherwise, the information remains in the memory cell.

  1. (1)

    The forget gate determines whether information is discarded from the unit. After the current input vector \({\varvec{X}}_t\) and the output of the hidden layer at the previous moment \({\varvec{h}}_{t-1}\) pass through the forget gate, a value between 0 and 1 is output by the sigmoid activation function \(\sigma (x)=\frac{1}{1+e^{-x}}\):

    $$\begin{aligned} {\varvec{f}}_{t}=\sigma \left( {\varvec{W}}_{f} \cdot \left[ {\varvec{h}}_{t-1}, {\varvec{X}}_{t}\right] +{\varvec{b}}_{f}\right) \end{aligned}$$

    where \({\varvec{W}}_{f}\) and \({\varvec{b}}_{f}\) are the recurrent weight and bias of the forget gate, respectively.

  2. (2)

    The input gate determines what new information is stored in the unit. The information is first processed by the sigmoid function to decide whether it is to be updated. Then, the layer with activation function \(\tanh x=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}\) creates a new vector of candidate values \(\varvec{{\tilde{C}}}_t\) to add to the cell state:

    $$\begin{aligned} \begin{aligned} {\varvec{i}}_{t}&=\sigma \left( {\varvec{W}}_{i} \cdot \left[ {\varvec{h}}_{t-1}, {\varvec{X}}_{t}\right] +{\varvec{b}}_{i}\right) \\ \varvec{{\tilde{C}}}_{t}&=\tanh \left( {\varvec{W}}_{C} \cdot \left[ {\varvec{h}}_{t-1}, {\varvec{X}}_{t}\right] +{\varvec{b}}_{C}\right) \end{aligned} \end{aligned}$$

    where \({\varvec{W}}_{i}\) and \({\varvec{b}}_{i}\) are the recurrent weight and bias of the input gate, and \({\varvec{W}}_{C}\) and \({\varvec{b}}_{C}\) are the recurrent weight and bias of the candidate layer.

  3. (3)

    Next, the old state \({\varvec{C}}_{t-1}\) is multiplied element-wise by \({\varvec{f}}_t\) to forget the information that is to be discarded, and \({\varvec{i}}_{t} * \varvec{{\tilde{C}}}_{t}\) is then added to form the new cell state \({\varvec{C}}_{t}\):

    $$\begin{aligned} {\varvec{C}}_{t}={\varvec{f}}_{t} * {\varvec{C}}_{t-1}+{\varvec{i}}_{t} * \varvec{{\tilde{C}}}_{t} \end{aligned}$$
  4. (4)

    Finally, the output is derived from the cell state. A sigmoid layer first determines which parts of the state to output. Then, the output of the sigmoid layer is multiplied by the cell state passed through the tanh function to give the final output.

    $$\begin{aligned} \begin{aligned} {\varvec{o}}_{t}&=\sigma \left( {\varvec{W}}_{o} \cdot \left[ {\varvec{h}}_{t-1}, {\varvec{X}}_{t}\right] +{\varvec{b}}_{o}\right) \\ {\varvec{h}}_{t}&={\varvec{o}}_{t} * \tanh \left( {\varvec{C}}_{t}\right) \end{aligned} \end{aligned}$$

    where \({\varvec{W}}_{o}\) and \({\varvec{b}}_{o}\) are the recurrent weight and bias of the output gate, respectively.
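The four steps above map directly onto a single time step of the cell. The following NumPy sketch is illustrative only: the helper names `lstm_step` and `init_params`, the dictionary layout of the parameters and the random initialization are assumptions, not part of the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the forget/input/cell/output equations."""
    z = np.concatenate([h_prev, x_t])                     # [h_{t-1}, X_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate
    c_tilde = np.tanh(params["W_C"] @ z + params["b_C"])  # candidate values
    c_t = f_t * c_prev + i_t * c_tilde                    # new cell state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                              # new hidden output
    return h_t, c_t

def init_params(input_size, hidden_size, seed=0):
    """Hypothetical initializer: small random weights, zero biases."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "C", "o"):
        params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
        params[f"b_{gate}"] = np.zeros(hidden_size)
    return params

# Usage: one step with a 3-dimensional input and 4 hidden units.
params = init_params(input_size=3, hidden_size=4)
h_t, c_t = lstm_step(np.ones(3), np.zeros(4), np.zeros(4), params)
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), every entry of `h_t` stays strictly inside (-1, 1).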

2.2.2 Bi-LSTM

Fig. 3 The network structure of Bi-LSTM

Compared to RNN, LSTM performs much better. However, the LSTM unit still has limitations, the most prominent of which is that it uses only previous information and does not take future information into consideration. In practical scenarios, information from the entire input sequence may be required. Bi-LSTM overcomes this restriction by using two separate LSTM hidden layers that process the sequence in opposite directions. Based on the LSTM unit structure, Bi-LSTM combines the information of the input sequence in both the forward and backward directions. For the output at time t, the forward LSTM layer carries the information before time t in the input sequence, and the backward LSTM layer carries the information after time t. Under this structure, both previous and future information can be exploited in the output layer, so Bi-LSTM outperforms LSTM in numerous practical tasks. The network structure of the Bi-LSTM unit is shown in Fig. 3, and the internal structure of the network can be expressed as

$$\begin{aligned} \begin{aligned} \varvec{\overrightarrow{h} }&=f\left( \varvec{\overrightarrow{W}}\left[ \left( \varvec{\overrightarrow{h} }\right) ^{\text{pre}}, {\varvec{X}}\right] +\varvec{\overrightarrow{b} }\right) \\ \varvec{\overleftarrow{h} }&=f\left( \varvec{\overleftarrow{W} }\left[ \left( \varvec{\overleftarrow{h} }\right) ^{\text{next}}, {\varvec{X}}\right] +\varvec{\overleftarrow{b} }\right) \\ {\varvec{Y}}&=F\left( {\varvec{W}}_{{\varvec{Y}}}\left[ \varvec{\overrightarrow{h}}, \varvec{\overleftarrow{h} }\right] +{\varvec{b}}_{{\varvec{Y}}}\right) \end{aligned} \end{aligned}$$

where X and Y are the input and output sequences, and \(\varvec{\overrightarrow{h} }\) and \(\varvec{\overleftarrow{h} }\) are the output vectors of the forward and the backward LSTM units, respectively. \(f(\cdot )\) denotes the operations in Eqs. (6)–(9) and \(F(\cdot )\) represents the output function. \((\varvec{\overrightarrow{h} })^{\text{pre}}\) is the output of the previous memory cell in the forward LSTM unit and \((\varvec{\overleftarrow{h} })^{\text{next}}\) is the output of the next memory cell in the backward LSTM unit. \(\varvec{\overrightarrow{W}}\) and \(\varvec{\overrightarrow{b} }\) are the weight and bias of the forward LSTM unit, \(\varvec{\overleftarrow{W} }\) and \(\varvec{\overleftarrow{b} }\) are the weight and bias of the backward LSTM unit, and \({\varvec{W}}_{{\varvec{Y}}}\) and \({\varvec{b}}_{{\varvec{Y}}}\) are the weight and bias of the output layer.
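As an illustration of the bidirectional structure, the sketch below runs the same LSTM recursion once forward and once backward over a sequence and combines the two hidden sequences in a linear output layer. The names `lstm_layer` and `bilstm`, the stacking of the four gate weights into one matrix, and the choice of the identity for the output function \(F(\cdot )\) are assumptions made for this example.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_layer(X, W, b, reverse=False):
    """Run one LSTM direction over X of shape (T, input_size).

    W has shape (4*H, H + input_size) and stacks the forget, input,
    output and candidate weights; b has shape (4*H,).
    """
    T = X.shape[0]
    H = b.size // 4
    h, c = np.zeros(H), np.zeros(H)
    outputs = np.zeros((T, H))
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        z = W @ np.concatenate([h, X[t]]) + b
        f, i, o = _sigmoid(z[:H]), _sigmoid(z[H:2 * H]), _sigmoid(z[2 * H:3 * H])
        c = f * c + i * np.tanh(z[3 * H:])
        h = o * np.tanh(c)
        outputs[t] = h          # stored at its own time index in both directions
    return outputs

def bilstm(X, fwd, bwd, W_Y, b_Y):
    """Y = W_Y [h_forward, h_backward] + b_Y, with F taken as the identity."""
    h_fwd = lstm_layer(X, *fwd)
    h_bwd = lstm_layer(X, *bwd, reverse=True)
    return np.concatenate([h_fwd, h_bwd], axis=1) @ W_Y.T + b_Y

# Usage with hypothetical sizes: 5 time steps, 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
T, D, H = 5, 3, 4
X = rng.normal(size=(T, D))
W = rng.normal(scale=0.5, size=(4 * H, H + D))
b = np.zeros(4 * H)
Y = bilstm(X, (W, b), (W, b), np.ones((2, 2 * H)), np.zeros(2))
```

In this sketch the forward output at the first time step depends only on the first input, whereas the backward output at the same step depends on the whole remaining sequence, which is the property the text attributes to Bi-LSTM.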

2.3 Self-attention mechanism

The gating mechanism in the LSTM cell mitigates the short-term memory issue of RNN to some extent; however, for long time-dependent sequences, the improvement brought by LSTM is limited. When too much information needs to be remembered, the model becomes more complex, and computing power remains the bottleneck limiting the development of neural networks. By introducing the attention mechanism, the learning process focuses on the information that is most critical to the current task among all the inputs, so that attention to other information is reduced and irrelevant information is filtered out. The self-attention mechanism is a variant of the attention mechanism, which reduces the dependence on external information and is better at capturing the internal correlations of data or features. The core idea of the self-attention mechanism is to capture the correlation between vectors, which is realized by the dot product operation. The procedure of the self-attention mechanism is shown in Fig. 4.

Fig. 4 Self-attention mechanism

  1. (1)

    Calculate the matrices Q, K and V. Through linear transformation, the input matrix I is multiplied by three transformation matrices \(W^{q}\), \(W^{k}\) and \(W^{v}\), respectively, which improves the fitting ability of the model. The resulting matrices Q, K and V are called the query matrix, key matrix and value matrix, respectively. The transformation matrices are learned through training and play a buffering role.

    $$\begin{aligned} \begin{array}{l} Q=W^{q} I \\ K=W^{k} I \\ V=W^{v} I \end{array} \end{aligned}$$
  2. (2)

    Compute the attention score matrix \(A'\). Take the dot product between the key matrix K and the query matrix Q to obtain the correlation matrix A between the input vectors. Then, A is normalized by a softmax layer to obtain the normalized correlation matrix \(A^{\prime }\), namely the attention score.

    $$\begin{aligned} \begin{array}{c} A=K^{T} Q \\ A^{\prime }={\text {softmax}}(A) \end{array} \end{aligned}$$
  3. (3)

    Compute the output matrix O. The attention score matrix \(A^{\prime }\) denotes the weight value of the input vectors. The larger the weight value is, the more the corresponding input vector participates in the calculation, and the final output vector is more similar to the input vector.

    $$\begin{aligned} O=V A^{\prime } \end{aligned}$$
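The three steps can be written in a few lines of NumPy. The sketch follows the column convention of the equations above, in which each column of I is one input vector. Two points are assumptions, since the text does not state them: the softmax is applied to each column of A (so that the weights used for each output column sum to one in \(O=VA^{\prime }\)), and no scaling factor is applied to A. The function names are hypothetical.

```python
import numpy as np

def softmax_columns(A):
    """Numerically stable softmax over each column of A."""
    E = np.exp(A - A.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

def self_attention(I, W_q, W_k, W_v):
    """Self-attention with inputs stored as the columns of I."""
    Q = W_q @ I                       # query matrix
    K = W_k @ I                       # key matrix
    V = W_v @ I                       # value matrix
    A = K.T @ Q                       # correlation between input vectors
    A_prime = softmax_columns(A)      # attention scores
    O = V @ A_prime                   # weighted combination of the values
    return O, A_prime

# Usage: 5 input vectors of dimension 4, projected to dimension 3.
rng = np.random.default_rng(0)
I = rng.normal(size=(4, 5))
W_q, W_k, W_v = (rng.normal(size=(3, 4)) for _ in range(3))
O, A_prime = self_attention(I, W_q, W_k, W_v)
```

If all correlations are equal, for example when \(W^{q}\) is zero, every output column reduces to the plain average of the value vectors, which shows that the attention scores act purely as mixing weights.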

2.4 The VMD and attention mechanism-based Bi-LSTM model

Fig. 5 The network structure of the proposed model

Optical fiber composite submarine cables are laid in sediments at a depth of 1–2 m below the seabed. On the one hand, they are affected by human factors such as ship navigation, marine operations, and anchor damage. On the other hand, the uneven seabed gullies are affected by natural disasters such as typhoons and tsunamis. By referring to the literature [38,39,40,41], we determine three typical fault events of optical fiber composite submarine cables for detection, i.e., anchor hit, rock friction and ocean scour:

  • Anchor hit: This kind of fault event accounts for a high proportion of the total optical fiber composite submarine cable faults, which mainly occurs when the anchor falls and hits the optical fiber composite submarine cable. The speed and weight of the anchor will cause the optical fiber composite submarine cable to deform or malfunction.

  • Rock friction: The scour of the ocean waves will generate continuous vibrations to the optical fiber composite submarine cable, which will cause mechanical fault of the optical fiber composite submarine cable. In the long run, it will threaten the normal operation of the optical fiber composite submarine cable.

  • Ocean scour: Since the optical fiber composite submarine cable is laid on the seabed, under the long-term scour of the ocean, the optical fiber composite submarine cable will also have a certain degree of mechanical and electrical damage.

The vibration characteristic serves as a representative feature for distinguishing the working states of optical fiber composite submarine cables, that is, the proper functioning state and the fault state. Following [42], ANSYS simulation software is adopted to establish finite element models of a 110 kV XLPE submarine cable in the marine environment. On this basis, the laboratory vibration platform is built to obtain the vibration waveforms of the three typical fault events.

The distribution and range of the background noise in the marine environment follow [43]. By randomly adding the vibration signals of the three events to the background noise, the vibration state of the optical fiber composite submarine cable when encountering fault events can be simulated, yielding the pseudo-original signals. We assume that, during the sampling time t, one entire optical fiber composite submarine cable of length l is divided into n segments of equal length, and for each segment either a background noise array or a noise array with an added vibration signal is randomly generated to simulate the proper functioning state or the fault state, respectively. The waveforms can therefore be quantified by the detection matrix \(D_{n \times t}\), in which each row corresponds to the spatial position of one segment and each column to a sampling instant. By setting a threshold on the signal-to-noise ratio (SNR), we can locate the row vectors of the detection matrix that may contain fault vibration events, each corresponding to a certain segment of the optical fiber composite submarine cable. Thus, by detecting and locating the fault working state, the proper functioning state is filtered out and the subsequent classification focuses on the fault events, which decreases the computational complexity of the detection.
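The localization step can be sketched as follows. The text does not specify how the SNR of each row is estimated, so this example assumes that the background noise power is known and estimates the signal power of a row as its excess power over that noise level. The helper names, the Gaussian noise model, the sinusoidal stand-in for a fault waveform and all numerical values are hypothetical.

```python
import numpy as np

def build_detection_matrix(n_segments, n_samples, fault_rows, fault_wave, noise_std, rng):
    """Rows are cable segments, columns are sampling instants."""
    D = rng.normal(scale=noise_std, size=(n_segments, n_samples))  # background noise
    D[fault_rows] += fault_wave                                    # add fault vibration
    return D

def locate_fault_rows(D, noise_power, snr_threshold_db):
    """Return indices of rows whose estimated SNR exceeds the threshold."""
    row_power = np.mean(D ** 2, axis=1)
    excess = np.maximum(row_power - noise_power, 1e-12)   # estimated signal power
    snr_db = 10 * np.log10(excess / noise_power)
    return np.flatnonzero(snr_db > snr_threshold_db)

# Usage on a small hypothetical cable: 50 segments, 2000 samples each.
rng = np.random.default_rng(0)
t = np.arange(2000) * 0.0002
wave = np.sin(2 * np.pi * 50 * t)          # stand-in fault waveform
D = build_detection_matrix(50, 2000, [3, 17, 42], wave, noise_std=0.2, rng=rng)
found = locate_fault_rows(D, noise_power=0.2 ** 2, snr_threshold_db=0.0)
```

Only the rows returned by `locate_fault_rows` would be passed on to the decomposition and classification stages, which is how the thresholding reduces the amount of data to be processed.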

Then, each located row vector is decomposed into IMF components with VMD. The IMF components represent the frequency components of the pseudo-original signal and are arranged in order from high frequency to low frequency [12]. The high-frequency components represent the details of the original data, while the low-frequency components represent slowly changing information, i.e., the outline and approximate information of the original data [44]. For better feature extraction, all IMF components \({\varvec{{\mathcal {K}}}}\) are divided into three feature groups, i.e., deep feature group \({\varvec{\mathcal {\kappa }_1}}\), middle feature group \({\varvec{\mathcal {\kappa }_2}}\) and shallow feature group \({\varvec{\mathcal {\kappa }_3}}\) \(({\varvec{{\mathcal {K}}}}=\{{\varvec{\mathcal {\kappa }_1}},{\varvec{\mathcal {\kappa }_2}},{\varvec{\mathcal {\kappa }_3}}\})\). In each feature group, the IMF components are input into the self-attention layer, whose output focuses more on the key information of the input IMF components. Consisting of a Bi-LSTM unit and a tanh layer as the activation function, the Bi-LSTM module extracts deeper features for further classification. Then, the grouped IMF components are integrated with each other by the pyramid-like structure to fuse the extracted features. Note that different feature groups go through different numbers of network layers. This is because deepening the network enhances the expressive ability of the model, so deep features need deeper networks for detailed information extraction.

Finally, by the fully connected layer and the softmax layer, the classification result is obtained and the network structure is illustrated in Fig. 5.

3 Results and discussion

3.1 Data

Fig. 6 The vibration waveforms of the fault events

Fig. 7 The waveforms of pseudo-original signals

In order to obtain the pseudo-original waveforms of the three fault events, ANSYS simulation software is adopted, and the resulting vibration waveforms are presented in Fig. 6. Next, as data preprocessing, we normalize the amplitude of the three vibration waveforms to \([-1,1]\). For the subsequent processing, the duration and the sampling interval of the three types of signals should be kept the same. However, the durations of these signals vary: the energy of the anchor hit is completely released within 0.05 s, the scour lasts about 60–100 s and the friction lasts about 0.5 s. To unify the durations, the sampling interval is set to 0.0002 s and the sampling time to 2 s, giving a total of 10,000 sample points. For clarity of exposition, the optical fiber composite submarine cable is assumed to be 100 km long and is divided into 5000 segments. Taking the signal condition, noise distribution and the sensitivity of the detection equipment into consideration [45], the SNR in the experiment is set within \([-15,15]\) dB, and an SNR of 10 dB is used in the simulation to examine the classification performance of the proposed model. Figure 7 presents the waveforms of the pseudo-original signals. Then, the detection matrix \(D_{5000 \times 10000}\) is obtained through quantification; it consists of 2000 rows for properly working segments and 3000 rows for fault events, with 1000 rows for each typical fault event. Through the SNR threshold, the fault-event data of size 3000 \(\times\) 10,000 are obtained.
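The preprocessing arithmetic above can be checked with a short sketch. It makes several assumptions that the text does not confirm: normalization is done by dividing by the peak absolute amplitude, the background noise is white Gaussian (the paper takes its noise distribution from [43]), and a decaying sinusoid stands in for a simulated fault waveform. The function names are hypothetical.

```python
import numpy as np

def normalize_amplitude(x):
    """Scale a waveform so that its peak absolute amplitude is 1."""
    x = np.asarray(x, dtype=float)
    return x / np.max(np.abs(x))

def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise whose power gives the requested SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    noise = rng.normal(scale=np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Sampling grid from the text: 0.0002 s interval over 2 s.
dt, duration = 0.0002, 2.0
n_samples = round(duration / dt)            # number of sample points per row
segment_length_m = 100_000 / 5000           # 100 km cable split into 5000 segments

# Stand-in waveform (hypothetical), normalized and mixed with noise at 10 dB.
rng = np.random.default_rng(0)
t = np.arange(n_samples) * dt
burst = normalize_amplitude(np.exp(-t / 0.5) * np.sin(2 * np.pi * 40 * t))
noisy = add_noise_at_snr(burst, snr_db=10.0, rng=rng)
measured_snr_db = 10 * np.log10(np.mean(burst ** 2) / np.mean((noisy - burst) ** 2))
```

With these settings each row of the detection matrix holds 10,000 samples and each segment covers 20 m of cable, and the measured SNR of the noisy row should lie close to the requested 10 dB.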

Fig. 8 The VMD results

In this section, numerical results are presented to verify the classification performance of the proposed methodology. The experiments comprise a comparative experiment and an ablation experiment. Several related benchmark neural network models are adopted in the comparative experiment, including LSTM, RNN and the back propagation (BP) model. The BP neural network is a multilayer fully connected network trained with the error back propagation algorithm and is one of the most widely used neural network models. Note that, in the comparative experiment, all the benchmark models are embedded in the same network structure as the proposed model, i.e., the input data are processed with VMD and pass through the self-attention layer. In the ablation experiment, the self-attention module and the pyramid-like feature fusion structure are examined.

3.2 VMD for feature engineering

Through the VMD transformation, all the row vectors are converted to IMF features. The key point of the algorithm is to select an appropriate modal number K, penalty factor \(\alpha\) and fidelity coefficient \(\tau\). If K is too large, it causes modal repetition and introduces noise; if K is too small, the signal is under-decomposed. The modes differ mainly in their center frequencies. Therefore, an appropriate number of modes is determined by observing the distribution of the center frequencies under different mode numbers: K is increased from 1 until the center frequency of the last IMF component remains relatively stable, and that value of K is taken as the optimal one. In addition, to ensure fidelity after decomposition, the parameters are set to \(K=12\), \(\alpha = 3000\) and \(\tau = 0.2\). The VMD results of three fault event samples are shown in Fig. 8.
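The stopping rule for K can be written as a small helper. This is a sketch of one possible reading, not the authors' procedure: "relatively stable" is interpreted here as a relative change of at most `rel_tol` between consecutive values of K, and both the tolerance and the center frequencies in the example are invented for illustration. The function takes the center frequency of the last IMF for each K as input; running VMD itself is outside the sketch.

```python
def select_mode_number(last_center_freq, rel_tol=0.02):
    """Return the smallest K at which the last IMF's center frequency has stabilized.

    last_center_freq maps each tried K to the center frequency of the last IMF
    obtained with that K. If no K satisfies the tolerance, the largest K is returned.
    """
    ks = sorted(last_center_freq)
    for prev_k, k in zip(ks, ks[1:]):
        prev_f = last_center_freq[prev_k]
        f = last_center_freq[k]
        # Stable when the change from the previous K is small relative to f.
        if abs(f - prev_f) <= rel_tol * abs(f):
            return k
    return ks[-1]


# Invented center frequencies (Hz) for K = 1..5, for illustration only.
toy_freqs = {1: 50.0, 2: 120.0, 3: 210.0, 4: 260.0, 5: 262.0}
best_k = select_mode_number(toy_freqs)
```

In this toy input the last center frequency stops moving between K = 4 and K = 5, so the helper returns 5. A stricter tolerance pushes the choice toward larger K.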

3.3 Classification results

Fig. 9 Confusion matrix

Fig. 10 The prediction accuracy of the experiment

Table 1 The parameters of the ablation experiment

To ensure that the results of different models are comparable, all the experiments are conducted in the same environment. The pseudo-original data are divided into training, validation and test sets at a ratio of 6 : 2 : 2. The experiments run on a Windows 11 64-bit operating system with a 12th Gen Intel(R) Core(TM) i5-12400F 2.50 GHz processor. The program runs in MATLAB with the Adam solver. The initial learning rate is set to 0.01 and decreases by 75% every 10 epochs. The 12 IMF components are divided into three feature groups, i.e., IMF1 to IMF2, IMF3 to IMF7, and IMF8 to IMF12.
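The training settings above reduce to a few lines of arithmetic, sketched here in Python rather than MATLAB. Two readings are assumptions: "decreasing by 75%" is taken to mean multiplying the learning rate by 0.25 at each drop, and the 6 : 2 : 2 split is applied to the 3000 fault-event rows. All names are illustrative.

```python
INITIAL_LR = 0.01   # from the text
DROP_FACTOR = 0.25  # assumed reading of "decreasing by 75%"
DROP_PERIOD = 10    # epochs, from the text


def learning_rate(epoch):
    """Piecewise-constant learning rate for a zero-based epoch index."""
    return INITIAL_LR * DROP_FACTOR ** (epoch // DROP_PERIOD)


def split_sizes(n_rows, ratios=(6, 2, 2)):
    """Training/validation/test sizes; the test set absorbs any remainder."""
    total = sum(ratios)
    n_train = n_rows * ratios[0] // total
    n_val = n_rows * ratios[1] // total
    return n_train, n_val, n_rows - n_train - n_val


# 1-based IMF indices of the three feature groups named in the text.
IMF_GROUPS = [list(range(1, 3)), list(range(3, 8)), list(range(8, 13))]

sizes = split_sizes(3000)  # assumes the split is over the 3000 fault-event rows
```

Under these assumptions the split gives 1800 training, 600 validation and 600 test rows, and the learning rate is 0.01 for epochs 0 to 9, 0.0025 for epochs 10 to 19, and so on.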

The classification results at SNR = 10 dB are shown by the confusion matrix in Fig. 9: the detection accuracy of the self-attention-based Bi-LSTM model reaches \(99.17\%\), while those of the self-attention-based LSTM, RNN and BP models are \(96.67\%\), \(95\%\) and \(92\%\), respectively. To test the robustness of the proposed self-attention mechanism-based Bi-LSTM model, we also conduct the comparative experiment under different SNR conditions of the original data; the results are shown in Fig. 10. The detection accuracy of all models increases with SNR, since classification is more precise when the background noise interferes less. The network using the Bi-LSTM module outperforms the benchmark models over a wide range of SNR, followed by LSTM, RNN and BP in that order. This is because the bidirectional recurrent structure takes both past and future data into account during training, so the features of the time-dependent sequence can be fully learned by the network. In addition, the Bi-LSTM network reaches a detection accuracy of \(99.33\%\) at SNR = 15 dB and still maintains \(92.67\%\) at SNR = \(-15\) dB. Although the benchmark models can exceed \(90\%\) accuracy when the SNR is high enough, their detection accuracy declines sharply below \(90\%\) once the SNR becomes severe. The Bi-LSTM module therefore proves robust for processing time-dependent sequences. With their gated control units, Bi-LSTM and LSTM perform better than the RNN model, which confirms the effectiveness of the LSTM cell. The BP model has the lowest accuracy of all the benchmark models, which indicates that recurrent neural networks are more effective than fully connected networks at processing time-dependent sequences.
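For readers less familiar with confusion matrices, the following Python sketch shows how an overall accuracy and per-class recalls are read off a three-class matrix. The counts are invented and are not the values in Fig. 9; they merely assume 200 test samples per fault class and five misclassifications, which happens to give the same rounded accuracy as the figure quoted for the Bi-LSTM model. Rows are taken as true classes and columns as predicted classes, which is also an assumption.

```python
def accuracy(confusion):
    """Overall accuracy: correctly classified samples divided by all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_recall(confusion):
    """Fraction of each true class (row) that was classified correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Hypothetical counts for three fault classes; not taken from the paper.
confusion = [
    [200, 0, 0],
    [2, 197, 1],
    [0, 2, 198],
]

acc_percent = round(100.0 * accuracy(confusion), 2)
recalls = per_class_recall(confusion)
```

With 600 test samples, each misclassified sample lowers the accuracy by about 0.17 percentage points, which is why the reported accuracies move in steps of that size.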

For the ablation experiment, we test the effectiveness of two modules, i.e., the self-attention mechanism and the pyramid-like structure for feature fusion; the parameters and notations are listed in Table 1. Note that, in the fully feature fusion scheme, all the IMF components are fused together at the input stage and then pass through four successive layers of Bi-LSTM modules. Figure 10b shows the results of the ablation experiment under different SNR conditions. The proposed model, with self-attention-based Bi-LSTM and the pyramid-like structure for feature fusion, fluctuates only slightly over a wide range of SNR, while the other models fluctuate sharply as the SNR changes. Therefore, the proposed model is more stable than the other network structures. The results show that an attention mechanism helps detection considerably, since it lets the classification network focus on the crucial features and pay less attention to irrelevant information. Additionally, the self-attention mechanism is superior to the plain attention mechanism. However, this performance comes at the cost of higher computational complexity. Moreover, the detection accuracy of the pyramid-like structure for feature fusion is better than that of fully feature fusion. With the pyramid-like structure, features of various depths are fused and learned by the network at different gradients, so the network can fully extract the hidden features carried by the IMF components. Therefore, the ablation experiment demonstrates the effectiveness of the self-attention mechanism and the pyramid-like structure for feature fusion, which together yield about \(20\%\) improvement in accuracy when the SNR is severe and about \(13\%\) improvement under higher SNR compared with the fully feature fusion-based Bi-LSTM model without an attention mechanism.
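How self-attention lets a network weight related features more heavily can be illustrated with scaled dot-product attention in the sense of reference [30]. The Python sketch below is a deliberate simplification and not the layer used in the paper: the queries, keys and values are all the raw input vectors (no learned projection matrices), there is a single head, and the three toy vectors merely stand in for IMF-derived features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned weights).

    x is a list of equal-length feature vectors. Returns the fused vectors and
    the attention weight matrix, one row of weights per input vector.
    """
    d = len(x[0])
    weights = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights.append(softmax(scores))
    fused = [
        [sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
        for row in weights
    ]
    return fused, weights


# Toy feature vectors: the first two are similar, the third is unrelated.
features = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
fused, weights = self_attention(features)
```

Each row of `weights` sums to one, and the first vector assigns more weight to the similar second vector than to the unrelated third one. This is the sense in which attention emphasizes related features and suppresses irrelevant ones.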

4 Conclusions

For the fault detection of optical fiber composite submarine cables, this paper proposes a VMD and self-attention mechanism-based Bi-LSTM model. The vibration waveforms of typical optical fiber composite submarine cable fault events are obtained through ANSYS simulation. To locate and detect the fault events of submarine cables, we then generate the detection matrix from the marine background noise and the pseudo-original vibration signals of three typical fault events. Applying VMD to a pseudo-original vibration signal decomposes it into IMF components that represent features at different scales. With the self-attention mechanism and the pyramid-like feature fusion structure, the IMF components are fused into different layers of the Bi-LSTM module, and the fault detection result is finally output through the classification layer. The results show that the proposed model achieves a prediction accuracy of \(99.33\%\) at SNR = 15 dB and still maintains \(92.67\%\) at SNR = \(-15\) dB. Comprehensive experiments show that the proposed model outperforms the other benchmark network models and is robust and stable under different SNR conditions.

References


  1. P. Ma, K. Liu, J. Jiang, Z. Li, P. Li, T. Liu, Probabilistic event discrimination algorithm for fiber optic perimeter security systems. J. Lightwave Technol. 36(11), 2069–2075 (2018)

  2. W. Mei, W. Pan, T. Chen, G. Song, J. Di, Research and design of dc500kv optical fiber composite submarine cable, in 2017 4th IEEE International Conference on Engineering Technologies and Applied Sciences (ICETAS) (IEEE, 2017), pp. 1–6

  3. A. Furlan, M.L. Heldwein, Hybrid method to compute the total series impedance of submarine power cables, in IECON 2019-45th Annual Conference of the IEEE Industrial Electronics Society, vol. 1 (IEEE, 2019), pp. 2178–2183

  4. R.B.P. Chagas, A. Furlan, M.L. Heldwein, Finite element method analysis of a three-media submarine cable ground return impedance at varying depth, in 2020 IEEE 21st Workshop on Control and Modeling for Power Electronics (COMPEL) (2020), pp. 1–7.

  5. T. Zhang, A. Du, L. Li, R. Wei, H. Tan, K. Wang, A. Abu-Siada, Z. Li, Analysis of three-core composite submarine cable damage due to ship anchor. IEEE Access 10, 93910–93920 (2022)

  6. H. Matsumoto, E. Araki, T. Kimura, G. Fujie, K. Shiraishi, T. Tonegawa, K. Obana, R. Arai, Y. Kaiho, Y. Nakamura et al., Detection of hydroacoustic signals on a fiber-optic submarine cable. Sci. Rep. 11(1), 1–12 (2021)

  7. B.M.T. Fouda, B. Yang, D. Han, B. An, Pattern recognition of optical fiber vibration signal of the submarine cable for its safety. IEEE Sens. J. 21(5), 6510–6519 (2020)

  8. H. Katsuta, A. Oda, T. Etou, N. Fukushima, Y. Fujii, Y. Shirasaki, Development of new cable probe for localizing deep buried submarine telecommunication and power cables, in 2018 OCEANS—MTS/IEEE Kobe Techno-Oceans (OTO) (2018), pp. 1–5.

  9. X. Wang, T. Ying, W. Tian, Spectrum representation based on STFT, in 2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) (2020), pp. 435–438.

  10. K.N. Chaudhury, M. Unser, Construction of Hilbert transform pairs of wavelet bases and Gabor-like transforms. IEEE Trans. Signal Process. 57(9), 3411–3425 (2009).

  11. A. Vijayasankar, P.R. Kumar, Correction of blink artifacts from single channel EEG by EMD-IMF thresholding, in 2018 Conference on Signal Processing and Communication Engineering Systems (SPACES) (2018), pp. 176–180.

  12. K. Dragomiretskiy, D. Zosso, Variational mode decomposition. IEEE Trans. Signal Process. 62(3), 531–544 (2014).

  13. Y. Huang, W.W. Liu, Regression analysis model based on data processing and Matlab numerical simulation, in 2022 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC) (2022), pp. 1115–1118.

  14. C. Qiu, S. Chen, Y. Liang, Acoustic analysis of age-related variability of Guangzhou Cantonese tone merging based on the linear regression, in 2021 2nd International Conference on Education, Knowledge and Information Management (ICEKIM) (2021), pp. 914–918.

  15. W. Jiao, Research of relationship between corporate governance and financial performance based on multiple regression analysis method, in 2020 Management Science Informatization and Economic Innovation Development Conference (MSIEID) (2020), pp. 494–497.

  16. S. Wang, X. Cai, Literature analysis of Xi Jinping’s important expositions on ideological and political education based on cluster analysis, in 2021 2nd International Conference on Education, Knowledge and Information Management (ICEKIM) (2021), pp. 792–796.

  17. J. Liu, Y. Ye, Y. Du, Text feature extraction and clustering analysis of events caused by the cockpit crew, in 2020 IEEE 2nd International Conference on Civil Aviation Safety and Information Technology (ICCASIT) (2020), pp. 1018–1022.

  18. S. Das, B. Saket, B.C. Kwon, A. Endert, Geono-cluster: interactive visual cluster analysis for biologists. IEEE Trans. Vis. Comput. Graph. 27(12), 4401–4412 (2021).

  19. A.A. Kabanov, Application of support vector machines to the multiclass classification electromyography signal patterns, in 2021 XV International Scientific-Technical Conference on Actual Problems of Electronic Instrument Engineering (APEIE) (2021), pp. 92–95.

  20. L. Zeyang, Research on intelligent acceleration algorithm for big data mining in communication network based on support vector machine, in 2021 IEEE 4th International Conference on Automation, Electronics and Electrical Engineering (AUTEEE) (2021), pp. 479–483.

  21. F. Borges, A. Pinto, D. Ribeiro, T. Barbosa, D. Pereira, R. Magalháes, B. Barbosa, D. Ferreira, An unsupervised method based on support vector machines and higher-order statistics for mechanical faults detection. IEEE Lat. Am. Trans. 18(06), 1093–1101 (2020).

  22. Y. Sari, M. Maulida, E. Gunawan, J. Wahyudi, Artificial intelligence approach for Baznas website using k-nearest neighbor (knn), in 2021 Sixth International Conference on Informatics and Computing (ICIC) (2021), pp. 1–4.

  23. A. Kumar, A. Verma, G. Shinde, Y. Sukhdeve, N. Lal, Crime prediction using k-nearest neighboring algorithm, in 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE) (2020), pp. 1–4.

  24. B. Rahman, H.L. Hendric Spits Warnars, B. Subirosa Sabarguna, W. Budiharto, Heart disease classification model using k-nearest neighbor algorithm, in 2021 Sixth International Conference on Informatics and Computing (ICIC) (2021), pp. 1–4.

  25. X. Wang, H. Jiang, B. Yang, A k-nearest neighbor medoid-based outlier detection algorithm, in 2021 International Conference on Communications, Information System and Computer Engineering (CISCE) (2021), pp. 601–605.

  26. C.-Y. Huang-Fu, C.-H. Liao, J.-Y. Wu, Comparing the performance of machine learning and deep learning algorithms classifying messages in facebook learning group, in 2021 International Conference on Advanced Learning Technologies (ICALT) (2021), pp. 347–349.

  27. D.-J. Choi, J.-H. Han, S.-U. Park, S.-K. Hong, Comparison of motor fault diagnosis performance using rnn and k-means for data with disturbance, in 2020 20th International Conference on Control, Automation and Systems (ICCAS) (2020), pp. 443–446.

  28. S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997).

  29. S. Wei, P. Jiang, Q. Yuan, J. Wang, Mobile application network behavior detection and evaluation with WGAN and Bi-LSTM, in TENCON 2018—2018 IEEE Region 10 Conference (2018), pp. 0044–0049.

  30. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in Advances in Neural Information Processing Systems, vol. 30 (2017)

  31. M.-H. Guo, Z.-N. Liu, T.-J. Mu, S.-M. Hu, Beyond self-attention: external attention using two linear layers for visual tasks. IEEE Trans. Pattern Anal. Mach. Intell. (2022)

  32. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018)

  33. T. Brown, B. Mann, N. Ryder, M. Subbiah, J.D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., Language models are few-shot learners. Adv. Neural. Inf. Process. Syst. 33, 1877–1901 (2020)

  34. Z. Cheng, C. Yan, F.-X. Wu, J. Wang, Drug-target interaction prediction using multi-head self-attention and graph attention network. IEEE/ACM Trans. Comput. Biol. Bioinf. 19(4), 2208–2218 (2021)

  35. D. Zhang, Z. Zheng, M. Li, R. Liu, CSART: channel and spatial attention-guided residual learning for real-time object tracking. Neurocomputing 436, 260–272 (2021)

  36. T. Ren, H. Xu, G. Jiang, M. Yu, X. Zhang, B. Wang, T. Luo, Reinforced swin-convs transformer for simultaneous underwater sensing scene image enhancement and super-resolution. IEEE Trans. Geosci. Remote Sens. 60, 1–16 (2022)

  37. F. Yang, H. Yang, J. Fu, H. Lu, B. Guo, Learning texture transformer network for image super-resolution, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5791–5800 (2020)

  38. X. Huang, Q. Guo, C. Cai, Y. Chen, X. Ou, C. He, G. Ji, Fault diagnosis and location for long submarine cable based on frequency domain refection, in 2020 IEEE 4th Conference on Energy Internet and Energy System Integration (EI2) (2020), pp. 4047–4050.

  39. G. O’Reilly, J. Kearney, J. Lawler, T. Looby, Design of an underwater cable fault location device to precisely locate submarine cable faults, in 2017 52nd International Universities Power Engineering Conference (UPEC) (2017), pp. 1–5.

  40. G.-Y. Kwon, C.-K. Lee, G.S. Lee, Y.H. Lee, S.J. Chang, C.-K. Jung, J.-W. Kang, Y.-J. Shin, Offline fault localization technique on HVDC submarine cable via time-frequency domain reflectometry. IEEE Trans. Power Deliv. 32(3), 1626–1635 (2017).

  41. C. Yanhui, Retracted article: Monitoring method based on GIS for submarine cable fault data. Arab. J. Geosci. 14(6), 490 (2021).

  42. G. Zheng, Research on Vibration Signal Analysis Method of Fiber Optic Composite Submarine Cable Based on ESMD and SVM

  43. J. Yang, Study on Time-frequency and Phase Characteristics of Low Frequency Marine Ambient Noise

  44. R.A. Roberts, C.T. Mullis, Digital Signal Processing (Addison-Wesley Longman Publishing Co., Inc., Boston, 1987)

  45. T.F.B. Marie, Y. Bin, H. Dezhi, A. Bowen, Principle and application state of fully distributed fiber optic vibration detection technology based on ϕ-OTDR: a review. IEEE Sens. J. 21(15), 16428–16442 (2021)



The work was supported by the National Natural Science Foundation of China under Grant 62001067.

Author information


The concept of this paper was put forward by JL, WF, YL and JL. JL built and optimized the model, made the performance simulations, wrote the paper and revised the paper. Other authors assisted in related work. All authors read and approved the final manuscript.

Author's information

Jie Lu received her B.Eng. degree in electronic information engineering from South-Central Minzu University. She is now a postgraduate student at Chongqing University. Her current research focuses on next-generation wireless communication, interference management and deep learning.

Wenjiang Feng received the Ph.D. degree in electrical engineering from Chongqing University, Chongqing, China, in 2000. He is currently a Professor with the College of Microelectronics and Communication Engineering, Chongqing University. His research interests include all aspects of MIMO communication, including limited feedback techniques, antenna design, interference management and full-duplex communication, cognitive radio, special mobile communication systems, and emergency communication. He is a Peer Review Expert of the Natural Science Foundation of China and is a Senior Member of the China Institute of Communications. He also serves as an Editorial Board Member of Data Communication, China.

Yuan Li received his Bachelor of Science degree in electronic information science and technology from Chongqing University. He is now a postgraduate student at Chongqing University. His current research focuses on backscatter communication systems and deep learning.

Juntao Zhang received his B.Eng. degree in communication engineering from Chongqing University. He is now a master's candidate at Chongqing University. His current research focuses on radio frequency fingerprinting of wireless devices and machine learning.

Yongqi Zou received the B.Eng. degree in electronic information engineering from the Shandong University of Technology and is now a master's degree candidate at Chongqing University, with current research focusing on deep learning in the field of energy.

Jingfu Li received his B.Eng. degree in communication engineering from the Chongqing University of Posts and Telecommunications and his M.Eng. degree in information and communication engineering from Chongqing University. He is now a doctoral candidate at Chongqing University. His current research focuses on wireless communication, interference management and machine learning. He has published three SCI-indexed papers and two EI-indexed papers.

Corresponding author

Correspondence to Jie Lu.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Lu, J., Feng, W., Li, Y. et al. VMD and self-attention mechanism-based Bi-LSTM model for fault detection of optical fiber composite submarine cables. EURASIP J. Adv. Signal Process. 2023, 29 (2023).
