Electrocardiogram prediction based on variational mode decomposition and a convolutional gated recurrent unit

Electrocardiogram (ECG) prediction is highly important for detecting and storing heart signals and for identifying potential health hazards. To extend the duration and improve the accuracy of ECG prediction while filtering noise, a new algorithm named VMD-ConvGRU is proposed. It is based on variational mode decomposition (VMD) and a convolutional gated recurrent unit (ConvGRU). VMD removes noise such as baseline drift directly and without manual intervention, which greatly improves the usability of the model, and its combination with ConvGRU improves both the prediction duration and the prediction accuracy. The proposed algorithm was compared with three related algorithms (PSR-NN, VMD-NN and TS fuzzy) on MIT-BIH, an internationally recognized arrhythmia database. The experiments showed that VMD-ConvGRU not only achieves better prediction accuracy than the other three algorithms but also has a considerable advantage in prediction duration. In addition, prediction experiments on both the MIT-BIH and European ST-T databases show that VMD-ConvGRU generalizes better than the other methods.


Introduction
Time series data are generated by the regular observation and collection of certain phenomena. In the medical field, many biomedical data can be treated as time series. An electrocardiogram (ECG) is a graphical representation of the electrical potential of the heart and is commonly used to detect cardiovascular disease (CVD). In general, an ECG has three main components: a P wave, a QRS complex and a T wave. The ECG in Fig. 1 was recorded at a paper speed of 25 mm/s (5 large squares/s) [1]. In the vertical direction, 10 mm represents 1 mV. Each 1 mm small square therefore represents 0.04 s (40 ms) horizontally and 0.10 mV vertically.
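As a quick check of this grid arithmetic, the following sketch converts distances measured on ECG paper into time and voltage. The function names are illustrative. The constants are the standard calibration quoted above: 25 mm/s and 10 mm/mV.

```python
PAPER_SPEED_MM_PER_S = 25.0   # horizontal calibration: 25 mm per second
GAIN_MM_PER_MV = 10.0         # vertical calibration: 10 mm per millivolt

def mm_to_seconds(mm):
    """Horizontal distance on the ECG paper -> elapsed time in seconds."""
    return mm / PAPER_SPEED_MM_PER_S

def mm_to_millivolts(mm):
    """Vertical distance on the ECG paper -> amplitude in millivolts."""
    return mm / GAIN_MM_PER_MV

# One small (1 mm) square: 0.04 s and 0.1 mV; one large (5 mm) square: 0.2 s,
# so five large squares span one second.
small_square = (mm_to_seconds(1), mm_to_millivolts(1))
large_square_s = mm_to_seconds(5)
```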
ECG prediction is a scientific analysis method based on time series data. If potential health hazards can be predicted in time, timely medical assistance can be provided, and compressed storage of the relevant signals becomes easier. Existing research on deep learning-based ECG prediction has notable limitations in handling abnormal conditions, noise and interference, all of which can significantly degrade model performance. Furthermore, most models predict only a single point after the input signal, so both the output duration and the prediction accuracy require further optimization.
This paper therefore proposes VMD-ConvGRU, an ECG prediction algorithm that combines variational mode decomposition (VMD) and a convolutional gated recurrent unit (ConvGRU). The aim is to solve key noise-filtering problems such as baseline drift and to further improve the prediction duration and accuracy. A generalization test of the algorithm is also carried out. Compared with other methods, VMD-ConvGRU is particularly well suited to ECG compression, variable-length signal restoration and complex application environments such as real-world clinical settings. The contributions of this paper are summarized as follows:
(1) Preprocessing the data with VMD effectively filters out noise such as baseline drift. Predicting multiple adaptively segmented components captures the different frequency-domain characteristics of the actual signal.
(2) The encoder-forecaster (EF) structure allows the input length and prediction length to be changed freely. This enables continuous prediction over a period and effectively extends the prediction duration. Moreover, multilayer information exchange is realized between the stacked encoder and the forecaster.
Finally, combining VMD signal decomposition with EF-ConvGRU greatly improves the prediction accuracy.
The remainder of this article is structured as follows. Section 2 discusses related work. In Section 3, VMD and ConvGRU are introduced and analyzed theoretically. Section 4 proposes an improved ECG prediction algorithm. Section 5 presents the experimental analysis and compares the performance of the proposed algorithm with that of related algorithms. Finally, Section 6 gives the conclusions and future work.

Related work
ECG signal preprocessing is an indispensable step in the prediction process. VMD is an adaptive, completely nonrecursive decomposition technique that has been gradually applied in various fields since it was first proposed [2].
Upadhyay et al. proposed a VMD-based method to detect sound and silent regions in speech signals [3]. Wang et al. combined the particle swarm optimization algorithm with VMD and applied it to the fault diagnosis of complex rotating machinery [4]. Lahmiri proposed a model combining VMD and a backpropagation neural network (BPNN). In this model, VMD decomposes a price series into a set of variational modes, and the BPNN is then trained to predict the daily stock price [5]. This approach has also been applied to ECG decomposition. Typical ECG signal prediction studies are as follows.

Su et al. proposed an ECG prediction method combining phase space reconstruction and the Takagi-Sugeno (TS) fuzzy model [6], which used only three data points in its experiments. Huang et al. proposed an ECG prediction method based on the autoregressive integrated moving average (ARIMA) model and the discrete wavelet transform (DWT) [7]. Sun et al. proposed VMD-NN, an ECG prediction method combining VMD and a backpropagation neural network [8]. In VMD-NN, the ECG signal is decomposed into n modes by VMD, and the n − k modes that remain after discarding k noise modes are input into the backpropagation neural network. The network has three layers: 9 input nodes (that is, n − k = 9), 18 hidden nodes and 1 output node. The simulation experiment used only one example (No. 100). The results showed that the waveform of the ECG signal is not affected by fault tolerance, which verifies the effectiveness of the algorithm for ECG prediction. In 2018, the same team proposed a prediction model based on phase space reconstruction and a neural network [9]. Again, only example No. 100 from the MIT-BIH database was used to demonstrate its effectiveness. Huang et al. proposed an ECG prediction method combining VMD, the Cao method and a long short-term memory (LSTM) neural network, which significantly improved the prediction accuracy [10].
The studies reviewed above show that predicting an ECG signal only one point at a time is far from practical, and that there is still room to improve the accuracy of these algorithms. In addition, because these studies used only limited data, a systematic study of the generalizability of the related algorithms is highly important.

Variational mode decomposition
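The goal of VMD is to decompose the original input signal into multiple intrinsic mode functions (IMFs). VMD obtains the components by iteratively searching for the optimal solution of a variational model, which determines the center frequency and bandwidth of each component. This adaptively subdivides the signal frequency domain and effectively separates the components. VMD defines an IMF as the amplitude-modulated-frequency-modulated (AM-FM) signal

$$u_k(t) = A_k(t)\cos\big(\phi_k(t)\big),$$

where $u_k(t)$ is the modal component, $A_k(t)$ is its instantaneous amplitude, and $\phi_k(t)$ is its phase. The instantaneous frequency of $u_k(t)$ is defined as

$$\omega_k(t) = \phi_k'(t) = \frac{d\phi_k(t)}{dt}.$$

Each mode $u_k(t)$ is assumed to be compact around a center frequency with limited bandwidth. Two conditions apply: the sum of all modes must equal the original input signal $f(t)$, and the sum of the estimated bandwidths of the modes must be minimal. The constrained variational model is

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t),$$

where $f$ is decomposed into $K$ IMF components $\{u_k\} := \{u_1, u_2, \ldots, u_K\}$, and $\{\omega_k\} := \{\omega_1, \omega_2, \ldots, \omega_K\}$ are their center frequencies. Obtaining the IMFs thus becomes the problem of solving this constrained variational model. Introducing the penalty factor $\alpha$ and the Lagrange multiplier $\lambda$ transforms the problem into the unconstrained augmented Lagrangian

$$\mathcal{L}(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle.$$

The alternating direction method of multipliers (ADMM) solves this problem by updating $\hat{u}_k^{n+1}$, $\omega_k^{n+1}$ and $\hat{\lambda}^{n+1}$ alternately to find the saddle point of the augmented Lagrangian:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k^{n}\right)^2},$$

$$\omega_k^{n+1} = \frac{\int_0^\infty \omega \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}{\int_0^\infty \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega},$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau\left(\hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega)\right),$$

where $\hat{\cdot}$ denotes the Fourier transform and $\tau$ is the dual-ascent step. The iteration stops when

$$\sum_{k=1}^{K} \frac{\left\|\hat{u}_k^{n+1} - \hat{u}_k^{n}\right\|_2^2}{\left\|\hat{u}_k^{n}\right\|_2^2} < \varepsilon.$$

Finally, $K$ IMF components are obtained according to the frequency-domain characteristics of the actual signal, which completes the adaptive segmentation of the signal frequency band. In our application scenario, we follow the pseudocode in [2]. With appropriate parameter initialization and convergence criteria, $u_k$, $\omega_k$ and the other variables converge. Therefore, VMD optimized with ADMM can successfully decompose all the preprocessed ECG signals in the datasets described in Sect. 5.1. Regarding complexity, the algorithm uses two successive for loops with complexity $O(K)$. Within one cycle, the minimum value is selected with an optimal complexity of $O(K\log K)$, so the final complexity is $O(K^2\log K)$.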
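To make the update rules concrete, the following NumPy sketch applies the three ADMM updates above to a discrete spectrum. It is a simplified re-implementation for illustration only, not the MATLAB vmd tool used in the experiments. It makes several assumptions: the signal is not mirror-extended, only the non-negative half of the spectrum is kept (via the analytic signal), frequencies are measured in cycles per sample, and the initial center frequencies are spread uniformly. The function name and the defaults α = 2000 and τ = 0 are also assumptions.

```python
import numpy as np

def vmd(f, K, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD sketch. Returns (modes, centre_freqs), highest frequency first.

    modes has shape (K, N); centre_freqs are in cycles/sample (0 to 0.5).
    """
    f = np.asarray(f, dtype=float)
    N = f.size
    # One-sided (analytic) spectrum: keep DC once, double the positive bins.
    h = np.zeros(N)
    h[0] = 1.0
    h[1:(N + 1) // 2] = 2.0
    if N % 2 == 0:
        h[N // 2] = 1.0
    f_hat = np.fft.fft(f) * h
    keep = h > 0
    freqs = np.arange(N) / N                 # meaningful on the kept (non-negative) bins

    u_hat = np.zeros((K, N), dtype=complex)
    omega = 0.5 * np.arange(K) / K           # uniform initial centre frequencies (assumption)
    lam_hat = np.zeros(N, dtype=complex)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Mode update: Wiener-like filter centred on omega_k (uses the newest other modes).
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam_hat / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            u_hat[k, ~keep] = 0
            # Centre-frequency update: centre of gravity of the mode's power spectrum.
            power = np.abs(u_hat[k, keep]) ** 2
            omega[k] = np.sum(freqs[keep] * power) / (np.sum(power) + 1e-12)
        # Dual ascent on the reconstruction constraint (switched off when tau = 0).
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        # Aggregated form of the stopping rule.
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break

    modes = np.real(np.fft.ifft(u_hat, axis=1))   # real part of each analytic mode
    order = np.argsort(omega)[::-1]               # IMF 1 = highest centre frequency
    return modes[order], omega[order]
```

With τ = 0, the dual-ascent step is switched off, so the modes only approximately sum to the input. Relaxing exact reconstruction in this way is useful for noisy signals, because forcing an exact sum would push the noise into the modes. The ordering of the output follows the convention used later in the experiments, where frequency decreases from IMF 1 to the last IMF.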

Encoding-forecasting structure based on a convolutional gated recurrent unit
When the input and output sequences have different lengths, a sequence-to-sequence variant of the recurrent neural network (RNN) is a good choice; its structure is shown in Fig. 2. This variant first encodes the input sequence into a context vector C. C then becomes the initial hidden state of the decoding network, which is called the forecaster module in the rest of this paper. The encoding-forecasting structure based on a convolutional gated recurrent unit (EF-ConvGRU) can be regarded as a special encoder-decoder structure [11]. Instead of sharing a single set of information, the encoder and forecaster share multiple sets of information, and information is exchanged at every layer. The structure is shown in Fig. 3. On the left, information flows from bottom to top through the encoder; on the right, information flows from top to bottom through the forecaster. The encoder input consists of the observations $x_{t-\sigma+1}$ to $x_t$, that is, σ consecutive states. From these, the τ consecutive states $x_{t+1}$ to $x_{t+\tau}$ are predicted.
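The encoder inputs $x_{t-\sigma+1},\ldots,x_t$ and forecaster targets $x_{t+1},\ldots,x_{t+\tau}$ can be cut from a long recording with a sliding window, treating each state as a segment of `length` samples. The NumPy sketch below is illustrative. The function name is hypothetical, and the assumption that `step=1` corresponds to the "dense sampling" used for training is ours, not stated in the paper.

```python
import numpy as np

def make_windows(x, sigma, tau, length, step=1):
    """Cut (input, target) pairs from a 1-D signal.

    Each pair holds sigma input segments followed by tau target segments,
    every segment being `length` points long. step=1 gives densely
    overlapping windows.
    """
    x = np.asarray(x, dtype=float)
    n_in, span = sigma * length, (sigma + tau) * length
    starts = range(0, x.size - span + 1, step)
    inputs = np.stack([x[s:s + n_in].reshape(sigma, length) for s in starts])
    targets = np.stack([x[s + n_in:s + span].reshape(tau, length) for s in starts])
    return inputs, targets   # shapes (n, sigma, length) and (n, tau, length)
```

For example, `make_windows(signal, sigma=2, tau=2, length=36)` yields pairs of 72 input samples and 72 target samples.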
The encoder consists of two kinds of modules: ConvGRU, and DownScale, which reduces the data length. The forecaster contains three kinds of modules: ConvGRU, UpScale, which restores the data length, and Predict, which converts the data into the final predicted values.

Architecture of the VMD-ConvGRU model
Integrating the VMD and ConvGRU algorithms yields the VMD-ConvGRU framework shown in Fig. 4. The input is the original ECG signal of σ × len points. The output is the predicted ECG signal of τ × len points, free of noise such as baseline drift. Here, σ is the number of input ECG segments, τ is the number of output ECG segments, and len is the length of each segment. The algorithm first decomposes the original ECG signal with VMD and removes some components to reduce noise. It then uses EF-ConvGRU to predict each remaining IMF component and finally combines the results into the predicted ECG signal. Predicting the components separately has two advantages. First, the components can be computed simultaneously in a distributed system for higher efficiency. Second, pathological features can be examined separately for each component, which supports multidimensional analysis.
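The decompose-predict-sum flow can be sketched in NumPy as follows. The sketch rests on two assumptions. First, the VMD components are passed in as an array with IMF 1 first. Second, `naive_last_segment` is a hypothetical placeholder that stands in for the trained EF-ConvGRU predictor of each component. With k = 10 and IMF 1 and IMF 10 discarded, `keep` would be `range(1, 9)` in zero-based indexing.

```python
import numpy as np

def forecast_from_imfs(imfs, keep, predict_component):
    """Predict each retained IMF independently and sum the forecasts.

    imfs: array (K, N) of VMD components, IMF 1 first.
    keep: indices of the components to keep (noise components excluded).
    predict_component: callable mapping one component's history to its forecast.
    """
    # The per-component predictions are independent, so they could run in parallel.
    forecasts = [predict_component(imfs[k]) for k in keep]
    return np.sum(forecasts, axis=0)

def naive_last_segment(component, horizon=4):
    """Hypothetical placeholder predictor: repeat the last `horizon` samples."""
    return component[-horizon:]
```

Because the components are combined only at the end, each `predict_component` call could run on a different worker without changing the result.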

Fig. 4 Algorithm framework of VMD-ConvGRU
Unlike image data, ECG signals are usually one-dimensional sequential signals. The convolution operation in ConvGRU is therefore replaced with a one-dimensional convolution. Like two-dimensional convolution, one-dimensional convolution involves parameters such as the kernel size, stride and padding. The difference is that a one-dimensional convolution kernel usually has size n × 1 and moves only along the time (length) axis, which is the direction of the stride. Padding is applied only at the two ends of this axis. A one-dimensional convolution layer is therefore specified by its input channels, output channels, kernel size, stride and padding.

The classical GRU based on matrix multiplication can be written as

$$z_t = \mathrm{sigmoid}\left(W_{xz} x_t + W_{hz} h_{t-1} + b_z\right),$$
$$r_t = \mathrm{sigmoid}\left(W_{xr} x_t + W_{hr} h_{t-1} + b_r\right),$$
$$\tilde{h}_t = \tanh\left(W_{xh} x_t + W_{hh}\left(r_t \odot h_{t-1}\right) + b_h\right),$$
$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t,$$

where $\mathrm{sigmoid}(\cdot)$ is the logistic sigmoid function and $\odot$ is the element-wise product. The inputs are the observation at the current moment $x_t$ and the hidden state at the previous moment $h_{t-1}$. The cell parameters are mainly the weight matrices W and biases b, and the main operation is matrix multiplication. The ConvGRU is computed as

$$Z_t = \mathrm{sigmoid}\left(W_{xz} * X_t + W_{hz} * H_{t-1} + b_z\right),$$
$$R_t = \mathrm{sigmoid}\left(W_{xr} * X_t + W_{hr} * H_{t-1} + b_r\right),$$
$$\tilde{H}_t = \tanh\left(W_{xh} * X_t + W_{hh} * \left(R_t \circ H_{t-1}\right) + b_h\right),$$
$$H_t = \left(1 - Z_t\right) \circ H_{t-1} + Z_t \circ \tilde{H}_t,$$

where $*$ denotes one-dimensional convolution and $\circ$ the element-wise product. The inputs $X_t$ and $H_{t-1}$ have the same meaning as in the classical GRU, but their dimensions are extended to multichannel information. For the first ConvGRU layer, the input dimension changes from $\mathbb{R}^{L}$ to $\mathbb{R}^{1\times L}$, where 1 indicates that there is currently only one channel. The input of an upper ConvGRU layer lies in $\mathbb{R}^{C_i\times L}$, where $C_i$ indicates that there may be multiple channels. The hidden state $H_{t-1} \in \mathbb{R}^{C_h\times L}$ may also have multiple channels, and $C_i$ and $C_h$ may differ. The cell parameters are mainly the convolution kernels W and biases b, and the main operation is convolution. The final output $H_t \in \mathbb{R}^{C_h\times L}$ has the same dimension as $H_{t-1}$.
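The NumPy sketch below implements one step of a one-dimensional ConvGRU cell following the equations above. It uses stride 1 and zero "same" padding so that $H_t$ keeps the length L. It is illustrative only: the class and function names, the random initialization and the kernel size 3 are assumptions. A practical model would use a deep-learning framework; the paper uses PyTorch.

```python
import numpy as np

def conv1d_same(x, w, b):
    """1-D cross-correlation as computed by deep-learning Conv1d layers,
    stride 1, zero 'same' padding. x: (C_in, L); w: (C_out, C_in, K), K odd; b: (C_out,)."""
    _, _, k = w.shape
    pad = k // 2
    length = x.shape[1]
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((w.shape[0], length))
    for j in range(k):                       # accumulate one kernel tap at a time
        out += w[:, :, j] @ xp[:, j:j + length]
    return out + b[:, None]

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class ConvGRUCell1D:
    """One-dimensional ConvGRU cell (illustrative, randomly initialized)."""

    def __init__(self, c_in, c_h, kernel_size=3, seed=0):
        rng = np.random.default_rng(seed)
        def kernel(c):
            return rng.normal(0.0, 0.1, size=(c_h, c, kernel_size))
        self.w_xz, self.w_hz = kernel(c_in), kernel(c_h)   # update gate
        self.w_xr, self.w_hr = kernel(c_in), kernel(c_h)   # reset gate
        self.w_xh, self.w_hh = kernel(c_in), kernel(c_h)   # candidate state
        self.b_z, self.b_r, self.b_h = (np.zeros(c_h) for _ in range(3))
        self.no_bias = np.zeros(c_h)

    def step(self, x, h_prev):
        """x: (C_in, L), h_prev: (C_h, L) -> H_t: (C_h, L)."""
        conv = conv1d_same
        z = sigmoid(conv(x, self.w_xz, self.b_z) + conv(h_prev, self.w_hz, self.no_bias))
        r = sigmoid(conv(x, self.w_xr, self.b_r) + conv(h_prev, self.w_hr, self.no_bias))
        h_cand = np.tanh(conv(x, self.w_xh, self.b_h) + conv(r * h_prev, self.w_hh, self.no_bias))
        return (1.0 - z) * h_prev + z * h_cand
```

A single-channel ECG segment of 36 samples would be fed as `cell.step(segment[None, :], h)`, with `h` initialized to zeros of shape `(C_h, 36)`.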
The DownScale and UpScale modules are also shown in Fig. 3. In DownScale, data flow from bottom to top. A one-dimensional convolution with stride 2 first halves the length of the input data. Setting out_channel greater than in_channel lets the module capture richer multichannel features while the length is reduced. To speed up training, batch normalization (BN) is applied after the convolution; BN normalizes each channel using statistics computed across the samples of a batch. The leaky rectified linear unit (LeakyReLU) is used as the activation function. In UpScale, data flow from top to bottom. A transposed convolution increases the length of the data while reducing out_channel, and it is again followed by BN and LeakyReLU.
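The following pure-Python sketch checks the halving and doubling with the output-length formulas documented for PyTorch's `Conv1d` and `ConvTranspose1d`. The kernel size 4, stride 2, padding 1 and the two-level depth are illustrative assumptions; the actual settings are those in Table 1.

```python
def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution (PyTorch Conv1d formula)."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def conv_transpose1d_out_len(l_in, kernel_size, stride=1, padding=0,
                             output_padding=0, dilation=1):
    """Output length of a 1-D transposed convolution (PyTorch ConvTranspose1d formula)."""
    return (l_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

# Hypothetical DownScale/UpScale setting: kernel 4, stride 2, padding 1.
lengths = [36]
for _ in range(2):                       # two DownScale blocks in the encoder
    lengths.append(conv1d_out_len(lengths[-1], 4, stride=2, padding=1))
for _ in range(2):                       # two UpScale blocks in the forecaster
    lengths.append(conv_transpose1d_out_len(lengths[-1], 4, stride=2, padding=1))
# lengths == [36, 18, 9, 18, 36]
```

With odd lengths the round trip does not close by itself. For example, 9 → 4 → 8 with these settings, so `output_padding=1` is needed on the way up to recover 9.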
Unlike ReLU [12], which directly sets negative inputs to zero, the LeakyReLU [13] activation function is

$$\mathrm{LeakyReLU}(x) = \begin{cases} x, & x \ge 0,\\ a x, & x < 0,\end{cases}$$

which uses a parameter a to scale the negative part. In this paper, a is set to the recommended value of 0.01. For a one-dimensional convolution layer, the number of weight parameters is

$$\text{parameters} = \text{in\_channels} \times \text{out\_channels} \times \text{kernel\_size}.$$
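A minimal NumPy sketch of the two activations, using the slope a = 0.01 stated above:

```python
import numpy as np

def relu(x):
    """ReLU: negative inputs are set to zero."""
    return np.maximum(np.asarray(x, dtype=float), 0.0)

def leaky_relu(x, negative_slope=0.01):
    """LeakyReLU: x for x >= 0, negative_slope * x otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, negative_slope * x)
```

Unlike `relu`, `leaky_relu` keeps a small nonzero gradient for negative inputs, so units whose pre-activations are negative can still be updated during training.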
Finally, the Predict module produces the final predicted values. As shown in Fig. 3, it consists of convolution and LeakyReLU operations. In the prediction stage, a convolution replaces the fully connected layer. This reduces the number of parameters while improving the ability to aggregate information. Setting out_channel of the last convolutional layer to 1 yields a single-channel output, which serves as the predicted value for the corresponding time step.
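To illustrate the parameter saving claimed for the Predict module, the sketch below counts the weights of two heads. The first is a convolutional head with out_channel = 1, counted with the in_channels × out_channels × kernel_size rule plus one bias per output channel. The second is a fully connected head that maps the same C × L feature map to L outputs. The channel count C = 16 and kernel size 3 are hypothetical values, not taken from Table 1.

```python
def conv_head_params(channels, kernel_size, out_channels=1, bias=True):
    """Weights of a 1-D convolution head: in_channels x out_channels x kernel_size (+ biases)."""
    return channels * out_channels * kernel_size + (out_channels if bias else 0)

def fc_head_params(channels, length, bias=True):
    """Weights of a fully connected head mapping a flattened C x L map to L outputs."""
    return (channels * length) * length + (length if bias else 0)

C, L, K = 16, 36, 3                    # hypothetical channels, segment length, kernel size
conv_params = conv_head_params(C, K)   # 16 * 1 * 3 + 1 = 49
fc_params = fc_head_params(C, L)       # 576 * 36 + 36 = 20772
```

The size of the convolutional head does not depend on L. This property is what lets the segment length change without redesigning the head.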

Training process of the VMD-ConvGRU model
The ECG prediction algorithm based on VMD-ConvGRU involves various parameters, such as the length of the input data, the convolution kernel parameters in the hidden layers, the number of decomposition layers k, and the parameter initialization. The neural network parameters are continuously updated by learning from the training set until the preset number of iterations is reached.
The algorithm includes the following operations.

Experimental settings
The MIT-BIH database [14] is an internationally recognized standard ECG database produced by the laboratories of Beth Israel Hospital and the Massachusetts Institute of Technology. Each record consists of three files: the header file (.hea), the data file (.dat) and the annotation file (.atr). The header file records the signal name, the number of leads, the lead names, the sampling frequency, the number of samples, and similar information. The data file stores all signal samples. The annotation file contains the experts' diagnostic information on the ECG signal, with a detailed label for each heartbeat. The signals are sampled at 360 Hz, and the database contains 48 two-lead ECG records (mostly leads MLII and V1), each approximately 30 min long. We used the MLII lead and selected for the experiments the 19 records in which normal heartbeats account for more than 95% of all beats. The first 80% of each record (24 min) was used as the training set, and the remaining 20% (6 min) was used as the test set.
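As a check on the split, the sketch below computes the train and test sizes of one record. It assumes the standard MIT-BIH record length of 650,000 samples per lead (about 30.1 min at 360 Hz) and a chronological split at the sample level. The function name is hypothetical.

```python
FS_HZ = 360               # MIT-BIH sampling rate
RECORD_SAMPLES = 650_000  # samples per lead in a standard MIT-BIH record

def split_record(n_samples, train_ratio=0.8):
    """Chronological split: first train_ratio of the samples for training, the rest for testing."""
    n_train = round(n_samples * train_ratio)
    return n_train, n_samples - n_train

n_train, n_test = split_record(RECORD_SAMPLES)
# signal[:n_train] is the training part, signal[n_train:] the test part
train_min, test_min = n_train / FS_HZ / 60, n_test / FS_HZ / 60   # about 24.1 and 6.0 min
```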
When σ = τ = 2 and len = 36 , the parameter settings of the network structure are shown in Table 1.
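The durations quoted in the experiments follow from the 360 Hz sampling rate: n segments of len samples last n × len / 360 s. The helper below is an illustrative sketch, and its name is hypothetical.

```python
FS_HZ = 360  # MIT-BIH sampling rate

def duration_s(n_segments, seg_len, fs=FS_HZ):
    """Duration in seconds of n_segments consecutive segments of seg_len samples."""
    return n_segments * seg_len / fs

# sigma = tau = 2, len = 36: 72 input samples (0.2 s) and 72 predicted samples (0.2 s)
input_s = duration_s(2, 36)
# sigma sweep of the parameter analysis (len = 36): 0.2 s to 1.0 s
sigma_sweep = [duration_s(s, 36) for s in (2, 4, 6, 8, 10)]
# one segment for len = 5, 10, 36, 60: 1/72 s, 1/36 s, 0.1 s, 1/6 s
segment_s = [duration_s(1, n) for n in (5, 10, 36, 60)]
```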
The experiment was run on a computer with the Windows 10 64-bit operating system and an Intel Core i5-8250U CPU with 8 GB of memory and written in Python 3.8.5 using PyTorch 1.6.0.

Evaluation indicators
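The evaluation metrics used in the simulation experiments are the mean absolute error (MAE), mean square error (MSE) and root mean square error (RMSE), defined respectively as

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|,$$

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2},$$

where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $N$ is the number of predicted points.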
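A direct NumPy transcription of the three metrics, as a minimal sketch:

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))

def mse(y, y_hat):
    """Mean square error."""
    return float(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2))

def rmse(y, y_hat):
    """Root mean square error: the square root of the MSE."""
    return float(np.sqrt(mse(y, y_hat)))
```

Because RMSE weights large errors more heavily, RMSE ≥ MAE always holds for the same predictions. A large gap between the two indicates a few large deviations rather than uniformly small ones.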

VMD of the ECG signal
To show the effect of VMD on signal decomposition, we used the VMD tool [15] provided in MATLAB to decompose the first 3 s of record 100 in the MIT-BIH database. The number of decomposition layers k was set to 10, and the other parameters were left at the MATLAB defaults. The decomposition results are shown in Fig. 5. Figure 5a shows the original ECG signal. Figure 5b shows the decomposed components IMFs 1-10, where IMF 1 can be regarded as high-frequency noise and IMF 10 as baseline wander. Figure 5c shows the residual signal that is not captured by the IMFs. A more intuitive 3D representation is shown in Fig. 6. There, the x-axis is time (in seconds), the y-axis is the corresponding amplitude, and the z-axis indexes the residual signal, the IMFs and the original signal. The residual signal is very small relative to the original signal, and the frequency gradually decreases from IMF 1 to IMF 10.
The signal decomposed by VMD can be used to reconstruct the original signal. The relationship can be expressed as

$$f(t) = \sum_{k=1}^{K} u_k(t) + r(t),$$

where $r(t)$ is the residual signal. Removing the high-frequency noise component IMF 1, the baseline-wander component IMF 10 and the residual yields the reconstructed ECG signal $\tilde{f}(t) = \sum_{k=2}^{K-1} u_k(t)$ with $K = 10$. We reconstructed the first 3 s of record 100; the comparison is shown in Fig. 7. In the time domain, the reconstructed ECG signal is smoother than the original signal, while the quality of the original waveform is preserved. The baseline of the signal was also evaluated as its mean value. In Fig. 7, the baseline of the reconstructed ECG is $1.3514 \times 10^{-4}$, which is very close to zero. This result demonstrates that the reconstructed signal effectively removes noise and corrects baseline drift. In subsequent experiments, the reconstructed signal was used as the prediction target.
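The reconstruction and the mean-based baseline check can be sketched in NumPy as follows. `reconstruct` and `baseline` are hypothetical helpers, and the components are assumed to be stored as an array with IMF 1 first.

```python
import numpy as np

def reconstruct(imfs, residual=None, drop=()):
    """Sum the IMFs whose 0-based index is not in `drop`, optionally adding the residual."""
    imfs = np.asarray(imfs, dtype=float)
    kept = [imfs[k] for k in range(imfs.shape[0]) if k not in drop]
    signal = np.sum(kept, axis=0)
    if residual is not None:
        signal = signal + np.asarray(residual, dtype=float)
    return signal

def baseline(signal):
    """Baseline estimated as the mean value of the signal."""
    return float(np.mean(signal))
```

With K = 10, `reconstruct(imfs, drop={0, 9})` (residual omitted) gives the denoised target, and `baseline` of that result should be close to zero. `reconstruct(imfs, residual)` returns the original signal when the residual is defined as the signal minus the sum of the IMFs.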

Parameter analysis
In the parameter analysis experiments, we used example 100 as the training and test data. This record contains 30 min of ECG signals. The first 80% (24 min) was used as the training set, and the remaining 20% (6 min) as the test set. The MSE was used as the loss function for each predicted IMF component. Training used the adaptive moment estimation (Adam) optimizer with a weight decay of 0.0001 and an initial learning rate of 0.01, and the learning rate was reduced when the loss stopped decreasing. Each experiment was trained for 50 epochs. The parameters of the EF-ConvGRU model include the input time length σ, the prediction time length τ and the input length at each moment len. These three parameters are discussed separately below.
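In PyTorch, this schedule is usually obtained with `torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)` combined with `torch.optim.lr_scheduler.ReduceLROnPlateau`. To show the rule itself without a framework dependency, the sketch below re-implements a simplified plateau schedule in plain Python. The class name, the reduction factor, the patience and the minimum learning rate are illustrative assumptions, since the paper does not state them.

```python
class PlateauLR:
    """Simplified 'reduce learning rate on plateau' rule (illustrative)."""

    def __init__(self, lr=0.01, factor=0.1, patience=2, min_lr=1e-6):
        self.lr, self.factor, self.patience, self.min_lr = lr, factor, patience, min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Call once per epoch with the monitored loss; returns the learning rate for the next epoch."""
        if loss < self.best:
            self.best, self.bad_epochs = loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:     # loss has stalled for too long
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr
```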
1) Input time length. To explore the effect of σ, we fixed τ = 2 and len = 36 and set σ = 2, 4, 6, 8 and 10; that is, the inputs were 0.2 s, 0.4 s, 0.6 s, 0.8 s and 1 s long, respectively. The change in loss for the different input durations is shown in Fig. 8. For input durations of 0.2 s, 0.4 s and 0.8 s, the convergence trend is relatively stable. With an input duration of 0.8 s, VMD-ConvGRU converges earliest, at epoch 25. With 0.2 s, it converges at epoch 33, earlier than with 0.4 s. The models with the other input durations converge only after 50 epochs. As the input duration increases, the model performs better after convergence, although the converged loss values for 0.2 s and 0.4 s are very close. In addition, with longer input durations such as 0.6 s or 1 s, the performance of the model fluctuates significantly in the initial training stage. This can be attributed to the larger amount of information the model receives at once, which involves greater signal variation than shorter inputs do. Consequently, more adjustment is needed to extract the relevant information effectively, which ultimately leads to improved predictive outcomes.
Fig. 8 Changes in the loss of the VMD-ConvGRU algorithm for example 100 for different input durations
To verify the prediction performance of the model with different input durations, we tested the trained models; the results are shown in Table 2. The best performance is obtained with σ = 10, and it is similar to the performance on the training set. Hence, more input segments yield better performance, but at the cost of additional computational overhead. Figure 9 compares the MSE and MAE for the different input durations.
2) Prediction time length. To explore the impact of τ, we fixed σ = 10 and len = 36 and set τ = 1, 2, 3, 4 and 5; that is, the outputs were 0.1 s, 0.2 s, 0.3 s, 0.4 s and 0.5 s long, respectively. The change in loss for the different output lengths is shown in Fig. 10. The output duration has a strong impact on the loss: the shorter the prediction horizon, the better the training result.
To verify the prediction performance, we tested the trained models; the results are shown in Table 3. As the output duration increases, the prediction performance of the model decreases. A possible reason is that information is lost as the data are propagated over a longer output sequence, which reduces the prediction accuracy. In addition, increasing τ reduces the inference speed. Figure 11 shows that as the output duration increases, the MSE gradually increases and the fitting ability of the model gradually decreases.
3) Input length at each moment. As shown above, a larger σ gives better predictions, whereas a larger τ gives worse ones. With σ = τ = 2, the model achieves both accurate predictions and good inference performance. To explore the impact of the input length at each moment len, we fixed σ = τ = 2 and set len = 5, 10, 36 and 60, that is, 1/72 s, 1/36 s, 0.1 s and 1/6 s. The change in loss is shown in Fig. 12. As len increases, the training result worsens, and it is worst when len = 60.
The performance on the test set is shown in Table 4. As len increases, the prediction performance of the model drops rapidly. When len increases further, the model can no longer predict normally.

Performance comparison with related methods
To verify the prediction performance of VMD-ConvGRU, we compared it with the methods proposed in three related papers; the results are shown in Table 5. With σ = τ = 2 and len = 5, the RMSE (0.0117) and MAE (0.0051) of the proposed VMD-ConvGRU model are both smaller than those of the VMD-NN [8], PSR-NN [9] and TS fuzzy [6] models. With len = 36, 0.2 s can be predicted at a time. This significantly extends the prediction horizon, while the prediction performance decreases only slightly; in particular, VMD-ConvGRU retains a clear advantage in MAE. Notably, the three comparison methods can predict only one point (1/360 s) at a time, whereas VMD-ConvGRU predicts at least 1/72 s. VMD-ConvGRU thus extends the prediction duration while still improving the prediction accuracy. In addition, the results of the three comparison methods were all obtained on excerpts of example 100, so not all of the data were used for training and testing. In contrast, our method uses the first 24 min for training and the last 6 min for testing. Furthermore, we selected example 113 for comparison with the TS fuzzy model, and our model again performs better, as shown in Table 5. These results indicate that our model generalizes better.
Figure 13a shows the prediction result for 1 s of example 100. The predictions of the proposed VMD-ConvGRU model fit the reconstructed signal well. Figures 13b-d show the predictions of the Q, R and S waves, respectively, for the first heartbeat in Fig. 13a. The trends of all three waveforms are well predicted, and most of the predicted curves overlap the original curves. Figure 14 shows the prediction error. The error remains within a small range, and most of the differences are within 0.01.
As listed in Table 6, the best RMSE and MAE values on the 18 selected MIT-BIH examples are 0.0093 and 0.0067, respectively. The corresponding values on the European ST-T database (EDB) are 0.0088 and 0.0057. To evaluate the overall performance of VMD-ConvGRU, we used two statistics: the mean and the variance. On the 18 EDB examples, the mean and variance of the RMSE and MAE are 0.0095 and 0.005, respectively. In general, the prediction performance and stability of VMD-ConvGRU are excellent, and its generalizability is verified.
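These summary statistics can be computed per metric with the Python standard library. The paper does not state whether the population or the sample variance is used. The sketch assumes the population variance (`statistics.pvariance`); `statistics.variance` gives the sample version. The function name is hypothetical.

```python
from statistics import mean, pvariance

def summarize(per_record_scores):
    """Mean and population variance of one metric (e.g. RMSE) across records."""
    scores = list(per_record_scores)
    return mean(scores), pvariance(scores)
```

In use, the function would be called once per metric, for example `summarize(rmse_per_record)` and `summarize(mae_per_record)`, where each list holds one score per test record.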

Fig. 2 Structure of the RNN sequence-to-sequence model

1. Determine the number of decomposition layers k. For a fair comparison with the related algorithms, set k = 10.
2. Determine the length of the input data and the length to be predicted.
3. Set the EF-ConvGRU convolutional layer parameters, such as kernel_size, stride, padding, in_channel and out_channel.
4. Determine batch_size and epoch.
5. Use the mean square error (MSE) as the loss function of each prediction, and combine the per-component losses into the final loss.
6. Initialize the convolution kernels and bias values.
7. Decompose the signal with VMD and remove the decomposed IMF 1, IMF 10 and residual.
8. Train the model, using dense sampling for data augmentation. In each round, the ECG signals in the training set are reshuffled.
9. Use the loss function to train the convolution kernel parameters, and stop when the set number of epochs is reached.
The pseudocode of the ECG prediction algorithm based on VMD-ConvGRU is shown in Algorithm 1.
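Steps 5 and 8 can be sketched in NumPy as follows. The sketch makes two assumptions: the final loss is the plain sum of the per-IMF MSE losses, and the densely sampled training windows are reshuffled with a new random permutation each epoch. Both helper names are hypothetical.

```python
import numpy as np

def total_loss(pred_components, target_components):
    """Sum of per-IMF MSE losses (assumed combination rule).

    Both arguments: arrays of shape (n_components, ...) with matching shapes.
    """
    pred = np.asarray(pred_components, dtype=float)
    target = np.asarray(target_components, dtype=float)
    axes = tuple(range(1, pred.ndim))             # average within each component
    return float(np.sum(np.mean((pred - target) ** 2, axis=axes)))

def epoch_order(n_windows, epoch, seed=0):
    """A fresh random order of the training windows for each epoch."""
    return np.random.default_rng(seed + epoch).permutation(n_windows)
```

Seeding each epoch's generator with `seed + epoch` keeps the shuffling reproducible across runs while still giving a different order in every epoch.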

Fig. 5 VMD of the first 3 s of the original signal of example 100

Fig. 6 Fig. 7
Fig. 6 3D display for the first 3 s of example 100 after VMD

Fig. 9 Table 3 Fig. 11
Fig. 9 MSE and MAE line charts for VMD-ConvGRU on example 100 with different input durations

Fig. 12 Loss change line chart for VMD-ConvGRU on example 100 with different len values

Fig. 14 Prediction bias of VMD-ConvGRU on 1 s of example 100

Table 1
Parameter settings of the encoder and forecaster modules

Table 2
Performance comparison of VMD-ConvGRU on example 100 with different σ values

Table 4
Performance comparison of VMD-ConvGRU on example 100 with different len values

Table 5
Performance comparison of different algorithms on example 100 (VMD-NN, PSR-NN, TS fuzzy model and VMD-ConvGRU) and example 113 (TS fuzzy model and VMD-ConvGRU)

Table 6
Generalizability results of VMD-ConvGRU on 36 examples from MIT-BIH and EDB