4.1 Multi-element LSTM model
The LSTM (long short-term memory) network is an improvement on the RNN (recurrent neural network). The LSTM architecture contains a component that controls the stored state, which mitigates the vanishing gradient problem encountered by RNNs [31]. In this paper, a supervised learning approach is adopted, which does not require manual construction of time series features. The deep network fits the time series curve and captures long-term temporal dependencies for feature learning and prediction. The principle of LSTM is shown in Fig. 4.
The forget gate \(f_{t}\) in formula (2) controls how much of the previous cell state is kept; \(f_{t} = 1\) means that the previous memory is completely retained. After the noise data are input, whether they are stored in the cell depends on the input gate, and the updated cell state \(C_{t}\) is given by formula (3).
$$f_{t} = \sigma (W_{f} [h_{t - 1} ,n_{t} ] + b_{f} ),\quad f_{t} \in [0,1]$$
(2)
$$C_{t} = f_{t} *C_{t - 1} + \sigma (W_{i} [h_{t - 1} ,n_{t} ] + b_{i} )*\tanh (W_{C} [h_{t - 1} ,n_{t} ] + b_{C} )$$
(3)
\(n_{t}\) is the input noise at the current time step, and \(h_{t - 1}\) is the output of the previous time step, which serves as the hidden state input to the current step. Formula (3) gives the new cell state after useless information has been discarded and some new information has been retained, where \(i_{t} = \sigma (W_{i} [h_{t - 1} ,n_{t} ] + b_{i} )\) is the probability that new information is retained. The predicted noise depends on the output gate:
$$o_{t} = \sigma (W_{o} [h_{t - 1} ,n_{t} ] + b_{o} )$$
(4)
$$Y_{t} = h_{t} = o_{t} *\tanh (C_{t} )$$
(5)
\(o_{t}\) is the output probability. Multiplying \(o_{t}\) by the hyperbolic tangent \(\tanh (C_{t} )\) filters the cell state, and the output \(Y_{t}\) is passed on as the hidden state for the next time step. In the above expressions, \(W_{f} ,W_{i} ,W_{C} ,W_{o}\) are the weight matrices and \(b_{f} ,b_{i} ,b_{C} ,b_{o}\) are the bias vectors.
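Formulas (2) to (5) can be traced in a few lines of NumPy. The following is a minimal sketch of a single time step, not the implementation used in this paper; the function name lstm_step, the dictionary layout of the weights and the toy dimensions are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(n_t, h_prev, c_prev, W, b):
    """One LSTM time step following formulas (2)-(5).

    W maps a gate name to a matrix of shape (hidden, hidden + inputs);
    b maps a gate name to a bias vector of shape (hidden,).
    (This layout is an assumption made for the sketch.)
    """
    z = np.concatenate([h_prev, n_t])        # [h_{t-1}, n_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, formula (2)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # cell state, formula (3)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, formula (4)
    h_t = o_t * np.tanh(c_t)                 # output / hidden state, formula (5)
    return h_t, c_t

# Toy dimensions and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
hidden, inputs = 3, 4
W = {g: rng.normal(scale=0.5, size=(hidden, hidden + inputs)) for g in "fiCo"}
b = {g: np.zeros(hidden) for g in "fiCo"}
h_t, c_t = lstm_step(rng.normal(size=inputs), np.zeros(hidden), np.zeros(hidden), W, b)
```

Because \(o_{t} \in (0,1)\) and \(\tanh\) is bounded, every component of the output lies strictly between -1 and 1, which is why the noise series is normally scaled before training.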
The multivariate model combines several sources of information into one multi-dimensional sample and recasts prediction as a supervised learning problem with multiple inputs and a single output. The first hidden layer has 32 neurons, and the output layer has 1 neuron, which predicts the noise. The input is four-dimensional: wind speed, noise of the neighboring station selected by wind direction, traffic flow, and noise of the prediction station. The output is the predicted noise at the prediction station, with 2 prediction steps at a time interval of 10 min. The model was trained for 100 epochs with a batch size of 128, and training and test losses were tracked during training by setting the validation_data parameter of the fit() function.
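The supervised framing and the network described above can be sketched as follows. This is a hedged illustration rather than the code used in this paper: it assumes the Keras API (suggested by the validation_data argument of fit()), and the helper names make_supervised and build_and_fit, the number of lagged steps, the loss function and the optimizer are all assumptions. The target is taken as the noise value 2 steps (20 min) after each input window, which is one possible reading of "2 prediction steps".

```python
import numpy as np

def make_supervised(features, target, n_lags, horizon):
    """Slice a multivariate series into windows for supervised learning.

    Returns X of shape (samples, n_lags, n_features) and y of shape
    (samples,), where each y is the target value `horizon` steps after
    the last observation of the corresponding window.
    """
    X, y = [], []
    for end in range(n_lags, len(features) - horizon + 1):
        X.append(features[end - n_lags:end])
        y.append(target[end + horizon - 1])
    return np.asarray(X), np.asarray(y)

def build_and_fit(X_train, y_train, X_test, y_test):
    # Imported lazily so the windowing helper works without TensorFlow.
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=X_train.shape[1:]),
        keras.layers.LSTM(32),   # 32 neurons in the first hidden layer
        keras.layers.Dense(1),   # single output neuron: predicted noise
    ])
    # Loss and optimizer are assumptions; the text does not specify them.
    model.compile(loss="mae", optimizer="adam")
    history = model.fit(
        X_train, y_train,
        epochs=100, batch_size=128,
        validation_data=(X_test, y_test),  # tracks the test loss per epoch
        verbose=0, shuffle=False,
    )
    return model, history.history["loss"], history.history["val_loss"]

# Placeholder data: 20 time steps of 4 features (wind speed, neighbor noise,
# traffic flow, station noise). The values are not measurements.
features = np.arange(80, dtype=float).reshape(20, 4)
target = features[:, 3]
X, y = make_supervised(features, target, n_lags=3, horizon=2)
```

With 20 time steps, 3 lagged steps and a horizon of 2, the helper yields 16 samples of shape (3, 4); the two loss curves returned by build_and_fit correspond to the training and test losses mentioned above.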
Multi-factor features were extracted with the LSTM model for noise prediction. The prediction error was large during periods of abrupt change: in January, the noise dropped by about 4.5 dB, and the prediction error peaked at about 2 dB. Therefore, the Prophet model was introduced to fuse multi-station information and improve the prediction accuracy.
4.2 Prophet model based on spatial multi-station regression
The Prophet prediction model is well suited to periodic data with outliers and trend changes, and the noise of chemical parks shows strong micro-abruptness and macro-regularity. Therefore, the Prophet model is introduced for noise prediction in this paper. The Prophet model decomposes the time series according to the following formula:
$$P\left( t \right) = g\left( t \right) + s\left( t \right) + h\left( t \right) + \varepsilon \left( t \right)$$
(6)
In formula (6), \(g\left( t \right)\) represents the noise trend term, which is mainly used to fit aperiodic changes in the time series. We use a trend term model based on piecewise linear functions:
$$g\left( t \right) = (k + a(t)^{T} \delta )t + (m + a(t)^{T} \gamma )$$
(7)
In formula (7), \(m\) is the offset, \(k\) is the growth rate, and \(\delta\) is the vector of changes in the growth rate. The indicator function is \(a(t) = (a_{1} (t), \ldots ,a_{S} (t))^{T} \in \{ 0,1\}^{S}\), with \(a_{j} (t) = \begin{cases} 1, & \text{if } t \ge S_{j} \\ 0, & \text{otherwise} \end{cases}\), and \(\gamma = (\gamma_{1} , \ldots ,\gamma_{S} )^{T}\) with \(\gamma_{j} = - S_{j} \delta_{j}\), where \(S\) is the number of change points and \(S_{j}\) is the time of the \(j\)-th change point.
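Formula (7) can be evaluated directly. The following minimal sketch is not Prophet's internal code; the function name and the toy parameter values are illustrative assumptions. It shows that the offset adjustment \(\gamma_{j} = - S_{j} \delta_{j}\) keeps the trend continuous at each change point.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, delta, changepoints):
    """Evaluate g(t) = (k + a(t)^T delta) * t + (m + a(t)^T gamma)."""
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    # a[i, j] = 1 if t[i] >= S_j, else 0
    a = (t[:, None] >= s[None, :]).astype(float)
    gamma = -s * delta                     # gamma_j = -S_j * delta_j
    return (k + a @ delta) * t + (m + a @ gamma)

# Toy example: slope 1 before t = 2, slope 2 afterwards.
g = piecewise_linear_trend([0, 1, 2, 3, 4], k=1.0, m=0.0,
                           delta=[1.0], changepoints=[2.0])
```

In this toy case the slope changes from 1 to 2 at \(t = 2\), and the two segments meet at \(g(2) = 2\) without a jump.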
\(s\left( t \right)\) is a periodic term modeled by Fourier series:
$$s\left( t \right) = \sum\limits_{n = 1}^{N} {\left[ {a_{n} \cos \left( {\frac{2\pi nt}{P}} \right) + b_{n} \sin \left( {\frac{2\pi nt}{P}} \right)} \right]}$$
(8)
In formula (8), \(t\) is time, \(N\) is the number of Fourier terms used in the model (so that \(2N\) coefficients \(a_{n}\) and \(b_{n}\) are fitted), and \(P\) is the period of the time series; with \(t\) measured in days, \(P = 7\) corresponds to a weekly period.
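The periodic term in formula (8) is a linear model over \(2N\) Fourier features. The sketch below is only an illustration of the formula; the function names and the coefficient values are assumptions.

```python
import numpy as np

def fourier_features(t, period, n_terms):
    """Return a (len(t), 2 * n_terms) matrix: cosine columns, then sine columns."""
    t = np.asarray(t, dtype=float)
    n = np.arange(1, n_terms + 1)
    angles = 2.0 * np.pi * t[:, None] * n[None, :] / period
    return np.hstack([np.cos(angles), np.sin(angles)])

def seasonal_component(t, period, a, b):
    """Evaluate s(t) from formula (8) with coefficients a_n and b_n."""
    features = fourier_features(t, period, len(a))
    return features @ np.concatenate([a, b])

# Illustrative coefficients for N = 3 terms and a weekly period (t in days).
a = np.array([1.0, 0.5, 0.0])
b = np.array([0.0, 0.2, 0.1])
t = np.arange(0.0, 14.0, 0.5)
s = seasonal_component(t, period=7.0, a=a, b=b)
```

A larger \(N\) lets the periodic term follow faster within-period variation, at the cost of more coefficients to fit.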
\(h\left( t \right)\) is the holiday term, which models the influence of each holiday at different times as an independent component. \(\varepsilon \left( t \right)\) is the error term, which represents random and unpredictable fluctuations. The Prophet algorithm sums the trend, seasonal and holiday terms to obtain the predicted value of the time series.
In this paper, the add_regressor() method was used to add data from multiple stations as regression variables for fitting. First, the noise time series of each of the other stations was added to Prophet in turn for prediction. Then, the stations were sorted by the RMSE of their prediction results and added to the Prophet model in that order to improve the prediction accuracy. Although the Prophet model is flexible, it cannot account for the influence of multi-dimensional factors. Therefore, accurate prediction requires a more complete prediction scheme.
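The station-ranking procedure can be sketched as follows, assuming the Python prophet package and a training DataFrame with the columns ds and y plus one column per neighboring station. The helper names, the station names and the RMSE values are hypothetical placeholders, not results from this paper.

```python
def rank_stations(rmse_by_station):
    """Return station names ordered from smallest to largest RMSE."""
    return sorted(rmse_by_station, key=rmse_by_station.get)

def fit_with_regressors(train_df, stations):
    """Fit Prophet with the given station columns as extra regressors.

    train_df must contain 'ds' (timestamp), 'y' (noise at the prediction
    station) and one column per name in `stations`.
    """
    # Imported lazily so rank_stations works without prophet installed.
    from prophet import Prophet

    model = Prophet()
    for name in stations:
        model.add_regressor(name)
    model.fit(train_df)
    return model

# Hypothetical RMSE values from the single-regressor runs (placeholders).
rmse_by_station = {"station_B": 1.8, "station_A": 1.2, "station_C": 2.5}
order = rank_stations(rmse_by_station)
```

Stations can then be added cumulatively, for example by calling fit_with_regressors(train_df, order[:n]) for n = 1, 2, ... and keeping the set with the lowest validation RMSE. The frame passed to predict() must contain the same regressor columns.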
4.3 Multi-PL model based on Prophet and LSTM combination
Based on the characteristics of the Prophet and LSTM models, we propose the Multi-PL model, which makes up for the limitations of a single model and uses both the park information and the advantages of the two models to achieve higher-precision noise prediction.
First, the noise feature sequence of adjacent stations based on wind direction was constructed: the wind direction was classified into direction labels with time series features, the noise value of the corresponding station during each period was extracted according to the label, and the extracted noise values were stitched into a new time series feature, which is the noise feature in Fig. 5. A multi-element LSTM model was then constructed by combining this feature with the time series features of traffic flow and wind speed. This work is based on the multivariate data set \({\text{Train}}\;{\text{Set}}\;1\). The data of each station in the multi-station data set \({\text{Train}}\;{\text{Set}}\;2\) were used separately in the Prophet model, the stations were sorted by RMSE, and they were added to the Prophet model in order of increasing RMSE.
The cftool (Curve Fitting Tool) toolbox in MATLAB was used to fit the prediction results of the two models to the real noise values in the training set, which gives formula (9) relating the actual noise value to the model prediction values:
$${\text{Train}}\_{\text{true}} = A*{\text{LSTM}} + B*{\text{Prophet}} + C$$
(9)
Obtaining the relationship between the actual and predicted values by fitting brings the combined result closer to the true value than linear weighting of the two models' predictions. The fitted constant term also provides constant compensation, which prevents a prediction from one model that is too high or too low from biasing the forecast.
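The coefficients in formula (9) can also be estimated by ordinary least squares. The sketch below uses NumPy's lstsq in place of MATLAB's cftool, so it illustrates the same linear fit rather than reproducing the original workflow; the function names and the synthetic data are assumptions.

```python
import numpy as np

def fit_combination(lstm_pred, prophet_pred, train_true):
    """Least-squares estimate of (A, B, C) in formula (9)."""
    design = np.column_stack([lstm_pred, prophet_pred, np.ones(len(lstm_pred))])
    coeffs, *_ = np.linalg.lstsq(design, train_true, rcond=None)
    return coeffs

def combine(lstm_pred, prophet_pred, coeffs):
    """Apply the fitted relationship to new predictions."""
    A, B, C = coeffs
    return A * np.asarray(lstm_pred) + B * np.asarray(prophet_pred) + C

# Synthetic stand-ins for the two models' training-set predictions.
rng = np.random.default_rng(1)
lstm_pred = 60.0 + rng.normal(size=200)
prophet_pred = 60.0 + rng.normal(size=200)
train_true = 0.6 * lstm_pred + 0.3 * prophet_pred + 2.0
coeffs = fit_combination(lstm_pred, prophet_pred, train_true)
```

The column of ones in the design matrix corresponds to the constant \(C\), which absorbs any systematic offset shared by the two predictions.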