Noise prediction of chemical industry park based on multi-station Prophet and multivariate LSTM fitting model

Zeng, Qingtian; Liang, Yu; Chen, Geng; Duan, Hua; Li, Chunguo

doi:10.1186/s13634-021-00815-6

Research
Open access
Published: 29 October 2021

Qingtian Zeng¹,
Yu Liang¹,
Geng Chen ORCID: orcid.org/0000-0001-9432-0563¹,
Hua Duan² &
…
Chunguo Li³

EURASIP Journal on Advances in Signal Processing volume 2021, Article number: 106 (2021)


Abstract

As chemical industry parks become digital and intelligent, the environmental data collected in them grow increasingly rich. Mining the regularities in these data and predicting future conditions can provide data support for industrial safety and workers’ health in the park, and therefore has high application value. This paper takes the noise data of a chemical industry park as the main research object. It applies the 3σ criterion to the zero values in the noise data, and builds an LSTM model that integrates multivariate information: noise features classified by wind direction, wind speed and vehicle flow. A Prophet model that integrates multi-station noise information is also adopted, and the Multi-PL model is constructed by fitting the outputs of these two models to predict the noise. Comparative experiments are designed and implemented against a Kalman filter, a BP neural network, Prophet, LSTM, and a Prophet + LSTM weighted combination prediction model. R² is used to evaluate the fitting effect of the single models in Multi-PL, and RMSE and MAE are used to evaluate the prediction effect of Multi-PL on the noise time series. The experimental results show that, in the multi-station ordered Prophet method, the RMSE and MAE of the data processed by the 3σ criterion are reduced by 32.2% and 23.3%, respectively. Compared with the comparison models, the Multi-PL prediction method is more stable and accurate. The Multi-PL method proposed in this paper can therefore provide a new idea for noise prediction in digital chemical parks.

1 Methods/experimental

To improve the accuracy of environmental noise prediction in a chemical industry park, this paper proposes a multivariate, multi-station model (Multi-PL) based on LSTM and Prophet. First, according to the periodicity of the environmental data in the park, the data are organized into a multivariate data set and a multi-station data set. Second, the structure and implementation of the model are described in detail. Finally, experiments compare the prediction accuracy under different training set proportions, on different data sets and with different models. Experiments with and without the 3σ criterion compare each single model on the different data sets, and the proposed model is compared with other models. The results in Sect. 5.2 show that applying the 3σ criterion and using multivariate and multi-station data improve the prediction performance of the single models. In addition, the results in Sect. 5.3 show that Multi-PL outperforms the single models, the traditional prediction methods and the LSTM + Prophet linear combination model.

2 Introduction

With the spread of 5G high-speed transmission technology, chemical industrial complexes are entering the era of the Internet of Things (IoT) through sensors [1]. While gathering factories in a chemical park brings economic benefits, pollution problems are gradually being exposed. Exhaust gas and wastewater can be recycled and reused in an ecological industrial park, but noise, a threat that is often overlooked, continues to affect mental and hearing health. Factory noise may cause mild or moderate noise-induced deafness [2]; noise can also cause headaches, insomnia, unresponsiveness, hearing loss and other symptoms [3,4,5,6]. The chemical park is surrounded by farmland and villages, so noise also has a negative impact on villagers’ lives, animal breeding and the natural ecology [7]. How to predict noise effectively and uncover its regularities, so as to reduce its impact on daily life and physical health, is a problem that needs to be solved.

IoT data contain a great deal of useful information; for example, satellite Industrial Internet of Things (IIoT) data can be used to solve service quality problems [8]. Noise prediction is restricted by many conditions. With the development of artificial intelligence technology, existing techniques can address trend learning, big data classification and trend prediction by introducing environmental factors [9]. Information transmission in the IIoT is also limited by spectrum resources, so data loss is common [10]. Uncovering the regularities of noise and predicting future noise levels in order to mitigate noise hazards is an important research topic [11].

Noise prediction research has received increasing attention. Ref. [11] proposed a gradient boosting model to predict noise, which combines multiple characteristics to analyze areas with severe noise exposure and performs well under specific frequency sensors. Ref. [12] proposed a two-layer long short-term memory (LSTM) network to predict environmental noise from a large amount of data; it reflects the change of noise level within a day, but considers only the temporal regularity of noise. Ref. [13] shows that the LSTM model is better than the traditional ARIMA time series forecasting model. In Ref. [14], an LSTM model is used for airport noise prediction, and aircraft type metadata, trajectory information and weather data are integrated into the model, which yields higher prediction accuracy but does not consider the spatio-temporal characteristics of noise. Ref. [15] proposed an integrated model of airport noise prediction based on space fitting and a BP neural network, which integrates temporal and spatial characteristics to improve the accuracy and fault tolerance of prediction; however, the application area of this model is limited and it is not flexible enough. Ref. [16] established a feature-weighted support vector regression model (FWSVR) based on time series similarity, which has generalization ability. Ref. [17] simulates the noise of a typical road network based on an existing traffic flow model. These two methods are limited to univariate prediction and lack information integrity. Ref. [18] uses an improved Federal Highway Administration (FHWA) model to predict the noise level; this method integrates multivariate information, but the information is incomplete in practical application.

Environmental noise prediction still faces the following challenges. Noise is subject to superposition and abrupt change, so how can the noise law of the park be captured? How can the influence of sparse zeros and outliers caused by sensor faults be reduced without distorting the noise law? Beyond noise prediction, Prophet, the Stackelberg model and the extended Kalman filter have also been used by some researchers to achieve good results [19,20,21]. However, a single forecasting method cannot capture the distribution of complex time series patterns. More and more researchers use hybrid forecasting models to capture complex time series patterns and obtain better forecast accuracy and performance [22]. There are three types of hybrid models for time series prediction. Hybrid models based on ARMA and machine learning [23, 24]: Ref. [23] combined ARMA, PSO-SVM and a clustering method for wind power generation prediction, and Ref. [24] used the combined EMD-GM-ARMA model to predict the coal mine safety production situation. Hybrid models based on ARIMA and machine learning [25,26,27]: Ref. [25] used a mixed SSA-ARIMA-ANN model to predict daily rainfall, Ref. [26] used a combined ARIMA and ANN model to predict daily radiation, and Ref. [27] used a mixed ARIMA and SVM model to predict corn futures prices. Hybrid models based on machine learning [28,29,30]: Ref. [28] uses CNN and AI-tuned SVM for power consumption prediction, Ref. [29] uses a CNN-LSTM hybrid model for price sequence prediction, and Ref. [30] uses an LSTM-RNN combined model for low-traffic flow forecasting.

In the above literature, the mixed models achieve better prediction accuracy than the single models, so a mixed model is a key approach to the time series prediction of park noise. The above literature focuses mainly on road traffic, airport and urban environmental noise, and ignores the harm of noise in chemical parks. Motivated by these studies, this paper studies noise prediction in a chemical industry park from the perspective of a mixed model, filling a gap in this research direction.

Based on the existing sensor distribution and traffic data in the chemical park, this paper builds a scene model suited to the distribution characteristics of the park, constructs a multivariate noise data set and a multi-station data set according to the scene, and introduces the 3σ criterion to handle the zero values of noise in order to improve the prediction accuracy. A Multi-PL model based on the LSTM and Prophet models is proposed. Multivariate features such as wind speed, vehicle flow and noise data classified by wind direction are used in the multivariate LSTM model to improve the prediction accuracy. The multi-station noise data set is used as additional regression variables for the Prophet model. Fitting the outputs of these two models yields the Multi-PL prediction model, which has higher accuracy.

The rest of this article is structured as follows. Section 3 introduces the system model, the data set and its preprocessing. Section 4 introduces the principle and construction of the Multi-PL model. Section 5 presents and evaluates the experimental results of the trained models in detail. Section 6 gives the summary and outlook.

3 System model and data set

3.1 System model

The research scene of this paper is an engineering plastics industrial park in Shandong Province, China. Based on the original smart chemical industry park, noise monitoring data are obtained through sensors. The collected data are accurate and effective, which provides an effective data basis for noise prediction.

The park covers an area of 8.97 km² and is equipped with 12 air monitoring stations (Station 11 provides no data due to a failure) and 8 vehicle gate monitoring stations. This paper analyzes the data of monitoring station No. 10 and the gate marked in Fig. 1a. There are three main sources of noise in the park:

1. There are a large number of vehicles in the park for the transportation, loading and unloading of chemical raw materials. The volume of vehicles will affect the noise level.
2. Chemical plants generally operate 24 h a day, and the impact of noise is not only periodic but also persistent.
3. Natural sounds, such as wind, also affect the overall noise level. Different wind directions will bring different regional sound effects.

According to Fig. 1b, noise affects the hearing health of workers in the park, reduces the growth rate of crops, and causes residents to be irritable and tired. Conversely, hearing loss leads to decreased work efficiency, and residents' behaviors affect the operation of the park.

In the face of many problems in the scene, noise prediction and risk identification can assist the park in planning the operation cycle and reduce the operation of noise source equipment during periods of high noise to avoid the occurrence of the above situations.

3.2 Data set construction

Based on the system model, we constructed the park noise prediction data set as shown in Fig. 2. Part A represents the noise data and natural environment information monitored by the air monitoring station, and part B represents the vehicle information recorded by the gates. The information is uploaded to the gateway and stored in the park database server.

We carry out preprocessing by reading the data in the server. In this paper, all data are constructed into two sub-data sets according to requirements: multivariate data set and multi-station data set.

3.2.1 Data set preprocessing

The data sets used in this paper come from the scenario in Sect. 3.1 and span from 14:00 on August 22, 2020 to 01:00 on February 2, 2021. As shown in Fig. 2, data preprocessing mainly includes the following three tasks:

Step 1: Data cleaning. Noise is prone to abrupt changes, and irregular 0 dB values in the data strongly affect the prediction accuracy. The 3σ criterion is introduced to handle these outlying zero values. Sparse missing data are filled in by KNN (nearest-neighbor) interpolation.

Step 2: Data screening. The original noise data have a 30 s sampling interval, so one noise sensor yields 470,760 records, which is too dense. Resampling the experimental data at 10-min intervals accelerates the training process.

Step 3: Traffic data parsing. All vehicle records in the park are labeled by entry or exit status and counted at 10-min intervals.
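Steps 2 and 3 can be illustrated with a minimal sketch. The source does not state how the 30 s samples are aggregated into 10-min values, so the arithmetic mean used here is an assumption, and the function names `resample_mean` and `count_per_bin` are hypothetical.

```python
from statistics import mean


def resample_mean(samples, step_s=30, bin_s=600):
    """Average fixed-interval samples into coarser bins.

    Assumption: each 10-min value is the arithmetic mean of its 30 s samples.
    A trailing partial bin is dropped.
    """
    per_bin = bin_s // step_s  # 20 samples per 10-min bin
    n_bins = len(samples) // per_bin
    return [mean(samples[i * per_bin:(i + 1) * per_bin]) for i in range(n_bins)]


def count_per_bin(event_times_s, bin_s=600):
    """Count events (e.g. vehicle entries) per bin.

    event_times_s holds event times in seconds measured from the series start.
    """
    if not event_times_s:
        return []
    counts = [0] * (int(max(event_times_s)) // bin_s + 1)
    for t in event_times_s:
        counts[int(t) // bin_s] += 1
    return counts


noise_10min = resample_mean([60.0] * 20 + [62.0] * 20 + [1.0] * 5)
entries_10min = count_per_bin([0, 10, 599, 600, 1900])
```

Entry and exit records would be counted separately by calling `count_per_bin` once per status label.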

After the data set preprocessing is completed, we construct sub-data set and verify the correlation between noise data and different variables, laying a foundation for the subsequent prediction work.

3.2.2 Multivariate data set and multi-site data set

The multivariate data set includes vehicle flow, the noise feature of adjacent stations based on wind direction classification (its construction is described in Sect. 4.3), wind speed and noise. The multi-station data set contains noise data from 11 monitoring stations.

The noise data and natural environment data come from part A of Fig. 2 and include temperature, wind speed, wind direction, light, noise and PM2.5. The traffic flow data come from part B of Fig. 2. In Fig. 3b, c, the X-axis is the time interval index (in days). In Fig. 3b, the Y-axis shows the noise decibel value and the wind speed; the blue and red curves represent the noise value and the wind speed, respectively. In Fig. 3c, the Y-axis shows the noise decibel value and the traffic flow count; the blue, red and green curves represent the number of vehicles entering the park, the number leaving the park and the decibel level, respectively. To analyze the correlation of representative data from the air monitoring stations, the Pearson correlation coefficient ρ is introduced as follows:

$$\left\{ \begin{gathered} \rho_{NW} = \frac{{{\text{cov}} (N^{(T,Y)} ,W^{(T,Y)} )}}{{\sigma_{{N^{(T,Y)} }} \sigma_{{W^{(T,Y)} }} }} \hfill \\ \rho_{NN} = \frac{{{\text{cov}} (N^{(T,Y)} ,N^{{(T,Y^{\prime})}} )}}{{\sigma_{{N^{(T,Y)} }} \sigma_{{N^{{(T,Y^{\prime})}} }} }} \hfill \\ \end{gathered} \right.$$

(1)

where ${\text{cov}} ( \cdot )$ is the covariance operator and σ is the standard deviation. $\rho_{NW}$ is the correlation coefficient between noise and wind speed at the same station at the same time, where $N^{(T,Y)}$ and $W^{(T,Y)}$ are the noise and wind speed values of station $Y$ at time $T$, respectively. $\rho_{NN}$ is the correlation coefficient between the noise values of different stations at the same time, where $N^{(T,Y)}$ and $N^{{(T,Y^{\prime})}}$ are the noise of stations $Y$ and $Y^{\prime}$ at time $T$, respectively. The following conclusions are drawn from Fig. 3:

1. Correlation of data. According to Fig. 3a, the correlation coefficient between noise and wind speed is 0.48, which makes wind speed the main influencing factor in the existing information. According to Fig. 3d, the noise data of different stations are correlated.
2. Similarity of data. According to Fig. 3a, the fluctuation trends of noise and wind speed are similar. Wind speed information must therefore be included to predict noise more accurately.
3. Periodicity of data. Traffic flow and noise level have similar periodicity. Vehicle entry and exit peak at midnight, and a second traffic peak in the park occurs around noon.

The multivariate data set contains the influence of traffic flow, wind speed and wind direction on noise change, and the multi-station data set contains the correlation between the noise of neighboring stations and the stations to be measured. The Multi-PL noise prediction method is proposed according to the unique data attribute of park.
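The Pearson coefficient in Eq. (1) can be computed directly from two aligned series. The following is a minimal sketch; the function name `pearson` is hypothetical, and the same function covers both $\rho_{NW}$ (noise against wind speed) and $\rho_{NN}$ (noise against noise at another station).

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long series, as in Eq. (1)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # The 1/n factors of the covariance and standard deviations cancel,
    # so plain sums of centred products are sufficient.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_a * var_b)
```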

4 Multi-PL model based on Prophet and LSTM

4.1 Multi-element LSTM model

The LSTM (long short-term memory) network is an improvement on the RNN (recurrent neural network). Its structure contains a cell that controls the stored state, which mitigates the vanishing gradient problem encountered by RNNs [31]. This paper adopts supervised learning, which does not require hand-crafted time series features. The deep network fits the time series curve and captures the long-term dependence in the sequence for feature learning and prediction. The principle of LSTM is shown in Fig. 4.

The forget gate $f_{t}$ in formula (2) decides how much of the previous cell state is kept; $f_{t} = 1$ means that the previous memory is completely retained. After the noise data are input, whether they are stored in the cell depends on the input gate, and the updated cell state $C_{t}$ is given by formula (3).

$$f_{t} = \sigma (W_{f} [h_{t - 1} ,n_{t} ] + b_{f} ),\quad f_{t} \in [0,1]$$

(2)

$$C_{t} = f_{t} *C_{t - 1} + \sigma (W_{i} [h_{t - 1} ,n_{t} ] + b_{i} )*\tanh (W_{C} [h_{t - 1} ,n_{t} ] + b_{C} )$$

(3)

$n_{t}$ is the input noise at the current time step, and $h_{t - 1}$ is the output (hidden state) of the previous time step. Formula (3) gives the new cell state after useless information has been discarded and some new information has been retained, where $i_{t} = \sigma (W_{i} [h_{t - 1} ,n_{t} ] + b_{i} )$ is the probability that new information is retained. The predicted noise output depends on the output gate:

$$o_{t} = \sigma (W_{o} [h_{t - 1} ,n_{t} ] + b_{o} )$$

(4)

$$Y_{t} = h_{t} = o_{t} *\tanh (C_{t} )$$

(5)

$o_{t}$ is the output probability. Multiplying $o_{t}$ by the hyperbolic tangent $\tanh (C_{t} )$ filters the cell state, and the output $Y_{t}$ is passed on as the hidden state to the next time step. In the above expressions, $W_{f} ,W_{i} ,W_{C} ,W_{o}$ are the weight matrices and $b_{f} ,b_{i} ,b_{C} ,b_{o}$ are the bias vectors.
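One time step of formulas (2) to (5) can be traced with a small sketch. For readability it assumes a scalar hidden state and a scalar input, so each weight matrix reduces to a pair of numbers; the function name `lstm_step` and the parameter layout are hypothetical and are not taken from the paper's implementation.

```python
from math import exp, tanh


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def lstm_step(n_t, h_prev, c_prev, params):
    """One scalar LSTM step.

    params maps a gate name ("f", "i", "C", "o") to (w_h, w_n, b), so that
    W[h_{t-1}, n_t] + b becomes w_h * h_prev + w_n * n_t + b.
    """
    def pre(name):
        w_h, w_n, b = params[name]
        return w_h * h_prev + w_n * n_t + b

    f_t = sigmoid(pre("f"))            # forget gate, formula (2)
    i_t = sigmoid(pre("i"))            # input gate
    c_candidate = tanh(pre("C"))       # candidate cell content
    c_t = f_t * c_prev + i_t * c_candidate  # cell state, formula (3)
    o_t = sigmoid(pre("o"))            # output gate, formula (4)
    h_t = o_t * tanh(c_t)              # hidden state / output, formula (5)
    return h_t, c_t


# With all parameters zero, every gate equals 0.5 and the candidate is 0.
zero = {name: (0.0, 0.0, 0.0) for name in ("f", "i", "C", "o")}
h_t, c_t = lstm_step(n_t=1.0, h_prev=0.0, c_prev=2.0, params=zero)
```

In this zero-parameter case the cell state is simply halved (from 2.0 to 1.0), which makes the role of the forget gate easy to see.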

Multivariate prediction is realized by forming each sample from several kinds of information and transforming the task into a supervised learning problem with multiple inputs and a single output. The first hidden layer has 32 neurons, and the output layer has 1 neuron that predicts noise. The input is four-dimensional: wind speed, noise of the neighboring station selected by wind direction, traffic flow, and noise of the prediction station. The output is the predicted noise of the prediction station, with 2 prediction steps and a time interval of 10 min. The model was trained for 100 epochs with a batch size of 128, and the training and test losses were tracked during training by setting the validation_data parameter of the fit() function.

Multi-factor features were extracted with the LSTM model for noise prediction. The prediction error was large during periods of abrupt change: in January, the noise dropped by about 4.5 dB, and the prediction error reached about 2 dB. Therefore, the Prophet model was introduced to fuse multi-station information and improve the prediction accuracy.
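The transformation into a supervised learning problem can be sketched as a sliding window over the multivariate series. The number of lagged input steps is not stated in the text, so `n_in=1` is an assumption; `n_out=2` matches the 2 prediction steps, and the function name `make_supervised` is hypothetical.

```python
def make_supervised(rows, target_idx, n_in=1, n_out=2):
    """Turn a multivariate series into (inputs, targets) pairs.

    rows[t] is the feature list at time step t, e.g.
    [wind speed, neighbour noise, traffic flow, station noise].
    Each input is n_in consecutive rows flattened; each target is the
    next n_out values of the column target_idx.
    """
    inputs, targets = [], []
    for start in range(len(rows) - n_in - n_out + 1):
        window = rows[start:start + n_in]
        inputs.append([value for row in window for value in row])
        targets.append([rows[start + n_in + j][target_idx] for j in range(n_out)])
    return inputs, targets


rows = [[1, 10], [2, 20], [3, 30], [4, 40]]
X, y = make_supervised(rows, target_idx=0)
```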

4.2 Prophet model based on spatial multi-station regression

The Prophet prediction model has great advantages in processing periodic data with abnormal values and trend changes, and the noise of chemical parks has strong micro-abruptness and macro-regularity. Therefore, Prophet model is introduced for noise prediction in this paper. Prophet model decomposes the time series according to the following formula:

$$P\left( t \right) = g\left( t \right) + s\left( t \right) + h\left( t \right) + \varepsilon \left( t \right)$$

(6)

In formula (6), $g\left( t \right)$ represents the noise trend term, which is mainly used to fit aperiodic changes in the time series. We use a trend term model based on piecewise linear functions:

$$g\left( t \right) = (k + a(t)^{T} \delta )t + (m + a(t)^{T} \gamma )$$

(7)

In formula (7), $m$ is the offset, $k$ is the growth rate, and $\delta = (\delta_{1} , \ldots ,\delta_{S} )^{T}$ is the vector of changes in the growth rate. The indicator function is $a(t) = (a_{1} (t), \ldots ,a_{S} (t))^{T} \in \left\{ {0,1} \right\}^{S}$, with $a_{j} (t) = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {{\text{if}}\;t \ge S_{j} } \hfill \\ {0,} \hfill & {{\text{otherwise}}} \hfill \\ \end{array} } \right.$ and $\gamma = (\gamma_{1} , \ldots ,\gamma_{S} )^{T} ,\gamma_{j} = - S_{j} \delta_{j}$, where $S$ is the number of mutation points and $S_{j}$ is the time of the $j$-th mutation point.
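The piecewise linear trend of formula (7) can be evaluated with a few lines of code. This is a minimal sketch with a hypothetical function name `trend`; it shows that the offset correction $\gamma_{j} = - S_{j} \delta_{j}$ keeps the trend continuous at each mutation point.

```python
def trend(t, k, m, changepoints, deltas):
    """Evaluate g(t) = (k + a(t)^T delta) * t + (m + a(t)^T gamma)."""
    rate, offset = k, m
    for s_j, delta_j in zip(changepoints, deltas):
        if t >= s_j:                   # a_j(t) = 1
            rate += delta_j            # growth rate changes by delta_j
            offset += -s_j * delta_j   # gamma_j keeps g continuous at S_j
    return rate * t + offset
```

For example, with `k=1`, `m=0`, one mutation point at `t=10` and `delta=1`, the slope doubles after `t=10` while the value at `t=10` itself stays 10.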

$s\left( t \right)$ is a periodic term modeled by Fourier series:

$$s\left( t \right) = \sum\limits_{n = 1}^{N} {\left[ {a_{n} \cos \left( {\frac{2\pi nt}{P}} \right) + b_{n} \sin \left( {\frac{2\pi nt}{P}} \right)} \right]}$$

(8)

In formula (8), $t$ is the time, $N$ is the number of Fourier terms (so $2N$ coefficients $a_{n}$ and $b_{n}$ are fitted), and $P$ is the period of the time series; for example, P = 7 represents a weekly period when $t$ is measured in days.

$h\left( t \right)$ is the holiday term, which models the influence of each holiday at different times independently. $\varepsilon \left( t \right)$ is the error term, which represents random and unpredictable fluctuations. Prophet sums the trend term, the seasonal term and the other terms to obtain the predicted value of the time series.

In this paper, the add_regressor() method is used to add data from multiple stations as regression variables for fitting. First, the noise time series of each of the other stations was added to Prophet individually for prediction. Then, the stations were sorted by the RMSE of these predictions and added to the Prophet model in that order to improve the prediction accuracy. Although the Prophet model is flexible, it cannot take the influence of multi-dimensional factors into account. Accurate prediction therefore requires a more complete prediction scheme.
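The periodic term of formula (8) can be evaluated as follows. This is a minimal sketch with a hypothetical function name `seasonal`; the coefficient lists `a` and `b` would normally be fitted by Prophet and are supplied by hand here.

```python
from math import cos, pi, sin


def seasonal(t, a, b, period=7.0):
    """Evaluate the truncated Fourier series s(t) of formula (8).

    a and b hold the coefficients a_1..a_N and b_1..b_N.
    """
    return sum(
        a_n * cos(2 * pi * n * t / period) + b_n * sin(2 * pi * n * t / period)
        for n, (a_n, b_n) in enumerate(zip(a, b), start=1)
    )
```

With `period=7.0` and `t` in days, `seasonal(t, a, b)` repeats every week, which is the behaviour the weekly component relies on.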
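The ordering step can be sketched independently of Prophet. Here `preds_by_station` is hypothetical input: for each candidate station, the predictions obtained when that station alone is used as the extra regressor. The function name `order_stations` is likewise an illustration, not the authors' code.

```python
from math import sqrt


def rmse(true, pred):
    return sqrt(sum((x - p) ** 2 for x, p in zip(true, pred)) / len(true))


def order_stations(true, preds_by_station):
    """Return station names sorted by ascending RMSE of their predictions."""
    return sorted(preds_by_station, key=lambda s: rmse(true, preds_by_station[s]))


true = [1.0, 2.0, 3.0]
preds_by_station = {
    "station_1": [1.0, 2.0, 5.0],
    "station_2": [1.0, 2.0, 3.0],
    "station_3": [2.0, 3.0, 4.0],
}
order = order_stations(true, preds_by_station)
# Each name in `order` would then be passed, in turn, to the
# add_regressor() method of the Prophet model before fitting.
```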

4.3 Multi-PL model based on Prophet and LSTM combination

Based on the characteristics of the Prophet and LSTM models, we propose the Multi-PL model, which makes up for the limitations of a single model and uses the park information and the advantages of both models to achieve higher-precision noise prediction.

First, the noise feature sequence of adjacent stations based on wind direction is constructed: the wind direction is classified into direction labels that form a time series feature. According to the label, the noise value of the corresponding station at each time is extracted, and the extracted values are stitched into a new time series feature, which is the noise feature in Fig. 5. A multivariate LSTM model is then constructed by combining this feature with the time series features of traffic flow and wind speed. This work is based on the multivariate data set ${\text{Train}}\;{\text{Set}}\;1$. The data of each station in the multi-station data set ${\text{Train}}\;{\text{Set}}\;2$ are used separately in the Prophet model, the stations are sorted by the RMSE of their predictions, and they are added to the Prophet model in order of RMSE from small to large.

The cftool (Curve Fitting Tool) toolbox in MATLAB is used to fit the predictions of the two models to the real noise values in the training set, which gives formula (9) relating the actual noise value to the model predictions:
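The construction of the wind-direction-based noise feature can be sketched as follows. The text does not give the direction classes or which station belongs to which class, so the four 90° sectors and the `station_for_label` mapping are purely hypothetical, as are the function names.

```python
def direction_label(deg):
    """Map a wind direction in degrees to one of four 90-degree sectors.

    Assumption: sectors are centred on N (0), E (90), S (180) and W (270).
    """
    return ("N", "E", "S", "W")[int(((deg % 360) + 45) // 90) % 4]


def neighbor_noise_feature(wind_deg, noise_by_station, station_for_label):
    """Stitch, per time step, the noise of the station matching the wind label."""
    feature = []
    for t, deg in enumerate(wind_deg):
        station = station_for_label[direction_label(deg)]
        feature.append(noise_by_station[station][t])
    return feature


# Hypothetical mapping from direction label to neighbouring station.
station_for_label = {"N": "s9", "E": "s8", "S": "s7", "W": "s9"}
noise_by_station = {
    "s9": [50.0, 51.0, 52.0],
    "s8": [60.0, 61.0, 62.0],
    "s7": [70.0, 71.0, 72.0],
}
feature = neighbor_noise_feature([0, 90, 180], noise_by_station, station_for_label)
```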

$${\text{Train}}\_{\text{true}} = A*{\text{LSTM}} + B*{\text{Prophet}} + C$$

(9)

Obtaining the relationship between the actual value and the predicted values by fitting gives results closer to the true value than linearly weighting the predictions of the two models. The fitted formula also has a constant compensation term $C$, which prevents a model whose training results are too high or too low from biasing the forecast.
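The coefficients of formula (9) can also be estimated without MATLAB. The sketch below uses ordinary least squares solved through the normal equations; this is an assumption about what the cftool fit amounts to, since the exact fit settings are not stated, and the function name `fit_plane` is hypothetical.

```python
def fit_plane(lstm, prophet, true):
    """Least-squares A, B, C in true ≈ A * lstm + B * prophet + C."""
    rows = [(l, p, 1.0) for l, p in zip(lstm, prophet)]
    # Augmented normal equations (X^T X | X^T y) as a 3 x 4 matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, true))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictions are collinear; the fit is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return tuple(aug[i][3] / aug[i][i] for i in range(3))


# Synthetic check: data generated from known coefficients are recovered.
lstm = [55.0, 60.0, 65.0, 70.0, 62.0]
prophet = [58.0, 57.0, 66.0, 64.0, 70.0]
true = [1.2 * l + 0.3 * p - 5.0 for l, p in zip(lstm, prophet)]
A, B, C = fit_plane(lstm, prophet, true)
```

The values in the check are synthetic and serve only to show that the routine recovers a known plane; they are not measurements from the park.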

5 Experiment and result analysis

First, the training set proportion and the evaluation indexes are described, and then the results for the 3σ criterion and for multivariate and multi-station prediction are analyzed. Finally, the Multi-PL model proposed in this paper is compared with the Prophet + LSTM linear weighted combination model, LSTM, Prophet, the BP neural network model and the traditional Kalman filter prediction model, to verify that the proposed method has better accuracy and prediction ability.

5.1 Train set proportion and evaluation index

The proportion of training set and test set in the multivariate and multi-station data sets is determined by experimental comparison. The LSTM deep neural network is prone to overfitting, whereas the Prophet model has good stability. Taking single-station prediction as an example, the difference in RMSE between experiments with different data set ratios does not exceed 0.5. Therefore, the LSTM model is used as the basis for dividing the data set, and the ratio that gives the best result while still fitting well is selected. According to Table 1, 72% of the data are used as the training set, and the rest form the test set.

Table 1 LSTM prediction results

In order to verify the validity of Multi-PL prediction model, this paper uses three evaluation indexes: root mean square error (RMSE), mean absolute error (MAE) and coefficient of determination ($R^{2}$). The calculation formula is as follows:

$${\text{RMSE}} = \sqrt {\frac{1}{n}\sum\limits_{t = 1}^{n} {\left( {x_{t} - \tilde{x}_{t} } \right)}^{2} }$$

(10)

$${\text{MAE}} = \frac{1}{n}\sum\limits_{t = 1}^{n} {\left| {x_{t} - \tilde{x}_{t} } \right|}$$

(11)

$$R^{2} = 1 - \frac{{\sum\limits_{t = 1}^{n} {(x_{t} - \tilde{x}_{t} )^{2} } }}{{\sum\limits_{t = 1}^{n} {(x_{t} - \overline{x})^{2} } }}$$

(12)

$\overline{x}$ is the mean of the true noise values, $x = (x_{1} ,x_{2} , \ldots ,x_{n} ) \in R^{n}$ is the true value of noise, and $\tilde{x} = (\tilde{x}_{1} ,\tilde{x}_{2} , \ldots ,\tilde{x}_{n} ) \in R^{n}$ is the predicted value of noise in Eqs. (10) and (11) and the fitted value of the predictions of the two models in Eq. (12); $n$ is the number of time series values. The smaller the RMSE and MAE, the better the predictive ability of the model. The closer $R^{2}$ is to 1, the better the fit of the model.
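The three evaluation indexes in Eqs. (10) to (12) translate directly into code. The following is a minimal sketch with hypothetical function names and a small made-up example.

```python
from math import sqrt


def rmse(x, x_hat):
    """Root mean square error, Eq. (10)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))


def mae(x, x_hat):
    """Mean absolute error, Eq. (11)."""
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)


def r2(x, x_hat):
    """Coefficient of determination, Eq. (12)."""
    x_bar = sum(x) / len(x)
    ss_res = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    ss_tot = sum((a - x_bar) ** 2 for a in x)
    return 1.0 - ss_res / ss_tot


x = [1.0, 2.0, 3.0, 4.0]
x_hat = [1.0, 2.0, 3.0, 6.0]
```

For this toy example a single error of 2 gives RMSE 1.0, MAE 0.5 and R² 0.2, which illustrates that RMSE penalizes one large error more heavily than MAE does.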

5.2 Analysis of forecast results

The 3σ criterion assumes that a set of data contains only random errors, so that about 99.73% of the noise values fall in the interval ${\text{noise}} \in (u - 3\sigma ,u + 3\sigma )$. Any error exceeding this interval is regarded not as a random error but as a gross error, and the data containing it should be removed or replaced. Here u is the mean value of noise, σ is the noise standard deviation, and noise is the noise value.

This article replaces noise values in the range $0 \le {\text{noise}} < u - 3\sigma ({\text{dB}})$ with the mean value. Taking the noise data of Station 10 in Fig. 6 as an example, part A is the original noise containing sparse zero-valued mutation points, with an unbiased sample standard deviation of 4.45. Part B is the noise after the above 3σ treatment, with an unbiased sample standard deviation of 4.28.

According to Table 2, applying the 3σ criterion reduces RMSE by at least 0.1 dB and also reduces MAE for both single-station and multi-station predictions. Compared with single-station data, using the multi-station data set in the Prophet model reduces the predicted RMSE and MAE by 5.3% and 7.3%, respectively. Each station is used in turn for noise prediction of Station 10; the RMSE and MAE of each station are shown in sub-pictures 1 and 3 of Fig. 7, and the same values after sorting the stations by RMSE are shown in sub-pictures 2 and 4. When the multi-station data are added to the Prophet model as regression variables in this order, RMSE and MAE are reduced by 26.3% and 22.8%, respectively, compared with the unordered prediction. In the multi-station ordered Prophet method, the RMSE and MAE of the data processed by the 3σ criterion are reduced by 32.2% and 23.3%, respectively. Compared with single-station data, using the multivariate data set in the LSTM model reduces the predicted RMSE and MAE by 9.3% and 15.9%, respectively.
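The replacement rule can be sketched in a few lines. It is assumed here that the mean and the unbiased standard deviation are computed over the whole raw series, zeros included; the text does not say otherwise, and the function name `replace_low_outliers` is hypothetical.

```python
from statistics import mean, stdev


def replace_low_outliers(noise):
    """Replace values in [0, u - 3*sigma) with the series mean u.

    stdev() is the unbiased (n - 1) sample standard deviation.
    """
    u = mean(noise)
    lower = u - 3 * stdev(noise)
    return [u if 0 <= value < lower else value for value in noise]


# Synthetic series: 99 readings of 60 dB and one faulty 0 dB reading.
raw = [60.0] * 99 + [0.0]
cleaned = replace_low_outliers(raw)
```

In this synthetic series the mean is 59.4 dB and the standard deviation is 6 dB, so the lower bound is 41.4 dB and only the zero reading is replaced.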

Table 2 Comparative experimental results

It can be seen from Table 2 that the prediction results obtained with the 3σ criterion have higher accuracy. The Prophet model achieves its highest accuracy with the multi-station ordered data, and the LSTM model achieves higher accuracy with the multivariate data set than with the original single-station data. On this basis, the LSTM and Prophet predictions on the training set were fitted to the real noise values of the training set, which gives the relationship in Eq. (13), where $L(t)$ and $P(t)$ are the training set predictions of LSTM and Prophet, respectively, and $f(t)$ is the fitted predicted value.

$$f(t) = - 20.57 + 1.185*L(t) + 0.27*P(t)$$

(13)

The RMSE of $f(t)$ obtained by fitting and the true value is 0.54. The data points in Fig. 8 are basically fitted to the same plane and the coefficient of determination $R^{2} = 0.962$. The fitting effect is good. After the test set was fed into LSTM and Prophet, the predicted value was put into the verification Eq. (13), and the prediction result $f_{{{\text{test}}}} (t)$ of the Multi-PL model was obtained as shown in Fig. 9.
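Applying Eq. (13) to the test set is a pointwise operation on the two prediction series. The sketch below uses the coefficients reported in Eq. (13); the function name `multi_pl` and the sample inputs are hypothetical and are not values from the paper's data.

```python
def multi_pl(lstm_pred, prophet_pred, a=1.185, b=0.27, c=-20.57):
    """Combine LSTM and Prophet predictions with the fitted Eq. (13)."""
    return [a * l + b * p + c for l, p in zip(lstm_pred, prophet_pred)]


# Illustrative inputs only.
combined = multi_pl([60.0, 58.0], [62.0, 61.0])
```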

Among them, ${\text{Test}}\;{\text{Set}}\;1$ is from multivariate data set and ${\text{Test}}\;{\text{Set}}\;2$ is from multi-station data set. Figure 10 shows the true value, LSTM and Prophet noise predicted value. Compared with the predicted value fitted by the Multi-PL model in Fig. 11, Multi-PL makes up for the prediction deviation of the two models and improves the prediction accuracy of outliers contained in the noise. The RMSE and MAE of $f_{{{\text{test}}}} (t)$ and the true value were 0.53 and 0.46 dB, respectively. The prediction result of Multi-PL model is obviously better than that of single LSTM and Prophet model.

5.3 Comparison results of different prediction models

To verify the prediction performance of the Multi-PL model, the two evaluation indexes RMSE and MAE from Sect. 5.1 are used to evaluate Kalman filter prediction, the BP neural network, LSTM, Prophet, and the Prophet + LSTM linear weighted model (optimal weights: $\omega_{{{\text{LSTM}}}}$ = 0.5, $\omega_{{{\text{Prophet}}}}$ = 0.5). According to Table 3, the prediction results of Multi-PL are better than those of the other prediction methods, and its RMSE and MAE are 45.9% and 25.9% lower, respectively, than those of the linear weighted model. The Multi-PL model can therefore be used as an effective prediction model for chemical industry parks.

Table 3 Analysis of results of different prediction models

6 Conclusions

Analyzing the noise law and its influencing factors in a chemical industry park and improving the accuracy of noise prediction are of great significance for planning working hours and protecting workers' hearing health. Based on the regularities of time series data such as noise, traffic flow, wind direction and wind speed in a chemical park, this paper uses the 3σ criterion to replace the zero values of noise, and proposes a Multi-PL model based on multivariate information and multi-station information. Comparative experiments were designed and implemented against the Prophet + LSTM weighted model under each weight coefficient, the single models, the Kalman filter prediction model and the traditional BP neural network model. The results show that the park noise time series processed by the 3σ criterion perform better in the prediction models, and that the prediction errors of the multi-station Prophet model and the multivariate LSTM neural network model are lower than those of the traditional Kalman filter prediction model and the BP neural network model. Moreover, the Prophet + LSTM linear weighted combination model has a slightly higher prediction accuracy than the above models, and the Multi-PL model, which uses the park data effectively and has the constant compensation property, performs best. Compared with the linear weighted combination model, its RMSE and MAE are reduced by 0.45 dB and 0.36 dB, respectively. Multi-PL can be used as an effective noise prediction model in chemical industry parks. With the wide application of intelligent parks, this study can provide a new idea for noise prediction in parks.

This paper only constructs a prediction model fitted from two multi-factor models. In the future, traditional prediction models based on statistical methods could be introduced to compensate for the shortcomings of neural networks and obtain more accurate noise predictions. In addition, transfer learning or reinforcement learning could be used to predict the overall noise level of the park.

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Abbreviations

3σ criterion: A method of outlier discrimination based on the normal distribution
RNN: Recurrent neural network
LSTM: Long short-term memory
Prophet: An open-source time series prediction algorithm provided by Facebook
Multi-PL: Noise prediction model based on LSTM and Prophet
Kalman: An algorithm for optimal estimation of system state using the linear system state equation
BP: Back propagation
RMSE: Root mean-squared error
MAE: Mean absolute error
R²: R squared

Acknowledgements

Not applicable.

Funding

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61701284, U1931207 and 61702306, the Innovative Research Foundation of Qingdao under Grant No. 19-6-2-1-cg, the Application Research Project for Postdoctoral Researchers of Qingdao, the Sci. & Tech. Development Fund of Shandong Province of China under Grant Nos. ZR202102230289, ZR202102250695 and ZR2019LZH001, the Humanities and Social Science Research Project of the Ministry of Education under Grant No. 18YJAZH017, the Taishan Scholar Program of Shandong Province, the Shandong Chongqing Science and technology cooperation project under Grant No. cstc2020jscx-lyjsAX0008, the Sci. & Tech. Development Fund of Qingdao under Grant No. 21-1-5-zlyj-1-zc, SDUST Research Fund under Grant No. 2015TDJH102, and the Science and Technology Support Plan of Youth Innovation Team of Shandong higher School under Grant No. 2019KJN024.

Author information

Authors and Affiliations

College of Electronic and Information Engineering, Shandong University of Science and Technology, Qingdao, 266590, China
Qingtian Zeng, Yu Liang & Geng Chen
College of Mathematics and System Science, Shandong University of Science and Technology, Qingdao, 266590, China
Hua Duan
School of Information Science and Engineering, Southeast University, Nanjing, 210096, China
Chunguo Li

Contributions

QTZ, YL, GC and CGL conceived and designed the experiments. QTZ and YL performed the experiments. GC and HD analyzed the data. YL, GC and QTZ wrote the paper. All authors contributed to this research, and read and approved the final manuscript.

Corresponding author

Correspondence to Geng Chen.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zeng, Q., Liang, Y., Chen, G. et al. Noise prediction of chemical industry park based on multi-station Prophet and multivariate LSTM fitting model. EURASIP J. Adv. Signal Process. 2021, 106 (2021). https://doi.org/10.1186/s13634-021-00815-6

Received: 02 September 2021
Accepted: 20 October 2021
Published: 29 October 2021
DOI: https://doi.org/10.1186/s13634-021-00815-6