De-noising classification method for financial time series based on ICEEMDAN and wavelet threshold, and its application

This paper proposes a classification method for financial time series that addresses the significant issue of noise. The proposed method combines improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) and wavelet threshold de-noising. The method begins by employing ICEEMDAN to decompose the time series into modal components and residuals. Using the noise component verification approach introduced in this paper, these components are categorized into noisy and de-noised elements. The noisy components are then de-noised using the Wavelet Threshold technique, which separates the non-noise and noise elements. The final de-noised output is produced by merging the non-noise elements with the de-noised components, and the 1-NN (nearest neighbor) algorithm is applied for time series classification. Highlighting its practical value in finance, this paper introduces a two-step stock classification prediction method that combines time series classification with a BP (Backpropagation) neural network. The method first classifies stocks into portfolios with high internal similarity using time series classification. It then employs a BP neural network to predict the classification of stock price movements within these portfolios. Backtesting confirms that this approach can enhance the accuracy of predicting stock price fluctuations.


Introduction
The classification of time series is a crucial research area with applications in healthcare, econometrics, and voice recognition, among other fields. As a result, numerous time series classification methods have been developed. However, the accuracy of classification algorithms, particularly those based on Euclidean and DTW distances, consistently declines [1] as the noise standard deviation increases. Noise has therefore become a critical challenge in time series classification.
The literature indicates that the wavelet method is utilized for signal decomposition and de-noising [2,3]. Similar to the empirical mode decomposition method, the wavelet method offers multi-frequency and multi-scale analysis [4,5], and it has been extensively researched and applied [6,7]. Fractal images and fractal noise, which are present in chaotic systems across fields such as physics, biology, psychology, economics, and finance [8][9][10], have led scholars to integrate fractal theory [11,12] into the wavelet method to develop fractal wavelet techniques [13][14][15][16].
Inspired by this, scholars have developed various joint de-noising methods combining modal decomposition with wavelet thresholding, including EMD and wavelet threshold de-noising [17][18][19], CEEMDAN and wavelet threshold de-noising [20], ICEEMDAN and wavelet threshold de-noising [21], and variational mode decomposition (VMD) with wavelet threshold de-noising [22]. A prevalent challenge in these methods is determining whether an IMF component is dominated by noise. Common practice involves calculating the Pearson correlation coefficient between the IMF component and the original signal to gauge the IMF's information content; a threshold is set, below which IMF components are deemed noise-dominated. However, using Pearson's correlation coefficient to identify noise components presents two problems: first, a low degree of linear correlation between an IMF component and the original signal does not necessarily mean that the component is noise; second, setting the correlation threshold is somewhat subjective and lacks convincing justification. To address this, this paper introduces a joint verification method that employs the t test and the unit root test to ascertain whether an IMF component is noise-dominated. This method is rooted in the statistical nature of noise and offers a clear parameter testing procedure. It can replace the correlation coefficient test in the various modal decomposition methods combined with wavelet threshold de-noising.
Building on this, the paper proposes a time series classification method based on ICEEMDAN and wavelet threshold joint de-noising. The process begins with ICEEMDAN, which decomposes the time series into a series of IMF components and residuals. The noise component verification method proposed here is then applied to categorize the IMF components and residuals into noise-containing and de-noised elements. The noise-containing components are subsequently de-noised using the wavelet threshold method, resulting in non-noise sequences. The final de-noised output is formed by combining these non-noise sequences with the de-noised components, after which the nearest neighbor 1-NN algorithm is used for time series classification.
Wang et al. [23] proposed a two-stage investment strategy for bear markets, initially using tail correlation coefficients for hierarchical clustering of assets based on fuzzy matrices, then selecting one asset from each cluster to form an investment portfolio. Empirical evidence showed that this method could construct portfolios more resistant to risk during bear markets. Gupta et al. [24] introduced a two-step investment framework that first employs a Bayesian classifier to identify investment targets for a portfolio, and then applies multiple criteria decision-making (MCDM) techniques to devise investment strategies.
This paper contends that the essence of financial time series classification methods lies in identifying the similarity among various financial time series. Technical indicators derived from portfolios with higher internal similarity are posited to be more effective than those from less similar portfolios. Consequently, a two-step classification prediction method for stock portfolios is introduced. This method first uses a time series classification algorithm to select stocks with higher similarity within a certain industry, then employs a BP neural network to predict the classification of stock price movements within the portfolio.
The marginal contributions of this paper are threefold: (1) It proposes a noise component verification method with an objective and clear judgment standard, applicable to noise verification of IMF components across all modal decomposition methods. (2) The de-noising method put forward ensures that essential information is preserved and can effectively precede all time series data mining tasks. (3) It introduces a two-step stock classification prediction method that combines time series classification with a BP neural network, aiming to improve the accuracy of predicting stock price movements in investment portfolios.

Improved complete ensemble empirical mode decomposition with adaptive noise
The CEEMDAN algorithm in the current technology can effectively reduce the error in signal reconstruction, restoring the completeness of EMD. However, its IMF components are easily affected by noise, and problems with residual noise and pseudo-modal components persist. The improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) algorithm [25] introduces a local envelope average, allowing for the decomposition of IMF components with less noise and greater physical significance. Let x(t) represent the original time series, E_j(·) the j-th IMF component obtained after EMD decomposition, ω^i the i-th added Gaussian white noise, β_k the amplitude coefficient of the added noise, i.e., the signal-to-noise ratio in the k-th stage, and i = 1, ..., I the index over the I experiments. The specific steps of the ICEEMDAN algorithm are as follows:

Step 1 Add noise to the time series x(t) to construct a new time series:

x^i(t) = x(t) + β_0 E_1(ω^i)  (1)
Step 2 Through EMD, calculate the i-th local average of the series in Eq. (1), obtaining the first-stage residual:

r_1(t) = ⟨M(x^i(t))⟩  (2)

where ⟨M(·)⟩ is the operator that calculates the local envelope average over the I realizations. The first modal component of ICEEMDAN is then obtained from the noisy signal x^i(t) as:

IMF_1(t) = x(t) − r_1(t)  (3)

Step 3 Similarly, perform I experiments (i = 1, ..., I), calculate the local average of the signal r_1(t) + β_1 E_2(ω^i), and obtain the second-stage residual:

r_2(t) = ⟨M(r_1(t) + β_1 E_2(ω^i))⟩  (4)

Subtract Eq. (4) from Eq. (2) to obtain the second IMF component of the original sequence:

IMF_2(t) = r_1(t) − r_2(t)  (5)

Step 4 Repeat Step 3 until the residual has no more than two extreme points. The recursive formula for the k-th residual is:

r_k(t) = ⟨M(r_{k−1}(t) + β_{k−1} E_k(ω^i))⟩  (6)

and the k-th component of the original sequence is obtained as:

IMF_k(t) = r_{k−1}(t) − r_k(t)  (7)

The final residual is:

R(t) = x(t) − Σ_{k=1}^{K} IMF_k(t)  (8)

Thus, the original time series x(t) is ultimately decomposed into:

x(t) = Σ_{k=1}^{K} IMF_k(t) + R(t)  (9)
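The recursion in Eqs. (6) and (7) can be illustrated with a short sketch. This is a pedagogical stand-in, not the authors' implementation: a centered moving average replaces the EMD envelope-mean operator ⟨M(·)⟩, and the noise amplitudes β_k are fixed, but the telescoping rule IMF_k = r_{k−1} − r_k still guarantees exact additive reconstruction of the input, as in Eq. (9).

```python
import random

def local_mean(x, w=5):
    # stand-in for the EMD envelope-mean operator <M(.)>: a centered moving average
    n = len(x)
    return [sum(x[max(0, i - w):min(n, i + w + 1)]) / (min(n, i + w + 1) - max(0, i - w))
            for i in range(n)]

def iceemdan_sketch(x, n_modes=3, n_trials=8, beta=0.05, seed=0):
    # ICEEMDAN-style recursion: average the local means of noise-perturbed
    # residuals over n_trials realizations, then peel off IMF_k = r_{k-1} - r_k
    rng = random.Random(seed)
    n = len(x)
    imfs, r = [], list(x)
    for _ in range(n_modes):
        acc = [0.0] * n
        for _ in range(n_trials):
            noisy = [r[i] + beta * rng.gauss(0, 1) for i in range(n)]
            m = local_mean(noisy)
            acc = [acc[i] + m[i] / n_trials for i in range(n)]
        imfs.append([r[i] - acc[i] for i in range(n)])
        r = acc  # next-stage residual
    return imfs, r

# toy signal: trend plus a small oscillation
signal = [0.01 * t + (t % 7 - 3) * 0.2 for t in range(200)]
imfs, residual = iceemdan_sketch(signal)
# the telescoping sum reconstructs the input exactly (up to float rounding)
recon = [sum(imf[i] for imf in imfs) + residual[i] for i in range(len(signal))]
```

Because each IMF is defined as the difference of consecutive residuals, the sum of all IMFs plus the final residual collapses back to the original series regardless of how the local mean is estimated.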

Noise component test method
In reality, financial time series contain a significant amount of noise. Assuming the noise ε(t) = 0 is unrealistic; in actual data, ε(t) ≠ 0. Without loss of generality, assume that random noise ε(t) exists in the time series x(t). This noise reflects the impact of random factors on the de-noised time series x̃(t). Consequently, we construct

x(t) = x̃(t) + ε(t)  (10)

Under the Gaussian assumption, the noise is considered white noise, meaning it follows a normal distribution with a mean of 0 and a variance of σ², denoted ε(t) ∼ N(0, σ²). At this point, the noise ε(t) should exhibit zero mean and homoscedasticity. However, heteroscedasticity tests often necessitate the use of explanatory variables from the original model to construct an auxiliary regression model, which helps determine whether random errors display heteroscedasticity. Conducting this test is challenging without first building a regression model.
In cointegration tests, if the variables X_t and Y_t are both first-order integrated I(1), we assume the original model is Y_t = β_0 + β_1 X_t + ε_t. In tests for cointegration relationships, if ε(t) is stationary with a mean of 0, it suggests that X_t and Y_t have a cointegrating relationship, ensuring that random errors in the equation do not accumulate. Conversely, if ε(t) follows a random walk (unit root process), random errors in the equation will accumulate, leading to persistent deviations from equilibrium that cannot self-correct. If the random time series X_t is stationary, then: (1) The mean of X_t does not change over time: E(X_t) = µ. (2) The variance of X_t does not change over time: VAR(X_t) = σ². (3) The covariance between X_t and X_{t−k} at any two periods depends solely on the distance or lag length k between these periods and does not depend on other variables (for all k); this is expressed as COV(X_t, X_{t−k}) = γ_k. If any of the above properties is not met, X_t is said to be non-stationary.
Given that this paper focuses on time series data, we can substitute the Gaussian model's test for random disturbances with an examination of whether ε(t) is a stationary process with a mean of 0. When ε(t) is stationary with a mean of 0, deviations from x(t) are corrected promptly, and the elimination of the random noise ε(t) does not affect the long-term trend of x(t).
In the financial markets, the Shenzhen Component Index is one of the indices that most accurately represents the Chinese stock market. This paper compiles a financial time series sample using the daily closing prices of the Shenzhen Component Index from 2000 to 2021, comprising 5332 data points. As depicted in Fig. 1, several IMF components and residues were derived from the CEEMDAN decomposition of the Shenzhen Component Index. The red line showcases distinct heteroscedasticity in the high-frequency IMF components during certain periods. Further analysis reveals that although the composite of the initial high-frequency components passes the zero mean and stationarity tests, it exhibits heteroscedasticity in specific periods. This is a characteristic outcome, akin to the "leptokurtic" and "volatility clustering" phenomena observed in financial time series. These heteroscedasticities signify that high-frequency IMF components are not solely composed of noise. Consequently, this paper suggests that if the composite of the high-frequency components, decomposed from a time series via the empirical mode decomposition method, is stationary with a mean of 0, it lacks long-term trend elements and is presumed to be primarily noise. This composite is referred to as the "noise-containing component" in this paper. Further decomposition of this noise-containing component is necessary to extract the valuable information it contains.
Fig. 1 CEEMDAN decomposition of the Shenzhen Component Index

Following the decomposition of a time series into a series of modal components and residues via the empirical mode decomposition method, the modal components and residues can be aggregated into two categories. Without loss of generality, if the division is between 1, ..., i and i + 1, ..., the noise-containing component and the de-noised component can be obtained, denoted respectively as x(t)_noise and x(t)_non_noise:

x(t)_noise = Σ_{k=1}^{i} IMF_k(t)  (11)

x(t)_non_noise = Σ_{k=i+1}^{K} IMF_k(t) + R(t)  (12)

The decomposition should satisfy the following conditions: (1) each IMF_k (k = 1, ..., i) has zero mean; (2) Σ_{k=1}^{i} IMF_k has zero mean; (3) each IMF_k (k = 1, ..., i) is stationary; (4) Σ_{k=1}^{i} IMF_k is stationary. For testing conditions (1) and (2), a population mean test can be conducted on IMF_k (k = 1, ..., i) and Σ_{k=1}^{i} IMF_k respectively, with hypotheses H_0: µ = 0, H_1: µ ≠ 0. The t test statistic can be constructed as:

t = (x̄ − 0) / (s/√n) ∼ t(n − 1)

Hence, the rejection region is {|t| > t_{α/2}(n − 1)}. For testing conditions (3) and (4), the ADF test can be used.
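The zero-mean part of the joint verification can be sketched directly; this is a minimal illustration of the t statistic above, not the paper's code. For the stationarity conditions, a practical implementation would call a library ADF routine such as `statsmodels.tsa.stattools.adfuller`, which is omitted here to keep the sketch self-contained.

```python
import math

def t_stat_zero_mean(x):
    # one-sample t statistic for H0: mu = 0, i.e. t = xbar / (s / sqrt(n)) ~ t(n-1)
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))
    return xbar / (s / math.sqrt(n))

# a component with exactly zero mean: t = 0, so H0 is not rejected
zero_mean = [1.0, -1.0, 2.0, -2.0] * 50
# the same component shifted upward: |t| is huge, so H0 is clearly rejected
shifted = [v + 5.0 for v in zero_mean]
```

A component for which H_0: µ = 0 is not rejected (and which also passes the stationarity test) would be assigned to the noise-containing component x(t)_noise.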

Wavelet threshold de-noising
Donoho [26] proposed a de-noising method based on wavelet transformation, known as the wavelet threshold de-noising method. This method has been widely studied and applied [27][28][29]. It involves selecting suitable wavelet basis functions and decomposition levels, performing wavelet decomposition on the noise-containing signal, and obtaining a series of low-frequency and high-frequency wavelet coefficients. These coefficients are then processed with a threshold function. After processing, the high-frequency and low-frequency coefficients are reconstructed to produce a signal from which noise has been removed.

Threshold selection criteria
In wavelet threshold de-noising, the criteria for threshold selection typically include:

(1) Fixed threshold (sqtwolog): λ₁ = σ_n √(2 ln N), where σ_n is the noise standard deviation and N is the signal length.
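The fixed threshold can be computed directly, as in this minimal sketch. In practice σ_n is usually estimated from the data (commonly via the median absolute deviation of the finest-scale wavelet coefficients); here it is passed in as a known value.

```python
import math

def sqtwolog_threshold(sigma_n, n):
    # universal (fixed) threshold: lambda_1 = sigma_n * sqrt(2 ln N)
    return sigma_n * math.sqrt(2.0 * math.log(n))

lam = sqtwolog_threshold(1.0, 1000)  # threshold for unit-variance noise, N = 1000
```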

Threshold functions
After the noise-containing component x(t)_noise undergoes wavelet decomposition, the wavelet coefficients are de-noised using threshold functions. This process separates the noise component ε(t) from the non-noise component x̃(t), where x(t)_noise = x̃(t) + ε(t). The wavelet coefficient processing includes soft threshold functions, hard threshold functions, and some improved threshold functions. Assuming ω_{j,k} is the wavelet coefficient, ω̂_{j,k} is the quantized wavelet coefficient, sgn is the sign function, and λ is the threshold, the functions are as follows:

(1) Soft threshold function [30]: ω̂_{j,k} = sgn(ω_{j,k})(|ω_{j,k}| − λ) for |ω_{j,k}| ≥ λ, and ω̂_{j,k} = 0 otherwise.

(2) Hard threshold function [31]: ω̂_{j,k} = ω_{j,k} for |ω_{j,k}| ≥ λ, and ω̂_{j,k} = 0 otherwise.

(3) Improved threshold function (a1) [32]

(4) Improved threshold function (a2) [33]

(5) Improved threshold function (a3) [34]
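The soft and hard rules can be sketched as plain functions of a single coefficient. This is a minimal illustration; the improved functions a1 to a3 are omitted because their exact forms follow the cited references.

```python
def soft_threshold(w, lam):
    # shrink coefficients above the threshold toward zero; zero out the rest
    if abs(w) >= lam:
        return (1.0 if w > 0 else -1.0) * (abs(w) - lam)
    return 0.0

def hard_threshold(w, lam):
    # keep coefficients above the threshold unchanged; zero out the rest
    return w if abs(w) >= lam else 0.0
```

The soft rule yields a continuous mapping but biases large coefficients by λ, while the hard rule is unbiased above the threshold but discontinuous at ±λ; the improved functions aim to trade off these two defects.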

Euclidean distance
The similarity measure D(x_i, x_j) between time series x_i(t) and x_j(t) is a function that takes the two series as inputs and returns the distance d between them.
Euclidean distance (ED) [35] is one of the most commonly used similarity measures in time series classification. It can be understood as the length of the straight line segment connecting two points, measuring the absolute distance between two points in multidimensional space. For series of length n, the Euclidean distance is:

D(x_i, x_j) = √( Σ_{t=1}^{n} (x_i(t) − x_j(t))² )  (15)
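Eq. (15) translates into a one-line computation; this sketch adds only a length check, since ED is defined for equal-length series.

```python
import math

def euclidean_distance(a, b):
    # Euclidean distance between two equal-length series, per Eq. (15)
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```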

Nearest neighbor algorithm
First, we find the k nearest neighbor samples of the studied sample in the training data set. If most of these k samples belong to a certain category, then the studied sample is assigned to that category; this is the k-nearest neighbor (KNN) algorithm [36].
The algorithm is specified as follows. Input: a training dataset T = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, where x_i ∈ χ ⊆ R^n is the time series of the i-th sample and y_i ∈ Y = {c_1, c_2, ..., c_K} is its category, i = 1, 2, ..., N. Output: the class y to which the test sample x belongs. (1) According to the Euclidean distance metric, find the k points in the training set T nearest to the test sample x, and denote the neighborhood of x covering these k points as N_k(x). (2) Determine the category y of x in N_k(x) according to the classification decision rule (majority voting):

y = arg max_{c_j} Σ_{x_i ∈ N_k(x)} I(y_i = c_j), i = 1, 2, ..., N; j = 1, 2, ..., K

where I is the indicator function: I = 1 when y_i = c_j, and I = 0 otherwise.
The special case of the k-nearest neighbor algorithm with k = 1 is called the nearest neighbor 1-NN algorithm. Because the 1-NN algorithm is parameter-free, it is convenient for comparisons among various methods, so this paper selects the nearest neighbor 1-NN algorithm to determine the classification labels of samples.
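The majority-vote rule, including the 1-NN special case (k = 1), can be sketched as:

```python
import math
from collections import Counter

def knn_classify(train, x, k=1):
    # train: list of (series, label) pairs; majority vote over the k nearest series
    dist = lambda a, b: math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    neighbors = sorted(train, key=lambda item: dist(item[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [([0.0, 0.0, 0.0], "flat"),
         ([0.0, 1.0, 2.0], "up"),
         ([2.0, 1.0, 0.0], "down")]
label = knn_classify(train, [0.1, 1.1, 1.9], k=1)  # nearest training series is "up"
```

With k = 1 the vote degenerates to copying the label of the single closest training series, which is exactly the 1-NN rule used in this paper.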

Classification method steps
The ICEEMDAN method is employed here for empirical mode decomposition. This paper proposes a financial time series de-noising classification method based on ICEEMDAN and wavelet threshold, with the steps detailed as follows: Step 1 Utilize ICEEMDAN to decompose the time series, resulting in IMF components and a residual component.
Step 2 Carry out t tests and unit root tests for each IMF_k and Σ_{k=1}^{i} IMF_k, then gather the IMF components and the residual component to form the noise-containing component x(t)_noise and the noise-removed component x(t)_non_noise.
Step 3 Apply wavelet threshold de-noising to the noise-containing component x(t)_noise, breaking it down into the noise component ε(t) and the retained noise-free component x̃(t); ε(t) is the final noise component. Integrate the retained noise-free component x̃(t) into the noise-removed component x(t)_non_noise, obtaining the final de-noised signal. The de-noising method proposed in this paper is outlined above.
Two-step stock classification forecasting method based on classification method and BP neural network

BP neural network
The BP (backpropagation) neural network [37] is one of the most classical neural networks; it is trained with the backpropagation algorithm. Backpropagation gathers the errors produced during the forward pass, feeds these errors back from the output values, and adjusts the weights of the neurons accordingly, thus producing an artificial neural network capable of simulating the original problem.
A BP neural network primarily consists of an input layer, one or more hidden layers, and an output layer, each with a certain number of nodes (neurons). Typically, the input data move forward through the input layer, hidden layers, and output layer. In addition, the BP neural network involves backpropagation, i.e., the output errors propagate backward from the output layer. The specific steps [38] are as follows: Step 1 Initialize the weights. Step 2 Propagate the signal forward, obtain the model output y, compute the error vector E, and calculate the delta δ of the output nodes. Step 3 Backpropagate the output-node deltas and calculate the deltas of the next layer of nodes. Step 4 Repeat Step 3 until reaching the hidden layer adjacent to the input layer. Step 5 Adjust the weights according to the update rule Δw_ij = αδ_i x_j, where α is the learning rate, δ_i is the delta of node i, and x_j is the input to the weight from node j. Step 6 Repeat Steps 2 to 5 for all training data nodes. Step 7 Repeat Steps 2 to 6 until the neural network is suitably trained.
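Steps 1 to 7 can be sketched with a minimal single-hidden-layer network in plain Python. This is an illustrative toy (sigmoid activations, online updates, and the logical-OR task as stand-in data), not the configuration used in the experiments; the weight update follows Δw_ij = αδ_i x_j.

```python
import math
import random

def train_bp(samples, n_hidden=4, lr=0.5, epochs=2000, seed=0):
    # minimal BP network: one hidden layer, sigmoid units, online weight updates
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]  # +1 bias
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))

    def forward(x):
        xb = x + [1.0]
        h = [sig(sum(w * v for w, v in zip(row, xb))) for row in W1]
        out = sig(sum(w * v for w, v in zip(W2, h + [1.0])))
        return xb, h, out

    for _ in range(epochs):
        for x, y in samples:
            xb, h, out = forward(x)
            d_out = (y - out) * out * (1 - out)        # Step 2: output-node delta
            d_h = [d_out * W2[j] * h[j] * (1 - h[j])   # Step 3: backpropagated deltas
                   for j in range(n_hidden)]
            hb = h + [1.0]
            W2 = [w + lr * d_out * v for w, v in zip(W2, hb)]        # Step 5: dw = a*delta*x
            for j in range(n_hidden):
                W1[j] = [w + lr * d_h[j] * v for w, v in zip(W1[j], xb)]
    return lambda x: forward(x)[2]

# logical OR as a toy training task
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
predict = train_bp(data)
```

After training, `predict` maps the (0, 0) input close to 0 and the other inputs close to 1, showing the forward pass, delta backpropagation, and weight update working together.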

Method steps
The two-step stock classification forecasting method based on the classification method and the BP neural network encompasses the following steps: Step 1 Tag multiple industry indices with category labels, and combine the closing prices of these indices at the first time stage as the training set for the first step of the time series classification stage; select all stocks in a certain industry into the portfolio as the control group for the second step of the forecast stage; and select the adjusted prices of the stocks at the first time stage of this control group as the test set for the first step.
Step 2 Use the time series decomposition-ensemble classification method to select the investment portfolio of the control group, eliminate stocks with significant differences in morphological features from the industry index, and select stocks with a high degree of industry morphological similarity to form an investment portfolio, which is referred to as the experimental group.
Step 3 Use the data of the second time stage to calculate the technical indicators of the experimental group and the control group, respectively, to characterize the statistical features of the stocks. The technical indicators of the two groups are split in time order into the training set and the prediction set for the second-step forecast stage.
Step 4 Define the historical samples of the experimental group and the control group as good, bad, or average, and tag them with rise and fall category labels.
Step 5 Adopt the mean-variance normalization method to normalize the training set and prediction set of the experimental group and the control group at the prediction stage.
Step 6 To avoid ineffective technical indicators reducing prediction performance, the correlation coefficient method is employed to determine the correlation between the technical indicators of the prediction stage training set and the stock category labels, thereby eliminating irrelevant technical indicators.
Step 7 Use the prediction stage training set to train the BP neural network, then use the prediction set to forecast the rise and fall classification of stocks, and compare the prediction accuracy of the experimental group and the control group.
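Steps 5 and 6 above can be sketched together: mean-variance (z-score) normalization followed by correlation-based screening of indicators. This is a minimal sketch; the 0.2 cutoff matches the threshold used later in the paper, and the indicator names here are hypothetical.

```python
import math

def zscore(xs):
    # mean-variance normalization (Step 5)
    m = sum(xs) / len(xs)
    s = math.sqrt(sum((v - m) ** 2 for v in xs) / len(xs))
    return [(v - m) / s for v in xs]

def pearson(a, b):
    # Pearson linear correlation coefficient
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def select_indicators(indicators, labels, cutoff=0.2):
    # keep indicators whose |corr| with the class labels exceeds the cutoff (Step 6)
    return [name for name, vals in indicators.items()
            if abs(pearson(vals, labels)) > cutoff]

labels = [1.0, 2.0, 3.0, 4.0]
indicators = {"trend_like": [1.0, 2.0, 3.0, 4.0],    # perfectly correlated with labels
              "noise_like": [1.0, -1.0, -1.0, 1.0]}  # zero correlation with labels
kept = select_indicators(indicators, labels)
```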
In the parameter setting, the wavelet function is db5, the decomposition level is 5, and the threshold adopts the unbiased risk estimation (rigrsure) criterion. The threshold functions used include the soft threshold function, the hard threshold function, and the improved threshold functions a1, a2, and a3. In the de-noising experiment, plain wavelet threshold de-noising with these five threshold functions serves as the control group, the method proposed in this paper with the same five threshold functions serves as the experimental group, and comparative experiments are conducted.
This paper uses the signal-to-noise ratio (SNR), mean square error (MSE), and waveform correlation coefficient (NCC) as evaluation indicators of de-noising performance. The higher the SNR, the more significant the noise suppression. The MSE reflects the similarity between the de-noised signal and the noise-free signal; the smaller the error, the better the de-noising performance. With x(k) the noise-free signal, y(k) the de-noised signal, and n the length of the signal, SNR, MSE, and NCC are calculated as:

SNR = 10 lg( Σ_{k=1}^{n} x(k)² / Σ_{k=1}^{n} (x(k) − y(k))² )

MSE = (1/n) Σ_{k=1}^{n} (x(k) − y(k))²

NCC = Σ_{k=1}^{n} x(k) y(k) / √( Σ_{k=1}^{n} x(k)² · Σ_{k=1}^{n} y(k)² )

Fig. 2 Noise-free and noisy signals of HeaviSine
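Under these standard definitions (an assumption, since the printed formulas follow the usual conventions), the three indicators can be computed as:

```python
import math

def snr(x, y):
    # signal-to-noise ratio (dB) of the de-noised signal y against the clean signal x
    return 10.0 * math.log10(sum(v * v for v in x) /
                             sum((a - b) ** 2 for a, b in zip(x, y)))

def mse(x, y):
    # mean square error between clean and de-noised signals
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def ncc(x, y):
    # waveform (normalized cross-) correlation coefficient
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(
        sum(a * a for a in x) * sum(b * b for b in y))
```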
From Table 1, it can be seen that for all five threshold functions (soft, hard, a1, a2, and a3), the method proposed in this paper shows superior de-noising performance.

Data source
This research validates the performance of the proposed algorithm on the UCR archive [39]. Because the ICEEMDAN method requires the data to reach a certain time series length, we verify the effectiveness of the proposed classification method on the 68 UCR datasets whose time series length exceeds 255, ordered by time series length, as shown in Table 2.

Experimental results comparison
To better compare and validate the effectiveness of classification methods, this research selected as the baseline algorithm the nearest neighbor 1-NN algorithm based on Euclidean distance, denoted as ED. As shown in Table 3, among the algorithms based on nearest neighbor ED, the proposed financial time series classification method based on ICEEMDAN and wavelet threshold (deED) achieved the best performance on 46 datasets, outperforming ED. Regarding mean accuracy, deED reaches 0.6407 versus 0.6312 for ED, indicating that deED also surpasses ED.

Application of the classification method in quantitative portfolio investment: numerical experiment

Classification experiment data
To evaluate the effectiveness of the two-step stock classification prediction method, which integrates a classification technique and a BP neural network, a numerical experiment was conducted. The IndexShares487 dataset was compiled for the purpose of sample classification. As indicated in Tables 4 and 5, this dataset includes indices from four specific industries for training: Food and Beverage, Pharmaceuticals and Biotech, Defense, and Banking. The listed companies in the banking industry, comprising state-owned commercial banks, joint-stock commercial banks, city commercial banks, and rural commercial banks, were selected as the test set. The banking sector is considered a narrow-based industry due to the significant impact of industry-specific factors on listed companies. The period selected for sample classification spanned from January 1, 2020, to December 31, 2021, encompassing two years of daily closing price data. The data for the listed companies in the test set have been adjusted for rights, and companies that have been designated were excluded. Missing values were imputed using the closing price of the preceding trading day. The training set is composed of 51 samples, and the test set includes 28 samples. The time series length for this dataset is 487, with all data obtained from the RESSET database. The products and services offered by listed companies in the banking industry are relatively homogeneous, and the industry is significantly influenced by regulatory policies. The time series classification method introduced in this paper has been shown to produce a high degree of similarity within the screened investment portfolio for the banking industry, indicating strong interconnectivity.

Classification experiment results
The classification method described previously was first utilized to designate classification labels, which led to the acquisition of the classification results.As depicted in Table 6, samples labeled with 'Y' in the classification results were chosen to construct the investment portfolio for the experimental group.The control group consisted of all samples from the banking industry (bank), while the experimental group (banksel) included samples marked with 'Y' from the industry's integrated classification results.This same stock price increase/decrease classification prediction method was then applied to both the experimental and control groups, allowing for a comparative analysis of the two groups' performance in stock classification prediction.

Stock price rise/fall classification prediction experiment data and technical indicators
In this section, we adopt the methodology used by Zhuo Jinwu and Zhou Ying [40], employing the 20 technical indicators presented in Table 8. The sample range for calculating these indicators is the last 100 trading days leading up to December 31, 2022. The classification of each stock is determined daily by its price movements over the following one-day and three-day periods. Table 8 illustrates the filtration of the 20 technical indicators for both the control group (bank) and the experimental group (banksel). Figures 3 and 4 show the degree of linear correlation between the technical indicators and stock categories for the bank and banksel groups, respectively. A low correlation between a technical indicator and the stock category could undermine its effectiveness as a predictive input to the model. Hence, a threshold is usually established for selecting technical indicators; in this paper the threshold is 0.2, and technical indicators whose correlation coefficient with the stock category exceeds 0.2 in absolute value are selected as inputs for the model. Table 8 lists the technical indicators chosen for each portfolio, where TRUE signifies that an indicator has been selected, and FALSE indicates that its correlation is below 0.2 and it is therefore not selected.

Stock rise/fall classification prediction experimental result comparison
For selecting the number of hidden layer nodes, we reference the empirical formula [41] 2^X > N, where X represents the number of nodes in the hidden layer and N is the number of samples. Table 9 shows the correct prediction rates for the control group (bank) and the experimental group (banksel) with different numbers of hidden layer nodes. The experimental group (banksel) consistently outperforms the control group (bank) in classification prediction accuracy across the various node counts, achieving an average increase in accuracy of 7%. Although the two-step stock portfolio rise/fall classification prediction method does not directly yield an investment strategy, its superior performance in classifying stock price movements can inform a strategy of buying stocks predicted as 'good' and selling those predicted as 'bad.'
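Read as 2^X > N, the empirical rule gives the smallest admissible hidden-layer size directly; this is our reading of the formula, sketched below.

```python
import math

def min_hidden_nodes(n_samples):
    # smallest X satisfying 2**X > N (assumed reading of the empirical rule)
    return math.floor(math.log2(n_samples)) + 1

x = min_hidden_nodes(22)  # e.g. 22 training samples
```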

Conclusion
This paper addresses the significant noise present in financial time series by introducing a time series classification method that combines improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) and wavelet threshold for noise reduction. Initially, the method employs ICEEMDAN to decompose the time series into a set of IMF components and a residue. A noise detection method proposed in this paper is then used to categorize the IMF components and residue into components with and without noise. Subsequently, noise-inclusive components are processed through wavelet threshold de-noising, resulting in a non-noise sequence. This sequence, merged with the noise-reduced components, forms the final de-noised output, which is then classified using the 1-NN nearest neighbor algorithm. This approach not only enhances the benchmark method's performance, but also has significant application potential. For the first time, a combination of the t test and the unit root test is introduced to detect noise in time series components, offering new tools for the analysis of time series data. Through de-noising simulation experiments, the method proposed in this paper demonstrates superior performance over the benchmark method across five different threshold functions. In time series classification experiments with 68 UCR datasets, the proposed method outperforms the benchmark algorithm. Additionally, this paper presents a two-step stock portfolio classification prediction method that utilizes both a time series classification method and a BP neural network. Initially, the time series classification method screens stocks within a specific industry, resulting in a selected stock portfolio. Subsequently, the BP neural network algorithm predicts the directional movement of stock prices within this portfolio. Comparative experiments on quantitative portfolio investment show that the two-step classification prediction method offers stable and improved performance over the direct prediction method across various configurations of hidden layer nodes. The practical application confirms that the proposed time series classification method improves the predictive performance of existing methods and has promising prospects in quantitative portfolio investment.
The success of this approach lies in its ability to construct an investment portfolio closely aligned with the learning objective through empirical learning.This not only strategically leverages investors' experiential knowledge, but also promotes high similarity within the selected portfolio, enhancing the statistical effectiveness of its technical indicators.
In practical applications, to manage the impact of sudden market events on stock price behavior, a threshold should be set. Monitoring for unexpected events that could shift stock price patterns is crucial, with strategies adjusted when win rates or returns fall below the threshold.

(3) Unbiased risk estimate threshold (rigrsure): based on Stein's unbiased risk estimate principle for adaptive threshold selection. The threshold is λ₂ = σ_n √ω_b, where σ_n is the noise standard deviation and ω_b is the value minimizing the risk function.

Step 4 Calculate the Euclidean distances D(x_i, x_j) = √( Σ_{k=1}^{n} (x_{ik} − x_{jk})² ) between the de-noised signals of the training and testing sets, applying 1-NN to label each test series with its category index. This results in classifying each time series in the test set.

Fig. 3 Correlation between technical indicators and stock categories in the control group (bank)
Fig. 4 Correlation between technical indicators and stock categories in the experimental group (banksel)
Liu and Cheng EURASIP Journal on Advances in Signal Processing (2024) 2024:19

Table 1
SNR, MSE, and NCC values of results obtained after simulation de-noising

Table 2
Dataset information

Table 3
Classification accuracy rates of algorithms based on nearest neighbor ED

Table 5
IndexShares487 test dataset

A stock's category depends on its price movements over the next one-day and three-day periods. A stock is categorized as 'good' if its price rises by 2% the next day and by 3% over the next three days. Conversely, if the stock price declines on the next day and also over the next three days, it is categorized as 'bad'. All other stocks are designated as 'average'. Since December 31, 2022, falls on a weekend when the markets are closed, there are no stock category labels for the last three trading days of the year, December 28 to December 30, 2022. Consequently, neither the training nor the prediction samples include data from these days. As indicated in Table 7, the final five trading days out of the 27 are set aside as prediction samples, with the rest allocated as training samples.

Table 6
Classification results for samples in the banking industry

Table 7
Number of training samples and prediction quantities for each portfolio in the banking industry

Table 8
Technical indicators selected for each portfolio

Table 9
Correct prediction rate of control group (bank) and experimental group (banksel) at different hidden layer node counts