Extending the scope of empirical mode decomposition by smoothing
© Kim et al.; licensee Springer. 2012
Received: 22 February 2012
Accepted: 25 July 2012
Published: 7 August 2012
This article considers extending the scope of the empirical mode decomposition (EMD) method. The extension is aimed at noisy data and irregularly spaced data, which is necessary for widespread applicability of EMD. The proposed algorithm, called statistical EMD (SEMD), uses a smoothing technique instead of an interpolation when constructing upper and lower envelopes. Using SEMD, we discuss how to identify non-informative fluctuations such as noise, outliers, and ultra-high frequency components from the signal, and to decompose irregularly spaced data into several components without distortions.
When analyzing a complex signal, we frequently decompose it into several components having simple forms and then analyze the information contained in each component to reduce the complexity and to enhance interpretability. Conventionally, decomposition is processed using a basis system. The benefits of decomposition are as follows: (1) a signal is well approximated by a finite number of basis functions, (2) information in the time (physical) domain is transformed into information in the frequency domain without losing any information, and (3) the interpretability of the signal can be enhanced by analyzing each component separately and comparing it with the other components.
Spectral analysis and wavelet analysis [2–4] are popular methods for signal decomposition. However, when a signal has inherently nonstationary and nonlinear features that vary with scale and time location, these methods might not be suitable. Empirical mode decomposition (EMD), developed by Huang et al., provides a data-driven approach that decomposes a signal into so-called intrinsic mode functions (IMFs) according to the local oscillation magnitude in the physical domain. IMFs can be regarded as data-driven empirical basis functions. EMD has been widely used for analyzing nonstationary or nonlinear signals in many disciplines of science and engineering.
However, because the envelopes are constructed by interpolation, IMFs obtained by the conventional EMD algorithm are sensitive to non-informative fluctuations such as noise, outliers, and ultra-high frequency components, and the effect of these fluctuations distorts the subsequent decomposition results. In addition, the method has a narrow scope that does not cover irregularly sampled data. These constraints strongly diminish the applicability of EMD to various signals. To extend the scope of the conventional EMD to noisy signals and irregularly spaced data, we propose a statistical EMD algorithm, called SEMD, that is based on a smoothing technique. Like the conventional EMD, this method is fully data-adaptive. The proposed SEMD has several advantages over the conventional EMD: (1) It is robust to noise and to non-informative random fluctuations such as outliers and ultra-high frequency components, and hence SEMD can decompose such signals into appropriate IMFs without the distortion caused by these factors. (2) It provides a reasonable boundary condition for an IMF without any boundary treatment, and therefore SEMD can provide stable decomposition results on the entire domain, including the boundary regions. Furthermore, we extend EMD to irregularly spaced signals by combining SEMD with a simulation technique.
The remainder of this article is organized as follows. Section "Review: empirical mode decomposition" presents an overview of the conventional EMD. Section "Statistical EMD" describes the proposed SEMD method, and several case studies are presented to show its broad applicability. In addition, we investigate the variation diminishing property of SEMD. An extension to irregularly spaced signals is presented in Section "Extension of EMD to irregularly spaced signals". Finally, concluding remarks are presented in Section "Conclusion".
Before closing this section, we note that there have been several attempts in the literature to enhance the performance of the conventional EMD and to extend its scope. For example, to deal with noise, Boudraa and Cexus removed the high-frequency components using a filtering method, and Wu and Huang used an ensemble mean of simulated signals. Both methods are based on conventional sifting followed by a posterior adjustment. To apply the conventional EMD to signals with a lower sampling rate, Xu et al. proposed a hybrid extrema estimation algorithm based on Fourier interpolation. More recently, Diop et al. suggested a PDE-based approach to compute envelopes, which is another way to construct envelopes without interpolation.
Review: empirical mode decomposition
Fourier analysis decomposes a signal into a sum of sinusoids having different frequencies. However, it is well known that for nonstationary signals, Fourier analysis does not effectively provide frequency information of the signals. Although wavelet analysis is a popular method for analyzing nonstationary signals, it suffers from a nonadaptive nature in that it applies the same type of basis functions to the entire range of data. Wavelet analysis also represents a signal by a linear combination of wavelet basis functions. Therefore, its formulation for the energy-frequency representation of nonlinear data can be misleading. Thus, we require a set of flexible basis functions that reflects time-varying properties of a signal.
Huang et al. proposed a data-driven algorithm for extracting an oscillatory wave from a given signal x as follows. First, we identify the local extrema and construct two functions called the upper envelope and lower envelope by interpolating the local maxima and local minima, respectively. Second, we take their average; this produces a signal with a frequency lower than that of the original signal because the main pattern of the signal is confined between the two envelopes. Third, by subtracting the envelope mean from x, the highly oscillatory wave h is separated.
Repeating this sifting on what remains after each extracted mode gives the decomposition x(t) = Σ_{i=1}^{I} imf_i(t) + r(t), where r is the final residue. Here, index i denotes the resolution level, and imf_1 is the IMF at the finest level. We finally remark that Fourier analysis assumes that a signal is stationary and consists of pure-tone components. In practice, the frequency information can evolve over time, and several such frequencies can be compounded. The above EMD procedure is useful for identifying the amount of variation due to oscillation at different scales and time locations and for extracting an oscillatory wave from a nonstationary signal.
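The sifting steps described above can be sketched in a few lines. This is a minimal illustration rather than the implementation of Huang et al.: it assumes strict interior extrema, holds the envelopes constant beyond the outermost extrema, and, to stay dependency-free, substitutes piecewise-linear envelopes for the cubic-spline interpolation of the conventional EMD. The names `local_extrema`, `linear_envelope`, and `sift_once` are hypothetical.

```python
import math

def local_extrema(x):
    """Return index lists (maxima, minima) of strict interior local extrema."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima

def linear_envelope(idx, x, n):
    """Piecewise-linear envelope through (i, x[i]) for i in idx.

    Assumption: linear pieces stand in for cubic splines, and the envelope
    is held constant outside the first and last extremum.
    """
    env, j = [], 0
    for t in range(n):
        if t <= idx[0]:
            env.append(x[idx[0]])
        elif t >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            while idx[j + 1] < t:
                j += 1
            a, b = idx[j], idx[j + 1]
            w = (t - a) / (b - a)
            env.append((1 - w) * x[a] + w * x[b])
    return env

def sift_once(x):
    """One sifting pass: subtract the mean of the two envelopes from x."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        raise ValueError("too few extrema to build envelopes")
    upper = linear_envelope(maxima, x, len(x))
    lower = linear_envelope(minima, x, len(x))
    mean = [(u + l) / 2 for u, l in zip(upper, lower)]
    h = [xi - m for xi, m in zip(x, mean)]
    return h, mean

# A fast oscillation riding on a slow one.
n = 400
x = [math.sin(2 * math.pi * 20 * t / n) + math.sin(2 * math.pi * 2 * t / n)
     for t in range(n)]
h, mean = sift_once(x)
```

In this toy signal the envelope mean tracks the slow sinusoid, so `h` is dominated by the fast one; a full EMD would repeat the pass until the IMF conditions hold.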
Statistical EMD
One of the main purposes of EMD is to decompose a signal into several components and to identify its significant frequency components. It is not uncommon for a signal to be corrupted by non-informative random fluctuations such as noise, which might consist of high frequencies and contain no interpretable information. To handle such signals, the proposed SEMD algorithm proceeds as follows.
A. (Modified sifting) Take a signal x to be decomposed, and extract the first mode h_{1,λ} by using a smoothing technique.
(A-1) Identify the local maxima (minima) z of the current signal, which at the first iteration is the original signal x.
(A-2) Construct an upper envelope (lower envelope) by applying a smoothing technique with a smoothing parameter λ to the maxima (minima) z.
(A-3) Compute the local mean as the average of the two envelopes, and then obtain a candidate intrinsic mode by subtracting the local mean from the current signal.
(A-4) Repeat steps (A-1)–(A-3) on the candidate mode until the signal at the jth iteration satisfies the IMF conditions.
(A-5) Decompose the signal as x = h_{1,λ} + r_λ, where h_{1,λ} is defined as the limit of the candidate modes and r_λ is the remaining signal.
B. (Conventional sifting) If the remaining signal r_λ = x − h_{1,λ} has an intrinsic oscillation mode, then r_λ can be further decomposed by conventional sifting.
The only difference between the SEMD algorithm and the conventional EMD is step A, where the first mode is extracted by smoothing instead of interpolation. In particular, step (A-2), the construction of the upper and lower envelopes by smoothing, plays the most important role in determining the quality of the decomposition when the signal is corrupted by non-informative random fluctuations.
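The modified sifting of step A can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the smoother is taken to be a Nadaraya–Watson estimator with a Gaussian kernel whose bandwidth plays the role of λ, and a fixed number of iterations (`n_iter`) stands in for the IMF stopping conditions. All function names are hypothetical.

```python
import math
import random

def local_extrema(x):
    """Return index lists (maxima, minima) of strict interior local extrema."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima

def kernel_smooth(knots, values, grid, lam):
    """Nadaraya-Watson smooth of (knots, values), evaluated on grid.

    Assumption: Gaussian kernel with bandwidth lam acting as the
    smoothing parameter.
    """
    out = []
    for t in grid:
        w = [math.exp(-0.5 * ((t - k) / lam) ** 2) for k in knots]
        total = sum(w)
        if total == 0.0:  # all weights underflowed: use the nearest knot
            j = min(range(len(knots)), key=lambda i: abs(t - knots[i]))
            out.append(values[j])
        else:
            out.append(sum(wi * v for wi, v in zip(w, values)) / total)
    return out

def modified_sift(x, lam, n_iter=10):
    """Steps (A-1) to (A-5): returns (h, r) with x = h + r.

    Assumption: n_iter fixed passes replace the IMF stopping rule.
    """
    grid = range(len(x))
    h = list(x)
    for _ in range(n_iter):
        maxima, minima = local_extrema(h)               # (A-1)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper = kernel_smooth(maxima, [h[i] for i in maxima], grid, lam)  # (A-2)
        lower = kernel_smooth(minima, [h[i] for i in minima], grid, lam)
        h = [hi - (u + l) / 2 for hi, u, l in zip(h, upper, lower)]       # (A-3)
    r = [xi - hi for xi, hi in zip(x, h)]               # (A-5)
    return h, r

# A slow sinusoid contaminated by Gaussian noise.
rng = random.Random(0)
n = 300
x = [math.sin(2 * math.pi * t / 150) + rng.gauss(0.0, 0.3) for t in range(n)]
h, r = modified_sift(x, lam=5.0)
```

Here the remaining signal `r` is a sum of smoothed local means and is therefore much smoother than `x`, while `h` collects the rapid fluctuations.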
A key issue is how to determine the degree of smoothness (i.e., the smoothing parameter λ) in the smoothing process. We propose an automatic selection method for λ based on conventional cross-validation. Cross-validation splits the observations into K roughly equal-sized parts (for example, K = 4). For the kth part (the test dataset), we fit the model to the other K − 1 parts (the training dataset) of the observations and calculate the prediction error of the fitted model on the kth part. We repeat this procedure for k = 1,…,K and combine all K estimates of prediction error.
However, when the test dataset is omitted, the remaining training dataset becomes unequally spaced. Since the model fit is based on the decomposition, it is difficult to obtain stable fitting results with such unequally spaced data, and hence the conventional cross-validation method may not be directly applicable. We therefore impute the omitted points before fitting, as follows.
(i) Split a signal x into K test datasets T_1,…,T_k,…,T_K.
(ii) Impute the kth test dataset by the local average of the two neighboring points, and obtain the resulting composite signal.
(iii) With a given smoothing parameter λ, apply the SEMD algorithm to decompose the composite signal into h_{1,λ} and the remaining signal r_λ.
(iv) Obtain the predicted values of the remaining signal evaluated at the kth part.
(v) Repeat steps (ii)–(iv) for k = 1,…,K, and define the prediction error PE(λ) by combining, over all K parts, the discrepancies between the held-out observations and their predicted values.
Finally, by using an optimization algorithm such as the golden section search, we select the value of the smoothing parameter λ that minimizes the prediction error PE(λ). By considering each test dataset as new observations, it can be shown that the expectation of PE(λ) is close to the true prediction error. Thus, the above procedure is widely used for estimating the true prediction error.
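The selection procedure above can be sketched as below. Several choices are assumptions made for illustration and are not taken from the article: the folds are interleaved (every Kth point) so that each held-out point has two observed neighbours, PE(λ) is computed as a mean squared error, and a plain Gaussian kernel smooth (`gaussian_smooth`) stands in for the SEMD remaining signal r_λ so that the block stays short. All names are hypothetical.

```python
import math
import random

def gaussian_smooth(x, lam):
    """Stand-in for the SEMD remaining signal: Gaussian kernel smooth of x."""
    n = len(x)
    out = []
    for t in range(n):
        w = [math.exp(-0.5 * ((t - s) / lam) ** 2) for s in range(n)]
        out.append(sum(wi * xi for wi, xi in zip(w, x)) / sum(w))
    return out

def prediction_error(x, lam, fit, K=4):
    """K-fold PE(lam) with imputation of the held-out points (steps i to v)."""
    n = len(x)
    sq_err = 0.0
    for k in range(K):
        test = range(k, n, K)            # assumption: interleaved folds
        composite = list(x)
        for t in test:                   # impute by the two neighbours
            left = x[t - 1] if t > 0 else x[t + 1]
            right = x[t + 1] if t < n - 1 else x[t - 1]
            composite[t] = (left + right) / 2
        r = fit(composite, lam)          # remaining signal on the composite
        sq_err += sum((x[t] - r[t]) ** 2 for t in test)
    return sq_err / n                    # assumption: squared-error loss

def golden_section(f, lo, hi, tol=1e-2):
    """Minimise a unimodal function f on [lo, hi] by golden section search."""
    g = (math.sqrt(5) - 1) / 2
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:                      # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = f(c)
        else:                            # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2

rng = random.Random(1)
n = 120
x = [math.sin(2 * math.pi * t / 60) + rng.gauss(0.0, 0.3) for t in range(n)]
pe = lambda lam: prediction_error(x, lam, gaussian_smooth)
lam_hat = golden_section(pe, 0.5, 20.0, tol=0.05)
```

Passing the fitting routine as the argument `fit` keeps the cross-validation loop independent of the decomposition, so a full SEMD fit could be substituted for the stand-in smoother.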
We have some remarks regarding the SEMD algorithm.
The first mode h_{1,λ}: h_{1,λ} might not contain a meaningful mode when non-informative fluctuations such as noise are present, and hence it may not be appropriate to call the extracted mode an IMF. However, the extracted mode can be considered an IMF in the case of noiseless signals.
Modified sifting at the first level: The modified sifting can be employed to extract further IMFs beyond the first extracted mode h_{1,λ}. However, in our experience based on extensive simulation studies, SEMD effectively extracts noise from a noisy signal x at the first level. Furthermore, Rilling and Flandrin and Park et al. showed that when there is a large discrepancy between the frequencies of two components of a signal, the ordinary sifting process cannot correctly estimate the relatively low frequency component, which results in misidentifying the relatively high frequency component. Since noise acts as a high frequency component and the modified sifting, which uses smoothing, effectively estimates the low frequency component, SEMD seems to extract noise effectively at the first level.
Smoothing technique: Several smoothing techniques, including kernel smoothing, smoothing splines, and local polynomial methods, are well developed. In this study, we use kernel smoothing with a Gaussian kernel. In practice, any smoothing method can be adopted for the SEMD algorithm.
The role of the smoothing parameter λ: The performance of the modified sifting depends on the choice of λ. Consider two special cases: (1) λ = 0: both envelopes are constructed by interpolation, and hence the extracted results are identical to those of the conventional EMD; (2) λ = ∞: both envelopes are weighted averages of the local extrema, so that the extracted mode becomes an over-smoothed function that might not be suitable for representing any frequency pattern of the original signal. In the latter case, no meaningful modes can be extracted further. To avoid this problem, we propose the data-driven cross-validation approach to select an optimal λ. Finally, we remark that since PE(λ) is a reasonable estimate of the true prediction error, the resulting λ should be close to 0 when the signal is noise-free, and the resulting fit is then almost identical to the interpolation result. In summary, SEMD is applicable to both noisy and noise-free signals.
The choice of K: Throughout this article, we use K = 4, so that the entire signal is divided into four parts. K can be any integer up to n. The case K = n is known as 'leave-one-out' cross-validation, where each observation forms its own part, κ(t) = t, and the predicted value for the tth observation is evaluated using all the data except the tth observation. Because this requires n separate fits, leave-one-out cross-validation is computationally intensive.
Sensitivity to the imputation method: An imputation method is required for the derivation of PE(λ). In this study, we use the local average of the two neighboring points, which is simple and fast. It can also be replaced by a more advanced imputation technique such as the EM algorithm. However, for all cases in the article, we observe that the selected smoothing parameters are almost identical.
Computation cost: Compared with the conventional EMD, the SEMD algorithm requires a longer computational time, mainly because of the smoothing parameter selection. However, once the smoothing parameter for the first mode is selected, the proposed algorithm is even faster than the conventional method when a signal is contaminated by random fluctuations, because the remaining steps are almost identical to those of the conventional EMD, and in this case the conventional EMD tends to produce extra artificial modes (this observation will be shown in subsequent sections). In addition, the computational burden of the above K-fold cross-validation procedure is not considerable.
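The two limiting cases in the remark on the role of λ can be checked numerically for one particular smoother. The sketch below assumes a Nadaraya–Watson estimator with a Gaussian kernel of bandwidth λ, which is one possible reading of "kernel smoothing with a Gaussian kernel"; the function name and the toy extrema are hypothetical. For a very small λ the smoothed envelope reproduces the extremum values at the extrema themselves, and for a very large λ it collapses to their average.

```python
import math

def kernel_smooth(knots, values, grid, lam):
    """Nadaraya-Watson smooth with a Gaussian kernel of bandwidth lam.

    Assumption: this form of kernel smoothing; no guard against all
    weights underflowing, which cannot happen in the two calls below.
    """
    out = []
    for t in grid:
        w = [math.exp(-0.5 * ((t - k) / lam) ** 2) for k in knots]
        out.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
    return out

# Toy locations and heights of local maxima.
knots = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [1.0, 3.0, 2.0, 5.0, 4.0]

# lam near 0: evaluated at the extrema, the envelope passes through them.
near_zero = kernel_smooth(knots, values, knots, lam=0.05)

# lam very large: every point receives the same average of the extrema.
very_large = kernel_smooth(knots, values, [0.5, 2.0, 3.7], lam=1e6)
```

Between these extremes, λ trades fidelity to the individual extrema against suppression of noise-induced wiggles, which is the trade-off that the cross-validation criterion resolves.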
Here, we discuss a theoretical property of SEMD, namely, the variation diminishing property of envelopes. It implies that, as the value of the smoothing parameter increases, the variation of the envelopes decreases monotonically. In other words, structures of the envelopes such as peaks and valleys disappear monotonically as the level of smoothing increases. Thus, lower and upper envelopes generated with a sufficient level of smoothing should not contain artifacts due to noise. This fact is known as causality in the scale-space literature.
for any positive value λ ′ ≤ λ.
for all λ_1, λ_2 > 0. By Theorem of, it follows that the number of sign changes in the derivative of the envelope is a monotonically decreasing function of λ. □
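The variation diminishing behaviour can be observed empirically. The sketch below is only a numerical illustration, not a substitute for the proof: it assumes a Gaussian-kernel Nadaraya–Watson smoother on a regular grid, applies it to a toy sequence (a sinusoid with an alternating perturbation standing in for noisy extrema), and counts the sign changes of the first differences at several values of λ. All names and the test sequence are hypothetical.

```python
import math

def kernel_smooth(values, lam):
    """Gaussian-kernel smooth of a regularly spaced sequence (assumed form)."""
    n = len(values)
    out = []
    for t in range(n):
        w = [math.exp(-0.5 * ((t - s) / lam) ** 2) for s in range(n)]
        out.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
    return out

def count_turning_points(y):
    """Number of sign changes in the first differences of y."""
    diffs = [b - a for a, b in zip(y, y[1:]) if b != a]
    return sum(1 for d0, d1 in zip(diffs, diffs[1:]) if d0 * d1 < 0)

# Slow sinusoid plus a rapidly alternating perturbation.
z = [math.sin(2 * math.pi * i / 50) + 0.3 * (-1) ** i for i in range(200)]

# Turning-point counts for increasing amounts of smoothing.
counts = {lam: count_turning_points(kernel_smooth(z, lam))
          for lam in (0.2, 1.0, 3.0, 6.0)}
```

With almost no smoothing nearly every point is a turning point, whereas with a large λ only the turning points of the underlying sinusoid should remain.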
Extrema from a signal contaminated by noise are sensitive to noise or outliers. Thus, it is necessary to filter out such insignificant terms when constructing the upper and lower envelopes, and hence, the sifting process using filtered envelopes can produce stable IMFs.
As discussed below, the conventional EMD cannot properly handle a signal containing an ultra-high frequency component because it is difficult to obtain the desired upper and lower envelopes by interpolation. This problem can be solved by employing a smoothing approach.
Inadequate information is available on the modulation of two boundaries before the first extremum and after the last extremum when constructing envelopes. Thus, using smoothing instead of interpolation for extrema can alleviate the boundary problem.
SEMD for noisy signals
SEMD for signals with outliers
emd: conventional EMD,
semd: proposed SEMD, and
sure: SURE wavelet shrinkage method of Donoho and Johnstone.
From the simulation results, the following main observations can be made: (i) noise distorts the decomposition results in the case of the conventional EMD, (ii) wavelet shrinkage outperforms the conventional EMD in recovering the true function, (iii) SEMD is the most effective in removing noise from a noisy signal, and (iv) SEMD is robust to the presence of extreme values. In summary, the simulation results illustrate that SEMD is an effective decomposition method for separating noise or outliers from signals.
SEMD for signals with an ultra-high frequency component
Boundary treatment by SEMD
When constructing envelopes during the sifting process, inadequate information is available on the modulation of two boundaries before the first extremum and after the last extremum. Unless the boundaries are properly treated, large swings occur on both sides, and these eventually distort the entire decomposition result. This phenomenon is particularly exaggerated in lower-frequency IMFs because there is inadequate information on an intrinsic mode. In addition to traditional boundary treatments such as periodic or symmetric conditions, Huang et al. extended the original signal by adding artificial waves called characteristic waves, and these can be constructed by repeating the intrinsic mode formed by extreme values nearest to the boundary.
Extension of EMD to irregularly spaced signals
We consider an extension of EMD to irregularly spaced signals. The conventional EMD interpolates in-between extrema using cubic splines; this might not be appropriate for obtaining the upper and lower envelopes when the observed data are scattered: they are not observed on regular (spatial) grids, and they have spatially inhomogeneous densities including data voids of various sizes.
Here, we propose a new method based on the combination of a simulation technique for generating random fields and the SEMD algorithm, called simulation-based SEMD. This method can be easily adopted for one-dimensional signals. The proposed method comprises two steps: (1) Extrema are generated on a regularly spaced domain by a simulation method. (2) The upper and lower envelopes are constructed using the simulated extrema and the SEMD algorithm.
A key feature of the proposed simulation-based SEMD method is that it can integrate various patterns between the simulated extrema. Furthermore, the uncertainty of the resulting IMFs can be evaluated on the basis of several sets of simulations.
The signal at the regularly spaced locations can be written as x(t) = p(t;x) + e(t), where p(t;x) denotes the kriging predictor at regularly spaced locations t = (t_1,…,t_n) that depends on x. The quantity e(t) = x(t) − p(t;x) denotes the kriging residuals, which are not available in practice. Therefore, we generate e(t) with an estimated covariance. Using the simulated values p(t;x) + e(t), we identify the extrema and use SEMD to obtain the IMF at t. Irregularly spaced IMFs are then derived at the original observation locations s.
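The predictor-plus-simulated-residual construction can be illustrated with heavily simplified stand-ins. In the sketch below, which is not the authors' procedure, piecewise-linear interpolation replaces the kriging predictor p(t;x), and independent Gaussian draws whose spread is estimated from leave-one-out interpolation errors replace residuals simulated from an estimated covariance. All function names and the toy data are hypothetical; the sketch only shows how several simulated regular-grid series arise from one irregular sample.

```python
import math
import random

def linear_predictor(s, x, grid):
    """Stand-in for the kriging predictor: piecewise-linear interpolation.

    Assumes s is sorted with distinct values and grid is ascending.
    """
    out, j = [], 0
    for t in grid:
        while j < len(s) - 2 and s[j + 1] < t:
            j += 1
        w = (t - s[j]) / (s[j + 1] - s[j])
        out.append((1 - w) * x[j] + w * x[j + 1])
    return out

def residual_sd(s, x):
    """Spread of leave-one-out linear prediction errors at interior points."""
    errs = []
    for i in range(1, len(s) - 1):
        w = (s[i] - s[i - 1]) / (s[i + 1] - s[i - 1])
        errs.append(x[i] - ((1 - w) * x[i - 1] + w * x[i + 1]))
    return math.sqrt(sum(e * e for e in errs) / len(errs))

def simulate_on_grid(s, x, grid, seed):
    """One simulated series p(t; x) + e(t) on the regular grid.

    Assumption: e(t) is drawn independently, ignoring spatial correlation.
    """
    rng = random.Random(seed)
    p = linear_predictor(s, x, grid)
    sd = residual_sd(s, x)
    return [pi + rng.gauss(0.0, sd) for pi in p]

# Irregularly spaced observations of a two-component signal.
rng = random.Random(42)
s = sorted(rng.uniform(0.0, 10.0) for _ in range(60))
x = [math.sin(si) + 0.5 * math.sin(5 * si) for si in s]

# Regular grid spanning the observed range, and three simulated series.
grid = [s[0] + (s[-1] - s[0]) * i / 99 for i in range(100)]
sims = [simulate_on_grid(s, x, grid, seed) for seed in range(3)]
```

Each simulated series would then be passed to extrema identification and SEMD, and the spread of the resulting IMFs across simulations gives a rough indication of their uncertainty.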
Conclusion
In this article, we have proposed a statistical EMD to deal with a noisy signal by combining smoothing techniques and the conventional EMD. The results obtained from various numerical experiments confirm the effectiveness of the statistical EMD method. Furthermore, we have extended EMD to irregularly spaced signals by utilizing simulated extrema. These extensions of the conventional EMD are expected to increase the applications of EMD.
Further studies of the proposed SEMD are needed. The current SEMD algorithm requires the selection of a smoothing parameter, which is computationally expensive and might be an obstacle to handling massive data. Hence, it is necessary to develop a computationally efficient method of smoothing parameter selection. As another possible refinement of SEMD, we would like to investigate the intermittence problem of mode mixing, in which different modes of oscillation coexist in a single IMF. Finally, although SEMD is relatively robust to outliers compared with the conventional EMD, a least-squares-based smoothing method such as kernel smoothing can be affected by outliers in the construction of envelopes. Therefore, a quantile-based EMD would merit further study.
The work of Hee-Seok Oh was supported by the National Research Foundation of Korea (NRF) grant (No. 2012002712) funded by the Korea government (MEST). The research of Donghoh Kim was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2009-0076223).
- Priestley MB: Spectral Analysis and Time Series, vols. 1 and 2. (Academic Press, New York, 1981)
- Mallat S: A Wavelet Tour of Signal Processing. (Academic Press, New York, 2009)
- Daubechies I: Ten Lectures on Wavelets. (SIAM, Philadelphia, 1992)
- Vidakovic B: Statistical Modeling by Wavelets. (John Wiley & Sons, New York, 1999)
- Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zheng Q, Yen NC, Tung CC, Liu HH: The empirical mode decomposition and Hilbert spectrum for nonlinear and nonstationary time series analysis. Proc. Roy. Soc. Lond. A 1998, 454: 903-995. 10.1098/rspa.1998.0193
- Huang NE, Shen SSP: Hilbert-Huang Transform and Its Applications. (World Scientific, Singapore, 2005)
- Boudraa AO, Cexus JC: EMD-based signal filtering. IEEE Trans. Instrum. Meas 2007, 56: 2196-2202.
- Wu Z, Huang NE: Ensemble empirical mode decomposition: a noise assisted data analysis method. Adv. Adapt. Data Anal 2009, 1: 1-49. 10.1142/S1793536909000047
- Xu Z, Huang B, Zhang F: Improvement of empirical mode decomposition under low sampling rate. Signal Process 2009, 89: 2296-2303. 10.1016/j.sigpro.2009.04.038
- Diop EHS, Alexandre R, Boudraa AO: Analysis of intrinsic mode functions: a PDE approach. IEEE Signal Process. Lett 2010, 17: 398-401.
- Hastie T, Tibshirani R: Generalized Additive Models. (Chapman and Hall, London, 1990)
- Rilling G, Flandrin P: One or two frequencies? The empirical mode decomposition answers. IEEE Trans. Signal Process 2008, 56: 85-95.
- Park M, Kim D, Oh HS: A reinterpretation of EMD by cubic spline interpolation. Adv. Adapt. Data Anal 2011, 3: 527-540. 10.1142/S1793536911000921
- Lindeberg T: Scale-Space Theory in Computer Vision. (Kluwer, Boston, 1994)
- Silverman BW: Using kernel density estimates to investigate multimodality. J. Roy. Stat. Soc. B 1981, 43: 97-99.
- Donoho DL, Johnstone IM: Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc 1995, 90: 1200-1224. 10.1080/01621459.1995.10476626
- Donoho DL, Johnstone IM: Ideal spatial adaptation by wavelet shrinkage. Biometrika 1994, 81: 425-455. 10.1093/biomet/81.3.425
- Fan J, Gijbels I: Data-driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation. J. Roy. Stat. Soc. B 1995, 57: 371-394.
- Marron JS, Adak S, Johnstone IM, Neumann MH, Patil P: Exact risk analysis of wavelet regression. J. Comput. Graph. Stat 1998, 7: 278-309.
- Cai TT: Adaptive wavelet estimation: a block thresholding and oracle inequality approach. Ann. Stat 1999, 27: 898-924. 10.1214/aos/1018031262
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.