Extending the scope of empirical mode decomposition by smoothing
 Donghoh Kim^{1},
 Kyungmee O Kim^{2} and
 Hee-Seok Oh^{3}
https://doi.org/10.1186/168761802012168
© Kim et al.; licensee Springer. 2012
Received: 22 February 2012
Accepted: 25 July 2012
Published: 7 August 2012
Abstract
This article considers extending the scope of the empirical mode decomposition (EMD) method. The extension is aimed at noisy data and irregularly spaced data, which is necessary for widespread applicability of EMD. The proposed algorithm, called statistical EMD (SEMD), uses a smoothing technique instead of an interpolation when constructing upper and lower envelopes. Using SEMD, we discuss how to identify noninformative fluctuations such as noise, outliers, and ultrahigh frequency components from the signal, and to decompose irregularly spaced data into several components without distortions.
Introduction
When analyzing a complex signal, we frequently decompose it into several components having simple forms and then analyze the information contained in each component to reduce the complexity and to enhance interpretability. Conventionally, decomposition is processed using a basis system. The benefits of decomposition are as follows: (1) a signal is well approximated by a finite number of basis functions, (2) information in the time (physical) domain is transformed into information in the frequency domain without losing any information, and (3) the interpretability of the signal can be enhanced by analyzing each component separately and comparing it with the other components.
Spectral analysis[1] and wavelet analysis[2–4] are popular methods for signal decomposition. However, when a signal has inherent nonstationary and nonlinear features that vary with scale and time location, these methods might not be suitable. Empirical mode decomposition (EMD), developed by Huang et al.[5], provides a data-driven approach to decompose a signal into so-called intrinsic mode functions (IMFs) according to the local oscillation magnitude in the physical domain. IMFs can be considered as data-driven empirical basis functions. EMD has been widely used for analyzing nonstationary or nonlinear signals in many disciplines of science and engineering[6].
However, because envelopes are constructed by interpolation, IMFs obtained by the conventional EMD algorithm are sensitive to noninformative fluctuations such as noise, outliers, and ultrahigh-frequency components, and these fluctuations distort the subsequent decomposition results. In addition, the method has a narrow scope that does not cover irregularly sampled data. These constraints strongly diminish the applicability of EMD to various signals. To extend the scope of the conventional EMD to noisy signals and irregularly spaced data, we propose a statistical EMD algorithm called SEMD that is based on a smoothing technique. Like the conventional EMD, this method is a fully data-adaptive algorithm. The proposed SEMD has several advantages over the conventional EMD: (1) It is robust to noise or noninformative random fluctuations such as outliers and ultrahigh-frequency components, and hence, SEMD can decompose such signals into appropriate IMFs without distortion caused by these factors. (2) It provides a reasonable boundary condition of an IMF without any boundary treatment, and therefore, SEMD can provide stable decomposition results on the entire domain including boundary regions. Furthermore, we extend EMD to analyze irregularly spaced signals by combining SEMD with a simulation technique.
The remainder of this article is organized as follows. Section "Review: empirical mode decomposition" presents an overview of the conventional EMD. Section "Statistical EMD" describes the proposed SEMD method, and several case studies are presented to show its broad applicability. In addition, we investigate the variation diminishing property of SEMD. An extension to an irregularly spaced signal is presented in Section "Extension of EMD to irregularly spaced signals". Finally, concluding remarks are presented in Section "Conclusion".
Before closing this section, we note that, in the literature, there have been several attempts to enhance the performance of the conventional EMD and to extend its scope. For example, to deal with noise, Boudraa and Cexus[7] removed the high-frequency components using a filtering method, and Wu and Huang[8] used the ensemble mean approach of the simulated signal. Both methods are based on conventional sifting followed by a posterior adjustment. For applying the conventional EMD to signals with a lower sampling rate, Xu et al.[9] proposed a hybrid extrema estimation algorithm based on Fourier interpolation. More recently, Diop et al.[10] suggested a PDE-based approach to compute envelopes, which is another way to construct envelopes without interpolation.
Review: empirical mode decomposition
Fourier analysis decomposes a signal into a sum of sinusoids having different frequencies. However, it is well known that for nonstationary signals, Fourier analysis does not effectively provide frequency information of the signals. Although wavelet analysis is a popular method for analyzing nonstationary signals, it suffers from a nonadaptive nature in that it applies the same type of basis functions to the entire range of data. Wavelet analysis also represents a signal by a linear combination of wavelet basis functions. Therefore, its formulation for the energy-frequency representation of nonlinear data can be misleading[5]. Thus, we require a set of flexible basis functions that reflects time-varying properties of a signal.
Huang et al.[5] proposed a data-driven algorithm for extracting an oscillatory wave from a given signal x as follows. First, we identify the local extrema and construct two functions called the upper envelope and lower envelope by interpolating the local maxima and local minima, respectively. Second, we take their average; this produces a signal with a frequency lower than that of the original signal because the main pattern of the signal is confined between the two envelopes. Third, by subtracting the envelope mean from x, the highly oscillatory wave h is separated.
Applying this sifting procedure repeatedly to the remaining signal decomposes x as $x\left(t\right)=\sum _{i=1}^{N}{\mathrm{imf}}_{i}\left(t\right)+r\left(t\right)$, where r is the final residue. Here, index i denotes the resolution level and imf_{1} is the IMF at the finest level. We finally remark that Fourier analysis assumes that a signal is stationary and consists of components of a pure tone. In practice, the frequency information can evolve over time and several such frequencies can be compounded. The above EMD procedure is useful for identifying the amount of variation due to oscillation at different scales and time locations and for extracting an oscillatory wave from a nonstationary signal.
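The three sifting steps described above (locate extrema, average the two envelopes, subtract the mean) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the envelopes are piecewise linear rather than cubic splines, they are held constant outside the outermost extrema, only a single sifting pass is performed, and the function names `local_extrema`, `linear_envelope`, and `sift_once` are hypothetical.

```python
import math


def local_extrema(x):
    """Indices of strict interior local maxima and minima of a sequence."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def linear_envelope(idx, x):
    """Piecewise-linear curve through the points (i, x[i]) for i in idx.

    Assumption for this sketch: the envelope is held constant before the
    first and after the last extremum (a crude boundary rule).
    """
    env = []
    j = 0
    for t in range(len(x)):
        if t <= idx[0]:
            env.append(x[idx[0]])
        elif t >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            # advance to the pair of extrema that brackets t
            while idx[j + 1] < t:
                j += 1
            a, b = idx[j], idx[j + 1]
            w = (t - a) / (b - a)
            env.append((1 - w) * x[a] + w * x[b])
    return env


def sift_once(x):
    """One sifting pass: returns the oscillatory part h and the envelope mean."""
    maxima, minima = local_extrema(x)
    upper = linear_envelope(maxima, x)
    lower = linear_envelope(minima, x)
    mean = [(u + l) / 2 for u, l in zip(upper, lower)]
    h = [xi - mi for xi, mi in zip(x, mean)]
    return h, mean


# Test signal: a slow sine plus a faster, smaller sine
slow = [math.sin(2 * math.pi * k / 100) for k in range(200)]
fast = [0.5 * math.sin(2 * math.pi * k / 10) for k in range(200)]
x = [s + f for s, f in zip(slow, fast)]
h, mean = sift_once(x)
```

In this example the envelope mean tracks the slow sine and h carries the fast oscillation; repeating the pass on h until the IMF conditions hold would give the first IMF.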
Statistical EMD
One of the main purposes of EMD is to decompose a signal into several components and to identify its significant frequency components. It is not uncommon for a signal to be corrupted by noninformative random fluctuations such as noise, which might consist of high frequencies and contain no interpretable information. To handle such signals, the proposed SEMD algorithm proceeds as follows.
 A. (Modified sifting) Take a signal x to be decomposed, and extract the first mode h_{1,λ} by using a smoothing technique.

(A1) Identify the local maxima (minima) z of the signal ${h}_{1,\lambda}^{0}$, where ${h}_{1,\lambda}^{0}$ is the original signal x.

(A2) Construct an upper envelope ${\widehat{u}}_{\lambda}$ (lower envelope ${\widehat{\ell}}_{\lambda}$) by applying a smoothing technique with a smoothing parameter λ to the maxima (minima) z.

(A3) Compute the local mean ${m}_{\lambda}=\frac{1}{2}({\widehat{u}}_{\lambda}+{\widehat{\ell}}_{\lambda})$ as the average of the two envelopes, and then obtain a candidate intrinsic mode ${h}_{1,\lambda}^{1}={h}_{1,\lambda}^{0}-{m}_{\lambda}$.

(A4) Repeat steps (A1)–(A3) for the signal ${h}_{1,\lambda}^{1}$ until the signal ${h}_{1,\lambda}^{j}$ at the j th iteration satisfies the IMF conditions.

(A5) Decompose the signal x = h_{1,λ} + r_{λ}, where h_{1,λ} is defined as the limit of ${h}_{1,\lambda}^{j}$ and r_{λ} is the remaining signal.

 B. (Conventional sifting) If the remaining signal r_{λ} = x − h_{1,λ} has an intrinsic oscillation mode, then r_{λ} can be further decomposed by conventional sifting.
The only difference between the SEMD algorithm and the conventional EMD is step A, where the first mode is extracted by smoothing instead of interpolation. In particular, the construction of ${\widehat{u}}_{\lambda}$ and ${\widehat{\ell}}_{\lambda}$ by smoothing in step (A2) plays the most important role in determining the quality of the decomposition when the signal is corrupted by noninformative random fluctuations.
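Step A can be sketched as follows, assuming a Gaussian-kernel (Nadaraya-Watson) smoother for the envelopes in step (A2). Several choices here are assumptions made only for illustration: the bandwidth `lam` plays the role of λ, the IMF stopping condition of step (A4) is replaced by a fixed number of passes `n_iter`, and the names `kernel_envelope` and `semd_first_mode` are hypothetical.

```python
import math
import random


def local_extrema(x):
    """Indices of strict interior local maxima and minima (step A1)."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def kernel_envelope(idx, x, lam):
    """Gaussian-kernel smooth of the points (i, x[i]), i in idx (step A2)."""
    env = []
    for t in range(len(x)):
        e = [-0.5 * ((t - i) / lam) ** 2 for i in idx]
        top = max(e)  # subtract the largest exponent so the weights never all underflow
        w = [math.exp(v - top) for v in e]
        env.append(sum(wi * x[i] for wi, i in zip(w, idx)) / sum(w))
    return env


def semd_first_mode(x, lam, n_iter=3):
    """Steps (A1)-(A5) with a fixed number of passes instead of an IMF test."""
    h = list(x)
    for _ in range(n_iter):
        maxima, minima = local_extrema(h)
        if not maxima or not minima:
            break
        upper = kernel_envelope(maxima, h, lam)
        lower = kernel_envelope(minima, h, lam)
        # step A3: subtract the local mean of the two smoothed envelopes
        h = [hi - (u + l) / 2 for hi, u, l in zip(h, upper, lower)]
    r = [xi - hi for xi, hi in zip(x, h)]  # step A5: x = h + r
    return h, r


random.seed(0)
n = 400
clean = [math.sin(2 * math.pi * t / 100) for t in range(n)]
noisy = [c + random.gauss(0, 0.2) for c in clean]
h1, r = semd_first_mode(noisy, lam=6.0)

err_noisy = sum((a - b) ** 2 for a, b in zip(noisy, clean)) / n
err_r = sum((a - b) ** 2 for a, b in zip(r, clean)) / n
print(f"MSE of noisy signal: {err_noisy:.4f}, MSE of remaining signal: {err_r:.4f}")
```

Here the bandwidth is fixed by hand; in the article it is selected by the cross-validation procedure.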
A key issue that needs to be considered is how to determine the degree of smoothness (i.e., the smoothing parameter λ) in the smoothing process. We propose an automatic selection method for λ based on conventional cross-validation. Cross-validation splits the observations into K roughly equal-sized parts (for example, K = 4). For the k th part (the test dataset), we fit the model to the other K−1 parts (the training dataset) and calculate the prediction error of the fitted model on the k th part. We perform this procedure for k = 1,…,K and combine all K estimates of prediction error.
However, once the test dataset is omitted, the remaining training dataset used for fitting the model becomes unequally spaced. Since the model fit is based on the decomposition, it is difficult to obtain stable fitting results with such unequally spaced data, and hence, the conventional cross-validation method may not be directly applicable to this case. We therefore use the following modified procedure, in which the test dataset is imputed rather than omitted.
 (i) Split a signal x into K test datasets T_{1},…,T_{k},…,T_{K}.
 (ii) Impute the k th test dataset by the local average of two neighboring points and obtain ${\stackrel{~}{T}}_{k}$.
 (iii) With a given smoothing parameter λ, apply the SEMD algorithm to decompose the composite signal ${T}_{1},\dots ,{T}_{k-1},{\stackrel{~}{T}}_{k},{T}_{k+1},\dots ,{T}_{K}$ into an h_{1,λ} and the remaining signal r_{λ}.
 (iv) Obtain the predicted values of the remaining signal evaluated at the k th part, say ${r}_{\lambda}^{k}\left(t\right)$.
 (v) Repeat steps (ii)–(iv) for k = 1,…,K, and define the prediction error as $\mathrm{PE}\left(\lambda \right)=\frac{1}{n}\sum _{t=1}^{n}{\{x\left(t\right)-{r}_{\lambda}^{k}\left(t\right)\}}^{2}.$
Finally, by using an optimization algorithm such as the golden section search, we select the value of the smoothing parameter λ that minimizes the prediction error PE(λ). By considering each test dataset as new observations, it can be shown that the expectation of PE(λ) is close to the true prediction error[11]. For this reason, cross-validation of this kind is widely used for estimating the true prediction error.
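Steps (i) to (v) and the golden section search can be sketched as below. The sketch rests on assumptions that the text does not fix: the K test datasets are taken as interleaved points (every K-th observation), so that both neighbors of a test point belong to the training data; the remaining signal is approximated by the envelope mean of a single smoothing pass with a Gaussian kernel; and the search bracket [0.5, 20] is arbitrary. All function names are hypothetical.

```python
import math
import random


def local_extrema(x):
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def kernel_envelope(idx, x, lam):
    env = []
    for t in range(len(x)):
        e = [-0.5 * ((t - i) / lam) ** 2 for i in idx]
        top = max(e)  # stabilise the weights against underflow
        w = [math.exp(v - top) for v in e]
        env.append(sum(wi * x[i] for wi, i in zip(w, idx)) / sum(w))
    return env


def remaining_signal(x, lam):
    """Simplified r_lambda: the envelope mean of one smoothing pass."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return list(x)
    upper = kernel_envelope(maxima, x, lam)
    lower = kernel_envelope(minima, x, lam)
    return [(u + l) / 2 for u, l in zip(upper, lower)]


def prediction_error(x, lam, K=4):
    """PE(lambda) from steps (i)-(v), with interleaved folds (an assumption)."""
    n = len(x)
    total = 0.0
    for k in range(K):
        test = range(k, n, K)                      # step (i)
        imputed = list(x)
        for t in test:                             # step (ii)
            nb = [x[j] for j in (t - 1, t + 1) if 0 <= j < n]
            imputed[t] = sum(nb) / len(nb)
        r = remaining_signal(imputed, lam)         # steps (iii)-(iv)
        total += sum((x[t] - r[t]) ** 2 for t in test)
    return total / n                               # step (v)


def golden_section(f, a, b, tol=1e-2):
    """Minimise a unimodal function f on [a, b] by golden section search."""
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2


random.seed(1)
n = 160
x = [math.sin(2 * math.pi * t / 80) + random.gauss(0, 0.2) for t in range(n)]
lam_hat = golden_section(lambda lam: prediction_error(x, lam), 0.5, 20.0, tol=0.1)
print(f"selected lambda: {lam_hat:.2f}, PE: {prediction_error(x, lam_hat):.4f}")
```

Golden section search assumes a unimodal objective. PE(λ) need not be unimodal, so in practice a coarse grid evaluation before the search may be a safer starting point.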
We have some remarks regarding the SEMD algorithm.

The first mode h_{1,λ}: The mode h_{1,λ} might not contain a meaningful oscillation when noninformative fluctuations such as noise are present, and hence, it may not be appropriate to call the extracted mode an IMF. However, the extracted mode can be considered an IMF in the case of noiseless signals.

Modified sifting at the first level: The modified sifting can be employed to extract further IMFs beyond the first extracted mode h_{1,λ}. However, in our experience based on extensive simulation studies, SEMD effectively extracts noise from a noisy signal x at the first level. Furthermore, Rilling and Flandrin[12] and Park et al.[13] showed that when there is a large discrepancy between the frequencies of two components of a signal, the ordinary sifting process cannot correctly estimate the relatively low-frequency component, which results in misidentifying the relatively high-frequency component. Since noise acts as a high-frequency component and the modified sifting based on smoothing effectively estimates the low-frequency component, SEMD seems to extract noise effectively at the first level.

Smoothing technique: Several smoothing techniques, including kernel smoothing, smoothing splines, and local polynomial methods, are well developed. In this study, we use kernel smoothing with a Gaussian kernel. In practice, any smoothing method can be adopted for the SEMD algorithm.

The role of the smoothing parameter λ: The performance of the modified sifting depends on the choice of λ. We now consider two special cases: (1) λ = 0: both envelopes ${\widehat{u}}_{\lambda}$ and ${\widehat{\ell}}_{\lambda}$ are constructed by interpolation, and hence, the extracted results are identical to those of the conventional EMD; and (2) λ = ∞: both envelopes ${\widehat{u}}_{\lambda}$ and ${\widehat{\ell}}_{\lambda}$ are weighted averages of the local extrema, so that the extracted mode becomes an oversmoothed function, which might not be suitable for representing any frequency pattern of the original signal. This implies that no further meaningful modes can be extracted. To overcome this problem, we propose the data-driven cross-validation approach to select an optimal λ. Finally, we remark that since PE(λ) is a reasonable estimate of the true prediction error, the resulting λ should be close to 0 when the signal is noise-free. Thus, the resulting fit is almost identical to the interpolation result in the case of a noise-free signal. In summary, SEMD is applicable to both noisy and noise-free signals.

The choice of K: Throughout this article, we use K = 4, so that the entire signal is divided into four parts. K can be chosen to be any number less than n. The case K = n is known as ‘leave-one-out’ cross-validation, where κ(t) = t, that is, each observation forms its own part, and the predicted value for the t th observation is evaluated using all the data except the t th observation. Thus, leave-one-out cross-validation is computationally intensive.

Sensitivity to the imputation method: An imputation method is required for the derivation of PE(λ). In this study, we use the local average of two neighboring points for imputation, which is simple and fast. A more advanced imputation technique, such as the EM algorithm, can also be used. However, for all cases in the article, we observe that the resulting choices of the smoothing parameter are almost identical.

Computation cost: Compared to the conventional EMD, the SEMD algorithm requires a longer computational time, mainly because of the smoothing parameter selection. However, once the smoothing parameter for the first mode is selected, the proposed algorithm can even be faster than the conventional method when a signal is contaminated by random fluctuations. The remaining steps are almost identical to those of the conventional EMD, and in this case the conventional EMD tends to produce extra artificial modes (this observation will be shown in subsequent sections). In addition, the computational burden of the above K-fold cross-validation procedure is modest.
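The two limiting cases of λ noted in the remarks above can be checked numerically for a Gaussian kernel smoother. The sketch below uses made-up extrema positions and heights, and the function name `kernel_smooth` is hypothetical. With a very small bandwidth the smoother reproduces each extremum exactly, so it interpolates the extrema at those points (although between them it does not behave like a cubic spline). With a very large bandwidth all weights become equal and the envelope collapses to the plain average of the extrema.

```python
import math


def kernel_smooth(pos, val, t, lam):
    """Gaussian-kernel weighted average of val, evaluated at location t."""
    e = [-0.5 * ((t - p) / lam) ** 2 for p in pos]
    top = max(e)  # keeps the largest weight at exactly 1
    w = [math.exp(v - top) for v in e]
    return sum(wi * vi for wi, vi in zip(w, val)) / sum(w)


# Hypothetical local maxima: positions and heights
pos = [2, 7, 11, 16, 22]
val = [1.0, 0.4, 1.3, 0.7, 1.1]

tiny = [kernel_smooth(pos, val, p, 0.05) for p in pos]   # lambda near 0
huge = [kernel_smooth(pos, val, p, 1e6) for p in pos]    # lambda very large
mean_val = sum(val) / len(val)

print("lambda near 0 :", tiny)
print("lambda large  :", huge)
print("plain average :", mean_val)
```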
Here, we discuss a theoretical property of SEMD, namely, the variation diminishing property of envelopes. It implies that, as the value of the smoothing parameter increases, the variation of the envelopes decreases monotonically. In other words, structures of envelopes such as peaks and valleys disappear monotonically as the level of smoothing increases. Thus, lower and upper envelopes generated with a sufficiently large smoothing parameter should not contain artifacts due to noise. This fact is known as causality in the scale-space literature (see, e.g.,[14]).
Proposition 1
for any positive value λ^{ ′ } ≤ λ.
Proof
for all λ_{1},λ_{2} > 0. By Theorem of[15], it follows that the number of sign changes in ${\widehat{u}}_{\lambda}\left(t\right)$ is a monotonically decreasing function of λ. □
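The variation diminishing behaviour can be illustrated numerically on one example. This is a sketch, not a check of the proposition itself. It assumes a Gaussian-kernel (Nadaraya-Watson) smoother evaluated on a finite grid, applies it to a made-up sequence standing in for the values being smoothed, and counts turning points as sign changes of the first differences. The names `kernel_smooth_curve` and `slope_sign_changes` are hypothetical.

```python
import math


def kernel_smooth_curve(x, lam):
    """Gaussian-kernel smooth of x[0..n-1], evaluated at every grid point."""
    n = len(x)
    out = []
    for t in range(n):
        e = [-0.5 * ((t - i) / lam) ** 2 for i in range(n)]
        top = max(e)
        w = [math.exp(v - top) for v in e]
        out.append(sum(wi * xi for wi, xi in zip(w, x)) / sum(w))
    return out


def slope_sign_changes(y):
    """Number of sign changes in the first differences (turning points)."""
    d = [b - a for a, b in zip(y, y[1:])]
    signs = [1 if v > 0 else -1 for v in d if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# A slow sine plus a rapidly alternating component standing in for noise
n = 40
x = [math.sin(2 * math.pi * t / 20) + 0.3 * (-1) ** t for t in range(n)]

bandwidths = [0.3, 1.0, 3.0]
counts = [slope_sign_changes(kernel_smooth_curve(x, lam)) for lam in bandwidths]
for lam, c in zip(bandwidths, counts):
    print(f"lambda = {lam}: {c} slope sign changes")
```

With the smallest bandwidth the smoothed curve follows every alternation, while the largest bandwidth leaves only the turning points of the slow sine.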
 (a) Extrema from a signal contaminated by noise are sensitive to noise or outliers. Thus, it is necessary to filter out such insignificant terms when constructing the upper and lower envelopes, so that the sifting process using filtered envelopes can produce stable IMFs.
 (b) As discussed below, the conventional EMD cannot properly handle a signal containing an ultrahigh-frequency component because it is difficult to obtain the desired upper and lower envelopes by using interpolation. This problem can be solved by employing a smoothing approach.
 (c) Inadequate information is available on the modulation of the two boundaries before the first extremum and after the last extremum when constructing envelopes. Thus, using smoothing instead of interpolation for extrema can alleviate the boundary problem.
SEMD for noisy signals
SEMD for signals with outliers
 1. emd: conventional EMD,
 2. semd: proposed SEMD, and
 3. sure: SURE wavelet shrinkage method of [16].
From the simulation results, the following main observations can be made: (i) noise distorts the decomposition results in the case of the conventional EMD, (ii) wavelet shrinkage outperforms the conventional EMD in recovering the true function, (iii) SEMD is the most effective in removing noise from a noisy signal, and (iv) SEMD is robust to the presence of extreme values. In summary, the simulation results illustrate that SEMD is an effective decomposition method for separating noise or outliers from signals.
SEMD for signals with an ultrahigh frequency component
Boundary treatment by SEMD
When constructing envelopes during the sifting process, inadequate information is available on the modulation of two boundaries before the first extremum and after the last extremum. Unless the boundaries are properly treated, large swings occur on both sides, and these eventually distort the entire decomposition result. This phenomenon is particularly exaggerated in lower-frequency IMFs because there is inadequate information on an intrinsic mode. In addition to traditional boundary treatments such as periodic or symmetric conditions, Huang et al.[5] extended the original signal by adding artificial waves called characteristic waves, and these can be constructed by repeating the intrinsic mode formed by extreme values nearest to the boundary.
Extension of EMD to irregularly spaced signals
We consider an extension of EMD to irregularly spaced signals. The conventional EMD interpolates between extrema using cubic splines; this might not be appropriate for obtaining the upper and lower envelopes when the observed data are scattered, that is, when they are not observed on regular (spatial) grids and have spatially inhomogeneous densities, including data voids of various sizes.
Here, we propose a new method, called simulation-based SEMD, that combines a simulation technique for generating random fields with the SEMD algorithm. This method can be easily adopted for one-dimensional signals. The proposed method comprises two steps: (1) Extrema are generated on a regularly spaced domain by a simulation method. (2) The upper and lower envelopes are constructed using the simulated extrema and the SEMD algorithm.
A key feature of the proposed simulation-based SEMD method is that it can integrate various patterns between the simulated extrema. Furthermore, the uncertainty of the resulting IMFs can be evaluated on the basis of several sets of simulations.
The simulation is based on the decomposition x(t) = p(t;x) + e(t), where p(t;x) denotes the kriging predictor at regularly spaced locations t = (t_{1},…,t_{n}) that depends on x. The quantity e(t) = x(t) − p(t;x) denotes kriging residuals that are not available in practice. Therefore, we generate e(t) with an estimated covariance. Using the simulated values p(t;x) + e(t), we identify the extrema and use the SEMD to obtain the IMF at t. Irregularly spaced IMFs are derived at s.
Conclusion
In this article, we have proposed a statistical EMD to deal with a noisy signal by combining smoothing techniques and the conventional EMD. The results obtained from various numerical experiments confirm the effectiveness of the statistical EMD method. Furthermore, we have extended EMD to irregularly spaced signals by utilizing simulated extrema. These extensions of the conventional EMD are expected to increase the applications of EMD.
Further studies of the proposed SEMD are needed. The current SEMD algorithm requires the selection of a smoothing parameter, which is computationally expensive and might be an obstacle to handling massive data. Hence, it is necessary to develop a computationally efficient method of smoothing parameter selection. As another possible refinement of SEMD, we would like to investigate the intermittence problem of mode mixing, in which different modes of oscillation coexist in a single IMF. Finally, although SEMD is relatively robust to outliers compared with the conventional EMD, a least-squares-based smoothing method such as kernel smoothing can be affected by outliers in the construction of envelopes. Therefore, a quantile-based EMD seems to merit further study.
Declarations
Acknowledgements
The work of Hee-Seok Oh was supported by the National Research Foundation of Korea (NRF) grant (No. 2012002712) funded by the Korea government (MEST). The research of Donghoh Kim was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (20090076223).
References
 Priestley MB: Spectral Analysis and Time Series, vols. 1 and 2. (Academic Press, New York, 1981)
 Mallat S: A Wavelet Tour of Signal Processing. (Academic Press, New York, 2009)
 Daubechies I: Ten Lectures on Wavelets. (SIAM, Philadelphia, 1992)
 Vidakovic B: Statistical Modeling by Wavelets. (John Wiley & Sons, New York, 1999)
 Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zheng Q, Yen NC, Tung CC, Liu HH: The empirical mode decomposition and Hilbert spectrum for nonlinear and nonstationary time series analysis. Proc. Roy. Soc. Lond. A 1998, 454: 903995. 10.1098/rspa.1998.0193
 Huang NE, Shen SSP: Hilbert-Huang Transform and Its Applications. (World Scientific, Singapore, 2005)
 Boudraa AO, Cexus JC: EMD-based signal filtering. IEEE Trans. Instrum. Meas 2007, 56: 21962202.
 Wu Z, Huang NE: Ensemble empirical mode decomposition: a noise assisted data analysis method. Adv. Adapt. Data Anal 2009, 1: 149. 10.1142/S1793536909000047
 Xu Z, Huang B, Zhang F: Improvement of empirical mode decomposition under low sampling rate. Signal Process 2009, 89: 22962303. 10.1016/j.sigpro.2009.04.038
 Diop EHS, Alexandre R, Boudraa AO: Analysis of intrinsic mode functions: a PDE approach. IEEE Signal Process. Lett 2010, 17: 398401.
 Hastie T, Tibshirani R: Generalized Additive Models. (Chapman and Hall, London, 1990)
 Rilling G, Flandrin P: One or two frequencies? The empirical mode decomposition answers. IEEE Trans. Signal Process 2008, 56: 8595.
 Park M, Kim D, Oh HS: A reinterpretation of EMD by cubic spline interpolation. Adv. Adapt. Data Anal 2011, 3: 527540. 10.1142/S1793536911000921
 Lindeberg T: Scale-Space Theory in Computer Vision. (Kluwer, Boston, 1994)
 Silverman BW: Using kernel density estimates to investigate multimodality. J. Roy. Stat. Soc. B 1981, 43: 9799.
 Donoho DL, Johnstone IM: Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc 1995, 90: 12001224. 10.1080/01621459.1995.10476626
 Donoho DL, Johnstone IM: Ideal spatial adaptation by wavelet shrinkage. Biometrika 1994, 81: 425455. 10.1093/biomet/81.3.425
 Fan J, Gijbels I: Data-driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation. J. Roy. Stat. Soc. B 1995, 57: 371394.
 Marron JS, Adak S, Johnstone IM, Neumann MH, Patil P: Exact risk analysis of wavelet regression. J. Comput. Graph. Stat 1998, 7: 278309.
 Cai TT: Adaptive wavelet estimation: a block thresholding and oracle inequality approach. Ann. Stat 1999, 27: 898924. 10.1214/aos/1018031262
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.