Optimization of Weighting Factors for Multiple Window Spectrogram of Event-Related Potentials
© M. Hansson-Sandsten and J. Sandberg, 2010
Received: 22 December 2009
Accepted: 14 May 2010
Published: 10 June 2010
This paper concerns the mean square error optimal weighting factors for the multiple window spectrogram of different stationary and nonstationary processes. It is well known that the choice of multiple windows is important, but here we show that the weighting of the different multiple window spectrograms in the final average is just as important to consider, and that the equally averaged spectrogram is not mean square error optimal for nonstationary processes. The cost function for optimization is the normalized mean square error, where the normalization factor is the multiple window spectrogram. This means that the unknown weighting factors are present in the numerator as well as in the denominator. A quasi-Newton algorithm is used for the optimization. The optimization is compared for a number of well-known sets of multiple windows and common weighting factors, and the results show that the number and the shape of the windows are important for a small mean square error. Multiple window spectrograms of ElectroEncephaloGram data, including steady-state visual evoked potentials, computed with these optimal weighting factors are shown as examples.
Estimation and detection of frequency changes of shorter or longer duration in the ElectroEncephaloGram (EEG) connected to stimuli, for example, evoked or induced potentials, are often of great interest. To statistically distinguish between responses to different types of stimuli, it is important to choose a spectral estimator with small bias and low variance.
The idea of multiple windows, or multitapering, was introduced by Thomson, and in recent decades the Thomson method has been used in many different application areas. It has been shown to outperform the Welch method in terms of leakage, resolution, and variance for a stationary, spectrally smooth process. For nonsmooth spectra, however, the performance of the Thomson method degrades due to cross-correlation between subspectra. Other choices are then more appropriate, for example, [5–7]. A comparison of Hermite and Slepian functions (the Thomson method) has shown that, for time-varying signals and spectrogram estimation, Hermite functions are a better choice.
The choice of windows has been studied in the literature, but how to weight the different multiple window spectrograms in the final average has received much less attention. The weighting factors have earlier been optimized for the Peak-Matched Multiple Windows, using a criterion in which the normalized bias, variance, and mean square error are optimized for a predefined peaked spectrum. In the nonstationary case, different approaches to approximating a time-varying spectrum with a few windowed spectrograms have been taken, for example, [10–13].
In this paper we compare the Hermite functions, the Thomson windows, the Peak-Matched Multiple Windows, and the Welch windows, and we evaluate their performance with optimal weighting factors for different processes. The cost function for optimization is the normalized mean square error, where the normalization factor is the multiple window spectrogram. This means that the unknown weighting factors are present in the numerator as well as in the denominator. A quasi-Newton algorithm is used for the optimization. We compare the results with those of the usual equally weighted multiple window spectrogram and of a multiple window spectrogram adjusted by an optimal scaling factor. Preliminary results have been presented earlier. A nonstationary process model, which could be appropriate for modeling, for example, induced responses in the EEG, is studied. We illustrate the weighted multiple window spectrogram estimates with examples of steady-state visual evoked potentials (SSVEPs).
The paper is organized as follows. Section 2 presents the optimized weighting factors and in Section 3 the evaluation for different stationary and nonstationary processes is presented. In Section 4 examples of estimation of SSVEP are shown. Section 5 concludes the paper.
2. Optimization of Weighting Factors
Equation (1) is evaluated on a grid of time and frequency values, under the assumption that the data are stationary within the samples covered by each analysis window. Equation (1) is a weighted sum of spectrograms obtained with the individual data windows and their corresponding weighting factors. The two remaining parameters are the step size and the number of values in the DFT.
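A minimal NumPy sketch of a weighted multiple window spectrogram follows. It assumes the common form S(t₀, l) = Σ_k α_k |Σ_{n=0}^{N−1} x(t₀ + n) h_k(n) e^{−i2πnl/L}|², with K windows h_k of length N placed at frame start t₀, weights α_k, and L DFT values; the exact indexing of (1) is not reproduced here. The function name and the arguments `step` (standing in for the step size) and `n_fft` (standing in for the number of DFT values) are hypothetical.

```python
import numpy as np

def weighted_mw_spectrogram(x, windows, weights, step, n_fft):
    """Weighted multiple window spectrogram (illustrative sketch).

    x       : 1-D real signal
    windows : array of shape (K, N), one data window per row
    weights : length-K weighting factors alpha_k
    step    : time step (in samples) between successive frames
    n_fft   : number of DFT values per frame (assumed >= N)
    Returns S of shape (n_frames, n_fft) and the frame start indices.
    """
    x = np.asarray(x, dtype=float)
    windows = np.atleast_2d(np.asarray(windows, dtype=float))
    weights = np.asarray(weights, dtype=float)
    K, N = windows.shape
    starts = np.arange(0, len(x) - N + 1, step)
    S = np.zeros((len(starts), n_fft))
    for i, s in enumerate(starts):
        segment = x[s:s + N]                      # data assumed stationary over these N samples
        # K single-window spectrograms, one per row
        spectra = np.abs(np.fft.fft(windows * segment, n=n_fft, axis=1)) ** 2
        S[i] = weights @ spectra                  # weighted sum over the K windows
    return S, starts
```

Bin l corresponds to normalized frequency l/n_fft. With equal weights 1/K the sketch reduces to the usual averaged multiple window spectrogram.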
2.1. Mean Square Error Optimization
The mean square error (MSE) is a natural choice of optimization criterion, since it includes both variance and squared bias. For a model where the power varies with time, which may be the case for nonstationary processes, optimizing the MSE puts too much emphasis on the high-power parts of the process. To avoid this, the MSE can be normalized by the true Wigner spectrum at each time and frequency value. However, this can give misleading results if the estimate of the Wigner spectrum is biased. Therefore, we consider the normalized MSE (nMSE), where the expectation of the multiple window spectrogram is used for the normalization at each time and frequency value.
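To make the normalization concrete, the sketch below evaluates the nMSE at one time-frequency point. It assumes the criterion takes the form E[(S − W)²]/(E[S])², where S = Σ_k α_k S_k is the weighted estimate, W the true (Wigner) spectrum value, m the vector of expected single-window spectrograms E[S_k], and C their covariance matrix. The paper's exact expressions are not reproduced here, and all names and numbers are illustrative. The usage lines check the argument above: scaling the process power by a factor g multiplies the MSE by g², while the nMSE is unchanged.

```python
import numpy as np

def mse(weights, m, C, W):
    """Unnormalized MSE of S = sum_k alpha_k S_k: variance + squared bias."""
    a = np.asarray(weights, dtype=float)
    return a @ C @ a + (a @ m - W) ** 2

def nmse(weights, m, C, W):
    """MSE normalized by the squared expectation of S (assumed form).
    The weights enter both the numerator and the denominator."""
    a = np.asarray(weights, dtype=float)
    mean_S = a @ m
    return mse(a, m, C, W) / mean_S ** 2

# Illustrative two-window example at one time-frequency point.
m = np.array([1.0, 0.9])
C = np.array([[1.0, 0.2], [0.2, 0.8]])
a = np.array([0.5, 0.5])
W = 1.0
g = 100.0  # power scale factor of a high-power part of the process
mse_ratio = mse(a, g * m, g ** 2 * C, g * W) / mse(a, m, C, W)    # equals g**2
nmse_diff = nmse(a, g * m, g ** 2 * C, g * W) - nmse(a, m, C, W)  # equals 0 up to rounding
```

A single window (one weight equal to one) with an unbiased expectation gives nMSE = 1 whenever the variance equals the squared mean, as for a periodogram of Gaussian noise.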
Here, the Wigner spectrum of the model is assumed to be known. The cost function (2) contains the expressions (4) and (6), in which the windows are known and the covariance matrix is the time-varying covariance matrix of the nonstationary process. The unknowns are the weighting factors, which appear both in the numerator and in the denominator of (2). The criterion is therefore minimized iteratively with a quasi-Newton algorithm that uses both the criterion and its derivative. Using these weights in the multiple window spectrogram is referred to as optimal weights (OPTWEI).
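The following sketch shows one way such an optimization could be set up; it is not the authors' code. It assumes a zero-mean real Gaussian process, for which the mean and covariance of the single-window spectrograms follow from the covariance matrix R and the windows via Isserlis' theorem, Cov(|y_k|², |y_l|²) = |E[y_k y_l*]|² + |E[y_k y_l]|². It also assumes the nMSE is normalized by the squared expectation of the estimate. BFGS with a backtracking (Armijo) line search serves as the quasi-Newton method, with the analytic gradient of the nMSE. All function names are hypothetical.

```python
import numpy as np

def subspectrum_moments(R, windows, f):
    """Mean vector m and covariance C of the K values |y_k|^2 at normalized
    frequency f, for a zero-mean real Gaussian vector with covariance R."""
    windows = np.atleast_2d(windows)
    n = np.arange(windows.shape[1])
    A = windows * np.exp(-2j * np.pi * f * n)   # y_k = sum_n A[k, n] x(n)
    P = A @ R @ A.conj().T                      # P[k, l] = E[y_k conj(y_l)]
    Q = A @ R @ A.T                             # Q[k, l] = E[y_k y_l], nonzero for real x
    m = P.diagonal().real                       # E|y_k|^2
    C = np.abs(P) ** 2 + np.abs(Q) ** 2         # Isserlis' theorem
    return m, C

def nmse_and_grad(a, m, C, W):
    """nMSE = (a'Ca + (a'm - W)^2) / (a'm)^2 and its gradient w.r.t. a."""
    mu = a @ m                                  # E[S]
    num = a @ C @ a + (mu - W) ** 2             # variance + squared bias
    grad_num = 2 * C @ a + 2 * (mu - W) * m
    return num / mu ** 2, grad_num / mu ** 2 - 2 * num * m / mu ** 3

def optimize_weights(m, C, W, a0, tol=1e-9, max_iter=200):
    """Minimize the nMSE over the weights with BFGS (a quasi-Newton method)."""
    m, C = np.asarray(m, dtype=float), np.asarray(C, dtype=float)
    a = np.asarray(a0, dtype=float)
    I = np.eye(a.size)
    Hinv = I.copy()                             # inverse-Hessian approximation
    f, g = nmse_and_grad(a, m, C, W)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -Hinv @ g                           # quasi-Newton search direction
        t = 1.0
        f_new, g_new = nmse_and_grad(a + p, m, C, W)
        while not f_new <= f + 1e-4 * t * (g @ p):   # Armijo condition (also rejects nan)
            t *= 0.5
            if t < 1e-12:
                return a, f                     # line search failed; stop here
            f_new, g_new = nmse_and_grad(a + t * p, m, C, W)
        s, y = t * p, g_new - g
        if s @ y > 1e-14:                       # curvature condition keeps Hinv positive definite
            rho = 1.0 / (s @ y)
            Hinv = (I - rho * np.outer(s, y)) @ Hinv @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        a, f, g = a + s, f_new, g_new
    return a, f
```

For example, with m = (1, 1, 1), C = diag(1, 2, 4) and W = 1, the optimal weights are proportional to the inverse variances, (1, 1/2, 1/4)/1.75, and the minimum nMSE is 1/1.75.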
2.2. Averaging and Scale Optimization
The multiple window spectrogram with the equal weights of (7) is referred to as EQWEI (equal weights).
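Equation (7) and the definition of the scale-optimized weights (SCWEI) are not reproduced above. Assuming that SCWEI multiplies the equal weights 1/K by a single factor c, and assuming the nMSE is normalized by the squared expectation of the estimate, the normalized variance does not depend on c, so only the normalized squared bias has to be minimized and c has a closed form. The helper names below are hypothetical.

```python
import numpy as np

def equal_weights(K):
    """EQWEI: every window gets weight 1/K."""
    return np.full(K, 1.0 / K)

def optimal_scale(mean_eq, W):
    """Scale factor c for the weights c/K minimizing the average nMSE over
    a set of time-frequency points (assumed SCWEI definition).

    mean_eq : expected EQWEI spectrogram E[S_EQ] at each point
    W       : true (Wigner) spectrum at the same points
    With weights c/K the normalized squared bias at point i is
    (1 - r_i / c)**2 with r_i = W_i / mean_eq_i. Setting the derivative
    with respect to u = 1/c to zero gives u = sum(r) / sum(r**2).
    """
    r = np.asarray(W, dtype=float) / np.asarray(mean_eq, dtype=float)
    return np.mean(r ** 2) / np.mean(r)

def scaled_weights(K, c):
    """Equal weights multiplied by the common scale factor c."""
    return c * equal_weights(K)
```

If the expected EQWEI spectrogram has the right shape but the wrong level (all r_i equal), the scale c simply removes the level error; otherwise it balances over- and underestimation across points.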
3.1. Bandlimited White Noise Process
The Peak-Matched (PM) multiple windows are designed to give small correlation between subspectra when the spectrum of the stationary process includes peaks and notches. The windows are given by the solution of a generalized eigenvalue problem. Two further design parameters, the peak height and the sidelobe suppression, are specified in dB, and the number of windows is related to the bandwidth and the window length as in (12).
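Window sets defined by a generalized eigenvalue problem R_A h = λ R_B h can be computed by reducing it to a standard symmetric problem with a Cholesky factor of R_B. The sketch below shows only that numerical step. The model matrices of the Peak-Matched design (which depend on the peak height, sidelobe suppression and bandwidth) are not reproduced, so any matrices passed in are placeholders, and the function names are hypothetical. As a sanity check, with R_B = I and R_A the covariance matrix of bandlimited white noise, the result is the Slepian (Thomson) windows.

```python
import numpy as np

def toeplitz_from_acf(r):
    """Symmetric Toeplitz matrix whose first row is the autocorrelation r."""
    n = len(r)
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.asarray(r, dtype=float)[idx]

def generalized_eig_windows(RA, RB, K):
    """K generalized eigenvectors of RA h = lam RB h with the largest
    eigenvalues, normalized so that h^T RB h = 1 (RB symmetric positive definite)."""
    L = np.linalg.cholesky(RB)                  # RB = L L^T
    Linv = np.linalg.inv(L)
    M = Linv @ RA @ Linv.T                      # equivalent standard symmetric problem
    lam, V = np.linalg.eigh(M)                  # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:K]
    H = (Linv.T @ V[:, order]).T                # map back: h = L^{-T} v, one window per row
    return lam[order], H

# Placeholder example: covariance of bandlimited white noise (full bandwidth B)
N, B = 32, 0.2
RA = toeplitz_from_acf(B * np.sinc(B * np.arange(N)))
lam, H = generalized_eig_windows(RA, np.eye(N), 3)   # RB = I gives Slepian windows
```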
The Welch method (WO) uses time-shifted copies of one window. In this paper we use a Hanning window whose length is chosen so that all windows fit into the total window length with 50% overlap.
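With 50% overlap, K windows of length M span M(K + 1)/2 samples, so the window length is M = 2N/(K + 1) for a total length N (rounded down in the sketch). The sketch builds the K shifted Hanning windows as rows of an array with N columns, each normalized to unit energy. The normalization, the function name, and the use of NumPy's `np.hanning` (which includes zero end points) are assumptions, not details from the paper.

```python
import numpy as np

def welch_windows(N, K):
    """K time-shifted Hanning windows with 50% overlap within total length N.

    Windows of length M shifted by M/2 span M*(K + 1)/2 samples,
    so M = 2N/(K + 1), rounded down. Each row has unit energy."""
    M = (2 * N) // (K + 1)
    if M < 3:
        raise ValueError("total length too short for K overlapping windows")
    shift = M // 2
    w = np.hanning(M)                  # NumPy's Hanning window, zero at both ends
    w = w / np.linalg.norm(w)
    H = np.zeros((K, N))
    for k in range(K):
        H[k, k * shift:k * shift + M] = w
    return H
```

For example, N = 256 and K = 3 give M = 128 and shifts of 64 samples, so the three windows exactly cover the 256 samples.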
The Hermite functions are defined recursively in terms of a time-scale parameter. This parameter is chosen so that the first Hermite function is approximately equal to the first Slepian function of the Thomson method in each case (a similar approach has been used in earlier work).
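The Hermite functions can be generated with the standard three-term recursion h_0(t) = π^(−1/4) e^(−t²/2), h_1(t) = √2 t h_0(t), h_k(t) = √(2/k) t h_{k−1}(t) − √((k−1)/k) h_{k−2}(t), sampled on a time axis divided by a scale parameter. The sketch below computes them together with the Slepian windows (eigenvectors of the matrix B·sinc(B(m − n)) for full normalized bandwidth B) and picks the scale that maximizes the correlation between the two first windows by a grid search. The grid search and all names are illustrative assumptions, not necessarily the paper's procedure.

```python
import numpy as np

def hermite_windows(N, K, scale):
    """First K Hermite functions sampled on t = (n - (N - 1)/2) / scale,
    n = 0..N-1, each normalized to unit energy."""
    t = (np.arange(N) - (N - 1) / 2) / scale
    H = np.zeros((K, N))
    H[0] = np.pi ** -0.25 * np.exp(-t ** 2 / 2)
    if K > 1:
        H[1] = np.sqrt(2.0) * t * H[0]
    for k in range(2, K):                          # three-term recursion
        H[k] = np.sqrt(2.0 / k) * t * H[k - 1] - np.sqrt((k - 1) / k) * H[k - 2]
    return H / np.linalg.norm(H, axis=1, keepdims=True)

def slepian_windows(N, K, B):
    """First K Slepian (discrete prolate spheroidal) windows for full
    normalized bandwidth B: leading eigenvectors of B*sinc(B*(m - n))."""
    lags = np.subtract.outer(np.arange(N), np.arange(N))
    _, V = np.linalg.eigh(B * np.sinc(B * lags))   # eigenvalues in ascending order
    return V[:, ::-1][:, :K].T                     # rows, largest eigenvalue first

def match_hermite_scale(N, B, scales):
    """Grid search for the scale whose first Hermite function has the largest
    absolute correlation with the first Slepian window."""
    s0 = slepian_windows(N, 1, B)[0]
    corr = [abs(hermite_windows(N, 1, c)[0] @ s0) for c in scales]
    best = int(np.argmax(corr))
    return scales[best], corr[best]
```

Since both first windows have unit energy, maximizing their correlation is equivalent to minimizing their squared distance (up to sign).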
The same number of windows is chosen for all methods, and the window lengths are the same in all cases, which determines the bandwidth of the Thomson and Peak-Matched multiple windows. For the stationary case, the nMSE is computed and optimized at a single frequency only. For the two nonstationary cases, the nMSE is evaluated over a grid of time and frequency values. These choices include the whole covariance matrix in each case and give a balance between different time and frequency values in the average.
For the long-event nonstationary process (Figure 2(b)), the results show the importance of using SCWEI and OPTWEI rather than EQWEI in the nonstationary case. The difference between these two sets of weights is, however, small. It can also be noted that the Hermite functions perform slightly better than the Thomson multiple windows, which agrees with earlier studies of nonstationary processes. For the short-event nonstationary process (Figure 2(c)), using EQWEI gives a very large error, whereas SCWEI and OPTWEI give a much lower nMSE.
As the optimization uses a quasi-Newton algorithm, convergence to the global minimum is not guaranteed. To check this, we optimize the weighting factors in all cases from 100 different initial sets of weighting factors. The initial values are drawn from a uniform (rectangular) distribution between zero and one, and each set is normalized to sum to one. For all three bandlimited white noise processes and all sets of windows, the optimization converged to the same minimum error, with identical resulting weighting factors, in all 100 cases.
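A multi-start check of this kind can be scripted as below: initial weight sets are drawn uniformly on [0, 1] and normalized to sum to one, a local optimizer is run from each, and the fraction of runs that reach the best final cost is reported. To keep the sketch short, plain gradient descent on a small three-window nMSE example stands in for the quasi-Newton routine. The example values, the assumed nMSE form (normalized by the squared expectation), and all names are hypothetical.

```python
import numpy as np

def random_initial_weights(n_starts, K, rng):
    """Initial weight sets drawn uniformly on [0, 1], each normalized to sum to one."""
    A = rng.uniform(0.0, 1.0, size=(n_starts, K))
    return A / A.sum(axis=1, keepdims=True)

def multistart(local_minimize, cost, K, n_starts=100, seed=0, rel_tol=1e-6):
    """Run local_minimize(a0) -> a from random starts; return the best cost,
    the fraction of runs reaching it (within rel_tol), and all final costs."""
    rng = np.random.default_rng(seed)
    finals = np.array([cost(local_minimize(a0))
                       for a0 in random_initial_weights(n_starts, K, rng)])
    best = np.nanmin(finals)
    reached = np.abs(finals - best) <= rel_tol * max(abs(best), 1.0)  # nan counts as a miss
    return best, reached.mean(), finals

# Hypothetical example: E[S_k] = 1, Var[S_k] = 1, 2, 4, uncorrelated subspectra, W = 1.
variances = np.array([1.0, 2.0, 4.0])

def cost(a):
    mu = a.sum()                                   # E[S] when every E[S_k] = 1
    return (variances @ a ** 2 + (mu - 1.0) ** 2) / mu ** 2

def grad(a):
    mu = a.sum()
    num = variances @ a ** 2 + (mu - 1.0) ** 2
    return (2 * variances * a + 2 * (mu - 1.0)) / mu ** 2 - 2 * num / mu ** 3

def gradient_descent(a0, step=0.05, iters=2000):
    a = np.array(a0, dtype=float)
    for _ in range(iters):
        a -= step * grad(a)
    return a

best, fraction, finals = multistart(gradient_descent, cost, K=3)
```

Runs whose final cost lies far above the best value flag non-convergence, which is how failed starts can be detected in practice.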
3.2. Bandlimited Peaked Spectrum Process
The convergence of the weighting-factor optimization for the three cases of bandlimited peaked spectrum processes is also investigated, using the same randomly drawn initial values as for the bandlimited white noise processes and all window sets. The results show that at least 90% (usually around 95%) of the initial values converge to the global minimum, giving the true optimal weighting factors. In the cases where the algorithm did not converge, the final error and the weighting factors were very far from the true values, so the failure was easily detected.
4. Real Data Examples
To show the performance on real data, sampled ElectroEncephaloGram (EEG) data were studied, where a flickering light (Grass Photic stimulator Model PS22C) was introduced at different time points. The light stimulation lasted approximately 1 s or 5 s. A repetitive, periodic visual stimulus gives rise to a steady-state visual evoked potential (SSVEP) in the EEG. We assume that the short stimulation (approximately 1 s) introduces a short-event nonstationary process and the long stimulation (approximately 5 s) a long-event nonstationary process in the measured EEG. The subject lay supine with closed eyes on a bed in a silent laboratory where the ambient light was dimmed. The flickering light, with set frequency and time interval, was flashed at the subject from a distance of approximately 1 m. Data were recorded using a Neuroscan system with a digital amplifier (SYNAMP 5080, Neuro Scan, Inc.). The amplifier band-pass was set to 0.3–50 Hz. The sample rate was 256 Hz, and the data were subsequently downsampled in Matlab. Channel PZ is used in all examples.
We illustrate the performance of the methods with four different data sets. The first example is from flickering light at 12 Hz and the second from flickering light at 15 Hz. In both examples the flickering lasts from 5 to 10 s, and we assume these responses to be long-event nonstationary processes. The third and fourth examples are from flickering at 9 Hz between 10.4 and 12.2 s and between 4.7 and 5.7 s, respectively. These two examples are assumed to be responses of short-event nonstationary processes.
An even better approach is to estimate a model covariance function, for example from many trials of the same experiment, to obtain a robust estimate. From the properties of this modeled covariance function, an appropriate set of multiple windows can be chosen, and the weighting factors can be optimized with respect to the nMSE to estimate the single-stimulus response.
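A simple way to obtain such an estimate is the sample covariance matrix over M time-aligned trials of N samples each. The sketch below is an illustration under stated assumptions, not the authors' procedure: whether the across-trial mean (the phase-locked, evoked part) should be removed first depends on whether evoked or induced activity is of interest, so it is left as an option. The function name is hypothetical. Such an estimate could then take the place of the model covariance matrix in the nMSE criterion.

```python
import numpy as np

def covariance_from_trials(trials, remove_mean=True):
    """Estimate the N x N (possibly nonstationary) covariance matrix from an
    (M, N) array of M time-aligned trials of N samples each."""
    X = np.asarray(trials, dtype=float)
    if remove_mean:
        X = X - X.mean(axis=0)                  # remove the across-trial (evoked) mean
        return X.T @ X / (X.shape[0] - 1)       # unbiased sample covariance
    return X.T @ X / X.shape[0]                 # raw second-moment matrix E[x x^T]
```

With `remove_mean=True` the result equals `np.cov(trials, rowvar=False)`; no stationarity is imposed, so the estimate keeps any time variation across the N samples.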
We compared the Hermite functions, the Thomson windows, the Peak-Matched Multiple Windows, and the Welch windows, and evaluated their performance with optimal weighting factors for different stationary and nonstationary processes. The cost function for optimization is the normalized mean square error, where the normalization factor is the multiple window spectrogram. This means that the unknown weighting factors are present in the numerator as well as in the denominator. A quasi-Newton algorithm is used for the optimization. The results show that the weighting factors, as well as the shape of the windows, are important for a small error. It is also shown that optimizing a scaling of the usual average can give almost as small a mean square error as optimizing the individual weighting factors in the case of a smooth spectrum. For a peaked spectrum, a significant reduction of the normalized mean square error is achieved by optimizing the individual weights.
This work was supported by the Swedish Research Council.
- Thomson DJ: Spectrum estimation and harmonic analysis. Proceedings of the IEEE 1982, 70(9):1055-1096.
- Welch PD: The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics 1967, 15(2):70-73. doi:10.1109/TAU.1967.1161901
- Bronez TP: On the performance advantage of multitaper spectral analysis. IEEE Transactions on Signal Processing 1992, 40(12):2941-2946. doi:10.1109/78.175738
- Walden AT, McCoy E, Percival DB: Variance of multitaper spectrum estimates for real Gaussian processes. IEEE Transactions on Signal Processing 1994, 42(2):479-482. doi:10.1109/78.275635
- Riedel KS, Sidorenko A: Minimum bias multiple taper spectral estimation. IEEE Transactions on Signal Processing 1995, 43(1):188-195. doi:10.1109/78.365298
- Hansson M, Salomonsson G: A multiple window method for estimation of peaked spectra. IEEE Transactions on Signal Processing 1997, 45(3):778-781. doi:10.1109/78.558503
- Farhang-Boroujeny B: Prolate filters for nonadaptive multitaper spectral estimators with high spectral dynamic range. IEEE Signal Processing Letters 2008, 15:457-460.
- Bayram M, Baraniuk RG: Multiple window time-frequency analysis. Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, June 1996, 173-176.
- Hansson M: Optimized weighted averaging of peak matched multiple window spectrum estimates. IEEE Transactions on Signal Processing 1999, 47(4):1141-1146. doi:10.1109/78.752613
- Çakrak F, Loughlin PJ: Multiple window time-varying spectral analysis. IEEE Transactions on Signal Processing 2001, 49(2):448-453. doi:10.1109/78.902129
- Xiao J, Flandrin P: Multitaper time-frequency reassignment for nonstationary spectrum estimation and chirp enhancement. IEEE Transactions on Signal Processing 2007, 55(6):2851-2860.
- Williams WJ, Aviyente S: Spectrogram decompositions of time-frequency distributions. Proceedings of the 6th International Symposium on Signal Processing and its Applications (ISSPA '01), 2001, 587-590.
- Scharf LL, Friedlander B: Toeplitz and Hankel kernels for estimating time-varying spectra of discrete-time random processes. IEEE Transactions on Signal Processing 2001, 49(1):179-189. doi:10.1109/78.890359
- Hansson-Sandsten M, Sandberg J: Optimization of weighting factors for multiple window time-frequency analysis. Proceedings of the European Signal Processing Conference (EUSIPCO '09), 2009, Glasgow, UK.
- Fletcher R: Practical Methods for Optimization. John Wiley & Sons, New York, NY, USA; 1987.
- Silverman RA: Locally stationary random processes. IRE Transactions on Information Theory 1957, 3:182-187. doi:10.1109/TIT.1957.1057413
- Wahlberg P, Hansson M: Kernels and multiple windows for estimation of the Wigner-Ville spectrum of Gaussian locally stationary processes. IEEE Transactions on Signal Processing 2007, 55(1):73-84.
- Hansson M, Lindgren M: Multiple-window spectrogram of peaks due to transients in the electroencephalogram. IEEE Transactions on Biomedical Engineering 2001, 48(3):284-293. doi:10.1109/10.914791
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.