Optimization of Weighting Factors for Multiple Window Spectrogram of Event-Related Potentials
EURASIP Journal on Advances in Signal Processing volume 2010, Article number: 391798 (2010)
This paper concerns the mean square error optimal weighting factors for multiple window spectrogram of different stationary and nonstationary processes. It is well known that the choice of multiple windows is important, but here we show that the weighting of the different multiple window spectrograms in the final average is as important to consider and that the equally averaged spectrogram is not mean square error optimal for non-stationary processes. The cost function for optimization is the normalized mean square error where the normalization factor is the multiple window spectrogram. This means that the unknown weighting factors will be present in the numerator as well as in the denominator. A quasi-Newton algorithm is used for the optimization. The optimization is compared for a number of well-known sets of multiple windows and common weighting factors and the results show that the number and the shape of the windows are important for a small mean square error. Multiple window spectrograms using these optimal weighting factors, from ElectroEncephaloGram data including steady-state visual evoked potentials, are shown as examples.
Estimation and detection of frequency changes of shorter or longer duration in the ElectroEncephaloGram (EEG) connected to stimuli, for example, evoked or induced potentials, are often of great interest. To distinguish statistically between responses to different types of stimuli, it is important to choose a spectral estimator with small bias and low variance.
The idea of multiple windows, or multitapering, was introduced by Thomson, and in recent decades the Thomson method has been used in many different application areas. It has been shown to outperform the Welch method in terms of leakage, resolution, and variance for a stationary, spectrally smooth process. For nonsmooth spectra, however, the performance of the Thomson method degrades due to cross-correlation between subspectra. Other appropriate choices are then, for example, [5–7]. A comparison of Hermite and Slepian functions (the Thomson method) has shown that Hermite functions are a better choice for time-varying signals and spectrogram estimation.
The choice of windows has been studied in the literature, but how to weight the different multiple window spectrograms in the final average has received much less attention. The weighting factors have previously been optimized for the Peak-Matched Multiple Windows, using a criterion in which normalized bias, variance, and mean square error are optimized for a predefined peaked spectrum. In the nonstationary case, different approaches to approximating a time-varying spectrum with a few windowed spectrograms have been taken, for example, [10–13].
In this paper we compare the Hermite functions, the Thomson windows, the Peak-Matched Multiple Windows, and the Welch windows and evaluate the performance with optimal weighting factors for different processes. The cost function for optimization is the normalized mean square error, where the normalization factor is the multiple window spectrogram. This means that the unknown weighting factors are present in the numerator as well as in the denominator. A quasi-Newton algorithm is used for the optimization. We compare the results with those of the usual equally weighted multiple window spectrogram and of a multiple window spectrogram adjusted by an optimal scaling factor. Preliminary results have been presented previously. A nonstationary process model, which could be appropriate for induced responses in the EEG, for example, is studied. We illustrate the weighted multiple window spectrogram estimates with examples of steady-state visual evoked potentials (SSVEPs).
The paper is organized as follows. Section 2 presents the optimized weighting factors and in Section 3 the evaluation for different stationary and nonstationary processes is presented. In Section 4 examples of estimation of SSVEP are shown. Section 5 concludes the paper.
2. Optimization of Weighting Factors
The Multiple Window Spectrogram of the zero mean real-valued random process , is defined by
for and , where the assumption is that the data is stationary for the samples . Equation (1) is a weighted sum of spectrograms obtained by using the data windows , and the weighting factors , . The parameter is the step size and the number of values in the DFT.
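The weighted sum of windowed spectrograms described above can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the paper's implementation: the function name, the argument names (`windows`, `weights`, `step`, `nfft`), the segment alignment, and the use of sine tapers as stand-in windows are all choices made here for the example.

```python
import numpy as np

def multiple_window_spectrogram(x, windows, weights, step=1, nfft=None):
    """Weighted sum of spectrograms, one spectrogram per data window.

    x       : 1-D real signal
    windows : array of shape (K, N), one data window per row
    weights : array of shape (K,), one weighting factor per window
    step    : hop between consecutive segments, in samples
    nfft    : number of values in the DFT (defaults to the window length N)
    Returns an array of shape (n_segments, nfft // 2 + 1).
    """
    x = np.asarray(x, dtype=float)
    windows = np.atleast_2d(np.asarray(windows, dtype=float))
    weights = np.asarray(weights, dtype=float)
    K, N = windows.shape
    nfft = N if nfft is None else nfft
    starts = range(0, len(x) - N + 1, step)
    out = np.empty((len(starts), nfft // 2 + 1))
    for i, s in enumerate(starts):
        segment = x[s:s + N]
        # One windowed periodogram (subspectrum) per window, shape (K, bins).
        sub = np.abs(np.fft.rfft(windows * segment, n=nfft, axis=1)) ** 2
        # Weighted sum over the K subspectra.
        out[i] = weights @ sub
    return out

# Illustration: three orthonormal sine tapers (stand-in windows), equal weights.
N = 64
t = np.arange(N)
tapers = np.array([np.sqrt(2 / (N + 1)) * np.sin(np.pi * k * (t + 1) / (N + 1))
                   for k in (1, 2, 3)])
x = np.cos(2 * np.pi * 8 / N * np.arange(256))  # sinusoid centred on DFT bin 8
S = multiple_window_spectrogram(x, tapers, np.ones(3) / 3, step=32)
```

Because the estimate is linear in the weighting factors, changing the weights only requires recombining the stored subspectra, not recomputing any DFT.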
With only one window, , the spectrogram has too large variance to be useful in the analysis of a stochastic process, as the variance is approximately the squared Wigner spectrum .
2.1. Mean Square Error Optimization
The mean square error (MSE) is a natural choice of optimization since it includes both variance and squared bias. Optimizing the MSE for a model where the power varies with time, which might be the case for nonstationary processes, focuses too much on high-power parts of the process. To avoid this, the optimization can be done normalizing with the true Wigner spectrum at each time and frequency value. However, this might give a strange result if the Wigner estimate is biased. Therefore, we consider the normalized MSE (nMSE) where the expectation of the multiple window spectrogram is used for the normalization at each time and frequency value.
The nMSE, which is computed in the time interval and in the frequency interval , is the average of a number of time-frequency values, giving the cost function:
where the MSE for each time and frequency value is defined as
The variance is
where the covariance matrix with and and the superscript denotes conjugate transpose, according to . Reduction of the variance is established if the correlation between the windowed periodogram (subspectra),
from the windows and , , is small for all frequency values .
The bias is
where is the known Wigner spectrum of the model. The optimization cost function of (2) includes the expressions of (4) and (6) where , are known windows and is the time-variable nonstationary covariance matrix. The unknown variables are , which appear both in the numerator and the denominator of (2). The minimization of the criterion is therefore done iteratively with a quasi-Newton algorithm . The criterion and its derivative are used in the algorithm. The algorithm is described in . Using these weights in the multiple window spectrogram is referred to as optimal weights (OPTWEI).
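A minimal numerical sketch of this kind of optimization is given below. It rests on several assumptions that are not taken from the text: the process is real and Gaussian, so the covariance between two windowed periodograms follows the standard fourth-moment formula; each MSE is divided by the squared expectation of the estimate; SciPy's BFGS stands in for the quasi-Newton algorithm; and the test case (white noise, sine tapers, one frequency) is a hypothetical simplest example. All function and variable names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def subspectrum_moments(R, windows, f):
    """Means and covariance matrix of the K windowed periodograms at frequency f,
    for a zero-mean real Gaussian segment with covariance matrix R (assumption)."""
    K, N = windows.shape
    A = windows * np.exp(2j * np.pi * f * np.arange(N))  # row k holds a_k
    G = A.conj() @ R @ A.T          # G[j, k] = a_j^H R a_k
    P = A.conj() @ R @ A.conj().T   # P[j, k] = a_j^H R conj(a_k)
    return np.real(np.diag(G)), np.abs(G) ** 2 + np.abs(P) ** 2

def nmse_and_grad(alpha, moments, true_spec):
    """Normalized MSE averaged over the evaluation points, and its gradient."""
    value = 0.0
    grad = np.zeros(len(alpha))
    for (m, C), W in zip(moments, true_spec):
        e = alpha @ m                               # expectation of the estimate
        num = alpha @ C @ alpha + (e - W) ** 2      # variance + squared bias
        d_num = 2 * C @ alpha + 2 * (e - W) * m
        value += num / e ** 2
        grad += d_num / e ** 2 - 2 * num * m / e ** 3
    n = len(true_spec)
    return value / n, grad / n

# Hypothetical test case: white noise (R = I), flat true spectrum equal to 1.
N, K = 64, 4
t = np.arange(N)
windows = np.array([np.sqrt(2 / (N + 1)) * np.sin(np.pi * k * (t + 1) / (N + 1))
                    for k in range(1, K + 1)])
R = np.eye(N)
freqs = [0.25]
moments = [subspectrum_moments(R, windows, f) for f in freqs]
true_spec = [1.0 for _ in freqs]

# Random initial weights, normalized to sum to one.
x0 = np.random.default_rng(0).uniform(size=K)
x0 /= x0.sum()
res = minimize(nmse_and_grad, x0, args=(moments, true_spec),
               jac=True, method="BFGS")
```

In this toy case the subspectra are uncorrelated with equal means, so the optimizer should return weights close to 1/K. The weights are deliberately left unconstrained; the ratio form of the cost fixes their overall scale through the bias term.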
2.2. Averaging and Scale Optimization
Usually, the spectrograms from different windows are equally weighted and averaged in the final estimate, that is,
Using equal weights according to (7) is referred to as equal weights (EQWEI).
The mean square error could be optimized according to the nMSE criterion, using equal weights scaled with a constant factor, that is,
where a closed form expression for the factor is found from
The weighting factors are referred to as scaled weights (SCWEI).
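As a worked sketch of how a closed form for the scale factor can arise, assume (this notation is introduced here and is not from the text) that each MSE is divided by the squared expectation of the estimate, and write m_i and v_i for the expectation and variance of the equally weighted spectrogram at the i-th time-frequency point, W_i for the Wigner spectrum there, and c for the scale factor. Under that assumption the variance term does not depend on c, and the minimizer follows from a one-dimensional quadratic in 1/c:

```latex
\begin{align*}
\mathrm{nMSE}(c)
  &\propto \sum_i \frac{c^2 v_i + (c\,m_i - W_i)^2}{(c\,m_i)^2}
   = \sum_i \frac{v_i}{m_i^2}
   + \sum_i \Bigl(1 - \frac{1}{c}\,\frac{W_i}{m_i}\Bigr)^{2}, \\
\frac{\partial\,\mathrm{nMSE}}{\partial (1/c)} = 0
  &\;\Longrightarrow\;
  c = \frac{\sum_i (W_i/m_i)^2}{\sum_i W_i/m_i}.
\end{align*}
```

If the estimate is unbiased at every point (m_i = W_i), this gives c = 1, that is, plain equal weighting.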
3.1. Bandlimited White Noise Process
The evaluation is done for different stationary and nonstationary processes. The bandlimited white noise process with the covariance function
generates a Toeplitz covariance matrix , which is shown in Figure 1(a) for , (Case ). The locally stationary process approach [16, 17], where the covariance function of a nonstationary process is defined by
gives a time-variable bandlimited spectrum where the time-variable power of the bandlimited white-noise process changes with a Gaussian envelope. Two examples are seen in Figure 1(b) as a long-event nonstationary process () (Case ) and in Figure 1(c) as a short-event nonstationary process () (Case ).
The weighting factors are optimized using four different sets of multiple windows. The Thomson multiple windows (TH)  give uncorrelated subspectra and thereby low variance for a stationary white noise process and the window functions are given by the eigenvectors of the () Toeplitz covariance matrix with elements given by (10) with
where is the number of multiple windows in the set.
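The eigenvector construction can be illustrated with NumPy. The sketch below assumes a sinc-shaped covariance function with unit spectral level inside a band of width B; the values of N, K, and B, and the function names, are hypothetical choices for the example rather than the settings used in the evaluation.

```python
import numpy as np

def bandlimited_covariance(N, B):
    """N x N Toeplitz covariance matrix r(tau) = sin(pi B tau) / (pi tau),
    i.e. a flat spectrum of level 1 for |f| < B / 2 (assumed form)."""
    tau = np.arange(N)[:, None] - np.arange(N)[None, :]
    return B * np.sinc(B * tau)   # np.sinc(x) = sin(pi x) / (pi x)

def eigen_windows(R, K):
    """The K eigenvectors of R with the largest eigenvalues, one per row."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1][:K]
    return vecs[:, order].T, vals[order]

N, K = 128, 8
B = 0.08                          # hypothetical bandwidth (cycles per sample)
R = bandlimited_covariance(N, B)
windows, eigvals = eigen_windows(R, K)

# Cross terms h_j^T R h_k: zero for j != k, because the windows diagonalize R.
cross = windows @ R @ windows.T
```

The vanishing off-diagonal entries of `cross` are what makes the subspectra uncorrelated at zero frequency for this process.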
The Peak-Matched (PM) multiple windows  are designed to give small correlation between subspectra when the spectrum of the stationary process includes peaks and notches. The windows are given by the solution of the generalized eigenvalue problem where the number of windows satisfies (12). Other parameters to be defined are the peak height chosen as dB and the sidelobe suppression chosen as dB . The number of windows is and is related to the bandwidth and window length as in (12).
The Welch method (WO)  utilizes time-shifted equal windows. In this paper we use a Hanning window of appropriate length so that the number of windows, , is fitted into the total window length with 50% overlap.
A set of Hermite functions (HE) is computed as
with for . The parameter is chosen so that the first Hermite function is approximately equal to the first Slepian function of the Thomson method in each case (similar approach as in ).
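A set of sampled Hermite functions can be generated as in the sketch below. The physicists' Hermite polynomials, the time grid centred on the window midpoint, the `scale` parameter, and the final renormalization to unit discrete energy are assumptions made for this example; in practice `scale` would be tuned, as described above, until the first function matches the first Slepian function.

```python
import numpy as np
from math import factorial, pi, sqrt
from numpy.polynomial.hermite import hermval

def hermite_windows(N, K, scale):
    """First K Hermite functions sampled at N points, one per row.

    scale : time-scaling parameter in samples (hypothetical name)
    """
    t = (np.arange(N) - (N - 1) / 2) / scale   # symmetric grid around zero
    rows = []
    for k in range(K):
        coeffs = np.zeros(k + 1)
        coeffs[k] = 1.0                         # selects the polynomial H_k
        psi = hermval(t, coeffs) * np.exp(-t ** 2 / 2)
        psi /= sqrt(2.0 ** k * factorial(k) * sqrt(pi))
        rows.append(psi / np.linalg.norm(psi))  # unit energy on the grid
    return np.array(rows)

he = hermite_windows(N=128, K=8, scale=8.0)
gram = he @ he.T   # close to the identity when the grid covers the functions
```

The first row is a sampled Gaussian, and higher-order rows spread progressively further in both time and frequency, which is why the window length and `scale` must be chosen together.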
The number of windows is chosen as for all different methods and the window lengths are in all cases giving for the Thomson and Peak-Matched multiple windows. For Case (stationary process), the nMSE is computed and optimized only for the frequency, , that is, and . For the nonstationary cases we choose, and with , ( values) for Case and with , ( values) for Case These choices include the whole covariance matrix in each case and give a balance between different time and frequency values in the average.
The nMSE for Case is shown in Figure 2(a), for the different multiple window sets, where the nMSE from EQWEI is shown with circles, the SCWEI with pluses, and OPTWEI with stars. The Thomson windows and Hermite functions are optimal for the stationary bandlimited white-noise process using the EQWEI and thereby the optimization of the weighting factors (SCWEI and OPTWEI) does not give any improvement of the nMSE. The Peak-Matched multiple windows do not give a small error using EQWEI, but with SCWEI and also OPTWEI, the nMSE decreases. The overall smallest error, however, is given by the Thomson and Hermite multiple windows as expected, as these two sets are optimal for a stationary bandlimited process. The Peak-Matched multiple windows and the Welch method are not able to reach the same nMSE even when the weighting factors are optimized.
In Case Figure 2(b), the results from the long-event nonstationary process show the importance of using SCWEI and OPTWEI compared to the EQWEI in the nonstationary case. The difference of these two sets of weights is, however, not that large. It could also be noted that the Hermite functions perform slightly better than the Thomson multiple windows, which is in concordance with the study of nonstationary processes in . In Case Figure 2(c), using EQWEI on the short-event nonstationary process gives a very large error. Using SCWEI and OPTWEI gives a much lower nMSE.
The weighting factors for OPTWEI are depicted in Figure 3 for the different window sets and the different cases. For the stationary process, the optimal weighting factors for the Thomson multiple windows (stars) are equally given by . This is almost the case for the Hermite functions (crosses) as well, whereas the Peak-Matched multiple windows and the Welch method give more irregular weighting factors. Overall, however, the optimal weighting factors result in almost equally averaged spectra in all cases, which agrees with theory for stationary processes. Of more interest is the long-event nonstationary process in Figure 3(b), where both the Thomson and the Hermite windows now give weights that assign most power to the spectrogram from the first window function, with decreasing power to the following ones. A similar pattern is seen for the Welch method, where we should remember that all the windows have the same frequency shape but have their power centered at different time points. Most of the power is placed on the spectrograms of the middle windows, which intuitively seems quite natural. For the short-event nonstationary process, the resulting weighting factors behave differently, see Figure 3(c), where the windows located at the end points of the time interval for the Welch method are now given most power. This shows the importance of considering the weighting factors in the estimation procedure. However, for the bandlimited white noise process, we should remember that using SCWEI gave almost as small an error as OPTWEI in all cases.
As the optimization is made using a quasi-Newton algorithm, we cannot be sure of convergence to the global minimum. To verify this, we optimize the weighting factors in all cases using 100 different initial sets of weighting factors. Each set of initial values is drawn from a uniform (rectangular) distribution between zero and one and then normalized so that the values sum to one. For all three bandlimited white noise processes and for all sets of windows, the optimization converged to the same minimum error, with the same set of weighting factors, for all 100 initial sets.
3.2. Bandlimited Peaked Spectrum Process
Instead of using a bandlimited white noise process, the stationary covariance function in (10) is replaced with the covariance function of a bandlimited peaked spectrum according to :
In (14), is a peaked spectrum with , and dB, where dB. The nonstationary processes are found from (11) with replaced with .
The results from these processes are presented in Figure 4. For the stationary-peaked spectrum process, Figure 4(a), the EQWEI of the Welch method happens to give the smallest error. Using SCWEI we can lower the nMSE for all methods, but using OPTWEI combined with the Peak-Matched multiple windows gives the smallest nMSE of all methods, which is in agreement with [6, 9], where these windows and optimized weighting factors are shown to be optimal for this process. In Case Figure 4(b), for the long-event nonstationary process, the benefit of using windows with properties suitable for the process becomes visible, as the smallest nMSE is given by the Peak-Matched multiple windows combined with OPTWEI. In this case, the SCWEI is far from giving the same result. In Case we also see a similar result for the short-event nonstationary process.
The different weighting factors are depicted in Figure 5, and for the stationary case, we see in Figure 5(a) the characteristic weighting for the PM given by , where is the eigenvalues from the solution of the eigenvalue problem giving the peak-matched multiple windows optimizing at the frequency see . Of more interest are the nonstationary processes of Cases and The optimal weightings of the Peak-Matched and Thomson multiple windows are similar, and they all give most power to the spectrogram from the first window and decreasing power to the following spectrograms. It is also worth noting that this power increases for the short-event process of Case see Figure 5(c). Most important, however, is how the weighting changes between the peaked spectrum process and the bandlimited white noise process, and how it changes with the nonstationarity of the process.
The convergence of the optimization of the weighting factors in the three cases of bandlimited peaked spectrum processes is also investigated using the same randomly picked initial values as for the bandlimited white noise process and all different window sets. The results show that a minimum of 90% (usually around 95%) of the initial values converge to the global minimum giving the true optimal weighting factors. In the cases where the algorithm did not converge, the final error and the weighting factors were very far away from the true values and the divergence was easily discovered.
4. Real Data Examples
To show the performance on real data, sampled ElectroEncephaloGram (EEG) data were studied, where a flickering light (Grass Photic stimulator Model PS22C) was introduced at different time points. The light stimulation lasted approximately 1 s or 5 s. For a repetitive periodic visual stimulus, a steady-state visual evoked potential (SSVEP) arises in the EEG. We assume that the short stimulation (1 s) introduces a short-event nonstationary process and the long stimulation (5 s) a long-event nonstationary process in the measured EEG. The subject lay supine with closed eyes on a bed in a silent laboratory where the ambient light was dimmed. The flickering light, with set frequency and time interval, was flashed at the subject from a distance of approximately 1 m. Data were recorded using a Neuroscan system with a digital amplifier (SYNAMP 5080, Neuro Scan, Inc.). Amplifier band-pass settings were 0.3 and 50 Hz. The sample rate was 256 Hz, which was downsampled to a sample rate of Hz in Matlab. In all examples channel PZ is chosen.
We illustrate the performance of the methods with examples of four different data sets. Example is given from flickering light of 12 Hz and example is given from flickering light of 15 Hz. For both these examples the flickering lasts between time points 5 and 10 s and we assume these responses to be two long-event nonstationary processes. The third and fourth examples are given from flickering of 9 Hz between time points 10.4 and 12.2 s and time points 4.7 and 5.7 s, respectively. These two examples are assumed to be responses of short-event nonstationary processes.
We assume that we can model the different long-event nonstationary SSVEPs of Examples and as bandlimited peaked spectrum processes; see . Logarithmic spectrograms are depicted in Figure 6, where we compare the spectrograms using the single Hanning window with different weightings of the Peak Matched multiple windows using OPTWEI from Figure 5(b) and EQWEI/SCWEI. In all cases, the window length is . Note that the spectrogram using EQWEI is equal to SCWEI as the difference is only a gain factor and the coloring is adjusted between the minimum and maximum value of each plot. We should also remember that the bandwidth of this estimator is Hz, which is also clearly seen in the examples in Figures 6(b) and 6(e). The spread of the power caused by the large frequency bandwidth makes it difficult to know where the actual response frequency is located. Equal weighting of the multiple window spectrograms is not appropriate for data where it is important to locate the maximum power at a certain frequency. In the time-scale, however, we see that the resulting responses show up in the time interval where they should be located according to the stimuli given. The single window Hanning spectrograms are well resolved in frequency but the variance is however too large to be reliable, which is seen in Figures 6(a) and 6(d).
In Figure 7, we compare the spectrogram estimates using the single Hanning window with different weightings of the Peak-Matched multiple windows using OPTWEI from Figure 5(c) and EQWEI/SCWEI. The single Hanning spectrogram in Figures 7(a) and 7(d) is difficult to interpret, and the spectrograms using EQWEI/SCWEI give an estimate that is too wide in frequency; see Figures 7(b) and 7(e). Using OPTWEI, the short-event nonstationary processes in Figures 7(c) and 7(f) are located correctly in the time interval as well as at the appropriate frequency. The last case, however, Figure 7(f), has a large amount of power outside the time interval of the stimuli, around 6-7 s. This is explained by the fact that the stimulus sequence also activated the subject, which in turn raised the alpha activity; see .
Even better is to actually estimate a model covariance function, using, for example, many trials from the same experiment for a robust estimate. From the properties of this modeled covariance function an appropriate set of multiple windows can be chosen and the weighting factors could be nMSE optimized to estimate the single stimulus response.
5. Conclusions
We compare the Hermite functions, the Thomson windows, the Peak-Matched Multiple Windows, and the Welch windows and compute the performance with optimal weighting factors for different stationary and nonstationary processes. The cost function for optimization is the normalized mean square error where the normalization factor is the multiple window spectrogram. This means that the unknown weighting factors will be present in the numerator as well as in the denominator. A quasi-Newton algorithm is used for the optimization. The results show that the weighting factors, as well as the shape of the windows, are important factors for a small error. It is also shown that a scaling optimization of the usual averaging could give almost as small mean square error as an optimization of the individual weighting factors in case of a smooth spectrum. For a peaked spectrum, a significant reduction of the normalized mean square error is achieved using individual optimization of the weights.
References
[1] Thomson DJ: Spectrum estimation and harmonic analysis. Proceedings of the IEEE 1982, 70(9):1055-1096.
[2] Welch PD: The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Transactions on Audio Electroacoustics 1967, 15(2):70-73. 10.1109/TAU.1967.1161901
[3] Bronez TP: On the performance advantage of multitaper spectral analysis. IEEE Transactions on Signal Processing 1992, 40(12):2941-2946. 10.1109/78.175738
[4] Walden AT, McCoy E, Percival DB: Variance of multitaper spectrum estimates for real Gaussian processes. IEEE Transactions on Signal Processing 1994, 42(2):479-482. 10.1109/78.275635
[5] Riedel KS, Sidorenko A: Minimum bias multiple taper spectral estimation. IEEE Transactions on Signal Processing 1995, 43(1):188-195. 10.1109/78.365298
[6] Hansson M, Salomonsson G: A multiple window method for estimation of peaked spectra. IEEE Transactions on Signal Processing 1997, 45(3):778-781. 10.1109/78.558503
[7] Farhang-Boroujeny B: Prolate filters for nonadaptive multitaper spectral estimators with high spectral dynamic range. IEEE Signal Processing Letters 2008, 15: 457-460.
[8] Bayram M, Baraniuk RG: Multiple window time-frequency analysis. Proceedings of IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, June 1996 173-176.
[9] Hansson M: Optimized weighted averaging of peak matched multiple window spectrum estimates. IEEE Transactions on Signal Processing 1999, 47(4):1141-1146. 10.1109/78.752613
[10] Çakrak F, Loughlin PJ: Multiple window time-varying spectral analysis. IEEE Transactions on Signal Processing 2001, 49(2):448-453. 10.1109/78.902129
[11] Xiao J, Flandrin P: Multitaper time-frequency reassignment for nonstationary spectrum estimation and chirp enhancement. IEEE Transactions on Signal Processing 2007, 55(6):2851-2860.
[12] Williams WJ, Aviyente S: Spectrogram decompositions of time-frequency distributions. Proceedings of the 6th International Symposium on Signal Processing and its Applications (ISSPA '01), 2001 587-590.
[13] Scharf LL, Friedlander B: Toeplitz and Hankel kernels for estimating time-varying spectra of discrete-time random processes. IEEE Transactions on Signal Processing 2001, 49(1):179-189. 10.1109/78.890359
[14] Hansson-Sandsten M, Sandberg J: Optimization of weighting factors for multiple window time-frequency analysis. Proceedings of the European Signal Processing Conference (EUSIPCO '09), 2009, Glasgow, UK
[15] Fletcher R: Practical Methods for Optimization. John Wiley & Sons, New York, NY, USA; 1987.
[16] Silverman RA: Locally stationary random processes. IRE Transactions on Information Theory 1957, 3: 182-187. 10.1109/TIT.1957.1057413
[17] Wahlberg P, Hansson M: Kernels and multiple windows for estimation of the Wigner-Ville spectrum of Gaussian locally stationary processes. IEEE Transactions on Signal Processing 2007, 55(1):73-84.
[18] Hansson M, Lindgren M: Multiple-window spectrogram of peaks due to transients in the electroencephalogram. IEEE Transactions on Biomedical Engineering 2001, 48(3):284-293. 10.1109/10.914791
This paper is supported by the Swedish Research Council.
Cite this article
Hansson-Sandsten, M., Sandberg, J. Optimization of Weighting Factors for Multiple Window Spectrogram of Event-Related Potentials. EURASIP J. Adv. Signal Process. 2010, 391798 (2010). https://doi.org/10.1155/2010/391798