Optimal Multitaper Wigner Spectrum Estimation of a Class of Locally Stationary Processes Using Hermite Functions

This paper investigates the time-discrete multitapers that give a mean square error optimal Wigner spectrum estimate for a class of locally stationary processes (LSPs). The accuracy in the estimation of the time-variable Wigner spectrum of the LSP is evaluated and compared with other frequently used methods. The optimal multitapers are also approximated by Hermite functions, which is computationally more e ﬃ cient, and the errors introduced by this approximation are studied. Additionally, the number of windows included in a multitaper spectrum estimate is often crucial and an investigation of the error caused by limiting this number is made. Finally, the same optimal set of weights can be stored and utilized for di ﬀ erent window lengths. As a result, the optimal multitapers are shown to be well approximated by Hermite functions, and a limited number of windows can be used for a mean square error optimal spectrogram estimate.


Introduction
A locally stationary process (LSP) has a covariance function which is a multiplication of a covariance function of a stationary process and a time-variable function, [1].The process is nonstationary with properties suitable for modeling measured signals that, for example, have a transient amplitude behavior, such as evoked or induced potentials arising in the electroencephalogram, [2,3].To statistically differentiate between responses from different types of stimuli, choosing a spectral estimator with small bias and low variance is important.Such estimators can be found but additionally important is that the estimators have discrete-time and discrete-frequency properties suitable from implementation aspects, such as few choices of parameters and computational efficiency.
The mean square error optimal kernel for the class of Gaussian harmonizable processes has been obtained by Sayeed and Jones [4], and further, the optimal time-frequency kernel for LSPs, restricted to covariance functions defined by a multiplication of two variable Gaussian functions, is obtained in [5].The calculation of the two-dimensional convolution between the kernel and the Wigner distribution of a process realization can be simplified using kernel decomposition and calculating multitaper spectrograms, [6,7].The time-lag estimation kernel is rotated, and the corresponding eigenvectors and eigenvalues are calculated.The estimate of the Wigner spectrum is given as the weighted sum of the spectrograms of the data with the different eigenvectors as sliding windows and the eigenvalues as weights, [8].Different approaches to approximate a time-varying spectrum with a few windowed spectrograms have been taken, for example, [9][10][11][12][13][14][15].The sampling properties related to time-frequency analysis are well covered in [16,17], and differences in the time-discrete case compared to the time-continuous case are considered, for example, in [18,19].
In this paper, the time-discrete multitapers corresponding to the mean square error optimal time-frequency kernel for a class of LSPs are computed and the performance of the resulting estimator is compared to other well-known algorithms.The approximation with Hermite functions are also evaluated in the discrete-time case and appropriate choices of the scaling of the Hermite functions are given for different window lengths.A reasonable number of computed multitaper spectrograms should be averaged for the final Wigner spectrum estimate, and an evaluation of how this EURASIP Journal on Advances in Signal Processing  number could be reduced without degraded performance is done.Many applications where such an algorithm could be useful for implementation can be found.In the present study, examples are shown of the electrophysiological correlates of toddlers' watching of video clips displaying different types of nonlinguistic communication between two actors.The paper is organized as follows, Section 2 presents examples of the class of LSPs used in the paper.Section 3 summarizes the estimation of the optimal kernel approach and the corresponding multitapers and gives some examples.In Section 4, a comparison of several algorithms for estimation of locally stationary spectra is given.Section 5 evaluates the approximation using Hermite functions, and the reduction of the number of averaged multitaper spectrograms is investigated in Section 6.In Section 7, the electroencephalogram data examples are shown, Section 8 concludes the paper.

Locally Stationary Processes
A zero-mean second-order continuous-time locally stationary process, {x(t)} has, per definition in [  function determined by two functions according to, In this paper, q(τ) = e −τ 2 /2 , is a fixed Gaussian function with a constant sign which we assume to be nonnegative, [5] and the function r(τ) = e (−c/4)τ 2 /2 , with c ≥ 1.When c 4, r(τ) decreases quicker than q(τ), and we approach a stationary process as c → ∞.The opposite extreme with c = 1 gives maximal nonstationarity.A locally stationary chirp process, (LSCP), [5], is defined where m determines the chirp frequency and d the start of the chirp frequency.The definition of LSP is also extended to a sum of locally stationary processes, a so-called multicomponent locally stationary process (MLSP) where the variable Gaussian function r j (τ) = e (−cj /4)τ 2 /2 has different c j for j = 1, . . ., J, [5].
The sampled versions of these definitions are given with where F s is chosen for the Gaussian functions to become approximately zero for t = ±N/2F s .The N × N covariance matrix R x (t, s) is computed, and in Figure 1 some examples of the shape of this matrix for different parameter values are depicted for N = 256 and F s = 7.
To illustrate the processes for different parameters, a number of realizations x = [ x(0) . . .x(N − 1)] T , are simulated from where b = [ b(0) . . .b(N −1)] T is a sequence of realizations from the a real-valued white Gaussian zero mean stochastic process with variance one.The process-generating matrix H is related to the covariance matrix as In Figure 2, examples of the base-band shapes of such processes are shown.Important to remember is that these processes can be translated in frequency as well as in time, and still the Cohen's class estimators will give the same timeand frequency-translated result, [8].For a small value of c = 1.1, the realizations are narrow banded, Figure 2(a), but for a larger c, the realizations are more wide banded with a more irregular form, Figure 2(b).In Figure 2(c), the parameters m and d are chosen for an increasing chirp frequency, but the principal shape is the same as in Figure 2(a) as c = 1.1 (note that only the real part of these process realizations are shown).The realizations in Figure 2(d) have the covariance matrix of (3) with J = 2, c 1 = 1.1, and c 2 = 20, giving realizations which are a combination of a narrow-banded and wide-banded shapes.

Optimal Kernels and Multitapers
The optimal ambiguity domain kernel in the mean square error sense for zero-mean Gaussian processes was estimated in [4], and based on this, the corresponding kernels for the processes defined in (1), (2), and (3) were derived in [5] as The optimal ambiguity domain kernels corresponding to the covariance functions and processes of Figures 1 and 2 are computed and plotted in Figure 3. Instead of calculating the time-frequency estimate using the ambiguity domain kernel, the calculations can be simplified by using a multitaper spectrogram, [8], where Cohen's class is written as With t = (t 1 + t 2 )/2 and τ = t 1 − t 2 , where If the kernel ρ rot opt (t 1 , t 2 ) satisfies the Hermitian property ρ rot opt (t 1 , t 2 ) = (ρ rot opt (t 2 , t 1 )) * , then solving the integral results in eigenvalues λ k and eigenfunctions q k which form a complete set.The time-lag kernel can be expressed as Using the eigenvalues and eigenvectors, ( 7) is rewritten as a weighted sum of spectrograms, With just a few λ k that differ from zero, the multitaper spectrogram solution is an effective solution from implementation aspects.
Using discrete-time data, the windows and weights are found from the solution of the corresponding matrix equation where the eigenvalues are ordered according to To extract the corresponding samples of ρ rot opt (t, τ) in an easy way, the time-lag kernel, ρ opt (t, τ) should be sampled with 2F s , that is, t = n/2F s , with n = −N , . . ., N − 1 and τ = n/2F s with n = −2N , . . ., 2N − 1 to cover the required range for t 1 and t 2 .The ambiguity domain kernel φ opt (ν, τ) sampled with 2F s and the discrete fourier transform (DFT) in the first variable will give the sampled ρ opt (t, τ) and the matrix R opt .Examples of the eigenvalues and eigenvectors connected to the corresponding matrices R opt are computed for N = 256 and F s = 7 and depicted in Figure 4 where the 10 first eigenvalues are plotted in Figure 4(a) and the three first eigenvectors are seen in Figures 4(b),4(c),4(d), and 4(e) for the respective case.
Note that the eigenvalues which are optimal for the LSP with c = 1.1 and LSCP with c = 1.1, m = 2, are equal, which was shown to be true in [5].The eigenvectors for LSP with c = 1.1 and c = 3 have a very similar shape, and it was also shown in [5] that the eigenfunctions approximate Hermite functions and differ with a specific scaling factor.These approximations will be investigated in Section 4.

Evaluation
Using the set of windows and weighting factors described in the previous section, a performance comparison of different kernels and multitaper methods is made.For a fair comparison, the optimal parameter values are found for all methods in all evaluations.For the Wigner spectrum (WIG), [20], the time-lag kernel of the Wigner distribution is defined as and the Choi-Williams kernel (C-W), [21], has a time-lag kernel defined as where σ is optimized for the smallest mean square error in each case.In the computer evaluation, the C-W time-lag kernel for the discrete-time case proposed in [22], Chapter 6, page 271, is used.For the single Hanning window (HANN), the window length N w is optimized in each case.For the Welch method (WOSA), [23], also a Hanning window is used with an overlap of 50 % between the windows, and the optimal length N w , and number of windows K is found for each case.In the Thomson multitaper method (TH MW), the lengths N w of the K multitapers are optimized, and the relation with the resolution bandwidth determined by B = (K + 3)/N w in each case, [24].The peak matched multiple windows (PM MW), [25], are designed to give low correlations between subspectra when the spectrum of the random process includes peaks, that is, a spectrum corresponding to a process with large dynamics.It has been shown that this method gives a small mean square error for evoked potentials, [26] as well as for estimation of heart rate variability (HRV), [27].The window lengths N w as well as the number of windows K are optimized for the best result, and the corresponding resolution bandwidth is determined by B = (K + 3)/N w , similarly as for TH MW in each case.
The weighting factors are chosen as the eigenvalues α k = λ k , k = 1, . . ., K, as this choice is close to the optimal weighting factors, [28].The reason for comparing with these methods is that either they are very frequently used various applications or they are designed to be optimal for a similar type of process.
The total mean square error is defined as where E denotes expected value.Using a sufficiently large value N and an appropriate sampling frequency F s , the discrete-time case will produce results close to the continuous-time case.The parameter γ is used to adjust the kernel φ opt (ν, τ) to give the smallest possible mean square error for every method and choice of parameters.The parameter γ is optimized individually for all methods and parameter cases and is found by derivation as In the first simulation, the resulting total mean square errors (MSE) of the LSP-optimal multitapers (LSP opt ) for three different processes defined by (1), with the parameter c as c = 1.1, 3, and 20, ranging from nonstationary to more stationary processes, are calculated for N = 256 and F s = 7.The results of the different methods are presented in Table 1.The optimal choice is compared with a number of other methods, also optimized for the smallest error.For C-W, the parameter σ is varied in different ranges: σ = 1-38 for c = 1.1, σ = 0.05-1.9for c = 3, and σ = 0.01-0.38 for c = 20, to find the minimum value in each case.The window lengths N w that give the optimum result for HANN, WOSA, PM MW, and TH MW are varied between N w = 30 and 140.For WOSA, TH MW, and PM MW, a different number of windows ranging from K = 2 to 8 are also tested to find the number of windows that give a minimum error.
The results show that WIG and C-W give a mean square error very close to the optimal error, J opt = 19.23 when c = 1.1 and that, out of the spectrogram methods, HANN gives the best result.For c = 3, WIG as well as the result of C-W are larger than the spectrogram methods where the PM MW gives an error closer to the optimal mean square error J opt = 11.85.For c = 20, the result of the TH MW is closest to the optimal error J opt = 3.27.This example show  The next simulation evaluates the performance for an LSCP, (2) and the corresponding optimal windows, (LSCP opt ).In this case, we use m = 2 and d = −2 for the simulation of all three processes using the same c-values as in the preceding example.The evaluation is done using the same methods with the C-W parameter varied in the range, σ = 25-950 for c = 1.1, σ = 2.5-95 for c = 3, and σ = 0.025-0.95for c = 20.In Table 2, the results show that WIG and the C-W are almost as optimal as LSCP opt for c = 1.1.The optimal value for c = 3, J opt = 11.85 is not reached by any other method.For c = 20, the optimal result is J opt = 3.27 and the three multitaper methods give the closest result.
The last simulation with results shown in Table 3 evaluates the performance for the optimal window and weights of MLSP, (3), and (MLSP opt ), using two different processes in the sum for each of three examples, the first case with  Table 3: The mean square error of the MLSP-optimal multitapers and weights (MLSP opt ) compared to optimal choices of parameters for different methods evaluating the class of MLSPs in three cases: , the second case with c 1 = 2, c 2 = 10, and the third case with c 1 = 4, c 2 = 8.For the C-W, the evaluation is done using σ that varies from σ = 0.05-1.9 in the first and second case, and from σ = 0.025-0.95 in the third case.It is seen that none of the methods reach the optimal mean square error for this more complicated process.The HANN and the PM MW, however, give small errors comparable to the error of the MLSP opt multitapers in the first case.In the last case, all the multitaper methods are able to reach a small mean square error.The worst results are produced from the WIG and C-W in all three cases.

Approximation of Windows Using Hermite Functions
In [5], the optimal multitapers of the proposed class of LSPs are shown to approximate a set of Hermite functions.This approximation is appropriate in the continuous infinitelength case, but in the finite-length discrete-time case, a set of Hermite functions might be far from the true eigenvectors.How far, depends on the scaling of the matrix R opt with use of the sampling frequency parameter F s .If F s is very small, the covariance values are large at the edges of the matrix R opt and the Hermite functions certainly are inappropriate as eigenvectors.With F s too large, the larger part of the power of R opt will be located in just a few samples in the middle of the matrix, and the eigenvectors will consist of a few values that differ from zero.A proper choice between these extremes should be carefully considered.
Another result from [5] is that the Hermite functions can be dilated, where the dilation factors depend on c according to c 1/4 / √ 2. If these results transfer to the timediscrete case, a very efficient implementation of the method is possible.Such an approximation is advantageous when it comes to calculation, since only a limited number of Hermite functions need to be calculated instead of solving an possibly large-scale eigenvalue problem which is different for all parameters c in the class.If, as well, the same set of Hermite functions can be used for different c-values, an even more efficient implementation can be done as the windowed spectrograms can be computed once for all c-values and weighted together using the different sets of eigenvalues for each c.Thereby, the whole range of c-values are easily tuned by a new weighting of the set of computed spectrograms.A scaling of the sampling frequency of the optimal kernel and corresponding time-lag matrix R opt is done according to Instead of dilating the Hermite functions, the optimal kernel is dilated for different c-values.Using this c-dependent sampling frequency we investigate a set of Hermite functions, with t = n/F H for n = −N/2, . . ., N/2 − 1.The resulting set of vectors is The square error is computed for all sets of Hermite functions with F H varying from 0.1 to 50 with a step of ΔF H = 0.1.The best set of Hermite functions is chosen as the one where the sum of square errors between the first eigenvector and the first Hermite function is minimized, that is, Thereby, the whole set of optimal Hermite functions h FH opt k , k = 1, . . ., K, is determined for a specific F c s .However, the optimal set of Hermite functions found from this criterion does not necessarily have F Hopt = F s .The resulting square error ε k (F s , c, F Hopt ) is studied for a number of different F s and c-values, where the sampling frequency F c s according to ( 18) is used for the computation of the optimal kernel and corresponding multitapers.The results are given in Figure 5 for N = 256 and in Figure 6 for N = 32.The logarithm of the different error vectors for c-values c = 1.1, 3, 10, and 20 are depicted in the same figure as a function of F s and the results are presented for the 1st, 2nd, 4th, and 10th eigenvectors.The results in Figure 5 for N = 256 show that for the 1st eigenvector, the sampling F s is less sensitive than that of the 10th eigenvector.The range of F s , where the error is small, is between 2.5 and 20 for all q F c s 1 where it is just between 4 and 15 for q F c s 10 , see Figures 5(a) and 5(d).However, with a proper choice of F s , the Hermite functions approximate the first 10 eigenvectors in an appropriate way for N = 256.Within this range, it can also be verified that F s = F Hopt , and thereby the same set of Hermite functions can be used independently of the value of c.In Figure 6, the square error ε k (F s , c, F Hopt ) is shown for N = 32, where the error is small in the range of F s between 2 and 2.5 with an maximum error of 10 −7 (1.5 and 3.4 if we allow a larger maximum error of 10 −5 ).Also within these ranges, it can be verified that F s = F Hopt .The error increases for the 2nd and 4th eigenvectors.The range of the minimum errors is, however, about the same as that of q the error is much larger independently of F s (note the larger scale on the Y -axis).This means that for small N-values, the higher order eigenvectors are not possible to approximate as Hermite functions with a small error.The corresponding ranges for N = 512, 128, and 64 have also been studied, and the results are summarized in Table 4.

Reduction of the Number of Averaged Spectrograms
The only variables left to adapt the optimal set of multitapers to a certain process are the eigenvalues.In Figure 7, the eigenvalues for different c are depicted where the calculations are made for N = 512 and F s = 15.Note the logarithmic Xaxis.Using other values of N with appropriate corresponding F s will give equal sets of eigenvalues.Changing F s and N do not alter the eigenvalues, as N and F s just change the sampling and size of the matrix R opt .This is verified to be true if R opt is still sampled in an appropriate way, with the choice of F s in the intervals specified in Table 4. From Figure 7, it is seen how the eigenvalues vary with c, for example, the λ 1 decreases slowly where λ 2 is first negative and then positive.For large values of c, several eigenvalues differ from zero where, for example, for c = 2 there are just a few eigenvalues that differ from zero.An important aspect is that a multitaper spectrogram should be calculated with a reasonable number of windows.This is possible if a major part of the eigenvalues is close to zero.Here, four different levels for the eigenvalues to contribute to the final estimate are suggested based on where δ = 0 (i.e., all eigenvalues are included), δ = 0.01, δ = 0.05, and δ = 0.2.With this choice, the number of averaged multiple spectrograms reduces according to Figure 8.With the level set to δ = 0.01, the number of spectrograms vary from K = 3 for c = 1.5 to K = 23 when c = 30, solid line of Figure 8.As many as 23 averaged spectrograms are rather impractical, and using the level δ = 0.05, K reduces to be between 2 and 7, dashed line.In the last case with δ = 0.2, the number of included eigenvalues might be too small to give an  and the level δ = 0.01 gives the same result as using all windows, (δ = 0).The last example in Table 7 shows the result when N = 32, but F s is now altered to be outside the appropriate range, to F s = 5.This choice of F s gives a very large error when all windows are included in the estimate, (δ = 0).The reason of course is that the Hermite functions are not at all appropriate to use for larger K, as they are no longer similar to the eigenvectors.Reducing the number of windows, however, δ ≥ 0.01, gives a mean square error sufficiently close to the true J opt .This shows that the Hermite functions together with limiting the number of windows give an appropriate estimate even if F s is chosen inappropriately.

Estimation of Induced Potential Power in the Electroencephalogram
To show the performance of the proposed method and especially show the possibility to view The stimulus material consisted of different 13-second video clips merged to a 2×15-minute continuous sequence, intended to show the three kinds of social behavior; communicative (protodeclarative or protoimperative gestures) and communicatively neutral parallel play.Silver-silver chloride electrodes (EasyCap, Falk Minow) were placed according to the international 10-20 system.The vertical electrooculogram (VEOG) was recorded from electrodes placed above and below the right eye, and the horizontal electrooculogram (HEOG) was recorded from electrodes placed lateral to the left and the right eye.All electrodes were referenced to the average of the left and the right mastoids.Impedances were kept below 5 kΩ for all electrodes.The EEG was recorded with a 0.1/70 Hz band-pass filter at a sampling rate of 500 Hz, and amplified with a Neuroscan NuAmps amplifier.The sampling frequency is reduced to 62.5 Hz by decimation in Matlab and before and after the 2s epochs, zeros are added to allow for the total data window length.The included data epochs are limited so that the same number of epochs is used for the declarative, imperative, and parallel play acts.For subject 1, the number of epochs were 20 in each of the three cases, and for subject 2, the number of epochs were 29.
The averaged power from two different subjects is estimated using a set of Hermite functions with N = 256 and F s = F Hopt = 12.Different sets of eigenvalues are used as weighting factors, corresponding to c = 1.1, c = 3, and c = 20.Using δ = 0.05, the estimation of the spectrogram is then done using a reasonable number of averaged spectrograms, K = 10, K = 2 and, K = 6, respectively.Remember, that with this procedure only the averaging of the spectrograms need to be redone for a new value of the parameter c, the individual single-window periodograms with the set of chosen Hermite functions are the same.The results of two subjects are shown in Figures 9,10,and 11.For each subject, the averaged estimates for all epochs are shown.The colormap used is the same for each row (each subject) in all the figures.The results show that the induced responses are stronger around 6 Hz and between 0 and 1 s for the protodeclarative as well as the protoimperative act compared to the parallel play act, for both subjects.Similar performance is found for other subjects.In the different figures, the resolution varies from being similar to the Wigner distribution (c = 1.1) in Figure 9, to almost a single-window spectrogram estimator (c = 3) in Figure 10 (just K = 2 windows) and finally similar to the usual multitaper estimator, for example, the Thomson estimator, (c = 20) in Figure 11.This shows the possibility to easily investigate a whole range of estimators when the data model is not well defined.

Conclusions
The mean square error optimal multitaper spectrogram estimator, for a class of locally stationary processes is evaluated and compared to other frequently used methods: the Wigner distribution, the Choi-Williams distribution, the single Hanning spectrogram, the Welch method, the Thomson multitapers, and the peak matched multitapers.The results show that it is possible to find other methods that give a result more or less close to the result of the optimal multitapers in mean square error sense, but it is different methods for different processes of the class.An evaluation is also made for the classes of locally stationary chirp processes and sum of locally stationary processes.
An investigation of the error shows that with a proper sampling frequency of the windows, the windows are well approximated by a set of Hermite functions where the same set of functions is valid for the whole class if an appropriate dilation factor is used.Another advantage is that the optimal set of weighting factors can be stored and utilized for all different window lengths.It is also investigated if the number of included spectrograms in the average can be reduced, and the results show that only a small number of windows, 1-10, are needed for an appropriate approximation of the true kernel.
The advantage of the usage of the Hermite functions is that the multitapers as well as the corresponding spectrograms can be calculated and stored, and then a tuning process using just one parameter, which determines the weights, will give the final weighted multitaper spectrogram estimate for a certain parameter of the whole class.Power estimation of induced potentials of the brain from toddlers' R LSP x , c = 1.1, m = 2, d = R LSP x , c 1 = 1.1, c 2 = 20

Figure 5 :
Figure 5: The dependence on F s of the errors between the eigenvector optimal to LSP and the corresponding hermite function for different c-values.The 1st, 2nd, 4th, and 10th eigenvectors are compared.The eigenvector length is N = 256.

Table 1 :
The mean square error of the LSP-optimal multitapers and weights (LSP opt ) compared to optimal choices of parameters of different methods evaluating the class of LSPs when c = 1.1, 3, and 20.

Table 4 :
The appropriate range for F s = F Hopt for small errors when using Hermite functions as multitapers optimal to LSP for different values of N.

Table 5 :
Results of the MSE of the LSP for different parameter choices using Hermite functions as windows and a decreasing number of eigenvalues, N = 256 and F s = 7.

Table 6 :
Results of the MSE of the LSP for different parameter choices using Hermite functions as windows and a decreasing number of eigenvalues, N = 32 and F s = 2.5.

Table 7 :
Results of the MSE of the LSP for different parameter choices using Hermite functions as windows and a decreasing number of eigenvalues, N = 32 and F s = 5.