EURASIP Journal on Applied Signal Processing 2005:19, 3122-3127 © 2005 Hindawi Publishing Corporation

Finding Significant Correlates of Conscious Activity in Rhythmic EEG

One of the important issues in designing an EEG-based brain-computer interface is an exact delineation of the rhythms related to the intended or performed action. Traditionally, related bands were found by trial-and-error procedures seeking maximum reactivity. Even then, large values of ERD/ERS did not imply the statistical significance of the results. This paper presents a complete methodology allowing for a high-resolution presentation of the whole time-frequency picture of event-related changes in the energy density of signals, revealing the microstructure of rhythms, and for determination of the time-frequency regions of energy changes which are related to the intentions in a statistically significant way.


INTRODUCTION
Thinking of a "brain-computer interface" (BCI), one can imagine a device which would directly process all the brain's output, like in a perfect virtual reality machine [1]. Today's attempts are much more humble: we are basically at the level of controlling simple left/right motions. On the other hand, these approaches are more ambitious than direct connections to the peripheral nerves: we are trying to guess the intention of an action directly from the activity of the brain's cortex, recorded from the scalp (EEG).
Contemporary EEG-based BCI systems are based upon various phenomena like, for example, visual or P300 evoked potentials, slow cortical potentials, or sensorimotor cortex rhythms [2]. The most attractive path leads towards the detection of "natural" EEG features: for example, a normal intention of moving the right hand (or rather its reflection in EEG) would move the cursor to the right. Determination of such features in EEG is more difficult than using evoked or especially trained responses. Desynchronization of the µ rhythm is an example of a feature correlated not only with the actual movement, but also with its mere imagination.
All these approaches encounter obstacles, common in the neurosciences: great intersubject variability and poor understanding of the underlying processes. Significant improvement can be brought by coherent basic research on the EEG representation of conscious actions. This paper presents two methodological aspects of such research.
(i) High-resolution parameterization and feature extraction from the EEG time series. Scalp electrodes gather signal from many neural populations, so the rhythms of interest are buried in a strong background. Owing to the high temporal resolution of EEG and the oscillatory character of most of its features, we can look for the relevant activities in the time-frequency plane.
(ii) Determination of significant correlates of conscious activities, which requires a dedicated statistical framework. Until recently, reporting significance of changes in the time-frequency plane presented a serious problem.

TIME-FREQUENCY ENERGY DENSITY OF SIGNALS
Among the parameters used in today's BCI systems (like those designed at the Graz University of Technology [3]), event-related desynchronization and synchronization (ERD/ERS) phenomena play an important role. ERD and ERS are defined as the percentage of change of the average (across repetitions) power of a given rhythm, usually µ/α, β, and γ [4]. Estimation of the time course of the rhythm energy is crucial for the sensitivity of these parameters. But due to the intersubject variability, we cannot expect the rhythms to appear at the same frequencies for all subjects. Therefore, a classical procedure was developed to find the reactive rhythms [4]. For each subject, the frequency range of interest was divided into 1 Hz intervals; in each of them the single trials (repetitions) were bandpass filtered, squared, and averaged, to obtain the estimate of the average band energy. Among these fixed bands, those revealing the largest changes related to the event were chosen. This naturally limits the frequency resolution of the resulting estimates.
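The classical fixed-band procedure can be sketched as follows (a minimal illustration; function names are ours, and an FFT-mask filter stands in for whatever bandpass filter an actual implementation would use):

```python
import numpy as np

def band_power_course(trials, fs, f_lo, f_hi):
    """Band-pass each trial (FFT masking), square, and average across trials."""
    n = trials.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs < f_hi)
    spectra = np.fft.rfft(trials, axis=1)
    spectra[:, ~mask] = 0.0
    filtered = np.fft.irfft(spectra, n=n, axis=1)
    return (filtered ** 2).mean(axis=0)   # average instantaneous band power

def erd_ers(power, ref_slice):
    """Percentage change of band power relative to the reference epoch."""
    ref = power[ref_slice].mean()
    return 100.0 * (power - ref) / ref

# toy data: a mu-like 12 Hz rhythm that vanishes in the second half of each trial
fs, n_trials, n_samp = 125, 32, 1000
t = np.arange(n_samp) / fs
rng = np.random.default_rng(0)
trials = 0.1 * rng.standard_normal((n_trials, n_samp))
trials[:, : n_samp // 2] += np.sin(2 * np.pi * 12 * t[: n_samp // 2])

power = band_power_course(trials, fs, 11, 13)
change = erd_ers(power, slice(0, n_samp // 4))   # reference: first 2 seconds
print(change[-100:].mean())   # deep desynchronization in the final epoch
```

The key limitation illustrated here is that the 11-13 Hz band must be fixed in advance, which is exactly what the high-resolution approach below avoids.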

Time-frequency distributions of energy density
Because of the uncertainty principle, there are many alternative estimates of the time-frequency density of signal's energy. Actually, the same problem (nonunique estimates) is present also in calculating the spectral power or bandpass filtering finite sequences, but in the quadratic time-frequency distributions we may say that the relevancy of the problem is "squared." Fluctuations of power spectra, appearing at high resolutions, take in the time-frequency distributions the form of cross-terms. These false peaks occur in between the autoterms (which correspond to the actual signal's structures), and significantly blur the energy estimates (Figure 1). Their presence stems from the equation (a + b)² = a² + b² + 2ab. A quadratic representation of an unknown signal s, composed of two structures a and b, contains autoterms corresponding to these structures (a² and b²) as well as the cross-term 2ab. For a signal more complex than a sum of two clear and separate structures (like the simplistic simulation in Figure 1), cross-terms are indistinguishable from the autoterms. Advanced mathematical methods are being developed for the reduction of this drawback [5]. While some of them give impressive results for particular signals, in general we are confronted with the tradeoff: higher resolution versus more reliable (suppressed cross-terms) estimate.
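The cross-term mechanism can be illustrated with a minimal discrete Wigner-Ville computation (our own sketch, not the estimator used in this paper): for a sum of two tones, a third, spurious ridge appears midway between them.

```python
import numpy as np

def wigner(x):
    """Discrete (pseudo) Wigner-Ville distribution of a complex signal.
    Note the frequency axis carries a factor of 2: a tone at bin f of the
    usual spectrum appears here at bin 2f."""
    n = len(x)
    W = np.zeros((n, n))
    for t in range(n):
        tau_max = min(t, n - 1 - t)
        taus = np.arange(-tau_max, tau_max + 1)
        r = x[t + taus] * np.conj(x[t - taus])   # lag kernel x(t+tau) x*(t-tau)
        k = np.zeros(n, dtype=complex)
        k[taus % n] = r                          # place lags circularly
        W[t] = np.real(np.fft.fft(k))            # FFT over the lag variable
    return W

n = 128
t = np.arange(n)
# two complex tones at 16 and 48 cycles per window -> autoterms at bins 32 and 96
x = np.exp(2j * np.pi * 16 * t / n) + np.exp(2j * np.pi * 48 * t / n)
W = wigner(x)
profile = np.abs(W).sum(axis=0)   # energy per frequency bin, summed over time
# the cross-term 2ab shows up at the intermediate bin 64, between the autoterms
print(profile[32], profile[64], profile[96])
```

The spurious ridge at bin 64 is of the same order as the true peaks, which is the blurring referred to above.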

Adaptive approximations
If we knew exactly the structures (a and b) of which the signal is composed, we might explicitly omit the cross-term 2ab, thus obtaining a clear time-frequency picture. In practice, this would require a reasonably sparse approximation of the signal in the form

f \approx \sum_{i=1}^{M} a_i g_i,

where g_i are known functions fitting well the actual signal's structures. This may be achieved only by choosing the functions g_i for each analyzed signal separately. The criterion of their choice is usually aimed at explaining the maximum part of the signal's energy in a given number of iterations (M). However, the problem of choosing the optimal set of functions g_i is intractable. A suboptimal solution can be found by means of the matching pursuit (MP) algorithm [7]. But even this suboptimal solution is still quite computer-intensive, so the first practical applications were not possible before the mid-nineties [8]. The MP algorithm and the construction of an estimate of the signal's time-frequency energy density, which is free of cross-terms, are described in the appendix. Functions g_i are chosen from large and redundant collections of Gabor functions (sine-modulated Gaussians).
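A minimal sketch of the greedy MP loop over a small fixed Gabor dictionary (all names and the parameter grid are illustrative; the actual implementation referenced in this paper is far more elaborate):

```python
import numpy as np

def gabor(n, u, s, f, phi):
    """Real Gabor atom: Gaussian envelope at u, width s, frequency f cycles/sample."""
    t = np.arange(n)
    g = np.exp(-np.pi * ((t - u) / s) ** 2) * np.cos(2 * np.pi * f * (t - u) + phi)
    return g / np.linalg.norm(g)

def matching_pursuit(x, dictionary, m):
    """Greedy MP: at each step pick the atom best matching the current residual."""
    residual = x.astype(float).copy()
    atoms = []
    for _ in range(m):
        products = dictionary @ residual
        best = np.argmax(np.abs(products))
        atoms.append((best, products[best]))
        residual -= products[best] * dictionary[best]
    return atoms, residual

n = 256
# small hypothetical dictionary: a grid of positions, widths, and frequencies
params = [(u, s, f) for u in range(16, n, 32)
                    for s in (16, 32, 64)
                    for f in np.arange(0.02, 0.5, 0.02)]
D = np.array([gabor(n, u, s, f, 0.0) for (u, s, f) in params])

# signal: two separated Gabor structures -> MP recovers them; the resulting
# energy estimate (sum of the atoms' own densities) contains no cross-term
x = 2.0 * gabor(n, 80, 32, 0.1, 0.0) + 1.0 * gabor(n, 208, 32, 0.3, 0.0)
atoms, residual = matching_pursuit(x, D, 4)
print(np.linalg.norm(residual) / np.linalg.norm(x))   # small relative residual
```

Because the energy map is built from the fitted atoms individually, the 2ab term simply never enters the estimate.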
Advantages of this estimator in the context of eventrelated desynchronization and synchronization were discussed in [9,10].

Experimental data
To demonstrate the advantages of the presented methodology, the classical ERD/ERS experimental setup was modified to obtain relatively long epochs of EEG between the events.
A thirty-one-year-old right-handed subject was half-lying in a dim room with open eyes. Movement of the thumb, detected by a microswitch, was performed approximately 5 seconds (at the subject's choice) after a quiet sound generated approximately every 20 seconds. The experiment was divided into 15-minute sessions, and the recorded EEG into 20-second-long epochs. After artifact rejection, 124 epochs were left for the analysis. EEG was registered from electrodes at positions selected from the 10-20 system. Figures 2-4 present results for the C4 electrode (contralateral to the hand performing movements) in the local average reference. The signal was down-sampled offline from 250 Hz to 125 Hz. Figure 5 presents data from another subject, collected in a standard ERD/ERS experiment. (Figure caption fragment: shades of gray are proportional to the percentage of change relative to the reference epoch, between 1 and 3 seconds in Figure 2.)

High-resolution picture of energy density
Time-frequency estimates of the signal energy density, including the MP estimate given by (A.5), contain no phase information, so they can be summed across the trials to give the average time-frequency density of energy. Figure 2 presents such an average for 124 repetitions of EEG synchronized to the finger movement, occurring in the 12th second. We easily observe that the α rhythm concentrates around 12 Hz. We may also notice its decrease (desynchronization) around the time when the finger movement occurred, as well as some increased activity in 15-30 Hz near 12-13 seconds (β synchronization).
In another experiment (Figure 5), the high-resolution estimate clearly revealed two very close but separate components of the µ rhythm with different time courses, an effect elusive to the previously applied methods.

High-resolution ERD and ERS
Speaking of the decrease in the α rhythm in the previous section, we compared the activity near the 12th second (Figure 2) to the average level of the α rhythm energy, or, more correctly, to a period before the movement, which should not be related to the event. To quantify this procedure, we must define the reference period to which the energy changes will be related. It should be distant enough from the onset of the event to avoid incorporating premovement correlates into the reference. To avoid border problems of estimates, it should also be removed from the very start of the analyzed epoch. (Note that the average of the energy densities is in general different from the energy density of the averaged signal; the latter reveals only phase-locked phenomena like, for example, the classical evoked potential.) In Figure 2 it was chosen between the 1st and the 3rd second.
Classically, for each selected band, ERD/ERS were calculated as the percentage of power relative to the reference epoch (ERD corresponding to a decrease and ERS to an increase). Owing to the high-resolution estimate of the whole picture of energy density, we may calculate it for the whole relevant time-frequency region with maximum resolution. The ERD/ERS map in Figure 3 was obtained as the ratio of each point's energy to the average energy of the reference epoch at the same frequency. In this plot we observe, as in Figure 2, a darker area (increase) corresponding to the β postmovement synchronization, and a white spot around the time of the movement, corresponding to the α desynchronization. However, in the long premovement period there are still a lot of fluctuations, which naturally raises the question of the statistical significance of the observed changes.
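The map construction reduces to dividing each time-frequency point by the per-frequency mean over the reference columns; a toy sketch (array shapes and values are invented):

```python
import numpy as np

def erd_ers_map(avg_energy, t_axis, ref_start, ref_end):
    """Percentage change of each time-frequency point relative to the
    average energy of the reference epoch at the same frequency."""
    ref_cols = (t_axis >= ref_start) & (t_axis < ref_end)
    ref = avg_energy[:, ref_cols].mean(axis=1, keepdims=True)  # per-frequency baseline
    return 100.0 * (avg_energy - ref) / ref

# toy average map: rows = frequency bins, columns = time; flat baseline of 1.0
energy_map = np.ones((40, 200))
t_axis = np.linspace(0, 20, 200)
energy_map[12, 110:130] = 0.2   # desynchronization of a 12-Hz row near t = 11-13 s
energy_map[24, 125:145] = 1.8   # post-movement synchronization at a higher row
m = erd_ers_map(energy_map, t_axis, 1.0, 3.0)
print(m[12, 115], m[24, 130])   # -80 % ERD, +80 % ERS
```

Every frequency row is normalized by its own reference level, so rhythms of very different absolute power become directly comparable on one map.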

STATISTICAL SIGNIFICANCE
The following steps constitute a fully automatic (hence objective and repeatable) and statistically correct procedure, which delineates and presents with high resolution the timefrequency regions of significant changes in the average energy density.
(1) Divide the time-frequency plane into resels (from resolution elements), for which the statistics are calculated (Section 4.1).
(2) Calculate pseudo-t statistics and p-values for the null hypothesis of no change in the given resel compared to the reference epoch in the same frequency (Section 4).
(3) Select a threshold for the null hypothesis, corrected for multiple comparisons (Section 4.3).
(4) Display the energy changes calculated for maximum resolution (Section 3.2) in windows corresponding to resels which indicated statistically significant changes.

These steps will be described in the following sections. Further details can be found in [10].

(Figure 5 caption: Average time-frequency energy density (2) of 57 trials from the C1 electrode (average reference), constructed for g_γi longer than 250 milliseconds. Presented from 5 to 15 Hz; the finger movement occurs in the 5th second. We observe two very close, but separate, µ rhythms with different time courses. The faster rhythm desynchronizes about 1.5 seconds before the movement, while the slower lasts until its very onset and desynchronizes in the 5th second.)

Integration of MP maps in resels
In choosing the dimensions of a resel, suitable for the statistical analyses, we turn to the theory of the periodogram sampling [11]. For a statistically optimal sampling of the periodogram, the product of the frequency interval and the signal length equals 1/2. This value was taken as the product of the resel's widths in time and frequency, their ratio being a free parameter.
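The resel dimensioning then amounts to solving Δt · Δf = 1/2 for a chosen ratio Δt/Δf (a trivial sketch; the function name is ours):

```python
def resel_dims(ratio, product=0.5):
    """Resel widths in time and frequency given their ratio dt/df,
    with dt * df fixed at 1/2 (optimal periodogram sampling)."""
    df = (product / ratio) ** 0.5
    dt = ratio * df
    return dt, df

dt, df = resel_dims(ratio=0.5)
print(dt, df, dt * df)   # 0.5 s, 1.0 Hz, product 0.5
```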
Calculating the amount of energy in such relatively large resels simply as the value of the distribution (A.5) in the resel's center, that is,

E(t_i, \omega_i) \approx \Delta t \, \Delta\omega \, E\!\left(t_i + \tfrac{\Delta t}{2}, \, \omega_i + \tfrac{\Delta\omega}{2}\right),

may not be representative of the amount of energy contained in a given resel. In such a case we use the exact solution, integrating the energy density over the resel:

E(t_i, \omega_i) = \int_{t_i}^{t_i + \Delta t} \int_{\omega_i}^{\omega_i + \Delta\omega} E(t, \omega) \, d\omega \, dt.   (3)

Resampling the pseudo-t statistics
The values of energy of all the N repetitions (trials) in each questioned resel will be compared to the energies of resels within the corresponding frequency of the reference epoch. For each resel at coordinates {t_i, ω_i}, we will compare its energy averaged over N repetitions with the energy averaged over repetitions in resels from the reference epoch in the same frequency. Their difference can be written as

\Delta E = \frac{1}{N} \sum_{k=1}^{N} E^k(t_i, \omega_i) - \frac{1}{N \, N_{\mathrm{ref}}} \sum_{k=1}^{N} \sum_{j=1}^{N_{\mathrm{ref}}} E^k(t_j, \omega_i),   (4)

where the superscript k denotes the kth repetition (out of N) and N_ref is the number of resels in the reference epoch. However, we want to account also for the different variances of E^k, revealing the variability of the N repetitions. Therefore we replace the simple difference of means (4) by the pseudo-t statistics

t = \frac{\Delta E}{s_\Delta},   (5)

where ΔE is defined as in (4), and s_Δ is the pooled variance of the reference epoch and the investigated resel. In spite of the central limit theorem, this magnitude tends to have a nonnormal distribution [10]. Therefore, we use resampling methods.
We estimate the distribution of t from (5), under the null hypothesis of no significant change, from the data in the reference epoch (for each frequency, N · N_ref values) by drawing with replacement two samples of sizes N and N · N_ref and calculating, for each such replication, the statistic (5). This distribution is approximated once for each frequency. Then, for each resel, the actual value of (5) is compared to this distribution, yielding p for the null hypothesis.
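A sketch of this resampling scheme (the pseudo-t and sample sizes follow the description above; the generator seed and the toy gamma-distributed "energies" are our own):

```python
import numpy as np

def pseudo_t(sample_a, sample_b):
    """Difference of means over the pooled standard error (pseudo-t)."""
    na, nb = len(sample_a), len(sample_b)
    d = sample_a.mean() - sample_b.mean()
    s2 = ((na - 1) * sample_a.var(ddof=1) +
          (nb - 1) * sample_b.var(ddof=1)) / (na + nb - 2)
    return d / np.sqrt(s2 * (1.0 / na + 1.0 / nb))

def resampled_p(resel_energies, ref_energies, n_rep=2000, rng=None):
    """Null distribution of the pseudo-t drawn (with replacement) from the
    reference energies alone; p = fraction of replications whose statistic
    exceeds the observed one."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(resel_energies)
    t_obs = pseudo_t(resel_energies, ref_energies)
    t_null = np.empty(n_rep)
    for i in range(n_rep):
        a = rng.choice(ref_energies, size=n, replace=True)
        b = rng.choice(ref_energies, size=len(ref_energies), replace=True)
        t_null[i] = pseudo_t(a, b)
    return np.mean(np.abs(t_null) >= abs(t_obs))

rng = np.random.default_rng(1)
ref = rng.gamma(shape=2.0, scale=1.0, size=400)       # skewed, energy-like values
unchanged = rng.gamma(shape=2.0, scale=1.0, size=40)  # same distribution as ref
changed = rng.gamma(shape=2.0, scale=2.0, size=40)    # doubled mean energy
p_unchanged = resampled_p(unchanged, ref)
p_changed = resampled_p(changed, ref)
print(p_unchanged, p_changed)
```

Gamma-distributed values mimic the skewed, nonnormal energies for which the plain t-test would be unreliable.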
The number of permutations giving values of (5) exceeding the observed value has a binomial distribution for N_rep repetitions with probability α, whose variance equals N_rep α(1 − α). The relative error of α will then be (cf. [12])

\frac{\sqrt{N_{\mathrm{rep}} \, \alpha (1 - \alpha)}}{N_{\mathrm{rep}} \, \alpha} = \sqrt{\frac{1 - \alpha}{N_{\mathrm{rep}} \, \alpha}}.

To keep this relative error at 10% for a significance level α = 5%, N_rep = 2000 is enough. Unfortunately, due to the problem of multiple comparisons discussed in Section 4.3, we need to work with much smaller values of α. In this study N_rep was set to 2 · 10^6, which resulted in relatively large computation times.
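The binomial error bound is easy to check numerically (a one-line sketch):

```python
def rel_error(alpha, n_rep):
    """Relative error of the estimated significance level alpha when the
    exceedance count is binomial over n_rep resampling draws."""
    return ((1.0 - alpha) / (n_rep * alpha)) ** 0.5

print(rel_error(0.05, 2000))            # ~0.097, i.e. about 10 %
print(rel_error(5e-5, 2 * 10 ** 6))     # ~0.1 again: 2e6 draws for a 1000x smaller alpha
```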

Adjustment for multiplicity
In the preceding section, we estimated the achieved significance levels p for null hypotheses of no change of the average energy in each resel, compared to the reference region in the same frequency. Adjusting results for multiplicity is a very important issue in case of such a large amount of potentially correlated tests. As proposed in [10], it can be effectively achieved using the false discovery rate (FDR, [13]). It controls the ratio q of the number of the true null hypotheses rejected to all the rejected hypotheses. In our case this is the ratio of the number of resels, to which significant changes may be wrongly attributed, to the total number of resels revealing changes.
Let us denote the total number of performed tests, equal to the number of questioned resels, as m. If for m 0 of them the null hypothesis of no change is true, then [13] proves that the following procedure controls the FDR at the level q(m 0 /m) ≤ q.
(1) Order the achieved significance levels p_i, approximated in the previous section for all the resels separately, in an ascending series:

p_1 ≤ p_2 ≤ · · · ≤ p_m.

(2) Find the largest k for which p_k ≤ (k/m) q.

(3) Reject all hypotheses for which p ≤ p_k.

(For brevity we omit the distinction between the exact value α, which would be estimated from all the possible repetitions, and the actually calculated one.)

Figure 4 gives the final picture of statistically significant changes in the time-frequency plane. It is constructed by displaying the high-resolution ERD/ERS map (Figure 3) only in the areas corresponding to the resels which revealed statistical significance in the procedure from Section 4. Desynchronization of the 12-Hz α occurs around the time of the movement (12th second). Synchronization of the 18-30 Hz β, occurring just after the movement, is divided in half by the harmonic of α (24 Hz). In the long premovement epoch no significant changes are detected, which suggests the robustness and reliability of the whole procedure.
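The step-up procedure of [13] above can be sketched as follows (the toy p-values are invented):

```python
import numpy as np

def fdr_threshold(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure: return the largest p_k with
    p_(k) <= (k/m) * q; hypotheses with p <= p_k are rejected."""
    p = np.sort(np.asarray(p_values))
    m = len(p)
    below = p <= (np.arange(1, m + 1) / m) * q
    if not below.any():
        return 0.0                       # nothing rejected
    return p[np.nonzero(below)[0][-1]]   # largest k satisfying the condition

# toy p-values: a few strong effects among many uniform nulls
rng = np.random.default_rng(2)
p_null = rng.uniform(0.0, 1.0, size=95)
p_sig = np.array([1e-6, 2e-5, 1e-4, 5e-4, 1e-3])
p_all = np.concatenate([p_sig, p_null])
thr = fdr_threshold(p_all, q=0.05)
print(thr, np.sum(p_all <= thr))   # all five strong effects survive
```

Unlike a Bonferroni correction at α/m, the threshold adapts to the data, retaining power when many resels genuinely change.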

CONCLUSIONS
The presented procedure gives high-resolution, free-of-cross-terms estimates of the average time-frequency energy density of event-related EEG, revealing the microstructure of rhythms. Time-frequency areas of significant changes are assessed via objective statistical procedures. This allows, for example, investigation of the minimum number of repetitions required to delineate the reactive rhythms. Application of this methodology may bring a significant improvement in basic research on the event-related changes of EEG rhythms, as well as in "per subject" customization of the ERD/ERS-based BCI.

REPRODUCIBLE RESEARCH
Software for calculating the MP decomposition (appendix), with complete source code in C and executables for GNU/Linux and MS Windows, plus an interactive display and averaging of the time-frequency maps of energy (in Java), are available at http://brain.fuw.edu.pl/~durka/software/mp. Datasets used in Figures 2-5 and Matlab code for calculating the maps and statistics are available at http://brain.fuw.edu.pl/~durka/tfstat/.

MATCHING PURSUIT ALGORITHM
In each of the steps, a waveform g_γn from the redundant dictionary D is matched to the signal R^n f, which is the residual left after subtracting the results of the previous iterations:

R^0 f = f,
R^n f = \langle R^n f, g_{\gamma_n} \rangle g_{\gamma_n} + R^{n+1} f,
g_{\gamma_n} = \arg\max_{g_{\gamma_i} \in D} |\langle R^n f, g_{\gamma_i} \rangle|,

where arg max_{g_γi ∈ D} means the g_γi giving the largest value of the product |⟨R^n f, g_γi⟩|.
Dictionaries (D) for time-frequency analysis of real signals are constructed from real Gabor functions:

g_\gamma(t) = K(\gamma) \, e^{-\pi((t-u)/s)^2} \sin\big(\omega(t-u) + \phi\big),

where N is the size of the signal, K(γ) is such that ∥g_γ∥ = 1, and γ = {u, ω, s, φ} denotes the parameters of the dictionary's functions. For these parameters no particular sampling is a priori defined. In practical implementations we use subsets of the infinite space of possible dictionary's functions. However, any fixed scheme of subsampling this space introduces a statistical bias in the resulting parameterization. A bias-free solution using stochastic dictionaries, where the parameters of the dictionary's functions are randomized before each decomposition, was proposed in [14].
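A stochastic dictionary in the sense of [14] can be sketched by drawing the parameters γ = {u, ω, s, φ} uniformly at random before each decomposition (the ranges below are illustrative, not those of the referenced implementation):

```python
import numpy as np

def stochastic_gabor_dictionary(n, size, rng):
    """Draw Gabor parameters at random (stochastic dictionary) so that no
    fixed subsampling scheme biases the resulting parameterization."""
    t = np.arange(n)
    atoms = np.empty((size, n))
    for i in range(size):
        u = rng.uniform(0, n)             # position
        s = rng.uniform(2, n)             # width (scale)
        w = rng.uniform(0, np.pi)         # angular frequency per sample
        phi = rng.uniform(0, 2 * np.pi)   # phase
        g = np.exp(-np.pi * ((t - u) / s) ** 2) * np.cos(w * (t - u) + phi)
        atoms[i] = g / np.linalg.norm(g)  # K(gamma) ensures unit norm
    return atoms

rng = np.random.default_rng(3)
D = stochastic_gabor_dictionary(n=128, size=500, rng=rng)
print(D.shape)   # (500, 128)
```

Re-drawing the dictionary before each decomposition averages out any artifacts that a fixed parameter grid would imprint on the fitted atoms.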
For a complete dictionary the procedure converges to f, in theory in an infinite number of steps [7], but in practice we use finite sums:

f \approx \sum_{n=0}^{M-1} \langle R^n f, g_{\gamma_n} \rangle \, g_{\gamma_n}.