Adaptive filtering and analysis of EEG signals in the time-frequency domain based on the local entropy

The brain dynamics in the electroencephalogram (EEG) data are often challenging to interpret, specially when the signal is a combination of desired brain dynamics and noise. Thus, in an EEG signal, anything other than the desired electrical activity, which is produced due to coordinated electrochemical process, can be considered as unwanted or noise. To make brain dynamics more analyzable, it is necessary to remove noise in the temporal location of interest, as well as to denoise data from a specific spatial location. In this paper, we propose a novel method for noisy EEG analysis with accompanying toolbox which includes adaptive, data-driven noise removal technique based on the improved intersection of confidence interval (ICI)-based algorithm. Next, a local entropy-based method for EEG data analysis was designed and included in the toolbox. As shown in the paper, the relative intersection of confidence interval (RICI) procedure retains the dominant dipole activity projected on electrodes, while the local (short-term) Rényi entropy-based analysis of the EEG representation in the time-frequency domain is efficient in detecting the presence of P300 event-related potential (ERP) at specific electrodes. Namely, the P300 are detected as sharp drop of entropy in the temporal domain that enabled accurate calculation of the index of the noise class for the EEG signals.

the underlying phenomena and brain dynamics of the subject. While some brain dynamics, like the activity during sleep and epileptic seizures, could be difficult to accurately correlate to induced events, some other potentials like contingent negative variation (CNV), mismatch negativity (MMN), and P300 are related to the occurred event. This event can be marked on a time scale, and the EEG following this event could be observed with higher temporal resolution. Such potentials which are neural responses of events are often referred to as event-related potentials (ERP). One of the early ERP researchers, Herb Vaughan in 1969, defined ERP as the general class of potentials that display stable time relationships to a definable reference event.
P300 is one of the most studied ERPs, arguably one of those potentials which is understood the most when compared to other ERPs [1]. Research on the P300 component by Squires, Squires, and Hillyard (1975)  P3b component is observed for targets that are infrequent but are in some sense expected; most researchers refer to this as P3 component or P300 component. It gets its name due to the fact that the early researchers noticed that the ERP had a peak amplitude at around 300 ms [2]. Work done by Donchin (1981) proposed that the P300 wave is somehow related to a process called "context updating"; this was confirmed by other researchers as well [3,4], considering these there has been development of practical response to stimuli communication methods. Such methods are built on algorithms that leverage real-time streaming of EEG data to classify between a P300 signal (which is target ERP), and a non-target ERP [4][5][6][7][8]. Any such system which interacts with computer and tries to interpret brain signals to do a certain task is called brain computer interface (BCI) systems. The importance of response to stimuli detection and identification has been shown in a significantly large number of experiments with different kinds of stimuli [4][5][6]9]. Popular among them is the visual P300 ERP experiment, often used as a speller [10]. Namely, in the visual P300 experiment, the user is shown a matrix of characters, images, or faces; the user is expected to focus or perform a specific response to target stimuli by counting or pressing a button whenever the stimuli flashes. This response to stimuli is expected to produce P300 ERP. Nonetheless, P300 are not the only brain dynamics that are occurring during the experiment, as other neuronal activities are constantly happening in the brain which, in our case, are considered as noise. Additionally, there are also other electrical activities that electrodes pick up like 50 Hz line noise, eye blink artifacts, and muscle artifacts. There are numerous effective artifact removal techniques which get rid of these artifacts, therefore we shall consider EEG data with dipole activations and without artifacts in our data processing steps.
With real-life EEG data, it is difficult to validate and study the advantages and limitations of data analysis methods, but with deep understanding of the data, it becomes clear what to expect from the data simulations, and how to treat the data, allowing the validation of data analysis methods with ground truth data. Moreover, the advantage of being able to manipulate effect sizes and noise characteristics in simulated data, which is not possible to do in real data, makes the simulation of data even more essential. A fair amount of research work has been done using simulated data [11][12][13][14]. In our case, we simulate data with dynamics, features, and parameters of P300 ERP in mind in order to make it look as similar as possible to real data. Next, we analyze some real-life P300 data [10], keeping in mind that the ground truth about this data could be affected by artifacts, noise, and mood of the subjects and other factors, and also that data sample is not large enough to draw any cognitive conclusions.
The rest of the paper is structured as follows. Section 2 provides information on the structure of the data analyzed in the paper. An adaptive, data-driven method for noise removal from EEG records is described in Section 3.1, followed by the representation of the EEG signals in timefrequency domain explained in Section 3.2. Section 3.3 presents the proposed approach for detecting the phenomena of interest in EEG time-frequency domain using the local (short-term) entropy measures both for simulated and a real-life P300 EEG data. Discussion on visual comparison of results are elaborated in Section 4, followed by conclusion in Section 5.

Structure of simulated data
The biophysical events, which give rise to scalp ERP, are presumed to occur if an excitatory neurotransmitter is released at the apical dendrites of a cortical pyramidal cell. Resulting phenomena is detected as the current flowing from the extracellular space into the cell, yielding a net negativity on the outside of the cell in the region of the apical dendrites. To complete the circuit, the current will also flow out of the cell body and basal dendrites, yielding a net positivity in this area. Together, the negativity at the apical dendrites and the positivity at the cell body create a tiny dipole. Hence, by interpreting this in terms of source activation, a dipole is defined as a pair of positive and negative electrical charges separated by a small distance.
As shown in Fig. 1a, we use a non-adaptive distributed source imaging method with 2004 dipoles for a standard head shape with fixed locations and orientations (their magnitudes are estimated by defining a set of electrode weights for each source location). Thus, 64 electrodes and 3 dipole orientations generate 64 × 3 × 2004 weighting matrix. For example, multiplying the data from all electrodes by the weights for the 637th element of that weighting matrix would give you the estimate of activity at the 637th voxel in the brain model for the selected head shape.
However, accurate spatial localization of P300 ERP is rather challenging and requires further technological advancements to achieve precise source localization. This is due to the fact that for each person's idiosyncratic folding pattern of the cortex is not the same. Thus, in this paper, we shall consider, based on reliable sources, that P300 dipole activation was largest at a midline parietal electrode site [1]. So, in order to simulate a P300 ERP, dipole activation is simulated to occur and peak at 300 ms, and the effect of this dipole activation is dominantly visible at electrode locations/numbers Pz(31), CPz(32), P1 (20), P2(57), CP1 (19), and CP2(56) (but it can also be seen on some other more distant electrodes) as visible in Fig. 1b. This kind of simulations where one dipole is activated is commonly addressed as forward model. Some of the disadvantages of the above method are that the weights for each electrode are not fine-tuned to the statistical properties of the data. Another disadvantage is the number of comparisons that can be computed must be controlled in statistical analyses. But having clear hypotheses to constrain the analyses and results search space will help address both the issues to a significant extent [15].
One particular disadvantage of the ERP technique is that ERPs are so small that it usually requires more than one trials to measure them accurately. Therefore, we consider 30 trails which seems to be a modest number of trails for a P300 experiment [9].
In order to make the simulated data as real as possible, let us also consider the situation where the subject who is responding to the event does not necessarily respond to the event with millisecond accuracy and with the same concentration and state of mind for each trial. This results in P300 ERP which might not accurately peak at 300ms causing latency variability. In addition, there are other brain dynamics which occur around P300, viz, P100 or P200 and N400, which might or might not be visible due to the effect of other dipole activation. Considering these factors, we do not just use a Gaussian waveform to represent a P300, but we do a pointwise multiplication of Gaussian and a trial unique sine wave (where each trial has its own random phase value) which results into a trial unique Morlet-like wavelet.
Let us now consider a Gaussian taper with a peak at 300 ms and sinusoidal waveform of 10 Hz.
where a is the amplitude of Gaussian, t is the total number of time point, m is peak time, and σ is the width of the Gaussian, and the sinusoidal waveform where f is the frequency of the sinusoidal waveform, t is the total number of time points, and ξ random value which shifts the phase of the waveform randomly on each trial. With t values same for both Gaussian and Sinusoidal, pointwise multiplication of these two waveforms gives us a wavelet which looks like a typical Morlet wavelet.
For the wavelets, which are generated for T = 30 trials, the phase randomly changes for each trial. The average of all these wavelets from each trial at the dipole level is shown in Fig. 2a.
Unlike the Gaussian taper which smoothly dissipates to zero on both the sides, Morlet wavelet integrates to zero. This means it has both positive and negative values in the middle and tapers to zero on both the ends such that the sum over integral is approximately equal to zero. This kind of narrow-band responses which are also transient in nature are typical for cognitive brain responses [16,17].
The noise in our simulated data is just an activation of other dipoles which are projected onto the electrodes. An immediate intuitive thought when trying to simulate noise is to add white noise, i.e., adaptive white Gaussian , but in our EEG simulation, it is crucial to consider that noise should look like temporal dynamics that have been recorded by the electrode placed on the scalp. These temporal dynamics in real-life data are the results of neural responses due to activities which are uncorrelated to event, viz, other cognitive, motor, perceptual, and language processes. Such data on the dipole level has a mixture of signals with different frequencies and amplitudes.
In order to represent this, we generate a pink noise, initially generated in the frequency domain, Fig.3a. It decreases with an increase in frequency, due to which it is also referred to as factorial noise or 1/f noise. The parameter called the exponential decay ed allows us to control how frequencies decays as we initially represent these amplitude varying frequencies in the frequency domain with where ξ is a random variable generated for each data point n which has 1024 data points as we consider 1 s long signal with sampling rate of 1024. A c represents the amplitude coefficients of ξ random variables which are multiplied by exponential parameter from Euler's formula e ik to give f c Fourier coefficients Here, ξ is the random sizes of amplitude parameter for n data points. Coefficients such as f c are generated for each dipole that is the real part of the inverse Fourier transform which represents the noise that is generated by each dipole The histogram representation of the noise in Fig.3b shows that the noise is normally distributed, and the time domain representation in Fig. 3c shows the temporal (2020) 2020:7 Page 5 of 18 structure of the data which very closely represents the real EEG signal. The projection of dipoles onto the electrodes can be represented with a linear algebraic equation where X is a matrix of dipole time series data, which is multiplied with L the lead field matrix to give matrix Y which is electrode time series data [18]. Figure 2b represents the trial average of ERP at electrode CPz, which shows activity due to the projection of dipoles onto CPz. The interesting thing about the electrode CPz is that it is one of those electrodes which is near to the scalp location where the P300 dipole activation is happening. Therefore, features of P300 in the ERP along with noise are visible in Fig. 2b. Now that we have established ground truth about the data, in order to evaluate the accuracy of the data analysis method, we can compare the results of the data processing pipeline in Section 3.1, against the data that is simulated and projected onto the electrodes. This gives us an evaluation of the ground truth about the data and also the effectiveness of the data analysis method.

RICI algorithm
A fairly simple and most commonly used method of isolating ERPs from the EEG noise (which is the result of activation of other dipoles) is to average across trials. Averaging across more and more trails will progressively reduce the EEG noise. But as the EEG noise N decreases as a function of the square root of the number of trails T, the size of noise in an average of T trails equals (1/ √ T) × N; therefore, increasing the number of trials only works to a certain extent. Figure 4 shows the trail averaged spatial activity across all the electrodes. Here, in each electrode, the ERPs are isolated from the EEG noise, but as it can be seen that  there is one major limitation to this method, i.e., the number of EEG trails used to average out EEG noise is not enough, as there is still much noise that is obscuring the features of interest. It might be tempting to increase the total number of trials by, for example, four times (T = 120), and this might reduce the noise by 50%. But this is not a practical solution since the time needed to detect ERP would be much higher, which in turn increases the length of the whole EEG recording session. An alternative approach that we propose in this paper is tested with the simulated P300 ERP (from Section 2). This method is based on adaptive temporal filtering, where adaptive filters are able to reconfigure themselves by tracking non-stationary changes in the signal.
A method that originally proposed this kind of adaptive filtering [19] was shown to efficiently adapt to unknown smoothness of the signal [20] ensuring close to optimal bias-to-variance trade-off [21], getting its name based on its functionality as the intersection of confidence intervals (ICI) method [20]. The goal of the ICI based denoising procedure is to obtain discrete estimatesŷ(t i ) as close as possible to the noise-free signal y(t i ); the ICI method does so by selecting upper and lower boundaries of confidence interval to track non-emptiness of confidence intervals intersection, whose limits can be expressed as: and where is the preset threshold, h is the window size which is less than or equal to optimal window size [22], and t i is instantaneous time for the estimatesŷ(t i ). Note that the performances of the ICI-based denoising are highly dependent on threshold [23], which causes oversmoothing for larger values of and undersmoothing for small values of . An improved ICI-based method, called the relative intersection of confidence intervals (RICI), has been proposed in [24]. The modified approach takes into account the amount of overlapping and defines R(t i , h) as a ratio of the overlapping versus the size of the considered confidence interval: improved method introduces threshold R c as a additional criterion defined as: where R c ∈[ 0, 1]. For our EEG signal, we choose the values = 4.4 and R c = 0.85, as suggested in [25], for temporal smoothing which significantly reduces noise retaining features of the narrow band cognitive response around 300 ms. This can be seen at electrode CPz(32), Pz(31), CP2(56), and P2(57) in Fig. 5, when compared to Fig. 4. Anyway, since RICI filtering does not have prior knowledge of the feature that needs to be retained, it cannot distinguish between other features and P300 ERP; therefore, there are features in Fig. 5 which may be misunderstood as useful information. Hence, in order to understand the behavior of the P300 feature better, we express the data in time-frequency domain in Section 3.2, in order to represent and understand the behavior of the distribution with respect to changes in its entropy in Section 3.3. As electrodes CPz, Pz, CP2, and P2 are primarily influenced by P300 dipole activation and given the correct adjustment of parameters and R c , which allows RICI to denoise the ERPs at these electrodes, the RICI results are shown and discussed with more details in Section 4.
Finally, it is worth mentioning that a plugin for EEGLAB toolbox [26], where the RICI method can be applied on any data that can be imported into EEGLAB, is been built. In this plugin, the parameters and R c can be adjusted manually to best suite the data that needs to be denoised and features that need to be retained. The goal here is to provide a tool to the researchers to manually do denoising on specific temporal regions of the data or run denoising on the whole dataset by selecting parameters of and R c using the RICI plugin.

Time-frequency ERP representations
The non-stationarities of an ERP signal are encoded in the phase spectrum of the signal, and their direct extraction causes problems like phase unwrapping, as the frequency characteristics of such a signal are equally important. The assumption that the signal is roughly stationary over some shorter (sliding) time window gives us a better frequency resolution because instantaneous frequency commonly describes phase variations. This allows us to compute the power spectrum of a signal over short windows of time resulting in high resolution (energy concentration) and integrity (correct boundaries in time and frequency) [27].
Hamming window decays quite fast relative to other window functions, which suites our need for building a distribution which reflects any quick changes in the signal onto the distribution. Let us consider a Hamming window of size h(τ ) = 7 as an analyzing window in a quadratic distribution: which is known as smoothed pseudo-Wigner-Ville distribution (SPWVD) [28]; we therefore obtain energy distribution by smoothing the density distribution with the window function h(τ ). The resulting time-frequency power plots for few EEG electrodes are shown in plot (b) of Figs. 6, 7, 8, 9, 10, and 11 of each electrode which shows the energy spectrum of the distribution for P300 ERP at electrodes CPz, Pz, CP2 and P2. Also, notice that some of these electrodes, viz, Oz and F6, do not have the P300 ERP or its feature in the time-frequency plane.

Rényi entropy
The EEG data have non-stationarities in the recordings, which resulted in a mixture of signal components with different time durations and frequency band widths. These varying frequencies in time have been expressed in the form of instantaneous power and energy spectrum using SPWVD, as shown in 13. Based on this, it has been established that an entropy of a multi-component system is relatively larger than for the mono-component system, and the generalized entropy of such system represented in the time-frequency domain is well expressed with Rényi entropy [28] as (14) where Cs(t, f ) is the distribution obtained from 13 and additionally normalized, and α is rank of Rényi measure. Figure 12 shows in red (circle point representation) the entropies of averaged ERPs for each electrode; it also shows in green (square point representation) the entropies of averaged ERPs for the same electrodes after the data had been processed by the RICI temporal denoising (as explained in Section 3.1). Change in global entropy after RICI shows that the standard deviation of the ERPs across the scalp for the given time period has increased by 0.1%. This shows that although global Rényi entropy is a good indicator of signal complexity, change in entropy for the ERPs obtained across the scalp provides us with very little information on what is happening on the electrode level. Therefore, in this paper, we explore the solution which can represent the P300 features using short-term Rényi entropy [28] using a sliding window of fixed size. This is done by considering different portions of the (t, f ) plane; for each time instant p to obtain different values of the Rényi entropy, where p is centered in the (t, f ) plane for the upper and lower window span t 0 , we can express the short-term Rényi entropy as The Rényi entropy of a signal is invariant to the active dipole orientation (if it is negative or positive) and does not care about the within subject variation; this makes it a robust tool for ERP analysis.
To check short-term Rényi entropy methods' performance with some real-life EEG dataset, we considered training datasets of two subjects from visual P300 experiment [10] which was recorded using the paradigm described by Donchin et al., in 2000, [5] and originally by Farwell and Donchin, in 1988 [6]. Researchers were interested in basically two types of responses which were recorded at 240 Hz, one that would appear if the character the user was focusing flashed, and thus was considered as target response, and the non-target response is the one where characters were flashing while the user was focused on the character he selected, waiting for it to flash. This produced a target ERP and a non-target ERP which are trail averaged with other targets and non-targets as shown in plot (a) of Fig. 13 and Fig. 14, and short-term Rényi entropy of target and non-target shows a clear difference on how the local changes in entropy are different for targets and non-targets; this is clearly visible in plot (b) of Fig. 13 and Fig. 14 for the electrode Cz. The dataset provided was prepossessed to minimize noise, and more details about the dataset and experiment can be obtained by studying P300 experiment conducted by Wolpaw, in 2002 [10].

Results and discussion
The position of P300 dipole in the head model of Fig. 1a and its projection due to dipole activation is shown in Fig. 1b, based on which we simulated P300 narrow band responses on the dipole level which has been trail averaged and shown in Fig. 2.
Other dipole activations which represent noise in our dataset have been initially simulated with factorial decaying frequencies in the frequency domain as shown in Fig. 3a, and the normal distribution of the histogram in Fig. 3b shows how pink noise is almost Gaussian, very similar to adaptive white Gaussian noise. Meanwhile, the time domain representation of the pink noise shown in Fig. 3c looks much more like real biological EEG activity and shows how it is different than the adaptive white Gaussian noise in its temporal structure.
Simulated P300 ERP on an electrode level, which is a combination of simulated P300 narrow band response and noise, projecting onto the scalp is shown in Fig. 2b. Such trial averaged scalp projections of activity in each electrode has been shown in Fig. 4 for all the 64 EEG electrodes. The scalp projections of activity after adaptive RICI temporal denoising is shown in Fig. 5. Finally, Fig. 12 compares before and after RICI global Rényi entropy and shows the shift of global entropy for each of the 64 EEG electrodes.
The pre-processed Visual P300 experiment datasets obtained from BCI competition 3 [10] have subjects A and B EEG data at the Cz electrode location for both target and non-target ERP activities. This has to be averaged across their respective trials, and their ERP is shown in plot (a) of Figs. 13 and 14, and plot (b) of Figs. 13 and 14 which shows the local short-term Rényi entropy for both target and non-target ERP to facilitate discussion and comparison with results of the simulated dataset.
Analysis of real-life P300 ERP in general [1], as well as the dataset we obtained and studied [10], shows that the P300 ERP waveform vary significantly between subjects. In addition, external factors as drop in concentration or internal factors like noise due to other dipole activations could cause small variations in latency and phase of P300 ERP [1,29]. This is the reasons why the proposed method for ERP analysis is designed not to depend on any prior information about the signal.
With respect to real-life P300 ERP datasets, the simulated ERP dataset has a lot in common to the one explained in Section 2, so much so that we can consider the factors like the variation in latency and phase that affect the real-life P300 ERP dataset, to stand true to our simulated dataset as well.
A visual interpretation of the temporal dynamics of the ERPs of our simulated data at near dipole electrodes CPz, Pz, CP2, and P2 (in the plot (a) of Figs. 6, 7, 8, and 9) shows that the effect of P300 dipole activation, although visible around 300 ms, has been corrupted by noise from other dipoles. The time-frequency representation in plot (b) of Figs. 6, 7, 8, and 9 for the same electrodes more clearly shows the P300 feature, but it also has many ridges throughout the temporal plane. Far dipole electrodes Oz and F6 in plots (a) and (b) of Figs. 10 to 11 do not have P300 feature as they are relatively away from P300 dipole activation region (however, they also have ridges and features due to other dipole activations which are not relevant to P300).
Also, it is worth noting that the recommended (in Section 3.3) short-term Rényi entropy changes in Figs. 6, 7, 8, and 9 are not consistent for the near dipole electrodes.
But once the ERPs are denoised with the RICI, as suggested in Section 3.1, including P300 ERP component, other dominant components are also retained, which is visually comparable between Fig.4 and Fig. 5.
Comparison between plots (a) and (b) of Figs. 6, 7, 8 9, 10, and 11) and Figs. 15, 16, 17, 18, 19, and 20) both for near dipole electrodes and far dipole electrodes show the cleaner ERP temporal dynamics and lesser ridges in the time-frequency representation in the latter. By the way, the P300 ERP and its temporal dynamics around 300 ms are more clearly visible at near dipole electrodes in plots (a) of Figs. 15, 16, 17, and 18 than in Figs. 6, 7, 8, and 9.
Interestingly, the short-term Rényi entropy method (elaborated on in the Section 3.3) after RICI denoising (as explained in Section 3.1), gives a bell-shaped curve around the temporal region where P300 ERP is dominant. This is visible for near dipole electrodes in Figs. 15, 16, 17, and 18, but it is not present for far dipole electrodes (which has nothing but noise, and it can be considered to have similar uncorrelated EEG activity as non-targets).
The bell-shaped curve is comparable with the shortterm Rényi entropy of real visual P300 ERP dataset that has been analyzed in plot (b) of Fig. 13 to Fig. 14 for targets. As local Rényi entropy shows a steep drop in the zone of event detection and then a steady climb indicating the local similarity in the P300 activity (which is then followed by a steep drop, representing the difference between ERP and pink noise), this can be noted as area of detection for our ERP. The reason why the bell shape is much larger with real visual P300 ERP is explained by the original sampling rate being 240 samples per second, and the chosen short-term Rényi entropy window size is selected to consider 101 samples, thus making the area of detection larger. In simulated P300 ERP whose sampling rate is 1024, a short-term Rényi entropy window size of 101 creates smaller local entropy bell-shaped curve.
Note that the proposed entropy-based analysis requires the short-term Rényi entropy window to be carefully selected for the ERP based on its temporal structure, since the short-term Rényi entropy window size should be large enough to detect the ERP component of interest. We therefore have allow user in the short-term Rényi entropy toolbox plugin to manually select the desired window size. However, an automatic procedure for selecting the optimal window size is planned to be investigated in our future work.

Conclusion
The paper present a novel approach for noisy EEG signal interpretation in the time-frequency domain. In order to analyze EEG dynamics more accurately, it is necessary to denoise the data. Noise removal was performed using the improved adaptive, data-driven ICI method, called the RICI method (which was also included in the accompanying toolbox). The results of the RICI filtering showed that the P300 feature was successfully retained in the time-frequency domain while the noise was significantly reduced. Next, a local Rényi entropy was proposed as a tool for the P300 detection in the time-frequency distributions of the denoised EEG records. It was shown that the local Rényi entropy is superior to peak-based measures due to its insensitivity to latency variability and change in dipole orientation. Both the RICI and local Rényi entropy analysis have proved to perform well for the simulated data where ground truth is known. The method was also applied to real-life P300 data showing the promising results. In our future work, we plan to perform comprehensive testings on the large P300 datasets and other ERPs in order to obtain detailed performance analysis of the proposed technique for analysis of the EEG records in the time-frequency domain. For this purpose, we have also built a toolbox plugin to automatize large datasets analysis.