Evaluation of features and channels of electroencephalographic signals for biometric systems
EURASIP Journal on Advances in Signal Processing volume 2024, Article number: 58 (2024)
Abstract
Biometric systems are essential tools in modern society, where most of our personal information lives in digital form. Although there is a significant variety of biometrics, electroencephalogram (EEG) signals are useful because they guarantee that the person is alive, are universal, and are not falsifiable. Nevertheless, EEG processing needs to address some challenges to become a viable technique for building production-ready biometric systems. These challenges include the adequate selection of features and channels that maximize the quality of the results and optimize resources. This work provides an analysis of which features and channels are the most important for the correct operation of a biometric system. The experimental analysis worked with two datasets and evaluated 19 features belonging to three groups: wavelet-based, spectral, and complexity. Five classifiers were trained: multilayer perceptron, AdaBoost, random forest, support vector machine, and K-nearest neighbors. The results showed that the best feature for developing a biometric system is the standard deviation extracted from the coefficients of a three-level discrete wavelet transform. Additionally, the experimental results with the two datasets showed that the proposed method for channel selection can reduce the necessary number of channels while maintaining performance. Our results, from one of the datasets, showed a reduction of 21 channels (from 32 to 11) and indicated that the best channels to develop biometric systems seem to be those located on the central area of the scalp.
1 Introduction
Most modern authentication systems rely on personal identification numbers (PINs) or on image-based techniques such as fingerprints, facial patterns, iris patterns, and hand geometry [1]. However, these techniques can be easily tricked or simulated, and PINs are also prone to being forgotten or stolen. Electroencephalographic (EEG) signals have been proposed as an alternative technique for biometric systems since they are universal, meaning every human being produces EEG signals. Moreover, these signals can show information about cognitive processes and subject-specific genetics. The disadvantages of this kind of biometrics are related to the noise associated with brain signals, which makes them more complex and challenging to process than traditional biometric traits. Nonetheless, some features extracted from EEGs can help process and classify them to differentiate among subjects.
Another concern when working with electroencephalograms for biometric systems development is how to record and use them with a functional, portable, and low-cost implementation. For this reason, biometric systems need a selection of channels that extracts the most essential and helpful information to classify subjects. It is beneficial to avoid a full set of 14 or 32 electrodes, which are the typical numbers of channels found in commercial EEG headsets. The channel selection process could result in EEG-based biometric systems with fewer channels, decreasing the computational and monetary cost while keeping high performance and portability.
The proposed methodology to find the best features and channels includes two parts. The first part focuses on identifying the best feature or set of features to extract from the EEG signals that effectively distinguish one person from another. This evaluation compares the subject classification results using five machine learning algorithms for 19 different features extracted from the EEG. The second part of this study concentrates on selecting the most appropriate number of EEG channels to distinguish between subjects and on analyzing their impact on the performance of the system.
The main contribution of this work is a methodology to select the best feature or features and the optimal number of EEG channels. In this work, the best feature was the standard deviation extracted from the coefficients of a three-level discrete wavelet transform, and the optimal number of channels was different for each dataset, obtaining a reduction of 21 channels in the best scenario. Additionally, another interesting contribution is the collection of characteristics useful for EEG-based biometric systems. The article is organized as follows. Section 2 summarizes the related work on feature and channel selection. Section 3 describes the datasets used in this work and the proposed methodology. Section 4 presents the experimental results and their corresponding analysis. Finally, Sect. 6 contains the conclusions and future work.
2 Related works
The ability of the features to capture the most distinctive characteristics of the subjects is critical to achieving high performance in EEG-based applications [2]. Due to the wide range of applications based on EEGs, researchers have focused in recent years on finding a suitable method for feature selection. The most common methods determine the best feature and include a channel selection approach [3]. However, most feature or channel selection methods target specific activity classification, where the goal is to differentiate between activities rather than people. For this reason, the results of those methods cannot be directly applied to the development of biometrics: from a security point of view, what matters is not the neurological stimulus used for recording the signals but the precision with which one person is distinguished from another.
Schroder et al. [4] proposed an automatic feature selection combining a genetic algorithm with a Support Vector Machine (SVM). They considered each channel as a feature, so their method aimed to find the best channels for developing brain–computer interfaces (BCIs). The authors concluded that the optimal set of features for developing BCIs is strongly dependent on the subjects and the experimental paradigm. In the development of biometric systems, a requirement is to establish a method independent of the people involved.
Even though the channels can be used as features, most EEG-based applications compute some features capable of representing the channels’ most relevant information while reducing the dataset dimension. For this reason, [5] first evaluated a set of features including the autoregression (AR) model, the power spectrum of the time domain (TPS), the power spectrum of the frequency domain (FPS), and the phase-locking value (PLV). These features were extracted from all the available channels and then evaluated with an SVM. After the feature evaluation, they employed support vector machine recursive feature elimination (SVM-RFE) to select the most discriminative channels, but only for the AR model because it showed the best performance in the previous step. Despite the selection of channels, the authors proposed using 32 EEG channels. Considering that using 32 channels still makes it hard to develop portable biometric systems, more features need to be evaluated to further reduce this number.
Another approach for selecting the best features and channels is finding a measurement representing intrasubject and intersubject variability. In this kind of measurement, the best features and channels have minimum intrasubject variability and maximum intersubject variability. Following this approach, Kang, Lee, and Kim [1, 6] provided two analyses of features and channels for EEG biometrics. In the first analysis, they evaluated power spectral density (PSD) features and Lyapunov exponents with a criteria index (CI) that consisted of three types of variances. Using this index, they found that the maximum Lyapunov exponent had the maximum CI value among all types of EEG features. Furthermore, two (T4, T6) out of sixteen channels had the highest CI values over all brain areas [1].
A second study [6] evaluated the performance of seven features (alpha/theta, alpha/beta, theta/beta power ratio, sample entropy, permutation entropy, entropy, and median values of distribution) extracted for each of the sixteen EEG channels. As a result, 112 features composed their initial set of features. Then, they used mutual information to select the best ones. After their analysis, the set of features was reduced to nine, and the user identification results were higher than using PCA reduction [6]. Regardless of the efficiency of the proposed measurements, the main limitation of both works was the size of the dataset, because it only contained the information of seven subjects with recordings of sixteen channels.
A different approach to feature and channel selection for subject identification is presented by [7]. The authors first performed a channel selection guided by the assumption that electrooculographic (EOG) interference in the resting state with eyes open and the bursting alpha activity in the resting state with eyes closed could lead to low authentication performance. Using the spectral power, they investigated the effect of these two factors and finally reduced the number of channels from 56 to 34. Using the selected channels, they analyzed ten single-channel features (seven spectral and three nonlinear) and ten multichannel features by conducting network analysis based on phase synchronization. The authors were able to select the best feature for the dataset in the study.
Despite the existence of various approaches for selecting optimal features and channels in previous research, there is a notable absence of studies that confirm their findings across multiple datasets obtained from diverse EEG headsets. Furthermore, there is a lack of comparative analyses involving classification algorithms to ascertain the most effective feature-channel pairing. Additionally, features derived from the wavelet transform have been demonstrated to be effective for user identification [8,9,10]. Consequently, incorporating wavelet-based features into the initial feature set could significantly enhance the understanding of EEG-based biometric systems.
3 Methodology
The proposed method is divided into four main steps: data acquisition, feature extraction, feature selection, and channel selection. The following subsections discuss the details of each step. Moreover, it is important to mention that this study extends the EEG-based applications using Python (EBAPy) framework [11]. Due to the flexibility of this framework, this study added functionalities to EBAPy [12].
3.1 Data acquisition
This work employed two datasets to train, test, and experiment with our methodology. The first dataset is the open-access dataset “DEAP” [13]. It has recordings of 32 healthy participants (50% female), aged between 19 and 37 (mean age is 27). Each recording has 32 EEG channels, 12 peripheral channels, three unused channels, and one status channel. All subjects recorded 40 trials, and each trial has a duration of 63 seconds (3 seconds of baseline and 60 seconds of trial). In each trial, the participants were presented with a YouTube video aimed at evoking particular sentiments. After each video, the subjects scored their level of valence, arousal, dominance, and liking.
Even though the DEAP dataset aims to analyze human affective states, it can be used for biometrics development. The only requirement is to know which subject each EEG corresponds to, regardless of the task performed at recording time. Furthermore, using a dataset with different affective states is advantageous because each subject’s EEGs can vary significantly between trials, providing a more realistic environment.
The DEAP dataset has two versions, one with the raw signals and another with already preprocessed signals. This work uses the preprocessed version of the DEAP dataset. In this version, the EEG signals were downsampled from 512 Hz to 128 Hz, passed through a bandpass frequency filter from 4 to 45 Hz, and through a common average reference filter to improve the signal-to-noise ratio. Additionally, the electrooculographic (EOG) artifacts were removed using a blind source separation technique.
The second dataset is the open-access “BIOMEXDB” [14]. The main reason for using another dataset was to increase the reliability of this study. In contrast to DEAP, this dataset aims to develop multimodal biometric systems by providing EEG, audio, and video recordings. However, our research only analyzes the EEGs.
The information in BIOMEXDB corresponds to 51 healthy participants (49% female), aged between 16 and 61 (mean age 29). Each participant registered 135 trials of 2.5 seconds, where each trial was the pronunciation of a number between 1 and 10. Each EEG was recorded with a sampling frequency of 2048 Hz and 14 channels. However, the signals were downsampled to 128 Hz.
As with DEAP, the original labels of BIOMEXDB were replaced by the subject to which each EEG belongs. Based on the results of previous works [15, 16], this research uses 1.75 seconds of recording taken randomly from each trial.
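The random 1.75-second segment can be illustrated with a minimal sketch. It assumes a single-channel trial stored as a plain list of samples at 128 Hz; the function name `random_crop` and the placeholder trial are illustrative and are not taken from EBAPy.

```python
import random

FS = 128             # sampling rate after downsampling (Hz), as stated in the text
CROP_SECONDS = 1.75  # segment length used in the study


def random_crop(trial, fs=FS, seconds=CROP_SECONDS, rng=random):
    """Return a random contiguous segment of one single-channel trial."""
    n = int(round(fs * seconds))  # 224 samples at 128 Hz
    if len(trial) < n:
        raise ValueError("trial is shorter than the requested segment")
    start = rng.randrange(len(trial) - n + 1)
    return trial[start:start + n]


# Placeholder trial: 2.5 s at 128 Hz gives 320 samples (values are sample indices).
trial = list(range(320))
segment = random_crop(trial, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps the selection reproducible across runs.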
3.2 Feature extraction
This study evaluates 19 features belonging to three categories: spectral, complexity, and wavelet-based. Each feature is detailed below.
3.2.1 Spectral features
Dauwels et al. [17] used the spectral features to quantify the changes in the signal given by its power. When working with spectral features, it is necessary to work with the Fourier transform of the signal.

Power Spectral Density (PSD). The PSD shows the strength of the energy variation of a signal as a function of frequency. It is defined as the discrete-time Fourier transform of the covariance sequence [18]. Nevertheless, in this work, the PSD was obtained using the “Periodogram,” which is a MATLAB function. The periodogram is a nonparametric estimate of the PSD, and it is defined as the Fourier transform of the biased estimate of the autocorrelation sequence [19]. To calculate it, we use the sampling frequency of the signals, which is 128 Hz.

Total Average Power (TAP). Power is defined as work per unit of time, that is, the amount of energy transferred per unit of time. For a signal x, the total average power is calculated as:
$$\begin{aligned} TAP = \frac{\sum x^2}{\text {length}(x)}. \end{aligned}$$(1)

Median Frequency (MFreq). The MFreq is defined as the frequency at which the total spectral power is halved [20]. It is expressed by the following equation:
$$\begin{aligned} \sum _{j=1}^{MDF} P_j = \frac{1}{2}\sum _{j=1}^M P_j, \end{aligned}$$(2)
where \(P_j\) is the EEG power spectrum at the frequency bin j, MDF is the frequency value at which the power spectrum is divided into two parts with equal integrated power [21], and M is the number of frequency bins [22].
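As a rough illustration of Eqs. (1) and (2), the sketch below computes the total average power of a signal and the median frequency of an already estimated power spectrum. The function names are illustrative, and the median frequency is taken here as the first bin at which the cumulative power reaches half of the total, which is one common discrete reading of Eq. (2).

```python
def total_average_power(x):
    """Eq. (1): mean of the squared samples."""
    return sum(v * v for v in x) / len(x)


def median_frequency(freqs, power):
    """Eq. (2): first frequency at which cumulative power reaches half the total."""
    half = 0.5 * sum(power)
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= half:
            return f
    return freqs[-1]


tap = total_average_power([1.0, -1.0, 2.0, -2.0])
mfreq = median_frequency([0, 1, 2, 3], [1.0, 1.0, 1.0, 1.0])
```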

Relative Power (RelPow). This method consists of computing the spectral power in each frequency band of the EEG and then computing the percentage of the total power that each band has. For computing the RelPow, this work used four frequency bands: \(\delta\) (0–4 Hz), \(\theta\) (4–8 Hz), \(\alpha\) (8–12 Hz), and \(\beta\) (12–30 Hz).
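A minimal sketch of the relative power computation follows, assuming the power spectrum has already been estimated. Two choices here are assumptions and not details from the text: band edges are treated as half-open intervals so that no bin is counted twice, and the total is taken as the sum over the four bands.

```python
# Band limits in Hz, as listed in the text.
BANDS = {"delta": (0, 4), "theta": (4, 8), "alpha": (8, 12), "beta": (12, 30)}


def relative_power(freqs, power, bands=BANDS):
    """Fraction of the summed band power contributed by each band."""
    band_power = {
        name: sum(p for f, p in zip(freqs, power) if low <= f < high)
        for name, (low, high) in bands.items()
    }
    total = sum(band_power.values())
    return {name: value / total for name, value in band_power.items()}


# Flat placeholder spectrum with one bin per Hz from 0 to 29 Hz.
rel = relative_power(list(range(30)), [1.0] * 30)
```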
3.2.2 Complexity features
This category of features quantifies the entropy and complexity of a system. In the context of information theory, entropy is defined as the measure of the uncertainty associated with a random variable [17]. The greater the entropy, the more complex the system is.

Tsallis Entropy (TsEn). The TsEn is a generalization of the Boltzmann–Gibbs entropy [23], and it is determined by quantizing the amplitude of the EEG. As mentioned in [24], given a discrete set of probabilities \(\{p_i\}\) with the condition that \(\sum p_i = 1\), the TsEn is defined as:
$$\begin{aligned} S_q(p_i) = \frac{k}{q-1}\left( 1-\sum _{i=1}^W p_i^q \right) , \end{aligned}$$(3)
where \(q \in {\mathbb {R}}\), \(W \in {\mathbb {N}}\) is the total number of possible configurations, and k is a conventional positive constant. For this work, we used \(k=1\) and \(q = 3\), as recommended in several studies for determining changes in a signal [25].
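Equation (3) can be sketched as follows with \(k=1\) and \(q=3\). The text does not specify how the amplitude is quantized, so the equal-width histogram and the `n_bins` parameter are assumptions made only for illustration.

```python
def tsallis_entropy(x, n_bins=10, q=3.0, k=1.0):
    """Tsallis entropy of the amplitude histogram of x (Eq. 3)."""
    low, high = min(x), max(x)
    if high == low:
        return 0.0  # a constant signal occupies one bin, so p = 1 and S_q = 0
    counts = [0] * n_bins
    for v in x:
        # Map the amplitude to a bin index; the maximum falls in the last bin.
        idx = min(int((v - low) / (high - low) * n_bins), n_bins - 1)
        counts[idx] += 1
    probs = [c / len(x) for c in counts if c > 0]
    return k / (q - 1.0) * (1.0 - sum(p ** q for p in probs))


two_level = tsallis_entropy([0.0, 1.0], n_bins=2)
```

With two equally likely amplitude levels, the sum of \(p_i^3\) is 0.25, so the entropy is \(0.5 \times 0.75 = 0.375\).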

Approximate Entropy (ApEn). The ApEn measures the logarithmic likelihood that runs of patterns that are close (within a tolerance window r) for m contiguous observations remain close on subsequent incremental comparisons. In this work, we used the values of \(r=0.01\) and \(m=1\) to determine the amplitude variation in the signal [26]. To compute the ApEn, the following steps should be followed [27]:

1. Form a time series \(\{u(n)\} = u(1), u(2), ..., u(N)\) with N data values equally spaced in time.

2. Fix \(m \in {\mathbb {Z}}\) and \(r \in {\mathbb {R}}^+\), where m is the length of the compared runs of data and r is the filtering level, or tolerance window.

3. Form a sequence of vectors \(x(1), x(2),...,x(N-m+1) \in {\mathbb {R}}^m\), an m-dimensional space, defined by \(x(i) = [u(i), u(i+1), ..., u(i+m-1)]\), where \(i\in \{1, ... , N-m+1\}\).

4. Define d[x(i), x(j)], the distance between x(i) and x(j), as the maximum absolute difference between their respective scalar components:
$$\begin{aligned} d[x(i), x(j)] = \max _{k=1,2,...,m} |u(i+k-1)-u(j+k-1)|. \end{aligned}$$(4)

5. For a given x(i), count the number of j (\(j \in \{1,...,N-m+1\}\)) such that \(d[x(i),x(j)] \le r\), denoted as \(N^m(i)\). Then:
$$\begin{aligned} C_r^m(i) = \frac{N^m(i)}{N-m+1}, \end{aligned}$$(5)
for \(i\in \{1, ... , N-m+1\}\).

6. Find the natural logarithm of each \(C_r^m(i)\) and average it over i:
$$\begin{aligned} \Phi ^m(r) = \frac{1}{N-m+1} \sum _{i=1}^{N-m+1} \ln C_r^m(i). \end{aligned}$$(6)

7. Increase the dimension from m to \(m+1\) and then find \(C_r^{m+1}(i)\) and \(\Phi ^{m+1}(r)\).

8. Define approximate entropy as:
$$\begin{aligned} ApEn = \Phi ^m(r) - \Phi ^{m+1}(r). \end{aligned}$$(7)
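The eight steps can be sketched directly in Python. This is a straightforward quadratic-time illustration with the defaults \(m=1\) and \(r=0.01\) from the text, not the implementation used in the study; the helper name `_phi` is illustrative.

```python
import math


def _phi(u, m, r):
    """Steps 3 to 6: average log-fraction of vectors within tolerance r."""
    n = len(u) - m + 1
    vectors = [u[i:i + m] for i in range(n)]
    total = 0.0
    for xi in vectors:
        count = sum(
            1 for xj in vectors
            if max(abs(a - b) for a, b in zip(xi, xj)) <= r
        )
        total += math.log(count / n)  # count >= 1 because xi matches itself
    return total / n


def approximate_entropy(u, m=1, r=0.01):
    """Steps 7 and 8: difference between Phi at dimensions m and m + 1."""
    return _phi(u, m, r) - _phi(u, m + 1, r)


apen_constant = approximate_entropy([1.0] * 20)
```

Because every vector is compared with itself, the logarithm is always defined; this self-match is also the source of the bias that sample entropy is designed to reduce.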


Sample Entropy (SampEn). The SampEn is a refinement of the ApEn designed to have a smaller bias since it does not include self-matches [28]. Assuming a time-series sequence of length N, \(u = \{u_1, \ldots ,u_N\}\), \(N-m+1\) vectors \(x_m(i)\) with \(i\in \{1, \ldots , N-m+1\}\) are formed, where \(x_m(i) = \{u(i+k): 0\le k \le m-1 \}\). Then, the distance function \(d[x_m(i),x_m(k)]\) is defined as the maximum absolute difference between the components of the vectors:
$$\begin{aligned} d[x_m(i),x_m(k)] = \max \{|u(i+j)-u(k+j)| :0\le j \le m-1 \}. \end{aligned}$$(8)
Finally, the SampEn is defined as:
$$\begin{aligned} SampEn = - \log \frac{A}{B}, \end{aligned}$$(9)
where A is the number of template vector pairs having \(d[x_{m+1}(i), x_{m+1}(k)] < r\), and B is the number of template vector pairs having \(d[x_{m}(i), x_{m}(k)] < r\) [28]. The result is never negative, and a small value is an indicator of less noise and more self-similarity. For this feature, we also used the values of \(r=0.01\) and \(m=1\).
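A compact sketch of the sample entropy follows. It assumes the common convention of using the same \(N-m\) starting positions for both template lengths and of returning infinity when no matching pairs exist; neither detail is stated in the text.

```python
import math


def sample_entropy(u, m=1, r=0.01):
    """SampEn = -log(A / B), counting template pairs without self-matches."""
    n = len(u)

    def count_pairs(length):
        # The same N - m starting positions are used for both template lengths.
        templates = [u[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist < r:
                    count += 1
        return count

    b = count_pairs(m)
    a = count_pairs(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matching pairs are found
    return -math.log(a / b)


sampen_periodic = sample_entropy([0.0, 1.0] * 10)
```

For a strictly periodic signal, every pair that matches at length m also matches at length m + 1, so A equals B and the entropy is zero.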

Lempel–Ziv Complexity (LZC). This method counts the number of different patterns in a signal of length n. The fewer such patterns, the better the signal can be compressed. The compression rate is a measure of the regularity of a signal [29].
To simplify the computation, the original signal must be coarse-grained and transformed into a symbol sequence. To generate a two-state sequence \(R = \{r(1), r(2), \ldots , r(n)\}\), the following equation is applied:
$$\begin{aligned} r(i) = \left\{ \begin{array}{ll} 0, &{} \text {if} \quad x(i) < T_h\\ 1, &{} \text {if} \quad x(i) \ge T_h \end{array}\right. \end{aligned}$$
where n is the length of the signal x and \(T_h\) is the threshold, which is normally the mean value of the sequence.
Using the binary sequence R, the counter c(n) of the different patterns in the sequence is calculated using the following process:

1. Let S and Q denote two strings. SQ is the concatenation of S and Q. \(SQ\pi\) is the concatenated string with the last character deleted, and \(v(SQ\pi )\) is the vocabulary of all the different substrings of \(SQ\pi\). For example, consider \(c(n) = 1\), \(S=S_1\) and \(Q = S_2\), so \(SQ\pi =S_1\).

2. If \(Q \in v(SQ\pi )\), then Q is a substring of \(SQ\pi\):
\(S = S_1S_2S_3 \dots S_r\)
\(Q = S_{r+1}\)
\(SQ\pi = S_1S_2...S_r\)

3. If \(Q=S_{r+1}S_{r+2} \dots S_{r+i}\) is not a substring of \(SQ\pi = S_1S_2S_3 \cdots S_{r+i-1}\), c(n) increases by one.

4. Because a new pattern was found, S and Q are also updated: \(S=S_1S_2 \dots S_{r+i}\) and \(Q = S_{r+i+1}\).

As a result, c(n) is the number of different substrings contained in R, meaning that c(n) represents the different patterns in a sequence. Because c(n) may vary with the sequence length, it should be normalized. The upper bound of c(n) is:
$$\begin{aligned} c(n) < \frac{n}{(1-\epsilon _n)\log _\alpha (n)}, \end{aligned}$$(10)
where n is the length of the sequence, \(\alpha\) is the number of different symbols, and \(\epsilon _n\) is a small quantity, with \(\epsilon _n \rightarrow 0\) when \(n \rightarrow \infty\). Therefore:
$$\begin{aligned} \lim _{n \rightarrow \infty }c(n) = b(n) = \frac{n}{\log _\alpha (n)}. \end{aligned}$$(11)
Finally, c(n) can be normalized as:
$$\begin{aligned} C(n) = \frac{c(n)}{b(n)}. \end{aligned}$$(12)
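The counting procedure and the normalization in Eq. (12) can be sketched as below for a binary sequence (\(\alpha = 2\)). The function names are illustrative, the substring test is a simple left-to-right scan and not an optimized parser, and the signal needs more than one sample because \(\log_2(1) = 0\).

```python
import math


def binarize(x):
    """Coarse-grain x into a 0/1 string using the mean as threshold T_h."""
    threshold = sum(x) / len(x)
    return "".join("1" if v >= threshold else "0" for v in x)


def lempel_ziv_count(s):
    """Count the patterns c(n) found while scanning s from left to right."""
    n = len(s)
    c = 0
    start = 0
    while start < n:
        length = 1
        # Extend Q while it is a substring of SQ with its last symbol removed.
        while start + length <= n and s[start:start + length] in s[:start + length - 1]:
            length += 1
        c += 1
        start += length
    return c


def normalized_lzc(x):
    """C(n) = c(n) / b(n) with b(n) = n / log2(n) for a binary sequence."""
    s = binarize(x)
    n = len(s)
    return lempel_ziv_count(s) / (n / math.log2(n))


example = "0001101001000101"
c_example = lempel_ziv_count(example)
```

For the 16-symbol example, the scan yields the patterns 0, 001, 10, 100, 1000, and 101, so c(n) is 6 and the normalized value is 6 / (16 / 4) = 1.5.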
Higuchi Fractal Dimension (HFD). According to Giannakakis et al. (2014), the fractal dimension is a nonlinear measure in the time domain that is used to characterize the complexity of a time series. It is helpful to quantify the complexity and self-similarity of a signal [30]. The degree of complexity of the sequence increases as the fractal dimension increases. The following procedure was used to obtain this measure. Given a one-dimensional time series \(X = x(1), x(2), ..., x(N)\), form k new time series \(X_k^m\) defined by:
$$\begin{aligned} X_k^m = \left\{ x(m), x(m+k), x(m+2k),..., x\left( m + int\left( \frac{N-m}{k}\right) \times k\right) \right\} , \end{aligned}$$(13)
where k and m are both integers, and \(int(*)\) denotes the integer part of ‘\(*\)’. k represents the discrete time interval between points, and \(m=1,2,...,k\) represents the initial time value [30]. For example, taking \(k=3\) and \(N=100\):
$$\begin{aligned} & X_3^1: x(1), x(4), x(7), \dots , x(94), x(97), x(100) \\ & X_3^2: x(2), x(5), \dots , x(95), x(98) \\ & X_3^3: x(3), x(6), \dots , x(96), x(99) \end{aligned}$$
For each of the time series constructed by the previous equation, the length is computed in the following way:
$$\begin{aligned} L(m,k) = \frac{1}{k} \left( \sum _{i=1}^{int(\frac{N-m}{k})} \left| x(m+ik) - x(m+(i-1)k) \right| \right) \times \frac{N-1}{int(\frac{N-m}{k})k} , \end{aligned}$$(14)
with N as the length of the original time series X, and \(\frac{N-1}{int(\frac{N-m}{k})k}\) as a normalization factor. The length for each k is then obtained as:
$$\begin{aligned} L(k) = \frac{1}{k} \times \sum _{m=1}^kL(m,k). \end{aligned}$$(15)
These steps are repeated for each \(k = 1,2,\dots ,k_{max}\). The next step in the procedure is to plot L(k) against 1/k on a double logarithmic scale, with \(k =1,2,\dots , k_{max}\). The data should fall on a straight line whose slope is equal to the fractal dimension of the time series X. Therefore, the Higuchi fractal dimension (HFD) is defined as the slope of the line that fits the pairs \(\{\ln (L(k)), \ln (1/k)\}\), determined using a least-squares method. The value of \(k_{max}\) is chosen at the point at which the fractal dimension reaches saturation. A value of \(k_{max}=60\) was chosen for this study.
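The procedure in Eqs. (13) to (15) can be sketched as follows. The code shifts the 1-based indices of the equations to Python's 0-based lists and fits the slope of \(\ln L(k)\) against \(\ln (1/k)\) with an ordinary least-squares formula; it assumes a non-constant signal (a zero curve length has no logarithm) and \(k_{max} \ge 2\).

```python
import math


def higuchi_fd(x, k_max):
    """Slope of ln L(k) versus ln(1/k) for k = 1..k_max."""
    n = len(x)
    xs, ys = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(1, k + 1):  # 1-based offsets, as in Eq. (13)
            steps = (n - m) // k
            if steps == 0:
                continue
            dist = sum(
                abs(x[m + i * k - 1] - x[m + (i - 1) * k - 1])
                for i in range(1, steps + 1)
            )
            # Eq. (14): normalized curve length for offset m and interval k.
            lengths.append(dist * (n - 1) / (steps * k) / k)
        xs.append(math.log(1.0 / k))
        ys.append(math.log(sum(lengths) / len(lengths)))  # Eq. (15)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    den = sum((a - mean_x) ** 2 for a in xs)
    return num / den


hfd_line = higuchi_fd(list(range(100)), k_max=10)
```

A straight line is a useful sanity check: every L(m, k) equals (N - 1) / k, so the fitted slope is 1.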

3.2.3 Wavelet-based features
This study analyzes the ten features proposed by [31]. All features are calculated after applying a discrete wavelet transform (DWT). Previous experimental results by Carrion-Ojeda et al. [15, 16] showed that a three-level DWT with Daubechies-4 as the mother wavelet provides efficient features. For this reason, this study uses those parameters for the DWT. All the detail coefficients (D) and the last approximation coefficient (A) were analyzed to compute the features. Below are the equations for all the extracted features; in all of them, N is the length of the coefficients.
Maximum per Wavelet Coefficient (max):
Minimum per Wavelet Coefficient (min):
Mean per Wavelet Coefficient:
Standard Deviation per Wavelet Coefficient:
Variance per Wavelet Coefficient:
Median per Wavelet Coefficient: Median of each coefficient.
Skewness per Wavelet Coefficient:
Energy per Wavelet Coefficient:
Relative Wavelet Energy: For computing the RWE, first the total energy is needed:
Then, the RWE is defined as follows:
Entropy per Wavelet Coefficient:
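As a hedged illustration of two of these features, the sketch below computes the standard deviation and the relative wavelet energy for each set of coefficients. It assumes the DWT has already been computed elsewhere (for instance, PyWavelets' `pywt.wavedec(signal, "db4", level=3)` returns the arrays in the order A3, D3, D2, D1), that the standard deviation is the population form (dividing by N), and that the relative energy is each sub-band energy divided by the total; the exact definitions used in the study may differ. The coefficient values are placeholders.

```python
import statistics


def std_per_coefficient(coefficients):
    """Population standard deviation of each set of wavelet coefficients."""
    return {name: statistics.pstdev(values) for name, values in coefficients.items()}


def relative_wavelet_energy(coefficients):
    """Energy of each set of coefficients divided by the total energy."""
    energy = {
        name: sum(v * v for v in values) for name, values in coefficients.items()
    }
    total = sum(energy.values())
    return {name: e / total for name, e in energy.items()}


# Placeholder coefficients for a three-level decomposition (not real EEG data).
coeffs = {
    "A3": [2.0, 2.0, 2.0, 2.0],
    "D3": [1.0, -1.0, 1.0, -1.0],
    "D2": [0.0] * 8,
    "D1": [0.0] * 16,
}
std_features = std_per_coefficient(coeffs)
rwe = relative_wavelet_energy(coeffs)
```

Concatenating the four values of `std_features` for each channel would give one feature vector per trial.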
3.3 Feature selection
An individual evaluation was carried out for each of the features detailed in Section 3.2 to select the best one. The entire process followed to evaluate and select the best feature is detailed below.
3.3.1 Hyperparameter optimization
This study assessed five classifiers: multilayer perceptron (MLP), AdaBoost (AB), random forest (RF), support vector machine (SVM), and k-nearest neighbors (kNN). For optimizing the hyperparameters of all classifiers, a greedy search optimization was applied. This algorithm uses a hyperparameter set and analyzes all possible combinations generated using the parameters contained in that set [32]. The optimization was applied individually for each feature in each dataset. Additionally, tenfold cross-validation was used to increase the reliability of the selection of hyperparameters. The set of parameters for each classifier was the same throughout the optimization process and is shown in Table 1.
For each dataset, 20% of the available data was extracted to perform hyperparameter optimization. These data were divided into a training and a testing set to carry out the greedy search optimization. This division was performed in a balanced way to avoid possible bias when selecting the best parameters. For this reason, 75% of the trials of each subject were randomly selected for the training set, and the remaining 25% made up the test set. To avoid favoring any classifier, all of them were optimized using the same folds. Furthermore, it is worth mentioning that the data used at this stage were not reused at any later stage.
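The two mechanical parts of this stage, enumerating every combination in a hyperparameter set and splitting each subject's trials 75/25, can be sketched as follows. The parameter names and values in `space` are hypothetical placeholders (Table 1 is not reproduced here), and the function names are illustrative.

```python
import itertools
import random


def parameter_grid(space):
    """Yield every combination of the values listed for each hyperparameter."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def split_per_subject(trials_by_subject, train_fraction=0.75, seed=0):
    """Split each subject's trials so both sets stay balanced across subjects."""
    rng = random.Random(seed)
    train, test = [], []
    for subject, trials in trials_by_subject.items():
        shuffled = list(trials)
        rng.shuffle(shuffled)
        cut = int(round(train_fraction * len(shuffled)))
        train += [(subject, t) for t in shuffled[:cut]]
        test += [(subject, t) for t in shuffled[cut:]]
    return train, test


# Hypothetical search space, for illustration only.
space = {"C": [1, 10], "kernel": ["rbf", "linear"]}
combinations = list(parameter_grid(space))

# Three placeholder subjects with eight trial identifiers each.
data = {subject: list(range(8)) for subject in ("s01", "s02", "s03")}
train_set, test_set = split_per_subject(data)
```

Reusing the same seed for every classifier is one simple way to ensure that all of them are optimized on identical folds.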
3.3.2 Classification
After obtaining the best combination of parameters for each classifier, the classifiers were independently trained for each dataset and feature. As mentioned in Sect. 3.1, this study followed a multiclass classification approach where each subject corresponds to a class, resulting in 32 classification categories for DEAP and 51 classification categories for BIOMEXDB. For training the classifiers, a closed-set strategy was followed, meaning that each classification category was presented during training.
3.3.3 Performance evaluation and best feature selection
As this study followed a multiclass classification approach, three of the multiclass performance metrics proposed by [33] were computed. The three computed metrics are macro-averaging sensitivity (Se), macro-averaging specificity (Sp), and average accuracy (Acc). All these metrics are based on the confusion matrix of the classifier, and their formulas are the following:
$$\begin{aligned} Se = \frac{1}{l}\sum _{i=1}^{l} \frac{Tp_i}{Tp_i + Fn_i}, \end{aligned}$$
$$\begin{aligned} Sp = \frac{1}{l}\sum _{i=1}^{l} \frac{Tn_i}{Tn_i + Fp_i}, \end{aligned}$$
$$\begin{aligned} Acc = \frac{1}{l}\sum _{i=1}^{l} \frac{Tp_i + Tn_i}{Tp_i + Fn_i + Fp_i + Tn_i}, \end{aligned}$$
where l is the number of participants (classes), \(Tp_i\) are the true-positive classifications, \(Tn_i\) are the true-negative classifications, \(Fp_i\) corresponds to the false-positive classifications, and \(Fn_i\) corresponds to the false-negative classifications, all of them for the \(i^{th}\) subject.
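The three metrics can be computed from a confusion matrix as sketched below, assuming the usual one-vs-rest definitions of true and false positives and negatives per class and that every class has at least one sample. The function name and the small two-class matrix are illustrative.

```python
def macro_metrics(confusion):
    """Return (Se, Sp, Acc) from confusion[i][j]: true class i, predicted j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    se = sp = acc = 0.0
    for i in range(n_classes):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp
        fp = sum(confusion[r][i] for r in range(n_classes)) - tp
        tn = total - tp - fn - fp
        se += tp / (tp + fn)
        sp += tn / (tn + fp)
        acc += (tp + tn) / total
    return se / n_classes, sp / n_classes, acc / n_classes


# Placeholder confusion matrix for two subjects (not a result from the study).
se, sp, acc = macro_metrics([[2, 0], [1, 1]])
```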
To increase the reliability of the experimental results, tenfold cross-validation was used. The same process described in Sect. 3.3.1 was followed to obtain balanced folds. This stage used the data that were not used during optimization. Finally, the best feature was the one that produced the highest values for the three performance metrics.
3.4 Channel selection
Channel selection is divided into two main stages. The first is responsible for evaluating each channel individually, while the second looks for the optimal number of channels.
3.4.1 Single-channel evaluation
For evaluating each channel individually, the “good” features were first selected based on the results obtained after applying the process described in Sect. 3.3. A feature was considered “good” if at least three classifiers achieved a performance higher than 85% in all evaluation metrics. This selection was performed independently for the two datasets.
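The selection rule for “good” features can be written as a short filter. In this sketch, `scores` is a hypothetical nested dictionary holding the three metrics of every classifier for every feature, expressed as fractions; the values shown are placeholders and not results from the study.

```python
def good_features(scores, threshold=0.85, min_classifiers=3):
    """Keep features for which enough classifiers exceed the threshold in all metrics."""
    selected = []
    for feature, by_classifier in scores.items():
        passing = sum(
            1 for metrics in by_classifier.values()
            if all(value > threshold for value in metrics)
        )
        if passing >= min_classifiers:
            selected.append(feature)
    return selected


# Placeholder scores: feature -> classifier -> (Se, Sp, Acc).
scores = {
    "std": {"MLP": (0.95, 0.99, 0.99), "SVM": (0.90, 0.99, 0.98),
            "RF": (0.88, 0.99, 0.98), "kNN": (0.80, 0.98, 0.97)},
    "skewness": {"MLP": (0.40, 0.95, 0.94), "SVM": (0.35, 0.95, 0.93),
                 "RF": (0.30, 0.94, 0.93), "kNN": (0.25, 0.94, 0.92)},
}
selected = good_features(scores)
```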
Once the good features were identified, new data matrices were created for each subject. The dimension of the matrices was \(t \times n\), where t is the number of trials of each subject and n corresponds to the concatenation of the good features using a single channel. These new matrices were created for all available channels, resulting in matrices for each channel instead of each feature. Subsequently, the new dataset was divided in the same way as before, i.e., 20% for optimization and 80% for evaluation.
This study used an MLP to perform the channel evaluation due to its potential to find patterns in complex datasets. This classifier was optimized using the same approach described in Sect. 3.3.1. Note that a single MLP was optimized for all channels. For optimizing a single MLP, the optimization folds of each channel were joined into a single fold. Consequently, each fold contained the same amount of information from each channel and continued to have a balanced selection in terms of subjects. In this way, we maintained the impartiality of the classifier over the channels.
Table 2 contains the results of MLP optimization for each dataset. After the optimization, the channels were evaluated individually using the optimized MLP and the correct classification rate (CCR) as a performance metric. The CCR is defined as:
$$\begin{aligned} CCR = \frac{\text {number of correctly classified samples}}{\text {total number of samples}}. \end{aligned}$$
This evaluation was used to find the best channel to develop a biometric system using a single EEG channel. It also allowed ordering the channels from best to worst based on the CCR to perform the subsequent analysis.
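The channel ordering can be sketched as below, taking the CCR as the fraction of correctly classified samples. The channel names and CCR values are placeholders chosen only to show the sorting, not results from the study.

```python
def correct_classification_rate(y_true, y_pred):
    """Fraction of samples whose predicted subject matches the true subject."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rank_channels(ccr_by_channel):
    """Order channel names from highest to lowest CCR."""
    return sorted(ccr_by_channel, key=ccr_by_channel.get, reverse=True)


ccr = correct_classification_rate([1, 2, 3, 4], [1, 2, 0, 4])

# Placeholder CCR values per channel.
ordered = rank_channels({"Cz": 0.90, "Fp1": 0.40, "Pz": 0.70})
```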
3.4.2 Optimal number of channels assessment
This assessment worked only with the best feature and classifier found in Sect. 3.3, namely the standard deviation per wavelet coefficient and the MLP. The assessment consisted of analyzing the performance of the classifier with the three metrics explained in Sect. 3.3.3 using a different number of channels, starting from a single channel until reaching the total available channels for each dataset. The variation was made progressively, increasing one channel each time.
This study proposes to use the channel ordering method described in the previous section. To demonstrate the efficiency of the proposed method, a comparison was made against a random ordering. For the random ordering method, the results of 10 executions (10 different ways to sort channels) were averaged. Subsequently, a multivariate analysis of variance (MANOVA) was conducted to determine the optimal number of channels for each method. The MANOVA is a method for testing statistical significance in differences among multivariate sample means and is the multivariate extension of the univariate analysis of variance (ANOVA). This method offers advantages over conducting an ANOVA test for each dependent variable, at the cost of added complexity, as it reduces the likelihood of type I errors, i.e., wrongly rejecting a true null hypothesis, and captures correlations among combinations of independent variables [34]. In this scenario, the optimum corresponds to the number of channels from which that number is no longer an influencing factor in the performance of the classifier.
The MANOVA analyzed the results varying the number of channels from all down to a single channel. Consequently, the optimal number of channels is the smallest number of channels that maintains a p-value greater than 0.05. To verify whether there was a significant difference between the proposed ordering method and the random ordering, the lowest performance metric among the Se, Sp, and Acc was selected to perform a Wilcoxon test between both methods. This test was applied independently for each channel evaluation, i.e., \(1 \text { channel}, 2 \text { channels}, \dots , c \text { channels}\), where c is the total number of channels.
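One possible reading of this selection rule is sketched below. It assumes the MANOVA p-values have already been obtained elsewhere and are stored per number of channels, then walks from the full set downward until the p-value is no longer greater than 0.05. The helper names and the p-values are hypothetical placeholders, and the MANOVA itself is not implemented here.

```python
def incremental_subsets(ordered_channels):
    """Channel subsets of size 1, 2, ..., c following a given ordering."""
    return [ordered_channels[:n] for n in range(1, len(ordered_channels) + 1)]


def optimal_channel_count(p_values, alpha=0.05):
    """Smallest channel count reached while every p-value stays above alpha.

    p_values maps a number of channels to the p-value obtained when the
    analysis is restricted to that number of channels and above.
    """
    best = max(p_values)
    for n in sorted(p_values, reverse=True):
        if p_values[n] > alpha:
            best = n
        else:
            break
    return best


subsets = incremental_subsets(["Cz", "Pz", "Fp1"])

# Placeholder p-values for a five-channel example.
n_optimal = optimal_channel_count({5: 0.90, 4: 0.60, 3: 0.20, 2: 0.01, 1: 0.001})
```

In this placeholder example the walk stops at two channels, where the p-value drops below 0.05, so three channels are retained.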
4 Experimental results
The results and analysis are divided into feature selection, single-channel evaluation, and the assessment of the optimal number of channels. Moreover, the results for each dataset are presented separately.
4.1 Feature selection
Figure 1 shows a graphic representation of the sensitivity of the classifiers using the 19 features mentioned in Sect. 3.2 for the DEAP dataset. The features are ordered from best to worst based on the best classification results. The best feature was the standard deviation per wavelet coefficient using an MLP, while the worst was the skewness per wavelet coefficient using RF. Figure 2 illustrates the difference between those two features using the confusion matrix of the MLP.
Sensitivity is shown in this section because it was the lowest performance metric and it helps to distinguish between features. Accuracy and specificity were also calculated but are not presented in this section. Figure 1 shows that even though complexity features can be useful for other applications such as medical applications, they do not seem to be the most suitable choice for developing biometric systems. On the contrary, some of the spectral and wavelet features appeared to be a better option for developing these systems. However, it is worth mentioning that not all wavelet-based features lead to high results; for example, the worst feature (skewness) belongs to this category.
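A small worked example may help explain why sensitivity tends to be the lowest of the three metrics in a multi-subject setting: under a one-vs-rest reading of the confusion matrix, true negatives dominate specificity and accuracy. The sketch assumes macro-averaged one-vs-rest definitions and that every subject has at least one sample; Sect. 3.3.3 remains the authoritative definition of the metrics, and the function name and toy matrix are illustrative.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Macro-averaged sensitivity, specificity and accuracy (one-vs-rest).

    cm[i][j] counts samples of true subject i predicted as subject j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    se, sp, acc = [], [], []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp  # subject i classified as someone else
        fp = sum(cm[r][i] for r in range(n)) - tp  # others classified as i
        tn = total - tp - fn - fp
        se.append(tp / (tp + fn))
        sp.append(tn / (tn + fp))
        acc.append((tp + tn) / total)
    return {"Se": sum(se) / n, "Sp": sum(sp) / n, "Acc": sum(acc) / n}


# Toy three-subject confusion matrix, not data from either dataset.
toy_cm = [[8, 1, 1], [2, 7, 1], [0, 0, 10]]
metrics = macro_metrics(toy_cm)
lowest = min(metrics, key=lambda name: metrics[name])
print(metrics, lowest)
```

For this toy matrix, sensitivity is about 0.83, accuracy about 0.89, and specificity about 0.92, so sensitivity is the metric that separates systems most clearly.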
For BIOMEXDB, Fig. 3 shows the sensitivity boxplots of the classifiers with each feature. For this dataset, complexity and spectral features did not lead to results as good as those obtained with some wavelet features. In addition, this figure helped to verify that the performance of the system relies heavily on the classifier. For example, the performance obtained with the minimum of the DWT coefficients differs considerably between MLP and RF. As in the case of DEAP, the best feature was the standard deviation using an MLP and the worst feature was the skewness using kNN. The difference between these two features using the MLP is depicted in Fig. 4.
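To make the best-performing feature concrete, the sketch below computes the standard deviation of each sub-band of a three-level DWT for one channel. Several choices are assumptions made only for illustration: the Haar wavelet (the wavelet family is not stated in this section), the population standard deviation, truncation of an odd trailing sample, and the function names. A library such as PyWavelets would normally replace the hand-written decomposition.

```python
import math
from statistics import pstdev
from typing import List, Tuple


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One Haar DWT level: (approximation, detail) coefficients.

    An odd trailing sample is dropped instead of padded.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def dwt_std_features(signal: List[float], levels: int = 3) -> List[float]:
    """Standard deviation of each sub-band, ordered [cA_L, cD_L, ..., cD_1]."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)  # decompose the approximation again
        details.append(detail)
    bands = [approx] + details[::-1]  # coarsest detail first
    return [pstdev(band) for band in bands]


# Toy single-channel signal: one period of a sine wave over 16 samples.
signal = [math.sin(2 * math.pi * i / 16) for i in range(16)]
features = dwt_std_features(signal)
print(len(features))  # prints 4: one value per sub-band
```

Under these assumptions each channel contributes four values (one approximation band and three detail bands), so a system with k channels would feed 4k values to the classifier.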
4.2 Single-channel evaluation
For evaluating each channel of the DEAP dataset, the wavelet-based features that fulfilled the requirements explained in Sect. 3.4.1 were Std, Energy, Var, Max, Entropy, and Min. Two spectral features, PSD and TAP, also fulfilled them. Using these features and the optimized MLP (Table 2), the single-channel evaluation was performed. Table 3 lists the 32 available channels ordered from best to worst based on their correct classification rate (CCR). Figure 5 helps visualize how the channels were placed and the order found with the MLP. According to this figure, the best channels were those located in the central area of the scalp.
On the other hand, the best-performing features for the BIOMEXDB dataset were one spectral feature (RelPow) and seven wavelet-based features: Std, Energy, Entropy, Var, RWE, Max, and Min. Table 4 presents the 14 available channels in descending order of their CCR. As with DEAP, for better visualization of the results, Fig. 6 indicates the location of the channels and their corresponding order for the BIOMEXDB dataset. For this dataset, the best channels tend to be located on the right side of the scalp.
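The channel ordering used in the tables amounts to sorting the channels by their single-channel classification rate. A minimal sketch follows, with placeholder channel names and rates that are not taken from Table 3 or Table 4; how the rate is aggregated over the selected features is defined in Sect. 3.4.1 and is not reproduced here.

```python
from typing import Dict, List


def rank_channels(ccr_by_channel: Dict[str, float]) -> List[str]:
    """Return channel names sorted from highest to lowest classification rate."""
    return sorted(ccr_by_channel, key=lambda ch: ccr_by_channel[ch], reverse=True)


# Placeholder names and rates, chosen only to show the ordering step.
toy_ccr = {"ch_a": 0.61, "ch_b": 0.74, "ch_c": 0.58}
ranking = rank_channels(toy_ccr)
print(ranking)  # ['ch_b', 'ch_a', 'ch_c']
```

The resulting list is the ordering in which channels would be added, one at a time, when searching for the optimal number of channels.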
4.3 Optimal number of channels
Figure 7 illustrates the results of the analysis of the number of channels. The evaluation started with one channel and then increased the number of selected channels by one until all available channels were used. The channels were added following either the order presented in Table 3 or a random ordering. As before, only the sensitivity is presented because this metric was the lowest. This figure motivated the subsequent analysis to find the optimal number of channels.
Table 5 contains the results of the statistical analyses using MANOVA and Wilcoxon tests for the DEAP dataset. In this table, the first column shows the number of channels. The second column shows the p-value from comparing the MLP sensitivity with n channels and \(n+1\) channels. The third column shows the p-value from comparing the sensitivity achieved by a random selection method with n channels and \(n+1\) channels. The fourth column shows the p-value resulting from the Wilcoxon test between the proposed method and a random selection.
A p-value less than 0.05 in the MANOVA analysis means that the number of channels impacts the overall performance of the system. The results of this analysis show that the proposed MLP-based ordering was better than randomly selecting the order of the channels: with the proposed ordering, the optimal number of channels was 11, whereas with the random selection this number increased to 25. On the other hand, for the Wilcoxon test, a p-value less than 0.05 is interpreted as a statistically significant difference between the evaluated methods. These results verified that there is a difference between the results of the two methods in almost all cases, which indicates the efficiency of the proposed selection method. In Tables 5 and 6, bold values are p-values less than 0.05, and the values marked with * correspond to the optimal number of channels.
Based on the results mentioned above, the suggested biometric system for the DEAP dataset has the following characteristics: EEG recordings of 1.75 seconds using the best 11 channels selected by the proposed MLP, and the standard deviation extracted from a three-level DWT as a feature. Figure 8 illustrates the difference in the sensitivity of the classifiers when using the proposed number of channels compared with using all available channels. In both cases, the classifiers were evaluated using the standard deviation per wavelet coefficient. Despite the difference in the classification performance between using all the available channels and the proposed optimum, the previously mentioned MANOVA showed that this difference is not statistically significant.
In the case of the BIOMEXDB dataset, Fig. 9 compares the proposed method for selecting the channels with a random selection, while varying the number of channels. The optimal number of channels is identified using the same statistical analysis used with the DEAP dataset. In this case, that number is 13 channels.
Table 6 contains the resulting p-values of the statistical analysis. This table follows the same structure as Table 5, explained above. For this dataset, the results of the MANOVA indicate that for both methods, the optimal number of channels is 13. Despite this result, in general, the p-values of the proposed MLP were lower than those of the random selection. Moreover, the equality in the optimum could be because, in this dataset, the channels used for recording the EEGs were located at the left and right zones of the scalp. However, the results presented in Sect. 4.2 for the DEAP dataset, which uses more EEG channels distributed over a wider area, indicate that the best channels are located at the center of the scalp.
Besides, the Wilcoxon test results verified that the two evaluated selection methods are statistically different in almost all scenarios. The only scenario where these methods were not statistically different was when working with a single channel. This result was expected since when working with a single-channel biometric system, it is extremely difficult for the results of using a specific channel to be better than the results of using any other channel.
Based on the above-mentioned results, the proposed biometric system for the BIOMEXDB dataset has the following characteristics: EEG recordings of 1.75 seconds using the best 13 channels selected by the proposed MLP, and the standard deviation extracted from a three-level DWT as a feature. As with DEAP, Fig. 10 contains the results of the comparison between the proposed biometric system and the biometric system developed using all the available EEG channels.
5 Limitations
This section provides an analysis of the identified limitations of our work. These limitations could be addressed with the ideas mentioned in the future work but were considered outside the scope of the study.
The search space of all combinations of features and channels grows exponentially. Therefore, we limited this study to finding the best feature and the minimum number of channels.
The two databases that were used in this study only include data from a single session. This could potentially affect the quality of the biometric features that need to be consistent over time for the same subject. BIOMEXDB is a dataset specifically created for biometrics but only includes one session.
Recent studies [35] show promising results with deep neural networks that tackle the feature extraction as part of their architecture. It would be interesting to contrast our work with these new architectures to evaluate the results in both cases.
6 Conclusions and future work
The main contribution of this work is a methodology to select the best feature and the optimal number of channels for developing a biometric system based on EEG signals. This study worked with two datasets independently to evaluate the effectiveness of the proposed methodology. Moreover, nineteen features belonging to three categories (spectral, complexity, and wavelet-based) were described and evaluated in this investigation.
The experimental results using two different datasets showed that wavelet-based features are the best option for developing biometric systems based on EEG signals. Additionally, the standard deviation per wavelet coefficient proved to be the most efficient feature among all the analyzed features in this study to represent the differences between the subjects effectively. On the contrary, although the complexity features are commonly used in medical applications, they generally led to the lowest performance in this study.
Additionally, the evaluation of the EEG channels showed that it is not necessary to use all available ones. For instance, in the case of the DEAP dataset, using approximately one-third of the available channels produced results statistically equal to those obtained with all channels. This reduction in the number of channels is highly beneficial since it decreases the computational cost and increases the portability of the final system. Moreover, the experimental results seem to indicate that the best channels for developing biometric systems are located on the center of the scalp. The results show the viability of biometric systems based on EEG signals because a judicious selection of characteristics, based on an analysis as described in this paper, enables the implementation of an efficient and relatively simple biometric system.
As future work, there are many directions of research; here, we include the main ideas that we consider worth exploiting in the future.

Include an in-depth analysis of the standard deviation to understand why this feature can effectively represent the differential factors of each subject.

Analyze the impact of the location of the EEG channels on the overall performance of the system because this can significantly increase the understanding of the factors that influence the performance of EEG-based biometric systems.

Consider collecting EEGs over several sessions to guarantee the robustness of the biometric features over time.

Explore multivariate feature selection to find out the performance of combining features.

Evaluate different architectures including deep neural networks because the MLP was the best classifier for both datasets indicating that neural networks are a promising approach.
Availability of data and materials
Not applicable.
References
J.H. Kang, C.H. Lee, S.P. Kim, EEG feature selection and the use of lyapunov exponents for eeg-based biometrics, in 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), pp. 228–231 (2016). https://doi.org/10.1109/BHI.2016.7455876
Q. Gui, M.V. RuizBlondet, S. Laszlo, Z. Jin, A Survey on Brain Biometrics. ACM Comput. Surv. 51(6) (2019) https://doi.org/10.1145/3230632
T. Alotaiby, F.E.A. ElSamie, S.A. Alshebeili, I. Ahmad, A review of channel selection algorithms for EEG signal processing. EURASIP J. Adv. Signal Process. 2015 (2015) https://doi.org/10.1186/s1363401502519
M. Schroder, M. Bogdan, T. Hinterberger, N. Birbaumer, Automated eeg feature selection for brain computer interfaces, in First International IEEE EMBS Conference on Neural Engineering, 2003. Conference Proceedings., pp. 626–629 (2003). https://doi.org/10.1109/CNE.2003.1196906
S. Liu, Y. Bai, J. Liu, H. Qi, P. Li, X. Zhao, P. Zhou, L. Zhang, B. Wan, C. Wang, Q. Li, X. Jiao, S. Chen, D. Ming, Individual feature extraction and identification on EEG signals in relax and visual evoked tasks. Commun. Comput. Inf. Sci. 404, 305–318 (2014). https://doi.org/10.1007/9783642541216_29
C. Lee, J.H. Kang, S.P. Kim, Feature selection using mutual information for eeg-based biometrics, in 2016 39th International Conference on Telecommunications and Signal Processing (TSP), pp. 673–676 (2016). https://doi.org/10.1109/TSP.2016.7760968
J.H. Kang, Y.C. Jo, S.P. Kim, Electroencephalographic feature evaluation for improving personal authentication performance. Neurocomputing 287, 93–101 (2018). https://doi.org/10.1016/j.neucom.2018.01.074
H.A. Shedeed, A new method for person identification in a biometric security system based on brain eeg signal processing, in 2011 World Congress on Information and Communication Technologies, pp. 1205–1210 (2011). https://doi.org/10.1109/WICT.2011.6141420
K. Bashar, Ecg and eeg based multimodal biometrics for human identification, in 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 4345–4350 (2018). https://doi.org/10.1109/SMC.2018.00734
S. Yang, S. Hoque, F. Deravi, Improved time-frequency features and electrode placement for EEG-based biometric person recognition. IEEE Access 7, 49604–49613 (2019). https://doi.org/10.1109/ACCESS.2019.2910752
D. CarriónOjeda, P. MartínezArias, R. FonsecaDelgado, I. Pineda, EBAPy: a Python framework for analyzing the factors that have an influence in the performance of EEGbased applications. Softw. Impacts 8, 100062 (2021). https://doi.org/10.1016/j.simpa.2021.100062
D. CarriónOjeda, P. MartínezArias, R. FonsecaDelgado, I. Pineda, H. MejíaVallejo, Evaluation of Features and Channels of Electroencephalographic Signals for Biometric SystemsSource Code. https://www.codeocean.com/ (2021). https://doi.org/10.24433/CO.1541880.v2
S. Koelstra, C. Mühl, M. Soleymani, J.S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, I. Patras, DEAP: a database for emotion analysis using physiological signals. IEEE Trans. Affect. Comput. 3(1), 18–31 (2012). https://doi.org/10.1109/TAFFC.2011.15
J.C. MorenoRodriguez, J.C. AtencoVazquez, J.M. RamirezCortes, R. ArechigaMartinez, P. GomezGil, R. FonsecaDelgado, Biomexdb: a cognitive audiovisual dataset for unimodal and multimodal biometric systems. IEEE Access 9, 111267–111276 (2021). https://doi.org/10.1109/ACCESS.2021.3100035
D. CarriónOjeda, H. MejíaVallejo, R. FonsecaDelgado, P. GómezGil, M. RamírezCortés, A method for studying how much time of eeg recording is needed to have a good user identification, in 2019 IEEE Latin American Conference on Computational Intelligence (LACCI), pp. 1–6 (2019). https://doi.org/10.1109/LACCI47412.2019.9037054
D. CarriónOjeda, R. FonsecaDelgado, I. Pineda, Analysis of factors that influence the performance of biometric systems based on EEG signals. Expert Syst. Appl. 165, 113967 (2021). https://doi.org/10.1016/j.eswa.2020.113967
J. Dauwels, F. Vialatte, A. Cichocki, Diagnosis of alzheimers disease from EEG signals: Where are we standing? Curr. Alzheimer Res. 7(6), 487–505 (2010). https://doi.org/10.2174/156720510792231720
P.G. Stoica, R. Moses, Spectral Analysis of Signals (Pearson, Upper Saddle River, NJ, 2005)
F. Auger, P. Flandrin, Improving the readability of timefrequency and timescale representations by the reassignment method. IEEE Trans. Signal Process. 43(5), 1068–1089 (1995). https://doi.org/10.1109/78.382394
J.C. McBride, X. Zhao, N.B. Munro, C.D. Smith, G.A. Jicha, L. Hively, L.S. Broster, F.A. Schmitt, R.J. Kryscio, Y. Jiang, Spectral and complexity analysis of scalp EEG characteristics for mild cognitive impairment and early alzheimer’s disease. Comput. Methods Programs Biomed. 114(2), 153–163 (2014). https://doi.org/10.1016/j.cmpb.2014.01.019
S. Thongpanja, A. Phinyomark, P. Phukpattaranont, C. Limsakul, Mean and median frequency of emg signal to determine muscle force based on timedependent power spectrum. Elektronika ir Elektrotechnika 19(3), 51–56 (2013). https://doi.org/10.5755/j01.eee.19.3.3697
A. Phinyomark, S. Thongpanja, H. Hu, P. Phukpattaranont, C. Limsakul, The usefulness of mean and median frequencies in electromyography analysis. In: Naik, G.R. (ed.) Computational Intelligence in Electromyography Analysis. IntechOpen, Rijeka (2012). Chap. 8. https://doi.org/10.5772/50639
C. Coronel, H. Garn, M. Waser, M. Deistler, T. Benke, P. DalBianco, G. Ransmayr, S. Seiler, D. Grossegger, R. Schmidt, Quantitative EEG markers of entropy and auto mutual information in relation to MMSE scores of probable alzheimer’s disease patients. Entropy 19(3), 130 (2017). https://doi.org/10.3390/e19030130
C. Tsallis, Possible generalization of boltzmanngibbs statistics. J. Stat. Phys. 52(1–2), 479–487 (1988)
D. Zhang, X. Jia, H. Ding, D. Ye, N.V. Thakor, Application of tsallis entropy to eeg: quantifying the presence of burst suppression after asphyxial cardiac arrest in rats. IEEE Trans. Biomed. Eng. 57(4), 867–874 (2009)
L. Sarlabous, A. Torres, J.A. Fiz, J. Gea, J.M. MartínezLlorens, J. Morera, R. Jané, Interpretation of the approximate entropy using fixed tolerance values as a measure of amplitude variations in biomedical signals, in 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, pp. 5967–5970 (2010). IEEE
D. Abásolo, J. Escudero, R. Hornero, C. Gómez, P. Espino, Approximate entropy and auto mutual information analysis of the electroencephalogram in alzheimer’s disease patients. Med. Biol. Eng. Comput. 46(10), 1019–1028 (2008). https://doi.org/10.1007/s1151700803921
J.S. Richman, D.E. Lake, J.R. Moorman, Sample entropy, in Numerical Computer Methods, Part E. Methods in Enzymology, vol. 384, pp. 172–184. Academic Press (2004). https://doi.org/10.1016/S00766879(04)840114
Y. Zhang, S. Wei, C.D. Maria, C. Liu, Using lempel–ziv complexity to assess ECG signal quality. J. Med. Biol. Eng. 36(5), 625–634 (2016). https://doi.org/10.1007/s4084601601655
C. Gómez, Á. Mediavilla, R. Hornero, D. Abásolo, A. Fernández, Use of the higuchi’s fractal dimension for the analysis of MEG recordings from alzheimer’s disease patients. Med. Eng. Phys. 31(3), 306–313 (2009). https://doi.org/10.1016/j.medengphy.2008.06.010
A. Hamad, E.H. Houssein, A.E. Hassanien, A.A. Fahmy, Feature extraction of epilepsy eeg using discrete wavelet transform, in 2016 12th International Computer Engineering Conference (ICENCO), pp. 190–195 (2016). https://doi.org/10.1109/ICENCO.2016.7856467
P. Matuszyk, R. Castillo, D. Kottke, M. Spiliopoulou, A Comparative Study on Hyperparameter Optimization for Recommender Systems, in Workshop on Recommender Systems and Big Data Analytics, pp. 13–21 (2016)
M. Sokolova, G. Lapalme, A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 45(4), 427–437 (2009)
R.T. Warne, A primer on multivariate analysis of variance (manova) for behavioral scientists. Pract. Assess. Res. Eval. 19 (2014)
M. Wang, K. Kasmarik, A. Bezerianos, K.C. Tan, H. Abbass, On the channel density of eeg signals for reliable biometric recognition. Pattern Recogn. Lett. 147, 134–141 (2021). https://doi.org/10.1016/j.patrec.2021.04.003
Acknowledgements
Not applicable.
Funding
Not applicable.
Contributions
Not applicable.
Ethics declarations
Competing interests
Not applicable.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
CarriónOjeda, D., MartínezArias, P., FonsecaDelgado, R. et al. Evaluation of features and channels of electroencephalographic signals for biometric systems. EURASIP J. Adv. Signal Process. 2024, 58 (2024). https://doi.org/10.1186/s1363402401155x
DOI: https://doi.org/10.1186/s1363402401155x