Evaluation of features and channels of electroencephalographic signals for biometric systems

Biometric systems are essential tools in modern society, where most of our personal information lives in digital form. Although there is a significant variety of biometrics, electroencephalogram (EEG) signals are a useful option because they guarantee that the person is alive, they are universal, and they are not falsifiable. Nevertheless, EEG processing needs to address some challenges to become a viable technique to build production-ready biometric systems. These challenges include the adequate selection of features and channels that maximize the quality of the results and optimize resources. This work provides an analysis of which are the most important features and channels for the correct operation of a biometric system. The experimental analysis worked with two datasets and evaluated 19 features belonging to three groups: wavelet-based, spectral, and complexity. Five classifiers were trained: multilayer perceptron, AdaBoost, random forest, support vector machine, and K-nearest neighbors. The results found that the best feature for developing a biometric system is the standard deviation extracted from the coefficients of a three-level discrete wavelet transform. Additionally, the experimental results with the two datasets showed that the proposed method for channel selection can reduce the necessary number of channels while maintaining its performance. Our results, from one of the datasets, showed a reduction of 21 channels (from 32 to 11) and indicated that the best channels to develop biometric systems seem to be those located on the central area of the scalp.


Introduction
Most modern authentication systems rely on personal identification numbers (PINs) or image-based techniques such as fingerprints, facial patterns, iris patterns, and hand geometry [1]. However, these techniques could be easily tricked or simulated, or in the case of PINs, they are also prone to be forgotten or stolen. Electroencephalographic signals have been proposed as an alternative technique for biometric systems since they are universal, meaning every human being produces EEG signals. Moreover, these signals can show information about cognitive processes and subject-specific genetics. The disadvantages of this kind of biometrics are related to the noise associated with brain signals, making them more complex and challenging than traditional biometric traits. Nonetheless, some features extracted from the EEGs can help process and classify them to differentiate among subjects.
Another concern when working with electroencephalograms for biometric systems development is how to record and use them with a functional, portable, and low-cost implementation. For this reason, biometric systems need a selection of channels to extract the most essential and helpful information to classify subjects. It is beneficial to avoid a full set of 14 or 32 electrodes, which are the typical numbers of channels found in commercial EEG headsets. The channel selection process could result in constructing biometric systems based on EEG signals with fewer channels to decrease the computational and monetary cost while keeping high performance and portability.
The proposed methodology to find the best features and channels includes two parts. The first part focuses on identifying the best feature or set of features to extract from the EEG signals that effectively distinguish one person from another. This evaluation compares the subject classification results using five machine learning algorithms for 19 different EEG extracted features. The second part of this study concentrates on selecting the most appropriate number of EEG channels to distinguish between subjects and on analyzing their impact on the performance of the system.
The main contribution of this work is a methodology to select the best feature or features and the optimal number of EEG channels. In this work, the best feature was the standard deviation extracted from the coefficients of a three-level discrete wavelet transform, and the optimal number of channels was different for each dataset, obtaining a reduction of 21 channels in the best scenario. Additionally, another interesting contribution is the collection of characteristics useful for EEG-based biometric systems. The article is organized as follows. Section 2 summarizes the related work on feature and channel selection. Section 3 describes the datasets used in this work and the proposed methodology. Section 4 presents the experimental results and their corresponding analysis. Finally, Sect. 6 contains the conclusions and future work.

Related works
The ability of the features to capture the most distinctive characteristics of the subjects is critical to achieving high performance in EEG-based applications [2]. Due to the wide range of applications based on EEGs, researchers have focused on finding a suitable method for feature selection during recent years. The most common methods determine the best feature and include a channel selection approach [3]. However, most feature or channel selection methods target specific activity classification, where the goal is to differentiate between activities rather than people. For this reason, the results of those methods cannot be directly applied in the development of biometrics: from a security point of view, what matters is not the neurological stimulus used for recording the signals but the precision with which one person is distinguished from another.
Schroder et al. [4] proposed an automatic feature selection combining a genetic algorithm with a Support Vector Machine (SVM). They considered each channel as a feature, so their method aimed to find the best channels for developing brain-computer interfaces (BCIs). The authors concluded that the optimal set of features for developing BCIs is strongly dependent on the subjects and the experimental paradigm. In the development of biometric systems, a requirement is to establish a method independent of the involved people.
Even though the channels can be used as features, most EEG-based applications compute some features capable of representing the channels' most relevant information while reducing the dataset dimension. For this reason, [5] first evaluated a set of features including the autoregression (AR) model, the power spectrum of the time domain (TPS), the power spectrum of the frequency domain (FPS), and phase-locking value (PLV). These features were extracted from all the available channels and then evaluated with SVM. After the feature evaluation, they employed a support vector machine recursive feature elimination (SVM-RFE) to select the most discriminative channels, but only for the AR model because it showed the best performance in the previous step. Despite the selection of channels, the authors proposed using 32 EEG channels. Considering that using 32 channels still makes it hard to develop portable biometric systems, more features need to be evaluated to further reduce this number.
Another approach for selecting the best features and channels is finding a measurement representing intra-subject and inter-subject variability. In this kind of measurement, the best features and channels have minimum intra-subject variability and maximum inter-subject variability. Following this approach, Kang, Lee, and Kim [1,6] provided two analyses of features and channels for EEG-biometrics. In the first analysis, they evaluated power spectral density (PSD) features and Lyapunov exponents with a criteria index (CI) that consisted of three types of variances. Using this index, they found that the maximum Lyapunov exponent had the maximum CI value among all types of EEG features. Furthermore, two (T4, T6) out of sixteen channels had the highest CI values over all brain areas [1].
A second study [6] evaluates the performance of seven features (alpha/theta, alpha/beta, theta/beta power ratio, sample entropy, permutation entropy, entropy, and median values of distribution) extracted for each of the sixteen EEG channels. As a result, 112 features composed their initial set of features. Then, they used mutual information to select the best ones. After their analysis, the set of features was reduced to nine, and the user identification results were higher than using PCA reduction [6]. Regardless of the efficiency of the proposed measurements, the main limitation of both works was the size of the dataset because it only contained the information of seven subjects with recordings of sixteen channels.
A different approach for feature and channel selection for subject identification is presented by [7]. The authors first performed a channel selection guided by the assumption that electrooculographic (EOG) interference in the resting state with eyes open and the bursting alpha activity in the resting state with eyes closed could lead to low authentication performance. Using the spectral power, they investigated the effect of these two factors and finally reduced the number of channels from 56 to 34. Using the selected channels, they analyzed ten single-channel features (seven spectral and three nonlinear) and ten multichannel features by conducting network analysis based on phase synchronization. The authors were able to select the best feature for the dataset in the study.
Despite the existence of various approaches for selecting optimal features and channels in previous research, there is a notable absence of studies that confirm their findings across multiple datasets obtained from diverse EEG headsets. Furthermore, there is a deficiency in comparative analysis involving classification algorithms to ascertain the most effective feature-channel pairing. Additionally, features derived from the wavelet transform have been demonstrated to be effective for user identification [8][9][10]. Consequently, incorporating wavelet-based features into the initial feature set could significantly enhance the comprehension of EEG-based biometric systems.

Methodology
The proposed method is divided into four main steps: data acquisition, feature extraction, feature selection, and channel selection. The following subsections discuss the details of each step. Moreover, it is important to mention that this study extends the EEG-based applications using Python (EBAPy) framework [11]. Due to the flexibility of this framework, this study added functionalities to EBAPy [12].

Data acquisition
This work employed two datasets to train, test, and experiment with our methodology. The first dataset is the open-access dataset "DEAP" [13]. It has recordings of 32 healthy participants (50% female), aged between 19 and 37 (mean age is 27). Each recording has 32 EEG channels, 12 peripheral channels, three unused channels, and one status channel. All subjects recorded 40 trials, and each trial has a duration of 63 seconds (3 seconds of baseline and 60 seconds of trial). In each trial, the participants were presented with a YouTube video aimed at evoking particular sentiments. After each video, the subjects scored their level of valence, arousal, dominance, and liking.
Even though the DEAP dataset aims to analyze human affective states, it can be correctly used for biometrics development. It is only necessary to know which subject each EEG corresponds to, regardless of the task performed at recording time. Furthermore, using a dataset with different affective states is advantageous because each subject's EEGs can vary significantly between the trials, providing a more realistic environment.
The DEAP dataset has two versions, one with the raw signals and another one with already preprocessed signals. This work uses the preprocessed version of the DEAP dataset. In this version, the EEG signals were downsampled from 512 Hz to 128 Hz, passed through a band-pass frequency filter from 4 to 45 Hz, and through a common average reference filter to improve the signal-to-noise ratio. Additionally, the electrooculographic (EOG) artifacts were removed using a blind source separation technique.
The second dataset is the open-access "BIOMEX-DB" [14]. The main reason for using another dataset was to increase the reliability of this study. In contrast to DEAP, this dataset aims to develop multimodal biometric systems by providing EEG, audio, and video recordings. However, our research only analyzes the EEGs.
The information of BIOMEX-DB corresponds to 51 healthy participants (49% female), aged between 16 and 61 (mean age 29). Each participant registered 135 trials of 2.5 seconds, where each trial was the pronunciation of a number between 1 and 10. Each EEG was recorded with a sampling frequency of 2048 Hz and 14 channels. However, the signals were downsampled to 128 Hz.
As with DEAP, the original labels of BIOMEX-DB were replaced by the subjects of each EEG. Based on the results of previous works [15,16], this research uses 1.75 seconds of recording taken randomly from each trial.

Feature extraction
This study evaluates 19 features belonging to three categories: spectral, complexity, and wavelet-based. Each feature is detailed below.

Spectral features
Dauwels et al. [17] used the spectral features to quantify the changes in the signal given by its power. When working with spectral features, it is necessary to work with the Fourier transform of the signal.

Power Spectral Density (PSD)
The PSD shows the strength of the energy variation of a signal as a function of the frequency. It is defined as the discrete-time Fourier transform of the covariance sequence [18]. Nevertheless, in this work, the PSD was obtained using "periodogram", which is a MATLAB function. The periodogram is a nonparametric estimate of the PSD, and it is defined as the Fourier transform of the biased estimate of the autocorrelation sequence [19]. To calculate it, we use the sampling frequency of the signals, which is 128 Hz.

Total Average Power (TAP)
Power is defined as work per time, meaning the amount of energy transferred per unit of time. Considering a signal x, the following formula is used to calculate the total average power:

$$\mathrm{TAP} = \frac{|x|^{2}}{\mathrm{length}(x)}$$

Median Frequency (MFreq)
The MFreq is defined as the frequency at which the total spectral power is halved [20]. It is expressed by the following equation:

$$\sum_{j=1}^{MDF} P_j = \sum_{j=MDF}^{M} P_j = \frac{1}{2} \sum_{j=1}^{M} P_j$$

where $P_j$ is the EEG power spectrum at the frequency bin j, MDF is the frequency value at which the power spectrum is divided into two parts with equal integrated power [21], and M is the number of frequency bins [22].

Relative Power (RelPow)
This method consists of computing the spectral power in each frequency band of the EEG and then computing the percentage of the total power that each band has. For computing the RelPow, this work used four frequency bands: δ (0-4 Hz), θ (4-8 Hz), α (8-12 Hz), and β (12-30 Hz).

Complexity features
This category of features quantifies the entropy and complexity of a system. In the context of information theory, entropy is defined as the measure of the uncertainty associated with a random variable [17]. The greater the entropy, the more complex the system is.

Tsallis Entropy (TsEn)
The TsEn is the generalization of the Boltzmann-Gibbs entropy [23], and it is determined by quantizing the amplitude of the EEG. As mentioned in [24], given a discrete set of probabilities $\{p_i\}$ with the condition that $\sum_i p_i = 1$, the TsEn is defined as:

$$\mathrm{TsEn} = \frac{k}{q - 1} \left( 1 - \sum_{i=1}^{W} p_i^{q} \right)$$

where $q \in \mathbb{R}$, $W \in \mathbb{N}$ is the total number of possible configurations, and k is a conventional positive constant. For this work we used k = 1 and q = 3, as recommended in several studies for determining changes in a signal [25].
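As a rough illustration of the spectral features and the Tsallis entropy defined above, the following Python sketch uses NumPy only. It is not the MATLAB implementation used in this work. Several choices are assumptions made for the example: the periodogram uses a rectangular window and a one-sided spectrum, $|x|^2$ is read as the sum of squared samples, the relative power is taken with respect to the whole spectrum with each band treated as a half-open interval, and the amplitude is quantized into 16 histogram bins (the hypothetical parameter `n_bins`).

```python
import numpy as np

FS = 128  # sampling frequency in Hz, as stated in the text
# Band limits from the text; treating each band as [low, high) is an assumption.
BANDS = {"delta": (0, 4), "theta": (4, 8), "alpha": (8, 12), "beta": (12, 30)}


def periodogram(x, fs=FS):
    """One-sided periodogram with a rectangular window: (freqs, psd)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * n)
    # Fold negative frequencies into positive ones (all bins except DC and Nyquist).
    last = -1 if n % 2 == 0 else None
    psd[1:last] *= 2
    return freqs, psd


def total_average_power(x):
    """TAP, reading |x|^2 as the sum of squared samples (assumption)."""
    x = np.asarray(x, dtype=float)
    return np.sum(x ** 2) / x.size


def median_frequency(freqs, psd):
    """First frequency bin at which the cumulative power reaches half the total."""
    cumulative = np.cumsum(psd)
    idx = np.searchsorted(cumulative, cumulative[-1] / 2.0)
    return freqs[idx]


def relative_power(freqs, psd, bands=BANDS):
    """Fraction of the total spectral power contained in each band."""
    total = psd.sum()
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum() / total
            for name, (lo, hi) in bands.items()}


def tsallis_entropy(x, q=3.0, k=1.0, n_bins=16):
    """TsEn of the amplitude histogram; n_bins is a hypothetical choice."""
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return k * (1.0 - np.sum(p ** q)) / (q - 1.0)


# Usage on a synthetic 10 Hz sine lasting 2 s (a stand-in for one EEG channel).
t = np.arange(2 * FS) / FS
x = np.sin(2 * np.pi * 10 * t)
freqs, psd = periodogram(x)
tap = total_average_power(x)
mdf = median_frequency(freqs, psd)
rel = relative_power(freqs, psd)
tsen = tsallis_entropy(x)
```

For this synthetic signal, nearly all the power falls in the α band and the median frequency coincides with the 10 Hz bin, which gives a quick sanity check before applying the functions to real recordings.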
Approximate Entropy (ApEn)
It measures the logarithmic likelihood that runs of patterns that are close (within a window denoted by r) for m contiguous observations remain close on subsequent incremental comparisons. In this work, we used the values of r = 0.01 and m = 1 to determine the amplitude variation in the signal [26]. To compute the ApEn, the following steps should be followed [27]:

4. Define $d[x(i), x(j)]$, the distance between x(i) and x(j), as the maximum absolute difference between their respective scalar components.
5. For a given x(i), count the number of j ($j \in \{1, ..., N - m + 1\}$) so that $d[x(i), x(j)] \le r$, denoted as $N^{m}(i)$. Then:

$$C_r^{m}(i) = \frac{N^{m}(i)}{N - m + 1}$$

6. Find the natural logarithm of each $C_r^{m}(i)$ and average it over i:

$$\Phi^{m}(r) = \frac{1}{N - m + 1} \sum_{i=1}^{N - m + 1} \ln C_r^{m}(i)$$

7. Increase the dimension from m to m + 1 and then find $C_r^{m+1}(i)$ and $\Phi^{m+1}(r)$.
8. Define approximate entropy as:

$$\mathrm{ApEn}(m, r, N) = \Phi^{m}(r) - \Phi^{m+1}(r)$$

Sample Entropy (SampEn)
The SampEn is a refinement of the ApEn designed to have a smaller bias since it does not include self-similar patterns [28]. Assuming a time-series sequence of length N, $X = \{x(1), x(2), ..., x(N)\}$, template vectors of length m are formed as $x_m(i) = \{x(i), x(i+1), ..., x(i+m-1)\}$. Then, the distance function $d[x_m(i), x_m(k)]$ is defined to be the maximum difference between the components of the vectors:

$$d[x_m(i), x_m(k)] = \max_{j = 0, ..., m-1} |x(i+j) - x(k+j)|$$

Finally, the SampEn is defined as:

$$\mathrm{SampEn} = -\ln \frac{A}{B}$$

where A is the number of template vector pairs having $d[x_{m+1}(i), x_{m+1}(k)] < r$, and B is the number of template vector pairs having $d[x_m(i), x_m(k)] < r$ [28]. The result is never negative, and a small value is an indicator of less noise and more self-similarity. For this feature, we also used the values of r = 0.01 and m = 1.
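The two entropy measures above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the tolerance r is treated as an absolute amplitude (no scaling by the standard deviation of the signal), the templates are built directly from the samples, and the sample entropy uses the same number of templates (N − m) for both lengths, which is a common convention. The pairwise distance matrix needs memory proportional to N², which is acceptable for short segments only.

```python
import numpy as np


def _templates(x, length, count):
    """Stack `count` overlapping windows of `length` samples as rows."""
    return np.array([x[i:i + length] for i in range(count)])


def _pairwise_max_distance(t):
    """Maximum absolute component difference between every pair of rows."""
    return np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=2)


def approximate_entropy(x, m=1, r=0.01):
    """ApEn(m, r, N) = Phi^m(r) - Phi^(m+1)(r), self-matches included."""
    x = np.asarray(x, dtype=float)
    n = x.size

    def phi(length):
        count = n - length + 1
        dist = _pairwise_max_distance(_templates(x, length, count))
        c = np.sum(dist <= r, axis=1) / count
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)


def sample_entropy(x, m=1, r=0.01):
    """SampEn = -ln(A / B), counting each pair once and no self-matches."""
    x = np.asarray(x, dtype=float)
    n = x.size

    def matching_pairs(length):
        # Same template count (n - m) for both lengths: an assumed convention.
        dist = _pairwise_max_distance(_templates(x, length, n - m))
        return np.sum(np.triu(dist < r, k=1))

    b = matching_pairs(m)
    a = matching_pairs(m + 1)
    return -np.log(a / b)


# Usage on a perfectly regular signal, for which both measures are close to 0.
regular = np.tile([0.0, 1.0], 50)
apen_regular = approximate_entropy(regular)
sampen_regular = sample_entropy(regular)
```

With an absolute tolerance as small as r = 0.01, a noisy segment may contain no matching pairs of length m + 1, in which case the ratio A/B is zero and the sample entropy is infinite; a real pipeline would need to handle that case explicitly.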

Lempel-Ziv Complexity (LZC)
This method counts the number of different patterns in a signal of length n. The fewer such patterns, the better a signal may be compressed. The compression rate is a measure of the regularity of a signal [29].
The original signal must be coarse-grained and transformed into a symbol sequence to simplify the computation. To generate a two-state sequence $R = \{r(1), r(2), ..., r(n)\}$, the following equation is applied:

$$r(i) = \begin{cases} 0, & x(i) < T_h \\ 1, & \text{otherwise} \end{cases}$$

where n is the length of the signal x(n) and $T_h$ is the threshold, which normally is the mean value of the sequence.
Using the binary sequence R, the vector c(n), which is a counter of the different patterns for a binary symbol sequence, is calculated as follows. Let S and Q denote two strings. SQ is the concatenation of S and Q, SQπ is the concatenated string with the last character deleted, and v(SQπ) is the vocabulary of all the different substrings of SQπ. Each time a new pattern is found, c(n) increases by one and S and Q are updated; the process continues until the end of the sequence. Also, c(n) may vary with the length of the sequence; therefore, it should be normalized. The upper bound of c(n) is:

$$c(n) < \frac{n}{(1 - \epsilon_n) \log_{\alpha}(n)}$$

where n is the length of the sequence, α the number of different symbols, and $\epsilon_n$ is a small quantity. Additionally, $\epsilon_n \to 0$ when $n \to \infty$.
Finally, c(n) can be normalized as:

$$C(n) = \frac{c(n)}{n / \log_{\alpha}(n)}$$

Higuchi Fractal Dimension (HFD)
According to Giannakakis et al. (2014), the fractal dimension is a nonlinear measure in the time domain that is used to characterize the complexity of a time series. It is helpful to quantify the complexity and self-similarity of a signal [30]. The degree of complexity of the sequence increases as the fractal dimension increases. The following procedure was used to obtain this measure. Given a one-dimensional time-series $X = x(1), x(2), ..., x(N)$, form k new time-series $X_k^m$ defined by:

$$X_k^m : x(m), x(m+k), x(m+2k), ..., x\left(m + \mathrm{int}\left(\frac{N-m}{k}\right) k\right)$$

where k and m are both integers, and int(*) refers to the integer part of '*'. k represents the discrete time interval between points, and m = 1, 2, ..., k represents the initial time value [30]. For example, taking k = 3 and N = 100:

$X_3^1$ : x(1), x(4), x(7), ..., x(94), x(97), x(100)
$X_3^2$ : x(2), x(5), ..., x(95), x(98)
$X_3^3$ : x(3), x(6), ..., x(96), x(99)

For each of the time series constructed by the previous equation, the length is computed in the following way:

$$L_m(k) = \frac{1}{k} \left[ \left( \sum_{i=1}^{\mathrm{int}\left(\frac{N-m}{k}\right)} |x(m+ik) - x(m+(i-1)k)| \right) \frac{N-1}{\mathrm{int}\left(\frac{N-m}{k}\right) k} \right]$$

with N as the length of the original time-series X, and $(N-1) / (\mathrm{int}((N-m)/k)\, k)$ as a normalization factor. Those steps are repeated for each $k = 1, 2, ..., k_{max}$, and L(k) is taken as the mean of $L_m(k)$ over m. The next step in the procedure is to plot L(k) against 1/k on a double logarithmic scale, with $k = 1, 2, ..., k_{max}$. The data should fall on a straight line with a slope equal to the fractal dimension of the time-series X. Therefore, the Higuchi fractal dimension (HFD) is defined as the slope of the line that fits the pairs {ln(L(k)), ln(1/k)}, determined with a least-squares method. The value of $k_{max}$ is chosen as the point at which the fractal dimension saturates. A value of $k_{max} = 60$ was chosen for this study.
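The HFD procedure described above can be sketched in Python with NumPy. This is an illustrative implementation under the assumption that L(k) is the mean of $L_m(k)$ over m; the function name `higuchi_fd` is chosen for this example and does not come from the framework used in this work.

```python
import numpy as np


def higuchi_fd(x, k_max):
    """Slope of ln(L(k)) versus ln(1/k) for k = 1, ..., k_max."""
    x = np.asarray(x, dtype=float)
    n = x.size
    log_inv_k, log_length = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(1, k + 1):
            n_steps = (n - m) // k  # int((N - m) / k)
            if n_steps < 1:
                continue
            # Zero-based indices of x(m), x(m + k), ..., x(m + n_steps * k).
            idx = (m - 1) + k * np.arange(n_steps + 1)
            curve = np.abs(np.diff(x[idx])).sum()
            norm = (n - 1) / (n_steps * k)  # normalization factor
            lengths.append(curve * norm / k)
        log_inv_k.append(np.log(1.0 / k))
        log_length.append(np.log(np.mean(lengths)))
    slope, _intercept = np.polyfit(log_inv_k, log_length, 1)
    return slope


# Usage: a straight line has dimension 1, white noise has dimension close to 2.
fd_line = higuchi_fd(np.arange(200.0), k_max=10)
rng = np.random.default_rng(0)
fd_noise = higuchi_fd(rng.standard_normal(1000), k_max=10)
```

The sketch expects a signal longer than `k_max` samples and with some variation; a constant signal has zero curve length, so its logarithm is undefined.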

Wavelet-based features
This study analyzes the ten features proposed by [31]. All features are calculated after applying a discrete wavelet transform (DWT). Previous experimental results by Carrion-Ojeda et al. [15,16] showed that a three-level DWT with Daubechies-4 as mother wavelet provides efficient features. For this reason, this study uses those parameters for the DWT. All the detail coefficients (D) and the last approximation coefficient (A) were analyzed to compute the features. The extracted features are listed below; in all of them, $c_j(i)$ denotes the i-th value of coefficient j and N is the length of the coefficients.

Maximum per Wavelet Coefficient (max): maximum value of each coefficient, $\max_i c_j(i)$.
Minimum per Wavelet Coefficient (min): minimum value of each coefficient, $\min_i c_j(i)$.
Mean per Wavelet Coefficient: mean of each coefficient, $\frac{1}{N} \sum_{i=1}^{N} c_j(i)$.
Standard Deviation per Wavelet Coefficient: standard deviation of each coefficient.
Variance per Wavelet Coefficient: variance of each coefficient.
Median per Wavelet Coefficient: median of each coefficient.
Skewness per Wavelet Coefficient: skewness of each coefficient.
Energy per Wavelet Coefficient: energy of each coefficient, $E_j = \sum_{i=1}^{N} c_j(i)^{2}$.
Relative Wavelet Energy (RWE): for computing the RWE, first the total energy is needed, $E_{total} = \sum_j E_j$. Then, the RWE is defined as $\mathrm{RWE}_j = E_j / E_{total}$.
Entropy per Wavelet Coefficient: entropy of each coefficient.
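As a hedged sketch of how such per-coefficient features might be computed, the Python function below takes a list of coefficient arrays and returns one value per array for each feature. In practice the arrays could come from PyWavelets, for example `pywt.wavedec(signal, "db4", level=3)`, although whether `"db4"` matches the Daubechies-4 wavelet used in this work is an assumption. The population form of the standard deviation and variance (`ddof=0`) is also an assumption, and skewness and entropy are left out because their exact definitions are not given here.

```python
import numpy as np


def wavelet_features(coeffs):
    """Per-coefficient statistics for a list of 1-D coefficient arrays."""
    coeffs = [np.asarray(c, dtype=float) for c in coeffs]
    energy = np.array([np.sum(c ** 2) for c in coeffs])
    return {
        "max": np.array([c.max() for c in coeffs]),
        "min": np.array([c.min() for c in coeffs]),
        "mean": np.array([c.mean() for c in coeffs]),
        # Population form (ddof=0) is an assumption made for this sketch.
        "std": np.array([c.std() for c in coeffs]),
        "var": np.array([c.var() for c in coeffs]),
        "median": np.array([np.median(c) for c in coeffs]),
        "energy": energy,
        "rwe": energy / energy.sum(),  # relative wavelet energy
    }


# Usage with synthetic stand-ins for [A3, D3, D2, D1]; the lengths are arbitrary.
rng = np.random.default_rng(0)
coeffs = [rng.standard_normal(n) for n in (34, 34, 61, 115)]
features = wavelet_features(coeffs)
std_vector = features["std"]  # one value per coefficient array for this channel
```

Concatenating `std_vector` over the selected channels would give one feature vector per trial, which is the kind of input the classifiers described below expect.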

Feature selection
An individual evaluation was carried out for each of the features detailed in Section 3.2 to select the best one. The entire process followed to evaluate and select the best feature is detailed below.

Hyperparameter optimization
This study assessed five classifiers: multilayer perceptron (MLP), AdaBoost (AB), random forest (RF), support vector machine (SVM), and k-nearest neighbors (kNN). To optimize the hyperparameters of all classifiers, a grid search optimization was applied. This algorithm uses a hyperparameter set and analyzes all possible combinations generated using the parameters contained in that set [32]. The optimization was applied individually for each feature in each dataset. Additionally, tenfold cross-validation was used to increase the reliability of the selection of hyperparameters. The set of parameters for each classifier was the same throughout the optimization process and is shown in Table 1. For each dataset, 20% of the available data was extracted to perform hyperparameter optimization. These data were divided into a training and a testing set to carry out the grid search optimization. This division was performed in a balanced way to avoid possible bias when selecting the best parameters. For this reason, 75% of the trials of each subject were randomly selected for the training set, and the remaining 25% made up the test set. To avoid favoring any classifier, all of them were optimized using the same folds. Furthermore, it is worth mentioning that the data used at this stage was not reused at any later stage.
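The data partition and the enumeration of parameter combinations described above can be sketched in Python as follows. This is only an illustration: the helper names `balanced_split` and `parameter_grid` are hypothetical, the 20% extraction is assumed to be balanced per subject in the same way as the 75/25 division, and the example parameter set is a placeholder rather than the content of Table 1.

```python
import itertools

import numpy as np


def balanced_split(subjects, fraction, rng):
    """Select `fraction` of each subject's trials: (selected, remaining) indices."""
    subjects = np.asarray(subjects)
    selected = []
    for s in np.unique(subjects):
        idx = rng.permutation(np.flatnonzero(subjects == s))
        selected.extend(idx[: int(round(fraction * idx.size))])
    selected = np.sort(np.array(selected, dtype=int))
    remaining = np.setdiff1d(np.arange(subjects.size), selected)
    return selected, remaining


def parameter_grid(param_set):
    """Yield every combination of the values listed for each parameter."""
    names = sorted(param_set)
    for values in itertools.product(*(param_set[name] for name in names)):
        yield dict(zip(names, values))


# Usage with DEAP-like labels: 32 subjects with 40 trials each.
rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(32), 40)

# 20% of each subject's trials for optimization, the rest for evaluation.
opt_idx, eval_idx = balanced_split(subjects, 0.20, rng)

# Within the optimization data: 75% training, 25% testing, per subject.
train_rel, test_rel = balanced_split(subjects[opt_idx], 0.75, rng)
opt_train, opt_test = opt_idx[train_rel], opt_idx[test_rel]

# Placeholder parameter set for a kNN classifier (not taken from Table 1).
grid = list(parameter_grid({"n_neighbors": [3, 5, 7],
                            "weights": ["uniform", "distance"]}))
```

Each dictionary in `grid` would then be used to train and score one classifier configuration on the same folds, so that all classifiers are compared on identical data.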

Classification
After obtaining the best combination of parameters for each classifier, these were independently trained for each dataset and feature. As mentioned in Sect. 3.1, this study followed a multi-class classification approach where each subject corresponds to a class, resulting in 32 classification categories for DEAP and 51 classification categories for BIOMEX-DB. For training the classifiers, a closed set strategy was followed, meaning that each classification category was presented during training.

Performance evaluation and best feature selection
As this study followed a multi-class classification approach, three of the multi-class performance metrics proposed by [33] were computed. The three computed metrics are macro-averaging sensitivity (Se), macro-averaging specificity (Sp), and average accuracy (Acc). All these metrics are based on the confusion matrix of the classifier, and their formulas are the following:

$$Se = \frac{1}{l} \sum_{i=1}^{l} \frac{Tp_i}{Tp_i + Fn_i}$$

$$Sp = \frac{1}{l} \sum_{i=1}^{l} \frac{Tn_i}{Tn_i + Fp_i}$$

$$Acc = \frac{1}{l} \sum_{i=1}^{l} \frac{Tp_i + Tn_i}{Tp_i + Fn_i + Fp_i + Tn_i}$$

where l is the number of participants (classes), $Tp_i$ are the true-positive classifications, $Tn_i$ are the true-negative classifications, $Fp_i$ corresponds to the false-positive classifications, and $Fn_i$ corresponds to the false-negative classifications, all of them for the i-th subject.
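For illustration, the three metrics can be computed from a confusion matrix with a few lines of Python. The sketch assumes that rows hold the true subjects and columns the predicted subjects; the example matrix is made up and does not correspond to any result of this study.

```python
import numpy as np


def multiclass_metrics(confusion):
    """Macro sensitivity, macro specificity, and average accuracy.

    Rows are assumed to be true classes and columns predicted classes.
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp  # trials of subject i assigned to someone else
    fp = cm.sum(axis=0) - tp  # trials of other subjects assigned to subject i
    tn = cm.sum() - tp - fn - fp
    se = np.mean(tp / (tp + fn))
    sp = np.mean(tn / (tn + fp))
    acc = np.mean((tp + tn) / (tp + fn + fp + tn))
    return se, sp, acc


# Usage with a made-up confusion matrix for three subjects.
example = [[8, 1, 1],
           [2, 7, 1],
           [0, 2, 8]]
se, sp, acc = multiclass_metrics(example)
```

Because the true negatives of each class include every correctly rejected trial of all other subjects, specificity and average accuracy tend to be high when there are many classes, which is consistent with sensitivity being the lowest of the three metrics.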
To increase the reliability of the experimental results, a tenfold cross-validation was used. The same process described in Sect. 3.3.1 was followed to obtain balanced folds. This stage used the data that were not used during optimization. Finally, the best feature was the one that produced the highest three performance metrics.

Channel selection
Channel selection is divided into two main stages. The first is responsible for evaluating each channel individually, while the second looks for the optimal number of channels.

Single-channel evaluation
For evaluating each channel individually, the "good" features were first selected based on the results obtained after applying the process described in Sect. 3.3. A feature was considered "good" if at least three classifiers achieved a performance higher than 85% in all evaluation metrics. This selection was performed independently for the two datasets.
Once the good features were identified, new data matrices were created for each subject. The dimension of the matrices was t × n, where t is the number of trials of each subject and n corresponds to the concatenation of the good features using a single channel. These new matrices were created for all available channels, resulting in matrices for each channel instead of each feature. Subsequently, the new dataset was divided in the same way as before, i.e., 20% for optimization and 80% for evaluation.
This study used an MLP to perform the channel evaluation due to its potential to find patterns in complex datasets. This classifier was optimized using the same approach described in Sect. 3.3.1. Note that a single MLP was optimized for all channels. For optimizing a single MLP, the optimization folds of each channel were joined into a single fold. Consequently, each fold contained the same amount of information from each channel and continued to have a balanced selection in terms of subjects. In this way, we maintained the impartiality of the classifier over the channels. Table 2 contains the results of MLP optimization for each dataset. After the optimization, the channels were evaluated individually using the optimized MLP and the correct classification rate (CCR) as a performance metric. The CCR is defined as:

$$\mathrm{CCR} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}}$$

This evaluation was used to find the best channel to develop a biometric system using a single EEG channel. It also allowed ordering the channels from best to worst based on the CCR to perform the subsequent analysis.

Optimal number of channels assessment
This assessment worked only with the best feature (standard deviation) and the best classifier found in Sect. 3.3. The assessment consisted of analyzing the performance of the classifier with the three metrics explained in Sect. 3.3.3 using a different number of channels, starting from a single channel until reaching the total available channels for each dataset. The variation was made progressively, increasing one channel each time.
This study proposes to use the channel ordering method described in the previous section. For this reason, to demonstrate the efficiency of the proposed method, a comparison was made against a random ordering. For the random ordering method, the results of 10 executions (10 different ways to sort channels) were averaged. Subsequently, a multivariate analysis of variance (MANOVA) was conducted to determine the optimal number of channels for each method. The MANOVA is a method for testing statistical significance in differences among multivariate sample means and is the multivariate extension of the univariate analysis of variance (ANOVA). This method offers advantages over conducting an ANOVA test for each dependent variable, at the cost of added complexity, as it reduces the likelihood of type 1 errors, i.e., wrongly rejecting a true null hypothesis, and captures correlations among combinations of dependent variables [34]. In this scenario, the optimum corresponds to the number of channels from which that number is no longer an influencing factor in the performance of the classifier. The MANOVA analyzed the results varying the number of channels from all down to a single channel. Consequently, the optimal number of channels is the least number of channels that maintains a p-value greater than 0.05. To verify if there was a significant difference between the proposed ordering method and the random ordering, the lowest performance metric among the Se, Sp, and Acc was selected to perform a Wilcoxon test between both methods. This test was applied independently for each channel evaluation, i.e., 1 channel, 2 channels, ..., c channels, where c is the total number of channels.
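The progressive channel evaluation described above can be sketched as follows. The snippet only builds the channel subsets for the proposed ordering and for the random orderings; training the classifier and running the MANOVA and Wilcoxon tests on each subset are left out. The helper names and the CCR values in the example are hypothetical placeholders, not results from this study.

```python
import numpy as np


def rank_channels(ccr_by_channel):
    """Order channel names from best to worst correct classification rate."""
    return sorted(ccr_by_channel, key=ccr_by_channel.get, reverse=True)


def nested_subsets(ordering):
    """Subsets with 1, 2, ..., c channels, adding one channel each time."""
    return [list(ordering[:n]) for n in range(1, len(ordering) + 1)]


def random_orderings(channels, repetitions=10, seed=0):
    """Independent random channel orderings used as the baseline."""
    rng = np.random.default_rng(seed)
    return [[str(ch) for ch in rng.permutation(channels)]
            for _ in range(repetitions)]


# Usage with placeholder CCR values for four channels.
ccr = {"Cz": 0.91, "C3": 0.88, "Fp1": 0.62, "O2": 0.75}
ranked = rank_channels(ccr)
proposed_subsets = nested_subsets(ranked)
baseline_subsets = [nested_subsets(order)
                    for order in random_orderings(list(ccr))]
```

Each subset in `proposed_subsets` would be evaluated once, while the metrics obtained for the ten entries of `baseline_subsets` would be averaged per subset size before comparing the two methods.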

Experimental results
The results and analysis are divided into feature selection, single-channel evaluation, and the optimal number of channels assessment. Moreover, the results for each dataset are presented separately.

Feature selection
Figure 1 shows a graphic representation of the sensitivity of the classifiers using the 19 features mentioned in Sect. 3.2 for the case of the DEAP dataset. The features are ordered from best to worst based on the best classification results. The best feature was standard deviation per wavelet coefficient using an MLP, while the worst was skewness per wavelet coefficient using RF. Figure 2 illustrates the difference between those two features using the confusion matrix of the MLP.
Sensitivity is shown in this section because it was the lowest performance metric and it helps to distinguish between features. Accuracy and specificity were also calculated but are not presented in this section. Figure 1 shows that even though complexity features can be useful for other applications such as medical applications, they do not seem to be the most suitable choice for developing biometric systems. On the contrary, some of the spectral and wavelet features appeared to be a better option for developing these systems. However, it is worth mentioning that not all wavelet-based features lead to high results; for example, the worst feature (skewness) belongs to this category.
Fig. 1 Boxplots of the feature categories grouped by classifier using DEAP dataset
For BIOMEX-DB, Fig. 3 shows the sensitivity boxplots of the classifiers with each feature. For this dataset, complexity and spectral features did not lead to results as good as those obtained with some wavelet features. Besides, this figure helped to verify that the performance of the system heavily relies on the classifier. For example, the performance using the minimum of the DWT coefficients varies considerably between the MLP and RF. As in the case of DEAP, the best feature was standard deviation using an MLP and the worst feature was the skewness using kNN. The difference between these two features using the MLP is depicted in Fig. 4.

Single-channel evaluation
For evaluating each channel of the DEAP dataset, the wavelet-based features Std, Energy, Var, Max, Entropy, and Min were the features that fulfilled the requirements explained in Sect. 3.4.1. There were also two spectral features: PSD and TAP. Using these features and the optimized MLP (Table 2), the single-channel evaluation was performed. Table 3 exhibits the 32 available channels ordered from best to worst based on their correct classification rate. Figure 5 helps to visualize how the channels were placed and the order found with the MLP. According to this figure, the best channels were those located in the central area of the scalp. On the other hand, the best performing features for the BIOMEX-DB dataset were one spectral feature (RelPow) and seven wavelet-based features: Std, Energy, Entropy, Var, RWE, Max, and Min. Table 4 presents the 14 available channels in descending order of their CCR. As with DEAP, for better visualization of the results, Fig. 6 indicates the location of the channels and their corresponding order for the BIOMEX-DB dataset. For this dataset, the best channels tend to be located on the right side of the scalp.

Optimal number of channels
Figure 7 illustrates the results of the analysis of the number of channels. The proposed method started with one channel and then increased the number of selected channels by one until all available channels were used. The channels were added following two orderings: the order presented in Table 3 and a random ordering. As before, only the sensitivity is presented because this metric was the lowest. This figure motivated the subsequent analysis to find the optimum number of channels.
Table 5 contains the results of the statistical analyses using MANOVA and Wilcoxon tests for the DEAP dataset. In this table, the first column shows the number of channels. The second column shows the p-value from comparing the MLP sensitivity with n channels and n + 1 channels. The third column shows the p-value from comparing the sensitivity achieved by a random selection method with n channels and n + 1 channels. The fourth column shows the p-value resulting from the Wilcoxon test between the proposed method and a random selection.
A p-value less than 0.05 in the MANOVA analysis means that the number of channels impacts the overall performance of the system. The results of this analysis show that the proposed MLP was statistically better than randomly selecting the order of the channels: with the proposed MLP, the optimal number of channels was 11, while with the random selection this number increased to 25. On the other hand, for the Wilcoxon test, a p-value less than 0.05 is interpreted as a statistically significant difference between the evaluated methods. These results verified that there is a difference between the results of the two methods in almost all cases, indicating the efficiency of the proposed selection method. In Tables 5 and 6, bold values are p-values less than 0.05, and the values marked with * correspond to the optimal number of channels.
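One possible reading of how the optimum is located from the column of consecutive p-values is sketched below. The stopping rule is an assumption made for illustration: the optimum is taken as the smallest number of channels after which no further step from n to n + 1 channels is significant at the 0.05 level. The function name and the example p-values are hypothetical and are not taken from Table 5.

```python
def optimal_channel_count(p_values, alpha=0.05):
    """Locate the optimum from consecutive-comparison p-values.

    p_values[k] is the p-value of comparing k + 1 channels against
    k + 2 channels. Assumed rule: return the smallest n for which no
    later step (n -> n + 1, n + 1 -> n + 2, ...) is significant.
    """
    n_channels = len(p_values) + 1
    optimum = n_channels
    # Walk backwards from the full set while the steps stay non-significant.
    for k in range(len(p_values) - 1, -1, -1):
        if p_values[k] < alpha:
            break
        optimum = k + 1
    return optimum


# Invented p-values for a 5-channel example (1 vs 2, 2 vs 3, 3 vs 4, 4 vs 5).
example_p_values = [0.001, 0.01, 0.2, 0.6]
best_n = optimal_channel_count(example_p_values)
print(best_n)
```

Under this rule, a significant final comparison means that all available channels are needed, and a column with no significant value means that a single channel suffices.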
Fig. 6 Channels position and order for the BIOMEX-DB dataset. In (b), 1 corresponds to the best channel while 14 to the worst

Fig. 7 Comparison between proposed method for ordering channels (shown as MLP) with respect to a random ordering using DEAP dataset

Based on the results mentioned above, the suggested biometric system for the DEAP dataset has the following characteristics: EEG recordings of 1.75 seconds using the best 11 channels selected by the proposed MLP, and the standard deviation extracted from a three-level DWT as a feature. Figure 8 illustrates the difference in the sensitivity of the classifiers when using the proposed number of channels compared with using all available channels. In both cases, the classifiers were evaluated using the standard deviation per wavelet coefficient. Despite the difference in classification performance between using all the available channels and the proposed optimum, the previously mentioned MANOVA showed that this difference is not statistically significant.
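The feature named above, the standard deviation per set of coefficients of a three-level DWT, can be sketched for one channel as follows. Several choices here are assumptions made only to keep the example self-contained: the Haar wavelet is used because it is easy to write without external libraries (the mother wavelet is not stated in this section), the population standard deviation is used, and the signal length must remain even at every level.

```python
import math
import statistics


def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_std_features(signal, levels=3):
    """Standard deviation of each coefficient set: [cA_L, cD_L, ..., cD_1]."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    bands = [approx] + details[::-1]
    # Population standard deviation (an assumption; the sample version
    # would use statistics.stdev instead).
    return [statistics.pstdev(band) for band in bands]


# 16-sample toy signal; a real epoch would hold the samples of one EEG channel.
toy_signal = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
features = dwt_std_features(toy_signal)
```

With three levels, each channel yields four values (one approximation and three detail sets), which would then be concatenated across the selected channels to form the input of the classifier.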
In the case of the BIOMEX-DB dataset, Fig. 9 compares the proposed method for selecting the channels with a random selection, while varying the number of channels.

Table 5 Statistical analysis of sensitivity, using DEAP dataset
The columns show: first, the number of channels; second and third columns, the p-values from comparing the MLP and random sensitivity, respectively, with n and n + 1 channels; and the last column, the p-values of the Wilcoxon test between the proposed method and a random selection

The optimal number of channels is identified using the same statistical analysis used with the DEAP dataset. In this case, that number is 13 channels. Table 6 contains the resulting p-values of the statistical analysis. This table follows the same structure as Table 5, explained above. For this dataset, the results of the MANOVA indicate that the optimal number of channels is 13 for both methods. Despite this result, the p-values of the proposed MLP were, in general, lower than those of the random selection. Moreover, the equality in the optimum could be because, in this dataset, the channels used for recording the EEGs were located at the left and right zones of the scalp. In contrast, the results presented in Sect. 4.2 for the DEAP dataset, which uses more EEG channels distributed over a wider area, indicate that the best channels are located on the center of the scalp.

In addition, the Wilcoxon test results verified that the two evaluated selection methods are statistically different in almost all scenarios. The only scenario where these methods were not statistically different was when working with a single channel. This result was expected since, in a single-channel biometric system, it is extremely difficult for the results of one specific channel to be better than those of any other channel.

Fig. 9 Comparison between proposed method for ordering channels (MLP) with respect to a random ordering using BIOMEX-DB dataset

The search space of all combinations of features and channels grows exponentially. Therefore, we limited this study to finding the best feature and the minimum number of channels.
The two databases that were used in this study only include data from a single session. This could potentially affect the quality of the biometric features that need to be consistent over time for the same subject. BIOMEX-DB is a dataset specifically created for biometrics but only includes one session.
Recent studies [35] show promising results with deep neural networks that tackle the feature extraction as part of their architecture. It would be interesting to contrast our work with these new architectures to evaluate the results in both cases.

Conclusions and future work
The main contribution of this work is a methodology to select the best feature and the optimal number of channels for developing a biometric system based on EEG signals. This study worked with two datasets independently to evaluate the effectiveness of the proposed methodology. Moreover, nineteen features belonging to three categories (spectral, complexity, and wavelet-based) were described and evaluated in this investigation.
The experimental results using two different datasets showed that wavelet-based features are the best option for developing biometric systems based on EEG signals. Additionally, the standard deviation per wavelet coefficient proved to be the most effective of all the features analyzed in this study at representing the differences between subjects. In contrast, although the complexity features are commonly used in medical applications, they generally led to the lowest performance in this study.
Additionally, the evaluation of the EEG channels showed that it is not necessary to use all available ones. For instance, in the case of the DEAP dataset, using approximately one-third of the available channels produced results statistically equal to those obtained with all channels. This reduction in the number of channels is highly beneficial since it decreases the computational cost and increases the portability of the final system. Moreover, the experimental results seem to indicate that the best channels for developing biometric systems are located on the center of the scalp. The results show the viability of biometric systems based on EEG signals because a judicious selection of characteristics, based on an analysis as described in this paper, enables the implementation of an efficient and relatively simple biometric system.
As future work, there are many directions of research; here, we include the main ideas that we consider worth exploiting in the future.
• Include an in-depth analysis of the standard deviation to understand why this feature can effectively represent the differential factors of each subject.
• Analyze the impact of the location of the EEG channels on the overall performance of the system because this can significantly increase the understanding of the factors that influence the performance of EEG-based biometric systems.
• Consider collecting EEGs over several sessions to guarantee the robustness of the biometric features over time.
• Explore multivariate feature selection to find out the performance of combining features.
• Evaluate different architectures, including deep neural networks, because the MLP was the best classifier for both datasets, indicating that neural networks are a promising approach.

Fig. 2 Confusion matrices of the best and worst feature for DEAP dataset using the MLP classifier

Fig. 4 Confusion matrices of the best and worst feature for BIOMEX-DB dataset using the MLP classifier

Fig. 5 Channels position and order for the DEAP dataset.In (b), 1 corresponds to the best channel while 32 to the worst

Fig. 8 Comparison between the classifiers' performance using the proposed optimum number of channels (11) with respect to all available channels (32) using DEAP dataset

Table 1 Set of values for hyperparameter optimization

Table 2 Best hyperparameters for single-channel evaluation using MLP with DEAP and BIOMEX-DB datasets

Table 3 DEAP dataset channels in descending order by their correct classification rate

Table 4 BIOMEX-DB dataset channels in descending order by their correct classification rate