A review of channel selection algorithms for EEG signal processing

Digital processing of electroencephalography (EEG) signals is now widely used in a variety of applications such as seizure detection/prediction, motor imagery classification, mental task classification, emotion classification, sleep state classification, and drug effects diagnosis. With the large number of EEG channels acquired, it has become apparent that efficient channel selection algorithms are needed, with their importance varying from one application to another. The main purpose of the channel selection process is threefold: (i) to reduce the computational complexity of any processing task performed on EEG signals by selecting the relevant channels and hence extracting the features of major importance, (ii) to reduce the amount of overfitting that may arise from the use of unnecessary channels, thereby improving performance, and (iii) to reduce the setup time in some applications. Signal processing tools such as time-domain analysis, power spectral estimation, and wavelet transform have been used for feature extraction and hence for channel selection in most channel selection algorithms. In addition, different evaluation approaches such as filtering, wrapper, embedded, hybrid, and human-based techniques have been widely used for the evaluation of the selected subset of channels. In this paper, we survey the recent developments in the field of EEG channel selection methods along with their applications and classify these methods according to the evaluation approach.


Introduction
Digital processing of EEG signals plays an important role in a variety of applications, e.g., seizure detection/prediction, sleep state classification, and motor imagery classification. Digital processing of EEG signals consists of different components: signal acquisition unit, feature extraction unit, and a decision algorithm as shown in Fig. 1. The input to the system in Fig. 1 is an EEG signal acquired from the scalp, brain surface, or brain interior. The signal acquisition unit is represented by electrodes whether they are invasive or non-invasive. The feature extraction unit is a signal processing unit aiming to extract discriminative features from channel(s). The decision unit, in brain computer interface (BCI) for example, is a hybrid unit with the purpose of classification, decision-making, and passing the decisions to external devices outputting the intention of the subject [1].
As mentioned above, the interface between the brain and the computer (or a device) could be through invasive or non-invasive technologies. Although invasive technologies have recently shown some promise in different applications owing to their high accuracy and low noise [2], non-invasive technologies are still used extensively for safety reasons, with some additional signal processing tasks to compensate for their noise and resolution limitations. Scalp EEG acquisition devices are generally preferred due to their low cost, ease of use, portability, and high temporal resolution. The scalp EEG signals can be recorded in different modes such as the unipolar and bipolar modes. In the unipolar mode, the voltage differences between all electrodes and a reference electrode are recorded, and each electrode-reference pair forms a channel. In the bipolar mode, the voltage differences between two specified electrodes are recorded, and each such pair forms a channel. An electrode placement scheme on the scalp, known as the International 10-20 system, was recommended by the International Federation of Societies for Electroencephalography and Clinical Neurophysiology (IFSECN) [3]. Figure 2 shows the 10-20 EEG electrode positions as seen from the left and from the top of the head. These electrodes (channels) show the activities of different brain areas. Figure 3 shows the brain areas.
Most of the useful information about the functional state of a human brain lies in five major brain waves distinguished by their different frequency bands. These frequency bands are the delta band (0-4 Hz), theta band (3.5-7.5 Hz), alpha band, beta band (13-26 Hz), and gamma band [4]. Delta waves are related to the deep sleep state. Theta waves are related to the deepest state of meditation (body asleep/mind awake). Alpha waves are related to dreaming and relaxation. Beta waves are dominant in the waking state with high attention. Gamma waves are highly related to the decision-making mode of the brain. In states of mental illness, unexpected disturbances of the brain waves occur, so that considerable signal processing effort is needed to diagnose the abnormal states [4].
The acquired EEG signals are generally of multi-channel nature. To classify these signals, for example, we have two choices: to work on a subset of channels selected based on certain criteria or to work on all channels [5]. Figure 4 gives an illustration for the general process of EEG signal classification based on channel selection. In this signal processing setting, reducing the number of channels is needed because the setup process with a large number of channels is time-consuming and causes subject inconvenience. In addition, it adds to the computational complexity of the system, which is required to be low in certain applications.
Another example where channel reduction is of potential value is seizure detection and prediction. In particular, there is great interest from industry and the scientific community in the development of portable medical support systems that incorporate algorithms capable of detecting the early onset of epileptic seizures or even predicting them hours before they occur, as this will help to alert ambulatory patients or caregivers before a seizure occurs and so avoid injury [6,7]. Such portable systems should be based on computationally efficient prediction algorithms that use as few channels as possible to reduce system power consumption, a necessary step toward longer operation time.
Various techniques have been investigated for channel selection in the processing of EEG signals. This paper presents a survey of the recent developments in this field. Several flowcharts and tabular forms are presented to enable the reader to explore the different channel selection algorithms, to determine their classification according to the evaluation algorithms, and to bring attention to directions of research in EEG channel selection for different applications. The performance of different methods, if available, is given in terms of classification/detection accuracy and probability of false alarm to produce clear and informative comparisons among the channel selection approaches. In addition, this survey may assist designers in choosing the algorithms that suit their intended applications. Furthermore, this work is expected to help newcomers to the field to determine the limitations associated with the available channel selection methods and to pave the road for the development of new channel selection designs.
Fig. 1 Processing of EEG signals
Fig. 2 The international 10-20 system. The left image shows the left side of the head, and the right image presents the view from above the head [78]
The rest of this paper is organized as follows. Section 2 covers the selection techniques in general. Section 3 discusses channel selection for seizure detection/prediction. Section 4 is devoted to channel selection for motor imagery classification. Sections 5 and 6 cover the topics of emotion classification and mental task classification with channel selection strategies. Section 7 discusses channel selection for the task of sleep state analysis. Section 8 discusses the channel selection process for drug effects diagnosis. Finally, concluding remarks are given in Section 9.

Channel selection techniques
During the last decades, EEG-based processing has become a highly attractive research field. The large number of channel recordings made possible by low-cost interfaces has led to the evolution of channel selection algorithms. The objectives of channel selection are manifold: improving model performance, providing faster processing and dimensionality reduction, and identifying the brain areas that generate class-event activity.
Based on the literature, feature selection algorithms were used for EEG channel selection [8][9][10]. In this section, we show how to adapt such techniques for channel selection. The main steps of channel selection are illustrated in Fig. 5 for a set of EEG channels. The subset generation step is a heuristic search process to present a candidate for evaluation based on a search strategy such as complete search, sequential search, or random search. In some applications, a trained specialist selects a subset of channels based on his experience. There are five main categories of candidate evaluation strategies, namely, filtering, wrapping, embedded, hybrid, and human-based techniques. These techniques are used for subset evaluation. The process of channel subset generation and evaluation is terminated when a stopping criterion is satisfied (search is completed or a threshold is reached). In the last step, the selected channel subset is validated via prior knowledge about the data. The evaluation techniques are discussed in the next subsections.

Filtering techniques
Filtering techniques use an independent evaluation criterion such as a distance measure, information measure, dependency measure, or consistency measure to evaluate the candidate channel subsets, which are generated using a search algorithm. Filtering techniques have several advantages, among which are high speed, independence from the classifier, and scalability [10], but they suffer from low accuracy, since they do not consider the combinations of different channels. Figure 6 shows a general flowchart for the filtering techniques. In this flowchart, $S_0$ represents the initial subset and $S_{best}$ represents the selected best subset of channels. Also, $D(C_0, \ldots, C_{n-1})$ represents a pool of $n$ channels for selection, and $M$ refers to an independent evaluation criterion. $\gamma$ represents the value of the evaluation criterion for each subset of channels. The "evaluate" function refers to an evaluation process.
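The filtering loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the flowchart of Fig. 6 itself: the function names `filter_select` and `total_variance`, the choice of summed variance as the independent criterion M, the complete search over fixed-size subsets, and the toy data are all assumptions made for the example.

```python
from itertools import combinations
from statistics import pvariance


def filter_select(channels, criterion, subset_size):
    """Return the best channel subset under an independent criterion M."""
    names = sorted(channels)
    s_best = tuple(names[:subset_size])           # initial subset S_0
    gamma_best = criterion(s_best, channels)      # criterion value for S_0
    for subset in combinations(names, subset_size):   # complete search
        gamma = criterion(subset, channels)       # no classifier involved
        if gamma > gamma_best:                    # keep the better subset
            s_best, gamma_best = subset, gamma
    return s_best, gamma_best


def total_variance(subset, channels):
    """Hypothetical criterion M: summed variance of the chosen channels."""
    return sum(pvariance(channels[name]) for name in subset)


# Toy data (illustrative values only): four channels, five samples each.
channels = {
    "C3": [0.0, 4.0, -4.0, 4.0, -4.0],
    "C4": [0.0, 1.0, -1.0, 1.0, -1.0],
    "Cz": [0.0, 3.0, -3.0, 3.0, -3.0],
    "Pz": [0.0, 0.5, -0.5, 0.5, -0.5],
}
best_subset, best_score = filter_select(channels, total_variance, 2)
print(best_subset, best_score)
```

Because the criterion never trains a classifier, each candidate is cheap to score, which is the speed advantage noted above.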

Wrapper techniques
In wrapper techniques, a classification algorithm is used to evaluate the candidate channel subsets, which are generated by a search algorithm as shown in Fig. 7, in which A denotes a classifier and $\gamma_{best}$ represents the best value of the evaluation criterion. The evaluation of every candidate is obtained by training and testing a specific classification algorithm [10]. Consequently, wrapper techniques are more computationally expensive than filtering techniques, and they are prone to overfitting.
Fig. 3 The cerebrum is subdivided into four lobes: frontal, parietal, occipital, and temporal lobe [78]
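A wrapper evaluation can be illustrated with a small sequential forward search in which every candidate subset is scored by training and testing a classifier. The sketch below is an assumption-laden example: it swaps in a nearest-centroid classifier with leave-one-out testing in place of whatever classifier A a given study uses, and the names `loo_accuracy` and `wrapper_forward_select` and the toy trials are hypothetical.

```python
def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier on `subset`."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue                      # hold out trial i
            acc = sums.setdefault(label, [0.0] * len(subset))
            for k, c in enumerate(subset):
                acc[k] += row[c]
            counts[label] = counts.get(label, 0) + 1

        def distance(label):
            # squared distance from the held-out trial to a class centroid
            return sum((X[i][c] - s / counts[label]) ** 2
                       for c, s in zip(subset, sums[label]))

        predicted = min(sums, key=distance)
        correct += predicted == y[i]
    return correct / len(X)


def wrapper_forward_select(X, y, max_channels):
    """Greedy forward search; stops when accuracy no longer improves."""
    selected, gamma_best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_channels:
        scored = [(loo_accuracy(X, y, selected + [c]), c) for c in remaining]
        gamma, c = max(scored, key=lambda t: (t[0], -t[1]))
        if gamma <= gamma_best:
            break
        selected.append(c)
        remaining.remove(c)
        gamma_best = gamma
    return selected, gamma_best


# Toy trials: one feature per channel; only channel 1 separates the classes.
X = [[0.1, 1.0, 0.5], [0.3, 1.2, 0.4], [0.2, 0.9, 0.6],
     [0.2, 3.0, 0.5], [0.1, 3.1, 0.4], [0.3, 2.9, 0.6]]
y = [0, 0, 0, 1, 1, 1]
selected, gamma_best = wrapper_forward_select(X, y, max_channels=2)
print(selected, gamma_best)
```

Every candidate requires a full train/test cycle, which is why the cost grows quickly with the number of channels.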

Embedded techniques
In embedded techniques, the channels are selected based on criteria generated during the learning process of a specific classifier, because the selection is built into the classifier construction [9]. Embedded techniques thus achieve an interaction between channel selection and classification. They are computationally less expensive and less prone to overfitting. They are based on recursive channel elimination, which keeps only the channels with appreciable magnitude.

Hybrid techniques
A hybrid technique is a combination of a filtering technique and a wrapper technique, attempting to take advantage of both while avoiding the pre-specification of a stopping criterion (see Fig. 8). Generally, hybrid techniques utilize both an independent measure and a mining algorithm for evaluation of the available channel subsets [10]. Two threshold values are evaluated: $\gamma_{best}$, corresponding to the case with a classifier, and $\theta_{best}$, corresponding to the case without a classifier. The independent measure is used to select the best subset for a given size Cr (cardinality), and then the mining algorithm is used to select the final best subset across cardinalities.
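The two-stage idea can be sketched as follows, under stated assumptions: `independent_measure` and `classifier_score` are hypothetical callables, the relevance values and accuracies are toy numbers rather than results from any cited study, and a complete search per cardinality is used only to keep the example short.

```python
from itertools import combinations


def hybrid_select(names, independent_measure, classifier_score):
    """Filter stage per cardinality, then a wrapper stage across cardinalities."""
    best_subset, gamma_best = None, float("-inf")
    for size in range(1, len(names) + 1):
        # Filter stage: best subset of this cardinality, no classifier needed.
        candidate = max(combinations(names, size), key=independent_measure)
        # Wrapper stage: the mining algorithm compares the per-size winners.
        gamma = classifier_score(candidate)
        if gamma > gamma_best:
            best_subset, gamma_best = candidate, gamma
    return best_subset, gamma_best


# Toy relevance values and toy accuracies (illustrative assumptions only).
relevance = {"C3": 0.5, "C4": 0.3, "Cz": 0.05}
toy_accuracy = {
    ("C3",): 0.80,
    ("C3", "C4"): 0.92,
    ("C3", "C4", "Cz"): 0.90,
}


def independent_measure(subset):
    return sum(relevance[name] for name in subset)


def classifier_score(subset):
    return toy_accuracy[subset]


best_subset, gamma_best = hybrid_select(
    ["C3", "C4", "Cz"], independent_measure, classifier_score)
print(best_subset, gamma_best)
```

The classifier is called only once per cardinality, so the cost sits between that of a pure filter and a pure wrapper.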

Human-based techniques
In some applications, a well-trained observer evaluates the outcome of a specific application, like seizure detection, on the selected channels with any of the abovementioned subset generation techniques based on his experience. Thus, the findings of the human-based techniques can be used in a feedback manner to refine the channel selection process.

Channel selection for seizure detection/prediction
Epilepsy is well known as the second most prevalent brain disorder (after stroke) and is characterized by the unexpected occurrence of seizures. The International League Against Epilepsy (ILAE) and the International Bureau for Epilepsy (IBE) defined the epileptic seizure as "a transient occurrence of signs and/or symptoms due to abnormal excessive or synchronous neuronal activity in the brain" [11]. Epilepsy affects around 1 % of the world population, and based on the seizure statistics of the Epilepsy Foundation of America (EFA), about 200,000 cases of epilepsy are diagnosed per year. The primary tool for the diagnosis and management of epilepsy is the EEG signal.
In general, EEG recordings have different channels for signals acquired from different spots of the human brain. In certain applications, there is a need to select some of these channels for EEG seizure detection/prediction because the computational load required for a seizure detection/prediction algorithm increases as a function of the number of channels [12]. Reducing the number of channels is of utmost importance, for example, in the development of portable medical support systems for epilepsy patients, as reducing algorithmic computational complexity leads to faster real-time response and lower power consumption, and hence to longer operation time. In addition, the lower the number of channels, the more convenient the recording is for the patient and the shorter the setup time required to fix gel-based EEG electrodes. Another factor that needs to be considered carefully in seizure detection/prediction is the overfitting effect due to the utilization of a large number of redundant channels. Therefore, channel selection could be used to reduce the feature pattern size and lower the computational cost of feature extraction and classification. In what follows, we cover some of the channel selection techniques utilized for seizure detection/prediction and classify them according to the channel subset evaluation techniques given in Section 2.

Filtering techniques
For EEG seizure detection and prediction, signal statistics such as variance and entropy can be used for channel selection. This subsection presents four trends for channel selection for EEG seizure detection and prediction, with a common thread that they are all based on signal statistics. Duun-Henriksen et al. [12] investigated different channel selection schemes based on different statistical criteria as follows:

Selection based on variance
The variance of the ictal data of each available channel is estimated with the equation:
$$V_c = \frac{1}{k} \sum_{i=1}^{k} \left( x_c(i) - \mu_c \right)^2$$
where $x_c$, $\mu_c$, and $k$ are the seizure data, mean, and number of samples of the training seizure data of channel $c$, respectively. The three channels with maximum variance are selected for signal classification [12].
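As a minimal sketch of this selection rule, the snippet below ranks channels by the variance of their ictal training samples and keeps the top three. The function name `select_by_variance`, the channel labels, and the sample values are assumptions chosen for illustration, and the population variance is used as one possible normalization.

```python
from statistics import pvariance


def select_by_variance(ictal, n_select=3):
    """Rank channels by the variance of their ictal training data."""
    variances = {name: pvariance(samples) for name, samples in ictal.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_select], variances


# Toy ictal segments (illustrative values only), one list per channel.
ictal = {
    "F3": [1.0, -1.0, 1.0, -1.0],
    "C3": [5.0, -5.0, 5.0, -5.0],
    "Cz": [3.0, -3.0, 3.0, -3.0],
    "P3": [0.5, -0.5, 0.5, -0.5],
    "O1": [2.0, -2.0, 2.0, -2.0],
}
selected, variances = select_by_variance(ictal)
print(selected)
```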

Selection based on difference in variance
The difference in variance is calculated as follows:
$$V_{diff,c} = V_c - V_{nonict,c}$$
where $V_c$ is the variance of the ictal training data of channel $c$ and $V_{nonict,c}$ is the variance of the non-ictal (non-seizure) training data, which represents the background variance to be deducted (the variance of the normal state). The N input channels are selected based on the minimum difference of variance.

Selection based on entropy
The entropy of an EEG channel is a measure of uncertainty, where the EEG signal is considered as a random variable. The entropy of channel $c$ is defined as:
$$H(c) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)$$
where $p(x_i)$ is the probability mass function of the channel having $n$ samples. The N channels with the highest entropy are chosen as input to the automatic seizure detector.
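The entropy criterion can be sketched with a simple histogram estimate of the probability mass function. The binning scheme, the number of bins, the base-2 logarithm, the names `channel_entropy` and `select_by_entropy`, and the toy signals are assumptions of this example rather than details taken from [12].

```python
import math
from collections import Counter


def channel_entropy(samples, n_bins=4):
    """Histogram-based entropy estimate (in bits) of one channel."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 0.0                              # constant signal
    width = (hi - lo) / n_bins
    # Assign each sample to a bin; the maximum falls into the last bin.
    bins = Counter(min(int((x - lo) / width), n_bins - 1) for x in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())


def select_by_entropy(channels, n_select):
    """Keep the n_select channels with the highest estimated entropy."""
    scores = {name: channel_entropy(x) for name, x in channels.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_select], scores


# Toy signals (illustrative values only).
channels = {
    "flat": [1.0] * 8,
    "two_level": [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    "four_level": [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0],
}
selected, scores = select_by_entropy(channels, n_select=2)
print(selected, scores)
```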
In the method developed by Duun-Henriksen et al. [12], after channel selection, a feature extraction process utilizing the wavelet transform is performed on the selected channels. A support vector machine (SVM) was used as a classifier for seizure detection, with a non-linear radial basis kernel having a regularization term of 0.5 and a cost factor varying between 0.05 and 0.5. Recorded iEEG data comprising 59 seizures and 1419 h from 10 patients were used for training and testing. It was found that the best channel selection method is the one based upon maximum variance during seizure, which led to a seizure detection sensitivity of 96 % and a false detection rate of 0.14/h using three channels. The work of Duun-Henriksen et al. [12] falls in the class of filtering techniques, as it used an independent measure, the variance, for channel selection. The authors did not use a search strategy.
Faul [13] used another statistical measure, namely the probability of a seizure in a channel produced by the real-time EEG analysis for event detection (REACT) system of Temko et al. [14], as a tool for channel selection to change the behavior of the system with the aim of reducing the computational effort. The SVM output in this system is treated as a probability, which is further passed through a sigmoid function. He used a waiting time for each channel to be incorporated in the channel selection process, and this time is computed according to the probability of seizure in that channel. He reported that the computational effort can be reduced by up to 65 % with no effect on the seizure detection performance of the REACT system (96 % for neonatal and 94 % for adult databases). Faul and Marnane [15] developed statistics-based dynamic EEG channel selection methods to reduce the power consumption in seizure detection, which can be classified as filtering techniques. In their methods, a number of predefined channels, called primary screening channels (PSCs), are chosen in the following approaches depending on the probability provided by the REACT system, and then channels are added or removed in the next epoch. The first approach is location spread, which is based on some seed PSCs and their neighboring channels. Channels are added or excluded based on the probability of seizure. If the probability of seizure in any of the PSCs exceeds a threshold, the nearest channel to the PSC is added to the analysis and remains until its probability drops below a threshold. Their results show that with two channels {2,7}, the performance is 95.74 % with a 66.76 % computational load saving, while with four channels {2,4,6,7}, the performance is 96.55 % with a 43.11 % computational cost reduction.
The second approach is idling (single/twin) based on the two brain hemispheres. For idling, channels in the PSCs are analyzed sequentially in alternate epochs, and when a channel is activated, it is then analyzed continuously until deactivation. Both activation and deactivation are based on the probability of seizure. In twin idling, the PSCs in each hemisphere are idled separately. The performance of the idling approach, without dynamic channel selection, is 91 and 91.48 % and the computational saving is 87.5 and 75 % for single and twin idle, respectively. There is an increase in computational saving with single idling of approximately 10 and 30 % with two PSC and four PSC configurations, respectively. Location spread with twin idling is another approach proposed by the authors, but it did not achieve an appreciable gain in terms of performance over the single idling approach, with an approximately 10 % drop in computational cost. The results of these approaches were compared with those of the all-channel (eight-channel) REACT system [14], demonstrating the feasibility of the approaches suggested by Faul and Marnane [15].
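The location spread rule described above (add the nearest neighbor of a PSC when the PSC's seizure probability exceeds a threshold, and keep the added channel until its own probability drops below a threshold) can be sketched as a per-epoch update. The function name `update_active_channels`, the two threshold values, the neighbor map, and the probabilities are hypothetical; they are not taken from [15].

```python
def update_active_channels(active, probs, pscs, nearest,
                           on_thr=0.5, off_thr=0.3):
    """Return the set of channels to analyse in the next epoch.

    active  : channels analysed in the current epoch
    probs   : seizure probability of each analysed channel
    pscs    : primary screening channels (always analysed)
    nearest : hypothetical map from a PSC to its nearest neighbour
    """
    nxt = set(pscs)
    # Previously added channels stay while their probability is high enough.
    for ch in active - set(pscs):
        if probs.get(ch, 0.0) >= off_thr:
            nxt.add(ch)
    # A PSC above the activation threshold pulls in its nearest neighbour.
    for psc in pscs:
        if probs.get(psc, 0.0) > on_thr:
            nxt.add(nearest[psc])
    return nxt


pscs = {2, 7}
nearest = {2: 1, 7: 8}          # assumed electrode neighbourhood
epoch1 = update_active_channels({2, 7}, {2: 0.8, 7: 0.1}, pscs, nearest)
epoch2 = update_active_channels(epoch1, {1: 0.6, 2: 0.2, 7: 0.1}, pscs, nearest)
epoch3 = update_active_channels(epoch2, {1: 0.1, 2: 0.2, 7: 0.1}, pscs, nearest)
print(epoch1, epoch2, epoch3)
```

Only the channels in the returned set need to be processed in the next epoch, which is where the computational saving comes from.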
Atoufi et al. [16] investigated the prediction ability of neuro-fuzzy models in different states of EEG signals: normal, pre-ictal, and ictal. Although the main objective of this study was to improve the model prediction accuracy using information fusion, the main challenge was selecting the channels that should be used to construct the predictor. A selection algorithm was used to select the channels with the largest amount of information about the target (the channel whose signal is to be predicted) but with the least information about each other. This technique can be classified as a filtering technique with a greedy sequential search strategy because it depends on independent channel evaluation criteria. An information theoretic criterion, mutual information (MI), was used to select a group of channels for multi-channel prediction. The MI of two random variables X and Y is defined as [16]:
$$I(X;Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y)$$
where $H(X)$ and $H(Y)$ are the entropies of the variables X and Y, $H(X|Y)$ is the conditional entropy, and $H(X,Y)$ is the joint entropy of the two variables X and Y. The authors evaluated their method on the data of two patients from the Freiburg database, which is a publicly available intracranial EEG database containing six-channel (three focal and three extrafocal) ECoG recordings of 21 patients with a 256 Hz sampling rate [15]. This method achieved 60.6 and 60 % success rates in three-channel cases of ECoG and EEG datasets, respectively. The authors reported that the prediction accuracy using the selection algorithm with multiple channels improved, with a noticeable improvement in pre-state detection over the single-channel case.
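A greedy MI-based selection of this kind might look like the sketch below. It is not the algorithm of [16]: the histogram discretization, the score (relevance to the target minus mean redundancy with the channels already chosen), the function names, and the toy signals are all assumptions made to illustrate the "most information about the target, least about each other" idea.

```python
import math
from collections import Counter


def discretize(x, n_bins=4):
    """Map samples to integer bins of equal width."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0          # guard against constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in x]


def entropy(symbols):
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) on discretized sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def greedy_mi_select(channels, target, n_select):
    """Pick channels informative about the target but not about each other."""
    t = discretize(target)
    binned = {name: discretize(x) for name, x in channels.items()}
    selected = []
    while len(selected) < min(n_select, len(binned)):
        def score(name):
            relevance = mutual_information(binned[name], t)
            redundancy = 0.0
            if selected:
                redundancy = sum(mutual_information(binned[name], binned[s])
                                 for s in selected) / len(selected)
            return relevance - redundancy
        candidates = [n for n in sorted(binned) if n not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy signals: B duplicates A, while C carries independent information.
channels = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 0, 2, 2, 0, 0, 2, 2],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
}
target = [0, 1, 2, 3, 0, 1, 2, 3]
selected = greedy_mi_select(channels, target, n_select=2)
print(selected)
```

In this toy case the redundant copy B is skipped in favor of C, even though B alone is as informative about the target as A.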

Wrapper techniques
In this subsection, we bring together the channel selection techniques for EEG seizure detection and prediction that can be classified as wrapper techniques and show why they fall into this category. Shih et al. [17] presented a machine learning-based approach to construct detectors that use fewer channels for seizure onset detection. For selecting channels, the authors used an instance of the wrapper approach, which is a feature selection algorithm with backward elimination. This approach reduced the average number of channels required to detect the seizure onset from 18 to 4.6, while the mean fraction of seizures detected decreased from 99 to 97 %. In addition, the average number of false events per hour decreased from 0.35 to 0.19. The average detection latency increased from 7.8 to 11.2 s, and an average energy saving of 69 % was achieved. When this approach was combined with a patient-specific screening detector, an additional energy saving of 16 % was achieved. Those results were compared to the 18-channel results [18], revealing the feasibility of channel reduction with this approach.
Glassman and Guttag [19] presented another method that uses recursive feature elimination to design patient-specific SVM detectors that use small numbers of electrodes. The recursive feature elimination uses the SVM to rank the contributions of each selected channel. They used a leave-one-out cross validation principle to estimate the performance of the detectors as illustrated in Table 1. The main idea of the process is to find the smallest number of channels n, such that the average cross validation performance of detectors built using n channels is at least as good as the average cross validation performance of the 21-channel detector.
The Detectorsetup (C, S) is a function to build an SVM detector using the channels in C. It uses S seizures as a training set and REF(n, S) as a recursive feature elimination function used to find the best n channels when training on S. The function Modify(Subset, d, s) is a function used to calculate the performance of the detector d when used on the file S and update the average performance. This method is based on a wrapper algorithm with a sequential search strategy. The authors evaluated their method using 21-channel scalp EEG recordings of 10 patients. The method achieved 7.1 average channels, 0.011 average false negative, 0.48 average false positive, and 9.54 s average latency time.
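The search for the smallest sufficient number of channels can be sketched as a leave-one-out loop. This is a hedged illustration only: `rank_channels`, `build_detector`, and `evaluate` are hypothetical stand-ins that loosely play the roles of the ranking, detector construction, and performance update functions named above. In particular, the ranking here uses mean amplitude instead of SVM-based recursive feature elimination, the "detector" is a simple threshold rule rather than an SVM, and the data are toy values.

```python
def smallest_sufficient_n(seizures, all_channels,
                          rank_channels, build_detector, evaluate):
    """Smallest n whose leave-one-out score matches the full montage."""
    def average_cv(n):
        scores = []
        for i, held_out in enumerate(seizures):
            train = seizures[:i] + seizures[i + 1:]      # leave one out
            chans = rank_channels(train, all_channels)[:n]
            detector = build_detector(chans, train)
            scores.append(evaluate(detector, held_out))
        return sum(scores) / len(scores)

    full = average_cv(len(all_channels))    # full montage as the reference
    for n in range(1, len(all_channels)):
        if average_cv(n) >= full:
            return n, full
    return len(all_channels), full


# Hypothetical stand-ins (assumptions, not the method of [19]).
def rank_channels(train, channels):
    def mean_strength(ch):
        return sum(s[ch] for s in train) / len(train)
    return sorted(channels, key=mean_strength, reverse=True)


def build_detector(chans, train):
    return tuple(chans)                     # the detector is its channel set


def evaluate(detector, seizure):
    # Toy rule: detected if any monitored channel is strongly active.
    return 1.0 if max(seizure[ch] for ch in detector) > 0.5 else 0.0


seizures = [
    {"F3": 0.9, "C3": 0.2, "O1": 0.1},
    {"F3": 0.8, "C3": 0.7, "O1": 0.1},
    {"F3": 0.1, "C3": 0.9, "O1": 0.2},
]
n_channels, full_score = smallest_sufficient_n(
    seizures, ["F3", "C3", "O1"], rank_channels, build_detector, evaluate)
print(n_channels, full_score)
```

The channel ranking is recomputed inside every fold, so the held-out seizure never influences which channels are chosen for it.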
Statistical metrics have also been used for wrapper channel selection techniques. Mirowski et al. [20] presented a method based on computing bivariate features of EEG synchronization. These features are cross correlation, non-linear independence, dynamic entrainment, and wavelet synchrony. They computed the features on 21 patients from the Freiburg dataset [21] and aggregated the features over time before classification. Chang et al. [22] proposed a channel selection method to reduce the feature pattern size produced by the work of Mirowski et al. [20] for seizure prediction. Their work can be classified as a wrapper technique with a pre-specified subset of channels. Their method requires computation of features from pairs of channels of the available EEG signal. They investigated the performance of all channel pair combinations with the number of channels from two to six in the case of the electrocorticography (ECoG) dataset and of 75 combinations of fixed channel pairs in the case of the EEG dataset. This method aims to reduce the number of wavelet coherence (localized correlation coefficient in time-frequency space) values computed for a given channel pair over non-overlapping 5 s windows and frequency bands after decomposing the channels into sub-bands. The features are aggregated into patterns. Then, the SVM is used to classify the patterns into pre-ictal and inter-ictal states as shown in Fig. 9. They evaluated their method using three datasets: the Freiburg database [21], the CHB-MIT database (6 patients: 1, 3, 6, 7, 10, 22), which is a scalp EEG database with a 256 Hz sampling rate and more than 22 channels for most of them [23], and the National Taiwan University Hospital database (one patient), which is also a scalp EEG database with a 200 Hz sampling rate and 18 channels [24]. The method achieved 60.6 and 60 % success rates in three-channel cases of the ECoG and EEG datasets, respectively. Also, the method achieved more than 93.73 % of computational saving compared to the full 22-channel case.
Greene et al. [25] developed another non-patient-specific statistical method for automated neonatal channel selection and seizure detection based on a regularized discriminant classifier. This method can be classified as a wrapper method with a pre-specified subset channel selection scheme. Seven features were extracted from non-overlapping 8 s windows: spectral entropy, Shannon entropy, spectral edge frequency, non-linear energy, line length, wavelet energy, and root mean square (RMS) amplitude. They compared the effectiveness of their method on a single channel with the training performed on multi-channel EEG. The authors evaluated their method on 17 recordings from 17 neonates with a total of 251.9 h and 411 seizures at a 256 Hz sampling rate. Each recording contained 7-11 EEG channels and 1 ECG channel. They examined the performance of nine single channels, which are C4-C3, C3-T3, C4-T4, F3-C3, F4-C4, Cz-C4, Cz-C3, C4-O2, and C3-O1. Channel C3-C4 gave the best seizure detection performance when compared to the other single channels. It achieved a 90.77 % correct detection rate and a 9.43 % false detection rate, while the multi-channel case achieved an 81.03 % detection rate and a 3.82 % false detection rate.
Another statistical approach was presented by Temko et al. [26]. They presented an online neonatal seizure detection framework based on EEG channel weighting and moving average filtering as illustrated in Fig. 10. The authors computed the channel weights on the fly using patient-specific history and clinically derived prior channel importance. The moving average filtering is used to smooth the SVM output, which is interpreted as probabilities. The dataset used consists of EEG recordings of 17 newborns obtained from the Cork University Maternity Hospital database, Ireland [27]. A total of 36 channels were used for recording the data. Eight channels were selected for further processing using channel weighting. From each channel, 55 features were extracted and then fed to an SVM classifier. The output of the classifier was then smoothed with a moving average filter and converted to probability-like values using a Bayesian probabilistic framework. These values were then compared with a threshold in the interval [0,1], and based on this comparison, decisions about the presence of seizures per channel were taken. The area under the precision-recall (PR) curve [28] was used as a metric in this work. The authors have shown that with their proposed channel weighting technique, the PR area increased by up to 25 %, with the average increasing from 81.0 to 84.42 %. Furthermore, it was reported that the performance of the channel weighting algorithm was proportional to the subject observation time.
Table 1 Using cross validation to select channels for a patient [19]
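The smoothing and thresholding steps of such a framework can be sketched as follows. Only the generic moving average and the per-channel threshold comparison are illustrated; the window length, the threshold value, the function names, and the probability sequence are assumptions, and the channel weighting and Bayesian conversion of [26] are not reproduced.

```python
def moving_average(values, window):
    """Causal moving average; early outputs average the samples seen so far."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]    # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def detect(probabilities, window=3, threshold=0.5):
    """Per-epoch seizure decision for one channel after smoothing."""
    return [p > threshold for p in moving_average(probabilities, window)]


# Toy classifier outputs for one channel (illustrative values only):
# an isolated spike followed by a sustained rise.
probs = [0.0, 1.0, 0.0, 0.9, 0.9, 0.9]
decisions = detect(probs)
print(decisions)
```

The isolated spike is suppressed by the smoothing, while the sustained rise still crosses the threshold.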

Human-based techniques
In this subsection, we explore two human-based techniques for channel selection in EEG seizure detection and prediction. Zimbric et al. [29] compared a collection of 3 channels with a collection of 21 channels for the detection of neonatal seizures and quantification of seizure burden. Tracings were analyzed in the three-channel montage for seizure number, duration, and quantification of seizure burden before reanalysis with the full 21-channel neonatal minimal placement montage. Seizures were identified using standard definitions of EEG seizure. Analysis of the results was performed by two independent readers, and hence, this method can be classified as a human-based technique with a pre-specified subset of channels. Evaluation metrics such as sensitivity, specificity, and reliability were calculated. They evaluated their method using 35 EEG recordings from 28 infants with a total of 1389 min. The sensitivity and specificity of the three-channel montage for detecting seizures >10 s were 91 and 100 % for reader 1 and 82 and 96 % for reader 2, respectively.
Tekgul et al. [30] presented a comparison study of a reduced electrode montage (9 electrodes) with the full 10/20 electrode montage (19 electrodes), considering detection and characterization of neonatal seizures and background EEG characteristics. Three independent readers reviewed EEG recordings for number, duration, and topography of seizures and background features. Hence, we can consider this approach as a human-based approach with a pre-specified subset channel selection. The reviewers started with the reduced montage and then reviewed the full montage. A total of 151 EEG recordings from 139 infants, obtained from the Bio-logic System of the Clinical Neurophysiology Laboratory, Children's Hospital, Boston [30], were reviewed by the readers on both montages. The sensitivity and specificity of the reduced montage for seizure detection were 96.8 and 100 %, respectively.
Fig. 9 Seizure prediction algorithm of Chang et al. [13]
Fig. 10 Steps of the Temko et al. [26] method for seizure detection with channel selection

Channel selection for motor imagery classification
Motor imagery is a mental task in which the subject imagines performing an action. Motor imagery classification is very important for certain patients. This task can be performed with EEG signals and may require channel selection to choose the channels most related to the cortical activity patterns and to reduce the computation time as well.

Filtering techniques
The common thread between the channel selection techniques adopted for motor imagery classification and categorized as filtering techniques is that they are based on EEG signal statistics. He et al. [31] presented a statistical channel selection method for classifying motor imagery. This method used the Bhattacharyya bound of common spatial patterns [32] as an optimality index and a fast forward search to find the optimal combination of channels. It is a filtering method with a sequential search strategy for subset channel selection. It then uses the Bayes algorithm [33] as a classifier. The authors utilized four EEG recordings for subjects a, b, d, and e of dataset 1 of BCI competition IV to evaluate the performance of their method, each of which contains 200 trials. A total of 59 channels were used for the recordings. They reported that the classification accuracy obtained by their method, approximately 95 % with an average of 33 channels, is higher than that obtained with all channels, although their solution is still suboptimal.
Another statistical method was presented by Tam et al. [34], who proposed a channel selection method for motor imagery classification based on the sorting of common spatial pattern (CSP) filter coefficients. Their method, called CSP-rank, is based on a filtering approach with a sequential search strategy for subset channel selection. It uses two CSP filters for two classes corresponding to motor imagery and immobilization. It firstly sorts the absolute value of the filter coefficients in each filter and then selects the electrode with the next largest coefficient in turn from the two spatial filters. They utilized 64-channel EEG recordings from five chronic stroke patients through 20 sessions and each session consisted of 80 trials. They compared CSP-rank with support vector machine recursive feature elimination (SVM-RFE) [35,36] and random selection. CSP-rank was able to maintain an average classification accuracy rate above 90 % for 8-38 electrodes. It obtained the highest average classification accuracy rate of 91.7 % with 22 electrodes. The SVM-RFE maintained average classification accuracy rate above 90 % for 12-28 electrodes and achieved the highest average classification accuracy rate of 90.7 % with 14 electrodes. Random selection maintained an average classification accuracy rate above 85 % for 10-50 electrodes and obtained the highest average classification accuracy rate of 89.6 % with 32 electrodes.
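The alternating selection rule described for CSP-rank can be sketched as follows. This is only an illustration under stated assumptions: the two CSP filters are taken as already computed and passed in as plain lists of coefficients, and the function name `csp_rank` and the toy coefficients are hypothetical, not taken from [34].

```python
def csp_rank(filter_a, filter_b, n_select):
    """Pick channels by alternating between two spatial filters,
    always taking the next-largest absolute coefficient."""
    # Channel indices of each filter, sorted by descending |coefficient|.
    order_a = sorted(range(len(filter_a)), key=lambda ch: -abs(filter_a[ch]))
    order_b = sorted(range(len(filter_b)), key=lambda ch: -abs(filter_b[ch]))
    queues = [order_a, order_b]
    positions = [0, 0]
    selected = []
    turn = 0
    while len(selected) < n_select and any(
        p < len(q) for p, q in zip(positions, queues)
    ):
        q, p = queues[turn], positions[turn]
        # Skip channels that were already contributed by the other filter.
        while p < len(q) and q[p] in selected:
            p += 1
        if p < len(q):
            selected.append(q[p])
            p += 1
        positions[turn] = p
        turn = 1 - turn  # switch to the other filter
    return selected


# Toy coefficients for a 4-channel montage (made-up values).
print(csp_rank([0.1, -0.9, 0.3, 0.2], [0.8, 0.05, -0.7, 0.1], 3))  # [1, 0, 2]
```

Channel 2 ranks second in both toy filters, so it is taken once and the next turn falls through to the following candidate.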
Yong et al. [37] presented another statistical channel selection method for classifying two motor imageries based on introducing an l1-norm regularization term into the CSP algorithm, which promotes sparsity in the weights of the spatial filter. This method adopts a filtering approach with a pre-specified subset channel selection strategy based on experience. The EEG data used for evaluating this method was recorded from five subjects (aa, av, al, aw, ay) using 118 channels and a 1 kHz sampling rate, provided by Fraunhofer FIRST (Intelligent Data Analysis Group) and University of Medicine Berlin (Neurophysics Group) with two classes: right hand and right foot motor imageries [38]. Each class consisted of 140 trials. The method was able to reduce the number of channels on average to 13 electrodes (of 118 electrodes), while the average classification accuracy rate dropped only from 77.3 to 73.5 %. The value of the regularization parameter is subject-specific and was selected manually. Therefore, an automatic procedure for choosing it is still needed for the method to produce reasonable results without manual tuning.
Meng et al. [39] presented an automated channel selection method based on CSP in BCI systems. The CSP algorithm is commonly used to derive spatial filters for the multi-channel EEG signals. However, it is known that the performance of the CSP degrades due to the overfitting problem when the number of channels is large. Therefore, to reduce the number of channels, the authors used a heuristic based on the l1 norm to select the most useful channels and then extracted the features from the selected channels using the CSP. They initially applied the CSP to the datasets and then scored the channels based on their l1 norm. The channels with the highest scores were selected for further processing, while the others were excluded. Using the CSP, features are extracted only from the selected channels and are forwarded to the classifier. This algorithm adopted a filtering approach. It was evaluated on datasets provided by Fraunhofer FIRST (Intelligent Data Analysis Group) and University of Medicine Berlin (Neurophysics Group). A total of 118 electrodes were placed on the scalp to record the data with a sampling rate of 1 kHz for five subjects [38]. The algorithm was compared with a commonly used γ 2 algorithm for channel selection [40].
Wang et al. [41] proposed a channel reduction method for motor imagery, in which the prominent channels were selected using the maxima of the spatial pattern vectors obtained with the CSP algorithm. Event-related desynchronization (ERD) and readiness potential (RP) of the selected EEG channels were used as features. Using these features, EEG signals were classified using a Fisher discriminant (FD) classifier [42][43][44]. This method was evaluated on datasets provided by Fraunhofer FIRST (Intelligent Data Analysis Group) and University Medicine Berlin (Neurophysics Group) [38]. It was shown that the classification accuracies with four channels were 93.45 and 91.88 % for two subjects. Increasing the number of channels to eight increased the classification accuracies to 96.68 and 93.25 %, respectively, at the expense of decreasing the convenience of the system user.
The effect of increasing the number of channels on the classification accuracy of EEG-based motor imagery has been presented by Shan et al. [45]. They adopted a filtering approach with a sequential search strategy for subset channel selection. They used two different datasets for comparison: an imagery-based cursor movement control dataset and a motor imagery tasks dataset. In the first dataset, 64 channels were used for recording the data with a sampling rate of 200 Hz. In the second dataset, 59 channels were used for recording data with a sampling rate of 100 Hz. A modified time-frequency-spatial synthesized method was used for right and left motor imagery classification. It was observed that increasing the number of channels increases the classification accuracy in the first dataset, while this is not the case in the second dataset, in which the optimum accuracy is achieved at a subset of channels. Increasing the number of channels from two to all in the first dataset increased the training and testing classification accuracies from 68.7 to 90.4 and 63.7 to 87.7 %, while in the second dataset, it was observed that the classification accuracy increased up to 16 channels and then significantly decreased from 81.3 to 68.9 % for all channels. It was concluded that the performance of online BCI systems increases with the number of channels, in contrast to the offline motor imagery tasks paradigm.
EEG patterns in a BCI system vary from the first session to subsequent sessions on other days due to several subjects' preconditions. Therefore, there is a need for a robust and stable channel selection algorithm across different sessions. Arvaneh et al. [46] presented a robust channel selection approach across sessions in a BCI system involving stroke patients, adopting a filtering approach with pre-specified subset channel selection based on experience. They proposed a robust sparse common spatial pattern (RSCSP) algorithm for optimal EEG channel selection across different sessions, where the estimates of the covariance matrices of EEG measurements are replaced with the robust minimum covariance determinant (MCD) estimates. The stability and robustness of this algorithm were evaluated by comparison with five existing channel selection algorithms across 12 different sessions of motor imagery-based datasets from 11 stroke patients. A total of 27 channels were used for recording the data with a sampling rate of 250 Hz. Eight channels were selected using the RSCSP algorithm from the first session and were evaluated on the 11 subsequent sessions. The results showed that the RSCSP algorithm outperformed other algorithms like SCSP, CSP, MI, Fisher criterion (FC), and SVM by an average accuracy of 0.88, 2.85, 2.69, 4.85, and 4.58 %, respectively.
He et al. [47] presented a Rayleigh coefficient (RC) maximization-based genetic algorithm (GA) for channel selection in motor-imagery BCI systems, adopting a filtering approach with a random search strategy for subset channel selection. The CSP diagonalizes the covariance matrices and maximizes the difference of variances of two classes. RC maximization, on the other hand, is performed not only for maximizing the difference of covariance of two classes but also for minimizing the sum of these two covariance matrices. Hence, the RC features can be more discriminating than CSP features. However, like CSP, the performance of RC maximization deteriorates with redundant electrode channels. Therefore, the authors proposed an improved GA for channel selection based on RC maximization. Using this algorithm, Fisher ratios for every single channel were calculated and ranked in descending order. The first selection of the subset of channels was made through the maximum Fisher ratios of the channels. An improved GA based on RC maximization was then applied to the selected channels to get the optimum subset of channels. This algorithm was evaluated on two datasets. In the first dataset, 118 channels were used to record the data with a sampling rate of 100 Hz, and in the second dataset, 32 channels were used for recording the data with a sampling rate of 250 Hz. It was observed that the RC-GA achieved high classification accuracy with lower computational cost. The average accuracies are 88.2 and 89.38 % for the first and second datasets, respectively. The performance of this algorithm was also compared with other channel selection algorithms like SVM-GA, Sequential Forward Search (SFS), and Sequential Backward Search (SBS) algorithms. It was shown that RC-GA provided more compact selected channels, while acquiring higher classification accuracy than the other mentioned algorithms.
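The preliminary ranking step, in which a Fisher ratio is computed for every single channel and the channels are sorted in descending order, can be illustrated with a short sketch. It assumes one scalar feature per channel and per trial and uses the common two-class form of the Fisher ratio, (difference of class means)^2 / (sum of class variances); whether [47] uses exactly this form is an assumption, and the function names and toy data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_ratio(class_a, class_b):
    """Fisher ratio of one scalar feature observed under two classes.
    Assumes the two class variances are not both zero."""
    spread = pvariance(class_a) + pvariance(class_b)
    return (mean(class_a) - mean(class_b)) ** 2 / spread


def rank_channels(features_a, features_b):
    """Return channel indices sorted by descending Fisher ratio.
    features_a[ch] / features_b[ch] hold the per-trial values of channel ch."""
    ratios = [fisher_ratio(a, b) for a, b in zip(features_a, features_b)]
    return sorted(range(len(ratios)), key=lambda ch: -ratios[ch])


# Three toy channels, three trials per class (made-up values).
class_a = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
class_b = [[1.5, 2.5, 3.5], [5.0, 6.0, 7.0], [10.0, 11.0, 12.0]]
print(rank_channels(class_a, class_b))  # [1, 0, 2]
```

In this toy case, channel 1 separates the classes best and channel 2 carries no class information, so a first-pass selection would keep the leading entries of the returned list before the GA stage.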
Arvaneh et al. [48] proposed an SCSP algorithm for subject-dependent channel selection in BCI systems adopting a filtering approach with a pre-specified subset channel selection scheme. They formulated the SCSP algorithm as an optimization problem to select the minimum number of channels within a constraint of classification accuracy. The CSP is usually used to derive spatial filters for the multi-channel EEG signals. However, the weights of the CSP are very dense. The CSP algorithm is sparsified by adding an lp-norm regularization term, 0 < p < 1, into the optimization problem. The performance of this algorithm was evaluated using two datasets, Dataset IIa [49] from BCI competition IV with 22 channels (four motor imagery tasks: left hand, right hand, feet, or tongue) and Dataset IVa [50] from BCI competition III with 118 channels (two motor imagery tasks: right hand and foot), recorded from 14 subjects. It was shown that the SCSP algorithm achieved the best classification accuracy while reducing the number of channels, with an improvement of 10 % in classification accuracy compared to the three-channel case (C3, C4, and Cz). The average accuracy rates for SCSP1 (maximizing the accuracy by removing noisy and irrelevant channels) were 81.63 and 82.28 % for Dataset IIa and Dataset IVa with average numbers of channels of 13.22 and 22.6, respectively. The average accuracy rates for SCSP2 (minimizing the number of selected channels while maintaining an accuracy comparable to that of all channels) were 79.07 % with an average of 8.55 channels and 79.28 % with an average of 7.6 channels for the first and second datasets, respectively. It is also shown that this algorithm outperforms other existing channel selection algorithms based on Fisher criterion, mutual information, SVM, CSP, and RCSP in classification accuracy.

Wrapper techniques
Some of the adopted channel selection techniques for motor imagery classification are categorized as wrapper techniques. Yang et al. [51] presented a subject-specific channel selection method based on criteria derived from Fisher's discriminant analysis to measure the discrimination power of time-domain parameter (TDP) features extracted from different channels and different time segments for classification of two motor imagery tasks, right hand and right foot. This method adopts a wrapper approach with pre-specified subset channel selection depending on experience. The authors utilized the dataset IVa from BCI competition III [52], which consists of EEG recordings from five subjects using 118 electrodes. The subjects performed 280 trials of cue-driven motor imagery (right hand, 140 trials; right foot, 140 trials) and each trial lasted for 3.5 s. This method reduced the number of channels from 118 to no more than 11 channels without a significant decrease in the accuracy rate (78 % mean accuracy rate).
Wei and Wang [53] presented a method for channel selection during the classification of motor imagery of left hand, right hand, and foot based on a binary multiobjective particle swarm optimization algorithm. This method adopted a wrapper approach with a random search strategy for subset channel selection. It extended the particle swarm optimization algorithm shown in Fig. 11 to handle two objectives: minimizing the number of selected channels and maximizing the sum of three mutual information metrics. The classification accuracy rate was calculated with three different classifiers: KNN, SVM, and back-propagation (BP) network. This method utilized EEG recordings from five healthy subjects from 22 channels with 256 Hz sampling rate. The experiment consisted of six runs separated by 5 min breaks, and each run included 60 trials (120 trials for each class). The results showed that the highest accuracy rate was around 91 % with nine channels in subject 2, while the accuracy rate with all channels was around 92 %. Similarly, the dataset of subject 3 showed the lowest accuracy of around 75 % with 14 channels and around 76 % with all channels.
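With two competing objectives (fewer channels, higher mutual information), candidate subsets are usually compared through Pareto dominance rather than a single score. The sketch below shows only that comparison step, not the particle swarm update itself; the pair representation `(n_channels, mutual_info)`, the function names, and the toy values are assumptions made for illustration and are not taken from [53].

```python
def dominates(p, q):
    """p and q are (n_channels, mutual_info) pairs.
    p dominates q if it is no worse in both objectives and better in one."""
    no_worse = p[0] <= q[0] and p[1] >= q[1]
    strictly_better = p[0] < q[0] or p[1] > q[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        p for p in candidates
        if not any(dominates(q, p) for q in candidates)
    ]


# Toy candidates: (number of channels, summed mutual information), made-up values.
toy = [(3, 0.8), (5, 0.7), (8, 1.0), (3, 0.4)]
print(pareto_front(toy))  # [(3, 0.8), (8, 1.0)]
```

The surviving candidates represent different trade-offs between montage size and information content, and a classifier-based evaluation can then choose among them.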
Zhou and Yedida [54] presented a method for the reduction of the number of channels for the task of classifying mental states for shoulder and elbow movement intentions for healthy subjects and stroke patients. Their method is based on combining the support vector channel selection with a weighted time-frequency synthesis classification algorithm [54]. It is classified as a wrapper method with a sequential search strategy for subset channel selection. The authors evaluated their method using EEG recordings from two able-bodied (healthy) subjects and one stroke subject. A total of 131 channels were used, and the sampling rate was 256 Hz. This method was able to achieve higher than 90 % classification accuracy rate for the healthy subjects when the total number of channels N was 20 < N < 131 and 50 < N < 131, for the first and second subjects, respectively. On the other hand, the classification accuracy rate was below 85 % for the subject with stroke.

Fig. 11 General particle swarm optimization flowchart [79]
Kamrunnahar et al. [55] presented a systematic optimization algorithm for the optimization of the number and locations of electrodes in BCI systems adopting a wrapper approach with a complete search strategy for subset channel selection. Human scalp EEG data were recorded in response to cue-based motor imagery tasks. A total of 19 channels were used for recording the data with a sampling rate of 256 Hz, and the data were passed through a band-pass Butterworth filter with cutoff frequencies of 0.5 and 60 Hz. To increase the spatial resolution of the recorded data and decrease its dependence on the reference location, the authors used two techniques: Laplacian derivation [56][57][58][59] and common average reference (CAR) [55]. They used a model-based autoregressive technique to extract the features. For selecting the optimum number of channels, all possible combinations of channels were evaluated. Task discrimination errors were calculated using linear discriminant analysis (LDA) [44] for each combination. The channel combination with the lowest discrimination error was selected as the optimal selection for a specific subject. The average classification errors for subject one were approximately 21.75 with four channels and approximately 28.28 with three channels for tasks one and two, respectively. The performance of this algorithm was evaluated by comparison with another feature selection algorithm, namely, forward stepwise feature selection [60,61].
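A complete search of this kind can be sketched in a few lines. The example assumes a caller-supplied `error_fn` that returns the discrimination error of a channel subset (in [55] this role is played by LDA on autoregressive features); `best_subset` and the toy error function are hypothetical names introduced here only for illustration.

```python
from itertools import combinations
from math import comb


def best_subset(channels, subset_size, error_fn):
    """Evaluate every subset of the given size; return the lowest-error one."""
    return min(combinations(channels, subset_size), key=error_fn)


# Toy error function standing in for a classifier-based error estimate.
toy_error = lambda subset: abs(sum(subset) - 7)
print(best_subset(range(5), 2, toy_error))  # (3, 4)

# Size of the search space for four channels out of a 19-channel montage.
print(comb(19, 4))  # 3876
```

The number of subsets grows combinatorially with the montage size, which is why complete search is practical for 19 channels but not for montages with more than a hundred electrodes.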
Yang et al. [62] presented an artificial neural network and genetic algorithm approach for channel selection and classification of EEG signals in BCI systems adopting a wrapper approach with a random search strategy for subset channel selection. Conventional artificial neural network (ANN)-based approaches lack explicit input optimization, and their learning results are not easily understood. Therefore, the authors proposed a genetic neural mathematic method (GNMM) for EEG channel selection and classification problems to address these issues. The GNMM consists of three steps [63,64]: channel selection based on the GA, pattern classification using a multi-layer perceptron (MLP), and rule extraction based on mathematical programming. The channel appearance percentage was used in the GA to optimize the input channel selection. After channel selection, the MLP was used for pattern classification, and finally, regression rules were extracted so that training results can be easily implemented. This technique was evaluated on two datasets. The first dataset contains ECoG signals recorded using an 8 × 8 electrode grid in touch with the brain at a sampling rate of 1000 Hz, and the subject had to imagine movements of either the little finger or the tongue. In the second dataset, 32 channels were used for recording EEG signals with a sampling rate of 256 Hz, where the participants had to execute a left-hand or right-hand button press. Using the GNMM, 10 channels were selected in the first dataset, which achieved a classification accuracy of about 80 %. For the second dataset, six channels were selected, with which a classification accuracy of 86 % was achieved.

Embedded techniques
Lal et al. [40] adopted feature selection algorithms, recursive feature elimination (RFE), and zero-norm optimization based on the training of SVMs for channel selection and demonstrated the usefulness of these operations on motor imagery classification. Their work adopted an embedded approach with a sequential search strategy for subset channel selection. The authors evaluated their method utilizing 39 EEG channel recordings from five subjects (A, B, C, D, and E). A band-pass filter with cutoff frequencies 0.1 and 40 Hz was applied, and the sampling rate was 256 Hz. With every subject, they recorded 400 trials, and each trial lasted for 9 s tasking every subject to imagine left versus right hand movements during each trial. They found that the RFE and zero-norm optimization are capable of reducing the number of channels without increasing the error. The average error rate for 17 channels (located over or close to the motor cortex) over the five subjects using RFE was reported to be 23 %, while the average error rate using 12 channels was 24 %.
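The elimination loop behind RFE can be sketched independently of the classifier. In the sketch below, `weight_fn` is a hypothetical callback that retrains a model on the remaining channels and returns one importance value per channel (for a linear SVM this would typically be derived from the weight vector); the toy importances are made up and the function is not the implementation of [40].

```python
def recursive_elimination(channels, n_keep, weight_fn):
    """Repeatedly drop the channel with the smallest importance.
    weight_fn(remaining) must return {channel: importance}."""
    remaining = list(channels)
    eliminated = []
    while len(remaining) > n_keep:
        weights = weight_fn(remaining)          # retrain on current channels
        weakest = min(remaining, key=lambda ch: weights[ch])
        remaining.remove(weakest)
        eliminated.append(weakest)              # elimination order = inverse ranking
    return remaining, eliminated


# Toy importances standing in for retrained classifier weights.
fixed = {"C3": 0.9, "C4": 0.8, "Cz": 0.5, "Fp1": 0.1, "O1": 0.2}
toy_weights = lambda remaining: {ch: fixed[ch] for ch in remaining}
print(recursive_elimination(fixed, 3, toy_weights))
# (['C3', 'C4', 'Cz'], ['Fp1', 'O1'])
```

Because the model is retrained after every removal, the importance of the surviving channels can change from one iteration to the next, which is what distinguishes this embedded scheme from a one-shot ranking.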
Schroder et al. [65] presented a robust EEG channel selection algorithm across subjects in BCI systems adopting an embedded approach with a sequential search strategy for subset channel selection. They investigated whether channels selected for one subject are useful to the others as well. Data were recorded from eight male subjects using 17 EEG channels with a sampling rate of 256 Hz. For each subject, Welch's method [66] was used to extract the features, which were then fed to linear SVMs for classification. The authors used a recursive channel elimination (RCE) method for the cross-subject channel selection. Using the RCE, the importance of a channel is determined by its influence on the margin of a trained SVM. After applying the RCE on the datasets from different subjects, it was observed that it can not only be used successfully for channel selection in individual subjects but is also helpful in channel selection across different subjects with low error rates. The average error rate is 26.9 % with more than 32 channels.

Hybrid techniques
Li et al. [67] proposed a method for selecting suitable channels for classification of two motor imagery tasks: right hand and right foot based on a common spatial pattern algorithm as shown in Fig. 12. This method is based on a wrapper approach with a complete search strategy for subset channel selection. The l1 norms of common spatial pattern features are used to compute the contribution scores D_i^foot and D_i^hand for the ith channel. The channels with larger values of D_i^foot and D_i^hand are used to obtain channel rankings A and B, where A is the channel ranking under the right-hand motor imagery task and B is the channel ranking under the foot imagery task. The first m ≤ 59 channels are selected from A and removed from B to obtain channel ranking C. After that, the first n (n ≤ 118 − m) channels of C are selected. The channel combination is obtained by combining both the m and the n selected channels. Finally, the optimal combination of channels is selected by comparing the classification accuracy rates using an SVM with all combinations. The authors evaluated their method on the datasets of two subjects, "aa" and "al", from the dataset IVa from BCI competition III using 118 electrodes [52].
For the dataset "aa," the highest classification accuracy rate was 92.34 % using seven channels, while the classification accuracy rate with all channels was 90.54 %. Similarly, for the dataset "al," the highest classification rate was 94.63 % using eight channels, while it was 90.82 % with all channels.
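The way the two rankings are merged into one channel combination can be made concrete with a small sketch. It assumes the rankings A and B are already available as lists of channel indices; the function name `combine_rankings` and the toy rankings are hypothetical, and the subsequent SVM-based comparison over all (m, n) pairs is left out.

```python
def combine_rankings(rank_a, rank_b, m, n):
    """Take the first m channels of ranking A, remove them from ranking B
    to obtain ranking C, then append the first n channels of C."""
    from_a = rank_a[:m]
    rank_c = [ch for ch in rank_b if ch not in from_a]
    return from_a + rank_c[:n]


# Toy rankings over a small montage (made-up channel indices).
rank_a = [5, 2, 9, 1]
rank_b = [2, 7, 5, 3]
print(combine_rankings(rank_a, rank_b, 2, 2))  # [5, 2, 7, 3]
```

Removing the channels already taken from A before reading off C guarantees that the combined set contains m + n distinct channels whenever C is long enough.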

Channel selection for emotion classification
Human emotions are thought to be discrete in nature with distinguishable EEG signals. The process of emotion classification based on EEG signals may require some sort of channel selection to save computation time. In addition, there is a certain area in the brain that is concerned with emotions, which makes channels from other areas unrelated to emotion classification. Channel selection approaches adopted for emotion classification can be categorized into filtering and wrapper techniques.
Rizon et al. [68] proposed an asymmetric ratio (AR) (asymmetric variance ratio and amplitude asymmetric ratio) based channel selection method for human emotion recognition from EEG signals as illustrated in Figs. 13 and 14 and Table 2. The ratio of variances between hemisphere channels was used as an indicator for assessing the region of the brain and the channels associated with emotion detection. The spectral power ratios between hemisphere channels are used to precisely estimate the electrical activity. The asymmetric variance ratio (AVR) is defined as [68]:

$$\mathrm{AVR} = \frac{V(i) - V(j)}{V(i) + V(j)} \tag{6}$$

where V(i) is the variance of the left hemisphere channel, V(j) is the variance of the right hemisphere channel, i = 0, 1, 2, …, N, j = 0, 1, 2, …, N, and N is the number of homogeneously distributed electrodes on the left and right hemispheres. The amplitude asymmetric ratio (AAR) is given by [68]:

$$\mathrm{AAR} = \frac{P(i) - P(j)}{P(i) + P(j)} \tag{7}$$

where P(i) is the spectral power of the left hemisphere channel, P(j) is the spectral power of the right hemisphere channel, i = 0, 1, 2, …, N, j = 0, 1, 2, …, N, and N is the number of electrodes on the left and right hemispheres. The method of Rizon et al. is a filtering approach with a pre-specified subset of channels selected by a human expert. It was evaluated using 63-channel EEG recordings (28 pairs, seven center electrodes) from five healthy subjects with a 256 Hz sampling rate and a band-pass filter between 0.05 and 70 Hz with five different classes of emotions (disgust, happy, surprise, sad, and fear). In this method, features are extracted from the wavelet domain using the Daubechies 4 (db4) wavelet transform. The results show that this method reduced the 28 pairs of channels to 2. For validating the method, the authors employed a fuzzy C-means clustering algorithm to classify the emotions [68]. Its results support their findings.

Fig. 12 Flowchart of Li et al. algorithm [67]
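As an illustration of how such asymmetry ratios are computed for one left/right channel pair, the following sketch assumes the usual normalized-difference form, (left − right) / (left + right), applied to the variances V(i), V(j) for the AVR and to the spectral powers P(i), P(j) for the AAR. The function names and the toy signals are hypothetical, and the spectral powers are assumed to be computed elsewhere.

```python
from statistics import pvariance


def asymmetry_ratio(left, right):
    """Normalized left-right difference, bounded between -1 and 1
    for non-negative inputs."""
    return (left - right) / (left + right)


def avr(left_signal, right_signal):
    """Asymmetric variance ratio of one left/right channel pair."""
    return asymmetry_ratio(pvariance(left_signal), pvariance(right_signal))


def aar(left_power, right_power):
    """Amplitude asymmetric ratio from precomputed spectral powers."""
    return asymmetry_ratio(left_power, right_power)


# Toy signals (made-up samples): variances are 1.0 and 0.25.
print(avr([0.0, 2.0, 0.0, 2.0], [0.0, 1.0, 0.0, 1.0]))  # approximately 0.6
print(aar(3.0, 1.0))  # 0.5
```

A positive value indicates stronger activity on the left channel of the pair, and a negative value indicates stronger activity on the right one. For signals of equal length, the choice between population and sample variance does not change the ratio.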
Jatupaiboon et al. [69] proposed a method to classify two emotions based on EEG signals, namely positive and negative emotions elicited by pictures. They extracted the power spectrum from five bands and used an SVM as a classifier in a wrapper channel selection evaluation approach. In their experiment, they used a manual approach for reducing the number of channels. They utilized EEG recordings from 11 participants, who were shown 100 pictures (50 positive and 50 negative) from the Geneva Affective Picture Database (GAPED) [70]. The authors used an Emotiv headset with 14 channels [71] for recording with 4-s epochs and 50 % overlap. They achieved an 85.41 % accuracy rate with seven pairs (14 channels: full) and an 84.18 % accuracy rate with five pairs. The authors also found that frontal pairs of channels and high-frequency bands give higher accuracy than other pairs of channels and lower frequency bands.

Channel selection for mental task classification
Mental task classification is a new and challenging trend in EEG signal processing. The main objective of this classification process is enabling a patient to communicate with the outside world without physical movement. This classification process may require channel selection as a pre-processing step to reduce the computation time.

Filtering techniques
Lan et al. [72] presented an ambulatory cognitive state classification system to assess the user's mental load based on EEG measurements. Their work focused particularly on dimensionality reduction (channel selection and feature projection) utilizing mutual information techniques as shown in Fig. 15. This work is based on a filtering approach with a sequential search strategy for subset channel selection. In order to select an effective subset of the available channels after pre-processing by artifact removal and band-pass filtering, the authors used a forward incremental method. Three classifiers, Gaussian mixture model (GMM), K-nearest neighbor (KNN), and Parzen, were used to classify the feature vector based on majority voting in a fusion filtering process. The authors used 32 channels of EEG recordings from three subjects performing two mental tasks (n-back, Larson) at two difficulty levels (low, high) with 256 Hz sampling rate to evaluate the method. The n-back task is a continuous performance task used to measure a part of working memory. The Larson task requires the subjects to maintain a mental count according to the presented configuration of images on the monitor. For performance evaluation, the data was divided into five sets; each set in turn was held out for testing while the other four were used for training. The average classification accuracy was around 80 % for all subjects.
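A forward incremental search of this type can be sketched as a greedy loop. The example assumes a caller-supplied `score_fn` that rates a candidate channel subset (in [72] this role is played by a mutual information estimate); the function name, the toy relevance values, and the redundancy penalty are all made up for illustration.

```python
def forward_select(channels, n_select, score_fn):
    """Greedy forward search: at each step add the channel that
    gives the highest score together with the channels chosen so far."""
    selected = []
    candidates = list(channels)
    while candidates and len(selected) < n_select:
        best = max(candidates, key=lambda ch: score_fn(selected + [ch]))
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy score: individual relevance minus a penalty when the two
# redundant channels 0 and 1 are selected together (made-up values).
relevance = {0: 0.9, 1: 0.85, 2: 0.3}


def toy_score(subset):
    penalty = 0.8 if {0, 1} <= set(subset) else 0.0
    return sum(relevance[ch] for ch in subset) - penalty


print(forward_select([0, 1, 2], 2, toy_score))  # [0, 2]
```

Channel 1 is individually more relevant than channel 2, but it is skipped in this toy case because it adds little once channel 0 is present. This is the behavior that distinguishes a subset-level criterion from ranking channels one at a time.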

Wrapper techniques
Chai et al. [73] presented a method for EEG mental task classification using a genetic algorithm-based neural network classifier. The method of Chai et al. is a wrapper-based method with a pre-specified subset of channels. They used six non-imagery tasks: arithmetic (math) by imagining and solving a simple multiplication, letter composing by mentally composing simple words, Rubik's cube rolling by imagining a Rubik's cube rolled forward, visual counting by mentally counting numbers from one to nine, ringtone by imagining a familiar mobile ringtone, and spatial navigation by moving around and scanning the surroundings in a familiar location in mind. Two methods of feature extraction were used and compared: power spectral density (PSD) and Hilbert-Huang transform (HHT). For recording, they used a monopolar EEG system from Compumedic company with 256 Hz sampling rate. Five participants were involved in this experiment with 10 recording sessions for each mental task. The accuracy rate for classifying three mental tasks using the original eight channels is between 76 and 85 % using the PSD feature extractor. In the case of two channels, the accuracy rate was between 65 and 79 % with the PSD feature extractor and between 70 and 84 % with the HHT feature extractor.
Tavakolian et al. [74] presented a channel reduction method for classifying three mental tasks (baseline, multiplication, and geometric figure rotation) based on genetic algorithms for subset generation as shown in Fig. 16. This method is based on the wrapper approach with random search strategy for subset channel selection. They used a feed-forward neural network as a classifier, and its outputs were averaged and considered as the performance function of the genetic algorithm. The genetic algorithm was used to find the best six channel combinations of 19 channels. The method was evaluated using 19 channel EEG recordings from five subjects with a 250 Hz sampling rate. In each session, every task was repeated two times and each time lasted for 10 s. The results showed that the classification accuracy rates were 100, 99.6, 96.66, and 88 % for subjects 1, 3, 5, 2, and 4, respectively.
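A genetic search over fixed-size channel subsets can be sketched as follows. This is a generic illustration rather than the algorithm of [74]: the population size, the crossover and mutation operators, the mutation rate, and the toy fitness function are all assumptions, and in a wrapper setting `fitness_fn` would be replaced by the averaged classifier output.

```python
import random


def ga_select(n_channels, subset_size, fitness_fn,
              generations=30, pop_size=20, seed=0):
    """Search for a high-fitness subset of `subset_size` distinct channels."""
    rng = random.Random(seed)
    # Each individual is a sorted tuple of distinct channel indices.
    population = [
        tuple(sorted(rng.sample(range(n_channels), subset_size)))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness_fn, reverse=True)
        parents = population[:pop_size // 2]       # fitter half survives
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            pool = sorted(set(mother) | set(father))
            child = set(rng.sample(pool, subset_size))   # crossover
            if rng.random() < 0.2:                       # mutation
                child.discard(rng.choice(sorted(child)))
                unused = [ch for ch in range(n_channels) if ch not in child]
                child.add(rng.choice(unused))
            children.append(tuple(sorted(child)))
        population = parents + children
    return max(population, key=fitness_fn)


# Toy fitness: number of selected channels that fall in a made-up target set.
target = {0, 1, 2, 3, 4, 5}
best = ga_select(19, 6, lambda subset: len(target & set(subset)))
print(best)
```

Keeping the fitter half of each generation unchanged means that the best fitness found so far never decreases, at the cost of slower exploration than schemes with a higher replacement rate.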

Channel selection for sleep state classification
Sleep state classification is very important for infants as well as adults. This classification process can be performed with EEG signals. This field has been the subject of interest of several neuroscience researchers. It also requires some sort of channel selection to obtain robust classification results because at least one EEG channel combined with one electromyography (EMG) and one electrooculography (EOG) channel are required for manual scoring. Piryatinska et al. [75] presented a channel selection method for neonate EEG sleep state classification using a multivariate analysis approach adopting filtering with a complete search strategy for subset channel selection. It has two main stages: scoring of sleep stages based on each combination of EEG channels and selection of the optimal channel combination. The latter consists of three steps: producing two rankings of the channel combinations, one for the full-term and one for the preterm neonates; selecting the channels that appear most often in the top channel combinations; and validating these selections with a cross-validation methodology. They used EEG sleep signals from 36 neonates (20 full term and 16 preterm) recorded from 14 channels at a sampling rate of 64 Hz. This method achieved a mean agreement percentage (MAP) with the physician's scores of 87.20 % with five channels and 87.41 % with four channels for full-term and preterm neonates, respectively.

Table 2 Rizon's channel selection algorithm for emotion classification [68]
1. The raw EEG signals from five subjects over five discrete emotions are collected using 63 electrodes, which are placed on the scalp according to the standard International 10/20 system.
2. The signals are pre-processed by mean removal and variance normalization.
3. The signals are filtered using a 5th-order band-pass filter with cutoff frequencies of 0.05 and 45 Hz.
4. The signals are divided into five different EEG frequency bands using a 5th-order Butterworth filter.
5. The alpha band is used for channel selection.
6. From the 63-channel EEG signals, only 28 homogeneous channel pairs are separated out for calculating the asymmetric ratio for channel selection.
7. For each subject, the values of AVR and AAR are calculated for all the 28 homogeneous pairs of electrodes using equations (6) and (7).
8. If all values of AR are positive or negative for five emotions, this leads to a rank of 5; for four emotions, this leads to a rank of 4, etc.
9. The channel pairs with higher ranks of AVR and AAR are sorted as the dominant channels for emotion recognition.
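Steps 8 and 9 of Table 2 can be read as a sign-consistency ranking, sketched below. The interpretation that the rank of a channel pair equals the largest number of emotions whose asymmetric ratio shares the same sign is an assumption made here; the function names, pair labels, and ratio values are hypothetical.

```python
def sign_rank(ratios):
    """Rank of one channel pair: the largest number of emotions
    whose asymmetric ratio has the same sign."""
    positive = sum(1 for r in ratios if r > 0)
    negative = sum(1 for r in ratios if r < 0)
    return max(positive, negative)


def dominant_pairs(ratios_by_pair):
    """Sort channel pairs by descending sign-consistency rank."""
    return sorted(ratios_by_pair, key=lambda pair: -sign_rank(ratios_by_pair[pair]))


# Toy asymmetric ratios for five emotions per channel pair (made-up values).
toy = {
    "pair_1": [0.2, -0.1, 0.3, -0.4, 0.05],
    "pair_2": [0.2, 0.1, 0.3, 0.4, 0.05],
    "pair_3": [-0.2, -0.1, 0.3, -0.4, -0.05],
}
print(dominant_pairs(toy))  # ['pair_2', 'pair_3', 'pair_1']
```

Under this reading, a pair whose hemispheric asymmetry points the same way for every emotion receives the maximum rank of 5 and is treated as dominant.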

Channel selection for drug effect classification
Ong et al. [76] presented a channel selection algorithm for visual evoked potentials (VEP) based on principal component analysis (PCA). The VEP is a small electrical potential originating from the brain in response to a visual stimulus. This algorithm was used to classify alcoholic and non-alcoholic subjects. PCA transforms the dataset into a new set of variables called principal components, which are ranked from the highest to the lowest variance. The first few principal components usually contain most of the variation present in the original dataset. The performance of this algorithm was evaluated on a VEP dataset recorded from 20 subjects: 10 alcoholic and 10 non-alcoholic. A total of 61 channels (variables) were used, with the signal sampled at 256 Hz. The authors selected 16 optimal channels, since they contributed 98.563 % of the total variance. Gamma band powers of the selected channels were used as input features to an MLP neural network, which classified the alcoholic and non-alcoholic subjects. It was concluded that the classification performance using all the channels was 95.83 % and using 16 channels was 94.06 %, which are very close.
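The variance criterion used to decide how many components (or channels) to retain can be illustrated with a cumulative-variance count. The sketch assumes that the eigenvalues of the data covariance matrix have already been computed; it shows only the thresholding step, not the mapping from components back to channels used in [76], and the function name and toy eigenvalues are hypothetical.

```python
def components_for_variance(eigenvalues, target):
    """Smallest number of leading components whose share of the
    total variance reaches `target` (a fraction between 0 and 1)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count
    return len(ordered)


# Toy eigenvalues (made-up): shares are 50 %, 30 %, 10 %, 10 %.
print(components_for_variance([5.0, 3.0, 1.0, 1.0], 0.75))  # 2
print(components_for_variance([5.0, 3.0, 1.0, 1.0], 0.85))  # 3
```

Raising the target fraction retains more components, so the threshold directly controls the trade-off between montage size and retained variance.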

Conclusion and future research directions
This paper explored EEG channel selection techniques for different applications, taking into consideration the different criteria developed in the literature for channel selection evaluation and search strategy. The paper introduced the basic notations and procedures of the channel selection process and described channel selection approaches for a variety of applications. The study has revealed that it is possible, without much loss in the performance of the classification/detection tasks, to make use of a small set of EEG channels ranging from 10 to 30 % of the available channels. This in turn reduces the processing complexity and the setup time, and improves the subject's convenience by requiring fewer electrodes. In some applications, such as sleep state classification, there are dominant channels responsible for the activity of concern, and these need to be determined. In other applications, such as seizure detection and prediction, the use of all channels may lead to overfitting during the classification process. Table 3 summarizes the channel selection techniques surveyed in Sections 3-8 for each application. These techniques have been tested using different databases. Therefore, an extensive study is needed to determine the channel selection technique that gives the highest performance score when all techniques belonging to a specific application are applied to a unified database. Another important issue is to investigate channel selection techniques for emerging applications based on visual- and auditory-evoked potentials [77]. Finally, it is observed that channel selection algorithms are in general based on features extracted from the EEG signals. Finding features that represent all EEG signal states well is still a challenging task that needs further research.
It is observed from this study that channel selection has been investigated intensively in motor imagery classification with a variety of techniques, so extending these techniques to other applications would be useful. For wrapper, hybrid, and embedded channel selection techniques, the performance sensitivity should be studied with different types of classifiers. Even with channel selection, processing may still be carried out on a multi-channel basis, so the development of a framework containing channel selection and decision fusion is an open area for further investigation.

Fig. 16 Tavakolian et al. method [74]