Brain Topogr DOI 10.1007/s10548-015-0462-2
BRIEF COMMUNICATION
A Novel Approach Based on Data Redundancy for Feature Extraction of EEG Signals
Hafeez Ullah Amin¹ • Aamir Saeed Malik¹ • Nidal Kamel¹ • Muhammad Hussain²
Received: 27 March 2015 / Accepted: 7 November 2015
© Springer Science+Business Media New York 2015
Abstract Feature extraction and classification of electroencephalogram (EEG) signals in medical applications is a challenging task. EEG signals contain a large amount of redundant data, i.e., repeating information, and this redundancy creates potential hurdles in EEG analysis. We therefore propose to use the redundant information in EEG as a feature to discriminate and classify different EEG datasets. In this study, we propose a JPEG2000-based approach for computing data redundancy from multichannel EEG signals and use the redundancy as a feature for classifying EEG signals with support vector machine, multi-layer perceptron, and k-nearest neighbors classifiers. The approach is validated on three EEG datasets and achieves high classification accuracy (95–99 %). Dataset-1 includes EEG signals recorded during a fluid intelligence test, dataset-2 consists of EEG signals recorded during a memory recall test, and dataset-3 contains epileptic seizure and non-seizure EEG. The findings demonstrate that the approach can extract robust features and classify EEG signals in various applications, including both clinical and normal EEG patterns.
✉ Aamir Saeed Malik
[email protected]
Hafeez Ullah Amin
[email protected]
¹ Centre for Intelligent Signal and Imaging Research (CISIR), Department of Electrical & Electronic Engineering, Universiti Teknologi PETRONAS, 32610 Bandar Seri Iskandar, Perak, Malaysia
² Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Keywords Data redundancy · Feature extraction · Classification · EEG signal
Introduction
Electroencephalography (EEG) is a reliable tool to measure and assess neurophysiological changes related to postsynaptic activity in the neocortex (Tong et al. 2009). It enables researchers and clinicians to study brain functions such as memory, vision, intelligence, motor imagery, emotion, perception, and recognition, as well as to detect brain abnormalities such as epilepsy, stroke, dementia, sleep disorders, depression, and trauma. Existing approaches to EEG analysis include basic EEG rhythm analysis, spectral analysis, time series analysis, time–frequency analysis, and statistical analysis (mean, median, and standard deviation), among others (Tong et al. 2009). However, these approaches do not always perform well in diagnosing brain abnormalities or in detecting a desired EEG pattern. Possible reasons are that EEG is highly vulnerable to artifacts, is non-stationary, and shows high inter-individual variability. Furthermore, not all EEG features have good power to discriminate between different EEG patterns. Features with low discriminating power may increase both the computational cost and the false detection rate of the classification model. Feature standardization and feature selection techniques can play an important role here by identifying robust features to feed the classifiers. Thus, a pattern recognition approach is needed that extracts relevant features from EEG signals and efficiently classifies EEG into the corresponding classes in many applications. Feature extraction and classification are two important steps in this approach. Extracting relevant features is a particularly significant step because it directly affects classification performance: a lack of expressive features for a given problem may lead to poor classification results. Hence, extracting discriminative features from EEG data is essential for obtaining high classification accuracy.
EEG contains a lot of redundant information: particular wave segments repeat over time, especially in spontaneous EEG activity, i.e., the recording of electrical field potentials generated by the brain when no specific task is assigned. Processing all of the data, including the redundant information, wastes time and resources and may reduce both the reliability of diagnosing brain disorders and the accuracy of classifying different EEG brain patterns. Several techniques have been proposed to reduce redundant information in the communication and data transmission fields, such as lossy and lossless compression (Gonzalez and Woods 2002). These techniques are used to compress large datasets for storage and remote transmission. A well-known compression standard that exploits redundancy is JPEG2000 (Gonzalez and Woods 2002). Today, this standard is successfully employed for compressing biomedical signals (Higgins et al. 2010) and images (Gonzalez and Woods 2002). JPEG2000 has been used for data reduction and compression of bio-signals, especially in telemedicine applications (e.g., video monitoring of patients), where data must be transmitted over networks, and in long recordings (e.g., epilepsy monitoring, sleep recording), where efficient data storage is needed. Large data records can be reduced to compact data that still retain the significant information needed for medical decision making. However, in these applications JPEG2000 has been used only for compression and storage.
In this paper, we propose a JPEG2000-based feature extraction approach for EEG pattern classification. The EEG features in this approach reflect the redundant information present in the EEG signals, and this redundancy varies across electrodes over the scalp. Fig. 1 compares the percentage of redundant information in a simple periodic sine wave and in a real EEG channel. The periodic sine wave (3–6 Hz) in Fig. 1a has a redundancy of R = 72 %, which is expected given its periodicity. Similarly, Fig. 1b shows a recorded EEG signal (3–6 Hz) that looks periodic to a certain extent; its computed redundancy (R) is 52 %. The redundancy depends on the regularity of the EEG signal and varies with the experimental task and with channel location, i.e., with the brain region that a channel represents. For example, EEG at the occipital region is expected to be more regular in the eyes-closed condition than in the eyes-open condition. Therefore, redundancy can be used as a feature to discriminate EEG signals recorded in different brain states or in brain disorders such as epileptic seizures. The proposed approach is validated on three EEG datasets that include both normal and clinical EEG data. Its results are compared with existing time series analysis (entropy features) and EEG spectral analysis (power features), and the proposed approach achieved higher classification accuracy than these existing feature extraction techniques. The main contribution of the present study is the combination of the wavelet transform with arithmetic coding to extract features (data redundancy) that are used to train classifiers for EEG signal classification. The purpose of this research is to propose a robust EEG feature extraction scheme using JPEG2000, based on the redundant data present in EEG recordings. The proposed scheme can be used to develop an efficient automatic EEG pattern detection system for clinical applications and/or research purposes.
The rest of the paper is organized as follows. Sect. 2 reviews existing methods for EEG feature extraction and classification; Sect. 3 describes the proposed feature extraction algorithm, gives an overview of the classification techniques used to evaluate it, and describes the datasets in detail; Sect. 4 presents the results and discussion, including a comparison with state-of-the-art methods; and finally, Sect. 5 gives concluding remarks.
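As a toy illustration of the claim that more regular signals carry more redundancy, the sketch below computes R = (1 − 1/CR) × 100 (the redundancy measure defined in Eq. (5) of the Materials and Methods section) for a periodic sine wave and for random noise. It is a sketch under stated assumptions, not the proposed method: it uses Python's general-purpose `zlib` (DEFLATE) compressor as a stand-in for the JPEG2000-style wavelet and arithmetic coding pipeline, it quantizes samples to 16-bit integers, and the function name `redundancy_percent` is hypothetical. Its numbers will therefore not reproduce the 72 % and 52 % values of Fig. 1.

```python
import math
import random
import zlib


def redundancy_percent(samples, scale=100.0):
    """Redundancy R = (1 - 1/CR) * 100, where CR = original size / compressed size.

    Assumptions (hypothetical helper): samples are quantized to signed 16-bit
    integers after multiplying by `scale`, and zlib (DEFLATE) stands in for the
    wavelet + arithmetic coding pipeline of the proposed method.
    """
    raw = b"".join(
        int(round(s * scale)).to_bytes(2, "little", signed=True) for s in samples
    )
    cr = len(raw) / len(zlib.compress(raw, 9))
    return (1.0 - 1.0 / cr) * 100.0


fs = 250       # sampling rate (Hz), as in dataset-1
n = 10 * fs    # 10 s of data
sine = [math.sin(2 * math.pi * 5.0 * t / fs) for t in range(n)]  # periodic 5 Hz wave
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]                  # irregular signal

print(f"sine:  R = {redundancy_percent(sine):.1f} %")
print(f"noise: R = {redundancy_percent(noise):.1f} %")
```

The periodic signal repeats exactly and compresses far better than the noise, so it yields a much higher R; this is the intuition behind using redundancy as a discriminating EEG feature.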
Fig. 1 Comparison of redundancy in a periodic sine signal (72 %) and an EEG signal (52 %) in the 3–6 Hz band
Review of Existing Methods
In the literature, EEG feature extraction methods based on time series analysis, spectral analysis, and time–frequency analysis have been reported, e.g., entropy analysis, power analysis, and wavelet analysis (Richman and Moorman 2000; Vidaurre et al. 2009; Acharya et al. 2012a; Iscan et al. 2011; Sabeti et al. 2011; Fu et al. 2014; Musselman and Djurdjanovic 2012; Wang et al. 2011). In these methods, EEG features are extracted and fed to machine learning classifiers for EEG signal classification. Here, we review recent literature on EEG signal classification in various normal and clinical applications. EEG time series (entropy) analysis for feature extraction and classification has been reported for epileptic seizure detection and for classifying schizophrenic patients versus controls (Acharya et al. 2012a; Nicolaou and Georgiou 2012; Song et al. 2012; Acharya et al. 2012b; Sabeti et al. 2009). These studies used sample entropy (SamEn), approximate entropy (ApEn), and permutation entropy as features for the classifiers. EEG spectral analysis is a widely reported feature extraction method (Herman et al. 2008; Subasi and Ismail 2010; Faust et al. 2010). It involves analysis of the basic EEG rhythms (delta, theta, alpha, beta, and gamma), autoregressive moving average models, the power density spectrum, and local maxima and minima for EEG classification problems. EEG time–frequency analysis includes wavelet-based feature extraction, which has been reported mostly for clinical EEG data, e.g., epileptic seizure detection (Subasi 2007; Jahankhani et al. 2006; Parvez and Paul 2014). Acharya et al. (2012a) computed ApEn, SamEn, fractal dimension (FD), and wavelet-based features and employed various classifiers such as decision tree, support vector machine (SVM), k-nearest neighbor (k-NN), and neural network (NN) to identify epileptic seizures. The study achieved high classification accuracy (99.7 %) using time-domain and wavelet-based features, but it employed a relatively small epilepsy dataset [Bonn epilepsy database (Andrzejak et al. 2001)] compared with large datasets (CHB-MIT¹, Freiburg dataset²). Sabeti et al. (2009) used entropy- and complexity-based features such as ApEn, spectral entropy, Lempel–Ziv complexity, and FD to classify the EEG signals of schizophrenic patients versus control participants with linear discriminant analysis and AdaBoost classifiers, and reported classification accuracies of 80–90 %. Taghizadeh-Sarabi et al. (2014) classified different object categories (animals, stationary objects, buildings, etc.) from EEG signals using the wavelet transform and an SVM classifier. Three different wavelets (Db4, Haar, and Symlet2) were applied for feature extraction, and the study reported 80 % classification accuracy for the animal and stationary object categories.
¹ Large dataset: epilepsy EEG dataset collected at Children's Hospital Boston and the Massachusetts Institute of Technology. Available: http://physionet.org/pn6/chbmit/.
² Large dataset: epilepsy EEG dataset collected at the Epilepsy Center of the University Hospital of Freiburg. Available: http://epileptologiebonn.de/cms/.
Wu and Neskovic (2007) and Zarjam et al. (2012) classified working memory loads using EEG features such as entropy, mutual information, and wavelet complexity with SVM and NN classifiers. These studies reported 90–96 % accuracy in discriminating different working memory loads. However, the data used in these studies are time-locked EEG signals that are relatively short compared with spontaneous EEG. Jahidin et al. (2014) used the EEG sub-band power ratio as a feature for classifying EEG patterns corresponding to different intelligence quotient groups with an ANN classifier. The study used theta, alpha, and beta ratios and reported 88.89 % classification accuracy.
We have noted that existing feature extraction methods are either nonlinear in nature (e.g., ApEn, SamEn) or are used with non-linear classifiers such as NN and kernel-based SVM to achieve good classification accuracy; in either case, the computational cost can be high. In addition, existing methods are tailored to a single EEG application and/or were evaluated on small EEG datasets (Acharya et al. 2012a; Wu and Neskovic 2007; Zarjam et al. 2012). Furthermore, applying them to large datasets may increase the false classification rate (Yuan et al. 2012). Hence, an efficient and robust feature extraction method is needed that gives high classification accuracy both in clinical applications and in research studies that detect spontaneous EEG patterns.
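Sample entropy (SamEn), one of the time-domain features reviewed above and later used as a comparison baseline, can be sketched as follows. This is a minimal O(N²) version of the usual definition (Richman and Moorman 2000): count pairs of templates of length m whose Chebyshev distance is at most r, repeat for length m + 1, and return −ln(A/B). The defaults m = 2 and r = 0.2 × SD are common choices but are assumptions here, not values taken from the cited studies, and `sample_entropy` is a hypothetical helper.

```python
import math
import random


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy -ln(A/B) of sequence x (self-matches excluded).

    B counts template pairs of length m within tolerance r (Chebyshev distance),
    A counts pairs of length m + 1; both use the same N - m template starts.
    """
    n = len(x)
    mean = sum(x) / n
    r = r_factor * math.sqrt(sum((v - mean) ** 2 for v in x) / n)

    def matching_pairs(length):
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = matching_pairs(m)
    a = matching_pairs(m + 1)
    return math.inf if a == 0 or b == 0 else -math.log(a / b)


# A regular signal should have a lower SamEn than white noise.
rng = random.Random(1)
regular = [math.sin(2 * math.pi * t / 20) for t in range(300)]
noisy = [rng.gauss(0.0, 1.0) for _ in range(300)]
se_regular = sample_entropy(regular)
se_noisy = sample_entropy(noisy)
print(se_regular, se_noisy)
```

The low value for the sine and the high value for the noise mirror the intuition that regular EEG is more predictable; note that the pairwise loop makes this sketch slow for long recordings.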
Materials and Methods
Proposed Algorithm for Feature Extraction and Classification
JPEG2000, developed by the Joint Photographic Experts Group in 2000 (Taubman and Marcellin 2002), was designed to replace the traditional JPEG file format with advanced features such as lossless and lossy compression. The core components of JPEG2000 are the discrete wavelet transform (DWT), thresholding and quantization, and an arithmetic coder. The proposed algorithm builds on this concept and consists of three main steps (see Fig. 2).
In the first step, the discrete EEG signal $x[n] \in \mathbb{R}^{1 \times M}$ (where M is the number of sample points) is decomposed into DWT approximation and detail coefficients up to level 4 using the Daubechies wavelet db4. The db4 wavelet was chosen because its orthogonality and efficient filter implementation make it well suited for decomposing EEG signals, especially when they contain spikes, as in epileptic EEG data (Adeli et al. 2003). The DWT uses successive low-pass h(n) and high-pass g(n) filters. The high-pass filter g(n) is the discrete mother wavelet, and the low-pass filter h(n) is its mirror version (Subasi 2007). The cutoff frequency of the h(n) and g(n) filters is one-fourth of the sampling frequency of the input EEG signal. In the first level of DWT decomposition, the input signal is filtered simultaneously by h(n) and g(n), and the corresponding outputs are known as the approximation (A1) and detail (D1) coefficients, respectively. The DWT coefficients are the dot products of the original time series with the designated basis functions. The approximation coefficients $A_i$ and the detail coefficients $D_i$ at the ith level are:

$$A_i = \frac{1}{\sqrt{M}} \sum_{n} x(n)\, \phi_{j,k}(n) \qquad (1)$$

where $\phi_{j,k}(n) = 2^{j/2}\, h\!\left(2^{j} n - k\right)$ is the scaling function, and

$$D_i = \frac{1}{\sqrt{M}} \sum_{n} x(n)\, \psi_{j,k}(n) \qquad (2)$$

where $\psi_{j,k}(n) = 2^{j/2}\, g\!\left(2^{j} n - k\right)$ is the wavelet function, with $n = 0, 1, 2, \ldots, M-1$; $j = 0, 1, 2, \ldots, J-1$; $k = 0, 1, 2, \ldots, 2^{j}-1$; $J = \log_2(M)$; and M is the length of the discrete EEG signal x[n]. The DWT coefficients $D_{jk}$ (approximation $A_{i=4}$ and details $D_{i=1,2,3,4}$) are reduced by discarding the non-significant coefficients using a threshold value $\alpha$.
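The level-4 decomposition of the first step can be sketched with a periodized, orthonormal two-channel filter bank in NumPy. Assumptions, stated plainly: the sketch uses the short 4-tap Daubechies filter (D4) for brevity, whereas "db4" in the text may denote the 8-tap filter of PyWavelets naming; it normalizes the filters to be orthonormal rather than using the 1/√M factor of Eqs. (1)–(2); it extends the signal periodically at the borders; and all function names (`wavedec`, `waverec`, `subband_ranges`) are hypothetical helpers, not a library API.

```python
import numpy as np

# Orthonormal 4-tap Daubechies low-pass (scaling) filter h(n).
# Assumption: the short D4 filter is used for brevity; the "db4" of the text
# may denote the 8-tap filter (PyWavelets naming), which is not reproduced here.
_S3 = np.sqrt(3.0)
H = np.array([1 + _S3, 3 + _S3, 3 - _S3, 1 - _S3]) / (4 * np.sqrt(2.0))
# High-pass (wavelet) filter g(n) = (-1)^n h(3 - n): the quadrature mirror of h(n).
G = H[::-1] * np.array([1.0, -1.0, 1.0, -1.0])


def _windows(n):
    """Row k holds sample indices 2k..2k+3 (mod n): periodic signal extension."""
    return (np.arange(0, n, 2)[:, None] + np.arange(4)[None, :]) % n


def dwt_level(x):
    """One analysis level: filter with h and g, then keep every second output."""
    idx = _windows(len(x))
    return x[idx] @ H, x[idx] @ G


def idwt_level(a, d):
    """Exact inverse of dwt_level (the periodized filter bank is orthogonal)."""
    n = 2 * len(a)
    x = np.zeros(n)
    np.add.at(x, _windows(n), a[:, None] * H + d[:, None] * G)
    return x


def wavedec(x, level=4):
    """Return [A_level, D_level, ..., D_1].

    len(x) must be divisible by 2**level, with at least 4 samples left
    before the last level.
    """
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(level):
        a, d = dwt_level(a)
        details.append(d)
    return [a] + details[::-1]


def waverec(coeffs):
    """Reconstruct the signal from [A_level, D_level, ..., D_1]."""
    a = coeffs[0]
    for d in coeffs[1:]:
        a = idwt_level(a, d)
    return a


def subband_ranges(fs, level=4):
    """Nominal frequency range (Hz) covered by each coefficient set."""
    bands = {f"D{i}": (fs / 2 ** (i + 1), fs / 2 ** i) for i in range(1, level + 1)}
    bands[f"A{level}"] = (0.0, fs / 2 ** (level + 1))
    return bands


x = np.random.default_rng(0).standard_normal(256)
coeffs = wavedec(x, level=4)
print([len(c) for c in coeffs])         # [16, 16, 32, 64, 128]
print(np.allclose(waverec(coeffs), x))  # True
print(subband_ranges(250))              # fs = 250 Hz, as in dataset-1
```

Because this periodized transform is orthogonal, the energy of the coefficients equals the energy of the signal, which is convenient when the energy criterion of the thresholding step is evaluated on the coefficients. At fs = 250 Hz, level 4 gives nominal bands of about 15.6–31.25 Hz (D3), 7.8–15.6 Hz (D4), and 0–7.8 Hz (A4).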
Fig. 2 Block diagram of proposed algorithm for feature extraction and classification of EEG signals (EEG signal → discrete wavelet decomposition → DWT coefficients → thresholding and rounding off → reduced DWT coefficients → arithmetic encoding → bit stream → compute data redundancy → redundancy features → normalization and selection → optimum features → classifier (SVM) → EEG datasets classification)
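$$\hat{D}_{jk} = \begin{cases} D_{jk}, & |D_{jk}| \ge \alpha \\ 0, & |D_{jk}| < \alpha \end{cases} \qquad (3)$$

The threshold value α is chosen such that the reconstructed signal retains more than 99 % of the energy:

$$\mathrm{Energy}(E) = 100\, \frac{\lVert X_r \rVert_2^2}{\lVert X \rVert_2^2} > 99\,\% \qquad (4)$$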
where $X_r$ is the reconstructed signal and X is the original signal. This ensures the quality of the signal after the non-significant coefficients are eliminated. The thresholded DWT coefficients $\hat{D}_{jk}$ are then rounded to the nearest integer, denoted $\bar{D}_{jk}$.
In the second step, the rounded DWT coefficients $\bar{D}_{jk}$ are encoded into a bit stream using arithmetic coding. In arithmetic coding (Gonzalez and Woods 2002), the whole sequence of source symbols is assigned a single arithmetic code word, which defines an interval of real numbers between 0 and 1. As the number of symbols in the source message increases, the interval becomes smaller and the number of bits required to represent it becomes larger. Arithmetic coding does not require each source symbol to be represented by an integral number of code symbols; instead, each symbol narrows the interval according to its probability of occurrence. As a result, the encoded DWT coefficients occupy fewer bits than the original signal, i.e., the signal is compressed. The redundancy feature is then computed from the output bit stream of the arithmetic coder as:

$$R = \left(1 - \frac{1}{CR}\right) \times 100 \qquad (5)$$

where $CR = \dfrac{\text{Size of original signal } (X)}{\text{Size of compressed signal } (X_c)}$.
Finally, in the third step, the extracted redundancy features are standardized and reduced to an optimum number of features using Fisher's discriminant ratio (FDR). The features are standardized as:

$$\hat{x}_i = \frac{x_i - \bar{x}}{\sigma} \qquad (6)$$

where $i = 1, 2, \ldots, N$; N is the number of instances of a specific feature x; σ and $\bar{x}$ are the standard deviation and mean of $x_i$; and $\hat{x}_i$ is the normalized feature value. Fisher's discriminant ratio is:

$$FDR = \frac{(m_1 - m_2)^2}{\sigma_1^2 + \sigma_2^2} \qquad (7)$$
where $m_1$ and $m_2$ are the mean values and $\sigma_1^2$ and $\sigma_2^2$ are the respective variances of a feature $x_i$ in the two classes. The FDR ranks all features according to their discrimination power, and the features ranked above the median FDR value are selected as the optimum features. These optimum redundancy (R) features are then used as input to the classifiers for training and testing. The redundancy R was computed at 128 spatial locations (electrodes) in EEG datasets 1 and 2 and at 19 spatial locations in dataset-3, and the FDR selected the optimum features from this feature set. The exact number of features used in the classification is given in Table 2. Details of these datasets are given in the EEG Datasets section. The redundancy feature R represents the percentage of redundant information in the EEG signal recorded at a given electrode position on the scalp. The value of R varies across electrodes, reflecting the state of the EEG, and the variation of R across scalp positions reflects changes in the electrical brain potentials. Hence, R can potentially discriminate EEG patterns of different classes with high classification accuracy.
Classifiers and k-Fold Cross Validation
A classifier is a function that takes the values of various independent variables (features) as input and predicts the class to which the instance belongs (Pereira et al. 2009). To demonstrate the effectiveness of the proposed algorithm in EEG signal classification, we used three classifiers: SVM with a radial basis function kernel (Ben-Hur and Weston 2010), a multi-layer perceptron (MLP) with three hidden layers (Orhan et al. 2011), and k-nearest neighbors (k-NN) with k = 1 (see Pereira et al. 2009 for details of machine learning classifiers). To evaluate the performance of the classifiers, 10-fold cross validation was adopted: each extracted feature set is randomly split into 10 mutually exclusive folds of equal size, and the classifiers are trained and tested ten times.
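Standardization (Eq. (6)), FDR ranking (Eq. (7)), median-based selection, and 10-fold cross-validation with a 1-NN classifier can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code: the SVM and MLP are omitted, the synthetic data stand in for the real redundancy features, and all function names are hypothetical. One deliberate choice that the text does not specify: inside each fold, the mean, SD, and FDR are fitted on the training folds only, so that no test information leaks into feature selection.

```python
import numpy as np


def standardize(X):
    """Eq. (6): z-score every feature (column) with its mean and SD."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def fisher_ratio(X, y):
    """Eq. (7): (m1 - m2)^2 / (s1^2 + s2^2) for every feature; y holds 0/1 labels."""
    c0, c1 = X[y == 0], X[y == 1]
    return (c0.mean(axis=0) - c1.mean(axis=0)) ** 2 / (c0.var(axis=0) + c1.var(axis=0))


def select_above_median(fdr):
    """Indices of features whose FDR exceeds the median FDR."""
    return np.flatnonzero(fdr > np.median(fdr))


def cv_accuracy_1nn(X, y, n_folds=10, seed=0):
    """Mean 1-NN accuracy over k random folds; standardization and FDR
    selection are fitted on the training folds only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accuracies = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        Xtr, Xte = (X[train] - mu) / sd, (X[test] - mu) / sd
        keep = select_above_median(fisher_ratio(Xtr, y[train]))
        Xtr, Xte = Xtr[:, keep], Xte[:, keep]
        # Squared Euclidean distance from every test instance to every training instance.
        dist = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
        pred = y[train][dist.argmin(axis=1)]
        accuracies.append(np.mean(pred == y[test]))
    return float(np.mean(accuracies))


# Synthetic example: 10 "electrode" features, only the first 3 differ between classes.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 100)
X = rng.standard_normal((200, 10))
X[y == 1, :3] += 3.0
print(np.argsort(fisher_ratio(X, y))[::-1][:3])  # indices of the informative features
print(cv_accuracy_1nn(X, y))
```

With 10 features, median-based selection keeps 5, so the three informative features plus two noise features reach the 1-NN classifier, which still separates the classes well in this synthetic setting.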
In each run, training is done on nine folds and testing on the remaining fold (Pereira et al. 2009). The accuracy, sensitivity, specificity, precision, and Kappa statistic are averaged over the ten runs to obtain the final classifier performance (Pereira et al. 2009).
EEG Datasets
Three EEG datasets are used to validate this scheme. Dataset-1 consists of 128-channel EEG from eight healthy university students, recorded while they performed a complex cognitive task (class-1) and during a baseline eyes-open condition (class-2), using the HydroCel Geodesic Sensor Net (Electrical Geodesic Inc., Eugene, OR, USA). All electrodes were referenced to the vertex (Cz), and the raw signals were amplified with an EGI Net-Amps 300 amplifier. The sampling rate was 250 samples per second, and the impedance was kept below 50 kΩ. In the complex cognitive task, the participants performed Raven's Advanced Progressive Matrices (RAPM), a nonverbal test used to measure the fluid intelligence of individuals. The RAPM consists of two sets: set-I contains 12 practice problems and set-II contains 36 assessment problems. Details of the RAPM procedure can be found in our previous study (Amin et al. 2015a). The EEG signals were band-pass filtered (0.5–48 Hz), and EOG artifacts were corrected using Gratton's method (Gratton et al. 1983). Details of the data collection and experimental tasks are reported in Amin et al. (2013). The dataset consists of two classes: class-1 represents the EEG recordings of all participants collected during the RAPM task, and class-2 denotes all participants' EEG data recorded during the eyes-open condition. Because the RAPM task consists of 36 questions, each participant is observed 36 times while EEG is recorded; from a classification point of view, each participant's EEG data therefore yields 36 instances. A few participants did not answer all the questions in the RAPM task. The unanswered questions were excluded, leaving a total of 280 instances for class-1 (8 participants × 36 questions = 288 instances; excluding the missing questions gives 280 instances). For class-2, the eyes-open EEG recordings were segmented according to the number of questions each participant attempted in the RAPM task, so that both classes have the same number of instances. Thus, the feature matrix has 560 instances (class-1 = 280; class-2 = 280). The EEG segment length is 8.5 s in class-2, whereas the EEG length in class-1 varies between 10 and 60 s because of the participants' variable response times in the RAPM task.
Dataset-2 consists of 128-channel EEG recordings from 20 subjects, recorded while they performed a memory recall task (class-1) and during a baseline eyes-open condition (class-2). Details of the experimental task and data collection are given in Amin et al. (2014). The EEG recording setup and preprocessing were similar to those of dataset-1. In the memory recall task, the participants first watched learning material on a topic about which they had no background knowledge. After a 30 min retention interval following the learning session, they performed the recall task, which consisted of 20 multiple choice questions (MCQs) about the learning material. Participants were given 30 s to respond to each MCQ, with a maximum of 10 min for the whole recall task. The EEG recordings were preprocessed and cleaned of artifacts using the same method as for dataset-1. Because there are 20 participants and the recall task consists of 20 MCQs, the EEG recorded during each MCQ is considered a separate instance for classification, giving a total of 400 instances in class-1 (20 participants × 20 MCQs). Accordingly, the eyes-open EEG recordings (class-2) of each participant were segmented to balance the number of instances in class-2. Hence, the feature matrix for this dataset has 800 instances (class-1 = 400, class-2 = 400). The EEG segment length was up to 30 s in class-1 and 15 s in class-2.
Dataset-3 consists of EEG recordings with intractable seizures from 24 pediatric patients (age: M = 10.15, SD = 5.64), collected by a team of investigators at Children's Hospital Boston and the Massachusetts Institute of Technology (MIT), commonly known as the CHB-MIT database. The signals were sampled at 256 samples per second with 16-bit resolution, and the international 10–20 system of electrode placement was used. This dataset is publicly available at http://www.physionet.org/pn6/chbmit/. The EEG recordings are zero-phase filtered (0.5–57 Hz). The dataset contains 19 EEG channels and 193 seizures. All EEG seizures are considered class-1. For class-2, we selected, prior to the onset of each seizure, non-seizure EEG of the same length as the seizure. Hence, the feature matrix for this dataset contains 386 instances (193 seizures and 193 non-seizures). The seizure and non-seizure segments range from 18 to 752 s. Details of the seizure and non-seizure EEG recordings are given in Shoeb and Guttag (2010). Table 1 shows the length of the EEG signals of the respective classes in all three datasets.
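The first two steps of the algorithm (Eqs. (3)–(5)) can be sketched as follows, starting from a list of DWT coefficient arrays such as [A4, D4, D3, D2, D1]. This is a sketch under several assumptions, all labeled here and in the code: the DWT is orthonormal, so coefficient energy equals reconstructed-signal energy and Eq. (4) can be evaluated directly on the coefficients; instead of running an actual arithmetic coder, the compressed size is approximated by the ideal code length of the rounded coefficients under their empirical (zeroth-order) distribution, which an arithmetic coder approaches to within a few bits plus the cost of describing the model; and the original signal is assumed to be stored at 16 bits per sample (the resolution reported for dataset-3). All function names are hypothetical.

```python
import numpy as np


def energy_threshold(coeffs, keep_energy=0.99):
    """Smallest alpha such that coefficients with |c| >= alpha keep more than
    keep_energy of the total energy (Eq. (4), assuming an orthonormal DWT)."""
    mags = np.sort(np.abs(np.concatenate(coeffs)))[::-1]
    cum = np.cumsum(mags ** 2)
    k = int(np.searchsorted(cum, keep_energy * cum[-1], side="right"))
    return float(mags[min(k, len(mags) - 1)])


def threshold_and_round(coeffs, alpha):
    """Eq. (3) hard thresholding, then rounding to the nearest integer."""
    return [np.rint(np.where(np.abs(c) >= alpha, c, 0.0)).astype(np.int64) for c in coeffs]


def ideal_code_bits(symbols):
    """Information content (bits) of the symbols under their empirical
    distribution: an approximation of the arithmetic coder's output size."""
    _, counts = np.unique(symbols, return_counts=True)
    return float(-(counts * np.log2(counts / counts.sum())).sum())


def redundancy(coeffs, n_samples, bits_per_sample=16, keep_energy=0.99):
    """Eq. (5): R = (1 - 1/CR) * 100 with CR = original bits / coded bits.
    Assumption: the original signal uses bits_per_sample bits per sample."""
    alpha = energy_threshold(coeffs, keep_energy)
    symbols = np.concatenate(threshold_and_round(coeffs, alpha))
    cr = n_samples * bits_per_sample / max(ideal_code_bits(symbols), 1.0)
    return (1.0 - 1.0 / cr) * 100.0


# Toy coefficient set: a few large coefficients and many small ones (e.g., in microvolts).
coeffs = [np.array([1000.0, -800.0, 600.0, 400.0]), np.full(60, 0.3)]
print(energy_threshold(coeffs))  # 400.0
print(redundancy(coeffs, n_samples=64))
```

Rounding to integers only preserves information when the signal is expressed in suitably small units (microvolts are assumed here); with this approximation, R can become negative if the coded size exceeds the original size.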
Results and Discussion
In this paper, a novel feature extraction scheme for EEG datasets is proposed, based on the DWT and arithmetic coding, to improve classification performance on spontaneous EEG datasets by extracting relevant and significant information from the EEG signal that directly reflects variations in the brain during different tasks/conditions. We followed the standard pattern recognition steps: feature extraction, feature standardization, feature selection, classifier training, and classifier evaluation. To demonstrate the effectiveness of the proposed feature extraction scheme, three EEG datasets were employed, including clinical and research datasets that contain high-density and low-density EEG data. In the classification stage, an instance-based classifier (k-NN) and two non-linear model-based classifiers (MLP and SVM) were employed.
Table 1 Mean and standard deviation of the length of EEG datasets (in seconds)
EEG classes | Statistics | Dataset-1 | Dataset-2 | Dataset-3
Class-1 | Mean | 34.22 | 9.58 | 64.36
Class-1 | SD (±) | 15.51 | 4.77 | 100.59
Class-2 | Mean | 8.5* | 15.0* | 64.36
Class-2 | SD (±) | 0* | 0* | 100.59
* All instances in class-2 of datasets 1 and 2 have the same length; hence, the SD is 0.
Classification Results
The mean redundancy rate in the healthy EEG (eyes-open and normal recordings) of all three datasets is around 75 %, indicating that about three quarters of the signal information is redundant or repeating. In contrast, the mean redundancy rate in the cognitive task, memory recall, and epileptic EEG is 63, 58, and 44 %, respectively. This indicates that variability increases in the cognitive and memory tasks as well as in epileptic EEG, which decreases the redundant EEG activity and consequently lowers the mean redundancy rate (see Fig. 3 for the whole-scalp distribution of the compression ratio, CR). Table 2 gives the results of the classifiers (SVM, MLP, and k-NN) on all datasets. In dataset-1, k-NN achieves 99.82 % accuracy in discriminating the EEG pattern of the complex cognitive task from the eyes-open pattern. In dataset-2, SVM provides the highest accuracy, 98.52 %, in discriminating the EEG of memory recall from the resting-state EEG pattern. In dataset-3, the epileptic seizure and non-seizure EEG patterns are successfully classified with an accuracy of 97.27 % and a sensitivity of 96.4 %. The AUC achieved by all three classification algorithms exceeds 0.90 on all datasets.
The novelty of the proposed EEG feature extraction scheme is the combination of wavelet decomposition and arithmetic coding, which efficiently extracts the EEG feature. Wavelet decomposition has been reported for de-noising EEG signals, extracting EEG frequency bands, and extracting event-related potentials (ERPs) from background EEG activity (Demiralp et al. 2001; Quiroga 2005).
In addition, wavelet decomposition at various levels (e.g., level 3, level 4, and level 5) has been reported in the literature for EEG classification (Adeli et al. 2003; Kumar et al. 2014; Chen et al. 2014). The decomposed wavelet coefficients (D1 to D4 and A4) correspond to the basic EEG rhythms (delta, theta, alpha, beta, and gamma) (Chen et al. 2014). We selected decomposition level 4 because of the frequency content of the preprocessed EEG signals in the datasets: most EEG information lies in the 0.5–30 Hz range, and decomposition up to the 4th level resolves the EEG information below about 30 Hz into separate sub-bands. In addition, we found that at decomposition level 4 the number of discarded wavelet coefficients was higher than at lower decomposition levels (i.e., only 30 % of the wavelet coefficients were needed to reconstruct the original signal with 99 % energy, so the R value was high), whereas decomposition levels higher than 4 had no significant effect on the R value (Fig. 4). The wavelet coefficients at different levels represent the information present in the EEG activity, and the variation in their values depends on the variation of the EEG signal: if the EEG signal has high variability, the wavelet coefficients also vary more. The extraction of significant wavelet coefficients in the proposed scheme is similar to Donoho's de-noising method (Donoho and Johnstone 1994), in which wavelet coefficients are selected by thresholding. However, the proposed scheme additionally applies arithmetic coding, as in JPEG2000, which removes the redundancy in the retained wavelet coefficients and encodes them into a bit stream. In our experiments, this combination outperformed the existing EEG feature extraction methods it was compared with (Table 3).
Fig. 3 The distribution of the compression ratio over the whole scalp surface (left image: memory recall task; right image: eyes open) shows the variations in EEG activity over the whole scalp. A high compression value (red areas in the topomap) indicates high redundancy, and a low compression value (e.g., yellow regions in the topomap) indicates low redundancy, i.e., non-repeating EEG activity.
Table 2 Classification results on EEG datasets using SVM, MLP, and k-NN
EEG dataset (features matrix) | Classifier | Accuracy (%) | Sensitivity (%) | Specificity (%) | Area under the ROC curve (AUC) | Kappa statistic
Dataset-1 (560 × 64) | SVM | 98.92 | 97.90 | 100 | 0.98 | 0.97
Dataset-1 (560 × 64) | MLP | 98.50 | 97.50 | 98.50 | 0.97 | 0.96
Dataset-1 (560 × 64) | k-NN | 99.82 | 99.00 | 100 | 0.99 | 0.98
Dataset-2 (400 × 64) | SVM | 98.52 | 100 | 97.10 | 0.98 | 0.97
Dataset-2 (400 × 64) | MLP | 94.11 | 97.10 | 96.90 | 0.99 | 0.88
Dataset-2 (400 × 64) | k-NN | 95.58 | 100 | 91.20 | 0.94 | 0.91
Dataset-3 (386 × 9) | SVM | 95.45 | 96.40 | 94.50 | 0.95 | 0.90
Dataset-3 (386 × 9) | MLP | 95.45 | 94.50 | 96.40 | 0.98 | 0.90
Dataset-3 (386 × 9) | k-NN | 97.27 | 96.40 | 98.20 | 0.99 | 0.94
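The performance measures reported in Table 2 can be computed from predicted and true labels as sketched below: accuracy, sensitivity, and specificity from the confusion matrix, Cohen's kappa from the observed agreement and the agreement expected by chance, and the AUC as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The function names are hypothetical and the toy labels are made up; the sketch does not reproduce the values in Table 2.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, and Cohen's kappa from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    n = len(pairs)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals of the confusion matrix.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }


def auc(scores, y_true, positive=1):
    """Area under the ROC curve as the probability that a random positive
    instance scores higher than a random negative one (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == positive]
    neg = [s for s, t in zip(scores, y_true) if t != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75, 'kappa': 0.5}
print(auc([0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6], y_true))  # 0.9375
```

In the cross-validation setting described in the Methods, these measures would be computed on each test fold and then averaged over the ten folds.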
Fig. 4 Comparison of different levels of wavelet decomposition for selecting the highest decomposition level in the proposed method. Decomposition level 4 differs significantly from the lower levels (panel b), whereas higher levels show no or negligible differences (panel c). The x-axis shows data length in seconds and the y-axis the percentage of wavelet coefficients discarded while the reconstructed signal still meets the quality criterion. L1, L2, …, L10 denote the decomposition levels

Implementation on Raw EEG
The proposed feature extraction scheme was applied to fully raw EEG, band-limited raw EEG, and pre-processed artifact-free EEG to study the effect of artifacts on the redundancy feature and its robustness. In fully raw EEG, the possible artifacts are eye blinks and eye movements (high-amplitude artifacts), high-frequency muscle artifacts, line noise (50 Hz), ECG artifacts, and bad-channel contamination; the band-limited raw EEG still contains high-amplitude eye blink and eye movement artifacts, ECG artifacts, and some muscle artifacts in the gamma range below 48 Hz. The mean (± standard deviation) R values over the 128 channels for raw, band-limited raw, and clean EEG are 47.02 (±14.24), 90.32 (±7.55), and 97.12 (±1.05), respectively. Fully raw EEG shows higher variability in R values than band-limited and clean EEG (Fig. 5), whereas band-limited raw EEG performs relatively close to clean EEG. For band-limited EEG, the frontal electrodes (1–26, marked with a rectangle and arrow in Fig. 5) show higher variability than the remaining electrodes: the variance of R is 129.49 for electrodes 1–26 and 11.06 for electrodes 27–128. These frontal electrodes are strongly influenced by eye blink and eye movement artifacts. In addition, channel 79 (marked with an arrow) is a bad channel, which also lowers its R value.
This comparison indicates that the R feature can be extracted from fully raw EEG including all possible artifacts, but the resulting variability in R cannot be neglected, because it may degrade the performance of classification algorithms. The R feature can also be extracted from band-limited raw EEG and used for classification, since its values are close to those of clean data. However, eye blink rate varies widely in humans [4.6–43.5 eyeblinks/min (Doughty et al. 2009)], and reduced blink rates have been reported during highly demanding attention tasks (Fukuda 2001). Hence, variability in frontal regions, which directly reflects eye blink artifacts, may be a confounding factor, especially when comparing or classifying two groups whose blink rates are not controlled. In short, the proposed feature extraction scheme may be applied to band-limited raw EEG, provided that eye blink rate is controlled between groups of subjects and/or between resting and highly demanding attention-task conditions.
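The per-channel statistics quoted above (mean ± SD of R over 128 channels, and the variance of frontal electrodes 1–26 versus electrodes 27–128) can be computed from a vector of per-channel R values along these lines. `summarize_r` is a hypothetical helper; it assumes channels are ordered 1 to N with the first 26 frontal, as in Fig. 5, and whether the paper used sample (ddof=1) or population variance is not stated here, so ddof=1 is an assumption.

```python
import numpy as np


def summarize_r(r_values, n_frontal=26):
    """Mean/SD of R over all channels plus frontal-vs-rest variances.

    Assumes channel order 1..N with electrodes 1..n_frontal frontal;
    ddof=1 (sample statistics) is an assumption, not taken from the paper.
    """
    r = np.asarray(r_values, dtype=float)
    frontal, rest = r[:n_frontal], r[n_frontal:]
    return {
        "mean": float(r.mean()),
        "sd": float(r.std(ddof=1)),
        "var_frontal": float(frontal.var(ddof=1)),
        "var_rest": float(rest.var(ddof=1)),
    }
```

Applied to the band-limited EEG of Fig. 5, this kind of summary is what separates the artifact-prone frontal group (variance 129.49) from the remaining electrodes (11.06).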
Comparison with Existing Feature Extraction Methods
The proposed feature extraction scheme was compared with non-linear (time-domain) feature extraction methods and with a well-known spectral method (EEG power). The time-domain features include ApEn, SamEn, composite permutation entropy index (CPEI), Hjorth complexity, and FD (Richman and Moorman 2000; Acharya et al. 2012a). The EEG power features were computed by FFT with a Hanning window over 2-s EEG segments with 50 % overlap (Amin et al. 2014) for the basic EEG rhythms: delta (1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (30–45 Hz). These features were computed for datasets 1 and 2. Because dataset-3 is a publicly available database, the proposed scheme was instead compared on it with a recent study that used the same data. On the first two datasets, the redundancy feature gives better discrimination than the time-domain and spectral features (Table 3). On dataset-3, the proposed scheme outperforms a recent epileptic seizure study that used the same dataset with wavelet features (accuracy 84 %) (Ahammad et al. 2014). These comparisons support the robustness of the proposed algorithm for feature extraction and classification of EEG signals.
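A minimal sketch of the band-power features described above, assuming a single-channel signal and numpy only: 2-s Hann-windowed segments (numpy's `np.hanning`) with 50 % overlap are averaged, which is essentially Welch's method, and the averaged spectrum is integrated over each band. The function name `band_powers`, the half-open band edges [low, high), per-segment mean removal, and the power scaling are assumptions; the paper's exact normalization is not given in this section.

```python
import numpy as np

# EEG rhythm bands (Hz) as listed in the text; half-open [low, high) is assumed.
BANDS = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}


def band_powers(x, fs, seg_sec=2.0, overlap=0.5):
    """Band power of one EEG channel from averaged Hann-windowed FFT segments."""
    x = np.asarray(x, dtype=float)
    n = int(round(seg_sec * fs))            # samples per segment (500 at 250 Hz)
    hop = int(round(n * (1.0 - overlap)))   # 50 % overlap -> hop of n / 2
    if x.size < n:
        raise ValueError("signal is shorter than one segment")
    window = np.hanning(n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.zeros(freqs.size)
    n_seg = 0
    for start in range(0, x.size - n + 1, hop):
        seg = x[start:start + n]
        seg = seg - seg.mean()              # remove the segment's DC offset
        psd += np.abs(np.fft.rfft(seg * window)) ** 2
        n_seg += 1
    # Average the periodograms and scale to power per Hz (one-sided doubling omitted).
    psd /= n_seg * fs * np.sum(window ** 2)
    df = freqs[1] - freqs[0]                # 0.5 Hz resolution for 2-s segments
    return {
        name: float(psd[(freqs >= lo) & (freqs < hi)].sum() * df)
        for name, (lo, hi) in BANDS.items()
    }
```

With fs = 250 Hz, a 10-Hz sinusoid places almost all of its power in the alpha band and a 20-Hz sinusoid in the beta band. Only the relative values across bands matter for a sketch like this; a production pipeline would more likely call a library routine such as SciPy's Welch estimator.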
Fig. 5 Comparison of R feature values for a 5-min (75,000 sample points) raw EEG signal (variance = 202.89), band-limited raw EEG (variance = 56.95), and clean EEG (variance = 1.09). The raw EEG signal contains all frequencies (0–125 Hz) and all possible artifacts. The band-limited raw EEG is band-pass filtered at 0.5–48.0 Hz but still includes EOG, ECG, and some muscle artifacts below 48 Hz. The clean EEG also contains frequencies from 0.5 to 48 Hz and is free from artifacts. The x-axis represents the electrode number (1–128)
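Table 3 Comparison with existing feature extraction methods in terms of classification accuracy (%). Delta to Gamma are EEG band power features; ApEn to Fractal dimension are time-domain (non-linear) features

| EEG dataset (feature matrix) | Classifier | Delta | Theta | Alpha | Beta | Gamma | ApEn | SamEn | CPEI | Hjorth complexity | Fractal dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Dataset-1 (560 × 64) | SVM | 94.10 | 93.21 | 91.50 | 88.21 | 87.15 | 87.50 | 93.75 | 87.75 | 77.75 | 73.50 |
| Dataset-1 (560 × 64) | MLP | 93.85 | 92.45 | 90.35 | 87.55 | 87.60 | 87.50 | 91.11 | 86.75 | 75.00 | 81.50 |
| Dataset-1 (560 × 64) | k-NN | 94.67 | 92.85 | 89.10 | 87.35 | 87.48 | 68.75 | 86.75 | 68.50 | 73.50 | 75.00 |
| Dataset-2 (400 × 64) | SVM | 86.76 | 82.35 | 86.76 | 75.00 | 69.11 | 94.11 | 86.76 | 86.76 | 82.35 | 80.88 |
| Dataset-2 (400 × 64) | MLP | 91.17 | 88.23 | 88.23 | 86.76 | 82.35 | 93.50 | 88.23 | 89.70 | 92.64 | 79.41 |
| Dataset-2 (400 × 64) | k-NN | 91.17 | 80.88 | 67.64 | 73.52 | 67.64 | 87.50 | 83.82 | 75.00 | 75.00 | 58.82 |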
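Table 3 reports accuracies for SVM, MLP, and k-NN on feature matrices of 560 × 64 and 400 × 64. The validation protocol is not described in this section, so the following is only a hypothetical sketch of how such an accuracy could be estimated for the k-NN case. It assumes rows are samples and columns are features, and the 10-fold cross-validation, z-scoring with training-fold statistics, Euclidean distance, and k = 3 are all assumptions; `knn_cv_accuracy` is a hypothetical helper.

```python
import numpy as np


def knn_cv_accuracy(X, y, k=3, folds=10, seed=0):
    """Cross-validated accuracy (%) of a Euclidean k-NN classifier.

    X: (n_samples, n_features) feature matrix, e.g. 560 x 64; y: class labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    order = np.random.default_rng(seed).permutation(len(y))
    correct = 0
    for test in np.array_split(order, folds):
        train = np.setdiff1d(order, test)
        # Standardize with training-fold statistics only to avoid leakage.
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0)
        sd[sd == 0] = 1.0
        x_tr, x_te = (X[train] - mu) / sd, (X[test] - mu) / sd
        # Squared Euclidean distances, shape (n_test, n_train).
        d = ((x_te[:, None, :] - x_tr[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]
        for row, true_label in zip(nearest, y[test]):
            # Majority vote among the k nearest training samples.
            labels, counts = np.unique(y[train][row], return_counts=True)
            correct += int(labels[np.argmax(counts)] == true_label)
    return 100.0 * correct / len(y)
```

An SVM or MLP would slot into the same cross-validation loop in place of the nearest-neighbour vote.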
Limitations of the Proposed Scheme
The aim of this study is to propose an efficient feature extraction and classification scheme for spontaneous EEG signals. Spontaneous EEG recordings, such as those from epilepsy monitoring and sleep studies, are usually long, whereas ERP tasks (e.g., the oddball paradigm) typically yield short EEG recordings. In short EEG recordings, redundant information is relatively less likely than in spontaneous EEG. Nevertheless, we applied the proposed feature extraction scheme to EEG data of various lengths, including oddball paradigms. The redundancy feature value was zero or very close to zero when the data length was less than 750 sample points; i.e., at a sampling frequency of 250 Hz, at least 3 s of data are needed to compute the redundancy feature. Hence, the proposed scheme may not be appropriate for the oddball data set [the tested data length was 500 ms, taken from our previous study (Amin et al. 2015b)]. Methods for classifying short EEG signals, such as oddball and ERP data, have been reported in the literature (Blankertz et al. 2011; Stewart et al. 2014; Cecotti et al. 2014). This limitation of the proposed scheme needs to be addressed in future work.
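The length constraint above is simple arithmetic: 750 samples at 250 Hz is 3 s, whereas a 500-ms oddball epoch at the same rate has only 125 samples. A small check might look like the sketch below; it assumes the 750-sample threshold reported here carries over unchanged to other sampling rates, which the text does not establish, and the helper names are hypothetical.

```python
# Empirical threshold reported in the text (observed at 250 Hz); applying it
# as a fixed sample count at other sampling rates is an assumption.
MIN_SAMPLES = 750


def n_samples(duration_s: float, fs_hz: float) -> int:
    """Number of samples in a recording of the given duration."""
    return round(duration_s * fs_hz)


def min_duration_s(fs_hz: float, min_samples: int = MIN_SAMPLES) -> float:
    """Shortest recording (in seconds) that reaches the sample threshold."""
    return min_samples / fs_hz


def redundancy_applicable(duration_s: float, fs_hz: float,
                          min_samples: int = MIN_SAMPLES) -> bool:
    """True if the recording is long enough for a non-trivial R value."""
    return n_samples(duration_s, fs_hz) >= min_samples
```

At 250 Hz, `min_duration_s(250)` gives the 3 s stated above, a 500-ms epoch fails the check, and the 5-min recording of Fig. 5 (75,000 samples) passes easily.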
Conclusion
The findings demonstrate that data redundancy provides an efficient and robust approach to feature extraction and classification of EEG signals. The classification accuracies obtained indicate that redundant information works well as a feature for EEG classification in various applications. This approach may be employed for the diagnosis of brain disorders, remote monitoring of epileptic patients, and clustering of different levels of brain states. All datasets used in this study pose two-class problems; for multi-class problems, a single classification algorithm may not provide sufficient expertise. Hence, in future studies, a mixture-of-experts architecture (Jacobs et al. 1991) will be used to exploit the expertise of multiple classifiers (experts).
Acknowledgments
This research work was supported by the HiCoE grant for CISIR (0153CA-002), Ministry of Education (MOE), Malaysia, and by the NSTIP strategic technologies program, grant number 12-INF2582-02, in the Kingdom of Saudi Arabia. The authors gratefully acknowledge this technical and financial support.
References
Acharya UR, Sree SV, Ang PCA, Yanti R, Suri JS (2012a) Application of non-linear and wavelet based features for the automated identification of epileptic EEG signals. Int J Neural Syst 22:1250002
Acharya UR, Molinari F, Sree SV, Chattopadhyay S, Ng K-H, Suri JS (2012b) Automated diagnosis of epileptic EEG using entropies. Biomed Signal Process Control 7:401–408
Adeli H, Zhou Z, Dadmehr N (2003) Analysis of EEG records in an epileptic patient using wavelet transform. J Neurosci Methods 123:69–87
Ahammad N, Fathima T, Joseph P (2014) Detection of epileptic seizure event and onset using EEG. BioMed Res Int 2014
Amin HU, Malik AS, Subhani AR, Badruddin N, Chooi W-T (2013) Dynamics of scalp potential and autonomic nerve activity during intelligence test. In: Lee M et al (eds) Neural Information Processing, vol 8226. Springer, Berlin, pp 9–16
Amin HU, Malik AS, Badruddin N, Chooi W-T (2014) Brain behavior in learning and memory recall process: a high-resolution EEG analysis. In: Goh J (ed) The 15th International Conference on Biomedical Engineering, vol 43. Springer International Publishing, pp 683–686
Amin HU, Malik AS, Ahmad RF, Badruddin N, Kamel N, Hussain M, Chooi W-T (2015a) Feature extraction and classification for EEG signals using wavelet transform and machine learning techniques. Australas Phys Eng Sci Med 38:139–149
Amin HU, Malik AS, Mumtaz W, Badruddin N, Kamel N (2015b) Evaluation of passive polarized stereoscopic 3D display for visual and mental fatigues. In: Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE, Milan, Italy
Andrzejak RG, Lehnertz K, Mormann F, Rieke C, David P, Elger CE (2001) Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state. Phys Rev E 64:061907
Ben-Hur A, Weston J (2010) A user’s guide to support vector machines. In: Carugo O, Eisenhaber F (eds) Data mining techniques for the life sciences. Springer, Berlin, pp 223–239
Blankertz B, Lemm S, Treder M, Haufe S, Müller K-R (2011) Single-trial analysis and classification of ERP components: a tutorial. Neuroimage 56:814–825
Cecotti H, Eckstein MP, Giesbrecht B (2014) Single-trial classification of event-related potentials in rapid serial visual presentation tasks using supervised spatial filtering. IEEE Trans Neural Netw Learn Syst 25:2030–2042
Chen L-L, Zhang J, Zou J-Z, Zhao C-J, Wang G-S (2014) A framework on wavelet-based nonlinear features and extreme learning machine for epileptic seizure detection. Biomed Signal Process Control 10:1–10
Demiralp T, Ademoglu A, Istefanopulos Y, Başar-Eroglu C, Başar E (2001) Wavelet analysis of oddball P300. Int J Psychophysiol 39:221–227
Donoho DL, Johnstone IM (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika 81:425–455
Doughty MJ, Naase T, Button NF (2009) Frequent spontaneous eyeblink activity associated with reduced conjunctival surface (trigeminal nerve) tactile sensitivity. Graefe’s Arch Clin Exp Ophthalmol 247:939–946
Faust O, Acharya UR, Min LC, Sputh BH (2010) Automatic identification of epileptic and background EEG signals using frequency domain parameters. Int J Neural Syst 20:159–176
Fu K, Qu J, Chai Y, Dong Y (2014) Classification of seizure based on the time-frequency image of EEG signals using HHT and SVM. Biomed Signal Process Control 13:15–22
Fukuda K (2001) Eye blinks: new indices for the detection of deception. Int J Psychophysiol 40:239–245
Gonzalez RC, Woods RE (2002) Digital image processing, 2nd edn. Prentice Hall, Upper Saddle River
Gratton G, Coles MGH, Donchin E (1983) A new method for off-line removal of ocular artifact. Electroencephalogr Clin Neurophysiol 55:468–484
Herman P, Prasad G, McGinnity TM, Coyle D (2008) Comparative analysis of spectral approaches to feature extraction for EEG-based motor imagery classification. IEEE Trans Neural Syst Rehabil Eng 16:317–326
Higgins G, Faul S, McEvoy RP, McGinley B, Glavin M, Marnane WP, Jones E (2010) EEG compression using JPEG2000: how much loss is too much? In: Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE, pp 614–617
Iscan Z, Dokur Z, Demiralp T (2011) Classification of electroencephalogram signals with combined time and frequency features. Expert Syst Appl 38:10499–10505
Jacobs RA, Jordan MI, Nowlan SJ, Hinton GE (1991) Adaptive mixtures of local experts. Neural Comput 3:79–87
Jahankhani P, Kodogiannis V, Revett K (2006) EEG signal classification using wavelet feature extraction and neural networks. In: IEEE John Vincent Atanasoff 2006 International Symposium on Modern Computing (JVA’06), pp 12–124
Jahidin AH, Ali MSAM, Taib MN, Tahir NM, Yassin IM, Lias S (2014) Classification of intelligence quotient via brainwave subband power ratio features and artificial neural network. Comput Methods Programs Biomed 114:50–59
Kumar Y, Dewal ML, Anand RS (2014) Epileptic seizure detection using DWT based fuzzy approximate entropy and support vector machine. Neurocomputing 133:271–279
Musselman M, Djurdjanovic D (2012) Time–frequency distributions in the classification of epilepsy from EEG signals. Expert Syst Appl 39:11413–11422
Nicolaou N, Georgiou J (2012) Detection of epileptic electroencephalogram based on Permutation Entropy and Support Vector Machines. Expert Syst Appl 39:202–209
Orhan U, Hekim M, Ozer M (2011) EEG signals classification using the K-means clustering and a multilayer perceptron neural network model. Expert Syst Appl 38:13475–13481
Parvez MZ, Paul M (2014) Epileptic seizure detection by analyzing EEG signals using different transformation techniques. Neurocomputing 145:190–200
Pereira F, Mitchell T, Botvinick M (2009) Machine learning classifiers and fMRI: a tutorial overview. Neuroimage 45:S199–S209
Quiroga RQ (2005) Single-trial event-related potentials with wavelet denoising: method and applications. In: International Congress Series, pp 429–432
Richman JS, Moorman JR (2000) Physiological time-series analysis using approximate entropy and sample entropy. Am J Physiol Heart Circ Physiol 278:H2039–H2049
Sabeti M, Katebi S, Boostani R (2009) Entropy and complexity measures for EEG signal classification of schizophrenic and control participants. Artif Intell Med 47:263–274
Sabeti M, Katebi SD, Boostani R, Price GW (2011) A new approach for EEG signal classification of schizophrenic and control participants. Expert Syst Appl 38:2063–2071
Shoeb AH, Guttag JV (2010) Application of machine learning to epileptic seizure detection. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp 975–982
Song Y, Crowcroft J, Zhang J (2012) Automatic epileptic seizure detection in EEGs based on optimized sample entropy and extreme learning machine. J Neurosci Methods 210:132–146
Stewart AX, Nuthmann A, Sanguinetti G (2014) Single-trial classification of EEG in a visual object task using ICA and machine learning. J Neurosci Methods 228:1–14
Subasi A (2007) EEG signal classification using wavelet feature extraction and a mixture of expert model. Expert Syst Appl 32:1084–1093
Subasi A, Ismail Gursoy M (2010) EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Syst Appl 37:8659–8666
Taghizadeh-Sarabi M, Daliri MR, Niksirat KS (2014) Decoding objects of basic categories from electroencephalographic signals using wavelet transform and support vector machines. Brain Topogr 28(1):33–46
Taubman D, Marcellin MW (eds) (2002) JPEG2000: image compression fundamentals, standards and practice, vol 1, 1st edn. Springer US
Tong S, Thakor NV (2009) Quantitative EEG analysis methods and clinical applications. Artech House, Boston
Vidaurre C, Krämer N, Blankertz B, Schlögl A (2009) Time domain parameters as a feature for EEG-based brain–computer interfaces. Neural Netw 22:1313–1319
Wang D, Miao D, Xie C (2011) Best basis-based wavelet packet entropy feature extraction and hierarchical EEG classification for epileptic detection. Expert Syst Appl 38:14314–14320
Wu L, Neskovic P (2007) Classifying EEG data into different memory loads across subjects. In: Artificial Neural Networks – ICANN 2007. Springer, pp 149–158
Yuan Q, Zhou W, Liu Y, Wang J (2012) Epileptic seizure detection with linear and nonlinear features. Epilepsy Behav 24:415–421
Zarjam P, Epps J, Lovell NH (2012) Characterizing mental load in an arithmetic task using entropy-based features. In: 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA), pp 199–204