Abstract

Recent technological advances have enabled researchers to collect large amounts of electroencephalography (EEG) signals in labeled and unlabeled datasets. However, collecting labeled EEG data for use in brain-computer interface (BCI) systems remains expensive and time consuming. In this paper, a novel active learning method is proposed to minimize the amount of labeled, subject-specific EEG data required for effective classifier training, by combining measures of uncertainty and representativeness within an extreme learning machine (ELM). Following this approach, an ELM classifier was first used to select a relatively large batch of unlabeled examples, whose uncertainty was measured through the best-versus-second-best (BvSB) strategy. The diversity of each sample was then measured between the limited labeled training data and the previously selected unlabeled samples, and the similarity was measured among the previously selected samples. Finally, a tradeoff parameter was introduced to control the balance between informative and representative samples, and these samples were then used to construct a powerful ELM classifier. Extensive experiments were conducted using benchmark and multiclass motor imagery EEG datasets to evaluate the efficacy of the proposed method. Experimental results show that the performance of the new algorithm exceeds or matches that of several state-of-the-art active learning algorithms. The proposed method is thereby shown to improve classifier performance and reduce the need for training samples in BCI applications.

1. Introduction

Brain-computer interfaces (BCIs) are systems that allow users to control external devices via observed brain activity, without relying on peripheral nerve or muscle activity [1]. The most common and useful BCIs are constructed using noninvasive brain activity recording techniques, such as electroencephalography (EEG) [2]. While EEG has become widely used for medical monitoring, rehabilitation, neuroprosthesis, and other healthcare applications [3–5], the data acquisition process can be lengthy and exhausting for users [6]. In addition, EEG signals often vary over the course of an experiment due to both biological and technical causes, including subject-specific anatomical differences, intersession variability, and the attentional drift of subjects [7]. Consequently, users must often undergo a long data collection process to train a suitable BCI system. This poses a prohibitive burden for individuals with paralysis or a severely injured central nervous system, making it a major hurdle for therapeutic applications. It is therefore of the utmost importance that BCI systems achieve efficient and robust performance with as few training samples as possible.

One approach that has been effectively applied to cases with limited training sets is the introduction of active learning (AL) to the BCI calibration procedure. AL queries the class labels of informative samples within the unlabeled sample space to maximize the efficiency of the learning model, and its application greatly reduces the number of required training samples without any obvious loss of classification accuracy [8]. In essence, AL is an iterative sampling and labeling procedure. On each iteration, AL extracts the sample or batch of samples that is most valuable for improving the current classification model from the unlabeled data pool, and these samples are then manually labeled. The greatest challenge for AL methods is identifying the most informative samples so that the maximum prediction accuracy can be achieved. A number of sample-selection criteria have been applied to this task, including (1) query-by-committee (QBC), in which several distinct classifiers are used and the selected samples are those with the largest disagreement between the labels predicted by the different classifiers [9–11]; (2) margin uncertainty sampling, wherein samples are selected according to the maximum uncertainty based on their respective distances from the classification boundaries [12, 13]; (3) max-entropy sampling, which uses entropy as the uncertainty measure via probabilistic modeling [14, 15]; and (4) diversity sampling, which prefers representative samples [16].

Over the past few decades, many supervised learning models have been adopted as baseline classifiers for AL, including linear discriminant analysis (LDA) [12, 17], the support vector machine (SVM) [18, 19], the artificial neural network (ANN) [20], and the extreme learning machine (ELM) [21, 22]. Among these, the ELM has shown a high learning speed and good generalizability in preliminary testing. Additionally, it can be directly applied to both two-class and multiclass classification. To date, few studies have attempted to introduce AL algorithms into the ELM framework, although these have shown the method to be competitive with active SVMs [13, 14, 23]. Specifically, Yu et al. [13] proposed an active learning method called AL-ELM with the goal of saving training time, and results showed a classification performance comparable to that of AL-SVM [18]. Zhang and Er [23] then introduced the SEAL-ELM method by combining the online sequential ELM (OS-ELM) with AL, yielding a higher classification accuracy than offline combinations of AL and SVM on most test datasets. Regrettably, these existing active ELMs only consider a single querying strategy, leaving room for improvement. The intuitive next step is to introduce multiple querying strategies to select desirable samples. In fact, researchers have combined two strategies in AL with the SVM as base classifier, with each combination performing better than its single-query counterpart [24–26]. At present, however, few implementations of active learning with ELM have been explored and applied to motor imagery- (MI-) based BCI systems [8, 13].

The present investigation intends to fill this gap by combining a two-query AL algorithm with an ELM and testing the method in a BCI application. A well-defined, general framework for active learning is thereby developed in a manner that accounts for both informativeness and representativeness in a multiclass setting. First, an uncertainty sampling strategy is adopted to select a relatively large number of samples using the base ELM classifier. The degree of diversity between the labeled training data and the previously selected, unlabeled samples is then assessed, along with the degree of similarity between the unlabeled samples. Finally, highly informative and representative samples are used to update the ELM classifier through the introduction of a tradeoff parameter. The method is then tested on several benchmark datasets, along with a multiclass MI EEG dataset from BCI Competition IV Dataset 2a. Results demonstrate that the performance of the new method compares favorably with that of existing AL approaches.

Compared to existing ELM-based active learning algorithms, the new method has several noteworthy aspects:
(1) Considering that the use of a single uncertainty strategy may not take full advantage of the abundant information within unlabeled data, the AL-ELM algorithm is extended to combine two querying strategies (uncertainty and diversity) in order to select the most valuable samples from the unlabeled EEG data pool.
(2) The proposed algorithm provides a straightforward and meaningful way to measure representativeness by assessing two kinds of similarity: the similarity between a query sample and the labeled dataset, and the similarity between any two possible query samples. Employing this modified diversity strategy helps isolate highly representative samples during the active learning process.

2. Background Knowledge

2.1. Active Learning

Active learning methods typically comprise five basic components: a labeled set $D_l$, an unlabeled pool $D_u$, a classification model $M$, a query strategy $Q$, and a human annotator $A$. $D_l$ is the limited labeled dataset, $D_u$ is the pool of samples/instances that contains abundant unlabeled instances, $M$ is the classification model trained on $D_l$, $Q$ is a query strategy used to select the most valuable instances from $D_u$, and $A$ is a human annotator who labels the selected instances correctly. AL is an iterative procedure that gradually moves the most important samples, queried by $Q$ and labeled by $A$, from $D_u$ to $D_l$ to update the classification model $M$. The iterative AL process continues in this manner until a predefined criterion is met. Identifying both an excellent classification model and an effective query strategy is therefore highly important for active learning algorithms.

Depending on the number of samples queried at each iteration, AL methods can be divided into two groups: stream-based AL and pool-based AL. In stream-based AL, the learner can only access one sample per iteration, while pool-based AL allows the learner to select a batch of samples during each iteration. Adjusting the selection method and the number of queried samples then yields different AL algorithms, such as the QBC strategy, the uncertainty strategy, and the diversity strategy.
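To make the pool-based procedure concrete, the following Python sketch (not part of the original study) outlines one learning round per iteration; the callables train_classifier, query_strategy, and annotate are placeholders for the model $M$, the query strategy $Q$, and the human annotator $A$.

import numpy as np

def pool_based_active_learning(D_l, y_l, D_u, train_classifier,
                               query_strategy, annotate, n_rounds, batch_size):
    # Train the initial model on the small labeled set D_l
    model = train_classifier(D_l, y_l)
    for _ in range(n_rounds):
        # Q selects the indices of the most valuable unlabeled instances in D_u
        idx = query_strategy(model, D_u, batch_size)
        # A (the human annotator) supplies labels for the queried instances
        y_new = annotate(D_u[idx])
        # Move the newly labeled instances from D_u to D_l and retrain M
        D_l = np.vstack([D_l, D_u[idx]])
        y_l = np.concatenate([y_l, y_new])
        D_u = np.delete(D_u, idx, axis=0)
        model = train_classifier(D_l, y_l)
    return model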

2.2. Basic ELM

Single-hidden-layer feedforward neural networks (SLFNs) are capable of universal approximation [21]. Consider a dataset containing $N$ training samples $\{(\mathbf{x}_j, \mathbf{t}_j)\}_{j=1}^{N}$, with the input $\mathbf{x}_j = [x_{j1}, x_{j2}, \ldots, x_{jd}]^T \in \mathbb{R}^d$ and a corresponding desired output $\mathbf{t}_j = [t_{j1}, t_{j2}, \ldots, t_{jc}]^T \in \mathbb{R}^c$, where $d$ and $c$ represent the respective input and output dimensions and $(\cdot)^T$ denotes a transpose operation. Assuming that $L$ is the number of hidden neurons, the output function of the SLFNs is mathematically modeled as

$\sum_{i=1}^{L} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{t}_j, \quad j = 1, 2, \ldots, N, \qquad (1)$

where $\boldsymbol{\beta}_i$ is the weight vector that connects the $i$-th hidden neuron to the output neurons, $\mathbf{w}_i$ is a randomly chosen input weight vector connecting the $i$-th hidden neuron to the input neurons, $b_i$ is a randomly chosen bias of the $i$-th hidden node, and $g(\cdot)$ is the activation function, which can be any nonlinear piecewise continuous function (such as a sigmoid function or Gaussian function).

For convenience, equation (1) can be rewritten in matrix notation as

$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \qquad (2)$

where $\mathbf{T} = [\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_N]^T$ is the expected network output, $\boldsymbol{\beta} = [\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \ldots, \boldsymbol{\beta}_L]^T$ denotes the weight matrix of the output layer, and $\mathbf{H}$ is the hidden layer output matrix, which is defined as

$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_N + b_L) \end{bmatrix}_{N \times L}. \qquad (3)$

Unlike conventional SLFN training, which requires that the parameters of the hidden neurons be adjusted during training, ELM adopts randomly generated hidden layer parameters and a tuning-free training strategy [22]. Even with these random hidden node parameters, ELM maintains the universal approximation capability of SLFNs [21]. ELM training then aims to find suitable network parameters that minimize the approximation error $\|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\|$. To achieve better generalization performance, a regularization parameter $C$ is introduced in [27], with the corresponding objective function given as

$\min_{\boldsymbol{\beta}} \ \frac{1}{2}\|\boldsymbol{\beta}\|^2 + \frac{C}{2}\|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\|^2, \qquad (4)$

where $\|\cdot\|$ denotes the $L_2$-norm of a matrix or a vector. The output weight matrix $\boldsymbol{\beta}$ can be obtained using the Moore-Penrose generalized inverse. The solution of equation (4) is then $\boldsymbol{\beta} = \mathbf{H}^T\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^T\right)^{-1}\mathbf{T}$ if $N \leq L$, and $\boldsymbol{\beta} = \left(\frac{\mathbf{I}}{C} + \mathbf{H}^T\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{T}$ if $N > L$.
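As an illustration of equations (1)-(4), the following minimal NumPy sketch trains a regularized ELM with a sigmoid activation; all variable and function names are illustrative and not taken from the original implementation.

import numpy as np

def train_elm(X, T, n_hidden, C, seed=0):
    # Randomly generated input weights w_i and biases b_i (equation (1))
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden layer output matrix (equation (3))
    N, L = H.shape
    # Regularized closed-form solution of equation (4)
    if N <= L:
        beta = H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)
    else:
        beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
    return W, b, beta

def elm_output(X, W, b, beta):
    # Raw network outputs for new samples
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta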

3. The D-AL-ELM Method

In this section, we present a novel active learning algorithm, D-AL-ELM, that applies the uncertainty and diversity strategies in consecutive steps. This identifies the most valuable, informative instances, which are then selected to update the baseline ELM classifier during each learning round.

3.1. Discriminative Information by the Uncertainty Criterion

The uncertainty criterion is used to measure the informativeness of each sample. Uncertain samples that lie along the boundaries of different classes carry more information and play a more significant role in the construction of a classifier. In this implementation, the best-versus-second-best (BvSB) strategy is adopted to estimate the uncertainty of each sample. The BvSB strategy is based on a calculation of posterior probability, considering the difference in probability values between the two classes with the highest estimated probabilities [28]. The outputs of the ELM approximate the posterior probabilities of the different classes [13]. To do this, a sigmoid function is used to construct a mapping relationship between the real outputs of the ELM and the posterior probabilities, which is described as

$p(c_k \mid \mathbf{x}_j) = \frac{1}{1 + \exp\left(-f_k(\mathbf{x}_j)\right)}, \qquad (5)$

where $f_k(\mathbf{x}_j)$ denotes the actual output of the $k$-th output node for the instance $\mathbf{x}_j$. In practice, equation (5) is only applied to two-class problems, such that the sum of the converted posterior probabilities for the instance $\mathbf{x}_j$ is always 1. In a multiclass problem, however, the summed posterior probability may exceed 1, so the calculated probabilities are normalized using the following formula:

$p(c_k \mid \mathbf{x}_j) = \frac{p'(c_k \mid \mathbf{x}_j)}{\sum_{i=1}^{c} p'(c_i \mid \mathbf{x}_j)}, \qquad (6)$

where $p'(c_k \mid \mathbf{x}_j)$ is the original probability of the $k$-th class.

Based on the above parameters, the BvSB strategy for each sample can be expressed as

$\mathrm{BvSB}(\mathbf{x}_j) = p(c_{\mathrm{best}} \mid \mathbf{x}_j) - p(c_{\mathrm{second}} \mid \mathbf{x}_j), \qquad (7)$

where $p(c_{\mathrm{best}} \mid \mathbf{x}_j)$ and $p(c_{\mathrm{second}} \mid \mathbf{x}_j)$ are the largest and second largest posterior probabilities of $\mathbf{x}_j$, respectively. It should be noted that $\mathrm{BvSB}$ values are inversely related to the amount of uncertainty in a sample, with smaller values indicating greater uncertainty.
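The conversion from raw ELM outputs to BvSB uncertainty scores (equations (5)-(7)) can be sketched as follows; the function name is illustrative.

import numpy as np

def bvsb_uncertainty(raw_outputs):
    # raw_outputs: (n_samples, n_classes) matrix of ELM output-node activations
    p = 1.0 / (1.0 + np.exp(-raw_outputs))   # sigmoid mapping, equation (5)
    p = p / p.sum(axis=1, keepdims=True)     # normalization, equation (6)
    top2 = np.sort(p, axis=1)[:, -2:]        # second-best and best class probabilities
    return top2[:, 1] - top2[:, 0]           # BvSB score, equation (7); smaller = more uncertain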

3.2. Representative Information by the Diversity Criterion

The selection of redundant or overly similar samples is of little use when attempting to construct a robust classifier. It is therefore necessary to use a diversity criterion to select a batch of samples that are diverse in nature. A feasible way of measuring the diversity of uncertain samples is the cosine angle distance. Following this approach, the similarity between two samples $\mathbf{x}_i$ and $\mathbf{x}_j$ is given by

$\mathrm{sim}(\mathbf{x}_i, \mathbf{x}_j) = \cos(\mathbf{x}_i, \mathbf{x}_j) = \frac{\mathbf{x}_i \cdot \mathbf{x}_j}{\|\mathbf{x}_i\| \, \|\mathbf{x}_j\|}. \qquad (8)$

As can be seen from equation (8), the similarity between the two samples $\mathbf{x}_i$ and $\mathbf{x}_j$ is small if these two samples are far from each other, and vice versa.

Suppose a batch of samples $S = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m\}$. If the value of $\mathrm{sim}(\mathbf{x}, S)$ is small, then the new sample $\mathbf{x}$ is diverse from the samples in $S$. The similarity between a new sample $\mathbf{x}$ and $S$ is defined as

$\mathrm{sim}(\mathbf{x}, S) = \max_{\mathbf{x}_i \in S} \mathrm{sim}(\mathbf{x}, \mathbf{x}_i). \qquad (9)$

Note that a smaller $\mathrm{sim}(\mathbf{x}, S)$ value implies more diversity between $\mathbf{x}$ and $S$.

In order to avoid selecting highly redundant samples, a novel diversity criterion is defined by combining the similarity between a query sample and the labeled set with the similarity between any two candidate query samples. This calculation is given by

$\mathrm{Div}(\mathbf{x}_i) = \mathrm{sim}(\mathbf{x}_i, S \setminus \{\mathbf{x}_i\}) + \mathrm{sim}(\mathbf{x}_i, D_l), \qquad (10)$

where $\mathrm{sim}(\mathbf{x}_i, S \setminus \{\mathbf{x}_i\})$ represents the diversity between the sample $\mathbf{x}_i$ and the candidate set $S$ (apart from $\mathbf{x}_i$ itself), and $\mathrm{sim}(\mathbf{x}_i, D_l)$ represents the diversity between the sample $\mathbf{x}_i$ and the labeled training set $D_l$.
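A possible NumPy realization of the similarity and diversity measures in equations (8)-(10) is sketched below. Aggregating set similarity with the maximum over set members, and combining the two similarity terms by simple addition, are assumptions of this sketch rather than details confirmed by the text.

import numpy as np

def cosine_similarity(a, b):
    # Equation (8): cosine angle similarity between two samples
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def similarity_to_set(x, S):
    # Equation (9): similarity between a sample x and a set of samples S
    # (maximum over set members is an assumed aggregation)
    return max(cosine_similarity(x, s) for s in S)

def diversity_score(i, candidates, labeled_set):
    # Equation (10): similarity of candidate i to the remaining candidates
    # plus its similarity to the labeled set D_l (smaller = more diverse)
    others = [candidates[j] for j in range(len(candidates)) if j != i]
    return similarity_to_set(candidates[i], others) + \
           similarity_to_set(candidates[i], labeled_set)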

3.3. Proposed D-AL-ELM Algorithm

The BvSB sampling method is a highly effective strategy for sample selection in active learning. Unfortunately, BvSB may also select uncertain samples that contain highly redundant information, which reduces the information available for classification. To address this problem, the selection is refined so that only optimal samples are retained. An ideal sample not only furnishes significant information for the classifier but also shows diversity from the candidate unlabeled set and minimal redundancy with respect to the labeled set.

The specific steps for each iteration of the D-AL-ELM algorithm are as follows:
Step 1: the BvSB strategy is adopted to select the $k$ most uncertain samples from the unlabeled sample pool $D_u$.
Step 2: let $S_u$ denote the set of the $k$ most uncertain samples, and let $S_b$ be an arbitrary subset containing $m$ samples selected from $S_u$. Two evaluations are then performed for each candidate: its diversity from the labeled set $D_l$ and the candidate set $S_u$, and its similarity to the samples in $S_b$.
Step 3: combining the discriminative and representative parts, the following formulation is obtained to select the samples that are uncertain and diverse from each other:

$\mathbf{x}^{\ast} = \arg\min_{\mathbf{x}_i \in S_u} \left\{ \lambda \, \mathrm{BvSB}(\mathbf{x}_i) + (1 - \lambda)\, \mathrm{Div}(\mathbf{x}_i) \right\}, \qquad (11)$

where $\lambda$ is a tradeoff parameter that balances the informativeness and representativeness criteria, $\mathrm{Div}(\mathbf{x}_i)$ is the diversity measure of equation (10), and $D_l$ is the labeled training set. $\mathbf{x}^{\ast}$ denotes the unlabeled sample that will be annotated and then included in the labeled training dataset for updating the ELM classifier.
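One way to realize equation (11) in code is to score each uncertain candidate and greedily take the $m$ lowest-scoring ones, reusing the helper functions sketched above; the exact aggregation is an assumption of this sketch, not the paper's confirmed formulation.

import numpy as np

def select_batch(bvsb_scores, candidates, labeled_set, m, lam):
    # Combined score per candidate: lam * BvSB + (1 - lam) * diversity,
    # where smaller values mean more uncertain and more diverse (equation (11))
    scores = [lam * bvsb_scores[i]
              + (1.0 - lam) * diversity_score(i, candidates, labeled_set)
              for i in range(len(candidates))]
    return np.argsort(scores)[:m]   # indices (within the candidate set) of the m chosen samples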

The implementation of the proposed method is summarized in Algorithm 1.

Inputs: labeled set $D_l$ with $N_l$ labeled samples, unlabeled pool $D_u$ with $N_u$ unlabeled samples, the tradeoff parameter ($\lambda$), the number of samples selected on the basis of their uncertainty ($k$), the batch size ($m$), and the terminating condition.
Output: the final learned ELM classifier.
(1) Train the ELM classifier using the labeled set $D_l$.
(2) Repeat
(3) Calculate the estimated probability for the samples in $D_u$ with the pretrained ELM classifier according to equation (5) or (6).
(4) Calculate the uncertainty level of each sample in $D_u$ using equation (7).
(5) Include the $k$ most uncertain samples in the set $S_u$.
(6) Select $m$ samples from $S_u$ using equation (11).
(7) Label the $m$ selected samples.
(8) Update the labeled set $D_l$ and the unlabeled set $D_u$.
(9) Use the extended set $D_l$ to train a new ELM classifier.
(10) Until the terminating condition is satisfied.
(11) Return the final learned ELM classifier.
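Putting the pieces together, a compact and simplified sketch of Algorithm 1, reusing the helper functions from Sections 2.2 and 3 (train_elm, elm_output, bvsb_uncertainty, select_batch), might look as follows; class labels are assumed to be integers 0..c-1, and the annotate callable stands in for the human expert.

import numpy as np

def d_al_elm(D_l, y_l, D_u, annotate, k, m, lam, n_hidden, C, n_rounds):
    n_classes = len(np.unique(y_l))
    W, b, beta = train_elm(D_l, np.eye(n_classes)[y_l], n_hidden, C)      # step (1)
    for _ in range(n_rounds):                                             # steps (2)-(10)
        bvsb = bvsb_uncertainty(elm_output(D_u, W, b, beta))              # steps (3)-(4)
        uncertain_idx = np.argsort(bvsb)[:k]                              # step (5)
        chosen = select_batch(bvsb[uncertain_idx], D_u[uncertain_idx],
                              D_l, m, lam)                                # step (6)
        idx = uncertain_idx[chosen]
        y_new = annotate(D_u[idx])                                        # step (7)
        D_l = np.vstack([D_l, D_u[idx]])                                  # step (8)
        y_l = np.concatenate([y_l, y_new])
        D_u = np.delete(D_u, idx, axis=0)
        W, b, beta = train_elm(D_l, np.eye(n_classes)[y_l], n_hidden, C)  # step (9)
    return W, b, beta                                                     # step (11)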

In order to quantitatively evaluate the quality of each learning algorithm, the area under the learning curve (ALC) [13] was calculated as a performance metric, which is described as

$\mathrm{ALC} = \frac{1}{N_r}\sum_{i=1}^{N_r} \mathrm{acc}_i, \qquad (12)$

where $N_r$ denotes the number of learning iterations and $\mathrm{acc}_i$ denotes the classification accuracy at the $i$-th learning round, such that $0 \leq \mathrm{ALC} \leq 1$. It is noted that the larger the ALC value, the better the performance of the learning algorithm.
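Computing the ALC metric of equation (12) is straightforward; for example:

import numpy as np

def area_under_learning_curve(accuracies):
    # Equation (12): mean classification accuracy over all learning rounds
    return float(np.mean(accuracies))

# Hypothetical accuracies from five learning rounds
print(area_under_learning_curve([0.62, 0.70, 0.74, 0.78, 0.80]))   # 0.728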

4. Experimental Results and Discussions

In this section, several experiments were performed on benchmark datasets and multiclass MI EEG data to evaluate the performance of the proposed D-AL-ELM method in comparison with other state-of-the-art approaches, including passive learning-based ELM, AL-ELM [13], and entropy-based ELM [14]. All methods were implemented in the MATLAB 2014b environment on a computer with a 2.5 GHz processor and 4.0 GB RAM.

4.1. Experiments on the Benchmark Datasets
4.1.1. Description of the Benchmark Datasets

A series of experiments were performed to evaluate the D-AL-ELM algorithm on 9 benchmark datasets from the KEEL [29] and UCI [30] dataset repositories. The datasets include both binary and multiclass classification problems. As in [13], each raw dataset was divided into three parts: a small initial labeled set, a large unlabeled set, and a testing set. Testing instances comprised 50% of the total number of samples, while the percentage of initially labeled instances was assigned based on the size of the raw dataset and the number of categories. Detailed information regarding these datasets is presented in Table 1.

4.1.2. The Compared Algorithms, Parameter Settings, and the Performance Metric

In our experiments, we compare the proposed method with other state-of-the-art learning algorithms, including the following:
(1) PL-ELM: a passive learning algorithm that randomly selects instances from the unlabeled set to train the classifier
(2) AL-ELM: a batch-mode active learning method based on ELM that uses the margin sampling strategy to select the most uncertain examples for labeling [13]
(3) ELM-Entropy: an ELM-based active learning method that queries discriminative samples through entropy measures [14]

In this study, the ELM adopted a sigmoid function as the activation function in the hidden layer. A grid search based on tenfold cross-validation was used to find the optimal number of hidden nodes $L$ on the initial labeled set. For the regularization parameter $C$, a leave-one-out (LOO) cross-validation strategy based on the minimum LOO error was adopted to find the optimal parameter value [31]. The optimal $L$ and $C$ values were selected from predefined candidate sets on all the datasets except for the Letter dataset, for which a different search range was used. Additionally, the tradeoff parameter $\lambda$ in equation (11) was chosen by grid search once $L$ and $C$ had been fixed through the aforementioned methods. It should be noted that the ELM parameter selection process was implemented in the same manner for all four methods.
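The parameter search could be implemented, for example, as a simple cross-validated grid search over candidate (n_hidden, C) pairs using the ELM sketch from Section 2.2; the candidate grids shown here are hypothetical placeholders, not the paper's actual search ranges.

import numpy as np

def cv_accuracy(X, y, n_hidden, C, n_folds=10, seed=0):
    # Mean accuracy of the ELM sketch over n_folds cross-validation folds
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, n_folds)
    n_classes = len(np.unique(y))
    accs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        W, b, beta = train_elm(X[train], np.eye(n_classes)[y[train]], n_hidden, C)
        pred = elm_output(X[fold], W, b, beta).argmax(axis=1)
        accs.append(np.mean(pred == y[fold]))
    return float(np.mean(accs))

def grid_search(X, y, hidden_grid=(50, 100, 200), C_grid=(0.1, 1.0, 10.0)):
    # Return the (n_hidden, C) pair with the highest cross-validated accuracy
    return max(((h, c) for h in hidden_grid for c in C_grid),
               key=lambda hc: cv_accuracy(X, y, hc[0], hc[1]))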

Parameter details are shown in Table 2. It should be noted that the regularization parameter $C$ was automatically identified using LOO cross-validation and was not fixed during the learning process (and is thus not shown in Table 2).

The batch mode was adopted to add new labeled instances. For the proposed D-AL-ELM method, $k$ samples were first selected from the unlabeled set using equation (7), and then $m$ samples were selected from these $k$ samples using equation (11) and added to the labeled set at each iteration. In this experiment, $k$ was set empirically, while $m$ was 5% of the total instances in the original unlabeled set for 8 of the 9 datasets (Letter excepted). For the Letter dataset, $m$ was 1% of the total instances in the original unlabeled set, with $k$ again set empirically. These parameters were chosen to decrease the labeling cost, considering the size of the raw dataset and the number of categories.

To provide a fair comparison, all four methods queried $m$ instances on each iteration. For each dataset, the procedure was stopped when the prediction accuracy stabilized or when the number of selected samples exceeded 80% of the original unlabeled set. Additionally, to ensure the validity of the experimental results, ten runs were performed for each learning method in each experiment, and average results were calculated.

4.1.3. Comparisons with Relevant State-of-the-Art Algorithms

Figure 1 shows the trends of classification accuracy for the classifiers when trained with increasing numbers of data points across the various datasets. The results show that the proposed D-AL-ELM algorithm yielded the highest accuracy of all four methods on most of the datasets (excepting the Wine and Iris datasets) at the last learning round. Specifically, the proposed method performed better than the remaining three methods over the majority of the active learning period for the Twonorm, Hayes-Roth, and Letter datasets. Moreover, the D-AL-ELM yielded the fastest learning rate over the first few iterations of the learning process for most datasets. This phenomenon indicates that the new method begins by effectively identifying the most informative and representative samples, unlike the other algorithms. Additionally, the ELM-Entropy approach generally yielded lower accuracy in multiclass classification, failing to surpass the PL-ELM on the Wine, Hayes-Roth, Iris, and Letter datasets. Another interesting observation was that performance tended to degrade over a certain interval on the Segment dataset. It is possible that the Segment dataset has a more irregular data structure, confounding the BvSB strategy and deteriorating the result. In cases such as this, a more adaptive stopping criterion should be designed to halt the learning program at a more appropriate time, before the output degrades.

Table 3 presents the mean classification accuracies of the four methods across the 9 datasets during the learning process. The ALC values for the four methods are further compared in Table 4. The results shown in Tables 3 and 4 indicate that the D-AL-ELM method yielded the best performance of the tested methods across all datasets. As noted in [13], the ALC metric reflects not only the learning speed but also the quality of the learning model. The proposed D-AL-ELM outperformed the other methods in terms of ALC, with the AL-ELM performing second best and achieving ALC values close to those of the D-AL-ELM on the Wdbc and Segment datasets. For the Wdbc dataset, although the proposed method had a slightly higher ALC value than the AL-ELM, both algorithms yielded the same mean accuracy over the overall learning process.

Finally, Table 5 reports the average time for the learning stage of each algorithm across all datasets. As expected, the PL-ELM was the fastest method because it lacked any criteria for the evaluation of samples. The proposed D-AL-ELM required slightly more learning time than AL-ELM and ELM-Entropy, since it computed both informativeness and representativeness of each instance. Considering the improvement of classification performance, this extra time may be deemed an acceptable tradeoff.

4.1.4. Analysis of Effect of Different Batch Size Values

In this experiment, the performance of the proposed active learning method was further evaluated using different batch sizes (i.e., different $k$ and $m$ values).

The new method was tested with different querying sizes by varying the values of $k$ and $m$, respectively. The remaining experimental settings were the same as in the earlier experiments, and testing was conducted on two benchmark datasets: Iris and Wine. The $k$ and $m$ parameters were varied to observe the performance with different batch sizes, and the results are reported in Figures 2 and 3. In Figure 2, $m$ was fixed at 5% of the total number of instances in the original unlabeled set and $k$ was chosen from a set of candidate values. In Figure 3, $k$ was fixed and $m$ was chosen from a set of candidate values. It can be seen from Figure 2 that learning rates at the start of the curve increased with higher $k$ values. Performance on Iris was less sensitive to the $k$ value when enough instances were queried, and learning curves tended to be similar when query numbers and $k$ values were large. In contrast, performance on the Wine dataset was more sensitive to $k$. This may be a result of the Wine dataset having a more complex distribution, which is difficult to capture. Although the D-AL-ELM performed differently on the two datasets, relatively larger $k$ values were consistently able to obtain favorable performance. On the other hand, increasing $k$ leads to a greater computational burden. Figure 3 shows the effects of different values of $m$ on the Iris and Wine datasets. From this, it was observed that convergence can be more easily achieved with small $m$ values. Alternatively, when $m$ is large, more instances can be learned at each iteration and the number of total iterations is greatly reduced, although this does not provide substantially increased accuracy. In conclusion, fine-tuning the $k$ and $m$ values is not crucial for the D-AL-ELM, as most values yield similar results. It should be noted, however, that larger $k$ and $m$ values are generally recommended.

4.2. Experiment on Multiclass MI EEG Data
4.2.1. Description of EEG Datasets

This section further evaluates the performance of the proposed D-AL-ELM method on multiclass MI EEG data from the BCI Competition IV Dataset 2a [32]. This dataset consists of the EEG signals from 9 subjects who performed 4 tasks, including left hand, right hand, foot, and tongue MI. EEG signals were recorded using 22 electrodes. Each subject underwent a training and testing session, each consisting of 288 trials (a total of 576 trials across the two sessions).

4.2.2. Experimental Setup and Parameter Settings

Data preprocessing was first performed on the raw EEG data. For each trial, features were extracted from the time segment lasting from 0.5 s to 2.5 s after the cue instructing the subject to perform MI. Each trial was first band-pass filtered from 8 to 30 Hz using a fifth-order Butterworth filter. Next, the dimension of the EEG signal was reduced to a 24-dimensional feature vector using the one-versus-rest common spatial pattern (OVR-CSP) algorithm [33], an effective and popular feature extraction method for multiclass EEG classification that computes the features discriminating each class from the remaining classes. Finally, the features extracted by OVR-CSP were classified using the different classification methods.
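For the preprocessing step, the band-pass filtering and time-window cropping might be implemented as follows; the 250 Hz sampling rate and the cue onset time are assumptions about the recording layout, and the OVR-CSP feature extraction step is omitted here.

import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_trial(trial, fs=250, cue_onset=2.0):
    # trial: (n_channels, n_samples) raw EEG of a single trial
    # Fifth-order Butterworth band-pass filter, 8-30 Hz
    b, a = butter(N=5, Wn=[8, 30], btype='bandpass', fs=fs)
    filtered = filtfilt(b, a, trial, axis=1)
    # Keep the segment from 0.5 s to 2.5 s after the MI cue
    start, stop = int((cue_onset + 0.5) * fs), int((cue_onset + 2.5) * fs)
    return filtered[:, start:stop]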

Optimal selection of the $L$, $C$, and $\lambda$ parameters was performed in the same manner as described in Section 4.1.2, with the number of hidden nodes searched within a predefined candidate range. For each subject, the first 400 trials were considered the training set, while the remaining 176 trials were used as the independent testing set [11]. The $k$ and $m$ values were fixed in advance for all subjects. Finally, experiments included ten runs for each learning method, from which average results were calculated.

4.2.3. Comparisons with Related Algorithms

Figure 4 illustrates the trend lines of classification accuracy when methods were applied to different testing datasets, while Table 6 lists the mean classification accuracies of the four methods during the learning process. Table 7 then provides the ALC results, while Table 8 shows the average running time (s) for the learning stage.

The results show that the performance of D-AL-ELM method is comparable to that of the AL-ELM and better than that of the ELM-Entropy and PL-ELM algorithms for all subjects (except for subject 2 in PL-ELM). Specifically, the proposed method surpassed the AL-ELM approach in 6 of the 9 subjects (1, 2, 4, 5, 6, 9) in terms of the ALC metric. For all 9 subjects, the D-AL-ELM method yielded a mean accuracy of 71.36%, higher than that of AL-ELM (70.92%), ELM-Entropy (70.34%), and PL-ELM (70.51%). These results demonstrate the effectiveness of the D-AL-ELM in selecting both informative and representative instances from unlabeled EEG samples. Additionally, they reveal that the proposed method can calibrate an effective classifier for MI EEG signals without the need for a large number of labeled training samples.

For comparative purposes, Table 8 also provides the average running time of each learning algorithm. Although the D-AL-ELM exhibited slightly longer training time than the other three methods, this may be considered a worthwhile tradeoff for the improved classification performance of the D-AL-ELM.

4.3. Discussion

In these experiments, the proposed D-AL-ELM method exhibited excellent performance in both classification accuracy and computational efficiency, as demonstrated on several benchmark datasets and an experimental MI EEG dataset. When compared to a passive learning-based ELM, D-AL-ELM achieved improved performance by effectively extracting the most valuable unlabeled samples. The D-AL-ELM also outperformed the AL-ELM and ELM-Entropy algorithms, which both employed a single-query strategy. Improvement was seen on all nine datasets in Section 4.1, evidencing the ability of the D-AL-ELM to boost overall learning performance by combining the uncertainty and diversity strategies when updating the classifier with the selected samples. In terms of computational efficiency, the slight increase in training time for the D-AL-ELM, as compared to the PL-ELM, AL-ELM, and ELM-Entropy, was negligible in practice, especially when considering the improved classification accuracy. The experimental results then demonstrate that the proposed algorithm can effectively and comprehensively measure the representativeness of samples. Simultaneously, the proposed approach also measures how informative individual examples are, contributing to the improved classifier performance. Combining these factors, suitable instances can be selected for classifier construction.

Finally, the effectiveness of the D-AL-ELM was shown in its application to an experimental multiclass MI task from the BCI Competition IV Dataset 2a. Due to the low signal-to-noise ratio of raw EEG data, classifiers applied directly to it struggle to produce adequate results. Consequently, hand-designed features were first extracted from the raw EEG data using the OVR-CSP approach, and the different AL algorithms were then used to select unlabeled samples and calibrate a robust classifier. For subjects S1, S3, S7, S8, and S9, the D-AL-ELM yielded an acceptably high mean classification accuracy of over 80% for the whole learning process. Unfortunately, all tested methods performed poorly on subject S5. The proposed algorithm was only able to achieve 49.06% accuracy which, though insufficient, still ranked the highest among the applied methods.

5. Conclusion

In this paper, a novel active learning method with ELM, the D-AL-ELM, was developed for multiclassification. This new algorithm combines the uncertainty and diversity strategies and effectively reduces the expense and time cost of obtaining labeled data manually. For each sample, the proposed algorithm employs a BvSB strategy to measure informativeness and the cosine angle distance to measure diversity. The modified diversity measure not only estimates the diversity between the limited labeled training data and previously selected unlabeled samples, but also calculates the similarity among previously selected samples. Experimental results from several benchmark datasets and the multiclass MI EEG data from BCI Competition IV Dataset 2a were then used to verify the efficacy of the proposed D-AL-ELM algorithm. These results indicate that the performance of the proposed algorithm is consistently better than, or at least comparable to, that of other popular active learning techniques. Future work will then aim to develop online learning for the D-AL-ELM [23, 34]. In addition, an adaptive stopping criterion may be applied to promote the efficiency of the D-AL-ELM and improve its abilities for the classification and evaluation of MI EEG signals.

Data Availability

The BCI Competition IV Dataset 2a was used in our study, which is publicly available via the following link: http://www.bbci.de/competition/iv/.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61871427 and 61671197). The authors would like to acknowledge the BCI Competition IV Dataset 2a which was used to test the algorithms proposed in this study.