Elsevier

Neural Networks

Volume 57, September 2014, Pages 39-50

Bayesian common spatial patterns for multi-subject EEG classification

https://doi.org/10.1016/j.neunet.2014.05.012

Abstract

Multi-subject electroencephalography (EEG) classification involves developing algorithms that automatically categorize brain waves measured from multiple subjects, each of whom undergoes the same mental task. Common spatial patterns (CSP) or its probabilistic counterpart, PCSP, is a popular discriminative feature extraction method for EEG classification. Models in CSP or PCSP are trained on a subject-by-subject basis, so inter-subject information is neglected. In multi-subject EEG classification, however, it is desirable to capture inter-subject relatedness when learning a model. In this paper we present a nonparametric Bayesian model for a multi-subject extension of PCSP in which subject relatedness is captured by assuming that spatial patterns across subjects share a latent subspace. Spatial patterns and the shared latent subspace are jointly learned by variational inference. We use an infinite latent feature model to automatically infer the dimension of the shared latent subspace, placing Indian buffet process (IBP) priors on our model. Numerical experiments on the BCI competition III IVa and IV 2a datasets demonstrate the high performance of our method compared to PCSP and existing Bayesian multi-task CSP models.

Introduction

Electroencephalography (EEG) is the recording of electrical potentials using multiple sensors placed on the scalp, to collect multivariate time series data involving brain activities. EEG classification enables computers to translate a subject’s intention or mental status into a control signal for a device, providing a new communication mode, known as brain–computer interface (BCI) (Cichocki et al., 2008, Dornhege et al., 2007, Ebrahimi et al., 2003). Multi-subject EEG classification considers brain signals from multiple subjects, each of whom undergoes the same mental task, where the brain waves reflect task-specific and subject-specific characteristics, as well as inter-subject variations.

Common spatial patterns (CSP) is a popular discriminative EEG feature extraction method, which has proved to be a useful subject-specific spatial filter (Blankertz et al., 2008, Koles, 1991, Müller-Gerking et al., 1999, Ramoser et al., 2000). Given two class-conditional covariance matrices, CSP seeks a discriminative subspace in which the variance for one class is maximized while the variance for the other class is minimized. The solution is obtained by simultaneously diagonalizing the two class-conditional covariance matrices, which is equivalent to solving a generalized eigenvalue problem. Probabilistic CSP (PCSP) (Wu et al., 2009, Wu et al., 2010, Wu et al., 2011) is a probabilistic counterpart of CSP, in which two linear Gaussian generative models with a shared basis matrix are jointly learned to infer spatial patterns corresponding to the column vectors of the shared basis matrix. Inherently, CSP or PCSP is a subject-specific spatial filter in which subject relatedness is neglected. When the number of training samples for a subject of interest is small, such subject-specific models suffer from performance degradation. Multi-subject extensions of CSP can be found in Devlaminck, Wyns, Grosse-Wentrup, Otte, and Santens (2011), Kang, Nam, and Choi (2009), Krauledat, Tangermann, and Blankertz (2008) and Lotte and Guan (2010). In regularized CSP methods, the class-conditional covariance matrices for a subject of interest are regularized by a linear combination of other subjects’ covariance matrices, in order to incorporate inter-subject relatedness (Kang et al., 2009, Lotte and Guan, 2011). In Devlaminck et al. (2011), learning CSP is reformulated as a risk minimization problem with a regularizer that constrains spatial filters to be similar across subjects. Another multi-subject EEG classification method uses features other than CSP and regularizes the learning of subject-specific parameters so that they become similar across subjects (Alamgir, Grosse-Wentrup, & Altun, 2010).
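To make the CSP solution concrete, the following is a minimal sketch of the classical formulation as a generalized eigenvalue problem, assuming band-pass-filtered trials as input. The function and variable names (csp_filters, trials_1, trials_2, n_filters) are illustrative assumptions, not taken from any of the cited implementations.

```python
# A minimal sketch of classical CSP via the generalized eigenvalue problem.
import numpy as np
from scipy.linalg import eigh


def csp_filters(trials_1, trials_2, n_filters=3):
    """Compute 2*n_filters CSP spatial filters.

    trials_1, trials_2 : arrays of shape (n_trials, n_channels, n_samples)
        Band-pass-filtered EEG trials for the two classes.
    """
    def mean_cov(trials):
        # Average normalized spatial covariance over trials.
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
        return np.mean(covs, axis=0)

    s1, s2 = mean_cov(trials_1), mean_cov(trials_2)

    # Simultaneous diagonalization: solve s1 w = lambda (s1 + s2) w.
    eigvals, eigvecs = eigh(s1, s1 + s2)  # eigenvalues in ascending order

    # Filters with the largest eigenvalues maximize class-1 variance,
    # those with the smallest eigenvalues maximize class-2 variance.
    idx = np.concatenate([np.arange(n_filters),
                          np.arange(len(eigvals) - n_filters, len(eigvals))])
    return eigvecs[:, idx]  # shape (n_channels, 2 * n_filters)
```

The filters associated with the largest and smallest generalized eigenvalues together span the discriminative subspace described above.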

On the other hand, multi-task learning has been adopted in the probabilistic model for CSP, treating subjects as tasks. Bayesian multi-task learning (Heskes, 2000, Xue et al., 2007) deals with several related tasks at the same time, with the intention that the tasks learn from each other by sharing hyperparameters (the parameters of prior distributions). A Bayesian multi-task extension of CSP (BCSP) was recently developed in Kang and Choi (2011), where subject-to-subject information is transferred during the learning of the model for a subject of interest by sharing hyperparameters across subjects, treating subjects as tasks. Bayesian CSP (Kang & Choi, 2011) works better than PCSP, which is trained on a subject-by-subject basis, although similarities or dissimilarities between spatial patterns are neglected because all spatial patterns are forced to share the same hyperparameters. Bayesian CSP with Dirichlet process (DP) priors (BCSP-DP) (Kang & Choi, 2012) jointly learns and groups spatial patterns, so that spatial patterns in the same group, determined by the DP mixture model, share the hyperparameters of their prior distributions. Coupling similar spatial patterns in the same cluster by sharing hyperparameters facilitates information transfer among subjects with similar spatial patterns, whereas information transfer is prevented among dissimilar subjects. However, information transfer across clusters is not possible, so the characteristics common to all subjects are not captured.

In this paper we present a hierarchical Bayesian model in which spatial patterns across subjects are assumed to share a latent subspace to capture subject relatedness, which is determined by measurements from a subject of interest as well as from other subjects. Motivated by the idea of nonparametric Bayesian multi-task learning (Rai and Daumé, 2010, Zhang et al., 2008), we use an infinite latent feature model to automatically infer the dimension of the shared latent subspace, placing Indian buffet process (IBP) priors (Griffiths & Ghahramani, 2006) on our model. We refer to the proposed method as BCSP-IBP and develop a variational inference algorithm.
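To make the role of the IBP prior concrete, the following is a minimal, self-contained sketch of drawing a binary feature-assignment matrix from an Indian buffet process. It illustrates how the number of active latent features, and hence the effective dimension of a shared subspace, is left unbounded and inferred from data rather than fixed in advance. The sampler, the concentration value alpha, and all variable names are illustrative assumptions, not part of the paper's variational inference algorithm.

```python
# A minimal sketch of sampling Z ~ IBP(alpha); rows index objects
# (e.g. spatial patterns), columns index latent features.
import numpy as np

rng = np.random.default_rng(0)


def sample_ibp(n_rows, alpha=2.0):
    """Draw a binary matrix from the Indian buffet process."""
    columns, counts = [], []    # feature columns and their usage counts
    for i in range(n_rows):
        # Revisit existing features: feature k is taken w.p. counts[k] / (i + 1).
        for k, m_k in enumerate(counts):
            take = int(rng.random() < m_k / (i + 1))
            columns[k].append(take)
            counts[k] += take
        # The i-th object also activates Poisson(alpha / (i + 1)) new features.
        for _ in range(rng.poisson(alpha / (i + 1))):
            columns.append([0] * i + [1])
            counts.append(1)
    if not columns:
        return np.zeros((n_rows, 0), dtype=int)
    return np.array(columns, dtype=int).T   # shape (n_rows, K), K inferred


Z = sample_ibp(10)
print(Z.shape)   # the number of columns K varies from draw to draw
```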

The rest of this paper is organized as follows. In Section 2, we briefly review relevant existing methods, including CSP, PCSP, and Bayesian CSP models. Section 3 presents the main contribution of this paper: Bayesian CSP with Indian buffet process priors, for which the model and the variational inference algorithm are described. Section 4 presents numerical experiments on the BCI competition III IVa and IV 2a datasets, comparing the classification performance of our proposed method to that of existing methods. Finally, conclusions are drawn in Section 5.

Section snippets

Related work

In this section, we briefly review two existing Bayesian CSP models (Kang and Choi, 2011, Kang and Choi, 2012) as well as CSP (Koles, 1991) and PCSP (Wu et al., 2009).

Bayesian CSP with Indian buffet process priors

In this section we present the main contribution of this paper. In order to alleviate limitations in the two previous Bayesian CSP models (Kang and Choi, 2011, Kang and Choi, 2012), we base our proposed method, referred to as BCSP-IBP, on infinite latent feature models (Griffiths and Ghahramani, 2006, Rai and Daumé, 2010, Zhang et al., 2008), assuming that spatial patterns across subjects share a latent subspace, referred to as the shared latent subspace, to capture subject relatedness. In our
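As a rough illustration of the shared-latent-subspace assumption only (this is not the paper's exact generative model, which is truncated in this preview), one can picture each subject's spatial patterns as linear combinations of a common basis, gated per pattern by binary latent features and scaled by subject-specific coefficients. All names and dimensions below are hypothetical.

```python
# Illustrative sketch: subjects share one latent basis B; subject-specific
# binary gates Z and weights V select and scale the shared latent dimensions.
import numpy as np

rng = np.random.default_rng(1)

n_channels, K = 22, 5                      # illustrative sensor count and latent dimension
B = rng.normal(size=(K, n_channels))       # latent basis shared by all subjects


def subject_spatial_patterns(n_patterns, p_active=0.5):
    """Generate spatial patterns for one subject from the shared basis."""
    Z = rng.random((n_patterns, K)) < p_active   # which latent dimensions are used
    V = rng.normal(size=(n_patterns, K))         # subject-specific coefficients
    return (Z * V) @ B                           # shape (n_patterns, n_channels)


W_subject1 = subject_spatial_patterns(6)
W_subject2 = subject_spatial_patterns(6)
# The two subjects differ in Z and V but share B, which is how inter-subject
# relatedness would be captured under this assumption.
```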

Experiments

We compare the performance of our proposed model BCSP-IBP to existing models such as CSP, PCSP, BCSP (Kang & Choi, 2011), and BCSP-DP (Kang & Choi, 2012), on the BCI competition III IVa and IV 2a datasets (Blankertz et al., 2004, Blankertz et al., 2006, Sajda et al., 2003). The dimensionality of the feature vectors was set to six (n=3) in every model. We also compared the performances of two recent CSP-based multi-subject
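As a point of reference for the six-dimensional feature vectors mentioned above (n=3 filter pairs), the following sketch shows the standard CSP log-variance feature extraction applied to a single trial. The helper name and the normalization are illustrative assumptions rather than the exact pipeline used in the experiments.

```python
# A minimal sketch of log-variance feature extraction from CSP-filtered trials;
# `filters` can come from a routine such as the csp_filters sketch shown earlier.
import numpy as np


def log_variance_features(trial, filters):
    """trial: (n_channels, n_samples); filters: (n_channels, 2n) -> (2n,) features."""
    projected = filters.T @ trial            # spatially filtered signals
    var = projected.var(axis=1)              # variance of each filtered signal
    return np.log(var / var.sum())           # normalized log-variance features


# With n = 3 (three filters per class), each trial is mapped to a
# six-dimensional feature vector, which is then fed to a classifier.
```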

Conclusions

We have presented a Bayesian CSP model with IBP priors for multi-subject EEG classification, where spatial patterns are modeled by an infinite latent feature model, assuming that a latent subspace is shared across subjects to capture subject relatedness. Spatial patterns are coupled through sharing a common latent subspace but individual characteristics are reflected by subject-specific coefficients. Compared to previous Bayesian CSP models that were reviewed in this paper, our proposed model,

Acknowledgments

This work was supported by National Research Foundation (NRF) of Korea (NRF-2013R1A2A2A01067464) and the IT R&D Program of MSIP/KEIT (14-824-09-014, Machine Learning Center). A preliminary version was presented in ICASSP-2013 (Kang & Choi, 2013).

References (36)

  • A. Cichocki et al. Noninvasive BCIs: multiway signal-processing array decompositions. IEEE Computer (2008)
  • D. Devlaminck et al. Multisubject learning for common spatial patterns in motor-imagery BCI. Computational Intelligence and Neuroscience (2011)
  • T.G. Dietterich. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation (1998)
  • G. Dornhege et al. Toward brain–computer interfacing (2007)
  • F. Doshi-Velez et al. Variational inference for the Indian buffet process. In... (2009)
  • T. Ebrahimi et al. Brain–computer interface in multimedia communication. IEEE Signal Processing Magazine (2003)
  • T.L. Griffiths et al. Infinite latent feature models and the Indian buffet process (2006)
  • T. Heskes. Empirical Bayes for learning to learn. In Proceedings of the international conference on machine... (2000)