Elsevier

Neural Networks

Volume 57, September 2014, Pages 39-50

Bayesian common spatial patterns for multi-subject EEG classification

https://doi.org/10.1016/j.neunet.2014.05.012

Abstract

Multi-subject electroencephalography (EEG) classification involves developing algorithms that automatically categorize brain waves measured from multiple subjects, each of whom undergoes the same mental task. Common spatial patterns (CSP) or its probabilistic counterpart, PCSP, is a popular discriminative feature extraction method for EEG classification. Models in CSP or PCSP are trained on a subject-by-subject basis, so inter-subject information is neglected. In multi-subject EEG classification, however, it is desirable to capture inter-subject relatedness when learning a model. In this paper we present a nonparametric Bayesian model for a multi-subject extension of PCSP in which subject relatedness is captured by assuming that spatial patterns across subjects share a latent subspace. Spatial patterns and the shared latent subspace are jointly learned by variational inference. We use an infinite latent feature model to automatically infer the dimension of the shared latent subspace, placing Indian buffet process (IBP) priors on our model. Numerical experiments on the BCI competition III IVa and IV 2a datasets demonstrate the high performance of our method compared to PCSP and existing Bayesian multi-task CSP models.

Introduction

Electroencephalography (EEG) is the recording of electrical potentials using multiple sensors placed on the scalp, to collect multivariate time series data involving brain activities. EEG classification enables computers to translate a subject’s intention or mental status into a control signal for a device, providing a new communication mode, known as brain–computer interface (BCI) (Cichocki et al., 2008, Dornhege et al., 2007, Ebrahimi et al., 2003). Multi-subject EEG classification considers brain signals from multiple subjects, each of whom undergoes the same mental task, where the brain waves reflect task-specific and subject-specific characteristics, as well as inter-subject variations.

Common spatial patterns (CSP) is a popular discriminative EEG feature extraction method, which has proved to be a useful subject-specific spatial filter (Blankertz et al., 2008, Koles, 1991, Müller-Gerking et al., 1999, Ramoser et al., 2000). Given two class-conditional covariance matrices, CSP seeks a discriminative subspace in which the variance for one class is maximized while the variance for the other class is minimized. The solution is obtained by simultaneously diagonalizing the two class-conditional covariance matrices, which is equivalent to solving a generalized eigenvalue problem. Probabilistic CSP (PCSP) (Wu et al., 2009, Wu et al., 2010, Wu et al., 2011) is a probabilistic counterpart of CSP, in which two linear Gaussian generative models with a shared basis matrix are jointly learned to infer spatial patterns corresponding to the column vectors of the shared basis matrix. Inherently, CSP or PCSP is a subject-specific spatial filter in which subject relatedness is neglected. When the number of training samples for a subject of interest is small, such subject-specific models suffer from performance degradation. Multi-subject extensions of CSP can be found in Devlaminck, Wyns, Grosse-Wentrup, Otte, and Santens (2011), Kang, Nam, and Choi (2009), Krauledat, Tangermann, and Blankertz (2008) and Lotte and Guan (2010). In regularized CSP methods, the class-conditional covariance matrices for a subject of interest are regularized by a linear combination of other subjects’ covariance matrices, in order to incorporate inter-subject relatedness (Kang et al., 2009, Lotte and Guan, 2011). In Devlaminck et al. (2011), learning CSP is reformulated as a risk minimization problem with a regularizer that constrains spatial filters to be similar across subjects. Another multi-subject EEG classification method uses features other than CSP and regularizes the learning of subject-specific parameters so that they become similar across subjects (Alamgir, Grosse-Wentrup, & Altun, 2010).
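To make the CSP solution concrete, the following is a minimal sketch of the classical formulation as a generalized eigenvalue problem, assuming band-pass-filtered trials as input. The function and variable names (csp_filters, trials_1, trials_2, n_filters) are illustrative assumptions, not taken from any of the cited implementations.

```python
# A minimal sketch of classical CSP via the generalized eigenvalue problem.
import numpy as np
from scipy.linalg import eigh


def csp_filters(trials_1, trials_2, n_filters=3):
    """Compute 2*n_filters CSP spatial filters.

    trials_1, trials_2 : arrays of shape (n_trials, n_channels, n_samples)
        Band-pass-filtered EEG trials for the two classes.
    """
    def mean_cov(trials):
        # Average normalized spatial covariance over trials.
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
        return np.mean(covs, axis=0)

    s1, s2 = mean_cov(trials_1), mean_cov(trials_2)

    # Simultaneous diagonalization: solve s1 w = lambda (s1 + s2) w.
    eigvals, eigvecs = eigh(s1, s1 + s2)  # eigenvalues in ascending order

    # Filters with the largest eigenvalues maximize class-1 variance,
    # those with the smallest eigenvalues maximize class-2 variance.
    idx = np.concatenate([np.arange(n_filters),
                          np.arange(len(eigvals) - n_filters, len(eigvals))])
    return eigvecs[:, idx]  # shape (n_channels, 2 * n_filters)
```

The filters associated with the largest and smallest generalized eigenvalues together span the discriminative subspace described above.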

On the other hand, multi-task learning has been adopted in the probabilistic model for CSP, treating subjects as tasks. Bayesian multi-task learning (Heskes, 2000, Xue et al., 2007) deals with several related tasks at the same time, with the intention that the tasks learn from each other by sharing hyperparameters (the parameters of prior distributions). A Bayesian multi-task extension of CSP (BCSP) was recently developed in Kang and Choi (2011), where subject-to-subject information is transferred during the learning of the model for a subject of interest by sharing hyperparameters across subjects, treating subjects as tasks. Bayesian CSP (Kang & Choi, 2011) works better than PCSP, which is trained on a subject-by-subject basis, although similarities or dissimilarities between spatial patterns are neglected because all spatial patterns are forced to share the same hyperparameters. Bayesian CSP with Dirichlet process (DP) priors (BCSP-DP) (Kang & Choi, 2012) jointly learns and groups spatial patterns, so that spatial patterns in the same group, determined by the DP mixture model, share the hyperparameters of their prior distributions. Coupling similar spatial patterns in the same cluster by sharing hyperparameters facilitates information transfer among subjects with similar spatial patterns, whereas information transfer is prevented among dissimilar subjects. However, information transfer across clusters is not possible, so the characteristics common to all subjects are not captured.

In this paper we present a hierarchical Bayesian model in which spatial patterns across subjects are assumed to share a latent subspace to capture subject relatedness, which is determined by measurements from a subject of interest as well as from other subjects. Motivated by the idea of nonparametric Bayesian multi-task learning (Rai and Daumé, 2010, Zhang et al., 2008), we use an infinite latent feature model to automatically infer the dimension of the shared latent subspace, placing Indian buffet process (IBP) priors (Griffiths & Ghahramani, 2006) on our model. We refer to the proposed method as BCSP-IBP and develop a variational inference algorithm.
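To make the role of the IBP prior concrete, the following is a minimal, self-contained sketch of drawing a binary feature-assignment matrix from an Indian buffet process. It illustrates how the number of active latent features, and hence the effective dimension of a shared subspace, is left unbounded and inferred from data rather than fixed in advance. The sampler, the concentration value alpha, and all variable names are illustrative assumptions, not part of the paper's variational inference algorithm.

```python
# A minimal sketch of sampling Z ~ IBP(alpha); rows index objects
# (e.g. spatial patterns), columns index latent features.
import numpy as np

rng = np.random.default_rng(0)


def sample_ibp(n_rows, alpha=2.0):
    """Draw a binary matrix from the Indian buffet process."""
    columns, counts = [], []    # feature columns and their usage counts
    for i in range(n_rows):
        # Revisit existing features: feature k is taken w.p. counts[k] / (i + 1).
        for k, m_k in enumerate(counts):
            take = int(rng.random() < m_k / (i + 1))
            columns[k].append(take)
            counts[k] += take
        # The i-th object also activates Poisson(alpha / (i + 1)) new features.
        for _ in range(rng.poisson(alpha / (i + 1))):
            columns.append([0] * i + [1])
            counts.append(1)
    if not columns:
        return np.zeros((n_rows, 0), dtype=int)
    return np.array(columns, dtype=int).T   # shape (n_rows, K), K inferred


Z = sample_ibp(10)
print(Z.shape)   # the number of columns K varies from draw to draw
```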

The rest of this paper is organized as follows. In Section 2, we briefly review relevant existing methods, including CSP, PCSP, and Bayesian CSP models. Section 3 presents the main contribution of this paper: Bayesian CSP with Indian buffet process priors, for which the model and the variational inference algorithm are described. Section 4 presents numerical experiments on the BCI competition III IVa and IV 2a datasets, comparing the classification performance of our proposed method to that of existing methods. Finally, conclusions are drawn in Section 5.

Section snippets

Related work

In this section, we briefly review two existing Bayesian CSP models (Kang and Choi, 2011, Kang and Choi, 2012) as well as CSP (Koles, 1991) and PCSP (Wu et al., 2009).

Bayesian CSP with Indian buffet process priors

In this section we present the main contribution of this paper. In order to alleviate limitations in the two previous Bayesian CSP models (Kang and Choi, 2011, Kang and Choi, 2012), we base our proposed method, referred to as BCSP-IBP, on infinite latent feature models (Griffiths and Ghahramani, 2006, Rai and Daumé, 2010, Zhang et al., 2008), assuming that spatial patterns across subjects share a latent subspace, referred to as the shared latent subspace, to capture subject relatedness. In our
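As a rough illustration of the shared-latent-subspace assumption only (this is not the paper's exact generative model, which is truncated in this preview), one can picture each subject's spatial patterns as linear combinations of a common basis, gated per pattern by binary latent features and scaled by subject-specific coefficients. All names and dimensions below are hypothetical.

```python
# Illustrative sketch: subjects share one latent basis B; subject-specific
# binary gates Z and weights V select and scale the shared latent dimensions.
import numpy as np

rng = np.random.default_rng(1)

n_channels, K = 22, 5                      # illustrative sensor count and latent dimension
B = rng.normal(size=(K, n_channels))       # latent basis shared by all subjects


def subject_spatial_patterns(n_patterns, p_active=0.5):
    """Generate spatial patterns for one subject from the shared basis."""
    Z = rng.random((n_patterns, K)) < p_active   # which latent dimensions are used
    V = rng.normal(size=(n_patterns, K))         # subject-specific coefficients
    return (Z * V) @ B                           # shape (n_patterns, n_channels)


W_subject1 = subject_spatial_patterns(6)
W_subject2 = subject_spatial_patterns(6)
# The two subjects differ in Z and V but share B, which is how inter-subject
# relatedness would be captured under this assumption.
```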

Experiments

We compare the performance of our proposed model BCSP-IBP to existing models such as CSP, PCSP, BCSP (Kang & Choi, 2011), and BCSP-DP (Kang & Choi, 2012), on the BCI competition III IVa and IV 2a datasets (Blankertz et al., 2004, Blankertz et al., 2006, Sajda et al., 2003). The dimensionality of the feature vectors was set to six (n=3) in every model. We also compared the performances of two recent CSP-based multi-subject
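As a point of reference for the six-dimensional feature vectors mentioned above (n=3 filter pairs), the following sketch shows the standard CSP log-variance feature extraction applied to a single trial. The helper name and the normalization are illustrative assumptions rather than the exact pipeline used in the experiments.

```python
# A minimal sketch of log-variance feature extraction from CSP-filtered trials;
# `filters` can come from a routine such as the csp_filters sketch shown earlier.
import numpy as np


def log_variance_features(trial, filters):
    """trial: (n_channels, n_samples); filters: (n_channels, 2n) -> (2n,) features."""
    projected = filters.T @ trial            # spatially filtered signals
    var = projected.var(axis=1)              # variance of each filtered signal
    return np.log(var / var.sum())           # normalized log-variance features


# With n = 3 (three filters per class), each trial is mapped to a
# six-dimensional feature vector, which is then fed to a classifier.
```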

Conclusions

We have presented a Bayesian CSP model with IBP priors for multi-subject EEG classification, where spatial patterns are modeled by an infinite latent feature model, assuming that a latent subspace is shared across subjects to capture subject relatedness. Spatial patterns are coupled through sharing a common latent subspace but individual characteristics are reflected by subject-specific coefficients. Compared to previous Bayesian CSP models that were reviewed in this paper, our proposed model,

Acknowledgments

This work was supported by National Research Foundation (NRF) of Korea (NRF-2013R1A2A2A01067464) and the IT R&D Program of MSIP/KEIT (14-824-09-014, Machine Learning Center). A preliminary version was presented in ICASSP-2013 (Kang & Choi, 2013).

References (36)

  • A. Cichocki et al. Noninvasive BCIs: multiway signal-processing array decompositions. IEEE Computer (2008)
  • D. Devlaminck et al. Multisubject learning for common spatial patterns in motor-imagery BCI. Computational Intelligence and Neuroscience (2011)
  • T.G. Dietterich. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation (1998)
  • G. Dornhege et al. Toward brain–computer interfacing (2007)
  • F. Doshi-Velez et al. Variational inference for the Indian buffet process. In... (2009)
  • T. Ebrahimi et al. Brain–computer interface in multimedia communication. IEEE Signal Processing Magazine (2003)
  • T.L. Griffiths et al. Infinite latent feature models and the Indian buffet process (2006)
  • T. Heskes. Empirical Bayes for learning to learn. In Proceedings of the international conference on machine... (2000)