Elsevier

Neural Networks

Volume 24, Issue 7, September 2011, Pages 726-734

Bayesian inference for an adaptive Ordered Probit model: An application to Brain Computer Interfacing

https://doi.org/10.1016/j.neunet.2011.03.019

Abstract

This paper proposes an algorithm for adaptive, sequential classification in systems with unknown labeling errors, focusing on the biomedical application of Brain Computer Interfacing (BCI). The method is shown to be robust in the presence of label and sensor noise. We focus on the inference and prediction of target labels under a nonlinear and non-Gaussian model. To handle missing or erroneous labels, we model the observed labels as noisy observations of a latent label set with multiple classes (≥2). Although this paper focuses on the method's application to BCI systems, the algorithm could be applied in many domains in which missing labels in a sequence must be imputed under uncertainty. This dynamic classification algorithm combines an Ordered Probit model and an Extended Kalman Filter (EKF), which estimates the parameters of the Ordered Probit model sequentially over time. We test the performance of the classification approach on synthetic datasets and on real experimental EEG signals with multiple classes (2, 3 and 4 labels) from a BCI experiment.

Introduction

Adaptive classification is an important problem in many application domains, including biomedical engineering. Previous research on adaptive classification in biomedical engineering has used state space models of the time series (Lowne et al., 2010, Penny et al., 2000, Sykacek et al., 2004). Classifying an observed time series can require extensive training before reliable results are obtained, as shown for biomedical applications (Pfurtscheller et al., 1993, Vidaurre et al., 2006, Wolpaw et al., 1991). Adaptive sequential classification approaches have been developed to reduce the overheads of training, to track non-stationarity and to respond rapidly to new observations. They differ from the typical methodology, in which an algorithm is trained off-line on retrospective data, in that learning takes place continuously rather than being confined to a segment of training data.

In particular, Yoon et al. (2008a, 2008b) investigated the use of the Extended Kalman Filter (EKF) to process and analyze brain signals under the assumption that the observed labels are informative (there is little uncertainty), i.e. that there are no 'bit errors' in the label stream. In general, however, we do not know the ground truth of the labels; we have only imperfect, noisy labels to use as a reference, so the assumption of error-free labels is invalid. To address this problem, we previously developed a classification algorithm augmented by an auxiliary latent variable that corresponds to the uncorrupted underlying labels, using Rao–Blackwellized Sequential Monte Carlo (Rao–Blackwellized SMC) (Yoon, Roberts, Dyson, & Gan, 2009). All these approaches employed a standard Probit regression kernel for two-class (binary) classification. In this paper we develop a multi-class classifier.
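To illustrate what it means to treat observed labels as noisy observations of latent, uncorrupted labels, the following minimal Python sketch assumes a symmetric label-flip model with a known flip probability `epsilon`: the observed label equals the latent one with probability 1 - epsilon and is otherwise one of the remaining K - 1 classes, chosen uniformly. This noise model and the function names are illustrative assumptions, not the formulation used in this paper or in Yoon et al. (2009). The sketch computes the likelihood of an observed label and, by Bayes' rule, the posterior over the latent label.

```python
def flip_prob(observed, latent, epsilon, K):
    # p(observed | latent) under the hypothetical symmetric flip model
    return 1.0 - epsilon if observed == latent else epsilon / (K - 1)


def observed_label_likelihood(p_latent, observed, epsilon):
    """p(observed label), marginalising over the latent label.

    p_latent -- class probabilities of the latent (uncorrupted) label
    """
    K = len(p_latent)
    return sum(pk * flip_prob(observed, k, epsilon, K) for k, pk in enumerate(p_latent))


def latent_label_posterior(p_latent, observed, epsilon):
    """p(latent label = k | observed label), by Bayes' rule."""
    K = len(p_latent)
    unnorm = [pk * flip_prob(observed, k, epsilon, K) for k, pk in enumerate(p_latent)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Three classes: the classifier favours class 0, but class 1 is observed
prior = [0.7, 0.2, 0.1]
print(latent_label_posterior(prior, observed=1, epsilon=0.1))
```

With `epsilon = 0` the posterior collapses onto the observed label, which corresponds to the error-free assumption criticised above; a positive `epsilon` lets the classifier's own belief partly override a suspect label.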

A sequential algorithm that employs a standard probit model (Aitchison & Silvey, 1957), in which a single threshold at 0 partitions the latent space, can distinguish only two class labels. The Probit model can, however, be generalized to multiple classes. Multi-class Probit models have been studied and applied in economics (Geweke et al., 1994, McCulloch and Rossi, 1994) and in clinical domains (Jung et al., 2006, Zhou et al., 2006). The multinomial Probit model is a well-known approach for unordered multi-class problems (Albert & Chib, 1993). For ordinal data, the Ordered Probit model can be used instead (Cameron and Trivedi, 2006, Munkin and Trivedi, 2007, Weiss, 1997).
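The two-class limitation follows from the latent-variable view of the standard probit model: a continuous latent value is compared with the single threshold 0, so only two outcomes are possible. The Python sketch below assumes the latent value is a linear predictor f plus standard Gaussian noise, where f stands in for a weighted sum of the features (the weights are not defined in this excerpt) and is set to the hypothetical value 0.5. It checks by simulation that the fraction of label-1 outcomes approaches Φ(f).

```python
import math
import random


def std_normal_cdf(x):
    # Phi(x) expressed through the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sample_binary_probit(f, n, rng):
    # Latent y* = f + e with e ~ N(0, 1); the label is 1 when y* exceeds the threshold 0
    return [1 if f + rng.gauss(0.0, 1.0) > 0.0 else 0 for _ in range(n)]


rng = random.Random(0)
f = 0.5  # hypothetical value of the linear predictor
labels = sample_binary_probit(f, 20000, rng)
empirical = sum(labels) / len(labels)
print(f"empirical P(y=1) = {empirical:.3f}, Phi(f) = {std_normal_cdf(f):.3f}")
```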

Ordinal classification arises in many application domains, for example when the heights or weights of a population are grouped into ordered categories. For an ordinal dataset, the basic concept of the Ordered Probit model is that a latent continuous metric underlies the ordinal responses, and thresholds partition the cumulative distribution function (c.d.f.) into a series of regions corresponding to the ordinal categories. Most of the literature on the Ordered Probit model focuses on off-line estimation of the partitioning thresholds and of the model parameters and hyper-parameters (using, for example, Maximum Likelihood or Markov Chain Monte Carlo methods such as Gibbs sampling) (Munkin and Trivedi, 2007, Ronning and Kukuk, 1996). However, the metrics (the thresholds) for ordinal classification may be adapted over time rather than kept fixed. In this paper, we propose an adaptive classifier based on the Ordered Probit model that estimates the non-stationary thresholds through hidden parameters whose values are inferred dynamically using an EKF.
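The threshold construction can be written compactly. For a latent value y* = f + e with e ~ N(0, 1) and strictly increasing thresholds τ_1 < ... < τ_{K-1} (with τ_0 = -∞ and τ_K = +∞), class k is observed when τ_{k-1} < y* ≤ τ_k, so P(y = k) = Φ(τ_k - f) - Φ(τ_{k-1} - f). The Python sketch below evaluates these probabilities for a fixed set of thresholds. In the proposed adaptive classifier the thresholds are instead hidden quantities tracked over time; the function name and the unit noise variance are assumptions made for illustration.

```python
import math


def std_normal_cdf(x):
    # Phi(x) via the error function; math.erf(+/-inf) returns +/-1.0
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordered_probit_probs(f, thresholds):
    """Class probabilities of a static Ordered Probit model.

    f          -- latent mean (e.g. a linear function of the features)
    thresholds -- the K - 1 interior cut points, strictly increasing
    Returns [P(y = 1), ..., P(y = K)].
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [std_normal_cdf(c - f) for c in cuts]
    # Each class takes the probability mass between consecutive cut points
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


# Four ordered classes, as in the four-label EEG problems
print(ordered_probit_probs(0.3, [-1.0, 0.0, 1.0]))
```

With a single threshold at 0 (K = 2), the second probability reduces to Φ(f), recovering the standard binary probit model.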

This paper is organized as follows. Section 2 presents the mathematical models for the proposed algorithms, including the likelihood function and prior information for the Ordered Probit model within a hierarchical framework. Section 3 describes the algorithms in terms of the prediction and filtering steps of the state space model. Section 4 presents results on synthetic datasets (with two and three labels) and on real experimental datasets, comprising two- and four-label problems, from the classification of human electroencephalogram (EEG) signals in a Brain Computer Interfacing system.

Mathematical model

Our proposed algorithm is based on a probit classifier (regression) model. Suppose that we have radial basis functions $\phi_t(x_t)$ from an observed input vector $x_t$ at time $t$. To simplify notation, we use $\phi_t$ instead of $\phi_t(x_t)$, i.e. $\phi_t = \phi_t(x_t) = [x_t^T, \{\varphi_t^{(1)}(x_t)\}^T, \ldots, \{\varphi_t^{(N_b)}(x_t)\}^T, 1]^T$, where $\varphi_t^{(i)}(x_t)$ is the response of the $i$th Gaussian basis function for $i \in \{1, \ldots, N_b\}$ at time $t = 1, 2, \ldots, T$. The use of Gaussian basis functions allows for nonlinear classification boundaries yet retains analytic simplicity. Here, $N_b$ is
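The feature vector φ_t described above can be sketched in a few lines of Python. The sketch assumes isotropic Gaussian basis functions with fixed centres and a single shared width, and treats each basis response as a scalar; the exact parameterisation of the basis functions is not given in this excerpt, so `centres`, `width` and the function names are hypothetical. The vector stacks the raw input, the N_b basis responses and a constant 1 that acts as a bias term.

```python
import math


def gaussian_basis(x, centre, width):
    # Isotropic Gaussian response exp(-||x - c||^2 / (2 * width^2))
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def feature_vector(x, centres, width):
    """phi_t = [x_t^T, phi^(1)(x_t), ..., phi^(Nb)(x_t), 1]^T as a flat list."""
    responses = [gaussian_basis(x, c, width) for c in centres]
    return list(x) + responses + [1.0]


# Two-dimensional input with Nb = 2 hypothetical centres
print(feature_vector([0.0, 0.0], centres=[[0.0, 0.0], [1.0, 0.0]], width=1.0))
```

The resulting vector has length d + N_b + 1 for a d-dimensional input, so a classifier that is linear in φ_t can still draw nonlinear boundaries in the original input space.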

Algorithms

In adaptive classification, our state space model performs inference in two steps: Filtering, $p(\theta_t \mid z_{1:t}, x_{1:t})$, and Prediction, $p(\theta_t \mid z_{1:t-1}, x_{1:t-1})$.
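The prediction and filtering recursion can be illustrated with an EKF for the simpler binary probit case. This Python sketch is not the paper's adaptive Ordered Probit filter: it assumes random-walk dynamics θ_t = θ_{t-1} + v_t with v_t ~ N(0, qI), a binary label z_t ∈ {0, 1} with mean Φ(θ_t^T φ_t), and the Bernoulli variance Φ(1 - Φ) as the observation-noise variance. It neither estimates thresholds nor models label noise, and `ekf_step` and `q` are hypothetical names.

```python
import math
import random


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def ekf_step(theta, P, phi, z, q):
    """One prediction + filtering step of an EKF for a binary probit observation.

    theta -- posterior mean of the weights at t - 1 (list of floats)
    P     -- posterior covariance at t - 1 (list of lists, symmetric)
    phi   -- feature vector phi_t; z -- observed label in {0, 1}
    q     -- random-walk process-noise variance (assumed)
    """
    d = len(theta)
    # Prediction, p(theta_t | z_{1:t-1}): mean unchanged, covariance inflated by q*I
    P = [[P[i][j] + (q if i == j else 0.0) for j in range(d)] for i in range(d)]

    # Linearise h(theta) = Phi(theta^T phi) around the predicted mean
    a = sum(t * f for t, f in zip(theta, phi))
    h = std_normal_cdf(a)
    H = [std_normal_pdf(a) * f for f in phi]   # Jacobian of h with respect to theta
    R = max(h * (1.0 - h), 1e-6)               # Bernoulli variance, floored for stability

    # Filtering, p(theta_t | z_{1:t}): EKF update for a scalar observation
    PH = [sum(P[i][j] * H[j] for j in range(d)) for i in range(d)]  # P H^T
    S = sum(H[i] * PH[i] for i in range(d)) + R                     # innovation variance
    K = [v / S for v in PH]                                          # Kalman gain
    theta = [t + k * (z - h) for t, k in zip(theta, K)]
    P = [[P[i][j] - K[i] * PH[j] for j in range(d)] for i in range(d)]
    return theta, P


# Synthetic stream: feature [x, 1], labels from a latent 2x + N(0, 1) thresholded at 0
rng = random.Random(1)
theta, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
for _ in range(500):
    x = rng.gauss(0.0, 1.0)
    z = 1 if 2.0 * x + rng.gauss(0.0, 1.0) > 0.0 else 0
    theta, P = ekf_step(theta, P, [x, 1.0], z, q=1e-4)
print(theta)
```

Inflating the covariance by q at every prediction step keeps the filter from becoming overconfident, so it can keep tracking non-stationary weights; a larger q adapts faster at the cost of noisier estimates.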

Results

The algorithm was tested on synthetic datasets (with two and three classes) and experimental EEG datasets (with two and four classes).

Discussion

From a methodological perspective, the proposed method rests on a strong assumption (and limitation): that the number of classes is known. This assumption is unrealistic in many real-life problems, and it is in general difficult to infer the number of classes even in off-line systems. Unfortunately, inferring the number of classes becomes far more difficult, or practically infeasible, because the trans-dimensional scheme requires a high computational load to properly build target distributions in an on-line

Conclusion

In this paper, we propose a variant of the Ordered Probit model to dynamically classify multi-class signals for adaptive Brain Computer Interfacing systems. The EKF turns the static Ordered Probit model into an adaptive one by estimating its hidden metrics (thresholds). This approach also introduces auxiliary variables and truncation of the filtering distribution to deal with both Markov dependency of the output function and

Acknowledgments

This project was supported by grant EP/D030099 from the UK EPSRC, to whom we are most grateful. The authors would like to acknowledge the core work of Duncan Lowne in the early development of adaptive classification in BCI systems.

References (30)

  • F. Babiloni et al. Linear classification of low-resolution EEG patterns produced by imagined hand movements. IEEE Transactions on Rehabilitation Engineering (2000).
  • C.M. Bishop. Pattern recognition and machine learning (2006).
  • A.C. Cameron et al. Regression analysis of count data (2006).
  • G. Dornhege et al. Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multiclass paradigms. IEEE Transactions on Biomedical Engineering (2004).
  • J. Geweke et al. Alternative computational approaches to inference in the Multinomial Probit model. The Review of Economics and Statistics (1994).

Cited by (9)

  • Adaptive semi-supervised classification to reduce intersession non-stationarity in multiclass motor imagery-based brain-computer interfaces. Neurocomputing (2015).
    Citation excerpt: Adaptive procedures such as bias adaptation [14,24], importance weighted cross validation [25,26], or data space adaptation based on the Kullback–Leibler divergence [27] were proposed to extend LDA to non-stationary environments. Likewise, dynamic Bayesian classifiers based on the Kalman filter [28–30] have been developed for on-line adaptive classification. All these methods sequentially update the model during the unlabeled dataset or testing sessions giving more importance to the most recent trials.
  • Predicting Self-reported Customer Satisfaction of Interactions with a Corporate Call Center. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017).
  • Context-aware adaptive spelling in motor imagery BCI. Journal of Neural Engineering (2016).
  • Ordinal Regression Methods: Survey and Experimental Study. IEEE Transactions on Knowledge and Data Engineering (2016).