Bayesian inference for an adaptive Ordered Probit model: An application to Brain Computer Interfacing

doi:10.1016/j.neunet.2011.03.019

Neural Networks

Volume 24, Issue 7, September 2011, Pages 726-734

https://doi.org/10.1016/j.neunet.2011.03.019

Abstract

This paper proposes an algorithm for adaptive, sequential classification in systems with unknown labeling errors, focusing on the biomedical application of Brain Computer Interfacing (BCI). The method is shown to be robust in the presence of label and sensor noise. We focus on the inference and prediction of target labels under a nonlinear and non-Gaussian model. In order to handle missing or erroneous labeling, we model observed labels as noisy observations of a latent label set with multiple classes ($\geq 2$). Whilst this paper focuses on the method’s application to BCI systems, the algorithm has the potential to be applied in many domains in which sequential missing labels are to be imputed in the presence of uncertainty. This dynamic classification algorithm combines an Ordered Probit model and an Extended Kalman Filter (EKF). The EKF estimates the parameters of the Ordered Probit model sequentially in time. We test the performance of the classification approach on synthetic datasets and on real experimental EEG signals with multiple classes (2, 3 and 4 labels) from a BCI experiment.

Introduction

Adaptive classification is an important problem in many application domains including biomedical engineering. Previous research in adaptive classification applied to biomedical engineering has used state space modeling of the time series (Lowne et al., 2010, Penny et al., 2000, Sykacek et al., 2004). The classification of an observed time series can often require extensive training before reliable results are obtained, as shown for biomedical applications (Pfurtscheller et al., 1993, Vidaurre et al., 2006, Wolpaw et al., 1991). Adaptive sequential classification approaches have been developed in order to reduce the overheads of training, to track non-stationarity and to cope rapidly with new observations. Such adaptive approaches differ from the typical methodology, in which an algorithm is trained off-line on retrospective data, in that learning takes place continuously rather than being confined to a section of training data.

In particular, Yoon et al. (2008a, 2008b) have investigated the use of the Extended Kalman Filter (EKF) to process and analyze brain signals under the assumption that an observed label is informative (there is little uncertainty), i.e. there are no ‘bit errors’ in the label stream. In general, however, we do not know the ground truth of the labels; we have only imperfect and noisy labels which we use as a reference. The assumption of error-free labels is hence invalid. To address this problem, we previously developed a classification algorithm augmented by an auxiliary latent variable, corresponding to the uncorrupted underlying labels, using Rao–Blackwellized Sequential Monte Carlo (Rao–Blackwellized SMC) (Yoon, Roberts, Dyson, & Gan, 2009). All these approaches have employed a standard Probit regression kernel for two-class (binary) classification. In this paper we develop a multi-class classifier.

A sequential algorithm which employs a standard probit model (Aitchison & Silvey, 1957) can distinguish only two class labels, because a single threshold at 0 partitions the data into two classes. The Probit model can be generalized by allowing multiple classes. Multi-class Probit models have been studied and applied in the economic (Geweke et al., 1994, McCulloch and Rossi, 1994) and clinical domains (Jung et al., 2006, Zhou et al., 2006). The multinomial Probit model is a well-known approach for unordered multi-class problems (Albert & Chib, 1993). Alternatively, for ordinal data, we can use the Ordered Probit model (Cameron and Trivedi, 2006, Munkin and Trivedi, 2007, Weiss, 1997).

Ordinal data arise in many application domains; examples include the heights and weights of a population. For an ordinal dataset, the basic concept of the Ordered Probit model is that there is a latent continuous metric underlying the ordinal responses, and that thresholds partition the cumulative distribution function (c.d.f.) into a series of regions corresponding to the ordinal categories. Most literature on the Ordered Probit model focuses on off-line estimation of the thresholds for partitioning and on the estimation of their parameters and hyper-parameters (using, for example, Maximum Likelihood, Markov Chain Monte Carlo and Gibbs sampling) (Munkin and Trivedi, 2007, Ronning and Kukuk, 1996). However, the metrics (the thresholds) for ordinal classification may be adapted rather than kept fixed. In this paper, we propose an adaptive classifier using the Ordered Probit model which estimates the non-stationary thresholds via hidden parameters whose values are dynamically inferred using an Extended Kalman Filter (EKF).
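To make the threshold construction concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a unit-variance Gaussian latent variable and fixed, strictly increasing thresholds; the function names `std_normal_cdf` and `ordered_probit_probs` and all numerical values are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence


def std_normal_cdf(u: float) -> float:
    """Standard normal c.d.f., written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def ordered_probit_probs(latent_mean: float, thresholds: Sequence[float]) -> List[float]:
    """Class probabilities for a latent variable y* ~ N(latent_mean, 1).

    `thresholds` holds the K-1 interior cut points in increasing order.
    Class k is the event that y* falls between consecutive cut points, so its
    probability is the difference of the c.d.f. at those two cut points.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [std_normal_cdf(c - latent_mean) for c in cuts]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


# Illustrative values only (assumed): four ordered classes, three thresholds.
probs = ordered_probit_probs(latent_mean=0.3, thresholds=[-1.0, 0.0, 1.5])
```

With a single threshold at 0 the same function reduces to the two-class probit case, which is the sense in which the Ordered Probit model generalizes the binary one.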

This paper is organized as follows. Section 2 presents the mathematical models for the proposed algorithms. We present the mathematical forms of the likelihood function and prior information with the Ordered Probit model based on a hierarchical framework. In Section 3 the algorithms are described in terms of the prediction and filtering steps of the state space models. In Section 4 we present results from application to synthetic datasets (with two and three labels) and to real experimental datasets comprising two- and four-label problems in the classification of human electroencephalogram (EEG) signals used for a Brain Computer Interfacing system.

Mathematical model

Our proposed algorithm is based on a probit classifier (regression) model. Suppose that we have radial basis functions $φ_{t}(x_{t})$ computed from an observed input vector $x_{t}$ at time $t$. To simplify notation, we write $φ_{t}$ instead of $φ_{t}(x_{t})$, i.e. $φ_{t} = φ_{t}(x_{t}) = [x_{t}^{T}, ϕ_{t}^{(1)}(x_{t})^{T}, \dots, ϕ_{t}^{(N_{b})}(x_{t})^{T}, 1]^{T}$, where $ϕ_{t}^{(i)}(x_{t})$ is the response of the $i$th Gaussian basis function for $i \in \{1, \dots, N_{b}\}$ at time $t = 1, 2, \dots, T$. The use of Gaussian basis functions allows for nonlinear classification boundaries yet retains analytic simplicity. Here, $N_{b}$ is
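The structure of the feature vector $φ_{t}$ can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: each basis response is taken to be a scalar of the form $\exp(-\|x - c\|^{2} / 2s^{2})$, and the centres, the shared width `width`, and the function names `gaussian_basis` and `feature_vector` are hypothetical, since the excerpt does not specify them.

```python
import math
from typing import List, Sequence


def gaussian_basis(x: Sequence[float], centre: Sequence[float], width: float) -> float:
    """Response of one Gaussian basis function (assumed isotropic form)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def feature_vector(
    x: Sequence[float], centres: Sequence[Sequence[float]], width: float
) -> List[float]:
    """Stack the raw input, the N_b basis responses and a constant bias of 1."""
    responses = [gaussian_basis(x, c, width) for c in centres]
    return list(x) + responses + [1.0]


# Illustrative values only (assumed): a 2-D input and N_b = 2 basis functions.
phi = feature_vector([0.5, -1.0], centres=[[0.0, 0.0], [0.5, -1.0]], width=1.0)
```

The resulting vector has length $d + N_{b} + 1$ for a $d$-dimensional input, so a model that is linear in $φ_{t}$ can still produce a nonlinear boundary in $x_{t}$.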

Algorithms

In adaptive classification, our state space model performs inference in two steps: filtering, $p(θ_{t} | z_{1:t}, x_{1:t})$, and prediction, $p(θ_{t} | z_{1:t-1}, x_{1:t-1})$.
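The split into prediction and filtering can be illustrated with a deliberately simplified Python sketch. It is not the paper's EKF for the Ordered Probit model: it assumes a scalar parameter following a random walk and a linear-Gaussian observation $z_{t} = h θ_{t} + \text{noise}$ as a stand-in. In an EKF, `h` would be replaced by the local derivative of the nonlinear observation function. The names `predict`, `update`, `process_var` and `obs_var` are hypothetical.

```python
from typing import Tuple


def predict(mean: float, var: float, process_var: float) -> Tuple[float, float]:
    """Prediction step: a random walk keeps the mean and inflates the variance."""
    return mean, var + process_var


def update(
    mean: float, var: float, z: float, h: float, obs_var: float
) -> Tuple[float, float]:
    """Filtering step: condition the predicted Gaussian on the new observation z."""
    innovation = z - h * mean                 # observation minus its prediction
    innovation_var = h * var * h + obs_var    # predicted variance of z
    gain = var * h / innovation_var           # Kalman gain
    return mean + gain * innovation, (1.0 - gain * h) * var


# Illustrative values only (assumed).
m_pred, v_pred = predict(mean=0.0, var=1.0, process_var=0.5)
m_filt, v_filt = update(m_pred, v_pred, z=1.0, h=1.0, obs_var=1.5)
```

Running the two functions alternately at each time step gives the sequential update: the prediction widens the belief about the parameter before the new label arrives, and the filtering step narrows it again.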

Results

The algorithm was tested on synthetic datasets (with two and three classes) and experimental EEG datasets (with two and four classes).

Discussion

From a methodological perspective, the proposed method rests on a strong assumption, and hence has a limitation: the number of classes is assumed to be known. This assumption is not realistic in many real-life problems. It is in general difficult to infer the number of classes even in off-line systems. Unfortunately, inferring the number of classes becomes far more difficult, or practically infeasible, because the trans-dimensional scheme requires high computational loads to properly build target distributions in an on-line

Conclusion

In this paper, we propose a variant of an Ordered Probit model to dynamically classify multi-class signals for adaptive Brain Computer Interfacing systems. The EKF enables the static Ordered Probit model to become an adaptive Ordered Probit model since the EKF estimates the hidden metrics (thresholds) of the adaptive Ordered Probit model. This approach also introduces auxiliary variables and truncation of the filtering distribution to deal with both Markov dependency of the output function and

Acknowledgments

This project was supported by grant EP/D030099 from the UK EPSRC, to whom we are most grateful. The authors would like to acknowledge the core work of Duncan Lowne in the early development of adaptive classification in BCI systems.

References (30)

R.C. Jung et al.
Time series of count data: modeling, estimation and diagnostics
Computational Statistics & Data Analysis
(2006)
D.R. Lowne et al.
Sequential non-stationary dynamic classification with sparse feedback
Pattern Recognition
(2010)
R. McCulloch et al.
An exact likelihood analysis of the multinomial probit model
Journal of Econometrics
(1994)
B. Obermaier et al.
Hidden Markov models for online classification of single trial EEG data
Pattern Recognition Letters
(2001)
J. Pardey et al.
A review of parametric modelling techniques for EEG analysis
Medical Engineering & Physics
(1996)
G. Pfurtscheller et al.
Brain–computer interface: a new communication device for handicapped persons
Journal of Microcomputer Applications
(1993)
J.R. Wolpaw et al.
An EEG-based brain–computer interface for cursor control
Electroencephalography and Clinical Neurophysiology
(1991)
J. Yoon et al.
Adaptive classification for brain computer interface systems using sequential Monte Carlo sampling
Neural Networks
(2009)
J. Aitchison et al.
The generalization of probit analysis to the case of multiple response
Biometrika
(1957)
J.H. Albert et al.
Bayesian analysis of binary and polychotomous response data
Journal of the American Statistical Association
(1993)

F. Babiloni et al.
Linear classification of low-resolution EEG patterns produced by imagined hand movements
IEEE Transactions on Rehabilitation Engineering
(2000)
C.M. Bishop
Pattern recognition and machine learning
(1988)
A.C. Cameron et al.
Regression analysis of count data
(2006)
G. Dornhege et al.
Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multiclass paradigms
IEEE Transactions on Biomedical Engineering
(2004)
J. Geweke et al.
Alternative computational approaches to inference in the Multinomial Probit model
The Review of Economics and Statistics
(1994)
