Pattern Recognition

Volume 45, Issue 6, June 2012, Pages 2085–2100

Model sparsity and brain pattern interpretation of classification models in neuroimaging

https://doi.org/10.1016/j.patcog.2011.09.011

Abstract

Interest is increasing in applying discriminative multivariate analysis techniques to the analysis of functional neuroimaging data. Model interpretation is of great importance in the neuroimaging context, and is conventionally based on a ‘brain map’ derived from the classification model. In this study we focus on the relative influence of model regularization parameter choices on the model generalization, the reliability of the spatial patterns extracted from the classification model, and the ability of the resulting model to identify relevant brain networks defining the underlying neural encoding of the experiment. For a support vector machine, logistic regression and Fisher's discriminant analysis we demonstrate that selection of model regularization parameters has a strong but consistent impact on the generalizability and both the reproducibility and interpretable sparsity of the models, for both ℓ₂ and ℓ₁ regularization. Importantly, we illustrate a trade-off between model spatial reproducibility and prediction accuracy. We show that known parts of brain networks can be overlooked when maximization of classification accuracy alone is pursued, with either ℓ₂ and/or ℓ₁ regularization. This supports the view that the quality of spatial patterns extracted from models cannot be assessed purely by focusing on prediction accuracy. Our results instead suggest that model regularization parameters must be carefully selected, so that the model and its visualization enhance our ability to interpret the brain.

Highlights

• We consider classification models widely used within the neuroimaging community.
• Within a resampling framework we evaluate the importance of appropriate selection of model regularization parameters.
• We illustrate a trade-off between model visualization reproducibility and prediction accuracy.
• The quality of spatial patterns extracted from models cannot be assessed purely by focusing on prediction accuracy.
• Optimizing prediction accuracy does not ensure discovery of the relevant brain networks.

Introduction

In recent years there has been increasing interest in applying discriminative multivariate analysis techniques to the analysis of functional neuroimaging data [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. This procedure has also been referred to as mental state decoding [11] and multivoxel pattern analysis [12]. The questions that such approaches attempt to answer are: given a brain scan, can we, based on the activation pattern, infer which of multiple brain states a subject was engaged in when the scan was acquired, and what is the most reliable spatial pattern reflecting the underlying neural encoding of the experiment defining the multiple brain states? This analysis approach has two justifications. First, in clinical settings classification methods can potentially support a diagnosis [13], [14]. Second, in brain mapping applications classification methods provide a principled framework to integrate relevant information on how brain responses are spatially distributed when investigating brain function, i.e., does the brain use local, distributed, or perhaps a combination of these two neural encoding possibilities [15], [16]. While conventional univariate analysis strategies attempt to identify brain regions that behave according to a prespecified model, see e.g. Friston et al. [17], pattern-based classification techniques focus on how information is encoded in the brain, rather than solely on locations in the brain; hence such methods allow for detection of complex non-local relations between behavior and brain state [7].

Model evaluation is a crucial aspect in multivariate analysis of neuroimaging data. After building a model, it is important to assess whether the model captured the statistical regularities of interest in the data. A natural procedure is to quantify performance in terms of the generalizability of the model [2]. Effectively this is done by evaluating the test error. We have more confidence in a model that correctly classifies the brain states, while it is hard to defend a model with poor generalization performance. Additionally we are interested in the interpretability of the classification model. Typically, such model interpretation is done on the basis of a ‘brain map’ that reveals in which voxels the discriminative information resides [6], [18], [19]. Hence, we could say that there is a hidden agenda in the use of classification models in analysis of neuroimaging data. That is, we are interested in how the discriminative information is encoded in the brain, rather than assignment of class labels to scans (since labels often are already known).

When building classifiers on neuroimaging data we face challenges that are well known from other domains, e.g. bioinformatics. Data sets are typically characterized by a high number of features/voxels (10–100 K), while only a relatively small number of examples are available. Hence, strong regularization of the models is often required to avoid over-fitting to the training data.

An example is the support vector machine (SVM) that has demonstrated good performance on classification tasks in neuroimaging data, and seems to be the most popular classification model used, see e.g. [5], [6], [9], [14], [20], [21], [22], [23]. The linear SVM has one regularization parameter C that allows for control of classifier complexity. In neuroimaging contexts, it has been observed that model performance in terms of prediction accuracy is only degraded at low values of the C parameter [6]. These observations seem to support the use of the ‘hard-margin’ SVM, a special instance of the SVM with no regularization parameter. Selection of model regularization parameters is not a major concern of SVM users in the neuroimaging community, which may be explained by the fact that SVMs operated ‘out of the box’ show good generalization performance on present neuroimaging data sets.
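As a minimal illustration (synthetic data and scikit-learn's LinearSVC; not the paper's pipeline or data), the sketch below scans C from strong to weak regularization and reports cross-validated accuracy in the many-voxels/few-scans regime described above:

```python
# A minimal sketch (synthetic data, not the paper's pipeline) of how the
# linear SVM's regularization parameter C influences cross-validated
# accuracy in the "many voxels, few scans" regime.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_scans, n_voxels = 100, 5000             # few examples, many features
X = rng.standard_normal((n_scans, n_voxels))
y = rng.integers(0, 2, n_scans)
X[y == 1, :20] += 0.5                     # 20 weakly informative "voxels"

for C in (1e-4, 1e-2, 1.0, 1e2):          # strong -> weak regularization
    clf = LinearSVC(C=C, max_iter=20000)
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"C = {C:g}: mean CV accuracy = {acc:.2f}")
```

On data of this kind the accuracy curve is often flat over a wide range of C and degrades only at very small C, consistent with the observation in [6]; accuracy alone therefore says little about which C to prefer.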

In addition to high generalization performance, sparsity is a desirable model property that can aid data interpretation, particularly when building models on large-scale data sets. For example, after model fitting the SVM may only require a subset of the training data in order to classify new data examples. Hence, the SVM can impose sparsity in the example/brain scan dimension by removing training examples from that dimension [6]. In large-scale data sets such sparsity often also directly reduces the computational demand/time for classification of new examples. Another strategy is to impose sparsity by selecting a subset of an appropriate basis set, e.g. principal components [4], [24] or independent components [25]. Such approaches attempt to enhance signal detection and model interpretability by restricting the analysis to an informative subset of basis features extracted as linear combinations of voxels. Methods that impose sparsity in the feature/voxel dimension have also been introduced for neuroimaging applications [26], [27], [28], [29]. Methods that enforce sparsity in the voxel dimension by forcing many voxel weights to zero attempt to deal with two issues: (i) if the proportion of voxels that convey discriminative information is small compared to the total number of voxels, the sensitivity of a classifier may be degraded; and (ii) driven by the hypothesis that only a subset of the voxels convey discriminative information, identification of such discriminative voxels may improve model interpretability. That is, a sparsity enforcing model should exclude irrelevant voxels and retain all relevant voxels that improve generalizability and model interpretation. Conventionally, ‘relevant voxels’ have been defined as the subset of voxels that maximizes the classification accuracy [27], [29], an assumption we examine closely below, showing that both ℓ₂ and ℓ₁ regularized techniques may remove more than just the voxels irrelevant to interpretation of the underlying brain networks.
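As a minimal sketch of voxel-dimension sparsity (synthetic data; an ℓ₁-penalized logistic regression via scikit-learn, standing in for the sparse methods cited above):

```python
# A minimal sketch (synthetic data, assumed setup) of sparsity in the voxel
# dimension: an l1-penalized logistic regression drives most voxel weights
# exactly to zero, retaining a small candidate set of "relevant" voxels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_scans, n_voxels = 100, 2000
X = rng.standard_normal((n_scans, n_voxels))
y = rng.integers(0, 2, n_scans)
X[y == 1, :10] += 1.0                     # 10 truly informative "voxels"

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
w = clf.coef_.ravel()
print(f"non-zero voxel weights: {np.count_nonzero(w)} of {n_voxels}")
```

Decreasing C (a stronger ℓ₁ penalty) shrinks the retained voxel set further; whether that set still covers the full relevant network is exactly the question examined below.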

Selection of an appropriate model regularization can be motivated as a means to control the well known bias-variance dilemma (BVD), see e.g. [30], [31]. In its various forms the BVD can be used to explain over-fitting and under-fitting of data in terms of the prediction/curve-fitting error. The BVD is a decomposition of the prediction error into a ‘systematic prediction error’ and a ‘prediction variability error’. We note here that a model can have very stable predictions (low prediction variance) and still have highly uncertain parameters or hidden variables, i.e., interpretable brain patterns. That is, the BVD concerns the prediction error alone, and does not directly consider the parameter uncertainty that is of paramount importance for neuroimaging and other scientific discovery type applications.
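For squared-error loss this decomposition takes the standard textbook form (a general identity, not a result of this paper): with data generated as y = f(x) + ε, noise variance σ², and an estimator f̂ fit on random training sets,

$$
\mathbb{E}\big[(y-\hat f(x))^2\big]
=\underbrace{\big(\mathbb{E}[\hat f(x)]-f(x)\big)^2}_{\text{bias}^2}
+\underbrace{\operatorname{Var}\big[\hat f(x)\big]}_{\text{variance}}
+\underbrace{\sigma^2}_{\text{noise}} .
$$

Note that both bias and variance terms concern the prediction f̂(x), not the weight map itself, which is the point made above.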

While the BVD focuses on a decomposition and explanation of the prediction error, the work that we present here attempts to explore the trade-off between prediction error and visualization reproducibility, or parameter stability, in relation to complexity control of classifiers. By visualization we refer to the brain map based on the model parameters. To measure this visualization reproducibility we use a ‘sample-to-sample’ (i.e., split-half) variability of the model parameters. Our reproducibility metric measures parameter fluctuations in the presence of an unknown parameter bias, since we typically do not know the true spatial brain pattern we hope our experiment will reveal. This parameter bias may or may not be coupled to the prediction bias. We have described the quantification of this “parameter noise” in earlier publications, e.g. [4], [32], [33].
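One natural way to quantify this split-half pattern fluctuation (our notation here; the exact NPAIRS reproducibility metric is defined in Section 2) is the correlation between the weight maps estimated on the two halves of the data,

$$
r \;=\; \frac{\operatorname{cov}\big(\mathbf{w}^{(1)},\mathbf{w}^{(2)}\big)}{\sigma_{\mathbf{w}^{(1)}}\,\sigma_{\mathbf{w}^{(2)}}} ,
$$

where r near 1 indicates a highly reproducible visualization and r near 0 indicates a map dominated by parameter noise.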

Our work is motivated by the strong interest in trustworthy neuroimage visualizations; thus, we introduce the additional objective that the visualization should be stable, i.e., maximally reproducible subject to a prediction constraint.

If the primary goal of data analysis via classification methods is model interpretation, we suggest it is particularly important to assess the reliability/stability of model parameters (e.g. the classifier's weight vector) as a function of choices regarding model regularization parameters in the classifier design [4], [8], [18]. Furthermore, we investigate to what extent the models are capable of discovering the relevant brain network containing discriminative information. To address these questions we apply three well known classification models to two fMRI data sets. One is from a simple finger tapping experiment with a relatively high signal to noise ratio (SNR), where the underlying networks, and hence the spatial representation likely to best support discriminative information, are relatively well understood. The other is from a more complicated object tracing experiment with a lower SNR, characteristic of subtle cognitive differences that are not yet well understood. For model evaluation we use the NPAIRS resampling framework [4], which allows for evaluation of both generalizability and pattern stability/reproducibility in classification models; a schematic sketch of this two-axis evaluation follows below. NPAIRS split-half resampling [33] is related to stable variable selection as recently described in [34], where Meinshausen and Bühlmann provide additional theoretical background and numerical evidence for the benefit of split-half based analyses. In particular, they show that for feature selection in linear regression, under certain exchangeability conditions, split-half procedures lead to finite-sample control of the rate of false discoveries.
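The following is a schematic sketch of the two NPAIRS-style evaluation axes (a simplification of [4]; logistic regression stands in for any of the three classifiers, and scikit-learn's C is an inverse regularization strength, roughly 1/λ):

```python
# Schematic sketch of NPAIRS-style evaluation: for each regularization
# setting, train on each half of the data, predict the held-out half
# (generalization), and correlate the two half-models' weight maps
# (pattern reproducibility).
import numpy as np
from sklearn.linear_model import LogisticRegression

def npairs_style_eval(X, y, C=1.0, n_splits=10, seed=0):
    rng = np.random.default_rng(seed)
    accs, reps = [], []
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        a, b = idx[: len(y) // 2], idx[len(y) // 2 :]
        m1 = LogisticRegression(C=C, max_iter=5000).fit(X[a], y[a])
        m2 = LogisticRegression(C=C, max_iter=5000).fit(X[b], y[b])
        accs.append(0.5 * (m1.score(X[b], y[b]) + m2.score(X[a], y[a])))
        reps.append(np.corrcoef(m1.coef_.ravel(), m2.coef_.ravel())[0, 1])
    return float(np.mean(accs)), float(np.mean(reps))
```

Sweeping the regularization strength and plotting mean accuracy against mean weight-map correlation traces out the prediction/reproducibility trade-off studied in the Results.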

The paper is organized as follows: in Section 2 we present the classification methods, data sets, and resampling framework; Section 3 contains our results; in Section 4 we discuss the results; and in Section 5 we conclude the article.

Section snippets

Materials and methods

This section is organized as follows: first we provide a general description of the classification models, and a brief review of the mathematical formulations of the three classifiers used in this study: the support vector machine (SVM), logistic regression (LogReg), and Fisher's discriminant analysis (FDA); see e.g. [30] for a general introduction to these classification methods and for historical references. All models are also formulated on the basis of a loss+penalty criterion to emphasize …
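As a hedged sketch of that shared form (our notation, chosen to match the λ used in the Results): each classifier minimizes an empirical loss plus a regularization penalty,

$$
\min_{\mathbf{w}} \; \sum_{i=1}^{N} L\big(y_i,\ \mathbf{w}^{\top}\mathbf{x}_i\big) \;+\; \lambda\, J(\mathbf{w}) ,
$$

where L is, e.g., the hinge loss for the SVM or the logistic loss for LogReg, and J(w) is the squared ℓ₂ norm or the ℓ₁ norm of the voxel weights, with λ controlling the regularization strength.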

Results

Fig. 1 shows the performance of the three classifiers for the trailsAB data set over a range of values of the regularization parameter λ. All classifiers showed a transition in prediction accuracy from best accuracy at light regularization to a decreased accuracy at stronger regularization. Around λ=2⁸ we observe maximum accuracy for all classifiers. The SVM showed a somewhat steeper transition from high to low accuracy compared to the other models. All classifiers showed a transition from low …

Model evaluation and interpretation

When building discriminative models of brain states with neuroimaging data sets, by far the most popular way to assess model quality is to consider the generalization performance in terms of test set accuracy [2], [7], [10]. This is a natural approach, since the objective is formulated as prediction of a scan label given the brain scan volume. Generalization performance provides information on how well the model captured some of the statistical regularities of interest in the data. However, …

Conclusion

In this study we investigated the strong but relatively consistent impact of choices of model regularization parameters on the generalization and the reproducibility or stability of spatial brain patterns extracted from classification models in neuroimaging experiments. We have demonstrated the critical importance of regularization choices other than the commonly used maximum classification criterion for different classifiers in two quite different experiments from different centers/scanners …

Acknowledgments

We thank the two anonymous reviewers for their constructive comments. We thank Morten Mørup and Ricardo Henao, DTU Informatics, The Technical University of Denmark for enlightening discussions in the initial phase of this project. We also thank Grigori Yourganov, Rotman Research Institute of Baycrest Centre, Toronto, Canada, for assistance with logistic issues regarding our data sets.


References (76)

  • J. Mourão-Miranda et al., Dynamic discrimination analysis: a spatial-temporal SVM, NeuroImage (2007).
  • L.K. Hansen et al., Generalizable patterns in neuroimaging: how many principal components?, NeuroImage (1999).
  • F. De Martino et al., Combining multivariate voxel selection and support vector machines for mapping and classification of fMRI spatial patterns, NeuroImage (2008).
  • M.K. Carroll et al., Prediction and interpretation of distributed neural activity with sparse models, NeuroImage (2009).
  • S. Ryali et al., Sparse logistic regression for whole-brain classification of fMRI data, NeuroImage (2010).
  • G. Yourganov et al., Dimensionality estimation for optimal detection of functional networks in BOLD fMRI data, NeuroImage (2011).
  • J. Ashburner et al., Unified segmentation, NeuroImage (2005).
  • R. Cox, AFNI: software for analysis and visualization of functional magnetic resonance neuroimages, Computers and Biomedical Research (1996).
  • M. Jenkinson et al., A global optimisation method for robust affine registration of brain images, Medical Image Analysis (2001).
  • A. Guimond et al., Average brain models: a convergence study, Computer Vision and Image Understanding (2000).
  • S. LaConte et al., The evaluation of preprocessing choices in single-subject BOLD fMRI using NPAIRS performance metrics, NeuroImage (2003).
  • A. Riecker et al., Parametric analysis of rate-dependent hemodynamic response functions of cortical and subcortical brain structures during auditorily cued finger tapping: a fMRI study, NeuroImage (2003).
  • S.T. Witt et al., Functional neuroimaging correlates of finger-tapping task variations: an ALE meta-analysis, NeuroImage (2008).
  • S.C. Strother et al., Optimizing the fMRI data-processing pipeline using prediction and reproducibility performance metrics: I. A preliminary group analysis, NeuroImage (2004).
  • A. Marquand et al., Quantitative prediction of subjective pain intensity from whole-brain fMRI data using Gaussian processes, NeuroImage (2010).
  • J.R. Sato et al., Evaluating SVM and MLDA in the extraction of discriminant regions for mental state prediction, NeuroImage (2009).
  • K. Friston et al., Bayesian decoding of brain images, NeuroImage (2008).
  • O. Yamashita et al., Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns, NeuroImage (2008).
  • M.A. van Gerven et al., Efficient Bayesian multivariate fMRI analysis using a sparsifying spatio-temporal prior, NeuroImage (2010).
  • B. Thirion et al., Analysis of a large fMRI cohort: statistical and methodological issues for group analyses, NeuroImage (2007).
  • B. Lautrup et al., Massive weight sharing: a cure for extremely ill-posed problems.
  • N. Mørch et al., Nonlinear versus linear models in functional neuroimaging: learning curves and generalization crossover.
  • R. Kustra et al., Penalized discriminant analysis of [15-O]-water PET brain images with prediction error selection of smoothness and regularization hyperparameters, IEEE Transactions on Medical Imaging (2001).
  • A. O'Toole et al., Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data, Journal of Cognitive Neuroscience (2007).
  • S.J. Hanson et al., Brain reading using full brain support vector machines for object recognition: there is no “face” identification area, Neural Computation (2008).
  • J.-D. Haynes et al., Decoding mental states from brain activity in humans, Nature Reviews Neuroscience (2006).
  • S. Klöppel et al., Automatic classification of MR scans in Alzheimer's disease, Brain (2008).
  • J. Cohen et al., The face of controversy, Science (2001).

Peter M. Rasmussen received his Bachelor's/Master's degrees in biomedical engineering from the Technical University of Denmark and the University of Copenhagen, in 2006 and 2008, respectively. He is currently pursuing his Ph.D. degree at DTU Informatics, Technical University of Denmark. His research interests are in the fields of neuroimaging and machine learning.

Lars K. Hansen received his Ph.D. in physics from the University of Copenhagen in 1986. He is a professor of signal processing at DTU Informatics, Technical University of Denmark. His research interests include statistical machine learning and applications in biomedicine and digital media.

Kristoffer H. Madsen obtained his Ph.D. in machine learning with neuroimaging applications from the Technical University of Denmark. He is currently a postdoc at the Danish Research Centre for Magnetic Resonance, where his main focus is on the development and application of new methods for the analysis of fMRI data.

Nathan W. Churchill obtained his Bachelor's degree at Carleton University (B.Sc. Biology and Physics, 2008). He is currently completing his Ph.D. at the University of Toronto, in the Department of Medical Biophysics. His current research interests include data pre-processing and analysis techniques for functional MRI.

Stephen C. Strother received his Ph.D. in electrical engineering from McGill University in 1986. He is currently a Senior Scientist at the Rotman Research Institute, Assistant Director of the Center for Stroke Recovery, Baycrest, and Professor of Medical Biophysics, University of Toronto. His research interests include neuroinformatics and machine learning techniques for neuroimaging of human performance and cognition across the age-span.
