
## About this book

This book is intended for use in advanced graduate courses in statistics and machine learning, for experimental neuroscientists seeking a deeper understanding of statistical methods, and for theoretical neuroscientists with a limited background in statistics. It reviews almost all areas of applied statistics, from basic statistical estimation and test theory, through linear and nonlinear approaches to regression and classification, to model selection and methods for dimensionality reduction, density estimation and unsupervised clustering. Its focus, however, is linear and nonlinear time series analysis from a dynamical systems perspective, from which it also aims to convey an understanding of the dynamical mechanisms that could have generated the observed time series. Further, it integrates computational modeling of behavioral and neural dynamics with statistical estimation and hypothesis testing. In this way, computational models in neuroscience become not only explanatory frameworks but powerful, quantitative data-analytical tools in their own right, enabling researchers to look beyond the data surface and unravel underlying mechanisms. Interactive examples of most methods are provided through a package of MATLAB routines, encouraging a playful approach to the subject and giving readers a better feel for the practical aspects of the methods covered.

“Computational neuroscience is essential for integrating and providing a basis for understanding the myriads of remarkable laboratory data on nervous system functions. Daniel Durstewitz has excellently covered the breadth of computational neuroscience from statistical interpretations of data to biophysically based modeling of the neurobiological sources of those data. His presentation is clear, pedagogically sound, and readily useable by experts and beginners alike. It is a pleasure to recommend this very well crafted discussion to experimental neuroscientists as well as mathematically well versed Physicists. The book acts as a window to the issues, to the questions, and to the tools for finding the answers to interesting inquiries about brains and how they function.”

Henry D. I. Abarbanel

Physics and Scripps Institution of Oceanography, University of California, San Diego

“This book delivers a clear and thorough introduction to sophisticated analysis approaches useful in computational neuroscience. The models described and the examples provided will help readers develop critical intuitions into what the methods reveal about data. The overall approach of the book reflects the extensive experience Prof. Durstewitz has developed as a leading practitioner of computational neuroscience.”

Bruno B. Averbeck

## Table of contents

### Chapter 1. Statistical Inference

Abstract
This first chapter briefly reviews basic statistical concepts, ways of thinking, and ideas that will recur throughout the book, as well as some general principles and mathematical techniques for handling them. In this sense, it lays out some of the ground on which the statistical methods developed in later chapters rest. The reader is assumed to be familiar with core concepts in probability theory and statistics, such as expected values, probability distributions like the binomial or Gaussian, Bayes’ rule, or analysis of variance. The presentation in this chapter is quite condensed and mainly serves to summarize and organize key facts and concepts required later, and to place special emphasis on some topics. Although the chapter is self-contained, readers who have not yet taken an introductory statistics course may find it advisable to consult the introductory chapters of a basic statistics textbook first (very readable introductions are provided, for instance, by Hays 1994, or Wackerly et al. 2007; Kass et al. 2014, in particular, give a highly recommended introduction specifically targeted at a neuroscience readership). More generally, the first six chapters are intended mainly to extract and summarize essential points and concepts from the literature cited.
Daniel Durstewitz

### Chapter 2. Regression Problems

Abstract
Assume we would like to predict variables y from variables x through a function f(x) such that the squared deviations between actual and predicted values are minimized (a so-called squared error loss function, see Eq. 1.11). Then the regression function that optimally achieves this is given by f(x) = E(y|x) (Winer 1971; Bishop 2006; Hastie et al. 2009); that is, the goal in regression is to model the conditional expectation of y (the “outputs” or “responses”) given x (the “predictors” or “regressors”). For instance, we may have recorded in vivo the average firing rate of p neurons on N independent trials i, arranged in a set of row vectors $$X = \{x_1, \ldots, x_i, \ldots, x_N\}$$, and would like to see whether we can use these to predict the movement direction (angle) $$y_i$$ of the animal on each trial (a “decoding” problem). This is a typical multiple regression problem (where “multiple” indicates that there is more than one predictor). Had we also measured more than one output variable, e.g., several movement parameters like angle, velocity, and acceleration, which we would like to relate to the firing rates of the p recorded neurons, we would enter the domain of multivariate regression.
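A minimal sketch of the decoding example in Python with NumPy, assuming a linear model for E(y|x) and synthetic data. The Poisson firing rates, the coefficient values, and the helper names `fit_multiple_regression` and `predict` are hypothetical, not taken from the text. For simplicity the response is treated as an unbounded real number, ignoring the circular nature of an angle.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Least-squares fit of the linear model y ~ beta0 + X @ beta.

    X : (N, p) array, one row of p firing rates per trial (hypothetical data).
    y : (N,) array, one scalar response per trial.
    Returns the intercept beta0 and the (p,) slope vector beta.
    """
    design = np.column_stack([np.ones(X.shape[0]), X])  # prepend a column of ones for the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


def predict(beta0, beta, X):
    """Estimate of the conditional expectation E(y|x) under the fitted linear model."""
    return beta0 + X @ beta


# Synthetic example: N trials, p neurons with Poisson rates, linear ground truth plus Gaussian noise
rng = np.random.default_rng(0)
N, p = 500, 4
X = rng.poisson(lam=10.0, size=(N, p)).astype(float)
true_beta0, true_beta = 0.5, np.array([0.2, -0.1, 0.05, 0.0])
y = true_beta0 + X @ true_beta + rng.normal(scale=0.1, size=N)

beta0_hat, beta_hat = fit_multiple_regression(X, y)
y_hat = predict(beta0_hat, beta_hat, X)
```

Under squared error loss, the least-squares fit estimates the best linear approximation to E(y|x); capturing a nonlinear f(x) would require more flexible models.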
Daniel Durstewitz

### Chapter 3. Classification Problems

Abstract
In classification problems, the objective is to assign observations to a set of K discrete classes $$C \in \{1, \ldots, K\}$$. To this end, one often tries to estimate or approximate the posterior probabilities $$p(k|x) \equiv p(C = k|x)$$. Given these, one could classify a new observation x into the class with the highest posterior probability, $$C^{*} = \arg\max_k p(C = k|x)$$.
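As an illustration, the following Python sketch assumes Gaussian class-conditional densities with a shared isotropic variance and known class means and priors (all hypothetical choices). It computes the posteriors p(C = k|x) by Bayes' rule and assigns each x to the class with the largest posterior, $$C^{*} = \arg\max_k p(C = k|x)$$. Classes are indexed 0 to K−1 in the code rather than 1 to K.

```python
import numpy as np


def gaussian_log_density(X, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I), evaluated at each row of X."""
    p = X.shape[1]
    sq_dist = np.sum((X - mean) ** 2, axis=1)
    return -0.5 * sq_dist / var - 0.5 * p * np.log(2.0 * np.pi * var)


def posteriors(X, means, var, priors):
    """Posterior p(C = k | x) for each row of X via Bayes' rule; each row sums to 1."""
    log_joint = np.column_stack(
        [gaussian_log_density(X, m, var) + np.log(pk) for m, pk in zip(means, priors)]
    )
    log_joint -= log_joint.max(axis=1, keepdims=True)  # subtract row maximum for numerical stability
    joint = np.exp(log_joint)
    return joint / joint.sum(axis=1, keepdims=True)


def classify(X, means, var, priors):
    """Assign each x to the class with the largest posterior probability."""
    return np.argmax(posteriors(X, means, var, priors), axis=1)


# Two hypothetical classes in a 2-D feature space
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
X_new = np.array([[0.2, -0.1], [2.9, 3.2], [1.6, 1.6]])
labels_equal = classify(X_new, means, var=1.0, priors=[0.5, 0.5])
labels_skewed = classify(X_new, means, var=1.0, priors=[0.9, 0.1])
```

Changing the priors shifts the decision boundary: the point (1.6, 1.6) is assigned to the second class under equal priors, but to the first class when the first prior is 0.9.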
Daniel Durstewitz

### Chapter 4. Model Complexity and Selection

Abstract
Chapter 2 introduced the bias-variance tradeoff along with approaches for regulating model complexity through some parameter λ, but how should λ be chosen? This touches on a fundamental issue in statistical model fitting or parameter estimation: we usually have only a comparatively small sample from a much larger population, but we really want to make statements about the population as a whole. Now, if we choose a sufficiently flexible model, e.g., a local or spline regression model with many parameters, we can always achieve a perfect fit to the training data, as we already saw in Chap. 2 (see Fig. 2.5). The problem is that such a fit may say little about the true underlying population, as we may have mainly fitted noise: we have overfit the data, and consequently our model will generalize poorly to new observations not used for fitting. Note that what matters here is not only the nominal number of parameters but also the functional form or flexibility of the model and the constraints placed on its parameters. For instance, we obviously cannot accurately capture a nonlinear functional relationship with a (globally) linear model, regardless of how many parameters it has. Or, as noted before, in basis expansions and kernel approaches the effective number of parameters may be much smaller than the nominal one, as the variables are constrained by their functional relationships. This chapter, especially the following discussion and Sects. 4.1–4.4, largely follows the exposition in Hastie et al. (2009; but see also the brief discussion in Bishop 2006, from a slightly different angle).
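One common answer is cross-validation. The sketch below assumes ridge regression as the complexity-regulated model and 5-fold cross-validation (both illustrative choices; the helper names are hypothetical) and picks λ by minimizing the held-out prediction error on synthetic data with N = 40 observations and p = 30 predictors, only three of which carry signal.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge regression on centered data: beta = (Xc^T Xc + lam I)^{-1} Xc^T yc; intercept unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta


def cv_error(X, y, lam, n_folds=5, seed=0):
    """Mean squared prediction error on held-out folds (K-fold cross-validation)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        b0, b = ridge_fit(X[train_idx], y[train_idx], lam)
        errs.append(np.mean((y[test_idx] - (b0 + X[test_idx] @ b)) ** 2))
    return float(np.mean(errs))


def choose_lambda(X, y, lambdas, n_folds=5):
    """Return the lambda with the lowest cross-validation error, plus all CV errors."""
    errors = [cv_error(X, y, lam, n_folds) for lam in lambdas]
    return lambdas[int(np.argmin(errors))], errors


# Synthetic data: more parameters than the sample comfortably supports
rng = np.random.default_rng(1)
N, p = 40, 30
X = rng.normal(size=(N, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.normal(scale=1.0, size=N)

lambdas = np.logspace(-3, 3, 13)
best_lam, cv_errors = choose_lambda(X, y, lambdas)

# In-sample (training) error for comparison
train_errors = []
for lam in lambdas:
    b0, b = ridge_fit(X, y, lam)
    train_errors.append(np.mean((y - (b0 + X @ b)) ** 2))
```

The training error is recorded to show why it cannot serve this purpose: it never increases as λ shrinks, so it always favors the most flexible model.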
Daniel Durstewitz

### Chapter 5. Clustering and Density Estimation

Abstract
In classification approaches as described in Chap. 3, we have a training sample X with known class labels C, and we use this information either to estimate the conditional probabilities p(C = k|x) or to set up class boundaries (decision surfaces) by some other, more direct criterion. In clustering we likewise assume that there is some underlying class structure in the data, except that we do not know it and have no access to class labels C for our sample X, so we have to infer it from X alone. This is also called an unsupervised statistical learning problem. In neurobiology this problem occurs frequently, for instance, when we suspect that neural cells in a brain area fall into different types (judging from their morphological and/or electrophysiological characteristics), when gene sets cluster in functional pathways, when we believe that neural spiking patterns generated spontaneously in a given area are not arranged along a continuum but come from discrete categories (possibly indicative of attractor dynamics; see Chap. 9), or when rodents appear to use a discrete set of behavioral patterns or response strategies. In many such circumstances, we may feel that similarities between observations (observed feature sets) point to an underlying mechanism that produces discrete types, but how could we extract such apparent structure and characterize it more formally? More precisely, we are looking for some partition $$G: {\mathbb{R}}^p \to \{1, \ldots, K\}$$ of the p-dimensional real space (or some other feature space; the features need not be real numbers) that reveals its intrinsic structure. In fact, rather than searching for one specific partition, we may aim for a hierarchically nested set of partitions; that is, classes may split into subclasses and so on, as is the case with many natural categories and biological taxonomies.
For instance, at a superordinate level, we may group cortical cell types into pyramidal cells and interneurons, which in turn would split into several subclasses (like fast-spiking, bursting, “stuttering,” etc.).
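A minimal sketch of one simple way to obtain such a partition, K-means clustering (Lloyd's algorithm) in Python with NumPy. The farthest-point initialization, the number of clusters K, and the two synthetic Gaussian blobs are assumptions for illustration. This yields a single flat partition rather than a hierarchically nested set, and it is not necessarily the method the chapter develops.

```python
import numpy as np


def farthest_point_init(X, K, rng):
    """Pick one random point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, K):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d2)])
    return np.array(centroids, dtype=float)


def kmeans(X, K, n_iter=100, seed=0):
    """Lloyd's algorithm: returns labels in {0, ..., K-1} and the K centroids."""
    rng = np.random.default_rng(seed)
    centroids = farthest_point_init(X, K, rng)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance)
        d2 = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
        labels = np.argmin(d2, axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array(
            [X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k] for k in range(K)]
        )
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Two well-separated synthetic groups of 2-D feature vectors
rng = np.random.default_rng(2)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])
labels, centroids = kmeans(X, K=2)
```

K-means presupposes that K is known and that clusters are roughly spherical; choosing K is itself a model selection problem.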
Daniel Durstewitz

### Chapter 6. Dimensionality Reduction

Abstract
For the purposes of visualization and ease of interpretation, to remove redundancies from the data, or to combat the curse of dimensionality (Sect. 4.4), it may be useful to reduce the dimensionality of the original p-dimensional feature space. This should, of course, be done in a way that minimizes the potential loss of information, where the precise definition of “loss of information” may depend on the statistical and scientific questions asked. There are both linear and nonlinear methods for dimensionality reduction. This chapter starts with by far the most popular procedure, principal component analysis.
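A minimal sketch of principal component analysis via the singular value decomposition of the centered data matrix, in Python with NumPy. The function name `pca`, the single latent factor, and its loading pattern are hypothetical choices for illustration.

```python
import numpy as np


def pca(X, n_components):
    """PCA of an (N, p) data matrix via SVD of the column-centered data.

    Returns the (N, n_components) scores, the (n_components, p) orthonormal
    principal directions, and the fraction of total variance each one explains.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]          # rows are orthonormal directions in feature space
    scores = Xc @ components.T              # projections of the data onto those directions
    explained = s**2 / np.sum(s**2)         # variance fraction per component
    return scores, components, explained[:n_components]


# Five observed features driven mainly by one latent factor plus small noise
rng = np.random.default_rng(3)
N = 200
z = rng.normal(size=(N, 1))
loading = np.array([[1.0, 2.0, 0.0, -1.0, 0.5]])
X = z @ loading + 0.1 * rng.normal(size=(N, 5))
scores, components, explained = pca(X, n_components=2)
```

Here the explained-variance fractions make the "loss of information" explicit: nearly all variance lies along one direction, so a single component would retain almost everything.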
Daniel Durstewitz

### Chapter 7. Linear Time Series Analysis

Abstract
From a purely statistical point of view, one major difference between time series and the data sets discussed in the previous chapters is that temporally consecutive measurements are usually highly dependent, thus violating the assumption of independently and identically distributed observations on which most conventional statistical inference relies. Before we dive deeper into this topic, we note that the independence assumption is violated not only in time series but also in a number of other common test situations. Hence, beyond the area of time series, statistical models and methods have been developed to deal with such scenarios. Most importantly, the assumption of independent observations is abandoned in the class of mixed models, which combine fixed and random effects and are suited for both nested and longitudinal (i.e., time series) data (see, e.g., Khuri et al. 1998; West et al. 2006, for more details). Aarts et al. (2014) discuss these models specifically in the context of neuroscience, where dependent and nested data other than time series occur frequently, e.g., when we have recordings from multiple neurons, nested within animals, nested within treatment groups, thus introducing dependencies. Besides including random effects, mixed models can account for dependency by allowing much more flexible (parameterized) forms for the covariance matrices involved. For instance, in a regression model like Eq. (2.6) we may assume a full covariance matrix Σ for the error terms [instead of the scalar form assumed in Eq. (2.6)] that captures some of the correlations among observations. Taking such a full covariance structure for Σ into account, under the multivariate normal model the ML estimator for the parameters β becomes (West et al. 2006)

$$\hat{\beta} = \left(X^{\mathrm{T}} \Sigma^{-1} X\right)^{-1} X^{\mathrm{T}} \Sigma^{-1} y.$$
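For a known error covariance Σ, this ML estimator is the generalized least-squares (GLS) solution $$\hat{\beta} = (X^{\mathrm{T}} \Sigma^{-1} X)^{-1} X^{\mathrm{T}} \Sigma^{-1} y$$. A minimal Python sketch, assuming Σ is known and has an AR(1)-type structure $$\Sigma_{ij} = \sigma^2 \rho^{|i-j|}$$ (a hypothetical example; in mixed models Σ is parameterized and its parameters are estimated from the data):

```python
import numpy as np


def gls_estimate(X, y, Sigma):
    """beta_hat = (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y for a known error covariance Sigma."""
    Si_X = np.linalg.solve(Sigma, X)   # Sigma^{-1} X, without forming the inverse explicitly
    Si_y = np.linalg.solve(Sigma, y)   # Sigma^{-1} y
    return np.linalg.solve(X.T @ Si_X, X.T @ Si_y)


def ar1_covariance(n, rho, sigma2=1.0):
    """Hypothetical example covariance: Sigma_ij = sigma2 * rho^|i - j| (AR(1)-type errors)."""
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])


# Regression with serially correlated errors
rng = np.random.default_rng(4)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])      # intercept plus one regressor
beta = np.array([1.0, 0.5])
Sigma = ar1_covariance(n, rho=0.8)
eps = np.linalg.cholesky(Sigma) @ rng.standard_normal(n)   # errors with covariance Sigma
y = X @ beta + eps

beta_gls = gls_estimate(X, y, Sigma)
beta_ols = gls_estimate(X, y, np.eye(n))   # with Sigma = I this reduces to ordinary least squares
```

Solving linear systems with `np.linalg.solve` rather than inverting Σ is the numerically preferable choice, especially for long series.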
Daniel Durstewitz

### Chapter 8. Nonlinear Concepts in Time Series Analysis

Abstract
In biology, and in neuroscience in particular, the dynamical processes generating the observed time series will commonly be (highly) nonlinear. A prominent example is the very essence of neural communication itself, the action potential, which is generated by strongly nonlinear feedback between sodium and potassium channel gating and the membrane potential (or by interactions among the channels themselves; Naundorf et al. 2006). Stable oscillations, as frequently encountered in neural systems and detected, e.g., in EEG or local field potentials, are nonlinear phenomena as well. This does not imply that linear time series analysis is not useful. Linear models, especially in very noisy or chaotic (see below) situations, may still provide a good approximation; they may still capture the bulk of the deterministic dynamics in a sea of noise and explain most of the deterministic variance of the process (Perretti et al. 2013). Even if they capture only a modest proportion of the deterministic fluctuations in the data, they can still serve as hypothesis-testing tools in some situations. But linear systems are very limited computationally (and computation is arguably the most important biological purpose of brains), and they cannot capture a number of prominent biophysical phenomena.
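To make the point about stable oscillations concrete, the Python sketch below (forward-Euler integration; step size, parameters, and helper names are illustrative assumptions) compares the nonlinear van der Pol oscillator with a linear damped oscillator. The van der Pol system settles on a limit cycle whose amplitude (about 2 for μ = 1) does not depend on the initial condition, whereas in the linear system the late amplitude simply scales with the initial condition.

```python
import numpy as np


def euler(f, state0, dt=2e-3, t_end=50.0):
    """Forward-Euler integration of ds/dt = f(s); returns the (n_steps + 1, 2) trajectory."""
    n_steps = int(round(t_end / dt))
    traj = np.empty((n_steps + 1, 2))
    traj[0] = state0
    for k in range(n_steps):
        traj[k + 1] = traj[k] + dt * f(traj[k])
    return traj


def van_der_pol(s, mu=1.0):
    """Nonlinear oscillator x'' = mu (1 - x^2) x' - x, written as a first-order system."""
    x, v = s
    return np.array([v, mu * (1.0 - x**2) * v - x])


def linear_oscillator(s, zeta=0.05):
    """Linear damped oscillator x'' = -2 zeta x' - x."""
    x, v = s
    return np.array([v, -2.0 * zeta * v - x])


def late_amplitude(traj, dt=2e-3, window=20.0):
    """Largest |x| over the final `window` time units of a trajectory."""
    return float(np.max(np.abs(traj[-int(round(window / dt)):, 0])))


# Two very different initial conditions for each system
amp_vdp_small = late_amplitude(euler(van_der_pol, [0.1, 0.0]))
amp_vdp_large = late_amplitude(euler(van_der_pol, [4.0, 0.0]))
amp_lin_small = late_amplitude(euler(linear_oscillator, [0.1, 0.0]))
amp_lin_large = late_amplitude(euler(linear_oscillator, [4.0, 0.0]))
```

The proportional scaling in the linear case follows directly from superposition, which is why a linear system cannot produce an attracting oscillation of fixed amplitude (a limit cycle).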
Daniel Durstewitz

### Chapter 9. Time Series from a Nonlinear Dynamical Systems Perspective

Abstract
Nonlinear dynamics is a huge field in mathematics and physics, and we will hardly be able to scratch its surface here. Nevertheless, this field is so tremendously important for our theoretical understanding of brain function and time series phenomena that I felt a book on statistical methods in neuroscience should discuss at least some of its core concepts. Some grasp of nonlinear dynamical systems may give important insights into how the observed time series were generated. In fact, nonlinear dynamics provides a kind of universal language for mathematically describing the deterministic part of the dynamical systems generating the observed time series; we will see later (Sect. 9.3) how to connect these ideas to stochastic processes and statistical inference. ARMA and state space models as discussed in Sects. 7.2 and 7.5 are examples of discrete-time, linear dynamical systems driven by noise. However, linear dynamical systems can exhibit only a limited repertoire of dynamical behaviors and typically do not capture a number of prominent and computationally important phenomena observed in physiological recordings. In the following, we will distinguish between models defined in discrete time (Sect. 9.1), like all the time series models discussed so far, and continuous-time models (Sect. 9.2).
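A small Python sketch contrasting the two kinds of discrete-time systems mentioned here, under illustrative assumptions (parameter values and helper names are hypothetical): a linear AR(1) process $$x_t = a x_{t-1} + \varepsilon_t$$, whose deterministic part with |a| < 1 can only decay to a fixed point, and the nonlinear logistic map $$x_{t+1} = r x_t (1 - x_t)$$, which for r = 3.2 settles on an attracting period-2 cycle.

```python
import numpy as np


def iterate_map(f, x0, n_steps):
    """Iterate x_{t+1} = f(x_t) from x0; returns the n_steps + 1 states."""
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    for t in range(n_steps):
        xs[t + 1] = f(xs[t])
    return xs


def logistic_map(x, r=3.2):
    """Nonlinear map x_{t+1} = r x_t (1 - x_t)."""
    return r * x * (1.0 - x)


def ar1_step(x, a=0.8, noise_sd=0.0, rng=None):
    """Linear AR(1) step x_t = a x_{t-1} + eps_t; with noise_sd = 0 only the deterministic part remains."""
    eps = rng.normal(scale=noise_sd) if (rng is not None and noise_sd > 0) else 0.0
    return a * x + eps


logistic_traj = iterate_map(logistic_map, 0.1, 1000)   # deterministic, nonlinear
ar1_traj = iterate_map(ar1_step, 1.0, 1000)            # deterministic part of AR(1): decays to 0

# Noise-driven AR(1): a stationary series fluctuating around 0
rng = np.random.default_rng(5)
ar1_noisy = iterate_map(lambda x: ar1_step(x, noise_sd=1.0, rng=rng), 0.0, 1000)
```

The period-2 points can be checked analytically as $$x = \left[(r+1) \pm \sqrt{(r+1)(r-3)}\right]/(2r)$$, roughly 0.513 and 0.799 for r = 3.2.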
Daniel Durstewitz

### Backmatter
