Construction of multivariate surrogate sets from nonlinear data using the wavelet transform

https://doi.org/10.1016/S0167-2789(03)00136-2

Abstract

The use of surrogate data has become a crucial first step in the study of nonlinearity in time series data. A widely used technique to construct surrogate data is to randomize the phases of the data in the Fourier domain. In this paper, an alternative technique based on the resampling of wavelet coefficients is discussed. This approach exploits between-scale correlations that exist within nonlinear data but which are either absent or weak in stochastic data. It proceeds by transforming the data into the wavelet domain and permuting the wavelet coefficients. Experimental and numerical time series data are used to demonstrate that the performance of the wavelet resampling technique is comparable to phase randomization in terms of the preservation of linear properties, the removal of nonlinear structure, and computational demands. However, the wavelet technique may have specific and distinct advantages in the application to complex data sets, such as numerical analysis of turbulence and experimental brain imaging data, where wavelets give a more parsimonious representation of spatio-temporal patterns than Fourier modes. It is shown that different techniques of resampling the data in the wavelet domain may optimize the construction of surrogate data according to the properties of the experimental time series and computational constraints.

Introduction

The nonlinear properties of biophysical systems such as the brain have motivated the nonlinear characterization of time series data in many studies. However, before characterizing the nonlinear properties of time series data, it is important to first determine that the data do, in fact, contain nonlinear structure. It is otherwise possible that the values of ‘nonlinear invariants’ will merely reflect practical limitations of the data such as a finite sample length [1], [2], experimental filtering [3] or other linear properties such as the power spectra [4]. A data set can be said to be nonlinear if it meets two criteria. First, the data must permit rejection of the null hypothesis that they are purely linear, or have a linear origin but have been distorted by a static nonlinear measurement effect. Second, alternative factors that may allow rejection of this null hypothesis, such as nonstationary stochastic processes, must be excluded [5], [6]. Only if these conditions have been met is it then reasonable to use nonlinear methods to further characterize the properties of the time series.

A simple yet powerful method of testing such a null hypothesis is to resample the original data in such a way that the linear properties of the data are preserved, but any nonlinear structure is removed. The values of a nonlinear measure calculated from an ensemble of such “surrogate data” represent their expected distribution under the null hypothesis. A straightforward statistical comparison between this null distribution and the value derived from the experimental data then permits formal testing of the null hypothesis. This “bootstrap” technique of generating a null distribution has a well-established role in statistics. It was introduced into the dynamical systems context by Pijn et al. [7] and Theiler et al. [8], where it has had a significant impact (for an overview see [9]). For example, prior to surrogate data testing, several studies had concluded that scalp EEG data from healthy human subjects were chaotic. Reappraisal of these data using surrogate techniques has typically shown that nonlinearity does occur, but only weakly and/or infrequently [10], [11], [12], [13].
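The testing procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice of a time-reversal asymmetry statistic, the two-sided rank-based p-value, and the placeholder shuffle generator are all assumptions made for the example. Any generator that preserves the linear properties of the data could be passed in place of `shuffle_surrogate`.

```python
import numpy as np

def time_reversal_asymmetry(x, lag=1):
    """Assumed example statistic: mean cubed lagged difference."""
    x = np.asarray(x, dtype=float)
    diff = x[lag:] - x[:-lag]
    return float(np.mean(diff ** 3))

def shuffle_surrogate(x, rng):
    """Placeholder generator: random permutation without replacement.
    It only tests the null hypothesis of uncorrelated noise."""
    return rng.permutation(np.asarray(x, dtype=float))

def surrogate_test(x, make_surrogate, statistic, n_surrogates=99, seed=0):
    """Compare the statistic of x with its distribution over surrogates."""
    rng = np.random.default_rng(seed)
    observed = statistic(x)
    # Null distribution: the statistic evaluated on each surrogate.
    null = np.array([statistic(make_surrogate(x, rng))
                     for _ in range(n_surrogates)])
    # Two-sided rank-based p-value, measured from the null mean.
    centre = null.mean()
    n_extreme = int(np.sum(np.abs(null - centre) >= abs(observed - centre)))
    p_value = (n_extreme + 1) / (n_surrogates + 1)
    return observed, null, p_value
```

With 99 surrogates the smallest attainable p-value under this scheme is 0.01, so the ensemble size bounds the significance level that can be reported.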

The simplest method of data resampling is to randomly permute the temporal order of the data (without replacement). However, unless strongly constrained [14], such a process typically destroys linear as well as nonlinear correlations. A solution to this problem, and the most widely implemented technique for generating surrogate data, is to resample the data in the Fourier domain [8]. That is, the data are Fourier transformed and the phase of each frequency is rotated by an independent random number p∈(0,2π). The inverse Fourier-transformed data have, on average, the same spectral properties as the original data, but with any nonlinear structure removed. Such a procedure is easy to implement and associated with minimal computational demands, particularly if the sample length is a power of 2. There are, however, important caveats of this approach, including the effect on the amplitude distribution of the data and the extension to spatially extended data sets, which to our knowledge has not yet been illustrated.
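A minimal sketch of the phase randomization step, assuming a real-valued univariate series and NumPy's real FFT routines, is given below. The function name and the handling of the zero-frequency and Nyquist terms (left unrotated so that the inverse transform stays real) are choices made for this illustration rather than details taken from the cited work.

```python
import numpy as np

def phase_randomized_surrogate(x, rng=None):
    """Rotate the phase of each Fourier component by an independent
    random angle and transform back to the time domain."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    # One independent rotation angle per non-negative frequency.
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    phases[0] = 0.0           # zero frequency (mean) must stay real
    if n % 2 == 0:
        phases[-1] = 0.0      # Nyquist term must stay real for even n
    randomized = spectrum * np.exp(1j * phases)
    return np.fft.irfft(randomized, n=n)
```

In this sketch each realization has the same amplitude spectrum as the input, and hence the same circular autocorrelation, while the phase relationships between frequencies are scrambled.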

Wavelets are a relatively novel signal analysis tool that have already had important applications in engineering, biology and physics. They are ideally suited to signals with transient temporal properties, multiscale structure and spatial extension. Wavelets hence show great potential in the study of nonlinear and biological systems. For example, Guan et al. [15] found that just three wavelet modes were able to capture the rich dynamics of a complex nonlinear spatio-temporal system, whereas greater than 20 Fourier modes were required. Wavelets represent a simple method of capturing the correlations between scales exhibited by coupled chaotic oscillators [16]. Wavelets are able to efficiently capture neural signals isolated in time and space, such as epileptiform discharges in scalp EEG [17], [18] and isolated areas of transient activation in intracranial functional magnetic resonance imaging (fMRI) [19], [20], [21].

The parsimonious representation of complex spatio-temporal and nonlinear data by wavelet decomposition suggests that a wavelet-based surrogate algorithm may have many important applications in the study of complex biophysical systems. In this paper, a wavelet-based method of constructing surrogate data for nonlinear hypothesis testing is presented and applied to both numeric and experimental test data. Different methods of resampling the data in the ‘wavelet domain’ are compared. It is shown how the technique can be easily extended to multivariate data sets. A consideration of the effect of wavelet resampling on the amplitude distribution of the original data is presented. This motivates a practical approach to construct surrogate data in the context of the constraints imposed in the real experimental setting.

Section snippets

Wavelet resampling

In this section, a brief technical review of a wavelet decomposition of a time series is given. The ‘decorrelating’ properties of the wavelet transform are discussed, and the technique of resampling in the wavelet domain is described.
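The general idea of transforming the data into the wavelet domain and permuting the coefficients can be illustrated with the sketch below. It rests on several assumptions that are not taken from the paper: an orthonormal Haar wavelet implemented directly in NumPy, a series whose length is a power of two, and a free random permutation of the detail coefficients within each scale. The wavelet family and the specific resampling schemes compared by the authors are not shown in this excerpt, and all function names are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal Haar decomposition of a power-of-two length signal.
    Returns (approximation, details ordered from finest to coarsest)."""
    approx = np.asarray(x, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_idwt(approx, details):
    """Invert haar_dwt, rebuilding from the coarsest scale upwards."""
    approx = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def wavelet_permutation_surrogate(x, rng=None):
    """Permute the detail coefficients within each scale, then invert."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    if x.size < 2 or x.size & (x.size - 1):
        raise ValueError("this sketch needs a power-of-two length")
    approx, details = haar_dwt(x)
    # Shuffling within a scale keeps the energy at that scale fixed
    # but breaks the alignment of coefficients across scales.
    shuffled = [rng.permutation(d) for d in details]
    return haar_idwt(approx, shuffled)
```

Because the transform is orthonormal and only the ordering within each scale changes, the mean, the total energy, and the energy at every scale of the surrogate match those of the input in this sketch.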

Application to numerical and experimental data

In the following sections the above concepts are applied to time series data obtained from (1) a coupled Rössler dynamical system, (2) a static nonlinear transformation of a colored noise source, and (3) human scalp EEG data. The effects of different types of wavelet resampling are illustrated. The results and computational demands are compared to the phase randomization approach.

Extension to multivariate data

Many complex biophysical systems—such as the nervous system—are characterized by sparsely interconnected local nonlinear subsystems. Since the initial observations of synchronization in chaotic systems [32] there has been much interest in the role of complex, nonlinear interdependence between different subsystems within such a network (e.g. [13], [33]). This can be studied by examining multivariate time series data sets recorded from different regions of the system. Research in this area has

Effect of wavelet resampling on amplitude distribution of time series

A significant problem that is known to be associated with the phase randomization technique is that the amplitude distribution of the surrogate data is, on average, Gaussian. Although stochastic signals, such as produced by Eq. (16), have Gaussian amplitude distributions, it is possible that the process of recording the data introduces a static nonlinear distortion of the signal, as modelled by Eq. (17). A nonlinear measurement effect is thought to be present in the case of fMRI of blood flow

Effect of wavelet resampling on nonlinear structure of time series

Just as the preservation of linear structure and the potential for multiple distinct realizations are essential characteristics of any surrogate technique, so too is the removal of any nonlinear structure that may be present. Whereas the first two properties are essential to minimize the rate of false positive rejections of the null hypothesis, removal of nonlinear structure is essential to limit the rate of false negatives. That is, failure to properly destroy any nonlinear structure in the

Computation demands of wavelet resampling

As discussed above, in the real experimental setting, it is often desirable to test multiple data sets for nonlinearity. To ensure that the rate of false positive rejections of the null hypothesis is minimized, it may be necessary to construct thousands of surrogate data sets, particularly if the data sets are drawn from different subjects. Clearly the computational demands of surrogate data construction are important in this setting even if the computation is performed off-line. The

Discussion

The use of surrogate data is a crucial step in the testing of time series data sets for evidence of nonlinear structure or independence. In this paper, we present a method of generating surrogate data based on the wavelet transform. This method was recently developed for application in other biophysical contexts [26] and is presented here in relation to nonlinear hypothesis testing for the first time. Resampling of the data in the wavelet domain is shown to have all the desired properties of a

Acknowledgements

The authors wish to thank S. Knock and P. Drysdale for helpful discussions and technical assistance. MB acknowledges the support of a NSW Institute of Psychiatry Research Fellowship and University of Sydney SESQUI post-doctoral fellowship.

References (41)

  • J. Arnhold et al., A robust method for detecting interdependencies: application to intracranially recorded EEG, Physica D (1999)
  • C.J. Stam et al., Synchronization likelihood: an unbiased estimate of generalized synchronization in multivariate data sets, Physica D (2002)
  • A. Mechelli et al., Nonlinear coupling between evoked rCBF and BOLD signals: a simulation study, NeuroImage (2001)
  • K.J. Friston, Bayesian estimation of dynamical systems: an application to fMRI, NeuroImage (2002)
  • J. Theiler, Spurious dimension from correlation algorithms applied to limited time-series data, Phys. Rev. A (1986)
  • D. Ruelle, Deterministic chaos: the science and the fiction, Proc. Roy. Soc. Lond. A (1990)
  • P. Rapp et al., Filtered noise can mimic low-dimensional chaotic attractors, Phys. Rev. E (1993)
  • M. Palus, Nonlinearity in normal human EEG: cycles, temporal asymmetry, nonstationarity and randomness, not chaos, Biol. Cybern. (1996)
  • W. Pritchard et al., Dimensional analysis of resting human EEG. II. Surrogate data testing indicates nonlinearity but not low-dimensional chaos, Psychophysiology (1995)
  • M. Breakspear, Nonlinear phase desynchronization in human EEG data, Human Brain Mapp. (2002)