
About this Book

This book contains 30 selected, refereed papers from an international conference on bootstrapping and related techniques held in Trier in 1990. The purpose of the book is to inform about recent research in the areas of the bootstrap, the jackknife, and Monte Carlo tests. Addressing both the novice and the expert, it covers theoretical as well as practical aspects of these statistical techniques. Potential users in disciplines such as biometry, epidemiology, computer science, economics, and sociology, but also theoretical researchers, should consult the book to be informed on the state of the art in this area.



Random Number Generation (1)


Principles for Generating Non-Uniform Random Numbers

In many simulations, random numbers from a given distribution function F(x) with density f(x) = F′(x) are needed. If f(x) = 1 for 0 ≤ x ≤ 1, uniformly distributed random numbers are required. For this purpose the linear congruential method is widely used: a sequence of integers is initialized with a value z_0 and continued as
$$ {z_{{i + 1}}} \equiv a{z_i} + r \pmod{m}, \quad 0 \leqslant {z_i} < m \ \text{for all } i $$
The fractions u_i = z_i/m are the derived pseudo-random numbers in the interval [0, 1). The constants m (the modulus), a (the multiplier), r (the increment), and z_0 (the starting value) are suitably chosen non-negative integers. Three choices of m, a and r are common on most computers:
r = 0, m = 2^E, a ≡ 5 (mod 8) and z_0 ≡ 1 (mod 4). All z_i ≡ 1 (mod 4) are generated.
r = 0, m = p, p prime, a a primitive root mod p. All z_i = 1, …, p − 1 are generated.
gcd(r, m) = 1, m = 2^E, a ≡ 1 (mod 4). All integers 0, 1, …, 2^E − 1 are generated.
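The recursion above is easy to state in code. The following minimal sketch uses the prime-modulus variant with the illustrative constants m = 2^31 − 1 and a = 16807 (a primitive root mod m); these particular constants are an assumption of the sketch, not taken from the text:

```python
def lcg(z0, a=16807, r=0, m=2**31 - 1):
    """Yield pseudo-random fractions u_i = z_i / m in [0, 1)."""
    z = z0
    while True:
        z = (a * z + r) % m  # z_{i+1} = a*z_i + r (mod m)
        yield z / m

gen = lcg(z0=12345)
u = [next(gen) for _ in range(5)]  # five pseudo-random numbers in [0, 1)
```

Because the sequence is fully determined by z_0, rerunning the generator with the same starting value reproduces the same numbers, which is often useful for debugging simulations.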
Ulrich Dieter

Special methods for pseudorandom number generation

In most applications of stochastic simulation the source of randomness is a sequence of standard pseudorandom numbers u_0, u_1, u_2, …, i.e. a sequence of numbers generated by a computer which “behave” as a realization of a sequence U_0, U_1, U_2, … of independent identically distributed random variables having a uniform distribution on the unit interval [0,1]. Many users of simulation do not think about how such numbers are produced by their computers and apply the standard software at hand. This attitude is dangerous since the source of randomness is fundamental for all stochastic simulations and many pseudorandom number generators in use have serious defects. It is also evident that the results of simulation studies depend on the methods of generation of the pseudorandom numbers. The present paper gives a survey of methods for generating deterministic sequences of numbers which can be used as standard pseudorandom number sequences, with emphasis on the generation methods studied by the author’s research group in Darmstadt.
Jürgen Lehn

Monte-Carlo-Techniques in Inferential Statistics (2)


Designing Bootstrap Prediction Regions

This article discusses the design and bootstrap construction of asymptotically optimal prediction regions. The emphasis is on devising simultaneous one-sided prediction intervals. A good solution to this problem implies constructions for simultaneous two-sided prediction intervals and for multivariate prediction regions.
Rudolf Beran

The Generalized Bootstrap

The bootstrap of Efron is generalized to the Generalized Bootstrap. The Generalized Bootstrap has superior properties for continuous data in the small-sample non-asymptotic case.
Edward J. Dudewicz

Simulation-Based Multiple Comparisons: Background, Philosophy, and Extensions to the Multivariate General Linear Model

The most widely used version of the multivariate general linear model (MGLM) supposes that p measurements are made on subject i, denoted here Y_i′ = (Y_i1, …, Y_ip), i = 1, …, n. If Y is the n × p matrix whose ith row is Y_i′, it is hypothesized that Y = Xβ + ε, where X is an n × q matrix of known constants, β is a q × p matrix of unknown constants, and ε is an n × p matrix of random variables whose rows ε_i′, i = 1, …, n, are independent, each with mean 0 and positive definite covariance matrix Σ. For exact inference, it is assumed that each ε_i′ has a multivariate normal distribution, but the methods described here will be robust to moderate departures from this assumption for moderate to large n. It is also assumed without loss of generality that X is of full rank q.
Don Edwards, Jack J. Berry

Applications of Monte Carlo Methods in Spatial and Image Analysis

Monte Carlo methods have a long history in spatial statistics, and have often been used very effectively to sidestep problems of analytical or computational intractability. On the other hand, bootstrap and other non-parametric methods have made no impact and are rarely considered. The reasons are immediate but often overlooked. The author was once asked to referee a paper on the spatial organization of monkey troops, in which the positions were recorded every ten minutes. After a page or so on the virtues of “distribution-free tests”, these were used to test hypotheses about the dominant male being the centre of the troop’s movements. Unfortunately for this study, “distribution-free” tests do make one assumption, independence, which was violated there in both space and time. More generally, resampling methods depend on at least exchangeability.
B. D. Ripley

Bootstrap Bands for Confidence and Prediction Regions (3)


Confidence Bands for Probability Distributions on Vapnik-Chervonenkis Classes of Sets in Arbitrary Sample Spaces Using the Bootstrap

The construction of confidence bands for an unknown distribution function (df) on the real line ℝ using Efron’s (1979) bootstrap procedure is well known through the work of Bickel and Freedman (1981). It is based on a central limit theorem for the bootstrapped empirical process on ℝ, which has subsequently been generalized, together with parallel results for empirical processes based on observations in ℝ^d, d > 1, and in arbitrary sample spaces, where the corresponding processes are then indexed by certain classes of sets or functions. The present state of this subject will be briefly reviewed. Special attention is given to empirical processes indexed by Vapnik-Chervonenkis classes C of sets in arbitrary sample spaces, together with the construction of confidence bands for probability distributions on C as in the classical case.
Peter Gaenssler

Bootstrap Confidence Bands

Bootstrap confidence bands are constructed for nonparametric regression. Resampling is based on a suitably estimated residual distribution; the procedure is called the Wild Bootstrap. The method first constructs a fine grid of error bars with simultaneous coverage probability. Second, the end-points of these error bars are joined via polygon pieces or parabolae, using assumptions on the local curvature of the regression curve.
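The wild-bootstrap resampling step can be sketched as follows. This is a minimal illustration assuming Rademacher (±1) weights and an already-fitted regression; the paper's actual smoother and weight distribution may differ:

```python
import random

def wild_bootstrap_samples(y, fitted, n_boot=200, seed=0):
    """Generate bootstrap responses y* = fitted + residual * V, V = +/-1."""
    rng = random.Random(seed)
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    samples = []
    for _ in range(n_boot):
        # each residual is multiplied by an independent Rademacher weight,
        # so the bootstrap errors keep the local scale of the residuals
        samples.append([fi + ri * rng.choice((-1.0, 1.0))
                        for fi, ri in zip(fitted, resid)])
    return samples
```

Refitting the regression on each bootstrap response vector then yields the spread of curves from which simultaneous error bars can be read off.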
Wolfgang Härdle, Michael Nussbaum

Applying the Bootstrap to Generate Confidence Regions in Multiple Correspondence Analysis

The bootstrap method, introduced in 1979 by Efron, promised to provide a means to solve many previously unsolved problems in statistics. Among these problems a prominent place is taken by the determination of statistical properties of complex methods for multivariate data analysis. Here we may think of multiparameter models for the exploratory analysis of multivariate data of mixed measurement level, which are usually applied in several steps (cf. Diaconis & Efron, 1983). In the last decade a large number of results on the bootstrap method have been published, most of which, however, concern univariate single-parameter models. Although this attention to the relatively simple may be understood from a theoretical point of view, it is not very satisfactory for the practice of data analysis. In this paper we report on a study that may be considered as an approach from the other side. We take a complex method, i.e. multiple correspondence analysis, and try to find out what the bootstrap could contribute to data analysis. This is done mainly by Monte Carlo methods. After a short explanation of the multivariate method and the general methodology, results are reported of two Monte Carlo studies.
Monica Th. Markus, Ron A. Visser

Bootstrap for Statistical Tests (4)


Permutation Versus Bootstrap Significance Tests in Multiple Regression and Anova

Kempthorne’s (1952) formulation of the randomization test is extended to yield a permutational analog of the bootstrap significance test. In the new test, residuals of a multiple regression are permuted instead of being bootstrapped. The test is an attractive alternative for Oja’s test that permutes predictors (Austr. J. Statist. 29, 91–100, 1987).
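As a hedged illustration of the general idea (permuting null-model residuals rather than bootstrapping them), the following sketch runs a residual-permutation test for a simple-regression slope. The intercept-only null model and the slope statistic are illustrative choices, not ter Braak's exact procedure:

```python
import random
from statistics import mean

def slope(x, y):
    # Ordinary least-squares slope of y on x.
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return num / sum((xi - mx) ** 2 for xi in x)

def permutation_pvalue(x, y, n_perm=999, seed=1):
    """P-value for H0: slope = 0, permuting residuals of the null model."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    my = mean(y)
    resid = [yi - my for yi in y]  # residuals under H0 (intercept-only fit)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(resid)
        # reattach permuted residuals to the null fit and recompute the slope
        if abs(slope(x, [my + r for r in resid])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The `+ 1` in numerator and denominator counts the observed arrangement among the permutations, a standard convention for exact permutation p-values.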
Cajo J. F. ter Braak

Nonparametric Bootstrap Tests: Some Applications

In a series of papers, Beran (1984, 1986, 1988) proposed bootstrap techniques for hypothesis testing. These tests are concerned with the following situation. Let {X_1, X_2, …, X_n} be an i.i.d. sample of n random variables with distribution function F, and let θ(F) be a real-valued functional to be tested for H_0: θ(F) = θ_0. To test this hypothesis Beran proposes two alternative approaches. The first one, called the test statistic approach, approximates the exact critical region of the test by the percentiles of its bootstrap distribution, where the unknown distribution of the sample is replaced by its empirical distribution.
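A minimal sketch of such a test-statistic approach, using the sample mean as an illustrative θ(F) (the papers treat general functionals, and Beran's exact construction may differ):

```python
import random
from statistics import mean

def bootstrap_test(sample, theta0, n_boot=999, seed=2):
    """Bootstrap p-value for H0: theta(F) = theta0, with theta = mean."""
    rng = random.Random(seed)
    n = len(sample)
    m = mean(sample)
    t_obs = abs(m - theta0)  # observed test statistic
    t_star = []
    for _ in range(n_boot):
        boot = [rng.choice(sample) for _ in range(n)]
        # centre at the sample mean, so resampling mimics the null
        t_star.append(abs(mean(boot) - m))
    return (sum(t >= t_obs for t in t_star) + 1) / (n_boot + 1)
```

The bootstrap distribution of the centred statistic plays the role of the unknown null distribution; its upper percentiles supply the critical values.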
Jörg Breitung

A Class of Combinations of Dependent Tests by a Resampling Procedure

A number of statistical hypotheses can be tested by combining k > 2 test statistics into a single statistic, e.g. hypotheses for the equality of several means and variances, or more generally for different effects jointly relevant to the analysis. In this paper we show how resampling techniques, based on permutations of the data, may be conveniently used to combine the k test statistics when they are characterized by an unknown dependence structure. These techniques have recently been interpreted (Hinkley (1989), Pesarin (1989) and Romano (1989)) as conditional bootstrap methods.
A. Pallini, F. Pesarin

Bootstrap for Time Series (5)


A Bootstrap Approach for Nonlinear Autoregressions: Some Preliminary Results

We consider non-linear autoregressions of order 1, i.e. discrete time processes generated by
$$ {X_{{t + 1}}} = m\left( {{X_t}} \right) + {\varepsilon_{{t + 1}}}, - \infty \, < \,t\, < \,\infty, $$
where ε_t, −∞ < t < ∞, are i.i.d. zero-mean real random variables with probability density f_ε and finite variance \( \sigma_{\varepsilon }^2 = {\rm var} \left( {\varepsilon {}_t} \right) \). Then, the transition probabilities P(x, ·) of the Markov chain (1) are absolutely continuous with density f_ε(· − m(x)).
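A residual-based bootstrap for such a nonlinear autoregression can be sketched as follows. The estimated autoregression function m̂ and the pool of centred residuals are assumed given (in practice they would come from, e.g., a nonparametric fit); this is an illustration of the general scheme, not necessarily the authors' exact procedure:

```python
import random

def bootstrap_path(m_hat, residuals, x0, length, seed=3):
    """Simulate X*_{t+1} = m_hat(X*_t) + eps*, eps* drawn from the residuals."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(length):
        # draw a bootstrap innovation and propagate through the fitted map
        path.append(m_hat(path[-1]) + rng.choice(residuals))
    return path
```

Repeating this many times yields bootstrap replicates of the process, from which the sampling variability of estimators of m can be assessed.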
Jürgen Franke, Matthias Wendel

Bootstrap procedures for AR(∞) processes

In this paper we will deal with an application of Efron’s (1979) bootstrap to stationary stochastic processes in discrete time. In many applications it is assumed that these processes are of autoregressive or, more generally, of autoregressive moving average type, i.e. the underlying stationary process X = (X_t : t ∈ Z = {0, ±1, ±2, …}) is assumed to satisfy the following stochastic difference equation
$$ {X_t} = \sum\limits_{{v = 1}}^p {{a_v}{X_{{t - v}}} + {\varepsilon_t} + \sum\limits_{{\mu = 1}}^q {{b_{\mu }}{\varepsilon_{{t - \mu }}},\;t \in Z} } $$
Here ε = (ε_t : t ∈ Z) denotes a white noise, that is, a sequence of uncorrelated, zero-mean random variables with finite variance σ².
Jens-Peter Kreiss

Bootstrapping Some Statistics Useful in Identifying ARMA Models

Consider a zero mean weakly stationary stochastic process with a continuous and nonzero spectral density function, which satisfies the stochastic difference equation
$$ {X_t} = \sum\limits_{{j = 1}}^{\infty} {{a_j}{X_{{t - j}}} + {\varepsilon_t}} $$
for t ∈ Z. We assume that the associated power series \( A(z) = 1 - \sum\nolimits_{{j = 1}}^{\infty} {{a_j}{z^j}} \) converges and is nonzero for |z| ≤ 1. The random variables ε_t are assumed to be independent and identically distributed according to an unknown distribution function F with Eε_t = 0 and \( E\varepsilon_t^2 = {\sigma^2} > 0 \).
Efstathios Paparoditis

Bootstrap Approximations to Prediction Intervals for Explosive AR(1)-Processes

Let X_0, X_1, …, X_n be observed values from some time series. An important issue then is to predict future values X_{n+s} from the observables. Usually, the quality of the predictor depends on how well a parametric or semiparametric model may be fitted to the data. For example, if there is strong evidence for an AR(p)-model
$$ {X_i} = {\beta_1}{X_{{i - 1}}} + \ldots + {\beta_p}{X_{{i - p}}} + {\varepsilon_i} $$
in which the errors (ε_i)_i are i.i.d. with d.f. F, zero means and finite variance, then the optimal predictor for X_{n+1} under L_2-loss equals
$$ {\hat{X}_{{n + 1}}} = {\beta_1}{X_n} + \ldots + {\beta_p}{X_{{n + 1 - p}}} $$
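For illustration, a percentile-type bootstrap prediction interval for the AR(1) case might be sketched as follows. The least-squares estimate of β_1 and the simple percentile construction are assumptions of this sketch, not necessarily the construction analysed in the paper:

```python
import random

def ar1_prediction_interval(x, alpha=0.1, n_boot=1000, seed=4):
    """Bootstrap (1 - alpha) prediction interval for X_{n+1} in an AR(1)."""
    # Least-squares estimate of the AR(1) coefficient beta_1.
    beta = (sum(a * b for a, b in zip(x[1:], x[:-1]))
            / sum(v * v for v in x[:-1]))
    # Residuals of the fitted autoregression.
    resid = [x[t] - beta * x[t - 1] for t in range(1, len(x))]
    rng = random.Random(seed)
    # Bootstrap the one-step-ahead value and take empirical quantiles.
    future = sorted(beta * x[-1] + rng.choice(resid) for _ in range(n_boot))
    lo = future[int(alpha / 2 * n_boot)]
    hi = future[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

The interval reflects only innovation noise resampled from the residuals; accounting for the estimation error in β̂, which matters especially in the explosive case, requires the more careful approximations studied by the authors.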
W. Stute, B. Gründer

Bootstrap for Linear Models (6)


Search for a Break in the Portuguese GDP 1833–1985 with Bootstrap Methods

Since Nelson & Plosser’s (1982) seminal paper, the permanent nature of the macroeconomic fluctuations has become the centre of intense debate. The traditional view that macroeconomic time series, namely the real Gross Domestic Product (GDP), are well described as transitory deviations from a deterministic trend is challenged by the identification of a large permanent component in the series, meaning that the fluctuations represent persistent movements, i.e., that the series follow a stochastic trend. The economic implications of this conclusion are substantial, particularly to the Business Cycle Theory (see Andrade (1990)).
Isabel Andrade, Isabel Proença

One-Step Bootstrapping in Generalized Linear Models

Some very commonly used statistical models are members of the class of generalized linear models introduced by Nelder & Wedderburn (1972). Included are probit and logit models for binomially distributed data, log-linear models for Poisson data, and also classical linear models for normally distributed response data. In all these examples the underlying distribution is an element of a one-parameter exponential family. The ML estimator for the mean vector has an interpretation as a projection of the observation vector onto the space of all possible mean values. The computation is in most situations an iteratively weighted least-squares estimation. However, existence and uniqueness of this estimator are not always guaranteed (see Wedderburn (1976) for a set of sufficient conditions). The following example presents a logistic regression with explanatory variables x_n and only ten 0–1 observations γ_n. Here the probability of the existence of the ML estimator β(γ_1, …, γ_10) is only around 80%.
Olaf Mosbach

Bootstrap for Mean and Covariance Structure Models

The development of the social sciences as empirical sciences has been hampered by the great difficulty of measuring variables of substantive interest to the researcher with reliable and valid instruments. Consequently, questions of scaling and measurement have been of great theoretical and practical concern. Researchers working in areas such as sociology, economics, and epidemiology usually wish to connect scales that measure some dependent variables with each other and with explanatory variables. Hence, models that incorporate measurement models and regression models simultaneously have often been applied in these areas of research. The most popular one is the LISREL model (cf. Jöreskog and Sörbom, 1988). A somewhat more general model is the following: Y_i, i = 1, …, n, are independent, identically distributed random R^p-vectors, where each Y_i is distributed as
$$ y = \Lambda \eta + \varepsilon $$
(measurement model for y), and where
$$ \eta = B\eta + \zeta $$
(structural model).
Günter Rothe, Gerhard Arminger

Bootstrap: Selected Topics (7)


On The Bayesian Bootstrap

A method to compare the Bayesian bootstrap with a parametric analysis is derived. The method is applied to the mean of a Poisson distribution.
Raul Cano

Bootstrapping the Sample Quantile: A Survey

Let X_1, …, X_n be independent and identically distributed (≡ iid) random variables (≡ rvs) with common distribution function (≡ df) F and let T(F) be an unknown parameter of interest. The natural nonparametric estimator of T(F) is T(F_n), where \( {F_n}(t): = {n^{{ - 1}}}{\sum\nolimits_{{i = 1}}^n 1_{{\left( { - \infty, t} \right]}}}\left( {{X_i}} \right),t \in \mathbb{R} \), denotes the empirical df pertaining to the sample X_1, …, X_n. Our target function is now the df of the estimator T(F_n), centered at the unknown parameter T(F), i.e.
$$ {G_n}(x): = {P_F}\left\{ {n^{{1/2}}}\left( T\left( {{F_n}} \right) - T(F) \right) \leqslant x \right\},\quad x \in \mathbb{R} $$
In a great many cases, the factor n^{1/2} is the correct standardization constant to ensure a nondegenerate limit distribution of n^{1/2}(T(F_n) − T(F)) as n increases.
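The bootstrap analogue of G_n replaces P_F by resampling from F_n. A sketch for the sample median as T(F_n) follows; the choice of the median and the number of replications are illustrative assumptions:

```python
import random
from statistics import median

def bootstrap_median_distribution(sample, n_boot=500, seed=5):
    """Bootstrap draws of n^(1/2) * (T(F_n*) - T(F_n)) for T = median."""
    rng = random.Random(seed)
    n = len(sample)
    t_hat = median(sample)  # the plug-in estimate T(F_n)
    draws = []
    for _ in range(n_boot):
        boot = [rng.choice(sample) for _ in range(n)]
        # centred and sqrt(n)-scaled, mirroring the definition of G_n
        draws.append(n ** 0.5 * (median(boot) - t_hat))
    return sorted(draws)
```

The empirical df of these draws is the bootstrap estimate of G_n; how well it approximates G_n for quantiles is precisely the subject surveyed in this paper.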
Michael Falk

Bootstrapping Conditional Curves

It is shown that consistency and more refined results for the bootstrap hold in the conditional framework if the analogue is valid in the unconditional case. We are particularly interested in the asymptotic performance of bootstrap d.f.’s in case of mean and median regression functionals.
M. Falk, R.-D. Reiss

Resampling Stochastic Processes using a Bootstrap Approach

A bootstrap method for weakly stationary Gaussian sequences is presented. The resampling is done among the coefficients of the random Fourier representation of a sample from the sequence. The independence of the increments in the spectral representation of such a sequence justifies the choice of the method, and the resampling provides new bootstrap realizations. The validity of the method is checked on a typical spectral estimator and a correlation estimator using computer simulation of large samples. The results are satisfactory and point towards asymptotic validity of the method.
Anders Nordgaard

Applications in Epidemiology and Medical Statistics (8)


Exploring Heterogeneous Risk Structure: Comparison of a Bootstrapped Model Selection and Nonparametric Classification Technique

The intention of epidemiological studies is the identification of risk factors. Various techniques are used to assess the quantitative nature of effects, e.g. logistic risk estimation. We display data from an adverse drug reaction study with obvious and strong heterogeneity of risks across strata. Since for these data the origin of the heterogeneity is well understood, we compare two methods for their ability to detect different risk structures. It turns out that the nonparametric classification technique easily uncovers different risk strata, whereas the bootstrapped model selection approach fails. Some insight into the structural properties of bootstrap samples explains the failure of the bootstrap approach.
Peter Dirsched, Renate Grohmann

Bootstrapping Current Life Table Estimators

This research involves development of appropriate bootstrap methods for current life table estimators, where the data have a structure that is more complex than that arising from independent and identically distributed sampling. In general outline, the bootstrap approach described here involves conceptualization of a hypothetical cohort to correspond to each row of the life table. Simple random samples with replacement are repeatedly taken from the “observations” for each hypothetical cohort. These contribute to the construction of a set of simulated life tables which provides approximate sampling distributions for the estimators of the biometric functions contained within it. Methods are illustrated using data for the 1980 San Diego County (U.S.A.) population aged 65 and over.
Amanda L. Golbeck

Jackknifing Estimators of a Common Odds Ratio from Several 2×2 Tables

This paper gives a review of certain jackknife estimators of a common odds ratio in the situation of several 2×2 tables. These jackknife estimators are based on the classical Mantel-Haenszel estimator and on some asymptotically efficient noniterative estimators. Finite-sample results and asymptotic properties for increasing sample sizes, but a fixed number of tables are summarized. Some open problems in this field of research are indicated.
Iris Pigeot

An Application of the Bootstrap in Clinical Chemistry

We apply the bootstrap to a one-dimensional nonparametric discrimination problem in clinical chemistry: the construction of a cut-off point (discrimination limit) for a quantitative diagnostic test on the basis of a sample of “diseased” and “non-diseased” individuals and the subsequent evaluation of the resulting decision rule in the same sample. When the cut-off point is selected to maximize some performance criterion, the sample estimate of maximal performance is known to systematically overestimate the unknown true performance of the test at the selected cut-off point. Bootstrap methods are proposed in the classical paper by EFRON [2] to reduce bias and to obtain confidence intervals. We apply these methods and investigate their performance by a simulation study.
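The optimism of the maximal-performance estimate can be estimated by the bootstrap roughly as follows. The accuracy criterion, the data layout, and the cut-off search are illustrative assumptions of this sketch, not the exact procedure of the paper:

```python
import random

def accuracy(cutoff, diseased, healthy):
    # fraction correctly classified by the rule "value >= cutoff => diseased"
    hits = sum(v >= cutoff for v in diseased) + sum(v < cutoff for v in healthy)
    return hits / (len(diseased) + len(healthy))

def best_cutoff(diseased, healthy):
    # choose the observed value maximizing apparent accuracy
    candidates = sorted(set(diseased) | set(healthy))
    return max(candidates, key=lambda c: accuracy(c, diseased, healthy))

def optimism(diseased, healthy, n_boot=200, seed=6):
    """Bootstrap estimate of the bias of the maximal-accuracy estimate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_boot):
        d_star = [rng.choice(diseased) for _ in diseased]
        h_star = [rng.choice(healthy) for _ in healthy]
        c = best_cutoff(d_star, h_star)
        # apparent (bootstrap-sample) accuracy minus original-sample accuracy
        total += accuracy(c, d_star, h_star) - accuracy(c, diseased, healthy)
    return total / n_boot
```

Subtracting the estimated optimism from the apparent maximal accuracy gives a bias-reduced performance estimate, in the spirit of the methods investigated here.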
Helmut Schäfer

Applications of Monte-Carlo-Techniques (9)


Computer Aided Simulation Analysis (CASA) in Order to Test the Cost-Effectiveness of Drugs, Using the Example of Ceftriaxone (Rocephin®)

It is well known that the methods currently being used in successful drug development are very expensive, involving a large number of carefully monitored clinical trials, and provide results only after a long period of time. Even when a standardized registration procedure is followed, however, it does not necessarily guarantee that the medication will become available on the market. This may be due in part to the pharmaceutical industry which, in addition to fulfilling its usual requirements in establishing sufficient proof of the efficacy and safety of a new drug to the regulatory authorities as well as to the health insurances, also often has to substantiate the drug’s economic efficiency.
Rito Bergemann, Arno Brandt, Thomas Nawrath, Alexander Richter, Walter Siegrist, Fred Sorenson

Some Modelling and Simulation Issues Related to Bootstrapping

The paper presented at the conference was based primarily upon “Estimating Model Discrepancy,” Technometrics, 1990, co-authored with Shane Pederson. Since this article is easily available at University libraries, the present paper will focus on the chronology of work leading up to the Technometrics article. Further, some basic simulation and modelling issues related to bootstrapping are presented.
Mark E. Johnson

Supercomputers for Monte Carlo Simulation: Cross-Validation Versus Rao’s Test in Multivariate Regression

Part I covers vector computers, in the context of Monte Carlo experiments with regression models. These computers should exploit a specific dimension of the Monte Carlo experiment, namely its replicates. The resulting code computes Ordinary Least Squares (OLS) estimates on a CYBER 205 in 2% of the time needed on a VAX 8700. For Generalized Least Squares, however, the code runs slower on the CYBER 205 if the regression model is small; for large models the CYBER runs much faster. Part II covers regression models with correlated errors. To test the validity of the specified regression model, Rao (1959) generalized the F statistic for lack of fit, whereas Kleijnen (1983) proposed cross-validation using Student’s t statistic and Bonferroni’s inequality. A large Monte Carlo experiment compares these two methods, for normal and non-normal errors. Under normality, cross-validation is conservative, whereas Rao’s test realizes its nominal type I error and has high power. Several confidence interval procedures for regression parameters are also compared. Under lognormality, only cross-validation with OLS works.
Jack P. C. Kleijnen

