
## About this Book

The articles in this volume were contributed by the friends of Lucien Le Cam on the occasion of his 70th birthday in November 1994. We wish him a belated happy birthday. In addition to all the usual excuses for our tardiness in the preparation of the volume, we must point to the miracles of modern computing. As the old proverb almost put it: there's many a slip 'twixt \cup and \baselineskip. We beg forgiveness of any of our infinitely patient contributors who find that the final product does not quite match the galley proofs. Our task was also made harder by the sad death of our friend and fellow editor, Erik Torgersen. We greatly appreciate the editorial help of David Donoho with one of the more troublesome contributions. In addition to the 29 contributed articles, we have included a short vita, a list of publications, and a list of Lucien's Ph.D. students. We are also pleased that Lucien allowed us to include a private letter, written to Grace Yang, in response to a query about the extent of his formal mathematical training. The letter gives some insights into what made Lucien one of the leading mathematical statisticians of the century.

## Table of Contents

### 1. Counting Processes and Dynamic Modelling

Abstract
I give some historical comments concerning the introduction of counting process theory into survival analysis. The concept of dynamic modelling of counting processes is discussed, focussing on the advantage of models that are not of proportional hazards type. The connection with a statistical definition of causality is pointed out. Finally, the concept of martingale residual processes is discussed briefly.
Odd O. Aalen

### 2. Multivariate Symmetry Models

Abstract
Let $$\Gamma_0$$ be a fixed, compact subgroup of the group $$\Gamma$$ of orthogonal transformations on $$\mathbb{R}^d$$. A random variable $$x$$, with values in $$\mathbb{R}^d$$ and distribution $$P$$, is $$\Gamma_0$$-symmetric if $$x$$ and $$\gamma x$$ have the same distribution for all $$\gamma \in \Gamma_0$$. In terms of $$P$$, this means $$P(A) = P(\gamma A)$$ for all Borel sets $$A$$ and all $$\gamma \in \Gamma_0$$. Let $$x_1, \ldots, x_n$$ be iid random variables with values in $$\mathbb{R}^d$$. The $$\Gamma_0$$-symmetry model asserts that the $$x_i$$ have an unknown common distribution that is $$\Gamma_0$$-symmetric. The $$\Gamma_0$$-location model specifies that for some unknown $$\eta \in \mathbb{R}^d$$, the random variables $$x_1 - \eta, \ldots, x_n - \eta$$ have an unknown common distribution $$P$$ which is $$\Gamma_0$$-symmetric. This paper develops some methods of inference for these multivariate symmetry models. Unlike the one-dimensional case, there are a large number of “symmetry” notions in $$\mathbb{R}^d$$, $$d > 1$$; Section 2.3 provides a few simple, useful examples, which figure in subsequent development.
R. J. Beran, P. W. Millar

### 3. Local Asymptotic Normality of Ranks and Covariates in Transformation Models

Abstract
Le Cam & Yang (1988) addressed broadly the following question: Given observations $$X^{(n)} = (X_{1n}, \ldots, X_{nn})$$ distributed according to $$P_\theta^{(n)}$$, $$\theta \in \mathbb{R}^k$$, such that the family of probability measures $$\{P_\theta^{(n)}\}$$ has a locally asymptotically normal (LAN) structure at $$\theta_0$$, and a statistic $$Y^{(n)} = g_{(n)}(X^{(n)})$$:

(i) When do the distributions of $$Y^{(n)}$$ also have an LAN structure at $$\theta_0$$?

(ii) When is there no loss in information about $$\theta$$ in going from $$X^{(n)}$$ to $$Y^{(n)}$$?

P. J. Bickel, Y. Ritov

### 4. From Model Selection to Adaptive Estimation

Abstract
Many different model selection information criteria can be found in the literature in various contexts including regression and density estimation. There is a huge amount of literature concerning this subject and we shall, in this paper, content ourselves with citing only a few typical references in order to illustrate our presentation. Let us just mention AIC, $$C_p$$ or $$C_L$$, BIC and MDL criteria proposed by Akaike (1973), Mallows (1973), Schwarz (1978), and Rissanen (1978) respectively. These methods propose to select among a given collection of parametric models that model which minimizes an empirical loss (typically squared error or minus log-likelihood) plus some penalty term which is proportional to the dimension of the model. From one criterion to another the penalty functions differ by factors of $$\log n$$, where $$n$$ represents the number of observations.
Lucien Birgé, Pascal Massart
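The following is a minimal sketch of the penalized criteria described above. It assumes Gaussian errors, so that minus twice the log-likelihood reduces, up to constants, to $$n\log(\mathrm{RSS}_k/n)$$. The code picks the dimension $$k$$ minimizing $$n\log(\mathrm{RSS}_k/n) + c_n k$$, with $$c_n = 2$$ (AIC-type) or $$c_n = \log n$$ (BIC-type). The function name `select_dimension` and the residual sums of squares are hypothetical illustrations, not taken from the paper.

```python
import math


def select_dimension(rss_by_dim, n, criterion="aic"):
    """Return the dimension k minimizing n*log(RSS_k/n) + penalty*k.

    rss_by_dim maps each candidate dimension k to its residual sum of squares.
    criterion="aic" uses penalty 2; criterion="bic" uses penalty log(n).
    """
    penalties = {"aic": 2.0, "bic": math.log(n)}
    if criterion not in penalties:
        raise ValueError(f"unknown criterion: {criterion!r}")
    pen = penalties[criterion]

    def score(k):
        # Empirical loss (Gaussian -2 log-likelihood up to constants)
        # plus a penalty proportional to the model dimension.
        return n * math.log(rss_by_dim[k] / n) + pen * k

    return min(rss_by_dim, key=score)


# Hypothetical residual sums of squares for nested models of dimension 1..4.
rss = {1: 200.0, 2: 100.0, 3: 97.0, 4: 96.8}
print(select_dimension(rss, n=100, criterion="aic"))  # 3
print(select_dimension(rss, n=100, criterion="bic"))  # 2
```

With these numbers, the penalty of 2 per parameter keeps the small improvement from the third parameter. The $$\log n \approx 4.6$$ penalty drops it, which shows how criteria that differ only by a $$\log n$$ factor can choose different models.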

### 5. Large Deviations for Martingales

Abstract
Let $$X_1, X_2, \ldots$$ be variables satisfying

(i) $$|X_n| \le 1$$ and

(ii) $$E(X_n \mid X_1, \ldots, X_{n-1}) = 0$$,

and put $$S_n = X_1 + \cdots + X_n$$.
D. Blackwell
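For orientation only, write $$S_n = X_1 + \cdots + X_n$$ for the partial sum. Under conditions (i) and (ii), the classical Azuma-Hoeffding inequality gives $$P(S_n \ge t) \le \exp(-t^2/(2n))$$. This is a standard bound, not the sharper results of this paper. The sketch below checks the bound against the exact tail for the simplest such martingale, a sum of independent fair $$\pm 1$$ signs. The function names are hypothetical.

```python
import math


def azuma_bound(n, t):
    """Azuma-Hoeffding: P(S_n >= t) <= exp(-t**2 / (2n)) if |X_i| <= 1 and E(X_i | past) = 0."""
    return math.exp(-t * t / (2.0 * n))


def exact_tail_fair_signs(n, t):
    """Exact P(S_n >= t) when X_1, ..., X_n are independent fair +/-1 signs.

    With B ~ Binomial(n, 1/2) counting the +1's, S_n = 2B - n, so S_n >= t
    iff B >= (n + t) / 2; the smallest such integer is (n + t + 1) // 2.
    """
    b_min = max(0, (n + t + 1) // 2)
    return sum(math.comb(n, b) for b in range(b_min, n + 1)) / 2 ** n


for n, t in [(10, 4), (50, 10), (100, 20)]:
    exact, bound = exact_tail_fair_signs(n, t), azuma_bound(n, t)
    print(f"n={n:3d} t={t:2d} exact={exact:.3e} Azuma bound={bound:.3e}")
```

The fair-sign sequence is only one martingale satisfying (i) and (ii). The gap between the exact tail and the bound shows the slack that more refined large-deviation estimates address.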

### 6. An Application of Statistics to Meteorology: Estimation of Motion

Abstract
Concern is with moving meteorological phenomena. Some existing techniques for the estimation of motion parameters are reviewed. Fourier-based and generalized-additive-model-based analyses are then carried out for the global geopotential 500 millibar (mb) height field during the period 1-6 January 1986.
David R. Brillinger
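One generic Fourier-based way to estimate the displacement of a field between two times is to locate the peak of their circular cross-correlation, computed with the FFT. The sketch below assumes a periodic grid and a pure integer translation. It illustrates the general idea only, not the analysis carried out in the paper, and `estimate_shift_2d` is a hypothetical name.

```python
import numpy as np


def estimate_shift_2d(f, g):
    """Estimate the integer displacement (dy, dx) with g ~ np.roll(f, (dy, dx), axis=(0, 1)).

    Uses the circular cross-correlation r[m] = sum_k g[k + m] * f[k], computed
    via the FFT; for a pure circular translation it peaks at the displacement.
    """
    cross = np.fft.ifft2(np.fft.fft2(g) * np.conj(np.fft.fft2(f))).real
    dy, dx = np.unravel_index(np.argmax(cross), cross.shape)
    return int(dy), int(dx)


# Synthetic example: a random field translated by 3 rows and 5 columns, plus noise.
rng = np.random.default_rng(0)
field = rng.standard_normal((32, 64))
moved = np.roll(field, shift=(3, 5), axis=(0, 1)) + 0.1 * rng.standard_normal((32, 64))
print(estimate_shift_2d(field, moved))  # (3, 5)
```

Displacements are reported modulo the grid size, so a shift of $$-2$$ rows on a 32-row grid comes out as 30. Sub-grid motion or deformation of the field would need a more elaborate model than this single peak search.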

### 7. At the Interface of Statistics and Medicine: Conflicting Paradigms

Abstract
The association of Professor Le Cam with clinical investigators started in 1973, when he collaborated in the design of a clinical trial using an immunotherapeutic agent in patients with osteogenic sarcoma. Collaborations between statisticians and clinicians started quite early: British clinicians and statisticians were publishing studies together in the 1950’s. However, before the 1960’s such collaborations were sporadic, and certainly not mandatory. By 1973 they were quite pervasive, so the history of Professor Le Cam’s involvement in clinical research probably mirrored closely the development of collaborations between clinical researchers and statisticians. Those of us in clinical research have seen the growing influence of statisticians. The statistical-clinical relationship is an evolving one and remains, at times, rather rocky. This friendly antagonism is likely due to the institutional differences in perception and priorities between the two professions.
Vera S. Byers, Kenneth Gorelick

### 8. Points Singuliers des Modèles Statistiques

Abstract
Lucien Le Cam provided a fundamental framework for the development of a unified theory of asymptotic statistics. He also gave (sometimes mischievously) examples where the general methods apply poorly, these difficulties being most often linked either to the topological complexity of the parameter space or to the lack of domination of the model.
D. Dacunha-Castelle

### 9. Exponential Tightness and Projective Systems in Large Deviation Theory

Abstract
Let $$\{E_\alpha, p_{\alpha\beta}\}$$ be a projective system of Hausdorff topological spaces; here $$\alpha, \beta \in A$$, a directed set, and $$p_{\alpha\beta}: E_\beta \to E_\alpha$$ is a continuous surjective map for $$\alpha < \beta$$ (for details, see Section 3). Let $$E$$ be a Hausdorff topological space endowed with a σ-algebra $$\mathcal{E}$$ (possibly smaller than the Borel σ-algebra), and for each $$\alpha \in A$$, let $$p_\alpha: E \to E_\alpha$$ be a continuous measurable surjective map such that $$p_\alpha = p_{\alpha\beta} \circ p_\beta$$ for $$\alpha < \beta$$. Let $$\{\mu_n\}$$ be a sequence of probability measures on $$\mathcal{E}$$, and assume that for each $$\alpha \in A$$, the sequence $$\{\mu_n \circ p_\alpha^{-1}\}$$ satisfies the large deviation principle (see Section 2). In this paper we show that under suitable additional assumptions, the large deviation principle for $$\{\mu_n\}$$ follows.
A. de Acosta

### 10. Consistency of Bayes Estimates for Nonparametric Regression: A Review

Abstract
This paper reviews some recent studies of frequentist properties of Bayes estimates. In nonparametric regression, natural priors can lead to inconsistent estimators, although in some problems such priors do give consistent estimates.
P. Diaconis, D. A. Freedman

### 11. Renormalizing Experiments for Nonlinear Functionals

Abstract
Let $$f = f(t)$$, $$t \in \mathbb{R}^d$$, be an unknown “object” (real-valued function), and suppose we are interested in recovering the nonlinear functional $$T(f)$$. We know a priori that $$f \in \mathcal{F}$$, a certain convex class of functions (e.g. a class of smooth functions). For various types of measurements $$Y_n = (y_1, y_2, \ldots, y_n)$$, problems of this form arise in statistical settings, such as nonparametric density estimation and nonparametric regression estimation; but they also arise in signal recovery and image processing. In such problems, there generally exists an “optimal rate of convergence”: the minimax risk from $$n$$ observations, $$R(n) = \inf_{\hat{T}} \sup_{f \in \mathcal{F}} E\big(\hat{T}(Y_n) - T(f)\big)^2,$$ tends to zero as $$R(n) \asymp n^{-r}$$. There is a variety of functionals $$T$$, function classes $$\mathcal{F}$$, and types of observation $$Y_n$$; the literature is really too extensive to list here, although we mention Ibragimov & Has’minskii (1981), Sacks & Ylvisaker (1981), and Stone (1980). Lucien Le Cam (1973) has contributed directly to this literature, in his typical abstract and profound way; his ideas have stimulated the work of others in the field, e.g. Donoho & Liu (1991a).
David L. Donoho

### 12. Universal Near Minimaxity of Wavelet Shrinkage

Abstract
We discuss a method for curve estimation based on $$n$$ noisy data; one translates the empirical wavelet coefficients towards the origin by an amount $$\sqrt{2\log(n)} \cdot \sigma/\sqrt{n}$$. The method is nearly minimax for a wide variety of loss functions (e.g. pointwise error, global error measured in $$L^p$$ norms, pointwise and global error in estimation of derivatives) and for a wide range of smoothness classes, including standard Hölder classes, Sobolev classes, and Bounded Variation. This is a broader near-optimality than anything previously proposed in the minimax literature. The theory underlying the method exploits a correspondence between statistical questions and questions of optimal recovery and information-based complexity. This paper contains a detailed proof of the result announced in Donoho, Johnstone, Kerkyacharian & Picard (1995).
D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, D. Picard
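The shrinkage step described above is soft thresholding at the universal level $$\sqrt{2\log n}\,\sigma/\sqrt{n}$$. The sketch below assumes the empirical wavelet coefficients have already been computed by some orthogonal wavelet transform (not shown). It also assumes they are scaled so that their noise level is $$\sigma/\sqrt{n}$$. Both function names are hypothetical.

```python
import numpy as np


def universal_threshold(sigma, n):
    """Shrinkage amount sqrt(2 log n) * sigma / sqrt(n)."""
    return np.sqrt(2.0 * np.log(n)) * sigma / np.sqrt(n)


def soft_threshold(coeffs, t):
    """Move each coefficient toward 0 by t; coefficients with |c| <= t become 0."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)


t = universal_threshold(sigma=1.0, n=100)  # about 0.3035
print(soft_threshold([1.0, -0.2, -0.5, 0.1], t))
```

In a full estimator, one would apply `soft_threshold` to the detail coefficients and then invert the wavelet transform. Usually the coarsest-scale coefficients are left untouched.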

### 13. Empirical Processes and p-variation

Abstract
Remainder bounds in Fréchet differentiability of functionals for p-variation norms are found for empirical distribution functions. For $$1 \leqslant p \leqslant 2$$, the p-variation of the empirical process $$n^{1/2}(F_n - F)$$ is of order $$n^{1-p/2}$$ in probability, up to a factor $$(\log\log n)^{p/2}$$. For $$(F, G) \mapsto \int F\,dG$$ and for $$(F, G) \mapsto F \circ G^{-1}$$, this yields nearly optimal remainder bounds. Also, p-variation gives new proofs for the asymptotic distributions of the Cramér-von Mises-Rosenblatt and Watson two-sample statistics when the two sample sizes $$m, n$$ go to infinity arbitrarily.
R. M. Dudley
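Consider a finite sequence of values $$x_0, \ldots, x_m$$, for example the successive levels of a step function such as an empirical distribution function. For such a sequence, the supremum over partitions in the definition of p-variation reduces to a maximum over subsequences that start at $$x_0$$ and end at $$x_m$$. Dynamic programming computes this maximum in $$O(m^2)$$ time. The sketch below, with the hypothetical name `p_variation_sum`, returns $$v_p = \sup \sum_i |x_{t_i} - x_{t_{i-1}}|^p$$; the p-variation norm is then $$v_p^{1/p}$$. It is an illustration under these assumptions, not code from the paper.

```python
def p_variation_sum(x, p):
    """Return max over 0 = i_0 < ... < i_k = m of sum |x[i_j] - x[i_{j-1}]|**p.

    best[j] is the largest sum over chains that start at index 0 and end at j.
    Adding the first or last point to a chain never lowers the sum, so
    best[-1] is the supremum over all subsequences.
    """
    m = len(x)
    if m < 2:
        return 0.0
    best = [0.0] * m
    for j in range(1, m):
        best[j] = max(best[i] + abs(x[j] - x[i]) ** p for i in range(j))
    return best[-1]


print(p_variation_sum([0, 1, 0], p=1))  # 2.0 (total variation)
print(p_variation_sum([0, 1, 2], p=2))  # 4.0 (skipping the middle point is optimal)
```

For $$p = 1$$ the triangle inequality makes the full sequence optimal, recovering total variation. For $$p > 1$$, merging monotone runs increases the sum, which is why the maximization is needed.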

### 14. A Poisson Fishing Model

Abstract
A fishing model of Starr, Wardrop, and Woodroofe is related to the sequential search model of Cozzolino. The latter is generalized to allow an arbitrary joint distribution of capture times and fish sizes. Implications for the foraging models of Oaten and Green and for debugging software are indicated.
Thomas S. Ferguson

### 15. Lower Bounds for Function Estimation

Abstract
Over the last thirty years, it has been recognized that a common abstract framework underlies many basic problems of nonparametric estimation. In that framework, $$f$$ is an unknown function to be estimated, known to belong to a class $$\mathcal{P}$$ of smooth functions, and an observation $$X$$ is available in order to perform the estimation. The observation $$X$$ takes its values in a measurable space $$(E, \mathcal{B})$$ and obeys a probability law indexed by $$f$$ and denoted $$P_f$$. The set of all probabilities $$P_g$$, where $$g$$ varies in $$\mathcal{P}$$, is denoted $$P$$.
Catherine Huber

### 16. Some Estimation Problems in Infinite Dimensional Gaussian White Noise

Abstract
Statistical problems for infinite dimensional Gaussian white noise arise in a natural way when one tries to study statistical questions connected with stochastic partial differential equations (Hübner, Khas’minskii & Rozovskii 1993). In this paper we analyze the simplest situation of estimation of the shift parameter in Gaussian white noise. We considered the one-dimensional case in Ibragimov & Has’minskii (1977). It turns out that the problems for infinite dimensional white noise have a much richer analytical content, as we hope to show in this and other papers that we are planning to write.
I. Ibragimov, R. Khasminskii

### 17. On Asymptotic Inference in AR and Cointegrated Models With Unit Roots and Heavy Tailed Errors

Abstract
Consider the AR(q) model

$$X_i = \beta^{(1)} X_{i-1} + \cdots + \beta^{(q)} X_{i-q} + \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{1}$$

where $$\varepsilon_i$$, $$i \ge 1$$, are i.i.d., independent of $$(X_0, \ldots, X_{1-q})$$. The characteristic polynomial associated with the model (1) is defined by

$$\phi(z) = 1 - \beta^{(1)} z - \beta^{(2)} z^2 - \cdots - \beta^{(q)} z^q. \tag{2}$$
P. Jeganathan
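The sketch below assumes the real coefficients $$\beta^{(1)}, \ldots, \beta^{(q)}$$ are supplied as a list. It computes the roots of the characteristic polynomial $$\phi(z) = 1 - \beta^{(1)} z - \cdots - \beta^{(q)} z^q$$ with NumPy. It then flags a unit root (a root on the unit circle) or stationarity (all roots strictly outside it). The function names and the tolerance are illustrative choices.

```python
import numpy as np


def char_poly_roots(beta):
    """Roots of phi(z) = 1 - beta[0] z - beta[1] z^2 - ... - beta[q-1] z^q."""
    # np.roots expects coefficients ordered from the highest power down to the constant.
    coeffs = [-b for b in reversed(beta)] + [1.0]
    return np.roots(coeffs)


def has_unit_root(beta, tol=1e-6):
    """True if some root of phi lies on the unit circle (up to tol)."""
    return bool(np.any(np.abs(np.abs(char_poly_roots(beta)) - 1.0) < tol))


def is_stationary(beta, tol=1e-6):
    """True if every root of phi lies strictly outside the unit circle."""
    return bool(np.all(np.abs(char_poly_roots(beta)) > 1.0 + tol))


print(has_unit_root([1.0]))        # True: random walk, phi(z) = 1 - z
print(is_stationary([0.5]))        # True: single root z = 2
print(has_unit_root([1.5, -0.5]))  # True: phi(z) = (1 - z)(1 - z/2)
```

The tolerance guards against floating-point roots such as $$0.9999999999$$ that are mathematically on the unit circle.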

### 18. Le Cam at Berkeley

Abstract
Written in appreciation of the pleasure and many benefits I have received from over forty years of friendship and collegiality with Lucien Le Cam.
E. L. Lehmann

### 19. Another Look at Differentiability in Quadratic Mean

Abstract
This note revisits the delightfully subtle interconnections between three ideas: differentiability, in an $$\mathcal{L}^2$$ sense, of the square-root of a probability density; local asymptotic normality; and contiguity.
David Pollard
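Consider, as a worked example of the quantity that differentiability in quadratic mean controls, the normal location family $$p_\theta = N(\theta, 1)$$, with Fisher information $$I(\theta) = 1$$. This is an illustrative choice, not an example taken from the note. Completing the square, $$(x-\theta)^2 + (x-\theta-h)^2 = 2(x-\theta-h/2)^2 + h^2/2$$, gives the Hellinger affinity in closed form:

$$\begin{aligned}
\int \sqrt{p_\theta(x)\,p_{\theta+h}(x)}\,dx
  &= \int \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{(x-\theta)^2 + (x-\theta-h)^2}{4}\Big)\,dx = e^{-h^2/8},\\
\int \big(\sqrt{p_{\theta+h}(x)} - \sqrt{p_\theta(x)}\big)^2\,dx
  &= 2\big(1 - e^{-h^2/8}\big) = \frac{h^2}{4} + O(h^4) = \frac{h^2}{4}\,I(\theta) + O(h^4).
\end{aligned}$$

In general, differentiability in quadratic mean of $$\sqrt{p_\theta}$$ delivers the leading term $$h^2 I(\theta)/4$$, with an $$o(h^2)$$ remainder in place of $$O(h^4)$$.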

### 20. On a Set of the First Category

Abstract
In an analysis of the bootstrap, Putter & van Zwet (1993) showed that under quite general circumstances, the bootstrap will work for “most” underlying distributions. In fact, the set of exceptional distributions for which the bootstrap does not work was shown to be a set D of the first category in the space S of all possible underlying distributions, equipped with a topology S. Such a set of the first category is usually “small” in a topological sense. However, it is known that this concept of smallness may sometimes be deceptive and in unpleasant cases such “small” sets may in fact be quite large.
Here we present a striking and hopefully amusing example of this phenomenon, where the “small” subset D equals all of S. We show that as a result, a particular version of the bootstrap for the sample minimum will never work, even though our earlier results tell us that it can only fail for a “small” subset of underlying distributions. We also show that when we change the topology on S—and as a consequence employ a different resampling distribution—this paradox vanishes and a satisfactory version of the bootstrap is obtained. This demonstrates the importance of a proper choice of the resampling distribution when using the bootstrap.
Hein Putter, Willem R. van Zwet

### 21. A Limiting Distribution Theorem

Abstract
Under some mild conditions, we establish the limiting distribution of a test statistic proposed by Ebrahimi & Habibullah (1992) for testing exponentiality based on sample entropy.
C. R. Rao, L. C. Zhao

### 22. Minimum Distance Estimates with Rates under φ-mixing

Abstract
On the basis of the segment of observations $$X_1, \ldots, X_n$$ from a φ-mixing sequence of random variables, a minimum distance estimate $$\hat{P}_n$$ of the probability measure $$P$$ governing the process is constructed. Under suitable regularity conditions, it is shown that $$\hat{P}_n$$ is weakly uniformly consistent, within the class $$\mathcal{P}$$ of assumed probability measures, at the same rate as in the independent identically distributed case. Strengthening of the underlying assumptions provides for strong consistency.
George G. Roussas, Yannis G. Yatracos

### 23. Daniel Bernoulli, Leonhard Euler, and Maximum Likelihood

Abstract
The history of statistical concepts usually hinges on subtle questions of definition, on what one sees as a crucial element in the concept. Is the simple statement of a goal crucial? Or do we require the investigation of the implications of pursuing that goal, perhaps including the discovery of anomalies that require specification of conditions under which a claimed property holds? Or the detailed successful exploration of those conditions? Such considerations certainly arise in the case of the method of maximum likelihood. If the object of study is the modern theory of maximum likelihood, of its efficiency in large samples in a parametric setting, then an argument could be made for beginning with Edgeworth (1908-1909) (see Pratt, 1976), or Fisher (1912 or 1922 or 1935) (see Edwards, 1974), or even Wald (1949) or Le Cam (1953). It might be thought that the question would be easy to resolve if instead of worrying about mathematical rigor and the deeper questions of inference, including the interpretation of statistical information, we only asked about the introduction of the idea of choosing, as an estimate, that value which maximizes the likelihood function, but that is not the case. Even at that level difficulties of interpretation arise. Was Gauss employing maximum likelihood in 1809 when he arrived at the method of least squares, or, as some of his development would lead you to believe, was he maximizing a posterior density with a uniform prior?
Stephen M. Stigler

### 24. Asymptotic Admissibility and Uniqueness of Efficient Estimates in Semiparametric Models

Abstract
The concept of local asymptotic efficiency of estimators can be made precise in several ways. In semiparametric theory most authors are using local asymptotic minimaxity or asymptotic convolution theorems. We will show how Le Cam’s asymptotic admissibility theorem and Hájek’s asymptotic uniqueness result can be applied to semiparametric problems.
Helmut Strasser

### 25. Contiguity in Nonstationary Time Series

Abstract
We show how a result of Cox & Llatas (1991) can be derived using contiguity arguments. Also we compare the asymptotic power functions of three tests of the characteristic polynomial of an AR(1) process having a root at unity.
A. R. Swensen

### 26. More Optimality Properties of the Sequential Probability Ratio Test

Abstract
Consider the problem of testing sequentially the null hypothesis “$$\theta = 0$$” against the alternative “$$\theta = 1$$” on the basis of i.i.d. potentially observable variables $$X_1, X_2, \ldots$$. Let $$N$$ be a stopping rule admitting a test based on $$(X_1, \ldots, X_N)$$ having probabilities of errors $$\alpha_0$$ and $$\alpha_1$$. Then the Hellinger transform of $$(X_1, \ldots, X_N)$$ is at most equal to that of $$(X_1, \ldots, X_{N^*})$$, where $$N^*$$ is the stopping rule of a sequential probability ratio test having the same probabilities of errors. In particular, the Hellinger distance between the distributions of $$(X_1, \ldots, X_N)$$ under $$\theta = 0$$ and $$\theta = 1$$ is at least equal to the same distance for $$(X_1, \ldots, X_{N^*})$$. This remains so if the Hellinger distance is replaced by the statistical distance, provided the number 1 is not outside the stopping bounds.
E. Torgersen
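The sketch below makes the object of comparison concrete: Wald's sequential probability ratio test. For illustration only, it assumes the observations are $$N(\theta, 1)$$. Each observation $$x$$ then adds $$x - 1/2$$ to the log likelihood ratio of $$\theta = 1$$ against $$\theta = 0$$. The stopping bounds use Wald's classical approximations for target error probabilities $$\alpha_0$$ and $$\alpha_1$$. Here $$\alpha_0$$ is the probability of deciding $$\theta = 1$$ when $$\theta = 0$$ holds, and $$\alpha_1$$ is the probability of deciding $$\theta = 0$$ when $$\theta = 1$$ holds. The function name `sprt_normal` and this labelling of the two errors are assumptions.

```python
import math


def sprt_normal(xs, alpha0=0.05, alpha1=0.05):
    """Wald SPRT of theta = 0 vs theta = 1 for N(theta, 1) observations.

    Returns (decision, number of observations used); decision is "theta=0",
    "theta=1", or "undecided" if the data run out between the bounds.
    """
    upper = math.log((1 - alpha1) / alpha0)  # crossing above: decide theta = 1
    lower = math.log(alpha1 / (1 - alpha0))  # crossing below: decide theta = 0
    llr = 0.0
    for n, x in enumerate(xs, start=1):
        # log of N(1, 1) density over N(0, 1) density at x equals x - 1/2.
        llr += x - 0.5
        if llr >= upper:
            return "theta=1", n
        if llr <= lower:
            return "theta=0", n
    return "undecided", len(xs)


print(sprt_normal([1.0] * 20))  # ('theta=1', 6)
print(sprt_normal([0.0] * 20))  # ('theta=0', 6)
```

With $$\alpha_0 = \alpha_1 = 0.05$$ the bounds are $$\pm\log 19 \approx \pm 2.94$$, so a steady increment of $$\pm 0.5$$ first crosses a bound at the sixth observation.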

### 27. Superefficiency

Abstract
We review the history and several proofs of the famous result of Le Cam that a sequence of estimators can be superefficient on at most a Lebesgue null set.
A. W. van der Vaart

### 28. Le Cam’s Procedure and Sodium Channel Experiments

Abstract
Consider a random variable $$U$$ which has either a discrete distribution,

$$P[U = 0] = 1 - p, \qquad P[U = u] = p(1 - \lambda)\lambda^{u-1}, \tag{1}$$

for $$u = 1, 2, \ldots$$, with parameters $$0 < p < 1$$ and $$0 < \lambda < 1$$, or a mixture distribution with an atom at zero and an exponential density for $$u > 0$$,

$$P[U = 0] = 1 - p, \qquad U \text{ has density } p\lambda\exp(-\lambda u) \text{ for } u > 0, \tag{2}$$

for $$\lambda > 0$$.
Grace L. Yang

### 29. Assouad, Fano, and Le Cam

Abstract
This note explores the connections and differences between three commonly used methods for constructing minimax lower bounds in nonparametric estimation problems: Le Cam’s, Assouad’s and Fano’s. Two connections are established: between Le Cam’s and Assouad’s methods, and between Assouad’s and Fano’s. The three methods are then compared in the context of two estimation problems for a smooth class of densities on [0,1]. The two estimation problems are for the integrated squared first derivative and for the density function itself.
Bin Yu
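For reference, the sketch below evaluates the three lower bounds in the forms in which they are usually stated. These are standard textbook versions, assumed here rather than quoted from the note.

- **Le Cam's two-point bound.** For two hypotheses at total variation distance TV, any test has maximal error probability at least $$(1 - \mathrm{TV})/2$$.
- **Assouad's lemma.** Take $$2^m$$ hypotheses indexed by the hypercube $$\{0,1\}^m$$. If hypotheses differing in one coordinate are at total variation distance at most TV, any estimator of the vertex has maximal expected Hamming error at least $$(m/2)(1 - \mathrm{TV})$$.
- **Fano's inequality.** For $$M \ge 2$$ hypotheses with pairwise Kullback-Leibler divergences at most $$\beta$$, any test has maximal error probability at least $$1 - (\beta + \log 2)/\log M$$.

The function names are hypothetical.

```python
import math


def le_cam_two_point(tv):
    """Lower bound (1 - TV)/2 on the maximal error probability for two hypotheses."""
    return (1.0 - tv) / 2.0


def assouad(m, tv_neighbours):
    """Lower bound (m/2)(1 - TV) on the maximal expected Hamming error over {0,1}^m."""
    return m / 2.0 * (1.0 - tv_neighbours)


def fano(num_hypotheses, kl_max):
    """Lower bound 1 - (beta + log 2)/log M on the maximal error probability, clipped at 0."""
    if num_hypotheses < 2:
        raise ValueError("Fano's bound needs at least two hypotheses")
    return max(0.0, 1.0 - (kl_max + math.log(2)) / math.log(num_hypotheses))


print(le_cam_two_point(0.25))  # 0.375
print(assouad(10, 0.25))       # 3.75
print(fano(4, 0.0))
```

These inequalities only turn into minimax risk bounds once one chooses hypotheses that are far apart in the loss but close in TV or KL. Constructing such hypotheses for the two density problems is where the methods differ in practice.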