
1996 | Book

Weak Convergence and Empirical Processes

With Applications to Statistics

Authors: Aad W. van der Vaart, Jon A. Wellner

Publisher: Springer New York

Book Series: Springer Series in Statistics


About this book

This book tries to do three things. The first goal is to give an exposition of certain modes of stochastic convergence, in particular convergence in distribution. The classical theory of this subject was developed mostly in the 1950s and is well summarized in Billingsley (1968). During the last 15 years, the need for a more general theory allowing random elements that are not Borel measurable has become well established, particularly in developing the theory of empirical processes. Part 1 of the book, Stochastic Convergence, gives an exposition of such a theory following the ideas of J. Hoffmann-Jørgensen and R. M. Dudley. A second goal is to use the weak convergence theory background developed in Part 1 to present an account of major components of the modern theory of empirical processes indexed by classes of sets and functions. The weak convergence theory developed in Part 1 is important for this, simply because the empirical processes studied in Part 2, Empirical Processes, are naturally viewed as taking values in nonseparable Banach spaces, even in the most elementary cases, and are typically not Borel measurable. Much of the theory presented in Part 2 has previously been scattered in the journal literature and has, as a result, been accessible only to a relatively small number of specialists. In view of the importance of this theory for statistics, we hope that the presentation given here will make this theory more accessible to statisticians as well as to probabilists interested in statistical applications.

Table of Contents

Frontmatter

Stochastic Convergence

Frontmatter
1.1. Introduction

The first goal in this book is to give an exposition of the modern weak convergence theory suitable for the study of empirical processes.

Aad W. van der Vaart, Jon A. Wellner
1.2. Outer Integrals and Measurable Majorants

Let (Ω, A, P) be an arbitrary probability space and $$T:\Omega \mapsto \overline{\mathbb{R}}$$ an arbitrary map. The outer integral of T with respect to P is defined as $$E^*T = \inf \left\{ EU : U \geq T,\; U:\Omega \mapsto \overline{\mathbb{R}}\ \text{measurable and}\ EU\ \text{exists} \right\}.$$
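
As a small worked illustration (a standard identity added here for orientation, not quoted from the chapter), taking T equal to the indicator of an arbitrary, possibly nonmeasurable set A ⊂ Ω recovers the outer probability: $$P^*(A) = E^* 1_A = \inf\left\{ P(B) : B \supset A,\; B \in \mathcal{A} \right\}.$$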

Aad W. van der Vaart, Jon A. Wellner
1.3. Weak Convergence

In this section D and E are metric spaces with metrics d and e, respectively. The set of all continuous, bounded functions f: D ↦ ℝ is denoted by C_b(D).

Aad W. van der Vaart, Jon A. Wellner
1.4. Product Spaces

Let D and E be metric spaces with metrics d and e. Then the Cartesian product D × E is a metric space for any of the metrics $$\begin{array}{l} c\left( (x_1,y_1),(x_2,y_2) \right) = d(x_1,x_2) \vee e(y_1,y_2), \\ c\left( (x_1,y_1),(x_2,y_2) \right) = \sqrt{d(x_1,x_2)^2 + e(y_1,y_2)^2}, \\ c\left( (x_1,y_1),(x_2,y_2) \right) = d(x_1,x_2) + e(y_1,y_2). \end{array}$$ These generate the same topology, the product topology.
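
For orientation (a standard comparison, not quoted from the chapter), the three metrics are equivalent up to constants, which explains why they generate the same topology: writing d and e for d(x_1, x_2) and e(y_1, y_2), $$d \vee e \;\leq\; \sqrt{d^2 + e^2} \;\leq\; d + e \;\leq\; 2\,(d \vee e).$$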

Aad W. van der Vaart, Jon A. Wellner
1.5. Spaces of Bounded Functions

Let T be an arbitrary set. The space ℓ∞(T) is defined as the set of all uniformly bounded, real functions on T: all functions z: T ↦ ℝ such that $$\left\| z \right\|_T := \sup_{t \in T} \left| z(t) \right| < \infty.$$
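
A standard example (added for orientation, not quoted from the chapter) indicates why these spaces are typically nonseparable: in ℓ∞[0,1] the uncountably many indicators z_t = 1_{[t,1]} are at mutual distance $$\left\| 1_{[s,1]} - 1_{[t,1]} \right\|_{[0,1]} = 1, \qquad s \neq t,$$ so no countable subset of ℓ∞[0,1] can be dense.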

Aad W. van der Vaart, Jon A. Wellner
1.6. Spaces of Locally Bounded Functions

Let T_1 ⊂ T_2 ⊂ ... be arbitrary sets and T = ⋃_{i=1}^∞ T_i. The space ℓ∞(T_1, T_2, ...) is defined as the set of all functions z: T ↦ ℝ that are uniformly bounded on every T_i (but not necessarily on T). This is a complete metric space with respect to the metric $$d(z_1, z_2) = \sum_{i = 1}^\infty \left( \left\| z_1 - z_2 \right\|_{T_i} \wedge 1 \right) 2^{-i}.$$

Aad W. van der Vaart, Jon A. Wellner
1.7. The Ball Sigma-Field and Measurability of Suprema

The ball σ-field on D is the smallest σ-field containing all the open (and/or closed) balls in D. In general, this is smaller than the Borel σ-field, although the two σ-fields are equal for separable spaces (Problems 1.7.3 and 1.7.4). For some nonseparable spaces, it is even fairly common that maps are ball measurable even though they are not Borel measurable. Thus one may wonder about the possibility of a weak convergence theory for ball measurable maps. It turns out that the set of ball measurable f ∈ C b (D) is rich enough to make this fruitful, but at the same time it is so rich that the theory is a special case of the theory that we have discussed so far.

Aad W. van der Vaart, Jon A. Wellner
1.8. Hilbert Spaces

Let ℍ be a (real) Hilbert space with inner product 〈·,·〉 and complete orthonormal system {e_j : j ∈ J}. Thus 〈e_i, e_j〉 equals 0 or 1 if i ≠ j or i = j, respectively, and every x ∈ ℍ can be written as $$x = \sum_j \langle x, e_j \rangle e_j.$$
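
For orientation, the expansion comes with Parseval's identity (a standard fact, not quoted from the chapter): $$\left\| x \right\|^2 = \sum_j \langle x, e_j \rangle^2, \qquad x \in \mathbb{H}.$$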

Aad W. van der Vaart, Jon A. Wellner
1.9. Convergence: Almost Surely and in Probability

For nets of maps defined on a single, fixed probability space (Ω, A, P), convergence almost surely and in probability are frequently used modes of stochastic convergence, stronger than weak convergence. In this section we consider their nonmeasurable extensions together with the concept of almost uniform convergence, which is equivalent to outer almost sure convergence for sequences, but stronger and more useful for general nets.

Aad W. van der Vaart, Jon A. Wellner
1.10. Convergence: Weak, Almost Uniform, and in Probability

Consider the relationships between the convergence concepts introduced in the previous section and weak convergence. First we shall be a bit formal and note that convergence in probability to a constant can be defined for maps with different domains (Ω_α, A_α, P_α) too, so that it is not covered by Definition 1.9.1 in the preceding section.

Aad W. van der Vaart, Jon A. Wellner
1.11. Refinements

The continuous mapping theorems for the three modes of stochastic convergence considered so far can be refined to cover maps g_n(X_n), rather than g(X_n), for a fixed g. Then the g_n should have a property that might be called asymptotic equicontinuity almost everywhere under the limit measure.

Aad W. van der Vaart, Jon A. Wellner
1.12. Uniformity and Metrization

In principle, weak convergence is the pointwise convergence of “operators” X_α or L_α on the space C_b(D). However, there is automatically uniform convergence over certain subsets. These subsets can be fairly big: equicontinuity and boundedness suffice. On the other hand, there also exist small (countable) subsets such that pointwise convergence on such a subset is automatically uniform, and equivalent to pointwise convergence on the whole of C_b(D), i.e., weak convergence. For separable D, it is even possible to pick such a countable subset that works for every X_α at the same time.

Aad W. van der Vaart, Jon A. Wellner

Empirical Processes

Frontmatter
2.1. Introduction

This part is concerned with convergence of a particular type of random map: the empirical process. The empirical measure ℙ_n of a sample of random elements X_1,..., X_n on a measurable space (X, A) is the discrete random measure given by ℙ_n(C) = n^{-1} #(1 ≤ i ≤ n: X_i ∈ C). Alternatively (if points are measurable), it can be described as the random measure that puts mass 1/n at each observation. We shall frequently write the empirical measure as the linear combination $$\mathbb{P}_n = n^{-1}\sum_{i = 1}^n \delta_{X_i}$$ of the Dirac measures at the observations.
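
The following short Python sketch (not code from the book; the function names are illustrative) evaluates the empirical measure on a set C and the empirical integral ℙ_n f = n^{-1} Σ_i f(X_i) for a simulated sample:

```python
import numpy as np

def empirical_measure(sample, indicator):
    """P_n(C) = n^{-1} #{1 <= i <= n : X_i in C}, with C given by its indicator function."""
    return float(np.mean([indicator(x) for x in sample]))

def empirical_integral(sample, f):
    """P_n f = n^{-1} sum_i f(X_i)."""
    return float(np.mean([f(x) for x in sample]))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                        # X_1, ..., X_n i.i.d. N(0, 1)
print(empirical_measure(x, lambda t: t <= 0.0))  # approximates P(X <= 0) = 0.5
print(empirical_integral(x, lambda t: t ** 2))   # approximates E X^2 = 1
```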

Aad W. van der Vaart, Jon A. Wellner
2.2. Maximal Inequalities and Covering Numbers

In this chapter we derive a class of maximal inequalities that can be used to establish the asymptotic equicontinuity of the empirical process. Since the inequalities have much wider applicability, we temporarily leave the empirical process framework.

Aad W. van der Vaart, Jon A. Wellner
2.3. Symmetrization and Measurability

One of the two main approaches toward deriving Glivenko-Cantelli and Donsker theorems is based on the principle of comparing the empirical process to a “symmetrized” empirical process. In this chapter we derive the main symmetrization theorem, as well as a number of technical complements, which may be skipped at first reading.

Aad W. van der Vaart, Jon A. Wellner
2.4. Glivenko-Cantelli Theorems

In this chapter we prove two types of Glivenko-Cantelli theorems. The first theorem is the simplest and is based on entropy with bracketing. Its proof relies on finite approximation and the law of large numbers for real variables. The second theorem uses random L1-entropy numbers and is proved through symmetrization followed by a maximal inequality.
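
As a minimal numerical illustration of the classical case, namely the class of indicators of cells (−∞, t] (an assumption-laden simulation sketch, not code from the book), the supremum distance between the empirical and true distribution functions shrinks as n grows:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def sup_deviation(n):
    """sup_t |F_n(t) - F(t)| for an i.i.d. N(0, 1) sample; the sup is attained at order statistics."""
    x = np.sort(rng.normal(size=n))
    f = norm.cdf(x)
    upper = np.abs(np.arange(1, n + 1) / n - f).max()  # compare with F_n at the jump
    lower = np.abs(np.arange(0, n) / n - f).max()      # compare with F_n just before the jump
    return max(upper, lower)

for n in (100, 1000, 10000):
    print(n, sup_deviation(n))   # the supremum distance decreases toward 0
```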

Aad W. van der Vaart, Jon A. Wellner
2.5. Donsker Theorems

In this chapter we present the two main empirical central limit theorems. The first is based on uniform entropy, and its proof relies on symmetrization. The second is based on bracketing entropy.

Aad W. van der Vaart, Jon A. Wellner
2.6. Uniform Entropy Numbers

In Section 2.5.1 the empirical process was shown to converge weakly for indexing classes ℱ satisfying a uniform entropy condition. In particular, if $$\sup_Q \log N\left( \varepsilon \left\| F \right\|_{Q,2}, \mathcal{F}, L_2(Q) \right) \leq K \left( \frac{1}{\varepsilon} \right)^{2 - \delta}$$ for some δ > 0, then the entropy integral (2.5.1) converges and ℱ is a Donsker class for any probability measure P such that P*F^2 < ∞, provided measurability conditions are met. Many classes of functions satisfy this condition and often even the much stronger condition $$\sup_Q N\left( \varepsilon \left\| F \right\|_{Q,2}, \mathcal{F}, L_2(Q) \right) \leq K \left( \frac{1}{\varepsilon} \right)^{V}, \qquad 0 < \varepsilon < 1,$$ for some number V. In this chapter this is shown for classes satisfying certain combinatorial conditions. For classes of sets, these were first studied by Vapnik and Červonenkis, whence the name VC-classes. In the second part of this chapter, VC-classes of functions are defined in terms of VC-classes of sets. The remainder of this chapter considers operations on classes that preserve entropy properties, such as taking convex hulls.
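
A concrete case may help fix ideas (a standard example, added for orientation and not quoted from the chapter). The cells 𝒞 = {(−∞, t]: t ∈ ℝ} form a VC-class of index 2: no two-point set {x_1 < x_2} can be shattered, because no cell picks out the subset {x_2} alone. Since ‖1_{(−∞,s]} − 1_{(−∞,t]}‖_{Q,2}^2 = Q((s, t]) for s < t, cutting at roughly ε^{-2} Q-quantiles yields an ε-net in L_2(Q), so $$\sup_Q N\left( \varepsilon, \mathcal{C}, L_2(Q) \right) \leq K \left( \frac{1}{\varepsilon} \right)^{2}, \qquad 0 < \varepsilon < 1,$$ which is the stronger polynomial bound above with envelope F ≡ 1 and V = 2 (up to the constant K).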

Aad W. van der Vaart, Jon A. Wellner
2.7. Bracketing Numbers

While the VC-theory gives control over the entropy numbers of many interesting classes through simple combinatorial arguments, results on bracketing numbers can be found in approximation theory. This section gives examples. In some cases, the bracketing numbers are actually uniform in the underlying measure.

Aad W. van der Vaart, Jon A. Wellner
2.8. Uniformity in the Underlying Distribution

The previous chapters present empirical laws of large numbers and central limit theorems for observations from a fixed underlying distribution P. Many of the sufficient conditions given there are actually satisfied by very large classes of underlying measures; typically, the only limitation is finiteness of some appropriate moment of the envelope function. For instance, classes satisfying the uniform entropy condition are, up to measurability, Glivenko-Cantelli or Donsker for all P with P*F < ∞ or P*F^2 < ∞, respectively. In particular, many bounded classes of functions are universally Donsker: Donsker for every probability measure on the sample space.

Aad W. van der Vaart, Jon A. Wellner
2.9. Multiplier Central Limit Theorems

With the notation Z_i = δ_{X_i} − P, the empirical central limit theorem can be written $$\frac{1}{\sqrt{n}}\sum_{i = 1}^n Z_i \rightsquigarrow G$$ in ℓ∞(ℱ), where G is a (tight) Brownian bridge. Given i.i.d. real-valued random variables ξ_1,..., ξ_n, which are independent of Z_1,..., Z_n, the multiplier central limit theorem asserts that $$\frac{1}{\sqrt{n}}\sum_{i = 1}^n \xi_i Z_i \rightsquigarrow G.$$
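
The sketch below (not code from the book) illustrates the practically used conditional version over a finite grid of cells (−∞, t], with P replaced by ℙ_n and Gaussian multipliers; all names and modeling choices here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.exponential(size=n)                           # X_1, ..., X_n from an unknown P
grid = np.quantile(x, np.linspace(0.02, 0.98, 200))   # cells (-inf, t] indexed by a finite grid

ind = (x[:, None] <= grid[None, :]).astype(float)     # 1{X_i <= t}, shape (n, len(grid))
ecdf = ind.mean(axis=0)                               # empirical distribution function on the grid

def multiplier_process():
    """One realization of n^{-1/2} sum_i xi_i (delta_{X_i} - P_n) evaluated on the grid."""
    xi = rng.normal(size=n)                           # i.i.d. standard normal multipliers
    return (xi[:, None] * (ind - ecdf)).sum(axis=0) / np.sqrt(n)

sups = [np.abs(multiplier_process()).max() for _ in range(1000)]
print(np.quantile(sups, 0.95))   # approximates the 95% quantile of sup_t |G(t)|
```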

Aad W. van der Vaart, Jon A. Wellner
2.10. Permanence of the Donsker Property

In this chapter we consider a number of operations that preserve the Donsker property and allow the formation of many new Donsker classes from given examples. For instance, unions, convex hulls, and certain closures of Donsker classes are Donsker. The main result of this chapter concerns Lipschitz transformations of Donsker classes and is discussed in Section 2.10.2. Section 2.10.3 covers the preservation of the uniform-entropy condition, and in Section 2.10.4 new Donsker classes are formed through union of sample spaces.

Aad W. van der Vaart, Jon A. Wellner
2.11. The Central Limit Theorem for Processes

So far we have focused on limit theorems and inequalities for the empirical process of independent and identically distributed random variables. Most of the methods of proof apply more generally. In this chapter we indicate some extensions to the case of independent but not identically distributed processes.

Aad W. van der Vaart, Jon A. Wellner
2.12. Partial-Sum Processes

The name “Donsker class of functions” was chosen in honor of Donsker’s theorem on weak convergence of the empirical distribution function. A second famous theorem by Donsker concerns the partial-sum process $$\mathbb{Z}_n(s) = \frac{1}{\sqrt{n}}\sum_{i = 1}^{[ns]} Y_i = \frac{1}{\sqrt{n}}\sum_{i = 1}^k Y_i, \qquad \frac{k}{n} \leq s < \frac{k + 1}{n},$$ for i.i.d. random variables Y_1, …, Y_n with zero mean and variance 1. Donsker essentially proved that the sequence of processes {ℤ_n(s): 0 ≤ s ≤ 1} converges in distribution in the space ℓ∞[0,1] to a standard Brownian motion process [Donsker (1951)].
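
A short simulation (an illustrative sketch, not from the book) checks this against a known functional of Brownian motion: by the reflection principle, P(sup_{0≤s≤1} B(s) ≤ a) = 2Φ(a) − 1, and the corresponding empirical frequency for ℤ_n should be close to this value:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, reps = 1000, 5000

def sup_partial_sum():
    """sup_{0<=s<=1} Z_n(s) for Rademacher increments Y_i (mean 0, variance 1)."""
    y = rng.choice([-1.0, 1.0], size=n)
    return max(np.cumsum(y).max(), 0.0) / np.sqrt(n)   # Z_n(0) = 0 is included in the sup

sims = np.array([sup_partial_sum() for _ in range(reps)])
a = 1.0
# Reflection principle: P(sup_{[0,1]} B <= a) = 2*Phi(a) - 1 for standard Brownian motion B.
print((sims <= a).mean(), 2 * norm.cdf(a) - 1)          # the two numbers should be close
```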

Aad W. van der Vaart, Jon A. Wellner
2.13. Other Donsker Classes

In this section we consider some Donsker classes of interest that do not fit well in the framework of the preceding chapters.

Aad W. van der Vaart, Jon A. Wellner
2.14. Tail Bounds

In this chapter we derive moment and tail bounds for the supremum $$\left\| \mathbb{G}_n \right\|_{\mathcal{F}}$$ of the empirical process. Throughout this chapter, $$\mathbb{G}_n = \sqrt{n}\,(\mathbb{P}_n - P)$$ denotes the empirical process of an i.i.d. sample X_1,..., X_n from a probability measure P, defined as the coordinate projections of a product probability space (X^∞, A^∞, P^∞).

Aad W. van der Vaart, Jon A. Wellner

Statistical Applications

Frontmatter
3.1. Introduction

The empirical process methods and techniques developed in Part 2 have many applications in statistics. The present part illustrates this in some detail through applications including M-estimation (limit theory and rates of convergence in infinite-dimensional problems), the bootstrap, permutation tests, tests of independence, the functional delta-method, and contiguity theory.

Aad W. van der Vaart, Jon A. Wellner
3.2. M-Estimators

The most important method of constructing statistical estimators is to choose the estimator to maximize a certain criterion function. We shall call such estimators M-estimators (from “maximum” or “minimum”). In the case of i.i.d. observations X_1,..., X_n, a common type of criterion function is of the form $$\theta \mapsto \mathbb{P}_n m_\theta = \frac{1}{n}\sum_{i = 1}^n m_\theta(X_i),$$ for given functions m_θ on the sample space. In particular, the method of maximum likelihood estimation corresponds to the choice m_θ = log p_θ, where p_θ is the density of the observations. The theory of empirical processes comes in naturally when studying the asymptotic properties of these estimators. In this chapter we present several results that give the asymptotic distribution of M-estimators. Some results are of a general nature, while others presume the set-up of i.i.d. observations.
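
As a toy illustration (a sketch under the assumption of a N(θ, 1) model; not code from the book), the maximum likelihood estimator is computed as an M-estimator by numerically maximizing θ ↦ ℙ_n m_θ:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
x = rng.normal(loc=2.0, scale=1.0, size=200)   # X_1, ..., X_n with true theta = 2

def criterion(theta):
    """P_n m_theta with m_theta = log p_theta for the N(theta, 1) model (constants dropped)."""
    return np.mean(-0.5 * (x - theta) ** 2)

# The M-estimator maximizes theta -> P_n m_theta; equivalently, minimize its negative.
res = minimize_scalar(lambda t: -criterion(t), bounds=(-10.0, 10.0), method="bounded")
print(res.x, x.mean())   # in this model the MLE coincides with the sample mean
```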

Aad W. van der Vaart, Jon A. Wellner
3.3. Z-Estimators

Let the parameter set Θ be a subset of a Banach space, and let $$\psi_n:\Theta \mapsto L, \qquad \psi:\Theta \mapsto L$$ be random maps and a deterministic map, respectively, with values in another Banach space L. Here “random maps” means that each ψ_n(θ) is defined on the product of Θ and some probability space. The dependence on the probability space is suppressed in the notation.

Aad W. van der Vaart, Jon A. Wellner
3.4. Rates of Convergence

This chapter gives some results on rates of convergence of M-estimators, including maximum likelihood estimators and least-squares estimators. We first state an abstract result, which is a generalization of the theorem on rates of convergence in Chapter 3.2, and next discuss some methods to establish the maximal inequalities needed for the application of this result. Our main interest is in M-estimators of infinite-dimensional parameters.

Aad W. van der Vaart, Jon A. Wellner
3.5. Random Sample Size, Poissonization and Kac Processes

It can be argued that in practice the number of available observations is often random and perhaps dependent on the random phenomenon under study. In the first section it is shown in fair generality that the empirical central limit theorem is valid also for random sample sizes.

Aad W. van der Vaart, Jon A. Wellner
3.6. The Bootstrap

In this chapter we first prove the asymptotic consistency of the empirical bootstrap estimator of the distribution of the empirical process. Next this result is generalized to more general, “exchangeable” bootstrap schemes. The results of this chapter are particularly interesting when combined with those of Section 3.9.3, which show that the consistency of the bootstrap is retained under application of the delta-method.
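
A minimal sketch of the empirical bootstrap for the supremum of the classical empirical process (indicators of cells), assuming multinomial resampling from ℙ_n; this is illustrative code, not from the book:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x = rng.gamma(2.0, size=n)                        # original sample from an unknown P
grid = np.sort(x)
ecdf = (x[:, None] <= grid[None, :]).mean(axis=0)

def bootstrap_sup():
    """sup_t sqrt(n) |(P_n* - P_n)(-inf, t]| for one nonparametric bootstrap resample."""
    xstar = rng.choice(x, size=n, replace=True)   # resample with replacement from P_n
    ecdf_star = (xstar[:, None] <= grid[None, :]).mean(axis=0)
    return np.sqrt(n) * np.abs(ecdf_star - ecdf).max()

draws = [bootstrap_sup() for _ in range(1000)]
print(np.quantile(draws, 0.95))   # bootstrap estimate of the 95% quantile of the supremum
```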

Aad W. van der Vaart, Jon A. Wellner
3.7. The Two-Sample Problem

Let X1,...,X m and Y1,...,Y n be independent random samples from distributions P and Q on a measurable space (X, A). We wish to test the null hypothesis H0: P = Q versus the alternative H1: P ≠ Q.
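
A classical statistic for this problem (stated for orientation; not quoted from the chapter) is the two-sample Kolmogorov–Smirnov statistic $$D_{m,n} = \sup_t \left| \mathbb{F}_m(t) - \mathbb{G}_n(t) \right|,$$ where 𝔽_m and 𝔾_n are the empirical distribution functions of the two samples; for continuous P = Q, the rescaled statistic √(mn/(m+n)) D_{m,n} has a null limit distribution that does not depend on P.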

Aad W. van der Vaart, Jon A. Wellner
3.8. Independence Empirical Processes

Let H be a probability measure on the measurable space (X × Y, A × B) with marginal laws P and Q on (X, A) and (Y, B), respectively. Given a sample (X_1, Y_1),..., (X_n, Y_n) of independently and identically distributed vectors from H, we want to test the null hypothesis of independence H_0: H = P × Q versus the alternative hypothesis H_1: H ≠ P × Q. Let ℍ_n be the empirical measure of the observations, and let ℙ_n and ℚ_n be its marginals. The latter are the empirical measures of the X_i's and Y_i's, respectively.

Aad W. van der Vaart, Jon A. Wellner
3.9. The Delta-Method

After giving the general principle of the delta-method, we consider the special case of Gaussian limits and the “conditional” delta-method, which applies to the bootstrap. The chapter closes with a large number of examples.

Aad W. van der Vaart, Jon A. Wellner
3.10. Contiguity

Let P and Q be probability measures on a measurable space (Ω, A). If Q is absolutely continuous with respect to P, then the Q-law of a measurable map X: Ω ↦ D can be calculated from the P-law of the pair (X, dQ/dP) through the formula $${E_Q}f(X) = {E_P}f(X)\frac{{dQ}}{{dP}}$$
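
As a simple illustration of the displayed change-of-measure formula (not quoted from the chapter), take P = N(0, 1) and Q = N(μ, 1) on the real line; then $$\frac{dQ}{dP}(x) = \exp\left( \mu x - \tfrac{1}{2}\mu^2 \right), \qquad E_Q f(X) = E_P\, f(X)\, e^{\mu X - \mu^2/2}.$$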

Aad W. van der Vaart, Jon A. Wellner
3.11. Convolution and Minimax Theorems

Let H be a linear subspace of a Hilbert space with inner product (·, ·) and norm ‖·‖. For each n ∈ ℕ and h ∈ H, let P_{n,h} be a probability measure on a measurable space (X_n, A_n). Consider the problem of estimating a “parameter” k_n(h) given an “observation” X_n with law P_{n,h}. The convolution theorem and the minimax theorem give a lower bound on how well k_n(h) can be estimated asymptotically as n → ∞. Suppose the sequence of statistical experiments (X_n, A_n, P_{n,h}: h ∈ H) is “asymptotically normal” and the sequence of parameters is “regular”. Then the limit distribution of every “regular” estimator sequence is the convolution of a certain Gaussian distribution and a noise factor. Furthermore, the maximum risk of any estimator sequence is bounded below by the “risk” of this Gaussian distribution. These concepts are defined as follows.

Aad W. van der Vaart, Jon A. Wellner
Backmatter
Metadata
Title
Weak Convergence and Empirical Processes
Authors
Aad W. van der Vaart
Jon A. Wellner
Copyright Year
1996
Publisher
Springer New York
Electronic ISBN
978-1-4757-2545-2
Print ISBN
978-1-4757-2547-6
DOI
https://doi.org/10.1007/978-1-4757-2545-2