2003 | Book

Sample Survey Theory

Some Pythagorean Perspectives

Written by: Paul Knottnerus

Publisher: Springer New York

Book series: Springer Series in Statistics

About this book

This volume deals primarily with the classical question of how to draw conclusions about the population mean of a variable, given a sample with observations on that variable. Another classical question is how to use prior knowledge of an economic or definitional relationship between the population means of several variables, provided that the variables are observed in a sample. The present volume is a compilation of two discussion papers and some additional notes on these two basic questions. The discussion papers and notes were prepared for a 15-hour course at Statistics Netherlands in Voorburg in February 2000. The first discussion paper is entitled "A Memoir on Sampling and Rho, the Generalized Intrasample Correlation Coefficient" (1999). It describes a new approach to the problem of unequal probability sampling. The second discussion paper, "The General Restriction Estimator" (2000), deals with the problem of how to find constrained estimators that obey a given set of restrictions imposed on the parameters to be estimated. Parts I and II of the volume provide a novel and systematic treatment of sampling theory considered from the angle of the sampling autocorrelation coefficient ρ (rho). The same concept plays an important role in the analysis of time series. Although this concept is also well known in sampling theory, for instance in cluster sampling and systematic sampling, generalizations of ρ for an arbitrary sampling design are to my knowledge not readily found in the literature.

Table of contents

Frontmatter

Introduction and Outline of the Book

1. Introduction and Outline of the Book
Abstract
In practice most formulas in sampling theory can be employed straightforwardly and the calculations necessary for statistical inference can easily be carried out, provided that nonsampling errors can be disregarded. Only in case of complex sampling designs or, equivalently, in case of unequal probability sampling without replacement, the calculations required for variance estimation can be cumbersome or other problems may arise; see Section 4.1 or Särndal et al. (1992, pp. 98–99). Therefore, alternative methods for variance estimation are very welcome.
Paul Knottnerus

Elementary Statistics

2. Elementary Statistics
Abstract
In order to make the present volume as self-contained as possible, this chapter provides a concise summary of the main results from statistics. However, it is assumed that the reader has already had a course in statistics including regression analysis. Chapter 2 is just meant as a service to the reader. When examining later chapters he may wish to refer back to one of the sections in this chapter. Furthermore, some familiarity with calculus and matrix algebra is a sufficient prerequisite.
Paul Knottnerus

Sampling Theory and Autocorrelations

Frontmatter
3. Alternative Approach to Unequal Probability Sampling
Abstract
In this chapter we examine an alternative approach to the problem of unequal probability sampling which is based on the so-called sampling autocorrelation coefficient, denoted by \(\rho_z\) (rho(z)) and also \(\rho_y\). As a point of departure for our considerations we take the normal distribution and a discrete approximation of it. That is, in the simple situation of sampling with replacement we start without loss of generality with the assumption that the target variable in the population can be well described by the normal distribution. We also could have taken the more realistic lognormal distribution but, unfortunately, this distribution leads to more tedious algebra; note that most economic data such as income, investments, or returns are better described by a lognormal distribution than by a normal distribution. On the other hand, it is recalled from mathematical statistics that independent drawings from an arbitrary continuous or discrete distribution always lead to the same sampling formulas irrespective of the type of the underlying distribution or population.
Paul Knottnerus
4. A General Rho Theory on Survey Sampling
Abstract
In the previous chapter we have seen how useful the sampling autocorrelation coefficient \(\rho_z\) is for deriving formulas for variance estimators in case of unequal probability sampling from a finite population. Especially in case of sampling with replacement, the formulas are straightforward because \(\rho_z = 0\) or, equivalently, the sample observations are uncorrelated for such sampling designs. This applies to multistage sampling designs as well. From a geometric point of view this means that the random variables \(z_i\) are mutually orthogonal \((i = 1, \ldots, n)\). In this chapter we will give a more detailed geometric interpretation of \(\rho_z\) in case of unequal probability sampling without replacement. Before doing this we give a comprehensive and a somewhat more formal description of the alternative rho approach to unequal probability sampling. In the next section we first pay attention to the classical Horvitz-Thompson estimator (for short, the HT estimator).
Paul Knottnerus
5. Variance Estimation for Some Standard Designs
Abstract
In this chapter we demonstrate how, for some standard sampling designs without replacement, the well-known formulas for \(Var(\widehat{\overline Y}_{HT})\), \(\widehat{Var}(\widehat{\overline Y}_{HT})\), and \(E(s_y^2)\) follow directly from the results outlined in the previous chapters. First we consider the Hansen-Hurwitz (HH) estimator for sampling with replacement; cf. Hansen and Hurwitz (1943).
Paul Knottnerus
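The Hansen-Hurwitz estimator mentioned in the abstract of Chapter 5 can be illustrated with a minimal sketch. It assumes the textbook form of the estimator for sampling with replacement: each sampled value is scaled to \(z_i = y_i / p_i\), the estimated total is the mean of the \(z_i\), and its variance is estimated by \(s_z^2 / n\). The function name `hansen_hurwitz` and the small population are hypothetical and not taken from the book.

```python
import random
import statistics


def hansen_hurwitz(y_sample, p_sample):
    """Return (estimated population total, estimated variance of that estimate).

    y_sample: observed values y_i of the sampled units
    p_sample: drawing probabilities p_i of those units (they sum to 1 over
              the whole population); sampling is assumed to be with replacement.
    """
    n = len(y_sample)
    z = [y / p for y, p in zip(y_sample, p_sample)]
    total_hat = sum(z) / n                # mean of the scaled values z_i
    var_hat = statistics.variance(z) / n  # s_z^2 / n
    return total_hat, var_hat


# Hypothetical population: target variable Y and a size measure X.
Y = [12.0, 30.0, 45.0, 60.0, 153.0]
X = [10.0, 25.0, 40.0, 50.0, 125.0]
P = [x / sum(X) for x in X]  # drawing probabilities proportional to size

rng = random.Random(0)
drawn = rng.choices(range(len(Y)), weights=P, k=3)  # with replacement
total_hat, var_hat = hansen_hurwitz([Y[i] for i in drawn], [P[i] for i in drawn])
mean_hat = total_hat / len(Y)  # estimate of the population mean
```

When \(y_i\) is exactly proportional to \(p_i\), every \(z_i\) equals the population total and the estimated variance is zero, which is why a good size measure matters in this design.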
6. Multistage and Cluster (Sub)Sampling
Abstract
This chapter focuses on multistage sampling designs. Unlike in stratified sampling, in multistage sampling not all clusters (or strata) are sampled; only a subset of n clusters is sampled. In the second stage (sub)samples are drawn from those clusters drawn in the first stage in order to estimate the corresponding cluster totals. This is called two-stage sampling. When the clusters are divided into subclusters and the subcluster totals are estimated first, one calls this three-stage sampling, and so on; see also Sections 3.4 and 3.9. Some of the results mentioned already in Chapter 3 are also given in this chapter where we bring together the various results on multistage sampling. This chapter also discusses the problem when clusters are not disjoint or when some elements of the population are observed more than once. The last section of this chapter deals with the postclustification estimator, which can be used when poststratification is inappropriate because some strata are missing in the sample.
Paul Knottnerus
7. Systematic Sampling
Abstract
In this chapter we pay attention to systematic sampling with equal probabilities (SY) and systematic sampling with unequal probabilities. The latter is also known as the dollar unit (DU) sampling method. First we treat the systematic (SY) sampling design where all population elements have equal inclusion probabilities. Although the sampling autocorrelation coefficient or intracluster correlation coefficient \(\rho_y\) is in this context a well-known measure for the homogeneity of the clusters, we will argue that the squared multiple correlation coefficient \(R^2\) from regression theory can also be interpreted as a measure for the homogeneity of the clusters, where \(R^2\) stems from a regression of y on cluster dummy variables.
Paul Knottnerus
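The \(R^2\) from a regression of y on cluster dummy variables, as described in the abstract of Chapter 7, can be computed with a short sketch. It assumes ordinary least squares and treats each possible systematic sample as one cluster; the function name `r_squared_cluster_dummies` and the data are hypothetical.

```python
import numpy as np


def r_squared_cluster_dummies(y, cluster):
    """R^2 from an OLS regression of y on one dummy variable per cluster."""
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    labels = np.unique(cluster)
    # One 0/1 column per cluster; together the columns span the constant term.
    D = (cluster[:, None] == labels[None, :]).astype(float)
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)  # fitted values = cluster means
    resid = y - D @ beta
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / sst


# Hypothetical population of N = 12 elements and sampling interval 4:
# element j belongs to systematic sample (cluster) j mod 4.
y = np.array([3, 5, 8, 10, 4, 6, 9, 11, 3, 5, 8, 12], dtype=float)
cluster = np.arange(len(y)) % 4
r2 = r_squared_cluster_dummies(y, cluster)
```

A value of `r2` close to 1 means the cluster means explain most of the variation in y, so elements within the same cluster resemble each other; a value close to 0 means the clusters are internally as heterogeneous as the population.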

Variance Estimation in Complex Surveys

Frontmatter
8. Estimation of the Sampling Autocorrelation \(\rho_z\)
Abstract
This chapter describes how under some regularity conditions the sampling autocorrelation coefficient \(\rho_z\) can be estimated in practice. The basic idea of the estimation procedure is that the fixed value \(Z_j\) can be decomposed numerically into two components: (i) a part \(\Phi_j\) that is linear in the powers of \(p_j\) and (ii) an uncorrelated remainder \(\Omega_j\) \((j = 1, \ldots, N)\). Or to say it in a more intuitive way, we assume that \(Z_j\) can be split up according to
$$\begin{aligned} Z_j \equiv \frac{Y_j}{p_j} &= f(p_j) + \Omega_j \qquad (j = 1, \ldots, N) \\ &\approx \sum_{k=0}^{K} \beta_k p_j^k + \Omega_j \\ &= \Phi_j + \Omega_j, \end{aligned}$$
Paul Knottnerus
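The split of \(Z_j\) into a polynomial part \(\Phi_j\) and a remainder \(\Omega_j\) in the abstract of Chapter 8 can be sketched numerically. The sketch assumes that the coefficients \(\beta_k\) are obtained by ordinary least squares on the powers of \(p_j\); the abstract does not state how they are determined, so this fitting choice, the function name `decompose`, and the data are all assumptions.

```python
import numpy as np


def decompose(z, p, degree):
    """Split z into a polynomial part phi (in powers of p) and a remainder omega.

    The polynomial is fitted by least squares (an assumption of this sketch).
    Returns (phi, omega, beta, r2) where beta[k] multiplies p**k.
    """
    z = np.asarray(z, dtype=float)
    p = np.asarray(p, dtype=float)
    coef = np.polyfit(p, z, degree)  # coefficients, highest power first
    phi = np.polyval(coef, p)        # fitted polynomial part
    omega = z - phi                  # remainder
    r2 = 1.0 - np.sum(omega ** 2) / np.sum((z - z.mean()) ** 2)
    return phi, omega, coef[::-1], r2


# Hypothetical population values Y_j and drawing probabilities p_j (sum to 1).
Y = np.array([12.0, 30.0, 45.0, 60.0, 153.0, 100.0])
p = np.array([0.04, 0.08, 0.12, 0.16, 0.35, 0.25])
Z = Y / p
phi, omega, beta, r2 = decompose(Z, p, degree=2)
```

With a least-squares fit that includes a constant term, the remainder `omega` sums to zero and is uncorrelated with every fitted power of `p`, which matches the "uncorrelated remainder" property named in the abstract.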
9. Variance Approximations
Abstract
In this chapter we make yet another attempt to answer the question under which (sufficient) numerical conditions the approximation
$$\rho_z \approx R^2 \rho_\varphi + \left(1 - R^2\right) \rho_\omega \tag{9.1}$$
from (8.39) can be used to estimate \(Var({\widehat {\overline Y }_{HT}}).\)
Paul Knottnerus
10. A Simulation Study
Abstract
In this chapter we present the results of a simulation study with real data from a population that consists of 34 Dutch municipalities. The 34 municipalities are a systematic selection from the population of all 572 Dutch municipalities in 1997. The 572 municipalities are ordered such that the numbers of inhabitants are increasing. The chosen (sub)population in our simulation study consists of the 34 municipalities with the rank numbers 15, 30, ..., 510. The data of these 34 municipalities are given in Appendix 10.A.
Paul Knottnerus
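The selection rule in the abstract of Chapter 10 (every 15th rank number from 15 up to 510 in the size-ordered list of 572 municipalities) can be checked with a small sketch. The variable names are hypothetical, and the actual municipality data from Appendix 10.A are not reproduced here.

```python
# Rank numbers 15, 30, ..., 510 in the list ordered by number of inhabitants.
N_ALL = 572  # all Dutch municipalities in 1997, as stated in the abstract
STEP = 15

ranks = list(range(STEP, 510 + 1, STEP))

# The rule yields exactly the 34 municipalities of the simulation population.
n_selected = len(ranks)
largest_rank_used = ranks[-1]
unused_top_ranks = N_ALL - largest_rank_used  # ranks above 510 are not selected
```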

Minimum Variance Estimators

Frontmatter
11. The Regression Estimator Revisited
Abstract
In Chapter 5 we described the regression estimator and gave two expressions for estimating the vector β of regression coefficients. A natural question now is to what extent \(Var({\widehat {\overline Y }_{z,reg}})\) attains its minimum by choosing β according to (5.44), which does not depend on the inclusion probabilities at all. Formula (5.44) is based on the minimization of the unweighted sum of squared residuals e′e = (y − Xβ)′(y − Xβ). Although the resulting β might be seen as the minimum variance estimator of a superpopulation parameter \(\beta_{sup}\) from a hypothetical superpopulation model with homoskedastic Gaussian disturbances, this value of β does not necessarily minimize \(Var({\widehat {\overline Y }_{z,reg}})\). In other words, there is no guarantee that (5.44) provides the minimum variance estimator of the target parameter, i.e., the actual population mean \(\overline Y \). Refinements as well as corrections that allow for heteroskedastic disturbances of the superpopulation model cannot overcome this shortcoming of the regression estimator defined according to (5.44) or (5.46), at least not in the case of unequal probability sampling; see also Särndal et al. (1992, p. 291). Moreover, counterexamples can be given in which a specific regression estimator based on (5.44) has a larger variance than the ordinary Horvitz-Thompson estimator. In other words, it is possible to disprove the statement that \(R^2 > c > 0\) implies \(Var({\widehat {\overline Y }_{reg}}) < Var({\widehat {\overline Y }_{HT}})\) for large values of n, where R is the multiple correlation coefficient from a regression of y on an auxiliary variable x and a constant term.
Paul Knottnerus
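The unweighted criterion e′e = (y − Xβ)′(y − Xβ) named in the abstract of Chapter 11 is the ordinary least squares criterion. The following sketch shows that minimization for one auxiliary variable and a constant term. Formula (5.44) itself is not reproduced on this page, so treating it as plain OLS is an assumption based on the abstract's description; the function name `ols_beta` and the data are hypothetical.

```python
import numpy as np


def ols_beta(X, y):
    """Return the beta that minimizes the unweighted sum of squares (y - X b)'(y - X b)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Hypothetical sample: auxiliary variable x and target variable y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])  # constant term plus x

beta = ols_beta(X, y)
e = y - X @ beta  # residuals
sse = e @ e       # minimized value of e'e
```

Note that the inclusion probabilities appear nowhere in this computation, which is the property the abstract singles out: the criterion is unweighted, so nothing ties the resulting β to the variance of the estimator under an unequal probability design.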
12. General Restriction Estimator in Multisurvey Sampling
Abstract
In practice it may occur that two or more estimates of the same population parameter are available but that the estimates come from different surveys. A similar situation occurs when the estimates correspond to different parameters and the estimates have to obey a set of restrictions implied by the definitions of the underlying parameters or by theoretical considerations. For instance, quarterly data have to sum up to the yearly total, hence the sum of the four quarterly estimates should be equal to the yearly estimate if the estimates come from different surveys. At the macroeconomic level the aggregated demand for a good such as, for instance, energy must be equal to the aggregated supply of energy. Another example is that costs plus profits of all enterprises must be equal to their turnovers, whereas the sample observations might come from two or three (in)dependent surveys. Consider, for instance, the following situation with four samples that we will further elaborate on in Example 12.3. Let the variable x stand for the cost, y for the profit, and z for the turnover. Obviously, the population totals obey the restriction X + Y = Z. Furthermore, it is assumed that variable x is observed in samples 1 and 3, variable y in samples 2 and 3, while the variable z is only observed in sample 4.
Paul Knottnerus
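The restriction X + Y = Z from the abstract of Chapter 12 can be imposed on a set of initial estimates with a standard minimum-variance adjustment for linear restrictions \(R\theta = c\), namely \(\tilde\theta = \hat\theta - V R'(R V R')^{-1}(R\hat\theta - c)\). Whether this coincides in every detail with the book's general restriction estimator is an assumption. The sketch also simplifies the four-sample setting of the abstract to one independent estimate per total; the function name `restrict` and all numbers are hypothetical.

```python
import numpy as np


def restrict(theta_hat, V, R, c):
    """Adjust estimates theta_hat (covariance V) so that R @ theta = c holds.

    Returns the adjusted estimates and their covariance matrix.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    V = np.asarray(V, dtype=float)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    c = np.atleast_1d(np.asarray(c, dtype=float))
    K = V @ R.T @ np.linalg.inv(R @ V @ R.T)  # gain matrix
    theta_r = theta_hat - K @ (R @ theta_hat - c)
    V_r = V - K @ R @ V
    return theta_r, V_r


# Hypothetical initial estimates of the totals X (cost), Y (profit), Z (turnover).
theta_hat = np.array([100.0, 30.0, 135.0])
V = np.diag([4.0, 1.0, 5.0])      # assumed independent estimates
R = np.array([[1.0, 1.0, -1.0]])  # encodes X + Y - Z = 0
c = np.array([0.0])

theta_r, V_r = restrict(theta_hat, V, R, c)
# theta_r is [102.0, 30.5, 132.5], so that 102.0 + 30.5 = 132.5
```

The discrepancy of 5 between X + Y and Z is spread over the three estimates in proportion to their variances, so the least precise estimate is adjusted most.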
13. Weighting Procedures
Abstract
In survey sampling, estimators are often translated into weights \(w_i\) for the individual observations or records of the sample in such a way that \(\sum w_i y_i\) yields the desired estimator of the population mean. For example, in SRS sampling the weights \(w_i\) are obviously equal to 1/n.
Paul Knottnerus
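The weighting idea in the abstract of Chapter 13 can be written as a tiny sketch: with weights \(w_i = 1/n\), the weighted sum \(\sum w_i y_i\) is the ordinary sample mean. The function name `weighted_estimate` and the data are hypothetical.

```python
def weighted_estimate(weights, y):
    """Return sum(w_i * y_i), the estimator expressed through record weights."""
    return sum(w * yi for w, yi in zip(weights, y))


# Hypothetical SRS sample of n = 3 observations.
y = [4.0, 7.0, 10.0]
n = len(y)
srs_weights = [1.0 / n] * n  # w_i = 1/n for every record

mean_hat = weighted_estimate(srs_weights, y)  # equals the sample mean of y
```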
Backmatter
Metadata
Title
Sample Survey Theory
Written by
Paul Knottnerus
Copyright year
2003
Publisher
Springer New York
Electronic ISBN
978-0-387-21764-2
Print ISBN
978-1-4419-2988-4
DOI
https://doi.org/10.1007/978-0-387-21764-2