
2017 | Book

Biased Sampling, Over-identified Parameter Problems and Beyond

About this book

This book is devoted to biased sampling problems (also called choice-based sampling in econometrics parlance) and over-identified parameter estimation problems. Biased sampling problems appear in many areas of research, including medicine, epidemiology and public health, the social sciences and economics. The book addresses a range of important topics, including case-control studies, causal inference, missing data problems, meta-analysis, renewal processes and length-biased sampling problems, capture-recapture problems, case-cohort studies, exponential tilting genetic mixture models, etc.
The goal of this book is to make it easier for Ph.D. students and new researchers to get started in this research area. It will be of interest to all those who work in the health, biological, social and physical sciences, as well as to those interested in survey methodology and other areas of statistical science.

Table of contents

Frontmatter
Chapter 1. Examples and Basic Theories for Length Biased Sampling Problems
Abstract
We begin with a discussion of length-biased sampling problems and their applications in different fields. Biased sampling occurs when an investigator naturally collects samples from a population, but the sampling distribution differs from that of the target population. This happens because, under the natural sampling plan, not every unit in the population has an equal chance of being sampled.
Jing Qin
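As a rough illustration of the mechanism in the Chapter 1 abstract, the Python sketch below (assuming NumPy, with a hypothetical exponential population) draws units with probability proportional to their length. Under length-biased sampling the observed density is x f(x) / E[X], so the naive sample mean estimates E[X²] / E[X] rather than E[X].

```python
import numpy as np

def length_biased_sample(x, size, rng):
    """Draw units with probability proportional to their length x.

    The sampled values then follow the length-biased density x * f(x) / E[X].
    """
    p = x / x.sum()
    return rng.choice(x, size=size, replace=True, p=p)

rng = np.random.default_rng(0)
# Hypothetical target population: exponential lengths with mean 1.
population = rng.exponential(scale=1.0, size=200_000)
biased = length_biased_sample(population, 100_000, rng)

print(population.mean())  # close to E[X] = 1
print(biased.mean())      # close to E[X^2] / E[X] = 2, not E[X]
```

For an exponential population, the length-biased distribution is a gamma distribution with shape 2, which is why the biased mean is roughly doubled.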
Chapter 2. Brief Introduction of Renewal Process
Abstract
Length-biased sampling is a fundamental problem in the study of renewal processes. Suppose buses arrive at a stop after independent random intervals of T minutes, where T is uniformly distributed between 10 and 20. It is natural to ask how long one should expect to wait, from some random point in time t, until the next bus arrives. The next bus could arrive immediately, or one could be unlucky, with t falling just after the previous bus left, and wait as long as 20 minutes. Interestingly, this waiting time is not uniformly distributed. This is the so-called “inspection paradox”.
Jing Qin
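The bus example from the Chapter 2 abstract can be checked by simulation. The sketch below assumes NumPy and uses hypothetical simulation sizes. It inspects a renewal process with T ~ U(10, 20) at random times and compares the mean wait with the standard renewal-theory value E[T²] / (2 E[T]) = 70/9 ≈ 7.78 minutes. Because an inspection time is more likely to fall in a long interval, the naive guess E[T] / 2 = 7.5 is too small.

```python
import numpy as np

rng = np.random.default_rng(1)
n_buses = 200_000
gaps = rng.uniform(10.0, 20.0, size=n_buses)   # inter-arrival times T ~ U(10, 20)
arrivals = np.cumsum(gaps)                     # bus arrival times

# Inspect at random times, stopping early enough that a next bus always exists.
t = rng.uniform(0.0, arrivals[-2], size=100_000)
next_idx = np.searchsorted(arrivals, t, side="right")  # first arrival strictly after t
wait = arrivals[next_idx] - t

theory = (15.0**2 + 10.0**2 / 12.0) / (2 * 15.0)  # E[T^2] / (2 E[T])
print(wait.mean(), theory)
print((wait <= 10).mean())  # about 2/3, whereas a uniform wait on [0, 20] would give 1/2
```

The residual waiting time has density P(T > w) / E[T]. This density is flat on [0, 10] and then decreases linearly to 0 at 20, which gives P(W ≤ 10) = 2/3.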
Chapter 3. Heuristical Introduction of General Biased Sampling with Various Applications
Abstract
When some individuals of a population are more or less likely than others to be included in a study, sampling bias occurs. Sampling bias is a systematic error due to non-random sampling from a population. As a consequence, the sampled subjects are not representative of the target population. This problem appears naturally in evidence-based economic, epidemiological and medical research in which observational studies are conducted. We call this type of biased sampling “natural selection biased sampling problems”.
Jing Qin
Chapter 4. Brief Review of Parametric Likelihood Inferences
Abstract
Maximum likelihood estimation (MLE) under regularity conditions is covered in most statistics textbooks. In non-regular cases, however, it can run into many problems, such as solutions on the boundary of the parameter space, multiple roots, non-existence, and inconsistency in the presence of many incidental parameters.
Jing Qin
Chapter 5. Optimal Estimating Function Theory
Abstract
Classical statistical inference emphasizes unbiased estimators rather than unbiased estimating equations.
Jing Qin
Chapter 6. Projection Methods in General Semiparametric Models
Abstract
The projection method can be used not only in problems with finitely many parameters but also in problems with a nuisance function or infinitely many nuisance parameters.
Jing Qin
Chapter 7. Generalized Method of Moments
Abstract
The generalized method of moments (GMM; Hansen, Econometrica, 50, 1029–1054, 1982) is one of the most popular methods in the econometric literature. Hansen shared the 2013 Nobel Memorial Prize in Economic Sciences, in part for this groundbreaking work. Over the last thirty years, GMM has become an important unifying framework for inference in econometrics.
Jing Qin
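To make "over-identified" concrete, here is a minimal two-step GMM sketch for a linear instrumental-variable model. It assumes NumPy and hypothetical simulated data, and it is not presented as the book's formulation. With three moment conditions and two parameters, the first step uses W = (Z'Z/n)⁻¹, which is two-stage least squares. The second step re-weights with the inverse estimated moment covariance. Hansen's J statistic then tests the one surplus restriction.

```python
import numpy as np

def two_step_gmm_iv(y, X, Z):
    """Two-step GMM for the linear moment condition E[Z_i (y_i - X_i' beta)] = 0.

    Z has more columns (moment conditions) than X (parameters), so the model is
    over-identified and the choice of weight matrix W matters.
    """
    n = len(y)
    XZ = X.T @ Z

    def beta_given(W):
        # Minimizer of g(beta)' W g(beta) with g(beta) = Z'(y - X beta).
        return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

    # Step 1: W = (Z'Z / n)^{-1}, which gives two-stage least squares.
    beta1 = beta_given(np.linalg.inv(Z.T @ Z / n))
    # Step 2: W = inverse of the estimated covariance of the moment contributions.
    Zu = Z * (y - X @ beta1)[:, None]
    W2 = np.linalg.inv(Zu.T @ Zu / n)
    beta2 = beta_given(W2)
    # Hansen's J statistic: asymptotically chi-square with (moments - parameters) df.
    gbar = Z.T @ (y - X @ beta2) / n
    J = n * gbar @ W2 @ gbar
    return beta1, beta2, J

rng = np.random.default_rng(2)
n = 5_000
z = rng.normal(size=(n, 2))                       # two hypothetical instruments
v = rng.normal(size=n)
x = z @ np.array([1.0, -0.5]) + v                 # endogenous regressor
y = 1.0 + 2.0 * x + 0.5 * v + rng.normal(size=n)  # error shares v with x; true beta = (1, 2)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])              # 3 moment conditions, 2 parameters

ols = np.linalg.lstsq(X, y, rcond=None)[0]        # ignores the endogeneity of x
beta1, beta2, J = two_step_gmm_iv(y, X, Z)
print(ols, beta2, J)  # OLS slope is biased upward; GMM slope is close to 2
```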
Chapter 8. Empirical Likelihood with Applications
Abstract
The maximum likelihood method for regular parametric models has many optimality properties; as a result, it is one of the most popular methods of statistical inference. However, model misspecification is a major concern, since a misspecified model may lead to biased results.
Jing Qin
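As a small, hedged example of the empirical likelihood idea mentioned for Chapter 8, the sketch below computes Owen's empirical likelihood ratio statistic for a population mean using NumPy. It assumes the textbook single-mean formulation rather than any specific construction from the book. The weights take the form p_i = 1 / (n (1 + λ d_i)) with d_i = x_i − μ, where λ solves a one-dimensional monotone equation. Bisection is used here for robustness, which is a design choice.

```python
import numpy as np

def el_log_ratio_mean(x, mu, tol=1e-12):
    """Return -2 log R(mu), the empirical likelihood ratio statistic for a mean.

    Maximizes prod(n * p_i) subject to sum p_i = 1 and sum p_i (x_i - mu) = 0.
    The solution is p_i = 1 / (n (1 + lam * d_i)) with d_i = x_i - mu, where
    lam solves g(lam) = sum d_i / (1 + lam * d_i) = 0.
    """
    d = np.asarray(x, dtype=float) - mu
    if d.min() >= 0 or d.max() <= 0:
        return np.inf  # mu lies outside the convex hull of the data
    # All weights stay positive only for lam in (-1/max(d), -1/min(d)).
    lo, hi = -1.0 / d.max(), -1.0 / d.min()
    g = lambda lam: np.sum(d / (1.0 + lam * d))  # strictly decreasing on (lo, hi)
    a, b = lo + 1e-10 * (hi - lo), hi - 1e-10 * (hi - lo)
    for _ in range(200):
        m = 0.5 * (a + b)
        if g(m) > 0:
            a = m  # root lies to the right
        else:
            b = m
        if b - a < tol:
            break
    lam = 0.5 * (a + b)
    return 2.0 * np.sum(np.log1p(lam * d))

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=200)  # hypothetical sample with true mean 1
stat = el_log_ratio_mean(x, 1.0)
print(stat, stat < 3.841)  # 3.841 is the 95% point of chi-square with 1 df
```

No parametric model for x is assumed. Under standard conditions, the statistic is calibrated by a chi-square limit, as in parametric likelihood ratio tests.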
Chapter 9. Kullback–Leibler Likelihood and Entropy Family
Abstract
Besides empirical likelihood, the Kullback–Leibler likelihood is another popular method for calibrating auxiliary information. The entropy family has also been used extensively in information theory. We focus mainly on continuous random variables; the discrete case can be treated similarly.
Jing Qin
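One common way that Kullback–Leibler divergence is used to calibrate auxiliary information is to minimize the divergence Σ w_i log(n w_i) from uniform weights, subject to a known mean of an auxiliary variable. The minimizer is an exponential tilt, w_i ∝ exp(t x_i). This formulation is an assumption and may differ in detail from the book's KL likelihood. The sketch below assumes NumPy and a hypothetical sample and population mean.

```python
import numpy as np

def tilted_weights(x, t):
    """Exponentially tilted weights w_i proportional to exp(t * x_i), normalized to sum to 1."""
    z = t * x
    w = np.exp(z - z.max())  # subtract the max to avoid overflow
    return w / w.sum()

def kl_calibrate(x, target_mean, tol=1e-10):
    """Weights minimizing sum w_i log(n w_i) s.t. sum w_i = 1 and sum w_i x_i = target_mean.

    The minimizer is the exponential tilt w_i ∝ exp(t x_i); the tilted mean is
    increasing in t, so t is found by bracketing and bisection.
    """
    x = np.asarray(x, dtype=float)
    if not x.min() < target_mean < x.max():
        raise ValueError("target_mean must lie strictly inside the range of x")
    tilted_mean = lambda t: np.sum(tilted_weights(x, t) * x)
    lo, hi = -1.0, 1.0
    while tilted_mean(lo) > target_mean:  # widen the bracket to the left
        lo *= 2.0
    while tilted_mean(hi) < target_mean:  # widen the bracket to the right
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return t, tilted_weights(x, t)

rng = np.random.default_rng(4)
x = rng.normal(size=500)      # hypothetical auxiliary variable observed in a sample
t, w = kl_calibrate(x, 0.2)   # hypothetical known population mean of x
print(t, np.sum(w * x))       # the calibrated weighted mean matches 0.2
```

Reversing the direction of the divergence leads instead to empirical-likelihood weights of the form 1 / (n (1 + λ d_i)).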
Chapter 10. General Theory on Biased Sampling Problems
Abstract
As indicated at the beginning of this book, biased sampling is a ubiquitous problem in many disciplines.
Jing Qin
Chapter 11. General Theory for Case-Control Studies
Abstract
So far we have assumed that the weight functions in biased sampling problems are either completely known or depend on the underlying distribution. In many applications, the weight functions may depend on some unknown finite-dimensional parameters. We do not discuss the general forms of the weight functions in detail.
Jing Qin
Chapter 12. Conditioning Approach for Discrete Outcome Problems
Abstract
In this chapter we study conditional likelihood-based inference in discrete outcome problems. This method is very useful for sparse data with a large number of nuisance parameters. Moreover, it is used extensively in matched case-control studies, where some baseline covariates or survival times are matched at the data collection stage.
Jing Qin
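The simplest instance of the conditioning idea is a 1:1 matched case-control study with a binary exposure and a logistic model that has one nuisance intercept per pair. Conditioning on each pair's total exposure removes the intercepts, and only discordant pairs contribute. The conditional likelihood is (ψ/(1+ψ))^n10 (1/(1+ψ))^n01, so the conditional MLE is ψ̂ = n10/n01. The sketch below assumes NumPy and hypothetical pair data, and it adds the usual Wald interval on the log scale.

```python
import numpy as np

def matched_pair_odds_ratio(case_exposed, control_exposed):
    """Conditional MLE of the odds ratio for 1:1 matched pairs with a binary exposure.

    Conditioning on each pair's total exposure removes the pair-specific
    intercepts, so only discordant pairs carry information: OR_hat = n10 / n01.
    """
    case = np.asarray(case_exposed, dtype=bool)
    control = np.asarray(control_exposed, dtype=bool)
    n10 = np.sum(case & ~control)  # case exposed, matched control unexposed
    n01 = np.sum(~case & control)  # case unexposed, matched control exposed
    or_hat = n10 / n01             # inf (with a warning) if n01 == 0
    se_log = np.sqrt(1.0 / n10 + 1.0 / n01)  # Wald standard error of log(OR_hat)
    ci = np.exp(np.log(or_hat) + np.array([-1.96, 1.96]) * se_log)
    return or_hat, ci

# Hypothetical exposure indicators for 8 matched pairs.
case = [1, 1, 1, 0, 1, 0, 1, 1]
control = [0, 1, 0, 0, 0, 1, 1, 0]
or_hat, ci = matched_pair_odds_ratio(case, control)
print(or_hat, ci)  # n10 = 4 and n01 = 1, so OR_hat = 4
```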
Chapter 13. Discrete Data Models
Abstract
The logistic regression model has been widely used in the statistical literature for analyzing categorical data. In this chapter we present many other useful discrete data models. If the data collection process is retrospective, we end up with different biased sampling problems.
Jing Qin
Chapter 14. Gene and Environment Independence and Secondary Outcome Analysis in Case-Control Study
Abstract
Testing for gene-environment interaction in a susceptibility disease such as cancer has become a very popular topic in genetic epidemiological studies. This is mainly because the effects of relatively common polymorphisms in a wide spectrum of genes may be modified by environmental exposures.
Jing Qin
Chapter 15. Outcome Dependent Sampling and Maximum Rank Estimation
Abstract
Case-control sampling for discrete outcomes has been generalized to continuous outcomes. Outcome-dependent sampling is a sampling scheme in which the probability of selection depends on the outcome. In outcome-dependent sampling, the tails of a distribution are over-sampled to compensate for the low probability of observing them under random sampling.
Jing Qin
Chapter 16. Noncentral Hypergeometric Distribution and Poisson Binomial Distribution
Abstract
So far we have discussed discrete choice models with continuous covariates or with a mixture of continuous and discrete covariates. In applications, especially in biomedical research, a series of \(2\times 2\) tables is used extensively. Next we present a fundamental result by Kou and Ying (Stat Sin 8:809–829, 1996) on the representation of a noncentral hypergeometric distribution as the sum of independent Bernoulli trials with possibly different success probabilities. As a consequence, it is very easy to conduct statistical inference and to derive the central limit theorem for a series of \(2\times 2\) tables.
Jing Qin
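The Bernoulli-sum representation described in the Chapter 16 abstract can be checked numerically. The sketch below assumes NumPy, Fisher's noncentral hypergeometric distribution for the (1,1) cell of a 2×2 table with fixed margins, and hypothetical margins and odds ratio. It factors the probability generating polynomial, whose roots are all real and negative (the key fact behind the result). It then reads off the Bernoulli success probabilities and convolves them back into a pmf.

```python
import numpy as np
from math import comb

def fisher_nchg_pmf(m1, m2, n, psi):
    """Fisher noncentral hypergeometric pmf for the (1,1) cell of a 2x2 table.

    Row totals are m1 and m2, the first column total is n, and psi is the odds ratio.
    """
    lo, hi = max(0, n - m2), min(n, m1)
    k = np.arange(lo, hi + 1)
    c = np.array([comb(m1, j) * comb(m2, n - j) * psi**j for j in range(lo, hi + 1)])
    return k, c / c.sum()

def bernoulli_representation(m1, m2, n, psi):
    """Return lo and p_j such that X has the same law as lo + sum_j Bernoulli(p_j).

    P(z) = sum_j pmf_j z^j has only negative real roots r_j, so
    P(z) = prod_j ((1 - p_j) + p_j z) with p_j = 1 / (1 - r_j).
    """
    k, pmf = fisher_nchg_pmf(m1, m2, n, psi)
    roots = np.roots(pmf[::-1]).real  # np.roots wants the highest-degree coefficient first
    return k[0], 1.0 / (1.0 - roots)

def poisson_binomial_pmf(p):
    """pmf of a sum of independent Bernoulli(p_j) variables, by repeated convolution."""
    pmf = np.array([1.0])
    for pj in p:
        pmf = np.convolve(pmf, [1.0 - pj, pj])
    return pmf

m1, m2, n, psi = 5, 7, 6, 2.5  # hypothetical margins and odds ratio
k, target = fisher_nchg_pmf(m1, m2, n, psi)
lo, p = bernoulli_representation(m1, m2, n, psi)
print(p)                                                 # success probabilities, all in (0, 1)
print(np.max(np.abs(poisson_binomial_pmf(p) - target)))  # agreement up to rounding error
```

Once X is written as a sum of independent Bernoulli variables, its mean is lo + Σ p_j and its variance is Σ p_j (1 − p_j). This is what makes central limit arguments over many tables straightforward.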
Chapter 17. Inferences and Tests in Semiparametric Finite Mixture Models
Abstract
Mixture models have been widely used in many disciplines, including econometric, psychosocial, genetic and medical research.
Jing Qin
Chapter 18. Connections Among Marginal Likelihood, Conditional Likelihood and Empirical Likelihood
Abstract
In this chapter we present the results of Qin and Zhang (Biometrika 92:251–270, 2005) and Li and Qin (JASA 496:1476–1484, 2011) on the connections among marginal likelihood, conditional likelihood and empirical likelihood.
Jing Qin
Chapter 19. Causal Inference and Missing Data Problems
Abstract
Well-established physical theories are developed through rigorous mathematical reasoning and tightly controlled laboratory experiments. In randomized clinical trials comparing treatment and control, patients are randomly assigned to one of the two groups. As a consequence, baseline profiles are balanced between the two arms, i.e., they have the same distribution. In observational medical studies or epidemiological research, however, insights from biology and intuition may suggest possible treatment effects, while the underlying studies may lack a rigorous design, which leads to unbalanced baseline patient characteristics between groups. Similarly, in evidence-based economic studies there is no control over intervention programs, so participation may not be completely at random: some individuals may be more likely to participate than others. The fundamental problem of causal inference is that we can observe only one of the two potential outcomes for a particular subject, so it is impossible to conduct a paired t-test to assess treatment effects. On the other hand, the unpaired two-sample t-test or Wilcoxon test may produce biased estimates of treatment effects, since they fail to adjust for baseline covariates.
Jing Qin
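The contrast drawn in the Chapter 19 abstract between an unadjusted two-sample comparison and a covariate-adjusted one can be sketched as follows. The example assumes NumPy, hypothetical simulated data, and a propensity score that is known exactly. In practice the propensity score would be estimated, and the book's own estimators may differ. Treatment is more likely for subjects with larger x, which also raises y, so the naive difference in means is biased. The inverse-probability-weighted (Hajek-normalized) estimator recovers the true effect of 1.

```python
import numpy as np

def ipw_ate(y, treated, propensity):
    """Inverse-probability-weighted (Hajek-normalized) estimate of E[Y(1)] - E[Y(0)]."""
    w1 = treated / propensity
    w0 = (1 - treated) / (1 - propensity)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

rng = np.random.default_rng(5)
n = 50_000
x = rng.normal(size=n)                      # baseline covariate
e = 1.0 / (1.0 + np.exp(-x))                # true propensity score (known in this simulation)
a = rng.binomial(1, e)                      # treatment assignment depends on x
y = 1.0 * a + 2.0 * x + rng.normal(size=n)  # true average treatment effect = 1

naive = y[a == 1].mean() - y[a == 0].mean()  # unadjusted two-sample difference
print(naive, ipw_ate(y, a, e))               # naive is biased upward; IPW is close to 1
```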
Chapter 20. Inference in Finite Populations
Abstract
Most statistical methods developed for missing data and biased sampling problems, such as imputation, inverse probability weighting and regression calibration, originated in survey sampling. In this chapter, we briefly review some important survey sampling problems.
Jing Qin
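The link between survey sampling and inverse probability weighting mentioned in the Chapter 20 abstract can be seen in the Horvitz-Thompson estimator of a population total, Σ y_i / π_i over sampled units. The sketch below assumes NumPy, a hypothetical finite population, and Poisson sampling with inclusion probabilities proportional to a known size variable. It shows that ignoring the unequal π_i overstates the total, while Horvitz-Thompson weighting does not.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 10_000
size = rng.gamma(shape=2.0, scale=50.0, size=N)  # auxiliary size, known for every unit
y = 3.0 * size + rng.normal(scale=20.0, size=N)  # hypothetical study variable
pi = np.minimum(1.0, 1000 * size / size.sum())   # inclusion probabilities ∝ size (expected n ≈ 1000)
sampled = rng.random(N) < pi                     # Poisson sampling: independent Bernoulli(pi_i) draws

ht_total = np.sum(y[sampled] / pi[sampled])      # Horvitz-Thompson estimator of sum(y)
naive_total = N * y[sampled].mean()              # treats the sample as simple random
print(y.sum(), ht_total, naive_total)
```

Under Poisson sampling the realized sample size is random. A fixed-size design or a ratio (Hajek) version would typically reduce the variance here.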
Chapter 21. Inference for Density Ratio Model with Continuous Covariates
Abstract
We have discussed different density ratio models for two-sample and multiple-sample problems in Chaps. 10, 11, 17 and 18. A natural generalization is a density ratio model with continuous covariates. In this chapter we discuss two approaches: (1) the pairwise conditional likelihood method, which eliminates the baseline “carrier density” or nuisance parameters, and (2) the profile maximum likelihood method. We also consider conditional independence tests in general semiparametric models and in partially specified exponential graphical models with high-dimensional parameters.
Jing Qin
Chapter 22. Non-ignorable Missing Data Problems
Abstract
Biased sampling and non-ignorable missing data are the most difficult missing data problems. In contrast to missing at random, where the missingness probability and the underlying response model can be factored out separately in the likelihood function, in a non-ignorable missing data problem they cannot be separated and must be handled simultaneously. Furthermore, the underlying model may not be identifiable even in a fully parametric setup. We first discuss the fully parametric case.
Jing Qin
Chapter 23. Maximum Likelihood Estimation in Capture-Recapture Models
Abstract
In many fields, such as biology, ecology, demography, epidemiology and reliability studies, it is important to know the abundance of a species, the size of a closed population (Borchers et al., Estimating Animal Abundance: Closed Populations, 2002), or the number of failed devices in a system. Mark-recapture (also called capture-recapture) experiments are widely used to collect the necessary data. In such experiments, individuals or animals from a population of interest are captured, marked and then released. Later, after the captured and uncaptured subjects have mixed, another sample is taken from the population. Mark-recapture experiments are used extensively when it is not practical to count all individuals in the population.
Jing Qin
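The classical two-occasion Lincoln-Petersen estimator gives a point of reference for the closed-population setting in the Chapter 23 abstract. It is essentially the MLE of N under the simplest two-sample model: if n1 animals are marked and m2 of the n2 animals in the second sample carry marks, then N̂ = n1 n2 / m2. The sketch below assumes NumPy and a hypothetical simulated population. It also includes Chapman's modification, which reduces small-sample bias and remains finite when m2 = 0.

```python
import numpy as np

def lincoln_petersen(n1, n2, m2):
    """Two-occasion estimate of a closed population size, N_hat = n1 * n2 / m2."""
    return n1 * n2 / m2

def chapman(n1, n2, m2):
    """Chapman's modification, less biased in small samples and finite when m2 = 0."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

rng = np.random.default_rng(7)
N = 2_000                         # hypothetical true population size
first = rng.random(N) < 0.10      # captured and marked on occasion 1
second = rng.random(N) < 0.15     # captured on occasion 2, independently of occasion 1
n1, n2, m2 = first.sum(), second.sum(), (first & second).sum()
print(n1, n2, m2, lincoln_petersen(n1, n2, m2), chapman(n1, n2, m2))
```

Both estimators rely on the two capture occasions being independent and on a closed population. Relaxing these assumptions, for example to allow heterogeneous capture probabilities, is what more general capture-recapture models address.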
Chapter 24. A Review of Survival Analysis
Abstract
Survival analysis has been developed extensively over the past 50 years. The fundamental challenge lies in the incomplete observation of failure times due to censoring. Another layer of complication, which arises frequently in epidemiological studies and econometric research, is the selection bias induced by left truncation in addition to right censoring.
Jing Qin
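Left truncation, mentioned in the Chapter 24 abstract, changes only the risk sets in the product-limit estimator: a subject who enters the study late is not at risk before entry. The sketch below assumes NumPy and tiny hypothetical data, and it uses the common convention that subject i is at risk at time t when entry_i < t ≤ exit_i. With all entry times equal to 0, it reduces to the Kaplan-Meier estimator.

```python
import numpy as np

def product_limit(entry, exit_time, event):
    """Product-limit survival estimate for left-truncated, right-censored data.

    Subject i is counted in the risk set at time t when entry_i < t <= exit_i.
    With every entry time equal to 0 this reduces to the Kaplan-Meier estimator.
    """
    entry = np.asarray(entry, dtype=float)
    exit_time = np.asarray(exit_time, dtype=float)
    event = np.asarray(event, dtype=int)
    times = np.unique(exit_time[event == 1])  # distinct observed failure times
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum((entry < t) & (exit_time >= t))
        deaths = np.sum((exit_time == t) & (event == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return times, np.array(surv)

exit_time = [1, 2, 2, 3, 4]
event = [1, 1, 0, 1, 0]  # 1 = failure observed, 0 = right-censored
_, s_km = product_limit([0, 0, 0, 0, 0], exit_time, event)
_, s_lt = product_limit([0, 0, 1.5, 2.5, 0], exit_time, event)
print(s_km)  # values 0.8, 0.6, 0.3
print(s_lt)  # values 2/3, 4/9, 2/9: late entrants are not at risk before they enter
```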
Chapter 25. Length Biased Sampling, Multiplicative Censoring and Survival Analysis
Abstract
With the preparation of the previous chapter on survival analysis, we are ready to move to the latest developments in survival analysis based on length-biased and right-censored data.
Jing Qin
Chapter 26. Applications of the Pool Adjacent Violation Algorithm (PAVA) in Statistical Inferences
Abstract
Isotonic regression solves many order-restricted maximum likelihood estimation problems. This method, especially in combination with the celebrated EM algorithm, is a powerful tool for tackling many important and difficult statistical problems. In this chapter we give a few examples to illustrate it. In general, theoretical results for shape-restricted inference are difficult to obtain.
Jing Qin
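The pool adjacent violators algorithm named in the Chapter 26 title can be written in a few lines. The sketch below assumes NumPy and a hypothetical dose-response example, and it is not necessarily the book's implementation. It fits a weighted non-decreasing least-squares sequence by repeatedly merging adjacent blocks that violate the ordering. For binomial proportions weighted by their sample sizes, this fit is also the order-restricted MLE of the response probabilities.

```python
import numpy as np

def pava(y, w=None):
    """Weighted least-squares fit of y under a non-decreasing constraint (pool adjacent violators)."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge the last two blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2])
    return np.concatenate([np.full(c, m) for m, _, c in blocks])

# Hypothetical dose-response data: events / subjects at increasing doses.
events = np.array([1, 4, 3, 9])
subjects = np.array([10, 20, 20, 30])
fit = pava(events / subjects, w=subjects)
print(fit)  # the 0.20, 0.15 violation is pooled to 0.175
```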
Backmatter
Metadata
Title
Biased Sampling, Over-identified Parameter Problems and Beyond
Authored by
Jing Qin
Copyright year
2017
Publisher
Springer Singapore
Electronic ISBN
978-981-10-4856-2
Print ISBN
978-981-10-4854-8
DOI
https://doi.org/10.1007/978-981-10-4856-2