The Statistical Analysis of Discrete Data

Authors: Thomas J. Santner, Diane E. Duffy

Publisher: Springer New York

Book series: Springer Texts in Statistics

Included in: Professional Book Archive

About this book

The Statistical Analysis of Discrete Data provides an introduction to current statistical methods for analyzing discrete response data. The book can be used as a course text for graduate students and as a reference for researchers who analyze discrete data. The book's mathematical prerequisites are linear algebra and elementary advanced calculus. It assumes a basic statistics course which includes some decision theory, and knowledge of classical linear model theory for continuous response data. Problems are provided at the end of each chapter to give the reader an opportunity to apply the methods in the text, to explore extensions of the material covered, and to analyze data with discrete responses. In the text examples, and in the problems, we have sought to include interesting data sets from a wide variety of fields including political science, medicine, nuclear engineering, sociology, ecology, cancer research, library science, and biology. Although there are several texts available on discrete data analysis, we felt there was a need for a book which incorporated some of the myriad recent research advances. Our motivation was to introduce the subject by emphasizing its ties to the well-known theories of linear models, experimental design, and regression diagnostics, as well as to describe alternative methodologies (Bayesian, smoothing, etc.); the latter are based on the premise that external information is available. These overriding goals, together with our own experiences and biases, have governed our choice of topics.

Table of contents

Frontmatter

1. Introduction

Abstract

Statistical problems can be classified according to the types of variables observed. Two different criteria for distinguishing variables are important in this book. First, it is convenient to differentiate between (i) responses and (ii) explanatory variables (which affect the responses). In a given problem how one makes the distinction depends on the study design and the scientific goals of the investigation. Second, variables can be distinguished according to their scale of measurement. Four measurement scales are described below.

Thomas J. Santner, Diane E. Duffy

2. Univariate Discrete Responses

Abstract

Perhaps the simplest discrete data problem involves single-sample binary responses. This section considers point and interval estimation for such data. The techniques are described in some detail since they are most easily understood in this simple setting and since analogs of these methods have been developed for many of the more complicated discrete data problems discussed in later sections.
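
The single-sample binary setting described above can be illustrated with a minimal sketch. It computes the sample proportion together with two standard large-sample intervals, the Wald interval and the Wilson (score) interval. These are common textbook choices and are assumptions here, not necessarily the procedures the chapter develops; the function name `proportion_intervals`, its parameters, and the counts in the usage lines are hypothetical.

```python
import math


def proportion_intervals(successes, trials, z=1.96):
    """Point estimate and two approximate intervals for a binomial proportion.

    z is the standard normal quantile (1.96 gives roughly 95% coverage).
    Returns (p_hat, wald_interval, wilson_interval).
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")

    # Point estimate: the sample proportion (the binomial MLE).
    p_hat = successes / trials

    # Wald interval: p_hat +/- z * estimated standard error, clipped to [0, 1].
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / trials)
    wald = (max(0.0, p_hat - half), min(1.0, p_hat + half))

    # Wilson (score) interval: obtained by inverting the score test.
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2.0 * trials)) / denom
    half_w = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    )
    wilson = (centre - half_w, centre + half_w)

    return p_hat, wald, wilson


# Hypothetical data: 7 successes in 20 trials.
p_hat, wald, wilson = proportion_intervals(7, 20)
print(p_hat, wald, wilson)
```

With zero observed successes the Wald interval collapses to a single point, whereas the Wilson interval keeps a positive width, which is one reason more than one interval method is usually compared in this setting.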

Thomas J. Santner, Diane E. Duffy

3. Loglinear Models

Abstract

The first three sections of this chapter present the theory of maximum likelihood estimation of a vector of means which satisfy a loglinear model under Poisson, multinomial, and product multinomial sampling. Example 1.2.10 (considered in Problem 3.6), Problem 3.3, and Problem 3.4 illustrate loglinear modeling under Poisson sampling for data on valve failures in nuclear plants, breakdowns in electronic equipment, and absences of school children, respectively. Further applications are deferred to Chapter 4 where cross-classified (multinomial) data are studied and to Chapter 5 where binary regression (product multinomial) data are considered. Alternative methods of estimation and non-loglinear models are discussed in Section 3.4.
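
For orientation, the Poisson sampling case can be written out in a few lines. The notation is an assumption made for this sketch and may differ from the book's: $y_1, \ldots, y_n$ are the observed counts, $\mu_i$ their means, $x_i'$ the $i$th row of a known design matrix $X$, and $\beta$ the parameter vector. The model takes the counts to be independent with

$$Y_i \sim \text{Poisson}(\mu_i), \qquad \log \mu_i = x_i'\beta, \qquad i = 1, \ldots, n,$$

so that the log-likelihood is

$$\ell(\beta) = \sum_{i=1}^{n} \left\{ y_i\, x_i'\beta - \exp(x_i'\beta) - \log y_i! \right\}.$$

Setting the gradient to zero gives the likelihood equations

$$X'\left(y - \hat{\mu}\right) = 0, \qquad \hat{\mu}_i = \exp(x_i'\hat{\beta}),$$

that is, the fitted means reproduce the observed values of the linear combinations $X'y$.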

Thomas J. Santner, Diane E. Duffy

4. Cross-Classified Data

Abstract

This chapter describes the use of classical likelihood methods and loglinear models to analyze cross-classified data. Cross-classified data arise when a random sample $W^1, W^2, \ldots, W^m$, say, is drawn from a discrete $d$-variate distribution where each trial $W^k = (W_1^k, \ldots, W_d^k)'$ has common joint probability mass function

$$p_i := P[W_1^k = i_1, \ldots, W_d^k = i_d]. \tag{4.1.1}$$

Here the support of $W_j^k$ is taken to be $\{1, \ldots, L_j\}$ without loss of generality. The symbol $W$, without a superscript, will be used to denote a generic classification variable with probability mass function (4.1.1). By sufficiency, the data can be summarized as the counts $\{Y_i : i \in X\}$ in a $d$-dimensional contingency table where $Y_i$ is the number of vectors $W$ which equal $i$. Thus the counts $\{Y_i : i \in X\}$ have the $M_t(m, p)$ multinomial distribution where $p = \{p_i : i \in X\}$, $\sum_{i \in X} p_i = 1$, $t = \prod_{j=1}^{d} L_j$, and $m = \sum_{i \in X} Y_i$.
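
The reduction from the raw sample to the table of counts can be sketched as follows. This is an illustration under stated assumptions rather than anything taken from the book: the function name `tabulate_counts`, its arguments, and the four-observation sample are hypothetical, and each observation is assumed to be a tuple whose $j$th component lies in $\{1, \ldots, L_j\}$.

```python
from collections import Counter
from itertools import product


def tabulate_counts(sample, levels):
    """Summarize a sample of d-variate classifications as cell counts.

    sample: iterable of length-d tuples; component j must lie in 1..levels[j].
    levels: sequence (L_1, ..., L_d) giving the number of categories per variable.
    Returns a dict mapping every cell i = (i_1, ..., i_d) to its count Y_i,
    including cells that were never observed (count 0).
    """
    d = len(levels)
    counts = Counter()
    for w in sample:
        w = tuple(w)
        if len(w) != d or any(not 1 <= w[j] <= levels[j] for j in range(d)):
            raise ValueError(f"observation {w} is outside the assumed support")
        counts[w] += 1

    # Enumerate all t = L_1 * ... * L_d cells so that empty cells appear too.
    cells = product(*(range(1, L + 1) for L in levels))
    return {cell: counts[cell] for cell in cells}


# Hypothetical sample of m = 4 trials on d = 2 variables with L_1 = L_2 = 2.
sample = [(1, 1), (1, 2), (1, 1), (2, 2)]
table = tabulate_counts(sample, levels=(2, 2))
print(table)
```

The returned dictionary has one entry per cell and its values sum to the sample size, matching the roles of $t$ and $m$ in the multinomial description above.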

Thomas J. Santner, Diane E. Duffy

5. Univariate Discrete Data with Covariates

Abstract

Polychotomous response regression data have the form $(Y_i, m_i, x_i)$, $1 \le i \le T$, where, without loss of generality, each response takes one of the values $\{0, \ldots, g\}$. Each vector $Y_i = (Y_{i0}, \ldots, Y_{ig})'$, giving the number of times the $(g + 1)$ outcomes occur at the $i$th design point, is assumed to follow an independent multinomial distribution with $m_i$ trials $\left( \sum\nolimits_{j = 0}^{g} Y_{ij} = m_i \right)$. The vector $x_i = (x_{i1}, \ldots, x_{ik})'$ contains covariate values affecting the cell probabilities of $Y_i$. Example 1.2.7 on the severity of nausea in cancer patients undergoing chemotherapy is typical of such data. There are six possible outcomes ($g = 5$), two design points ($T = 2$), a scalar covariate ($k = 1$) which is 1 or 0 according as the chemotherapy includes cisplatinum or not, and the numbers of patients are $m_1 = 58$ and $m_2 = 161$ for the cisplatinum and no-cisplatinum groups, respectively.

Thomas J. Santner, Diane E. Duffy

Backmatter

Title: The Statistical Analysis of Discrete Data
Authors: Thomas J. Santner, Diane E. Duffy
Publisher: Springer New York
Electronic ISBN: 978-1-4612-1017-7
Print ISBN: 978-1-4612-6986-1
DOI: https://doi.org/10.1007/978-1-4612-1017-7
