About this book

The statistical analysis of discrete multivariate data has received a great deal of attention in the statistics literature over the past two decades. The development of appropriate models is the common theme of books such as Cox (1970), Haberman (1974, 1978, 1979), Bishop et al. (1975), Gokhale and Kullback (1978), Upton (1978), Fienberg (1980), Plackett (1981), Agresti (1984), Goodman (1984), and Freeman (1987). The objective of our book differs from those listed above. Rather than concentrating on model building, our intention is to describe and assess the goodness-of-fit statistics used in the model verification part of the inference process. Those books that emphasize model development tend to assume that the model can be tested with one of the traditional goodness-of-fit tests (e.g., Pearson's X² or the loglikelihood ratio G²) using a chi-squared critical value. However, it is well known that this can give a poor approximation in many circumstances. This book provides the reader with a unified analysis of the traditional goodness-of-fit tests, describing their behavior and relative merits as well as introducing some new test statistics. The power-divergence family of statistics (Cressie and Read, 1984) is used to link the traditional test statistics through a single real-valued parameter, and provides a way to consolidate and extend the current fragmented literature. As a by-product of our analysis, a new statistic emerges "between" Pearson's X² and the loglikelihood ratio G² that has some valuable properties.
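
As a rough illustration of how a single real-valued parameter can link the traditional statistics, the sketch below computes the power-divergence statistic in its usual published form, 2/(λ(λ+1)) · Σ Oᵢ[(Oᵢ/Eᵢ)^λ − 1], for observed counts Oᵢ and expected counts Eᵢ. The function name, the argument names, and the example counts are assumptions made for illustration and do not come from the book description above. With λ = 1 the formula reduces to Pearson's X² (when observed and expected totals agree), and the limit λ → 0 gives the loglikelihood ratio G². The value λ = 2/3 is the one usually associated with the Cressie-Read statistic; this page does not state it.

```python
import math


def power_divergence(observed, expected, lam):
    """Power-divergence statistic for observed vs. expected cell counts.

    Hypothetical helper for illustration only. Assumes both sequences have
    the same length, the same total, and strictly positive expected counts.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if abs(lam + 1.0) < 1e-12:
        # The closed form divides by zero at lam = -1; that limit is not handled here.
        raise ValueError("lam = -1 is not supported in this sketch")
    if abs(lam) < 1e-12:
        # Limit lam -> 0: loglikelihood ratio G^2 (empty cells contribute 0).
        return 2.0 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        )
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))
    return 2.0 * total / (lam * (lam + 1.0))


# Made-up counts for three cells with equal expected frequencies.
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]

pearson_x2 = power_divergence(observed, expected, 1.0)        # lam = 1
loglik_g2 = power_divergence(observed, expected, 0.0)         # lam -> 0
cressie_read = power_divergence(observed, expected, 2.0 / 3)  # lam = 2/3
```

For these counts the λ = 1 value agrees with the textbook Pearson sum Σ (Oᵢ − Eᵢ)²/Eᵢ, which is one quick way to sanity-check the implementation.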

Table of Contents

Frontmatter

Chapter 1. Introduction to the Power-Divergence Statistic

Abstract
The definition and testing of models for discrete multivariate data has been the subject of much statistical research over the past twenty years. The widespread tendency to group data and to report group frequencies has led to many diverse applications throughout the sciences: for example, the reporting of survey responses (always, sometimes, never); the accumulation of patient treatment-response records (mild, moderate, severe, remission); the reporting of warranty failures (mechanical, electrical, no trouble found); and the collection of tolerances on a machined part (within specification, out of specification).
Timothy R. C. Read, Noel A. C. Cressie

Chapter 2. Defining and Testing Models: Concepts and Examples

Abstract
This chapter introduces the general notation and framework for modeling and testing discrete multivariate data. Through a series of examples we introduce the concept of a null model for the parameters of the sampling distribution in Section 2.1, and motivate the use of goodness-of-fit test statistics to check the null model in Section 2.2. A detailed example is given in Section 2.3, which illustrates how the magnitudes of the cell frequencies cause the traditional goodness-of-fit statistics to behave differently. An explanation of these differences is provided by studying the power-divergence statistic (introduced in Section 2.4) and is developed throughout this book. Section 2.5 considers an application from the area of visual perception.
Timothy R. C. Read, Noel A. C. Cressie

Chapter 3. Modeling Cross-Classified Categorical Data

Abstract
In this chapter we introduce some of the basic terminology and concepts for fitting loglinear models to cross-classified categorical data. We discuss briefly those aspects of model fitting relevant to the development of further chapters, and we show how the power-divergence statistic adds a new dimension to categorical data analysis. In addition, Section 3.4 describes methods of estimating unknown model parameters from the perspective of the power-divergence statistic. In Section 3.5 we discuss the minimum discrimination information approach to characterizing the loglinear model, and illustrate how the power-divergence statistic provides a generalization that characterizes other models (including the linear model). The chapter concludes with a discussion in Section 3.6 of strategies for choosing an appropriate loglinear model.
Timothy R. C. Read, Noel A. C. Cressie

Chapter 4. Testing the Models: Large-Sample Results

Abstract
Chapter 3 presented the terminology and concepts for loglinear model fitting. In the present chapter, the emphasis changes from defining models to testing the model fit using the power-divergence goodness-of-fit statistic.
Timothy R. C. Read, Noel A. C. Cressie

Chapter 5. Improving the Accuracy of Tests with Small Sample Size

Abstract
The distributional properties of the power-divergence family discussed in Chapters 3 and 4 rely on large sample sizes for their validity (i.e., they are asymptotic results). So far, we have not discussed the relevance of these properties when the sample size is small.
Timothy R. C. Read, Noel A. C. Cressie

Chapter 6. Comparing the Sensitivity of the Test Statistics

Abstract
Chapters 4 and 5 presented some theoretical and computational comparisons of the power-divergence family members. This chapter provides some further practical understanding of these results, by analyzing the effects of individual cell frequencies on the power-divergence statistic.
Timothy R. C. Read, Noel A. C. Cressie

Chapter 7. Links with Other Test Statistics and Measures of Divergence

Abstract
In the preceding chapters, we have concentrated on test statistics that measure the degree of divergence between the observed frequencies for a group of cells and the corresponding expected frequencies based on some null model. Throughout we have assumed that we are dealing with discrete counts. If the variable under observation has a continuous (in general, multivariate) distribution, we assume that the outcomes have been grouped into mutually exclusive intervals (in general, compact regions) or cells, whose union contains the support (i.e., the region where the density is positive) of the random variable. We can then use the frequencies with which observations from a sample fall in these cells to perform a goodness-of-fit test (as described in the previous chapters). Such grouping of continuous data necessarily results in some loss of information, and we may wish to use an alternative test statistic that is more efficient in these circumstances. The first two sections of this chapter are devoted to discussing test statistics that are designed for continuous data. In Section 7.3 we draw some comparisons between these statistics and the power-divergence statistic based on the grouping described earlier.
Timothy R. C. Read, Noel A. C. Cressie

Chapter 8. Future Directions

Abstract
The development of the power-divergence test statistic in the previous chapters suggests interesting and important avenues for further research. In this chapter we outline some specific topics and suggest possible future directions this research might take.
Timothy R. C. Read, Noel A. C. Cressie

Historical Perspective: Pearson’s X² and the Loglikelihood Ratio Statistic G²

Abstract
In Chapter 8 we looked towards future directions for research in goodness-of-fit statistics for discrete multivariate data. We now provide some historical perspective on the two “old warriors,” X² and G², which supports our conclusions for the power-divergence family in Chapters 3–7.
Timothy R. C. Read, Noel A. C. Cressie

Backmatter
