
2010 | Book

Comparing Distributions


About this Book

Comparing Distributions refers to the area of statistical data analysis that encompasses traditional goodness-of-fit testing. Whereas the latter includes only formal statistical hypothesis tests for the one-sample and the K-sample problems, this book presents a more general and informative treatment by also considering graphical and estimation methods. A procedure is said to be informative when it provides information on the reason for rejecting the null hypothesis. Despite the historically distinct development of the methods, this book emphasises their similarities by linking them to a common theoretical backbone.

This book consists of two parts. The first part discusses statistical methods for the one-sample problem; the second part treats the K-sample problem. Many sections of the second part may be of interest to any statistician involved in comparative studies.

The book gives a self-contained theoretical treatment of a wide range of goodness-of-fit methods, including graphical methods, hypothesis tests, model selection and density estimation. It relies on parametric, semiparametric and nonparametric theory, which is kept at an intermediate level; the intuition and heuristics behind the methods are usually provided as well. The book contains many data examples that are analysed with the cd R package written by the author. All examples include the R code.

Because many methods described in this book belong to the basic toolbox of almost every statistician, the book should be of interest to a wide audience. In particular, the book may be useful for researchers, graduate students and PhD students who need a starting point for doing research in the area of goodness-of-fit testing. Practitioners and applied statisticians may also be interested because of the many examples, the R-code and the stress on the informative nature of the procedures.

Table of Contents

Frontmatter

One-Sample Problems

Frontmatter
Chapter 1. Introduction
Abstract
In this introductory chapter we start with a brief historical note on the one-sample problem (Section 1.1). A first step in a data analysis is often the graphical exploration of the data. In Section 1.2 we give some graphical techniques which may be very useful in assessing the goodness-of-fit. This section also introduces most of the example datasets that are used to illustrate methods in the remainder of the first part of the book. One of the earliest goodness-of-fit tests is the Pearson chi-squared test. Although it is definitely not the best choice in many situations, it is still often applied, and it often serves as a cornerstone in the construction of other goodness-of-fit tests. We give an overview of the most important issues in applying the Pearson test in Section 1.3. Moreover, many of the more recent methods still rely on the intuition behind this test.
Olivier Thas
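As a small illustration of the Pearson chi-squared test mentioned in this abstract (a sketch in Python with hypothetical dice-roll data, not taken from the book, whose examples use R):

```python
import numpy as np
from scipy import stats

# Hypothetical data: 60 die rolls; H0 is that the die is fair.
observed = np.array([8, 12, 9, 11, 10, 10])
expected = np.full(6, observed.sum() / 6)   # 10 expected counts per face under H0

# Pearson statistic: sum((O - E)^2 / E), compared to a chi-squared(5) null.
res = stats.chisquare(observed, f_exp=expected)
```

Here the statistic is (4 + 4 + 1 + 1 + 0 + 0) / 10 = 1.0, which is small relative to a chi-squared distribution with 5 degrees of freedom, so the fair-die hypothesis is not rejected.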
Chapter 2. Preliminaries (Building Blocks)
Abstract
This chapter provides an introduction to some methods and concepts on which many of the goodness-of-fit methods are based. For instance, the empirical distribution function (EDF) plays a central role in many GOF techniques. Instead of introducing and discussing the EDF in the section where it is used for the first time, we have chosen to isolate it and put it into this chapter. When a method described in a later chapter relies heavily on the EDF, the reader is referred to this introductory chapter. Other concepts treated in this way are empirical processes, comparison distributions, Hilbert spaces, parameter estimation, and nonparametric density estimation. Some of the topics are quite technical, but we have tried to focus on the rationale and intuition behind them, rather than providing all the technical details.
Olivier Thas
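The EDF that this chapter introduces is simple to state: F_n(t) is the proportion of observations less than or equal to t. A minimal Python sketch (an illustration, not the book's R code):

```python
import numpy as np

def edf(sample):
    """Return the empirical distribution function of a sample as a callable."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = len(xs)

    def F_n(t):
        # Proportion of observations <= t.
        return np.searchsorted(xs, t, side="right") / n

    return F_n
```

For example, with the sample {1, 2, 3, 4}, F_n(2.5) = 0.5 and F_n(4) = 1.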
Chapter 3. Graphical Tools
Abstract
A graphical presentation of the data is typically one of the first steps in an exploratory data analysis. This is no different in the goodness-of-fit context. Although many of the graphs presented in this chapter are well known to most statisticians, we think it is still important to give some further details on those methods, particularly because some of the goodness-of-fit tests are very closely related to some of the graphs presented here. We start in Section 3.1 with the description of the histogram and the boxplot, of which the former is basically a nonparametric density estimator. Probability plots (PP and QQ) and comparison distributions are the topics of Sections 3.2 and 3.3, respectively. Both types of plots are related to very important goodness-of-fit tests, and we therefore devote considerable space to these methods.
Olivier Thas
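The coordinates behind a normal QQ plot of the kind discussed in this chapter can be computed directly: sorted sample values are plotted against standard-normal quantiles at plotting positions (i - 0.5)/n. A sketch in Python (an illustration under those common plotting positions; the book's own conventions may differ):

```python
import numpy as np
from scipy import stats

def normal_qq_points(sample):
    """Coordinates of a normal QQ plot: theoretical vs. sample quantiles."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = len(xs)
    pp = (np.arange(1, n + 1) - 0.5) / n    # plotting positions
    theo = stats.norm.ppf(pp)               # standard-normal quantiles
    return theo, xs
```

If the sample is approximately normal, the points (theo, xs) fall near a straight line.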
Chapter 4. Smooth Tests
Abstract
In this chapter we discuss smooth tests. This class of tests dates back to Neyman (1937), who developed a smooth test as a score test for which he proved some optimality properties. Although smooth tests are considered nonparametric tests, they are actually constructed by first considering a k-dimensional smooth family of alternatives in which the hypothesised distribution is embedded. These smooth alternatives are the subject of Section 4.1. The tests are given in Section 4.2. The power of the test depends on how well the true distribution is approximated by the k-dimensional smooth alternative. In particular, for each data-generating distribution there exists an optimal order k. In Section 4.3 we discuss adaptive smooth tests, of which the order is estimated from the data so that often the power is improved. Sections 4.1 through 4.3 are limited to continuous distributions; smooth tests for discrete distributions are the topic of Section 4.4, and in Section 4.5 we show how smooth tests may be viewed from within a semiparametric framework. Finally, in Section 4.7 we give a brief summary from a practical viewpoint.
Olivier Thas
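A minimal sketch of a Neyman-type smooth test for a fully specified continuous null distribution, using orthonormal Legendre polynomials on the probability-integral-transformed values u = F0(x) and the asymptotic chi-squared(k) null. This is an illustration of the general construction, not the book's implementation, and it ignores the composite-null and adaptive-order refinements the chapter covers:

```python
import numpy as np
from numpy.polynomial import legendre
from scipy.stats import chi2, norm

def neyman_smooth_stat(x, cdf, k=4):
    """Order-k Neyman smooth test statistic for H0: X ~ F0 (fully specified)."""
    u = np.asarray(cdf(x))          # probability integral transform
    n = len(u)
    t = 2.0 * u - 1.0               # map [0, 1] to [-1, 1] for Legendre polynomials
    stat = 0.0
    for j in range(1, k + 1):
        coeffs = np.zeros(j + 1)
        coeffs[j] = 1.0
        pj = legendre.legval(t, coeffs)       # Legendre polynomial P_j
        hj = np.sqrt(2 * j + 1) * pj          # orthonormal on [0, 1]
        stat += hj.mean() ** 2                # squared j-th component
    stat *= n
    pval = chi2.sf(stat, df=k)                # asymptotic chi-squared(k) null
    return stat, pval

rng = np.random.default_rng(0)
stat, pval = neyman_smooth_stat(rng.normal(size=100), norm.cdf, k=4)
```

Each component picks up a different kind of departure from the null (location, scale, skewness, kurtosis for j = 1, ..., 4), which is what makes the test informative on rejection.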
Chapter 5. Methods Based on the Empirical Distribution Function
Abstract
In this chapter a very wide class of statistical tests based on the empirical distribution function (EDF) is introduced. Among these tests we find some old tests, such as the Kolmogorov-Smirnov test, but new tests have continued to be added to this class in recent years. A discussion of the EDF and empirical processes was given in Sections 2.1 and 2.2. Sections 5.1 and 5.2 are devoted to the Kolmogorov-Smirnov and the Cramér-von Mises type tests, respectively. In Section 5.3 we generalise the class of EDF tests so that more recent tests based on the empirical quantile function or the empirical characteristic function also fit into the framework. We show that many of these tests are closely related to the class of smooth tests. Practical guidelines are provided in Section 5.6.
Olivier Thas
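The two classical EDF tests named in this abstract are readily illustrated (a Python sketch with simulated data, not the book's R examples):

```python
import numpy as np
from scipy import stats

# Hypothetical data; H0: the sample comes from a standard normal distribution.
rng = np.random.default_rng(42)
x = rng.normal(size=200)

ks = stats.kstest(x, "norm")            # Kolmogorov-Smirnov: sup |F_n - F_0|
cvm = stats.cramervonmises(x, "norm")   # Cramér-von Mises: integrated squared distance
```

The Kolmogorov-Smirnov statistic measures the largest vertical gap between the EDF and the hypothesised CDF, whereas the Cramér-von Mises statistic accumulates the squared gap over the whole support.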

Two-Sample and K-Sample Problems

Frontmatter
Chapter 6. Introduction
Abstract
In this second part of the book we discuss statistical methods for the two-sample and the K-sample problems. Whereas in the one-sample problem the objective is to compare the distribution of the sample observations with a hypothesised distribution, we are now concerned with comparing the distributions of two or more populations from which we have observations at our disposal. As both classes of problems are about comparing distributions, many of the methods developed for the former can be easily adapted to the latter. We indeed show that many test names reappear (e.g., the Kolmogorov-Smirnov and the Anderson-Darling tests). It further implies that many of the building blocks of Chapter 2 are useful again.
Olivier Thas
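The two-sample analogue of the Kolmogorov-Smirnov test mentioned here compares the EDFs of the two samples directly. A Python sketch with hypothetical shifted samples (an illustration, not from the book):

```python
import numpy as np
from scipy import stats

# Hypothetical two-sample setting: the second sample is shifted by one unit.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=100)
y = rng.normal(1.0, 1.0, size=100)

# Two-sample Kolmogorov-Smirnov: largest gap between the two EDFs.
res = stats.ks_2samp(x, y)
```

No hypothesised distribution appears anywhere: only the two samples are compared, which is exactly the shift in perspective this part of the book describes.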
Chapter 7. Preliminaries (Building Blocks)
Abstract
In Section 6.1.1 we argued that the p-values of two-sample tests can no longer be based on the parametric bootstrap method, because this technique presumes that the null hypothesis specifies some parameterised parametric distribution. On the other hand, most two-sample tests can be based on an asymptotic null distribution that can be derived from a central limit theorem, or from the application of the continuous mapping theorem and the weak convergence of the empirical process. Although the general two-sample null hypothesis is less parametric than the one-sample null hypothesis, we show here that this null hypothesis even allows us to obtain an exact null distribution. This means that the p-values computed from this null distribution are correct, even for very small sample sizes. Exact null distributions are often enumerated using permutations of observations. In this case we use the term permutation null distribution, and tests based on it are referred to as permutation tests.
Olivier Thas
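The permutation idea described above can be sketched in a few lines: pool the observations, repeatedly re-randomise the group labels, and see how extreme the observed statistic is among the permuted ones. A Monte Carlo approximation in Python (an illustration of the principle, with a hypothetical difference-in-means statistic; the book develops the exact, enumerated version):

```python
import numpy as np

def permutation_pvalue(x, y, stat=lambda a, b: abs(a.mean() - b.mean()),
                       n_perm=2000, seed=0):
    """Two-sample permutation test: re-randomise group labels and compare."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    observed = stat(x, y)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if stat(perm[:len(x)], perm[len(x):]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)   # add-one correction keeps p > 0

p_same = permutation_pvalue([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
p_shift = permutation_pvalue([0.0] * 10, [5.0] * 10)
```

Identical groups give the largest possible p-value, while clearly separated groups give a very small one; the validity of the p-value requires no distributional assumption beyond exchangeability under the null.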
Chapter 8. Graphical Tools
Abstract
Most of the graphical tools that have been discussed in Chapter 3 for the one-sample problem can be adapted to the two- and the K-sample problems in a very straightforward way. We focus in this chapter on the QQ and PP plots and on the comparison distribution plots. Just as in Part I we start with defining the population versions of these plots, as these are easier vehicles for explaining how differences between distributions can be interpreted and understood. These graphs are again closely related to the tests discussed in the next chapters.
Olivier Thas
Chapter 9. Some Important Two-Sample Tests
Abstract
We start this chapter with some general guidelines for setting null and alternative hypotheses, while stressing their relation to the choice of a test statistic and the interplay between the null hypothesis and the distributional assumptions one is willing to make. In Section 9.2 this is illustrated for the two-sample problem in the discussion of the well-known Wilcoxon rank sum test. We study the Wilcoxon test from several points of view. From this discussion it becomes clear that its interpretation is not always as clear-cut as one would hope. For example, we demonstrate that the test may not always be used for detecting differences in means. This brings us back to the diagnostic property that was also important in Part I. We further elaborate on this in Section 9.3, in which we again consider the Wilcoxon test as an example. The same reasoning is applied in Section 9.5, where we discuss some of the nonparametric tests for detecting differences in scale. Section 9.6 focusses on the Kruskal-Wallis test for the K-sample problem, and we conclude this chapter with an introduction to adaptive tests.
Olivier Thas
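The two rank tests named in this abstract are illustrated below (a Python sketch with hypothetical samples, not the book's R code; the Mann-Whitney U form is the usual software version of the Wilcoxon rank sum test):

```python
from scipy import stats

# Hypothetical two-sample data.
x = [1.1, 2.3, 1.9, 3.0, 2.2]
y = [2.8, 3.5, 3.1, 4.0, 3.3]
w = stats.mannwhitneyu(x, y, alternative="two-sided")   # Wilcoxon rank sum / Mann-Whitney U

# Kruskal-Wallis extends the rank-based comparison to K samples.
z = [5.0, 4.8, 5.2, 4.9, 5.1]
kw = stats.kruskal(x, y, z)
```

The U statistic counts the pairs (x_i, y_j) with x_i > y_j; here only 3.0 > 2.8, so U = 1, reflecting the strong separation of the two samples. As the chapter stresses, rejection indicates a tendency of one distribution to yield larger values than the other, not necessarily a difference in means.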
Chapter 10. Smooth Tests
Abstract
This chapter is devoted to smooth tests for the two- and the K-sample problems. The literature on such tests may not be as vast as for the one-sample problem, though these tests are broadly applicable and often informative. Because many of the techniques and ideas used in this chapter rely heavily on what has been discussed in the previous chapters, this chapter is quite concise. The construction of the test is very similar to the one-sample smooth test of Chapter 4.
Olivier Thas
Chapter 11. Methods Based on the Empirical Distribution Function
Abstract
This chapter is devoted to tests for the two- and K-sample problems that are based on the empirical distribution functions (EDF) of the distributions to be compared. Such tests are generally known as EDF tests. The types of tests treated in this chapter are often of the same form as the EDF tests for the one-sample problem (Chapter 5). The Kolmogorov-Smirnov test is discussed in Section 11.1, and Section 11.2 concerns tests of the Anderson-Darling type. We conclude the chapter with some practical guidelines in Section 11.4. As in Chapter 5, we again prefer the use of empirical processes for studying the asymptotic properties of the tests.
Olivier Thas
Chapter 12. Two Final Methods and Some Final Thoughts
Abstract
A seemingly completely different approach to arriving at two- or K-sample EDF test statistics is described in Section 12.1. This method is known as the contingency table approach, as proposed by Rayner and Best (2001). The sample space partition tests of Section 12.2 can be viewed as a combination of the contingency table approach and tests of the EDF type. Although both types of tests are basically EDF tests, we have chosen to present them in a separate chapter, because the manner in which they are constructed deviates from what is seen in the previous chapters. Section 12.3 concludes this chapter, and the book, with some final thoughts.
Olivier Thas
Backmatter
Metadata
Title
Comparing Distributions
Author
Olivier Thas
Copyright Year
2010
Publisher
Springer New York
Electronic ISBN
978-0-387-92710-7
Print ISBN
978-0-387-92709-1
DOI
https://doi.org/10.1007/978-0-387-92710-7