
## About this Book

The 37 expository articles in this volume provide broad coverage of important topics relating to the theory, methods, and applications of goodness-of-fit tests and model validity. The book is divided into eight parts, each of which presents topics written by expert researchers in their areas. Key features include:

* state-of-the-art exposition of modern model validity methods, graphical techniques, and computer-intensive methods
* systematic presentation with sufficient history and coverage of the fundamentals of the subject
* exposure to recent research and a variety of open problems
* many interesting real-life examples for practitioners
* extensive bibliography, with special emphasis on recent literature
* subject index

This comprehensive reference work will serve the statistical and applied mathematics communities as well as practitioners in the field.

## Table of Contents

### 1. Karl Pearson and the Chi-Squared Test

This historical and review paper is in three parts. The first gives some brief details about Karl Pearson. The second describes in outline the 1900 paper which is being celebrated at this conference. The third provides some perspective on the importance, historically and contemporarily, of the chi-squared test.

D. R. Cox

### 2. Karl Pearson Chi-Square Test: The Dawn of Statistical Inference

Specification or stochastic modeling of data is an important step in statistical analysis of data. Karl Pearson was the first to recognize this problem and introduce a criterion, in a paper published in 1900, to examine whether the observed data support a given specification. He called it the chi-square goodness-of-fit test, which motivated research in testing of hypotheses and estimation of unknown parameters and led to the development of statistics as a separate discipline. Efron (1995) says, “Karl Pearson’s famous chi-square paper appeared in the spring of 1900, an auspicious beginning to a wonderful century for the field of statistics.” This paper reviews the early work on the chi-square statistic, its derivation from the general theory of asymptotic inference as a score test introduced by Rao (1948), its use in practice, and recent contributions to alternative tests. A new test for goodness-of-fit in the continuous case is proposed.

C. R. Rao
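The statistic these two chapters celebrate is simple enough to compute directly from its definition. Below is a minimal illustrative sketch (not code from the book); the die-roll counts are made-up example data.

```python
# Hedged sketch: Pearson's chi-squared goodness-of-fit statistic,
# X^2 = sum_i (O_i - E_i)^2 / E_i, computed from its definition.
def pearson_chi2(observed, expected):
    """Pearson's X^2 over the k cells of a multinomial table."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 60 rolls of a die: observed counts per face vs. the fair-die expectation.
observed = [5, 8, 9, 8, 10, 20]
expected = [10] * 6          # 60 * (1/6) expected per face
x2 = pearson_chi2(observed, expected)
df = len(observed) - 1       # k - 1 degrees of freedom (no estimated parameters)
print(x2, df)                # X^2 = 13.4 on 5 df
```

Referring X² = 13.4 to the chi-squared distribution on 5 degrees of freedom gives the test Pearson introduced in 1900; when parameters are estimated from the data, the degrees of freedom are further reduced, as Fisher later showed.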

### 3. Approximate Models

The increasing size and complexity of data sets increasingly forces us to deal with less than perfect, but ever more complicated models. I shall discuss general issues of model fitting and of assessing the quality of fit, and the important and often crucial roles of robustness and simulation.

Peter J. Huber

### 4. Partitioning the Pearson-Fisher Chi-Squared Goodness-of-Fit Statistic

This paper presents an overview of Rayner and Best’s (1989) categorised Neyman smooth goodness-of-fit score tests, along with an explanation of recent work into how these tests can be used to construct components of the Pearson-Fisher chi-squared test statistic in the presence of unknown nuisance parameters. A short simulation study examining the size of these component test statistics is also presented.

G. D. Rayner

### 5. Statistical Tests for Normal Family in Presence of Outlying Observations

A package of Fortran programs is available for the statistical analysis of normal data in the presence of outlying observations. First, the Bol’shev test, based on the Chauvenet rule, is applied to detect all outlying observations in a sample. Then a chi-squared type test, based on the Nikulin-Rao-Robson-Moore statistic with Neyman-Pearson classes for grouping the data, is applied for testing normality. We include a practical application of our software to the data of Milliken and the data of Daniel. The power of the test for testing normality against the family of logistic distributions, formed on the Neyman-Pearson classes, is also studied.

Aïcha Zerbet

### 6. Chi-Squared Test for the Law of Annual Death Rates: Case with Censure for Life Insurance Files

The object of this article is to set up a chi-squared test in the case of a law of multidimensional binomial type when the observations are censored. This problem arises, for example, in the construction of a goodness-of-fit test for the law of annual death rates used in life insurance to calculate various premiums. For the non-censored case, with a complete example for the Makeham law, see Gerville-Réache and Nikulin (2000).

Léo Gerville-Réache

### 7. Shapiro-Wilk Type Goodness-of-Fit Tests for Normality: Asymptotics Revisited

Regression type goodness-of-fit tests for normality based on L-statistics, proposed by Shapiro and Wilk (1965), are known to possess good power properties. However, due to some distributional problems (particularly for large sample sizes), various modifications have been considered in the literature. The intricacies of these asymptotics are presented here in a general setup, and in the light of that some theoretical explanations are provided for the asymptotic power of related tests.

Pranab Kumar Sen
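The correlation-with-normal-scores idea behind this family of tests can be sketched in a few lines. The following is a simplified Shapiro-Francia-type statistic using Blom's approximation to the expected normal order statistics, not Shapiro and Wilk's exact covariance-based coefficients; the sample is made-up illustrative data.

```python
# Hedged sketch: a Shapiro-Francia-type statistic W' = squared correlation
# between the ordered sample and approximate expected normal order
# statistics (Blom scores).  A simplification of the Shapiro-Wilk test.
from statistics import NormalDist

def shapiro_francia_w(sample):
    n = len(sample)
    x = sorted(sample)
    # Blom's approximation to E[Z_(i)], the expected normal order statistics.
    m = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx = sum(x) / n
    mm = sum(m) / n
    num = sum((a - mm) * (b - mx) for a, b in zip(m, x)) ** 2
    den = sum((a - mm) ** 2 for a in m) * sum((b - mx) ** 2 for b in x)
    return num / den

# W' near 1 suggests the data are consistent with normality.
print(shapiro_francia_w([4.1, 5.2, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7]))
```

The distributional difficulties the chapter discusses concern exactly such statistics: W' lies in (0, 1], but its null distribution depends on n in an awkward way, motivating the modifications reviewed there.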

### 8. A Test of Exponentiality Based on Spacings for Progressively Type-II Censored Data

There have been numerous tests proposed in the literature to determine whether or not an exponential model is appropriate for a given data set. These procedures range from graphical techniques to tests that exploit characterization results for the exponential distribution. In this article, we propose a goodness-of-fit test for the exponential distribution based on general progressively Type-II censored data. This test, based on spacings, generalizes a test proposed by Tiku (1980). We derive the exact and asymptotic null distribution of the test statistic. The results of a simulation study of the power under several different alternatives, such as the Weibull, Lomax, lognormal, and gamma distributions, are presented. We also discuss an approximation to the power based on normality and compare the results with those obtained by simulation. A wide range of sample sizes and progressive censoring schemes has been considered for the empirical study. We also compare the performance of this procedure with two standard tests for exponentiality, viz. the Cramér-von Mises and Shapiro-Wilk tests. The results are illustrated on some real data for the one- and two-parameter exponential models. Finally, some extensions to the multi-sample case are suggested.

N. Balakrishnan, H. K. T. Ng, N. Kannan

### 9. Goodness-of-Fit Statistics for the Exponential Distribution When the Data are Grouped

In many industrial and biological experiments, the recorded data consist of the number of observations falling in an interval. In this paper, we develop two test statistics to test whether grouped observations come from an exponential distribution. Following the procedure of Damianou and Kemp (1990), Kolmogorov-Smirnov type statistics are developed with the maximum likelihood estimator of the scale parameter substituted for the true unknown scale. The asymptotic theory for both statistics is studied and power studies are carried out via simulations.

Sneh Gulati, Jordan Neus
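A Kolmogorov-Smirnov-type statistic for grouped data compares cumulative cell frequencies with a fitted CDF at the cell boundaries. The sketch below is illustrative only: for simplicity the exponential rate is estimated crudely from interval midpoints rather than by the grouped maximum-likelihood procedure the paper uses, and the counts and boundaries are made-up data.

```python
# Hedged sketch: KS-type statistic for grouped exponential data,
# D = max_j | F_n(t_j) - F_rate(t_j) | over the upper cell boundaries t_j.
from math import exp

def grouped_ks_exponential(boundaries, counts):
    n = sum(counts)
    # Crude rate estimate from count-weighted interval midpoints
    # (a stand-in for the grouped MLE; an assumption of this sketch).
    mids = [(a + b) / 2 for a, b in zip(boundaries[:-1], boundaries[1:])]
    rate = n / sum(c * m for c, m in zip(counts, mids))
    # Empirical vs fitted exponential CDF at the upper cell boundaries.
    cum, d = 0, 0.0
    for t, c in zip(boundaries[1:], counts):
        cum += c
        d = max(d, abs(cum / n - (1 - exp(-rate * t))))
    return d

boundaries = [0.0, 1.0, 2.0, 3.0, 5.0]   # cell edges (e.g. hours)
counts = [40, 24, 15, 21]                # observations per cell
print(grouped_ks_exponential(boundaries, counts))
```

Because the scale is estimated from the same data, the null distribution of D is no longer the classical Kolmogorov one, which is why the paper must develop the asymptotic theory separately.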

### 10. Characterization Theorems and Goodness-of-Fit Tests

Karl Pearson’s chi-square goodness-of-fit test of 1900 is considered an epochal contribution to science in general and statistics in particular. Regarded as the first objective criterion for agreement between a theory and reality, and suggested as “beginning the prelude to the modern era in statistics,” it stimulated a broadband enquiry into the basics of statistics and led to numerous concepts and ideas which are now common fare in statistical science. Over the decades of the twentieth century, goodness-of-fit has become a substantial field of statistical science of both theoretical and applied importance, and has led to the development of a variety of statistical tools. The characterization theorems in probability and statistics, the other topic of our focus, are widely appreciated for their role in clarifying the structure of the families of probability distributions. The purpose of this paper is twofold. The first is to demonstrate that characterization theorems can be natural, logical, and effective starting points for constructing goodness-of-fit tests. Towards this end, several entropy and independence characterizations of the normal and the inverse Gaussian (IG) distributions, which have resulted in goodness-of-fit tests, are used. The second goal of this paper is to show that the interplay between distributional characterizations and goodness-of-fit assessment continues to be a stimulus for new discoveries and ideas. The point is illustrated using the new concepts of IG symmetry, IG skewness, and IG kurtosis, which resulted from goodness-of-fit investigations and have substantially expanded our understanding of the striking and intriguing analogies between the IG and normal families.

Carol E. Marchetti, Govind S. Mudholkar

### 11. Goodness-of-Fit Tests Based on Record Data and Generalized Ranked Set Data

Assume that observations have common distribution function F. We wish to test H: F = F0, where F0 is a completely specified distribution. Two kinds of data are considered: (i) the first k+1 record values X(0), X(1), ..., X(k), or possibly several independent sets of records based on observations with distribution F; (ii) generalized ranked set data, i.e., J independent order statistics X_{i_j:n_j} with common parent distribution F. Several appropriate goodness-of-fit tests are described and evaluated by simulation studies. The more general problem dealing with the composite hypothesis H: F ∈ {F(·;θ): θ ∈ Θ} is also discussed.

Barry C. Arnold, Robert J. Beaver, Enrique Castillo, Jose Maria Sarabia

### 12. Gibbs Regression and a Test for Goodness-of-Fit

We explore a model for social networks that may be viewed either as an extension of logistic regression or as a Gibbs distribution on a complete graph. The model was developed for data from a mental health service system which includes a neighborhood structure on the clients in the system. This neighborhood structure is used to develop a Markov chain Monte Carlo goodness-of-fit test for the fitted model, with pleasing results.

Lynne Seymour

### 13. A CLT for the L_2 Norm of the Regression Estimators Under α-Mixing: Application to G-O-F Tests

We establish a central limit theorem for integrated square error of least squares splines estimators based on α-mixing. The new theorem is used to study the behavior of an asymptotic goodness-of-fit test.

Cheikh A. T. Diack

### 14. Testing the Goodness-of-Fit of a Linear Model in Nonparametric Regression

We construct a linear hypothesis test on the regression function f in nonparametric regression model; more precisely, we test that f is an element of U, where U is a finite dimensional vector space. The test statistic is easy to compute and we give the asymptotic level and the asymptotic power of the test. Even if the procedure is based on large sample behaviour, simulation experiments reveal that, for small samples, the proposed statistic is close to the asymptotic distribution and the test has good power properties.

### 15. A New Test of Linear Hypothesis in Regression

In the Gaussian regression model, we propose a new test, based on model selection methods, for testing that the regression function F belongs to a linear space. The test is free from any prior assumption on F and on the variance σ² of the errors. The procedure is rate optimal over both smooth and directional alternatives, and the simulation studies show that it is also robust with respect to the non-Gaussianity of the errors.

Y. Baraud, S. Huet, B. Laurent

### 16. Inference in Extensions of the Cox Model for Heterogeneous Populations

The analysis of censored time data in heterogeneous populations requires extensions of the Cox model to describe the distribution of duration times when the conditions may change according to various schemes. New estimators and tests are presented for a model with a non-stationary baseline hazard depending on the time at which the observed phenomenon starts, and a model where the regression coefficients are functions of an observed variable.

Odile Pons

### 17. Assumptions of a Latent Survival Model

Whitmore, Crowder, and Lawless (1998), henceforth WCL, consider a model for failure of engineering systems in which the physical degradation process is latent or unobservable but a time-varying marker, related to the degradation process, is observable. Lee, DeGruttola, and Schoenfeld (2000), henceforth LDS, extend the WCL model and investigate the relationship between a disease marker and clinical disease by modeling them as a bivariate stochastic process. The disease process is assumed to be latent or unobservable. The time to reach the primary endpoint or failure (for example, death, disease onset, etc.) is the time when the latent disease process first crosses a failure threshold. The marker process is assumed to be correlated with the latent disease process and, hence, tracks disease, albeit imperfectly perhaps. The general development of this latent survival model does not require the proportional hazards assumption. The Wiener processes assumptions of the WCL model and the extended model by LDS, however, must be verified in actual applications to have confidence in the validity of the findings in these applications. In this article, we present a suite of techniques for checking assumptions of this model and discuss a number of remedies that are available to make the model applicable.

Mei-Ling Ting Lee, G. A. Whitmore

### 18. Goodness-of-Fit Testing for the Cox Proportional Hazards Model

For testing the validity of the Cox proportional hazards model, a goodness-of-fit test of the null proportional hazards assumption is proposed using the Kullback-Leibler distance, based on a semi-parametric generalization of the Cox model in which the hazard functions can cross for different values of the covariates. The proposed method is illustrated using some real data. Our test was compared with previously described tests in simulation experiments and found to perform very well.

### 19. A New Family of Multivariate Distributions for Survival Data

We introduce a family of multivariate distributions in discrete time which may be regarded as a multiple logistic distribution for discrete conditional hazards. In the independent case, the marginal laws are identical to those of the univariate logistic model for survival data discussed by Efron (1988). We present the analysis of a data set previously analyzed using frailty models.

Shulamith T. Gross, Catherine Huber-Carol

### 20. Discrimination Index, the Area Under the ROC Curve

The accuracy of fit of a mathematical predictive model is the degree to which the predicted values coincide with the observed outcome. When the outcome variable is dichotomous and predictions are stated as probabilities that an event will occur, models can be checked for good discrimination and calibration. In the case of the multiple logistic regression model for binary outcomes (event, non-event), the area under the ROC (Receiver Operating Characteristic) curve is the most widely used measure of model discrimination. The area under the ROC curve is identical to the Mann-Whitney statistic. We consider shift models for the distributions of predicted probabilities for event and non-event. From the interval estimates of the shift parameter, we calculate the confidence intervals for the area under the ROC curve. Also, we present the development of a general description of an overall discrimination index C (overall C) which we can extend to a survival time model such as the Cox regression model. The general theory of rank correlation is applied in developing the overall C. The overall C is a linear combination of three independent components: event vs. non-event, event vs. event, and event vs. censored. By showing that these three components are asymptotically normally distributed, the overall C is shown to be asymptotically normally distributed. The expected value and the variance of the overall C are presented.

Byung-Ho Nam, Ralph B. D’Agostino
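The identity the chapter builds on — that the area under the ROC curve equals the Mann-Whitney statistic P(score of an event subject > score of a non-event subject), with ties counted half — can be verified directly. A minimal sketch with made-up predicted probabilities:

```python
# Hedged sketch: AUC computed as the Mann-Whitney proportion of
# concordant (event, non-event) score pairs, ties counted as 1/2.
def auc_mann_whitney(events, nonevents):
    pairs = len(events) * len(nonevents)
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in nonevents)
    return wins / pairs

events = [0.9, 0.8, 0.7, 0.6]      # model scores for subjects with the event
nonevents = [0.5, 0.6, 0.3, 0.2]   # scores for subjects without the event
print(auc_mann_whitney(events, nonevents))   # 0.96875
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation; the chapter's confidence intervals for the AUC follow from interval estimates of the shift parameter in its shift models.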

### 21. Goodness-of-Fit Tests for Accelerated Life Models

A goodness-of-fit test for the generalized Sedyakin model is proposed for accelerated experiments conducted under step-stresses. Various alternatives are considered. The power of the test against approaching alternatives is investigated.

Vilijandas Bagdonavičius, Mikhail S. Nikulin

### 22. Two Nonstandard Examples of the Classical Stratification Approach to Graphically Assessing Proportionality of Hazards

Goodness-of-fit assessment of the crucial proportionality and log-linearity assumptions of the Cox (1972a, b) proportional hazards regression models for survival data and repeated events has necessitated several new developments. This contribution presents two concrete examples of nonstandard application of these ideas: in discrete-time regression for the retro-hazard of the reporting delay time in a multiple sclerosis registry, and in analysing repeated insurance claims in a fixed time window.

Niels Keiding

### 23. Association in Contingency Tables, Correspondence Analysis, and (Modified) Andrews Plots

Andrews plots [Andrews (1972)], as a tool for graphically interpreting multivariate data, have recently gained considerable recognition. We present the use of Andrews plots and modified Andrews plots, recently introduced by the authors [Khattree and Naik (2001)], for graphically exhibiting the association in cross-classified data and in the context of correspondence analysis. A new alternative to traditional correspondence analysis introduced by C. R. Rao (1995), and the implementation of our approach in this case, are also discussed.

Ravindra Khattree, Dayanand N. Naik

### 24. Orthogonal Expansions and Distinction Between Logistic and Normal

We propose a graphical goodness-of-fit test which consists in representing a sample along principal dimensions of a random variable. This method is used to distinguish the logistic from the normal distribution. Some simulations are given.

### 25. Functional Tests of Fit

In this paper we define and study a class of functional tests of fit. These smooth tests are associated with projection density estimators. Pearson’s X²-test belongs to this class since it is associated with the histogram. Using a multidimensional Berry-Esseen type inequality, we obtain the asymptotic behaviour in distribution of the test statistics. As a consequence we obtain consistency with rates; an exponential rate holds in a special case. We also study asymptotic efficiency under adjacent hypotheses and calculate the Bahadur slope. Finally, we give criteria for choosing a test in the class and present some numerical simulations.

Denis Bosq
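Smooth tests of this kind project the data onto a few orthonormal directions and sum the squared projections. The sketch below is a generic Neyman smooth test of order 3 for H0: U ~ Uniform(0, 1) (i.e., after the probability-integral transform under the hypothesized density), using shifted Legendre polynomials; it illustrates the projection idea but is not the functional construction of the chapter, and the sample is made-up data.

```python
# Hedged sketch: Neyman smooth statistic Psi^2 = sum_j V_j^2 with
# V_j = n^{-1/2} sum_i pi_j(u_i); asymptotically chi-squared(3) under H0.
from math import sqrt

# Normalized shifted Legendre polynomials on [0, 1] (orthonormal directions).
def pi1(u): return sqrt(3) * (2 * u - 1)
def pi2(u): return sqrt(5) * (6 * u * u - 6 * u + 1)
def pi3(u): return sqrt(7) * (20 * u**3 - 30 * u * u + 12 * u - 1)

def neyman_smooth(u_sample):
    n = len(u_sample)
    stat = 0.0
    for p in (pi1, pi2, pi3):
        v = sum(p(u) for u in u_sample) / sqrt(n)
        stat += v * v
    return stat

u = [0.05, 0.12, 0.33, 0.41, 0.49, 0.58, 0.66, 0.78, 0.84, 0.97]
print(neyman_smooth(u))   # compare with a chi-squared(3) critical value
```

Pearson's X² arises from the same construction when the projection basis is the set of histogram cell indicators, which is the sense in which the chapter places it inside this class.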

### 26. Quasi Most Powerful Invariant Tests of Goodness-of-Fit

In this chapter, we consider the problem of testing the goodness-of-fit of either one of two location-scale families of densities when the location and scale parameters are unknown. We derive an O(n⁻¹) approximation to the densities of the maximal invariant on which the most powerful invariant test is based. The resulting test, which we call quasi most powerful invariant, can be applied to many situations. The power of the new procedure is studied for some particular cases.

Gilles R. Ducharme, Benoît Frichot

### 27. Test of Monotonicity for the Rasch Model

Sums of independent non-identically distributed indicators have been studied by numerous authors. We give an application of them to some questionnaire models.

Jean Bretagnolle

### 28. Validation of Model Assumptions in Quality of Life Measurements

Quality of Life (QoL) has become an increasingly used outcome measure in clinical trials over the past few years. QoL can be assessed by self-rated questionnaires, i.e., a collection of items intended to measure several characteristics of QoL. When new tests are developed, the goal is to produce a valid measure. The measurement process requires constructing a statistical model and then approximating it as well as possible. Different statistical “methods,” in fact models, are used during the validation process of QoL questionnaires. In this paper, we present the hypotheses underlying each model and the tests used in practice.

A. Hamon, J. F. Dupuy, M. Mesbah

### 29. One-Sided Hypotheses in a Multinomial Model

Several independent data sets are represented as coordinates of one i.i.d. data set. We consider model selection for several 2 × 2 contingency tables, including models with and without a common odds ratio and one-sided models such as models where a treatment is beneficial. We use Jeffreys priors on each model. For studies of long-term treatment with aspirin after a heart attack, we find that the treatment appeared to be beneficial if it began within six months after the heart attack; if begun later it had no effect on mortality.

Richard M. Dudley, Dominique M. Haughton

### 30. A Depth Test for Symmetry

It was recently shown for arbitrary multivariate probability distributions that angular symmetry is completely characterized by location depth. We use this mathematical result to construct a statistical test of the null hypothesis that the data were generated by a symmetric distribution, and illustrate the test by several real examples.

Peter J. Rousseeuw, Anja Struyf

### 31. Adaptive Combination of Tests

In this paper we present a combination of two tests of the linear hypothesis in the linear regression model. The adaptive decision rule which selects the optimal combination of the tests is quite analogous to that which led to the optimal combinations of estimators proposed by the authors.

### 32. Partially Parametric Testing

If a smooth test of goodness-of-fit is applied and the null hypothesis is rejected, the hypothesized probability density function is replaced by a k-parameter alternative to the original model. Three examples are given of inference based on such a model:

* S-sample smooth tests for goodness-of-fit
* partially parametric alternative tests to the t-test
* tests for the location of modes

J. C. W. Rayner

### 33. Exact Nonparametric Two-Sample Homogeneity Tests

In this paper, we study several tests for the equality of two unknown distributions. Two are based on empirical distribution functions, three on nonparametric probability density estimates, and the last ones on differences between sample moments. We suggest controlling the size of such tests (under nonparametric assumptions) by using permutational versions of the tests jointly with the method of Monte Carlo tests properly adjusted to deal with discrete distributions. In a simulation experiment, we show that this technique provides perfect control of test size, in contrast with usual asymptotic critical values.

Jean-Marie Dufour, Abdeljelil Farhat
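The permutational Monte Carlo idea the chapter advocates can be sketched with the simplest two-sample statistic. Below, the difference in sample means is recomputed on random relabellings of the pooled data, and the p-value uses the standard (hits + 1)/(B + 1) Monte Carlo form; the samples are made-up illustrative data, and the test statistic is a stand-in for the EDF-, density-, and moment-based statistics studied in the chapter.

```python
# Hedged sketch: permutational Monte Carlo two-sample test on |mean(x) - mean(y)|.
import random

def perm_test_mean_diff(x, y, n_perm=999, seed=0):
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    nx = len(x)
    obs = abs(sum(x) / nx - sum(y) / len(y))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)           # random relabelling of the pooled sample
        px, py = pooled[:nx], pooled[nx:]
        if abs(sum(px) / nx - sum(py) / len(py)) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # Monte Carlo p-value, exact size control

x = [12.1, 11.4, 13.0, 12.6, 11.8]
y = [10.2, 10.9, 9.8, 10.5, 11.0]
print(perm_test_mean_diff(x, y))
```

Because the permutation distribution is generated under the null hypothesis of exchangeability, the (hits + 1)/(B + 1) p-value has exactly the nominal size, which is the property the chapter's simulation experiment verifies against asymptotic critical values.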

### 34. Power Comparisons of Some Nonparametric Tests for Lattice Ordered Alternatives in Two-Factor Experiments

In biological or medical situations the expectations of response variables may be ordered by a rectangular grid partial ordering. For example, serum glucose as a function of body mass index and age would typically be assumed to be nondecreasing in each predictive variable. An order-restricted least squares approach to hypothesis testing may be implemented, but the practical implementation of estimation techniques and sampling theory tend to be complicated. However, advances in computer processing have now made computer intensive methods for such inference more practical. In this paper we consider the problem of testing trend under the rectangular grid ordering whenever we have small sample sizes and nonparametric sampling assumptions. Our focus is upon the behavior of rank-based test statistics, in particular, the order-restricted version of the Kruskal-Wallis statistic. We compare the power of test statistics generated by order-restricted least squares to the power of more traditionally defined statistics.

Thu Hoàng, Van L. Parsons

### 35. Tests of Independence with Exponential Marginals

We present tests of independence for bivariate vectors with exponential marginals, in the setup of bivariate extreme value distributions. These rely on a new Karhunen-Loève expansion due to Deheuvels and Martynov (2000).

Paul Deheuvels

### 36. Testing Problem for Increasing Function in a Model with Infinite Dimensional Nuisance Parameter

We consider the following statistical problem, arising, for example, in accelerated life testing. Let X1 be a random variable with density function f(t), let Ψ(t) be an increasing absolutely continuous function, let Φ(t) = Ψ⁻¹(t) be its inverse function, and let the random variable X2 be defined as X2 = Φ(X1). In order to test whether the function Ψ(t) belongs to a given parametric family when the function f is completely unknown, we take two independent nonparametric estimators $${{\hat{f}}_{n}}$$ of the density function f and $${{\hat{g}}_{n}}$$ of the density function g of X2, and compare the function $${{\hat{g}}_{n}}(t)$$ with the function $${{\hat{f}}_{n}}(\Psi ({{\hat{\theta }}_{n}};t))\psi ({{\hat{\theta }}_{n}};t)$$ for a minimum distance estimator $${{\hat{\theta }}_{n}}$$. But first we have to investigate the asymptotic behavior of the estimator $${{\hat{\theta }}_{n}}$$. We consider a parametric minimum distance estimator for Ψ when we observe (with a mechanism of independent censoring) two independent samples from the distributions of X1 and X2, respectively.

M. Nikulin, V. Solev

### 37. The Concept of Generalized Asymptotic Deficiency and its Application to the Minimum Discrepancy Estimation

The concept of (asymptotic) deficiency was defined as the (limit of) additional number of observations which is an integer. However, it would be convenient for the value of deficiency to be continualized in the higher order asymptotics. In order to do so, Hodges and Lehmann (1970) suggested using stochastic interpolation as one of the methods. In this paper, using the method, we consider the concept of generalized asymptotic deficiency and apply it to the minimum discrepancy estimation including the minimum chi-square one considered by Fisher (1928) in relation to the work of Pearson (1900).

Masafumi Akahira

### Backmatter
