
About this book

This volume presents the published proceedings of the 10th International Workshop on Statistical Modelling, to be held in Innsbruck, Austria, from 10 to 14 July 1995. This workshop marks an important anniversary. The inaugural workshop in this series also took place in Innsbruck, in 1986, and brought together a small but enthusiastic group of thirty European statisticians interested in statistical modelling. The workshop arose out of two GLIM conferences in the UK, in London (1982) and Lancaster (1985), and from a number of short courses organised by Murray Aitkin and held at Lancaster in the early 1980s, which attracted many European statisticians interested in generalised linear modelling. The inaugural workshop in Innsbruck concentrated on GLMs and was characterised by a number of features: a friendly and supportive academic atmosphere, tutorial sessions and invited speakers presenting new developments in statistical modelling, and a very well organised social programme. The academic programme allowed plenty of time for presentation and for discussion, and copies of all papers were made available beforehand. Over the intervening years, the workshop has grown substantially, and now regularly attracts over 150 participants. The scope of the workshop is now much broader, reflecting the growth in the subject of statistical modelling over ten years. The elements of the first workshop, however, are still present, and participants always find the meetings relevant and stimulating.



NPML estimation of the mixing distribution in general statistical models with unobserved random effects

General maximum likelihood computational methods have recently been described for longitudinal analysis and related problems using generalized linear models. These developments extend the standard methods of generalized linear modelling to deal with overdispersion and variance component structures caused by the presence of unobserved random effects in the models. The value of these methods is that they are not restricted by particular statistical model assumptions about the distribution of the random effects, which if incorrect might invalidate the conclusions. Despite this generality, these methods are fully efficient, in the sense that the model is fitted by (nonparametric) maximum likelihood, rather than by approximate or inefficient methods. The computational implementation of the methods is straightforward in GLM packages like GLIM4 and S+. An important feature of these computational methods is that they are applicable to a much broader class of models than the GLMs described above, and they provide new general computational solutions to fitting a very wide range of models using latent variables.

Murray Aitkin
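The NPML idea above can be made concrete with a minimal sketch: the mixing distribution is represented by a small set of mass points with probabilities, and both are updated by EM. Everything here (the two-mass-point Poisson mixture, the data, the starting values) is illustrative, not taken from the paper.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def npml_em(data, mass_points, weights, n_iter=200):
    """EM for a finite mixture of Poissons: the mixing distribution is a
    set of mass points with probabilities, both updated each iteration."""
    for _ in range(n_iter):
        # E-step: posterior probability of each mass point given each count
        post = []
        for y in data:
            comp = [w * poisson_pmf(y, m) for w, m in zip(weights, mass_points)]
            total = sum(comp)
            post.append([c / total for c in comp])
        # M-step: weights are average posteriors; mass points are
        # posterior-weighted means of the counts
        n = len(data)
        weights = [sum(p[k] for p in post) / n for k in range(len(mass_points))]
        mass_points = [
            sum(p[k] * y for p, y in zip(post, data)) / (n * weights[k])
            for k in range(len(mass_points))
        ]
    return mass_points, weights

counts = [0, 1, 1, 2, 0, 1, 4, 5, 6, 5, 7, 4]   # illustrative count data
mp, w = npml_em(counts, mass_points=[1.0, 4.0], weights=[0.5, 0.5])
```

In a full NPML fit the number of mass points would itself be increased until the likelihood stops improving; this sketch fixes it at two.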

Some Topics in Optimum Experimental Design for Generalized Linear Models

Optimum experimental designs for generalized linear models are found by applying the methods for normal theory regression models to the information matrix for weighted least squares. The weights are those in the iterative fitting of the model. Examples for logistic regression with two variables illustrate the differences between design for normal theory models and that for other GLMs.

Anthony C. Atkinson
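The weighted-information idea can be illustrated for a one-variable logistic model: the IRLS weights are p(1 - p), and a D-optimum design maximises the determinant of X'WX. A hedged sketch, with made-up parameter values and candidate two-point designs:

```python
import math

def logistic_weight(eta):
    # IRLS weight for logistic regression: w = p(1 - p)
    p = 1.0 / (1.0 + math.exp(-eta))
    return p * (1.0 - p)

def det_info_two_point(c, beta=(0.0, 1.0)):
    """Determinant of the information matrix X'WX for a symmetric
    two-point design {-c, +c} under logit(p) = b0 + b1 * x."""
    b0, b1 = beta
    points = [(-c, logistic_weight(b0 - b1 * c)),
              (+c, logistic_weight(b0 + b1 * c))]
    m00 = sum(w for x, w in points)
    m01 = sum(w * x for x, w in points)
    m11 = sum(w * x * x for x, w in points)
    return m00 * m11 - m01 * m01

# The known D-optimum for this model puts points near +-1.543 on the
# logit scale; designs too narrow or too wide lose information.
best = det_info_two_point(1.543)
```

The comparison shows the normal-theory intuition ("spread points out") failing for GLMs: pushing the points out to ±3 shrinks the weights p(1 - p) and lowers the determinant.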

Autoregressive Modelling of Markov Chains

The reduction of the number of parameters in high-order Markov chains has already inspired several articles. In particular, Raftery (1985) proposed an autoregressive model which uses the same transition matrix for every lag. In this paper, we show that a model of the same type, but using different matrices, gives better results and is no harder to estimate, even when the amount of data is small.

André Berchtold
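The contrast between the two models can be sketched directly: Raftery's model mixes one shared transition matrix over the lags, while the extension allows a different matrix per lag. The two-state, order-2 chain, the matrices and the lag weights below are invented for illustration.

```python
def mtd_prob(history, j, lag_weights, matrices):
    """P(X_t = j | past) as a lag-weighted mixture of one-step transition
    rows; history[g] is the observed state at lag g+1."""
    return sum(l * M[h][j] for l, M, h in zip(lag_weights, matrices, history))

Q1 = [[0.7, 0.3], [0.4, 0.6]]   # transition matrix, lag 1
Q2 = [[0.9, 0.1], [0.2, 0.8]]   # a different matrix for lag 2
lam = [0.8, 0.2]                # lag weights, summing to 1

# Raftery (1985): the same matrix at every lag
raftery = [mtd_prob([0, 1], j, lam, [Q1, Q1]) for j in (0, 1)]
# Extended model: one matrix per lag
extended = [mtd_prob([0, 1], j, lam, [Q1, Q2]) for j in (0, 1)]
```

Both variants yield proper conditional distributions; the extension simply buys extra flexibility at the cost of one additional matrix per lag.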

A Case-Study on Accuracy of Cytological Diagnosis

Confortini, Biggeri et al. (Acta Cytologica, 1993) designed a study to assess the reliability of cytological diagnoses of a centralized laboratory in screening for cervical tumours. A standard set of 100 slides was read by 16 raters; the majority diagnosis was defined as the modal rating for each slide, and the target diagnosis was known from histopathology and clinical follow-up. In the present paper we analyse ratings from seven raters on a dichotomous classification (negative vs positive, i.e. whether active diagnostic investigations are to be performed), using latent class models and log-linear models. The results from latent variable modelling, compared with those obtained considering majority and target diagnoses using log-linear models, show that the latent variable should be interpreted more as a sort of modal judgement (majority diagnosis) than as a true diagnosis.

A. Biggeri, M. Bini

Dynamics and Correlated Responses in Longitudinal Count Data Models

The aim of this paper is to examine the properties of dynamic count data models. We propose the use of a linear feedback model (LFM), based on the Integer Valued Autoregressive (INAR) class, for longitudinal data applications with weakly exogenous regressors. In these models the conditional mean is modelled linearly in the history of the process with a log-link in the exogenous variables. We explore the quasi-differencing GMM approach to eliminating unobserved heterogeneity. These ideas are illustrated using data on US R&D expenditures and patents.

R. Blundell, R. Griffith, F. Windmeijer

Bootstrap Methods for Generalized Linear Mixed Models With Applications to Small Area Estimation

Generalized linear mixed models (GLMMs) provide a unified framework for analyzing relationships between binary, count or continuous response variables and predictors with either fixed or random effects. Recent advances in approximate fitting procedures and Markov Chain Monte Carlo techniques, as well as the widespread availability of high speed computers, suggest that GLMM software will soon be a standard feature of many statistical packages. Although the difficulty of fitting GLMMs has to a large extent been overcome, there are still many unresolved problems, particularly with regard to inference. For example, analytical formulas for standard errors and confidence intervals for linear combinations of fixed and random effects are often unreliable or not available, even in the classical case with normal errors. In this paper we propose the use of the parametric bootstrap as a practical tool for addressing problems associated with inference from GLMMs. The power of the bootstrap approach is illustrated in two small area estimation examples. In the first example, it is shown that the bootstrap reproduces complicated analytical formulas for the standard errors of estimates of small area means based on a normal theory mixed linear model. In the second example, involving a logistic-normal model, the bootstrap produces sensible estimates for standard errors, even though no analytical formulas are available.

James Booth
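The parametric bootstrap logic is simple to state: refit the model to data simulated from the fitted model, and use the spread of the refitted estimates as a standard error. A minimal normal-theory sketch, where the model, sample and bootstrap size are illustrative (the paper's small-area models are far richer):

```python
import math
import random
import statistics

random.seed(42)

# "Observed" data and fitted model: y_i ~ N(mu, sigma^2)
y = [random.gauss(2.0, 1.0) for _ in range(100)]
mu_hat = statistics.fmean(y)
sigma_hat = statistics.stdev(y)

# Parametric bootstrap: simulate from the *fitted* model, re-estimate,
# and take the standard deviation of the re-estimates
boot_means = []
for _ in range(500):
    y_star = [random.gauss(mu_hat, sigma_hat) for _ in range(len(y))]
    boot_means.append(statistics.fmean(y_star))

se_boot = statistics.stdev(boot_means)
se_analytic = sigma_hat / math.sqrt(len(y))   # known formula, for comparison
```

Here the analytic standard error exists, so the bootstrap can be checked against it; the paper's point is that the same recipe still works when no analytic formula is available.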

Confidence Intervals for Threshold Parameters

The calculation of a confidence interval for an unknown threshold is sometimes a non-standard problem. We use a modification of a score statistic initially suggested by Smith for hypothesis testing. The limiting distribution follows a stable law. We use the Mellin transform to obtain explicit formulas for the parameters defining this distribution. We show that if the distribution is to remain non-degenerate over the range of possible parameter values then a special standardization has to be used. We also show how the score statistic can be modified so that it converges to just a single distribution. This simplifies the tabulation required for practical confidence interval calculations.

R. C. H. Cheng, W. B. Liu

Optimal Design for Models Incorporating the Richards Function

Optimal designs for models incorporating the Richards function, which are based on criteria involving a second-order approximation to the mean square error of estimates of the parameters, are considered. Interest focuses on the shape parameter and the asymptote of the model, and it is shown that designs which, separately, minimize the mean square error for estimates of these parameters exhibit severe nonlinearity, but that designs which minimize compromise criteria, or designs formed as a combination of optimal designs for the individual parameters, have attractive properties.

G. P. Y. Clarke, L. M. Haines

Mixed Markov Renewal Models of Social Processes

Mixed Markov renewal models for movement between social states were proposed in the early 1980s (e.g. Flinn and Heckman, 1982). This is a promising class of model for analysing work and life history data because of its focus on categorical outcomes, its flexibility in representing both state dependence and duration effects, and its random effects specification. The random effects specification is particularly important given the mounting evidence that failure to allow for the inevitable omission of some relevant explanatory variables from any analysis risks serious inferential error, not just on temporal dependencies but also on the effects of explanatory variables included in the model. A corollary of this problem is that the opportunity to provide some measure of control for omitted variables in observational studies is a major justification for collecting and analysing longitudinal data. However, mixed Markov renewal models are rarely used in social science research. At least in part this is because researchers have been deterred by the model specification and computational problems posed by even the relatively simpler models used for event history analysis. Such methods are themselves only in routine use in a few areas of social science with established quantitative traditions.

R. B. Davies, G. R. Oskrochi

Statistical Inference Based on a General Model of Unobserved Heterogeneity

In this paper, a family of Finite Mixed Generalized Linear Models is considered. A straightforward general EM-algorithm for estimating any model from this family by standard GLM-software is given. After discussing the particular problems of statistical inference arising when FMGLMs are used, three estimators of standard errors of the parameter estimates are compared by means of example data and some simulations.

Ekkehart Dietz, Dankmar Böhning

An Extended Model for Paired Comparisons

The aim of this paper is to present a log-linear formulation of an extended Bradley-Terry model for paired comparisons that allows for simultaneous modelling of ties, order effects, categorical subject-specific covariates as well as object-specific covariates.

Regina Dittrich, Reinhold Hatzinger, Walter Katzenbeisser

Indirect Observations, Composite Link Models and Penalized Likelihood

The composite link model (CLM) is applied to the estimation of indirectly observed distributions. Coarsely grouped histograms and mixed discrete distributions are examples. The CLM can be very ill-conditioned, but a discrete penalty solves this problem. The optimal smoothing parameter is found efficiently by minimizing AIC. Three applications are presented.

Paul H. C. Eilers
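The composite link structure can be shown in miniature: the expected coarse counts are a known composition C of a latent fine-grid vector, and a difference penalty on the latent scale keeps the ill-conditioned problem stable. The grouping, latent values and penalty order below are invented for illustration; only the composition/penalty machinery mirrors the CLM.

```python
# Fine grid of 6 cells, coarsely observed as 3 bins of 2 cells each.
# The composition matrix C maps latent fine means gamma to coarse
# expected counts mu = C @ gamma.
C = [[1, 1, 0, 0, 0, 0],
     [0, 0, 1, 1, 0, 0],
     [0, 0, 0, 0, 1, 1]]

def compose(C, gamma):
    return [sum(c * g for c, g in zip(row, gamma)) for row in C]

def second_differences(v):
    # Penalized likelihood adds lambda * sum of these squared differences,
    # which smooths the latent vector and cures the ill-conditioning
    return [v[k] - 2 * v[k + 1] + v[k + 2] for k in range(len(v) - 2)]

gamma = [1.0, 2.0, 4.0, 5.0, 4.0, 2.0]   # latent fine-cell means
mu = compose(C, gamma)                    # coarse expected counts
```

Because each fine cell contributes to exactly one coarse bin, composition preserves the total count; the full CLM then fits gamma (on a log scale) by penalized Poisson likelihood, with the smoothing parameter chosen by AIC.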

Model Estimation in Nonlinear Regression

Given data from a sample of noisy curves in a nonlinear parametric regression model we consider nonparametric estimation of the model function and the parameters under certain structural assumptions. An algorithm for a consistent estimator is proposed and examples given.

Joachim Engel, Alois Kneip

Pearson statistics, goodness of fit, and overdispersion in generalised linear models

The Pearson statistic is commonly used for assessing goodness of fit in generalised linear models. However when data are sparse, asymptotic results based on the chi square distribution may not be valid. McCullagh (1985) recommended conditioning on the parameter estimates and obtained approximations to the first three conditional moments of Pearson’s statistic for generalised linear models with canonical link functions. This paper presents a generalisation of these results to non-canonical models, derived in Farrington (1995). A first order linear correction term to the Pearson statistic is defined which induces local orthogonality with the regression parameters, and leads to substantial simplifications in the expressions for the first three conditional and unconditional moments. Expressions are given for Poisson, binomial, gamma and inverse Gaussian models. The power of the modified statistic to detect overdispersion is assessed, and the methods are applied to adjusting the bias of the dispersion parameter estimate in exponential family models.

C. P. Farrington
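For reference, the Pearson statistic in a GLM is X² = Σ (yᵢ − μᵢ)² / V(μᵢ), with V the variance function, and X²/(n − p) is the usual moment estimator of the dispersion. A direct sketch with illustrative numbers (the fitted values and the single fitted parameter are assumptions, not from the paper):

```python
def pearson_statistic(y, mu, var):
    """X^2 = sum (y_i - mu_i)^2 / V(mu_i) for a fitted GLM."""
    return sum((yi - mi) ** 2 / var(mi) for yi, mi in zip(y, mu))

# Poisson model: variance function V(mu) = mu
y = [2, 3, 0, 5]
mu = [2.0, 2.0, 1.0, 4.0]       # hypothetical fitted values
x2 = pearson_statistic(y, mu, var=lambda m: m)
dispersion = x2 / (len(y) - 1)  # X^2 / (n - p), assuming p = 1 here
```

The paper's correction term adjusts this raw statistic so that its conditional moments, given the parameter estimates, simplify even for non-canonical links.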

M-estimation: Some Remedies

This paper is concerned with two problems facing M-estimators. M-estimators bound the influence of large residuals or, more generally, large deviations from the mean. In doing so, however, they can become inconsistent, in particular in the case of non-Normal Generalized Linear Models (GLMs), thus leading to biased estimates. We present a method for correcting such biased estimates without needing to alter the standard estimation procedures used in most statistical packages. Another problem facing M-estimators is that they do not take into account the (potentially high) leverage of a point. The cause of high leverage could be the mis-recording of a single explanatory variate value. We investigate the Mean Shift Outlier Model (MSOM) as discussed by Cook and Weisberg (1982), and show that this can lead to a method for reducing the leverage of a suspect point. However, we also note that in other cases the method would increase the leverage (albeit removing the influence) and therefore some other approach is needed. Our discussion is based on the Normal case, although we have in mind extensions of these proposals in the GLM context.

Robert Gilchrist, George Portides

Subject-Specific and Population-Averaged Questions for Log-Linear Regression Data

This paper illustrates the relation between subject-specific and population-averaged models (Zeger et al., 1988), especially for loglinear regression data with normal random effects. The emphasis is on simple special cases. The practical implications are discussed using an example from the literature.

Ulrike Grömping

Radon and Lung Cancer Mortality: An Example of a Bayesian Ecological Analysis

The purpose of this ecological study is to investigate the relation between lung cancer mortality and radon in Switzerland, considering confounding risk factors and taking into account the particular spatial structure of the data. We make use of a Bayesian model pioneered by Clayton and Kaldor (1987) and by Besag et al. (1991): the Poisson model for observed mortality rates is extended to include two sources of extra-Poisson variation, a spatially ‘unstructured’ white noise process and a spatially ‘structured’ intrinsic conditional autoregressive process. Under both the full model and the restricted models the radon effect is ‘significant’. However, effect estimates and corresponding SEs vary considerably between models. We discuss and interpret these effects and conclude that there is no single best model explaining the data. Rather, we find that in such complex situations it is preferable to present the sequence of all relevant models to epidemiologists. Only an analysis based on epidemiological knowledge and this range of models allows one to reach an informed conclusion about the risk factor of interest.

Ulrich Helfenstein, Christoph E. Minder

IRREML, a tool for fitting a versatile class of mixed models for ordinal data

Our aim is to develop models for ordered categorical data that are as general as those for continuous data and allow for similar inferential procedures. The basic model is the common threshold or grouped continuous model, assuming an underlying continuous variable z which is observed imperfectly. Any family of continuous distributions is a candidate for approximating the distribution of z, and a generalised linear mixed model may be specified for its parameters. The choice of distribution induces the link function that links the mean of the observed frequencies to one of the parameters of the distribution of z, usually the location. The remaining parameters of the distribution of z are parameters of this link function. The link parameters are estimated by local linearisation of the link function, which extends the model to an approximate generalised linear mixed model including linear contributions of the link parameters. All parameters of the model are estimated simultaneously by iterative reweighted REML. It is feasible to analyse fairly general models for the parameters of the distribution of z, in particular its location and scale parameters.

A. Keen, B. Engel

Towards a General Robust Estimation Approach for Generalised Regression Models

This paper presents a general methodology for robust estimation of the coefficients as well as the variance-covariance parameters of an econometric model, combining the optimal B-robust estimation available for M-estimators with an original extension of the bounded-influence methods recently developed for variance components models. Special attention is given to the application of this procedure in the context of panel data models, particularly for the best-known error component and random coefficient models.

Jayalakshmi Krishnakumar

Comparing Local Fitting to Other Automatic Smoothers

Two automatic local fitting procedures are compared by simulation. They appear to exhibit small-sample behaviour similar to that of other, more popular smoothing methods.

Michael Maderbacher, Werner G. Müller

Iterative Reweighted Partial Least Squares Estimation for GLMs

We extend the concept of partial least squares (PLS) to the framework of generalized linear models. These models form a sequence of rank-one approximations useful for predicting the response variable when the explanatory information is severely ill-conditioned or ill-posed. An iterative reweighted PLS algorithm is presented along with various theoretical properties. Connections to principal component and maximum likelihood estimation are made, as well as suggestions for rules to choose the proper rank of the final model.

Brian D. Marx

Protective Estimation of Longitudinal Categorical Data With Nonrandom Dropout

Longitudinal categorical data, subject to dropout, are analyzed using a protective estimator, assuming dropout depends on the unobserved outcomes only. Necessary and sufficient conditions are derived in order to have an estimator in the interior of the parameter space. A variance estimator is proposed. The method is illustrated using data taken from a psychiatric study.

Bart Michiels, Geert Molenberghs

Analysis of Counts Generated by an Age-dependent Branching Process

Analytical expressions for the implementation of likelihood inference for data generated by an age-dependent branching process are available for only very few special cases. On the other hand, the generation of simulated data from such a process is extremely simple. This suggests the use of Monte Carlo methods for the implementation of likelihood inference. This work compares the results obtained by Monte Carlo methods with those obtained with the aid of symbolic algebra software, in the context of an application from colon cancer prevention. The study suggests that Monte Carlo analysis can provide accurate approximations in an environment of flexible interactive analysis using standard software.

Salomon Minkin

Quantitative Risk Assessment for Clustered Binary Data

The effect of misspecifying the parametric response model for a univariate clustered binary outcome from a toxicological study on the assessment of dose effect and on estimating a virtually safe dose is investigated. Marginal and conditional models are contrasted.

Geert Molenberghs, Lieven Declerck, Marc Aerts

Optimal Design and Lack of Fit in Nonlinear Regression Models

This paper points out that so-called optimal designs for nonlinear regression models are often limited when the assumed model function is not known with complete certainty and argues that robust designs - near optimal designs but with extra support points - can be used to also test for lack of fit of the model function. A simple robust design strategy - which has been implemented with a popular software package - is also presented and illustrated.

Timothy E. O’Brien

Nonparametric Regression, Kriging and Process Optimization

Thin plate splines and kriging models are proposed as methods for approximating unknown response functions in the context of process optimization. Connections between the methods are discussed and implementation of the models using S-PLUS is described. Results are presented from a simulation study comparing the methods and further necessary work is identified.

M. O’Connell, P. Haaland, S. Hardy, D. Nychka

Frailty In Multiplicative Intensity Models

In multiplicative intensity models, using the counting process approach to survival analysis, the intensity function may depend on covariates (risk variables). However not all risk variables may be known or measurable. An unknown or unmeasurable covariate is usually called individual heterogeneity or frailty. When more than one failure time is obtained for each individual, frailty is a common factor among such repeated failure times. Here, multiplicative intensity models including random frailty effects are constructed. The method is programmed via a GLIM macro, and a real data set is analyzed.

GH. R. Oskrochi

Methods for Assessing the Adequacy of Probability Prediction Models

There is an increasing interest in the development of statistical models which use a set of predictor variables in order to estimate the probability that a subject has a particular health outcome. Validation of these probability prediction models requires an assessment of how well the model is able to predict the outcomes of interest. Much of the work to date has considered the case of K = 2 distinct outcomes and a large candidate set of predictors. We consider the more general case with K ≥ 2 outcomes and argue that the model needs to be validated on an individual basis as well as on a marginal basis. We propose some methods for these purposes and describe a new chi-square statistic for assessing goodness of fit. Two examples are given; one example applies these methods to a probability prediction model for predicting sets of symptoms associated with benign prostatic hyperplasia and the other example uses a logistic regression model.

Joseph G. Pigeon, Joseph F. Heyse

Forecast Methods in Regression Models for Categorical Time Series

We are dealing with the prediction of forthcoming outcomes of a categorical time series. We will assume that the evolution of the time series is driven by a covariate process and by former outcomes and that the covariate process itself obeys an autoregressive law. Two forecasting methods are presented. The first is based on an integral formula for the probabilities of forthcoming events and by a Monte Carlo evaluation of this integral. The second method makes use of an approximation formula for conditional expectations. The procedures proposed are illustrated by an application to data on forest damages.

Helmut Pruscha

Standard Errors, Correlations and Model Analysis

In recent years there has been a great increase in the computing power and user-friendliness of packages that may be used for statistical analysis. One effect of these developments has been to bring progressively more sophisticated models within the scope of supposedly routine analysis by people who may or may not be adequately trained in statistics. Often a major purpose of such analysis is to carry out parsimonious model selection to fit a set of data. The purpose of this paper is to illustrate the need for caution when interpreting standard errors and other diagnostics used to guide the selection process.

K. L. Q. Read

Mean and Dispersion Additive Models: Applications and Diagnostics

This paper presents further applications and diagnostics of the ‘Mean and Dispersion Additive Model’ or ‘MADAM’. This is a flexible model for the mean and variance of a dependent variable in which the variance is modelled as a product of the dispersion parameter and a known variance function of the mean, and the mean and dispersion parameters are each modelled as functions of explanatory variables using a semi-parametric additive model. MADAMs are fitted using a successive relaxation algorithm which alternates between mean and dispersion model fits until convergence, providing diagnostics for each model. It is shown in the appendix that the algorithm maximises the penalised extended quasi-likelihood of the MADAM.

Robert A. Rigby, Mikis D. Stasinopoulos
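The successive relaxation idea can be sketched in a stripped-down form: alternate between (1) a weighted fit of the mean with weights 1/φ and (2) updating the dispersion from squared residuals, until the two fits stabilise. The toy model below (a common mean with a group-specific dispersion, invented data) is far simpler than a MADAM, but the alternation is the same.

```python
import statistics

def fit_mean_dispersion(y, group, n_iter=50):
    """Alternate: (1) weighted mean with weights 1/phi_g, (2) update each
    group's dispersion phi_g from squared residuals."""
    groups = sorted(set(group))
    phi = {g: 1.0 for g in groups}
    mu = statistics.fmean(y)
    for _ in range(n_iter):
        # Mean step: weighted least squares with current dispersions
        w = [1.0 / phi[g] for g in group]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        # Dispersion step: mean squared residual within each group
        for g in groups:
            res = [(yi - mu) ** 2 for yi, gi in zip(y, group) if gi == g]
            phi[g] = statistics.fmean(res)
    return mu, phi

y =     [9.9, 10.1, 10.0, 9.8, 10.2, 7.0, 13.0, 12.0, 8.0]
group = ["a", "a",  "a",  "a",  "a",  "b", "b",  "b",  "b"]
mu, phi = fit_mean_dispersion(y, group)
```

The precise group "a", being far less dispersed, ends up dominating the weighted mean, which is exactly the behaviour the joint mean-dispersion modelling is meant to deliver.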

Computational Aspects in Maximum Penalized Likelihood Estimation

In this paper we describe the technical details for implementing maximum penalized likelihood estimation (MPLE). This includes description of software for fitting weighted cubic smoothing splines, which constitute building blocks in MPLE. An example is given for illustration.

Ori Rosen, Ayala Cohen

Risk Estimation Using a Surrogate Marker Measured with Error

Our goal is to estimate the risk of contracting Pneumocystis carinii pneumonia (PCP) in a fixed time interval based on the current observed CD4 count for an individual. The methodology used involves a linear random effects model for the trajectory of the observed CD4 counts in order to obtain predicted values at each event time. These predicted counts are imputed in the partial likelihood for estimating a regression coefficient in the Cox model. Subsequently, Monte Carlo techniques are employed to approximate the risk of PCP in a six month period. The method will be illustrated on data from AIDS patients.

Amrik Shah, David Schoenfeld, Victor De Gruttola

Estimating Attributable Risks under an Arithmetic Mixture Model

The concept of attributable risks can be used to estimate the number of lung cancers attributable to residential radon in a population. Recent studies indicate that smoking might modify the effect of radon on lung cancer. In this paper, three different approaches for incorporating this kind of information into attributable risk calculations, one of which is using an arithmetic mixture model, are presented and discussed. For illustration purposes, a risk assessment for the population of the former West Germany is conducted.

Karen Steindorf, Jay Lubin

A Nonparametric Method for Detecting Neural Connectivity

We graphically apply a method (the Mip-method) of inference on dependence among the firing times of a biological neural network. We compare this to the traditional method based on the cross correlogram (CCH). In the analysis of firing times from two artificial networks it is shown that the CCH can be a very poor device for short as well as arbitrarily long series of observations, while the Mip-method recovers the dependence clearly. We further analyse an empirically obtained series and thus illustrate that the proposed method can be a useful alternative to cross correlation analysis.

Klaus J. Utikal

On The Design Of Accelerated Life Testing Experiments

This paper considers the design of constant stress accelerated life testing experiments, using a recently outlined framework based on various classes of stress-parameter models. For brevity, we focus on the two parameter Weibull distribution, and the use of likelihood-based estimation techniques. We present formulae useful in various numerical methods, and discuss the role of the Fisher information in the design of experiments. We also present an illustrative example of experimental design, and briefly discuss the validation of proposed designs by stochastic simulation.

A. J. Watkins

Splitting Criteria in Survival Trees

A new splitting criterion is explored for the tree-based method of analysis of censored data. This criterion is an extension of those used in Classification and Regression Trees (CART). It can also be applied to analyze data with multiple responses of mixed types that arise frequently in practice. We use a simple simulation experiment to compare various splitting criteria and propose a performance score to measure the capability of the splitting criteria for discovering the data structure.

Heping Zhang

The Different Parameterizations of the GEE1 and the GEE2

The purpose of this paper is to give a systematic presentation of the various Generalized Estimating Equation (GEE) approaches. They can be derived by using the Pseudo Maximum Likelihood (PML) approach which has been extensively discussed by Gourieroux and Monfort (1993). Furthermore, it is shown that the Generalized Method of Moments (Hansen, 1982) can be applied to obtain estimators which are asymptotically equivalent to the GEE estimators.

Andreas Ziegler
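A useful sanity check on GEE1: with an identity link and a working-independence covariance, the estimating equation Σᵢ Dᵢ'Vᵢ⁻¹(yᵢ − μᵢ) = 0 collapses to the ordinary least-squares normal equations X'(y − Xβ) = 0. The tiny dataset below is arbitrary; the sketch only verifies this special-case reduction, not the general GEE machinery.

```python
def solve_2x2(A, b):
    # Cramer's rule for a 2x2 system A beta = b
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def gee_identity_independence(y, X):
    """GEE with identity link and working independence: the estimating
    equation X'(y - X beta) = 0 has the closed-form OLS solution."""
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]
    return solve_2x2(XtX, Xty)

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # intercept + slope
y = [1.0, 2.9, 5.1, 7.0]
beta = gee_identity_independence(y, X)
```

With a non-identity working correlation or a nonlinear link the equations no longer have this closed form, which is where the different GEE1/GEE2 parameterizations discussed in the paper start to diverge.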

