
## About this book

The papers assembled in this volume were presented at COMPSTAT 1988, the 8th biennial Symposium in Computational Statistics held under the auspices of the International Association for Statistical Computing. The current impact of computers on the theory and practice of statistics can be traced at many levels: on one level, the ubiquitous personal computer has made methods for exploratory data analysis and display, rarely even described in conventional statistics textbooks, widely available. At another level, advances in computing power permit the development and application of statistical methods in ways that were previously infeasible. Some of these methods, for example Bayesian methods, are deeply rooted in the philosophical basis of statistics, while others, for example dynamic graphics, present the classical statistical framework with quite novel perspectives. The contents of this volume provide a cross-section of current concerns and interests in computational statistics. A dominating topic is the application of artificial intelligence to statistics (and vice versa), where systems deserving the label "expert systems" are just beginning to emerge from the haze of good intentions with which they have hitherto been clouded. Other topics that are well represented include nonparametric estimation, graphical techniques, algorithmic developments in all areas, projection pursuit and other computationally intensive methods. COMPSTAT symposia have been held biennially since 1974. This tradition has made COMPSTAT a major forum for advances in computational statistics, with contributions from many countries in the world. Two new features have been introduced at COMPSTAT '88.

## Table of contents

### Parallel Linear Algebra in Statistical Computations

The main problem in parallel computation is to get a number of computers to cooperate in solving a single problem. The word “single” is necessary here to exclude the case of processors in a system working on unrelated problems. Ideally we should like to take a problem that requires time T to solve on a single processor and solve it in time T/p on a system consisting of p processors. We say that a system is efficient in proportion as it achieves this goal.

G. W. Stewart
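As a rough illustration of the efficiency notion in the abstract above, the following sketch computes speedup and efficiency from timings. The function names and the sample timings are hypothetical, and the usual definitions (speedup = T / T_p, efficiency = speedup / p) are assumed rather than taken from the paper; under them the ideal time T/p corresponds to an efficiency of 1.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Ratio of single-processor time T to the time on the parallel system."""
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, p: int) -> float:
    """Speedup per processor; equals 1.0 when the parallel time is exactly T/p."""
    if p < 1:
        raise ValueError("p must be a positive number of processors")
    return speedup(t_single, t_parallel) / p


if __name__ == "__main__":
    # Hypothetical timings: T = 100 s on one processor, 40 s on p = 4.
    print(efficiency(100.0, 40.0, 4))
```

An efficiency well below 1 indicates that communication or idle time is absorbing part of the added processing power.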

### Efficient Nonparametric Smoothing in High Dimensions Using Interactive Graphical Techniques

Smoothing techniques are used to reduce the variability of point clouds. There is great interest not only among applied statisticians but also among applied workers in biostatistics, economics and engineering to model the data in a nonparametric fashion. The benefits of this more flexible modeling come at the cost of greater computation, especially in high dimensions. In this paper several possibilities of smoothing in high dimensions are described using additive models. The algorithms for solving the nonparametric smoothing problems are based on WARPing, i.e. Weighted Averaging using Rounded Points. Interactive graphical techniques are a conditio sine qua non for tuning and checking the structure of lower dimensional projections of the data and of smooths produced by the algorithms. Applications of the WARPing technique to a side impact study are shown by smoothing in Projection-Pursuit-type models using Average Derivative Estimation.

W. Härdle
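The idea of Weighted Averaging using Rounded Points can be sketched in one dimension as follows. This is only an illustrative sketch, not the algorithm of the paper: the function name `warp_smooth`, the triangular weights and the parameter `m` (number of small bins per bandwidth) are assumptions, and the high-dimensional additive-model setting is not covered.

```python
import math
from collections import defaultdict


def warp_smooth(x, y, h, m=5):
    """Binned regression smoother in the spirit of WARPing (1-D sketch).

    x, y : sequences of equal length
    h    : bandwidth
    m    : small bins per bandwidth, so the bin width is delta = h / m
    Returns a dict mapping bin centre -> smoothed value.
    """
    delta = h / m
    counts = defaultdict(int)
    sums = defaultdict(float)
    # Step 1: round every x to the index of its small bin.
    for xi, yi in zip(x, y):
        j = math.floor(xi / delta)
        counts[j] += 1
        sums[j] += yi
    if not counts:
        return {}
    # Step 2: triangular weights defined on the bin-index scale (assumed kernel).
    weights = {l: 1.0 - abs(l) / m for l in range(-(m - 1), m)}
    # Step 3: weighted averaging of bin sums and bin counts.
    smooth = {}
    for j in range(min(counts), max(counts) + 1):
        num = sum(w * sums.get(j + l, 0.0) for l, w in weights.items())
        den = sum(w * counts.get(j + l, 0) for l, w in weights.items())
        if den > 0:
            smooth[(j + 0.5) * delta] = num / den
    return smooth
```

The point of the rounding step is that the cost of the averaging step depends on the number of bins rather than on the number of observations.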

### A Boundary Modification of Kernel Function Smoothing, with Application to Insulin Absorption Kinetics

Kernel function smoothing of observations lying around a regression function is a procedure with marked boundary effects, and therefore specific consideration of boundary problems is important. A traditional solution is the construction of kernels designed specifically for use at the boundary. These are derived assuming that the design is continuous. The problem is solved for minimum variance kernels, but not for optimal kernels. In practice, the design is discrete and the boundary kernels may give large variance and biased results even for linear functions. A new procedure, the so-called time-adjustment, is proposed. This procedure can be applied to estimates both with and without boundary modifications and is an improvement in both cases.

P. Hougaard

### A Roughness Penalty Regression Approach for Statistical Graphics

A discrete version of the spline smoothing technique is developed to deal with curve estimation problems when the errors have stationary, but not necessarily independent stochastic structure. Both autoregressive and moving average error structures are considered. The use of band matrix manipulations makes it possible to construct linear time algorithms in both cases.

M. G. Schimek

### Detecting Structures by Means of Projection Pursuit

In this paper, the consideration of projection pursuit for testing the presence of clusters is based on the model of a mixture of ellipsoidally symmetric unimodal densities. It is shown that under this model the use of projection indices based on Renyi entropy or on third or fourth moments results in an estimate of the discriminant subspace. For estimating the values of the Renyi indices, some forms of the order statistics are used. For detecting outliers, the ratio of the standard variance estimate to a robust one is proposed as a projection index. Indices for the discriminant analysis problem are introduced.

I. S. Yenyukov
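The outlier index mentioned in the abstract above, a ratio of the standard variance estimate to a robust one, might look like the sketch below for a single projection. The abstract does not say which robust estimate is used; the scaled median absolute deviation (factor 1.4826, consistent for normal data) is an assumption made here, as is the function name `outlier_index`.

```python
import statistics


def outlier_index(projected):
    """Ratio of the sample variance to a robust variance estimate.

    projected : data already projected onto one direction.
    The robust scale is 1.4826 * MAD, an assumed choice.
    """
    med = statistics.median(projected)
    mad = statistics.median(abs(v - med) for v in projected)
    if mad == 0:
        raise ValueError("robust scale is zero; index undefined")
    robust_var = (1.4826 * mad) ** 2
    return statistics.variance(projected) / robust_var


if __name__ == "__main__":
    clean = [-2, -1, 0, 1, 2]
    contaminated = clean + [50]
    # The index grows sharply once a gross outlier enters the projection.
    print(outlier_index(clean), outlier_index(contaminated))
```

In a projection pursuit setting this quantity would be maximized over projection directions; the search itself is not shown.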

### Confidence Regions for Projection Pursuit Density Estimates

Multivariate Projection Pursuit Density Estimation (PPDE) does not suffer from the “curse of dimensionality” as the more classical kernel density estimation does. However, a means of evaluating its stability and precision is needed, and this paper shows how the bootstrap can provide certain useful confidence intervals. The method used for constructing them starts with a pre-pivoting process.

E. Elguero, S. Holmes-Junca

### A Robustness Property of the Projection Pursuit Methods in Sampling from Separably Dependent Random Vectors

A purpose of this paper is to point out that the dependence of the observations may improve the capability of a P.P.M. (Projection Pursuit Method) in detecting clusters in the projected observation values. To this end, the mean vector and the covariance matrix of the r.v. (random variables) representing the projected observation values are calculated under the assumptions of the Intrinsic Inference Model for Finite Populations of Separably Dependent Random Vectors. Comparison with the mean vector and the covariance matrix under the assumption of a population of independent r.vt. (random vectors) shows the effect of the dependence. It is proved, at least for the case of normal r.v., that the probability that the projected observation values are separated into two distinct clusters is greater in the case of dependent r.vt. than in the case of independent r.vt.

B. Baldessari, F. Gallo

### Graphical Modelling with Large Numbers of Variables: An Application of Principal Components

Certain issues concerned with graphical modelling with large numbers of variables are discussed. A rudimentary form of initial model selection and testing is proposed in the context of covariance selection models, similar in spirit to the screening procedure of Kreiner (1987), which avoids the explicit fitting of any graphical model. It is conjectured that a useful guide to assess the performance of the model is to compare its predictive power against that of principal components. This is illustrated by an example from a data set with 30 continuous variables.

J. Whittaker, A. Iliakopoulos, P. W. F. Smith

### Some Graphical Displays for Square Tables

Social mobility is one of the most studied topics in social science. In this context, the data commonly considered are square tables, either simple or stratified. The availability of statistical software (GAUSS, GLIM, ...) makes it possible to consider exploratory methods providing low-dimensional graphical displays along with baseline log-linear models. Recently there have been attempts to combine these two strategies in order to serve various objectives: residual analysis, model selection, model description (BACCINI et al. (1987), CAUSSINUS and FALGUEROLLES (1987), HEIJDEN and LEEUW (1985), HEIJDEN (1987), WORSLEY (1987)). In this article, we present on real data some of the issues raised by this combined approach. We also use some results on separability (MATHIEU (1987)) to consider models of marginal homogeneity.

A. de Falguerolles, J. R. Mathieu

### Data Plotting Methods for Checking Multivariate Normality and Related Ideas

The object of the paper is to find directions along which multivariate observations have the greatest multivariate skewness or kurtosis in an appropriate sense. Typical measures of multivariate skewness and kurtosis are Malkovich and Afifi’s (1973) b1* and b2*, which are essentially nonlinear maximization problems. To avoid this nonlinearity we present an approach to reduce the above problems to easier ones, which are eigenvalue problems in linear algebra and closely related to some types of measures of multivariate skewness and kurtosis. By using the resultant directions we can project observations into the sample space and check normality of the data through probability plots and scatter plots. We also show that the proposed approach enables us to extend the usual principal component analysis to a higher order case.

T. Isogai

### Computer-Aided Illustration of Regression Diagnostics

In this paper we describe a microcomputer system developed by the Editors for the illustration of various concepts related to regression analysis. Particular attention is paid to the diagnostics for analysing influential observations and outliers. Naturally, graphical methods play a central role in this illustration. The system is mainly intended for a student taking a course in regression analysis or a person who is applying regression analysis and wants to know the meaning of diagnostics in practice. The programming language is APL, and thus a user familiar with APL can easily extend the system with their own functions. The system is implemented on an Apple Macintosh personal microcomputer and is part of a larger system, called KONSTA 88, which is planned for illustrating statistical concepts.

T. Nummi, M. Nurhonen, S. Puntanen

### Computer Guided Diagnostics

The battery of diagnostic techniques developed during the past ten years to find influential observations, assess collinearity, check for data transformations, etc. can overwhelm an expert data analyst, let alone a novice. We are working on software systems that guide the analyst through these procedures and aid in interpreting the results. The tools we use are the S statistical analysis system from Bell Laboratories and HyperCard from Apple Computer. We discuss examples involving collinearity (New S) and solicitation of prior data information for use in diagnostic procedures (HyperCard).

D. A. Belsley, A. Venetoulias, R. E. Welsch

### How Should the Statistical Expert System and its User See Each Other?

This paper draws on experience that my colleagues and I have had in constructing GLIMPSE, a knowledge-based front-end for GLIM. GLIM is a statistical package that facilitates the specification, fitting, and checking of the class of generalized linear models (McCullagh and Nelder, 1983). It has its own interpretive language, sufficiently powerful to allow the user to program his own non-standard analyses if a model falls outside the built-in set. GLIM gives the user little on-line syntactic help (how to do things), and almost no semantic help (what to do), except to comment on unsuitable models which produce, for example, negative fitted values when these must be positive. Front ends are designed to remedy these deficiencies, by providing help of both kinds. The result, in the case of GLIMPSE, is a system with three-way communication taking place between the user, the front-end and GLIM itself, which serves as the algorithmic engine.

J. A. Nelder

### Towards a Probabilistic Analysis of MYCIN-like Expert Systems (Working Paper)

A formal apparatus is developed here for the study of the compatibility of MYCIN-like expert systems with axioms of probability theory. Some partial results are presented; but the main stress is on open problems.

P. Hájek

### An Expert System Accepting Knowledge in a Form of Statistical Data

The bottleneck in applying most expert system shells nowadays lies in knowledge acquisition. Experts are forced to express their knowledge in a formalized way acceptable to computers. Most of this burden lies on the shoulders of knowledge engineers and, besides being tedious, it drags an unwanted subjective factor into the process. Therefore, some authors have recently started to look for methods of automatic knowledge acquisition from statistical data.

R. Jiroušek, O. Kříž

### Building a Statistical Expert System with Knowledge Bases of Different Levels of Abstraction

Statistical analysis is predetermined by the way a (prospective) experiment is planned or data are collected in a (retrospective) study. The a-priori knowledge of observable, theoretical and hypothetical relations (WITTKOWSKI 1987) determines the semantically meaningful database activities and statistical analyses. For instance, relations between variables and types of observational units may be used to determine whether or not the meaning of a value depends on values of other variables. The models underlying the statistical methods are determined by theoretical knowledge on the sampling strategy of factors, scales, and constraints (WITTKOWSKI 1985). For confirmatory analyses, the primary goal (hypothesis) needs to be specified at the time the sample size is computed.

K. M. Wittkowski

### An Expert System for the Interpretation of Results of Canonical Covariance Analysis

The expected knowledge of the mathematical and statistical properties of a data analysis method or technique in the sample of users of that method or technique usually decreases as the sample space increases. This is especially true for apparently simple and intuitively easily understandable methods, like Canonical Covariance Analysis (Momirovic, Dobric and Karaman, 1983; Momirovic and Dobric, 1985; Dobric, 1986). Unfortunately, the knowledge of a typical user of CCA is seldom sufficient even for a technical interpretation of the obtained results. For that reason an expert system has been written in GENSTAT to help with interpretation for users who do not understand the real meaning of the parameters and of the set of pattern, structure, cross-structure and latent variable intercorrelation and cross-correlation matrices computed to identify the content of the latent variables.

K. Momirović, J. Radaković, V. Dobrić

### Building a Statistical Knowledge Base: A Discussion of the Approach Used in the Development of THESEUS, a Statistical Expert System

Knowledge acquisition is one of the major problems in developing any expert system. It is recognised that different methods of knowledge acquisition should be used for eliciting different types of knowledge and that the usual approach of dialogue sessions between a domain expert and a knowledge engineer is not always appropriate.

E. Bell, P. Watts

### PRINCE: An Expert System for Nonlinear Principal Components Analysis

Some statistical programs have so many options that users find it hard to choose and specify the analysis they want. Sometimes incorrect options are chosen and at other times possibilities are overlooked that could have been very useful. Programs for multivariate analysis techniques are especially notorious for their complexity. In recent years, statistical expert systems have been suggested as a way to solve these problems (Gale, 1986). When an expert system is developed with the explicit purpose of advising on an existing statistical program, it is useful to establish a link between the system and the program. The expert system can thus give advice on statistical matters related to this program and can generate the control language for the existing program. This implies that the conclusions drawn by the system should not only be displayed to the user but must also be translated into actual control language. The ultimate goal of the expert system is the generation of control language that accurately reflects the demands of the user.

I. J. Duijsens, T. J. Duijkers, G. M. van den Berg

### Expert Systems for Non-Linear Modelling: Progress and Prospects

Expert systems can help scientists to use non-linear models effectively. The development of the model-fitting program MLP aims to provide the user with helpful advice at all stages of model choice, model fitting and interpretation of results.

G. J. S. Ross

### Inside a Statistical Expert System: Statistical Methods Employed in the ESTES System

In this paper we describe the statistical methods and their organization in a statistical expert system called ESTES. The system is intended to provide guidance for an inexperienced time series analyst in the preliminary analysis of time series. The knowledge base (i.e. statistical methods) of the system has been organized according to the property being considered, the granularity of the analysis process, the goal of the analysis and the user experience. The explanation capability of the ESTES system provides justifications of the statistical methods used in the reasoning process.

P. Hietala

### An Implementation of an EDA Expert System in Prolog Environment

An experimental project of a system for exploratory data analysis (EDA), called GUHA-80, was described in [7]. Although discussions on the project were extensive and deep, the system was never implemented, since the project seemed to be too complex with respect to both data structures and control of the reasoning process. In the present paper we argue that a Prolog environment could be good enough for representing both the object-oriented data structures necessary to implement the concepts defined in the project and the control strategies.

P. Jirků

### Automatic Acquisition of Knowledge Base from Data without Expert: ESOD (Expert System from Observational Data)

The acquisition of knowledge from experts is nowadays the most demanding activity in the construction of expert systems. Hence the effort to automate it [5,11]. In 1984–86 we developed the ESOD expert system shell [7,8,12], in which the knowledge base is acquired automatically from observational data.

J. Ivánek, B. Stejskal

### Experiments with Probabilistic Consultation Systems

This paper describes simulation experiments with probabilistic consultation systems, also called expert systems. The aim was to test the relative performance of systems with inference algorithms based on different assumptions. The approach used was simulation of task generation, knowledge acquisition, and task solving. The main results found were that a rule value approach seems to be superior to a goal driven approach, that the use of the complete Bayes’ formula improves the quality of the task solutions, that ignorance of statistical dependence among antecedents makes the estimated probability of the predictions useless as a reliability indicator, but that models taking the dependencies into account can be designed.

S. Nordbotten

### Statistical Consultants and Statistical Expert Systems

This paper contains a critical evaluation of the value of expert systems for statistical consultation. Section 1 presents the steps that should ideally be taken when a researcher consults an expert statistician. Next, statistical expert systems are considered. It is concluded that they can be fruitfully used for enhancement of the knowledge of the statistician, but that a researcher who consults only a computerized system rather than a human statistician will obtain suboptimal help. The technical knowledge of the statistician may perhaps be incorporated into a knowledge base, but not the subtle dialogue of client and statistician, with its interplay of substantive and statistical considerations.

I. W. Molenaar

### On Inference Process

“An Expert system is a computing system capable of representing and reasoning about some knowledge-rich domain ... with a view to solving problems and giving advice.” (Jackson, 1986, p. 1). Building such a system seems to be a manageable task considering the many expert system shells which are now available.

Th. Westerhoff, P. Naeve

### Identification Keys, Diagnostic Tables and Expert Systems

There are many parallels between the methodology of identification keys and diagnostic tables, and the new methodology of expert systems. For example, the standard identification key identifies specimens from a known set of taxa by applying tests sequentially in a hierarchical manner; this structure is identical to that used in the many expert systems that have a simple deterministic hierarchy of questions leading to a conclusion — which in a statistical context might take the form of recommending some form of analysis. Likewise the systems where the conclusion is determined by comparing an observed set of conditions against a theoretical set of rules use a similar method to that employed by the user of a diagnostic table. Even the expert systems where there is a network of nodes of menus or questions have their parallels in the on-line identification systems that have been developed for botanical and other biological work, and some of these on-line systems allow the user to modify the data base of taxonomic information to take account of new information about the taxa, so that the system is able to learn with experience. Consequently work on methods of constructing efficient keys and tables is very relevant also to expert systems.

R. W. Payne

### Adding New Statistical Techniques to Standard Software Systems: A Review

The paper gives an informal review of the facilities available for adding new statistical methods to the major statistical systems. Evaluation is based on the capacity of these systems to incorporate new techniques developed for the problem of ecological inference.

C. Payne, N. Cleave, P. Brown

### Funigirls: A Prototype Functional Programming Language for the Analysis of Generalized Linear Models

Statistical analysis is most readily carried out with the aid of a statistical package; there are of course numerous well-established packages, most now available both in main-frame and micro (usually IBM PC compatible) formats. In addition to routine analyses, the professional statistician will also wish to develop his/her own procedures. This necessitates either the use of a package with some form of programming facility (looping, branching, etc.), or the use of a high level language. Both approaches suffer from disadvantages. Packages tend not to be designed with the aim of extensibility in mind; thus even with the most powerful package ‘languages’ (e.g. SAS, GLIM), programming is something of an adventure (and an unstructured one at that). An illustration is the widespread industry of writing GLIM macros to bend the GLIM program into the analysis of models outside the standard GLM framework. Such programming can be very tricky, numerous publications result, but try understanding someone else’s macros!

R. Gilchrist, A. Scallan

### BLINWDR: An APL-Function Library for Interactively Solving the Problem of Robust and Bounded Influence Regression

The library BLINWDR represents a coordinated collection of APL-functions for robust and bounded influence regression analysis. It combines important features of robust analysis with the appealing matrix-oriented interactive programming language APL. The paper outlines the structure of the modules and their possible use. Some related problems like L1-estimation, finding robust covariance matrices and computing high breakdown point estimates are also touched upon.

R. Dutter

### Exact Non-Parametric Significance Tests

A very rich class of non-parametric two-sample tests are of the form $$d(x) = \sum_{i=1}^{k} a_i(m_{i-1}, x_i)$$ where the $x_i$'s are the entries in row 1 of the $2 \times k$ contingency table $x$, $m_i = x_1 + x_2 + \dots + x_i$, and $a_i(m_{i-1}, x_i)$ is a real valued function. An important special case arises when $a_i(m_{i-1}, x_i) = a_i x_i$ which, by suitable choice of scores $a_i$, yields the class of linear rank tests for the two sample problem.

C. R. Mehta, N. R. Patel, P. Senchaudhuri
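To make the form of the statistic in the abstract above concrete, the following sketch evaluates d(x) for the first row of a 2 x k table. The function names `d_statistic` and `linear_rank` are hypothetical, the score function is passed the index i explicitly, and m_0 = 0 (the empty sum) is assumed. Only the evaluation of the statistic is shown, not the computation of its exact null distribution.

```python
def d_statistic(row1, a):
    """Evaluate d(x) = sum over i of a(i, m_{i-1}, x_i).

    row1 : entries x_1, ..., x_k of row 1 of the 2 x k table
    a    : callable a(i, m_prev, x_i) returning a real number,
           where m_prev = x_1 + ... + x_{i-1} and m_0 = 0 (assumed)
    """
    total = 0.0
    m_prev = 0
    for i, x_i in enumerate(row1, start=1):
        total += a(i, m_prev, x_i)
        m_prev += x_i  # m_i = m_{i-1} + x_i
    return total


def linear_rank(scores):
    """Special case a_i(m_{i-1}, x_i) = a_i * x_i for given scores a_1..a_k."""
    return lambda i, m_prev, x_i: scores[i - 1] * x_i


if __name__ == "__main__":
    # Hypothetical table row and scores.
    print(d_statistic([1, 0, 2], linear_rank([1, 2, 3])))
```

Different choices of the scores passed to `linear_rank` give different members of the linear rank family.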

### Resampling Tests of Statistical Hypotheses

A direct approach to statistical tests of significance is possible using bootstrap techniques to simulate the null distribution of the test statistic of interest. The approach is outlined by Hinkley (1988) and by Young (1986). In this paper issues involved in the construction of such tests are considered in the context of testing the mean of a univariate population. The general method is summarized in Section 2, while questions relating to choice of reference distribution and test statistic are considered and illustrated empirically in Section 3. Section 4 discusses the importance of appropriate conditioning in resampling tests of significance.

A. Young
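A minimal sketch of a resampling test for a univariate mean is given below. It fixes one particular combination of the choices the abstract flags as open questions: the reference distribution is the empirical distribution shifted so that its mean equals the hypothesized value, and the test statistic is the unstudentized absolute difference. Both are assumptions made for illustration, as are the function name and parameters.

```python
import random
import statistics


def bootstrap_mean_test(data, mu0, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for H0: population mean equals mu0."""
    rng = random.Random(seed)
    n = len(data)
    xbar = statistics.fmean(data)
    observed = abs(xbar - mu0)
    # Impose the null hypothesis: shift the sample so its mean is mu0.
    null_sample = [x - xbar + mu0 for x in data]
    exceed = 0
    for _ in range(n_boot):
        resample = rng.choices(null_sample, k=n)  # sampling with replacement
        if abs(statistics.fmean(resample) - mu0) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_boot + 1)
```

A studentized statistic or a different reference distribution would slot into the same loop.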

### Clustering Based on Neural Network Processing

Artificial neural networks, having been popular in the fifties and sixties, recently received a new wave of interest (Anderson 1986; Materna 1987; Fahlman & Hinton 1987). Neural network computing, also known as connectionism, is inspired by brain theory and is based on interconnecting a large number of simple processing elements called ‘neurons’, which cooperate in the computations. Essentially such a neuron adds up the weighted input on its input line, pushes the resulting net input through a (possibly non-linear) transfer function and communicates the output via its output lines to all other neurons to which it is connected. Mathematically the static part of a neural network is nothing but a ‘network’ in graph theory, i.e. a labelled weighted directed graph, whose vertices correspond to the neurons and whose weighted arcs represent the network connections. A network state corresponds to a set of weights for the vertices. The static description of a neural network is complemented by the specification of the dynamics governing its state changes.
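The description above can be sketched as a weighted directed graph with one state update. The dynamics are not specified in the abstract, so the synchronous update used here is an assumption, as are the names `step`, `state` and `arcs`.

```python
import math


def step(state, arcs, transfer=math.tanh):
    """One synchronous state update of a small neural network.

    state    : dict vertex -> current value (the network state)
    arcs     : dict (source, target) -> weight (the weighted directed graph)
    transfer : transfer function applied to each neuron's net input
    """
    # Each neuron adds up the weighted input arriving on its input lines.
    net = {v: 0.0 for v in state}
    for (src, dst), w in arcs.items():
        net[dst] += w * state[src]
    # The net input is pushed through the transfer function.
    return {v: transfer(net[v]) for v in state}


if __name__ == "__main__":
    # Hypothetical two-neuron network with a single arc a -> b.
    print(step({"a": 1.0, "b": 0.0}, {("a", "b"): 2.0}))
```

Iterating `step` until the state stops changing is one simple way to realize the dynamics; other update schemes, such as asynchronous ones, are equally compatible with the description.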

### Decision Tree Classifier for Speech Recognition

Both the computational complexity and the massive storage requirements make word recognition strategies based on a word model infeasible for large-vocabulary speech recognition applications. One possible solution for continuous speech recognition is to recognize the basic phonetic units of the input speech. A component of an analytic recognition system is the acoustic-phonetic processor, whose purpose is to encode the speech signal into a string of discrete subword units, such as phones, diphones... In the present approach, segmentation is done prior to recognition. For that, statistically based methods are used to detect non-stationarities in the speech signal, and therefore the nature of the segmented units is not defined in advance. We deal with the “segments”, each of them being characterized by the energy, the first six cepstral coefficients of the signal and some indices summarizing the phoneticians' knowledge. We build a binary decision tree whose leaves are the phonemes of the French language (vowels, semi-vowels, consonants). The configuration of the tree has been settled using a training set for a given male speaker. We construct it in a bottom-up stepwise way, using a likelihood criterion. We cannot be sure that we get the “best” tree in terms of classification rate. Annealing methods allow us to improve the original configuration. At each node, we fit either logical rules or probabilistic rules using a logistic regression model. The system gives good results on test samples pronounced by the same speaker. The segmentation is expected to be reliable and speaker independent, but the characteristic variables are not speaker independent. We know that there are some invariants in the speech signal, particularly in the plane defined by the first two formants, whoever the speaker is. We are investigating different ways of easily adapting the rules for any speaker. It could be a small sample of artfully chosen words. This work is intended to be part of a complete analytic recognition system integrating different steps such as segmentation, vector quantization, unit recognition, lexical decoding and a linguistic analyser.

A. M. Morin

### Efficient Sampling Algorithms and Balanced Samples

The need for sampling algorithms in a finite population has greatly increased in recent years. New quick methods for drawing a sample from a data file have emerged. Their principal uses are not, however, in the design of survey samples, but lie in methodological work involving repeated sampling, for instance bootstrap, and simulation. Then, in the case of rejective sampling, where samples are drawn until some specific property is realized, efficient drawing becomes a necessary tool if the desired property has a very small probability.

J. C. Deville, J. M. Grosbras, N. Roth
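Rejective sampling as described in the abstract above, drawing samples until a specific property is realized, can be sketched as follows. The function name, the `max_draws` safeguard and the balance condition in the usage example are hypothetical, and simple random sampling without replacement is assumed as the underlying draw.

```python
import random


def rejective_sample(population, n, accept, rng, max_draws=100_000):
    """Draw simple random samples of size n until accept(sample) is true.

    Returns the accepted sample and the number of draws it took.
    """
    for draws in range(1, max_draws + 1):
        sample = rng.sample(population, n)  # without replacement
        if accept(sample):
            return sample, draws
    raise RuntimeError("no acceptable sample found within max_draws")


if __name__ == "__main__":
    rng = random.Random(1)
    population = list(range(1, 21))  # population mean is 10.5
    # Hypothetical balance condition: sample mean within 0.5 of 10.5.
    balanced = lambda s: abs(sum(s) / len(s) - 10.5) <= 0.5
    sample, draws = rejective_sample(population, 5, balanced, rng)
    print(sample, draws)
```

The number of draws is geometric in the acceptance probability, which is why the speed of a single draw matters when the desired property is rare.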

### Recursive Partition in Biostatistics: Stability of Trees and Choice of the Most Stable Classification

Structures found in data by exploratory techniques are notoriously unstable. Suppose that we search for a model within a given family and that we do this on different samples from the same population, D0, D1,..., DB. When only one data set is available, one can think of D0 as the original data set and the others as bootstrap samples from D0. Experience shows that one can be practically sure to find different models from different samples. A striking example of this model instability is given by Gong [1], in the context of stepwise logistic regression. The problem can be expected to be even more serious for tree-structured predictors, such as the RECPAM trees [2–4] which are the main concern of this work, since the model is selected out of a family much richer than that of linear regression as usually defined.

A. Ciampi, J. Thiffault

### Generating Rules by Means of Regression Analysis

This paper proposes a procedure by which the results of a regression analysis are “transformed” into a decision tree, or into decision rules. This may be useful either as a way of integrating regression analysis in a learning environment where the accumulated knowledge is represented symbolically, or simply as a way of deriving intuitive and psychologically meaningful explanations of a regression analysis. The proposed algorithm is illustrated within a clinical observational study on prognosis.

C. Berzuini

### A New Algorithm for Matched Case-Control Studies with Applications to Additive Models

Logistic models are commonly used to analyze matched case-control data. The standard analysis requires the computation of conditional maximum likelihood estimates. We propose a simple algorithm that uses a diagonal approximation for the (non-diagonal) weight matrix deriving from the Newton-Raphson method. The primary purpose of the new algorithm is to exploit iterative reweighted least-squares procedures for fitting general additive rather than simple linear structure.

T. Hastie, D. Pregibon

### An Algorithm for the Approximation of N-Dimensional Distributions

An algorithm for approximation and parameter estimation leading to L1 and L∞ optimization is presented, and for numerical computation Karmarkar’s linear programming algorithm is adapted. Keywords: approximation, estimation, linear programming, computer aided design, finite mixture distribution

J. Gordesch

### Further Recursive Algorithms for Multidimensional Table Computation

We briefly review some recursive algorithms in the literature, and see that most seem to fall into two broad classes: classical branch-and-bound backtracking, and mixed-radix integer counting. New examples of the latter are shown, and it is indicated how more can be obtained from these to perform a wide range of tabular computation.

B. P. Murphy, G. Bartlett

### Nonlinear Regression: Methodological and Software Aspects

Through a survey of the state of the art in nonlinear regression, this paper examines the contribution of different scientific tools: asymptotic theory, differential geometry, numerical analysis and computer science. The specific role of computer science is pointed out: hardware power, software technology, computing environments and now, artificial intelligence. Finally, some trends for the future are presented.

A. Messéan

### Comparing Sensitivity of Models to Missing Data in the GMANOVA

It is important that a model has a satisfactory fit to the data we analyze. Other generally desirable features of a model are its simplicity and the interpretability of its parameters. Longitudinal data arising in epidemiologic or clinical studies, for example, are rarely complete. It is therefore desirable that a model is, to some extent, robust to missing data. Thus models should also be compared with respect to this property. We investigate robustness to missing data in the generalized analysis of variance model (GMANOVA) by applying a resampling approach. For this purpose we introduce a statistic which indicates the influence of deleting a set of measurements on estimated curves. For example, p×100% (0<p<1) of the measurements are dropped at random and the effect on estimates is assessed using our influence measure, which can be viewed as a multivariate generalization of Cook’s measure. By repeating the experiment we can determine the empirical distribution of the influence statistic, and this distribution can be analyzed further. Carrying out the analysis for different values of p and for the alternative models under consideration, robustness to missing data can be scrutinized as a function of p. Both the maximum likelihood and the least squares approach can be applied. The techniques are illustrated by analyzing data on a growth experiment for bulls. It is shown how the methods can be utilized in finding “good” design points in growth studies.

E. P. Liski, T. Nummi

### A Modelling Approach to Multiple Correspondence Analysis

A multi-dimensional extension to the Goodman RC model is proposed as a statistical modelling form of multiple correspondence analysis. An algorithm is given for the maximum likelihood estimation of the parameters, together with an approximate method which provides starting values for the full ML procedure. The method is illustrated by the analysis of the Law School Admission Test. The versatility of the approach is demonstrated by the analysis of a non-standard problem in the study of Marital Endogamy.

M. Green

### Multidimensional Scaling on the Sphere

Nonmetric multidimensional scaling (MDS) has become a popular method for graphical representation of objects or individuals based on dissimilarity measures. Since the pioneering work of Shepard (1962a,b) and Kruskal (1964a,b), various MDS models have been produced together with associated computer programs such as MDS(X) which is described in Coxon and Davies (1982).

M. A. A. Cox, T. F. Cox

### A Monte Carlo Evaluation of the Methods for Estimating the Parameters of the Generalized Lambda Distribution

This paper presents the results of a Monte Carlo study concerning the problems of parameter estimation in the generalized lambda distribution. Statistical properties of estimates obtained by using four available methods are contrasted and it is concluded that the exact least squares procedure performs better than the other methods.

M. C. Okur

### Statistical Guidance for Model Modification in Covariance Structure Analysis

Four topics on model modification in covariance structure analysis are covered. 1. A comparison by Monte Carlo methods of statistical criteria used to detect the correct model under conditions of misspecification. 2. The effect of a number of factors on the behaviour of these criteria. 3. The scale dependency of the criteria. 4. An alternative model test procedure.

T. C. Luijben, A. Boomsma

### Similarities Functions

The main problem in data analysis is the representation of the data by a visual display understandable by everybody; examples are hierarchical classification, additive trees and Euclidean representation. With this end in view, what is really analysed is not exactly the data but relations between them, namely similarities or dissimilarities. It is interesting to note that the representations are mostly read in terms of dissimilarities, while they are constructed in terms of similarities. It is therefore very important to study the relations between these two kinds of association coefficients.

G. le Calvé

### Robust Bayesian Regression Analysis with HPD-Regions

Robust methods for inference in the linear regression model have been discussed from different points of view. In the Bayesian analysis robustness with respect to the prior distribution of the parameter plays an important role. Various papers (Chamberlain and Leamer (1976), Leamer (1982), Polasek (1984)) deal with the sensitivity of the posterior mean to changes in the prior variance. This concept has been extended to credible regions by Pötzelberger (1986) and Polasek and Pötzelberger (1987).

K. Felsenstein, K. Pötzelberger

### Estimation of ARMA Process Parameters and Noise Variance by Means of a Non Linear Filtering Algorithm

The estimation of the parameters associated with an ARMA(p,q) process can be formulated as a non linear filtering problem. Optimal filtering allows one to follow the evolution of the a-posteriori probability density; all desired information about the parameters can be obtained from their a-posteriori density. First of all, optimal estimators can be obtained directly from it. Secondly, it provides a good test for sub-optimal algorithms, and finally, a lower bound on the sample size needed to reach a fixed accuracy can be obtained.

M. P. Muñoz-Gracia, J. Pages-Fita, M. Marti-Recober

### Autoregressive Models with Latent Variables

Observations of time series are often corrupted by noise. This has a negative influence on the quality of estimated parameters of autoregressive models. A new technique is presented here. The individual, not directly observable, samples of the corrupted time series are modeled and estimated as latent variables. The estimation process leads to the iterative solution of two non-linearly coupled systems of linear equations. Adaptations for missing values, additive and innovative outliers are described. Some applications are presented.

P. H. C. Eilers

### An Algorithm for Time Series Decomposition Using State-Space Models with Singular Transition Matrix

Recently, the problem of seasonal adjustment of time series has received considerable attention by several authors, see e.g. Bell and Hillmer (1984) for a historical review with analysis and discussion of current issues.

E. Alvoni

### New Perspectives in Computer Assisted Survey Processing

National statistical offices process huge quantities of data. The use of recent developments in the fields of statistics and computer technology is a prerequisite for efficient production of high quality statistics. At the Netherlands Central Bureau of Statistics increasing use is made of (personal) computers in all steps of the statistical production process. This paper discusses the role of the computer in data collection, data editing, tabulation, and analysis. Particular attention is paid to the Blaise System for computer assisted survey processing, a system which controls various steps in the statistical production process.

W. J. Keller, J. G. Bethlehem

### Multiple Imputation for Data-Base Construction

Multiple imputation is a technique for handling missing values in shared databases. The technique replaces each missing value by a vector of possible values that reflect uncertainty about which value to impute. The resultant multiply-imputed data base consists of the original data set with missing values replaced by pointers to the rows of a supplemental matrix of multiple imputations, each row being the vector of multiple imputations for a missing value. This data base can be analyzed by standard complete-data analysis techniques to obtain valid inferences.
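
The pointer-based layout described above can be sketched as follows. This is only an illustrative reading of the abstract: the `Pointer` class, the `completed_data_set` helper, the toy data and the imputed values are all hypothetical, and the mean is used as a stand-in for whatever complete-data analysis is of interest.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Union


@dataclass(frozen=True)
class Pointer:
    """Stands in for a missing value; refers to a row of the imputation matrix."""
    row: int


Cell = Union[float, Pointer]


def completed_data_set(
    data: List[List[Cell]], imputations: List[List[float]], j: int
) -> List[List[float]]:
    """Build the j-th completed data set by following each pointer."""
    return [
        [imputations[c.row][j] if isinstance(c, Pointer) else c for c in record]
        for record in data
    ]


# Hypothetical toy data: two variables, two missing values.
data: List[List[Cell]] = [
    [1.0, 10.0],
    [2.0, Pointer(0)],
    [Pointer(1), 30.0],
]
# One row per missing value, one column per imputation (m = 3 here).
imputations = [
    [18.0, 20.0, 22.0],
    [2.5, 3.0, 3.5],
]

m = len(imputations[0])
# A standard complete-data analysis (here: the mean of the second variable)
# is run once on each completed data set.
estimates = [
    mean(record[1] for record in completed_data_set(data, imputations, j))
    for j in range(m)
]
# The combined point estimate is the average over the m analyses.
combined_estimate = mean(estimates)
```

Storing pointers rather than filled-in values keeps the original data set intact and lets every analysis draw on the same supplemental matrix; the spread of `estimates` across the m completed data sets reflects the uncertainty due to the missing values.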

D. B. Rubin

### GRASP: A Complete Graphical Conceptual Language for Definition and Manipulation of Statistical Databases

Recent research activities show a growing interest in statistical databases. Such interest is motivated by the increasing number of statistical applications and the inadequacies of traditional database languages and interfaces for statistical applications. An outstanding open problem in statistical database development is how to express aggregate data starting from a description of elementary data. We propose a solution to this problem by means of a new language, called GRASP (GRAphical Statistical Package). The language allows one to query both elementary and aggregate data uniformly and is, therefore, suitable for several different purposes: a) posing queries on elementary data; b) building aggregate data from elementary data; c) computing new aggregate data using statistical operators.

T. Catarci, G. Santucci

### New Algorithmic and Software Tools for D-Optimal Design Computation in Nonlinear Regression

The aim of this paper is to present: 1. A new theoretical result for a long-standing problem (Atkinson et al. 1968, Box 1968) in the theory of optimal design in nonlinear regression: the determination of optimality conditions for designs made of replications of the minimal D-optimal design points (Vila 1986). 2. New, easy-to-use and powerful software for optimal design computation in nonlinear regression, following the D-optimality criterion and other allied criteria.

J. P. Vila

### Model-Building on Micro-Computers: Spreadsheets or Specific Software

In recent years, the growth of micro-computer power has led to the development of specific modelling packages, which try to combine the advantages of dedicated software with the features of the micro-computer. But spreadsheets have also improved, both in speed and in versatility, and might now be considered as a modelling tool. Thus, one might question the necessity of buying and learning specific software, especially for users who are already familiar with spreadsheets, or will have to be anyway.

J.-L. Brillet

### Three Examples of Computer-Intensive Statistical Inference

We discuss three recent data analyses which illustrate making statistical inferences (finding significance levels, confidence intervals, and standard errors) with the critical assistance of a computer. The first example concerns a permutation test for a linear model situation with several covariates. We provide a computer-based compromise between complete randomization and optimum design, partially answering the question “how much randomization is enough?” A problem in particle physics provides the second example. We use the bootstrap both to find a good estimator for an interesting decay probability, and then to obtain a believable confidence interval. The third problem involves a long-running cancer trial in which the z-value in favor of the more rigorous treatment wandered extensively during the course of the experiment. A dubious theory, which suggests that the wandering is just due to random noise, is rendered more believable by a bootstrap analysis. All three examples illustrate the tendency for computer-based inference to raise new points in statistical theory.
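
To make the bootstrap idea concrete, here is a minimal sketch of a nonparametric bootstrap that returns a percentile confidence interval and a standard error for an arbitrary statistic. The percentile interval is one simple choice and is an assumption of this example, not necessarily the construction used in the paper; the function name and its parameters are likewise hypothetical.

```python
import random
from statistics import stdev
from typing import Callable, List, Sequence, Tuple


def bootstrap_percentile_interval(
    sample: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (lower, upper, standard error) from bootstrap replicates."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    replicates: List[float] = []
    for _ in range(n_boot):
        # Resample n observations with replacement from the original sample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    replicates.sort()
    # Cut off the same number of replicates in each tail.
    tail = int(n_boot * alpha / 2)
    lower = replicates[tail]
    upper = replicates[n_boot - 1 - tail]
    # The standard deviation of the replicates estimates the standard error.
    return lower, upper, stdev(replicates)
```

A call such as `bootstrap_percentile_interval(values, statistics.mean)` gives an interval for the mean of `values`; any other function of the sample can be passed as `statistic`.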

B. Efron

### New Computer Procedures for Generating Optimal Mixture Designs on Finite Design Spaces

The investigations for developing industrial products with better properties are often based on experiments with different kinds of mixtures. In a q-component mixture the sum of the proportions of the components is unity. If the proportion of the i-th component is denoted by $x_i$, then $$\sum_{i=1}^{q} x_i = 1 \tag{1}$$ and $$0 \le x_i \le 1, \quad i = 1, \dots, q. \tag{2}$$
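
As one possible example of a finite design space that satisfies the two mixture constraints, the sketch below enumerates a simplex lattice: all q-component mixtures whose proportions are multiples of 1/m. The choice of a simplex lattice and the name `simplex_lattice` are assumptions made for illustration; the abstract does not say which candidate sets the paper uses.

```python
from fractions import Fraction
from typing import Iterator, Tuple


def simplex_lattice(q: int, m: int) -> Iterator[Tuple[Fraction, ...]]:
    """Yield all q-component mixtures with proportions in {0, 1/m, ..., 1}.

    Assumes q >= 1 and m >= 1. Exact fractions are used so that the
    proportions of every point sum to exactly 1.
    """

    def compositions(parts: int, total: int) -> Iterator[Tuple[int, ...]]:
        # All ways to write `total` as an ordered sum of `parts` non-negative integers.
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(parts - 1, total - first):
                yield (first,) + rest

    for counts in compositions(q, m):
        yield tuple(Fraction(c, m) for c in counts)
```

Each generated point has non-negative proportions that sum to one, so it is a feasible candidate from which a design-search procedure could select.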

H. A. Yonchev

### Screening Based Exclusively on Experts Opinions

Let U be a performance variable which can be observed neither for the classified item nor for items from a learning sample. Let Z be a screening variable positively dependent on U. In addition, a learning sample is available in which the value of U for each item is evaluated by (Z1, Z2), where Z1 and Z2 are the opinions of two experts. We assume that Z1 and Z2 are conditionally independent given U and that for any u the conditional distributions of (Z1|U=u) and of (Z2|U=u) are identical. Models of this type are thoroughly discussed by Holland and Rosenbaum (1986).

Y. Ćwik, E. Pleszczyńska

### Backmatter
