
1990 | Book

Compstat

Proceedings in Computational Statistics, 9th Symposium held at Dubrovnik, Yugoslavia, 1990

Editors: Professor Dr. Konstantin Momirović, Vesna Mildner, MA

Publisher: Physica-Verlag HD


About this book

Although no one is, probably, too enthused about the idea, it is a fact that the development of most empirical sciences to a great extent depends on the development of data analysis methods and techniques, which, due to the necessity of applying computers for that purpose, means that it practically depends on the advancement and orientation of computational statistics. Every other year the International Association for Statistical Computing sponsors the organization of meetings of individuals professionally involved in computational statistics. Since these meetings attract professionals from all over the world, they are a good sample for estimating trends in this area, which some believe is statistics proper while others claim it is computer science. It seems, though, that an increasing number of colleagues treat it as an independent scientific, or at least technical, discipline. This volume contains six invited papers, 41 contributed papers and, finally, two papers which are, formally, software descriptions, but which the Program Committee agreed should be included in a separate section entitled "Notes about new developments in statistical software", owing to their special significance for current trends in computational statistics.

Table of Contents

Frontmatter

Classification

Frontmatter
Stochastic Algorithms for Clustering

It is well known that, generally, the solution provided by a partitioning algorithm depends upon its initial position. In this paper, we consider two algorithms which incorporate random perturbations to reduce this initial-position dependence. Both are variations of a general Classification EM algorithm (CEM), conceived to optimize Classification Maximum Likelihood (CML) criteria in the mixture context. In Section 2, we present the CEM algorithm and give its main characteristics. In Section 3, we present a stochastic version (SEM) of the CEM algorithm and a simulated annealing algorithm for clustering (CAEM) conceived in the same framework. Both algorithms can be used to optimize most clustering criteria, but here we focus on the variance criterion. In Section 4, we summarize the conclusions of numerical experiments performed to analyze the practical behaviour of both algorithms in optimizing the variance criterion.

G. Celeux, G. Govaert
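
As a concrete illustration of the idea, the sketch below implements CEM for the variance (k-means) criterion with an optional SEM-style stochastic classification step; the posterior weights assume equal proportions and unit variances, and all names are illustrative rather than the authors' implementation:

```python
import numpy as np

def cem_variance(X, k, n_iter=50, stochastic=False, seed=None):
    """CEM for the variance (k-means) criterion. With stochastic=True,
    labels are drawn from posterior-like weights (an SEM-style random
    perturbation) instead of taken as the argmax, which reduces the
    dependence on the initial position. Equal proportions and unit
    variances are assumed in the weights (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        if stochastic:  # S-step: random classification
            w = np.exp(-0.5 * (d2 - d2.min(1, keepdims=True)))
            w /= w.sum(1, keepdims=True)
            labels = np.array([rng.choice(k, p=wi) for wi in w])
        else:           # C-step: hard classification
            labels = d2.argmin(1)
        for j in range(k):  # M-step: update non-empty clusters
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels, centers
```
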
A Classification Algorithm for Binary Observations Based on Euclidean Representations of Hypergraphs

We are given a sample of n binary random variables. The objects form the edges of a hypergraph H = (V, E) on n vertices. The aim of the present paper is to classify the vertices of this hypergraph so that "similar" vertices (those having many incident edges in common) fall in the same cluster. The problem is formulated as follows: given a connected hypergraph with n vertices and a fixed integer k (1 ≤ k ≤ n), we look for a k-partition of the set of vertices such that the edges of the corresponding cut-set are as few as possible. Some combinatorial measures characterizing this structural property — the minimal k-sector, θ_k(H), and the minimal weighted cut, ν_k(H) — are introduced, and the following relations between them and the eigenvalues 0 = λ_1 ≤ λ_2 ≤ ⋯ ≤ λ_n of H are proved: $$ c_n\,\theta_k(H) \;\le\; \sum_{j=1}^{k} \lambda_j \;\le\; \nu_k(H), $$ where the constant c_n depends only on n. The notion of spectra of hypergraphs — a generalisation of the C-spectra of graphs (see Fiedler [4]) — is also introduced, together with k-dimensional Euclidean representations. We show that the existence of k "small" eigenvalues is a necessary but not sufficient condition for the existence of a good classification. In addition, the representatives of the vertices in an optimal k-dimensional Euclidean representation of the hypergraph should be well separated by means of their Euclidean distances. In this case the k-partition giving the optimal clustering is also obtained by this classification method and the estimate $$ \nu_k(H) \;\le\; q^2 \sum_{j=1}^{k} \lambda_j $$ holds, where the constant q depends on n and the diameters of the clusters.

M. Bolla
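
For intuition, a minimal sketch of the spectral approach using the common clique-expansion reduction of a hypergraph to a weighted graph (an assumption for illustration; the paper defines its own hypergraph spectrum):

```python
import numpy as np

# Clique-expansion reduction: vertices sharing many incident edges get
# large weights, and the k smallest Laplacian eigenvalues are inspected.
B = np.array([[1, 1, 0],    # incidence matrix: 4 vertices, 3 edges
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
W = B @ B.T                  # numbers of shared incident edges
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(1)) - W    # weighted graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)
# k "small" eigenvalues are necessary (not sufficient) for a good
# k-partition; one then clusters the rows of the first k eigenvectors.
print(eigvals)
```
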
Agglomerative Hierarchical Multicriteria Clustering Using Decision Rules

In a multicriteria clustering problem, optimization over more than one criterion is required. The problem can be treated in different ways. In this paper an agglomerative hierarchical method based on decision rules for making decisions under uncertainty is proposed for solving the multicriteria clustering problem. An application of the proposed approach to the Rosenberg and Kim kinship data is presented.

V. Batagelj, A. Ferligoj
GLIMTREE: RECPAM Trees with the Generalized Linear Model

Many problems of data analysis can be reduced to the construction of a linear prediction rule. Typically, one studies the relationship of a variable of special importance, y, which we shall call the outcome variable, to other variables, here referred to as predictors. A linear prediction rule is a linear combination of the predictors whose values represent the expected value of the response variable, given the predictors. In the classical paradigm, the outcome variable is assumed to be continuous and normally distributed, and the coefficients of the linear predictor are estimated by least squares.

A. Ciampi, Q. Lin, G. Yousif

Algorithms and Statistical Software

Frontmatter
BOJA: A Program for Bootstrap and Jackknife

The main features of a computer program for bootstrap and jackknife analysis are summarized. An application to a correlation model is presented to compare re-sampling methods and different methods for estimating bias, standard errors and confidence intervals. Future program extensions are mentioned briefly.

A. Boomsma
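
A minimal sketch of the two resampling estimators such a program computes (bias and standard error); this is the textbook bootstrap and jackknife, not BOJA's actual code:

```python
import numpy as np

def bootstrap_bias_se(x, stat, B=2000, seed=None):
    """Bootstrap estimates of the bias and standard error of stat(x)."""
    rng = np.random.default_rng(seed)
    reps = np.array([stat(rng.choice(x, size=len(x), replace=True))
                     for _ in range(B)])
    return reps.mean() - stat(x), reps.std(ddof=1)

def jackknife_bias_se(x, stat):
    """Leave-one-out jackknife estimates of bias and standard error."""
    n = len(x)
    reps = np.array([stat(np.delete(x, i)) for i in range(n)])
    bias = (n - 1) * (reps.mean() - stat(x))
    se = np.sqrt((n - 1) / n * ((reps - reps.mean()) ** 2).sum())
    return bias, se

x = np.random.default_rng(0).exponential(size=40)
print(bootstrap_bias_se(x, np.std), jackknife_bias_se(x, np.std))
```
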
Prototyping Dynamic Graphics Functions in S

Much of the work in dynamic graphics for data analysis has been done on specialized workstations using specialized software. The result is quite impressive, highly efficient, but non-portable code. Portability is especially important in this area as critical assessment of many of the ideas has yet to be undertaken. We provide crude versions of some existing dynamic graphics operations in the (new) S language [1]. S is UNIX® based and widely distributed. Our functions rely on a recently developed device driver [3] for a portable and nearly device-independent system for algorithm animation [2]. We foresee application of our approach by anyone interested in exploring the power of animation techniques for data analysis.

L. A. Clark, D. Pregibon
Programming Languages for Statistical Computation

Several workers in the field of statistical computation have pointed to the need for a statistical package which would include a programming language specifically designed for statistical applications [3]. Some interest has been shown in the possible applicability of the functional programming style. For example, an implementation of a statistical language based on the functional programming language Lisp has been carried out by Tierney [8]. The general-purpose language S has also been designed and written with statistical programming in mind by Chambers [1]. With the aim of providing a flexible approach to designing a language for statistical computation, the authors have created a programming-language prototyping system. This system allows rapid implementation of interpreters for new and mutually quite different languages, all of which are based on a core of statistical routines. Examples of the use of a working prototype, SOL, are given. SOL incorporates generalized linear modelling facilities in a Pascal-like framework extended to include functional language facilities.

M. Harman, S. Danicic, R. Gilchrist

Expert System in Statistics

Frontmatter
Statistical Knowledge-Based Systems — Critical Remarks and Requirements for Approval

The term 'statistical expert system' has been used extensively, although it is still ill-defined. We will call a program a statistical expert system (SES) if it can undertake actions specific to experts in the field of statistics (or a sub-field thereof) through the application of knowledge bases (Wittkowski 1989).

K. M. Wittkowski
New Approach to GUHA-Method from the Reliability Viewpoint

The goal is to show the importance of taking sentence length into account when the GUHA method is applied. We present a scheme for such an account, obtained by applying Bayes' theorem.

V. V. Kuzmenkov, O. I. Terskin
Classifying Documents: A Discriminant Analysis and an Expert System Work Together

The subject of this study is the classification of documents into disjoint, pre-defined categories. The classification of these documents, which describe research projects, will be used to build a synthesized view of the activity of our research center.

G. Hebrail, M. Suchard
Estimation Procedures for Language Context: Poor Estimates are Worse than None

It is difficult to estimate the probability of a word’s context because of sparse data problems. If appropriate care is taken, we find that it is possible to make useful estimates of contextual probabilities that improve performance in a spelling correction application. In contrast, less careful estimates are found to be useless. Specifically, we will show that the Good-Turing method makes the use of contextual information practical for a spelling corrector, while attempts to use the maximum likelihood estimator (MLE) or expected likelihood estimator (ELE) fail. Spelling correction was selected as an application domain because it is analogous to many important recognition applications based on a noisy channel model (such as speech recognition), though somewhat simpler and therefore possibly more amenable to detailed statistical analysis.

W. A. Gale, K. W. Church
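
For illustration, a minimal version of the Good-Turing adjustment the abstract refers to; the paper's estimator may additionally smooth the frequency-of-frequency counts, which this sketch does not:

```python
from collections import Counter

def good_turing(counts):
    """Good-Turing adjusted counts r* = (r + 1) * N_{r+1} / N_r, where
    N_r is the number of distinct events seen exactly r times. Returns
    the adjusted counts and P0, the total probability mass reserved
    for unseen events (N_1 / N)."""
    N_r = Counter(counts.values())
    N = sum(counts.values())
    adjusted = {}
    for event, r in counts.items():
        if N_r.get(r + 1):
            adjusted[event] = (r + 1) * N_r[r + 1] / N_r[r]
        else:
            adjusted[event] = float(r)  # no N_{r+1}: keep the raw count
    return adjusted, N_r.get(1, 0) / N

adj, p0 = good_turing({"a": 1, "b": 1, "c": 2, "d": 3})
print(adj, p0)
```
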
Knowledge Modelling for Statistical Consultation Systems; Two Empirical Studies

In discussions on the design of computerized support in statistics, it is often contended that individual differences between experts' ideas on statistical consultation and on the application of analysis methods are so large that they prevent general acceptance of statistical consultation systems. Of course, one could design a support system based on knowledge acquisition with a single expert. However, we preferred to consider first how serious the problem of individual differences is. Ultimately, this is an empirical question.

G. M. van den Berg, R. A. Visser
An Expert System Strategy for Selecting Interesting Results

The system EXPLORA extracts from a set of data a collection of interesting statements, utilizing as far as possible the semantics of the data. The system may find too many such statements. A procedure is described that suppresses those statements that are sufficiently inferior to, and at the same time sufficiently similar to, other statements that are retained. Some properties of the procedure, in particular in a statistical environment, are described.

F. Gebhardt
Computer Assisted Interpretation of Conditional Independence Graphs

Conditional independence graphs are one of the few graphical techniques available for the representation of a model which has been fitted to a given set of data. Through the use of CIGE, the computer software described in this paper, it is now possible to represent without ambiguity a much larger number of log-linear models fitted to discrete data, as well as to represent a certain class of models, the covariance selection models, fitted to continuous data. It is suggested that CIGE provides a useful tool for the graphical communication and interactive interpretation of such models, which may be of some use when working with researchers with limited statistical knowledge.

M. J. Cottee
WAMASTEX — Heuristic Guidance for Statistical Analysis

The current state and the direction of further development of the WAMASTEX system are described. The main portion of the paper discusses the empirical assessment of several decision heuristics upon which WAMASTEX's internal workings are based.

W. Dorda, K. A. Froeschl, W. Grossmann

Multivariate Data Analysis and Model Building

Frontmatter
On Model Search Methods

In the present paper, questions linked with the parallel use of symbolic and numeric computations in model search tasks are discussed, as well as possibilities for high-level parallelisation of the simultaneous evaluation of sets of models when implementing model search algorithms.

T. Havránek
Principal Components Analysis with Respect to Instrumental Variables Via Univariate Splines

Introducing univariate splines provides an approach for solving the PCAIV problem in the nonlinear case: that is, to build a representation of the observations as near as possible to another study, presupposing that the link between the two studies' variables takes an additive form. Linear PCAIV (Bonifas et al., 1984) is then a particular case of spline-PCAIV (Durand, 1989) in the following sense: the solution of linear PCAIV belongs to the set of feasible solutions of spline-PCAIV and constitutes the first step of the associated iterative algorithm. Moreover, this method extends to regression using additive splines and can be considered a particular canonical analysis in a sense which is specified.

J. F. Durand
Selecting the Best Subset of Variables in Principal Component Analysis

The problem of variable selection in Principal Component Analysis (PCA) has been studied by several authors [1], but as yet no selection procedures are found in the classical statistical software packages. Such selection procedures do exist, on the other hand, for linear regression and discriminant analysis, because there the selection criteria are based on well-known quantities such as the multiple correlation coefficient or the average prediction error.

P. L. Gonzalez, R. Cléroux, B. Rioux
Interesting Projections of Multidimensional Data by Means of Generalized Principal Component Analyses

Principal Component Analysis can produce several interesting projections of a point cloud if suitable inner products are chosen for measuring the distances between the units. We discuss two examples of such choices. The first allows us to display outliers, while the second is expected to display clusters. In doing so we introduce a robust estimate of a covariance matrix and investigate some of its properties.

H. Caussinus, A. Ruiz
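
As a generic illustration of robustifying a covariance matrix by downweighting outlying units (an M-type scheme sketched here on assumptions, not necessarily the estimator introduced in the paper):

```python
import numpy as np

def weighted_robust_cov(X, n_iter=10, beta=2.0):
    """Illustrative M-type robust covariance: iteratively downweight
    units by a decreasing function of their Mahalanobis distance."""
    mu, S = X.mean(0), np.cov(X, rowvar=False)
    for _ in range(n_iter):
        Xc = X - mu
        d2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S), Xc)
        w = np.exp(-d2 / (2.0 * beta))        # smooth downweighting
        mu = (w[:, None] * X).sum(0) / w.sum()
        Xc = X - mu
        S = (w[:, None] * Xc).T @ Xc / w.sum()
    return mu, S
```
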
Maximum Likelihood Estimation of Mixed Linear and Multiplicative Models for Contingency Tables Using DISTAN

The paper describes a general class of statistical models for the analysis of multidimensional contingency tables, suggests a method of computing maximum likelihood estimates under these models, and describes how the computations can be carried out using the DISTAN package. The class of models considered includes log-linear (i.e. multiplicative) and linear models, including the case when the model is defined by both types of assumptions.

T. Rudas
Alternate Forms of Graphical Modelling — A Comparison

The theoretical ideas underpinning the relatively new technique of graphical modelling have received considerable coverage in recent publications. This paper considers the more practical side of graphical modelling, examining the options open to potential users and the limitations imposed upon them by the currently available software.

A. Scott, J. Whittaker
Exact Significance Testing by the Method of Control Variates

Monte Carlo estimates of exact permutational p-values are an important alternative to both asymptotic and exact inference. For many sparse data sets, asymptotic tests are unreliable, but corresponding exact tests are too difficult to compute, despite the availability of fast numerical algorithms for their execution. Monte Carlo methods are a good compromise between these two extremes. By sampling a predetermined number of times from an appropriate reference set one can obtain an arbitrarily narrow confidence interval for the exact p-value; for instance one could state with 99% confidence that the estimated p-value was accurate to the third decimal place. This level of accuracy is generally sufficient.

C. Mehta, N. Patel, P. Senchaudhuri
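
A plain Monte Carlo version of the idea (without the control-variate variance reduction of the paper), for a two-sample permutation test; the normal-approximation confidence interval shows how a predetermined number of samples bounds the accuracy of the estimated p-value:

```python
import numpy as np

def mc_perm_pvalue(x, y, n_rep=10_000, seed=None):
    """Monte Carlo estimate of the exact permutation p-value for a
    two-sample difference of means, with a 99% confidence interval."""
    rng = np.random.default_rng(seed)
    pooled, nx = np.concatenate([x, y]), len(x)
    obs = abs(x.mean() - y.mean())
    hits = 0
    for _ in range(n_rep):
        perm = rng.permutation(pooled)
        hits += int(abs(perm[:nx].mean() - perm[nx:].mean()) >= obs)
    p = hits / n_rep
    half = 2.576 * np.sqrt(p * (1 - p) / n_rep)   # normal approximation
    return p, (max(p - half, 0.0), min(p + half, 1.0))
```
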
Testing Collapsibility of Hierarchical Loglinear Models for Contingency Tables

A (hierarchical) loglinear model is said to be collapsible onto a set of variables if the corresponding marginal totals can be drawn from the tables of marginal totals formed by summing the sufficient marginals of the model over the remaining variables. Collapsibility has important consequences for hypothesis testing and model selection, and can be useful in data reduction. We shall present a procedure for computing marginal totals for arbitrary sets of variables; as a by-product, a simple algorithm for testing collapsibility is obtained.

F. M. Malvestuto
The Generalised Biplot: Software Potential

The whole area of ordination/multidimensional scaling is one of the most used statistical methodologies in the applied sciences. The very names, ordination coming from ecology and multidimensional scaling from psychometrics, illustrate the range of applications. Included are well-known methods such as components analysis, biplots, non-metric scaling, correspondence analysis and multiple correspondence analysis. I shall be concerned here with unifying these seemingly diverse techniques and hence providing a solid basis for unified software. Generalisations to samples from several populations, several sets of variables or to so-called multiway techniques (see Coppi and Bolasco, 1989) will not be considered here.

J. C. Gower
Exploratory Approach and Maximum Likelihood Estimation of Models for Non Symmetrical Analysis of Two-Way Multiple Contingency Tables

In non-symmetrical analysis of two-way multiple contingency tables we are interested in the dependence between one response variable and two explanatory variables. The exploratory approach based on multiple and partial non-symmetrical correspondence analysis can be used as a complement to asymmetrical association models, logit-linear models and latent budget analysis. In this paper, maximum likelihood estimation of these models is obtained by alternating applications of a multidimensional Newton algorithm. Model parameters are identified by a generalized singular value decomposition with the same metrics as used in non-symmetrical correspondence analyses. This provides factorial representations of both the dependence effects among the variables and the residuals from independence.

R. Siciliano, N. C. Lauro, A. Mooijaart
An Orthogonal Procrustes Rotation Procedure for Multiple Correspondence Analysis

Multiple Correspondence Analysis (MCA) is a well-known technique for the exploratory analysis of qualitative variables. For several descriptions of the technique we refer to Tenenhaus and Young (1985). Here we merely give the following rather technical description. Let each qualitative variable be described by a so-called indicator matrix $G_j$ of order $n \times k_j$, $j = 1,\dots,m$, where n is the number of observation units, m is the number of variables, and $k_j$ is the number of categories of variable j. An indicator matrix contains one unit element in each row, located in the column corresponding to the category to which the observation unit belongs, and zeroes elsewhere. Let $D_j \equiv G_j' G_j$, the diagonal matrix with category frequencies; then the projector for the space spanned by the columns of $G_j$ is given by $G_j D_j^{-1} G_j'$. MCA can be seen as the method that maximizes $$ f(X) = \operatorname{tr} X' \sum_{j=1}^{m} P_j X, \qquad (1) $$ over the component scores matrix X, subject to $X'X = I_r$, for some dimensionality r, where $P_j$ is the doubly centered version of the projector for the column space of $G_j$. From (1) it is obvious that the solution for X is given by $K_r$, the matrix whose columns are the first r eigenvectors of $\sum_j P_j$, or by any rotation of $K_r$.

H. A. L. Kiers, J. M. F. Ten Berge
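
The characterization above translates directly into an eigendecomposition; a small NumPy sketch under the stated setup (indicator matrices $G_j$, doubly centered projectors $P_j$):

```python
import numpy as np

def mca_scores(indicator_mats, r):
    """Component scores: first r eigenvectors of sum_j P_j, with P_j
    the doubly centered projector of indicator matrix G_j."""
    n = indicator_mats[0].shape[0]
    C = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    P_sum = np.zeros((n, n))
    for G in indicator_mats:
        D_inv = np.diag(1.0 / G.sum(0))              # D_j = G_j'G_j
        P_sum += C @ G @ D_inv @ G.T @ C
    _, eigvecs = np.linalg.eigh(P_sum)               # ascending order
    return eigvecs[:, ::-1][:, :r]                   # top-r eigenvectors

G1 = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
G2 = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
print(mca_scores([G1, G2], r=1))
```
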

Optimization Techniques and Nonlinear Models

Frontmatter
Optimization in Statistics — Recent Trends

In the recent past, interaction between mathematical programming and statistical problems has grown considerably. Optimization with multiple objective functions and global optimization have also found use in solving statistical problems arising in various fields. Especially in the field of Quality Engineering and On-line Process Control, due to the innovative approaches of Genichi Taguchi [16], some problems have been identified and solved using optimization methods. Since the publication of MATHEMATICAL PROGRAMMING IN STATISTICS [2], a large number of books and papers have been written which bring out the connections between statistical methods and optimization.

T. S. Arthanari
Fitting Non Linear Models with Two Components of Error

The problem of fitting nonlinear models to a sample of individual data sets is discussed. For each individual a response curve is proposed, with errors of measurement, and the response curve parameters are themselves considered to be samples from a population of curve parameters, with a covariance matrix to be estimated. By studying cases where full information is available from a conventional analysis of each individual data set, it is shown that a direct attempt to fit the full model by maximum likelihood estimation does not always lead to acceptable estimates of the covariance matrix.

G. J. S. Ross

Computing for Robust Statistics

Frontmatter
Some Proposals for Fast HBD Regression

Existing high-breakdown regression estimators need substantial computation time. In this paper we propose a fast estimator for robust regression with a breakdown point of 1/3. This is not the highest value possible, but it is independent of the number of predictor variables. A more refined variant of this method is also proposed. Both algorithms start by identifying leverage points by means of computationally cheap robust estimates of location and scale in x-space, and then apply L1 regression to the non-leverage points. Some examples are given, and a small simulation study is presented. It appears that in these cases both new algorithms perform as well as more time-consuming estimators.

P. J. Rousseeuw, B. C. van Zomeren
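
A rough sketch of the two-stage scheme: cheap robust location/scale estimates in x-space to flag leverage points, then L1 regression (here via IRLS) on the rest. The coordinatewise median/MAD distance is a simplification of the paper's robust x-space estimates:

```python
import numpy as np

def lad_irls(X, y, n_iter=50, eps=1e-8):
    """L1 (least absolute deviations) regression via iteratively
    reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        w = np.sqrt(1.0 / np.maximum(np.abs(y - X @ beta), eps))
        beta = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
    return beta

def fast_hbd_sketch(X, y, cutoff=2.5):
    """Flag leverage points with coordinatewise median/MAD distances
    in x-space, then L1-fit the non-leverage points."""
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0) + 1e-12
    dist = np.sqrt((((X - med) / mad) ** 2).mean(1))
    return lad_irls(X[dist < cutoff], y[dist < cutoff])
```
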
Robust Selection of Variables in the Discriminant Analysis Based on MVE and MCD Estimators

The paper presents a method for selecting the most discriminative variables in the discriminant analysis, using Wilks’ lambda statistic, robustified by means of high breakdown point estimates of multivariate means and covariance matrices. Some examples are given for comparison of the method to the selection of variables based on classical and M-estimates.

V. K. Todorov, N. M. Neykov, P. N. Neytchev
Interactively Computing Robust Covariance Matrices

Covariance matrices may be seen as the basis of many multivariate methods, such as canonical correlation analysis, principal component analysis and factor analysis, to mention a few. Because of the high sensitivity of covariance matrices to outliers, it is very important to use robust estimates instead of the classical ones. In this paper we consider two types of estimators: M-estimators and high-breakdown-point estimators. For easy handling of these estimators it is very advantageous to work within an interactive environment which guarantees high flexibility. The programs we refer to in this paper are written in APL and have been integrated into an interactive program package called GRIPS (General and Robust Interactive Programs for Statistics; see Karnel, 1989a).

G. Karnel
Sensitivity Analysis in Factor Analysis: Methods and Software

The present paper deals with methods and software for sensitivity analysis in some procedures of exploratory factor analysis. Our aim is to investigate how a small change in the data affects the results of the analysis. To do this, Tanaka and Odaka (1989a,b,c) proposed methods of sensitivity analysis in principal factor analysis (PFA), maximum likelihood factor analysis (MLFA) and least squares factor analysis (LSFA), respectively. The major mathematical tools are influence functions for some functions of eigenvalues and eigenvectors of a real symmetric matrix. We treat the influence of omitting or downweighting one or more individuals on the estimates of the unique and common variance matrices $\hat{\Delta}$ and $\hat{T}^*$, the precision of $\hat{\Delta}$, and the goodness of fit.

Y. Tanaka, E. Castaño-Tostado, Y. Odaka
Influence Functions of Eigenvalues and Eigenvectors in Multidimensional Data Analysis

Influence functions [2] of the most important parameters in principal component analysis (PCA), e.g., eigenvalues, eigenvectors and projection operators, have been derived by a number of authors [1,3,7] using results from the perturbation theory of the ordinary eigenproblem. In the present paper we show that the perturbation theory of generalized eigenproblems [4] underlies and unifies the treatment of influence functions of eigenvalues and eigenvectors in multidimensional data analysis and we present new applications in canonical variate (CVA) and canonical correlation analysis (CCA).

M. Romanazzi
Algorithms for Non-Linear Huber Estimation

In the non-linear least squares problem we minimize $$ \frac{1}{2} \sum_{j=1}^{m} f_j^2(x), \qquad (1.1) $$ where $f_1, \dots, f_m$ is a set of non-linear smooth functions $\mathbb{R}^n \to \mathbb{R}$ and x is an n-vector of "parameters".

H. Ekblom, K. Madsen
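
For readers who want to experiment, off-the-shelf Huber-loss fitting of a nonlinear model is available in SciPy; this is a baseline rather than the specialized algorithms developed in the paper, and the model and data below are illustrative:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, t, y):
    """Residuals of the illustrative model y ≈ a * exp(-b * t)."""
    a, b = params
    return a * np.exp(-b * t) - y

rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0, 40)
y = 3.0 * np.exp(-1.2 * t) + 0.05 * rng.normal(size=t.size)
y[::10] += 1.0                      # a few gross outliers

fit = least_squares(residuals, x0=[1.0, 1.0], loss='huber',
                    f_scale=0.1, args=(t, y))
print(fit.x)                        # near (3.0, 1.2) despite outliers
```
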

Statistics and Database Management

Frontmatter
Co-Operative Processing — A Challenge for Statistics and Database Management

Given the essential importance of data for all kinds of statistics, data organisation was, is, and will remain a crucial point in statistical work in general, and especially in Official Statistical Information Systems (OSIS). This background has long been reflected in scientific papers and in discussions at conferences and workshops, which however also show a change of emphasis and context with respect to data organisation. Since the end of the sixties, the formation of statistical databases has dominated the development of OSIS. With the new conditions now arising, namely the changes in the environment of each OSIS and the emergence of supranational OSIS, both statistics and database management are forced to react. The indispensable new organisation of the division of labour in the whole process of producing and consuming official statistics will require perfecting the hitherto dominant distributed processing with a new quality: co-operative processing. But this new quality is not possible without some further development of the management of statistical databases.

K. Neumann
A Structured Language for Modelling Statistical Data

In this paper we describe a new language for statistical data modelling, which offers a general framework for the representation of elementary and summary data. The language has three main characteristics: 1) the modelling primitives of the language are particularly suited to representing objects from a statistical point of view; 2) the language includes a rich set of structuring mechanisms for both elementary and summary data, which are given a formal semantics by means of logic; 3) the language is equipped with specialized inference procedures allowing different kinds of checks to be performed on the representation.

T. Catarci, G. D’Angiolini, M. Lenzerini

Time Dependent Models

Frontmatter
Spectral Analysis of Non-Stationary Time Series

The last decade has seen great popularity of the methods of correlation and spectral analysis of time series [1–9]. The spectral approach has a necessary condition: the given model of the series must be stationary. But in practice, in most investigations this condition holds only within some limited time interval, or not at all. At the same time, statistical study of the spectra of such series is drawing more and more attention [4–12], despite the qualitative and technical complexity of the models. Spectral estimates obtained by time shift (studied systematically in [8,9]) suggested a new approach which is simple and convenient in most situations. Below we study a time series model which not only substantiates the investigation of spectra changing in time but also leads to problems in the multivariate statistical analysis of these spectra: pattern recognition in the spectral domain, spectral disorders, spectral investigation in the presence of many strong non-stationary phenomena, and others. The size of the paper and some methodical considerations prevent a demonstration of the application of this approach to various problems (its simplicity makes these problems quite easy and particularly logical from the researcher's viewpoint).

I. G. Žhurbenko
ARMA Simulator for Testing Methods of Process Diagnostic

In the management of automatic or semi-automatic plants it is important to check that the system dynamics do not undergo variations beyond limits considered acceptable, or, if this happens, the variations which have occurred must be quantified. The presence of a large number of variables introduces serious obstacles in system management because of the consistency of behaviour among the variables, the computation time for their control, and the difficulties in diagnosis.

O. Lessi, L. Olivi, P. Parisi
Statistical Inference in an Extremal Markovian Model

The Markovian sequence $X_i = k \max(X_{i-1}, Y_i)$, $i \ge 1$, $0 < k < 1$, with $X_0$ a random variable with distribution function $H_0$ and $\{Y_i\}_{i \ge 1}$ a sequence of independent, identically distributed random variables, independent of $X_0$, with d.f. F, is considered in this paper as the genesis of a model for which statistical inference is developed under stationarity conditions.

M. I. Gomes
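
The model is easy to simulate; a short sketch with illustrative choices of F (unit Fréchet) and of the initial value, neither of which is prescribed by the paper:

```python
import numpy as np

def simulate_extremal_markov(k, n, seed=None):
    """Simulate X_i = k * max(X_{i-1}, Y_i), 0 < k < 1, with i.i.d. Y_i.
    Unit-Frechet innovations and X_0 drawn from F are illustrative
    choices; the model allows a general d.f. F and initial d.f. H_0."""
    rng = np.random.default_rng(seed)
    Y = 1.0 / -np.log(rng.uniform(size=n + 1))   # unit-Frechet draws
    X = np.empty(n)
    x_prev = Y[0]                                 # plays the role of X_0
    for i in range(n):
        x_prev = k * max(x_prev, Y[i + 1])
        X[i] = x_prev
    return X

print(simulate_extremal_markov(k=0.7, n=10, seed=1))
```
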
Interpretation of Spectral Estimation Results for Nonstationary Signals

We consider the application of spectral estimation methods (MSE) to the problem of identifying parameters of nonstationary signals. The main result of our work is a set of interpretation conditions that take into account frequency resolution and the nonstationarity of different components of the signal at the same time. These conditions establish the connection between the length of stationarity intervals and a priori information about signal parameters.

S. J. Pliskin, V. A. Karpenko
An Automated Method for Trend Analysis

An automated method designed to handle outliers, missing observations and structural changes in smoothing time series is presented. Smoothing of trend, adjusted for seasonality, is done using Akaike's Information Criterion (AIC) in the L1 (least absolute deviation) context. AIC selects approximations to trend among polynomials of up to degree three or cubic cardinal B-splines with a varying number of knots. A short time series is analyzed to demonstrate the method.

T. Atilgan

Analysis of Spatial Data

Frontmatter
A Test of Spatial Isotropy

In many practical situations the semi-variogram function is a powerful tool in geostatistical analyses of spatial data. Krige (1976) states that an important requirement of such analyses is the ability to detect the presence or absence of isotropy, in which the semi-variogram exhibits the same behaviour in all directions. Geostatisticians have tended to use graphical methods, essentially plotting the sample semi-variogram for different directions. In an early attempt to use statistical methods to detect anisotropy, Matheron (1961) used a χ²-test of homogeneity of the sample semi-variogram in perpendicular directions, though the dependence of the semi-variogram at different lags and in different directions was neglected in that study. More recently Baczkowski and Mardia (1990) presented tests of symmetry using the sample semi-variogram under Gaussian assumptions. A test is here presented which does not require distributional assumptions for the regionalized variable under study.

A. J. Baczkowski

Computational Inference

Frontmatter
Characteristics of Sequential Sampling Plans for Attributes: Algorithms for Exact Computing

This paper deals with sequential sampling plans for attributive inspection. Under a more general definition, any kind of sequential sampling plan is included, especially all curtailed single, double and multiple sampling plans as well as all truncated and non-truncated SPRTs (sequential probability ratio tests). In a natural way (by direct methods) we find a recursive formula to compute the termination probabilities for acceptance and rejection of the whole lot. Exact computer algorithms are given in a pseudo-programming language to calculate these probabilities in both cases, sampling with and without replacement. It is then straightforward to compute exactly the characteristics of any sequential sampling plan, such as the operating characteristic, power function, average sample number or costs, average outgoing quality, average total inspection and so on.

R. Würländer
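
The recursion over states (items inspected, defects found) can be written down directly; below is a sketch for the simplest case, a curtailed single sampling plan with sampling with replacement. The paper's general recursion covers double/multiple plans, SPRTs, and sampling without replacement in the same pattern:

```python
from functools import lru_cache

def curtailed_single_plan(n, c, p):
    """Acceptance/rejection probabilities for a curtailed single
    sampling plan, sampling with replacement (defect probability p).
    State = (items inspected, defects found); the lot is rejected as
    soon as the defect count exceeds c, and accepted once n items
    have been inspected with at most c defects."""
    @lru_cache(maxsize=None)
    def P(i, d):
        if d > c:
            return (0.0, 1.0)            # terminated with rejection
        if i == n:
            return (1.0, 0.0)            # terminated with acceptance
        a1, r1 = P(i + 1, d + 1)         # next item defective
        a0, r0 = P(i + 1, d)             # next item conforming
        return (p * a1 + (1 - p) * a0, p * r1 + (1 - p) * r0)
    return P(0, 0)

print(curtailed_single_plan(n=50, c=2, p=0.03))  # (P(accept), P(reject))
```
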
Exact Experimental Designs Via Stochastic Optimization for Nonlinear Regression Models

The aim of this paper is twofold:
1. to present a new family of optimal design criteria, based on minimization of the expected volumes of exact confidence regions for nonlinear regression model parameters;
2. to compare the efficiencies of stochastic optimization algorithms of the Kiefer-Wolfowitz type with those of deterministic algorithms of the quasi-Newton type, for the minimization of the criteria.

J. P. Vila
A Comparison of Algorithms for Combination of Information in Generally Balanced Designs

Methods are discussed for producing combined estimates of treatments whose effects can be estimated in more than one stratum of a generally balanced design. The algorithms of Nelder (1968), Wilkinson (1970) and Payne & Wilkinson (1977) provide a method that is specific to generally balanced designs. This method is compared with others that do not assume general balance, namely maximum likelihood and residual maximum likelihood (REML), and its computing requirements are compared with the REML implementation of Robinson, Thompson & Digby (1982).

R. W. Payne, S. J. Welham
A Comparison of Non-Negative Estimators for Ratios of Variance Components

The problem of point estimation for ratios of nonnegative variance components in the balanced one-way random effects model is considered. Seven estimators are compared with respect to their biases and mean squared errors (MSE). A new estimator (New) that dominates ML-type estimators in terms of MSE is derived. In conclusion, the New and MINQE estimators are recommended, as they possess smaller MSE even in the presence of nontrivial bias.

J. T. Lee, K. S. Lee
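
For context, a sketch of the classical truncated ANOVA estimator of the ratio in the balanced one-way model, one natural baseline among the estimators compared (the paper's New estimator is not reproduced here):

```python
import numpy as np

def anova_variance_ratio(y):
    """Truncated ANOVA estimator of sigma_a^2 / sigma_e^2 in the
    balanced one-way random effects model; y has shape (a, n):
    a groups with n replicates each."""
    a, n = y.shape
    gm = y.mean(1)
    msb = n * ((gm - y.mean()) ** 2).sum() / (a - 1)      # between groups
    msw = ((y - gm[:, None]) ** 2).sum() / (a * (n - 1))  # within groups
    return max((msb - msw) / (n * msw), 0.0)              # truncated at 0

rng = np.random.default_rng(0)
y = rng.normal(size=(8, 5)) + 0.8 * rng.normal(size=(8, 1))
print(anova_variance_ratio(y))
```
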
Optimal Fit in Non-Parametric Modelling Via Computationally Intensive Inference

We present a criterion for model selection in non-parametric inference based on comparing the confidence intervals of a family of competing estimators (CIC). Application to M-spline density estimation is discussed: here the family of density estimators is indexed by the meta-parameter "number of knots". Numerical examples show the relationship between CIC model choice and model choice based on the AIC and the BIC.

M. Abrahamowicz, A. Ciampi

Notes About New Developments in Statistical Software

Frontmatter
Statistical Models in S

The interactive data analysis and graphics language S (Becker, Chambers and Wilks, 1988) has become a popular environment for both data analysts and research statisticians. A common complaint, however, has concerned the lack of statistical modeling tools, such as those provided by GLIM© or GENSTAT©.

J. Chambers, T. Hastie, D. Pregibon
GLIM4 — Developments in Model Fitting

GLIM was first developed with the simple aim of providing a software tool that would allow the fitting of generalised linear models, with a small number of utilities necessary for practical data analysis. Today, using GLIM 3.77 (Payne et al, 1985), the variety of models that GLIM has been used to fit (Aitkin et al, 1989) is far wider than anyone could have imagined in the early days of release 1. For the new release, the temptation to extend the fitting procedure of GLIM to include models well outside those of GLMs has been resisted. Rather, the intention of the current enhancement is to incorporate facilities that will extend the range of GLMs and their extensions which can be fitted; most effort has been concentrated on extending the model formula syntax, allowing the user greater control and access to system structures, improving the operation and readability of GLIM macros, and providing simple high-quality graphics so that graphical techniques for model checking and display can be used.

B. Francis, M. Green, M. Bradley
Backmatter
Metadata
Title
Compstat
Editors
Professor Dr. Konstantin Momirović
Vesna Mildner, MA
Copyright Year
1990
Publisher
Physica-Verlag HD
Electronic ISBN
978-3-642-50096-1
Print ISBN
978-3-7908-0475-1
DOI
https://doi.org/10.1007/978-3-642-50096-1