
1987 | Book

Multivariate Statistical Modeling and Data Analysis

Proceedings of the Advanced Symposium on Multivariate Modeling and Data Analysis May 15–16, 1986

Edited by: H. Bozdogan, A. K. Gupta

Publisher: Springer Netherlands

Book series: Theory and Decision Library


About this book

This volume contains the Proceedings of the Advanced Symposium on Multivariate Modeling and Data Analysis held at the 64th Annual Meeting of the Virginia Academy of Sciences (VAS)–American Statistical Association's Virginia Chapter at James Madison University in Harrisonburg, Virginia, during May 15–16, 1986. The symposium was sponsored through financial support from the Center for Advanced Studies at the University of Virginia to promote new, modern information-theoretic statistical modeling procedures and to blend these new techniques with classical theory. Multivariate statistical analysis has come a long way, and it is currently evolving in the era of high-speed computation and computer technology. The Advanced Symposium was the first to address innovative approaches in multivariate analysis aimed at developing modern, analytical, yet practical procedures that meet the needs of researchers and society's need for statistics.

Papers presented at the Symposium by eminent researchers in the field were geared not just toward specialists in statistics; an attempt was made to achieve well-balanced and uniform coverage of different areas in multivariate modeling and data analysis. The areas covered included the analysis of repeated measurements, cluster analysis, discriminant analysis, canonical correlations, distribution theory and testing, bivariate density estimation, factor analysis, principal component analysis, multidimensional scaling, multivariate linear models, nonparametric regression, and more.

Table of contents

Frontmatter
1. On the Application of AIC to Bivariate Density Estimation, Nonparametric Regression and Discrimination
Abstract
Some simple data analytic procedures are available for bivariate nonparametric density estimation. If we use a linear approximation of specified basis functions then the coefficients can be estimated by the EM algorithm, and the number of terms judged by Akaike’s information criterion. The method also yields readily compatible approaches to nonparametric regression and logistic discrimination. Tukey’s energy consumption data and a psychological test for 25 normal and 25 psychotic patients are re-analyzed and the current methodology compared with previous procedures. The procedures offer many possible applications in the biomedical area, which are discussed in Sections 5 and 6; for example, it is possible to analyze noisy data sets in situations where structured regression techniques would typically fail.
Taskin Atilgan, Tom Leonard
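For reference on the selection rule used in the chapter above: Akaike’s information criterion in its standard form penalizes the maximized log-likelihood by the number of estimated parameters, and the number of basis terms is taken as the minimizer. A minimal statement (the particular basis functions and likelihood are the chapter’s own and are not reproduced here):

```latex
\mathrm{AIC}(m) \;=\; -2\,\log L(\hat{\theta}_m) \;+\; 2\,k_m ,
\qquad
\hat{m} \;=\; \arg\min_{m}\ \mathrm{AIC}(m),
```

where L(θ̂_m) is the likelihood of the m-term approximation maximized (here via the EM algorithm) and k_m is the number of estimated coefficients.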
2. On the Interface between Cluster Analysis, Principal Component Analysis, and Multidimensional Scaling
Abstract
This paper shows how methods of cluster analysis, principal component analysis, and multidimensional scaling may be combined in order to obtain an optimal fit between a classification underlying some set of objects 1,…,n and its visual representation in a low-dimensional Euclidean space ℝs. We propose several clustering criteria and corresponding k-means-like algorithms which are based either on a probabilistic model or on geometrical considerations leading to matrix approximation problems. In particular, an MDS-clustering strategy is presented for displaying not only the n objects using their pairwise dissimilarities, but also the detected clusters and their average distances.
H. H. Bock
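As a concrete, hedged illustration of the interface studied in the chapter above, the sketch below embeds a dissimilarity matrix with metric MDS and then partitions the embedded points with k-means, using scikit-learn (assumed available). It shows only the generic MDS-then-cluster combination, not the probabilistic or matrix-approximation criteria developed in the chapter.

```python
# Embed pairwise dissimilarities in a low-dimensional Euclidean space with
# metric MDS, then partition the embedded points with a k-means algorithm.
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

def mds_kmeans(D, n_clusters=3, n_dims=2, seed=0):
    """D: (n, n) symmetric dissimilarity matrix with zero diagonal."""
    coords = MDS(n_components=n_dims, dissimilarity="precomputed",
                 random_state=seed).fit_transform(D)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(coords)
    return coords, labels

# Toy usage: three well-separated groups, dissimilarities = Euclidean distances.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(20, 5)) for c in (0.0, 2.0, 4.0)])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
coords, labels = mds_kmeans(D, n_clusters=3)
```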
3. An Expert Model Selection Approach to Determine the “Best” Pattern Structure in Factor Analysis Models
Abstract
This paper introduces and develops an expert data-analytic model selection approach based on Akaike’s Information Criterion (AIC) and asymptotically Consistent Akaike’s Information Criterion (CAIC), to choose the number of factors, m, and to determine the “best” factor pattern structure among all possible patterns under the orthogonal factor model using Mallows’ Cp Criterion.
A subset selection procedure is carried out using a “leaps and bounds” algorithm to interpret the complex interrelationships between the best fitting number of factors and the original variables.
The new approach presented in this paper tries to unify both the exploratory and confirmatory factor analysis to find “the best fitting simple structure for the best m-factor model” in one expert statistical system.
Numerical examples are provided to show how to achieve flexibility in modeling and to demonstrate the efficiency of this procedure.
Hamparsum Bozdogan, Donald E. Ramirez
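For reference, the two criteria named in the chapter above have the standard forms

```latex
\mathrm{AIC} \;=\; -2\,\log L(\hat{\theta}) \;+\; 2k,
\qquad
\mathrm{CAIC} \;=\; -2\,\log L(\hat{\theta}) \;+\; k\,(\log n + 1),
```

where k is the number of freely estimated parameters and n the sample size; for an orthogonal m-factor model on p observed variables the usual parameter count is k = p(m+1) − m(m−1)/2. The combination with Mallows’ Cp and the leaps-and-bounds subset search is the chapter’s own contribution and is not sketched here.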
4. BLUS Residuals in Multivariate Linear Models
Abstract
The examination of residuals is always an important aspect of fitting data to statistical models in terms of identifying influential observations and detecting violations of assumptions. The latter use is difficult to perform for the ordinary residuals in the univariate linear model because these residuals are not independent. This led researchers to consider alternative sets of residuals, such as the best linear, unbiased, scalar-type variance (BLUS) residuals. In this article the definition of BLUS residuals is extended to multivariate models. The extension is relatively straightforward for the multivariate analysis of variance (MANOVA) model, but not for the generalized multivariate analysis of variance (GMANOVA) model and the mixed MANOVA-GMANOVA model. For each of the GMANOVA and mixed MANOVA-GMANOVA models two sets of BLUS residuals arise naturally, namely, a “between” set and a “within” set.
Vernon M. Chinchilli
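As background for the chapter above, a sketch of the univariate BLUS construction (Theil’s best linear unbiased scalar-covariance residuals); the extensions to the MANOVA, GMANOVA, and mixed models are the chapter’s subject and are not reproduced here. For y = Xβ + ε with X of full rank q and Cov(ε) = σ²Iₙ, a BLUS residual vector is

```latex
\tilde{e} \;=\; A' y, \qquad A'X = 0, \qquad A'A = I_{n-q},
```

so that ẽ = A'ε has mean zero and scalar covariance matrix σ²I_{n−q}, with A chosen to make ẽ as close as possible (in mean squared error) to a subvector of the true disturbances; by contrast, the ordinary residuals (I − X(X'X)⁻¹X')y are correlated.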
5. Analysis of Within- and Across-Subject Correlations
Abstract
In some fields of application, response variables are measured on k (k > 1) independent samples for each experimental subject. For this type of situation, within-subject and across-subject correlation matrices are defined and methods of analysis are discussed.
The maximum likelihood estimators for the two different correlation matrices are obtained, and the exact test for within-subject correlation and two approximate tests for across-subject correlation are proposed.
Simulation studies for bivariate distributions suggest that the estimators are satisfactory, although the across-subject correlation coefficients are somewhat underestimated. The studies also show that the two approximate tests are adequate in terms of size and power. Other properties of the estimators and the tests are discussed.
Sung C. Choi, Vernon M. Chinchilli
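The sketch below is an illustration only: one simple moment-based way to compute a within-subject and an across-subject correlation matrix from data with k replicate measurements per subject. The chapter’s maximum likelihood estimators and its exact and approximate tests are not reproduced here.

```python
# Y has shape (n_subjects, k_replicates, p_variables).  Within-subject
# correlation is computed from deviations about each subject's own mean;
# across-subject correlation from the subject-level means.
import numpy as np

def within_across_corr(Y):
    subj_means = Y.mean(axis=1)                      # (n, p) subject means
    within_dev = Y - subj_means[:, None, :]          # within-subject deviations
    within = np.corrcoef(within_dev.reshape(-1, Y.shape[2]), rowvar=False)
    across = np.corrcoef(subj_means, rowvar=False)
    return within, across

# Toy usage: 30 subjects, 4 replicates, bivariate response.
rng = np.random.default_rng(1)
Y = rng.normal(size=(30, 4, 2))
W, A = within_across_corr(Y)
```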
6. Two-Stage Multi-Sample Cluster Analysis as a General Approach to Discriminant Analysis
Abstract
This paper introduces Two-Stage Multi-Sample Cluster Analysis (TSMSCA), i.e., the problem of grouping samples and improving upon homogeneity via reassigning individual objects, as a general approach to ‘classical’ discriminant analysis (DA).
Akaike’s Information Criterion (AIC) and Bozdogan’s CAIC are derived and used in TSMSCA to choose the best fitting model and the best partition among all possible clustering alternatives. With this approach the dimension of the discriminant space is determined, and using a decision-tree classifier, the best lower dimensional models are identified, yielding a hierarchy of efficient separation and assignment rules. On each step of the hierarchy, the performance of the classification of the best discriminant model is evaluated either by a cross-validation method or the method of conditional clustering.
Cross-validation reassigns one object at a time based only on the tentatively updated model, whereas the conditional clustering method actually executes reassignments of objects via a transfer and swapping algorithm given the best discriminant model as the initial partition.
Numerical examples are carried out on real data sets to demonstrate the generality and versatility of the proposed new approach.
Dorothea Eisenblätter, Hamparsum Bozdogan
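The sketch below illustrates only the criterion-evaluation step of such an approach: candidate groupings of several samples are scored under a Gaussian model (common mean and covariance within each group) by AIC and CAIC, and the grouping with the smallest value is preferred. The chapter’s full two-stage procedure, including the transfer-and-swapping reassignment of individual objects, is not reproduced.

```python
# Score candidate partitions of a set of samples by AIC / CAIC under a
# multivariate normal model fitted separately to each block of the partition.
import numpy as np

def gaussian_loglik(X):
    """ML log-likelihood of X (n x d) under a single multivariate normal."""
    n, d = X.shape
    S = np.cov(X, rowvar=False, bias=True) + 1e-8 * np.eye(d)  # ML covariance
    diff = X - X.mean(axis=0)
    _, logdet = np.linalg.slogdet(S)
    quad = np.einsum("ij,jk,ik->", diff, np.linalg.inv(S), diff)
    return -0.5 * (n * d * np.log(2 * np.pi) + n * logdet + quad)

def score_partition(samples, partition):
    """samples: list of (n_i, d) arrays; partition: list of tuples of indices."""
    d = samples[0].shape[1]
    n = sum(len(s) for s in samples)
    loglik = sum(gaussian_loglik(np.vstack([samples[i] for i in block]))
                 for block in partition)
    k = len(partition) * (d + d * (d + 1) // 2)      # mean + covariance per block
    return {"AIC": -2 * loglik + 2 * k,
            "CAIC": -2 * loglik + k * (np.log(n) + 1)}

# Toy usage: three samples, compare "all separate" with "first two merged".
rng = np.random.default_rng(2)
samples = [rng.normal(m, 1.0, size=(40, 3)) for m in (0.0, 0.1, 3.0)]
print(score_partition(samples, [(0,), (1,), (2,)]))
print(score_partition(samples, [(0, 1), (2,)]))
```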
7. On Relationship Between the AIC and the Overall Error Rates for Selection of Variables in a Discriminant Analysis
Summary
This paper deals with the problem of selecting the “best” subset of variables in a discriminant analysis with the aim of allocating future observations, in the context of two multivariate normal populations with the same covariance matrix. We consider the methods based on the following three criteria: (i) the AIC for the “no additional information” model, (ii) the overall error rate criterion based on the linear classification statistic and (iii) the overall error rate criterion based on the ML classification statistic. It is shown that there is a close relationship between the AIC and the overall error rate criteria.
Yasunori Fujikoshi
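For orientation, the linear classification statistic referred to in (ii) above is, in its usual plug-in form,

```latex
W(x) \;=\; \Bigl(x - \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2)\Bigr)' S^{-1} (\bar{x}_1 - \bar{x}_2),
```

with x assigned to the first population when W(x) > 0; a common plug-in estimate of the associated overall error rate (equal priors) is Φ(−D̂/2), where D̂² = (x̄₁ − x̄₂)'S⁻¹(x̄₁ − x̄₂) is the sample Mahalanobis distance. The precise relationship of these error-rate criteria to the AIC for the “no additional information” model is the chapter’s result and is not restated here.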
8. Distribution of Likelihood Criteria and Box Approximation
Abstract
In this paper, the exact distribution of a random variable whose moments are a certain function of gamma functions (Box, 1949), has been derived. It is shown that Box’s asymptotic expansion can be obtained from this exact distribution by collecting terms of the same order. From the point of view of computation, the derived series has a distinct advantage over the results of Box since the coefficients satisfy a recurrence relation.
A. K. Gupta, J. Tang
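Schematically, and only as a pointer to Box’s framework rather than the chapter’s exact statement, the statistics in question are variables W in (0, 1] whose moments are ratios of gamma functions linear in the moment order,

```latex
E\,[W^{h}] \;=\; K\,\omega^{h}\,
\frac{\prod_{k} \Gamma\!\bigl(x_k (1+h) + \xi_k\bigr)}
     {\prod_{j} \Gamma\!\bigl(y_j (1+h) + \eta_j\bigr)},
\qquad h = 0, 1, 2, \ldots
```

with constants K, ω, x_k, ξ_k, y_j, η_j determined by the particular likelihood criterion. Box (1949) approximates the distribution of the scaled log-criterion by an asymptotic chi-square expansion, whereas the chapter derives the exact distribution and a recursively computable series from the same moment structure.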
9. Topics in the Analysis of Repeated Measurements
Abstract
This study is concerned with the analysis of repeated scalar measurements having r ≥ 1 repetitions within cells of a two-way array. Alternative models are considered for dependencies among observations within subjects, and analytical methods are identified as appropriate for each. Procedures for multiple comparisons, for analyzing factorial experiments, and for other nonstandard tests are featured. Emphasis is given to the validity and efficiency of the several procedures considered. Nonparametric and robust aspects of relevant normal-theory tests are discussed with reference to the analysis of repeated measurements.
D. R. Jensen
10. Metric Considerations in Clustering: Implications for Algorithms
Abstract
Given measurements on p variables for each of n individuals, aspects of the problem of clustering the individuals are considered. Special attention is given to models based upon mixtures of distributions, especially multivariate normal distributions. The relationship between the orientation(s) of the clusters and the nature of the within-cluster covariance matrices is reviewed, as is the inadequacy of transformation to principal components based on the overall (total) covariance matrix of the whole (mixed) sample. The nature of certain iterative algorithms is discussed; variations which result from allowing different covariance matrices within clusters are studied.
Stanley L. Sclove
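A minimal sketch of the covariance-structure choice discussed above: normal-mixture clusterings fitted by EM that either force a common within-cluster covariance matrix (“tied”) or allow each cluster its own (“full”), using scikit-learn (assumed available). This illustrates only that modeling choice, not the chapter’s specific algorithms or metric analysis.

```python
# Fit two-component normal mixtures with tied vs. cluster-specific covariance
# matrices; differently oriented, elongated clusters make the distinction visible.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], [[4.0, 1.8], [1.8, 1.0]], size=150),
    rng.multivariate_normal([6.0, 0.0], [[1.0, -0.8], [-0.8, 4.0]], size=150),
])

for cov_type in ("tied", "full"):
    gm = GaussianMixture(n_components=2, covariance_type=cov_type,
                         random_state=0).fit(X)
    labels = gm.predict(X)
    print(cov_type, "BIC:", round(gm.bic(X), 1),
          "cluster sizes:", np.bincount(labels))
```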
Backmatter
Metadata
Title
Multivariate Statistical Modeling and Data Analysis
Edited by
H. Bozdogan
A. K. Gupta
Copyright year
1987
Publisher
Springer Netherlands
Electronic ISBN
978-94-009-3977-6
Print ISBN
978-94-010-8264-8
DOI
https://doi.org/10.1007/978-94-009-3977-6