
2017 | Book

Robust Multivariate Analysis


About this book

This text presents methods that are robust to departures from the assumption of a multivariate normal distribution, or that are robust to certain types of outliers. Instead of exact theory based on the multivariate normal distribution, the simpler and more widely applicable large sample theory is given. The text develops some of the first practical robust regression and robust multivariate location and dispersion estimators that are backed by theory.

The robust techniques are illustrated for methods such as principal component analysis, canonical correlation analysis, and factor analysis. A simple way to bootstrap confidence regions is also provided.

Much of the research on robust multivariate analysis in this book is being published for the first time. The text is suitable for a first course in Multivariate Statistical Analysis or a first course in Robust Statistics. This graduate text is also useful for people who are familiar with the traditional multivariate topics, but want to know more about handling data sets with outliers. Many R programs and R data sets are available on the author’s website.

Table of contents

Frontmatter
Chapter 1. Introduction
Abstract
This chapter gives a brief introduction to multivariate analysis, including some matrix optimization results, mixture distributions, and the special case of the location model. Section 1.2 gives an overview of the book along with a table of abbreviations. Truncated distributions, covered in Section 1.7, will be useful for large sample theory for the location model and for the regression model. See Chapter 14.
David J. Olive
Chapter 2. Multivariate Distributions
Abstract
This chapter describes the multivariate location and dispersion (MLD) model, random vectors, the population mean, the population covariance matrix, and the classical MLD estimators: the sample mean and the sample covariance matrix. Some important results on Mahalanobis distances and the volume of a hyperellipsoid are given. Often methods of multivariate analysis work best when the variables \(x_1, ..., x_p\) are linearly related.
David J. Olive
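
The classical MLD estimators and the Mahalanobis distance named in this abstract can be illustrated with a minimal sketch. It is not taken from the book (whose programs are written in R), it is restricted to p = 2 so that the matrix inverse has a closed form, and the data set and all function names are made up for illustration.

```python
import math

def sample_mean(data):
    """Column-wise sample mean of a list of equal-length rows."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]

def sample_cov(data):
    """Sample covariance matrix S, using the divisor n - 1."""
    n = len(data)
    p = len(data[0])
    xbar = sample_mean(data)
    return [[sum((row[i] - xbar[i]) * (row[j] - xbar[j]) for row in data) / (n - 1)
             for j in range(p)] for i in range(p)]

def det_2x2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inverse_2x2(a):
    """Closed-form inverse; valid only for a nonsingular 2 x 2 matrix."""
    det = det_2x2(a)
    if det == 0:
        raise ValueError("singular matrix")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def mahalanobis(x, center, disp_inv):
    """D_x = sqrt((x - center)^T disp_inv (x - center))."""
    d = [xi - ci for xi, ci in zip(x, center)]
    q = sum(d[i] * disp_inv[i][j] * d[j]
            for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative rounding error

def ellipse_area(h, disp):
    """Area of {x : D_x <= h} when p = 2: pi * h^2 * sqrt(det(disp))."""
    return math.pi * h ** 2 * math.sqrt(det_2x2(disp))

# Hypothetical toy data: n = 5 cases, p = 2 variables.
data = [[2.0, 1.0], [3.0, 4.0], [5.0, 3.0], [6.0, 7.0], [9.0, 8.0]]
xbar = sample_mean(data)
S = sample_cov(data)
S_inv = inverse_2x2(S)
distances = [mahalanobis(x, xbar, S_inv) for x in data]

for x, dist in zip(data, distances):
    print(x, round(dist, 3))
print("area of the region with h = 2:", round(ellipse_area(2.0, S), 3))
```

A useful sanity check for distances computed from the sample mean and sample covariance matrix is that the squared distances sum to (n - 1)p.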
Chapter 3. Elliptically Contoured Distributions
Abstract
This chapter considers elliptically contoured distributions, including the multivariate normal distribution.
David J. Olive
Chapter 4. MLD Estimators
Abstract
This chapter is the most important chapter for outlier robust statistics and covers robust estimators of multivariate location and dispersion. The practical, highly outlier resistant, \(\sqrt{n}\) consistent FCH, RFCH, and RMVN estimators of \((\varvec{\mu }, c \varvec{\varSigma })\) are developed along with proofs. The RFCH and RMVN estimators are reweighted versions of the FCH estimator. It is shown why competing “robust estimators” fail to work, are impractical, or are not yet backed by theory. The RMVN and RFCH sets are defined and will be used to create practical robust methods of principal component analysis, canonical correlation analysis, discriminant analysis, factor analysis, and multivariate linear regression in the following chapters.
David J. Olive
Chapter 5. DD Plots and Prediction Regions
Abstract
This chapter examines the DD plot of classical versus robust Mahalanobis distances and develops practical prediction regions for a future test observation \(\varvec{x}_f\) that work even if the iid training data \(\varvec{x}_1, ..., \varvec{x}_n\) come from an unknown distribution. The prediction regions can be visualized with the DD plot. The classical prediction region assumes that the data are iid from a multivariate normal distribution, and the region tends to have too small a volume if the MVN assumption is violated. The undercoverage of the classical region becomes worse as the number of variables p increases, since the volume of the region \(\{ \varvec{x}: D_{\varvec{x}}(\overline{\varvec{x}},{\varvec{S}}) \le h \}\) is proportional to \(h^p\). The classical region uses \(h_c = \sqrt{\chi ^2_{p, 1-\delta }}\), which tends to be much smaller than the value of h that gives correct coverage.
David J. Olive
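
To make the role of the cutoff concrete, here is a small sketch, not taken from the book, that computes the classical cutoff \(h_c\) and shows how quickly the volume changes with h. It assumes p = 2 for the cutoff, because the chi-square distribution with 2 degrees of freedom has the closed-form quantile \(-2 \ln \delta\). The 10% inflation factor is a hypothetical value chosen only for illustration.

```python
import math

def classical_cutoff_p2(delta):
    """h_c = sqrt(chi^2_{2, 1 - delta}); this closed form holds only for p = 2."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(-2.0 * math.log(delta))

def volume_ratio(h, h_c, p):
    """Ratio of region volumes, using the fact that volume is proportional to h^p."""
    return (h / h_c) ** p

h_c = classical_cutoff_p2(0.05)
print("classical cutoff for p = 2, delta = 0.05:", round(h_c, 4))

# Hypothetical scenario: correct coverage needs a cutoff 10% larger than h_c.
for p in (2, 10, 50):
    print(p, round(volume_ratio(1.1 * h_c, h_c, p), 2))
```

Because the ratio is \((h/h_c)^p\), even a modest shortfall in the cutoff corresponds to a rapidly growing shortfall in volume as p increases.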
Chapter 6. Principal Component Analysis
Abstract
This chapter considers classical and robust principal component analysis (PCA). Principal component analysis is used to explain the dispersion structure with a few linear combinations of the original variables, called principal components. These linear combinations are uncorrelated if \(\varvec{S}\) or \(\varvec{R}\) is used as the dispersion matrix. The analysis is used for data reduction and interpretation.
David J. Olive
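
As a minimal illustration of classical PCA, and not code from the book, the sketch below takes a 2 x 2 dispersion matrix, finds its eigenvalues and eigenvectors in closed form, and checks that the two principal components are uncorrelated. The matrix `S` is a made-up example, and the function names are hypothetical.

```python
import math

def eigen_sym_2x2(s):
    """Eigenvalues (largest first) and unit eigenvectors of a symmetric 2 x 2 matrix."""
    a, b, d = s[0][0], s[0][1], s[1][1]
    mid = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    lam1, lam2 = mid + radius, mid - radius
    if b != 0:
        v1 = (b, lam1 - a)          # solves (S - lam1 I) v = 0
    elif a >= d:
        v1 = (1.0, 0.0)             # diagonal matrix, first variable dominates
    else:
        v1 = (0.0, 1.0)
    norm = math.hypot(v1[0], v1[1])
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])            # orthogonal to v1, hence the other eigenvector
    return (lam1, lam2), (v1, v2)

def bilinear(u, s, v):
    """u^T S v: the covariance between u^T x and v^T x when Cov(x) = S."""
    return sum(u[i] * s[i][j] * v[j] for i in range(2) for j in range(2))

# Hypothetical sample covariance matrix of two variables.
S = [[4.0, 2.0], [2.0, 3.0]]
(lam1, lam2), (v1, v2) = eigen_sym_2x2(S)

explained = lam1 / (lam1 + lam2)    # share of total variance in the first component
cross_cov = bilinear(v1, S, v2)     # should be zero up to rounding
print("eigenvalues:", round(lam1, 4), round(lam2, 4))
print("proportion explained by PC1:", round(explained, 4))
print("covariance between PC1 and PC2:", round(cross_cov, 12))
```

The variance of each component equals its eigenvalue, and the eigenvalues sum to the trace of the dispersion matrix, which is why the leading share can be read as data reduction.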
Chapter 7. Canonical Correlation Analysis
Abstract
This chapter covers classical and robust canonical correlation analysis (CCA). Let \(\varvec{x}\) be the \(p \times 1\) vector of predictors, and partition \({\varvec{x}} = ({\varvec{w}}^T, {\varvec{y}}^T)^T\) where \({\varvec{w}}\) is \(m \times 1\) and \({\varvec{y}}\) is \(q \times 1\) with \(m = p-q \le q\) and \(m, q \ge 1\). If \(m = 1\) and \(q = 1\), then the canonical correlation is the usual correlation. Hence usually \(q > 1\) and \(m > 1\). The population canonical correlation analysis seeks m pairs of linear combinations \(({\varvec{a}}_1^T {\varvec{w}}, {\varvec{b}}_1^T {\varvec{y}}), ..., ({\varvec{a}}_m^T {\varvec{w}}, {\varvec{b}}_m^T {\varvec{y}})\) such that corr(\({\varvec{a}}_i^T {\varvec{w}}, {\varvec{b}}_i^T {\varvec{y}})\) is large under some constraints on the \({\varvec{a}}_i\) and \({\varvec{b}}_i\) where \(i = 1, ..., m\).
David J. Olive
Chapter 8. Discriminant Analysis
Abstract
This chapter considers discriminant analysis: given p measurements \(\varvec{w}\), we want to correctly classify \(\varvec{w}\) into one of G groups or populations. The maximum likelihood, Bayesian, and Fisher’s discriminant rules are used to show why methods like linear and quadratic discriminant analysis can work well for a wide variety of group distributions.
David J. Olive
Chapter 9. Hotelling’s Test
Abstract
Hotelling’s \(T^2\) test is used to test \(H_0: {\varvec{\mu }}= {\varvec{\mu }}_0\) when there is one sample, and \(H_0: {\varvec{\mu }}_1 = {\varvec{\mu }}_2\) when there are two samples. Other applications include the multivariate matched pairs test and a test in the repeated measurements setting. These tests are robust to nonnormality.
David J. Olive
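
The one-sample statistic can be sketched as follows. This is an illustration rather than the book's code: it uses the standard form \(T^2 = n(\overline{\varvec{x}} - {\varvec{\mu }}_0)^T {\varvec{S}}^{-1} (\overline{\varvec{x}} - {\varvec{\mu }}_0)\), it is restricted to p = 2, and it assumes a large-sample chi-square approximation with 2 degrees of freedom for the p-value, whose survival function is \(e^{-t/2}\). The data and the hypothesized mean are made up.

```python
import math

def hotelling_t2_one_sample(data, mu0):
    """One-sample Hotelling T^2 for p = 2 variables."""
    n = len(data)
    xbar = [sum(row[j] for row in data) / n for j in range(2)]
    s = [[sum((row[i] - xbar[i]) * (row[j] - xbar[j]) for row in data) / (n - 1)
          for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det <= 0:
        raise ValueError("sample covariance matrix is singular")
    s_inv = [[s[1][1] / det, -s[0][1] / det],
             [-s[1][0] / det, s[0][0] / det]]
    d = [xbar[0] - mu0[0], xbar[1] - mu0[1]]
    return n * sum(d[i] * s_inv[i][j] * d[j] for i in range(2) for j in range(2))

def chi2_sf_2df(t):
    """P(chi^2_2 > t); this closed form holds only for 2 degrees of freedom."""
    return math.exp(-t / 2.0) if t > 0 else 1.0

# Hypothetical data (n = 6, p = 2) and hypothesized mean.
data = [[2.0, 1.0], [3.0, 4.0], [5.0, 3.0], [6.0, 7.0], [9.0, 8.0], [4.0, 5.0]]
mu0 = [3.0, 3.0]

t2 = hotelling_t2_one_sample(data, mu0)
print("T^2:", round(t2, 4))
print("approximate large-sample p-value:", round(chi2_sf_2df(t2), 4))
```

With a sample this small the chi-square approximation is rough. Under a multivariate normal assumption, the usual exact alternative rescales \(T^2\) to an F statistic.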
Chapter 10. MANOVA
Abstract
This chapter considers MANOVA models which are special cases of the multivariate linear model.
David J. Olive
Chapter 11. Factor Analysis
Abstract
Factor analysis gives an approximation of the dispersion matrix
$$\hat{\varvec{\varSigma }} \approx \hat{\varvec{L}}^T \hat{\varvec{L}} + \hat{\varvec{\varPsi }},$$
so \(\hat{\varvec{\varSigma }} \approx \hat{\varvec{L}}^T \hat{\varvec{L}}\) if \(\hat{\varvec{\varPsi }}\) is small.
David J. Olive
Chapter 12. Multivariate Linear Regression
Abstract
This chapter will show that multivariate linear regression with \(m \ge 2\) response variables is nearly as easy to use, at least if m is small, as multiple linear regression which has 1 response variable. For multivariate linear regression, at least one predictor variable is quantitative. Plots for checking the model, including outlier detection, are given. Prediction regions that are robust to nonnormality are developed. For hypothesis testing, it is shown that the Wilks’ lambda statistic, Hotelling Lawley trace statistic, and Pillai’s trace statistic are robust to nonnormality.
David J. Olive
Chapter 13. Clustering
Abstract
Clustering is used to classify the n cases into k groups. Unlike discriminant analysis, it is not known to which group the cases in the training data belong, and often the number of clusters k is unknown. Discriminant analysis is a type of supervised classification while clustering is a type of unsupervised classification.
David J. Olive
Chapter 14. Other Techniques
Abstract
This chapter suggests several other techniques using robust estimators. From the literature, often the “robust method” can be improved by replacing the plug in estimator (often FMCD, FS, FMM, or FMVE) with RFCH or RMVN. Using the RMVN set U can also be useful. A short list of some techniques is given in Section 14.1, and then more details are given for robust regression and 1D regression. See Table 1.1 for acronyms.
David J. Olive
Chapter 15. Stuff for Students
Abstract
This chapter gives tips for doing research and for using R.
David J. Olive
Backmatter
Metadata
Title
Robust Multivariate Analysis
Author
David J. Olive
Copyright year
2017
Electronic ISBN
978-3-319-68253-2
Print ISBN
978-3-319-68251-8
DOI
https://doi.org/10.1007/978-3-319-68253-2