
2017 | Book

Robust Multivariate Analysis

Written by: David J. Olive

Publisher: Springer International Publishing

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"


About this book

This text presents methods that are robust to the assumption of a multivariate normal distribution or methods that are robust to certain types of outliers. Instead of using exact theory based on the multivariate normal distribution, the simpler and more applicable large sample theory is given. The text develops some of the first practical robust regression and robust multivariate location and dispersion estimators backed by theory.

The robust techniques are illustrated for methods such as principal component analysis, canonical correlation analysis, and factor analysis. A simple way to bootstrap confidence regions is also provided.

Much of the research on robust multivariate analysis in this book is being published for the first time. The text is suitable for a first course in Multivariate Statistical Analysis or a first course in Robust Statistics. This graduate text is also useful for people who are familiar with the traditional multivariate topics, but want to know more about handling data sets with outliers. Many R programs and R data sets are available on the author’s website.

Table of contents

Frontmatter

Chapter 1. Introduction

Abstract

This chapter gives a brief introduction to multivariate analysis, including some matrix optimization results, mixture distributions, and the special case of the location model. Section 1.2 gives an overview of the book along with a table of abbreviations. Truncated distributions, covered in Section 1.7, will be useful for large sample theory for the location model and for the regression model. See Chapter 14.

David J. Olive

Chapter 2. Multivariate Distributions

Abstract

This chapter describes the multivariate location and dispersion (MLD) model, random vectors, the population mean, the population covariance matrix, and the classical MLD estimators: the sample mean and the sample covariance matrix. Some important results on Mahalanobis distances and the volume of a hyperellipsoid are given. Often methods of multivariate analysis work best when the variables $x_1, ..., x_p$ are linearly related.
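The classical MLD estimators and the resulting Mahalanobis distances can be illustrated with a short sketch. This is a minimal NumPy example, not code from the book (whose programs are in R); the function name `classical_mahalanobis` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def classical_mahalanobis(X):
    """Return (xbar, S, D): the sample mean, the sample covariance matrix,
    and the classical Mahalanobis distance of each row of X from xbar."""
    X = np.asarray(X, dtype=float)
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)              # divisor n - 1
    centered = X - xbar
    # Solve S z = (x_i - xbar) instead of forming the inverse explicitly.
    z = np.linalg.solve(S, centered.T).T
    d2 = np.einsum("ij,ij->i", centered, z)  # squared distances D_i^2
    return xbar, S, np.sqrt(d2)

# Illustrative data only: n = 200 cases, p = 3 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
xbar, S, D = classical_mahalanobis(X)
print(D[:5])
```

A quick sanity check is that the squared distances sum to $(n-1)p$ when $\varvec{S}$ uses the divisor $n-1$.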

David J. Olive

Chapter 3. Elliptically Contoured Distributions

Abstract

This chapter considers elliptically contoured distributions, including the multivariate normal distribution.

David J. Olive

Chapter 4. MLD Estimators

Abstract

This chapter is the most important chapter for outlier robust statistics and covers robust estimators of multivariate location and dispersion. The practical, highly outlier resistant, $\sqrt{n}$ consistent FCH, RFCH, and RMVN estimators of $(\varvec{\mu }, c \varvec{\varSigma })$ are developed along with proofs. The RFCH and RMVN estimators are reweighted versions of the FCH estimator. It is shown why competing “robust estimators” fail to work, are impractical, or are not yet backed by theory. The RMVN and RFCH sets are defined and will be used to create practical robust methods of principal component analysis, canonical correlation analysis, discriminant analysis, factor analysis, and multivariate linear regression in the following chapters.

David J. Olive

Chapter 5. DD Plots and Prediction Regions

Abstract

This chapter examines the DD plot of classical versus robust Mahalanobis distances and develops practical prediction regions for a future test observation $\varvec{x}_f$ that work even if the iid training data $\varvec{x}_1, ..., \varvec{x}_n$ come from an unknown distribution. The prediction regions can be visualized with the DD plot. The classical prediction region assumes that the data are iid from a multivariate normal distribution, and the region tends to have too small a volume if the MVN assumption is violated. The undercoverage of the classical region becomes worse as the number of variables $p$ increases, since the volume of the region $\{ \varvec{x}: D_{\varvec{x}}(\overline{\varvec{x}},{\varvec{S}}) \le h \}$ is proportional to $h^p$. The classical region uses $h_c = \sqrt{\chi ^2_{p, 1-\delta }}$, which tends to be much smaller than the value of $h$ that gives correct coverage.
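The idea of choosing the cutoff $h$ from the data rather than from a chi-square quantile can be sketched as follows. This is a simplified illustration under stated assumptions: it uses the plain empirical $1-\delta$ quantile of the training distances with the classical estimators $(\overline{\varvec{x}}, \varvec{S})$, which is not necessarily the construction developed in the chapter. The names `fit_region` and `in_region` are hypothetical.

```python
import numpy as np

def fit_region(X, delta=0.1):
    """Fit {x : D_x(xbar, S) <= h}, with h taken as the empirical
    (1 - delta) quantile of the training distances (an assumption here)."""
    X = np.asarray(X, dtype=float)
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    diff = X - xbar
    d = np.sqrt(np.einsum("ij,ij->i", diff, np.linalg.solve(S, diff.T).T))
    h = float(np.quantile(d, 1.0 - delta))
    return xbar, S, h

def in_region(x_f, xbar, S, h):
    """True if the future observation x_f falls inside the fitted region."""
    diff = np.asarray(x_f, dtype=float) - xbar
    return bool(np.sqrt(diff @ np.linalg.solve(S, diff)) <= h)

# Illustrative heavy-tailed training data (multivariate normality fails).
rng = np.random.default_rng(0)
X = rng.standard_t(df=3, size=(500, 4))
xbar, S, h = fit_region(X, delta=0.1)
coverage = np.mean([in_region(x, xbar, S, h) for x in X])
print(h, coverage)
```

Because $h$ is a quantile of the observed distances, the training coverage is close to $1-\delta$ by construction, whatever the underlying distribution.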

David J. Olive

Chapter 6. Principal Component Analysis

Abstract

This chapter considers classical and robust principal component analysis (PCA). Principal component analysis is used to explain the dispersion structure with a few linear combinations of the original variables, called principal components. These linear combinations are uncorrelated if $\varvec{S}$ or $\varvec{R}$ is used as the dispersion matrix. The analysis is used for data reduction and interpretation.
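Classical PCA based on $\varvec{S}$ or $\varvec{R}$ can be sketched with an eigendecomposition of the dispersion matrix. The NumPy code below is an illustrative sketch, not the book's implementation; the function name `pca_from_dispersion` and the simulated data are assumptions. A robust variant would replace the dispersion matrix with a robust estimator, which is not shown here.

```python
import numpy as np

def pca_from_dispersion(X, use_correlation=False):
    """Classical PCA from the sample covariance matrix S, or from the
    sample correlation matrix R when use_correlation is True."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    if use_correlation:
        # Standardizing the columns makes the covariance matrix equal R.
        centered = centered / centered.std(axis=0, ddof=1)
    disp = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(disp)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs               # principal components
    return eigvals, eigvecs, scores

# Illustrative correlated data: n = 300 cases, p = 4 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
eigvals, eigvecs, scores = pca_from_dispersion(X)
print(eigvals / eigvals.sum())                # proportion of variance explained
```

The sample covariance matrix of `scores` is diagonal with the eigenvalues on the diagonal, which is the sense in which the principal components are uncorrelated.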

David J. Olive

Chapter 7. Canonical Correlation Analysis

Abstract

This chapter covers classical and robust canonical correlation analysis (CCA). Let $\varvec{x}$ be the $p \times 1$ vector of predictors, and partition ${\varvec{x}} = ({\varvec{w}}^T, {\varvec{y}}^T)^T$ where ${\varvec{w}}$ is $m \times 1$ and ${\varvec{y}}$ is $q \times 1$ with $m = p-q \le q$ and $m, q \ge 1$. If $m = 1$ and $q = 1$, then the canonical correlation is the usual correlation. Hence usually $q > 1$ and $m > 1$. The population canonical correlation analysis seeks $m$ pairs of linear combinations $({\varvec{a}}_1^T {\varvec{w}}, {\varvec{b}}_1^T {\varvec{y}}), ..., ({\varvec{a}}_m^T {\varvec{w}}, {\varvec{b}}_m^T {\varvec{y}})$ such that $\mathrm{corr}({\varvec{a}}_i^T {\varvec{w}}, {\varvec{b}}_i^T {\varvec{y}})$ is large under some constraints on the ${\varvec{a}}_i$ and ${\varvec{b}}_i$ where $i = 1, ..., m$.
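One standard way to compute the sample canonical correlations is as the singular values of the whitened cross-covariance matrix. The sketch below uses the classical sample covariance matrix and Cholesky factors for the whitening; it is an illustration under those assumptions, not the book's code, and the name `canonical_correlations` is hypothetical.

```python
import numpy as np

def canonical_correlations(W, Y):
    """Sample canonical correlations between the columns of W (n x m)
    and the columns of Y (n x q), in decreasing order."""
    W = np.asarray(W, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m = W.shape[1]
    S = np.cov(np.hstack([W, Y]), rowvar=False)
    S_ww, S_wy, S_yy = S[:m, :m], S[:m, m:], S[m:, m:]
    L_w = np.linalg.cholesky(S_ww)            # S_ww = L_w L_w^T
    L_y = np.linalg.cholesky(S_yy)            # S_yy = L_y L_y^T
    # M = L_w^{-1} S_wy L_y^{-T}; its singular values are the correlations.
    M = np.linalg.solve(L_w, S_wy)
    M = np.linalg.solve(L_y, M.T).T
    return np.linalg.svd(M, compute_uv=False)

# Illustrative data: with m = q = 1 the result reduces to |corr(w, y)|.
rng = np.random.default_rng(2)
w = rng.normal(size=250)
y = 0.6 * w + rng.normal(size=250)
rho = canonical_correlations(w[:, None], y[:, None])
print(rho[0], abs(np.corrcoef(w, y)[0, 1]))
```

The $m = q = 1$ case gives a direct check against the usual sample correlation, up to sign.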

David J. Olive

Chapter 8. Discriminant Analysis

Abstract

This chapter considers discriminant analysis: given $p$ measurements $\varvec{w}$, we want to correctly classify $\varvec{w}$ into one of $G$ groups or populations. The maximum likelihood, Bayesian, and Fisher’s discriminant rules are used to show why methods like linear and quadratic discriminant analysis can work well for a wide variety of group distributions.

David J. Olive

Chapter 9. Hotelling’s Test

Abstract

Hotelling’s $T^2$ test is used to test $H_0: {\varvec{\mu }}= {\varvec{\mu }}_0$ when there is one sample, and $H_0: {\varvec{\mu }}_1 = {\varvec{\mu }}_2$ when there are two samples. Other applications include the multivariate matched pairs test and a test in the repeated measurements setting. These tests are robust to nonnormality.
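For the one-sample case, the statistic is $T^2 = n (\overline{\varvec{x}} - {\varvec{\mu }}_0)^T {\varvec{S}}^{-1} (\overline{\varvec{x}} - {\varvec{\mu }}_0)$. A minimal NumPy sketch of this computation follows; the function name `hotelling_t2_one_sample` and the simulated data are assumptions for illustration. For large $n$, $T^2$ is commonly compared with a $\chi ^2_p$ quantile; that comparison is left out here to keep the example dependency-free.

```python
import numpy as np

def hotelling_t2_one_sample(X, mu0):
    """One-sample Hotelling T^2 = n (xbar - mu0)^T S^{-1} (xbar - mu0)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    diff = X.mean(axis=0) - np.asarray(mu0, dtype=float)
    # atleast_2d keeps S a matrix even when there is a single variable.
    S = np.atleast_2d(np.cov(X, rowvar=False))
    return float(n * (diff @ np.linalg.solve(S, diff)))

# Illustrative data: n = 100 cases, p = 3 variables, true mean (1, 2, 3).
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3)) + np.array([1.0, 2.0, 3.0])
t2 = hotelling_t2_one_sample(X, [1.0, 2.0, 3.0])
print(t2)
```

With a single variable the statistic reduces to the square of the usual one-sample $t$ statistic.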

David J. Olive

Chapter 10. MANOVA

Abstract

This chapter considers MANOVA models which are special cases of the multivariate linear model.

David J. Olive

Chapter 11. Factor Analysis

Abstract

Factor analysis gives an approximation of the dispersion matrix

$$\hat{\varvec{\varSigma }} \approx \hat{\varvec{L}}^T \hat{\varvec{L}} + \hat{\varvec{\varPsi }},$$

so $\hat{\varvec{\varSigma }} \approx \hat{\varvec{L}}^T \hat{\varvec{L}}$ if $\hat{\varvec{\varPsi }}$ is small.

David J. Olive

Chapter 12. Multivariate Linear Regression

Abstract

This chapter will show that multivariate linear regression with $m \ge 2$ response variables is nearly as easy to use, at least if $m$ is small, as multiple linear regression, which has one response variable. For multivariate linear regression, at least one predictor variable is quantitative. Plots for checking the model, including outlier detection, are given. Prediction regions that are robust to nonnormality are developed. For hypothesis testing, it is shown that the Wilks’ lambda statistic, Hotelling-Lawley trace statistic, and Pillai’s trace statistic are robust to nonnormality.

David J. Olive

Chapter 13. Clustering

Abstract

Clustering is used to classify the $n$ cases into $k$ groups. Unlike discriminant analysis, it is not known to which group the cases in the training data belong, and often the number of clusters $k$ is unknown. Discriminant analysis is a type of supervised classification while clustering is a type of unsupervised classification.

David J. Olive

Chapter 14. Other Techniques

Abstract

This chapter suggests several other techniques using robust estimators. From the literature, often the “robust method” can be improved by replacing the plug-in estimator (often FMCD, FS, FMM, or FMVE) with RFCH or RMVN. Using the RMVN set $U$ can also be useful. A short list of some techniques is given in Section 14.1, and then more details are given for robust regression and 1D regression. See Table 1.1 for acronyms.

David J. Olive

Chapter 15. Stuff for Students

Abstract

This chapter gives tips for doing research and for using R.

David J. Olive

Backmatter

Title: Robust Multivariate Analysis
Written by: David J. Olive
Publisher: Springer International Publishing
Electronic ISBN: 978-3-319-68253-2
Print ISBN: 978-3-319-68251-8
DOI: https://doi.org/10.1007/978-3-319-68253-2