1986 | Book

Principal Component Analysis

Author: I. T. Jolliffe

Publisher: Springer New York

Book series: Springer Series in Statistics

About this book

Principal component analysis is probably the oldest and best known of the techniques of multivariate analysis. It was first introduced by Pearson (1901), and developed independently by Hotelling (1933). Like many multivariate methods, it was not widely used until the advent of electronic computers, but it is now well entrenched in virtually every statistical computer package. The central idea of principal component analysis is to reduce the dimensionality of a data set in which there are a large number of interrelated variables, while retaining as much as possible of the variation present in the data set. This reduction is achieved by transforming to a new set of variables, the principal components, which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables. Computation of the principal components reduces to the solution of an eigenvalue-eigenvector problem for a positive-semidefinite symmetric matrix. Thus, the definition and computation of principal components are straightforward but, as will be seen, this apparently simple technique has a wide variety of different applications, as well as a number of different derivations. Any feelings that principal component analysis is a narrow subject should soon be dispelled by the present book; indeed some quite broad topics which are related to principal component analysis receive no more than a brief mention in the final two chapters.
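The eigenvalue-eigenvector computation mentioned above is easy to sketch. The following minimal example (Python with NumPy; the synthetic data and all names are illustrative assumptions, not taken from the book) forms a sample covariance matrix and extracts its eigenvalues and eigenvectors to obtain principal components:

```python
import numpy as np

# Hypothetical data: 100 observations on 5 interrelated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] += 0.8 * X[:, 0]          # induce some correlation

Xc = X - X.mean(axis=0)           # centre each variable
S = (Xc.T @ Xc) / (len(X) - 1)    # sample covariance matrix (symmetric, PSD)

# eigh handles symmetric matrices and returns eigenvalues in ascending
# order, so reverse to put the largest-variance component first.
eigvals, eigvecs = np.linalg.eigh(S)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

scores = Xc @ eigvecs             # principal component scores (uncorrelated)
print(eigvals / eigvals.sum())    # proportion of variance per component
```

The first few entries of the printed vector show how much of the total variation the leading components retain.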

Table of contents

Frontmatter
Chapter 1. Introduction
Abstract
The central idea of principal component analysis (PCA) is to reduce the dimensionality of a data set which consists of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables.
I. T. Jolliffe
Chapter 2. Mathematical and Statistical Properties of Population Principal Components
Abstract
In this chapter many of the mathematical and statistical properties of PCs are described, based on a known population covariance (or correlation) matrix Σ. Further properties are included in Chapter 3 but in the context of sample, rather than population, PCs. As well as being derived from a statistical viewpoint, PCs can be found using purely mathematical arguments; they are given by an orthogonal linear transformation of a set of variables which optimizes a certain algebraic criterion. In fact, the PCs optimize several different algebraic criteria and these optimization properties, together with their statistical implications, are described in the first section of the chapter.
I. T. Jolliffe
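As a brief illustration of the optimization property mentioned in the Chapter 2 abstract, here is the standard variance-maximization derivation of the first PC, sketched from well-known theory rather than quoted from the chapter. The first PC is the normalized linear combination of the variables with maximal variance:

```latex
\max_{\alpha_1}\ \alpha_1' \Sigma \alpha_1
\quad \text{subject to} \quad \alpha_1' \alpha_1 = 1 .
```

Introducing a Lagrange multiplier \(\lambda\) and differentiating gives \(\Sigma \alpha_1 = \lambda \alpha_1\), so \(\alpha_1\) must be an eigenvector of \(\Sigma\); since the attained variance is \(\alpha_1' \Sigma \alpha_1 = \lambda\), the maximum is achieved by the eigenvector belonging to the largest eigenvalue.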
Chapter 3. Mathematical and Statistical Properties of Sample Principal Components
Abstract
The first part of this chapter will be similar in structure to Chapter 2, except that it will deal with properties of PCs obtained from a sample covariance (or correlation) matrix, rather than from a population covariance (or correlation) matrix. The first two sections of the chapter, as in Chapter 2, describe respectively many of the algebraic and geometric properties of PCs. Most of the properties discussed in Chapter 2 are almost the same for samples as for populations, and will only be mentioned again briefly. There are, however, some additional properties which are relevant only to sample PCs and these will be discussed more fully.
I. T. Jolliffe
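A computational aside, not drawn from the chapter itself: sample PCs are conveniently obtained from the singular value decomposition of the centred data matrix, whose right singular vectors are the eigenvectors of the sample covariance matrix. A minimal sketch in Python with NumPy:

```python
import numpy as np

def sample_pca(X):
    """Sample PCs via the SVD of the centred data matrix.
    Rows of X are observations; an illustrative sketch only."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    # Columns of Vt.T are the loading vectors; singular values relate to
    # sample eigenvalues by l_k = s_k**2 / (n - 1).
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenvalues = s**2 / (n - 1)
    scores = Xc @ Vt.T
    return eigenvalues, Vt.T, scores
```

Working from the SVD avoids forming the covariance matrix explicitly, which is numerically preferable when p is large.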
Chapter 4. Principal Components as a Small Number of Interpretable Variables: Some Examples
Abstract
The original purpose of PCA was to reduce a large number (p) of variables to a much smaller number (m) of PCs whilst retaining as much as possible of the variation in the p original variables. The technique is especially useful if m ≪ p, and if the m PCs can be readily interpreted.
I. T. Jolliffe
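Interpretation of the m retained PCs usually proceeds by inspecting their loadings: variables with large coefficients on a component suggest what that component measures. A hypothetical sketch (the variable names, loading values, and 0.4 cut-off below are invented for illustration):

```python
import numpy as np

# Invented loadings for p = 4 variables and m = 2 retained PCs.
variables = ["height", "weight", "age", "income"]
loadings = np.array([[0.70, -0.10],
                     [0.68, -0.05],
                     [0.15,  0.72],
                     [0.12,  0.68]])

for k in range(loadings.shape[1]):
    big = [v for v, a in zip(variables, loadings[:, k]) if abs(a) > 0.4]
    print(f"PC{k + 1} loads mainly on: {', '.join(big)}")
```

Here the first PC would be read as an overall 'size' variable and the second as an age/income contrast.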
Chapter 5. Graphical Representation of Data Using Principal Components
Abstract
The main objective of a PCA is to reduce the dimensionality of a set of data. This is particularly advantageous if a set of data with many variables lies, in reality, close to a two-dimensional subspace (plane). In this case the data can be plotted with respect to these two dimensions, thus giving a straightforward visual representation of what the data look like, instead of having a large mass of numbers to digest. If the data fall close to a three-dimensional subspace it is still possible, with a little effort, to gain a good visual impression of the data, especially if a computer is available with interactive graphics. Even with slightly more dimensions it is possible, with some degree of ingenuity, to get a ‘picture’ of the data—see, for example, Chapters 10–12 (by Tukey and Tukey) in Barnett (1981)—although we shall concentrate almost entirely on two-dimensional representations in the present chapter.
I. T. Jolliffe
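A minimal version of the two-dimensional representation described above, assuming matplotlib for plotting (the synthetic data are illustrative only):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 6))             # hypothetical data on 6 variables
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                   # scores on the first two PCs

plt.scatter(scores[:, 0], scores[:, 1])
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.show()
```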
Chapter 6. Choosing a Subset of Principal Components or Variables
Abstract
In this chapter two separate, but related, topics are considered, both of which are concerned with choosing a subset of variables. In the first section, the choice to be examined is how many PCs adequately account for the total variation in x. The major objective in many applications of PCA is to replace the p elements of x by a much smaller number, m, of PCs, which nevertheless discard very little information. It is crucial to know how small m can be taken without serious information loss. Various rules, mostly ad hoc, have been proposed for determining a suitable value of m, and these are discussed in Section 6.1. Examples of their use are given in Section 6.2.
I. T. Jolliffe
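The simplest of the ad hoc rules of this kind retains the smallest number of components accounting for a chosen percentage of total variation. A sketch (the 80% threshold is an arbitrary assumption, not a recommendation from the book):

```python
import numpy as np

def choose_m(eigenvalues, threshold=0.80):
    """Smallest m whose first m components account for at least
    `threshold` of the total variation (cumulative-percentage rule)."""
    frac = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    return int(np.searchsorted(frac, threshold) + 1)

print(choose_m(np.array([4.0, 2.0, 1.0, 0.5, 0.5])))  # -> 3 (87.5% retained)
```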
Chapter 7. Principal Component Analysis and Factor Analysis
Abstract
Principal component analysis has often been dealt with in textbooks as a special case of factor analysis, and this tendency has been continued by many computer packages which treat PCA as one option in a program for factor analysis—see Appendix A2. This view is misguided since PCA and factor analysis, as usually defined, are really quite distinct techniques. The confusion may have arisen, in part, because of Hotelling’s (1933) original paper, in which principal components were introduced in the context of providing a small number of ‘more fundamental’ variables which determine the values of the p original variables. This is very much in the spirit of the factor model introduced in Section 7.1, although Girschick (1936) indicates that there were soon criticisms of Hotelling’s method of PCs, as being inappropriate for factor analysis. Further confusion results from the fact that practitioners of ‘factor analysis’ do not always have the same definition of the technique (see Jackson, 1981). The definition adopted in this chapter is, however, fairly standard.
I. T. Jolliffe
Chapter 8. Principal Components in Regression Analysis
Abstract
As illustrated in the other chapters of this book, research continues into a wide variety of methods of using PCA in analysing various types of data. However, in no area has this research been more active in recent years than in investigating approaches to regression analysis which use PCs in some form or another.
I. T. Jolliffe
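One such approach is principal component regression: regress the response on the first m PC scores of the regressors, then map the coefficients back to the original variables. A hedged sketch (the synthetic data and the choice m = 2 are assumptions of this example):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, m = 100, 6, 2
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)    # near-collinear regressors
y = X @ np.array([1.0, 1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n)

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:m].T                                # first m PC scores

# OLS of (centred) y on the uncorrelated scores, mapped back to the
# original variables; discarding low-variance PCs stabilizes the fit.
gamma = np.linalg.lstsq(Z, y - y.mean(), rcond=None)[0]
beta = Vt[:m].T @ gamma
print(beta)
```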
Chapter 9. Principal Components Used with Other Multivariate Techniques
Abstract
Principal component analysis is often used as a dimension-reducing technique within some other type of analysis. For example, Chapter 8 described the use of PCs as regressor variables in a multiple regression analysis. The present chapter discusses three multivariate techniques, namely discriminant analysis, cluster analysis and canonical correlation analysis; for each of these three techniques, examples are given in the literature which use PCA as a dimension-reducing technique.
I. T. Jolliffe
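As an illustration of this dimension-reducing role, the sketch below clusters observations on their first two PC scores rather than on all p variables (SciPy's k-means routine and the synthetic two-group data are assumptions of the sketch, not the book's own examples):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(3)
# Hypothetical data: two groups of 40 observations on 8 variables.
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 8)),
               rng.normal(2.0, 1.0, size=(40, 8))])

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                  # reduce 8 variables to 2 PC scores

centroids, labels = kmeans2(scores, 2, minit="++")
print(np.bincount(labels))              # cluster sizes
```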
Chapter 10. Outlier Detection, Influential Observations and Robust Estimation of Principal Components
Abstract
This chapter deals with three related topics, which are all concerned with situations where some of the observations may, in some way, be atypical of the bulk of the data.
I. T. Jolliffe
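One statistic of the kind treated in this chapter is based on the last few PCs: an observation that violates the correlation structure of the bulk of the data often has large scores on the low-variance components even when no individual variable looks unusual. A sketch (the choice q = 2, and any cut-off applied to the statistic, are assumptions):

```python
import numpy as np

def last_pc_statistic(X, q=2):
    """Sum of squared standardized scores on the last q sample PCs,
    one value per observation; large values flag candidate outliers."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T
    lam = s**2 / (len(X) - 1)                 # sample eigenvalues
    z = scores[:, -q:] / np.sqrt(lam[-q:])    # standardize each score
    return (z**2).sum(axis=1)
```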
Chapter 11. Principal Component Analysis for Special Types of Data
Abstract
In much of statistical inference, it is assumed that a data set consists of n independent observations on one or more random variables, x, and this assumption is often implicit when a PCA is done. Another assumption which also may be made implicitly is that x consists of continuous variables, with perhaps the stronger assumption of multivariate normality if we require to make some formal inference for the PCs.
I. T. Jolliffe
Chapter 12. Generalizations and Adaptations of Principal Component Analysis
Abstract
The basic technique of PCA has been generalized or adapted in many ways, and some have already been discussed, in particular in Chapter 11 where adaptations for special types of data were described. This final chapter discusses a number of additional generalizations and modifications; for several of them the discussion is very brief in comparison to the large amount of material that has appeared in the literature, because most have yet to be used widely in practice.
I. T. Jolliffe
Backmatter
Metadata
Title
Principal Component Analysis
Author
I. T. Jolliffe
Copyright year
1986
Publisher
Springer New York
Electronic ISBN
978-1-4757-1904-8
Print ISBN
978-1-4757-1906-2
DOI
https://doi.org/10.1007/978-1-4757-1904-8