
2020 | Book

Matrix-Based Introduction to Multivariate Data Analysis


About this book

This is the first textbook that allows readers who may be unfamiliar with matrices to understand a variety of multivariate analysis procedures in matrix forms. By explaining which models underlie particular procedures and what objective function is optimized to fit the model to the data, it enables readers to rapidly comprehend multivariate data analysis. Arranged so that readers can intuitively grasp the purposes for which multivariate analysis procedures are used, the book also offers clear explanations of those purposes, with numerical examples preceding the mathematical descriptions.

Highlighting singular value decomposition among the theorems of matrix algebra to support modern matrix formulations, this book is useful for undergraduate students who have already learned introductory statistics, as well as for graduate students and researchers who are not familiar with matrix-intensive formulations of multivariate data analysis.

The book begins by explaining fundamental matrix operations and the matrix expressions of elementary statistics. It then offers an introduction to popular multivariate procedures, with each chapter featuring increasingly advanced levels of matrix algebra.

Further, the book includes six chapters on advanced procedures, covering advanced matrix operations and recently proposed multivariate procedures, such as sparse estimation, together with a clear explication of the differences between principal component and factor analysis solutions. In a nutshell, this book allows readers to gain an understanding of the latest developments in multivariate data science.

Table of Contents

Frontmatter

Elementary Statistics with Matrices

Frontmatter
Chapter 1. Elementary Matrix Operations
Abstract
The mathematics for studying the properties of matrices is called matrix algebra or linear algebra. This first chapter treats the introductory part of matrix algebra required for learning multivariate data analysis. We begin by explaining what a matrix is, in order to describe elementary matrix operations.
Kohei Adachi
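As a minimal sketch of the operations this chapter introduces, the following NumPy example (the code, data, and variable names are illustrative assumptions, not material from the book) shows matrix addition, scalar multiplication, the matrix product, and the transpose:

```python
import numpy as np

# Two small matrices for illustration
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

print(A + B)     # element-wise sum
print(2.0 * A)   # scalar multiple
print(A @ B)     # matrix product (rows of A times columns of B)
print(A.T)       # transpose: rows and columns exchanged
```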
Chapter 2. Intra-variable Statistics
Abstract
This chapter begins with expressing data sets by matrices. Then, we introduce two statistics (statistical indices), average and variance, where the average is an index value that represents scores and the variance stands for how widely scores disperse. Further, how the original scores are transformed into centered and standard scores using the average and variance is described.
Kohei Adachi
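A minimal sketch, assuming NumPy and toy data, of how the average, variance, and centered and standard scores can be written as matrix operations; the centering-matrix expression J = I - (1/n)11' mirrors the kind of formulation the chapter develops, but the exact notation is not reproduced from the book:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])        # scores on one variable
n = x.size
ones = np.ones(n)

mean = ones @ x / n                        # average as (1/n) * 1'x
J = np.eye(n) - np.outer(ones, ones) / n   # centering matrix J = I - (1/n)11'
centered = J @ x                           # centered scores
variance = centered @ centered / n         # variance as (1/n) * (Jx)'(Jx)
standard = centered / np.sqrt(variance)    # standard scores (mean 0, variance 1)
```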
Chapter 3. Inter-variable Statistics
Abstract
In the previous chapter, we described the two statistics, average and variance, which summarize the distribution of scores within a variable. In this chapter, we introduce covariance and the correlation coefficient, which are the inter-variable statistics indicating the relationships between two variables. Finally, the rank of a matrix, an important notion in linear algebra, is introduced.
Kohei Adachi
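A minimal sketch, assuming NumPy and toy data, of the inter-variable statistics and the rank notion this chapter introduces:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])
n = x.size

xc, yc = x - x.mean(), y - y.mean()                  # centered scores
cov = xc @ yc / n                                    # covariance
corr = cov / np.sqrt((xc @ xc / n) * (yc @ yc / n))  # correlation coefficient

X = np.column_stack([x, y, x + y])                   # third column is linearly dependent
print(np.linalg.matrix_rank(X))                      # rank is 2, not 3
```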

Least Squares Procedures

Frontmatter
Chapter 4. Regression Analysis
Abstract
In the previous two chapters, we expressed elementary statistics in matrix form as preparation for introducing multivariate analysis procedures. The introduction to those procedures begins in this chapter.
Kohei Adachi
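Regression analysis fits a linear model by least squares. A minimal sketch, assuming NumPy and toy data, of the least squares solution obtained from the normal equations X'Xb = X'y:

```python
import numpy as np

# Toy data: y depends roughly linearly on one explanatory variable
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept
b = np.linalg.solve(X.T @ X, X.T @ y)      # least squares: solve X'Xb = X'y
print(b)                                   # [intercept, slope], close to [0, 2]
```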
Chapter 5. Principal Component Analysis (Part 1)
Abstract
In regression analysis (Chap. 4), variables are classified as dependent and explanatory variables. Such a distinction does not exist in principal component analysis (PCA), which is introduced in this chapter.
Kohei Adachi
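PCA can be computed through the singular value decomposition of the column-centered data matrix. A minimal sketch of that computation, with random toy data that are an illustrative assumption, not an example from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))     # 20 individuals, 4 variables
Xc = X - X.mean(axis=0)          # column-wise centering

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
m = 2                            # number of components to retain
scores = U[:, :m] * s[:m]        # principal component scores
loadings = Vt[:m].T              # weights defining the components

X_hat = scores @ loadings.T      # best rank-2 approximation of Xc
```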
Chapter 6. Principal Component Analysis (Part 2)
Abstract
In this chapter, principal component analysis (PCA) is reformulated. The loss function to be minimized is the same as that in the previous chapter, but the constraints for the matrices are different.
Kohei Adachi
Chapter 7. Cluster Analysis
Abstract
The term “cluster” is synonymous with both “group” as a noun and “classify” as a verb. Cluster analysis, which is also simply called clustering, generally refers to the procedures for computationally classifying (i.e., clustering) individuals into groups (i.e., clusters) so that similar individuals are classified into the same group and mutually dissimilar ones are allocated to different groups. There are various procedures for performing cluster analysis. One of the most popular of these, called k-means clustering (KMC), which was first presented by MacQueen (1967), is introduced here.
Kohei Adachi
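The alternating assignment and update steps of k-means clustering can be sketched in a few lines. The following NumPy implementation is an illustrative assumption; the function name, defaults, and initialization strategy are ours, not the book's:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means: alternate nearest-centroid assignment and centroid update.

    Assumes no cluster becomes empty during the iterations.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # random initial centroids
    for _ in range(n_iter):
        # Assignment step: each individual joins its nearest centroid's cluster
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its cluster
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged
        centers = new_centers
    return labels, centers
```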

Maximum Likelihood Procedures

Frontmatter
Chapter 8. Maximum Likelihood and Multivariate Normal Distribution
Abstract
In the analysis procedures introduced in the last four chapters, parameters are estimated by the least squares (LS) method, as reviewed in Sect. 8.1. The remaining sections of this chapter prepare readers for the following chapters, in which a maximum likelihood (ML) method, which differs from LS, is used for estimating parameters. That is, the ML method is introduced in Sect. 8.2, followed by descriptions of the notion of a probability density function and of the ML method with the multivariate normal distribution. Finally, ML-based model selection with information criteria is introduced.
Kohei Adachi
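A minimal sketch, assuming NumPy, of evaluating the multivariate normal log-likelihood at its ML estimates (the sample mean vector and the covariance matrix with divisor n), and of the information-criterion idea the chapter ends with; the function name and toy data are illustrative assumptions:

```python
import numpy as np

def mvn_loglik(X, mu, Sigma):
    """Log-likelihood of an n x p data matrix under N(mu, Sigma)."""
    n, p = X.shape
    diff = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ij,jk,ik->', diff, np.linalg.inv(Sigma), diff)
    return -0.5 * (n * p * np.log(2 * np.pi) + n * logdet + quad)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
mu_hat = X.mean(axis=0)                          # ML estimate of the mean vector
Sigma_hat = np.cov(X, rowvar=False, bias=True)   # ML estimate: divisor n, not n - 1

ll = mvn_loglik(X, mu_hat, Sigma_hat)
k_params = 3 + 3 * 4 // 2                        # mean elements + distinct covariances
print(-2 * ll + 2 * k_params)                    # AIC, usable for model selection
```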
Chapter 9. Path Analysis
Abstract
Let us assume that three variables, A, B, and C, are to be analyzed.
Kohei Adachi
Chapter 10. Confirmatory Factor Analysis
Abstract
Suppose that positive correlations are observed among the test scores for physics, chemistry, and biology. In order to investigate the causal relationships among the three variables, we can use the path analysis from the previous chapter. For example, we can evaluate a model in which a person's ability in physics influences his/her scores in chemistry and biology; ability in physics is a cause, while the scores in chemistry and biology are its effects.
Kohei Adachi
Chapter 11. Structural Equation Modeling
Abstract
In confirmatory factor analysis (CFA), introduced in the previous chapter, all factors (latent variables) were causes (explanatory variables). An extended variant of CFA is structural equation modeling (SEM), in which the causal relationships among factors are considered, i.e., factors appear that are dependent variables.
Kohei Adachi
Chapter 12. Exploratory Factor Analysis (Part 1)
Abstract
As described in Chap. 10, factor analysis (FA) is classified into exploratory FA (EFA) and confirmatory FA (CFA), the exception being the sparse FA treated in Chap. 22.
Kohei Adachi

Miscellaneous Procedures

Frontmatter
Chapter 13. Rotation Techniques
Abstract
In some analysis procedures, the solution for a data set is not uniquely determined; multiple solutions exist. An example of such procedures is exploratory factor analysis (EFA).
Kohei Adachi
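The non-uniqueness that motivates rotation can be demonstrated directly: post-multiplying a loading matrix by any orthonormal matrix T leaves the fitted structure unchanged. A minimal NumPy sketch, with a toy loading matrix that is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))                       # a 5 x 2 loading matrix

theta = 0.7                                       # any rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthonormal: T'T = I

A_rot = A @ T                                     # rotated loadings
print(np.allclose(A @ A.T, A_rot @ A_rot.T))      # True: the fit is unchanged
```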
Chapter 14. Canonical Correlation and Multiple Correspondence Analyses
Abstract
In this chapter, we treat procedures for data sets in which the variables are classified into groups. Such a data set is expressed as a block matrix, introduced in Sect. 14.1. We then describe canonical correlation analysis (CCA) for data with two groups of variables, followed by the introduction of generalized CCA (GCCA) for more than two groups of variables in Sect. 14.3.
Kohei Adachi
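Canonical correlations can be obtained as the singular values of the whitened between-group covariance matrix. A minimal NumPy sketch of that computation; the random toy data and the helper function are illustrative assumptions:

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # first group of variables
Y = rng.normal(size=(100, 2))          # second group of variables
X = X - X.mean(axis=0)                 # center both blocks
Y = Y - Y.mean(axis=0)

n = len(X)
Sxx, Syy = X.T @ X / n, Y.T @ Y / n    # within-group covariances
Sxy = X.T @ Y / n                      # between-group covariance

# Canonical correlations are the singular values of the whitened Sxy
K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
U, rho, Vt = np.linalg.svd(K)
print(rho)                             # canonical correlations
```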
Chapter 15. Discriminant Analysis
Abstract
Discriminant analysis refers to a group of statistical procedures for analyzing data sets in which individuals are classified into certain groups; the results of the analysis are used to determine the group membership of a new individual not included in the original data set.
Kohei Adachi
Chapter 16. Multidimensional Scaling
Kohei Adachi

Advanced Procedures

Frontmatter
Chapter 17. Advanced Matrix Operations
Abstract
In this chapter, we introduce matrix operations that are more advanced than those treated so far. We start by describing systems of linear equations, and then introduce the Moore-Penrose (MP) inverse, considered one of the most important operations for statistics, as well as singular value decomposition (SVD). The MP inverse is closely related to SVD and is more useful than the ordinary inverse matrix, which is regarded as a special case of the MP inverse.
Kohei Adachi
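A minimal NumPy sketch of constructing the Moore-Penrose inverse from the SVD, inverting only the nonzero singular values; the tolerance rule shown is a common convention, assumed here rather than taken from the book:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])               # rank 1, so no ordinary inverse exists

U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = s.max() * max(A.shape) * np.finfo(float).eps
s_inv = np.where(s > tol, 1.0 / s, 0.0)  # invert only the nonzero singular values
A_pinv = Vt.T @ np.diag(s_inv) @ U.T     # Moore-Penrose inverse from the SVD

print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True
```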
Chapter 18. Exploratory Factor Analysis (Part 2)
Abstract
In Chap. 12, exploratory factor analysis (EFA) was formulated as a probabilistic model. However, EFA can also be formulated as a kind of matrix decomposition problem, without using the notion of probabilities.
Kohei Adachi
Chapter 19. Principal Component Analysis Versus Factor Analysis
Abstract
In this chapter, we refer to exploratory factor analysis simply as factor analysis and consider principal component analysis formulated as a reduced rank approximation, as in Chap. 5. Principal component analysis (PCA) and factor analysis (FA) can be performed on identical data sets, with the purpose of dimension reduction. This reduction means that the p observed variables, i.e., the p-dimensional scores, are reduced to lower-dimensional scores. The lower dimensions correspond to the m principal components in PCA and the m common factors in FA, with m < p. A major purpose of this chapter is to introduce mathematical facts that contrast the PCA and FA solutions for an identical data set. These facts elucidate crucial differences between PCA and FA, which can suggest whether PCA or FA should be used for a particular data set.
Kohei Adachi
Chapter 20. Three-Way Principal Component Analysis
Abstract
In Chap. 5, principal component analysis (PCA) was introduced as the reduced rank approximation of a data matrix. It should be noted that this matrix is a two-way array of rows × columns. However, we often encounter three-way data arrays; an example is a set of scores of examinees for multiple tests administered on different occasions. These scores form a three-way array of examinees × tests × occasions. Modified PCA procedures designed for such three-way data are known as three-way PCA (3WPCA). Popular 3WPCA procedures are introduced in this chapter.
Kohei Adachi
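A three-way array can be "unfolded" (matricized) into a two-way matrix so that ordinary matrix methods apply; this reshaping underlies 3WPCA formulations. A minimal NumPy sketch, with toy dimensions that are an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3, 2))   # examinees x tests x occasions

# Unfold (matricize) the three-way array into a two-way matrix:
# rows are examinees, columns are all test-occasion combinations
X_unfolded = X.reshape(4, 3 * 2)
print(X_unfolded.shape)          # (4, 6): a two-way matrix PCA can handle
```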
Chapter 21. Sparse Regression Analysis
Abstract
A matrix or vector is said to be sparse when many of its elements are zero. Hence, the term sparse estimation refers to estimating many parameters as exactly zero. Developments in multivariate analysis procedures with sparse estimation started from modifications to the multiple regression analysis introduced in Chap. 4.
Kohei Adachi
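A standard route to sparse regression coefficients is the lasso, computable by iterative soft-thresholding (ISTA). The following NumPy sketch is an illustrative assumption, not the book's algorithm:

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; the proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso by iterative soft-thresholding: min 0.5||Xb - y||^2 + lam*||b||_1."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)         # gradient of the least squares loss
        b = soft_threshold(b - grad / L, lam / L)
    return b                             # many coefficients end up exactly zero
```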
Chapter 22. Sparse Factor Analysis
Abstract
In the previous chapter, modified regression analysis procedures were presented in which a coefficient vector is estimated so that it is sparse, i.e., includes many zero elements. Such sparse estimation can be incorporated into other multivariate analysis procedures so as to provide sparse solutions, which can be easily interpreted, as we need only focus on their nonzero elements. Accordingly, many sparse multivariate procedures have been developed, following the sparse estimation techniques developed for regression.
Kohei Adachi
Backmatter
Metadata
Title
Matrix-Based Introduction to Multivariate Data Analysis
Author
Prof. Dr. Kohei Adachi
Copyright Year
2020
Publisher
Springer Singapore
Electronic ISBN
978-981-15-4103-2
Print ISBN
978-981-15-4102-5
DOI
https://doi.org/10.1007/978-981-15-4103-2