
2020 | Book

Matrix-Based Introduction to Multivariate Data Analysis


About this book

This is the first textbook that allows readers who may be unfamiliar with matrices to understand a variety of multivariate analysis procedures in matrix forms. By explaining which models underlie particular procedures and what objective function is optimized to fit the model to the data, it enables readers to rapidly comprehend multivariate data analysis. Arranged so that readers can intuitively grasp the purposes for which multivariate analysis procedures are used, the book also offers clear explanations of those purposes, with numerical examples preceding the mathematical descriptions.

Highlighting singular value decomposition among the theorems of matrix algebra to support modern matrix formulations, this book is useful for undergraduate students who have already learned introductory statistics, as well as for graduate students and researchers who are not familiar with matrix-intensive formulations of multivariate data analysis.

The book begins by explaining fundamental matrix operations and the matrix expressions of elementary statistics. It then offers an introduction to popular multivariate procedures, with each chapter featuring increasingly advanced levels of matrix algebra.

Further, the book includes six chapters on advanced procedures, covering advanced matrix operations and recently proposed multivariate procedures, such as sparse estimation, together with a clear explication of the differences between principal component and factor analysis solutions. In a nutshell, this book allows readers to gain an understanding of the latest developments in multivariate data science.

Table of Contents

Frontmatter

Elementary Statistics with Matrices

Frontmatter
Chapter 1. Elementary Matrix Operations
Abstract
The mathematics for studying the properties of matrices is called matrix algebra or linear algebra. This first chapter treats the introductory part of matrix algebra required for learning multivariate data analysis. We begin by explaining what a matrix is, in order to describe elementary matrix operations.
Kohei Adachi
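As a minimal sketch of the operations this chapter introduces, the following NumPy example (the code, data, and variable names are illustrative assumptions, not material from the book) shows matrix addition, scalar multiplication, the matrix product, and the transpose:

```python
import numpy as np

# Two small matrices for illustration
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

print(A + B)     # element-wise sum
print(2.0 * A)   # scalar multiple
print(A @ B)     # matrix product (rows of A times columns of B)
print(A.T)       # transpose: rows and columns exchanged
```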
Chapter 2. Intra-variable Statistics
Abstract
This chapter begins with expressing data sets by matrices. Then, we introduce two statistics (statistical indices), average and variance, where the average is an index value that represents scores and the variance stands for how widely scores disperse. Further, how the original scores are transformed into centered and standard scores using the average and variance is described.
Kohei Adachi
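A minimal sketch, assuming NumPy and toy data, of how the average, variance, and centered and standard scores can be written as matrix operations; the centering-matrix expression J = I - (1/n)11' mirrors the kind of formulation the chapter develops, but the exact notation is not reproduced from the book:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])        # scores on one variable
n = x.size
ones = np.ones(n)

mean = ones @ x / n                        # average as (1/n) * 1'x
J = np.eye(n) - np.outer(ones, ones) / n   # centering matrix J = I - (1/n)11'
centered = J @ x                           # centered scores
variance = centered @ centered / n         # variance as (1/n) * (Jx)'(Jx)
standard = centered / np.sqrt(variance)    # standard scores (mean 0, variance 1)
```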
Chapter 3. Inter-variable Statistics
Abstract
In the previous chapter, we described the two statistics, average and variance, which summarize the distribution of scores within a variable. In this chapter, we introduce covariance and the correlation coefficient, which are the inter-variable statistics indicating the relationships between two variables. Finally, the rank of a matrix, an important notion in linear algebra, is introduced.
Kohei Adachi
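A minimal sketch, assuming NumPy and toy data, of the inter-variable statistics and the rank notion this chapter introduces:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])
n = x.size

xc, yc = x - x.mean(), y - y.mean()                  # centered scores
cov = xc @ yc / n                                    # covariance
corr = cov / np.sqrt((xc @ xc / n) * (yc @ yc / n))  # correlation coefficient

X = np.column_stack([x, y, x + y])                   # third column is linearly dependent
print(np.linalg.matrix_rank(X))                      # rank is 2, not 3
```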

Least Squares Procedures

Frontmatter
Chapter 4. Regression Analysis
Abstract
In the previous two chapters, we expressed elementary statistics in matrix form as preparation for introducing multivariate analysis procedures. The introduction to those procedures begins in this chapter.
Kohei Adachi
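Regression analysis fits a linear model by least squares. A minimal sketch, assuming NumPy and toy data, of the least squares solution obtained from the normal equations X'Xb = X'y:

```python
import numpy as np

# Toy data: y depends roughly linearly on one explanatory variable
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept
b = np.linalg.solve(X.T @ X, X.T @ y)      # least squares: solve X'Xb = X'y
print(b)                                   # [intercept, slope], close to [0, 2]
```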
Chapter 5. Principal Component Analysis (Part 1)
Abstract
In regression analysis (Chap. 4), variables are classified as dependent and explanatory variables. Such a distinction does not exist in principal component analysis (PCA), which is introduced in this chapter.
Kohei Adachi
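PCA can be computed through the singular value decomposition of the column-centered data matrix. A minimal sketch of that computation, with random toy data that are an illustrative assumption, not an example from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))     # 20 individuals, 4 variables
Xc = X - X.mean(axis=0)          # column-wise centering

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
m = 2                            # number of components to retain
scores = U[:, :m] * s[:m]        # principal component scores
loadings = Vt[:m].T              # weights defining the components

X_hat = scores @ loadings.T      # best rank-2 approximation of Xc
```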
Chapter 6. Principal Component Analysis (Part 2)
Abstract
In this chapter, principal component analysis (PCA) is reformulated. The loss function to be minimized is the same as that in the previous chapter, but the constraints for the matrices are different.
Kohei Adachi
Chapter 7. Cluster Analysis
Abstract
The term “cluster” is synonymous with both “group” as a noun and “classify” as a verb. Cluster analysis, which is also simply called clustering, generally refers to the procedures for computationally classifying (i.e., clustering) individuals into groups (i.e., clusters) so that similar individuals are classified into the same group and mutually dissimilar ones are allocated to different groups. There are various procedures for performing cluster analysis. One of the most popular of these, called k-means clustering (KMC), which was first presented by MacQueen (1967), is introduced here.
Kohei Adachi
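The alternating assignment and update steps of k-means clustering can be sketched in a few lines. The following NumPy implementation is an illustrative assumption; the function name, defaults, and initialization strategy are ours, not the book's:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means: alternate nearest-centroid assignment and centroid update.

    Assumes no cluster becomes empty during the iterations.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # random initial centroids
    for _ in range(n_iter):
        # Assignment step: each individual joins its nearest centroid's cluster
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its cluster
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged
        centers = new_centers
    return labels, centers
```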

Maximum Likelihood Procedures

Frontmatter
Chapter 8. Maximum Likelihood and Multivariate Normal Distribution
Abstract
In the analysis procedures introduced in the last four chapters, parameters are estimated by the least squares (LS) method, as reviewed in Sect. 8.1. The remaining sections of this chapter prepare readers for the following chapters, in which a maximum likelihood (ML) method, which differs from LS, is used for estimating parameters. That is, the ML method is introduced in Sect. 8.2, followed by descriptions of the notion of a probability density function and of the ML method with the multivariate normal distribution. Finally, ML-based model selection with information criteria is introduced.
Kohei Adachi
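A minimal sketch, assuming NumPy, of evaluating the multivariate normal log-likelihood at its ML estimates (the sample mean vector and the covariance matrix with divisor n), and of the information-criterion idea the chapter ends with; the function name and toy data are illustrative assumptions:

```python
import numpy as np

def mvn_loglik(X, mu, Sigma):
    """Log-likelihood of an n x p data matrix under N(mu, Sigma)."""
    n, p = X.shape
    diff = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ij,jk,ik->', diff, np.linalg.inv(Sigma), diff)
    return -0.5 * (n * p * np.log(2 * np.pi) + n * logdet + quad)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
mu_hat = X.mean(axis=0)                          # ML estimate of the mean vector
Sigma_hat = np.cov(X, rowvar=False, bias=True)   # ML estimate: divisor n, not n - 1

ll = mvn_loglik(X, mu_hat, Sigma_hat)
k_params = 3 + 3 * 4 // 2                        # mean elements + distinct covariances
print(-2 * ll + 2 * k_params)                    # AIC, usable for model selection
```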
Chapter 9. Path Analysis
Abstract
Let us assume that three variables, A, B, and C, are to be analyzed.
Kohei Adachi
Chapter 10. Confirmatory Factor Analysis
Abstract
Suppose that positive correlations are observed among the test scores for physics, chemistry, and biology. In order to investigate the causal relationships among the three variables, we can use the path analysis from the previous chapter. For example, we can evaluate a model in which a person's ability in physics influences his/her scores in chemistry and biology; ability in physics is a cause, while the scores in chemistry and biology are its effects.
Kohei Adachi
Chapter 11. Structural Equation Modeling
Abstract
In confirmatory factor analysis (CFA), introduced in the previous chapter, all factors (latent variables) were causes (explanatory variables). An extended variant of CFA is structural equation modeling (SEM), in which the causal relationships among factors are considered, i.e., factors appear that are dependent variables.
Kohei Adachi
Chapter 12. Exploratory Factor Analysis (Part 1)
Abstract
As described in Chap. 10, factor analysis (FA) is classified into exploratory FA (EFA) and confirmatory FA (CFA), the exception being the sparse FA treated in Chap. 22.
Kohei Adachi

Miscellaneous Procedures

Frontmatter
Chapter 13. Rotation Techniques
Abstract
In some analysis procedures, the solution for a data set is not uniquely determined; multiple solutions exist. An example of such procedures is exploratory factor analysis (EFA).
Kohei Adachi
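The non-uniqueness that motivates rotation can be demonstrated directly: post-multiplying a loading matrix by any orthonormal matrix T leaves the fitted structure unchanged. A minimal NumPy sketch, with a toy loading matrix that is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))                       # a 5 x 2 loading matrix

theta = 0.7                                       # any rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthonormal: T'T = I

A_rot = A @ T                                     # rotated loadings
print(np.allclose(A @ A.T, A_rot @ A_rot.T))      # True: the fit is unchanged
```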
Chapter 14. Canonical Correlation and Multiple Correspondence Analyses
Abstract
In this chapter, we treat procedures for data sets in which the variables are classified into groups. Such a data set is expressed as a block matrix, introduced in Sect. 14.1. We then describe canonical correlation analysis (CCA) for data with two groups of variables, followed by the introduction of generalized CCA (GCCA) for more than two groups of variables in Sect. 14.3.
Kohei Adachi
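Canonical correlations can be obtained as the singular values of the whitened between-group covariance matrix. A minimal NumPy sketch of that computation; the random toy data and the helper function are illustrative assumptions:

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # first group of variables
Y = rng.normal(size=(100, 2))          # second group of variables
X = X - X.mean(axis=0)                 # center both blocks
Y = Y - Y.mean(axis=0)

n = len(X)
Sxx, Syy = X.T @ X / n, Y.T @ Y / n    # within-group covariances
Sxy = X.T @ Y / n                      # between-group covariance

# Canonical correlations are the singular values of the whitened Sxy
K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
U, rho, Vt = np.linalg.svd(K)
print(rho)                             # canonical correlations
```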
Chapter 15. Discriminant Analysis
Abstract
Discriminant analysis refers to a group of statistical procedures for analyzing data sets in which individuals are classified into certain groups; the results of the analysis are used to determine the group membership of a new individual not included in the original data set.
Kohei Adachi
Chapter 16. Multidimensional Scaling
Kohei Adachi

Advanced Procedures

Frontmatter
Chapter 17. Advanced Matrix Operations
Abstract
In this chapter, we introduce matrix operations that are more advanced than those treated so far. We start by describing systems of linear equations, and then introduce the Moore-Penrose (MP) inverse, considered one of the most important operations for statistics, as well as singular value decomposition (SVD). The MP inverse is closely related to SVD and is more useful than the ordinary inverse matrix, which is regarded as a special case of the MP inverse.
Kohei Adachi
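A minimal NumPy sketch of constructing the Moore-Penrose inverse from the SVD, inverting only the nonzero singular values; the tolerance rule shown is a common convention, assumed here rather than taken from the book:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])               # rank 1, so no ordinary inverse exists

U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = s.max() * max(A.shape) * np.finfo(float).eps
s_inv = np.where(s > tol, 1.0 / s, 0.0)  # invert only the nonzero singular values
A_pinv = Vt.T @ np.diag(s_inv) @ U.T     # Moore-Penrose inverse from the SVD

print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True
```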
Chapter 18. Exploratory Factor Analysis (Part 2)
Abstract
In Chap. 12, exploratory factor analysis (EFA) was formulated as a probabilistic model. However, EFA can also be formulated as a kind of matrix decomposition problem, without using the notion of probabilities.
Kohei Adachi
Chapter 19. Principal Component Analysis Versus Factor Analysis
Abstract
In this chapter, we refer to exploratory factor analysis simply as factor analysis and consider principal component analysis formulated as a reduced rank approximation, as in Chap. 5. Principal component analysis (PCA) and factor analysis (FA) can be performed on identical data sets, with the purpose of dimension reduction. This reduction means that the p observed variables, i.e., the p-dimensional scores, are reduced to lower-dimensional scores. The lower dimensions correspond to the m principal components in PCA and the m common factors in FA, with m < p. A major purpose of this chapter is to introduce mathematical facts that contrast the PCA and FA solutions for an identical data set. These facts elucidate crucial differences between PCA and FA, which can suggest whether PCA or FA should be used for a particular data set.
Kohei Adachi
Chapter 20. Three-Way Principal Component Analysis
Abstract
In Chap. 5, principal component analysis (PCA) was introduced as the reduced rank approximation of a data matrix. It should be noted that this matrix is a two-way array of rows × columns. However, we often encounter three-way data arrays; an example is a set of scores of examinees for multiple tests administered on different occasions. These scores form a three-way array of examinees × tests × occasions. Modified PCA procedures designed for such three-way data are known as three-way PCA (3WPCA). Popular 3WPCA procedures are introduced in this chapter.
Kohei Adachi
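A three-way array can be "unfolded" (matricized) into a two-way matrix so that ordinary matrix methods apply; this reshaping underlies 3WPCA formulations. A minimal NumPy sketch, with toy dimensions that are an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3, 2))   # examinees x tests x occasions

# Unfold (matricize) the three-way array into a two-way matrix:
# rows are examinees, columns are all test-occasion combinations
X_unfolded = X.reshape(4, 3 * 2)
print(X_unfolded.shape)          # (4, 6): a two-way matrix PCA can handle
```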
Chapter 21. Sparse Regression Analysis
Abstract
A matrix or vector is said to be sparse when many of its elements are zero. Hence, the term sparse estimation refers to estimating many parameters as exactly zero. Developments in multivariate analysis procedures with sparse estimation started from modifications to the multiple regression analysis introduced in Chap. 4.
Kohei Adachi
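A standard route to sparse regression coefficients is the lasso, computable by iterative soft-thresholding (ISTA). The following NumPy sketch is an illustrative assumption, not the book's algorithm:

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; the proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso by iterative soft-thresholding: min 0.5||Xb - y||^2 + lam*||b||_1."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)         # gradient of the least squares loss
        b = soft_threshold(b - grad / L, lam / L)
    return b                             # many coefficients end up exactly zero
```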
Chapter 22. Sparse Factor Analysis
Abstract
In the previous chapter, modified regression analysis procedures were presented in which a coefficient vector is estimated so that it is sparse, i.e., includes many zero elements. Such sparse estimation can be incorporated into other multivariate analysis procedures so as to provide sparse solutions, which can be easily interpreted, as we need only focus on their nonzero elements. Accordingly, many sparse multivariate procedures have been developed, following the sparse estimation techniques developed for regression.
Kohei Adachi
Backmatter
Metadata
Title
Matrix-Based Introduction to Multivariate Data Analysis
Author
Prof. Dr. Kohei Adachi
Copyright Year
2020
Publisher
Springer Singapore
Electronic ISBN
978-981-15-4103-2
Print ISBN
978-981-15-4102-5
DOI
https://doi.org/10.1007/978-981-15-4103-2