
About this book

This book expounds the principle and related applications of nonlinear principal component analysis (PCA), a useful method for analyzing data with mixed measurement levels. In the part dealing with the principle, after a brief introduction to ordinary PCA, a PCA for categorical data (nominal and ordinal) is introduced as nonlinear PCA, in which an optimal scaling technique is used to quantify the categorical variables. Alternating least squares (ALS) is the main algorithm of the method. Multiple correspondence analysis (MCA), a special case of nonlinear PCA, is also introduced. All formulations of these methods are integrated in the same manner as matrix operations. Because data at any measurement level can be treated consistently as numerical data, and because ALS is a very powerful tool for estimation, the methods can be utilized in a variety of fields such as biometrics, econometrics, psychometrics, and sociology. In the applications part of the book, four applications are introduced: variable selection for mixed measurement level data, sparse MCA, joint dimension reduction and clustering for categorical data, and acceleration of the ALS computation. The variable selection methods in PCA that were originally developed for numerical data can be applied to data at any measurement level by using nonlinear PCA. Sparseness and joint dimension reduction and clustering for nonlinear data, the results of recent studies, are extensions obtained by the same matrix operations used in nonlinear PCA. Finally, an acceleration algorithm is proposed to reduce the computational cost of the ALS iteration in nonlinear multivariate methods. The book thus presents the usefulness of nonlinear PCA, which can be applied to data at different measurement levels in diverse fields. It also covers the latest topics, including extensions of traditional statistical methods, newly proposed nonlinear methods, and computational efficiency of these methods.

Table of contents

Frontmatter

Chapter 1. Introduction

Abstract
The principles of nonlinear principal component analysis and multiple correspondence analysis, which are useful methods for analyzing mixed measurement level data, and related applications are introduced in this book.
Yuichi Mori, Masahiro Kuroda, Naomichi Makino

Erratum to: Nonlinear Principal Component Analysis and Its Applications

Without Abstract
Yuichi Mori, Masahiro Kuroda, Naomichi Makino

Nonlinear Principal Component Analysis

Frontmatter

Chapter 2. Nonlinear Principal Component Analysis

Abstract
Principal component analysis (PCA) is a commonly used descriptive multivariate method for handling quantitative data and can be extended to deal with mixed measurement level data. For the extended PCA with such a mixture of quantitative and qualitative data, we require the quantification of the qualitative data in order to obtain optimally scaled data. PCA with optimal scaling is referred to as nonlinear PCA (Gifi, Nonlinear Multivariate Analysis. Wiley, Chichester, 1990). Nonlinear PCA with optimal scaling alternates between estimating the parameters of PCA and quantifying the qualitative data. The alternating least squares (ALS) algorithm is used for nonlinear PCA and can find least squares solutions by minimizing two types of loss functions: a low-rank approximation and homogeneity analysis with restrictions. PRINCIPALS of Young et al. (Principal components of mixed measurement level multivariate data: an alternating least squares method with optimal scaling features. Psychometrika 43:279–281, 1978) and PRINCALS of Gifi (Nonlinear Multivariate Analysis. Wiley, Chichester, 1990) are used for the computation.
Yuichi Mori, Masahiro Kuroda, Naomichi Makino
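The alternation the abstract describes can be sketched in a few lines of NumPy. This is a toy illustration, not the book's PRINCIPALS implementation: one nominal variable is quantified by replacing each category with the mean of the low-rank PCA reconstruction within that category, and the PCA step is then refit. The function name and its arguments are hypothetical.

```python
import numpy as np

def principals_sketch(cats, num, r=1, iters=50, seed=0):
    """Toy ALS for nonlinear PCA (PRINCIPALS-style sketch).

    cats : (n,) int array of category codes for one nominal variable
    num  : (n, p) numeric variables
    Alternates a rank-r PCA fit with re-quantification of `cats`.
    """
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(len(cats))          # initial quantification
    for _ in range(iters):
        q = (q - q.mean()) / q.std()            # standardize scaled column
        X = np.column_stack([q, num])
        Xc = X - X.mean(axis=0)
        # PCA step: rank-r approximation via SVD
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        Xhat = (U[:, :r] * s[:r]) @ Vt[:r] + X.mean(axis=0)
        # optimal-scaling step: category means of the model column
        for c in np.unique(cats):
            q[cats == c] = Xhat[cats == c, 0].mean()
    return q
```

Each pass cannot increase the least squares loss, which is what makes the ALS iteration converge.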

Chapter 3. Multiple Correspondence Analysis

Abstract
Multiple correspondence analysis (MCA) is a widely used technique for analyzing categorical data and aims to reduce large sets of variables into smaller sets of components that summarize the information contained in the data. The purpose of MCA is the same as that of principal component analysis (PCA), and MCA can be regarded as an adaptation of PCA to categorical data (Jolliffe, Principal Component Analysis, 2002). There are various approaches to formulating MCA. We introduce a formulation in which the quantified data matrix is approximated by a lower-rank matrix using the quantification technique proposed by Murakami et al. (Non-metric principal component analysis for categorical variables with multiple quantifications, 1999).
Yuichi Mori, Masahiro Kuroda, Naomichi Makino
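One standard way to compute MCA is a low-rank SVD of the centred and rescaled indicator matrix. The sketch below illustrates that route; it is a generic formulation and may differ in detail from the Murakami et al. quantification used in the chapter.

```python
import numpy as np

def mca_scores(data, r=2):
    """Minimal MCA sketch: low-rank SVD of the standardized residual
    of the indicator matrix.

    data : (n, m) int array of category codes, one column per variable.
    Returns object scores on the first r dimensions.
    """
    n, m = data.shape
    blocks = []
    for j in range(m):
        cats = np.unique(data[:, j])
        # dummy-code variable j: one column per observed category
        blocks.append((data[:, j][:, None] == cats[None, :]).astype(float))
    G = np.hstack(blocks)                  # full indicator matrix
    P = G / G.sum()                        # correspondence matrix
    rtot, ctot = P.sum(axis=1), P.sum(axis=0)
    # standardized residuals remove the trivial (margins-only) structure
    S = (P - np.outer(rtot, ctot)) / np.sqrt(np.outer(rtot, ctot))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # row (object) principal coordinates
    return (U[:, :r] * s[:r]) / np.sqrt(rtot)[:, None]
```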

Applications and Related Topics

Frontmatter

Chapter 4. Variable Selection in Nonlinear Principal Component Analysis

Abstract
Chapter 2 shows that multivariate data at any measurement level can be uniformly dealt with as numerical data in the context of principal component analysis (PCA) by using alternating least squares with optimal scaling. This means that all variables in the data can be analyzed as numerical variables, and, therefore, we can solve the variable selection problem for mixed measurement level data using any existing variable selection method developed for numerical variables. In this chapter, we discuss variable selection in nonlinear PCA. From mixed measurement level data, we select a subset of variables that represents all of the variables as well as possible, using criteria from modified PCA, which naturally includes a variable selection procedure.
Yuichi Mori, Masahiro Kuroda, Naomichi Makino
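Once all variables are optimally scaled to numerical form, any numerical selection scheme applies. As a hedged illustration, the sketch below runs greedy backward elimination with the proportion of variance explained by the first r components as a stand-in criterion; the chapter's modified-PCA criteria differ in detail, and the function name is hypothetical.

```python
import numpy as np

def backward_select(X, keep, r=2):
    """Greedy backward elimination sketch for PCA variable selection.

    X    : (n, p) numeric (or optimally scaled) data matrix
    keep : target number of variables to retain
    Criterion: share of total variance carried by the first r
    principal components of the selected subset.
    """
    def crit(cols):
        Z = X[:, cols] - X[:, cols].mean(axis=0)
        s = np.linalg.svd(Z, compute_uv=False)
        return (s[:r] ** 2).sum() / (s ** 2).sum()
    cols = list(range(X.shape[1]))
    while len(cols) > keep:
        # drop the variable whose removal keeps the criterion highest
        drop = max(cols, key=lambda j: crit([c for c in cols if c != j]))
        cols.remove(drop)
    return cols
```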

Chapter 5. Sparse Multiple Correspondence Analysis

Abstract
In multiple correspondence analysis (MCA), an estimated solution can be transformed into a simple structure in order to simplify the interpretation. The rotation technique is widely used for this purpose. However, an alternative approach, called sparse MCA, has also been proposed. One of the advantages of sparse MCA is that, in contrast to unrotated or rotated ordinary MCA loadings, some loadings in sparse MCA can be exactly zero. A real data example demonstrates that sparse MCA can provide simple solutions.
Yuichi Mori, Masahiro Kuroda, Naomichi Makino
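To make "some loadings can be exactly zero" concrete, the sketch below soft-thresholds ordinary loadings. Soft-thresholding is one common device for inducing sparseness in sparse PCA/MCA variants, but it is only an illustration; the chapter's actual estimation procedure for sparse MCA may differ.

```python
import numpy as np

def soft_threshold_loadings(X, r=2, lam=0.55):
    """Illustration of sparseness via soft-thresholding of loadings.

    X   : (n, p) numeric data matrix
    lam : threshold; entries with |loading| <= lam become exactly zero
    Returns a (p, r) sparse loading matrix.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    A = Vt[:r].T                                        # ordinary loadings
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)  # shrink to zero
```

Unlike rotation, which only makes small loadings smaller, thresholding sets them to exactly zero, which is the interpretational advantage the abstract points out.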

Chapter 6. Joint Dimension Reduction and Clustering

Abstract
Cluster analysis is a technique that attempts to divide objects into similar groups. As described in previous studies, cluster analysis works poorly when variables that do not reflect the clustering structure are present in the dataset or when the number of variables is large. In order to tackle this problem, several methods have been proposed that jointly perform clustering of objects and dimension reduction of the variables. In this chapter, we review the technique whereby multiple correspondence analysis and k-means clustering are combined in order to investigate the relationships between qualitative variables.
Yuichi Mori, Masahiro Kuroda, Naomichi Makino
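The joint approach can be sketched as an alternation in the style of reduced k-means: project onto a few components, cluster in the reduced space, then refit the components to the cluster structure. This is a hedged toy version for numeric data, not the chapter's exact MCA + k-means method; all names are hypothetical.

```python
import numpy as np

def reduced_kmeans(X, k=2, r=1, iters=20, seed=0):
    """Toy joint dimension reduction + clustering (reduced k-means style).

    Alternates: score objects on r components, assign each object to the
    nearest centroid in the reduced space, then refit the loadings to the
    matrix of cluster means in the full space.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    n, p = Xc.shape
    A = np.linalg.svd(Xc, full_matrices=False)[2][:r].T   # PCA start, (p, r)
    labels = rng.integers(0, k, n)
    for _ in range(iters):
        F = Xc @ A                                        # (n, r) scores
        # centroids in reduced space; reseed any empty cluster
        C = np.array([F[labels == g].mean(axis=0) if (labels == g).any()
                      else F[rng.integers(n)] for g in range(k)])
        labels = np.argmin(((F[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        # refit loadings to the cluster-mean structure in the full space
        M = np.array([Xc[labels == g].mean(axis=0) if (labels == g).any()
                      else np.zeros(p) for g in range(k)])[labels]
        A = np.linalg.svd(M, full_matrices=False)[2][:r].T
    return labels, A
```

Because the components are refit to the cluster means, directions that do not separate the clusters are downweighted, which is exactly the failure mode of the tandem (reduce first, cluster second) approach.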

Chapter 7. Acceleration of Convergence of the Alternating Least Squares Algorithm for Nonlinear Principal Component Analysis

Abstract
Nonlinear principal component analysis (PCA) requires iterative computation using the alternating least squares (ALS) algorithm, which alternates between optimal scaling for quantifying qualitative data and the analysis of the optimally scaled data using the ordinary PCA approach. PRINCIPALS of Young et al. (Psychometrika 43:279–281, 1978) and PRINCALS of Gifi (Nonlinear Multivariate Analysis. Wiley, Chichester, 1990) are the ALS algorithms used for nonlinear PCA. When applying nonlinear PCA to very large data sets with numerous nominal and ordinal variables, the ALS algorithm may require many iterations and significant computation time to converge. One reason for the slow convergence of the ALS algorithm is that its speed of convergence is linear. In order to accelerate the convergence of the ALS algorithm, Kuroda et al. (Comput Stat Data Anal 55:143–153, 2011) developed a new iterative algorithm using the vector \(\varepsilon \) algorithm of Wynn (Math Comput 16:301–322, 1962).
Yuichi Mori, Masahiro Kuroda, Naomichi Makino
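The core of Wynn's vector epsilon algorithm is easy to state: from three consecutive iterates of a linearly convergent vector sequence, the epsilon-2 column extrapolates toward the limit using the Samelson inverse inv(y) = y/||y||^2. The sketch below shows one such step; it is an illustration of the underlying extrapolation, not the Kuroda et al. acceleration code itself.

```python
import numpy as np

def vector_epsilon_step(x0, x1, x2):
    """One epsilon-2 step of Wynn's vector epsilon algorithm.

    Given three consecutive iterates x0, x1, x2 of a vector sequence,
    returns an extrapolated estimate of its limit.  For an exactly
    linearly convergent sequence x_n = x* + c * lam**n, the result
    is x* itself.
    """
    def inv(y):                       # Samelson inverse of a vector
        return y / (y @ y)
    d0, d1 = x1 - x0, x2 - x1         # successive differences
    return x1 + inv(inv(d1) - inv(d0))
```

In an accelerated ALS scheme, a step like this is interleaved with the ordinary ALS updates, so the slow linear tail of the iteration is skipped.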

Backmatter
