Homogeneity analysis using absolute deviations
Introduction
Homogeneity Analysis (also known as Multiple Correspondence Analysis, MCA) is a well-known technique for making graphical representations of categorical multivariate data (Gifi, 1990). It can also be presented as a technique for producing informative layouts of bipartite graphs (Michailidis and de Leeuw, 1998; De Leeuw and Michailidis, 2000a, 2000b).
The setting is as follows: data have been collected for N objects on J categorical variables, with k_j categories for variable j. Let K = ∑_{j=1}^{J} k_j be the total number of categories in the data set. Then a graph with nodes (vertices) corresponding to the N objects and the K categories, and with edges linking the object nodes to the category nodes (thus reflecting which objects belong to which categories), contains the same information as the original data set. The latter information is usually represented in matrix form through a binary (0–1) indicator matrix W. It can easily be shown that the (N+K)×(N+K) matrix

A = [ 0  W ; W′  0 ]

corresponds to the adjacency matrix of our graph. The multivariate data graph defined above, with vertex set V and edge set E, has a special structure: the N nodes corresponding to the objects are not connected among themselves, and similarly for the K category nodes. This can also be seen from the two zero submatrices on the diagonal of the adjacency matrix A. Thus, we are dealing with a bipartite graph.
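As a concrete illustration, the indicator matrix W can be built directly from a categorical data table. The sketch below is illustrative only (the toy data, the function name, and the assumption that category labels are unique across variables are mine, not the paper's):

```python
def indicator_matrix(data, categories):
    """Binary N x K matrix W: W[i][k] = 1 iff object i belongs to category k.
    `data` lists, per object, one category label per variable; `categories`
    is the flat list of all K category labels (assumed unique across variables)."""
    col = {c: k for k, c in enumerate(categories)}
    W = [[0] * len(categories) for _ in data]
    for i, profile in enumerate(data):
        for c in profile:
            W[i][col[c]] = 1
    return W

# 3 objects, J = 2 variables (color: red/blue, size: small/large), K = 4.
categories = ["red", "blue", "small", "large"]
data = [["red", "small"], ["blue", "small"], ["blue", "large"]]
W = indicator_matrix(data, categories)
print(W)  # [[1, 0, 1, 0], [0, 1, 1, 0], [0, 1, 0, 1]]
```

Each row of W sums to J, since every object belongs to exactly one category of each variable; stacking W and its transpose into the off-diagonal blocks of A gives the bipartite adjacency matrix.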
A drawing of the graph is a mapping of its vertex set V into p-dimensional space. Adjacent points in the graph are connected by lines in the drawing. This goes in the direction of making a picture of the data, and when things work out well, a picture is worth a lot of numbers, especially when those numbers are just zeros and ones, as several examples in the literature have shown (Gifi, 1990; Michailidis and de Leeuw, 1998).
The quality of the drawing is measured by the loss function

pull2(X,Y) = ∑_{(i,k)∈E} d²(x_i, y_k),    (1.1)

where the x_i contain the coordinates of the N objects, the y_k the coordinates of the K categories of all the variables in p-dimensional space, and d denotes the Euclidean distance. The objective is to arrange the vertices (objects and categories) of the graph in such a way that the loss is small. Thus points which are connected by lines should be close, i.e. the lines in the drawing should be short.
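In code, loss (1.1) is simply the total squared edge length of the drawing. A minimal sketch, assuming an edge-list representation of the bipartite graph (the names `pull2` and `edges` are illustrative, not from the paper):

```python
def pull2(X, Y, edges):
    """Loss (1.1): sum of squared Euclidean edge lengths of the drawing.

    X: N object points, Y: K category points (lists of p-dim lists);
    edges: pairs (i, k) linking object i to category k."""
    return sum(
        sum((xi - yk) ** 2 for xi, yk in zip(X[i], Y[k]))
        for i, k in edges
    )

# Toy drawing in the plane: 3 objects, 2 categories, 3 edges.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
Y = [[0.0, 0.0], [1.0, 1.0]]
edges = [(0, 0), (1, 1), (2, 1)]
print(pull2(X, Y, edges))  # 0 + 1 + 1 = 2.0
```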
If we design algorithms to minimize pull2(X,Y), then we must make sure that the perfect, but trivial, solution X=Y=0 is excluded. This is done by imposing normalization constraints; for example, in MCA drawings are normalized by requiring that X′X=I. Under this normalization the solution to problem (1.1) is characterized by the centroid principle (Gifi, 1990): the category points are located at the center of gravity of the objects they belong to. An additional advantage of this normalization is that the optimal solution is given by an eigenvalue problem (Gifi, 1990). The two-dimensional (p=2) solutions for the Guttman–Bell and sleeping bags data sets (for their description see Section 4), which illustrate the centroid principle, are given in Fig. 1.
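The centroid principle is easy to verify computationally: with X held fixed, minimizing the squared loss over Y decouples into one least-squares problem per category, solved by the centroid of the objects linked to it. A sketch under the same edge-list representation as above (all names are illustrative):

```python
def category_centroids(X, edges, K):
    """Optimal Y for fixed X under the squared loss: each of the K category
    points is the center of gravity of the object points linked to it."""
    p = len(X[0])
    sums = [[0.0] * p for _ in range(K)]
    counts = [0] * K
    for i, k in edges:
        counts[k] += 1
        for d in range(p):
            sums[k][d] += X[i][d]
    return [[sums[k][d] / counts[k] for d in range(p)] for k in range(K)]

# Category 0 contains object 0 only; category 1 contains objects 1 and 2.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
edges = [(0, 0), (1, 1), (2, 1)]
print(category_centroids(X, edges, K=2))  # [[0.0, 0.0], [0.5, 0.5]]
```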
However, MCA has a few drawbacks. The major ones are: (i) the influence of objects with ‘rare’ profiles, which tend to dominate the solution (Michailidis and de Leeuw, 1998), as can be seen in the left part of the Guttman–Bell drawing, and (ii) the presence of horseshoes (De Leeuw et al., 1980).
One possible solution to these problems is to use a more ‘robust’ loss function, such as

pull1(X,Y) = ∑_{(i,k)∈E} d(x_i, y_k),    (1.2)

i.e. the same loss function as (1.1), but without squaring the distance. The same normalization is used as before, requiring that X′X=I. This is a special case of a very general framework introduced in Michailidis and de Leeuw (2001), in which the squared distance in the loss function (1.1) is replaced by a general function φ(d). Robust estimation has a very long history in statistics (Huber, 1981). The case (1.2) was discussed earlier by Heiser (1987), in the context of correspondence analysis (the graphical representation of a two-way table), who gave an algorithm and an example corresponding to our framework. The example showed clustering, in the sense that many of the objects and categories in the optimal drawing on the plane collapsed into single points, leaving only very few distinct points. Heiser (1987, p. 349) made the following comments regarding this clustering phenomenon.
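The unsquared loss has the same edge-sum structure as the quadratic one; only the exponent changes. A minimal sketch, again assuming an edge-list representation of the bipartite graph (`pull1` and `edges` are illustrative names):

```python
import math

def pull1(X, Y, edges):
    """Loss (1.2): sum of (unsquared) Euclidean edge lengths of the drawing."""
    return sum(
        math.sqrt(sum((xi - yk) ** 2 for xi, yk in zip(X[i], Y[k])))
        for i, k in edges
    )

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
Y = [[0.0, 0.0], [1.0, 1.0]]
edges = [(0, 0), (1, 1), (2, 1)]
print(pull1(X, Y, edges))  # 0 + 1 + 1 = 2.0
```

Because each edge contributes its length rather than its squared length, long edges are penalized far less severely, which is exactly what reduces the influence of rare profiles.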
How should we appreciate this result? There are perhaps two views. One is that in the process of mapping the original table into a spatial configuration too much of the fine detail is lost, and that the approach leads to a dead end. The other is that it appears to be possible to devise a class of clustering techniques that is smoothly related to a more continuous representation, and that seems to avoid the usual combinatorial complications.
In Fig. 2, the optimal graph drawings of the Guttman–Bell and sleeping bags data sets under loss function (1.2) are shown. In both cases a very strong clustering pattern emerges for the object points: all of the object nodes occupy only three locations. On the other hand, the category points still seem to obey some form of the centroid principle for the Guttman–Bell example.
Experience with many other categorical data sets with varying numbers of objects, variables and categories per variable confirms the above empirical finding, namely that the optimal two-dimensional layout consists of three object nodes (Michailidis and de Leeuw, 2000). Analogously, the three-dimensional layouts consist of four object nodes. Finally, the result also holds for p=1, where the optimal solution consists of two points only; this case is rigorously proven in De Leeuw and Michailidis (2003). Obviously, such solutions are totally uninteresting from a data analysis point of view, since they are unable to uncover interesting patterns in the data. Hence, it is of great interest to gain insight into the origins of this phenomenon and to examine possible alternatives that overcome the problem.
The paper is organized as follows: Section 2 discusses the structure of the loss function (1.2) and presents several optimization algorithms for computing the optimal solution. In Section 3, the structure of the optimal solution is investigated and in Section 4 the performance of the various algorithms is examined. Finally, in Section 5 we look into other loss functions and present some potential solutions to the strong clustering problem observed.
The loss function and its optimization
Our objective is to minimize loss function (1.2) over all N×p matrices X satisfying X′X=I and over all K×p matrices Y. The N×K matrix W = {w_ik} is the off-diagonal part of the adjacency matrix A of the bipartite multivariate data graph.
For purposes of regularization, to avoid problems with differentiability and division by zero, we actually define

d_ε(x_i, y_k) = √((x_i − y_k)′(x_i − y_k) + ε²)

throughout, where ε is small, and we minimize

pull_{1,ε}(X,Y) = ∑_{(i,k)∈E} d_ε(x_i, y_k).

In the remainder of the paper we will …
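This kind of smoothing is standard: √(d² + ε²) is differentiable everywhere and bounded below by ε, so update formulas that divide by the distance never divide by zero. A sketch assuming that form (the paper's exact definition appears in the full text; the name `d_eps` is mine):

```python
import math

def d_eps(x, y, eps=1e-6):
    """Smoothed Euclidean distance sqrt(d^2 + eps^2): differentiable
    everywhere and never exactly zero."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) + eps ** 2)

print(d_eps([0.0, 0.0], [3.0, 4.0]))        # ~5.0: eps barely perturbs it
print(d_eps([1.0, 1.0], [1.0, 1.0], 1e-3))  # ~eps when the points coincide
```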
Guttman–Bell dataset
This small dataset dealing with attitudes of social groups (also analyzed in Guttman (1968) and in Gifi (1990)) consists of 7 objects and 5 variables, with a total of K=17 categories. In Fig. 3, the homogeneity analysis solutions under the pull2 and pull1 loss functions are given. The lines indicate which objects under the pull2 solution are mapped to the three points that describe the pull1 solution. This mapping of the object points to one of the three locations is completely determined by the …
The structure of the optimal solution
The block relaxation algorithm has provided insight into the structure of the optimal solution with respect to the category points: since the y_j's must correspond to multivariate medians, their position in the optimal graph layout is completely determined by this requirement.
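The multivariate (spatial) median has no closed form, but the classical Weiszfeld iteration computes it; under loss (1.2), each category point solves a problem of this type for the objects linked to it. A minimal sketch (not the paper's implementation; names and the toy example are mine):

```python
import math

def spatial_median(points, iters=500, eps=1e-12):
    """Weiszfeld iteration: iteratively reweighted averaging converging to
    the point minimizing the sum of Euclidean distances to `points`.
    eps guards the weights against division by a zero distance."""
    p = len(points[0])
    y = [sum(pt[d] for pt in points) / len(points) for d in range(p)]  # start at the mean
    for _ in range(iters):
        wsum = 0.0
        num = [0.0] * p
        for pt in points:
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(pt, y))) + eps
            w = 1.0 / dist  # weight inversely proportional to distance
            wsum += w
            for d in range(p):
                num[d] += w * pt[d]
        y = [num[d] / wsum for d in range(p)]
    return y

# Fermat point of the triangle (0,0), (10,0), (0,10): it lies on the
# symmetry line x == y, at roughly (2.113, 2.113).
print(spatial_median([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]))
```

Each Weiszfeld step is itself a weighted least-squares update, which is the same majorization idea used for the full loss.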
Moreover, on the basis of extensive numerical experience (see the previous section and also Michailidis and de Leeuw (2000)), we make the following conjecture.

Conjecture 4.1. The p-dimensional optimal solution X that minimizes the pull(X,Y) …
Discussion: other loss functions and potential solutions
The pull1(X,Y) function used so far is a special case of the more general class of loss functions

pull_q(X,Y) = ∑_{(i,k)∈E} d^q(x_i, y_k),    1 ≤ q ≤ 2.

This is a family of convex functions with growth rates slower than quadratic, and it contains the pull2 and pull1 functions as its extreme cases. An application of Young's inequality shows that we can construct a majorization algorithm to minimize members of this class under the normalization constraint X′X=I. Specifically we have that …
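The kind of quadratic majorizer that Young's inequality produces can be checked numerically. The sketch below uses one standard form, d^q ≤ (q/2)·c^{q−2}·d² + (1 − q/2)·c^q, which touches d^q at the current value c and lies above it elsewhere; the exact constants in the paper's algorithm may differ:

```python
def quad_majorizer(d, c, q):
    """Quadratic-in-d upper bound for d**q (1 <= q <= 2) that is tight at
    d == c. Minimizing this bound instead of d**q turns each majorization
    step into a weighted least-squares subproblem."""
    return (q / 2) * c ** (q - 2) * d ** 2 + (1 - q / 2) * c ** q

c = 2.0
for q in (1.0, 1.5, 2.0):
    assert abs(quad_majorizer(c, c, q) - c ** q) < 1e-12  # tight at d == c
    for d in (0.1, 0.5, 1.0, 3.0, 10.0):
        assert quad_majorizer(d, c, q) >= d ** q          # majorizes everywhere
print("majorization inequality verified")
```

At q = 2 the bound coincides with d² itself, and at q = 1 it reduces to the familiar (d²/c + c)/2 ≥ d bound used for the unsquared loss.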
Acknowledgements
The authors would like to thank the AE and an anonymous referee for useful suggestions that improved the presentation. The work of George Michailidis was supported in part by NSF under grants IIS-9988095 and DMS-0214171.
References (22)

- Heiser, W.J., 1987. Correspondence analysis with least absolute residuals. Comput. Statist. Data Anal.
- De Leeuw, J., Michailidis, G., 2000a. Graph layout techniques and multidimensional data analysis. In: Bruss, F.T., Le...
- et al., 2000. Block relaxation algorithms in statistics. J. Comput. Graphical Statist.
- De Leeuw, J., Michailidis, G., 2003. Weber correspondence analysis: the one-dimensional case. Technical Report #343,...
- De Leeuw, J., et al., 1980. Nonlinear principal components analysis with B-splines. Methods Oper. Res.
- Drezner, Z. (Ed.), 1995. Facility Location. Springer, New...
- et al., 1999. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl.
- Gifi, A., 1990. Nonlinear Multivariate Analysis. Wiley.
- Guttman, L., 1968. A general non-metric technique for finding the smallest coordinate space for a configuration of points. Psychometrika.
- Huber, P.J., 1981. Robust Statistics. Wiley.
- Kiers, H.A.L., 1997. Weighted least squares fitting using ordinary least squares algorithms. Psychometrika.