Homogeneity analysis using absolute deviations

https://doi.org/10.1016/j.csda.2004.03.007

Abstract

Homogeneity analysis is a technique for making graphical representations of categorical multivariate data sets. Such data sets can also be represented by the adjacency matrix of a bipartite graph. Homogeneity analysis optimizes a weighted least-squares criterion, and the optimal graph layout is computed by an alternating least squares algorithm. Heiser (Comput. Statist. Data Anal. (1987) 337) looked at homogeneity analysis under a criterion that is more robust to outliers, namely a weighted least absolute deviations criterion. In this paper, we take an in-depth look at the mathematical structure of this problem and show that the graph drawings are created by reciprocal computation of multivariate medians. Several algorithms for computing the solution are investigated, and applications to actual data suggest that the resulting p-dimensional drawings (p⩾2) are degenerate, in the sense that all object points are clustered at p+1 locations. We also examine some variations of the criterion used and conclude that the degenerate solutions observed are a consequence of the normalization constraint employed in this class of problems.

Introduction

Homogeneity Analysis (also known as Multiple Correspondence Analysis (MCA)) is a well-known technique for making graphical representations of categorical multivariate data (Gifi, 1990). It can also be presented as a technique for producing informative layouts of bipartite graphs (Michailidis and de Leeuw, 1998; De Leeuw and Michailidis, 2000a, 2000b).

The setting is as follows: data have been collected for N objects on J categorical variables, with $k_j$ categories for variable j. Let $K=\sum_{j=1}^{J}k_j$ be the total number of categories in the data set. Then a graph G, with nodes (vertices) corresponding to the N objects and the K categories, and with edges linking the object nodes to the category nodes (thus reflecting which objects belong to which categories), contains the same information as the original data set. The latter information is usually represented in matrix form through a binary (0-1) matrix $W=\{w_{ij}\,|\,i=1,\ldots,N,\ j=1,\ldots,K\}$. It can easily be shown that the $(N+K)\times(N+K)$ matrix
$$A=\begin{pmatrix}0 & W\\ W' & 0\end{pmatrix}$$
corresponds to the adjacency matrix of our graph. The multivariate data graph G defined above, with vertex set V and edge set E, has a special structure, namely that the N nodes corresponding to the objects are not connected among themselves, and similarly for the K category nodes. This can also be seen from the two zero submatrices in the adjacency matrix A of G. Thus, we are dealing with a bipartite graph.
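To make the construction concrete, the following minimal sketch (not from the paper; the toy data and category labels are invented for illustration) builds the indicator matrix W and the bipartite adjacency matrix A with NumPy:

```python
import numpy as np

# Toy categorical data: N = 4 objects, J = 2 variables (labels invented).
data = [
    ["red", "small"],
    ["blue", "small"],
    ["red", "large"],
    ["green", "large"],
]

# Enumerate the categories of each variable; K = sum of k_j.
categories = []  # list of (variable index, category label)
for j in range(len(data[0])):
    for level in sorted({row[j] for row in data}):
        categories.append((j, level))

N, K = len(data), len(categories)

# Indicator matrix W: w_ij = 1 iff object i belongs to category j.
W = np.zeros((N, K))
for i, row in enumerate(data):
    for j, (var, level) in enumerate(categories):
        if row[var] == level:
            W[i, j] = 1.0

# Adjacency matrix of the bipartite graph G: A = [[0, W], [W', 0]].
A = np.block([[np.zeros((N, N)), W],
              [W.T, np.zeros((K, K))]])

print(W)        # each row contains exactly J ones
print(A.shape)  # (N + K, N + K)
```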

A drawing of the graph G is a mapping of its vertex set V into p-dimensional space. Adjacent points in the graph are connected by lines in the drawing. This goes in the direction of making a picture of the data, and when things work out well, a picture is worth a lot of numbers, especially when these numbers are just zeros and ones as several examples in the literature have shown (Gifi, 1990; Michailidis and de Leeuw, 1998).

The quality of the drawing is measured by the loss function
$$\mathrm{pull}_2(X,Y)=\sum_{i=1}^{N}\sum_{j=1}^{K}w_{ij}\,d^2(x_i,y_j),\qquad(1.1)$$
where the $x_i$ contain the coordinates of the N objects, the $y_j$ the coordinates of the K categories of all the variables in p-dimensional space, and d denotes the Euclidean distance. The objective is to arrange the vertices (objects and categories) of the graph in such a way that the loss is small. Thus points which are connected by lines should be close, i.e. the lines in the drawing should be short.
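Loss (1.1) translates directly into code. In the hypothetical helper below, W, X and Y are NumPy arrays of shapes N×K, N×p and K×p; power=1 recovers the pull_1 loss discussed below:

```python
import numpy as np

def pull(W, X, Y, power=2):
    """Weighted sum of Euclidean distances between connected object
    points X and category points Y.  power=2 gives pull_2 (eq. 1.1);
    power=1 gives pull_1 (eq. 1.2)."""
    # dist[i, j] = ||x_i - y_j||
    dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    return np.sum(W * dist**power)
```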

If we design algorithms to minimize pull_2(X,Y), then we must make sure that the perfect, but trivial, solution X=Y=0 is excluded. This is done by imposing normalization constraints. For example, in MCA drawings are normalized by requiring that X′X=I. Under this normalization the solution to problem (1.1) is characterized by the centroid principle (Gifi, 1990), namely that the category points are located at the center of gravity of the objects they belong to. An additional advantage of this normalization is that the optimal solution is given by an eigenvalue problem (Gifi, 1990). The p=2-dimensional solutions for the Guttman–Bell and sleeping bags data sets (for their description see Section 4), which illustrate the centroid principle, are given in Fig. 1.
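The centroid principle suggests a simple reciprocal-averaging scheme. The loop below is an illustrative alternating least squares sketch under the X′X=I constraint (a simplification for exposition, not the exact algorithm of the MCA literature; it assumes every object and category has at least one edge):

```python
import numpy as np

def mca_als(W, p=2, iters=200, seed=0):
    """Alternating least squares for pull_2 under X'X = I (sketch).
    Y-step: centroid principle -- each category point is the average
    of the object points connected to it.  X-step: each object moves
    to the average of its category points; X is then centered and
    re-orthonormalized."""
    rng = np.random.default_rng(seed)
    N, K = W.shape
    X = np.linalg.qr(rng.standard_normal((N, p)))[0]  # X'X = I
    for _ in range(iters):
        Y = (W.T @ X) / W.sum(axis=0)[:, None]  # centroids of objects
        Z = (W @ Y) / W.sum(axis=1)[:, None]    # averages of categories
        Z -= Z.mean(axis=0)                     # exclude the trivial solution
        X = np.linalg.qr(Z)[0]                  # project back onto X'X = I
    return X, Y
```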

However, MCA has a few drawbacks; the major ones are: (i) the influence of objects with ‘rare’ profiles that tend to dominate the solution (Michailidis and de Leeuw, 1998), as can be seen on the left part of the picture for the Guttman–Bell drawing and (ii) the presence of horseshoes (De Leeuw et al., 1980).

One possible solution to these problems is to use a more ‘robust’ loss function, such as
$$\mathrm{pull}_1(X,Y)=\sum_{i=1}^{N}\sum_{j=1}^{K}w_{ij}\,d(x_i,y_j),\qquad(1.2)$$
i.e. the same loss function as (1.1), but without squaring the distance. The same normalization is used as before, requiring that X′X=I. This is a special case of a very general framework introduced in Michailidis and de Leeuw (2001), in which the square of the distance in the definition of the loss function (1.1) is replaced by a general function φ(d). Robust estimation has a very long history in statistics (Huber, 1981). The case (1.2) was discussed earlier by Heiser (1987), in the context of correspondence analysis (the graphical representation of a two-way table), who gave an algorithm and an example that correspond to our framework. The example showed clustering, in the sense that many of the objects and categories in the optimal drawing on the plane were collapsed into single points, and only very few distinct points were left. Heiser (1987, p. 349) made the following comments regarding this clustering phenomenon.

How should we appreciate this result? There are perhaps two views. One is that in the process of mapping the original table into a spatial configuration too much of the fine detail is lost, and that the approach leads to a dead end. The other is that it appears to be possible to devise a class of clustering techniques that is smoothly related to a more continuous representation, and that seems to avoid the usual combinatorial complications.

In Fig. 2, the optimal graph drawings of the Guttman–Bell and sleeping bags data sets under loss function (1.2) are shown. In both cases a very strong clustering pattern emerges for the object points; i.e. all of the object nodes occupy only three locations. On the other hand, the category points still seem to obey some form of the centroid principle for the Guttman–Bell example.

Experience with many other categorical data sets, with varying numbers of objects, variables and categories per variable, confirms the above empirical finding, namely that the optimal 2-dimensional layout places the object points at only three locations (Michailidis and de Leeuw, 2000). Analogously, the three-dimensional layouts consist of four object locations. Finally, for p=1 the analogous result, that the optimal solution consists of two points only, is rigorously proven in De Leeuw and Michailidis (2003). Obviously, such solutions are totally uninteresting from a data analysis point of view, since they are unable to uncover interesting patterns in the data. Hence, it is of great interest to gain insight into the origins of this phenomenon and to examine possible alternatives that overcome the problem.

The paper is organized as follows: Section 2 discusses the structure of the loss function (1.2) and presents several optimization algorithms for computing the optimal solution. In Section 3, the structure of the optimal solution is investigated and in Section 4 the performance of the various algorithms is examined. Finally, in Section 5 we look into other loss functions and present some potential solutions to the strong clustering problem observed.


The loss function and its optimization

Our objective is to minimize loss function (1.2) over all N×p matrices X satisfying X′X=I and over all K×p matrices Y. The N×K matrix $W=\{w_{ij}\}$ is the off-diagonal part of the adjacency matrix A of the bipartite multivariate data graph G.

For purposes of regularization, to avoid problems with differentiability and division by zero, we actually define
$$d_\varepsilon^2(x_i,y_j)\triangleq(x_i-y_j)'(x_i-y_j)+\varepsilon^2$$
throughout, where ε is small, and we minimize
$$\mathrm{pull}_{(1,\varepsilon)}(X,Y)=\sum_{i=1}^{N}\sum_{j=1}^{K}w_{ij}\,d_\varepsilon(x_i,y_j).$$
In the remainder of the paper we will
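The regularized loss is straightforward to evaluate; a minimal sketch, with the default value of ε chosen arbitrarily for illustration:

```python
import numpy as np

def pull1_eps(W, X, Y, eps=1e-6):
    """Regularized pull_1: sum_ij w_ij * sqrt(||x_i - y_j||^2 + eps^2).
    The eps term keeps the loss differentiable everywhere and avoids
    division by zero in gradient or majorization updates."""
    sq = np.sum((X[:, None, :] - Y[None, :, :])**2, axis=2)
    return np.sum(W * np.sqrt(sq + eps**2))
```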

Guttman–Bell dataset

This small dataset, dealing with attitudes of social groups (also analyzed in Guttman (1968) and in Gifi (1990)), consists of 7 objects and 5 variables with a total of K=17 categories. In Fig. 3, the homogeneity analysis solutions under the pull_2 and the pull_1 loss functions are given. The lines indicate which objects under the pull_2 solution are mapped to the three points that describe the pull_1 solution. This mapping of the object points to one of the 3 locations is completely determined by the

The structure of the optimal solution

The block relaxation algorithm has provided insight into the structure of the optimal solution with respect to the category points. Since the $y_j$'s must correspond to multivariate medians, their position in the optimal graph layout is completely determined by this requirement.
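Since each category point is a multivariate (spatial) median of the object points connected to it, it can be computed by Weiszfeld's classical iteration. A minimal sketch, using the same ε-regularization as above:

```python
import numpy as np

def spatial_median(points, weights=None, eps=1e-6, iters=100):
    """Weiszfeld iteration for the weighted multivariate (spatial) median:
    the point minimizing the weighted sum of Euclidean distances to
    `points`.  Each category point y_j in the optimal layout solves such
    a problem over the object points connected to it."""
    points = np.asarray(points, dtype=float)
    w = np.ones(len(points)) if weights is None else np.asarray(weights, float)
    y = np.average(points, axis=0, weights=w)  # start from the weighted mean
    for _ in range(iters):
        dist = np.sqrt(np.sum((points - y)**2, axis=1) + eps**2)
        coef = w / dist
        y = (coef[:, None] * points).sum(axis=0) / coef.sum()
    return y
```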

Moreover, on the basis of extensive numerical experience (see previous section and also Michailidis and de Leeuw (2000)) we make the following conjecture.

Conjecture 4.1

The p-dimensional optimal solution X that minimizes the pull(X,Y)

Discussion: other loss functions and potential solutions

The pull_1(X,Y) function used so far is a special case of the more general class of functions defined by
$$\mathrm{pull}_\beta(X,Y)=\sum_{i=1}^{N}\sum_{j=1}^{K}w_{ij}\,d(x_i,y_j)^\beta,\qquad\beta\in[1,2].$$
This is a family of convex functions with growth rates slower than quadratic. The class contains as extreme cases both the pull_2 and the pull_1 functions. An application of Young's inequality shows that we can construct a majorization algorithm to minimize members of this class under the X′X=I normalization constraint. Specifically we have that d(x_i,y_j
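For illustration, members of the β-family are easy to evaluate numerically with the generic pull helper from the introduction; the toy snippet below (invented 0-1 weights, arbitrary configuration) compares the extreme and an intermediate case:

```python
import numpy as np

rng = np.random.default_rng(0)
W = (rng.random((7, 17)) < 0.3).astype(float)     # toy 0-1 weights
X = np.linalg.qr(rng.standard_normal((7, 2)))[0]  # X'X = I
Y = rng.standard_normal((17, 2))

for beta in (1.0, 1.5, 2.0):
    dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    print(beta, np.sum(W * dist**beta))  # pull_beta for this drawing
```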

Acknowledgements

The authors would like to thank the AE and an anonymous referee for useful suggestions that improved the presentation. The work of George Michailidis was supported in part by NSF under grants IIS-9988095 and DMS-0214171.

References

  • Heiser, W.J., 1987. Correspondence analysis with least absolute residuals. Comput. Statist. Data Anal.
  • De Leeuw, J., Michailidis, G., 2000a. Graph layout techniques and multidimensional data analysis. In: Bruss, F.T., Le...
  • De Leeuw, J., et al., 2000. Block relaxation algorithms in statistics. J. Comput. Graphical Statist.
  • De Leeuw, J., Michailidis, G., 2003. Weber correspondence analysis: the one dimensional case. Technical Report #343,...
  • De Leeuw, J., et al., 1980. Nonlinear principal components analysis with B-splines. Methods Oper. Res.
  • Drezner, Z. (Ed.), 1995. Facility Location. Springer, New...
  • Edelman, A., et al., 1999. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl.
  • Gifi, A., 1990. Nonlinear Multivariate Analysis.
  • Guttman, L., 1968. A general non-metric technique for fitting the smallest coordinate space for a configuration of points. Psychometrika.
  • Huber, P., 1981. Robust Statistics.
  • Kiers, H.A.L., 1997. Weighted least squares fitting using ordinary least squares algorithms. Psychometrika.