Skip to main content
Log in

A permutation-based algorithm for block clustering

  • Published:
Journal of Classification Aims and scope Submit manuscript

Abstract

Hartigan (1972) discusses the direct clustering of a matrix of data into homogeneous blocks. He introduces a stepwise divisive method for block clustering within a certain class of block structures which induce clustering trees for both row and column margins. While this class of structures is appealing, the stopping criterion for his method, which is based on asymptotic theory and the assumption that the individual elements of the data matrix are normally distributed, is quite restrictive. In this paper we propose a permutation-based algorithm for block clustering within the same class of block structures. By using permutation arguments to decide where to split and when to stop, our algorithm becomes applicable in a wide variety of cases, including matrices of categorical data and matrices of small-to-moderate size. In addition, our algorithm offers considerable flexibility in how block homogeneity is defined. The algorithm is studied in a series of simulation experiments on matrices of known structure, and illustrated in examples drawn from the fields of taxonomy, political science, and data architecture.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • ADELSON, R. M., nORMAN, J. M., and LAPORTE, G. (1976), “A Dynamic Programming Formulation With Diverse Applications,”Operational Research Quaterly, 27, 119–121.

    MATH  Google Scholar 

  • ALDOUS, D. (1987), “On the Markov Chain Simulation Method for Uniform Combinatorial Distributions and Simulated Annealing,”Probability in the Engineering and Informational Sciences, 1, 33–46.

    Article  MATH  Google Scholar 

  • ARABIE, P., BOORMAN, S. A. and LEVITT, P. R. (1978), “Constructing Blockmodels: How and Why,”Journal of Mathematical Psychology, 17, 21–63.

    Article  MATH  Google Scholar 

  • ARABIE, P., and BOORMAN, S. A. (1982), “Blockmodels; Development and Prospects,” inClassifying Social Data: New Applications of Analytic Methods for Social Science Research, Eds., Herschel C. Hudson and Associates, San Francisco: Jossey-Bass Publisher, Chapter 11.

    Google Scholar 

  • BILLINGSLEY, P. (1968),Convergence of Probability Measures, New York: Wiley.

    MATH  Google Scholar 

  • BOCK, H. H. (1979), “Simultaneous Clustering of Objects and Variables,” inAnalyse de Données et Informatique. Le Chesnay (France): Institut National de Recherche en Informatique et en Automatique, 187–203.

    Google Scholar 

  • BREIGER, R. L., BOORMAN, S. A., and ARABIE, P. (1975), “An Algorithm for Clustering Relational Data With Applications to Social Network Analysis and Comparison with Multidimensional Scaling,”Journal of Matheamtical Psychology, 12, 328–383.

    Article  Google Scholar 

  • BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R., and STONE, C. J. (1984),Classification and Regression Trees, Belmont, CA: Wadsworth:

    MATH  Google Scholar 

  • DE SOETE, G., DESARBO, W. S., FURNAS, G. W., and CARROLL, J. D. (1984), “The Estimation of Ultrametric and Path Length Trees From Rectangular Proximity Data,”Psychometrika, 49 (3), 289–310.

    Article  Google Scholar 

  • DEUTSCH, S. B. and MARTIN J. J. (1971), “An Ordering Algorithm for Analysis of Data Arrays,”Operations Research, 19, 1350–1362.

    Article  MATH  Google Scholar 

  • DIACONIS, P., and SHASHAHANI, M. (1981), “Generating a Random Permutation with Random Transposition,”Zeitschrift fuer Wahrscheinlichkeitstheorie and Verwandte Gebiete, 57, 159–179.

    Article  MATH  Google Scholar 

  • DUFFY, D. E., and KEMPERMAN, J. H. B. (1990), “Entropy-Based Splitting Criteria,” Morristown, NJ: Bell Communications Research Technical memorandum, in preparation.

  • DUFFY, D. E., FOWLKES, E. B., and KANE, L. D. (1987), “Cluster Analysis in Strategic Data Architecture Design,” in 1987Bellcore Database Symposium. Morristown, NJ: Bell Communications Research, Inc., 175–186.

    Google Scholar 

  • EDELSBRUNNER, H. (1987),Algorithms in Combinatorial Geometry, New York: Springer-Verlag.

    MATH  Google Scholar 

  • GILULA, Z. (1986), “Grouping and Association in Contingency Tables: An Exploratory Canonical Correlation Approach,”Journal of the American Statistical Association, 81, 773–779.

    Article  MATH  MathSciNet  Google Scholar 

  • GOODMAN, L. A. (1981), “Critieria for Determining Whether Certain Categories in a Cross-Classification Table Should be Combined With Special Reference to Occupation Categories in an Occupational Mobility Table,”American Journal of Sociology, 87, 612–650.

    Article  Google Scholar 

  • GOVAERT, G. (1977), “Algorithme de Classification d'un Tableau de Contingence,”Proceedings of the First International Symposium on Data Analysis and Informatics, 2, Le Chesnay (France): Institut National de Recherche en Informatique et Automatique, 487–500.

    Google Scholar 

  • GREENACRE, M. J. (1988), “Clustering the Rows and Columns of a Contingency Table,”Journal of Classification, 5, 39–51.

    Article  MATH  MathSciNet  Google Scholar 

  • HARTIGAN, J. A. (1972), “Direct Clustering of a Data Matrix,”Journal of the American Statistical Association, 6, 123–129.

    Article  Google Scholar 

  • HARTIGAN, J. A. (1975),Clustering Algorithms, New York: Wiley.

    MATH  Google Scholar 

  • HARTIGAN, J. A. (1976), “Modal Blocks in Dentition of West Coast Mammals,”Systematic Zoology, 25, 149–160.

    Article  Google Scholar 

  • HEISER, W. J., and MEULMAN, J. (1983), “Analyzing Rectangular Tables by Joint and Constrained Multidimensional Scalling,”Journal of Econometrics, 22, 139–167.

    Article  Google Scholar 

  • HILL, M. O. (1974), “Correspondence Analysis: A Neglected Multivariate Method,”Applied Statistics, 23, 340–354.

    Article  Google Scholar 

  • HOLLAND, P. W., and LEINHARDT, S. (1981), “An Exponential Family of Probability Distributions for Directed Graphs (with discussion),”Journal of the American Statistical Association, 76, 33–65.

    Article  MATH  MathSciNet  Google Scholar 

  • HUBERT, L. J. (1974), “Problems of Seriation Using a Subject by Item Response Matrix,”Psychological Bulletin, 81, 976–983.

    Article  Google Scholar 

  • HUBERT, L. J., and GOLLEDGE, R. G. (1981), “Matrix Reorganization and Dynamic Programming: Applications to Paired Comparisons and Unidimensional Seriation,”Psychometrika, 46, 429–441.

    Article  MATH  Google Scholar 

  • LAMBERT, J. M. and WILLIAMS, W. T. (1962), “Multivariate Methods in Plant Ecology IV. Nodal Analysis,”Journal of Ecology, 50, 775–802.

    Article  Google Scholar 

  • MEHTA, C. R., PATEL, N. R., and GRAY, R. (1985), “Computing an Exact Confidence Interval for the Common Odds Ratio in Several 2×2 Contingency Tables,”Journal of the American Statistical Association, 8, 969–973.

    Article  MathSciNet  Google Scholar 

  • SCHMID, B. (1984), “Niche Width and Variation Within and Between Populations in Colonizing Species (Carex flava group)”Oecologia (Berlin),63, 1–5.

    Article  Google Scholar 

  • SOKAL, R. R., and SNEATH, P. H. A. (1963),Principles of Numerical Taxonomy, San Francisco: Freeman.

    Google Scholar 

  • WANG, Y. J., and WONG, G. Y. (1987), “Stochastic Blockmodels for Directed Graphs,”Journal of the American Statistical Association, 82, 8–19.

    Article  MathSciNet  Google Scholar 

  • WONG, G. Y. (1987), “Bayesian Models for Directed Graphs,”Journal of the American Statistical Association, 82, 140–148.

    Article  MATH  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Duffy, D.E., Quiroz, a.J. A permutation-based algorithm for block clustering. Journal of Classification 8, 65–91 (1991). https://doi.org/10.1007/BF02616248

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF02616248

Keywords

Navigation