A permutation-based algorithm for block clustering

Duffy, Diane E.; Quiroz, adolfo J.

doi:10.1007/BF02616248

A permutation-based algorithm for block clustering

Published: January 1991

Volume 8, pages 65–91, (1991)
Cite this article

Journal of Classification Aims and scope Submit manuscript

Diane E. Duffy¹ &
adolfo J. Quiroz²

229 Accesses
41 Citations
Explore all metrics

Abstract

Hartigan (1972) discusses the direct clustering of a matrix of data into homogeneous blocks. He introduces a stepwise divisive method for block clustering within a certain class of block structures which induce clustering trees for both row and column margins. While this class of structures is appealing, the stopping criterion for his method, which is based on asymptotic theory and the assumption that the individual elements of the data matrix are normally distributed, is quite restrictive. In this paper we propose a permutation-based algorithm for block clustering within the same class of block structures. By using permutation arguments to decide where to split and when to stop, our algorithm becomes applicable in a wide variety of cases, including matrices of categorical data and matrices of small-to-moderate size. In addition, our algorithm offers considerable flexibility in how block homogeneity is defined. The algorithm is studied in a series of simulation experiments on matrices of known structure, and illustrated in examples drawn from the fields of taxonomy, political science, and data architecture.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

ADELSON, R. M., nORMAN, J. M., and LAPORTE, G. (1976), “A Dynamic Programming Formulation With Diverse Applications,”Operational Research Quaterly, 27, 119–121.
MATH Google Scholar
ALDOUS, D. (1987), “On the Markov Chain Simulation Method for Uniform Combinatorial Distributions and Simulated Annealing,”Probability in the Engineering and Informational Sciences, 1, 33–46.
Article MATH Google Scholar
ARABIE, P., BOORMAN, S. A. and LEVITT, P. R. (1978), “Constructing Blockmodels: How and Why,”Journal of Mathematical Psychology, 17, 21–63.
Article MATH Google Scholar
ARABIE, P., and BOORMAN, S. A. (1982), “Blockmodels; Development and Prospects,” inClassifying Social Data: New Applications of Analytic Methods for Social Science Research, Eds., Herschel C. Hudson and Associates, San Francisco: Jossey-Bass Publisher, Chapter 11.
Google Scholar
BILLINGSLEY, P. (1968),Convergence of Probability Measures, New York: Wiley.
MATH Google Scholar
BOCK, H. H. (1979), “Simultaneous Clustering of Objects and Variables,” inAnalyse de Données et Informatique. Le Chesnay (France): Institut National de Recherche en Informatique et en Automatique, 187–203.
Google Scholar
BREIGER, R. L., BOORMAN, S. A., and ARABIE, P. (1975), “An Algorithm for Clustering Relational Data With Applications to Social Network Analysis and Comparison with Multidimensional Scaling,”Journal of Matheamtical Psychology, 12, 328–383.
Article Google Scholar
BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R., and STONE, C. J. (1984),Classification and Regression Trees, Belmont, CA: Wadsworth:
MATH Google Scholar
DE SOETE, G., DESARBO, W. S., FURNAS, G. W., and CARROLL, J. D. (1984), “The Estimation of Ultrametric and Path Length Trees From Rectangular Proximity Data,”Psychometrika, 49 (3), 289–310.
Article Google Scholar
DEUTSCH, S. B. and MARTIN J. J. (1971), “An Ordering Algorithm for Analysis of Data Arrays,”Operations Research, 19, 1350–1362.
Article MATH Google Scholar
DIACONIS, P., and SHASHAHANI, M. (1981), “Generating a Random Permutation with Random Transposition,”Zeitschrift fuer Wahrscheinlichkeitstheorie and Verwandte Gebiete, 57, 159–179.
Article MATH Google Scholar
DUFFY, D. E., and KEMPERMAN, J. H. B. (1990), “Entropy-Based Splitting Criteria,” Morristown, NJ: Bell Communications Research Technical memorandum, in preparation.
DUFFY, D. E., FOWLKES, E. B., and KANE, L. D. (1987), “Cluster Analysis in Strategic Data Architecture Design,” in 1987Bellcore Database Symposium. Morristown, NJ: Bell Communications Research, Inc., 175–186.
Google Scholar
EDELSBRUNNER, H. (1987),Algorithms in Combinatorial Geometry, New York: Springer-Verlag.
MATH Google Scholar
GILULA, Z. (1986), “Grouping and Association in Contingency Tables: An Exploratory Canonical Correlation Approach,”Journal of the American Statistical Association, 81, 773–779.
Article MATH MathSciNet Google Scholar
GOODMAN, L. A. (1981), “Critieria for Determining Whether Certain Categories in a Cross-Classification Table Should be Combined With Special Reference to Occupation Categories in an Occupational Mobility Table,”American Journal of Sociology, 87, 612–650.
Article Google Scholar
GOVAERT, G. (1977), “Algorithme de Classification d'un Tableau de Contingence,”Proceedings of the First International Symposium on Data Analysis and Informatics, 2, Le Chesnay (France): Institut National de Recherche en Informatique et Automatique, 487–500.
Google Scholar
GREENACRE, M. J. (1988), “Clustering the Rows and Columns of a Contingency Table,”Journal of Classification, 5, 39–51.
Article MATH MathSciNet Google Scholar
HARTIGAN, J. A. (1972), “Direct Clustering of a Data Matrix,”Journal of the American Statistical Association, 6, 123–129.
Article Google Scholar
HARTIGAN, J. A. (1975),Clustering Algorithms, New York: Wiley.
MATH Google Scholar
HARTIGAN, J. A. (1976), “Modal Blocks in Dentition of West Coast Mammals,”Systematic Zoology, 25, 149–160.
Article Google Scholar
HEISER, W. J., and MEULMAN, J. (1983), “Analyzing Rectangular Tables by Joint and Constrained Multidimensional Scalling,”Journal of Econometrics, 22, 139–167.
Article Google Scholar
HILL, M. O. (1974), “Correspondence Analysis: A Neglected Multivariate Method,”Applied Statistics, 23, 340–354.
Article Google Scholar
HOLLAND, P. W., and LEINHARDT, S. (1981), “An Exponential Family of Probability Distributions for Directed Graphs (with discussion),”Journal of the American Statistical Association, 76, 33–65.
Article MATH MathSciNet Google Scholar
HUBERT, L. J. (1974), “Problems of Seriation Using a Subject by Item Response Matrix,”Psychological Bulletin, 81, 976–983.
Article Google Scholar
HUBERT, L. J., and GOLLEDGE, R. G. (1981), “Matrix Reorganization and Dynamic Programming: Applications to Paired Comparisons and Unidimensional Seriation,”Psychometrika, 46, 429–441.
Article MATH Google Scholar
LAMBERT, J. M. and WILLIAMS, W. T. (1962), “Multivariate Methods in Plant Ecology IV. Nodal Analysis,”Journal of Ecology, 50, 775–802.
Article Google Scholar
MEHTA, C. R., PATEL, N. R., and GRAY, R. (1985), “Computing an Exact Confidence Interval for the Common Odds Ratio in Several 2×2 Contingency Tables,”Journal of the American Statistical Association, 8, 969–973.
Article MathSciNet Google Scholar
SCHMID, B. (1984), “Niche Width and Variation Within and Between Populations in Colonizing Species (Carex flava group)”Oecologia (Berlin),63, 1–5.
Article Google Scholar
SOKAL, R. R., and SNEATH, P. H. A. (1963),Principles of Numerical Taxonomy, San Francisco: Freeman.
Google Scholar
WANG, Y. J., and WONG, G. Y. (1987), “Stochastic Blockmodels for Directed Graphs,”Journal of the American Statistical Association, 82, 8–19.
Article MathSciNet Google Scholar
WONG, G. Y. (1987), “Bayesian Models for Directed Graphs,”Journal of the American Statistical Association, 82, 140–148.
Article MATH MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

Bellcore, Morristowri, NJ, USA
Diane E. Duffy
Universidad Simon Bolivar, Caracas, Venezuela
adolfo J. Quiroz

Authors

Diane E. Duffy
View author publications
You can also search for this author in PubMed Google Scholar
adolfo J. Quiroz
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

About this article

Cite this article

Duffy, D.E., Quiroz, a.J. A permutation-based algorithm for block clustering. Journal of Classification 8, 65–91 (1991). https://doi.org/10.1007/BF02616248

Download citation

Issue Date: January 1991
DOI: https://doi.org/10.1007/BF02616248

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A permutation-based algorithm for block clustering

Abstract

Access this article

Similar content being viewed by others

Fast and consistent algorithm for the latent block model

Clustering via Nonsymmetric Partition Distributions

Randomized-Blocks Designs

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A permutation-based algorithm for block clustering

Abstract

Access this article

Similar content being viewed by others

Fast and consistent algorithm for the latent block model

Clustering via Nonsymmetric Partition Distributions

Randomized-Blocks Designs

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation