Order-Constrained Solutions in K-Means Clustering: Even Better Than Being Globally Optimal

Steinley, Douglas; Hubert, Lawrence

doi:10.1007/s11336-008-9058-z

Order-Constrained Solutions in K-Means Clustering: Even Better Than Being Globally Optimal

Theory and Methods
Published: 14 February 2008

Volume 73, pages 647–664, (2008)
Cite this article

Psychometrika Aims and scope Submit manuscript

Douglas Steinley¹ &
Lawrence Hubert²

320 Accesses
24 Citations
Explore all metrics

Abstract

This paper proposes an order-constrained K-means cluster analysis strategy, and implements that strategy through an auxiliary quadratic assignment optimization heuristic that identifies an initial object order. A subsequent dynamic programming recursion is applied to optimally subdivide the object set subject to the order constraint. We show that although the usual K-means sum-of-squared-error criterion is not guaranteed to be minimal, a true underlying cluster structure may be more accurately recovered. Also, substantive interpretability seems generally improved when constrained solutions are considered. We illustrate the procedure with several data sets from the literature.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Al-Sultan, K. (1995). A tabu search approach to the clustering problem. Pattern Recognition, 28, 1443–1451.
Article Google Scholar
Babu, G.P., & Murty, M.N. (1994). Simulated annealing for selecting optimal initial seeds in the K-means algorithm. Indian Journal of Pure and Applied Mathematics, 25, 85–94.
Google Scholar
Baker, F.B., & Hubert, L.J. (1977). Applications of combinatorial programming to data analysis: Seriation using asymmetric proximity measures. British Journal of Mathematical and Statistical Psychology, 30, 154–164.
Google Scholar
Brusco, M.J., & Cradit, J.D. (2004). Graph coloring, mimimum-diameter partitioning, and the analysis of confusion matrices. Journal of Mathematical Psychology, 48, 301–309.
Article Google Scholar
Brusco, M.J., & Cradit, J.D. (2005). Bicriterion methods for partitioning dissimilarity matrices. British Journal of Mathematical and Statistical Psychology, 58, 319–332.
Article PubMed Google Scholar
Brusco, M.J., & Stahl, S. (2001a). An interactive multiobjective programming approach to combinatorial data analysis. Psychometrika, 66, 5–24.
Article Google Scholar
Brusco, M.J., & Stahl, S. (2001b). Compact integer-programming models for extracting subsets of stimuli from confusion matrices. Psychometrika, 66, 405–420.
Article Google Scholar
Brusco, M.J., & Stahl, S. (2005). Branch-and-bound applications in combinatorial data analysis. New York: Springer.
Google Scholar
Chang, W.C. (1983). On using principal components before separating a mixture of two multivariate normal distributions. Applied Statistics, 32, 267–275.
Article Google Scholar
Delattre, M., & Hansen, P. (1980). Bicriterion cluster analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 277–291.
Article Google Scholar
Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179–188.
Google Scholar
Fisher, W.D. (1958). On grouping for maximum heterogeneity. Journal of the American Statistical Association, 53, 789–798.
Article Google Scholar
Forgy, E.W. (1965). Cluster analysis of multivariate data: Efficiency versus interpretability of classifications. Biometrics, 21, 768–769.
Google Scholar
Hansen, P., & Mladenovic, N. (2001). J-Means: A new local search heuristic for minimum sum of squares clustering. Pattern Recognition, 34, 405–413.
Article Google Scholar
Hartigan, J. (1975). Clustering algorithms. New York: Wiley.
Google Scholar
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.
Article Google Scholar
Hubert, L., Arabie, P., & Meulman, J. (2001). Combinatorial data analysis: Optimization by dynamic programming. Philadelphia: SIAM.
Google Scholar
Hubert, L., Arabie, P., & Meulman, J. (2006). The structural representation of proximity matrices with MATLAB. Philadelphia: SIAM.
Google Scholar
Kiplinger’s personal finance. In Kiplinger’s personal finance (Vol. 57, pp. 104–123).
MacQueen, J. (1967). Some methods of classification and analysis of multivariate observations. In L.M. Le Cam & J. Neyman (Eds.), Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 281–297). University of California Press: Berkeley.
Google Scholar
Milligan, G.W. (1980). An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika, 45, 325–342.
Article Google Scholar
Milligan, G.W., & Cooper, M.C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159–179.
Article Google Scholar
Milligan, G.W., & Cooper, M.C. (1988). A study of standardization of variables in cluster analysis. Journal of Classification, 5, 181–204.
Article Google Scholar
Mirkin, B. (2005). Clustering for data mining. New York: Chapman & Hall.
Google Scholar
Pacheco, J., & Valencia, O. (2003). Design of hybrids for the minimum sum-of-squares clustering problem. Computational Statistics and Data Analysis, 43, 235–248.
Google Scholar
Späth, H. (1980). Cluster analysis algorithms. Chichester: Ellis Horwood.
Google Scholar
Steinley, D. (2003). K-means clustering: What you don’t know may hurt you. Psychological Methods, 8, 294–304.
Article PubMed Google Scholar
Steinley, D. (2004a). Standardizing variables in K-means clustering. In D. Banks, L. House, F.R. McMorris, P. Arabie, & W. Gaul (Eds.), Classification, clustering, and data mining applications (pp. 53–60). New York: Springer.
Google Scholar
Steinley, D. (2004b). Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9, 386–396.
Article PubMed Google Scholar
Steinley, D. (2006a). K-means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology, 59, 1–34.
Article PubMed Google Scholar
Steinley, D. (2006b). Profiling local optima in K-means clustering: Developing a diagnostic technique. Psychological Methods, 11, 178–192.
Article PubMed Google Scholar
Steinley, D., & Brusco, M.J. (2007). Initializing K-means batch clustering: A critical evaluation of several techniques. Journal of Classification, 24, 99–121.
Article Google Scholar
Steinley, D., & Brusco, M.J. (2008, in press). A new variable weighting and selection procedure for K-means cluster analysis. Multivariate Behavioral Research.
Steinley, D., & Henson, R. (2005). OCLUS: An analytic method for generating clusters with known overlap. Journal of Classification, 22, 221–250.
Article Google Scholar
Theise, E.S. (1989). Finding a subset of stimulus-response pairs with minimum total confusion: A binary integer programming approach. Human Factors, 31, 291–305.
Google Scholar
Thorndike, R.L. (1953). Who belongs in the family? Psychometrika, 18, 267–276.
Article Google Scholar
Van Ness, J.W. (1973). Admissible clustering procedures II. Biometrika, 60, 422–424.
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Psychological Sciences, University of Missouri-Columbia, 210 McAlester Hall, Columbia, MO, 65211, USA
Douglas Steinley
University of Illinois, Urbana-Champaign, Champaign, IL, USA
Lawrence Hubert

Authors

Douglas Steinley
View author publications
You can also search for this author in PubMed Google Scholar
Lawrence Hubert
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Douglas Steinley.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Steinley, D., Hubert, L. Order-Constrained Solutions in K-Means Clustering: Even Better Than Being Globally Optimal. Psychometrika 73, 647–664 (2008). https://doi.org/10.1007/s11336-008-9058-z

Download citation

Received: 12 December 2006
Revised: 02 March 2007
Published: 14 February 2008
Issue Date: December 2008
DOI: https://doi.org/10.1007/s11336-008-9058-z

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Order-Constrained Solutions in K-Means Clustering: Even Better Than Being Globally Optimal

Abstract

Access this article

Similar content being viewed by others

A Comprehensive Survey of Clustering Algorithms

Density-Based Clustering Based on Hierarchical Density Estimates

Data clustering: application and trends

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Order-Constrained Solutions in K-Means Clustering: Even Better Than Being Globally Optimal

Abstract

Access this article

Similar content being viewed by others

A Comprehensive Survey of Clustering Algorithms

Density-Based Clustering Based on Hierarchical Density Estimates

Data clustering: application and trends

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation