
Weighting and selection of variables for cluster analysis

Journal of Classification

Abstract

One of the thorniest aspects of cluster analysis continues to be the weighting and selection of variables. This paper reports on the performance of nine methods on eight “leading case” simulated and real sets of data. The results demonstrate shortcomings of weighting based on the standard deviation or range as well as other more complex schemes in the literature. Weighting schemes based upon carefully chosen estimates of within-cluster and between-cluster variability are generally more effective. These estimates do not require knowledge of the cluster structure. Additional research is essential: worry-free approaches do not yet exist.
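
The abstract's concern about weighting by the standard deviation or range can be illustrated with a small sketch in Python with NumPy. Everything in it is an assumption for illustration and not taken from the paper: the synthetic two-group data, the function names, and the "gap-to-noise" measure (the distance between the group means on an informative variable, divided by the spread of a pure-noise variable). The known labels are used only to measure the effect, not to compute the weights. Because the overall spread of an informative variable includes the between-cluster separation, dividing by that spread shrinks the separation relative to the noise.

```python
import numpy as np

def weight_by_std(X):
    """Divide each column by its overall (not within-cluster) standard deviation."""
    return X / X.std(axis=0, ddof=1)

def weight_by_range(X):
    """Divide each column by its overall range (max minus min)."""
    return X / (X.max(axis=0) - X.min(axis=0))

def gap_to_noise(X, labels, signal=0, noise=1):
    """Gap between the two group means on the signal column,
    divided by the standard deviation of the noise column."""
    gap = abs(X[labels == 0, signal].mean() - X[labels == 1, signal].mean())
    return gap / X[:, noise].std(ddof=1)

# Hypothetical data: column 0 separates two groups, column 1 is pure noise.
rng = np.random.default_rng(0)
n = 100
labels = np.repeat([0, 1], n)
signal = np.concatenate([rng.normal(0, 1, n), rng.normal(10, 1, n)])
noise = rng.normal(0, 1, 2 * n)
X = np.column_stack([signal, noise])

raw = gap_to_noise(X, labels)
by_std = gap_to_noise(weight_by_std(X), labels)
by_range = gap_to_noise(weight_by_range(X), labels)

print("raw:", raw)
print("std-weighted:", by_std)
print("range-weighted:", by_range)
```

In this setup both overall-spread weightings reduce the advantage of the informative variable over the noise variable. For two equal-sized groups the standard-deviation-weighted gap cannot exceed 2, however far apart the groups are. A weight based on an estimate of within-cluster spread would not be inflated in this way, which matches the direction the abstract points to.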




Cite this article

Gnanadesikan, R., Kettenring, J.R. & Tsao, S.L. Weighting and selection of variables for cluster analysis. Journal of Classification 12, 113–136 (1995). https://doi.org/10.1007/BF01202271


  • DOI: https://doi.org/10.1007/BF01202271
