Weighting and selection of variables for cluster analysis

Gnanadesikan, R.; Kettenring, J. R.; Tsao, S. L.

doi:10.1007/BF01202271


Published: March 1995

Journal of Classification, Volume 12, pages 113–136 (1995)

Abstract

One of the thorniest aspects of cluster analysis continues to be the weighting and selection of variables. This paper reports on the performance of nine methods on eight “leading case” simulated and real sets of data. The results demonstrate shortcomings of weighting based on the standard deviation or range as well as other more complex schemes in the literature. Weighting schemes based upon carefully chosen estimates of within-cluster and between-cluster variability are generally more effective. These estimates do not require knowledge of the cluster structure. Additional research is essential: worry-free approaches do not yet exist.
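The abstract contrasts weighting variables by their standard deviation or range with weighting based on within-cluster and between-cluster variability. The Python/NumPy sketch below was written for this summary and is not code from the paper. It uses a small hypothetical data set in which only the first variable carries cluster structure. It applies SD and range weighting and compares them with an "oracle" weighting by the pooled within-cluster SD. The oracle needs the true cluster labels, which the paper's estimates explicitly do not, so it only illustrates why within-cluster spread is a sensible yardstick. All function names, including the `separation_to_noise` diagnostic, are hypothetical.

```python
import numpy as np

def scale_by_sd(X):
    """Divide each variable (column) by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return X / X.std(axis=0, ddof=1)

def scale_by_range(X):
    """Divide each variable (column) by its range, max minus min."""
    X = np.asarray(X, dtype=float)
    return X / (X.max(axis=0) - X.min(axis=0))

def scale_by_pooled_within_sd(X, labels):
    """Oracle weighting: divide each variable by its pooled within-cluster SD.
    Needs the true labels, so it is NOT the label-free estimate from the paper."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ss = np.zeros(X.shape[1])  # within-cluster sums of squares per variable
    dof = 0                    # pooled degrees of freedom: n minus number of clusters
    for g in np.unique(labels):
        Xg = X[labels == g]
        ss += ((Xg - Xg.mean(axis=0)) ** 2).sum(axis=0)
        dof += len(Xg) - 1
    return X / np.sqrt(ss / dof)

def separation_to_noise(X, labels):
    """Hypothetical diagnostic: gap between the two cluster means on column 0,
    divided by the standard deviation of the noise variable in column 1."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    gap = abs(X[labels == 0, 0].mean() - X[labels == 1, 0].mean())
    return gap / X[:, 1].std(ddof=1)

# Hypothetical data: column 0 splits into two well-separated groups,
# column 1 is noise with the same within-cluster spread.
X = np.array([[-6, -1], [-5, 0], [-4, 1],
              [ 4, -1], [ 5, 0], [ 6, 1]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])

for name, Xw in [("raw", X),
                 ("sd", scale_by_sd(X)),
                 ("range", scale_by_range(X)),
                 ("within (oracle)", scale_by_pooled_within_sd(X, labels))]:
    print(f"{name:>16}: {separation_to_noise(Xw, labels):.2f}")
# Ratios printed: raw 11.18, sd 1.80, range 1.86, within (oracle) 11.18
```

In this toy case, SD and range weighting both cut the clustered variable's advantage from about 11 to under 2. The gap between the clusters inflates that variable's overall spread, so dividing by it penalizes the very variable that separates the groups. Weighting by within-cluster spread leaves the structure intact. The paper's contribution is to estimate such within-cluster variability without knowing the clusters, and this sketch does not attempt that.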



Author information

Authors and Affiliations

Department of Statistics, Rutgers University, New Brunswick, NJ 08903
R. Gnanadesikan
Bellcore, 445 South Street, Morristown, NJ 07960
R. Gnanadesikan, J. R. Kettenring & S. L. Tsao
AT&T Bell Laboratories, 101 Crawford's Corner Road, Holmdel, NJ 07733
S. L. Tsao

Cite this article

Gnanadesikan, R., Kettenring, J.R. & Tsao, S.L. Weighting and selection of variables for cluster analysis. Journal of Classification 12, 113–136 (1995). https://doi.org/10.1007/BF01202271
