A preliminary study of optimal variable weighting in k-means clustering

Green, Paul E.; Kim, Jonathan; Carmone, Frank J.

doi:10.1007/BF01908720

Published: September 1990

Volume 7, pages 271–285 (1990)

Journal of Classification

Paul E. Green¹,
Jonathan Kim¹ &
Frank J. Carmone²

Abstract

Recently, algorithms for optimally weighting variables in non-hierarchical and hierarchical clustering methods have been proposed. Preliminary Monte Carlo research has shown that at least one of these algorithms cross-validates extremely well.

The present study applies an optimal variable weighting procedure for k-means clustering to two empirical data sets and compares its cross-validation performance with that of unit (i.e., equal) weighting of the variables. We find that the optimal weighting procedure cross-validates better in one of the two data sets. In the second data set, its comparative performance depends strongly on the approach used to find seed values for the initial k-means partition.
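To make the comparison concrete, the Python sketch below runs Lloyd-style k-means under a weighted squared Euclidean distance, d_w(x, c) = Σ_j w_j (x_j − c_j)², where unit weighting sets every w_j = 1. It is an illustration under stated assumptions, not the authors' procedure: the weights are fixed by hand (the hypothetical w = [1, 0.01] down-weights a noisy second variable) rather than estimated optimally, the six-point data set is invented, and agreement with the known grouping is scored with Hubert and Arabie's adjusted Rand index, which the abstract does not name as the study's criterion. All function names are hypothetical.

```python
from collections import Counter
from math import comb


def weighted_sq_dist(x, c, w):
    """Weighted squared Euclidean distance: sum_j w_j * (x_j - c_j)**2."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def weighted_kmeans(data, seeds, w, max_iter=100):
    """Lloyd-style k-means under a FIXED weight vector w, started from the given seeds.

    Sketch only: the weights are supplied by the caller, not estimated optimally.
    """
    centroids = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest centroid under d_w.
        new_labels = [min(range(len(centroids)),
                          key=lambda k: weighted_sq_dist(x, centroids[k], w))
                      for x in data]
        if new_labels == labels:
            break  # assignments unchanged, so the partition has converged
        labels = new_labels
        # Update step: with positive weights, the plain mean still minimizes d_w.
        for k in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def adjusted_rand_index(a, b):
    """Hubert and Arabie's adjusted Rand index between two labelings of the same objects."""
    n_pairs = comb(len(a), 2)
    sum_cells = sum(comb(v, 2) for v in Counter(zip(a, b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(a).values())
    sum_b = sum(comb(v, 2) for v in Counter(b).values())
    expected = sum_a * sum_b / n_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are all-one-cluster or all-singletons
    return (sum_cells - expected) / (max_index - expected)


# Invented data: variable 1 separates the groups; variable 2 is noise with a larger spread.
data = [[0.0, 0.0], [1.0, 9.0], [0.0, 20.0],   # group A (variable 1 near 0)
        [6.0, 0.0], [5.0, 9.0], [6.0, 20.0]]   # group B (variable 1 near 6)
truth = [0, 0, 0, 1, 1, 1]
poor_seeds = [data[1], data[5]]  # seeds (1, 9) and (6, 20)
good_seeds = [data[0], data[3]]  # seeds (0, 0) and (6, 0)

unit = weighted_kmeans(data, poor_seeds, [1.0, 1.0])         # unit weighting
reweighted = weighted_kmeans(data, poor_seeds, [1.0, 0.01])  # hypothetical weights
unit_good = weighted_kmeans(data, good_seeds, [1.0, 1.0])

print(unit, adjusted_rand_index(unit, truth))              # [0, 0, 1, 0, 0, 1], negative ARI
print(reweighted, adjusted_rand_index(reweighted, truth))  # [0, 0, 0, 1, 1, 1] 1.0
print(unit_good, adjusted_rand_index(unit_good, truth))    # [0, 0, 0, 1, 1, 1] 1.0
```

From the same poor seeds, unit weighting settles on a partition driven by the noisy second variable, while the down-weighted run recovers the two groups. Starting unit weighting from better seeds also recovers the groups, which mirrors the abstract's observation that the comparison can hinge on how the initial seeds are chosen.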

Similar content being viewed by others

A Survey on Feature Weighting Based K-Means Algorithms

Article 01 July 2016

Renato Cordeiro de Amorim

Hierarchical Means Clustering

Article Open access 23 September 2022

Maurizio Vichi, Carlo Cavicchia & Patrick J. F. Groenen

Variable Selection in K-Means Clustering via Regularization

Author information

Authors and Affiliations

University of Pennsylvania, Suite 1400 Steinberg Hall-Dietrich Hall, 19104, Philadelphia, PA
Paul E. Green & Jonathan Kim
Marketing Department, Drexel University, 19104, Philadelphia, PA, USA
Frank J. Carmone

Additional information

The authors would like to acknowledge the support of the Citibank Fellowship from the Sol C. Snider Entrepreneurial Center at the Wharton School. The authors would like to express their appreciation to J. Douglas Carroll and Abba M. Krieger for comments on an earlier version of the paper.

About this article

Cite this article

Green, P.E., Kim, J. & Carmone, F.J. A preliminary study of optimal variable weighting in k-means clustering. Journal of Classification 7, 271–285 (1990). https://doi.org/10.1007/BF01908720

Issue Date: September 1990
DOI: https://doi.org/10.1007/BF01908720