A preliminary study of optimal variable weighting in k-means clustering

Journal of Classification

Abstract

Recently, algorithms for optimally weighting variables in non-hierarchical and hierarchical clustering methods have been proposed. Preliminary Monte Carlo research has shown that at least one of these algorithms cross-validates extremely well.

The present study applies a k-means optimal weighting procedure to two empirical data sets and contrasts its cross-validation performance with that of unit (i.e., equal) weighting of the variables. We find that the optimal weighting procedure cross-validates better in one of the two data sets. In the second data set, its comparative performance depends strongly on the approach used to find seed values for the initial k-means partitioning.
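
The comparison described above can be illustrated with a minimal sketch. It is not the procedure used in the study: the variable weights are supplied by hand rather than estimated by an optimal weighting algorithm, the split-sample nearest-centroid scheme and the unadjusted Rand index are assumed as one plausible way to score cross-validation, and the toy data and all function names are hypothetical. Seeds are passed in explicitly because the abstract reports that results can depend on how they are chosen.

```python
# Illustrative sketch only: weights are fixed by hand, not optimized.

def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one weight per variable."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))

def assign(points, centroids, w):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda k: weighted_sq_dist(p, centroids[k], w))
            for p in points]

def kmeans(points, k, w, seeds, n_iter=50):
    """Plain k-means under weighted distance, started from explicit seeds."""
    centroids = [list(s) for s in seeds]
    labels = assign(points, centroids, w)
    for _ in range(n_iter):
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(points, centroids, w)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels

def rand_index(a, b):
    """Share of point pairs on which two partitions agree (unadjusted)."""
    n = len(a)
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i in range(n) for j in range(i + 1, n))
    return agree / (n * (n - 1) / 2)

def cross_validate(half_a, half_b, k, w, seeds_a, seeds_b):
    """Cluster each half, then compare B's own partition with the one
    obtained by assigning B's points to A's centroids."""
    cents_a, _ = kmeans(half_a, k, w, seeds_a)
    _, own_b = kmeans(half_b, k, w, seeds_b)
    predicted_b = assign(half_b, cents_a, w)
    return rand_index(own_b, predicted_b)

# Toy data: variable 0 separates two groups, variable 1 is large-scale noise.
half_a = [(0.0, 5.0), (10.0, -40.0), (1.0, -30.0),
          (9.0, 20.0), (0.5, 35.0), (10.5, -10.0)]
half_b = [(0.2, -25.0), (9.8, 30.0), (1.2, 40.0),
          (10.2, -35.0), (0.8, 10.0), (9.5, -5.0)]

# Seeds: the first two points of each half (one assumption among many).
unit_score = cross_validate(half_a, half_b, 2, (1.0, 1.0),
                            half_a[:2], half_b[:2])
weighted_score = cross_validate(half_a, half_b, 2, (1.0, 0.0),
                                half_a[:2], half_b[:2])
print("unit weights:", unit_score)
print("hand-set weights:", weighted_score)
```

Rerunning `cross_validate` with different seed points is a simple way to see how sensitive either weighting scheme is to the initial partition.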

Additional information

The authors acknowledge the support of the Citibank Fellowship from the Sol C. Snider Entrepreneurial Center at the Wharton School, and thank J. Douglas Carroll and Abba M. Krieger for comments on an earlier version of the paper.

Cite this article

Green, P.E., Kim, J. & Carmone, F.J. A preliminary study of optimal variable weighting in k-means clustering. Journal of Classification 7, 271–285 (1990). https://doi.org/10.1007/BF01908720
