Abstract
Recently, algorithms for optimally weighting variables in non-hierarchical and hierarchical clustering methods have been proposed. Preliminary Monte Carlo research has shown that at least one of these algorithms cross-validates extremely well.
The present study applies an optimal variable-weighting procedure for k-means clustering to two empirical data sets and contrasts its cross-validation performance with that of unit (i.e., equal) weighting of the variables. We find that the optimal weighting procedure cross-validates better in one of the two data sets. In the second data set, its comparative performance depends strongly on the approach used to find seed values for the initial k-means partitioning.
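The comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure: the weights are simply supplied by the caller (no optimal-weighting algorithm is implemented), the seeds are random data points, and the split-half, nearest-centroid cross-validation scored with the adjusted Rand index is one common scheme assumed here for illustration. The function names `kmeans`, `assign`, `adjusted_rand`, and `cross_validate` are hypothetical.

```python
import numpy as np

def assign(Z, centers):
    """Label each row of Z with the index of its nearest center."""
    d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def kmeans(X, k, weights=None, n_iter=100, seed=0):
    """Lloyd-style k-means; variable weights enter by rescaling columns.

    Weighted squared Euclidean distance with weights w equals ordinary
    squared distance after multiplying each column by sqrt(w).
    Seeds are k randomly chosen rows (an assumption, not the paper's method).
    """
    rng = np.random.default_rng(seed)
    w = np.ones(X.shape[1]) if weights is None else np.asarray(weights, float)
    Z = X * np.sqrt(w)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(Z, centers)
        # Keep the old center if a cluster happens to be empty.
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(Z, centers), centers  # centers live in the rescaled space

def adjusted_rand(a, b):
    """Adjusted Rand index between two labelings of the same objects."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)
    pairs = lambda x: x * (x - 1) / 2.0
    sum_ij = pairs(table).sum()
    sum_a, sum_b = pairs(table.sum(axis=1)).sum(), pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(len(ai))
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def cross_validate(X, k, weights=None, seed=0):
    """Split-half check: cluster each half, then compare the holdout's own
    partition with its nearest-centroid assignment to the other half's centers."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    A, B = X[idx[:half]], X[idx[half:]]
    w = np.ones(X.shape[1]) if weights is None else np.asarray(weights, float)
    _, centers_A = kmeans(A, k, weights, seed=seed)
    labels_B, _ = kmeans(B, k, weights, seed=seed)
    predicted_B = assign(B * np.sqrt(w), centers_A)
    return adjusted_rand(labels_B, predicted_B)
```

Calling `cross_validate(X, k)` gives the unit-weighting score and `cross_validate(X, k, weights=w)` the score for a given weight vector `w`; varying `seed` changes both the split and the initial seed points, which is one way to see how sensitive the comparison is to the starting partition.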
The authors acknowledge the support of the Citibank Fellowship from the Sol C. Snider Entrepreneurial Center at the Wharton School, and thank J. Douglas Carroll and Abba M. Krieger for comments on an earlier version of the paper.
Green, P.E., Kim, J. & Carmone, F.J. A preliminary study of optimal variable weighting in k-means clustering. Journal of Classification 7, 271–285 (1990). https://doi.org/10.1007/BF01908720