
An algorithm for generating artificial test clusters

  • Computational Psychometrics
Psychometrika

Abstract

An algorithm for generating artificial data sets that contain distinct, nonoverlapping clusters is presented. The algorithm is useful for generating test data sets for Monte Carlo validation research on clustering methods or statistics. It generates data sets that contain 1, 2, 3, 4, or 5 clusters. By default, the data are embedded in a 4-, 6-, or 8-dimensional space. Three patterns for assigning the points to the clusters are provided: one assigns the points equally to the clusters, while the other two produce clusters of unequal sizes. Finally, several methods for introducing error into the data have been incorporated into the algorithm.
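The kind of generator the abstract describes can be illustrated with a minimal Python sketch. It is not the published procedure. Everything specific here is an assumption made for illustration: the function names, the uniform boxes used to keep clusters apart, the 10% and 60% shares in the two unequal-size patterns, and the use of added noise dimensions as an example of introducing error.

```python
import random


def cluster_sizes(n_points, n_clusters, pattern="equal"):
    """Split n_points among n_clusters (requires n_points >= n_clusters).

    "equal" spreads points as evenly as possible. The two unequal
    patterns are hypothetical: the first cluster gets a fixed share
    and the remaining points are split evenly among the others.
    """
    if pattern == "equal" or n_clusters == 1:
        base, extra = divmod(n_points, n_clusters)
        return [base + (1 if i < extra else 0) for i in range(n_clusters)]
    share = {"one_small": 0.10, "one_large": 0.60}[pattern]  # assumed shares
    first = max(1, round(n_points * share))
    first = min(first, n_points - (n_clusters - 1))  # leave >= 1 point per other cluster
    return [first] + cluster_sizes(n_points - first, n_clusters - 1, "equal")


def generate_clusters(n_points=50, n_clusters=3, n_dims=4, pattern="equal",
                      half_width=1.0, gap=1.0, seed=None):
    """Return (points, labels) with clusters that cannot overlap.

    Assumption: each cluster is a uniform box of side 2 * half_width.
    Centers step along the first coordinate by the box width plus a
    positive gap, so boxes are always separated on that coordinate.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for k, size in enumerate(cluster_sizes(n_points, n_clusters, pattern)):
        center = [k * (2 * half_width + gap)]
        center += [rng.uniform(-5.0, 5.0) for _ in range(n_dims - 1)]
        for _ in range(size):
            points.append([c + rng.uniform(-half_width, half_width) for c in center])
            labels.append(k)
    return points, labels


def add_noise_dimensions(points, n_noise, seed=None):
    """Append n_noise uniform random coordinates that carry no cluster information."""
    rng = random.Random(seed)
    return [p + [rng.uniform(-5.0, 5.0) for _ in range(n_noise)] for p in points]


if __name__ == "__main__":
    pts, labs = generate_clusters(n_points=50, n_clusters=5, n_dims=6,
                                  pattern="one_large", seed=0)
    print([labs.count(k) for k in range(5)])
```

Separating the clusters on a single coordinate keeps the nonoverlap guarantee easy to check. The returned labels give the true partition against which a clustering method's output can be scored in a Monte Carlo study.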




Cite this article

Milligan, G.W. An algorithm for generating artificial test clusters. Psychometrika 50, 123–127 (1985). https://doi.org/10.1007/BF02294153
