An algorithm for generating artificial test clusters

Milligan, Glenn W.

doi:10.1007/BF02294153

Computational Psychometrics
Published: March 1985

Psychometrika, Volume 50, pages 123–127 (1985)

Abstract

An algorithm for generating artificial data sets which contain distinct nonoverlapping clusters is presented. The algorithm is useful for generating test data sets for Monte Carlo validation research conducted on clustering methods or statistics. The algorithm generates data sets which contain either 1, 2, 3, 4, or 5 clusters. By default, the data are embedded in either a 4, 6, or 8 dimensional space. Three different patterns for assigning the points to the clusters are provided. One pattern assigns the points equally to the clusters while the remaining two schemes produce clusters of unequal sizes. Finally, a number of methods for introducing error in the data have been incorporated in the algorithm.
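The full procedure is behind the paywall, so the following is only a minimal Python sketch of the kind of generator the abstract describes, not the published algorithm. Everything specific in it is an assumption made for illustration: the function names, the pattern names `small_first` and `large_first`, the 10% and 60% shares used for the unequal patterns, the truncated normal sampling, and the choice to guarantee separation along the first dimension only. The error-perturbation methods mentioned in the abstract are not modelled.

```python
import random

# Hypothetical share of points given to the first cluster in the two
# unequal-size patterns (the abstract does not state the actual values).
UNEQUAL_SHARES = {"small_first": 0.10, "large_first": 0.60}


def split_evenly(total, parts):
    """Split `total` into `parts` integers that differ by at most one."""
    base, extra = divmod(total, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]


def cluster_sizes(n_points, n_clusters, pattern="equal"):
    """Return the number of points per cluster for one of three patterns."""
    if pattern == "equal" or n_clusters == 1:
        return split_evenly(n_points, n_clusters)
    first = round(n_points * UNEQUAL_SHARES[pattern])
    return [first] + split_evenly(n_points - first, n_clusters - 1)


def truncated_gauss(rng, mean, sd, limit):
    """Draw from a normal distribution, rejecting values beyond limit * sd."""
    while True:
        x = rng.gauss(mean, sd)
        if abs(x - mean) <= limit * sd:
            return x


def generate_clusters(n_points=50, n_clusters=3, n_dims=4,
                      pattern="equal", gap=1.0, seed=0):
    """Generate labelled points whose clusters cannot overlap in dimension 0."""
    rng = random.Random(seed)
    sd, limit = 1.0, 1.5
    half_width = sd * limit  # largest distance of any point from its centre
    points, labels = [], []
    for k, size in enumerate(cluster_sizes(n_points, n_clusters, pattern)):
        # Centres are spaced along dimension 0 so that the truncated ranges
        # are separated by `gap`; the other coordinates are placed at random.
        center = [k * (2 * half_width + gap)]
        center += [rng.uniform(-5.0, 5.0) for _ in range(n_dims - 1)]
        for _ in range(size):
            points.append([truncated_gauss(rng, c, sd, limit) for c in center])
            labels.append(k)
    return points, labels
```

Calling `generate_clusters(50, 3, 4, "small_first")` returns 50 four-dimensional points with true cluster labels, which can then be passed to a clustering method and compared against its recovered partition. Truncating each cluster's spread and spacing the centres by more than twice the truncation limit is one simple way to make the clusters provably nonoverlapping.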

Author information

Authors and Affiliations

Faculty of Management Sciences, The Ohio State University, 301 Hagerty Hall, Columbus, OH 43210
Glenn W. Milligan

About this article

Cite this article

Milligan, G.W. An algorithm for generating artificial test clusters. Psychometrika 50, 123–127 (1985). https://doi.org/10.1007/BF02294153

Issue Date: March 1985
DOI: https://doi.org/10.1007/BF02294153
