An Experimental Comparison of Model-Based Clustering Methods

Meilă, Marina; Heckerman, David

doi:10.1023/A:1007648401407

Published: January 2001

Machine Learning, Volume 42, pages 9–29 (2001)

Abstract

We compare the three basic algorithms for model-based clustering on high-dimensional discrete-variable datasets. All three algorithms use the same underlying model: a naive-Bayes model with a hidden root node, also known as a multinomial-mixture model. In the first part of the paper, we perform an experimental comparison between three batch algorithms that learn the parameters of this model: the Expectation–Maximization (EM) algorithm, a “winner take all” version of the EM algorithm reminiscent of the K-means algorithm, and model-based agglomerative clustering. We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization methods on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of agglomerative clustering. Although the methods are substantially different, they lead to learned models that are similar in quality.
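To make the model in the abstract concrete, here is a minimal sketch of two of the batch algorithms it names, applied to a multinomial-mixture (naive-Bayes with hidden root) model: standard EM and a "winner take all" variant that replaces soft posteriors with hard assignments. This is an illustration under stated assumptions, not the authors' implementation. The function names, the smoothing pseudo-count `alpha`, the fixed iteration count, and the choice of a uniform Dirichlet for the "uninformative prior" initialization are all assumptions of this sketch. Agglomerative clustering and the other two initialization schemes are not shown.

```python
import math
import random


def dirichlet_uniform(rng, n):
    """Sample from Dirichlet(1, ..., 1) by normalizing Gamma(1, 1) draws."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def posterior(x, weights, theta):
    """Return (cluster posteriors for record x, log-likelihood of x)."""
    log_joint = [
        math.log(w) + sum(math.log(theta_k[j][v]) for j, v in enumerate(x))
        for w, theta_k in zip(weights, theta)
    ]
    top = max(log_joint)  # log-sum-exp trick for numerical stability
    log_marginal = top + math.log(sum(math.exp(l - top) for l in log_joint))
    return [math.exp(l - log_marginal) for l in log_joint], log_marginal


def m_step(data, resp, n_clusters, n_values, alpha):
    """Re-estimate parameters from soft or hard assignments, smoothed by alpha."""
    n_vars = len(data[0])
    mass = [sum(r[k] for r in resp) for k in range(n_clusters)]
    weights = [(mass[k] + alpha) / (len(data) + alpha * n_clusters)
               for k in range(n_clusters)]
    theta = []
    for k in range(n_clusters):
        counts = [[alpha] * n_values for _ in range(n_vars)]
        for x, r in zip(data, resp):
            for j, v in enumerate(x):
                counts[j][v] += r[k]
        theta.append([[c / (mass[k] + alpha * n_values) for c in row]
                      for row in counts])
    return weights, theta


def fit_mixture(data, n_clusters, n_values, hard=False,
                n_iter=50, seed=0, alpha=1.0):
    """Fit the mixture by EM (hard=False) or winner-take-all EM (hard=True).

    data: list of equal-length tuples of ints in range(n_values).
    Returns (mixing weights, theta[cluster][variable][value], log-likelihood).
    """
    rng = random.Random(seed)
    n_vars = len(data[0])
    # Assumed initialization: parameters drawn from a uniform Dirichlet prior.
    weights = dirichlet_uniform(rng, n_clusters)
    theta = [[dirichlet_uniform(rng, n_values) for _ in range(n_vars)]
             for _ in range(n_clusters)]
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each record.
        resp = [posterior(x, weights, theta)[0] for x in data]
        if hard:
            # Winner take all: give each record entirely to its best cluster.
            resp = [[1.0 if k == r.index(max(r)) else 0.0
                     for k in range(n_clusters)] for r in resp]
        weights, theta = m_step(data, resp, n_clusters, n_values, alpha)
    log_lik = sum(posterior(x, weights, theta)[1] for x in data)
    return weights, theta, log_lik


# Toy usage: two well-separated groups of binary records.
data = [(0, 0, 0, 0)] * 20 + [(1, 1, 1, 1)] * 20
_, _, ll_soft = fit_mixture(data, n_clusters=2, n_values=2)
_, _, ll_hard = fit_mixture(data, n_clusters=2, n_values=2, hard=True)
print(ll_soft, ll_hard)
```

The two variants share the same M-step and differ only in how the E-step output is used, which makes it straightforward to compare their final log-likelihoods on the same data and seed. The toy data here is far smaller and lower-dimensional than the datasets the paper studies, so it should not be read as reproducing any of the reported results.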


Author information

Authors and Affiliations

Microsoft Research, Redmond, WA, 98052, USA
Marina Meilă & David Heckerman

About this article

Cite this article

Meilă, M., Heckerman, D. An Experimental Comparison of Model-Based Clustering Methods. Machine Learning 42, 9–29 (2001). https://doi.org/10.1023/A:1007648401407

Issue Date: January 2001
DOI: https://doi.org/10.1023/A:1007648401407
