Parsimonious Gaussian mixture models

McNicholas, Paul David; Murphy, Thomas Brendan

doi:10.1007/s11222-008-9056-0

Published: 19 April 2008

Statistics and Computing, Volume 18, pages 285–296 (2008)

Abstract

Parsimonious Gaussian mixture models are developed using a latent Gaussian model that is closely related to the factor analysis model. These models provide a unified modeling framework that includes the mixtures of probabilistic principal component analyzers and mixtures of factor analyzers models as special cases.
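As a minimal sketch of the latent Gaussian structure mentioned above, assume the standard factor analysis form, in which a p-dimensional observation is generated from q latent factors and its covariance is the loading matrix times its transpose plus a diagonal noise matrix. The abstract does not state this formula; the function name `factor_covariance` and all numerical values below are illustrative assumptions. Setting every noise variance to the same value gives the isotropic case associated with probabilistic principal component analyzers.

```python
import numpy as np


def factor_covariance(loadings, noise_variances):
    """Covariance implied by x = mu + loadings @ u + e,
    with u ~ N(0, I_q) and e ~ N(0, diag(noise_variances))."""
    loadings = np.asarray(loadings, dtype=float)
    psi = np.diag(np.asarray(noise_variances, dtype=float))
    return loadings @ loadings.T + psi


# Hypothetical sizes: p observed variables, q latent factors.
rng = np.random.default_rng(0)
p, q = 5, 2
loadings = rng.normal(size=(p, q))

# Factor analysis: one noise variance per observed variable.
fa_cov = factor_covariance(loadings, rng.uniform(0.5, 1.5, size=p))

# Isotropic special case: a single shared noise variance.
ppca_cov = factor_covariance(loadings, np.full(p, 0.7))
```

Both matrices are symmetric and positive definite, but each is described by far fewer free parameters than an unrestricted p × p covariance when q is small relative to p.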

In particular, a class of eight parsimonious Gaussian mixture models based on the mixtures of factor analyzers model is introduced, and the maximum likelihood estimates for the parameters in these models are found using an AECM algorithm. The class includes parsimonious models that have not previously been developed.
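One way to see how a class of eight models can arise is to count covariance parameters under three yes/no constraints. The abstract does not list the constraints, so the sketch below assumes them: loadings shared across the G components or not, noise matrix shared across components or not, and noise matrix isotropic or not. It also assumes the usual factor analysis count of pq − q(q − 1)/2 free loading parameters. The function name, argument names and example sizes are hypothetical.

```python
from itertools import product


def covariance_parameter_count(p, q, G, shared_loadings, shared_noise, isotropic_noise):
    """Free covariance parameters for one combination of the three assumed constraints."""
    # Free parameters in a single p x q loading matrix (rotation removed).
    loading_block = p * q - q * (q - 1) // 2
    n_loadings = loading_block if shared_loadings else G * loading_block
    # A single noise matrix needs 1 parameter if isotropic, otherwise p.
    noise_block = 1 if isotropic_noise else p
    n_noise = noise_block if shared_noise else G * noise_block
    return n_loadings + n_noise


# Hypothetical sizes: 10 variables, 2 factors, 3 mixture components.
p, q, G = 10, 2, 3
counts = {
    flags: covariance_parameter_count(p, q, G, *flags)
    for flags in product([True, False], repeat=3)
}
for flags, n in counts.items():
    print(flags, n)
```

Three binary choices give 2³ = 8 combinations, ranging from the most constrained (everything shared and isotropic) to the unconstrained mixture of factor analyzers.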

These models are applied to the analysis of chemical and physical properties of Italian wines and the chemical properties of coffee; the models are shown to give excellent clustering performance.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of Guelph, Guelph, Ontario, Canada, N1G 2W1
Paul David McNicholas
School of Mathematical Sciences, University College Dublin, Belfield, Dublin 4, Ireland
Thomas Brendan Murphy

Corresponding author

Correspondence to Thomas Brendan Murphy.

About this article

Cite this article

McNicholas, P.D., Murphy, T.B. Parsimonious Gaussian mixture models. Stat Comput 18, 285–296 (2008). https://doi.org/10.1007/s11222-008-9056-0

Received: 28 January 2007
Accepted: 20 February 2008
Published: 19 April 2008
Issue Date: September 2008
DOI: https://doi.org/10.1007/s11222-008-9056-0
