Dimension reduction for model-based clustering

Scrucca, Luca

doi:10.1007/s11222-009-9138-7

Published: 01 July 2009

Volume 20, pages 471–484 (2010)

Statistics and Computing

Abstract

We introduce a dimension reduction method for visualizing the clustering structure obtained from a finite mixture of Gaussian densities. Information on the dimension reduction subspace is obtained from the variation in group means and, depending on the estimated mixture model, from the variation in group covariances. The proposed method aims to reduce dimensionality by identifying a set of linear combinations of the original features, ordered by importance as quantified by the associated eigenvalues, that capture most of the cluster structure contained in the data. Observations may then be projected onto this reduced subspace, providing summary plots that help to visualize the clustering structure. These plots can be particularly appealing for high-dimensional data and noisy structure. The newly constructed variables capture most of the clustering information available in the data, and they can be further reduced to improve clustering performance. We illustrate the approach on both simulated and real data sets.
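The abstract describes the construction only in words. The sketch below is one way to express it in code, under stated assumptions: it takes already-estimated mixture parameters (mixing proportions, component means and covariances, for example from an EM fit) and builds a kernel matrix that adds between-group mean variation to the spread of the component covariances around their weighted average. Directions are then taken from a generalized eigen-decomposition against the marginal covariance of the mixture and ordered by eigenvalue. The kernel form, the function names `gmm_dr_directions` and `project`, and the use of NumPy are assumptions made for illustration, not the paper's stated estimator or any package's API.

```python
import numpy as np


def gmm_dr_directions(pro, means, covs):
    """Hypothetical sketch: dimension-reduction basis from fitted GMM parameters.

    pro:   (G,) mixing proportions summing to 1
    means: (G, p) component means
    covs:  (G, p, p) component covariance matrices

    ASSUMPTION: the kernel (mean variation plus covariance heterogeneity)
    is an illustrative construction, not copied from the paper.
    """
    pro = np.asarray(pro, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)

    mu = pro @ means                              # overall mixture mean
    dev = means - mu                              # deviations of group means
    sigma_b = (pro[:, None] * dev).T @ dev        # between-group covariance
    sigma_w = np.einsum("g,gij->ij", pro, covs)   # weighted average of component covariances
    sigma = sigma_w + sigma_b                     # marginal covariance of the mixture
    sigma_inv = np.linalg.inv(sigma)

    # Mean part of the kernel.
    kernel = sigma_b @ sigma_inv @ sigma_b
    # Covariance part: zero when all component covariances are equal.
    for g in range(len(pro)):
        d = covs[g] - sigma_w
        kernel += pro[g] * d @ sigma_inv @ d

    # Generalized eigenproblem kernel v = lambda * sigma v, reduced to a
    # symmetric one through the Cholesky factor sigma = L L^T.
    L = np.linalg.cholesky(sigma)
    L_inv = np.linalg.inv(L)
    A = L_inv @ kernel @ L_inv.T
    A = (A + A.T) / 2                             # remove round-off asymmetry
    evals, U = np.linalg.eigh(A)
    order = np.argsort(evals)[::-1]               # largest eigenvalue first
    basis = L_inv.T @ U[:, order]                 # columns satisfy V^T sigma V = I
    return evals[order], basis


def project(X, basis, d):
    """Project observations (rows of X) onto the first d directions."""
    return np.asarray(X, dtype=float) @ basis[:, :d]


if __name__ == "__main__":
    # Toy mixture: two components that differ only in their means along x1.
    evals, basis = gmm_dr_directions(
        [0.5, 0.5], [[-2.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [np.eye(3), np.eye(3)]
    )
    print(evals)  # leading eigenvalue is approximately 0.64 (= 3.2 / 5 here)
```

In the toy mixture only the first eigenvalue is nonzero, so a single direction carries all of the cluster information and the remaining directions could be dropped, which mirrors the further reduction the abstract mentions. With equal means but unequal component covariances, the covariance part of the assumed kernel is what produces a nonzero eigenvalue.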

Author information

Authors and Affiliations

Dipartimento di Economia, Finanza e Statistica, Università degli Studi di Perugia, Perugia, Italy
Luca Scrucca

Corresponding author

Correspondence to Luca Scrucca.

About this article

Cite this article

Scrucca, L. Dimension reduction for model-based clustering. Stat Comput 20, 471–484 (2010). https://doi.org/10.1007/s11222-009-9138-7

Received: 03 November 2008
Accepted: 16 June 2009
Issue Date: October 2010
DOI: https://doi.org/10.1007/s11222-009-9138-7
