Published in: Journal of Classification 3/2021

04.06.2021 | Original Research

On Bayesian Analysis of Parsimonious Gaussian Mixture Models

Authors: Xiang Lu, Yaoxiang Li, Tanzy Love


Abstract

Cluster analysis is the task of grouping a set of objects so that objects in the same cluster are similar to each other. It is widely used in many fields, including machine learning, bioinformatics, and computer graphics. In all of these applications, the partition is an inference goal, along with the number of clusters and their distinguishing characteristics. The mixture of factor analyzers (MFA) is a special case of model-based clustering which assumes that the variance of each cluster comes from a factor analysis model. It simplifies the Gaussian mixture model through parameter dimension reduction and conceptually represents the variables as coming from a lower-dimensional subspace in which the clusters are separate. In this paper, we introduce a new reversible-jump Markov chain Monte Carlo (RJMCMC) inferential procedure for the family of constrained MFA models.
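For context, the factor-analytic covariance structure underlying the MFA model can be written as follows. This is the standard form from the MFA literature, given here only as a sketch for orientation; it does not reproduce the paper's specific parameterization or the constraints defining its parsimonious family.

\[
f(\mathbf{x}) \;=\; \sum_{g=1}^{G} \pi_g\,
\mathcal{N}\!\left(\mathbf{x} \,\middle|\, \boldsymbol{\mu}_g,\;
\boldsymbol{\Lambda}_g \boldsymbol{\Lambda}_g^{\top} + \boldsymbol{\Psi}_g\right),
\qquad
\boldsymbol{\Lambda}_g \in \mathbb{R}^{p \times q},\quad q \ll p,\quad
\boldsymbol{\Psi}_g \ \text{diagonal},
\]

where the mixing proportions \(\pi_g\) sum to one. Constraining the loadings \(\boldsymbol{\Lambda}_g\) and/or the uniquenesses \(\boldsymbol{\Psi}_g\) to be common across components yields the parsimonious (PGMM-style) variants to which the constrained MFA family refers.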
The three goals of inference here are the partition of the objects, estimation of the number of clusters, and identification and estimation of the covariance structure of the clusters; each therefore has a posterior distribution. RJMCMC is the main sampling tool because it allows the dimension of the parameter space to be estimated. We present simulations comparing the estimation of the clustering parameters and the partition between this inferential technique and previous methods. Finally, we illustrate these new methods with a dataset of DNA methylation measures for subjects with different brain tumor types. Our method uses four latent factors to correctly discover the five brain tumor types without assuming a constant variance structure, and it classifies subjects with excellent performance.
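Because the number of clusters and the number of latent factors are themselves unknown, the sampler must move between parameter spaces of different dimension. The generic reversible-jump acceptance probability below is the standard form from the RJMCMC literature and is included only as a hedged illustration; the paper's specific trans-dimensional moves are not reproduced here. For a proposed move from \((k, \theta_k)\) to \((k', \theta_{k'})\) constructed via auxiliary variables \(u \sim q(\cdot)\) with \((\theta_{k'}, u') = g(\theta_k, u)\),

\[
\alpha \;=\; \min\!\left\{ 1,\;
\frac{p(\mathbf{y}\mid k',\theta_{k'})\, p(\theta_{k'}\mid k')\, p(k')\, j(k \mid k')\, q'(u')}
     {p(\mathbf{y}\mid k,\theta_{k})\, p(\theta_{k}\mid k)\, p(k)\, j(k' \mid k)\, q(u)}
\left|\, \frac{\partial(\theta_{k'}, u')}{\partial(\theta_{k}, u)} \right|
\right\},
\]

where \(j(\cdot \mid \cdot)\) denotes the probability of selecting the move type and the final factor is the Jacobian of the deterministic mapping between parameter spaces.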

Metadata
Title
On Bayesian Analysis of Parsimonious Gaussian Mixture Models
Authors
Xiang Lu
Yaoxiang Li
Tanzy Love
Publication date
04.06.2021
Publisher
Springer US
Published in
Journal of Classification / Issue 3/2021
Print ISSN: 0176-4268
Electronic ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-021-09391-8
