2016 | Original Paper | Book Chapter

Unsupervised Component-Wise EM Learning for Finite Mixtures of Skew t-distributions

Authors: Sharon X. Lee, Geoffrey J. McLachlan

Published in: Advanced Data Mining and Applications

Publisher: Springer International Publishing

Abstract

In recent years, finite mixtures of skew distributions have been gaining popularity as a flexible tool for modelling data with asymmetric distributional features. Parameter estimation for these mixture models via the traditional EM algorithm requires the number of components to be specified a priori. In this paper, we consider unsupervised learning of skew mixture models where the optimal number of components is estimated during the parameter estimation process. We adopt a component-wise EM algorithm and use the minimum message length (MML) criterion. For illustrative purposes, we focus on the case of a finite mixture of multivariate skew t distributions. The performance of the approach is demonstrated on a real dataset from flow cytometry, where our mixture model is used to provide an automated segmentation of cell populations.
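
As a rough illustration of how the number of components can be estimated during fitting, the sketch below shows one well-known MML-motivated update for the mixing proportions (Figueiredo and Jain, 2002). Under this update, a component whose total posterior responsibility falls below half its parameter count receives weight zero and is dropped. Whether the chapter uses exactly this form is an assumption; the function name `mml_weight_update` and the example numbers are hypothetical and are not taken from the chapter.

```python
def mml_weight_update(resp_sums, n_params_per_component):
    """Return MML-penalised mixing proportions.

    resp_sums[h] is the sum, over all observations, of the posterior
    probability (responsibility) that an observation belongs to component h.
    n_params_per_component is the number of free parameters in one component
    (an assumed input; for a skew t component it would count the location,
    scale, skewness and degrees-of-freedom parameters).
    """
    # Subtract half the parameter count and clip at zero: components with
    # too little support are annihilated (their weight becomes exactly 0).
    penalised = [max(0.0, s - n_params_per_component / 2.0) for s in resp_sums]
    total = sum(penalised)
    if total == 0.0:
        raise ValueError("every component was annihilated")
    # Renormalise so the surviving weights sum to one.
    return [p / total for p in penalised]


# Hypothetical example: 100 observations, three components, 5 parameters each.
weights = mml_weight_update([60.0, 38.0, 2.0], 5)
print([round(w, 3) for w in weights])  # prints [0.618, 0.382, 0.0]
```

In a component-wise scheme this update would be applied to one component at a time, with the responsibilities recomputed before moving to the next component, so that the mass of a dropped component is redistributed immediately.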

Metadata
Title
Unsupervised Component-Wise EM Learning for Finite Mixtures of Skew t-distributions
Authors
Sharon X. Lee
Geoffrey J. McLachlan
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-49586-6_49