
2016 | OriginalPaper | Chapter

Unsupervised Component-Wise EM Learning for Finite Mixtures of Skew t-distributions

Authors : Sharon X. Lee, Geoffrey J. McLachlan

Published in: Advanced Data Mining and Applications

Publisher: Springer International Publishing


Abstract

In recent years, finite mixtures of skew distributions have been gaining popularity as a flexible tool for modelling data with asymmetric distributional features. Parameter estimation for these mixture models via the traditional EM algorithm requires the number of components to be specified a priori. In this paper, we consider unsupervised learning of skew mixture models, in which the optimal number of components is estimated during the parameter estimation process. We adopt a component-wise EM algorithm and use the minimum message length (MML) criterion. For illustrative purposes, we focus on finite mixtures of multivariate skew t-distributions. The performance of the approach is demonstrated on a real flow cytometry dataset, where our mixture model was used to provide an automated segmentation of cell populations.
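The abstract only names the ingredients (component-wise EM plus an MML criterion). A minimal sketch of how such a scheme can work is given below, in the style of the well-known MML-based procedure of Figueiredo and Jain (2002): components are updated one at a time, a component whose support drops below half its parameter count is annihilated, and after each convergence the smallest component is removed and the fit is rescored by message length. This is not the authors' implementation. Assumptions: univariate Gaussian components stand in for multivariate skew t ones, and the function names (`cem2`, `message_length`, `unsupervised_fit`), the quantile initialisation, the variance floor and the convergence test are hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def cem2(data, comps, n_params, max_iter=100, tol=1e-6):
    """Component-wise EM on comps = [[weight, mean, var], ...], updated in place.

    Each component gets its own E- and M-step in turn; its weight follows the
    MML rule max(0, sum_i w_ij - n_params/2) / n, so a component with too
    little support is annihilated. Returns the final log-likelihood.
    """
    n = len(data)
    # pdf[i][j] caches N(x_i; mean_j, var_j); only column j changes per update.
    pdf = [[normal_pdf(x, m, v) for _, m, v in comps] for x in data]
    prev = -math.inf
    for _ in range(max_iter):
        j = 0
        while j < len(comps):
            # E-step for component j only: posterior membership probabilities.
            totals = [sum(c[0] * p for c, p in zip(comps, row)) for row in pdf]
            resp = [comps[j][0] * row[j] / max(t, 1e-300) for row, t in zip(pdf, totals)]
            s = sum(resp)
            comps[j][0] = max(0.0, s - n_params / 2.0) / n
            if comps[j][0] == 0.0:
                # Annihilate component j; the next one slides into slot j.
                del comps[j]
                for row in pdf:
                    del row[j]
                if not comps:
                    raise ValueError("all components were annihilated")
            else:
                # M-step for component j only (weighted mean and variance).
                mu = sum(r * x for r, x in zip(resp, data)) / s
                var = max(sum(r * (x - mu) ** 2 for r, x in zip(resp, data)) / s, 1e-6)
                comps[j][1], comps[j][2] = mu, var
                for row, x in zip(pdf, data):
                    row[j] = normal_pdf(x, mu, var)
                j += 1
            total_w = sum(c[0] for c in comps)
            for c in comps:
                c[0] /= total_w
        ll = sum(math.log(max(sum(c[0] * p for c, p in zip(comps, row)), 1e-300)) for row in pdf)
        if abs(ll - prev) <= tol * abs(ll):
            break
        prev = ll
    return ll


def message_length(data, comps, n_params, ll):
    """Figueiredo-Jain MML criterion (smaller is better):
    (N/2) sum_k log(n a_k / 12) + (K/2) log(n / 12) + K (N + 1) / 2 - log L."""
    n, k = len(data), len(comps)
    return (n_params / 2.0 * sum(math.log(n * a / 12.0) for a, _, _ in comps)
            + k / 2.0 * math.log(n / 12.0) + k * (n_params + 1) / 2.0 - ll)


def unsupervised_fit(data, k_max, k_min=1, n_params=2):
    """Start with k_max components; repeatedly run CEM2, score the fit by
    message length, then remove the smallest component. Keep the best fit."""
    n = len(data)
    xs = sorted(data)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    # Deterministic start (an assumption): means at evenly spaced sample quantiles.
    comps = [[1.0 / k_max, xs[int((i + 0.5) * n / k_max)], var] for i in range(k_max)]
    best, best_len = None, math.inf
    while True:
        ll = cem2(data, comps, n_params)
        length = message_length(data, comps, n_params, ll)
        if length < best_len:
            best, best_len = [tuple(c) for c in comps], length
        if len(comps) <= k_min:
            break
        comps.remove(min(comps, key=lambda c: c[0]))
        total_w = sum(c[0] for c in comps)
        for c in comps:
            c[0] /= total_w
    return best, best_len


# Usage on synthetic two-cluster data (simulated, not the flow cytometry data).
rng = random.Random(1)
data = [rng.gauss(-5.0, 1.0) for _ in range(150)] + [rng.gauss(5.0, 1.0) for _ in range(150)]
best, length = unsupervised_fit(data, k_max=6)
for weight, mu, var in sorted(best, key=lambda c: c[1]):
    print(f"weight={weight:.2f}  mean={mu:.2f}  var={var:.2f}")
```

Updating one component at a time lets the MML weight rule remove a component as soon as its support vanishes, instead of letting it linger as in a standard EM sweep. Moving to skew t components would presumably change only the per-component E- and M-steps and the value of `n_params`; the outer remove-and-rescore loop would stay the same.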


Metadata
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-49586-6_49
