Published in: Advances in Data Analysis and Classification 2/2019

08.03.2018 | Regular Article

Mixtures of restricted skew-t factor analyzers with common factor loadings

Authors: Wan-Lun Wang, Luis M. Castro, Yen-Ting Chang, Tsung-I Lin


Abstract

Mixtures of common t factor analyzers (MCtFA) have been shown to be effective in robustifying mixtures of common factor analyzers (MCFA) for model-based clustering of high-dimensional data with heavy tails. However, the MCtFA model may still lack robustness against observations whose distributions are highly asymmetric. This paper presents a further robust extension of the MCFA and MCtFA models, called the mixture of common restricted skew-t factor analyzers (MCrstFA), which assumes a restricted multivariate skew-t distribution for the common factors. The MCrstFA model can accommodate severely non-normal (skewed and leptokurtic) random phenomena while preserving the parsimony of the factor-analytic representation and allowing graphical visualization of the data in low-dimensional plots. A computationally feasible expectation conditional maximization either (ECME) algorithm is developed to carry out maximum likelihood estimation. The numbers of factors and mixture components are determined simultaneously based on commonly used penalized likelihood criteria. The usefulness of the proposed model is illustrated with simulated and real datasets, and the experimental results indicate its superiority over some existing competitors.
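
To make the restricted skew-t assumption on the common factors more concrete, the following is a minimal, stdlib-only Python sketch that simulates data from a mixture with common loadings. It relies on a standard stochastic representation of the restricted multivariate skew-t distribution, Y = μ + (δ|Z0| + Z1)/sqrt(W), where Z0 ~ N(0, 1) is a single scalar shared by all coordinates, Z1 ~ N(0, Σ) and W ~ Gamma(ν/2, rate ν/2) are independent. It also assumes the usual MCFA form y_j = A u_j + e_j with a p × q loading matrix A shared by all components. All function and parameter names are hypothetical. The choice to let the errors share the component's latent weight W (mirroring the joint t construction of MCtFA) is an illustrative assumption, not the paper's specification. The sketch does not implement the ECME fitting algorithm.

```python
import math
import random


def cholesky(mat):
    """Lower-triangular L with L L^T = mat (mat symmetric positive definite)."""
    n = len(mat)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(mat[i][i] - s)
            else:
                L[i][j] = (mat[i][j] - s) / L[j][j]
    return L


def matvec(M, v):
    """Plain matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def latent_weight(nu, rng):
    """W ~ Gamma(shape=nu/2, rate=nu/2); random.gammavariate takes (shape, scale=1/rate)."""
    return rng.gammavariate(nu / 2.0, 2.0 / nu)


def rmst_draw(mu, sigma_chol, delta, w, rng):
    """Restricted skew-t draw given W = w: mu + (delta*|Z0| + Z1) / sqrt(w),
    with one scalar Z0 ~ N(0, 1) for all coordinates and Z1 ~ N(0, L L^T)."""
    z0 = abs(rng.gauss(0.0, 1.0))
    z1 = matvec(sigma_chol, [rng.gauss(0.0, 1.0) for _ in mu])
    s = math.sqrt(w)
    return [m + (d * z0 + z) / s for m, d, z in zip(mu, delta, z1)]


def simulate_mcrstfa(n, weights, A, xis, omegas, deltas, nus, d_diag, seed=0):
    """Hypothetical generator of n points y_j = A u_j + e_j from a g-component mixture.

    A: common p x q loading matrix shared by all components.
    xis, omegas, deltas, nus: per-component factor location, scale matrix,
        skewness vector and degrees of freedom.
    d_diag: diagonal of the error covariance D.
    ASSUMPTION: e_j ~ N_p(0, D) / sqrt(W) reuses the component's latent weight W.
    """
    rng = random.Random(seed)
    chols = [cholesky(om) for om in omegas]
    data, labels = [], []
    for _ in range(n):
        i = rng.choices(range(len(weights)), weights=weights)[0]  # component label
        w = latent_weight(nus[i], rng)
        u = rmst_draw(xis[i], chols[i], deltas[i], w, rng)  # q-dim skew-t factor
        e = [rng.gauss(0.0, math.sqrt(dk)) / math.sqrt(w) for dk in d_diag]
        y = [a + b for a, b in zip(matvec(A, u), e)]  # p-dim observation
        data.append(y)
        labels.append(i)
    return data, labels


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.5, 1.0], [0.0, 1.0], [1.0, -0.5]]  # p = 4, q = 2
    data, labels = simulate_mcrstfa(
        n=500, weights=[0.4, 0.6], A=A,
        xis=[[0.0, 0.0], [3.0, -2.0]],
        omegas=[[[1.0, 0.3], [0.3, 1.0]], [[0.5, 0.0], [0.0, 0.8]]],
        deltas=[[2.0, 1.0], [-1.5, 0.5]],
        nus=[5.0, 10.0], d_diag=[0.1, 0.1, 0.2, 0.2])
    print(len(data), len(data[0]))
```

The Cholesky factor and matrix products are written by hand only to keep the sketch dependency-free. In practice, a numerical library would be the natural choice for this kind of simulation.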

Metadata
Title
Mixtures of restricted skew-t factor analyzers with common factor loadings
Authors
Wan-Lun Wang
Luis M. Castro
Yen-Ting Chang
Tsung-I Lin
Publication date
08.03.2018
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 2/2019
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-018-0317-2