Published in: Advances in Data Analysis and Classification 3/2022

17 June 2021 | Regular Article

Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution

Authors: Francisco H. C. de Alencar, Christian E. Galarza, Larissa A. Matos, Victor H. Lachos

Abstract

Finite mixture models have been widely used to model and analyze data from heterogeneous populations. Data of this kind can be missing or subject to upper and/or lower detection limits because of the constraints of experimental apparatuses. Another complication arises when the measurements from each population depart significantly from normality, for example by exhibiting asymmetric behavior. For such data structures, we propose a robust model for censored and/or missing data based on finite mixtures of multivariate skew-normal distributions. This approach allows us to model data with great flexibility, accommodating multimodality and skewness simultaneously, depending on the structure of the mixture components. We develop an analytically simple, yet efficient, EM-type algorithm for maximum likelihood estimation of the parameters. The E-step of the algorithm has closed-form expressions that rely on formulas for the mean and variance of truncated multivariate skew-normal distributions. Furthermore, a general information-based method for approximating the asymptotic covariance matrix of the estimators is presented. Results from the analysis of both simulated and real datasets are reported to demonstrate the effectiveness of the proposed method. The proposed algorithm and method are implemented in the new R package CensMFM.
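
To give a feel for why truncated moments appear in the E-step, the following is a minimal Python sketch of the simplest special case: a single univariate normal component (skewness parameter zero) with one left-censored observation. It is not the paper's algorithm and not the CensMFM interface; the function name `left_censored_moments` and its arguments are hypothetical, chosen only for illustration. When an observation is only known to lie at or below a detection limit, an EM-type E-step replaces it by its conditional mean and variance given that constraint, which for a normal variable have the well-known closed forms used here.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def left_censored_moments(mu, sigma, limit):
    """Mean and variance of Y ~ N(mu, sigma^2) conditional on Y <= limit.

    Hypothetical helper: univariate, symmetric special case only.
    """
    alpha = (limit - mu) / sigma                 # standardized detection limit
    ratio = norm_pdf(alpha) / norm_cdf(alpha)    # phi(alpha) / Phi(alpha)
    mean = mu - sigma * ratio
    var = sigma ** 2 * (1.0 - alpha * ratio - ratio ** 2)
    return mean, var


# Example: a standard normal value known only to be at or below 0.
mean, var = left_censored_moments(mu=0.0, sigma=1.0, limit=0.0)
print(mean, var)
```

In the multivariate skew-normal setting described in the abstract, the analogous quantities are the mean and covariance of a truncated multivariate skew-normal vector, which need more elaborate formulas than this univariate sketch.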

Metadata
Title: Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution
Authors: Francisco H. C. de Alencar, Christian E. Galarza, Larissa A. Matos, Victor H. Lachos
Publication date: 17 June 2021
Publisher: Springer Berlin Heidelberg
Published in: Advances in Data Analysis and Classification, Issue 3/2022
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI: https://doi.org/10.1007/s11634-021-00448-5
