Published in: Advances in Data Analysis and Classification 2/2016

01.06.2016 | Regular Article

Latent class model with conditional dependency per modes to cluster categorical data

Written by: Matthieu Marbac, Christophe Biernacki, Vincent Vandewalle


Abstract

We propose a parsimonious extension of the classical latent class model to cluster categorical data by relaxing the conditional independence assumption. Under this new mixture model, named the conditional modes model (CMM), variables are grouped into conditionally independent blocks. Each block follows a parsimonious multinomial distribution in which the few free parameters model the probabilities of the most likely levels, while the remaining probability mass is spread uniformly over the other levels of the block. Thus, when the conditional independence assumption holds, this model defines parsimonious versions of the standard latent class model. Moreover, when this assumption is violated, the proposed model brings out the main intra-class dependencies between variables, thus summarizing each class with relatively few characteristic levels. Model selection is carried out by a hybrid MCMC algorithm that does not require preliminary parameter estimation. Maximum likelihood estimation is then performed via an EM algorithm for the best model only. The model properties are illustrated on simulated data and on three real data sets using the associated R package CoModes. The results show that this model reduces the biases induced by the conditional independence assumption while providing meaningful parameters.
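To make the block distribution described in the abstract concrete, here is a minimal Python sketch. It is not the CoModes implementation: the function names, the data layout and the numbers are hypothetical and chosen only for illustration. It builds a "multinomial per modes" for one block (a few joint levels carry their own probability, the rest of the mass is shared uniformly) and multiplies block probabilities to obtain the probability of an observation within a single class.

```python
from itertools import product


def block_distribution(levels_per_variable, modes):
    """Distribution over the joint levels of one block (illustrative helper).

    levels_per_variable: number of levels of each variable in the block.
    modes: dict mapping a joint level (tuple of level indices) to its
           probability; these are the few free parameters of the block.
    The mass not assigned to the modes is spread uniformly over the
    remaining joint levels.
    """
    joint_levels = list(product(*(range(m) for m in levels_per_variable)))
    unknown = set(modes) - set(joint_levels)
    if unknown:
        raise ValueError(f"modes outside the block's levels: {unknown}")
    mode_mass = sum(modes.values())
    if mode_mass > 1.0 + 1e-12:
        raise ValueError("mode probabilities sum to more than 1")
    n_other = len(joint_levels) - len(modes)
    if n_other == 0:
        if abs(mode_mass - 1.0) > 1e-9:
            raise ValueError("all levels are modes but mass does not sum to 1")
        rest = 0.0
    else:
        rest = (1.0 - mode_mass) / n_other
    return {level: modes.get(level, rest) for level in joint_levels}


def class_conditional_probability(x, blocks):
    """Probability of observation x within one class.

    blocks: list of (variable_indices, distribution) pairs; blocks are
    assumed conditionally independent given the class, so their
    probabilities multiply.
    """
    p = 1.0
    for variable_indices, distribution in blocks:
        p *= distribution[tuple(x[j] for j in variable_indices)]
    return p


# Hypothetical class: variables 0 and 1 form one block, variable 2 another.
block_a = block_distribution([3, 2], {(0, 0): 0.6, (2, 1): 0.3})
block_b = block_distribution([4], {(1,): 0.7})
blocks = [((0, 1), block_a), ((2,), block_b)]

p_typical = class_conditional_probability((0, 0, 1), blocks)
p_atypical = class_conditional_probability((1, 0, 3), blocks)
```

In this toy class, the first block has six joint levels but only two mode probabilities to estimate, which is the sense in which the block distribution is parsimonious. When every block contains a single variable, the product reduces to the usual conditional independence structure of the latent class model.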

Footnotes
1
Note that the partition of the variables into blocks is identical across classes. This choice is motivated by reasons of identifiability and interpretation that we will detail later.
 
Metadata
Title
Latent class model with conditional dependency per modes to cluster categorical data
Authors
Matthieu Marbac
Christophe Biernacki
Vincent Vandewalle
Publication date
01.06.2016
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 2/2016
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-016-0250-1
