Advances in Data Analysis and Classification

1 June 2016 | Regular Article

Latent class model with conditional dependency per modes to cluster categorical data

Authors: Matthieu Marbac, Christophe Biernacki, Vincent Vandewalle

Published in: Advances in Data Analysis and Classification | Issue 2/2016

Abstract

We propose a parsimonious extension of the classical latent class model to cluster categorical data by relaxing the conditional independence assumption. Under this new mixture model, named the conditional modes model (CMM), variables are grouped into conditionally independent blocks. Each block follows a parsimonious multinomial distribution in which the few free parameters model the probabilities of the most likely levels, while the remaining probability mass is spread uniformly over the other levels of the block. Thus, when the conditional independence assumption holds, this model defines parsimonious versions of the standard latent class model. Moreover, when this assumption is violated, the proposed model brings out the main intra-class dependencies between variables, thus summarizing each class with relatively few characteristic levels. Model selection is carried out by a hybrid MCMC algorithm that does not require preliminary parameter estimation. Maximum likelihood estimation is then performed via an EM algorithm, only for the best model. The model properties are illustrated on simulated data and on three real data sets using the associated R package CoModes. The results show that this model reduces the biases induced by the conditional independence assumption while providing meaningful parameters.
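The block distribution described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the CoModes package: the function name `modes_distribution`, its arguments, and the example numbers are hypothetical. It assumes that the levels of a block are indexed from 0 and that the probabilities of the modes (the most likely levels) are given, with the leftover mass shared equally among all other levels.

```python
import numpy as np


def modes_distribution(n_levels, mode_probs):
    """Build a 'multinomial per modes' probability vector (illustrative only).

    n_levels   : number of levels of the block (hypothetical argument name)
    mode_probs : dict mapping a level index to its free probability
    """
    mode_mass = sum(mode_probs.values())
    n_rest = n_levels - len(mode_probs)
    if mode_mass > 1.0 + 1e-12:
        raise ValueError("mode probabilities must sum to at most 1")
    if n_rest == 0 and not np.isclose(mode_mass, 1.0):
        raise ValueError("with no remaining level, the modes must sum to 1")

    # Remaining mass is spread uniformly over the non-mode levels.
    rest = (1.0 - mode_mass) / n_rest if n_rest > 0 else 0.0
    probs = np.full(n_levels, rest)
    for level, p in mode_probs.items():
        probs[level] = p
    return probs


# A block with 6 levels and 2 modes: only 2 free parameters
# instead of the 5 needed by an unconstrained multinomial.
example = modes_distribution(6, {0: 0.5, 3: 0.3})
```

With two modes, the six-level block above is described by two free probabilities, which is the source of the parsimony the abstract refers to.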

Note that the partition of the variables into blocks is the same for all classes. This choice is motivated by identifiability and interpretability considerations that we detail later.
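To make the role of a shared partition concrete, the following Python sketch evaluates a mixture probability in which every class uses the same blocks but its own block probabilities. It is a sketch under stated assumptions, not the authors' implementation: the function `cmm_pmf`, its arguments, and the toy numbers are hypothetical, and the levels of a block are assumed to be the crossings of the levels of its variables, enumerated in row-major order.

```python
import itertools

import numpy as np


def cmm_pmf(x, blocks, n_levels, proportions, block_probs):
    """Mixture probability of one observation x (illustrative only).

    blocks      : list of tuples of variable indices, shared by all classes
    n_levels    : number of levels of each variable
    proportions : class proportions
    block_probs : block_probs[g][b] is the probability vector of block b
                  in class g, over the crossed levels of its variables
    """
    total = 0.0
    for g, pi_g in enumerate(proportions):
        prob = pi_g
        for b, variables in enumerate(blocks):
            dims = tuple(n_levels[j] for j in variables)
            # Index of the joint level taken by the variables of this block.
            idx = np.ravel_multi_index(tuple(x[j] for j in variables), dims)
            # Blocks are independent given the class, so probabilities multiply.
            prob *= block_probs[g][b][idx]
        total += prob
    return total


# Toy setting: three binary variables, blocks {0, 1} and {2}, two classes.
blocks = [(0, 1), (2,)]
n_levels = [2, 2, 2]
proportions = [0.5, 0.5]
block_probs = [
    [np.array([0.7, 0.1, 0.1, 0.1]), np.array([0.9, 0.1])],  # class 0
    [np.array([0.1, 0.1, 0.1, 0.7]), np.array([0.2, 0.8])],  # class 1
]
all_points = list(itertools.product(range(2), repeat=3))
total_mass = sum(
    cmm_pmf(x, blocks, n_levels, proportions, block_probs) for x in all_points
)
```

Because the partition is common to all classes, `blocks` is a single argument, while only `block_probs` varies with the class index.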

Title: Latent class model with conditional dependency per modes to cluster categorical data
Authors: Matthieu Marbac, Christophe Biernacki, Vincent Vandewalle
Publication date: 1 June 2016
Publisher: Springer Berlin Heidelberg
Published in: Advances in Data Analysis and Classification / Issue 2/2016
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI: https://doi.org/10.1007/s11634-016-0250-1
