Clustering criteria for discrete data and latent class models

Celeux, Gilles; Govaert, Gérard

doi:10.1007/BF02616237

Published: December 1991

Volume 8, pages 157–176, (1991)

Journal of Classification

Abstract

We show that a well-known clustering criterion for discrete data, the information criterion, is closely related to the classification maximum likelihood criterion for the latent class model. This relation can be derived from the Bryant-Windham construction. Emphasis is placed on binary clustering criteria which are analyzed under the maximum likelihood approach for different multivariate Bernoulli mixtures. This alternative form of criterion reveals non-apparent aspects of clustering techniques. All the criteria discussed can be optimized with the alternating optimization algorithm. Some illustrative applications are included.

Résumé

We show that the information clustering criterion, often used for discrete data, is closely related to the classification maximum likelihood criterion applied to the latent class model. This link can be analyzed through the Bryant-Windham parameterization approach. Emphasis is placed on the case of binary data, which are analyzed under the maximum likelihood approach for mixtures of multivariate Bernoulli distributions. This form of the criterion brings out hidden aspects of clustering methods for binary data. All the criteria considered here can be optimized with the alternating optimization algorithm. Examples conclude the article.
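
The abstract refers to optimizing a classification maximum likelihood criterion for multivariate Bernoulli mixtures by alternating optimization. The following is only a minimal sketch of that general idea, not the authors' formulation: it assumes equal mixing proportions, one Bernoulli parameter per cluster and per variable, and hard assignments. The names `fit`, `estimate_params`, `log_density`, `classification_loglik`, and the constant `EPS` are hypothetical and do not come from the article.

```python
import math
import random

EPS = 1e-6  # assumed clamp: keeps probabilities off 0 and 1 so logs stay finite


def estimate_params(data, labels, n_clusters):
    """Per-cluster, per-variable frequency of ones, clamped to [EPS, 1 - EPS]."""
    n_vars = len(data[0])
    params = []
    for k in range(n_clusters):
        members = [x for x, z in zip(data, labels) if z == k]
        if not members:
            # Empty cluster: fall back to 0.5 (an arbitrary, illustrative choice).
            params.append([0.5] * n_vars)
            continue
        row = []
        for j in range(n_vars):
            p = sum(x[j] for x in members) / len(members)
            row.append(min(max(p, EPS), 1.0 - EPS))
        params.append(row)
    return params


def log_density(x, p):
    """Log-probability of binary vector x under independent Bernoulli(p[j])."""
    return sum(math.log(pj) if xj else math.log(1.0 - pj) for xj, pj in zip(x, p))


def classification_loglik(data, labels, params):
    """Criterion value: sum over points of the log-density in the assigned cluster."""
    return sum(log_density(x, params[z]) for x, z in zip(data, labels))


def fit(data, n_clusters, n_iter=50, seed=0):
    """Alternate parameter estimation and reassignment until the partition is stable."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_clusters) for _ in data]
    history = []
    for _ in range(n_iter):
        # Step 1: with the partition fixed, re-estimate the cluster parameters.
        params = estimate_params(data, labels, n_clusters)
        # Step 2: with the parameters fixed, move each point to its best cluster.
        new_labels = [
            max(range(n_clusters), key=lambda k: log_density(x, params[k]))
            for x in data
        ]
        history.append(classification_loglik(data, new_labels, params))
        if new_labels == labels:
            break
        labels = new_labels
    # Make the returned parameters consistent with the returned partition.
    params = estimate_params(data, labels, n_clusters)
    return labels, params, history


if __name__ == "__main__":
    toy = [[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 4
    labels, params, history = fit(toy, n_clusters=2)
    print(labels)
    print(history)
```

Each step can only keep or raise the criterion for the current iterate, which is why the recorded `history` is non-decreasing; like other alternating schemes, the procedure may stop at a local optimum that depends on the initial partition.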

Author information

Authors and Affiliations

INRIA, Domaine de Voluceau, Rocquencourt, B.P. 105, 78153 Le Chesnay Cedex
Gilles Celeux
URA CNRS 817, Université de Technologie de Compiègne, BP 649, 60206 Compiègne Cedex
Gérard Govaert

About this article

Cite this article

Celeux, G., Govaert, G. Clustering criteria for discrete data and latent class models. Journal of Classification 8, 157–176 (1991). https://doi.org/10.1007/BF02616237

Issue Date: December 1991
DOI: https://doi.org/10.1007/BF02616237