Abstract
We show that a well-known clustering criterion for discrete data, the information criterion, is closely related to the classification maximum likelihood criterion for the latent class model. This relation can be derived from the Bryant-Windham construction. Emphasis is placed on binary clustering criteria, which are analyzed under the maximum likelihood approach for different multivariate Bernoulli mixtures. This alternative form of the criterion reveals aspects of clustering techniques that are not otherwise apparent. All the criteria discussed can be optimized with the alternating optimization algorithm. Some illustrative applications are included.
Résumé
We show that the information criterion for clustering, often used for discrete data, is closely related to the classification maximum likelihood criterion applied to the latent class model. This link can be analyzed through the Bryant-Windham parameterization. Emphasis is placed on binary data, which are analyzed under the maximum likelihood approach for mixtures of multivariate Bernoulli distributions. This form of the criterion brings out hidden aspects of clustering methods for binary data. All the criteria considered here can be optimized with the alternating optimization algorithm. Examples conclude the article.
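To make the pairing of a classification likelihood criterion with alternating optimization more concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes equal mixing proportions, one Bernoulli parameter per cluster and variable, and a clipping constant `EPS` that keeps the logarithms finite. The names `log_lik`, `criterion`, and `fit`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
import math

EPS = 1e-6  # assumption: keeps estimated probabilities away from 0 and 1


def log_lik(row, probs):
    """Bernoulli log-likelihood of one binary row under one cluster's parameters."""
    return sum(math.log(p) if x else math.log(1.0 - p) for x, p in zip(row, probs))


def criterion(data, labels, params):
    """Classification log-likelihood (equal mixing proportions assumed)."""
    return sum(log_lik(row, params[z]) for row, z in zip(data, labels))


def fit(data, labels, n_clusters, max_iter=100):
    """Alternate parameter and partition updates from an initial labelling."""
    n_vars = len(data[0])
    labels = list(labels)
    params = [[0.5] * n_vars for _ in range(n_clusters)]
    history = []
    for _ in range(max_iter):
        # Step 1: for a fixed partition, set each cluster's parameters to the
        # within-cluster frequencies, clipped to [EPS, 1 - EPS].
        for k in range(n_clusters):
            members = [row for row, z in zip(data, labels) if z == k]
            if not members:
                continue  # an empty cluster keeps its previous parameters
            params[k] = [
                min(max(sum(col) / len(members), EPS), 1.0 - EPS)
                for col in zip(*members)
            ]
        # Step 2: for fixed parameters, move each row to its most likely cluster.
        new_labels = [
            max(range(n_clusters), key=lambda k: log_lik(row, params[k]))
            for row in data
        ]
        history.append(criterion(data, new_labels, params))
        if new_labels == labels:
            break  # the partition is stable, so the criterion cannot improve
        labels = new_labels
    return labels, params, history


# Toy binary data (made up for illustration): six rows, four variables.
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
labels, params, history = fit(data, [0, 0, 0, 0, 1, 1], n_clusters=2)
print(labels)
print(history)
```

Neither step can decrease the criterion, so the recorded values in `history` are non-decreasing. The result depends on the initial labelling: like other alternating schemes of this kind, the sketch may stop at a local optimum, so in practice several starting partitions would be tried.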
Cite this article
Celeux, G., Govaert, G. Clustering criteria for discrete data and latent class models. Journal of Classification 8, 157–176 (1991). https://doi.org/10.1007/BF02616237