Journal of Classification

11 July 2020 | Original Research

Gaussian-Based Visualization of Gaussian and Non-Gaussian-Based Clustering

Authors: Christophe Biernacki, Matthieu Marbac, Vincent Vandewalle

Published in: Journal of Classification | Issue 1/2021

Abstract

A generic method is introduced to visualize, in a “Gaussian-like way” and onto \(\mathbb{R}^{2}\), the results of Gaussian or non-Gaussian-based clustering. The key point is to explicitly force a visualization based on a spherical Gaussian mixture to inherit the within-cluster overlap that is present in the initial clustering mixture. The result is a particularly user-friendly drawing of the clusters, providing any practitioner with an overview of the potentially complex clustering result. An entropic measure provides information about the quality of the drawn overlap compared with the true one in the initial space. The proposed method is illustrated on four real data sets of different types (categorical, mixed, functional, and network) and is implemented in the R package ClusVis.

Note that the normalized empirical entropy E could additionally be normalized by \(\ln K\). We have not done so, in order to follow the original definition of Hathaway (1986). This choice has no impact on the related parameter estimation in Section 3.3.
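The effect of dividing by \(\ln K\) can be illustrated with a minimal Python sketch. The exact definition of E is not reproduced in this excerpt, so the sketch assumes the usual mean classification entropy, \(-\frac{1}{n}\sum_{i}\sum_{k} t_{ik}\ln t_{ik}\), where \(t_{ik}\) denotes the membership probability of observation i in cluster k. The function names and the toy inputs are hypothetical and are not taken from the paper or from ClusVis.

```python
import math

def mean_entropy(probs):
    """Assumed form: -(1/n) * sum_i sum_k t_ik * ln(t_ik) over the rows of probs."""
    total = 0.0
    for row in probs:
        # Terms with t_ik == 0 contribute 0, by the convention 0 * ln(0) = 0.
        total -= sum(t * math.log(t) for t in row if t > 0.0)
    return total / len(probs)

def mean_entropy_scaled(probs):
    """Same quantity divided by ln(K), which maps it into [0, 1]."""
    k = len(probs[0])  # number of clusters K (illustrative: read from the first row)
    return mean_entropy(probs) / math.log(k)

# Hypothetical membership probabilities for two observations and K = 3 clusters.
crisp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]        # no overlap at all
uniform = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]  # maximal overlap
```

Under this assumed definition, a crisp partition gives an entropy of 0, while uniform memberships give \(\ln K\) before scaling and 1 after scaling. The scaling is a constant factor for fixed K, which is consistent with the remark that it does not affect parameter estimation.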

We never observed this event in the experiments of this paper. Should it happen, we propose to draw at least the center of the unobserved cluster on the graph.

Ambroise, C., & Matias, C. (2012). New consistent and asymptotically normal parameter estimates for random-graph mixture models. J. R. Stat. Soc. Ser. B. Stat. Methodol., 74(1), 3–35. https://doi.org/10.1111/j.1467-9868.2011.01009.x.

Audigier, V., Husson, F., & Josse, J. (2016a). Multiple imputation for continuous variables using a Bayesian principal component analysis. Journal of Statistical Computation and Simulation, 86(11), 2140–2156.

Audigier, V., Husson, F., & Josse, J. (2016b). A principal component method to impute missing values for mixed data. Advances in Data Analysis and Classification, 10(1), 5–26.

Banfield, J., & Raftery, A. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3), 803–821. https://doi.org/10.2307/2532201.

Benaglia, T., Chauveau, D., & Hunter, D.R. (2009). An EM-like algorithm for semi- and nonparametric estimation in multivariate mixtures. Journal of Computational and Graphical Statistics, 18, 505–526.

Bezdek, J.C., Pal, M.R., Keller, J., & Krisnapuram, R. (1999). Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. USA: Kluwer Academic Publishers.

Biernacki, C. (2017). Mixture models. In J.-J. Droesbeke, G. Saporta Thomas-Agnan, eds, ‘Choix de modèles et agrégation’, Technip. https://hal.inria.fr/hal-01252671.

Biernacki, C., Celeux, G., & Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 719–725.

Bishop, C.M., Svensén, M., & Williams, C.K. (1998). GTM: The generative topographic mapping. Neural Computation, 10(1), 215–234.

Bouveyron, C. (2015). funFEM: Clustering in the Discriminative Functional Subspace. R package version 1.1. https://CRAN.R-project.org/package=funFEM.

Bouveyron, C., Côme, E., & Jacques, J. (2015). The discriminative functional mixture model for a comparative analysis of bike sharing systems. Ann. Appl. Stat., 9(4), 1726–1760. https://doi.org/10.1214/15-AOAS861.

Bouveyron, C., & Jacques, J. (2011). Model-based clustering of time series in group-specific functional subspaces. Advances in Data Analysis and Classification, 5(4), 281–300.

Celeux, G., & Govaert, G. (1991). Clustering criteria for discrete data and latent class models. Journal of Classification, 8(2), 157–176. https://doi.org/10.1007/BF02616237.

Celeux, G., & Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition, 28(5), 781–793.

Chavent, M., & Kuentz-Simonet, V. (2012). Orthogonal rotation in PCAMIX. Advances in Data Analysis and Classification, 6(2), 131–146.

Chen, K., & Lei, J. (2015). Localized functional principal component analysis. J. Amer. Statist. Assoc., 110(511), 1266–1275. https://doi.org/10.1080/01621459.2015.1016225.

Cox, T., & Cox, M. (2001). Multidimensional Scaling. Chapman and Hall.

Daudin, J.-J., Picard, F., & Robin, S. (2008). A mixture model for random graphs. Statistics and Computing, 18(2), 173–183. https://doi.org/10.1007/s11222-007-9046-7.

Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 1–38.

Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.

Gollini, I., & Murphy, T. (2014). Mixture of latent trait analyzers for model-based clustering of categorical data. Statistics and Computing, 24(4), 569–588.

Goodman, L. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61(2), 215–231.

Greenacre, M. (2017). Correspondence Analysis in Practice. CRC Press.

Hathaway, R.J. (1986). Another interpretation of the EM algorithm for mixture distributions. Statistics and Probability Letters, 4, 53–56.

Hennig, C. (2004). Asymmetric linear dimension reduction for classification. Journal of Computational and Graphical Statistics, 13(4), 930–945.

Hennig, C. (2010). Methods for merging Gaussian mixture components. Advances in Data Analysis and Classification, 4, 3–34.

Jacques, J., & Preda, C. (2014). Model-based clustering for multivariate functional data. Comput. Statist. Data Anal., 71, 92–106. https://doi.org/10.1016/j.csda.2012.12.004.

Jajuga, K., Sokołowski, A., & Bock, H. (2002). Classification, clustering and data analysis: recent advances and applications. Berlin Heidelberg New York: Springer.

Josse, J., Chavent, M., Liquet, B., & Husson, F. (2012). Handling missing values with regularized iterative multiple correspondence analysis. Journal of Classification, 29(1), 91–116.

Josse, J., Pagès, J., & Husson, F. (2011). Multiple imputation in principal component analysis. Advances in Data Analysis and Classification, 5(3), 231–246.

Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1), 59–69.

Kosmidis, I., & Karlis, D. (2015). Model-based clustering using copulas with applications. Statistics and Computing, pp. 1–21. https://doi.org/10.1007/s11222-015-9590-5.

Larose, C. (2015). Model-Based Clustering of Incomplete Data, PhD thesis, University of Connecticut.

Lê, S., Josse, J., Husson, F., et al. (2008). FactoMineR: an R package for multivariate analysis. Journal of Statistical Software, 25(1), 1–18.

Lebret, R., Iovleff, S., Langrognet, F., Biernacki, C., Celeux, G., & Govaert, G. (2015). Rmixmod: the R package of the model-based unsupervised, supervised and semi-supervised classification mixmod library. Journal of Statistical Software, 67(6), 241–270.

Lim, T.-S., Loh, W.-Y., & Shih, Y.-S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40(3), 203–228.

Marbac, M., Biernacki, C., & Vandewalle, V. (2016). Latent class model with conditional dependency per modes to cluster categorical data. Advances in Data Analysis and Classification, 10(2), 183–207.

Marbac, M., Biernacki, C., & Vandewalle, V. (2017). Model-based clustering of Gaussian copulas for mixed data. Communications in Statistics - Theory and Methods, 46(23), 11635–11656.

Mazo, G. (2017). A semiparametric and location-shift copula-based mixture model. Journal of Classification, 34(3), 444–464.

McLachlan, G., & Peel, D. (2004). Finite mixture models. New York: Wiley.

McNicholas, P. (2016). Mixture Model-Based Classification. CRC Press.

McNicholas, P., & Murphy, T. (2008). Parsimonious Gaussian mixture models. Statistics and Computing, 18(3), 285–296. https://doi.org/10.1007/s11222-008-9056-0.

McNicholas, P., & Scrucca, L. (2013). Dimension reduction for model-based clustering via mixtures of multivariate t-distributions. Statistics & Probability Letters, 7, 321–338.

McParland, D., & Gormley, I.C. (2016). Model based clustering for mixed data: clustMD. Advances in Data Analysis and Classification, 10(2), 155–169.

Moustaki, I., & Papageorgiou, I. (2005). Latent class models for mixed variables with applications in archaeometry. Computational Statistics & Data Analysis, 48(3), 659–675.

Punzo, A., & Ingrassia, S. (2016). Clustering bivariate mixed-type data via the cluster-weighted model. Computational Statistics, 31(3), 989–1013.

Ramsay, J.O., & Silverman, B.W. (2005). Functional Data Analysis (2nd ed.). Springer Series in Statistics. New York: Springer.

Samé, A., Chamroukhi, F., Govaert, G., & Aknin, P. (2011). Model-based clustering and segmentation of time series with changes in regime. Advances in Data Analysis and Classification, 5, 301–321.

Schlimmer, J. (1987). Concept acquisition through representational adjustment, PhD thesis, Department of Information and Computer Science, University of California.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.

Scrucca, L. (2010). Dimension reduction for model-based clustering. Statistics and Computing, 20(4), 471–484. https://doi.org/10.1007/s11222-009-9138-7.

Scrucca, L., Fop, M., Murphy, T.B., & Raftery, A.E. (2016). mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal, 8(1), 205–233. https://journal.r-project.org/archive/2016-1/scrucca-fop-murphy-etal.pdf.

Van der Heijden, P., & Escofier, B. (2003). Multiple correspondence analysis with missing data. In Analyse des correspondances. Recherches au cœur de l’analyse des données, pp. 152–170.

Verbanck, M., Josse, J., & Husson, F. (2015). Regularised PCA to denoise and visualise data. Statistics and Computing, 25(2), 471–486.

Vesanto, J., & Alhoniemi, E. (2000). Clustering of the self-organizing map. IEEE Transactions on Neural Networks, 11(3), 586–600.

Xanthopoulos, P., Pardalos, P.M., & Trafalis, T.B. (2013). Linear Discriminant Analysis.

Young, F.W. (1987). Multidimensional Scaling: History, Theory, and Applications. Lawrence Erlbaum Associates.

Zanghi, H., Ambroise, C., & Miele, V. (2008). Fast online graph clustering via Erdös-Rényi mixture. Pattern Recognition, 41(12), 3592–3599. http://www.sciencedirect.com/science/article/pii/S0031320308002483.

Zhou, L., & Pan, H. (2014). Principal component analysis of two-dimensional functional data. Journal of Computational and Graphical Statistics, 23(3), 779–801. https://doi.org/10.1080/10618600.2013.827986.

Title: Gaussian-Based Visualization of Gaussian and Non-Gaussian-Based Clustering
Authors: Christophe Biernacki, Matthieu Marbac, Vincent Vandewalle
Publication date: 11 July 2020
Publisher: Springer US
Published in: Journal of Classification / Issue 1/2021
Print ISSN: 0176-4268
Electronic ISSN: 1432-1343
DOI: https://doi.org/10.1007/s00357-020-09369-y
