Published in: Journal of Classification 1/2021

11.07.2020 | Original Research

Gaussian-Based Visualization of Gaussian and Non-Gaussian-Based Clustering

Authors: Christophe Biernacki, Matthieu Marbac, Vincent Vandewalle

Abstract

A generic method is introduced to visualize, in a “Gaussian-like way” and onto \(\mathbb{R}^{2}\), the results of Gaussian or non-Gaussian-based clustering. The key point is to explicitly force a visualization based on a spherical Gaussian mixture to inherit the overlap between clusters that is present in the initial clustering mixture. The result is a particularly user-friendly drawing of the clusters, giving any practitioner an overview of a potentially complex clustering result. An entropic measure indicates how well the drawn overlap matches the true overlap in the initial space. The proposed method is illustrated on four real data sets of different types (categorical, mixed, functional, and network) and is implemented in the R package ClusVis.
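To make the two ingredients named in the abstract concrete, the following minimal sketch computes the conditional membership probabilities \(t_{ik}\) of a spherical Gaussian mixture on \(\mathbb{R}^{2}\) and an average entropy of those probabilities as a measure of overlap. It assumes identity covariances, known centers and mixing proportions, and the entropy form \(E = -\frac{1}{n}\sum_i\sum_k t_{ik}\ln t_{ik}\); the helper names `spherical_posteriors` and `empirical_entropy` are hypothetical, and this is an illustration only, not the ClusVis procedure for choosing the drawn centers.

```python
import numpy as np


def spherical_posteriors(x, centers, weights):
    """Conditional membership probabilities t_ik of each point x_i.

    Assumption (illustrative): every component is a spherical Gaussian in
    R^2 with identity covariance, so only centers and proportions matter.
    """
    x = np.asarray(x, dtype=float)              # (n, 2) points
    centers = np.asarray(centers, dtype=float)  # (K, 2) component centers
    weights = np.asarray(weights, dtype=float)  # (K,) mixing proportions
    # Squared Euclidean distances ||x_i - mu_k||^2, shape (n, K).
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Unnormalized log posteriors; the shared Gaussian constant cancels out.
    logits = np.log(weights)[None, :] - 0.5 * sq_dist
    logits -= logits.max(axis=1, keepdims=True)  # stabilize exp()
    t = np.exp(logits)
    return t / t.sum(axis=1, keepdims=True)


def empirical_entropy(t):
    """Assumed form E = -(1/n) sum_i sum_k t_ik ln t_ik, with 0 ln 0 = 0."""
    t = np.asarray(t, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(t > 0, t * np.log(t), 0.0)
    return float(-terms.sum(axis=1).mean())


# Usage: the same labelling drawn with close and with well-separated centers.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)
noise = rng.standard_normal((500, 2))
for gap in (1.0, 4.0):
    centers = np.array([[0.0, 0.0], [gap, 0.0]])
    x = centers[labels] + noise
    t = spherical_posteriors(x, centers, [0.5, 0.5])
    print(f"gap={gap}: E={empirical_entropy(t):.3f}")
```

Moving the centers apart lowers E, so the entropy of a drawing reflects how strongly its clusters overlap. This is the sense in which an entropic measure can compare the overlap shown in a two-dimensional picture with the overlap of the initial clustering.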

Footnotes
1
Note that the normalized empirical entropy E could additionally be divided by \(\ln K\). We have not done so, in order to follow the seminal definition of Hathaway (1986). In any case, this rescaling has no impact on the related parameter estimation in Section 3.3.
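The bound that makes division by \(\ln K\) a natural normalization can be written out. The sketch below assumes, as a hypothetical reconstruction since the main-text definition is not reproduced here, that E is the per-observation average entropy of the conditional membership probabilities \(t_{ik}\):

```latex
\begin{align*}
E &= -\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K} t_{ik}\ln t_{ik},
  \qquad t_{ik}\ge 0,\ \sum_{k=1}^{K} t_{ik}=1,\\
0 &\le -\sum_{k=1}^{K} t_{ik}\ln t_{ik} \le \ln K
  \quad\text{for each } i \text{ (0 for a hard assignment, } \ln K \text{ for } t_{ik}=1/K\text{)},\\
0 &\le E \le \ln K
  \quad\Longrightarrow\quad 0 \le \frac{E}{\ln K} \le 1 \qquad (K\ge 2).
\end{align*}
```

Since \(\ln K\) is a fixed positive constant once K is chosen, any equality or ordering between two entropies computed with the same K is unchanged by this rescaling.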
 
2
We never observed this event in the experiments of this paper. Should it happen, however, we propose to draw at least the center of the unobserved cluster on the graph.
 
References
Audigier, V., Husson, F., & Josse, J. (2016a). Multiple imputation for continuous variables using a Bayesian principal component analysis. Journal of Statistical Computation and Simulation, 86(11), 2140–2156.
Audigier, V., Husson, F., & Josse, J. (2016b). A principal component method to impute missing values for mixed data. Advances in Data Analysis and Classification, 10(1), 5–26.
Benaglia, T., Chauveau, D., & Hunter, D.R. (2009). An EM-like algorithm for semi- and nonparametric estimation in multivariate mixtures. Journal of Computational and Graphical Statistics, 18, 505–526.
Bezdek, J.C., Pal, N.R., Keller, J., & Krishnapuram, R. (1999). Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. USA: Kluwer Academic Publishers.
Biernacki, C., Celeux, G., & Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 719–725.
Bishop, C.M., Svensén, M., & Williams, C.K. (1998). GTM: The generative topographic mapping. Neural Computation, 10(1), 215–234.
Bouveyron, C., & Jacques, J. (2011). Model-based clustering of time series in group-specific functional subspaces. Advances in Data Analysis and Classification, 5(4), 281–300.
Celeux, G., & Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition, 28(5), 781–793.
Chavent, M., & Kuentz-Simonet, V. (2012). Orthogonal rotation in PCAMIX. Advances in Data Analysis and Classification, 6(2), 131–146.
Cox, T., & Cox, M. (2001). Multidimensional Scaling. Chapman and Hall.
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 1–38.
Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.
Gollini, I., & Murphy, T. (2014). Mixture of latent trait analyzers for model-based clustering of categorical data. Statistics and Computing, 24(4), 569–588.
Goodman, L. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61(2), 215–231.
Greenacre, M. (2017). Correspondence Analysis in Practice. CRC Press.
Hathaway, R.J. (1986). Another interpretation of the EM algorithm for mixture distributions. Statistics and Probability Letters, 4, 53–56.
Hennig, C. (2004). Asymmetric linear dimension reduction for classification. Journal of Computational and Graphical Statistics, 13(4), 930–945.
Hennig, C. (2010). Methods for merging Gaussian mixture components. Advances in Data Analysis and Classification, 4, 3–34.
Jajuga, K., Sokołowski, A., & Bock, H. (2002). Classification, Clustering and Data Analysis: Recent Advances and Applications. Berlin Heidelberg New York: Springer.
Josse, J., Chavent, M., Liquet, B., & Husson, F. (2012). Handling missing values with regularized iterative multiple correspondence analysis. Journal of Classification, 29(1), 91–116.
Josse, J., Pagès, J., & Husson, F. (2011). Multiple imputation in principal component analysis. Advances in Data Analysis and Classification, 5(3), 231–246.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1), 59–69.
Larose, C. (2015). Model-Based Clustering of Incomplete Data. PhD thesis, University of Connecticut.
Lê, S., Josse, J., & Husson, F. (2008). FactoMineR: An R package for multivariate analysis. Journal of Statistical Software, 25(1), 1–18.
Lebret, R., Iovleff, S., Langrognet, F., Biernacki, C., Celeux, G., & Govaert, G. (2015). Rmixmod: The R package of the model-based unsupervised, supervised and semi-supervised classification Mixmod library. Journal of Statistical Software, 67(6), 241–270.
Lim, T.-S., Loh, W.-Y., & Shih, Y.-S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40(3), 203–228.
Marbac, M., Biernacki, C., & Vandewalle, V. (2016). Latent class model with conditional dependency per modes to cluster categorical data. Advances in Data Analysis and Classification, 10(2), 183–207.
Marbac, M., Biernacki, C., & Vandewalle, V. (2017). Model-based clustering of Gaussian copulas for mixed data. Communications in Statistics – Theory and Methods, 46(23), 11635–11656.
Mazo, G. (2017). A semiparametric and location-shift copula-based mixture model. Journal of Classification, 34(3), 444–464.
McLachlan, G., & Peel, D. (2004). Finite Mixture Models. New York: Wiley.
McNicholas, P. (2016). Mixture Model-Based Classification. CRC Press.
McNicholas, P., & Scrucca, L. (2013). Dimension reduction for model-based clustering via mixtures of multivariate t-distributions. Statistics & Probability Letters, 7, 321–338.
McParland, D., & Gormley, I.C. (2016). Model based clustering for mixed data: clustMD. Advances in Data Analysis and Classification, 10(2), 155–169.
Moustaki, I., & Papageorgiou, I. (2005). Latent class models for mixed variables with applications in archaeometry. Computational Statistics & Data Analysis, 48(3), 659–675.
Punzo, A., & Ingrassia, S. (2016). Clustering bivariate mixed-type data via the cluster-weighted model. Computational Statistics, 31(3), 989–1013.
Ramsay, J.O., & Silverman, B.W. (2005). Functional Data Analysis (2nd ed.). Springer Series in Statistics. New York: Springer.
Samé, A., Chamroukhi, F., Govaert, G., & Aknin, P. (2011). Model-based clustering and segmentation of time series with changes in regime. Advances in Data Analysis and Classification, 5, 301–321.
Schlimmer, J. (1987). Concept acquisition through representational adjustment. PhD thesis, Department of Information and Computer Science, University of California.
Van der Heijden, P., & Escofier, B. (2003). Multiple correspondence analysis with missing data. In Analyse des correspondances: Recherches au cœur de l’analyse des données (pp. 152–170).
Verbanck, M., Josse, J., & Husson, F. (2015). Regularised PCA to denoise and visualise data. Statistics and Computing, 25(2), 471–486.
Vesanto, J., & Alhoniemi, E. (2000). Clustering of the self-organizing map. IEEE Transactions on Neural Networks, 11(3), 586–600.
Xanthopoulos, P., Pardalos, P.M., & Trafalis, T.B. (2013). Linear Discriminant Analysis.
Young, F.W. (1987). Multidimensional Scaling: History, Theory, and Applications. Lawrence Erlbaum Associates.
Metadata
Title
Gaussian-Based Visualization of Gaussian and Non-Gaussian-Based Clustering
Authors
Christophe Biernacki
Matthieu Marbac
Vincent Vandewalle
Publication date
11.07.2020
Publisher
Springer US
Published in
Journal of Classification / Issue 1/2021
Print ISSN: 0176-4268
Electronic ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-020-09369-y
