2019 | OriginalPaper | Chapter

Simultaneous Supervised and Unsupervised Classification Modeling for Assessing Cluster Analysis and Improving Results Interpretability

Authors: Mario Fordellone, Maurizio Vichi

Published in: Statistical Learning of Complex Data

Publisher: Springer International Publishing

Abstract

In unsupervised classification, the unknown number of clusters and the lack of inferential tools for assessing and interpreting the final partition are important limitations that can undermine the reliability of the results. In this work, we propose combining unsupervised classification with supervised methods to improve the assessment and interpretation of the obtained partition. In particular, the approach combines the k-means (KM) clustering method with logistic regression (LR) modeling, yielding an algorithm that evaluates the partition identified by KM, assesses the correct number of clusters, and verifies the selection of the most important variables. An application to real data illustrates the utility of the proposed approach.
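The general idea of pairing KM with LR can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in the abstract. It only assumes one plausible reading: cluster the data with KM, fit a multinomial LR that treats the cluster labels as the response, and report how often the fitted model reproduces those labels. The function names (`kmeans`, `fit_softmax`, `agreement`), the farthest-point initialization, the gradient-descent settings, and the toy data are all assumptions made for illustration.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(X, k, iters=50):
    """Lloyd's k-means with a deterministic farthest-point start (an assumption)."""
    centers = [list(X[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(list(max(X, key=lambda x: min(sq_dist(x, c) for c in centers))))
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center for every observation.
        new_labels = [min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in X]
        if new_labels == labels:
            break  # partition no longer changes
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def predict_proba(W, x):
    """Softmax class probabilities for one observation."""
    xb = list(x) + [1.0]  # append the intercept term
    scores = [sum(w * v for w, v in zip(row, xb)) for row in W]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_softmax(X, y, k, lr=0.1, epochs=300):
    """Multinomial logistic regression fitted by batch gradient descent."""
    p = len(X[0])
    W = [[0.0] * (p + 1) for _ in range(k)]  # last column is the intercept
    for _ in range(epochs):
        grad = [[0.0] * (p + 1) for _ in range(k)]
        for x, label in zip(X, y):
            probs = predict_proba(W, x)
            xb = list(x) + [1.0]
            for c in range(k):
                err = probs[c] - (1.0 if c == label else 0.0)
                for j in range(p + 1):
                    grad[c][j] += err * xb[j]
        for c in range(k):
            for j in range(p + 1):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W

def agreement(X, k):
    """Share of observations whose KM label is reproduced by the fitted LR."""
    labels = kmeans(X, k)
    W = fit_softmax(X, labels, k)
    hits = 0
    for x, label in zip(X, labels):
        probs = predict_proba(W, x)
        hits += probs.index(max(probs)) == label
    return hits / len(X)

# Toy data (hypothetical): two well-separated 3x3 grids of points.
offsets = [-0.4, 0.0, 0.4]
X = [(-3 + dx, -3 + dy) for dx in offsets for dy in offsets]
X += [(3 + dx, 3 + dy) for dx in offsets for dy in offsets]

for k in (2, 3):
    print(k, agreement(X, k))
```

In-sample agreement of this kind tends to be optimistic, because k-means clusters are separated by linear boundaries that a logistic model can often recover. A more careful assessment would score the classifier on held-out data or examine the inferential output of the fitted model.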

Metadata
Title
Simultaneous Supervised and Unsupervised Classification Modeling for Assessing Cluster Analysis and Improving Results Interpretability
Authors
Mario Fordellone
Maurizio Vichi
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-21140-0_3
