
2017 | OriginalPaper | Book chapter

Feature Ranking of Large, Robust, and Weighted Clustering Result

Written by: Mirka Saarela, Joonas Hämäläinen, Tommi Kärkkäinen

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing


Abstract

A clustering result needs to be interpreted and evaluated for knowledge discovery. When the clustered data represent a sample from a population with known sample-to-population alignment weights, both the clustering and the evaluation techniques need to take these weights into account. The purpose of this article is to advance automatic knowledge discovery from a robust clustering result at the population level. To this end, we derive a novel ranking method by generalizing the computation of the Kruskal-Wallis H test statistic from the sample level to the population level with two different approaches. Applying these generalizations both to the input variables used in clustering and to metadata yields an automatic variable ranking that can be used to explain and distinguish the groups of the population. The ranking method is illustrated with an open dataset and then applied to advance educational knowledge discovery from large-scale international student assessment data, whose robust clustering into disjoint groups on three different levels of abstraction was performed in [19].
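The idea of ranking variables by a weighted Kruskal-Wallis H statistic can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's method: the abstract does not specify its two generalizations, so the sketch simply lets each observation count as `weight` population units when computing midranks (equivalent to replicating the observation for integer weights) and omits any tie correction. The function names `weighted_kruskal_wallis_h` and `rank_features` are hypothetical.

```python
from collections import defaultdict


def weighted_kruskal_wallis_h(values, labels, weights=None):
    """Kruskal-Wallis H where each observation counts as `weight` units.

    Assumption (not taken from the chapter): weights act like replication
    counts. With unit weights this is the classical H without tie correction.
    """
    if weights is None:
        weights = [1.0] * len(values)

    # Total weight carried by each distinct value.
    weight_at = defaultdict(float)
    for v, w in zip(values, weights):
        weight_at[v] += w

    # Weighted midrank of each distinct value: weight below it plus the
    # middle position of its own tied block.
    midrank = {}
    below = 0.0
    for v in sorted(weight_at):
        midrank[v] = below + (weight_at[v] + 1.0) / 2.0
        below += weight_at[v]
    total = below  # population size implied by the weights

    # Weighted rank sum and weighted size of each cluster.
    rank_sum = defaultdict(float)
    group_weight = defaultdict(float)
    for v, g, w in zip(values, labels, weights):
        rank_sum[g] += w * midrank[v]
        group_weight[g] += w

    spread = sum(rank_sum[g] ** 2 / group_weight[g] for g in rank_sum)
    return 12.0 / (total * (total + 1.0)) * spread - 3.0 * (total + 1.0)


def rank_features(columns, labels, weights=None):
    """Return feature names ordered from most to least cluster-separating."""
    scores = {
        name: weighted_kruskal_wallis_h(col, labels, weights)
        for name, col in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

A larger H suggests that a variable's distribution differs more strongly between the clusters, so sorting variables by H in descending order gives a simple ranking. Whether weights should enter through the ranks, the group sizes, or both is a design choice that the chapter itself addresses with its two approaches.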


References
1.
Acar, E.F., Sun, L.: A generalized Kruskal-Wallis test incorporating group uncertainty with application to genetic association studies. Biometrics 69(2), 427–435 (2013)
2.
Aggarwal, C.C., Reddy, C.K.: Data Clustering: Algorithms and Applications. CRC Press, Boca Raton (2013)
3.
Äyrämö, S.: Knowledge Mining Using Robust Clustering. Jyväskylä Studies in Computing, vol. 63. University of Jyväskylä, Jyväskylä (2006)
4.
Ceccarelli, M., Maratea, A.: Assessing clustering reliability and features informativeness by random permutations. In: Apolloni, B., Howlett, R.J., Jain, L. (eds.) KES 2007. LNCS (LNAI), vol. 4694, pp. 878–885. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74829-8_107
5.
Cord, A., Ambroise, C., Cocquerez, J.P.: Feature selection in robust clustering based on Laplace mixture. Pattern Recogn. Lett. 27(6), 627–635 (2006)
6.
Crabtree, D., Andreae, P., Gao, X.: QC4 - a clustering evaluation method. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 59–70. Springer, Heidelberg (2007). doi:10.1007/978-3-540-71701-0_9
7.
Dash, M., Liu, H.: Feature selection for clustering. In: Terano, T., Liu, H., Chen, A.L.P. (eds.) PAKDD 2000. LNCS (LNAI), vol. 1805, pp. 110–121. Springer, Heidelberg (2000). doi:10.1007/3-540-45571-X_13
9.
Elamir, E.A.: Kruskal-Wallis test: a graphical way. Int. J. Stat. Appl. 5(3), 113–119 (2015)
10.
Fagin, R., Kumar, R., Sivakumar, D.: Comparing top K lists. In: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 28–36. Society for Industrial and Applied Mathematics (2003)
11.
Fung, P.C.G., Morstatter, F., Liu, H.: Feature selection strategy in text classification. In: Huang, J.Z., Cao, L., Srivastava, J. (eds.) PAKDD 2011. LNCS (LNAI), vol. 6634, pp. 26–37. Springer, Heidelberg (2011). doi:10.1007/978-3-642-20841-6_3
12.
Gifi, A.: Nonlinear Multivariate Analysis. Wiley, Hoboken (1991)
13.
Kim, Y., Lee, S.: A clustering validity assessment index. In: Whang, K.-Y., Jeon, J., Shim, K., Srivastava, J. (eds.) PAKDD 2003. LNCS (LNAI), vol. 2637, pp. 602–608. Springer, Heidelberg (2003). doi:10.1007/3-540-36175-8_60
14.
Koskela, A.: Exploring the differences of Finnish students in PISA 2003 and 2012 using educational data mining. Jyväskylä Studies in Computing. University of Jyväskylä, Jyväskylä (2016)
15.
Kruskal, W., Wallis, W.: Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc. 47(260), 583–621 (1952)
16.
OECD: PISA 2012 Results: Excellence Through Equity: Giving Every Student the Chance to Succeed (Volume II). PISA, OECD Publishing (2013)
17.
OECD: PISA 2012 Technical report. OECD Publishing (2014)
18.
Rutkowski, L., Rutkowski, D.: Getting it “better”: the importance of improving background questionnaires in international large-scale assessment. J. Curric. Stud. 42(3), 411–430 (2010)
19.
Saarela, M., Kärkkäinen, T.: Do country stereotypes exist in PISA? A clustering approach for large, sparse, and weighted data. In: Proceedings of the 8th International Conference on Educational Data Mining, pp. 156–163 (2015)
20.
Saarela, M., Kärkkäinen, T.: Weighted clustering of sparse educational data. In: Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 337–342 (2015)
21.
22.
Tölgyesi, C., Bátori, Z., Erdős, L.: Using statistical tests on relative ecological indicator values to compare vegetation units-different approaches and weighting methods. Ecol. Ind. 36, 441–446 (2014)
23.
Verde, R., Lechevallier, Y., Chavent, M.: Symbolic clustering interpretation and visualization. Electron. J. Symbolic Data Anal. 1(1), 1 (2003)
24.
Yang, H., Zhao, D., Cao, L., Sun, F.: A precise and robust clustering approach using homophilic degrees of graph kernel. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J.Z., Wang, R. (eds.) PAKDD 2016. LNCS (LNAI), vol. 9652, pp. 257–270. Springer, Cham (2016). doi:10.1007/978-3-319-31750-2_21
25.
Zaki, M.J., Meira Jr., W.: Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, Cambridge (2014)
Metadata
Title
Feature Ranking of Large, Robust, and Weighted Clustering Result
Written by
Mirka Saarela
Joonas Hämäläinen
Tommi Kärkkäinen
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-57454-7_8