
2023 | OriginalPaper | Book chapter

Discovering Diverse Top-K Characteristic Lists

Written by: Antonio Lopez-Martinez-Carrasco, Hugo M. Proença, Jose M. Juarez, Matthijs van Leeuwen, Manuel Campos

Published in: Advances in Intelligent Data Analysis XXI

Publisher: Springer Nature Switzerland

Abstract

In this work, we define the new problem of finding diverse top-k characteristic lists, which provide different statistically robust explanations of the same dataset. This type of problem often arises in complex domains, such as medicine, in which a single model cannot consistently explain the already established ground truth, so a diversity of models is needed. We propose a solution to this problem based on Subgroup Discovery (SD), and describe diversity in terms of coverage and descriptions. The characteristic lists are obtained using an extension of SD, in which a subgroup identifies a set of relations between attributes (the description) with respect to an attribute of interest (the target). In particular, the generation of these characteristic lists is driven by the Minimum Description Length (MDL) principle, which is based on the idea that the best explanation of the data is the one that achieves the greatest compression. Finally, we propose GMSL, a simple and easy-to-interpret algorithm that obtains a collection of diverse top-k characteristic lists.
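To make the two notions of diversity named in the abstract more concrete, the following is a minimal sketch and not the GMSL algorithm or the paper's actual definitions. It assumes a subgroup description is a conjunction of attribute-value selectors, and it uses Jaccard similarity as a stand-in overlap measure for both coverage (which rows are covered) and descriptions (which selectors are used). The toy rows and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List

Row = Dict[str, str]
# Assumption: a description is a conjunction of attribute == value selectors.
Description = Dict[str, str]


def coverage(description: Description, rows: List[Row]) -> FrozenSet[int]:
    """Return the indices of the rows that satisfy every selector."""
    return frozenset(
        i for i, row in enumerate(rows)
        if all(row.get(attr) == val for attr, val in description.items())
    )


def jaccard(a: FrozenSet, b: FrozenSet) -> float:
    """Jaccard similarity of two sets; defined here as 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coverage_overlap(d1: Description, d2: Description, rows: List[Row]) -> float:
    """Overlap in terms of coverage: how many rows the two subgroups share."""
    return jaccard(coverage(d1, rows), coverage(d2, rows))


def description_overlap(d1: Description, d2: Description) -> float:
    """Overlap in terms of descriptions: how many selectors the two share."""
    return jaccard(frozenset(d1.items()), frozenset(d2.items()))


# Hypothetical toy dataset; "outcome" plays the role of the target attribute.
rows = [
    {"age": "old", "smoker": "yes", "outcome": "ill"},
    {"age": "old", "smoker": "no", "outcome": "ill"},
    {"age": "young", "smoker": "yes", "outcome": "healthy"},
    {"age": "young", "smoker": "no", "outcome": "healthy"},
]

s1 = {"age": "old"}
s2 = {"smoker": "yes"}
s3 = {"age": "old", "smoker": "yes"}

print(coverage_overlap(s1, s2, rows))  # share one covered row out of three
print(description_overlap(s1, s3))     # share one selector out of two
```

Under these assumptions, two lists would count as more diverse the lower both overlap values are. How the paper actually combines coverage and description diversity, and how MDL scores each list, is not stated in the abstract.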

Metadata
Title
Discovering Diverse Top-K Characteristic Lists
Written by
Antonio Lopez-Martinez-Carrasco
Hugo M. Proença
Jose M. Juarez
Matthijs van Leeuwen
Manuel Campos
Copyright year
2023
DOI
https://doi.org/10.1007/978-3-031-30047-9_21
