2023 | OriginalPaper | Chapter

Discovering Diverse Top-K Characteristic Lists

Authors: Antonio Lopez-Martinez-Carrasco, Hugo M. Proença, Jose M. Juarez, Matthijs van Leeuwen, Manuel Campos

Published in: Advances in Intelligent Data Analysis XXI

Publisher: Springer Nature Switzerland

Abstract

In this work, we define the new problem of finding diverse top-k characteristic lists, which provide different statistically robust explanations of the same dataset. This type of problem is often encountered in complex domains, such as medicine, in which a single model cannot consistently explain the already established ground truth, so a diverse set of models is needed. We propose a solution to this new problem based on Subgroup Discovery (SD), with diversity described in terms of coverage and descriptions. The characteristic lists are obtained using an extension of SD, in which a subgroup identifies a set of relations between attributes (description) with respect to an attribute of interest (target). In particular, the generation of these characteristic lists is driven by the Minimum Description Length (MDL) principle, which is based on the idea that the best explanation of the data is the one that achieves the greatest compression. Finally, we also propose an algorithm called GMSL, which is simple and easy to interpret and obtains a collection of diverse top-k characteristic lists.
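
To make the notions of description and coverage more concrete, the following minimal Python sketch represents a subgroup description as a conjunction of attribute-value conditions, computes which rows of a toy dataset it covers, and compares two coverages with a Jaccard overlap. Everything here is an illustrative assumption: the toy data, the function names `coverage` and `jaccard_overlap`, and the use of Jaccard overlap as a diversity indicator are not taken from the chapter, and this is not the GMSL algorithm or its MDL-based score.

```python
from typing import Dict, List, Set

Row = Dict[str, str]
# Hypothetical representation: a description is a conjunction of
# attribute == value conditions.
Description = Dict[str, str]


def coverage(description: Description, rows: List[Row]) -> Set[int]:
    """Return the indices of the rows that satisfy every condition."""
    return {
        i
        for i, row in enumerate(rows)
        if all(row.get(attr) == val for attr, val in description.items())
    }


def jaccard_overlap(a: Set[int], b: Set[int]) -> float:
    """Share of covered rows that two coverages have in common (0 = disjoint)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy dataset (invented for illustration); "outcome" plays the role of the target.
rows = [
    {"age": "old", "smoker": "yes", "outcome": "ill"},
    {"age": "old", "smoker": "no", "outcome": "ill"},
    {"age": "young", "smoker": "yes", "outcome": "ill"},
    {"age": "young", "smoker": "no", "outcome": "healthy"},
]

cov_old = coverage({"age": "old"}, rows)
cov_smoker = coverage({"smoker": "yes"}, rows)
overlap = jaccard_overlap(cov_old, cov_smoker)
print(cov_old, cov_smoker, overlap)
```

Under this assumed measure, a lower overlap means the two descriptions explain largely different parts of the data, which is one simple way to read "diversity in terms of coverage".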

Metadata
Title
Discovering Diverse Top-K Characteristic Lists
Authors
Antonio Lopez-Martinez-Carrasco
Hugo M. Proença
Jose M. Juarez
Matthijs van Leeuwen
Manuel Campos
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-30047-9_21
