2017 | OriginalPaper | Chapter

Explaining Deviating Subsets Through Explanation Networks

Authors: Antti Ukkonen, Vladimir Dzyuba, Matthijs van Leeuwen

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

We propose a novel approach to finding explanations of deviating subsets, often called subgroups. Existing approaches to subgroup discovery rely on various quality measures, yet they often fail to find subgroup sets that are diverse, are of high quality, and, most importantly, provide good explanations of the deviations that occur in the data.
To tackle this issue, we introduce explanation networks, which provide a holistic view of all candidate subgroups and how they relate to each other, offering elegant ways to select high-quality yet diverse subgroup sets. In an explanation network, nodes represent subgroups and weighted edges represent the extent to which one subgroup explains another. Explanatory strength is defined by extending ideas from database causality, in which interventions are used to quantify the effect of one query on another.
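The idea of an intervention-based edge weight can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the chapter: the toy rows, the quality measure (absolute difference between the subgroup mean and the overall mean of a numeric target `y`), and the edge weight (the fraction of a subgroup's deviation that disappears once the rows covered by another subgroup are removed). The function names `quality`, `explanatory_strength`, and `build_network` are hypothetical.

```python
from statistics import mean

# Hypothetical toy data: the target y is high exactly when a == 1;
# the condition b == 1 merely overlaps with a == 1.
ROWS = [
    {"a": 1, "b": 1, "y": 10},
    {"a": 1, "b": 1, "y": 10},
    {"a": 1, "b": 0, "y": 10},
    {"a": 0, "b": 1, "y": 0},
    {"a": 0, "b": 0, "y": 0},
    {"a": 0, "b": 0, "y": 0},
    {"a": 0, "b": 0, "y": 0},
    {"a": 0, "b": 0, "y": 0},
]

# Candidate subgroups as named predicates over a row (assumed representation).
SUBGROUPS = {
    "a=1": lambda r: r["a"] == 1,
    "b=1": lambda r: r["b"] == 1,
}


def quality(rows, covers):
    """Assumed quality measure: |mean of y inside the subgroup - overall mean|."""
    inside = [r["y"] for r in rows if covers(r)]
    if not inside or not rows:
        return 0.0
    return abs(mean(inside) - mean(r["y"] for r in rows))


def explanatory_strength(rows, explainer, explained):
    """Fraction of `explained`'s deviation that vanishes after the intervention
    of removing every row covered by `explainer` (an illustrative definition)."""
    before = quality(rows, explained)
    if before == 0.0:
        return 0.0
    remaining = [r for r in rows if not explainer(r)]
    after = quality(remaining, explained)
    return max(0.0, (before - after) / before)


def build_network(rows, subgroups):
    """Return {(explainer, explained): weight} for all pairs with weight > 0."""
    network = {}
    for name_a, cover_a in subgroups.items():
        for name_b, cover_b in subgroups.items():
            if name_a == name_b:
                continue
            weight = explanatory_strength(rows, cover_a, cover_b)
            if weight > 0.0:
                network[(name_a, name_b)] = weight
    return network


network = build_network(ROWS, SUBGROUPS)
print(network)
```

On this toy data the only edge runs from `a=1` to `b=1`: removing the rows with `a == 1` eliminates the deviation of `b=1` entirely, whereas removing the rows with `b == 1` does not reduce the deviation of `a=1`.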
Given an explanation network, existing network analysis techniques can be used for subgroup discovery. In particular, we study the use of PageRank for pattern ranking and of seed selection (from influence maximization) for pattern set selection. Experiments on synthetic and real data show that the proposed approach finds subgroup sets that are more likely to capture the generative processes of the data than other methods.
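The two network analysis steps named above can be sketched on a small, hand-made explanation network. The sketch rests on several assumptions that are not from the chapter: the edge weights are invented; PageRank is run on reversed edges so that score accumulates at explaining subgroups; and seed selection uses a deterministic one-hop spread as a simple stand-in for the simulation-based influence spread of influence maximization. The function names are hypothetical.

```python
# Hypothetical explanation network: (explainer, explained) -> weight in [0, 1].
NODES = ["A", "B", "C", "D", "E"]
EXPLAINS = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.3, ("D", "E"): 0.7}


def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; score flows from src to dst."""
    n = len(nodes)
    out_weight = {u: 0.0 for u in nodes}
    for (u, _), w in edges.items():
        out_weight[u] += w
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Nodes without outgoing weight spread their score uniformly.
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0.0)
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for (u, v), w in edges.items():
            if w > 0.0:
                new[v] += damping * rank[u] * w / out_weight[u]
        rank = new
    return rank


def one_hop_spread(seeds, nodes, edges):
    """Expected number of subgroups that are seeds or are explained by at
    least one seed, treating edge weights as independent probabilities."""
    total = 0.0
    for v in nodes:
        if v in seeds:
            total += 1.0
            continue
        miss = 1.0
        for s in seeds:
            miss *= 1.0 - edges.get((s, v), 0.0)
        total += 1.0 - miss
    return total


def greedy_seeds(nodes, edges, k):
    """Greedily add the subgroup with the largest resulting one-hop spread."""
    seeds = []
    for _ in range(min(k, len(nodes))):
        best = max(
            (u for u in nodes if u not in seeds),
            key=lambda u: one_hop_spread(seeds + [u], nodes, edges),
        )
        seeds.append(best)
    return seeds


# Assumption: reverse the edges so that explained subgroups "vote" for explainers.
reversed_edges = {(v, u): w for (u, v), w in EXPLAINS.items()}
rank = pagerank(NODES, reversed_edges)
seeds = greedy_seeds(NODES, EXPLAINS, k=2)
print(sorted(rank, key=rank.get, reverse=True))
print(seeds)
```

Under these assumptions, subgroup `A` receives the highest PageRank score because it explains two other subgroups strongly, and the greedy selection picks `A` and then `D`, which together cover both clusters of the toy network. A faithful influence maximization procedure would replace `one_hop_spread` with an estimate of the spread under a chosen diffusion model.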

Metadata
Title
Explaining Deviating Subsets Through Explanation Networks
Authors
Antti Ukkonen
Vladimir Dzyuba
Matthijs van Leeuwen
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-71246-8_26
