Published in: Progress in Artificial Intelligence 1/2022

18-10-2021 | Regular Paper

Semi-causal decision trees

Authors: Ana Rita Nogueira, Carlos Abreu Ferreira, João Gama


Abstract

Classification algorithms typically rely on correlation analysis to make decisions. However, these decisions and the models behind them are not easily understood by the typical user. Causal discovery is the field that studies how to find causal relationships in observational data. Although highly interpretable, causal discovery algorithms tend to perform poorly in classification problems. This paper proposes a hybrid decision tree approach (SC tree) that combines causal discovery with correlation analysis through a custom metric for splitting the data during tree construction (semi-causal gain ratio). In experiments on ten binary data sets, the proposed methodology achieved a significantly lower mean error rate (11.26%) than the causal baselines CDT-PS (23.67%) and CDT-SPS (25.14%), closely matching the performance of J48 (10.20%), used as the correlation baseline. In addition, on discrete data sets the proposed approach substantially improved on PC (16.17% against 28.07% mean error rate).
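The semi-causal gain ratio itself is not defined in this excerpt, so it is not reproduced here. As background only, the following minimal sketch shows the plain, correlation-based gain ratio used by C4.5 (the algorithm that J48 implements in WEKA), which is the kind of split metric the abstract refers to. The function names `entropy` and `gain_ratio` are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Plain C4.5-style gain ratio of a discrete feature.

    NOTE: this is the standard correlation-based metric, not the
    semi-causal variant proposed in the paper.
    """
    n = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Information gain: label entropy minus the weighted entropy after the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalises features with many distinct values.
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["a", "a", "b", "b"]
print(gain_ratio([0, 0, 1, 1], labels))  # feature separates the classes perfectly
print(gain_ratio([0, 1, 0, 1], labels))  # feature carries no information
```

A tree learner would evaluate this score for every candidate attribute at a node and split on the highest-scoring one. Per the abstract, the SC tree replaces this score with a metric that also takes causal information into account.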


Footnotes
2
“For each of the separate levels of the co-variable set \(h = 1, 2, \ldots, q\), the response variable is distributed at random with respect to the sub-populations, i.e. the data in the respective rows of the \(h\)th table can be regarded as a successive set of simple random samples of sizes \(\{N_{hi\cdot}\}\) from a fixed population corresponding to the marginal total distribution of the response variable \(\{N_{h\cdot j}\}\)” [10].
 
3
We used the WEKA jar file provided kindly by the authors to compare with our methodology.
 
4
We used the WEKA implementation.
 
Literature
1.
Agresti, A.: An introduction to categorical data analysis. Wiley, New York (2018)
2.
Birch, M.: The detection of partial association, I: the 2 \(\times\) 2 case. J. Roy. Stat. Soc.: Ser. B (Methodol.) 26(2), 313–324 (1964)
3.
4.
DeFries, R., Agarwala, M., Baquie, S., Choksi, P., Khanwilkar, S., Mondal, P., Nagendra, H., Urpelainen, J.: Improved household living standards can restore dry tropical forests. Biotropica (2021)
16.
Marx, A., Vreeken, J.: Testing conditional independence on discrete data using stochastic complexity. arXiv preprint arXiv:1903.04829 (2019)
17.
Mooij, J.M., Cremers, J., Others: An empirical study of one of the simplest causal prediction algorithms. In: UAI 2015 Workshop on Advances in Causal Inference, 1504, pp. 30–39 (2015)
18.
Pearl, J., Verma, T.S.: A theory of inferred causation. In: Studies in Logic and the Foundations of Mathematics, vol. 134, pp. 789–811. Elsevier (1995)
19.
Piltaver, R., Luštrek, M., Gams, M., Martinšić-Ipšić, S.: What makes classification trees comprehensible? Expert Syst. Appl. 62, 333–346 (2016)
22.
Spirtes, P., Glymour, C.N., Scheines, R., Heckerman, D.: Causation, prediction, and search. MIT Press (2000)
23.
Tangirala, S.: Evaluating the impact of Gini index and information gain on classification using decision tree classifier algorithm. Int. J. Adv. Comput. Sci. Appl. 11(2), 612–619 (2020)
24.
Theil, H.: Statistical decomposition analysis; with applications in the social and administrative sciences. Tech. rep. (1972)
26.
Yu, K., Li, J., Liu, L.: A Review on Algorithms for Constraint-based Causal Discovery (2016)
28.
Zhang, X., Baral, C., Kim, S.: An algorithm to learn causal relations between genes from steady state data: Simulation and its application to melanoma dataset. In: Miksch, S., Hunter, J., Keravnou, E.T. (eds.) Artificial Intelligence in Medicine, pp. 524–534. Springer, Berlin (2005)
29.
Zhou, Q., Liao, F., Mou, C., Wang, P.: Measuring interpretability for different types of machine learning models. In: M. Ganji, L. Rashidi, B.C.M. Fung, C. Wang (eds.) Trends and Applications in Knowledge Discovery and Data Mining - PAKDD 2018 Workshops, BDASC, BDM, ML4Cyber, PAISI, DaMEMO, Melbourne, VIC, Australia, June 3, 2018, Revised Selected Papers, Lecture Notes in Computer Science, vol. 11154, pp. 295–308. Springer (2018). https://doi.org/10.1007/978-3-030-04503-6_29
Metadata
Title
Semi-causal decision trees
Authors
Ana Rita Nogueira
Carlos Abreu Ferreira
João Gama
Publication date
18-10-2021
Publisher
Springer Berlin Heidelberg
Published in
Progress in Artificial Intelligence / Issue 1/2022
Print ISSN: 2192-6352
Electronic ISSN: 2192-6360
DOI
https://doi.org/10.1007/s13748-021-00262-2
