
2021 | OriginalPaper | Chapter

Random Forest Model and Sample Explainer for Non-experts in Machine Learning – Two Case Studies

Authors: D. Petkovic, A. Alavi, D. Cai, M. Wong

Published in: Pattern Recognition. ICPR International Workshops and Challenges

Publisher: Springer International Publishing


Abstract

Machine Learning (ML) is becoming an increasingly critical technology in many areas, such as health and business, as well as in everyday applications of significant societal importance. However, the lack of explainability, that is, the ability of ML systems to explain how they work, poses significant challenges to their adoption and verification and to ensuring the trust of users and the general public. Explainability applies both at the model level (related to the whole dataset) and at the sample level (related to specific samples). We present RFEX, a novel integrated Random Forest Model and Sample Explainer. RFEX is specifically designed for an important class of users: those who are not ML experts but are often the domain experts and key decision makers. RFEX provides easy-to-analyze, one-page Model and Sample explainability summaries in tabular format with a wealth of explainability information, including classification confidence, the tradeoff between accuracy and the features used, and the ability to identify potential outlier samples and features. We demonstrate RFEX on two case studies: mortality prediction for COVID-19 patients using data obtained from Huazhong University of Science and Technology, Wuhan, China, and classification of cell type clusters for the human nervous system based on data from the J. Craig Venter Institute. We show that RFEX offers a simple yet powerful means of explaining RF classification at the model, sample, and feature levels, and provides guidance for testing and developing explainable and cost-effective operational prediction models.
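To make two of the summary quantities named in the abstract more concrete, here is a minimal Python sketch. It is not the RFEX implementation. It assumes that classification confidence is the fraction of trees voting for the predicted class, and that the accuracy/feature tradeoff is tabulated by re-scoring a model on only the top-K ranked features. The function names, the `train_and_score` callback, and the feature names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def classification_confidence(tree_votes: Sequence[str]) -> Tuple[str, float]:
    """Return the majority-vote class and the fraction of trees voting for it.

    Assumption: confidence = share of trees that agree with the forest's
    majority vote. This is an illustrative definition, not RFEX's code.
    """
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    label, count = Counter(tree_votes).most_common(1)[0]
    return label, count / len(tree_votes)


def accuracy_vs_top_k(
    ranked_features: List[str],
    train_and_score: Callable[[List[str]], float],
    ks: Sequence[int],
) -> Dict[int, float]:
    """Tabulate the score obtained when only the top-K ranked features are used.

    `ranked_features` is assumed to be sorted from most to least important.
    `train_and_score` is a hypothetical callback that trains a model on the
    given feature subset and returns a held-out score (e.g. accuracy).
    """
    table: Dict[int, float] = {}
    for k in ks:
        if not 1 <= k <= len(ranked_features):
            raise ValueError(f"k={k} is outside 1..{len(ranked_features)}")
        table[k] = train_and_score(ranked_features[:k])
    return table


def toy_score(features: List[str]) -> float:
    # Toy stand-in for real training: the score simply grows with feature count.
    return len(features) / 4


if __name__ == "__main__":
    # Toy per-tree votes for one sample from a hypothetical 4-tree forest.
    label, conf = classification_confidence(["died", "survived", "died", "died"])
    print(label, conf)  # died 0.75

    # Score table for the top-1, top-2 and top-3 hypothetical features.
    print(accuracy_vs_top_k(["f1", "f2", "f3"], toy_score, [1, 2, 3]))
```

Passing the scorer as a callback keeps the sketch independent of any particular ML library. In practice it would wrap training and evaluation of a real Random Forest on the chosen feature subset; a small K whose score is close to the full-feature score would suggest a cheaper operational model.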


Metadata
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-68796-0_5
