2020 | OriginalPaper | Book chapter

4. Can Genetic Programming Perform Explainable Machine Learning for Bioinformatics?

Abstract

Although machine learning algorithms have proven powerful in making predictions and finding patterns, they often struggle to provide explanations and translational knowledge when applied to many problems, especially in the biomedical sciences. This often results from the highly complex structures that machine learning algorithms employ to represent and model the relationship between the predictors and the response. Prediction accuracy is increased at the cost of a “black-box” model that is not amenable to interpretation. Genetic programming may provide a potential solution for explainable machine learning in bioinformatics, where learned knowledge and patterns can be translated into clinical actions. In this study, we employed a linear genetic programming (LGP) algorithm for a bioinformatics classification problem. We developed feature selection analysis methods aimed at explaining which features are influential in the prediction, and whether that influence arises through individual effects or through synergistic effects in combination with other features.
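
The abstract does not spell out the feature analysis itself. As a purely illustrative sketch, and not the chapter's method, one generic way to separate individual from joint influence is to count how often each feature, and each pair of features, appears across a collection of trained models. The function name `feature_counts`, the feature names, and the three example models below are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def feature_counts(models):
    """Count single-feature and feature-pair occurrences across models.

    `models` is an iterable of feature-name collections, one per model.
    Returns (single, pairs), both Counter objects. Pair keys are tuples
    in sorted order, so ("f1", "f2") and ("f2", "f1") never both appear.
    """
    single = Counter()
    pairs = Counter()
    for features in models:
        used = sorted(set(features))          # de-duplicate within one model
        single.update(used)                   # individual occurrence
        pairs.update(combinations(used, 2))   # co-occurrence in the same model
    return single, pairs


# Hypothetical feature sets used by three models (made-up names).
models = [{"f1", "f2"}, {"f1", "f2", "f3"}, {"f3"}]
single, pairs = feature_counts(models)
```

Under this reading, a feature with a high individual count suggests a main effect, while a pair that co-occurs more often than its individual counts would predict hints at a possible synergistic effect. Either signal would still need a statistical test before being interpreted.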

Metadata
Title
Can Genetic Programming Perform Explainable Machine Learning for Bioinformatics?
Written by
Ting Hu
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-39958-0_4