2024 | OriginalPaper | Book chapter

14. Genetic Programming as an Innovation Engine for Automated Machine Learning: The Tree-Based Pipeline Optimization Tool (TPOT)

Authors: Jason H. Moore, Pedro H. Ribeiro, Nicholas Matsumoto, Anil K. Saini

Published in: Handbook of Evolutionary Machine Learning

Publisher: Springer Nature Singapore


Abstract

One of the central challenges of machine learning is the selection of methods for feature selection, feature engineering, and classification or regression algorithms for building an analytics pipeline. This is true for both novices and experts. Automated machine learning (AutoML) has emerged as a useful approach to generate machine learning pipelines without the need for manual construction and evaluation. We review here some challenges of building pipelines and present several of the first and most widely used AutoML methods and open-source software. We present in detail the Tree-based Pipeline Optimization Tool (TPOT) that represents pipelines as expression trees and uses genetic programming (GP) for discovery and optimization. We present some of the extensions of TPOT and its application to real-world big data. We end with some thoughts about the future of AutoML and evolutionary machine learning.
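
As a rough illustration of the idea in the abstract, the sketch below encodes a pipeline as a nested expression tree and applies a GP-style point mutation to it. It is not TPOT's code or API: the operator names, the fixed selector/transformer/estimator shape, and the helper functions are all assumptions made for this example, and no model is actually trained or scored.

```python
import random

# Hypothetical operator pools, named for illustration only; they do not
# reproduce TPOT's actual search space.
SELECTORS = ["select_k_best", "variance_threshold"]
TRANSFORMERS = ["standard_scaler", "pca"]
ESTIMATORS = ["decision_tree", "logistic_regression"]
POOLS = [SELECTORS, TRANSFORMERS, ESTIMATORS]

INPUT = "input_data"  # leaf node standing in for the raw feature matrix


def from_steps(steps):
    """Nest a list of operators into a tree: last operator is the root."""
    tree = INPUT
    for operator in steps:
        tree = (operator, tree)
    return tree


def to_steps(tree):
    """Flatten a tree into the order in which its operators are applied."""
    if tree == INPUT:
        return []
    operator, child = tree
    return to_steps(child) + [operator]


def random_pipeline(rng):
    """Draw one operator per pool: estimator(transformer(selector(input)))."""
    return from_steps([rng.choice(pool) for pool in POOLS])


def point_mutation(tree, rng):
    """Swap one randomly chosen operator for another from the same pool."""
    steps = to_steps(tree)
    index = rng.randrange(len(steps))
    alternatives = [name for name in POOLS[index] if name != steps[index]]
    steps[index] = rng.choice(alternatives)
    return from_steps(steps)


rng = random.Random(0)
parent = random_pipeline(rng)
child = point_mutation(parent, rng)
print("parent:", parent)
print("child: ", child)
```

Each tree here is a single chain, which keeps the example short. A fuller GP system would also allow branching nodes that combine several feature sets, add crossover between trees, and select among candidates by a fitness score such as cross-validated accuracy.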


Metadata
Title
Genetic Programming as an Innovation Engine for Automated Machine Learning: The Tree-Based Pipeline Optimization Tool (TPOT)
Authors
Jason H. Moore
Pedro H. Ribeiro
Nicholas Matsumoto
Anil K. Saini
Copyright year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-99-3814-8_14
