Skip to main content
Top

2018 | OriginalPaper | Chapter

Evolutionary Induction of Classification Trees on Spark

Authors : Daniel Reska, Krzysztof Jurczuk, Marek Kretowski

Published in: Artificial Intelligence and Soft Computing

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Evolutionary-based approaches have recently been increasingly proposed for data mining tasks, but their real applicability depends on efficiency and scalability for large-scale data. It is clear that parallel and distributed processing support is indispensable herein. Apache Spark is one of the most promising cluster-computing engines for Big Data. In this paper, we investigate the application of Spark to speed up an evolutionary induction of classification trees in the Global Decision Tree (GDT) system. The system simultaneously searches for the tree structure and tests in non-terminal nodes due to specialized genetic operators. As the original GDT system is implemented in C++, the Java-based module is developed for Spark-based acceleration of the most computationally demanding fitness evaluation. The training dataset is transformed to Resilient Distributed Dataset, which enables in-memory processing of dataset’s parts on workers. Preliminary experimental validation on large-scale artificial and real-life datasets shows that the proposed solution is efficient and scales well.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
A dipole is a pair of observations; if observations come from different classes, a dipole is called mixed.
 
Literature
2.
go back to reference Alba, E., Tomassini, M.: Parallelism and evolutionary algorithms. IEEE Trans. Evol. Comput. 6(5), 443–462 (2002)CrossRef Alba, E., Tomassini, M.: Parallelism and evolutionary algorithms. IEEE Trans. Evol. Comput. 6(5), 443–462 (2002)CrossRef
3.
go back to reference Barros, R.C., Basgalupp, M.P., Carvalho, A.C., Freitas, A.A.: A survey of evolutionary algorithms for decision-tree induction. IEEE Trans. SMC, Part C 42(3), 291–312 (2012) Barros, R.C., Basgalupp, M.P., Carvalho, A.C., Freitas, A.A.: A survey of evolutionary algorithms for decision-tree induction. IEEE Trans. SMC, Part C 42(3), 291–312 (2012)
5.
go back to reference Czajkowski, M., Jurczuk, K., Kretowski, M.: A parallel approach for evolutionary induced decision trees. MPI+OpenMP implementation. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2015. LNCS (LNAI), vol. 9119, pp. 340–349. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19324-3_31CrossRef Czajkowski, M., Jurczuk, K., Kretowski, M.: A parallel approach for evolutionary induced decision trees. MPI+OpenMP implementation. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2015. LNCS (LNAI), vol. 9119, pp. 340–349. Springer, Cham (2015). https://​doi.​org/​10.​1007/​978-3-319-19324-3_​31CrossRef
6.
7.
go back to reference Czajkowski, M., Kretowski, M.: Evolutionary induction of global model trees with specialized operators and memetic extensions. Inf. Sci. 288, 153–173 (2014)CrossRef Czajkowski, M., Kretowski, M.: Evolutionary induction of global model trees with specialized operators and memetic extensions. Inf. Sci. 288, 153–173 (2014)CrossRef
9.
go back to reference Ferranti, A., Marcelloni, F., Segatori, A., Antonelli, M., Ducange, P.: A distributed approach to multi-objective evolutionary generation of fuzzy rule-based classifiers from big data. Inf. Sci. 415–416, 319–340 (2017)CrossRef Ferranti, A., Marcelloni, F., Segatori, A., Antonelli, M., Ducange, P.: A distributed approach to multi-objective evolutionary generation of fuzzy rule-based classifiers from big data. Inf. Sci. 415–416, 319–340 (2017)CrossRef
11.
go back to reference Gong, Y.J., Chen, W.N., Zhan, Z.H., Zhang, J., Li, Y., Zhang, Q., Li, J.J.: Distributed evolutionary algorithms and their models: a survey of the state-of-the-art. Appl. Soft Comput. 34, 286–300 (2015)CrossRef Gong, Y.J., Chen, W.N., Zhan, Z.H., Zhang, J., Li, Y., Zhang, Q., Li, J.J.: Distributed evolutionary algorithms and their models: a survey of the state-of-the-art. Appl. Soft Comput. 34, 286–300 (2015)CrossRef
12.
go back to reference Grama, A., Karypis, G., Kumar, V., Gupta, A.: Introduction to Parallel Computing. Addison-Wesley, Boston (2003)MATH Grama, A., Karypis, G., Kumar, V., Gupta, A.: Introduction to Parallel Computing. Addison-Wesley, Boston (2003)MATH
13.
go back to reference Jurczuk, K., Czajkowski, M., Kretowski, M.: Evolutionary induction of a decision tree for large-scale data: a GPU-based approach. Soft Comput. 21(24), 7363–7379 (2017)CrossRef Jurczuk, K., Czajkowski, M., Kretowski, M.: Evolutionary induction of a decision tree for large-scale data: a GPU-based approach. Soft Comput. 21(24), 7363–7379 (2017)CrossRef
14.
go back to reference Kretowski, M., Grzes, M.: Evolutionary induction of mixed decision trees. Int. J. Data Warehous. Min. (IJDWM) 3(4), 68–82 (2007)CrossRef Kretowski, M., Grzes, M.: Evolutionary induction of mixed decision trees. Int. J. Data Warehous. Min. (IJDWM) 3(4), 68–82 (2007)CrossRef
15.
go back to reference Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., Freeman, J., Tsai, D., Amde, M., Owen, S., et al.: MLlib: machine learning in apache spark. J. Mach. Learn. Res. 17(1), 1235–1241 (2016)MathSciNetMATH Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., Freeman, J., Tsai, D., Amde, M., Owen, S., et al.: MLlib: machine learning in apache spark. J. Mach. Learn. Res. 17(1), 1235–1241 (2016)MathSciNetMATH
17.
go back to reference Pulgar-Rubior, F., Rivera-Rivas, A., Perez-Godoy, M., Gonzalez, P., Carmona, C., del Jesus, M.: MEFASD-BD: multi-objective evolutionary fuzzy algorithm for subgroup discovery in big data environments - a MapReduce solution. Knowl.-Based Syst. 117, 70–78 (2017)CrossRef Pulgar-Rubior, F., Rivera-Rivas, A., Perez-Godoy, M., Gonzalez, P., Carmona, C., del Jesus, M.: MEFASD-BD: multi-objective evolutionary fuzzy algorithm for subgroup discovery in big data environments - a MapReduce solution. Knowl.-Based Syst. 117, 70–78 (2017)CrossRef
18.
go back to reference Qi, R., Wang, Z., Li, S.: A parallel genetic algorithm based on Spark for pairwise test suite generation. J. Comput. Sci. Technol. 31(2), 417–427 (2016)CrossRef Qi, R., Wang, Z., Li, S.: A parallel genetic algorithm based on Spark for pairwise test suite generation. J. Comput. Sci. Technol. 31(2), 417–427 (2016)CrossRef
20.
go back to reference Wu, X., Zhu, X., Wu, G.Q., Ding, W.: Data mining with big data. IEEE Trans. Knowl. Data Eng. 26(1), 97–107 (2014)CrossRef Wu, X., Zhu, X., Wu, G.Q., Ding, W.: Data mining with big data. IEEE Trans. Knowl. Data Eng. 26(1), 97–107 (2014)CrossRef
21.
go back to reference Zaharia, M., et al.: Apache Spark: a unified engine for big data processing. Commun. ACM 59(11), 56–65 (2016)CrossRef Zaharia, M., et al.: Apache Spark: a unified engine for big data processing. Commun. ACM 59(11), 56–65 (2016)CrossRef
Metadata
Title
Evolutionary Induction of Classification Trees on Spark
Authors
Daniel Reska
Krzysztof Jurczuk
Marek Kretowski
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-91253-0_48

Premium Partner