
2020 | OriginalPaper | Chapter

SPM: Modeling Spark Task Execution Time from the Sub-stage Perspective

Authors: Wei Li, Shengjie Hu, Di Wang, Tianba Chen, Yunchun Li

Published in: Algorithms and Architectures for Parallel Processing

Publisher: Springer International Publishing


Abstract

Tasks are the basic unit of scheduling in Spark applications, and their execution is affected by the many configuration parameters of a Spark cluster. Predicting task execution time is therefore challenging. In this paper, we analyze the characteristics of the task execution procedure in different stages and propose a method for predicting the execution time of each sub-stage. We also analyze in detail the associated time overheads of garbage collection (GC) and shuffle spill. Building on this analysis, we propose SPM, a task-level execution time prediction model. SPM predicts the task execution time of each stage from the input data size and the parallelism configuration. We further apply SPM to SNemu, a Spark network emulation tool, enabling it to effectively determine the start time of each shuffle procedure for emulation. Experimental results show that the prediction method achieves high accuracy across a variety of Spark benchmarks from HiBench.
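The abstract does not give SPM's formulas, so the following is only a hypothetical sketch of the general idea it describes. A task's time is treated as the sum of per-sub-stage times, each driven by the data one task processes (input size divided by parallelism). A spill term stays at zero until the per-task data exceeds memory, and the start time of the next shuffle is estimated from the number of task waves. Every name here (`LinearSubStage`, `SpillSubStage`, `fit_linear`, `predict_task_time`, `predict_shuffle_start`), the linear and threshold model forms, and the equal-length-wave scheduling assumption are illustrative assumptions, not the authors' model.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class LinearSubStage:
    """Hypothetical sub-stage model: seconds = intercept + slope * per-task MB."""
    intercept: float  # fixed per-task cost of this sub-stage (s)
    slope: float      # seconds per MB handled by one task

    def predict(self, per_task_mb: float) -> float:
        return self.intercept + self.slope * per_task_mb


@dataclass
class SpillSubStage:
    """Hypothetical spill model: zero cost until per-task data exceeds memory_mb."""
    memory_mb: float       # assumed execution memory available to one task
    seconds_per_mb: float  # assumed cost per MB spilled to disk

    def predict(self, per_task_mb: float) -> float:
        return max(0.0, per_task_mb - self.memory_mb) * self.seconds_per_mb


SubStage = Union[LinearSubStage, SpillSubStage]


def fit_linear(xs: List[float], ys: List[float]) -> LinearSubStage:
    """Fit one sub-stage from profiling samples with single-feature least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearSubStage(mean_y - slope * mean_x, slope)


def predict_task_time(sub_stages: Dict[str, SubStage], input_mb: float,
                      parallelism: int) -> float:
    """Task time = sum of sub-stage times, each evaluated on one task's share of data."""
    per_task_mb = input_mb / parallelism
    return sum(stage.predict(per_task_mb) for stage in sub_stages.values())


def predict_shuffle_start(task_time: float, parallelism: int, total_cores: int,
                          stage_start: float = 0.0) -> float:
    """Assume tasks run in ceil(parallelism / cores) equal waves; the next
    shuffle is taken to start when the last wave finishes."""
    waves = math.ceil(parallelism / total_cores)
    return stage_start + waves * task_time


if __name__ == "__main__":
    # Made-up profiling samples (per-task MB, seconds) for a 'compute' sub-stage.
    compute = fit_linear([64, 128, 256], [1.5, 2.5, 4.5])
    stages: Dict[str, SubStage] = {
        "read": LinearSubStage(intercept=0.2, slope=0.01),
        "compute": compute,
        "spill": SpillSubStage(memory_mb=200, seconds_per_mb=0.05),
    }
    for p in (100, 200):
        t = predict_task_time(stages, input_mb=25600, parallelism=p)
        start = predict_shuffle_start(t, parallelism=p, total_cores=64)
        print(f"parallelism={p}: task {t:.2f}s, next shuffle at {start:.2f}s")
```

With these invented numbers, halving the parallelism doubles each task's data share. That pushes it past the assumed memory limit, so the spill term switches on. This is the kind of non-linear effect that motivates modeling spill and GC overheads separately rather than scaling a single per-task time.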


Metadata
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-38961-1_1
