Published in: The Journal of Supercomputing 8/2021

12 January 2021

Toward efficient execution of data-intensive workflows

Author: Oleg Sukhoroslov



Abstract

Workflows that consume and produce large amounts of data are widely used in modern scientific computing and data processing pipelines. Scheduling data-intensive workflows requires careful management of data transfers between tasks, since network contention can significantly increase the workflow execution time. The paper presents and evaluates several scheduling algorithms, data transfer strategies and optimizations aimed at the efficient execution of data-intensive workflows. The studied approaches reduce or completely avoid network contention by explicitly scheduling data transfers, and they incorporate several optimizations such as data caching, chunked data transfers and peer-to-peer data transfers. The results of the experimental study demonstrate that the relative performance of the different approaches depends on the workflow properties, the data staging strategy and the system configuration. The proposed CAS-L1 heuristic, combined with additional data transfer optimizations, achieves the best results.
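To make the contention argument concrete, the following is a minimal Python sketch under an idealized model assumed here purely for illustration (it is not the paper's simulation model or the CAS-L1 algorithm): one shared link of fixed bandwidth, where concurrent transfers split the bandwidth equally. The functions `fair_share_completion` and `sequential_completion` are hypothetical names. The sketch compares when each file arrives if all transfers start at once versus when they are explicitly scheduled one after another, smallest first.

```python
from typing import List


def fair_share_completion(sizes: List[float], bandwidth: float) -> List[float]:
    """Finish time of each transfer when all start at t=0 on one shared link.

    Idealized fair sharing (assumption): the n active transfers each receive
    bandwidth / n, so the smallest remaining transfer always finishes next.
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    finish = [0.0] * len(sizes)
    t = 0.0
    delivered = 0.0  # amount already received by every still-active transfer
    active = len(sizes)
    for i in order:
        # The rest of transfer i moves at rate bandwidth / active.
        t += (sizes[i] - delivered) * active / bandwidth
        delivered = sizes[i]
        finish[i] = t
        active -= 1
    return finish


def sequential_completion(sizes: List[float], bandwidth: float) -> List[float]:
    """Finish time of each transfer when transfers run one at a time,
    smallest first, each using the full link bandwidth (no contention)."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    finish = [0.0] * len(sizes)
    t = 0.0
    for i in order:
        t += sizes[i] / bandwidth
        finish[i] = t
    return finish


if __name__ == "__main__":
    sizes = [100.0, 100.0, 100.0]  # MB, hypothetical task output files
    link = 100.0  # MB/s, hypothetical link bandwidth
    print("fair share:", fair_share_completion(sizes, link))
    print("sequential:", sequential_completion(sizes, link))
```

With three 100 MB files over a 100 MB/s link, the last file arrives at 3 s in both cases, but under fair sharing every file arrives at 3 s, whereas explicit sequencing delivers the first file at 1 s and the second at 2 s, so dependent tasks could start earlier. Real platforms with multiple links and TCP dynamics behave less simply, so this only illustrates the intuition.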

Metadata
Title
Toward efficient execution of data-intensive workflows
Author
Oleg Sukhoroslov
Publication date
12 January 2021
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 8/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03612-4
