Published in: The Journal of Supercomputing 10/2021

10 March 2021

Replication and data management-based workflow scheduling algorithm for multi-cloud data centre platform

Authors: Zain Ulabedin, Babar Nazir

Abstract

Scientific workflow applications comprise a large number of tasks and data sets that must be processed in a systematic manner. These applications benefit from cloud computing platforms that offer access to virtually limitless resources provisioned elastically and on demand. Running a data-intensive scientific workflow on geographically distributed data centres involves a massive amount of data transfer, which affects the overall execution time and monetary cost of the workflow. Existing efforts on workflow scheduling concentrate on decreasing makespan and budget; little attention has been paid to the dependency between tasks and data sets. In this paper, we introduce a workflow scheduling technique that reduces data transfer and executes workflow tasks within deadline and budget constraints. The proposed technique consists of an initial data placement stage, which clusters and distributes data sets based on their dependence, and a replication-based partial critical path (R-PCP) technique, which schedules tasks with data locality and dynamically maintains a dependency matrix for the placement of generated data sets. To reduce run-time data set movement, we use inter-data-centre task replication and data set replication to ensure data set availability. Simulation results with four workflow applications illustrate that our strategy efficiently reduces data movement and executes all chosen workflows within the user-specified budget and deadline. Results reveal that R-PCP has 44.93% and 31.37% less data movement compared to random and adaptive data-aware scheduling (ADAS) techniques, respectively. R-PCP has 26.48% less energy consumption compared with the ADAS technique.
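The abstract does not define how dependence between data sets is measured, so the following is only a minimal sketch under a stated assumption: two data sets are treated as more dependent the more tasks read both of them. The function names `build_dependency_matrix` and `pick_data_centre`, the input layout, and the greedy placement rule are hypothetical illustrations, not the R-PCP algorithm itself.

```python
from itertools import combinations


def build_dependency_matrix(task_inputs, datasets):
    """Count, for each pair of data sets, how many tasks read both.

    Assumption (not from the paper): dependence = number of shared tasks.
    The diagonal holds the number of tasks that read each data set.
    """
    index = {name: i for i, name in enumerate(datasets)}
    n = len(datasets)
    matrix = [[0] * n for _ in range(n)]
    for used in task_inputs.values():
        unique = sorted(set(used))  # ignore duplicate inputs of one task
        for name in unique:
            matrix[index[name]][index[name]] += 1
        for a, b in combinations(unique, 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


def pick_data_centre(new_dataset, placement, matrix, datasets):
    """Greedy rule (hypothetical): place a data set in the data centre
    whose resident data sets share the most tasks with it."""
    index = {name: i for i, name in enumerate(datasets)}
    row = matrix[index[new_dataset]]
    scores = {
        dc: sum(row[index[d]] for d in residents)
        for dc, residents in placement.items()
    }
    # Iterate in sorted order so ties are broken deterministically by name.
    return max(sorted(scores), key=scores.get)


if __name__ == "__main__":
    datasets = ["d1", "d2", "d3", "d4"]
    task_inputs = {
        "t1": ["d1", "d2"],
        "t2": ["d1", "d2", "d3"],
        "t3": ["d3", "d4"],
    }
    matrix = build_dependency_matrix(task_inputs, datasets)
    placement = {"dc_a": ["d1", "d2"], "dc_b": ["d4"]}
    print(pick_data_centre("d3", placement, matrix, datasets))
```

In this toy input, `d3` shares two task reads with the data sets in `dc_a` and one with the data set in `dc_b`, so the greedy rule selects `dc_a`. A real scheduler would also have to respect storage capacity, deadline, and budget constraints, which this sketch leaves out.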

Metadata
Title
Replication and data management-based workflow scheduling algorithm for multi-cloud data centre platform
Authors
Zain Ulabedin
Babar Nazir
Publication date
10 March 2021
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 10/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03541-2