Published in: The Journal of Supercomputing 10/2021

10-03-2021

Replication and data management-based workflow scheduling algorithm for multi-cloud data centre platform

Authors: Zain Ulabedin, Babar Nazir

Abstract

Scientific workflow applications comprise a large number of tasks and data sets that must be processed in a systematic manner. These applications benefit from cloud computing platforms that offer access to virtually limitless resources provisioned elastically and on demand. Running data-intensive scientific workflows on geographically distributed data centres involves massive amounts of data transfer, which affects the overall execution time and monetary cost of the workflows. Existing efforts on workflow scheduling concentrate on decreasing makespan and budget; little attention has been paid to the dependency between tasks and data sets. In this paper, we introduce a workflow scheduling technique that reduces data transfer and executes workflow tasks within deadline and budget constraints. The proposed technique consists of an initial data placement stage, which clusters and distributes data sets based on their dependence, and a replication-based partial critical path (R-PCP) technique, which schedules tasks with data locality and dynamically maintains a dependency matrix for the placement of generated data sets. To reduce run-time data set movement, we use inter-data-centre task replication and data set replication to ensure data set availability. Simulation results with four workflow applications illustrate that our strategy efficiently reduces data movement and executes all chosen workflows within the user-specified budget and deadline. Results reveal that R-PCP has 44.93% and 31.37% less data movement compared to random and adaptive data-aware scheduling (ADAS) techniques, respectively. R-PCP has 26.48% less energy consumption compared with the ADAS technique.
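
The abstract does not define the dependency matrix or how data movement is measured, so the following is only a minimal sketch under stated assumptions. It assumes the dependency between two data sets is the number of tasks that read both, and it counts data movement as the number of input reads that cross data centres. The function names, the toy workflow, and the site labels "A" and "B" are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def dependency_matrix(task_inputs, datasets):
    """Assumed definition: entry [i][j] counts the tasks that use both data sets."""
    index = {name: i for i, name in enumerate(datasets)}
    n = len(datasets)
    matrix = [[0] * n for _ in range(n)]
    for used in task_inputs.values():
        unique = sorted(set(used), key=index.__getitem__)
        # Diagonal: number of tasks that use the data set at all.
        for name in unique:
            matrix[index[name]][index[name]] += 1
        # Off-diagonal: number of tasks that use both data sets of a pair.
        for a, b in combinations(unique, 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


def cross_centre_reads(task_inputs, task_site, data_site):
    """Count input reads whose data set is stored at a different site than the task."""
    return sum(
        1
        for task, used in task_inputs.items()
        for name in set(used)
        if data_site[name] != task_site[task]
    )


# Hypothetical toy workflow: three tasks, three data sets, two data centres.
datasets = ["d1", "d2", "d3"]
task_inputs = {"t1": ["d1", "d2"], "t2": ["d2", "d3"], "t3": ["d1", "d2"]}
task_site = {"t1": "A", "t2": "B", "t3": "A"}
data_site = {"d1": "A", "d2": "A", "d3": "B"}

matrix = dependency_matrix(task_inputs, datasets)
moves = cross_centre_reads(task_inputs, task_site, data_site)
```

In this toy case d1 and d2 are read together by two tasks, so a dependency-based placement would tend to keep them at the same data centre. The one remaining remote read (t2 reading d2) is the kind of transfer that a replica of d2 at the second site would remove.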

Metadata
Title
Replication and data management-based workflow scheduling algorithm for multi-cloud data centre platform
Authors
Zain Ulabedin
Babar Nazir
Publication date
10-03-2021
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 10/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03541-2
