Published in: The Journal of Supercomputing 6/2018

08.03.2018

Framework for automated partitioning and execution of scientific workflows in the cloud

Authors: Jaagup Viil, Satish Narayana Srirama


Abstract

Scientific workflows have become a standard way for scientists to represent the set of tasks needed to solve a given scientific problem. These workflows usually consist of numerous CPU- and I/O-intensive jobs that are executed by workflow management systems (WfMS) on clouds, grids, supercomputers, and similar infrastructure. Earlier work showed that using k-way partitioning to distribute a workflow’s tasks across multiple machines in the cloud reduces overall data communication and therefore lowers the cost of bandwidth usage. This paper presents a framework that automates the partitioning and cloud execution of any workflow a scientist submits for Pegasus WfMS. The framework provisions cloud instances using CloudML, installs and configures all the software needed for execution, partitions and runs the provided workflow, and reports the estimated makespan and cost.
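To make the partitioning idea concrete, the following is a minimal sketch, not the framework's actual implementation. It assumes a workflow can be modelled as a list of edges weighted by the data passed between tasks, and it uses an exhaustive search as a stand-in for a real k-way partitioner. The task names, data sizes, and the helpers `cut_volume` and `best_partition` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical workflow: each edge is (producer, consumer, data size in MB).
EDGES = [
    ("split", "map_a", 100),
    ("split", "map_b", 100),
    ("map_a", "reduce_a", 400),
    ("map_b", "reduce_b", 400),
    ("reduce_a", "merge", 10),
    ("reduce_b", "merge", 10),
]


def cut_volume(edges, assignment):
    """Total data (MB) that crosses machine boundaries for a task-to-machine map."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


def best_partition(edges, k, max_tasks_per_part):
    """Exhaustively search size-limited k-way assignments (tiny graphs only)."""
    tasks = sorted({t for u, v, _ in edges for t in (u, v)})
    best = None
    for parts in product(range(k), repeat=len(tasks)):
        # Skip assignments that overload any single machine.
        if any(parts.count(p) > max_tasks_per_part for p in range(k)):
            continue
        assignment = dict(zip(tasks, parts))
        volume = cut_volume(edges, assignment)
        if best is None or volume < best[0]:
            best = (volume, assignment)
    return best


if __name__ == "__main__":
    # Naive placement by workflow level: first three tasks on machine 0.
    by_level = {"split": 0, "map_a": 0, "map_b": 0,
                "reduce_a": 1, "reduce_b": 1, "merge": 1}
    print("by level:", cut_volume(EDGES, by_level), "MB transferred")
    volume, assignment = best_partition(EDGES, k=2, max_tasks_per_part=3)
    print("partitioned:", volume, "MB transferred", assignment)
```

In this toy graph, placing tasks by workflow level sends 800 MB between the two machines, while the best balanced 2-way split keeps each map/reduce pair together and sends only 110 MB. Exhaustive search grows exponentially with the number of tasks, so a realistic workflow would need a heuristic partitioner instead.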


Metadata
Title
Framework for automated partitioning and execution of scientific workflows in the cloud
Authors
Jaagup Viil
Satish Narayana Srirama
Publication date
08.03.2018
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 6/2018
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-018-2296-7
