Published in: International Journal of Data Science and Analytics 2/2018

06-03-2018 | Review

The many faces of data-centric workflow optimization: a survey

Authors: Georgia Kougka, Anastasios Gounaris, Alkis Simitsis

Abstract

Workflow technology is rapidly evolving and, rather than being limited to modeling the control flow in business processes, is becoming a key mechanism to perform advanced data management, such as big data analytics. This survey focuses on data-centric workflows (or workflows for data analytics or data flows), where a key aspect is data passing through and being manipulated by a sequence of steps. The large volume and variety of data, the complexity of operations performed, and the long time such workflows take to compute give rise to the need for optimization. In general, data-centric workflow optimization is a technology in evolution. This survey focuses on techniques applicable to workflows comprising arbitrary types of data manipulation steps and semantic inter-dependencies between such steps. Further, it serves a twofold purpose: firstly, to present the main dimensions of the relevant optimization problems and the types of optimizations that occur before flow execution, and secondly, to provide a concise overview of the existing approaches with a view to highlighting key observations and areas deserving more attention from the community.

Footnotes
1. Hereafter, these three terms will be used interchangeably; the terms workflow and flow will be used interchangeably, too.
2. The terms technique, proposal, and work will be used interchangeably.
3. By considering only optimizations that start from a valid initial flow, we exclude from our survey the large area of answering queries in the presence of limited access patterns, in which the main aim is to construct such an initial plan [69, 78] by selecting an appropriate subset of tasks from a given task pool. However, we have considered works from data integration that optimize the plan after it has been devised, such as [111] or [34], which is subsumed by Kougka and Gounaris [60].
4. www.myexperiment.org/ in bio-informatics.
Literature
2. Abadi, D.J., Agrawal, R., Ailamaki, A., Balazinska, M., Bernstein, P.A., Carey, M.J., Chaudhuri, S., Dean, J., Doan, A., Franklin, M.J., Gehrke, J., Haas, L.M., Halevy, A.Y., Hellerstein, J.M., Ioannidis, Y.E., Jagadish, H.V., Kossmann, D., Madden, S., Mehrotra, S., Milo, T., Naughton, J.F., Ramakrishnan, R., Markl, V., Olston, C., Ooi, B.C., Ré, C., Suciu, D., Stonebraker, M., Walter, T., Widom, J.: The beckman report on database research. SIGMOD Rec. 43(3), 61–70 (2014)
3. Abrishami, S., Naghibzadeh, M., Epema, D.H.: Deadline-constrained workflow scheduling algorithms for infrastructure as a service clouds. Future Gener. Comput. Syst. 29(1), 158–169 (2013)
4. Abrishami, S., Naghibzadeh, M., Epema, D.H.J.: Cost-driven scheduling of grid workflows using partial critical paths. IEEE Trans. Parallel Distrib. Syst. 23(8), 1400–1414 (2012)
5. Agrawal, K., Benoit, A., Dufossé, F., Robert, Y.: Mapping filtering streaming applications with communication costs. In: SPAA, pp. 19–28 (2009)
6. Agrawal, K., Benoit, A., Dufossé, F., Robert, Y.: Mapping filtering streaming applications. Algorithmica 62(1–2), 258–308 (2012)
7. Agrawal, K., Benoit, A., Magnan, L., Robert, Y.: Scheduling algorithms for linear workflow optimization. In: IPDPS, pp. 1–12 (2010)
8. Alexandrov, A., Bergmann, R., Ewen, S., Freytag, J., Hueske, F., Heise, A., Kao, O., Leich, M., Leser, U., Markl, V., Naumann, F., Peters, M., Rheinländer, A., Sax, M.J., Schelter, S., Höger, M., Tzoumas, K., Warneke, D.: The stratosphere platform for big data analytics. VLDB J. 23(6), 939–964 (2014)
9. Barker, A., van Hemert, J.I.: Scientific workflow: a survey and research directions. In: PPAM, Lecture Notes in Computer Science, vol. 4967, pp. 746–753 (2007)
10. Benoit, A., Çatalyürek, U.V., Robert, Y., Saule, E.: A survey of pipelined workflow scheduling: models and algorithms. ACM Comput. Surv. 45(4), 50:1–50:36 (2013)
11. Bhattacharya, K., Hull, R., Su, J.: A data-centric design methodology for business processes. In: Handbook of Research on Business Process Modeling, Chapter 23, pp. 503–531 (2009)
12. Böhm, M.: Cost-based optimization of integration flows. Ph.D. thesis (2011)
13. Böhm, M., Habich, D., Lehner, W.: On-demand re-optimization of integration flows. Inf. Syst. 45, 1–17 (2014)
14. Böhm, M., Tatikonda, S., Reinwald, B., Sen, P., Tian, Y., Burdick, D., Vaithyanathan, S.: Hybrid parallelization strategies for large-scale machine learning in systemml. PVLDB 7(7), 553–564 (2014)
15. Braga, D., Ceri, S., Daniel, F., Martinenghi, D.: Optimization of multi-domain queries on the web. PVLDB 1(1), 562–573 (2008)
16. Burge, J., Munagala, K., Srivastava, U.: Ordering pipelined query operators with precedence constraints. Technical Report 2005-40, Stanford InfoLab (2005)
17. Calheiros, R.N., Buyya, R.: Meeting deadlines of scientific workflows in public clouds with tasks replication. IEEE Trans. Parallel Distrib. Syst. 25(7), 1787–1796 (2014)
18. Chaudhuri, S.: An overview of query optimization in relational systems. In: Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 1–3, 1998, Seattle, Washington, pp. 34–43 (1998)
19. Chaudhuri, S., Dayal, U., Narasayya, V.: An overview of business intelligence technology. Commun. ACM 54, 88–98 (2011)
20. Chaudhuri, S., Shim, K.: Optimization of queries with user-defined predicates. ACM Trans. Database Syst. 24(2), 177–228 (1999)
21. Chen, W., Deelman, E.: Partitioning and scheduling workflows across multiple sites with storage constraints. In: Proceedings of the 9th International Conference on Parallel Processing and Applied Mathematics—Volume Part II, PPAM’11, pp. 11–20 (2012)
22. Chen, W.N., Zhang, J.: An ant colony optimization approach to a grid workflow scheduling problem with various qos requirements. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 39(1), 29–43 (2009)
23. Chirkin, A.M., Belloum, A., Kovalchuk, S.V., Makkes, M.X.: Execution time estimation for workflow scheduling. In: Proceedings of the 9th Workshop on Workflows in Support of Large-Scale Science, pp. 1–10. IEEE Press (2014)
24. Cohen-Boulakia, S., Chen, J., Goble, C., Missier, P., Williams, A., Froidevaux, C.: Distilling structure in taverna scientific workflows: a refactoring approach. BMC Bioinformatics 15(1), S12 (2014)
25. Crotty, A., Galakatos, A., Dursun, K., Kraska, T., Binnig, C., Çetintemel, U., Zdonik, S.: An architecture for compiling udf-centric workflows. PVLDB 8(12), 1466–1477 (2015)
26. Curcin, V., Ghanem, M.: Scientific workflow systems—can one size fit all? In: Biomedical Engineering Conference, 2008. CIBEC 2008. Cairo International, pp. 1–9 (2008)
27. Dayal, U., Castellanos, M., Simitsis, A., Wilkinson, K.: Data integration flows for business intelligence. In: Proceedings of EDBT, pp. 1–11 (2009)
28. de Oliveira, D., Ogasawara, E.S., Dias, J., Baio, F.A., Mattoso, M.: Ontology-based semi-automatic workflow composition. JIDM 3(1), 61–72 (2012)
29. Deelman, E., Gannon, D., Shields, M., Taylor, I.: Workflows and e-science: an overview of workflow system features and capabilities. Future Gener. Comput. Syst. 25(5), 528–540 (2009)
30. Deelman, E., Singh, G., Su, M.H., Blythe, J., Gil, Y., Kesselman, C., Mehta, G., Vahi, K., Berriman, G.B., Good, J., Laity, A., Jacob, J.C., Katz, D.S.: Pegasus: a framework for mapping complex scientific workflows onto distributed systems. Sci. Program. 13(3), 219–237 (2005)
31. Deshpande, A., Hellerstein, L.: Parallel pipelined filter ordering with precedence constraints. ACM Trans. Algorithms 8(4), 41:1–41:38 (2012)
32. Dong, F., Akl, S.G.: Scheduling algorithms for grid computing: state of the art and open problems. Technical report (2006)
33. Fard, H., Prodan, R., Fahringer, T.: A truthful dynamic workflow scheduling mechanism for commercial multicloud environments. IEEE Trans. Parallel Distrib. Syst. 24(6), 1203–1212 (2013)
34. Florescu, D., Levy, A., Manolescu, I., Suciu, D.: Query optimization in the presence of limited access patterns. In: ACM SIGMOD, pp. 311–322 (1999)
35. Garcia-Molina, H., Ullman, J.D., Widom, J.D.: Database Systems: The Complete Book. Prentice Hall, Upper Saddle River (2001)
37. Grehant, X., Demeure, I., Jarp, S.: A survey of task mapping on production grids. ACM Comput. Surv. 45(3), 37:1–37:25 (2013)
38. Gu, Y., Wu, Q., Rao, N.S.V.: Analyzing execution dynamics of scientific workflows for latency minimization in resource sharing environments. In: Proceedings of the 2011 IEEE World Congress on Services, pp. 153–160 (2011)
39. Halasipuram, R., Deshpande, P.M., Padmanabhan, S.: Determining essential statistics for cost based optimization of an ETL workflow. In: EDBT, pp. 307–318 (2014)
40. Hellerstein, J.M.: Optimization techniques for queries with expensive methods. ACM Trans. Database Syst. 23(2), 113–157 (1998)
41. Herodotou, H., Babu, S.: Profiling, what-if analysis, and cost-based optimization of mapreduce programs. PVLDB 4(11), 1111–1122 (2011)
42. Holl, S., Zimmermann, O., Hofmann-Apitius, M.: A new optimization phase for scientific workflow management systems. In: eScience, pp. 1–8 (2012)
43. Holzinger, A., Stocker, C., Ofner, B., Prohaska, G., Brabenetz, A., Hofmann-Wellenhof, R.: Combining HCI, natural language processing, and knowledge discovery—potential of IBM content analytics as an assistive technology in the biomedical field. In: Human–Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data—Third International Workshop, HCI-KDD, pp. 13–24 (2013)
44. Huang, B., Babu, S., Yang, J.: Cumulon: optimizing statistical data analysis in the cloud. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 1–12 (2013)
45. Huang, B., Böhm, M., Tian, Y., Reinwald, B., Tatikonda, S., Reiss, F.R.: Resource elasticity for large-scale machine learning. In: SIGMOD’15, pp. 137–152 (2015)
46. Huang, B., Jarrett, N.W.D., Babu, S., Mukherjee, S., Yang, J.: Cümülön: Matrix-based data analytics in the cloud with spot instances. Proc. VLDB Endow. 9(3), 156–167 (2015)
47. Hueske, F., Peters, M., Sax, M., Rheinländer, A., Bergmann, R., Krettek, A., Tzoumas, K.: Opening the black boxes in data flow optimization. PVLDB 5(11), 1256–1267 (2012)
48. Informatica: How to achieve flexible, cost-effective scalability and performance through pushdown processing. White Paper (2007)
49. Ioannidis, Y.E.: Query optimization. ACM Comput. Surv. 28(1), 121–123 (1996)
50. Jin, T., Zhang, F., Sun, Q., Bui, H., Parashar, M., Yu, H., Klasky, S., Podhorszki, N., Abbasi, H.: Using cross-layer adaptations for dynamic data management in large scale coupled scientific workflows. In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC’13, p. 74 (2013)
51. Jovanovic, P., Romero, O., Abelló, A.: A unified view of data-intensive flows in business intelligence systems: a survey. In: Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIX, pp. 66–107. Springer, Berlin (2016)
52. Jovanovic, P., Romero, O., Simitsis, A., Abelló, A.: Incremental consolidation of data-intensive multi-flows. IEEE Trans. Knowl. Data Eng. 28(5), 1203–1216 (2016)
53. Jovanovic, P., Simitsis, A., Wilkinson, K.: Babbleflow: a translator for analytic data flow programs. In: SIGMOD, pp. 713–716 (2014)
54. Jovanovic, P., Simitsis, A., Wilkinson, K.: Engine independence for logical analytic flows. In: ICDE, pp. 1060–1071 (2014)
55. Juve, G., Chervenak, A.L., Deelman, E., Bharathi, S., Mehta, G., Vahi, K.: Characterizing and profiling scientific workflows. Future Gener. Comput. Syst. 29(3), 682–692 (2013)
56. Karagiannis, A., Vassiliadis, P., Simitsis, A.: Scheduling strategies for efficient ETL execution. Inf. Syst. 38(6), 927–945 (2013)
57. Kllapi, H., Sitaridi, E., Tsangaris, M.M., Ioannidis, Y.: Schedule optimization for data processing flows on the cloud. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, pp. 289–300 (2011)
58. Kougka, G., Gounaris, A.: Declarative expression and optimization of data-intensive flows. In: DaWaK, pp. 13–25 (2013)
59. Kougka, G., Gounaris, A.: Optimization of data-intensive flows: is it needed? is it solved? In: Proceedings of the 17th International Workshop on Data Warehousing and OLAP, DOLAP 2014, Shanghai, November 3–7, 2014, pp. 95–98 (2014)
60. Kougka, G., Gounaris, A.: Cost optimization of data flows based on task re-ordering. In: LNCS Transactions on Large-Scale Data- and Knowledge-Centered Systems (2017, to appear)
61. Kougka, G., Gounaris, A.: Optimal task ordering in chain data flows: exploring the practicality of non-scalable solutions. In: DaWaK (2017)
62. Kougka, G., Gounaris, A., Leser, U.: Modeling data flow execution in a parallel environment. In: DaWaK (2017)
63. Kougka, G., Gounaris, A., Tsichlas, K.: Practical algorithms for execution engine selection in data flows. Future Gener. Comput. Syst. 45, 133–148 (2015)
64. Krishnamurthy, R., Boral, H., Zaniolo, C.: Optimization of nonrecursive queries. In: VLDB, pp. 128–137 (1986)
65. Kumar, N., Kumar, P.S.: An efficient heuristic for logical optimization of ETL workflows. In: BIRTE, pp. 68–83 (2010)
66. Kumar, V.S., Sadayappan, P., Mehta, G., Vahi, K., Deelman, E., Ratnakar, V., Kim, J., Gil, Y., Hall, M., Kurc, T., Saltz, J.: An integrated framework for parameter-based optimization of scientific workflows. In: HPDC, pp. 177–186 (2009)
67. Kumbhare, A.G., Simmhan, Y., Prasanna, V.K.: Exploiting application dynamism and cloud elasticity for continuous dataflows. In: SC, p. 57 (2013)
68. Kyriazis, D., Tserpes, K., Menychtas, A., Litke, A., Varvarigou, T.A.: An innovative workflow mapping mechanism for grids in the frame of quality of service. Future Gener. Comput. Syst. 24(6), 498–511 (2008)
69. Li, C.: Computing complete answers to queries in the presence of limited access patterns. VLDB J. 12(3), 211–227 (2003)
70. Lim, H., Herodotou, H., Babu, S.: Stubby: a transformation-based optimizer for mapreduce workflows. Proc. VLDB Endow. 5(11), 1196–1207 (2012)
71. Liu, J., Pacitti, E., Valduriez, P., Mattoso, M.: A survey of data-intensive scientific workflow management. J. Grid Comput. 13(4), 457–493 (2015)
72. Liu, X., Iftikhar, N.: An ETL optimization framework using partitioning and parallelization. In: SAC’15 (2015)
73. Nguyen, P., Hilario, M., Kalousis, A.: Using meta-mining to support data mining workflow planning and optimization. J. Artif. Intell. Res. 51, 605–644 (2014)
74. Ogasawara, E.S., de Oliveira, D., Valduriez, P., Dias, J., Porto, F., Mattoso, M.: An algebraic approach for data-centric scientific workflows. PVLDB 4(12), 1328–1339 (2011)
75. Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig latin: a not-so-foreign language for data processing. In: SIGMOD Conference, pp. 1099–1110 (2008)
76. Pietri, I., Juve, G., Deelman, E., Sakellariou, R.: A performance model to estimate execution time of scientific workflows on the cloud. In: Proceedings of the 9th Workshop on Workflows in Support of Large-Scale Science, pp. 11–19. IEEE Press (2014)
77. Plankensteiner, K., Prodan, R.: Meeting soft deadlines in scientific workflows using resubmission impact. IEEE Trans. Parallel Distrib. Syst. 23(5), 890–901 (2012)
78. Preda, N., Kasneci, G., Suchanek, F.M., Neumann, T., Yuan, W., Weikum, G.: Active knowledge: dynamically enriching RDF knowledge bases by web services. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, IN, June 6–10, 2010, pp. 399–410 (2010)
79. Quiroz, A., Huang, E., Ceriani, L.: A robust and extensible tool for data integration using data type models. In: Proceedings of the Twenty-Ninth AAAI, pp. 3993–3998 (2015)
80. Rahman, M., Hassan, M.R., Ranjan, R., Buyya, R.: Adaptive workflow scheduling for dynamic grid and cloud computing environment. Concurr. Comput. Pract. Exp. 25(13), 1816–1842 (2013)
81. Rheinländer, A., Heise, A., Hueske, F., Leser, U., Naumann, F.: SOFA: an extensible logical optimizer for udf-heavy data flows. Inf. Syst. 52, 96–125 (2015)
82. Schikuta, E., Wanek, H., Ul Haq, I.: Grid workflow optimization regarding dynamically changing resources and conditions. Concurr. Comput. Pract. Exp. 20, 1837–1849 (2008)
83.
go back to reference Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, pp. 23–34 (1979) Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, pp. 23–34 (1979)
84.
go back to reference Shi, J., Zou, J., Lu, J., Cao, Z., Li, S., Wang, C.: MRTuner: a toolkit to enable holistic optimization for mapreduce jobs. Proc. VLDB Endow. 7(13), 1319–1330 (2014)CrossRef Shi, J., Zou, J., Lu, J., Cao, Z., Li, S., Wang, C.: MRTuner: a toolkit to enable holistic optimization for mapreduce jobs. Proc. VLDB Endow. 7(13), 1319–1330 (2014)CrossRef
85.
go back to reference Shivam, P., Babu, S., Chase, J.S.: Active and accelerated learning of cost models for optimizing scientific applications. In: VLDB, pp. 535–546 (2006) Shivam, P., Babu, S., Chase, J.S.: Active and accelerated learning of cost models for optimizing scientific applications. In: VLDB, pp. 535–546 (2006)
86.
go back to reference Simitsis, A., Vassiliadis, P., Dayal, U., Karagiannis, A., Tziovara, V.: Benchmarking ETL workflows. In: TPCTC 2009, 199–220 (2009) Simitsis, A., Vassiliadis, P., Dayal, U., Karagiannis, A., Tziovara, V.: Benchmarking ETL workflows. In: TPCTC 2009, 199–220 (2009)
87.
go back to reference Simitsis, A., Vassiliadis, P., Sellis, T.K.: State-space optimization of ETL workflows. IEEE Trans. Knowl. Data Eng. 17(10), 1404–1419 (2005)CrossRef Simitsis, A., Vassiliadis, P., Sellis, T.K.: State-space optimization of ETL workflows. IEEE Trans. Knowl. Data Eng. 17(10), 1404–1419 (2005)CrossRef
88.
go back to reference Simitsis, A., Wilkinson, K.: Revisiting ETL benchmarking: the case for hybrid flows. In: TPCTC, pp. 75–91 (2012) Simitsis, A., Wilkinson, K.: Revisiting ETL benchmarking: the case for hybrid flows. In: TPCTC, pp. 75–91 (2012)
89.
go back to reference Simitsis, A., Wilkinson, K., Castellanos, M., Dayal, U.: QoX-driven ETL design: reducing the cost of ETL consulting engagements. In: Proceedings of the SIGMOD, pp. 953–960 (2009) Simitsis, A., Wilkinson, K., Castellanos, M., Dayal, U.: QoX-driven ETL design: reducing the cost of ETL consulting engagements. In: Proceedings of the SIGMOD, pp. 953–960 (2009)
90.
go back to reference Simitsis, A., Wilkinson, K., Castellanos, M., Dayal, U.: Optimizing analytic data flows for multiple execution engines. In: SIGMOD Conference, pp. 829–840 (2012) Simitsis, A., Wilkinson, K., Castellanos, M., Dayal, U.: Optimizing analytic data flows for multiple execution engines. In: SIGMOD Conference, pp. 829–840 (2012)
91.
go back to reference Simitsis, A., Wilkinson, K., Dayal, U.: Hybrid analytic flows—the case for optimization. Fund. Inf. 128(3), 303–335 (2013) Simitsis, A., Wilkinson, K., Dayal, U.: Hybrid analytic flows—the case for optimization. Fund. Inf. 128(3), 303–335 (2013)
92.
go back to reference Simitsis, A., Wilkinson, K., Dayal, U., Castellanos, M.: Optimizing ETL workflows for fault-tolerance. In: ICDE, pp. 385–396 (2010) Simitsis, A., Wilkinson, K., Dayal, U., Castellanos, M.: Optimizing ETL workflows for fault-tolerance. In: ICDE, pp. 385–396 (2010)
93.
go back to reference Simitsis, A., Wilkinson, K., Dayal, U., Hsu, M.: HFMS: managing the lifecycle and complexity of hybrid analytic data flows. In: ICDE, pp. 1174–1185 (2013) Simitsis, A., Wilkinson, K., Dayal, U., Hsu, M.: HFMS: managing the lifecycle and complexity of hybrid analytic data flows. In: ICDE, pp. 1174–1185 (2013)
94.
go back to reference Srivastava, U., Munagala, K., Widom, J., Motwani, R.: Query optimization over web services. In: Proceedings of VLDB, pp. 355–366 (2006) Srivastava, U., Munagala, K., Widom, J., Motwani, R.: Query optimization over web services. In: Proceedings of VLDB, pp. 355–366 (2006)
95.
go back to reference Tan, W., Sun, Y., Lu, G., Tang, A., Cui, L.: Trust services-oriented multi-objects workflow scheduling model for cloud computing. In: ICPCA/SWS, pp. 617–630 (2012) Tan, W., Sun, Y., Lu, G., Tang, A., Cui, L.: Trust services-oriented multi-objects workflow scheduling model for cloud computing. In: ICPCA/SWS, pp. 617–630 (2012)
96.
go back to reference Tao, F., Zhang, L., Laili, Y.: Configurable Intelligent Optimization Algorithm: Design and Practice in Manufacturing. Springer, New York, Incorporated (2014) Tao, F., Zhang, L., Laili, Y.: Configurable Intelligent Optimization Algorithm: Design and Practice in Manufacturing. Springer, New York, Incorporated (2014)
97.
go back to reference Tsamoura, E., Gounaris, A., Manolopoulos, Y.: Brief announcement: on the quest of optimal service ordering in decentralized queries. In: Proceedings of the 29th Annual ACM Symposium on Principles of Distributed Computing, PODC 2010, Zurich, July 25–28, 2010, pp. 277–278 (2010) Tsamoura, E., Gounaris, A., Manolopoulos, Y.: Brief announcement: on the quest of optimal service ordering in decentralized queries. In: Proceedings of the 29th Annual ACM Symposium on Principles of Distributed Computing, PODC 2010, Zurich, July 25–28, 2010, pp. 277–278 (2010)
98.
go back to reference Tsamoura, E., Gounaris, A., Manolopoulos, Y.: Decentralized execution of linear workflows over web services. Future Gener. Comput. Syst. 27(3), 341–347 (2011)CrossRef Tsamoura, E., Gounaris, A., Manolopoulos, Y.: Decentralized execution of linear workflows over web services. Future Gener. Comput. Syst. 27(3), 341–347 (2011)CrossRef
99.
go back to reference Tsamoura, E., Gounaris, A., Manolopoulos, Y.: Optimal service ordering in decentralized queries over web services. IJKBO 1(2), 1–16 (2011) Tsamoura, E., Gounaris, A., Manolopoulos, Y.: Optimal service ordering in decentralized queries over web services. IJKBO 1(2), 1–16 (2011)
100.
go back to reference Tsamoura, E., Gounaris, A., Manolopoulos, Y.: Queries over web services. In: New Directions in Web Data Management, vol. 1, pp. 139–169 (2011) Tsamoura, E., Gounaris, A., Manolopoulos, Y.: Queries over web services. In: New Directions in Web Data Management, vol. 1, pp. 139–169 (2011)
101.
go back to reference Tziovara, V., Vassiliadis, P., Simitsis, A.: Deciding the physical implementation of ETL workflows. In: Proceedings of the ACM 10th International Workshop on Data Warehousing and OLAP DOLAP, pp. 49–56 (2007) Tziovara, V., Vassiliadis, P., Simitsis, A.: Deciding the physical implementation of ETL workflows. In: Proceedings of the ACM 10th International Workshop on Data Warehousing and OLAP DOLAP, pp. 49–56 (2007)
102.
go back to reference Varol, Y.L., Rotem, D.: An algorithm to generate all topological sorting arrangements. Comput. J. 24(1), 83–84 (1981)CrossRefMATH Varol, Y.L., Rotem, D.: An algorithm to generate all topological sorting arrangements. Comput. J. 24(1), 83–84 (1981)CrossRefMATH
103.
go back to reference Vassiliadis, P.: A survey of extract–transform–load technology. IJDWM 5(3), 1–27 (2009) Vassiliadis, P.: A survey of extract–transform–load technology. IJDWM 5(3), 1–27 (2009)
104.
go back to reference Vassiliadis, P., Simitsis, A., Baikousi, E.: A taxonomy of ETL activities. In: DOLAP 2009, ACM 12th International Workshop on Data Warehousing and OLAP, Hong Kong, November 6, 2009, Proceedings, pp. 25–32 (2009) Vassiliadis, P., Simitsis, A., Baikousi, E.: A taxonomy of ETL activities. In: DOLAP 2009, ACM 12th International Workshop on Data Warehousing and OLAP, Hong Kong, November 6, 2009, Proceedings, pp. 25–32 (2009)
105.
go back to reference vom Brocke, J., Sonnenberg, C.: Business process management and business process analysis. In: Information Systems and Information Technology. Computing Handbook, 3rd edn., pp. 26: 1–31 (2014) vom Brocke, J., Sonnenberg, C.: Business process management and business process analysis. In: Information Systems and Information Technology. Computing Handbook, 3rd edn., pp. 26: 1–31 (2014)
106.
go back to reference Vrhovnik, M., Schwarz, H., Radeschütz, S., Mitschang, B.: An overview of SQL support in workflow products. In: Proceedings of ICDE, pp. 1287–1296 (2008) Vrhovnik, M., Schwarz, H., Radeschütz, S., Mitschang, B.: An overview of SQL support in workflow products. In: Proceedings of ICDE, pp. 1287–1296 (2008)
107.
go back to reference Vrhovnik, M., Schwarz, H., Suhre, O., Mitschang, B., Markl, V., Maier, A., Kraft, T.: An approach to optimize data processing in business processes. In: VLDB, pp. 615–626 (2007) Vrhovnik, M., Schwarz, H., Suhre, O., Mitschang, B., Markl, V., Maier, A., Kraft, T.: An approach to optimize data processing in business processes. In: VLDB, pp. 615–626 (2007)
108.
go back to reference Vu, L.H., Hauswirth, M., Aberer, K.: Qos-based service selection and ranking with trust and reputation management. In: Proceedings of the Cooperative Information System Conference (CoopIS05, pp. 466–483 (2005) Vu, L.H., Hauswirth, M., Aberer, K.: Qos-based service selection and ranking with trust and reputation management. In: Proceedings of the Cooperative Information System Conference (CoopIS05, pp. 466–483 (2005)
109.
go back to reference Whrer, A., Brezany, P., Janciak, I., Mehofer, E.: Modeling and optimizing large-scale data flows. Future Gener. Comput. Syst. 31, 12–27 (2014)CrossRef Whrer, A., Brezany, P., Janciak, I., Mehofer, E.: Modeling and optimizing large-scale data flows. Future Gener. Comput. Syst. 31, 12–27 (2014)CrossRef
110.
go back to reference Wohlin, C.: Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering, EASE’14, pp. 38:1–38:10 (2014) Wohlin, C.: Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering, EASE’14, pp. 38:1–38:10 (2014)
111.
go back to reference Yerneni, R., Li, C., Ullman, J.D., Garcia-Molina, H.: Optimizing large join queries in mediation systems. In: ICDT, pp. 348–364 (1999) Yerneni, R., Li, C., Ullman, J.D., Garcia-Molina, H.: Optimizing large join queries in mediation systems. In: ICDT, pp. 348–364 (1999)
112.
go back to reference Zeng, L., Veeravalli, B., Zomaya, A.Y.: An integrated task computation and data management scheduling strategy for workflow applications in cloud environments. J. Netw. Comput. Appl. 50, 39–48 (2015)CrossRef Zeng, L., Veeravalli, B., Zomaya, A.Y.: An integrated task computation and data management scheduling strategy for workflow applications in cloud environments. J. Netw. Comput. Appl. 50, 39–48 (2015)CrossRef
113.
go back to reference Zhou, A.C., He, B., Liu, C.: Monetary cost optimizations for hosting workflow-as-a-service in IaaS clouds. IEEE Trans. Cloud Comput. 4(1), 34–48 (2016)CrossRef Zhou, A.C., He, B., Liu, C.: Monetary cost optimizations for hosting workflow-as-a-service in IaaS clouds. IEEE Trans. Cloud Comput. 4(1), 34–48 (2016)CrossRef
114.
go back to reference Zinn, D., Bowers, S., McPhillips, T., Ludäscher, B.: Scientific workflow design with data assembly lines. In: Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, pp. 14:1–14:10 (2009) Zinn, D., Bowers, S., McPhillips, T., Ludäscher, B.: Scientific workflow design with data assembly lines. In: Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, pp. 14:1–14:10 (2009)
Metadata
Title
The many faces of data-centric workflow optimization: a survey
Authors
Georgia Kougka
Anastasios Gounaris
Alkis Simitsis
Publication date
06-03-2018
Publisher
Springer International Publishing
Published in
International Journal of Data Science and Analytics / Issue 2/2018
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI
https://doi.org/10.1007/s41060-018-0107-0
