Skip to main content

2017 | OriginalPaper | Buchkapitel

Optimal Task Ordering in Chain Data Flows: Exploring the Practicality of Non-scalable Solutions

verfasst von : Georgia Kougka, Anastasios Gounaris

Erschienen in: Big Data Analytics and Knowledge Discovery

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Modern data flows generalize traditional Extract-Transform-Load and data integration workflows in order to enable end-to-end data processing and analytics. The more complex they become, the more pressing the need for automated optimization solutions. Optimizing data flows comes in several forms, among which, optimal task ordering is one of the most challenging ones. We take a practical approach; motivated by real-world examples, such as those captured by the TPC-DI benchmark, we argue that exhaustive non-scalable solutions are indeed a valid choice for chain flows. Our contribution is that we thoroughly discuss the three main directions for exhaustive enumeration of task ordering alternatives, namely backtracking, dynamic programming and topological sorting, and we provide concrete evidence up to which size and level of flexibility of chain flows they can be applied.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
An abstract of these ideas, without considering TPC-DI, have appeared in [11] in less than a page.
 
Literatur
1.
Zurück zum Zitat Agrawal, K., Benoit, A., Dufossé, F., Robert, Y.: Mapping filtering streaming applications. Algorithmica 62(1–2), 258–308 (2012)MathSciNetCrossRefMATH Agrawal, K., Benoit, A., Dufossé, F., Robert, Y.: Mapping filtering streaming applications. Algorithmica 62(1–2), 258–308 (2012)MathSciNetCrossRefMATH
2.
Zurück zum Zitat Burge, J., Munagala, K., Srivastava, U.: Ordering pipelined query operators with precedence constraints. Technical report 2005–40, Stanford InfoLab (2005) Burge, J., Munagala, K., Srivastava, U.: Ordering pipelined query operators with precedence constraints. Technical report 2005–40, Stanford InfoLab (2005)
3.
Zurück zum Zitat Chaudhuri, S.: An overview of query optimization in relational systems. In: PODS, pp. 34–43 (1998) Chaudhuri, S.: An overview of query optimization in relational systems. In: PODS, pp. 34–43 (1998)
4.
Zurück zum Zitat Chaudhuri, S., Dayal, U., Narasayya, V.: An overview of business intelligence technology. Commun. ACM 54, 88–98 (2011)CrossRef Chaudhuri, S., Dayal, U., Narasayya, V.: An overview of business intelligence technology. Commun. ACM 54, 88–98 (2011)CrossRef
5.
Zurück zum Zitat Florescu, D., Levy, A., Manolescu, I., Suciu, D.: Query optimization in the presence of limited access patterns. In: SIGMOD, pp. 311–322. ACM (1999) Florescu, D., Levy, A., Manolescu, I., Suciu, D.: Query optimization in the presence of limited access patterns. In: SIGMOD, pp. 311–322. ACM (1999)
6.
Zurück zum Zitat Grehant, X., Demeure, I., Jarp, S.: A survey of task mapping on production grids. ACM Comput. Surv. 45(3), 37:1–37:25 (2013)CrossRefMATH Grehant, X., Demeure, I., Jarp, S.: A survey of task mapping on production grids. ACM Comput. Surv. 45(3), 37:1–37:25 (2013)CrossRefMATH
7.
Zurück zum Zitat Halasipuram, R., Deshpande, P.M., Padmanabhan, S.: Determining essential statistics for cost based optimization of an ETL workflow. In: EDBT, pp. 307–318 (2014) Halasipuram, R., Deshpande, P.M., Padmanabhan, S.: Determining essential statistics for cost based optimization of an ETL workflow. In: EDBT, pp. 307–318 (2014)
8.
Zurück zum Zitat Hueske, F., Peters, M., Sax, M., Rheinländer, A., Bergmann, R., Krettek, A., Tzoumas, K.: Opening the black boxes in data flow optimization. PVLDB 5(11), 1256–1267 (2012) Hueske, F., Peters, M., Sax, M., Rheinländer, A., Bergmann, R., Krettek, A., Tzoumas, K.: Opening the black boxes in data flow optimization. PVLDB 5(11), 1256–1267 (2012)
9.
Zurück zum Zitat Ioannidis, Y.E.: Query optimization. ACM Comput. Surv. 28(1), 121–123 (1996)CrossRef Ioannidis, Y.E.: Query optimization. ACM Comput. Surv. 28(1), 121–123 (1996)CrossRef
10.
Zurück zum Zitat Jovanovic, P., Romero, O., Abelló, A.: A unified view of data-intensive flows in business intelligence systems: a survey. T. Large-Scale Data Knowl. Centered Syst. 29, 66–107 (2016) Jovanovic, P., Romero, O., Abelló, A.: A unified view of data-intensive flows in business intelligence systems: a survey. T. Large-Scale Data Knowl. Centered Syst. 29, 66–107 (2016)
11.
Zurück zum Zitat Kougka, G., Gounaris, A.: Optimization of data-intensive flows: is it needed? Is it solved? In: DOLAP, pp. 95–98 (2014) Kougka, G., Gounaris, A.: Optimization of data-intensive flows: is it needed? Is it solved? In: DOLAP, pp. 95–98 (2014)
12.
Zurück zum Zitat Kougka, G., Gounaris, A., Simitsis, A.: The many faces of data-centric workflow optimization: a survey. CoRR, abs/1701.07723 (2017) Kougka, G., Gounaris, A., Simitsis, A.: The many faces of data-centric workflow optimization: a survey. CoRR, abs/1701.07723 (2017)
13.
Zurück zum Zitat Ogasawara, E.S., de Oliveira, D., Valduriez, P., Dias, J., Porto, F., Mattoso, M.: An algebraic approach for data-centric scientific workflows. PVLDB 4, 1328–1339 (2011) Ogasawara, E.S., de Oliveira, D., Valduriez, P., Dias, J., Porto, F., Mattoso, M.: An algebraic approach for data-centric scientific workflows. PVLDB 4, 1328–1339 (2011)
14.
Zurück zum Zitat Poess, M., Rabl, T., Caufield, B.: TPC-DI: the first industry benchmark for data integration. PVLDB 7(13), 1367–1378 (2014) Poess, M., Rabl, T., Caufield, B.: TPC-DI: the first industry benchmark for data integration. PVLDB 7(13), 1367–1378 (2014)
15.
Zurück zum Zitat Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD, pp. 23–34 (1979) Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD, pp. 23–34 (1979)
16.
Zurück zum Zitat Simitsis, A., Vassiliadis, P., Sellis, T.K.: State-space optimization of ETL workflows. IEEE Trans. Knowl. Data Eng. 17(10), 1404–1419 (2005)CrossRef Simitsis, A., Vassiliadis, P., Sellis, T.K.: State-space optimization of ETL workflows. IEEE Trans. Knowl. Data Eng. 17(10), 1404–1419 (2005)CrossRef
17.
Zurück zum Zitat Simitsis, A., Wilkinson, K., Castellanos, M., Dayal, U.: Optimizing analytic data flows for multiple execution engines. In: SIGMOD, pp. 829–840 (2012) Simitsis, A., Wilkinson, K., Castellanos, M., Dayal, U.: Optimizing analytic data flows for multiple execution engines. In: SIGMOD, pp. 829–840 (2012)
18.
Zurück zum Zitat Srivastava, U., Munagala, K., Widom, J., Motwani, R.: Query optimization over web services. In: Proceedings of the 32nd International Conference on Very Large Data Bases VLDB, pp. 355–366 (2006) Srivastava, U., Munagala, K., Widom, J., Motwani, R.: Query optimization over web services. In: Proceedings of the 32nd International Conference on Very Large Data Bases VLDB, pp. 355–366 (2006)
19.
Zurück zum Zitat Varol, Y.L., Rotem, D.: An algorithm to generate all topological sorting arrangements. Comput. J. 24(1), 83–84 (1981)CrossRefMATH Varol, Y.L., Rotem, D.: An algorithm to generate all topological sorting arrangements. Comput. J. 24(1), 83–84 (1981)CrossRefMATH
20.
Zurück zum Zitat Yerneni, R., Li, C., Ullman, J., Garcia-Molina, H.: Optimizing large join queries in mediation systems. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 348–364. Springer, Heidelberg (1999). doi:10.1007/3-540-49257-7_22 CrossRef Yerneni, R., Li, C., Ullman, J., Garcia-Molina, H.: Optimizing large join queries in mediation systems. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 348–364. Springer, Heidelberg (1999). doi:10.​1007/​3-540-49257-7_​22 CrossRef
21.
Zurück zum Zitat Zinn, D., Bowers, S., McPhillips, T., Ludäscher, B.: Scientific workflow design with data assembly lines. In: Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, WORKS 2009, pp. 14:1–14:10. ACM (2009) Zinn, D., Bowers, S., McPhillips, T., Ludäscher, B.: Scientific workflow design with data assembly lines. In: Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, WORKS 2009, pp. 14:1–14:10. ACM (2009)
Metadaten
Titel
Optimal Task Ordering in Chain Data Flows: Exploring the Practicality of Non-scalable Solutions
verfasst von
Georgia Kougka
Anastasios Gounaris
Copyright-Jahr
2017
DOI
https://doi.org/10.1007/978-3-319-64283-3_2