Published in: Datenbank-Spektrum 1/2013

1 March 2013 | Focus Article

Compilation of Query Languages into MapReduce

Authors: Caetano Sauer, Theo Härder


Abstract

The introduction of MapReduce as a tool for Big Data analytics, combined with the new requirements of emerging application scenarios such as Web 2.0 and scientific computing, has motivated the development of data processing languages that are more flexible and more widely applicable than SQL. In the Big Data context, we discuss the points on which SQL is considered too restrictive. Furthermore, we provide a qualitative evaluation of how recent query languages overcome these restrictions. Having established the desired characteristics of a query language, we provide an abstract description of the compilation into the MapReduce programming model, which, up to minor variations, is essentially the same in all approaches. Given the requirements of query processing, we introduce simple generalizations of the model, which allow the reuse of well-established query evaluation techniques, and discuss strategies to generate optimized MapReduce plans.

Footnotes
1
Although cycles are necessary to model more “exotic” operations such as recursive queries.
 
2
Newer revisions of SQL actually include the types MULTISET and TUPLE, but they are not used transparently as the type of stored tables and intermediate results.
 
3
Note that the MapReduce authors ambiguously refer to map as the first-order function that, in the functional programming setting, is actually a parameter to the higher-order function map.
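
The distinction drawn in this footnote can be illustrated with a minimal Python sketch. The names `higher_order_map` and `tokenize` are hypothetical and chosen only for illustration; they are not taken from the article or from any MapReduce implementation. The sketch assumes a word-count style job in which the user-supplied function emits (word, 1) pairs.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def higher_order_map(f: Callable[[A], B], xs: Iterable[A]) -> Iterator[B]:
    """Higher-order map: applies the function f to every element of xs."""
    for x in xs:
        yield f(x)


def tokenize(line: str) -> List[Tuple[str, int]]:
    """First-order function (hypothetical example): what a MapReduce user
    would write and what the MapReduce authors call "map"."""
    return [(word, 1) for word in line.split()]


# The user-supplied function is only a parameter of the higher-order map.
pairs = list(higher_order_map(tokenize, ["a b", "b c"]))
```

Here `tokenize` plays the role of the user-written "map" function, while `higher_order_map` corresponds to map in the functional programming sense.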
 
4
Note that if we consider the implementation of Shuffle as a two-phase step (first sorting each map task output locally, then globally merging all partitions), the first phase can start before all map tasks are completed. Nevertheless, because the merge phase must still wait for every map task to finish, the process as a whole remains synchronous.
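
The two phases described in this footnote can be sketched in a few lines of Python. This is a simplified, single-reducer illustration under stated assumptions, not the implementation used by any particular MapReduce system: partitioning of keys across several reducers is omitted, and the function names `local_sort` and `global_merge` are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def local_sort(map_output):
    """Phase 1: sort the output of a single map task by key.
    This can run as soon as that task finishes, independently of the others."""
    return sorted(map_output, key=itemgetter(0))


def global_merge(sorted_runs):
    """Phase 2: merge all locally sorted runs and group the values by key.
    This needs every run as input, so it must wait for all map tasks."""
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(merged, key=itemgetter(0))]


# Hypothetical outputs of two map tasks, as (key, value) pairs.
map_outputs = [[("b", 1), ("a", 1)], [("c", 1), ("a", 1)]]
runs = [local_sort(output) for output in map_outputs]
grouped = global_merge(runs)
```

Each call to `local_sort` depends only on one map task, whereas `global_merge` takes all runs as arguments, which mirrors why the overall step stays synchronous.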
 
Metadata
Title: Compilation of Query Languages into MapReduce
Authors: Caetano Sauer, Theo Härder
Publication date: 1 March 2013
Publisher: Springer-Verlag
Published in: Datenbank-Spektrum, Issue 1/2013
Print ISSN: 1618-2162
Electronic ISSN: 1610-1995
DOI: https://doi.org/10.1007/s13222-012-0112-8
