
2016 | OriginalPaper | Book chapter

Big Data Management in the Cloud: Evolution or Crossroad?

Abstract

In this paper, we provide a synthetic and comprehensive state of the art of big data management in cloud environments. To this end, data management approaches based on parallel systems and on cloud systems (e.g. MapReduce) are reviewed and compared with respect to software requirements (e.g. data independence, software reuse), high performance, scalability, elasticity, and data availability. For the proposed cloud systems, we discuss the evolution of their data manipulation languages and draw some lessons that should be exploited to ensure the viability of the next generation of large-scale data management systems for big data applications.
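To make the MapReduce programming model mentioned above concrete, here is a minimal single-process sketch of its three phases: map, shuffle (grouping intermediate pairs by key), and reduce. It is an illustration only, not the authors' code or Hadoop's API: the function `map_reduce` and the sample `sales` data are hypothetical, and a real cluster would additionally partition the input across nodes, move intermediate data over the network, and handle failures. The example computes what SQL would express as `SELECT region, SUM(amount) FROM sales GROUP BY region`.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

R = TypeVar("R")  # input record type
V = TypeVar("V")  # intermediate value type
O = TypeVar("O")  # reduce output type


def map_reduce(
    records: Iterable[R],
    mapper: Callable[[R], Iterable[Tuple[Hashable, V]]],
    reducer: Callable[[Hashable, List[V]], O],
) -> Dict[Hashable, O]:
    """Hypothetical in-memory MapReduce: no partitioning, no fault tolerance."""
    groups: Dict[Hashable, List[V]] = defaultdict(list)
    # Map phase: each record emits zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect all values sharing a key (a network exchange on a cluster).
            groups[key].append(value)
    # Reduce phase: fold each key's list of values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical (region, amount) records.
sales = [("EU", 10), ("US", 5), ("EU", 7), ("ASIA", 3)]
totals = map_reduce(
    sales,
    mapper=lambda rec: [(rec[0], rec[1])],     # emit (region, amount)
    reducer=lambda key, amounts: sum(amounts),  # SUM(amount) per region
)
print(totals)  # → {'EU': 17, 'US': 5, 'ASIA': 3}
```

Unlike SQL's declarative `GROUP BY`, the grouping key and the aggregation here live in user-written functions, so the same query logic must be re-coded for each job rather than stated once and optimized by the system.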

Metadata
Title
Big Data Management in the Cloud: Evolution or Crossroad?
Authors
Abdelkader Hameurlain
Franck Morvan
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-34099-9_2

Premium Partner