2017 | OriginalPaper | Book chapter

Modern Column Stores for Big Data Processing

Written by: K. T. Sridhar

Published in: Big Data Analytics

Publisher: Springer International Publishing

Abstract

The advent of MapReduce/Hadoop and NoSQL databases undermined the primacy of SQL relational databases for data processing. Pioneering research on MonetDB and C-Store opened up the world of column stores, which retain the SQL model but use a different store and engine for performance gains. The emergence of pay-by-use clouds, and of MPP versions of column stores in the cloud, eliminated the scale-out issues of row stores. Data mining researchers have also shown that SQL on a parallel, columnar database could be a candidate for Big Data analytics. In this survey, written for a tutorial, we trace the technology evolution and the history of the fall of row stores and the rise of column stores, delving into the architectural details of column DBs from academia and industry.

Metadata
Title
Modern Column Stores for Big Data Processing
Written by
K. T. Sridhar
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-72413-3_8
