
2015 | OriginalPaper | Book chapter

A Framework of Write Optimization on Read-Optimized Out-of-Core Column-Store Databases

Authors: Feng Yu, Wen-Chi Hou

Published in: Database and Expert Systems Applications

Publisher: Springer International Publishing

Abstract

Column-store databases offer faster reads and more efficient data compression than traditional row-based databases. However, optimizing write operations in a column store remains a well-known challenge. Most existing work on write optimization focuses on main-memory column-store databases. In this work, we investigate optimizing write operations (updates and deletions) on out-of-core (OOC, or external-memory) column-store databases. We propose a general framework that works for both normal OOC storage and big data storage such as the Hadoop Distributed File System (HDFS). On normal OOC storage, we propose a data storage format called the Timestamped Binary Association Table (TBAT). Based on TBAT, a new update method, called Asynchronous Out-of-Core Update (AOC Update), is designed to replace the traditional update. On big data storage, we extend TBAT to HDFS and propose the Asynchronous Map-Only Update (AMO Update) to replace the traditional update. Fast selection methods are developed in both contexts to improve data retrieval speed. Extensive experiments show a significant speed improvement when performing write operations on TBAT in both normal and MapReduce environments.
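
The abstract names TBAT and the asynchronous update but does not describe their layout. The following is only a minimal sketch of the general idea of timestamped, append-only writes. It assumes each column entry is a (timestamp, oid, value) triple, that an update appends a new entry instead of rewriting the old one in place, and that a deletion is recorded as an entry with no value. The class `TimestampedColumn`, its method names, and the in-memory list standing in for out-of-core storage are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    timestamp: int          # logical clock value assigned when the entry is written
    oid: int                # object identifier of the row this value belongs to
    value: Optional[int]    # None marks a deletion (assumed tombstone convention)


class TimestampedColumn:
    """Toy append-only column: a write never modifies an earlier entry."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []  # stands in for an on-disk column file
        self.clock = 0                  # hypothetical logical timestamp source

    def _append(self, oid: int, value: Optional[int]) -> None:
        self.clock += 1
        self.entries.append(Entry(self.clock, oid, value))

    def insert(self, oid: int, value: int) -> None:
        self._append(oid, value)

    def update(self, oid: int, value: int) -> None:
        # The update is appended; the stale entry stays where it is.
        self._append(oid, value)

    def delete(self, oid: int) -> None:
        self._append(oid, None)

    def select(self, oid: int) -> Optional[int]:
        """Return the value carried by the newest entry for oid, if any."""
        latest: Optional[Entry] = None
        for e in self.entries:
            if e.oid == oid and (latest is None or e.timestamp > latest.timestamp):
                latest = e
        return None if latest is None else latest.value

    def snapshot(self) -> Dict[int, int]:
        """Resolve every oid to its newest live value, skipping deleted oids."""
        newest: Dict[int, Entry] = {}
        for e in self.entries:
            cur = newest.get(e.oid)
            if cur is None or e.timestamp > cur.timestamp:
                newest[e.oid] = e
        return {oid: e.value for oid, e in newest.items() if e.value is not None}
```

In this sketch a write costs one append, while a read has to resolve the newest timestamp per oid. That trade-off is one plausible reason a scheme of this kind would need the fast selection methods the abstract mentions.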

Metadata
Title
A Framework of Write Optimization on Read-Optimized Out-of-Core Column-Store Databases
Authors
Feng Yu
Wen-Chi Hou
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-22849-5_12
