
Instant loading for main memory databases

Authors: Tobias Mühlbauer, Wolf Rödiger, Robert Seilbeck, Angelika Reiser, Alfons Kemper, Thomas Neumann
Affiliation (all authors): Technische Universität München, Munich, Germany

Proceedings of the VLDB Endowment, Volume 6, Issue 14, pp. 1702–1713. https://doi.org/10.14778/2556549.2556555

Published: 1 September 2013


Abstract

eScience and big data analytics applications face the challenge of efficiently evaluating complex queries over vast amounts of structured text data archived in network storage solutions. To analyze such data in traditional disk-based database systems, the data must first be bulk loaded, an operation whose performance largely depends on the wire speed of the data source and the speed of the data sink, i.e., the disk. Because the speed of network adapters and disks has stagnated in the past, loading has become a major bottleneck. The resulting delays are now ubiquitous, since text formats are a preferred storage format for reasons of portability.
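
The dependence on source and sink speed described above can be read as a simple bottleneck model: if the loading stages are fully pipelined, end-to-end throughput is bounded by the slowest stage. The following is a minimal sketch under that assumption; the function names, stage names, and all rates other than the unit conversion are hypothetical illustrations, not measurements from the paper.

```python
# Hypothetical bottleneck model: stage names and rates are illustrative
# assumptions, not measurements from the paper.


def gbit_to_bytes_per_sec(gbit_per_sec: float) -> float:
    """Convert a link rate in Gbit/s to bytes/s (1 Gbit = 10**9 bits, 8 bits per byte)."""
    return gbit_per_sec * 1e9 / 8


def bottleneck(stage_rates: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and its rate in bytes/s.

    Assumes the stages are fully pipelined, so the slowest one bounds the load.
    """
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


def load_seconds(size_bytes: float, stage_rates: dict[str, float]) -> float:
    """Estimated time to push size_bytes through the pipelined stages."""
    _, rate = bottleneck(stage_rates)
    return size_bytes / rate


if __name__ == "__main__":
    source = gbit_to_bytes_per_sec(10)  # a 10 Gbit/s link delivers 1.25 GB/s
    disk_based = {"source": source, "sink (disk)": 0.5e9}  # hypothetical disk rate
    in_memory = {"source": source, "parse + deserialize": 0.8e9}  # hypothetical CPU rate
    for label, stages in [("disk-based", disk_based), ("in-memory", in_memory)]:
        name, rate = bottleneck(stages)
        print(f"{label}: bottleneck = {name}, {rate / 1e9:.2f} GB/s")
```

With the disk removed as a sink, the model shifts the bottleneck to whichever CPU-side stage is slowest, which is the situation the next paragraph addresses.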

But the game has changed: ever-increasing main memory capacities have fostered the development of in-memory database systems, and very fast network infrastructures are on the verge of becoming economical. While hardware limitations for fast loading have disappeared, current approaches for main memory databases fail to saturate the now-available wire speeds of tens of Gbit/s. With Instant Loading, we contribute a novel CSV loading approach that allows scalable bulk loading at wire speed. This is achieved by optimizing all phases of loading for modern superscalar multi-core CPUs. Large main memory capacities and Instant Loading thereby facilitate a very efficient data staging processing model consisting of instantaneous load-work-unload cycles across data archives on a single node. Once data is loaded, updates and queries are efficiently processed with the flexibility, security, and high performance of relational main memory databases.
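
One common way to make CSV parsing scale across cores is to split the input into byte ranges and let each worker resynchronize on the next record delimiter. The sketch below illustrates only that partitioning idea; it is not the paper's implementation, and the function names are hypothetical. It assumes RFC 4180-style records in which newlines never appear inside quoted fields, and it does no quote handling, type conversion, or SIMD delimiter search. Because CPython's global interpreter lock serializes pure-Python parsing, the threads here demonstrate correctness of the partitioning rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(data: bytes, num_chunks: int) -> list[tuple[int, int]]:
    """Split data into roughly equal byte ranges that start on record boundaries.

    Assumption: b"\n" only ever terminates a record (no quoted newlines).
    """
    size = len(data)
    starts = [0]
    for i in range(1, num_chunks):
        nominal = i * size // num_chunks
        nl = data.find(b"\n", nominal)  # resynchronize on the next record end
        starts.append(size if nl == -1 else nl + 1)
    starts.append(size)
    # Adjacent ranges are contiguous, so every record lands in exactly one chunk.
    return [(s, e) for s, e in zip(starts, starts[1:]) if s < e]


def parse_chunk(data: bytes, start: int, end: int) -> list[list[str]]:
    """Parse the records in data[start:end] into field lists (unquoted CSV only)."""
    rows = []
    for line in data[start:end].split(b"\n"):
        if line.endswith(b"\r"):  # tolerate CRLF line endings
            line = line[:-1]
        if line:
            rows.append(line.decode("utf-8").split(","))
    return rows


def parallel_parse(data: bytes, workers: int = 4) -> list[list[str]]:
    """Parse chunks concurrently and concatenate results in input order (workers >= 1)."""
    bounds = chunk_bounds(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: parse_chunk(data, *b), bounds))
    return [row for part in parts for row in part]
```

Splitting only after a newline byte also guarantees that no chunk starts in the middle of a multi-byte UTF-8 character, since b"\n" cannot occur inside one.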


Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines




Published in

Proceedings of the VLDB Endowment, Volume 6, Issue 14, September 2013 (384 pages)
ISSN: 2150-8097
Publisher: VLDB Endowment

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Instant loading for main memory databases

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Instant Apache Sqoop

Redesign the Memory Allocator for Non-Volatile Main Memory

File-Based Memory Management for Non-volatile Main Memory

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Instant loading for main memory databases

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Instant Apache Sqoop

Redesign the Memory Allocator for Non-Volatile Main Memory

File-Based Memory Management for Non-volatile Main Memory

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media