Abstract
The next generation of high-performance networks with remote direct memory access (RDMA) capabilities requires a fundamental rethinking of the design of distributed in-memory DBMSs. These systems are commonly built under the assumption that the network is the primary bottleneck and that network communication should be avoided at all costs, but this assumption no longer holds. For instance, with InfiniBand FDR 4×, the bandwidth available to transfer data across the network is in the same ballpark as the bandwidth of one memory channel. Moreover, RDMA transfer latencies continue to improve rapidly. In this paper, we first argue that traditional distributed DBMS architectures cannot take full advantage of high-performance networks and suggest a new architecture to address this problem. Then, we discuss initial results from a prototype implementation of our proposed architecture for OLTP and OLAP, showing remarkable performance improvements over existing designs.
Index Terms
- The end of slow networks: it's time for a redesign