research-article

The end of slow networks: it's time for a redesign

Authors:
Carsten Binnig

Brown University

Brown University
View Profile

,
Andrew Crotty

Brown University

Brown University
View Profile

,
Alex Galakatos

Brown University

Brown University
View Profile

,
Tim Kraska

Brown University

Brown University
View Profile

,
Erfan Zamanian

Brown University

Brown University
View Profile

Proceedings of the VLDB Endowment Volume 9 Issue 7pp 528–539https://doi.org/10.14778/2904483.2904485

Published:01 March 2016Publication History

Proceedings of the VLDB Endowment

Abstract

The next generation of high-performance networks with remote direct memory access (RDMA) capabilities requires a fundamental rethinking of the design of distributed in-memory DBMSs. These systems are commonly built under the assumption that the network is the primary bottleneck and should be avoided at all costs, but this assumption no longer holds. For instance, with InfiniBand FDR 4×, the bandwidth available to transfer data across the network is in the same ballpark as the bandwidth of one memory channel. Moreover, RDMA transfer latencies continue to rapidly improve as well. In this paper, we first argue that traditional distributed DBMS architectures cannot take full advantage of high-performance networks and suggest a new architecture to address this problem. Then, we discuss initial results from a prototype implementation of our proposed architecture for OLTP and OLAP, showing remarkable performance improvements over existing designs.

References

www.jedec.org/standards-documents/docs/jesd-79-3d.Google Scholar
http://snowflake.net/product/architecture.Google Scholar
Delivering Application Performance with Oracles InfiniBand Tech. http://www.oracle.com/technetwork/server-storage/networking/documentation/o12-020-1653901.pdf, 2012.Google Scholar
Shared Memory Communications over RDMA. http://ibm.com/software/network/commserver/SMCR/, 2013.Google Scholar
Intel Data Direct I/O Technology. http://www.intel.com/content/www/us/en/io/direct-data-i-o.html, 2014.Google Scholar
I. T. Association. InfiniBand Roadmap. http://www.infinibandta.org/, 2013.Google Scholar
S. Babu et al. Massively parallel databases and mapreduce systems. Foundations and Trends in Databases, 2013.Google Scholar
P. Bailis et al. Eventual consistency today: limitations, extensions, and beyond. Comm. of ACM, 2013.Google ScholarDigital Library
C. Balkesen et al. Multi-core, main-memory joins: Sort vs. hash revisited. In VLDB, 2013.Google Scholar
V. Barshai et al. Delivering Continuity and Extreme Capacity with the IBM DB2 pureScale Feature. IBM Redbooks, 2012.Google Scholar
C. Barthels et al. Rack-scale in-memory join processing using RDMA. In SIGMOD, 2015.Google ScholarDigital Library
C. Binnig et al. Distributed snapshot isolation: Global transactions pay globally, local transactions pay locally. VLDB Journal, 2014.Google Scholar
M. Brantner et al. Building a database on S3. In SIGMOD, 2008.Google ScholarDigital Library
D. G. Campbell et al. Extreme scale with full sql language support in microsoft sql azure. In SIGMOD, 2010.Google ScholarDigital Library
J. C. Corbett et al. Spanner: Googles globally distributed database. ACM TOCS, 2013.Google ScholarCross Ref
A. Crotty et al. An Architecture for Compiling UDF-centric Workflows. In VLDB, 2015.Google ScholarDigital Library
C. Curino et al. Schism: a Workload-Driven Approach to Database Replication and Partitioning. In VLDB, 2010.Google ScholarDigital Library
D. J. DeWitt et al. The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng., 1990.Google Scholar
A. Dragojevic et al. FaRM: Fast Remote Memory. In NSDI, 2014.Google Scholar
A. Dragojevic et al. No compromises: distributed transactions with consistency, availability, and performance. In SOSP, 2015.Google ScholarDigital Library
A. J. Elmore et al. Squall: Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases. In SIGMOD, 2015.Google Scholar
S. Elnikety et al. Database replication using generalized snapshot isolation. In SRDS, 2005.Google ScholarDigital Library
F. Färber et al. The SAP HANA Database -- An Architecture Overview. IEEE Data Engineering Bulletin, 2012.Google Scholar
M. Feldman. RoCE: An Ethernet-InfiniBand Love Story. HPC wire, 2010.Google Scholar
P. Frey et al. A spinning join that does not get dizzy. In ICDCS, 2010.Google ScholarDigital Library
N. S. Islam et al. High performance RDMA-based design of HDFS over InfiniBand. In SC, 2012.Google ScholarDigital Library
E. P. C. Jones et al. Low overhead concurrency control for partitioned main memory databases. In SIGMOD, 2010.Google Scholar
J. Jose et al. Memcached design on high performance RDMA capable interconnects. In ICPP, 2011.Google ScholarDigital Library
A. Kalia et al. Using RDMA efficiently for key-value services. In SIGCOMM, 2014.Google ScholarDigital Library
R. Kallman et al. H-store: a high-performance, distributed main memory transaction processing system. In VLDB, 2008.Google ScholarDigital Library
D. Kossmann. The state of the art in distributed query processing. ACM Comput. Surv., 2000.Google ScholarDigital Library
T. Kraska et al. MDCC: multi-data center consistency. In EuroSys, 2013.Google ScholarDigital Library
J. Lee et al. SAP HANA distributed in-memory database system: Transaction, session and metadata management. In ICDE, 2013.Google Scholar
V. Leis et al. Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In SIGMOD, 2014.Google ScholarDigital Library
J. J. Levandoski et al. High Performance Transactions in Deuteronomy. In CIDR, 2015.Google Scholar
Y. Lin et al. Middleware based data replication providing snapshot isolation. In SIGMOD, 2005.Google ScholarDigital Library
Y. Lin et al. Snapshot isolation and integrity constraints in replicated databases. ACM Trans. Database Syst., 2009.Google ScholarDigital Library
S. Loesing et al. On the Design and Scalability of Distributed Shared-Data Databases. In SIGMOD, 2015.Google ScholarDigital Library
X. Lu et al. High-performance design of Hadoop RPC with RDMA over InfiniBand. In ICPP, 2013.Google ScholarDigital Library
P. MacArthur et al. A performance study to guide RDMA programming decisions. In HPCC, 2012.Google ScholarDigital Library
C. Mohan et al. Transaction Management in the R* Distributed Database Management System. In TODS, 1986.Google ScholarDigital Library
J. K. Ousterhout et al. The case for ramcloud. Commun. ACM, 2011.Google ScholarDigital Library
M. T. Ozsu. Principles of Distributed Database Systems. Prentice Hall Press, 3rd edition, 2007.Google ScholarDigital Library
A. Pavlo. On Scalable Transaction Execution in Partitioned Main Memory Database Management Systems. PhD thesis, Brown University, 2014.Google Scholar
A. Pavlo et al. Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In SIGMOD, 2012.Google ScholarDigital Library
O. Polychroniou et al. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In SIGMOD, 2014.Google ScholarDigital Library
O. Polychroniou et al. Track join: distributed joins with minimal network traffic. In SIGMOD, 2014.Google ScholarDigital Library
A. Pruscino. Oracle RAC: Architecture and performance. In SIGMOD, 2003.Google Scholar
A. Quamar et al. SWORD: scalable workload-aware data placement for transactional workloads. In EDBT, 2013.Google ScholarDigital Library
V. Raman et al. DB2 with BLU acceleration: So much more than just a column store. In VLDB, 2013.Google Scholar
S. Ramesh et al. Optimizing Distributed Joins with Bloom Filters. In ICDCIT, 2008.Google ScholarDigital Library
K. Ren, A. Thomson, and D. J. Abadi. Lightweight locking for main memory database systems. In VLDB, 2012.Google ScholarDigital Library
W. Rödiger et al. Locality-sensitive operators for parallel main-memory database clusters. In ICDE, 2014.Google ScholarCross Ref
W. Rödiger et al. High-speed query processing over high-speed networks. In VLDB, 2015.Google ScholarDigital Library
W. Rödiger et al. Flow-join: Adaptive skew handling for distributed joins over high-speed networks. In ICDE, 2016.Google ScholarCross Ref
M. Stonebraker et al. "One Size Fits All": An Idea Whose Time Has Come and Gone. In ICDE, 2005.Google Scholar
H. Subramoni et al. RDMA over Ethernet - A preliminary study. In CLUSTER, 2009.Google ScholarCross Ref
A. Thomson et al. The case for determinism in database systems. In VLDB, 2010.Google ScholarDigital Library
C. Tinnefeld et al. Elastic online analytical processing on RAMCloud. In EDBT, 2013.Google ScholarDigital Library
C. Tinnefeld et al. Parallel join executions in RAMCloud. In Workshops ICDE, 2014.Google ScholarCross Ref
S. Tu et al. Speedy transactions in multicore in-memory databases. In SOSP, 2013.Google ScholarDigital Library
E. Zamanian et al. Locality-aware partitioning in parallel database systems. In SIGMOD, 2015.Google ScholarDigital Library

Index Terms

The end of slow networks: it's time for a redesign
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs

Index terms have been assigned to the content through auto-classification.

Recommendations

A buffer-sizing algorithm for networks on chip using TDMA and credit-based end-to-end flow control
CODES+ISSS '06: Proceedings of the 4th international conference on Hardware/software codesign and system synthesis

When designing a System-on-Chip (SoC) using a Network-on-Chip (NoC), silicon area and power consumption are two key elements to optimize. A dominant part of the NoC area and power consumption is due to the buffers in the Network Interfaces (NIs) needed ...
Read More
Interactive web caching for slow or intermittent networks
ACM DEV-4 '13: Proceedings of the 4th Annual Symposium on Computing for Development

We explore the limitations of existing caching mechanisms in slow networks and propose a new model of web caching designed for developing regions called interactive caching. Unlike conventional caching, interactive caching makes interacting with the ...
Read More
Flattened Butterfly Topology for On-Chip Networks

With the trend towards increasing number of cores in a multicore processors, the on-chip network that connects the cores needs to scale efficiently. In this work, we propose the use of high-radix networks in on-chip networks and describe how the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 9, Issue 7
March 2016
96 pages
ISSN:2150-8097
Editors:
Surajit Chaudhuri
Microsoft Research
,
Jayant Haritsa
Bangalore
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 March 2016
Published in pvldb Volume 9, Issue 7
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 74
  Total Citations
  View Citations
- 472
  Total Downloads
- Downloads (Last 12 months)44
- Downloads (Last 6 weeks)7
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

The end of slow networks: it's time for a redesign

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

A buffer-sizing algorithm for networks on chip using TDMA and credit-based end-to-end flow control

Interactive web caching for slow or intermittent networks

Flattened Butterfly Topology for On-Chip Networks

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

The end of slow networks: it's time for a redesign

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

A buffer-sizing algorithm for networks on chip using TDMA and credit-based end-to-end flow control

Interactive web caching for slow or intermittent networks

Flattened Butterfly Topology for On-Chip Networks

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media