skip to main content
research-article

The end of slow networks: it's time for a redesign

Published:01 March 2016Publication History
Skip Abstract Section

Abstract

The next generation of high-performance networks with remote direct memory access (RDMA) capabilities requires a fundamental rethinking of the design of distributed in-memory DBMSs. These systems are commonly built under the assumption that the network is the primary bottleneck and should be avoided at all costs, but this assumption no longer holds. For instance, with InfiniBand FDR 4×, the bandwidth available to transfer data across the network is in the same ballpark as the bandwidth of one memory channel. Moreover, RDMA transfer latencies continue to rapidly improve as well. In this paper, we first argue that traditional distributed DBMS architectures cannot take full advantage of high-performance networks and suggest a new architecture to address this problem. Then, we discuss initial results from a prototype implementation of our proposed architecture for OLTP and OLAP, showing remarkable performance improvements over existing designs.

References

  1. www.jedec.org/standards-documents/docs/jesd-79-3d.Google ScholarGoogle Scholar
  2. http://snowflake.net/product/architecture.Google ScholarGoogle Scholar
  3. Delivering Application Performance with Oracles InfiniBand Tech. http://www.oracle.com/technetwork/server-storage/networking/documentation/o12-020-1653901.pdf, 2012.Google ScholarGoogle Scholar
  4. Shared Memory Communications over RDMA. http://ibm.com/software/network/commserver/SMCR/, 2013.Google ScholarGoogle Scholar
  5. Intel Data Direct I/O Technology. http://www.intel.com/content/www/us/en/io/direct-data-i-o.html, 2014.Google ScholarGoogle Scholar
  6. I. T. Association. InfiniBand Roadmap. http://www.infinibandta.org/, 2013.Google ScholarGoogle Scholar
  7. S. Babu et al. Massively parallel databases and mapreduce systems. Foundations and Trends in Databases, 2013.Google ScholarGoogle Scholar
  8. P. Bailis et al. Eventual consistency today: limitations, extensions, and beyond. Comm. of ACM, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Balkesen et al. Multi-core, main-memory joins: Sort vs. hash revisited. In VLDB, 2013.Google ScholarGoogle Scholar
  10. V. Barshai et al. Delivering Continuity and Extreme Capacity with the IBM DB2 pureScale Feature. IBM Redbooks, 2012.Google ScholarGoogle Scholar
  11. C. Barthels et al. Rack-scale in-memory join processing using RDMA. In SIGMOD, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Binnig et al. Distributed snapshot isolation: Global transactions pay globally, local transactions pay locally. VLDB Journal, 2014.Google ScholarGoogle Scholar
  13. M. Brantner et al. Building a database on S3. In SIGMOD, 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. D. G. Campbell et al. Extreme scale with full sql language support in microsoft sql azure. In SIGMOD, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. C. Corbett et al. Spanner: Googles globally distributed database. ACM TOCS, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  16. A. Crotty et al. An Architecture for Compiling UDF-centric Workflows. In VLDB, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. C. Curino et al. Schism: a Workload-Driven Approach to Database Replication and Partitioning. In VLDB, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. D. J. DeWitt et al. The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng., 1990.Google ScholarGoogle Scholar
  19. A. Dragojevic et al. FaRM: Fast Remote Memory. In NSDI, 2014.Google ScholarGoogle Scholar
  20. A. Dragojevic et al. No compromises: distributed transactions with consistency, availability, and performance. In SOSP, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. A. J. Elmore et al. Squall: Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases. In SIGMOD, 2015.Google ScholarGoogle Scholar
  22. S. Elnikety et al. Database replication using generalized snapshot isolation. In SRDS, 2005.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. F. Färber et al. The SAP HANA Database -- An Architecture Overview. IEEE Data Engineering Bulletin, 2012.Google ScholarGoogle Scholar
  24. M. Feldman. RoCE: An Ethernet-InfiniBand Love Story. HPC wire, 2010.Google ScholarGoogle Scholar
  25. P. Frey et al. A spinning join that does not get dizzy. In ICDCS, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. N. S. Islam et al. High performance RDMA-based design of HDFS over InfiniBand. In SC, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. E. P. C. Jones et al. Low overhead concurrency control for partitioned main memory databases. In SIGMOD, 2010.Google ScholarGoogle Scholar
  28. J. Jose et al. Memcached design on high performance RDMA capable interconnects. In ICPP, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. A. Kalia et al. Using RDMA efficiently for key-value services. In SIGCOMM, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. Kallman et al. H-store: a high-performance, distributed main memory transaction processing system. In VLDB, 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. D. Kossmann. The state of the art in distributed query processing. ACM Comput. Surv., 2000.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. T. Kraska et al. MDCC: multi-data center consistency. In EuroSys, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. Lee et al. SAP HANA distributed in-memory database system: Transaction, session and metadata management. In ICDE, 2013.Google ScholarGoogle Scholar
  34. V. Leis et al. Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In SIGMOD, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. J. J. Levandoski et al. High Performance Transactions in Deuteronomy. In CIDR, 2015.Google ScholarGoogle Scholar
  36. Y. Lin et al. Middleware based data replication providing snapshot isolation. In SIGMOD, 2005.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Y. Lin et al. Snapshot isolation and integrity constraints in replicated databases. ACM Trans. Database Syst., 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. S. Loesing et al. On the Design and Scalability of Distributed Shared-Data Databases. In SIGMOD, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. X. Lu et al. High-performance design of Hadoop RPC with RDMA over InfiniBand. In ICPP, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. P. MacArthur et al. A performance study to guide RDMA programming decisions. In HPCC, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. C. Mohan et al. Transaction Management in the R* Distributed Database Management System. In TODS, 1986.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. J. K. Ousterhout et al. The case for ramcloud. Commun. ACM, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. M. T. Ozsu. Principles of Distributed Database Systems. Prentice Hall Press, 3rd edition, 2007.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. A. Pavlo. On Scalable Transaction Execution in Partitioned Main Memory Database Management Systems. PhD thesis, Brown University, 2014.Google ScholarGoogle Scholar
  45. A. Pavlo et al. Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In SIGMOD, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. O. Polychroniou et al. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In SIGMOD, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. O. Polychroniou et al. Track join: distributed joins with minimal network traffic. In SIGMOD, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. A. Pruscino. Oracle RAC: Architecture and performance. In SIGMOD, 2003.Google ScholarGoogle Scholar
  49. A. Quamar et al. SWORD: scalable workload-aware data placement for transactional workloads. In EDBT, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. V. Raman et al. DB2 with BLU acceleration: So much more than just a column store. In VLDB, 2013.Google ScholarGoogle Scholar
  51. S. Ramesh et al. Optimizing Distributed Joins with Bloom Filters. In ICDCIT, 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. K. Ren, A. Thomson, and D. J. Abadi. Lightweight locking for main memory database systems. In VLDB, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. W. Rödiger et al. Locality-sensitive operators for parallel main-memory database clusters. In ICDE, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  54. W. Rödiger et al. High-speed query processing over high-speed networks. In VLDB, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. W. Rödiger et al. Flow-join: Adaptive skew handling for distributed joins over high-speed networks. In ICDE, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  56. M. Stonebraker et al. "One Size Fits All": An Idea Whose Time Has Come and Gone. In ICDE, 2005.Google ScholarGoogle Scholar
  57. H. Subramoni et al. RDMA over Ethernet - A preliminary study. In CLUSTER, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  58. A. Thomson et al. The case for determinism in database systems. In VLDB, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. C. Tinnefeld et al. Elastic online analytical processing on RAMCloud. In EDBT, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. C. Tinnefeld et al. Parallel join executions in RAMCloud. In Workshops ICDE, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  61. S. Tu et al. Speedy transactions in multicore in-memory databases. In SOSP, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. E. Zamanian et al. Locality-aware partitioning in parallel database systems. In SIGMOD, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. The end of slow networks: it's time for a redesign
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image Proceedings of the VLDB Endowment
      Proceedings of the VLDB Endowment  Volume 9, Issue 7
      March 2016
      96 pages
      ISSN:2150-8097
      Issue’s Table of Contents

      Publisher

      VLDB Endowment

      Publication History

      • Published: 1 March 2016
      Published in pvldb Volume 9, Issue 7

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader