ABSTRACT
This paper describes the design and implementation of HERD, a key-value system designed to make the best use of an RDMA network. Unlike prior RDMA-based key-value systems, HERD focuses its design on reducing network round trips while using efficient RDMA primitives; the result is substantially lower latency, and throughput that saturates modern, commodity RDMA hardware.
HERD makes two unconventional design decisions. First, it does not use RDMA reads, despite the allure of operations that bypass the remote CPU entirely. Second, it uses a mix of RDMA and messaging verbs, despite the conventional wisdom that the messaging primitives are slow. A HERD client writes its request into the server's memory; the server computes the reply. This design uses a single round trip for all requests and supports up to 26 million key-value operations per second with 5 μs average latency. Notably, for small key-value items, our full system throughput is similar to native RDMA read throughput and is over 2x higher than that of recent RDMA-based key-value systems. We believe that HERD further serves as an effective template for the construction of RDMA-based datacenter services.
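The single-round-trip request path described above can be illustrated with a small in-process simulation. This is a minimal sketch, not the authors' implementation: it uses no RDMA hardware or verbs library, and the names `Server`, `Client`, `region`, `outbox`, and `slots_per_client` are all hypothetical. Writing into a Python list stands in for the client's RDMA write into server memory. The abstract does not spell out how the reply travels, so the sketch assumes it is delivered as a message, modeled here as a per-client queue.

```python
from collections import deque


class Server:
    """Toy stand-in for a server with a client-writable request region."""

    def __init__(self, num_clients, slots_per_client=2):
        # One fixed set of request slots per client. Clients fill these
        # directly, which is the analogue of an RDMA write (assumption).
        self.region = [[None] * slots_per_client for _ in range(num_clients)]
        self.store = {}
        # Replies are queued per client, the analogue of a message send.
        self.outbox = [deque() for _ in range(num_clients)]

    def poll(self):
        """Scan the request region, execute each request, queue a reply."""
        for cid, slots in enumerate(self.region):
            for i, req in enumerate(slots):
                if req is None:
                    continue
                op, key, value = req
                if op == "PUT":
                    self.store[key] = value
                    reply = "OK"
                else:  # treat anything else as a GET
                    reply = self.store.get(key)
                slots[i] = None  # free the slot for reuse
                self.outbox[cid].append(reply)


class Client:
    def __init__(self, cid, server):
        self.cid = cid
        self.server = server
        self.next_slot = 0

    def request(self, op, key, value=None):
        slots = self.server.region[self.cid]
        slot = self.next_slot
        self.next_slot = (slot + 1) % len(slots)
        slots[slot] = (op, key, value)  # the "write" of the request
        # A real server would poll on its own cores; the simulation
        # drives it from the client to stay single-threaded.
        self.server.poll()
        return self.server.outbox[self.cid].popleft()


server = Server(num_clients=2)
alice, bob = Client(0, server), Client(1, server)
alice.request("PUT", "k", "v")
print(bob.request("GET", "k"))  # prints: v
```

Each call to `request` involves exactly one write toward the server and one reply back, for both GET and PUT. The server CPU does the lookup itself, so the client never has to traverse an index remotely.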
Using RDMA efficiently for key-value services. SIGCOMM '14.