research-article

Public Access

NetCache: Balancing Key-Value Stores with Fast In-Network Caching

Authors:
Xin Jin

Johns Hopkins University

Johns Hopkins University
View Profile

,
Xiaozhou Li

Barefoot Networks

Barefoot Networks
View Profile

,
Haoyu Zhang

Princeton University

Princeton University
View Profile

,
Robert Soulé

Barefoot Networks and Università della Svizzera italiana

Barefoot Networks and Università della Svizzera italiana
View Profile

,
Jeongkeun Lee

Barefoot Networks

Barefoot Networks
View Profile

,
Nate Foster

Barefoot Networks and Cornell University

Barefoot Networks and Cornell University
View Profile

,
Changhoon Kim

Barefoot Networks

Barefoot Networks
View Profile

,
Ion Stoica

UC Berkeley

UC Berkeley
View Profile

SOSP '17: Proceedings of the 26th Symposium on Operating Systems PrinciplesOctober 2017Pages 121–136https://doi.org/10.1145/3132747.3132764

Published:14 October 2017Publication History

SOSP '17: Proceedings of the 26th Symposium on Operating Systems Principles

Pages 121–136

ABSTRACT

We present NetCache, a new key-value store architecture that leverages the power and flexibility of new-generation programmable switches to handle queries on hot items and balance the load across storage nodes. NetCache provides high aggregate throughput and low latency even under highly-skewed and rapidly-changing workloads. The core of NetCache is a packet-processing pipeline that exploits the capabilities of modern programmable switch ASICs to efficiently detect, index, cache and serve hot key-value items in the switch data plane. Additionally, our solution guarantees cache coherence with minimal overhead. We implement a NetCache prototype on Barefoot Tofino switches and commodity servers and demonstrate that a single switch can process 2+ billion queries per second for 64K items with 16-byte keys and 128-byte values, while only consuming a small portion of its hardware resources. To the best of our knowledge, this is the first time that a sophisticated application-level functionality, such as in-network caching, has been shown to run at line rate on programmable switches. Furthermore, we show that NetCache improves the throughput by 3-10x and reduces the latency of up to 40% of queries by 50%, for high-performance, in-memory key-value stores.

Supplemental Material

netcache.mp4

mp4

2 GB

Download

References

David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan. 2009. FAWN: A Fast Array of Wimpy Nodes. In ACM SOSP. Google ScholarDigital Library
Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. 2012. Workload Analysis of a Large-scale Key-value Store. In ACM SIGMETRICS. Google ScholarDigital Library
Barefoot. 2017. Barefoot Capilano. (2017). https://www.barefootnetworks.com/technology/#capilano.Google Scholar
Barefoot. 2017. Barefoot Tofino. (2017). https://www.barefootnetworks.com/technology/#tofino.Google Scholar
M. Berezecki, E. Frachtenberg, M. Paleczny, and K. Steele. 2011. Many-core Key-value Store. In IEEE IGCC. Google ScholarDigital Library
Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, and David Walker. 2014. P4: Programming Protocol-independent Packet Processors. ACM SIGCOMM CCR (July 2014). Google ScholarDigital Library
Broadcom. 2017. Broadcom Tomahawk II. (2017). https://www.broadcom.com/.Google Scholar
Andrei Broder and Michael Mitzenmacher. 2004. Network applications of Bloom filters: A survey. Internet mathematics (2004).Google Scholar
Cavium. 2017. Cavium XPliant. (2017). https://www.cavium.com/.Google Scholar
Yue Cheng, Aayush Gupta, and Ali R. Butt. 2015. An In-memory Object Caching Framework with Adaptive Load Balancing. In EuroSys. Google ScholarDigital Library
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking Cloud Serving Systems with YCSB. In ACM SOCC. Google ScholarDigital Library
Graham Cormode and S. Muthukrishnan. 2005. An improved data stream summary: The Count-Min sketch and its applications. Journal of Algorithms (April 2005). Google ScholarDigital Library
Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica. 2001. Wide-area Cooperative Storage with CFS. In ACM SOSP. Google ScholarDigital Library
Jeffrey Dean and Luiz André Barroso. 2013. The Tail at Scale. ACM CACM (February 2013). Google ScholarDigital Library
Aleksandar Dragojević, Dushyanth Narayanan, Miguel Castro, and Orion Hodson. 2014. FaRM: Fast Remote Memory. In USENIX NSDI. Google ScholarDigital Library
Bin Fan, David G. Andersen, and Michael Kaminsky. 2013. MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing. In USENIX NSDI. Google ScholarDigital Library
Bin Fan, Hyeontaek Lim, David G. Andersen, and Michael Kaminsky. 2011. Small Cache, Big Effect: Provable Load Balancing for Randomly Partitioned Cluster Services. In ACM SOCC. Google ScholarDigital Library
Jim Gray, Prakash Sundaresan, Susanne Englert, Ken Baclawski, and Peter J. Weinberger. 1994. Quickly Generating Billion-record Synthetic Databases. In ACM SIGMOD. Google ScholarDigital Library
Qi Huang, Helga Gudmundsdottir, Ymir Vigfusson, Daniel A. Freedman, Ken Birman, and Robbert van Renesse. 2014. Characterizing Load Imbalance in Real-World Networked Caches. In ACM SIGCOMM HotNets Workshop. Google ScholarDigital Library
Intel. 2017. Intel Data Plane Development Kit (DPDK). (2017). http://dpdk.org/.Google Scholar
Jaeyeon Jung, Balachander Krishnamurthy, and Michael Rabinovich. 2002. Flash Crowds and Denial of Service Attacks: Characterization and Implications for CDNs and Web Sites. In WWW. Google ScholarDigital Library
Anuj Kalia, Michael Kaminsky, and David G Andersen. 2014. Using RDMA efficiently for key-value services. In ACM SIGCOMM. Google ScholarDigital Library
Anuj Kalia, Michael Kaminsky, and David G. Andersen. 2016. Design Guidelines for High Performance RDMA Systems. In USENIX ATC. Google ScholarDigital Library
David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, and Daniel Lewin. 1997. Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web. In ACM STOC. Google ScholarDigital Library
Markus Klems, Adam Silberstein, Jianjun Chen, Masood Mortazavi, Sahaya Andrews Albert, P.P.S. Narayan, Adwait Tumbde, and Brian Cooper. 2012. The Yahoo!: Cloud Datastore Load Balancer. In CloudDB. Google ScholarDigital Library
Sheng Li, Hyeontaek Lim, Victor W Lee, Jung Ho Ahn, Anuj Kalia, Michael Kaminsky, David G Andersen, O Seongil, Sukhan Lee, and Pradeep Dubey. 2015. Architecting to achieve a billion requests per second throughput on a single key-value store server platform. In ISCA. Google ScholarDigital Library
Xiaozhou Li, David G Andersen, Michael Kaminsky, and Michael J Freedman. 2014. Algorithmic improvements for fast concurrent cuckoo hashing. In EuroSys. Google ScholarDigital Library
Xiaozhou Li, Raghav Sethi, Michael Kaminsky, David G. Andersen, and Michael J. Freedman. 2016. Be Fast, Cheap and in Control with SwitchKV. In USENIX NSDI. Google ScholarDigital Library
Hyeontaek Lim, Bin Fan, David G. Andersen, and Michael Kaminsky. 2011. SILT: A Memory-efficient, High-performance Key-value Store. In ACM SOSP. Google ScholarDigital Library
Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky. 2014. MICA: A Holistic Approach to Fast In-memory Key-value Storage. In USENIX NSDI. Google ScholarDigital Library
Ming Liu, Liang Luo, Jacob Nelson, Luis Ceze, Arvind Krishnamurthy, and Kishore Atreya. 2017. IncBricks: Toward In-Network Computation with an In-Network Cache. In ACM ASPLOS. Google ScholarDigital Library
Memcached. 2017. Memcached key-value store. (2017). https://memcached.org/.Google Scholar
Michael Mitzenmacher. 2001. The Power of Two Choices in Randomized Load Balancing. IEEE Transactions on Parallel and Distributed Systems (October 2001). Google ScholarDigital Library
Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani. 2013. Scaling Memcache at Facebook. In USENIX NSDI. Google ScholarDigital Library
NoviFlow. 2017. NoviFlow NoviSwitch. (2017). http://noviflow.com/products/noviswitch/.Google Scholar
John Ousterhout, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro, Seo Jin Park, Henry Qin, Mendel Rosenblum, Stephen Rumble, Ryan Stutsman, and Stephen Yang. 2015. The RAMCloud Storage System. ACM Transactions on Computer Systems (August 2015). Google ScholarDigital Library
K. V. Rashmi, Mosharaf Chowdhury, Jack Kosaian, Ion Stoica, and Kannan Ramchandran. 2016. EC-Cache: Load-Balanced, Low-Latency Cluster Caching with Online Erasure Coding. In USENIX OSDI. Google ScholarDigital Library
Redis. 2017. Redis data structure store. (2017). https://redis.io/.Google Scholar
Redis. 2017. Using Redis as an LRU cache. (2017). https://redis.io/topics/lru-cache.Google Scholar
Rebecca Taft, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J. Elmore, Ashraf Aboulnaga, Andrew Pavlo, and Michael Stonebraker. 2014. E-Store: Fine-grained Elastic Partitioning for Distributed Transaction Processing Systems. VLDB (November 2014). Google ScholarDigital Library
TommyDS. 2017. TommyDS C library. (2017). http//:www.tommyds.it/.Google Scholar
Vijay Vasudevan, Michael Kaminsky, and David G. Andersen. 2012. Using Vector Interfaces to Deliver Millions of IOPS from a Networked Key-value Storage Server. In ACM SOCC. Google ScholarDigital Library
Venkateshwaran Venkataramani, Zach Amsden, Nathan Bronson, George Cabrera III, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Jeremy Hoon, Sachin Kulkarni, Nathan Lawrence, Mark Marchukov, Dmitri Petrov, and Lovro Puzar. 2012. TAO: How Facebook Serves the Social Graph. In ACM SIGMOD. Google ScholarDigital Library

Index Terms

NetCache: Balancing Key-Value Stores with Fast In-Network Caching

Recommendations

Criticality aware tiered cache hierarchy: a fundamental relook at multi-level cache hierarchies
ISCA '18: Proceedings of the 45th Annual International Symposium on Computer Architecture

On-die caches are a popular method to help hide the main memory latency. However, it is difficult to build large caches without substantially increasing their access latency, which in turn hurts performance. To overcome this difficulty, on-die caches ...
Read More
FlatLSM: Write-Optimized LSM-Tree for PM-Based KV Stores
The Log-Structured Merge Tree (LSM-Tree) is widely used in key-value (KV) stores because of its excwrite performance. But LSM-Tree-based KV stores still have the overhead of write-ahead log and write stall caused by slow L₀ flush and L₀-L₁ compaction. New ...
Read More
PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees
SOSP '17: Proceedings of the 26th Symposium on Operating Systems Principles

Key-value stores such as LevelDB and RocksDB offer excellent write throughput, but suffer high write amplification. The write amplification problem is due to the Log-Structured Merge Trees data structure that underlies these key-value stores. To remedy ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in

SOSP '17: Proceedings of the 26th Symposium on Operating Systems Principles
October 2017
677 pages
ISBN:9781450350853
DOI:10.1145/3132747

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 October 2017
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Caching
Key-value stores
Programmable switches
Qualifiers
- research-article
- Research
- Refereed limited
Conference

Acceptance Rates
Overall Acceptance Rate131of716submissions,18%
Upcoming Conference
SOSP '24

Sponsor:

sigops

ACM SIGOPS 29th Symposium on Operating Systems Principles

November 5 - 8, 2024

Austin , TX , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 357
  Total Citations
  View Citations
- 8,005
  Total Downloads
- Downloads (Last 12 months)1,414
- Downloads (Last 6 weeks)202
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

NetCache: Balancing Key-Value Stores with Fast In-Network Caching

SOSP '17: Proceedings of the 26th Symposium on Operating Systems Principles

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Criticality aware tiered cache hierarchy: a fundamental relook at multi-level cache hierarchies

FlatLSM: Write-Optimized LSM-Tree for PM-Based KV Stores

PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

NetCache: Balancing Key-Value Stores with Fast In-Network Caching

SOSP '17: Proceedings of the 26th Symposium on Operating Systems Principles

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Criticality aware tiered cache hierarchy: a fundamental relook at multi-level cache hierarchies

FlatLSM: Write-Optimized LSM-Tree for PM-Based KV Stores

PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media