research-article

Understanding and mitigating the impact of load imbalance in the memory caching tier

Authors:
Yu-Ju Hong

Purdue University

Purdue University
View Profile

,
Mithuna Thottethodi

Purdue University

Purdue University
View Profile

SOCC '13: Proceedings of the 4th annual Symposium on Cloud ComputingOctober 2013Article No.: 13Pages 1–17https://doi.org/10.1145/2523616.2525970

Published:01 October 2013Publication History

SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing

Pages 1–17

ABSTRACT

Distributed memory caching systems (e.g., memcached) offer tremendous performance improvements for multi-tiered applications compared to architectures that directly access the storage layer. Unfortunately, the performance improvements are artificially limited by load imbalance in the memcached server pool. Specifically, we show that skewed key popularity induces significant load imbalance, which in turn can cause significant degradation in the tail (i.e., 90+^th %ile) latency. Based on this understanding, we design and implement SPORE -- an augmented memcached variant which uses self-adapting, popularity-based replication to mitigate the effects of such load imbalance. SPORE uses reactive internal key renaming as a basic mechanism to efficiently achieve replication without excessive communication and/or coordination among servers and clients. Further, our SPORE design offers the same consistency model (with added time-bounds on write propagation) as a system with memcached. Based on evaluations on a "wimpy-node" testbed and on Amazon EC2, we show that SPORE achieves significantly higher performance than the baseline memcached.

References

S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. Computer, 29(12): 66--76, Dec. 1996. Google ScholarDigital Library
B. Atikoglu, Y. Xu, E. Frachtenberg, S. Jiang, and M. Paleczny. Workload analysis of a large-scale key-value store. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems, pages 53--64, 2012. Google ScholarDigital Library
G. Barish and K. Obraczka. World wide web caching: Trends and techniques. IEEE Communications Magazine, 38: 178--184, 2000. Google ScholarDigital Library
M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, and S. Moon. I tube, you tube, everybody tubes: analyzing the world's largest user generated content video system. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, IMC '07, pages 1--14, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s hosted data serving platform. Proc. VLDB, 1(2): 1277--1288, Aug. 2008. Google ScholarDigital Library
B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In Proceedings of the 1st ACM symposium on Cloud computing, pages 143--154, 2010. Google ScholarDigital Library
J. C. Corbett, J. Dean, et al. Spanner: Google's globally-distributed database. In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, OSDI'12, pages 251--264. USENIX Association, 2012. Google ScholarDigital Library
Couchbase: a NoSQL database. http://www.couchbase.com.Google Scholar
F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica. Wide-area cooperative storage with cfs. In Proceedings of the eighteenth ACM symposium on Operating systems principles, SOSP '01, pages 202--215. ACM, 2001. Google ScholarDigital Library
B. Debnath, S. Sengupta, and J. Li. SkimpyStash: RAM space skimpy key-value store on flash-based storage. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 25--36, 2011. Google ScholarDigital Library
B. Fan, H. Lim, D. G. Andersen, and M. Kaminsky. Small cache, big effect: provable load balancing for randomly partitioned cluster services. In Proceedings of the 2nd ACM Symposium on Cloud Computing, pages 23:1--23:12, 2011. Google ScholarDigital Library
Facebook's nsdi'13 presentation. https://www.usenix.org/conference/nsdi13/scaling-memcache-facebook.Google Scholar
B. Fitzpatrick. Distributed caching with memcached. Linux Journal, (124), Aug. 2004. Google ScholarDigital Library
S. Gilbert and N. Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2): 51--59, June 2002. Google ScholarDigital Library
Gumstix Overo Earth COM. http://www.gumstix.com/store/product_info.php?products_id=211.Google Scholar
Gumstix Stagecoach expansion board. https://www.gumstix.com/store/product_info.php?products_id=247.Google Scholar
M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3): 463--492, July 1990. Google ScholarDigital Library
D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pages 654--663, 1997. Google ScholarDigital Library
D. R. Karger and M. Ruhl. Diminished chord: a protocol for heterogeneous subgroup formation in peer-to-peer networks. In Proceedings of the Third international conference on Peer-to-Peer Systems, IPTPS'04, pages 288--297. Springer-Verlag, 2004. Google ScholarDigital Library
libketama: a consistent hashing algorithm for Memcached clients. http://github.com/RJ/ketama.Google Scholar
S. Kumar. HPCA/PPoPP 2012 Keynote Talk. http://www.ece.lsu.edu/hpca-18/files/HPCA2012_Facebook_Keynote.pdf.Google Scholar
L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput., 28(9): 690--691, Sept. 1979. Google ScholarDigital Library
C. Li, D. Porto, A. Clement, J. Gehrke, N. Preguiça, and R. Rodrigues. Making geo-replicated systems fast as possible, consistent when necessary. In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, OSDI'12, pages 265--278. USENIX Association, 2012. Google ScholarDigital Library
W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Don't settle for eventual: scalable causal consistency for wide-area storage with COPS. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11, pages 401--416, 2011. Google ScholarDigital Library
N. Megiddo and D. S. Modha. Arc: A self-tuning, low overhead replacement cache. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies, FAST '03, pages 115--130. USENIX Association, 2003. Google ScholarDigital Library
A version of YCSB which supports Memcached operations. https://github.com/mikewied/Memcached-Java-Load-Client.Google Scholar
libemcached: A C and C++ client library for Memcached. https://code.launchpad.net/libmemcached.Google Scholar
Memcached: distributed memory object caching system. http://memcached.org.Google Scholar
J. V. D. Merwe, S. Sen, and C. Kalmanek. Streaming video traffic: Characterization and network impact. In In Proc. of International Web Content Caching and Distribution Workshop, 2002.Google Scholar
S. Michel, K. Nguyen, A. Rosenstein, L. Zhang, S. Floyd, and V. Jacobson. Adaptive web caching: towards a new global caching architecture. Comput. Netw. ISDN Syst., 30(22--23): 2169--2177, Nov. 1998. Google ScholarDigital Library
R. Nishtala, H. Fugal, et al. Scaling memcache at facebook. In Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation, nsdi'13, pages 385--398. USENIX Association, 2013. Google ScholarDigital Library
K. Petersen, M. J. Spreitzer, D. B. Terry, M. M. Theimer, and A. J. Demers. Flexible update propagation for weakly consistent replication. In Proceedings of the sixteenth ACM symposium on Operating systems principles, pages 288--301, 1997. Google ScholarDigital Library
A. Rowstron and P. Druschel. Storage management and caching in past, a large-scale, persistent peer-to-peer storage utility. In Proceedings of the eighteenth ACM symposium on Operating systems principles, SOSP '01, pages 188--201. ACM, 2001. Google ScholarDigital Library
N. Sharma, S. Barker, D. Irwin, and P. Shenoy. Blink: managing server clusters on intermittent power. In Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems, pages 185--198, 2011. Google ScholarDigital Library
Spymemcached: A Java client for Memcached. http://code.google.com/p/spymemcached.Google Scholar
I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, SIGCOMM '01, pages 149--160. ACM, 2001. Google ScholarDigital Library
J. Terrace and M. J. Freedman. Object storage on CRAQ: high-throughput chain replication for read-mostly workloads. In Proceedings of the 2009 conference on USENIX Annual technical conference, pages 11--11, 2009. Google ScholarDigital Library
G. Urdaneta, G. Pierre, and M. van Steen. Wikipedia workload analysis for decentralized hosting. Comput. Netw., 53(11): 1830--1845, July 2009. Google ScholarDigital Library
J. Wang. A survey of web caching schemes for the internet. SIGCOMM Comput. Commun. Rev., 29(5): 36--46, Oct. 1999. Google ScholarDigital Library

Index Terms

Recommendations

Cooperative hardware/software caching for next-generation memory systems
Read More
Selective Victim Caching: A Method to Improve the Performance of Direct-Mapped Caches

Although direct-mapped caches suffer from higher miss ratios as compared to set-associative caches, they are attractive for today's high-speed pipelined processors that require very low access times. Victim caching was proposed by Jouppi [1] as an ...
Read More
Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks

Many modern high-performance processors prefetch blocks into the on-chip cache. Prefetched blocks can potentially pollute the cache by evicting more useful blocks. In this work, we observe that both accurate and inaccurate prefetches lead to cache ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing
October 2013
427 pages
ISBN:9781450324281
DOI:10.1145/2523616
General Chair:
Guy Lohman
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 October 2013
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
Conference

Acceptance Rates
SOCC '13 Paper Acceptance Rate23of114submissions,20%Overall Acceptance Rate169of722submissions,23%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 47
  Total Citations
  View Citations
- 581
  Total Downloads
- Downloads (Last 12 months)29
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Understanding and mitigating the impact of load imbalance in the memory caching tier

SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

Cooperative hardware/software caching for next-generation memory systems

Selective Victim Caching: A Method to Improve the Performance of Direct-Mapped Caches

Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Understanding and mitigating the impact of load imbalance in the memory caching tier

SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

Cooperative hardware/software caching for next-generation memory systems

Selective Victim Caching: A Method to Improve the Performance of Direct-Mapped Caches

Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media