skip to main content
10.1145/2254756.2254766acmconferencesArticle/Chapter ViewAbstractPublication PagesmetricsConference Proceedingsconference-collections
research-article

Workload analysis of a large-scale key-value store

Published:11 June 2012Publication History

ABSTRACT

Key-value stores are a vital component in many scale-out enterprises, including social networks, online retail, and risk analysis. Accordingly, they are receiving increased attention from the research community in an effort to improve their performance, scalability, reliability, cost, and power consumption. To be effective, such efforts require a detailed understanding of realistic key-value workloads. And yet little is known about these workloads outside of the companies that operate them. This paper aims to address this gap.

To this end, we have collected detailed traces from Facebook's Memcached deployment, arguably the world's largest. The traces capture over 284 billion requests from five different Memcached use cases over several days. We analyze the workloads from multiple angles, including: request composition, size, and rate; cache efficacy; temporal patterns; and application use cases. We also propose a simple model of the most representative trace to enable the generation of more realistic synthetic workloads by the community.

Our analysis details many characteristics of the caching workload. It also reveals a number of surprises: a GET/SET ratio of 30:1 that is higher than assumed in the literature; some applications of Memcached behave more like persistent storage than a cache; strong locality metrics, such as keys accessed many millions of times a day, do not always suffice for a high hit rate; and there is still room for efficiency and hit rate improvements in Memcached's implementation. Toward the last point, we make several suggestions that address the exposed deficiencies.

References

  1. http://voldemort-project.com.Google ScholarGoogle Scholar
  2. Ahmad, I. Easy and efficient disk I/O workload characterization in VMware ESX server. In Proceedings of IEEE International Symposium on Workload Characterization (Sept. 2007). Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Anand, A., Muthukrishnan, C., Kappes, S., Akella, A., and Nath, S. Cheap and large CAMs for high performance data-intensive networked systems. In Proceedings of the 7th USENIX conference on Networked Systems Design and Implementation (Apr. 2010). Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Andersen, D. G., Franklin, J., Kaminsky, M., Phanishayee, A., Tan, L., and Vasudevan, V. FAWN: a fast array of wimpy nodes. In Proceedings of the 22nd ACM SIGOPS Symposium on Operating Systems Principles (SOSP) (Big Sky, Montana, 2009), ACM, pp. 1--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Arlitt, M., Friedrich, R., and Jin, T. Workload characterization of a web proxy in a cable modem environment. ACM SIGMETRICS - Performance Evaluation Review 27 (1999), 25--36. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Arlitt, M. F., and Williamson, C. L. Internet web servers: Workload characterization and performance implications. IEEE/ACM Transactions on Networking 5 (October 1997), 631--645. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Barford, P., Bestavros, A., Bradley, A., and Crovella, M. Changes in web client access patterns. In World Wide Web Journal, Special Issue on Characterization and Performance Evaluation (1999). Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Beaver, D., Kumar, S., Li, H. C., Sobel, J., and Vajgel, P. Finding a needle in haystack: Facebook's photo storage. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (Oct. 2010). Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Berezecki, M., Frachtenberg, E., Paleczny, M., and Steele, K. Many-core key-value store. In Proceedings of the Second International Green Computing Conference (Orlando, FL, Aug. 2011). Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Breslau, L., Cao, P., Fan, L., Phillips, G., and Shenker, S. Web caching and zipf-like distributions: Evidence and implications. In Proceedings of the 18th Annual IEEE International Conference on Computer Communications (1999).Google ScholarGoogle ScholarCross RefCross Ref
  11. Carns, P., Latham, R., Ross, R., Kamil Iskra, S. L., and Riley, K. 24/7 characterization of petascale I/O workloads. In Proceedings of the 4th Workshop on Interfaces and Architectures for Scientific Data Storage (Nov. 2009).Google ScholarGoogle ScholarCross RefCross Ref
  12. Cunha, C. R., Bestavros, A., and Crovella, M. E. Characteristics of WWW client-based traces. In Technical Report TR-95-010, Boston University Department of Computer Science, (July 1995). Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Debnath, B. K., Sengupta, S., and Li, J. Flashstore: High throughput persistent key-value store. Proceedings of 36th International Conference on Very Large Data Bases (VLDB) 3, 2 (2010). Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Debnath, B. K., Sengupta, S., and Li, J. SkimpyStash: RAM space skimpy key-value store on flash-based storage. In Proceedings of the Annual ACM SIGMOD Conference (June 2010), pp. 25--36. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. Dynamo: Amazon's highly available key-value store. In Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles (SOSP) (Stevenson, WA, 2007), pp. 205--220. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Duska, B. M., Marwood, D., and Feeley, M. J. The measured access characteristics of world-wide web client proxy caches. In Proceedings of USENIX Symposium of Internet Technologies and Systems (Dec. 1997). Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Feitelson, D. G. Workload modeling for performance evaluation. In Performance Evaluation of Complex Systems: Techniques and Tools, M. C. Calzarossa and S. Tucci, Eds., vol. 2459 of Lecture Notes in Computer Science. Springer-Verlag, Sept. 2002, pp. 114--141. www.cs.huji.ac.il/ feit/papers/WorkloadModel02chap.ps.gz. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Fitzpatrick, B. Distributed caching with memcached. Linux Journal, 124 (Aug. 2004), 72--78. www.linuxjournal.com/article/7451?page=0,0. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Jiang, S., and Zhang, X. LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance. In Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems (2002), SIGMETRICS'02, ACM, pp. 31--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Kavalanekar, S., Worthington, B., Zhang, Q., and Sharda, V. Characterization of storage workload traces from production windows servers. In Proceedings of IEEE International Symposium on Workload Characterization (Sept. 2008).Google ScholarGoogle ScholarCross RefCross Ref
  21. Keeton, K., Alistair Veitch, D. O., and Wilkes, J. I/O characterization of commercial workloads. In Proceedings of the 3rd Workshop on Computer Architecture Evaluation using Commercial Workloads (Jan. 2000).Google ScholarGoogle Scholar
  22. Kim, Y., Gunasekaran, R., Shipman, G. M., Dillow, D. A., Zhang, Z., and Settlemyer, B. W. Workload characterization of a leadership class storage cluster. In Proceedings of Petascale Data Storage Workshop (Nov. 2010).Google ScholarGoogle ScholarCross RefCross Ref
  23. Lim, H., Fan, B., Andersen, D. G., and Kaminsky, M. Silt: A memory-eficient, high-performance key-value store. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (Oct. 2011). Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Lublin, U., and Feitelson, D. G. The workload on parallel supercomputers: Modeling the characteristics of rigid jobs. Journal of Parallel and Distributed Computing 63, 11 (Nov. 2003), 1105--1122. www.cs.huji.ac.il/ feit/papers/Rigid01TR.ps.gz. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Manley, S., and Seltzer, M. Web facts and fantasy. In Proceedings of USENIX Symposium on Internet Technologies and Systems (Dec. 1997). Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Petrovic, J. Using Memcached for data distribution in industrial environment. In Proceedings of the Third International Conference on Systems (Washington, DC, 2008), IEEE Computer Society, pp. 368--372. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Reddi, V. J., Lee, B. C., Chilimbi, T., and Vaid, K. Web search using mobile cores: Quantifying and mitigating the price of efficiency. In Proceedings of the 37th International Symposium on Computer Architecture (ISCA) (June 2010), ACM. portal.acm.org/citation.cfm?id=1815961.1816002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Roselli, D., Lorch, J. R., and Anderson, T. E. A comparison of file system workloads. In Proceedings of the 2000 USENIX Annual Technical Conference (June 2000). Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Vasudevan, V. R. Energy-Efficient Data-intensive Computing with a Fast Array of Wimpy Nodes. PhD thesis, Carnegie Mellon University, Oct. 2011.Google ScholarGoogle Scholar
  30. Wang, F., Xin, Q., Hong, B., Miller, E. L., Long, D. D. E., Brandt, S. A., and McLarty, T. T. File system workload analysis for large scientific computing applications. In Proceedings of 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies (Apr. 2004).Google ScholarGoogle Scholar
  31. Zhao, H. HipHop for PHP: Move fast. https://developers.facebook.com/blog/post/358/, Feb. 2010.Google ScholarGoogle Scholar

Index Terms

  1. Workload analysis of a large-scale key-value store

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMETRICS '12: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
          June 2012
          450 pages
          ISBN:9781450310970
          DOI:10.1145/2254756
          • cover image ACM SIGMETRICS Performance Evaluation Review
            ACM SIGMETRICS Performance Evaluation Review  Volume 40, Issue 1
            Performance evaluation review
            June 2012
            433 pages
            ISSN:0163-5999
            DOI:10.1145/2318857
            Issue’s Table of Contents

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 11 June 2012

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate459of2,691submissions,17%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader