ABSTRACT
Key-value stores are a vital component in many scale-out enterprises, including social networks, online retail, and risk analysis. Accordingly, they are receiving increased attention from the research community in an effort to improve their performance, scalability, reliability, cost, and power consumption. To be effective, such efforts require a detailed understanding of realistic key-value workloads. And yet little is known about these workloads outside of the companies that operate them. This paper aims to address this gap.
To this end, we have collected detailed traces from Facebook's Memcached deployment, arguably the world's largest. The traces capture over 284 billion requests from five different Memcached use cases over several days. We analyze the workloads from multiple angles, including: request composition, size, and rate; cache efficacy; temporal patterns; and application use cases. We also propose a simple model of the most representative trace to enable the generation of more realistic synthetic workloads by the community.
Our analysis details many characteristics of the caching workload. It also reveals a number of surprises: a GET/SET ratio of 30:1 that is higher than assumed in the literature; some applications of Memcached behave more like persistent storage than a cache; strong locality metrics, such as keys accessed many millions of times a day, do not always suffice for a high hit rate; and there is still room for efficiency and hit rate improvements in Memcached's implementation. Toward the last point, we make several suggestions that address the exposed deficiencies.
- http://voldemort-project.com.Google Scholar
- Ahmad, I. Easy and efficient disk I/O workload characterization in VMware ESX server. In Proceedings of IEEE International Symposium on Workload Characterization (Sept. 2007). Google ScholarDigital Library
- Anand, A., Muthukrishnan, C., Kappes, S., Akella, A., and Nath, S. Cheap and large CAMs for high performance data-intensive networked systems. In Proceedings of the 7th USENIX conference on Networked Systems Design and Implementation (Apr. 2010). Google ScholarDigital Library
- Andersen, D. G., Franklin, J., Kaminsky, M., Phanishayee, A., Tan, L., and Vasudevan, V. FAWN: a fast array of wimpy nodes. In Proceedings of the 22nd ACM SIGOPS Symposium on Operating Systems Principles (SOSP) (Big Sky, Montana, 2009), ACM, pp. 1--14. Google ScholarDigital Library
- Arlitt, M., Friedrich, R., and Jin, T. Workload characterization of a web proxy in a cable modem environment. ACM SIGMETRICS - Performance Evaluation Review 27 (1999), 25--36. Google ScholarDigital Library
- Arlitt, M. F., and Williamson, C. L. Internet web servers: Workload characterization and performance implications. IEEE/ACM Transactions on Networking 5 (October 1997), 631--645. Google ScholarDigital Library
- Barford, P., Bestavros, A., Bradley, A., and Crovella, M. Changes in web client access patterns. In World Wide Web Journal, Special Issue on Characterization and Performance Evaluation (1999). Google ScholarDigital Library
- Beaver, D., Kumar, S., Li, H. C., Sobel, J., and Vajgel, P. Finding a needle in haystack: Facebook's photo storage. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (Oct. 2010). Google ScholarDigital Library
- Berezecki, M., Frachtenberg, E., Paleczny, M., and Steele, K. Many-core key-value store. In Proceedings of the Second International Green Computing Conference (Orlando, FL, Aug. 2011). Google ScholarDigital Library
- Breslau, L., Cao, P., Fan, L., Phillips, G., and Shenker, S. Web caching and zipf-like distributions: Evidence and implications. In Proceedings of the 18th Annual IEEE International Conference on Computer Communications (1999).Google ScholarCross Ref
- Carns, P., Latham, R., Ross, R., Kamil Iskra, S. L., and Riley, K. 24/7 characterization of petascale I/O workloads. In Proceedings of the 4th Workshop on Interfaces and Architectures for Scientific Data Storage (Nov. 2009).Google ScholarCross Ref
- Cunha, C. R., Bestavros, A., and Crovella, M. E. Characteristics of WWW client-based traces. In Technical Report TR-95-010, Boston University Department of Computer Science, (July 1995). Google ScholarDigital Library
- Debnath, B. K., Sengupta, S., and Li, J. Flashstore: High throughput persistent key-value store. Proceedings of 36th International Conference on Very Large Data Bases (VLDB) 3, 2 (2010). Google ScholarDigital Library
- Debnath, B. K., Sengupta, S., and Li, J. SkimpyStash: RAM space skimpy key-value store on flash-based storage. In Proceedings of the Annual ACM SIGMOD Conference (June 2010), pp. 25--36. Google ScholarDigital Library
- DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. Dynamo: Amazon's highly available key-value store. In Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles (SOSP) (Stevenson, WA, 2007), pp. 205--220. Google ScholarDigital Library
- Duska, B. M., Marwood, D., and Feeley, M. J. The measured access characteristics of world-wide web client proxy caches. In Proceedings of USENIX Symposium of Internet Technologies and Systems (Dec. 1997). Google ScholarDigital Library
- Feitelson, D. G. Workload modeling for performance evaluation. In Performance Evaluation of Complex Systems: Techniques and Tools, M. C. Calzarossa and S. Tucci, Eds., vol. 2459 of Lecture Notes in Computer Science. Springer-Verlag, Sept. 2002, pp. 114--141. www.cs.huji.ac.il/ feit/papers/WorkloadModel02chap.ps.gz. Google ScholarDigital Library
- Fitzpatrick, B. Distributed caching with memcached. Linux Journal, 124 (Aug. 2004), 72--78. www.linuxjournal.com/article/7451?page=0,0. Google ScholarDigital Library
- Jiang, S., and Zhang, X. LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance. In Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems (2002), SIGMETRICS'02, ACM, pp. 31--42. Google ScholarDigital Library
- Kavalanekar, S., Worthington, B., Zhang, Q., and Sharda, V. Characterization of storage workload traces from production windows servers. In Proceedings of IEEE International Symposium on Workload Characterization (Sept. 2008).Google ScholarCross Ref
- Keeton, K., Alistair Veitch, D. O., and Wilkes, J. I/O characterization of commercial workloads. In Proceedings of the 3rd Workshop on Computer Architecture Evaluation using Commercial Workloads (Jan. 2000).Google Scholar
- Kim, Y., Gunasekaran, R., Shipman, G. M., Dillow, D. A., Zhang, Z., and Settlemyer, B. W. Workload characterization of a leadership class storage cluster. In Proceedings of Petascale Data Storage Workshop (Nov. 2010).Google ScholarCross Ref
- Lim, H., Fan, B., Andersen, D. G., and Kaminsky, M. Silt: A memory-eficient, high-performance key-value store. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (Oct. 2011). Google ScholarDigital Library
- Lublin, U., and Feitelson, D. G. The workload on parallel supercomputers: Modeling the characteristics of rigid jobs. Journal of Parallel and Distributed Computing 63, 11 (Nov. 2003), 1105--1122. www.cs.huji.ac.il/ feit/papers/Rigid01TR.ps.gz. Google ScholarDigital Library
- Manley, S., and Seltzer, M. Web facts and fantasy. In Proceedings of USENIX Symposium on Internet Technologies and Systems (Dec. 1997). Google ScholarDigital Library
- Petrovic, J. Using Memcached for data distribution in industrial environment. In Proceedings of the Third International Conference on Systems (Washington, DC, 2008), IEEE Computer Society, pp. 368--372. Google ScholarDigital Library
- Reddi, V. J., Lee, B. C., Chilimbi, T., and Vaid, K. Web search using mobile cores: Quantifying and mitigating the price of efficiency. In Proceedings of the 37th International Symposium on Computer Architecture (ISCA) (June 2010), ACM. portal.acm.org/citation.cfm?id=1815961.1816002. Google ScholarDigital Library
- Roselli, D., Lorch, J. R., and Anderson, T. E. A comparison of file system workloads. In Proceedings of the 2000 USENIX Annual Technical Conference (June 2000). Google ScholarDigital Library
- Vasudevan, V. R. Energy-Efficient Data-intensive Computing with a Fast Array of Wimpy Nodes. PhD thesis, Carnegie Mellon University, Oct. 2011.Google Scholar
- Wang, F., Xin, Q., Hong, B., Miller, E. L., Long, D. D. E., Brandt, S. A., and McLarty, T. T. File system workload analysis for large scientific computing applications. In Proceedings of 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies (Apr. 2004).Google Scholar
- Zhao, H. HipHop for PHP: Move fast. https://developers.facebook.com/blog/post/358/, Feb. 2010.Google Scholar
Index Terms
- Workload analysis of a large-scale key-value store
Recommendations
A Large-scale Analysis of Hundreds of In-memory Key-value Cache Clusters at Twitter
Modern web services use in-memory caching extensively to increase throughput and reduce latency. There have been several workload analyses of production systems that have fueled research in improving the effectiveness of in-memory caching systems. However,...
Workload analysis of a large-scale key-value store
Performance evaluation reviewKey-value stores are a vital component in many scale-out enterprises, including social networks, online retail, and risk analysis. Accordingly, they are receiving increased attention from the research community in an effort to improve their performance, ...
LSM-tree managed storage for large-scale key-value store
SoCC '17: Proceedings of the 2017 Symposium on Cloud ComputingKey-value stores are increasingly adopting LSM-trees as their enabling data structure in the backend storage, and persisting their clustered data through a file system. A file system is expected to not only provide file/directory abstraction to organize ...
Comments