Abstract
We present WiscKey, a persistent LSM-tree-based key-value store with a performance-oriented data layout that separates keys from values to minimize I/O amplification. The design of WiscKey is highly SSD optimized, leveraging both the sequential and random performance characteristics of the device. We demonstrate the advantages of WiscKey with both microbenchmarks and YCSB workloads. Microbenchmark results show that WiscKey is 2.5 × to 111 × faster than LevelDB for loading a database (with significantly better tail latencies) and 1.6 × to 14 × faster for random lookups. WiscKey is faster than both LevelDB and RocksDB in all six YCSB workloads.
- Jung-Sang Ahn, Chiyoung Seo, Ravi Mayuram, Rahim Yaseen, Jin-Soo Kim, and Seungryoul Maeng. 2016. ForestDB: A fast key-value storage system for variable-length string keys. IEEE Trans. Comput. 65, 3 (March 2016), 902--915. Google ScholarDigital Library
- Ramnatthan Alagappan, Aishwarya Ganesan, Yuvraj Patel, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. Correlated crash vulnerabilities. In Proceedings of the 12th Symposium on Operating Systems Design and Implementation (OSDI’16). Google ScholarDigital Library
- Ashok Anand, Chitra Muthukrishnan, Steven Kappes, Aditya Akella, and Suman Nath. 2010. Cheap and large cams for high-performance data-intensive networked systems. In Proceedings of the 7th Symposium on Networked Systems Design and Implementation (NSDI’10). Google ScholarDigital Library
- David Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan. 2009. FAWN: A fast array of wimpy nodes. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP’09). Google ScholarDigital Library
- Timothy G. Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. 2013. LinkBench: A database benchmark based on the Facebook social graph. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13). Google ScholarDigital Library
- Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. 2014. Operating Systems: Three Easy Pieces (0.9 ed.). Arpaci-Dusseau Books.Google Scholar
- Manos Athanassoulis, Michael S. Kester, Lukas M. Maas, Radu Stoica, Stratos Idreos, Anastasia Ailamaki, and Mark Callaghan. 2015. Designing Access Methods: The RUM Conjecture. In The 19th International Conference on Extending Database Technology (EDBT’15).Google Scholar
- Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. 2015. Workload analysis of a large-scale key-value store. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC’15).Google Scholar
- Radheshyam Balasundaram, Igor Canadi, Yueh-Hsuan Chiang, Siying Dong, Leonidas Galanis, Lei Jin, Xing Jin, Venkatesh Radhakrishnan, Dmitri Smirnov, Yi Wu, and Feng Zhu. 2015. RocksDB Blog. Retrieved from http://rocksdb.org/blog/.Google Scholar
- Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel. 2010. Finding a needle in haystack: Facebook’s photo storage. In Proceedings of the 9th Symposium on Operating Systems Design and Implementation (OSDI’10). Google ScholarDigital Library
- Michael A. Bender, Martin Farach-Colton, Jeremy T. Fineman, Yonatan Fogel, Bradley Kuszmaul, and Jelani Nelson. 2007. Cache-oblivious streaming b-trees. In Proceedings of the 19th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’07). Google ScholarDigital Library
- Adam L. Buchsbaum, Michael Goldwasser, Suresh Venkatasubramanian, and Jeffery R. Westbrook. 2000. On external memory graph traversal. In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’00). Google ScholarDigital Library
- Adrian M. Caulfield, Arup De, Joel Coburn, Todor I. Mollow, Rajesh K. Gupta, and Steven Swanson. 2010. Moneta: A high-performance storage array architecture for next-generation, non-volatile memories. In Proceedings of the 43nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’10). Google ScholarDigital Library
- Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber. 2006. Bigtable: A distributed storage system for structured data. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI’06). 205--218. Google ScholarDigital Library
- Feng Chen, Rubao Lee, and Xiaodong Zhang. 2011. Essential roles of exploiting internal parallelism of flash memory based solid state drives in high-speed data processing. In Proceedings of the 17th International Symposium on High Performance Computer Architecture (HPCA’11). Google ScholarDigital Library
- Jianjun Chen, Chris Douglas, Michi Mutsuzaki, Patrick Quaid, Raghu Ramakrishnan, Sriram Rao, and Russell Sears. 2012. Walnut: A unified cloud object store. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD’12). Google ScholarDigital Library
- John Colgrove, John D. Davis, John Hayes, Ethan L. Miller, Cary Sandvig, Russell Sears, Ari Tamches, Neil Vachharajani, and Feng Wang. 2015. Purity: Building fast, highly-available enterprise flash storage from commodity components. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD’15). Google ScholarDigital Library
- Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, and Ramana Yerneni. 2008. PNUTS: Yahoo!s hosted data serving platform. In Proceedings of the VLDB Endowment (PVLDB’08). Google ScholarDigital Library
- Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the ACM Symposium on Cloud Computing (SOCC’10). Google ScholarDigital Library
- Biplob Debnath, Sudipta Sengupta, and Jin Li. 2010. FlashStore: High throughput persistent key-value store. In Proceedings of the 36th International Conference on Very Large Databases (VLDB’10).Google ScholarDigital Library
- Biplob Debnath, Sudipta Sengupta, and Jin Li. 2011. SkimpyStash: RAM space skimpy key-value store on flash-based storage. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD’11). Google ScholarDigital Library
- Guiseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: Amazon’s highly available key-value store. In Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP’07). Google ScholarDigital Library
- Siying Dong. 2015. RocksDB: Challenges of LSM-Trees in Practice. http://www.hpts.ws/papers/2015/rocksdb_hpts_website.pdf.Google Scholar
- Facebook. 2015. RocksDB 2015 H2 Roadmap. http://rocksdb.org/blog/2015/rocksdb-2015-h2-roadmap/.Google Scholar
- Bin Fan, David G. Andersen, and Michael Kaminsky. 2013. MemC3: Compact and concurrent MemCache with dumber caching and smarter hashing. In Proceedings of the 10th Symposium on Networked Systems Design and Implementation (NSDI’13). Google ScholarDigital Library
- Fusion-I. O. 2015. Fusion-I. O. ioDrive2. Retrieved from http://www.fusionio.com/products/iodrive2.Google Scholar
- Lars George. 2011. HBase: The Definitive Guide. O’Reilly.Google Scholar
- Sanjay Ghemawat and jeff Dean. 2011. LevelDB. Retrieved from http://code.google.com/p/leveldb.Google Scholar
- Guy Golan-Gueta, Edward Bortnikov, Eshcar Hillel, and Idit Keidar. 2015. Scaling concurrent log-structured data stores. In Proceedings of the EuroSys Conference (EuroSys’15). Google ScholarDigital Library
- Tyler Harter, Dhruba Borthakur, Siying Dong, Amitanand Aiyer, Liyin Tang, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2014. Analysis of HDFS under HBase: A Facebook messages case study. In Proceedings of the 12th USENIX Symposium on File and Storage Technologies (FAST’14). Google ScholarDigital Library
- William Jannen, Jun Yuan, Yang Zhan, Amogh Akshintala, John Esmet, Yizheng Jiao, Ankur Mittal, Prashant Pandey, Phaneendra Reddy, Leif Walsh, Michael Bender, Martin Farach-Colton, Rob Johnson, Bradley C. Kuszmaul, and Donald E. Porter. 2015. BetrFS: A right-optimized write-optimized file system. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST’15). Google ScholarDigital Library
- Hyojun Kim, Nitin Agrawal, and Cristian Ungureanu. 2012. Revisiting storage for smartphones. In Proceedings of the 10th USENIX Symposium on File and Storage Technologies (FAST’12). Google ScholarDigital Library
- Chunbo Lai, Song Jiang, Liqiong Yang, Shiding Lin, Guangyu Sun, Zhenyu Hou, Can Cui, and Jason Cong. 2015. Atlas: Baidus key-value storage system for cloud data. In Proceedings of the 31st IEEE Conference on Massive Data Storage (MSST’15).Google ScholarCross Ref
- Avinash Lakshman and Prashant Malik. 2009. Cassandra -- a decentralized structured storage system. In The 3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware.Google Scholar
- Changman Lee, Dongho Sim, Jooyoung Hwang, and Sangyeun Cho. 2015. F2FS: A new file system for flash storage. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST’15). Google ScholarDigital Library
- Hyeontaek Lim, Bin Fan, David G. Andersen, and Michael Kaminsky. 2011. SILT: A memory-efficient, high-performance key-value store. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP’11). Google ScholarDigital Library
- Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky. 2014. MICA: A holistic approach to fast in-memory key-value storage. In Proceedings of the 11th Symposium on Networked Systems Design and Implementation (NSDI’14). Google ScholarDigital Library
- Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. WiscKey: Separating keys from values in SSD-conscious storage. In Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST’16). Google ScholarDigital Library
- Haohui Mai and Jing Zhao. 2015. Scaling HDFS to manage billions of files with key value stores. In The 8th Annual Hadoop Summit.Google Scholar
- Yandong Mao, Eddie Kohler, and Robert Morris. 2012. Cache craftiness for fast multicore key-value storage. In Proceedings of the EuroSys Conference (EuroSys’12). Google ScholarDigital Library
- Leonardo Marmol, Swaminathan Sundararaman, Nisha Talagala, and Raju Rangaswami. 2015. NVMKV: A scalable, lightweight, ftl-aware key-value store. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC’15). Google ScholarDigital Library
- Changwoo Min, Kangnyeon Kim, Hyunjin Cho, Sang-Won Lee, and Young Ik Eom. 2012. SFS: Random write considered harmful in solid state drives. In Proceedings of the 10th USENIX Symposium on File and Storage Technologies (FAST’12). Google ScholarDigital Library
- Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani. 2013. Scaling memcache at facebook. In Proceedings of the 10th Symposium on Networked Systems Design and Implementation (NSDI’13). Google ScholarDigital Library
- Chris Nyberg, Tom Barclay, Zarka Cvetanovic, Jim Gray, and Dave Lomet. 1994. AlphaSort: A RISC machine sort. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (SIGMOD’94). Google ScholarDigital Library
- Patrick ONeil, Edward Cheng, Dieter Gawlick, and Elizabeth ONeil. 1996. The log-structured merge-tree (LSM-tree). Acta Inform. 33, 4 (1996), 351--385. Google ScholarDigital Library
- Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, and Thomas Anderson. 2014. Arrakis: The operating system is the control plane. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI’14). Google ScholarDigital Library
- Thanumalayan Sankaranarayana Pillai, Vijay Chidambaram, Ramnatthan Alagappan, Samer Al-Kiswany, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2014. All file systems are not created equal: On the complexity of crafting crash-consistent applications. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI’14). Google ScholarDigital Library
- Eric Redmond. 2013. A Little Riak Book. Retrieved from http://www.littleriakbook.com.Google Scholar
- Kai Ren and Garth Gibson. 2013. TABLEFS: Enhancing metadata efciency in the local file system. In Proceedings of the USENIX Annual Technical Conference (USENIX’13). Google ScholarDigital Library
- Kai Ren, Qing Zheng, Swapnil Patil, and Garth Gibson. 2014. IndexFS: Scaling file system metadata performance with stateless caching and bulk insertion. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). Google ScholarDigital Library
- Russell Sears and Raghu Ramakrishnan. 2012. bLSM: A general purpose log structured merge tree. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD’12). Google ScholarDigital Library
- Pradeep Shetty, Richard Spillane, Ravikant Malpani, Binesh Andrews, Justin Seyster, and Erez Zadok. 2013. Building workload-independent storage with VT-trees. In Proceedings of the 11th USENIX Symposium on File and Storage Technologies (FAST’13). Google ScholarDigital Library
- Roshan Sumbaly, Jay Kreps, Lei Gao, Alex Feinberg, Chinmay Soman, and Sam Shah. 2012. Serving large-scale batch computed data with project Voldemort. In Proceedings of the 10th USENIX Symposium on File and Storage Technologies (FAST’12). Google ScholarDigital Library
- Vijay Vasudevan, Michael Kaminsky, and David G. Andersen. 2012. Using vector interfaces to deliver millions of iops from a networked key-value storage server. In Proceedings of the ACM Symposium on Cloud Computing (SOCC’12). Google ScholarDigital Library
- Peng Wang, Guangyu Sun, Song Jiang, Jian Ouyang, Shiding Lin, Chen Zhang, and Jason Cong. 2014. An efficient design and implementation of LSM-tree based key-value store on open-channel SSD. In Proceedings of the EuroSys Conference (EuroSys’14). Google ScholarDigital Library
- Xingbo Wu, Yuehai Xu, Zili Shao, and Song Jiang. 2015. LSM-trie: An LSM-tree-based ultra-large key-value store for small data. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC’15). Google ScholarDigital Library
Index Terms
- WiscKey: Separating Keys from Values in SSD-Conscious Storage
Recommendations
WiscKey: separating keys from values in SSD-conscious storage
FAST'16: Proceedings of the 14th Usenix Conference on File and Storage TechnologiesWe present WiscKey, a persistent LSM-tree-based key-value store with a performance-oriented data layout that separates keys from values to minimize I/O amplification. The design of WiscKey is highly SSD optimized, leveraging both the sequential and ...
Scan and join optimization by exploiting internal parallelism of flash-based solid state drives
WAIM'13: Proceedings of the 14th international conference on Web-Age Information ManagementNowadays, flash-based solid state drives (SSDs) are gradually replacing hard disk drives (HDDs) as the primary non-volatile storage in both desktop and enterprise applications because of their potential to speed up performance and reduce power ...
Finding the optimal execution scheme of external mergesort on solid state drives
AbstractAs the flash-based solid-state drives(SSDs) gradually replace the mechanical hard disk drives(HDDs) as the mainstream storage, unlike the HDDs, SSDs have rich internal parallelism, which makes it have the excellent characteristics that HDDs do not ...
Comments