skip to main content
10.1145/3183713.3196927acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Public Access

Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging

Published:27 May 2018Publication History

ABSTRACT

In this paper, we show that all mainstream LSM-tree based key-value stores in the literature and in industry are suboptimal with respect to how they trade off among the I/O costs of updates, point lookups, range lookups, as well as the cost of storage, measured as space-amplification. The reason is that they perform expensive merge operations in order to (1) bound the number of runs that a lookup has to probe, and to (2) remove obsolete entries to reclaim space. However, most of these merge operations reduce point lookup cost, long range lookup cost, and space-amplification by a negligible amount.

To address this problem, we expand the LSM-tree design space with Lazy Leveling, a new design that prohibits merge operations at all levels of LSM-tree but the largest. We show that Lazy Leveling improves the worst-case cost complexity of updates while maintaining the same bounds on point lookup cost, long range lookup cost, and space-amplification. To be able to navigate between Lazy Leveling and other designs, we make the LSM-tree design space fluid by introducing Fluid LSM-tree, a generalization of LSM-tree that can be parameterized to assume all existing LSM-tree designs. We show how to fluidly transition from Lazy Leveling to (1) designs that are more optimized for updates by merging less at the largest level, and (2) designs that are more optimized for small range lookups by merging more at all other levels.

We put everything together to design Dostoevsky, a key-value store that navigates the entire Fluid LSM-tree design space based on the application workload and hardware to maximize throughput using a novel closed-form performance model. We implemented Dostoevsky on top of RocksDB, and we show that it strictly dominates state-of-the-art LSM-tree based key-value stores in terms of performance and space-amplification.

References

  1. A. Aggarwal and J. S. Vitter. The Input/Output Complexity of Sorting and Related Problems. Communications of the ACM, 31(9):1116--1127, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. Agrawal, D. Ganesan, R. Sitaraman, Y. Diao, and S. Singh. Lazy-Adaptive Tree: An Optimized Index Structure for Flash Devices. Proceedings of the VLDB Endowment, 2(1):361--372, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. Y. Ahmad and B. Kemme. Compaction management in distributed key-value datastores. Proceedings of the VLDB Endowment, 8(8):850--861, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. D. G. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan. FAWN: A Fast Array of Wimpy Nodes. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pages 1--14, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Apache. Accumulo. https://accumulo.apache.org/, 2016.Google ScholarGoogle Scholar
  6. Apache. Cassandra. http://cassandra.apache.org, 2016.Google ScholarGoogle Scholar
  7. Apache. HBase. http://hbase.apache.org/, 2016.Google ScholarGoogle Scholar
  8. T. G. Armstrong, V. Ponnekanti, D. Borthakur, and M. Callaghan. LinkBench: a Database Benchmark Based on the Facebook Social Graph. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 1185--1196, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Athanassoulis, S. Chen, A. Ailamaki, P. B. Gibbons, and R. Stoica. MaSM: Efficient Online Updates in Data Warehouses. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 865--876, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Athanassoulis, S. Chen, A. Ailamaki, P. B. Gibbons, and R. Stoica. Online Updates on Data Warehouses via Judicious Use of Solid-State Storage. ACM Transactions on Database Systems (TODS), 40(1), 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Badam, K. Park, V. S. Pai, and L. L. Peterson. HashCache: Cache Storage for the Next Billion. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI), pages 123--136, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. O. Balmau, D. Didona, R. Guerraoui, W. Zwaenepoel, H. Yuan, A. Arora, K. Gupta, and P. Konka. TRIAD: Creating synergies between memory, disk and log in log structured key-value stores. In USENIX Annual Technical Conference, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. A. Bender, M. Farach-Colton, J. T. Fineman, Y. R. Fogel, B. C. Kuszmaul, and J. Nelson. Cache-Oblivious Streaming B-trees. In Proceedings of the Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), pages 81--92, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. B. H. Bloom. Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM, 13(7):422--426, 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. G. S. Brodal and R. Fagerberg. Lower Bounds for External Memory Dictionaries. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 546--554, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, S. Kulkarni, H. C. Li, M. Marchukov, D. Petrov, L. Puzar, Y. J. Song, and V. Venkataramani. TAO: Facebook's Distributed Data Store for the Social Graph. In Proceedings of the USENIX Annual Technical Conference (ATC), pages 49--60, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Y. Bu, V. Borkar, J. Jia, M. J. Carey, and Condie. Pregelix: Big (ger) graph analytics on a dataflow engine. In Proceedings of the International Conference on Very Large Data Bases (VLDB), pages 161--172, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Z. Cao, S. Chen, F. Li, M. Wang, and X. S. Wang. LogKV: Exploiting Key-Value Stores for Log Processing. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR), 2013.Google ScholarGoogle Scholar
  19. F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A Distributed Storage System for Structured Data. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 205--218, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. Chen, C. Douglas, M. Mutsuzaki, P. Quaid, R. Ramakrishnan, S. Rao, and R. Sears. Walnut: A Unified Cloud Object Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 743--754, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In Proceedings of the ACM Symposium on Cloud Computing (SoCC), pages 143--154, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. N. Dayan, M. Athanassoulis, and S. Idreos. Monkey: Optimal Navigable Key-Value Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 79--94, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. N. Dayan, P. Bonnet, and S. Idreos. GeckoFTL: Scalable Flash Translation Techniques For Very Large Flash Devices. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 327--342, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. B. Debnath, S. Sengupta, and J. Li. FlashStore: high throughput persistent key-value store. Proceedings of the VLDB Endowment, 3(1--2):1414--1425, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. B. Debnath, S. Sengupta, and J. Li. SkimpyStash: RAM space skimpy key-value store on flash-based storage. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 25--36, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, 41(6):205--220, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. Dong, M. Callaghan, L. Galanis, D. Borthakur, T. Savor, and M. Strum. Optimizing Space Amplification in RocksDB. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR), 2017.Google ScholarGoogle Scholar
  28. Facebook. RocksDB. https://github.com/facebook/rocksdb, 2016.Google ScholarGoogle Scholar
  29. Facebook. MyRocks. http://myrocks.io/, 2017.Google ScholarGoogle Scholar
  30. B. Fitzpatrick and A. Vorobey. Memcached: a distributed memory object caching system, 2011.Google ScholarGoogle Scholar
  31. G. Golan-Gueta, E. Bortnikov, E. Hillel, and I. Keidar. Scaling Concurrent Log-Structured Data Stores. In Proceedings of the ACM European Conference on Computer Systems (EuroSys), pages 32:1---32:14, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Google. LevelDB. https://github.com/google/leveldb/, 2016.Google ScholarGoogle Scholar
  33. H. V. Jagadish, P. P. S. Narayan, S. Seshadri, S. Sudarshan, and R. Kanneganti. Incremental Organization for Data Recording and Warehousing. In Proceedings of the International Conference on Very Large Data Bases (VLDB), pages 16--25, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. A. Lakshman and P. Malik. Cassandra - A Decentralized Structured Storage System. ACM SIGOPS Operating Systems Review, 44(2):35--40, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Y. Li, B. He, J. Yang, Q. Luo, K. Yi, and R. J. Yang. Tree Indexing on Solid State Drives. Proceedings of the VLDB Endowment, 3(1--2):1195--1206, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. H. Lim, D. G. Andersen, and M. Kaminsky. Towards Accurate and Fast Evaluation of Multi-Stage Log-structured Designs. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST), pages 149--166, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. H. Lim, B. Fan, D. G. Andersen, and M. Kaminsky. SILT: A Memory-Efficient, High-Performance Key-Value Store. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pages 1--13, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. LinkedIn. Online reference. http://www.project-voldemort.com, 2016.Google ScholarGoogle Scholar
  39. L. Lu, T. S. Pillai, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. WiscKey: Separating Keys from Values in SSD-conscious Storage. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST), pages 133--148, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. S. Nath and A. Kansal. FlashDB: dynamic self-tuning database for NAND flash. ACM/IEEE IPSN, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. P. E. O'Neil, E. Cheng, D. Gawlick, and E. J. O'Neil. The log-structured merge-tree (LSM-tree). Acta Informatica, 33(4):351--385, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. A. Papagiannis, G. Saloustros, P. González-Férez, and A. Bilas. Tucana: Design and Implementation of a Fast and Efficient Scale-up Key-value Store. In Proceedings of the USENIX Annual Technical Conference (ATC), pages 537--550, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. M. Pilman, K. Bocksrocker, L. Braun, R. Marroquin, and D. Kossmann. Fast Scans on Key-Value Stores. Proceedings of the VLDB Endowment, 10(11):1526--1537, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. J. Pitman. Probability. 1999.Google ScholarGoogle Scholar
  45. P. Raju, R. Kadekodi, V. Chidambaram, and I. Abraham. Pebblesdb: Building key-value stores using fragmented log-structured merge trees. Proceedings of the ACM Symposium on Operating Systems Principles, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Redis. Online reference. http://redis.io/.Google ScholarGoogle Scholar
  47. K. Ren, Q. Zheng, J. Arulraj, and G. Gibson. SlimDB: A Space-Efficient Key-Value Storage Engine For Semi-Sorted Data. Proceedings of the VLDB Endowment, 10(13):2037--2048, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. R. Sears and R. Ramakrishnan. bLSM: A General Purpose Log Structured Merge Tree. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 217--228, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. P. Shetty, R. P. Spillane, R. Malpani, B. Andrews, J. Seyster, and E. Zadok. Building Workload-Independent Storage with VT-trees. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST), pages 17--30, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. S. Tarkoma, C. E. Rothenberg, and E. Lagerspetz. Theory and Practice of Bloom Filters for Distributed Systems. IEEE Communications Serveys & Tutorials, 14(1):131--155, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  51. R. Thonangi and J. Yang. On Log-Structured Merge for Solid-State Drives. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), pages 683--694, 2017.Google ScholarGoogle ScholarCross RefCross Ref
  52. WiredTiger. WiredTiger. https://github.com/wiredtiger/wiredtiger, 2016.Google ScholarGoogle Scholar
  53. X. Wu, Y. Xu, Z. Shao, and S. Jiang. LSM-trie: An LSM-tree-based Ultra-Large Key-Value Store for Small Data Items. In Proceedings of the USENIX Annual Technical Conference (ATC), pages 71--82, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
        May 2018
        1874 pages
        ISBN:9781450347037
        DOI:10.1145/3183713

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 27 May 2018

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        SIGMOD '18 Paper Acceptance Rate90of461submissions,20%Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader