skip to main content
10.1145/3299869.3319903acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Public Access

The Log-Structured Merge-Bush & the Wacky Continuum

Published:25 June 2019Publication History

ABSTRACT

Data-intensive key-value stores based on the Log-Structured Merge-Tree are used in numerous modern applications ranging from social media and data science to cloud infrastructure. We show that such designs exhibit an intrinsic contention between the costs of point reads, writes and memory, and that this trade-off deteriorates as the data size grows. The root of the problem is that in all existing designs, the capacity ratio between any pair of levels is fixed. This causes write cost to increase with the data size while yielding exponentially diminishing returns for point reads and memory. We introduce the Log-Structured Merge-Bush (LSM-Bush), a new data structure that sets increasing capacity ratios between adjacent pairs of smaller levels. As a result, smaller levels get lazier by gathering more runs before merging them. By using a doubly-exponential ratio growth rate, LSM-bush brings write cost down from \small$O(łog N)$ to \small$O(łogłog N)$, and it can trade this gain to either improve point reads or memory. Thus, it enables more scalable trade-offs all around. We further introduce Wacky, a design continuum that includes LSM-Bush as well as all state-of-the-art merge policies, from laziest to greediest, and can assume any of them within a single implementation. Wacky encompasses a vast space of performance properties, including ones that favor range reads, and it can be searched analytically to find the design that performs best for a given workload in practice.

References

  1. Agrawal, N., Prabhakaran, V., Wobber, T., Davis, J. D., Manasse, M., and Panigrahy, R. Design Tradeoffs for SSD Performance. ATC (2008). Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Anand, A., Muthukrishnan, C., Kappes, S., Akella, A., and Nath, S. Cheap and Large CAMs for High Performance Data-Intensive Networked Systems. NSDI (2010). Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Andersen, D. G., Franklin, J., Kaminsky, M., Phanishayee, A., Tan, L., and Vasudevan, V. FAWN: A Fast Array of Wimpy Nodes. SOSP (2009). Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Apache. Accumulo. https://accumulo.apache.org/.Google ScholarGoogle Scholar
  5. Apache. Cassandra. http://cassandra.apache.org.Google ScholarGoogle Scholar
  6. Apache. HBase. http://hbase.apache.org/.Google ScholarGoogle Scholar
  7. Arge, L. The Buffer Tree: A Technique for Designing Batched External Data Structures. Algorithmica 37, 1 (2003), 1--24.Google ScholarGoogle Scholar
  8. Armstrong, T. G., Ponnekanti, V., Borthakur, D., and Callaghan, M. LinkBench: a Database Benchmark Based on the Facebook Social Graph. SIGMOD (2013).Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Athanassoulis, M., Chen, S., Ailamaki, A., Gibbons, P. B., and Stoica, R. Online Updates on Data Warehouses via Judicious Use of Solid-State Storage. TODS 40, 1 (2015). Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Athanassoulis, M., Kester, M. S., Maas, L. M., Stoica, R., Idreos, S., Ailamaki, A., and Callaghan, M. Designing Access Methods: The RUM Conjecture. EDBT (2016).Google ScholarGoogle Scholar
  11. Balmau, O., Didona, D., Guerraoui, R., Zwaenepoel, W., Yuan, H., Arora, A., Gupta, K., and Konka, P. TRIAD: Creating Synergies Between Memory, Disk and Log in Log Structured Key-Value Stores. ATC (2017).Google ScholarGoogle Scholar
  12. Bender, M. A., Farach-Colton, M., Fineman, J. T., Fogel, Y. R., Kuszmaul, B. C., and Nelson, J. Cache-Oblivious Streaming B-trees. SPAA (2007).Google ScholarGoogle Scholar
  13. Bortnikov, E., Braginsky, A., Hillel, E., Keidar, I., and Sheffi, G. Accordion: Better Memory Organization for LSM Key-Value Stores. PVLDB 11, 12 (2018), 1863--1875. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Brodal, G. S., and Fagerberg, R. Lower Bounds for External Memory Dictionaries. SODA (2003).Google ScholarGoogle Scholar
  15. Bu, Y., Borkar, V. R., Jia, J., Carey, M. J., and Condie, T. Pregelix: Big(ger) Graph Analytics on a Dataflow Engine. PVLDB 8, 2 (2014), 161--172. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Buchsbaum, A. L., Goldwasser, M. H., Venkatasubramanian, S., and Westbrook, J. On External Memory Graph Traversal. SODA (2000). Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Chan, H. H. W., Li, Y., Lee, P. P. C., and Xu, Y. HashKV: Enabling Efficient Updates in KV Storage via Hashing. ATC (2018). Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Chandramouli, B., Prasaad, G., Kossmann, D., Levandoski, J. J., Hunter, J., and Barnett, M. FASTER: A Concurrent Key-Value Store with In-Place Updates. SIGMOD (2018).Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: A Distributed Storage System for Structured Data. OSDI (2006). Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Dayan, N., Athanassoulis, M., and Idreos, S. Monkey: Optimal Navigable Key-Value Store. SIGMOD (2017). Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Dayan, N., Athanassoulis, M., and Idreos, S. Optimal Bloom Filters and Adaptive Merging for LSM-Trees. TODS (to appear (2018).Google ScholarGoogle Scholar
  22. Dayan, N., Bonnet, P., and Idreos, S. GeckoFTL: Scalable Flash Translation Techniques For Very Large Flash Devices. SIGMOD (2016).Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Dayan, N., and Idreos, S. Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging. SIGMOD (2018).Google ScholarGoogle Scholar
  24. Debnath, B., Sengupta, S., and Li, J. FlashStore: high throughput persistent key-value store. PVLDB 3, 1--2 (2010), 1414--1425.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Debnath, B., Sengupta, S., and Li, J. SkimpyStash: RAM space skimpy key-value store on flash-based storage. SIGMOD (2011). Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. Dynamo: Amazon's Highly Available Key-value Store. SIGOPS Op. Sys. Rev. 41, 6 (2007), 205--220. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Dong, S., Callaghan, M., Galanis, L., Borthakur, D., Savor, T., and Strum, M. Optimizing Space Amplification in RocksDB. CIDR (2017).Google ScholarGoogle Scholar
  28. Facebook. RocksDB. https://github.com/facebook/rocksdb.Google ScholarGoogle Scholar
  29. Google. LevelDB. https://github.com/google/leveldb/.Google ScholarGoogle Scholar
  30. Idreos, S., Athanassoulis, Dayan, N., Guo, D., Kester, M. S., Maas, L., and Zoumpatianos, K. Past and future steps for adaptive storage data systems: From shallow to deep adaptivity. In BIRTE (2016).Google ScholarGoogle Scholar
  31. Idreos, S., Dayan, N., Qin, W., Akmanalp, M., Hilgard, S., Ross, A., Lennon, J., Jain, V., Gupta, H., Li, D., and Zhu, Z. Design continuums and the path toward self-designing key-value stores that know and learn. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR) (2019).Google ScholarGoogle Scholar
  32. Idreos, S., Dayan, N., Qin, W., Akmanalp, M., Hilgard, S., Ross, A., Lennon, J., Jain, V., Gupta, H., Li, D., and Zhu, Z. Design continuums and the path toward self-designing key-value stores that know and learn. In Biennial Conference on Innovative Data Systems Research (CIDR) (2019).Google ScholarGoogle Scholar
  33. Idreos, S., Zoumpatianos, K., Athanassoulis, M., Dayan, N., Hentschel, B., Kester, M. S., Guo, D., Maas, L. M., Qin, W., Wasay, A., and Sun, Y. The Periodic Table of Data Structures. IEEE DEBULL 41, 3 (2018), 64--75.Google ScholarGoogle Scholar
  34. Idreos, S., Zoumpatianos, K., Hentschel, B., Kester, M. S., and Guo, D. The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models. SIGMOD (2018).Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Jagadish, H. V., Narayan, P. P. S., Seshadri, S., Sudarshan, S., and Kanneganti, R. Incremental Organization for Data Recording and Warehousing. VLDB (1997). Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Kondylakis, H., Dayan, N., Zoumpatianos, K., and Palpanas, T. Coconut: A scalable bottom-up approach for building data series indexes. Proceedings of the VLDB Endowment 11, 6 (2018), 677--690.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Kondylakis, H., Dayan, N., Zoumpatianos, K., and Palpanas, T. Coconut palm: Static and streaming data series exploration now in your palm. In Proceedings of the ACM SIGMOD International Conference on Management of Data (2019).Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Lakshman, A., and Malik, P. Cassandra - A Decentralized Structured Storage System. SIGOPS Op. Sys. Rev. 44, 2 (2010), 35--40. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Li, Y., He, B., Yang, J., Luo, Q., Yi, K., and Yang, R. J. Tree Indexing on Solid State Drives. PVLDB 3, 1--2 (2010), 1195--1206. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Lim, H., Fan, B., Andersen, D. G., and Kaminsky, M. SILT: A Memory-Efficient, High-Performance Key-Value Store. SOSP (2011).Google ScholarGoogle Scholar
  41. LinkedIn. Voldemort. http://www.project-voldemort.com.Google ScholarGoogle Scholar
  42. Lu, L., Pillai, T. S., Arpaci-Dusseau, A. C., and Arpaci-Dusseau, R. H. WiscKey: Separating Keys from Values in SSD-conscious Storage. FAST (2016). Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Memcached. Reference. http://memcached.org/.Google ScholarGoogle Scholar
  44. MongoDB. Online reference. http://www.mongodb.com/.Google ScholarGoogle Scholar
  45. Olson, M. A., Bostic, K., and Seltzer, M. I. Berkeley DB. ATC (1999). Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. O'Neil, P. E., Cheng, E., Gawlick, D., and O'Neil, E. J. The log-structured merge-tree (LSM-tree). Acta Informatica 33, 4 (1996), 351--385.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Pilman, M., Bocksrocker, K., Braun, L., Marroquin, R., and Kossmann, D. Fast Scans on Key-Value Stores. PVLDB 10, 11 (2017), 1526--1537. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Raju, P., Kadekodi, R., Chidambaram, V., and Abraham, I. PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees. SOSP (2017).Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Raju, P., Ponnapalli, S., Kaminsky, E., Oved, G., Keener, Z., Chidambaram, V., and Abraham, I. mlsm: Making authenticated storage faster in ethereum. In Proceedings of the USENIX Conference on Hot Topics in Storage and File Systems (HotStorage) (2018). Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Redis. Online reference. http://redis.io/.Google ScholarGoogle Scholar
  51. Ren, K., Zheng, Q., Arulraj, J., and Gibson, G. SlimDB: A Space-Efficient Key-Value Storage Engine For Semi-Sorted Data. PVLDB 10, 13 (2017), 2037--2048. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Rhea, S., Wang, E., Wong, E., Atkins, E., and Storer, N. Littletable: A time-series database and its uses. In Proceedings of the ACM SIGMOD International Conference on Management of Data (2017). Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Sears, R., and Ramakrishnan, R. bLSM: A General Purpose Log Structured Merge Tree. SIGMOD (2012).Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Sheehy, J., and Smith, D. Bitcask: A Log-Structured Hash Table for Fast Key/Value Data. Basho White Paper (2010).Google ScholarGoogle Scholar
  55. Shetty, P., Spillane, R. P., Malpani, R., Andrews, B., Seyster, J., and Zadok, E. Building Workload-Independent Storage with VT-trees. FAST (2013). Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Tarkoma, S., Rothenberg, C. E., and Lagerspetz, E. Theory and Practice of Bloom Filters for Distributed Systems. IEEE Communications Surveys & Tutorials 14, 1 (2012), 131--155.Google ScholarGoogle ScholarCross RefCross Ref
  57. Thonangi, R., and Yang, J. On Log-Structured Merge for Solid-State Drives. ICDE (2017).Google ScholarGoogle Scholar
  58. Vo, A.-V., Konda, N., Chauhan, N., Aljumaily, H., and Laefer, D. F. Lessons learned with laser scanning point cloud management in hadoop hbase. In Proceedings of EG-ICE (2019).Google ScholarGoogle Scholar
  59. WiredTiger. Source Code. https://github.com/wiredtiger/wiredtiger.Google ScholarGoogle Scholar
  60. Wu, X., Xu, Y., Shao, Z., and Jiang, S. LSM-trie: An LSM-tree-based Ultra-Large Key-Value Store for Small Data Items. ATC (2015). Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. The Log-Structured Merge-Bush & the Wacky Continuum

                Recommendations

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in
                • Published in

                  cover image ACM Conferences
                  SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
                  June 2019
                  2106 pages
                  ISBN:9781450356435
                  DOI:10.1145/3299869

                  Copyright © 2019 ACM

                  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 25 June 2019

                  Permissions

                  Request permissions about this article.

                  Request Permissions

                  Check for updates

                  Qualifiers

                  • research-article

                  Acceptance Rates

                  SIGMOD '19 Paper Acceptance Rate88of430submissions,20%Overall Acceptance Rate785of4,003submissions,20%

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader