ABSTRACT
Data-intensive key-value stores based on the Log-Structured Merge-Tree are used in numerous modern applications ranging from social media and data science to cloud infrastructure. We show that such designs exhibit an intrinsic contention between the costs of point reads, writes and memory, and that this trade-off deteriorates as the data size grows. The root of the problem is that in all existing designs, the capacity ratio between any pair of levels is fixed. This causes write cost to increase with the data size while yielding exponentially diminishing returns for point reads and memory. We introduce the Log-Structured Merge-Bush (LSM-Bush), a new data structure that sets increasing capacity ratios between adjacent pairs of smaller levels. As a result, smaller levels get lazier by gathering more runs before merging them. By using a doubly-exponential ratio growth rate, LSM-bush brings write cost down from \small$O(łog N)$ to \small$O(łogłog N)$, and it can trade this gain to either improve point reads or memory. Thus, it enables more scalable trade-offs all around. We further introduce Wacky, a design continuum that includes LSM-Bush as well as all state-of-the-art merge policies, from laziest to greediest, and can assume any of them within a single implementation. Wacky encompasses a vast space of performance properties, including ones that favor range reads, and it can be searched analytically to find the design that performs best for a given workload in practice.
- Agrawal, N., Prabhakaran, V., Wobber, T., Davis, J. D., Manasse, M., and Panigrahy, R. Design Tradeoffs for SSD Performance. ATC (2008). Google ScholarDigital Library
- Anand, A., Muthukrishnan, C., Kappes, S., Akella, A., and Nath, S. Cheap and Large CAMs for High Performance Data-Intensive Networked Systems. NSDI (2010). Google ScholarDigital Library
- Andersen, D. G., Franklin, J., Kaminsky, M., Phanishayee, A., Tan, L., and Vasudevan, V. FAWN: A Fast Array of Wimpy Nodes. SOSP (2009). Google ScholarDigital Library
- Apache. Accumulo. https://accumulo.apache.org/.Google Scholar
- Apache. Cassandra. http://cassandra.apache.org.Google Scholar
- Apache. HBase. http://hbase.apache.org/.Google Scholar
- Arge, L. The Buffer Tree: A Technique for Designing Batched External Data Structures. Algorithmica 37, 1 (2003), 1--24.Google Scholar
- Armstrong, T. G., Ponnekanti, V., Borthakur, D., and Callaghan, M. LinkBench: a Database Benchmark Based on the Facebook Social Graph. SIGMOD (2013).Google ScholarDigital Library
- Athanassoulis, M., Chen, S., Ailamaki, A., Gibbons, P. B., and Stoica, R. Online Updates on Data Warehouses via Judicious Use of Solid-State Storage. TODS 40, 1 (2015). Google ScholarDigital Library
- Athanassoulis, M., Kester, M. S., Maas, L. M., Stoica, R., Idreos, S., Ailamaki, A., and Callaghan, M. Designing Access Methods: The RUM Conjecture. EDBT (2016).Google Scholar
- Balmau, O., Didona, D., Guerraoui, R., Zwaenepoel, W., Yuan, H., Arora, A., Gupta, K., and Konka, P. TRIAD: Creating Synergies Between Memory, Disk and Log in Log Structured Key-Value Stores. ATC (2017).Google Scholar
- Bender, M. A., Farach-Colton, M., Fineman, J. T., Fogel, Y. R., Kuszmaul, B. C., and Nelson, J. Cache-Oblivious Streaming B-trees. SPAA (2007).Google Scholar
- Bortnikov, E., Braginsky, A., Hillel, E., Keidar, I., and Sheffi, G. Accordion: Better Memory Organization for LSM Key-Value Stores. PVLDB 11, 12 (2018), 1863--1875. Google ScholarDigital Library
- Brodal, G. S., and Fagerberg, R. Lower Bounds for External Memory Dictionaries. SODA (2003).Google Scholar
- Bu, Y., Borkar, V. R., Jia, J., Carey, M. J., and Condie, T. Pregelix: Big(ger) Graph Analytics on a Dataflow Engine. PVLDB 8, 2 (2014), 161--172. Google ScholarDigital Library
- Buchsbaum, A. L., Goldwasser, M. H., Venkatasubramanian, S., and Westbrook, J. On External Memory Graph Traversal. SODA (2000). Google ScholarDigital Library
- Chan, H. H. W., Li, Y., Lee, P. P. C., and Xu, Y. HashKV: Enabling Efficient Updates in KV Storage via Hashing. ATC (2018). Google ScholarDigital Library
- Chandramouli, B., Prasaad, G., Kossmann, D., Levandoski, J. J., Hunter, J., and Barnett, M. FASTER: A Concurrent Key-Value Store with In-Place Updates. SIGMOD (2018).Google ScholarDigital Library
- Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: A Distributed Storage System for Structured Data. OSDI (2006). Google ScholarDigital Library
- Dayan, N., Athanassoulis, M., and Idreos, S. Monkey: Optimal Navigable Key-Value Store. SIGMOD (2017). Google ScholarDigital Library
- Dayan, N., Athanassoulis, M., and Idreos, S. Optimal Bloom Filters and Adaptive Merging for LSM-Trees. TODS (to appear (2018).Google Scholar
- Dayan, N., Bonnet, P., and Idreos, S. GeckoFTL: Scalable Flash Translation Techniques For Very Large Flash Devices. SIGMOD (2016).Google ScholarDigital Library
- Dayan, N., and Idreos, S. Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging. SIGMOD (2018).Google Scholar
- Debnath, B., Sengupta, S., and Li, J. FlashStore: high throughput persistent key-value store. PVLDB 3, 1--2 (2010), 1414--1425.Google ScholarDigital Library
- Debnath, B., Sengupta, S., and Li, J. SkimpyStash: RAM space skimpy key-value store on flash-based storage. SIGMOD (2011). Google ScholarDigital Library
- DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. Dynamo: Amazon's Highly Available Key-value Store. SIGOPS Op. Sys. Rev. 41, 6 (2007), 205--220. Google ScholarDigital Library
- Dong, S., Callaghan, M., Galanis, L., Borthakur, D., Savor, T., and Strum, M. Optimizing Space Amplification in RocksDB. CIDR (2017).Google Scholar
- Facebook. RocksDB. https://github.com/facebook/rocksdb.Google Scholar
- Google. LevelDB. https://github.com/google/leveldb/.Google Scholar
- Idreos, S., Athanassoulis, Dayan, N., Guo, D., Kester, M. S., Maas, L., and Zoumpatianos, K. Past and future steps for adaptive storage data systems: From shallow to deep adaptivity. In BIRTE (2016).Google Scholar
- Idreos, S., Dayan, N., Qin, W., Akmanalp, M., Hilgard, S., Ross, A., Lennon, J., Jain, V., Gupta, H., Li, D., and Zhu, Z. Design continuums and the path toward self-designing key-value stores that know and learn. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR) (2019).Google Scholar
- Idreos, S., Dayan, N., Qin, W., Akmanalp, M., Hilgard, S., Ross, A., Lennon, J., Jain, V., Gupta, H., Li, D., and Zhu, Z. Design continuums and the path toward self-designing key-value stores that know and learn. In Biennial Conference on Innovative Data Systems Research (CIDR) (2019).Google Scholar
- Idreos, S., Zoumpatianos, K., Athanassoulis, M., Dayan, N., Hentschel, B., Kester, M. S., Guo, D., Maas, L. M., Qin, W., Wasay, A., and Sun, Y. The Periodic Table of Data Structures. IEEE DEBULL 41, 3 (2018), 64--75.Google Scholar
- Idreos, S., Zoumpatianos, K., Hentschel, B., Kester, M. S., and Guo, D. The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models. SIGMOD (2018).Google ScholarDigital Library
- Jagadish, H. V., Narayan, P. P. S., Seshadri, S., Sudarshan, S., and Kanneganti, R. Incremental Organization for Data Recording and Warehousing. VLDB (1997). Google ScholarDigital Library
- Kondylakis, H., Dayan, N., Zoumpatianos, K., and Palpanas, T. Coconut: A scalable bottom-up approach for building data series indexes. Proceedings of the VLDB Endowment 11, 6 (2018), 677--690.Google ScholarDigital Library
- Kondylakis, H., Dayan, N., Zoumpatianos, K., and Palpanas, T. Coconut palm: Static and streaming data series exploration now in your palm. In Proceedings of the ACM SIGMOD International Conference on Management of Data (2019).Google ScholarDigital Library
- Lakshman, A., and Malik, P. Cassandra - A Decentralized Structured Storage System. SIGOPS Op. Sys. Rev. 44, 2 (2010), 35--40. Google ScholarDigital Library
- Li, Y., He, B., Yang, J., Luo, Q., Yi, K., and Yang, R. J. Tree Indexing on Solid State Drives. PVLDB 3, 1--2 (2010), 1195--1206. Google ScholarDigital Library
- Lim, H., Fan, B., Andersen, D. G., and Kaminsky, M. SILT: A Memory-Efficient, High-Performance Key-Value Store. SOSP (2011).Google Scholar
- LinkedIn. Voldemort. http://www.project-voldemort.com.Google Scholar
- Lu, L., Pillai, T. S., Arpaci-Dusseau, A. C., and Arpaci-Dusseau, R. H. WiscKey: Separating Keys from Values in SSD-conscious Storage. FAST (2016). Google ScholarDigital Library
- Memcached. Reference. http://memcached.org/.Google Scholar
- MongoDB. Online reference. http://www.mongodb.com/.Google Scholar
- Olson, M. A., Bostic, K., and Seltzer, M. I. Berkeley DB. ATC (1999). Google ScholarDigital Library
- O'Neil, P. E., Cheng, E., Gawlick, D., and O'Neil, E. J. The log-structured merge-tree (LSM-tree). Acta Informatica 33, 4 (1996), 351--385.Google ScholarDigital Library
- Pilman, M., Bocksrocker, K., Braun, L., Marroquin, R., and Kossmann, D. Fast Scans on Key-Value Stores. PVLDB 10, 11 (2017), 1526--1537. Google ScholarDigital Library
- Raju, P., Kadekodi, R., Chidambaram, V., and Abraham, I. PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees. SOSP (2017).Google ScholarDigital Library
- Raju, P., Ponnapalli, S., Kaminsky, E., Oved, G., Keener, Z., Chidambaram, V., and Abraham, I. mlsm: Making authenticated storage faster in ethereum. In Proceedings of the USENIX Conference on Hot Topics in Storage and File Systems (HotStorage) (2018). Google ScholarDigital Library
- Redis. Online reference. http://redis.io/.Google Scholar
- Ren, K., Zheng, Q., Arulraj, J., and Gibson, G. SlimDB: A Space-Efficient Key-Value Storage Engine For Semi-Sorted Data. PVLDB 10, 13 (2017), 2037--2048. Google ScholarDigital Library
- Rhea, S., Wang, E., Wong, E., Atkins, E., and Storer, N. Littletable: A time-series database and its uses. In Proceedings of the ACM SIGMOD International Conference on Management of Data (2017). Google ScholarDigital Library
- Sears, R., and Ramakrishnan, R. bLSM: A General Purpose Log Structured Merge Tree. SIGMOD (2012).Google ScholarDigital Library
- Sheehy, J., and Smith, D. Bitcask: A Log-Structured Hash Table for Fast Key/Value Data. Basho White Paper (2010).Google Scholar
- Shetty, P., Spillane, R. P., Malpani, R., Andrews, B., Seyster, J., and Zadok, E. Building Workload-Independent Storage with VT-trees. FAST (2013). Google ScholarDigital Library
- Tarkoma, S., Rothenberg, C. E., and Lagerspetz, E. Theory and Practice of Bloom Filters for Distributed Systems. IEEE Communications Surveys & Tutorials 14, 1 (2012), 131--155.Google ScholarCross Ref
- Thonangi, R., and Yang, J. On Log-Structured Merge for Solid-State Drives. ICDE (2017).Google Scholar
- Vo, A.-V., Konda, N., Chauhan, N., Aljumaily, H., and Laefer, D. F. Lessons learned with laser scanning point cloud management in hadoop hbase. In Proceedings of EG-ICE (2019).Google Scholar
- WiredTiger. Source Code. https://github.com/wiredtiger/wiredtiger.Google Scholar
- Wu, X., Xu, Y., Shao, Z., and Jiang, S. LSM-trie: An LSM-tree-based Ultra-Large Key-Value Store for Small Data Items. ATC (2015). Google ScholarDigital Library
Index Terms
- The Log-Structured Merge-Bush & the Wacky Continuum
Recommendations
Splaying Log-Structured Merge-Trees
SIGMOD '18: Proceedings of the 2018 International Conference on Management of DataModern persistent key-value stores typically use a log-structured merge-tree (LSM-tree) design, which allows for high write throughput. Our observation is that the LSM-tree, however, has suboptimal performance during read-intensive workload windows with ...
Revisiting Log-Structured Merging for KV Stores in Hybrid Memory Systems
ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2We present MioDB, a novel LSM-tree based key-value (KV) store system designed to fully exploit the advantages of byte-addressable non-volatile memories (NVMs). Our experimental studies reveal that the performance bottleneck of LSM-tree based KV stores ...
On Integration of Appends and Merges in Log-Structured Merge Trees
ICPP '19: Proceedings of the 48th International Conference on Parallel ProcessingAs widely used indices in key-value stores, the Log-Structured Merge-tree (LSM-tree) and its variants suffer from severe write amplification due to frequent merges in compactions for write-intensive applications. To address the problem, we first propose ...
Comments