ABSTRACT
Log-structured merge (LSM) trees have emerged as one of the most commonly used disk-based data structures in modern data systems. LSM-trees employ out-of-place ingestion to support high throughput for writes, while their immutable file structure allows for good utilization of disk space. Thus, the log-structured paradigm has been widely adopted in state-of-the-art NoSQL, relational, spatial, and time-series data systems. However, despite their popularity, there is a lack of pedagogical textbook-like material on LSM designs. The goal of this tutorial is to present the fundamental principles of the LSM paradigm along with a digest of optimizations and new designs proposed in recent research and adopted by modern LSM engines. This will serve as introductory material for non-experts, and as a roadmap to cutting-edge LSM results for the LSM-aware researchers and practitioners.
Toward this, we first discuss in detail the basic operations (inserts, updates, deletes, point and range queries), their access patterns, and their paths through the LSM data structure. We then dive into the details of recent research on optimizing each of those operations. We first discuss techniques and designs that optimize data ingestion in LSM-trees and the performance tradeoff constructed by writes and reads for the LSM engines. Finally, we present the rich design space of the log-structured paradigm and outline how to navigate it and tune LSM-based systems. We conclude with a discussion on open challenges on LSM systems. This will be a 1.5-hour tutorial.
- Regulation (EU) 2016/679 of the European Parliament and of the council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC. Official Journal of the European Union (Legislative Acts), pages L119/1 -- L119/88, 2016.Google Scholar
- California Consumer Privacy Act. Assembly Bill No. 375, Chapter 55, 2018.Google Scholar
- The California Privacy Rights Act of 2020. https://thecpra.org/, 2020.Google Scholar
- Virginia Consumer Data Protection Act. https://www.sullcrom.com/files/upload/SC-Publication-Virginia-Second-State-Enact-Privacy-Legislation.pdf, 2021.Google Scholar
- H. Abu-Libdeh, D. Alt?nbü ken, A. Beutel, E. H. Chi, L. Doshi, T. Kraska, Xiaozhou, Li, A. Ly, and C. Olston. Learned Indexes for a Google-scale Disk-based Database. In Proceedings of the Workshop on ML for Systems at NeurIPS, 2020.Google Scholar
- W. Y. Alkowaileet, S. Alsubaiee, and M. J. Carey. An LSM-based Tuple Compaction Framework for Apache AsterixDB. Proceedings of the VLDB Endowment, 13(9):1388--1400, 2020.Google ScholarDigital Library
- S. Alsubaiee, Y. Altowim, H. Altwaijry, A. Behm, V. R. Borkar, Y. Bu, M. J. Carey, I. Cetindil, M. Cheelangi, K. Faraaz, E. Gabrielova, R. Grover, Z. Heilbron, Y.-S. Kim, C. Li, G. Li, J. M. Ok, N. Onose, P. Pirzadeh, V. J. Tsotras, R. Vernica, J. Wen, and T. Westmann. AsterixDB: A Scalable, Open Source BDMS. Proceedings of the VLDB Endowment, 7(14):1905--1916, 2014.Google ScholarDigital Library
- Amazon. Cloud Storage. https://aws.amazon.com/what-is-cloud-storage/.Google Scholar
- Apache. Accumulo. https://accumulo.apache.org/.Google Scholar
- Apache. HBase. http://hbase.apache.org/.Google Scholar
- Apache. Cassandra. http://cassandra.apache.org, 2021.Google Scholar
- M. Athanassoulis, S. Chen, A. Ailamaki, P. B. Gibbons, and R. Stoica. MaSM: Efficient Online Updates in Data Warehouses. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 865--876, 2011.Google ScholarDigital Library
- M. Athanassoulis and S. Idreos. Design Tradeoffs of Data Access Methods. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Tutorial, 2016.Google ScholarDigital Library
- M. Athanassoulis, M. S. Kester, L. M. Maas, R. Stoica, S. Idreos, A. Ailamaki, and M. Callaghan. Designing Access Methods: The RUM Conjecture. In Proceedings of the International Conference on Extending Database Technology (EDBT), pages 461--466, 2016.Google Scholar
- M. Athanassoulis, S. Sarkar, T. I. Papon, Z. Zhu, and D. Staratzis. Building Deletion-Compliant Data Systems. In IEEE Data Engineering Bulletin, 2022.Google Scholar
- O. Balmau, F. Dinu, W. Zwaenepoel, K. Gupta, R. Chandhiramoorthi, and D. Didona. SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores. In Proceedings of the USENIX Annual Technical Conference (ATC), pages 753--766, 2019.Google Scholar
- O. Balmau, F. Dinu, W. Zwaenepoel, K. Gupta, R. Chandhiramoorthi, and D. Didona. SILK+ Preventing Latency Spikes in Log-Structured Merge Key-Value Stores Running Heterogeneous Workloads. ACM Trans. Comput. Syst., 36(4):12:1--12:27, 2020.Google ScholarDigital Library
- A. Breslow and N. Jayasena. Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity. Proceedings of the VLDB Endowment, 11(9):1041--1055, 2018.Google ScholarDigital Library
- M. Callaghan. Compaction priority in RocksDB. http://smalldatum.blogspot.com/2016/02/compaction-priority-in-rocksdb.html, 2016.Google Scholar
- M. Callaghan. Compaction stalls: something to make better in RocksDB. http://smalldatum.blogspot.com/2017/01/compaction-stalls-something-to-make.html, 2017.Google Scholar
- M. Callaghan. Name that compaction algorithm. http://smalldatum.blogspot.com/2018/08/name-that-compaction-algorithm.html, 2018.Google Scholar
- M. Callaghan. Summarizing the different implementations of tiered compaction. http://smalldatum.blogspot.com/2021/12/summarizing-different-implementations.html, 2021.Google Scholar
- Z. Cao, S. Dong, S. Vemuri, and D. H. C. Du. Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST), pages 209--223, 2020.Google Scholar
- B. Chandramouli, G. Prasaad, D. Kossmann, J. J. Levandoski, J. Hunter, and M. Barnett. FASTER: A Concurrent Key-Value Store with In-Place Updates. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 275--290, 2018.Google ScholarDigital Library
- F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A Distributed Storage System for Structured Data. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 205--218, 2006.Google ScholarDigital Library
- S. Chatterjee, M. Jagadeesan, W. Qin, and S. Idreos. Cosine: A Cloud-Cost Optimized Self-Designing Key-Value Storage Engine. In In Proceedings of the Very Large Databases Endowment, 2022.Google Scholar
- H. Chen, L. Liao, H. Jin, and J. Wu. The Dynamic Cuckoo Filter. In Proceedings of the IEEE International Conference on Network Protocols (ICNP), pages 1--10, 2017.Google ScholarCross Ref
- Y. Chen, Y. Lu, F. Yang, Q. Wang, Y. Wang, and J. Shu. FlatStore: An Efficient Log-Structured Key-Value Storage Engine for Persistent Memory. In ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16--20, 2020, pages 1077--1091, 2020.Google Scholar
- B. Dageville. Snowflake Data Cloud. In Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2021.Google Scholar
- Y. Dai, Y. Xu, A. Ganesan, R. Alagappan, B. Kroth, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 155--171, 2020.Google Scholar
- N. Dayan, M. Athanassoulis, and S. Idreos. Monkey: Optimal Navigable Key-Value Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 79--94, 2017.Google ScholarDigital Library
- N. Dayan, M. Athanassoulis, and S. Idreos. Optimal Bloom Filters and Adaptive Merging for LSM-Trees. ACM Transactions on Database Systems (TODS), 43(4):16:1--16:48, 2018.Google Scholar
- N. Dayan and S. Idreos. Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 505--520, 2018.Google ScholarDigital Library
- N. Dayan and S. Idreos. The Log-Structured Merge-Bush & the Wacky Continuum. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 449--466, 2019.Google ScholarDigital Library
- N. Dayan and M. Twitto. Chucky: A Succinct Cuckoo Filter for LSM-Tree. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 365--378, 2021.Google ScholarDigital Library
- G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, 41(6):205--220, 2007.Google ScholarDigital Library
- P. C. Dillinger and S. Walzer. Ribbon filter: practically smaller than Bloom and Xor. CoRR, 2103.02515, 2021.Google Scholar
- S. Dong. Option of Compaction Priority. https://rocksdb.org/blog/2016/01/29/compaction_pri.html, 2016.Google Scholar
- S. Dong, M. Callaghan, L. Galanis, D. Borthakur, T. Savor, and M. Strum. Optimizing Space Amplification in RocksDB. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR), 2017.Google Scholar
- S. Dong, A. Kryczka, Y. Jin, and M. Stumm. Evolution of Development Priorities in Key-value Stores Serving Large-scale Applications: The RocksDB Experience. In 19th USENIX Conference on File and Storage Technologies, FAST 2021, February 23--25, 2021, pages 33--49, 2021.Google Scholar
- S. Dong, A. Kryczka, Y. Jin, and M. Stumm. RocksDB: Evolution of Development Priorities in a Key-value Store Serving Large-scale Applications. ACM Trans. Storage, 17(4):26:1---26:32, 2021.Google ScholarDigital Library
- Facebook. MyRocks. http://myrocks.io/.Google Scholar
- Facebook. MemTable. https://github.com/facebook/rocksdb/wiki/MemTable, 2021.Google Scholar
- Facebook. RocksDB. https://github.com/facebook/rocksdb, 2021.Google Scholar
- B. Fan, D. G. Andersen, M. Kaminsky, and M. Mitzenmacher. Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the ACM International on Conference on emerging Networking Experiments and Technologies (CoNEXT), pages 75--88, 2014.Google ScholarDigital Library
- D. Feinberg, M. Adrian, and A. Ronthal. The Future of Database Management Systems Is Cloud. https://www.gartner.com/document/3941821, 2019.Google Scholar
- P. Ferragina and G. Vinciguerra. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. Proceedings of the VLDB Endowment, 13(8):1162--1175, 2020.Google ScholarDigital Library
- G. Golan-Gueta, E. Bortnikov, E. Hillel, and I. Keidar. Scaling Concurrent Log-Structured Data Stores. In Proceedings of the ACM European Conference on Computer Systems (EuroSys), pages 32:1--32:14, 2015.Google ScholarDigital Library
- Google. LevelDB. https://github.com/google/leveldb/, 2021.Google Scholar
- B. Hayes. Cloud computing. Communications of the ACM, 51(7):9--11, 2008.Google ScholarDigital Library
- P. Helland. Immutability Changes Everything. Communications of the ACM, 59(1):64--70, 2016.Google ScholarDigital Library
- R. Herbster, S. DellaTorre, P. Druschel, and B. Bhattacharjee. Privacy Capsules: Preventing Information Leaks by Mobile Apps. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys 2016, Singapore, June 26--30, 2016, pages 399--411, 2016.Google ScholarDigital Library
- G. Huang, X. Cheng, J. Wang, Y. Wang, D. He, T. Zhang, F. Li, S. Wang, W. Cao, and Q. Li. X-Engine: An Optimized Storage Engine for Large-scale E-commerce Transaction Processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 651--665, 2019.Google ScholarDigital Library
- H. Huang and S. Ghandeharizadeh. Nova-LSM: A Distributed, Component-based LSM-tree Key-value Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 749--763, 2021.Google Scholar
- A. Huynh, H. A. Chaudhari, E. Terzi, and M. Athanassoulis. Endure: A Robust Tuning Paradigm for LSM Trees Under Workload Uncertainty. CoRR, 2110.13801, 2021.Google Scholar
- S. Idreos and M. Callaghan. Key-Value Storage Engines. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 2667--2672, 2020.Google ScholarDigital Library
- S. Idreos, N. Dayan, W. Qin, M. Akmanalp, S. Hilgard, A. Ross, J. Lennon, V. Jain, H. Gupta, D. Li, and Z. Zhu. Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR), 2019.Google Scholar
- S. Idreos, K. Zoumpatianos, M. Athanassoulis, N. Dayan, B. Hentschel, M. S. Kester, D. Guo, L. M. Maas, W. Qin, A. Wasay, and Y. Sun. The Periodic Table of Data Structures. IEEE Data Engineering Bulletin, 41(3):64--75, 2018.Google Scholar
- S. Idreos, K. Zoumpatianos, B. Hentschel, M. S. Kester, and D. Guo. The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 535--550, 2018.Google ScholarDigital Library
- Influxdata. In-memory indexing and the Time-Structured Merge Tree (TSM). https://docs.influxdata.com/influxdb/v1.8/concepts/storage_engine/, 2021.Google Scholar
- S. Kannan, N. Bhat, A. Gavrilovska, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Redesigning LSMs for Nonvolatile Memory with NoveLSM. In H. S. Gunawi and B. Reed, editors, 2018 USENIX Annual Technical Conference, USENIX ATC 2018, Boston, MA, USA, July 11--13, 2018, pages 993--1005. USENIX Association, 2018.Google Scholar
- K. Keahey. Chameleon: Experimental Platform for Cloud Computing Research. https://www.chameleoncloud.org/media/cms_page_media/17/GENI-lex.pdf, 2018.Google Scholar
- M. L. Kersten. Big Data Space Fungus. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR), Gong show talk, 2015.Google Scholar
- J. Kim, S. Lee, and J. S. Vetter. PapyrusKV: a high-performance parallel key-value store for distributed NVM architectures. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017, Denver, CO, USA, November 12 - 17, 2017, pages 57:1---57:14, 2017.Google ScholarDigital Library
- Y.-S. Kim, T. Kim, M. J. Carey, and C. Li. A Comparative Study of Log-Structured Merge-Tree-Based Spatial Indexes for Big Data. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), pages 147--150, 2017.Google ScholarCross Ref
- A. Kipf, R. Marcus, A. van Renen, M. Stoian, A. Kemper, T. Kraska, and T. Neumann. RadixSpline: a single-pass learned index. In Proceedings of the International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM@SIGMOD), pages 5:1---5:5, 2020.Google ScholarDigital Library
- H. Kondylakis, N. Dayan, K. Zoumpatianos, and T. Palpanas. Coconut Palm: Static and Streaming Data Series Exploration Now in your Palm. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), 2019.Google ScholarDigital Library
- H. Kopetz. Internet of Things. In Real-Time Systems, pages 307--323. Springer, 2011.Google ScholarCross Ref
- T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The Case for Learned Index Structures. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 489--504, 2018.Google ScholarDigital Library
- A. Kryczka. Thread Pool. https://github.com/facebook/rocksdb/wiki/Thread-Pool, 2017.Google Scholar
- A. Kryczka. Compaction Styles. https://github.com/facebook/rocksdb/blob/gh-pages-old/talks/2020-07--17-Brownbag-Compactions.pdf, 2020.Google Scholar
- B. C. Kuszmaul. A Comparison of Fractal Trees to Log-Structured Merge (LSM) Trees. Tokutek White Paper, 2014.Google Scholar
- A. Lamb, M. Fuller, and R. Varadarajan. The Vertica Analytic Database: C-Store 7 Years Later. Proceedings of the VLDB Endowment, 5(12):1790--1801, 2012.Google ScholarDigital Library
- B. Lepers, O. Balmau, K. Gupta, and W. Zwaenepoel. KVell: the design and implementation of a fast persistent key-value store. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pages 447--461, 2019.Google ScholarDigital Library
- B. Lepers, O. Balmau, K. Gupta, and W. Zwaenepoel. Kvell+: Snapshot Isolation without Snapshots. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 425--441, 2020.Google Scholar
- Y. Li, C. Tian, F. Guo, C. Li, and Y. Xu. ElasticBF: Elastic Bloom Filter with Hotness Awareness for Boosting Read Performance in Large Key-Value Stores. In Proceedings of the USENIX Annual Technical Conference (ATC), pages 739--752, 2019.Google Scholar
- LinkedIn. Voldemort. http://www.project-voldemort.com.Google Scholar
- L. Lu, T. S. Pillai, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. WiscKey: Separating Keys from Values in SSD-conscious Storage. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST), pages 133--148, 2016.Google ScholarDigital Library
- C. Luo. Breaking Down Memory Walls in LSM-based Storage Systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 2817--2819, 2020.Google Scholar
- C. Luo and M. J. Carey. Efficient Data Ingestion and Query Processing for LSM-Based Storage Systems. Proceedings of the VLDB Endowment, 12(5):531--543, 2019.Google ScholarDigital Library
- C. Luo and M. J. Carey. On Performance Stability in LSM-based Storage Systems. Proceedings of the VLDB Endowment, 13(4):449--462, 2019.Google ScholarDigital Library
- C. Luo and M. J. Carey. Breaking Down Memory Walls: Adaptive Memory Management in LSM-based Storage Systems. Proceedings of the VLDB Endowment, 14(3):241--254, 2020.Google ScholarDigital Library
- C. Luo and M. J. Carey. LSM-based Storage Techniques: A Survey. The VLDB Journal, 29(1):393--418, 2020.Google ScholarDigital Library
- S. Luo, S. Chatterjee, R. Ketsetsidis, N. Dayan, W. Qin, and S. Idreos. Rosetta: A Robust Space-Time Optimized Range Filter for Key-Value Stores. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 2071--2086, 2020.Google ScholarDigital Library
- A. Madan and A. Kryczka. DeleteRange: A New Native RocksDB Operation. https://rocksdb.org/blog/2018/11/21/delete-range.html, 2018.Google Scholar
- Q. Mao, M. A. Qader, and V. Hristidis. Comprehensive Comparison of LSM Architectures for Spatial Data. In Proceedings of the IEEE International Conference on Big Data (BigData), pages 455--460, 2020.Google ScholarCross Ref
- Y. Matsunobu, S. Dong, and H. Lee. MyRocks: LSM-Tree Database Storage Engine Serving Facebook's Social Graph. Proceedings of the VLDB Endowment, 13(12):3217--3230, 2020.Google ScholarDigital Library
- F. Mei, Q. Cao, H. Jiang, and J. Li. SifrDB: A Unified Solution for Write-Optimized Key-Value Stores in Large Datacenter. In Proceedings of the ACM Symposium on Cloud Computing, SoCC 2018, Carlsbad, CA, USA, October 11--13, 2018, pages 477--489, 2018.Google ScholarDigital Library
- G. Moerkotte. Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing. In Proceedings of the International Conference on Very Large Data Bases (VLDB), pages 476--487, 1998.Google ScholarDigital Library
- MongoDB. Online reference. http://www.mongodb.com/.Google Scholar
- P. Muth, P. E. O'Neil, A. Pick, and G. Weikum. The LHAM Log-structured History Data Access Method. The VLDB Journal, 8(3--4):199--221, 2000.Google Scholar
- M. A. Olson, K. Bostic, and M. I. Seltzer. Berkeley DB. In Proceedings of the USENIX Annual Technical Conference (ATC), pages 183--191, 1999.Google Scholar
- P. E. O'Neil, E. Cheng, D. Gawlick, and E. J. O'Neil. The log-structured merge-tree (LSM-tree). Acta Informatica, 33(4):351--385, 1996.Google ScholarDigital Library
- F. Özcan, Y. Tian, and P. Tö zü n. Hybrid Transactional/Analytical Processing: A Survey. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 1771--1775, 2017.Google Scholar
- F. Pan, Y. Yue, and J. Xiong. dCompaction: Delayed Compaction for the LSM-Tree. Int. J. Parallel Program., 45(6):1310--1325, 2017.Google ScholarDigital Library
- H. Park and K. Shim. FAST: Flash-aware external sorting for mobile database systems. Journal of Systems and Software, 82(8):1298--1312, 2009.Google ScholarDigital Library
- M. A. Qader, S. Cheng, and V. Hristidis. A Comparative Study of Secondary Indexing Techniques in LSM-based NoSQL Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 551--566, 2018.Google ScholarDigital Library
- P. Raju, R. Kadekodi, V. Chidambaram, and I. Abraham. PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pages 497--514, 2017.Google ScholarDigital Library
- K. Ren, Q. Zheng, J. Arulraj, and G. Gibson. SlimDB: A Space-Efficient Key-Value Storage Engine For Semi-Sorted Data. Proceedings of the VLDB Endowment, 10(13):2037--2048, 2017.Google ScholarDigital Library
- RocksDB. Low Priority Write. https://github.com/facebook/rocksdb/wiki/Low-Priority-Write, 2019.Google Scholar
- RocksDB. Single Delete. https://github.com/facebook/rocksdb/wiki/Single-Delete, 2019.Google Scholar
- RocksDB. Leveled Compaction. https://github.com/facebook/rocksdb/wiki/Leveled-Compaction, 2020.Google Scholar
- RocksDB. Prefix Bloom Filter. https://github.com/facebook/rocksdb/wiki/Prefix-Seek#configure-prefix-bloom-filter, 2020.Google Scholar
- RocksDB. Universal Compaction. https://github.com/facebook/rocksdb/wiki/Universal-Compaction, 2020.Google Scholar
- RocksDB. Block Cache. https://github.com/facebook/rocksdb/wiki/Block-Cache, 2021.Google Scholar
- RocksDB. Merge Operator. https://github.com/facebook/rocksdb/wiki/Merge-Operator, 2021.Google Scholar
- RocksDB. RocksDB Tuning Guide. https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide, 2021.Google Scholar
- S. Sarkar and M. Athanassoulis. Query Language Support for Timely Data Deletion. In Proceedings of the International Conference on Extending Database Technology (EDBT), 2022.Google Scholar
- S. Sarkar, J.-P. Banâ tre, L. Rilling, and C. Morin. Towards Enforcement of the EU GDPR: Enabling Data Erasure. In Proceedings of the IEEE International Conference of Internet of Things (iThings), pages 1--8, 2018.Google ScholarCross Ref
- S. Sarkar, S. Chatterjee, and S. Misra. Assessment of the Suitability of Fog Computing in the Context of Internet of Things. IEEE Transactions on Cloud Computing (TCC), 6(1):46--59, 2018.Google ScholarCross Ref
- S. Sarkar, K. Chen, Z. Zhu, and M. Athanassoulis. Compactionary: A Dictionary for LSM Compactions. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2022.Google ScholarDigital Library
- S. Sarkar, T. I. Papon, D. Staratzis, and M. Athanassoulis. Lethe: A Tunable Delete-Aware LSM Engine. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 893--908, 2020.Google ScholarDigital Library
- S. Sarkar, D. Staratzis, Z. Zhu, and M. Athanassoulis. Constructing and Analyzing the LSM Compaction Design Space. Proceedings of the VLDB Endowment, 14(11):2216--2229, 2021.Google ScholarDigital Library
- ScyllaDB. Online reference. https://www.scylladb.com/.Google Scholar
- R. Sears and R. Ramakrishnan. bLSM: A General Purpose Log Structured Merge Tree. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 217--228, 2012.Google ScholarDigital Library
- M. I. Seltzer. Berkeley DB: A Retrospective. IEEE Data Engineering Bulletin, 30(3):21--28, 2007.Google Scholar
- W. Tan, S. Tata, Y. R. Tang, and L. L. Fong. Diff-Index: Differentiated Index in Distributed Log-Structured Data Stores. In Proceedings of the International Conference on Extending Database Technology (EDBT), pages 700--711, 2014.Google Scholar
- Y. R. Tang, A. Iyengar, W. Tan, L. L. Fong, L. Liu, and B. Palanisamy. Deferred Lightweight Indexing for Log-Structured Key-Value Stores. In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2015, Shenzhen, China, May 4--7, 2015, pages 11--20, 2015.Google Scholar
- D. Teng, L. Guo, R. Lee, F. Chen, S. Ma, Y. Zhang, and X. Zhang. LSbM-tree: Re-Enabling Buffer Caching in Data Management for Mixed Reads and Writes. In 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, Atlanta, GA, USA, June 5--8, 2017, pages 68--79, 2017.Google Scholar
- R. Thonangi, S. Babu, and J. Yang. A Practical Concurrent Index for Solid-State Drives. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), pages 1332--1341, 2012.Google ScholarDigital Library
- R. Thonangi and J. Yang. On Log-Structured Merge for Solid-State Drives. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), pages 683--694, 2017.Google ScholarCross Ref
- T. Vincc on, S. Hardock, C. Riegger, J. Oppermann, A. Koch, and I. Petrov. NoFTL-KV: TacklingWrite-Amplification on KV-Stores with Native Storage Management. In Proceedings of the International Conference on Extending Database Technology (EDBT), pages 457--460, 2018.Google Scholar
- A.-V. Vo, N. Konda, N. Chauhan, H. Aljumaily, and D. F. Laefer. Lessons Learned with Laser Scanning Point Cloud Management in Hadoop HBase. In Proceedings of EG-ICE, 2019.Google Scholar
- P. Wang, G. Sun, S. Jiang, J. Ouyang, S. Lin, C. Zhang, and J. Cong. An Efficient Design and Implementation of LSM-Tree based Key-Value Store on Open-Channel SSD. In Proceedings of the ACM European Conference on Computer Systems (EuroSys), pages 16:1--16:14, 2014.Google ScholarDigital Library
- WiredTiger. Source Code. https://github.com/wiredtiger/wiredtiger, 2021.Google Scholar
- F. Wu. Improving Point-Lookup Using Data Block Hash Index. https://rocksdb.org/blog/2018/08/23/data-block-hash-index.html, 2018.Google Scholar
- X. Wu, Y. Xu, Z. Shao, and S. Jiang. LSM-trie: An LSM-tree-based Ultra-Large Key-Value Store for Small Data Items. In Proceedings of the USENIX Annual Technical Conference (ATC), pages 71--82, 2015.Google ScholarDigital Library
- L. Yang, H. Wu, T. Zhang, X. Cheng, F. Li, L. Zou, Y. Wang, R. Chen, J. Wang, and G. Huang. Leaper: A Learned Prefetcher for Cache Invalidation in LSM-tree based Storage Engines. Proceedings of the VLDB Endowment, 13(11):1976--1989, 2020.Google ScholarDigital Library
- T. Yao, J. Wan, P. Huang, X. He, Q. Gui, F. Wu, and C. Xie. A Light-weight Compaction Tree to Reduce I/O Amplification toward Efficient Key-Value Stores. In Proceedings of the IEEE Symposium on Mass Storage Systems and Technologies (MSST), 2017.Google Scholar
- T. Yao, J. Wan, P. Huang, X. He, F. Wu, and C. Xie. Building Efficient Key-Value Stores via a Lightweight Compaction Tree. ACM Transactions on Storage (TOS), 13(4):29:1--29:28, 2017.Google Scholar
- H. Zhang, H. Lim, V. Leis, D. G. Andersen, M. Kaminsky, K. Keeton, and A. Pavlo. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 323--336, 2018.Google ScholarDigital Library
- H. Zhang, H. Lim, V. Leis, D. G. Andersen, M. Kaminsky, K. Keeton, and A. Pavlo. Succinct Range Filters. ACM Transactions on Database Systems (TODS), 45(2):5:1---5:31, 2020.Google Scholar
- W. Zhang, Y. Xu, Y. Li, and D. Li. Improving Write Performance of LSMT-Based Key-Value Store. In 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016, Wuhan, China, December 13--16, 2016, pages 553--560, 2016.Google Scholar
- W. Zhang, X. Zhao, S. Jiang, and H. Jiang. ChameleonDB: a key-value store for optane persistent memory. In EuroSys '21: Sixteenth European Conference on Computer Systems, Online Event, United Kingdom, April 26--28, 2021, pages 194--209, 2021.Google ScholarDigital Library
- Z. Zhang, Y. Yue, B. He, J. Xiong, M. Chen, L. Zhang, and N. Sun. Pipelined Compaction for the LSM-Tree. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ, USA, May 19--23, 2014, pages 777--786, 2014.Google Scholar
- Y. Zhu, Z. Zhang, P. Cai, W. Qian, and A. Zhou. An Efficient Bulk Loading Approach of Secondary Index in Distributed Log-Structured Data Stores. In Database Systems for Advanced Applications - 22nd International Conference, DASFAA 2017, Suzhou, China, March 27--30, 2017, Proceedings, Part I, volume 10177 of Lecture Notes in Computer Science, pages 87--102, 2017.Google Scholar
- Z. Zhu, J. H. Mun, A. Raman, and M. Athanassoulis. Reducing Bloom Filter CPU Overhead in LSM-Trees on Modern Storage Devices. http://disc-projects.bu.edu/documents/DiSC-TR-Reducing-BF-Overhead-in-LSM.pdf, 2021.Google Scholar
Index Terms
- Dissecting, Designing, and Optimizing LSM-based Data Stores
Recommendations
Compactionary: A Dictionary for LSM Compactions
SIGMOD '22: Proceedings of the 2022 International Conference on Management of DataLog-structured merge (LSM) trees are widely used as the storage layer of modern NoSQL data stores, as they offer efficient ingestion performance. To enable competitive read performance and reduce space amplification, LSM-trees re-organize data layout on ...
Near-Data Processing-Enabled and Time-Aware Compaction Optimization for LSM-tree-based Key-Value Stores
ICPP '19: Proceedings of the 48th International Conference on Parallel ProcessingWith the growing volume of storage systems, the traditional relational databases cannot reach the high performance required by big-data applications. As high-throughput alternatives to relational databases, LSM-tree-based key-value stores (KV stores in ...
Towards Read-Intensive Key-Value Stores with Tidal Structure Based on LSM-Tree
ASPDAC '20: Proceedings of the 25th Asia and South Pacific Design Automation ConferenceKey-value store has played a critical role in many large-scale data storage applications. The log-structured merge-tree (LSM-tree) based key-value store achieves excellent performance on write-intensive workloads which is mainly benefited from the ...
Comments