Abstract
The Log-Structured Merge-Tree (LSM-tree) has been widely adopted for use in modern NoSQL systems for its superior write performance. Despite the popularity of LSM-trees, they have been criticized for suffering from write stalls and large performance variances due to the inherent mismatch between their fast in-memory writes and slow background I/O operations. In this paper, we use a simple yet effective two-phase experimental approach to evaluate write stalls for various LSM-tree designs. We further identify and explore the design choices of LSM merge schedulers to minimize write stalls given an I/O bandwidth budget. We have conducted extensive experiments in the context of the Apache AsterixDB system and we present the results here.
- AsterixDB. https://asterixdb.apache.org/.Google Scholar
- Cassandra. http://cassandra.apache.org/.Google Scholar
- Compaction stalls: something to make better in RocksDB. http://smalldatum.blogspot.com/2017/01/compaction-stalls-something-to-make.html.Google Scholar
- HBase. https://hbase.apache.org/.Google Scholar
- LevelDB. http://leveldb.org/.Google Scholar
- Read- and latency-optimized log structured merge tree. https://github.com/sears/bLSM/.Google Scholar
- RocksDB. http://rocksdb.org/.Google Scholar
- SyllaDB. https://www.scylladb.com/.Google Scholar
- Tarantool. https://www.tarantool.io/.Google Scholar
- TPC-C. http://www.tpc.org/tpcc/.Google Scholar
- WiredTiger. http://www.wiredtiger.com/.Google Scholar
- YCSB change log. https://github.com/brianfrankcooper/YCSB/blob/master/core/CHANGES.md.Google Scholar
- M. Y. Ahmad and B. Kemme. Compaction management in distributed key-value datastores. PVLDB, 8(8):850--861, 2015.Google ScholarDigital Library
- S. Alsubaiee et al. AsterixDB: A scalable, open source BDMS. PVLDB, 7(14):1905--1916, 2014.Google ScholarDigital Library
- M. Armbrust et al. Piql: Success-tolerant query processing in the cloud. PVLDB, 5(3):181--192, 2011.Google ScholarDigital Library
- M. Armbrust et al. Generalized scale independence through incremental precomputation. In ACM SIGMOD, pages 625--636, 2013.Google ScholarDigital Library
- O. Balmau et al. FloDB: Unlocking memory in persistent key-value stores. In European Conference on Computer Systems (EuroSys), pages 80--94, 2017.Google Scholar
- O. Balmau et al. TRIAD: Creating synergies between memory, disk and log in log structured key-value stores. In USENIX Annual Technical Conference (ATC), pages 363--375, 2017.Google Scholar
- B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. CACM, 13(7):422--426, July 1970.Google ScholarDigital Library
- G. Candea et al. A scalable, predictable join operator for highly concurrent data warehouses. PVLDB, 2(1):277--288, 2009.Google ScholarDigital Library
- Z. Cao et al. On the performance variation in modern storage stacks. In USENIX Conference on File and Storage Technologies (FAST), pages 329--343, 2017.Google Scholar
- M. J. Carey. AsterixDB mid-flight: A case study in building systems in academia. In ICDE, pages 1--12, 2019.Google Scholar
- S. Chaudhuri et al. Variance aware optimization of parameterized queries. In ACM SIGMOD, pages 531--542. ACM, 2010.Google ScholarDigital Library
- B. F. Cooper et al. Benchmarking cloud serving systems with YCSB. In ACM SoCC, pages 143--154, 2010.Google ScholarDigital Library
- N. Dayan et al. Monkey: Optimal navigable key-value store. In ACM SIGMOD, pages 79--94, 2017.Google Scholar
- N. Dayan et al. Optimal Bloom filters and adaptive merging for LSM-trees. ACM TODS, 43(4):16:1--16:48, Dec. 2018.Google ScholarDigital Library
- N. Dayan and S. Idreos. Dostoevsky: Better space-time trade-offs for LSM-tree based key-value stores via adaptive removal of superfluous merging. In ACM SIGMOD, pages 505--520, 2018.Google ScholarDigital Library
- N. Dayan and S. Idreos. The log-structured merge-bush & the wacky continuum. In ACM SIGMOD, pages 449--466, 2019.Google ScholarDigital Library
- J. Dean and L. A. Barroso. The tail at scale. CACM, 56:74--80, 2013.Google ScholarDigital Library
- S. Dong et al. Optimizing space amplification in RocksDB. In CIDR, volume 3, page 3, 2017.Google Scholar
- R. Grover and M. J. Carey. Data ingestion in AsterixDB. In EDBT, pages 605--616, 2015.Google Scholar
- M. Harchol-Balter. Performance modeling and design of computer systems: queueing theory in action. Cambridge University Press, 2013.Google ScholarDigital Library
- G. Huang et al. X-Engine: An optimized storage engine for large-scale E-commerce transaction processing. In ACM SIGMOD, pages 651--665, 2019.Google Scholar
- J. Huang et al. Statistical analysis of latency through semantic profiling. In European Conference on Computer Systems (EuroSys), pages 64--79, 2017.Google ScholarDigital Library
- J. Huang et al. A top-down approach to achieving performance predictability in database systems. In ACM SIGMOD, pages 745--758, 2017.Google ScholarDigital Library
- Y. Li et al. Tree indexing on solid state drives. PVLDB, 3(1--2):1195--1206, 2010.Google Scholar
- Y. Li et al. Enabling efficient updates in KV storage via hashing: Design and performance evaluation. ACM Transactions on Storage (TOS), 15(3):20, 2019.Google Scholar
- H. Lim et al. Towards accurate and fast evaluation of multi-stage log-structured designs. In USENIX Conference on File and Storage Technologies (FAST), pages 149--166, 2016.Google Scholar
- L. Lu et al. WiscKey: Separating keys from values in SSD-conscious storage. In USENIX Conference on File and Storage Technologies (FAST), pages 133--148, 2016.Google Scholar
- C. Luo and M. J. Carey. Efficient data ingestion and query processing for LSM-based storage systems. PVLDB, 12(5):531--543, 2019.Google ScholarDigital Library
- C. Luo and M. J. Carey. LSM-based storage techniques: a survey. The VLDB Journal, 2019.Google ScholarCross Ref
- C. Luo and M. J. Carey. On performance stability in LSM-based storage systems (extended version). CoRR, abs/1906.09667, 2019.Google Scholar
- C. Luo et al. Umzi: Unified multi-zone indexing for large-scale HTAP. In EDBT, pages 1--12, 2019.Google Scholar
- F. Mei et al. SifrDB: A unified solution for write-optimized key-value stores in large datacenter. In ACM SoCC, pages 477--489, 2018.Google Scholar
- P. O'Neil et al. The log-structured merge-tree (LSM-tree). Acta Inf., 33(4):351--385, 1996.Google ScholarDigital Library
- M. A. Qader et al. A comparative study of secondary indexing techniques in LSM-based NoSQL databases. In ACM SIGMOD, pages 551--566, 2018.Google ScholarDigital Library
- P. Raju et al. PebblesDB: Building key-value stores using fragmented log-structured merge trees. In ACM SOSP, pages 497--514, 2017.Google Scholar
- V. Raman et al. Constant-time query processing. In ICDE, pages 60--69, 2008.Google ScholarDigital Library
- K. Ren et al. SlimDB: A space-efficient key-value storage engine for semi-sorted data. PVLDB, 10(13):2037--2048, 2017.Google ScholarDigital Library
- R. Sears and E. Brewer. Stasis: Flexible transactional storage. In Symposium on Operating Systems Design and Implementation (OSDI), pages 29--44, 2006.Google Scholar
- R. Sears and R. Ramakrishnan. bLSM: A general purpose log structured merge tree. In ACM SIGMOD, pages 217--228, 2012.Google ScholarDigital Library
- D. Teng et al. LSbM-tree: Re-enabling buffer caching in data management for mixed reads and writes. In IEEE International Conference on Distributed Computing Systems (ICDCS), pages 68--79, 2017.Google ScholarCross Ref
- R. Thonangi and J. Yang. On log-structured merge for solid-state drives. In ICDE, pages 683--694, 2017.Google ScholarCross Ref
- P. Unterbrunner et al. Predictable performance for unpredictable workloads. PVLDB, 2(1):706--717, 2009.Google ScholarDigital Library
- X. Wang and M. J. Carey. An IDEA: An ingestion framework for data enrichment in AsterixDB. PVLDB., 12(11):1485--1498, 2019.Google ScholarDigital Library
- T. Yao et al. A light-weight compaction tree to reduce I/O amplification toward efficient key-value stores. In International Conference on Massive Storage Systems and Technology (MSST), 2017.Google Scholar
- C. Yunpeng et al. LDC: a lower-level driven compaction method to optimize SSD-oriented key-value stores. In ICDE, pages 722--733, 2019.Google Scholar
- Y. Zhang et al. ElasticBF: Fine-grained and elastic bloom filter towards efficient read for LSM-tree-based KV stores. In USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage), 2018.Google Scholar
Index Terms
- On performance stability in LSM-based storage systems
Recommendations
LSM-tree managed storage for large-scale key-value store
SoCC '17: Proceedings of the 2017 Symposium on Cloud ComputingKey-value stores are increasingly adopting LSM-trees as their enabling data structure in the backend storage, and persisting their clustered data through a file system. A file system is expected to not only provide file/directory abstraction to organize ...
LSM-Tree Managed Storage for Large-Scale Key-Value Store
Key-value stores are increasingly adopting LSM-trees as their enabling data structure in the backend block storage, and persisting their clustered data through a block manager, usually a file system. In general, a file system is expected to not only ...
Read-Performance Optimization for Deduplication-Based Storage Systems in the Cloud
Data deduplication has been demonstrated to be an effective technique in reducing the total data transferred over the network and the storage space in cloud backup, archiving, and primary storage systems, such as VM (virtual machine) platforms. However, ...
Comments