ABSTRACT
Data-intensive applications fueled the evolution of log structured merge (LSM) based key-value engines that employ the out-of-place paradigm to support high ingestion rates with low read/write interference. These benefits, however, come at the cost of treating deletes as a second-class citizen. A delete inserts a tombstone that invalidates older instances of the deleted key. State-of-the-art LSM engines do not provide guarantees as to how fast a tombstone will propagate to persist the deletion. Further, LSM engines only support deletion on the sort key. To delete on another attribute (e.g., timestamp), the entire tree is read and re-written. We highlight that fast persistent deletion without affecting read performance is key to support: (i) streaming systems operating on a window of data, (ii) privacy with latency guarantees on the right-to-be-forgotten, and (iii) en masse cloud deployment of data systems that makes storage a precious resource. To address these challenges, in this paper, we build a new key-value storage engine, Lethe, that uses a very small amount of additional metadata, a set of new delete-aware compaction policies, and a new physical data layout that weaves the sort and the delete key order. We show that Lethe supports any user-defined threshold for the delete persistence latency offering higher read throughput (1.17-1.4x) and lower space amplification (2.1-9.8x), with a modest increase in write amplification (between 4% and 25%). In addition, Lethe supports efficient range deletes on a secondary delete key by dropping entire data pages without sacrificing read performance nor employing a costly full tree merge.
Supplemental Material
- California Consumer Privacy Act of 2018. Assembly Bill No. 375, Chapter 55, 2018.Google Scholar
- T. Akidau, R. Bradshaw, C. Chambers, S. Chernyak, R. Ferná ndez-Moctezuma, R. Lax, S. McVeety, D. Mills, F. Perry, E. Schmidt, and S. Whittle. The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. Proceedings of the VLDB Endowment, 8(12):1792--1803, 2015.Google ScholarDigital Library
- I. Alagiannis, S. Idreos, and A. Ailamaki. H2O: A Hands-free Adaptive Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 1103--1114, 2014.Google ScholarDigital Library
- Apache. Accumulo. https://accumulo.apache.org/.Google Scholar
- Apache. Cassandra. http://cassandra.apache.org.Google Scholar
- Apache. HBase. http://hbase.apache.org/.Google Scholar
- R. Appuswamy, M. Karpathiotakis, D. Porobic, and A. Ailamaki. The Case For Heterogeneous HTAP. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR), 2017.Google Scholar
- C. Aravindan and P. Baumgartner. A Rational and Efficient Algorithm for View Deletion in Databases. In Proceedings of the International Symposium on Logic Programming (ILPS), pages 165--179, 1997.Google Scholar
- J. Arulraj, A. Pavlo, and P. Menon. Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 583--598, 2016.Google ScholarDigital Library
- M. Athanassoulis and A. Ailamaki. BF-Tree: Approximate Tree Indexing. Proceedings of the VLDB Endowment, 7(14):1881--1892, 2014.Google ScholarDigital Library
- M. Athanassoulis, S. Chen, A. Ailamaki, P. B. Gibbons, and R. Stoica. MaSM: Efficient Online Updates in Data Warehouses. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 865--876, 2011.Google ScholarDigital Library
- M. Athanassoulis, S. Chen, A. Ailamaki, P. B. Gibbons, and R. Stoica. Online Updates on Data Warehouses via Judicious Use of Solid-State Storage. ACM Transactions on Database Systems (TODS), 40(1), 2015.Google ScholarDigital Library
- M. Athanassoulis, M. S. Kester, L. M. Maas, R. Stoica, S. Idreos, A. Ailamaki, and M. Callaghan. Designing Access Methods: The RUM Conjecture. In Proceedings of the International Conference on Extending Database Technology (EDBT), pages 461--466, 2016.Google Scholar
- B. Bhattacharjee, T. Malkemus, S. Lau, S. Mckeough, J.-A. Kirton, R. V. Boeschoten, and J. Kennedy. Efficient Bulk Deletes for Multi Dimensionally Clustered Tables in DB2. In Proceedings of the International Conference on Very Large Data Bases (VLDB), pages 1197--1206, 2007.Google Scholar
- M. Callaghan. Deletes are fast and slow in an LSM. http://smalldatum.blogspot.com/2020/01/deletes-are-fast-and-slow-in-lsm.html, 2020.Google Scholar
- Z. Cao, S. Dong, S. Vemuri, and D. H. C. Du. Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook. In 18th USENIX Conference on File and Storage Technologies (FAST 20), pages 209--223, 2020.Google Scholar
- F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A Distributed Storage System for Structured Data. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 205--218, 2006.Google ScholarDigital Library
- Cisco. Cisco Global Cloud Index: Forecast and Methodology, 2016--2021. White Paper, 2018.Google Scholar
- J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. C. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. Spanner: Google's Globally-Distributed Database. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 251--264, 2012.Google Scholar
- B. Dageville, T. Cruanes, M. Zukowski, V. Antonov, A. Avanes, J. Bock, J. Claybaugh, D. Engovatov, M. Hentschel, J. Huang, A. W. Lee, A. Motivala, A. Q. Munir, S. Pelley, P. Povinec, G. Rahn, S. Triantafyllis, and P. Unterbrunner. The Snowflake Elastic Data Warehouse. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 215--226, 2016.Google ScholarDigital Library
- N. Dayan, M. Athanassoulis, and S. Idreos. Monkey: Optimal Navigable Key-Value Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 79--94, 2017.Google ScholarDigital Library
- N. Dayan, M. Athanassoulis, and S. Idreos. Optimal Bloom Filters and Adaptive Merging for LSM-Trees. ACM Transactions on Database Systems (TODS), 43(4):16:1--16:48, 2018.Google ScholarDigital Library
- N. Dayan and S. Idreos. Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 505--520, 2018.Google ScholarDigital Library
- N. Dayan and S. Idreos. The Log-Structured Merge-Bush & the Wacky Continuum. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 449--466, 2019.Google ScholarDigital Library
- G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, 41(6):205--220, 2007.Google ScholarDigital Library
- A. Deshpande and A. Machanavajjhala. ACM SIGMOD Blog: Privacy Challenges in the Post-GDPR World: A Data Management Perspective. http://wp.sigmod.org/?p=2554, 2018.Google Scholar
- S. Dong, M. Callaghan, L. Galanis, D. Borthakur, T. Savor, and M. Strum. Optimizing Space Amplification in RocksDB. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR), 2017.Google Scholar
- Facebook. MyRocks. http://myrocks.io/.Google Scholar
- Facebook. RocksDB. https://github.com/facebook/rocksdb.Google Scholar
- Gartner. Gartner Says 8.4 Billion Connected "Things" Will Be in Use in 2017, Up 31 Percent From 2016. https://tinyurl.com/Gartner2020, 2017.Google Scholar
- A. G"a rtner, A. Kemper, D. Kossmann, and B. Zeller. Efficient Bulk Deletes in Relational Databases. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), pages 183--192, 2001.Google Scholar
- R. Geambasu, T. Kohno, A. A. Levy, and H. M. Levy. Vanish: Increasing Data Privacy with Self-Destructing Data. In Proceedings of the USENIX Security Symposium, pages 299--316, 2009.Google Scholar
- R. Geambasu, A. A. Levy, T. Kohno, A. Krishnamurthy, and H. M. Levy. Comet: An active distributed key-value store. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 323--336, 2010.Google Scholar
- M. Goddard. The EU General Data Protection Regulation (GDPR): European Regulation that has a Global Impact. International Journal of Market Research, 59(6):703--705, 2017.Google ScholarCross Ref
- G. Golan-Gueta, E. Bortnikov, E. Hillel, and I. Keidar. Scaling Concurrent Log-Structured Data Stores. In Proceedings of the ACM European Conference on Computer Systems (EuroSys), pages 32:1--32:14, 2015.Google ScholarDigital Library
- Google. LevelDB. https://github.com/google/leveldb/.Google Scholar
- T. Heinis and A. Ailamaki. Reconsolidating Data Structures. In Proceedings of the International Conference on Extending Database Technology (EDBT), pages 665--670, 2015.Google Scholar
- S. Hé man, M. Zukowski, and N. J. Nes. Positional Update Handling in Column Stores. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 543--554, 2010.Google Scholar
- G. Huang, X. Cheng, J. Wang, Y. Wang, D. He, T. Zhang, F. Li, S. Wang, W. Cao, and Q. Li. X-Engine: An Optimized Storage Engine for Large-scale E-commerce Transaction Processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 651--665, 2019.Google ScholarDigital Library
- Y. Huang, T. Z. J. Fu, D.-M. Chiu, J. C. S. Lui, and C. Huang. Challenges, design and analysis of a large-scale p2p-vod system. In Proceedings of the ACM SIGCOMM 2008 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, Seattle, WA, USA, August 17--22, 2008, pages 375--388, 2008.Google ScholarDigital Library
- F. Hueske. State TTL for Apache Flink: How to Limit the Lifetime of State. Ververica, 2018.Google Scholar
- M. L. Kersten. Big Data Space Fungus. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR), Gong show talk, 2015.Google Scholar
- S. Kulkarni, N. Bhagat, M. Fu, V. Kedigehalli, C. Kellogg, S. Mittal, J. M. Patel, K. Ramasamy, and S. Taneja. Twitter Heron: Stream Processing at Scale. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 239--250, 2015.Google ScholarDigital Library
- A. Lamb, M. Fuller, and R. Varadarajan. The Vertica Analytic Database: C-Store 7 Years Later. Proceedings of the VLDB Endowment, 5(12):1790--1801, 2012.Google Scholar
- J.-T. Lee and G. G. Belford. An Efficient Object-based Algorithm for Spatial Searching, Insertion and Deletion. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), pages 40--47, 1992.Google Scholar
- Y. Li, B. He, J. Yang, Q. Luo, K. Yi, and R. J. Yang. Tree Indexing on Solid State Drives. Proceedings of the VLDB Endowment, 3(1--2):1195--1206, 2010.Google ScholarDigital Library
- T. Lilja, R. Saikkonen, S. Sippu, and E. Soisalon-Soininen. Online Bulk Deletion. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), pages 956--965, 2007.Google ScholarCross Ref
- LinkedIn. Voldemort. http://www.project-voldemort.com.Google Scholar
- H. Lu. Peer-to-Peer Support for Massively Multiplayer Games. In Proceedings IEEE INFOCOM 2004, The 23rd Annual Joint Conference of the IEEE Computer and Communications Societies, Hong Kong, China, March 7--11, 2004, 2004.Google Scholar
- C. Luo and M. J. Carey. LSM-based Storage Techniques: A Survey. The VLDB Journal, 2019.Google Scholar
- C. Mohan. Hybrid Transaction and Analytics Processing (HTAP): State of the Art. In Proceedings of the International Workshop on Business Intelligence for the Real-Time Enterprise (BIRTE), 2016.Google Scholar
- MongoDB. Online reference. http://www.mongodb.com/.Google Scholar
- P. E. O'Neil, E. Cheng, D. Gawlick, and E. J. O'Neil. The log-structured merge-tree (LSM-tree). Acta Informatica, 33(4):351--385, 1996.Google ScholarDigital Library
- F. Ö zcan, Y. Tian, and P. Tö zü n. Hybrid Transactional/Analytical Processing: A Survey. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 1771--1775, 2017.Google Scholar
- A. Pantelopoulos and N. G. Bourbakis. Prognosis: a wearable health-monitoring system for people at risk: methodology and modeling. IEEE Trans. Information Technology in Biomedicine, 14(3):613--621, 2010.Google ScholarDigital Library
- S. Papadopoulos, K. Datta, S. Madden, and T. Mattson. The TileDB Array Data Storage Manager. Proceedings of the VLDB Endowment, 10(4):349--360, 2016.Google ScholarDigital Library
- P. Raju, R. Kadekodi, V. Chidambaram, and I. Abraham. PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pages 497--514, 2017.Google ScholarDigital Library
- RocksDB. DeleteRange: A New Native RocksDB Operation. https://rocksdb.org/blog/2018/11/21/delete-range.html, 2018.Google Scholar
- T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, WWW 2010, Raleigh, North Carolina, USA, April 26--30, 2010, pages 851--860, 2010.Google ScholarDigital Library
- S. Sarkar, J.-P. Banâ tre, L. Rilling, and C. Morin. Towards Enforcement of the EU GDPR: Enabling Data Erasure. In Proceedings of the IEEE International Conference of Internet of Things (iThings), pages 1--8, 2018.Google ScholarCross Ref
- R. Sears and R. Ramakrishnan. bLSM: A General Purpose Log Structured Merge Tree. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 217--228, 2012.Google ScholarDigital Library
- SQLite4. Online reference. https://sqlite.org/src4/.Google Scholar
- M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. R. Madden, E. J. O'Neil, P. E. O'Neil, A. Rasin, N. Tran, and S. Zdonik. C-Store: A Column-oriented DBMS. In Proceedings of the International Conference on Very Large Data Bases (VLDB), pages 553--564, 2005.Google ScholarDigital Library
- R. Thonangi and J. Yang. On Log-Structured Merge for Solid-State Drives. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), pages 683--694, 2017.Google ScholarCross Ref
- Q.-C. To, J. Soto, and V. Markl. A Survey of State Management in Big Data Processing Systems. CoRR, abs/1702.0, 2017.Google Scholar
- P. Van Sandt, Y. Chronis, and J. M. Patel. Efficiently Searching In-Memory Sorted Arrays: Revenge of the Interpolation Search? In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 36--53, 2019.Google ScholarDigital Library
- Z. Whittaker and N. Lomas. Even years later, Twitter doesn't delete your direct messages. https://techcrunch.com/2019/02/15/twitter-direct-messages/, 2019.Google Scholar
- WiredTiger. Source Code. https://github.com/wiredtiger/wiredtiger.Google Scholar
- H. Zhang, H. Lim, V. Leis, D. G. Andersen, M. Kaminsky, K. Keeton, and A. Pavlo. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 323--336, 2018.Google ScholarDigital Library
- M. Zukowski and P. A. Boncz. Vectorwise: Beyond Column Stores. IEEE Data Engineering Bulletin, 35(1):21--27, 2012.Google Scholar
Index Terms
- Lethe: A Tunable Delete-Aware LSM Engine
Recommendations
Enabling Timely and Persistent Deletion in LSM-Engines
Data-intensive applications have fueled the evolution of log-structured merge (LSM) based key-value engines that employ the out-of-place paradigm to support high ingestion rates with low read/write interference. These benefits, however, come at the cost ...
Breaking Down Memory Walls in LSM-based Storage Systems
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of DataThe log-structured merge-tree (LSM-tree) [16, 18] is widely used in modern NoSQL systems. Different from traditional update-in-place structures, an LSM-tree first buffers all writes in memory, which are subsequently flushed to disk when memory is full. ...
A Low-cost Disk Solution Enabling LSM-tree to Achieve High Performance for Mixed Read/Write Workloads
LSM-tree has been widely used in data management production systems for write-intensive workloads. However, as read and write workloads co-exist under LSM-tree, data accesses can experience long latency and low throughput due to the interferences to ...
Comments