ABSTRACT
We introduce Rosetta, a probabilistic range filter designed specifically for LSM-tree based key-value stores. The core intuition is that we can sacrifice filter probe time because it is not visible in end-to-end key-value store performance, which in turn allows us to significantly reduce the filter false positive rate for every level of the tree. Rosetta indexes all binary prefixes of a key using a hierarchically arranged set of Bloom filters. It then converts each range query into multiple probes, one for each non-overlapping binary prefix. Rosetta has the ability to track workload patterns and adopt a beneficial tuning for each individual LSM-tree run by adjusting the number of Bloom filters it uses and how memory is spread among them to optimize the FPR/CPU cost balance. We show how to integrate Rosetta in a full system, RocksDB, and we demonstrate that it brings as much as a 40x improvement compared to default RocksDB and 2-5x improvement compared to state-of-the-art range filters in a variety of workloads and across different levels of the memory hierarchy (memory, SSD, hard disk). We also show that, unlike state-of-the-art filters, Rosetta brings a net benefit in RocksDB's overall performance, i.e., it improves range queries without losing any performance for point queries.
Supplemental Material
- Abadi, D. J., Boncz, P. A., Harizopoulos, S., Idreos, S., and Madden, S. The Design and Implementation of Modern Column-Oriented Database Systems. Foundations and Trends in Databases 5, 3 (2013), 197--280.Google Scholar
- Alsubaiee, S., Altowim, Y., Altwaijry, H., Behm, A., Borkar, V. R., Bu, Y., Carey, M. J., Cetindil, I., Cheelangi, M., Faraaz, K., Gabrielova, E., Grover, R., Heilbron, Z., Kim, Y.-S., Li, C., Li, G., Ok, J. M., Onose, N., Pirzadeh, P., Tsotras, V. J., Vernica, R., Wen, J., and Westmann, T. AsterixDB: A Scalable, Open Source BDMS. Proceedings of the VLDB Endowment 7, 14 (2014), 1905--1916.Google ScholarDigital Library
- Alsubaiee, S., Carey, M. J., and Li, C. Lsm-based storage and indexing: An old idea with timely benefits. In Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data (2015), pp. 1--6.Google ScholarDigital Library
- Apache. Accumulo. https://accumulo.apache.org/.Google Scholar
- Apache. HBase. http://hbase.apache.org/.Google Scholar
- Armstrong, T. G., Ponnekanti, V., Borthakur, D., and Callaghan, M. Linkbench: a database benchmark based on the facebook social graph. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (2013), pp. 1185--1196.Google ScholarDigital Library
- Athanassoulis, M., Kester, M. S., Maas, L. M., Stoica, R., Idreos, S., Ailamaki, A., and Callaghan, M. Designing Access Methods: The RUM Conjecture. In Proceedings of the International Conference on Extending Database Technology (EDBT) (2016), pp. 461--466.Google Scholar
- Balmau, O., Didona, D., Guerraoui, R., Zwaenepoel, W., Yuan, H., Arora, A., Gupta, K., and Konka, P. TRIAD: Creating Synergies Between Memory, Disk and Log in Log Structured Key-Value Stores. In Proceedings of the USENIX Annual Technical Conference (ATC) (2017), pp. 363--375.Google Scholar
- Bender, M. A., Farach-Colton, M., Johnson, R., Kraner, R., Kuszmaul, B. C., Medjedovic, D., Montes, P., Shetty, P., Spillane, R. P., and Zadok, E. Don't Thrash: How to Cache Your Hash on Flash. Proceedings of the VLDB Endowment 5, 11 (2012), 1627--1637.Google ScholarDigital Library
- Bloom, B. H. Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM 13, 7 (1970), 422--426.Google ScholarDigital Library
- Bortnikov, E., Braginsky, A., Hillel, E., Keidar, I., and Sheffi, G. Accordion: Better Memory Organization for LSM Key-Value Stores. Proceedings of the VLDB Endowment 11, 12 (2018), 1863--1875.Google ScholarDigital Library
- Bu, Y., Borkar, V. R., Jia, J., Carey, M. J., and Condie, T. Pregelix: Big(ger) Graph Analytics on a Dataflow Engine. Proceedings of the VLDB Endowment 8, 2 (2014), 161--172.Google ScholarDigital Library
- Callaghan, M. CPU overheads for RocksDB queries. http://smalldatum.blogspot.com/2018/07/query-cpu-overheads-in-rocksdb.html, July 2018.Google Scholar
- Cao, Z., Chen, S., Li, F., Wang, M., and Wang, X. S. LogKV: Exploiting Key-Value Stores for Log Processing. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR) (2013).Google Scholar
- Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: A distributed storage system for structured data. In 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2006), pp. 205--218.Google Scholar
- Chen, G. J., Wiener, J. L., Iyer, S., Jaiswal, A., Lei, R., Simha, N., Wang, W., Wilfong, K., Williamson, T., and Yilmaz, S. Realtime data processing at facebook. In Proceedings of the 2016 International Conference on Management of Data (2016), pp. 1087--1098.Google ScholarDigital Library
- CockroachLabs. CockroachDB. https://github.com/cockroachdb/cockroach.Google Scholar
- Cooper, B. F., Silberstein, A., Tam, E., Ramakrishnan, R., and Sears, R. Benchmarking cloud serving systems with YCSB. In Proceedings of the ACM Symposium on Cloud Computing (SoCC) (2010), pp. 143--154.Google ScholarDigital Library
- Cormode, G. Sketch techniques for approximate query processing. In Foundations and Trends in Databases (2011).Google Scholar
- Cormode, G., Garofalakis, M. N., Haas, P. J., and Jermaine, C. Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches. Foundations and Trends in Databases 4, 1--3 (2012), 1--294.Google ScholarDigital Library
- Cormode, G., Kulkarni, T., and Srivastava, D. Answering Range Queries Under Local Differential Privacy. In arXiv:1812.10942 (2018).Google Scholar
- Cormode, G., and Muthukrishnan, S. What's Hot and What's Not: Tracking Most Frequent Items Dynamically. In PODS (2003).Google ScholarDigital Library
- Cormode, G., and Muthukrishnan, S. An Improved Data Stream Summary: The Count-Min Sketch and Its Applications. In LATIN 2004: Theoretical Informatics, 6th Latin American Symposium, Buenos Aires, Argentina, April 5--8, 2004, Proceedings (2004), vol. 2976, pp. 29--38.Google Scholar
- Dayan, N., Athanassoulis, M., and Idreos, S. Monkey: Optimal Navigable Key-Value Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data (2017), pp. 79--94.Google ScholarDigital Library
- Dayan, N., Athanassoulis, M., and Idreos, S. Optimal Bloom Filters and Adaptive Merging for LSM-Trees. ACM Transactions on Database Systems (TODS) 43, 4 (2018), 16:1--16:48.Google Scholar
- Dayan, N., Bonnet, P., and Idreos, S. GeckoFTL: Scalable Flash Translation Techniques For Very Large Flash Devices. In Proceedings of the ACM SIGMOD International Conference on Management of Data (2016), pp. 327--342.Google ScholarDigital Library
- Dayan, N., and Idreos, S. Dostoevsky: Better space-time trade-offs for lsm-tree based key-value stores via adaptive removal of superfluous merging. In Proceedings of the 2018 International Conference on Management of Data (2018), pp. 505--520.Google ScholarDigital Library
- Dayan, N., and Idreos, S. The log-structured merge-bush & the wacky continuum. In Proceedings of the 2019 International Conference on Management of Data (2019), pp. 449--466.Google ScholarDigital Library
- de Berg, M., van Kreveld, M., Overmars, M., and Schwarzkopf, O. C. More geometric data structures. In Computational Geometry. 2000, pp. 211--233.Google ScholarCross Ref
- DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. Dynamo: Amazon's Highly Available Key-value Store. ACM SIGOPS Operating Systems Review 41, 6 (2007), 205--220.Google ScholarDigital Library
- Demertzis, I., Papadopoulos, S., Papapetrou, O., Deligiannakis, A., and Garofalakis, M. Practical Private Range Search Revisited. In ACM SIGMOD (2016).Google ScholarDigital Library
- Dgraph. Badger Key-value DB in Go. https://github.com/dgraph-io/badger.Google Scholar
- Dharmapurikar, S., Krishnamurthy, P., and Taylor, D. E. Longest prefix matching using bloom filters. In Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (2003), pp. 201--212.Google ScholarDigital Library
- Dong, S., Callaghan, M., Galanis, L., Borthakur, D., Savor, T., and Strum, M. Optimizing Space Amplification in RocksDB. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR) (2017).Google Scholar
- Facebook. MyRocks. http://myrocks.io/.Google Scholar
- Facebook. RocksDB. https://github.com/facebook/rocksdb.Google Scholar
- Fan, B., Andersen, D. G., Kaminsky, M., and Mitzenmacher, M. Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the ACM International on Conference on emerging Networking Experiments and Technologies (CoNEXT) (2014), pp. 75--88.Google ScholarDigital Library
- Gembalczyk, D., Schuhknecht, F. M., and Dittrich, J. An Experimental Analysis of Different Key-Value Stores and Relational Databases. In Datenbanksysteme fur Business, Technologie und Web (BTW'17) (2017).Google Scholar
- Gilbert, A., Guha, S., Indyk, P., Kotidis, Y., Muthukrishnan, S., and Strauss, M. Fast, small-space algorithms for approximate histogram maintenance. In Proceedings of the $34^th$ ACM Symposium on Theory of Computing (2002).Google ScholarDigital Library
- Gilbert, A., Kotidis, Y., Muthukrishnan, S., and Strauss, M. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In Proceedings of International Conference on Very Large Data Bases (2003).Google Scholar
- Gilbert, A. C., Kotidis, Y., S.Muthukrishnan, and M.Strauss. How to summarize the universe: Dynamic maintenance of quantiles. In Proceedings of International Conference on Very Large Data Bases (2002).Google ScholarCross Ref
- Golan-Gueta, G., Bortnikov, E., Hillel, E., and Keidar, I. Scaling Concurrent Log-Structured Data Stores. In Proceedings of the ACM European Conference on Computer Systems (EuroSys) (2015), pp. 32:1--32:14.Google Scholar
- Google. LevelDB. https://github.com/google/leveldb/.Google Scholar
- Goswami, M., Grønlund, A., Larsen, K. G., and Pagh, R. Approximate range emptiness in constant time and optimal space. In Proceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms (2014), pp. 769--775.Google Scholar
- Idreos, S., Dayan, N., Qin, W., Akmanalp, M., Hilgard, S., Ross, A., Lennon, J., Jain, V., Gupta, H., Li, D., and Zhu, Z. Design continuums and the path toward self-designing key-value stores that know and learn. In Biennial Conference on Innovative Data Systems Research (CIDR) (2019).Google Scholar
- Jagadish, H. V., Narayan, P. P. S., Seshadri, S., Sudarshan, S., and Kanneganti, R. Incremental Organization for Data Recording and Warehousing. In Proceedings of the International Conference on Very Large Data Bases (VLDB) (1997), pp. 16--25.Google Scholar
- Jannen, W., Yuan, J., Zhan, Y., Akshintala, A., Esmet, J., Jiao, Y., Mittal, A., Pandey, P., Reddy, P., Walsh, L., Bender, M. A., Farach-Colton, M., Johnson, R., Kuszmaul, B. C., and Porter, D. E. BetrFS: A Right-optimized Write-optimized File System. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST) (2015), pp. 301--315.Google Scholar
- Jermaine, C., Omiecinski, E., and Yee, W. G. The Partitioned Exponential File for Database Storage Management. The VLDB Journal 16, 4 (2007), 417--437.Google Scholar
- Kahveci, T., and Singh, A. Variable length queries for time series data. In Proceedings 17th International Conference on Data Engineering (2001), pp. 273--282.Google ScholarDigital Library
- Kondylakis, H., Dayan, N., Zoumpatianos, K., and Palpanas, T. Coconut palm: Static and streaming data series exploration now in your palm. In Proceedings of the 2019 International Conference on Management of Data (2019), pp. 1941--1944.Google ScholarDigital Library
- Kondylakis, H., Dayan, N., Zoumpatianos, K., and Palpanas, T. Coconut: sortable summarizations for scalable indexes over static and streaming data series. The VLDB Journal (09 2019).Google Scholar
- Kyrola, A., and Guestrin, C. Graphchi-db: Simple design for a scalable graph database system--on just a pc. arXiv preprint arXiv:1403.0701 (2014).Google Scholar
- Lakshman, A., and Malik, P. Cassandra - A Decentralized Structured Storage System. ACM SIGOPS Operating Systems Review 44, 2 (2010), 35--40.Google ScholarDigital Library
- Li, Y., He, B., Yang, J., Luo, Q., Yi, K., and Yang, R. J. Tree Indexing on Solid State Drives. Proceedings of the VLDB Endowment 3, 1--2 (2010), 1195--1206.Google Scholar
- LinkedIn. Voldemort. http://www.project-voldemort.com.Google Scholar
- Lu, L., Pillai, T. S., Arpaci-Dusseau, A. C., and Arpaci-Dusseau, R. H. WiscKey: Separating Keys from Values in SSD-conscious Storage. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST) (2016), pp. 133--148.Google ScholarDigital Library
- Luo, C., and Carey, M. J. LSM-based Storage Techniques: A Survey. arXiv:1812.07527v3 (2019).Google Scholar
- O'Neil, P. E., Cheng, E., Gawlick, D., and O'Neil, E. J. The log-structured merge-tree (LSM-tree). Acta Informatica 33, 4 (1996), 351--385.Google ScholarDigital Library
- Peng, Y., Guo, J., Li, F., Qian, W., and Zhou, A. Persistent bloom filter: Membership testing for the entire history. In Proceedings of the 2018 International Conference on Management of Data (2018), pp. 1037--1052.Google ScholarDigital Library
- Pirza, P., Tatemura, J., Po, O., and Hacigumus, H. Performance Evaluation of Range Queries in Key Value Stores. J Grid Computing (2012).Google Scholar
- Raju, P., Kadekodi, R., Chidambaram, V., and Abraham, I. PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP) (2017), pp. 497--514.Google ScholarDigital Library
- Ren, K., Zheng, Q., Arulraj, J., and Gibson, G. SlimDB: A Space-Efficient Key-Value Storage Engine For Semi-Sorted Data. Proceedings of the VLDB Endowment 10, 13 (2017), 2037--2048.Google Scholar
- Sears, R., Callaghan, M., and Brewer, E. Rose: Compressed, log-structured replication. 526--537.Google Scholar
- Sears, R., and Ramakrishnan, R. bLSM: A General Purpose Log Structured Merge Tree. In Proceedings of the ACM SIGMOD International Conference on Management of Data (2012), pp. 217--228.Google ScholarDigital Library
- Shetty, P., Spillane, R. P., Malpani, R., Andrews, B., Seyster, J., and Zadok, E. Building Workload-Independent Storage with VT-trees. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST) (2013), pp. 17--30.Google Scholar
- Stanley, R. P. Catalan numbers. Cambridge University Press, 2015.Google ScholarCross Ref
- Thonangi, R., and Yang, J. On Log-Structured Merge for Solid-State Drives. In Proceedings of the IEEE International Conference on Data Engineering (ICDE) (2017), pp. 683--694.Google Scholar
- Van Kreveld, M., Schwarzkopf, O., de Berg, M., and Overmars, M. Computational geometry algorithms and applications. Springer, 2000.Google Scholar
- Vincon, T., Hardock, S., Riegger, C., Oppermann, J., Koch, A., and Petrov, I. Noftl-kv: Tackling write-amplification on kv-stores with native storage management.Google Scholar
- Wei, Z., Luo, G., Yi, K., Du, X., and Wen, J. R. Persistent Data Sketching. In ACM SIGMOD (2015).Google ScholarDigital Library
- WiredTiger. Source Code. https://github.com/wiredtiger/wiredtiger.Google Scholar
- Wu, X., Xu, Y., Shao, Z., and Jiang, S. LSM-trie: An LSM-tree-based Ultra-Large Key-Value Store for Small Data Items. In Proceedings of the USENIX Annual Technical Conference (ATC) (2015), pp. 71--82.Google Scholar
- Yahoo. bLSM: Read-and latency-optimized log structured merge tree. https://github.com/sears/bLSM (2016).Google Scholar
- Zhang, Y., Li, Y., Guo, F., Li, C., and Xu, Y. ElasticBF: Fine-grained and Elastic Bloom Filter Towards Efficient Read for LSM-tree-based KV Stores. In Proceedings of the USENIX Conference on Hot Topics in Storage and File Systems (HotStorage) (2018).Google Scholar
Index Terms
- Rosetta: A Robust Space-Time Optimized Range Filter for Key-Value Stores
Recommendations
Building Efficient Key-Value Stores via a Lightweight Compaction Tree
Special Issue on MSST 2017 and Regular PapersLog-Structure Merge tree (LSM-tree) has been one of the mainstream indexes in key-value systems supporting a variety of write-intensive Internet applications in today’s data centers. However, the performance of LSM-tree is seriously hampered by ...
A Read-Optimized Index Structure for Distributed Log-Structured Key-Value Store
COMPSAC '15: Proceedings of the 2015 IEEE 39th Annual Computer Software and Applications Conference - Volume 03Recently, Big Data processing is becoming a necessary technique to efficiently store, manage, and analyze massive data obtained by social media contents. NoSQL is one of databases that efficiently handle Big Data compared to the traditional database ...
Towards Read-Intensive Key-Value Stores with Tidal Structure Based on LSM-Tree
ASPDAC '20: Proceedings of the 25th Asia and South Pacific Design Automation ConferenceKey-value store has played a critical role in many large-scale data storage applications. The log-structured merge-tree (LSM-tree) based key-value store achieves excellent performance on write-intensive workloads which is mainly benefited from the ...
Comments