skip to main content
10.1145/3318464.3389731acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Rosetta: A Robust Space-Time Optimized Range Filter for Key-Value Stores

Published:31 May 2020Publication History

ABSTRACT

We introduce Rosetta, a probabilistic range filter designed specifically for LSM-tree based key-value stores. The core intuition is that we can sacrifice filter probe time because it is not visible in end-to-end key-value store performance, which in turn allows us to significantly reduce the filter false positive rate for every level of the tree. Rosetta indexes all binary prefixes of a key using a hierarchically arranged set of Bloom filters. It then converts each range query into multiple probes, one for each non-overlapping binary prefix. Rosetta has the ability to track workload patterns and adopt a beneficial tuning for each individual LSM-tree run by adjusting the number of Bloom filters it uses and how memory is spread among them to optimize the FPR/CPU cost balance. We show how to integrate Rosetta in a full system, RocksDB, and we demonstrate that it brings as much as a 40x improvement compared to default RocksDB and 2-5x improvement compared to state-of-the-art range filters in a variety of workloads and across different levels of the memory hierarchy (memory, SSD, hard disk). We also show that, unlike state-of-the-art filters, Rosetta brings a net benefit in RocksDB's overall performance, i.e., it improves range queries without losing any performance for point queries.

Skip Supplemental Material Section

Supplemental Material

3318464.3389731.mp4

mp4

95.6 MB

References

  1. Abadi, D. J., Boncz, P. A., Harizopoulos, S., Idreos, S., and Madden, S. The Design and Implementation of Modern Column-Oriented Database Systems. Foundations and Trends in Databases 5, 3 (2013), 197--280.Google ScholarGoogle Scholar
  2. Alsubaiee, S., Altowim, Y., Altwaijry, H., Behm, A., Borkar, V. R., Bu, Y., Carey, M. J., Cetindil, I., Cheelangi, M., Faraaz, K., Gabrielova, E., Grover, R., Heilbron, Z., Kim, Y.-S., Li, C., Li, G., Ok, J. M., Onose, N., Pirzadeh, P., Tsotras, V. J., Vernica, R., Wen, J., and Westmann, T. AsterixDB: A Scalable, Open Source BDMS. Proceedings of the VLDB Endowment 7, 14 (2014), 1905--1916.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Alsubaiee, S., Carey, M. J., and Li, C. Lsm-based storage and indexing: An old idea with timely benefits. In Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data (2015), pp. 1--6.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Apache. Accumulo. https://accumulo.apache.org/.Google ScholarGoogle Scholar
  5. Apache. HBase. http://hbase.apache.org/.Google ScholarGoogle Scholar
  6. Armstrong, T. G., Ponnekanti, V., Borthakur, D., and Callaghan, M. Linkbench: a database benchmark based on the facebook social graph. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (2013), pp. 1185--1196.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Athanassoulis, M., Kester, M. S., Maas, L. M., Stoica, R., Idreos, S., Ailamaki, A., and Callaghan, M. Designing Access Methods: The RUM Conjecture. In Proceedings of the International Conference on Extending Database Technology (EDBT) (2016), pp. 461--466.Google ScholarGoogle Scholar
  8. Balmau, O., Didona, D., Guerraoui, R., Zwaenepoel, W., Yuan, H., Arora, A., Gupta, K., and Konka, P. TRIAD: Creating Synergies Between Memory, Disk and Log in Log Structured Key-Value Stores. In Proceedings of the USENIX Annual Technical Conference (ATC) (2017), pp. 363--375.Google ScholarGoogle Scholar
  9. Bender, M. A., Farach-Colton, M., Johnson, R., Kraner, R., Kuszmaul, B. C., Medjedovic, D., Montes, P., Shetty, P., Spillane, R. P., and Zadok, E. Don't Thrash: How to Cache Your Hash on Flash. Proceedings of the VLDB Endowment 5, 11 (2012), 1627--1637.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Bloom, B. H. Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM 13, 7 (1970), 422--426.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Bortnikov, E., Braginsky, A., Hillel, E., Keidar, I., and Sheffi, G. Accordion: Better Memory Organization for LSM Key-Value Stores. Proceedings of the VLDB Endowment 11, 12 (2018), 1863--1875.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Bu, Y., Borkar, V. R., Jia, J., Carey, M. J., and Condie, T. Pregelix: Big(ger) Graph Analytics on a Dataflow Engine. Proceedings of the VLDB Endowment 8, 2 (2014), 161--172.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Callaghan, M. CPU overheads for RocksDB queries. http://smalldatum.blogspot.com/2018/07/query-cpu-overheads-in-rocksdb.html, July 2018.Google ScholarGoogle Scholar
  14. Cao, Z., Chen, S., Li, F., Wang, M., and Wang, X. S. LogKV: Exploiting Key-Value Stores for Log Processing. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR) (2013).Google ScholarGoogle Scholar
  15. Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: A distributed storage system for structured data. In 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2006), pp. 205--218.Google ScholarGoogle Scholar
  16. Chen, G. J., Wiener, J. L., Iyer, S., Jaiswal, A., Lei, R., Simha, N., Wang, W., Wilfong, K., Williamson, T., and Yilmaz, S. Realtime data processing at facebook. In Proceedings of the 2016 International Conference on Management of Data (2016), pp. 1087--1098.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. CockroachLabs. CockroachDB. https://github.com/cockroachdb/cockroach.Google ScholarGoogle Scholar
  18. Cooper, B. F., Silberstein, A., Tam, E., Ramakrishnan, R., and Sears, R. Benchmarking cloud serving systems with YCSB. In Proceedings of the ACM Symposium on Cloud Computing (SoCC) (2010), pp. 143--154.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Cormode, G. Sketch techniques for approximate query processing. In Foundations and Trends in Databases (2011).Google ScholarGoogle Scholar
  20. Cormode, G., Garofalakis, M. N., Haas, P. J., and Jermaine, C. Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches. Foundations and Trends in Databases 4, 1--3 (2012), 1--294.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Cormode, G., Kulkarni, T., and Srivastava, D. Answering Range Queries Under Local Differential Privacy. In arXiv:1812.10942 (2018).Google ScholarGoogle Scholar
  22. Cormode, G., and Muthukrishnan, S. What's Hot and What's Not: Tracking Most Frequent Items Dynamically. In PODS (2003).Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Cormode, G., and Muthukrishnan, S. An Improved Data Stream Summary: The Count-Min Sketch and Its Applications. In LATIN 2004: Theoretical Informatics, 6th Latin American Symposium, Buenos Aires, Argentina, April 5--8, 2004, Proceedings (2004), vol. 2976, pp. 29--38.Google ScholarGoogle Scholar
  24. Dayan, N., Athanassoulis, M., and Idreos, S. Monkey: Optimal Navigable Key-Value Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data (2017), pp. 79--94.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Dayan, N., Athanassoulis, M., and Idreos, S. Optimal Bloom Filters and Adaptive Merging for LSM-Trees. ACM Transactions on Database Systems (TODS) 43, 4 (2018), 16:1--16:48.Google ScholarGoogle Scholar
  26. Dayan, N., Bonnet, P., and Idreos, S. GeckoFTL: Scalable Flash Translation Techniques For Very Large Flash Devices. In Proceedings of the ACM SIGMOD International Conference on Management of Data (2016), pp. 327--342.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Dayan, N., and Idreos, S. Dostoevsky: Better space-time trade-offs for lsm-tree based key-value stores via adaptive removal of superfluous merging. In Proceedings of the 2018 International Conference on Management of Data (2018), pp. 505--520.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Dayan, N., and Idreos, S. The log-structured merge-bush & the wacky continuum. In Proceedings of the 2019 International Conference on Management of Data (2019), pp. 449--466.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. de Berg, M., van Kreveld, M., Overmars, M., and Schwarzkopf, O. C. More geometric data structures. In Computational Geometry. 2000, pp. 211--233.Google ScholarGoogle ScholarCross RefCross Ref
  30. DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. Dynamo: Amazon's Highly Available Key-value Store. ACM SIGOPS Operating Systems Review 41, 6 (2007), 205--220.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Demertzis, I., Papadopoulos, S., Papapetrou, O., Deligiannakis, A., and Garofalakis, M. Practical Private Range Search Revisited. In ACM SIGMOD (2016).Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Dgraph. Badger Key-value DB in Go. https://github.com/dgraph-io/badger.Google ScholarGoogle Scholar
  33. Dharmapurikar, S., Krishnamurthy, P., and Taylor, D. E. Longest prefix matching using bloom filters. In Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (2003), pp. 201--212.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Dong, S., Callaghan, M., Galanis, L., Borthakur, D., Savor, T., and Strum, M. Optimizing Space Amplification in RocksDB. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR) (2017).Google ScholarGoogle Scholar
  35. Facebook. MyRocks. http://myrocks.io/.Google ScholarGoogle Scholar
  36. Facebook. RocksDB. https://github.com/facebook/rocksdb.Google ScholarGoogle Scholar
  37. Fan, B., Andersen, D. G., Kaminsky, M., and Mitzenmacher, M. Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the ACM International on Conference on emerging Networking Experiments and Technologies (CoNEXT) (2014), pp. 75--88.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Gembalczyk, D., Schuhknecht, F. M., and Dittrich, J. An Experimental Analysis of Different Key-Value Stores and Relational Databases. In Datenbanksysteme fur Business, Technologie und Web (BTW'17) (2017).Google ScholarGoogle Scholar
  39. Gilbert, A., Guha, S., Indyk, P., Kotidis, Y., Muthukrishnan, S., and Strauss, M. Fast, small-space algorithms for approximate histogram maintenance. In Proceedings of the $34^th$ ACM Symposium on Theory of Computing (2002).Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Gilbert, A., Kotidis, Y., Muthukrishnan, S., and Strauss, M. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In Proceedings of International Conference on Very Large Data Bases (2003).Google ScholarGoogle Scholar
  41. Gilbert, A. C., Kotidis, Y., S.Muthukrishnan, and M.Strauss. How to summarize the universe: Dynamic maintenance of quantiles. In Proceedings of International Conference on Very Large Data Bases (2002).Google ScholarGoogle ScholarCross RefCross Ref
  42. Golan-Gueta, G., Bortnikov, E., Hillel, E., and Keidar, I. Scaling Concurrent Log-Structured Data Stores. In Proceedings of the ACM European Conference on Computer Systems (EuroSys) (2015), pp. 32:1--32:14.Google ScholarGoogle Scholar
  43. Google. LevelDB. https://github.com/google/leveldb/.Google ScholarGoogle Scholar
  44. Goswami, M., Grønlund, A., Larsen, K. G., and Pagh, R. Approximate range emptiness in constant time and optimal space. In Proceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms (2014), pp. 769--775.Google ScholarGoogle Scholar
  45. Idreos, S., Dayan, N., Qin, W., Akmanalp, M., Hilgard, S., Ross, A., Lennon, J., Jain, V., Gupta, H., Li, D., and Zhu, Z. Design continuums and the path toward self-designing key-value stores that know and learn. In Biennial Conference on Innovative Data Systems Research (CIDR) (2019).Google ScholarGoogle Scholar
  46. Jagadish, H. V., Narayan, P. P. S., Seshadri, S., Sudarshan, S., and Kanneganti, R. Incremental Organization for Data Recording and Warehousing. In Proceedings of the International Conference on Very Large Data Bases (VLDB) (1997), pp. 16--25.Google ScholarGoogle Scholar
  47. Jannen, W., Yuan, J., Zhan, Y., Akshintala, A., Esmet, J., Jiao, Y., Mittal, A., Pandey, P., Reddy, P., Walsh, L., Bender, M. A., Farach-Colton, M., Johnson, R., Kuszmaul, B. C., and Porter, D. E. BetrFS: A Right-optimized Write-optimized File System. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST) (2015), pp. 301--315.Google ScholarGoogle Scholar
  48. Jermaine, C., Omiecinski, E., and Yee, W. G. The Partitioned Exponential File for Database Storage Management. The VLDB Journal 16, 4 (2007), 417--437.Google ScholarGoogle Scholar
  49. Kahveci, T., and Singh, A. Variable length queries for time series data. In Proceedings 17th International Conference on Data Engineering (2001), pp. 273--282.Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Kondylakis, H., Dayan, N., Zoumpatianos, K., and Palpanas, T. Coconut palm: Static and streaming data series exploration now in your palm. In Proceedings of the 2019 International Conference on Management of Data (2019), pp. 1941--1944.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Kondylakis, H., Dayan, N., Zoumpatianos, K., and Palpanas, T. Coconut: sortable summarizations for scalable indexes over static and streaming data series. The VLDB Journal (09 2019).Google ScholarGoogle Scholar
  52. Kyrola, A., and Guestrin, C. Graphchi-db: Simple design for a scalable graph database system--on just a pc. arXiv preprint arXiv:1403.0701 (2014).Google ScholarGoogle Scholar
  53. Lakshman, A., and Malik, P. Cassandra - A Decentralized Structured Storage System. ACM SIGOPS Operating Systems Review 44, 2 (2010), 35--40.Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Li, Y., He, B., Yang, J., Luo, Q., Yi, K., and Yang, R. J. Tree Indexing on Solid State Drives. Proceedings of the VLDB Endowment 3, 1--2 (2010), 1195--1206.Google ScholarGoogle Scholar
  55. LinkedIn. Voldemort. http://www.project-voldemort.com.Google ScholarGoogle Scholar
  56. Lu, L., Pillai, T. S., Arpaci-Dusseau, A. C., and Arpaci-Dusseau, R. H. WiscKey: Separating Keys from Values in SSD-conscious Storage. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST) (2016), pp. 133--148.Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Luo, C., and Carey, M. J. LSM-based Storage Techniques: A Survey. arXiv:1812.07527v3 (2019).Google ScholarGoogle Scholar
  58. O'Neil, P. E., Cheng, E., Gawlick, D., and O'Neil, E. J. The log-structured merge-tree (LSM-tree). Acta Informatica 33, 4 (1996), 351--385.Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Peng, Y., Guo, J., Li, F., Qian, W., and Zhou, A. Persistent bloom filter: Membership testing for the entire history. In Proceedings of the 2018 International Conference on Management of Data (2018), pp. 1037--1052.Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Pirza, P., Tatemura, J., Po, O., and Hacigumus, H. Performance Evaluation of Range Queries in Key Value Stores. J Grid Computing (2012).Google ScholarGoogle Scholar
  61. Raju, P., Kadekodi, R., Chidambaram, V., and Abraham, I. PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP) (2017), pp. 497--514.Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Ren, K., Zheng, Q., Arulraj, J., and Gibson, G. SlimDB: A Space-Efficient Key-Value Storage Engine For Semi-Sorted Data. Proceedings of the VLDB Endowment 10, 13 (2017), 2037--2048.Google ScholarGoogle Scholar
  63. Sears, R., Callaghan, M., and Brewer, E. Rose: Compressed, log-structured replication. 526--537.Google ScholarGoogle Scholar
  64. Sears, R., and Ramakrishnan, R. bLSM: A General Purpose Log Structured Merge Tree. In Proceedings of the ACM SIGMOD International Conference on Management of Data (2012), pp. 217--228.Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Shetty, P., Spillane, R. P., Malpani, R., Andrews, B., Seyster, J., and Zadok, E. Building Workload-Independent Storage with VT-trees. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST) (2013), pp. 17--30.Google ScholarGoogle Scholar
  66. Stanley, R. P. Catalan numbers. Cambridge University Press, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  67. Thonangi, R., and Yang, J. On Log-Structured Merge for Solid-State Drives. In Proceedings of the IEEE International Conference on Data Engineering (ICDE) (2017), pp. 683--694.Google ScholarGoogle Scholar
  68. Van Kreveld, M., Schwarzkopf, O., de Berg, M., and Overmars, M. Computational geometry algorithms and applications. Springer, 2000.Google ScholarGoogle Scholar
  69. Vincon, T., Hardock, S., Riegger, C., Oppermann, J., Koch, A., and Petrov, I. Noftl-kv: Tackling write-amplification on kv-stores with native storage management.Google ScholarGoogle Scholar
  70. Wei, Z., Luo, G., Yi, K., Du, X., and Wen, J. R. Persistent Data Sketching. In ACM SIGMOD (2015).Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. WiredTiger. Source Code. https://github.com/wiredtiger/wiredtiger.Google ScholarGoogle Scholar
  72. Wu, X., Xu, Y., Shao, Z., and Jiang, S. LSM-trie: An LSM-tree-based Ultra-Large Key-Value Store for Small Data Items. In Proceedings of the USENIX Annual Technical Conference (ATC) (2015), pp. 71--82.Google ScholarGoogle Scholar
  73. Yahoo. bLSM: Read-and latency-optimized log structured merge tree. https://github.com/sears/bLSM (2016).Google ScholarGoogle Scholar
  74. Zhang, Y., Li, Y., Guo, F., Li, C., and Xu, Y. ElasticBF: Fine-grained and Elastic Bloom Filter Towards Efficient Read for LSM-tree-based KV Stores. In Proceedings of the USENIX Conference on Hot Topics in Storage and File Systems (HotStorage) (2018).Google ScholarGoogle Scholar

Index Terms

  1. Rosetta: A Robust Space-Time Optimized Range Filter for Key-Value Stores

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
        June 2020
        2925 pages
        ISBN:9781450367356
        DOI:10.1145/3318464

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 31 May 2020

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader