skip to main content
research-article

Performance-optimal filtering: Bloom overtakes Cuckoo at high throughput

Published:01 January 2019Publication History
Skip Abstract Section

Abstract

We define the concept of performance-optimal filtering to indicate the Bloom or Cuckoo filter configuration that best accelerates a particular task. While the space-precision tradeoff of these filters has been well studied, we show how to pick a filter that maximizes the performance for a given workload. This choice might be "suboptimal" relative to traditional space-precision metrics, but it will lead to better performance in practice. In this paper, we focus on high-throughput filter use cases, aimed at avoiding CPU work, e.g., a cache miss, a network message, or a local disk I/O - events that can happen at rates of millions to hundreds per second. Besides the false-positive rate and memory footprint of the filter, performance optimality has to take into account the absolute cost of the filter lookup as well as the saved work per lookup that filtering avoids; while the actual rate of negative lookups in the workload determines whether using a filter improves overall performance at all. In the course of the paper, we introduce new filter variants, namely the register-blocked and cache-sectorized Bloom filters. We present new implementation techniques and perform an extensive evaluation on modern hardware platforms, including the wide-SIMD Skylake-X and Knights Landing. This experimentation shows that in high-throughput situations, the lower lookup cost of blocked Bloom filters allows them to overtake Cuckoo filters.

References

  1. Advanced Micro Devices, Inc. Software Optimization Guide for AMD Family 17h Processors (rev. 3.00). 2017.Google ScholarGoogle Scholar
  2. K. Alexiou, D. Kossmann, and P. Larson. Adaptive range filters for cold data: Avoiding trips to siberia. PVLDB, 6(14):1714--1725, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. S. Almeida, C. Baquero, N. M. Preguiça, and D. Hutchison. Scalable bloom filters. Inf. Process. Lett., 101(6):255--261, 2007. Google ScholarGoogle ScholarCross RefCross Ref
  4. M. A. Bender, M. Farach-Colton, R. Johnson, R. Kraner, B. C. Kuszmaul, D. Medjedovic, P. Montes, P. Shetty, R. P. Spillane, and E. Zadok. Don't thrash: How to cache your hash on flash. PVLDB, 5(11):1627--1637, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422--426, 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. A. Boncz, T. Neumann, and O. Erling. TPC-H analyzed: Hidden messages and lessons learned from an influential benchmark. In Performance Characterization and Benchmarking - 5th TPC Technology Conference, TPCTC 2013, Trento, Italy, August 26, 2013, Revised Selected Papers, pages 61--76, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese. An improved construction for counting bloom filters. In Algorithms - ESA 2006, 14th Annual European Symposium, Zurich, Switzerland, September 11--13, 2006, Proceedings, pages 684--695, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Breslow and N. Jayasena. Morton filters: Faster, space-efficient cuckoo filters via biasing, compression, and decoupled logical sparsity. PVLDB, 11(9):1041--1055, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Z. Broder and M. Mitzenmacher. Network applications of bloom filters: A survey. Internet Mathematics, 1(4):485--509, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  10. B. Chazelle, J. Kilian, R. Rubinfeld, and A. Tal. The bloomier filter: an efficient data structure for static support lookup tables. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, New Orleans, Louisiana, USA, January 11--14, 2004, pages 30--39, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Cohen and Y. Matias. Spectral bloom filters. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, June 9--12, 2003, pages 241--252, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. N. Dayan, M. Athanassoulis, and S. Idreos. Monkey: Optimal navigable key-value store. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14--19, 2017, pages 79--94, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. B. Donnet, B. Baynat, and T. Friedman. Retouched bloom filters: allowing networked applications to trade off selected false positives against false negatives. In Proceedings of the 2006 ACM Conference on Emerging Network Experiment and Technology, CoNEXT 2006, Lisboa, Portugal, December 4--7, 2006, page 13, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Einziger and R. Friedman. TinySet - An access efficient self adjusting bloom filter construction. IEEE/ACM Trans. Netw., 25(4):2295--2307, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. B. Fan, D. G. Andersen, M. Kaminsky, and M. D. Mitzenmacher. Cuckoo filter: Practically better than bloom. In Proceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies, CoNEXT '14, pages 75--88, New York, NY, USA, 2014. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Z. Feng, E. Lo, B. Kao, and W. Xu. ByteSlice: Pushing the envelop of main memory data processing with a new storage layout. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 -- June 4, 2015, pages 31--46, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. B. Hentschel, M. S. Kester, and S. Idreos. Column Sketches: A scan accelerator for rapid and robust predicate evaluation. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10--15, 2018, pages 857--872, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Intel Corporation. Intel® 64 and IA-32 Architectures Optimization Reference Manual. 2018.Google ScholarGoogle Scholar
  19. H. S. W. Jr. Hacker's Delight, Second Edition. Pearson Education, 2013.Google ScholarGoogle Scholar
  20. A. Kirsch and M. Mitzenmacher. Less hashing, same performance: Building a better bloom filter. Random Struct. Algorithms, 33(2):187--218, 2008. Google ScholarGoogle ScholarCross RefCross Ref
  21. M. Kornacker, A. Behm, V. Bittorf, T. Bobrovytsky, C. Ching, A. Choi, J. Erickson, M. Grund, D. Hecht, M. Jacobs, I. Joshi, L. Kuff, D. Kumar, A. Leblang, N. Li, I. Pandis, H. Robinson, D. Rorke, S. Rus, J. Russell, D. Tsirogiannis, S. Wanderman-Milne, and M. Yoder. Impala: A modern, open-source SQL engine for hadoop. In CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4--7, 2015, Online Proceedings, 2015.Google ScholarGoogle Scholar
  22. H. Lang, T. Mühlbauer, F. Funke, P. A. Boncz, T. Neumann, and A. Kemper. Data Blocks: Hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 -- July 01, 2016, pages 311--326, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. V. Leis, A. Gubichev, A. Mirchev, P. A. Boncz, A. Kemper, and T. Neumann. How good are query optimizers, really? PVLDB, 9(3):204--215, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Y. Li and J. M. Patel. BitWeaving: Fast scans for main memory data processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22--27, 2013, pages 289--300, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. M. Mitzenmacher and E. Upfal. Probability and computing - randomized algorithms and probabilistic analysis. Cambridge University Press, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. G. Moerkotte. Small Materialized Aggregates: A light weight index structure for data warehousing. In VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24--27, 1998, New York City, New York, USA, pages 476--487, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. R. Pagh and F. F. Rodler. Cuckoo hashing. J. Algorithms, 51(2):122--144, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. P. Pandey, M. A. Bender, R. Johnson, and R. Patro. A general-purpose counting filter: Making every bit count. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14--19, 2017, pages 775--787, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. O. Polychroniou and K. A. Ross. Vectorized bloom filters for advanced SIMD processors. In Tenth International Workshop on Data Management on New Hardware, DaMoN 2014, Snowbird, UT, USA, June 23, 2014, pages 6:1--6:6, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. O. Polychroniou and K. A. Ross. Efficient lightweight compression alongside fast scans. In Proceedings of the 11th International Workshop on Data Management on New Hardware, DaMoN 2015, Melbourne, VIC, Australia, May 31 -- June 04, 2015, pages 9:1--9:6, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. F. Putze, P. Sanders, and J. Singler. Cache-, hash-, and space-efficient bloom filters. J. Exp. Algorithmics, 14:4:4.4--4:4.18, Jan. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. L. Sidirourgos and M. L. Kersten. Column imprints: A secondary index structure. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22--27, 2013, pages 893--904, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. M. Zukowski, M. van de Wiel, and P. A. Boncz. Vectorwise: A vectorized analytical DBMS. In IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, Virginia), 1--5 April, 2012, pages 1349--1350, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 12, Issue 5
    January 2019
    163 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 January 2019
    Published in pvldb Volume 12, Issue 5

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader