Abstract
We define the concept of performance-optimal filtering to indicate the Bloom or Cuckoo filter configuration that best accelerates a particular task. While the space-precision tradeoff of these filters has been well studied, we show how to pick a filter that maximizes the performance for a given workload. This choice might be "suboptimal" relative to traditional space-precision metrics, but it will lead to better performance in practice. In this paper, we focus on high-throughput filter use cases, aimed at avoiding CPU work, e.g., a cache miss, a network message, or a local disk I/O - events that can happen at rates of millions to hundreds per second. Besides the false-positive rate and memory footprint of the filter, performance optimality has to take into account the absolute cost of the filter lookup as well as the saved work per lookup that filtering avoids; while the actual rate of negative lookups in the workload determines whether using a filter improves overall performance at all. In the course of the paper, we introduce new filter variants, namely the register-blocked and cache-sectorized Bloom filters. We present new implementation techniques and perform an extensive evaluation on modern hardware platforms, including the wide-SIMD Skylake-X and Knights Landing. This experimentation shows that in high-throughput situations, the lower lookup cost of blocked Bloom filters allows them to overtake Cuckoo filters.
- Advanced Micro Devices, Inc. Software Optimization Guide for AMD Family 17h Processors (rev. 3.00). 2017.Google Scholar
- K. Alexiou, D. Kossmann, and P. Larson. Adaptive range filters for cold data: Avoiding trips to siberia. PVLDB, 6(14):1714--1725, 2013. Google ScholarDigital Library
- P. S. Almeida, C. Baquero, N. M. Preguiça, and D. Hutchison. Scalable bloom filters. Inf. Process. Lett., 101(6):255--261, 2007. Google ScholarCross Ref
- M. A. Bender, M. Farach-Colton, R. Johnson, R. Kraner, B. C. Kuszmaul, D. Medjedovic, P. Montes, P. Shetty, R. P. Spillane, and E. Zadok. Don't thrash: How to cache your hash on flash. PVLDB, 5(11):1627--1637, 2012. Google ScholarDigital Library
- B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422--426, 1970. Google ScholarDigital Library
- P. A. Boncz, T. Neumann, and O. Erling. TPC-H analyzed: Hidden messages and lessons learned from an influential benchmark. In Performance Characterization and Benchmarking - 5th TPC Technology Conference, TPCTC 2013, Trento, Italy, August 26, 2013, Revised Selected Papers, pages 61--76, 2013. Google ScholarDigital Library
- F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese. An improved construction for counting bloom filters. In Algorithms - ESA 2006, 14th Annual European Symposium, Zurich, Switzerland, September 11--13, 2006, Proceedings, pages 684--695, 2006. Google ScholarDigital Library
- A. Breslow and N. Jayasena. Morton filters: Faster, space-efficient cuckoo filters via biasing, compression, and decoupled logical sparsity. PVLDB, 11(9):1041--1055, 2018. Google ScholarDigital Library
- A. Z. Broder and M. Mitzenmacher. Network applications of bloom filters: A survey. Internet Mathematics, 1(4):485--509, 2003.Google ScholarCross Ref
- B. Chazelle, J. Kilian, R. Rubinfeld, and A. Tal. The bloomier filter: an efficient data structure for static support lookup tables. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, New Orleans, Louisiana, USA, January 11--14, 2004, pages 30--39, 2004. Google ScholarDigital Library
- S. Cohen and Y. Matias. Spectral bloom filters. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, June 9--12, 2003, pages 241--252, 2003. Google ScholarDigital Library
- N. Dayan, M. Athanassoulis, and S. Idreos. Monkey: Optimal navigable key-value store. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14--19, 2017, pages 79--94, 2017. Google ScholarDigital Library
- B. Donnet, B. Baynat, and T. Friedman. Retouched bloom filters: allowing networked applications to trade off selected false positives against false negatives. In Proceedings of the 2006 ACM Conference on Emerging Network Experiment and Technology, CoNEXT 2006, Lisboa, Portugal, December 4--7, 2006, page 13, 2006. Google ScholarDigital Library
- G. Einziger and R. Friedman. TinySet - An access efficient self adjusting bloom filter construction. IEEE/ACM Trans. Netw., 25(4):2295--2307, 2017. Google ScholarDigital Library
- B. Fan, D. G. Andersen, M. Kaminsky, and M. D. Mitzenmacher. Cuckoo filter: Practically better than bloom. In Proceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies, CoNEXT '14, pages 75--88, New York, NY, USA, 2014. ACM. Google ScholarDigital Library
- Z. Feng, E. Lo, B. Kao, and W. Xu. ByteSlice: Pushing the envelop of main memory data processing with a new storage layout. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 -- June 4, 2015, pages 31--46, 2015. Google ScholarDigital Library
- B. Hentschel, M. S. Kester, and S. Idreos. Column Sketches: A scan accelerator for rapid and robust predicate evaluation. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10--15, 2018, pages 857--872, 2018. Google ScholarDigital Library
- Intel Corporation. Intel® 64 and IA-32 Architectures Optimization Reference Manual. 2018.Google Scholar
- H. S. W. Jr. Hacker's Delight, Second Edition. Pearson Education, 2013.Google Scholar
- A. Kirsch and M. Mitzenmacher. Less hashing, same performance: Building a better bloom filter. Random Struct. Algorithms, 33(2):187--218, 2008. Google ScholarCross Ref
- M. Kornacker, A. Behm, V. Bittorf, T. Bobrovytsky, C. Ching, A. Choi, J. Erickson, M. Grund, D. Hecht, M. Jacobs, I. Joshi, L. Kuff, D. Kumar, A. Leblang, N. Li, I. Pandis, H. Robinson, D. Rorke, S. Rus, J. Russell, D. Tsirogiannis, S. Wanderman-Milne, and M. Yoder. Impala: A modern, open-source SQL engine for hadoop. In CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4--7, 2015, Online Proceedings, 2015.Google Scholar
- H. Lang, T. Mühlbauer, F. Funke, P. A. Boncz, T. Neumann, and A. Kemper. Data Blocks: Hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 -- July 01, 2016, pages 311--326, 2016. Google ScholarDigital Library
- V. Leis, A. Gubichev, A. Mirchev, P. A. Boncz, A. Kemper, and T. Neumann. How good are query optimizers, really? PVLDB, 9(3):204--215, 2015. Google ScholarDigital Library
- Y. Li and J. M. Patel. BitWeaving: Fast scans for main memory data processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22--27, 2013, pages 289--300, 2013. Google ScholarDigital Library
- M. Mitzenmacher and E. Upfal. Probability and computing - randomized algorithms and probabilistic analysis. Cambridge University Press, 2005. Google ScholarDigital Library
- G. Moerkotte. Small Materialized Aggregates: A light weight index structure for data warehousing. In VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24--27, 1998, New York City, New York, USA, pages 476--487, 1998. Google ScholarDigital Library
- R. Pagh and F. F. Rodler. Cuckoo hashing. J. Algorithms, 51(2):122--144, 2004. Google ScholarDigital Library
- P. Pandey, M. A. Bender, R. Johnson, and R. Patro. A general-purpose counting filter: Making every bit count. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14--19, 2017, pages 775--787, 2017. Google ScholarDigital Library
- O. Polychroniou and K. A. Ross. Vectorized bloom filters for advanced SIMD processors. In Tenth International Workshop on Data Management on New Hardware, DaMoN 2014, Snowbird, UT, USA, June 23, 2014, pages 6:1--6:6, 2014. Google ScholarDigital Library
- O. Polychroniou and K. A. Ross. Efficient lightweight compression alongside fast scans. In Proceedings of the 11th International Workshop on Data Management on New Hardware, DaMoN 2015, Melbourne, VIC, Australia, May 31 -- June 04, 2015, pages 9:1--9:6, 2015. Google ScholarDigital Library
- F. Putze, P. Sanders, and J. Singler. Cache-, hash-, and space-efficient bloom filters. J. Exp. Algorithmics, 14:4:4.4--4:4.18, Jan. 2010. Google ScholarDigital Library
- L. Sidirourgos and M. L. Kersten. Column imprints: A secondary index structure. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22--27, 2013, pages 893--904, 2013. Google ScholarDigital Library
- M. Zukowski, M. van de Wiel, and P. A. Boncz. Vectorwise: A vectorized analytical DBMS. In IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, Virginia), 1--5 April, 2012, pages 1349--1350, 2012. Google ScholarDigital Library
Recommendations
Multirate Kalman filtering approach for optimal two-dimensional signal reconstruction from noisy subband systems
ICIP '97: Proceedings of the 1997 International Conference on Image Processing (ICIP '97) 3-Volume Set-Volume 1 - Volume 1Conventional synthesis filters in subband systems lose their optimality when additive noise due, for example, to signal quantization, disturbs the subband components. The multichannel representation of the subband signal is combined with the statistical ...
Filtering directory lookups in CMPs
Coherence protocols consume an important fraction of power to determine which coherence action to perform. Specifically, on CMPs with shared cache and directory-based coherence protocol implemented as a duplicate of local caches tags, we have observed ...
MOTF: Multi-objective Optimal Trilateral Filtering based partial moving frame algorithm for image denoising
AbstractIn this paper, a novel denoising approach based on optimal trilateral filtering using Grey Wolf Optimization (GWO) is proposed. At first, a database of noisy images are generated by adding Gaussian noise, Salt & Pepper noise and Random noise to ...
Comments