Abstract
This paper presents new alternatives to the well-known Bloom filter data structure. The Bloom filter, a compact data structure supporting set insertion and membership queries, has found wide application in databases, storage systems, and networks. Because the Bloom filter performs frequent random reads and writes, it is used almost exclusively in RAM, limiting the size of the sets it can represent.
This paper first describes the quotient filter, which supports the basic operations of the Bloom filter, achieving roughly comparable performance in terms of space and time, but with better data locality. Operations on the quotient filter require only a small number of contiguous accesses. The quotient filter has other advantages over the Bloom filter: it supports deletions, it can be dynamically resized, and two quotient filters can be efficiently merged.
The paper then gives two data structures, the buffered quotient filter and the cascade filter, which exploit the quotient filter advantages and thus serve as SSD-optimized alternatives to the Bloom filter. The cascade filter has better asymptotic I/O performance than the buffered quotient filter, but the buffered quotient filter outperforms the cascade filter on small to medium data sets. Both data structures significantly outperform recently-proposed SSD-optimized Bloom filter variants, such as the elevator Bloom filter, buffered Bloom filter, and forest-structured Bloom filter. In experiments, the cascade filter and buffered quotient filter performed insertions 8.6--11 times faster than the fastest Bloom filter variant and performed lookups 0.94--2.56 times faster.
- M. A. Bender, M. Farach-Colton, J. T. Fineman, Y. R. Fogel, B. C. Kuszmaul, and J. Nelson. Cache-oblivious streaming B-trees. In SPAA, pages 81--92, 2007. Google ScholarDigital Library
- B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communincations of the ACM, 13(7):422--426, 1970. Google ScholarDigital Library
- F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese. An improved construction for counting Bloom filters. In ECA, pages 684--695, 2006. Google ScholarDigital Library
- A. Broder and M. Mitzenmacher. Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485--509, 2003.Google ScholarCross Ref
- M. Canim, G. A. Mihaila, B. Bhattacharhee, C. A. Lang, and K. A. Ross. Buffered Bloom filters on solid state storage. In VLDB ADMS Workshop, 2010.Google Scholar
- F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. In OSDI, pages 205--218, 2006. Google ScholarDigital Library
- Y. Chen, B. Schmidt, and D. L. Maskell. A reconfigurable Bloom filter architecture for BLASTN. In ARCS, pages 40--49, 2009. Google ScholarDigital Library
- H. Cheung. TG Video: Fusion io - the power of 1000 hard drives in the palm of your hand. http://www.tgdaily.com/hardware-features/34065-tg-video-fusion-io-the-power-of-1000-hard-drives-in-the-palm-of-your-hand, 2007.Google Scholar
- J. G. Cleary. Compact hash tables using bidirectional linear probing. IEEE Transactions on Computing, 33(9):828--834, 1984. Google ScholarDigital Library
- B. Debnath, S. Sengupta, J. Li, D. Lilja, and D. Du. Bloomflash: Bloom filter on flash-based storage. In ICDCS, pages 635--644, 2011. Google ScholarDigital Library
- L. Fan, P. Cao, J. Almeida, and A. Z. Broder. Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Transactions on Networking, 8(3):281--293, 2000. Google ScholarDigital Library
- L. Freeman. How NetApp deduplication works - a primer, Apr. 2010. http://blogs.netapp.com/drdedupe/2010/04/how-netapp-deduplication-works.html.Google Scholar
- G. J. Holzmann. Design and validation of computer protocols. Prentice-Hall, Inc., 1991. Google ScholarDigital Library
- K. Keeton, C. B. Morrey, III, C. A. Soules, and A. Veitch. Lazybase: freshness vs. performance in information management. SIGOPS Operating Systems Review, 44(1):15--19, 2010. Google ScholarDigital Library
- D. E. Knuth. The Art of Computer Programming: Sorting and Searching, volume 3. Addison Wesley, 1973.Google Scholar
- Z. Li, R. Grosu, P. Sehgal, S. A. Smolka, S. D. Stoller, and E. Zadok. On the energy consumption and performance of systems software. In SYSTOR, pages 8:1--8:12, 2011. Google ScholarDigital Library
- G. Lu, B. Debnath, and D. H. C. Du. A forest-structured bloom filter with flash memory. In MSST, pages 1--6, 2011. Google ScholarDigital Library
- K. Malde and B. O'Sullivan. Using Bloom filters for large scale gene sequence analysis in Haskell. In PADL, pages 183--194, 2009. Google ScholarDigital Library
- J. Mullin. Optimal semijoins for distributed database systems. IEEE Transactions on Software Engineering, 16(5):558--560, 1990. Google ScholarDigital Library
- A. Singh, M. Srivatsa, L. Liu, and T. Miller. Apoidea: A decentralized peer-to-peer architecture for crawling the world wide web. In SIGIR Workshop on Distributed Multimedia Information Retrieval, pages 126--142, 2003.Google Scholar
- Z. Yuan, J. Miao, Y. Jia, and L. Wang. Counting data stream based on improved counting Bloom filter. In WAIM, pages 512--519, 2008. Google ScholarDigital Library
- B. Zhu, K. Li, and H. Patterson. Avoiding the disk bottleneck in the data domain deduplication file system. In FAST, pages 18:1--18:14, 2008. Google ScholarDigital Library
Recommendations
Don't thrash: how to cache your hash on flash
HotStorage'11: Proceedings of the 3rd USENIX conference on Hot topics in storage and file systemsMany large storage systems use approximate-membership-query (AMQ) data structures to deal with the massive amounts of data that they process. An AMQ data structure is a dictionary that trades off space for a false positive rate on membership queries. It ...
Designs of fractional delay filter, Nyquist filter, lowpass filter and diamond-shaped filter
In this paper, the designs of fractional delay filter, Nyquist filter, lowpass filter and diamond-shaped filter are presented. First, the relation between Nyquist filter and fractional delay filter is investigated such that the design tools of one ...
Hybrid structures for low-complexity variable fractional-delay FIR filters
This paper proposes a pair of new structures for implementing low-complexity odd-order and even-order variable fractional-delay (VFD) FIR filters using hybrid structures with both even-order and odd-order subfilters. For odd-order VFD filter design, the ...
Comments