research-article

Don't thrash: how to cache your hash on flash

Authors:
Michael A. Bender

Stony Brook University and Tokutek, Inc.

Stony Brook University and Tokutek, Inc.
View Profile

,
Martin Farach-Colton

Rutgers University and Tokutek, Inc.

Rutgers University and Tokutek, Inc.
View Profile

,
Rob Johnson

Stony Brook University

Stony Brook University
View Profile

,
Russell Kraner

VCORE Solutions LLC.

VCORE Solutions LLC.
View Profile

,
Bradley C. Kuszmaul

MIT and Tokutek, Inc.

MIT and Tokutek, Inc.
View Profile

,
Dzejla Medjedovic

Stony Brook University

Stony Brook University
View Profile

,
Pablo Montes

Stony Brook University

Stony Brook University
View Profile

,
Pradeep Shetty

Stony Brook University

Stony Brook University
View Profile

,
Richard P. Spillane

Stony Brook University

Stony Brook University
View Profile

,
Erez Zadok

Stony Brook University

Stony Brook University
View Profile

Proceedings of the VLDB Endowment Volume 5 Issue 11pp 1627–1637https://doi.org/10.14778/2350229.2350275

Published:01 July 2012Publication History

Proceedings of the VLDB Endowment

Abstract

This paper presents new alternatives to the well-known Bloom filter data structure. The Bloom filter, a compact data structure supporting set insertion and membership queries, has found wide application in databases, storage systems, and networks. Because the Bloom filter performs frequent random reads and writes, it is used almost exclusively in RAM, limiting the size of the sets it can represent.

This paper first describes the quotient filter, which supports the basic operations of the Bloom filter, achieving roughly comparable performance in terms of space and time, but with better data locality. Operations on the quotient filter require only a small number of contiguous accesses. The quotient filter has other advantages over the Bloom filter: it supports deletions, it can be dynamically resized, and two quotient filters can be efficiently merged.

The paper then gives two data structures, the buffered quotient filter and the cascade filter, which exploit the quotient filter advantages and thus serve as SSD-optimized alternatives to the Bloom filter. The cascade filter has better asymptotic I/O performance than the buffered quotient filter, but the buffered quotient filter outperforms the cascade filter on small to medium data sets. Both data structures significantly outperform recently-proposed SSD-optimized Bloom filter variants, such as the elevator Bloom filter, buffered Bloom filter, and forest-structured Bloom filter. In experiments, the cascade filter and buffered quotient filter performed insertions 8.6--11 times faster than the fastest Bloom filter variant and performed lookups 0.94--2.56 times faster.

References

M. A. Bender, M. Farach-Colton, J. T. Fineman, Y. R. Fogel, B. C. Kuszmaul, and J. Nelson. Cache-oblivious streaming B-trees. In SPAA, pages 81--92, 2007. Google ScholarDigital Library
B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communincations of the ACM, 13(7):422--426, 1970. Google ScholarDigital Library
F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese. An improved construction for counting Bloom filters. In ECA, pages 684--695, 2006. Google ScholarDigital Library
A. Broder and M. Mitzenmacher. Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485--509, 2003.Google ScholarCross Ref
M. Canim, G. A. Mihaila, B. Bhattacharhee, C. A. Lang, and K. A. Ross. Buffered Bloom filters on solid state storage. In VLDB ADMS Workshop, 2010.Google Scholar
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. In OSDI, pages 205--218, 2006. Google ScholarDigital Library
Y. Chen, B. Schmidt, and D. L. Maskell. A reconfigurable Bloom filter architecture for BLASTN. In ARCS, pages 40--49, 2009. Google ScholarDigital Library
H. Cheung. TG Video: Fusion io - the power of 1000 hard drives in the palm of your hand. http://www.tgdaily.com/hardware-features/34065-tg-video-fusion-io-the-power-of-1000-hard-drives-in-the-palm-of-your-hand, 2007.Google Scholar
J. G. Cleary. Compact hash tables using bidirectional linear probing. IEEE Transactions on Computing, 33(9):828--834, 1984. Google ScholarDigital Library
B. Debnath, S. Sengupta, J. Li, D. Lilja, and D. Du. Bloomflash: Bloom filter on flash-based storage. In ICDCS, pages 635--644, 2011. Google ScholarDigital Library
L. Fan, P. Cao, J. Almeida, and A. Z. Broder. Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Transactions on Networking, 8(3):281--293, 2000. Google ScholarDigital Library
L. Freeman. How NetApp deduplication works - a primer, Apr. 2010. http://blogs.netapp.com/drdedupe/2010/04/how-netapp-deduplication-works.html.Google Scholar
G. J. Holzmann. Design and validation of computer protocols. Prentice-Hall, Inc., 1991. Google ScholarDigital Library
K. Keeton, C. B. Morrey, III, C. A. Soules, and A. Veitch. Lazybase: freshness vs. performance in information management. SIGOPS Operating Systems Review, 44(1):15--19, 2010. Google ScholarDigital Library
D. E. Knuth. The Art of Computer Programming: Sorting and Searching, volume 3. Addison Wesley, 1973.Google Scholar
Z. Li, R. Grosu, P. Sehgal, S. A. Smolka, S. D. Stoller, and E. Zadok. On the energy consumption and performance of systems software. In SYSTOR, pages 8:1--8:12, 2011. Google ScholarDigital Library
G. Lu, B. Debnath, and D. H. C. Du. A forest-structured bloom filter with flash memory. In MSST, pages 1--6, 2011. Google ScholarDigital Library
K. Malde and B. O'Sullivan. Using Bloom filters for large scale gene sequence analysis in Haskell. In PADL, pages 183--194, 2009. Google ScholarDigital Library
J. Mullin. Optimal semijoins for distributed database systems. IEEE Transactions on Software Engineering, 16(5):558--560, 1990. Google ScholarDigital Library
A. Singh, M. Srivatsa, L. Liu, and T. Miller. Apoidea: A decentralized peer-to-peer architecture for crawling the world wide web. In SIGIR Workshop on Distributed Multimedia Information Retrieval, pages 126--142, 2003.Google Scholar
Z. Yuan, J. Miao, Y. Jia, and L. Wang. Counting data stream based on improved counting Bloom filter. In WAIM, pages 512--519, 2008. Google ScholarDigital Library
B. Zhu, K. Li, and H. Patterson. Avoiding the disk bottleneck in the data domain deduplication file system. In FAST, pages 18:1--18:14, 2008. Google ScholarDigital Library

Recommendations

Don't thrash: how to cache your hash on flash
HotStorage'11: Proceedings of the 3rd USENIX conference on Hot topics in storage and file systems

Many large storage systems use approximate-membership-query (AMQ) data structures to deal with the massive amounts of data that they process. An AMQ data structure is a dictionary that trades off space for a false positive rate on membership queries. It ...
Read More
Designs of fractional delay filter, Nyquist filter, lowpass filter and diamond-shaped filter

In this paper, the designs of fractional delay filter, Nyquist filter, lowpass filter and diamond-shaped filter are presented. First, the relation between Nyquist filter and fractional delay filter is investigated such that the design tools of one ...
Read More
Hybrid structures for low-complexity variable fractional-delay FIR filters

This paper proposes a pair of new structures for implementing low-complexity odd-order and even-order variable fractional-delay (VFD) FIR filters using hybrid structures with both even-order and odd-order subfilters. For odd-order VFD filter design, the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the VLDB Endowment Volume 5, Issue 11
July 2012
608 pages
ISSN:2150-8097
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 July 2012
Published in pvldb Volume 5, Issue 11
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 64
  Total Citations
  View Citations
- 498
  Total Downloads
- Downloads (Last 12 months)26
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Don't thrash: how to cache your hash on flash

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Don't thrash: how to cache your hash on flash

Designs of fractional delay filter, Nyquist filter, lowpass filter and diamond-shaped filter

Hybrid structures for low-complexity variable fractional-delay FIR filters

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Don't thrash: how to cache your hash on flash

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Don't thrash: how to cache your hash on flash

Designs of fractional delay filter, Nyquist filter, lowpass filter and diamond-shaped filter

Hybrid structures for low-complexity variable fractional-delay FIR filters

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media