Abstract
Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees.
The scheme itself dates back to Zobrist in 1970 who used it for game playing programs. Keys are viewed as consisting of c characters. We initialize c tables H1, ..., Hc mapping characters to random hash codes. A key x = (x1, ..., xc) is hashed to H1[x1] ⊕ ⋯ ⊕ Hc[xc], where ⊕ denotes bit-wise exclusive-or.
While this scheme is not even 4-independent, we show that it provides many of the guarantees that are normally obtained via higher independence, for example, Chernoff-type concentration, min-wise hashing for estimating set intersection, and cuckoo hashing.
- Alon, N., Matias, Y., and Szegedy, M. 1999. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58, 1, 209--223. Google ScholarDigital Library
- Azar, Y., Broder, A. Z., Karlin, A. R., and Upfal, E. 1999. Balanced allocations. SIAM J. Comput. 29, 1, 180--200. Google ScholarDigital Library
- Braverman, V., Chung, K.-M., Liu, Z., Mitzenmacher, M., and Ostrovsky, R. 2010. AMS without 4-wise independence on product domains. In Proceedings of the 27th Symposium on Theoretical Aspects of Computer Science (STACS). 119--130.Google Scholar
- Broder, A. Z., Charikar, M., Frieze, A. M., and Mitzenmacher, M. 2000. Min-wise independent permutations. J. Comput. Syst. Sci. 60, 3, 630--659. Google ScholarDigital Library
- Carter, L. and Wegman, M. N. 1979. Universal classes of hash functions. J. Comput. Syst. Sci. 18, 2, 143--154.Google ScholarCross Ref
- Cohen, J. S. and Kane, D. M. 2009. Bounds on the independence required for cuckoo hashing. Manuscript. http://math.stanford.edu/~dankane/cuchkoohashing.pdf.Google Scholar
- Dietzfelbinger, M. 1996. Universal hashing and k-wise independent random variables via integer arithmetic without primes. In Proceedings of the 13th Symposium on Theoretical Aspects of Computer Science (STACS). 569--580. Google ScholarDigital Library
- Dietzfelbinger, M., and Rink, M. 2009. Applications of a splitting trick. In Proceedings of the 36th International Colloquium on Automata, Languages and Programming (ICALP). 354--365. Google ScholarDigital Library
- Dietzfelbinger, M. and Schellbach, U. 2009. On risks of using cuckoo hashing with simple universal hash classes. In Proceedings of the 20th ACM/SIAM Symposium on Discrete Algorithms (SODA). 795--804. Google ScholarDigital Library
- Dietzfelbinger, M. and Woelfel, P. 2003. Almost random graphs with simple hash functions. In Proceedings of the 25th ACM Symposium on Theory of Computing (STOC). 629--638. Google ScholarDigital Library
- Dietzfelbinger, M., Hagerup, T., Katajainen, J., and Penttonen, M. 1997. A reliable randomized algorithm for the closest-pair problem. J. Algor. 25, 1, 19--51. Google ScholarDigital Library
- Gerber, L. 1968. An extension of Bernoulli’s inequality. Amer. Math. Month. 75, 875--876.Google ScholarCross Ref
- Indyk, P. 2001. A small approximately min-wise independent family of hash functions. J. Algor. 38, 1, 84--90. Google ScholarDigital Library
- Karloff, H. J. and Raghavan, P. 1993. Randomized algorithms and pseudorandom numbers. J. ACM 40, 3, 454--476. Google ScholarDigital Library
- Knuth, D. E. 1963. Notes on open addressing. Unpublished memorandum. http://citeseer.ist.psu.edu/knuth63notes.html.Google Scholar
- Mitzenmacher, M., and Vadhan, S. P. 2008. Why simple hash functions work: exploiting the entropy in a data stream. In Proceedings of the 19th ACM/SIAM Symposium on Discrete Algorithms (SODA). 746--755. Google ScholarDigital Library
- Motwani, R. and Raghavan, P. 1995. Randomized Algorithms. Cambridge University Press. Google ScholarDigital Library
- Pagh, A., Pagh, R., and Ružić, M. 2009. Linear probing with constant independence. SIAM J. Comput. 39, 3, 1107--1120. Google ScholarDigital Library
- Pagh, R., and Rodler, F. F. 2004. Cuckoo hashing. J. Algor. 51, 2, 122--144. Google ScholarDigital Library
- Pǎtraşcu, M., and Thorup, M. 2010. On the k-independence required by linear probing and minwise independence. In Proceedings of the 37th International Colloquium on Automata, Languages and Programming (ICALP). 715--726. Google ScholarDigital Library
- Schmidt, J. P., Siegel, A., and Srinivasan, A. 1995. Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discr. Math. 8, 2, 223--250. Google ScholarDigital Library
- Siegel, A. 2004. On universal classes of extremely random constant-time hash functions. SIAM J. Comput. 33, 3, 505--543. Google ScholarDigital Library
- Thorup, M. 2000. Even strongly universal hashing is pretty fast. In Proceedings of the 11th ACM/SIAM Symposium on Discrete Algorithms (SODA). 496--497. Google ScholarDigital Library
- Thorup, M. 2009. String hashing for linear probing. In Proceedings of the 20th ACM/SIAM Symposium on Discrete Algorithms (SODA). 655--664. Google ScholarDigital Library
- Thorup, M., and Zhang, Y. 2012. Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM J. Comput. 41, 2, 293--331. Google ScholarDigital Library
- Wegman, M. N., and Carter, L. 1981. New classes and applications of hash functions. J. Comput. Syst. Sci. 22, 3, 265--279.Google ScholarCross Ref
- Zobrist, A. L. 1970. A new hashing method with application for game playing. Tech. rep. 88, Computer Sciences Department, University of Wisconsin, Madison, WI.Google Scholar
Index Terms
- The Power of Simple Tabulation Hashing
Recommendations
Fast and powerful hashing using tabulation
Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here, we survey recent results on how simple hashing schemes based on ...
The power of simple tabulation hashing
STOC '11: Proceedings of the forty-third annual ACM symposium on Theory of computingRandomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides ...
Tabulation-Based 5-Independent Hashing with Applications to Linear Probing and Second Moment Estimation
In the framework of Wegman and Carter, a $k$-independent hash function maps any $k$ keys independently. It is known that 5-independent hashing provides good expected performance in applications such as linear probing and second moment estimation for ...
Comments