Article

Free Access

A faster algorithm for constructing minimal perfect hash functions

Authors:
Edward A. Fox

Department of Computer Science and Computing Center, Virginia Polytechnic Institute and State University, Blacksburg, VA

Department of Computer Science and Computing Center, Virginia Polytechnic Institute and State University, Blacksburg, VA
View Profile

,
Qi Fan Chen

Department of Computer Science and Computing Center, Virginia Polytechnic Institute and State University, Blacksburg, VA

Department of Computer Science and Computing Center, Virginia Polytechnic Institute and State University, Blacksburg, VA
View Profile

,
Lenwood S. Heath

Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA

Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA
View Profile

SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrievalJune 1992Pages 266–273https://doi.org/10.1145/133160.133209

Published:01 June 1992Publication History

SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 266–273

ABSTRACT

Our previous research on one-probe access to large collections of data indexed by alphanumeric keys has produced the first practical minimal perfect hash functions for this problem. Here, a new algorithm is described for quickly finding minimal perfect hash functions whose specification space is very close to the theoretical lower bound, i.e., around 2 bits per key. The various stages of processing are detailed, along with analytical and empirical results, including timing for a set of over 3.8 million keys that was processed on a NeXTstation in about 6 hours.

References

1.Richard Barnhart. The Advanced Naval Message Analyzer. Videotape production, VPI&SU, November 1990. Richard Barnhart as host, producer, script-writer; Edward Fox as executive producer, project director. Discusses the CODER system.Google Scholar
2.N. Cercone, M. Krause, and J. Boates. Minimal and almost minimal perfect hash function search with application to natural language lexicon design. Computers and Mathematics with Applications, 9:215-231, 1983.Google ScholarCross Ref
3.C.C. Chang. Letter oriented reciprocal hashing scheme. Information Sciences, 38:243-255, 1986. Google ScholarDigital Library
4.Qi Fan Chen. An object-oriented database system for efficient information retrieval applications. PhD thesis, Virginia Tech Dept. of Computer Science, March 1992.Google Scholar
5.R. J. Cichelli. Minimal perfect hash functions made simple. Communications of the Association for Computing Machinery, 23:17-19.19g0. Google ScholarDigital Library
6.E. A. Fox and Robert K. France. Architecture of an expert system for composite document analysis, representation and retrieval. International Journal of Approximate Reasoning, 1(1): 151-175, 1987.Google ScholarCross Ref
7.Edward A. Fox. Development of the CODER system: A testbe~ for artificial intelligence methods in information retrieval, information Processing & Management, 23(4):341-366, 1987. Google ScholarDigital Library
8.Edward A. Fox, Qi Fan Chen, Amjad M. Daoud, and Lenwood S. Heath. Order-preserving minimal perfect hash functions and information retrieval. ACM Transactions on lnformationSystems, 9(3):281-308, July 199 I. Google ScholarDigital Library
9.Edward A. Fox, Lenwood S. Heath, Qi Fan Chen, and Amjad M. Daoud. Practical minimal perfect hash functions for large databases. Communications of the Association for Computing Machinery, 35(1): 105-121, January 1992. Google ScholarDigital Library
10.Edward A. Fox, M. Prabhakar Koushik, Qi Fan Chen, and Robert K. France. Integrated access to a large medical literature database. Technical Report TR-91-15, Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, May 1991. Google ScholarDigital Library
11.M.L. Fredman and J. Koml6s. On the size of separating systems and families of perfect hash functions. SIAM Journal on Algebraic and Discrete Methods', 5:61-68, 1984.Google Scholar
12.M.L. Fredman, J. Koml6s, and E. Szemer&li. Storing a sparse table with O(1) worst case access time. Journal of the Association for Computing Machinery, 31:538- 544, 1984. Google ScholarDigital Library
13.G. Jaeschke. Reciprocal hashing - a method for generating minimal perfect hash functions. Communications of the Association for Computing Machinery, 24:829- 833, 1981. Google ScholarDigital Library
14.K. Mehlhorn. On the program size of perfect and universal hash functions. In Proceedings of the 23rd Annual IEEE Symposium on Foundations of Computer Science, pages 170-175, 1982.Google ScholarDigital Library
15.P. K. Pearson. Fast hashing of variable-length text strings. Communications of the Association for Computing Machinery, 33(6):677-680, June 1990. Google ScholarDigital Library
16.T. J. Sager. A polynomial time generator for minimal perfect hash functions. Communications of the Association for Computing Machinery, 28:523-532, 1985. Google ScholarDigital Library
17.M. Weaver, R. France, Q. Chen, and E. Fox. Using a frame-based language for information retrieval. International Journal of Intelligent Systems, 4(3):223-257, 1989.Google ScholarDigital Library

Index Terms

A faster algorithm for constructing minimal perfect hash functions
1. Information systems
2. Theory of computation
  1. Design and analysis of algorithms
    1. Data structures design and analysis
      1. Sorting and searching

Recommendations

Practical minimal perfect hash functions for large databases
Read More
Finding Minimal Perfect Hash Functions
Read More
Indexing internal memory with minimal perfect hash functions
SBBD '08: Proceedings of the 23rd Brazilian symposium on Databases

A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values, which are in turn used to index a hash table. Since no collisions occur, each key can be retrieved from the table with a single probe. A minimal perfect ...
Read More

Reviews

Reviewer: Leonard C. Swanson

A new algorithm for finding minimal perfect hash functions that require small amounts of hash table storage (as little as 2.4 bits per key) is described. Hash functions in this context are used to provide direct keyed access to database records. A minimal perfect hash function (MPHF) is one in which record keys are mapped to unique locations in a hash table (avoiding collision resolution) and the hash table contains exactly one location per key. The new algorithm has several important and original properties: it handles large sets of keys, yields MPHFs that require small specification space, and finds MPHFs in time that is linear in the number of keys (for specification sizes above 3 bits per key). The authors build on their earlier work to obtain MPHFs in linear expected time. The new algorithm is faster and further reduces specification space requirements. The paper presents a summary description of the algorithm that is clear in every respect, as well as results from application of the algorithm to a 3.8 million–key set. The figures are especially helpful and the references are complete, though prior work is not extensively discussed. The paper does not include important concepts the authors have covered in earlier papers, nor does it provide sufficient detail to replicate their work, as earlier papers do. In this respect, the paper does not stand alone; it is useful only in the context of the earlier work. Within that context, however, the authors' algorithm is an important extension, and the paper is a clear and useful summary of it.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
June 1992
352 pages
ISBN:0897915232
DOI:10.1145/133160
Chairman:
Edward Fox
VPI & State Univ., Blacksburg, VA
,
Editors:
Nicholas Belkin
Rutgers Univ., New Brunswick, NJ
,
Peter Ingwersen,
Annelise Mark Pejtersen
Copyright © 1992 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 1992
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate792of3,983submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 25
  Total Citations
  View Citations
- 1,122
  Total Downloads
- Downloads (Last 12 months)110
- Downloads (Last 6 weeks)15
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A faster algorithm for constructing minimal perfect hash functions

SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

Practical minimal perfect hash functions for large databases

Finding Minimal Perfect Hash Functions

Indexing internal memory with minimal perfect hash functions

Reviews

Access critical reviews of Computing literature here