skip to main content
10.1145/133160.133209acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
Article
Free Access

A faster algorithm for constructing minimal perfect hash functions

Authors Info & Claims
Published:01 June 1992Publication History

ABSTRACT

Our previous research on one-probe access to large collections of data indexed by alphanumeric keys has produced the first practical minimal perfect hash functions for this problem. Here, a new algorithm is described for quickly finding minimal perfect hash functions whose specification space is very close to the theoretical lower bound, i.e., around 2 bits per key. The various stages of processing are detailed, along with analytical and empirical results, including timing for a set of over 3.8 million keys that was processed on a NeXTstation in about 6 hours.

References

  1. 1.Richard Barnhart. The Advanced Naval Message Analyzer. Videotape production, VPI&SU, November 1990. Richard Barnhart as host, producer, script-writer; Edward Fox as executive producer, project director. Discusses the CODER system.Google ScholarGoogle Scholar
  2. 2.N. Cercone, M. Krause, and J. Boates. Minimal and almost minimal perfect hash function search with application to natural language lexicon design. Computers and Mathematics with Applications, 9:215-231, 1983.Google ScholarGoogle ScholarCross RefCross Ref
  3. 3.C.C. Chang. Letter oriented reciprocal hashing scheme. Information Sciences, 38:243-255, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.Qi Fan Chen. An object-oriented database system for efficient information retrieval applications. PhD thesis, Virginia Tech Dept. of Computer Science, March 1992.Google ScholarGoogle Scholar
  5. 5.R. J. Cichelli. Minimal perfect hash functions made simple. Communications of the Association for Computing Machinery, 23:17-19.19g0. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6.E. A. Fox and Robert K. France. Architecture of an expert system for composite document analysis, representation and retrieval. International Journal of Approximate Reasoning, 1(1): 151-175, 1987.Google ScholarGoogle ScholarCross RefCross Ref
  7. 7.Edward A. Fox. Development of the CODER system: A testbe~ for artificial intelligence methods in information retrieval, information Processing & Management, 23(4):341-366, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.Edward A. Fox, Qi Fan Chen, Amjad M. Daoud, and Lenwood S. Heath. Order-preserving minimal perfect hash functions and information retrieval. ACM Transactions on lnformationSystems, 9(3):281-308, July 199 I. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9.Edward A. Fox, Lenwood S. Heath, Qi Fan Chen, and Amjad M. Daoud. Practical minimal perfect hash functions for large databases. Communications of the Association for Computing Machinery, 35(1): 105-121, January 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10.Edward A. Fox, M. Prabhakar Koushik, Qi Fan Chen, and Robert K. France. Integrated access to a large medical literature database. Technical Report TR-91-15, Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, May 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. 11.M.L. Fredman and J. Koml6s. On the size of separating systems and families of perfect hash functions. SIAM Journal on Algebraic and Discrete Methods', 5:61-68, 1984.Google ScholarGoogle Scholar
  12. 12.M.L. Fredman, J. Koml6s, and E. Szemer&li. Storing a sparse table with O(1) worst case access time. Journal of the Association for Computing Machinery, 31:538- 544, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13.G. Jaeschke. Reciprocal hashing - a method for generating minimal perfect hash functions. Communications of the Association for Computing Machinery, 24:829- 833, 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. 14.K. Mehlhorn. On the program size of perfect and universal hash functions. In Proceedings of the 23rd Annual IEEE Symposium on Foundations of Computer Science, pages 170-175, 1982.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15.P. K. Pearson. Fast hashing of variable-length text strings. Communications of the Association for Computing Machinery, 33(6):677-680, June 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. 16.T. J. Sager. A polynomial time generator for minimal perfect hash functions. Communications of the Association for Computing Machinery, 28:523-532, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. 17.M. Weaver, R. France, Q. Chen, and E. Fox. Using a frame-based language for information retrieval. International Journal of Intelligent Systems, 4(3):223-257, 1989.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A faster algorithm for constructing minimal perfect hash functions

          Recommendations

          Reviews

          Leonard C. Swanson

          A new algorithm for finding minimal perfect hash functions that require small amounts of hash table storage (as little as 2.4 bits per key) is described. Hash functions in this context are used to provide direct keyed access to database records. A minimal perfect hash function (MPHF) is one in which record keys are mapped to unique locations in a hash table (avoiding collision resolution) and the hash table contains exactly one location per key. The new algorithm has several important and original properties: it handles large sets of keys, yields MPHFs that require small specification space, and finds MPHFs in time that is linear in the number of keys (for specification sizes above 3 bits per key). The authors build on their earlier work to obtain MPHFs in linear expected time. The new algorithm is faster and further reduces specification space requirements. The paper presents a summary description of the algorithm that is clear in every respect, as well as results from application of the algorithm to a 3.8 million–key set. The figures are especially helpful and the references are complete, though prior work is not extensively discussed. The paper does not include important concepts the authors have covered in earlier papers, nor does it provide sufficient detail to replicate their work, as earlier papers do. In this respect, the paper does not stand alone; it is useful only in the context of the earlier work. Within that context, however, the authors' algorithm is an important extension, and the paper is a clear and useful summary of it.

          Access critical reviews of Computing literature here

          Become a reviewer for Computing Reviews.

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
            June 1992
            352 pages
            ISBN:0897915232
            DOI:10.1145/133160

            Copyright © 1992 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 June 1992

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            Overall Acceptance Rate792of3,983submissions,20%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader