Skip to main content
Erschienen in:
Buchtitelbild

2015 | OriginalPaper | Buchkapitel

Minimal and Monotone Minimal Perfect Hash Functions

verfasst von : Paolo Boldi

Erschienen in: Mathematical Foundations of Computer Science 2015

Verlag: Springer Berlin Heidelberg

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

A minimal perfect hash function (MPHF) is a (data structure providing a) bijective map from a set S of n keys to the set of the first n natural numbers. In the static case (i.e., when the set S is known in advance), there is a wide spectrum of solutions available, offering different trade-offs in terms of construction time, access time and size of the data structure. MPHFs have been shown to be useful to compress data in several data management tasks. In particular, order-preserving minimal perfect hash functions have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given order leads to an unavoidable \(\varOmega (n \log n)\) lower bound on the number of bits required to store the function. Recently, it was observed that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case of dictionaries of search engines, list of URLs of web graphs, etc. MPHFs that preserve the intrinsic order of the keys are called monotone (MMPHF). The problem of building MMPHFs is more recent and less studied (for example, no lower bounds are known) but once more there is a wide spectrum of solutions available, by now. In this paper, we survey some of the most practical techniques and tools for the construction of MPHFs and MMPHFs.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
I write \(\left( \begin{array}{l}X \\ t \end{array}\right) \) for the set of subsets of X of cardinality t.
 
2
The sub-hypergraph induced by \(X \subseteq V\) is \((X,E_X)\) where \(E_X = E \cap {\left( \begin{array}{l}X \\ t \end{array}\right) }\).
 
3
The degree of a vertex is the number of hyperedges including it.
 
4
Keeping only the (at most) n non-zero \(a^*_i\) and storing their indices in an array on which a rank/select structure is provided.
 
5
In [20] a variant is also discussed that directly produces a MPHF, but its construction time is no longer linear in expectation.
 
6
In order to highlight better the differences between the various approaches, in this section I consider the long-keys scenario.
 
7
The last bucket may, of course, be smaller than b.
 
8
If \(N_{\gamma }\) is a sequence of 1s, \(N_{\gamma }^+\) will not be added to the set.
 
9
If \({\text {exit}}(x)\) is the root, the algorithm will return \(i=-1\), so it is still true that \(x[:i+1]\) is the name of the exit node.
 
Literatur
1.
Zurück zum Zitat Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with \(O(1)\) worst case access time. J. Assoc. Comput. Mach. 31, 538–544 (1984)MathSciNetCrossRefMATH Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with \(O(1)\) worst case access time. J. Assoc. Comput. Mach. 31, 538–544 (1984)MathSciNetCrossRefMATH
2.
Zurück zum Zitat Fredman, M.L., Komlós, J.: On the size of separating systems and families of perfect hash functions. SIAM J. Algebr. Discret. Methods 5, 61–68 (1984)CrossRefMATH Fredman, M.L., Komlós, J.: On the size of separating systems and families of perfect hash functions. SIAM J. Algebr. Discret. Methods 5, 61–68 (1984)CrossRefMATH
3.
Zurück zum Zitat Fox, E.A., Chen, Q.F., Daoud, A.M., Heath, L.S.: Order-preserving minimal perfect hash functions and information retrieval. ACM Trans. Inf. Sys. 9, 281–308 (1991)CrossRef Fox, E.A., Chen, Q.F., Daoud, A.M., Heath, L.S.: Order-preserving minimal perfect hash functions and information retrieval. ACM Trans. Inf. Sys. 9, 281–308 (1991)CrossRef
4.
Zurück zum Zitat Majewski, B.S., Wormald, N.C., Havas, G., Czech, Z.J.: A family of perfect hashing methods. Comput. J. 39, 547–554 (1996)CrossRef Majewski, B.S., Wormald, N.C., Havas, G., Czech, Z.J.: A family of perfect hashing methods. Comput. J. 39, 547–554 (1996)CrossRef
5.
Zurück zum Zitat Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Monotone minimal perfect hashing: Searching a sorted table with \(O(1)\) accesses. In: Proceedings of the 20th Annual ACM-SIAM Symposium On Discrete Mathematics (SODA), pp. 785–794, New York, ACM Press (2009) Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Monotone minimal perfect hashing: Searching a sorted table with \(O(1)\) accesses. In: Proceedings of the 20th Annual ACM-SIAM Symposium On Discrete Mathematics (SODA), pp. 785–794, New York, ACM Press (2009)
6.
Zurück zum Zitat Boldi, P., Vigna, S.: The WebGraph framework i: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601, Manhattan, USA, ACM Press (2004) Boldi, P., Vigna, S.: The WebGraph framework i: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601, Manhattan, USA, ACM Press (2004)
7.
Zurück zum Zitat Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Srinivasan, S., Ramamritham, K., Kumar, A., Ravindra, M.P., Bertino, E., Kumar, R. (eds.) Proceedings of the 20th International Conference on World Wide Web, pp. 587–596. ACM (2011) Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Srinivasan, S., Ramamritham, K., Kumar, A., Ravindra, M.P., Bertino, E., Kumar, R. (eds.) Proceedings of the 20th International Conference on World Wide Web, pp. 587–596. ACM (2011)
8.
Zurück zum Zitat Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co. Inc., Boston (1999) Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co. Inc., Boston (1999)
9.
Zurück zum Zitat Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practise of monotone minimal perfect hashing. In: Proceedings of the Tenth Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 132–144. SIAM (2009) Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practise of monotone minimal perfect hashing. In: Proceedings of the Tenth Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 132–144. SIAM (2009)
10.
Zurück zum Zitat Knuth, D.E.: The Art of Computer Programming. Addison-Wesley, Boston (1973) Knuth, D.E.: The Art of Computer Programming. Addison-Wesley, Boston (1973)
11.
Zurück zum Zitat Mitzenmacher, M., Vadhan, S.: Why simple hash functions work: exploiting the entropy in a data stream. In: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, pp. 746–755. Society for Industrial and Applied Mathematics, Philadelphia (2008) Mitzenmacher, M., Vadhan, S.: Why simple hash functions work: exploiting the entropy in a data stream. In: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, pp. 746–755. Society for Industrial and Applied Mathematics, Philadelphia (2008)
12.
Zurück zum Zitat Jacobson, G.: Space-efficient static trees and graphs. In: 30th Annual Symposium on Foundations of Computer Science (FOCS 1989), pp. 549–554. IEEE Computer Society Press, Research Triangle Park, North Carolina (1989) Jacobson, G.: Space-efficient static trees and graphs. In: 30th Annual Symposium on Foundations of Computer Science (FOCS 1989), pp. 549–554. IEEE Computer Society Press, Research Triangle Park, North Carolina (1989)
13.
Zurück zum Zitat Patrascu, M.: Succincter. In: 49th Annual IEEE Symposium on Foundations of Computer Science, pp. 305–313. IEEE Computer Society (2008) Patrascu, M.: Succincter. In: 49th Annual IEEE Symposium on Foundations of Computer Science, pp. 305–313. IEEE Computer Society (2008)
14.
Zurück zum Zitat Vigna, S.: Broadword implementation of rank/select queries. In: McGeoch, C.C. (ed.) WEA 2008. LNCS, vol. 5038, pp. 154–168. Springer, Heidelberg (2008) CrossRef Vigna, S.: Broadword implementation of rank/select queries. In: McGeoch, C.C. (ed.) WEA 2008. LNCS, vol. 5038, pp. 154–168. Springer, Heidelberg (2008) CrossRef
15.
Zurück zum Zitat Gog, S., Petri, M.: Optimized succinct data structures for massive data. Software: Practice and Experience (2014). To appear Gog, S., Petri, M.: Optimized succinct data structures for massive data. Software: Practice and Experience (2014). To appear
16.
Zurück zum Zitat Bloom, B.H.: Space-time trade-offs in hash coding with allowable errors. Commun. ACM 13, 422–426 (1970)CrossRefMATH Bloom, B.H.: Space-time trade-offs in hash coding with allowable errors. Commun. ACM 13, 422–426 (1970)CrossRefMATH
17.
Zurück zum Zitat Mehlhorn, K.: Data Structures and Algorithms 1: Sorting and Searching. EATCS monographs on theoretical computer science, vol. 1. Springer, Heidelberg (1984) CrossRefMATH Mehlhorn, K.: Data Structures and Algorithms 1: Sorting and Searching. EATCS monographs on theoretical computer science, vol. 1. Springer, Heidelberg (1984) CrossRefMATH
18.
Zurück zum Zitat Hagerup, T., Tholey, T.: Efficient minimal perfect hashing in nearly minimal space. In: Ferreira, A., Reichel, H. (eds.) STACS 2001. LNCS, vol. 2010, pp. 317–326. Springer, Heidelberg (2001) CrossRef Hagerup, T., Tholey, T.: Efficient minimal perfect hashing in nearly minimal space. In: Ferreira, A., Reichel, H. (eds.) STACS 2001. LNCS, vol. 2010, pp. 317–326. Springer, Heidelberg (2001) CrossRef
19.
Zurück zum Zitat Chazelle, B., Kilian, J., Rubinfeld, R., Tal, A.: The Bloomier filter: an efficient data structure for static support lookup tables. In: Munro, J.I. (ed.) Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, pp. 30–39. SIAM (2004) Chazelle, B., Kilian, J., Rubinfeld, R., Tal, A.: The Bloomier filter: an efficient data structure for static support lookup tables. In: Munro, J.I. (ed.) Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, pp. 30–39. SIAM (2004)
20.
Zurück zum Zitat Belazzougui, D., Botelho, F.C., Dietzfelbinger, M.: Hash, displace, and compress. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 682–693. Springer, Heidelberg (2009) CrossRef Belazzougui, D., Botelho, F.C., Dietzfelbinger, M.: Hash, displace, and compress. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 682–693. Springer, Heidelberg (2009) CrossRef
22.
Zurück zum Zitat Dietzfelbinger, M.: Design strategies for minimal perfect hash functions. In: Hromkovič, J., Královič, R., Nunkesser, M., Widmayer, P. (eds.) SAGA 2007. LNCS, vol. 4665, pp. 2–17. Springer, Heidelberg (2007) CrossRef Dietzfelbinger, M.: Design strategies for minimal perfect hash functions. In: Hromkovič, J., Královič, R., Nunkesser, M., Widmayer, P. (eds.) SAGA 2007. LNCS, vol. 4665, pp. 2–17. Springer, Heidelberg (2007) CrossRef
23.
Zurück zum Zitat Fredriksson, K., Nikitin, F.: Simple compression code supporting random access and fast string matching. In: Demetrescu, C. (ed.) WEA 2007. LNCS, vol. 4525, pp. 203–216. Springer, Heidelberg (2007) CrossRef Fredriksson, K., Nikitin, F.: Simple compression code supporting random access and fast string matching. In: Demetrescu, C. (ed.) WEA 2007. LNCS, vol. 4525, pp. 203–216. Springer, Heidelberg (2007) CrossRef
24.
Zurück zum Zitat Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practice of monotone minimal perfect hashing. ACM J. Exp. Algorithmic 16, 3.2:1–3.2:26 (2011)MathSciNet Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practice of monotone minimal perfect hashing. ACM J. Exp. Algorithmic 16, 3.2:1–3.2:26 (2011)MathSciNet
25.
Zurück zum Zitat Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Fast prefix search in little space, with applications. In: de Berg, M., Meyer, U. (eds.) ESA 2010, Part I. LNCS, vol. 6346, pp. 427–438. Springer, Heidelberg (2010) CrossRef Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Fast prefix search in little space, with applications. In: de Berg, M., Meyer, U. (eds.) ESA 2010, Part I. LNCS, vol. 6346, pp. 427–438. Springer, Heidelberg (2010) CrossRef
Metadaten
Titel
Minimal and Monotone Minimal Perfect Hash Functions
verfasst von
Paolo Boldi
Copyright-Jahr
2015
Verlag
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-48057-1_1