Skip to main content

2020 | OriginalPaper | Buchkapitel

Confirmation Sampling for Exact Nearest Neighbor Search

verfasst von : Tobias Christiani, Rasmus Pagh, Mikkel Thorup

Erschienen in: Similarity Search and Applications

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STOC ’98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest neighbor problem, in practice LSH data structures with suitably chosen parameters are used to solve the exact nearest neighbor problem (with some error probability). Sublinear query time is often possible in practice even for exact nearest neighbor search, intuitively because the nearest neighbor tends to be significantly closer than other data points. However, theory offers little advice on how to choose LSH parameters outside of pre-specified worst-case settings.
We introduce the technique of confirmation sampling for solving the exact nearest neighbor problem using LSH. First, we give a general reduction that transforms a sequence of data structures that each find the nearest neighbor with a small, unknown probability, into a data structure that returns the nearest neighbor with probability \(1-\delta \), using as few queries as possible. Second, we present a new query algorithm for the LSH Forest data structure with L trees that is able to return the exact nearest neighbor of a query point within the same time bound as an LSH Forest of \(\varOmega (L)\) trees with internal parameters specifically tuned to the query and data.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
The sampling of a random element ensures compatibility with ConfirmationSampling, which requires a sample to be returned even if there is no hash collision. It is not really necessary from an algorithmic viewpoint, but also does not hurt the asymptotic performance.
 
2
For every choice of constant \(c \ge 1\) there exists a constant \(n_0\) such that for \(n \ge n_0\) we can obtain success probability \(1 - 1/n^c\) where \(n = |P|\) denotes the size of the set of data points.
 
Literatur
1.
Zurück zum Zitat Andoni, A., Indyk, P., Laarhoven, T., Razenshteyn, I., Schmidt, L.: Practical and optimal LSH for angular distance. In: Proceedings of the NIPS 2015, pp. 1225–1233 (2015) Andoni, A., Indyk, P., Laarhoven, T., Razenshteyn, I., Schmidt, L.: Practical and optimal LSH for angular distance. In: Proceedings of the NIPS 2015, pp. 1225–1233 (2015)
2.
Zurück zum Zitat Andoni, A., Laarhoven, T., Razenshteyn, I.P., Waingarten, E.: Optimal hashing-based time-space trade-offs for approximate near neighbors. In: Proceedings of the SODA 2017, pp. 47–66 (2017) Andoni, A., Laarhoven, T., Razenshteyn, I.P., Waingarten, E.: Optimal hashing-based time-space trade-offs for approximate near neighbors. In: Proceedings of the SODA 2017, pp. 47–66 (2017)
3.
Zurück zum Zitat Andoni, A., Razenshteyn, I.: Optimal data-dependent hashing for approximate near neighbors. In: Proceedings of the STOC 2015, pp. 793–801 (2015) Andoni, A., Razenshteyn, I.: Optimal data-dependent hashing for approximate near neighbors. In: Proceedings of the STOC 2015, pp. 793–801 (2015)
5.
Zurück zum Zitat Aumüller, M., Christiani, T., Pagh, R., Vesterli, M.: PUFFINN: parameterless and universally fast finding of nearest neighbors. In: Proceedings of the ESA 2019. LIPIcs, vol. 144, pp. 10:1–10:16 (2019) Aumüller, M., Christiani, T., Pagh, R., Vesterli, M.: PUFFINN: parameterless and universally fast finding of nearest neighbors. In: Proceedings of the ESA 2019. LIPIcs, vol. 144, pp. 10:1–10:16 (2019)
6.
Zurück zum Zitat Bawa, M., Condie, T., Ganesan, P.: LSH forest: self-tuning indexes for similarity search. In: Proceedings of the WWW 2005, pp. 651–660 (2005) Bawa, M., Condie, T., Ganesan, P.: LSH forest: self-tuning indexes for similarity search. In: Proceedings of the WWW 2005, pp. 651–660 (2005)
7.
Zurück zum Zitat Charikar, M.: Similarity estimation techniques from rounding algorithms. In: Proceedings of the STOC 2002, pp. 380–388 (2002) Charikar, M.: Similarity estimation techniques from rounding algorithms. In: Proceedings of the STOC 2002, pp. 380–388 (2002)
10.
Zurück zum Zitat Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the SOCG 2004, pp. 253–262 (2004) Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the SOCG 2004, pp. 253–262 (2004)
11.
Zurück zum Zitat Dong, W., Wang, Z., Josephson, W., Charikar, M., Li, K.: Modeling LSH for performance tuning. In: Proceedings of the CIKM 2008, pp. 669–678 (2008) Dong, W., Wang, Z., Josephson, W., Charikar, M., Li, K.: Modeling LSH for performance tuning. In: Proceedings of the CIKM 2008, pp. 669–678 (2008)
12.
Zurück zum Zitat Har-Peled, S., Indyk, P., Motwani, R.: Approximate nearest neighbor: towards removing the curse of dimensionality. Theor. Comput. 8(1), 321–350 (2012)MathSciNetCrossRef Har-Peled, S., Indyk, P., Motwani, R.: Approximate nearest neighbor: towards removing the curse of dimensionality. Theor. Comput. 8(1), 321–350 (2012)MathSciNetCrossRef
13.
Zurück zum Zitat Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the STOC 1998, pp. 604–613 (1998) Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the STOC 1998, pp. 604–613 (1998)
14.
Zurück zum Zitat Li, P., König, A.C.: Theory and applications of b-bit minwise hashing. Commun. ACM 54(8), 101–109 (2011)CrossRef Li, P., König, A.C.: Theory and applications of b-bit minwise hashing. Commun. ACM 54(8), 101–109 (2011)CrossRef
15.
Zurück zum Zitat Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Intelligent probing for locality sensitive hashing: multi-probe LSH and beyond. PVLDB 10(12), 2021–2024 (2017) Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Intelligent probing for locality sensitive hashing: multi-probe LSH and beyond. PVLDB 10(12), 2021–2024 (2017)
16.
Zurück zum Zitat Panigrahy, R.: Entropy based nearest neighbor search in high dimensions. In: Proceedings of the SODA 2006, pp. 1186–1195 (2006) Panigrahy, R.: Entropy based nearest neighbor search in high dimensions. In: Proceedings of the SODA 2006, pp. 1186–1195 (2006)
17.
Zurück zum Zitat Slaney, M., Lifshits, Y., He, J.: Optimal parameters for locality-sensitive hashing. Proc. IEEE 100(9), 2604–2623 (2012)CrossRef Slaney, M., Lifshits, Y., He, J.: Optimal parameters for locality-sensitive hashing. Proc. IEEE 100(9), 2604–2623 (2012)CrossRef
Metadaten
Titel
Confirmation Sampling for Exact Nearest Neighbor Search
verfasst von
Tobias Christiani
Rasmus Pagh
Mikkel Thorup
Copyright-Jahr
2020
DOI
https://doi.org/10.1007/978-3-030-60936-8_8