Skip to main content
Top
Published in: Neural Computing and Applications 5/2023

17-09-2020 | S.I. : Deep Geospatial Data Understanding

Efficient locality-sensitive hashing over high-dimensional streaming data

Authors: Hao Wang, Chengcheng Yang, Xiangliang Zhang, Xin Gao

Published in: Neural Computing and Applications | Issue 5/2023

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Approximate nearest neighbor (ANN) search in high-dimensional spaces is fundamental in many applications. Locality-sensitive hashing (LSH) is a well-known methodology to solve the ANN problem. Existing LSH-based ANN solutions typically employ a large number of individual indexes optimized for searching efficiency. Updating such indexes might be impractical when processing high-dimensional streaming data. In this paper, we present a novel disk-based LSH index that offers efficient support for both searches and updates. The contributions of our work are threefold. First, we use the write-friendly LSM-trees to store the LSH projections to facilitate efficient updates. Second, we develop a novel estimation scheme to estimate the number of required LSH functions, with which the disk storage and access costs are effectively reduced. Third, we exploit both the collision number and the projection distance to improve the efficiency of candidate selection, improving the search performance with theoretical guarantees on the result quality. Experiments on four real-world datasets show that our proposal outperforms the state-of-the-art schemes.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Footnotes
1
Note that Algorithm  1 sets K proportional to the total number of objects, \(|\mathcal {D}|\). The ratio, denoted \(\beta\), controls the false-positive rate during the search.
 
2
That is, SSPD values of different data objects are considered distinct elements in \(A_{s,j}\) even though they might be equal. Thus, the size of the multiset \(A_{s,j}\) is precisely the number of objects colliding at least j times with the query (within radius \(R=1\)).
 
Literature
1.
go back to reference Weber R, Schek H-J, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB, pp 194–205 Weber R, Schek H-J, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB, pp 194–205
2.
go back to reference Indyk P ,Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC, pp 604–613 Indyk P ,Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC, pp 604–613
3.
go back to reference Li Q, Sun Z, He R, Tan T (2017) Deep supervised discrete hashing. In: NIPS, pp 2482–2491 Li Q, Sun Z, He R, Tan T (2017) Deep supervised discrete hashing. In: NIPS, pp 2482–2491
4.
go back to reference Gao J, Jagadish HV, Lu W,Ooi BC (2014) DSH: data sensitive hashing for high-dimensional k-NN search. In: SIGMOD, pp 1127–1138 Gao J, Jagadish HV, Lu W,Ooi BC (2014) DSH: data sensitive hashing for high-dimensional k-NN search. In: SIGMOD, pp 1127–1138
5.
go back to reference Zhao K, Lu H, Mei J (2014) Locality preserving hashing. In: AAAI, pp 2874–2881 Zhao K, Lu H, Mei J (2014) Locality preserving hashing. In: AAAI, pp 2874–2881
6.
go back to reference Gao J., Jagadish HV, Ooi BC, Wang S (2015) Selective hashing: closing the gap between radius search and k-nn search. In: SIGKDD, pp 349–358 Gao J., Jagadish HV, Ooi BC, Wang S (2015) Selective hashing: closing the gap between radius search and k-nn search. In: SIGKDD, pp 349–358
7.
go back to reference Lv Q, Josephson W, Wang Z, Charikar M, Li K (2007) Multi-probe LSH: efficient indexing for high-dimensional similarity search. In: VLDB, pp 950–961 Lv Q, Josephson W, Wang Z, Charikar M, Li K (2007) Multi-probe LSH: efficient indexing for high-dimensional similarity search. In: VLDB, pp 950–961
8.
go back to reference Gan J, Feng J, Fang Q, Ng W (2012) Locality-sensitive hashing scheme based on dynamic collision counting. In: SIGMOD, pages 541–552 Gan J, Feng J, Fang Q, Ng W (2012) Locality-sensitive hashing scheme based on dynamic collision counting. In: SIGMOD, pages 541–552
9.
go back to reference Huang Q, Feng J, Fang Q, Ng W, Wang W (2017) Query-aware locality-sensitive hashing scheme for l\({}_{{p}}\) norm. VLDB J 26(5):683–708CrossRef Huang Q, Feng J, Fang Q, Ng W, Wang W (2017) Query-aware locality-sensitive hashing scheme for l\({}_{{p}}\) norm. VLDB J 26(5):683–708CrossRef
10.
go back to reference Sun Y, Wang W, Qin J, Zhang Y, Lin X (2014) SRS: solving c-approximate nearest neighbor queries in high dimensional euclidean space with a tiny index. PVLDB 8(1):1–12 Sun Y, Wang W, Qin J, Zhang Y, Lin X (2014) SRS: solving c-approximate nearest neighbor queries in high dimensional euclidean space with a tiny index. PVLDB 8(1):1–12
11.
go back to reference Gama J, Sebastião R, Rodrigues PP (2013) On evaluating stream learning algorithms. Mach Learn 90(3):317–346CrossRefMATH Gama J, Sebastião R, Rodrigues PP (2013) On evaluating stream learning algorithms. Mach Learn 90(3):317–346CrossRefMATH
12.
go back to reference Zhai T, Gao Y, Wang H, Cao L (2017) Classification of high-dimensional evolving data streams via a resource-efficient online ensemble. Data Min Knowl Disc 31:1242–1265CrossRefMATH Zhai T, Gao Y, Wang H, Cao L (2017) Classification of high-dimensional evolving data streams via a resource-efficient online ensemble. Data Min Knowl Disc 31:1242–1265CrossRefMATH
13.
go back to reference Andoni A, Indyk P, Laarhoven T, Razenshteyn IP, Schmidt L (2015) Practical and optimal LSH for angular distance. In: NIPS, pp 1225–1233 Andoni A, Indyk P, Laarhoven T, Razenshteyn IP, Schmidt L (2015) Practical and optimal LSH for angular distance. In: NIPS, pp 1225–1233
14.
go back to reference Eshghi K, Rajaram S (2008) Locality sensitive hash functions based on concomitant rank order statistics. In: SIGKDD, pp 221–229 Eshghi K, Rajaram S (2008) Locality sensitive hash functions based on concomitant rank order statistics. In: SIGKDD, pp 221–229
15.
go back to reference Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: SoCG, pp 253–262 Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: SoCG, pp 253–262
16.
go back to reference O’Neil P, Cheng E, Gawlick D, O’Neil E (1996) The log-structured merge-tree (LSM-tree). Acta Informatica 33(4):351–385CrossRefMATH O’Neil P, Cheng E, Gawlick D, O’Neil E (1996) The log-structured merge-tree (LSM-tree). Acta Informatica 33(4):351–385CrossRefMATH
17.
go back to reference Tao Y, Yi K, Sheng C, Kalnis P (2010) Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans Database Syst 35(35):20:1–20:4620:46 Tao Y, Yi K, Sheng C, Kalnis P (2010) Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans Database Syst 35(35):20:1–20:4620:46
18.
go back to reference Zheng Y, Guo Q, Tung Anthony KH, Wu S (2016) LazyLSH: approximate nearest neighbor search for multiple distance functions with a single index. In: SIGMOD, pp 2023–2037 Zheng Y, Guo Q, Tung Anthony KH, Wu S (2016) LazyLSH: approximate nearest neighbor search for multiple distance functions with a single index. In: SIGMOD, pp 2023–2037
19.
go back to reference Panigrahy R (2006) Entropy based nearest neighbor search in high dimensions. In: SODA, pp 1186–1195 Panigrahy R (2006) Entropy based nearest neighbor search in high dimensions. In: SODA, pp 1186–1195
20.
go back to reference Liu Y, Cui J, Huang Z, Li H, Shen HT (2014) SK-LSH: an efficient index structure for approximate nearest neighbor search. PVLDB 7(9):745–756 Liu Y, Cui J, Huang Z, Li H, Shen HT (2014) SK-LSH: an efficient index structure for approximate nearest neighbor search. PVLDB 7(9):745–756
21.
go back to reference Chu C, Gong D, Chen K, Guo Y, Han J, Ding G (2019) Optimized projection for hashing. Pattern Recognit Lett 117:169–178CrossRef Chu C, Gong D, Chen K, Guo Y, Han J, Ding G (2019) Optimized projection for hashing. Pattern Recognit Lett 117:169–178CrossRef
22.
go back to reference Liu X, Nie X, Wang Y, Yin Y (2019) Jointly multiple hash learning. In: AAAI, pp 9981–9982 Liu X, Nie X, Wang Y, Yin Y (2019) Jointly multiple hash learning. In: AAAI, pp 9981–9982
23.
go back to reference Dayan Ni, Athanassoulis M, Idreos S (2017) Monkey: optimal navigable key-value store. In: SIGMOD, pp 79–94 Dayan Ni, Athanassoulis M, Idreos S (2017) Monkey: optimal navigable key-value store. In: SIGMOD, pp 79–94
24.
go back to reference Liu W, Wang H, Zhang Y, Wang W, Qin L (2019) I-LSH: I/O efficient c-approximate nearest neighbor search in high-dimensional space. In: ICDE, pp 1670–1673 Liu W, Wang H, Zhang Y, Wang W, Qin L (2019) I-LSH: I/O efficient c-approximate nearest neighbor search in high-dimensional space. In: ICDE, pp 1670–1673
25.
go back to reference Jégou H, Douze M, Schmid C (2011) Product quantization for nearest neighbor search. IEEE Trans Pattern Anal Mach Intell 33(1):117–128CrossRef Jégou H, Douze M, Schmid C (2011) Product quantization for nearest neighbor search. IEEE Trans Pattern Anal Mach Intell 33(1):117–128CrossRef
Metadata
Title
Efficient locality-sensitive hashing over high-dimensional streaming data
Authors
Hao Wang
Chengcheng Yang
Xiangliang Zhang
Xin Gao
Publication date
17-09-2020
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 5/2023
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05336-1

Other articles of this Issue 5/2023

Neural Computing and Applications 5/2023 Go to the issue

Premium Partner