Skip to main content
Erschienen in: The Journal of Supercomputing 1/2020

30.10.2019

LSH-based distributed similarity indexing with load balancing in high-dimensional space

verfasst von: Jiagao Wu, Lu Shen, Linfeng Liu

Erschienen in: The Journal of Supercomputing | Ausgabe 1/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Locality-sensitive hashing (LSH) and its variants are well-known indexing schemes for solving the similarity search problem in high-dimensional space. Traditionally, these indexing schemes are centrally managed and multiple hash tables are needed to guarantee the search quality. However, due to the limitation of storage space and processing capacity of the server, the centralized indexing schemes become impractical for massive data objects. Therefore, several distributed indexing schemes based on peer-to-peer (P2P) networks are proposed, whereas how to ensure load balancing is still one of the key issues. To solve the problem, in this paper, we propose two theoretical LSH-based data distribution models in P2P networks for datasets with homogeneous and heterogeneous \(l_2\) norms, respectively. Unlike earlier schemes, to our knowledge, we focus on load balancing for a single hash table rather than multiple tables, which has not been considered previously. Then, we propose a static distributed indexing scheme with a novel load balancing indexing mapping method based on the cumulative distribution function by our models. Furthermore, we propose a dynamic load rebalancing algorithm using virtual node method of P2P networks to make the static indexing scheme more practical and robust. The experiments based on synthetic and real datasets show that the proposed distributed similarity indexing schemes are effective and efficient for load balancing in similarity indexing of high-dimensional space.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Tao Y, Yi K, Sheng C, Kalnis P (2010) Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans Database Syst 35(3):1–20CrossRef Tao Y, Yi K, Sheng C, Kalnis P (2010) Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans Database Syst 35(3):1–20CrossRef
2.
Zurück zum Zitat Gionis A, Indyk P, Motwani R (1999) Similarity search in high dimensions via hashing. VLDB 99(6):518–529 Gionis A, Indyk P, Motwani R (1999) Similarity search in high dimensions via hashing. VLDB 99(6):518–529
3.
Zurück zum Zitat Hu W, Fan Y, Xing J et al (2018) Deep constrained siamese hash coding network and load-balanced locality-sensitive hashing for near duplicate image detection. IEEE Trans Image Process 27:4452–4464MathSciNetCrossRef Hu W, Fan Y, Xing J et al (2018) Deep constrained siamese hash coding network and load-balanced locality-sensitive hashing for near duplicate image detection. IEEE Trans Image Process 27:4452–4464MathSciNetCrossRef
4.
Zurück zum Zitat Gan J, Feng J, Fang Q, Ng W (2012) Locality-sensitive hashing scheme based on dynamic collision counting. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, New York, pp 541–552 Gan J, Feng J, Fang Q, Ng W (2012) Locality-sensitive hashing scheme based on dynamic collision counting. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, New York, pp 541–552
5.
Zurück zum Zitat Shen L, Wu J, Wang Y, Feng L (2018) Towards load balancing for LSH-based distributed similarity indexing in high-dimensional space. In; IEEE 20th International Conference on High Performance Computing and Communications and IEEE 16th International Conference on Smart City and IEEE 4th International Conference on Data Science and Systems, pp 384–391 Shen L, Wu J, Wang Y, Feng L (2018) Towards load balancing for LSH-based distributed similarity indexing in high-dimensional space. In; IEEE 20th International Conference on High Performance Computing and Communications and IEEE 16th International Conference on Smart City and IEEE 4th International Conference on Data Science and Systems, pp 384–391
6.
Zurück zum Zitat Panigrahy R (2006) Entropy based nearest neighbor search in high dimensions. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm. Society for Industrial and Applied Mathematics, Philadelphia, pp 1186–1195 Panigrahy R (2006) Entropy based nearest neighbor search in high dimensions. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm. Society for Industrial and Applied Mathematics, Philadelphia, pp 1186–1195
7.
Zurück zum Zitat Lv Q, Josephson W, Wang Z, Charikar M, Li K (2007) Multi-probe LSH: efficient indexing for high-dimensional similarity search. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, Rio de Janeiro, pp 950–961 Lv Q, Josephson W, Wang Z, Charikar M, Li K (2007) Multi-probe LSH: efficient indexing for high-dimensional similarity search. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, Rio de Janeiro, pp 950–961
8.
Zurück zum Zitat Christiani T (2017) A framework for similarity search with space-time tradeoffs using locality-sensitive filtering. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, New York, pp 31–46 Christiani T (2017) A framework for similarity search with space-time tradeoffs using locality-sensitive filtering. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, New York, pp 31–46
9.
Zurück zum Zitat Kraus N, Carmel D, Keidar I, Orenbach M (2016) Nearbucket-LSH: efficient similarity search in P2P networks. In: International Conference on Similarity Search and Applications. Springer, Cham, pp 236–249 Kraus N, Carmel D, Keidar I, Orenbach M (2016) Nearbucket-LSH: efficient similarity search in P2P networks. In: International Conference on Similarity Search and Applications. Springer, Cham, pp 236–249
10.
Zurück zum Zitat Karapiperis D, Gkoulalas-Divanis A, Verykios V (2016) LSHDB: a parallel and distributed engine for record linkage and similarity search In: IEEE 16th International Conference on Data Mining Workshops (ICDMW). IEEE, New York, pp 1–4 Karapiperis D, Gkoulalas-Divanis A, Verykios V (2016) LSHDB: a parallel and distributed engine for record linkage and similarity search In: IEEE 16th International Conference on Data Mining Workshops (ICDMW). IEEE, New York, pp 1–4
11.
Zurück zum Zitat Qi L et al (2017) A distributed locality-sensitive hashing-based approach for cloud service recommendation from multi-source data. IEEE J Sel Areas Commun 35(11):2616–2624CrossRef Qi L et al (2017) A distributed locality-sensitive hashing-based approach for cloud service recommendation from multi-source data. IEEE J Sel Areas Commun 35(11):2616–2624CrossRef
12.
Zurück zum Zitat Zhai D et al (2018) Supervised distributed hashing for large-scale multimedia retrieval. IEEE Trans Multimedia 20(3):675–686CrossRef Zhai D et al (2018) Supervised distributed hashing for large-scale multimedia retrieval. IEEE Trans Multimedia 20(3):675–686CrossRef
13.
Zurück zum Zitat Liao J, Yang D, Li T, Qi Q, Wang J, Sun H (2016) Fusion feature for LSH-based image retrieval in a cloud datacenter. Multimedia Tools Appl 75:15405–15427CrossRef Liao J, Yang D, Li T, Qi Q, Wang J, Sun H (2016) Fusion feature for LSH-based image retrieval in a cloud datacenter. Multimedia Tools Appl 75:15405–15427CrossRef
14.
Zurück zum Zitat Chuang Y-T, Yu C-Y, Wu Q-W (2018) DSLM: a decentralized search for large and mobile networks. J Supercomput 74:738–767CrossRef Chuang Y-T, Yu C-Y, Wu Q-W (2018) DSLM: a decentralized search for large and mobile networks. J Supercomput 74:738–767CrossRef
15.
Zurück zum Zitat Bahmani B, Goel A, Shinde R (2012) Efficient distributed locality sensitive hashing. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, New York, pp 2174–2178 Bahmani B, Goel A, Shinde R (2012) Efficient distributed locality sensitive hashing. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, New York, pp 2174–2178
16.
Zurück zum Zitat Wadhwa S, Gupta P (2010) Distributed locality sensitivity hashing. In: Consumer Communications and Networking Conference, pp 1–4 Wadhwa S, Gupta P (2010) Distributed locality sensitivity hashing. In: Consumer Communications and Networking Conference, pp 1–4
17.
Zurück zum Zitat Haghani P, Michel S, Aberer K (2009) Distributed similarity search in high dimensions using locality sensitive hashing. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. ACM, New York, pp 744–755 Haghani P, Michel S, Aberer K (2009) Distributed similarity search in high dimensions using locality sensitive hashing. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. ACM, New York, pp 744–755
18.
Zurück zum Zitat Raghu G et al (2018) Memory-based load balancing algorithm in structured peer-to-peer system. In: Progress in Intelligent Computing Techniques: Theory, Practice, and Applications. Springer, Berlin, pp 431–439 Raghu G et al (2018) Memory-based load balancing algorithm in structured peer-to-peer system. In: Progress in Intelligent Computing Techniques: Theory, Practice, and Applications. Springer, Berlin, pp 431–439
19.
Zurück zum Zitat Lee KM, Jeong Y-S, Lee SH, Lee KM (2019) Bucket-size balancing locality sensitive hashing using the map reduce paradigm. Clust Comput J Netw Softw Tools Appl 22:S1959–S1971 Lee KM, Jeong Y-S, Lee SH, Lee KM (2019) Bucket-size balancing locality sensitive hashing using the map reduce paradigm. Clust Comput J Netw Softw Tools Appl 22:S1959–S1971
20.
Zurück zum Zitat Wu X, Zeng Y, Lin G (2017) An energy efficient VM migration algorithm in data centers. In: Proceedings of IEEE 16th International Symposium on Distributed Computing and Applications to Business, Engineering and Science, pp 27–30 Wu X, Zeng Y, Lin G (2017) An energy efficient VM migration algorithm in data centers. In: Proceedings of IEEE 16th International Symposium on Distributed Computing and Applications to Business, Engineering and Science, pp 27–30
21.
Zurück zum Zitat Chen H, Zhu J, Zhang Z, Ma M, Shen X (2017) Real-time workflows oriented online scheduling in uncertain cloud environment. J Supercomput 73:4906–4922CrossRef Chen H, Zhu J, Zhang Z, Ma M, Shen X (2017) Real-time workflows oriented online scheduling in uncertain cloud environment. J Supercomput 73:4906–4922CrossRef
22.
Zurück zum Zitat Qureshi B (2019) Profile-based power-aware workflow scheduling framework for energy-efficient data centers. Future Gener Comput Syst 94:453–467CrossRef Qureshi B (2019) Profile-based power-aware workflow scheduling framework for energy-efficient data centers. Future Gener Comput Syst 94:453–467CrossRef
23.
Zurück zum Zitat Haghani P, Michel S, CudreMauroux P, Aberer K (2008) LSH at large-distributed KNN search in high dimensions. In: WebDB Haghani P, Michel S, CudreMauroux P, Aberer K (2008) LSH at large-distributed KNN search in high dimensions. In: WebDB
24.
Zurück zum Zitat Datar M, Immorlica N, Indyk P, Mirrokni V (2004) Locality sensitive hashing scheme based on p-stable distributions. In: Proceedings of the Twentieth Annual Symposium on Computational Geometry. ACM, New York, pp 253–262 Datar M, Immorlica N, Indyk P, Mirrokni V (2004) Locality sensitive hashing scheme based on p-stable distributions. In: Proceedings of the Twentieth Annual Symposium on Computational Geometry. ACM, New York, pp 253–262
25.
Zurück zum Zitat Balakrishnan H, Kaashoek F, Karger D, Morris R, Stoica I (2003) Looking up data in P2P systems. Commun ACM 46(2):43–48CrossRef Balakrishnan H, Kaashoek F, Karger D, Morris R, Stoica I (2003) Looking up data in P2P systems. Commun ACM 46(2):43–48CrossRef
26.
Zurück zum Zitat Godfrey B, Lakshminarayanan K, Surana S, Karp R, Stoica I (2004) Load balancing in dynamic structured P2P systems. In: Twenty-Third AnnualJoint Conference of the IEEE Computer and Communications Societies, INFOCOM. pp 2253–2262 Godfrey B, Lakshminarayanan K, Surana S, Karp R, Stoica I (2004) Load balancing in dynamic structured P2P systems. In: Twenty-Third AnnualJoint Conference of the IEEE Computer and Communications Societies, INFOCOM. pp 2253–2262
27.
Zurück zum Zitat Pitoura T, Triantafillou P (2007) Load distribution fairness in P2P data management systems. In: IEEE 23rd International Conference on Data Engineering, ICDE. IEEE, New York, pp 396–405 Pitoura T, Triantafillou P (2007) Load distribution fairness in P2P data management systems. In: IEEE 23rd International Conference on Data Engineering, ICDE. IEEE, New York, pp 396–405
Metadaten
Titel
LSH-based distributed similarity indexing with load balancing in high-dimensional space
verfasst von
Jiagao Wu
Lu Shen
Linfeng Liu
Publikationsdatum
30.10.2019
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 1/2020
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-019-03047-6

Weitere Artikel der Ausgabe 1/2020

The Journal of Supercomputing 1/2020 Zur Ausgabe