Skip to main content
Top
Published in: The Journal of Supercomputing 1/2020

30-10-2019

LSH-based distributed similarity indexing with load balancing in high-dimensional space

Authors: Jiagao Wu, Lu Shen, Linfeng Liu

Published in: The Journal of Supercomputing | Issue 1/2020

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Locality-sensitive hashing (LSH) and its variants are well-known indexing schemes for solving the similarity search problem in high-dimensional space. Traditionally, these indexing schemes are centrally managed and multiple hash tables are needed to guarantee the search quality. However, due to the limitation of storage space and processing capacity of the server, the centralized indexing schemes become impractical for massive data objects. Therefore, several distributed indexing schemes based on peer-to-peer (P2P) networks are proposed, whereas how to ensure load balancing is still one of the key issues. To solve the problem, in this paper, we propose two theoretical LSH-based data distribution models in P2P networks for datasets with homogeneous and heterogeneous \(l_2\) norms, respectively. Unlike earlier schemes, to our knowledge, we focus on load balancing for a single hash table rather than multiple tables, which has not been considered previously. Then, we propose a static distributed indexing scheme with a novel load balancing indexing mapping method based on the cumulative distribution function by our models. Furthermore, we propose a dynamic load rebalancing algorithm using virtual node method of P2P networks to make the static indexing scheme more practical and robust. The experiments based on synthetic and real datasets show that the proposed distributed similarity indexing schemes are effective and efficient for load balancing in similarity indexing of high-dimensional space.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Tao Y, Yi K, Sheng C, Kalnis P (2010) Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans Database Syst 35(3):1–20CrossRef Tao Y, Yi K, Sheng C, Kalnis P (2010) Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans Database Syst 35(3):1–20CrossRef
2.
go back to reference Gionis A, Indyk P, Motwani R (1999) Similarity search in high dimensions via hashing. VLDB 99(6):518–529 Gionis A, Indyk P, Motwani R (1999) Similarity search in high dimensions via hashing. VLDB 99(6):518–529
3.
go back to reference Hu W, Fan Y, Xing J et al (2018) Deep constrained siamese hash coding network and load-balanced locality-sensitive hashing for near duplicate image detection. IEEE Trans Image Process 27:4452–4464MathSciNetCrossRef Hu W, Fan Y, Xing J et al (2018) Deep constrained siamese hash coding network and load-balanced locality-sensitive hashing for near duplicate image detection. IEEE Trans Image Process 27:4452–4464MathSciNetCrossRef
4.
go back to reference Gan J, Feng J, Fang Q, Ng W (2012) Locality-sensitive hashing scheme based on dynamic collision counting. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, New York, pp 541–552 Gan J, Feng J, Fang Q, Ng W (2012) Locality-sensitive hashing scheme based on dynamic collision counting. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, New York, pp 541–552
5.
go back to reference Shen L, Wu J, Wang Y, Feng L (2018) Towards load balancing for LSH-based distributed similarity indexing in high-dimensional space. In; IEEE 20th International Conference on High Performance Computing and Communications and IEEE 16th International Conference on Smart City and IEEE 4th International Conference on Data Science and Systems, pp 384–391 Shen L, Wu J, Wang Y, Feng L (2018) Towards load balancing for LSH-based distributed similarity indexing in high-dimensional space. In; IEEE 20th International Conference on High Performance Computing and Communications and IEEE 16th International Conference on Smart City and IEEE 4th International Conference on Data Science and Systems, pp 384–391
6.
go back to reference Panigrahy R (2006) Entropy based nearest neighbor search in high dimensions. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm. Society for Industrial and Applied Mathematics, Philadelphia, pp 1186–1195 Panigrahy R (2006) Entropy based nearest neighbor search in high dimensions. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm. Society for Industrial and Applied Mathematics, Philadelphia, pp 1186–1195
7.
go back to reference Lv Q, Josephson W, Wang Z, Charikar M, Li K (2007) Multi-probe LSH: efficient indexing for high-dimensional similarity search. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, Rio de Janeiro, pp 950–961 Lv Q, Josephson W, Wang Z, Charikar M, Li K (2007) Multi-probe LSH: efficient indexing for high-dimensional similarity search. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, Rio de Janeiro, pp 950–961
8.
go back to reference Christiani T (2017) A framework for similarity search with space-time tradeoffs using locality-sensitive filtering. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, New York, pp 31–46 Christiani T (2017) A framework for similarity search with space-time tradeoffs using locality-sensitive filtering. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, New York, pp 31–46
9.
go back to reference Kraus N, Carmel D, Keidar I, Orenbach M (2016) Nearbucket-LSH: efficient similarity search in P2P networks. In: International Conference on Similarity Search and Applications. Springer, Cham, pp 236–249 Kraus N, Carmel D, Keidar I, Orenbach M (2016) Nearbucket-LSH: efficient similarity search in P2P networks. In: International Conference on Similarity Search and Applications. Springer, Cham, pp 236–249
10.
go back to reference Karapiperis D, Gkoulalas-Divanis A, Verykios V (2016) LSHDB: a parallel and distributed engine for record linkage and similarity search In: IEEE 16th International Conference on Data Mining Workshops (ICDMW). IEEE, New York, pp 1–4 Karapiperis D, Gkoulalas-Divanis A, Verykios V (2016) LSHDB: a parallel and distributed engine for record linkage and similarity search In: IEEE 16th International Conference on Data Mining Workshops (ICDMW). IEEE, New York, pp 1–4
11.
go back to reference Qi L et al (2017) A distributed locality-sensitive hashing-based approach for cloud service recommendation from multi-source data. IEEE J Sel Areas Commun 35(11):2616–2624CrossRef Qi L et al (2017) A distributed locality-sensitive hashing-based approach for cloud service recommendation from multi-source data. IEEE J Sel Areas Commun 35(11):2616–2624CrossRef
12.
go back to reference Zhai D et al (2018) Supervised distributed hashing for large-scale multimedia retrieval. IEEE Trans Multimedia 20(3):675–686CrossRef Zhai D et al (2018) Supervised distributed hashing for large-scale multimedia retrieval. IEEE Trans Multimedia 20(3):675–686CrossRef
13.
go back to reference Liao J, Yang D, Li T, Qi Q, Wang J, Sun H (2016) Fusion feature for LSH-based image retrieval in a cloud datacenter. Multimedia Tools Appl 75:15405–15427CrossRef Liao J, Yang D, Li T, Qi Q, Wang J, Sun H (2016) Fusion feature for LSH-based image retrieval in a cloud datacenter. Multimedia Tools Appl 75:15405–15427CrossRef
14.
go back to reference Chuang Y-T, Yu C-Y, Wu Q-W (2018) DSLM: a decentralized search for large and mobile networks. J Supercomput 74:738–767CrossRef Chuang Y-T, Yu C-Y, Wu Q-W (2018) DSLM: a decentralized search for large and mobile networks. J Supercomput 74:738–767CrossRef
15.
go back to reference Bahmani B, Goel A, Shinde R (2012) Efficient distributed locality sensitive hashing. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, New York, pp 2174–2178 Bahmani B, Goel A, Shinde R (2012) Efficient distributed locality sensitive hashing. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, New York, pp 2174–2178
16.
go back to reference Wadhwa S, Gupta P (2010) Distributed locality sensitivity hashing. In: Consumer Communications and Networking Conference, pp 1–4 Wadhwa S, Gupta P (2010) Distributed locality sensitivity hashing. In: Consumer Communications and Networking Conference, pp 1–4
17.
go back to reference Haghani P, Michel S, Aberer K (2009) Distributed similarity search in high dimensions using locality sensitive hashing. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. ACM, New York, pp 744–755 Haghani P, Michel S, Aberer K (2009) Distributed similarity search in high dimensions using locality sensitive hashing. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. ACM, New York, pp 744–755
18.
go back to reference Raghu G et al (2018) Memory-based load balancing algorithm in structured peer-to-peer system. In: Progress in Intelligent Computing Techniques: Theory, Practice, and Applications. Springer, Berlin, pp 431–439 Raghu G et al (2018) Memory-based load balancing algorithm in structured peer-to-peer system. In: Progress in Intelligent Computing Techniques: Theory, Practice, and Applications. Springer, Berlin, pp 431–439
19.
go back to reference Lee KM, Jeong Y-S, Lee SH, Lee KM (2019) Bucket-size balancing locality sensitive hashing using the map reduce paradigm. Clust Comput J Netw Softw Tools Appl 22:S1959–S1971 Lee KM, Jeong Y-S, Lee SH, Lee KM (2019) Bucket-size balancing locality sensitive hashing using the map reduce paradigm. Clust Comput J Netw Softw Tools Appl 22:S1959–S1971
20.
go back to reference Wu X, Zeng Y, Lin G (2017) An energy efficient VM migration algorithm in data centers. In: Proceedings of IEEE 16th International Symposium on Distributed Computing and Applications to Business, Engineering and Science, pp 27–30 Wu X, Zeng Y, Lin G (2017) An energy efficient VM migration algorithm in data centers. In: Proceedings of IEEE 16th International Symposium on Distributed Computing and Applications to Business, Engineering and Science, pp 27–30
21.
go back to reference Chen H, Zhu J, Zhang Z, Ma M, Shen X (2017) Real-time workflows oriented online scheduling in uncertain cloud environment. J Supercomput 73:4906–4922CrossRef Chen H, Zhu J, Zhang Z, Ma M, Shen X (2017) Real-time workflows oriented online scheduling in uncertain cloud environment. J Supercomput 73:4906–4922CrossRef
22.
go back to reference Qureshi B (2019) Profile-based power-aware workflow scheduling framework for energy-efficient data centers. Future Gener Comput Syst 94:453–467CrossRef Qureshi B (2019) Profile-based power-aware workflow scheduling framework for energy-efficient data centers. Future Gener Comput Syst 94:453–467CrossRef
23.
go back to reference Haghani P, Michel S, CudreMauroux P, Aberer K (2008) LSH at large-distributed KNN search in high dimensions. In: WebDB Haghani P, Michel S, CudreMauroux P, Aberer K (2008) LSH at large-distributed KNN search in high dimensions. In: WebDB
24.
go back to reference Datar M, Immorlica N, Indyk P, Mirrokni V (2004) Locality sensitive hashing scheme based on p-stable distributions. In: Proceedings of the Twentieth Annual Symposium on Computational Geometry. ACM, New York, pp 253–262 Datar M, Immorlica N, Indyk P, Mirrokni V (2004) Locality sensitive hashing scheme based on p-stable distributions. In: Proceedings of the Twentieth Annual Symposium on Computational Geometry. ACM, New York, pp 253–262
25.
go back to reference Balakrishnan H, Kaashoek F, Karger D, Morris R, Stoica I (2003) Looking up data in P2P systems. Commun ACM 46(2):43–48CrossRef Balakrishnan H, Kaashoek F, Karger D, Morris R, Stoica I (2003) Looking up data in P2P systems. Commun ACM 46(2):43–48CrossRef
26.
go back to reference Godfrey B, Lakshminarayanan K, Surana S, Karp R, Stoica I (2004) Load balancing in dynamic structured P2P systems. In: Twenty-Third AnnualJoint Conference of the IEEE Computer and Communications Societies, INFOCOM. pp 2253–2262 Godfrey B, Lakshminarayanan K, Surana S, Karp R, Stoica I (2004) Load balancing in dynamic structured P2P systems. In: Twenty-Third AnnualJoint Conference of the IEEE Computer and Communications Societies, INFOCOM. pp 2253–2262
27.
go back to reference Pitoura T, Triantafillou P (2007) Load distribution fairness in P2P data management systems. In: IEEE 23rd International Conference on Data Engineering, ICDE. IEEE, New York, pp 396–405 Pitoura T, Triantafillou P (2007) Load distribution fairness in P2P data management systems. In: IEEE 23rd International Conference on Data Engineering, ICDE. IEEE, New York, pp 396–405
Metadata
Title
LSH-based distributed similarity indexing with load balancing in high-dimensional space
Authors
Jiagao Wu
Lu Shen
Linfeng Liu
Publication date
30-10-2019
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 1/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-019-03047-6

Other articles of this Issue 1/2020

The Journal of Supercomputing 1/2020 Go to the issue

Premium Partner