Skip to main content

2016 | OriginalPaper | Buchkapitel

NearBucket-LSH: Efficient Similarity Search in P2P Networks

verfasst von : Naama Kraus, David Carmel, Idit Keidar, Meni Orenbach

Erschienen in: Similarity Search and Applications

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We present NearBucket-LSH, an effective algorithm for similarity search in large-scale distributed online social networks organized as peer-to-peer overlays. As communication is a dominant consideration in distributed systems, we focus on minimizing the network cost while guaranteeing good search quality. Our algorithm is based on Locality Sensitive Hashing (LSH), which limits the search to collections of objects, called buckets, that have a high probability to be similar to the query. More specifically, NearBucket-LSH employs an LSH extension that searches in near buckets, and improves search quality but also significantly increases the network cost. We decrease the network cost by considering the internals of both LSH and the P2P overlay, and harnessing their properties to our needs. We show that our NearBucket-LSH increases search quality for a given network cost compared to previous art. In many cases, the search quality increases by more than \(50\,\%\).

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Note that in a general c-dimensional CAN of N nodes, the expected routing length is \(c/4\left( N^{1/c}\right) \) [25], which equals k/2 for \(c=k\) and \(N=2^k\).
 
2
We transform cosine similarity into angular similarity and then apply the success probability formulas.
 
Literatur
1.
Zurück zum Zitat Adamic, L.A., Adar, E.: Friends and neighbors on the web. Soc. Netw. 25, 211–230 (2001)CrossRef Adamic, L.A., Adar, E.: Friends and neighbors on the web. Soc. Netw. 25, 211–230 (2001)CrossRef
2.
Zurück zum Zitat Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734–749 (2005)CrossRef Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734–749 (2005)CrossRef
3.
Zurück zum Zitat Anderson, A., Huttenlocher, D., Kleinberg, J., Leskovec, J.: Effects of user similarity in social media. WSDM 2012, pp. 703–712 (2012) Anderson, A., Huttenlocher, D., Kleinberg, J., Leskovec, J.: Effects of user similarity in social media. WSDM 2012, pp. 703–712 (2012)
4.
Zurück zum Zitat Bahmani, B., Goel, A., Shinde, R.: Efficient distributed locality sensitive hashing. In: CIKM 2012, pp. 2174–2178 (2012) Bahmani, B., Goel, A., Shinde, R.: Efficient distributed locality sensitive hashing. In: CIKM 2012, pp. 2174–2178 (2012)
5.
Zurück zum Zitat Batko, M., Novak, D., Falchi, F., Zezula, P.: Scalability comparison of peer-to-peer similarity search structures. Future Gener. Comp. Syst 24(8), 834–848 (2008)CrossRef Batko, M., Novak, D., Falchi, F., Zezula, P.: Scalability comparison of peer-to-peer similarity search structures. Future Gener. Comp. Syst 24(8), 834–848 (2008)CrossRef
6.
Zurück zum Zitat Buchegger, S., Schiöberg, D., Vu, L.H., Datta, A.: PeerSoN: P2P social networking - early experiences and insights. In: SNS 2009, pp. 46–52, 31 March 2009 Buchegger, S., Schiöberg, D., Vu, L.H., Datta, A.: PeerSoN: P2P social networking - early experiences and insights. In: SNS 2009, pp. 46–52, 31 March 2009
7.
Zurück zum Zitat Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: STOC 2002, pp. 380–388 (2002) Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: STOC 2002, pp. 380–388 (2002)
8.
Zurück zum Zitat Chierichetti, F., Kumar, R.: LSH-preserving functions and their applications. In: SODA 2012, pp. 1078–1094 (2012) Chierichetti, F., Kumar, R.: LSH-preserving functions and their applications. In: SODA 2012, pp. 1078–1094 (2012)
9.
Zurück zum Zitat Cutillo, L.A., Molva, R., Önen, M., Safebook: a distributed privacy preserving online social network. In: WOWMOM, pp. 1–3 (2011) Cutillo, L.A., Molva, R., Önen, M., Safebook: a distributed privacy preserving online social network. In: WOWMOM, pp. 1–3 (2011)
10.
Zurück zum Zitat Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: SCG 2004, pp. 253–262 (2004) Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: SCG 2004, pp. 253–262 (2004)
12.
Zurück zum Zitat Falchi, F., Gennaro, C., Zezula, P.: A content–addressable network for similarity search in metric spaces. In: Moro, G., Bergamaschi, S., Joseph, S., Morin, J.-H., Ouksel, A.M. (eds.) DBISP2P 2005-2006. LNCS, vol. 4125, pp. 98–110. Springer, Heidelberg (2007). doi:10.1007/978-3-540-71661-7_9 CrossRef Falchi, F., Gennaro, C., Zezula, P.: A content–addressable network for similarity search in metric spaces. In: Moro, G., Bergamaschi, S., Joseph, S., Morin, J.-H., Ouksel, A.M. (eds.) DBISP2P 2005-2006. LNCS, vol. 4125, pp. 98–110. Springer, Heidelberg (2007). doi:10.​1007/​978-3-540-71661-7_​9 CrossRef
14.
Zurück zum Zitat Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: VLDB 1999, pp. 518–529 (1999) Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: VLDB 1999, pp. 518–529 (1999)
15.
Zurück zum Zitat Haghani, P., Michel, S., Aberer, K.: Distributed similarity search in high dimensions using locality sensitive hashing. In EDBT 2009, pp. 744–755 (2009) Haghani, P., Michel, S., Aberer, K.: Distributed similarity search in high dimensions using locality sensitive hashing. In EDBT 2009, pp. 744–755 (2009)
16.
Zurück zum Zitat Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC 1998, pp. 604–613 (1998) Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC 1998, pp. 604–613 (1998)
18.
Zurück zum Zitat Lua, E.K., Crowcroft, J., Pias, M., Sharma, R., Lim, S.: A survey and comparison of peer-to-peer overlay network schemes. IEEE Commun. Surv. Tutorials 7, 72–93 (2005)CrossRef Lua, E.K., Crowcroft, J., Pias, M., Sharma, R., Lim, S.: A survey and comparison of peer-to-peer overlay network schemes. IEEE Commun. Surv. Tutorials 7, 72–93 (2005)CrossRef
20.
Zurück zum Zitat Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Multi-probe LSH: efficient indexing for high-dimensional similarity search. In: VLDB 2007, pp. 950–961 (2007) Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Multi-probe LSH: efficient indexing for high-dimensional similarity search. In: VLDB 2007, pp. 950–961 (2007)
21.
Zurück zum Zitat Mani, M., Nguyen, A.-M., Crespi, N.: Scope: a prototype for spontaneous P2P social networking. In: PerCom Workshops, pp. 220–225 (2010) Mani, M., Nguyen, A.-M., Crespi, N.: Scope: a prototype for spontaneous P2P social networking. In: PerCom Workshops, pp. 220–225 (2010)
22.
Zurück zum Zitat Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)CrossRefMATH Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)CrossRefMATH
23.
Zurück zum Zitat McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: homophily in social networks. Ann. Rev. Sociol. 27, 415–444 (2001)CrossRef McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: homophily in social networks. Ann. Rev. Sociol. 27, 415–444 (2001)CrossRef
24.
Zurück zum Zitat Narendula, R., Papaioannou, T.G., Aberer, K.: Towards the realization of decentralized online social networks: an empirical study. In: ICDCS Workshops, pp. 155–162 (2012) Narendula, R., Papaioannou, T.G., Aberer, K.: Towards the realization of decentralized online social networks: an empirical study. In: ICDCS Workshops, pp. 155–162 (2012)
25.
Zurück zum Zitat Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S.: A scalable content-addressable network. In: SIGCOMM 2001, pp. 161–172, New York, NY, USA (2001) Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S.: A scalable content-addressable network. In: SIGCOMM 2001, pp. 161–172, New York, NY, USA (2001)
26.
Zurück zum Zitat Sundaram, N., Turmukhametova, A., Satish, N., Mostak, T., Indyk, P., Madden, S., Dubey, P.: Streaming similarity search over one billion tweets using parallel locality-sensitive hashing. Proc. VLDB Endow. 6(14), 1930–1941 (2013)CrossRef Sundaram, N., Turmukhametova, A., Satish, N., Mostak, T., Indyk, P., Madden, S., Dubey, P.: Streaming similarity search over one billion tweets using parallel locality-sensitive hashing. Proc. VLDB Endow. 6(14), 1930–1941 (2013)CrossRef
28.
Zurück zum Zitat Xiang, R., Neville, J., Rogati, M.: Modeling relationship strength in online social networks. In: WWW 2010, pp. 981–990 (2010) Xiang, R., Neville, J., Rogati, M.: Modeling relationship strength in online social networks. In: WWW 2010, pp. 981–990 (2010)
29.
Zurück zum Zitat Yang, J., Leskovec, J.: Defining, evaluating network communities based on ground-truth. In: MDS 2012, pp. 3: 1–3: 8 (2012) Yang, J., Leskovec, J.: Defining, evaluating network communities based on ground-truth. In: MDS 2012, pp. 3: 1–3: 8 (2012)
Metadaten
Titel
NearBucket-LSH: Efficient Similarity Search in P2P Networks
verfasst von
Naama Kraus
David Carmel
Idit Keidar
Meni Orenbach
Copyright-Jahr
2016
DOI
https://doi.org/10.1007/978-3-319-46759-7_18

Neuer Inhalt