Skip to main content
Erschienen in: Knowledge and Information Systems 3/2019

14.02.2018 | Regular Paper

Practical characterization of large networks using neighborhood information

verfasst von: Pinghui Wang, Junzhou Zhao, Bruno Ribeiro, John C. S. Lui, Don Towsley, Xiaohong Guan

Erschienen in: Knowledge and Information Systems | Ausgabe 3/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Characterizing large complex networks such as online social networks through node querying is a challenging task. Network service providers often impose severe constraints on the query rate, hence limiting the sample size to a small fraction of the total network of interest. Various ad hoc subgraph sampling methods have been proposed, but many of them give biased estimates and no theoretical basis on the accuracy. In this work, we focus on developing sampling methods for large networks where querying a node also reveals partial structural information about its neighbors. Our methods are optimized for NoSQL graph databases (if the database can be accessed directly), or utilize Web APIs available on most major large networks for graph sampling. We show that our sampling method has provable convergence guarantees on being an unbiased estimator, and it is more accurate than state-of-the-art methods. We also explore methods to uncover shortest paths between a subset of nodes and detect high degree nodes by sampling only a small fraction of the network of interest. Our results demonstrate that utilizing neighborhood information yields methods that are two orders of magnitude faster than state-of-the-art methods.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: SIGKDD, pp 631–636 Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: SIGKDD, pp 631–636
2.
Zurück zum Zitat Hubler C et al (2008) Metropolis algorithms for representative subgraph sampling. In: ICDM, pp 283–292 Hubler C et al (2008) Metropolis algorithms for representative subgraph sampling. In: ICDM, pp 283–292
3.
Zurück zum Zitat Maiya AS, Berger-Wolf TY (2011) Benefits of bias: towards better characterization of network sampling. In: SIGKDD, pp 105–113 Maiya AS, Berger-Wolf TY (2011) Benefits of bias: towards better characterization of network sampling. In: SIGKDD, pp 105–113
4.
Zurück zum Zitat Ahmed NK et al (2012) Network sampling: from static to streaming graphs. TKDD 8(2):7:1–7:56 Ahmed NK et al (2012) Network sampling: from static to streaming graphs. TKDD 8(2):7:1–7:56
5.
Zurück zum Zitat Dasgupta A et al (2012) Social sampling. In: SIGKDD, pp 235–243 Dasgupta A et al (2012) Social sampling. In: SIGKDD, pp 235–243
6.
Zurück zum Zitat Ribeiro B, Towsley D (2010) Estimating and sampling graphs with multidimensional random walks. In: IMC, pp 390–403 Ribeiro B, Towsley D (2010) Estimating and sampling graphs with multidimensional random walks. In: IMC, pp 390–403
7.
Zurück zum Zitat Gjoka M et al (2010) Walking in Facebook: a case study of unbiased sampling of OSNs. In: INFOCOM, pp 2498–2506 Gjoka M et al (2010) Walking in Facebook: a case study of unbiased sampling of OSNs. In: INFOCOM, pp 2498–2506
8.
Zurück zum Zitat Ribeiro B, Towsley D (2012) On the estimation accuracy of degree distributions from graph sampling. In: CDC, pp 1–6 Ribeiro B, Towsley D (2012) On the estimation accuracy of degree distributions from graph sampling. In: CDC, pp 1–6
9.
Zurück zum Zitat Avrachenkov K et al (2010) Improving random walk estimation accuracy with uniform restarts. In: WAW, pp 98–109 Avrachenkov K et al (2010) Improving random walk estimation accuracy with uniform restarts. In: WAW, pp 98–109
11.
Zurück zum Zitat Lovász L (1993) Random walks on graphs: a survey. Combinatorics 2:1–46 Lovász L (1993) Random walks on graphs: a survey. Combinatorics 2:1–46
12.
Zurück zum Zitat Ribeiro B et al (2010) Multiple random walks to uncover short paths in power law networks. In: INFOCOM NetSciCom, pp 1–6 Ribeiro B et al (2010) Multiple random walks to uncover short paths in power law networks. In: INFOCOM NetSciCom, pp 1–6
15.
Zurück zum Zitat Kurant M et al (2011) Walking on a graph with a magnifying glass: stratified sampling via weighted random walks. In: SIGMETRICS, pp 281–292 Kurant M et al (2011) Walking on a graph with a magnifying glass: stratified sampling via weighted random walks. In: SIGMETRICS, pp 281–292
16.
17.
Zurück zum Zitat Lee CH et al (2012) Beyond random walk and Metropolis–Hastings samplers: Why you should not backtrack for unbiased graph sampling. In: SIGMETRICS/Performance, pp 319–330 Lee CH et al (2012) Beyond random walk and Metropolis–Hastings samplers: Why you should not backtrack for unbiased graph sampling. In: SIGMETRICS/Performance, pp 319–330
18.
Zurück zum Zitat Lim Y et al (2011) Online estimating the \(k\) central nodes of a network. In: NSW, pp 1–6 Lim Y et al (2011) Online estimating the \(k\) central nodes of a network. In: NSW, pp 1–6
19.
Zurück zum Zitat Cooper C et al (2012) A fast algorithm to find all high degree vertices in power law graphs. In: WWW LSNA, pp 1007–1016 Cooper C et al (2012) A fast algorithm to find all high degree vertices in power law graphs. In: WWW LSNA, pp 1007–1016
20.
Zurück zum Zitat Coppersmith D et al (1993) Random walks on weighted graphs, and applications to on-line algorithms (extended). J ACM 40:421–453CrossRefMATH Coppersmith D et al (1993) Random walks on weighted graphs, and applications to on-line algorithms (extended). J ACM 40:421–453CrossRefMATH
21.
Zurück zum Zitat Maiya AS, Berger-Wolf TY (2010) Online sampling of high centrality individuals in social networks. In: PAKDD, pp 91–98 Maiya AS, Berger-Wolf TY (2010) Online sampling of high centrality individuals in social networks. In: PAKDD, pp 91–98
22.
Zurück zum Zitat Maiya AS, Berger-Wolf TY (2011) Benefits of bias: towards better characterization of network sampling. In: SIGKDD, pp 105–113 Maiya AS, Berger-Wolf TY (2011) Benefits of bias: towards better characterization of network sampling. In: SIGKDD, pp 105–113
23.
Zurück zum Zitat Hui P et al (2008) BUBBLE Rap: social-based forwarding in delay tolerant networks. In: MobiHoc, pp 241–250 Hui P et al (2008) BUBBLE Rap: social-based forwarding in delay tolerant networks. In: MobiHoc, pp 241–250
24.
Zurück zum Zitat Ribeiro B et al (2012) Multiple random walks to uncover short paths in power law networks. In: Infocom NetSciCom, pp 1–6 Ribeiro B et al (2012) Multiple random walks to uncover short paths in power law networks. In: Infocom NetSciCom, pp 1–6
25.
Zurück zum Zitat Wang P et al (2012) Sampling contents distributed over graphs. Technical Report TR-1201, Xi’an Jiaotong University Wang P et al (2012) Sampling contents distributed over graphs. Technical Report TR-1201, Xi’an Jiaotong University
26.
Zurück zum Zitat Mislove A et al (2007) Measurement and analysis of online social networks. In: IMC, pp 29–42 Mislove A et al (2007) Measurement and analysis of online social networks. In: IMC, pp 29–42
27.
Zurück zum Zitat Richardson M et al (2003) Trust management for the semantic web. In: ISWC, pp 351–368 Richardson M et al (2003) Trust management for the semantic web. In: ISWC, pp 351–368
28.
Zurück zum Zitat Leskovec J et al (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123MathSciNetCrossRefMATH Leskovec J et al (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123MathSciNetCrossRefMATH
29.
Zurück zum Zitat Ribeiro B et al (2012) Sampling directed graphs with random walks. In: INFOCOM, pp 1692–1700 Ribeiro B et al (2012) Sampling directed graphs with random walks. In: INFOCOM, pp 1692–1700
30.
Zurück zum Zitat Kurant M et al (2011) Walking on a graph with a magnifying glass: stratified sampling via weighted random walks. In: SIGMETRICS, pp 241–252 Kurant M et al (2011) Walking on a graph with a magnifying glass: stratified sampling via weighted random walks. In: SIGMETRICS, pp 241–252
31.
Zurück zum Zitat Kurant M et al (2011) Towards unbiased BFS sampling. JSAC 29(9):1799–1809 Kurant M et al (2011) Towards unbiased BFS sampling. JSAC 29(9):1799–1809
32.
Zurück zum Zitat Heckathorn DD (2002) Respondent-driven sampling II: deriving valid population estimates from chain-referral samples of hidden populations. Soc Probl 49(1):11–34CrossRef Heckathorn DD (2002) Respondent-driven sampling II: deriving valid population estimates from chain-referral samples of hidden populations. Soc Probl 49(1):11–34CrossRef
33.
Zurück zum Zitat Salganik MJ, Heckathorn DD (2004) Sampling and estimation in hidden populations using respondent-driven sampling. Sociol Methodol 49(1):11–34 Salganik MJ, Heckathorn DD (2004) Sampling and estimation in hidden populations using respondent-driven sampling. Sociol Methodol 49(1):11–34
34.
Zurück zum Zitat Stutzbach D et al (2009) On unbiased sampling for unstructured peer-to-peer networks. TON 17(2):377–390 Stutzbach D et al (2009) On unbiased sampling for unstructured peer-to-peer networks. TON 17(2):377–390
35.
Zurück zum Zitat Rasti AH et al (2009) Respondent-driven sampling for characterizing unstructured overlays. In: INFOCOM Mini-conference, pp 2701–2705 Rasti AH et al (2009) Respondent-driven sampling for characterizing unstructured overlays. In: INFOCOM Mini-conference, pp 2701–2705
Metadaten
Titel
Practical characterization of large networks using neighborhood information
verfasst von
Pinghui Wang
Junzhou Zhao
Bruno Ribeiro
John C. S. Lui
Don Towsley
Xiaohong Guan
Publikationsdatum
14.02.2018
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 3/2019
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-018-1167-0

Weitere Artikel der Ausgabe 3/2019

Knowledge and Information Systems 3/2019 Zur Ausgabe