Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 2/2016

01.03.2016

Fast approximation of betweenness centrality through sampling

verfasst von: Matteo Riondato, Evgenios M. Kornaropoulos

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 2/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Betweenness centrality is a fundamental measure in social network analysis, expressing the importance or influence of individual vertices (or edges) in a network in terms of the fraction of shortest paths that pass through them. Since exact computation in large networks is prohibitively expensive, we present two efficient randomized algorithms for betweenness estimation. The algorithms are based on random sampling of shortest paths and offer probabilistic guarantees on the quality of the approximation. The first algorithm estimates the betweenness of all vertices (or edges): all approximate values are within an additive factor \(\varepsilon \in (0,1)\) from the real values, with probability at least \(1-\delta \). The second algorithm focuses on the top-K vertices (or edges) with highest betweenness and estimate their betweenness value to within a multiplicative factor \(\varepsilon \), with probability at least \(1-\delta \). This is the first algorithm that can compute such approximation for the top-K vertices (or edges). By proving upper and lower bounds to the VC-dimension of a range set associated with the problem at hand, we can bound the sample size needed to achieve the desired approximations. We obtain sample sizes that are independent from the number of vertices in the network and only depend on a characteristic quantity that we call the vertex-diameter, that is the maximum number of vertices in a shortest path. In some cases, the sample size is completely independent from any quantitative property of the graph. An extensive experimental evaluation on real and artificial networks shows that our algorithms are significantly faster and much more scalable as the number of vertices grows than other algorithms with similar approximation guarantees.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Note that even if \(p_\emptyset =\emptyset \), the set \(\{p_\emptyset \}\) is not empty. It contains one element.
 
2
We use the normalized version of betweenness as we believe it to be more suitable for presenting approximation results.
 
3
After this work was accepted for publication, Bergamini and Meyerhenke (2015) presented an improved upper bound for VD(G) for undirected weighted graphs.
 
4
It is a well-known open problem whether there is an algorithm to perform a single st-shortest path computation between a pair of vertices with smaller worst-case time complexity than the Single Source Shortest Path computation.
 
5
Bounded-distance betweenness is also known as k-betweenness. We prefer the former denomination to avoid confusion with k-path betweenness as defined by Kourtellis et al. (2012).
 
6
We take the average, rather than the sum, as defined in the original work by Kourtellis et al. (2012), to normalize the value so that it belongs to the interval [0, 1]. This has no consequences on the results.
 
7
The implementations are available at http://​cs.​brown.​edu/​~matteo/​centrsampl.​tar.​bz2. An independent implementation of our algorithms is available in NetworKit Staudt et al. (2014).
 
Literatur
Zurück zum Zitat Abraham I, Delling D, Fiat A, Goldberg AV, Werneck RF (2011) VC-dimension and shortest path algorithms. Automata, languages and programming., Lecture notes in computer scienceSpringer, Berlin, pp 690–699. doi:10.1007/978-3-642-22006-7_58 CrossRef Abraham I, Delling D, Fiat A, Goldberg AV, Werneck RF (2011) VC-dimension and shortest path algorithms. Automata, languages and programming., Lecture notes in computer scienceSpringer, Berlin, pp 690–699. doi:10.​1007/​978-3-642-22006-7_​58 CrossRef
Zurück zum Zitat Anthonisse JM (1971) The rush in a directed graph. Technical Report BN 9/71, Stichting Mathematisch Centrum, Amsterdam Anthonisse JM (1971) The rush in a directed graph. Technical Report BN 9/71, Stichting Mathematisch Centrum, Amsterdam
Zurück zum Zitat Anthony M, Bartlett PL (1999) Neural network learning—theoretical foundations. Cambridge University Press, New YorkCrossRefMATH Anthony M, Bartlett PL (1999) Neural network learning—theoretical foundations. Cambridge University Press, New YorkCrossRefMATH
Zurück zum Zitat Bader DA, Kintali S, Madduri K, Mihail M (2007) Approximating betweenness centrality. In: Bonato A, Chung F (eds) Algorithms and models for the web-graph. Lecture notes in computer science, vol 4863. Springer, Berlin. doi:10.1007/978-3-540-77004-6_10 Bader DA, Kintali S, Madduri K, Mihail M (2007) Approximating betweenness centrality. In: Bonato A, Chung F (eds) Algorithms and models for the web-graph. Lecture notes in computer science, vol 4863. Springer, Berlin. doi:10.​1007/​978-3-540-77004-6_​10
Zurück zum Zitat Bergamini E, Meyerhenke H (2015) Fully-dynamic approximation of betweenness centrality. CoRR abs/1504.0709 (to appear in ESA’15) Bergamini E, Meyerhenke H (2015) Fully-dynamic approximation of betweenness centrality. CoRR abs/1504.0709 (to appear in ESA’15)
Zurück zum Zitat Bergamini E, Meyerhenke H, Staudt CL (2015) Approximating betweenness centrality in large evolving networks. In: 17th Workshop on Algorithm Engineering and Experiments, ALENEX 2015, SIAM, pp 133–146 Bergamini E, Meyerhenke H, Staudt CL (2015) Approximating betweenness centrality in large evolving networks. In: 17th Workshop on Algorithm Engineering and Experiments, ALENEX 2015, SIAM, pp 133–146
Zurück zum Zitat Boitmanis K, Freivalds K, Lediņš P, Opmanis R (2006) Fast and simple approximation of the diameter and radius of a graph. In: Alvarez C, Serna M (eds) Experimental algorithms. Lecture notes in computer science. Springer, Berlin, pp 98–108. doi:10.1007/11764298_9 Boitmanis K, Freivalds K, Lediņš P, Opmanis R (2006) Fast and simple approximation of the diameter and radius of a graph. In: Alvarez C, Serna M (eds) Experimental algorithms. Lecture notes in computer science. Springer, Berlin, pp 98–108. doi:10.​1007/​11764298_​9
Zurück zum Zitat Csárdi G, Nepusz T (2006) The igraph software package for complex network research. InterJ Complex Syst 1695:1–9 Csárdi G, Nepusz T (2006) The igraph software package for complex network research. InterJ Complex Syst 1695:1–9
Zurück zum Zitat Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40:35–41CrossRef Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40:35–41CrossRef
Zurück zum Zitat Geisberger R, Sanders P, Schultes D (2008) Better approximation of betweenness centrality. In: Munro JI, Wagner D (eds) Algorithm engineering & experiments (ALENEX’08), SIAM, pp 90–100 Geisberger R, Sanders P, Schultes D (2008) Better approximation of betweenness centrality. In: Munro JI, Wagner D (eds) Algorithm engineering & experiments (ALENEX’08), SIAM, pp 90–100
Zurück zum Zitat Jacob R, Koschützki D, Lehmann K, Peeters L, Tenfelde-Podehl D (2005) Algorithms for centrality indices. In: Brandes U, Erlebach T (eds) Network analysis. Lecture notes in computer science, vol 3418. Springer, Berlin, pp 62–82. doi:10.1007/978-3-540-31955-9_4 Jacob R, Koschützki D, Lehmann K, Peeters L, Tenfelde-Podehl D (2005) Algorithms for centrality indices. In: Brandes U, Erlebach T (eds) Network analysis. Lecture notes in computer science, vol 3418. Springer, Berlin, pp 62–82. doi:10.​1007/​978-3-540-31955-9_​4
Zurück zum Zitat Kourtellis N, Morales GDF, Bonchi F (2014) Scalable online betweenness centrality in evolving graphs. CoRR abs/1401.6981 Kourtellis N, Morales GDF, Bonchi F (2014) Scalable online betweenness centrality in evolving graphs. CoRR abs/1401.6981
Zurück zum Zitat Lim YS, Menasche DS, Ribeiro B, Towsley D, Basu P (2011) Online estimating the k central nodes of a network. In: IEEE network science workshop, NSW’11, pp 118–122. doi:10.1109/NSW.2011.6004633 Lim YS, Menasche DS, Ribeiro B, Towsley D, Basu P (2011) Online estimating the k central nodes of a network. In: IEEE network science workshop, NSW’11, pp 118–122. doi:10.​1109/​NSW.​2011.​6004633
Zurück zum Zitat Löffler M, Phillips JM (2009) Shape fitting on point sets with probability distributions. In: Fiat A, Sanders P (eds) Algorithms—ESA 2009. Lecture notes in computer science, vol 5757. Springer, Berlin, pp 313–324. doi:10.1007/978-3-642-04128-0_29 Löffler M, Phillips JM (2009) Shape fitting on point sets with probability distributions. In: Fiat A, Sanders P (eds) Algorithms—ESA 2009. Lecture notes in computer science, vol 5757. Springer, Berlin, pp 313–324. doi:10.​1007/​978-3-642-04128-0_​29
Zurück zum Zitat Maiya AS, Berger-Wolf TY (2010) Online sampling of high centrality individuals in social networks. In: Zaki M, Yu J, Ravindran B, Pudi V (eds) Advances in knowledge discovery and data mining. Lecture notes in computer science, vol 6118. Springer, Berlin, pp 91–98. doi:10.1007/978-3-642-13657-3_12 CrossRef Maiya AS, Berger-Wolf TY (2010) Online sampling of high centrality individuals in social networks. In: Zaki M, Yu J, Ravindran B, Pudi V (eds) Advances in knowledge discovery and data mining. Lecture notes in computer science, vol 6118. Springer, Berlin, pp 91–98. doi:10.​1007/​978-3-642-13657-3_​12 CrossRef
Zurück zum Zitat Mitzenmacher M, Upfal E (2005) Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press, New YorkCrossRef Mitzenmacher M, Upfal E (2005) Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press, New YorkCrossRef
Zurück zum Zitat Papagelis M, Das G, Koudas N (2013) Sampling online social networks. IEEE Trans Knowl Data Eng 25(3):662–676CrossRef Papagelis M, Das G, Koudas N (2013) Sampling online social networks. IEEE Trans Knowl Data Eng 25(3):662–676CrossRef
Zurück zum Zitat Pfeffer J, Carley KM (2012) k-centralities: local approximations of global measures based on shortest paths. In: Proceedings of the 21st international conference on companion on world wide web, WWW ’12 Companion, pp 1043–1050. ACM, New York. doi:10.1145/2187980.2188239 Pfeffer J, Carley KM (2012) k-centralities: local approximations of global measures based on shortest paths. In: Proceedings of the 21st international conference on companion on world wide web, WWW ’12 Companion, pp 1043–1050. ACM, New York. doi:10.​1145/​2187980.​2188239
Zurück zum Zitat Pohl I (1969) Bidirectional heuristic search in path problems. Ph.D. Thesis, Stanford University Pohl I (1969) Bidirectional heuristic search in path problems. Ph.D. Thesis, Stanford University
Zurück zum Zitat Riondato M, Kornaropoulos EM (2014) Fast approximation of betweenness centrality through sampling. In: Castillo C, Metzler D (eds) Proceedings of 7th ACM conference on web search data mining, WSDM’14. ACM, New York Riondato M, Kornaropoulos EM (2014) Fast approximation of betweenness centrality through sampling. In: Castillo C, Metzler D (eds) Proceedings of 7th ACM conference on web search data mining, WSDM’14. ACM, New York
Zurück zum Zitat Riondato M, Upfal E (2014) Efficient discovery of association rules and frequent itemsets through sampling with tight performance guarantees. ACM Trans Knowl Disc Data 8(2) Riondato M, Upfal E (2014) Efficient discovery of association rules and frequent itemsets through sampling with tight performance guarantees. ACM Trans Knowl Disc Data 8(2)
Zurück zum Zitat Roditty L, Williams VV (2012) Approximating the diameter of a graph. CoRR abs/1207.3622 Roditty L, Williams VV (2012) Approximating the diameter of a graph. CoRR abs/1207.3622
Zurück zum Zitat Sarıyüce AE, Saule E, Kaya K, Çatalyürek UV (2013) Shattering and compressing networks for betweenness centrality. In: SIAM data mining conference Sarıyüce AE, Saule E, Kaya K, Çatalyürek UV (2013) Shattering and compressing networks for betweenness centrality. In: SIAM data mining conference
Zurück zum Zitat Shalev-Shwartz S, Ben-David S (2014) Understanding machine learning: from theory to algorithms. Cambridge University Press, New YorkCrossRef Shalev-Shwartz S, Ben-David S (2014) Understanding machine learning: from theory to algorithms. Cambridge University Press, New YorkCrossRef
Zurück zum Zitat Staudt C, Sazonovs A, Meyerhenke H (2014) Networkit: an interactive tool suite for high-performance network analysis. CoRR abs/1403.3005 Staudt C, Sazonovs A, Meyerhenke H (2014) Networkit: an interactive tool suite for high-performance network analysis. CoRR abs/1403.3005
Zurück zum Zitat Tang J, Zhang C, Cai K, Zhang L, Su Z (2015) Sampling representative users from large social networks. In: AAAI Tang J, Zhang C, Cai K, Zhang L, Su Z (2015) Sampling representative users from large social networks. In: AAAI
Zurück zum Zitat Yoshida Y (2014) Almost linear-time algorithms for adaptive betweenness centrality using hypergraph sketches. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’14, pp 1416–1425. ACM, New York. doi:10.1145/2623330.2623626 Yoshida Y (2014) Almost linear-time algorithms for adaptive betweenness centrality using hypergraph sketches. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’14, pp 1416–1425. ACM, New York. doi:10.​1145/​2623330.​2623626
Metadaten
Titel
Fast approximation of betweenness centrality through sampling
verfasst von
Matteo Riondato
Evgenios M. Kornaropoulos
Publikationsdatum
01.03.2016
Verlag
Springer US
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 2/2016
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-015-0423-0

Weitere Artikel der Ausgabe 2/2016

Data Mining and Knowledge Discovery 2/2016 Zur Ausgabe

Premium Partner