Skip to main content
Erschienen in: Social Network Analysis and Mining 4/2013

01.12.2013 | Original Article

Robustness of social and web graphs to node removal

verfasst von: Paolo Boldi, Marco Rosa, Sebastiano Vigna

Erschienen in: Social Network Analysis and Mining | Ausgabe 4/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Given a social network, which of its nodes have a stronger impact in determining its structure? More precisely, which node-removal order has the greatest impact on the network structure? We approach this well-known problem for the first time in a setting that combines both web graphs and social networks. Our experiments are performed on datasets that are of orders of magnitude larger than those appearing in the previous literature: this is possible, thanks to some recently developed algorithms and software tools that approximate accurately the number of reachable pairs and the distribution of distances in large graphs. Our experiments highlight deep differences in the structure of social networks and web graphs, show significant limitations of previous experimental results; at the same time, they reveal clustering by label propagation as a new and very effective way of locating nodes that are important from a structural viewpoint.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
The reader might find this definition a bit vague, and some variants are often spotted in the literature: this is a general problem, also highlighted recently by Li et al. (2005).
 
2
It should be remarked that the Milgram’s experiment tried to prove two properties at the same time. First, the average distance between individuals is much smaller than expected; second, the individuals are able to exploit such a feature to route messages along short paths, albeit they only possess local information about the network they live in. This second property is, in a sense, not only more interesting than the former, but also more difficult to describe and study, because it has to do with some information that the nodes possess about the environment they inhabit.
 
3
A reachable pair is a pair of nodes \(\langle x, y\rangle\) such that there is a directed path from x to y; the distance distribution of a graph is a discrete distribution that gives, for every t, the fraction of reachable pairs of nodes that are at distance t.
 
4
Observe that we delete nodes but count the percentage of arcs (rather than nodes) that have been removed: this choice is justified by the fact that otherwise node orderings that put large-degree nodes first would certainly be considered (unfairly) more disruptive.
 
5
We remark that several proposals have been made to find features that highlight such structural differences in a computationwise-feasible way (e.g., assortative mixing (Newman and Park 2003)), but all instances we are aware of have been questioned by the subsequent literature, so no clear-cut results are known so far. An exception is the idea of considering the spid [shortest-path index of dispersion (Boldi et al. 2011a)], which is experimentally larger than the one for web graphs and smaller than the one for social networks. For instance, the spid of the entire Facebook graph is 0.09 (Backstrom et al. 2012).
 
6
Actually, the notion had been introduced before by Marchiori and Latora (2000) and named connectivity length, but we find the name “harmonic diameter” much more insightful.
 
7
We purposely use the word “divergence” between distributions, instead of “distance”, to avoid confusion with the notion of distance in a graph.
 
8
It is mostly a matter of taste whether to use directed symmetric graphs or simple undirected graphs. In our case, since we have to cope with both directed and undirected graph, we prefer to speak of directed graphs that are symmetric, that is, for every arc xy, there is a symmetric arc yx.
 
9
Label propagation has been independently proposed under the name of peer pressure clustering by Gilbert et al. (2007).
 
10
There are sampled variants of Brandes’s algorithm (Brandes and Erlebach 2005a), but the Hoeffding bound providing precision guarantees requires \(\Uptheta(n^4\log n/\varepsilon^2)\) visits to obtain absolute precision \(\varepsilon. \)
 
11
In Boldi et al. (2011b), we also presented the outcomes of similar experiments performed on other networks (Amazon, Enron and . it ) that agree with the ones shown here. Our tables and graphs slightly differ from those previously published in (Boldi et al. 2011b), because we had time to generate more runs, and thus, increase the precision of our results.
 
12
http://​law.​di.​unimi.​it/​. In particular, the graphs we used are the datasets named hollywood-2011, ljournal-2008, orkut-2007, in-2004 and uk-2007-05. Note that isolated nodes have been removed from hollywood-2011.
 
14
We remark that in some cases, the measure is negative or does not decrease monotonically. This is sometimes an artifact of the probabilistic technique used to estimate our measures—small relative errors are unavoidable.
 
15
In this paper, like in all the other experimental research on the same topic, conclusions about social networks should be taken with a grain of salt, due to the heterogeneity of such networks and the lack of a large repertoire of examples.
 
Literatur
Zurück zum Zitat Albert R, Jeong H, Barabási A-L (2000) Error and attack tolerance of complex networks. Nature 406:378–382CrossRef Albert R, Jeong H, Barabási A-L (2000) Error and attack tolerance of complex networks. Nature 406:378–382CrossRef
Zurück zum Zitat Anthonisse JM (1971) The rush in a graph. Technical report, University of Amsterdam Mathematical Centre, Amsterdam Anthonisse JM (1971) The rush in a graph. Technical report, University of Amsterdam Mathematical Centre, Amsterdam
Zurück zum Zitat Backstrom L, Boldi P, Rosa M, Ugander J, Vigna S (2012) Four degrees of separation. In: ACM Web Science 2012: Conference Proceedings, pp 45–54 (Best paper award) Backstrom L, Boldi P, Rosa M, Ugander J, Vigna S (2012) Four degrees of separation. In: ACM Web Science 2012: Conference Proceedings, pp 45–54 (Best paper award)
Zurück zum Zitat Bavelas A (1950) Communication patterns in task-oriented groups. J Acoust Soc Am 57:271–282 Bavelas A (1950) Communication patterns in task-oriented groups. J Acoust Soc Am 57:271–282
Zurück zum Zitat Boldi P, Vigna S (2012a) Four degrees of separation, really. Arxiv preprint arxiv:1205.5509 Boldi P, Vigna S (2012a) Four degrees of separation, really. Arxiv preprint arxiv:1205.5509
Zurück zum Zitat Boldi P, Vigna S (2012b). Harmonic centrality (in preparation) Boldi P, Vigna S (2012b). Harmonic centrality (in preparation)
Zurück zum Zitat Boldi P, Santini M, Vigna S (2009) Page Rank: functional dependencies. ACM Trans Inf Sys 27(4):1–23CrossRef Boldi P, Santini M, Vigna S (2009) Page Rank: functional dependencies. ACM Trans Inf Sys 27(4):1–23CrossRef
Zurück zum Zitat Boldi P, Rosa M, Vigna S (2011a) HyperANF: approximating the neighbourhood function of very large graphs on a budget. In: Srinivasan S, Ramamritham S, Kumar A, Ravindra MP, Bertino E, Kumar R (eds) Proceedings of the 20th international conference on World Wide Web. ACM, pp 625–634 Boldi P, Rosa M, Vigna S (2011a) HyperANF: approximating the neighbourhood function of very large graphs on a budget. In: Srinivasan S, Ramamritham S, Kumar A, Ravindra MP, Bertino E, Kumar R (eds) Proceedings of the 20th international conference on World Wide Web. ACM, pp 625–634
Zurück zum Zitat Boldi P, Rosa M, Vigna S (2011b) Robustness of social networks: Comparative results based on distance distributions. In: Social Informatics, Third International Conference, SocInfo 2011. Lecture Notes in Computer Science, vol 6894. Springer, Berlin, pp 8–21 Boldi P, Rosa M, Vigna S (2011b) Robustness of social networks: Comparative results based on distance distributions. In: Social Informatics, Third International Conference, SocInfo 2011. Lecture Notes in Computer Science, vol 6894. Springer, Berlin, pp 8–21
Zurück zum Zitat Borgatti SP (2005) Centrality and network flow. Soc Netw 27(1):55–71CrossRef Borgatti SP (2005) Centrality and network flow. Soc Netw 27(1):55–71CrossRef
Zurück zum Zitat Borgatti SP (2006) Identifying sets of key players in a social network. Comput Math Organ Theory 12:21–34CrossRefMATH Borgatti SP (2006) Identifying sets of key players in a social network. Comput Math Organ Theory 12:21–34CrossRefMATH
Zurück zum Zitat Borgatti SP, Carley KM, Krackhardt D (2006) On the robustness of centrality measures under conditions of imperfect data. Soc Netw 28(2):124–136CrossRef Borgatti SP, Carley KM, Krackhardt D (2006) On the robustness of centrality measures under conditions of imperfect data. Soc Netw 28(2):124–136CrossRef
Zurück zum Zitat Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177CrossRefMATH Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177CrossRefMATH
Zurück zum Zitat Brandes U, Erlebach T (eds) (2005a) Network analysis: methodological foundations. Lecture Notes in Computer Science, vol 3418. Springer, Berlin Brandes U, Erlebach T (eds) (2005a) Network analysis: methodological foundations. Lecture Notes in Computer Science, vol 3418. Springer, Berlin
Zurück zum Zitat Brandes U, Erlebach T (2005b) Network analysis: methodological foundations (Lecture Notes in Computer Science). Number 3418 in Lecture Notes in Computer Science. Springer, Berlin Brandes U, Erlebach T (2005b) Network analysis: methodological foundations (Lecture Notes in Computer Science). Number 3418 in Lecture Notes in Computer Science. Springer, Berlin
Zurück zum Zitat Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: KDD ’09: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 219–228 Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: KDD ’09: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 219–228
Zurück zum Zitat Cohen E (1997) Size-estimation framework with applications to transitive closure and reachability. J Comput Syst Sci 55:441–453CrossRefMATH Cohen E (1997) Size-estimation framework with applications to transitive closure and reachability. J Comput Syst Sci 55:441–453CrossRefMATH
Zurück zum Zitat Cohen R, Havlin S (2010) Complex networks: structure, robustness and function. Cambridge University Press, Cambridge Cohen R, Havlin S (2010) Complex networks: structure, robustness and function. Cambridge University Press, Cambridge
Zurück zum Zitat Donato D, Leonardi S, Millozzi S, Tsaparas P (2008) Mining the inner structure of the web graph. J Phys A: Math Theor 41(22):224017MathSciNetCrossRef Donato D, Leonardi S, Millozzi S, Tsaparas P (2008) Mining the inner structure of the web graph. J Phys A: Math Theor 41(22):224017MathSciNetCrossRef
Zurück zum Zitat Efron B, Gong G (1983) A leisurely look at the bootstrap, the jackknife, and cross-validation. Am Stat 37(1):36–48MathSciNet Efron B, Gong G (1983) A leisurely look at the bootstrap, the jackknife, and cross-validation. Am Stat 37(1):36–48MathSciNet
Zurück zum Zitat Flajolet P, Fusy É, Gandouet O, Meunier F (2007) HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In: Proceedings of the 13th conference on analysis of algorithm (AofA 07), Juan-les-Pins, pp 127–146 Flajolet P, Fusy É, Gandouet O, Meunier F (2007) HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In: Proceedings of the 13th conference on analysis of algorithm (AofA 07), Juan-les-Pins, pp 127–146
Zurück zum Zitat Fogaras D (2003) Where to start browsing the web? In: Innovative Internet Community Systems, Third International Workshop, IICS 2003. Lecture Notes in Computer Science, vol 2877. Springer, Leipzig, pp 65–79 Fogaras D (2003) Where to start browsing the web? In: Innovative Internet Community Systems, Third International Workshop, IICS 2003. Lecture Notes in Computer Science, vol 2877. Springer, Leipzig, pp 65–79
Zurück zum Zitat Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35–41CrossRef Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35–41CrossRef
Zurück zum Zitat Gilbert JR, Reinhardt S, Shah V (2007) High-performance graph algorithms from parallel sparse matrices. In: Kagstrom B, Elmroth E, Dongarra J, Wasniewski J (eds) Applied parallel computing. State of the Art in Scientific Computing (8th PARA’06). Lecture Notes in Computer Science, vol 4699. Springer, New York, pp 260–269 Gilbert JR, Reinhardt S, Shah V (2007) High-performance graph algorithms from parallel sparse matrices. In: Kagstrom B, Elmroth E, Dongarra J, Wasniewski J (eds) Applied parallel computing. State of the Art in Scientific Computing (8th PARA’06). Lecture Notes in Computer Science, vol 4699. Springer, New York, pp 260–269
Zurück zum Zitat Li L, Alderson DL, Doyle J, Willinger W (2005) Towards a theory of scale-free graphs: definition, properties, and implications. Internet Math 2(4):431–523MathSciNetCrossRefMATH Li L, Alderson DL, Doyle J, Willinger W (2005) Towards a theory of scale-free graphs: definition, properties, and implications. Internet Math 2(4):431–523MathSciNetCrossRefMATH
Zurück zum Zitat Marchiori M, Latora V (2000) Harmony in the small-world. Physica A: Stat Mech Appl 285(3-4):539–546CrossRefMATH Marchiori M, Latora V (2000) Harmony in the small-world. Physica A: Stat Mech Appl 285(3-4):539–546CrossRefMATH
Zurück zum Zitat Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07), San Diego Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07), San Diego
Zurück zum Zitat Newman MEJ, Park J (2003) Why social networks are different from other types of networks. Phys Rev E 68(3):036122CrossRef Newman MEJ, Park J (2003) Why social networks are different from other types of networks. Phys Rev E 68(3):036122CrossRef
Zurück zum Zitat Page L, Brin S, Motwani R, Winograd T (1998) The Page Rank citation ranking: bringing order to the web. Technical report, Stanford Digital Library Technologies Project, Stanford University, Stanford Page L, Brin S, Motwani R, Winograd T (1998) The Page Rank citation ranking: bringing order to the web. Technical report, Stanford Digital Library Technologies Project, Stanford University, Stanford
Zurück zum Zitat Palmer CR, Gibbons PB, Faloutsos C (2002) Anf: a fast and scalable tool for data mining in massive graphs. In: KDD ’02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, New York, pp 81–90 Palmer CR, Gibbons PB, Faloutsos C (2002) Anf: a fast and scalable tool for data mining in massive graphs. In: KDD ’02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, New York, pp 81–90
Zurück zum Zitat Travers J, Milgram S (1969) An experimental study of the small world problem. Sociometry 32(4):425–443CrossRef Travers J, Milgram S (1969) An experimental study of the small world problem. Sociometry 32(4):425–443CrossRef
Metadaten
Titel
Robustness of social and web graphs to node removal
verfasst von
Paolo Boldi
Marco Rosa
Sebastiano Vigna
Publikationsdatum
01.12.2013
Verlag
Springer Vienna
Erschienen in
Social Network Analysis and Mining / Ausgabe 4/2013
Print ISSN: 1869-5450
Elektronische ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-013-0096-x

Weitere Artikel der Ausgabe 4/2013

Social Network Analysis and Mining 4/2013 Zur Ausgabe

Premium Partner