Skip to main content
Erschienen in: Social Network Analysis and Mining 4/2013

01.12.2013 | Original Article

FURS: Fast and Unique Representative Subset selection retaining large-scale community structure

verfasst von: Raghvendra Mall, Rocco Langone, Johan A. K. Suykens

Erschienen in: Social Network Analysis and Mining | Ausgabe 4/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We propose a novel algorithm, FURS (Fast and Unique Representative Subset selection) to deterministically select a set of nodes from a given graph which retains the underlying community structure. FURS greedily selects nodes with high-degree centrality from most or all the communities in the network. The nodes with high-degree centrality for each community are usually located at the center rather than the periphery and can better capture the community structure. The nodes are selected such that they are not isolated but can form disconnected components. The FURS is evaluated by quality measures, such as coverage, clustering coefficients, degree distributions and variation of information. Empirically, we observe that the nodes are selected such that most or all of the communities in the original network are retained. We compare our proposed technique with state-of-the-art methods like SlashBurn, Forest-Fire, Metropolis and Snowball Expansion sampling techniques. We evaluate FURS on several synthetic and real-world networks of varying size to demonstrate the high quality of our subset while preserving the community structure. The subset generated by the FURS method can be effectively utilized by model-based approaches with out-of-sample extension properties for inferring community affiliation of the large-scale networks. A consequence of FURS is that the selected subset is also a good candidate set for simple diffusion model. We compare the spread of information over time using FURS for several real-world networks with random node selection, hubs selection, spokes selection, high eigenvector centrality, high Pagerank, high betweenness centrality and low betweenness centrality-based representative subset selection.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Adler M, Mitzenmacher M (2000) Towards compressing web graphs. In: Proceedings of IEEE DCC, pp 203–212 Adler M, Mitzenmacher M (2000) Towards compressing web graphs. In: Proceedings of IEEE DCC, pp 203–212
Zurück zum Zitat Alzate C, Suykens J (2010) Multiway spectral clustering with out-of-sample extensions through weighted PCA. IEEE Trans Pattern Anal Mach Intell 32(2):335–347CrossRef Alzate C, Suykens J (2010) Multiway spectral clustering with out-of-sample extensions through weighted PCA. IEEE Trans Pattern Anal Mach Intell 32(2):335–347CrossRef
Zurück zum Zitat Blondel V, Guillaume J, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 10(P10008) Blondel V, Guillaume J, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 10(P10008)
Zurück zum Zitat Boguna M, Pastor-Satorras R, Diaz-Guilera A, Arenas A (2004) Models of social networks based on social distance attachment. Phys Rev E 70(5) Boguna M, Pastor-Satorras R, Diaz-Guilera A, Arenas A (2004) Models of social networks based on social distance attachment. Phys Rev E 70(5)
Zurück zum Zitat Bonacich P (1987) Power and centrality: a family of measures. Am J Sociol 92(5):1170–1182CrossRef Bonacich P (1987) Power and centrality: a family of measures. Am J Sociol 92(5):1170–1182CrossRef
Zurück zum Zitat Bullmore E, Sporns O (2009) Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Rev Neurosci 10(4) Bullmore E, Sporns O (2009) Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Rev Neurosci 10(4)
Zurück zum Zitat Catanese SA, De Meo P, Ferrara E, Fiumara G, Provetti A (2011) Crawling facebook for social network analysis purposes. In: Proceedings of international conference on web intelligence, mining and semantics, p 52 Catanese SA, De Meo P, Ferrara E, Fiumara G, Provetti A (2011) Crawling facebook for social network analysis purposes. In: Proceedings of international conference on web intelligence, mining and semantics, p 52
Zurück zum Zitat Clauset A, Newman M, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(066111) Clauset A, Newman M, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(066111)
Zurück zum Zitat Coleman J, Katz E, Menzel H (1966) Medical innovation: a diffusion study. Bobbs-Merrill, Indianapolis Coleman J, Katz E, Menzel H (1966) Medical innovation: a diffusion study. Bobbs-Merrill, Indianapolis
Zurück zum Zitat Crandall D, Cosley D, Huttenlocher D, Kleinberg J, Suri S (2008) Feedback effects between similarity and social influence in online communities. In: KDD’08, pp 160–168 Crandall D, Cosley D, Huttenlocher D, Kleinberg J, Suri S (2008) Feedback effects between similarity and social influence in online communities. In: KDD’08, pp 160–168
Zurück zum Zitat Danon L, Diáz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech Theory Exp 09(P09008+) Danon L, Diáz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech Theory Exp 09(P09008+)
Zurück zum Zitat Duch J, Arenas A (2005) Community detection in complex networks using external optimization. Phys Rev E 72(2):027104+ Duch J, Arenas A (2005) Community detection in complex networks using external optimization. Phys Rev E 72(2):027104+
Zurück zum Zitat Feder T, Motwani R (1991) Clique partitions, graph compression and speeding-up algorithms. J Comput Syst Sci 123–133 Feder T, Motwani R (1991) Clique partitions, graph compression and speeding-up algorithms. J Comput Syst Sci 123–133
Zurück zum Zitat Feige U (1998) A threshold of ln for approximating set cover. J ACM 45(4):634–652 Feige U (1998) A threshold of ln for approximating set cover. J ACM 45(4):634–652
Zurück zum Zitat Ferrara E (2012) A large-scale community structure analysis in facebook. EPJ Data Sci 1(9):1–30 Ferrara E (2012) A large-scale community structure analysis in facebook. EPJ Data Sci 1(9):1–30
Zurück zum Zitat Fortunato S. (2009) Community detection in graphs. Phys Rep 486:75–174 Fortunato S. (2009) Community detection in graphs. Phys Rep 486:75–174
Zurück zum Zitat Frank O (2005) Network sampling and model fitting. Cambridge Press University, Cambridge Frank O (2005) Network sampling and model fitting. Cambridge Press University, Cambridge
Zurück zum Zitat Freeman L (1979) Centrality in social networks: conceptual clarification. Soc Netw 1(3):215–239CrossRef Freeman L (1979) Centrality in social networks: conceptual clarification. Soc Netw 1(3):215–239CrossRef
Zurück zum Zitat Gilbert A, Levchenko K (2004) Compressing network graphs. In: Proceedings of LinkKDD workshop at the 10th ACM Conference on KDD Gilbert A, Levchenko K (2004) Compressing network graphs. In: Proceedings of LinkKDD workshop at the 10th ACM Conference on KDD
Zurück zum Zitat Gilbert et al (2011) Communities and hierarchical structures in dynamic social networks: analysis and visualization. Soc Netw Anal Min 1(2):83–95 Gilbert et al (2011) Communities and hierarchical structures in dynamic social networks: analysis and visualization. Soc Netw Anal Min 1(2):83–95
Zurück zum Zitat Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in facebook: a case study of unbiased sampling of OSNs. In: Proceedings of IEEE INFOCOM, pp 1–9 Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in facebook: a case study of unbiased sampling of OSNs. In: Proceedings of IEEE INFOCOM, pp 1–9
Zurück zum Zitat Gleich D, Seshadhri C (2012) Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In: Proc of KDD’12, pp 597–605 Gleich D, Seshadhri C (2012) Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In: Proc of KDD’12, pp 597–605
Zurück zum Zitat Goel S, Salganik M (2009) Respondent-driven sampling as markov chain Monte Carlo. Stat Med 17(28):2202–2229MathSciNetCrossRef Goel S, Salganik M (2009) Respondent-driven sampling as markov chain Monte Carlo. Stat Med 17(28):2202–2229MathSciNetCrossRef
Zurück zum Zitat Goldenberg J, Libai B, Muller E (2001) Using complex system analysis to advance marketing theory development: modeling heterogeneity effects on new product growth through stochastic cellular automata. Acad Market Sci Rev 1(9) Goldenberg J, Libai B, Muller E (2001) Using complex system analysis to advance marketing theory development: modeling heterogeneity effects on new product growth through stochastic cellular automata. Acad Market Sci Rev 1(9)
Zurück zum Zitat Goyal S (1996) Interaction structure and social change. J Inst Theor Econ 152:472–495 Goyal S (1996) Interaction structure and social change. J Inst Theor Econ 152:472–495
Zurück zum Zitat Granovetter M (1978) Threshold models of collective behavior. Am J Sociol 83(6):1420–1443 Granovetter M (1978) Threshold models of collective behavior. Am J Sociol 83(6):1420–1443
Zurück zum Zitat Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218 Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Zurück zum Zitat Hubler C, Kriegel H, Borgwardt K, Ghahramani Z. (2008) Metropolis algorithms for representative subgraph sampling. In: ICDM’08, pp 283–292 Hubler C, Kriegel H, Borgwardt K, Ghahramani Z. (2008) Metropolis algorithms for representative subgraph sampling. In: ICDM’08, pp 283–292
Zurück zum Zitat Jeong H, Tombor B, Albert R, Oltvai Z, Barabasi A (2000) The large-scale organization of metabolic networks. Nature 407(6804):651–654CrossRef Jeong H, Tombor B, Albert R, Oltvai Z, Barabasi A (2000) The large-scale organization of metabolic networks. Nature 407(6804):651–654CrossRef
Zurück zum Zitat Kang U, Faloutsos C (2011) Beyond ‘caveman communities’: hubs and spokes for graph compression and mining. In: Proceedings of ICDM’11, pp 300–309 Kang U, Faloutsos C (2011) Beyond ‘caveman communities’: hubs and spokes for graph compression and mining. In: Proceedings of ICDM’11, pp 300–309
Zurück zum Zitat Katz L (1953) A new status index derived from sociometric index. Psychometrika 39–43 Katz L (1953) A new status index derived from sociometric index. Psychometrika 39–43
Zurück zum Zitat Kempe D, Kleinberg J, Tardos E (2005) Influential nodes in a diffusion model for social networks. In: Proceedings of 32nd international colloquium on automata, languages and programming (ICALP) Kempe D, Kleinberg J, Tardos E (2005) Influential nodes in a diffusion model for social networks. In: Proceedings of 32nd international colloquium on automata, languages and programming (ICALP)
Zurück zum Zitat Lancichinetti A, Fortunato S (2009a) Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E 80(1):116–118 Lancichinetti A, Fortunato S (2009a) Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E 80(1):116–118
Zurück zum Zitat Lancichinetti A, Fortunato S (2009b) Community detection algorithms: a comparative analysis. Phys Rev E 80(056117) Lancichinetti A, Fortunato S (2009b) Community detection algorithms: a comparative analysis. Phys Rev E 80(056117)
Zurück zum Zitat Lancichinetti A, Fortunato S, Kertész J (2009) Detecting the overlapping and hierarchical community structure in complex networks. New J Phys 11(033015) Lancichinetti A, Fortunato S, Kertész J (2009) Detecting the overlapping and hierarchical community structure in complex networks. New J Phys 11(033015)
Zurück zum Zitat Langone R, Alzate C, Suykens J (2012) Kernel spectral clustering for community detection in complex networks. In: IEEE WCCI/IJCNN, pp 2596–2603 Langone R, Alzate C, Suykens J (2012) Kernel spectral clustering for community detection in complex networks. In: IEEE WCCI/IJCNN, pp 2596–2603
Zurück zum Zitat Leskovec J, Backstrom L, Kumar R, Tomkins A (2008) Microscopic evolution of social networks. In: KDD’08, pp 462–470 Leskovec J, Backstrom L, Kumar R, Tomkins A (2008) Microscopic evolution of social networks. In: KDD’08, pp 462–470
Zurück zum Zitat Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: KDD’06, pp 631–636 Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: KDD’06, pp 631–636
Zurück zum Zitat Liggett T (1985) Interacting particle systems. Springer, Berlin Liggett T (1985) Interacting particle systems. Springer, Berlin
Zurück zum Zitat MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley symposium on mathematical statistics and probability, pp 281–297 MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley symposium on mathematical statistics and probability, pp 281–297
Zurück zum Zitat Maiya A, Berger-Wolf T (2010) Sampling community structure. In: WWW’10, pp 631–636 Maiya A, Berger-Wolf T (2010) Sampling community structure. In: WWW’10, pp 631–636
Zurück zum Zitat Mall R, Langone R, Suykens J (2013) Kernel spectral clustering for big data networks. Entropy, Special Issue: Big Data, 15(5):1567–1586 Mall R, Langone R, Suykens J (2013) Kernel spectral clustering for big data networks. Entropy, Special Issue: Big Data, 15(5):1567–1586
Zurück zum Zitat Mehler A, Skiena S (2009) Expanding network communities from representative examples. ACM Trans Knowl Discov Data 3(2):1–27CrossRef Mehler A, Skiena S (2009) Expanding network communities from representative examples. ACM Trans Knowl Discov Data 3(2):1–27CrossRef
Zurück zum Zitat Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092CrossRef Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092CrossRef
Zurück zum Zitat Morris S (2000) Contagion. Rev Econ Stud 67(1):57–78 Morris S (2000) Contagion. Rev Econ Stud 67(1):57–78
Zurück zum Zitat Newman M (2006) Finding community structure in networks using eigenvectors of matrices. Phys Rev E 74(3) Newman M (2006) Finding community structure in networks using eigenvectors of matrices. Phys Rev E 74(3)
Zurück zum Zitat Pham M, Klamma R, Jarke M (2011) Development of computer science disciplines: a social network analysis approach. Soc Netw Anal Min 1(4):321–340 Pham M, Klamma R, Jarke M (2011) Development of computer science disciplines: a social network analysis approach. Soc Netw Anal Min 1(4):321–340
Zurück zum Zitat Rabbany R, Takaffoli M, Fagnan J, Zaiane OR, Campello RJGB (2012) Relative validity criteria for community mining algorithms. In: International conference on advances in social networks analysis and mining (ASONAM), pp 258–265 Rabbany R, Takaffoli M, Fagnan J, Zaiane OR, Campello RJGB (2012) Relative validity criteria for community mining algorithms. In: International conference on advances in social networks analysis and mining (ASONAM), pp 258–265
Zurück zum Zitat Rafiei D (2005) Effectively visualizing large networks through sampling. In Proceedings of VIS 05, pp 375–382 Rafiei D (2005) Effectively visualizing large networks through sampling. In Proceedings of VIS 05, pp 375–382
Zurück zum Zitat Rosvall M, Bergstrom C (2008) Maps of random walks on complex networks reveal community structure. PNAS 105:1118–1123 Rosvall M, Bergstrom C (2008) Maps of random walks on complex networks reveal community structure. PNAS 105:1118–1123
Zurück zum Zitat Ryan B, Gross N (1943) The diffusion of hybrid seed corn in two Iowa communities. Rural Sociol 8:15–24 Ryan B, Gross N (1943) The diffusion of hybrid seed corn in two Iowa communities. Rural Sociol 8:15–24
Zurück zum Zitat Saravanan M, Prasad GKK, Suganthi D (2011) Analyzing and labeling telecom communities using structural properties. Soc Netw Anal Min 1(4):271–286CrossRef Saravanan M, Prasad GKK, Suganthi D (2011) Analyzing and labeling telecom communities using structural properties. Soc Netw Anal Min 1(4):271–286CrossRef
Zurück zum Zitat Shelling T (1978) Micromotives and macrobehavior. Norton, New York Shelling T (1978) Micromotives and macrobehavior. Norton, New York
Zurück zum Zitat Wu J, Xiong H, Chen J (2009) Adapting the right measures for k-means clustering. In: Proceedings of SIGKDD’09, pp 877–886 Wu J, Xiong H, Chen J (2009) Adapting the right measures for k-means clustering. In: Proceedings of SIGKDD’09, pp 877–886
Metadaten
Titel
FURS: Fast and Unique Representative Subset selection retaining large-scale community structure
verfasst von
Raghvendra Mall
Rocco Langone
Johan A. K. Suykens
Publikationsdatum
01.12.2013
Verlag
Springer Vienna
Erschienen in
Social Network Analysis and Mining / Ausgabe 4/2013
Print ISSN: 1869-5450
Elektronische ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-013-0144-6

Weitere Artikel der Ausgabe 4/2013

Social Network Analysis and Mining 4/2013 Zur Ausgabe