Skip to main content
Erschienen in: Knowledge and Information Systems 1/2015

01.01.2015 | Regular Paper

Defining and evaluating network communities based on ground-truth

verfasst von: Jaewon Yang, Jure Leskovec

Erschienen in: Knowledge and Information Systems | Ausgabe 1/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task due to a plethora of definitions of network communities, intractability of methods for detecting them, and the issues with evaluation which stem from the lack of a reliable gold-standard ground-truth. In this paper, we distinguish between structural and functional definitions of network communities. Structural definitions of communities are based on connectivity patterns, like the density of connections between the community members, while functional definitions are based on (often unobserved) common function or role of the community members in the network. We argue that the goal of network community detection is to extract functional communities based on the connectivity structure of the nodes in the network. We then identify networks with explicitly labeled functional communities to which we refer as ground-truth communities. In particular, we study a set of 230 large real-world social, collaboration, and information networks where nodes explicitly state their community memberships. For example, in social networks, nodes explicitly join various interest-based social groups. We use such social groups to define a reliable and robust notion of ground-truth communities. We then propose a methodology, which allows us to compare and quantitatively evaluate how different structural definitions of communities correspond to ground-truth functional communities. We study 13 commonly used structural definitions of communities and examine their sensitivity, robustness and performance in identifying the ground-truth. We show that the 13 structural definitions are heavily correlated and naturally group into four classes. We find that two of these definitions, Conductance and Triad participation ratio, consistently give the best performance in identifying ground-truth communities. We also investigate a task of detecting communities given a single seed node. We extend the local spectral clustering algorithm into a heuristic parameter-free community detection method that easily scales to networks with more than 100 million nodes. The proposed method achieves 30 % relative improvement over current local clustering methods.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
1.
Zurück zum Zitat Abrahao BD, Soundarajan S, Hopcroft JE, Kleinberg R (2012) On the separability of structural classes of communities. In KDD ’12: proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 624–632 Abrahao BD, Soundarajan S, Hopcroft JE, Kleinberg R (2012) On the separability of structural classes of communities. In KDD ’12: proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 624–632
2.
Zurück zum Zitat Ahn Y-Y, Bagrow JP, Lehmann S (2010) Link communities reveal multi-scale complexity in networks. Nature 466:761–764CrossRef Ahn Y-Y, Bagrow JP, Lehmann S (2010) Link communities reveal multi-scale complexity in networks. Nature 466:761–764CrossRef
3.
Zurück zum Zitat Andersen R, Chung F, Lang K (2006) Local graph partitioning using PageRank vectors. In FOCS ’06: proceedings of the 47th annual IEEE symposium on foundations of computer science, pp 475–486 Andersen R, Chung F, Lang K (2006) Local graph partitioning using PageRank vectors. In FOCS ’06: proceedings of the 47th annual IEEE symposium on foundations of computer science, pp 475–486
4.
Zurück zum Zitat Andersen R, Lang K (2006) Communities from seed sets. In: WWW ’06 proceedings of the 15th international conference on, World Wide Web, pp 223–232 Andersen R, Lang K (2006) Communities from seed sets. In: WWW ’06 proceedings of the 15th international conference on, World Wide Web, pp 223–232
5.
Zurück zum Zitat Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth and evolution. In KDD ’06: proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 44–54 Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth and evolution. In KDD ’06: proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 44–54
6.
Zurück zum Zitat Danon L, Duch J, Diaz-Guilera A, Arenas A (2005) Comparing community structure identification. J Stat Mech Theory Exp 29(09):P09008 Danon L, Duch J, Diaz-Guilera A, Arenas A (2005) Comparing community structure identification. J Stat Mech Theory Exp 29(09):P09008
7.
Zurück zum Zitat Dhillon I, Guan Y, Kulis B (2007) Weighted graph cuts without eigenvectors: a multilevel approach. IEEE Trans Pattern Anal Mach Intell 29(11):1944–1957CrossRef Dhillon I, Guan Y, Kulis B (2007) Weighted graph cuts without eigenvectors: a multilevel approach. IEEE Trans Pattern Anal Mach Intell 29(11):1944–1957CrossRef
8.
Zurück zum Zitat Feld SL (1981) The focused organization of social ties. Am J Sociol 86(5):1015–1035CrossRef Feld SL (1981) The focused organization of social ties. Am J Sociol 86(5):1015–1035CrossRef
9.
Zurück zum Zitat Flake G, Lawrence S, Giles C (2000) Efficient identification of web communities. In KDD ’00: proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining, pp 150–160 Flake G, Lawrence S, Giles C (2000) Efficient identification of web communities. In KDD ’00: proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining, pp 150–160
11.
Zurück zum Zitat Fortunato S, Barthélemy M (2007) Resolution limit in community detection. Proc Nat Acad Sci USA 104(1):36–41CrossRef Fortunato S, Barthélemy M (2007) Resolution limit in community detection. Proc Nat Acad Sci USA 104(1):36–41CrossRef
12.
13.
Zurück zum Zitat Gleich DF, Seshadhri C (2012) Neighborhoods are good communities. In KDD ’12: proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 597–605 Gleich DF, Seshadhri C (2012) Neighborhoods are good communities. In KDD ’12: proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 597–605
14.
Zurück zum Zitat Granovetter MS (1973) The strength of weak ties. Am J Sociol 78:1360–1380CrossRef Granovetter MS (1973) The strength of weak ties. Am J Sociol 78:1360–1380CrossRef
15.
Zurück zum Zitat Kairam S, Wang D, Leskovec J (2012) The life and death of online groups: predicting group growth and longevity. In WSDM ’12: ACM international conference on web search and data mining Kairam S, Wang D, Leskovec J (2012) The life and death of online groups: predicting group growth and longevity. In WSDM ’12: ACM international conference on web search and data mining
16.
Zurück zum Zitat Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20:359–392CrossRefMathSciNet Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20:359–392CrossRefMathSciNet
17.
Zurück zum Zitat Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Res Logist Q 2:83–97CrossRef Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Res Logist Q 2:83–97CrossRef
18.
Zurück zum Zitat Leskovec J, Adamic L, Huberman B (2007) The dynamics of viral marketing. ACM Trans Web 1(1):5CrossRef Leskovec J, Adamic L, Huberman B (2007) The dynamics of viral marketing. ACM Trans Web 1(1):5CrossRef
19.
Zurück zum Zitat Leskovec J, Lang K, Mahoney M (2010) Empirical comparison of algorithms for network community detection. In WWW ’10: proceedings of the 19th international conference on World Wide Web Leskovec J, Lang K, Mahoney M (2010) Empirical comparison of algorithms for network community detection. In WWW ’10: proceedings of the 19th international conference on World Wide Web
20.
Zurück zum Zitat Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123CrossRefMathSciNetMATH Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123CrossRefMathSciNetMATH
21.
Zurück zum Zitat Lin W, Kong X, Yu PS, Wu Q, Jia Y, Li C (2012) Community detection in incomplete information networks. In WWW ’12: proceedings of the 21st international conference on, World Wide Web, pp 341–350 Lin W, Kong X, Yu PS, Wu Q, Jia Y, Li C (2012) Community detection in incomplete information networks. In WWW ’12: proceedings of the 21st international conference on, World Wide Web, pp 341–350
22.
Zurück zum Zitat Meilǎ M (2005) Comparing clusterings: an axiomatic view. In ICML ’05: proceedings of the 22nd international conference on machine learning. New York, NY, USA, pp 577–584 Meilǎ M (2005) Comparing clusterings: an axiomatic view. In ICML ’05: proceedings of the 22nd international conference on machine learning. New York, NY, USA, pp 577–584
23.
Zurück zum Zitat Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In IMC ’07: proceedings of the 7th ACM SIGCOMM conference on internet, measurement, pp 29–42 Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In IMC ’07: proceedings of the 7th ACM SIGCOMM conference on internet, measurement, pp 29–42
24.
Zurück zum Zitat Newman M (2006) Modularity and community structure in networks. Proc Nat Acad Sci USA 103(23):8577–8582CrossRef Newman M (2006) Modularity and community structure in networks. Proc Nat Acad Sci USA 103(23):8577–8582CrossRef
25.
Zurück zum Zitat Newman M, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113CrossRef Newman M, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113CrossRef
26.
Zurück zum Zitat Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043):814–818CrossRef Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043):814–818CrossRef
27.
Zurück zum Zitat Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D (2004) Defining and identifying communities in networks. Proc Nat Acad Sci USA 101(9):2658–2663CrossRef Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D (2004) Defining and identifying communities in networks. Proc Nat Acad Sci USA 101(9):2658–2663CrossRef
28.
Zurück zum Zitat Ren Y, Kraut R, Kiesler S (2007) Applying common identity and bond theory to design of online communities. Organ Stud 28(3):377–408CrossRef Ren Y, Kraut R, Kiesler S (2007) Applying common identity and bond theory to design of online communities. Organ Stud 28(3):377–408CrossRef
30.
Zurück zum Zitat Shi C, Yu PS, Cai Y, Yan Z, Wu B (2011) On selection of objective functions in multi-objective community detection. In CIKM ’11: proceedings of the 20th ACM international conference on information and, knowledge management, pp 2301–2304 Shi C, Yu PS, Cai Y, Yan Z, Wu B (2011) On selection of objective functions in multi-objective community detection. In CIKM ’11: proceedings of the 20th ACM international conference on information and, knowledge management, pp 2301–2304
31.
Zurück zum Zitat Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905CrossRef Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905CrossRef
32.
Zurück zum Zitat Spielman D, Teng S-H (2004) Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC ’04: proceedings of the 36th annual ACM symposium on theory of computing, pp 81–90 Spielman D, Teng S-H (2004) Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC ’04: proceedings of the 36th annual ACM symposium on theory of computing, pp 81–90
33.
Zurück zum Zitat Sun Y, Yu Y, Han J (2009) Ranking-based clustering of heterogeneous information networks with star network schema. In KDD ’09: proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 797–806 Sun Y, Yu Y, Han J (2009) Ranking-based clustering of heterogeneous information networks with star network schema. In KDD ’09: proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 797–806
34.
Zurück zum Zitat von Luxburg U (2010) Clustering stability: an overview. Found Trends Mach Learn 2(3):235–274 von Luxburg U (2010) Clustering stability: an overview. Found Trends Mach Learn 2(3):235–274
35.
Zurück zum Zitat Watts D, Strogatz S (1998) Collective dynamics of small-world networks. Nature 393:440–442CrossRef Watts D, Strogatz S (1998) Collective dynamics of small-world networks. Nature 393:440–442CrossRef
36.
Zurück zum Zitat Xie J, Kelley S, Szymanski BK (2013) Overlapping community detection in networks: the state of the art and comparative study. ACM Comput Surv 45(4). Art no 43 Xie J, Kelley S, Szymanski BK (2013) Overlapping community detection in networks: the state of the art and comparative study. ACM Comput Surv 45(4). Art no 43
37.
Zurück zum Zitat Yang J, Leskovec J (2012) Community-affiliation graph model for overlapping network community detection. In ICDM ’12: proceedings of the 2012 IEEE international conference on data mining, pp 1170–1175 Yang J, Leskovec J (2012) Community-affiliation graph model for overlapping network community detection. In ICDM ’12: proceedings of the 2012 IEEE international conference on data mining, pp 1170–1175
38.
Zurück zum Zitat Yang J, Leskovec J (2012) Defining and evaluating network communities based on ground-truth. In ICDM ’12: proceedings of the 2012 IEEE international conference on data mining, pp 745–754 Yang J, Leskovec J (2012) Defining and evaluating network communities based on ground-truth. In ICDM ’12: proceedings of the 2012 IEEE international conference on data mining, pp 745–754
39.
Zurück zum Zitat Yang J, Leskovec J (2013) Overlapping community detection at scale: a non-negative factorization approach. In WSDM ’13: proceedings of the sixth ACM international conference on web search and data mining, pp 587–596 Yang J, Leskovec J (2013) Overlapping community detection at scale: a non-negative factorization approach. In WSDM ’13: proceedings of the sixth ACM international conference on web search and data mining, pp 587–596
40.
Zurück zum Zitat Yang J, Leskovec J (2013) Structure and overlaps of communities in networks. ACM Trans Intell Syst Technol (to appear) Yang J, Leskovec J (2013) Structure and overlaps of communities in networks. ACM Trans Intell Syst Technol (to appear)
Metadaten
Titel
Defining and evaluating network communities based on ground-truth
verfasst von
Jaewon Yang
Jure Leskovec
Publikationsdatum
01.01.2015
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 1/2015
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-013-0693-z

Weitere Artikel der Ausgabe 1/2015

Knowledge and Information Systems 1/2015 Zur Ausgabe