Abstract
One of the main organizing principles in real-world networks is that of network communities, where sets of nodes organize into densely linked clusters. Even though detection of such communities is of great interest, understanding the structure communities in large networks remains relatively limited. In particular, due to the unavailability of labeled ground-truth data, it was traditionally very hard to develop accurate models of network community structure.
Here we use six large social, collaboration, and information networks where nodes explicitly state their ground-truth community memberships. For example, nodes in social networks join into explicitly defined interest based groups, and we use such groups as explicitly labeled ground-truth communities. We use such ground-truth communities to study their structural signatures by analyzing how ground-truth communities emerge in networks and how they overlap. We observe some surprising phenomena. First, ground-truth communities contain high-degree hub nodes that reside in community overlaps and link to most of the members of the community. Second, the overlaps of communities are more densely connected than the non-overlapping parts of communities. We show that this in contrast to the conventional wisdom that community overlaps are more sparsely connected than the non-overlapping parts themselves. We then show that many existing models of network communities do not capture dense community overlaps. This in turn means that most present models and community detection methods confuse overlaps as separate communities. In contrast, we present the community-affiliation graph model (AGM), a conceptual model of network community structure. We demonstrate that AGM reliably captures the overall structure of networks as well as the overlapping and hierarchical nature of network communities.
- Bruno D. Abrahao, Sucheta Soundarajan, John E. Hopcroft, and Robert Kleinberg. 2012. On the separability of structural classes of communities. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'12). 624--632. Google ScholarDigital Library
- Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann. 2010. Link communities reveal multi-scale complexity in networks. Nature 466.Google Scholar
- E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. 2007. Mixed membership stochastic blockmodels. J. Machine Learn. Res. 9, 1981--2014. Google ScholarDigital Library
- L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. 2006. Group formation in large social networks: Membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'06). 44--54. Google ScholarDigital Library
- Brian Ball, Brian Karrer, and M. E. J. Newman. 2011. Efficient and principled method for detecting communities in networks. Phys. Rev. E 84.Google ScholarCross Ref
- A.-L. Barabási and Z. N. Oltvai. 2004. Network biology: Understanding the cell's functional organization. Nature Rev. Genetics 5, 2, 101--113.Google ScholarCross Ref
- Jeffrey Baumes, Mark Goldberg, and Malik Magdon-ismail. 2005. Efficient identification of overlapping communities. In Proceedings of the IEEE International Conference on Intelligence and Security Informatics (ISI'05). 27--36. Google ScholarDigital Library
- J. Baumes, M. Goldberg, M. Magdon-Ismail, and A. Wallace. 2004. Discovering hidden groups in communication networks. In Proceedings of the 2nd NSF/NIJ Symposium on Intelligence and Security Informatics (ISI'04).Google Scholar
- David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. J. Machine Learn. Res. 3, 993--1022. Google ScholarDigital Library
- R. L. Breiger. 1974. The duality of persons and groups. Social Forces 53, 2, 181--190.Google ScholarCross Ref
- D. Chakrabarti and C. Faloutsos. 2006. Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38, 1, 2. Google ScholarDigital Library
- D. Chakrabarti, Y. Zhan, and C. Faloutsos. 2004. R-MAT: A recursive model for graph mining. In Proceedings of the 4th SIAM International Conference on Data Mining (SDM'04).Google Scholar
- A. Clauset, M. E. J. Newman, and C. Moore. 2004. Finding community structure in very large networks. Phys. Rev. E 70.Google ScholarCross Ref
- V. Colizza, A. Flammini, M. Serrano, and A. Vespignani. 2005. Characterization and modeling of protein protein interaction networks. Physica A Stat. Mech. Appl. 352, 1--27.Google ScholarCross Ref
- V. Colizza, A. Flammini, M. Serrano, and A. Vespignani. 2006. Detecting rich-club ordering in complex networks. Nature Phys. 2, 110--115.Google ScholarCross Ref
- I. S. Dhillon, Y. Guan, and B. Kulis. 2007. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Trans. Pattern Anal. Machine Intell. 29, 11, 1944--1957. Google ScholarDigital Library
- T. S. Evans and R. Lambiotte. 2009. Line graphs, link partitions, and overlapping communities. Phys. Rev. E 80.Google ScholarCross Ref
- M. Faloutsos, P. Faloutsos, and C. Faloutsos. 1999. On power-law relationships of the internet topology. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM'99). 251--262. Google ScholarDigital Library
- Illes J. Farkas, Imre Derényi, Albert-László Barabási, and Tamas Vicsek. 2001. Spectra of ‘Real-World’ graphs: Beyond the semi-circle law. Phys. Rev. E 64.Google ScholarCross Ref
- Scott L. Feld. 1981. The focused organization of social ties. Amer. J. Sociol. 86, 5, 1015--1035.Google ScholarCross Ref
- M. Fiedler. 1973. Algebraic connectivity of graphs. Czechoslovak Math. J. 23, 98, 298--305.Google ScholarCross Ref
- G. W. Flake, S. Lawrence, and C. L. Giles. 2000. Efficient identification of Web communities. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'00). 150--160. Google ScholarDigital Library
- S. Fortunato. 2010. Community detection in graphs. Physics Reports 486, 3--5, 75--174.Google ScholarCross Ref
- S. Fortunato and M. Barthélemy. 2007. Resolution limit in community detection. Proc. Nat. Acad. Sci. U.S.A. 104, 1, 36--41.Google ScholarCross Ref
- M. Girvan and M. E. J. Newman. 2002. Community structure in social and biological networks. Proc. Nat. Acad. Sci. U.S.A. 99, 12, 7821--7826.Google ScholarCross Ref
- David F. Gleich and C. Seshadhri. 2012. Neighborhoods are good communities. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'12).Google Scholar
- M. S. Granovetter. 1973. The strength of weak ties. Amer. J. Sociol. 78, 1360--1380.Google ScholarCross Ref
- S. Gregory. 2010. Finding overlapping communities in networks by label propagation. New J. Physics 12, 10.Google ScholarCross Ref
- S. Guattery and G. L. Miller. 1998. On the quality of spectral separators. SIAM J. Matrix Anal. Appl. 19, 701--719. Google ScholarDigital Library
- Keith Henderson and Tina Eliassi-Rad. 2009. Applying latent dirichlet allocation to group discovery in large graphs. In Proceedings of the ACM Symposium on Applied Computing (SAC'09). 1456--1461. Google ScholarDigital Library
- Paul W. Holland, Kathryn Laskey, and Samuel Leinhardt. 1983. Stochastic blockmodels: First steps. Social Netw. 5, 2, 109--137.Google ScholarCross Ref
- J. Hopcroft, O. Khan, B. Kulis, and B. Selman. 2003. Natural communities in large linked networks. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'03). 541--546. Google ScholarDigital Library
- S. Kairam, D. Wang, and J. Leskovec. 2012. The life and death of online groups: Predicting group growth and longevity. In Proceedings of the ACM International Conference on Web Search and Data Minig (WSDM'12). Google ScholarDigital Library
- R. Kannan, S. Vempala, and A. Vetta. 2004. On clusterings: Good, bad and spectral. J. ACM 51, 3, 497--515. Google ScholarDigital Library
- B. Karrer, E. Levina, and M. E. J. Newman. 2008. Robustness of community structure in networks. Phys. Rev. E 77.Google ScholarCross Ref
- Brian Karrer and M. E. J. Newman. 2011. Stochastic blockmodels and community structure in networks. Phys. Rev. E 83.Google ScholarCross Ref
- G. Karypis and V. Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 359--392. Google ScholarDigital Library
- R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. 2000. Stochastic models for the Web graph. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS'00). 57. Google ScholarDigital Library
- Andrea Lancichinetti and Santo Fortunato. 2009. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys. Rev. E 80, 1.Google ScholarCross Ref
- Silvio Lattanzi and D. Sivakumar. 2009. Affiliation networks. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC'09). 427--434. Google ScholarDigital Library
- Conrad Lee, Fergal Reid, Aaron McDaid, and Neil Hurley. 2010. Detecting highly overlapping community structure by greedy clique expansion. In Proceedings of the 4th International Workshop on Advances in Social Network Mining and Analysis (SNA-KDD'10).Google Scholar
- J. Leskovec, L. A. Adamic, and B. A. Huberman. 2007a. The dynamics of viral marketing. ACM Trans. Web 1, 1. Google ScholarDigital Library
- J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proceeding of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD'05). 177--187. Google ScholarDigital Library
- J. Leskovec, J. Kleinberg, and C. Faloutsos. 2007b. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1, 1. Google ScholarDigital Library
- J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6, 1, 29--123.Google ScholarCross Ref
- J. Leskovec, K. Lang, and M. Mahoney. 2010. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th International Conference on World Wide Web (WWW'10). Google ScholarDigital Library
- Wangqun Lin, Xiangnan Kong, Philip S. Yu, Quanyuan Wu, Yan Jia, and Chuan Li. 2012. Community detection in incomplete information networks. In Proceedings of the 21st International Conference on World Wide Web (WWW'12). ACM, New York, NY, 341--350. Google ScholarDigital Library
- J. McAuley and J. Leskovec. 2012. Learning to discover social circles in ego networks. In Proceedings of the 26th Annual Conference on Advances in Neural Information Processing Systems (NIPS'12). 548--556.Google Scholar
- Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (IMC'07). 29--42. Google ScholarDigital Library
- M. Mitzenmacher. 2004. A brief history of generative models for power law and lognormal distributions. Internet Math. 1, 2, 226--251.Google ScholarCross Ref
- M. Molloy and B. Reed. 1995. A critical point for random graphs with a given degree sequence. Random Struct. Algorit. 6, 161--180. Google ScholarDigital Library
- Morten Mørup, Mikkel N. Schmidt, and Lars Kai Hansen. 2011. Infinite multiple membership relational modeling for complex networks. CoRR abs/1101.5097.Google Scholar
- Raj Rao Nadakuditi and M. E. J. Newman. 2012. Graph spectra and the detectability of community structure in networks. Phys. Rev. Lett. 108.Google ScholarCross Ref
- Tamás Nepusz, Haiyuan Yu, and Alberto Paccanaro. 2012. Detecting overlapping protein complexes in protein-protein interaction networks. Nature Methods 9, 471--472.Google ScholarCross Ref
- M. E. J. Newman. 2006. Modularity and community structure in networks. Proc. Nat. Acad. Sci. U.S.A. 103, 23, 8577--8582.Google ScholarCross Ref
- M. E. J. Newman and M. Girvan. 2004. Finding and evaluating community structure in networks. Phys. Rev. E 69.Google Scholar
- G. Palla, I. Derényi, I. Farkas, and T. Vicsek. 2005. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 7043, 814--818.Google Scholar
- S. Papadopoulos, Y. Kompatsiaris, A. Vakali, and P. Spyridonos. 2012. Community detection in social media. Data Mining Knowl. Discov. 24, 3, 515--554. Google ScholarDigital Library
- Ioannis Psorakis, Stephen Roberts, Mark Ebden, and Ben Sheldon. 2011. Overlapping community detection using Bayesian non-negative matrix factorization. Phys. Rev. E 83, 6.Google ScholarCross Ref
- Y. Qi, J. K. Seetharaman, and Z. B. Joseph. 2005. Random forest similarity for protein-protein interaction prediction from multiple sources. In Proceedings of the Pacific Symposium on Biocomputing.Google Scholar
- F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. 2004. Defining and identifying communities in networks. Proc. Nat. Acad. Sci. U.S.A. 101, 9, 2658--2663.Google ScholarCross Ref
- M. Sales-Pardo, R. Guimerà, A. A. Moreira, and L. A. N. Amaral. 2007. Extracting the hierarchical organization of complex systems. Proc. Nat. Acad. Sci. U.S.A. 104, 18874--18874.Google ScholarCross Ref
- E. N. Sawardecker, M. Sales-Pardo, and L. A. N. Amaral. 2009. Detection of node group membership in networks with group overlap. Euro. Phys. J. B 67, 277--284.Google Scholar
- S. E. Schaeffer. 2007. Graph Clustering. Comput. Sci. Rev. 1, 1, 27--64. Google ScholarDigital Library
- C. Seshadhri, Tamara G. Kolda, and Ali Pinar. 2012. Community structure and scale-free collections of Erdos-Renyi graphs. Phys. Rev. E 85.Google ScholarCross Ref
- Huawei Shen, Xueqi Cheng, Kai Cai, and Mao-Bin Hu. 2009. Detect overlapping and hierarchical community structure in networks. Physica A: Stat. Mech. Appl. 388, 8, 1706--1712.Google ScholarCross Ref
- J. Shi and J. Malik. 2000. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Machine Intell. 22, 8, 888--905. Google ScholarDigital Library
- Georg Simmel. 1964. Conflict and the Web of Group Affiliations. Simon and Schuster.Google Scholar
- D. A. Spielman and S.-H. Teng. 2007. Spectral partitioning works: Planar graphs and finite element meshes. Linear Algebra Appl. 421, 2--3, 284--305.Google ScholarCross Ref
- Yizhou Sun, Yintao Yu, and Jiawei Han. 2009. Ranking-based clustering of heterogeneous information networks with star network schema. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'09). 797--806. Google ScholarDigital Library
- Chayant Tantipathananandh, Tanya Berger-Wolf, and David Kempe. 2007. A framework for community identification in dynamic social networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'07). 717--726. Google ScholarDigital Library
- Charalampos E. Tsourakakis. 2008. Fast counting of triangles in large real networks without counting: Algorithms and laws. In Proceedings of the IEEE International Conference on Data Mining (ICDM'08). 608--617. Google ScholarDigital Library
- U. von Luxburg. 2007. A tutorial on spectral clustering. Stat. Comput. 17, 395--416. Google ScholarDigital Library
- Chunyan Wang, Mao Ye, and Wang-Chien Lee. 2012. From face-to-face gathering to social structure. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM'12). 465--474. Google ScholarDigital Library
- D. J. Watts and S. H. Strogatz. 1998. Collective dynamics of small-world networks. Nature 393, 440--442.Google ScholarCross Ref
- Jaewon Yang and Jure Leskovec. 2012a. Community-affiliation graph model for overlapping network community detection. In Proceedings of the IEEE International Conference on Data Mining (ICDM'12). 1170--1175. Google ScholarDigital Library
- Jaewon Yang and Jure Leskovec. 2012b. Defining and evaluating network communities based on ground-truth. In Proceedings of the IEEE International Conference on Data Mining (ICDM'12). 745--754. Google ScholarDigital Library
- Jaewon Yang and Jure Leskovec. 2012c. Structure and overlaps of communities in large networks. In Proceedings of the 6th International Workshop on Advances in Social Network Mining and Analysis (SNA-KDD'12).Google Scholar
- Jaewon Yang and Jure Leskovec. 2013a. Defining and evaluating network communities based on ground-truth. Knowl. Inform. Syst. J.Google Scholar
- J. Yang and J. Leskovec. 2013b. Overlapping community detection at scale: A non-negative factorization approach. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM'13). 587--596. Google ScholarDigital Library
- E. Zheleva, H. Sharara, and L. Getoor. 2009. Co-evolution of social and affiliation networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'09). 1007--1016. Google ScholarDigital Library
- Ding Zhou, Eren Manavoglu, Jia Li, C. Lee Giles, and Hongyuan Zha. 2006. Probabilistic models for discovering e-communities. In Proceedings of the 15th International Conference on World Wide Web (WWW'06). 173--182. Google ScholarDigital Library
Index Terms
- Structure and Overlaps of Ground-Truth Communities in Networks
Recommendations
Overlapping community detection at scale: a nonnegative matrix factorization approach
WSDM '13: Proceedings of the sixth ACM international conference on Web search and data miningNetwork communities represent basic structures for understanding the organization of real-world networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than ...
Defining and evaluating network communities based on ground-truth
MDS '12: Proceedings of the ACM SIGKDD Workshop on Mining Data SemanticsNodes in real-world networks, such as social, information or technological networks, organize into communities where edges appear with high concentration among the members of the community. Identifying communities in networks has proven to be a ...
Defining and evaluating network communities based on ground-truth
Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task due to a plethora of ...
Comments