ABSTRACT
Nodes in real-world networks, such as social, information or technological networks, organize into communities where edges appear with high concentration among the members of the community. Identifying communities in networks has proven to be a challenging task mainly due to a plethora of definitions of a community, intractability of algorithms, issues with evaluation and the lack of a reliable gold-standard ground-truth.
We study a set of 230 large social, collaboration and information networks where nodes explicitly define group memberships. We use these groups to define the notion of ground-truth communities. We then propose a methodology which allows us to compare and quantitatively evaluate different definitions of network communities on a large scale. We choose 13 commonly used definitions of network communities and examine their quality, sensitivity and robustness. We show that the 13 definitions naturally group into four classes. We find that two of these definitions, Conductance and Triad-participation-ratio, consistently give the best performance in identifying ground-truth communities.
- R. Andersen, F. Chung, and K. Lang. Local graph partitioning using PageRank vectors. In FOCS '06, pages 475--486, 2006. Google ScholarDigital Library
- L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD '06, pages 44--54, 2006. Google ScholarDigital Library
- L. Danon, J. Duch, A. Diaz-Guilera, and A. Arenas. Comparing community structure identification. Journal of Stat. Mech.: Theory and Experiment, 29(09): P09008, 2005.Google ScholarCross Ref
- I. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on PAMI, 29(11): 1944--1957, 2007. Google ScholarDigital Library
- S. L. Feld. The focused organization of social ties. American Journal of Sociology, 86(5): 1015--1035, 1981.Google ScholarCross Ref
- G. Flake, S. Lawrence, and C. Giles. Efficient identification of web communities. In KDD '00, pages 150--160, 2000. Google ScholarDigital Library
- S. Fortunato. Community detection in graphs. Physics Reports, 486(3--5): 75--174, 2010.Google Scholar
- S. Fortunato and M. Barthélemy. Resolution limit in community detection. PNAS, 104(1): 36--41, 2007.Google ScholarCross Ref
- M. Girvan and M. Newman. Community structure in social and biological networks. PNAS, 99(12): 7821--7826, 2002.Google ScholarCross Ref
- M. S. Granovetter. The strength of weak ties. American Journal of Sociology, 78: 1360--1380, 1973.Google ScholarCross Ref
- R. Guimerà, M. Sales-Pardo, and L. Amaral. Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70: 025101, 2004.Google ScholarCross Ref
- S. Kairam, D. Wang, and J. Leskovec. The life and death of online groups: Predicting group growth and longevity. In WSDM '12, 2012. Google ScholarDigital Library
- G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20: 359--392, 1998. Google ScholarDigital Library
- H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistic Quarterly, 2: 83--97, 1955.Google ScholarCross Ref
- J. Leskovec, L. Adamic, and B. Huberman. The dynamics of viral marketing. ACM TWEB, 1(1), 2007. Google ScholarDigital Library
- J. Leskovec, K. Lang, and M. Mahoney. Empirical comparison of algorithms for network community detection. In WWW '10, 2010. Google ScholarDigital Library
- J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6, 2009.Google Scholar
- A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In IMC '07, pages 29--42, 2007. Google ScholarDigital Library
- M. Newman. Modularity and community structure in networks. PNAS, 103(23): 8577--8582, 2006.Google ScholarCross Ref
- M. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 2004.Google ScholarCross Ref
- G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043): 814--818, 2005.Google ScholarCross Ref
- F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identifying communities in networks. PNAS, 101(9): 2658--2663, 2004.Google ScholarCross Ref
- S. Schaeffer. Graph clustering. Computer Science Review, 1(1): 27--64, 2007. Google ScholarDigital Library
- J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transcations on PAMI, 2000. Google ScholarDigital Library
- D. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC '04, pages 81--90, 2004. Google ScholarDigital Library
- D. Watts and S. Strogatz. Collective dynamics of small-world networks. Nature, 393: 440--442, 1998.Google ScholarCross Ref
- W. Zachary. An information flow model for conflict and fission in small groups. J. of Anthropological Research, 1977.Google Scholar
Index Terms
- Defining and evaluating network communities based on ground-truth
Recommendations
Overlapping community detection at scale: a nonnegative matrix factorization approach
WSDM '13: Proceedings of the sixth ACM international conference on Web search and data miningNetwork communities represent basic structures for understanding the organization of real-world networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than ...
Structure and Overlaps of Ground-Truth Communities in Networks
Special Issue on Linking Social Granularity and FunctionsOne of the main organizing principles in real-world networks is that of network communities, where sets of nodes organize into densely linked clusters. Even though detection of such communities is of great interest, understanding the structure ...
Defining and evaluating network communities based on ground-truth
Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task due to a plethora of ...
Comments