research-article

Defining and evaluating network communities based on ground-truth

Authors:
Jaewon Yang

Stanford University

Stanford University
View Profile

,
Jure Leskovec

Stanford University

Stanford University
View Profile

MDS '12: Proceedings of the ACM SIGKDD Workshop on Mining Data SemanticsAugust 2012Article No.: 3Pages 1–8https://doi.org/10.1145/2350190.2350193

Published:12 August 2012Publication History

MDS '12: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics

Pages 1–8

ABSTRACT

Nodes in real-world networks, such as social, information or technological networks, organize into communities where edges appear with high concentration among the members of the community. Identifying communities in networks has proven to be a challenging task mainly due to a plethora of definitions of a community, intractability of algorithms, issues with evaluation and the lack of a reliable gold-standard ground-truth.

We study a set of 230 large social, collaboration and information networks where nodes explicitly define group memberships. We use these groups to define the notion of ground-truth communities. We then propose a methodology which allows us to compare and quantitatively evaluate different definitions of network communities on a large scale. We choose 13 commonly used definitions of network communities and examine their quality, sensitivity and robustness. We show that the 13 definitions naturally group into four classes. We find that two of these definitions, Conductance and Triad-participation-ratio, consistently give the best performance in identifying ground-truth communities.

References

R. Andersen, F. Chung, and K. Lang. Local graph partitioning using PageRank vectors. In FOCS '06, pages 475--486, 2006. Google ScholarDigital Library
L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD '06, pages 44--54, 2006. Google ScholarDigital Library
L. Danon, J. Duch, A. Diaz-Guilera, and A. Arenas. Comparing community structure identification. Journal of Stat. Mech.: Theory and Experiment, 29(09): P09008, 2005.Google ScholarCross Ref
I. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on PAMI, 29(11): 1944--1957, 2007. Google ScholarDigital Library
S. L. Feld. The focused organization of social ties. American Journal of Sociology, 86(5): 1015--1035, 1981.Google ScholarCross Ref
G. Flake, S. Lawrence, and C. Giles. Efficient identification of web communities. In KDD '00, pages 150--160, 2000. Google ScholarDigital Library
S. Fortunato. Community detection in graphs. Physics Reports, 486(3--5): 75--174, 2010.Google Scholar
S. Fortunato and M. Barthélemy. Resolution limit in community detection. PNAS, 104(1): 36--41, 2007.Google ScholarCross Ref
M. Girvan and M. Newman. Community structure in social and biological networks. PNAS, 99(12): 7821--7826, 2002.Google ScholarCross Ref
M. S. Granovetter. The strength of weak ties. American Journal of Sociology, 78: 1360--1380, 1973.Google ScholarCross Ref
R. Guimerà, M. Sales-Pardo, and L. Amaral. Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70: 025101, 2004.Google ScholarCross Ref
S. Kairam, D. Wang, and J. Leskovec. The life and death of online groups: Predicting group growth and longevity. In WSDM '12, 2012. Google ScholarDigital Library
G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20: 359--392, 1998. Google ScholarDigital Library
H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistic Quarterly, 2: 83--97, 1955.Google ScholarCross Ref
J. Leskovec, L. Adamic, and B. Huberman. The dynamics of viral marketing. ACM TWEB, 1(1), 2007. Google ScholarDigital Library
J. Leskovec, K. Lang, and M. Mahoney. Empirical comparison of algorithms for network community detection. In WWW '10, 2010. Google ScholarDigital Library
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6, 2009.Google Scholar
A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In IMC '07, pages 29--42, 2007. Google ScholarDigital Library
M. Newman. Modularity and community structure in networks. PNAS, 103(23): 8577--8582, 2006.Google ScholarCross Ref
M. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 2004.Google ScholarCross Ref
G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043): 814--818, 2005.Google ScholarCross Ref
F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identifying communities in networks. PNAS, 101(9): 2658--2663, 2004.Google ScholarCross Ref
S. Schaeffer. Graph clustering. Computer Science Review, 1(1): 27--64, 2007. Google ScholarDigital Library
J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transcations on PAMI, 2000. Google ScholarDigital Library
D. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC '04, pages 81--90, 2004. Google ScholarDigital Library
D. Watts and S. Strogatz. Collective dynamics of small-world networks. Nature, 393: 440--442, 1998.Google ScholarCross Ref
W. Zachary. An information flow model for conflict and fission in small groups. J. of Anthropological Research, 1977.Google Scholar

Index Terms

Defining and evaluating network communities based on ground-truth
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Overlapping community detection at scale: a nonnegative matrix factorization approach
WSDM '13: Proceedings of the sixth ACM international conference on Web search and data mining

Network communities represent basic structures for understanding the organization of real-world networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than ...
Read More
Structure and Overlaps of Ground-Truth Communities in Networks
Special Issue on Linking Social Granularity and Functions

One of the main organizing principles in real-world networks is that of network communities, where sets of nodes organize into densely linked clusters. Even though detection of such communities is of great interest, understanding the structure ...
Read More
Defining and evaluating network communities based on ground-truth

Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task due to a plethora of ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
MDS '12: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics
August 2012
103 pages
ISBN:9781450315463
DOI:10.1145/2350190
Conference Chairs:
Ying Ding
Indiana University, Indiana
,
Jiawei Han
University of Illinois at Urbana-Champaign
,
Jie Tang
Tsinghua University, China
,
Philip S. Yu
University of Illinois at Chicago
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 August 2012
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
community scores
network communities
social and information networks
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 269
  Total Citations
  View Citations
- 1,259
  Total Downloads
- Downloads (Last 12 months)205
- Downloads (Last 6 weeks)31
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Defining and evaluating network communities based on ground-truth

MDS '12: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics

ABSTRACT

References

Cited By

Index Terms

Recommendations

Overlapping community detection at scale: a nonnegative matrix factorization approach

Structure and Overlaps of Ground-Truth Communities in Networks

Defining and evaluating network communities based on ground-truth