skip to main content
10.1145/2350190.2350193acmconferencesArticle/Chapter ViewAbstractPublication PagesmiddlewareConference Proceedingsconference-collections
research-article

Defining and evaluating network communities based on ground-truth

Published:12 August 2012Publication History

ABSTRACT

Nodes in real-world networks, such as social, information or technological networks, organize into communities where edges appear with high concentration among the members of the community. Identifying communities in networks has proven to be a challenging task mainly due to a plethora of definitions of a community, intractability of algorithms, issues with evaluation and the lack of a reliable gold-standard ground-truth.

We study a set of 230 large social, collaboration and information networks where nodes explicitly define group memberships. We use these groups to define the notion of ground-truth communities. We then propose a methodology which allows us to compare and quantitatively evaluate different definitions of network communities on a large scale. We choose 13 commonly used definitions of network communities and examine their quality, sensitivity and robustness. We show that the 13 definitions naturally group into four classes. We find that two of these definitions, Conductance and Triad-participation-ratio, consistently give the best performance in identifying ground-truth communities.

References

  1. R. Andersen, F. Chung, and K. Lang. Local graph partitioning using PageRank vectors. In FOCS '06, pages 475--486, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD '06, pages 44--54, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. L. Danon, J. Duch, A. Diaz-Guilera, and A. Arenas. Comparing community structure identification. Journal of Stat. Mech.: Theory and Experiment, 29(09): P09008, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  4. I. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on PAMI, 29(11): 1944--1957, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. L. Feld. The focused organization of social ties. American Journal of Sociology, 86(5): 1015--1035, 1981.Google ScholarGoogle ScholarCross RefCross Ref
  6. G. Flake, S. Lawrence, and C. Giles. Efficient identification of web communities. In KDD '00, pages 150--160, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Fortunato. Community detection in graphs. Physics Reports, 486(3--5): 75--174, 2010.Google ScholarGoogle Scholar
  8. S. Fortunato and M. Barthélemy. Resolution limit in community detection. PNAS, 104(1): 36--41, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  9. M. Girvan and M. Newman. Community structure in social and biological networks. PNAS, 99(12): 7821--7826, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  10. M. S. Granovetter. The strength of weak ties. American Journal of Sociology, 78: 1360--1380, 1973.Google ScholarGoogle ScholarCross RefCross Ref
  11. R. Guimerà, M. Sales-Pardo, and L. Amaral. Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70: 025101, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  12. S. Kairam, D. Wang, and J. Leskovec. The life and death of online groups: Predicting group growth and longevity. In WSDM '12, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20: 359--392, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistic Quarterly, 2: 83--97, 1955.Google ScholarGoogle ScholarCross RefCross Ref
  15. J. Leskovec, L. Adamic, and B. Huberman. The dynamics of viral marketing. ACM TWEB, 1(1), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Leskovec, K. Lang, and M. Mahoney. Empirical comparison of algorithms for network community detection. In WWW '10, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6, 2009.Google ScholarGoogle Scholar
  18. A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In IMC '07, pages 29--42, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Newman. Modularity and community structure in networks. PNAS, 103(23): 8577--8582, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  20. M. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  21. G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043): 814--818, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  22. F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identifying communities in networks. PNAS, 101(9): 2658--2663, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  23. S. Schaeffer. Graph clustering. Computer Science Review, 1(1): 27--64, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transcations on PAMI, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. D. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC '04, pages 81--90, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. D. Watts and S. Strogatz. Collective dynamics of small-world networks. Nature, 393: 440--442, 1998.Google ScholarGoogle ScholarCross RefCross Ref
  27. W. Zachary. An information flow model for conflict and fission in small groups. J. of Anthropological Research, 1977.Google ScholarGoogle Scholar

Index Terms

  1. Defining and evaluating network communities based on ground-truth

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      MDS '12: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics
      August 2012
      103 pages
      ISBN:9781450315463
      DOI:10.1145/2350190

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 August 2012

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader