research-article

Overlapping community detection at scale: a nonnegative matrix factorization approach

Authors:
Jaewon Yang

Stanford University, Stanford, CA, USA

Stanford University, Stanford, CA, USA
View Profile

,
Jure Leskovec

Stanford University, Stanford, CA, USA

Stanford University, Stanford, CA, USA
View Profile

WSDM '13: Proceedings of the sixth ACM international conference on Web search and data miningFebruary 2013Pages 587–596https://doi.org/10.1145/2433396.2433471

Published:04 February 2013Publication History

WSDM '13: Proceedings of the sixth ACM international conference on Web search and data mining

Pages 587–596

ABSTRACT

Network communities represent basic structures for understanding the organization of real-world networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than between its members and the remainder of the network. Communities in networks also overlap as nodes belong to multiple clusters at once. Due to the difficulties in evaluating the detected communities and the lack of scalable algorithms, the task of overlapping community detection in large networks largely remains an open problem.

In this paper we present BIGCLAM (Cluster Affiliation Model for Big Networks), an overlapping community detection method that scales to large networks of millions of nodes and edges. We build on a novel observation that overlaps between communities are densely connected. This is in sharp contrast with present community detection methods which implicitly assume that overlaps between communities are sparsely connected and thus cannot properly extract overlapping communities in networks. In this paper, we develop a model-based community detection algorithm that can detect densely overlapping, hierarchically nested as well as non-overlapping communities in massive networks. We evaluate our algorithm on 6 large social, collaboration and information networks with ground-truth community information. Experiments show state of the art performance both in terms of the quality of detected communities as well as in speed and scalability of our algorithm.

References

Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann. Link communities reveal multi-scale complexity in networks. Nature, 466:761--764, 2010.Google ScholarCross Ref
E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. JMLR, 2007. Google ScholarDigital Library
L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD'06, 2006. Google ScholarDigital Library
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. Google ScholarCross Ref
R. L. Breiger. The duality of persons and groups. Social Forces, 53(2):181--190, 1974.Google ScholarCross Ref
I. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE PAMI, 29(11):1944--1957, 2007. Google ScholarDigital Library
S. L. Feld. The focused organization of social ties. American J. of Sociology, 86(5):1015--1035, 1981.Google ScholarCross Ref
S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75--174, 2010.Google ScholarCross Ref
S. Fortunato and M. Barthélemy. Resolution limit in community detection. PNAS, 104(1):36--41, 2007.Google ScholarCross Ref
M. Girvan and M. Newman. Community structure in social and biological networks. PNAS, 99(12):7821--7826, 2002.Google ScholarCross Ref
D. F. Gleich and C. Seshadhri. Neighborhoods are good communities. In KDD'12, 2012.Google Scholar
M. S. Granovetter. The strength of weak ties. American J. of Sociology, 78:1360--1380, 1973.Google ScholarCross Ref
S. Gregory. Fuzzy overlapping communities in networks. J. of Stat. Mech.: Theory and Experiment, 2011.Google Scholar
C.-J. Hsieh, K.-Y. Chiang, and I. S. Dhillon. Low-rank modeling of signed networks. In KDD'12, 2012. Google ScholarDigital Library
C.-J. Hsieh and I. S. Dhillon. Fast coordinate descent methods with variable selection for non-negative matrix factorization. In KDD'11, 2011. Google ScholarDigital Library
G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. Journal of Parallel and Distributed Computing, 48:96--129, 1998. Google ScholarDigital Library
N. Krogan et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature, 440(7084):637--643, 2006.Google ScholarCross Ref
S. Lattanzi and D. Sivakumar. Affiliation networks. In STOC'09, 2009. Google ScholarDigital Library
D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 1999.Google Scholar
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29--123, 2009.Google ScholarCross Ref
C.-J. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 2007. Google ScholarDigital Library
J. J. McAuley and J. Leskovec. Learning to discover social circles in ego networks. In NIPS, 2012.Google ScholarDigital Library
M. McPherson, L. Smith-Lovin, and J. M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27:415--444, 2001..Google ScholarCross Ref
M. Newman. Modularity and community structure in networks. PNAS, 103(23):8577--8582, 2006.Google ScholarCross Ref
G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814--818, 2005.Google ScholarCross Ref
W. W. Powell, D. R. White, K. W. Koput, and J. Owen-Smith. Network dynamics and field evolution: The growth of interorganizational collaboration in the life sciences. American J. of Sociology, 110(4):1132-1205, 2005.Google ScholarCross Ref
M. Rosvall and C. T. Bergstrom. Maps of random walks on complex networks reveal community structure. PNAS, 105:1118--1123, 2008.Google ScholarCross Ref
S. Schaeffer. Graph clustering. Computer Science Review, 1(1):27--64, 2007. Google ScholarDigital Library
G. Simmel. Conflict and the web of group affiliations. Simon and Schuster, 1964.Google Scholar
U. von Luxburg. A tutorial on spectral clustering. Technical Report 149, MPI for Biological Cybernetics, August 2006.Google Scholar
D. J. Watts and S. H. Strogatz. Collective dynamics of small-world networks Nature, 393:440-442, 1998.Google Scholar
J. Xie, S. Kelley, and B. K. Szymanski. Overlapping community detection in networks: the state of the art and comparative study ACM Computing Surveys, 45:4, 2013.Google Scholar
J. Yang and J. Leskovec. Community-affiliation graph model for overlapping network community detection. In ICDM'12, 2012. Google ScholarDigital Library
J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. In ICDM'12, 2012. Google ScholarDigital Library
J. Yang and J. Leskovec. Structure and overlaps of communities in networks. In SNAKDD'12, 2012.Google Scholar
E. Zheleva, H. Sharara, and L. Getoor. Co-evolution of social and affiliation networks. In KDD'09, 2009. Google ScholarDigital Library

Index Terms

Overlapping community detection at scale: a nonnegative matrix factorization approach
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Detecting cohesive and 2-mode communities indirected and undirected networks
WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining

Networks are a general language for representing relational information among objects. An effective way to model, reason about, and summarize networks, is to discover sets of nodes with common connectivity patterns. Such sets are commonly referred to as ...
Read More
Overlapping Community Detection by Node-Weighting
ICCDA '18: Proceedings of the 2nd International Conference on Compute and Data Analysis

Community detection is an important task with great practical value for understanding the structure and function of complex networks. However, in many social networks, a node may belong to more than one community. Thus, the detection of overlapping ...
Read More
Community detection in social networks using hybrid merging of sub-communities

Network vertices are often divided into groups or communities with dense connections within communities and sparse connections between communities. Community detection has recently attracted considerable attention in the field of data mining and social ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
WSDM '13: Proceedings of the sixth ACM international conference on Web search and data mining
February 2013
816 pages
ISBN:9781450318693
DOI:10.1145/2433396
General Chairs:
Stefano Leonardi
Sapienza University of Rome, Italy
,
Alessandro Panconesi
Sapienza University of Rome, Italy
,
Program Chairs:
Paolo Ferragina
University of Pisa, Italy
,
Aristides Gionis
Yahoo! Research, Barcelona, Spain
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 February 2013
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
matrix factorization
network communities
overlapping community detection
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate498of2,863submissions,17%
Upcoming Conference
WSDM '25

Sponsor:

sigir

sigir

sigir

sigir

The Eighteenth ACM International Conference on Web Search and Data Mining

April 7 - 11, 2025

Hannover , Germany
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 556
  Total Citations
  View Citations
- 4,571
  Total Downloads
- Downloads (Last 12 months)225
- Downloads (Last 6 weeks)21
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Overlapping community detection at scale: a nonnegative matrix factorization approach

WSDM '13: Proceedings of the sixth ACM international conference on Web search and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Detecting cohesive and 2-mode communities indirected and undirected networks

Overlapping Community Detection by Node-Weighting

Community detection in social networks using hybrid merging of sub-communities