research-article

Public Access

Local Higher-Order Graph Clustering

Authors:
Hao Yin

Stanford University, Stanford, CA, USA

Stanford University, Stanford, CA, USA
View Profile

,
Austin R. Benson

Stanford University, Stanford, CA, USA

Stanford University, Stanford, CA, USA
View Profile

,
Jure Leskovec

Stanford University, Stanford , CA, USA

Stanford University, Stanford , CA, USA
View Profile

,
David F. Gleich

Purdue University, West Lafayette, IN, USA

Purdue University, West Lafayette, IN, USA
View Profile

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data MiningAugust 2017Pages 555–564https://doi.org/10.1145/3097983.3098069

Published:04 August 2017Publication History

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 555–564

ABSTRACT

Local graph clustering methods aim to find a cluster of nodes by exploring a small region of the graph. These methods are attractive because they enable targeted clustering around a given seed node and are faster than traditional global graph clustering methods because their runtime does not depend on the size of the input graph. However, current local graph partitioning methods are not designed to account for the higher-order structures crucial to the network, nor can they effectively handle directed networks. Here we introduce a new class of local graph clustering methods that address these issues by incorporating higher-order network information captured by small subgraphs, also called network motifs. We develop the Motif-based Approximate Personalized PageRank (MAPPR) algorithm that finds clusters containing a seed node with minimal \emph{motif conductance}, a generalization of the conductance metric for network motifs. We generalize existing theory to prove the fast running time (independent of the size of the graph) and obtain theoretical guarantees on the cluster quality (in terms of motif conductance). We also develop a theory of node neighborhoods for finding sets that have small motif conductance, and apply these results to the case of finding good seed nodes to use as input to the MAPPR algorithm. Experimental validation on community detection tasks in both synthetic and real-world networks, shows that our new framework MAPPR outperforms the current edge-based personalized PageRank methodology.

Supplemental Material

yin_graph_clustering.mp4

mp4

367.9 MB

Download

References

N. K. Ahmed, J. Neville, R. A. Rossi, and N. Duffield. Efficient graphlet counting for large networks. In ICDM, 2015. Google ScholarDigital Library
R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In FOCS, 2006. Google ScholarDigital Library
R. Andersen, F. Chung, and K. Lang. Local partitioning for directed graphs using PageRank. Internet Mathematics, 2008. Google ScholarCross Ref
A. R. Benson, D. F. Gleich, and J. Leskovec. Tensor spectral clustering for partitioning higher-order network structures. In SDM, 2015. Google ScholarCross Ref
A. R. Benson, D. F. Gleich, and J. Leskovec. Higher-order organization of complex networks. Science, 2016. Google ScholarCross Ref
M. Bressan, F. Chierichetti, R. Kumar, S. Leucci, and A. Panconesi. Counting graphlets: Space vs time. In WSDM, 2017.Google ScholarDigital Library
N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. SIAM Journal on Computing, 1985. Google ScholarDigital Library
F. Chung and O. Simpson. Solving linear systems with boundary conditions using heat kernel pagerank. In WAW, 2013. Google ScholarCross Ref
F. R. Chung. Four proofs for the cheeger inequality and graph partition algorithms. In ICCM, 2007.Google Scholar
A. Clauset. Finding local community structure in networks. Phys. Rev. E, 72(2), 2005. Google ScholarCross Ref
W. Cui, Y. Xiao, H. Wang, and W. Wang. Local search of communities in large graphs. In SIGMOD, 2014. Google ScholarDigital Library
L. De Stefani, A. Epasto, M. Riondato, and E. Upfal. Trièst: Counting local and global triangles in fully-dynamic streams with fixed memory size. In KDD, 2016.Google ScholarDigital Library
A. Epasto, J. Feldman, S. Lattanzi, S. Leonardi, and V. Mirrokni. Reduce and aggregate: similarity ranking in multi-categorical bipartite graphs. In WWW, 2014. Google ScholarDigital Library
M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical J., 1973.Google ScholarCross Ref
S. Fortunato. Community detection in graphs. Physics Reports, 2010. Google ScholarCross Ref
U. Gargi, W. Lu, V. Mirrokni, and S. Yoon. Large-scale community detection on youtube for topic discovery and exploration. In ICWSM, 2011.Google Scholar
D. F. Gleich and M. W. Mahoney. Using local spectral methods to robustify graph-based learning algorithms. In KDD, 2015. Google ScholarDigital Library
D. F. Gleich and C. Seshadhri. Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In KDD, 2012. Google ScholarDigital Library
M. S. Granovetter. The strength of weak ties. American J. of Sociology, 1973. Google ScholarCross Ref
X. Huang, H. Cheng, L. Qin, W. Tian, and J. X. Yu. Querying k-truss community in large and dynamic graphs. In SIGMOD, 2014. Google ScholarDigital Library
S. Jain and C. Seshadhri. A fast and provable method for estimating clique counts using turán's theorem. In WWW, 2017. Google ScholarDigital Library
L. G. S. Jeub, P. Balachandran, M. A. Porter, P. J. Mucha, and M. W. Mahoney. Think locally, act locally: Detection of small, medium-sized, and large communities in large networks. Physics Review E, 91, 2015. Google ScholarCross Ref
M. Jha, C. Seshadhri, and A. Pinar. Path sampling: A fast and provable method for estimating 4-vertex subgraph counts. In WWW, 2015.Google ScholarDigital Library
B.-B. Jiang, J.-G. Wang, Y. Wang, and J. Xiao. Gene prioritization for type 2 diabetes in tissue-specific protein interaction networks. Systems Biology, 2009.Google Scholar
B. Klimt and Y. Yang. Introducing the Enron corpus. In CEAS, 2004.Google Scholar
K. Kloster and D. F. Gleich. Heat kernel based community detection. In KDD, 2014. Google ScholarDigital Library
C. Klymko, D. F. Gleich, and T. G. Kolda. Using triangles to improve community detection in directed networks. In ASE BigData Conference, 2014.Google Scholar
A. Lancichinetti and S. Fortunato. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E, 2009. Google ScholarCross Ref
K. J. Lang. Fixing two weaknesses of the spectral method. In NIPS, 2005.Google Scholar
J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. TKDD, 2007.Google ScholarDigital Library
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 2009. Google ScholarCross Ref
Y. Li, K. He, D. Bindel, and J. E. Hopcroft. Uncovering the small community structure in large networks: A local spectral approach. In WWW, 2015. Google ScholarDigital Library
M. W. Mahoney, L. Orecchia, and N. K. Vishnoi. A local spectral method for graphs: With applications to improving graph partitions and exploring data graphs locally. JMLR, 2012.Google Scholar
D. Marcus and Y. Shavitt. Efficient counting of network motifs. In ICDCSW, 2010. Google ScholarDigital Library
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 2002. Google ScholarCross Ref
L. Orecchia and Z. A. Zhu. Flow-based algorithms for local graph clustering. In SODA, 2014. Google ScholarCross Ref
N. Prvzulj. Biological network comparison using graphlet degree distribution. Bioinformatics, 2007.Google Scholar
K. Rohe and T. Qin. The blessing of transitivity in sparse and stochastic networks. arXiv:1307.2302, 2013.Google Scholar
S. E. Schaeffer. Graph clustering. Computer Science Rev., 2007. Google ScholarDigital Library
M. Sozio and A. Gionis. The community-search problem and how to plan a successful cocktail party. In KDD, 2010. Google ScholarDigital Library
O. Sporns and R. Kötter. Motifs in brain networks. PLOS Biology, 2004. Google ScholarCross Ref
A. L. Traud, P. J. Mucha, and M. A. Porter. Social structure of facebook networks. Physica A: Statistical Mechanics and its Applications, 2012. Google ScholarCross Ref
C. Tsourakakis, J. Pachocki, and M. Mitzenmacher. Scalable motif-aware graph clustering. In WWW, 2017. Google ScholarDigital Library
J. Ugander, L. Backstrom, and J. Kleinberg. Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections. In WWW, 2013.Google Scholar
K. Voevodski, S.-H. Teng, and Y. Xia. Spectral affinity in protein networks. BMC Systems Biology, 2009. Google ScholarCross Ref
D. Wagner and F. Wagner. Between min cut and graph bisection. In MFCS, 1993. Google ScholarCross Ref
J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 2015. Google ScholarDigital Library
Ö. N. Yaverouglu, N. Malod-Dognin, D. Davis, Z. Levnajic, V. Janjic, R. Karapandza, A. Stojmirovic, and N. Prvzulj. Revealing the hidden language of complex networks. Scientific Reports, 2014.Google Scholar
H. Yin, A. R. Benson, and J. Leskovec. Higher-order clustering in networks. arXiv:1704.03913, 2017.Google Scholar
Z. A. Zhu, S. Lattanzi, and V. S. Mirrokni. A local algorithm for finding well-connected clusters. In ICML, 2013.endthebibliographyGoogle Scholar

Index Terms

Local Higher-Order Graph Clustering

Recommendations

Self-Organizing-Map Based Clustering Using a Local Clustering Validity Index

Classical clustering methods, such as partitioning and hierarchical clustering algorithms, often fail to deliver satisfactory results, given clusters of arbitrary shapes. Motivated by a clustering validity index based on inter-cluster and intra-cluster ...
Read More
Constrained Local Graph Clustering by Colored Random Walk
WWW '19: The World Wide Web Conference

Detecting local graph clusters is an important problem in big graph analysis. Given seed nodes in a graph, local clustering aims at finding subgraphs around the seed nodes, which consist of nodes highly relevant to the seed nodes. However, existing ...
Read More
New Clustering Techniques of Node Embeddings Based on Metaheuristic Optimization Algorithms
Large-Scale Scientific Computing
Abstract
Node embeddings present a powerful method of embedding graph-structured data into a low dimensional space while preserving local node information. Clustering is a common preprocessing task on unsupervised data utilized to get the best insight into ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN:9781450348874
DOI:10.1145/3097983
General Chairs:
Stan Matwin
Dalhousie University
,
Shipeng Yu
LinkedIn
,
Faisal Farooq
IBM
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 August 2017
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
clustering
community detection
higher-order structure
motifs
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '17 Paper Acceptance Rate64of748submissions,9%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 323
  Total Citations
  View Citations
- 5,762
  Total Downloads
- Downloads (Last 12 months)782
- Downloads (Last 6 weeks)90
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Local Higher-Order Graph Clustering

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Self-Organizing-Map Based Clustering Using a Local Clustering Validity Index

Constrained Local Graph Clustering by Colored Random Walk

New Clustering Techniques of Node Embeddings Based on Metaheuristic Optimization Algorithms

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Local Higher-Order Graph Clustering

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Self-Organizing-Map Based Clustering Using a Local Clustering Validity Index

Constrained Local Graph Clustering by Colored Random Walk

New Clustering Techniques of Node Embeddings Based on Metaheuristic Optimization Algorithms

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media