ABSTRACT
Local graph clustering methods aim to find a cluster of nodes by exploring a small region of the graph. These methods are attractive because they enable targeted clustering around a given seed node and are faster than traditional global graph clustering methods because their runtime does not depend on the size of the input graph. However, current local graph partitioning methods are not designed to account for the higher-order structures crucial to the network, nor can they effectively handle directed networks. Here we introduce a new class of local graph clustering methods that address these issues by incorporating higher-order network information captured by small subgraphs, also called network motifs. We develop the Motif-based Approximate Personalized PageRank (MAPPR) algorithm that finds clusters containing a seed node with minimal \emph{motif conductance}, a generalization of the conductance metric for network motifs. We generalize existing theory to prove the fast running time (independent of the size of the graph) and obtain theoretical guarantees on the cluster quality (in terms of motif conductance). We also develop a theory of node neighborhoods for finding sets that have small motif conductance, and apply these results to the case of finding good seed nodes to use as input to the MAPPR algorithm. Experimental validation on community detection tasks in both synthetic and real-world networks, shows that our new framework MAPPR outperforms the current edge-based personalized PageRank methodology.
Supplemental Material
- N. K. Ahmed, J. Neville, R. A. Rossi, and N. Duffield. Efficient graphlet counting for large networks. In ICDM, 2015. Google ScholarDigital Library
- R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In FOCS, 2006. Google ScholarDigital Library
- R. Andersen, F. Chung, and K. Lang. Local partitioning for directed graphs using PageRank. Internet Mathematics, 2008. Google ScholarCross Ref
- A. R. Benson, D. F. Gleich, and J. Leskovec. Tensor spectral clustering for partitioning higher-order network structures. In SDM, 2015. Google ScholarCross Ref
- A. R. Benson, D. F. Gleich, and J. Leskovec. Higher-order organization of complex networks. Science, 2016. Google ScholarCross Ref
- M. Bressan, F. Chierichetti, R. Kumar, S. Leucci, and A. Panconesi. Counting graphlets: Space vs time. In WSDM, 2017.Google ScholarDigital Library
- N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. SIAM Journal on Computing, 1985. Google ScholarDigital Library
- F. Chung and O. Simpson. Solving linear systems with boundary conditions using heat kernel pagerank. In WAW, 2013. Google ScholarCross Ref
- F. R. Chung. Four proofs for the cheeger inequality and graph partition algorithms. In ICCM, 2007.Google Scholar
- A. Clauset. Finding local community structure in networks. Phys. Rev. E, 72(2), 2005. Google ScholarCross Ref
- W. Cui, Y. Xiao, H. Wang, and W. Wang. Local search of communities in large graphs. In SIGMOD, 2014. Google ScholarDigital Library
- L. De Stefani, A. Epasto, M. Riondato, and E. Upfal. Trièst: Counting local and global triangles in fully-dynamic streams with fixed memory size. In KDD, 2016.Google ScholarDigital Library
- A. Epasto, J. Feldman, S. Lattanzi, S. Leonardi, and V. Mirrokni. Reduce and aggregate: similarity ranking in multi-categorical bipartite graphs. In WWW, 2014. Google ScholarDigital Library
- M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical J., 1973.Google ScholarCross Ref
- S. Fortunato. Community detection in graphs. Physics Reports, 2010. Google ScholarCross Ref
- U. Gargi, W. Lu, V. Mirrokni, and S. Yoon. Large-scale community detection on youtube for topic discovery and exploration. In ICWSM, 2011.Google Scholar
- D. F. Gleich and M. W. Mahoney. Using local spectral methods to robustify graph-based learning algorithms. In KDD, 2015. Google ScholarDigital Library
- D. F. Gleich and C. Seshadhri. Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In KDD, 2012. Google ScholarDigital Library
- M. S. Granovetter. The strength of weak ties. American J. of Sociology, 1973. Google ScholarCross Ref
- X. Huang, H. Cheng, L. Qin, W. Tian, and J. X. Yu. Querying k-truss community in large and dynamic graphs. In SIGMOD, 2014. Google ScholarDigital Library
- S. Jain and C. Seshadhri. A fast and provable method for estimating clique counts using turán's theorem. In WWW, 2017. Google ScholarDigital Library
- L. G. S. Jeub, P. Balachandran, M. A. Porter, P. J. Mucha, and M. W. Mahoney. Think locally, act locally: Detection of small, medium-sized, and large communities in large networks. Physics Review E, 91, 2015. Google ScholarCross Ref
- M. Jha, C. Seshadhri, and A. Pinar. Path sampling: A fast and provable method for estimating 4-vertex subgraph counts. In WWW, 2015.Google ScholarDigital Library
- B.-B. Jiang, J.-G. Wang, Y. Wang, and J. Xiao. Gene prioritization for type 2 diabetes in tissue-specific protein interaction networks. Systems Biology, 2009.Google Scholar
- B. Klimt and Y. Yang. Introducing the Enron corpus. In CEAS, 2004.Google Scholar
- K. Kloster and D. F. Gleich. Heat kernel based community detection. In KDD, 2014. Google ScholarDigital Library
- C. Klymko, D. F. Gleich, and T. G. Kolda. Using triangles to improve community detection in directed networks. In ASE BigData Conference, 2014.Google Scholar
- A. Lancichinetti and S. Fortunato. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E, 2009. Google ScholarCross Ref
- K. J. Lang. Fixing two weaknesses of the spectral method. In NIPS, 2005.Google Scholar
- J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. TKDD, 2007.Google ScholarDigital Library
- J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 2009. Google ScholarCross Ref
- Y. Li, K. He, D. Bindel, and J. E. Hopcroft. Uncovering the small community structure in large networks: A local spectral approach. In WWW, 2015. Google ScholarDigital Library
- M. W. Mahoney, L. Orecchia, and N. K. Vishnoi. A local spectral method for graphs: With applications to improving graph partitions and exploring data graphs locally. JMLR, 2012.Google Scholar
- D. Marcus and Y. Shavitt. Efficient counting of network motifs. In ICDCSW, 2010. Google ScholarDigital Library
- R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 2002. Google ScholarCross Ref
- L. Orecchia and Z. A. Zhu. Flow-based algorithms for local graph clustering. In SODA, 2014. Google ScholarCross Ref
- N. Prvzulj. Biological network comparison using graphlet degree distribution. Bioinformatics, 2007.Google Scholar
- K. Rohe and T. Qin. The blessing of transitivity in sparse and stochastic networks. arXiv:1307.2302, 2013.Google Scholar
- S. E. Schaeffer. Graph clustering. Computer Science Rev., 2007. Google ScholarDigital Library
- M. Sozio and A. Gionis. The community-search problem and how to plan a successful cocktail party. In KDD, 2010. Google ScholarDigital Library
- O. Sporns and R. Kötter. Motifs in brain networks. PLOS Biology, 2004. Google ScholarCross Ref
- A. L. Traud, P. J. Mucha, and M. A. Porter. Social structure of facebook networks. Physica A: Statistical Mechanics and its Applications, 2012. Google ScholarCross Ref
- C. Tsourakakis, J. Pachocki, and M. Mitzenmacher. Scalable motif-aware graph clustering. In WWW, 2017. Google ScholarDigital Library
- J. Ugander, L. Backstrom, and J. Kleinberg. Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections. In WWW, 2013.Google Scholar
- K. Voevodski, S.-H. Teng, and Y. Xia. Spectral affinity in protein networks. BMC Systems Biology, 2009. Google ScholarCross Ref
- D. Wagner and F. Wagner. Between min cut and graph bisection. In MFCS, 1993. Google ScholarCross Ref
- J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 2015. Google ScholarDigital Library
- Ö. N. Yaverouglu, N. Malod-Dognin, D. Davis, Z. Levnajic, V. Janjic, R. Karapandza, A. Stojmirovic, and N. Prvzulj. Revealing the hidden language of complex networks. Scientific Reports, 2014.Google Scholar
- H. Yin, A. R. Benson, and J. Leskovec. Higher-order clustering in networks. arXiv:1704.03913, 2017.Google Scholar
- Z. A. Zhu, S. Lattanzi, and V. S. Mirrokni. A local algorithm for finding well-connected clusters. In ICML, 2013.endthebibliographyGoogle Scholar
Index Terms
- Local Higher-Order Graph Clustering
Recommendations
Self-Organizing-Map Based Clustering Using a Local Clustering Validity Index
Classical clustering methods, such as partitioning and hierarchical clustering algorithms, often fail to deliver satisfactory results, given clusters of arbitrary shapes. Motivated by a clustering validity index based on inter-cluster and intra-cluster ...
Constrained Local Graph Clustering by Colored Random Walk
WWW '19: The World Wide Web ConferenceDetecting local graph clusters is an important problem in big graph analysis. Given seed nodes in a graph, local clustering aims at finding subgraphs around the seed nodes, which consist of nodes highly relevant to the seed nodes. However, existing ...
New Clustering Techniques of Node Embeddings Based on Metaheuristic Optimization Algorithms
Large-Scale Scientific ComputingAbstractNode embeddings present a powerful method of embedding graph-structured data into a low dimensional space while preserving local node information. Clustering is a common preprocessing task on unsupervised data utilized to get the best insight into ...
Comments