skip to main content
10.1145/3097983.3098069acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article
Public Access

Local Higher-Order Graph Clustering

Published:04 August 2017Publication History

ABSTRACT

Local graph clustering methods aim to find a cluster of nodes by exploring a small region of the graph. These methods are attractive because they enable targeted clustering around a given seed node and are faster than traditional global graph clustering methods because their runtime does not depend on the size of the input graph. However, current local graph partitioning methods are not designed to account for the higher-order structures crucial to the network, nor can they effectively handle directed networks. Here we introduce a new class of local graph clustering methods that address these issues by incorporating higher-order network information captured by small subgraphs, also called network motifs. We develop the Motif-based Approximate Personalized PageRank (MAPPR) algorithm that finds clusters containing a seed node with minimal \emph{motif conductance}, a generalization of the conductance metric for network motifs. We generalize existing theory to prove the fast running time (independent of the size of the graph) and obtain theoretical guarantees on the cluster quality (in terms of motif conductance). We also develop a theory of node neighborhoods for finding sets that have small motif conductance, and apply these results to the case of finding good seed nodes to use as input to the MAPPR algorithm. Experimental validation on community detection tasks in both synthetic and real-world networks, shows that our new framework MAPPR outperforms the current edge-based personalized PageRank methodology.

Skip Supplemental Material Section

Supplemental Material

yin_graph_clustering.mp4

mp4

367.9 MB

References

  1. N. K. Ahmed, J. Neville, R. A. Rossi, and N. Duffield. Efficient graphlet counting for large networks. In ICDM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In FOCS, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Andersen, F. Chung, and K. Lang. Local partitioning for directed graphs using PageRank. Internet Mathematics, 2008. Google ScholarGoogle ScholarCross RefCross Ref
  4. A. R. Benson, D. F. Gleich, and J. Leskovec. Tensor spectral clustering for partitioning higher-order network structures. In SDM, 2015. Google ScholarGoogle ScholarCross RefCross Ref
  5. A. R. Benson, D. F. Gleich, and J. Leskovec. Higher-order organization of complex networks. Science, 2016. Google ScholarGoogle ScholarCross RefCross Ref
  6. M. Bressan, F. Chierichetti, R. Kumar, S. Leucci, and A. Panconesi. Counting graphlets: Space vs time. In WSDM, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. SIAM Journal on Computing, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. F. Chung and O. Simpson. Solving linear systems with boundary conditions using heat kernel pagerank. In WAW, 2013. Google ScholarGoogle ScholarCross RefCross Ref
  9. F. R. Chung. Four proofs for the cheeger inequality and graph partition algorithms. In ICCM, 2007.Google ScholarGoogle Scholar
  10. A. Clauset. Finding local community structure in networks. Phys. Rev. E, 72(2), 2005. Google ScholarGoogle ScholarCross RefCross Ref
  11. W. Cui, Y. Xiao, H. Wang, and W. Wang. Local search of communities in large graphs. In SIGMOD, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. L. De Stefani, A. Epasto, M. Riondato, and E. Upfal. Trièst: Counting local and global triangles in fully-dynamic streams with fixed memory size. In KDD, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Epasto, J. Feldman, S. Lattanzi, S. Leonardi, and V. Mirrokni. Reduce and aggregate: similarity ranking in multi-categorical bipartite graphs. In WWW, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical J., 1973.Google ScholarGoogle ScholarCross RefCross Ref
  15. S. Fortunato. Community detection in graphs. Physics Reports, 2010. Google ScholarGoogle ScholarCross RefCross Ref
  16. U. Gargi, W. Lu, V. Mirrokni, and S. Yoon. Large-scale community detection on youtube for topic discovery and exploration. In ICWSM, 2011.Google ScholarGoogle Scholar
  17. D. F. Gleich and M. W. Mahoney. Using local spectral methods to robustify graph-based learning algorithms. In KDD, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. D. F. Gleich and C. Seshadhri. Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In KDD, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. S. Granovetter. The strength of weak ties. American J. of Sociology, 1973. Google ScholarGoogle ScholarCross RefCross Ref
  20. X. Huang, H. Cheng, L. Qin, W. Tian, and J. X. Yu. Querying k-truss community in large and dynamic graphs. In SIGMOD, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Jain and C. Seshadhri. A fast and provable method for estimating clique counts using turán's theorem. In WWW, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. L. G. S. Jeub, P. Balachandran, M. A. Porter, P. J. Mucha, and M. W. Mahoney. Think locally, act locally: Detection of small, medium-sized, and large communities in large networks. Physics Review E, 91, 2015. Google ScholarGoogle ScholarCross RefCross Ref
  23. M. Jha, C. Seshadhri, and A. Pinar. Path sampling: A fast and provable method for estimating 4-vertex subgraph counts. In WWW, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. B.-B. Jiang, J.-G. Wang, Y. Wang, and J. Xiao. Gene prioritization for type 2 diabetes in tissue-specific protein interaction networks. Systems Biology, 2009.Google ScholarGoogle Scholar
  25. B. Klimt and Y. Yang. Introducing the Enron corpus. In CEAS, 2004.Google ScholarGoogle Scholar
  26. K. Kloster and D. F. Gleich. Heat kernel based community detection. In KDD, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. C. Klymko, D. F. Gleich, and T. G. Kolda. Using triangles to improve community detection in directed networks. In ASE BigData Conference, 2014.Google ScholarGoogle Scholar
  28. A. Lancichinetti and S. Fortunato. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E, 2009. Google ScholarGoogle ScholarCross RefCross Ref
  29. K. J. Lang. Fixing two weaknesses of the spectral method. In NIPS, 2005.Google ScholarGoogle Scholar
  30. J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. TKDD, 2007.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 2009. Google ScholarGoogle ScholarCross RefCross Ref
  32. Y. Li, K. He, D. Bindel, and J. E. Hopcroft. Uncovering the small community structure in large networks: A local spectral approach. In WWW, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. M. W. Mahoney, L. Orecchia, and N. K. Vishnoi. A local spectral method for graphs: With applications to improving graph partitions and exploring data graphs locally. JMLR, 2012.Google ScholarGoogle Scholar
  34. D. Marcus and Y. Shavitt. Efficient counting of network motifs. In ICDCSW, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 2002. Google ScholarGoogle ScholarCross RefCross Ref
  36. L. Orecchia and Z. A. Zhu. Flow-based algorithms for local graph clustering. In SODA, 2014. Google ScholarGoogle ScholarCross RefCross Ref
  37. N. Prvzulj. Biological network comparison using graphlet degree distribution. Bioinformatics, 2007.Google ScholarGoogle Scholar
  38. K. Rohe and T. Qin. The blessing of transitivity in sparse and stochastic networks. arXiv:1307.2302, 2013.Google ScholarGoogle Scholar
  39. S. E. Schaeffer. Graph clustering. Computer Science Rev., 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. M. Sozio and A. Gionis. The community-search problem and how to plan a successful cocktail party. In KDD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. O. Sporns and R. Kötter. Motifs in brain networks. PLOS Biology, 2004. Google ScholarGoogle ScholarCross RefCross Ref
  42. A. L. Traud, P. J. Mucha, and M. A. Porter. Social structure of facebook networks. Physica A: Statistical Mechanics and its Applications, 2012. Google ScholarGoogle ScholarCross RefCross Ref
  43. C. Tsourakakis, J. Pachocki, and M. Mitzenmacher. Scalable motif-aware graph clustering. In WWW, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. J. Ugander, L. Backstrom, and J. Kleinberg. Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections. In WWW, 2013.Google ScholarGoogle Scholar
  45. K. Voevodski, S.-H. Teng, and Y. Xia. Spectral affinity in protein networks. BMC Systems Biology, 2009. Google ScholarGoogle ScholarCross RefCross Ref
  46. D. Wagner and F. Wagner. Between min cut and graph bisection. In MFCS, 1993. Google ScholarGoogle ScholarCross RefCross Ref
  47. J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Ö. N. Yaverouglu, N. Malod-Dognin, D. Davis, Z. Levnajic, V. Janjic, R. Karapandza, A. Stojmirovic, and N. Prvzulj. Revealing the hidden language of complex networks. Scientific Reports, 2014.Google ScholarGoogle Scholar
  49. H. Yin, A. R. Benson, and J. Leskovec. Higher-order clustering in networks. arXiv:1704.03913, 2017.Google ScholarGoogle Scholar
  50. Z. A. Zhu, S. Lattanzi, and V. S. Mirrokni. A local algorithm for finding well-connected clusters. In ICML, 2013.endthebibliographyGoogle ScholarGoogle Scholar

Index Terms

  1. Local Higher-Order Graph Clustering

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
          August 2017
          2240 pages
          ISBN:9781450348874
          DOI:10.1145/3097983

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 4 August 2017

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          KDD '17 Paper Acceptance Rate64of748submissions,9%Overall Acceptance Rate1,133of8,635submissions,13%

          Upcoming Conference

          KDD '24

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader