skip to main content
10.1145/3394486.3403100acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article
Public Access

Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods

Published:20 August 2020Publication History

ABSTRACT

Mining dense subgraphs is an important primitive across a spectrum of graph-mining tasks. In this work, we formally establish that two recurring characteristics of real-world graphs, namely heavy-tailed degree distributions and large clustering coefficients, imply the existence of substantially large vertex neighborhoods with high edge-density. This observation suggests a very simple approach for extracting large quasi-cliques: simply scan the vertex neighborhoods, compute the clustering coefficient of each vertex, and output the best such subgraph. The implementation of such a method requires counting the triangles in a graph, which is a well-studied problem in graph mining. When empirically tested across a number of real-world graphs, this approach reveals a surprise: vertex neighborhoods include maximal cliques of non-trivial sizes, and the density of the best neighborhood often compares favorably to subgraphs produced by dedicated algorithms for maximizing subgraph density. For graphs with small clustering coefficients, we demonstrate that small vertex neighborhoods can be refined using a local-search method to grow larger cliques and near-cliques. Our results indicate that contrary to worst-case theoretical results, mining cliques and quasi-cliques of non-trivial sizes from real-world graphs is often not a difficult problem, and provides motivation for further work geared towards a better explanation of these empirical successes.

References

  1. 2015. Large Near-Clique Detection. http://github.com/tsourolampis/Scalable-Large-Near-Clique-Detection.Google ScholarGoogle Scholar
  2. Noga Alon and Joel H Spencer. 2016. The probabilistic method. John Wiley & Sons.Google ScholarGoogle ScholarCross RefCross Ref
  3. Albert Angel, Nikos Sarkas, Nick Koudas, and Divesh Srivastava. 2012. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. In Proceedings of the 38th International Conference on Very Large Data Bases. VLDB Endowment, 574--585.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Gary D Bader and Christopher WV Hogue. 2003. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4, 1 (2003), 2.Google ScholarGoogle ScholarCross RefCross Ref
  5. Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509--512.Google ScholarGoogle Scholar
  6. Vladimir Batagelj and Matjaz Zaversnik. 2003. An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049 (2003).Google ScholarGoogle Scholar
  7. Coen Bron and Joep Kerbosch. 1973. Algorithm 457: Finding all cliques of an undirected graph. Commun. ACM 16, 9 (1973), 575--577.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Gregory Buehrer and Kumar Chellapilla. 2008. A scalable pattern mining approach to web graph compression with communities. In Proceedings of the 2008 International Conference on Web Search and Data Mining. ACM, 95--106.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Jose Cadena, Anil Kumar Vullikanti, and Charu C Aggarwal. 2016. On dense subgraphs in signed network streams. In Proceedings of the 16th IEEE International Conference on Data Mining (ICDM). IEEE, 51--60.Google ScholarGoogle ScholarCross RefCross Ref
  10. Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In International Workshop on Approximation Algorithms for Combinatorial Optimization. Springer, 84--95.Google ScholarGoogle ScholarCross RefCross Ref
  11. Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. 2009. Introduction to algorithms. MIT press.Google ScholarGoogle Scholar
  12. Timothy A Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software (TOMS) 38, 1 (2011), 1.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. David Eppstein, Maarten Löffler, and Darren Strash. 2010. Listing all maximal cliques in sparse graphs in near-optimal time. In International Symposium on Algorithms and Computation. Springer, 403--414.Google ScholarGoogle ScholarCross RefCross Ref
  14. Pooya Esfandiar, Francesco Bonchi, David F Gleich, Chen Greif, Laks VS Lakshmanan, and Byung-Won On. 2010. Fast Katz and commuters: Efficient estimation of social relatedness in large networks. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 132--145.Google ScholarGoogle ScholarCross RefCross Ref
  15. Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. 1999. On power-law relationships of the internet topology. In ACM SIGCOMM Computer Communication Review, Vol. 29. ACM, 251--262.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Alessandro Ferrante, Gopal Pandurangan, and Kihong Park. 2008. On the hardness of optimization in power-law graphs. Theoretical Computer Science 393, 1--3 (2008), 220--230.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Michael R Garey and David S Johnson. 2002. Computers and intractability.Google ScholarGoogle Scholar
  18. David Gibson, Ravi Kumar, and Andrew Tomkins. 2005. Discovering large dense subgraphs in massive graphs. In Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 721--732.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. David F Gleich and C Seshadhri. 2012. Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 597--605.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Andrew V Goldberg. 1984. Finding a maximum density subgraph. Technical report, University of California Berkeley, CA.Google ScholarGoogle Scholar
  21. Rishi Gupta, Tim Roughgarden, and Comandur Seshadhri. 2014. Decompositions of triangle-dense graphs. In Proceedings of the 5th conference on Innovations in Theoretical Computer Science. ACM, 471--482.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos. 2016. Fraudar: Bounding graph fraud in the face of camouflage. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 895--904.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science 407, 1--3 (2008), 458--473.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.Google ScholarGoogle Scholar
  25. Zhi-Quan Luo, Wing-Kin Ma, Anthony Man-Cho So, Yinyu Ye, and Shuzhong Zhang. 2010. Semidefinite relaxation of quadratic optimization problems. IEEE Signal Processing Magazine 3, 27 (2010), 20--34.Google ScholarGoogle ScholarCross RefCross Ref
  26. Michael Mitzenmacher, Jakub Pachocki, Richard Peng, Charalampos Tsourakakis, and Shen Chen Xu. 2015. Scalable large near-clique detection in large-scale networks via sampling. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 815--824.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Mark Newman. 2018. Networks. Oxford university press.Google ScholarGoogle Scholar
  28. N Prulj, Dennis A Wigle, and Igor Jurisica. 2004. Functional topology in a network of protein interactions. Bioinformatics 20, 3 (2004), 340--348.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Stephen B Seidman. 1983. Network structure and minimum degree. Social networks 5, 3 (1983), 269--287.Google ScholarGoogle Scholar
  30. Lei Tang and Huan Liu. 2009. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 817--826.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Charalampos Tsourakakis. 2015. The k-clique densest subgraph problem. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1122--1132.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 104--112.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Takeaki Uno. 2005. Maximal Clique Enumerator (MACE). http://research.nii.ac.jp/~uno/codes.htm.Google ScholarGoogle Scholar
  34. Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P Gummadi. 2009. On the evolution of user interaction in Facebook. In Proceedings of the 2nd ACM Workshop on Online social networks. ACM, 37--42.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Duncan JWatts and Steven H Strogatz. 1998. Collective dynamics of 'small-world' networks. Nature 393, 6684 (1998), 440.Google ScholarGoogle Scholar

Index Terms

  1. Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
        August 2020
        3664 pages
        ISBN:9781450379984
        DOI:10.1145/3394486

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 20 August 2020

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate1,133of8,635submissions,13%

        Upcoming Conference

        KDD '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader