research-article

Public Access

Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods

Authors:
Aritra Konar

University of Virginia, Charlottesville, VA, USA

University of Virginia, Charlottesville, VA, USA
View Profile

,
Nicholas D. Sidiropoulos

University of Virginia, Charlottesville, VA, USA

University of Virginia, Charlottesville, VA, USA
View Profile

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningAugust 2020Pages 577–587https://doi.org/10.1145/3394486.3403100

Published:20 August 2020Publication History

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 577–587

ABSTRACT

Mining dense subgraphs is an important primitive across a spectrum of graph-mining tasks. In this work, we formally establish that two recurring characteristics of real-world graphs, namely heavy-tailed degree distributions and large clustering coefficients, imply the existence of substantially large vertex neighborhoods with high edge-density. This observation suggests a very simple approach for extracting large quasi-cliques: simply scan the vertex neighborhoods, compute the clustering coefficient of each vertex, and output the best such subgraph. The implementation of such a method requires counting the triangles in a graph, which is a well-studied problem in graph mining. When empirically tested across a number of real-world graphs, this approach reveals a surprise: vertex neighborhoods include maximal cliques of non-trivial sizes, and the density of the best neighborhood often compares favorably to subgraphs produced by dedicated algorithms for maximizing subgraph density. For graphs with small clustering coefficients, we demonstrate that small vertex neighborhoods can be refined using a local-search method to grow larger cliques and near-cliques. Our results indicate that contrary to worst-case theoretical results, mining cliques and quasi-cliques of non-trivial sizes from real-world graphs is often not a difficult problem, and provides motivation for further work geared towards a better explanation of these empirical successes.

References

2015. Large Near-Clique Detection. http://github.com/tsourolampis/Scalable-Large-Near-Clique-Detection.Google Scholar
Noga Alon and Joel H Spencer. 2016. The probabilistic method. John Wiley & Sons.Google ScholarCross Ref
Albert Angel, Nikos Sarkas, Nick Koudas, and Divesh Srivastava. 2012. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. In Proceedings of the 38th International Conference on Very Large Data Bases. VLDB Endowment, 574--585.Google ScholarDigital Library
Gary D Bader and Christopher WV Hogue. 2003. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4, 1 (2003), 2.Google ScholarCross Ref
Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509--512.Google Scholar
Vladimir Batagelj and Matjaz Zaversnik. 2003. An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049 (2003).Google Scholar
Coen Bron and Joep Kerbosch. 1973. Algorithm 457: Finding all cliques of an undirected graph. Commun. ACM 16, 9 (1973), 575--577.Google ScholarDigital Library
Gregory Buehrer and Kumar Chellapilla. 2008. A scalable pattern mining approach to web graph compression with communities. In Proceedings of the 2008 International Conference on Web Search and Data Mining. ACM, 95--106.Google ScholarDigital Library
Jose Cadena, Anil Kumar Vullikanti, and Charu C Aggarwal. 2016. On dense subgraphs in signed network streams. In Proceedings of the 16th IEEE International Conference on Data Mining (ICDM). IEEE, 51--60.Google ScholarCross Ref
Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In International Workshop on Approximation Algorithms for Combinatorial Optimization. Springer, 84--95.Google ScholarCross Ref
Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. 2009. Introduction to algorithms. MIT press.Google Scholar
Timothy A Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software (TOMS) 38, 1 (2011), 1.Google ScholarDigital Library
David Eppstein, Maarten Löffler, and Darren Strash. 2010. Listing all maximal cliques in sparse graphs in near-optimal time. In International Symposium on Algorithms and Computation. Springer, 403--414.Google ScholarCross Ref
Pooya Esfandiar, Francesco Bonchi, David F Gleich, Chen Greif, Laks VS Lakshmanan, and Byung-Won On. 2010. Fast Katz and commuters: Efficient estimation of social relatedness in large networks. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 132--145.Google ScholarCross Ref
Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. 1999. On power-law relationships of the internet topology. In ACM SIGCOMM Computer Communication Review, Vol. 29. ACM, 251--262.Google ScholarDigital Library
Alessandro Ferrante, Gopal Pandurangan, and Kihong Park. 2008. On the hardness of optimization in power-law graphs. Theoretical Computer Science 393, 1--3 (2008), 220--230.Google ScholarDigital Library
Michael R Garey and David S Johnson. 2002. Computers and intractability.Google Scholar
David Gibson, Ravi Kumar, and Andrew Tomkins. 2005. Discovering large dense subgraphs in massive graphs. In Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 721--732.Google ScholarDigital Library
David F Gleich and C Seshadhri. 2012. Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 597--605.Google ScholarDigital Library
Andrew V Goldberg. 1984. Finding a maximum density subgraph. Technical report, University of California Berkeley, CA.Google Scholar
Rishi Gupta, Tim Roughgarden, and Comandur Seshadhri. 2014. Decompositions of triangle-dense graphs. In Proceedings of the 5th conference on Innovations in Theoretical Computer Science. ACM, 471--482.Google ScholarDigital Library
Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos. 2016. Fraudar: Bounding graph fraud in the face of camouflage. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 895--904.Google ScholarDigital Library
Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science 407, 1--3 (2008), 458--473.Google ScholarDigital Library
Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.Google Scholar
Zhi-Quan Luo, Wing-Kin Ma, Anthony Man-Cho So, Yinyu Ye, and Shuzhong Zhang. 2010. Semidefinite relaxation of quadratic optimization problems. IEEE Signal Processing Magazine 3, 27 (2010), 20--34.Google ScholarCross Ref
Michael Mitzenmacher, Jakub Pachocki, Richard Peng, Charalampos Tsourakakis, and Shen Chen Xu. 2015. Scalable large near-clique detection in large-scale networks via sampling. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 815--824.Google ScholarDigital Library
Mark Newman. 2018. Networks. Oxford university press.Google Scholar
N Prulj, Dennis A Wigle, and Igor Jurisica. 2004. Functional topology in a network of protein interactions. Bioinformatics 20, 3 (2004), 340--348.Google ScholarDigital Library
Stephen B Seidman. 1983. Network structure and minimum degree. Social networks 5, 3 (1983), 269--287.Google Scholar
Lei Tang and Huan Liu. 2009. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 817--826.Google ScholarDigital Library
Charalampos Tsourakakis. 2015. The k-clique densest subgraph problem. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1122--1132.Google ScholarDigital Library
Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 104--112.Google ScholarDigital Library
Takeaki Uno. 2005. Maximal Clique Enumerator (MACE). http://research.nii.ac.jp/~uno/codes.htm.Google Scholar
Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P Gummadi. 2009. On the evolution of user interaction in Facebook. In Proceedings of the 2nd ACM Workshop on Online social networks. ACM, 37--42.Google ScholarDigital Library
Duncan JWatts and Steven H Strogatz. 1998. Collective dynamics of 'small-world' networks. Nature 393, 6684 (1998), 440.Google Scholar

Index Terms

Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods
1. Mathematics of computing
  1. Discrete mathematics
    1. Combinatorics
    2. Graph theory

Recommendations

Mining Largest Maximal Quasi-Cliques

Quasi-cliques are dense incomplete subgraphs of a graph that generalize the notion of cliques. Enumerating quasi-cliques from a graph is a robust way to detect densely connected structures with applications in bioinformatics and social network analysis. ...
Read More
Finding Maximal Quasi-cliques Containing a Target Vertex in a Graph
DATA 2015: Proceedings of 4th International Conference on Data Management Technologies and Applications

Many real-world phenomena such as social networks and biological networks can be modeled as graphs. Discovering dense sub-graphs from these graphs may be able to find interesting facts about the phenomena.

Quasi-cliques are a type of dense graphs, which ...
Read More
On Vertex-Disjoint Triangles in Tripartite Graphs and Multigraphs
Abstract
Let G be a tripartite graph with tripartition $(V_{1}, V_{2}, V_{3})$ , where $∣ V_{1} ∣ = ∣ V_{2} ∣ = ∣ V_{3} ∣ = k > 0$ . It is proved that if $d (x) + d (y) \geq 3 k$ for every pair of nonadjacent vertices $x \in V_{i}, y \in V_{j}$ with $i \neq j (i, j \in {1, 2, 3})$ , then G contains k vertex-disjoint triangles. As a ... $\frac{}{}$ $_{}_{}_{}$ $_{}_{}_{}$ $_{}$
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN:9781450379984
DOI:10.1145/3394486
General Chairs:
Rajesh Gupta
UC San Diego, USA
,
Yan Liu
USC, USA
,
Program Chairs:
Mohak Shah
LG Electronics, USA
,
Suju Rajan
Linkedin, USA
,
Publications Chairs:
Jiliang Tang
Michigan State, USA
,
B. Aditya Prakash
Georgia Tech, USA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 August 2020
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
clustering coefficients
neighborhoods
quasi-cliques
triangles
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate1,133of8,635submissions,13%
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 2
  Total Citations
  View Citations
- 375
  Total Downloads
- Downloads (Last 12 months)48
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Mining Largest Maximal Quasi-Cliques

Finding Maximal Quasi-cliques Containing a Target Vertex in a Graph

On Vertex-Disjoint Triangles in Tripartite Graphs and Multigraphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Mining Largest Maximal Quasi-Cliques

Finding Maximal Quasi-cliques Containing a Target Vertex in a Graph

On Vertex-Disjoint Triangles in Tripartite Graphs and Multigraphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media