research-article

D2K: Scalable Community Detection in Massive Networks via Small-Diameter k-Plexes

Authors:
Alessio Conte

National Institute of Informatics, Japan, Tokyo, Japan

National Institute of Informatics, Japan, Tokyo, Japan
View Profile

,
Tiziano De Matteis

University of Pisa, Pisa, Italy

University of Pisa, Pisa, Italy
View Profile

,
Daniele De Sensi

University of Pisa, Pisa, Italy

University of Pisa, Pisa, Italy
View Profile

,
Roberto Grossi

University of Pisa, Pisa, Italy

University of Pisa, Pisa, Italy
View Profile

,
Andrea Marino

University of Pisa, Pisa, Italy

University of Pisa, Pisa, Italy
View Profile

,
Luca Versari

University of Pisa, Pisa, Italy

University of Pisa, Pisa, Italy
View Profile

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningJuly 2018Pages 1272–1281https://doi.org/10.1145/3219819.3220093

Published:19 July 2018Publication History

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 1272–1281

ABSTRACT

This paper studies k-plexes, a well known pseudo-clique model for network communities. In a k-plex, each node can miss at most k-1 links. Our goal is to detect large communities in today's real-world graphs which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, a well known hard problem. We present D2K, which is the first algorithm able to find large k-plexes of very large graphs in just a few minutes. The good performance of our algorithm follows from a combination of graph-theoretical concepts, careful algorithm engineering and a high-performance implementation. In particular, we exploit the low degeneracy of real-world graphs, and the fact that large enough k-plexes have diameter 2. We validate a sequential and a parallel/distributed implementation of D2K on real graphs with up to half a billion edges.

References

David Avis and Komei Fukuda. Reverse search for enumeration. Discrete Applied Mathematics, 65(1--3):21 -- 46, 1996. Google ScholarDigital Library
Rachel Behar and Sara Cohen. Finding all maximal connected s-cliques in social networks. In 21th International Conference on Extending Database Technology, EDBT, pages 61--72, 2018.Google Scholar
Devora Berlowitz, Sara Cohen, and Benny Kimelfeld. Efficient enumeration of maximal k-plexes. In 2015 ACM SIGMOD International Conference on Management of Data, pages 431--444. ACM, 2015. Google ScholarDigital Library
Paolo Boldi, Bruno Codenotti, Massimo Santini, and Sebastiano Vigna. Ubicrawler: A scalable fully distributed web crawler. Software: Practice & Experience, 34(8):711--726, 2004. Google ScholarDigital Library
Coen Bron and Joep Kerbosch. Algorithm 457: finding all cliques of an undirected graph. Communications of the ACM, 16(9):575--577, 1973. Google ScholarDigital Library
Coenraad Bron and Joep Kerbosch. Finding all cliques of an undirected graph (algorithm 457). Communications of the ACM, 16(9):575--576, 1973. Google ScholarDigital Library
James Cheng, Linhong Zhu, Yiping Ke, and Shumo Chu. Fast algorithms for maximal clique enumeration with limited memory. In 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pages 1240--1248, 2012. Google ScholarDigital Library
Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM Journal on Computing, 14(1):210--223, 1985. Google ScholarDigital Library
Alessio Conte, Donatella Firmani, Caterina Mordente, Maurizio Patrignani, and Riccardo Torlone. Fast enumeration of large k-plexes. In 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pages 115--124, 2017. Google ScholarDigital Library
Alessio Conte, Roberto Grossi, Andrea Marino, and Luca Versari. Sublinear-space bounded-delay enumeration for massive network analytics: Maximal cliques. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP, pages 148:1--148:15, 2016.Google Scholar
Alessio Conte, Roberto De Virgilio, Antonio Maccioni, Maurizio Patrignani, and Riccardo Torlone. Finding all maximal cliques in very large social networks. In 19th International Conference on Extending Database Technology, EDBT, pages 173--184, 2016.Google Scholar
David Eppstein, Maarten Löffler, and Darren Strash. Listing all maximal cliques in large sparse real-world graphs. ACM Journal of Experimental Algorithmics, 18, 2013. Google ScholarDigital Library
Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75 -- 174, 2010.Google ScholarCross Ref
Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. Querying k-truss community in large and dynamic graphs. In 2014 ACM SIGMOD International Conference on Management of Data, pages 1311--1322. ACM, 2014. Google ScholarDigital Library
Guimei Liu and Limsoon Wong. Effective pruning techniques for mining quasi-cliques. In Machine Learning and Knowledge Discovery in Databases, pages 33--49. Springer Berlin Heidelberg, 2008.Google ScholarDigital Library
Kazuhisa Makino and Takeaki Uno. New algorithms for enumerating all maximal cliques. In Algorithm Theory-SWAT 2004, pages 260--272. Springer, 2004.Google ScholarCross Ref
Robert J Mokken. Cliques, clubs and clans. Quality and quantity, 13(2):161--173, 1979.Google ScholarCross Ref
Alberto Montresor, Francesco De Pellegrini, and Daniele Miorandi. Distributed k-core decomposition. IEEE Transactions on parallel and distributed systems, 24(2):288--300, 2013. Google ScholarDigital Library
Stephen B Seidman and Brian L Foster. A graph-theoretic generalization of the clique concept. Journal of Mathematical sociology, 6(1):139--154, 1978.Google ScholarCross Ref
Etsuji Tomita, Akira Tanaka, and Haruhisa Takahashi. The worst-case time complexity for generating all maximal cliques and computational experiments. Theoretical Computer Science, 363(1):28--42, 2006. Google ScholarDigital Library
Charalampos E. Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria A. Tsiarli. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, pages 104--112, 2013. Google ScholarDigital Library
Shuji Tsukiyama, Mikio Ide, Hiromu Ariyoshi, and Isao Shirakawa. A new algorithm for generating all the maximal independent sets. SIAM Journal on Computing, 6(3):505--517, 1977.Google ScholarCross Ref
Takeaki Uno. An efficient algorithm for solving pseudo clique enumeration problem. Algorithmica, 56(1):3--16, 2008.Google ScholarCross Ref
Zhuo Wang, Qun Chen, Boyi Hou, Bo Suo, Zhanhuai Li, Wei Pan, and Zachary G. Ives. Parallelizing maximal clique and k-plex enumeration over graph data. Journal of Parallel and Distributed Computing, 106:79 -- 91, 2017.Google Scholar
Bin Wu and Xin Pei. A parallel algorithm for enumerating all the maximal k-plexes. In Emerging Technologies in Knowledge Discovery and Data Mining, pages 476--483, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. Google ScholarDigital Library
Yanyan Xu, James Cheng, Ada Wai-Chee Fu, and Yingyi Bu. Distributed maximal clique computation. In 2014 IEEE International Congress on Big Data, pages 160--167. IEEE, 2014. Google ScholarDigital Library
Hongjie Zhai, Makoto Haraguchi, Yoshiaki Okubo, and Etsuji Tomita. A fast and complete algorithm for enumerating pseudo-cliques in large graphs. International Journal of Data Science and Analytics, 2(3):145--158, Dec 2016.Google ScholarCross Ref
Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. Finding top-k maximal cliques in an uncertain graph. In 26th International Conference on Data Engineering, ICDE, pages 649--652. IEEE, 2010.Google ScholarCross Ref

Index Terms

D2K: Scalable Community Detection in Massive Networks via Small-Diameter k-Plexes

Recommendations

Fast Enumeration of Large k-Plexes
KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

K-plexes are a formal yet flexible way of defining communities in networks. They generalize the notion of cliques and are more appropriate in most real cases: while a node of a clique C is connected to all other nodes of C, a node of a k-plex may miss ...
Read More
Arithmetical Semirings
Abstract
We study the number of connected graphs with n vertices that cannot be written as the cartesian product of two graphs with fewer vertices. We give an upper bound which implies that for large n almost all graphs are both connected and ...
Read More
A Declarative Framework for Maximal k-plex Enumeration Problems
AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems

It is widely accepted that an ideal community in networks is the one whose structure is closest to a (maximal) clique. However, inmost real-world graphs the clique model is too restrictive, as it requires complete pairwise interactions. More relaxed ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN:9781450355520
DOI:10.1145/3219819
General Chairs:
Yike Guo
Imperial College London
,
Faisal Farooq
IBM
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 19 July 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
community discovery
graph enumeration
k-plexes
parallel programming
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '18 Paper Acceptance Rate107of983submissions,11%Overall Acceptance Rate1,133of8,635submissions,13%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 40
  Total Citations
  View Citations
- 935
  Total Downloads
- Downloads (Last 12 months)69
- Downloads (Last 6 weeks)12
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

D2K: Scalable Community Detection in Massive Networks via Small-Diameter k-Plexes

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Fast Enumeration of Large k-Plexes

Arithmetical Semirings

A Declarative Framework for Maximal k-plex Enumeration Problems

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

D2K: Scalable Community Detection in Massive Networks via Small-Diameter k-Plexes

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Fast Enumeration of Large k-Plexes

Arithmetical Semirings

A Declarative Framework for Maximal k-plex Enumeration Problems

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media