ABSTRACT
This paper studies k-plexes, a well known pseudo-clique model for network communities. In a k-plex, each node can miss at most k-1 links. Our goal is to detect large communities in today's real-world graphs which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, a well known hard problem. We present D2K, which is the first algorithm able to find large k-plexes of very large graphs in just a few minutes. The good performance of our algorithm follows from a combination of graph-theoretical concepts, careful algorithm engineering and a high-performance implementation. In particular, we exploit the low degeneracy of real-world graphs, and the fact that large enough k-plexes have diameter 2. We validate a sequential and a parallel/distributed implementation of D2K on real graphs with up to half a billion edges.
- David Avis and Komei Fukuda. Reverse search for enumeration. Discrete Applied Mathematics, 65(1--3):21 -- 46, 1996. Google ScholarDigital Library
- Rachel Behar and Sara Cohen. Finding all maximal connected s-cliques in social networks. In 21th International Conference on Extending Database Technology, EDBT, pages 61--72, 2018.Google Scholar
- Devora Berlowitz, Sara Cohen, and Benny Kimelfeld. Efficient enumeration of maximal k-plexes. In 2015 ACM SIGMOD International Conference on Management of Data, pages 431--444. ACM, 2015. Google ScholarDigital Library
- Paolo Boldi, Bruno Codenotti, Massimo Santini, and Sebastiano Vigna. Ubicrawler: A scalable fully distributed web crawler. Software: Practice & Experience, 34(8):711--726, 2004. Google ScholarDigital Library
- Coen Bron and Joep Kerbosch. Algorithm 457: finding all cliques of an undirected graph. Communications of the ACM, 16(9):575--577, 1973. Google ScholarDigital Library
- Coenraad Bron and Joep Kerbosch. Finding all cliques of an undirected graph (algorithm 457). Communications of the ACM, 16(9):575--576, 1973. Google ScholarDigital Library
- James Cheng, Linhong Zhu, Yiping Ke, and Shumo Chu. Fast algorithms for maximal clique enumeration with limited memory. In 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pages 1240--1248, 2012. Google ScholarDigital Library
- Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM Journal on Computing, 14(1):210--223, 1985. Google ScholarDigital Library
- Alessio Conte, Donatella Firmani, Caterina Mordente, Maurizio Patrignani, and Riccardo Torlone. Fast enumeration of large k-plexes. In 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pages 115--124, 2017. Google ScholarDigital Library
- Alessio Conte, Roberto Grossi, Andrea Marino, and Luca Versari. Sublinear-space bounded-delay enumeration for massive network analytics: Maximal cliques. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP, pages 148:1--148:15, 2016.Google Scholar
- Alessio Conte, Roberto De Virgilio, Antonio Maccioni, Maurizio Patrignani, and Riccardo Torlone. Finding all maximal cliques in very large social networks. In 19th International Conference on Extending Database Technology, EDBT, pages 173--184, 2016.Google Scholar
- David Eppstein, Maarten Löffler, and Darren Strash. Listing all maximal cliques in large sparse real-world graphs. ACM Journal of Experimental Algorithmics, 18, 2013. Google ScholarDigital Library
- Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75 -- 174, 2010.Google ScholarCross Ref
- Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. Querying k-truss community in large and dynamic graphs. In 2014 ACM SIGMOD International Conference on Management of Data, pages 1311--1322. ACM, 2014. Google ScholarDigital Library
- Guimei Liu and Limsoon Wong. Effective pruning techniques for mining quasi-cliques. In Machine Learning and Knowledge Discovery in Databases, pages 33--49. Springer Berlin Heidelberg, 2008.Google ScholarDigital Library
- Kazuhisa Makino and Takeaki Uno. New algorithms for enumerating all maximal cliques. In Algorithm Theory-SWAT 2004, pages 260--272. Springer, 2004.Google ScholarCross Ref
- Robert J Mokken. Cliques, clubs and clans. Quality and quantity, 13(2):161--173, 1979.Google ScholarCross Ref
- Alberto Montresor, Francesco De Pellegrini, and Daniele Miorandi. Distributed k-core decomposition. IEEE Transactions on parallel and distributed systems, 24(2):288--300, 2013. Google ScholarDigital Library
- Stephen B Seidman and Brian L Foster. A graph-theoretic generalization of the clique concept. Journal of Mathematical sociology, 6(1):139--154, 1978.Google ScholarCross Ref
- Etsuji Tomita, Akira Tanaka, and Haruhisa Takahashi. The worst-case time complexity for generating all maximal cliques and computational experiments. Theoretical Computer Science, 363(1):28--42, 2006. Google ScholarDigital Library
- Charalampos E. Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria A. Tsiarli. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, pages 104--112, 2013. Google ScholarDigital Library
- Shuji Tsukiyama, Mikio Ide, Hiromu Ariyoshi, and Isao Shirakawa. A new algorithm for generating all the maximal independent sets. SIAM Journal on Computing, 6(3):505--517, 1977.Google ScholarCross Ref
- Takeaki Uno. An efficient algorithm for solving pseudo clique enumeration problem. Algorithmica, 56(1):3--16, 2008.Google ScholarCross Ref
- Zhuo Wang, Qun Chen, Boyi Hou, Bo Suo, Zhanhuai Li, Wei Pan, and Zachary G. Ives. Parallelizing maximal clique and k-plex enumeration over graph data. Journal of Parallel and Distributed Computing, 106:79 -- 91, 2017.Google Scholar
- Bin Wu and Xin Pei. A parallel algorithm for enumerating all the maximal k-plexes. In Emerging Technologies in Knowledge Discovery and Data Mining, pages 476--483, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. Google ScholarDigital Library
- Yanyan Xu, James Cheng, Ada Wai-Chee Fu, and Yingyi Bu. Distributed maximal clique computation. In 2014 IEEE International Congress on Big Data, pages 160--167. IEEE, 2014. Google ScholarDigital Library
- Hongjie Zhai, Makoto Haraguchi, Yoshiaki Okubo, and Etsuji Tomita. A fast and complete algorithm for enumerating pseudo-cliques in large graphs. International Journal of Data Science and Analytics, 2(3):145--158, Dec 2016.Google ScholarCross Ref
- Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. Finding top-k maximal cliques in an uncertain graph. In 26th International Conference on Data Engineering, ICDE, pages 649--652. IEEE, 2010.Google ScholarCross Ref
Index Terms
- D2K: Scalable Community Detection in Massive Networks via Small-Diameter k-Plexes
Recommendations
Fast Enumeration of Large k-Plexes
KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data MiningK-plexes are a formal yet flexible way of defining communities in networks. They generalize the notion of cliques and are more appropriate in most real cases: while a node of a clique C is connected to all other nodes of C, a node of a k-plex may miss ...
Arithmetical Semirings
AbstractWe study the number of connected graphs with n vertices that cannot be written as the cartesian product of two graphs with fewer vertices. We give an upper bound which implies that for large n almost all graphs are both connected and ...
A Declarative Framework for Maximal k-plex Enumeration Problems
AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent SystemsIt is widely accepted that an ideal community in networks is the one whose structure is closest to a (maximal) clique. However, inmost real-world graphs the clique model is too restrictive, as it requires complete pairwise interactions. More relaxed ...
Comments