skip to main content
10.1145/3219819.3220093acmotherconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

D2K: Scalable Community Detection in Massive Networks via Small-Diameter k-Plexes

Published:19 July 2018Publication History

ABSTRACT

This paper studies k-plexes, a well known pseudo-clique model for network communities. In a k-plex, each node can miss at most k-1 links. Our goal is to detect large communities in today's real-world graphs which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, a well known hard problem. We present D2K, which is the first algorithm able to find large k-plexes of very large graphs in just a few minutes. The good performance of our algorithm follows from a combination of graph-theoretical concepts, careful algorithm engineering and a high-performance implementation. In particular, we exploit the low degeneracy of real-world graphs, and the fact that large enough k-plexes have diameter 2. We validate a sequential and a parallel/distributed implementation of D2K on real graphs with up to half a billion edges.

References

  1. David Avis and Komei Fukuda. Reverse search for enumeration. Discrete Applied Mathematics, 65(1--3):21 -- 46, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Rachel Behar and Sara Cohen. Finding all maximal connected s-cliques in social networks. In 21th International Conference on Extending Database Technology, EDBT, pages 61--72, 2018.Google ScholarGoogle Scholar
  3. Devora Berlowitz, Sara Cohen, and Benny Kimelfeld. Efficient enumeration of maximal k-plexes. In 2015 ACM SIGMOD International Conference on Management of Data, pages 431--444. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Paolo Boldi, Bruno Codenotti, Massimo Santini, and Sebastiano Vigna. Ubicrawler: A scalable fully distributed web crawler. Software: Practice & Experience, 34(8):711--726, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Coen Bron and Joep Kerbosch. Algorithm 457: finding all cliques of an undirected graph. Communications of the ACM, 16(9):575--577, 1973. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Coenraad Bron and Joep Kerbosch. Finding all cliques of an undirected graph (algorithm 457). Communications of the ACM, 16(9):575--576, 1973. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. James Cheng, Linhong Zhu, Yiping Ke, and Shumo Chu. Fast algorithms for maximal clique enumeration with limited memory. In 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pages 1240--1248, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM Journal on Computing, 14(1):210--223, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Alessio Conte, Donatella Firmani, Caterina Mordente, Maurizio Patrignani, and Riccardo Torlone. Fast enumeration of large k-plexes. In 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pages 115--124, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Alessio Conte, Roberto Grossi, Andrea Marino, and Luca Versari. Sublinear-space bounded-delay enumeration for massive network analytics: Maximal cliques. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP, pages 148:1--148:15, 2016.Google ScholarGoogle Scholar
  11. Alessio Conte, Roberto De Virgilio, Antonio Maccioni, Maurizio Patrignani, and Riccardo Torlone. Finding all maximal cliques in very large social networks. In 19th International Conference on Extending Database Technology, EDBT, pages 173--184, 2016.Google ScholarGoogle Scholar
  12. David Eppstein, Maarten Löffler, and Darren Strash. Listing all maximal cliques in large sparse real-world graphs. ACM Journal of Experimental Algorithmics, 18, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75 -- 174, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  14. Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. Querying k-truss community in large and dynamic graphs. In 2014 ACM SIGMOD International Conference on Management of Data, pages 1311--1322. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Guimei Liu and Limsoon Wong. Effective pruning techniques for mining quasi-cliques. In Machine Learning and Knowledge Discovery in Databases, pages 33--49. Springer Berlin Heidelberg, 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Kazuhisa Makino and Takeaki Uno. New algorithms for enumerating all maximal cliques. In Algorithm Theory-SWAT 2004, pages 260--272. Springer, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  17. Robert J Mokken. Cliques, clubs and clans. Quality and quantity, 13(2):161--173, 1979.Google ScholarGoogle ScholarCross RefCross Ref
  18. Alberto Montresor, Francesco De Pellegrini, and Daniele Miorandi. Distributed k-core decomposition. IEEE Transactions on parallel and distributed systems, 24(2):288--300, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Stephen B Seidman and Brian L Foster. A graph-theoretic generalization of the clique concept. Journal of Mathematical sociology, 6(1):139--154, 1978.Google ScholarGoogle ScholarCross RefCross Ref
  20. Etsuji Tomita, Akira Tanaka, and Haruhisa Takahashi. The worst-case time complexity for generating all maximal cliques and computational experiments. Theoretical Computer Science, 363(1):28--42, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Charalampos E. Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria A. Tsiarli. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, pages 104--112, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Shuji Tsukiyama, Mikio Ide, Hiromu Ariyoshi, and Isao Shirakawa. A new algorithm for generating all the maximal independent sets. SIAM Journal on Computing, 6(3):505--517, 1977.Google ScholarGoogle ScholarCross RefCross Ref
  23. Takeaki Uno. An efficient algorithm for solving pseudo clique enumeration problem. Algorithmica, 56(1):3--16, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  24. Zhuo Wang, Qun Chen, Boyi Hou, Bo Suo, Zhanhuai Li, Wei Pan, and Zachary G. Ives. Parallelizing maximal clique and k-plex enumeration over graph data. Journal of Parallel and Distributed Computing, 106:79 -- 91, 2017.Google ScholarGoogle Scholar
  25. Bin Wu and Xin Pei. A parallel algorithm for enumerating all the maximal k-plexes. In Emerging Technologies in Knowledge Discovery and Data Mining, pages 476--483, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Yanyan Xu, James Cheng, Ada Wai-Chee Fu, and Yingyi Bu. Distributed maximal clique computation. In 2014 IEEE International Congress on Big Data, pages 160--167. IEEE, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Hongjie Zhai, Makoto Haraguchi, Yoshiaki Okubo, and Etsuji Tomita. A fast and complete algorithm for enumerating pseudo-cliques in large graphs. International Journal of Data Science and Analytics, 2(3):145--158, Dec 2016.Google ScholarGoogle ScholarCross RefCross Ref
  28. Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. Finding top-k maximal cliques in an uncertain graph. In 26th International Conference on Data Engineering, ICDE, pages 649--652. IEEE, 2010.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. D2K: Scalable Community Detection in Massive Networks via Small-Diameter k-Plexes

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Other conferences
          KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
          July 2018
          2925 pages
          ISBN:9781450355520
          DOI:10.1145/3219819

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 19 July 2018

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          KDD '18 Paper Acceptance Rate107of983submissions,11%Overall Acceptance Rate1,133of8,635submissions,13%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader