skip to main content
research-article

Finding maximal cliques in massive networks

Authors Info & Claims
Published:19 December 2011Publication History
Skip Abstract Section

Abstract

Maximal clique enumeration is a fundamental problem in graph theory and has important applications in many areas such as social network analysis and bioinformatics. The problem is extensively studied; however, the best existing algorithms require memory space linear in the size of the input graph. This has become a serious concern in view of the massive volume of today's fast-growing networks. We propose a general framework for designing external-memory algorithms for maximal clique enumeration in large graphs. The general framework enables maximal clique enumeration to be processed recursively in small subgraphs of the input graph, thus allowing in-memory computation of maximal cliques without the costly random disk access. We prove that the set of cliques obtained by the recursive local computation is both correct (i.e., globally maximal) and complete. The subgraph to be processed each time is defined based on a set of base vertices that can be flexibly chosen to achieve different purposes. We discuss the selection of the base vertices to fully utilize the available memory in order to minimize I/O cost in static graphs, and for update maintenance in dynamic graphs. We also apply our framework to design an external-memory algorithm for maximum clique computation in a large graph.

References

  1. Abu-Khzam, F. N., Baldwin, N. E., Langston, M. A., and Samatova, N. F. 2005. On the relative efficiency of maximal clique enumeration algorithms, with applications to high-throughput computational biology. In Proceedings of the International Conference on Research Trends in Science and Technology.Google ScholarGoogle Scholar
  2. Akkoyunlu, E. A. 1973. The enumeration of maximal cliques of large graphs. SIAM J. Comput. 2, 1, 1--6.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bernard, H. R., Killworth, P. D., and Sailer, L. 1979. Informant accuracy in social network data iv: a comparison of clique-level structure in behavioral and cognitive network data. Social Netw. 2, 3, 191--218.Google ScholarGoogle ScholarCross RefCross Ref
  4. Berry, N. M., Ko, T. H., Moy, T., Smrcka, J., Turnley, J., and Wu, B. 2004. Emergent clique formation in terrorist recruitment. In Proceedings of the AAAI-04 Workshop on Agent Organizations: Theory and Practice.Google ScholarGoogle Scholar
  5. Boginski, V., Butenko, S., and Pardalos, P. M. 2005. Statistical analysis of financial networks. Comput. Statist. Data Anal. 48, 2, 431--443.Google ScholarGoogle ScholarCross RefCross Ref
  6. Bomze, I. M., Budinich, M., Pardalos, P. M., and Pelillo, M. 1999. The maximum clique problem. In Handbook of Combinatorial Optimization. Kluwer Academic Publishers, 1--74.Google ScholarGoogle Scholar
  7. Bron, C. and Kerbosch, J. 1973. Algorithm 457: finding all cliques of an undirected graph. Comm. ACM 16, 9, 575--577. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Byskov, J. M. 2003. Algorithms for k-colouring and finding maximal independent sets. In Proceedings of the Symposium on Discrete Algorithms (SODA). 456--457. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Cazals, F. and Karande, C. 2008. A note on the problem of reporting maximal cliques. Theor. Comput. Sci. 407, 1-3, 564--568. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Cheng, J., Ke, Y., Fu, A. W.-C., Yu, J. X., and Zhu, L. 2010. Finding maximal cliques in massive networks by h*-graph. In Proceedings of the SIGMOD International Conference on Management of Data. 447--458. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Creamer, G., Rowe, R., Hershkop, S., and Stolfo, S. J. 2007. Segmentation and automated social hierarchy detection through email network analysis. In Proceedings of WebKDD/SNA-KDD. 40--58. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Dorogovtsev, S. N. and Mendesand, J. F. F. 2003. Evolution of Networks: From Biological Nets to the Internet and www. Oxford University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Du, N., Wu, B., Xu, L., Wang, B., and Xin, P. 2009. Parallel algorithm for enumerating maximal cliques in complex network. In Mining Complex Data, 207--221.Google ScholarGoogle Scholar
  14. Faloutsos, M., Faloutsos, P., and Faloutsos, C. 1999. On power-law relationships of the internet topology. In Proceedings of the ACM SIGCOMM Conference on Applications, Technologies, Architectures and Protocols for Computer Communications (SIGCOMM). 251--262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Faust, K. and Wasserman, S. 1995. Social Network Analysis: Methods and Applications. Cambridge University Press.Google ScholarGoogle Scholar
  16. Gouda, K. and Zaki, M. J. 2001. Efficiently mining maximal frequent itemsets. In Proceedings of the International Conference on Data Mining (ICDM). 163--170. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Kilby, P., Slaney, J. K., Thiébaux, S., and Walsh, T. 2006. Estimating search tree size. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Knuth, D. E. 1975. Estimating the efficiency of backtrack programs. Math. Comput. 29, 129, 121--136.Google ScholarGoogle Scholar
  19. Koch, I. 2001. Enumerating all connected maximal common subgraphs in two graphs. Theor. Comput. Sci. 250, 1-2, 1--30. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Kose, F., Weckwerth, W., Linke, T., and Fiehn, O. 2001. Visualizing plant metabolomic correlation networks using clique-metabolite matrices. Bioinf. 17, 12, 1198--1208.Google ScholarGoogle ScholarCross RefCross Ref
  21. Makino, K. and Uno, T. 2004. New algorithms for enumerating all maximal cliques. In Proceedings of the Scandinavian Workshop on Algorithms Theory (SWAT). 260--272.Google ScholarGoogle Scholar
  22. Modani, N. and Dey, K. 2008. Large maximal cliques enumeration in sparse graphs. In Proceedings of the International Conference on Information and Knowledge Management (CIKM). 1377--1378. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Mohseni-Zadeh, S., Brézellec, P., and Risler, J.-L. 2004. Cluster-c, an algorithm for the large-scale clustering of protein sequences based on the extraction of maximal cliques. Comput. Biol. Chemist. 28, 3, 211--218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Newman, M. E. J. 2003. The structure and function of complex networks. SIAM Rev. 45, 167--256.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Schmidt, M. C., Samatova, N. F., Thomas, K., and Park, B.-H. 2009. A scalable, parallel algorithm for maximal clique enumeration. J. Parall. Distrib. Comput. 69, 4, 417--428. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Stix, V. 2004. Finding all maximal cliques in dynamic graphs. Comput. Optimiz. Appl. 27, 173--186. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Tomita, E. and Seki, T. 2003. An efficient branch-and-bound algorithm for finding a maximum clique. Discr. Math. Theoret. Comput. Sci. 2731, 278--289. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Tomita, E., Tanaka, A., and Takahashi, H. 2006. The worst-case time complexity for generating all maximal cliques and computational experiments. Theor. Comput. Sci. 363, 1, 28--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Tsukiyama, S., Ide, M., Ariyoshi, H., and Shirakawa, I. 1977. A new algorithm for generating all the maximal independent sets. SIAM J. Comput. 6, 3, 505--517.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Wan, L., Wu, B., Du, N., Ye, Q., and Chen, P. 2006. A new algorithm for enumerating all maximal cliques in complex network. In Proceedings of the International Conference on Advanced Data Mining and Applications (ADMA). 606--617. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Zhang, B., Park, B.-H., Karpinets, T. V., and Samatova, N. F. 2008. From pull-down data to protein interaction networks and complexes with biological relevance. Bioinf. 24, 7, 979--986. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Finding maximal cliques in massive networks

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Database Systems
      ACM Transactions on Database Systems  Volume 36, Issue 4
      December 2011
      271 pages
      ISSN:0362-5915
      EISSN:1557-4644
      DOI:10.1145/2043652
      Issue’s Table of Contents

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 19 December 2011
      • Accepted: 1 April 2011
      • Revised: 1 February 2011
      • Received: 1 October 2010
      Published in tods Volume 36, Issue 4

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader