Abstract
Maximal clique enumeration is a fundamental problem in graph theory and has important applications in many areas such as social network analysis and bioinformatics. The problem is extensively studied; however, the best existing algorithms require memory space linear in the size of the input graph. This has become a serious concern in view of the massive volume of today's fast-growing networks. We propose a general framework for designing external-memory algorithms for maximal clique enumeration in large graphs. The general framework enables maximal clique enumeration to be processed recursively in small subgraphs of the input graph, thus allowing in-memory computation of maximal cliques without the costly random disk access. We prove that the set of cliques obtained by the recursive local computation is both correct (i.e., globally maximal) and complete. The subgraph to be processed each time is defined based on a set of base vertices that can be flexibly chosen to achieve different purposes. We discuss the selection of the base vertices to fully utilize the available memory in order to minimize I/O cost in static graphs, and for update maintenance in dynamic graphs. We also apply our framework to design an external-memory algorithm for maximum clique computation in a large graph.
- Abu-Khzam, F. N., Baldwin, N. E., Langston, M. A., and Samatova, N. F. 2005. On the relative efficiency of maximal clique enumeration algorithms, with applications to high-throughput computational biology. In Proceedings of the International Conference on Research Trends in Science and Technology.Google Scholar
- Akkoyunlu, E. A. 1973. The enumeration of maximal cliques of large graphs. SIAM J. Comput. 2, 1, 1--6.Google ScholarDigital Library
- Bernard, H. R., Killworth, P. D., and Sailer, L. 1979. Informant accuracy in social network data iv: a comparison of clique-level structure in behavioral and cognitive network data. Social Netw. 2, 3, 191--218.Google ScholarCross Ref
- Berry, N. M., Ko, T. H., Moy, T., Smrcka, J., Turnley, J., and Wu, B. 2004. Emergent clique formation in terrorist recruitment. In Proceedings of the AAAI-04 Workshop on Agent Organizations: Theory and Practice.Google Scholar
- Boginski, V., Butenko, S., and Pardalos, P. M. 2005. Statistical analysis of financial networks. Comput. Statist. Data Anal. 48, 2, 431--443.Google ScholarCross Ref
- Bomze, I. M., Budinich, M., Pardalos, P. M., and Pelillo, M. 1999. The maximum clique problem. In Handbook of Combinatorial Optimization. Kluwer Academic Publishers, 1--74.Google Scholar
- Bron, C. and Kerbosch, J. 1973. Algorithm 457: finding all cliques of an undirected graph. Comm. ACM 16, 9, 575--577. Google ScholarDigital Library
- Byskov, J. M. 2003. Algorithms for k-colouring and finding maximal independent sets. In Proceedings of the Symposium on Discrete Algorithms (SODA). 456--457. Google ScholarDigital Library
- Cazals, F. and Karande, C. 2008. A note on the problem of reporting maximal cliques. Theor. Comput. Sci. 407, 1-3, 564--568. Google ScholarDigital Library
- Cheng, J., Ke, Y., Fu, A. W.-C., Yu, J. X., and Zhu, L. 2010. Finding maximal cliques in massive networks by h*-graph. In Proceedings of the SIGMOD International Conference on Management of Data. 447--458. Google ScholarDigital Library
- Creamer, G., Rowe, R., Hershkop, S., and Stolfo, S. J. 2007. Segmentation and automated social hierarchy detection through email network analysis. In Proceedings of WebKDD/SNA-KDD. 40--58. Google ScholarDigital Library
- Dorogovtsev, S. N. and Mendesand, J. F. F. 2003. Evolution of Networks: From Biological Nets to the Internet and www. Oxford University Press. Google ScholarDigital Library
- Du, N., Wu, B., Xu, L., Wang, B., and Xin, P. 2009. Parallel algorithm for enumerating maximal cliques in complex network. In Mining Complex Data, 207--221.Google Scholar
- Faloutsos, M., Faloutsos, P., and Faloutsos, C. 1999. On power-law relationships of the internet topology. In Proceedings of the ACM SIGCOMM Conference on Applications, Technologies, Architectures and Protocols for Computer Communications (SIGCOMM). 251--262. Google ScholarDigital Library
- Faust, K. and Wasserman, S. 1995. Social Network Analysis: Methods and Applications. Cambridge University Press.Google Scholar
- Gouda, K. and Zaki, M. J. 2001. Efficiently mining maximal frequent itemsets. In Proceedings of the International Conference on Data Mining (ICDM). 163--170. Google ScholarDigital Library
- Kilby, P., Slaney, J. K., Thiébaux, S., and Walsh, T. 2006. Estimating search tree size. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). Google ScholarDigital Library
- Knuth, D. E. 1975. Estimating the efficiency of backtrack programs. Math. Comput. 29, 129, 121--136.Google Scholar
- Koch, I. 2001. Enumerating all connected maximal common subgraphs in two graphs. Theor. Comput. Sci. 250, 1-2, 1--30. Google ScholarDigital Library
- Kose, F., Weckwerth, W., Linke, T., and Fiehn, O. 2001. Visualizing plant metabolomic correlation networks using clique-metabolite matrices. Bioinf. 17, 12, 1198--1208.Google ScholarCross Ref
- Makino, K. and Uno, T. 2004. New algorithms for enumerating all maximal cliques. In Proceedings of the Scandinavian Workshop on Algorithms Theory (SWAT). 260--272.Google Scholar
- Modani, N. and Dey, K. 2008. Large maximal cliques enumeration in sparse graphs. In Proceedings of the International Conference on Information and Knowledge Management (CIKM). 1377--1378. Google ScholarDigital Library
- Mohseni-Zadeh, S., Brézellec, P., and Risler, J.-L. 2004. Cluster-c, an algorithm for the large-scale clustering of protein sequences based on the extraction of maximal cliques. Comput. Biol. Chemist. 28, 3, 211--218. Google ScholarDigital Library
- Newman, M. E. J. 2003. The structure and function of complex networks. SIAM Rev. 45, 167--256.Google ScholarDigital Library
- Schmidt, M. C., Samatova, N. F., Thomas, K., and Park, B.-H. 2009. A scalable, parallel algorithm for maximal clique enumeration. J. Parall. Distrib. Comput. 69, 4, 417--428. Google ScholarDigital Library
- Stix, V. 2004. Finding all maximal cliques in dynamic graphs. Comput. Optimiz. Appl. 27, 173--186. Google ScholarDigital Library
- Tomita, E. and Seki, T. 2003. An efficient branch-and-bound algorithm for finding a maximum clique. Discr. Math. Theoret. Comput. Sci. 2731, 278--289. Google ScholarDigital Library
- Tomita, E., Tanaka, A., and Takahashi, H. 2006. The worst-case time complexity for generating all maximal cliques and computational experiments. Theor. Comput. Sci. 363, 1, 28--42. Google ScholarDigital Library
- Tsukiyama, S., Ide, M., Ariyoshi, H., and Shirakawa, I. 1977. A new algorithm for generating all the maximal independent sets. SIAM J. Comput. 6, 3, 505--517.Google ScholarDigital Library
- Wan, L., Wu, B., Du, N., Ye, Q., and Chen, P. 2006. A new algorithm for enumerating all maximal cliques in complex network. In Proceedings of the International Conference on Advanced Data Mining and Applications (ADMA). 606--617. Google ScholarDigital Library
- Zhang, B., Park, B.-H., Karpinets, T. V., and Samatova, N. F. 2008. From pull-down data to protein interaction networks and complexes with biological relevance. Bioinf. 24, 7, 979--986. Google ScholarDigital Library
Index Terms
- Finding maximal cliques in massive networks
Recommendations
Finding maximal cliques in massive networks by H*-graph
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of dataMaximal clique enumeration (MCE) is a fundamental problem in graph theory and has important applications in many areas such as social network analysis and bioinformatics. The problem is extensively studied; however, the best existing algorithms require ...
Mining maximal cliques from a large graph using MapReduce
We consider Maximal Clique Enumeration (MCE) from a large graph. A maximal clique is perhaps the most fundamental dense substructure in a graph, and MCE is an important tool to discover densely connected subgraphs, with numerous applications to data ...
Coloring the Maximal Cliques of Graphs
In this paper we are concerned with the so-called clique-colorations of a graph, that is, colorations of the vertices so that no maximal clique is monochromatic. On one hand, it is known to be NP-complete to decide whether a perfect graph is 2-clique-...
Comments