ABSTRACT
Extracting dense sub-components from graphs efficiently is an important objective in a wide range of application domains ranging from social network analysis to biological network analysis, from the World Wide Web to stock market analysis. Motivated by this need recently we have seen several new algorithms to tackle this problem based on the (frequent) pattern mining paradigm. A limitation of most of these methods is that they are highly sensitive to parameter settings, rely on exhaustive enumeration with exponential time complexity, and often fail to help the users understand the underlying distribution of components embedded within the host graph.
In this article we propose an approximate algorithm, to mine and visualize cohesive subgraphs (dense sub components) within a large graph. The approach, refereed to as Cohesive Subgraph Visualization (CSV) relies on a novel mapping strategy that maps edges and nodes to a multi-dimensional space wherein dense areas in the mapped space correspond to cohesive subgraphs. The algorithm then walks through the dense regions in the mapped space to output a visual plot that effectively captures the overall dense sub-component distribution of the graph. Unlike extant algorithms with exponential complexity, CSV has a complexity of O(V2logV) when fixing the parameter mapping dimension, where V corresponds to the number of vertices in the graph, although for many real datasets the performance is typically sub-quadratic.
We demonstrate the utility of CSV as a stand-alone tool for visual graph exploration and as a pre-filtering step to significantly scale up exact subgraph mining algorithms such as CLAN.
- J. Abello, M. G. C. Resende, and S. Sudarsky. Massive quasi-clique detection. In LATIN, pages 598--612, 2002. Google ScholarDigital Library
- R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors, Proc. 20th Int. Conf. Very Large Data Bases, VLDB, pages 487--499. Morgan Kaufmann, 12?15 1994. Google ScholarDigital Library
- M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: Ordering points to identify the clustering structure. In Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD?99), pages 49--60, Philadelphia, PA, June 1999. Google ScholarDigital Library
- A.P.Gasch and M.B.Eisen. Exploring the conditional coregulation of yeast gene expression through fuzzy k-mean clustering. Genome Biol., 3(RESEARCH0059), 2002.Google Scholar
- S. Asur, D. Ucar, and S. Parthasarathy. An ensemble framework for clustering protein´lcprotein interaction networks. In ISMB ?07: Proceedings of the 15th Annual International Conference on Intelligent Systems, 2007.Google Scholar
- G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In Proc. 2002 Int. Conf. Knowledge Discovery and Data Mining, 2002. Google ScholarDigital Library
- V. Boginski, S. Butenko, and P. M. Pardalos. Mining market data: a network approach. Comput. Oper. Res., 33(11):3171--3184, 2006. Google ScholarDigital Library
- B. Bollobas. Extremal Graph Theory. Dover Publications, Incorporated, 1978. Google ScholarDigital Library
- B. J. Breitkreutz, C. Stark, and M. Tyers. Osprey: a network visualization system. Genome Biol, 4, 2003.Google Scholar
- B. Bustos, G. Navarro, and E. Chávez. Pivot selection techniques for proximity searching in metric spaces. Pattern Recogn. Lett., 24(14):2357--2366, 2003. Google ScholarDigital Library
- J. Cheriyan and R. Thurimella. Fast Algorithms for k-Shredders and k -Node Connectivity Augmentation (Extended Abstract). 1996.Google Scholar
- G. Cong, K. Tan, A. Tung, and F. Pan. Mining Frequent Closed Patterns in Microarray Data. Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM?04)-Volume 00, pages 363--366, 2004. Google ScholarDigital Library
- G. Cong, K. Tan, A. Tung, and X. Xu. Mining top-K covering rule groups for gene expression data. Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 670--681, 2005. Google ScholarDigital Library
- G. Cong, A. Tung, X. Xu, F. Pan, and J. Yang. FARMER: finding interesting rule groups in microarray datasets. Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 143--154, 2004. Google ScholarDigital Library
- M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In Proc. 1996 Int. Conf. Knowledge Discovery and Data Mining (KDD?96), pages 226--231, Portland, Oregon, Aug. 1996.Google Scholar
- C. Faloutsos and K.-I. Lin. FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Proc. 1995 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD?95), pages 163--174, San Jose, CA, May 1995. Google ScholarDigital Library
- H.Hu, X.Yan, Y.Huang, J.Han, and X.J.Zhou. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics, 21(1):213--221, 2005. Google ScholarDigital Library
- V. Hristidis, L. Gravano, and Y. Papakonstantinou. Efficient IR-style keyword search over relational databases. In Proceedings of 29th International Conference on Very Large Data Bases (VLDB), 2003. Google ScholarDigital Library
- V. Hristidis, L. Gravano, and Y. Papakonstantinou. Efficient IR-style keyword search over relational databases. In Proceedings of 29th International Conference on Very Large Data Bases (VLDB), 2003. Google ScholarDigital Library
- H. Hu, X. Yan, Y. Huang, J. Han, and X. Zhou. Mining coherent dense subgraphs across massive biological networks for functional discovery. 2005.Google Scholar
- A. Inokuchi, T. Washio, and H. Motoda. Complete mining of frequent patterns from graphs: Mining graph data. Mach. Learn., 50(3):321--354, 2003. Google ScholarDigital Library
- G. Karypis and V. Kumar. Parallel multilevel k-way partitioning scheme for irregular graphs. In Supercomputing'96: Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), page 35, Washington, DC, USA, 1996. IEEE Computer Society. Google ScholarDigital Library
- G. Kossinets and D. Watts. Empirical analysis of an evolving social network. Science Magazine, 311(5757), 2006.Google Scholar
- M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM, pages 313--320, 2001. Google ScholarDigital Library
- A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, volume 14, 2001.Google Scholar
- F. Pan, G. Cong, A. Tung, J. Yang, and M. Zaki. Carpenter: finding closed patterns in long biological datasets. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 637--642, 2003. Google ScholarDigital Library
- F. Pan, A. Tung, G. Cong, and X. Xu. COBBLER: combining column and row enumeration for closed pattern discovery. Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on, pages 21--30, 2004. Google ScholarDigital Library
- C. T. J. R. S. Filho, A. Traina and C. Faloutsos. Similarity search without tears: The omni family of all-purpose access methods. In Proc. of the 17th IEEE Intl. Conf. on Data Engineering, 2001. Google ScholarDigital Library
- S. Seidman. Network structure and minimum degree. Social Networks, 5:269--287, 1983.Google ScholarCross Ref
- A. Srinivasan. The value of strong inapproximability results for clique. In STOC ?00: Proceedings of the thirty-second annual ACM symposium on Theory of computing, pages 144--152, New York, NY, USA, 2000. ACM. Google ScholarDigital Library
- S. Tri and U. Leser. Gripp - indexing and querying graphs based on pre- and postorder numbering. Technical report, 2006.Google Scholar
- P. Turan. On an extremal problem in graph theory. Mat. Fiz. Lapok, 48:436--452, 1941.Google Scholar
- J. Wang, Z. Zeng, and L. Zhou. Clan: An algorithm for mining closed cliques from large dense graph databases. In Proceedings of the International Conference on Data Engineering, page 73, 2006. Google ScholarDigital Library
- J. Wang, Z. Zeng, and L. Zhou. Clan: An algorithm for mining closed cliques from large dense graph databases. In Proceedings of the International Conference on Data Engineering, page 73, 2006. Google ScholarDigital Library
- X. Yan, X. J. Zhou, and J. Han. Mining closed relational graphs with connectivity constraints. In KDD, pages 324--333, 2005. Google ScholarDigital Library
- J. W. J. Z., Mellor and C. DeLisi. Visant: an online visualization and analysis tool for biological interaction data. In BMC Bioinformatics, 5:17--24, 2004.Google ScholarCross Ref
- Z. Zeng, J. Wang, L. Zhou, and G. Karypis. Coherent closed quasi-clique discovery from large dense graph databases. In Proceedings of the International Conf. Knowledge Discovery and Data Mining, pages 797--802, 2006. Google ScholarDigital Library
Index Terms
- CSV: visualizing and mining cohesive subgraphs
Recommendations
Mining frequent cross-graph quasi-cliques
Joint mining of multiple datasets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in bioinformatics, jointly mining multiple gene expression datasets obtained by different ...
On the Maximum Number of Cliques in a Graph
A clique is a set of pairwise adjacent vertices in a graph. We determine the maximum number of cliques in a graph for the following graph classes: (1) graphs with n vertices and m edges; (2) graphs with n vertices, m edges, and maximum degree Δ; (3) d-...
On minimal vertex separators of dually chordal graphs: Properties and characterizations
Many works related to dually chordal graphs, their cliques and neighborhoods were published by Brandstadt et al. (1998) [1] and Gutierrez (1996) [6]. We will undertake a similar study by considering minimal vertex separators and their properties ...
Comments