research-article

CSV: visualizing and mining cohesive subgraphs

Authors:
Nan Wang

National University of Singapore, Singapore, Singapore

National University of Singapore, Singapore, Singapore
View Profile

,
Srinivasan Parthasarathy

The Ohio State University, Columbus, OH, USA

The Ohio State University, Columbus, OH, USA
View Profile

,
Kian-Lee Tan

National University of Singapore, Singapore, Singapore

National University of Singapore, Singapore, Singapore
View Profile

,
Anthony K. H. Tung

National University of Singapore, Singapore, Singapore

National University of Singapore, Singapore, Singapore
View Profile

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of dataJune 2008Pages 445–458https://doi.org/10.1145/1376616.1376663

Published:09 June 2008Publication History

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Pages 445–458

ABSTRACT

Extracting dense sub-components from graphs efficiently is an important objective in a wide range of application domains ranging from social network analysis to biological network analysis, from the World Wide Web to stock market analysis. Motivated by this need recently we have seen several new algorithms to tackle this problem based on the (frequent) pattern mining paradigm. A limitation of most of these methods is that they are highly sensitive to parameter settings, rely on exhaustive enumeration with exponential time complexity, and often fail to help the users understand the underlying distribution of components embedded within the host graph.

In this article we propose an approximate algorithm, to mine and visualize cohesive subgraphs (dense sub components) within a large graph. The approach, refereed to as Cohesive Subgraph Visualization (CSV) relies on a novel mapping strategy that maps edges and nodes to a multi-dimensional space wherein dense areas in the mapped space correspond to cohesive subgraphs. The algorithm then walks through the dense regions in the mapped space to output a visual plot that effectively captures the overall dense sub-component distribution of the graph. Unlike extant algorithms with exponential complexity, CSV has a complexity of O(V²logV) when fixing the parameter mapping dimension, where V corresponds to the number of vertices in the graph, although for many real datasets the performance is typically sub-quadratic.

We demonstrate the utility of CSV as a stand-alone tool for visual graph exploration and as a pre-filtering step to significantly scale up exact subgraph mining algorithms such as CLAN.

References

J. Abello, M. G. C. Resende, and S. Sudarsky. Massive quasi-clique detection. In LATIN, pages 598--612, 2002. Google ScholarDigital Library
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors, Proc. 20th Int. Conf. Very Large Data Bases, VLDB, pages 487--499. Morgan Kaufmann, 12?15 1994. Google ScholarDigital Library
M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: Ordering points to identify the clustering structure. In Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD?99), pages 49--60, Philadelphia, PA, June 1999. Google ScholarDigital Library
A.P.Gasch and M.B.Eisen. Exploring the conditional coregulation of yeast gene expression through fuzzy k-mean clustering. Genome Biol., 3(RESEARCH0059), 2002.Google Scholar
S. Asur, D. Ucar, and S. Parthasarathy. An ensemble framework for clustering protein´lcprotein interaction networks. In ISMB ?07: Proceedings of the 15th Annual International Conference on Intelligent Systems, 2007.Google Scholar
G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In Proc. 2002 Int. Conf. Knowledge Discovery and Data Mining, 2002. Google ScholarDigital Library
V. Boginski, S. Butenko, and P. M. Pardalos. Mining market data: a network approach. Comput. Oper. Res., 33(11):3171--3184, 2006. Google ScholarDigital Library
B. Bollobas. Extremal Graph Theory. Dover Publications, Incorporated, 1978. Google ScholarDigital Library
B. J. Breitkreutz, C. Stark, and M. Tyers. Osprey: a network visualization system. Genome Biol, 4, 2003.Google Scholar
B. Bustos, G. Navarro, and E. Chávez. Pivot selection techniques for proximity searching in metric spaces. Pattern Recogn. Lett., 24(14):2357--2366, 2003. Google ScholarDigital Library
J. Cheriyan and R. Thurimella. Fast Algorithms for k-Shredders and k -Node Connectivity Augmentation (Extended Abstract). 1996.Google Scholar
G. Cong, K. Tan, A. Tung, and F. Pan. Mining Frequent Closed Patterns in Microarray Data. Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM?04)-Volume 00, pages 363--366, 2004. Google ScholarDigital Library
G. Cong, K. Tan, A. Tung, and X. Xu. Mining top-K covering rule groups for gene expression data. Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 670--681, 2005. Google ScholarDigital Library
G. Cong, A. Tung, X. Xu, F. Pan, and J. Yang. FARMER: finding interesting rule groups in microarray datasets. Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 143--154, 2004. Google ScholarDigital Library
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In Proc. 1996 Int. Conf. Knowledge Discovery and Data Mining (KDD?96), pages 226--231, Portland, Oregon, Aug. 1996.Google Scholar
C. Faloutsos and K.-I. Lin. FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Proc. 1995 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD?95), pages 163--174, San Jose, CA, May 1995. Google ScholarDigital Library
H.Hu, X.Yan, Y.Huang, J.Han, and X.J.Zhou. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics, 21(1):213--221, 2005. Google ScholarDigital Library
V. Hristidis, L. Gravano, and Y. Papakonstantinou. Efficient IR-style keyword search over relational databases. In Proceedings of 29th International Conference on Very Large Data Bases (VLDB), 2003. Google ScholarDigital Library
V. Hristidis, L. Gravano, and Y. Papakonstantinou. Efficient IR-style keyword search over relational databases. In Proceedings of 29th International Conference on Very Large Data Bases (VLDB), 2003. Google ScholarDigital Library
H. Hu, X. Yan, Y. Huang, J. Han, and X. Zhou. Mining coherent dense subgraphs across massive biological networks for functional discovery. 2005.Google Scholar
A. Inokuchi, T. Washio, and H. Motoda. Complete mining of frequent patterns from graphs: Mining graph data. Mach. Learn., 50(3):321--354, 2003. Google ScholarDigital Library
G. Karypis and V. Kumar. Parallel multilevel k-way partitioning scheme for irregular graphs. In Supercomputing'96: Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), page 35, Washington, DC, USA, 1996. IEEE Computer Society. Google ScholarDigital Library
G. Kossinets and D. Watts. Empirical analysis of an evolving social network. Science Magazine, 311(5757), 2006.Google Scholar
M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM, pages 313--320, 2001. Google ScholarDigital Library
A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, volume 14, 2001.Google Scholar
F. Pan, G. Cong, A. Tung, J. Yang, and M. Zaki. Carpenter: finding closed patterns in long biological datasets. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 637--642, 2003. Google ScholarDigital Library
F. Pan, A. Tung, G. Cong, and X. Xu. COBBLER: combining column and row enumeration for closed pattern discovery. Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on, pages 21--30, 2004. Google ScholarDigital Library
C. T. J. R. S. Filho, A. Traina and C. Faloutsos. Similarity search without tears: The omni family of all-purpose access methods. In Proc. of the 17th IEEE Intl. Conf. on Data Engineering, 2001. Google ScholarDigital Library
S. Seidman. Network structure and minimum degree. Social Networks, 5:269--287, 1983.Google ScholarCross Ref
A. Srinivasan. The value of strong inapproximability results for clique. In STOC ?00: Proceedings of the thirty-second annual ACM symposium on Theory of computing, pages 144--152, New York, NY, USA, 2000. ACM. Google ScholarDigital Library
S. Tri and U. Leser. Gripp - indexing and querying graphs based on pre- and postorder numbering. Technical report, 2006.Google Scholar
P. Turan. On an extremal problem in graph theory. Mat. Fiz. Lapok, 48:436--452, 1941.Google Scholar
J. Wang, Z. Zeng, and L. Zhou. Clan: An algorithm for mining closed cliques from large dense graph databases. In Proceedings of the International Conference on Data Engineering, page 73, 2006. Google ScholarDigital Library
J. Wang, Z. Zeng, and L. Zhou. Clan: An algorithm for mining closed cliques from large dense graph databases. In Proceedings of the International Conference on Data Engineering, page 73, 2006. Google ScholarDigital Library
X. Yan, X. J. Zhou, and J. Han. Mining closed relational graphs with connectivity constraints. In KDD, pages 324--333, 2005. Google ScholarDigital Library
J. W. J. Z., Mellor and C. DeLisi. Visant: an online visualization and analysis tool for biological interaction data. In BMC Bioinformatics, 5:17--24, 2004.Google ScholarCross Ref
Z. Zeng, J. Wang, L. Zhou, and G. Karypis. Coherent closed quasi-clique discovery from large dense graph databases. In Proceedings of the International Conf. Knowledge Discovery and Data Mining, pages 797--802, 2006. Google ScholarDigital Library

Index Terms

CSV: visualizing and mining cohesive subgraphs
1. Information systems
  1. Information retrieval
    1. Document representation
  2. Information systems applications
    1. Data mining

Recommendations

Mining frequent cross-graph quasi-cliques

Joint mining of multiple datasets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in bioinformatics, jointly mining multiple gene expression datasets obtained by different ...
Read More
On the Maximum Number of Cliques in a Graph

A clique is a set of pairwise adjacent vertices in a graph. We determine the maximum number of cliques in a graph for the following graph classes: (1) graphs with n vertices and m edges; (2) graphs with n vertices, m edges, and maximum degree Δ; (3) d-...
Read More
On minimal vertex separators of dually chordal graphs: Properties and characterizations

Many works related to dually chordal graphs, their cliques and neighborhoods were published by Brandstadt et al. (1998) [1] and Gutierrez (1996) [6]. We will undertake a similar study by considering minimal vertex separators and their properties ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
June 2008
1396 pages
ISBN:9781605581026
DOI:10.1145/1376616
General Chairs:
Laks V. S. Lakshmanan
University of British Columbia, Canada
,
Raymond T. Ng
University of British Columbia, Canada
,
Dennis Shasha
New York University, USA
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 June 2008
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
clique
data mining
graph density
graph mining
visualization
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 66
  Total Citations
  View Citations
- 978
  Total Downloads
- Downloads (Last 12 months)7
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

CSV: visualizing and mining cohesive subgraphs

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Mining frequent cross-graph quasi-cliques

On the Maximum Number of Cliques in a Graph

On minimal vertex separators of dually chordal graphs: Properties and characterizations