skip to main content
10.1145/1376616.1376663acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

CSV: visualizing and mining cohesive subgraphs

Published:09 June 2008Publication History

ABSTRACT

Extracting dense sub-components from graphs efficiently is an important objective in a wide range of application domains ranging from social network analysis to biological network analysis, from the World Wide Web to stock market analysis. Motivated by this need recently we have seen several new algorithms to tackle this problem based on the (frequent) pattern mining paradigm. A limitation of most of these methods is that they are highly sensitive to parameter settings, rely on exhaustive enumeration with exponential time complexity, and often fail to help the users understand the underlying distribution of components embedded within the host graph.

In this article we propose an approximate algorithm, to mine and visualize cohesive subgraphs (dense sub components) within a large graph. The approach, refereed to as Cohesive Subgraph Visualization (CSV) relies on a novel mapping strategy that maps edges and nodes to a multi-dimensional space wherein dense areas in the mapped space correspond to cohesive subgraphs. The algorithm then walks through the dense regions in the mapped space to output a visual plot that effectively captures the overall dense sub-component distribution of the graph. Unlike extant algorithms with exponential complexity, CSV has a complexity of O(V2logV) when fixing the parameter mapping dimension, where V corresponds to the number of vertices in the graph, although for many real datasets the performance is typically sub-quadratic.

We demonstrate the utility of CSV as a stand-alone tool for visual graph exploration and as a pre-filtering step to significantly scale up exact subgraph mining algorithms such as CLAN.

References

  1. J. Abello, M. G. C. Resende, and S. Sudarsky. Massive quasi-clique detection. In LATIN, pages 598--612, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors, Proc. 20th Int. Conf. Very Large Data Bases, VLDB, pages 487--499. Morgan Kaufmann, 12?15 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: Ordering points to identify the clustering structure. In Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD?99), pages 49--60, Philadelphia, PA, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A.P.Gasch and M.B.Eisen. Exploring the conditional coregulation of yeast gene expression through fuzzy k-mean clustering. Genome Biol., 3(RESEARCH0059), 2002.Google ScholarGoogle Scholar
  5. S. Asur, D. Ucar, and S. Parthasarathy. An ensemble framework for clustering protein´lcprotein interaction networks. In ISMB ?07: Proceedings of the 15th Annual International Conference on Intelligent Systems, 2007.Google ScholarGoogle Scholar
  6. G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In Proc. 2002 Int. Conf. Knowledge Discovery and Data Mining, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. V. Boginski, S. Butenko, and P. M. Pardalos. Mining market data: a network approach. Comput. Oper. Res., 33(11):3171--3184, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. B. Bollobas. Extremal Graph Theory. Dover Publications, Incorporated, 1978. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. B. J. Breitkreutz, C. Stark, and M. Tyers. Osprey: a network visualization system. Genome Biol, 4, 2003.Google ScholarGoogle Scholar
  10. B. Bustos, G. Navarro, and E. Chávez. Pivot selection techniques for proximity searching in metric spaces. Pattern Recogn. Lett., 24(14):2357--2366, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Cheriyan and R. Thurimella. Fast Algorithms for k-Shredders and k -Node Connectivity Augmentation (Extended Abstract). 1996.Google ScholarGoogle Scholar
  12. G. Cong, K. Tan, A. Tung, and F. Pan. Mining Frequent Closed Patterns in Microarray Data. Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM?04)-Volume 00, pages 363--366, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. G. Cong, K. Tan, A. Tung, and X. Xu. Mining top-K covering rule groups for gene expression data. Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 670--681, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Cong, A. Tung, X. Xu, F. Pan, and J. Yang. FARMER: finding interesting rule groups in microarray datasets. Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 143--154, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In Proc. 1996 Int. Conf. Knowledge Discovery and Data Mining (KDD?96), pages 226--231, Portland, Oregon, Aug. 1996.Google ScholarGoogle Scholar
  16. C. Faloutsos and K.-I. Lin. FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Proc. 1995 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD?95), pages 163--174, San Jose, CA, May 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. H.Hu, X.Yan, Y.Huang, J.Han, and X.J.Zhou. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics, 21(1):213--221, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. V. Hristidis, L. Gravano, and Y. Papakonstantinou. Efficient IR-style keyword search over relational databases. In Proceedings of 29th International Conference on Very Large Data Bases (VLDB), 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. V. Hristidis, L. Gravano, and Y. Papakonstantinou. Efficient IR-style keyword search over relational databases. In Proceedings of 29th International Conference on Very Large Data Bases (VLDB), 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. H. Hu, X. Yan, Y. Huang, J. Han, and X. Zhou. Mining coherent dense subgraphs across massive biological networks for functional discovery. 2005.Google ScholarGoogle Scholar
  21. A. Inokuchi, T. Washio, and H. Motoda. Complete mining of frequent patterns from graphs: Mining graph data. Mach. Learn., 50(3):321--354, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. G. Karypis and V. Kumar. Parallel multilevel k-way partitioning scheme for irregular graphs. In Supercomputing'96: Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), page 35, Washington, DC, USA, 1996. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. G. Kossinets and D. Watts. Empirical analysis of an evolving social network. Science Magazine, 311(5757), 2006.Google ScholarGoogle Scholar
  24. M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM, pages 313--320, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, volume 14, 2001.Google ScholarGoogle Scholar
  26. F. Pan, G. Cong, A. Tung, J. Yang, and M. Zaki. Carpenter: finding closed patterns in long biological datasets. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 637--642, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. F. Pan, A. Tung, G. Cong, and X. Xu. COBBLER: combining column and row enumeration for closed pattern discovery. Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on, pages 21--30, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. C. T. J. R. S. Filho, A. Traina and C. Faloutsos. Similarity search without tears: The omni family of all-purpose access methods. In Proc. of the 17th IEEE Intl. Conf. on Data Engineering, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Seidman. Network structure and minimum degree. Social Networks, 5:269--287, 1983.Google ScholarGoogle ScholarCross RefCross Ref
  30. A. Srinivasan. The value of strong inapproximability results for clique. In STOC ?00: Proceedings of the thirty-second annual ACM symposium on Theory of computing, pages 144--152, New York, NY, USA, 2000. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. S. Tri and U. Leser. Gripp - indexing and querying graphs based on pre- and postorder numbering. Technical report, 2006.Google ScholarGoogle Scholar
  32. P. Turan. On an extremal problem in graph theory. Mat. Fiz. Lapok, 48:436--452, 1941.Google ScholarGoogle Scholar
  33. J. Wang, Z. Zeng, and L. Zhou. Clan: An algorithm for mining closed cliques from large dense graph databases. In Proceedings of the International Conference on Data Engineering, page 73, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Wang, Z. Zeng, and L. Zhou. Clan: An algorithm for mining closed cliques from large dense graph databases. In Proceedings of the International Conference on Data Engineering, page 73, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. X. Yan, X. J. Zhou, and J. Han. Mining closed relational graphs with connectivity constraints. In KDD, pages 324--333, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. J. W. J. Z., Mellor and C. DeLisi. Visant: an online visualization and analysis tool for biological interaction data. In BMC Bioinformatics, 5:17--24, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  37. Z. Zeng, J. Wang, L. Zhou, and G. Karypis. Coherent closed quasi-clique discovery from large dense graph databases. In Proceedings of the International Conf. Knowledge Discovery and Data Mining, pages 797--802, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. CSV: visualizing and mining cohesive subgraphs

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
        June 2008
        1396 pages
        ISBN:9781605581026
        DOI:10.1145/1376616

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 June 2008

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader