skip to main content
research-article

On triangulation-based dense neighborhood graph discovery

Published:01 November 2010Publication History
Skip Abstract Section

Abstract

This paper introduces a new definition of dense subgraph pattern, the DN -graph. DN -graph considers both the size of the substructure and the minimum level of interactions between any pair of the vertices.

The mining of DN -graphs inherits the difficulty of finding clique, the fully-connected subgraphs. We thus opt for approximately locating the DN -graphs using the state-of-the-art graph triangulation methods. Our solution consists of a family of algorithms, each of which targets a different problem setting. These algorithms are iterative, and utilize repeated scans through the triangles in the graph to approximately locate the DN -graphs. Each scan on the graph triangles improves the results. Since the triangles are not physically materialized, the algorithms have small memory footprint.

With our solution, the users can adopt a "pay as you go" approach. They have the flexibility to terminate the mining process once they are satisfied with the quality of the results. As a result, our algorithms can cope with semi-streaming environment where the graph edges cannot fit into main memory. Results of extensive performance study confirmed our claims.

References

  1. Netflix Prize Data Set, 2009 (accessed July 23, 2009).Google ScholarGoogle Scholar
  2. J. Abello, M. Resende, and R. Sudarsky. Massive quasi-clique detection. In Proc. 5th Latin American Symposium on Theoretical Informatics, pages 598--612, 2002. Google ScholarGoogle Scholar
  3. I. Akihiro, W. Takashi, and M. Hiroshi. Complete mining of frequent patterns from graphs: Mining graph data. Machine Learning, 50(3):321--354, 2003. Google ScholarGoogle Scholar
  4. P. Aloy, B. BäPttcher, H. Ceulemans, C. Leutwein, C. Mellwig, S. Fischer, A. C. Gavin, P. Bork, S. G. Furga, L. Serrano, and R. D. Russell. Structure-based assembly of protein complexes in yeast. Science, 303(5666):2026--2029, 2004.Google ScholarGoogle Scholar
  5. L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In ACM KDD '08, pages 16--24, New York, NY, USA, 2008. Google ScholarGoogle Scholar
  6. V. Boginski, S. Butenko, and P. Pardalos. Mining market data: a network approach. Computer Operational Research, 33(11):3171--3184, 2006. Google ScholarGoogle Scholar
  7. M. Brockington and J. Culberson. Camouflaging independent sets in quasi-random graphs. DIMACS Series, 26:75--88, 1994.Google ScholarGoogle Scholar
  8. D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB'05, pages 721--732, 2005. Google ScholarGoogle Scholar
  9. J. Han, N. Stefanovic, and K. Koperski. Selective materialization: An efficient method for spatial data cube construction. In PAKDD'98 {Lecture Notes in Artificial Intelligence, 1394, Springer Verlag, 1998}, pages 144--158, 1998. Google ScholarGoogle Scholar
  10. H. Hu, X. Yan, Y. Huang, J. Han, and X. Zhou. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics, 21:213--221, 2005. Google ScholarGoogle Scholar
  11. C. F. J. Rivas. Proteincprotein interactions essentials: Key concepts to building and analyzing interactome networks. PLoS Computational Biology, 6:6, 2010.Google ScholarGoogle Scholar
  12. R. Karp. Reducibility among combinatorial problems. The Journal of Symbolic Logic, 40:618--619, 1975.Google ScholarGoogle Scholar
  13. M. Latapy. Practical algorithms for triangle computations in very large (sparse (power-law)) graphs. Journal of Theoretical Computer Science, 407:458--473, 2008. Google ScholarGoogle Scholar
  14. T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In WEA, pages 606--609, 2005. Google ScholarGoogle Scholar
  15. N. Wang, S. Parthasarathy, K. L. Tan, and A. Tung. Csv: visualizing and mining cohesive subgraphs. In SIGMOD'08, pages 445--458, 2008. Google ScholarGoogle Scholar
  16. I. Xenarios and L. S. etc. DIP, the database of interacting proteins: A research tool for studying cellular networks of protein interactions. Nucleic Acids Research, 30(1):303--305, 2002.Google ScholarGoogle Scholar

Index Terms

  1. On triangulation-based dense neighborhood graph discovery

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image Proceedings of the VLDB Endowment
          Proceedings of the VLDB Endowment  Volume 4, Issue 2
          November 2010
          105 pages

          Publisher

          VLDB Endowment

          Publication History

          • Published: 1 November 2010
          Published in pvldb Volume 4, Issue 2

          Qualifiers

          • research-article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader