skip to main content
research-article

GraMi: frequent subgraph and pattern mining in a single large graph

Published:01 March 2014Publication History
Skip Abstract Section

Abstract

Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions in bioinformatics, are modeled as a single large graph. In this paper we present GraMi, a novel framework for frequent subgraph mining in a single large graph. GraMi undertakes a novel approach that only finds the minimal set of instances to satisfy the frequency threshold and avoids the costly enumeration of all instances required by previous approaches. We accompany our approach with a heuristic and optimizations that significantly improve performance. Additionally, we present an extension of GraMi that mines frequent patterns. Compared to subgraphs, patterns offer a more powerful version of matching that captures transitive interactions between graph nodes (like friend of a friend) which are very common in modern applications. Finally, we present CGraMi, a version supporting structural and semantic constraints, and AGraMi, an approximate version producing results with no false positives. Our experiments on real data demonstrate that our framework is up to 2 orders of magnitude faster and discovers more interesting patterns than existing approaches.

References

  1. B. Bringmann. Mining Patterns in Structured Data. PhD thesis, KU Leuven, 2009.Google ScholarGoogle Scholar
  2. B. Bringmann and S. Nijssen. What is frequent in a single graph? In Proc. of PAKDD, pages 858--863, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. C. Chen, X. Yan, F. Zhu, and J. Han. gApprox: Mining frequent approximate patterns from a massive network. In Proc. of ICDM, pages 445--450, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. Cheng, J. X. Yu, B. Ding, P. S. Yu, and H. Wang. Fast graph pattern matching. In Proc. of ICDE, pages 913--922, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Y.-R. Cho and A. Zhang. Predicting protein function by frequent functional association pattern mining in protein interaction networks. Trans. Info. Tech. Biomed., 14(1):30--36, Jan. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. W.-T. Chu and M.-H. Tsai. Visual pattern discovery for architecture image classification and product image search. In Proc. of ICMR, pages 27:1--27:8, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. D. J. Cook and L. B. Holder. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research, 1(1):231--255, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. de Givry, T. Schiex, and G. Verfaillie. Exploiting tree decomposition and soft local consistency in weighted CSP. In Proc. of AAAI, pages 22--27, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Deshpande, M. Kuramochi, and G. Karypis. Frequent sub-structure-based approaches for classifying chemical compounds. In Proc. of ICDM, pages 35--42, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Deutsch, M. Fernandez, and D. Suciu. Storing semistructured data with stored. In Proc. of SIGMOD, pages 431--442, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. Domshlak, R. I. Brafman, and S. E. Shimony. Preference-based configuration of web page content. In Proc. of IJCAI, pages 1451--1456, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Fiedler and C. Borgelt. Subgraph support in a single large graph. In Proc. of ICDMW, pages 399--404, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. Ghazizadeh and S. S. Chawathe. Seus: Structure extraction using summaries. In Proc. of DS, pages 71--85, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. V. Guralnik and G. Karypis. A scalable algorithm for clustering sequential data. In Proc. of ICDM, pages 179--186, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. H. He and A. K. Singh. Graphs-at-a-time: query language and access methods for graph databases. In Proc. of SIGMOD, pages 405--418, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Khan, X. Yan, and K.-L. Wu. Towards proximity pattern mining in large graphs. In Proc. of SIGMOD, pages 867--878, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of ICDM, pages 313--320, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Kuramochi and G. Karypis. Grew - A scalable frequent subgraph discovery algorithm. In Proc. of ICDM, pages 439--442, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph. Data Mining and Knowledge Discovery, 11(3):243--271, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Lee, W.-S. Han, R. Kasperovics, and J.-H. Lee. An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB, 6(2):133--144, Dec. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. Mackworth. Consistency in networks of relations. Artificial Intelligence, 8(1):99--118, 1977.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. J. J. McGregor. Relational consistency algorithms and their application in finding subgraph and graph isomorphisms. Information Sciences, 19: 228--250, 1979.Google ScholarGoogle ScholarCross RefCross Ref
  24. S. Ranu and A. K. Singh. GraphSig: A scalable approach to mining significant subgraphs in large graph databases. In Proc. of ICDE, pages 844--855, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Z. Sun, H. Wang, H. Wang, B. Shao, and J. Li. Efficient subgraph matching on billion node graphs. PVLDB, 5(9):788--799, May 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. L. T. Thomas, S. R. Valluri, and K. Karlapalem. Margin: Maximal frequent subgraph mining. TKDD, 4(3): 10:1--10:42, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. J. R. Ullmann. An algorithm for subgraph isomorphism. Journal of ACM, 23: 31--42, 1976. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. X. Yan, H. Cheng, J. Han, and P. S. Yu. Mining significant graph patterns by leap search. In Proc. of SIGMOD, pages 433--444, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In Proc. of ICDM, pages 721--724, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. X. Yan and J. Han. CloseGraph: mining closed frequent graph patterns. In Proc. of SIGKDD, pages 286--295, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. X. Yan, P. S. Yu, and J. Han. Graph indexing: a frequent structure-based approach. In Proc. of SIGMOD, pages 335--346, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. R. Zafarani and H. Liu. Social computing data repository at ASU, 2009.Google ScholarGoogle Scholar
  33. F. Zhu, X. Yan, J. Han, and P. S. Yu. gPrune: A constraint pushing framework for graph pattern mining. In Proc. of PAKDD, pages 388--400, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. L. Zou, L. Chen, and M. T. Özsu. Distance-join: pattern match query in a large graph database. PVLDB, 2(1):886--897, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. GraMi: frequent subgraph and pattern mining in a single large graph
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image Proceedings of the VLDB Endowment
      Proceedings of the VLDB Endowment  Volume 7, Issue 7
      March 2014
      108 pages
      ISSN:2150-8097
      Issue’s Table of Contents

      Publisher

      VLDB Endowment

      Publication History

      • Published: 1 March 2014
      Published in pvldb Volume 7, Issue 7

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader