Abstract
Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions in bioinformatics, are modeled as a single large graph. In this paper we present GraMi, a novel framework for frequent subgraph mining in a single large graph. GraMi undertakes a novel approach that only finds the minimal set of instances to satisfy the frequency threshold and avoids the costly enumeration of all instances required by previous approaches. We accompany our approach with a heuristic and optimizations that significantly improve performance. Additionally, we present an extension of GraMi that mines frequent patterns. Compared to subgraphs, patterns offer a more powerful version of matching that captures transitive interactions between graph nodes (like friend of a friend) which are very common in modern applications. Finally, we present CGraMi, a version supporting structural and semantic constraints, and AGraMi, an approximate version producing results with no false positives. Our experiments on real data demonstrate that our framework is up to 2 orders of magnitude faster and discovers more interesting patterns than existing approaches.
- B. Bringmann. Mining Patterns in Structured Data. PhD thesis, KU Leuven, 2009.Google Scholar
- B. Bringmann and S. Nijssen. What is frequent in a single graph? In Proc. of PAKDD, pages 858--863, 2008. Google ScholarDigital Library
- C. Chen, X. Yan, F. Zhu, and J. Han. gApprox: Mining frequent approximate patterns from a massive network. In Proc. of ICDM, pages 445--450, 2007. Google ScholarDigital Library
- J. Cheng, J. X. Yu, B. Ding, P. S. Yu, and H. Wang. Fast graph pattern matching. In Proc. of ICDE, pages 913--922, 2008. Google ScholarDigital Library
- Y.-R. Cho and A. Zhang. Predicting protein function by frequent functional association pattern mining in protein interaction networks. Trans. Info. Tech. Biomed., 14(1):30--36, Jan. 2010. Google ScholarDigital Library
- W.-T. Chu and M.-H. Tsai. Visual pattern discovery for architecture image classification and product image search. In Proc. of ICMR, pages 27:1--27:8, 2012. Google ScholarDigital Library
- D. J. Cook and L. B. Holder. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research, 1(1):231--255, 1994. Google ScholarDigital Library
- S. de Givry, T. Schiex, and G. Verfaillie. Exploiting tree decomposition and soft local consistency in weighted CSP. In Proc. of AAAI, pages 22--27, 2006. Google ScholarDigital Library
- M. Deshpande, M. Kuramochi, and G. Karypis. Frequent sub-structure-based approaches for classifying chemical compounds. In Proc. of ICDM, pages 35--42, 2003. Google ScholarDigital Library
- A. Deutsch, M. Fernandez, and D. Suciu. Storing semistructured data with stored. In Proc. of SIGMOD, pages 431--442, 1999. Google ScholarDigital Library
- C. Domshlak, R. I. Brafman, and S. E. Shimony. Preference-based configuration of web page content. In Proc. of IJCAI, pages 1451--1456, 2001. Google ScholarDigital Library
- M. Fiedler and C. Borgelt. Subgraph support in a single large graph. In Proc. of ICDMW, pages 399--404, 2007. Google ScholarDigital Library
- M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., 1979. Google ScholarDigital Library
- S. Ghazizadeh and S. S. Chawathe. Seus: Structure extraction using summaries. In Proc. of DS, pages 71--85, 2002. Google ScholarDigital Library
- V. Guralnik and G. Karypis. A scalable algorithm for clustering sequential data. In Proc. of ICDM, pages 179--186, 2001. Google ScholarDigital Library
- H. He and A. K. Singh. Graphs-at-a-time: query language and access methods for graph databases. In Proc. of SIGMOD, pages 405--418, 2008. Google ScholarDigital Library
- A. Khan, X. Yan, and K.-L. Wu. Towards proximity pattern mining in large graphs. In Proc. of SIGMOD, pages 867--878, 2010. Google ScholarDigital Library
- M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of ICDM, pages 313--320, 2001. Google ScholarDigital Library
- M. Kuramochi and G. Karypis. Grew - A scalable frequent subgraph discovery algorithm. In Proc. of ICDM, pages 439--442, 2004. Google ScholarDigital Library
- M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph. Data Mining and Knowledge Discovery, 11(3):243--271, 2005. Google ScholarDigital Library
- J. Lee, W.-S. Han, R. Kasperovics, and J.-H. Lee. An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB, 6(2):133--144, Dec. 2012. Google ScholarDigital Library
- A. Mackworth. Consistency in networks of relations. Artificial Intelligence, 8(1):99--118, 1977.Google ScholarDigital Library
- J. J. McGregor. Relational consistency algorithms and their application in finding subgraph and graph isomorphisms. Information Sciences, 19: 228--250, 1979.Google ScholarCross Ref
- S. Ranu and A. K. Singh. GraphSig: A scalable approach to mining significant subgraphs in large graph databases. In Proc. of ICDE, pages 844--855, 2009. Google ScholarDigital Library
- Z. Sun, H. Wang, H. Wang, B. Shao, and J. Li. Efficient subgraph matching on billion node graphs. PVLDB, 5(9):788--799, May 2012. Google ScholarDigital Library
- L. T. Thomas, S. R. Valluri, and K. Karlapalem. Margin: Maximal frequent subgraph mining. TKDD, 4(3): 10:1--10:42, 2010. Google ScholarDigital Library
- J. R. Ullmann. An algorithm for subgraph isomorphism. Journal of ACM, 23: 31--42, 1976. Google ScholarDigital Library
- X. Yan, H. Cheng, J. Han, and P. S. Yu. Mining significant graph patterns by leap search. In Proc. of SIGMOD, pages 433--444, 2008. Google ScholarDigital Library
- X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In Proc. of ICDM, pages 721--724, 2002. Google ScholarDigital Library
- X. Yan and J. Han. CloseGraph: mining closed frequent graph patterns. In Proc. of SIGKDD, pages 286--295, 2003. Google ScholarDigital Library
- X. Yan, P. S. Yu, and J. Han. Graph indexing: a frequent structure-based approach. In Proc. of SIGMOD, pages 335--346, 2004. Google ScholarDigital Library
- R. Zafarani and H. Liu. Social computing data repository at ASU, 2009.Google Scholar
- F. Zhu, X. Yan, J. Han, and P. S. Yu. gPrune: A constraint pushing framework for graph pattern mining. In Proc. of PAKDD, pages 388--400, 2007. Google ScholarDigital Library
- L. Zou, L. Chen, and M. T. Özsu. Distance-join: pattern match query in a large graph database. PVLDB, 2(1):886--897, 2009. Google ScholarDigital Library
Index Terms
- GraMi: frequent subgraph and pattern mining in a single large graph
Recommendations
On the Multichromatic Number of s-Stable Kneser Graphs
For positive integers n and s, a subset Sï [n] is s-stable if sï |i-j|ï n-s for distinct i,j∈S . The s-stable r-uniform Kneser hypergraph KGrn,ks-stable is the r-uniform hypergraph that has the collection of all s-stable k-element subsets of [n] as ...
Adjacent vertex-distinguishing edge and total chromatic numbers of hypercubes
An adjacent vertex-distinguishing edge coloring of a simple graph G is a proper edge coloring of G such that incident edge sets of any two adjacent vertices are assigned different sets of colors. A total coloring of a graph G is a coloring of both the ...
Forbidden Subgraphs and Weak Locally Connected Graphs
A graph is called H-free if it has no induced subgraph isomorphic to H. A graph is called $$N^i$$Ni-locally connected if $$G[\{ x\in V(G): 1\le d_G(w, x)\le i\}]$$G[{x?V(G):1≤dG(w,x)≤i}] is connected and $$N_2$$N2-locally connected if $$G[\{uv: \{uw, vw\...
Comments