ABSTRACT
Given a query graph q and a data graph g, the subgraph isomorphism search finds all occurrences of q in g and is considered one of the most fundamental query types for many real applications. While this problem belongs to NP-hard, many algorithms have been proposed to solve it in a reasonable time for real datasets. However, a recent study has shown, through an extensive benchmark with various real datasets, that all existing algorithms have serious problems in their matching order selection. Furthermore, all algorithms blindly permutate all possible mappings for query vertices, often leading to useless computations. In this paper, we present an efficient and robust subgraph search solution, called TurboISO, which is turbo-charged with two novel concepts, candidate region exploration and the combine and permute strategy (in short, Comb/Perm). The candidate region exploration identifies on-the-fly candidate subgraphs (i.e, candidate regions), which contain embeddings, and computes a robust matching order for each candidate region explored. The Comb/Perm strategy exploits the novel concept of the neighborhood equivalence class (NEC). Each query vertex in the same NEC has identically matching data vertices. During subgraph isomorphism search, Comb/Perm generates only combinations for each NEC instead of permutating all possible enumerations. Thus, if a chosen combination is determined to not contribute to a complete solution, all possible permutations for that combination will be safely pruned. Extensive experiments with many real datasets show that TurboISO consistently and significantly outperforms all competitors by up to several orders of magnitude.
- R. A. Baeza-Yates and G. Valiente. An image similarity measure based on graph matching. In SPIRE, pages 28--38, 2000. Google ScholarDigital Library
- J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD, pages 857--872, 2007. Google ScholarDigital Library
- L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub)graph isomorphism algorithm for matching large graphs. IEEE PAMI, 26(10):1367--1372, 2004. Google ScholarDigital Library
- M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis. Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans. Knowl. Data Eng., 17(8):1036--1050, 2005. Google ScholarDigital Library
- W.-S. Han, J. Lee, M.-D. Pham, and J. X. Yu. iGraph: A framework for comparisons of disk-based graph indexing techniques. PVLDB, 3(1):449--459, 2010. Google ScholarDigital Library
- H. He and A. K. Singh. Graphs-at-a-time: query language and access methods for graph databases. In SIGMOD, pages 405--418, 2008. Google ScholarDigital Library
- A. Khan, N. Li, X. Yan, Z. Guan, S. Chakraborty, and S. Tao. Neighborhood based fast graph search in large networks. In SIGMOD, pages 901--912, 2011. Google ScholarDigital Library
- J. Lee, W.-S. Han, R. Kasperovics, and J.-H. Lee. An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB, 6(2), 2013, http://www-db.knu.ac.kr/vldb13.pdf. Google ScholarDigital Library
- M. Mongiovi, R. D. Natale, R. Giugno, A. Pulvirenti, A. Ferro, and R. Sharan. Sigma: a set-cover-based inexact graph matching algorithm. J. Bioinformatics and Computational Biology, 8(2):199--218, 2010.Google ScholarCross Ref
- M. Ohlrich, C. Ebeling, E. Ginting, and L. Sather. Subgemini: Identifying subcircuits using a fast subgraph isomorphism algorithm. In DAC, pages 31--37, 1993. Google ScholarDigital Library
- M. E. Saeedy and P. Kalnis. Grami: Generalized frequent pattern mining in a single large graph.Google Scholar
- H. Shang, Y. Zhang, X. Lin, and J. X. Yu. Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB, 1(1):364--375, 2008. Google ScholarDigital Library
- B. Shao, H. Wang, and Y. Li. The trinity graph engine. Technical Report 161291, Microsoft Research, 2012.Google Scholar
- Z. Sun, H. Wang, H. Wang, B. Shao, and J. Li. Efficient subgraph matching on billion node graphs. PVLDB, 5(9):788--799, 2012. Google ScholarDigital Library
- Y. Tian and J. M. Patel. TALE: A tool for approximate large graph matching. In ICDE, pages 963--972, 2008. Google ScholarDigital Library
- J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23:31--42, January 1976. Google ScholarDigital Library
- X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004. Google ScholarDigital Library
- X. Yan, P. S. Yu, and J. Han. Graph indexing based on discriminative frequent structure analysis. ACM Trans. Database Syst., 30(4):960--993, 2005. Google ScholarDigital Library
- S. Zhang, M. Hu, and J. Yang. Treepi: A novel graph indexing method. In ICDE, pages 966--975, 2007.Google ScholarCross Ref
- S. Zhang, S. Li, and J. Yang. GADDI: distance index based subgraph matching in biological networks. In EDBT, pages 192--203, 2009. Google ScholarDigital Library
- P. Zhao and J. Han. On graph query optimization in large networks. PVLDB, 3(1):340--351, 2010. Google ScholarDigital Library
- P. Zhao, J. X. Yu, and P. S. Yu. Graph indexing: Tree + delta >= graph. In VLDB, pages 938--949, 2007. Google ScholarDigital Library
- L. Zou, L. Chen, J. X. Yu, and Y. Lu. A novel spectral coding in a large graph database. In EDBT, pages 181--192, 2008. Google ScholarDigital Library
Index Terms
- Turboiso: towards ultrafast and robust subgraph isomorphism search in large graph databases
Recommendations
Finding a chain graph in a bipartite permutation graph
We present a polynomial-time algorithm for solving Subgraph Isomorphism where the base graphs are bipartite permutation graphs and the pattern graphs are chain graphs. Subgraph Isomorphism is studied on graph classes.A polynomial-time algorithm is given ...
Polynomial-time algorithms for Subgraph Isomorphism in small graph classes of perfect graphs
Given two graphs, Subgraph Isomorphism is the problem of deciding whether the first graph (the base graph) contains a subgraph isomorphic to the second one (the pattern graph). This problem is NP-complete even for very restricted graph classes such as ...
Are unique subgraphs not easier to find?
A counterexample to claimed bounds for subgraph isomorphism in terms of those when a graph contains 1 pattern copies.A new time bound for subgraph isomorphism in terms of that when a graph contains 1 pattern copies.A new time bound for induced subgraph ...
Comments