Abstract
Graphs are of growing importance in modeling complex structures such as chemical compounds, proteins, images, and program dependence. Given a query graph Q, the subgraph isomorphism problem is to find a set of graphs containing Q from a graph database, which is NP-complete. Recently, there have been a lot of research efforts to solve the subgraph isomorphism problem for a large graph database by utilizing graph indexes. By using a graph index as a filter, we prune graphs that are not real answers at an inexpensive cost. Then, we need to use expensive subgraph isomorphism tests to verify filtered candidates only. This way, the number of disk I/Os and subgraph isomorphism tests can be significantly minimized. The current practice for experiments in graph indexing techniques is that the author of a newly proposed technique does not implement existing indexes on his own code base, but instead uses the original authors' binary executables and reports only the wall clock time. However, we observe this practice may result in several problems. In order to address these problems, we have made significant efforts in implementing all representative indexing methods on a common framework called iGraph. Unlike existing implementations which either use (full or partial) in-memory representations or rely on OS file system cache without guaranteeing real disk I/Os, we have implemented these indexes on top of a storage engine that guarantees real disk I/Os. Through extensive experiments using many synthetic and real datasets, we also provide new empirical findings in the performance of the full disk-based implementations of these methods.
- http://www.seagate.com.Google Scholar
- http://msdn.microsoft.com/en-us/library/cc644950(VS.85).aspx.Google Scholar
- Aids antiviral dataset used for gindex. http://www.xifengyan.net/software.htm.Google Scholar
- Gaston. http://www.liacs.nl/snijssen/gaston/iccs.html.Google Scholar
- gboost. http://www.kyb.mpg.de/bs/people/nowozin/gboost/.Google Scholar
- Graphgen --- a synthetic graph data generator. http://www.cse.ust.hk/graphgen/.Google Scholar
- National cancer institute. http://dtp.nci.nih.gov/.Google Scholar
- The pubchem project. pubchem.ncbi.nlm.nih.gov.Google Scholar
- Vf2 library. http://amalfi.dis.unina.it/graph/db/vflib-2.0/.Google Scholar
- A. Ailamaki, D. DeWitt, M. Hill, and M. Skounakis. Weaving relations for cache performance. The VLDB Journal, pages 169--180, 2001. Google ScholarDigital Library
- H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. The protein data bank. Nucleic Acids Res, 28:235--242, 2000.Google ScholarCross Ref
- J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD, pages 857--872, 2007. Google ScholarDigital Library
- L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell., 26(10):1367--1372, 2004. Google ScholarDigital Library
- H. He and A. K. Singh. Closure-tree: An index structure for graph queries. In ICDE, pages 38--49, 2006. Google ScholarDigital Library
- J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In ICDM, page 549, 2003. Google ScholarDigital Library
- C. Liu, X. Yan, L. Fei, J. Han, and S. P. Midkiff. Sober: statistical model-based bug localization. In ESEC/FSE, pages 286--295, 2005. Google ScholarDigital Library
- E. G. M. Petrakis and C. Faloutsos. Similarity searching in medical image databases. TKDE, 9(3), 1997. Google ScholarDigital Library
- R. Ramakrishnan and J. Gehrke. Database Management Systems. McGraw-Hill, Inc., New York, NY, USA, 2003. Google ScholarDigital Library
- V. Roussev, G. G. R. III, and L. Marziale. Multi-resolution similarity hashing. Digital Investigation, 4(Supplement 1):105--113, 2007. Google ScholarDigital Library
- H. Shang, Y. Zhang, X. Lin, and J. X. Yu. Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB, 1(1):364--375, 2008. Google ScholarDigital Library
- D. Shasha, J. T.-L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS, pages 39--52, 2002. Google ScholarDigital Library
- Y. Tian and J. M. Patel. Tale: A tool for approximate large graph matching. In ICDE, pages 963--972, 2008. Google ScholarDigital Library
- J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1):31--42, 1976. Google ScholarDigital Library
- M. Wörlein, T. Meinl, I. Fischer, and M. Philippsen. A quantitative comparison of the subgraph miners mofa, gspan, ffsm, and gaston. In PKDD, pages 392--403, 2005. Google ScholarCross Ref
- X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In ICDM, pages 721--724, 2002. Google ScholarDigital Library
- X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004. Google ScholarDigital Library
- X. Yan, P. S. Yu, and J. Han. Graph indexing based on discriminative frequent structure analysis. ACM Trans. Database Syst., 30(4):960--993, 2005. Google ScholarDigital Library
- X. Yan, P. S. Yu, and J. Han. Substructure similarity search in graph databases. In SIGMOD, pages 766--777, 2005. Google ScholarDigital Library
- S. Zhang, M. Hu, and J. Yang. Treepi: A novel graph indexing method. In ICDE, pages 966--975, 2007.Google ScholarCross Ref
- P. Zhao, J. X. Yu, and P. S. Yu. Graph indexing: Tree + delta >= graph. In VLDB, pages 938--949, 2007. Google ScholarDigital Library
- L. Zou, L. Chen, J. X. Yu, and Y. Lu. A novel spectral coding in a large graph database. In EDBT, pages 181--192, 2008. Google ScholarDigital Library
Index Terms
- iGraph: a framework for comparisons of disk-based graph indexing techniques
Recommendations
iGraph in action: performance analysis of disk-based graph indexing techniques
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of dataGraphs provide a powerful way to model complex structures such as chemical compounds, proteins, images, and program dependence. The previous practice for experiments in graph indexing techniques is that the author of a newly proposed technique does not ...
iGraph: an incremental data processing system for dynamic graph
With the popularity of social network, the demand for real-time processing of graph data is increasing. However, most of the existing graph systems adopt a batch processing mode, therefore the overhead of maintaining and processing of dynamic graph is ...
On the Multichromatic Number of s-Stable Kneser Graphs
For positive integers n and s, a subset Sï [n] is s-stable if sï |i-j|ï n-s for distinct i,j∈S . The s-stable r-uniform Kneser hypergraph KGrn,ks-stable is the r-uniform hypergraph that has the collection of all s-stable k-element subsets of [n] as ...
Comments