research-article

iGraph: a framework for comparisons of disk-based graph indexing techniques

Authors:
Wook-Shin Han

Kyungpook National University, Korea

Kyungpook National University, Korea
View Profile

,
Jinsoo Lee

Kyungpook National University, Korea

Kyungpook National University, Korea
View Profile

,
Minh-Duc Pham

Kyungpook National University, Korea

Kyungpook National University, Korea
View Profile

,
Jeffrey Xu Yu

Chinese University of Hong Kong, Hong Kong

Chinese University of Hong Kong, Hong Kong
View Profile

Proceedings of the VLDB Endowment Volume 3 Issue 1-2pp 449–459https://doi.org/10.14778/1920841.1920901

Published:01 September 2010Publication History

Proceedings of the VLDB Endowment

Abstract

Graphs are of growing importance in modeling complex structures such as chemical compounds, proteins, images, and program dependence. Given a query graph Q, the subgraph isomorphism problem is to find a set of graphs containing Q from a graph database, which is NP-complete. Recently, there have been a lot of research efforts to solve the subgraph isomorphism problem for a large graph database by utilizing graph indexes. By using a graph index as a filter, we prune graphs that are not real answers at an inexpensive cost. Then, we need to use expensive subgraph isomorphism tests to verify filtered candidates only. This way, the number of disk I/Os and subgraph isomorphism tests can be significantly minimized. The current practice for experiments in graph indexing techniques is that the author of a newly proposed technique does not implement existing indexes on his own code base, but instead uses the original authors' binary executables and reports only the wall clock time. However, we observe this practice may result in several problems. In order to address these problems, we have made significant efforts in implementing all representative indexing methods on a common framework called iGraph. Unlike existing implementations which either use (full or partial) in-memory representations or rely on OS file system cache without guaranteeing real disk I/Os, we have implemented these indexes on top of a storage engine that guarantees real disk I/Os. Through extensive experiments using many synthetic and real datasets, we also provide new empirical findings in the performance of the full disk-based implementations of these methods.

References

http://www.seagate.com.Google Scholar
http://msdn.microsoft.com/en-us/library/cc644950(VS.85).aspx.Google Scholar
Aids antiviral dataset used for gindex. http://www.xifengyan.net/software.htm.Google Scholar
Gaston. http://www.liacs.nl/snijssen/gaston/iccs.html.Google Scholar
gboost. http://www.kyb.mpg.de/bs/people/nowozin/gboost/.Google Scholar
Graphgen --- a synthetic graph data generator. http://www.cse.ust.hk/graphgen/.Google Scholar
National cancer institute. http://dtp.nci.nih.gov/.Google Scholar
The pubchem project. pubchem.ncbi.nlm.nih.gov.Google Scholar
Vf2 library. http://amalfi.dis.unina.it/graph/db/vflib-2.0/.Google Scholar
A. Ailamaki, D. DeWitt, M. Hill, and M. Skounakis. Weaving relations for cache performance. The VLDB Journal, pages 169--180, 2001. Google ScholarDigital Library
H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. The protein data bank. Nucleic Acids Res, 28:235--242, 2000.Google ScholarCross Ref
J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD, pages 857--872, 2007. Google ScholarDigital Library
L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell., 26(10):1367--1372, 2004. Google ScholarDigital Library
H. He and A. K. Singh. Closure-tree: An index structure for graph queries. In ICDE, pages 38--49, 2006. Google ScholarDigital Library
J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In ICDM, page 549, 2003. Google ScholarDigital Library
C. Liu, X. Yan, L. Fei, J. Han, and S. P. Midkiff. Sober: statistical model-based bug localization. In ESEC/FSE, pages 286--295, 2005. Google ScholarDigital Library
E. G. M. Petrakis and C. Faloutsos. Similarity searching in medical image databases. TKDE, 9(3), 1997. Google ScholarDigital Library
R. Ramakrishnan and J. Gehrke. Database Management Systems. McGraw-Hill, Inc., New York, NY, USA, 2003. Google ScholarDigital Library
V. Roussev, G. G. R. III, and L. Marziale. Multi-resolution similarity hashing. Digital Investigation, 4(Supplement 1):105--113, 2007. Google ScholarDigital Library
H. Shang, Y. Zhang, X. Lin, and J. X. Yu. Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB, 1(1):364--375, 2008. Google ScholarDigital Library
D. Shasha, J. T.-L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS, pages 39--52, 2002. Google ScholarDigital Library
Y. Tian and J. M. Patel. Tale: A tool for approximate large graph matching. In ICDE, pages 963--972, 2008. Google ScholarDigital Library
J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1):31--42, 1976. Google ScholarDigital Library
M. Wörlein, T. Meinl, I. Fischer, and M. Philippsen. A quantitative comparison of the subgraph miners mofa, gspan, ffsm, and gaston. In PKDD, pages 392--403, 2005. Google ScholarCross Ref
X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In ICDM, pages 721--724, 2002. Google ScholarDigital Library
X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004. Google ScholarDigital Library
X. Yan, P. S. Yu, and J. Han. Graph indexing based on discriminative frequent structure analysis. ACM Trans. Database Syst., 30(4):960--993, 2005. Google ScholarDigital Library
X. Yan, P. S. Yu, and J. Han. Substructure similarity search in graph databases. In SIGMOD, pages 766--777, 2005. Google ScholarDigital Library
S. Zhang, M. Hu, and J. Yang. Treepi: A novel graph indexing method. In ICDE, pages 966--975, 2007.Google ScholarCross Ref
P. Zhao, J. X. Yu, and P. S. Yu. Graph indexing: Tree + delta >= graph. In VLDB, pages 938--949, 2007. Google ScholarDigital Library
L. Zou, L. Chen, J. X. Yu, and Y. Lu. A novel spectral coding in a large graph database. In EDBT, pages 181--192, 2008. Google ScholarDigital Library

Index Terms

iGraph: a framework for comparisons of disk-based graph indexing techniques
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Index terms have been assigned to the content through auto-classification.

Recommendations

iGraph in action: performance analysis of disk-based graph indexing techniques
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Graphs provide a powerful way to model complex structures such as chemical compounds, proteins, images, and program dependence. The previous practice for experiments in graph indexing techniques is that the author of a newly proposed technique does not ...
Read More
iGraph: an incremental data processing system for dynamic graph

With the popularity of social network, the demand for real-time processing of graph data is increasing. However, most of the existing graph systems adopt a batch processing mode, therefore the overhead of maintaining and processing of dynamic graph is ...
Read More
On the Multichromatic Number of s-Stable Kneser Graphs

For positive integers n and s, a subset Sï [n] is s-stable if sï |i-j|ï n-s for distinct i,j∈S . The s-stable r-uniform Kneser hypergraph KGrn,ks-stable is the r-uniform hypergraph that has the collection of all s-stable k-element subsets of [n] as ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the VLDB Endowment Volume 3, Issue 1-2
September 2010
1658 pages
ISSN:2150-8097
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 September 2010
Published in pvldb Volume 3, Issue 1-2
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 33
  Total Citations
  View Citations
- 722
  Total Downloads
- Downloads (Last 12 months)46
- Downloads (Last 6 weeks)7
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

iGraph: a framework for comparisons of disk-based graph indexing techniques

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

iGraph in action: performance analysis of disk-based graph indexing techniques

iGraph: an incremental data processing system for dynamic graph

On the Multichromatic Number of s-Stable Kneser Graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

iGraph: a framework for comparisons of disk-based graph indexing techniques

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

iGraph in action: performance analysis of disk-based graph indexing techniques

iGraph: an incremental data processing system for dynamic graph

On the Multichromatic Number of s-Stable Kneser Graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media