skip to main content
research-article

iGraph: a framework for comparisons of disk-based graph indexing techniques

Published:01 September 2010Publication History
Skip Abstract Section

Abstract

Graphs are of growing importance in modeling complex structures such as chemical compounds, proteins, images, and program dependence. Given a query graph Q, the subgraph isomorphism problem is to find a set of graphs containing Q from a graph database, which is NP-complete. Recently, there have been a lot of research efforts to solve the subgraph isomorphism problem for a large graph database by utilizing graph indexes. By using a graph index as a filter, we prune graphs that are not real answers at an inexpensive cost. Then, we need to use expensive subgraph isomorphism tests to verify filtered candidates only. This way, the number of disk I/Os and subgraph isomorphism tests can be significantly minimized. The current practice for experiments in graph indexing techniques is that the author of a newly proposed technique does not implement existing indexes on his own code base, but instead uses the original authors' binary executables and reports only the wall clock time. However, we observe this practice may result in several problems. In order to address these problems, we have made significant efforts in implementing all representative indexing methods on a common framework called iGraph. Unlike existing implementations which either use (full or partial) in-memory representations or rely on OS file system cache without guaranteeing real disk I/Os, we have implemented these indexes on top of a storage engine that guarantees real disk I/Os. Through extensive experiments using many synthetic and real datasets, we also provide new empirical findings in the performance of the full disk-based implementations of these methods.

References

  1. http://www.seagate.com.Google ScholarGoogle Scholar
  2. http://msdn.microsoft.com/en-us/library/cc644950(VS.85).aspx.Google ScholarGoogle Scholar
  3. Aids antiviral dataset used for gindex. http://www.xifengyan.net/software.htm.Google ScholarGoogle Scholar
  4. Gaston. http://www.liacs.nl/snijssen/gaston/iccs.html.Google ScholarGoogle Scholar
  5. gboost. http://www.kyb.mpg.de/bs/people/nowozin/gboost/.Google ScholarGoogle Scholar
  6. Graphgen --- a synthetic graph data generator. http://www.cse.ust.hk/graphgen/.Google ScholarGoogle Scholar
  7. National cancer institute. http://dtp.nci.nih.gov/.Google ScholarGoogle Scholar
  8. The pubchem project. pubchem.ncbi.nlm.nih.gov.Google ScholarGoogle Scholar
  9. Vf2 library. http://amalfi.dis.unina.it/graph/db/vflib-2.0/.Google ScholarGoogle Scholar
  10. A. Ailamaki, D. DeWitt, M. Hill, and M. Skounakis. Weaving relations for cache performance. The VLDB Journal, pages 169--180, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. The protein data bank. Nucleic Acids Res, 28:235--242, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  12. J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD, pages 857--872, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell., 26(10):1367--1372, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H. He and A. K. Singh. Closure-tree: An index structure for graph queries. In ICDE, pages 38--49, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In ICDM, page 549, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. C. Liu, X. Yan, L. Fei, J. Han, and S. P. Midkiff. Sober: statistical model-based bug localization. In ESEC/FSE, pages 286--295, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. E. G. M. Petrakis and C. Faloutsos. Similarity searching in medical image databases. TKDE, 9(3), 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. R. Ramakrishnan and J. Gehrke. Database Management Systems. McGraw-Hill, Inc., New York, NY, USA, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. V. Roussev, G. G. R. III, and L. Marziale. Multi-resolution similarity hashing. Digital Investigation, 4(Supplement 1):105--113, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. H. Shang, Y. Zhang, X. Lin, and J. X. Yu. Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB, 1(1):364--375, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. D. Shasha, J. T.-L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS, pages 39--52, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Y. Tian and J. M. Patel. Tale: A tool for approximate large graph matching. In ICDE, pages 963--972, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1):31--42, 1976. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. M. Wörlein, T. Meinl, I. Fischer, and M. Philippsen. A quantitative comparison of the subgraph miners mofa, gspan, ffsm, and gaston. In PKDD, pages 392--403, 2005. Google ScholarGoogle ScholarCross RefCross Ref
  25. X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In ICDM, pages 721--724, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. X. Yan, P. S. Yu, and J. Han. Graph indexing based on discriminative frequent structure analysis. ACM Trans. Database Syst., 30(4):960--993, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. X. Yan, P. S. Yu, and J. Han. Substructure similarity search in graph databases. In SIGMOD, pages 766--777, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Zhang, M. Hu, and J. Yang. Treepi: A novel graph indexing method. In ICDE, pages 966--975, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  30. P. Zhao, J. X. Yu, and P. S. Yu. Graph indexing: Tree + delta >= graph. In VLDB, pages 938--949, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. L. Zou, L. Chen, J. X. Yu, and Y. Lu. A novel spectral coding in a large graph database. In EDBT, pages 181--192, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. iGraph: a framework for comparisons of disk-based graph indexing techniques
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image Proceedings of the VLDB Endowment
        Proceedings of the VLDB Endowment  Volume 3, Issue 1-2
        September 2010
        1658 pages

        Publisher

        VLDB Endowment

        Publication History

        • Published: 1 September 2010
        Published in pvldb Volume 3, Issue 1-2

        Qualifiers

        • research-article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader