skip to main content
10.1145/1247480.1247573acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Fast and practical indexing and querying of very large graphs

Published:11 June 2007Publication History

ABSTRACT

Many applications work with graph-structured data. As graphs grow in size, indexing becomes essential to ensure sufficient query performance. We present the GRIPP index structure (GRaph Indexing based on Pre- and Postorder numbering) for answering reachability queries in graphs.

GRIPP requires only linear time and space. Using GRIPP, we can answer reachability queries on graphs with 5 million nodes on average in less than 5 milliseconds, which is unrivaled by previous methods. We evaluate the performance and scalability of our approach on real and synthetic random and scale-free graphs and compare our approach to existing indexing schemes. GRIPP is implemented as stored procedure inside a relational database management system and can therefore very easily be integrated into existing graph-oriented applications.

References

  1. R. Agrawal, A. Borgida, and H. V. Jagadish. Efficient Management of Transitive Relationships in Large Data and Knowledge Bases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 253--262, 1989. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Agrawal and H. V. Jagadish. Direct algorithms for computing the transitive closure of database relations. In Proceedings of the 13th International Conference on Very Large Data Bases (VLDB), pages 255--266, 1987. Morgan Kaufmann. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. L. Barabàsi and R. Albert. Emergence of Scaling in Random Networks. Science, 286(5439):509--512, Oct 1999.Google ScholarGoogle Scholar
  4. A. L. Barabàsi and Z. N. Oltvai. Network biology: understanding the cell's functional organization. Nature Reviews Genetics, 5(2):101--113, Feb 2004.Google ScholarGoogle ScholarCross RefCross Ref
  5. I. Borodina and J. Nielsen. From genomes to in silico cells via metabolic networks. Current Opinion in Biotechnology, 16(3):350--355, Jun 2005.Google ScholarGoogle ScholarCross RefCross Ref
  6. L. Chen, A. Gupta, and M. E. Kurul. Stack-based Algorithms for Pattern Matching on DAGs. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB), pages 493--504, 2005. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Cheng, J. X. Yu, X. Lin, H. Wang, and P. S. Yu. Fast Computation of Reachability Labeling for Large Graphs. In Proceedings of the 10th International Conference on Extending Database Technology (EDBT), volume 3896 of Lecture Notes in Computer Science, pages 961--979, 2006. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and Distance Queries via 2-Hop Labels. SIAM J. Comput., 32(5):1338--1355, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. Dietz and D. Sleator. Two algorithms for maintaining order in a list. In Proceedings of the 19th annual ACM Symposium on Theory of computing (STOC), pages 365--372, 1987. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. P. Erdös and A. Rényi. On the Evolution of Random Graphs. Publ. Math. Inst. Hungar. Acad. Sci., 5:17--61, 1960.Google ScholarGoogle Scholar
  12. M. F. Fernandez, D. Florescu, A. Y. Levy, and D. Suciu. A query language for a web-site management system. SIGMOD Record, 26(3):4--11, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. T. Grust, M. van Keulen, and J. Teubner. Accelerating XPath evaluation in any RDBMS. ACM Trans. Database Syst., 29:91--131, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. R. H. Güting. GraphDB: Modeling and Querying Graphs in Databases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pages 297--308, 1994. Morgan Kaufmann. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. H. He, H. Wang, J. Yang, and P. S. Yu. Compact Reachability Labeling for Graph-Structured Data. In Proceedings of the 2005 ACM International Conference on Information and Knowledge Management (CIKM), pages 594--601, 2005. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. van Helden, A. Naim, R. Mancuso, M. Eldridge, et al. Representing and analysing molecular and cellular function using the computer. Journal of Biological Chemistry, 381(9-10):921--935, Sep-Oct 2000.Google ScholarGoogle Scholar
  17. G. Joshi-Tope, M. Gillespie, I. Vastrik, P. D'Eustachio, et al. Reactome: a knowledgebase of biological pathways. Nucleic Acids Research, 33:D428--D432, Jan 2005.Google ScholarGoogle ScholarCross RefCross Ref
  18. M. Kanehisa, S. Goto, S. Kavashima, Y. Okuno, and M. Hattori. The KEGG resource for deciphering the genome. Nucleic Acids Research, 32:D277--D280, Jan 2004.Google ScholarGoogle ScholarCross RefCross Ref
  19. G. Karvounarakis, S. Alexaki, V. Christophides, D. Plexousakis, and M. Scholl. RQL: A declarative query language for RDF, 2002. In Proceedings of the 11th Intl. World Wide Web Conference (WWW), 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. Lemer, E. Antezana, F. Couche, F. Fays, et al. The aMAZE LightBench: a web interface to a relational database of cellular processes. Nucleic Acids Research, 32:D443--D448, Jan 2004.Google ScholarGoogle ScholarCross RefCross Ref
  21. H. Lu. New Strategies for Computing the Transitive Closure of a Database Relation. In Proceedings of the 13th International Conference on Very Large Data Bases (VLDB), pages 267--274, 1987. Morgan Kaufmann. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. R. Schenkel, A. Theobald, and G. Weikum. Efficient Creation and Incremental Maintenance of the HOPI Index for Complex XML Document Collections. In Proceedings of the 21st International Conference on Data Engineering (ICDE), pages 360--371, 2005. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. U. Stelzl, U. Worm, M. Lalowski, C. Haenig, et al. A human protein-protein interaction network: a resource for annotating the proteome. Cell, 122(6):957--968, Sep 2005.Google ScholarGoogle ScholarCross RefCross Ref
  24. S. Trißl and U. Leser. Querying Ontologies in Relational Database Systems. In Proceedings of the Second International Workshop on Data Integration in the Life Sciences (DILS), volume 3615 of Lecture Notes in Computer Science, pages 63--79, 2005. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S. Trißl and U. Leser. GRIPP-Indexing and Querying Graphs based on Pre- and Postorder Numbering. Technical Report No. 207, Humboldt-Universität zu Berlin, Institut für Informatik, 2006.Google ScholarGoogle Scholar
  26. H. Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual Labeling: Answering Graph Reachability Queries in Constant Time. In Proceedings of the 22nd International Conference on Data Engineering (ICDE), page 75, 2006. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Fast and practical indexing and querying of very large graphs

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
      June 2007
      1210 pages
      ISBN:9781595936868
      DOI:10.1145/1247480
      • General Chairs:
      • Lizhu Zhou,
      • Tok Wang Ling,
      • Program Chair:
      • Beng Chin Ooi

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 11 June 2007

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader