ABSTRACT
Many applications work with graph-structured data. As graphs grow in size, indexing becomes essential to ensure sufficient query performance. We present the GRIPP index structure (GRaph Indexing based on Pre- and Postorder numbering) for answering reachability queries in graphs.
GRIPP requires only linear time and space. Using GRIPP, we can answer reachability queries on graphs with 5 million nodes on average in less than 5 milliseconds, which is unrivaled by previous methods. We evaluate the performance and scalability of our approach on real and synthetic random and scale-free graphs and compare our approach to existing indexing schemes. GRIPP is implemented as stored procedure inside a relational database management system and can therefore very easily be integrated into existing graph-oriented applications.
- R. Agrawal, A. Borgida, and H. V. Jagadish. Efficient Management of Transitive Relationships in Large Data and Knowledge Bases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 253--262, 1989. ACM Press. Google ScholarDigital Library
- R. Agrawal and H. V. Jagadish. Direct algorithms for computing the transitive closure of database relations. In Proceedings of the 13th International Conference on Very Large Data Bases (VLDB), pages 255--266, 1987. Morgan Kaufmann. Google ScholarDigital Library
- A. L. Barabàsi and R. Albert. Emergence of Scaling in Random Networks. Science, 286(5439):509--512, Oct 1999.Google Scholar
- A. L. Barabàsi and Z. N. Oltvai. Network biology: understanding the cell's functional organization. Nature Reviews Genetics, 5(2):101--113, Feb 2004.Google ScholarCross Ref
- I. Borodina and J. Nielsen. From genomes to in silico cells via metabolic networks. Current Opinion in Biotechnology, 16(3):350--355, Jun 2005.Google ScholarCross Ref
- L. Chen, A. Gupta, and M. E. Kurul. Stack-based Algorithms for Pattern Matching on DAGs. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB), pages 493--504, 2005. ACM Press. Google ScholarDigital Library
- J. Cheng, J. X. Yu, X. Lin, H. Wang, and P. S. Yu. Fast Computation of Reachability Labeling for Large Graphs. In Proceedings of the 10th International Conference on Extending Database Technology (EDBT), volume 3896 of Lecture Notes in Computer Science, pages 961--979, 2006. Springer. Google ScholarDigital Library
- E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and Distance Queries via 2-Hop Labels. SIAM J. Comput., 32(5):1338--1355, 2003. Google ScholarDigital Library
- T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, 2001. Google ScholarDigital Library
- P. Dietz and D. Sleator. Two algorithms for maintaining order in a list. In Proceedings of the 19th annual ACM Symposium on Theory of computing (STOC), pages 365--372, 1987. ACM Press. Google ScholarDigital Library
- P. Erdös and A. Rényi. On the Evolution of Random Graphs. Publ. Math. Inst. Hungar. Acad. Sci., 5:17--61, 1960.Google Scholar
- M. F. Fernandez, D. Florescu, A. Y. Levy, and D. Suciu. A query language for a web-site management system. SIGMOD Record, 26(3):4--11, 1997. Google ScholarDigital Library
- T. Grust, M. van Keulen, and J. Teubner. Accelerating XPath evaluation in any RDBMS. ACM Trans. Database Syst., 29:91--131, 2004. Google ScholarDigital Library
- R. H. Güting. GraphDB: Modeling and Querying Graphs in Databases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pages 297--308, 1994. Morgan Kaufmann. Google ScholarDigital Library
- H. He, H. Wang, J. Yang, and P. S. Yu. Compact Reachability Labeling for Graph-Structured Data. In Proceedings of the 2005 ACM International Conference on Information and Knowledge Management (CIKM), pages 594--601, 2005. ACM Press. Google ScholarDigital Library
- J. van Helden, A. Naim, R. Mancuso, M. Eldridge, et al. Representing and analysing molecular and cellular function using the computer. Journal of Biological Chemistry, 381(9-10):921--935, Sep-Oct 2000.Google Scholar
- G. Joshi-Tope, M. Gillespie, I. Vastrik, P. D'Eustachio, et al. Reactome: a knowledgebase of biological pathways. Nucleic Acids Research, 33:D428--D432, Jan 2005.Google ScholarCross Ref
- M. Kanehisa, S. Goto, S. Kavashima, Y. Okuno, and M. Hattori. The KEGG resource for deciphering the genome. Nucleic Acids Research, 32:D277--D280, Jan 2004.Google ScholarCross Ref
- G. Karvounarakis, S. Alexaki, V. Christophides, D. Plexousakis, and M. Scholl. RQL: A declarative query language for RDF, 2002. In Proceedings of the 11th Intl. World Wide Web Conference (WWW), 2002. Google ScholarDigital Library
- C. Lemer, E. Antezana, F. Couche, F. Fays, et al. The aMAZE LightBench: a web interface to a relational database of cellular processes. Nucleic Acids Research, 32:D443--D448, Jan 2004.Google ScholarCross Ref
- H. Lu. New Strategies for Computing the Transitive Closure of a Database Relation. In Proceedings of the 13th International Conference on Very Large Data Bases (VLDB), pages 267--274, 1987. Morgan Kaufmann. Google ScholarDigital Library
- R. Schenkel, A. Theobald, and G. Weikum. Efficient Creation and Incremental Maintenance of the HOPI Index for Complex XML Document Collections. In Proceedings of the 21st International Conference on Data Engineering (ICDE), pages 360--371, 2005. IEEE Computer Society. Google ScholarDigital Library
- U. Stelzl, U. Worm, M. Lalowski, C. Haenig, et al. A human protein-protein interaction network: a resource for annotating the proteome. Cell, 122(6):957--968, Sep 2005.Google ScholarCross Ref
- S. Trißl and U. Leser. Querying Ontologies in Relational Database Systems. In Proceedings of the Second International Workshop on Data Integration in the Life Sciences (DILS), volume 3615 of Lecture Notes in Computer Science, pages 63--79, 2005. Springer. Google ScholarDigital Library
- S. Trißl and U. Leser. GRIPP-Indexing and Querying Graphs based on Pre- and Postorder Numbering. Technical Report No. 207, Humboldt-Universität zu Berlin, Institut für Informatik, 2006.Google Scholar
- H. Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual Labeling: Answering Graph Reachability Queries in Constant Time. In Proceedings of the 22nd International Conference on Data Engineering (ICDE), page 75, 2006. IEEE Computer Society. Google ScholarDigital Library
Index Terms
- Fast and practical indexing and querying of very large graphs
Recommendations
3-HOP: a high-compression indexing scheme for reachability query
SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of dataReachability queries on large directed graphs have attracted much attention recently. The existing work either uses spanning structures, such as chains or trees, to compress the complete transitive closure, or utilizes the 2-hop strategy to describe the ...
Efficiently answering reachability queries on very large directed graphs
SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of dataEfficiently processing queries against very large graphs is an important research topic largely driven by emerging real world applications, as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. In ...
Fast processing of graph queries on a large database of small and medium-sized data graphs
We propose a new way of indexing a large database of small and medium-sized graphs and processing exact subgraph matching (or subgraph isomorphism) and approximate (full) graph matching queries. Rather than decomposing a graph into smaller units (e.g., ...
Comments