Article

Fast and practical indexing and querying of very large graphs

Authors:
Silke Trißl

Humboldt-Universität zu Berlin, Berlin, Germany

Humboldt-Universität zu Berlin, Berlin, Germany
View Profile

,
Ulf Leser

Humboldt-Universität zu Berlin, Berlin, Germany

Humboldt-Universität zu Berlin, Berlin, Germany
View Profile

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of dataJune 2007Pages 845–856https://doi.org/10.1145/1247480.1247573

Published:11 June 2007Publication History

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

Pages 845–856

ABSTRACT

Many applications work with graph-structured data. As graphs grow in size, indexing becomes essential to ensure sufficient query performance. We present the GRIPP index structure (GRaph Indexing based on Pre- and Postorder numbering) for answering reachability queries in graphs.

GRIPP requires only linear time and space. Using GRIPP, we can answer reachability queries on graphs with 5 million nodes on average in less than 5 milliseconds, which is unrivaled by previous methods. We evaluate the performance and scalability of our approach on real and synthetic random and scale-free graphs and compare our approach to existing indexing schemes. GRIPP is implemented as stored procedure inside a relational database management system and can therefore very easily be integrated into existing graph-oriented applications.

References

R. Agrawal, A. Borgida, and H. V. Jagadish. Efficient Management of Transitive Relationships in Large Data and Knowledge Bases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 253--262, 1989. ACM Press. Google ScholarDigital Library
R. Agrawal and H. V. Jagadish. Direct algorithms for computing the transitive closure of database relations. In Proceedings of the 13th International Conference on Very Large Data Bases (VLDB), pages 255--266, 1987. Morgan Kaufmann. Google ScholarDigital Library
A. L. Barabàsi and R. Albert. Emergence of Scaling in Random Networks. Science, 286(5439):509--512, Oct 1999.Google Scholar
A. L. Barabàsi and Z. N. Oltvai. Network biology: understanding the cell's functional organization. Nature Reviews Genetics, 5(2):101--113, Feb 2004.Google ScholarCross Ref
I. Borodina and J. Nielsen. From genomes to in silico cells via metabolic networks. Current Opinion in Biotechnology, 16(3):350--355, Jun 2005.Google ScholarCross Ref
L. Chen, A. Gupta, and M. E. Kurul. Stack-based Algorithms for Pattern Matching on DAGs. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB), pages 493--504, 2005. ACM Press. Google ScholarDigital Library
J. Cheng, J. X. Yu, X. Lin, H. Wang, and P. S. Yu. Fast Computation of Reachability Labeling for Large Graphs. In Proceedings of the 10th International Conference on Extending Database Technology (EDBT), volume 3896 of Lecture Notes in Computer Science, pages 961--979, 2006. Springer. Google ScholarDigital Library
E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and Distance Queries via 2-Hop Labels. SIAM J. Comput., 32(5):1338--1355, 2003. Google ScholarDigital Library
T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, 2001. Google ScholarDigital Library
P. Dietz and D. Sleator. Two algorithms for maintaining order in a list. In Proceedings of the 19th annual ACM Symposium on Theory of computing (STOC), pages 365--372, 1987. ACM Press. Google ScholarDigital Library
P. Erdös and A. Rényi. On the Evolution of Random Graphs. Publ. Math. Inst. Hungar. Acad. Sci., 5:17--61, 1960.Google Scholar
M. F. Fernandez, D. Florescu, A. Y. Levy, and D. Suciu. A query language for a web-site management system. SIGMOD Record, 26(3):4--11, 1997. Google ScholarDigital Library
T. Grust, M. van Keulen, and J. Teubner. Accelerating XPath evaluation in any RDBMS. ACM Trans. Database Syst., 29:91--131, 2004. Google ScholarDigital Library
R. H. Güting. GraphDB: Modeling and Querying Graphs in Databases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pages 297--308, 1994. Morgan Kaufmann. Google ScholarDigital Library
H. He, H. Wang, J. Yang, and P. S. Yu. Compact Reachability Labeling for Graph-Structured Data. In Proceedings of the 2005 ACM International Conference on Information and Knowledge Management (CIKM), pages 594--601, 2005. ACM Press. Google ScholarDigital Library
J. van Helden, A. Naim, R. Mancuso, M. Eldridge, et al. Representing and analysing molecular and cellular function using the computer. Journal of Biological Chemistry, 381(9-10):921--935, Sep-Oct 2000.Google Scholar
G. Joshi-Tope, M. Gillespie, I. Vastrik, P. D'Eustachio, et al. Reactome: a knowledgebase of biological pathways. Nucleic Acids Research, 33:D428--D432, Jan 2005.Google ScholarCross Ref
M. Kanehisa, S. Goto, S. Kavashima, Y. Okuno, and M. Hattori. The KEGG resource for deciphering the genome. Nucleic Acids Research, 32:D277--D280, Jan 2004.Google ScholarCross Ref
G. Karvounarakis, S. Alexaki, V. Christophides, D. Plexousakis, and M. Scholl. RQL: A declarative query language for RDF, 2002. In Proceedings of the 11th Intl. World Wide Web Conference (WWW), 2002. Google ScholarDigital Library
C. Lemer, E. Antezana, F. Couche, F. Fays, et al. The aMAZE LightBench: a web interface to a relational database of cellular processes. Nucleic Acids Research, 32:D443--D448, Jan 2004.Google ScholarCross Ref
H. Lu. New Strategies for Computing the Transitive Closure of a Database Relation. In Proceedings of the 13th International Conference on Very Large Data Bases (VLDB), pages 267--274, 1987. Morgan Kaufmann. Google ScholarDigital Library
R. Schenkel, A. Theobald, and G. Weikum. Efficient Creation and Incremental Maintenance of the HOPI Index for Complex XML Document Collections. In Proceedings of the 21st International Conference on Data Engineering (ICDE), pages 360--371, 2005. IEEE Computer Society. Google ScholarDigital Library
U. Stelzl, U. Worm, M. Lalowski, C. Haenig, et al. A human protein-protein interaction network: a resource for annotating the proteome. Cell, 122(6):957--968, Sep 2005.Google ScholarCross Ref
S. Trißl and U. Leser. Querying Ontologies in Relational Database Systems. In Proceedings of the Second International Workshop on Data Integration in the Life Sciences (DILS), volume 3615 of Lecture Notes in Computer Science, pages 63--79, 2005. Springer. Google ScholarDigital Library
S. Trißl and U. Leser. GRIPP-Indexing and Querying Graphs based on Pre- and Postorder Numbering. Technical Report No. 207, Humboldt-Universität zu Berlin, Institut für Informatik, 2006.Google Scholar
H. Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual Labeling: Answering Graph Reachability Queries in Constant Time. In Proceedings of the 22nd International Conference on Data Engineering (ICDE), page 75, 2006. IEEE Computer Society. Google ScholarDigital Library

Index Terms

Fast and practical indexing and querying of very large graphs
1. Information systems
  1. Information systems applications

Recommendations

3-HOP: a high-compression indexing scheme for reachability query
SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Reachability queries on large directed graphs have attracted much attention recently. The existing work either uses spanning structures, such as chains or trees, to compress the complete transitive closure, or utilizes the 2-hop strategy to describe the ...
Read More
Efficiently answering reachability queries on very large directed graphs
SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Efficiently processing queries against very large graphs is an important research topic largely driven by emerging real world applications, as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. In ...
Read More
Fast processing of graph queries on a large database of small and medium-sized data graphs

We propose a new way of indexing a large database of small and medium-sized graphs and processing exact subgraph matching (or subgraph isomorphism) and approximate (full) graph matching queries. Rather than decomposing a graph into smaller units (e.g., ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
June 2007
1210 pages
ISBN:9781595936868
DOI:10.1145/1247480
General Chairs:
Lizhu Zhou
Tsinghua University, China
,
Tok Wang Ling
National University of Singapore, Singapore
,
Program Chair:
Beng Chin Ooi
National University of Singapore, Singapore
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 11 June 2007
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
databases
graph indexing
reachability queries
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 144
  Total Citations
  View Citations
- 206
  Total Downloads
- Downloads (Last 12 months)35
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Fast and practical indexing and querying of very large graphs

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

3-HOP: a high-compression indexing scheme for reachability query

Efficiently answering reachability queries on very large directed graphs

Fast processing of graph queries on a large database of small and medium-sized data graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Fast and practical indexing and querying of very large graphs

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

3-HOP: a high-compression indexing scheme for reachability query

Efficiently answering reachability queries on very large directed graphs

Fast processing of graph queries on a large database of small and medium-sized data graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media