ABSTRACT
Graph reachability is a fundamental research problem that finds its use in many applications such as geographic navigation, bioinformatics, web ontologies and XML databases, etc. Given two vertices, u and v, in a directed graph, a reachability query asks if there is a directed path from u to v. Over the last two decades, many indexing schemes have been proposed to support reachability queries on large graphs. Typically, those schemes based on chain or tree covers work well when the graph is sparse. For dense graphs, they still have fast query time but require large storage for their indices. In contrast, the 2-Hop cover and its variations/extensions produce compact indices even for dense graphs but have slower query time than those chain/tree covers. In this paper, we propose a new indexing scheme, called Path-Hop, which is even more space-efficient than those schemes based on 2-Hop cover and yet has query processing speed comparable to those chain/tree covers. We conduct extensive experiments to illustrate the effectiveness of our approach relative to other state-of-the-art methods.
- R. Agrawal, A. Borgida, and H. V. Jagadish. Efficient management of transitive relationships in large data and knowledge bases. In Proc. of SIGMOD, pp 253--262, 1989. Google ScholarDigital Library
- A. Aho, M. Garey, J. Ullman. The transitive reduction of a directed graph. SIAM Journal on Computing, 1(2):131--137, 1972.Google ScholarDigital Library
- L. Chen, A. Gupta, M. E. Kurul. Stack-based algorithms for pattern matching on DAGs. In Proc. of 31st VLDB, pp 493--504, 2005. Google ScholarDigital Library
- V. Chvatal. A greedy heuristic for the set-covering problem. Math. Oper. Res, 4:233--235, 1979.Google ScholarDigital Library
- E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. In Proc. of 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pp 937--946, 2002. Google ScholarDigital Library
- H. V. Jagadish. A compression technique to materialize transitive closure. ACM Trans. on Database Systems, 15(4):558--598, 1990. Google ScholarDigital Library
- R. Jin, Y. Xiang, N. Ruan, and H. Wang. Efficiently answering reachability queries on very large directed graphs. In Proc. of SIGMOD, pp 595--608, 2008. Google ScholarDigital Library
- R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-HOP: a high-compression indexing scheme for reachability query. In Proc. of SIGMOD, pp 813--826, 2009. Google ScholarDigital Library
- I. M. Keseler, J. Collado-Vides, S. Gama-Castro, J. Ingraham, S. Paley, I. T. Paulsen, M. Peralta-Gil, and P. D. Karp. Ecocyc: A comprehensive database resource for escherichia coli. In Nucleic Acids Research, 33 (D334-D337), 2005.Google ScholarCross Ref
- R. Paige and R. Tarjan. Three partition refinement algorithms. SIAM Journal on Computing, 16:973--988,1987. Google ScholarDigital Library
- P. Romero, J. Wagg, M. L. Green, D. Kaiser, M. Krummenacker, and P. D Karp. Computational prediction of human metabolic pathways from the complete human genome. Genome Biology, 6(1):1--17,2004Google ScholarCross Ref
- R. Schenkel, A. Theobald, and G. Weikum. HOPI: an efficient connection index for complex XML document collections. In 7th EDBT, LNCS 2992, pp 237--255, 2004.Google Scholar
- S. Trißl and U. Leser. Fast and practical indexing and querying of very large graphs. In Proc. of SIGMOD, pp 845--856, 2007. Google ScholarDigital Library
- H. Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual labeling: answering graph reachability queries in constant time. In Proc. of 22nd ICDE, pp 75, 2006. Google ScholarDigital Library
- E. Zimanyi and S. Gabouje. Semantic visualization of biochemical databases. In Proc. of 2004 Int'l Conf. on Semantics of a Networked World, pp 199--214, 2004.Google ScholarCross Ref
Index Terms
- Path-hop: efficiently indexing large graphs for reachability queries
Recommendations
3-HOP: a high-compression indexing scheme for reachability query
SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of dataReachability queries on large directed graphs have attracted much attention recently. The existing work either uses spanning structures, such as chains or trees, to compress the complete transitive closure, or utilizes the 2-hop strategy to describe the ...
Approximating expressive queries on graph-modeled data
We present GeX for the approximate matching of complex queries on graph-modeled data.GeX generalizes existing approaches and allows for querying any graph-based datasets.GeX query language supports queries ranging from keyword-based to complex ones.GeX ...
Path-tree: An efficient reachability indexing scheme for large directed graphs
Reachability query is one of the fundamental queries in graph database. The main idea behind answering reachability queries is to assign vertices with certain labels such that the reachability between any two vertices can be determined by the labeling ...
Comments