ABSTRACT
Reachability queries on large directed graphs have attracted much attention recently. The existing work either uses spanning structures, such as chains or trees, to compress the complete transitive closure, or utilizes the 2-hop strategy to describe the reachability. Almost all of these approaches work well for very sparse graphs. However, the challenging problem is that as the ratio of the number of edges to the number of vertices increases, the size of the compressed transitive closure grows very large. In this paper, we propose a new 3-hop indexing scheme for directed graphs with higher density. The basic idea of 3-hop indexing is to use chain structures in combination with hops to minimize the number of structures that must be indexed. Technically, our goal is to find a 3-hop scheme over dense DAGs (directed acyclic graphs) with minimum index size. We develop an efficient algorithm to discover a transitive closure contour, which yields near optimal index size. Empirical studies show that our 3-hop scheme has much smaller index size than state-of-the-art reachability query schemes such as 2-hop and path-tree when DAGs are not very sparse, while our query time is close to path-tree, which is considered to be one of the best reachability query schemes.
- R. Agrawal, A. Borgida, and H. V. Jagadish. Efficient management of transitive relationships in large data and knowledge bases. In SIGMOD, pages 253--262, 1989. Google ScholarDigital Library
- Renzo Angles and Claudio Gutierrez. Survey of graph database models. ACM Comput. Surv., 40(1):1--39, 2008. Google ScholarDigital Library
- M.de Berg, M.van Kreveld, M.Overmars, and O.Schwarzkopf. Computational Geometry. Springer, 2000.Google Scholar
- Li Chen, Amarnath Gupta, and M. Erdem Kurul. Stack-based algorithms for pattern matching on dags. In VLDB '05: Proceedings of the 31st international conference on Very large data bases, pages 493--504, 2005. Google ScholarDigital Library
- Yangjun Chen and Yibin Chen. An efficient algorithm for answering graph reachability queries. In ICDE, pages 893--902, 2008. Google ScholarDigital Library
- Jiefeng Cheng, Jeffrey Xu Yu, Xuemin Lin, Haixun Wang, and Philip S. Yu. Fast computation of reachability labeling for large graphs. In EDBT, pages 961--979, 2006. Google ScholarDigital Library
- Jiefeng Cheng, Jeffrey Xu Yu, Xuemin Lin, Haixun Wang, and Philip S. Yu. Fast computing reachability labelings for large graphs with high compression rate. In EDBT, pages 193--204, 2008. Google ScholarDigital Library
- V. Chvál. A greedy heuristic for the set-covering problem. Math. Oper. Res, 4:233--235, 1979.Google ScholarDigital Library
- Edith Cohen, Eran Halperin, Haim Kaplan, and Uri Zwick. Reachability and distance queries via 2-hop labels. In Proceedings of the 13th annual ACM-SIAM Symposium on Discrete algorithms, pages 937--946, 2002. Google ScholarDigital Library
- G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. A fast parametric maximum flow algorithm and applications. SIAM J. Comput., 18(1):30--55, 1989. Google ScholarDigital Library
- H. V. Jagadish. A compression technique to materialize transitive closure. ACM Trans. Database Syst., 15(4):558--598, 1990. Google ScholarDigital Library
- Ruoming Jin, Yang Xiang, Ning Ruan, and Haixun Wang. Efficiently answering reachability queries on very large directed graphs. In SIGMOD Conference, pages 595--608, 2008. Google ScholarDigital Library
- Richard Johnsonbaugh and Martin Kalin. A graph generation software package. In SIGCSE '91: Proceedings of the twenty-second SIGCSE technical symposium on Computer science education, pages 151--154, New York, NY, USA, 1991. ACM. Google ScholarDigital Library
- Guy Kortsarz and David Peleg. Generating sparse 2-spanners. In SWAT '92: Proceedings of the Third Scandinavian Workshop on Algorithm Theory, pages 73--82, 1992. Google ScholarDigital Library
- R. Schenkel, A. Theobald, and G. Weikum. HOPI: An efficient connection index for complex XML document collections. In EDBT, 2004.Google ScholarCross Ref
- K. Simon. An improved algorithm for transitive closure on acyclic digraphs. Theor. Comput. Sci., 58(1-3):325--346, 1988. Google ScholarDigital Library
- Yuanyuan Tian, Richard A. Hankins, and Jignesh M. Patel. Efficient aggregation for graph summarization. In SIGMOD Conference, 2008. Google ScholarDigital Library
- Silke Trißl and Ulf Leser. Fast and practical indexing and querying of very large graphs. In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 845--856, 2007. Google ScholarDigital Library
- Haixun Wang, Hao He, Jun Yang, Philip S. Yu, and Jeffrey Xu Yu. Dual labeling: Answering graph reachability queries in constant time. In ICDE '06: Proceedings of the 22nd International Conference on Data Engineering (ICDE'06), page 75, 2006. Google ScholarDigital Library
- Xifeng Yan, Philip S. Yu, and Jiawei Han. Substructure similarity search in graph databases. In SIGMOD Conference, pages 766--777, 2005. Google ScholarDigital Library
Index Terms
- 3-HOP: a high-compression indexing scheme for reachability query
Recommendations
Path-tree: An efficient reachability indexing scheme for large directed graphs
Reachability query is one of the fundamental queries in graph database. The main idea behind answering reachability queries is to assign vertices with certain labels such that the reachability between any two vertices can be determined by the labeling ...
Efficiently answering reachability queries on very large directed graphs
SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of dataEfficiently processing queries against very large graphs is an important research topic largely driven by emerging real world applications, as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. In ...
Fast and practical indexing and querying of very large graphs
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of dataMany applications work with graph-structured data. As graphs grow in size, indexing becomes essential to ensure sufficient query performance. We present the GRIPP index structure (GRaph Indexing based on Pre- and Postorder numbering) for answering ...
Comments