skip to main content
10.1145/1559845.1559930acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

3-HOP: a high-compression indexing scheme for reachability query

Published:29 June 2009Publication History

ABSTRACT

Reachability queries on large directed graphs have attracted much attention recently. The existing work either uses spanning structures, such as chains or trees, to compress the complete transitive closure, or utilizes the 2-hop strategy to describe the reachability. Almost all of these approaches work well for very sparse graphs. However, the challenging problem is that as the ratio of the number of edges to the number of vertices increases, the size of the compressed transitive closure grows very large. In this paper, we propose a new 3-hop indexing scheme for directed graphs with higher density. The basic idea of 3-hop indexing is to use chain structures in combination with hops to minimize the number of structures that must be indexed. Technically, our goal is to find a 3-hop scheme over dense DAGs (directed acyclic graphs) with minimum index size. We develop an efficient algorithm to discover a transitive closure contour, which yields near optimal index size. Empirical studies show that our 3-hop scheme has much smaller index size than state-of-the-art reachability query schemes such as 2-hop and path-tree when DAGs are not very sparse, while our query time is close to path-tree, which is considered to be one of the best reachability query schemes.

References

  1. R. Agrawal, A. Borgida, and H. V. Jagadish. Efficient management of transitive relationships in large data and knowledge bases. In SIGMOD, pages 253--262, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Renzo Angles and Claudio Gutierrez. Survey of graph database models. ACM Comput. Surv., 40(1):1--39, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M.de Berg, M.van Kreveld, M.Overmars, and O.Schwarzkopf. Computational Geometry. Springer, 2000.Google ScholarGoogle Scholar
  4. Li Chen, Amarnath Gupta, and M. Erdem Kurul. Stack-based algorithms for pattern matching on dags. In VLDB '05: Proceedings of the 31st international conference on Very large data bases, pages 493--504, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Yangjun Chen and Yibin Chen. An efficient algorithm for answering graph reachability queries. In ICDE, pages 893--902, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Jiefeng Cheng, Jeffrey Xu Yu, Xuemin Lin, Haixun Wang, and Philip S. Yu. Fast computation of reachability labeling for large graphs. In EDBT, pages 961--979, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Jiefeng Cheng, Jeffrey Xu Yu, Xuemin Lin, Haixun Wang, and Philip S. Yu. Fast computing reachability labelings for large graphs with high compression rate. In EDBT, pages 193--204, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. V. Chvál. A greedy heuristic for the set-covering problem. Math. Oper. Res, 4:233--235, 1979.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Edith Cohen, Eran Halperin, Haim Kaplan, and Uri Zwick. Reachability and distance queries via 2-hop labels. In Proceedings of the 13th annual ACM-SIAM Symposium on Discrete algorithms, pages 937--946, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. A fast parametric maximum flow algorithm and applications. SIAM J. Comput., 18(1):30--55, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. H. V. Jagadish. A compression technique to materialize transitive closure. ACM Trans. Database Syst., 15(4):558--598, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Ruoming Jin, Yang Xiang, Ning Ruan, and Haixun Wang. Efficiently answering reachability queries on very large directed graphs. In SIGMOD Conference, pages 595--608, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Richard Johnsonbaugh and Martin Kalin. A graph generation software package. In SIGCSE '91: Proceedings of the twenty-second SIGCSE technical symposium on Computer science education, pages 151--154, New York, NY, USA, 1991. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Guy Kortsarz and David Peleg. Generating sparse 2-spanners. In SWAT '92: Proceedings of the Third Scandinavian Workshop on Algorithm Theory, pages 73--82, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. R. Schenkel, A. Theobald, and G. Weikum. HOPI: An efficient connection index for complex XML document collections. In EDBT, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  16. K. Simon. An improved algorithm for transitive closure on acyclic digraphs. Theor. Comput. Sci., 58(1-3):325--346, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Yuanyuan Tian, Richard A. Hankins, and Jignesh M. Patel. Efficient aggregation for graph summarization. In SIGMOD Conference, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Silke Trißl and Ulf Leser. Fast and practical indexing and querying of very large graphs. In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 845--856, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Haixun Wang, Hao He, Jun Yang, Philip S. Yu, and Jeffrey Xu Yu. Dual labeling: Answering graph reachability queries in constant time. In ICDE '06: Proceedings of the 22nd International Conference on Data Engineering (ICDE'06), page 75, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Xifeng Yan, Philip S. Yu, and Jiawei Han. Substructure similarity search in graph databases. In SIGMOD Conference, pages 766--777, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. 3-HOP: a high-compression indexing scheme for reachability query

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
      June 2009
      1168 pages
      ISBN:9781605585512
      DOI:10.1145/1559845

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 29 June 2009

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader