skip to main content
research-article

Answering billion-scale label-constrained reachability queries within microsecond

Published:01 February 2020Publication History
Skip Abstract Section

Abstract

In this paper, we study the problem of label-constrained reachability (LCR) query which is fundamental in many applications with directed edge-label graphs. Although the classical reachability query (i.e., reachability query without label constraint) has been extensively studied, LCR query is much more challenging because the number of possible label constraint set is exponential to the size of the labels. We observe that the existing techniques for LCR queries only construct partial index for better scalability, and their worst query time is not guaranteed and could be the same as an online breadth-first search (BFS).

In this paper, we propose novel label-constrained 2-hop indexing techniques with novel pruning rules and order strategies. It is shown that our worst query time could be bounded by the in-out index entry size. With all these techniques, comprehensive experiments show that our proposed methods significantly outperform the state-of-the-art technique in terms of query response time (up to 5 orders of magnitude speedup), index size and index construction time. In particular, our proposed method can answer LCR queries within microsecond over billion-scale graphs in a single machine.

References

  1. I. Abraham, D. Delling, A. V. Goldberg, and R. F. Werneck. Hierarchical hub labelings for shortest paths. In European Symposium on Algorithms, pages 24--35. Springer, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. T. Akiba, Y. Iwata, K.-i. Kawarabayashi, and Y. Kawata. Fast shortest-path distance queries on road networks by pruned highway labeling. In 2014 Proceedings of the Sixteenth Workshop on Algorithm Engineering and Experiments (ALENEX), pages 147--154. SIAM, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  3. T. Akiba, Y. Iwata, and Y. Yoshida. Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 349--360. ACM, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. Angles, M. Arenas, P. Barceló, A. Hogan, J. Reutter, and D. Vrgoč. Foundations of modern query languages for graph databases. ACM Computing Surveys (CSUR), 50(5):68, 2017.Google ScholarGoogle Scholar
  5. P. B. Baeza. Querying graph databases. In R. Hull and W. Fan, editors, Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, New York, NY, USA - June 22 - 27, 2013, pages 175--188. ACM, 2013.Google ScholarGoogle Scholar
  6. C. Barrett, R. Jacob, and M. Marathe. Formal-language-constrained path problems. SIAM Journal on Computing, 30(3):809--837, 2000.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Beamer, K. Asanović, and D. Patterson. Direction-optimizing breadth-first search. Scientific Programming, 21(3-4):137--148, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. F. Bonchi, A. Gionis, F. Gullo, and A. Ukkonen. Distance oracles in edge-labeled graphs. In EDBT, pages 547--558, 2014.Google ScholarGoogle Scholar
  9. M. Chen, Y. Gu, Y. Bao, and G. Yu. Label and distance-constraint reachability queries in uncertain graphs. In International Conference on Database Systems for Advanced Applications, pages 188--202. Springer, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  10. X. Chen, L. Lai, L. Qin, and X. Lin. Structsim: Querying structural node similarity at billion scale. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020.Google ScholarGoogle Scholar
  11. Y. Chen, Y. Fang, R. Cheng, Y. Li, X. Chen, and J. Zhang. Exploring communities in large profiled graphs. IEEE Transactions on Knowledge and Data Engineering, 2018.Google ScholarGoogle Scholar
  12. J. Cheng, S. Huang, H. Wu, and A. W.-C. Fu. Tf-label: a topological-folding labeling scheme for reachability querying in a large graph. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 193--204. ACM, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Cheng and J. X. Yu. On-line exact shortest distance query processing. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pages 481--492. ACM, 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Cheng, J. X. Yu, X. Lin, H. Wang, and S. Y. Philip. Fast computation of reachability labeling for large graphs. In International Conference on Extending Database Technology, pages 961--979. Springer, 2006.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. P. Choudhary. A survey on social network analysis for counterterrorism. International Journal of Computer Applications, 112(09), 2015.Google ScholarGoogle Scholar
  16. E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. SIAM Journal on Computing, 32(5):1338--1355, 2003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Y. Fang, R. Cheng, Y. Chen, S. Luo, and J. Hu. Effective and efficient attributed community search. The VLDB Journal, 26(6):803--828, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Y. Fang, R. Cheng, X. Li, S. Luo, and J. Hu. Effective community search over large spatial graphs. PVLDB, 10(6):709--720, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Y. Fang, R. Cheng, S. Luo, and J. Hu. Effective community search for large attributed graphs. PVLDB, 9(12):1233--1244, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Y. Fang, R. Cheng, S. Luo, J. Hu, and K. Huang. C-explorer: browsing communities in large graphs. PVLDB, 10(12):1885--1888, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Y. Fang, X. Huang, L. Qin, Y. Zhang, W. Zhang, R. Cheng, and X. Lin. A survey of community search over big graphs. The VLDB Journal, 2019.Google ScholarGoogle ScholarCross RefCross Ref
  22. Y. Fang, Z. Wang, R. Cheng, X. Li, S. Luo, J. Hu, and X. Chen. On spatial-aware community search. TKDE, 31(4):783--798, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Y. Fang, Z. Wang, R. Cheng, H. Wang, and J. Hu. Effective and efficient community search over large directed graphs. TKDE, 31(11):2093--2107, 2019.Google ScholarGoogle ScholarCross RefCross Ref
  24. Y. Fang, Y. Yang, W. Zhang, X. Lin, and X. Cao. Effective and efficient community search over large heterogeneous information networks. PVLDB, 13(6), Feb. 2020.Google ScholarGoogle Scholar
  25. Y. Fang, K. Yu, R. Cheng, L. V. S. Lakshmanan, and X. Lin. Efficient algorithms for densest subgraph discovery. PVLDB, 12(11):1719--1732, July 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. M. S. Hassan, W. G. Aref, and A. M. Aly. Graph indexing for shortest-path finding over dynamic sub-graphs. In Proceedings of the 2016 International Conference on Management of Data, pages 1183--1197. ACM, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. J. Hu, R. Cheng, K. C.-C. Chang, A. Sankar, Y. Fang, and B. Y. Lam. Discovering maximal motif cliques in large heterogeneous information networks. In International Conference on Data Engineering (ICDE), pages 746--757. IEEE, 2019.Google ScholarGoogle ScholarCross RefCross Ref
  28. J. Hu, X. Wu, R. Cheng, S. Luo, and Y. Fang. On minimal steiner maximum-connected subgraph queries. TKDE, 29(11):2455--2469, 2017.Google ScholarGoogle ScholarCross RefCross Ref
  29. R. Jin, H. Hong, H. Wang, N. Ruan, and Y. Xiang. Computing label-constraint reachability in graph databases. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pages 123--134. ACM, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. Jin, N. Ruan, S. Dey, and J. Y. Xu. Scarab: scaling reachability computation on large graphs. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 169--180. ACM, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. R. Jin and G. Wang. Simple, fast, and scalable reachability oracle. PVLDB, 6(14):1978--1989, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 813--826. ACM, 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. P. Klodt, G. Weikum, S. Bedathur, and S. Seufert. Indexing strategies for constrained shortest paths over large social networks. Universitat des Saarlandes, 2011.Google ScholarGoogle Scholar
  34. J. Kunegis. Konect: the koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web, pages 1343--1350. ACM, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. L. Lai, Z. Qing, Z. Yang, X. Jin, Z. Lai, R. Wang, K. Hao, X. Lin, L. Qin, W. Zhang, Y. Zhang, Z. Qian, and J. Zhou. Distributed subgraph matching on timely dataflow. PVLDB, 12(10):1099--1112, June 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. J. Leskovec. Snap: Stanford large network dataset collection, 2016.Google ScholarGoogle Scholar
  37. Y. Li, M. L. Yiu, N. M. Kou, et al. An experimental study on hub labeling based shortest path algorithms. PVLDB, 11(4):445--457, 2017.Google ScholarGoogle Scholar
  38. B. Liu, L. Yuan, X. Lin, L. Qin, W. Zhang, and J. Zhou. Efficient (α, β)-core computation: An index-based approach. In The World Wide Web Conference, pages 1130--1141, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. B. Liu, F. Zhang, C. Zhang, W. Zhang, and X. Lin. Corecube: Core decomposition in multilayer graphs. In International Conference on Web Information Systems Engineering, pages 694--710. Springer, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. R. Schenkel, A. Theobald, and G. Weikum. Hopi: An efficient connection index for complex xml document collections. In International Conference on Extending Database Technology, pages 237--255. Springer, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  41. S. Seufert, A. Anand, S. Bedathur, and G. Weikum. Ferrari: Flexible and efficient reachability range assignment for graph indexing. In 2013 IEEE 29th International Conference on Data Engineering (ICDE), pages 1009--1020. IEEE, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. K. Simon. An improved algorithm for transitive closure on acyclic digraphs. Theoretical Computer Science, 58(1-3):325--346, 1988.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. L. D. Valstar, G. H. Fletcher, and Y. Yoshida. Landmark indexing for evaluation of label-constrained reachability queries. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 345--358. ACM, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. O. van Rest, S. Hong, J. Kim, X. Meng, and H. Chafi. Pgql: a property graph query language. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, page 7. ACM, 2016.Google ScholarGoogle Scholar
  45. S. J. van Schaik and O. de Moor. A memory efficient reachability data structure through bit vector compression. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 913--924. ACM, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. S. Wadhwa, A. Prasad, S. Ranu, A. Bagchi, and S. Bedathur. Efficiently answering regular simple path queries on large labeled networks. In Proceedings of the 2019 International Conference on Management of Data, pages 1463--1480, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. K. Wang, X. Cao, X. Lin, W. Zhang, and L. Qin. Efficient computing of radius-bounded k-cores. In 2018 IEEE 34th International Conference on Data Engineering (ICDE), pages 233--244. IEEE, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  48. K. Wang, X. Lin, L. Qin, W. Zhang, and Y. Zhang. Vertex priority based butterfly counting for large-scale bipartite networks. PVLDB, 12(10):1139--1152, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. K. Wang, X. Lin, L. Qin, W. Zhang, and Y. Zhang. Efficient bitruss decomposition for large-scale bipartite graphs. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020.Google ScholarGoogle Scholar
  50. H. Wei, J. X. Yu, C. Lu, and R. Jin. Reachability querying: An independent permutation labeling approach. PVLDB, 7(12):1191--1202, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. P. T. Wood. Query languages for graph databases. ACM SIGMOD Record, 41(1):50--60, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Y. Yano, T. Akiba, Y. Iwata, and Y. Yoshida. Fast and scalable reachability queries on graphs by pruned labeling with landmarks and paths. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pages 1601--1606. ACM, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. H. Yildirim, V. Chaoji, and M. J. Zaki. Grail: Scalable reachability index for large graphs. PVLDB, 3(1-2):276--284, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. J. X. Yu and J. Cheng. Graph reachability queries: A survey. In Managing and Mining Graph Data, pages 181--215. Springer, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  55. Y. Yuan, X. Lian, G. Wang, Y. Ma, and Y. Wang. Constrained shortest path query in a large time-dependent graph. PVLDB, 12(10):1058--1070, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. F. Zhang, C. Li, Y. Zhang, L. Qin, and W. Zhang. Finding critical users in social communities: The collapsed core and truss problems. IEEE Transactions on Knowledge and Data Engineering, 32(1):78--91, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  57. F. Zhang, Y. Zhang, L. Qin, W. Zhang, and X. Lin. Efficiently reinforcing social networks over user engagement and tie strength. In 2018 IEEE 34th International Conference on Data Engineering (ICDE), pages 557--568. IEEE, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  58. X. Zhang and M. T. Özsu. Correlation constraint shortest path over large multi-relation graphs. PVLDB, 12(5):488--501, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Z. Zhang, J. X. Yu, L. Qin, Q. Zhu, and X. Zhou. I/o cost minimization: reachability queries processing over massive graphs. In Proceedings of the 15th International Conference on Extending Database Technology, pages 468--479. ACM, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. L. Zou, K. Xu, J. X. Yu, L. Chen, Y. Xiao, and D. Zhao. Efficient processing of label-constraint reachability queries in large graphs. Information Systems, 40:47--66, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 13, Issue 6
    February 2020
    170 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 February 2020
    Published in pvldb Volume 13, Issue 6

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader