Abstract
In this paper, we study the problem of label-constrained reachability (LCR) query which is fundamental in many applications with directed edge-label graphs. Although the classical reachability query (i.e., reachability query without label constraint) has been extensively studied, LCR query is much more challenging because the number of possible label constraint set is exponential to the size of the labels. We observe that the existing techniques for LCR queries only construct partial index for better scalability, and their worst query time is not guaranteed and could be the same as an online breadth-first search (BFS).
In this paper, we propose novel label-constrained 2-hop indexing techniques with novel pruning rules and order strategies. It is shown that our worst query time could be bounded by the in-out index entry size. With all these techniques, comprehensive experiments show that our proposed methods significantly outperform the state-of-the-art technique in terms of query response time (up to 5 orders of magnitude speedup), index size and index construction time. In particular, our proposed method can answer LCR queries within microsecond over billion-scale graphs in a single machine.
- I. Abraham, D. Delling, A. V. Goldberg, and R. F. Werneck. Hierarchical hub labelings for shortest paths. In European Symposium on Algorithms, pages 24--35. Springer, 2012.Google ScholarDigital Library
- T. Akiba, Y. Iwata, K.-i. Kawarabayashi, and Y. Kawata. Fast shortest-path distance queries on road networks by pruned highway labeling. In 2014 Proceedings of the Sixteenth Workshop on Algorithm Engineering and Experiments (ALENEX), pages 147--154. SIAM, 2014.Google ScholarCross Ref
- T. Akiba, Y. Iwata, and Y. Yoshida. Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 349--360. ACM, 2013.Google ScholarDigital Library
- R. Angles, M. Arenas, P. Barceló, A. Hogan, J. Reutter, and D. Vrgoč. Foundations of modern query languages for graph databases. ACM Computing Surveys (CSUR), 50(5):68, 2017.Google Scholar
- P. B. Baeza. Querying graph databases. In R. Hull and W. Fan, editors, Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, New York, NY, USA - June 22 - 27, 2013, pages 175--188. ACM, 2013.Google Scholar
- C. Barrett, R. Jacob, and M. Marathe. Formal-language-constrained path problems. SIAM Journal on Computing, 30(3):809--837, 2000.Google ScholarDigital Library
- S. Beamer, K. Asanović, and D. Patterson. Direction-optimizing breadth-first search. Scientific Programming, 21(3-4):137--148, 2013.Google ScholarDigital Library
- F. Bonchi, A. Gionis, F. Gullo, and A. Ukkonen. Distance oracles in edge-labeled graphs. In EDBT, pages 547--558, 2014.Google Scholar
- M. Chen, Y. Gu, Y. Bao, and G. Yu. Label and distance-constraint reachability queries in uncertain graphs. In International Conference on Database Systems for Advanced Applications, pages 188--202. Springer, 2014.Google ScholarCross Ref
- X. Chen, L. Lai, L. Qin, and X. Lin. Structsim: Querying structural node similarity at billion scale. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020.Google Scholar
- Y. Chen, Y. Fang, R. Cheng, Y. Li, X. Chen, and J. Zhang. Exploring communities in large profiled graphs. IEEE Transactions on Knowledge and Data Engineering, 2018.Google Scholar
- J. Cheng, S. Huang, H. Wu, and A. W.-C. Fu. Tf-label: a topological-folding labeling scheme for reachability querying in a large graph. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 193--204. ACM, 2013.Google ScholarDigital Library
- J. Cheng and J. X. Yu. On-line exact shortest distance query processing. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pages 481--492. ACM, 2009.Google ScholarDigital Library
- J. Cheng, J. X. Yu, X. Lin, H. Wang, and S. Y. Philip. Fast computation of reachability labeling for large graphs. In International Conference on Extending Database Technology, pages 961--979. Springer, 2006.Google ScholarDigital Library
- P. Choudhary. A survey on social network analysis for counterterrorism. International Journal of Computer Applications, 112(09), 2015.Google Scholar
- E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. SIAM Journal on Computing, 32(5):1338--1355, 2003.Google ScholarDigital Library
- Y. Fang, R. Cheng, Y. Chen, S. Luo, and J. Hu. Effective and efficient attributed community search. The VLDB Journal, 26(6):803--828, 2017.Google ScholarDigital Library
- Y. Fang, R. Cheng, X. Li, S. Luo, and J. Hu. Effective community search over large spatial graphs. PVLDB, 10(6):709--720, 2017.Google ScholarDigital Library
- Y. Fang, R. Cheng, S. Luo, and J. Hu. Effective community search for large attributed graphs. PVLDB, 9(12):1233--1244, 2016.Google ScholarDigital Library
- Y. Fang, R. Cheng, S. Luo, J. Hu, and K. Huang. C-explorer: browsing communities in large graphs. PVLDB, 10(12):1885--1888, 2017.Google ScholarDigital Library
- Y. Fang, X. Huang, L. Qin, Y. Zhang, W. Zhang, R. Cheng, and X. Lin. A survey of community search over big graphs. The VLDB Journal, 2019.Google ScholarCross Ref
- Y. Fang, Z. Wang, R. Cheng, X. Li, S. Luo, J. Hu, and X. Chen. On spatial-aware community search. TKDE, 31(4):783--798, 2019.Google ScholarDigital Library
- Y. Fang, Z. Wang, R. Cheng, H. Wang, and J. Hu. Effective and efficient community search over large directed graphs. TKDE, 31(11):2093--2107, 2019.Google ScholarCross Ref
- Y. Fang, Y. Yang, W. Zhang, X. Lin, and X. Cao. Effective and efficient community search over large heterogeneous information networks. PVLDB, 13(6), Feb. 2020.Google Scholar
- Y. Fang, K. Yu, R. Cheng, L. V. S. Lakshmanan, and X. Lin. Efficient algorithms for densest subgraph discovery. PVLDB, 12(11):1719--1732, July 2019.Google ScholarDigital Library
- M. S. Hassan, W. G. Aref, and A. M. Aly. Graph indexing for shortest-path finding over dynamic sub-graphs. In Proceedings of the 2016 International Conference on Management of Data, pages 1183--1197. ACM, 2016.Google ScholarDigital Library
- J. Hu, R. Cheng, K. C.-C. Chang, A. Sankar, Y. Fang, and B. Y. Lam. Discovering maximal motif cliques in large heterogeneous information networks. In International Conference on Data Engineering (ICDE), pages 746--757. IEEE, 2019.Google ScholarCross Ref
- J. Hu, X. Wu, R. Cheng, S. Luo, and Y. Fang. On minimal steiner maximum-connected subgraph queries. TKDE, 29(11):2455--2469, 2017.Google ScholarCross Ref
- R. Jin, H. Hong, H. Wang, N. Ruan, and Y. Xiang. Computing label-constraint reachability in graph databases. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pages 123--134. ACM, 2010.Google ScholarDigital Library
- R. Jin, N. Ruan, S. Dey, and J. Y. Xu. Scarab: scaling reachability computation on large graphs. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 169--180. ACM, 2012.Google ScholarDigital Library
- R. Jin and G. Wang. Simple, fast, and scalable reachability oracle. PVLDB, 6(14):1978--1989, 2013.Google ScholarDigital Library
- R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 813--826. ACM, 2009.Google ScholarDigital Library
- P. Klodt, G. Weikum, S. Bedathur, and S. Seufert. Indexing strategies for constrained shortest paths over large social networks. Universitat des Saarlandes, 2011.Google Scholar
- J. Kunegis. Konect: the koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web, pages 1343--1350. ACM, 2013.Google ScholarDigital Library
- L. Lai, Z. Qing, Z. Yang, X. Jin, Z. Lai, R. Wang, K. Hao, X. Lin, L. Qin, W. Zhang, Y. Zhang, Z. Qian, and J. Zhou. Distributed subgraph matching on timely dataflow. PVLDB, 12(10):1099--1112, June 2019.Google ScholarDigital Library
- J. Leskovec. Snap: Stanford large network dataset collection, 2016.Google Scholar
- Y. Li, M. L. Yiu, N. M. Kou, et al. An experimental study on hub labeling based shortest path algorithms. PVLDB, 11(4):445--457, 2017.Google Scholar
- B. Liu, L. Yuan, X. Lin, L. Qin, W. Zhang, and J. Zhou. Efficient (α, β)-core computation: An index-based approach. In The World Wide Web Conference, pages 1130--1141, 2019.Google ScholarDigital Library
- B. Liu, F. Zhang, C. Zhang, W. Zhang, and X. Lin. Corecube: Core decomposition in multilayer graphs. In International Conference on Web Information Systems Engineering, pages 694--710. Springer, 2019.Google ScholarDigital Library
- R. Schenkel, A. Theobald, and G. Weikum. Hopi: An efficient connection index for complex xml document collections. In International Conference on Extending Database Technology, pages 237--255. Springer, 2004.Google ScholarCross Ref
- S. Seufert, A. Anand, S. Bedathur, and G. Weikum. Ferrari: Flexible and efficient reachability range assignment for graph indexing. In 2013 IEEE 29th International Conference on Data Engineering (ICDE), pages 1009--1020. IEEE, 2013.Google ScholarDigital Library
- K. Simon. An improved algorithm for transitive closure on acyclic digraphs. Theoretical Computer Science, 58(1-3):325--346, 1988.Google ScholarDigital Library
- L. D. Valstar, G. H. Fletcher, and Y. Yoshida. Landmark indexing for evaluation of label-constrained reachability queries. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 345--358. ACM, 2017.Google ScholarDigital Library
- O. van Rest, S. Hong, J. Kim, X. Meng, and H. Chafi. Pgql: a property graph query language. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, page 7. ACM, 2016.Google Scholar
- S. J. van Schaik and O. de Moor. A memory efficient reachability data structure through bit vector compression. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 913--924. ACM, 2011.Google ScholarDigital Library
- S. Wadhwa, A. Prasad, S. Ranu, A. Bagchi, and S. Bedathur. Efficiently answering regular simple path queries on large labeled networks. In Proceedings of the 2019 International Conference on Management of Data, pages 1463--1480, 2019.Google ScholarDigital Library
- K. Wang, X. Cao, X. Lin, W. Zhang, and L. Qin. Efficient computing of radius-bounded k-cores. In 2018 IEEE 34th International Conference on Data Engineering (ICDE), pages 233--244. IEEE, 2018.Google ScholarCross Ref
- K. Wang, X. Lin, L. Qin, W. Zhang, and Y. Zhang. Vertex priority based butterfly counting for large-scale bipartite networks. PVLDB, 12(10):1139--1152, 2019.Google ScholarDigital Library
- K. Wang, X. Lin, L. Qin, W. Zhang, and Y. Zhang. Efficient bitruss decomposition for large-scale bipartite graphs. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020.Google Scholar
- H. Wei, J. X. Yu, C. Lu, and R. Jin. Reachability querying: An independent permutation labeling approach. PVLDB, 7(12):1191--1202, 2014.Google ScholarDigital Library
- P. T. Wood. Query languages for graph databases. ACM SIGMOD Record, 41(1):50--60, 2012.Google ScholarDigital Library
- Y. Yano, T. Akiba, Y. Iwata, and Y. Yoshida. Fast and scalable reachability queries on graphs by pruned labeling with landmarks and paths. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pages 1601--1606. ACM, 2013.Google ScholarDigital Library
- H. Yildirim, V. Chaoji, and M. J. Zaki. Grail: Scalable reachability index for large graphs. PVLDB, 3(1-2):276--284, 2010.Google ScholarDigital Library
- J. X. Yu and J. Cheng. Graph reachability queries: A survey. In Managing and Mining Graph Data, pages 181--215. Springer, 2010.Google ScholarCross Ref
- Y. Yuan, X. Lian, G. Wang, Y. Ma, and Y. Wang. Constrained shortest path query in a large time-dependent graph. PVLDB, 12(10):1058--1070, 2019.Google ScholarDigital Library
- F. Zhang, C. Li, Y. Zhang, L. Qin, and W. Zhang. Finding critical users in social communities: The collapsed core and truss problems. IEEE Transactions on Knowledge and Data Engineering, 32(1):78--91, 2018.Google ScholarCross Ref
- F. Zhang, Y. Zhang, L. Qin, W. Zhang, and X. Lin. Efficiently reinforcing social networks over user engagement and tie strength. In 2018 IEEE 34th International Conference on Data Engineering (ICDE), pages 557--568. IEEE, 2018.Google ScholarCross Ref
- X. Zhang and M. T. Özsu. Correlation constraint shortest path over large multi-relation graphs. PVLDB, 12(5):488--501, 2019.Google ScholarDigital Library
- Z. Zhang, J. X. Yu, L. Qin, Q. Zhu, and X. Zhou. I/o cost minimization: reachability queries processing over massive graphs. In Proceedings of the 15th International Conference on Extending Database Technology, pages 468--479. ACM, 2012.Google ScholarDigital Library
- L. Zou, K. Xu, J. X. Yu, L. Chen, Y. Xiao, and D. Zhao. Efficient processing of label-constraint reachability queries in large graphs. Information Systems, 40:47--66, 2014.Google ScholarDigital Library
Recommendations
Answering reachability queries with ordered label constraints over labeled graphs
AbstractReachability query plays a vital role in many graph analysis tasks. Previous researches proposed many methods to efficiently answer reachability queries between vertex pairs. Since many real graphs are labeled graph, it highly demands Label-...
Answering Label-Constrained Reachability Queries via Reduction Techniques
Database Systems for Advanced ApplicationsAbstractMany real-world graphs contain edge labels for representing different types of relations between nodes, such as social networks, biological networks and knowledge graphs. As a fundamental task, the label-constrained reachability (LCR) query asks ...
Answering label-constraint reachability in large graphs
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge managementIn this paper, we study a variant of reachability queries, called label-constraint reachability (LCR) queries, specifically,given a label set S and two vertices u1 and u2 in a large directed graph G, we verify whether there exists a path from u1 to u2 ...
Comments