skip to main content
research-article

Correlation constraint shortest path over large multi-relation graphs

Published:01 January 2019Publication History
Skip Abstract Section

Abstract

Multi-relation graphs intuitively capture the heterogeneous correlations among real-world entities by allowing multiple types of relationships to be represented as entity-connecting edges, i.e., two entities could be correlated with more than one type of relationship. This is important in various applications such as social network analysis, ecology, and bio-informatics. Existing studies on these graphs usually consider an edge label constraint perspective, where each edge contains only one label and each edge is considered independently. For example, there are lines of research focusing on reachability between two vertices under a set of edge label constraints, or finding paths whose consecutive edge labels satisfy a user-specified logical expression. This is too restricted in real graphs, and in this work, we define a generic correlation constraint on multi-relation graphs from the perspective of vertex correlations, where a correlation can be defined recursively. Specifically, we formalize and investigate the shortest path problem over large multi-relation graphs in the presence of both necessity and denial constraints, which have various real applications. We show that it is nontrivial to apply conventional graph traversal algorithms (e.g., BFS or DFS) to address the challenge. To effectively reduce the search space, we propose a Hybrid Relation Encoding method, a.k.a. HyRE, to encode both topological and relation information in a compact way. We conduct extensive experiments over large real-world graphs to validate the effectiveness and efficiency of the proposed solution.

References

  1. I. Abraham, D. Delling, A. V. Goldberg, and R. F. F. Werneck. Hierarchical hub labelings for shortest paths. In Proc. 20th Annual European Symp. on Algorithms, pages 24--35, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. T. Akiba, Y. Iwata, and Y. Yoshida. Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 349--360. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. H. V. N. Y. Angela Bonifati, George Fletcher. Querying graphs. In Synthesis Lectures on Data Management, Morgan Claypool, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. Angles. The property graph database model. In Proceedings of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management, 2018.Google ScholarGoogle Scholar
  5. C. Avery. Giraph: Large-scale graph processing infrastructure on hadoop. Proceedings of the Hadoop Summit. Santa Clara, 2011.Google ScholarGoogle Scholar
  6. M. Baitaluk, X. Qian, S. Godbole, A. Raval, A. Ray, and A. Gupta. Pathsys: integrating molecular interaction graphs for systems biology. BMC Bioinformatics, 7(1):55, Feb 2006.Google ScholarGoogle ScholarCross RefCross Ref
  7. P. Barceló, C. A. Hurtado, L. Libkin, and P. T. Wood. Expressive languages for path queries over graph-structured data. In Proc. ACM SIGACT-SIGMOD Symp. on Principles of Database Systems, pages 3--14. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. C. L. Barrett, K. R. Bisset, R. Jacob, G. Konjevod, and M. V. Marathe. Classical and contemporary shortest path problems in road networks: Implementation and experimental analysis of the TRANSIMS router. In Proc. 10th Annual European Symp. on Algorithms, pages 126--138, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. H. Bast, S. Funke, D. Matijevic, P. Sanders, and D. Schultes. Proc. 9th workshop on algorithm eng. and experiments. In Proceedings of the Nine Workshop on Algorithm Engineering and Experiments, 2007.Google ScholarGoogle Scholar
  10. S. Beamer, K. Asanovic, and D. A. Patterson. Direction-optimizing breadth-first search. Scientific Programming, 21(3--4):137--148, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. F. Bonchi, A. Gionis, F. Gullo, and A. Ukkonen. Distance oracles in edge-labeled graphs. In Proc. 17th Int. Conf. on Extending Database Technology, pages 547--558, 2014.Google ScholarGoogle Scholar
  12. D. Calvanese, G. D. Giacomo, M. Lenzerini, and M. Y. Vardi. Containment of conjunctive regular path queries with inverse. In KR, pages 176--185. Morgan Kaufmann, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Cardillo, J. Gómez-Gardeñes, M. Zanin, M. Romance, D. Papo, F. del Pozo, and S. Boccaletti. Emergence of network features from multiplexity. CoRR, abs/1212.2153, 2012.Google ScholarGoogle Scholar
  14. M. Coscia, G. Rossetti, D. Pennacchioli, D. Ceccarelli, and F. Giannotti. You know because i know: a multidimensional network approach to human resources problem. In Advances in Social Networks Analysis and Mining, pages 434--441, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Dibbelt, T. Pajor, and D. Wagner. User-constrained multimodal route planning. ACM J. Experimental Algorithmics, 19(1), 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. E. W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269--271, 1959. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. W. Fan, J. Li, S. Ma, N. Tang, and Y. Wu. Adding regular expressions to graph reachability and pattern queries. Front. Comput. Sci., 6(3):313--338, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  18. D. Florescu, A. Y. Levy, and D. Suciu. Query containment for conjunctive queries with regular expressions. In Proc. ACM SIGACT-SIGMOD Symp. on Principles of Database Systems, pages 139--148. ACM Press, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. R. Geisberger, M. N. Rice, P. Sanders, and V. J. Tsotras. Route planning with flexible edge restrictions. ACM J. Experimental Algorithmics, 17(1), 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. R. Geisberger, P. Sanders, D. Schultes, and D. Delling. Contraction hierarchies: Faster and simpler hierarchical routing in road networks. In Proc. Int. Workshop on Experimental and Efficient Algorithms, pages 319--333, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. A. Gubichev, S. Bedathur, S. Seufert, and G. Weikum. Fast and accurate estimation of shortest paths in large graphs. In Proc. 19th ACM Int. Conf. on Information and Knowledge Management, pages 499--508. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. P. E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Systems Science and Cybernetics, 4(2): 100--107, 1968.Google ScholarGoogle ScholarCross RefCross Ref
  23. M. S. Hassan, W. G. Aref, and A. M. Aly. Graph indexing for shortest-path finding over dynamic sub-graphs. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1183--1197, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Håstad. Clique is hard to approximate within n<sup>1-epsilon</sup>. In FOCS, pages 627--636. IEEE Computer Society, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Janusgraph: Distributed graph database. http://janusgraph.org/, 2018.Google ScholarGoogle Scholar
  26. A. Kanterakis, D. Kafetzopoulos, V. Moustakis, and G. Potamias. Mining gene expression profiles and gene regulatory networks: Identification of phenotype-specific molecular mechanisms. In Proc. Artificial Intelligence: Theories, Models and Applications, 5th Hellenic Conf. on AI, pages 97--109, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. D. Lasalle and G. Karypis. Multi-threaded graph partitioning. In Proc. 27th IEEE Int. Parallel & Distributed Processing Symp., pages 225--236, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Y. Liang and P. Zhao. Similarity search in graph databases: A multi-layered indexing approach. In Proc. 33st Int. Conf. on Data Engineering, pages 783--794, 2017.Google ScholarGoogle ScholarCross RefCross Ref
  29. S. Ma, K. Feng, J. Li, H. Wang, G. Cong, and J. Huai. Proxies for shortest path and distance queries. IEEE Trans. Knowl. and Data Eng., 28(7):1835--1850, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  30. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 135--146. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. M. Mao, J. Lu, G. Zhang, and J. Zhang. Multirelational social recommendations via multigraph ranking. IEEE Trans. Cybernetics, 47(12):4049--4061, 2017.Google ScholarGoogle ScholarCross RefCross Ref
  32. J. J. McAuley and J. Leskovec. Image labeling on a network: Using social-network metadata for image classification. In ECCV (4), volume 7575 of Lecture Notes in Computer Science, pages 828--841. Springer, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. A. O. Mendelzon and P. T. Wood. Finding regular simple paths in graph databases. SIAM J. on Comput., 24(6):1235--1258, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Microsoft academic graph. https://www.openacademic.ai/oag/, 2015.Google ScholarGoogle Scholar
  35. V. Rastogi, A. Machanavajjhala, L. Chitnis, and A. D. Sarma. Finding connected components in map-reduce in logarithmic rounds. In Proc. 29th Int. Conf. on Data Engineering, pages 50--61, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. M. N. Rice and V. J. Tsotras. Graph indexing of road networks for shortest path queries with label restrictions. PVLDB, 4(2):69--80, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. H. Samet, J. Sankaranarayanan, and H. Alborzi. Scalable network distance browsing in spatial databases. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 43--54, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. A. Sealfon. Shortest paths and distances with differential privacy. In Proc. 35nd ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, pages 29--41, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. P. Selmer, A. Poulovassilis, and P. T. Wood. Implementing flexible operators for regular path queries. In Proc. 18th Int. Conf. on Extending Database Technology, pages 149--156, 2015.Google ScholarGoogle Scholar
  40. T. Shulong, B. Jiajun, C. Chun, and H. Xiaofei. Using rich social media information for music recommendation via hypergraph model. In ACM Trans. Multimedia Comp., Comm., and Appl., volume 7S, pages 213--237. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. M. Stella, C. S. Andreazzi, S. Selakovic, A. Goudarzi, and A. Antonioni. Parasite spreading in spatial ecological multiplex networks. J. Complex Networks, 5(3):486--511, 2017.Google ScholarGoogle Scholar
  42. D. Szklarczyk, A. Franceschini, S. Wyder, K. Forslund, D. Heller, J. Huerta-Cepas, M. Simonovic, A. Roth, A. Santos, K. P. Tsafou, M. Kuhn, P. Bork, L. J. Jensen, and C. von Mering. String v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Research, 43(D1):D447, 2014.Google ScholarGoogle Scholar
  43. D. Szklarczyk, J. H. Morris, H. Cook, M. Kuhn, S. Wyder, M. Simonovic, A. Santos, N. T. Doncheva, A. Roth, P. Bork, L. J. Jensen, and C. von Mering. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Research, 45(Database-Issue):D362--D368, 2017.Google ScholarGoogle Scholar
  44. L. Tang, X. Wang, and H. Liu. Community detection via heterogeneous interaction analysis. Data Min. Knowl. Discov., 25(1):1--33, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. The neo4j graph platform. https://neo4j.com/, 2018.Google ScholarGoogle Scholar
  46. L. D. J. Valstar, G. H. L. Fletcher, and Y. Yoshida. Landmark indexing for evaluation of label-constrained reachability queries. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 345--358, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. S. Wang, Y. Tsai, T. Hong, and H. Kao. k-anonymization of multiple shortest paths. J. Soft Comput., 21(15):4215--4226, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. S. Wang, Z. Tsai, T. Hong, and I. Ting. Anonymizing shortest paths on social network graphs. In Proc. 3rd Int. Conf. on Intell. Information and Database Syst., pages 129--136, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. F. Wei. Tedi: Efficient shortest path query answering on graphs. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 99--110, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. F. Wei. TEDI: efficient shortest path query answering on graphs. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 99--110, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. D. Yan, J. Cheng, Y. Lu, and W. Ng. Blogel: A block-centric framework for distributed computation on real-world graphs. PVLDB, 7(14):1981--1992, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 12, Issue 5
    January 2019
    163 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 January 2019
    Published in pvldb Volume 12, Issue 5

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader