ABSTRACT
Use of graphs is growing rapidly in social networks, semantic web, biological databases, scientific workflow provenance, and other areas. Regular Path Queries (RPQs) can be seen as a core graph query language to answer pattern-based reachability queries. Unfortunately, the number of freely available systems for querying graphs using RPQs is rather limited, and available implementations do not provide direct support for a number of desirable variants of RPQs, e.g., to return those edges that are contained in some (or all) paths that match the given regular expression R. Thus, by returning not just a pair (x, y) of end points of paths that match R, but also "witness edges" (u, v) inbetween, our RPQ variants can be understood as returning additional provenance information about the answer (x, y), i.e., those edges (u, v) that are in some (or all) paths from x to y matching R. We propose a number of such RPQ variants and show how they can be implemented using either Datalog or a suitable RDBMS. Our initial experimental results indicate that RPQs and our provenance-aware variants (RPQProv), when implemented using conventional relational technologies, yield reasonable performance even for relatively large graphs. On the other hand, the overhead associated with some of these variants also makes efficient handling of provenance-aware graph queries an interesting challenge for future research.
- F. Afrati and F. Toni. Chain queries expressible by linear datalog programs. In Deductive Databases and Logic Programming (DDLP), pages 49--58, 1997.Google Scholar
- F. N. Afrati and J. D. Ullman. Transitive closure and recursive datalog implemented on clusters. In Proceedings of the 15th International Conference on Extending Database Technology, EDBT '12, pages 132--143, New York, NY, USA, 2012. ACM. Google ScholarDigital Library
- M. Arenas, S. Conca, and J. Pérez. Counting beyond a yottabyte, or how sparql 1.1 property paths will prevent adoption of the standard. In Proceedings of the 21st international conference on World Wide Web, WWW '12, pages 629--638, New York, NY, USA, 2012. ACM. Google ScholarDigital Library
- V. Cuevas-Vicenttín, S. Dey, B. Ludäscher, M. Li Yuan Wang, and T. Song. Modeling and querying scientific workflow provenance in the d-opm. In 7th Workshop on Workflows in Support of Large-Scale Science (WORKS), november 2012.Google Scholar
- O. De Moor, D. Lacey, and E. Van Wyk. Universal regular path queries. Higher Order Symbol. Comput., 16(1-2):15--35, Mar. 2003. Google ScholarDigital Library
- J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Commun. ACM, 51(1):107--113, Jan. 2008. Google ScholarDigital Library
- S. Dey, S. Köhler, S. Bowers, and B. Ludäscher. Datalog as a lingua franca for provenance querying and reasoning. In Workshop on the Theory and Practice of Provenance (TaPP), 2012. Google ScholarDigital Library
- W. Fan, X. Wang, and Y. Wu. Performance guarantees for distributed reachability queries. Proc. VLDB Endow., 5(11):1304--1316, July 2012. Google ScholarDigital Library
- R. Kabler, Y. E. Ioannidis, and M. J. Carey. Performance evaluation of algorithms for transitive closure. Inf. Syst., 17(5):415--441, Sept. 1992. Google ScholarDigital Library
- A. Koschmieder and U. Leser. Regular path queries on large graphs. In Proceedings of the 24th international conference on Scientific and Statistical Database Management, SSDBM'12, pages 177--194, Berlin, Heidelberg, 2012. Springer-Verlag. Google ScholarDigital Library
- Y. A. Liu, T. Rothamel, F. Yu, S. D. Stoller, and N. Hu. Parametric regular path queries. In Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, PLDI '04, pages 219--230, New York, NY, USA, 2004. ACM. Google ScholarDigital Library
- G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, SIGMOD '10, pages 135--146, New York, NY, USA, 2010. ACM. Google ScholarDigital Library
- A. O. Mendelzon and P. T. Wood. Finding regular simple paths in graph databases. SIAM J. Comput., 24(6):1235--1258, Dec. 1995. Google ScholarDigital Library
- L. Moreau, B. Clifford, J. Freire, J. Futrelle, Y. Gil, P. Groth, N. Kwasnikowska, S. Miles, P. Missier, J. Myers, B. Plale, Y. Simmhan, E. Stephan, and J. V. den Bussche. The open provenance model core specification (v1.1). Future Gener. Comput. Syst., 27(6):743--756, June 2011. Google ScholarDigital Library
- J. D. Ullman and A. Van Gelder. Parallel complexity of logical query programs. In Proceedings of the 27th Annual Symposium on Foundations of Computer Science, SFCS '86, pages 438--454, Washington, DC, USA, 1986. IEEE Computer Society. Google ScholarDigital Library
- M. L. Y. Wang. Querying Provenance as Regular Path Queries with Relational Databases. Master's thesis, University of California, Davis, 2012.Google Scholar
Index Terms
- On implementing provenance-aware regular path queries with relational query engines
Recommendations
Decidable containment of recursive queries
Database theoryOne of the most important reasoning tasks on queries is checking containment, i.e., verifying whether one query yields necessarily a subset of the result of another one. Query containment is crucial in several contexts, such as query optimization, query ...
Estimating the Evaluation Cost of Regular Path Queries on Large Graphs
SoICT '17: Proceedings of the 8th International Symposium on Information and Communication TechnologyRegular path queries (RPQs) are widely used on a graph whose answer is a set of tuples of nodes connected by paths corresponding to a given regular expression. Traditional approaches for evaluating RPQs are restricted in the explosion of graph size and/...
Regular path queries under approximate semantics
We give a general framework for approximate query processing in semistructured databases. We focus on regular path queries, which are the integral part of most of the query languages for semistructured databases. To enable approximations, we allow the ...
Comments