ABSTRACT
We show that mature relational query optimizers can be exploited, through a novel schema, to deliver better property graph storage and retrieval performance than popular NoSQL graph stores. The schema combines relational storage for adjacency information with JSON storage for vertex and edge attributes, and we demonstrate that this design has benefits over both a purely relational and a purely JSON solution. Our query translation mechanism translates Gremlin queries that have no side effects into SQL queries, so that they can take advantage of relational query optimizers. We also empirically evaluate the schema design and the query translation mechanism against two popular existing property graph stores. On graphs with 4.3 billion edges, our system achieves 2 to 8 times better query performance and 10 to 30 times higher throughput than these stores.
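The abstract's two ingredients, relational adjacency plus JSON attributes and the translation of side-effect-free Gremlin into SQL, can be illustrated with a small sketch. It assumes a deliberately simplified schema (a hypothetical `vertex` table with a JSON `attrs` column and a hypothetical `edge` table holding `src`, `dst`, `label` and JSON attributes), not SQLGraph's actual table layout. It also represents a Gremlin pipeline as a list of tuples instead of parsing Gremlin text, and handles only `V`, `has`, `out`, `in` and `values`. Each step wraps the previous SQL as a subquery and joins against it, so duplicate traversers are kept as in Gremlin. The sketch uses Python's built-in `sqlite3` and registers a small hypothetical helper, `json_get`, so it does not depend on the local SQLite build shipping the JSON functions.

```python
import json
import sqlite3

# Hypothetical sample graph: vertex id -> JSON-serializable attributes.
SAMPLE_VERTICES = {
    1: {"name": "alice", "age": 30},
    2: {"name": "bob", "age": 25},
    3: {"name": "carol", "age": 35},
}
# Hypothetical sample edges: edge id -> (src, dst, label, attributes).
SAMPLE_EDGES = {
    10: (1, 2, "knows", {"since": 2010}),
    11: (1, 3, "knows", {"since": 2012}),
    12: (2, 3, "likes", {}),
}


def json_get(doc, key):
    """Return attribute `key` from a JSON object string (None if absent)."""
    if doc is None:
        return None
    value = json.loads(doc).get(key)
    # SQLite user functions must return scalars, so nested values become JSON text.
    return json.dumps(value) if isinstance(value, (dict, list)) else value


def build_store(vertices, edges):
    """In-memory store: adjacency in relational columns, attributes as JSON text."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("json_get", 2, json_get)
    conn.execute("CREATE TABLE vertex (vid INTEGER PRIMARY KEY, attrs TEXT)")
    conn.execute(
        "CREATE TABLE edge (eid INTEGER PRIMARY KEY, src INTEGER, dst INTEGER,"
        " label TEXT, attrs TEXT)"
    )
    # Indexes on the relational adjacency columns serve out() and in() lookups.
    conn.execute("CREATE INDEX edge_out ON edge (src, label)")
    conn.execute("CREATE INDEX edge_in ON edge (dst, label)")
    conn.executemany(
        "INSERT INTO vertex VALUES (?, ?)",
        [(vid, json.dumps(attrs)) for vid, attrs in vertices.items()],
    )
    conn.executemany(
        "INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
        [(eid, s, d, lbl, json.dumps(a)) for eid, (s, d, lbl, a) in edges.items()],
    )
    return conn


def translate(steps):
    """Translate a side-effect-free pipeline such as
    [("V",), ("has", "name", "alice"), ("out", "knows"), ("values", "name")]
    into one nested SQL query plus its positional parameters."""
    if not steps or tuple(steps[0]) != ("V",):
        raise ValueError("pipeline must start with ('V',)")
    sql, params = "SELECT vid FROM vertex", []
    body = steps[1:]
    # Each step joins against the previous query (rather than using IN),
    # which keeps duplicate traversers the way Gremlin does.
    for i, (op, *args) in enumerate(body):
        if op == "has":
            key, value = args
            sql = (f"SELECT v.vid AS vid FROM ({sql}) AS t "
                   "JOIN vertex v ON v.vid = t.vid WHERE json_get(v.attrs, ?) = ?")
            params = params + [key, value]
        elif op in ("out", "in"):
            (label,) = args
            near, far = ("src", "dst") if op == "out" else ("dst", "src")
            sql = (f"SELECT e.{far} AS vid FROM ({sql}) AS t "
                   f"JOIN edge e ON e.{near} = t.vid WHERE e.label = ?")
            params = params + [label]
        elif op == "values":
            if i != len(body) - 1:
                raise ValueError("values() must be the final step")
            (key,) = args
            sql = (f"SELECT json_get(v.attrs, ?) AS val FROM ({sql}) AS t "
                   "JOIN vertex v ON v.vid = t.vid")
            # This placeholder appears before the subquery's, so it goes first.
            params = [key] + params
        else:
            raise ValueError(f"unsupported step: {op!r}")
    return sql, params


def run(conn, steps):
    """Translate the pipeline and execute it as a single SQL statement."""
    sql, params = translate(steps)
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = build_store(SAMPLE_VERTICES, SAMPLE_EDGES)
    # Gremlin: g.V().has('name','alice').out('knows').values('name')
    print(sorted(run(conn, [("V",), ("has", "name", "alice"),
                            ("out", "knows"), ("values", "name")])))
```

Because the whole pipeline becomes one nested SQL statement, the relational optimizer is free to choose join order and index use for it; the sketch makes no claim about how SQLGraph itself structures or optimizes the generated SQL. The generated query has no `ORDER BY`, so results are sorted before printing.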
SQLGraph: An Efficient Relational-Based Property Graph Store