ABSTRACT
We investigate a new approach to the design of distributed, shared-nothing RDF engines. Our engine, coined "TriAD", combines join-ahead pruning via a novel form of RDF graph summarization with a locality-based, horizontal partitioning of RDF triples into a grid-like, distributed index structure. The multi-threaded and distributed execution of joins in TriAD is facilitated by an asynchronous Message Passing protocol which allows us to run multiple join operators along a query plan in a fully parallel, asynchronous fashion. We believe that our architecture provides a so far unique approach to join-ahead pruning in a distributed environment, as the more classical form of sideways information passing would not permit for executing distributed joins in an asynchronous way. Our experiments over the LUBM, BTC and WSDTS benchmarks demonstrate that TriAD consistently outperforms centralized RDF engines by up to two orders of magnitude, while gaining a factor of more than three compared to the currently fastest, distributed engines. To our knowledge, we are thus able to report the so far fastest query response times for the above benchmarks using a mid-range server and regular Ethernet setup.
- D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. SW-Store: A vertically partitioned DBMS for Semantic Web dat a management. The VLDB Journal, 18(2), 2009. Google ScholarDigital Library
- M. Atre, V. Chaoji, M. J. Zaki, and J. A. Hendler. Matrix "Bit" loaded: A scalable lightweight join query processor for RDF data. In WWW, 2010. Google ScholarDigital Library
- J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Communications of ACM, 51(1), 2008. Google ScholarDigital Library
- J. Dittrich, J.-A. Quiané-Ruiz, A. Jindal, Y. Kargin, V. Setty, and J. Schad. Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing). PVLDB, 3(1), 2010. Google ScholarDigital Library
- P. C. et al. NoSQL Databases for RDF: An Empirical Evaluation. In ISWC, 2013.Google Scholar
- T. M. Forum. MPI: A Message Passing Interface, 1993.Google Scholar
- S. Harris, N. Lamb, and N. Shadbolt. 4store: The design and implementation of a clustered RDF store. In SSWS, 2009.Google Scholar
- J. Huang, D. J. Abadi, and K. Ren. Scalable SPARQL Querying of Large RDF Graphs. PVLDB, 4(11), 2011.Google Scholar
- G. Karypis and V. Kumar. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J. Sci. Comput., 20(1), 1998. Google ScholarDigital Library
- J. Lin and C. Dyer. Data-Intensive Text Processing with MapReduce. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, 2010. Google ScholarDigital Library
- G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, 2010. Google ScholarDigital Library
- T. Milo and D. Suciu. Index Structures for Path Expressions. In ICDT'99, volume 1540. Springer Berlin Heidelberg, 1999. Google ScholarDigital Library
- T. Neumann and G. Weikum. Scalable join processing on very large RDF graphs. In SIGMOD, 2009. Google ScholarDigital Library
- T. Neumann and G. Weikum. The RDF-3X engine for scalable management of RDF data. The VLDB Journal, 19(1), 2010. Google ScholarDigital Library
- T. Neumann and G. Weikum. x-RDF-3X: Fast Querying, High Update Rates, and Consistency for RDF Databases. PVLDB, 2010. Google ScholarDigital Library
- F. Picalausa, Y. Luo, G. H. L. Fletcher, J. Hidders, and S. Vansummeren. A structural approach to indexing triples. In ESWC, 2012. Google ScholarDigital Library
- K. Rohloff and R. E. Schantz. Clause-iteration with Map Reduce to scalably query datagraphs in SHARD graph-store. In DIDC, 2011. Google ScholarDigital Library
- S. Sakr and G. Al-Naymat. Relational Processing of RDF Queries: A Survey. SIGMOD Rec., 38(4), 2010. Google ScholarDigital Library
- S. Seufert, A. Anand, S. J. Bedathur, and G. Weikum. FERRA RI: Flexible and efficient reachability range assignment for graph indexing. In ICDE, 2013. Google ScholarDigital Library
- Z. Shang and J. X. Yu. Catch the Wind: Graph workload balancing on cloud. In ICDE, 2013. Google ScholarDigital Library
- B. Shao, H. Wang, and Y. Li. Trinity: a distributed graph engine on a memory cloud. In SIGMOD, 2013. Google ScholarDigital Library
- L. Sidirourgos, R. Goncalves, M. Kersten, N. Nes, and S. Manegold. Column-store support for RDF data management: not all swans are white. PVLDB, 1(2), 2008. Google ScholarDigital Library
- C. A. Sugar, Gareth, and M. James. Finding the number of clu sters in a data set: An information theoretic approach. Journal of the American Statistical Association, 98, 2003.Google Scholar
- P. Tsialiamanis, L. Sidirourgos, I. Fundulaki, V. Chris tophides, and P. A. Boncz. Heuristics-based query optimisation for SPARQL. In EDBT, 2012. Google ScholarDigital Library
- J. Webber. A programmatic introduction to Neo4j. In SPLASH, 2012. Google ScholarDigital Library
- C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. PVLDB, 1(1), 2008. Google ScholarDigital Library
- R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica. GraphX: a resilient distributed graph system on Spark. In GRADES, 2013. Google ScholarDigital Library
- P. Yuan, P. Liu, B. Wu, H. Jin, W. Zhang, and L. Liu. Triple Bit: a Fast and Compact System for Large Scale RDF Data. PVLDB, 6(7), 2013. Google ScholarDigital Library
- M. Zaharia, N. M. M. Chowdhury, M. Franklin, S. Shenker, and I. Stoica. Spark: Cluster Computing with Working Sets. Tech Report UCB/EECS-2010-53, EECS Department, UC Berkeley, May 2010.Google Scholar
- K. Zeng, J. Yang, H. Wang, B. Shao, and Z. Wang. A distributed graph engine for web scale RDF data. PVLDB, 6(4), 2013. Google ScholarDigital Library
- X. Zhang, L. Chen, Y. Tong, and M. Wang. EAGRE: Towards scalable I/O efficient SPARQL query evaluation on the cloud. In ICDE, 2013. Google ScholarDigital Library
- L. Zou, J. Mo, L. Chen, M. T. Özsu, and D. Zhao. gStore: Answering SPARQL Queries via Subgraph Matching. PVLDB, 4(8), 2011. Google ScholarDigital Library
Index Terms
- TriAD: a distributed shared-nothing RDF engine based on asynchronous message passing
Recommendations
ConClass: A Framework for Real-Time Distributed Knowledge-Based Processing
We have developed a problem-solving framework, called ConClass, that is capable of classifying continuous real-time problems dynamically and concurrently on a distributed system. ConClass provides an efficient development environment for describing and ...
Deadlock-free asynchronous message reordering in rust with multiparty session types
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel ProgrammingRust is a modern systems language focused on performance and reliability. Complementing Rust's promise to provide "fearless concurrency", developers frequently exploit asynchronous message passing. Unfortunately, sending and receiving messages in an ...
Minimizing communication overhead for matrix inversion algorithms on hypercubes
IPPS '95: Proceedings of the 9th International Symposium on Parallel ProcessingWe propose novel parallel Gauss-Jordan inversion algorithms (with or without partial pivoting) under different data partitioning strategies. The machine model we assume is a MIMD hypercube, using asynchronous message passing, with the software ...
Comments