research-article

TriAD: a distributed shared-nothing RDF engine based on asynchronous message passing

Authors:
Sairam Gurajada

Max-Planck Institute for Informatics, Saarbrücken, Germany

Max-Planck Institute for Informatics, Saarbrücken, Germany
View Profile

,
Stephan Seufert

Max-Planck Institute for Informatics, Saarbrücken, Germany

Max-Planck Institute for Informatics, Saarbrücken, Germany
View Profile

,
Iris Miliaraki

Max-Planck Institute for Informatics, Saarbrücken, Germany

Max-Planck Institute for Informatics, Saarbrücken, Germany
View Profile

,
Martin Theobald

University of Antwerp, Antwerp, Belgium

University of Antwerp, Antwerp, Belgium
View Profile

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of DataJune 2014Pages 289–300https://doi.org/10.1145/2588555.2610511

Published:18 June 2014Publication History

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

Pages 289–300

ABSTRACT

We investigate a new approach to the design of distributed, shared-nothing RDF engines. Our engine, coined "TriAD", combines join-ahead pruning via a novel form of RDF graph summarization with a locality-based, horizontal partitioning of RDF triples into a grid-like, distributed index structure. The multi-threaded and distributed execution of joins in TriAD is facilitated by an asynchronous Message Passing protocol which allows us to run multiple join operators along a query plan in a fully parallel, asynchronous fashion. We believe that our architecture provides a so far unique approach to join-ahead pruning in a distributed environment, as the more classical form of sideways information passing would not permit for executing distributed joins in an asynchronous way. Our experiments over the LUBM, BTC and WSDTS benchmarks demonstrate that TriAD consistently outperforms centralized RDF engines by up to two orders of magnitude, while gaining a factor of more than three compared to the currently fastest, distributed engines. To our knowledge, we are thus able to report the so far fastest query response times for the above benchmarks using a mid-range server and regular Ethernet setup.

References

D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. SW-Store: A vertically partitioned DBMS for Semantic Web dat a management. The VLDB Journal, 18(2), 2009. Google ScholarDigital Library
M. Atre, V. Chaoji, M. J. Zaki, and J. A. Hendler. Matrix "Bit" loaded: A scalable lightweight join query processor for RDF data. In WWW, 2010. Google ScholarDigital Library
J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Communications of ACM, 51(1), 2008. Google ScholarDigital Library
J. Dittrich, J.-A. Quiané-Ruiz, A. Jindal, Y. Kargin, V. Setty, and J. Schad. Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing). PVLDB, 3(1), 2010. Google ScholarDigital Library
P. C. et al. NoSQL Databases for RDF: An Empirical Evaluation. In ISWC, 2013.Google Scholar
T. M. Forum. MPI: A Message Passing Interface, 1993.Google Scholar
S. Harris, N. Lamb, and N. Shadbolt. 4store: The design and implementation of a clustered RDF store. In SSWS, 2009.Google Scholar
J. Huang, D. J. Abadi, and K. Ren. Scalable SPARQL Querying of Large RDF Graphs. PVLDB, 4(11), 2011.Google Scholar
G. Karypis and V. Kumar. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J. Sci. Comput., 20(1), 1998. Google ScholarDigital Library
J. Lin and C. Dyer. Data-Intensive Text Processing with MapReduce. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, 2010. Google ScholarDigital Library
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, 2010. Google ScholarDigital Library
T. Milo and D. Suciu. Index Structures for Path Expressions. In ICDT'99, volume 1540. Springer Berlin Heidelberg, 1999. Google ScholarDigital Library
T. Neumann and G. Weikum. Scalable join processing on very large RDF graphs. In SIGMOD, 2009. Google ScholarDigital Library
T. Neumann and G. Weikum. The RDF-3X engine for scalable management of RDF data. The VLDB Journal, 19(1), 2010. Google ScholarDigital Library
T. Neumann and G. Weikum. x-RDF-3X: Fast Querying, High Update Rates, and Consistency for RDF Databases. PVLDB, 2010. Google ScholarDigital Library
F. Picalausa, Y. Luo, G. H. L. Fletcher, J. Hidders, and S. Vansummeren. A structural approach to indexing triples. In ESWC, 2012. Google ScholarDigital Library
K. Rohloff and R. E. Schantz. Clause-iteration with Map Reduce to scalably query datagraphs in SHARD graph-store. In DIDC, 2011. Google ScholarDigital Library
S. Sakr and G. Al-Naymat. Relational Processing of RDF Queries: A Survey. SIGMOD Rec., 38(4), 2010. Google ScholarDigital Library
S. Seufert, A. Anand, S. J. Bedathur, and G. Weikum. FERRA RI: Flexible and efficient reachability range assignment for graph indexing. In ICDE, 2013. Google ScholarDigital Library
Z. Shang and J. X. Yu. Catch the Wind: Graph workload balancing on cloud. In ICDE, 2013. Google ScholarDigital Library
B. Shao, H. Wang, and Y. Li. Trinity: a distributed graph engine on a memory cloud. In SIGMOD, 2013. Google ScholarDigital Library
L. Sidirourgos, R. Goncalves, M. Kersten, N. Nes, and S. Manegold. Column-store support for RDF data management: not all swans are white. PVLDB, 1(2), 2008. Google ScholarDigital Library
C. A. Sugar, Gareth, and M. James. Finding the number of clu sters in a data set: An information theoretic approach. Journal of the American Statistical Association, 98, 2003.Google Scholar
P. Tsialiamanis, L. Sidirourgos, I. Fundulaki, V. Chris tophides, and P. A. Boncz. Heuristics-based query optimisation for SPARQL. In EDBT, 2012. Google ScholarDigital Library
J. Webber. A programmatic introduction to Neo4j. In SPLASH, 2012. Google ScholarDigital Library
C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. PVLDB, 1(1), 2008. Google ScholarDigital Library
R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica. GraphX: a resilient distributed graph system on Spark. In GRADES, 2013. Google ScholarDigital Library
P. Yuan, P. Liu, B. Wu, H. Jin, W. Zhang, and L. Liu. Triple Bit: a Fast and Compact System for Large Scale RDF Data. PVLDB, 6(7), 2013. Google ScholarDigital Library
M. Zaharia, N. M. M. Chowdhury, M. Franklin, S. Shenker, and I. Stoica. Spark: Cluster Computing with Working Sets. Tech Report UCB/EECS-2010-53, EECS Department, UC Berkeley, May 2010.Google Scholar
K. Zeng, J. Yang, H. Wang, B. Shao, and Z. Wang. A distributed graph engine for web scale RDF data. PVLDB, 6(4), 2013. Google ScholarDigital Library
X. Zhang, L. Chen, Y. Tong, and M. Wang. EAGRE: Towards scalable I/O efficient SPARQL query evaluation on the cloud. In ICDE, 2013. Google ScholarDigital Library
L. Zou, J. Mo, L. Chen, M. T. Özsu, and D. Zhao. gStore: Answering SPARQL Queries via Subgraph Matching. PVLDB, 4(8), 2011. Google ScholarDigital Library

Index Terms

TriAD: a distributed shared-nothing RDF engine based on asynchronous message passing
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
      2. Parallel and distributed DBMSs
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Recommendations

ConClass: A Framework for Real-Time Distributed Knowledge-Based Processing

We have developed a problem-solving framework, called ConClass, that is capable of classifying continuous real-time problems dynamically and concurrently on a distributed system. ConClass provides an efficient development environment for describing and ...
Read More
Deadlock-free asynchronous message reordering in rust with multiparty session types
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Rust is a modern systems language focused on performance and reliability. Complementing Rust's promise to provide "fearless concurrency", developers frequently exploit asynchronous message passing. Unfortunately, sending and receiving messages in an ...
Read More
Minimizing communication overhead for matrix inversion algorithms on hypercubes
IPPS '95: Proceedings of the 9th International Symposium on Parallel Processing

We propose novel parallel Gauss-Jordan inversion algorithms (with or without partial pivoting) under different data partitioning strategies. The machine model we assume is a MIMD hypercube, using asynchronous message passing, with the software ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
June 2014
1645 pages
ISBN:9781450323765
DOI:10.1145/2588555
General Chairs:
Curtis Dyreson
Utah State University, USA
,
Feifei Li
University of Utah, USA
,
Program Chair:
M. Tamer Özsu
University of Waterloo, Canada
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 June 2014
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
asynchronous message passing
distributed RDF indexing & SparQL processing
join-ahead pruning
parallel join evaluation
Qualifiers
- research-article
Conference

Acceptance Rates
SIGMOD '14 Paper Acceptance Rate107of421submissions,25%Overall Acceptance Rate785of4,003submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 119
  Total Citations
  View Citations
- 948
  Total Downloads
- Downloads (Last 12 months)54
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

TriAD: a distributed shared-nothing RDF engine based on asynchronous message passing

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

ConClass: A Framework for Real-Time Distributed Knowledge-Based Processing

Deadlock-free asynchronous message reordering in rust with multiparty session types

Minimizing communication overhead for matrix inversion algorithms on hypercubes