skip to main content
10.1145/2588555.2610511acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

TriAD: a distributed shared-nothing RDF engine based on asynchronous message passing

Authors Info & Claims
Published:18 June 2014Publication History

ABSTRACT

We investigate a new approach to the design of distributed, shared-nothing RDF engines. Our engine, coined "TriAD", combines join-ahead pruning via a novel form of RDF graph summarization with a locality-based, horizontal partitioning of RDF triples into a grid-like, distributed index structure. The multi-threaded and distributed execution of joins in TriAD is facilitated by an asynchronous Message Passing protocol which allows us to run multiple join operators along a query plan in a fully parallel, asynchronous fashion. We believe that our architecture provides a so far unique approach to join-ahead pruning in a distributed environment, as the more classical form of sideways information passing would not permit for executing distributed joins in an asynchronous way. Our experiments over the LUBM, BTC and WSDTS benchmarks demonstrate that TriAD consistently outperforms centralized RDF engines by up to two orders of magnitude, while gaining a factor of more than three compared to the currently fastest, distributed engines. To our knowledge, we are thus able to report the so far fastest query response times for the above benchmarks using a mid-range server and regular Ethernet setup.

References

  1. D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. SW-Store: A vertically partitioned DBMS for Semantic Web dat a management. The VLDB Journal, 18(2), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. Atre, V. Chaoji, M. J. Zaki, and J. A. Hendler. Matrix "Bit" loaded: A scalable lightweight join query processor for RDF data. In WWW, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Communications of ACM, 51(1), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. Dittrich, J.-A. Quiané-Ruiz, A. Jindal, Y. Kargin, V. Setty, and J. Schad. Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing). PVLDB, 3(1), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. C. et al. NoSQL Databases for RDF: An Empirical Evaluation. In ISWC, 2013.Google ScholarGoogle Scholar
  6. T. M. Forum. MPI: A Message Passing Interface, 1993.Google ScholarGoogle Scholar
  7. S. Harris, N. Lamb, and N. Shadbolt. 4store: The design and implementation of a clustered RDF store. In SSWS, 2009.Google ScholarGoogle Scholar
  8. J. Huang, D. J. Abadi, and K. Ren. Scalable SPARQL Querying of Large RDF Graphs. PVLDB, 4(11), 2011.Google ScholarGoogle Scholar
  9. G. Karypis and V. Kumar. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J. Sci. Comput., 20(1), 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Lin and C. Dyer. Data-Intensive Text Processing with MapReduce. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. T. Milo and D. Suciu. Index Structures for Path Expressions. In ICDT'99, volume 1540. Springer Berlin Heidelberg, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. T. Neumann and G. Weikum. Scalable join processing on very large RDF graphs. In SIGMOD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. T. Neumann and G. Weikum. The RDF-3X engine for scalable management of RDF data. The VLDB Journal, 19(1), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. T. Neumann and G. Weikum. x-RDF-3X: Fast Querying, High Update Rates, and Consistency for RDF Databases. PVLDB, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. F. Picalausa, Y. Luo, G. H. L. Fletcher, J. Hidders, and S. Vansummeren. A structural approach to indexing triples. In ESWC, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. K. Rohloff and R. E. Schantz. Clause-iteration with Map Reduce to scalably query datagraphs in SHARD graph-store. In DIDC, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Sakr and G. Al-Naymat. Relational Processing of RDF Queries: A Survey. SIGMOD Rec., 38(4), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Seufert, A. Anand, S. J. Bedathur, and G. Weikum. FERRA RI: Flexible and efficient reachability range assignment for graph indexing. In ICDE, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Z. Shang and J. X. Yu. Catch the Wind: Graph workload balancing on cloud. In ICDE, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. B. Shao, H. Wang, and Y. Li. Trinity: a distributed graph engine on a memory cloud. In SIGMOD, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. L. Sidirourgos, R. Goncalves, M. Kersten, N. Nes, and S. Manegold. Column-store support for RDF data management: not all swans are white. PVLDB, 1(2), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. C. A. Sugar, Gareth, and M. James. Finding the number of clu sters in a data set: An information theoretic approach. Journal of the American Statistical Association, 98, 2003.Google ScholarGoogle Scholar
  24. P. Tsialiamanis, L. Sidirourgos, I. Fundulaki, V. Chris tophides, and P. A. Boncz. Heuristics-based query optimisation for SPARQL. In EDBT, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. Webber. A programmatic introduction to Neo4j. In SPLASH, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. PVLDB, 1(1), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica. GraphX: a resilient distributed graph system on Spark. In GRADES, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. P. Yuan, P. Liu, B. Wu, H. Jin, W. Zhang, and L. Liu. Triple Bit: a Fast and Compact System for Large Scale RDF Data. PVLDB, 6(7), 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. Zaharia, N. M. M. Chowdhury, M. Franklin, S. Shenker, and I. Stoica. Spark: Cluster Computing with Working Sets. Tech Report UCB/EECS-2010-53, EECS Department, UC Berkeley, May 2010.Google ScholarGoogle Scholar
  30. K. Zeng, J. Yang, H. Wang, B. Shao, and Z. Wang. A distributed graph engine for web scale RDF data. PVLDB, 6(4), 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. X. Zhang, L. Chen, Y. Tong, and M. Wang. EAGRE: Towards scalable I/O efficient SPARQL query evaluation on the cloud. In ICDE, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. L. Zou, J. Mo, L. Chen, M. T. Özsu, and D. Zhao. gStore: Answering SPARQL Queries via Subgraph Matching. PVLDB, 4(8), 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. TriAD: a distributed shared-nothing RDF engine based on asynchronous message passing

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
          June 2014
          1645 pages
          ISBN:9781450323765
          DOI:10.1145/2588555

          Copyright © 2014 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 18 June 2014

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          SIGMOD '14 Paper Acceptance Rate107of421submissions,25%Overall Acceptance Rate785of4,003submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader