skip to main content
research-article

Using domain-specific languages for analytic graph databases

Published:01 September 2016Publication History
Skip Abstract Section

Abstract

Recently graph has been drawing lots of attention both as a natural data model that captures fine-grained relationships between data entities and as a tool for powerful data analysis that considers such relationships. In this paper, we present a new graph database system that integrates a robust graph storage with an efficient graph analytics engine. Primarily, our system adopts two domain-specific languages (DSLs), one for describing graph analysis algorithms and the other for graph pattern matching queries. Compared to the API-based approaches in conventional graph processing systems, the DSL-based approach provides users with more flexible and intuitive ways of expressing algorithms and queries. Moreover, the DSL-based approach has significant performance benefits as well, (1) by skipping (remote) API invocation overhead and (2) by applying high-level optimization from the compiler.

References

  1. AllegroGraph. http://franz.com/agraph/allegrograph/.Google ScholarGoogle Scholar
  2. Apache Giraph Project. http://giraph.apache.org.Google ScholarGoogle Scholar
  3. Apache TinkerPop. http://tinkerpop.incubator.apache.org.Google ScholarGoogle Scholar
  4. Boost Graph Library (BGL). http://www.boost.org/doc/libs/1_55_0/libs/graph/doc/index.html.Google ScholarGoogle Scholar
  5. Cypher - the Neo4j query Language. http://www.neo4j.org/learn/cypher.Google ScholarGoogle Scholar
  6. InfiniteGraph. http://www.objectivity.com/infinitegraph.Google ScholarGoogle Scholar
  7. Intel 64 and IA-32 Architectures Optimization Reference Manual. http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html,2016.Google ScholarGoogle Scholar
  8. Java universal network/graph framework. http://jung.sourceforge.net.Google ScholarGoogle Scholar
  9. Neo4j graph database. http://www.neo4j.org/.Google ScholarGoogle Scholar
  10. NetworkX. https://networkx.github.io.Google ScholarGoogle Scholar
  11. Oracle Spatial and Graph, RDF Semantic Graph,. http://www.oracle.com/technetwork/database/options/spatialandgraph/overview/rdfsemantic-graph-1902016.html.Google ScholarGoogle Scholar
  12. SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/.Google ScholarGoogle Scholar
  13. Stanford network analysis library. http://snap.stanford.edu/snap.Google ScholarGoogle Scholar
  14. Tinkerpop, Gremlin. https://github.com/tinkerpop/gremlin/wiki.Google ScholarGoogle Scholar
  15. Titan Distributed Graph Database. http://thinkaurelius.github.io/titan/.Google ScholarGoogle Scholar
  16. Virtuoso Universal Server. http://virtuoso.openlinksw.com/.Google ScholarGoogle Scholar
  17. Lada A. Adamic and Eytan Adar. Friends and neighbors on the web. Social Networks, 25(3):211--230, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  18. Pablo Barceló, Leonid Libkin, Anthony W. Lin, and Peter T. Wood. Expressive languages for path queries over graph-structured data. ACM Transactions on Database Systems (TODS), 37(4):31, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Thomas H. Cormen, Clifford Stein, Ronald L. Rivest, and Charles E. Leiserson. Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. David Ediger, Robert McColl, E. Jason Riedy, and David A. Bader. STINGER: High performance data structure for streaming graphs. In High Performance Extreme Computing (HPEC), pages 1--5. IEEE, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  21. Jing Fan, Adalbert Gerald Soosai Raj, and Jignesh M. Patel. The case against specialized graph analytics engines. In CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4-7, 2015, Online Proceedings, 2015.Google ScholarGoogle Scholar
  22. L. C. Freeman. Centrality in social networks: Conceptual clarification. Social Networks, 1(3):215--239, 1979.Google ScholarGoogle ScholarCross RefCross Ref
  23. Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. Wtf: The who to follow service at twitter. In Proceedings of the 22nd international conference on World Wide Web, pages 505--514. International World Wide Web Conferences Steering Committee, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. Green-Marl: A DSL for Easy and Efficient Graph Analysis. In ASPLOS, pages 349--362. ACM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Sungpack Hong, Siegfried Depner, Thomas Manhardt, Jan Van Der Lugt, Merijn Verstraaten, and Hassan Chafi. Pgx.d: A fast distributed graph processing engine. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, pages 58:1--58:12, New York, NY, USA, 2015. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Sungpack Hong, Semih Salihoglu, Jennifer Widom, and Kunle Olukotun. Simplifying scalable graph processing with a domain-specific language. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, pages 208:208--208:218, New York, NY, USA, 2014. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Sungpack Hong, Jan Van Der Lugt, Adam Welc, Raghavan Raman, and Hassan Chafi. Early experiences in using a domain-specific language for large-scale graph analysis. In First International Workshop on Graph Data Management Experiences and Systems, page 5. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Volker Kaibel and Matthias A. F. Peinhardt. On the bottleneck shortest path problem, technical report, ZIB-Report, 2006.Google ScholarGoogle Scholar
  29. U Kang, Charalampos E Tsourakakis, and Christos Faloutsos. Pegasus: A peta-scale graph mining system implementation and observations. In IEEE International Conference on Data Mining (ICDM), pages 229--238, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web, WWW '10, pages 591--600. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. Hellerstein. Distributed graphlab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8):716--727, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Kamesh Madduri, David Ediger, Karl Jiang, David A. Bader, and Daniel Chavarria-Miranda. A faster parallel algorithm and efficient multithreaded implementations for evaluating betweenness centrality on massive datasets. In Proceedings of the 2009 IEEE IPDPS, IPDPS '09, pages 1--8, Washington, DC, USA, 2009. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: A System for Large-scale Graph Processing. In SIGMOD. Proceedings of the 2010 international conference on Management of data, pages 135--146. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. R. C. McColl, D. Ediger, J. Poovey, D. Campbell, and D. A. Bader. A performance evaluation of open source graph databases. In Proceedings of the First Workshop on PPAA, PPAA '14, pages 11--18, New York, NY, USA, 2014. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Donald Nguyen, Andrew Lenharth, and Keshav Pingali. A lightweight infrastructure for graph analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pages 456--471. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. L. Page. Method for node ranking in a linked database, September 4 2001. US Patent 6,285,999.Google ScholarGoogle Scholar
  37. A. Roy, I. Mihailovic, and W. Zwaenepoel. X-stream: Edge-centric graph processing using streaming partitions. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 472--488. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Martin Sevenich, Sungpack Hong, Adam Welc, and Hassan Chafi. Fast in-memory triangle listing for large real-world graphs. In Proceedings of the 8th Workshop on Social Network Mining and Analysis, SNAKDD'14, pages 2:1--2:9, New York, NY, USA, 2014. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Jyothish Soman and Ankur Narang. Fast community detection algorithm with gpus and multicore architectures. In IPDPS, pages 568--579. IEEE, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Wen Sun, Achille Fokoue, Kavitha Srinivas, Anastasios Kementsietsidis, Gang Hu, and Guotong Xie. Sqlgraph: An efficient relational-based property graph store. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pages 1887--1901, New York, NY, USA, 2015. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In Proceedings of the 20th international conference on World wide web, pages 607--614. ACM, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Robert Endre Tarjan. Depth-first search and linear graph algorithms. SIAM J. Comput., 1(2):146--160, 1972.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. M. Then, M. Kaufmann, F. Chirigati, T. Hoang-Vu, K. Pham, A. Kemper, T. Neumann, and H. T. Vo. The more the merrier: Efficient multi-source graph traversal. Proc. VLDB Endow., 8(4):449--460, December 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Adam Welc, Raghavan Raman, Zhe Wu, Sungpack Hong, Hassan Chafi, and Jay Banerjee. Graph analysis: do we have to reinvent the wheel? In First International Workshop on Graph Data Management Experiences and Systems, page 7. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. M. Wittmann, T. Zeiser, G. Hager, and G. Wellein. Short note on costs of floating point operations on current x86-64 architectures: Denormals, overflow, underflow, and division by zero. CoRR, abs/1506.03997, 2015.Google ScholarGoogle Scholar

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 9, Issue 13
    September 2016
    378 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 September 2016
    Published in pvldb Volume 9, Issue 13

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader