skip to main content
10.1145/2581122.2544162acmconferencesArticle/Chapter ViewAbstractPublication PagescgoConference Proceedingsconference-collections
tutorial

Simplifying Scalable Graph Processing with a Domain-Specific Language

Published:15 February 2014Publication History

ABSTRACT

Large-scale graph processing, with its massive data sets, requires distributed processing. However, conventional frameworks for distributed graph processing, such as Pregel, use non-traditional programming models that are well-suited for parallelism and scalability but inconvenient for implementing non-trivial graph algorithms. In this paper, we use Green-Marl, a Domain-Specific Language for graph analysis, to intuitively describe graph algorithms and extend its compiler to generate equivalent Pregel implementations. Using the semantic information captured by Green-Marl, the compiler applies a set of transformation rules that convert imperative graph algorithms into Pregel's programming model. Our experiments show that the Pregel programs generated by the Green-Marl compiler perform similarly to manually coded Pregel implementations of the same algorithms. The compiler is even able to generate a Pregel implementation of a complicated graph algorithm for which a manual Pregel implementation is very challenging.

References

  1. http://github.com/tinkerpop/gremlin/wiki.Google ScholarGoogle Scholar
  2. Apache Giraph Project. http://giraph.apache.org.Google ScholarGoogle Scholar
  3. The laboratory for web algorithmics. http://law.dsi.unimi.it/datasets.php.Google ScholarGoogle Scholar
  4. D. Bader and K. Madduri. Parallel algorithms for evaluating centrality indices in real-world networks. In IEEE ICPP 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D. A. Bader and K. Madduri. SNAP: Small-world Network Analysis and Partitioning. http://snap-graph.sourceforge.net.Google ScholarGoogle Scholar
  6. U. Brandes. A Faster Algorithm for Betweenness Centrality. The Journal of Mathematical Sociology, 25(2):163--177, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  7. Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst. HaLoop: Efficient Iterative Data Processing on Large Clusters. VLDB, pages 285--296, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. R. Chen, X. Weng, B. He, and M. Yang. Large Graph Processing in the Cloud. In SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Apache Hadoop. http://hadoop.apache.org/.Google ScholarGoogle Scholar
  11. S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. Green-Marl: A DSL for Easy and Ffficient Graph Analysis. In ASPLOS. ACM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. S. Hong, S. Salihoglu, J. Widom, and K. Olukotun. Tech Report: Compiling Green-Marl into GPS. http://ppl.stanford.edu/papers/tr_gm_gps.pdf.Google ScholarGoogle Scholar
  13. S. Hong, J. Van Der Lugt, A. Welc, R. Raman, and H. Chafi. Early experiences in using a domain-specific language for large-scale graph analysis. In First International Workshop on Graph Data Management Experiences and Systems, GRADES '13. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. Hellerstein. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. VLDB, 5(8), 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. K. Madduri, D. Ediger, K. Jiang, D. Bader, and D. Chavarria-Miranda. A faster parallel algorithm and efficient multithreaded implementations for evaluating betweenness centrality on massive datasets. In IPDPS, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-scale Graph Processing. In SIGMOD '10. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A Not-so-foreign Language for Data Processing. In SIGMOD. ACM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Salihoglu and J. Widom. GPS: Graph Processing System. http://infolab.stanford.edu/gps. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. Thusoo, J. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive: A Warehousing Solution Over a Map-Reduce Framework. VLDB, 2(2), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Trinity. http://research.microsoft.com/en-us/projects/trinity/default.aspx.Google ScholarGoogle Scholar
  21. L. Valiant. A Bridging Model for Parallel Computation. Communications of the ACM, 33(8):103--111, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Willcock, T. Hoefler, N. Edmonds, and A. Lumsdaine. Active Pebbles: A Programming Model for Highly Parallel Fine-grained Data-driven Computations. In PPoPP, pages 305--306. ACM, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster Computing with Working Sets. HotCloud'10, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Simplifying Scalable Graph Processing with a Domain-Specific Language

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      CGO '14: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
      February 2014
      328 pages
      ISBN:9781450326704
      DOI:10.1145/2581122

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 15 February 2014

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • tutorial
      • Research
      • Refereed limited

      Acceptance Rates

      CGO '14 Paper Acceptance Rate29of100submissions,29%Overall Acceptance Rate312of1,061submissions,29%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader