ABSTRACT
Large-scale graph processing, with its massive data sets, requires distributed processing. However, conventional frameworks for distributed graph processing, such as Pregel, use non-traditional programming models that are well-suited for parallelism and scalability but inconvenient for implementing non-trivial graph algorithms. In this paper, we use Green-Marl, a Domain-Specific Language for graph analysis, to intuitively describe graph algorithms and extend its compiler to generate equivalent Pregel implementations. Using the semantic information captured by Green-Marl, the compiler applies a set of transformation rules that convert imperative graph algorithms into Pregel's programming model. Our experiments show that the Pregel programs generated by the Green-Marl compiler perform similarly to manually coded Pregel implementations of the same algorithms. The compiler is even able to generate a Pregel implementation of a complicated graph algorithm for which a manual Pregel implementation is very challenging.
- http://github.com/tinkerpop/gremlin/wiki.Google Scholar
- Apache Giraph Project. http://giraph.apache.org.Google Scholar
- The laboratory for web algorithmics. http://law.dsi.unimi.it/datasets.php.Google Scholar
- D. Bader and K. Madduri. Parallel algorithms for evaluating centrality indices in real-world networks. In IEEE ICPP 2006. Google ScholarDigital Library
- D. A. Bader and K. Madduri. SNAP: Small-world Network Analysis and Partitioning. http://snap-graph.sourceforge.net.Google Scholar
- U. Brandes. A Faster Algorithm for Betweenness Centrality. The Journal of Mathematical Sociology, 25(2):163--177, 2001.Google ScholarCross Ref
- Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst. HaLoop: Efficient Iterative Data Processing on Large Clusters. VLDB, pages 285--296, 2010. Google ScholarDigital Library
- R. Chen, X. Weng, B. He, and M. Yang. Large Graph Processing in the Cloud. In SIGMOD, 2010. Google ScholarDigital Library
- J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, 2004. Google ScholarDigital Library
- Apache Hadoop. http://hadoop.apache.org/.Google Scholar
- S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. Green-Marl: A DSL for Easy and Ffficient Graph Analysis. In ASPLOS. ACM, 2012. Google ScholarDigital Library
- S. Hong, S. Salihoglu, J. Widom, and K. Olukotun. Tech Report: Compiling Green-Marl into GPS. http://ppl.stanford.edu/papers/tr_gm_gps.pdf.Google Scholar
- S. Hong, J. Van Der Lugt, A. Welc, R. Raman, and H. Chafi. Early experiences in using a domain-specific language for large-scale graph analysis. In First International Workshop on Graph Data Management Experiences and Systems, GRADES '13. ACM. Google ScholarDigital Library
- Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. Hellerstein. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. VLDB, 5(8), 2012. Google ScholarDigital Library
- K. Madduri, D. Ediger, K. Jiang, D. Bader, and D. Chavarria-Miranda. A faster parallel algorithm and efficient multithreaded implementations for evaluating betweenness centrality on massive datasets. In IPDPS, 2009. Google ScholarDigital Library
- G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-scale Graph Processing. In SIGMOD '10. ACM. Google ScholarDigital Library
- C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A Not-so-foreign Language for Data Processing. In SIGMOD. ACM, 2008. Google ScholarDigital Library
- S. Salihoglu and J. Widom. GPS: Graph Processing System. http://infolab.stanford.edu/gps. Google ScholarDigital Library
- A. Thusoo, J. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive: A Warehousing Solution Over a Map-Reduce Framework. VLDB, 2(2), 2009. Google ScholarDigital Library
- Trinity. http://research.microsoft.com/en-us/projects/trinity/default.aspx.Google Scholar
- L. Valiant. A Bridging Model for Parallel Computation. Communications of the ACM, 33(8):103--111, 1990. Google ScholarDigital Library
- J. Willcock, T. Hoefler, N. Edmonds, and A. Lumsdaine. Active Pebbles: A Programming Model for Highly Parallel Fine-grained Data-driven Computations. In PPoPP, pages 305--306. ACM, 2011. Google ScholarDigital Library
- M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster Computing with Working Sets. HotCloud'10, 2010. Google ScholarDigital Library
Index Terms
- Simplifying Scalable Graph Processing with a Domain-Specific Language
Recommendations
Simplifying Scalable Graph Processing with a Domain-Specific Language
CGO '14: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and OptimizationLarge-scale graph processing, with its massive data sets, requires distributed processing. However, conventional frameworks for distributed graph processing, such as Pregel, use non-traditional programming models that are well-suited for parallelism and ...
Declaratively defining domain-specific language debuggers
GCPE '11Tool support is vital to the effectiveness of domain-specific languages. With language workbenches, domain-specific languages and their tool support can be generated from a combined, high-level specification. This paper shows how such a specification ...
Declaratively defining domain-specific language debuggers
GPCE '11: Proceedings of the 10th ACM international conference on Generative programming and component engineeringTool support is vital to the effectiveness of domain-specific languages. With language workbenches, domain-specific languages and their tool support can be generated from a combined, high-level specification. This paper shows how such a specification ...
Comments