ABSTRACT
This paper presents GRAPE, a parallel system for graph computations. GRAPE differs from prior systems in its ability to parallelize existing sequential graph algorithms as a whole. Underlying GRAPE are a simple programming model and a principled approach, based on partial evaluation and incremental computation. We show that sequential graph algorithms can be "plugged into" GRAPE with minor changes, and get parallelized. As long as the sequential algorithms are correct, their GRAPE parallelization guarantees to terminate with correct answers under a monotonic condition. Moreover, we show that algorithms in MapReduce, BSP and PRAM can be optimally simulated on GRAPE. In addition to the ease of programming, we experimentally verify that GRAPE achieves comparable performance to the state-of-the-art graph systems, using real-life and synthetic graphs.
- Aliyun.sl https://intl.aliyun.com.Google Scholar
- DBpedia.sl http://wiki.dbpedia.org/Datasets.Google Scholar
- Giraph.sl http://giraph.apache.org/.Google Scholar
- GRAPE.sl http://grapedb.io/.Google Scholar
- Movielens.sl http://grouplens.org/datasets/movielens/.Google Scholar
- MPICH.sl https://www.mpich.org/.Google Scholar
- Snap.sl http://snap.stanford.edu/data/index.html.Google Scholar
- Traffic.sl http://www.dis.uniroma1.it/challenge9/download.shtml.Google Scholar
- U. A. Acar. Self-Adjusting Computation. PhD thesis, Carnegie Mellon University, 2005. Google ScholarDigital Library
- J. Bang-Jensen and G. Z. Gutin. Digraphs: Theory, Algorithms and Applications. Springer, 2008. Google ScholarDigital Library
- P. A. Bernstein and N. Goodman. Concurrency control in distributed database systems. ACM Comput. Surv., 13(2):185--221. Google ScholarDigital Library
- E. G. Boman, K. D. Devine, and S. Rajamanickam. Scalable matrix computations on large scale-free graphs using 2D graph partitioning. In SC, 2013. Google ScholarDigital Library
- F. Bourse, M. Lelarge, and M. Vojnovic. Balanced graph edge partition. In SIGKDD, pages 1456--1465, 2014. Google ScholarDigital Library
- P. Buneman, G. Cong, W. Fan, and A. Kementsietsidis. Using partial evaluation in distributed query evaluation. In VLDB, 2006. Google ScholarDigital Library
- E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. SICOMP, 32(5):1338--1355, 2003. Google ScholarDigital Library
- L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub) graph isomorphism algorithm for matching large graphs. TPAMI, 26(10):1367--1372, 2004. Google ScholarDigital Library
- J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1), 2008. Google ScholarDigital Library
- W. Fan, C. Hu, and C. Tian. Incremental graph computations: Doable and undoable. In SIGMOD, 2017. Google ScholarDigital Library
- W. Fan, J. Li, S. Ma, N. Tang, Y. Wu, and Y. Wu. Graph pattern matching: From intractability to polynomial time. In PVLDB, 2010. Google ScholarDigital Library
- W. Fan, J. Li, X. Wang, and Y. Wu. Query preserving graph compression. In SIGMOD, 2012. Google ScholarDigital Library
- W. Fan, X. Wang, and Y. Wu. Incremental graph pattern matching. TODS, 38(3), 2013. Google ScholarDigital Library
- W. Fan, X. Wang, and Y. Wu. Distributed graph simulation: Impossibility and possibility. PVLDB, 7(12), 2014. Google ScholarDigital Library
- M. L. Fredman and R. E. Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. JACM, 34(3):596--615, 1987. Google ScholarDigital Library
- J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In USENIX, 2012.Google ScholarDigital Library
- J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. GraphX: Graph processing in a distributed dataflow framework. In OSDI, 2014. Google ScholarDigital Library
- T. J. Harris. A survey of PRAM simulation techniques. ACM Comput. Surv., 26(2):187--206, 1994. Google ScholarDigital Library
- M. R. Henzinger, T. Henzinger, and P. Kopke. Computing simulations on finite and infinite graphs. In FOCS, 1995. Google ScholarDigital Library
- N. D. Jones. An introduction to partial evaluation. ACM Computing Surveys, 28(3), 1996. Google ScholarDigital Library
- H. J. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for MapReduce. In SODA, 2010. Google ScholarDigital Library
- G. Karypis and V. Kumar. METIS--unstructured graph partitioning and sparse matrix ordering system, version 2.0. Technical report, 1995.Google Scholar
- A. Khan, Y. Wu, C. C. Aggarwal, and X. Yan. Nema: Fast graph search with label similarity. PVLDB, 6(3), 2013. Google ScholarDigital Library
- M. Kim and K. S. Candan. SBV-Cut: Vertex-cut based graph partitioning using structural balance vertices. Data & Knowledge Engineering, 72:285--303, 2012. Google ScholarDigital Library
- Y. Koren, R. Bell, C. Volinsky, et al. Matrix factorization techniques for recommender systems. Computer, 42(8):30--37, 2009. Google ScholarDigital Library
- Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning in the cloud. PVLDB, 5(8), 2012. Google ScholarDigital Library
- G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, 2010. Google ScholarDigital Library
- T. Mytkowicz, M. Musuvathi, and W. Schulte. Data-parallel finite-state machines. In ASPLOS, 2014. Google ScholarDigital Library
- K. Pingali, D. Nguyen, M. Kulkarni, M. Burtscher, M. A. Hassaan, R. Kaleem, T.-H. Lee, A. Lenharth, R. Manevich, M. Méndez-Lojo, et al. The tao of parallelism in algorithms. In ACM Sigplan Notices, volume 46, pages 12--25, 2011. Google ScholarDigital Library
- C. Radoi, S. J. Fink, R. M. Rabbah, and M. Sridharan. Translating imperative code to MapReduce. In OOPSLA, 2014. Google ScholarDigital Library
- G. Ramalingam and T. Reps. An incremental algorithm for a generalization of the shortest-path problem. J. Algorithms, 21(2):267--305, 1996. Google ScholarDigital Library
- G. Ramalingam and T. Reps. On the computational complexity of dynamic graph problems. TCS, 158(1--2), 1996. Google ScholarDigital Library
- V. Raychev, M. Musuvathi, and T. Mytkowicz. Parallelizing user-defined aggregations using symbolic execution. In SOSP, 2015. Google ScholarDigital Library
- S. Salihoglu and J. Widom. GPS: a graph processing system. In SSDBM, 2013. Google ScholarDigital Library
- I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In KDD, pages 1222--1230, 2012. Google ScholarDigital Library
- Y. Tian, A. Balmin, S. A. Corsten, and J. M. Shirish Tatikonda. From "think like a vertex" to "think like a graph". PVLDB, 7(7):193--204, 2013. Google ScholarDigital Library
- P. Trinder. A Functional Database. PhD thesis, University of Oxford, 1989. Google ScholarDigital Library
- L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103--111, 1990. Google ScholarDigital Library
- L. G. Valiant. General purpose parallel architectures. In Handbook of Theoretical Computer Science, Vol A. 1990. Google ScholarDigital Library
- J. Vinagre, A. M. Jorge, and J. Gama. Fast incremental matrix factorization for recommendation with positive-only feedback. In International Conference on User Modeling, Adaptation, and Personalization, 2014.Google ScholarCross Ref
- G. Wang, W. Xie, A. J. Demers, and J. Gehrke. Asynchronous large-scale graph processing made easy. In CIDR, 2013.Google Scholar
- D. Yan, J. Cheng, Y. Lu, and W. Ng. Blogel: A block-centric framework for distributed computation on real-world graphs. PVLDB, 7(14):1981--1992, 2014. Google ScholarDigital Library
- D. Yan, J. Cheng, K. Xing, Y. Lu, W. Ng, and Y. Bu. Pregel algorithms for graph connectivity problems with performance guarantees. PVLDB, 7(14):1821--1832, 2014. Google ScholarDigital Library
- Y. Zhou, L. Liu, K. Lee, C. Pu, and Q. Zhang. Fast iterative graph computation with resource aware graph parallel abstractions. In HPDC, 2015. Google ScholarDigital Library
Index Terms
- Parallelizing Sequential Graph Computations
Recommendations
Parallelizing Sequential Graph Computations
Best of SIGMOD 2017 PapersThis article presents GRAPE, a parallel <underline>GRAP</underline>h <underline>E</underline>ngine for graph computations. GRAPE differs from prior systems in its ability to parallelize existing sequential graph algorithms as a whole, without the need ...
Parallelizing Subroutines in Sequential Programs
An algorithm for making sequential programs parallel is described, which first identifies all subroutines, then determines the appropriate execution mode and restructures the code. It works recursively to parallelize the entire program. We use Fortran ...
A communication-reduced and computation-balanced framework for fast graph computation
The bulk synchronous parallel (BSP) model is very user friendly for coding and debugging parallel graph algorithms. However, existing BSP-based distributed graph-processing frameworks, such as Pregel, GPS and Giraph, routinely suffer from high ...
Comments