Parallelizing Sequential Graph Computations

Authors:
Wenfei Fan, University of Edinburgh & Beihang University, Edinburgh, United Kingdom
Jingbo Xu, University of Edinburgh & Beihang University, Edinburgh, United Kingdom
Yinghui Wu, Washington State University, Pullman, WA, USA
Wenyuan Yu, Beihang University, Beijing, China
Jiaxin Jiang, Hong Kong Baptist University, Hong Kong, China
Zeyu Zheng, Peking University, Beijing, China
Bohan Zhang, Peking University, Beijing, China
Yang Cao, University of Edinburgh, Edinburgh, United Kingdom
Chao Tian, University of Edinburgh & Beihang University, Edinburgh, United Kingdom

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data, May 2017, Pages 495–510
https://doi.org/10.1145/3035918.3035942

Published: 09 May 2017

ABSTRACT

This paper presents GRAPE, a parallel system for graph computations. GRAPE differs from prior systems in its ability to parallelize existing sequential graph algorithms as a whole. Underlying GRAPE are a simple programming model and a principled approach based on partial evaluation and incremental computation. We show that sequential graph algorithms can be "plugged into" GRAPE with minor changes and get parallelized. As long as the sequential algorithms are correct, their GRAPE parallelization is guaranteed to terminate with correct answers under a monotonic condition. Moreover, we show that algorithms in MapReduce, BSP and PRAM can be optimally simulated on GRAPE. In addition to the ease of programming, we experimentally verify, using real-life and synthetic graphs, that GRAPE achieves performance comparable to that of state-of-the-art graph systems.
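
The idea of combining partial evaluation with incremental computation can be illustrated with a small single-process sketch. It is not GRAPE's API or implementation: the function names (`relax_from`, `fragmented_sssp`), the edge-cut partition given by an `owner` map, the message format, and the choice of single-source shortest paths with non-negative weights are all assumptions made for illustration. Each fragment runs an unmodified sequential Dijkstra-style routine on its local data, ships improved values for nodes owned by other fragments, and re-runs the same routine from the nodes whose values were lowered. Distances only ever decrease, which is the kind of monotonic behavior that lets the loop reach a fixpoint.

```python
import heapq
import math
from collections import defaultdict


def relax_from(adj, dist, seeds):
    """Sequential Dijkstra-style relaxation inside one fragment.

    Starts from `seeds`, lowers entries of `dist` in place, and returns
    the set of nodes whose distance decreased.
    """
    heap = [(dist[s], s) for s in seeds]
    heapq.heapify(heap)
    changed = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            if d + w < dist[v]:
                dist[v] = d + w
                changed.add(v)
                heapq.heappush(heap, (dist[v], v))
    return changed


def fragmented_sssp(edges, owner, source):
    """Simulate the workers one after another; returns {node: distance}."""
    frag_ids = sorted(set(owner.values()))
    adj = {f: defaultdict(list) for f in frag_ids}
    for u, v, w in edges:
        adj[owner[u]][u].append((v, w))  # an edge lives with its tail's fragment
    dist = {f: defaultdict(lambda: math.inf) for f in frag_ids}

    # Messages are {node: candidate distance}, addressed to the node's owner.
    inbox = {f: {} for f in frag_ids}
    inbox[owner[source]][source] = 0

    while any(inbox.values()):
        outbox = {f: {} for f in frag_ids}
        for f in frag_ids:
            # Accept only messages that lower a local value (monotone decrease).
            seeds = []
            for node, d in inbox[f].items():
                if d < dist[f][node]:
                    dist[f][node] = d
                    seeds.append(node)
            # The first round plays the role of partial evaluation,
            # later rounds the role of incremental evaluation.
            changed = relax_from(adj[f], dist[f], seeds)
            for node in changed:
                g = owner[node]
                if g != f:  # node owned elsewhere: ship the improved value
                    best = outbox[g].get(node, math.inf)
                    outbox[g][node] = min(best, dist[f][node])
        inbox = outbox

    # Assemble the answer: each owner reports the distances of its own nodes.
    return {n: dist[f][n] for n, f in owner.items()}
```

A call such as `fragmented_sssp([("a", "b", 1), ("b", "c", 2)], {"a": 0, "b": 0, "c": 1}, "a")` runs the routine over two fragments. In a real deployment the inner loop over fragments would run on separate workers and the inbox/outbox dictionaries would be replaced by network messages.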

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs

Published in
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN:9781450341974
DOI:10.1145/3035918
General Chairs:
Rada Chirkova, North Carolina State University, USA
Jun Yang, Duke University, USA
Program Chair:
Dan Suciu, University of Washington, USA
Copyright © 2017 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Best Paper
Author Tags
graph computation
incremental evaluation
parallel model
partial evaluation
scalability
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
Article Metrics
- Total Citations: 50
- Total Downloads: 3,918
- Downloads (Last 12 months): 371
- Downloads (Last 6 weeks): 51