skip to main content
research-article

Parallelizing Sequential Graph Computations

Published:16 December 2018Publication History
Skip Abstract Section

Abstract

This article presents GRAPE, a parallel <underline>GRAP</underline>h <underline>E</underline>ngine for graph computations. GRAPE differs from prior systems in its ability to parallelize existing sequential graph algorithms as a whole, without the need for recasting the entire algorithm into a new model. Underlying GRAPE are a simple programming model and a principled approach based on fixpoint computation that starts with partial evaluation and uses an incremental function as the intermediate consequence operator. We show that users can devise existing sequential graph algorithms with minor additions, and GRAPE parallelizes the computation. Under a monotonic condition, the GRAPE parallelization guarantees to converge at correct answers as long as the sequential algorithms are correct. Moreover, we show that algorithms in MapReduce, BSP, and PRAM can be optimally simulated on GRAPE. In addition to the ease of programming, we experimentally verify that GRAPE achieves comparable performance to the state-of-the-art graph systems using real-life and synthetic graphs.

Skip Supplemental Material Section

Supplemental Material

References

  1. 2006. UKWeb. http://law.di.unimi.it/webdata/uk-union-2006-06-2007-05/.Google ScholarGoogle Scholar
  2. 2010. Traffic. http://www.dis.uniroma1.it/challenge9/download.shtml.Google ScholarGoogle Scholar
  3. 2011. Movielens. http://grouplens.org/datasets/movielens/.Google ScholarGoogle Scholar
  4. 2012. Friendster. https://snap.stanford.edu/data/com-Friendster.html.Google ScholarGoogle Scholar
  5. 2012. MPICH. https://www.mpich.org/.Google ScholarGoogle Scholar
  6. 2014. Giraph. http://giraph.apache.org/.Google ScholarGoogle Scholar
  7. 2015. DBpedia. http://wiki.dbpedia.org/Datasets.Google ScholarGoogle Scholar
  8. 2017. Apache Hadoop. http://hadoop.apache.org/.Google ScholarGoogle Scholar
  9. 2017. GRAPE. http://grapedb.io/.Google ScholarGoogle Scholar
  10. 2017. Nethogs. https://github.com/raboof/nethogs.Google ScholarGoogle Scholar
  11. Umut A. Acar. 2005. Self-Adjusting Computation. Ph.D. Dissertation. CMU. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Konstantin Andreev and Harald Racke. 2006. Balanced graph partitioning. Theory of Computing Systems 39, 6 (2006), 929--939. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Jrgen Bang-Jensen and Gregory Z. Gutin. 2008. Digraphs: Theory, Algorithms and Applications. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Philip A. Bernstein and Nathan Goodman. 1981. Concurrency control in distributed database systems. ACM Comput Surv. 13, 2 (1981), 185--221. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Dimitri P. Bertsekas and John N. Tsitsiklis. 1997. Parallel and distributed computation: Numerical methods. (1997). Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Pramod Kumar Bhatotia. 2015. Incremental Parallel and Distributed Systems. Ph.D. Dissertation. Saarland University.Google ScholarGoogle Scholar
  17. Erik G. Boman, Karen D. Devine, and Sivasankaran Rajamanickam. 2013. Scalable matrix computations on large scale-free graphs using 2D graph partitioning. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'13), Denver, CO. ACM, 50:1--50:12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Florian Bourse, Marc Lelarge, and Milan Vojnovic. 2014. Balanced graph edge partition. In Proceedings of the 20th International Conference on Knowledge Discovery and Data Mining (PSIGKDD'14), New York, NY. ACM, 1456--1465. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Peter Buneman, Gao Cong, Wenfei Fan, and Anastasios Kementsietsidis. 2006. Using partial evaluation in distributed query evaluation. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB'06), Seoul. VLDB Endowment, 211--222. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Badong Chen, Jianji Wang, Haiquan Zhao, Nanning Zheng, and José C. Príncipe. 2015. Convergence of a fixed-point algorithm under maximum correntropy criterion. IEEE Signal Processing Letters 22, 10 (2015), 1723--1727.Google ScholarGoogle ScholarCross RefCross Ref
  21. Edith Cohen, Eran Halperin, Haim Kaplan, and Uri Zwick. 2003. Reachability and distance queries via 2-hop labels. SICOMP 32, 5 (2003), 1338--1355. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified data processing on large clusters. Communications of the ACM 51, 1 (2008). Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Wenfei Fan, Zhe Fan, Chao Tian, and Xin Luna Dong. 2015. Keys for graphs. In Proceedings of the 41st International Conference on Very Large Data Bases (PVLDB'15), Kohala Coast. VLDB Endowment, 1590--1601. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Wenfei Fan, Chunming Hu, and Chao Tian. 2017. Incremental graph computations: Doable and undoable. In Proceedings of the International Conference on Management of Data (SIGMOD'17), Chicago, IL. ACM, 155--169. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Yinghui Wu, and Yunpeng Wu. 2010. Graph pattern matching: From intractable to polynomial time. In Proceedings of the 36th International Conference on Very Large Data Bases (PVLDB'10), Singapore. VLDB Endowment, 264--275. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Wenfei Fan, Jianzhong Li, Xin Wang, and Yinghui Wu. 2012. Query preserving graph compression. In Proceedings of the International Conference on Management of Data (SIGMOD'12), Scottsdale, AZ. ACM, 157--168. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Wenfei Fan, Ping Lu, Xiaojian Luo, Jingbo Xu, Qiang Yin, Wenyuan Yu, and Ruiqi Xu. 2018. Adaptive asynchronous parallelization of graph algorithms. In Proceedings of the International Conference on Management of Data (SIGMOD'18), Houston, TX. ACM, 1141--1156. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Wenfei Fan, Xin Wang, and Yinghui Wu. 2013. Incremental graph pattern matching. TODS 38, 3 (2013). Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Wenfei Fan, Xin Wang, and Yinghui Wu. 2014. Distributed graph simulation: Impossibility and possibility. In Proceedings of the 40th International Conference on Very Large Data Bases (PVLDB'14), Hangzhou. VLDB Endowment, 1083--1094. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Wenfei Fan, Xin Wang, Yinghui Wu, and Jingbo Xu. 2015. Association rules with graph patterns. In Proceedings of the 41st International Conference on Very Large Data Bases (PVLDB'15), Kohala Coast. VLDB Endowment, 1502--1513. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Wenfei Fan, Jingbo Xu, Yinghui Wu, Wenyuan Yu, Jiaxin Jiang, Zeyu Zheng, Bohan Zhang, Yang Cao, and Chao Tian. 2017. Parallelizing sequential graph computations. In Proceedings of the International Conference on Management of Data (SIGMOD'17), Chicago, IL. ACM, 495--510. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Michael L. Fredman and Robert Endre Tarjan. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. JACM 34, 3 (1987), 596--615. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Rainer Gemulla, Erik Nijkamp, Peter J. Haas, and Yannis Sismanis. 2011. Large-scale matrix factorization with distributed stochastic gradient descent. In Proceedings of the 17th International Conference on Knowledge Discovery and Data Mining (KDD'11), San Diego, CA. ACM, 69--77. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: Graph processing in a distributed dataflow framework. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI'14), Broomfield, CO. USENIX Association, 599--613. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Elaine T. Hale, Wotao Yin, and Yin Zhang. 2008. Fixed-point continuation for &ell;<sub>1</sub>-minimization: Methodology and convergence. SIAM Journal on Optimization 19, 3 (2008), 1107--1130. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Minyang Han, Khuzaima Daudjee, Khaled Ammar, M. Tamer Ozsu, Xingfang Wang, and Tianqi Jin. 2014. An experimental comparison of Pregel-like graph processing systems. In Proceedings of the 40th International Conference on Very Large Data Bases (VLDB'14), Hangzhou. VLDB Endowment, 1047--1058. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. 2013. Turboiso: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In Proceedings of the International Conference on Management of Data (SIGMOD'13), New York, NY. ACM, 337--348. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Tim J. Harris. 1994. A survey of PRAM simulation techniques. ACM Computing Surveys 26, 2 (1994), 187--206. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. M. R. Henzinger, T. Henzinger, and P. Kopke. 1995. Computing simulations on finite and infinite graphs. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS'95), Milwaukee, Wisconsin. IEEE Computer Society, 453--462. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Qirong Ho, James Cipar, Henggang Cui, Seunghak Lee, Jin Kyu Kim, Phillip B. Gibbons, Garth A. Gibson, Gregory R. Ganger, and Eric P. Xing. 2013. More effective distributed ML via a stale synchronous parallel parameter server. In Proceedings of the 26th Advances in Neural Information Processing Systems (NIPS'13), Lake Tahoe, Nevada. Curran Associates Inc, 1223--1231. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Jiawei Jiang, Bin Cui, Ce Zhang, and Lele Yu. 2017. Heterogeneity-aware distributed parameter servers. In Proceedings of the International Conference on Management of Data (SIGMOD'17), Chicago, IL. ACM, 463--478. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. N. D. Jones. 1996. An introduction to partial evaluation. Computing Surveys 28, 3 (1996). Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. 2010. A model of computation for MapReduce. In Proceedings of the Twenty-First Annual Symposium on Discrete Algorithms (SODA'10), Austin. SIAM, 938--948. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. George Karypis and Vipin Kumar. 1995. METIS--Unstructured Graph Partitioning and Sparse Matrix Ordering System, Version 2.0. Technical Report.Google ScholarGoogle Scholar
  45. Arijit Khan, Yinghui Wu, Charu C. Aggarwal, and Xifeng Yan. 2013. Nema: Fast graph search with label similarity. In Proceedings of the 39th International Conference on Very Large Data Bases (PVLDB'13), Riva del Garda, Trento. VLDB Endowment, 181--192. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Zuhair Khayyat, Karim Awara, Amani Alonazi, Hani Jamjoom, Dan Williams, and Panos Kalnis. 2013. Mizan: A system for dynamic load balancing in large-scale graph processing. In Proceedings of the Eighth Eurosys Conference (EuroSys'13), Prague. ACM, 169--182. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Mijung Kim and K. Selçuk Candan. 2012. SBV-Cut: Vertex-cut based graph partitioning using structural balance vertices. Data 8 Knowledge Engineering 72 (2012), 285--303. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Yehuda Koren, Robert Bell, Chris Volinsky, et al. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30--37. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein. 2012. Distributed GraphLab: A framework for machine learning in the cloud. In Proceedings of the 38th International Conference on Very Large Data Bases (PVLDB'12), Istanbul. VLDB Endowment, 716--727. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the International Conference on Management of Data (SIGMOD'10), Indianapolis. ACM, 135--146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Frank McSherry, Michael Isard, and Derek Gordon Murray. 2015. Scalability&excl; But at what COST? In Proceedings of the 15th Workshop on Hot Topics in Operating Systems (HotOS'15), Kartause Ittingen. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Joris M. Mooij and Hilbert J. Kappen. 2007. Sufficient conditions for convergence of the sum--product algorithm. IEEE Transactions on Information Theory 53, 12 (2007), 4422--4437. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Todd Mytkowicz, Madanlal Musuvathi, and Wolfram Schulte. 2014. Data-parallel finite-state machines. In Proceedings of the Architectural Support for Programming Languages and Operating Systems (ASPLOS'14), Salt Lake City, UT. ACM, 529--542. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Keshav Pingali, Donald Nguyen, Milind Kulkarni, Martin Burtscher, M Amber Hassaan, Rashid Kaleem, Tsung-Hsien Lee, Andrew Lenharth, Roman Manevich, Mario Méndez-Lojo, et al. 2011. The tao of parallelism in algorithms. In Proceedings of the 32nd Conference on Programming Language Design and Implementation (PLDI'11), San Jose, CA. ACM, 12--25. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Cosmin Radoi, Stephen J. Fink, Rodric M. Rabbah, and Manu Sridharan. 2014. Translating imperative code to MapReduce. In Proceedings of the International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA'14), part of (SPLASH'14), Portland, OR. ACM, 909--927. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. G. Ramalingam and Thomas Reps. 1996. An incremental algorithm for a generalization of the shortest-path problem. Journal of Algorithms 21, 2 (1996), 267--305. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. G. Ramalingam and Thomas Reps. 1996. On the computational complexity of dynamic graph problems. Theoretical Computer Science 158, 1--2 (1996). Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Veselin Raychev, Madanlal Musuvathi, and Todd Mytkowicz. 2015. Parallelizing user-defined aggregations using symbolic execution. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP'15), Monterey, CA. ACM, 153--167. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Semih Salihoglu and Jennifer Widom. 2013. GPS: A graph processing system. In Proceedings of the Conference on Scientific and Statistical Database Management (SSDBM'13), Baltimore. ACM, 22:1--22:12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Sebastian Schelter, Stephan Ewen, Kostas Tzoumas, and Volker Markl. 2013. “All roads lead to Rome”: Optimistic recovery for distributed iterative data processing. In Proceedings of the 22nd International Conference on Information and Knowledge Management (CIKM'13), San Francisco. ACM, 1919--1928. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. George M. Slota, Sivasankaran Rajamanickam, Karen Devine, and Kamesh Madduri. 2017. Partitioning trillion-edge graphs in minutes. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS'17), Orlando, FL. IEEE Computer Society, 646--655.Google ScholarGoogle ScholarCross RefCross Ref
  62. Isabelle Stanton and Gabriel Kliot. 2012. Streaming graph partitioning for large distributed graphs. In Proceedings of the 18th International Conference on Knowledge Discovery and Data Mining (KDD'12), Beijing. ACM, 1222--1230. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, and John McPherson Shirish Tatikonda. 2013. From “Think Like a Vertex” to “Think Like a Graph.” In Proceedings of the 40th International Conference on Very Large Data Bases (PVLDB'13), Hangzhou. VLDB Endowment, 193--204.Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Phil Trinder. 1989. A Functional Database. Ph.D. Dissertation. University of Oxford. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Leslie G. Valiant. 1990. A bridging model for parallel computation. Communications of the ACM 33, 8 (1990), 103--111. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Leslie G. Valiant. 1990. General purpose parallel architectures. In Handbook of Theoretical Computer Science, J. van Leeuwen (Ed.), Vol A. Elsevier. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Guozhang Wang, Wenlei Xie, Alan J. Demers, and Johannes Gehrke. 2013. Asynchronous large-scale graph processing made easy. In Proceedings of the Sixth Biennial Conference on Innovative Data Systems Research (CIDR'13), Asilomar. www.cidrdb.org.Google ScholarGoogle Scholar
  68. Chenning Xie, Rong Chen, Haibing Guan, Binyu Zang, and Haibo Chen. 2015. SYNC or ASYNC: Time to fuse for distributed graph-parallel computation. In Proceedings of the Symposium on Principles and Practice of Parallel Programming (PPoPP'15), San Francisco. ACM, 194--204. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Eric P. Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, and Yaoliang Yu. 2015. Petuum: A new platform for distributed machine learning on big data. IEEE Transactions on Big Data 1, 2 (2015), 49--67.Google ScholarGoogle ScholarCross RefCross Ref
  70. Da Yan, Yingyi Bu, Yuanyuan Tian, and Amol Deshpande. 2017. Big graph analytics platforms. Foundations and Trends in Databases 7, 1--2 (2017), 1--195. Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2014. Blogel: A block-centric framework for distributed computation on real-world graphs. In Proceedings of the 40th International Conference on Very Large Data Bases (PVLDB'14), Hangzhou. VLDB Endowment, 1981--1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2015. Effective techniques for message reduction and load balancing in distributed graph computation. In Proceedings of the 24th International Conference on World Wide Web (WWW'15), Florence. ACM, 1307--1317. Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. Da Yan, James Cheng, Kai Xing, Yi Lu, Wilfred Ng, and Yingyi Bu. 2014. Pregel algorithms for graph connectivity problems with performance guarantees. In Proceedings of the 40th International Conference on Very Large Data Bases (PVLDB'14), Hangzhou. VLDB Endowment, 1821--1832. Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Zhensheng Zhang and Christos Douligeris. 1991. Convergence of synchronous and asynchronous algorithms in multiclass networks. In Proceedings of the Tenth Annual Joint Conference of the Computer and Communications Societies, Networking in the 90s. The Conference on Computer Communications (INFOCOM'91), Bal Harbour. IEEE, 939--943.Google ScholarGoogle ScholarCross RefCross Ref
  75. Yang Zhou, Ling Liu, Kisung Lee, Calton Pu, and Qi Zhang. 2015. Fast iterative graph computation with resource aware graph parallel abstractions. In Proceedings of the Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'2015), Portland. ACM, 179--190. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Parallelizing Sequential Graph Computations

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Database Systems
      ACM Transactions on Database Systems  Volume 43, Issue 4
      Best of SIGMOD 2017 Papers
      December 2018
      173 pages
      ISSN:0362-5915
      EISSN:1557-4644
      DOI:10.1145/3298792
      Issue’s Table of Contents

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 16 December 2018
      • Accepted: 1 October 2018
      • Revised: 1 September 2018
      • Received: 1 November 2017
      Published in tods Volume 43, Issue 4

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader