skip to main content
10.1145/2807591.2807620acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

PGX.D: a fast distributed graph processing engine

Published:15 November 2015Publication History

ABSTRACT

Graph analysis is a powerful method in data analysis. Although several frameworks have been proposed for processing large graph instances in distributed environments, their performance is much lower than using efficient single-machine implementations provided with enough memory. In this paper, we present a fast distributed graph processing system, namely PGX.D. We show that PGX.D outperforms other distributed graph systems like GraphLab significantly (3x -- 90x). Furthermore, PGX.D on 4 to 16 machines is also faster than an implementation optimized for single-machine execution. Using a fast cooperative context-switching mechanism, we implement PGX.D as a low-overhead, bandwidth-efficient communication framework that supports remote data-pulling patterns. Moreover, PGX.D achieves large traffic reduction and good workload balance by applying selective ghost nodes, edge partitioning, and edge chunking transparently to the user. Our analysis confirms that each of these features is indeed crucial for overall performance of certain kinds of graph algorithms. Finally, we advocate the use of balanced beefy clusters where the sustained random DRAM-access bandwidth in aggregate is matched with the bandwidth of the underlying interconnection fabric.

References

  1. Apache Giraph Project. http://giraph.apache.org.Google ScholarGoogle Scholar
  2. Koblenz Network Collection. http://konect.uni-koblenz.de.Google ScholarGoogle Scholar
  3. Neo4j graph database. http://www.neo4j.org/.Google ScholarGoogle Scholar
  4. NetworkX. https://networkx.github.io.Google ScholarGoogle Scholar
  5. SNAP. http://snap.stanford.edu/data/.Google ScholarGoogle Scholar
  6. Yahoo! Labs Datasets. http://webscope.sandbox.yahoo.com/.Google ScholarGoogle Scholar
  7. Atul Adya, Jon Howell, Marvin Theimer, William J Bolosky, and John R Douceur. Cooperative task management without manual stack management. In USENIX Annual Technical Conference (ATEC), pages 289--302, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Sutanay Choudhury, Lawrence Holder, George Chin, Khushbu Agarwal, and John Feo. A selectivity based approach to continuous pattern detection in streaming graphs. arXiv preprint arXiv:1503.00849, 2015.Google ScholarGoogle Scholar
  9. David Ediger, Robert McColl, Jason Riedy, and David A Bader. Stinger: High performance data structure for streaming graphs. In High Performance Extreme Computing (HPEC), pages 1--5, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  10. Jing Fan, Adalbert Gerald Soosai Raj, and Jinnesh M. Patel. A case against specialized graph analytics engines. In 7th Biennial Conference on Innovative Data Systems Research (CIDR), 2015.Google ScholarGoogle Scholar
  11. Joseph E Gonzalez, Reynold S Xin, Ankur Dave, Daniel Crankshaw, Michael J Franklin, and Ion Stoica. Graphx: Graph processing in a distributed dataflow framework. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Sairam Gurajada, Stephan Seufert, Iris Miliaraki, and Martin Theobald. Triad: a distributed shared-nothing rdf engine based on asynchronous message passing. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 289--300. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Apache Hadoop. http://hadoop.apache.org/.Google ScholarGoogle Scholar
  14. Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. Turbo iso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In Proc. of the 2013 ACM SIGMOD International Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. Green-Marl: A DSL for Easy and Efficient Graph Analysis. In ASPLOS. ACM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Sungpack Hong, Semih Salihoglu, Jennifer Widom, and Kunle Olukotun. Simplifying scalable graph processing with a domain-specific language. In IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 208--218, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Jin Huang, Rui Zhang, and Jeffrey Xu Yu. Technical report: Hyperx a framework for scalable hypergraph learning. 2015.Google ScholarGoogle Scholar
  18. U Kang, Charalampos E Tsourakakis, and Christos Faloutsos. Pegasus: A peta-scale graph mining system implementation and observations. In IEEE International Conference on Data Mining (ICDM), pages 229--238, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is Twitter, a social network or a news media? In WWW '10: Proceedings of the 19th international conference on World wide web, pages 591--600. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Willis Lang, Stavros Harizopoulos, Jignesh M Patel, Mehul A Shah, and Dimitris Tsirogiannis. Towards energy-efficient database cluster design. Proceedings of the VLDB Endowment, 5(11):1684--1695, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M Hellerstein. Distributed graphlab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 5(8):716--727, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Grzegorz Malewicz, Matthew H. Austern, Aart J. C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: A System for Large-scale Graph Processing. In SIGMOD '10, pages 135--146. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Robert Campbell McColl, David Ediger, Jason Poovey, Dan Campbell, and David A Bader. A performance evaluation of open source graph databases. In Proceedings of the first workshop on PPAA. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Jacob Nelson, Brandon Holt, Brandon Myers, Preston Briggs, Luis Ceze, Simon Kahan, and Mark Oskin. Grappa: A latency-tolerant runtime for large-scale irregular applications. Technical report, Technical report, University of Washington, 2014. URL http://sampa. cs. washington. edu/papers/grappa-tr-2014-02. pdf. 4.1, 2014.Google ScholarGoogle Scholar
  25. Donald Nguyen, Andrew Lenharth, and Keshav Pingali. A lightweight infrastructure for graph analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 456--471. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Roger Pearce, Maya Gokhale, and Nancy M Amato. Faster parallel traversal of scale free graphs at extreme scale with vertex delegates. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pages 549--559, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Raghavan Raman, Oskar van Rest, Sungpack Hong, Zhe Wu, Hassan Chafi, and Jay Banerjee. Pgx. iso: Parallel and efficient in-memory engine for subgraph isomorphism. In Proceedings of Workshop on GRAph Data management Experiences and Systems. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Jiwon Seo, Jongsoo Park, M. Amber Hassaan, Shubho Sengupta, Zhaoming Yin, and Pradeep Dubey. Navigating the maze of graph analytics frameworks using massive graph datasets. In ACM SIGMOD International Conference on Management of Data, pages 979--990, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Jiwon Seo, Jongsoo Park, Jaeho Shin, and Monica S. Lam. Distributed socialite: A datalog-based language for large-scale graph analysis. Proc. VLDB Endow., 6(14):1906--1917, September 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Adam Welc, Raghavan Raman, Zhe Wu, Sungpack Hong, Hassan Chafi, and Jay Banerjee. Graph analysis: do we have to reinvent the wheel? In First International Workshop on Graph Data Management Experiences and Systems, page 7. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Jeremiah James Willcock, Torsten Hoefler, Nicholas Gerard Edmonds, and Andrew Lumsdaine. Active pebbles: A programming model for highly parallel fine-grained data-driven computations. In ACM SIGPLAN Notices, volume 46, pages 305--306. ACM, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2Nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud'10, pages 10--10, Berkeley, CA, USA, 2010. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, and Zhongyuan Wang. A distributed graph engine for web scale rdf data. Proceedings of the VLDB Endowment, 6(4):265--276, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. PGX.D: a fast distributed graph processing engine

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
              November 2015
              985 pages
              ISBN:9781450337236
              DOI:10.1145/2807591
              • General Chair:
              • Jackie Kern,
              • Program Chair:
              • Jeffrey S. Vetter

              Copyright © 2015 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 15 November 2015

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article

              Acceptance Rates

              SC '15 Paper Acceptance Rate79of358submissions,22%Overall Acceptance Rate1,516of6,373submissions,24%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader