skip to main content
research-article

An experimental comparison of partitioning strategies in distributed graph processing

Published:01 January 2017Publication History
Skip Abstract Section

Abstract

In this paper, we study the problem of choosing among partitioning strategies in distributed graph processing systems. To this end, we evaluate and characterize both the performance and resource usage of different partitioning strategies under various popular distributed graph processing systems, applications, input graphs, and execution environments. Through our experiments, we found that no single partitioning strategy is the best fit for all situations, and that the choice of partitioning strategy has a significant effect on resource usage and application run-time. Our experiments demonstrate that the choice of partitioning strategy depends on (1) the degree distribution of input graph, (2) the type and duration of the application, and (3) the cluster size. Based on our results, we present rules of thumb to help users pick the best partitioning strategy for their particular use cases. We present results from each system, as well as from all partitioning strategies implemented in one common system (PowerLyra).

References

  1. Apache Giraph. http://giraph.apache.org/. Last accessed 2016-04-18.Google ScholarGoogle Scholar
  2. P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks. In Proc. of Int'l. Conf. on World Wide Web (WWW). ACM, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In Proc. of Int'l. Conf. World Wide Web (WWW). ACM, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. Chen, J. Shi, Y. Chen, and H. Chen. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. In Proc. of European Conf. on Computer Systems, (EuroSys), pages 1:1-1:15. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Ching, S. Edunov, M. Kabiljo, D. Logothetis, and S. Muthukrishnan. One trillion edges: graph processing at Facebook-scale. Proc. of VLDB Endowment, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. DIMACS Challenge 9 - Shortest Paths. http://www.dis.uniroma1.it/challenge9/. Last Accessed 2016-05-30.Google ScholarGoogle Scholar
  7. M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In Proc. of Conf. on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM), pages 251--262. ACM, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Proc. of Symposium on Operating Systems Design and Implementation (OSDI). USENIX, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. GraphX: Graph processing in a distributed dataflow framework. In Proc. of Symposium on Operating Systems Design and Implementation (OSDI). USENIX, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. H. Halberstam and R. R. Laxton. Perfect difference sets. Proc. of Glasgow Mathematical Association, 6:177--184, July 1964.Google ScholarGoogle ScholarCross RefCross Ref
  11. M. Han, K. Daudjee, K. Ammar, M. T. Özsu, X. Wang, and T. Jin. An experimental comparison of pregel-like graph processing systems. In Proc. of VLDB Endowment, volume 7, pages 1047--1058. VLDB Endowment, Aug. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. I. Hoque and I. Gupta. LFGraph: Simple and fast distributed graph analytics. In Proc. of Conf. on Timely Results In Operating Systems (TRiOS). ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. N. Jain, G. Liao, and T. L. Willke. Graphbuilder: scalable graph ETL framework. In First Int'l. Workshop on Graph Data Management Experiences and Systems, (GRADES), 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. R. M. Karp. Reducibility among combinatorial problems. In Proc. of a Symposium on the Complexity of Computer Computations, pages 85--103, 1972.Google ScholarGoogle ScholarCross RefCross Ref
  15. H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Proc. of Int'l. Conf. on World Wide Web (WWW). ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Kyrola, G. E. Blelloch, and C. Guestrin. GraphChi: Large-scale graph computation on just a PC. In Proc. of Symposium on Operating Systems Design and Implementation (OSDI). USENIX, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Laboratory For Web Algorithms. http://law.di.unimi.it/datasets.php. Last Accessed 2016-04-18.Google ScholarGoogle Scholar
  18. M. LeBeane, S. Song, R. Panda, J. H. Ryoo, and L. K. John. Data partitioning strategies for graph workloads on heterogeneous clusters. In Proc. of Int'l. Conf. for High Performance Computing, Networking, Storage and Analysis, (SC '15), pages 56:1--56:12. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.Google ScholarGoogle Scholar
  20. Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. In Proc. of VLDB Endowment. VLDB Endowment, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new parallel framework for machine learning. In Conf. on Uncertainty in Artificial Intelligence (UAI), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In Proc. of Int'l. Conf. on Management of Data (SIGMOD). ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. F. Petroni, L. Querzoni, K. Daudjee, S. Kamali, and G. Iacoboni. Hdrf: Stream-based partitioning for power-law graphs. In Proc. of 24th ACM Int'l. on Conf. on Information and Knowledge Management (CIKM), pages 243--252. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. S. Schelter, S. Ewen, K. Tzoumas, and V. Markl. All roads lead to Rome: Optimistic recovery for distributed iterative data processing. In Proc. of Int'l. Conf. on Information and Knowledge Management (CIKM). ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. C. Tsourakakis, C. Gkantsidis, B. Radunovic, and M. Vojnovic. Fennel: Streaming graph partitioning for massive scale graphs. In Proc. of 7th ACM Int'l. Conf. on Web Search and Data Mining (WSDM), pages 333--342. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103--111, Aug. 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. Verma, L. M. Leslie, Y. Shin, and I. Gupta. An experimental comparison of partitioning strategies in distributed graph processing. Technical Report, IDEALS, http://hdl.handle.net/2142/91657, 2016.Google ScholarGoogle Scholar
  28. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proc. of Conf. on Networked Systems Design and Implementation (NSDI). USENIX, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. An experimental comparison of partitioning strategies in distributed graph processing
            Index terms have been assigned to the content through auto-classification.

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image Proceedings of the VLDB Endowment
              Proceedings of the VLDB Endowment  Volume 10, Issue 5
              January 2017
              168 pages
              ISSN:2150-8097
              Issue’s Table of Contents

              Publisher

              VLDB Endowment

              Publication History

              • Published: 1 January 2017
              Published in pvldb Volume 10, Issue 5

              Qualifiers

              • research-article

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader