research-article

An experimental comparison of partitioning strategies in distributed graph processing

Authors:
Shiv Verma

University of Illinois at Urbana-Champaign

University of Illinois at Urbana-Champaign
View Profile

,
Luke M. Leslie

University of Illinois at Urbana-Champaign

University of Illinois at Urbana-Champaign
View Profile

,
Yosub Shin

Samsara Inc.

Samsara Inc.
View Profile

,
Indranil Gupta

University of Illinois at Urbana-Champaign

University of Illinois at Urbana-Champaign
View Profile

Proceedings of the VLDB Endowment Volume 10 Issue 5pp 493–504https://doi.org/10.14778/3055540.3055543

Published:01 January 2017Publication History

Proceedings of the VLDB Endowment

Abstract

In this paper, we study the problem of choosing among partitioning strategies in distributed graph processing systems. To this end, we evaluate and characterize both the performance and resource usage of different partitioning strategies under various popular distributed graph processing systems, applications, input graphs, and execution environments. Through our experiments, we found that no single partitioning strategy is the best fit for all situations, and that the choice of partitioning strategy has a significant effect on resource usage and application run-time. Our experiments demonstrate that the choice of partitioning strategy depends on (1) the degree distribution of input graph, (2) the type and duration of the application, and (3) the cluster size. Based on our results, we present rules of thumb to help users pick the best partitioning strategy for their particular use cases. We present results from each system, as well as from all partitioning strategies implemented in one common system (PowerLyra).

References

Apache Giraph. http://giraph.apache.org/. Last accessed 2016-04-18.Google Scholar
P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks. In Proc. of Int'l. Conf. on World Wide Web (WWW). ACM, 2011. Google ScholarDigital Library
P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In Proc. of Int'l. Conf. World Wide Web (WWW). ACM, 2004. Google ScholarDigital Library
R. Chen, J. Shi, Y. Chen, and H. Chen. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. In Proc. of European Conf. on Computer Systems, (EuroSys), pages 1:1-1:15. ACM, 2015. Google ScholarDigital Library
A. Ching, S. Edunov, M. Kabiljo, D. Logothetis, and S. Muthukrishnan. One trillion edges: graph processing at Facebook-scale. Proc. of VLDB Endowment, 2015. Google ScholarDigital Library
DIMACS Challenge 9 - Shortest Paths. http://www.dis.uniroma1.it/challenge9/. Last Accessed 2016-05-30.Google Scholar
M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In Proc. of Conf. on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM), pages 251--262. ACM, 1999. Google ScholarDigital Library
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Proc. of Symposium on Operating Systems Design and Implementation (OSDI). USENIX, 2012. Google ScholarDigital Library
J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. GraphX: Graph processing in a distributed dataflow framework. In Proc. of Symposium on Operating Systems Design and Implementation (OSDI). USENIX, 2014. Google ScholarDigital Library
H. Halberstam and R. R. Laxton. Perfect difference sets. Proc. of Glasgow Mathematical Association, 6:177--184, July 1964.Google ScholarCross Ref
M. Han, K. Daudjee, K. Ammar, M. T. Özsu, X. Wang, and T. Jin. An experimental comparison of pregel-like graph processing systems. In Proc. of VLDB Endowment, volume 7, pages 1047--1058. VLDB Endowment, Aug. 2014. Google ScholarDigital Library
I. Hoque and I. Gupta. LFGraph: Simple and fast distributed graph analytics. In Proc. of Conf. on Timely Results In Operating Systems (TRiOS). ACM, 2013. Google ScholarDigital Library
N. Jain, G. Liao, and T. L. Willke. Graphbuilder: scalable graph ETL framework. In First Int'l. Workshop on Graph Data Management Experiences and Systems, (GRADES), 2013. Google ScholarDigital Library
R. M. Karp. Reducibility among combinatorial problems. In Proc. of a Symposium on the Complexity of Computer Computations, pages 85--103, 1972.Google ScholarCross Ref
H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Proc. of Int'l. Conf. on World Wide Web (WWW). ACM, 2010. Google ScholarDigital Library
A. Kyrola, G. E. Blelloch, and C. Guestrin. GraphChi: Large-scale graph computation on just a PC. In Proc. of Symposium on Operating Systems Design and Implementation (OSDI). USENIX, 2012. Google ScholarDigital Library
Laboratory For Web Algorithms. http://law.di.unimi.it/datasets.php. Last Accessed 2016-04-18.Google Scholar
M. LeBeane, S. Song, R. Panda, J. H. Ryoo, and L. K. John. Data partitioning strategies for graph workloads on heterogeneous clusters. In Proc. of Int'l. Conf. for High Performance Computing, Networking, Storage and Analysis, (SC '15), pages 56:1--56:12. ACM, 2015. Google ScholarDigital Library
J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.Google Scholar
Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. In Proc. of VLDB Endowment. VLDB Endowment, 2012. Google ScholarDigital Library
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new parallel framework for machine learning. In Conf. on Uncertainty in Artificial Intelligence (UAI), 2010. Google ScholarDigital Library
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In Proc. of Int'l. Conf. on Management of Data (SIGMOD). ACM, 2010. Google ScholarDigital Library
F. Petroni, L. Querzoni, K. Daudjee, S. Kamali, and G. Iacoboni. Hdrf: Stream-based partitioning for power-law graphs. In Proc. of 24th ACM Int'l. on Conf. on Information and Knowledge Management (CIKM), pages 243--252. ACM, 2015. Google ScholarDigital Library
S. Schelter, S. Ewen, K. Tzoumas, and V. Markl. All roads lead to Rome: Optimistic recovery for distributed iterative data processing. In Proc. of Int'l. Conf. on Information and Knowledge Management (CIKM). ACM, 2013. Google ScholarDigital Library
C. Tsourakakis, C. Gkantsidis, B. Radunovic, and M. Vojnovic. Fennel: Streaming graph partitioning for massive scale graphs. In Proc. of 7th ACM Int'l. Conf. on Web Search and Data Mining (WSDM), pages 333--342. ACM, 2014. Google ScholarDigital Library
L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103--111, Aug. 1990. Google ScholarDigital Library
S. Verma, L. M. Leslie, Y. Shin, and I. Gupta. An experimental comparison of partitioning strategies in distributed graph processing. Technical Report, IDEALS, http://hdl.handle.net/2142/91657, 2016.Google Scholar
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proc. of Conf. on Networked Systems Design and Implementation (NSDI). USENIX, 2012. Google ScholarDigital Library

Index Terms

An experimental comparison of partitioning strategies in distributed graph processing

Index terms have been assigned to the content through auto-classification.

Recommendations

Application Driven Graph Partitioning
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

Graph partitioning is crucial to parallel computations on large graphs. The choice of partitioning strategies has strong impact on not only the performance of graph algorithms, but also the design of the algorithms. For an algorithm of our interest, ...
Read More
Experimental Analysis of Streaming Algorithms for Graph Partitioning
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

We report a systematic performance study of streaming graph partitioning algorithms. Graph partitioning plays a crucial role in overall system performance as it has a significant impact on both load balancing and inter-machine communication. The ...
Read More
Graph partitioning strategies: one size does not fit all
Abstract
As an important part of distributed graph computing, graph partitioning has been widely studied. However, the majority of the existing approaches to distributed graph partitioning barely take into consideration the relationship between the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 10, Issue 5
January 2017
168 pages
ISSN:2150-8097
Editor:
Divesh Srivastava
AT&T Labs
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 January 2017
Published in pvldb Volume 10, Issue 5
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 20
  Total Citations
  View Citations
- 366
  Total Downloads
- Downloads (Last 12 months)27
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

An experimental comparison of partitioning strategies in distributed graph processing

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Application Driven Graph Partitioning

Experimental Analysis of Streaming Algorithms for Graph Partitioning

Graph partitioning strategies: one size does not fit all

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

An experimental comparison of partitioning strategies in distributed graph processing

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Application Driven Graph Partitioning

Experimental Analysis of Streaming Algorithms for Graph Partitioning

Graph partitioning strategies: one size does not fit all

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media