ABSTRACT
Large-scale graph-structured computation usually exhibits iterative and convergence-oriented computing nature, where input data is computed iteratively until a convergence condition is reached. Such features have led to the development of two different computation modes for graph-structured programs, namely synchronous (Sync) and asynchronous (Async) modes. Unfortunately, there is currently no in-depth study on their execution properties and thus programmers have to manually choose a mode, either requiring a deep understanding of underlying graph engines, or suffering from suboptimal performance. This paper makes the first comprehensive characterization on the performance of the two modes on a set of typical graph-parallel applications. Our study shows that the performance of the two modes varies significantly with different graph algorithms, partitioning methods, execution stages, input graphs and cluster scales, and no single mode consistently outperforms the other. To this end, this paper proposes Hsync, a hybrid graph computation mode that adaptively switches a graph-parallel program between the two modes for optimal performance. Hsync constantly collects execution statistics on-the-fly and leverages a set of heuristics to predict future performance and determine when a mode switch could be profitable. We have built online sampling and offline profiling approaches combined with a set of heuristics to accurately predicting future performance in the two modes. A prototype called PowerSwitch has been built based on PowerGraph, a state-of-the-art distributed graph-parallel system, to support adaptive execution of graph algorithms. On a 48-node EC2-like cluster, PowerSwitch consistently outperforms the best of both modes, with a speedup ranging from 9% to 73% due to timely switch between two modes.
- The laboratory for web algorithmics. http://law.dsi.unimi.it/datasets.php.Google Scholar
- A PACHE. The Apache Giraph Project. http://giraph.apache.org/.Google Scholar
- A PACHE. The Apache Hama Project. http://hama.apache.org/.Google Scholar
- B EAMER, S., A SANOVI ´ C, K., AND P ATTERSON, D. Directionoptimizing breadth-first search. Scientific Programming 21, 3 (2013), 137–148. Google ScholarDigital Library
- B ERTSEKAS, D. P., G UERRIERO, F., AND M USMANNO, R. Parallel asynchronous label-correcting methods for shortest paths. Journal of Optimization Theory and Applications 88, 2 (1996), 297–320. Google ScholarDigital Library
- B ERTSEKAS, D. P., AND T SITSIKLIS, J. N. Parallel and distributed computation: numerical methods. Prentice-Hall, Inc., 1989. Google ScholarDigital Library
- B RIN, S., AND P AGE, L. The anatomy of a large-scale hypertextual web search engine. In Proc. WWW (1998). Google ScholarDigital Library
- C HANDY, K., AND L AMPORT, L. Distributed snapshots: determining global states of distributed systems. ACM TOCS 3, 1 (1985), 63–75. Google ScholarDigital Library
- C HEN, R., D ING, X., W ANG, P., C HEN, H., Z ANG, B., AND G UAN, H. Computation and communication efficient graph processing with distributed immutable view. In Proc. HPDC (2014). Google ScholarDigital Library
- C HEN, R., S HI, J., C HEN, Y., G UAN, H., AND C HEN, H. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. Tech. Rep. 2013-11-001, Shanghai Jiao Tong University, 2013.Google Scholar
- C HEN, R., S HI, J., Z ANG, B., AND G UAN, H. Bipartite-oriented distributed graph partitioning for big learning. In Proc. APSys (2014). Google ScholarDigital Library
- C HENG, R., H ONG, J., K YROLA, A., M IAO, Y., W ENG, X., W U, M., Y ANG, F., Z HOU, L., Z HAO, F., AND C HEN, E. Kineograph: taking the pulse of a fast-changing and connected world. In Proc. EuroSys (2012). Google ScholarDigital Library
- C HIERICHETTI, F., K UMAR, R., L ATTANZI, S., M ITZENMACHER, M., P ANCONESI, A., AND R AGHAVAN, P. On compressing social networks. In Proc. SIGKDD (2009). Google ScholarDigital Library
- D EAN, J., AND G HEMAWAT, S. Mapreduce: simplified data processing on large clusters. CACM 51, 1 (2008), 107–113. Google ScholarDigital Library
- E FRON, B., H ASTIE, T., J OHNSTONE, I., AND T IBSHIRANI, R. Least angle regression. The Annals of statistics 32, 2 (2004), 407–499.Google Scholar
- E LIDAN, G., M C G RAW, I., AND K OLLER, D. Residual belief propagation: Informed scheduling for asynchronous message passing. In Proc. UAI (2006).Google Scholar
- G ONZALEZ, J., L OW, Y., G RETTON, A., AND G UESTRIN, C. Parallel gibbs sampling: From colored fields to thin junction trees. In Proc. ICAIS (2011).Google Scholar
- G ONZALEZ, J., L OW, Y., G U, H., B ICKSON, D., AND G UESTRIN, C. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. OSDI (2012). Google ScholarDigital Library
- G ONZALEZ, J. E., L OW, Y., G UESTRIN, C., AND O’H ALLARON, D. Distributed parallel inference on large factor graphs. In Proc. UAI (2009). Google ScholarDigital Library
- G ONZALEZ, J. E., X IN, R. S., D AVE, A., C RANKSHAW, D., F RANKLIN, M. J., AND S TOICA, I. Graphx: Graph processing in a distributed dataflow framework. In Proc. OSDI (2014). Google ScholarDigital Library
- H AN, W., M IAO, Y., L I, K., W U, M., Y ANG, F., Z HOU, L., P RAB - HAKARAN, V., C HEN, W., AND C HEN, E. Chronos: a graph engine for temporal graph analysis. In Proc. EuroSys (2014). Google ScholarDigital Library
- H ASELGROVE, H. Wikipedia page-to-page link database. http://haselgrove.id.au/wikipedia.htm, 2010.Google Scholar
- H ERODOTOU, H., AND B ABU, S. Profiling, what-if analysis, and cost-based optimization of mapreduce programs. Proceedings of the VLDB Endowment (2011).Google Scholar
- H ERODOTOU, H., D ONG, F., AND B ABU, S. No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics. In Proc. SoCC (2011). Google ScholarDigital Library
- J AIN, N., L IAO, G., AND W ILLKE, T. L. Graphbuilder: scalable graph etl framework. In Proc. GDM (2013). Google ScholarDigital Library
- K WAK, H., L EE, C., P ARK, H., AND M OON, S. What is twitter, a social network or a news media? In Proc. WWW (2010). Google ScholarDigital Library
- K YROLA, A., B LELLOCH, G., AND G UESTRIN, C. GraphChi: Largescale graph computation on just a PC. In Proc. OSDI (2012). Google ScholarDigital Library
- L ESKOVEC, J., L ANG, K., D ASGUPTA, A., AND M AHONEY, M. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29–123.Google Scholar
- L IU, Y., W U, B., W ANG, H., AND M A, P. Bpgm: A big graph mining tool. Tsinghua Science and Technology 19, 1 (2014), 33–38.Google Scholar
- L OW, Y., B ICKSON, D., G ONZALEZ, J., G UESTRIN, C., K YROLA, A., AND H ELLERSTEIN, J. M. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB (2012). Google ScholarDigital Library
- M ALEWICZ, G., A USTERN, M. H., B IK, A. J., D EHNERT, J. C., H ORN, I., L EISER, N., AND C ZAJKOWSKI, G. Pregel: a system for large-scale graph processing. In Proc. SIGMOD (2010). Google ScholarDigital Library
- M URRAY, D. G., M C S HERRY, F., I SAACS, R., I SARD, M., B ARHAM, P., AND A BADI, M. Naiad: a timely dataflow system. In Proc. SOSP (2013). Google ScholarDigital Library
- N GUYEN, D., L ENHARTH, A., AND P INGALI, K. A lightweight infrastructure for graph analytics. In Proc. SOSP (2013). Google ScholarDigital Library
- P ANDA, B., H ERBACH, J., B ASU, S., AND B AYARDO, R. PLANET: massively parallel learning of tree ensembles with MapReduce. Proc. VLDB (2009). Google ScholarDigital Library
- P ROJECT, S. N. A. Stanford large network dataset collection. http://snap.stanford.edu/data/.Google Scholar
- R OY, A., M IHAILOVIC, I., AND Z WAENEPOEL, W. X-stream: Edgecentric graph processing using streaming partitions. In Proc. SOSP (2013). Google ScholarDigital Library
- S ALIHOGLU, S., AND W IDOM, J. GPS: A Graph Processing System. http://infolab.stanford.edu/gps/, 2012. Google ScholarDigital Library
- S HAO, B., W ANG, H., AND L I, Y. Trinity: A distributed graph engine on a memory cloud. In Proc. SIGMOD (2013). Google ScholarDigital Library
- S HUN, J., AND B LELLOCH, G. E. Ligra: a lightweight graph processing framework for shared memory. In Proc. PPoPP (2013). Google ScholarDigital Library
- S MOLA, A., AND N ARAYANAMURTHY, S. An architecture for parallel topic models. Proc. VLDB (2010). Google ScholarDigital Library
- V ALIANT, L. G. A bridging model for parallel computation. CACM 33, 8 (1990), 103–111. Google ScholarDigital Library
- W ANG, G., X IE, W., D EMERS, A. J., AND G EHRKE, J. Asynchronous large-scale graph processing made easy. In Proc. CIDR (2013).Google Scholar
- W ILSON, C., B OE, B., S ALA, A., P UTTASWAMY, K. P., AND Z HAO, B. Y. User interactions in social networks and their implications. In Proc. EuroSys (2009). Google ScholarDigital Library
- Y E, J., C HOW, J., C HEN, J., AND Z HENG, Z. Stochastic gradient boosted distributed decision trees. In Proc. CIKM (2009). Google ScholarDigital Library
- Y OU, Y., S ONG, S. L., AND K ERBYSON, D. An adaptive crossarchitecture combination method for graph traversal. In Proc. SC (2014). Google ScholarDigital Library
- Z HANG, K., C HEN, R., AND C HEN, H. Numa-aware graphstructured analytics. In Proc. PPoPP (2015). Google ScholarDigital Library
- Z HAO, X., C HANG, A., S ARMA, A. D., Z HENG, H., AND Z HAO, B. Y. On the embeddability of random walk distances. Proc. VLDB (2013). Google ScholarDigital Library
- Z HONG, J., AND H E, B. Medusa: Simplified graph processing on gpus. IEEE TPDS (2013). Google ScholarDigital Library
- Z HONG, J., AND H E, B. Parallel graph processing on graphics processors made easy. Proc. VLDB (2013). Google ScholarDigital Library
Index Terms
- SYNC or ASYNC: time to fuse for distributed graph-parallel computation
Recommendations
SYNC or ASYNC: time to fuse for distributed graph-parallel computation
PPoPP '15Large-scale graph-structured computation usually exhibits iterative and convergence-oriented computing nature, where input data is computed iteratively until a convergence condition is reached. Such features have led to the development of two different ...
(Sync|Async)+ MPI search engines
PVM/MPI'07: Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing InterfaceWe propose a parallel MPI search engine that is capable of automatically switching between asynchronous message passing and bulk-synchronous message passing modes of operation. When the observed query traffic is small or moderate the standard multiple ...
Comments