Abstract
We present Scalable Host-tree Embeddings for Efficient Partitioning (Sheep), a distributed graph partitioning algorithm capable of handling graphs that far exceed main memory. Sheep produces high quality edge partitions an order of magnitude faster than both state of the art offline (e.g., METIS) and streaming partitioners (e.g., Fennel). Sheep's partitions are independent of the input graph distribution, which means that graph elements can be assigned to processing nodes arbitrarily without affecting the partition quality.
Sheep transforms the input graph into a strictly smaller elimination tree via a distributed map-reduce operation. By partitioning this tree, Sheep finds an upper-bounded communication volume partitioning of the original graph.
We describe the Sheep algorithm and analyze its space-time requirements, partition quality, and intuitive characteristics and limitations. We compare Sheep to contemporary partitioners and demonstrate that Sheep creates competitive partitions, scales to larger graphs, and has better runtime.
- R. Albert, H. Jeong, and A.-L. Barabási. Error and attack tolerance of complex networks. Nature, 406(6794):378--382, 2000.Google Scholar
- S. Arifuzzaman, M. Khan, and M. Marathe. Patric: A parallel algorithm for counting triangles in massive networks. In ACM International Conference on Information and Knowledge Management, 2013. Google Scholar
- C. Ashcraft and J. W. Liu. Robust ordering of sparse matrices using multisection. SIAM Journal on Matrix Analysis and Applications, 19(3):816--832, 1998. Google Scholar
- C. Avery. Giraph: Large-scale graph processing infrastructure on hadoop. Hadoop Summit, 2011.Google Scholar
- H. L. Bodlaender, F. V. Fomin, A. M. Koster, D. Kratsch, and D. M. Thilikos. A note on exact algorithms for vertex ordering problems on graphs. Theory of Computing Systems, 50(3):420--432, 2012. Google Scholar
- H. L. Bodlaender, J. R. Gilbert, H. Hafsteinsson, and T. Kloks. Approximating treewidth, pathwidth, frontsize, and shortest elimination tree. Journal of Algorithms, 18(2):238--255, 1995. Google Scholar
- P. Boldi, M. Santini, and S. Vigna. A large time-aware graph. SIGIR Forum, 42(2):33--38, 2008. Google Scholar
- F. Bourse, M. Lelarge, and M. Vojnovic. Balanced graph edge partition. In 20th ACM International Conference on Knowledge Discovery and Data mining, pages 1456--1465. ACM, 2014. Google Scholar
- R. Chen, J. Shi, Y. Chen, and H. Chen. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. In 10th ACM SIGOPS European Conference on Computer Systems. ACM, 2015. Google Scholar
- S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes. K-core organization of complex networks. Physical review letters, 96(4):040601, 2006.Google Scholar
- M. Fredman and M. Saks. The cell probe complexity of dynamic data structures. In 21st ACM Symposium on Theory of Computing, pages 345--354. ACM, 1989. Google Scholar
- A. George. Nested dissection of a regular finite element mesh. SIAM Journal on Numerical Analysis, 10(2):345--363, 1973.Google Scholar
- A. George and J. W. Liu. The evolution of the minimum degree ordering algorithm. SIAM Review, 31(1):1--19, 1989. Google Scholar
- J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Operating Systems Design and Implementation, volume 12, page 2, 2012. Google Scholar
- P. Heggernes. Minimal triangulations of graphs: A survey. Discrete Mathematics, 306(3):297--317, 2006. Google Scholar
- S. Idreos, M. L. Kersten, and S. Manegold. Database cracking. In Conference on Innovative Data systems Research, volume 3, pages 1--8, 2007.Google Scholar
- S. Iyer, T. Killingback, B. Sundaram, and Z. Wang. Attack robustness and centrality of complex networks. PloS ONE, 8(4):e59613, 2013.Google Scholar
- G. Karypis. A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices. University of Minnesota, Department of Computer Science and Engineering, Minneapolis, MN, 2013.Google Scholar
- G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359--392, 1998. Google Scholar
- G. Karypis and V. Kumar. A parallel algorithm for multilevel graph partitioning and sparse matrix ordering. Journal of Parallel and Distributed Computing, 48(1):71--95, Jan. 1998. Google Scholar
- S. Kundu and J. Misra. A linear tree partitioning algorithm. SIAM Journal on Computing, 6(1):151--154, 1977.Google Scholar
- A. Kyrola, G. E. Blelloch, and C. Guestrin. Graphchi: Large-scale graph computation on just a pc. In Operating Systems Design and Implementation, volume 12, pages 31--46, 2012. Google Scholar
- J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.Google Scholar
- P. Macko, V. J. Marathe, D. W. Margo, and M. I. Seltzer. Llama: Efficient graph analytics using large multiversioned arrays. In International Conference on Data Engineering. IEEE, 2015.Google Scholar
- G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In 2010 ACM SIGMOD International Conference on Management of Data, pages 135--146. ACM, 2010. Google Scholar
- H. Miao, X. Liu, B. Huang, and L. Getoor. A hypergraph-partitioned vertex programming approach for large-scale consensus optimization. In International Conference on Big Data, pages 563--568. IEEE, 2013.Google Scholar
- R. C. Murphy, K. B. Wheeler, B. W. Barrett, and J. A. Ang. Introducing the graph 500. Cray User's Group (CUG), 2010.Google Scholar
- M. E. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404--409, 2001.Google Scholar
- S. Parter. The use of linear graphs in gauss elimination. SIAM Review, 3(2):119--130, 1961.Google Scholar
- A. Pothen and S. Toledo. Elimination structures in scientific computing. Handbook on Data Structures and Applications, pages 59--1, 2004.Google Scholar
- V. Prabhakaran, M. Wu, X. Weng, F. McSherry, L. Zhou, and M. Haradasan. Managing large graphs on multi-cores with graph awareness. In USENIX Annual Technical Conference, pages 41--52, 2012. Google Scholar
- A. Roy, I. Mihailovic, and W. Zwaenepoel. X-stream: Edge-centric graph processing using streaming partitions. In 24th ACM Symposium on Operating Systems Principles, pages 472--488. ACM, 2013. Google Scholar
- S. Salihoglu and J. Widom. Gps: A graph processing system. In 25th International Conference on Scientific and Statistical Database Management. ACM, 2013. Google Scholar
- N. Satish, N. Sundaram, M. M. A. Patwary, J. Seo, J. Park, M. A. Hassaan, S. Sengupta, Z. Yin, and P. Dubey. Navigating the maze of graph analytics frameworks using massive graph datasets. In ACM SIGMOD International Conference on Management of Data, pages 979--990. ACM, 2014. Google Scholar
- I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In 18th ACM SIGKDD International Conference on Knowledge Discovery and Data mining, pages 1222--1230. ACM, 2012. Google Scholar
- C. Tsourakakis, C. Gkantsidis, B. Radunovic, and M. Vojnovic. Fennel: Streaming graph partitioning for massive scale graphs. In 7th ACM International Conference on Web Search and Data Mining, pages 333--342. ACM, 2014. Google Scholar
Index Terms
- A scalable distributed graph partitioner
Recommendations
A parallel graph partitioner on a distributed memory multiprocessor
FRONTIERS '95: Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)In order to realize the full potential of speed-up by parallelization, it is essential to partition a problem into small tasks with minimal interactions without making this process itself a bottleneck. We present a method for graph partitioning that is ...
Hammer lightweight graph partitioner based on graph data volumes
Highlights- Graph partitioning is a challenging task for the massive graphs in distributed computing.
AbstractThe graph partitioning challenge is well known and ongoing classical problem. Many heuristic methods tried to propose solutions focusing mainly on load processing and cost-efficiency. With the emergency of big data technology, the ...
Social hash partitioner: a scalable distributed hypergraph partitioner
We design and implement a distributed algorithm for balanced k-way hypergraph partitioning that minimizes fanout, a fundamental hypergraph quantity also known as the communication volume and (k - 1)-cut metric, by optimizing a novel objective called ...
Comments