Abstract
Graph analytics systems must analyze graphs with billions of vertices and edges which require several terabytes of storage. Distributed-memory clusters are often used for analyzing such large graphs since the main memory of a single machine is usually restricted to a few hundreds of gigabytes. This requires partitioning the graph among the machines in the cluster. Existing graph analytics systems use a built-in partitioner that incorporates a particular partitioning policy, but the best policy is dependent on the algorithm, input graph, and platform. Therefore, built-in partitioners are not sufficiently flexible. Stand-alone graph partitioners are available, but they too implement only a few policies.
CuSP is a fast streaming edge partitioning framework which permits users to specify the desired partitioning policy at a high level of abstraction and quickly generates highquality graph partitions. For example, it can partition wdc12, the largest publicly available web-crawl graph with 4 billion vertices and 129 billion edges, in under 2 minutes for clusters with 128 machines. Our experiments show that it can produce quality partitions 6× faster on average than the state-of-theart stand-alone partitioner in the literature while supporting a wider range of partitioning policies.
- http://www.graph500.org.Google Scholar
- Paolo Boldi, Andrea Marino, Massimo Santini, and Sebastiano Vigna. BUbiNG: Massive Crawling for the Masses. In WWW, 2014.Google Scholar
- Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In WWW, 2011.Google Scholar
- Paolo Boldi and Sebastiano Vigna. The WebGraph framework I: Compression techniques. In WWW, 2004.Google ScholarDigital Library
- E. G. Boman, K. D. Devine, and S. Rajamanickam. Scalable matrix computations on large scale-free graphs using 2D graph partitioning. In SC, 2013.Google ScholarDigital Library
- D. Buono, T. De Matteis, G. Mencagli, and M. Vanneschi. Optimizing message-passing on multicore architectures using hardware multi-threading. In 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pages 262-- 270, Feb 2014.Google ScholarDigital Library
- Ümit V. Çatalyürek, Cevdet Aykanat, and Bora Uçar. On Two-Dimensional Sparse Matrix Partitioning: Models, Methods, and a Recipe. SIAM J. Sci. Comput., 2010.Google Scholar
- Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. EuroSys '15, 2015.Google Scholar
- Hoang-Vu Dang, Roshan Dathathri, Gurbinder Gill, Alex Brooks, Nikoli Dryden, Andrew Lenharth, Loc Hoang, Keshav Pingali, and Marc Snir. A Lightweight Communication Runtime for Distributed Graph Analytics. In IPDPS, 2018.Google ScholarCross Ref
- Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang- Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, and Keshav Pingali. Gluon: A Communication Optimizing Framework for Distributed Heterogeneous Graph Analytics. PLDI, 2018.Google Scholar
- Gurbinder Gill, Roshan Dathathri, Loc Hoang, and Keshav Pingali. A Study of Partitioning Policies for Graph Analytics on Large-scale Distributed Platforms. volume 12 of PVLDB, 2018.Google Scholar
- Apache Giraph. http://giraph.apache.org/, 2013.Google Scholar
- Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs. OSDI'12, 2012.Google Scholar
- Jiewen Huang and Daniel J. Abadi. Leopard: Lightweight edge-oriented partitioning and replication for dynamic graphs. Proc. VLDB Endow., 9(7):540--551, March 2016.Google ScholarDigital Library
- George Karypis and Vipin Kumar. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J. Sci. Comput.Google Scholar
- Vipin Kumar, Ananth Grama, Anshul Gupta, and George Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin-Cummings Publishing Co., Inc., 1994.Google ScholarDigital Library
- Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. Kronecker Graphs: An Approach to Modeling Networks. J. Mach. Learn. Res., 2010.Google Scholar
- Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In Proc. ACM SIGMOD Intl Conf. on Management of Data, SIGMOD '10, pages 135--146, 2010.Google ScholarDigital Library
- C. Martella, D. Logothetis, A. Loukas, and G. Siganos. Spinner: Scalable graph partitioning in the cloud. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pages 1083--1094, April 2017.Google ScholarCross Ref
- Christian Mayer, Ruben Mayer, Muhammad Adnan Tariq, Heiko Geppert, Larissa Laich, Lukas Rieger, and Kurt Rothermel. Adwise: Adaptive window-based streaming edge partitioning for high-speed graph processing. pages 685--695, 07 2018.Google Scholar
- Robert Meusel, Sebastiano Vigna, Oliver Lehmberg, and Christian Bizer. Web Data Commons, 2012.Google Scholar
- Donald Nguyen, Andrew Lenharth, and Keshav Pingali. A Lightweight Infrastructure for Graph Analytics. SOSP '13.Google Scholar
- Fabio Petroni, Leonardo Querzoni, Khuzaima Daudjee, Shahin Kamali, and Giorgio Iacoboni. HDRF: Stream- Based Partitioning for Power-Law Graphs. CIKM '15, 2015. 59Google Scholar
- The Lemur Project. The ClueWeb12 Dataset, 2013.Google Scholar
- X. Que, F. Checconi, F. Petrini, and J. A. Gunnels. Scalable community detection with the louvain algorithm. In IPDPS, 2015.Google ScholarDigital Library
- G. M. Slota, K. Madduri, and S. Rajamanickam. PuLP: Scalable multi-objective multi-constraint partitioning for small-world networks. In IEEE Big Data, 2014.Google Scholar
- G. M. Slota, S. Rajamanickam, K. Devine, and K. Madduri. Partitioning Trillion-Edge Graphs in Minutes. In IPDPS, 2017.Google ScholarCross Ref
- Isabelle Stanton and Gabriel Kliot. Streaming Graph Partitioning for Large Distributed Graphs. KDD, 2012.Google ScholarDigital Library
- Dan Stanzione, Bill Barth, Niall Gaffney, Kelly Gaither, Chris Hempel, Tommy Minyard, S. Mehringer, Eric Wernert, H. Tufo, D. Panda, and P. Teller. Stampede 2: The Evolution of an XSEDE Supercomputer. PEARC17, 2017.Google ScholarDigital Library
- Charalampos Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, and Milan Vojnovic. FENNEL: Streaming Graph Partitioning for Massive Scale Graphs. WSDM '14, 2014.Google Scholar
- Johan Ugander and Lars Backstrom. Balanced Label Propagation for Partitioning Massive Graphs. WSDM '13, 2013.Google Scholar
- L. Wang, Y. Xiao, B. Shao, and H. Wang. How to partition a billion-node graph. In ICDE, 2014.Google ScholarCross Ref
- Cong Xie, Ling Yan, Wu-Jun Li, and Zhihua Zhang. Distributed Power-law Graph Computing: Theoretical and Empirical Analysis. In NIPS. 2014.Google Scholar
- Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. Gemini: A Computation-centric Distributed Graph Processing System. OSDI'16, 2016.Google Scholar
Recommendations
On the Multichromatic Number of s-Stable Kneser Graphs
For positive integers n and s, a subset Sï [n] is s-stable if sï |i-j|ï n-s for distinct i,j∈S . The s-stable r-uniform Kneser hypergraph KGrn,ks-stable is the r-uniform hypergraph that has the collection of all s-stable k-element subsets of [n] as ...
Adjacent vertex-distinguishing edge and total chromatic numbers of hypercubes
An adjacent vertex-distinguishing edge coloring of a simple graph G is a proper edge coloring of G such that incident edge sets of any two adjacent vertices are assigned different sets of colors. A total coloring of a graph G is a coloring of both the ...
Forbidden Subgraphs and Weak Locally Connected Graphs
A graph is called H-free if it has no induced subgraph isomorphic to H. A graph is called $$N^i$$Ni-locally connected if $$G[\{ x\in V(G): 1\le d_G(w, x)\le i\}]$$G[{x?V(G):1≤dG(w,x)≤i}] is connected and $$N_2$$N2-locally connected if $$G[\{uv: \{uw, vw\...
Comments