skip to main content
research-article

CuSP: A Customizable Streaming Edge Partitioner for Distributed Graph Analytics

Published:06 June 2021Publication History
Skip Abstract Section

Abstract

Graph analytics systems must analyze graphs with billions of vertices and edges which require several terabytes of storage. Distributed-memory clusters are often used for analyzing such large graphs since the main memory of a single machine is usually restricted to a few hundreds of gigabytes. This requires partitioning the graph among the machines in the cluster. Existing graph analytics systems use a built-in partitioner that incorporates a particular partitioning policy, but the best policy is dependent on the algorithm, input graph, and platform. Therefore, built-in partitioners are not sufficiently flexible. Stand-alone graph partitioners are available, but they too implement only a few policies.

CuSP is a fast streaming edge partitioning framework which permits users to specify the desired partitioning policy at a high level of abstraction and quickly generates highquality graph partitions. For example, it can partition wdc12, the largest publicly available web-crawl graph with 4 billion vertices and 129 billion edges, in under 2 minutes for clusters with 128 machines. Our experiments show that it can produce quality partitions 6× faster on average than the state-of-theart stand-alone partitioner in the literature while supporting a wider range of partitioning policies.

References

  1. http://www.graph500.org.Google ScholarGoogle Scholar
  2. Paolo Boldi, Andrea Marino, Massimo Santini, and Sebastiano Vigna. BUbiNG: Massive Crawling for the Masses. In WWW, 2014.Google ScholarGoogle Scholar
  3. Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In WWW, 2011.Google ScholarGoogle Scholar
  4. Paolo Boldi and Sebastiano Vigna. The WebGraph framework I: Compression techniques. In WWW, 2004.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. E. G. Boman, K. D. Devine, and S. Rajamanickam. Scalable matrix computations on large scale-free graphs using 2D graph partitioning. In SC, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. Buono, T. De Matteis, G. Mencagli, and M. Vanneschi. Optimizing message-passing on multicore architectures using hardware multi-threading. In 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pages 262-- 270, Feb 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Ümit V. Çatalyürek, Cevdet Aykanat, and Bora Uçar. On Two-Dimensional Sparse Matrix Partitioning: Models, Methods, and a Recipe. SIAM J. Sci. Comput., 2010.Google ScholarGoogle Scholar
  8. Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. EuroSys '15, 2015.Google ScholarGoogle Scholar
  9. Hoang-Vu Dang, Roshan Dathathri, Gurbinder Gill, Alex Brooks, Nikoli Dryden, Andrew Lenharth, Loc Hoang, Keshav Pingali, and Marc Snir. A Lightweight Communication Runtime for Distributed Graph Analytics. In IPDPS, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  10. Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang- Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, and Keshav Pingali. Gluon: A Communication Optimizing Framework for Distributed Heterogeneous Graph Analytics. PLDI, 2018.Google ScholarGoogle Scholar
  11. Gurbinder Gill, Roshan Dathathri, Loc Hoang, and Keshav Pingali. A Study of Partitioning Policies for Graph Analytics on Large-scale Distributed Platforms. volume 12 of PVLDB, 2018.Google ScholarGoogle Scholar
  12. Apache Giraph. http://giraph.apache.org/, 2013.Google ScholarGoogle Scholar
  13. Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs. OSDI'12, 2012.Google ScholarGoogle Scholar
  14. Jiewen Huang and Daniel J. Abadi. Leopard: Lightweight edge-oriented partitioning and replication for dynamic graphs. Proc. VLDB Endow., 9(7):540--551, March 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. George Karypis and Vipin Kumar. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J. Sci. Comput.Google ScholarGoogle Scholar
  16. Vipin Kumar, Ananth Grama, Anshul Gupta, and George Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin-Cummings Publishing Co., Inc., 1994.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. Kronecker Graphs: An Approach to Modeling Networks. J. Mach. Learn. Res., 2010.Google ScholarGoogle Scholar
  18. Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In Proc. ACM SIGMOD Intl Conf. on Management of Data, SIGMOD '10, pages 135--146, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. C. Martella, D. Logothetis, A. Loukas, and G. Siganos. Spinner: Scalable graph partitioning in the cloud. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pages 1083--1094, April 2017.Google ScholarGoogle ScholarCross RefCross Ref
  20. Christian Mayer, Ruben Mayer, Muhammad Adnan Tariq, Heiko Geppert, Larissa Laich, Lukas Rieger, and Kurt Rothermel. Adwise: Adaptive window-based streaming edge partitioning for high-speed graph processing. pages 685--695, 07 2018.Google ScholarGoogle Scholar
  21. Robert Meusel, Sebastiano Vigna, Oliver Lehmberg, and Christian Bizer. Web Data Commons, 2012.Google ScholarGoogle Scholar
  22. Donald Nguyen, Andrew Lenharth, and Keshav Pingali. A Lightweight Infrastructure for Graph Analytics. SOSP '13.Google ScholarGoogle Scholar
  23. Fabio Petroni, Leonardo Querzoni, Khuzaima Daudjee, Shahin Kamali, and Giorgio Iacoboni. HDRF: Stream- Based Partitioning for Power-Law Graphs. CIKM '15, 2015. 59Google ScholarGoogle Scholar
  24. The Lemur Project. The ClueWeb12 Dataset, 2013.Google ScholarGoogle Scholar
  25. X. Que, F. Checconi, F. Petrini, and J. A. Gunnels. Scalable community detection with the louvain algorithm. In IPDPS, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. G. M. Slota, K. Madduri, and S. Rajamanickam. PuLP: Scalable multi-objective multi-constraint partitioning for small-world networks. In IEEE Big Data, 2014.Google ScholarGoogle Scholar
  27. G. M. Slota, S. Rajamanickam, K. Devine, and K. Madduri. Partitioning Trillion-Edge Graphs in Minutes. In IPDPS, 2017.Google ScholarGoogle ScholarCross RefCross Ref
  28. Isabelle Stanton and Gabriel Kliot. Streaming Graph Partitioning for Large Distributed Graphs. KDD, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Dan Stanzione, Bill Barth, Niall Gaffney, Kelly Gaither, Chris Hempel, Tommy Minyard, S. Mehringer, Eric Wernert, H. Tufo, D. Panda, and P. Teller. Stampede 2: The Evolution of an XSEDE Supercomputer. PEARC17, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Charalampos Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, and Milan Vojnovic. FENNEL: Streaming Graph Partitioning for Massive Scale Graphs. WSDM '14, 2014.Google ScholarGoogle Scholar
  31. Johan Ugander and Lars Backstrom. Balanced Label Propagation for Partitioning Massive Graphs. WSDM '13, 2013.Google ScholarGoogle Scholar
  32. L. Wang, Y. Xiao, B. Shao, and H. Wang. How to partition a billion-node graph. In ICDE, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  33. Cong Xie, Ling Yan, Wu-Jun Li, and Zhihua Zhang. Distributed Power-law Graph Computing: Theoretical and Empirical Analysis. In NIPS. 2014.Google ScholarGoogle Scholar
  34. Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. Gemini: A Computation-centric Distributed Graph Processing System. OSDI'16, 2016.Google ScholarGoogle Scholar

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image ACM SIGOPS Operating Systems Review
    ACM SIGOPS Operating Systems Review  Volume 55, Issue 1
    SIGOPS
    July 2021
    107 pages
    ISSN:0163-5980
    DOI:10.1145/3469379
    Issue’s Table of Contents

    Copyright © 2021 Copyright is held by the owner/author(s)

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 6 June 2021

    Check for updates

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader