skip to main content
research-article

A scalable distributed graph partitioner

Published:01 August 2015Publication History
Skip Abstract Section

Abstract

We present Scalable Host-tree Embeddings for Efficient Partitioning (Sheep), a distributed graph partitioning algorithm capable of handling graphs that far exceed main memory. Sheep produces high quality edge partitions an order of magnitude faster than both state of the art offline (e.g., METIS) and streaming partitioners (e.g., Fennel). Sheep's partitions are independent of the input graph distribution, which means that graph elements can be assigned to processing nodes arbitrarily without affecting the partition quality.

Sheep transforms the input graph into a strictly smaller elimination tree via a distributed map-reduce operation. By partitioning this tree, Sheep finds an upper-bounded communication volume partitioning of the original graph.

We describe the Sheep algorithm and analyze its space-time requirements, partition quality, and intuitive characteristics and limitations. We compare Sheep to contemporary partitioners and demonstrate that Sheep creates competitive partitions, scales to larger graphs, and has better runtime.

References

  1. R. Albert, H. Jeong, and A.-L. Barabási. Error and attack tolerance of complex networks. Nature, 406(6794):378--382, 2000.Google ScholarGoogle Scholar
  2. S. Arifuzzaman, M. Khan, and M. Marathe. Patric: A parallel algorithm for counting triangles in massive networks. In ACM International Conference on Information and Knowledge Management, 2013. Google ScholarGoogle Scholar
  3. C. Ashcraft and J. W. Liu. Robust ordering of sparse matrices using multisection. SIAM Journal on Matrix Analysis and Applications, 19(3):816--832, 1998. Google ScholarGoogle Scholar
  4. C. Avery. Giraph: Large-scale graph processing infrastructure on hadoop. Hadoop Summit, 2011.Google ScholarGoogle Scholar
  5. H. L. Bodlaender, F. V. Fomin, A. M. Koster, D. Kratsch, and D. M. Thilikos. A note on exact algorithms for vertex ordering problems on graphs. Theory of Computing Systems, 50(3):420--432, 2012. Google ScholarGoogle Scholar
  6. H. L. Bodlaender, J. R. Gilbert, H. Hafsteinsson, and T. Kloks. Approximating treewidth, pathwidth, frontsize, and shortest elimination tree. Journal of Algorithms, 18(2):238--255, 1995. Google ScholarGoogle Scholar
  7. P. Boldi, M. Santini, and S. Vigna. A large time-aware graph. SIGIR Forum, 42(2):33--38, 2008. Google ScholarGoogle Scholar
  8. F. Bourse, M. Lelarge, and M. Vojnovic. Balanced graph edge partition. In 20th ACM International Conference on Knowledge Discovery and Data mining, pages 1456--1465. ACM, 2014. Google ScholarGoogle Scholar
  9. R. Chen, J. Shi, Y. Chen, and H. Chen. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. In 10th ACM SIGOPS European Conference on Computer Systems. ACM, 2015. Google ScholarGoogle Scholar
  10. S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes. K-core organization of complex networks. Physical review letters, 96(4):040601, 2006.Google ScholarGoogle Scholar
  11. M. Fredman and M. Saks. The cell probe complexity of dynamic data structures. In 21st ACM Symposium on Theory of Computing, pages 345--354. ACM, 1989. Google ScholarGoogle Scholar
  12. A. George. Nested dissection of a regular finite element mesh. SIAM Journal on Numerical Analysis, 10(2):345--363, 1973.Google ScholarGoogle Scholar
  13. A. George and J. W. Liu. The evolution of the minimum degree ordering algorithm. SIAM Review, 31(1):1--19, 1989. Google ScholarGoogle Scholar
  14. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Operating Systems Design and Implementation, volume 12, page 2, 2012. Google ScholarGoogle Scholar
  15. P. Heggernes. Minimal triangulations of graphs: A survey. Discrete Mathematics, 306(3):297--317, 2006. Google ScholarGoogle Scholar
  16. S. Idreos, M. L. Kersten, and S. Manegold. Database cracking. In Conference on Innovative Data systems Research, volume 3, pages 1--8, 2007.Google ScholarGoogle Scholar
  17. S. Iyer, T. Killingback, B. Sundaram, and Z. Wang. Attack robustness and centrality of complex networks. PloS ONE, 8(4):e59613, 2013.Google ScholarGoogle Scholar
  18. G. Karypis. A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices. University of Minnesota, Department of Computer Science and Engineering, Minneapolis, MN, 2013.Google ScholarGoogle Scholar
  19. G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359--392, 1998. Google ScholarGoogle Scholar
  20. G. Karypis and V. Kumar. A parallel algorithm for multilevel graph partitioning and sparse matrix ordering. Journal of Parallel and Distributed Computing, 48(1):71--95, Jan. 1998. Google ScholarGoogle Scholar
  21. S. Kundu and J. Misra. A linear tree partitioning algorithm. SIAM Journal on Computing, 6(1):151--154, 1977.Google ScholarGoogle Scholar
  22. A. Kyrola, G. E. Blelloch, and C. Guestrin. Graphchi: Large-scale graph computation on just a pc. In Operating Systems Design and Implementation, volume 12, pages 31--46, 2012. Google ScholarGoogle Scholar
  23. J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.Google ScholarGoogle Scholar
  24. P. Macko, V. J. Marathe, D. W. Margo, and M. I. Seltzer. Llama: Efficient graph analytics using large multiversioned arrays. In International Conference on Data Engineering. IEEE, 2015.Google ScholarGoogle Scholar
  25. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In 2010 ACM SIGMOD International Conference on Management of Data, pages 135--146. ACM, 2010. Google ScholarGoogle Scholar
  26. H. Miao, X. Liu, B. Huang, and L. Getoor. A hypergraph-partitioned vertex programming approach for large-scale consensus optimization. In International Conference on Big Data, pages 563--568. IEEE, 2013.Google ScholarGoogle Scholar
  27. R. C. Murphy, K. B. Wheeler, B. W. Barrett, and J. A. Ang. Introducing the graph 500. Cray User's Group (CUG), 2010.Google ScholarGoogle Scholar
  28. M. E. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404--409, 2001.Google ScholarGoogle Scholar
  29. S. Parter. The use of linear graphs in gauss elimination. SIAM Review, 3(2):119--130, 1961.Google ScholarGoogle Scholar
  30. A. Pothen and S. Toledo. Elimination structures in scientific computing. Handbook on Data Structures and Applications, pages 59--1, 2004.Google ScholarGoogle Scholar
  31. V. Prabhakaran, M. Wu, X. Weng, F. McSherry, L. Zhou, and M. Haradasan. Managing large graphs on multi-cores with graph awareness. In USENIX Annual Technical Conference, pages 41--52, 2012. Google ScholarGoogle Scholar
  32. A. Roy, I. Mihailovic, and W. Zwaenepoel. X-stream: Edge-centric graph processing using streaming partitions. In 24th ACM Symposium on Operating Systems Principles, pages 472--488. ACM, 2013. Google ScholarGoogle Scholar
  33. S. Salihoglu and J. Widom. Gps: A graph processing system. In 25th International Conference on Scientific and Statistical Database Management. ACM, 2013. Google ScholarGoogle Scholar
  34. N. Satish, N. Sundaram, M. M. A. Patwary, J. Seo, J. Park, M. A. Hassaan, S. Sengupta, Z. Yin, and P. Dubey. Navigating the maze of graph analytics frameworks using massive graph datasets. In ACM SIGMOD International Conference on Management of Data, pages 979--990. ACM, 2014. Google ScholarGoogle Scholar
  35. I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In 18th ACM SIGKDD International Conference on Knowledge Discovery and Data mining, pages 1222--1230. ACM, 2012. Google ScholarGoogle Scholar
  36. C. Tsourakakis, C. Gkantsidis, B. Radunovic, and M. Vojnovic. Fennel: Streaming graph partitioning for massive scale graphs. In 7th ACM International Conference on Web Search and Data Mining, pages 333--342. ACM, 2014. Google ScholarGoogle Scholar

Index Terms

  1. A scalable distributed graph partitioner
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image Proceedings of the VLDB Endowment
        Proceedings of the VLDB Endowment  Volume 8, Issue 12
        Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii
        August 2015
        728 pages

        Publisher

        VLDB Endowment

        Publication History

        • Published: 1 August 2015
        Published in pvldb Volume 8, Issue 12

        Qualifiers

        • research-article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader