skip to main content
10.1145/3183713.3196915acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

TurboGraph++: A Scalable and Fast Graph Analytics System

Published:27 May 2018Publication History

ABSTRACT

Existing distributed graph analytics systems are categorized into two main groups: those that focus on efficiency with a risk of out-of-memory error and those that focus on scale-up with a fixed memory budget and a sacrifice in performance. While the former group keeps a partitioned graph resident in memory of each machine and uses an in-memory processing technique, the latter stores the partitioned graph in external memory of each machine and exploits a streaming processing technique. Gemini and Chaos are the state-of-the-art distributed graph systems in each group, respectively.

We present TurboGraph++, a scalable and fast graph analytics system which efficiently processes large graphs by exploiting external memory for scale-up without compromising efficiency. First, TurboGraph++ provides a new graph processing abstraction for efficiently supporting neighborhood analytics that requires processing multi-hop neighborhoods of vertices, such as triangle counting and local clustering coefficient computation, with a fixed memory budget. Second, TurboGraph++ provides a balanced and buffer-aware partitioning scheme for ensuring balanced workloads across machines with reasonable cost. Lastly, TurboGraph++ leverages three-level parallel and overlapping processing for fully utilizing three hardware resources, CPU, disk, and network, in a cluster. Extensive experiments show that TurboGraph++ is designed to scale well to very large graphs, like Chaos, while its performance is comparable to Gemini.

References

  1. 2002. The Yahoo Webscope Program: yahoo! altavista web page hyperlink connectivity graph. https://webscope.sandbox.yahoo.com/.Google ScholarGoogle Scholar
  2. 2009. The lemur project: Clueweb09 web graph. http://www.lemurproject.org/clueweb09.Google ScholarGoogle Scholar
  3. 2012. The lemur project: Clueweb12 web graph. http://www.lemurproject.org/clueweb12.Google ScholarGoogle Scholar
  4. 2012. MPI: A Message-Passing Interface Standard Version 3.0. http://mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf .Google ScholarGoogle Scholar
  5. Foto N Afrati, Dimitris Fotakis, and Jeffrey D Ullman. 2013. Enumerating subgraph instances using map-reduce. In Data Engineering (ICDE), 2013 IEEE 29th International Conference on. IEEE, 62--73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Carsten Binnig, Andrew Crotty, Alex Galakatos, Tim Kraska, and Erfan Zamanian. 2016. The end of slow networks: It's time for a redesign. Proceedings of the VLDB Endowment 9, 7 (2016), 528--539. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. John Adrian Bondy, Uppaluri Siva Ramachandra Murty, et al. 1976. Graph theory with applications. Vol. 290. Citeseer.Google ScholarGoogle Scholar
  8. Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A recursive model for graph mining. In Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM, 442--446.Google ScholarGoogle ScholarCross RefCross Ref
  9. Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. 2015 Powerlyra: Differentiated graph computation and partitioning on skewed graphs. In Proceedings of the Tenth European Conference on Computer Systems. ACM, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis,and Sambavi Muthukrishnan. 2015. One trillion edges: Graph processing at facebook-scale. Proceedings of the VLDB Endowment 8, 12 (2015), 1804--1815. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Wenfei Fan, Jingbo Xu, Yinghui Wu, Wenyuan Yu, Jiaxin Jiang, Zeyu Zheng, Bohan Zhang, Yang Cao, and Chao Tian. 2017. Parallelizing sequential graph computations. In Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 495--510. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Joseph E Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs.. In OSDI, Vol. 12. 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Joseph E Gonzalez, Reynold S Xin, Ankur Dave, Daniel Crankshaw, Michael J Franklin, and Ion Stoica. 2014. GraphX: Graph Processing in a Distributed Dataflow Framework.. In OSDI, Vol. 14. 599--613. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Wook-Shin Han, Sangyeon Lee, Kyungyeol Park, Jeong-Hoon Lee, Min-Soo Kim, Jinha Kim, and Hwanjo Yu. 2013. TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 77--85. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Nilesh Jain, Guangdeng Liao, and Theodore L Willke. 2013 Graphbuilder: scalable graph etl framework. In First International Workshop on Graph Data Management Experiences and Systems. ACM, 4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Arijit Khan. 2017. Vertex-Centric Graph Processing: The Good, the Bad, and the Ugly. Proceedings of the 20th International Conference on Extending Database Technology (2017).Google ScholarGoogle Scholar
  17. Ajay D Kshemkalyani and Mukesh Singhal. 2011. Distributed computing: principles, algorithms, and systems. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In Proceedings of the 19th international conference on World wide web . ACM, 591--600. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Aapo Kyrola, Guy E Blelloch, and Carlos Guestrin. 2012. Graphchi: Large-scale graph computation on just a pc. USENIX.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Longbin Lai, Lu Qin, Xuemin Lin, and Lijun Chang. 2015. Scalable subgraph enumeration in mapreduce. Proceedings of the VLDB Endowment 8, 10 (2015), 974--985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science 407, 1--3 (2008), 458--473. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M Hellerstein. 2012. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment 5, 8 (2012), 716--727. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Yi Lu, James Cheng, Da Yan, and Huanhuan Wu. 2014. Large-scale distributed graph computing systems: An experimental evaluation. Proceedings of the VLDB Endowment 8, 3 (2014), 281--292. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, 135--146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. R Meusel, O Lehmberg, C Bizer, and S Vigna. 2014. Web data commons-hyperlink graphs. http://webdatacommons.org/hyperlinkgraph/.Google ScholarGoogle Scholar
  26. Himchan Park and Min-Soo Kim. 2017. TrillionG: A Trillion-scale Synthetic Graph Generator using a Recursive Vector Model. In Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 913--928. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Ha-Myung Park, Sung-Hyon Myaeng, and U Kang. 2016. Pte: Enumerating trillion triangles on distributed systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1115--1124. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Todd Plantenga. 2013. Inexact subgraph isomorphism in MapReduce. J. Parallel and Distrib. Comput. 73, 2 (2013), 164--175. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Abdul Quamar, Amol Deshpande, and Jimmy Lin. 2014. NScale: neighborhood-centric analytics on large graphs. Proceedings of the VLDB Endowment 7, 13 (2014), 1673--1676. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. 2015. Chaos: Scale-out graph processing from secondary storage. In Proceedings of the 25th Symposium on Operating Systems Principles. ACM, 410--424. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-Stream: edge-centric graph processing using streaming partitions. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 472--488. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Jiwon Seo, Jongsoo Park, Jaeho Shin, and Monica S Lam. 2013. Distributed socialite: a datalog-based language for large-scale graph analysis. Proceedings of the VLDB Endowment 6, 14 (2013), 1906--1917. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Yingxia Shao, Bin Cui, Lei Chen, Lin Ma, Junjie Yao, and Ning Xu.2014. Parallel subgraph listing in a large-scale graph. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 625--636. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Isabelle Stanton and Gabriel Kliot. 2012. Streaming graph partitioning for large distributed graphs. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1222--1230. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, and John McPherson. 2013. From think like a vertex to think like a graph. Proceedings of the VLDB Endowment 7, 3 (2013), 193--204. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Charalampos Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, and Milan Vojnovic. 2014. Fennel: Streaming graph partitioning for massive scale graphs. In Proceedings of the 7th ACM international conference on Web search and data mining. ACM, 333--342. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Zhigang Wang, Yu Gu, Yubin Bao, Ge Yu, and Jeffrey Xu Yu. 2016. Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing. In Proceedings of the 2016 International Conference on Management of Data. ACM, 479--494. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Wenlei Xie, Guozhang Wang, David Bindel, Alan Demers, and Johannes Gehrke. 2013. Fast iterative graph computation with block updates. Proceedings of the VLDB Endowment 6, 14 (2013), 2014--2025. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Da Yan, Yingyi Bu, Yuanyuan Tian, Amol Deshpande, and James Cheng. 2016. Big graph analytics systems. In Proceedings of the 2016 International Conference on Management of Data. ACM, 2241--2243. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2014. Blogel: A block-centric framework for distributed computation on real-world graphs. Proceedings of the VLDB Endowment 7, 14 (2014), 1981--1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2015. Effective techniques for message reduction and load balancing in distributed graph computation. In Proceedings of the 24th International Conference on World Wide Web. ACM, 1307--1317. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Makoto Yui, Jun Miyazaki, Shunsuke Uemura, and Hayato Yamana. 2010. Nb-GCLOCK: A non-blocking buffer management based on the generalized CLOCK. In Data Engineering (ICDE), 2010 IEEE 26th International Conference on. IEEE, 745--756.Google ScholarGoogle ScholarCross RefCross Ref
  43. Chang Zhou, Jun Gao, Binbin Sun, and Jeffrey Xu Yu. 2014. MOCgraph: Scalable distributed graph processing using message online computing. Proceedings of the VLDB Endowment 8, 4 (2014), 377--388. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A computation-centric distributed graph processing system. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)(Savannah, GA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning.. In USENIX Annual Technical Conference. 375--386 Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. TurboGraph++: A Scalable and Fast Graph Analytics System

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
        May 2018
        1874 pages
        ISBN:9781450347037
        DOI:10.1145/3183713

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 27 May 2018

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        SIGMOD '18 Paper Acceptance Rate90of461submissions,20%Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader