research-article

TurboGraph++: A Scalable and Fast Graph Analytics System

Authors:
Seongyun Ko

POSTECH, Pohang, Rebublic of Korea

POSTECH, Pohang, Rebublic of Korea
View Profile

,
Wook-Shin Han

POSTECH, Pohang, Rebublic of Korea

POSTECH, Pohang, Rebublic of Korea
View Profile

SIGMOD '18: Proceedings of the 2018 International Conference on Management of DataMay 2018Pages 395–410https://doi.org/10.1145/3183713.3196915

Published:27 May 2018Publication History

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

Pages 395–410

ABSTRACT

Existing distributed graph analytics systems are categorized into two main groups: those that focus on efficiency with a risk of out-of-memory error and those that focus on scale-up with a fixed memory budget and a sacrifice in performance. While the former group keeps a partitioned graph resident in memory of each machine and uses an in-memory processing technique, the latter stores the partitioned graph in external memory of each machine and exploits a streaming processing technique. Gemini and Chaos are the state-of-the-art distributed graph systems in each group, respectively.

We present TurboGraph++, a scalable and fast graph analytics system which efficiently processes large graphs by exploiting external memory for scale-up without compromising efficiency. First, TurboGraph++ provides a new graph processing abstraction for efficiently supporting neighborhood analytics that requires processing multi-hop neighborhoods of vertices, such as triangle counting and local clustering coefficient computation, with a fixed memory budget. Second, TurboGraph++ provides a balanced and buffer-aware partitioning scheme for ensuring balanced workloads across machines with reasonable cost. Lastly, TurboGraph++ leverages three-level parallel and overlapping processing for fully utilizing three hardware resources, CPU, disk, and network, in a cluster. Extensive experiments show that TurboGraph++ is designed to scale well to very large graphs, like Chaos, while its performance is comparable to Gemini.

References

2002. The Yahoo Webscope Program: yahoo! altavista web page hyperlink connectivity graph. https://webscope.sandbox.yahoo.com/.Google Scholar
2009. The lemur project: Clueweb09 web graph. http://www.lemurproject.org/clueweb09.Google Scholar
2012. The lemur project: Clueweb12 web graph. http://www.lemurproject.org/clueweb12.Google Scholar
2012. MPI: A Message-Passing Interface Standard Version 3.0. http://mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf .Google Scholar
Foto N Afrati, Dimitris Fotakis, and Jeffrey D Ullman. 2013. Enumerating subgraph instances using map-reduce. In Data Engineering (ICDE), 2013 IEEE 29th International Conference on. IEEE, 62--73. Google ScholarDigital Library
Carsten Binnig, Andrew Crotty, Alex Galakatos, Tim Kraska, and Erfan Zamanian. 2016. The end of slow networks: It's time for a redesign. Proceedings of the VLDB Endowment 9, 7 (2016), 528--539. Google ScholarDigital Library
John Adrian Bondy, Uppaluri Siva Ramachandra Murty, et al. 1976. Graph theory with applications. Vol. 290. Citeseer.Google Scholar
Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A recursive model for graph mining. In Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM, 442--446.Google ScholarCross Ref
Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. 2015 Powerlyra: Differentiated graph computation and partitioning on skewed graphs. In Proceedings of the Tenth European Conference on Computer Systems. ACM, 1. Google ScholarDigital Library
Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis,and Sambavi Muthukrishnan. 2015. One trillion edges: Graph processing at facebook-scale. Proceedings of the VLDB Endowment 8, 12 (2015), 1804--1815. Google ScholarDigital Library
Wenfei Fan, Jingbo Xu, Yinghui Wu, Wenyuan Yu, Jiaxin Jiang, Zeyu Zheng, Bohan Zhang, Yang Cao, and Chao Tian. 2017. Parallelizing sequential graph computations. In Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 495--510. Google ScholarDigital Library
Joseph E Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs.. In OSDI, Vol. 12. 2. Google ScholarDigital Library
Joseph E Gonzalez, Reynold S Xin, Ankur Dave, Daniel Crankshaw, Michael J Franklin, and Ion Stoica. 2014. GraphX: Graph Processing in a Distributed Dataflow Framework.. In OSDI, Vol. 14. 599--613. Google ScholarDigital Library
Wook-Shin Han, Sangyeon Lee, Kyungyeol Park, Jeong-Hoon Lee, Min-Soo Kim, Jinha Kim, and Hwanjo Yu. 2013. TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 77--85. Google ScholarDigital Library
Nilesh Jain, Guangdeng Liao, and Theodore L Willke. 2013 Graphbuilder: scalable graph etl framework. In First International Workshop on Graph Data Management Experiences and Systems. ACM, 4. Google ScholarDigital Library
Arijit Khan. 2017. Vertex-Centric Graph Processing: The Good, the Bad, and the Ugly. Proceedings of the 20th International Conference on Extending Database Technology (2017).Google Scholar
Ajay D Kshemkalyani and Mukesh Singhal. 2011. Distributed computing: principles, algorithms, and systems. Cambridge University Press. Google ScholarDigital Library
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In Proceedings of the 19th international conference on World wide web . ACM, 591--600. Google ScholarDigital Library
Aapo Kyrola, Guy E Blelloch, and Carlos Guestrin. 2012. Graphchi: Large-scale graph computation on just a pc. USENIX.Google ScholarDigital Library
Longbin Lai, Lu Qin, Xuemin Lin, and Lijun Chang. 2015. Scalable subgraph enumeration in mapreduce. Proceedings of the VLDB Endowment 8, 10 (2015), 974--985. Google ScholarDigital Library
Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science 407, 1--3 (2008), 458--473. Google ScholarDigital Library
Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M Hellerstein. 2012. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment 5, 8 (2012), 716--727. Google ScholarDigital Library
Yi Lu, James Cheng, Da Yan, and Huanhuan Wu. 2014. Large-scale distributed graph computing systems: An experimental evaluation. Proceedings of the VLDB Endowment 8, 3 (2014), 281--292. Google ScholarDigital Library
Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, 135--146. Google ScholarDigital Library
R Meusel, O Lehmberg, C Bizer, and S Vigna. 2014. Web data commons-hyperlink graphs. http://webdatacommons.org/hyperlinkgraph/.Google Scholar
Himchan Park and Min-Soo Kim. 2017. TrillionG: A Trillion-scale Synthetic Graph Generator using a Recursive Vector Model. In Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 913--928. Google ScholarDigital Library
Ha-Myung Park, Sung-Hyon Myaeng, and U Kang. 2016. Pte: Enumerating trillion triangles on distributed systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1115--1124. Google ScholarDigital Library
Todd Plantenga. 2013. Inexact subgraph isomorphism in MapReduce. J. Parallel and Distrib. Comput. 73, 2 (2013), 164--175. Google ScholarDigital Library
Abdul Quamar, Amol Deshpande, and Jimmy Lin. 2014. NScale: neighborhood-centric analytics on large graphs. Proceedings of the VLDB Endowment 7, 13 (2014), 1673--1676. Google ScholarDigital Library
Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. 2015. Chaos: Scale-out graph processing from secondary storage. In Proceedings of the 25th Symposium on Operating Systems Principles. ACM, 410--424. Google ScholarDigital Library
Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-Stream: edge-centric graph processing using streaming partitions. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 472--488. Google ScholarDigital Library
Jiwon Seo, Jongsoo Park, Jaeho Shin, and Monica S Lam. 2013. Distributed socialite: a datalog-based language for large-scale graph analysis. Proceedings of the VLDB Endowment 6, 14 (2013), 1906--1917. Google ScholarDigital Library
Yingxia Shao, Bin Cui, Lei Chen, Lin Ma, Junjie Yao, and Ning Xu.2014. Parallel subgraph listing in a large-scale graph. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 625--636. Google ScholarDigital Library
Isabelle Stanton and Gabriel Kliot. 2012. Streaming graph partitioning for large distributed graphs. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1222--1230. Google ScholarDigital Library
Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, and John McPherson. 2013. From think like a vertex to think like a graph. Proceedings of the VLDB Endowment 7, 3 (2013), 193--204. Google ScholarDigital Library
Charalampos Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, and Milan Vojnovic. 2014. Fennel: Streaming graph partitioning for massive scale graphs. In Proceedings of the 7th ACM international conference on Web search and data mining. ACM, 333--342. Google ScholarDigital Library
Zhigang Wang, Yu Gu, Yubin Bao, Ge Yu, and Jeffrey Xu Yu. 2016. Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing. In Proceedings of the 2016 International Conference on Management of Data. ACM, 479--494. Google ScholarDigital Library
Wenlei Xie, Guozhang Wang, David Bindel, Alan Demers, and Johannes Gehrke. 2013. Fast iterative graph computation with block updates. Proceedings of the VLDB Endowment 6, 14 (2013), 2014--2025. Google ScholarDigital Library
Da Yan, Yingyi Bu, Yuanyuan Tian, Amol Deshpande, and James Cheng. 2016. Big graph analytics systems. In Proceedings of the 2016 International Conference on Management of Data. ACM, 2241--2243. Google ScholarDigital Library
Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2014. Blogel: A block-centric framework for distributed computation on real-world graphs. Proceedings of the VLDB Endowment 7, 14 (2014), 1981--1992. Google ScholarDigital Library
Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2015. Effective techniques for message reduction and load balancing in distributed graph computation. In Proceedings of the 24th International Conference on World Wide Web. ACM, 1307--1317. Google ScholarDigital Library
Makoto Yui, Jun Miyazaki, Shunsuke Uemura, and Hayato Yamana. 2010. Nb-GCLOCK: A non-blocking buffer management based on the generalized CLOCK. In Data Engineering (ICDE), 2010 IEEE 26th International Conference on. IEEE, 745--756.Google ScholarCross Ref
Chang Zhou, Jun Gao, Binbin Sun, and Jeffrey Xu Yu. 2014. MOCgraph: Scalable distributed graph processing using message online computing. Proceedings of the VLDB Endowment 8, 4 (2014), 377--388. Google ScholarDigital Library
Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A computation-centric distributed graph processing system. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)(Savannah, GA. Google ScholarDigital Library
Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning.. In USENIX Annual Technical Conference. 375--386 Google ScholarDigital Library

Index Terms

TurboGraph++: A Scalable and Fast Graph Analytics System
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs

Recommendations

TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC
KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale graph structured data. Although distributed graph engines such as GBase ...
Read More
A Cohesive Structure Based Bipartite Graph Analytics System
CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Bipartite graphs arise naturally when modeling two different types of entities such as user-item, author-paper, and director-board. In recent years, driven by numerous real-world applications in these networks, mining cohesive structures in bipartite ...
Read More
PFFS: a scalable flash memory file system for the hybrid architecture of phase-change RAM and NAND flash
SAC '08: Proceedings of the 2008 ACM symposium on Applied computing

In this paper, we present the scalable and efficient flash file system using the combination of NAND and Phase-change RAM (PRAM). Until now, several flash file systems have been developed considering the physical characteristics of NAND flash. However, ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN:9781450347037
DOI:10.1145/3183713
General Chairs:
Gautam Das
University of Texas at Arlington, USA
,
Christopher Jermaine
Rice University, USA
,
Philip Bernstein
Microsoft Research, USA
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 May 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
disk-based system
distributed system
graph analytics system
scalability
Qualifiers
- research-article
Conference

Acceptance Rates
SIGMOD '18 Paper Acceptance Rate90of461submissions,20%Overall Acceptance Rate785of4,003submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 33
  Total Citations
  View Citations
- 1,786
  Total Downloads
- Downloads (Last 12 months)81
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

TurboGraph++: A Scalable and Fast Graph Analytics System

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC

A Cohesive Structure Based Bipartite Graph Analytics System

PFFS: a scalable flash memory file system for the hybrid architecture of phase-change RAM and NAND flash

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

TurboGraph++: A Scalable and Fast Graph Analytics System

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC

A Cohesive Structure Based Bipartite Graph Analytics System

PFFS: a scalable flash memory file system for the hybrid architecture of phase-change RAM and NAND flash

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media