research-article

TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC

Authors:
Wook-Shin Han

POSTECH, Pohang, South Korea

POSTECH, Pohang, South Korea
View Profile

,
Sangyeon Lee

POSTECH, Pohang, South Korea

POSTECH, Pohang, South Korea
View Profile

,
Kyungyeol Park

POSTECH, Pohang, South Korea

POSTECH, Pohang, South Korea
View Profile

,
Jeong-Hoon Lee

POSTECH, Pohang, South Korea

POSTECH, Pohang, South Korea
View Profile

,
Min-Soo Kim

DGIST, Deagu, South Korea

DGIST, Deagu, South Korea
View Profile

,
Jinha Kim

POSTECH, Pohang, South Korea

POSTECH, Pohang, South Korea
View Profile

,
Hwanjo Yu

POSTECH, Pohang, South Korea

POSTECH, Pohang, South Korea
View Profile

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data miningAugust 2013Pages 77–85https://doi.org/10.1145/2487575.2487581

Published:11 August 2013Publication History

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 77–85

ABSTRACT

Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale graph structured data. Although distributed graph engines such as GBase and Pregel handle billion-scale graphs, the user needs to be skilled at managing and tuning a distributed system in a cluster, which is a nontrivial job for the ordinary user. Furthermore, these distributed systems need many machines in a cluster in order to provide reasonable performance. In order to address this problem, a disk-based parallel graph engine called Graph-Chi, has been recently proposed. Although Graph-Chi significantly outperforms all representative (disk-based) distributed graph engines, we observe that Graph-Chi still has serious performance problems for many important types of graph queries due to 1) limited parallelism and 2) separate steps for I/O processing and CPU processing. In this paper, we propose a general, disk-based graph engine called TurboGraph to process billion-scale graphs very efficiently by using modern hardware on a single PC. TurboGraph is the first truly parallel graph engine that exploits 1) full parallelism including multi-core parallelism and FlashSSD IO parallelism and 2) full overlap of CPU processing and I/O processing as much as possible. Specifically, we propose a novel parallel execution model, called pin-and-slide. TurboGraph also provides engine-level operators such as BFS which are implemented under the pin-and-slide model. Extensive experimental results with large real datasets show that TurboGraph consistently and significantly outperforms Graph-Chi by up to four orders of magnitude! Our implementation of TurboGraph is available at ``http://wshan.net/turbograph}" as executable files.

References

Yahoo webscope. yahoo! altavista web page hyperlink connectivity graph. http://webscope.sandbox.yahoo.com.Google Scholar
L. Addario-Berry, W. S. Kennedy, A. D. King, Z. Li, and B. A. Reed. Finding a maximum-weight induced k-partite subgraph of an i-triangulated graph. Discrete Applied Mathematics, 158(7):765--770, 2010. Google ScholarDigital Library
L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD, pages 44--54. ACM, 2006. Google ScholarDigital Library
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: distributed graph-parallel computation on natural graphs. In OSDI, pages 17--30, 2012. Google ScholarDigital Library
W.-S. Han, J. Lee, and J.-H. Lee. TurboISO: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In SIGMOD, 2013. Google ScholarDigital Library
W.-S. Han, J. Lee, M.-D. Pham, and J. X. Yu. igraph: a framework for comparisons of disk-based graph indexing techniques. Proc. VLDB Endow., 3(1--2):449--459, Sept. 2010. Google ScholarDigital Library
S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. Green-marl: a dsl for easy and efficient graph analysis. In ASPLOS, pages 349--362, 2012. Google ScholarDigital Library
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys, pages 59--72, 2007. Google ScholarDigital Library
U. Kang, H. Tong, J. Sun, C.-Y. Lin, and C. Faloutsos. Gbase: a scalable and general graph management system. In KDD, pages 1091--1099, 2011. Google ScholarDigital Library
U. Kang, H. Tong, J. Sun, C.-Y. Lin, and C. Faloutsos. gbase: an efficient analysis platform for large graphs. VLDB J., 21(5):637--650, 2012. Google ScholarDigital Library
U. Kang, C. E. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system - implementation and observations. In ICDM, pages 229--238, 2009. Google ScholarDigital Library
G. Karypis and V. Kumar. Parallel multilevel k-way partitioning for irregular graphs. SIAM Review, 41(2):278--300, 1999. Google ScholarDigital Library
H. Kwak, C. Lee, H. Park, and S. B. Moon. What is twitter, a social network or a news media? In WWW, pages 591--600, 2010. Google ScholarDigital Library
A. Kyrola, G. Blelloch, and C. Guestrin. Graphchi: large-scale graph computation on just a pc. In OSDI, pages 31--46, 2012. Google ScholarDigital Library
J. Lee, W.-S. Han, R. Kasperovics, and J.-H. Lee. An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB, 6(2):133--144, 2012. Google ScholarDigital Library
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed graphlab: A framework for machine learning in the cloud. PVLDB, 5(8):716--727, 2012. Google ScholarDigital Library
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, pages 135--146, 2010. Google ScholarDigital Library
H. Maserrat and J. Pei. Neighbor query friendly compression of social networks. In KDD, pages 533--542, 2010. Google ScholarDigital Library
B. Shao, H. Wang, and Y. Li. The trinity graph engine. Technical Report 161291, Microsoft Research, 2012.Google Scholar
Y. Shiloach and U. Vishkin. An o(logn) parallel connectivity algorithm. Journal of Algorithms, 3(1):57--67, 1982. Google ScholarDigital Library
M. Stonebraker et al. C-store: a column-oriented dbms. In VLDB, pages 553--564, 2005. Google ScholarDigital Library
X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004. Google ScholarDigital Library
P. Zhao and J. Han. On graph query optimization in large networks. PVLDB, 3(1):340--351, 2010. Google ScholarDigital Library

Index Terms

TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs
  2. Information retrieval
    1. Information retrieval query processing

Recommendations

TurboGraph++: A Scalable and Fast Graph Analytics System
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

Existing distributed graph analytics systems are categorized into two main groups: those that focus on efficiency with a risk of out-of-memory error and those that focus on scale-up with a fixed memory budget and a sacrifice in performance. While the ...
Read More
iGraph: an incremental data processing system for dynamic graph

With the popularity of social network, the demand for real-time processing of graph data is increasing. However, most of the existing graph systems adopt a batch processing mode, therefore the overhead of maintaining and processing of dynamic graph is ...
Read More
GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems
IPDPSW '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop

Recent work on graph analytics has sought to leverage the high performance offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithm and limitations in GPU-resident memory for storing large graphs. The Graph ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2013
1534 pages
ISBN:9781450321747
DOI:10.1145/2487575
Editors:
Rayid Ghani
University of Chicago
,
Ted E. Senator
SAIC
,
Paul Bradley
MethodCare, Inc.
,
Rajesh Parekh
Groupon
,
Jingrui He
Stevens Institute of Technology
,
General Chairs:
Robert L. Grossman
University of Chicago and Open Data Group
,
Ramasamy Uthurusamy
General Motors Corporation (retired)
,
Program Chairs:
Inderjit S. Dhillon
University of Texas
,
Yehuda Koren
Google
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 11 August 2013
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
big data
graph processing
parallelism
pin-and-slide
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '13 Paper Acceptance Rate125of726submissions,17%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 176
  Total Citations
  View Citations
- 3,241
  Total Downloads
- Downloads (Last 12 months)92
- Downloads (Last 6 weeks)10
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

TurboGraph++: A Scalable and Fast Graph Analytics System

iGraph: an incremental data processing system for dynamic graph

GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

TurboGraph++: A Scalable and Fast Graph Analytics System

iGraph: an incremental data processing system for dynamic graph

GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media