research-article

Gunrock: GPU Graph Analytics

Authors:
Yangzihao Wang

University of California, Davis, CA

University of California, Davis, CA
View Profile

,
Yuechao Pan

University of California, Davis, CA

University of California, Davis, CA
View Profile

,
Andrew Davidson

University of California, Davis, CA

University of California, Davis, CA
View Profile

,
Yuduo Wu

University of California, Davis, CA

University of California, Davis, CA
View Profile

,
Carl Yang

University of California, Davis, CA

University of California, Davis, CA
View Profile

,
Leyuan Wang

University of California, Davis, CA

University of California, Davis, CA
View Profile

,
Muhammad Osama

University of California, Davis, CA

University of California, Davis, CA
View Profile

,
Chenshan Yuan

University of California, Davis, CA

University of California, Davis, CA
View Profile

,
Weitang Liu

University of California, Davis, CA

University of California, Davis, CA
View Profile

,
Andy T. Riffel

University of California, Davis, CA

University of California, Davis, CA
View Profile

,
John D. Owens

University of California, Davis, CA

University of California, Davis, CA
View Profile

Authors Info & Claims

ACM Transactions on Parallel Computing Volume 4 Issue 1Article No.: 3pp 1–49https://doi.org/10.1145/3108140

Published:23 August 2017Publication History

ACM Transactions on Parallel Computing

Abstract

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. “Gunrock,” our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock’s overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries, such as Ligra and Galois, and better performance than any other GPU high-level graph library.

References

Christopher R. Aberger, Andres Nötzli, Kunle Olukotun, and Christopher Ré. 2015. EmptyHeaded: Boolean algebra based graph processing. CoRR abs/1503.02368 (2015). http://arxiv.org/abs/1503.02368Google Scholar
Saman Ashkiani, Andrew A. Davidson, Ulrich Meyer, and John D. Owens. 2016. GPU multisplit. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’16). 12:1--12:13. Google ScholarDigital Library
Jiri Barnat, Petr Bauch, Lubos Brim, and Milan Ceska. 2011. Computing strongly connected components in parallel on CUDA. In Proceedings of the 2011 IEEE International Parallel 8 Distributed Processing Symposium (IPDPS’11). IEEE Computer Society, Washington, DC, 544--555. Google ScholarDigital Library
Sean Baxter. 2013. Modern GPU Multisets. Retrieved from https://nvlabs.github.io/moderngpu/sets.html.Google Scholar
Sean Baxter. 2013--2016. Moderngpu: Patterns and Behaviors for GPU Computing. Retrieved from http://moderngpu.github.io/moderngpu.Google Scholar
Scott Beamer, Krste Asanovi´, and David Patterson. 2012. Direction-optimizing breadth-first search. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC’12). Article 12, 10 pages. Google ScholarDigital Library
Markus Billeter, Ola Olsson, and Ulf Assarsson. 2009. Efficient stream compaction on wide SIMD many-core architectures. In Proceedings of the Conference on High Performance Graphics 2009 (HPG’09). ACM, New York, NY, 159--166. Google ScholarDigital Library
Ulrik Brandes. 2001. A faster algorithm for betweenness centrality. J. Math. Sociol. 25, 2 (2001), 163--177.Google ScholarCross Ref
M. Bröcheler, A. Pugliese, and V. S. Subrahmanian. 2010. COSI: Cloud oriented subgraph identification in massive social networks. In Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM’10). IEEE Computer Society, Washington, DC, USA, 248--255. Google ScholarDigital Library
Martin Burtscher, Rupesh Nasre, and Keshav Pingali. 2012. A quantitative study of irregular programs on GPUs. In Proceedings of the 2012 IEEE International Symposium on Workload Characterization (IISWC) (IISWC’12). IEEE Computer Society, Washington, DC, 141--151. Google ScholarDigital Library
Martin Burtscher, Rupesh Nasre, and Keshav Pingali. 2012. A quantitative study of irregular programs on GPUs. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC’12). 141--151. Google ScholarDigital Library
F. Busato and N. Bombieri. 2015. BFS-4K: An efficient implementation of BFS for kepler GPU architectures. IEEE Trans. Parallel Distrib. Syst. 26, 7 (July 2015), 1826--1838. 1045-9219Google ScholarCross Ref
Daniel Cederman and Philippas Tsigas. 2008. On dynamic load-balancing on graphics processors. In Graphics Hardware 2008. 57--64. Google ScholarDigital Library
Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. 2016. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’16). IEEE Computer Society, Washington, DC.Google Scholar
Jonathan Cohen and Patrice CastonGuay. 2012. Efficient graph matching and coloring on the GPU. GPU Technology Conference. Retrieved from http://on-demand.gputechconf.com/gtc/2012/presentations/S0332-Efficient-Graph-Matching-and-Coloring-on-GPUs.pdf.Google Scholar
E. Cuthill and J. McKee. 1969. Reducing the bandwidth of sparse symmetric matrices. In Proceedings of the 1969 24th National Conference (ACM’69). ACM, New York, NY, 157--172. Google ScholarDigital Library
Andrew Davidson, Sean Baxter, Michael Garland, and John D. Owens. 2014. Work-efficient parallel GPU methods for single source shortest paths. In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS’14). 349--359. Google ScholarDigital Library
Daniel Delling, Andrew V. Goldberg, Andreas Nowatzyk, and Renato F. Werneck. 2010. PHAST: Hardware-accelerated shortest path trees. J. Parallel and Distrib. Comput. 73 (Sept. 2010), 940--952.Google Scholar
Erich Elsen and Vishal Vaidyanathan. 2013. A vertex-centric CUDA/C++ API for large graph analytics on GPUs using the Gather-Apply-Scatter abstraction. Retrieved from http://www.github.com/RoyalCaliber/vertexAPI2.Google Scholar
Zhisong Fu, Michael Personick, and Bryan Thompson. 2014. MapGraph: A high level API for fast development of high performance graph analytics on GPUs. In Proceedings of the Workshop on GRAph Data Management Experiences and Systems (GRADES’14). Article 2, 6 pages. Google ScholarDigital Library
Afton Geil, Yangzihao Wang, and John D. Owens. 2014. WTF, GPU&excl; Computing twitter’s who-to-follow on the GPU. In Proceedings of the Second ACM Conference on Online Social Networks (COSN’14). 63--68. Google ScholarDigital Library
Abdullah Gharaibeh, Tahsin Reza, Elizeu Santos-Neto, Lauro Beltrao Costa, Scott Sallinen, and Matei Ripeanu. 2014. Efficient large-scale graph processing on hybrid CPU and GPU systems. CoRR abs/1312.3018, 1312.3018v2 (Dec. 2014). arXiv:1312.3018v2Google Scholar
Ashish Goel, Pankaj Gupta, John Sirois, Dong Wang, Aneesh Sharma, and Siva Gurumurthy. 2015. The who-to-follow system at twitter: Strategy, algorithms, and revenue impact. Interfaces 45, 1 (Feb. 2015), 98--107. 0092-2102 Google ScholarDigital Library
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, 17--30. Google ScholarDigital Library
Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: Graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI’14). USENIX Association, Berkeley, CA, 599--613. Retrieved from http://dl.acm.org/citation.cfm?id=2685048.2685096 Google ScholarDigital Library
Oded Green, Robert McColl, and David A. Bader. 2012. GPU merge path: A GPU merging algorithm. In Proceedings of the 26th ACM International Conference on Supercomputing (ICS’12). ACM, New York, NY, 331--340. Google ScholarDigital Library
Oded Green, Lluís-Miquel Munguía, and David A. Bader. 2014. Load balanced clustering coefficients. In Proceedings of the 1st Workshop on Parallel Programming for Analytics Applications (PPAA’14). 3--10. Google ScholarDigital Library
Oded Green, Pavan Yalamanchili, and Lluís-Miquel Munguía. 2014. Fast triangle counting on the GPU. In Proceedings of the 4th Workshop on Irregular Applications: Architectures and Algorithms (IA3’14). 1--8. Google ScholarDigital Library
Douglas Gregor and Andrew Lumsdaine. 2005. The parallel BGL: A generic library for distributed graph computations. In Proceedings of the Conference on Parallel Object-Oriented Scientific Computing (POOSC’05).Google Scholar
John Greiner. 1994. A comparison of parallel algorithms for connected components. In Proceedings of the 6th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’94). 16--25. Google ScholarDigital Library
Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. 2013. WTF: The who to follow service at twitter. In Proceedings of the International Conference on the World Wide Web. 505--514. Google ScholarDigital Library
Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. 2013. Turboiso: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, 337--348. Google ScholarDigital Library
Pawan Harish and P. J. Narayanan. 2007. Accelerating large graph algorithms on the GPU using CUDA. In Proceedings of the 14th International Conference on High Performance Computing (HiPC’07). Springer-Verlag, Berlin, 197--208. Google ScholarDigital Library
Mark Harris, John D. Owens, Shubho Sengupta, Yao Zhang, and Andrew Davidson. 2009--2016. CUDPP: CUDA Data Parallel Primitives Library. (2009--2016). Retrieved from http://cudpp.github.io/.Google Scholar
Zhengyu He and Bo Hong. 2010. Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-hybrid platforms. In Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing (IPDPS’10). Georgia Institute of Technology, Atlanta, United States.Google Scholar
Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. 2012. Green-marl: A DSL for easy and efficient graph analysis. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’12). 349--362. Google ScholarDigital Library
Sungpack Hong, Sang Kyun Kim, Tayo Oguntebi, and Kunle Olukotun. 2011. Accelerating CUDA graph algorithms at maximum warp. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP’11). 267--276. Google ScholarDigital Library
Yuntao Jia, Victor Lu, Jared Hoberock, Michael Garland, and John C. Hart. 2011. Edge v. node parallelism for graph centrality metrics. In GPU Computing Gems Jade Edition, Wen-mei W. Hwu (Ed.). Morgan Kaufmann, Chapter 2, 15--28.Google Scholar
Rashid Kaleem, Sreepathi Pai, and Keshav Pingali. 2015. Stochastic gradient descent on GPUs. In Proceedings of the 8th Workshop on General Purpose Processing Using GPUs (GPGPU’15). ACM, New York, NY, 81--89. Google ScholarDigital Library
Stephen W. Keckler, William J. Dally, Brucek Khailany, Michael Garland, and David Glasco. 2011. GPUs and the future of parallel computing. IEEE Micro 31, 5 (Sept. 2011), 7--17. Google ScholarDigital Library
Jeremy Kepner, Peter Aaltonen, David Bader, Ayd&;inodot;n Bulu&ccdeil;, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, Henning Meyerhenke, Scott McMillan, Jose Moreira, John D. Owens, Carl Yang, Marcin Zalewski, and Timothy Mattson. 2016. Mathematical foundations of the graphBLAS. In Proceedings of the IEEE High Performance Extreme Computing Conference.Google ScholarCross Ref
Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha: Vertex-centric graph processing on GPUs. In Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing (HPDC8rsquo;14). 239--252. Google ScholarDigital Library
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is twitter, a social network or a news media? In Proceedings of the International Conference on the World Wide Web. 591--600. Google ScholarDigital Library
Aapo Kyrola, Guy Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-scale graph computation on just a PC. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, Berkeley, CA, 31--46. http://dl.acm.org/citation.cfm?id=2387880.2387884 Google ScholarDigital Library
Jure Leskovec. 2009--2016. SNAP: Stanford Large Network Dataset Collection (2009--2016). Retrieved from http://snap.stanford.edu/data/.Google Scholar
Jure Leskovec and Rok Sosič. 2016. SNAP: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 1 (2016), 1. Google ScholarDigital Library
Hang Liu and H. Howie Huang. 2015. Enterprise: Breadth-first graph traversal on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15). ACM, New York, NY, Article 68, 12 pages. Google ScholarDigital Library
Hang Liu, H. Howie Huang, and Yang Hu. 2016. iBFS: Concurrent breadth-first search on GPUs. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD’16). ACM, New York, NY, 403--416. Google ScholarDigital Library
Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein. 2010. GraphLab: A new parallel framework for machine learning. In Proceedings of the 26th Annual Conference on Uncertainty in Artificial Intelligence (UAI’10). 340--349. Google ScholarDigital Library
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD’10). 135--146. Google ScholarDigital Library
Robert Campbell McColl, David Ediger, Jason Poovey, Dan Campbell, and David A. Bader. 2014. A performance evaluation of open source graph databases. In Proceedings of the 1rst Workshop on Parallel Programming for Analytics Applications (PPAA’14). 11--18. Google ScholarDigital Library
Adam McLaughlin and David A. Bader. 2014. Scalable and high performance betweenness centrality on the GPU. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). 572--583. Google ScholarDigital Library
A. McLaughlin and D. A. Bader. 2015. Fast execution of simultaneous breadth-first searches on sparse graphs. In Proceedings of the IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS’15). 9--18. Google ScholarDigital Library
A. McLaughlin, J. Riedy, and D. A. Bader. 2015. A fast, energy-efficient abstraction for simultaneous breadth-first searches. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC’15). 1--6.Google Scholar
Duane Merrill, Michael Garland, and Andrew Grimshaw. 2012. Scalable GPU graph traversal. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’12). 117--128. Google ScholarDigital Library
U. Meyer and P. Sanders. 2003. -stepping: A parallelizable shortest path algorithm. J. Algor. 49, 1 (Oct. 2003), 114--152. Google ScholarDigital Library
Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A lightweight infrastructure for graph analytics. In Proceedings of ACM Symposium on Operating Systems Principles (SOSP’13). 456--471. Google ScholarDigital Library
Sreepathi Pai and Keshav Pingali. 2016. A compiler for throughput optimization of graph algorithms on GPUs. SIGPLAN Not. 51, 10 (Oct. 2016), 1--19. 0362-1340 Google ScholarDigital Library
Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, and John D. Owens. 2016. Multi-GPU graph analytics. CoRR abs/1504.04804, 1504.04804v3 (April 2016). arxiv:cs.DC/1504.04804v3Google Scholar
Pushkar R. Pande and David A. Bader. 2011. Computing betweenness centrality for small world networks on a GPU. In Proceedings of the IEEE Conference on High Performance Embedded Computing.Google Scholar
Roger Pearce, Maya Gokhale, and Nancy M. Amato. 2014. Faster parallel traversal of scale-free graphs at extreme scale with vertex delegates. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). IEEE Press, Piscataway, NJ, 549--559. Google ScholarDigital Library
Keshav Pingali, Donald Nguyen, Milind Kulkarni, Martin Burtscher, M. Amber Hassaan, Rashid Kaleem, Tsung-Hsien Lee, Andrew Lenharth, Roman Manevich, Mario Méndez-Lojo, Dimitrios Prountzos, and Xin Sui. 2011. The tao of parallelism in algorithms. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’11). 12--25. Google ScholarDigital Library
A. Polak. 2016. Counting triangles in large graphs on GPU. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW’16). 740--746.Google ScholarCross Ref
Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-Stream: Edge-centric graph processing using streaming partitions. In Proceedings of the 24th ACM Symposium on Operating Systems Principles. ACM, 472--488. Google ScholarDigital Library
Semih Salihoglu and Jennifer Widom. 2014. HelP: High-level primitives for large-scale graph processing. In Proceedings of the Workshop on GRAph Data Management Experiences and Systems (GRADES’14). Article 3, 6 pages. Google ScholarDigital Library
Scott Sallinen, Abdullah Gharaibeh, and Matei Ripeanu. 2015. Accelerating direction-optimized breadth first search on hybrid architectures. CoRR abs/1503.04359, 1503.04359v1 (March 2015). arxiv:cs.DC/1503.04359v1Google Scholar
Ahmet Erdem Sariyüce, Kamer Kaya, Erik Saule, and Ümit V. Çatalyürek. 2013. Betweenness centrality on GPUs and heterogeneous architectures. In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU’13). 76--85. Google ScholarDigital Library
Thomas Schank and Dorothea Wagner. 2005. Finding, counting and listing all triangles in large graphs, an experimental study. In Proceedings of the 4th International Conference on Experimental and Efficient Algorithms (WEA’05). 606--609. Google ScholarDigital Library
Hyunseok Seo, Jinwook Kim, and Min-Soo Kim. 2015. GStream: A graph streaming processing method for large-scale graphs on GPUs. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’15). ACM, New York, NY, 253--254. Google ScholarDigital Library
Xuanhua Shi, Junling Liang, Sheng Di, Bingsheng He, Hai Jin, Lu Lu, Zhixiang Wang, Xuan Luo, and Jianlong Zhong. 2015. Optimization of asynchronous graph processing on GPU with hybrid coloring model. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’15). 271--272. Google ScholarDigital Library
Julian Shun and Guy E. Blelloch. 2013. Ligra: A lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’13). 135--146. Google ScholarDigital Library
J. Shun and K. Tangwongsan. 2015. Multicore triangle computations without tuning. In Proceedings of the IEEE 31st International Conference on Data Engineering. 149--160. 1063-6382Google Scholar
Jeremy G. Siek, Lie-Quan Lee, and Andrew Lumsdaine. 2001. The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley.Google Scholar
G. M. Slota, S. Rajamanickam, and K. Madduri. 2014. BFS and coloring-based parallel algorithms for strongly connected components and related problems. In Proceedings of the IEEE 28th International Parallel and Distributed Processing Symposium. 550--559. 1530-2075 Google ScholarDigital Library
Jyothish Soman, Kothapalli Kishore, and P. J. Narayanan. 2010. A fast GPU algorithm for graph connectivity. In pRO24th IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum (IPDPSW’10). 1--8.Google Scholar
Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li. 2012. Efficient subgraph matching on billion node graphs. Proc. VLDB Endow. 5, 9 (May 2012), 788--799. 2150-8097 Google ScholarDigital Library
Ha-Nguyen Tran, Jung-jae Kim, and Bingsheng He. 2015. Fast subgraph matching on large graphs using graphics processors. In Database Systems for Advanced Applications, Matthias Renz, Cyrus Shahabi, Xiaofang Zhou, and Muhammad Aamir Cheema (Eds.). Lecture Notes in Computer Science, Vol. 9049. Springer, 299--315.Google ScholarCross Ref
Stanley Tzeng, Brandon Lloyd, and John D. Owens. 2012. A GPU task-parallel model with dependency resolution. IEEE Comput. 45, 8 (Aug. 2012), 34--41. Google ScholarDigital Library
Leslie G. Valiant. 1990. A bridging model for parallel computation. Commun. ACM 33, 8 (Aug. 1990), 103--111. 0001-0782 Google ScholarDigital Library
Guozhang Wang, Wenlei Xie, Alan J. Demers, and Johannes Gehrke. 2013. Asynchronous large-scale graph processing made easy. In Proceedings of the Conference on Innovative data Systems Research (CIDR’13). www.cidrdb.org.Google Scholar
Leyuan Wang, Yangzihao Wang, Carl Yang, and John D. Owens. 2016. A comparative study on exact triangle counting algorithms on the GPU. In Proceedings of the 1st High Performance Graph Processing Workshop (HPGP’16). 1--8. Google ScholarDigital Library
Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’16). 11:1--11:12. Google ScholarDigital Library
Bo Wu, Zhijia Zhao, Eddy Zheng Zhang, Yunlian Jiang, and Xipeng Shen. 2013. Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’13). ACM, New York, NY, 57--68. Google ScholarDigital Library
Yuduo Wu, Yangzihao Wang, Yuechao Pan, Carl Yang, and John D. Owens. 2015. Performance characterization for high-level programming models for GPU graph analytics. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC’15). 66--75. Google ScholarDigital Library
Carl Yang, Yangzihao Wang, and John D. Owens. 2015. Fast sparse matrix and sparse vector multiplication algorithm on the GPU. In Graph Algorithms Building Blocks (GABB’15). 841--847. Google ScholarDigital Library
Jianlong Zhong and Bingsheng He. 2014. Medusa: Simplified graph processing on GPUs. IEEE Trans. Parallel Distrib. Syst. 25, 6 (June 2014), 1543--1552. Google ScholarDigital Library

Index Terms

Gunrock: GPU Graph Analytics

Recommendations

Gunrock: a high-performance graph processing library on the GPU
PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

For large-scale graph analytics on the GPU, the irregularity of data access/control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock," our high-level ...
Read More
Gunrock: a high-performance graph processing library on the GPU
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-...
Read More
Gunrock: a high-performance graph processing library on the GPU
PPoPP '15

For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Parallel Computing Volume 4, Issue 1
Special Issue: Invited papers from PPoPP 2016, Part 1
March 2017
170 pages
ISSN:2329-4949
EISSN:2329-4957
DOI:10.1145/3131890
Editor:
Phillip B. Gibbons
Carnegie Mellon University, Pittsburgh, USA
Issue’s Table of Contents
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 August 2017
- Accepted: 1 August 2017
- Revised: 1 May 2017
- Received: 1 December 2016
Published in topc Volume 4, Issue 1

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
GPU
Graph processing
runtime framework
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 94
  Total Citations
  View Citations
- 1,134
  Total Downloads
- Downloads (Last 12 months)176
- Downloads (Last 6 weeks)21
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Gunrock: GPU Graph Analytics

ACM Transactions on Parallel Computing

Abstract

References

Cited By

Index Terms

Recommendations

Gunrock: a high-performance graph processing library on the GPU

Gunrock: a high-performance graph processing library on the GPU

Gunrock: a high-performance graph processing library on the GPU