skip to main content
research-article

Gunrock: GPU Graph Analytics

Published:23 August 2017Publication History
Skip Abstract Section

Abstract

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. “Gunrock,” our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock’s overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries, such as Ligra and Galois, and better performance than any other GPU high-level graph library.

References

  1. Christopher R. Aberger, Andres Nötzli, Kunle Olukotun, and Christopher Ré. 2015. EmptyHeaded: Boolean algebra based graph processing. CoRR abs/1503.02368 (2015). http://arxiv.org/abs/1503.02368Google ScholarGoogle Scholar
  2. Saman Ashkiani, Andrew A. Davidson, Ulrich Meyer, and John D. Owens. 2016. GPU multisplit. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’16). 12:1--12:13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Jiri Barnat, Petr Bauch, Lubos Brim, and Milan Ceska. 2011. Computing strongly connected components in parallel on CUDA. In Proceedings of the 2011 IEEE International Parallel 8 Distributed Processing Symposium (IPDPS’11). IEEE Computer Society, Washington, DC, 544--555. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Sean Baxter. 2013. Modern GPU Multisets. Retrieved from https://nvlabs.github.io/moderngpu/sets.html.Google ScholarGoogle Scholar
  5. Sean Baxter. 2013--2016. Moderngpu: Patterns and Behaviors for GPU Computing. Retrieved from http://moderngpu.github.io/moderngpu.Google ScholarGoogle Scholar
  6. Scott Beamer, Krste Asanovi´, and David Patterson. 2012. Direction-optimizing breadth-first search. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC’12). Article 12, 10 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Markus Billeter, Ola Olsson, and Ulf Assarsson. 2009. Efficient stream compaction on wide SIMD many-core architectures. In Proceedings of the Conference on High Performance Graphics 2009 (HPG’09). ACM, New York, NY, 159--166. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Ulrik Brandes. 2001. A faster algorithm for betweenness centrality. J. Math. Sociol. 25, 2 (2001), 163--177.Google ScholarGoogle ScholarCross RefCross Ref
  9. M. Bröcheler, A. Pugliese, and V. S. Subrahmanian. 2010. COSI: Cloud oriented subgraph identification in massive social networks. In Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM’10). IEEE Computer Society, Washington, DC, USA, 248--255. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Martin Burtscher, Rupesh Nasre, and Keshav Pingali. 2012. A quantitative study of irregular programs on GPUs. In Proceedings of the 2012 IEEE International Symposium on Workload Characterization (IISWC) (IISWC’12). IEEE Computer Society, Washington, DC, 141--151. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Martin Burtscher, Rupesh Nasre, and Keshav Pingali. 2012. A quantitative study of irregular programs on GPUs. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC’12). 141--151. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. F. Busato and N. Bombieri. 2015. BFS-4K: An efficient implementation of BFS for kepler GPU architectures. IEEE Trans. Parallel Distrib. Syst. 26, 7 (July 2015), 1826--1838. 1045-9219Google ScholarGoogle ScholarCross RefCross Ref
  13. Daniel Cederman and Philippas Tsigas. 2008. On dynamic load-balancing on graphics processors. In Graphics Hardware 2008. 57--64. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. 2016. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’16). IEEE Computer Society, Washington, DC.Google ScholarGoogle Scholar
  15. Jonathan Cohen and Patrice CastonGuay. 2012. Efficient graph matching and coloring on the GPU. GPU Technology Conference. Retrieved from http://on-demand.gputechconf.com/gtc/2012/presentations/S0332-Efficient-Graph-Matching-and-Coloring-on-GPUs.pdf.Google ScholarGoogle Scholar
  16. E. Cuthill and J. McKee. 1969. Reducing the bandwidth of sparse symmetric matrices. In Proceedings of the 1969 24th National Conference (ACM’69). ACM, New York, NY, 157--172. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Andrew Davidson, Sean Baxter, Michael Garland, and John D. Owens. 2014. Work-efficient parallel GPU methods for single source shortest paths. In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS’14). 349--359. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Daniel Delling, Andrew V. Goldberg, Andreas Nowatzyk, and Renato F. Werneck. 2010. PHAST: Hardware-accelerated shortest path trees. J. Parallel and Distrib. Comput. 73 (Sept. 2010), 940--952.Google ScholarGoogle Scholar
  19. Erich Elsen and Vishal Vaidyanathan. 2013. A vertex-centric CUDA/C++ API for large graph analytics on GPUs using the Gather-Apply-Scatter abstraction. Retrieved from http://www.github.com/RoyalCaliber/vertexAPI2.Google ScholarGoogle Scholar
  20. Zhisong Fu, Michael Personick, and Bryan Thompson. 2014. MapGraph: A high level API for fast development of high performance graph analytics on GPUs. In Proceedings of the Workshop on GRAph Data Management Experiences and Systems (GRADES’14). Article 2, 6 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Afton Geil, Yangzihao Wang, and John D. Owens. 2014. WTF, GPU! Computing twitter’s who-to-follow on the GPU. In Proceedings of the Second ACM Conference on Online Social Networks (COSN’14). 63--68. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Abdullah Gharaibeh, Tahsin Reza, Elizeu Santos-Neto, Lauro Beltrao Costa, Scott Sallinen, and Matei Ripeanu. 2014. Efficient large-scale graph processing on hybrid CPU and GPU systems. CoRR abs/1312.3018, 1312.3018v2 (Dec. 2014). arXiv:1312.3018v2Google ScholarGoogle Scholar
  23. Ashish Goel, Pankaj Gupta, John Sirois, Dong Wang, Aneesh Sharma, and Siva Gurumurthy. 2015. The who-to-follow system at twitter: Strategy, algorithms, and revenue impact. Interfaces 45, 1 (Feb. 2015), 98--107. 0092-2102 Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, 17--30. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: Graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI’14). USENIX Association, Berkeley, CA, 599--613. Retrieved from http://dl.acm.org/citation.cfm?id=2685048.2685096 Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Oded Green, Robert McColl, and David A. Bader. 2012. GPU merge path: A GPU merging algorithm. In Proceedings of the 26th ACM International Conference on Supercomputing (ICS’12). ACM, New York, NY, 331--340. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Oded Green, Lluís-Miquel Munguía, and David A. Bader. 2014. Load balanced clustering coefficients. In Proceedings of the 1st Workshop on Parallel Programming for Analytics Applications (PPAA’14). 3--10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Oded Green, Pavan Yalamanchili, and Lluís-Miquel Munguía. 2014. Fast triangle counting on the GPU. In Proceedings of the 4th Workshop on Irregular Applications: Architectures and Algorithms (IA3’14). 1--8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Douglas Gregor and Andrew Lumsdaine. 2005. The parallel BGL: A generic library for distributed graph computations. In Proceedings of the Conference on Parallel Object-Oriented Scientific Computing (POOSC’05).Google ScholarGoogle Scholar
  30. John Greiner. 1994. A comparison of parallel algorithms for connected components. In Proceedings of the 6th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’94). 16--25. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. 2013. WTF: The who to follow service at twitter. In Proceedings of the International Conference on the World Wide Web. 505--514. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. 2013. Turboiso: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, 337--348. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Pawan Harish and P. J. Narayanan. 2007. Accelerating large graph algorithms on the GPU using CUDA. In Proceedings of the 14th International Conference on High Performance Computing (HiPC’07). Springer-Verlag, Berlin, 197--208. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Mark Harris, John D. Owens, Shubho Sengupta, Yao Zhang, and Andrew Davidson. 2009--2016. CUDPP: CUDA Data Parallel Primitives Library. (2009--2016). Retrieved from http://cudpp.github.io/.Google ScholarGoogle Scholar
  35. Zhengyu He and Bo Hong. 2010. Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-hybrid platforms. In Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing (IPDPS’10). Georgia Institute of Technology, Atlanta, United States.Google ScholarGoogle Scholar
  36. Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. 2012. Green-marl: A DSL for easy and efficient graph analysis. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’12). 349--362. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Sungpack Hong, Sang Kyun Kim, Tayo Oguntebi, and Kunle Olukotun. 2011. Accelerating CUDA graph algorithms at maximum warp. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP’11). 267--276. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Yuntao Jia, Victor Lu, Jared Hoberock, Michael Garland, and John C. Hart. 2011. Edge v. node parallelism for graph centrality metrics. In GPU Computing Gems Jade Edition, Wen-mei W. Hwu (Ed.). Morgan Kaufmann, Chapter 2, 15--28.Google ScholarGoogle Scholar
  39. Rashid Kaleem, Sreepathi Pai, and Keshav Pingali. 2015. Stochastic gradient descent on GPUs. In Proceedings of the 8th Workshop on General Purpose Processing Using GPUs (GPGPU’15). ACM, New York, NY, 81--89. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Stephen W. Keckler, William J. Dally, Brucek Khailany, Michael Garland, and David Glasco. 2011. GPUs and the future of parallel computing. IEEE Micro 31, 5 (Sept. 2011), 7--17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Jeremy Kepner, Peter Aaltonen, David Bader, Ayd&;inodot;n Bulu&ccdeil;, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, Henning Meyerhenke, Scott McMillan, Jose Moreira, John D. Owens, Carl Yang, Marcin Zalewski, and Timothy Mattson. 2016. Mathematical foundations of the graphBLAS. In Proceedings of the IEEE High Performance Extreme Computing Conference.Google ScholarGoogle ScholarCross RefCross Ref
  42. Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha: Vertex-centric graph processing on GPUs. In Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing (HPDC8rsquo;14). 239--252. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is twitter, a social network or a news media? In Proceedings of the International Conference on the World Wide Web. 591--600. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Aapo Kyrola, Guy Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-scale graph computation on just a PC. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, Berkeley, CA, 31--46. http://dl.acm.org/citation.cfm?id=2387880.2387884 Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Jure Leskovec. 2009--2016. SNAP: Stanford Large Network Dataset Collection (2009--2016). Retrieved from http://snap.stanford.edu/data/.Google ScholarGoogle Scholar
  46. Jure Leskovec and Rok Sosič. 2016. SNAP: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 1 (2016), 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Hang Liu and H. Howie Huang. 2015. Enterprise: Breadth-first graph traversal on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15). ACM, New York, NY, Article 68, 12 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Hang Liu, H. Howie Huang, and Yang Hu. 2016. iBFS: Concurrent breadth-first search on GPUs. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD’16). ACM, New York, NY, 403--416. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein. 2010. GraphLab: A new parallel framework for machine learning. In Proceedings of the 26th Annual Conference on Uncertainty in Artificial Intelligence (UAI’10). 340--349. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD’10). 135--146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Robert Campbell McColl, David Ediger, Jason Poovey, Dan Campbell, and David A. Bader. 2014. A performance evaluation of open source graph databases. In Proceedings of the 1rst Workshop on Parallel Programming for Analytics Applications (PPAA’14). 11--18. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Adam McLaughlin and David A. Bader. 2014. Scalable and high performance betweenness centrality on the GPU. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). 572--583. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. A. McLaughlin and D. A. Bader. 2015. Fast execution of simultaneous breadth-first searches on sparse graphs. In Proceedings of the IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS’15). 9--18. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. A. McLaughlin, J. Riedy, and D. A. Bader. 2015. A fast, energy-efficient abstraction for simultaneous breadth-first searches. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC’15). 1--6.Google ScholarGoogle Scholar
  55. Duane Merrill, Michael Garland, and Andrew Grimshaw. 2012. Scalable GPU graph traversal. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’12). 117--128. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. U. Meyer and P. Sanders. 2003. -stepping: A parallelizable shortest path algorithm. J. Algor. 49, 1 (Oct. 2003), 114--152. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A lightweight infrastructure for graph analytics. In Proceedings of ACM Symposium on Operating Systems Principles (SOSP’13). 456--471. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Sreepathi Pai and Keshav Pingali. 2016. A compiler for throughput optimization of graph algorithms on GPUs. SIGPLAN Not. 51, 10 (Oct. 2016), 1--19. 0362-1340 Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, and John D. Owens. 2016. Multi-GPU graph analytics. CoRR abs/1504.04804, 1504.04804v3 (April 2016). arxiv:cs.DC/1504.04804v3Google ScholarGoogle Scholar
  60. Pushkar R. Pande and David A. Bader. 2011. Computing betweenness centrality for small world networks on a GPU. In Proceedings of the IEEE Conference on High Performance Embedded Computing.Google ScholarGoogle Scholar
  61. Roger Pearce, Maya Gokhale, and Nancy M. Amato. 2014. Faster parallel traversal of scale-free graphs at extreme scale with vertex delegates. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). IEEE Press, Piscataway, NJ, 549--559. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Keshav Pingali, Donald Nguyen, Milind Kulkarni, Martin Burtscher, M. Amber Hassaan, Rashid Kaleem, Tsung-Hsien Lee, Andrew Lenharth, Roman Manevich, Mario Méndez-Lojo, Dimitrios Prountzos, and Xin Sui. 2011. The tao of parallelism in algorithms. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’11). 12--25. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. A. Polak. 2016. Counting triangles in large graphs on GPU. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW’16). 740--746.Google ScholarGoogle ScholarCross RefCross Ref
  64. Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-Stream: Edge-centric graph processing using streaming partitions. In Proceedings of the 24th ACM Symposium on Operating Systems Principles. ACM, 472--488. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Semih Salihoglu and Jennifer Widom. 2014. HelP: High-level primitives for large-scale graph processing. In Proceedings of the Workshop on GRAph Data Management Experiences and Systems (GRADES’14). Article 3, 6 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Scott Sallinen, Abdullah Gharaibeh, and Matei Ripeanu. 2015. Accelerating direction-optimized breadth first search on hybrid architectures. CoRR abs/1503.04359, 1503.04359v1 (March 2015). arxiv:cs.DC/1503.04359v1Google ScholarGoogle Scholar
  67. Ahmet Erdem Sariyüce, Kamer Kaya, Erik Saule, and Ümit V. Çatalyürek. 2013. Betweenness centrality on GPUs and heterogeneous architectures. In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU’13). 76--85. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Thomas Schank and Dorothea Wagner. 2005. Finding, counting and listing all triangles in large graphs, an experimental study. In Proceedings of the 4th International Conference on Experimental and Efficient Algorithms (WEA’05). 606--609. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Hyunseok Seo, Jinwook Kim, and Min-Soo Kim. 2015. GStream: A graph streaming processing method for large-scale graphs on GPUs. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’15). ACM, New York, NY, 253--254. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Xuanhua Shi, Junling Liang, Sheng Di, Bingsheng He, Hai Jin, Lu Lu, Zhixiang Wang, Xuan Luo, and Jianlong Zhong. 2015. Optimization of asynchronous graph processing on GPU with hybrid coloring model. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’15). 271--272. Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Julian Shun and Guy E. Blelloch. 2013. Ligra: A lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’13). 135--146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. J. Shun and K. Tangwongsan. 2015. Multicore triangle computations without tuning. In Proceedings of the IEEE 31st International Conference on Data Engineering. 149--160. 1063-6382Google ScholarGoogle Scholar
  73. Jeremy G. Siek, Lie-Quan Lee, and Andrew Lumsdaine. 2001. The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley.Google ScholarGoogle Scholar
  74. G. M. Slota, S. Rajamanickam, and K. Madduri. 2014. BFS and coloring-based parallel algorithms for strongly connected components and related problems. In Proceedings of the IEEE 28th International Parallel and Distributed Processing Symposium. 550--559. 1530-2075 Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. Jyothish Soman, Kothapalli Kishore, and P. J. Narayanan. 2010. A fast GPU algorithm for graph connectivity. In pRO24th IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum (IPDPSW’10). 1--8.Google ScholarGoogle Scholar
  76. Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li. 2012. Efficient subgraph matching on billion node graphs. Proc. VLDB Endow. 5, 9 (May 2012), 788--799. 2150-8097 Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. Ha-Nguyen Tran, Jung-jae Kim, and Bingsheng He. 2015. Fast subgraph matching on large graphs using graphics processors. In Database Systems for Advanced Applications, Matthias Renz, Cyrus Shahabi, Xiaofang Zhou, and Muhammad Aamir Cheema (Eds.). Lecture Notes in Computer Science, Vol. 9049. Springer, 299--315.Google ScholarGoogle ScholarCross RefCross Ref
  78. Stanley Tzeng, Brandon Lloyd, and John D. Owens. 2012. A GPU task-parallel model with dependency resolution. IEEE Comput. 45, 8 (Aug. 2012), 34--41. Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. Leslie G. Valiant. 1990. A bridging model for parallel computation. Commun. ACM 33, 8 (Aug. 1990), 103--111. 0001-0782 Google ScholarGoogle ScholarDigital LibraryDigital Library
  80. Guozhang Wang, Wenlei Xie, Alan J. Demers, and Johannes Gehrke. 2013. Asynchronous large-scale graph processing made easy. In Proceedings of the Conference on Innovative data Systems Research (CIDR’13). www.cidrdb.org.Google ScholarGoogle Scholar
  81. Leyuan Wang, Yangzihao Wang, Carl Yang, and John D. Owens. 2016. A comparative study on exact triangle counting algorithms on the GPU. In Proceedings of the 1st High Performance Graph Processing Workshop (HPGP’16). 1--8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’16). 11:1--11:12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. Bo Wu, Zhijia Zhao, Eddy Zheng Zhang, Yunlian Jiang, and Xipeng Shen. 2013. Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’13). ACM, New York, NY, 57--68. Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. Yuduo Wu, Yangzihao Wang, Yuechao Pan, Carl Yang, and John D. Owens. 2015. Performance characterization for high-level programming models for GPU graph analytics. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC’15). 66--75. Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. Carl Yang, Yangzihao Wang, and John D. Owens. 2015. Fast sparse matrix and sparse vector multiplication algorithm on the GPU. In Graph Algorithms Building Blocks (GABB’15). 841--847. Google ScholarGoogle ScholarDigital LibraryDigital Library
  86. Jianlong Zhong and Bingsheng He. 2014. Medusa: Simplified graph processing on GPUs. IEEE Trans. Parallel Distrib. Syst. 25, 6 (June 2014), 1543--1552. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Gunrock: GPU Graph Analytics

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on Parallel Computing
          ACM Transactions on Parallel Computing  Volume 4, Issue 1
          Special Issue: Invited papers from PPoPP 2016, Part 1
          March 2017
          170 pages
          ISSN:2329-4949
          EISSN:2329-4957
          DOI:10.1145/3131890
          Issue’s Table of Contents

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 23 August 2017
          • Accepted: 1 August 2017
          • Revised: 1 May 2017
          • Received: 1 December 2016
          Published in topc Volume 4, Issue 1

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader