Abstract
For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. “Gunrock,” our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock’s overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries, such as Ligra and Galois, and better performance than any other GPU high-level graph library.
- Christopher R. Aberger, Andres Nötzli, Kunle Olukotun, and Christopher Ré. 2015. EmptyHeaded: Boolean algebra based graph processing. CoRR abs/1503.02368 (2015). http://arxiv.org/abs/1503.02368Google Scholar
- Saman Ashkiani, Andrew A. Davidson, Ulrich Meyer, and John D. Owens. 2016. GPU multisplit. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’16). 12:1--12:13. Google ScholarDigital Library
- Jiri Barnat, Petr Bauch, Lubos Brim, and Milan Ceska. 2011. Computing strongly connected components in parallel on CUDA. In Proceedings of the 2011 IEEE International Parallel 8 Distributed Processing Symposium (IPDPS’11). IEEE Computer Society, Washington, DC, 544--555. Google ScholarDigital Library
- Sean Baxter. 2013. Modern GPU Multisets. Retrieved from https://nvlabs.github.io/moderngpu/sets.html.Google Scholar
- Sean Baxter. 2013--2016. Moderngpu: Patterns and Behaviors for GPU Computing. Retrieved from http://moderngpu.github.io/moderngpu.Google Scholar
- Scott Beamer, Krste Asanovi´, and David Patterson. 2012. Direction-optimizing breadth-first search. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC’12). Article 12, 10 pages. Google ScholarDigital Library
- Markus Billeter, Ola Olsson, and Ulf Assarsson. 2009. Efficient stream compaction on wide SIMD many-core architectures. In Proceedings of the Conference on High Performance Graphics 2009 (HPG’09). ACM, New York, NY, 159--166. Google ScholarDigital Library
- Ulrik Brandes. 2001. A faster algorithm for betweenness centrality. J. Math. Sociol. 25, 2 (2001), 163--177.Google ScholarCross Ref
- M. Bröcheler, A. Pugliese, and V. S. Subrahmanian. 2010. COSI: Cloud oriented subgraph identification in massive social networks. In Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM’10). IEEE Computer Society, Washington, DC, USA, 248--255. Google ScholarDigital Library
- Martin Burtscher, Rupesh Nasre, and Keshav Pingali. 2012. A quantitative study of irregular programs on GPUs. In Proceedings of the 2012 IEEE International Symposium on Workload Characterization (IISWC) (IISWC’12). IEEE Computer Society, Washington, DC, 141--151. Google ScholarDigital Library
- Martin Burtscher, Rupesh Nasre, and Keshav Pingali. 2012. A quantitative study of irregular programs on GPUs. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC’12). 141--151. Google ScholarDigital Library
- F. Busato and N. Bombieri. 2015. BFS-4K: An efficient implementation of BFS for kepler GPU architectures. IEEE Trans. Parallel Distrib. Syst. 26, 7 (July 2015), 1826--1838. 1045-9219Google ScholarCross Ref
- Daniel Cederman and Philippas Tsigas. 2008. On dynamic load-balancing on graphics processors. In Graphics Hardware 2008. 57--64. Google ScholarDigital Library
- Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. 2016. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’16). IEEE Computer Society, Washington, DC.Google Scholar
- Jonathan Cohen and Patrice CastonGuay. 2012. Efficient graph matching and coloring on the GPU. GPU Technology Conference. Retrieved from http://on-demand.gputechconf.com/gtc/2012/presentations/S0332-Efficient-Graph-Matching-and-Coloring-on-GPUs.pdf.Google Scholar
- E. Cuthill and J. McKee. 1969. Reducing the bandwidth of sparse symmetric matrices. In Proceedings of the 1969 24th National Conference (ACM’69). ACM, New York, NY, 157--172. Google ScholarDigital Library
- Andrew Davidson, Sean Baxter, Michael Garland, and John D. Owens. 2014. Work-efficient parallel GPU methods for single source shortest paths. In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS’14). 349--359. Google ScholarDigital Library
- Daniel Delling, Andrew V. Goldberg, Andreas Nowatzyk, and Renato F. Werneck. 2010. PHAST: Hardware-accelerated shortest path trees. J. Parallel and Distrib. Comput. 73 (Sept. 2010), 940--952.Google Scholar
- Erich Elsen and Vishal Vaidyanathan. 2013. A vertex-centric CUDA/C++ API for large graph analytics on GPUs using the Gather-Apply-Scatter abstraction. Retrieved from http://www.github.com/RoyalCaliber/vertexAPI2.Google Scholar
- Zhisong Fu, Michael Personick, and Bryan Thompson. 2014. MapGraph: A high level API for fast development of high performance graph analytics on GPUs. In Proceedings of the Workshop on GRAph Data Management Experiences and Systems (GRADES’14). Article 2, 6 pages. Google ScholarDigital Library
- Afton Geil, Yangzihao Wang, and John D. Owens. 2014. WTF, GPU! Computing twitter’s who-to-follow on the GPU. In Proceedings of the Second ACM Conference on Online Social Networks (COSN’14). 63--68. Google ScholarDigital Library
- Abdullah Gharaibeh, Tahsin Reza, Elizeu Santos-Neto, Lauro Beltrao Costa, Scott Sallinen, and Matei Ripeanu. 2014. Efficient large-scale graph processing on hybrid CPU and GPU systems. CoRR abs/1312.3018, 1312.3018v2 (Dec. 2014). arXiv:1312.3018v2Google Scholar
- Ashish Goel, Pankaj Gupta, John Sirois, Dong Wang, Aneesh Sharma, and Siva Gurumurthy. 2015. The who-to-follow system at twitter: Strategy, algorithms, and revenue impact. Interfaces 45, 1 (Feb. 2015), 98--107. 0092-2102 Google ScholarDigital Library
- Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, 17--30. Google ScholarDigital Library
- Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: Graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI’14). USENIX Association, Berkeley, CA, 599--613. Retrieved from http://dl.acm.org/citation.cfm?id=2685048.2685096 Google ScholarDigital Library
- Oded Green, Robert McColl, and David A. Bader. 2012. GPU merge path: A GPU merging algorithm. In Proceedings of the 26th ACM International Conference on Supercomputing (ICS’12). ACM, New York, NY, 331--340. Google ScholarDigital Library
- Oded Green, Lluís-Miquel Munguía, and David A. Bader. 2014. Load balanced clustering coefficients. In Proceedings of the 1st Workshop on Parallel Programming for Analytics Applications (PPAA’14). 3--10. Google ScholarDigital Library
- Oded Green, Pavan Yalamanchili, and Lluís-Miquel Munguía. 2014. Fast triangle counting on the GPU. In Proceedings of the 4th Workshop on Irregular Applications: Architectures and Algorithms (IA3’14). 1--8. Google ScholarDigital Library
- Douglas Gregor and Andrew Lumsdaine. 2005. The parallel BGL: A generic library for distributed graph computations. In Proceedings of the Conference on Parallel Object-Oriented Scientific Computing (POOSC’05).Google Scholar
- John Greiner. 1994. A comparison of parallel algorithms for connected components. In Proceedings of the 6th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’94). 16--25. Google ScholarDigital Library
- Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. 2013. WTF: The who to follow service at twitter. In Proceedings of the International Conference on the World Wide Web. 505--514. Google ScholarDigital Library
- Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. 2013. Turboiso: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, 337--348. Google ScholarDigital Library
- Pawan Harish and P. J. Narayanan. 2007. Accelerating large graph algorithms on the GPU using CUDA. In Proceedings of the 14th International Conference on High Performance Computing (HiPC’07). Springer-Verlag, Berlin, 197--208. Google ScholarDigital Library
- Mark Harris, John D. Owens, Shubho Sengupta, Yao Zhang, and Andrew Davidson. 2009--2016. CUDPP: CUDA Data Parallel Primitives Library. (2009--2016). Retrieved from http://cudpp.github.io/.Google Scholar
- Zhengyu He and Bo Hong. 2010. Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-hybrid platforms. In Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing (IPDPS’10). Georgia Institute of Technology, Atlanta, United States.Google Scholar
- Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. 2012. Green-marl: A DSL for easy and efficient graph analysis. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’12). 349--362. Google ScholarDigital Library
- Sungpack Hong, Sang Kyun Kim, Tayo Oguntebi, and Kunle Olukotun. 2011. Accelerating CUDA graph algorithms at maximum warp. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP’11). 267--276. Google ScholarDigital Library
- Yuntao Jia, Victor Lu, Jared Hoberock, Michael Garland, and John C. Hart. 2011. Edge v. node parallelism for graph centrality metrics. In GPU Computing Gems Jade Edition, Wen-mei W. Hwu (Ed.). Morgan Kaufmann, Chapter 2, 15--28.Google Scholar
- Rashid Kaleem, Sreepathi Pai, and Keshav Pingali. 2015. Stochastic gradient descent on GPUs. In Proceedings of the 8th Workshop on General Purpose Processing Using GPUs (GPGPU’15). ACM, New York, NY, 81--89. Google ScholarDigital Library
- Stephen W. Keckler, William J. Dally, Brucek Khailany, Michael Garland, and David Glasco. 2011. GPUs and the future of parallel computing. IEEE Micro 31, 5 (Sept. 2011), 7--17. Google ScholarDigital Library
- Jeremy Kepner, Peter Aaltonen, David Bader, Ayd&;inodot;n Bulu&ccdeil;, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, Henning Meyerhenke, Scott McMillan, Jose Moreira, John D. Owens, Carl Yang, Marcin Zalewski, and Timothy Mattson. 2016. Mathematical foundations of the graphBLAS. In Proceedings of the IEEE High Performance Extreme Computing Conference.Google ScholarCross Ref
- Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha: Vertex-centric graph processing on GPUs. In Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing (HPDC8rsquo;14). 239--252. Google ScholarDigital Library
- Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is twitter, a social network or a news media? In Proceedings of the International Conference on the World Wide Web. 591--600. Google ScholarDigital Library
- Aapo Kyrola, Guy Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-scale graph computation on just a PC. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, Berkeley, CA, 31--46. http://dl.acm.org/citation.cfm?id=2387880.2387884 Google ScholarDigital Library
- Jure Leskovec. 2009--2016. SNAP: Stanford Large Network Dataset Collection (2009--2016). Retrieved from http://snap.stanford.edu/data/.Google Scholar
- Jure Leskovec and Rok Sosič. 2016. SNAP: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 1 (2016), 1. Google ScholarDigital Library
- Hang Liu and H. Howie Huang. 2015. Enterprise: Breadth-first graph traversal on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15). ACM, New York, NY, Article 68, 12 pages. Google ScholarDigital Library
- Hang Liu, H. Howie Huang, and Yang Hu. 2016. iBFS: Concurrent breadth-first search on GPUs. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD’16). ACM, New York, NY, 403--416. Google ScholarDigital Library
- Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein. 2010. GraphLab: A new parallel framework for machine learning. In Proceedings of the 26th Annual Conference on Uncertainty in Artificial Intelligence (UAI’10). 340--349. Google ScholarDigital Library
- Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD’10). 135--146. Google ScholarDigital Library
- Robert Campbell McColl, David Ediger, Jason Poovey, Dan Campbell, and David A. Bader. 2014. A performance evaluation of open source graph databases. In Proceedings of the 1rst Workshop on Parallel Programming for Analytics Applications (PPAA’14). 11--18. Google ScholarDigital Library
- Adam McLaughlin and David A. Bader. 2014. Scalable and high performance betweenness centrality on the GPU. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). 572--583. Google ScholarDigital Library
- A. McLaughlin and D. A. Bader. 2015. Fast execution of simultaneous breadth-first searches on sparse graphs. In Proceedings of the IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS’15). 9--18. Google ScholarDigital Library
- A. McLaughlin, J. Riedy, and D. A. Bader. 2015. A fast, energy-efficient abstraction for simultaneous breadth-first searches. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC’15). 1--6.Google Scholar
- Duane Merrill, Michael Garland, and Andrew Grimshaw. 2012. Scalable GPU graph traversal. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’12). 117--128. Google ScholarDigital Library
- U. Meyer and P. Sanders. 2003. -stepping: A parallelizable shortest path algorithm. J. Algor. 49, 1 (Oct. 2003), 114--152. Google ScholarDigital Library
- Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A lightweight infrastructure for graph analytics. In Proceedings of ACM Symposium on Operating Systems Principles (SOSP’13). 456--471. Google ScholarDigital Library
- Sreepathi Pai and Keshav Pingali. 2016. A compiler for throughput optimization of graph algorithms on GPUs. SIGPLAN Not. 51, 10 (Oct. 2016), 1--19. 0362-1340 Google ScholarDigital Library
- Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, and John D. Owens. 2016. Multi-GPU graph analytics. CoRR abs/1504.04804, 1504.04804v3 (April 2016). arxiv:cs.DC/1504.04804v3Google Scholar
- Pushkar R. Pande and David A. Bader. 2011. Computing betweenness centrality for small world networks on a GPU. In Proceedings of the IEEE Conference on High Performance Embedded Computing.Google Scholar
- Roger Pearce, Maya Gokhale, and Nancy M. Amato. 2014. Faster parallel traversal of scale-free graphs at extreme scale with vertex delegates. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). IEEE Press, Piscataway, NJ, 549--559. Google ScholarDigital Library
- Keshav Pingali, Donald Nguyen, Milind Kulkarni, Martin Burtscher, M. Amber Hassaan, Rashid Kaleem, Tsung-Hsien Lee, Andrew Lenharth, Roman Manevich, Mario Méndez-Lojo, Dimitrios Prountzos, and Xin Sui. 2011. The tao of parallelism in algorithms. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’11). 12--25. Google ScholarDigital Library
- A. Polak. 2016. Counting triangles in large graphs on GPU. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW’16). 740--746.Google ScholarCross Ref
- Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-Stream: Edge-centric graph processing using streaming partitions. In Proceedings of the 24th ACM Symposium on Operating Systems Principles. ACM, 472--488. Google ScholarDigital Library
- Semih Salihoglu and Jennifer Widom. 2014. HelP: High-level primitives for large-scale graph processing. In Proceedings of the Workshop on GRAph Data Management Experiences and Systems (GRADES’14). Article 3, 6 pages. Google ScholarDigital Library
- Scott Sallinen, Abdullah Gharaibeh, and Matei Ripeanu. 2015. Accelerating direction-optimized breadth first search on hybrid architectures. CoRR abs/1503.04359, 1503.04359v1 (March 2015). arxiv:cs.DC/1503.04359v1Google Scholar
- Ahmet Erdem Sariyüce, Kamer Kaya, Erik Saule, and Ümit V. Çatalyürek. 2013. Betweenness centrality on GPUs and heterogeneous architectures. In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU’13). 76--85. Google ScholarDigital Library
- Thomas Schank and Dorothea Wagner. 2005. Finding, counting and listing all triangles in large graphs, an experimental study. In Proceedings of the 4th International Conference on Experimental and Efficient Algorithms (WEA’05). 606--609. Google ScholarDigital Library
- Hyunseok Seo, Jinwook Kim, and Min-Soo Kim. 2015. GStream: A graph streaming processing method for large-scale graphs on GPUs. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’15). ACM, New York, NY, 253--254. Google ScholarDigital Library
- Xuanhua Shi, Junling Liang, Sheng Di, Bingsheng He, Hai Jin, Lu Lu, Zhixiang Wang, Xuan Luo, and Jianlong Zhong. 2015. Optimization of asynchronous graph processing on GPU with hybrid coloring model. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’15). 271--272. Google ScholarDigital Library
- Julian Shun and Guy E. Blelloch. 2013. Ligra: A lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’13). 135--146. Google ScholarDigital Library
- J. Shun and K. Tangwongsan. 2015. Multicore triangle computations without tuning. In Proceedings of the IEEE 31st International Conference on Data Engineering. 149--160. 1063-6382Google Scholar
- Jeremy G. Siek, Lie-Quan Lee, and Andrew Lumsdaine. 2001. The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley.Google Scholar
- G. M. Slota, S. Rajamanickam, and K. Madduri. 2014. BFS and coloring-based parallel algorithms for strongly connected components and related problems. In Proceedings of the IEEE 28th International Parallel and Distributed Processing Symposium. 550--559. 1530-2075 Google ScholarDigital Library
- Jyothish Soman, Kothapalli Kishore, and P. J. Narayanan. 2010. A fast GPU algorithm for graph connectivity. In pRO24th IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum (IPDPSW’10). 1--8.Google Scholar
- Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li. 2012. Efficient subgraph matching on billion node graphs. Proc. VLDB Endow. 5, 9 (May 2012), 788--799. 2150-8097 Google ScholarDigital Library
- Ha-Nguyen Tran, Jung-jae Kim, and Bingsheng He. 2015. Fast subgraph matching on large graphs using graphics processors. In Database Systems for Advanced Applications, Matthias Renz, Cyrus Shahabi, Xiaofang Zhou, and Muhammad Aamir Cheema (Eds.). Lecture Notes in Computer Science, Vol. 9049. Springer, 299--315.Google ScholarCross Ref
- Stanley Tzeng, Brandon Lloyd, and John D. Owens. 2012. A GPU task-parallel model with dependency resolution. IEEE Comput. 45, 8 (Aug. 2012), 34--41. Google ScholarDigital Library
- Leslie G. Valiant. 1990. A bridging model for parallel computation. Commun. ACM 33, 8 (Aug. 1990), 103--111. 0001-0782 Google ScholarDigital Library
- Guozhang Wang, Wenlei Xie, Alan J. Demers, and Johannes Gehrke. 2013. Asynchronous large-scale graph processing made easy. In Proceedings of the Conference on Innovative data Systems Research (CIDR’13). www.cidrdb.org.Google Scholar
- Leyuan Wang, Yangzihao Wang, Carl Yang, and John D. Owens. 2016. A comparative study on exact triangle counting algorithms on the GPU. In Proceedings of the 1st High Performance Graph Processing Workshop (HPGP’16). 1--8. Google ScholarDigital Library
- Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’16). 11:1--11:12. Google ScholarDigital Library
- Bo Wu, Zhijia Zhao, Eddy Zheng Zhang, Yunlian Jiang, and Xipeng Shen. 2013. Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’13). ACM, New York, NY, 57--68. Google ScholarDigital Library
- Yuduo Wu, Yangzihao Wang, Yuechao Pan, Carl Yang, and John D. Owens. 2015. Performance characterization for high-level programming models for GPU graph analytics. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC’15). 66--75. Google ScholarDigital Library
- Carl Yang, Yangzihao Wang, and John D. Owens. 2015. Fast sparse matrix and sparse vector multiplication algorithm on the GPU. In Graph Algorithms Building Blocks (GABB’15). 841--847. Google ScholarDigital Library
- Jianlong Zhong and Bingsheng He. 2014. Medusa: Simplified graph processing on GPUs. IEEE Trans. Parallel Distrib. Syst. 25, 6 (June 2014), 1543--1552. Google ScholarDigital Library
Index Terms
- Gunrock: GPU Graph Analytics
Recommendations
Gunrock: a high-performance graph processing library on the GPU
PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel ProgrammingFor large-scale graph analytics on the GPU, the irregularity of data access/control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock," our high-level ...
Gunrock: a high-performance graph processing library on the GPU
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel ProgrammingFor large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-...
Gunrock: a high-performance graph processing library on the GPU
PPoPP '15For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-...
Comments