research-article

CuSha: vertex-centric graph processing on GPUs

Published: 23 June 2014

ABSTRACT

Vertex-centric graph processing is employed by many popular algorithms (e.g., PageRank) due to its simplicity and efficient use of asynchronous parallelism. The high compute power provided by SIMT architectures presents an opportunity for accelerating these algorithms using GPUs. Prior works on GPU graph processing employ the Compressed Sparse Row (CSR) format for its space efficiency; however, CSR suffers from irregular memory accesses and GPU underutilization that limit its performance. In this paper, we present CuSha, a CUDA-based graph processing framework that overcomes these obstacles via two novel graph representations: G-Shards and Concatenated Windows (CW). G-Shards uses a concept recently introduced for non-GPU systems that organizes a graph into autonomous sets of ordered edges called shards. CuSha's mapping of GPU hardware resources onto shards allows fully coalesced memory accesses. CW is a novel representation that enhances the use of shards to achieve higher GPU utilization when processing sparse graphs. Finally, CuSha fully utilizes GPU power by processing multiple shards in parallel on the GPU's streaming multiprocessors. For ease of programming, CuSha allows the user to define the vertex-centric computation and plug it into the framework for parallel processing of large graphs. Our experiments show that CuSha provides significant speedups over the state-of-the-art CSR-based virtual warp-centric method for processing graphs on GPUs.
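To make the abstract's two data-layout ideas concrete, the following is a minimal CPU-side sketch (not CuSha's actual implementation): a toy graph in CSR form with one vertex-centric scatter pass, whose writes land at data-dependent positions (the irregular access pattern the abstract says limits CSR on GPUs), followed by a simplified grouping of edges into shards by destination window. The graph, window size, and function names are illustrative assumptions; the real G-Shards layout also stores source vertex values alongside each edge.

```python
# Toy directed graph in Compressed Sparse Row (CSR) form.
# row_offsets[v]..row_offsets[v+1] delimits the slice of col_indices
# holding vertex v's outgoing neighbors.
# Edges: 0->1, 0->2, 1->2, 2->3, 3->0
ROW_OFFSETS = [0, 2, 3, 4, 5]
COL_INDICES = [1, 2, 2, 3, 0]

def scatter_pass(row_offsets, col_indices, values):
    """One vertex-centric pass: every vertex pushes an equal share of
    its value to each outgoing neighbor (a PageRank-style update).
    The writes hit data-dependent positions (col_indices[e]) -- the
    irregular pattern that CSR-based GPU kernels struggle to coalesce."""
    nxt = [0.0] * (len(row_offsets) - 1)
    for v in range(len(row_offsets) - 1):
        deg = row_offsets[v + 1] - row_offsets[v]
        for e in range(row_offsets[v], row_offsets[v + 1]):
            nxt[col_indices[e]] += values[v] / deg
    return nxt

def build_shards(row_offsets, col_indices, window):
    """Simplified shard construction: shard k collects every edge whose
    destination falls in vertex window k. Because vertices are visited
    in order, edges inside a shard end up ordered by source -- the
    'autonomous sets of ordered edges' the abstract describes."""
    nverts = len(row_offsets) - 1
    nshards = (nverts + window - 1) // window
    shards = [[] for _ in range(nshards)]
    for v in range(nverts):
        for e in range(row_offsets[v], row_offsets[v + 1]):
            shards[col_indices[e] // window].append((v, col_indices[e]))
    return shards

print(scatter_pass(ROW_OFFSETS, COL_INDICES, [1.0] * 4))
# -> [1.0, 0.5, 1.5, 1.0]
print(build_shards(ROW_OFFSETS, COL_INDICES, 2))
# -> [[(0, 1), (3, 0)], [(0, 2), (1, 2), (2, 3)]]
```

Each shard touches only a contiguous window of destination vertices, which is what lets a GPU assign one shard per thread block and keep its updates local and coalesced.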


Published in:
HPDC '14: Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing
June 2014, 334 pages
ISBN: 9781450327497
DOI: 10.1145/2600212

        Copyright © 2014 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



        Acceptance Rates

HPDC '14 paper acceptance rate: 21 of 130 submissions (16%). Overall acceptance rate: 166 of 966 submissions (17%).
