ABSTRACT
Parallelizing graph algorithms on GPUs is challenging due to the irregular memory accesses and control-flow involved in graph traversals. In this work, we tame these challenges by injecting approximations. In particular, we improve memory coalescing by renumbering and replicating nodes, memory latency by adding edges among specific nodes brought into shared memory, and thread-divergence by normalizing degrees across nodes assigned to a warp. Using a suite of graphs with varied characteristics and five popular algorithms, we demonstrate the effectiveness of our proposed techniques. Our approximations for coalescing, memory latency and thread-divergence lead to mean speedups of 1.3×, 1.41× and 1.06× achieving accuracies of 83%, 78% and 84%, respectively.
- S. Singh and R. Nasre. 2018. Scalable and Performant Graph Processing on GPUs Using Approximate Computing. TMSCS 4, 3 (2018), 190--203.Google Scholar
- B. Wu, Z. Zhao, E. Z. Zhang, Y. Jiang, and X. Shen. 2013. Complexity Analysis and Algorithm Design for Reorganizing Data to Minimize Non-coalesced Memory Accesses on GPU. In PPoPP '13. ACM, 57--68. Google ScholarDigital Library
Index Terms
- Optimizing graph processing on GPUs using approximate computing: poster
Recommendations
A pattern based algorithmic autotuner for graph processing on GPUs
PPoPP '19: Proceedings of the 24th Symposium on Principles and Practice of Parallel ProgrammingThis paper proposes Gswitch, a pattern-based algorithmic auto-tuning system that dynamically switches between optimization variants with negligible overhead. Its novelty lies in a small set of algorithmic patterns that allow for the configurable ...
Scalable SIMD-Efficient Graph Processing on GPUs
PACT '15: Proceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT)The vast computing power of GPUs makes them an attractive platform for accelerating large scale data parallel computations such as popular graph processing applications. However, the inherent irregularity and large sizes of real-world power law graphs ...
Accelerating a hydrological uncertainty ensemble model using graphics processing units (GPUs)
The practical application of hydrological uncertainty models that are designed to generate multiple ensembles can be severely restricted by the available computer processing power and thus, the time taken to generate the results. CPU clusters can help ...
Comments