2014 | Original Paper | Book chapter
A Throughput-Aware Analytical Performance Model for GPU Applications
Authors: Zhidan Hu, Guangming Liu, Wenrui Dong
Published in: Advanced Computer Architecture
Publisher: Springer Berlin Heidelberg
Graphics processing units (GPUs) have become increasingly popular for general-purpose parallel processing. Their massively parallel architecture allows GPUs to execute tens of thousands of threads concurrently, solving heavily data-parallel problems efficiently. Despite this tremendous computing power, however, optimizing GPU kernels for high performance remains challenging, owing to the architectural shift from CPU to GPU and the lack of tools for programming and performance analysis.
In this paper, we propose a throughput-aware analytical model to estimate the performance of GPU kernels and optimizations. We model the servicing of global memory accesses as a pipeline and redefine compute throughput and memory throughput as the rates at which memory requests arrive at and leave this pipeline. By identifying the factor that limits a kernel's throughput, we classify GPU programs into compute-bound and memory-bound categories and then predict performance for each category. In addition, our model indicates promising directions for optimization and predicts their potential performance benefits. We evaluate the model on a manually written benchmark as well as a matrix-multiply kernel and show that its geometric mean absolute error is below 6.5%.
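The classification step described above can be sketched as a simple comparison of the two throughput rates. The following is a minimal illustration only, not the paper's actual model; the function name and the numeric values are assumptions introduced for this sketch:

```python
def classify_kernel(compute_throughput: float, memory_throughput: float) -> str:
    """Classify a GPU kernel by its throughput-limiting factor.

    compute_throughput: rate at which memory requests arrive at the
        memory-access pipeline (e.g., requests per cycle issued by the SMs).
    memory_throughput: rate at which requests leave the pipeline
        (e.g., requests per cycle serviced by the memory system).
    Both units are hypothetical; the paper defines the rates precisely.
    """
    # The slower end of the pipeline bounds overall kernel performance:
    # if requests arrive faster than they can be serviced, the memory
    # system is the bottleneck; otherwise computation is.
    if compute_throughput > memory_throughput:
        return "memory-bound"
    return "compute-bound"


# Hypothetical example: requests arrive faster than they are serviced.
print(classify_kernel(0.8, 0.5))  # → memory-bound
```

Under this view, a memory-bound kernel benefits most from optimizations that reduce or coalesce global memory traffic, while a compute-bound kernel benefits from reducing instruction count or improving occupancy.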