2013 | OriginalPaper | Book Chapter
Dynamic Task Parallelism with a GPU Work-Stealing Runtime System
Authors: Sanjay Chatterjee, Max Grossman, Alina Sbîrlea, Vivek Sarkar
Published in: Languages and Compilers for Parallel Computing
Publisher: Springer Berlin Heidelberg
NVIDIA’s Compute Unified Device Architecture (CUDA) enabled GPUs to become accessible to mainstream programming. An abundance of simple computational cores and high memory bandwidth makes GPUs ideal candidates for data-parallel applications. However, their potential for executing applications that combine task and data parallelism has not been explored in detail. CUDA does not provide a viable interface for creating dynamic tasks and handling load-balancing issues. Any such support has to be orchestrated entirely by the CUDA programmer today.
In this work, we introduce a finish-async style API to GPU device programming as a first step towards task parallelism. We present the design and implementation details of our new intra-device inter-SM work-stealing runtime system. We compare performance results using our runtime to direct execution on the device as well as past work on GPU runtimes. Finally, we show how this runtime can be targeted by extensions to the high-level CnC-CUDA programming model.
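To make the finish-async idea and work stealing more concrete, the following is a minimal host-side C++ sketch, not the chapter's GPU implementation. It assumes one mutex-protected deque per worker thread (standing in for an SM), and every name in it (`Runtime`, `async`, `finish`, `sum_range`, `parallel_sum`) is hypothetical. An `async` call enqueues a task, and `finish` returns only after the root task and everything it transitively spawned have completed.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical host-side model; each worker thread stands in for one SM.
class Runtime {
public:
    using Task = std::function<void(Runtime&, int)>;  // (runtime, worker id)

    explicit Runtime(int workers) : queues_(workers), locks_(workers) {}

    // async: count the task as pending, then push it on the worker's own deque.
    void async(int worker, Task t) {
        pending_.fetch_add(1);
        std::lock_guard<std::mutex> g(locks_[worker]);
        queues_[worker].push_back(std::move(t));
    }

    // finish: run root and block until all transitively spawned tasks are done.
    void finish(Task root) {
        async(0, std::move(root));
        std::vector<std::thread> pool;
        for (int w = 0; w < static_cast<int>(queues_.size()); ++w)
            pool.emplace_back([this, w] { worker_loop(w); });
        for (auto& t : pool) t.join();
    }

private:
    // Owner takes the newest task from the tail of its own deque.
    bool pop_local(int w, Task& out) {
        std::lock_guard<std::mutex> g(locks_[w]);
        if (queues_[w].empty()) return false;
        out = std::move(queues_[w].back());
        queues_[w].pop_back();
        return true;
    }

    // Thief takes the oldest task from the head of another worker's deque.
    bool steal(int thief, Task& out) {
        int n = static_cast<int>(queues_.size());
        for (int k = 1; k < n; ++k) {
            int victim = (thief + k) % n;
            std::lock_guard<std::mutex> g(locks_[victim]);
            if (queues_[victim].empty()) continue;
            out = std::move(queues_[victim].front());
            queues_[victim].pop_front();
            return true;
        }
        return false;
    }

    void worker_loop(int w) {
        // pending_ drops to zero only after every task has finished, because
        // a task registers its children before it is itself marked done.
        while (pending_.load() > 0) {
            Task t;
            if (pop_local(w, t) || steal(w, t)) {
                t(*this, w);
                pending_.fetch_sub(1);
            } else {
                std::this_thread::yield();
            }
        }
    }

    std::vector<std::deque<Task>> queues_;
    std::vector<std::mutex> locks_;
    std::atomic<int> pending_{0};
};

// Dynamic task creation: split the range until it is small, then sum it.
void sum_range(Runtime& rt, int worker, long long lo, long long hi,
               std::atomic<long long>& total) {
    if (hi - lo <= 1024) {
        long long s = 0;
        for (long long i = lo; i < hi; ++i) s += i;
        total.fetch_add(s);
        return;
    }
    long long mid = lo + (hi - lo) / 2;
    rt.async(worker, [=, &total](Runtime& r, int w) { sum_range(r, w, lo, mid, total); });
    rt.async(worker, [=, &total](Runtime& r, int w) { sum_range(r, w, mid, hi, total); });
}

// Sums the integers 0..n-1 inside a single finish scope.
long long parallel_sum(long long n, int workers) {
    std::atomic<long long> total{0};
    Runtime rt(workers);
    rt.finish([&total, n](Runtime& r, int w) { sum_range(r, w, 0, n, total); });
    return total.load();
}
```

Calling `parallel_sum(100000, 4)` starts with all work on worker 0, so the other three workers make progress only by stealing. The locks are a simplification chosen for readability; a device-side runtime would more plausibly rely on atomic operations on the deque indices, which this sketch does not attempt to model.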