Optimizing occupancy and ILP on the GPU using a combinatorial approach

Authors:
Ghassan Shobaki, California State University at Sacramento, USA
Austin Kerbow, Advanced Micro Devices, USA
Stanislav Mekhanoshin, Advanced Micro Devices, USA

CGO 2020: Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization, February 2020, Pages 133–144, https://doi.org/10.1145/3368826.3377918

Published: 22 February 2020

ABSTRACT

This paper presents the first general solution to the problem of optimizing both occupancy and Instruction-Level Parallelism (ILP) when compiling for a Graphics Processing Unit (GPU). Exploiting ILP (minimizing schedule length) requires using more registers, but using more registers decreases occupancy (the number of thread groups that can be run in parallel). The problem of balancing these two conflicting objectives to achieve the best overall performance is a challenging open problem in code optimization. In this paper, we present a two-pass Branch-and-Bound (B&B) algorithm for solving this problem by treating occupancy as a primary objective and ILP as a secondary objective. In the first pass, the algorithm searches for a maximum-occupancy schedule, while in the second pass it iteratively searches for the shortest schedule that gives the maximum occupancy found in the first pass. The proposed scheduling algorithm was implemented in the LLVM compiler and applied to an AMD GPU. The algorithm’s performance was evaluated using benchmarks from the PlaidML machine learning framework relative to LLVM’s scheduling algorithm, AMD’s production scheduling algorithm and an existing B&B scheduling algorithm that uses a different approach. The results show that the proposed B&B scheduling algorithm speeds up almost every benchmark by up to 35% relative to LLVM’s scheduler, up to 31% relative to AMD’s scheduler and up to 18% relative to the existing B&B scheduler. The geometric-mean improvements are 16.3% relative to LLVM’s scheduler, 5.5% relative to AMD’s production scheduler and 6.2% relative to the existing B&B scheduler. If more compile time can be tolerated, a geometric-mean improvement of 6.3% relative to AMD’s scheduler can be achieved.
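
The two-pass idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is an exhaustive branch-and-bound over a toy dependence graph, assuming a single-issue in-order machine, one register per instruction result, and peak register pressure as a stand-in for occupancy (lower peak means higher occupancy). The function name `search`, the example graph, and the latencies are all hypothetical, and the second pass here is a single search rather than the iterative search the abstract mentions.

```python
from typing import List, Optional, Tuple


def search(preds: List[List[int]], latency: List[int], objective: str,
           pressure_cap: Optional[int] = None
           ) -> Optional[Tuple[List[int], int, int]]:
    """Branch-and-bound over topological orders of a toy dependence graph.

    Minimizes `objective` ("pressure" or "length"), optionally subject to
    peak pressure <= pressure_cap. Returns (order, peak, length), or None
    if no order satisfies the cap.
    """
    n = len(preds)
    users: List[List[int]] = [[] for _ in range(n)]
    for i, ps in enumerate(preds):
        for p in ps:
            users[p].append(i)

    best = {"cost": float("inf"), "result": None}

    def dfs(order, issue, live, peak, cycle):
        cost = peak if objective == "pressure" else cycle
        if cost >= best["cost"]:
            return  # bound: both costs can only grow as the order is extended
        if len(order) == n:
            best["cost"] = cost
            best["result"] = (list(order), peak, cycle)
            return
        for i in range(n):
            if i in issue or any(p not in issue for p in preds[i]):
                continue  # already scheduled, or an operand is not ready
            new_live = live | {i}  # the result of i becomes live
            new_peak = max(peak, len(new_live))
            if pressure_cap is not None and new_peak > pressure_cap:
                continue  # would lower occupancy below the pass-1 target
            # Issue at the first cycle where all operand latencies have elapsed.
            issue[i] = max([cycle] + [issue[p] + latency[p] for p in preds[i]])
            # Operands whose users are now all scheduled are no longer live.
            dead = {p for p in preds[i] if all(u in issue for u in users[p])}
            order.append(i)
            dfs(order, issue, new_live - dead, new_peak, issue[i] + 1)
            order.pop()
            del issue[i]

    dfs([], {}, set(), 0, 0)
    return best["result"]


# Hypothetical graph: two load->op chains feeding one add.
# 0: load a, 1: u = f(a), 2: load b, 3: v = g(b), 4: w = u + v
preds = [[], [0], [], [2], [1, 3]]
latency = [4, 1, 4, 1, 1]

# Pass 1: find the lowest achievable peak pressure (highest occupancy).
p1_order, target_peak, p1_len = search(preds, latency, "pressure")
# Pass 2: find the shortest schedule that keeps that peak pressure.
p2_order, p2_peak, p2_len = search(preds, latency, "length",
                                   pressure_cap=target_peak)

print("pass 1:", p1_order, "peak", target_peak, "length", p1_len)
print("pass 2:", p2_order, "peak", p2_peak, "length", p2_len)
```

On this toy graph, pass 1 stops at the first order that reaches the minimum peak, which serializes the two load chains and takes 11 cycles. Pass 2 keeps the same peak of 3 but issues both loads back to back, which hides their latency and shortens the schedule to 7 cycles.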

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published in
CGO 2020: Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization
February 2020
329 pages
ISBN: 9781450370479
DOI: 10.1145/3368826
General Chairs: Jason Mars (University of Michigan, USA), Lingjia Tang (University of Michigan, USA)
Program Chairs: Jingling Xue (UNSW, Australia), Peng Wu (Futurewei Technologies, USA)
Copyright © 2020 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
Branch-and-Bound
Compiler Optimizations
Graphics Processing Unit (GPU)
Instruction Scheduling
Instruction-Level Parallelism (ILP)
Performance Optimization
Register Pressure Reduction
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 312 of 1,061 submissions, 29%
Article Metrics
- Total Citations: 7
- Total Downloads: 1,126
- Downloads (Last 12 months): 260
- Downloads (Last 6 weeks): 26