research-article

Free Access

Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach

Authors:
Ghassan Shobaki

Princess Sumaya University for Technology, Jordan

Princess Sumaya University for Technology, Jordan
View Profile

,
Maxim Shawabkeh

Google

Google
View Profile

,
Najm Eldeen Abu Rmaileh

Princess Sumaya University for Technology, Jordan

Princess Sumaya University for Technology, Jordan
View Profile

ACM Transactions on Architecture and Code Optimization Volume 10 Issue 3Article No.: 14pp 1–31https://doi.org/10.1145/2512432

Published:16 September 2013Publication History

ACM Transactions on Architecture and Code Optimization

Abstract

Balancing Instruction-Level Parallelism (ILP) and register pressure during preallocation instruction scheduling is a fundamentally important problem in code generation and optimization. The problem is known to be NP-complete. Many heuristic techniques have been proposed to solve this problem. However, due to the inherently conflicting requirements of maximizing ILP and minimizing register pressure, heuristic techniques may produce poor schedules in many cases. If such cases occur in hot code, significant performance degradation may result. A few combinatorial optimization approaches have also been proposed, but none of them has been shown to solve large real-world instances within reasonable time. This article presents the first combinatorial algorithm that is efficient enough to optimally solve large instances of this problem (basic blocks with hundreds of instructions) within a few seconds per instance. The proposed algorithm uses branch-and-bound enumeration with a number of powerful pruning techniques to efficiently search the solution space. The search is based on a cost function that incorporates schedule length and register pressure. An implementation of the proposed scheduling algorithm has been integrated into the LLVM Compiler and evaluated using SPEC CPU 2006. On x86-64, with a time limit of 10ms per instruction, it optimally schedules 79% of the hot basic blocks in FP2006. Another 19% of the blocks are not optimally scheduled but are improved in cost relative to LLVM's heuristic. This improves the execution time of some benchmarks by up to 21%, with a geometric-mean improvement of 2.4% across the entire benchmark suite. With the use of precise latency information, the geometric-mean improvement is increased to 2.8%.

References

Barany, G. 2011. Register reuse scheduling. In Proceedings of the 9^th Workshop on Optimizations for DSP and Embedded Systems (ODES'11).Google Scholar
Barany, G. and Krall, A. 2013. Optimal and heuristic global code motion for minimal spilling. In Proceedings of the International Conference on Compiler Construction. Google ScholarDigital Library
Berson, D., Gupta, R., and Soffa, M. 1993. URSA: A unified resource allocator for registers and functional units in VLIW architectures. In Proceedings of the IFIP Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism. 243--254. Google ScholarDigital Library
Cooper, K. and Torczon, L. 2004. Engineering a Compiler. Morgan Kaufmann, San Fransisco, CA. Google ScholarDigital Library
Faraboschi, P., Fisher, J., and Young, C. 2001. Instruction scheduling for instruction level parallel processors. Proc. IEEE 89, 11, 1638--1659.Google ScholarCross Ref
Fog, A. 2012. The micro-architecture of INTEL, AMD and VIA cpus. An optimization guide for assembly programmers and compiler makers. http://www.agner.org/optimize/microarchitecture.pdf.Google Scholar
Goodman, J. and Hsu, W. 1988. Code scheduling and register allocation in large basic blocks. In Proceedings of the International Conference on Supercomputing. Google ScholarDigital Library
Govindarajan, R., Yang, H., Amaral, J., Zhang, C., and Gao, G. 2003. Minimum register instruction sequencing to reduce register spills in out-of-order. IEEE Trans. Comput. 52, 1, 4--20. Google ScholarDigital Library
Havanki, W., Banerjia, S., and Conte, T. 1998. Treegion scheduling for wide-issue processors. In Proceedings of the 4^th International Symposium on High-Performance Computer Architecture (HPCA'98). Google ScholarDigital Library
Kessler, C. 1998. Scheduling expression DAGs for minimal register need. J. Comput. Lang. 24, 1, 33--53. Google ScholarDigital Library
Langevin, M. and Cerny, E. 1996. A recursive technique for computing lower-bound performance of schedules. ACM Trans. Des. Autom. Electron. Syst. 1, 4, 443--456. Google ScholarDigital Library
Malik, A. 2008. Constraint programming techniques for optimal instruction scheduling. Ph.D. thesis, University of Waterloo. https://cs.uwaterloo.ca/&sim;vanbeek/Publications/malik.pdf. Google ScholarDigital Library
Rim, M. and Jain, R. 1994. Lower-bound performance estimation for the high-level synthesis scheduling problem. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 13, 4, 451--458. Google ScholarDigital Library
Shobaki, G. and Wilken, K. 2004. Optimal superblock scheduling using enumeration. In Proceedings of the 37^th International Symposium on Microarchitecture. Google ScholarDigital Library
Shobaki, G. 2006. Optimal global instruction scheduling using enumeration. Ph.D. dissertation, Department of Computer Science, UC Davis. http://www.cs.ucdavis.edu/research/tech-reports/2006/CSE-2006-19.pdf. Google ScholarDigital Library
Shobaki, G., Wilken, K., and Heffernan, M. 2009. Optimal trace scheduling using enumeration. ACM Trans. Archit. Code Optim. 5, 4. Google ScholarDigital Library
Touati, S. 2005. Register saturation in instruction-level parallelism. Int. J. Parallel Program. 33, 4, 393--449. Google ScholarDigital Library
Weicker, R. and Henning, J. 2007. Subroutine profiling results for the CPU2006 benchmarks. ACM SIGARCH Comput. Archit. News 35, 1, 102--111. Google ScholarDigital Library
Winkel, S. 2007. Optimal versus heuristic global code scheduling. In Proceedings of the 40^th International Symposium on Microarchitecture. Google ScholarDigital Library

Index Terms

Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach

Recommendations

Combinatorial Register Allocation and Instruction Scheduling

This article introduces a combinatorial optimization approach to register allocation and instruction scheduling, two central compiler problems. Combinatorial optimization has the potential to solve these problems optimally and to exploit processor-...
Read More
Register-Pressure-Aware Instruction Scheduling Using Ant Colony Optimization
This paper describes a new approach to register-pressure-aware instruction scheduling, using Ant Colony Optimization (ACO). ACO is a nature-inspired optimization technique that researchers have successfully applied to NP-hard sequencing problems like the ...
Read More
Optimizing occupancy and ILP on the GPU using a combinatorial approach
CGO 2020: Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization

This paper presents the first general solution to the problem of optimizing both occupancy and Instruction-Level Parallelism (ILP) when compiling for a Graphics Processing Unit (GPU). Exploiting ILP (minimizing schedule length) requires using more ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Architecture and Code Optimization Volume 10, Issue 3
September 2013
310 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2509420
Issue’s Table of Contents

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 16 September 2013
- Accepted: 1 March 2013
- Revised: 1 February 2013
- Received: 1 February 2012
Published in taco Volume 10, Issue 3

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Compiler optimizations
NP-complete problems
branch-and-bound enumeration
instruction-level parallelism (ILP)
optimal instruction scheduling
performance optimization
register pressure reduction
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 19
  Total Citations
  View Citations
- 1,298
  Total Downloads
- Downloads (Last 12 months)162
- Downloads (Last 6 weeks)10
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach

ACM Transactions on Architecture and Code Optimization

Abstract

References

Cited By

Index Terms

Recommendations

Combinatorial Register Allocation and Instruction Scheduling

Register-Pressure-Aware Instruction Scheduling Using Ant Colony Optimization

Optimizing occupancy and ILP on the GPU using a combinatorial approach

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach

ACM Transactions on Architecture and Code Optimization

Abstract

References

Cited By

Index Terms

Recommendations

Combinatorial Register Allocation and Instruction Scheduling

Register-Pressure-Aware Instruction Scheduling Using Ant Colony Optimization

Optimizing occupancy and ILP on the GPU using a combinatorial approach

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media