skip to main content
research-article
Free Access

Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach

Published:16 September 2013Publication History
Skip Abstract Section

Abstract

Balancing Instruction-Level Parallelism (ILP) and register pressure during preallocation instruction scheduling is a fundamentally important problem in code generation and optimization. The problem is known to be NP-complete. Many heuristic techniques have been proposed to solve this problem. However, due to the inherently conflicting requirements of maximizing ILP and minimizing register pressure, heuristic techniques may produce poor schedules in many cases. If such cases occur in hot code, significant performance degradation may result. A few combinatorial optimization approaches have also been proposed, but none of them has been shown to solve large real-world instances within reasonable time. This article presents the first combinatorial algorithm that is efficient enough to optimally solve large instances of this problem (basic blocks with hundreds of instructions) within a few seconds per instance. The proposed algorithm uses branch-and-bound enumeration with a number of powerful pruning techniques to efficiently search the solution space. The search is based on a cost function that incorporates schedule length and register pressure. An implementation of the proposed scheduling algorithm has been integrated into the LLVM Compiler and evaluated using SPEC CPU 2006. On x86-64, with a time limit of 10ms per instruction, it optimally schedules 79% of the hot basic blocks in FP2006. Another 19% of the blocks are not optimally scheduled but are improved in cost relative to LLVM's heuristic. This improves the execution time of some benchmarks by up to 21%, with a geometric-mean improvement of 2.4% across the entire benchmark suite. With the use of precise latency information, the geometric-mean improvement is increased to 2.8%.

References

  1. Barany, G. 2011. Register reuse scheduling. In Proceedings of the 9th Workshop on Optimizations for DSP and Embedded Systems (ODES'11).Google ScholarGoogle Scholar
  2. Barany, G. and Krall, A. 2013. Optimal and heuristic global code motion for minimal spilling. In Proceedings of the International Conference on Compiler Construction. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Berson, D., Gupta, R., and Soffa, M. 1993. URSA: A unified resource allocator for registers and functional units in VLIW architectures. In Proceedings of the IFIP Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism. 243--254. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Cooper, K. and Torczon, L. 2004. Engineering a Compiler. Morgan Kaufmann, San Fransisco, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Faraboschi, P., Fisher, J., and Young, C. 2001. Instruction scheduling for instruction level parallel processors. Proc. IEEE 89, 11, 1638--1659.Google ScholarGoogle ScholarCross RefCross Ref
  6. Fog, A. 2012. The micro-architecture of INTEL, AMD and VIA cpus. An optimization guide for assembly programmers and compiler makers. http://www.agner.org/optimize/microarchitecture.pdf.Google ScholarGoogle Scholar
  7. Goodman, J. and Hsu, W. 1988. Code scheduling and register allocation in large basic blocks. In Proceedings of the International Conference on Supercomputing. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Govindarajan, R., Yang, H., Amaral, J., Zhang, C., and Gao, G. 2003. Minimum register instruction sequencing to reduce register spills in out-of-order. IEEE Trans. Comput. 52, 1, 4--20. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Havanki, W., Banerjia, S., and Conte, T. 1998. Treegion scheduling for wide-issue processors. In Proceedings of the 4th International Symposium on High-Performance Computer Architecture (HPCA'98). Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Kessler, C. 1998. Scheduling expression DAGs for minimal register need. J. Comput. Lang. 24, 1, 33--53. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Langevin, M. and Cerny, E. 1996. A recursive technique for computing lower-bound performance of schedules. ACM Trans. Des. Autom. Electron. Syst. 1, 4, 443--456. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Malik, A. 2008. Constraint programming techniques for optimal instruction scheduling. Ph.D. thesis, University of Waterloo. https://cs.uwaterloo.ca/∼vanbeek/Publications/malik.pdf. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Rim, M. and Jain, R. 1994. Lower-bound performance estimation for the high-level synthesis scheduling problem. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 13, 4, 451--458. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Shobaki, G. and Wilken, K. 2004. Optimal superblock scheduling using enumeration. In Proceedings of the 37th International Symposium on Microarchitecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Shobaki, G. 2006. Optimal global instruction scheduling using enumeration. Ph.D. dissertation, Department of Computer Science, UC Davis. http://www.cs.ucdavis.edu/research/tech-reports/2006/CSE-2006-19.pdf. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Shobaki, G., Wilken, K., and Heffernan, M. 2009. Optimal trace scheduling using enumeration. ACM Trans. Archit. Code Optim. 5, 4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Touati, S. 2005. Register saturation in instruction-level parallelism. Int. J. Parallel Program. 33, 4, 393--449. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Weicker, R. and Henning, J. 2007. Subroutine profiling results for the CPU2006 benchmarks. ACM SIGARCH Comput. Archit. News 35, 1, 102--111. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Winkel, S. 2007. Optimal versus heuristic global code scheduling. In Proceedings of the 40th International Symposium on Microarchitecture. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM Transactions on Architecture and Code Optimization
              ACM Transactions on Architecture and Code Optimization  Volume 10, Issue 3
              September 2013
              310 pages
              ISSN:1544-3566
              EISSN:1544-3973
              DOI:10.1145/2509420
              Issue’s Table of Contents

              Copyright © 2013 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 16 September 2013
              • Accepted: 1 March 2013
              • Revised: 1 February 2013
              • Received: 1 February 2012
              Published in taco Volume 10, Issue 3

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article
              • Research
              • Refereed

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader