skip to main content
10.1145/207110.207132acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article
Free Access

Improving balanced scheduling with compiler optimizations that increase instruction-level parallelism

Authors Info & Claims
Published:01 June 1995Publication History

ABSTRACT

Traditional list schedulers order instructions based on an optimistic estimate of the load latency imposed by the hardware and therefore cannot respond to variations in memory latency caused by cache hits and misses on non-blocking architectures. In contrast, balanced scheduling schedules instructions based on an estimate of the amount of instruction-level parallelism in the program. By scheduling independent instructions behind loads based on what the program can provide, rather than what the implementation stipulates in the best case (i.e., a cache hit), balanced scheduling can hide variations in memory latencies more effectively.

Since its success depends on the amount of instruction-level parallelism in the code, balanced scheduling should perform even better when more parallelism is available. In this study, we combine balanced scheduling with three compiler optimizations that increase instruction-level parallelism: loop unrolling, trace scheduling and cache locality analysis. Using code generated for the DEC Alpha by the Multiflow compiler, we simulated a non-blocking processor architecture that closely models the Alpha 21164. Our results show that balanced scheduling benefits from all three optimizations, producing average speedups that range from 1.15 to 1.40, across the optimizations. More importantly, because of its ability to tolerate variations in load interlocks, it improves its advantage over traditional scheduling. Without the optimizations, balanced scheduled code is, on average, 1.05 times faster than that generated by a traditional scheduler; with them, its lead increases to 1.18.

References

  1. Asp93.T. Asprey. Performance features of the PA7100 microprocessor. IEEE Micro, 13(3):22-35, June 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. BCK+89.M. Berry, D. Chen, D. Kuck, S. Lo, Y. Pang, L. Pointer, R. Roloff, A. Sameh, E. Clementi, S. Chin, D, Schneider, G. Fox, P. Messina, D. Walker, C. Hsiung, J. Schwarzmeier, K. Lue, S. Orszag, O. Johnson F. Seidl and, R. Goodrum, and J. Martin. The Perfect Club: Effective performance evaluation of supercomputers. International Journal of Supercompurer Applications, 3(3):5--40, December 1989.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. CS95.R.P. Colwell and R.L. Steck. A 0.6urn BiCMOS Processor with Dynamic Execution. In IEEE International Solid-State Circuits Conference '95, page 176-177, February 1995.Google ScholarGoogle ScholarCross RefCross Ref
  4. DA92.K. Diefendorf and M. Allen. Organization of the Motorola 88110 superscalar RISC microprocessor, iEEE Micro, 12(2):51-61, April 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Dix92.K.M. Dixit. New CPU benchmark suites from SPEC. In COMPCON, pages 305-310, February 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. ER94.J. Edmondson and P. Rubinfeld. An overview of the 21164 Alpha AXP microprocessor. In Hot Chips VI, pages 1-8, August 1994.Google ScholarGoogle Scholar
  7. FERN84.J.A. Fisher, J.R. Ellis, J. Ruttenberg, and A. Nicolau, Parallel processing: A smart compiler and a dumb machine. In A CM SIGPLAN '84 Symposium on Compiler Construction, pages 36-46, June 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. FGL93.S.M. Freudenberger, T.R. Gross, and P.F. Lowney. Avoidance and suppression of compensation code in a trace scheduling compiler. Technical Report HPL-93- 35, Hewlett Packard, 1993.Google ScholarGoogle Scholar
  9. FJ94.K.I. Farkas and N.P. Jouppi. Complexity/performance tradeoffs with non-blocking Ioads. In 21th Annual International Symposium on Computer Architecture, pages 211-222, April 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. GM86.P.B. Gibbons and S.S. Muchnick. Efficient instruction scheduling for a pipelined architecture. In A CM SIG- PLAN '86 Symposium on Compiler Construction, pages 11-16, July 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Gwe94a.L. Gwennap. UltraSparc Unleashes SPARC Performance. Microprocessor Report, pages 1-8, October 3, 1994.Google ScholarGoogle Scholar
  12. Gwe94b.L, Gwennap. 620 Fills Out PowerPC Product Line. Microprocessor Report, pages 12-16, October 24, 1994.Google ScholarGoogle Scholar
  13. HG83.J.L. Hennessy and T.R. Gross. Postpass code optimization of pipeline constraints. ACM Transactions on Programming Languages and Systems, 5(3):422-448, July 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. KE93.D.R. Kerns and S.J. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In A CM SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 278-289, June 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Kro81.D. Kroft. Lockup-free instruction fetch/prefetch cache organization. In 8th Annual International Symposium on Computer Architecture, pages 81-87, May 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. LFK+93.P.F. Lowney, S.M. Freudenberger, R.J. Karzes, W.D. Lichtenstein, R.P. Nix, J.S. O'Donnell, and J.C. Ruttenberg. The Multiflow trace scheduling compiler. The Journal of Supercomputing, 7:51-143, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. McL93.E. McLellan. The Alpha AXP architecture and 21064 processor. IEEE Micro, 13(3):36-47, June 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Mip94.MIPS Technologies, Inc. Mips Open RiSC Technology R10000 Microprocessor Technical Brief. October 1994.Google ScholarGoogle Scholar
  19. MLG92.T.C. Mowry, M.S. Lam and A. Gupta. Design and evaluation of a compiler algorithm for prefetching. In Fifth international Conference on Architectural Support for Programming Languages and Operating Systems, pages 62-73, October 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Sit78.R.L. Sites. Instruction Ordering for the Cray-1 Computer. Technical Report 78-CS-023, Univ. of California, San Diego, July 1978.Google ScholarGoogle Scholar
  21. Sta.R. Stallman. The GNU Project Optimizing C Compiler. Free Software Foundation.Google ScholarGoogle Scholar

Index Terms

  1. Improving balanced scheduling with compiler optimizations that increase instruction-level parallelism

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
                June 1995
                335 pages
                ISBN:0897916972
                DOI:10.1145/207110

                Copyright © 1995 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 1 June 1995

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • Article

                Acceptance Rates

                PLDI '95 Paper Acceptance Rate28of105submissions,27%Overall Acceptance Rate406of2,067submissions,20%

                Upcoming Conference

                PLDI '24

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader