skip to main content
10.1145/53990.54022acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article
Free Access

Software pipelining: an effective scheduling technique for VLIW machines

Published:01 June 1988Publication History

ABSTRACT

This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors. In software pipelining, iterations of a loop in the source program are continuously initiated at constant intervals, before the preceding iterations complete. The advantage of software pipelining is that optimal performance can be achieved with compact object code.

This paper extends previous results of software pipelining in two ways: First, this paper shows that by using an improved algorithm, near-optimal performance can be obtained without specialized hardware. Second, we propose a hierarchical reduction scheme whereby entire control constructs are reduced to an object similar to an operation in a basic block. With this scheme, all innermost loops, including those containing conditional statements, can be software pipelined. It also diminishes the start-up cost of loops with small number of iterations. Hierarchical reduction complements the software pipelining technique, permitting a consistent performance improvement be obtained.

The techniques proposed have been validated by an implementation of a compiler for Warp, a systolic array consisting of 10 VLIW processors. This compiler has been used for developing a large number of applications in the areas of image, signal and scientific processing.

References

  1. 1.Aiken, A. and Nicolau, A. Perfect Pipelining: A New Loop Parallelization Technique. Comell University, Oct., 1987.Google ScholarGoogle Scholar
  2. 2.Ammratone, M., Bitz, F., Clune E., Kung H. T., Maulik, P., Ribas, H., Tseng, P., and Webb, J. Applications of Warp. Proc. Compcon Spring 87, San Francisco, Feb., 1987, pp. 272-275.Google ScholarGoogle Scholar
  3. 3.Annaratone, M., Bitz, F., Deutch, J., Hamey, L., Kung, H. T., Maulik P. C., Tseng, P., and Webb, J. A. Applications Experience on Warp. Proc. 1987 National Computer Cofference, AFIPS, Chicago, June, 1987, pp. 149-158.Google ScholarGoogle Scholar
  4. 4.Annaratone, M., Amould, E., Gross, T., Kung, H. T., Lain, M., Menzilcioglu, O. and Webb, I. A. "The Warp Computer: Architecture, Implementation and Performance". IEEE Transactions on Computers C-36, 12 (December 1987). Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.Colwell, R. P., Nix, R. P., O' Donnell, J. I., Papworth, D. B., and Rodman, P. K. A VLIW Architecture for a Trace Scheduling Compiler. Proc. Second Intl. Conf. on Amhitecmral Support for Programming Languages and Operating Systems, Oct., 1987, pp. 180-192. Google ScholarGoogle ScholarCross RefCross Ref
  6. 6.Dantzig, G. B., Blatmer, W. O. and Rao, M. R. All Shortest Routes from a Fixed Origin in a Graph. Theory of Graphs, Rome, July, 1967, pp. 85-90.Google ScholarGoogle Scholar
  7. 7.E2xfioglu, Kemal. A Compilation Technique for Software Pipelining of Loops with Conditional Jumps. Proc. 20th Annual Workshop on Microprogramming, Dec., 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.Ellis, John R. Bulldog: A Compiler for VLIW Architectures. Ph.D. Th., Yale University, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9.Fisher, J. A. The Optimization of Horizontal Microcode Within and Beyond Basic Blocks: An Application of Processor Scheduling with Resources. Ph.D. Th., New York Univ., OeL 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10.Fisher, J. A. "Trace Scheduling: A Technique for Global Microcode Compaction". IEEE Trans. on Computers C-30, 7 (July 1981), 478-490.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. 11.Fisher, J. A., Ellis, J. R., Ruttenberg, J. C. and Nicolau, A. Parallel Processing: A Smart Compiler and a Dumb Machine. Proc. ACM SIGPLAN '84 Syrup. on Compiler Construction, Montreal, Canada, June, 1984, pp. 37-47. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.Fisher, J. A., Landskov, D. and Shriver, B. D. Microcode Compaction: looking Backward and Looking Forward. Proc. 1981 National Computer Conference, 1981, pp. 95-102.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13.Floyd, R. W. "Algorithm 97: Shortest Path". Comm. ACM 5, 6 (1962), 345. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. 14.Garey, Michael R. and Johnson, David S. Computers and Intractability A Guide to the Theory of NP-Completeness. Freeman, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15.Gross, T. and Lain, M. Compilation for a High-performance Systolic Array. Proc. ACM SIGPLAN 86 Symposium on Compiler Construction, June, 1986, pp. 27-3 8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. 16.Hsu, Peter. Highly Concurrent Scalar Processing. Ph.D. Th., University of Hlinois at Urbana-Champaign, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. 17.Isoda, Sadahiro, Kobayashi, Yoshizumi, and Ishida, Tom. "Global Compaction of Horizontal Microprograms Based on the General/zed Data Dependency Graph". IEEE Trans. on Computers o32, 10 (October 1983), 922-933.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. 18.Kuck, D. J., Kuhn, R. H., Padua, D. A., Leasure, B. and Wolfe, M. l)etxmdence Graphs and Compiler Optimizations. Proc. ACM Symposium on Principles of Programming Languages, January, 1981, pp. 207-218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. 19.Lah, I. and Atkin, E. Tree Compaction of Microprograms. Proc. 16th Annual Workshop on Microprogramming, Oct., 1982, pp. 23-33.Google ScholarGoogle Scholar
  20. 20.Lam, Monica. Compiler Optimizations for Asynchronous Systolic Array Programs. Proc. Fifteenth Annual ACM Symposium on Principles of Programming Languages, Jan., 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. 21.Lain, Monica. A Systolic Array Optimizing Compiler. Ph.D. Th., Carnegie Mellon University, May 1987.Google ScholarGoogle Scholar
  22. 22.Linn, Joseph L SRDAG Compaction - A Generalization of Trace Scheduling to Increase the Use of Global Context Information. Proc. 16th Annual Workshop on Microprogramming, 1983, pp. 11-2ZGoogle ScholarGoogle Scholar
  23. 23.McMahon, F. H. Lawrence Livermore National Laboratory FORTRAN Kernels: MFLOPS.Google ScholarGoogle Scholar
  24. 24.Patel, Janak H. and Davidson, Edward S. Improving the Throughput of a Pipeline by Insertion of Delays. Proc. 3rd Annual Symposium on Computer Architecture, Jan., 1976, pp. 159-164. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. 25.Rau, B. R. and Glaeser, C. D. Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Performance Scientific Computing. Proc. 14th Annual Workshop on Microprogramming, Oct, 1981, pp. 183-198. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. 26.Su, B., Ding, S. and Jin, L. An Improvement of Trace Scheduling for Global Microcode Compaction. Proc. 17th Annual Workshop in Microprogramming, Dec., 1984, pp. 78-85. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. 27.5u, B., Ding, $., Wang, I. and Xia, J. GURPR- A Method for Global Software Pipelining. Proc. 20th Annual Workshop on Microprogramming, Dec., 1987, pp. 88-96. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. 28.Su, B., Ding, S. and Xia, J. URPR- An Extension of URCR for Software Pipeline. Proc. 19th Annual Workshop on Microprogramruing, OCt., 1986, pp. 104-108. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. 29.Tarjan, R. E. "Depth first search and linear graph algorithms". SlAM J. Computing 1, 2 (1972), 146-160.Google ScholarGoogle Scholar
  30. 30.Touzeau, R. F. A Fortran Compiler for the FPS-164 Scientific Computer. Proc. ACM SIGPLAN '84 Syrup. on Compiler Construction, June, 1984, pp. 48-57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. 31.Weiss, S. and Smith, J. E. A Study of Scalar Compilation Techniques for Pipelined Supercomputers. Proc. Second Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct, 1987, pp. 105-109. Google ScholarGoogle ScholarCross RefCross Ref
  32. 32.Wood, Graham. Global Op 'tlmization of Microprograms Through Modular Control Congress. Proe. 12th Annual Workshop in Microprogramming, 1979, pp. 1-6. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Software pipelining: an effective scheduling technique for VLIW machines

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        PLDI '88: Proceedings of the ACM SIGPLAN 1988 conference on Programming language design and implementation
        June 1988
        338 pages
        ISBN:0897912691
        DOI:10.1145/53990
        • cover image ACM SIGPLAN Notices
          ACM SIGPLAN Notices  Volume 23, Issue 7
          Proceedings of the SIGPLAN '88 conference on Programming language design and implementation
          July 1988
          338 pages
          ISSN:0362-1340
          EISSN:1558-1160
          DOI:10.1145/960116
          Issue’s Table of Contents

        Copyright © 1988 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 June 1988

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • Article

        Acceptance Rates

        Overall Acceptance Rate406of2,067submissions,20%

        Upcoming Conference

        PLDI '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader