Software pipelining: an effective scheduling technique for VLIW machines

Author:
M. Lam

Carnegie Mellon Univ., Pittsburgh, PA

Carnegie Mellon Univ., Pittsburgh, PA
View Profile

PLDI '88: Proceedings of the ACM SIGPLAN 1988 conference on Programming language design and implementationJune 1988Pages 318–328https://doi.org/10.1145/53990.54022

Published:01 June 1988Publication History

PLDI '88: Proceedings of the ACM SIGPLAN 1988 conference on Programming language design and implementation

Pages 318–328

ABSTRACT

This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors. In software pipelining, iterations of a loop in the source program are continuously initiated at constant intervals, before the preceding iterations complete. The advantage of software pipelining is that optimal performance can be achieved with compact object code.

This paper extends previous results of software pipelining in two ways: First, this paper shows that by using an improved algorithm, near-optimal performance can be obtained without specialized hardware. Second, we propose a hierarchical reduction scheme whereby entire control constructs are reduced to an object similar to an operation in a basic block. With this scheme, all innermost loops, including those containing conditional statements, can be software pipelined. It also diminishes the start-up cost of loops with small number of iterations. Hierarchical reduction complements the software pipelining technique, permitting a consistent performance improvement be obtained.

The techniques proposed have been validated by an implementation of a compiler for Warp, a systolic array consisting of 10 VLIW processors. This compiler has been used for developing a large number of applications in the areas of image, signal and scientific processing.

References

1.Aiken, A. and Nicolau, A. Perfect Pipelining: A New Loop Parallelization Technique. Comell University, Oct., 1987.Google Scholar
2.Ammratone, M., Bitz, F., Clune E., Kung H. T., Maulik, P., Ribas, H., Tseng, P., and Webb, J. Applications of Warp. Proc. Compcon Spring 87, San Francisco, Feb., 1987, pp. 272-275.Google Scholar
3.Annaratone, M., Bitz, F., Deutch, J., Hamey, L., Kung, H. T., Maulik P. C., Tseng, P., and Webb, J. A. Applications Experience on Warp. Proc. 1987 National Computer Cofference, AFIPS, Chicago, June, 1987, pp. 149-158.Google Scholar
4.Annaratone, M., Amould, E., Gross, T., Kung, H. T., Lain, M., Menzilcioglu, O. and Webb, I. A. "The Warp Computer: Architecture, Implementation and Performance". IEEE Transactions on Computers C-36, 12 (December 1987). Google ScholarDigital Library
5.Colwell, R. P., Nix, R. P., O' Donnell, J. I., Papworth, D. B., and Rodman, P. K. A VLIW Architecture for a Trace Scheduling Compiler. Proc. Second Intl. Conf. on Amhitecmral Support for Programming Languages and Operating Systems, Oct., 1987, pp. 180-192. Google ScholarCross Ref
6.Dantzig, G. B., Blatmer, W. O. and Rao, M. R. All Shortest Routes from a Fixed Origin in a Graph. Theory of Graphs, Rome, July, 1967, pp. 85-90.Google Scholar
7.E2xfioglu, Kemal. A Compilation Technique for Software Pipelining of Loops with Conditional Jumps. Proc. 20th Annual Workshop on Microprogramming, Dec., 1987. Google ScholarDigital Library
8.Ellis, John R. Bulldog: A Compiler for VLIW Architectures. Ph.D. Th., Yale University, 1985. Google ScholarDigital Library
9.Fisher, J. A. The Optimization of Horizontal Microcode Within and Beyond Basic Blocks: An Application of Processor Scheduling with Resources. Ph.D. Th., New York Univ., OeL 1979. Google ScholarDigital Library
10.Fisher, J. A. "Trace Scheduling: A Technique for Global Microcode Compaction". IEEE Trans. on Computers C-30, 7 (July 1981), 478-490.Google ScholarDigital Library
11.Fisher, J. A., Ellis, J. R., Ruttenberg, J. C. and Nicolau, A. Parallel Processing: A Smart Compiler and a Dumb Machine. Proc. ACM SIGPLAN '84 Syrup. on Compiler Construction, Montreal, Canada, June, 1984, pp. 37-47. Google ScholarDigital Library
12.Fisher, J. A., Landskov, D. and Shriver, B. D. Microcode Compaction: looking Backward and Looking Forward. Proc. 1981 National Computer Conference, 1981, pp. 95-102.Google ScholarDigital Library
13.Floyd, R. W. "Algorithm 97: Shortest Path". Comm. ACM 5, 6 (1962), 345. Google ScholarDigital Library
14.Garey, Michael R. and Johnson, David S. Computers and Intractability A Guide to the Theory of NP-Completeness. Freeman, 1979. Google ScholarDigital Library
15.Gross, T. and Lain, M. Compilation for a High-performance Systolic Array. Proc. ACM SIGPLAN 86 Symposium on Compiler Construction, June, 1986, pp. 27-3 8. Google ScholarDigital Library
16.Hsu, Peter. Highly Concurrent Scalar Processing. Ph.D. Th., University of Hlinois at Urbana-Champaign, 1986. Google ScholarDigital Library
17.Isoda, Sadahiro, Kobayashi, Yoshizumi, and Ishida, Tom. "Global Compaction of Horizontal Microprograms Based on the General/zed Data Dependency Graph". IEEE Trans. on Computers o32, 10 (October 1983), 922-933.Google ScholarDigital Library
18.Kuck, D. J., Kuhn, R. H., Padua, D. A., Leasure, B. and Wolfe, M. l)etxmdence Graphs and Compiler Optimizations. Proc. ACM Symposium on Principles of Programming Languages, January, 1981, pp. 207-218. Google ScholarDigital Library
19.Lah, I. and Atkin, E. Tree Compaction of Microprograms. Proc. 16th Annual Workshop on Microprogramming, Oct., 1982, pp. 23-33.Google Scholar
20.Lam, Monica. Compiler Optimizations for Asynchronous Systolic Array Programs. Proc. Fifteenth Annual ACM Symposium on Principles of Programming Languages, Jan., 1988. Google ScholarDigital Library
21.Lain, Monica. A Systolic Array Optimizing Compiler. Ph.D. Th., Carnegie Mellon University, May 1987.Google Scholar
22.Linn, Joseph L SRDAG Compaction - A Generalization of Trace Scheduling to Increase the Use of Global Context Information. Proc. 16th Annual Workshop on Microprogramming, 1983, pp. 11-2ZGoogle Scholar
23.McMahon, F. H. Lawrence Livermore National Laboratory FORTRAN Kernels: MFLOPS.Google Scholar
24.Patel, Janak H. and Davidson, Edward S. Improving the Throughput of a Pipeline by Insertion of Delays. Proc. 3rd Annual Symposium on Computer Architecture, Jan., 1976, pp. 159-164. Google ScholarDigital Library
25.Rau, B. R. and Glaeser, C. D. Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Performance Scientific Computing. Proc. 14th Annual Workshop on Microprogramming, Oct, 1981, pp. 183-198. Google ScholarDigital Library
26.Su, B., Ding, S. and Jin, L. An Improvement of Trace Scheduling for Global Microcode Compaction. Proc. 17th Annual Workshop in Microprogramming, Dec., 1984, pp. 78-85. Google ScholarDigital Library
27.5u, B., Ding, $., Wang, I. and Xia, J. GURPR- A Method for Global Software Pipelining. Proc. 20th Annual Workshop on Microprogramming, Dec., 1987, pp. 88-96. Google ScholarDigital Library
28.Su, B., Ding, S. and Xia, J. URPR- An Extension of URCR for Software Pipeline. Proc. 19th Annual Workshop on Microprogramruing, OCt., 1986, pp. 104-108. Google ScholarDigital Library
29.Tarjan, R. E. "Depth first search and linear graph algorithms". SlAM J. Computing 1, 2 (1972), 146-160.Google Scholar
30.Touzeau, R. F. A Fortran Compiler for the FPS-164 Scientific Computer. Proc. ACM SIGPLAN '84 Syrup. on Compiler Construction, June, 1984, pp. 48-57. Google ScholarDigital Library
31.Weiss, S. and Smith, J. E. A Study of Scalar Compilation Techniques for Pipelined Supercomputers. Proc. Second Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct, 1987, pp. 105-109. Google ScholarCross Ref
32.Wood, Graham. Global Op 'tlmization of Microprograms Through Modular Control Congress. Proe. 12th Annual Workshop in Microprogramming, 1979, pp. 1-6. Google ScholarDigital Library

Index Terms

Software pipelining: an effective scheduling technique for VLIW machines
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language features
        Modules / packages
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling

Recommendations

Software pipelining: an effective scheduling technique for VLIW machines
Proceedings of the SIGPLAN '88 conference on Programming language design and implementation

This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors. In software pipelining, iterations of a loop in the source program are continuously initiated at constant intervals, before the preceding ...
Read More
Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture
LCTES '01: Proceedings of the ACM SIGPLAN workshop on Languages, compilers and tools for embedded systems

The TMS320C6000 architecture is a leading family of Digital Signal Processors (DSPs). To achieve peak performance, this VLIW architecture relies heavily on software pipelining. Traditionally, software pipelining has been restricted to regular (FOR) ...
Read More
Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture
OM '01: Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems

The TMS320C6000 architecture is a leading family of Digital Signal Processors (DSPs). To achieve peak performance, this VLIW architecture relies heavily on software pipelining. Traditionally, software pipelining has been restricted to regular (FOR) ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PLDI '88: Proceedings of the ACM SIGPLAN 1988 conference on Programming language design and implementation
June 1988
338 pages
ISBN:0897912691
DOI:10.1145/53990
Editor:
R. L. Wexelblat
Philips Laboratories, Briarcliff Manor, NY
ACM SIGPLAN Notices Volume 23, Issue 7
Proceedings of the SIGPLAN '88 conference on Programming language design and implementation
July 1988
338 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/960116
Issue’s Table of Contents
Copyright © 1988 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 1988
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate406of2,067submissions,20%
Upcoming Conference
PLDI '24

Sponsor:

sigplan

ACM SIGPLAN Conference on Programming Language Design and Implementation

June 24 - 28, 2024

Copenhagen , Denmark
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 932
  Total Citations
  View Citations
- 3,812
  Total Downloads
- Downloads (Last 12 months)420
- Downloads (Last 6 weeks)51
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88: Proceedings of the ACM SIGPLAN 1988 conference on Programming language design and implementation

ABSTRACT

References

Cited By

Index Terms

Recommendations

Software pipelining: an effective scheduling technique for VLIW machines

Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture

Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture