Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine

ABSTRACT
Increasing demand for both greater parallelism and faster clocks dictates that future-generation architectures will need to decentralize their resources and eliminate primitives that require single-cycle global communication. A Raw microprocessor distributes all of its resources, including instruction streams, register files, memory ports, and ALUs, over a pipelined two-dimensional mesh interconnect, and exposes them fully to the compiler. Because communication in Raw machines is distributed, compiling for instruction-level parallelism (ILP) requires both spatial instruction partitioning and traditional temporal instruction scheduling. In addition, the compiler must explicitly manage all communication through the interconnect, including the global synchronization required at branch points. This paper describes RAWCC, the compiler we have developed for compiling general-purpose sequential programs to the distributed Raw architecture. We present performance results demonstrating that although Raw machines provide no mechanisms for global communication, the Raw compiler can schedule to achieve speedups that scale with the number of available functional units.
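The space-time scheduling problem the abstract describes can be illustrated with a minimal greedy sketch: each instruction in a dependence DAG is assigned both a tile (space) and a cycle (time), and a dependence that crosses tiles pays an extra network-hop latency. This is an illustrative toy, not RAWCC's actual algorithm; the one-cycle hop cost and the greedy earliest-start heuristic are assumptions for the sketch.

```python
# Toy greedy space-time list scheduler (illustrative sketch, not RAWCC).
# Each instruction gets a (tile, cycle) pair; a cross-tile dependence pays
# an assumed one-cycle mesh-hop penalty on top of the producer's latency.

HOP_LATENCY = 1  # assumed cost of communicating a value between tiles


def space_time_schedule(deps, n_tiles):
    """deps: {instr: [predecessor instrs]}. Returns {instr: (tile, cycle)}."""
    placed = {}                    # instr -> (tile, start cycle)
    tile_free = [0] * n_tiles      # next free cycle on each tile
    remaining = set(deps)
    while remaining:
        # An instruction is ready once all of its predecessors are placed.
        ready = sorted(i for i in remaining
                       if all(p in placed for p in deps[i]))
        for instr in ready:
            best = None
            for t in range(n_tiles):
                # Operands arrive one cycle after the producer issues, plus
                # a hop penalty if the producer sits on a different tile.
                arrival = max(
                    (placed[p][1] + 1 +
                     (HOP_LATENCY if placed[p][0] != t else 0)
                     for p in deps[instr]),
                    default=0,
                )
                start = max(arrival, tile_free[t])
                if best is None or start < best[1]:
                    best = (t, start)
            placed[instr] = best
            tile_free[best[0]] = best[1] + 1
            remaining.discard(instr)
    return placed


# Independent instructions a and b land on different tiles and issue in
# parallel; c consumes both and waits for the cross-tile operand.
sched = space_time_schedule({"a": [], "b": [], "c": ["a", "b"]}, n_tiles=2)
```

The greedy choice trades parallelism against communication: placing `c` on either tile still pays one hop for the remote operand, which is exactly the kind of cost a real space-time scheduler must weigh when partitioning instructions.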