skip to main content
10.1145/291069.291018acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
Article
Free Access

Space-time scheduling of instruction-level parallelism on a raw machine

Published:01 October 1998Publication History

ABSTRACT

Increasing demand for both greater parallelism and faster clocks dictate that future generation architectures will need to decentralize their resources and eliminate primitives that require single cycle global communication. A Raw microprocessor distributes all of its resources, including instruction streams, register files, memory ports, and ALUs, over a pipelined two-dimensional mesh interconnect, and exposes them fully to the compiler. Because communication in Raw machines is distributed, compiling for instruction-level parallelism (ILP) requires both spatial instruction partitioning as well as traditional temporal instruction scheduling. In addition, the compiler must explicitly manage all communication through the interconnect, including the global synchronization required at branch points. This paper describes RAWCC, the compiler we have developed for compiling general-purpose sequential programs to the distributed Raw architecture. We present performance results that demonstrate that although Raw machines provide no mechanisms for global communication the Raw compiler can schedule to achieve speedups that scale with the number of available functional units.

References

  1. 1.I. Ahmad, Y. Kwok, and M. Wu. Analysis, Evaluation, and Comparison of Algorithms for Scheduling Task Graphs on Parallel Processors. in Proceedings of the Second international Symposium on Parallel Architectures, Algorithms, and Networks, pages 207-213, June 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2.S. Amarasingle, J. Anderson, C. Wilson, S. Liao, B. Murphy, R. French, and M. Lam. Multiprocessors from a Software Perspective. iEEE Micro, pages 52-61, June 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3.D. August, W. Hwu, and S. Mahlke. A Framework for Balancing Control Flow and Predication. In Proceedings of the 30th International Symposium on Microarchitecture, Dec. 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.J. Babb, M. Frank, V. Lee, E. Waingold, R. Barua, M. Taylor, J. Kim, S. Devabhaktuni, and A. Agarwal. The RAW Benchmark Suite: Computation Structures for General Purpose Computing. In IEEE Symposium on Field- Programmable Custom Computing Machines, Napa Valley, CA, Apr. 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.J. Babb, R. Tessier, M. Dahl, S. Hanono, D. Hoki, and A. Agarwal. Logic emulation with virtual wires. IEEE Transactions on Computer Aided Design, 16(6):609-626, June 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6.S. Banerjia. Instruction Scheduling and Fetch Mechanism for Clustered VLIW Processors. In Ph.D Thesis, North Carolina State University, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7.R. Barua, W. Lee, S. Amarasinghe, and A. Agarwal. Maps: A Compiler-Managed Memory System for Raw Machines. Technical Memo TM-583, MIT LCS, July 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.R. Barua, W. Lee, S. Amarasinghe, and A. Agarwal. Memory Bank Disambiguation using Modulo Unrolling for Raw Machines. In Proceedings of the A CM/IEEE Fifth lnt 'l Conference on High Performance Computing(HiPC), Dec. 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9.A. Capitanio, N. Dutt, and A. Nicolau. Partitioned Register Files for VLIWs: A Preliminary Analysis of Tradeoffs. In Proceedings of the ~5th International Symposium on Microarchitecture, Dec. 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10.G. Deso}i. Instruction Assignment for Clustered VLIW DSP Compilers: A New Approach. Technical report, HP HPL- 98-13, Feb. 1998.Google ScholarGoogle Scholar
  11. 11.M. Dikaiakos, A. Rogers, and K. Steiglitz. A Comparison of Techniques Used for Mapping Parallel Algorithms to Message-Passing Multiprocessors. In Proceedings of the 6th IEEE Symposium on Parallel and Distributed Processing, Oct. 1994.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.J. R. Ellis. Bulldog: A Compiler for VLIW Architectures. In Ph.D Thesis, Yale University, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13.J. A. Fisher. Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Transactions on Computers, 7(C-30):478-490, July 1981.Google ScholarGoogle Scholar
  14. 14.L. Gwennap. Digital 21264 Sets New Standard. Microprocessor Report, pages 11-16, Oct. 1996.Google ScholarGoogle Scholar
  15. 15.W. Hwu, S. Mahlke, W. Chen, P. Chang, N. Wafter, R. Bringmann, R. Ouellette, R. Hank, T. Kiyohara, G. Haab, J. Holm, and D. Lavery. Tile Superblock: An Effective Technique for VLIW and Superscalar Compilation. The Journal of Supercomputing, 7(1), Jan. 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. 16.V. Kathail, M. Schlansker, and B. R. Rau. HPL PlayDoh Architecture Specification: Version 1.0. Technical report, HP HPL-93-80, Feb. 1994.Google ScholarGoogle Scholar
  17. 17.M. S. Lam. Software Pipelining: An Effective Scheduling Technique for VLIW Machines. In Proc. Int'l Conf. on Programming Language Design and Implementation (PLDI), pages 318-328, June 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. 18.M. S. Lam and R. P. Wilson. Limits of Control Flow on Parallelism. In Proceedings of 19th Annual International Symposium on Computer Architecture, pages 46-57, May 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. 19.P. Lowney, S. Freudenberger, T. Karzes, W. Lichtenstein, R. Nix, J. O'Donnell, and J. Ruttenberg. The Multifiow Trace Scheduling Compiler. In Journal of Supercomputing, pages 51-142, Jan. 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. 20.V. Sarkar. Partitioning and Scheduling Parallel Programs for Multiprocessors. Pitman, London and The MIT Press, Cambridge, Massachusetts, 1989. In the series, Research Monographs in Parallel and Distributed Computing. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. 21.M. D. Smith. Extending SUIF for Machine-dependent Optimizations. In Proceedings of the First SUiF Compiler Workshop, pages 14-25, Stanford, CA, Jan. 1996.Google ScholarGoogle Scholar
  22. 22.G. Sohi, S. Breach, and T. Vijaykumar. Multiscalar Processors. In Proceedings of the 22nd Annual international Symposium on Computer Architecture, pages 414-425, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. 23.E. Waingold, M. Taylor, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, S. Devabhaktuni, R. Barua, J. Babb, S. Amarasinghe, and A. Agarwal. Baring It All To Software: Raw Machines. Computer, pages 86-93, Sept. 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. 24.R. Wilson, R. French, C. Wilson, S. Amarasinghe, J. Anderson, S. Tjiang, S. Liao, C.-W. Tseng, M. Hall, M. Lam, and J. Hennessy. SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers. ACM SIGPLAN Notices, 29(12), Dec. 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. 25.T. Yang and A. Gerasoulis. List Scheduling With and Without Communication. Parallel Computing Journal, 19:1321- 1344, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. 26.T. Yang and A. Gerasoulis. DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors. IEEE Transactions on Parallel and Distributed Systems, 5(9):951-967, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Space-time scheduling of instruction-level parallelism on a raw machine

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              ASPLOS VIII: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
              October 1998
              326 pages
              ISBN:1581131070
              DOI:10.1145/291069

              Copyright © 1998 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 1 October 1998

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • Article

              Acceptance Rates

              ASPLOS VIII Paper Acceptance Rate28of123submissions,23%Overall Acceptance Rate535of2,713submissions,20%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader