Communication scheduling

Published: 01 November 2000
Abstract

The high arithmetic rates of media processing applications require architectures with tens to hundreds of functional units, multiple register files, and explicit interconnect between functional units and register files. Communication scheduling enables scheduling to these emerging architectures, including those that use shared buses and register file ports. Scheduling to these shared interconnect architectures is difficult because it requires simultaneously allocating functional units to operations and buses and register file ports to the communications between operations. Prior VLIW scheduling algorithms are limited to clustered register file architectures with no shared buses or register file ports. Communication scheduling extends the range of target architectures by making each communication explicit and decomposing it into three components: a write stub, zero or more copy operations, and a read stub. Communication scheduling allows media processing kernels to achieve 98% of the performance of a central register file architecture on a distributed register file architecture with only 9% of the area, 6% of the power consumption, and 37% of the access delay, and 120% of the performance of a clustered register file architecture on a distributed register file architecture with 56% of the area and 50% of the power consumption.
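
The three-part decomposition described above can be pictured with a small data model. This is only an illustrative sketch, not the paper's algorithm: the abstract names the write stub, copy operations, and read stub but does not specify their contents, so every class, field, and function name below (`WriteStub`, `Copy`, `ReadStub`, `Communication`, `bus_conflicts`, the unit and bus names) is a hypothetical assumption. The sketch shows one communication as a route through register files, plus a check for two transfers claiming the same shared bus in the same cycle.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple

# Hypothetical model: the source only names the three components,
# so all fields below are assumptions made for illustration.


@dataclass(frozen=True)
class WriteStub:
    producer_unit: str   # functional unit that produces the value
    bus: str             # shared bus assumed to carry the result
    register_file: str   # register file that receives the value
    cycle: int


@dataclass(frozen=True)
class Copy:
    source_rf: str       # register file the value is copied from
    dest_rf: str         # register file the value is copied to
    bus: str
    cycle: int


@dataclass(frozen=True)
class ReadStub:
    register_file: str   # register file the consumer reads from
    consumer_unit: str
    cycle: int


@dataclass
class Communication:
    write: WriteStub
    read: ReadStub
    copies: List[Copy] = field(default_factory=list)  # zero or more

    def is_connected(self) -> bool:
        """True if the value ends up in the register file the read stub uses."""
        rf = self.write.register_file
        for copy in self.copies:
            if copy.source_rf != rf:
                return False
            rf = copy.dest_rf
        return rf == self.read.register_file


def bus_conflicts(comms: Iterable[Communication]) -> Set[Tuple[str, int]]:
    """Return the (bus, cycle) pairs claimed by more than one transfer."""
    seen: Set[Tuple[str, int]] = set()
    clashes: Set[Tuple[str, int]] = set()
    for comm in comms:
        uses = [(comm.write.bus, comm.write.cycle)]
        uses += [(c.bus, c.cycle) for c in comm.copies]
        for use in uses:
            if use in seen:
                clashes.add(use)
            seen.add(use)
    return clashes


# A direct communication (no copies) and one routed through a copy.
direct = Communication(
    WriteStub("add0", "bus0", "rf0", 1),
    ReadStub("rf0", "mul0", 2),
)
routed = Communication(
    WriteStub("add1", "bus0", "rf1", 1),
    ReadStub("rf2", "mul1", 4),
    [Copy("rf1", "rf2", "bus1", 2)],
)
```

In this toy setup both communications are well formed on their own, but `bus_conflicts([direct, routed])` reports that `bus0` is claimed twice in cycle 1. That is the kind of shared-resource clash a scheduler for such an architecture would have to resolve while it also assigns functional units.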

Published in

ACM SIGPLAN Notices, Volume 35, Issue 11
Nov. 2000
269 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/356989

Copyright © 2000 Authors

Publisher: Association for Computing Machinery, New York, NY, United States