skip to main content
article

Exploiting coarse-grained task, data, and pipeline parallelism in stream programs

Published:20 October 2006Publication History
Skip Abstract Section

Abstract

As multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (image, video, DSP, etc.) are naturally represented by independent filters that communicate over explicit data channels.In this paper, we demonstrate an end-to-end stream compiler that attains robust multicore performance in the face of varying application characteristics. As benchmarks exhibit different amounts of task, data, and pipeline parallelism, we exploit all types of parallelism in a unified manner in order to achieve this generality. Our compiler, which maps from the StreamIt language to the 16-core Raw architecture, attains a 11.2x mean speedup over a single-core baseline, and a 1.84x speedup over our previous work.

References

  1. Raza Microelectronics, Inc. http://www.razamicroelectronics.com/products/xlr.htm.]]Google ScholarGoogle Scholar
  2. StreamIt Language Specification. http://cag.lcs.mit.edu/streamit/papers/streamit-lang-spec.pdf.]]Google ScholarGoogle Scholar
  3. S. Agrawal, W. Thies, and S. Amarasinghe. Optimizing Stream Programs Using Linear State Space Analysis. In CASES, San Francisco, CA, Sept. 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. Andrews and N. Baker. Xbox 360 System Architecture. IEEE Micro, 26(2), 2006.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Bakshi and D.D. Gajski. Partitioning and pipelining for performance-constrained hardware/software systems. IEEE Trans. Very Large Scale Integr. Syst., 7(4):419--432, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: Stream Computing on Graphics Hardware. In SIGGRAPH, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. L.-F. Chao and E.H.-M. Sha. Scheduling Data-Flow Graphs via Retiming and Unfolding. IEEE Trans. on Parallel and Distributed Systems, 08(12), 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. K.S. Chatha and R. Vemuri. Hardware-Software partitioning and pipelined scheduling of transformative applications. IEEE Trans. Very Large Scale Integr. Syst., 10(3), 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M.K. Chen, X.F. Li, R. Lian, J.H. Lin, L. Liu, T. Liu, and R. Ju. Shangri-La: Achieving High Performance from Compiled Network Applications While Enabling Ease of Programming. In PLDI, New York, NY, USA, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. Consel, H. Hamdi, L. Rveillre, L. Singaravelu, H. Yu, and C. Pu. Spidle: A DSL Approach to Specifying Streaming Applications. In 2nd Int. Conf. on Generative Prog. and Component Engineering, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. Drake, H. Hoffman, R. Rabbah, and S. Amarasinghe. MPEG-2 Decoding in a Stream Programming Language. In IPDPS, Rhodes Island, Greece, April 2006.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. W. Du, R. Ferreira, and G. Agrawal. Compiler Support for Exploiting Coarse-Grained Pipelined Parallelism. In Supercomputing, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. E. and D. Messerschmitt. Pipeline interleaved programmable DSP's: Synchronous data flow programming. IEEE Trans. on Signal Processing, 35(9), 1987.]]Google ScholarGoogle Scholar
  14. W. Eatherton. The Push of Network Processing to the Top of the Pyramid. Keynote Address, Symposium on Architectures for Networking and Communications Systems, 2005.]]Google ScholarGoogle Scholar
  15. M. Gordon,W. Thies, M. Karczmarek, J. Lin, A.S. Meli, A.A. Lamb, C. Leger, J. Wong, H. Hoffmann, D. Maze, and S. Amarasinghe. A Stream Compiler for Communication-Exposed Architectures. In ASPLOS, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Gummaraju and M. Rosenblum. Stream Programming on General- Purpose Processors. In MICRO, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. H.P. Hofstee. Power Efficient Processor Architecture and The Cell Processor. HPCA, 00:258--262, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. U.J. Kapasi, S. Rixner, W.J. Dally, B. Khailany, J.H. Ahn, P. Mattson, and J.D. Owens. Programmable stream processors. IEEE Computer, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Karczmarek, W. Thies, and S. Amarasinghe. Phased scheduling of stream programs. In LCTES, San Diego, CA, June 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M.A. Karczmarek. Constrained and Phased Scheduling of Synchronous Data Flow Graphs for the StreamIt Language. Master's thesis, MIT, 2002.]]Google ScholarGoogle Scholar
  21. P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro, 25(2):21--29, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Y.-K. Kwok and I. Ahmad. FASTEST: A Practical Low-Complexity Algorithm for Compile-Time Assignment of Parallel Programs to Multiprocessors. IEEE Trans. on Parallel and Distributed Systems, 10(2), 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Y.-K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv., 31(4):406--471, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A.A. Lamb, W. Thies, and S. Amarasinghe. Linear Analysis and Optimization of Stream Programs. In PLDI, San Diego, CA, June 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. Lebak. Polymorphous Computing Architecture (PCA) Example Applications and Description. External Report, Lincoln Laboratory, Mass. Inst. of Technology, 2001.]]Google ScholarGoogle Scholar
  26. E.A. Lee and D.G. Messerschmitt. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing. IEEE Trans. Comput., 36(1):24--35, 1987.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. W.R. Mark, R.S. Glanville, K. Akeley, and M.J. Kilgard. Cg: A System for Programming Graphics Hardware in a C-like Language. In SIGGRAPH, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. D. May, R. Shepherd, and C. Keane. Communicating Process Architecture: Transputers and Occam. Future Parallel Computers: An Advanced Course, Pisa, Lecture Notes in Computer Science, 272, June 1987.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Michael Sipser. Introduction to the Theory of Computation. PWS Publishing Company, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. G. Ottoni, R. Rangan, A. Stoler, and D.I. August. Automatic Thread Extraction with Decoupled Software Pipelining. In MICRO, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. K. Parhi and D. Messerschmitt. Static Rate-Optimal Scheduling of Iterative Data-Flow Programs Via Optimum Unfolding. IEEE Transactions on Computers, 40(2), 1991.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. J.L. Pino, S.S. Bhattacharyya, and E.A. Lee. A Hierarchical Multiprocessor Scheduling Framework for Synchronous Dataflow Graphs. Technical Report UCB/ERL M95/36, May 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J.L. Pino and E.A. Lee. Hierarchical Static Scheduling of Dataflow Graphs onto Multiple Processors. Proc. of the IEEE Conference on Acoustics, Speech, and Signal Processing, 1995.]]Google ScholarGoogle ScholarCross RefCross Ref
  34. S. Seneff. Speech transformation system (spectrum and/or excitation) without pitch extraction. Master's thesis, MIT, 1980.]]Google ScholarGoogle Scholar
  35. J. Sermulins, W. Thies, R. Rabbah, and S. Amarasinghe. Cache Aware Optimization of Stream Programs. In LCTES, Chicago, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. R. Stephens. A Survey of Stream Processing. Acta Informatica, 34(7), 1997.]]Google ScholarGoogle ScholarCross RefCross Ref
  37. M.B. Taylor et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs. IEEE Micro vol 22, Issue 2, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. M.B. Taylor, W. Lee, J. Miller, D. Wentzlaff, et al. Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams. In ISCA, Munich, Germany, June 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In CC, France, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. E.Waingold, M. Taylor, D. Srikrishna, V. Sarkar,W. Lee, et al. Baring It All to Software: Raw Machines. IEEE Computer, 30(9), 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. S. wei Liao, Z. Du, G. Wu, and G.-Y. Lueh. Data and Computation Transformations for Brook Streaming Applications on Multiprocessors. In CGO, 2006.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. D. Zhang, Z.-Z. Li, H. Song, and L. Liu. A Programming Model for an Embedded Media Processing Architecture. In SAMOS, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM SIGARCH Computer Architecture News
          ACM SIGARCH Computer Architecture News  Volume 34, Issue 5
          Proceedings of the 2006 ASPLOS Conference
          December 2006
          425 pages
          ISSN:0163-5964
          DOI:10.1145/1168919
          Issue’s Table of Contents
          • cover image ACM Conferences
            ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
            October 2006
            440 pages
            ISBN:1595934510
            DOI:10.1145/1168857

          Copyright © 2006 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 20 October 2006

          Check for updates

          Qualifiers

          • article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader