article

Exploiting coarse-grained task, data, and pipeline parallelism in stream programs

Authors:
Michael I. Gordon

Massachusetts Institute of Technology

Massachusetts Institute of Technology
View Profile

,
William Thies

Massachusetts Institute of Technology

Massachusetts Institute of Technology
View Profile

,
Saman Amarasinghe

Massachusetts Institute of Technology

Massachusetts Institute of Technology
View Profile

Authors Info & Claims

ACM SIGARCH Computer Architecture News Volume 34 Issue 5December 2006pp 151–162https://doi.org/10.1145/1168919.1168877

Published:20 October 2006Publication History

ACM SIGARCH Computer Architecture News

Abstract

As multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (image, video, DSP, etc.) are naturally represented by independent filters that communicate over explicit data channels.In this paper, we demonstrate an end-to-end stream compiler that attains robust multicore performance in the face of varying application characteristics. As benchmarks exhibit different amounts of task, data, and pipeline parallelism, we exploit all types of parallelism in a unified manner in order to achieve this generality. Our compiler, which maps from the StreamIt language to the 16-core Raw architecture, attains a 11.2x mean speedup over a single-core baseline, and a 1.84x speedup over our previous work.

References

Raza Microelectronics, Inc. http://www.razamicroelectronics.com/products/xlr.htm.]]Google Scholar
StreamIt Language Specification. http://cag.lcs.mit.edu/streamit/papers/streamit-lang-spec.pdf.]]Google Scholar
S. Agrawal, W. Thies, and S. Amarasinghe. Optimizing Stream Programs Using Linear State Space Analysis. In CASES, San Francisco, CA, Sept. 2005.]] Google ScholarDigital Library
J. Andrews and N. Baker. Xbox 360 System Architecture. IEEE Micro, 26(2), 2006.]] Google ScholarDigital Library
S. Bakshi and D.D. Gajski. Partitioning and pipelining for performance-constrained hardware/software systems. IEEE Trans. Very Large Scale Integr. Syst., 7(4):419--432, 1999.]] Google ScholarDigital Library
I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: Stream Computing on Graphics Hardware. In SIGGRAPH, 2004.]] Google ScholarDigital Library
L.-F. Chao and E.H.-M. Sha. Scheduling Data-Flow Graphs via Retiming and Unfolding. IEEE Trans. on Parallel and Distributed Systems, 08(12), 1997.]] Google ScholarDigital Library
K.S. Chatha and R. Vemuri. Hardware-Software partitioning and pipelined scheduling of transformative applications. IEEE Trans. Very Large Scale Integr. Syst., 10(3), 2002.]] Google ScholarDigital Library
M.K. Chen, X.F. Li, R. Lian, J.H. Lin, L. Liu, T. Liu, and R. Ju. Shangri-La: Achieving High Performance from Compiled Network Applications While Enabling Ease of Programming. In PLDI, New York, NY, USA, 2005.]] Google ScholarDigital Library
C. Consel, H. Hamdi, L. Rveillre, L. Singaravelu, H. Yu, and C. Pu. Spidle: A DSL Approach to Specifying Streaming Applications. In 2nd Int. Conf. on Generative Prog. and Component Engineering, 2003.]] Google ScholarDigital Library
M. Drake, H. Hoffman, R. Rabbah, and S. Amarasinghe. MPEG-2 Decoding in a Stream Programming Language. In IPDPS, Rhodes Island, Greece, April 2006.]] Google ScholarDigital Library
W. Du, R. Ferreira, and G. Agrawal. Compiler Support for Exploiting Coarse-Grained Pipelined Parallelism. In Supercomputing, 2005.]] Google ScholarDigital Library
E. and D. Messerschmitt. Pipeline interleaved programmable DSP's: Synchronous data flow programming. IEEE Trans. on Signal Processing, 35(9), 1987.]]Google Scholar
W. Eatherton. The Push of Network Processing to the Top of the Pyramid. Keynote Address, Symposium on Architectures for Networking and Communications Systems, 2005.]]Google Scholar
M. Gordon,W. Thies, M. Karczmarek, J. Lin, A.S. Meli, A.A. Lamb, C. Leger, J. Wong, H. Hoffmann, D. Maze, and S. Amarasinghe. A Stream Compiler for Communication-Exposed Architectures. In ASPLOS, 2002.]] Google ScholarDigital Library
J. Gummaraju and M. Rosenblum. Stream Programming on General- Purpose Processors. In MICRO, 2005.]] Google ScholarDigital Library
H.P. Hofstee. Power Efficient Processor Architecture and The Cell Processor. HPCA, 00:258--262, 2005.]] Google ScholarDigital Library
U.J. Kapasi, S. Rixner, W.J. Dally, B. Khailany, J.H. Ahn, P. Mattson, and J.D. Owens. Programmable stream processors. IEEE Computer, 2003.]] Google ScholarDigital Library
M. Karczmarek, W. Thies, and S. Amarasinghe. Phased scheduling of stream programs. In LCTES, San Diego, CA, June 2003.]] Google ScholarDigital Library
M.A. Karczmarek. Constrained and Phased Scheduling of Synchronous Data Flow Graphs for the StreamIt Language. Master's thesis, MIT, 2002.]]Google Scholar
P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro, 25(2):21--29, 2005.]] Google ScholarDigital Library
Y.-K. Kwok and I. Ahmad. FASTEST: A Practical Low-Complexity Algorithm for Compile-Time Assignment of Parallel Programs to Multiprocessors. IEEE Trans. on Parallel and Distributed Systems, 10(2), 1999.]] Google ScholarDigital Library
Y.-K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv., 31(4):406--471, 1999.]] Google ScholarDigital Library
A.A. Lamb, W. Thies, and S. Amarasinghe. Linear Analysis and Optimization of Stream Programs. In PLDI, San Diego, CA, June 2003.]] Google ScholarDigital Library
J. Lebak. Polymorphous Computing Architecture (PCA) Example Applications and Description. External Report, Lincoln Laboratory, Mass. Inst. of Technology, 2001.]]Google Scholar
E.A. Lee and D.G. Messerschmitt. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing. IEEE Trans. Comput., 36(1):24--35, 1987.]] Google ScholarDigital Library
W.R. Mark, R.S. Glanville, K. Akeley, and M.J. Kilgard. Cg: A System for Programming Graphics Hardware in a C-like Language. In SIGGRAPH, 2003.]] Google ScholarDigital Library
D. May, R. Shepherd, and C. Keane. Communicating Process Architecture: Transputers and Occam. Future Parallel Computers: An Advanced Course, Pisa, Lecture Notes in Computer Science, 272, June 1987.]] Google ScholarDigital Library
Michael Sipser. Introduction to the Theory of Computation. PWS Publishing Company, 1997.]] Google ScholarDigital Library
G. Ottoni, R. Rangan, A. Stoler, and D.I. August. Automatic Thread Extraction with Decoupled Software Pipelining. In MICRO, 2005.]] Google ScholarDigital Library
K. Parhi and D. Messerschmitt. Static Rate-Optimal Scheduling of Iterative Data-Flow Programs Via Optimum Unfolding. IEEE Transactions on Computers, 40(2), 1991.]] Google ScholarDigital Library
J.L. Pino, S.S. Bhattacharyya, and E.A. Lee. A Hierarchical Multiprocessor Scheduling Framework for Synchronous Dataflow Graphs. Technical Report UCB/ERL M95/36, May 1995.]] Google ScholarDigital Library
J.L. Pino and E.A. Lee. Hierarchical Static Scheduling of Dataflow Graphs onto Multiple Processors. Proc. of the IEEE Conference on Acoustics, Speech, and Signal Processing, 1995.]]Google ScholarCross Ref
S. Seneff. Speech transformation system (spectrum and/or excitation) without pitch extraction. Master's thesis, MIT, 1980.]]Google Scholar
J. Sermulins, W. Thies, R. Rabbah, and S. Amarasinghe. Cache Aware Optimization of Stream Programs. In LCTES, Chicago, 2005.]] Google ScholarDigital Library
R. Stephens. A Survey of Stream Processing. Acta Informatica, 34(7), 1997.]]Google ScholarCross Ref
M.B. Taylor et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs. IEEE Micro vol 22, Issue 2, 2002.]] Google ScholarDigital Library
M.B. Taylor, W. Lee, J. Miller, D. Wentzlaff, et al. Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams. In ISCA, Munich, Germany, June 2004.]] Google ScholarDigital Library
W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In CC, France, 2002.]] Google ScholarDigital Library
E.Waingold, M. Taylor, D. Srikrishna, V. Sarkar,W. Lee, et al. Baring It All to Software: Raw Machines. IEEE Computer, 30(9), 1997.]] Google ScholarDigital Library
S. wei Liao, Z. Du, G. Wu, and G.-Y. Lueh. Data and Computation Transformations for Brook Streaming Applications on Multiprocessors. In CGO, 2006.]] Google ScholarDigital Library
D. Zhang, Z.-Z. Li, H. Song, and L. Liu. A Programming Model for an Embedded Media Processing Architecture. In SAMOS, 2005.]] Google ScholarDigital Library

Index Terms

Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language features
        Concurrent programming structures
      2. Language types
        Data flow languages

Recommendations

Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 2006 ASPLOS Conference

As multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (...
Read More
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems

As multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (...
Read More
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 2006 ASPLOS Conference

As multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGARCH Computer Architecture News Volume 34, Issue 5
Proceedings of the 2006 ASPLOS Conference
December 2006
425 pages
ISSN:0163-5964
DOI:10.1145/1168919
Issue’s Table of Contents
ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
October 2006
440 pages
ISBN:1595934510
DOI:10.1145/1168857
General Chair:
John Paul Shen
Intel Corp.
,
Program Chair:
Margaret R. Martonosi
Princeton University
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 October 2006
Check for updates
Author Tags
Raw
StreamIt
coarse-grained dataflow
multicore
software pipelining
streams
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 478
  Total Citations
  View Citations
- 3,176
  Total Downloads
- Downloads (Last 12 months)50
- Downloads (Last 6 weeks)7
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Exploiting coarse-grained task, data, and pipeline parallelism in stream programs

ACM SIGARCH Computer Architecture News

Abstract

References

Cited By

Index Terms

Recommendations

Exploiting coarse-grained task, data, and pipeline parallelism in stream programs

Exploiting coarse-grained task, data, and pipeline parallelism in stream programs

Exploiting coarse-grained task, data, and pipeline parallelism in stream programs