Abstract
As multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (image, video, DSP, etc.) are naturally represented by independent filters that communicate over explicit data channels.In this paper, we demonstrate an end-to-end stream compiler that attains robust multicore performance in the face of varying application characteristics. As benchmarks exhibit different amounts of task, data, and pipeline parallelism, we exploit all types of parallelism in a unified manner in order to achieve this generality. Our compiler, which maps from the StreamIt language to the 16-core Raw architecture, attains a 11.2x mean speedup over a single-core baseline, and a 1.84x speedup over our previous work.
- Raza Microelectronics, Inc. http://www.razamicroelectronics.com/products/xlr.htm.]]Google Scholar
- StreamIt Language Specification. http://cag.lcs.mit.edu/streamit/papers/streamit-lang-spec.pdf.]]Google Scholar
- S. Agrawal, W. Thies, and S. Amarasinghe. Optimizing Stream Programs Using Linear State Space Analysis. In CASES, San Francisco, CA, Sept. 2005.]] Google ScholarDigital Library
- J. Andrews and N. Baker. Xbox 360 System Architecture. IEEE Micro, 26(2), 2006.]] Google ScholarDigital Library
- S. Bakshi and D.D. Gajski. Partitioning and pipelining for performance-constrained hardware/software systems. IEEE Trans. Very Large Scale Integr. Syst., 7(4):419--432, 1999.]] Google ScholarDigital Library
- I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: Stream Computing on Graphics Hardware. In SIGGRAPH, 2004.]] Google ScholarDigital Library
- L.-F. Chao and E.H.-M. Sha. Scheduling Data-Flow Graphs via Retiming and Unfolding. IEEE Trans. on Parallel and Distributed Systems, 08(12), 1997.]] Google ScholarDigital Library
- K.S. Chatha and R. Vemuri. Hardware-Software partitioning and pipelined scheduling of transformative applications. IEEE Trans. Very Large Scale Integr. Syst., 10(3), 2002.]] Google ScholarDigital Library
- M.K. Chen, X.F. Li, R. Lian, J.H. Lin, L. Liu, T. Liu, and R. Ju. Shangri-La: Achieving High Performance from Compiled Network Applications While Enabling Ease of Programming. In PLDI, New York, NY, USA, 2005.]] Google ScholarDigital Library
- C. Consel, H. Hamdi, L. Rveillre, L. Singaravelu, H. Yu, and C. Pu. Spidle: A DSL Approach to Specifying Streaming Applications. In 2nd Int. Conf. on Generative Prog. and Component Engineering, 2003.]] Google ScholarDigital Library
- M. Drake, H. Hoffman, R. Rabbah, and S. Amarasinghe. MPEG-2 Decoding in a Stream Programming Language. In IPDPS, Rhodes Island, Greece, April 2006.]] Google ScholarDigital Library
- W. Du, R. Ferreira, and G. Agrawal. Compiler Support for Exploiting Coarse-Grained Pipelined Parallelism. In Supercomputing, 2005.]] Google ScholarDigital Library
- E. and D. Messerschmitt. Pipeline interleaved programmable DSP's: Synchronous data flow programming. IEEE Trans. on Signal Processing, 35(9), 1987.]]Google Scholar
- W. Eatherton. The Push of Network Processing to the Top of the Pyramid. Keynote Address, Symposium on Architectures for Networking and Communications Systems, 2005.]]Google Scholar
- M. Gordon,W. Thies, M. Karczmarek, J. Lin, A.S. Meli, A.A. Lamb, C. Leger, J. Wong, H. Hoffmann, D. Maze, and S. Amarasinghe. A Stream Compiler for Communication-Exposed Architectures. In ASPLOS, 2002.]] Google ScholarDigital Library
- J. Gummaraju and M. Rosenblum. Stream Programming on General- Purpose Processors. In MICRO, 2005.]] Google ScholarDigital Library
- H.P. Hofstee. Power Efficient Processor Architecture and The Cell Processor. HPCA, 00:258--262, 2005.]] Google ScholarDigital Library
- U.J. Kapasi, S. Rixner, W.J. Dally, B. Khailany, J.H. Ahn, P. Mattson, and J.D. Owens. Programmable stream processors. IEEE Computer, 2003.]] Google ScholarDigital Library
- M. Karczmarek, W. Thies, and S. Amarasinghe. Phased scheduling of stream programs. In LCTES, San Diego, CA, June 2003.]] Google ScholarDigital Library
- M.A. Karczmarek. Constrained and Phased Scheduling of Synchronous Data Flow Graphs for the StreamIt Language. Master's thesis, MIT, 2002.]]Google Scholar
- P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro, 25(2):21--29, 2005.]] Google ScholarDigital Library
- Y.-K. Kwok and I. Ahmad. FASTEST: A Practical Low-Complexity Algorithm for Compile-Time Assignment of Parallel Programs to Multiprocessors. IEEE Trans. on Parallel and Distributed Systems, 10(2), 1999.]] Google ScholarDigital Library
- Y.-K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv., 31(4):406--471, 1999.]] Google ScholarDigital Library
- A.A. Lamb, W. Thies, and S. Amarasinghe. Linear Analysis and Optimization of Stream Programs. In PLDI, San Diego, CA, June 2003.]] Google ScholarDigital Library
- J. Lebak. Polymorphous Computing Architecture (PCA) Example Applications and Description. External Report, Lincoln Laboratory, Mass. Inst. of Technology, 2001.]]Google Scholar
- E.A. Lee and D.G. Messerschmitt. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing. IEEE Trans. Comput., 36(1):24--35, 1987.]] Google ScholarDigital Library
- W.R. Mark, R.S. Glanville, K. Akeley, and M.J. Kilgard. Cg: A System for Programming Graphics Hardware in a C-like Language. In SIGGRAPH, 2003.]] Google ScholarDigital Library
- D. May, R. Shepherd, and C. Keane. Communicating Process Architecture: Transputers and Occam. Future Parallel Computers: An Advanced Course, Pisa, Lecture Notes in Computer Science, 272, June 1987.]] Google ScholarDigital Library
- Michael Sipser. Introduction to the Theory of Computation. PWS Publishing Company, 1997.]] Google ScholarDigital Library
- G. Ottoni, R. Rangan, A. Stoler, and D.I. August. Automatic Thread Extraction with Decoupled Software Pipelining. In MICRO, 2005.]] Google ScholarDigital Library
- K. Parhi and D. Messerschmitt. Static Rate-Optimal Scheduling of Iterative Data-Flow Programs Via Optimum Unfolding. IEEE Transactions on Computers, 40(2), 1991.]] Google ScholarDigital Library
- J.L. Pino, S.S. Bhattacharyya, and E.A. Lee. A Hierarchical Multiprocessor Scheduling Framework for Synchronous Dataflow Graphs. Technical Report UCB/ERL M95/36, May 1995.]] Google ScholarDigital Library
- J.L. Pino and E.A. Lee. Hierarchical Static Scheduling of Dataflow Graphs onto Multiple Processors. Proc. of the IEEE Conference on Acoustics, Speech, and Signal Processing, 1995.]]Google ScholarCross Ref
- S. Seneff. Speech transformation system (spectrum and/or excitation) without pitch extraction. Master's thesis, MIT, 1980.]]Google Scholar
- J. Sermulins, W. Thies, R. Rabbah, and S. Amarasinghe. Cache Aware Optimization of Stream Programs. In LCTES, Chicago, 2005.]] Google ScholarDigital Library
- R. Stephens. A Survey of Stream Processing. Acta Informatica, 34(7), 1997.]]Google ScholarCross Ref
- M.B. Taylor et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs. IEEE Micro vol 22, Issue 2, 2002.]] Google ScholarDigital Library
- M.B. Taylor, W. Lee, J. Miller, D. Wentzlaff, et al. Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams. In ISCA, Munich, Germany, June 2004.]] Google ScholarDigital Library
- W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In CC, France, 2002.]] Google ScholarDigital Library
- E.Waingold, M. Taylor, D. Srikrishna, V. Sarkar,W. Lee, et al. Baring It All to Software: Raw Machines. IEEE Computer, 30(9), 1997.]] Google ScholarDigital Library
- S. wei Liao, Z. Du, G. Wu, and G.-Y. Lueh. Data and Computation Transformations for Brook Streaming Applications on Multiprocessors. In CGO, 2006.]] Google ScholarDigital Library
- D. Zhang, Z.-Z. Li, H. Song, and L. Liu. A Programming Model for an Embedded Media Processing Architecture. In SAMOS, 2005.]] Google ScholarDigital Library
Index Terms
- Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Recommendations
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 2006 ASPLOS ConferenceAs multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (...
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systemsAs multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (...
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 2006 ASPLOS ConferenceAs multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (...
Comments