ABSTRACT
Effective use of the memory hierarchy is critical for achieving high performance on embedded systems. We focus on the class of streaming applications, which is increasingly prevalent in the embedded domain. We exploit the widespread parallelism and regular communication patterns in stream programs to formulate a set of cache aware optimizations that automatically improve instruction and data locality. Our work is in the context of the Synchronous Dataflow model, in which a program is described as a graph of independent actors that communicate over channels. The communication rates between actors are known at compile time, allowing the compiler to statically model the caching behavior. We present three cache aware optimizations: 1) execution scaling, which judiciously repeats actor executions to improve instruction locality, 2) cache aware fusion, which combines adjacent actors while respecting instruction cache constraints, and 3) scalar replacement, which converts certain data buffers into a sequence of scalar variables that can be register allocated. The optimizations are founded upon a simple and intuitive model that quantifies the temporal locality for a sequence of actor executions. Our implementation of cache aware optimizations in the StreamIt compiler yields a 249% average speedup (over unoptimized code) for our streaming benchmark suite on a StrongARM 1110 processor. The optimizations also yield a 154% speedup on a Pentium 3 and a 152% speedup on an Itanium 2.
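To make the scheduling ideas in the abstract concrete, the sketch below models a two-actor Synchronous Dataflow pipeline and shows how execution scaling groups consecutive firings of the same actor. The function names (`steady_state_reps`, `scaled_schedule`) and the two-actor restriction are illustrative assumptions, not the StreamIt compiler's actual API; a real SDF scheduler solves balance equations over the whole graph.

```python
from math import gcd

def steady_state_reps(push, pop):
    """Solve the balance equation for a two-actor pipeline A -> B:
    reps_A * push == reps_B * pop, with the smallest positive solution.
    Returns (reps_A, reps_B) for one steady-state iteration."""
    g = gcd(push, pop)
    return pop // g, push // g

def scaled_schedule(push, pop, scaling):
    """Execution scaling: run each actor `scaling` times its steady-state
    repetition count back to back, so an actor's code is reused from the
    instruction cache across consecutive firings instead of being evicted
    by the other actor."""
    reps_a, reps_b = steady_state_reps(push, pop)
    return ["A"] * (reps_a * scaling) + ["B"] * (reps_b * scaling)

# A pushes 2 items per firing; B pops 3 per firing.
# Steady state: 3 firings of A produce 6 items, 2 firings of B consume 6.
print(steady_state_reps(2, 3))   # -> (3, 2)
print(scaled_schedule(2, 3, 2))  # -> ['A','A','A','A','A','A','B','B','B','B']
```

The trade-off the paper's locality model captures is visible even here: a larger `scaling` improves instruction locality but grows the channel buffer between A and B (6 items per scaled round in this example), which can hurt data locality — hence "judiciously repeats."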