research-article

Instruction cache locking using temporal reuse profile

Authors:
Yun Liang

National University of Singapore

National University of Singapore
View Profile

,
Tulika Mitra

National University of Singapore

National University of Singapore
View Profile

DAC '10: Proceedings of the 47th Design Automation ConferenceJune 2010Pages 344–349https://doi.org/10.1145/1837274.1837362

Published:13 June 2010Publication History

DAC '10: Proceedings of the 47th Design Automation Conference

Pages 344–349

ABSTRACT

The performance of most embedded systems is critically dependent on the average memory access latency. Improving the cache hit rate can have significant positive impact on the performance of an application. Modern embedded processors often feature cache locking mechanisms that allow memory blocks to be locked in the cache under software control. Cache locking was primarily designed to offer timing predictability for hard real-time applications. Hence, the compiler optimization techniques focus on employing cache locking to improve worst-case execution time. However, cache locking can be quite effective in improving the average-case execution time of general embedded applications as well. In this paper, we explore static instruction cache locking to improve average-case program performance. We introduce temporal reuse profile to accurately and efficiently model the cost and benefit of locking memory blocks in the cache. We propose an optimal algorithm and a heuristic approach that use the temporal reuse profile to determine the most beneficial memory blocks to be locked in the cache. Experimental results show that locking heuristic achieves close to optimal results and can improve the cache miss rate by up to 24% across a suite of real-world benchmarks. Moreover, our heuristic provides significant improvement compared to the state-of-the-art locking algorithm both in terms of performance and efficiency.

References

3rd Generation Intel Xscale Microarchitecture Developers's Manual. Intel, May 2007. http://www.intel.com/design/intelxscale.Google Scholar
ADSP-BF533 Processor Hardware Reference. Analog Devices, April 2009. http://www.analog.com/static/imported-files/processor_manuals/bf533_hwr_Rev3.4.pdf.Google Scholar
ARM Cortex A-8 Technical Reference Manual. ARM, Revised March 2004. http://www.arm.com/products/CPUs/families/ARMCortexFamily.html.Google Scholar
ARM1156T2-S Technical Reference Manual. ARM, Revised July 2007. http://www.arm.com/products/CPUs/families/ARM11Family.html.Google Scholar
K. Anand and R. Barua. Instruction cache locking inside a binary rewriter. In CASES, 2009. Google ScholarDigital Library
T. Austin et al. Simplescalar: An infrastructure for computer system modeling. Computer, 35(2):59--67, 2002. Google ScholarDigital Library
K. Beyls and E. H. D'Hollander. Reuse distance as a metric for cache behavior. In Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Systems, 2001.Google Scholar
B. Bryan and J. K. Hollingsworth. An API for runtime code patching. Int. J. High Perform. Comput. Appl., 14(4), 2000. Google ScholarDigital Library
C. Ding and Y. T. Zhong. Predicting whole-program locality through reuse distance analysis. SIGPLAN Not., 38(5), 2003. Google ScholarDigital Library
H. Falk et al. Compile-time decided instruction cache locking using worst-case execution paths. In CODES+ISSS, 2007. Google ScholarDigital Library
N. Gloy and M. D. Smith. Procedure placement using temporal-ordering information. ACM Trans. Program. Lang. Syst., 21(5):977--1027, 1999. Google ScholarDigital Library
I. Puaut and D. Decotigny. Low-complexity algorithms for static cache locking in multitasking hard real-time systems. In RTSS, 2002. Google ScholarDigital Library
V. Suhendra and T. Mitra. Exploring locking & partitioning for predictable shared caches on multi-cores. In DAC, 2008. Google ScholarDigital Library
X. Vera et al. Data cache locking for higher program predictability. In SIGMETRICS, 2003. Google ScholarDigital Library
H. Yang et al. Improving power efficiency with compiler-assisted cache replacement. J. Embedded Comput., 1(4):487--499, 2005. Google ScholarDigital Library

Index Terms

Instruction cache locking using temporal reuse profile
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Recommendations

Phase-based Cache Locking for Embedded Systems
GLSVLSI '15: Proceedings of the 25th edition on Great Lakes Symposium on VLSI

Since caches are commonly used in embedded systems, which typically have stringent design constraints imposed by physical size, battery capacity, real-time deadlines, etc., much research focuses on cache optimizations, such as improved performance and/...
Read More
Instruction-Cache Locking for Improving Embedded Systems Performance
Special Issue on Embedded Platforms for Crypto and Regular Papers

Cache memories in embedded systems play an important role in reducing the execution time of applications. Various kinds of extensions have been added to cache hardware to enable software involvement in replacement decisions, improving the runtime over a ...
Read More
Instruction Cache Locking Using Temporal Reuse Profile
The performance of most embedded systems is critically dependent on the average memory access latency. Improving the cache hit rate can have significant positive impact on the performance of an application. Modern embedded processors often feature cache ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
DAC '10: Proceedings of the 47th Design Automation Conference
June 2010
1036 pages
ISBN:9781450300025
DOI:10.1145/1837274
General Chair:
Sachin Sapatnekar
Univ. of Minnesota, Minneapolis, MN
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 June 2010
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
cache
cache locking
temporal reuse profile
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate1,770of5,499submissions,32%
Upcoming Conference
DAC '24

Sponsor:

sigda

61st ACM/IEEE Design Automation Conference

June 23 - 27, 2024

San Francisco , CA , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 40
  Total Citations
  View Citations
- 258
  Total Downloads
- Downloads (Last 12 months)6
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Instruction cache locking using temporal reuse profile

DAC '10: Proceedings of the 47th Design Automation Conference

ABSTRACT

References

Cited By

Index Terms

Recommendations

Phase-based Cache Locking for Embedded Systems

Instruction-Cache Locking for Improving Embedded Systems Performance

Instruction Cache Locking Using Temporal Reuse Profile