Abstract
Due to stringent power constraints, aggressive latency-hiding approaches such as prefetching are absent from state-of-the-art embedded processors. Two main factors make prefetching power-inefficient. First, compiler-inserted prefetch instructions increase code size and can therefore increase I-cache power. Second, inaccurate prefetching (especially hardware prefetching) wastes D-cache power on useless accesses. In this work, we show that power-efficient prefetching is possible through bit-differential offset assignment to stack variables. We target the prefetching of relocatable stack variables with a high degree of precision. By assigning offsets to stack variables such that most consecutively accessed addresses differ in a single bit, we can prefetch them with compact prefetch instructions, saving I-cache power. The compiler first builds an access graph of consecutive memory references and then lays out the memory locations in the smallest hypercube that holds them; each dimension of the hypercube represents one bit of differential addressing. The embedding is carried out in as compact a hypercube as possible in order to save memory space. Each load/store instruction carries a hint for prefetching the next memory reference, encoding that reference's address as a bit differential with respect to the current one. To reduce D-cache power, we further assign offsets so that most consecutive accesses map to the same cache line. Our prefetching uses a one-entry line buffer [1]; as a consequence, many D-cache lookups reduce to incremental ones, which cuts D-cache activity and saves power. Our prefetching requires both compiler and hardware support. In this paper, we describe an implementation on the ARM processor with small modifications to the ARM ISA.
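To make the embedding step concrete, here is a small greedy sketch (an illustration, not the paper's exact algorithm): it builds the access graph from a reference trace, picks the smallest hypercube that holds all variables, and tries to place heavily co-accessed variables at hypercube vertices whose offsets differ in exactly one bit. The function names and the greedy edge-by-edge strategy are assumptions made for this example.

```python
import math
from collections import Counter

def access_graph(trace):
    """Count how often each unordered pair of variables is accessed
    consecutively in the trace (the access-graph edge weights)."""
    w = Counter()
    for a, b in zip(trace, trace[1:]):
        if a != b:
            w[frozenset((a, b))] += 1
    return w

def assign_offsets(trace):
    """Greedy hypercube embedding sketch: heavier access-graph edges are
    placed first, preferring vertex pairs at Hamming distance 1."""
    variables = sorted(set(trace))
    # Smallest hypercube with at least len(variables) vertices.
    dim = max(1, math.ceil(math.log2(len(variables))))
    free = set(range(2 ** dim))        # unassigned offsets (vertices)
    placed = {}                        # variable -> offset
    for pair, _ in sorted(access_graph(trace).items(), key=lambda kv: -kv[1]):
        a, b = tuple(pair)
        if a not in placed and b not in placed:
            placed[a] = min(free)
            free.discard(placed[a])
        for x, y in ((a, b), (b, a)):
            if x in placed and y not in placed:
                # Prefer a free neighbor of x: offsets differing in one bit.
                for d in range(dim):
                    neighbor = placed[x] ^ (1 << d)
                    if neighbor in free:
                        placed[y] = neighbor
                        break
                else:
                    placed[y] = min(free)
                free.discard(placed[y])
    for v in variables:                # place any variables never seen on an edge
        if v not in placed:
            placed[v] = min(free)
            free.discard(placed[v])
    return placed
```

For a trace such as `['a', 'b', 'a', 'b', 'c', 'a']`, the heavy pair `(a, b)` ends up at offsets differing in a single bit, so the `a -> b` transition can be encoded as a one-bit prefetch hint.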
We address out-of-order commit, predication, and speculation through simple modifications to the processor pipeline on non-critical paths. Our goal in this work is to boost performance while maintaining or lowering power consumption. Our results show a 12% speedup with slightly lower power consumption.
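The D-cache savings rest on the one-entry line buffer [1]: an access that falls in the same cache line as the previous one is served by the buffer, skipping the full cache lookup. A minimal behavioral sketch (the function name and 32-byte line size are assumptions for this example):

```python
def line_buffer_hits(addresses, line_bytes=32):
    """Model a one-entry line buffer in front of the D-cache: an access
    whose line matches the previously fetched line is a buffer hit;
    otherwise a full D-cache lookup refills the buffer."""
    last_line = None
    hits = full_lookups = 0
    for addr in addresses:
        line = addr // line_bytes
        if line == last_line:
            hits += 1              # incremental lookup: buffer serves it
        else:
            full_lookups += 1      # full D-cache access, refill buffer
            last_line = line
    return hits, full_lookups
```

With offsets assigned so that consecutive stack accesses cluster in the same line, most accesses become buffer hits, which is exactly the D-cache activity reduction the scheme targets.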
References
[1] K. M. Wilson, K. Olukotun, and M. Rosenblum, "Increasing Cache Port Efficiency for Dynamic Superscalar Microprocessors," ISCA, 1996.
[2] S. Segars, "Low Power Design Techniques for Microprocessors," ISSCC, Feb. 2001.
[3] H. S. Lee, M. Smelyanskiy, C. J. Newburn, and G. S. Tyson, "Stack Value File: Custom Microarchitecture for the Stack," HPCA-7, Jan. 2001.
[4] S. Cho, P. C. Yew, and G. Lee, "Decoupling Local Variable Accesses in a Wide-Issue Superscalar Processor," ISCA, May 1999.
[5] S. Udayanarayanan and C. Chakrabarti, "Address Code Generation for DSPs," DAC, 2001.
[6] D. Bartley, "Optimizing Stack Frame Access for Processors with Restricted Addressing Modes," Software: Practice and Experience, 22(2): 101--110, Feb. 1992.
[7] S. Liao, S. Devadas, K. Keutzer, S. Tjiang, and A. Wang, "Storage Assignment to Decrease Code Size," ACM TOPLAS, 18(3): 235--253, May 1996.
[8] R. Leupers and P. Marwedel, "Algorithm for Address Assignment in DSP Code Generation," ICCAD, 1996.
[9] A. Rao and S. Pande, "Storage Assignment Optimizations to Generate Compact and Efficient Code on Embedded DSPs," PLDI, pp. 128--138, 1999.
[10] Intel Corp., "SA-110 Microprocessor Technical Reference Manual."
[11] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques and Tools, Addison-Wesley, 1986.
[12] S. Wilton and N. P. Jouppi, "An Enhanced Access and Cycle Time Model for On-Chip Caches," Technical Report TN-93/5, Compaq Western Research Lab, 1993.
[13] D. Burger and T. M. Austin, "The SimpleScalar Tool Set, Version 2.0," Tech. Report 1342, Univ. of Wisconsin--Madison, May 1997.
[14] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown, "MiBench: A Free, Commercially Representative Embedded Benchmark Suite," IEEE 4th Annual Workshop on Workload Characterization, 2001.
[15] P. R. Panda and N. D. Dutt, "Low-Power Memory Mapping Through Reducing Address Bus Activity," IEEE Transactions on VLSI Systems, 7(3), Sept. 1999.
[16] W. Noth and R. Kolla, "Spanning Tree-Based State Encoding for Low-Power Dissipation," DATE, pp. 168--174, 1999.
[17] K. Basu, A. Choudhary, J. Pisharath, and M. Kandemir, "Power Protocol: Reducing Power Dissipation on Off-Chip Data Buses," MICRO, Nov. 2002.
[18] E. Witchel, S. Larsen, C. S. Ananian, and K. Asanović, "Direct Addressed Caches for Reduced Power Consumption," MICRO, Dec. 2001.
[19] D. Brooks, V. Tiwari, and M. Martonosi, "Wattch: A Framework for Architectural-Level Power Analysis and Optimizations," ISCA, Jun. 2000.
[20] A. J. Smith, "Cache Memories," ACM Computing Surveys, 14(3), Sept. 1982.
LCTES '04: Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems.