Value locality and load value prediction

Authors:
Mikko H. Lipasti
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh PA

Christopher B. Wilkerson
Intel Corporation in Portland, Oregon and Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh PA

John Paul Shen
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh PA

ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems, October 1996, Pages 138–147
https://doi.org/10.1145/237090.237173

Published: 01 September 1996

ABSTRACT

Since the introduction of virtual memory demand-paging and cache memories, computer systems have been exploiting spatial and temporal locality to reduce the average latency of a memory reference. In this paper, we introduce the notion of value locality, a third facet of locality that is frequently present in real-world programs, and describe how to effectively capture and exploit it in order to perform load value prediction. Temporal and spatial locality are attributes of storage locations, and describe the future likelihood of references to those locations or their close neighbors. In a similar vein, value locality describes the likelihood of the recurrence of a previously-seen value within a storage location. Modern processors already exploit value locality in a very restricted sense through the use of control speculation (i.e. branch prediction), which seeks to predict the future value of a single condition bit based on previously-seen values. Our work extends this to predict entire 32- and 64-bit register values based on previously-seen values. We find that, just as condition bits are fairly predictable on a per-static-branch basis, full register values being loaded from memory are frequently predictable as well. Furthermore, we show that simple microarchitectural enhancements to two modern microprocessor implementations (based on the PowerPC 620 and Alpha 21164) that enable load value prediction can effectively exploit value locality to collapse true dependencies, reduce average memory latency and bandwidth requirements, and provide measurable performance gains.
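The idea in the abstract can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the microarchitecture evaluated in the paper: it keeps the last value seen by each static load in a table indexed by the load's program counter, predicts that the next execution of that load returns the same value, and reports the fraction of correct predictions as a rough measure of value locality at a history depth of one. The class name `LastValuePredictor`, the `table_size` parameter, the indexing function, and the synthetic trace are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LastValuePredictor:
    """Untagged, direct-mapped table of last-seen load values.

    Hypothetical illustration: the table size and the PC-based index
    are assumptions, not parameters taken from the paper.
    """
    table_size: int = 1024
    table: dict = field(default_factory=dict)
    hits: int = 0
    total: int = 0

    def _index(self, pc: int) -> int:
        # Assume 4-byte instructions: drop the two low PC bits, then wrap.
        return (pc >> 2) % self.table_size

    def access(self, pc: int, actual_value: int) -> bool:
        """Predict the load's value, check it, then train on the real value."""
        idx = self._index(pc)
        predicted = self.table.get(idx)  # None until the entry is trained
        correct = predicted == actual_value
        self.total += 1
        self.hits += correct
        self.table[idx] = actual_value
        return correct

    def value_locality(self) -> float:
        """Fraction of loads whose value matched the last value seen."""
        return self.hits / self.total if self.total else 0.0


def demo_trace():
    """Synthetic (pc, loaded value) pairs standing in for a real load trace."""
    trace = []
    for i in range(8):
        trace.append((0x1000, 42))  # load that keeps returning a constant
        trace.append((0x1004, i))   # load whose value changes every time
    return trace


predictor = LastValuePredictor()
for pc, value in demo_trace():
    predictor.access(pc, value)
print(f"value locality: {predictor.value_locality():.2%}")
```

On this synthetic trace the constant load is predicted correctly on every execution after the first, while the changing load is never predicted correctly, so 7 of the 16 loads hit. A real design would also need a way to verify each prediction and recover from a wrong one, which this sketch leaves out.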



Published in
ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
October 1996
290 pages
ISBN: 0897917677
DOI: 10.1145/237090
Chairmen: Bill Dally (Massachusetts Institute of Technology), Susan Eggers (Univ. of Washington, Seattle)

ACM SIGPLAN Notices, Volume 31, Issue 9
Sept. 1996
273 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/248209

ACM SIGOPS Operating Systems Review, Volume 30, Issue 5
Dec. 1996
273 pages
ISSN: 0163-5980
DOI: 10.1145/248208

Copyright © 1996 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
ASPLOS VII Paper Acceptance Rate: 25 of 109 submissions, 23%
Overall Acceptance Rate: 535 of 2,713 submissions, 20%