Value locality and load value prediction

Published: 01 September 1996
DOI: 10.1145/237090.237173

ABSTRACT

Since the introduction of virtual memory demand-paging and cache memories, computer systems have been exploiting spatial and temporal locality to reduce the average latency of a memory reference. In this paper, we introduce the notion of value locality, a third facet of locality that is frequently present in real-world programs, and describe how to effectively capture and exploit it in order to perform load value prediction. Temporal and spatial locality are attributes of storage locations, and describe the future likelihood of references to those locations or their close neighbors. In a similar vein, value locality describes the likelihood of the recurrence of a previously-seen value within a storage location. Modern processors already exploit value locality in a very restricted sense through the use of control speculation (i.e. branch prediction), which seeks to predict the future value of a single condition bit based on previously-seen values. Our work extends this to predict entire 32- and 64-bit register values based on previously-seen values. We find that, just as condition bits are fairly predictable on a per-static-branch basis, full register values being loaded from memory are frequently predictable as well. Furthermore, we show that simple microarchitectural enhancements to two modern microprocessor implementations (based on the PowerPC 620 and Alpha 21164) that enable load value prediction can effectively exploit value locality to collapse true dependencies, reduce average memory latency and bandwidth requirements, and provide measurable performance gains.
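To make the idea of predicting a load's value from previously seen values concrete, here is a minimal sketch of a last-value predictor. It is an illustration only, not the mechanism evaluated in the paper: the class name `LastValuePredictor`, the function `value_locality`, the table size, and the modulo indexing by load address are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class LastValuePredictor:
    """Hypothetical last-value predictor: one remembered value per table slot."""
    num_entries: int = 1024  # assumed table size, not taken from the paper
    table: dict = field(default_factory=dict)

    def _index(self, load_pc: int) -> int:
        # Assumed direct-mapped indexing by the static load's address.
        return load_pc % self.num_entries

    def predict(self, load_pc: int):
        """Return the value last recorded for this slot, or None if empty."""
        return self.table.get(self._index(load_pc))

    def update(self, load_pc: int, actual_value: int) -> None:
        """Record the value the load actually returned."""
        self.table[self._index(load_pc)] = actual_value


def value_locality(trace, predictor=None) -> float:
    """Fraction of loads whose value matched the prediction.

    `trace` is an iterable of (load_pc, loaded_value) pairs.
    """
    if predictor is None:
        predictor = LastValuePredictor()
    correct = total = 0
    for load_pc, value in trace:
        # Compare the prediction with the real value, then train the table.
        if predictor.predict(load_pc) == value:
            correct += 1
        predictor.update(load_pc, value)
        total += 1
    return correct / total if total else 0.0


# Toy trace: the load at 0x400 keeps returning 7; the load at 0x404 changes.
trace = [(0x400, 7), (0x404, 1), (0x400, 7), (0x404, 2), (0x400, 7)]
rate = value_locality(trace)
```

On this toy trace the two repeat loads at `0x400` are predicted correctly and the other three are not, so `rate` is 0.4. A real design would also need a way to verify each prediction and recover from a wrong one, which this sketch leaves out.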


Published in

ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
October 1996
290 pages
ISBN: 0897917677
DOI: 10.1145/237090

Copyright © 1996 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

ASPLOS VII Paper Acceptance Rate: 25 of 109 submissions, 23%
Overall Acceptance Rate: 535 of 2,713 submissions, 20%
