DOI: 10.1145/1736020.1736060
research-article

Inter-core cooperative TLB for chip multiprocessors

Published: 13 March 2010

ABSTRACT

Translation Lookaside Buffers (TLBs) are commonly employed in modern processor designs and have considerable impact on overall system performance. A number of past works have studied TLB designs to lower access times and miss rates, specifically for uniprocessors. With the growing dominance of chip multiprocessors (CMPs), it is necessary to examine TLB performance in the context of parallel workloads.

This work is the first to present TLB prefetchers that exploit commonality in TLB miss patterns across cores in CMPs. We propose and evaluate two Inter-Core Cooperative (ICC) TLB prefetching mechanisms, assessing their effectiveness at eliminating TLB misses both individually and together. Our results show these approaches require at most modest hardware and can collectively eliminate 19% to 90% of data TLB (D-TLB) misses across the surveyed parallel workloads.

We also compare performance improvements across a range of hardware and software implementation possibilities. We find that while a fully-hardware implementation results in average performance improvements of 8-46% for a range of TLB sizes, a hardware/software approach yields improvements of 4-32%. Overall, our work shows that TLB prefetchers exploiting inter-core correlations can effectively eliminate TLB misses.


Published in

ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems
March 2010
422 pages
ISBN: 9781605588391
DOI: 10.1145/1736020
General Chair: James C. Hoe
Program Chair: Vikram S. Adve

• ACM SIGARCH Computer Architecture News, Volume 38, Issue 1 (ASPLOS '10), March 2010, 399 pages, ISSN: 0163-5964, DOI: 10.1145/1735970
• ACM SIGPLAN Notices, Volume 45, Issue 3 (ASPLOS '10), March 2010, 399 pages, ISSN: 0362-1340, EISSN: 1558-1160, DOI: 10.1145/1735971

              Copyright © 2010 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ASPLOS XV Paper Acceptance Rate: 32 of 181 submissions, 18%
Overall Acceptance Rate: 535 of 2,713 submissions, 20%
