DOI: 10.1145/1669112.1669154
research-article

Coordinated control of multiple prefetchers in multi-core systems

Published: 12 December 2009

ABSTRACT

Aggressive prefetching is very beneficial for the memory latency tolerance of many applications. However, it faces significant challenges in multi-core systems: the prefetchers of different cores on a chip multiprocessor (CMP) can interfere significantly with the prefetch and demand accesses of other cores. Because existing prefetcher throttling techniques do not address this prefetcher-caused inter-core interference, aggressive prefetching in multi-core systems can lead to significant performance degradation and wasted bandwidth.
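To make "prefetcher-caused inter-core interference" concrete, the following is a minimal Python sketch, not the paper's mechanism: a toy cache shared by several cores that charges a core's prefetcher whenever one of its prefetches evicts another core's line and that line is later demand-missed by a different core. The class name `SharedCache`, the fully-associative LRU organization, and exact per-address tracking in a dictionary are all assumptions made for illustration.

```python
from collections import OrderedDict, defaultdict


class SharedCache:
    """Toy fully-associative LRU cache shared by several cores (hypothetical model)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()             # addr -> core that inserted it, oldest first
        self.evicted_by_prefetch = {}          # addr -> core whose prefetch evicted it
        self.interference = defaultdict(int)   # core -> other cores' demand misses it caused

    def _insert(self, addr, core, is_prefetch):
        # A line being (re)inserted is no longer "missing", so drop any stale record.
        self.evicted_by_prefetch.pop(addr, None)
        if len(self.lines) >= self.num_lines:
            victim, victim_owner = self.lines.popitem(last=False)  # evict the LRU line
            if is_prefetch and victim_owner != core:
                # This core's prefetch displaced a line belonging to another core.
                self.evicted_by_prefetch[victim] = core
        self.lines[addr] = core

    def access(self, addr, core, is_prefetch=False):
        """Return True on a hit; charge demand misses on prefetch-evicted lines to the evictor."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return True
        if not is_prefetch:
            evictor = self.evicted_by_prefetch.pop(addr, None)
            if evictor is not None and evictor != core:
                self.interference[evictor] += 1
        self._insert(addr, core, is_prefetch)
        return False


if __name__ == "__main__":
    cache = SharedCache(num_lines=2)
    cache.access(0x100, core=1)                    # core 1 demand-fetches line A
    cache.access(0x200, core=1)                    # core 1 demand-fetches line B; cache is full
    cache.access(0x300, core=0, is_prefetch=True)  # core 0's prefetch evicts A (the LRU line)
    cache.access(0x100, core=1)                    # core 1 misses on A again
    print(dict(cache.interference))                # {0: 1}
```

The counter is attributed to the evicting core rather than the victim because the point of the model is to identify which prefetcher is hurting others; evictions of a core's own lines by its own prefetches are deliberately not counted as inter-core interference.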

To make prefetching effective in CMPs, this paper proposes a low-cost mechanism that controls prefetcher-caused inter-core interference by dynamically adjusting the aggressiveness of multiple cores' prefetchers in a coordinated fashion. Our solution consists of a hierarchy of prefetcher aggressiveness control structures that combine per-core (local) and prefetcher-caused inter-core (global) interference feedback to maximize the benefits of prefetching on each core while optimizing overall system performance. On an eight-core system, these structures improve system performance by 23% and reduce bus traffic by 17% compared to employing aggressive prefetching, and improve system performance by 14% compared to a state-of-the-art prefetcher aggressiveness control technique.
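The hierarchy described above can be pictured as two layers: a per-core controller that adjusts its own prefetcher from local feedback, and a global layer that may override that decision when the prefetcher harms other cores. The following Python sketch shows that structure under stated assumptions. The five aggressiveness levels, the accuracy and lateness inputs, the single `interference` metric, every threshold, and the override rule are hypothetical illustrations; the abstract does not specify the paper's actual feedback metrics or decision rules.

```python
from dataclasses import dataclass
from typing import List

MIN_LEVEL, MAX_LEVEL = 0, 4  # hypothetical: 0 = very conservative, 4 = very aggressive

# Illustrative thresholds; these are assumptions, not values from the paper.
ACC_LOW = 0.40            # below this, a prefetcher is treated as inaccurate
ACC_HIGH = 0.75           # at or above this, interference is tolerated as a worthwhile cost
INTERFERENCE_HIGH = 0.10  # hypothetical threshold on the global interference metric


@dataclass
class CoreFeedback:
    accuracy: float      # local: useful prefetches / issued prefetches
    late: bool           # local: do useful prefetches tend to arrive too late?
    interference: float  # global (hypothetical): share of other cores' misses this prefetcher caused


def local_vote(fb: CoreFeedback) -> int:
    """Per-core decision: +1 = more aggressive, -1 = less aggressive, 0 = keep."""
    if fb.accuracy < ACC_LOW:
        return -1
    return 1 if fb.late else 0


def global_override(fb: CoreFeedback, vote: int) -> int:
    """System-level check: throttle down a harmful, not-very-accurate prefetcher."""
    if fb.interference >= INTERFERENCE_HIGH and fb.accuracy < ACC_HIGH:
        return -1
    return vote


def next_levels(levels: List[int], feedback: List[CoreFeedback]) -> List[int]:
    """Compute each core's aggressiveness level for the next interval."""
    result = []
    for level, fb in zip(levels, feedback):
        step = global_override(fb, local_vote(fb))
        result.append(max(MIN_LEVEL, min(MAX_LEVEL, level + step)))
    return result


if __name__ == "__main__":
    fbs = [
        CoreFeedback(accuracy=0.90, late=True, interference=0.01),   # helpful and harmless
        CoreFeedback(accuracy=0.60, late=True, interference=0.20),   # wants more, hurts others
        CoreFeedback(accuracy=0.30, late=False, interference=0.00),  # inaccurate
    ]
    print(next_levels([2, 2, 2], fbs))  # [3, 1, 1]
```

In this sketch the global layer only restricts: it can turn a local "throttle up" into "throttle down" for a prefetcher that harms other cores, while highly accurate prefetchers are left to their local controller. Whether the paper's mechanism makes exactly this trade-off is not stated in the abstract.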


Published in

MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
December 2009
601 pages
ISBN: 9781605587981
DOI: 10.1145/1669112

Copyright © 2009 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall acceptance rate: 484 of 2,242 submissions, 22%
