DOI: 10.1145/3264746.3264766
research-article

Hardware-accelerated cache simulation for multicore by FPGA

Published: 09 October 2018

ABSTRACT

Developers often use a virtual platform to develop software before the target hardware is available. To optimize that software, it is important to profile the cache misses of applications in a realistic operating environment on the virtual platform. On multicore systems, however, simulating coherence cache misses at high speed is difficult. In this paper, we propose a hardware-accelerated architecture for simulating the cache misses of a multicore system, and we implement the cache miss simulator on top of a virtual platform using an FPGA. Users can profile their software as if it were running on the multicore system. In our evaluation, the simulator processes 65 MB of trace log per second with the FPGA running at 100 MHz, and it occupies about 570,000 logic elements to simulate four L1 caches and one L2 cache for a multicore system with four virtual CPUs. The system achieves a speedup of 1.6 to 2 times over the popular cache miss simulator Dinero IV, even though Dinero IV does less work: it does not support coherence cache misses in a multicore system. These results show that a hardware-accelerated, FPGA-based architecture offers a clear advantage in speeding up cache miss simulation for multicore systems.
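To make the notion of a coherence cache miss concrete, the following is a minimal software sketch of trace-driven multicore cache simulation. It is not the paper's FPGA design. The trace format `(cpu, op, addr)`, the 64-byte line size, the LRU set-associative L1 geometry, and the simple write-invalidate policy are all assumptions chosen for illustration, and the names `L1Cache` and `simulate` are hypothetical. The L2 cache and write upgrades on shared lines are not modeled.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed, not taken from the paper)


class L1Cache:
    """Private set-associative LRU cache that remembers invalidated lines."""

    def __init__(self, num_sets=64, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.invalidated = set()  # lines removed by another core's write

    def _set_for(self, line):
        return self.sets[line % self.num_sets]

    def access(self, line):
        """Return 'hit', 'coherence_miss', or 'miss', installing the line."""
        s = self._set_for(line)
        if line in s:
            s.move_to_end(line)  # mark as most recently used
            return "hit"
        # A miss on a line that was lost to invalidation is a coherence miss.
        kind = "coherence_miss" if line in self.invalidated else "miss"
        self.invalidated.discard(line)
        if len(s) >= self.ways:
            s.popitem(last=False)  # evict the least recently used line
        s[line] = True
        return kind

    def invalidate(self, line):
        """Drop the line if present and remember why it disappeared."""
        s = self._set_for(line)
        if line in s:
            del s[line]
            self.invalidated.add(line)


def simulate(trace, num_cpus=4):
    """Replay (cpu, op, addr) records; op is 'R' or 'W'. Returns per-CPU counts."""
    caches = [L1Cache() for _ in range(num_cpus)]
    stats = [{"hit": 0, "miss": 0, "coherence_miss": 0} for _ in range(num_cpus)]
    for cpu, op, addr in trace:
        line = addr // LINE_SIZE
        stats[cpu][caches[cpu].access(line)] += 1
        if op == "W":
            # Write-invalidate: every other core loses its copy of the line.
            for other, cache in enumerate(caches):
                if other != cpu:
                    cache.invalidate(line)
    return stats


if __name__ == "__main__":
    trace = [
        (0, "R", 0x1000),  # CPU 0 first touch
        (1, "R", 0x1000),  # CPU 1 first touch
        (0, "W", 0x1000),  # CPU 0 writes, invalidating CPU 1's copy
        (1, "R", 0x1000),  # CPU 1 re-reads the invalidated line
        (0, "R", 0x1000),  # CPU 0 still holds the line
    ]
    for cpu, counts in enumerate(simulate(trace)):
        print(cpu, counts)
```

In this sketch, a uniprocessor simulator would see CPU 1's second read as an ordinary hit or miss in isolation. Only a model that tracks the other cores' writes can attribute it to coherence, which is the extra work a multicore simulator has to perform for every trace record.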

References

  1. Fabrice Bellard. 2005. QEMU, a fast and portable dynamic translator. In USENIX Annual Technical Conference, FREENIX Track. 41-46.
  2. Erik Berg, Hakan Zeffer, and Erik Hagersten. 2006. A statistical multiprocessor cache model. In Performance Analysis of Systems and Software, 2006 IEEE International Symposium on. IEEE, 89-99.
  3. Kristof Beyls and Erik D'Hollander. 2001. Reuse distance as a metric for cache behavior. In Proceedings of the IASTED Conference on Parallel and Distributed Computing and Systems, Vol. 14. 350-360.
  4. Derek Chiou, Dam Sunwoo, Joonsoo Kim, Nikhil A Patil, William Reinhart, Darrel Eric Johnson, Jebediah Keefe, and Hari Angepat. 2007. FPGA-accelerated simulation technologies (FAST): Fast, full-system, cycle-accurate simulators. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 249-261.
  5. Intel Corporation. n.d. SignalTap II with Verilog Designs.
  6. Intel Corporation. n.d. Using ModelSim to Simulate Logic Circuits in Verilog Designs.
  7. Intel Corporation. n.d. Using TimeQuest Timing Analyzer.
  8. Intel Corporation. 2017. Avalon® Interface Specifications.
  9. Jan Edler and Mark D. Hill. n.d. Dinero IV Trace-Driven Uniprocessor Cache Simulator.
  10. Matthew R Guthaus, Jeffrey S Ringenberg, Dan Ernst, Todd M Austin, Trevor Mudge, and Richard B Brown. 2001. MiBench: A free, commercially representative embedded benchmark suite. In Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop on. IEEE, 3-14.
  11. Mark D Hill and Alan Jay Smith. 1989. Evaluating associativity in CPU caches. IEEE Trans. Comput. 38, 12 (1989), 1612-1630.
  12. Matthew Jacobsen, Dustin Richmond, Matthew Hogains, and Ryan Kastner. 2015. RIFFA 2.1: A reusable integration framework for FPGA accelerators. ACM Transactions on Reconfigurable Technology and Systems (TRETS) 8, 4 (2015), 22.
  13. Xiaoyue Pan and Bengt Jonsson. 2014. Modeling cache coherence misses on multicores. In Performance Analysis of Systems and Software (ISPASS), 2014 IEEE International Symposium on. IEEE, 96-105.
  14. Derek L Schuff, Milind Kulkarni, and Vijay S Pai. 2010. Accelerating multicore reuse distance analysis with sampling and parallelization. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques. ACM, 53-64.
  15. Yakun Sophia Shao, Sam Likun Xi, Vijayalakshmi Srinivasan, Gu-Yeon Wei, and David Brooks. 2016. Co-designing accelerators and SoC interfaces using gem5-Aladdin. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on. IEEE, 1-12.
  16. Chia-Heng Tu, Hui-Hsin Hsu, Jen-Hao Chen, Chun-Han Chen, and Shih-Hao Hung. 2014. Performance and power profiling for emulated Android systems. ACM Transactions on Design Automation of Electronic Systems (TODAES) 19, 2 (2014), 10.

            Published in

            RACS '18: Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems
            October 2018
            355 pages
            ISBN: 9781450358859
            DOI: 10.1145/3264746

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 393 of 1,581 submissions, 25%