
2015 | Original Paper | Book Chapter

A Machine Learning Approach for a Scalable, Energy-Efficient Utility-Based Cache Partitioning

Authors: Isa Ahmet Guney, Abdullah Yildiz, Ismail Ugur Bayindir, Kemal Cagri Serdaroglu, Utku Bayik, Gurhan Kucuk

Published in: High Performance Computing

Publisher: Springer International Publishing


Abstract

In multi- and many-core processors, a shared Last Level Cache (LLC) is used to alleviate the performance problems caused by long-latency memory instructions. However, an unmanaged LLC can become nearly useless when the running threads have conflicting interests. At one extreme, a thread may benefit from every portion of the cache; at the other, a thread may simply thrash the whole LLC. A variety of way-partitioning mechanisms have recently been introduced to improve cache performance, and almost all of these studies use the Utility-based Cache Partitioning (UCP) algorithm as their allocation policy. The UCP look-ahead algorithm provides a better utility measure than its greedy counterpart, but it requires very complex hardware circuitry and dissipates a considerable amount of energy at the end of each decision period. In this study, we propose an offline supervised machine learning algorithm that replaces the UCP look-ahead circuitry with circuitry whose hardware and energy cost is almost negligible. Depending on the cache and processor configuration, our analysis and simulation results show that the proposed mechanism reduces the overall transistor count by up to 5 % and the overall processor energy by up to 5 % without introducing any performance penalty.
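For readers unfamiliar with the allocation policy the abstract refers to, the following is a minimal software sketch of the look-ahead way allocation as described in the original UCP work by Qureshi and Patt. It is not the authors' proposed machine learning replacement, whose details the abstract does not give. The function names, the list-based miss curves, and the tie-breaking rule (the lowest thread index wins) are illustrative assumptions; the real mechanism is a hardware circuit fed by per-thread utility monitors.

```python
from typing import List, Sequence, Tuple


def max_marginal_utility(misses: Sequence[int], alloc: int,
                         balance: int) -> Tuple[float, int]:
    """Return the best utility-per-way reachable by adding 1..balance ways.

    misses[w] is the (hypothetical) miss count of a thread given w ways.
    """
    best_mu, best_extra = 0.0, 1  # default to one way so the caller always progresses
    for extra in range(1, balance + 1):
        # Marginal utility: misses saved per additional way.
        mu = (misses[alloc] - misses[alloc + extra]) / extra
        if mu > best_mu:
            best_mu, best_extra = mu, extra
    return best_mu, best_extra


def lookahead_partition(miss_curves: Sequence[Sequence[int]],
                        total_ways: int) -> List[int]:
    """Partition total_ways among threads with the look-ahead heuristic."""
    n = len(miss_curves)
    alloc = [1] * n                 # every thread keeps at least one way
    balance = total_ways - n        # ways still to hand out
    while balance > 0:
        candidates = [max_marginal_utility(curve, a, balance)
                      for curve, a in zip(miss_curves, alloc)]
        # Assumed tie-break: max() returns the lowest index among equals.
        winner = max(range(n), key=lambda i: candidates[i][0])
        alloc[winner] += candidates[winner][1]
        balance -= candidates[winner][1]
    return alloc


# Thread 0 gains steadily from each way; thread 1 only benefits at 3 ways.
steady = [100, 90, 80, 70, 60]
cliff = [100, 100, 100, 20, 20]
print(lookahead_partition([steady, cliff], total_ways=4))  # [1, 3]
```

A greedy allocator that only looks one way ahead would see zero benefit for the second thread and give every spare way to the first. Scanning all remaining ways per thread on every iteration is what exposes the cliff, and it is also the source of the hardware and energy cost the abstract mentions.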


Metadata
Title
A Machine Learning Approach for a Scalable, Energy-Efficient Utility-Based Cache Partitioning
Authors
Isa Ahmet Guney
Abdullah Yildiz
Ismail Ugur Bayindir
Kemal Cagri Serdaroglu
Utku Bayik
Gurhan Kucuk
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-20119-1_29
