
International Journal of Parallel Programming

Published: 9 October 2017

The Design of NoC-Side Memory Access Scheduling for Energy-Efficient GPGPUs

Authors: Wenjie Liu, Sheng Ma, Libo Huang, Zhiying Wang

Published in: International Journal of Parallel Programming | Issue 4/2018


Abstract

Memory access scheduling schemes, often performed in memory controllers, have a marked impact on relieving the heavy burden placed on the memory systems of GPGPUs. Existing out-of-order scheduling schemes, such as FR-FCFS, improve memory access efficiency by reordering memory request sequences at the destination. Their effectiveness, however, comes at the expense of complex logic and high power consumption. In this paper, we propose a NoC-side memory access scheduling scheme based on the key insight that transmission through the on-chip network is the dominant factor in destroying row access locality and causing poor memory access efficiency. With appropriate NoC-side optimization, straightforward in-order scheduling can be used in memory controllers to simplify the scheduling logic and relieve the tight power envelope. Moreover, we introduce several lightweight optimizations to further improve system performance. Experimental results on memory-intensive applications show that, compared with FR-FCFS, our proposed scheme increases overall system performance by 10.5%, reduces power consumption by 20%, and improves energy efficiency by 36.9%.
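The abstract's argument turns on DRAM row-buffer locality: a request that finds its row already open (a row hit) is cheap, while one that needs a different row is not. The Python sketch below is a toy model for intuition only, not the authors' scheme. It assumes a single bank with one open row, represents each request only by its row number, and uses a simplified FR-FCFS with a fixed reordering window (`window`, `fcfs`, `fr_fcfs` and `row_hits` are hypothetical names chosen for illustration). It compares in-order service of a stream interleaved by the network, in-order service of a stream that arrives already grouped by row (loosely standing in for what NoC-side optimization aims to deliver), and FR-FCFS reordering at the controller.

```python
from collections import deque


def fcfs(requests):
    """In-order (FCFS) service: requests leave in exactly the order they arrived."""
    return list(requests)


def fr_fcfs(requests, window):
    """Simplified FR-FCFS over a single bank (toy model).

    Up to `window` requests are buffered; the oldest buffered request that hits
    the currently open row is served first, otherwise the oldest request is served.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pending = deque(requests)   # requests not yet delivered to the controller
    queue = []                  # controller request buffer, oldest first
    served = []
    open_row = None             # no row is open initially
    while pending or queue:
        # Refill the controller buffer from the incoming stream.
        while pending and len(queue) < window:
            queue.append(pending.popleft())
        # "First ready": oldest row hit if any, else the oldest request.
        idx = next((i for i, row in enumerate(queue) if row == open_row), 0)
        row = queue.pop(idx)
        served.append(row)
        open_row = row          # serving a request leaves its row open
    return served


def row_hits(order):
    """Count accesses that find their row already open in the row buffer."""
    hits, open_row = 0, None
    for row in order:
        if row == open_row:
            hits += 1
        open_row = row
    return hits


if __name__ == "__main__":
    # Two hypothetical sources, each streaming four requests to one DRAM row.
    interleaved = [1, 2, 1, 2, 1, 2, 1, 2]  # arrival order after network interleaving
    grouped = [1, 1, 1, 1, 2, 2, 2, 2]      # arrival order with row locality preserved
    print("FCFS, interleaved:", row_hits(fcfs(interleaved)))
    print("FCFS, grouped:    ", row_hits(fcfs(grouped)))
    print("FR-FCFS, window=8:", row_hits(fr_fcfs(interleaved, window=8)))
    print("FR-FCFS, window=2:", row_hits(fr_fcfs(interleaved, window=2)))
```

Run as a script, this prints 0, 6, 6 and 4 row hits out of 8 accesses: in-order service of the interleaved stream gets no hits, while a grouped arrival order lets plain FCFS match FR-FCFS with a large window. The model deliberately ignores timing, multiple banks and the hardware cost of the reordering buffer, which is the cost the abstract attributes to FR-FCFS.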



Title: The Design of NoC-Side Memory Access Scheduling for Energy-Efficient GPGPUs
Authors: Wenjie Liu, Sheng Ma, Libo Huang, Zhiying Wang
Publication date: 9 October 2017
Publisher: Springer US
Published in: International Journal of Parallel Programming / Issue 4/2018
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-017-0521-2
