Published in: The Journal of Supercomputing 8/2019

25-01-2019

Application-aware NoC management in GPUs multitasking

Authors: Zhen Xu, Xia Zhao, Zhiying Wang, Canqun Yang

Abstract

Current network-on-chip (NoC) designs in GPUs are agnostic to application requirements, which wastes performance in GPU multitasking. We observe that applications can generally be classified as either network-sensitive or network-insensitive. We propose application-aware NoC (AA-NoC) management to better exploit these application characteristics. AA-NoC consists of topology-aware streaming multiprocessor (SM) mapping and adaptive virtual channel (VC) management. The topology-aware SM mapping is implemented in the concurrent thread array scheduler, and the adaptive VC management relies on lightweight online profiling that incurs only limited hardware overhead. Evaluation results show that, compared to a traditional application-agnostic NoC, AA-NoC improves system throughput (STP) and average normalized turnaround time (ANTT) by 19.7% and 20.9%, respectively.
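
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how STP and ANTT are commonly computed for a multiprogram workload, following the standard definitions by Eyerman and Eeckhout (2008). The function names and the cycle counts are hypothetical and chosen only for illustration; they are not taken from the paper's evaluation.

```python
from typing import Sequence


def _check(isolated: Sequence[float], shared: Sequence[float]) -> None:
    # Both lists must describe the same, non-empty set of applications.
    if len(isolated) != len(shared) or not isolated:
        raise ValueError("need one isolated and one shared cycle count per application")


def stp(isolated: Sequence[float], shared: Sequence[float]) -> float:
    """System throughput: sum of per-application normalized progress (higher is better)."""
    _check(isolated, shared)
    return sum(alone / together for alone, together in zip(isolated, shared))


def antt(isolated: Sequence[float], shared: Sequence[float]) -> float:
    """Average normalized turnaround time: mean per-application slowdown (lower is better)."""
    _check(isolated, shared)
    slowdowns = [together / alone for alone, together in zip(isolated, shared)]
    return sum(slowdowns) / len(slowdowns)


# Hypothetical two-application workload (execution time in cycles).
isolated_cycles = [100.0, 200.0]  # each application running alone on the GPU
shared_cycles = [125.0, 400.0]    # the same applications when co-running

print(stp(isolated_cycles, shared_cycles))
print(antt(isolated_cycles, shared_cycles))
```

Because STP is a higher-is-better metric and ANTT is a lower-is-better metric, an improvement in STP corresponds to an increase, while an improvement in ANTT corresponds to a reduction.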

Metadata
Title: Application-aware NoC management in GPUs multitasking
Authors: Zhen Xu, Xia Zhao, Zhiying Wang, Canqun Yang
Publication date: 25-01-2019
Publisher: Springer US
Published in: The Journal of Supercomputing, Issue 8/2019
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-018-2694-x
