Skip to main content
Erschienen in: Cluster Computing 1/2017

14.02.2017

A dynamic CTA scheduling scheme for massive parallel computing

verfasst von: Dong Oh Son, Cong Thuan Do, Hong Jun Choi, Jiseung Nam, Cheol Hong Kim

Erschienen in: Cluster Computing | Ausgabe 1/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Recent computing devices execute massive parallel data requiring huge computing hardware. To satisfy increasing computing need, GPUs providing powerful computational capability are employed to execute both graphics and general-purpose applications (GPGPUs). In the GPGPU, executing multiple applications together can increase the data parallelism, resulting in high resource utilization. Improving the resource utilization of the GPGPU can increase the GPGPU performance. However, various kinds of applications have different execution time depending on their workload sizes. Therefore, if one application is completed earlier than the other ones, resource underutilization problem may happen because the hardware resource allocated for the early completed application becomes idle. In this work, a CTA-aware dynamic streaming multiprocessors scheduling scheme is proposed for multiple applications execution in the GPGPU to efficiently manage hardware resources. Simulation results show that the proposed CTA-aware dynamic SM scheduling scheme can increase the GPU performance by up to 25.6% on average.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Agarwal, V., Hrishikesh, M.S., Keckler, S.W., Burger, D.: CLK rate versus IPC: the end of the road for conventional microarchitectures, In: Proceedings of the 27th International Symposium on Computer Architecture. pp. 248–259 (2000) Agarwal, V., Hrishikesh, M.S., Keckler, S.W., Burger, D.: CLK rate versus IPC: the end of the road for conventional microarchitectures, In: Proceedings of the 27th International Symposium on Computer Architecture. pp. 248–259 (2000)
2.
Zurück zum Zitat Olukotun, K., Nayfeh, B.A., Hammond, L., Wilson, K., Chang, K.: The case for a single-chip multiprocessor. In: Proceedings of 7th Conference on Architectural Support for Programming Languages and Operating Systems, pp. 2–11 (1996) Olukotun, K., Nayfeh, B.A., Hammond, L., Wilson, K., Chang, K.: The case for a single-chip multiprocessor. In: Proceedings of 7th Conference on Architectural Support for Programming Languages and Operating Systems, pp. 2–11 (1996)
3.
Zurück zum Zitat Hill, M.D., Marty, M.R.: Amdahls law in the multicore era. IEEE Comput. 41(7), 33–38 (2008)CrossRef Hill, M.D., Marty, M.R.: Amdahls law in the multicore era. IEEE Comput. 41(7), 33–38 (2008)CrossRef
4.
Zurück zum Zitat Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., Hanrahan, P.: Brook for GPUs: stream computing on graphics hardware. In: Proceedings of Computer Graphics and Interactive Techniques (SIGGRAPH), pp. 777–786 (2004) Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., Hanrahan, P.: Brook for GPUs: stream computing on graphics hardware. In: Proceedings of Computer Graphics and Interactive Techniques (SIGGRAPH), pp. 777–786 (2004)
5.
Zurück zum Zitat Lee, V.W., Kim, C.K., Chhugani, J., Deisher, M., Kim, D.H., Nguyen, A.D., Satish, N., Smelyanskiy, M., Chennupaty, S., Hammarlund, P., Singhal, R., Dubey, P.: Debunking the 100X GPU vs. CPU Myth: an Evaluation of Throughput Computing on CPU and GPU. In: Proceedings of International Symposium on Computer Architecture, pp. 451–460 (2010) Lee, V.W., Kim, C.K., Chhugani, J., Deisher, M., Kim, D.H., Nguyen, A.D., Satish, N., Smelyanskiy, M., Chennupaty, S., Hammarlund, P., Singhal, R., Dubey, P.: Debunking the 100X GPU vs. CPU Myth: an Evaluation of Throughput Computing on CPU and GPU. In: Proceedings of International Symposium on Computer Architecture, pp. 451–460 (2010)
10.
Zurück zum Zitat Tanasic, I., Gelado, I., Cabezas, J., Ramrez, A., Navarro, N., Valero, M.: Enabling preemptive multiprogramming on GPUs. In: Proceedings of the 41st International Symposium on Computer Architecture, pp. 193–204 (2014) Tanasic, I., Gelado, I., Cabezas, J., Ramrez, A., Navarro, N., Valero, M.: Enabling preemptive multiprogramming on GPUs. In: Proceedings of the 41st International Symposium on Computer Architecture, pp. 193–204 (2014)
11.
Zurück zum Zitat Xie, X., Liang, Y., Wang, Y., Sun, G., Wang, T.: Coordinated static and dynamic cache bypassing for GPUs. In: Proceedings of 21th IEEE International Symposium on High Performance Computer Architecture, pp. 76–88 (2015) Xie, X., Liang, Y., Wang, Y., Sun, G., Wang, T.: Coordinated static and dynamic cache bypassing for GPUs. In: Proceedings of 21th IEEE International Symposium on High Performance Computer Architecture, pp. 76–88 (2015)
12.
Zurück zum Zitat Voitsechov, D., Etsion, Y.: Single-graph multiple flows: energy efficient design alternative for GPGPUs. In: Proceedings of the 41st International Symposium on Computer Architecture, pp. 205–216 (2014) Voitsechov, D., Etsion, Y.: Single-graph multiple flows: energy efficient design alternative for GPGPUs. In: Proceedings of the 41st International Symposium on Computer Architecture, pp. 205–216 (2014)
13.
Zurück zum Zitat Lee, S., Arunkumar, A., Wu, C.: CAWA: coordinated warp scheduling and cache prioritization for critical warp acceleration of GPGPU workloads. In: Proceedings of the 42st International Symposium on Computer Architecture, pp. 515–527 (2015) Lee, S., Arunkumar, A., Wu, C.: CAWA: coordinated warp scheduling and cache prioritization for critical warp acceleration of GPGPU workloads. In: Proceedings of the 42st International Symposium on Computer Architecture, pp. 515–527 (2015)
14.
Zurück zum Zitat Wu, G.Y., Greathouse, J.L., Lyashevsky, A., Jayasena, N., Chiou, D.: GPGPU performance and power estimation using machine learning. In: Proceedings of 21th IEEE International Symposium on High Performance Computer Architecture, pp. 564–576 (2015) Wu, G.Y., Greathouse, J.L., Lyashevsky, A., Jayasena, N., Chiou, D.: GPGPU performance and power estimation using machine learning. In: Proceedings of 21th IEEE International Symposium on High Performance Computer Architecture, pp. 564–576 (2015)
15.
Zurück zum Zitat Lee, M., Song, S., Moon, J., Kim, J., Seo, W., Cho, Y., Ryu, S.: Improving GPGPU resources utilization through alternative thread block scheduling. In: Proceedings of 20th IEEE International Symposium on High Performance Computer Architecture, pp. 260–271 (2014) Lee, M., Song, S., Moon, J., Kim, J., Seo, W., Cho, Y., Ryu, S.: Improving GPGPU resources utilization through alternative thread block scheduling. In: Proceedings of 20th IEEE International Symposium on High Performance Computer Architecture, pp. 260–271 (2014)
16.
Zurück zum Zitat Jog, A., Bolotin, E., Guz, Z., Parker, M., Keckler, S.W., Kandemir, M.T., Das, C. R.: Application-aware memory system for fair and efficient execution of concurrent GPGPU applications. In: proceedings of 7th Workshop on General Purpose Processing Using GPUs (2014) Jog, A., Bolotin, E., Guz, Z., Parker, M., Keckler, S.W., Kandemir, M.T., Das, C. R.: Application-aware memory system for fair and efficient execution of concurrent GPGPU applications. In: proceedings of 7th Workshop on General Purpose Processing Using GPUs (2014)
18.
Zurück zum Zitat Son, D.O., Do, C.T., Choi, H.J., Kim, J.M., Park, J.H., Kim, C.H.: CTA-aware dynamic scheduling scheme for streaming multiprocessors in high-performance GPUs. In: Proceedings of Information Science and Applications (ICISA 2016), Vol. 376, pp. 1391–1399 (2016) Son, D.O., Do, C.T., Choi, H.J., Kim, J.M., Park, J.H., Kim, C.H.: CTA-aware dynamic scheduling scheme for streaming multiprocessors in high-performance GPUs. In: Proceedings of Information Science and Applications (ICISA 2016), Vol. 376, pp. 1391–1399 (2016)
20.
Zurück zum Zitat Thornton, J.E.: Parallel operation in the control data 6600. In: Proceedings of AFIPS Proceedings of FJCC, Part. 2, Vol. 26, pp. 33–40 (1964) Thornton, J.E.: Parallel operation in the control data 6600. In: Proceedings of AFIPS Proceedings of FJCC, Part. 2, Vol. 26, pp. 33–40 (1964)
21.
Zurück zum Zitat Bakhoda, A., Yuan, G.L., Fung, W.W.L., Wong, H., Aamodt, T.M.: Analyzing CUDA workloads using a detailed GPU simulator. In: Proceedings of 9th International Symposium on Performance Analysis of Systems and Software, pp. 163–174 (2009) Bakhoda, A., Yuan, G.L., Fung, W.W.L., Wong, H., Aamodt, T.M.: Analyzing CUDA workloads using a detailed GPU simulator. In: Proceedings of 9th International Symposium on Performance Analysis of Systems and Software, pp. 163–174 (2009)
22.
Zurück zum Zitat Bakhoda, A.G., Yuan, L., Fung, W.W.L., Wong, H., Aamodt, T.M.: Analyzing CUDA workloads using a detailed GPU simulator. In: Proceedings of 9th International Symposium on Performance Analysis of Systems and Software, pp. 163–174 (2009) Bakhoda, A.G., Yuan, L., Fung, W.W.L., Wong, H., Aamodt, T.M.: Analyzing CUDA workloads using a detailed GPU simulator. In: Proceedings of 9th International Symposium on Performance Analysis of Systems and Software, pp. 163–174 (2009)
23.
Zurück zum Zitat Li, S., Ahn, J.H., Strong, R.D., Brockman, J.B., Tullsen, D.M., Jouppi, N.P.: McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures. In: Proceedings of the International Symposium on Microarchitecture, pp. 469–480 (2009) Li, S., Ahn, J.H., Strong, R.D., Brockman, J.B., Tullsen, D.M., Jouppi, N.P.: McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures. In: Proceedings of the International Symposium on Microarchitecture, pp. 469–480 (2009)
Metadaten
Titel
A dynamic CTA scheduling scheme for massive parallel computing
verfasst von
Dong Oh Son
Cong Thuan Do
Hong Jun Choi
Jiseung Nam
Cheol Hong Kim
Publikationsdatum
14.02.2017
Verlag
Springer US
Erschienen in
Cluster Computing / Ausgabe 1/2017
Print ISSN: 1386-7857
Elektronische ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-0768-9

Weitere Artikel der Ausgabe 1/2017

Cluster Computing 1/2017 Zur Ausgabe