
2018 | OriginalPaper | Chapter

GPU Computations and Memory Access Model Based on Petri Nets

Authors: Anna Gogolińska, Łukasz Mikulski, Marcin Piątkowski

Published in: Transactions on Petri Nets and Other Models of Concurrency XIII

Publisher: Springer Berlin Heidelberg


Abstract

In modern systems, CPUs as well as GPUs are equipped with multi-level memory architectures, in which the levels of the hierarchy differ in latency and capacity. For this reason, various memory access models have been studied. Such a model can be seen as an interface that abstracts the user from the details of the physical architecture. In this paper we present a general and uniform GPU computation and memory access model based on bounded inhibitor Petri nets (PNs). Its effectiveness is demonstrated by comparing its throughputs with the results of practical computational experiments performed on an Nvidia GPU with the CUDA architecture.
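To make the formalism concrete, the following is a minimal Python sketch of the firing rule of a Petri net with inhibitor arcs: a transition is enabled when its input places hold enough tokens and every place connected to it by an inhibitor arc is empty. The place and transition names (`ready_warp`, `memory_busy`, `issue_load`, `finish_load`) are hypothetical and chosen only for illustration; they are not taken from the authors' model.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    name: str
    inputs: dict = field(default_factory=dict)    # place -> tokens consumed
    outputs: dict = field(default_factory=dict)   # place -> tokens produced
    inhibitors: set = field(default_factory=set)  # places that must be empty


def enabled(marking, t):
    """Enabled iff all input arcs are covered and all inhibitor places are empty."""
    has_tokens = all(marking.get(p, 0) >= w for p, w in t.inputs.items())
    not_inhibited = all(marking.get(p, 0) == 0 for p in t.inhibitors)
    return has_tokens and not_inhibited


def fire(marking, t):
    """Return the marking reached by firing t; the input marking is not modified."""
    if not enabled(marking, t):
        raise ValueError(f"{t.name} is not enabled")
    new = dict(marking)
    for p, w in t.inputs.items():
        new[p] = new.get(p, 0) - w
    for p, w in t.outputs.items():
        new[p] = new.get(p, 0) + w
    return new


# Hypothetical toy net: a load may be issued only while the memory unit is idle.
issue_load = Transition("issue_load",
                        inputs={"ready_warp": 1},
                        outputs={"memory_busy": 1},
                        inhibitors={"memory_busy"})
finish_load = Transition("finish_load",
                         inputs={"memory_busy": 1},
                         outputs={"done": 1})

m0 = {"ready_warp": 2, "memory_busy": 0, "done": 0}
m1 = fire(m0, issue_load)    # memory_busy now holds a token
blocked = not enabled(m1, issue_load)  # the inhibitor arc blocks a second load
m2 = fire(m1, finish_load)   # memory_busy is empty again
```

In this toy net the place `memory_busy` never holds more than one token, which is the kind of boundedness the model relies on.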
Our PN model is consistent with the workflow of multithreaded GPU streaming multiprocessors. It models the selection and execution of instructions for each warp. The model includes three types of instructions: arithmetic operations, accesses to the shared memory, and accesses to the global memory. For a given algorithm, the model makes it possible to check how efficient the parallelization is and whether a different organization of threads would improve performance.
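The idea of selecting one instruction per warp and letting other warps run while a memory access is pending can be illustrated with a toy cycle counter. This is not the authors' Petri net model: it is a simplified sketch that assumes one instruction is issued per cycle, a fixed-priority choice among ready warps, and purely illustrative latencies in `LATENCY` (not measured values). The function name `simulate` and the instruction labels are hypothetical.

```python
# Assumed, purely illustrative latencies in cycles (not measured values).
LATENCY = {"arith": 1, "shared": 4, "global": 40}


def simulate(warps):
    """Count cycles needed to run all warps.

    warps: list of instruction lists, one list per warp.
    Each cycle, the first warp whose previous instruction has completed
    issues its next instruction. Returns the cycle at which the last
    instruction completes.
    """
    pc = [0] * len(warps)        # next instruction index per warp
    ready_at = [0] * len(warps)  # cycle at which each warp may issue again
    cycle = 0
    while any(pc[i] < len(w) for i, w in enumerate(warps)):
        for i, w in enumerate(warps):
            if pc[i] < len(w) and ready_at[i] <= cycle:
                ready_at[i] = cycle + LATENCY[w[pc[i]]]
                pc[i] += 1
                break  # at most one issue per cycle
        cycle += 1
    return max([cycle] + ready_at)


program = ["global", "arith"]          # load a value, then compute on it
one_warp = simulate([program])         # 41 cycles: the load is fully exposed
four_warps = simulate([program] * 4)   # 44 cycles: the loads overlap
```

Under these assumptions, four warps finish in 44 cycles rather than the 164 cycles that four serial runs would take, which is the kind of effect a change in thread organization can have on the predicted performance.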
The accuracy of our model was tested with different kernels. For the preliminary experiments we used the matrix multiplication program and the stability example created by Nvidia; for the main experiment we used a binary version of the least significant digit radix sort algorithm. We created three implementations of the algorithm using the CUDA architecture, differing in their usage of shared and global memory as well as in the organization of calculations. The PN model was applied to each implementation, and the results of the experiments are presented in this work.
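For readers unfamiliar with the algorithm, the following is a sequential Python sketch of a binary least significant digit radix sort. It is not one of the three CUDA implementations described above. Each pass performs a stable split on one bit, computing destination indices from an exclusive prefix sum, which is a common way to express the split step for data-parallel hardware. The function names are hypothetical, and the keys are assumed to be non-negative integers.

```python
from itertools import accumulate


def split_by_bit(keys, bit):
    """Stable split: keys with a 0 in the given bit first, then keys with a 1."""
    is_zero = [1 - ((k >> bit) & 1) for k in keys]
    # Exclusive prefix sum: number of zero-bit keys strictly before index i.
    zeros_before = [0] + list(accumulate(is_zero))[:-1]
    total_zeros = sum(is_zero)
    out = [None] * len(keys)
    for i, k in enumerate(keys):
        if is_zero[i]:
            dest = zeros_before[i]
        else:
            # Ones go after all zeros, keeping their relative order.
            dest = total_zeros + i - zeros_before[i]
        out[dest] = k
    return out


def binary_lsd_radix_sort(keys, num_bits=None):
    """Sort non-negative integers, one bit per pass, least significant bit first."""
    keys = list(keys)
    if num_bits is None:
        num_bits = max(keys, default=0).bit_length()
    for bit in range(num_bits):
        keys = split_by_bit(keys, bit)
    return keys


result = binary_lsd_radix_sort([5, 3, 7, 0, 2, 3])
```

On a GPU, the flag computation, the prefix sum, and the scatter of each pass would each be carried out in parallel, and whether the intermediate arrays live in shared or global memory is one of the choices that distinguishes implementations.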


Footnotes
1
Note that in the case of bounded nets the use of inhibitors is not necessary: one can provide an equivalent net (with a more complex structure) without inhibitors.
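One standard way to see this is the complement-place construction: for a place bounded by k, add a complement place that always holds k minus the tokens of the original place, and replace the emptiness test by a requirement of k tokens on the complement. The Python sketch below checks this on a hypothetical two-transition net with bound k = 1; the place and transition names are illustrative only, and the comparison is limited to firing sequences up to a fixed length.

```python
def enabled(m, t):
    """Input arcs covered and all inhibitor places (if any) empty."""
    return (all(m.get(p, 0) >= w for p, w in t["in"].items())
            and all(m.get(p, 0) == 0 for p in t.get("inh", ())))


def fire(m, t):
    new = dict(m)
    for p, w in t["in"].items():
        new[p] = new.get(p, 0) - w
    for p, w in t["out"].items():
        new[p] = new.get(p, 0) + w
    return new


def firing_sequences(net, marking, depth):
    """All sequences of transition names, of length <= depth, that can fire."""
    result = {()}
    frontier = [((), marking)]
    for _ in range(depth):
        nxt = []
        for seq, m in frontier:
            for name, t in net.items():
                if enabled(m, t):
                    nxt.append((seq + (name,), fire(m, t)))
        result.update(seq for seq, _ in nxt)
        frontier = nxt
    return result


# Net with an inhibitor arc: "start" may fire only while "busy" is empty.
with_inhibitor = {
    "start": {"in": {"ready": 1}, "out": {"busy": 1}, "inh": {"busy"}},
    "stop": {"in": {"busy": 1}, "out": {"done": 1}},
}
m_inh = {"ready": 2, "busy": 0, "done": 0}

# Inhibitor-free version: "free" is the complement of "busy" (bound k = 1),
# so busy + free == 1 in every reachable marking.
without_inhibitor = {
    "start": {"in": {"ready": 1, "free": 1}, "out": {"busy": 1}},
    "stop": {"in": {"busy": 1}, "out": {"done": 1, "free": 1}},
}
m_plain = {"ready": 2, "busy": 0, "free": 1, "done": 0}

same_behaviour = (firing_sequences(with_inhibitor, m_inh, 6)
                  == firing_sequences(without_inhibitor, m_plain, 6))
```

The second net needs an extra place and extra arcs, which illustrates why the inhibitor-free version has a more complex structure.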
 
Metadata
Title
GPU Computations and Memory Access Model Based on Petri Nets
Authors
Anna Gogolińska
Łukasz Mikulski
Marcin Piątkowski
Copyright Year
2018
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-58381-4_7
