Published in: Cluster Computing 3/2017

7 April 2017

Performance analysis and comparison of cellular automata GPU implementations

Authors: Emmanuel N. Millán, Nicolás Wolovick, María Fabiana Piccoli, Carlos García Garino, Eduardo M. Bringa

Abstract

Cellular automata (CA) models are of interest in several scientific areas, and there is growing interest in exploring large systems that require high-performance computing. This work presents a CA implementation that performs well on five different NVIDIA GPU architectures, from Tesla to Maxwell, simulating systems with up to a billion cells. Using the game of life (GoL) and a more complex variation of GoL as examples, a performance of 5.58e6 evaluated cells/s is achieved. The two optimizations most often used in previous studies are shared memory and Multicell algorithms. Here, these optimizations do not improve performance on Fermi or newer architectures. The GoL CA code running on an NVIDIA Titan X obtained a speedup of up to \(\sim\)85x, and up to \(\sim\)230x for a more complex CA, compared to an optimized serial CPU implementation. Finally, the efficiency of each GPU is analyzed in terms of cell performance/transistors and cell performance/bandwidth, showing how the architectures improved for this particular problem.
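
To make the "evaluated cells/s" metric concrete, the following is a minimal serial sketch of one GoL generation and of how such a rate could be computed. It is not the authors' code and is far from the optimized CPU or GPU implementations discussed in the abstract. The function names `life_step` and `cells_per_second` are hypothetical, and the periodic (wrap-around) boundary is an assumption made only for this illustration.

```python
import time


def life_step(grid):
    """Advance a 2D Game of Life grid (lists of 0/1) by one generation.

    Assumption for this sketch: periodic boundaries, so the grid wraps
    around at its edges.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight Moore neighbours, wrapping around the edges.
            n = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr or dc:
                        n += grid[(r + dr) % rows][(c + dc) % cols]
            # Conway's rule: birth on 3 neighbours, survival on 2 or 3.
            new[r][c] = 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0
    return new


def cells_per_second(grid, steps):
    """Run `steps` generations and return (final grid, evaluated cells/s)."""
    start = time.perf_counter()
    for _ in range(steps):
        grid = life_step(grid)
    elapsed = time.perf_counter() - start
    # Every cell is evaluated once per generation.
    return grid, len(grid) * len(grid[0]) * steps / elapsed
```

In this sketch the rate is simply cells per generation times generations, divided by wall-clock time. A speedup like those quoted in the abstract would then be the ratio of two such rates measured on the same system size.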

Metadata
Title
Performance analysis and comparison of cellular automata GPU implementations
Authors
Emmanuel N. Millán
Nicolás Wolovick
María Fabiana Piccoli
Carlos García Garino
Eduardo M. Bringa
Publication date
7 April 2017
Publisher
Springer US
Published in
Cluster Computing / Issue 3/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-0850-3
