
2019 | OriginalPaper | Chapter

LRnLA Algorithm ConeFold with Non-local Vectorization for LBM Implementation

Authors : Anastasia Perepelkina, Vadim Levchenko

Published in: Supercomputing

Publisher: Springer International Publishing

Abstract

We achieved a performance of \({\sim }0.3\) GLUps (giga lattice updates per second) on a 4-core CPU for the D3Q19 lattice Boltzmann method (LBM) by using an advanced time-space decomposition approach. The LRnLA (locally recursive non-locally asynchronous) algorithm ConeFold was combined with a new non-local mirrored vectorization. The roofline model was used to estimate performance and to choose the algorithm parameters. The developed kernel can be extended in many ways, so it may serve as a foundation for more complex LBM variations.
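The roofline estimate mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the peak FLOP rate, the memory bandwidth, the FLOP count per cell update, and the reuse factor `T` are hypothetical placeholders, and the function name `roofline_lups` is introduced here only for illustration. The only fixed inputs are that D3Q19 stores 19 populations per cell and that a plain stream-collide step reads and writes each of them once. The assumption that time-space blocking divides memory traffic per update by `T` is a simplification.

```python
def roofline_lups(peak_flops, bandwidth, flops_per_update, bytes_per_update):
    """Attainable lattice updates per second under the roofline model."""
    intensity = flops_per_update / bytes_per_update      # FLOP per byte
    attainable_flops = min(peak_flops, bandwidth * intensity)
    return attainable_flops / flops_per_update


Q = 19                    # populations per cell in D3Q19
BYTES_PER_VALUE = 8       # double precision
# One read and one write of every population per cell update.
BYTES_NAIVE = 2 * Q * BYTES_PER_VALUE

# Hypothetical machine and kernel parameters (assumptions, not measured data).
PEAK_FLOPS = 200e9        # FLOP/s
BANDWIDTH = 25e9          # bytes/s
FLOPS_PER_UPDATE = 300    # FLOP per cell update

# Step-by-step update: every time step streams all data through main memory.
naive = roofline_lups(PEAK_FLOPS, BANDWIDTH, FLOPS_PER_UPDATE, BYTES_NAIVE)

# Time-space blocking: assume data loaded once is reused for T time steps,
# so the memory traffic per cell update drops by a factor of T.
T = 8
blocked = roofline_lups(PEAK_FLOPS, BANDWIDTH, FLOPS_PER_UPDATE, BYTES_NAIVE / T)

compute_limit = PEAK_FLOPS / FLOPS_PER_UPDATE

print(f"naive estimate:   {naive / 1e9:.3f} GLUps")
print(f"blocked estimate: {blocked / 1e9:.3f} GLUps")
print(f"compute limit:    {compute_limit / 1e9:.3f} GLUps")
```

With these placeholder numbers the step-by-step estimate is bandwidth-bound, and raising the data reuse moves the estimate toward the compute limit. This is the general motivation for time-space decomposition in memory-bound stencil codes.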

Metadata
Title
LRnLA Algorithm ConeFold with Non-local Vectorization for LBM Implementation
Authors
Anastasia Perepelkina
Vadim Levchenko
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-05807-4_9
