2020 | OriginalPaper | Book chapter

Accelerating Lattice Boltzmann Method by Fully Exposing Vectorizable Loops

Authors: Bin Qu, Song Liu, Hailong Huang, Jiajun Yuan, Qian Wang, Weiguo Wu

Published in: Algorithms and Architectures for Parallel Processing

Publisher: Springer International Publishing

Abstract

The Lattice Boltzmann Method (LBM) plays an important role in CFD applications, so accelerating LBM computation reduces simulation costs for many industries. However, loop-carried dependencies in LBM kernels prevent loop vectorization, and general-purpose compilers therefore miss many vectorization opportunities. This paper proposes a SIMD-aware loop transformation algorithm that fully exposes vectorizable loops in LBM kernels. The algorithm identifies the most promising candidate loops for vectorization according to a defined dependence table. It then applies appropriate loop transformations and array copying techniques to legalize loop-carried dependencies, so that the compiler can vectorize the identified loops automatically. Experiments carried out on an Intel Xeon Gold 6140 server show that the proposed algorithm significantly raises the ratio of vectorized loops to all loops in LBM kernels. The algorithm also outperforms an Intel C++ compiler and a polyhedral optimizer, accelerating LBM computation by 147% and 120%, respectively, in average lattice update speed.
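To make the idea of array copying concrete, the following is a minimal sketch in C and not the algorithm from the paper. It assumes a hypothetical one-dimensional streaming step in which every value moves one cell to the right; the function names `stream_in_place` and `stream_with_copy` are invented for this illustration. The in-place version carries a dependence between iterations, while the copying version reads from a snapshot, so no iteration touches data that another iteration writes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-place shift: f[i] takes the old value of f[i-1].
 * Iterating downward keeps the result correct, but each iteration reads
 * a location that a later iteration overwrites (a loop-carried
 * anti-dependence on f). Whether a given compiler vectorizes this loop
 * depends on its dependence analysis. */
void stream_in_place(float *f, size_t n)
{
    assert(f != NULL);
    for (size_t i = n; i-- > 1; ) {
        f[i] = f[i - 1];
    }
}

/* Same shift with array copying: reads come from the snapshot `old`,
 * writes go to `f`. The two buffers must not overlap (hence restrict),
 * so the loop has no loop-carried dependence left. */
void stream_with_copy(float *restrict f, float *restrict old, size_t n)
{
    assert(f != NULL && old != NULL);
    memcpy(old, f, n * sizeof *f);   /* snapshot of the previous state */
    for (size_t i = 1; i < n; ++i) {
        f[i] = old[i - 1];
    }
}
```

Both functions leave `f[0]` unchanged and produce the same result. The copying version trades extra memory traffic for a loop that is straightforward to vectorize, which is the kind of trade-off the abstract alludes to.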

Title: Accelerating Lattice Boltzmann Method by Fully Exposing Vectorizable Loops
Authors: Bin Qu, Song Liu, Hailong Huang, Jiajun Yuan, Qian Wang, Weiguo Wu
Publisher: Springer International Publishing
Book: Algorithms and Architectures for Parallel Processing
Print ISBN: 978-3-030-38990-1
Electronic ISBN: 978-3-030-38991-8
Copyright year: 2020
DOI: https://doi.org/10.1007/978-3-030-38991-8_8