Skip to main content

2016 | OriginalPaper | Buchkapitel

Optimizing Wilson-Dirac Operator and Linear Solvers for Intel® KNL

verfasst von : Bálint Joó, Dhiraj D. Kalamkar, Thorsten Kurth, Karthikeyan Vaidyanathan, Aaron Walden

Erschienen in: High Performance Computing

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Lattice Quantumchromodynamics (QCD) is a powerful tool to numerically access the low energy regime of QCD in a straightforward way with quantifyable uncertainties. In this approach, QCD is discretized on a four dimensional, Euclidean space-time grid with millions of degrees of freedom. In modern lattice calculations, most of the work is still spent in solving large, sparse linear systems. This part has two challenges, i.e. optimizing the sparse matrix application as well as BLAS-like kernels used in the linear solver. We are going to present performance optimizations of the Dirac operator (dslash) with and without clover term for recent Intel® architectures, i.e. Haswell and Knights Landing (KNL). We were able to achieve a good fraction of peak performance for the Wilson-Dslash kernel, and Conjugate Gradients and Stabilized BiConjugate Gradients solvers. We will also present a series of experiments we performed on KNL, i.e. running MCDRAM in different modes, enabling or disabling hardware prefetching as well as using different SoA lengths. Furthermore, we will present a weak scaling study up to 16 KNL nodes.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Due to the page limitation, we can not include numbers from SNC studies in this presentation.
 
2
we have also tested a beta of ICC 2017, but we found no significant differences in performance.
 
3
Similar measurements for DDR yield a similar result, i.e. \({\sim }70\,\mathrm {GB/s}\) which corresponds to about \(77\,\%\) of DDR bandwidth peak performance. Remarkably, this value is lower as the computed effective bandwidth for the Dslash kernel.
 
Literatur
3.
Zurück zum Zitat Clark, M.A., Babich, R., Barros, K., Brower, R.C., Rebbi, C.: Solving Lattice QCD systems of equations using mixed precision solvers on GPUs. Comput. Phys. Commun. 181, 1517–1528 (2010)CrossRefMATH Clark, M.A., Babich, R., Barros, K., Brower, R.C., Rebbi, C.: Solving Lattice QCD systems of equations using mixed precision solvers on GPUs. Comput. Phys. Commun. 181, 1517–1528 (2010)CrossRefMATH
4.
Zurück zum Zitat Creutz, M.: Quarks, Gluons and Lattices. Cambridge Monographs on Mathematical Physics, 169 p. Univ. Pr., Cambridge (1983) Creutz, M.: Quarks, Gluons and Lattices. Cambridge Monographs on Mathematical Physics, 169 p. Univ. Pr., Cambridge (1983)
5.
Zurück zum Zitat Edwards, H.C., Sunderland, D.: Kokkos array performance-portable manycore programming model. In: Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2012, pp. 1–10. ACM, New York (2012). http://doi.acm.org/10.1145/2141702.2141703 Edwards, H.C., Sunderland, D.: Kokkos array performance-portable manycore programming model. In: Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2012, pp. 1–10. ACM, New York (2012). http://​doi.​acm.​org/​10.​1145/​2141702.​2141703
6.
Zurück zum Zitat Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bureau Stand. 49(6), 409–436 (1952)MathSciNetCrossRefMATH Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bureau Stand. 49(6), 409–436 (1952)MathSciNetCrossRefMATH
7.
Zurück zum Zitat Heybrock, S., Joó, B., Kalamkar, D.D., Smelyanskiy, M., Vaidyanathan, K., Wettig, T., Dubey, P.: Lattice QCD with domain decomposition on intel® xeon phi™ co-processors. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014, pp. 69–80. IEEE Press, Piscataway (2014). http://dx.doi.org/10.1109/SC.2014.11 Heybrock, S., Joó, B., Kalamkar, D.D., Smelyanskiy, M., Vaidyanathan, K., Wettig, T., Dubey, P.: Lattice QCD with domain decomposition on intel® xeon phi™ co-processors. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014, pp. 69–80. IEEE Press, Piscataway (2014). http://​dx.​doi.​org/​10.​1109/​SC.​2014.​11
13.
Zurück zum Zitat Kalamkar, D.D., Smelyanskiy, M., Farber, R., Vaidyanathan, K.: Chapter 26 - quantum chromodynamics (QCD). In: Reinders, J., Jeffers, J., Sodani, A. (eds.) Intel Xeon Phi Processor High Performance Programming Knights Landing Edition. Morgan Kaufmann, Boston (2016) Kalamkar, D.D., Smelyanskiy, M., Farber, R., Vaidyanathan, K.: Chapter 26 - quantum chromodynamics (QCD). In: Reinders, J., Jeffers, J., Sodani, A. (eds.) Intel Xeon Phi Processor High Performance Programming Knights Landing Edition. Morgan Kaufmann, Boston (2016)
14.
Zurück zum Zitat Montvay, I., Munster, G.: Quantum Fields on a Lattice. Cambridge Monographs on Mathematical Physics, 491 p. Univ. Pr., Cambridge (1994) Montvay, I., Munster, G.: Quantum Fields on a Lattice. Cambridge Monographs on Mathematical Physics, 491 p. Univ. Pr., Cambridge (1994)
15.
Zurück zum Zitat Nguyen, A.D., Satish, N., Chhugani, J., Kim, C., Dubey, P.: 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In: SC, pp. 1–13 (2010) Nguyen, A.D., Satish, N., Chhugani, J., Kim, C., Dubey, P.: 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In: SC, pp. 1–13 (2010)
17.
Zurück zum Zitat Sheikholeslami, B., Wohlert, R.: Improved continuum limit lattice action for QCD with Wilson Fermions. Nucl. Phys. B 259, 572 (1985)CrossRef Sheikholeslami, B., Wohlert, R.: Improved continuum limit lattice action for QCD with Wilson Fermions. Nucl. Phys. B 259, 572 (1985)CrossRef
18.
Zurück zum Zitat van der Vorst, H.A.: Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 13(2), 631–644 (1992)MathSciNetCrossRefMATH van der Vorst, H.A.: Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 13(2), 631–644 (1992)MathSciNetCrossRefMATH
19.
Zurück zum Zitat Walden, A., Khan, S., Joó, B., Ranjan, D., Zubair, M.: Optimizing a multiple right-hand side Dslash kernel for intel knights corner. In: Taufer, M., Mohr, B., Kunkel, J.M. (eds.) High Performance Computing. LNCS, vol. 9945, pp. 1–12. Springer International Publishing, Switzerland (2016) Walden, A., Khan, S., Joó, B., Ranjan, D., Zubair, M.: Optimizing a multiple right-hand side Dslash kernel for intel knights corner. In: Taufer, M., Mohr, B., Kunkel, J.M. (eds.) High Performance Computing. LNCS, vol. 9945, pp. 1–12. Springer International Publishing, Switzerland (2016)
20.
Zurück zum Zitat Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for floating-point programs and multicore architectures. Commun. ACM 52, 65–76 (2009)CrossRef Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for floating-point programs and multicore architectures. Commun. ACM 52, 65–76 (2009)CrossRef
21.
Zurück zum Zitat Wilson, K.G.: Quarks and strings on a lattice. In: Zichichi, A. (ed.) New Phenomena in Subnuclear Physics, p. 69. Plenum Press, New York (1975) Wilson, K.G.: Quarks and strings on a lattice. In: Zichichi, A. (ed.) New Phenomena in Subnuclear Physics, p. 69. Plenum Press, New York (1975)
Metadaten
Titel
Optimizing Wilson-Dirac Operator and Linear Solvers for Intel® KNL
verfasst von
Bálint Joó
Dhiraj D. Kalamkar
Thorsten Kurth
Karthikeyan Vaidyanathan
Aaron Walden
Copyright-Jahr
2016
DOI
https://doi.org/10.1007/978-3-319-46079-6_30

Neuer Inhalt