Skip to main content
Top
Published in: International Journal of Parallel Programming 3/2017

14-06-2016

Porting the PLASMA Numerical Library to the OpenMP Standard

Authors: Asim YarKhan, Jakub Kurzak, Piotr Luszczek, Jack Dongarra

Published in: International Journal of Parallel Programming | Issue 3/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

PLASMA is a numerical library intended as a successor to LAPACK for solving problems in dense linear algebra on multicore processors. PLASMA relies on the QUARK scheduler for efficient multithreading of algorithms expressed in a serial fashion. QUARK is a superscalar scheduler and implements automatic parallelization by tracking data dependencies and resolving data hazards at runtime. Recently, this type of scheduling has been incorporated in the OpenMP standard, which allows to transition PLASMA from the proprietary solution offered by QUARK to the standard solution offered by OpenMP. This article studies the feasibility of such transition.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Agullo, E., Bouwmeester, H., Dongarra, J., Kurzak, J., Langou, J., Rosenberg, L.: Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures. In: High Performance Computing for Computational Science—VECPAR 2010, pp. 129–138. Springer (2011) Agullo, E., Bouwmeester, H., Dongarra, J., Kurzak, J., Langou, J., Rosenberg, L.: Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures. In: High Performance Computing for Computational Science—VECPAR 2010, pp. 129–138. Springer (2011)
2.
go back to reference Agullo, E., Demmel, J., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Ltaief, H., Luszczek, P., Tomov, S.: Numerical linear algebra on emerging architectures: the PLASMA and MAGMA projects. In: Journal of Physics: Conference Series, vol. 180, p. 012037. IOP Publishing (2009) Agullo, E., Demmel, J., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Ltaief, H., Luszczek, P., Tomov, S.: Numerical linear algebra on emerging architectures: the PLASMA and MAGMA projects. In: Journal of Physics: Conference Series, vol. 180, p. 012037. IOP Publishing (2009)
3.
go back to reference Agullo, E., Hadri, B., Ltaief, H., Dongarrra, J.: Comparative study of one-sided factorizations with multiple software packages on multi-core hardware. In: SC ’09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1–12. New York (2009) Agullo, E., Hadri, B., Ltaief, H., Dongarrra, J.: Comparative study of one-sided factorizations with multiple software packages on multi-core hardware. In: SC ’09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1–12. New York (2009)
4.
go back to reference Amdahl, G.M.: Validity of the single-processor approach to achieving large scale computing capabilities. In: AFIPS Conference Proceedings, vol. 30, pp. 483–485, Atlantic City, N.J., APR 18–20 1967. AFIPS Press, Reston (1967) Amdahl, G.M.: Validity of the single-processor approach to achieving large scale computing capabilities. In: AFIPS Conference Proceedings, vol. 30, pp. 483–485, Atlantic City, N.J., APR 18–20 1967. AFIPS Press, Reston (1967)
5.
go back to reference Anderson, E., Dongarra, J.: Implementation guide for LAPACK. Technical Report UT-CS-90-101, University of Tennessee, Computer Science Department, LAPACK Working Note 18 (1990) Anderson, E., Dongarra, J.: Implementation guide for LAPACK. Technical Report UT-CS-90-101, University of Tennessee, Computer Science Department, LAPACK Working Note 18 (1990)
6.
go back to reference Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammerling, S., McKenney, A., et al.: LAPACK Users’ Guide, vol. 9. SIAM, Philadelphia (1999)CrossRefMATH Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammerling, S., McKenney, A., et al.: LAPACK Users’ Guide, vol. 9. SIAM, Philadelphia (1999)CrossRefMATH
7.
go back to reference Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.-A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput.: Pract. Exp. 23(2), 187–198 (2011)CrossRef Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.-A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput.: Pract. Exp. 23(2), 187–198 (2011)CrossRef
8.
go back to reference Badia, R.M., Herrero, J.R., Labarta, J., Pérez, J.M., Quintana-Ortí, E.S., Quintana-Ortí, G.: Parallelizing dense and banded linear algebra libraries using SMPSs. Concurr. Comput.: Pract. Exp. 21(18), 2438–2456 (2009)CrossRef Badia, R.M., Herrero, J.R., Labarta, J., Pérez, J.M., Quintana-Ortí, E.S., Quintana-Ortí, G.: Parallelizing dense and banded linear algebra libraries using SMPSs. Concurr. Comput.: Pract. Exp. 21(18), 2438–2456 (2009)CrossRef
9.
go back to reference Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Hérault, T., Dongarra, J.J.: PaRSEC: exploiting heterogeneity to enhance scalability. Comput. Sci. Eng. 15(6), 36–45 (2013)CrossRef Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Hérault, T., Dongarra, J.J.: PaRSEC: exploiting heterogeneity to enhance scalability. Comput. Sci. Eng. 15(6), 36–45 (2013)CrossRef
10.
11.
go back to reference Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput. 35(1), 38–53 (2009)MathSciNetCrossRef Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput. 35(1), 38–53 (2009)MathSciNetCrossRef
12.
go back to reference Castaldo, A.M., Whaley, R.: Clint: acaling lapack panel operations using parallel cache assignment. In: ACM Sigplan Notices, vol. 45, pp. 223–232. ACM (2010) Castaldo, A.M., Whaley, R.: Clint: acaling lapack panel operations using parallel cache assignment. In: ACM Sigplan Notices, vol. 45, pp. 223–232. ACM (2010)
13.
go back to reference Castaldo, A.M., Whaley, R.: Clint: scaling LAPACK panel operations using parallel cache assignment. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 223–232 (2010) Castaldo, A.M., Whaley, R.: Clint: scaling LAPACK panel operations using parallel cache assignment. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 223–232 (2010)
14.
go back to reference Donfack, S., Dongarra, J., Faverge, M., Gates, M., Kurzak, J., Luszczek, P., Yamazaki, I.: A survey of recent developments in parallel implementations of Gaussian elimination. Concurr. Comput.: Pract. Exp. 27(5), 1292–1309 (2015)CrossRef Donfack, S., Dongarra, J., Faverge, M., Gates, M., Kurzak, J., Luszczek, P., Yamazaki, I.: A survey of recent developments in parallel implementations of Gaussian elimination. Concurr. Comput.: Pract. Exp. 27(5), 1292–1309 (2015)CrossRef
15.
go back to reference Dongarra, J., Kurzak, J., Luszczek, P., Yamazaki, I.: PULSAR Users’ Guide: Parallel Ultra-Light Systolic Array Runtime. Technical Report UT-EECS-14-733, EECS Department, University of Tennessee (2014) Dongarra, J., Kurzak, J., Luszczek, P., Yamazaki, I.: PULSAR Users’ Guide: Parallel Ultra-Light Systolic Array Runtime. Technical Report UT-EECS-14-733, EECS Department, University of Tennessee (2014)
16.
go back to reference Dongarra, J., Faverge, M., Ltaief, H., Luszczek, P.: Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting. Concurr. Comput.: Pract. Exp. 26(7), 1408–1431 (2014)CrossRef Dongarra, J., Faverge, M., Ltaief, H., Luszczek, P.: Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting. Concurr. Comput.: Pract. Exp. 26(7), 1408–1431 (2014)CrossRef
17.
go back to reference Dongarra, J.J., Du Croz, J., Hammarling, S., Duff, I.S.: A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. (TOMS) 16(1), 1–17 (1990)CrossRefMATH Dongarra, J.J., Du Croz, J., Hammarling, S., Duff, I.S.: A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. (TOMS) 16(1), 1–17 (1990)CrossRefMATH
18.
go back to reference Duran, A., Ayguadé, E., Badia, R.M., Labarta, J., Martinell, L., Martorell, X., Planas, J.: OMPSS: a proposal for programming heterogeneous multi-core architectures. Parallel Process. Lett. 21(02), 173–193 (2011)MathSciNetCrossRef Duran, A., Ayguadé, E., Badia, R.M., Labarta, J., Martinell, L., Martorell, X., Planas, J.: OMPSS: a proposal for programming heterogeneous multi-core architectures. Parallel Process. Lett. 21(02), 173–193 (2011)MathSciNetCrossRef
19.
go back to reference Gao, G.R., Sterling, T., Stevens, R., Hereld, M., Weirong Z.: Parallex: a study of a new parallel computation model. In: Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International, pp. 1–6. IEEE (2007) Gao, G.R., Sterling, T., Stevens, R., Hereld, M., Weirong Z.: Parallex: a study of a new parallel computation model. In: Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International, pp. 1–6. IEEE (2007)
20.
go back to reference Gustafson, J.L.: Reevaluating Amdahl’s Law. Commun. ACM 31(5), 532–533 (1988)CrossRef Gustafson, J.L.: Reevaluating Amdahl’s Law. Commun. ACM 31(5), 532–533 (1988)CrossRef
21.
go back to reference Gustavson, F., Karlsson, L., Kågström, B.: Parallel and cache-efficient in-place matrix storage format conversion. ACM Trans. Math. Softw. (TOMS) 38(3), 17 (2012)CrossRef Gustavson, F., Karlsson, L., Kågström, B.: Parallel and cache-efficient in-place matrix storage format conversion. ACM Trans. Math. Softw. (TOMS) 38(3), 17 (2012)CrossRef
22.
go back to reference Gustavson, F.G.: Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM J. Res. Dev. 41(6), 737–755 (1997)CrossRef Gustavson, F.G.: Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM J. Res. Dev. 41(6), 737–755 (1997)CrossRef
23.
go back to reference Haidar, A., Kurzak, J., Luszczek, P.: An improved parallel singular value algorithm and its implementation for multicore hardware. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 90. ACM (2013) Haidar, A., Kurzak, J., Luszczek, P.: An improved parallel singular value algorithm and its implementation for multicore hardware. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 90. ACM (2013)
24.
go back to reference Haidar, A., Ltaief, H., YarKhan, A., Dongarra, J.: Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures. Concurr. Comput.: Pract. Exp. 24(3), 305–321 (2012)CrossRef Haidar, A., Ltaief, H., YarKhan, A., Dongarra, J.: Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures. Concurr. Comput.: Pract. Exp. 24(3), 305–321 (2012)CrossRef
25.
go back to reference Kaiser, H., Brodowicz, M., Sterling, T.: Parallex an advanced parallel execution model for scaling-impaired applications. In: International Conference on Parallel Processing Workshops, 2009. ICPPW’09, pp. 394–401. IEEE (2009) Kaiser, H., Brodowicz, M., Sterling, T.: Parallex an advanced parallel execution model for scaling-impaired applications. In: International Conference on Parallel Processing Workshops, 2009. ICPPW’09, pp. 394–401. IEEE (2009)
26.
go back to reference Kale, L.V., Krishnan, S.: CHARM++: a portable concurrent object oriented system based on C++. In: Proceedings of the Eighth Annual Conference on Object-Oriented Programming Systems, Languages, and Applications, vol. 28, pp. 91–108. ACM (1993) Kale, L.V., Krishnan, S.: CHARM++: a portable concurrent object oriented system based on C++. In: Proceedings of the Eighth Annual Conference on Object-Oriented Programming Systems, Languages, and Applications, vol. 28, pp. 91–108. ACM (1993)
27.
go back to reference Kurzak, J., Buttari, A., Dongarra, J.: Solving systems of linear equations on the Cell processor using Cholesky factorization. IEEE Trans. Parallel Distrib. Syst. 19(9), 1175–1186 (2008)CrossRef Kurzak, J., Buttari, A., Dongarra, J.: Solving systems of linear equations on the Cell processor using Cholesky factorization. IEEE Trans. Parallel Distrib. Syst. 19(9), 1175–1186 (2008)CrossRef
28.
go back to reference Kurzak, J., Ltaief, H., Dongarra, J., Badia, R.M.: Scheduling dense linear algebra operations on multicore processors. Concurr. Comput.: Pract. Exp. 22(1), 15–44 (2010)CrossRef Kurzak, J., Ltaief, H., Dongarra, J., Badia, R.M.: Scheduling dense linear algebra operations on multicore processors. Concurr. Comput.: Pract. Exp. 22(1), 15–44 (2010)CrossRef
29.
go back to reference OpenMP Architecture Review Board: OpenMP Application Program Interface, version 4.5 edition (2015) OpenMP Architecture Review Board: OpenMP Application Program Interface, version 4.5 edition (2015)
30.
go back to reference Pérez, J.M., Bellens, P., Badia, R.M., Labarta, J.: CellSs: making it easier to program the Cell Broadband Engine processor. IBM J. Res. Dev. 51(5), 593–604 (2007)CrossRef Pérez, J.M., Bellens, P., Badia, R.M., Labarta, J.: CellSs: making it easier to program the Cell Broadband Engine processor. IBM J. Res. Dev. 51(5), 593–604 (2007)CrossRef
31.
go back to reference Pichon, G., Haidar, A., Faverge, M., Kurzak, J.: Divide and conquer symmetric tridiagonal eigensolver for multicore architectures. In: Proceedings of the International Parallel and Distributed Processing Symposium, pp. 51–60. IEEE (2015) Pichon, G., Haidar, A., Faverge, M., Kurzak, J.: Divide and conquer symmetric tridiagonal eigensolver for multicore architectures. In: Proceedings of the International Parallel and Distributed Processing Symposium, pp. 51–60. IEEE (2015)
32.
go back to reference Quintana, E.S., Quintana, G., Sun, X., van de Geijn, R.: A note on parallel matrix inversion. SIAM J. Sci. Comput. 22(5), 1762–1771 (2001)MathSciNetCrossRefMATH Quintana, E.S., Quintana, G., Sun, X., van de Geijn, R.: A note on parallel matrix inversion. SIAM J. Sci. Comput. 22(5), 1762–1771 (2001)MathSciNetCrossRefMATH
33.
go back to reference Quintana-Ortí, G., Quintana-Ortí, E.S., Geijn, R.A., Van Zee, F.G., Chan, E.: Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Trans. Math. Softw. (TOMS) 36(3), 14 (2009)MathSciNetCrossRef Quintana-Ortí, G., Quintana-Ortí, E.S., Geijn, R.A., Van Zee, F.G., Chan, E.: Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Trans. Math. Softw. (TOMS) 36(3), 14 (2009)MathSciNetCrossRef
34.
go back to reference Tillenius, M.: Superglue: a shared memory framework using data versioning for dependency-aware task-based parallelization. SIAM J. Sci. Comput. 37(6), C617–C642 (2015)MathSciNetCrossRefMATH Tillenius, M.: Superglue: a shared memory framework using data versioning for dependency-aware task-based parallelization. SIAM J. Sci. Comput. 37(6), C617–C642 (2015)MathSciNetCrossRefMATH
35.
go back to reference Wilde, M., Hategan, M., Wozniak, J.M., Clifford, B., Katz, D.S., Foster, I.: Swift: a language for distributed parallel scripting. Parallel Comput. 37(9), 633–652 (2011)CrossRef Wilde, M., Hategan, M., Wozniak, J.M., Clifford, B., Katz, D.S., Foster, I.: Swift: a language for distributed parallel scripting. Parallel Comput. 37(9), 633–652 (2011)CrossRef
36.
go back to reference YarKhan, A.: Dynamic Task Execution on Shared and Distributed Memory Architectures. PhD thesis, University of Tennessee (2012) YarKhan, A.: Dynamic Task Execution on Shared and Distributed Memory Architectures. PhD thesis, University of Tennessee (2012)
37.
go back to reference Zhao, Y., Hategan, M., Clifford, B., Foster, I., Von Laszewski, G., Nefedova, V., Raicu, I., Stef-Praun, T., Wilde, M.: Swift: fast, reliable, loosely coupled parallel computation. In: Services, 2007 IEEE Congress on, pp. 199–206. IEEE (2007) Zhao, Y., Hategan, M., Clifford, B., Foster, I., Von Laszewski, G., Nefedova, V., Raicu, I., Stef-Praun, T., Wilde, M.: Swift: fast, reliable, loosely coupled parallel computation. In: Services, 2007 IEEE Congress on, pp. 199–206. IEEE (2007)
Metadata
Title
Porting the PLASMA Numerical Library to the OpenMP Standard
Authors
Asim YarKhan
Jakub Kurzak
Piotr Luszczek
Jack Dongarra
Publication date
14-06-2016
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 3/2017
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-016-0441-6

Other articles of this Issue 3/2017

International Journal of Parallel Programming 3/2017 Go to the issue

Premium Partner