Published in: International Journal of Parallel Programming 5-6/2019

07-12-2018

Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime

Authors: Brad Peterson, Alan Humphrey, Dan Sunderland, James Sutherland, Tony Saad, Harish Dasari, Martin Berzins


Abstract

The Uintah computational framework is used for the parallel solution of partial differential equations on adaptive mesh refinement grids using modern supercomputers. Uintah is structured with an application layer and a separate runtime system. Uintah is based on a distributed directed acyclic graph of computational tasks, with a task scheduler that efficiently schedules and executes these tasks on both CPU cores and on-node accelerators. The runtime system identifies task dependencies, creates a task graph prior to the execution of these tasks, automatically generates MPI message tags, and automatically performs halo transfers for simulation variables. Automating halo transfers in a heterogeneous environment poses significant challenges when tasks compute within a few milliseconds, as runtime overhead affects wall time execution, or when simulation variables require large halos spanning most or all of the computational domain, as task dependencies become expensive to process. These challenges are magnified at production scale when application developers require each compute node to perform thousands of different halo transfers among thousands of simulation variables. The principal contributions of this work are to (1) identify and address inefficiencies that arise when mapping tasks onto the GPU in the presence of automated halo transfers, (2) implement new schemes to reduce runtime system overhead, (3) minimize application developer involvement with the runtime, and (4) show overhead reduction results from these improvements.
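To make the term "halo transfer" concrete, the following is a minimal single-process sketch in Python, assuming a 1D domain split into equal patches that each carry `halo` ghost cells on both sides. It is not Uintah's implementation: the function names `make_patches` and `exchange_halos` are hypothetical, and the MPI messaging, GPU memory management, and task graph described in the abstract are deliberately left out.

```python
def make_patches(values, num_patches, halo):
    """Split a 1D field into equal patches, padding each with `halo` ghost cells.

    Hypothetical helper for illustration; assumes len(values) divides evenly.
    """
    size = len(values) // num_patches
    patches = []
    for i in range(num_patches):
        interior = [float(v) for v in values[i * size:(i + 1) * size]]
        patches.append([0.0] * halo + interior + [0.0] * halo)
    return patches


def exchange_halos(patches, halo):
    """Fill each patch's ghost cells from the interior cells of its neighbors.

    Only ghost regions are written and only interior cells are read, so the
    in-place update is safe as long as each interior has at least `halo` cells.
    """
    for i, patch in enumerate(patches):
        if i > 0:
            left = patches[i - 1]
            # Last `halo` interior cells of the left neighbor.
            patch[:halo] = left[-2 * halo:-halo]
        if i < len(patches) - 1:
            right = patches[i + 1]
            # First `halo` interior cells of the right neighbor.
            patch[-halo:] = right[halo:2 * halo]
    return patches


if __name__ == "__main__":
    field = list(range(8))
    patches = exchange_halos(make_patches(field, num_patches=2, halo=1), halo=1)
    for p in patches:
        print(p)
```

In a distributed runtime each of these copies would instead become a message or a host-device transfer, which is where the per-transfer overhead discussed in the abstract arises.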


Metadata
Title
Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime
Authors
Brad Peterson
Alan Humphrey
Dan Sunderland
James Sutherland
Tony Saad
Harish Dasari
Martin Berzins
Publication date
07-12-2018
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 5-6/2019
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-018-0619-1
