Combining Data and Computation Distribution Directives for Hybrid Parallel Programming: A Transformation System

Authors: Rachid Habel, Frédérique Silber-Chaussumier, François Irigoin, Elisabeth Brunet, François Trahay

Published in: International Journal of Parallel Programming, Issue 6/2016 (01-12-2016)

Abstract

This paper describes dSTEP, a directive-based programming model for hybrid shared- and distributed-memory machines. The originality of our work lies in the definition and implementation of a unified high-level programming model addressing both data and computation distribution, providing particularly fine control of the computation. The goal is to improve programmer productivity while delivering good performance in terms of execution time and memory usage. We define a generic compilation scheme for computation mapping and communication generation, and implement it in a source-to-source compiler together with a runtime library. We provide a series of optimizations that improve the performance of the generated code, with a special focus on reducing communication time. We evaluate our solution on several scientific kernels as well as on the more challenging NAS BT benchmark, and compare our results with the hand-written Fortran MPI and UPC implementations. The results show, first, that the dSTEP directives make the non-trivial parallel execution of the NAS BT benchmark explicit; second, our generated MPI+OpenMP BT program achieves an 83.35 speedup over the original NAS OpenMP C benchmark on a hybrid cluster of 64 quad-core nodes (256 cores). Overall, our solution dramatically reduces the programming effort while providing good execution-time and memory-usage performance. This programming model is suitable for a large variety of machines, such as multi-core and accelerator clusters.
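The compiler described in the abstract emits hybrid MPI+OpenMP code from high-level distribution directives. As a minimal hand-written sketch of that target pattern, and not the paper's actual generated code, the following C kernel block-distributes the rows of a 2D array across MPI processes, exchanges halo rows before computing, and uses OpenMP for intra-node loop parallelism. The array size, the stencil, and the assumption that the row count divides evenly among processes are illustrative choices of ours.

```c
/* Illustrative sketch only: a hand-written hybrid MPI+OpenMP kernel of the
 * kind a distribution-directive compiler like dSTEP automates.
 * Not taken from the paper's generated code. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1024  /* global row count; assumed divisible by the process count */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Block distribution: each process owns N/size rows, plus one halo
       row above and one below for the 5-point stencil. */
    int local = N / size;
    double (*a)[N] = malloc((size_t)(local + 2) * sizeof *a);
    double (*b)[N] = malloc((size_t)(local + 2) * sizeof *b);
    for (int i = 0; i < local + 2; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = (double)(rank * local + i + j);

    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Halo exchange: the communication a distribution-aware compiler
       must generate before the computation can proceed. */
    MPI_Sendrecv(a[1],         N, MPI_DOUBLE, up,   0,
                 a[local + 1], N, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(a[local],     N, MPI_DOUBLE, down, 1,
                 a[0],         N, MPI_DOUBLE, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Shared-memory parallelism inside the node: OpenMP over owned rows. */
    #pragma omp parallel for
    for (int i = 1; i <= local; i++)
        for (int j = 1; j < N - 1; j++)
            b[i][j] = 0.25 * (a[i - 1][j] + a[i + 1][j]
                            + a[i][j - 1] + a[i][j + 1]);

    if (rank == 0)
        printf("b[1][1] = %f\n", b[1][1]);

    free(a);
    free(b);
    MPI_Finalize();
    return 0;
}
```

Build with an MPI compiler wrapper and OpenMP enabled (e.g., mpicc -fopenmp) and launch with mpirun. The point of the dSTEP approach is precisely that the block distribution, halo sizing, and communication calls sketched above are derived by the compiler from directives instead of being written and maintained by hand.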


Metadata
Title
Combining Data and Computation Distribution Directives for Hybrid Parallel Programming: A Transformation System
Authors
Rachid Habel
Frédérique Silber-Chaussumier
François Irigoin
Elisabeth Brunet
François Trahay
Publication date
01-12-2016
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 6/2016
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-016-0428-3
