
08-08-2019

Source-to-Source Parallelization Compilers for Scientific Shared-Memory Multi-core and Accelerated Multiprocessing: Analysis, Pitfalls, Enhancement and Potential

Authors: Re’em Harel, Idan Mosseri, Harel Levin, Lee-or Alon, Matan Rusanovsky, Gal Oren

Published in: International Journal of Parallel Programming | Issue 1/2020


Abstract

Parallelization schemes are essential to exploit the full benefits of multi-core architectures, which have become widespread in recent years, especially for scientific applications. In shared-memory architectures, the most common parallelization API is OpenMP. However, introducing correct and optimal OpenMP parallelization into an application is not always simple, due to common shared-memory management pitfalls and architecture heterogeneity. To ease this process, many automatic parallelization compilers have been created. In this paper we focus on three source-to-source compilers, AutoPar, Par4All and Cetus, which we found most suitable for the task: we point out their strengths and weaknesses, analyze their performance, inspect their capabilities and suggest new paths for enhancement. We analyze and compare the compilers' performance over several exemplary test cases, each of which exposes different pitfalls, and suggest several new ways to overcome these pitfalls that yield excellent results in practice. Moreover, we note that all of these source-to-source parallelization compilers are confined to OpenMP 2.5, an outdated version of the API that no longer matches today's complex heterogeneous architectures. We therefore suggest a path to exploit the new features of OpenMP 4.5, which provides directives to fully utilize heterogeneous architectures, specifically those with strong collaboration between CPUs and GPGPUs, outperforming the previous results by an order of magnitude.
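To make the gap between the two API versions concrete, the following minimal C sketch (illustrative only, not taken from the paper; the dot-product kernel, function names and array size are hypothetical) contrasts the OpenMP 2.5-style worksharing directive that source-to-source compilers such as AutoPar, Par4All and Cetus can emit with an OpenMP 4.5 target construct that offloads the same loop to an accelerator such as a GPGPU:

```c
#include <stdio.h>

#define N 1000000

/* OpenMP 2.5-style parallelization: the kind of worksharing
 * directive the surveyed compilers insert. The reduction clause
 * avoids the classic shared-memory pitfall of several threads
 * accumulating into `sum` concurrently. */
double dot_cpu(const double *a, const double *b)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];
    return sum;
}

/* OpenMP 4.5-style offload: `target` moves execution to an
 * attached device (e.g. a GPGPU), the `map` clauses describe the
 * required data transfers, and `teams distribute parallel for`
 * spreads the iterations across the device's compute units. */
double dot_device(const double *a, const double *b)
{
    double sum = 0.0;
    #pragma omp target teams distribute parallel for \
        map(to: a[0:N], b[0:N]) map(tofrom: sum) reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];
    return sum;
}

int main(void)
{
    static double a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }
    printf("cpu:    %f\n", dot_cpu(a, b));
    printf("device: %f\n", dot_device(a, b));
    return 0;
}
```

If no accelerator is present, conforming OpenMP 4.5 implementations fall back to executing the target region on the host, so the sketch runs either way; the point is that none of the OpenMP 2.5-bound compilers can generate the second form.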


Footnotes
1
Source code of relevant sections can be found at: github.com/reemharel22/AutomaticParallelization_NAS
Metadata
Title
Source-to-Source Parallelization Compilers for Scientific Shared-Memory Multi-core and Accelerated Multiprocessing: Analysis, Pitfalls, Enhancement and Potential
Authors
Re’em Harel
Idan Mosseri
Harel Levin
Lee-or Alon
Matan Rusanovsky
Gal Oren
Publication date
08-08-2019
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 1/2020
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-019-00640-3
