Published in: International Journal of Parallel Programming

Publication date: 18-09-2017

An Efficient Programming Skeleton for Clusters of Multi-Core Processors

Authors: Mina Hosseini Rad, Ahmad Patooghy, Mahdi Fazeli

Published in: International Journal of Parallel Programming | Issue 6/2018

Abstract

This paper proposes a divide-and-conquer skeleton that aids parallel-system programmers by (1) reducing programming complexity, (2) shortening programming time, and (3) enhancing code efficiency. To do this, the skeleton exploits three mechanisms: (1) work-stealing, (2) communication/computation overlapping, and (3) architectural awareness. With work-stealing, when a processing core reaches a low-load condition, it fetches a new job from the waiting queue of another core. The second mechanism uses special threads so that the skeleton can overlap computation with communication. The third mechanism takes the architectural parameters of the host system (e.g., size of the L1 cache, network bandwidth, and network latency) into account to match a divide-and-conquer problem to the skeleton as closely as possible. To evaluate the skeleton, three benchmarks (merge sort, fast Fourier transform, and standard matrix multiplication) are developed both with the proposed skeleton and with customized programming. Experiments are carried out in both simulation and real implementation environments: the resulting six codes are simulated using the COTSon simulator and also implemented on a real system of 28 dual-core processors. Simulation results show an average speed-up of 12.6% for the proposed skeleton over customized programming; the speed-up obtained in the real environment is 9.6%. Furthermore, programming with the proposed skeleton is at least 70% faster than custom programming, and this difference grows as the program volume increases.
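The idea of a divide-and-conquer skeleton can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `depth` and `workers` parameters, and the cache figures (`L1_CACHE_BYTES`, `ITEM_BYTES`) are assumptions made for illustration. The sketch shows only the separation between a reusable skeleton and four problem-specific functions, plus a cutoff derived from an assumed L1 cache size; it does not implement work-stealing or communication/computation overlapping.

```python
import heapq
from concurrent.futures import Future, ThreadPoolExecutor

# Assumed values for illustration only; they are not taken from the paper.
L1_CACHE_BYTES = 32 * 1024
ITEM_BYTES = 8
CUTOFF = L1_CACHE_BYTES // ITEM_BYTES  # subproblems this small are solved directly


def divide_and_conquer(problem, is_base, solve_base, divide, combine):
    """Sequential skeleton: the user supplies only the four problem-specific functions."""
    if is_base(problem):
        return solve_base(problem)
    parts = divide(problem)
    return combine(
        [divide_and_conquer(p, is_base, solve_base, divide, combine) for p in parts]
    )


def parallel_divide_and_conquer(problem, is_base, solve_base, divide, combine,
                                depth=2, workers=4):
    """Split the top `depth` levels into tasks and solve each task in a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:

        def spawn(p, d):
            # Leaves of the task tree become futures; inner nodes are plain lists.
            if d == 0 or is_base(p):
                return pool.submit(
                    divide_and_conquer, p, is_base, solve_base, divide, combine
                )
            return [spawn(q, d - 1) for q in divide(p)]

        def collect(node):
            # Only the calling thread waits on futures, so pool workers never
            # block waiting for one another.
            if isinstance(node, Future):
                return node.result()
            return combine([collect(child) for child in node])

        return collect(spawn(problem, depth))


def merge_sort(values, parallel=False):
    """Merge sort expressed through the skeleton."""
    run = parallel_divide_and_conquer if parallel else divide_and_conquer
    return run(
        list(values),
        is_base=lambda xs: len(xs) <= CUTOFF,
        solve_base=sorted,
        divide=lambda xs: [xs[: len(xs) // 2], xs[len(xs) // 2:]],
        combine=lambda parts: list(heapq.merge(*parts)),
    )
```

Calling `merge_sort(data, parallel=True)` runs the same four functions through the task-based variant. In CPython the thread pool mainly illustrates the structure, since pure-Python computation is serialized by the global interpreter lock; a process pool or a native implementation would be needed to observe real speed-up.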

Title: An Efficient Programming Skeleton for Clusters of Multi-Core Processors
Authors: Mina Hosseini Rad, Ahmad Patooghy, Mahdi Fazeli
Publication date: 18-09-2017
Publisher: Springer US
Published in: International Journal of Parallel Programming / Issue 6/2018
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-017-0517-y
