Skip to main content
Top
Published in: International Journal of Parallel Programming 6/2018

18-09-2017

An Efficient Programming Skeleton for Clusters of Multi-Core Processors

Authors: Mina Hosseini Rad, Ahmad Patooghy, Mahdi Fazeli

Published in: International Journal of Parallel Programming | Issue 6/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

This paper proposes a divide and conquer skeleton which aids parallel system programmers by (1) reducing programming complexity, (2) shortening programming time, and (3) enhancing code efficiency. To do this, the proposed skeleton exploits three mechanisms of (1) work-stealing, and (2) communication/computation overlapping, and (3) architectural awareness in the proposed divide and conquer skeleton. Using the work-stealing mechanism, when a processing element reaches a low-load condition, the processing core fetches a new job from the waiting queue of other cores. The second mechanism uses special threads to enable the proposed skeleton to overlapping computations with communications. The third mechanism considers the architectural parameters of the host system e.g., size of L1 cache, network bandwidth, network latency to maximally match a divide and conquer problem with the proposed skeleton. To evaluate the proposed skeleton, three benchmarks of merge sort, fast Fourier transform, and standard matrix multiplication are developed by the proposed skeleton as well as customized programming. Experiments are done in both simulation and real implementation environments. The set of six codes are simulated using COTSon simulator and also implemented on 28 dual-core real system. Obtained results from simulations showed an average of 12.6% speed-up of the proposed skeleton as compared to the customized programming; obtained speed-up in real environment is 9.6%. Furthermore, programming aided by the proposed skeleton, is at least 70% faster than custom programming while this difference increases as the program volume increases.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Rauber, T., Rünger, G.: Parallel Programming: For Multicore and Cluster Systems. Springer, Berlin (2010)CrossRef Rauber, T., Rünger, G.: Parallel Programming: For Multicore and Cluster Systems. Springer, Berlin (2010)CrossRef
2.
go back to reference Bader, D.A., Pennington, R.: Cluster computing: applications. Int. J. High Perform. Comput. 15(2), 181–185 (2001)CrossRef Bader, D.A., Pennington, R.: Cluster computing: applications. Int. J. High Perform. Comput. 15(2), 181–185 (2001)CrossRef
3.
go back to reference Leyton, M., Piquer, J. M.: Skandium: multi-core programming with algorithmic skeletons. In: Proceedings of Euromicro Conference Distributed and Network-Based Processing, pp. 289–296 (2010) Leyton, M., Piquer, J. M.: Skandium: multi-core programming with algorithmic skeletons. In: Proceedings of Euromicro Conference Distributed and Network-Based Processing, pp. 289–296 (2010)
4.
go back to reference Aldinucci, M., Danelutto, M., Kilpatrick, P.: Skeletons for multi/many-core systems. In: Proceedings of International Conference on Parallel Computing (PARCO), pp. 265–272 (2009) Aldinucci, M., Danelutto, M., Kilpatrick, P.: Skeletons for multi/many-core systems. In: Proceedings of International Conference on Parallel Computing (PARCO), pp. 265–272 (2009)
5.
go back to reference Karasawa, Y., Iwasaki, H.: A parallel skeleton library for multi-core clusters. In: Proceedings of International Conference on Parallel Processing (ICPP’09), pp. 84–91 (2009) Karasawa, Y., Iwasaki, H.: A parallel skeleton library for multi-core clusters. In: Proceedings of International Conference on Parallel Processing (ICPP’09), pp. 84–91 (2009)
6.
go back to reference Nitzberg, B., Lo, V.: Distributed shared memory: a survey of issues and algorithms. Computer 24(8), 52–60 (1991)CrossRef Nitzberg, B., Lo, V.: Distributed shared memory: a survey of issues and algorithms. Computer 24(8), 52–60 (1991)CrossRef
7.
go back to reference Linderman, M.D., Collins, J.D., Wang, H., Meng, T.H.: Merge: a programming model for heterogeneous multi-core systems. ACM SIGOPS Oper. Syst. Rev. 42(2), 287–296 (2008)CrossRef Linderman, M.D., Collins, J.D., Wang, H., Meng, T.H.: Merge: a programming model for heterogeneous multi-core systems. ACM SIGOPS Oper. Syst. Rev. 42(2), 287–296 (2008)CrossRef
8.
go back to reference Kwiatkowski, J., Pawlik, M., Konieczny, D.: Parallel program execution anomalies. In: Proceedings of 2nd Workshop on Large Scale Computations on Grids (LaSCoG) (2006) Kwiatkowski, J., Pawlik, M., Konieczny, D.: Parallel program execution anomalies. In: Proceedings of 2nd Workshop on Large Scale Computations on Grids (LaSCoG) (2006)
9.
go back to reference Danelutto, M.: Efficient support for skeletons on workstation clusters. Parallel Process. Lett. 11(1), 41–56 (2001)MathSciNetCrossRef Danelutto, M.: Efficient support for skeletons on workstation clusters. Parallel Process. Lett. 11(1), 41–56 (2001)MathSciNetCrossRef
10.
go back to reference Falcou, J., Sérot, J., Chateau, T., Lapresté, J.T.: Quaff: efficient c++ design for parallel skeletons. Parallel Comput. 32(7), 604–615 (2006)CrossRef Falcou, J., Sérot, J., Chateau, T., Lapresté, J.T.: Quaff: efficient c++ design for parallel skeletons. Parallel Comput. 32(7), 604–615 (2006)CrossRef
11.
go back to reference Cole, M.I.: Algorithmic Skeletons: Structured Management of Parallel Computation. Pitman, London (1989)MATH Cole, M.I.: Algorithmic Skeletons: Structured Management of Parallel Computation. Pitman, London (1989)MATH
12.
go back to reference Enmyren, J., Kessler, C. W.: SkePU: a multi-backend skeleton programming library for multi-GPU systems. In: Proceedings of the Fourth International Workshop on High-Level Parallel Programming and Applications, pp. 5–14 (2010) Enmyren, J., Kessler, C. W.: SkePU: a multi-backend skeleton programming library for multi-GPU systems. In: Proceedings of the Fourth International Workshop on High-Level Parallel Programming and Applications, pp. 5–14 (2010)
13.
go back to reference Müller-Funk, U., Thonemann, U.W., Vossen, G.: The Münster skeleton library muesli—a comprehensive overview. In: European Research Center for Information Systems, Münster, Report 7 (2009) Müller-Funk, U., Thonemann, U.W., Vossen, G.: The Münster skeleton library muesli—a comprehensive overview. In: European Research Center for Information Systems, Münster, Report 7 (2009)
15.
go back to reference González-Vélez, H., Leyton, M.: A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers. Softw. Pract. Exp. 40(12), 1135–1160 (2010)CrossRef González-Vélez, H., Leyton, M.: A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers. Softw. Pract. Exp. 40(12), 1135–1160 (2010)CrossRef
16.
go back to reference Lea, D.: A Java fork/join framework. In: Proceedings of the ACM 2000 Conference on Java Grande, pp. 36–43 (2000) Lea, D.: A Java fork/join framework. In: Proceedings of the ACM 2000 Conference on Java Grande, pp. 36–43 (2000)
17.
go back to reference Tanno, H., Iwasaki, H.: Parallel skeletons for variable-length lists in sketo skeleton library. In: 15th International Euro-Par Conference on Parallel Processing, pp. 666–677 Tanno, H., Iwasaki, H.: Parallel skeletons for variable-length lists in sketo skeleton library. In: 15th International Euro-Par Conference on Parallel Processing, pp. 666–677
18.
go back to reference Aldinucci, M., Danelutto, M., Kilpatrick, P., Meneghin, M., Torquati, M.: Accelerating code on multi-cores with fastflow. In: 17th international Conference on Parallel processing, pp. 1070–181. Springer-Verlag, Berlin, Heidelberg Aldinucci, M., Danelutto, M., Kilpatrick, P., Meneghin, M., Torquati, M.: Accelerating code on multi-cores with fastflow. In: 17th international Conference on Parallel processing, pp. 1070–181. Springer-Verlag, Berlin, Heidelberg
19.
go back to reference Blelloch, G.E., Chowdhury, R.A., Gibbons, P.B., Ramachandran, V., Chen, S., Kozuch, M.: Provably good multicore cache performance for divide-and-conquer algorithms. In: Proceedings of the Nineteenth ACM-SIAM Symposium on Discrete Algorithms, pp. 501–510 (2008) Blelloch, G.E., Chowdhury, R.A., Gibbons, P.B., Ramachandran, V., Chen, S., Kozuch, M.: Provably good multicore cache performance for divide-and-conquer algorithms. In: Proceedings of the Nineteenth ACM-SIAM Symposium on Discrete Algorithms, pp. 501–510 (2008)
20.
go back to reference Leiserson, C.E.: The Cilk++ concurrency platform. J. Supercomput. 51(3), 244–257 (2010)CrossRef Leiserson, C.E.: The Cilk++ concurrency platform. J. Supercomput. 51(3), 244–257 (2010)CrossRef
21.
go back to reference Ravichandran, K., Lee, S., Pande, S.: Work stealing for multi-core HPC clusters. In: Proceedings of Euro-Par 2011 Parallel Processing, pp. 205–217. Springer Berlin Heidelberg (2011)CrossRef Ravichandran, K., Lee, S., Pande, S.: Work stealing for multi-core HPC clusters. In: Proceedings of Euro-Par 2011 Parallel Processing, pp. 205–217. Springer Berlin Heidelberg (2011)CrossRef
22.
go back to reference Argollo, E., Falcón, A., Faraboschi, P., Monchiero, M., Ortega, D.: COTSon: infrastructure for full system simulation. ACM SIGOPS Oper. Syst. Rev. 43(1), 52–61 (2009)CrossRef Argollo, E., Falcón, A., Faraboschi, P., Monchiero, M., Ortega, D.: COTSon: infrastructure for full system simulation. ACM SIGOPS Oper. Syst. Rev. 43(1), 52–61 (2009)CrossRef
23.
go back to reference Quinn, M.J., Hatcher, P.J.: On the utility of communication computation overlap in data-parallel programs. J. Parallel Distrib. Comput. 33(2), 197–204 (1996)CrossRef Quinn, M.J., Hatcher, P.J.: On the utility of communication computation overlap in data-parallel programs. J. Parallel Distrib. Comput. 33(2), 197–204 (1996)CrossRef
24.
go back to reference Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19(90), 297–301 (1965)MathSciNetCrossRef Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19(90), 297–301 (1965)MathSciNetCrossRef
25.
go back to reference Manwade, K.B.: Analysis of parallel merge sort algorithm. Analysis 1(19), 66–69 (2010) Manwade, K.B.: Analysis of parallel merge sort algorithm. Analysis 1(19), 66–69 (2010)
26.
go back to reference Ryckbosch, F., Polfliet, S., Eeckhout, L.: Fast, accurate, and validated full-system software simulation of x86 hardware. Micro IEEE 30(6), 46–56 (2010)CrossRef Ryckbosch, F., Polfliet, S., Eeckhout, L.: Fast, accurate, and validated full-system software simulation of x86 hardware. Micro IEEE 30(6), 46–56 (2010)CrossRef
Metadata
Title
An Efficient Programming Skeleton for Clusters of Multi-Core Processors
Authors
Mina Hosseini Rad
Ahmad Patooghy
Mahdi Fazeli
Publication date
18-09-2017
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 6/2018
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-017-0517-y

Other articles of this Issue 6/2018

International Journal of Parallel Programming 6/2018 Go to the issue

Premium Partner