Top

Cluster Computing

Published in:

14-02-2017

A general and efficient divide-and-conquer algorithm framework for multi-core clusters

Authors: Carlos H. González, Basilio B. Fraguela

Published in: Cluster Computing | Issue 3/2017

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Divide-and-conquer is one of the most important patterns of parallelism, being applicable to a large variety of problems. In addition, the most powerful parallel systems available nowadays are computer clusters composed of distributed-memory nodes that contain an increasing number of cores that share a common memory. The optimal exploitation of these systems often requires resorting to a hybrid model that mimics the underlying hardware by combining a distributed and a shared memory parallel programming model. This results in longer development times and increased maintenance costs. In this paper we present a very general skeleton library that allows to parallelize any divide-and-conquer problem in hybrid distributed-shared memory systems with little effort while providing much flexibility and good performance. Our proposal combines a message-passing paradigm at the process level and a threaded model inside each process, hiding the related complexity from the user. The evaluation shows that this skeleton provides performance comparable, and often better than that of manually optimized codes while requiring considerably less effort when parallelizing applications on multi-core clusters.

previous article Parallel GPU-based collision detection of irregular vessel wall for massive particles

next article A low-latency storage stack for fast storage devices

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

One might think that in 2016 every major compiler distribution should support OpenMP, but as a representative example, during the development of this work we found that the compilers of the standard development environment for the current version of Mac OS X do not support OpenMP.

Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA (1974)MATH

Aldinucci, M., Danelutto, M., Teti, P.: An advanced environment supporting structured parallel programming in Java. Future Gener. Comput. Syst. 19(5), 611–626 (2003)CrossRef

Barnes, J., Hut, P.: A hierarchical O (N log N) force-calculation algorithm. Nature 324(4), 446–559 (1986)CrossRef

Bientinesi, P., Gunnels, J.A., Myers, M.E., Quintana- Ortí, E.S., van de Geijn, R.A.: The science of deriving dense linear algebra algorithms. ACM Trans. Math. Softw. 31(1), 1–26 (2005)

Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: an efficient multithreaded runtime system. J Parallel Distrib. Comput. 37(1), 55–69 (1996)CrossRef

Boost.org. Boost C++ libraries. http://boost.org (2016). Accessed 10 Dec 2016

Ciechanowicz, P., Kuchen, H.: Enhancing Muesli’s data parallel skeletons for multi-core computer architectures. In 12th IEEE International Conference on High Performance Computing and Communications, (HPCC 2010), pp. 108–113. Los Alamitos, CA (2010)

Cole, M.: Algorithmic Skeletons: Structured Management of Parallel Computation. MIT Press, Cambridge, MA (1989)MATH

Cole, M.: Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming. Parallel Comput. 30(3), 389–406 (2004)CrossRef

10.

Danelutto, M., De Matteis, T., Mencagli, G., Torquati, M.: A divide-and-conquer parallel pattern implementation for multicores. In Proceedings 3rd International Workshop on Software Engineering for Parallel Systems, SEPS 2016, pp. 10–19. New York, NY (2016). ACM

11.

Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef

12.

Denis, A.: pioman: A pthread-based multithreaded communication engine. In Proceedings of 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2015), pp. 155–162. Los Alamitos, CA (2015)

13.

Falcou, J., Sérot, J., Chateau, T., Lapresté, J.-T.: Quaff: efficient C++ design for parallel skeletons. Parallel Comput. 32(7–8), 604–615 (2006)CrossRef

14.

Fleming, P.J., Wallace, J.J.: How not to lie with statistics: the correct way to summarize benchmark results. Commun. ACM 29(3), 218–221 (1986)CrossRef

15.

Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In Proceeding of 40th Annual Symposium on Foundations of Computer Science, FOCS ’99, pp. 285–297. Washington, DC, USA. IEEE Computer Society (1999)

16.

González, C.H., Fraguela, B.B.: A generic algorithm template for divide-and-conquer in multicore systems. In Proceedings of 12th IEEE International Conference on High Performance Computing and Communications, (HPCC 2010), pp. 79–88. Los Alamitos, CA (2010)

17.

González, C.H., Fraguela, B.B.: A framework for argument-based task synchronization with automatic detection of dependencies. Parallel Comput. 39(9), 475–489 (2013)CrossRef

18.

González, C.H., Fraguela, B.B.: An algorithm template for domain-based parallel irregular algorithms. Int. J. Parallel Program. 42(6), 948–967 (2014)CrossRef

19.

Gorlatch, S., Cole, M.: Parallel skeletons. Encyclopedia of Parallel Computing, pp. 1417–1422. Springer, New York (2011)

20.

Gregor, D., Troyer, M.: Boost.MPI. http://boost.cowic.de/rc/pdf/mpi.pdf (2007)

21.

Halstead, M.H.: Elements of Software Science. Elsevier, New York, NY (1977)MATH

22.

Hijma, P., Jacobs, C.J.H., van Nieuwpoort, R.V., Bal, H.E.: Cashmere: Heterogeneous many-core computing. In 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2015), pp. 135–145 (2015)

23.

Horowitz, E., Zorat, A.: Divide-and-conquer for parallel processing. IEEE Trans. Comput. 32(6), 582–585 (1983)CrossRefMATH

24.

Intel\(\textregistered \). Cilk\(^{\text{TM}}\) Plus. https://www.cilkplus.org (2016). Accessed 10 Dec 2016

25.

Karasawa, Y., Iwasaki, H.: A parallel skeleton library for multi-core clusters. In Proceedings of 2009 International Conference on Parallel Processing (ICPP’09), pp. 84–91. Los Alamitos, CA (2009)

26.

Kawakatsu, T., Kinoshita, A., Takasu, A., Adachi, J.: Divide-and-Conquer Parallelism for Learning Mixture Models, pp. 23–47. Springer, Berlin (2016)

27.

Kuchen, H.: A skeleton library. Euro-Par 2002 Parallel Processing. Volume 2400 of Lecture Notes in Computer Science, pp. 620–629. Springer, Berlin (2002)

28.

Kulkarni, M., Burtscher, M., Cascaval, C., Pingali, K.: Lonestar: A suite of parallel irregular programs. In 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp. 65–76 (2009)

29.

Leyton, M., Piquer, J. M.: Skandium: Multi-core programming with algorithmic skeletons. In Proceedings of 18th Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP 2010), pp. 289–296. Los Alamitos, CA (2010)

30.

Lima, J.V.F., Broquedis, F., Gautier, T., Raffin, B.: Preliminary experiments with xKaapi on Intel Xeon Phi coprocessor. In Proceedings of 25th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2013), pp. 105–112. Los Alamitos, CA (2013)

31.

Mallón, D.A., Taboada, G.L., Teijeiro, C., Touriño, J., Fraguela, B.B., Gómez-Tato, A., Doallo, R., Carlos Mouriño, J.: Performance evaluation of MPI, UPC and OpenMP on multicore architectures. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’ Group Meeting, pp. 174–184. Springer, Berlin (2009)

32.

Mattson, T., Sanders, B., Massingill, B.: Patterns for Parallel Programming. Addison-Wesley Professional, Boston, MA (2004)MATH

33.

McCabe, T.J.: A complexity measure. IEEE Trans. Softw. Eng. 2, 308–320 (1976)MathSciNetCrossRefMATH

34.

Nakatsukasa, Y., Higham, N .J.: Stable and efficient spectral divide and conquer algorithms for the symmetric eigenvalue decomposition and the SVD. SIAM J. Sci. Comput. 35(3), A1325–A1349 (2013)MathSciNetCrossRefMATH

35.

National Aeronautics and Space Administration. NAS Parallel Benchmarks. http://www.nas.nasa.gov/Software/NPB/ (2010). Accessed 10 Dec 2016

36.

Olivier, S.L., Prins, J.F.: Comparison of OpenMP 3.0 and other task parallel frameworks on unbalanced task graphs. Int. J. Parallel Program. 38(5–6), 341–360 (2010)CrossRefMATH

37.

Reinders, J.: Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O’Reilly, Sebastopol, CA (2007)

38.

Rogers, A., Carlisle, M .C., Reppy, J .H., Hendren, L .J.: Supporting dynamic data structures on distributed-memory machines. ACM Trans. Program. Lang. Syst. 17(2), 233–263 (1995)CrossRef

39.

Tang, G., Yang, W., Li, K., Ye, Y., Xiao, G., Li, K.: An iteration-based hybrid parallel algorithm for tridiagonal systems of equations on multi-core architectures. Concurr. Comput. Pract. Exp. 27(17), 5076–5095 (2015)CrossRef

40.

Teijeiro, C., Taboada, G.L., Touriño, J., Fraguela, B.B., Doallo, R., Mallón, D.A., Gómez, A., Mouriño, J.C., Wibecan, B.: Evaluation of UPC programmability using classroom studies. In Proceedings of Third Conference on Partitioned Global Address Space Programing Models, PGAS ’09, pp. 10:1–10:7, New York, NY (2009)

41.

Tejedor, E., Farreras, M., Grove, D., Badia, R.M., Almasi, G., Labarta, J.: A high-productivity task-based programming model for clusters. Concurr. Comput. Pract. Exp. 24(18), 2421–2448 (2012)CrossRef

42.

Tousimojarad, A., Vanderbauwhede, W.: Number of tasks, not threads, is key. In Proceedings of 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2015), pp. 128–136. Los Alamitos, CA (2015)

43.

Van Nieuwpoort, R.V., Wrzesińska, G., Jacobs, C.J.H., Bal, H.E.: Satin: A high-level and efficient grid programming model. ACM Trans. Program. Lang. Syst. 32(3), 9 (2010)

44.

Walter, J., Koch, M.: Boost basic linear algebra library (uBLAS). http://www.boost.org/libs/numeric/ublas/ (2002). Accessed 7 Dec 2016

45.

White, T: Hadoop: The Definitive Guide. O’Reilly Media, Inc., 1st edition (2009)

46.

Yelick, K., Bonachea, D., Chen, W.-Y., Colella, P., Datta, K., Duell, J., Graham, S.L., Hargrove, P., Hilfinger, P., Husbands, P., Iancu, C., Kamil, A., Nishtala, R., Su, J., Welcome, M., Wen, T.: Productivity and performance using partitioned global address space languages. In Proceedings of 2007 International Workshop on Parallel Symbolic Computation, PASCO ’07, pp. 24–32. New York, NY (2007)

47.

Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: Cluster computing with working sets. In Proceedings of 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud’10, pp. 10–10, Berkeley, CA. USENIX Association (2010)

48.

Zang, W., Zhang, P., Zhou, C., Guo, L.: Locating multiple sources in social networks under the SIR model: a divide-and-conquer approach. J. Comput. Sci. 10, 278–287 (2015)MathSciNetCrossRef

49.

Zhang, Y., Duchi, J.C., Wainwright, M.J.: Divide and conquer kernel ridge regression. In 26th Annual Conference on Learning Theory (COLT 2013), pp. 592–617 (2013)

Title: A general and efficient divide-and-conquer algorithm framework for multi-core clusters
Authors: Carlos H. González
Basilio B. Fraguela
Publication date: 14-02-2017
Publisher: Springer US
Published in: Cluster Computing / Issue 3/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-017-0766-y

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 3/2017

New directly revocable attribute-based encryption scheme and its application in cloud storage environment

Self-organized dynamic provisioning for big data

Cloud-based learning system for answer ranking

Spectral-spatial multi-feature classification of remote sensing big data based on a random forest classifier for land cover mapping

Two level parallelism and I/O reduction in genome comparisons

VNF-EQ: dynamic placement of virtual network functions for energy efficiency and QoS guarantee in NFV

Premium Partner