Skip to main content
Top
Published in: Cluster Computing 3/2017

14-02-2017

A general and efficient divide-and-conquer algorithm framework for multi-core clusters

Authors: Carlos H. González, Basilio B. Fraguela

Published in: Cluster Computing | Issue 3/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Divide-and-conquer is one of the most important patterns of parallelism, being applicable to a large variety of problems. In addition, the most powerful parallel systems available nowadays are computer clusters composed of distributed-memory nodes that contain an increasing number of cores that share a common memory. The optimal exploitation of these systems often requires resorting to a hybrid model that mimics the underlying hardware by combining a distributed and a shared memory parallel programming model. This results in longer development times and increased maintenance costs. In this paper we present a very general skeleton library that allows to parallelize any divide-and-conquer problem in hybrid distributed-shared memory systems with little effort while providing much flexibility and good performance. Our proposal combines a message-passing paradigm at the process level and a threaded model inside each process, hiding the related complexity from the user. The evaluation shows that this skeleton provides performance comparable, and often better than that of manually optimized codes while requiring considerably less effort when parallelizing applications on multi-core clusters.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
One might think that in 2016 every major compiler distribution should support OpenMP, but as a representative example, during the development of this work we found that the compilers of the standard development environment for the current version of Mac OS X do not support OpenMP.
 
Literature
1.
go back to reference Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA (1974)MATH Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA (1974)MATH
2.
go back to reference Aldinucci, M., Danelutto, M., Teti, P.: An advanced environment supporting structured parallel programming in Java. Future Gener. Comput. Syst. 19(5), 611–626 (2003)CrossRef Aldinucci, M., Danelutto, M., Teti, P.: An advanced environment supporting structured parallel programming in Java. Future Gener. Comput. Syst. 19(5), 611–626 (2003)CrossRef
3.
go back to reference Barnes, J., Hut, P.: A hierarchical O (N log N) force-calculation algorithm. Nature 324(4), 446–559 (1986)CrossRef Barnes, J., Hut, P.: A hierarchical O (N log N) force-calculation algorithm. Nature 324(4), 446–559 (1986)CrossRef
4.
go back to reference Bientinesi, P., Gunnels, J.A., Myers, M.E., Quintana- Ortí, E.S., van de Geijn, R.A.: The science of deriving dense linear algebra algorithms. ACM Trans. Math. Softw. 31(1), 1–26 (2005) Bientinesi, P., Gunnels, J.A., Myers, M.E., Quintana- Ortí, E.S., van de Geijn, R.A.: The science of deriving dense linear algebra algorithms. ACM Trans. Math. Softw. 31(1), 1–26 (2005)
5.
go back to reference Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: an efficient multithreaded runtime system. J Parallel Distrib. Comput. 37(1), 55–69 (1996)CrossRef Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: an efficient multithreaded runtime system. J Parallel Distrib. Comput. 37(1), 55–69 (1996)CrossRef
7.
go back to reference Ciechanowicz, P., Kuchen, H.: Enhancing Muesli’s data parallel skeletons for multi-core computer architectures. In 12th IEEE International Conference on High Performance Computing and Communications, (HPCC 2010), pp. 108–113. Los Alamitos, CA (2010) Ciechanowicz, P., Kuchen, H.: Enhancing Muesli’s data parallel skeletons for multi-core computer architectures. In 12th IEEE International Conference on High Performance Computing and Communications, (HPCC 2010), pp. 108–113. Los Alamitos, CA (2010)
8.
go back to reference Cole, M.: Algorithmic Skeletons: Structured Management of Parallel Computation. MIT Press, Cambridge, MA (1989)MATH Cole, M.: Algorithmic Skeletons: Structured Management of Parallel Computation. MIT Press, Cambridge, MA (1989)MATH
9.
go back to reference Cole, M.: Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming. Parallel Comput. 30(3), 389–406 (2004)CrossRef Cole, M.: Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming. Parallel Comput. 30(3), 389–406 (2004)CrossRef
10.
go back to reference Danelutto, M., De Matteis, T., Mencagli, G., Torquati, M.: A divide-and-conquer parallel pattern implementation for multicores. In Proceedings 3rd International Workshop on Software Engineering for Parallel Systems, SEPS 2016, pp. 10–19. New York, NY (2016). ACM Danelutto, M., De Matteis, T., Mencagli, G., Torquati, M.: A divide-and-conquer parallel pattern implementation for multicores. In Proceedings 3rd International Workshop on Software Engineering for Parallel Systems, SEPS 2016, pp. 10–19. New York, NY (2016). ACM
11.
go back to reference Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef
12.
go back to reference Denis, A.: pioman: A pthread-based multithreaded communication engine. In Proceedings of 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2015), pp. 155–162. Los Alamitos, CA (2015) Denis, A.: pioman: A pthread-based multithreaded communication engine. In Proceedings of 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2015), pp. 155–162. Los Alamitos, CA (2015)
13.
go back to reference Falcou, J., Sérot, J., Chateau, T., Lapresté, J.-T.: Quaff: efficient C++ design for parallel skeletons. Parallel Comput. 32(7–8), 604–615 (2006)CrossRef Falcou, J., Sérot, J., Chateau, T., Lapresté, J.-T.: Quaff: efficient C++ design for parallel skeletons. Parallel Comput. 32(7–8), 604–615 (2006)CrossRef
14.
go back to reference Fleming, P.J., Wallace, J.J.: How not to lie with statistics: the correct way to summarize benchmark results. Commun. ACM 29(3), 218–221 (1986)CrossRef Fleming, P.J., Wallace, J.J.: How not to lie with statistics: the correct way to summarize benchmark results. Commun. ACM 29(3), 218–221 (1986)CrossRef
15.
go back to reference Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In Proceeding of 40th Annual Symposium on Foundations of Computer Science, FOCS ’99, pp. 285–297. Washington, DC, USA. IEEE Computer Society (1999) Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In Proceeding of 40th Annual Symposium on Foundations of Computer Science, FOCS ’99, pp. 285–297. Washington, DC, USA. IEEE Computer Society (1999)
16.
go back to reference González, C.H., Fraguela, B.B.: A generic algorithm template for divide-and-conquer in multicore systems. In Proceedings of 12th IEEE International Conference on High Performance Computing and Communications, (HPCC 2010), pp. 79–88. Los Alamitos, CA (2010) González, C.H., Fraguela, B.B.: A generic algorithm template for divide-and-conquer in multicore systems. In Proceedings of 12th IEEE International Conference on High Performance Computing and Communications, (HPCC 2010), pp. 79–88. Los Alamitos, CA (2010)
17.
go back to reference González, C.H., Fraguela, B.B.: A framework for argument-based task synchronization with automatic detection of dependencies. Parallel Comput. 39(9), 475–489 (2013)CrossRef González, C.H., Fraguela, B.B.: A framework for argument-based task synchronization with automatic detection of dependencies. Parallel Comput. 39(9), 475–489 (2013)CrossRef
18.
go back to reference González, C.H., Fraguela, B.B.: An algorithm template for domain-based parallel irregular algorithms. Int. J. Parallel Program. 42(6), 948–967 (2014)CrossRef González, C.H., Fraguela, B.B.: An algorithm template for domain-based parallel irregular algorithms. Int. J. Parallel Program. 42(6), 948–967 (2014)CrossRef
19.
go back to reference Gorlatch, S., Cole, M.: Parallel skeletons. Encyclopedia of Parallel Computing, pp. 1417–1422. Springer, New York (2011) Gorlatch, S., Cole, M.: Parallel skeletons. Encyclopedia of Parallel Computing, pp. 1417–1422. Springer, New York (2011)
21.
go back to reference Halstead, M.H.: Elements of Software Science. Elsevier, New York, NY (1977)MATH Halstead, M.H.: Elements of Software Science. Elsevier, New York, NY (1977)MATH
22.
go back to reference Hijma, P., Jacobs, C.J.H., van Nieuwpoort, R.V., Bal, H.E.: Cashmere: Heterogeneous many-core computing. In 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2015), pp. 135–145 (2015) Hijma, P., Jacobs, C.J.H., van Nieuwpoort, R.V., Bal, H.E.: Cashmere: Heterogeneous many-core computing. In 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2015), pp. 135–145 (2015)
23.
go back to reference Horowitz, E., Zorat, A.: Divide-and-conquer for parallel processing. IEEE Trans. Comput. 32(6), 582–585 (1983)CrossRefMATH Horowitz, E., Zorat, A.: Divide-and-conquer for parallel processing. IEEE Trans. Comput. 32(6), 582–585 (1983)CrossRefMATH
25.
go back to reference Karasawa, Y., Iwasaki, H.: A parallel skeleton library for multi-core clusters. In Proceedings of 2009 International Conference on Parallel Processing (ICPP’09), pp. 84–91. Los Alamitos, CA (2009) Karasawa, Y., Iwasaki, H.: A parallel skeleton library for multi-core clusters. In Proceedings of 2009 International Conference on Parallel Processing (ICPP’09), pp. 84–91. Los Alamitos, CA (2009)
26.
go back to reference Kawakatsu, T., Kinoshita, A., Takasu, A., Adachi, J.: Divide-and-Conquer Parallelism for Learning Mixture Models, pp. 23–47. Springer, Berlin (2016) Kawakatsu, T., Kinoshita, A., Takasu, A., Adachi, J.: Divide-and-Conquer Parallelism for Learning Mixture Models, pp. 23–47. Springer, Berlin (2016)
27.
go back to reference Kuchen, H.: A skeleton library. Euro-Par 2002 Parallel Processing. Volume 2400 of Lecture Notes in Computer Science, pp. 620–629. Springer, Berlin (2002) Kuchen, H.: A skeleton library. Euro-Par 2002 Parallel Processing. Volume 2400 of Lecture Notes in Computer Science, pp. 620–629. Springer, Berlin (2002)
28.
go back to reference Kulkarni, M., Burtscher, M., Cascaval, C., Pingali, K.: Lonestar: A suite of parallel irregular programs. In 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp. 65–76 (2009) Kulkarni, M., Burtscher, M., Cascaval, C., Pingali, K.: Lonestar: A suite of parallel irregular programs. In 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp. 65–76 (2009)
29.
go back to reference Leyton, M., Piquer, J. M.: Skandium: Multi-core programming with algorithmic skeletons. In Proceedings of 18th Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP 2010), pp. 289–296. Los Alamitos, CA (2010) Leyton, M., Piquer, J. M.: Skandium: Multi-core programming with algorithmic skeletons. In Proceedings of 18th Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP 2010), pp. 289–296. Los Alamitos, CA (2010)
30.
go back to reference Lima, J.V.F., Broquedis, F., Gautier, T., Raffin, B.: Preliminary experiments with xKaapi on Intel Xeon Phi coprocessor. In Proceedings of 25th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2013), pp. 105–112. Los Alamitos, CA (2013) Lima, J.V.F., Broquedis, F., Gautier, T., Raffin, B.: Preliminary experiments with xKaapi on Intel Xeon Phi coprocessor. In Proceedings of 25th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2013), pp. 105–112. Los Alamitos, CA (2013)
31.
go back to reference Mallón, D.A., Taboada, G.L., Teijeiro, C., Touriño, J., Fraguela, B.B., Gómez-Tato, A., Doallo, R., Carlos Mouriño, J.: Performance evaluation of MPI, UPC and OpenMP on multicore architectures. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’ Group Meeting, pp. 174–184. Springer, Berlin (2009) Mallón, D.A., Taboada, G.L., Teijeiro, C., Touriño, J., Fraguela, B.B., Gómez-Tato, A., Doallo, R., Carlos Mouriño, J.: Performance evaluation of MPI, UPC and OpenMP on multicore architectures. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’ Group Meeting, pp. 174–184. Springer, Berlin (2009)
32.
go back to reference Mattson, T., Sanders, B., Massingill, B.: Patterns for Parallel Programming. Addison-Wesley Professional, Boston, MA (2004)MATH Mattson, T., Sanders, B., Massingill, B.: Patterns for Parallel Programming. Addison-Wesley Professional, Boston, MA (2004)MATH
34.
go back to reference Nakatsukasa, Y., Higham, N .J.: Stable and efficient spectral divide and conquer algorithms for the symmetric eigenvalue decomposition and the SVD. SIAM J. Sci. Comput. 35(3), A1325–A1349 (2013)MathSciNetCrossRefMATH Nakatsukasa, Y., Higham, N .J.: Stable and efficient spectral divide and conquer algorithms for the symmetric eigenvalue decomposition and the SVD. SIAM J. Sci. Comput. 35(3), A1325–A1349 (2013)MathSciNetCrossRefMATH
36.
go back to reference Olivier, S.L., Prins, J.F.: Comparison of OpenMP 3.0 and other task parallel frameworks on unbalanced task graphs. Int. J. Parallel Program. 38(5–6), 341–360 (2010)CrossRefMATH Olivier, S.L., Prins, J.F.: Comparison of OpenMP 3.0 and other task parallel frameworks on unbalanced task graphs. Int. J. Parallel Program. 38(5–6), 341–360 (2010)CrossRefMATH
37.
go back to reference Reinders, J.: Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O’Reilly, Sebastopol, CA (2007) Reinders, J.: Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O’Reilly, Sebastopol, CA (2007)
38.
go back to reference Rogers, A., Carlisle, M .C., Reppy, J .H., Hendren, L .J.: Supporting dynamic data structures on distributed-memory machines. ACM Trans. Program. Lang. Syst. 17(2), 233–263 (1995)CrossRef Rogers, A., Carlisle, M .C., Reppy, J .H., Hendren, L .J.: Supporting dynamic data structures on distributed-memory machines. ACM Trans. Program. Lang. Syst. 17(2), 233–263 (1995)CrossRef
39.
go back to reference Tang, G., Yang, W., Li, K., Ye, Y., Xiao, G., Li, K.: An iteration-based hybrid parallel algorithm for tridiagonal systems of equations on multi-core architectures. Concurr. Comput. Pract. Exp. 27(17), 5076–5095 (2015)CrossRef Tang, G., Yang, W., Li, K., Ye, Y., Xiao, G., Li, K.: An iteration-based hybrid parallel algorithm for tridiagonal systems of equations on multi-core architectures. Concurr. Comput. Pract. Exp. 27(17), 5076–5095 (2015)CrossRef
40.
go back to reference Teijeiro, C., Taboada, G.L., Touriño, J., Fraguela, B.B., Doallo, R., Mallón, D.A., Gómez, A., Mouriño, J.C., Wibecan, B.: Evaluation of UPC programmability using classroom studies. In Proceedings of Third Conference on Partitioned Global Address Space Programing Models, PGAS ’09, pp. 10:1–10:7, New York, NY (2009) Teijeiro, C., Taboada, G.L., Touriño, J., Fraguela, B.B., Doallo, R., Mallón, D.A., Gómez, A., Mouriño, J.C., Wibecan, B.: Evaluation of UPC programmability using classroom studies. In Proceedings of Third Conference on Partitioned Global Address Space Programing Models, PGAS ’09, pp. 10:1–10:7, New York, NY (2009)
41.
go back to reference Tejedor, E., Farreras, M., Grove, D., Badia, R.M., Almasi, G., Labarta, J.: A high-productivity task-based programming model for clusters. Concurr. Comput. Pract. Exp. 24(18), 2421–2448 (2012)CrossRef Tejedor, E., Farreras, M., Grove, D., Badia, R.M., Almasi, G., Labarta, J.: A high-productivity task-based programming model for clusters. Concurr. Comput. Pract. Exp. 24(18), 2421–2448 (2012)CrossRef
42.
go back to reference Tousimojarad, A., Vanderbauwhede, W.: Number of tasks, not threads, is key. In Proceedings of 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2015), pp. 128–136. Los Alamitos, CA (2015) Tousimojarad, A., Vanderbauwhede, W.: Number of tasks, not threads, is key. In Proceedings of 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2015), pp. 128–136. Los Alamitos, CA (2015)
43.
go back to reference Van Nieuwpoort, R.V., Wrzesińska, G., Jacobs, C.J.H., Bal, H.E.: Satin: A high-level and efficient grid programming model. ACM Trans. Program. Lang. Syst. 32(3), 9 (2010) Van Nieuwpoort, R.V., Wrzesińska, G., Jacobs, C.J.H., Bal, H.E.: Satin: A high-level and efficient grid programming model. ACM Trans. Program. Lang. Syst. 32(3), 9 (2010)
45.
go back to reference White, T: Hadoop: The Definitive Guide. O’Reilly Media, Inc., 1st edition (2009) White, T: Hadoop: The Definitive Guide. O’Reilly Media, Inc., 1st edition (2009)
46.
go back to reference Yelick, K., Bonachea, D., Chen, W.-Y., Colella, P., Datta, K., Duell, J., Graham, S.L., Hargrove, P., Hilfinger, P., Husbands, P., Iancu, C., Kamil, A., Nishtala, R., Su, J., Welcome, M., Wen, T.: Productivity and performance using partitioned global address space languages. In Proceedings of 2007 International Workshop on Parallel Symbolic Computation, PASCO ’07, pp. 24–32. New York, NY (2007) Yelick, K., Bonachea, D., Chen, W.-Y., Colella, P., Datta, K., Duell, J., Graham, S.L., Hargrove, P., Hilfinger, P., Husbands, P., Iancu, C., Kamil, A., Nishtala, R., Su, J., Welcome, M., Wen, T.: Productivity and performance using partitioned global address space languages. In Proceedings of 2007 International Workshop on Parallel Symbolic Computation, PASCO ’07, pp. 24–32. New York, NY (2007)
47.
go back to reference Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: Cluster computing with working sets. In Proceedings of 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud’10, pp. 10–10, Berkeley, CA. USENIX Association (2010) Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: Cluster computing with working sets. In Proceedings of 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud’10, pp. 10–10, Berkeley, CA. USENIX Association (2010)
48.
go back to reference Zang, W., Zhang, P., Zhou, C., Guo, L.: Locating multiple sources in social networks under the SIR model: a divide-and-conquer approach. J. Comput. Sci. 10, 278–287 (2015)MathSciNetCrossRef Zang, W., Zhang, P., Zhou, C., Guo, L.: Locating multiple sources in social networks under the SIR model: a divide-and-conquer approach. J. Comput. Sci. 10, 278–287 (2015)MathSciNetCrossRef
49.
go back to reference Zhang, Y., Duchi, J.C., Wainwright, M.J.: Divide and conquer kernel ridge regression. In 26th Annual Conference on Learning Theory (COLT 2013), pp. 592–617 (2013) Zhang, Y., Duchi, J.C., Wainwright, M.J.: Divide and conquer kernel ridge regression. In 26th Annual Conference on Learning Theory (COLT 2013), pp. 592–617 (2013)
Metadata
Title
A general and efficient divide-and-conquer algorithm framework for multi-core clusters
Authors
Carlos H. González
Basilio B. Fraguela
Publication date
14-02-2017
Publisher
Springer US
Published in
Cluster Computing / Issue 3/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-0766-y

Other articles of this Issue 3/2017

Cluster Computing 3/2017 Go to the issue

Premium Partner