Skip to main content
Top
Published in: International Journal of Parallel Programming 4/2014

01-08-2014

MulticoreBSP for C: A High-Performance Library for Shared-Memory Parallel Programming

Authors: A. N. Yzelman, R. H. Bisseling, D. Roose, K. Meerbergen

Published in: International Journal of Parallel Programming | Issue 4/2014

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The bulk synchronous parallel (BSP) model, as well as parallel programming interfaces based on BSP, classically target distributed-memory parallel architectures. In earlier work, Yzelman and Bisseling designed a MulticoreBSP for Java library specifically for shared-memory architectures. In the present article, we further investigate this concept and introduce the new high-performance MulticoreBSP for C library. Among other features, this library supports nested BSP runs. We show that existing BSP software performs well regardless whether it runs on distributed-memory or shared-memory architectures, and show that applications in MulticoreBSP can attain high-performance results. The paper details implementing the Fast Fourier Transform and the sparse matrix–vector multiplication in BSP, both of which outperform state-of-the-art implementations written in other shared-memory parallel programming interfaces. We furthermore study the applicability of BSP when working on highly non-uniform memory access architectures.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Footnotes
1
For brevity, we omit the const and restrict keywords.
 
2
Note that the original BSPlib used an int as return type and returned -1 if the queue was empty.
 
3
Global variables defined outside of classes remain visible by all BSP processes, however.
 
9
Auto-tuning proceeds in the FFTW_PATIENT mode.
 
10
\(2^{26}\) complex values take \(16\) bytes each, resulting in \(2^{30}\) bytes of storage. This equals \(1\) GB of data. A \(2^{24}\)-length vector thus takes \(256\) MB.
 
11
FFTW 3.3.3 linked against fftw3_threads.
 
12
The parallel efficiency of an algorithm is defined as \(\frac{\text {speedup}}{p}\).
 
Literature
1.
go back to reference Bisseling, R.H.: Parallel Scientific Computation: A Structured Approach Using BSP and MPI. Oxford University Press, Oxford (2004)CrossRef Bisseling, R.H.: Parallel Scientific Computation: A Structured Approach Using BSP and MPI. Oxford University Press, Oxford (2004)CrossRef
2.
go back to reference Bonorden, O., Juurlink, B., von Otte, I., Rieping, I.: The Paderborn University BSP (PUB) library. Parallel Comput. 29(2), 187–207 (2003)CrossRef Bonorden, O., Juurlink, B., von Otte, I., Rieping, I.: The Paderborn University BSP (PUB) library. Parallel Comput. 29(2), 187–207 (2003)CrossRef
3.
go back to reference Buluç, A., Fineman, J.T., Frigo, M., Gilbert, J.R., Leiserson, C.E.: Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks. In: SPAA’09: Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures, pp. 233–244. ACM, New York, NY (2009) Buluç, A., Fineman, J.T., Frigo, M., Gilbert, J.R., Leiserson, C.E.: Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks. In: SPAA’09: Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures, pp. 233–244. ACM, New York, NY (2009)
4.
go back to reference Buluç, A., Williams, S., Oliker, L., Demmel, J.: Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication. In: International Parallel and Distributed Processing Symposium (IPDPS), pp. 721–733. IEEE Press, Piscataway, NJ (2011) Buluç, A., Williams, S., Oliker, L., Demmel, J.: Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication. In: International Parallel and Distributed Processing Symposium (IPDPS), pp. 721–733. IEEE Press, Piscataway, NJ (2011)
5.
go back to reference De la Torre, P., Kruskal, C.P.: Submachine locality in the bulk synchronous setting. In: Bougé, L., Fraigniaud, P., Mignotte, A., Robert, Y. (eds.) Euro-Par’96 Parallel Processing. Lecture Notes in Computer Science, vol. 1124, pp. 352–358. Springer, Berlin (1996) De la Torre, P., Kruskal, C.P.: Submachine locality in the bulk synchronous setting. In: Bougé, L., Fraigniaud, P., Mignotte, A., Robert, Y. (eds.) Euro-Par’96 Parallel Processing. Lecture Notes in Computer Science, vol. 1124, pp. 352–358. Springer, Berlin (1996)
6.
go back to reference Franchetti, F., Püschel, M., Voronenko, Y., Chellappa, S., Moura, J.M.F.: Discrete Fourier transform on multicore. IEEE Signal Process. Mag., special issue on Signal Processing on Platforms with Multiple Cores. 26(6), 90–102 (2009) Franchetti, F., Püschel, M., Voronenko, Y., Chellappa, S., Moura, J.M.F.: Discrete Fourier transform on multicore. IEEE Signal Process. Mag., special issue on Signal Processing on Platforms with Multiple Cores. 26(6), 90–102 (2009)
7.
go back to reference Frigo, M.: A fast Fourier transform compiler. In: Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation, PLDI’99, pp. 169–180. ACM, New York, NY (1999) Frigo, M.: A fast Fourier transform compiler. In: Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation, PLDI’99, pp. 169–180. ACM, New York, NY (1999)
8.
go back to reference Hamidouche, K., Falcou, J., Etiemble, D.: Hybrid bulk synchronous parallelism library for clustered SMP architectures. In: Proceedings Fourth International Workshop on High-level Parallel Programming and Applications, pp. 55–62. ACM, New York, NY (2010) Hamidouche, K., Falcou, J., Etiemble, D.: Hybrid bulk synchronous parallelism library for clustered SMP architectures. In: Proceedings Fourth International Workshop on High-level Parallel Programming and Applications, pp. 55–62. ACM, New York, NY (2010)
9.
go back to reference Hill, J.M.D., McColl, B., Stefanescu, D.C., Goudreau, M.W., Lang, K., Rao, S.B., Suel, T., Tsantilas, T., Bisseling, R.H.: BSPlib: the BSP programming library. Parallel Comput. 24(14), 1947–1980 (1998)CrossRef Hill, J.M.D., McColl, B., Stefanescu, D.C., Goudreau, M.W., Lang, K., Rao, S.B., Suel, T., Tsantilas, T., Bisseling, R.H.: BSPlib: the BSP programming library. Parallel Comput. 24(14), 1947–1980 (1998)CrossRef
10.
go back to reference Hinsen, K.: High-level parallel software development with Python and BSP. Parallel Process. Lett. 13(03), 473–484 (2003)CrossRefMathSciNet Hinsen, K.: High-level parallel software development with Python and BSP. Parallel Process. Lett. 13(03), 473–484 (2003)CrossRefMathSciNet
11.
go back to reference IEEE: Std. 1003.1-2008 Portable Operating System Interface (POSIX) Base Specifications, Issue 7. IEEE Standards for Information Technology. IEEE Press, Piscataway, NJ (2008) IEEE: Std. 1003.1-2008 Portable Operating System Interface (POSIX) Base Specifications, Issue 7. IEEE Standards for Information Technology. IEEE Press, Piscataway, NJ (2008)
12.
go back to reference Inda, M.A., Bisseling, R.H.: A simple and efficient parallel FFT algorithm using the BSP model. Parallel Comput. 27(14), 1847–1878 (2001)CrossRefMATHMathSciNet Inda, M.A., Bisseling, R.H.: A simple and efficient parallel FFT algorithm using the BSP model. Parallel Comput. 27(14), 1847–1878 (2001)CrossRefMATHMathSciNet
13.
go back to reference Javed, N., Loulergue, F.: OSL: Optimized bulk synchronous parallel skeletons on distributed arrays. In: Dou, Y., Gruber, R., Joller, J. (eds.) Advanced Parallel Processing Technologies. Lecture Notes in Computer Science, vol. 5737, pp. 436–451. Springer, Berlin (2009) Javed, N., Loulergue, F.: OSL: Optimized bulk synchronous parallel skeletons on distributed arrays. In: Dou, Y., Gruber, R., Joller, J. (eds.) Advanced Parallel Processing Technologies. Lecture Notes in Computer Science, vol. 5737, pp. 436–451. Springer, Berlin (2009)
14.
go back to reference Keßler, C.W.: NestStep: nested parallelism and virtual shared memory for the BSP model. J. Supercomput. 17, 245–262 (2000)CrossRefMATH Keßler, C.W.: NestStep: nested parallelism and virtual shared memory for the BSP model. J. Supercomput. 17, 245–262 (2000)CrossRefMATH
15.
go back to reference Liu, J., Wu, J., Panda, D.K.: High performance RDMA-based MPI implementation over Infiniband. Int. J. Parallel Program. 32(3), 167–198 (2004)CrossRefMATH Liu, J., Wu, J., Panda, D.K.: High performance RDMA-based MPI implementation over Infiniband. Int. J. Parallel Program. 32(3), 167–198 (2004)CrossRefMATH
16.
go back to reference Loulergue, F., Gava, F., Billiet, D.: Bulk synchronous parallel ML: Modular implementation and performance prediction. In: International Conference on Computational Science, Part II, Lecture Notes in Computer Science, vol. 3515, pp. 1046–1054. Springer, Berlin (2005) Loulergue, F., Gava, F., Billiet, D.: Bulk synchronous parallel ML: Modular implementation and performance prediction. In: International Conference on Computational Science, Part II, Lecture Notes in Computer Science, vol. 3515, pp. 1046–1054. Springer, Berlin (2005)
17.
go back to reference Püschel, M., Franchetti, F., Voronenko, Y.: Spiral. In: Encyclopedia of Parallel Computing. Springer, Berlin (2011) Püschel, M., Franchetti, F., Voronenko, Y.: Spiral. In: Encyclopedia of Parallel Computing. Springer, Berlin (2011)
19.
go back to reference Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)CrossRef Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)CrossRef
21.
go back to reference Vastenhouw, B., Bisseling, R.H.: A two-dimensional data distribution method for parallel sparse matrix-vector multiplication. SIAM Rev. 47(1), 67–95 (2005)CrossRefMATHMathSciNet Vastenhouw, B., Bisseling, R.H.: A two-dimensional data distribution method for parallel sparse matrix-vector multiplication. SIAM Rev. 47(1), 67–95 (2005)CrossRefMATHMathSciNet
22.
go back to reference Yzelman, A.N., Roose, D.: High-level strategies for parallel shared-memory sparse matrix-vector multiplication. IEEE Trans. Parallel Distrib. Syst. (2013, in press) Yzelman, A.N., Roose, D.: High-level strategies for parallel shared-memory sparse matrix-vector multiplication. IEEE Trans. Parallel Distrib. Syst. (2013, in press)
23.
go back to reference Yzelman, A.N., Bisseling, R.H.: Two-dimensional cache-oblivious sparse matrix-vector multiplication. Parallel Comput. 37(12), 806–819 (2011)CrossRef Yzelman, A.N., Bisseling, R.H.: Two-dimensional cache-oblivious sparse matrix-vector multiplication. Parallel Comput. 37(12), 806–819 (2011)CrossRef
24.
go back to reference Yzelman, A.N., Bisseling, R.H.: An object-oriented bulk synchronous parallel library for multicore programming. Concurr. Comput. Pract. Exper. 24(5), 533–553 (2012)CrossRef Yzelman, A.N., Bisseling, R.H.: An object-oriented bulk synchronous parallel library for multicore programming. Concurr. Comput. Pract. Exper. 24(5), 533–553 (2012)CrossRef
Metadata
Title
MulticoreBSP for C: A High-Performance Library for Shared-Memory Parallel Programming
Authors
A. N. Yzelman
R. H. Bisseling
D. Roose
K. Meerbergen
Publication date
01-08-2014
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 4/2014
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-013-0262-9

Other articles of this Issue 4/2014

International Journal of Parallel Programming 4/2014 Go to the issue

Premium Partner