The Journal of Supercomputing

Published: 1 November 2012

Stencil computations on heterogeneous platforms for the Jacobi method: GPUs versus Cell BE

Authors: José M. Cecilia, José L. Abellán, Juan Fernández, Manuel E. Acacio, José M. García, Manuel Ujaldón

Published in: The Journal of Supercomputing | Issue 2/2012

Abstract

We are witnessing the consolidation of heterogeneous computing within parallel computing, with architectures such as the Cell Broadband Engine (Cell BE) and Graphics Processing Units (GPUs) present in a myriad of high performance computing developments. These platforms provide a Software Development Kit (SDK) to maximize performance, at the expense of dealing with complex, low-level architectural details, which makes software development a daunting task. This paper explores stencil computations in several heterogeneous programming models, such as Cell SDK, CellSs, ALF and CUDA, to optimize the Jacobi method for solving Laplace’s differential equation. We describe the programming techniques used to extract maximum performance on the Cell BE and the GPU, and compare their computing paradigms. Experimental results are shown on two Nvidia Teslas and one IBM BladeCenter QS20 blade, which incorporates two 3.2 GHz Cell BEs v 5.1. The speed-up factor for our set of GPU optimizations reaches 3–4×, and the execution times beat those of the Cell BE by an order of magnitude, while also showing great scalability when moving towards newer GPU generations and/or more demanding problem sizes.
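
For readers unfamiliar with the kernel the abstract refers to, the following is a minimal sequential sketch of a Jacobi sweep for Laplace's equation on a 2D grid, where each interior point is replaced by the average of its four neighbours (a 5-point stencil). It is not the authors' implementation, which is not reproduced on this page. The function names (`jacobi_step`, `solve_laplace`, `make_grid`), the boundary condition (top edge held at 1.0, other edges at 0.0), the tolerance and the iteration cap are all assumptions chosen for illustration.

```python
def jacobi_step(grid):
    """One Jacobi sweep over the interior points of a 2D grid.

    Reads only from the old grid and writes to a new one, which is what
    makes every point update independent of the others.
    Returns the new grid and the largest change made to any point.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # boundary values are carried over unchanged
    max_diff = 0.0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # 5-point stencil: average of the north, south, west, east neighbours
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
            max_diff = max(max_diff, abs(new[i][j] - grid[i][j]))
    return new, max_diff


def solve_laplace(grid, tol=1e-6, max_iters=10_000):
    """Repeat Jacobi sweeps until the largest update falls below tol.

    tol and max_iters are illustrative defaults, not values from the paper.
    Returns the final grid and the number of sweeps performed.
    """
    for it in range(1, max_iters + 1):
        grid, diff = jacobi_step(grid)
        if diff < tol:
            return grid, it
    return grid, max_iters


def make_grid(n, top=1.0):
    """Hypothetical test problem: n x n grid, top edge fixed at `top`, rest 0."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [top] * n
    return grid


if __name__ == "__main__":
    solution, sweeps = solve_laplace(make_grid(16))
    print(f"converged after {sweeps} sweeps")
```

Because each sweep reads only the previous grid, all interior points can be updated independently. That independence is the property that makes stencil kernels of this kind natural candidates for data-parallel platforms such as the ones compared in the paper.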

Title: Stencil computations on heterogeneous platforms for the Jacobi method: GPUs versus Cell BE
Authors: José M. Cecilia, José L. Abellán, Juan Fernández, Manuel E. Acacio, José M. García, Manuel Ujaldón
Publication date: 1 November 2012
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 2/2012
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-012-0749-y
