Evaluating the performance portability of SYCL across CPUs and GPUs on bandwidth-bound applications

Author:
Istvan Z. Reguly

Pázmány Péter Catholic University, Hungary

ORCID: 0000-0002-4385-4204

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, November 2023, Pages 1038–1047. https://doi.org/10.1145/3624062.3624180

Published: 12 November 2023

ABSTRACT

In this paper, we evaluate the portability of the SYCL programming model on some of the latest CPUs and GPUs from a wide range of vendors, utilizing the two main compilers: DPC++ and hipSYCL/OpenSYCL. Both compilers currently support GPUs from all three major vendors; we evaluate performance on the Intel(R) Data Center GPU Max 1100, the NVIDIA A100 GPU, and the AMD MI250X GPU. Support on CPUs is currently less established: DPC++ supports only x86 CPUs, through OpenCL, whereas OpenSYCL has an OpenMP backend capable of targeting all modern CPUs. We benchmark the Intel Xeon Platinum 8360Y Processor (Ice Lake), the AMD EPYC 9V33X (Genoa-X), and the Ampere Altra platforms. We study a range of primarily bandwidth-bound applications implemented using the OPS and OP2 DSLs, evaluate different formulations in SYCL, and contrast their performance with "native" programming approaches where available (CUDA/HIP/OpenMP). On GPU architectures, SYCL on average even slightly outperforms native approaches, while on CPUs it falls behind, highlighting a continued need for improving CPU performance. While SYCL does not solve all the challenges of performance portability (e.g. needing different algorithms on different hardware), it does provide a single programming model and ecosystem to target most current HPC architectures productively.
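To make the term "bandwidth-bound" concrete, the following is a minimal sketch in standard C++ (deliberately not SYCL, and not taken from the paper or from the OPS/OP2 DSLs). It assumes a STREAM-style triad kernel as a stand-in for a bandwidth-bound loop, and every function name here (`triad_bytes`, `run_triad`, `effective_bandwidth_gbs`, `relative_performance`) is hypothetical. The sketch shows how effective bandwidth can be derived from the bytes a kernel moves and its runtime, and how a portable version might be compared against a native one.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Bytes moved by one triad sweep a[i] = b[i] + s * c[i]:
// two reads (b, c) and one write (a) per element.
inline double triad_bytes(std::size_t n) {
    return 3.0 * static_cast<double>(n) * sizeof(double);
}

// Effective bandwidth in GB/s for a kernel that moved `bytes` in `seconds`.
inline double effective_bandwidth_gbs(double bytes, double seconds) {
    return bytes / seconds / 1.0e9;
}

// Runs the triad once on the host and returns the elapsed time in seconds.
// Almost no arithmetic is done per byte loaded, so on large arrays the
// runtime is dominated by memory traffic rather than computation.
inline double run_triad(std::vector<double>& a,
                        const std::vector<double>& b,
                        const std::vector<double>& c,
                        double s) {
    assert(a.size() == b.size() && a.size() == c.size());
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = b[i] + s * c[i];
    }
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Speed of a portable implementation relative to a native one;
// a value above 1 means the portable version ran faster.
inline double relative_performance(double native_seconds,
                                   double portable_seconds) {
    return native_seconds / portable_seconds;
}
```

In use, one would time the same kernel in each programming model on the same hardware, convert each time to GB/s with `effective_bandwidth_gbs(triad_bytes(n), seconds)`, and compare the result against the platform's peak memory bandwidth. Real measurements would repeat the kernel many times and use arrays far larger than the caches.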

Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms
      1. Vector / streaming algorithms
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Distributed programming languages
        Parallel programming languages

Published in

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062

Copyright © 2023 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Benchmarking
CFD
CPU
GPU
SYCL
portability
Qualifiers
- research-article
- Research
- Refereed limited