DOI: 10.1145/3624062.3624180
research-article

Evaluating the performance portability of SYCL across CPUs and GPUs on bandwidth-bound applications

Published: 12 November 2023

ABSTRACT

In this paper, we evaluate the portability of the SYCL programming model on some of the latest CPUs and GPUs from a wide range of vendors, utilizing the two main compilers: DPC++ and hipSYCL/OpenSYCL. Both compilers currently support GPUs from all three major vendors; we evaluate performance on the Intel(R) Data Center GPU Max 1100, the NVIDIA A100 GPU, and the AMD MI250X GPU. CPU support is currently less established: DPC++ supports only x86 CPUs, through OpenCL, whereas OpenSYCL has an OpenMP backend capable of targeting all modern CPUs. We benchmark the Intel Xeon Platinum 8360Y Processor (Ice Lake), the AMD EPYC 9V33X (Genoa-X), and the Ampere Altra platforms. We study a range of primarily bandwidth-bound applications implemented using the OPS and OP2 DSLs, evaluate different formulations in SYCL, and contrast their performance with “native” programming approaches where available (CUDA/HIP/OpenMP). On GPU architectures, SYCL on average even slightly outperforms native approaches, while on CPUs it falls behind, highlighting a continued need for improving CPU performance. While SYCL does not solve all the challenges of performance portability (e.g. needing different algorithms on different hardware), it does provide a single programming model and ecosystem to target most current HPC architectures productively.


Published in

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited
