2019 | Original Paper | Book Chapter

Refactoring Loops with Nested IFs for SIMD Extensions Without Masked Instructions

Authors: Huihui Sun, Sergei Gorlatch, Rongcai Zhao

Published in: Euro-Par 2018: Parallel Processing Workshops

Publisher: Springer International Publishing

Abstract

Most CPUs in heterogeneous systems are now equipped with SIMD (Single Instruction Multiple Data) extensions that operate on short vectors in parallel to enable high performance. Refactoring programs for such systems relies on vectorization, i.e., transforming them into a form that uses SIMD instructions. We improve the state of the art in refactoring loops with nested IF-statements, which are notoriously difficult to vectorize. For IF-statements whose conditions are independent of the loop variable, we extend the classical loop unswitching method so that it can handle nested IFs. For IF-statements whose conditions change with loop iterations, we develop a novel IF-select transformation method: (1) it works with arbitrarily nested IFs, and (2) while previous methods rely on either masked instructions or hardware support for predicated execution, our method works for SIMD extensions without such operations (as found, e.g., in IBM Power8 and ARM Cortex-A8). Our experimental evaluation on the SPEC CPU2006 benchmark suite is conducted on an SW26010 processor used in the Sunway TaihuLight supercomputer (#2 in the TOP500 list); it demonstrates the performance advantages of our implemented approach over the vectorizer of the Open64 compiler.
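
The two situations named in the abstract can be illustrated with a small scalar sketch in C. It is not the authors' algorithm, whose details the abstract does not give. It only shows the general idea of (a) hoisting nested loop-invariant conditions out of a loop and (b) replacing nested loop-variant IFs by a select built from plain AND/OR/NOT, which a SIMD unit without masked instructions can also execute on whole vectors. All function names (`scale_ref`, `scale_unswitched`, `clamp_ref`, `clamp_select`, `select_u32`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* (a) Nested IFs whose conditions p and q do not depend on i. */
void scale_ref(float *a, const float *b, size_t n, int p, int q) {
    for (size_t i = 0; i < n; i++) {
        if (p) {
            if (q) a[i] = b[i] * 2.0f;
            else   a[i] = b[i] + 1.0f;
        } else {
            a[i] = b[i];
        }
    }
}

/* Unswitched form: the nested tests are evaluated once, outside the
 * loops, so every remaining loop body is branch-free straight-line code. */
void scale_unswitched(float *a, const float *b, size_t n, int p, int q) {
    if (p) {
        if (q) {
            for (size_t i = 0; i < n; i++) a[i] = b[i] * 2.0f;
        } else {
            for (size_t i = 0; i < n; i++) a[i] = b[i] + 1.0f;
        }
    } else {
        for (size_t i = 0; i < n; i++) a[i] = b[i];
    }
}

/* (b) Nested IFs whose conditions change with every iteration. */
void clamp_ref(uint32_t *out, const uint32_t *x, size_t n,
               uint32_t lo, uint32_t hi) {
    for (size_t i = 0; i < n; i++) {
        if (x[i] > lo) {
            if (x[i] < hi) out[i] = x[i];
            else           out[i] = hi;
        } else {
            out[i] = lo;
        }
    }
}

/* Bitwise select: mask must be all ones (pick t) or all zeros (pick f). */
static inline uint32_t select_u32(uint32_t mask, uint32_t t, uint32_t f) {
    return (t & mask) | (f & ~mask);
}

/* Select form: each comparison becomes a full-width mask, both sides of
 * every IF are computed, and the nesting is resolved from the inside out.
 * Only compare, AND, OR and NOT are needed; no masked or predicated
 * instruction is assumed. */
void clamp_select(uint32_t *out, const uint32_t *x, size_t n,
                  uint32_t lo, uint32_t hi) {
    for (size_t i = 0; i < n; i++) {
        uint32_t outer = 0u - (uint32_t)(x[i] > lo); /* 0xFFFFFFFF if true */
        uint32_t inner = 0u - (uint32_t)(x[i] < hi);
        uint32_t then_value = select_u32(inner, x[i], hi);
        out[i] = select_u32(outer, then_value, lo);
    }
}
```

Calling `clamp_ref` and `clamp_select` on the same input should fill both output arrays identically. The select form is only safe as written because both branches are free of side effects and cannot fault; a real transformation would have to check this before computing both sides unconditionally.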

Metadata
Title
Refactoring Loops with Nested IFs for SIMD Extensions Without Masked Instructions
Authors
Huihui Sun
Sergei Gorlatch
Rongcai Zhao
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-10549-5_60