Published in: The Journal of Supercomputing 6/2021

10-11-2020

Performance and power consumption analysis of Arm Scalable Vector Extension

Authors: Tetsuya Odajima, Yuetsu Kodama, Mitsuhisa Sato

Abstract

Modern CPUs not only have multiple cores but also support wide single-instruction, multiple-data (SIMD) operations, and this trend is expected to continue. In this paper, we examine how the vector length and the number of out-of-order resources affect the performance and power consumption of programs with multiple vector lengths using the Arm Scalable Vector Extension (SVE). Based on our evaluation, we conclude that using a longer vector length with multicycle vector units yields up to approximately a 30% improvement in performance and a 21% decrease in power consumption compared with using a shorter vector length.

Metadata
Title
Performance and power consumption analysis of Arm Scalable Vector Extension
Authors
Tetsuya Odajima
Yuetsu Kodama
Mitsuhisa Sato
Publication date
10-11-2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 6/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03495-5