
2017 | OriginalPaper | Book chapter

NVIDIA Jetson Platform Characterization

Written by: Hassan Halawa, Hazem A. Abdelhafez, Andrew Boktor, Matei Ripeanu

Published in: Euro-Par 2017: Parallel Processing

Publisher: Springer International Publishing


Abstract

This study characterizes the NVIDIA Jetson TK1 and TX1 platforms, both built on an NVIDIA Tegra System on Chip that combines a quad-core ARM CPU and an NVIDIA GPU. Their heterogeneous nature, together with their wide operating frequency range, makes it hard for application developers to reason about performance and determine which optimizations are worth pursuing. This paper attempts to inform developers’ choices by characterizing the platforms’ performance using Roofline models obtained through an empirical, measurement-based approach, as well as through a case study of a heterogeneous application (matrix multiplication). Our results highlight a difference of more than an order of magnitude in compute performance between the CPU and GPU on both platforms. Given that the CPU and GPU share the same memory bus, their Roofline models’ balance points are also more than an order of magnitude apart. We also explore the impact of frequency scaling: we build CPU and GPU Roofline profiles and characterize both platforms’ balance point variation, power consumption, and performance per watt as frequency is scaled.
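The relationship between compute peaks and balance points stated above follows directly from the Roofline model, and a small sketch may make it concrete. The function names and all numeric values below are illustrative assumptions, not measurements from the paper; the sketch also assumes that both processors attain the same bandwidth on the shared memory bus.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound in GFLOP/s: the lower of the compute and memory roofs.

    intensity is the operational intensity in FLOP/byte.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def balance_point(peak_gflops, bandwidth_gbs):
    """Operational intensity (FLOP/byte) at which the two roofs meet."""
    return peak_gflops / bandwidth_gbs


# Placeholder values chosen for easy arithmetic; NOT taken from the paper.
SHARED_BANDWIDTH_GBS = 10.0   # assumed identical for CPU and GPU
CPU_PEAK_GFLOPS = 20.0
GPU_PEAK_GFLOPS = 200.0

cpu_balance = balance_point(CPU_PEAK_GFLOPS, SHARED_BANDWIDTH_GBS)
gpu_balance = balance_point(GPU_PEAK_GFLOPS, SHARED_BANDWIDTH_GBS)

# With equal bandwidth, the balance points differ by the same factor
# as the compute peaks.
balance_ratio = gpu_balance / cpu_balance
peak_ratio = GPU_PEAK_GFLOPS / CPU_PEAK_GFLOPS
```

Under these assumptions, a kernel whose operational intensity lies below both balance points is memory-bound on either processor and gains little from the GPU's higher compute peak.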
The characterization we provide can be used in three main ways. First, given an application, it can inform the choice and number of processing elements to use (i.e., CPU/GPU and number of cores) as well as the optimizations likely to lead to high performance gains. Second, it indicates that developers can use frequency scaling to tune the Jetson platform to suit the requirements of their applications. Third, given a required power/performance budget, application developers can identify the appropriate parameters to use to tune the Jetson platforms to their specific workload requirements. We expect that this optimization approach can lead to overall gains in performance and/or power efficiency without requiring application changes.
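As a rough illustration of the third use, the sketch below picks an operating point from a table of per-frequency characterization results. The `OperatingPoint` type, the helper names, and every number in the table are hypothetical placeholders rather than data from the paper; a real table would come from measurements such as those the study describes.

```python
from typing import NamedTuple, Optional, Sequence


class OperatingPoint(NamedTuple):
    freq_mhz: float   # clock frequency of the processing element
    gflops: float     # measured performance at that frequency
    watts: float      # measured power draw at that frequency


def perf_per_watt(point: OperatingPoint) -> float:
    """Energy efficiency of one operating point in GFLOP/s per watt."""
    return point.gflops / point.watts


def fastest_within_budget(points: Sequence[OperatingPoint],
                          max_watts: float) -> Optional[OperatingPoint]:
    """Highest-performance point whose power stays within the budget."""
    feasible = [p for p in points if p.watts <= max_watts]
    return max(feasible, key=lambda p: p.gflops, default=None)


def most_efficient(points: Sequence[OperatingPoint]) -> OperatingPoint:
    """Point with the best performance per watt, ignoring any budget."""
    return max(points, key=perf_per_watt)


# Hypothetical characterization table; values are placeholders only.
table = [
    OperatingPoint(freq_mhz=200.0, gflops=40.0, watts=2.0),
    OperatingPoint(freq_mhz=500.0, gflops=90.0, watts=4.0),
    OperatingPoint(freq_mhz=900.0, gflops=150.0, watts=9.0),
]

choice = fastest_within_budget(table, max_watts=5.0)
efficient = most_efficient(table)
```

The two selectors answer different questions: one maximizes performance under a power cap, the other maximizes efficiency regardless of absolute performance, so they need not return the same frequency.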

Footnotes
1
Our benchmarks are available online at: https://bitbucket.org/nsl_europar17/benchmarks.
 
2
We tried several alternative techniques, such as mprotect(), which changes the access permissions of a specific memory range. However, the NVIDIA driver locks the memory accessed by GPU kernels until they complete. It is therefore not possible for the CPU and GPU to access a shared matrix object at the same time, even when we use UMA and even if all accesses are read-only.
 
Metadata
Title
NVIDIA Jetson Platform Characterization
Written by
Hassan Halawa
Hazem A. Abdelhafez
Andrew Boktor
Matei Ripeanu
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-64203-1_7