2019 | Original Paper | Book Chapter

Learning Neural Representations for Predicting GPU Performance

Written by: Shweta Salaria, Aleksandr Drozd, Artur Podobas, Satoshi Matsuoka

Published in: High Performance Computing

Publisher: Springer International Publishing

Abstract

Graphics processing units (GPUs) have become a primary source of heterogeneity in today’s computing systems. With the rapid increase in the number and types of GPUs available, finding the best hardware accelerator for each application is a challenge. Moreover, it is time-consuming and tedious to execute every application on every GPU system to learn the correlation between application properties and hardware characteristics. To address this problem, we extend our previously proposed collaborative-filtering-based modeling technique to build an analytical model that can predict the performance of applications across different GPU systems. Our model learns representations, or embeddings (dense vectors of latent features), for applications and systems and uses them to characterize the performance of various GPU-accelerated applications. We improve on the state-of-the-art collaborative filtering approach based on matrix factorization by building a multi-layer perceptron. In addition to increased accuracy in predicting application performance, we can use this model to simultaneously predict multiple metrics, such as rates of memory access operations. We evaluate our approach on a set of 30 well-known micro-applications and seven Nvidia GPUs. As a result, we can predict the expected instructions-per-second value with 90.6% accuracy on average.
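The contrast the abstract draws between matrix factorization and a multi-layer perceptron over learned embeddings can be illustrated with a small, untrained forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the counts of 30 applications and seven GPUs come from the abstract, while the embedding size, hidden width, number of predicted metrics, ReLU activation, and all function names (`dot_score`, `dense`, `relu`, `mlp_predict`) are hypothetical choices made for illustration. The weights are random, so the outputs carry no meaning beyond showing the data flow.

```python
import random

# From the abstract: 30 micro-applications and seven Nvidia GPUs.
N_APPS, N_SYSTEMS = 30, 7
# Hypothetical sizes chosen only for this illustration.
EMBED_DIM, HIDDEN, N_METRICS = 8, 16, 2


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random values."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def dot_score(app_vec, sys_vec):
    """Matrix-factorization-style prediction: inner product of two embeddings."""
    return sum(a * s for a, s in zip(app_vec, sys_vec))


def dense(x, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit (assumed activation)."""
    return [max(0.0, v) for v in x]


def mlp_predict(app_vec, sys_vec, params):
    """Concatenate both embeddings and pass them through a one-hidden-layer MLP.

    The output layer has one unit per metric, so several metrics are
    predicted in a single forward pass.
    """
    x = app_vec + sys_vec  # list concatenation of the two embeddings
    hidden = relu(dense(x, params["w1"], params["b1"]))
    return dense(hidden, params["w2"], params["b2"])


rng = random.Random(0)
app_emb = make_matrix(N_APPS, EMBED_DIM, rng)     # one embedding per application
sys_emb = make_matrix(N_SYSTEMS, EMBED_DIM, rng)  # one embedding per GPU system
params = {
    "w1": make_matrix(HIDDEN, 2 * EMBED_DIM, rng),
    "b1": [0.0] * HIDDEN,
    "w2": make_matrix(N_METRICS, HIDDEN, rng),
    "b2": [0.0] * N_METRICS,
}

# Untrained predictions for application 3 on system 5.
mf_pred = dot_score(app_emb[3], sys_emb[5])             # a single scalar
mlp_pred = mlp_predict(app_emb[3], sys_emb[5], params)  # one value per metric
```

The inner product can only combine the two embeddings dimension by dimension and yields one number, whereas the perceptron can model non-linear interactions between application and system features and emit several metrics at once. In a real setting both the embeddings and the layer weights would be fitted to measured performance data.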

We tried adding Nvidia RTX 2070 and RTX 2080 Ti GPUs, based on the Turing micro-architecture, to our study, but we faced two issues: (1) nvprof profiling is not supported on these devices, and a new profiling tool, Nsight Compute, was recently introduced. However, some nvprof metrics (such as global load and store transactions) can’t be recorded using Nsight Compute when SM < 7.0. (2) Nsight Compute records global load transactions in sectors, while nvprof records the same performance metric in bytes.

Title: Learning Neural Representations for Predicting GPU Performance
Written by: Shweta Salaria, Aleksandr Drozd, Artur Podobas, Satoshi Matsuoka
Publisher: Springer International Publishing
Book: High Performance Computing
Print ISBN: 978-3-030-20655-0
Electronic ISBN: 978-3-030-20656-7
Copyright year: 2019
DOI: https://doi.org/10.1007/978-3-030-20656-7_3
