
2019 | Original Paper | Book Chapter

Learning Neural Representations for Predicting GPU Performance

Authors: Shweta Salaria, Aleksandr Drozd, Artur Podobas, Satoshi Matsuoka

Published in: High Performance Computing

Publisher: Springer International Publishing


Abstract

Graphics processing units (GPUs) have become a primary source of heterogeneity in today’s computing systems. With the rapid increase in the number and types of GPUs available, finding the best hardware accelerator for each application is a challenge. Executing every application on every GPU system to learn the correlation between application properties and hardware characteristics is time-consuming and tedious. To address this problem, we extend our previously proposed collaborative-filtering-based modeling technique to build an analytical model that can predict the performance of applications across different GPU systems. Our model learns representations, or embeddings (dense vectors of latent features), for applications and systems and uses them to characterize the performance of various GPU-accelerated applications. We improve on the state-of-the-art collaborative filtering approach based on matrix factorization by building a multi-layer perceptron. In addition to increased accuracy in predicting application performance, we can use this model to simultaneously predict multiple metrics, such as rates of memory access operations. We evaluate our approach on a set of 30 well-known micro-applications and seven Nvidia GPUs. As a result, we can predict the expected instructions-per-second value with 90.6% accuracy on average.
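
To make the contrast between the two model families in the abstract concrete, the following is a minimal sketch of a forward pass only, not the authors' implementation. It assumes one embedding per application and per GPU; a matrix-factorization predictor scores a pair with a dot product, while a multi-layer perceptron consumes the concatenated embeddings and can emit several metrics at once. The embedding size, hidden width, ReLU activation, number of output metrics, and all function and variable names are hypothetical; only the counts of 30 applications and seven GPUs come from the abstract. The weights are random and untrained, so the outputs carry no meaning beyond their shape.

```python
import random

# Counts from the abstract: 30 micro-applications, seven Nvidia GPUs.
N_APPS, N_GPUS = 30, 7
# Hypothetical hyperparameters, chosen only for illustration.
EMB_DIM, HIDDEN, N_METRICS = 4, 8, 2

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of small random values (untrained weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# One dense latent vector (embedding) per application and per GPU.
app_emb = rand_matrix(N_APPS, EMB_DIM)
gpu_emb = rand_matrix(N_GPUS, EMB_DIM)

# MLP parameters: one hidden layer over the concatenated embeddings.
w_hidden = rand_matrix(HIDDEN, 2 * EMB_DIM)
b_hidden = [0.0] * HIDDEN
w_out = rand_matrix(N_METRICS, HIDDEN)
b_out = [0.0] * N_METRICS


def predict_mf(app_vec, gpu_vec):
    """Matrix-factorization style score: dot product of the two embeddings."""
    return sum(a * g for a, g in zip(app_vec, gpu_vec))


def predict_mlp(app_vec, gpu_vec):
    """MLP style prediction: concatenate, apply a ReLU layer, emit N_METRICS values."""
    x = app_vec + gpu_vec  # list concatenation, length 2 * EMB_DIM
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]


# Score application 3 on GPU 5 with both predictors.
mf_score = predict_mf(app_emb[3], gpu_emb[5])
mlp_scores = predict_mlp(app_emb[3], gpu_emb[5])
```

In this sketch the dot product yields a single score per application-GPU pair, whereas the MLP output layer has one unit per metric, which is one way a single model could predict, for example, instructions per second and a memory access rate together.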

Footnotes
1
We tried to include the Nvidia RTX 2070 and RTX 2080 Ti GPUs, both based on the Turing micro-architecture, in our study, but faced two issues: (1) nvprof profiling is not supported on these devices, and a new profiling tool, Nsight Compute, was recently introduced; however, some nvprof metrics (such as global load and store transactions) cannot be recorded with Nsight Compute when SM < 7.0. (2) Nsight Compute records global load transactions in sectors, while nvprof records the same performance metric in bytes.
 
Metadata
Title
Learning Neural Representations for Predicting GPU Performance
Authors
Shweta Salaria
Aleksandr Drozd
Artur Podobas
Satoshi Matsuoka
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-20656-7_3
