nach oben

Erschienen in:

2020 | OriginalPaper | Buchkapitel

Efficient cuDNN-Compatible Convolution-Pooling on the GPU

verfasst von : Shunsuke Suita, Takahiro Nishimura, Hiroki Tokura, Koji Nakano, Yasuaki Ito, Akihiko Kasagi, Tsuguchika Tabaru

Erschienen in: Parallel Processing and Applied Mathematics

Verlag: Springer International Publishing

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

The main contribution of this paper is to show efficient implementations of the convolution-pooling in the GPU, in which the pooling follows the multiple convolution. Since the multiple convolution and the pooling operations are performed alternately in earlier stages of many Convolutional Neural Networks (CNNs), it is very important to accelerate the convolution-pooling. Our new GPU implementation uses two techniques, (1) convolution interchange with direct sum, and (2) conversion to matrix multiplication. By these techniques, the computational and memory access cost are reduced. Further the convolution interchange is converted to matrix multiplication, which can be computed by cuBLAS very efficiently. Experimental results using Telsa V100 GPU show that our new GPU implementation compatible with cuDNN for the convolution-pooling is at least 1.34 times faster than the multiple convolution and then the pooling by cuDNN, the most popular library of primitives to implement the CNNs in the GPU.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Vorheriges Kapitel Examining Performance Portability with Kokkos for an Ewald Sum Coulomb Solver

Nächstes Kapitel Reactive Task Migration for Hybrid MPI+OpenMP Applications

Cheng, Y., Wang, D., Zhou, P., Zhang, T.: A survey of model compression and acceleration for deep neural networks. CoRR abs/1710.09282, October 2017

Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B., Shelhamer, E.: cuDNN: efficient primitives for deep learning. CoRR abs/1410.0759, August 2014

Emoto, Y., Funasaka, S., Tokura, H., Honda, T., Nakano, K., Ito, Y.: An optimal parallel algorithm for computing the summed area table on the GPU. In: Proceedings of International Parallel and Distributed Processing Symposium Workshops, pp. 763–772, February 2018

Honda, T., Yamamoto, S., Honda, H., Nakano, K., Ito, Y.: Simple and fast parallel algorithms for the Voronoi map and the Euclidean distance map, with GPU implementations. In: Proceedings of International Conference on Parallel Processing, pp. 362–371, August 2017

Hwu, W.W.: GPU Computing Gems, Emerald edn. Morgan Kaufmann, Burlington (2011)

Kasagi, A., Nakano, K., Ito, Y.: Parallel algorithms for the summed area table on the asynchronous hierarchical memory machine, with GPU implementations. In: Proceedings of International Conference on Parallel Processing (ICPP), pp. 251–260, September 2014

Kasagi, A., Tabaru, T., Tamura, H.: Fast algorithm using summed area tables with unified layer performing convolution and average pooling. In: Proceedings of International Workshop on Machine Learning for Signal Processing, September 2017

Li, C., Yang, Y., Feng, M., Chakradhar, S., Zhou, H.: Optimizing memory efficiency for deep convolutional neural networks on GPUs. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2016

Matsumura, N., Tokura, H., Kuroda, Y., Ito, Y., Nakano, K.: Tile art image generation using conditional generative adversarial networks. In: Proceedings of International Symposium on Computing and Networking Workshops, pp. 209–215 (2018)

10.

NVIDIA Corporation: NVIDIA CUDA C programming guide version 4.0 (2011)

11.

NVIDIA Corporation: CUBLAS LIBRARY user guide, February 2019. https://docs.nvidia.com/cuda/cublas/index.html

12.

NVIDIA Corporation: CUDNN developer guide, February 2019. https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html

13.

Ogawa, K., Ito, Y., Nakano, K.: Efficient Canny edge detection using a GPU. In: Proceedings of International Conference on Networking and Computing, pp. 279–280. IEEE CS Press, November 2010

14.

Sze, V., Chen, Y.H., Yang, T.J., Emer, J.S.: Efficient processing of deep neural networks: a tutorial and survey. Proc. IEEE 105(12), 2295–2329 (2017)CrossRef

15.

Takeuchi, Y., Takafuji, D., Ito, Y., Nakano, K.: ASCII art generation using the local exhaustive search on the GPU. In: Proceedings of International Symposium on Computing and Networking, pp. 194–200, December 2013

16.

Zhang, Q., Zhang, M., Chen, T., Sun, Z., Ma, Y., Yu, B.: Recent advances in convolutional neural network acceleration. Neurocomputing 323, 37–51 (2019)CrossRef

Titel: Efficient cuDNN-Compatible Convolution-Pooling on the GPU
verfasst von: Shunsuke Suita
Takahiro Nishimura
Hiroki Tokura
Koji Nakano
Yasuaki Ito
Akihiko Kasagi
Tsuguchika Tabaru
Verlag: Springer International Publishing
Buch: Parallel Processing and Applied Mathematics
Print ISBN: 978-3-030-43221-8

Electronic ISBN: 978-3-030-43222-5

Copyright-Jahr: 2020
DOI: https://doi.org/10.1007/978-3-030-43222-5_5

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"