Published in: The Journal of Supercomputing 17/2023

30.05.2023

Novel accelerated methods for convolution neural network with matrix core

Authors: Yijie Guo, Lu Lu, Songxiang Zhu


Abstract

The powerful parallel computing capability of GPUs and the recent development of matrix processing units offer new opportunities to improve the performance of convolutional neural networks (CNNs) on GPUs. The Winograd convolution algorithm, the most widely used and best-performing convolution algorithm in CNNs, has already been the subject of several tuning efforts, but these all ignore the matrix operation unit and therefore fail to exploit the GPU's full computing resources. This paper introduces a single-precision accelerated solution for CNNs on GPUs. Guided by architectural indicators, optimal data layout, grid division, and block division methods are derived. To handle the variety of padding configurations that arise in practice, an efficient dynamic padding scheme is designed, and a pipelined algorithm with operator fusion is implemented on top of the matrix cores. AMD's deep learning acceleration library MIOpen serves as the baseline. Evaluated on several convolutional layers of ResNet50, our approach outperforms MIOpen with a speedup of 1.41x on the MI210 and reaches 74% of the single-precision peak. Applied to the training and inference of ResNet50, it achieves a speedup of 1.68x.
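To make the abstract's central building block concrete, the following is a minimal sketch of the 1D Winograd minimal filtering algorithm F(2, 3), which produces 2 outputs of a 3-tap convolution with only 4 multiplications instead of the 6 a direct computation needs. This is an illustrative example of the algorithm family the paper builds on, not the authors' GPU implementation (which uses the 2D form on matrix cores); the function names are ours.

```python
def winograd_f23(d, g):
    """F(2, 3): 2 outputs of a 3-tap filter g over a 4-element input tile d."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (in a CNN this is precomputed once per filter).
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Input transform + element-wise products: only 4 multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_conv(d, g):
    """Reference: direct valid convolution (6 multiplications)."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]

print(winograd_f23([1.0, 2.0, 3.0, 4.0], [0.5, 1.0, -1.0]))  # [-0.5, 0.0]
print(direct_conv([1.0, 2.0, 3.0, 4.0], [0.5, 1.0, -1.0]))   # [-0.5, 0.0]
```

In a GPU implementation, the element-wise products across many tiles and channels are batched into the matrix multiplications that the matrix cores accelerate, which is where the paper's operator-fusion pipeline applies.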


Metadata
Title
Novel accelerated methods for convolution neural network with matrix core
Authors
Yijie Guo
Lu Lu
Songxiang Zhu
Publication date
30.05.2023
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 17/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05399-6
