2019 | Original Paper | Book Chapter

3. Hardware-Algorithm Co-optimizations

Authors: Bert Moons, Daniel Bankman, Marian Verhelst

Published in: Embedded Deep Learning

Publisher: Springer International Publishing

Abstract

As discussed in Chap. 1, neural network-based applications are still too costly to be embedded in mobile and always-on devices. This chapter discusses hardware-aware, algorithm-level solutions to this problem. As an introduction to the topic, it gives an overview of existing work on hardware and neural network co-optimization. Two of our own contributions to hardware-algorithm optimization are then discussed and compared: network quantization at test time and network quantization at train time. The chapter ends with a methodology for designing minimum-energy quantized neural networks, i.e., networks trained for low-precision fixed-point operation, which form a second major contribution of this text.
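To make the notion of test-time fixed-point quantization concrete, the sketch below rounds a floating-point weight tensor onto a fixed-point grid. The function name, bit-width split, and saturation behavior are illustrative assumptions, not the chapter's exact scheme.

    import numpy as np

    def quantize_fixed_point(x, int_bits=1, frac_bits=7):
        # Illustrative test-time quantization to a signed fixed-point grid:
        # one sign bit, int_bits integer bits and frac_bits fractional bits.
        scale = 2.0 ** frac_bits
        max_val = 2.0 ** int_bits - 1.0 / scale   # largest representable value
        min_val = -2.0 ** int_bits                # most negative representable value
        return np.clip(np.round(x * scale) / scale, min_val, max_val)

    # Example: quantize a random convolution weight tensor to 9-bit fixed point.
    weights = np.random.randn(64, 3, 3, 3).astype(np.float32)
    weights_q = quantize_fixed_point(weights)
    print("max abs quantization error:", float(np.abs(weights - weights_q).max()))

At test time, the same rounding can be applied layer by layer without retraining, which is the setting of the first quantization contribution summarized above.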


Footnotes
1
qhardtanh(a) = 2 × hardtanh(round(hardsigmoid(a) × 2^(Q−1)) / 2^(Q−1)), with hardsigmoid(a) = clip((a + 1)/2, 0, 1).
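A minimal numpy transcription of this footnote, as printed, may help make the rounding step concrete; the function and variable names below are illustrative and not taken from the chapter.

    import numpy as np

    def hardsigmoid(a):
        # Footnote definition: clip((a + 1) / 2, 0, 1).
        return np.clip((a + 1.0) / 2.0, 0.0, 1.0)

    def hardtanh(x):
        # Standard hard tanh: clip(x, -1, 1).
        return np.clip(x, -1.0, 1.0)

    def q_hardtanh(a, Q):
        # Quantized hardtanh, transcribed literally from footnote 1:
        # squash to [0, 1], round onto a grid with step 2^-(Q-1), clip, rescale.
        grid = 2.0 ** (Q - 1)
        return 2.0 * hardtanh(np.round(hardsigmoid(a) * grid) / grid)

    # Example: pre-activations quantized with Q = 4 bits of resolution.
    a = np.linspace(-2.0, 2.0, 9)
    print(q_hardtanh(a, Q=4))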
 
Metadata
Title
Hardware-Algorithm Co-optimizations
Authors
Bert Moons
Daniel Bankman
Marian Verhelst
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-319-99223-5_3
