2019 | Original Paper | Book Chapter

3. Hardware-Algorithm Co-optimizations

Authors: Bert Moons, Daniel Bankman, Marian Verhelst

Published in: Embedded Deep Learning

Publisher: Springer International Publishing

Abstract

As discussed in Chap. 1, neural network-based applications are still too costly to be embedded in mobile and always-on devices. This chapter discusses hardware-aware, algorithm-level solutions to this problem. As an introduction to the topic, it gives an overview of existing work on hardware and neural network co-optimization. Two of our own contributions to hardware-algorithm optimization are then discussed and compared: network quantization at test time and network quantization at train time. The chapter ends with a methodology for designing minimum-energy quantized neural networks, i.e., networks trained for low-precision fixed-point operation, which form a second major contribution of this text.
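To make the notion of test-time fixed-point quantization concrete, the sketch below rounds a floating-point weight tensor onto a fixed-point grid. The function name, bit-width split, and saturation behavior are illustrative assumptions, not the chapter's exact scheme.

    import numpy as np

    def quantize_fixed_point(x, int_bits=1, frac_bits=7):
        # Illustrative test-time quantization to a signed fixed-point grid:
        # one sign bit, int_bits integer bits and frac_bits fractional bits.
        scale = 2.0 ** frac_bits
        max_val = 2.0 ** int_bits - 1.0 / scale   # largest representable value
        min_val = -2.0 ** int_bits                # most negative representable value
        return np.clip(np.round(x * scale) / scale, min_val, max_val)

    # Example: quantize a random convolution weight tensor to 9-bit fixed point.
    weights = np.random.randn(64, 3, 3, 3).astype(np.float32)
    weights_q = quantize_fixed_point(weights)
    print("max abs quantization error:", float(np.abs(weights - weights_q).max()))

At test time, the same rounding can be applied layer by layer without retraining, which is the setting of the first quantization contribution summarized above.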


Footnotes
1
qhardtanh(a) = 2 × hardtanh(round(hardsigmoid(a) × 2^(Q−1)) / 2^(Q−1)), with hardsigmoid(a) = clip((a + 1)/2, 0, 1).
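A minimal numpy transcription of this footnote, as printed, may help make the rounding step concrete; the function and variable names below are illustrative and not taken from the chapter.

    import numpy as np

    def hardsigmoid(a):
        # Footnote definition: clip((a + 1) / 2, 0, 1).
        return np.clip((a + 1.0) / 2.0, 0.0, 1.0)

    def hardtanh(x):
        # Standard hard tanh: clip(x, -1, 1).
        return np.clip(x, -1.0, 1.0)

    def q_hardtanh(a, Q):
        # Quantized hardtanh, transcribed literally from footnote 1:
        # squash to [0, 1], round onto a grid with step 2^-(Q-1), clip, rescale.
        grid = 2.0 ** (Q - 1)
        return 2.0 * hardtanh(np.round(hardsigmoid(a) * grid) / grid)

    # Example: pre-activations quantized with Q = 4 bits of resolution.
    a = np.linspace(-2.0, 2.0, 9)
    print(q_hardtanh(a, Q=4))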
 
Metadata
Title
Hardware-Algorithm Co-optimizations
Authors
Bert Moons
Daniel Bankman
Marian Verhelst
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-319-99223-5_3
