
2024 | OriginalPaper | Book Chapter

2. Basics and Research Status of Neural Network Processors

Author: Jinshan Yue

Published in: High Energy Efficiency Neural Network Processor with Combined Digital and Computing-in-Memory Architecture

Publisher: Springer Nature Singapore


Abstract

The chapter begins with a historical overview of neural network algorithms, tracing their development from early models such as the M-P model and the single-layer perceptron to modern advances such as deep learning and Transformers. It then turns to the basics of neural network processors, discussing key metrics such as performance, power, and energy efficiency. The research status of digital-circuit-based neural network processors is examined, with an emphasis on optimization techniques such as data reuse, low-bit quantization, and network model compression. The chapter also covers computing-in-memory (CIM) neural network processors, discussing their principles, devices, circuits, and macro-level designs. It highlights the potential of CIM for higher energy efficiency, but also points out the challenges and the unexplored research space. The chapter concludes with a summary of existing challenges and the need for further research on both digital-circuit and CIM neural network processors.
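
The abstract names low-bit quantization and energy efficiency as central themes. As a minimal sketch of the underlying arithmetic (an illustration only, not code from the chapter), the following Python fragment quantizes weights and activations to 8-bit integers and evaluates a dot product the way a digital processor's integer multiply-accumulate (MAC) array would; quantize_int8 and all values here are assumptions made for this example.

    import numpy as np

    def quantize_int8(x):
        """Symmetric linear quantization: x is approximated by scale * q."""
        # NOTE: illustrative scheme assumed for this sketch, not the
        # chapter's specific quantization method.
        scale = np.max(np.abs(x)) / 127.0
        q = np.round(x / scale).astype(np.int8)
        return q, scale

    # Toy weight and activation vectors (illustrative values only).
    rng = np.random.default_rng(0)
    w = rng.standard_normal(64).astype(np.float32)
    a = rng.standard_normal(64).astype(np.float32)

    w_q, w_scale = quantize_int8(w)
    a_q, a_scale = quantize_int8(a)

    # Integer MAC: multiply int8 operands, accumulate in a wide register
    # (int32 here), then rescale once at the end -- the pattern a digital
    # processing-element array implements in hardware.
    acc = int(np.dot(w_q.astype(np.int32), a_q.astype(np.int32)))
    approx = acc * w_scale * a_scale

    print("float32 dot product :", float(np.dot(w, a)))
    print("int8 MAC + rescale  :", approx)

The same integer dot product is what a CIM macro evaluates in place: the quantized weights stay resident in the memory array and partial sums accumulate along the bitlines, avoiding much of the weight movement that dominates a digital design's energy budget. As a worked instance of the efficiency metric the chapter discusses, a design delivering 10^12 such 8-bit operations per second at 100 mW would be rated 10 TOPS/W.
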


Metadata
Title
Basics and Research Status of Neural Network Processors
Author
Jinshan Yue
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-3477-1_2