Published in: Cluster Computing 4/2021

10.07.2021

Data scheduling and placement in deep learning accelerator

Authors: Seyedeh Yasaman Hosseini Mirmahaleh, Midia Reshadi, Nader Bagherzadeh, Ahmad Khademzadeh

Abstract

Deep neural networks (DNNs), as a popular class of machine learning (ML) algorithms, have been deployed on a wide range of devices owing to the spread of the Internet of Things (IoT), data mining in cloud computing, and web search engines, and ML has had an impressive effect on IoT edge-level nodes. Deploying DNN-based applications leads to memory access problems, including communication delay, energy efficiency, and bandwidth requirements. We propose a bus-scheduling scheme for data placement on distributed local buffers in a deep learning accelerator (DLA). The contributions of this paper are: (1) a method for data-flow mapping between off-chip DRAM and distributed local buffers, together with a flow-mapping approach for data transfer between the distributed local buffers and processing elements (PEs); (2) the use of distributed local buffers in four directions for traffic distribution on a mesh, based on the memory access mechanism; and (3) bus scheduling for data placement on the distributed local buffers. Simulated experiments with typical DNN workloads (i.e., AlexNet, VGG-16, and GoogLeNet) demonstrate the effectiveness of the design: (1) the scheduling and mapping methods improve total runtime and bandwidth requirement by approximately 42.29% and 88.95%, respectively, compared with the TPU; and (2) our methods reduce the total runtime of the row-column stationary plus data-flow by approximately 99% compared with the weight-stationary data-flow in CONV1 and CONV11 of VGG-16. This work reports simulation results for distributing the traffic of AlexNet, VGG-16, and GoogLeNet as popular CNN and DNN models, and it also investigates the method's efficiency for other trained models.
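To make the third contribution concrete, the following Python sketch shows one plausible way a shared DRAM bus could be scheduled round-robin over four directional local buffers (north, east, south, west) of a PE mesh. This is a minimal sketch under our own assumptions, not the paper's implementation; all names (Tile, LocalBuffer, schedule_bus) and the round-robin placement policy are illustrative.

```python
# Illustrative sketch of bus scheduling for data placement on distributed
# local buffers -- NOT the paper's implementation. All names and the
# round-robin policy are hypothetical.
from collections import deque
from dataclasses import dataclass

@dataclass
class Tile:
    layer: str   # e.g., "CONV1"
    kind: str    # "weight" or "ifmap"
    size: int    # bytes

class LocalBuffer:
    """A local buffer on one side (N/E/S/W) of the PE mesh."""
    def __init__(self, side: str, capacity: int):
        self.side = side
        self.capacity = capacity
        self.used = 0
        self.tiles = []

    def can_hold(self, tile: Tile) -> bool:
        return self.used + tile.size <= self.capacity

    def place(self, tile: Tile) -> None:
        self.tiles.append(tile)
        self.used += tile.size

def schedule_bus(tiles, buffers):
    """Round-robin the shared DRAM bus over the directional buffers,
    placing each tile into the next buffer with room. Returns the
    bus-cycle order of (buffer side, layer, kind) placements."""
    order = deque(buffers)
    placements = []
    for tile in tiles:
        for _ in range(len(order)):
            buf = order[0]
            order.rotate(-1)  # advance the round-robin pointer
            if buf.can_hold(tile):
                buf.place(tile)
                placements.append((buf.side, tile.layer, tile.kind))
                break
        else:
            raise MemoryError(f"no buffer can hold {tile}")
    return placements

if __name__ == "__main__":
    bufs = [LocalBuffer(s, capacity=4096) for s in ("N", "E", "S", "W")]
    traffic = [Tile("CONV1", "weight", 1024), Tile("CONV1", "ifmap", 512),
               Tile("CONV2", "weight", 1024)]
    for step in schedule_bus(traffic, bufs):
        print(step)
```

Running the example prints the bus-cycle sequence in which tiles land on the four buffers, i.e., the data-placement order that a PE mesh would then consume.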

Metadata
Title
Data scheduling and placement in deep learning accelerator
Authors
Seyedeh Yasaman Hosseini Mirmahaleh
Midia Reshadi
Nader Bagherzadeh
Ahmad Khademzadeh
Publication date
10.07.2021
Publisher
Springer US
Published in
Cluster Computing / Issue 4/2021
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-021-03355-8
