Introduction
Related work
Deep learning applications
- Computer vision
- Machine translation
- Smart cars
- Robotics
- Health monitoring
- Disease prediction
- Medical image analysis
- Drug discovery
- Biomedicine
- Bioinformatics
- Smart clothing
- Personal health advisors
- Pixel restoration in photos
- Sound restoration in videos
- Photo description
- Handwriting recognition
- Natural disaster prediction
- Cyber-physical security systems [12]
- Intelligent transportation systems [13]
- Computed tomography image reconstruction [14]
Method
Results and discussion
- LayerFile: file containing the description of the neural network layers.
- dataFlowFile: file describing the dataflow.
- vectorWidth: width of the vectors.
- NoCBandwidth: bandwidth of the network-on-chip (NoC).
- multicastSupported: logical indicator (True/False) specifying whether the NoC supports multicast.
- numAverageHopsinNoC: average number of hops in the NoC.
- numPEs: number of processing elements (PEs).
No. | Input parameter | Value |
---|---|---|
1 | LayerFile | Vgg16_conv11 |
2 | dataFlowFile | NLR.m, NVDLA.m |
3 | vectorWidth | 64 |
4 | NoCBandwidth | 128 |
5 | multicastSupported | True (1) |
6 | numAverageHopsinNoC | 4 |
7 | numPEs | 32 |
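One way to keep the input parameters in one place, and to catch obviously invalid values before an analysis run, is a small typed configuration object. The sketch below is a hypothetical illustration: the class `AnalysisInputs`, its snake_case field names, and the positivity checks are assumptions, not part of the original tool; only the parameter names and values come from the table above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class AnalysisInputs:
    """Hypothetical container for the analysis input parameters.

    Field names are Python-style renderings of the parameter names in the
    table; the class itself is an illustration, not the original tool.
    """
    layer_file: str                  # LayerFile: neural network layer description
    dataflow_files: Tuple[str, ...]  # dataFlowFile: one dataflow description per mapping
    vector_width: int                # vectorWidth: width of the vectors
    noc_bandwidth: int               # NoCBandwidth: bandwidth of the network-on-chip
    multicast_supported: bool        # multicastSupported: NoC supports multicast (True/False)
    num_average_hops_in_noc: int     # numAverageHopsinNoC: average hops in the NoC
    num_pes: int                     # numPEs: number of processing elements

    def __post_init__(self) -> None:
        # Assumed sanity checks: sizes and counts must be strictly positive.
        for name in ("vector_width", "noc_bandwidth", "num_average_hops_in_noc", "num_pes"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        if not self.dataflow_files:
            raise ValueError("at least one dataflow file is required")

    def as_table(self) -> Dict[str, Union[str, int, bool]]:
        """Map the fields back to the parameter names used in the table."""
        return {
            "LayerFile": self.layer_file,
            "dataFlowFile": ", ".join(self.dataflow_files),
            "vectorWidth": self.vector_width,
            "NoCBandwidth": self.noc_bandwidth,
            "multicastSupported": self.multicast_supported,
            "numAverageHopsinNoC": self.num_average_hops_in_noc,
            "numPEs": self.num_pes,
        }


# Values from the input-parameter table (VGG16 conv11 layer, two dataflows).
EXPERIMENT = AnalysisInputs(
    layer_file="Vgg16_conv11",
    dataflow_files=("NLR.m", "NVDLA.m"),
    vector_width=64,
    noc_bandwidth=128,
    multicast_supported=True,
    num_average_hops_in_noc=4,
    num_pes=32,
)
```

Freezing the dataclass keeps a configuration from being modified partway through a sweep; a new configuration is created for each hardware variant instead.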
Dataflow | NLR | NVDLA |
---|---|---|
Buffer analysis | | |
L1 Buffer Requirement (bytes) | 18.00 | 66.00 |
L2 Buffer Requirement (KB) | 1.12 | 4.12 |
L1 reads (L1RdSum) | 7,225,344 | 451,584 |
L1 writes (L1WrSum) | 7,225,344 | 451,584 |
L2 reads (L2RdSum) | 462,422,016 | 28,901,376 |
L2 writes (L2WrSum) | 462,422,016 | 28,901,376 |
L1 weight reuse | 1 | 16 |
L1 input reuse | 4 | 16 |
L2 weight reuse | 448 | 190.26 |
L2 input reuse | 2,633 | 4,473 |
NoC analysis | | |
L1 to L2 NoC BW | 128 | 32 |
L2 to L1 NoC BW | 160 | 1,024 |
Performance analysis | | |
L1 to L2 Sum | 56 | 32 |
L1 to L2 Delay | 4.43 | 4.25 |
L2 to L1 Delay | 0 | 0 |
Roofline Throughput (GFLOPS at 1 GHz clock) | 896 | 128 |
Compute Runtime | 169 | 421 |
Total Runtime (cycles) | 1,428,553,728 | 384,072,192 |
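The main comparisons in the results table reduce to a few ratios. The sketch below recomputes them from the reported values. The `DataflowResult` class and the helper functions are hypothetical names introduced for illustration, and converting cycles to seconds assumes that the 1 GHz clock quoted in the roofline row applies to the whole run.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataflowResult:
    """Subset of the per-dataflow results (values copied from the table)."""
    name: str
    l1_rd_sum: int
    l1_wr_sum: int
    l2_rd_sum: int
    l2_wr_sum: int
    roofline_gflops: float       # reported at a 1 GHz clock
    total_runtime_cycles: int

    def l1_accesses(self) -> int:
        return self.l1_rd_sum + self.l1_wr_sum

    def l2_accesses(self) -> int:
        return self.l2_rd_sum + self.l2_wr_sum


def flops_per_cycle(gflops: float, clock_ghz: float = 1.0) -> float:
    # GFLOPS / GHz = (1e9 FLOP/s) / (1e9 cycles/s) = FLOP per cycle.
    return gflops / clock_ghz


def runtime_seconds(cycles: int, clock_ghz: float = 1.0) -> float:
    # Assumption: the same 1 GHz clock as the roofline row.
    return cycles / (clock_ghz * 1e9)


def compare(a: DataflowResult, b: DataflowResult) -> Dict[str, float]:
    """Ratios of dataflow a to dataflow b (values > 1 mean a is larger)."""
    return {
        "total_runtime": a.total_runtime_cycles / b.total_runtime_cycles,
        "l1_accesses": a.l1_accesses() / b.l1_accesses(),
        "l2_accesses": a.l2_accesses() / b.l2_accesses(),
        "roofline": a.roofline_gflops / b.roofline_gflops,
    }


NLR = DataflowResult("NLR", 7_225_344, 7_225_344, 462_422_016, 462_422_016, 896, 1_428_553_728)
NVDLA = DataflowResult("NVDLA", 451_584, 451_584, 28_901_376, 28_901_376, 128, 384_072_192)

ratios = compare(NLR, NVDLA)
print(round(ratios["total_runtime"], 2))             # 3.72
print(ratios["l1_accesses"], ratios["l2_accesses"])  # 16.0 16.0
print(flops_per_cycle(NLR.roofline_gflops))          # 896.0
```

Read this way, the table shows that NLR issues 16 times as many L1 and L2 buffer accesses as NVDLA and needs about 3.7 times as many cycles, even though its roofline throughput is 7 times higher. For both dataflows, L2 traffic is exactly 64 times L1 traffic.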