1 Introduction
2 Related Work
3 Background on CNNs
3.1 Fully-Connected Layers
3.2 Convolutional Layers
3.3 Distributed Inference
4 Layer Partitioning Methods
4.1 Partitioning of Feature-dominated Layers
4.2 Partitioning of Weight-dominated Layers
4.2.1 Layer Output Partitioning (LOP)
4.2.2 Layer Input Partitioning (LIP)
4.2.3 Fused Layer Partitioning (FUSE)
4.2.4 Partitioning of Convolutional Layers
5 ILP-based Optimization of Partitioning Decisions
5.1 ILP-based Memory Footprint Minimization
5.2 ILP-based Communication Optimization for Weight Partitioned Layers
6 Experimental Evaluation
6.1 Evaluation of the ILP-based Optimization Methods
6.1.1 Evaluation of ILP-based Memory Footprint Minimization
Model | L | First OWP layer | \(F_n^{(SINGLE)}\) [MB] | \(F_n^{(FULL)}\) [MB] | \(F_n\) reduction | \(F_n^{(SEQ)}\) [MB] |
---|---|---|---|---|---|---|
YOLOv2 | 32 | 13 | 256 | 28.4 | 9.0x | 51.8 |
AlexNet | 14 | 3 | 16.8 | 2.65 | 6.3x | 5.82 |
VGG-16 | 25 | 8 | 84.5 | 13.2 | 6.4x | 25.8 |
GoogLeNet | 27 | 5 | 97.5 | 12.2 | 8.0x | 20.1 |
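
The \(F_n\) reduction column appears to be the ratio of the \(F_n^{(SINGLE)}\) column to the \(F_n^{(FULL)}\) column. The sketch below recomputes it from the printed values as a consistency check. The ratio definition is an assumption inferred from the numbers, and the names `footprints_mb` and `reduction_factor` are illustrative only.

```python
# Values copied from the table above: (F_n^(SINGLE), F_n^(FULL)) in MB.
footprints_mb = {
    "YOLOv2": (256.0, 28.4),
    "AlexNet": (16.8, 2.65),
    "VGG-16": (84.5, 13.2),
    "GoogLeNet": (97.5, 12.2),
}


def reduction_factor(single_mb: float, full_mb: float) -> float:
    """Assumed definition: SINGLE footprint divided by FULL footprint."""
    return single_mb / full_mb


# Round to one decimal place, as in the table.
reductions = {
    model: round(reduction_factor(single, full), 1)
    for model, (single, full) in footprints_mb.items()
}

for model, factor in reductions.items():
    print(f"{model}: {factor}x")
```

Rounded to one decimal place, the recomputed factors agree with the reported column for all four models.
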
6.1.2 Evaluation of ILP-based Communication Optimization
Model | \(C^{(LOP)}\) [MB] | \(C^{(OWP)}\) [MB] | \(C^{(OWP)}\) Saving [%] | \(C\) ([25]) [MB] |
---|---|---|---|---|
YOLOv2 | 84.0 | 59.8 | 28.8 | 61.0 |
AlexNet | 9.69 | 9.11 | 5.96 | 9.11 |
VGG-16 | 202 | 182 | 10.1 | 186 |
Extraction GoogLeNet | 47.0 | 41.8 | 11.1 | 41.8 |
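
The saving column appears to be the relative reduction of \(C^{(OWP)}\) with respect to \(C^{(LOP)}\), expressed in percent. The sketch below recomputes it from the printed values. The formula is an assumption inferred from the numbers, and `comm_mb` and `saving_percent` are illustrative names.

```python
# Values copied from the table above: (C^(LOP), C^(OWP)) in MB.
comm_mb = {
    "YOLOv2": (84.0, 59.8),
    "AlexNet": (9.69, 9.11),
    "VGG-16": (202.0, 182.0),
    "Extraction GoogLeNet": (47.0, 41.8),
}

# Saving column as printed in the table, in percent.
reported_saving = {
    "YOLOv2": 28.8,
    "AlexNet": 5.96,
    "VGG-16": 10.1,
    "Extraction GoogLeNet": 11.1,
}


def saving_percent(lop_mb: float, owp_mb: float) -> float:
    """Assumed definition: relative communication saving of OWP over LOP."""
    return 100.0 * (lop_mb - owp_mb) / lop_mb


savings = {model: saving_percent(lop, owp) for model, (lop, owp) in comm_mb.items()}

for model, value in savings.items():
    gap = abs(value - reported_saving[model])
    print(f"{model}: recomputed {value:.2f} %, reported {reported_saving[model]} %, gap {gap:.2f}")
```

The recomputed savings stay within about 0.2 percentage points of the reported column. The small gaps are plausibly due to the table entries being rounded to three significant digits before printing.
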