Introduction
Related work
Lightweight convolutional neural network model
Attention mechanism
Approach
Motivation
Model | Stem layer kernel size | Downsample rate (stride) | Output feature map size | Stages | Channels |
---|---|---|---|---|---|
AlexNet | conv 11 × 11 | 4 | 56 × 56 | 4 | 96 |
GoogLeNet | conv 7 × 7 | 2 | 112 × 112 | 4 | 64 |
ResNet | conv 7 × 7 | 4 | 56 × 56 | 4 | 64 |
VGGNet | conv 3 × 3 | 1 | 224 × 224 | 4 | 64 |
ConvNeXt | conv 4 × 4 | 4 | 56 × 56 | 4 | 96 |
ShuffleNetV2 | conv 3 × 3 | 4 | 56 × 56 | 3 | 24 |
GhostNet | conv 3 × 3 | 2 | 112 × 112 | 4 | 16 |
EfficientNet | conv 3 × 3 | 2 | 112 × 112 | 4 | 32 |
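The stem comparison above is simple arithmetic: the output feature-map size equals the input resolution divided by the stem's total downsampling rate. A minimal sketch, assuming the standard 224 × 224 ImageNet input and "same" padding so only the stride reduces resolution:

```python
# Output spatial size of a stem = input size // total downsample rate,
# assuming "same" padding so only the stride shrinks the feature map.
def stem_output_size(input_size: int, downsample_rate: int) -> int:
    return input_size // downsample_rate

# Values match the comparison table for a 224 x 224 input.
print(stem_output_size(224, 4))  # AlexNet / ResNet / ConvNeXt -> 56
print(stem_output_size(224, 2))  # GoogLeNet / GhostNet / EfficientNet -> 112
print(stem_output_size(224, 1))  # VGGNet -> 224
```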
Overall architecture
Output size | Layer | CCNNet 1.0X | CCNNet 1.5X | CCNNet 2.0X | CCNNet 3.0X |
---|---|---|---|---|---|
\(72 \times 72\) | Stem | \(3 \times 3, 128, \text{stride} = 3\); \(\left[ 3 \times 3, 128 \right] \times 2\) | \(3 \times 3, 160, \text{stride} = 3\); \(\left[ 3 \times 3, 160 \right] \times 2\) | \(3 \times 3, 192, \text{stride} = 3\); \(\left[ 3 \times 3, 192 \right] \times 2\) | \(3 \times 3, 256, \text{stride} = 3\); \(\left[ 3 \times 3, 256 \right] \times 2\) |
Stage1 | GCIR | \(\left[ {\begin{array}{*{20}c} {3 \times 3,128} \\ {MDCA} \\ {1 \times 1,512} \\ {1 \times 1,128} \\ \end{array} } \right] \times 3\) | \(\left[ {\begin{array}{*{20}c} {3 \times 3,160} \\ {MDCA} \\ {1 \times 1,640} \\ {1 \times 1,160} \\ \end{array} } \right] \times 3\) | \(\left[ {\begin{array}{*{20}c} {3 \times 3,192} \\ {MDCA} \\ {1 \times 1,768} \\ {1 \times 1,192} \\ \end{array} } \right] \times 3\) | \(\left[ {\begin{array}{*{20}c} {3 \times 3,256} \\ {MDCA} \\ {1 \times 1,1024} \\ {1 \times 1,256} \\ \end{array} } \right] \times 3\) |
\(24 \times 24\) | DS | \(3 \times 3, 64, \text{stride} = 3\) | \(3 \times 3, 80, \text{stride} = 3\) | \(3 \times 3, 96, \text{stride} = 3\) | \(3 \times 3, 128, \text{stride} = 3\) |
Stage2 | GCIR | \(\left[ {\begin{array}{*{20}c} {3 \times 3,64} \\ {MDCA} \\ {1 \times 1,256} \\ {1 \times 1,64} \\ \end{array} } \right] \times 9\) | \(\left[ {\begin{array}{*{20}c} {3 \times 3,80} \\ {MDCA} \\ {1 \times 1,320} \\ {1 \times 1,80} \\ \end{array} } \right] \times 9\) | \(\left[ {\begin{array}{*{20}c} {3 \times 3,96} \\ {MDCA} \\ {1 \times 1,384} \\ {1 \times 1,96} \\ \end{array} } \right] \times 9\) | \(\left[ {\begin{array}{*{20}c} {3 \times 3,128} \\ {MDCA} \\ {1 \times 1,512} \\ {1 \times 1,128} \\ \end{array} } \right] \times 9\) |
\(8 \times 8\) | DS | \(3 \times 3, 128, \text{stride} = 3\) | \(3 \times 3, 160, \text{stride} = 3\) | \(3 \times 3, 192, \text{stride} = 3\) | \(3 \times 3, 256, \text{stride} = 3\) |
Stage3 | GCIR | \(\left[ {\begin{array}{*{20}c} {3 \times 3,128} \\ {MDCA} \\ {1 \times 1,512} \\ {1 \times 1,128} \\ \end{array} } \right] \times 3\) | \(\left[ {\begin{array}{*{20}c} {3 \times 3,160} \\ {MDCA} \\ {1 \times 1,640} \\ {1 \times 1,160} \\ \end{array} } \right] \times 3\) | \(\left[ {\begin{array}{*{20}c} {3 \times 3,192} \\ {MDCA} \\ {1 \times 1,768} \\ {1 \times 1,192} \\ \end{array} } \right] \times 3\) | \(\left[ {\begin{array}{*{20}c} {3 \times 3,256} \\ {MDCA} \\ {1 \times 1,1024} \\ {1 \times 1,256} \\ \end{array} } \right] \times 3\) |
\(1 \times 1\) | GAP | \(AvgPool(1 \times 1)\) | | | |
\(1 \times 1\) | FC | \(1 \times 1, 1000\) | | | |
 | Parameters | 1.4 M | 2.2 M | 3.1 M | 5.4 M |
 | FLOPs | 0.36 B | 0.46 B | 0.55 B | 0.74 B |
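The stage layout in the table can be checked as a shape walk-through: stem and each DS layer divide the spatial resolution by 3, while GCIR blocks preserve it. A minimal sketch in plain Python for CCNNet 1.0X, assuming a 216 × 216 input (a value chosen so the stride-3 stem yields the listed 72 × 72 map; the paper's exact input size is not stated in this table):

```python
# Trace (spatial size, channel width) through CCNNet 1.0X per the
# architecture table. The 216 x 216 input is an assumption chosen so
# the stride-3 stem produces the listed 72 x 72 feature map.
def trace_ccnnet_1_0x(input_size: int = 216):
    stages = []
    s = input_size // 3       # Stem: 3x3 conv, stride 3 -> 72
    stages.append((s, 128))   # Stage1: 3 GCIR blocks, shape-preserving
    s //= 3                   # DS: 3x3 conv, stride 3 -> 24
    stages.append((s, 64))    # Stage2: 9 GCIR blocks at 64 channels
    s //= 3                   # DS: 3x3 conv, stride 3 -> 8
    stages.append((s, 128))   # Stage3: 3 GCIR blocks at 128 channels
    stages.append((1, 128))   # Global average pooling -> 1 x 1
    return stages

print(trace_ccnnet_1_0x())  # [(72, 128), (24, 64), (8, 128), (1, 128)]
```

The resolutions reproduce the table's 72 → 24 → 8 → 1 progression; only the channel widths differ across the 1.0X–3.0X variants.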
Stem layer
GCIR block
Downsample layer
MDCA (Multi-Dimension Channel Attention)
Experiments
Experimental environment
Datasets
Experimental results and analysis
Validation of the CCNNet model
Model | Parameters/M | FLOPs/B | ImageNet Top-1 (%) | CIFAR-10 Top-1 (%) | CIFAR-100 Top-1 (%) |
---|---|---|---|---|---|
MobileNetV3-Small | 2.5 | 0.3 | 67.4 | 89.6 | 74.2 |
ShuffleNetV2 0.5X | 1.4 | 0.15 | 61.1 | 88.9 | 71.4 |
GhostNet 0.5X | 2.6 | 0.14 | 66.2 | 89.8 | 74.1 |
MobileNeXt 0.5X | 1.8 | 0.3 | 67.7 | 89.9 | 75.2 |
MobileViT-XXS | 1.3 | 0.7 | 69.0 | 90.2 | 76.3 |
CCNNet(Ours) | 1.4 | 0.36 | 70.1 | 90.8 | 77.6 |
Model | Parameters/M | FLOPs/B | ImageNet Top-1 (%) | CIFAR-10 Top-1 (%) | CIFAR-100 Top-1 (%) | TCM-100 Top-1 (%) |
---|---|---|---|---|---|---|
CCNNet(1.0X) | 1.4 | 0.36 | 70.1 | 90.8 | 77.6 | 86.8 |
CCNNet(1.5X) | 2.1 | 0.46 | 73.2 | 91.9 | 79.4 | 87.3 |
CCNNet(2.0X) | 3.1 | 0.55 | 75.1 | 93.8 | 80.1 | 89.2 |
CCNNet(3.0X) | 5.4 | 0.74 | 76.3 | 94.7 | 81.6 | 92.5 |
Model | Parameters/M | MAdds/M | Top-1 Acc (%) |
---|---|---|---|
CCNNet 2.0X (ours) | 3.1 | 546 | 75.1 |
MobileNeXt (1.0x) | 3.4 | 300 | 74.0 |
MobileNetV2 | 3.6 | 340 | 72.3 |
ShuffleNetV2 (1.5x) | 3.4 | 292 | 72.6 |
MobileNetV1 | 3.6 | 578 | 70.9 |
MobileNetV3-Small (1.25x) | 3.6 | 100 | 70.6 |
PP-LCNet (1x) | 3.0 | 161 | 71.3 |
CCNNet validation experiments on TCM-100
Metrics | ShuffleNetV2 | MobileNeXt | CCNNet1.0X(ours) |
---|---|---|---|
Training accuracy | 0.970 | 0.952 | 0.9706 |
Training loss | 0.098 | 0.159 | 0.100 |
Testing accuracy | 0.836 | 0.829 | 0.868 |
Testing loss | 0.599 | 0.633 | 0.421 |
Ablation study
The downsampling rate
Model | Downsample rate | Epochs | ImageNet Top-1 (%) | CIFAR-10 Top-1 (%) | CIFAR-100 Top-1 (%) |
---|---|---|---|---|---|
ConvNeXt | 2 | 200 | 83.42 | 90.73 | 76.52 |
ConvNeXt | 3 | 200 | 83.39 | 90.69 | 76.48 |
ConvNeXt | 4 | 200 | 83.27 | 90.63 | 76.43 |
MDCA attention module
Setting | Parameters/M | MAdds/M | Top-1 Acc (%) |
---|---|---|---|
CCNNet + MDCA | 2.2 | 366 | 73.2 |
CCNNet + SE | 2.2 | 366 | 72.7 |
CCNNet + CBAM | 2.2 | 366 | 72.3 |
CCNNet + CA | 2.35 | 381 | 73.3 |
CCNNet + SimAM | 2.2 | 366 | 71.8 |