
2018 | Original Paper | Book Chapter

An Experimental Perspective for Computation-Efficient Neural Networks Training

Authors: Lujia Yin, Xiaotao Chen, Zheng Qin, Zhaoning Zhang, Jinghua Feng, Dongsheng Li

Published in: Advanced Computer Architecture

Publisher: Springer Singapore


Abstract

Driven by the strong demand for computation-efficient neural networks that allow deep learning models to be deployed on inexpensive and widely used devices, many lightweight networks have been proposed, such as the MobileNet series and ShuffleNet. These computation-efficient models are designed for very limited computational budgets, e.g., 10–150 MFLOPs, and run efficiently on ARM-based devices. They also have a smaller CMR than large networks such as VGG, ResNet, and Inception.
Although such models are quite efficient for inference on ARM, how do they behave for inference or training on a GPU? Unfortunately, compact models usually cannot fully utilize a GPU, even though they run fast owing to their small size. In this paper, we present a series of extensive experiments on training compact models, covering training on a single host, with GPU and CPU, and in a distributed environment. We then provide analysis and suggestions for such training.
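To make the GPU-utilization point concrete, the sketch below estimates multiply-accumulate operations (MACs) and memory accesses for a standard versus a depthwise-separable convolution. The counting rules are textbook approximations, not the chapter's exact methodology (which, per its footnote, also counts pooling, lateral, and activation layers), and treating the MAC-to-access ratio as a stand-in for CMR is this sketch's assumption:

```python
# Rough estimate of MACs and memory accesses for one conv layer, to show why
# depthwise-separable (compact) models have a much lower compute-to-memory
# ratio than standard convolutions, and thus tend to underutilize a GPU.

def standard_conv(h, w, cin, cout, k):
    """Standard k x k convolution over an h x w feature map (stride 1)."""
    macs = h * w * cin * cout * k * k
    # memory accesses: read input, write output, read weights (each once)
    mem = h * w * cin + h * w * cout + k * k * cin * cout
    return macs, mem

def depthwise_separable_conv(h, w, cin, cout, k):
    """Depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    macs = h * w * cin * k * k + h * w * cin * cout
    # input, intermediate (written then read counts once each way here
    # simplified to one access), output, and both weight tensors
    mem = (h * w * cin            # input
           + h * w * cin          # depthwise output / pointwise input
           + h * w * cout         # output
           + k * k * cin          # depthwise weights
           + cin * cout)          # pointwise weights
    return macs, mem

if __name__ == "__main__":
    # A mid-network layer shape typical of ImageNet models (an illustrative
    # choice, not taken from the chapter's experiments).
    for name, fn in [("standard ", standard_conv),
                     ("separable", depthwise_separable_conv)]:
        macs, mem = fn(56, 56, 128, 128, 3)
        print(f"{name}: {macs / 1e6:7.1f} MMACs, "
              f"{mem / 1e6:5.2f} M accesses, ratio {macs / mem:6.1f}")
```

For this layer the separable variant needs roughly 8x fewer MACs but only slightly fewer memory accesses, so its MACs-per-access ratio drops by an order of magnitude; a GPU with abundant arithmetic throughput is then bounded by memory traffic, which matches the underutilization the abstract describes.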


Footnotes
1
The chipset is a Qualcomm MSM8996 Snapdragon 821 with a quad-core Kryo CPU (4 \(\times\) 2.15/2.16 GHz).
 
2
Unlike the original papers, the computational complexity and the memory accesses here also include the pooling, lateral, and activation layers.
 
References
1. Szegedy, C., Liu, W., Jia, Y., et al.: Going deeper with convolutions, pp. 1–9 (2014)
2. Szegedy, C., Ioffe, S., Vanhoucke, V., et al.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI, vol. 4, p. 12 (2017)
3. Chollet, F.: Xception: deep learning with depthwise separable convolutions. arXiv preprint (2016)
4. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
5. Huang, G., Liu, Z., Weinberger, K.Q., et al.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, no. 2, p. 3 (2017)
6. Huang, J., Rathod, V., Sun, C., et al.: Speed/accuracy trade-offs for modern convolutional object detectors. In: IEEE CVPR (2017)
7. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. Computer Science (2014)
8. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems, pp. 1097–1105. Curran Associates Inc. (2012)
9. Girshick, R., Donahue, J., Darrell, T., et al.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587. IEEE Computer Society (2014)
10. Ren, S., He, K., Girshick, R.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)
11. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Computer Vision and Pattern Recognition, pp. 3431–3440. IEEE (2015)
12. Howard, A.G., Zhu, M., Chen, B., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
13. Sandler, M., Howard, A., Zhu, M., et al.: Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. arXiv preprint arXiv:1801.04381 (2018)
14. Zhang, X., Zhou, X., Lin, M., et al.: ShuffleNet: an extremely efficient convolutional neural network for mobile devices. arXiv preprint arXiv:1707.01083 (2017)
15. Qin, Z., Zhang, Z., Chen, X., et al.: FD-MobileNet: improved MobileNet with a fast downsampling strategy. arXiv preprint arXiv:1802.03750 (2018)
16. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
17. Goyal, P., Dollár, P., Girshick, R., et al.: Accurate, large minibatch SGD: training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677 (2017)
18. You, Y., Zhang, Z., Hsieh, C.J., et al.: 100-epoch ImageNet training with AlexNet in 24 minutes. arXiv e-prints (2017)
19. Gysel, P., Motamedi, M., Ghiasi, S.: Hardware-oriented approximation of convolutional neural networks (2016)
20. Mathew, M., Desappan, K., Swami, P.K., et al.: Sparse, quantized, full frame CNN for low power embedded devices. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 328–336. IEEE Computer Society (2017)
21. Li, M.: Scaling distributed machine learning with the parameter server, p. 1 (2014)
22. Chen, T., Li, M., Li, Y., et al.: MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. Statistics (2015)
23. InfiniBand Trade Association: InfiniBand Architecture Specification: Release 1.0 (2000)
24. Padovano, M.: System and method for accessing a storage area network as network attached storage. Patent US6606690 (2003)
25. Kågström, B., Ling, P., van Loan, C.: GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark. ACM Trans. Math. Softw. (TOMS) 24(3), 268–302 (1998)
26. Williams, S., Patterson, D., Oliker, L., et al.: The roofline model: a pedagogical tool for auto-tuning kernels on multicore architectures. In: Hot Chips, vol. 20, pp. 24–26 (2008)
27. Sifre, L.: Rigid-motion scattering for image classification. Ph.D. thesis (2014)
Metadata
Title
An Experimental Perspective for Computation-Efficient Neural Networks Training
Authors
Lujia Yin
Xiaotao Chen
Zheng Qin
Zhaoning Zhang
Jinghua Feng
Dongsheng Li
Copyright Year
2018
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-13-2423-9_13
