DOI: 10.1145/3318216.3363312
Research article (Public Access)

Adaptive parallel execution of deep neural networks on heterogeneous edge devices

Published: 07 November 2019

ABSTRACT

New applications such as smart homes, smart cities, and autonomous vehicles are driving increased interest in deploying machine learning on edge devices. Unfortunately, deploying deep neural networks (DNNs) on resource-constrained devices presents significant challenges: these workloads are computationally intensive and often demand cloud-scale resources. Prior solutions attempted to address these challenges either by requiring substantial additional design effort or by relying on cloud resources for assistance.

In this paper, we propose a runtime-adaptive convolutional neural network (CNN) acceleration framework optimized for heterogeneous Internet of Things (IoT) environments. The framework applies spatial partitioning through fusion of the convolution layers and dynamically selects the optimal degree of parallelism according to the availability of computational resources and the network conditions. Our evaluation shows that the framework outperforms state-of-the-art approaches, improving inference speed and reducing communication cost while running on wirelessly connected Raspberry Pi 3 devices. Experiments show speedups of 1.9x to 3.7x using 8 devices for three popular CNN models.
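The core mechanism the abstract describes, spatial partitioning of a convolution's input across devices, can be illustrated with a minimal single-channel sketch. This is an assumption-laden toy, not the authors' implementation: it splits the input feature map into horizontal strips, adds the overlapping "halo" rows each strip needs at its boundary, and runs a valid 3x3 convolution on each strip independently, as separate edge devices would. The helper names (`conv2d`, `partitioned_conv`) are hypothetical.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2D cross-correlation of single-channel input x with kernel k."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def partitioned_conv(x, k, n_parts):
    """Spatially partition x into n_parts horizontal strips (plus halo rows),
    convolve each strip independently, and stack the partial outputs.
    Each strip's convolution could run on a different device."""
    kh = k.shape[0]
    halo = kh - 1                       # extra input rows needed at strip edges
    out_rows = x.shape[0] - kh + 1
    bounds = np.linspace(0, out_rows, n_parts + 1, dtype=int)
    parts = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        strip = x[lo:hi + halo, :]      # input rows for this strip, incl. halo
        parts.append(conv2d(strip, k))  # independent per-device work
    return np.vstack(parts)
```

Stacking the strip outputs reproduces the unpartitioned result exactly. The layer-fusion idea in the paper extends this picture: by widening the halo to cover several consecutive convolution layers, each device can compute its strip through the whole fused stack without exchanging intermediate feature maps, which is where the communication savings come from.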


Published in:
SEC '19: Proceedings of the 4th ACM/IEEE Symposium on Edge Computing
November 2019, 455 pages
ISBN: 9781450367332
DOI: 10.1145/3318216
Copyright © 2019 ACM
Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SEC '19 paper acceptance rate: 20 of 59 submissions (34%). Overall acceptance rate: 40 of 100 submissions (40%).
