Adaptive parallel execution of deep neural networks on heterogeneous edge devices

Authors:
Li Zhou, The Ohio State University
Mohammad Hossein Samavatian, The Ohio State University
Anys Bacha, University of Michigan-Dearborn
Saikat Majumdar, The Ohio State University
Radu Teodorescu, The Ohio State University

SEC '19: Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, November 2019, Pages 195–208. https://doi.org/10.1145/3318216.3363312

Published: 07 November 2019

ABSTRACT

New applications such as smart homes, smart cities, and autonomous vehicles are driving increased interest in deploying machine learning on edge devices. Unfortunately, deploying deep neural networks (DNNs) on resource-constrained devices presents significant challenges. These workloads are computationally intensive and often require cloud-like resources. Prior solutions have attempted to address these challenges either by requiring additional design effort or by relying on cloud resources for assistance.

In this paper, we propose a runtime-adaptive convolutional neural network (CNN) acceleration framework that is optimized for heterogeneous Internet of Things (IoT) environments. The framework applies spatial partitioning by fusing the convolution layers and dynamically selects the optimal degree of parallelism according to the availability of computational resources and the network conditions. Our evaluation shows that the framework outperforms state-of-the-art approaches by improving inference speed and reducing communication costs while running on wirelessly connected Raspberry Pi 3 devices. Using 8 devices, it achieves peak speedups of 1.9x to 3.7x across three popular CNN models.
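
To make the idea of spatially partitioning fused convolution layers concrete, the following is a minimal sketch and not the framework's actual implementation. It works along a single spatial axis: for a stack of fused layers it maps a tile of output rows back to the input rows that tile depends on, and then picks a degree of parallelism with a toy cost model. All names (`ConvLayer`, `input_range`, `split_rows`, `pick_degree`) and the cost parameters `compute_per_row` and `comm_per_device` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ConvLayer:
    kernel: int   # kernel size along the partitioned axis
    stride: int
    padding: int  # zero padding on each side


def layer_sizes(layers: List[ConvLayer], in_size: int) -> List[int]:
    """sizes[i] is the input size of layer i; sizes[-1] is the final output size."""
    sizes = [in_size]
    for layer in layers:
        sizes.append((sizes[-1] + 2 * layer.padding - layer.kernel) // layer.stride + 1)
    return sizes


def input_range(layers: List[ConvLayer], in_size: int,
                out_lo: int, out_hi: int) -> Tuple[int, int]:
    """Map inclusive output rows [out_lo, out_hi] of the last fused layer back
    to the inclusive input rows of the first layer, clipped to valid rows."""
    sizes = layer_sizes(layers, in_size)
    lo, hi = out_lo, out_hi
    for i in range(len(layers) - 1, -1, -1):
        layer = layers[i]
        lo = max(0, lo * layer.stride - layer.padding)
        hi = min(sizes[i] - 1, hi * layer.stride - layer.padding + layer.kernel - 1)
    return lo, hi


def split_rows(out_size: int, n: int) -> List[Tuple[int, int]]:
    """Split out_size output rows into at most n contiguous, near-equal tiles."""
    base, extra = divmod(out_size, n)
    tiles, start = [], 0
    for i in range(n):
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue
        tiles.append((start, start + length - 1))
        start += length
    return tiles


def pick_degree(layers: List[ConvLayer], in_size: int, max_devices: int,
                compute_per_row: float, comm_per_device: float) -> int:
    """Toy cost model (an assumption, not the paper's): the slowest tile's
    input rows stand in for compute time, plus a fixed cost per extra device."""
    out_size = layer_sizes(layers, in_size)[-1]
    best_n, best_cost = 1, float("inf")
    for n in range(1, max_devices + 1):
        tiles = split_rows(out_size, n)
        widest = max(hi - lo + 1
                     for lo, hi in (input_range(layers, in_size, a, b) for a, b in tiles))
        cost = widest * compute_per_row + (len(tiles) - 1) * comm_per_device
        if cost < best_cost:
            best_n, best_cost = len(tiles), cost
    return best_n
```

For two fused 3x3, stride-1, padding-1 layers on an 8-row input, the tile covering output rows 0 to 3 needs input rows 0 to 5, and the tile covering rows 4 to 7 needs rows 2 to 7. The four overlapping rows are the redundant work that fusion trades for less inter-device communication, which is why more devices do not always lower the estimated cost in this sketch.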

Index Terms

Adaptive parallel execution of deep neural networks on heterogeneous edge devices
1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
      1. Embedded hardware
2. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
  2. Parallel computing methodologies
    1. Parallel algorithms
      1. Massively parallel algorithms

Published in
SEC '19: Proceedings of the 4th ACM/IEEE Symposium on Edge Computing
November 2019
455 pages
ISBN: 9781450367332
DOI: 10.1145/3318216
General Chairs: Songqing Chen (George Mason University), Ryokichi Onishi (Toyota)
Program Chairs: Ganesh Ananthanarayanan (Microsoft Research), Qun Li (College of William & Mary)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
deep learning
edge devices
inference
parallel execution
Qualifiers
- research-article
Acceptance Rates
SEC '19 Paper Acceptance Rate: 20 of 59 submissions, 34%. Overall Acceptance Rate: 40 of 100 submissions, 40%.