research-article

Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge

Published: 04 April 2017

Abstract

The computation for today's intelligent personal assistants, such as Apple Siri, Google Now, and Microsoft Cortana, is performed in the cloud. This cloud-only approach requires significant amounts of data to be sent to the cloud over the wireless network and puts significant computational pressure on the datacenter. However, as the computational resources in mobile devices become more powerful and energy efficient, questions arise as to whether this cloud-only processing is desirable moving forward, and what the implications are of pushing some or all of this compute to the mobile devices on the edge.

In this paper, we examine the status quo approach of cloud-only processing and investigate computation partitioning strategies that effectively leverage both the cycles in the cloud and on the mobile device to achieve low latency, low energy consumption, and high datacenter throughput for this class of intelligent applications. Our study uses 8 intelligent applications spanning computer vision, speech, and natural language domains, all employing state-of-the-art Deep Neural Networks (DNNs) as the core machine learning technique. We find that given the characteristics of DNN algorithms, a fine-grained, layer-level computation partitioning strategy based on the data and computation variations of each layer within a DNN has significant latency and energy advantages over the status quo approach.

Using this insight, we design Neurosurgeon, a lightweight scheduler to automatically partition DNN computation between mobile devices and datacenters at the granularity of neural network layers. Neurosurgeon does not require per-application profiling. It adapts to various DNN architectures, hardware platforms, wireless networks, and server load levels, intelligently partitioning computation for best latency or best mobile energy. We evaluate Neurosurgeon on a state-of-the-art mobile development platform and show that it improves end-to-end latency by 3.1X on average and up to 40.7X, reduces mobile energy consumption by 59.5% on average and up to 94.7%, and improves datacenter throughput by 1.5X on average and up to 6.7X.
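As an illustration of layer-level partitioning, the following is a minimal sketch and not the paper's implementation. It assumes that per-layer cost estimates for the mobile device and the cloud, per-layer output sizes, and the current uplink bandwidth are already available; the function name, parameters, and all numbers are hypothetical. It tries every split point and keeps the one with the lowest estimated end-to-end latency.

```python
from typing import Sequence, Tuple


def best_partition(
    mobile_cost: Sequence[float],   # assumed per-layer latency on the mobile device (s)
    cloud_cost: Sequence[float],    # assumed per-layer latency in the cloud (s)
    output_bytes: Sequence[float],  # assumed size of each layer's output (bytes)
    input_bytes: float,             # size of the raw input (bytes)
    bytes_per_sec: float,           # assumed current uplink bandwidth
) -> Tuple[int, float]:
    """Return (k, total): layers [0, k) run on mobile, layers [k, n) in the cloud.

    k == 0 is cloud-only and k == n is mobile-only. The cost of sending the
    final result back to the device is ignored in this sketch.
    """
    n = len(mobile_cost)
    best_k, best_total = 0, float("inf")
    for k in range(n + 1):
        local = sum(mobile_cost[:k])
        remote = sum(cloud_cost[k:])
        if k == n:
            transfer = 0.0  # mobile-only: nothing is uploaded
        else:
            # Upload the raw input (k == 0) or the output of the last local layer.
            payload = input_bytes if k == 0 else output_bytes[k - 1]
            transfer = payload / bytes_per_sec
        total = local + transfer + remote
        if total < best_total:
            best_k, best_total = k, total
    return best_k, best_total


# Hypothetical three-layer network on a slow uplink (1 MB/s).
k, total = best_partition(
    mobile_cost=[0.010, 0.020, 0.050],
    cloud_cost=[0.001, 0.002, 0.005],
    output_bytes=[600_000, 20_000, 4_000],
    input_bytes=150_000,
    bytes_per_sec=1_000_000,
)
print(k, round(total, 3))
```

With these made-up numbers the second layer's output is much smaller than the raw input, so splitting after it beats both cloud-only and mobile-only execution. On a much faster link the same search falls back to cloud-only.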

      • Published in

        ACM SIGPLAN Notices, Volume 52, Issue 4
        ASPLOS '17
        April 2017
        811 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/3093336
        • ACM Conferences
          ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
          April 2017
          856 pages
          ISBN: 9781450344654
          DOI: 10.1145/3037697

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States
