DOI: 10.1145/3005745.3005750

Resource Management with Deep Reinforcement Learning

Published: 09 November 2016

ABSTRACT

Resource management problems in systems and networking often manifest as difficult online decision making tasks where appropriate solutions depend on understanding the workload and environment. Inspired by recent advances in deep reinforcement learning for AI problems, we consider building systems that learn to manage resources directly from experience. We present DeepRM, an example solution that translates the problem of packing tasks with multiple resource demands into a learning problem. Our initial results show that DeepRM performs comparably to state-of-the-art heuristics, adapts to different conditions, converges quickly, and learns strategies that are sensible in hindsight.
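
The abstract casts multi-resource job packing as a reinforcement learning problem, with the scheduler's policy trained directly from experience. To make that idea concrete, below is a minimal, self-contained REINFORCE-style sketch in Python/numpy. Everything specific in it is an illustrative assumption, not the paper's implementation: the toy two-resource environment, the five-slot job queue, the reward shaping, the linear softmax policy, and all hyperparameters are invented for exposition; consult the paper for DeepRM's actual state representation, reward, and network architecture.

```python
# Illustrative REINFORCE sketch of "learning to schedule" (not the
# authors' code): a softmax policy learns which queued job to place.
# The environment, reward, and policy form are simplifying assumptions.
import numpy as np

rng = np.random.default_rng(0)
NUM_RES, SLOTS = 2, 5            # resource types, visible queue slots
CAPACITY = np.ones(NUM_RES)      # normalized cluster capacity

def new_jobs():
    """Random job demands in [0, 0.5] per resource, one per slot."""
    return rng.uniform(0.0, 0.5, size=(SLOTS, NUM_RES))

def features(used, jobs):
    """State = current resource usage concatenated with queued demands."""
    return np.concatenate([used, jobs.ravel()])

DIM = NUM_RES + SLOTS * NUM_RES
theta = np.zeros((SLOTS + 1, DIM))   # linear policy; action SLOTS = "pass"

def policy(x):
    logits = theta @ x
    p = np.exp(logits - logits.max())  # numerically stable softmax
    return p / p.sum()

def run_episode(T=20):
    used = np.zeros(NUM_RES)
    jobs = new_jobs()
    traj, ret = [], 0.0
    for _ in range(T):
        x = features(used, jobs)
        p = policy(x)
        a = rng.choice(SLOTS + 1, p=p)
        if a < SLOTS and np.all(used + jobs[a] <= CAPACITY):
            used += jobs[a]                           # place job a
            jobs[a] = rng.uniform(0.0, 0.5, NUM_RES)  # slot refilled
            r = 1.0      # toy reward for a feasible placement
        else:
            used *= 0.5  # running work drains over time
            r = -0.1     # waiting penalty
        traj.append((x, a, p))
        ret += r
    return traj, ret

baseline, lr = 0.0, 0.05
for episode in range(500):
    traj, ret = run_episode()
    baseline += 0.01 * (ret - baseline)   # running-average baseline
    for x, a, p in traj:                  # REINFORCE gradient step:
        grad = -np.outer(p, x)            # d/dtheta of log softmax(a|x)
        grad[a] += x
        theta += lr * (ret - baseline) * grad / len(traj)

print("final average return:",
      np.mean([run_episode()[1] for _ in range(20)]))
```

The running-average baseline subtracted from the episode return is a standard variance-reduction device in policy-gradient methods; it leaves the gradient unbiased while making updates far less noisy, which is one reason such agents can converge quickly on scheduling-style tasks.
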

Published in

    HotNets '16: Proceedings of the 15th ACM Workshop on Hot Topics in Networks
    November 2016
    217 pages
    ISBN:9781450346610
    DOI:10.1145/3005745

    Copyright © 2016 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

HotNets '16 paper acceptance rate: 30 of 108 submissions, 28%. Overall acceptance rate: 110 of 460 submissions, 24%.
