Terrain-adaptive locomotion skills using deep reinforcement learning

Authors:
Xue Bin Peng (University of British Columbia),
Glen Berseth (University of British Columbia),
Michiel van de Panne (University of British Columbia)

ACM Transactions on Graphics, Volume 35, Issue 4, Article No. 81, pp. 1–12
https://doi.org/10.1145/2897824.2925881

Published: 11 July 2016

Abstract

Reinforcement learning offers a promising methodology for developing skills for simulated characters, but typically requires working with sparse hand-crafted features. Building on recent progress in deep reinforcement learning (DeepRL), we introduce a mixture of actor-critic experts (MACE) approach that learns terrain-adaptive dynamic locomotion skills using high-dimensional state and terrain descriptions as input, and parameterized leaps or steps as output actions. MACE learns more quickly than a single actor-critic approach and results in actor-critic experts that exhibit specialization. Additional elements of our solution that contribute towards efficient learning include Boltzmann exploration and the use of initial actor biases to encourage specialization. Results are demonstrated for multiple planar characters and terrain classes.
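
The abstract names Boltzmann exploration as one ingredient of MACE but does not give its formulation on this page. The following is a minimal sketch of the general idea, assuming each expert's critic produces one scalar value estimate for the current state and that an expert is then sampled with probability proportional to the exponentiated, temperature-scaled value. The function names, the temperature parameter, and the example values are hypothetical and are not taken from the paper.

```python
import math
import random


def boltzmann_probs(values, temperature):
    """Turn per-expert value estimates into selection probabilities.

    Assumption: one scalar critic value per expert. A higher temperature
    gives a flatter distribution (more exploration); a lower temperature
    concentrates probability on the highest-valued expert.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [v / temperature for v in values]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_expert(values, temperature, rng):
    """Sample the index of one expert from the Boltzmann distribution."""
    probs = boltzmann_probs(values, temperature)
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs


# Hypothetical critic estimates for three experts in a single state.
critic_values = [1.0, 2.0, 0.5]
rng = random.Random(0)
chosen, probs = select_expert(critic_values, temperature=0.5, rng=rng)
```

In a sketch like this, the temperature would typically start high and be lowered over training so that selection moves from exploratory toward greedy; whether and how the paper anneals it is not stated here.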

Supplemental Material

a81.mp4 (mp4, 309.5 MB)
a81-peng-supp.zip (zip, 57.3 MB): supplemental files

Index Terms

Computing methodologies → Computer graphics → Animation → Physical simulation

Published in

ACM Transactions on Graphics Volume 35, Issue 4
July 2016
1396 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/2897824

Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
physics-based characters
reinforcement learning
Qualifiers
- research-article
Article Metrics
- Total Citations: 133
- Total Downloads: 2,708
- Downloads (last 12 months): 206
- Downloads (last 6 weeks): 30