
Terrain-adaptive locomotion skills using deep reinforcement learning

Published: 11 July 2016

Abstract

Reinforcement learning offers a promising methodology for developing skills for simulated characters, but typically requires working with sparse hand-crafted features. Building on recent progress in deep reinforcement learning (DeepRL), we introduce a mixture of actor-critic experts (MACE) approach that learns terrain-adaptive dynamic locomotion skills using high-dimensional state and terrain descriptions as input, and parameterized leaps or steps as output actions. MACE learns more quickly than a single actor-critic approach and results in actor-critic experts that exhibit specialization. Additional elements of our solution that contribute towards efficient learning include Boltzmann exploration and the use of initial actor biases to encourage specialization. Results are demonstrated for multiple planar characters and terrain classes.
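
As a rough illustration of the action-selection mechanism the abstract describes, the sketch below shows Boltzmann exploration over per-expert critic values in a MACE-style policy: each expert's critic scores the current state, an expert is sampled in proportion to the exponentiated scores, and that expert's actor supplies the parameterized action. This is a minimal, hypothetical sketch, not the paper's implementation; the placeholder networks, dimensions, temperature, and names (critic_values, actor_action, select_action) are all assumptions.

```python
import numpy as np

# Minimal sketch of MACE-style action selection (assumptions throughout):
# each of several actor-critic experts has a critic that scores the state
# and an actor that proposes a parameterized action (e.g. a leap or step).
# Boltzmann exploration over the critic values picks which expert acts,
# so higher-valued experts act more often while others still gather data.

NUM_EXPERTS = 3        # hypothetical number of experts
ACTION_DIM = 4         # hypothetical dimension of a parameterized action
TEMPERATURE = 0.2      # Boltzmann temperature; lower values act greedier

rng = np.random.default_rng(0)

def critic_values(state):
    """Stand-in for the learned per-expert critics Q_i(state)."""
    return rng.normal(size=NUM_EXPERTS)

def actor_action(state, expert):
    """Stand-in for expert `expert`'s learned actor output."""
    return rng.normal(size=ACTION_DIM)

def select_action(state, temperature=TEMPERATURE):
    q = critic_values(state)
    logits = q / temperature
    probs = np.exp(logits - logits.max())   # softmax over expert values
    probs /= probs.sum()
    expert = rng.choice(NUM_EXPERTS, p=probs)
    return expert, actor_action(state, expert)

if __name__ == "__main__":
    state = np.zeros(8)                     # hypothetical state/terrain features
    expert, action = select_action(state)
    print(f"expert {expert} selected, action {action}")
```

At evaluation time one would take the expert with the highest critic value rather than sampling; the sampling shown here is only the exploration behavior during training.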


Supplemental Material

a81.mp4 (MP4, 309.5 MB)




    • Published in

      ACM Transactions on Graphics, Volume 35, Issue 4
      July 2016
      1396 pages
      ISSN: 0730-0301
      EISSN: 1557-7368
      DOI: 10.1145/2897824

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • research-article
