
2020 | OriginalPaper | Chapter

PBCS: Efficient Exploration and Exploitation Using a Synergy Between Reinforcement Learning and Motion Planning

Authors: Guillaume Matheron, Nicolas Perrin, Olivier Sigaud

Published in: Artificial Neural Networks and Machine Learning – ICANN 2020

Publisher: Springer International Publishing


Abstract

The exploration-exploitation trade-off is at the heart of reinforcement learning (RL). However, most continuous control benchmarks used in recent RL research only require local exploration. This has led to the development of algorithms with only basic exploration capabilities, which behave poorly in benchmarks that require more versatile exploration. For instance, as demonstrated in our empirical study, state-of-the-art RL algorithms such as DDPG and TD3 are unable to steer a point mass in even small 2D mazes. In this paper, we propose a new algorithm called "Plan, Backplay, Chain Skills" (PBCS) that combines motion planning and reinforcement learning to solve hard-exploration environments. In a first phase, a motion planning algorithm is used to find a single good trajectory; an RL algorithm is then trained using a curriculum derived from this trajectory, combining a variant of the Backplay algorithm with skill chaining. We show that this method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes, and is able to improve on the trajectory obtained by the motion planning phase.
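
The chapter body is not reproduced on this page, so the sketch below only illustrates the two-phase structure named in the abstract: a sampling-based planner (here an RRT-style search) produces one feasible trajectory, and a Backplay-style curriculum then trains from start states placed progressively earlier along that trajectory, chaining a new skill whenever training stalls. The names and details (rrt_plan, backplay_curriculum, train_from, the toy obstacle-free environment) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the PBCS two-phase structure (planning, then a Backplay-style
# curriculum with skill chaining). Everything below is an illustrative stand-in.
import random
from typing import Callable, List, Tuple

State = Tuple[float, float]


def rrt_plan(start: State, goal: State, steps: int = 2000) -> List[State]:
    """Phase 1 (assumed): crude RRT-like search returning one feasible
    trajectory from start to goal in an obstacle-free 2D unit square."""
    parents = {start: None}
    for _ in range(steps):
        sample = (random.random(), random.random())
        nearest = min(parents,
                      key=lambda s: (s[0] - sample[0]) ** 2 + (s[1] - sample[1]) ** 2)
        # Steer a small fixed step from the nearest node toward the sample.
        dx, dy = sample[0] - nearest[0], sample[1] - nearest[1]
        norm = max((dx * dx + dy * dy) ** 0.5, 1e-8)
        new = (nearest[0] + 0.05 * dx / norm, nearest[1] + 0.05 * dy / norm)
        parents[new] = nearest
        if (new[0] - goal[0]) ** 2 + (new[1] - goal[1]) ** 2 < 0.05 ** 2:
            # Goal reached: reconstruct the path by following parent links.
            path, node = [], new
            while node is not None:
                path.append(node)
                node = parents[node]
            return list(reversed(path))
    return []  # no trajectory found within the step budget


def backplay_curriculum(trajectory: List[State],
                        train_from: Callable[[State], bool]) -> List[State]:
    """Phase 2 (assumed): Backplay-style curriculum. Train from start states
    placed progressively earlier on the planned trajectory; when training from
    an earlier start fails, freeze the current policy as a skill and chain a
    fresh one (skills are represented here only by their switch states)."""
    switch_states: List[State] = []
    i = len(trajectory) - 1
    while i > 0:
        i -= 1
        if not train_from(trajectory[i]):
            # Training stalled: the previous start state becomes the boundary
            # between the committed skill and the next one in the chain.
            switch_states.append(trajectory[i + 1])
    return switch_states


if __name__ == "__main__":
    path = rrt_plan(start=(0.1, 0.1), goal=(0.9, 0.9))
    # Dummy trainer: pretend training succeeds 80% of the time per start state.
    skills = backplay_curriculum(path, train_from=lambda s: random.random() < 0.8)
    print(f"planned {len(path)} states, chained {len(skills) + 1} skills")
```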


Literature
2. Benureau, F.C.Y., Oudeyer, P.Y.: Behavioral diversity generation in autonomous exploration through reuse of past experience. Front. Robot. AI 3, 8 (2016)
4. Chiang, H.T.L., Hsu, J., Fiser, M., Tapia, L., Faust, A.: RL-RRT: kinodynamic motion planning via learning reachability estimators from RL policies. arXiv:1907.04799 (2019)
6. Colas, C., Sigaud, O., Oudeyer, P.Y.: GEP-PG: decoupling exploration and exploitation in deep reinforcement learning algorithms. arXiv:1802.05054 (2018)
7. Cully, A., Demiris, Y.: Quality and diversity optimization: a unifying modular framework. IEEE Trans. Evol. Comput. (2017)
8. Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-Explore: a new approach for hard-exploration problems. arXiv:1901.10995 (2019)
9. Erickson, L.H., LaValle, S.M.: Survivability: measuring and ensuring path diversity. In: 2009 IEEE International Conference on Robotics and Automation, pp. 2068–2073 (2009)
10. Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: learning skills without a reward function. arXiv:1802.06070 (2018)
11. Faust, A., et al.: PRM-RL: long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning. arXiv:1710.03937 (2018)
12. Florensa, C., Held, D., Wulfmeier, M., Zhang, M., Abbeel, P.: Reverse curriculum generation for reinforcement learning. arXiv:1707.05300 (2018)
13. Fournier, P., Sigaud, O., Colas, C., Chetouani, M.: CLIC: curriculum learning and imitation for object control in non-rewarding environments. arXiv:1901.09720 (2019)
14. Fujimoto, S., van Hoof, H., Meger, D.: Addressing function approximation error in actor-critic methods. In: ICML (2018)
17. van Hasselt, H., Doron, Y., Strub, F., Hessel, M., Sonnerat, N., Modayil, J.: Deep reinforcement learning and the deadly triad. arXiv:1812.02648 (2018)
19. Knepper, R.A., Mason, M.T.: Path diversity is only part of the problem. In: 2009 IEEE International Conference on Robotics and Automation, pp. 3224–3229 (2009)
20. Konidaris, G., Barto, A.G.: Skill discovery in continuous reinforcement learning domains using skill chaining. In: Bengio, Y., et al. (eds.) Advances in Neural Information Processing Systems, vol. 22, pp. 1015–1023 (2009)
21. Konidaris, G., Kuindersma, S., Grupen, R., Barto, A.G.: Constructing skill trees for reinforcement learning agents from demonstration trajectories. In: Lafferty, J.D., et al. (eds.) Advances in Neural Information Processing Systems, vol. 23, pp. 1162–1170 (2010)
22. LaValle, S.M.: Rapidly-exploring random trees: a new tool for path planning. Technical report, Iowa State University (1998)
24. Matheron, G., Perrin, N., Sigaud, O.: The problem with DDPG: understanding failures in deterministic environments with sparse rewards. arXiv:1911.11679 (2019)
26. Morere, P., Francis, G., Blau, T., Ramos, F.: Reinforcement learning with probabilistically complete exploration. arXiv:2001.06940 (2020)
27. Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. arXiv:1709.10089 (2018)
28. Ng, A.Y., Harada, D., Russell, S.J.: Policy invariance under reward transformations: theory and application to reward shaping. In: Proceedings of the Sixteenth International Conference on Machine Learning, ICML 1999, pp. 278–287 (1999)
32. Penedones, H., Vincent, D., Maennel, H., Gelly, S., Mann, T., Barreto, A.: Temporal difference learning with neural networks - study of the leakage propagation problem. arXiv:1807.03064 (2018)
33. Pugh, J.K., Soros, L.B., Szerlip, P.A., Stanley, K.O.: Confronting the challenge of quality diversity. In: Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, GECCO 2015, pp. 967–974. ACM, New York (2015)
34. Pugh, J.K., Soros, L.B., Stanley, K.O.: Quality diversity: a new frontier for evolutionary computation. Front. Robot. AI 3, 40 (2016)
35. Resnick, C., Raileanu, R., Kapoor, S., Peysakhovich, A., Cho, K., Bruna, J.: Backplay: "Man muss immer umkehren". arXiv:1807.06919 (2018)
41. Stadie, B.C., Levine, S., Abbeel, P.: Incentivizing exploration in reinforcement learning with deep predictive models. arXiv:1507.00814 (2015)
42. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
Metadata
Title
PBCS: Efficient Exploration and Exploitation Using a Synergy Between Reinforcement Learning and Motion Planning
Authors
Guillaume Matheron
Nicolas Perrin
Olivier Sigaud
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-61616-8_24
