2024 | Original Paper | Book Chapter

Hindsight Experience Replay with Evolutionary Decision Trees for Curriculum Goal Generation

Authors: Erdi Sayar, Vladislav Vintaykin, Giovanni Iacca, Alois Knoll

Published in: Applications of Evolutionary Computation

Publisher: Springer Nature Switzerland


Abstract

Reinforcement learning (RL) algorithms often require a significant number of experiences to learn a policy capable of achieving desired goals in multi-goal robot manipulation tasks with sparse rewards. Hindsight Experience Replay (HER) is an existing method that improves learning efficiency by using failed trajectories and replacing the original goals with hindsight goals that are uniformly sampled from the visited states. However, HER has a limitation: the hindsight goals are mostly near the initial state, which hinders solving tasks efficiently if the desired goals are far from the initial state. To overcome this limitation, we introduce a curriculum learning method called HERDT (HER with Decision Trees). HERDT uses binary DTs to generate curriculum goals that guide a robotic agent progressively from an initial state toward a desired goal. During the warm-up stage, DTs are optimized using the Grammatical Evolution algorithm. In the training stage, curriculum goals are then sampled by DTs to help the agent navigate the environment. Since binary DTs generate discrete values, we fine-tune these curriculum points by incorporating a feedback value (i.e., the Q-value). This fine-tuning enables us to adjust the difficulty level of the generated curriculum points, ensuring that they are neither overly simplistic nor excessively challenging. In other words, these points are precisely tailored to match the robot’s ongoing learning policy. We evaluate our proposed approach on different sparse reward robotic manipulation tasks and compare it with the state-of-the-art HER approach. Our results demonstrate that our method consistently outperforms or matches the existing approach in all the tested tasks.
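To make the curriculum-goal idea from the abstract concrete, the following is a minimal, hypothetical Python sketch: an evolved decision tree proposes a goal between the achieved state and the desired goal, and the proposal is nudged using a scalar Q-value so that it is neither too easy nor too hard for the current policy. The function names, thresholds, and the fine-tuning rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def sample_curriculum_goal(tree_predict, q_value, achieved_goal, desired_goal,
                           q_low=0.2, q_high=0.8, step=0.1):
    """Propose a curriculum goal and adjust it using the critic's Q-value.

    tree_predict: callable (achieved_goal, desired_goal) -> goal proposal
    q_value:      callable goal -> scalar estimate of how reachable the goal is
    (Both callables and the thresholds are hypothetical stand-ins.)
    """
    goal = tree_predict(achieved_goal, desired_goal)
    q = q_value(goal)
    direction = desired_goal - achieved_goal
    if q > q_high:                       # goal looks too easy: push it further out
        goal = goal + step * direction
    elif q < q_low:                      # goal looks too hard: pull it back
        goal = goal - step * direction
    return goal

if __name__ == "__main__":
    achieved = np.zeros(3)               # e.g., current object position (x, y, z)
    desired = np.array([0.5, 0.3, 0.1])  # e.g., target position

    # Stand-ins for the evolved decision tree and the learned critic.
    tree = lambda a, d: a + 0.5 * (d - a)                     # midpoint proposal
    critic = lambda g: float(np.exp(-np.linalg.norm(g - achieved)))

    print(sample_curriculum_goal(tree, critic, achieved, desired))
```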


Footnotes
1
The physical interpretation of the achieved goal depends on the task at hand. For some robotic manipulation tasks, the robot needs to pick and place (Fig. 5b), push (Fig. 5c), or slide (Fig. 5d) an object. In these cases, the achieved goal corresponds to the x-y-z position of the object. Conversely, if there is no object in the task (Fig. 5a), the achieved goal is defined as the position of the end-effector of the robot.
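As a small illustrative sketch of this footnote (field names are assumptions, not the environments' actual observation keys), the achieved goal could be extracted as follows:

```python
import numpy as np

def achieved_goal(observation: dict, has_object: bool) -> np.ndarray:
    """Return the achieved goal for a manipulation task, per the footnote above."""
    if has_object:
        # Pick-and-place, push, slide: the object's x-y-z position.
        return np.asarray(observation["object_position"])
    # No object (e.g., reach): the end-effector's x-y-z position.
    return np.asarray(observation["end_effector_position"])
```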
 
Metadata
Title
Hindsight Experience Replay with Evolutionary Decision Trees for Curriculum Goal Generation
Authors
Erdi Sayar
Vladislav Vintaykin
Giovanni Iacca
Alois Knoll
Copyright year
2024
DOI
https://doi.org/10.1007/978-3-031-56855-8_1
