
2023 | Original Paper | Book Chapter

Oracle-SAGE: Planning Ahead in Graph-Based Deep Reinforcement Learning

Authors: Andrew Chester, Michael Dann, Fabio Zambetta, John Thangarajah

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer Nature Switzerland


Abstract

Deep reinforcement learning (RL) commonly suffers from high sample complexity and poor generalisation, especially with high-dimensional (image-based) input. Where available (such as in some robotic control domains), low-dimensional vector inputs outperform their image-based counterparts, but it is challenging to represent complex dynamic environments in this manner. Relational reinforcement learning instead represents the world as a set of objects and the relations between them, offering a flexible yet expressive view that provides structural inductive biases to aid learning. Recently, relational RL methods have been extended with modern function approximation using graph neural networks (GNNs). However, inherent limitations in the processing model for GNNs result in decreased returns when important information is dispersed widely throughout the graph. We outline a hybrid learning and planning model which uses reinforcement learning to propose and select subgoals for a planning model to achieve. This includes a novel action selection mechanism and loss function to allow training around the non-differentiable planner. We demonstrate our algorithm's effectiveness on a range of domains, including MiniHack and a challenging extension of the classic taxi domain.
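
To make the abstract's architecture concrete, here is a minimal, hypothetical sketch of the loop it describes: a small message-passing GNN scores candidate subgoals over the state graph, a categorical policy samples one, and an external, non-differentiable planner is treated as a black box that attempts to achieve it. All names (TinyGNN, dummy_planner, the toy graph) are illustrative assumptions, not the authors' implementation, and the REINFORCE-style update is a generic stand-in for the chapter's own loss function.

```python
# Hypothetical sketch of an Oracle-SAGE-style loop (assumed structure, not the
# authors' code): an RL policy over a state graph proposes a subgoal, and an
# external, non-differentiable planner tries to achieve it.
import torch
import torch.nn as nn

class TinyGNN(nn.Module):
    """One round of mean-aggregation message passing, then a per-node subgoal score."""
    def __init__(self, feat_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.msg = nn.Linear(feat_dim, hidden_dim)
        self.update = nn.Linear(feat_dim + hidden_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)  # one subgoal candidate per node

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: [num_nodes, feat_dim]; adj: [num_nodes, num_nodes], row-normalised
        messages = adj @ torch.relu(self.msg(x))          # aggregate neighbour features
        h = torch.relu(self.update(torch.cat([x, messages], dim=-1)))
        return self.score(h).squeeze(-1)                  # [num_nodes] subgoal logits

def dummy_planner(subgoal_node: int) -> float:
    """Stand-in for the symbolic planner: returns the reward obtained by
    executing a plan for the chosen subgoal (non-differentiable black box)."""
    return 1.0 if subgoal_node == 2 else 0.0

# Toy state graph: 4 nodes with random features, ring adjacency.
x = torch.randn(4, 8)
adj = torch.tensor([[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]], dtype=torch.float)
adj = adj / adj.sum(dim=1, keepdim=True)

policy = TinyGNN(feat_dim=8)
optim = torch.optim.Adam(policy.parameters(), lr=1e-2)

# One REINFORCE-style update: because the planner is non-differentiable, the
# gradient flows only through the log-probability of the sampled subgoal,
# a standard way to train around a non-differentiable component.
logits = policy(x, adj)
dist = torch.distributions.Categorical(logits=logits)
subgoal = dist.sample()
reward = dummy_planner(subgoal.item())
loss = -dist.log_prob(subgoal) * reward
optim.zero_grad()
loss.backward()
optim.step()
```

The score-function (REINFORCE) gradient here is only one generic way to handle the non-differentiable planner; the chapter's actual action selection mechanism and loss function are described in the full text.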


Metadata
Title
Oracle-SAGE: Planning Ahead in Graph-Based Deep Reinforcement Learning
Authors
Andrew Chester
Michael Dann
Fabio Zambetta
John Thangarajah
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-26412-2_4
