Published in: International Journal of Machine Learning and Cybernetics 10/2020

18.05.2020 | Original Article

Evaluating skills in hierarchical reinforcement learning

Authors: Marzieh Davoodabadi Farahani, Nasser Mozayani


Abstract

Previous work on automatically acquiring skills for hierarchical reinforcement learning cites benefits such as mitigating the curse of dimensionality, improving exploration, and speeding up value propagation, but pays little attention to evaluating the effect of each individual skill on these factors. In this paper, we show that, depending on the given task, a skill may or may not be useful for learning it. Moreover, related work on automatic skill acquisition focuses on detecting subgoals, i.e., the termination condition of a skill, while no precise method exists for extracting the initiation set of a skill. We therefore propose two methods for evaluating skills and two further methods for pruning their initiation sets. Experimental results show significant improvements in learning across different test domains after skills are evaluated and pruned.
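
For readers unfamiliar with the underlying formalism: the skills discussed here are options in the sense of Sutton, Precup, and Singh (1999), each defined by an initiation set, an internal policy, and a termination condition (the subgoal). The sketch below only illustrates that structure together with a hypothetical initiation-set pruning step; the Option class, the usefulness signal, and the threshold rule are assumptions for exposition, not the two evaluation methods or two pruning methods proposed in the paper.

```python
# Minimal sketch of a skill ("option") in the sense of Sutton, Precup
# and Singh (1999): an initiation set I, an internal policy pi, and a
# termination condition beta. The pruning helper below is an
# illustrative assumption of what trimming an initiation set could
# look like; it is NOT the paper's actual method.

from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = int

@dataclass
class Option:
    name: str
    initiation_set: Set[State]             # states where the skill may be invoked
    policy: Callable[[State], Action]      # maps a state to a primitive action
    termination: Callable[[State], float]  # beta(s): probability of terminating in s

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set


def prune_initiation_set(option: Option,
                         usefulness: Callable[[State], float],
                         threshold: float = 0.0) -> Option:
    """Keep only initiation states whose estimated usefulness clears a
    threshold. A plausible (here hypothetical) usefulness signal is the
    advantage of invoking the skill in s over acting primitively,
    e.g. Q(s, option) - max_a Q(s, a)."""
    kept = {s for s in option.initiation_set if usefulness(s) > threshold}
    return Option(option.name, kept, option.policy, option.termination)


# Example: a toy corridor skill that walks right toward a doorway at
# state 5 and may be started anywhere in states 0..4.
doorway = Option(
    name="reach-doorway",
    initiation_set=set(range(5)),
    policy=lambda s: +1,                   # always step right
    termination=lambda s: 1.0 if s == 5 else 0.0,
)

# Prune with a made-up usefulness estimate that penalizes states far
# from the doorway.
pruned = prune_initiation_set(doorway, usefulness=lambda s: s - 2)
print(sorted(pruned.initiation_set))       # -> [3, 4]
```

In this toy run, the pruning step shrinks the initiation set from {0, ..., 4} to {3, 4}, mirroring the abstract's point that a skill need not be worth invoking from every state in which it can technically start.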

Metadata
Title
Evaluating skills in hierarchical reinforcement learning
Authors
Marzieh Davoodabadi Farahani
Nasser Mozayani
Publication date
18.05.2020
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 10/2020
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-020-01141-3
