
2019 | OriginalPaper | Chapter

Safe Policy Learning with Constrained Return Variance

Author: Arushi Jain

Published in: Advances in Artificial Intelligence

Publisher: Springer International Publishing


Abstract

In safety-critical applications, it is desirable that the agent behave in a reliable and repeatable manner, which the conventional reinforcement learning (RL) setting often fails to provide. In this work, we derive a novel algorithm for learning a safe hierarchical policy by constraining a direct estimate of the variance of the return in the Option-Critic framework [1]. We first present a novel theorem for safe control in policy gradient methods and then extend the derivation to the Option-Critic framework.
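The core idea of constraining return variance in a policy gradient method can be illustrated with a minimal sketch. The example below is not the paper's algorithm; it is a hypothetical REINFORCE-style update on the mean-variance objective J = E[G] - λ·Var[G], using the likelihood-ratio form of the variance gradient (in the spirit of Tamar et al. [6]): ∇Var[G] = E[G²∇log π] - 2·E[G]·E[G∇log π]. The bandit environment, step counts, and learning rates are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed bandit: arm 0 has a slightly higher mean return
# but far higher variance than arm 1, so a variance-penalised ("safe")
# policy should prefer arm 1.
MEANS = np.array([1.0, 0.9])
STDS = np.array([2.0, 0.1])

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def train(lam, steps=5000, lr=0.05):
    """REINFORCE on the mean-variance objective J = E[G] - lam * Var[G].

    Per-sample update direction: (G - lam * (G^2 - 2 * J_hat * G)) * grad log pi,
    where J_hat is a running estimate of the mean return E[G].
    """
    theta = np.zeros(2)
    j_hat = 0.0
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choice(2, p=pi)
        g = rng.normal(MEANS[a], STDS[a])   # sampled return
        grad_logpi = -pi
        grad_logpi[a] += 1.0                # grad of log pi(a | theta)
        adv = g - lam * (g * g - 2.0 * j_hat * g)
        theta += lr * adv * grad_logpi
        j_hat += 0.01 * (g - j_hat)         # track E[G]
    return softmax(theta)

pi_risky = train(lam=0.0)   # risk-neutral objective
pi_safe = train(lam=1.0)    # variance-penalised objective favours arm 1
```

With λ = 1 the penalised value of arm 0 is roughly 1.0 - 4.0 = -3.0 versus 0.9 - 0.01 ≈ 0.89 for arm 1, so the constrained policy concentrates on the low-variance arm even though its mean return is lower.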


Literature
1. Bacon, P.L., Harb, J., Precup, D.: The option-critic architecture. In: AAAI, pp. 1726–1734 (2017)
2. Jain, A., Khetarpal, K., Precup, D.: Safe option-critic: learning safety in the option-critic architecture. arXiv preprint arXiv:1807.08060 (2018)
3. Prashanth, L., Ghavamzadeh, M.: Actor-critic algorithms for risk-sensitive MDPs. In: Advances in Neural Information Processing Systems, pp. 252–260 (2013)
4. Sato, M., Kimura, H., Kobayashi, S.: TD algorithm for the variance of return and mean-variance reinforcement learning. Trans. Jpn. Soc. Artif. Intell. 16(3), 353–362 (2001)
5. Sherstan, C., et al.: Directly estimating the variance of the \(\lambda\)-return using temporal-difference methods. arXiv preprint arXiv:1801.08287 (2018)
6. Tamar, A., Di Castro, D., Mannor, S.: Policy gradients with variance related risk criteria. In: Proceedings of the Twenty-Ninth International Conference on Machine Learning, pp. 387–396 (2012)
7. Tamar, A., Di Castro, D., Mannor, S.: Learning the variance of the reward-to-go. J. Mach. Learn. Res. 17(13), 1–36 (2016)
Metadata
Title
Safe Policy Learning with Constrained Return Variance
Author
Arushi Jain
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-18305-9_68
