2021 | OriginalPaper | Chapter

Q-Learning for Distributionally Robust Markov Decision Processes

Authors: Nicole Bäuerle, Alexander Glauner

Published in: Modern Trends in Controlled Stochastic Processes

Publisher: Springer International Publishing


Abstract

In this paper, we consider distributionally robust Markov Decision Processes with Borel state and action spaces and an infinite time horizon. The problem is formulated as a Stackelberg game in which nature, as a second player, chooses the least favorable disturbance density in each scenario. Under suitable assumptions, we prove that the value function is the unique fixed point of an operator and that minimizers and maximizers of this operator lead to optimal policies for the decision maker and for nature, respectively. Based on this result, we introduce a Q-learning approach to solve the problem via simulation-based techniques. We prove the convergence of the Q-learning algorithm and study its performance using a distributionally robust irrigation problem.
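The fixed-point and Q-learning ideas in the abstract can be illustrated on a small finite problem. The sketch below is not the authors' algorithm, which works with Borel spaces and disturbance densities. It assumes a hypothetical toy reservoir in which nature picks from a finite set of candidate rain distributions after observing the action. The model, the names `step`, `AMBIGUITY`, `robust_value_iteration` and `robust_q_learning`, the uniform exploration scheme, and the 1/n step sizes are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy model (not from the paper): reservoir levels 0..4.
N_LEVELS, N_ACTIONS, BETA = 5, 3, 0.9
RAIN = np.array([0, 1, 2])  # possible disturbances z
# Assumed finite ambiguity set: each row is one candidate rain distribution.
AMBIGUITY = np.array([[0.2, 0.5, 0.3],
                      [0.5, 0.3, 0.2],
                      [0.1, 0.3, 0.6]])

def step(x, a, z):
    """Release min(a, x) units, add rain z, and charge a cost for unmet demand."""
    release = min(a, x)
    x_next = min(x - release + z, N_LEVELS - 1)
    cost = float((N_ACTIONS - 1 - release) ** 2)
    return x_next, cost

def robust_value_iteration(tol=1e-10):
    """Iterate V <- min_a max_k E_k[c + BETA * V(x')] until successive iterates agree."""
    v = np.zeros(N_LEVELS)
    while True:
        q = np.zeros((N_LEVELS, N_ACTIONS, len(AMBIGUITY)))
        for x in range(N_LEVELS):
            for a in range(N_ACTIONS):
                outcomes = [step(x, a, z) for z in RAIN]
                targets = np.array([c + BETA * v[xn] for xn, c in outcomes])
                q[x, a] = AMBIGUITY @ targets  # expected target under each candidate
        v_new = q.max(axis=2).min(axis=1)  # nature maximizes, decision maker minimizes
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

def robust_q_learning(n_steps=50_000, seed=0):
    """Simulation-based counterpart: learn Q(x, a, k) from sampled transitions."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_LEVELS, N_ACTIONS, len(AMBIGUITY)))
    visits = np.zeros_like(q)
    x = 0
    for _ in range(n_steps):
        a = rng.integers(N_ACTIONS)       # uniform exploration over actions
        k = rng.integers(len(AMBIGUITY))  # and over nature's candidates
        z = rng.choice(RAIN, p=AMBIGUITY[k])
        x_next, cost = step(x, a, z)
        visits[x, a, k] += 1
        alpha = 1.0 / visits[x, a, k]     # decreasing step size
        target = cost + BETA * q[x_next].max(axis=1).min()
        q[x, a, k] += alpha * (target - q[x, a, k])
        x = x_next
    return q
```

Under these assumptions, `robust_q_learning().max(axis=2).min(axis=1)` gives a simulation-based estimate that can be compared against `robust_value_iteration()` to judge how close the learned table is to the exact fixed point.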


Metadata

Title: Q-Learning for Distributionally Robust Markov Decision Processes
Authors: Nicole Bäuerle, Alexander Glauner
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-76928-4_6
