
2021 | OriginalPaper | Book chapter

Robustness to Approximations and Model Learning in MDPs and POMDPs

Authors: Ali Devran Kara, Serdar Yüksel

Published in: Modern Trends in Controlled Stochastic Processes:

Publisher: Springer International Publishing


Abstract

In stochastic control applications, typically only an ideal model (a controlled transition kernel) is assumed and the control design is based on this given model, raising the problem of performance loss due to the mismatch between the assumed model and the actual model. In some further setups, an exact model may be known, but its optimality analysis may be computationally challenging, so that the solution of some approximate model is implemented instead. With such a motivation, we study continuity properties of discrete-time stochastic control problems with respect to system models, and the robustness of optimal control policies that are designed for incorrect models but applied to the true system. We study both fully observed and partially observed setups under an infinite-horizon discounted expected cost criterion. We show that continuity can be established under total variation convergence of the transition kernels under mild assumptions, and under weak and setwise convergence of the transition kernels with further restrictions on the dynamics and observation model. Using these, we establish convergence results and error bounds for the mismatch that occurs when a control policy designed for an incorrectly estimated system model is applied to the actual system, thus establishing results on robustness. These results have implications for empirical learning in (data-driven) stochastic control, since system models are often learned from empirical training data, for which the weak convergence criterion typically applies but stronger convergence criteria do not. Finally, we view approximation as a particular instance of robustness and establish it as such.
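The robustness question above (apply a policy designed for an estimated model to the true system and measure the excess cost) can be illustrated on a toy finite MDP. The sketch below is a hypothetical example, not the chapter's method: it assumes finite state and action spaces, bounded costs, and stationary deterministic policies, and the kernel `P_TRUE`, the cost `COST`, and the helper names `value_iteration`, `evaluate_policy`, `empirical_kernel` and `robustness_gap` are introduced here for illustration only. It learns an empirical kernel P_hat from samples, computes a policy that is optimal for P_hat, evaluates it on the true kernel P, and compares the excess discounted cost with the elementary bound 2·beta·max|c|·delta/(1 - beta)^2, where delta is the largest L1 (total variation) distance between corresponding rows P(·|x,u) and P_hat(·|x,u). That bound is not one of the chapter's results; it follows from subtracting the fixed-policy Bellman equations V = c + beta·P·V for the two kernels, which gives a per-policy value difference of at most beta·max|c|·delta/(1 - beta)^2, applied once to the mismatched policy and once to the true optimal policy.

```python
import random


def value_iteration(P, c, beta, tol=1e-10):
    """Optimal discounted cost V* and a greedy stationary policy for a finite MDP.

    P[x][u][y]: probability of moving from state x to state y under action u.
    c[x][u]:    one-stage cost, which is minimized.
    """
    n_x, n_u = len(c), len(c[0])
    V = [0.0] * n_x
    while True:
        # Bellman operator: Q(x, u) = c(x, u) + beta * sum_y P(y | x, u) V(y)
        Q = [[c[x][u] + beta * sum(P[x][u][y] * V[y] for y in range(n_x))
              for u in range(n_u)] for x in range(n_x)]
        V_new = [min(row) for row in Q]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            policy = [min(range(n_u), key=Q[x].__getitem__) for x in range(n_x)]
            return V_new, policy
        V = V_new


def evaluate_policy(P, c, beta, policy, tol=1e-10):
    """Discounted cost of a fixed stationary policy when the system evolves under P."""
    n_x = len(c)
    V = [0.0] * n_x
    while True:
        V_new = [c[x][policy[x]]
                 + beta * sum(P[x][policy[x]][y] * V[y] for y in range(n_x))
                 for x in range(n_x)]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new


def empirical_kernel(P, n_samples, rng):
    """Estimate each row P(.|x, u) by the empirical frequencies of n_samples draws."""
    n_x = len(P)
    P_hat = []
    for x in range(n_x):
        rows = []
        for probs in P[x]:
            counts = [0] * n_x
            for y in rng.choices(range(n_x), weights=probs, k=n_samples):
                counts[y] += 1
            rows.append([k / n_samples for k in counts])
        P_hat.append(rows)
    return P_hat


def robustness_gap(P, P_hat, c, beta):
    """Excess cost on the true kernel P of a policy optimal for P_hat,
    the kernel mismatch delta, and the bound 2*beta*max|c|*delta/(1-beta)^2."""
    V_star, _ = value_iteration(P, c, beta)
    _, policy_hat = value_iteration(P_hat, c, beta)
    V_mismatch = evaluate_policy(P, c, beta, policy_hat)
    gap = max(vm - vs for vm, vs in zip(V_mismatch, V_star))
    # delta: largest L1 distance between corresponding rows of the two kernels
    delta = max(sum(abs(p - q) for p, q in zip(row, row_hat))
                for rows, rows_hat in zip(P, P_hat)
                for row, row_hat in zip(rows, rows_hat))
    c_max = max(abs(v) for row in c for v in row)
    bound = 2 * beta * c_max * delta / (1 - beta) ** 2
    return gap, delta, bound


# Hypothetical 3-state, 2-action example; the numbers are for illustration only.
P_TRUE = [
    [[0.9, 0.1, 0.0], [0.2, 0.5, 0.3]],
    [[0.3, 0.6, 0.1], [0.0, 0.2, 0.8]],
    [[0.5, 0.0, 0.5], [0.1, 0.1, 0.8]],
]
COST = [[1.0, 0.5], [0.2, 0.8], [0.0, 1.0]]
BETA = 0.9

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000):
        P_hat = empirical_kernel(P_TRUE, n, rng)
        gap, delta, bound = robustness_gap(P_TRUE, P_hat, COST, BETA)
        print(f"n={n:5d}  delta={delta:.3f}  gap={gap:.4f}  bound={bound:.2f}")
```

As the number of samples per state-action pair grows, delta typically shrinks and the bound shrinks with it; the bound is loose because of the 1/(1 - beta)^2 factor. On a finite state space weak, setwise and total variation convergence of the kernels coincide, so this toy only exercises the total variation case and cannot show the distinction the abstract draws for general state spaces.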



Print ISBN: 978-3-030-76927-7

Electronic ISBN: 978-3-030-76928-4

Copyright year: 2021
DOI: https://doi.org/10.1007/978-3-030-76928-4_9
