2021 | OriginalPaper | Book chapter

Robustness to Approximations and Model Learning in MDPs and POMDPs

Authors: Ali Devran Kara, Serdar Yüksel

Published in: Modern Trends in Controlled Stochastic Processes

Publisher: Springer International Publishing

Abstract

In stochastic control applications, typically only an ideal model (controlled transition kernel) is assumed and the control design is based on the given model, raising the problem of performance loss due to the mismatch between the assumed model and the actual model. In some further setups, an exact model may be known, but this model may entail computationally challenging optimality analysis leading to the solution of some approximate model being implemented. With such a motivation, we study continuity properties of discrete-time stochastic control problems with respect to system models and robustness of optimal control policies designed for incorrect models applied to the true system. We study both fully observed and partially observed setups under an infinite horizon discounted expected cost criterion. We show that continuity can be established under total variation convergence of the transition kernels under mild assumptions and with further restrictions on the dynamics and observation model under weak and setwise convergence of the transition kernels. Using these, we establish convergence results and error bounds due to mismatch that occurs by the application of a control policy which is designed for an incorrectly estimated system model to the actual system, thus establishing results on robustness. These entail implications on empirical learning in (data-driven) stochastic control since often system models are learned through empirical training data where typically the weak convergence criterion applies but stronger convergence criteria do not. We finally view and establish approximation as a particular instance of robustness.
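
The robustness question raised in the abstract can be illustrated on a toy finite problem. The following is a minimal sketch, not the authors' construction: it assumes a two-state, two-action MDP with made-up costs and kernels, designs a policy by value iteration on an incorrect kernel, and then measures the extra discounted cost that policy incurs on the true kernel. The names `mix`, `optimal_policy`, `evaluate` and `robustness_gap`, and all numerical values, are hypothetical.

```python
# Toy finite MDP. Every number below is invented for illustration only.
BETA = 0.9  # discount factor

# COST[x][u]: stage cost of taking action u in state x
COST = [[1.0, 2.0], [4.0, 0.5]]

# TRUE_KERNEL[x][u][y] = P(next state = y | state = x, action = u)
TRUE_KERNEL = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
# A second, unrelated kernel used only to perturb the true one.
OTHER_KERNEL = [
    [[0.1, 0.9], [0.9, 0.1]],
    [[0.2, 0.8], [0.8, 0.2]],
]


def mix(eps):
    """Incorrect model (1 - eps) * TRUE + eps * OTHER.

    Each transition probability differs from the true one by at most eps.
    """
    return [[[(1 - eps) * p + eps * q for p, q in zip(p_row, q_row)]
             for p_row, q_row in zip(p_state, q_state)]
            for p_state, q_state in zip(TRUE_KERNEL, OTHER_KERNEL)]


def q_value(kernel, values, x, u):
    """One-step cost plus discounted expected continuation cost."""
    return COST[x][u] + BETA * sum(p * v for p, v in zip(kernel[x][u], values))


def optimal_policy(kernel, iters=2000):
    """Value iteration on `kernel`; returns (value function, greedy policy)."""
    states = range(len(kernel))
    values = [0.0 for _ in states]
    for _ in range(iters):
        values = [min(q_value(kernel, values, x, u) for u in range(len(COST[x])))
                  for x in states]
    policy = [min(range(len(COST[x])), key=lambda u: q_value(kernel, values, x, u))
              for x in states]
    return values, policy


def evaluate(kernel, policy, iters=2000):
    """Discounted cost of a fixed stationary policy under `kernel`."""
    states = range(len(kernel))
    values = [0.0 for _ in states]
    for _ in range(iters):
        values = [q_value(kernel, values, x, policy[x]) for x in states]
    return values


def robustness_gap(eps):
    """Worst-case (over initial states) loss from designing on the wrong model."""
    true_values, _ = optimal_policy(TRUE_KERNEL)
    _, assumed_policy = optimal_policy(mix(eps))
    incurred = evaluate(TRUE_KERNEL, assumed_policy)
    return max(j - v for j, v in zip(incurred, true_values))


if __name__ == "__main__":
    for eps in (1.0, 0.5, 0.1, 0.01, 0.0):
        print(f"eps={eps:<5} gap={robustness_gap(eps):.6f}")
```

On a finite space all of the convergence notions mentioned in the abstract coincide, so this sketch only shows what the mismatch loss is; it says nothing about the distinctions between weak, setwise and total variation convergence that the chapter studies on general spaces.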

Metadata
Title
Robustness to Approximations and Model Learning in MDPs and POMDPs
Authors
Ali Devran Kara
Serdar Yüksel
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-76928-4_9