
2021 | OriginalPaper | Chapter

Robustness to Approximations and Model Learning in MDPs and POMDPs

Authors: Ali Devran Kara, Serdar Yüksel

Published in: Modern Trends in Controlled Stochastic Processes:

Publisher: Springer International Publishing


Abstract

In stochastic control applications, typically only an ideal model (a controlled transition kernel) is assumed and the control design is based on that model, raising the problem of performance loss due to the mismatch between the assumed model and the actual one. In other setups, an exact model may be known, but its optimality analysis may be computationally prohibitive, so that a solution computed for some approximate model is implemented instead. With this motivation, we study continuity properties of discrete-time stochastic control problems with respect to system models, and the robustness of control policies that are optimal for incorrect models but are applied to the true system. We study both fully observed and partially observed setups under an infinite-horizon discounted expected cost criterion. We show that continuity can be established under total variation convergence of the transition kernels under mild assumptions, and, with further restrictions on the dynamics and the observation model, under weak and setwise convergence of the transition kernels. Using these results, we establish convergence results and error bounds for the mismatch that arises when a control policy designed for an incorrectly estimated system model is applied to the actual system, thereby establishing robustness. These results have implications for empirical learning in (data-driven) stochastic control, since system models are often learned from empirical training data, for which the weak convergence criterion typically applies but stronger convergence criteria do not. Finally, we view approximation as a particular instance of robustness and establish the corresponding results.
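The central quantity in the abstract is the performance loss incurred when a policy that is optimal for an estimated kernel Q is applied to the true kernel P. The sketch below computes that loss for a toy problem. It assumes a finite MDP with bounded costs, not the general Borel-space setting of the chapter. It compares the observed loss against an elementary total-variation bound, 2β‖c‖∞δ/(1−β)², where δ is the largest L1 distance between corresponding rows of P and Q. That bound follows from a short fixed-point argument for discounted costs; it is offered for illustration and is not necessarily the bound proved in the chapter. The model, the function names (`value_iteration`, `robustness_gap`, `tv_gap_bound`, etc.) and the sampling scheme are all hypothetical. Note that on a finite state space, weak and total variation convergence coincide, so the empirical kernel here converges in total variation. The harder case the chapter addresses, where only weak convergence holds, is not captured by this toy.

```python
import random

# Hypothetical toy model: 3 states, 2 actions. P[x][u] is the distribution of
# the next state given state x and action u; c[x][u] is the one-stage cost.
P = [
    [[0.9, 0.1, 0.0], [0.2, 0.5, 0.3]],
    [[0.1, 0.8, 0.1], [0.0, 0.3, 0.7]],
    [[0.3, 0.3, 0.4], [0.6, 0.2, 0.2]],
]
c = [[1.0, 0.5], [0.0, 2.0], [3.0, 1.0]]
beta = 0.9  # discount factor


def q_values(P, c, beta, V):
    """Q(x, u) = c(x, u) + beta * sum_y P(y | x, u) V(y)."""
    return [[c[x][u] + beta * sum(p * v for p, v in zip(P[x][u], V))
             for u in range(len(c[x]))] for x in range(len(c))]


def value_iteration(P, c, beta, tol=1e-10):
    """Approximate optimal discounted cost J* and a greedy (cost-minimizing) policy."""
    V = [0.0] * len(c)
    while True:
        Q = q_values(P, c, beta, V)
        V_new = [min(row) for row in Q]
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            policy = [min(range(len(row)), key=row.__getitem__) for row in Q]
            return V_new, policy
        V = V_new


def evaluate_policy(P, c, beta, policy, tol=1e-10):
    """Discounted cost of a stationary deterministic policy under kernel P."""
    V = [0.0] * len(c)
    while True:
        V_new = [c[x][policy[x]]
                 + beta * sum(p * v for p, v in zip(P[x][policy[x]], V))
                 for x in range(len(c))]
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new


def estimate_kernel(P, n_samples, rng):
    """Empirical kernel: draw n_samples next states from P for every (x, u)."""
    n = len(P)
    est = []
    for x in range(n):
        row = []
        for u in range(len(P[x])):
            draws = rng.choices(range(n), weights=P[x][u], k=n_samples)
            row.append([draws.count(y) / n_samples for y in range(n)])
        est.append(row)
    return est


def max_l1_distance(P, Q):
    """delta = max over (x, u) of the L1 distance between P(.|x,u) and Q(.|x,u)."""
    return max(sum(abs(p - q) for p, q in zip(P[x][u], Q[x][u]))
               for x in range(len(P)) for u in range(len(P[x])))


def robustness_gap(P, Q, c, beta):
    """Worst-state loss from applying the policy optimal for Q to the true model P."""
    J_true, _ = value_iteration(P, c, beta)
    _, policy_q = value_iteration(Q, c, beta)
    J_mismatch = evaluate_policy(P, c, beta, policy_q)
    return max(jm - jt for jm, jt in zip(J_mismatch, J_true))


def tv_gap_bound(P, Q, c, beta):
    """Elementary bound 2 * beta * ||c||_inf * delta / (1 - beta)**2."""
    c_max = max(abs(v) for row in c for v in row)
    return 2 * beta * c_max * max_l1_distance(P, Q) / (1 - beta) ** 2


rng = random.Random(0)
for n_samples in (10, 100, 1000):
    Q_hat = estimate_kernel(P, n_samples, rng)
    print(n_samples, robustness_gap(P, Q_hat, c, beta), tv_gap_bound(P, Q_hat, c, beta))
```

For each sample size, the script prints the observed loss next to the bound. The bound is loose, but it typically shrinks as the number of samples grows, since δ does. The observed loss is often exactly zero once the estimated kernel is accurate enough to induce the same optimal policy.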

Metadata
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-76928-4_9