Skip to main content

2016 | OriginalPaper | Buchkapitel

Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

verfasst von : Alessandro Abate, Milan Češka, Marta Kwiatkowska

Erschienen in: Automated Technology for Verification and Analysis

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamical programming scheme does not scale when increasing the dimension of the state space, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, at the same time providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, demonstrating evidence that the state-space reduction results in considerable acceleration of the policy iteration scheme, while being able to meet the required level of precision.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Abate, A., Brim, L., Češka, M., Kwiatkowska, M.: Adaptive aggregation of Markov chains: quantitative analysis of chemical reaction networks. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 195–213. Springer, Heidelberg (2015). doi:10.1007/978-3-319-21690-4_12 CrossRef Abate, A., Brim, L., Češka, M., Kwiatkowska, M.: Adaptive aggregation of Markov chains: quantitative analysis of chemical reaction networks. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 195–213. Springer, Heidelberg (2015). doi:10.​1007/​978-3-319-21690-4_​12 CrossRef
2.
Zurück zum Zitat Abate, A., D’Innocenzo, A., Benedetto, M.D.: Approximate abstractions of stochastic hybrid systems. IEEE Trans. Autom. Control 56(11), 2688–2694 (2011)MathSciNetCrossRef Abate, A., D’Innocenzo, A., Benedetto, M.D.: Approximate abstractions of stochastic hybrid systems. IEEE Trans. Autom. Control 56(11), 2688–2694 (2011)MathSciNetCrossRef
3.
Zurück zum Zitat Bertsekas, D.: Dynamic Programming and Optimal Control, vol. I. Athena Scientific, Belmont (1995)MATH Bertsekas, D.: Dynamic Programming and Optimal Control, vol. I. Athena Scientific, Belmont (1995)MATH
4.
5.
Zurück zum Zitat Bertsekas, D.: Dynamic Programming and Optimal Control, Vol. II: Approximate Dynamic Programming. Athena Scientific, Belmont (2012)MATH Bertsekas, D.: Dynamic Programming and Optimal Control, Vol. II: Approximate Dynamic Programming. Athena Scientific, Belmont (2012)MATH
6.
Zurück zum Zitat Bertsekas, D.: Tsitsiklis: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996) Bertsekas, D.: Tsitsiklis: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
7.
Zurück zum Zitat Brázdil, T., Chatterjee, K., Chmelík, M., Forejt, V., Křetínský, J., Kwiatkowska, M., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 98–114. Springer, Heidelberg (2014). doi:10.1007/978-3-319-11936-6_8 Brázdil, T., Chatterjee, K., Chmelík, M., Forejt, V., Křetínský, J., Kwiatkowska, M., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 98–114. Springer, Heidelberg (2014). doi:10.​1007/​978-3-319-11936-6_​8
8.
Zurück zum Zitat D’Innocenzo, A., Abate, A., Katoen, J.-P.: Robust PCTL model checking. In: Proceedings of the HSCC 2012, pp. 275–285. ACM (2012) D’Innocenzo, A., Abate, A., Katoen, J.-P.: Robust PCTL model checking. In: Proceedings of the HSCC 2012, pp. 275–285. ACM (2012)
9.
Zurück zum Zitat Haesaert, S., Babuska, R., Abate, A.: Sampling-based approximations with quantitative performance for the probabilistic reach-avoid problem over general Markov processes. arXiv (2014). arXiv:1409.0553 Haesaert, S., Babuska, R., Abate, A.: Sampling-based approximations with quantitative performance for the probabilistic reach-avoid problem over general Markov processes. arXiv (2014). arXiv:​1409.​0553
10.
Zurück zum Zitat Katoen, J.-P., Klink, D., Leucker, M., Wolf, V.: Three-valued abstraction for probabilistic systems. J. Logic Algebraic Program. 81(4), 356–389 (2012)MathSciNetCrossRefMATH Katoen, J.-P., Klink, D., Leucker, M., Wolf, V.: Three-valued abstraction for probabilistic systems. J. Logic Algebraic Program. 81(4), 356–389 (2012)MathSciNetCrossRefMATH
11.
Zurück zum Zitat Kattenbelt, M., Kwiatkowska, M., Norman, G., Parker, D.: A game-based abstraction-refinement framework for Markov decision processes. Formal Methods Syst. Des. 36(3), 246–280 (2010)CrossRefMATH Kattenbelt, M., Kwiatkowska, M., Norman, G., Parker, D.: A game-based abstraction-refinement framework for Markov decision processes. Formal Methods Syst. Des. 36(3), 246–280 (2010)CrossRefMATH
12.
Zurück zum Zitat Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). doi:10.1007/978-3-642-22110-1_47 CrossRef Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). doi:10.​1007/​978-3-642-22110-1_​47 CrossRef
13.
Zurück zum Zitat Lahijanian, M., Andersson, S.B., Belta, C.: Formal verification and synthesis for discrete-time stochastic systems. IEEE Trans. Autom. Control 60(8), 2031–2045 (2015)MathSciNetCrossRef Lahijanian, M., Andersson, S.B., Belta, C.: Formal verification and synthesis for discrete-time stochastic systems. IEEE Trans. Autom. Control 60(8), 2031–2045 (2015)MathSciNetCrossRef
14.
Zurück zum Zitat McMahan, H.B., Likhachev, M., Gordon, G.J.: Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. In: Proceedings of the ICML, pp. 569–576. ACM (2005) McMahan, H.B., Likhachev, M., Gordon, G.J.: Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. In: Proceedings of the ICML, pp. 569–576. ACM (2005)
15.
Zurück zum Zitat Munos, R., Szepesvari, C.: Finite time bounds for fitted value iteration. J. Mach. Learn. Res. 9, 815–857 (2008)MathSciNetMATH Munos, R., Szepesvari, C.: Finite time bounds for fitted value iteration. J. Mach. Learn. Res. 9, 815–857 (2008)MathSciNetMATH
16.
Zurück zum Zitat Puterman, M.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, Hoboken (2005)MATH Puterman, M.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, Hoboken (2005)MATH
Metadaten
Titel
Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations
verfasst von
Alessandro Abate
Milan Češka
Marta Kwiatkowska
Copyright-Jahr
2016
DOI
https://doi.org/10.1007/978-3-319-46520-3_2

Premium Partner