nach oben

Erschienen in:

2016 | OriginalPaper | Buchkapitel

Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

verfasst von : Alessandro Abate, Milan Češka, Marta Kwiatkowska

Erschienen in: Automated Technology for Verification and Analysis

Verlag: Springer International Publishing

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamical programming scheme does not scale when increasing the dimension of the state space, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, at the same time providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, demonstrating evidence that the state-space reduction results in considerable acceleration of the policy iteration scheme, while being able to meet the required level of precision.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Vorheriges Kapitel Synthesizing and Completely Testing Hardware Based on Templates Through Small Numbers of Test Patterns

Nächstes Kapitel Optimizing the Expected Mean Payoff in Energy Markov Decision Processes

Abate, A., Brim, L., Češka, M., Kwiatkowska, M.: Adaptive aggregation of Markov chains: quantitative analysis of chemical reaction networks. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 195–213. Springer, Heidelberg (2015). doi:10.1007/978-3-319-21690-4_12 CrossRef

Abate, A., D’Innocenzo, A., Benedetto, M.D.: Approximate abstractions of stochastic hybrid systems. IEEE Trans. Autom. Control 56(11), 2688–2694 (2011)MathSciNetCrossRef

Bertsekas, D.: Dynamic Programming and Optimal Control, vol. I. Athena Scientific, Belmont (1995)MATH

Bertsekas, D.: Approximate policy iteration: a survey and some new methods. J. Control Theor. Appl. 9(3), 310–335 (2011)MathSciNetCrossRefMATH

Bertsekas, D.: Dynamic Programming and Optimal Control, Vol. II: Approximate Dynamic Programming. Athena Scientific, Belmont (2012)MATH

Bertsekas, D.: Tsitsiklis: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)

Brázdil, T., Chatterjee, K., Chmelík, M., Forejt, V., Křetínský, J., Kwiatkowska, M., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 98–114. Springer, Heidelberg (2014). doi:10.1007/978-3-319-11936-6_8

D’Innocenzo, A., Abate, A., Katoen, J.-P.: Robust PCTL model checking. In: Proceedings of the HSCC 2012, pp. 275–285. ACM (2012)

Haesaert, S., Babuska, R., Abate, A.: Sampling-based approximations with quantitative performance for the probabilistic reach-avoid problem over general Markov processes. arXiv (2014). arXiv:1409.0553

10.

Katoen, J.-P., Klink, D., Leucker, M., Wolf, V.: Three-valued abstraction for probabilistic systems. J. Logic Algebraic Program. 81(4), 356–389 (2012)MathSciNetCrossRefMATH

11.

Kattenbelt, M., Kwiatkowska, M., Norman, G., Parker, D.: A game-based abstraction-refinement framework for Markov decision processes. Formal Methods Syst. Des. 36(3), 246–280 (2010)CrossRefMATH

12.

Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). doi:10.1007/978-3-642-22110-1_47 CrossRef

13.

Lahijanian, M., Andersson, S.B., Belta, C.: Formal verification and synthesis for discrete-time stochastic systems. IEEE Trans. Autom. Control 60(8), 2031–2045 (2015)MathSciNetCrossRef

14.

McMahan, H.B., Likhachev, M., Gordon, G.J.: Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. In: Proceedings of the ICML, pp. 569–576. ACM (2005)

15.

Munos, R., Szepesvari, C.: Finite time bounds for fitted value iteration. J. Mach. Learn. Res. 9, 815–857 (2008)MathSciNetMATH

16.

Puterman, M.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, Hoboken (2005)MATH

Titel: Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations
verfasst von: Alessandro Abate
Milan Češka
Marta Kwiatkowska
Verlag: Springer International Publishing
Buch: Automated Technology for Verification and Analysis
Print ISBN: 978-3-319-46519-7

Electronic ISBN: 978-3-319-46520-3

Copyright-Jahr: 2016
DOI: https://doi.org/10.1007/978-3-319-46520-3_2

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner