Skip to main content

2018 | OriginalPaper | Buchkapitel

Parameter-Independent Strategies for pMDPs via POMDPs

verfasst von : Sebastian Arming, Ezio Bartocci, Krishnendu Chatterjee, Joost-Pieter Katoen, Ana Sokolova

Erschienen in: Quantitative Evaluation of Systems

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Markov Decision Processes (MDPs) are a popular class of models suitable for solving control decision problems in probabilistic reactive systems. We consider parametric MDPs (pMDPs) that include parameters in some of the transition probabilities to account for stochastic uncertainties of the environment such as noise or input disturbances.
We study pMDPs with reachability objectives where the parameter values are unknown and impossible to measure directly during execution, but there is a probability distribution known over the parameter values. We study for the first time computing parameter-independent strategies that are expectation optimal, i.e., optimize the expected reachability probability under the probability distribution over the parameters. We present an encoding of our problem to partially observable MDPs (POMDPs), i.e., a reduction of our problem to computing optimal strategies in POMDPs.
We evaluate our method experimentally on several benchmarks: a motivating (repeated) learner model; a series of benchmarks of varying configurations of a robot moving on a grid; and a consensus protocol.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
We can deal with objectives beyond reachability as long as they are induced by a reward structure (for applicability of the available tools), see Sects. 3 and 4.
 
2
The discounted accumulated reward objective is defined in a similar way, by adding a factor \(\gamma ^i\) to the i-th summand in (1) with \(\gamma \in [0,1)\) being the discount factor. For solving reachability objectives, undiscounted rewards are sufficient.
 
3
We use the abbreviation pMDP rather than PMDP as it is common in the recent literature, see (e.g. [17, 35] and as it reminds of the parameter p.).
 
4
All of our code and models, as well as detailed results of the experiments can be found at http://​github.​com/​sarming/​pMDP-Toolbox.
 
Literatur
2.
Zurück zum Zitat Arming, S., Bartocci, E., Sokolova, A.: SEA-PARAM: exploring schedulers in parametric MDPs. In: Proceedings of the QAPL 2017. EPTCS, vol. 250, pp. 25–38 (2017) Arming, S., Bartocci, E., Sokolova, A.: SEA-PARAM: exploring schedulers in parametric MDPs. In: Proceedings of the QAPL 2017. EPTCS, vol. 250, pp. 25–38 (2017)
3.
Zurück zum Zitat Aspnes, J., Herlihy, M.: Fast randomized consensus using shared memory. J. Algorithms 11(3), 441–461 (1990)MathSciNetCrossRef Aspnes, J., Herlihy, M.: Fast randomized consensus using shared memory. J. Algorithms 11(3), 441–461 (1990)MathSciNetCrossRef
4.
5.
Zurück zum Zitat Baier, C., Katoen, J.: Principles of Model Checking. MIT Press, Cambridge (2008)MATH Baier, C., Katoen, J.: Principles of Model Checking. MIT Press, Cambridge (2008)MATH
10.
Zurück zum Zitat Cassandra, A.R., Littman, M.L., Zhang, N.L.: Incremental pruning - a simple, fast, exact method for partially observable Markov decision processes. In: Proceedings of the UAI 1997, pp. 54–61 (1997) Cassandra, A.R., Littman, M.L., Zhang, N.L.: Incremental pruning - a simple, fast, exact method for partially observable Markov decision processes. In: Proceedings of the UAI 1997, pp. 54–61 (1997)
12.
13.
Zurück zum Zitat Chatterjee, K., Chmelik, M., Davies, J.: A symbolic SAT-based algorithm for almost-sure reachability with small strategies in POMDPs. In: Proceedings of the AAAI 2016, pp. 3225–3232 (2016) Chatterjee, K., Chmelik, M., Davies, J.: A symbolic SAT-based algorithm for almost-sure reachability with small strategies in POMDPs. In: Proceedings of the AAAI 2016, pp. 3225–3232 (2016)
14.
Zurück zum Zitat Chatterjee, K., Chmelik, M., Gupta, R., Kanodia, A.: Optimal cost almost-sure reachability in POMDPs. Artif. Intell. 234, 26–48 (2016)MathSciNetCrossRef Chatterjee, K., Chmelik, M., Gupta, R., Kanodia, A.: Optimal cost almost-sure reachability in POMDPs. Artif. Intell. 234, 26–48 (2016)MathSciNetCrossRef
16.
Zurück zum Zitat Chen, T., Hahn, E.M., Han, T., Kwiatkowska, M.Z., Qu, H., Zhang, L.: Model repair for Markov decision processes. In: Proceedings of the TASE 2013, pp. 85–92 (2013) Chen, T., Hahn, E.M., Han, T., Kwiatkowska, M.Z., Qu, H., Zhang, L.: Model repair for Markov decision processes. In: Proceedings of the TASE 2013, pp. 85–92 (2013)
21.
Zurück zum Zitat Hahn, E.M., Han, T., Zhang, L.: Probabilistic reachability for parametric Markov models. STTT 13(1), 3–19 (2011)CrossRef Hahn, E.M., Han, T., Zhang, L.: Probabilistic reachability for parametric Markov models. STTT 13(1), 3–19 (2011)CrossRef
26.
Zurück zum Zitat Junges, S., Jansen, N., Wimmer, R., Quatmann, T., Winterer, L., Katoen, J., Becker, B.: Finite-state controllers of POMDPs via parameter synthesis. In: Proceedings of the UAI 2018 (2018) Junges, S., Jansen, N., Wimmer, R., Quatmann, T., Winterer, L., Katoen, J., Becker, B.: Finite-state controllers of POMDPs via parameter synthesis. In: Proceedings of the UAI 2018 (2018)
29.
Zurück zum Zitat Lanotte, R., Maggiolo-Schettini, A., Troina, A.: Parametric probabilistic transition systems for system design and analysis. Form. Asp. Comput. 19(1), 93–109 (2007)CrossRef Lanotte, R., Maggiolo-Schettini, A., Troina, A.: Parametric probabilistic transition systems for system design and analysis. Form. Asp. Comput. 19(1), 93–109 (2007)CrossRef
31.
Zurück zum Zitat Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell. 147(1–2), 5–34 (2003)MathSciNetCrossRef Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell. 147(1–2), 5–34 (2003)MathSciNetCrossRef
32.
Zurück zum Zitat Medina Ayala, A.I., Andersson, S.B., Belta, C.: Probabilistic control from time-bounded temporal logic specifications in dynamic environments. In: Proceedings of the ICRA 2012, pp. 4705–4710. IEEE (2012) Medina Ayala, A.I., Andersson, S.B., Belta, C.: Probabilistic control from time-bounded temporal logic specifications in dynamic environments. In: Proceedings of the ICRA 2012, pp. 4705–4710. IEEE (2012)
34.
Zurück zum Zitat Pineau, J., Gordon, G.J., Thrun, S.: Point-based value iteration - an anytime algorithm for POMDPs. In: Proceedings of the IJCAI 2003, pp. 1025–1032 (2003) Pineau, J., Gordon, G.J., Thrun, S.: Point-based value iteration - an anytime algorithm for POMDPs. In: Proceedings of the IJCAI 2003, pp. 1025–1032 (2003)
37.
Zurück zum Zitat Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Upper Saddle River (2009)MATH Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Upper Saddle River (2009)MATH
38.
Zurück zum Zitat Sennott, L.I.: Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley, New York (1998)CrossRef Sennott, L.I.: Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley, New York (1998)CrossRef
39.
Zurück zum Zitat Spaan, M.T.J., Vlassis, N.: Perseus: randomized point-based value iteration for POMDPs. J. Artif. Intell. Res. 24, 195–220 (2011)CrossRef Spaan, M.T.J., Vlassis, N.: Perseus: randomized point-based value iteration for POMDPs. J. Artif. Intell. Res. 24, 195–220 (2011)CrossRef
Metadaten
Titel
Parameter-Independent Strategies for pMDPs via POMDPs
verfasst von
Sebastian Arming
Ezio Bartocci
Krishnendu Chatterjee
Joost-Pieter Katoen
Ana Sokolova
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-319-99154-2_4