Skip to main content
Erschienen in: Acta Informatica 6/2017

01.02.2016 | Original Article

Symblicit algorithms for mean-payoff and shortest path in monotonic Markov decision processes

verfasst von: Aaron Bohy, Véronique Bruyère, Jean-François Raskin, Nathalie Bertrand

Erschienen in: Acta Informatica | Ausgabe 6/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

When treating Markov decision processes (MDPs) with large state spaces, using explicit representations quickly becomes unfeasible. Lately, Wimmer et al. have proposed a so-called symblicit algorithm for the synthesis of optimal strategies in MDPs, in the quantitative setting of expected mean-payoff. This algorithm, based on the strategy iteration algorithm of Howard and Veinott, efficiently combines symbolic and explicit data structures, and uses binary decision diagrams as symbolic representation. The aim of this paper is to show that the new data structure of pseudo-antichains (an extension of antichains) provides another interesting alternative, especially for the class of monotonic MDPs. We design efficient pseudo-antichain based symblicit algorithms (with open source implementations) for two quantitative settings: the expected mean-payoff and the stochastic shortest path. For two practical applications coming from automated planning and \(\mathsf {LTL}\) synthesis, we report promising experimental results w.r.t. both the run time and the memory consumption. We also show that a variant of pseudo-antichains allows to handle the infinite state spaces underlying the qualitative verification of probabilistic lossy channel systems.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
An alternative objective might be to maximize the value function, in which case \(\lambda ^*\) is optimal if \(\mathbb {E}^{~\cdot }_{\lambda ^*}(s) = \sup _{\lambda \in \varLambda } \mathbb {E}^{~\cdot }_{\lambda }(s)\) for all \(s \in S\).
 
2
If the expected truncated sum has to be maximized, the cost function is restricted to the strictly negative real numbers and \(\arg \min \) is replaced by \(\arg \max \) in line 4.
 
3
If the expected mean-payoff has to be maximized, one has to replace \(\arg \min \) by \(\arg \max \) in lines 4 and 7.
 
4
We use calligraphic style for symbols denoting a symbolic representation.
 
5
A data structure closely related to our pseudo-antichains has been proposed in [2] in the particular context of probabilistic lossy channel systems. A comparison is given in Sect. 7.3.
 
6
“PA-representation” means pseudo-antichain-based representation.
 
7
This can be easily tested by Proposition 2.
 
8
Remark 2 also holds for Assumption 2.
 
9
for all \(s \in G\) and \(\sigma \in \varSigma _s\), \(\sum _{s'\in G}\mathbf{P }(s, \sigma , s') = 1\).
 
10
The “iff” holds since probabilities p are pairwise distinct.
 
11
The improvement of a strategy for the EMP, with the gain g or the bias b values (see Algorithm 2), is similar and is thus not detailed.
 
12
As Algorithm Split only works on \(S_\sigma \), it is not a problem if \(\lambda \) is not defined on \(S \backslash S_\sigma \).
 
13
A comparison with an MTBDD based symblicit algorithm is done in the second application for the EMP problem.
 
14
In [9], the authors study a different problem that is to maximize the probability of reaching the goal within a given number of steps.
 
15
On our benchmarks, the value iteration algorithm of \(\mathsf{PRISM}\) performs better than the strategy iteration one w.r.t. the run time and memory consumption. However, it still consumes more memory than the pseudo-antichain-based algorithm, and runs out of memory on several examples.
 
16
Note that in [11, 12], the weight function w is more general since it also associates values to \({\textsf {Lit}} (I)\). However, for this application, we restrict w to \({\textsf {Lit}} (O)\).
 
17
More precisely, it reduces to the EMP problem where the objective is to maximize the expected mean-payoff (see footnotes 1 and 3).
 
18
For all the MDPs considered in Tables 1 and 2, this ratio is 1.
 
Literatur
2.
Zurück zum Zitat Baier, C., Bertrand, N., Schnoebelen, P.: Symbolic verification of communicating systems with probabilistic message losses: liveness and fairness. In: Najm, E., Pradat-Peyre, J., Donzeau-Gouge, V. (eds.) FORTE, volume 4229 of Lecture Notes in Computer Science, pp. 212–227. Springer (2006) Baier, C., Bertrand, N., Schnoebelen, P.: Symbolic verification of communicating systems with probabilistic message losses: liveness and fairness. In: Najm, E., Pradat-Peyre, J., Donzeau-Gouge, V. (eds.) FORTE, volume 4229 of Lecture Notes in Computer Science, pp. 212–227. Springer (2006)
3.
Zurück zum Zitat Baier, C., Bertrand, N., Schnoebelen, P.: Verifying nondeterministic probabilistic channel systems against \(\omega \)-regular linear-time properties. ACM Trans. Comput. Log. 9(1), Article No. 5 (2007) Baier, C., Bertrand, N., Schnoebelen, P.: Verifying nondeterministic probabilistic channel systems against \(\omega \)-regular linear-time properties. ACM Trans. Comput. Log. 9(1), Article No. 5 (2007)
4.
Zurück zum Zitat Baier, C.: Principles of model checking. MIT Press, Cambridge (2008)MATH Baier, C.: Principles of model checking. MIT Press, Cambridge (2008)MATH
5.
Zurück zum Zitat Baier, C., Katoen, J.-P., Hermanns, H., Wolf, V.: Comparative branching-time semantics for Markov chains. Inf. Comput. 200(2), 149–214 (2005)MathSciNetCrossRefMATH Baier, C., Katoen, J.-P., Hermanns, H., Wolf, V.: Comparative branching-time semantics for Markov chains. Inf. Comput. 200(2), 149–214 (2005)MathSciNetCrossRefMATH
6.
Zurück zum Zitat Bertrand, N., Schnoebelen, P.: Computable fixpoints in well-structured symbolic model checking. Form. Methods Syst. Des. 43(2), 233–267 (2013)CrossRefMATH Bertrand, N., Schnoebelen, P.: Computable fixpoints in well-structured symbolic model checking. Form. Methods Syst. Des. 43(2), 233–267 (2013)CrossRefMATH
7.
Zurück zum Zitat Bertsekas, D.P., Tsitsiklis, J.: Neuro-dynamic programming. Athena Scientific, Anthropological Field Studies, Belmont (1996)MATH Bertsekas, D.P., Tsitsiklis, J.: Neuro-dynamic programming. Athena Scientific, Anthropological Field Studies, Belmont (1996)MATH
8.
Zurück zum Zitat Bertsekas, D.P., Tsitsiklis, J.N.: An analysis of stochastic shortest path problems. Math. Oper. Res. 16(3), 580–595 (1991)MathSciNetCrossRefMATH Bertsekas, D.P., Tsitsiklis, J.N.: An analysis of stochastic shortest path problems. Math. Oper. Res. 16(3), 580–595 (1991)MathSciNetCrossRefMATH
9.
Zurück zum Zitat Blum, A.L., Langford, J.C.: Probabilistic planning in the graphplan framework. In: Biundo, S., Fox, M. (eds.) Recent Advances in AI Planning, pp. 319–332. Springer (2000) Blum, A.L., Langford, J.C.: Probabilistic planning in the graphplan framework. In: Biundo, S., Fox, M. (eds.) Recent Advances in AI Planning, pp. 319–332. Springer (2000)
10.
Zurück zum Zitat Bohy, A., Bruyère, V., Filiot, E., Jin, N., Raskin, J.-F.: Acacia+, a tool for LTL synthesis. In: Madhusudan, P., Seshia, S.A. (eds.) CAV, volume 7358 of Lecture Notes in Computer Science, pp. 652–657. Springer (2012) Bohy, A., Bruyère, V., Filiot, E., Jin, N., Raskin, J.-F.: Acacia+, a tool for LTL synthesis. In: Madhusudan, P., Seshia, S.A. (eds.) CAV, volume 7358 of Lecture Notes in Computer Science, pp. 652–657. Springer (2012)
11.
Zurück zum Zitat Bohy, A., Bruyère, V., Filiot, E., Raskin, J.-F.: Synthesis from LTL specifications with mean-payoff objectives. CoRR, abs/1210.3539 (2012) Bohy, A., Bruyère, V., Filiot, E., Raskin, J.-F.: Synthesis from LTL specifications with mean-payoff objectives. CoRR, abs/1210.3539 (2012)
12.
Zurück zum Zitat Bohy, A., Bruyère, V., Filiot, E., Raskin, J.-F.: Synthesis from LTL specifications with mean-payoff objectives. In: Piterman, N., Smolka, S.A. (eds.) TACAS, volume 7795 of Lecture Notes in Computer Science, pp. 169–184. Springer (2013) Bohy, A., Bruyère, V., Filiot, E., Raskin, J.-F.: Synthesis from LTL specifications with mean-payoff objectives. In: Piterman, N., Smolka, S.A. (eds.) TACAS, volume 7795 of Lecture Notes in Computer Science, pp. 169–184. Springer (2013)
13.
Zurück zum Zitat Bohy, A., Bruyère, V., Raskin, J.: Symblicit algorithms for optimal strategy synthesis in monotonic markov decision processes. In: Chatterjee, K., Ehlers, R., Jha, S. (eds.) SYNT, volume 157 of EPTCS, pp. 51–67 (2014) Bohy, A., Bruyère, V., Raskin, J.: Symblicit algorithms for optimal strategy synthesis in monotonic markov decision processes. In: Chatterjee, K., Ehlers, R., Jha, S. (eds.) SYNT, volume 157 of EPTCS, pp. 51–67 (2014)
14.
Zurück zum Zitat Bryant, R.E.: Graph-based algorithms for boolean function manipulation. IEEE Trans. Comput. 35(8), 677–691 (1986)CrossRefMATH Bryant, R.E.: Graph-based algorithms for boolean function manipulation. IEEE Trans. Comput. 35(8), 677–691 (1986)CrossRefMATH
16.
Zurück zum Zitat Burch, J.R., Clarke, E.M., McMillan, K.L., Dill, D.L., Hwang, L.J.: Symbolic model checking: \(10^{20}\) states and beyond. Inf. Comput. 98(2), 142–170 (1992)MathSciNetCrossRefMATH Burch, J.R., Clarke, E.M., McMillan, K.L., Dill, D.L., Hwang, L.J.: Symbolic model checking: \(10^{20}\) states and beyond. Inf. Comput. 98(2), 142–170 (1992)MathSciNetCrossRefMATH
17.
Zurück zum Zitat Chatterjee, K., Henzinger, T.A., Jobstmann, B., Singh, R.: In: Abdulla, P.A., Leino, K.R.M. (eds.) TACAS, volume 6605 of Lecture Notes in Computer Science. Lecture Notes in Computer Science, pp. 267–271. Springer (2011) Chatterjee, K., Henzinger, T.A., Jobstmann, B., Singh, R.: In: Abdulla, P.A., Leino, K.R.M. (eds.) TACAS, volume 6605 of Lecture Notes in Computer Science. Lecture Notes in Computer Science, pp. 267–271. Springer (2011)
18.
Zurück zum Zitat Clarke, E.M., Emerson, E.A.: Design and synthesis of synchronization skeletons using branching-time temporal logic. In: Kozen, D. (ed.) Logic of Programs, volume 131 of Lecture Notes in Computer Science, pp. 52–71. Springer (1981) Clarke, E.M., Emerson, E.A.: Design and synthesis of synchronization skeletons using branching-time temporal logic. In: Kozen, D. (ed.) Logic of Programs, volume 131 of Lecture Notes in Computer Science, pp. 52–71. Springer (1981)
19.
Zurück zum Zitat de Alfaro, L.: Computing minimum and maximum reachability times in probabilistic systems. In: Baeten, J.C.M., Mauw, S. (eds.) CONCUR, volume 1664 of Lecture Notes in Computer Science, pp. 66–81. Springer (1999) de Alfaro, L.: Computing minimum and maximum reachability times in probabilistic systems. In: Baeten, J.C.M., Mauw, S. (eds.) CONCUR, volume 1664 of Lecture Notes in Computer Science, pp. 66–81. Springer (1999)
20.
Zurück zum Zitat Derisavi, S., Hermanns, H., Sanders, W.H.: Optimal state-space lumping in Markov chains. Inf. Process. Lett. 87(6), 309–315 (2003)MathSciNetCrossRefMATH Derisavi, S., Hermanns, H., Sanders, W.H.: Optimal state-space lumping in Markov chains. Inf. Process. Lett. 87(6), 309–315 (2003)MathSciNetCrossRefMATH
21.
Zurück zum Zitat Doyen, L., Raskin, J.-F.: Improved algorithms for the automata-based approach to model-checking. In: Grumberg, O., Huth, M. (eds.) TACAS, volume 4424 of Lecture Notes in Computer Science, pp. 451–465. Springer (2007) Doyen, L., Raskin, J.-F.: Improved algorithms for the automata-based approach to model-checking. In: Grumberg, O., Huth, M. (eds.) TACAS, volume 4424 of Lecture Notes in Computer Science, pp. 451–465. Springer (2007)
22.
Zurück zum Zitat Fikes, R.E., Nilsson, N.J.: STRIPS:a new approach to the application of theorem proving to problem solving. Artif. Intell. 2(3), 189–208 (1972)MATH Fikes, R.E., Nilsson, N.J.: STRIPS:a new approach to the application of theorem proving to problem solving. Artif. Intell. 2(3), 189–208 (1972)MATH
23.
Zurück zum Zitat Filar, J., Vrieze, K.: Competitive Markov Decision Processes. Springer, Berlin (1997)MATH Filar, J., Vrieze, K.: Competitive Markov Decision Processes. Springer, Berlin (1997)MATH
24.
Zurück zum Zitat Filiot, E., Jin, N., Raskin, J.-F.: Antichains and compositional algorithms for LTL synthesis. Form. Methods Syst. Des. 39(3), 261–296 (2011)CrossRefMATH Filiot, E., Jin, N., Raskin, J.-F.: Antichains and compositional algorithms for LTL synthesis. Form. Methods Syst. Des. 39(3), 261–296 (2011)CrossRefMATH
25.
Zurück zum Zitat Finkel, A.: Decidability of the termination problem for completely specified protocols. Distrib. Comput. 7(3), 129–135 (1994)CrossRef Finkel, A.: Decidability of the termination problem for completely specified protocols. Distrib. Comput. 7(3), 129–135 (1994)CrossRef
26.
Zurück zum Zitat Finkel, A., Schnoebelen, P.: Well-structured transition systems everywhere!. Theor. Comput. Sci. 256(1–2), 63–92 (2001) Finkel, A., Schnoebelen, P.: Well-structured transition systems everywhere!. Theor. Comput. Sci. 256(1–2), 63–92 (2001)
27.
Zurück zum Zitat Fujita, M., McGeer, P.C., Yang, J.C.-Y.: Multi-terminal binary decision diagrams: an efficient data structure for matrix representation. Form. Methods Syst. Des. 10(2/3), 149–169 (1997)CrossRef Fujita, M., McGeer, P.C., Yang, J.C.-Y.: Multi-terminal binary decision diagrams: an efficient data structure for matrix representation. Form. Methods Syst. Des. 10(2/3), 149–169 (1997)CrossRef
28.
Zurück zum Zitat Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Form. Asp. Comput. 6(5), 512–535 (1994)CrossRefMATH Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Form. Asp. Comput. 6(5), 512–535 (1994)CrossRefMATH
29.
Zurück zum Zitat Hartmanns, A.: Modest: a unified language for quantitative models. In: FDL, IEEE, pp. 44–51 (2012) Hartmanns, A.: Modest: a unified language for quantitative models. In: FDL, IEEE, pp. 44–51 (2012)
31.
Zurück zum Zitat Howard, R.A.: Dynamic Programming and Markov Processes. Wiley, New Jersey (1960)MATH Howard, R.A.: Dynamic Programming and Markov Processes. Wiley, New Jersey (1960)MATH
32.
Zurück zum Zitat Jansen, D.N., Katoen, J.-P., Oldenkamp, M., Stoelinga, M., Zapreev, I.S.: How fast and fat is your probabilistic model checker? an experimental performance comparison. In: Yorav, K. (ed.) Haifa Verification Conference, volume 4899 of Lecture Notes in Computer Science, pp. 69–85. Springer (2007) Jansen, D.N., Katoen, J.-P., Oldenkamp, M., Stoelinga, M., Zapreev, I.S.: How fast and fat is your probabilistic model checker? an experimental performance comparison. In: Yorav, K. (ed.) Haifa Verification Conference, volume 4899 of Lecture Notes in Computer Science, pp. 69–85. Springer (2007)
33.
Zurück zum Zitat Katoen, J.-P., Zapreev, I.S., Hahn, E.M., Hermanns, H., Jansen, D.N.: The ins and outs of the probabilistic model checker MRMC. Perform. Eval. 68(2), 90–104 (2011)CrossRef Katoen, J.-P., Zapreev, I.S., Hahn, E.M., Hermanns, H., Jansen, D.N.: The ins and outs of the probabilistic model checker MRMC. Perform. Eval. 68(2), 90–104 (2011)CrossRef
34.
Zurück zum Zitat Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Van Nostrand Company, Inc, New York (1960)MATH Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Van Nostrand Company, Inc, New York (1960)MATH
35.
Zurück zum Zitat Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV, volume 6806 of Lecture Notes in Computer Science, pp. 585–591. Springer (2011) Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV, volume 6806 of Lecture Notes in Computer Science, pp. 585–591. Springer (2011)
37.
Zurück zum Zitat Majercik, S.M., Littman, M.L.: Maxplan: a new approach to probabilistic planning. In: Simmons, R.G., Veloso, M.M., Smith, S.F. (eds.) AIPS, pp. 86–93. AAAI, Palo Alto (1998) Majercik, S.M., Littman, M.L.: Maxplan: a new approach to probabilistic planning. In: Simmons, R.G., Veloso, M.M., Smith, S.F. (eds.) AIPS, pp. 86–93. AAAI, Palo Alto (1998)
38.
Zurück zum Zitat Pachl, J.K.: Protocol description and analysis based on a state transition model with channel expressions. In: Rudin, H., West, C.H. (eds.) PSTV, Proceedings of the IFIP WG6.1, pp. 207–219. North-Holland (1987) Pachl, J.K.: Protocol description and analysis based on a state transition model with channel expressions. In: Rudin, H., West, C.H. (eds.) PSTV, Proceedings of the IFIP WG6.1, pp. 207–219. North-Holland (1987)
40.
Zurück zum Zitat Parker, D.: Personal communication, 2013-11-20 Parker, D.: Personal communication, 2013-11-20
41.
Zurück zum Zitat Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New Jersey (1994)CrossRefMATH Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New Jersey (1994)CrossRefMATH
42.
Zurück zum Zitat Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs (1995)MATH Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs (1995)MATH
43.
Zurück zum Zitat Veinott, A.F.: On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Stat. 37(5), 1284–1294 (1966) Veinott, A.F.: On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Stat. 37(5), 1284–1294 (1966)
44.
Zurück zum Zitat Von Essen, C.: Personal communication, 2013-11-20 Von Essen, C.: Personal communication, 2013-11-20
45.
Zurück zum Zitat Von Essen, C., Jobstmann, B.: Synthesizing efficient controllers. In: Kuncak, V., Rybalchenko, A. (eds.) VMCAI, volume 7148 of Lecture Notes in Computer Science, pp. 428–444. Springer (2012) Von Essen, C., Jobstmann, B.: Synthesizing efficient controllers. In: Kuncak, V., Rybalchenko, A. (eds.) VMCAI, volume 7148 of Lecture Notes in Computer Science, pp. 428–444. Springer (2012)
46.
Zurück zum Zitat Wimmer, R., Braitling, B., Becker, B., Hahn, E.M., Crouzen, P., Hermanns, H., Dhama, A., Theel, O.E.: Symblicit calculation of long-run averages for concurrent probabilistic systems. In: QEST, IEEE Computer Society, pp. 27–36 (2010) Wimmer, R., Braitling, B., Becker, B., Hahn, E.M., Crouzen, P., Hermanns, H., Dhama, A., Theel, O.E.: Symblicit calculation of long-run averages for concurrent probabilistic systems. In: QEST, IEEE Computer Society, pp. 27–36 (2010)
47.
Zurück zum Zitat Wulf, M.D., Doyen, L., Henzinger, T.A., Raskin, J.-F.: Antichains: a new algorithm for checking universality of finite automata. In: Ball, T., Jones, R.B. (eds.) CAV, volume 4144 of Lecture Notes in Computer Science, pp. 17–30. Springer (2006) Wulf, M.D., Doyen, L., Henzinger, T.A., Raskin, J.-F.: Antichains: a new algorithm for checking universality of finite automata. In: Ball, T., Jones, R.B. (eds.) CAV, volume 4144 of Lecture Notes in Computer Science, pp. 17–30. Springer (2006)
Metadaten
Titel
Symblicit algorithms for mean-payoff and shortest path in monotonic Markov decision processes
verfasst von
Aaron Bohy
Véronique Bruyère
Jean-François Raskin
Nathalie Bertrand
Publikationsdatum
01.02.2016
Verlag
Springer Berlin Heidelberg
Erschienen in
Acta Informatica / Ausgabe 6/2017
Print ISSN: 0001-5903
Elektronische ISSN: 1432-0525
DOI
https://doi.org/10.1007/s00236-016-0255-4