2015 | Original Paper | Book Chapter

Reducing the Number of Queries in Interactive Value Iteration

Authors: Hugo Gilbert, Olivier Spanjaard, Paolo Viappiani, Paul Weng

Published in: Algorithmic Decision Theory

Publisher: Springer International Publishing

Abstract

To tackle the potentially hard task of defining the reward function in a Markov Decision Process (MDP), a new approach, called Interactive Value Iteration (IVI), has recently been proposed by Weng and Zanuttini (2013). This solving method, which interweaves elicitation and optimization phases, computes a (near) optimal policy without knowing the precise reward values. The procedure as originally presented can be improved so as to reduce the number of queries needed to determine an optimal policy. The key insights are that (1) queries should be delayed as much as possible, so that queries that turn out to be unnecessary for determining the best policy are never asked, (2) queries should be asked in a priority order, because the answers to some queries can resolve other queries, and (3) queries can be avoided by using heuristic information to guide the process. Following these ideas, a modified IVI algorithm is presented, and experimental results show a significant decrease in the number of queries issued.
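
The following is a minimal sketch of these three ideas, not the algorithm of the paper itself: it assumes a simplified model in which each state's reward is only known to lie in an interval, whereas IVI handles ordinal rewards and comparison queries over value vectors. All names (interval_value_iteration, pending, the overlap heuristic) are illustrative.

    import numpy as np

    def interval_value_iteration(P, r_lo, r_hi, gamma=0.9, eps=1e-4):
        """P[s, a, s'] are transition probabilities; r_lo, r_hi are per-state
        reward bounds. Returns the action comparisons that would still require
        a user query once value iteration has converged."""
        n_states, n_actions = P.shape[0], P.shape[1]
        v_lo, v_hi = np.zeros(n_states), np.zeros(n_states)
        while True:
            # Bellman backups under the pessimistic and optimistic reward bounds.
            q_lo = r_lo[:, None] + gamma * (P @ v_lo)   # shape (states, actions)
            q_hi = r_hi[:, None] + gamma * (P @ v_hi)
            new_lo, new_hi = q_lo.max(axis=1), q_hi.max(axis=1)
            if max(np.abs(new_lo - v_lo).max(), np.abs(new_hi - v_hi).max()) < eps:
                break
            v_lo, v_hi = new_lo, new_hi
        # Insight (1): candidate queries are collected only now, after convergence,
        # instead of being asked as soon as two Q-values are incomparable.
        pending = []
        for s in range(n_states):
            best = int(q_lo[s].argmax())
            for a in range(n_actions):
                if a != best and q_hi[s, a] > q_lo[s, best]:
                    # Overlapping intervals: only an answer can settle this pair.
                    pending.append((q_hi[s, a] - q_lo[s, best], s, best, a))
        # Insights (2)-(3): order candidate queries heuristically (widest overlap
        # first), since early answers may render later queries unnecessary.
        pending.sort(reverse=True)
        return pending

    # Toy usage: a random 3-state, 2-action MDP with loose reward bounds.
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(3), size=(3, 2))
    print(interval_value_iteration(P, r_lo=np.array([0.0, 0.0, 0.5]),
                                   r_hi=np.array([0.2, 1.0, 1.0])))

In this toy setting only the pairs left in pending would ever be turned into user queries, and a simple heuristic such as the width of the overlap decides which of them to ask first.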

Footnotes
1
In both the autonomic computing domain and the Coach domain, we randomly generated the transition values and the rewards so as to satisfy the constraints imposed by the problem domain.
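
A purely illustrative way to draw such instances is sketched below; the actual constraints of the autonomic computing and Coach domains are not reproduced here, and every name is hypothetical. Each transition distribution is sampled from a Dirichlet distribution (so that it sums to one), and each state is assigned one of a few ordinal reward levels whose numerical values remain unknown.

    import numpy as np

    def random_instance(n_states, n_actions, n_reward_levels, seed=None):
        """Draw a random MDP skeleton: a stochastic transition tensor P[s, a, :]
        and one ordinal reward level per state."""
        rng = np.random.default_rng(seed)
        # Each P[s, a, :] is a valid probability distribution over next states.
        P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
        # The levels are ordered (0 < 1 < ... < n_reward_levels - 1) but their
        # numerical reward values are left unspecified, as in the IVI setting.
        reward_level = rng.integers(0, n_reward_levels, size=n_states)
        return P, reward_level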
 
References
1. Abbeel, P., Ng, A.: Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the Twenty-First International Conference on Machine Learning, ICML 2004. ACM, New York (2004)
2. Bagnell, J., Ng, A., Schneider, J.: Solving uncertain Markov decision processes. Technical report, CMU (2001)
3. Boger, J., Hoey, J., Poupart, P., Boutilier, C., Fernie, G., Mihailidis, A.: A planning system based on Markov decision processes to guide people with dementia through activities of daily living. IEEE Trans. Inf. Technol. Biomed. 10, 323–333 (2006)
4. Boutilier, C., Das, R., Kephart, J.O., Tesauro, G., Walsh, W.E.: Cooperative negotiation in autonomic systems using incremental utility elicitation. In: Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, pp. 89–97 (2003)
5. Casella, G., George, E.I.: Explaining the Gibbs sampler. Am. Stat. 46, 167–174 (1992)
6. Delage, E., Mannor, S.: Percentile optimization in uncertain Markov decision processes with application to efficient exploration. In: ICML, pp. 225–232 (2007)
7. Givan, R., Leach, S., Dean, T.: Bounded-parameter Markov decision processes. Artif. Intell. 122(1–2), 71–109 (2000)
8. Piot, B., Geist, M., Pietquin, O.: Boosted and reward-regularized classification for apprenticeship learning. In: International Conference on Autonomous Agents and Multi-Agent Systems, AAMAS 2014, Paris, France, 5–9 May 2014, pp. 1249–1256 (2014)
9. Puterman, M.: Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st edn. Wiley, New York (1994)
10. Regan, K., Boutilier, C.: Regret-based reward elicitation for Markov decision processes. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI 2009, pp. 444–451. AUAI Press, Arlington (2009)
11. Regan, K., Boutilier, C.: Robust policy computation in reward-uncertain MDPs using nondominated policies. In: Fox, M., Poole, D. (eds.) AAAI. AAAI Press (2010)
12. Regan, K., Boutilier, C.: Eliciting additive reward functions for Markov decision processes. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI 2011, vol. 3, pp. 2159–2164. AAAI Press (2011)
13. Regan, K., Boutilier, C.: Robust online optimization of reward-uncertain MDPs. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI 2011, vol. 3, pp. 2165–2171. AAAI Press (2011)
14. Rosenthal, S., Veloso, M.M.: Monte Carlo preference elicitation for learning additive reward functions. In: RO-MAN, pp. 886–891. IEEE (2012)
15. Thomaz, A., Hoffman, G., Breazeal, C.: Real-time interactive reinforcement learning for robots. In: AAAI Workshop on Human Comprehensible Machine Learning, pp. 9–13 (2005)
16. Weng, P.: Markov decision processes with ordinal rewards: reference point-based preferences. In: Proceedings of the 21st International Conference on Automated Planning and Scheduling, ICAPS 2011, Freiburg, Germany, 11–16 June 2011 (2011)
17. Weng, P.: Ordinal decision models for Markov decision processes. In: ECAI 2012 – 20th European Conference on Artificial Intelligence, Including Prestigious Applications of Artificial Intelligence (PAIS 2012) System Demonstrations Track, pp. 828–833, Montpellier, France, 27–31 August 2012 (2012)
18. Weng, P., Zanuttini, B.: Interactive value iteration for Markov decision processes with unknown rewards. In: Rossi, F. (ed.) IJCAI. IJCAI/AAAI (2013)
19. White, D.J.: Multi-objective infinite-horizon discounted Markov decision processes. J. Math. Anal. Appl. 89(2), 639–647 (1982)
20. Xu, H., Mannor, S.: Parametric regret in uncertain Markov decision processes. In: CDC, pp. 3606–3613. IEEE (2009)
Metadata
Title
Reducing the Number of Queries in Interactive Value Iteration
Authors
Hugo Gilbert
Olivier Spanjaard
Paolo Viappiani
Paul Weng
Copyright Year
2015
Publisher
Springer International Publishing
DOI
https://doi.org/10.1007/978-3-319-23114-3_9
