2015 | Original Paper | Book Chapter

Reducing the Number of Queries in Interactive Value Iteration

Authors: Hugo Gilbert, Olivier Spanjaard, Paolo Viappiani, Paul Weng

Published in: Algorithmic Decision Theory

Publisher: Springer International Publishing

Abstract

To tackle the potentially hard task of defining the reward function in a Markov Decision Process (MDP), a new approach, called Interactive Value Iteration (IVI), has recently been proposed by Weng and Zanuttini (2013). This solving method, which interweaves elicitation and optimization phases, computes a (near) optimal policy without knowing the precise reward values. The procedure as originally presented can be improved so as to reduce the number of queries needed to determine an optimal policy. The key insights are that (1) queries should be delayed as much as possible, so that queries that turn out to be unnecessary for determining the best policy are never asked, (2) queries should be asked in a priority order, because the answers to some queries can resolve other queries, and (3) queries can be avoided by using heuristic information to guide the process. Following these ideas, a modified IVI algorithm is presented, and experimental results show a significant decrease in the number of queries issued.
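
The following is a minimal sketch of these three ideas, not the algorithm of the paper itself: it assumes a simplified model in which each state's reward is only known to lie in an interval, whereas IVI handles ordinal rewards and comparison queries over value vectors. All names (interval_value_iteration, pending, the overlap heuristic) are illustrative.

    import numpy as np

    def interval_value_iteration(P, r_lo, r_hi, gamma=0.9, eps=1e-4):
        """P[s, a, s'] are transition probabilities; r_lo, r_hi are per-state
        reward bounds. Returns the action comparisons that would still require
        a user query once value iteration has converged."""
        n_states, n_actions = P.shape[0], P.shape[1]
        v_lo, v_hi = np.zeros(n_states), np.zeros(n_states)
        while True:
            # Bellman backups under the pessimistic and optimistic reward bounds.
            q_lo = r_lo[:, None] + gamma * (P @ v_lo)   # shape (states, actions)
            q_hi = r_hi[:, None] + gamma * (P @ v_hi)
            new_lo, new_hi = q_lo.max(axis=1), q_hi.max(axis=1)
            if max(np.abs(new_lo - v_lo).max(), np.abs(new_hi - v_hi).max()) < eps:
                break
            v_lo, v_hi = new_lo, new_hi
        # Insight (1): candidate queries are collected only now, after convergence,
        # instead of being asked as soon as two Q-values are incomparable.
        pending = []
        for s in range(n_states):
            best = int(q_lo[s].argmax())
            for a in range(n_actions):
                if a != best and q_hi[s, a] > q_lo[s, best]:
                    # Overlapping intervals: only an answer can settle this pair.
                    pending.append((q_hi[s, a] - q_lo[s, best], s, best, a))
        # Insights (2)-(3): order candidate queries heuristically (widest overlap
        # first), since early answers may render later queries unnecessary.
        pending.sort(reverse=True)
        return pending

    # Toy usage: a random 3-state, 2-action MDP with loose reward bounds.
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(3), size=(3, 2))
    print(interval_value_iteration(P, r_lo=np.array([0.0, 0.0, 0.5]),
                                   r_hi=np.array([0.2, 1.0, 1.0])))

In this toy setting only the pairs left in pending would ever be turned into user queries, and a simple heuristic such as the width of the overlap decides which of them to ask first.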

Footnotes
1
In both the autonomic computing domain and the Coach domain, we randomly generated the transition values and the rewards so as to satisfy the constraints imposed by the problem domain.
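
A purely illustrative way to draw such instances is sketched below; the actual constraints of the autonomic computing and Coach domains are not reproduced here, and every name is hypothetical. Each transition distribution is sampled from a Dirichlet distribution (so that it sums to one), and each state is assigned one of a few ordinal reward levels whose numerical values remain unknown.

    import numpy as np

    def random_instance(n_states, n_actions, n_reward_levels, seed=None):
        """Draw a random MDP skeleton: a stochastic transition tensor P[s, a, :]
        and one ordinal reward level per state."""
        rng = np.random.default_rng(seed)
        # Each P[s, a, :] is a valid probability distribution over next states.
        P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
        # The levels are ordered (0 < 1 < ... < n_reward_levels - 1) but their
        # numerical reward values are left unspecified, as in the IVI setting.
        reward_level = rng.integers(0, n_reward_levels, size=n_states)
        return P, reward_level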
 
References
1. Abbeel, P., Ng, A.: Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the Twenty-First International Conference on Machine Learning, ICML 2004. ACM, New York (2004)
2. Bagnell, J., Ng, A., Schneider, J.: Solving uncertain Markov decision processes. Technical report, CMU (2001)
3. Boger, J., Hoey, J., Poupart, P., Boutilier, C., Fernie, G., Mihailidis, A.: A planning system based on Markov decision processes to guide people with dementia through activities of daily living. IEEE Trans. Inf. Technol. Biomed. 10, 323–333 (2006)
4. Boutilier, C., Das, R., Kephart, J.O., Tesauro, G., Walsh, W.E.: Cooperative negotiation in autonomic systems using incremental utility elicitation. In: Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, pp. 89–97 (2003)
5. Casella, G., George, E.I.: Explaining the Gibbs sampler. Am. Stat. 46, 167–174 (1992)
6. Delage, E., Mannor, S.: Percentile optimization in uncertain Markov decision processes with application to efficient exploration. In: ICML, pp. 225–232 (2007)
7. Givan, R., Leach, S., Dean, T.: Bounded-parameter Markov decision processes. Artif. Intell. 122(1–2), 71–109 (2000)
8. Piot, B., Geist, M., Pietquin, O.: Boosted and reward-regularized classification for apprenticeship learning. In: International Conference on Autonomous Agents and Multi-Agent Systems, AAMAS 2014, Paris, France, 5–9 May 2014, pp. 1249–1256 (2014)
9. Puterman, M.: Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st edn. Wiley, New York (1994)
10. Regan, K., Boutilier, C.: Regret-based reward elicitation for Markov decision processes. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI 2009, pp. 444–451. AUAI Press, Arlington (2009)
11. Regan, K., Boutilier, C.: Robust policy computation in reward-uncertain MDPs using nondominated policies. In: Fox, M., Poole, D. (eds.) AAAI. AAAI Press (2010)
12. Regan, K., Boutilier, C.: Eliciting additive reward functions for Markov decision processes. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI 2011, vol. 3, pp. 2159–2164. AAAI Press (2011)
13. Regan, K., Boutilier, C.: Robust online optimization of reward-uncertain MDPs. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI 2011, vol. 3, pp. 2165–2171. AAAI Press (2011)
14. Rosenthal, S., Veloso, M.M.: Monte Carlo preference elicitation for learning additive reward functions. In: RO-MAN, pp. 886–891. IEEE (2012)
15. Thomaz, A., Hoffman, G., Breazeal, C.: Real-time interactive reinforcement learning for robots. In: AAAI Workshop on Human Comprehensible Machine Learning, pp. 9–13 (2005)
16. Weng, P.: Markov decision processes with ordinal rewards: reference point-based preferences. In: Proceedings of the 21st International Conference on Automated Planning and Scheduling, ICAPS 2011, Freiburg, Germany, 11–16 June 2011 (2011)
17. Weng, P.: Ordinal decision models for Markov decision processes. In: ECAI 2012 – 20th European Conference on Artificial Intelligence, Including Prestigious Applications of Artificial Intelligence (PAIS 2012) System Demonstrations Track, pp. 828–833, Montpellier, France, 27–31 August 2012 (2012)
18. Weng, P., Zanuttini, B.: Interactive value iteration for Markov decision processes with unknown rewards. In: Rossi, F. (ed.) IJCAI. IJCAI/AAAI (2013)
19. White, D.J.: Multi-objective infinite-horizon discounted Markov decision processes. J. Math. Anal. Appl. 89(2), 639–647 (1982)
20. Xu, H., Mannor, S.: Parametric regret in uncertain Markov decision processes. In: CDC, pp. 3606–3613. IEEE (2009)
Metadata
Title
Reducing the Number of Queries in Interactive Value Iteration
Authors
Hugo Gilbert
Olivier Spanjaard
Paolo Viappiani
Paul Weng
Copyright Year
2015
Publisher
Springer International Publishing
DOI
https://doi.org/10.1007/978-3-319-23114-3_9
