
2018 | OriginalPaper | Chapter

Preference-Based Reinforcement Learning Using Dyad Ranking

Authors: Dirk Schäfer, Eyke Hüllermeier

Published in: Discovery Science

Publisher: Springer International Publishing

Abstract

Preference-based reinforcement learning has recently been introduced as a generalization of conventional reinforcement learning. Instead of numerical rewards, which are often difficult to specify, it assumes weaker feedback in the form of qualitative preferences between states or trajectories. A specific realization of preference-based reinforcement learning is approximate policy iteration using label ranking. We propose an extension of this method in which label ranking is replaced by so-called dyad ranking. The main advantage of this extension is the ability of dyad ranking to learn from feature descriptions of actions, which are often available in reinforcement learning. Several simulation studies are conducted to confirm the usefulness of the approach.
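
To make the core idea more tangible, the following is a minimal sketch, not the authors' implementation, of a dyad-ranking policy based on a bilinear Plackett-Luce model: each state-action dyad receives a latent skill computed from a joint feature representation of state and action, the model is fitted to observed rankings of candidate actions per state, and the resulting greedy policy picks the top-ranked action. All names (skill, plackett_luce_nll, fit, greedy_policy) are illustrative assumptions.

    import numpy as np

    def skill(W, s, a):
        # Latent utility ("skill") of the dyad (state s, action a) under a
        # bilinear model on the joint features: exp(s^T W a).
        return float(np.exp(s @ W @ a))

    def plackett_luce_nll(W, s, ranked_actions):
        # Negative log-likelihood of a ranking of action feature vectors
        # (best first) for state s under the Plackett-Luce model.
        v = np.array([skill(W, s, a) for a in ranked_actions])
        return -sum(np.log(v[i] / v[i:].sum()) for i in range(len(v) - 1))

    def fit(rankings, d_s, d_a, lr=0.05, epochs=100):
        # Maximum-likelihood fit of W using a simple finite-difference gradient;
        # `rankings` is a list of (state, [best_action, ..., worst_action]) pairs.
        W, eps = np.zeros((d_s, d_a)), 1e-5
        for _ in range(epochs):
            for s, ranked in rankings:
                base, grad = plackett_luce_nll(W, s, ranked), np.zeros_like(W)
                for i in range(d_s):
                    for j in range(d_a):
                        W[i, j] += eps
                        grad[i, j] = (plackett_luce_nll(W, s, ranked) - base) / eps
                        W[i, j] -= eps
                W -= lr * grad
        return W

    def greedy_policy(W, s, candidate_actions):
        # The induced policy: rank the candidate actions by skill and take the best.
        return max(candidate_actions, key=lambda a: skill(W, s, a))

Because actions enter the model only through their feature vectors, such a ranker can also score actions that never occurred in the training rankings, which is precisely the advantage over plain label ranking highlighted above.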

Footnotes
1. Note that the number of actions is not fixed per rollout but rather depends on the quality of the current policy. In particular, rollouts may stop prematurely, before the maximal trajectory length L is reached.
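
As a small illustration of footnote 1, the following sketch, assuming a hypothetical environment interface (env.is_terminal, env.step) rather than the paper's actual code, shows how a rollout follows the current policy for at most L steps but may terminate earlier, so that the number of visited states, and hence of action choices, varies with the quality of the policy.

    def rollout(env, policy, s0, L):
        # Follow the current policy for at most L steps; stop early at terminal
        # states, so the realized trajectory length depends on the policy.
        trajectory, s = [s0], s0
        for _ in range(L):
            if env.is_terminal(s):      # premature stop before the maximal length L
                break
            a = policy(s)               # action chosen by the current (possibly weak) policy
            s = env.step(s, a)
            trajectory.append(s)
        return trajectory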
 
2. Throughout all experiments, we used the RPC (ranking by pairwise comparison) method in conjunction with logistic regression.
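
Regarding footnote 2, here is a minimal sketch of ranking by pairwise comparison (RPC) with logistic regression as the base learner. It is an illustrative assumption of how such a setup can look (using scikit-learn), not the authors' implementation; one binary model is trained per label pair, and a ranking is obtained by aggregating the pairwise preference probabilities into per-label scores.

    from itertools import combinations
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_rpc(X, rankings, n_labels):
        # One logistic model per label pair (i, j); the binary target is 1 iff
        # label i precedes label j in the observed ranking of that instance.
        # Assumes both preference directions occur in the training data.
        models = {}
        for i, j in combinations(range(n_labels), 2):
            y = np.array([1 if r.index(i) < r.index(j) else 0 for r in rankings])
            models[(i, j)] = LogisticRegression().fit(X, y)
        return models

    def predict_ranking(models, x, n_labels):
        # Soft voting: accumulate P(i preferred to j) for each pair and sort
        # the labels by decreasing total score.
        scores = np.zeros(n_labels)
        for (i, j), m in models.items():
            p = m.predict_proba(x.reshape(1, -1))[0, 1]
            scores[i] += p
            scores[j] += 1.0 - p
        return list(np.argsort(-scores))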
 
Metadata
Title: Preference-Based Reinforcement Learning Using Dyad Ranking
Authors: Dirk Schäfer, Eyke Hüllermeier
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-030-01771-2_11
