Skip to main content

2013 | OriginalPaper | Buchkapitel

6. Up the Down Staircase: Hierarchical Reinforcement Learning

verfasst von : Alexander Paprotny, Michael Thess

Erschienen in: Realtime Data Mining

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We address the question of how hierarchical, or multigrid, methods may figure in dynamic programming and reinforcement learning for recommendation engines.
After providing a general introduction, we approach the framework of hierarchical methods from both the historical analytical and algebraic viewpoints; we proceed to devising and justifying approaches to apply hierarchical methods to both the model-based as well as the model-free case. In regard to the latter, we set out from the multigrid reinforcement learning algorithms introduced by Ziv in [Ziv04] and extend these methods to finite-horizon problems.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
[AR02]
Zurück zum Zitat Andre, D., Russel, S.J.: State abstraction for programmable reinforcement learning agents. In: Proceedings of the National Conference on Artificial Intelligence, pp. 119–125 (2002) Andre, D., Russel, S.J.: State abstraction for programmable reinforcement learning agents. In: Proceedings of the National Conference on Artificial Intelligence, pp. 119–125 (2002)
[Bakh66]
Zurück zum Zitat Bakhvalov, N.S.: On the convergence of a relaxation method with natural constraints on the elliptic operator. USSR Comp. Math. Math. Phys. 6, 101–113 (1966)CrossRef Bakhvalov, N.S.: On the convergence of a relaxation method with natural constraints on the elliptic operator. USSR Comp. Math. Math. Phys. 6, 101–113 (1966)CrossRef
[BC89]
Zurück zum Zitat Bertsekas, D.P., Castanon, D.A.: Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Trans. Automat. Contr. 34(6), 589–598 (1998)MathSciNet Bertsekas, D.P., Castanon, D.A.: Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Trans. Automat. Contr. 34(6), 589–598 (1998)MathSciNet
[BMR82]
Zurück zum Zitat Brandt, A., McCormick S.F., Ruge J.W.: Algebraic multigrid (AMG) for automatic multigrid solution with application in geodetic computations. Technical Report CO POB 1852, Institute Computational Studies State University (1982) Brandt, A., McCormick S.F., Ruge J.W.: Algebraic multigrid (AMG) for automatic multigrid solution with application in geodetic computations. Technical Report CO POB 1852, Institute Computational Studies State University (1982)
[BPX90]
[Bra77]
Zurück zum Zitat Brandt, A.: Multi-level adaptive solutions to boundary-value problems. Math. Comput. 31, 333–390 (1977)CrossRefMATH Brandt, A.: Multi-level adaptive solutions to boundary-value problems. Math. Comput. 31, 333–390 (1977)CrossRefMATH
[Dau92]
Zurück zum Zitat Daubechies, I.: Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia (1992)CrossRefMATH Daubechies, I.: Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia (1992)CrossRefMATH
[Diet00]
Zurück zum Zitat Dietterich, T.G.: Hierarchical reinforcement learning with the MAXQ value function decomposition. J. Artif. Intell. Res. 13, 227–303 (2000)MATHMathSciNet Dietterich, T.G.: Hierarchical reinforcement learning with the MAXQ value function decomposition. J. Artif. Intell. Res. 13, 227–303 (2000)MATHMathSciNet
[Fed64]
Zurück zum Zitat Fedorenko, R.P.: The speed of convergence of one iterative process. USSR Comput. Math. Math. Phys. 4, 227–235 (1964)CrossRef Fedorenko, R.P.: The speed of convergence of one iterative process. USSR Comput. Math. Math. Phys. 4, 227–235 (1964)CrossRef
[GVL96]
Zurück zum Zitat Goloub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore/London (1996) Goloub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore/London (1996)
[Ha85]
Zurück zum Zitat Hackbusch, W.: Multigrid Methods and Applications. Springer, New York (1985)CrossRef Hackbusch, W.: Multigrid Methods and Applications. Springer, New York (1985)CrossRef
[Ma99]
Zurück zum Zitat Mallat, S.: A Wavelet Tour of Signal Processing, 2nd edn. Academic Press, San Diego (1999)MATH Mallat, S.: A Wavelet Tour of Signal Processing, 2nd edn. Academic Press, San Diego (1999)MATH
[MRLG05]
Zurück zum Zitat Marthi, B., Russell, S.J., Latham, D., Guestrin, C.: Concurrent hierarchical reinforcement learning. In: Proceedings of the 19th International Conference on Artificial Intelligence (IJCAI), pp. 779–785. (2005) Marthi, B., Russell, S.J., Latham, D., Guestrin, C.: Concurrent hierarchical reinforcement learning. In: Proceedings of the 19th International Conference on Artificial Intelligence (IJCAI), pp. 779–785. (2005)
[Os94]
[Pap10]
Zurück zum Zitat Paprotny A.: Hierarchical methods for the solution of dynamic programming equations arising from optimal control problems related to recommendation. Diploma Thesis, TU Hamburg-Harburg (2010) Paprotny A.: Hierarchical methods for the solution of dynamic programming equations arising from optimal control problems related to recommendation. Diploma Thesis, TU Hamburg-Harburg (2010)
[Pap11]
Zurück zum Zitat Paprotny, A.: Multilevel Methods for Dynamic Programming: Deterministic and Stochastic Iterative Methods with Application to Recommendation Engines. AVM – Akademische Verlagsgemeinschaft, München (2011) Paprotny, A.: Multilevel Methods for Dynamic Programming: Deterministic and Stochastic Iterative Methods with Application to Recommendation Engines. AVM – Akademische Verlagsgemeinschaft, München (2011)
[PR98]
Zurück zum Zitat Parr, R., Russel, S.J.: Reinforcement learning with hierarchies of machines. Adv. Neural Inf. Process. Syst., 1088–1095 (1998) Parr, R., Russel, S.J.: Reinforcement learning with hierarchies of machines. Adv. Neural Inf. Process. Syst., 1088–1095 (1998)
[SPS99]
Zurück zum Zitat Sutton, R., Precup, D., Singh, S.: Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112(1), 181–211 (1999)CrossRefMATHMathSciNet Sutton, R., Precup, D., Singh, S.: Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112(1), 181–211 (1999)CrossRefMATHMathSciNet
[The12]
Zurück zum Zitat Thess, M.: Multilevel preconditioners for temporal-difference learning methods related to recommendation engines. In: Apel, T., Steinbach, O. (eds.) Advanced Finite Element Methods and Applications. Springer, Berlin (2012) Thess, M.: Multilevel preconditioners for temporal-difference learning methods related to recommendation engines. In: Apel, T., Steinbach, O. (eds.) Advanced Finite Element Methods and Applications. Springer, Berlin (2012)
[TOS01]
Zurück zum Zitat Stüben, K.: An introduction to algebraic multigrid. In: Trottenberg, U., Oosterlee, C., Schüller, A. (eds.) Multigrid, pp. 413–532. Academic Press, San Diego (2001) Stüben, K.: An introduction to algebraic multigrid. In: Trottenberg, U., Oosterlee, C., Schüller, A. (eds.) Multigrid, pp. 413–532. Academic Press, San Diego (2001)
[Ziv04]
Zurück zum Zitat Ziv, O.: Algebraic multigrid for reinforcement learning. Master’s Thesis, Technion (2004) Ziv, O.: Algebraic multigrid for reinforcement learning. Master’s Thesis, Technion (2004)
[Zu00]
Zurück zum Zitat Zumbusch, G.: A sparse grid PDE solver. In: Langtangen, H.P., Bruaset A.M., Quak E. (eds.) Advances in Software Tools for Scientific Computing, Proceedings SciTools '98. Lecture Notes in Computational Science and Engineering, vol. 10, chapter 4, pp. 133–178. Springer (2000) Zumbusch, G.: A sparse grid PDE solver. In: Langtangen, H.P., Bruaset A.M., Quak E. (eds.) Advances in Software Tools for Scientific Computing, Proceedings SciTools '98. Lecture Notes in Computational Science and Engineering, vol. 10, chapter 4, pp. 133–178. Springer (2000)
Metadaten
Titel
Up the Down Staircase: Hierarchical Reinforcement Learning
verfasst von
Alexander Paprotny
Michael Thess
Copyright-Jahr
2013
Verlag
Springer International Publishing
DOI
https://doi.org/10.1007/978-3-319-01321-3_6

Premium Partner