2017 | OriginalPaper | Chapter

An Incremental Fast Policy Search Using a Single Sample Path

Authors: Ajin George Joseph, Shalabh Bhatnagar

Published in: Pattern Recognition and Machine Intelligence

Publisher: Springer International Publishing

Abstract

In this paper, we consider the control problem in a reinforcement learning setting with large state and action spaces. The control problem most commonly addressed in the contemporary literature is to find an optimal policy which optimizes the long-run \(\gamma\)-discounted transition costs, where \(\gamma \in [0,1)\). Approaches to that problem also assume access to a generative model/simulator of the underlying MDP, with the hidden premise that realizations of the system dynamics of the MDP for arbitrary policies, in the form of sample paths, can be obtained with ease from the model. In this paper, we consider a cost function which is the expectation of an approximate value function w.r.t. the steady-state distribution of the Markov chain induced by the policy, without having access to the generative model. We assume that a single sample path, generated using an a priori chosen behaviour policy, is made available. In this information-restricted setting, we solve the generalized control problem using the incremental cross entropy method. The proposed algorithm is shown to converge to the solution which is globally optimal relative to the behaviour policy.
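
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the classical batch cross entropy method for continuous optimization, not the incremental single-sample-path variant proposed in the chapter. The function names, the diagonal Gaussian sampling distribution, the smoothing factor, and the quadratic `toy_cost` (a stand-in for a policy cost) are all assumptions made for illustration.

```python
import random
import statistics


def cross_entropy_minimize(cost, dim, n_samples=100, n_elite=10,
                           n_iters=30, smoothing=0.7, seed=0):
    """Batch cross entropy method with an independent Gaussian per coordinate."""
    rng = random.Random(seed)
    mean = [0.0] * dim   # initial centre of the sampling distribution
    std = [5.0] * dim    # initial spread, wide enough to cover the search region
    for _ in range(n_iters):
        # Draw candidate parameter vectors from the current sampling distribution.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        # Keep the lowest-cost candidates (the "elite" set).
        elite = sorted(samples, key=cost)[:n_elite]
        # Move the sampler towards the elite set; smoothing damps each update.
        for j in range(dim):
            column = [x[j] for x in elite]
            mean[j] = smoothing * statistics.fmean(column) + (1 - smoothing) * mean[j]
            std[j] = smoothing * statistics.pstdev(column) + (1 - smoothing) * std[j]
    return mean


def toy_cost(theta):
    # Hypothetical cost: a quadratic bowl with its minimum at (1, -2).
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2


theta_star = cross_entropy_minimize(toy_cost, dim=2)
print(theta_star)
```

In a policy search setting, `theta` would parameterize the policy and `cost` would be an estimate of the policy's cost. In the setting of the abstract, that estimate would have to come from the single sample path of the behaviour policy rather than from fresh simulations, which is the part this sketch leaves out.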

Metadata
Title: An Incremental Fast Policy Search Using a Single Sample Path
Authors: Ajin George Joseph, Shalabh Bhatnagar
Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-69900-4_1
