
2020 | OriginalPaper | Chapter

Quantile Regression Hindsight Experience Replay

Authors : Qiwei He, Liansheng Zhuang, Wei Zhang, Houqiang Li

Published in: Neural Information Processing

Publisher: Springer International Publishing


Abstract

Efficient learning in environments with sparse rewards is one of the most important challenges in Deep Reinforcement Learning (DRL). In continuous DRL environments such as robotic manipulation tasks, multi-goal RL together with its accompanying algorithm, Hindsight Experience Replay (HER), has been shown to be an effective solution. However, HER and its variants typically suffer from a major challenge: the agent may perform well on some goals while performing poorly on others. The main reason for this phenomenon is a concept popular in recent DRL work called intrinsic stochasticity. In multi-goal RL, intrinsic stochasticity arises because the different initial goals of the environment induce different value distributions that interfere with one another, so that computing the expected return is unsuitable in principle and cannot perform as well as usual. To tackle this challenge, in this paper we propose Quantile Regression Hindsight Experience Replay (QR-HER), a novel approach based on quantile regression. The key idea is to select, from the replay buffer and without additional data, the returns that are most closely related to the current goal. In this way, the interference between different initial goals is significantly reduced. We evaluate QR-HER on OpenAI Robotics manipulation tasks with sparse rewards. Experimental results show that, in contrast to HER and its variants, QR-HER achieves better performance by improving the performance on each individual goal, as expected.
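The abstract's quantile regression component follows the distributional RL line of work it cites (QR-DQN, Dabney et al. 2017), where the value network predicts a set of quantiles of the return distribution rather than a single expected return. A minimal NumPy sketch of the quantile Huber loss from that paper is shown below; the function name and shapes are illustrative, not taken from the authors' implementation:

```python
import numpy as np

def quantile_huber_loss(theta, target, kappa=1.0):
    """Quantile regression Huber loss (QR-DQN, Dabney et al., 2017).

    theta  : (N,) predicted quantile values of the return distribution
    target : (M,) sampled target returns (e.g. Bellman backup samples)
    kappa  : Huber threshold
    """
    N = len(theta)
    # Midpoint quantile fractions tau_hat_i = (2i + 1) / (2N)
    tau_hat = (np.arange(N) + 0.5) / N
    # Pairwise TD errors u_ij = target_j - theta_i, shape (N, M)
    u = target[None, :] - theta[:, None]
    # Elementwise Huber loss: quadratic near zero, linear beyond kappa
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric quantile weighting |tau_hat - 1{u < 0}| makes each
    # theta_i converge to the tau_hat_i-quantile of the targets
    weight = np.abs(tau_hat[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()
```

Minimizing this loss drives each predicted value toward a distinct quantile of the return distribution, which is what lets a method like QR-HER reason about per-goal return distributions instead of a single interfering expectation.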


Literature
1. Andrychowicz, M., et al.: Hindsight experience replay. In: Advances in Neural Information Processing Systems, pp. 5048–5058 (2017)
2. Bellemare, M.G., Dabney, W., Munos, R.: A distributional perspective on reinforcement learning. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 449–458. JMLR.org (2017)
3. Dabney, W., Rowland, M., Bellemare, M.G., Munos, R.: Distributional reinforcement learning with quantile regression (2017)
4. Fang, M., Zhou, T., Du, Y., Han, L., Zhang, Z.: Curriculum-guided hindsight experience replay. In: Advances in Neural Information Processing Systems, pp. 12602–12613 (2019)
5. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
7. Lin, L.J.: Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach. Learn. 8(3–4), 293–321 (1992)
8. Plappert, M., et al.: Multi-goal reinforcement learning: challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018)
9. Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015)
10. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (2018)
Metadata
Title
Quantile Regression Hindsight Experience Replay
Authors
Qiwei He
Liansheng Zhuang
Wei Zhang
Houqiang Li
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-63820-7_94
