Published in: International Journal of Machine Learning and Cybernetics 10/2020

23.03.2020 | Original Article

Gradient boosting in crowd ensembles for Q-learning using weight sharing

By: D. L. Elliott, K. C. Santosh, Charles Anderson

Abstract

Reinforcement learning (RL) is a double-edged sword: it frees the human trainer from having to provide voluminous supervised training data, or even from knowing a solution. On the other hand, a common complaint about RL is that learning is slow. Deep Q-learning (DQN), a relatively recent development, has allowed practitioners and scientists to solve tasks previously thought unsolvable by a reinforcement learning approach. However, DQN has caused an explosion in the number of model parameters, which has further exacerbated the computational demands of Q-learning during training. In this work, an ensemble approach is proposed that improves training time, measured as the number of interactions with the training environment. The presented experiments show that the proposed approach improves stability during training, yields higher average performance, makes training more reliable, and learns features in the convolutional layers faster.
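The abstract describes an ensemble of Q-functions that share weights; the exact architecture, head count, and boosting schedule are given in the paper itself, not on this page. As a rough illustration of the shared-weight idea only, the following PyTorch sketch (layer sizes, head count, and the simple averaging rule are assumptions, not the authors' configuration) builds a DQN-style convolutional trunk whose features feed several small Q-heads, with the ensemble Q-value taken as the mean over heads.

# Illustrative sketch only: assumes an Atari-style 4x84x84 input, a shared
# convolutional trunk, and K independent Q-heads averaged for action selection.
import torch
import torch.nn as nn

class SharedTrunkQEnsemble(nn.Module):
    def __init__(self, n_actions, n_heads=5):
        super().__init__()
        # Shared convolutional feature extractor (weights common to all heads).
        self.trunk = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # K small Q-heads; only these differ between ensemble members.
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                          nn.Linear(512, n_actions))
            for _ in range(n_heads)
        ])

    def forward(self, obs):
        features = self.trunk(obs)
        # Shape: (n_heads, batch, n_actions)
        per_head_q = torch.stack([head(features) for head in self.heads])
        # Ensemble estimate: average the heads' Q-values.
        return per_head_q.mean(dim=0)

# Greedy action selection from the ensemble estimate.
model = SharedTrunkQEnsemble(n_actions=6)
obs = torch.zeros(1, 4, 84, 84)
action = model(obs).argmax(dim=1)

Because the convolutional trunk is shared, adding heads costs far fewer parameters than training several full networks, which is the motivation for weight sharing in the title.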

Metadata
Title
Gradient boosting in crowd ensembles for Q-learning using weight sharing
Authors
D. L. Elliott
K. C. Santosh
Charles Anderson
Publication date
23.03.2020
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 10/2020
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-020-01115-5
