Published in: International Journal of Machine Learning and Cybernetics 11/2020

04.05.2020 | Original Article

Attentive multi-view reinforcement learning

Authors: Yueyue Hu, Shiliang Sun, Xin Xu, Jing Zhao


Abstract

Reinforcement learning typically requires millions of steps when training from scratch, owing to limited observation experience. More precisely, the representation approximated by a single deep network is often insufficient for reinforcement learning agents. In this paper, we propose a novel multi-view deep attention network (MvDAN), which, for the first time, introduces multi-view representation learning into the reinforcement learning framework. Based on a multi-view scheme of function approximation, the proposed model approximates multiple view-specific policy or value functions in parallel from intermediate representations and integrates these functions with an attention mechanism to generate a comprehensive strategy. Furthermore, we develop multi-view generalized policy improvement to jointly optimize all policies rather than a single one. Experimental results on eight Atari benchmarks show that, compared with single-view function approximation schemes, MvDAN outperforms state-of-the-art methods and converges faster with greater training stability.
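The attention-based integration of view-specific functions described in the abstract can be sketched in miniature: each view produces its own Q-value vector, and a softmax over per-view attention scores weights the views before they are summed into one fused estimate. This is a minimal illustration under assumed shapes, not the authors' implementation; the attention logits, number of views, and Q-values below are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_q_values(view_q_values, attention_scores):
    """Combine per-view Q-value vectors into one fused Q-value vector
    using softmax attention weights over the views."""
    weights = softmax(attention_scores)
    n_actions = len(view_q_values[0])
    fused = [0.0] * n_actions
    for w, q in zip(weights, view_q_values):
        for a in range(n_actions):
            fused[a] += w * q[a]
    return fused

# Hypothetical example: two views, three actions; view 0 gets a
# larger attention logit and therefore dominates the fused estimate.
q_views = [[1.0, 0.5, 0.2], [0.0, 1.0, 0.4]]
scores = [2.0, 0.0]  # one attention logit per view (assumed values)
fused_q = fuse_q_values(q_views, scores)
greedy_action = max(range(len(fused_q)), key=lambda a: fused_q[a])
```

In the full model the attention scores would themselves be produced by a learned network conditioned on the state, so the weighting of views can change from state to state.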

Metadata
Title
Attentive multi-view reinforcement learning
Authors
Yueyue Hu
Shiliang Sun
Xin Xu
Jing Zhao
Publication date
04.05.2020
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 11/2020
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-020-01130-6
