
24-10-2023 | Original Article

Multi-view reinforcement learning for sequential decision-making with insufficient state information

Authors: Min Li, William Zhu, Shiping Wang

Published in: International Journal of Machine Learning and Cybernetics | Issue 4/2024

Abstract

Most reinforcement learning methods describe sequential decision-making as a Markov decision process, in which the effect of an action depends only on the current state. This assumption is reasonable only when the state is correctly defined and the state information is sufficiently observed, so the learning efficiency of reinforcement learning methods based on the Markov decision process is limited when the state information is insufficient. The partially observable Markov decision process and the history-based decision process have been proposed to describe sequential decision-making with insufficient state information, but both tend to ignore important information contained in the currently observed state, which likewise limits the learning efficiency of reinforcement learning methods built on them. In this paper, we propose a multi-view reinforcement learning method to solve this problem. The motivation is that the interaction between the agent and its environment should be considered from the views of history, present, and future in order to overcome the insufficiency of state information. Based on these views, we construct a multi-view decision process to describe sequential decision-making with insufficient state information, and we obtain a multi-view reinforcement learning method by combining this decision process with the actor-critic framework. In the proposed method, multi-view clustering is performed so that each type of sample can be sufficiently exploited. Experiments show that the proposed method is more effective than the compared state-of-the-art methods. The source code can be downloaded from https://github.com/jamieliuestc/MVRL.
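To make the idea concrete, the following is a minimal, hypothetical sketch (in Python, not taken from the authors' repository) of how history, present, and predicted-future views could be assembled into a joint state for an actor-critic agent, and how replay transitions could be clustered so that every cluster contributes samples to each training batch. The class names, the learned forward model used for the future view, and the use of k-means are illustrative assumptions; the actual implementation is available at the GitHub link above.

```python
# Illustrative sketch only; see the MVRL repository for the authors' code.
from collections import deque

import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


class MultiViewState:
    """Builds a joint state from history, present, and predicted-future views."""

    def __init__(self, obs_dim: int, history_len: int, forward_model: nn.Module):
        self.history = deque(maxlen=history_len)   # past observations (history view)
        self.forward_model = forward_model         # predicts the next observation (future view)
        self.obs_dim = obs_dim
        self.history_len = history_len

    def reset(self):
        self.history.clear()

    def build(self, obs: np.ndarray) -> np.ndarray:
        # History view: the most recent observations, zero-padded at episode start.
        padded = list(self.history) + [np.zeros(self.obs_dim)] * (self.history_len - len(self.history))
        history_view = np.concatenate(padded)
        # Present view: the current observation itself.
        present_view = obs
        # Future view: a one-step prediction from a learned forward model.
        with torch.no_grad():
            future_view = self.forward_model(torch.as_tensor(obs, dtype=torch.float32)).numpy()
        self.history.append(obs)
        return np.concatenate([history_view, present_view, future_view])


def cluster_balanced_batch(buffer: list, batch_size: int, n_clusters: int = 4) -> list:
    """Draws a replay batch with roughly equal numbers of samples per cluster,
    a stand-in for the paper's multi-view clustering of experience."""
    states = np.stack([t[0] for t in buffer])      # transitions are (state, action, reward, next_state)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(states)
    per_cluster = max(1, batch_size // n_clusters)
    batch = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        if len(idx) > 0:
            pick = np.random.choice(idx, size=min(per_cluster, len(idx)), replace=False)
            batch.extend(buffer[i] for i in pick)
    return batch


# Example usage with hypothetical dimensions:
#   forward_model = nn.Linear(17, 17)
#   mv = MultiViewState(obs_dim=17, history_len=4, forward_model=forward_model)
#   joint_state = mv.build(env_observation)   # fed to the actor and critic networks
```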

Metadata
Title
Multi-view reinforcement learning for sequential decision-making with insufficient state information
Authors
Min Li
William Zhu
Shiping Wang
Publication date
24-10-2023
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 4/2024
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-023-01981-9
