
24-10-2023 | Original Article

Multi-view reinforcement learning for sequential decision-making with insufficient state information

Authors: Min Li, William Zhu, Shiping Wang

Published in: International Journal of Machine Learning and Cybernetics | Issue 4/2024

Abstract

Most reinforcement learning methods describe sequential decision-making as a Markov decision process, in which the effect of an action depends only on the current state. This assumption is reasonable only when the state is correctly defined and the state information is sufficiently observed, so the learning efficiency of reinforcement learning methods based on the Markov decision process is limited when the state information is insufficient. The partially observable Markov decision process and the history-based decision process have been proposed to describe sequential decision-making with insufficient state information, but both tend to ignore important information contained in the currently observed state, which likewise limits the learning efficiency of reinforcement learning methods built on them. In this paper, we propose a multi-view reinforcement learning method to solve this problem. The motivation is that the interaction between the agent and its environment should be considered from the views of history, present, and future in order to overcome the insufficiency of state information. Based on these views, we construct a multi-view decision process to describe sequential decision-making with insufficient state information, and we obtain a multi-view reinforcement learning method by combining this decision process with the actor-critic framework. In the proposed method, multi-view clustering is performed so that each type of sample can be sufficiently exploited. Experiments show that the proposed method is more effective than the compared state-of-the-art methods. The source code can be downloaded from https://github.com/jamieliuestc/MVRL.
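To make the idea concrete, the following is a minimal, hypothetical sketch (in Python, not taken from the authors' repository) of how history, present, and predicted-future views could be assembled into a joint state for an actor-critic agent, and how replay transitions could be clustered so that every cluster contributes samples to each training batch. The class names, the learned forward model used for the future view, and the use of k-means are illustrative assumptions; the actual implementation is available at the GitHub link above.

```python
# Illustrative sketch only; see the MVRL repository for the authors' code.
from collections import deque

import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


class MultiViewState:
    """Builds a joint state from history, present, and predicted-future views."""

    def __init__(self, obs_dim: int, history_len: int, forward_model: nn.Module):
        self.history = deque(maxlen=history_len)   # past observations (history view)
        self.forward_model = forward_model         # predicts the next observation (future view)
        self.obs_dim = obs_dim
        self.history_len = history_len

    def reset(self):
        self.history.clear()

    def build(self, obs: np.ndarray) -> np.ndarray:
        # History view: the most recent observations, zero-padded at episode start.
        padded = list(self.history) + [np.zeros(self.obs_dim)] * (self.history_len - len(self.history))
        history_view = np.concatenate(padded)
        # Present view: the current observation itself.
        present_view = obs
        # Future view: a one-step prediction from a learned forward model.
        with torch.no_grad():
            future_view = self.forward_model(torch.as_tensor(obs, dtype=torch.float32)).numpy()
        self.history.append(obs)
        return np.concatenate([history_view, present_view, future_view])


def cluster_balanced_batch(buffer: list, batch_size: int, n_clusters: int = 4) -> list:
    """Draws a replay batch with roughly equal numbers of samples per cluster,
    a stand-in for the paper's multi-view clustering of experience."""
    states = np.stack([t[0] for t in buffer])      # transitions are (state, action, reward, next_state)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(states)
    per_cluster = max(1, batch_size // n_clusters)
    batch = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        if len(idx) > 0:
            pick = np.random.choice(idx, size=min(per_cluster, len(idx)), replace=False)
            batch.extend(buffer[i] for i in pick)
    return batch


# Example usage with hypothetical dimensions:
#   forward_model = nn.Linear(17, 17)
#   mv = MultiViewState(obs_dim=17, history_len=4, forward_model=forward_model)
#   joint_state = mv.build(env_observation)   # fed to the actor and critic networks
```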

Metadata
Title
Multi-view reinforcement learning for sequential decision-making with insufficient state information
Authors
Min Li
William Zhu
Shiping Wang
Publication date
24-10-2023
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 4/2024
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-023-01981-9
