Abstract
Temporal difference learning and eligibility traces are among the most common approaches to solving reinforcement learning problems. However, except in the case of Q-learning, there are no studies of using these two approaches in a cooperative multi-agent learning setting. This paper addresses that shortcoming by using temporal difference learning and eligibility traces as the core learning method in multi-criteria expertness based cooperative learning (MCE). Experiments performed on a sample maze world present an empirical study of temporal difference and eligibility trace methods in an MCE based cooperative learning setting.
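For readers unfamiliar with the underlying learner, the on-policy temporal difference method with eligibility traces referenced here is commonly realized as tabular SARSA(λ). The following is a minimal illustrative sketch, not the paper's implementation: the toy environment, state/action sizes, and all hyperparameter values are assumptions chosen only to show the trace-update mechanics.

```python
import numpy as np

# Illustrative tabular SARSA(lambda) on a toy environment.
# All names, sizes, and parameters are assumptions, not from the paper.
n_states, n_actions = 25, 4          # e.g. a small maze with 4 moves
alpha, gamma, lam, eps = 0.1, 0.9, 0.8, 0.1

Q = np.zeros((n_states, n_actions))  # action-value table
rng = np.random.default_rng(0)

def epsilon_greedy(s):
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def env_step(s, a):
    # Placeholder dynamics: move on a ring, goal at the last state.
    s2 = (s + (a - 1)) % n_states
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r, s2 == n_states - 1

for episode in range(50):
    e = np.zeros_like(Q)             # eligibility traces, reset per episode
    s, done = 0, False
    a = epsilon_greedy(s)
    while not done:
        s2, r, done = env_step(s, a)
        a2 = epsilon_greedy(s2)
        # On-policy TD error: bootstrap from the action actually chosen next.
        delta = r + gamma * Q[s2, a2] * (not done) - Q[s, a]
        e[s, a] += 1.0               # accumulating trace for the visited pair
        Q += alpha * delta * e       # credit all recently visited pairs
        e *= gamma * lam             # decay every trace
        s, a = s2, a2
```

The trace vector `e` is what distinguishes SARSA(λ) from one-step SARSA: each TD error updates every recently visited state-action pair in proportion to its decayed eligibility, rather than only the most recent one.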
Cite this article
Pakizeh, E., Pedram, M.M. & Palhang, M. Multi-criteria expertness based cooperative method for SARSA and eligibility trace algorithms. Appl Intell 43, 487–498 (2015). https://doi.org/10.1007/s10489-015-0665-y