
Multi-criteria expertness based cooperative method for SARSA and eligibility trace algorithms


Abstract

Temporal difference learning and eligibility traces are among the most common approaches to solving reinforcement learning problems. However, except in the case of Q-learning, there are no studies on using these two approaches in a cooperative multi-agent learning setting. This paper addresses this shortcoming by using temporal difference learning and eligibility traces as the core learning methods in multi-criteria expertness-based cooperative learning (MCE). Experiments on a sample maze world present an empirical study of temporal difference and eligibility trace methods in an MCE-based cooperative learning setting.
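To make the combination concrete, the following is a minimal Python sketch, not the authors' implementation: tabular SARSA(lambda) agents (temporal difference learning with accumulating eligibility traces) that blend their Q-tables by expertness-weighted averaging. The maze environment interface, the single expertness criterion (cumulative reward, standing in for the paper's multiple criteria), the helper share_q_tables, and all hyperparameter values are illustrative assumptions.

import numpy as np

class SarsaLambdaAgent:
    """Tabular SARSA(lambda) with accumulating eligibility traces."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 lam=0.9, epsilon=0.1, seed=None):
        self.Q = np.zeros((n_states, n_actions))   # action-value table
        self.e = np.zeros_like(self.Q)             # eligibility traces
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.rng = np.random.default_rng(seed)
        self.expertness = 0.0                      # assumed criterion: cumulative reward

    def act(self, s):
        # Epsilon-greedy action selection (on-policy, as SARSA requires).
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s_next, a_next, done):
        # One SARSA(lambda) step: compute the TD error, then credit every
        # eligible state-action pair in proportion to its trace.
        target = r if done else r + self.gamma * self.Q[s_next, a_next]
        delta = target - self.Q[s, a]
        self.e[s, a] += 1.0                        # accumulating trace
        self.Q += self.alpha * delta * self.e
        self.e *= self.gamma * self.lam            # decay all traces
        if done:
            self.e[:] = 0.0                        # traces do not cross episodes
        self.expertness += r

def share_q_tables(agents):
    # Expertness-weighted blending of Q-tables, a single-criterion stand-in
    # for the paper's multi-criteria weighting. Expertness values are shifted
    # to be non-negative so the least expert agent still contributes slightly.
    exp = np.array([ag.expertness for ag in agents])
    w = exp - exp.min() + 1e-6
    w /= w.sum()
    blended = sum(wi * ag.Q for wi, ag in zip(w, agents))
    for ag in agents:
        ag.Q = blended.copy()

In an MCE-style run, each agent would learn individually for a number of trials and then invoke share_q_tables to combine knowledge before continuing, so that more expert agents influence the shared table more strongly.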




Author information


Corresponding author

Correspondence to Mir Mohsen Pedram.


About this article


Cite this article

Pakizeh, E., Pedram, M.M. & Palhang, M. Multi-criteria expertness based cooperative method for SARSA and eligibility trace algorithms. Appl Intell 43, 487–498 (2015). https://doi.org/10.1007/s10489-015-0665-y
