
Accelerating autonomous learning by using heuristic selection of actions


Abstract

This paper investigates how to improve action selection for online policy learning in robotic scenarios using reinforcement learning (RL) algorithms. Since finding control policies with any RL algorithm can be very time consuming, we propose combining RL algorithms with heuristic functions that select promising actions during the learning process. To this end, we investigate the use of heuristics to increase the rate of convergence of RL algorithms and contribute a new learning algorithm, Heuristically Accelerated Q-learning (HAQL), which incorporates heuristics for action selection into the Q-learning algorithm. Experimental results on robot navigation show that even very simple heuristic functions yield a significant improvement in the learning rate.
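To make the mechanism concrete, below is a minimal, hypothetical sketch (in Python, on a toy grid-navigation task) of heuristic-biased action selection layered on tabular Q-learning. It assumes a greedy rule of the form argmax_a [Q(s, a) + ξ·H(s, a)], in the spirit of HAQL; all names, constants, and the distance-based heuristic are illustrative assumptions, not the paper's exact formulation.

```python
import random
from collections import defaultdict

# Illustrative constants, not the paper's settings.
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # learning rate, discount, exploration
XI = 1.0                                 # weight of the heuristic at selection time

GRID = 10                                # toy 10x10 grid, goal in a corner
GOAL = (GRID - 1, GRID - 1)
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

Q = defaultdict(float)                   # Q[(state, action)] -> value estimate

def step(state, action):
    """One deterministic grid move, clipped at the walls."""
    dx, dy = ACTIONS[action]
    return (min(max(state[0] + dx, 0), GRID - 1),
            min(max(state[1] + dy, 0), GRID - 1))

def heuristic(state, action):
    """Toy heuristic (an assumption): bonus for actions that reduce the
    Manhattan distance to the goal. The abstract's point is that even
    crude hints like this can speed up convergence noticeably."""
    def dist(s):
        return abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])
    return 1.0 if dist(step(state, action)) < dist(state) else 0.0

def select_action(state):
    """Heuristically biased epsilon-greedy choice: the heuristic enters
    only here, steering the greedy pick toward promising actions, while
    the Q-learning update below stays untouched."""
    if random.random() < EPSILON:
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: Q[(state, a)] + XI * heuristic(state, a))

def run_episode():
    state = (0, 0)
    while state != GOAL:
        action = select_action(state)
        nxt = step(state, action)
        reward = 1.0 if nxt == GOAL else 0.0
        # Standard Q-learning update; the heuristic plays no part here.
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

for _ in range(200):
    run_episode()
```

Because the heuristic only biases which actions are tried, a misleading heuristic costs extra exploration but leaves the Q-learning update itself, and hence the policy it converges to, unchanged.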



Author information


Corresponding author

Correspondence to Reinaldo A. C. Bianchi.


About this article

Cite this article

Bianchi, R.A.C., Ribeiro, C.H.C. & Costa, A.H.R. Accelerating autonomous learning by using heuristic selection of actions. J Heuristics 14, 135–168 (2008). https://doi.org/10.1007/s10732-007-9031-5


Keywords

Navigation