
Accelerating autonomous learning by using heuristic selection of actions


Abstract

This paper investigates how to improve action selection for online policy learning in robotic scenarios using reinforcement learning (RL) algorithms. Since finding control policies with any RL algorithm can be very time consuming, we propose combining RL algorithms with heuristic functions that select promising actions during the learning process. To this end, we investigate the use of heuristics to increase the rate of convergence of RL algorithms and contribute a new learning algorithm, Heuristically Accelerated Q-learning (HAQL), which incorporates heuristics for action selection into the Q-learning algorithm. Experimental results on robot navigation show that even very simple heuristic functions yield a significant improvement in the learning rate.
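To make the mechanism concrete, below is a minimal, hypothetical sketch (in Python, on a toy grid-navigation task) of heuristic-biased action selection layered on tabular Q-learning. It assumes a greedy rule of the form argmax_a [Q(s, a) + ξ·H(s, a)], in the spirit of HAQL; all names, constants, and the distance-based heuristic are illustrative assumptions, not the paper's exact formulation.

```python
import random
from collections import defaultdict

# Illustrative constants, not the paper's settings.
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # learning rate, discount, exploration
XI = 1.0                                 # weight of the heuristic at selection time

GRID = 10                                # toy 10x10 grid, goal in a corner
GOAL = (GRID - 1, GRID - 1)
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

Q = defaultdict(float)                   # Q[(state, action)] -> value estimate

def step(state, action):
    """One deterministic grid move, clipped at the walls."""
    dx, dy = ACTIONS[action]
    return (min(max(state[0] + dx, 0), GRID - 1),
            min(max(state[1] + dy, 0), GRID - 1))

def heuristic(state, action):
    """Toy heuristic (an assumption): bonus for actions that reduce the
    Manhattan distance to the goal. The abstract's point is that even
    crude hints like this can speed up convergence noticeably."""
    def dist(s):
        return abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])
    return 1.0 if dist(step(state, action)) < dist(state) else 0.0

def select_action(state):
    """Heuristically biased epsilon-greedy choice: the heuristic enters
    only here, steering the greedy pick toward promising actions, while
    the Q-learning update below stays untouched."""
    if random.random() < EPSILON:
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: Q[(state, a)] + XI * heuristic(state, a))

def run_episode():
    state = (0, 0)
    while state != GOAL:
        action = select_action(state)
        nxt = step(state, action)
        reward = 1.0 if nxt == GOAL else 0.0
        # Standard Q-learning update; the heuristic plays no part here.
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

for _ in range(200):
    run_episode()
```

Because the heuristic only biases which actions are tried, a misleading heuristic costs extra exploration but leaves the Q-learning update itself, and hence the policy it converges to, unchanged.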



Author information


Corresponding author

Correspondence to Reinaldo A. C. Bianchi.


About this article

Cite this article

Bianchi, R.A.C., Ribeiro, C.H.C. & Costa, A.H.R. Accelerating autonomous learning by using heuristic selection of actions. J Heuristics 14, 135–168 (2008). https://doi.org/10.1007/s10732-007-9031-5


Keywords

Navigation