Abstract
We study task sequences that allow for speeding up the learner's average reward intake through appropriate shifts of inductive bias (changes of the learner's policy). To evaluate the long-term effects of bias shifts that set the stage for later bias shifts, we use the “success-story algorithm” (SSA). SSA is occasionally called at times that may depend on the policy itself. It uses backtracking to undo those bias shifts that have not been empirically observed to trigger long-term reward accelerations (measured up until the current SSA call). Bias shifts that survive SSA represent a lifelong success history. Until the next SSA call, they are considered useful and form the basis for additional bias shifts. SSA allows for plugging in a wide variety of learning algorithms. We plug in (1) a novel, adaptive extension of Levin search and (2) a method for embedding the learner's policy modification strategy within the policy itself (incremental self-improvement). Our inductive transfer case studies involve complex, partially observable environments where traditional reinforcement learning fails.
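To make the backtracking step concrete, the following Python sketch shows one SSA call. It is a minimal illustration under our own assumptions, not the authors' implementation: each stack entry is a hypothetical tuple (t_i, r_i, undo) holding the time of a bias shift, the cumulative reward at that time, and a closure that reverts that shift's policy modifications.

```python
# Minimal sketch of one SSA call. Assumptions (hypothetical, for
# illustration only): stack entries are tuples (t_i, r_i, undo) with
# t_i < now and now > 0; `undo` reverts the corresponding bias shift's
# policy modifications.

def ssc_holds(stack, now, total_reward):
    """Success-story criterion: reward intake per time since each
    surviving bias shift must strictly increase from the bottom of the
    stack to the top, starting from the lifelong average."""
    rates = [total_reward / now]  # lifelong average reward as baseline
    rates += [(total_reward - r_i) / (now - t_i) for t_i, r_i, _ in stack]
    return all(a < b for a, b in zip(rates, rates[1:]))

def ssa_call(stack, now, total_reward):
    """Backtrack: undo the most recent bias shifts until the remaining
    stack satisfies the success-story criterion."""
    while stack and not ssc_holds(stack, now, total_reward):
        _, _, undo = stack.pop()
        undo()  # revert this shift's policy modifications
```

Because shifts are undone from the top of the stack downward until the criterion holds for everything that remains, each surviving entry marks the start of a period whose reward per time exceeded that of all earlier surviving periods, i.e., the lifelong success history mentioned above.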
Cite this article
Schmidhuber, J., Zhao, J. & Wiering, M. Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement. Machine Learning 28, 105–130 (1997). https://doi.org/10.1023/A:1007383707642