
2016 | Book

Artificial General Intelligence

9th International Conference, AGI 2016, New York, NY, USA, July 16-19, 2016, Proceedings

About this book

This book constitutes the refereed proceedings of the 9th International Conference on Artificial General Intelligence, AGI 2016, held in New York City, NY, USA, in July 2016 as part of HLAI 2016, the Joint Multi-Conference on Human-Level Artificial Intelligence 2016.

The 24 full papers, 2 short papers, and 10 poster papers presented were carefully reviewed and selected from 67 submissions. AGI research differs from ordinary AI research by stressing the versatility and wholeness of intelligence, and by carrying out the engineering practice according to an outline of a system comparable to the human mind in a certain sense.

Table of Contents

Frontmatter
Self-Modification of Policy and Utility Function in Rational Agents

Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers) will in principle have the ability to self-modify – for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby ‘escaping’ the control of their creators. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and uses the current utility function when evaluating the future.
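
A schematic way to write that condition (notation chosen here for illustration, not taken from the paper): the agent's value function should score future histories with the current utility function $$u_t$$, even though its future selves may carry a modified utility $$u_k$$.

$$V_t^{\pi}(h_{<t}) \;=\; \mathbb{E}\Big[\sum_{k \ge t} u_t(h_{1:k})\Big] \qquad\text{rather than}\qquad \mathbb{E}\Big[\sum_{k \ge t} u_k(h_{1:k})\Big].$$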

Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter
Avoiding Wireheading with Value Reinforcement Learning

How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward – the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading.

Tom Everitt, Marcus Hutter
Death and Suicide in Universal Artificial Intelligence

Reinforcement learning (RL) is a general paradigm for studying intelligent behaviour, with applications ranging from artificial intelligence to psychology and economics. AIXI is a universal solution to the RL problem; it can learn any computable environment. A technical subtlety of AIXI is that it is defined using a mixture over semimeasures that need not sum to 1, rather than over proper probability measures. In this work we argue that the shortfall of a semimeasure can naturally be interpreted as the agent’s estimate of the probability of its death. We formally define death for generally intelligent agents like AIXI, and prove a number of related theorems about their behaviour. Notable discoveries include that agent behaviour can change radically under positive linear transformations of the reward signal (from suicidal to dogmatically self-preserving), and that the agent’s posterior belief that it will survive increases over time.
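
As a rough illustration of the semimeasure interpretation (our notation, not the paper's): if $$\xi$$ is the agent's mixture over semimeasures and $$\mathcal{E}$$ the set of possible next percepts, the shortfall of $$\xi$$ after a history $$h$$ can be read as the agent's subjective probability of dying at that point.

$$P_\xi(\text{death} \mid h) \;=\; 1 - \sum_{e \in \mathcal{E}} \xi(e \mid h).$$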

Jarryd Martin, Tom Everitt, Marcus Hutter
Ultimate Intelligence Part II: Physical Complexity and Limits of Inductive Inference Systems

We continue our analysis of volume and energy measures that are appropriate for quantifying inductive inference systems. We extend logical depth and conceptual jump size measures in AIT to stochastic problems, and physical measures that involve volume and energy. We introduce a graphical model of computational complexity that we believe to be appropriate for intelligent machines. We show several asymptotic relations between energy, logical depth and volume of computation for inductive inference. In particular, we arrive at a “black-hole equation” of inductive inference, which relates energy, volume, space, and algorithmic information for an optimal inductive inference solution. We introduce energy-bounded algorithmic entropy. We briefly apply our ideas to the physical limits of intelligent computation in our universe.

Eray Özkural
Open-Ended Intelligence
On the Role of Individuation in AGI

We offer a novel theoretical approach to AGI. Starting with a brief introduction of the current conceptual approach, our critique exposes limitations in the ontological roots of the concept of intelligence. We propose a paradigm shift from intelligence perceived as a competence of individual agents defined in relation to an a priori given problem or a goal, to intelligence perceived as a formative process of self-organization by which intelligent agents are individuated. We call this process Open-ended intelligence. This paradigmatic shift significantly extends the concept of intelligence beyond its current definitions and overcomes the difficulties exposed in the critique. Open-ended intelligence is developed as an abstraction of the process of cognitive development so its application can be extended to general agents and systems. We show how open-ended intelligence can be framed in terms of a distributed, self-organizing scalable network of interacting elements.

David (Weaver) Weinbaum, Viktoras Veitas
The AGI Containment Problem

There is considerable uncertainty about what properties, capabilities and motivations future AGIs will have. In some plausible scenarios, AGIs may pose security risks arising from accidents and defects. In order to mitigate these risks, prudent early AGI research teams will perform significant testing on their creations before use. Unfortunately, if an AGI has human-level or greater intelligence, testing itself may not be safe; some natural AGI goal systems create emergent incentives for AGIs to tamper with their test environments, make copies of themselves on the internet, or convince developers and operators to do dangerous things. In this paper, we survey the AGI containment problem – the question of how to build a container in which tests can be conducted safely and reliably, even on AGIs with unknown motivations and capabilities that could be dangerous. We identify requirements for AGI containers, available mechanisms, and weaknesses that need to be addressed.

James Babcock, János Kramár, Roman Yampolskiy
Imitation Learning as Cause-Effect Reasoning

We propose a framework for general-purpose imitation learning centered on cause-effect reasoning. Our approach infers a hierarchical representation of a demonstrator’s intentions, which can explain why they acted as they did. This enables rapid generalization of the observed actions to new situations. We employ a novel causal inference algorithm with formal guarantees and connections to automated planning. Our approach is implemented and validated empirically using a physical robot, which successfully generalizes skills involving bimanual manipulation of composite objects in 3D. These results suggest that cause-effect reasoning is an effective unifying principle for cognitive-level imitation learning.

Garrett Katz, Di-Wei Huang, Rodolphe Gentili, James Reggia
Some Theorems on Incremental Compression

The ability to induce short descriptions of, i.e. to compress, a wide class of data is essential for any system exhibiting general intelligence. In all generality, it is proven that incremental compression – extracting features of data strings and continuing to compress the residual data variance – leads to a time complexity superior to universal search if the strings are incrementally compressible. It is further shown that such a procedure breaks up the shortest description into a set of pairwise orthogonal features in terms of algorithmic information.
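
A toy sketch of the incremental scheme just described (illustrative only; `find_feature` is a hypothetical helper, not part of the paper): each round extracts one feature f and a residual r with f(r) equal to the current data, then keeps compressing the residual.

```python
# Toy sketch of incremental compression: peel off one "feature" at a time
# and continue compressing the residual, as described in the abstract above.
def incremental_compress(data, find_feature):
    """find_feature(data) -> (f, r) with f(r) == data, or None (assumed helper)."""
    description = []          # accumulated features, outermost first
    residual = data
    while True:
        step = find_feature(residual)
        if step is None:
            break             # no further compressible structure found
        f, r = step
        assert f(r) == residual
        description.append(f)
        residual = r          # keep compressing the residual
    return description, residual   # data == f1(f2(...fk(residual)))
```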

Arthur Franz
Rethinking Sigma’s Graphical Architecture: An Extension to Neural Networks

The status of Sigma’s grounding in graphical models is challenged by the ways in which their semantics has been violated while incorporating rule-based reasoning into them. This has led to a rethinking of what goes on in its graphical architecture, with results that include a straightforward extension to feedforward neural networks (although not yet with learning).

Paul S. Rosenbloom, Abram Demski, Volkan Ustun
Real-Time GA-Based Probabilistic Programming in Application to Robot Control

We consider the possibility of solving the problem of planning and plan recovery for robots using probabilistic programming with optimization queries, which is being developed as a framework for AGI and cognitive architectures. Planning can be done directly by introducing a generative model for plans and optimizing an objective function calculated via plan simulation. Plan recovery is achieved almost without modifying the optimization queries: they are simply executed in parallel with the robot's plan execution, meaning that they continuously optimize dynamically varying objective functions, tracking their optima. Experiments with the NAO robot showed that replanning can be done naturally within this approach without developing special plan recovery methods.

Alexey Potapov, Sergey Rodionov, Vita Potapova
About Understanding

The concept of understanding is commonly used in everyday communications, and seems to lie at the heart of human intelligence. However, no concrete theory of understanding has been fielded as of yet in artificial intelligence (AI), and references on this subject are far from abundant in the research literature. We contend that the ability of an artificial system to autonomously deepen its understanding of phenomena in its surroundings must be part of any system design targeting general intelligence. We present a theory of pragmatic understanding, discuss its implications for architectural design and analyze the behavior of an intelligent agent implementing the theory. Our agent learns to understand how to perform multimodal dialogue with humans through observation, becoming capable of constructing sentences with complex grammar, generating proper question-answer patterns, correctly resolving and generating anaphora with coordinated deictic gestures, producing efficient turntaking, and following the structure of interviews, without any information on this being provided up front.

Kristinn R. Thórisson, David Kremelberg, Bas R. Steunebrink, Eric Nivel
Why Artificial Intelligence Needs a Task Theory
And What It Might Look Like

The concept of “task” is at the core of artificial intelligence (AI): Tasks are used for training and evaluating AI systems, which are built in order to perform and automatize tasks we deem useful. In other fields of engineering, theoretical foundations allow thorough evaluation of designs by methodical manipulation of well-understood parameters with a known role and importance; this allows an aeronautics engineer, for instance, to systematically assess the effects of wind speed on an airplane’s performance and stability. No framework exists in AI that allows this kind of methodical manipulation: performance results on the few tasks in current use (cf. board games, question-answering) cannot be easily compared, however similar or different those tasks may be. The issue is even more acute with respect to artificial general intelligence systems, which must handle unanticipated tasks whose specifics cannot be known beforehand. A task theory would enable addressing tasks at the class level, bypassing their specifics, providing the appropriate formalization and classification of tasks, environments, and their parameters, and resulting in more rigorous ways of measuring, comparing, and evaluating intelligent behavior. Even modest improvements in this direction would surpass the current ad hoc nature of machine learning and AI evaluation. Here we discuss the main elements of the argument for a task theory and present an outline of what it might look like for physical tasks.

Kristinn R. Thórisson, Jordi Bieger, Thröstur Thorarensen, Jóna S. Sigurðardóttir, Bas R. Steunebrink
Growing Recursive Self-Improvers

Research into the capability of recursive self-improvement typically only considers pairs of ⟨agent, self-modification candidate⟩, and asks whether the agent can determine/prove if the self-modification is beneficial and safe. But this leaves out the much more important question of how to come up with a potential self-modification in the first place, as well as how to build an AI system capable of evaluating one. Here we introduce a novel class of AI systems, called experience-based AI (EXPAI), which trivializes the search for beneficial and safe self-modifications. Instead of distracting us with proof-theoretical issues, EXPAI systems force us to consider their education in order to control a system’s growth towards a robust and trustworthy, benevolent and well-behaved agent. We discuss what a practical instance of EXPAI looks like and build towards a “test theory” that allows us to gauge an agent’s level of understanding of educational material.

Bas R. Steunebrink, Kristinn R. Thórisson, Jürgen Schmidhuber
Different Conceptions of Learning: Function Approximation vs. Self-Organization

This paper compares two understandings of “learning” in the context of AGI research: algorithmic learning that approximates an input/output function according to given instances, and inferential learning that organizes various aspects of the system according to experience. The former is how “learning” is often interpreted in the machine learning community, while the latter is exemplified by the AGI system NARS. This paper describes the learning mechanism of NARS, and contrasts it with canonical machine learning algorithms. It is concluded that inferential learning is arguably more fundamental for AGI systems.

Pei Wang, Xiang Li
The Emotional Mechanisms in NARS

This paper explains the conceptual design and experimental implementation of the components of NARS that are directly related to emotion. It is argued that emotion is necessary for an AGI system that has to work with insufficient knowledge and resources. This design is also compared to other approaches in AGI research, as well as to relevant aspects of the human brain.

Pei Wang, Max Talanov, Patrick Hammer
The OpenNARS Implementation of the Non-Axiomatic Reasoning System

This paper describes the implementation of a Non-Axiomatic Reasoning System (NARS), a unified AGI system which works under the assumption of insufficient knowledge and resources (AIKR). The system’s architecture, memory structure, inference engine, and control mechanism are described in detail.

Patrick Hammer, Tony Lofthouse, Pei Wang
Integrating Symbolic and Sub-symbolic Reasoning

This paper proposes a way of bridging the gap between symbolic and sub-symbolic reasoning. More precisely, it describes a developing system with bounded rationality that bases its decisions on sub-symbolic as well as symbolic reasoning. The system has a fixed set of needs and its sole goal is to stay alive as long as possible by satisfying those needs. It operates without pre-programmed knowledge of any kind. The learning mechanism consists of several meta-rules that govern the development of its network-based memory structure. The decision making mechanism operates under time constraints and combines symbolic reasoning, aimed at compressing information, with sub-symbolic reasoning, aimed at planning.

Claes Strannegård, Abdul Rahim Nizamani
Integrating Axiomatic and Analogical Reasoning

We present a computational model of a developing system with bounded rationality that is surrounded by an arbitrary number of symbolic domains. The system is fully automatic and makes continuous observations of facts emanating from those domains. The system starts from scratch and gradually evolves a knowledge base consisting of three parts: (1) a set of beliefs for each domain, (2) a set of rules for each domain, and (3) an analogy for each pair of domains. The learning mechanism for updating the knowledge base uses rote learning, inductive learning, analogy discovery, and belief revision. The reasoning mechanism combines axiomatic reasoning for drawing conclusions inside the domains, with analogical reasoning for transferring knowledge from one domain to another. Thus the reasoning processes may use analogies to jump back and forth between domains.

Claes Strannegård, Abdul Rahim Nizamani, Ulf Persson
Embracing Inference as Action: A Step Towards Human-Level Reasoning

Human-level AI involves the ability to reason about the beliefs of other agents, even when those other agents have reasoning styles that may be very different from the AI’s. The ability to carry out reasonable inferences in such situations, as well as in situations where an agent must reason about another agent’s beliefs about yet another agent, is under-studied. We show how such reasoning can be carried out in a new variant of the cognitive event calculus we call $\mathcal{CEC}_\mathtt{AC}$, by introducing several new powerful features for automated reasoning: first, the implementation of classical logic at the “system-level” and nonclassical logics at the “belief-level”; second, $\mathcal{CEC}_\mathtt{AC}$ treats all inferences made by agents as actions. This opens the door for two additional features: epistemic boxes, which are a sort of frame in which the reasoning of an individual agent can be simulated, and evaluated codelets, which allow our reasoner to carry out operations beyond the limits of many current systems. We explain how these features are achieved and implemented in the MATR reasoning system, and discuss their consequences.

John Licato, Maxwell Fowler
Asymptotic Logical Uncertainty and the Benford Test

Almost all formal theories of intelligence suffer from the problem of logical omniscience, the assumption that an agent already knows all consequences of its beliefs. Logical uncertainty codifies uncertainty about the consequences of existing beliefs. This implies a departure from beliefs governed by standard probability theory. Here, we study the asymptotic properties of beliefs on quickly computable sequences of logical sentences. Motivated by an example we call the Benford test, we provide an approach which identifies when such subsequences are indistinguishable from random, and learns their probabilities.
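
For a concrete feel of the kind of sequence involved (our example, not the paper's): Benford's law assigns leading digit $$d$$ probability $$\log_{10}(1 + 1/d)$$, and the leading digits of $$3^n$$ form a quickly computable sequence that is empirically indistinguishable from this distribution.

```python
import math
from collections import Counter

# Benford's law: P(leading digit = d) = log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Leading digits of 3^n: quickly computable, yet empirically Benford-distributed.
digits = [int(str(3 ** n)[0]) for n in range(1, 10001)]
freq = Counter(digits)
for d in range(1, 10):
    print(d, round(freq[d] / len(digits), 3), round(benford[d], 3))
```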

Scott Garrabrant, Tsvi Benson-Tilsen, Siddharth Bhaskar, Abram Demski, Joanna Garrabrant, George Koleszarik, Evan Lloyd
Towards a Computational Framework for Function-Driven Concept Invention

We propose a novel framework for computational concept invention. As opposed to recent implementations of Fauconnier’s and Turner’s Conceptual Blending Theory, our framework simplifies computational concept invention by focusing on concepts’ functions rather than on structural similarity of concept descriptions. Even though creating an optimal combination of concepts that achieves the desired functions is NP-complete in general, some interesting special cases are tractable.

Nico Potyka, Danny Gómez-Ramírez, Kai-Uwe Kühnberger
System Induction Games and Cognitive Modeling as an AGI Methodology

We propose a methodology for using human cognition as a template for artificial generally intelligent agents that learn from experience. In particular, we consider the problem of learning certain Mealy machines from observations of their behavior; this is a general but conceptually simple learning task that can be given to humans as well as machines. We illustrate by example the sorts of observations that can be gleaned from studying human performance on this task.
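
For readers unfamiliar with the formalism, a Mealy machine is a finite-state transducer whose output depends on the current state and the current input; the machine below is an illustrative example of the kind of system such induction games ask humans or machines to infer from input/output behaviour, not one taken from the paper.

```python
# Minimal Mealy machine: output is a function of (current state, current input).
class MealyMachine:
    def __init__(self, transitions, start):
        # transitions: {(state, input): (next_state, output)}
        self.transitions = transitions
        self.state = start

    def step(self, symbol):
        self.state, out = self.transitions[(self.state, symbol)]
        return out

# Example machine: outputs 1 exactly when the current input equals the previous one.
m = MealyMachine({('a', 0): ('z', 0), ('a', 1): ('o', 0),
                  ('z', 0): ('z', 1), ('z', 1): ('o', 0),
                  ('o', 0): ('z', 0), ('o', 1): ('o', 1)}, start='a')
print([m.step(x) for x in [0, 0, 1, 1, 0]])  # -> [0, 1, 0, 1, 0]
```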

Sean Markan
Integrating Model-Based Prediction and Facial Expressions in the Perception of Emotion

Understanding a person’s mental state is a key challenge to the design of Artificial General Intelligence (AGI) that can interact with people. A range of technologies have been developed to infer a user’s emotional state from facial expressions. Such bottom-up approaches confront several problems, including that there are significant individual and cultural differences in how people display emotions. More fundamentally, in many applications we may want to know other mental states such as goals and beliefs that can be critical for effective interaction with a person. Instead of bottom-up processing of facial expressions, in this work, we take a predictive, Bayesian approach. An observer agent uses mental models of an observed agent’s goals to predict how the observed will react emotionally to an event. These predictions are then integrated with the observer’s perceptions of the observed agent’s expressions, as provided by a perceptual model of how the observed tends to display emotions. This integration provides the interpretation of the emotion displayed while also updating the observer’s mental and emotional display models of the observed. Thus perception, mental model and display model are integrated into a single process. We provide a simulation study to initially test the effectiveness of the approach and discuss future work in testing the approach in interactions with people.
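
In schematic form (our notation, not the paper's), the integration amounts to a Bayesian update in which the mental model supplies the prior over the observed agent's emotion $$e$$ given the event, and the display model supplies the likelihood of the observed facial expression:

$$P(e \mid \text{event}, \text{expression}) \;\propto\; P(\text{expression} \mid e)\; P(e \mid \text{event}).$$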

Nutchanon Yongsatianchot, Stacy Marsella
A Few Notes on Multiple Theories and Conceptual Jump Size

These are a few notes about some of Ray Solomonoff’s foundational work in algorithmic probability, focusing on the universal prior and conceptual jump size, including a few illustrations of how he thought. His induction theory gives a way to compare the likelihood of different theories describing observations. He used Bayes’ rule of causation to discard theories inconsistent with the observations. Can we find good theories? Lsearch may give a way to search for them, and the conceptual jump size a measure for this search.

Grace Solomonoff
Generalized Temporal Induction with Temporal Concepts in a Non-axiomatic Reasoning System

The introduction of Temporal Concepts into a Syllogistic based reasoning system such as NARS (Non-Axiomatic Reasoning System) provides a generalized temporal induction capability and extends the meaning of semantic relationship to include temporality.

Tony Lofthouse, Patrick Hammer
Introspective Agents: Confidence Measures for General Value Functions

Agents of general intelligence deployed in real-world scenarios must adapt to ever-changing environmental conditions. While such adaptive agents may leverage engineered knowledge, they will require the capacity to construct and evaluate knowledge themselves from their own experience in a bottom-up, constructivist fashion. This position paper builds on the idea of encoding knowledge as temporally extended predictions through the use of general value functions. Prior work has focused on learning predictions about externally derived signals about a task or environment (e.g. battery level, joint position). Here we advocate that the agent should also predict internally generated signals regarding its own learning process—for example, an agent’s confidence in its learned predictions. Finally, we suggest how such information would be beneficial in creating an introspective agent that is able to learn to make good decisions in a complex, changing world.
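
A minimal sketch of the ingredients involved (not the paper's method): a general value function learned by linear TD(0), with a crude "confidence" signal derived from a running average of the squared TD error on the agent's own prediction.

```python
import numpy as np

class GVF:
    """General value function learned by linear TD(0), plus a simple
    confidence proxy based on recent squared TD errors (illustrative only)."""
    def __init__(self, n_features, alpha=0.1, gamma=0.9, beta=0.01):
        self.w = np.zeros(n_features)   # prediction weights
        self.err_var = 1.0              # running mean of squared TD error
        self.alpha, self.gamma, self.beta = alpha, gamma, beta

    def update(self, x, cumulant, x_next):
        delta = cumulant + self.gamma * self.w @ x_next - self.w @ x
        self.w += self.alpha * delta * x
        self.err_var += self.beta * (delta ** 2 - self.err_var)

    def predict(self, x):
        return self.w @ x

    def confidence(self):
        # Higher when recent TD errors have been small.
        return 1.0 / (1.0 + self.err_var)
```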

Craig Sherstan, Adam White, Marlos C. Machado, Patrick M. Pilarski
Automatic Sampler Discovery via Probabilistic Programming and Approximate Bayesian Computation

We describe an approach to automatic discovery of samplers in the form of human-interpretable probabilistic programs. Specifically, we learn the procedure code of samplers for one-dimensional distributions. We formulate a Bayesian approach to this problem by specifying an adaptor grammar prior over probabilistic program code, and use approximate Bayesian computation to learn a program whose execution generates samples that match observed data or analytical characteristics of a distribution of interest. In our experiments we leverage the probabilistic programming system Anglican to perform Markov chain Monte Carlo sampling over the space of programs. Our results are competitive relative to state-of-the-art genetic programming methods and demonstrate that we can learn approximate and even exact samplers.

Yura Perov, Frank Wood
How Much Computation and Distributedness is Needed in Sequence Learning Tasks?

In this paper, we analyze how much computation and distributedness of representation is needed to solve sequence-learning tasks, which are essential for many artificial intelligence applications. We propose a novel minimal architecture based on cellular automata. The states of the cells are used as the reservoir of activities, as in Echo State Networks. The projection of the input onto this reservoir medium provides a systematic way of remembering previous inputs and combining that memory with a continuous stream of inputs. The proposed framework is tested on classical synthetic pathological tasks that are widely used in evaluating recurrent algorithms. We show that the proposed algorithm achieves zero error in all tasks, giving performance similar to Echo State Networks and even better in many respects. The comparative results in our experiments suggest that computing high-order attribute statistics and representing them in a distributed manner is essential, but that this can be done in a very simple network of cellular automata with identical binary units. This raises the question of whether real-valued neuron units are mandatory for solving complex problems that are distributed over time: even very sparsely connected binary units with simple computational rules can provide the computation required for intelligent behavior.
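
A toy sketch of the cellular-automaton-as-reservoir idea (illustrative only, with details such as rule 110, the injection cell, and the iteration count chosen here, not taken from the paper): the input bit is XORed into one cell, the elementary CA is iterated a few steps, and the concatenated binary states serve as the feature vector for a linear readout, as in Echo State Networks.

```python
import numpy as np

RULE = 110
TABLE = [(RULE >> i) & 1 for i in range(8)]  # next state per 3-cell neighborhood

def ca_step(state):
    left, right = np.roll(state, 1), np.roll(state, -1)
    idx = 4 * left + 2 * state + right
    return np.array([TABLE[i] for i in idx])

def reservoir_features(bits, width=64, iters=4):
    state = np.zeros(width, dtype=int)
    feats = []
    for b in bits:
        state[0] ^= b                 # inject the input bit into one cell
        for _ in range(iters):
            state = ca_step(state)
            feats.append(state.copy())
    return np.concatenate(feats)      # binary features for a linear readout

print(reservoir_features([1, 0, 1]).shape)   # (3 inputs * 4 iters * 64 cells,)
```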

Mrwan Margem, Ozgur Yilmaz
Analysis of Algorithms and Partial Algorithms

We present an alternative methodology for the analysis of algorithms, based on the concept of expected discounted reward. This methodology naturally handles algorithms that do not always terminate, so it can (theoretically) be used with partial algorithms for undecidable problems, such as those found in artificial general intelligence (AGI) and automated theorem proving. We mention an approach to self-improving AGI enabled by this methodology.
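
One way the discounted-reward view can be written (assuming, for illustration, a unit reward delivered at termination): if $$T$$ is the possibly infinite running time of the algorithm and $$\gamma \in (0,1)$$ the discount factor, runs that never terminate simply contribute nothing to the value.

$$V_\gamma \;=\; \mathbb{E}\big[\gamma^{T}\big] \;=\; \sum_{t=0}^{\infty} \gamma^{t}\, \Pr[T = t].$$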

Andrew MacFie
Estimating Cartesian Compression via Deep Learning

We introduce a learning architecture that can serve compression while it also satisfies the constraints of factored reinforcement learning. Our novel Cartesian factors enable one to decrease the number of variables relevant to the ongoing task, an exponential gain in the size of the state space. We demonstrate the workings, the limitations, and the promise of these abstractions: we develop a representation of space in allothetic coordinates from egocentric observations and argue that the lower-dimensional allothetic representation can be used for path planning. Our results on the learning of Cartesian factors indicate that (a) shallow autoencoders perform well in our numerical example and (b) if deeper networks are needed, e.g., for classification or regression, then sparsity should also be enforced at (some of) the intermediate layers.

András Lőrincz, András Sárkány, Zoltán Á. Milacski, Zoltán Tősér
A Methodology for the Assessment of AI Consciousness

The research and philosophical communities currently lack a clear way to quantify, measure, and characterize the degree of consciousness in a mind or AI entity. This paper addresses that gap by providing a numerical measure of consciousness. Implicit in our approach is a definition of consciousness itself. Underlying this is our assumption that consciousness is not a single unified characteristic but a constellation of features, mental abilities, and thought patterns. Although some people may experience their own consciousness as a unified whole, we assume that consciousness is a multi-dimensional set of attributes, each of which can be present to differing degrees in a given mind. These attributes can be measured and therefore the degree of consciousness can be quantified with a number, much as IQ attempts to quantify human intelligence.

Harry H. Porter III
Toward Human-Level Massively-Parallel Neural Networks with Hodgkin-Huxley Neurons

This paper describes neural network algorithms and software that scale up to massively parallel computers. The neuron model used is the best available at this time, the Hodgkin-Huxley equations. Most massively parallel simulations use very simplified neuron models, which cannot accurately simulate biological neurons and the wide variety of neuron types. Using C++ and MPI, we can scale these networks to human-level sizes. Computers such as the Chinese TianHe computer are capable of running human-level neural networks.
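
For reference, the standard Hodgkin-Huxley membrane equation that such simulations integrate for every neuron (textbook form, not specific to this paper):

$$C_m \frac{dV}{dt} = I_{\text{ext}} - \bar{g}_{\text{Na}}\, m^3 h\,(V - E_{\text{Na}}) - \bar{g}_{\text{K}}\, n^4 (V - E_{\text{K}}) - \bar{g}_{\text{L}}\,(V - E_{\text{L}}),$$

with each gating variable $$x \in \{m, h, n\}$$ following $$\dot{x} = \alpha_x(V)\,(1 - x) - \beta_x(V)\, x$$.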

Lyle N. Long
Modeling Neuromodulation as a Framework to Integrate Uncertainty in General Cognitive Architectures

One of the most critical properties of a versatile intelligent agent is its capacity to adapt autonomously to any change in the environment without overly complexifying its cognitive architecture. In this paper, we propose that understanding the role of neuromodulation in the brain is of central interest for this purpose. More precisely, we propose that an accurate estimation of the nature of uncertainty present in the environment is performed by specific brain regions and broadcast throughout the cerebral network by neuromodulators, resulting in appropriate changes in cerebral functioning and learning modes. Better understanding the principles of these mechanisms in the brain might tremendously inspire the field of Artificial General Intelligence. The original contribution of this paper is to relate the four major neuromodulators to four fundamental dimensions of uncertainty.

Frédéric Alexandre, Maxime Carrere
Controlling Combinatorial Explosion in Inference via Synergy with Nonlinear-Dynamical Attention Allocation

One of the core principles of the OpenCog AGI design, “cognitive synergy”, is exemplified by the synergy between logical reasoning and attention allocation. This synergy centers on a feedback in which nonlinear-dynamical attention-spreading guides logical inference control, and inference directs attention to surprising new conclusions it has created. In this paper we report computational experiments in which this synergy is demonstrated in practice, in the context of a very simple logical inference problem.

More specifically: first-order probabilistic inference generates conclusions, and its inference steps are pruned via the “short-term importance” (STI) attention values associated with the logical Atoms it manipulates. As inference generates conclusions, information theory is used to assess the surprisingness value of these conclusions, and the STI attention values of the Atoms representing the conclusions are updated accordingly. The result of this feedback is that meaningful conclusions are drawn after many fewer inference steps than would be the case without the introduction of attention allocation dynamics and feedback therewith.

This simple example demonstrates a cognitive dynamic that is hypothesized to be very broadly valuable for general intelligence.
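
A schematic sketch of that feedback loop (not OpenCog code; `infer` and `prob` are hypothetical helpers introduced here): inference is applied only to the currently most important atom, and the STI of each new conclusion is boosted in proportion to its surprisingness, measured as negative log probability.

```python
import heapq, itertools, math

def attention_guided_inference(atoms, sti, infer, prob, steps=100):
    """Pop the highest-STI atom, apply one inference step, and boost the STI
    of each conclusion by its surprisal so that surprising results are
    explored sooner (illustrative sketch only)."""
    tie = itertools.count()                        # tie-breaker for the heap
    frontier = [(-sti[a], next(tie), a) for a in atoms]
    heapq.heapify(frontier)
    for _ in range(steps):
        if not frontier:
            break
        _, _, atom = heapq.heappop(frontier)       # most important atom first
        for conclusion in infer(atom):             # one forward inference step
            surprise = -math.log(max(prob(conclusion), 1e-12))
            sti[conclusion] = sti.get(conclusion, 0.0) + surprise
            heapq.heappush(frontier, (-sti[conclusion], next(tie), conclusion))
    return sti
```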

Ben Goertzel, Misgana Bayetta Belachew, Matthew Ikle’, Gino Yu
Probabilistic Growth and Mining of Combinations: A Unifying Meta-Algorithm for Practical General Intelligence

A new conceptual framing of the notion of the general intelligence is outlined, in the form of a universal learning meta-algorithm called Probabilistic Growth and Mining of Combinations (PGMC). Incorporating ideas from logical inference systems, Solomonoff induction and probabilistic programming, PGMC is a probabilistic inference based framework which reflects processes broadly occurring in the natural world, is theoretically capable of arbitrarily powerful generally intelligent reasoning, and encompasses a variety of existing practical AI algorithms as special cases. Several ways of manifesting PGMC using the OpenCog AI framework are described. It is proposed that PGMC can be viewed as a core learning process serving as the central dynamic of real-world general intelligence; but that to achieve high levels of general intelligence using limited computational resources, it may be necessary for cognitive systems to incorporate multiple distinct structures and dynamics, each of which realizes this core PGMC process in a different way (optimized for some particular sort of sub-problem).

Ben Goertzel
Ideas for a Reinforcement Learning Algorithm that Learns Programs

Conventional reinforcement learning algorithms such as Q-learning are not good at learning complicated procedures or programs because they are not designed to do that. AIXI, which is a general framework for reinforcement learning, can learn programs as the environment model, but it is not computable. AIXI has a computable and computationally tractable approximation, MC-AIXI(FAC-CTW), but it models the environment not as programs but as a trie, and still has not resolved the trade-off between exploration and exploitation within a realistic amount of computation.

This paper presents our research idea for realizing an efficient reinforcement learning algorithm that retains the property of modeling the environment as programs. It also models the policy as programs and has the ability to imitate other agents in the environment. The design policy of the algorithm has two points: (1) the ability to program is indispensable for human-level intelligence, and (2) a realistic solution to the exploration/exploitation trade-off is teaching via imitation.

Susumu Katayama
Backmatter
Metadata
Title
Artificial General Intelligence
Edited by
Bas Steunebrink
Pei Wang
Ben Goertzel
Copyright Year
2016
Electronic ISBN
978-3-319-41649-6
Print ISBN
978-3-319-41648-9
DOI
https://doi.org/10.1007/978-3-319-41649-6
