The previous two chapters introduced three different methods that use inter-task mappings to transfer between tasks with different state variables and actions, but all three required that agents in the source and target tasks use the same type of underlying RL method.
1. Value Function Transfer (Section 5.2) used an action-value function learned in the source task to initialize an action-value function in the target task, with the requirement that both source and target task agents use value-function learning, such as Q-Learning or Sarsa.
2. Q-Value Reuse (Section 6.1) also required TD learners in the source and target tasks, but copied the entire source-task Q-value function and consulted it directly, rather than using it to initialize the target task's action-value function. Thus the target task agent must also use value-function learning.
3. Policy Transfer (Section 6.2) transferred policies between policy search methods that use neural network action selectors.
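To make the distinction between the first two methods concrete, the following is a minimal sketch of how an inter-task mapping can connect a source-task Q-table to a target task. All names (`transfer_q_values`, `q_value_reuse`, the toy states and mappings) are illustrative assumptions, not identifiers from the text: the first function mirrors Value Function Transfer by copying mapped Q-values into an initial target-task table, while the second mirrors Q-Value Reuse by keeping the source Q-function intact and adding it to the target's learned values at query time.

```python
# Sketch only: tabular stand-in for the two TD-based transfer methods.
# chi_s maps each target state to its source-task analogue; chi_a does
# the same for actions (the inter-task mappings).

def transfer_q_values(source_q, chi_s, chi_a, target_states, target_actions):
    """Value Function Transfer: initialize a target Q-table from the source."""
    target_q = {}
    for s in target_states:
        for a in target_actions:
            # Look up the analogous (state, action) pair in the source task.
            target_q[(s, a)] = source_q.get((chi_s[s], chi_a[a]), 0.0)
    return target_q

def q_value_reuse(source_q, learned_q, chi_s, chi_a, s, a):
    """Q-Value Reuse: the frozen source Q-function is added to the
    target task's separately learned Q-values, rather than copied."""
    return (source_q.get((chi_s[s], chi_a[a]), 0.0)
            + learned_q.get((s, a), 0.0))

# Toy example: a 2-state source task transferred to a 3-state target task.
source_q = {("s0", "left"): 1.0, ("s0", "right"): -0.5,
            ("s1", "left"): 0.2, ("s1", "right"): 0.8}
chi_s = {"t0": "s0", "t1": "s1", "t2": "s1"}   # target state -> source state
chi_a = {"L": "left", "R": "right"}            # target action -> source action

q_init = transfer_q_values(source_q, chi_s, chi_a,
                           ["t0", "t1", "t2"], ["L", "R"])
print(q_init[("t2", "R")])  # t2 maps to s1, R maps to right -> 0.8

# Q-Value Reuse keeps source_q fixed while learned_q starts empty and
# is updated by TD learning in the target task.
learned_q = {("t2", "R"): 0.1}
print(q_value_reuse(source_q, learned_q, chi_s, chi_a, "t2", "R"))  # 0.9
```

In the first case the target agent then refines the copied values with ordinary Q-Learning or Sarsa updates; in the second, only `learned_q` is updated while `source_q` remains a fixed additive term.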