1 Introduction
2 Learning method
2.1 REINFORCE, PGPE and EM-based algorithms
2.2 Proposed method
3 Experiments
3.1 Pendulum swing-up with limited torque
3.2 Cart-pole balancing
3.3 Two-wheeled smartphone robot
| EPHE | 90 % |
| PGPE | 60 % |
| Finite difference | 85 % |