
2019 | Book

Deep Reinforcement Learning

Frontiers of Artificial Intelligence


About this book

This book starts by presenting the basics of reinforcement learning using highly intuitive and easy-to-understand examples and applications, and then introduces the cutting-edge research advances that make reinforcement learning capable of outperforming most state-of-the-art systems, and even humans, in a number of applications. The book not only equips readers with an understanding of multiple advanced and innovative algorithms, but also prepares them to implement systems such as those created by Google DeepMind in actual code.

This book is intended for readers who want to both understand and apply advanced concepts in a field that combines the best of two worlds – deep learning and reinforcement learning – to tap the potential of ‘advanced artificial intelligence’ for creating real-world applications and game-winning algorithms.

Table of Contents

Frontmatter
1. Introduction to Reinforcement Learning
The Intelligence Behind the AI Agent
Abstract
In this chapter, we will discuss what Reinforcement Learning is and its relationship with Artificial Intelligence. We will then go deeper into the basic building blocks of Reinforcement Learning, such as the state, the actor, the environment, and the reward, and examine the challenges in each of these aspects through multiple examples, so that the intuition is well established and a solid foundation is built before we move on to more advanced topics. We will also discuss how the agent learns to take the best action and the policy by which it learns to do so. Finally, we will learn the difference between On-Policy and Off-Policy methods.
Mohit Sewak
2. Mathematical and Algorithmic Understanding of Reinforcement Learning
The Markov Decision Process and Solution Approaches
Abstract
In this chapter, we will discuss the Bellman Equation and the Markov Decision Process (MDP), which are the basis for almost all the approaches discussed later in the book. We will then discuss some foundational solution approaches for Reinforcement Learning, such as Dynamic Programming. It is imperative to understand these concepts before moving on to more advanced topics. Finally, we will cover algorithms such as value iteration and policy iteration for solving the MDP. The core equations are sketched below.
Mohit Sewak
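The central recursion this chapter builds on is the Bellman optimality equation, and value iteration applies it repeatedly as an update rule. As a minimal sketch in standard notation (a summary, not text from the book), with transition probabilities P, rewards R, and discount factor gamma:

V^*(s) = \max_a \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^*(s') \right]

V_{k+1}(s) \leftarrow \max_a \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V_k(s') \right]

Policy iteration alternates between evaluating a fixed policy (the same recursion without the max) and improving the policy greedily with respect to the evaluated values.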
3. Coding the Environment and MDP Solution
Coding the Environment, Value Iteration, and Policy Iteration Algorithms
Abstract
In this chapter, we will learn one of the most critical skills: coding our own environment for a Reinforcement Learning agent to train against. We will create an environment for the grid-world problem that is compatible with the OpenAI Gym environment interface, so that most out-of-the-box agents can also work with our environment. Next, we will implement the value iteration and policy iteration algorithms in code and make them work with our environment. A minimal environment sketch follows this entry.
Mohit Sewak
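As a minimal sketch of the kind of Gym-compatible environment the chapter describes (the class name, grid size, and reward scheme are illustrative assumptions, not the book's actual code), assuming the classic gym.Env reset/step interface:

import gym
from gym import spaces

class GridWorldEnv(gym.Env):
    """A tiny grid world: start at the top-left cell, reach the bottom-right goal."""

    def __init__(self, size=4):
        self.size = size
        self.observation_space = spaces.Discrete(size * size)  # one id per cell
        self.action_space = spaces.Discrete(4)                  # 0:up 1:right 2:down 3:left
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        row, col = divmod(self.state, self.size)
        if action == 0:
            row = max(row - 1, 0)
        elif action == 1:
            col = min(col + 1, self.size - 1)
        elif action == 2:
            row = min(row + 1, self.size - 1)
        else:
            col = max(col - 1, 0)
        self.state = row * self.size + col
        done = self.state == self.size * self.size - 1
        reward = 0.0 if done else -1.0   # step cost until the goal is reached
        return self.state, reward, done, {}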
4. Temporal Difference Learning, SARSA, and Q-Learning
Some Popular Value Approximation Based Reinforcement Learning Approaches
Abstract
In this chapter, we will discuss the very important Q-Learning algorithm, which is the basis of the Deep Q Networks (DQN) that we will discuss in later chapters. Q-Learning provides a solution to the control side of the Reinforcement Learning problem, while leaving the estimation side to the Temporal Difference (TD) Learning algorithm. Q-Learning solves the control problem in an off-policy manner; its counterpart, the SARSA algorithm, also uses TD Learning for estimation but solves the control problem in an on-policy manner. In this chapter, we cover the important concepts of TD Learning, SARSA, and Q-Learning. Since Q-Learning is an off-policy algorithm, its behavior policy differs from its estimation policy, so we will also cover epsilon-greedy and similar algorithms that help us explore different actions under an off-policy approach. The two update rules are contrasted below.
Mohit Sewak
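The distinction drawn here between the two control algorithms comes down to their update targets. In standard notation (a summary, not text from the book), with learning rate alpha and discount gamma:

Q-Learning (off-policy; the target uses the greedy action):
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

SARSA (on-policy; the target uses the action actually taken next):
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]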
5. Q-Learning in Code
Coding the Off-Policy Q-Learning Agent and Behavior Policy
Abstract
In this chapter, we will put what we learned about Q-Learning in the previous chapter into code. We will implement a Q-table-based off-policy Q-Learning agent class and, to complement it with a behavior policy, we will implement a separate behavior-policy class based on the epsilon-greedy algorithm. A minimal sketch of such a policy class follows.
Mohit Sewak
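A minimal sketch of an epsilon-greedy behavior-policy class of the kind described here; the class and method names are illustrative assumptions, not the book's code:

import numpy as np

class EpsilonGreedyPolicy:
    """Behavior policy: explore with probability epsilon, otherwise act greedily."""

    def __init__(self, n_actions, epsilon=0.1):
        self.n_actions = n_actions
        self.epsilon = epsilon

    def select_action(self, q_values_for_state):
        # Explore: pick a uniformly random action.
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        # Exploit: pick the action with the highest estimated Q-value.
        return int(np.argmax(q_values_for_state))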
6. Introduction to Deep Learning
Enter the World of Modern Machine Learning
Abstract
In this chapter, we will cover the essentials of Deep Learning to the extent required for this book. We will discuss the basic architecture of a deep learning network such as the MLP-DNN and its internal workings. Since many Reinforcement Learning algorithms work on game feeds that have image or video frames as input states, we will also cover CNNs, the deep learning networks for vision, in this chapter. A minimal MLP example follows this entry.
Mohit Sewak
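As a minimal sketch of the MLP-DNN architecture discussed here (the layer sizes and the four-feature input are illustrative assumptions), using the Keras Sequential API:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation="relu", input_shape=(4,)),  # hidden layer 1
    Dense(64, activation="relu"),                    # hidden layer 2
    Dense(2, activation="linear"),                   # output layer, e.g. one unit per action
])
model.compile(optimizer="adam", loss="mse")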
7. Implementation Resources
Training Environments and Agent Implementation Libraries
Abstract
In this chapter, we will discuss some of the resources available for building one's own reinforcement learning agent easily, or for implementing one with the least amount of code. We will also cover some standardized environments, platforms, and community leaderboards against which one can evaluate a custom agent's performance on different types of reinforcement learning tasks and challenges.
Mohit Sewak
8. Deep Q Network (DQN), Double DQN, and Dueling DQN
A Step Towards General Artificial Intelligence
Abstract
In this chapter, we will take our first step towards Deep Learning based Reinforcement Learning. We will discuss the very popular Deep Q Network (DQN) and its powerful variants, Double DQN and Dueling DQN. Extensive work has been done on these models, and they form the basis of some very popular applications like AlphaGo. We will also introduce the concept of General AI in this chapter and discuss how these Deep Reinforcement Learning models have been instrumental in inspiring hopes of achieving General AI. The DQN and Double DQN learning targets are contrasted below.
Mohit Sewak
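The difference between DQN and Double DQN can be summarized by their learning targets; in standard notation (a summary, not text from the book), with online parameters theta and target-network parameters theta^-:

DQN:        y_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-)

Double DQN: y_t = r_{t+1} + \gamma \, Q\left(s_{t+1}, \arg\max_{a'} Q(s_{t+1}, a'; \theta); \theta^-\right)

Double DQN selects the next action with the online network but evaluates it with the target network, which reduces the overestimation bias of the plain max operator.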
9. Double DQN in Code
Coding the DDQN with Epsilon-Decay Behavior Policy
Abstract
In this chapter, we will implement the Double DQN (DDQN) agent in code. Compared to a conventional DQN, the DDQN agent is more stable because it uses a dedicated target network which remains relatively fixed between updates. We also put into practice the MLP-DNN concepts we learned in Chap. 6, using Keras and TensorFlow for our deep learning models. We also use OpenAI Gym to instantiate standardized environments for training and testing our agents, and we use the CartPole environment from the Gym to train our model. A minimal sketch of the target-network synchronization follows.
Mohit Sewak
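A minimal sketch of the periodic target-network synchronization that keeps the DDQN stable; build_model and the update interval are illustrative placeholders, not the book's code:

# build_model() is a placeholder for constructing two Keras models with identical architecture.
online_model = build_model()   # updated on every training step
target_model = build_model()   # supplies bootstrap targets; updated only periodically

SYNC_EVERY = 1000              # illustrative interval, in training steps

def maybe_sync_target(step):
    # Periodically copy the online network's weights into the target network.
    if step % SYNC_EVERY == 0:
        target_model.set_weights(online_model.get_weights())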
10. Policy-Based Reinforcement Learning Approaches
Stochastic Policy Gradient and the REINFORCE Algorithm
Abstract
In this chapter, we will cover the basics of policy-based approaches, especially the policy-gradient-based approaches. We will understand why policy-based approaches are superior to value-based approaches under some circumstances, and why they are also harder to implement. We will subsequently cover some simplifications that help make policy-based approaches practical to implement, and we will also cover the REINFORCE algorithm. The underlying policy-gradient estimator is sketched below.
Mohit Sewak
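The policy-gradient estimator that REINFORCE (and its baseline variant) relies on can be sketched in standard notation (a summary, not text from the book), with return G_t and a state-dependent baseline b(s_t):

\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[ G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right]

With a baseline, G_t is replaced by (G_t - b(s_t)), which lowers the variance of the gradient estimate without changing its expectation.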
11. Actor-Critic Models and the A3C
The Asynchronous Advantage Actor-Critic Model
Abstract
In this chapter, we will take the idea of the policy-gradient-based REINFORCE-with-baseline algorithm further and combine it with the value-estimation ideas from the DQN, thus bringing the best of both worlds together in the form of the Actor-Critic algorithm. We will further discuss the “advantage” baseline implementation of the model with deep-learning-based approximators, and take the concept further to a parallel implementation of the deep-learning-based advantage actor-critic algorithm in both the synchronous (A2C) and the asynchronous (A3C) modes. The advantage formulation is sketched below.
Mohit Sewak
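The “advantage” quantity used by these actor-critic methods can be sketched in standard notation (a summary, not text from the book), with the critic's value estimate V:

A(s_t, a_t) \approx r_{t+1} + \gamma V(s_{t+1}) - V(s_t)

The actor is updated along \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, A(s_t, a_t), while the critic is trained to minimize the squared TD error that defines the advantage.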
12. A3C in Code
Coding the Asynchronous Advantage Actor-Critic Agent
Abstract
In this chapter, we will cover the Asynchronous Advantage Actor-Critic model in code. We use TensorFlow's own implementation of Keras (tf.keras) for this. We define the actor-critic model using the model sub-classing and eager execution functionality of Keras; both the master and the worker agents use this model. The asynchronous workers are implemented as separate threads, syncing with the master after every few steps or upon completion of their respective episodes. A minimal sub-classed model sketch follows.
Mohit Sewak
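A minimal sketch of a sub-classed tf.keras actor-critic model with a shared hidden layer and separate policy and value heads; layer sizes and names are illustrative assumptions, not the book's code:

import tensorflow as tf

class ActorCriticModel(tf.keras.Model):
    """Shared body with separate actor (policy logits) and critic (state value) heads."""

    def __init__(self, n_actions):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(128, activation="relu")
        self.policy_logits = tf.keras.layers.Dense(n_actions)  # actor head
        self.state_value = tf.keras.layers.Dense(1)            # critic head

    def call(self, states):
        x = self.hidden(states)
        return self.policy_logits(x), self.state_value(x)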
13. Deterministic Policy Gradient and the DDPG
Deterministic-Policy-Gradient-Based Approaches
Abstract
In this chapter, we will cover the Deterministic Policy Gradient (DPG) algorithm, along with the Deterministic Policy Gradient Theorem that underpins its mathematics. We will also cover the Deep Deterministic Policy Gradient (DDPG) algorithm, which combines the DQN and the DPG and brings deep learning enhancements to the DPG algorithm. This chapter leads us to a more practical and modern approach for empowering reinforcement learning agents for continuous-action control. The core gradient expression is sketched below.
Mohit Sewak
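The Deterministic Policy Gradient Theorem referred to here can be sketched in standard notation (a summary, not text from the book), for a deterministic policy mu_theta and its action-value function Q^mu:

\nabla_\theta J(\theta) = \mathbb{E}_{s \sim \rho^{\mu}}\left[ \nabla_\theta \mu_\theta(s) \, \nabla_a Q^{\mu}(s, a)\big|_{a = \mu_\theta(s)} \right]

DDPG approximates both mu_theta and Q^mu with deep networks and borrows the replay buffer and target networks from the DQN.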
14. DDPG in Code
Coding the DDPG Using High-Level Wrapper Libraries
Abstract
In this chapter, we will code the Deep Deterministic Policy Gradient algorithm and apply it to continuous-action control tasks such as the Gym's MountainCarContinuous environment. We use Keras-RL, a high-level reinforcement learning wrapper library, for a simplified and succinct implementation. An illustrative end-to-end sketch follows this entry.
Mohit Sewak
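An illustrative end-to-end sketch, assuming a keras-rl variant compatible with tf.keras (e.g. keras-rl2) and its published DDPGAgent interface; the layer sizes and hyper-parameters below are assumptions, not the book's settings:

import gym
from tensorflow.keras.layers import Concatenate, Dense, Flatten, Input
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess

env = gym.make("MountainCarContinuous-v0")
nb_actions = env.action_space.shape[0]

# Actor: maps an observation to a continuous action in [-1, 1].
actor = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(32, activation="relu"),
    Dense(nb_actions, activation="tanh"),
])

# Critic: scores (observation, action) pairs with a Q-value.
action_input = Input(shape=(nb_actions,))
observation_input = Input(shape=(1,) + env.observation_space.shape)
x = Concatenate()([Flatten()(observation_input), action_input])
x = Dense(32, activation="relu")(x)
critic = Model(inputs=[observation_input, action_input], outputs=Dense(1)(x))

agent = DDPGAgent(
    nb_actions=nb_actions, actor=actor, critic=critic,
    critic_action_input=action_input,
    memory=SequentialMemory(limit=100000, window_length=1),
    random_process=OrnsteinUhlenbeckProcess(size=nb_actions, theta=0.15, sigma=0.3),
)
agent.compile(Adam(), metrics=["mae"])
agent.fit(env, nb_steps=50000, verbose=1)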
Backmatter
Metadata
Title
Deep Reinforcement Learning
Author
Mohit Sewak
Copyright Year
2019
Publisher
Springer Singapore
Electronic ISBN
978-981-13-8285-7
Print ISBN
978-981-13-8284-0
DOI
https://doi.org/10.1007/978-981-13-8285-7
