
2019 | Book

Deep Reinforcement Learning

Frontiers of Artificial Intelligence


About this book

This book starts by presenting the basics of reinforcement learning using highly intuitive and easy-to-understand examples and applications, and then introduces the cutting-edge research advances that make reinforcement learning capable of outperforming most state-of-the-art systems, and even humans, in a number of applications. The book not only equips readers with an understanding of multiple advanced and innovative algorithms, but also prepares them to implement systems such as those created by Google DeepMind in actual code.

This book is intended for readers who want to both understand and apply advanced concepts in a field that combines the best of two worlds – deep learning and reinforcement learning – to tap the potential of ‘advanced artificial intelligence’ for creating real-world applications and game-winning algorithms.

Table of Contents

Frontmatter
1. Introduction to Reinforcement Learning
The Intelligence Behind the AI Agent
Abstract
In this chapter, we will discuss what Reinforcement Learning is and its relationship with Artificial Intelligence. We will then go deeper into the basic building blocks of Reinforcement Learning, such as the state, the actor, the environment, and the reward, and examine the challenges in each of these aspects through multiple examples, so that the intuition is well established and a solid foundation is built before we move on to more advanced topics. We will also discuss how the agent learns to take the best action and the policy by which it learns to do so. Finally, we will learn the difference between On-Policy and Off-Policy methods.
Mohit Sewak
2. Mathematical and Algorithmic Understanding of Reinforcement Learning
The Markov Decision Process and Solution Approaches
Abstract
In this chapter, we will discuss the Bellman Equation and the Markov Decision Process (MDP), which are the basis for almost all the approaches discussed later in the book. We will then discuss some foundational solution approaches for Reinforcement Learning, such as Dynamic Programming. It is imperative to understand these concepts before moving on to more advanced topics. Finally, we will cover algorithms such as value iteration and policy iteration for solving the MDP. The core equations are sketched below.
Mohit Sewak
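The central recursion this chapter builds on is the Bellman optimality equation, and value iteration applies it repeatedly as an update rule. As a minimal sketch in standard notation (a summary, not text from the book), with transition probabilities P, rewards R, and discount factor gamma:

V^*(s) = \max_a \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^*(s') \right]

V_{k+1}(s) \leftarrow \max_a \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V_k(s') \right]

Policy iteration alternates between evaluating a fixed policy (the same recursion without the max) and improving the policy greedily with respect to the evaluated values.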
3. Coding the Environment and MDP Solution
Coding the Environment, Value Iteration, and Policy Iteration Algorithms
Abstract
In this chapter, we will learn one of the most critical skills: coding our own environment for a Reinforcement Learning agent to train against. We will create an environment for the grid-world problem that is compatible with the OpenAI Gym environment interface, so that most out-of-the-box agents can also work with our environment. Next, we will implement the value iteration and policy iteration algorithms in code and make them work with our environment. A minimal environment sketch follows this entry.
Mohit Sewak
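As a minimal sketch of the kind of Gym-compatible environment the chapter describes (the class name, grid size, and reward scheme are illustrative assumptions, not the book's actual code), assuming the classic gym.Env reset/step interface:

import gym
from gym import spaces

class GridWorldEnv(gym.Env):
    """A tiny grid world: start at the top-left cell, reach the bottom-right goal."""

    def __init__(self, size=4):
        self.size = size
        self.observation_space = spaces.Discrete(size * size)  # one id per cell
        self.action_space = spaces.Discrete(4)                  # 0:up 1:right 2:down 3:left
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        row, col = divmod(self.state, self.size)
        if action == 0:
            row = max(row - 1, 0)
        elif action == 1:
            col = min(col + 1, self.size - 1)
        elif action == 2:
            row = min(row + 1, self.size - 1)
        else:
            col = max(col - 1, 0)
        self.state = row * self.size + col
        done = self.state == self.size * self.size - 1
        reward = 0.0 if done else -1.0   # step cost until the goal is reached
        return self.state, reward, done, {}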
4. Temporal Difference Learning, SARSA, and Q-Learning
Some Popular Value Approximation Based Reinforcement Learning Approaches
Abstract
In this chapter, we will discuss the very important Q-Learning algorithm, which is the basis of the Deep Q Networks (DQN) that we will discuss in later chapters. Q-Learning provides a solution to the control side of the Reinforcement Learning problem, while leaving the estimation side to the Temporal Difference (TD) Learning algorithm. Q-Learning solves the control problem in an off-policy manner; its counterpart, the SARSA algorithm, also uses TD Learning for estimation but solves the control problem in an on-policy manner. In this chapter, we cover the important concepts of TD Learning, SARSA, and Q-Learning. Since Q-Learning is an off-policy algorithm, its behavior policy differs from its estimation policy, so we will also cover epsilon-greedy and similar algorithms that help us explore different actions under an off-policy approach. The two update rules are contrasted below.
Mohit Sewak
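The distinction drawn here between the two control algorithms comes down to their update targets. In standard notation (a summary, not text from the book), with learning rate alpha and discount gamma:

Q-Learning (off-policy; the target uses the greedy action):
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

SARSA (on-policy; the target uses the action actually taken next):
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]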
5. Q-Learning in Code
Coding the Off-Policy Q-Learning Agent and Behavior Policy
Abstract
In this chapter, we will put what we learned about Q-Learning in the previous chapter into code. We will implement a Q-table-based off-policy Q-Learning agent class and, to complement it with a behavior policy, we will implement a separate behavior-policy class based on the epsilon-greedy algorithm. A minimal sketch of such a policy class follows.
Mohit Sewak
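A minimal sketch of an epsilon-greedy behavior-policy class of the kind described here; the class and method names are illustrative assumptions, not the book's code:

import numpy as np

class EpsilonGreedyPolicy:
    """Behavior policy: explore with probability epsilon, otherwise act greedily."""

    def __init__(self, n_actions, epsilon=0.1):
        self.n_actions = n_actions
        self.epsilon = epsilon

    def select_action(self, q_values_for_state):
        # Explore: pick a uniformly random action.
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        # Exploit: pick the action with the highest estimated Q-value.
        return int(np.argmax(q_values_for_state))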
6. Introduction to Deep Learning
Enter the World of Modern Machine Learning
Abstract
In this chapter, we will cover the essentials of Deep Learning to the extent required for this book. We will discuss the basic architecture of a deep learning network such as the MLP-DNN and its internal workings. Since many Reinforcement Learning algorithms work on game feeds that have image or video frames as input states, we will also cover CNNs, the deep learning networks for vision, in this chapter. A minimal MLP example follows this entry.
Mohit Sewak
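As a minimal sketch of the MLP-DNN architecture discussed here (the layer sizes and the four-feature input are illustrative assumptions), using the Keras Sequential API:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation="relu", input_shape=(4,)),  # hidden layer 1
    Dense(64, activation="relu"),                    # hidden layer 2
    Dense(2, activation="linear"),                   # output layer, e.g. one unit per action
])
model.compile(optimizer="adam", loss="mse")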
7. Implementation Resources
Training Environments and Agent Implementation Libraries
Abstract
In this chapter, we will discuss some of the resources available for building one's own reinforcement learning agent easily, or for implementing one with the least amount of code. We will also cover some standardized environments, platforms, and community leaderboards against which one can evaluate a custom agent's performance on different types of reinforcement learning tasks and challenges.
Mohit Sewak
8. Deep Q Network (DQN), Double DQN, and Dueling DQN
A Step Towards General Artificial Intelligence
Abstract
In this chapter, we will take our first step towards Deep Learning based Reinforcement Learning. We will discuss the very popular Deep Q Network (DQN) and its powerful variants, Double DQN and Dueling DQN. Extensive work has been done on these models, and they form the basis of some very popular applications like AlphaGo. We will also introduce the concept of General AI in this chapter and discuss how these Deep Reinforcement Learning models have been instrumental in inspiring hopes of achieving General AI. The DQN and Double DQN learning targets are contrasted below.
Mohit Sewak
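The difference between DQN and Double DQN can be summarized by their learning targets; in standard notation (a summary, not text from the book), with online parameters theta and target-network parameters theta^-:

DQN:        y_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-)

Double DQN: y_t = r_{t+1} + \gamma \, Q\left(s_{t+1}, \arg\max_{a'} Q(s_{t+1}, a'; \theta); \theta^-\right)

Double DQN selects the next action with the online network but evaluates it with the target network, which reduces the overestimation bias of the plain max operator.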
9. Double DQN in Code
Coding the DDQN with Epsilon-Decay Behavior Policy
Abstract
In this chapter, we will implement the Double DQN (DDQN) agent in code. Compared to a conventional DQN, the DDQN agent is more stable because it uses a dedicated target network which remains relatively fixed between updates. We also put into practice the MLP-DNN concepts we learned in Chap. 6, using Keras and TensorFlow for our deep learning models. We also use OpenAI Gym to instantiate standardized environments for training and testing our agents, and we use the CartPole environment from the Gym to train our model. A minimal sketch of the target-network synchronization follows.
Mohit Sewak
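A minimal sketch of the periodic target-network synchronization that keeps the DDQN stable; build_model and the update interval are illustrative placeholders, not the book's code:

# build_model() is a placeholder for constructing two Keras models with identical architecture.
online_model = build_model()   # updated on every training step
target_model = build_model()   # supplies bootstrap targets; updated only periodically

SYNC_EVERY = 1000              # illustrative interval, in training steps

def maybe_sync_target(step):
    # Periodically copy the online network's weights into the target network.
    if step % SYNC_EVERY == 0:
        target_model.set_weights(online_model.get_weights())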
10. Policy-Based Reinforcement Learning Approaches
Stochastic Policy Gradient and the REINFORCE Algorithm
Abstract
In this chapter, we will cover the basics of policy-based approaches, especially the policy-gradient-based approaches. We will understand why policy-based approaches are superior to value-based approaches under some circumstances, and why they are also harder to implement. We will subsequently cover some simplifications that help make policy-based approaches practical to implement, and we will also cover the REINFORCE algorithm. The underlying policy-gradient estimator is sketched below.
Mohit Sewak
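The policy-gradient estimator that REINFORCE (and its baseline variant) relies on can be sketched in standard notation (a summary, not text from the book), with return G_t and a state-dependent baseline b(s_t):

\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[ G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right]

With a baseline, G_t is replaced by (G_t - b(s_t)), which lowers the variance of the gradient estimate without changing its expectation.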
11. Actor-Critic Models and the A3C
The Asynchronous Advantage Actor-Critic Model
Abstract
In this chapter, we will take the idea of the policy-gradient-based REINFORCE-with-baseline algorithm further and combine it with the value-estimation ideas from the DQN, thus bringing the best of both worlds together in the form of the Actor-Critic algorithm. We will further discuss the “advantage” baseline implementation of the model with deep-learning-based approximators, and take the concept further to a parallel implementation of the deep-learning-based advantage actor-critic algorithm in both the synchronous (A2C) and the asynchronous (A3C) modes. The advantage formulation is sketched below.
Mohit Sewak
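The “advantage” quantity used by these actor-critic methods can be sketched in standard notation (a summary, not text from the book), with the critic's value estimate V:

A(s_t, a_t) \approx r_{t+1} + \gamma V(s_{t+1}) - V(s_t)

The actor is updated along \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, A(s_t, a_t), while the critic is trained to minimize the squared TD error that defines the advantage.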
12. A3C in Code
Coding the Asynchronous Advantage Actor-Critic Agent
Abstract
In this chapter, we will cover the Asynchronous Advantage Actor-Critic model in code. We use TensorFlow's own implementation of Keras (tf.keras) for this. We define the actor-critic model using the model sub-classing and eager execution functionality of Keras; both the master and the worker agents use this model. The asynchronous workers are implemented as separate threads, syncing with the master after every few steps or upon completion of their respective episodes. A minimal sub-classed model sketch follows.
Mohit Sewak
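A minimal sketch of a sub-classed tf.keras actor-critic model with a shared hidden layer and separate policy and value heads; layer sizes and names are illustrative assumptions, not the book's code:

import tensorflow as tf

class ActorCriticModel(tf.keras.Model):
    """Shared body with separate actor (policy logits) and critic (state value) heads."""

    def __init__(self, n_actions):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(128, activation="relu")
        self.policy_logits = tf.keras.layers.Dense(n_actions)  # actor head
        self.state_value = tf.keras.layers.Dense(1)            # critic head

    def call(self, states):
        x = self.hidden(states)
        return self.policy_logits(x), self.state_value(x)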
13. Deterministic Policy Gradient and the DDPG
Deterministic-Policy-Gradient-Based Approaches
Abstract
In this chapter, we will cover the Deterministic Policy Gradient (DPG) algorithm, along with the Deterministic Policy Gradient Theorem that underpins its mathematics. We will also cover the Deep Deterministic Policy Gradient (DDPG) algorithm, which combines the DQN and the DPG and brings deep learning enhancements to the DPG algorithm. This chapter leads us to a more practical and modern approach for empowering reinforcement learning agents for continuous-action control. The core gradient expression is sketched below.
Mohit Sewak
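The Deterministic Policy Gradient Theorem referred to here can be sketched in standard notation (a summary, not text from the book), for a deterministic policy mu_theta and its action-value function Q^mu:

\nabla_\theta J(\theta) = \mathbb{E}_{s \sim \rho^{\mu}}\left[ \nabla_\theta \mu_\theta(s) \, \nabla_a Q^{\mu}(s, a)\big|_{a = \mu_\theta(s)} \right]

DDPG approximates both mu_theta and Q^mu with deep networks and borrows the replay buffer and target networks from the DQN.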
14. DDPG in Code
Coding the DDPG Using High-Level Wrapper Libraries
Abstract
In this chapter, we will code the Deep Deterministic Policy Gradient algorithm and apply it to continuous-action control tasks such as the Gym's MountainCarContinuous environment. We use Keras-RL, a high-level reinforcement learning wrapper library, for a simplified and succinct implementation. An illustrative end-to-end sketch follows this entry.
Mohit Sewak
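An illustrative end-to-end sketch, assuming a keras-rl variant compatible with tf.keras (e.g. keras-rl2) and its published DDPGAgent interface; the layer sizes and hyper-parameters below are assumptions, not the book's settings:

import gym
from tensorflow.keras.layers import Concatenate, Dense, Flatten, Input
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess

env = gym.make("MountainCarContinuous-v0")
nb_actions = env.action_space.shape[0]

# Actor: maps an observation to a continuous action in [-1, 1].
actor = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(32, activation="relu"),
    Dense(nb_actions, activation="tanh"),
])

# Critic: scores (observation, action) pairs with a Q-value.
action_input = Input(shape=(nb_actions,))
observation_input = Input(shape=(1,) + env.observation_space.shape)
x = Concatenate()([Flatten()(observation_input), action_input])
x = Dense(32, activation="relu")(x)
critic = Model(inputs=[observation_input, action_input], outputs=Dense(1)(x))

agent = DDPGAgent(
    nb_actions=nb_actions, actor=actor, critic=critic,
    critic_action_input=action_input,
    memory=SequentialMemory(limit=100000, window_length=1),
    random_process=OrnsteinUhlenbeckProcess(size=nb_actions, theta=0.15, sigma=0.3),
)
agent.compile(Adam(), metrics=["mae"])
agent.fit(env, nb_steps=50000, verbose=1)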
Backmatter
Metadata
Title
Deep Reinforcement Learning
Author
Mohit Sewak
Copyright Year
2019
Publisher
Springer Singapore
Electronic ISBN
978-981-13-8285-7
Print ISBN
978-981-13-8284-0
DOI
https://doi.org/10.1007/978-981-13-8285-7
