2022 | Book

Deep Reinforcement Learning


About this book

Deep reinforcement learning has attracted considerable attention recently. Impressive results have been achieved in such diverse fields as autonomous driving, game playing, molecular recombination, and robotics. In all these fields, computer programs have taught themselves to solve problems that were previously considered very difficult. In the game of Go, the program AlphaGo has even learned to outmatch three of the world's leading players.

Deep reinforcement learning takes its inspiration from the fields of biology and psychology. Biology has inspired the creation of artificial neural networks and deep learning, while psychology studies how animals and humans learn, and how subjects' desired behavior can be reinforced with positive and negative stimuli. When we see how reinforcement learning teaches a simulated robot to walk, we are reminded of how children learn, through playful exploration. Techniques inspired by biology and psychology work amazingly well in computers: animal behavior and the structure of the brain serve as new blueprints for science and engineering. In fact, computers truly seem to possess aspects of human behavior; as such, this field goes to the heart of the dream of artificial intelligence.

These research advances have not gone unnoticed by educators. Many universities have begun offering courses on the subject of deep reinforcement learning. The aim of this book is to provide an overview of the field, at the proper level of detail for a graduate course in artificial intelligence. It covers the complete field, from the basic algorithms of deep Q-learning to advanced topics such as multi-agent reinforcement learning and meta-learning.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
Deep reinforcement learning studies how we learn to solve complex problems, problems that require us to find a solution to a sequence of decisions in high-dimensional state spaces. To make bread, we must use the right flour, add some salt, yeast and sugar, prepare the dough (not too dry and not too wet), pre-heat the oven to the right temperature, and bake the bread (but not too long); to win a ballroom dancing contest we must find the right partner, learn to dance, practice, and beat the competition; to win in chess we must study, practice, and make all the right moves.
Aske Plaat
Chapter 2. Tabular Value-Based Reinforcement Learning
Abstract
This chapter will introduce the classic, tabular, field of reinforcement learning, to build a foundation for the next chapters. First, we will introduce the concepts of agent and environment. Next come Markov decision processes, the formalism that is used to reason mathematically about reinforcement learning. We discuss at some length the elements of reinforcement learning: states, actions, values, and policies.
Aske Plaat
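
As a concrete taste of the tabular setting this chapter covers, consider the sketch below. It is illustrative, not code from the book: Q-learning maintains a table of state-action values and updates it from sampled transitions. The toy environment, its size, and all constants are assumptions made for the example.

```python
import numpy as np

# Toy MDP: five states in a row, two actions (0 = left, 1 = right).
# Reaching the rightmost state gives reward 1 and ends the episode.
# All sizes and constants here are illustrative, not taken from the book.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))     # the value "table" of tabular RL
alpha, gamma = 0.1, 0.9                 # learning rate and discount factor
rng = np.random.default_rng(0)

def step(s, a):
    """The environment performs the state transition and calculates the reward."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0), s2 == n_states - 1

for episode in range(200):
    s, done = 0, False
    while not done:
        # Q-learning is off-policy, so even a random behavior policy suffices here.
        a = int(rng.integers(n_actions))
        s2, r, done = step(s, a)
        # Update: move Q(s,a) toward the bootstrapped target r + gamma * max_a' Q(s',a').
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

# The greedy policy derived from Q prefers "right" in every non-terminal state.
print(Q.argmax(axis=1)[:-1])
```

Because the update bootstraps from the maximum over next-state values, the greedy policy improves even though the behavior is random: this is what off-policy learning means.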
Chapter 3. Deep Value-Based Reinforcement Learning
Abstract
The previous chapter introduced the field of classic reinforcement learning. We learned about agents and environments, and about states, actions, values, and policy functions. We also saw our first planning and learning algorithms: value iteration, SARSA, and Q-learning. The methods in the previous chapter were exact, tabular methods, which work for problems of moderate size that fit in memory.
Aske Plaat
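
The step from Chapter 2 to Chapter 3 is to replace the table with a function approximator. Below is a minimal sketch of that deep value-based idea, assuming PyTorch, with a synthetic batch of random transitions standing in for a real replay buffer; the stabilization machinery of DQN, such as a separate target network, is deliberately omitted, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Replace the Q-table with a network mapping a state to one value per action.
# Sizes, architecture, and the synthetic batch below are illustrative assumptions.
state_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# A synthetic batch of transitions (s, a, r, s', done), standing in for a replay buffer.
batch = 32
s, s2 = torch.randn(batch, state_dim), torch.randn(batch, state_dim)
a = torch.randint(n_actions, (batch,))
r, done = torch.randn(batch), torch.zeros(batch)

# Temporal-difference target, computed without tracking gradients (bootstrapping).
with torch.no_grad():
    target = r + gamma * (1.0 - done) * q_net(s2).max(dim=1).values

# One gradient step: regress Q(s, a) toward the target.
q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_sa, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```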
Chapter 4. Policy-Based Reinforcement Learning
Abstract
Some of the most successful applications of deep reinforcement learning have a continuous action space, such as applications in robotics, self-driving cars, and real-time strategy games.
Aske Plaat
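
For continuous action spaces, taking an argmax over action values is no longer straightforward; policy-based methods instead parameterize the policy directly. The sketch below shows a REINFORCE-style update with a Gaussian policy, assuming PyTorch; the dimensions and the placeholder return are illustrative assumptions, not details from the book.

```python
import torch
import torch.nn as nn

# A Gaussian policy for a continuous action space: the network outputs the mean,
# and a learned log standard deviation sets the exploration noise.
# All dimensions and the placeholder return are illustrative assumptions.
state_dim, action_dim = 3, 1
mean_net = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(), nn.Linear(32, action_dim))
log_std = torch.zeros(action_dim, requires_grad=True)
optimizer = torch.optim.Adam(list(mean_net.parameters()) + [log_std], lr=1e-3)

s = torch.randn(1, state_dim)                # one observed state
dist = torch.distributions.Normal(mean_net(s), log_std.exp())
a = dist.sample()                            # sample a continuous action
ret = torch.tensor(2.0)                      # placeholder return from the environment

# REINFORCE: increase the log-probability of actions in proportion to the return.
loss = -(dist.log_prob(a).sum() * ret)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```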
Chapter 5. Model-Based Reinforcement Learning
Abstract
The previous chapters discussed model-free methods, and we saw their success in video games and simulated robotics. In model-free methods, the agent updates a policy directly from the feedback that the environment provides on its actions. The environment performs the state transitions and calculates the reward. A disadvantage of deep model-free methods is that they can be slow to train; for stable convergence or low variance, often millions of environment samples are needed before the policy function converges to a high-quality optimum.
Aske Plaat
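
The model-based remedy for this sample inefficiency is to learn a transition model and generate cheap simulated samples from it. Below is a minimal Dyna-style illustration (again not the book's code): each real environment step is followed by several planning updates replayed from a memorized model. The toy environment and all constants are assumptions for the example.

```python
import numpy as np

# Dyna-style planning on top of tabular Q-learning: after every real environment
# step, replay several transitions from a learned (here simply memorized) model.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
model = {}                              # (s, a) -> (r, s'), filled from experience
alpha, gamma, k_planning = 0.1, 0.9, 10
rng = np.random.default_rng(0)

def step(s, a):
    """Real environment: move left/right on a line; reward 1 at the right end."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == n_states - 1 else 0.0), s2

for _ in range(300):
    s, a = int(rng.integers(n_states)), int(rng.integers(n_actions))
    r, s2 = step(s, a)                  # one (expensive) real sample
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    model[(s, a)] = (r, s2)             # update the learned model
    for _ in range(k_planning):         # many (cheap) simulated samples
        ps, pa = list(model)[rng.integers(len(model))]
        pr, ps2 = model[(ps, pa)]
        Q[ps, pa] += alpha * (pr + gamma * Q[ps2].max() - Q[ps, pa])
```

The planning loop reuses each real sample several times, which is precisely the sample-efficiency argument this chapter's abstract makes for model-based methods.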
Chapter 6. Two-Agent Self-Play
Abstract
Previous chapters were concerned with how a single agent can learn optimal behavior for its environment. This chapter is different. We turn to problems in which two agents operate, and the behavior of both is modeled (and, in the next chapter, more than two).
Aske Plaat
Chapter 7. Multi-Agent Reinforcement Learning
Abstract
On this planet, in our societies, millions of people live and work together. Each individual has their own set of goals and acts accordingly. Some of these goals are shared. When we want to achieve shared goals, we organize ourselves in teams, groups, companies, organizations, and societies.
Aske Plaat
Chapter 8. Hierarchical Reinforcement Learning
Abstract
The goal of artificial intelligence is to understand and create intelligent behavior; the goal of deep reinforcement learning is to find a behavior policy for ever larger sequential decision problems.
Aske Plaat
Chapter 9. Meta-Learning
Abstract
Although current deep reinforcement learning methods have obtained great successes, training times for most interesting problems are high; they are often measured in weeks or months, consuming time and resources—as you may have noticed while doing some of the exercises at the end of the chapters.
Aske Plaat
Chapter 10. Further Developments
Abstract
We have come to the end of this book. We will reflect on what we have learned. In this chapter we will review the main themes and essential lessons, and we will look to the future.
Aske Plaat
Backmatter
Metadata
Title
Deep Reinforcement Learning
Written by
Prof. Aske Plaat
Copyright Year
2022
Publisher
Springer Nature Singapore
Electronic ISBN
978-981-19-0638-1
Print ISBN
978-981-19-0637-4
DOI
https://doi.org/10.1007/978-981-19-0638-1