
2015 | Book

Simulation-Based Optimization

Parametric Optimization Techniques and Reinforcement Learning


About this book

Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning introduces the evolving area of static and dynamic simulation-based optimization. Covered in detail are model-free optimization techniques, designed especially for discrete-event stochastic systems that can be simulated but whose analytical models are difficult to express in closed mathematical form.

Key features of this revised and improved Second Edition include:

· Extensive coverage, via step-by-step recipes, of powerful new algorithms for static simulation optimization, including simultaneous perturbation, backtracking adaptive search and nested partitions, in addition to traditional methods, such as response surfaces, Nelder-Mead search and meta-heuristics (simulated annealing, tabu search, and genetic algorithms)

· Detailed coverage of the Bellman equation framework for Markov Decision Processes (MDPs), along with dynamic programming (value and policy iteration) for discounted, average, and total reward performance metrics

· An in-depth consideration of dynamic simulation optimization via temporal differences and Reinforcement Learning: Q-Learning, SARSA, and R-SMART algorithms, and policy search, via API, Q-P-Learning, actor-critics, and learning automata

· A special examination of neural-network-based function approximation for Reinforcement Learning, semi-Markov decision processes (SMDPs), finite-horizon problems, two time scales, case studies for industrial tasks, computer codes (placed online) and convergence proofs, via Banach fixed point theory and Ordinary Differential Equations

Themed around three areas in separate sets of chapters – Static Simulation Optimization, Reinforcement Learning and Convergence Analysis – this book is written for researchers and students in the fields of engineering (industrial, systems, electrical and computer), operations research, computer science and applied mathematics.

Table of Contents

Frontmatter
Chapter 1. Background
Abstract
This book seeks to introduce the reader to the rapidly evolving subject called simulation-based optimization. The topic itself is not very young: ever since computers began making an impact on scientific research and it became possible to analyze random systems with programs that generate random numbers, engineers have wanted to optimize systems using simulation models. It is only recently, however, that noteworthy success in realizing this objective has been achieved in practice.
Abhijit Gosavi
Chapter 2. Simulation Basics
Abstract
This chapter has been written to introduce the topic of discrete-event simulation. To comprehend the material presented in this chapter, some background in the theory of probability is needed, some of which is in the Appendix. Two of the main topics covered in this chapter are random number generation and simulation modeling of random systems. Readers familiar with this material can skip this chapter without loss of continuity.
Abhijit Gosavi
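As a pointer to the first of the two topics named in the abstract, random number generation, here is a minimal sketch of a classic linear congruential generator (LCG), the basic building block behind the uniform random numbers that drive a discrete-event simulation. The constants a = 16807, m = 2^31 - 1 are the well-known Lewis-Goodman-Miller choice and need not match the book's own examples.

```python
def lcg(seed, n):
    """Return n pseudo-random numbers, approximately uniform in (0, 1)."""
    a, m = 16807, 2**31 - 1
    x = seed
    out = []
    for _ in range(n):
        x = (a * x) % m      # integer recurrence: x_{k+1} = a * x_k mod m
        out.append(x / m)    # scale the integer state into (0, 1)
    return out

sample = lcg(seed=1, n=5)
```

Streams like this are then transformed (e.g., by inversion) into samples from the service-time or arrival-time distributions of the simulated system.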
Chapter 3. Simulation-Based Optimization: An Overview
Abstract
The purpose of this short chapter is to discuss the role played by computer simulation in simulation-based optimization. Simulation-based optimization revolves around methods that require the maximization (or minimization) of the net rewards (or costs) obtained from a random system. We will be concerned with two types of optimization problems: (1) parametric optimization (also called static optimization) and (2) control optimization (also called dynamic optimization).
Abhijit Gosavi
Chapter 4. Parametric Optimization: Response Surfaces and Neural Networks
Abstract
This chapter will discuss one of the oldest simulation-based methods of parametric optimization, namely, the response surface method (RSM). While RSM is admittedly primitive for the purposes of simulation optimization, it is still a very robust technique that is often used when other methods fail. It hinges on a rather simple idea, which is to obtain an approximate form of the objective function by simulating the system at a finite number of points carefully sampled from the function space. Traditionally, RSM has used regression over the sampled points to find an approximate form of the objective function.
Abhijit Gosavi
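The regression idea described in the abstract can be sketched in one dimension: simulate a noisy objective at a few design points, fit a quadratic surface y = b0 + b1*x + b2*x^2 by least squares, and read off the minimizer of the fitted surface. The objective function, noise level, and design points below are illustrative choices, not the book's.

```python
import random

def f(x):
    """Noisy 'simulation' of the true objective (x - 2)^2 + 3."""
    return (x - 2.0) ** 2 + 3.0 + random.gauss(0.0, 0.05)

def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x^2 via the normal equations."""
    # Build X^T X and X^T y for the design matrix X = [1, x, x^2].
    A = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            m = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= m * A[col][c]
            b[r] -= m * b[col]
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, 3))) / A[r][r]
    return beta

random.seed(0)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [f(x) for x in xs]
b0, b1, b2 = fit_quadratic(xs, ys)
x_star = -b1 / (2.0 * b2)   # vertex of the fitted parabola
```

In a real RSM study, the design points come from an experimental design and the fit is refined iteratively around the current best region; the sketch above shows only one fitting step.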
Chapter 5. Parametric Optimization: Stochastic Gradients and Adaptive Search
Abstract
This chapter focuses on simulation-based techniques for solving stochastic problems of parametric optimization, also popularly called static optimization problems. Such problems have been defined in Chap. 3.
Abhijit Gosavi
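One of the stochastic-gradient techniques covered in this book, simultaneous perturbation, can be sketched briefly: all coordinates of the gradient are estimated from just two noisy function evaluations per iteration, using a random +/-1 perturbation vector. The test objective, gain sequences, and starting point below are illustrative choices, not the book's.

```python
import random

def noisy_f(x):
    """Stand-in for one simulation run: f(x) = sum((x_i - 1)^2) + noise."""
    return sum((xi - 1.0) ** 2 for xi in x) + random.gauss(0.0, 0.01)

def spsa(x, iters=5000):
    for k in range(1, iters + 1):
        a_k = 0.1 / k ** 0.602        # step-size sequence (standard SPSA exponents)
        c_k = 0.1 / k ** 0.101        # perturbation-size sequence
        delta = [random.choice((-1.0, 1.0)) for _ in x]   # Bernoulli +/-1 vector
        x_plus = [xi + c_k * di for xi, di in zip(x, delta)]
        x_minus = [xi - c_k * di for xi, di in zip(x, delta)]
        diff = noisy_f(x_plus) - noisy_f(x_minus)
        # The same two-evaluation difference estimates every coordinate at once.
        x = [xi - a_k * diff / (2.0 * c_k * di) for xi, di in zip(x, delta)]
    return x

random.seed(1)
x_star = spsa([3.0, -2.0])
```

The appeal for simulation optimization is that the per-iteration cost (two simulations) does not grow with the number of decision variables, unlike coordinate-wise finite differences.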
Chapter 6. Control Optimization with Stochastic Dynamic Programming
Abstract
This chapter focuses on a problem of control optimization, in particular the Markov decision problem (or process). Our discussions will be at a very elementary level, and we will not attempt to prove any theorems. The central aim of this chapter is to introduce the reader to classical dynamic programming in the context of solving Markov decision problems. In the next chapter, the same ideas will be presented in the context of simulation-based dynamic programming. The main concepts presented in this chapter are (1) Markov chains, (2) Markov decision problems, (3) semi-Markov decision problems, and (4) classical dynamic programming methods.
Abhijit Gosavi
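Of the classical dynamic programming methods this chapter introduces, value iteration for a discounted-reward MDP fits in a few lines. The two-state, two-action MDP below (transition probabilities P and immediate rewards R) is an illustrative example, not necessarily one from the book.

```python
# P[a][s][t]: probability of moving from state s to t under action a.
# R[a][s][t]: immediate reward earned on that transition.
P = {0: [[0.7, 0.3], [0.4, 0.6]],
     1: [[0.9, 0.1], [0.2, 0.8]]}
R = {0: [[6.0, -5.0], [7.0, 12.0]],
     1: [[10.0, 17.0], [-14.0, 13.0]]}
GAMMA = 0.8          # discount factor
N_STATES, ACTIONS = 2, (0, 1)

def value_iteration(tol=1e-8):
    V = [0.0] * N_STATES
    while True:
        # Bellman optimality update: max over actions of expected
        # immediate reward plus discounted value of the next state.
        V_new = [max(sum(P[a][s][t] * (R[a][s][t] + GAMMA * V[t])
                         for t in range(N_STATES))
                     for a in ACTIONS)
                 for s in range(N_STATES)]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new

V_opt = value_iteration()
# Recover a greedy (optimal) policy from the converged value function.
policy = [max(ACTIONS, key=lambda a: sum(
    P[a][s][t] * (R[a][s][t] + GAMMA * V_opt[t]) for t in range(N_STATES)))
    for s in range(N_STATES)]
```

Note that this classical method needs the full transition model P and R; the simulation-based methods of the next chapter dispense with that requirement.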
Chapter 7. Control Optimization with Reinforcement Learning
Abstract
This chapter focuses on a relatively new methodology called reinforcement learning (RL). RL will be presented here as a form of simulation-based dynamic programming, primarily used for solving Markov and semi-Markov decision problems. Pioneering work in the area of RL was performed within the artificial intelligence community, which views it as a “machine learning” method. This perhaps explains the roots of the word “learning” in the name reinforcement learning. We also note that within the artificial intelligence community, “learning” is sometimes used to describe function approximation, e.g., regression. Some kind of function approximation, as we will see below, usually accompanies RL. The word “reinforcement” is linked to the fact that RL algorithms can be viewed as agents that learn through trial and error (feedback).
Abhijit Gosavi
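The trial-and-error flavor described above can be made concrete with a bare-bones Q-Learning sketch: the agent simulates one long trajectory and nudges each Q-factor toward the sampled Bellman target it observes. The small MDP, the epsilon-greedy exploration rule, and the 1/n learning rate are illustrative choices, not the book's prescriptions.

```python
import random

# A small two-state, two-action MDP (illustrative numbers).  The model
# is used here only to *simulate* transitions; the learner never sees it.
P = {0: [[0.7, 0.3], [0.4, 0.6]],
     1: [[0.9, 0.1], [0.2, 0.8]]}
R = {0: [[6.0, -5.0], [7.0, 12.0]],
     1: [[10.0, 17.0], [-14.0, 13.0]]}
GAMMA = 0.8

def q_learning(steps=200_000, epsilon=0.1):
    Q = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    s = 0
    for _ in range(steps):
        # Epsilon-greedy action selection (exploration vs. exploitation).
        if random.random() < epsilon:
            a = random.choice((0, 1))
        else:
            a = 0 if Q[s][0] >= Q[s][1] else 1
        # Simulate one transition of the system.
        s_next = 0 if random.random() < P[a][s][0] else 1
        r = R[a][s][s_next]
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a]            # decaying learning rate
        target = r + GAMMA * max(Q[s_next])   # sampled Bellman target
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
    return Q

random.seed(0)
Q = q_learning()
policy = [0 if Q[s][0] >= Q[s][1] else 1 for s in range(2)]
```

The update touches only the one Q-factor for the state-action pair just visited, which is what makes the method viable inside a simulator with no explicit transition model.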
Chapter 8. Control Optimization with Stochastic Search
Abstract
In this chapter, we discuss an approach for solving Markov decision problems (MDPs) and semi-Markov decision problems (SMDPs) that employs so-called action-selection probabilities instead of the Q-factors required in reinforcement learning (RL). The underlying philosophy of this approach can be explained as follows. The action-selection probabilities, which are stored in some form either directly or indirectly, are used to guide the search. We thus have a stochastic search in which each action is considered equally good at the start; using feedback from the system about the effectiveness of each action, the algorithm updates the action-selection probabilities, leading the system to the optimal policy in the end. It should be clear to the reader that, like RL, this approach uses feedback from the system, but unlike RL, it stores action-selection probabilities.
Abhijit Gosavi
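The philosophy sketched above can be illustrated on a single-state problem with a pursuit-style update, one simple instance of search driven by action-selection probabilities: every action starts equally likely, and the probability vector is nudged toward the action with the best running-average reward. The reward model, step size, and update rule are illustrative assumptions, not necessarily the algorithms this chapter develops.

```python
import random

def simulate_reward(a):
    """Noisy feedback from the system; action 2 is best on average (assumed)."""
    means = (1.0, 1.5, 3.0)
    return means[a] + random.gauss(0.0, 0.5)

def pursuit(iters=5000, beta=0.01):
    n = 3
    p = [1.0 / n] * n              # start: every action equally good
    avg = [0.0] * n                # running average reward per action
    count = [0] * n
    for _ in range(iters):
        a = random.choices(range(n), weights=p)[0]   # sample by probability
        r = simulate_reward(a)
        count[a] += 1
        avg[a] += (r - avg[a]) / count[a]
        best = max(range(n), key=lambda i: avg[i])
        # Pursue the empirically best action: shift probability mass to it.
        p = [(1 - beta) * pi + (beta if i == best else 0.0)
             for i, pi in enumerate(p)]
    return p

random.seed(0)
p = pursuit()
```

Note how the update preserves a valid probability distribution (the entries always sum to one) while concentrating mass on the action the feedback favors.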
Chapter 9. Convergence: Background Material
Abstract
This chapter introduces some fundamental mathematical notions that will be useful in understanding the analysis presented in the subsequent chapters. The aim of this chapter is to introduce elements of the mathematical framework needed for analyzing the convergence of algorithms discussed in this book. Much of the material presented in this chapter is related to mathematical analysis, and hence a reader with a good grasp of mathematical analysis may skip this chapter. To follow Chap. 10, the reader should read all material up to and including Theorem 9.2 in this chapter. All the ideas developed in this chapter will be needed in Chap. 11.
Abhijit Gosavi
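For orientation, the contraction-mapping result that underlies much of the convergence analysis in this part of the book can be stated in its standard form (notation ours, not necessarily the book's):

```latex
% Banach fixed-point theorem, in the standard form used to prove
% convergence of value-iteration-type algorithms.
Let $(X, d)$ be a complete metric space and $F : X \to X$ a contraction,
i.e., there exists $\lambda \in [0, 1)$ such that
\[
  d(F(x), F(y)) \le \lambda \, d(x, y) \quad \text{for all } x, y \in X .
\]
Then $F$ has a unique fixed point $x^* = F(x^*)$, and for any starting
point $x_0 \in X$ the iterates $x_{k+1} = F(x_k)$ satisfy
\[
  d(x_k, x^*) \le \lambda^k \, d(x_0, x^*) \longrightarrow 0 .
\]
```

In the dynamic programming setting, $F$ is the Bellman optimality operator, which is a contraction in the max norm with modulus equal to the discount factor; the unique fixed point is the optimal value function.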
Chapter 10. Convergence Analysis of Parametric Optimization Methods
Abstract
This chapter deals with some simple convergence results related to the parametric optimization methods discussed in Chap. 5. The main idea underlying convergence analysis of an algorithm is to identify (mathematically) the solution to which the algorithm converges. Hence to prove that an algorithm works, one must show that the algorithm converges to the optimal solution. In this chapter, this is precisely what we will attempt to do with some algorithms of Chap. 5.
Abhijit Gosavi
Chapter 11. Convergence Analysis of Control Optimization Methods
Abstract
This chapter will discuss the proofs of optimality of a subset of algorithms discussed in the context of control optimization. The chapter is organized as follows. We begin in Sect. 2 with some definitions and notation related to discounted and average reward Markov decision problems (MDPs). Subsequently, we present convergence theory related to dynamic programming (DP) for MDPs in Sects. 3 and 4. In Sect. 5, we discuss some selected topics related to semi-MDPs (SMDPs). Thereafter, from Sect. 6, we present a selected collection of topics related to convergence of reinforcement learning (RL) algorithms.
Abhijit Gosavi
Chapter 12. Case Studies
Abstract
In this chapter, we will describe some case studies related to simulation-based optimization. We will provide a general description of the problem and of the approach used in the solution process. For more specific numeric details, the readers are referred to the references provided. We present three case studies for model-free simulation optimization related to airline revenue management, preventive maintenance of machines, and buffer allocation in production lines in detail. We also present a heuristic rule in each case, which can be used for benchmarking the simulation-optimization performance. Such heuristics are typically problem-dependent and may produce high-quality solutions. Also, without a benchmark, it is difficult to gauge the performance of the simulation-optimization algorithm on large-scale problems where the optimal solution cannot be determined. We enumerate numerous other case studies, pointing the reader to appropriate references for further reading.
Abhijit Gosavi
Backmatter
Metadata
Title
Simulation-Based Optimization
Author
Abhijit Gosavi
Copyright Year
2015
Publisher
Springer US
Electronic ISBN
978-1-4899-7491-4
Print ISBN
978-1-4899-7490-7
DOI
https://doi.org/10.1007/978-1-4899-7491-4