
2002 | Book

Handbook of Markov Decision Processes

Methods and Applications

Edited by: Eugene A. Feinberg, Adam Shwartz

Publisher: Springer US

Book series: International Series in Operations Research & Management Science

About this book

This volume deals with the theory of Markov Decision Processes (MDPs) and their applications. Each chapter was written by a leading expert in the respective area. The papers cover major research areas and methodologies, and discuss open questions and future research directions. The papers can be read independently, with the basic notation and concepts of Section 1.2. Most chapters should be accessible by graduate or advanced undergraduate students in fields of operations research, electrical engineering, and computer science.

1.1 AN OVERVIEW OF MARKOV DECISION PROCESSES

The theory of Markov Decision Processes, also known under several other names including sequential stochastic optimization, discrete-time stochastic control, and stochastic dynamic programming, studies sequential optimization of discrete-time stochastic systems. The basic object is a discrete-time stochastic system whose transition mechanism can be controlled over time. Each control policy defines the stochastic process and the values of objective functions associated with this process. The goal is to select a "good" control policy. In real life, decisions that humans and computers make on all levels usually have two types of impacts: (i) they cost or save time, money, or other resources, or they bring revenues, as well as (ii) they have an impact on the future, by influencing the dynamics. In many situations, decisions with the largest immediate profit may not be good in view of future events. MDPs model this paradigm and provide results on the structure and existence of good policies and on methods for their calculation.
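To make this setup concrete, the following minimal sketch defines a small finite MDP and evaluates a fixed stationary policy under the total discounted reward criterion; the states, transition probabilities, rewards, and discount factor are invented for illustration and are not taken from the book.

# Minimal finite MDP sketch (all model data are invented for illustration).
import numpy as np

n_states, gamma = 2, 0.9

# P[a, x, y]: probability of moving from state x to state y under action a.
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])
# r[x, a]: one-step reward for choosing action a in state x.
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])

def evaluate(policy):
    # Solve the linear system (I - gamma * P_pi) v = r_pi for a deterministic policy.
    P_pi = np.array([P[policy[x], x] for x in range(n_states)])
    r_pi = np.array([r[x, policy[x]] for x in range(n_states)])
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

# Expected discounted reward of the policy "action 0 in state 0, action 1 in state 1".
print(evaluate([0, 1]))

Each choice of policy induces a different Markov chain and hence a different value vector; the optimization problem studied throughout the book is to choose a policy with the best such value.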

Table of contents

Frontmatter

Introduction

1. Introduction
Abstract
This volume deals with the theory of Markov Decision Processes (MDPs) and their applications. Each chapter was written by a leading expert in the respective area. The papers cover major research areas and methodologies, and discuss open questions and future research directions. The papers can be read independently, with the basic notation and concepts of Section 1.2. Most chapters should be accessible by graduate or advanced undergraduate students in fields of operations research, electrical engineering, and computer science.
Eugene A. Feinberg, Adam Shwartz

Finite State and Action Models

Frontmatter
2. Finite State and Action MDPs
Abstract
In this chapter we study Markov decision processes (MDPs) with finite state and action spaces. This is the classical theory developed since the end of the fifties. We consider finite and infinite horizon models. For the finite horizon model the utility function of the total expected reward is commonly used. For the infinite horizon the utility function is less obvious. We consider several criteria: total discounted expected reward, average expected reward and more sensitive optimality criteria including the Blackwell optimality criterion. We end with a variety of other subjects.
The emphasis is on computational methods to compute optimal policies for these criteria. These methods are based on concepts like value iteration, policy iteration and linear programming. This survey covers about three hundred papers. Although the subject of finite state and action MDPs is classical, there are still open problems. We also mention some of them.
Lodewijk Kallenberg
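The abstract above mentions value iteration as one of the main computational methods. The sketch below shows value iteration for a finite discounted MDP; the toy model data are invented for illustration and do not come from the chapter.

# Value iteration sketch for a finite discounted MDP (toy data, invented).
import numpy as np

gamma = 0.9
P = np.array([[[0.8, 0.2], [0.3, 0.7]],    # P[a, x, y]
              [[0.5, 0.5], [0.9, 0.1]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])                  # r[x, a]

v = np.zeros(2)
for _ in range(1000):
    # Q[x, a] = r(x, a) + gamma * sum_y P(y | x, a) * v(y)
    Q = r + gamma * np.einsum('axy,y->xa', P, v)
    v_new = Q.max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-10:   # stop once the Bellman updates stabilize
        break
    v = v_new

policy = Q.argmax(axis=1)   # greedy deterministic policy w.r.t. the computed values
print(v, policy)

Policy iteration and the linear programming formulation discussed in the chapter compute the same optimal values by different routes.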
3. Bias Optimality
Abstract
The use of the long-run average reward or the gain as an optimality criterion has received considerable attention in the literature. However, for many practical models the gain has the undesirable property of being underselective, that is, there may be several gain optimal policies. After finding the set of policies that achieve the primary objective of maximizing the long-run average reward, one might search for the policy that maximizes the “short-run” reward. This reward, called the bias, aids in distinguishing among multiple gain optimal policies. This chapter focuses on establishing the usefulness of the bias in distinguishing among multiple gain optimal policies, on computing it, and on demonstrating the implicit discounting captured by the bias on recurrent states.
Mark E. Lewis, Martin L. Puterman
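As background for the two criteria compared in this chapter, the gain and the bias of a stationary policy \(\pi\) are usually defined as follows (standard definitions, stated here with generic notation rather than the chapter's):

\[
g^{\pi}(x) = \lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}^{\pi}_{x}\!\left[\sum_{t=0}^{N-1} r(X_t, A_t)\right],
\qquad
h^{\pi}(x) = \lim_{N \to \infty} \mathbb{E}^{\pi}_{x}\!\left[\sum_{t=0}^{N-1}\bigl(r(X_t, A_t) - g^{\pi}(X_t)\bigr)\right],
\]

where the second limit is understood in the Cesàro sense for periodic chains. A gain optimal policy maximizes \(g^{\pi}\); a bias optimal policy maximizes \(h^{\pi}\) among the gain optimal policies.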
4. Singular Perturbations of Markov Chains and Decision Processes
Abstract
In this survey we present a unified treatment of both singular and regular perturbations in finite Markov chains and decision processes. The treatment is based on the analysis of series expansions of various important entities such as the perturbed stationary distribution matrix, the deviation matrix, the mean first-passage times matrix and others.
Konstantin E. Avrachenkov, Jerzy Filar, Moshe Haviv

Infinite State Models

Frontmatter
5. Average Reward Optimization Theory for Denumerable State Spaces
Abstract
In this chapter we deal with certain aspects of average reward optimality. It is assumed that the state space X is denumerably infinite, and that for each x ∈ X, the set A(x) of available actions is finite. It is possible to extend the theory to compact action sets, but at the expense of increased mathematical complexity. Finite action sets are sufficient for digitally implemented controls, and so we restrict our attention to this case.
Linn I. Sennott
6. Total Reward Criteria
Abstract
This chapter deals with total reward criteria. We discuss the existence and structure of optimal and nearly optimal policies and the convergence of value iteration algorithms under the so-called General Convergence Condition. This condition assumes that, for any initial state and for any policy, the expected sum of positive parts of rewards is finite. Positive, negative, and discounted dynamic programming problems are special cases when the General Convergence Condition holds.
Eugene A. Feinberg
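Written in symbols (a standard rendering of the condition described above, not a quotation from the chapter), the General Convergence Condition requires

\[
\mathbb{E}^{\pi}_{x}\!\left[\sum_{t=0}^{\infty} r^{+}(X_t, A_t)\right] < \infty
\quad \text{for every initial state } x \text{ and every policy } \pi,
\]

where \(r^{+} = \max(r, 0)\) is the positive part of the one-step reward.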
7. Mixed Criteria
Abstract
Mixed criteria are linear combinations of standard criteria which cannot be represented as standard criteria. Linear combinations of total discounted and average rewards as well as linear combinations of total discounted rewards with different discount factors are examples of mixed criteria. We discuss the structure of optimal policies and algorithms for their computation for problems with and without constraints.
Eugene A. Feinberg, Adam Shwartz
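As an illustration of the kind of objective meant here (the symbols are chosen for this note and are not the chapter's), a mixed criterion may take the form

\[
W(\pi) \;=\; \sum_{i=1}^{k} \lambda_i\, V_{\beta_i}(\pi) \;+\; \mu\, G(\pi),
\]

where \(V_{\beta_i}(\pi)\) is the total expected discounted reward under discount factor \(\beta_i\), \(G(\pi)\) is the average expected reward, and \(\lambda_i, \mu \ge 0\) are fixed weights.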
8. Blackwell Optimality
Abstract
In this introductory section we consider Blackwell optimality in Controlled Markov Processes (CMPs) with finite state and action spaces; for brevity, we call them finite models. We introduce the basic definitions, the Laurent-expansion technique, the lexicographical policy improvement, and the Blackwell optimality equation, which were developed at the early stage of the study of sensitive criteria in CMPs. We also mention some extensions and generalizations obtained afterwards for the case of a finite state space. In Chapter 2 the algorithmic approach to Blackwell optimality for finite models is given. We refer to that chapter for computational methods, especially for the linear programming method, which we do not introduce here.
Arie Hordijk, Alexander A. Yushkevich
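For orientation, the two central objects mentioned above can be stated as follows (standard formulations, with notation chosen for this note). With discount factor \(\beta \in (0,1)\) and interest rate \(\rho = (1-\beta)/\beta\), the discounted value of a stationary policy \(\pi\) in a finite model admits the Laurent expansion

\[
v_{\beta}^{\pi} \;=\; \sum_{n=-1}^{\infty} \rho^{\,n}\, y_{n}^{\pi},
\]

where \(y_{-1}^{\pi}\) is the gain and \(y_{0}^{\pi}\) the bias of \(\pi\). A policy \(\pi^{*}\) is Blackwell optimal if there exists \(\beta_{0} < 1\) such that \(v_{\beta}^{\pi^{*}} \ge v_{\beta}^{\pi}\) for all policies \(\pi\) and all \(\beta \in [\beta_{0}, 1)\).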
9. The Poisson Equation for Countable Markov Chains: Probabilistic Methods and Interpretations
Abstract
This paper considers the Poisson equation associated with time-homogeneous Markov chains on a countable state space. The discussion emphasizes probabilistic arguments and focuses on three separate issues, namely (i) the existence and uniqueness of solutions to the Poisson equation, (ii) growth estimates and bounds on these solutions and (iii) their parametric dependence. Answers to these questions are obtained under a variety of recurrence conditions.
Motivating applications can be found in the theory of Markov decision processes in both its adaptive and non-adaptive formulations, and in the theory of Stochastic Approximations. The results complement available results from Potential Theory for Markov chains, and are therefore of independent interest.
Armand M. Makowski, Adam Shwartz
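For reference, the equation in question has the following standard form (generic notation, not necessarily the chapter's): given a transition kernel \(P\) on the countable state space \(\mathsf{X}\) and a cost function \(f\), one seeks a constant \(g\) and a function \(h\) such that

\[
h(x) \;-\; \sum_{y \in \mathsf{X}} P(x, y)\, h(y) \;=\; f(x) \;-\; g, \qquad x \in \mathsf{X}.
\]

For a positive recurrent chain, \(g\) is the stationary average of \(f\) and \(h\) is determined up to an additive constant; in the MDP context \(h\) plays the role of the relative value (bias) function.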
10. Stability, Performance Evaluation, and Optimization
Abstract
The theme of this chapter is stability and performance approximation for MDPs on an infinite state space. The main results are centered around stochastic Lyapunov functions for verifying stability and bounding performance. An operator-theoretic framework is used to reduce the analytic arguments to the level of the finite state-space case.
Sean P. Meyn
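A typical drift condition of the kind referred to above (a standard Foster-Lyapunov form, shown here only as orientation) asks for a function \(V \ge 1\), a cost function \(f \ge 0\), a "small" set \(C\), and a constant \(b < \infty\) such that

\[
\sum_{y} P(x, y)\, V(y) \;\le\; V(x) \;-\; f(x) \;+\; b\, \mathbf{1}_{C}(x), \qquad x \in \mathsf{X}.
\]

Such a bound simultaneously certifies stability of the chain and yields the performance estimate that the stationary average of \(f\) is at most \(b\).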
11. Convex Analytic Methods in Markov Decision Processes
Abstract
This article describes the convex analytic approach to classical Markov decision processes wherein they are cast as a static convex programming problem in the space of measures. Applications to multiobjective problems are described.
Vivek S. Borkar
12. The Linear Programming Approach
Abstract
This chapter is concerned with the Linear Programming (LP) approach to MDPs in general Borel spaces, valid for several criteria, including the finite horizon and long run expected average cost, as well as the infinite horizon expected discounted cost.
Onésimo Hernández-Lerma, Jean B. Lasserre
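In the finite state and action, discounted-cost case the idea can be illustrated as follows (a standard finite-dimensional specialization; the chapter itself works in general Borel spaces). With discount factor \(\gamma\) and initial distribution \(\nu\), one minimizes over nonnegative occupation measures \(\mu\):

\[
\min_{\mu \ge 0}\; \sum_{x,a} c(x,a)\, \mu(x,a)
\quad \text{subject to} \quad
\sum_{a} \mu(y,a) \;-\; \gamma \sum_{x,a} P(y \mid x,a)\, \mu(x,a) \;=\; \nu(y) \quad \forall y.
\]

An optimal stationary policy can then be recovered from an optimal solution \(\mu^{*}\) via \(\pi(a \mid x) = \mu^{*}(x,a) / \sum_{a'} \mu^{*}(x,a')\) wherever the denominator is positive.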
13. Invariant Gambling Problems and Markov Decision Processes
Abstract
Markov decision problems can be viewed as gambling problems that are invariant under the action of a group or semi-group. It is shown that invariant stationary plans are almost surely adequate for a leavable, measurable, invariant gambling problem with a nonnegative utility function and a finite optimal reward function. This generalizes results about stationary plans for positive Markov decision models as well as measurable gambling problems.
Lester E. Dubins, Ashok P. Maitra, William D. Sudderth

Applications

Frontmatter
14. Neuro-Dynamic Programming: Overview and Recent Trends
Abstract
Neuro-dynamic programming is comprised of algorithms for solving large-scale stochastic control problems. Many ideas underlying these algorithms originated in the field of artificial intelligence and were motivated to some extent by descriptive models of animal behavior. This chapter provides an overview of the history and state-of-the-art in neuro-dynamic programming, as well as a review of recent results involving two classes of algorithms that have been the subject of much recent research activity: temporal-difference learning and actor-critic methods.
Benjamin Van Roy
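As a small self-contained illustration of temporal-difference learning (a generic TD(0) sketch on an invented two-state chain, not an algorithm taken from the chapter):

# TD(0) sketch: estimate the value of a fixed policy on a tiny Markov chain.
# The chain, rewards, step size, and discount factor are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
gamma, alpha = 0.9, 0.05

P = np.array([[0.8, 0.2],      # P[x, y]: transition matrix under the fixed policy
              [0.3, 0.7]])
r = np.array([1.0, 0.5])       # r[x]: expected one-step reward in state x

v = np.zeros(2)                # value estimates, updated from simulated transitions
x = 0
for _ in range(50_000):
    y = rng.choice(2, p=P[x])                  # sample the next state
    td_error = r[x] + gamma * v[y] - v[x]      # temporal-difference error
    v[x] += alpha * td_error                   # move the estimate toward the target
    x = y

print(v)   # approximates the solution of v = r + gamma * P v

Actor-critic methods, also surveyed in the chapter, maintain a parameterized policy (the actor) in addition to such a value estimate (the critic).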
15. Markov Decision Processes in Finance and Dynamic Options
Abstract
In this paper a discrete-time Markovian model for a financial market is chosen. The fundamental theorem of asset pricing relates the existence of a martingale measure to the no-arbitrage condition. It is explained how to prove the theorem by stochastic dynamic programming via portfolio optimization. The approach singles out certain martingale measures with additional interesting properties. Furthermore, it is shown how to use dynamic programming to study the smallest initial wealth x* that allows for super-hedging a contingent claim by some dynamic portfolio. There, a joint property of the set of policies in a Markov decision model and the set of martingale measures is exploited. The approach extends to dynamic options which are introduced here and are generalizations of American options.
Manfred Schäl
16. Applications of Markov Decision Processes in Communication Networks
Abstract
We present in this chapter a survey on applications of MDPs to communication networks. We survey both the different application areas in communication networks as well as the theoretical tools that have been developed to model and to solve the resulting control problems.
Eitan Altman
17. Water Reservoir Applications of Markov Decision Processes
Abstract
Decision problems in water resources management are usually stochastic, dynamic and multidimensional. MDP models have been used since the early fifties for the planning and operation of reservoir systems because the natural water inflows can be modeled using Markovian stochastic processes and the transition equations of mass conservation for the reservoir storages are akin to those found in inventory theory. However, the “curse of dimensionality” has been a major obstacle to the numerical solution of MDP models for systems with several reservoirs. Also, the use of optimization models for the operation of multipurpose reservoir systems is not so widespread, due to the need for negotiations between different users, with dam operators often relying on operating rules obtained by simulation models.
In this chapter, we present the basic concepts of reservoir management and we give a brief survey of stochastic inflow models based on statistical hydrology. We also present a stochastic dynamic programming model for the planning and operation of a system of hydroelectric reservoirs, and we discuss some applications and computational issues. We feel many research opportunities exist both in the enhancement of computational methods and in the modeling of reservoir applications.
Bernard F. Lamond, Abdeslem Boukhtouta
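To convey the flavor of such a model, the following minimal single-reservoir sketch runs a backward stochastic dynamic programming recursion over a discretized storage grid; the horizon, capacity, inflow distribution, and benefit function are invented for illustration and are far simpler than the multi-reservoir models discussed in the chapter.

# Backward stochastic dynamic programming sketch for a single reservoir
# (all data are hypothetical; storage, release and inflow are in discrete units).
import numpy as np

T = 12                                       # planning horizon, e.g. months
S_MAX = 10                                   # reservoir capacity
storages = np.arange(S_MAX + 1)
inflows = np.array([0, 1, 2, 3])             # possible random inflows per period
inflow_prob = np.array([0.2, 0.4, 0.3, 0.1])

def benefit(release):
    return np.sqrt(release)                  # concave benefit of released water (hypothetical)

V = np.zeros((T + 1, S_MAX + 1))             # V[t, s]: optimal expected value-to-go
policy = np.zeros((T, S_MAX + 1), dtype=int)

for t in range(T - 1, -1, -1):
    for s in storages:
        best_val, best_u = -np.inf, 0
        for u in range(s + 1):               # cannot release more than is stored
            # mass conservation: next storage = storage - release + inflow, capped (spill)
            s_next = np.minimum(s - u + inflows, S_MAX)
            val = benefit(u) + inflow_prob @ V[t + 1, s_next]
            if val > best_val:
                best_val, best_u = val, u
        V[t, s] = best_val
        policy[t, s] = best_u

print(policy[0])   # optimal first-period release as a function of initial storage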
Backmatter
Metadata
Title
Handbook of Markov Decision Processes
Edited by
Eugene A. Feinberg
Adam Shwartz
Copyright year
2002
Publisher
Springer US
Electronic ISBN
978-1-4615-0805-2
Print ISBN
978-1-4613-5248-8
DOI
https://doi.org/10.1007/978-1-4615-0805-2