About this Book

This book explores discrete-time dynamic optimization and provides a detailed introduction to both deterministic and stochastic models. Covering problems with finite and infinite horizon, as well as Markov renewal programs, Bayesian control models and partially observable processes, the book focuses on the precise modelling of applications in a variety of areas, including operations research, computer science, mathematics, statistics, engineering, economics and finance.
Dynamic Optimization is a carefully presented textbook which starts with discrete-time deterministic dynamic optimization problems, providing readers with the tools for sequential decision-making, before proceeding to the more complicated stochastic models. The authors present complete and simple proofs and illustrate the main results with numerous examples and exercises (without solutions). With relevant material covered in four appendices, this book is completely self-contained.

Table of Contents

Frontmatter

Chapter 1. Introduction and Organization of the Book

In this treatise we deal with optimization problems whose objective functions show a sequential structure and hence are amenable to sequential methods. The corresponding field mostly goes by the name Discrete-time Dynamic Programming. Other names are Discrete-time Stochastic Control and Multi-stage Optimization. In order to avoid possible confusion with programming in computer science we speak of Dynamic Optimization.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Deterministic Models

Frontmatter

Chapter 2. The Stationary Deterministic Model and the Basic Solution Procedure

We introduce a simple allocation problem as a prototype of deterministic dynamic optimization problems, first give an intuitive and then a formal description of the general problem, and derive the basic solution technique: value iteration and the optimality criterion. This allows us to derive structural properties of the solution of the allocation problem.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz
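
For orientation, the backward value-iteration step described above can be sketched in a few lines of Python. The model ingredients below (state set, admissible actions D, transition function T, one-stage reward r, terminal reward V0, discount factor beta, horizon N) are illustrative placeholders, not the allocation problem of the chapter.

```python
# A minimal value-iteration sketch for a finite deterministic DP.
# All model ingredients are illustrative placeholders.

def value_iteration(states, D, T, r, V0, beta, N):
    """Backward induction: V_n(s) = max_{a in D(s)} [ r(s,a) + beta * V_{n-1}(T(s,a)) ]."""
    V = {s: V0(s) for s in states}          # V_0: terminal value
    policy = []                              # maximizers, one per stage
    for n in range(1, N + 1):
        V_new, f_n = {}, {}
        for s in states:
            best_a, best_val = None, float("-inf")
            for a in D(s):
                val = r(s, a) + beta * V[T(s, a)]
                if val > best_val:
                    best_a, best_val = a, val
            V_new[s], f_n[s] = best_val, best_a
        V, policy = V_new, [f_n] + policy
    return V, policy                         # V = V_N, policy = (f_N, ..., f_1)

# Toy example: consume one unit or nothing from a stock of size s each period.
states = range(6)
V, policy = value_iteration(
    states,
    D=lambda s: [0, 1] if s > 0 else [0],    # admissible actions
    T=lambda s, a: s - a,                    # deterministic transition
    r=lambda s, a: a ** 0.5,                 # one-stage reward
    V0=lambda s: 0.0,                        # terminal reward
    beta=0.95, N=3,
)
```

Collecting a maximizer at every stage is what makes the optimality criterion applicable: the stagewise maximizers assemble into an optimal action sequence.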

Chapter 3. Additional General Issues

We present the basic theorems for cost minimization and for DPs with an absorbing set of states. We also prove the basic theorem using reachable states. The important notion of a bounding function is introduced.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Chapter 4. Examples of Deterministic Dynamic Programs

In this chapter we explicitly solve the following: optimal routing of a freighter, a production-inventory problem with linear costs, allocation and linear-quadratic problems and a scheduling problem. Then we discuss some further models: DPs with random length of periods and with random termination.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Chapter 5. Absorbing Dynamic Programs and Acyclic Networks

We study the problem of maximizing the sum of discounted rewards, earned not over a fixed number of periods, but until the decision process enters a given absorbing set. The basic theorem for absorbing DPs is derived. Moreover, we show how absorbing DPs can be used to find cost-minimal subpaths in acyclic networks.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz
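
As a rough illustration of the last point, the following Python sketch computes cost-minimal path values in a small acyclic network by treating the target node as the absorbing state; the graph and all names are illustrative placeholders.

```python
# Shortest paths in an acyclic network read as an absorbing DP:
# the target node is the absorbing state and V(s) is the minimal
# cost of a path from s to the target.

def min_cost_to_target(nodes, arcs, target):
    """nodes must be listed in topological order; arcs maps s -> [(s', cost), ...]."""
    V = {target: 0.0}
    # process nodes in reverse topological order so every successor is already valued
    for s in reversed(nodes):
        if s == target:
            continue
        candidates = [c + V[t] for t, c in arcs.get(s, []) if t in V]
        if candidates:
            V[s] = min(candidates)
    return V

nodes = ["A", "B", "C", "D"]                  # topologically ordered
arcs = {"A": [("B", 1.0), ("C", 4.0)],
        "B": [("C", 2.0), ("D", 6.0)],
        "C": [("D", 1.0)]}
print(min_cost_to_target(nodes, arcs, "D"))   # {'D': 0.0, 'C': 1.0, 'B': 3.0, 'A': 4.0}
```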

Chapter 6. Monotonicity of the Value Functions

Only a small number of DPs possess an explicit solution. Therefore structural (i.e. qualitative) properties of the value functions and maximizers are of considerable importance. Here we study the monotonicity of \(V_n(s)\) in n and, with arbitrary relations on S, also in s. Examples of structural properties considered in later sections are concavity and Lipschitz continuity of the value function. As a first benefit, each structural result can serve as a check on the correctness of numerical computations. Further benefits, specific to the structure considered, are discussed in the relevant places.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Chapter 7. Concavity and Convexity of the Value Functions

Here we deal with questions of concavity and convexity of the value functions, assuming that the functions under consideration are defined on convex sets or on a non-degenerate discrete interval.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Chapter 8. Monotone and ILIP Maximizers

We present conditions under which the pointwise smallest maximizer \(f_n(s)\) depends monotonically on the current state s, is increasing and Lipschitz continuous in s, is monotone in n, and is monotone in both n and s. We also introduce several algorithms for computing the smallest maximizer.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz
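
A minimal sketch, in Python, of how such monotonicity can be exploited computationally: if the smallest maximizer is known to be increasing in s, the action search for the next (larger) state can start where the previous search stopped. The model data below are illustrative placeholders.

```python
# Exploiting monotonicity of the smallest maximizer.

def smallest_monotone_maximizer(states, actions, W):
    """states, actions: increasing lists; W(s, a): value to be maximized in a.
    Assumes the smallest maximizer is increasing in s."""
    f, start = {}, 0
    for s in states:
        best_idx, best_val = start, W(s, actions[start])
        for i in range(start + 1, len(actions)):
            val = W(s, actions[i])
            if val > best_val:                 # strict ">" keeps the smallest maximizer
                best_idx, best_val = i, val
        f[s] = actions[best_idx]
        start = best_idx                       # monotonicity: no need to look below
    return f

# Toy example with a concave W whose peak moves to the right with s.
f = smallest_monotone_maximizer([0, 1, 2, 3], [0, 1, 2, 3, 4],
                                lambda s, a: -(a - s) ** 2)
print(f)   # {0: 0, 1: 1, 2: 2, 3: 3}
```

Restricting the search from below already removes many candidate actions whenever the maximizer moves up with s.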

Chapter 9. Existence of Optimal Action Sequences

In many special models the existence of maximizers at all stages (and hence by the OC in Theorem 2.3.3(c) also of s-optimal action sequences for all initial states s) can be established by ad hoc methods. The existence of a maximizer at each stage is also obvious if the set D(s) of admissible actions is finite for all s. In Proposition 7.1.10 we gave a result which covers many applications where S and A are intervals. The existence problem for maximizers under more general conditions is most easily dealt with under assumptions which are so strong that continuity (or at least upper semicontinuity) of \(W_n\) and also of \(V_n\) is implied.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Chapter 10. Stationary Models with Large Horizon

In this book we prefer to place more emphasis, at least from the conceptual point of view, on \(\lim_{n} DP_n\) than on \(DP_\infty\). There are a number of reasons for our approach. We present the general theory for \(\lim_{n} DP_n\) and discuss the optimality equation (Bellman equation) for the limit value function. DPs with infinite horizon are also considered.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz
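
A minimal Python sketch of successive approximation for the optimality (Bellman) equation \(V(s) = \max_{a \in D(s)} [\, r(s,a) + \beta V(T(s,a)) \,]\) in a stationary deterministic model with discount factor \(\beta < 1\); all model data below are illustrative placeholders.

```python
# Successive approximation of the fixed point of the Bellman equation.

def bellman_fixed_point(states, D, T, r, beta, tol=1e-8, max_iter=10_000):
    V = {s: 0.0 for s in states}
    for _ in range(max_iter):
        V_new = {s: max(r(s, a) + beta * V[T(s, a)] for a in D(s)) for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new
    return V

# Toy example: keep the stock (a = 0) or consume it entirely (a = s).
states = range(4)
V = bellman_fixed_point(
    states,
    D=lambda s: [0, s],
    T=lambda s, a: s - a,
    r=lambda s, a: float(a),
    beta=0.9,
)
```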

Markovian Decision Processes

Frontmatter

Chapter 11. Control Models with Disturbances

In this chapter we introduce control models with finite and i.i.d. disturbances. We prove the reward iteration and derive the basic solution techniques: value iteration and optimality criterion.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz
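
A minimal Python sketch of the resulting value-iteration step for a control model with finite, i.i.d. disturbances, \(V_n(s) = \max_{a\in D(s)} \sum_{\eta} q(\eta)\,[\, r(s,a,\eta) + \beta V_{n-1}(T(s,a,\eta)) \,]\); the inventory-style data below are illustrative placeholders, not an example from the chapter.

```python
# Value iteration for a control model with i.i.d. finite disturbances.

def cm_value_iteration(states, D, T, r, q, beta, N):
    """q: dict mapping each disturbance value to its probability (i.i.d. over periods)."""
    V = {s: 0.0 for s in states}
    for _ in range(N):
        V = {s: max(sum(p * (r(s, a, e) + beta * V[T(s, a, e)]) for e, p in q.items())
                    for a in D(s))
             for s in states}
    return V

# Toy inventory example: order a units, random demand e, stock capped at 3.
states = range(4)
V = cm_value_iteration(
    states,
    D=lambda s: range(0, 4 - s),                       # admissible order sizes
    T=lambda s, a, e: max(s + a - e, 0),               # next stock level
    r=lambda s, a, e: 2.0 * min(s + a, e) - 1.0 * a,   # sales revenue minus ordering cost
    q={0: 0.3, 1: 0.5, 2: 0.2},                        # demand distribution
    beta=0.9, N=10,
)
```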

Chapter 12. Markovian Decision Processes with Finite Transition Law

First we introduce MDPs with finite state spaces, prove the reward iteration, and derive the basic solution techniques: value iteration and the optimality criterion. Then MDPs with finite transition law are considered; there the set of reachable states is finite.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz
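
For comparison with the control-model sketch above, here is a minimal Python sketch of value iteration when the model is given directly by transition probabilities \(p(s'\,|\,s,a)\); the two-state maintenance-style example is an illustrative placeholder.

```python
# Value iteration for an MDP with finite state space and finite transition law:
# V_n(s) = max_{a in D(s)} [ r(s,a) + beta * sum_{s'} p(s'|s,a) V_{n-1}(s') ].

def mdp_value_iteration(states, D, p, r, beta, N):
    V = {s: 0.0 for s in states}               # V_0 = 0 (no terminal reward)
    for _ in range(N):
        V = {s: max(r(s, a) + beta * sum(q * V[t] for t, q in p(s, a).items())
                    for a in D(s))
             for s in states}
    return V

states = ["good", "bad"]
p = lambda s, a: ({"good": 0.9, "bad": 0.1} if a == "repair"
                  else {"good": 0.3, "bad": 0.7})
V = mdp_value_iteration(
    states,
    D=lambda s: ["repair", "wait"],
    p=p,
    r=lambda s, a: (1.0 if s == "good" else 0.0) - (0.4 if a == "repair" else 0.0),
    beta=0.95, N=20,
)
```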

Chapter 13. Examples of Markovian Decision Processes with Finite Transition Law

In this chapter we investigate several examples and models with finite transition law: an allocation problem with random investment, an inventory problem, MDPs with an absorbing set of states, MDPs with random initial state, stopping problems and terminating MDPs. Finally, stationary MDPs are generalized to non-stationary MDPs.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Chapter 14. Markovian Decision Processes with Discrete Transition Law

We consider MDPs with countable state spaces and variable discount factors. The discount factor may depend on the state and the action. Under minimal assumptions we prove the reward iteration and formulate a structure theorem for MDPs. Also the useful notion of a bounding function is introduced.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Chapter 15. Examples with Discrete Disturbances and with Discrete Transition Law

In this chapter we apply the general theorems from Chap. 14 to special examples. In particular, we consider a production-inventory problem with backlogging and delivery lag and a queueing model with arrival control.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Chapter 16. Models with Arbitrary Transition Law

In this chapter we consider MDPs with general measurable state and action space. We extend the results of Chaps. 12 and 14 to the present model. Under minimal assumptions we state the reward iteration and the structure theorem. Binary MDPs and continuous versions of examples illustrate the results. A useful generalization is the MDP with random environment, where the random environment (such as economic factors) evolves within a set of states as an uncontrolled Markov chain.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Chapter 17. Existence of Optimal Policies

In this chapter the state and action spaces are assumed to be separable metric spaces. We apply the measurable selection theorem of Kuratowski/Ryll-Nardzewski in order to prove the existence of maximizers. We present different sets of assumptions under which optimal policies for MDPs exist.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Chapter 18. Stochastic Monotonicity and Monotonicity of the Value Functions

We consider MDPs and CMs with structured state space and arbitrary transition law. We assume that the minimal assumption (MA1) holds. As in Chap. 6 we are looking for conditions under which the value functions are monotone in the initial state s. This is easy for CMs, but requires a thorough treatment of the notion of stochastic monotonicity for MDPs. Unless stated otherwise, a result holds for both MDPs and CMs.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz
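
For orientation, the standard form of stochastic monotonicity used in such arguments reads (notation illustrative): the transition law \(P(\cdot\,|\,s,a)\) is stochastically increasing in s if

\[ s \;\mapsto\; \int v(s')\, P(ds' \mid s, a) \quad \text{is increasing for every increasing, bounded, measurable } v . \]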

Chapter 19. Concavity and Convexity of the Value Functions and Monotone Maximizers

We now extend some of the results in Chaps. 7 and 8 to CMs and MDPs. The latter are considerably more difficult to deal with than CMs.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Chapter 20. Markovian Decision Processes with Large and with Infinite Horizon

We now extend the investigation in Sect. 12.2 to MDPs with arbitrary transition law. The general issues arising in the case of a large or infinite horizon are described in detail at the beginning of Chap. 10 and in Sect. 10.5.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Generalizations of Markovian Decision Processes

Frontmatter

Chapter 21. Markovian Decision Processes with Disturbances

In Part II we treated two basic models of Stochastic Dynamic Programming: Control Models with independent disturbances and Markovian Decision Processes. In the first of these models the state \(\zeta_t = \zeta_t^{\pi}\) in period t and the disturbance \(\eta_{t+1}\) in the same period are stochastically independent of each other. However, there are other important models in which \(\eta_{t+1}\) depends on \(\zeta_t\). This holds in particular in some of the Markov renewal programs treated in Chap. 22 below, where a random time \(\eta_{t+1}\) elapses between the t-th and the (t+1)-st decision; another relevant model is the Markovian control model MCM introduced below. We now present a model, called an MDP with disturbances, which due to a very flexible transition law comprises all these models. As a further important generalization, necessary for the Markov renewal programs, we allow the discount factor in each period to be a function of the disturbance.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz
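
To indicate where the disturbance-dependent discount factor enters, a one-step value iteration in such a model typically takes the form below; the notation (\(Q\) for the disturbance distribution, \(T\) for the transition function, \(\beta(\cdot)\) for the discount factor) is assumed here for illustration and need not match the chapter's.

\[ V_n(s) \;=\; \sup_{a\in D(s)} \int \Big( r(s,a,\eta) \;+\; \beta(\eta)\, V_{n-1}\big(T(s,a,\eta)\big) \Big)\, Q(d\eta \mid s,a). \]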

Chapter 22. Markov Renewal Programs

In the models of Part II we usually assumed a constant discount factor. This is only appropriate if the decisions are taken at equally spaced time points. However, there are many situations where the times between successive decisions are random variables whose probability distribution depends in a Markovian manner on the momentary state and action. Prominent examples are the optimal control of queueing systems, where decisions (e.g. about the speed of service or the admission of customers) are often taken at the arrival or departure times of customers. Such situations can be described by special MDPDs in which the disturbance variable \(\eta_{t+1}\) is the random time between the t-th and the (t+1)-st decision, t ≥ 0.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz
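
For orientation: with a continuous interest rate \(\alpha > 0\), the random inter-decision time \(\eta_{t+1}\) enters through the discount factor \(e^{-\alpha\eta_{t+1}}\), which leads to optimality equations of roughly the following form (the notation is illustrative, not the chapter's exact one):

\[ V(s) \;=\; \sup_{a\in D(s)} \Big( r(s,a) \;+\; \mathbb{E}\big[\, e^{-\alpha \eta_{1}}\, V(\zeta_{1}) \,\big|\, \zeta_{0}=s,\ a \,\big] \Big). \]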

Chapter 23. Bayesian Control Models

We present the Bayesian approach to solving control models with i.i.d. disturbances where the reward functions and the distributions depend measurably on some unknown parameter. We introduce the associated MDP and state the basic theorem for Bayesian control models. Proofs of the results are involved and postponed to Chap. 25. A gambling problem illustrates the results.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz
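
A minimal Python sketch of the Bayesian ingredient of such models, namely updating a prior over a finite parameter set after observing i.i.d. disturbances whose distribution depends on the unknown parameter; the Bernoulli example and all names are illustrative placeholders, not the gambling problem of the chapter.

```python
# Posterior updating over a finite parameter set Theta.

def posterior(prior, likelihood, observations):
    """prior: dict theta -> probability; likelihood(x, theta): probability of observing x."""
    post = dict(prior)
    for x in observations:
        post = {th: post[th] * likelihood(x, th) for th in post}
        norm = sum(post.values())
        post = {th: w / norm for th, w in post.items()}
    return post

# Two candidate success probabilities for a Bernoulli disturbance.
prior = {0.3: 0.5, 0.7: 0.5}
bernoulli = lambda x, th: th if x == 1 else 1.0 - th
print(posterior(prior, bernoulli, observations=[1, 1, 0, 1]))
```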

Chapter 24. Examples of Bayesian Control Models

In this chapter we study several examples: linear-quadratic and gambling problems, stopping problems and asset selling. Important applications of the Bayesian theory are those statistical decision problems (in particular problems of estimation and testing) which are not based on a fixed number of observations, but where sampling may be discontinued at a random time, determined by the observations made so far. An example is the sequential probability ratio test for two simple hypotheses.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Chapter 25. Bayesian Models with Disturbances

The model of the present chapter generalizes the Bayesian control models (BCMs) from Chap. 23. The BCMs in Chap. 23 were essentially families of CMs, indexed by the unknown parameter ϑ ∈ Θ. We now consider a Bayesian MDPD (BMDPD for short), which essentially is a family of MDPDs (cf. Chap. 21), indexed by ϑ, where the transition law has the factorization property below (see Definition 25.1.2).
Karl Hinderer, Ulrich Rieder, Michael Stieglitz

Chapter 26. Partially Observable Models

POMs or models with incomplete information are more general than the Bayesian models from the previous chapter. We introduce the associated MDPD and state the basic theorem for POMs. A classical maintenance problem illustrates the solution technique.
Karl Hinderer, Ulrich Rieder, Michael Stieglitz
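
A minimal Python sketch of the belief (filter) update that underlies the passage from a partially observable model to an associated fully observable one over distributions on the hidden states; the two-state maintenance-style data are illustrative placeholders, and the action dependence of the transition law is suppressed for brevity.

```python
# One Bayes step of the belief update for a partially observable model.

P = {  # transition probabilities p(s'|s) for a single action "keep"
    "intact": {"intact": 0.8, "worn": 0.2},
    "worn":   {"intact": 0.0, "worn": 1.0},
}
Q = {  # observation probabilities q(y|s')
    "intact": {"ok": 0.9, "alarm": 0.1},
    "worn":   {"ok": 0.4, "alarm": 0.6},
}

def belief_update(belief, y):
    """Prediction through P, then correction by the observation y."""
    predicted = {s2: sum(P[s][s2] * belief[s] for s in belief) for s2 in P}
    unnorm = {s2: Q[s2][y] * w for s2, w in predicted.items()}
    norm = sum(unnorm.values())
    return {s2: w / norm for s2, w in unnorm.items()}

b = belief_update({"intact": 0.7, "worn": 0.3}, y="alarm")   # posterior after an "alarm"
```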

Backmatter
