
## About this book

This book presents the first part of a planned two-volume series devoted to a systematic exposition of some recent developments in the theory of discrete-time Markov control processes (MCPs). Interest is mainly confined to MCPs with Borel state and control (or action) spaces, and possibly unbounded costs and noncompact control constraint sets. MCPs are a class of stochastic control problems, also known as Markov decision processes, controlled Markov processes, or stochastic dynamic programs; sometimes, particularly when the state space is a countable set, they are also called Markov decision (or controlled Markov) chains. Regardless of the name used, MCPs appear in many fields, for example, engineering, economics, operations research, statistics, renewable and nonrenewable resource management, (control of) epidemics, etc. However, most of the literature (say, at least 90%) is concentrated on MCPs for which (a) the state space is a countable set, and/or (b) the costs-per-stage are bounded, and/or (c) the control constraint sets are compact. But curiously enough, the most widely used control model in engineering and economics, namely the LQ (Linear system/Quadratic cost) model, satisfies none of these conditions. Moreover, when dealing with "partially observable" systems, a standard approach is to transform them into equivalent "completely observable" systems in a larger state space (in fact, a space of probability measures), which is uncountable even if the original state process is finite-valued.

## Table of contents

### 1. Introduction and Summary

Abstract
In an optimal control problem, we are given a dynamical system whose behavior may be influenced or regulated by a suitable choice of some of the system’s variables, which are called control (or action, or decision) variables. The controls that can be applied at any given time are chosen according to “rules” known as control policies. In addition, we are given a function called a performance criterion (or performance index), defined on the set of control policies, which measures or evaluates in some sense the system’s response to the control policies being used. Then the optimal control problem is to determine a control policy that optimizes (i.e., either minimizes or maximizes) the performance criterion.
Onésimo Hernández-Lerma, Jean Bernard Lasserre

### 2. Markov Control Processes

Abstract
The main objective of this chapter is to set the stage for the rest of the book by formally introducing the controlled stochastic processes in which we are interested. An informal discussion of the main concepts, namely, Markov control models, control policies, and Markov control processes (MCPs), was already presented in §1.2. Their meaning is made precise in this chapter.
Onésimo Hernández-Lerma, Jean Bernard Lasserre

### 3. Finite-Horizon Problems

Abstract
In this chapter, we consider the Markov control model
$$(X, A, \{A(x) \mid x \in X\}, Q, c),$$
(3.1.1)
introduced in Definition 2.2.1, and the control problem we are interested in is to minimize the finite-horizon performance criterion
$$J(\pi, x) := E_x^\pi \left[ \sum_{t=0}^{N-1} c(x_t, a_t) + c_N(x_N) \right],$$
(3.1.2)
with $c_N$, the terminal cost function, a given measurable function on $X$.
Onésimo Hernández-Lerma, Jean Bernard Lasserre
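
For intuition, the criterion (3.1.2) can be minimized numerically when the model is small. The following is a minimal sketch, not the book's own treatment: it assumes finite state and action sets (the book works with Borel spaces), assumes every action is admissible in every state, and uses backward induction, a standard dynamic programming technique. The function name `backward_induction`, the array layout, and the two-state example data are all hypothetical.

```python
import numpy as np

def backward_induction(Q, c, c_N, N):
    """Minimize the finite-horizon criterion for a finite model.

    Assumed layout (illustrative only):
      Q[a, x, y] : probability of moving from state x to y under action a
      c[x, a]    : one-stage cost
      c_N[x]     : terminal cost
      N          : horizon length
    Returns the optimal cost from each initial state and, for each
    time t = 0..N-1, an array giving a minimizing action per state.
    """
    J = np.asarray(c_N, dtype=float)  # value at the final stage is c_N
    policy = []
    for _ in range(N):
        # q[x, a] = c(x, a) + sum_y Q(y | x, a) * J(y)
        q = c + np.einsum("axy,y->xa", Q, J)
        policy.append(q.argmin(axis=1))
        J = q.min(axis=1)
    policy.reverse()  # after reversing, policy[t][x] is the action at time t
    return J, policy

# Hypothetical two-state, two-action model.
Q = np.array([
    [[0.9, 0.1], [0.4, 0.6]],  # transitions under action 0
    [[0.2, 0.8], [0.5, 0.5]],  # transitions under action 1
])
c = np.array([[1.0, 2.0], [3.0, 0.5]])
c_N = np.array([0.0, 1.0])

J0, policy = backward_induction(Q, c, c_N, N=3)
```

Here `J0[x]` is the minimal value of the criterion from initial state `x`, and `policy[t]` is a deterministic Markov decision rule for stage `t`. State-dependent constraint sets $A(x)$ could be handled by setting the cost of inadmissible pairs to infinity, which is again an assumption of this sketch.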

### 4. Infinite-Horizon Discounted-Cost Problems

Abstract
As was already mentioned in §3.4 (Remark 3.4.1), the motivation to study discounted cost problems is mainly economic. In that section, we considered finite-horizon problems, but for many purposes it is convenient to introduce the fiction that the optimization horizon is infinite. Certainly, for instance, processes of capital accumulation for an economy, or some problems on inventory or portfolio management, do not necessarily have a natural stopping time in the definable future.
Onésimo Hernández-Lerma, Jean Bernard Lasserre

### 5. Long-Run Average-Cost Problems

Abstract
In this chapter, we study the long-run expected average cost per unit-time criterion, hereafter abbreviated average cost or AC criterion, which is defined as follows.
Onésimo Hernández-Lerma, Jean Bernard Lasserre

### 6. The Linear Programming Formulation

Abstract
A time-honored approach to studying optimal control problems (OCPs) is via mathematical programming techniques on suitable spaces. This approach is in principle applicable to almost any class of OCPs, deterministic or stochastic, in discrete or continuous time, constrained or unconstrained, with finite or infinite optimization horizon—some references are given in §6.6. The preferred techniques, on the other hand, include the Lagrange multipliers method and convex and linear programming techniques.
Onésimo Hernández-Lerma, Jean Bernard Lasserre

### Backmatter
