
## About this Book

The theory of Markov decision processes focuses on controlled Markov chains in discrete time. The authors establish the theory for general state and action spaces and at the same time show its application by means of numerous examples, mostly taken from the fields of finance and operations research. Using a structural approach, they avoid many measure-theoretic technicalities. They cover problems with finite and infinite horizons, as well as partially observable Markov decision processes, piecewise deterministic Markov decision processes and stopping problems.

The book presents Markov decision processes in action and includes various state-of-the-art applications with a particular view towards finance. It is useful for upper-level undergraduates, Master's students and researchers in both applied probability and finance, and provides exercises (without solutions).

## Table of Contents

### Chapter 1. Introduction and First Examples

Suppose a system is given which can be controlled by sequential decisions. The state transitions are random and we assume that the system state process is Markovian, which means that, given the current state, previous states have no influence on future states. Given the current state of the system (which could be, for example, the wealth of an investor), the controller or decision maker has to choose an admissible action (for example a possible investment). Once an action is chosen, there is a random system transition according to a stochastic law (for example a change in the asset value) which leads to a new state. The task is to control the process in an optimal way. In order to formulate a reasonable optimization criterion we assume that each time an action is taken, the controller obtains a certain reward. The aim is then to control the system in such a way that the expected total discounted rewards are maximized. Taken together, these quantities, which have so far been described informally, define a so-called Markov Decision Process. The Markov Decision Process is the sequence of random variables (Xn) which describes the stochastic evolution of the system states. Of course the distribution of (Xn) depends on the chosen actions. Figure 1.1 shows the schematic evolution of a Markov Decision Process.
Nicole Bäuerle, Ulrich Rieder
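The informal description above can be sketched in a few lines of code. This is a toy illustration, not from the book: the integer state, the three-action set, the reward x − a², and the noise distribution are all hypothetical choices.

```python
import random

def simulate_mdp(policy, x0=0, horizon=5, discount=0.9, seed=42):
    """Simulate one path of a toy Markov Decision Process.

    State: integer level (e.g. wealth). Action: a in {-1, 0, 1} (hypothetical).
    Transition: X_{n+1} = X_n + a + noise. One-stage reward: r(x, a) = x - a*a.
    Returns the total discounted reward along the simulated path.
    """
    rng = random.Random(seed)
    x, total = x0, 0.0
    for n in range(horizon):
        a = policy(x)                           # decision rule: state -> action
        total += (discount ** n) * (x - a * a)  # discounted one-stage reward
        x = x + a + rng.choice([-1, 0, 1])      # random Markovian transition
    return total

value = simulate_mdp(policy=lambda x: 0)  # "do nothing" policy, one sample path
```

A policy here is simply a map from states to admissible actions; the optimization problem of the later chapters asks for the policy maximizing the expectation of this discounted total.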

### Chapter 2. Theory of Finite Horizon Markov Decision Processes

In this chapter we will establish the theory of Markov Decision Processes with a finite time horizon and with general state and action spaces. Optimization problems of this kind can be solved by a backward induction algorithm. Since the state and action spaces are arbitrary, we will impose a structure assumption on the problem in order to prove the validity of the backward induction and the existence of optimal policies. The chapter is organized as follows.
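For finite state and action spaces (where the structure assumption is trivially satisfied), the backward induction algorithm can be sketched as follows; the two-state example at the bottom is purely illustrative.

```python
def backward_induction(states, actions, horizon, reward, transition):
    """Backward induction for a finite-horizon MDP (finite toy version).

    reward(x, a): one-stage reward; transition(x, a): dict next_state -> prob.
    Returns value functions V[0..horizon] and optimal decision rules f[0..horizon-1].
    """
    V = [{x: 0.0 for x in states} for _ in range(horizon + 1)]  # V[horizon] = 0
    f = [dict() for _ in range(horizon)]
    for n in range(horizon - 1, -1, -1):        # step backwards in time
        for x in states:
            best_a, best_v = None, float("-inf")
            for a in actions(x):                 # maximize over admissible actions
                v = reward(x, a) + sum(p * V[n + 1][y]
                                       for y, p in transition(x, a).items())
                if v > best_v:
                    best_a, best_v = a, v
            V[n][x] = best_v
            f[n][x] = best_a
    return V, f

# Toy two-state example (all numbers hypothetical): reward 1 for staying in state 1.
V, f = backward_induction(
    states=[0, 1],
    actions=lambda x: ["stay", "move"],
    horizon=3,
    reward=lambda x, a: 1.0 if (x == 1 and a == "stay") else 0.0,
    transition=lambda x, a: {x: 1.0} if a == "stay" else {1 - x: 1.0},
)
```

The sequence of maximizers f[0], f[1], ... forms an optimal (Markov) policy; in the toy example the optimal rule in state 0 is to move to state 1 and then stay.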

### Chapter 3. The Financial Markets

In this chapter we introduce the financial markets which will appear in our applications. In Section 3.1 a financial market in discrete time is presented. A prominent example is the binomial model. However, we do not restrict ourselves to finite probability spaces in general. We will define portfolio strategies and characterize the absence of arbitrage in this market. In later chapters we will often restrict to Markov asset price processes in order to be able to use the Markov Decision Process framework. In Section 3.2 a special financial market in continuous time is considered which is driven by jumps only. More precisely, the asset dynamics follow so-called Piecewise Deterministic Markov Processes. Though portfolio strategies are defined in continuous time here, we will see in Section 9.3 that portfolio optimization problems in this market can be solved with the help of Markov Decision Processes. In Section 3.3 we will briefly investigate the relation of the discrete-time financial market to the standard Black-Scholes-Merton model as a widely used benchmark model in mathematical finance. Indeed, if the parameters in the discrete-time financial market are chosen appropriately, this market can be seen as an approximation of the Black-Scholes-Merton model or of even more general models. This observation serves as one justification for the importance of discrete-time models. Other justifications are that trading in continuous time is not possible or expensive in reality (because of transaction costs) and that continuous-time trading strategies are often quite risky. In Section 3.4 utility functions and the concept of expected utility are introduced and discussed briefly. The last section contains some notes and references.
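For the binomial model mentioned above, the absence of arbitrage can be characterized explicitly: a one-period market with riskless return 1 + r and stock factors u > d is arbitrage-free if and only if d < 1 + r < u, in which case the risk-neutral up-probability is q = (1 + r − d)/(u − d). A small sketch (parameter values are illustrative):

```python
def risk_neutral_prob(u, d, r):
    """Risk-neutral up-probability in the one-period binomial model.

    The market (bond return 1 + r, stock factors u > d) is arbitrage-free
    iff d < 1 + r < u; then q = (1 + r - d) / (u - d) lies in (0, 1).
    """
    if not (d < 1 + r < u):
        raise ValueError("arbitrage: need d < 1 + r < u")
    return (1 + r - d) / (u - d)

q = risk_neutral_prob(u=1.2, d=0.8, r=0.05)  # illustrative parameters; q = 0.625
```

Under q the discounted stock price is a martingale, which is the one-period version of the no-arbitrage characterization developed in this chapter.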

### Chapter 4. Financial Optimization Problems

The theory of Markov Decision Processes which has been presented in Chapter 2 will now be applied to some selected dynamic optimization problems in finance. The basic underlying model is the financial market of Chapter 3. We will always assume that investors are small and cannot influence the asset price process. We begin in the first two sections with the classical problem of maximizing the expected utility of terminal wealth. In Section 4.1 we consider the general one-period model. It will turn out that the existence of an optimal portfolio strategy is equivalent to the absence of arbitrage in this market. Moreover, the one-period problem is the key building block for the multiperiod problems which are investigated in Section 4.2 and which can be solved with the theory of Markov Decision Processes. In this section we will also present some results for special utility functions, and the relation to continuous-time models is highlighted. In Section 4.3 consumption and investment problems are treated and solved explicitly for special utility functions. The next section generalizes these models to include regime switching. Here a Markov chain is used to model the changing economic conditions which give rise to a changing return distribution. Under some simplifying assumptions this problem is solved and the influence of the environment is discussed. Section 4.5 deals with models with proportional transaction costs. For homogeneous utility functions it will turn out that the action space is separated into sell-, buy- and no-transaction regions which are defined by cones. The next section considers dynamic mean-variance problems. In contrast to utility functions, the idea is now to measure the risk by the portfolio variance and to search, among all portfolios which yield at least a certain expected return, for the one with the smallest portfolio variance. The challenge here is to reduce this problem to a Markov Decision Problem first. Essentially the task boils down to solving a linear-quadratic problem. In Section 4.7 the variance is replaced by the risk measure ‘Average-Value-at-Risk’. In order to obtain an explicit solution in this mean-risk model, only the binomial model is considered here and the relation to the mean-variance problem is discussed. Section 4.8 deals with index-tracking problems and Section 4.9 investigates the problem of indifference pricing in incomplete markets. Finally, the penultimate section explains the relation to continuous-time models and briefly introduces the approximating Markov chain approach. The last section contains some remarks and references.
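As a toy instance of the terminal wealth problem, one can maximize the expected log-utility of one-period wealth over the fraction of wealth invested in the stock in a binomial model. The sketch below uses a crude grid search rather than the book's methods; all parameter values are hypothetical.

```python
import math

def expected_log_utility(alpha, u, d, r, p):
    """Expected log-utility of terminal wealth when a fraction alpha of
    wealth is invested in the stock (one-period binomial model)."""
    w_up = 1 + r + alpha * (u - 1 - r)    # relative wealth after an up-move
    w_dn = 1 + r + alpha * (d - 1 - r)    # relative wealth after a down-move
    if w_up <= 0 or w_dn <= 0:
        return float("-inf")              # log-utility forbids ruin
    return p * math.log(w_up) + (1 - p) * math.log(w_dn)

# Crude grid search for the optimal stock fraction (illustrative parameters):
params = dict(u=1.4, d=0.7, r=0.0, p=0.5)
alphas = [i / 100 for i in range(0, 101)]
alpha_star = max(alphas, key=lambda a: expected_log_utility(a, **params))
```

With these numbers the interior optimum lies near alpha = 5/12, so the grid search returns 0.42; for other utility functions the same one-period optimization is the building block of the multiperiod backward induction.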

### Chapter 5. Partially Observable Markov Decision Processes

In many applications the decision maker has only partial information about the state process, i.e. part of the state cannot be observed. Examples can be found in engineering, economics, statistics, speech recognition and learning theory among others. An important financial application is given when the drift of a stock price process is unobservable and hard to estimate.

### Chapter 6. Partially Observable Markov Decision Problems in Finance

All the models which have been considered in Chapter 4 may also be treated in the case of partial information. Indeed, models of this type occur somewhat naturally in mathematical finance because there are underlying economic factors influencing asset prices which are not specified and cannot be observed. Moreover, the drift of a stock, for example, is notoriously difficult to estimate. In this chapter we assume that the relative risk return distribution of the stocks is determined up to an unknown parameter which may change. This concept can also be interpreted as one way of dealing with model ambiguity. We choose two of the models from Chapter 4 and extend them to partial observation. The first is the general terminal wealth problem of Section 4.2 and the second is the dynamic mean-variance problem of Section 4.6.

### Chapter 7. Theory of Infinite Horizon Markov Decision Processes

In this chapter we consider Markov Decision Processes with an infinite time horizon. There are situations where problems with infinite time horizon arise in a natural way, e.g. when the random lifetime of the investor is considered. However, more important is the fact that Markov Decision Models with finite but large horizon can be approximated by models with infinite time horizon. The latter is often simpler to solve and mostly admits a (time-)stationary optimal policy. On the other hand, the infinite time horizon makes it necessary to invoke some convergence assumptions. Moreover, for the theory it is necessary that properties of the finite horizon value functions carry over to the limit function.
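The fixed-point idea behind the infinite-horizon theory can be illustrated by value iteration on a finite toy model: under discounting with factor β < 1 the Bellman operator is a contraction, so iterating it converges to its unique fixed point, the value function. The example model below is hypothetical.

```python
def value_iteration(states, actions, reward, transition, beta=0.9, tol=1e-8):
    """Iterate the Bellman operator T until an approximate fixed point.

    T is a contraction with modulus beta < 1 on bounded functions,
    so V_{n+1} = T V_n converges to the unique fixed point V*.
    """
    V = {x: 0.0 for x in states}
    while True:
        V_new = {
            x: max(reward(x, a) + beta * sum(p * V[y]
                                             for y, p in transition(x, a).items())
                   for a in actions(x))
            for x in states
        }
        if max(abs(V_new[x] - V[x]) for x in states) < tol:
            return V_new
        V = V_new

# Toy deterministic example (hypothetical): reward 1 in state 1, 0 in state 0.
V = value_iteration(
    states=[0, 1],
    actions=lambda x: ["stay", "move"],
    reward=lambda x, a: float(x == 1),
    transition=lambda x, a: {x: 1.0} if a == "stay" else {1 - x: 1.0},
)
```

In this example the fixed-point equation gives V(1) = 1/(1 − β) = 10 and V(0) = β·V(1) = 9, which the iteration approximates to within the tolerance.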

### Chapter 8. Piecewise Deterministic Markov Decision Processes

In this chapter we deal with optimization problems where the state process is a Piecewise Deterministic Markov Process. These processes evolve through random jumps at random time points while the behavior between jumps is governed by an ordinary differential equation. They form a general and important class of non-diffusions. It is known that every strong Markov process with continuous paths of bounded variation is necessarily deterministic. We assume that both the jump behavior as well as the drift behavior between jumps can be controlled. Hence this leads to a control problem in continuous time which can be tackled, for example, via the Hamilton-Jacobi-Bellman equation. However, since the evolution between jumps is deterministic, these problems can also be reduced to a discrete-time Markov Decision Process where, however, the action space is now a function space. We can treat these problems with the methods we have established in the previous chapters. More precisely, we will restrict the presentation to problems with infinite horizon, thus we will use the results of Chapter 7. We show that under some continuity and compactness conditions the value function of the Piecewise Deterministic Markov Decision Process is a fixed point of the Bellman equation (Theorem 8.2.6) and the computational methods of Chapter 7 apply. In Section 8.3 the important special class of continuous-time Markov Decision Chains is investigated, in particular for problems with finite time horizon.

### Chapter 9. Optimization Problems in Finance and Insurance

We will now apply the theory of infinite horizon Markov Decision Models to solve some optimization problems in finance. In Section 9.1 we consider a consumption and investment problem with random horizon which leads to a contracting Markov Decision Model with infinite horizon as explained in Section 7.6.1. Explicit solutions in the case of a power utility are given. In Section 9.2 a classical dividend pay-out problem for an insurance company is investigated. In this example the state and action space are both discrete, which implies that all functions on E×A are continuous and we can work with Theorem 7.2.1. Here the Markov Decision Model is not contracting. The main part of this section shows that there exists an optimal stationary policy which is a so-called band-policy. In special cases this band-policy reduces to a barrier-policy, i.e. it is optimal to pay out all the money which is above a certain threshold. In Section 9.3 we consider a utility maximization problem in a financial market where the stock prices are Piecewise Deterministic Markov Processes. This optimization problem is contracting and our results from Chapters 7 and 8 allow a characterization of the value function and some computational approaches which complement the classical stochastic control approach via the Hamilton-Jacobi-Bellman equation. Some numerical results are also given. In Section 9.4 we study the liquidation of a large amount of shares in so-called dark pools. This is a continuous-time Markov Decision Chain with finite time horizon (see Section 8.3). Using the discrete-time solution approach we are able to derive some interesting properties of the optimal liquidation policy.

### Chapter 10. Theory of Optimal Stopping Problems

A very important subclass of the Markov Decision Problems considered so far are optimal stopping problems. There, a Markov process (Xn) is given which cannot be influenced by the decision maker. However, this process has to be stopped at some time point n and a reward gn(Xn) is then obtained. Thus, the only decision at each time point is whether the process should be continued or stopped. Once it is stopped, no further decision is necessary. Sometimes costs have to be paid or an additional reward is obtained as long as the process is not stopped. Of course the aim is to find a stopping time such that the expected stopping reward is maximized.
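Finite-horizon stopping problems of this kind satisfy the backward recursion V_N = g_N and V_n(x) = max(g_n(x), E[V_{n+1}(X_{n+1}) | X_n = x]): stop as soon as the immediate reward beats the continuation value. A minimal sketch for a finite-state toy example (a reflecting random walk with reward x², both hypothetical choices):

```python
def optimal_stopping(horizon, states, g, transition):
    """Backward recursion for a finite-horizon stopping problem:
    V_N = g(N, .) and V_n(x) = max(g(n, x), E[V_{n+1}(X_{n+1}) | X_n = x]).
    Returns the time-0 value function and the stop/continue flag at each (n, x)."""
    V = {x: g(horizon, x) for x in states}
    stop = {}
    for n in range(horizon - 1, -1, -1):
        V_prev = {}
        for x in states:
            cont = sum(p * V[y] for y, p in transition(x).items())  # continuation value
            V_prev[x] = max(g(n, x), cont)
            stop[(n, x)] = g(n, x) >= cont   # stop when reward beats continuation
        V = V_prev
    return V, stop

# Reflecting random walk on {-3, ..., 3} with reward g(n, x) = x^2 (toy example):
V, stop = optimal_stopping(
    horizon=2,
    states=range(-3, 4),
    g=lambda n, x: x * x,
    transition=lambda x: {min(x + 1, 3): 0.5, max(x - 1, -3): 0.5},
)
```

In this toy model it is optimal to continue in the interior (the expected squared position grows) and to stop only at the reflecting boundary, where continuation can only lose value.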

### Chapter 11. Stopping Problems in Finance

Typical stopping problems in finance involve the pricing of American options. It can be shown by using no-arbitrage arguments that the price of an American option is the value of an optimal stopping problem under a risk-neutral probability measure and the optimal stopping time is the optimal exercise time of the option. In order to have a complete financial market without arbitrage, we restrict the first section on pricing American options to the binomial model. An algorithm is presented for pricing American options and the American put option is investigated in detail. In particular, perpetual American put options are also studied. In Section 11.2 so-called credit granting problems are considered. Here the decision maker has to decide whether or not a credit is extended. In this context, a Bayesian Model is also presented.
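The standard binomial pricing algorithm for American options compares, at every node of the tree, the exercise value with the discounted continuation value under the risk-neutral measure. A compact sketch (the parameter values at the bottom are illustrative):

```python
def american_put(S0, K, u, d, r, N):
    """Price an American put in the N-period binomial (CRR) model by
    backward induction: compare exercise value with continuation value."""
    q = (1 + r - d) / (u - d)            # risk-neutral up-probability
    # Payoffs at maturity (node j = number of up-moves out of N):
    V = [max(K - S0 * u**j * d**(N - j), 0.0) for j in range(N + 1)]
    for n in range(N - 1, -1, -1):
        V = [max(K - S0 * u**j * d**(n - j),                  # immediate exercise
                 (q * V[j + 1] + (1 - q) * V[j]) / (1 + r))   # continuation
             for j in range(n + 1)]
    return V[0]

price = american_put(S0=100.0, K=100.0, u=1.1, d=0.9, r=0.0, N=2)
```

The optimal exercise time is the first n at which immediate exercise attains the maximum in the recursion; for the put this yields the exercise boundary studied in this chapter.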

### Appendix A. Tools from Analysis

In order to prove existence of optimal policies, upper semicontinuous functions are important. For the following definition and properties we suppose that M is a metric space. We use the notation $$\bar{\mathbb{R}}=\mathbb{R}\cup\{-\infty, \infty\}.$$
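The definition in question is the standard one and can be stated as follows: a function $$v:M\to\bar{\mathbb{R}}$$ is upper semicontinuous if

```latex
\[
  \limsup_{n\to\infty} v(x_n) \;\le\; v(x)
  \quad\text{for all } x \in M \text{ and all sequences } (x_n) \subset M
  \text{ with } x_n \to x.
\]
```

Upper semicontinuity combined with a compactness condition on the admissible actions guarantees that the supremum in the optimization step is attained, which is how it enters the existence proofs.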

### Appendix B. Tools from Probability

In what follows we suppose that all random variables are defined on a complete probability space $${(\Omega, \mathcal{F}, \mathbb{P})}$$. The following classical results about the interchange of expectation and limit can be found in every textbook on probability theory (see e.g. Billingsley (1995), Bauer (1996), Shiryaev (1996)).

### Appendix C. Tools from Mathematical Finance

In this section we summarize some facts from the fundamental no-arbitrage pricing theory and shed some light on the role of martingales in option pricing. For details see Föllmer and Schied (2004) Chapter 5. In what follows suppose a filtered probability space $${(\Omega, \mathcal{F},(\mathcal{F}_n), \mathbb{P})}$$ is given where $${\mathcal{F}_0}\,:=\{\emptyset,\Omega\}$$. On this space there exist $${d\,+\,1}$$ assets and the price at time $${n\,=\,0,\,1,\ldots,N}$$ of asset $${k}$$ is modelled by a random variable $${S^k_n}$$ (see Section 3.1 for a detailed description of the financial market). Asset $${S^0}$$ is a riskless bond which is used as a numeraire.
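The central fact behind this setup (see Föllmer and Schied (2004)) is the fundamental theorem of asset pricing in discrete time, stated here for convenience:

```latex
\[
  \text{the market is arbitrage-free}
  \iff
  \exists\, \mathbb{Q}\sim\mathbb{P}:\;
  \Bigl(\tfrac{S^k_n}{S^0_n}\Bigr)_{n=0,\dots,N}
  \text{ is an } (\mathcal{F}_n)\text{-martingale under } \mathbb{Q}
  \text{ for all } k=1,\dots,d.
\]
```

Such a measure $$\mathbb{Q}$$ is called an equivalent martingale measure; prices of contingent claims are then computed as discounted conditional $$\mathbb{Q}$$-expectations.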

### Backmatter
