
2024 | Book

Optimization and Games for Controllable Markov Chains

Numerical Methods with Application to Finance and Engineering


About this book

This book considers a class of ergodic, finite, controllable Markov chains. The main idea behind the method described in this book is to embed the original discrete optimization problems (or game models) into the space of randomized formulations, where the variables stand for distributions (mixed strategies or preferences) over the original discrete (pure) strategies in use. The following assumptions are made: a finite state space, a finite action space, continuity of the probabilities and rewards associated with the actions, and an accessibility (ergodicity) requirement. These hypotheses guarantee the existence of an optimal policy. An optimal policy is always stationary: it is either simple (i.e., nonrandomized stationary) or a mixture of two nonrandomized policies, which is equivalent to randomly selecting one of the two simple policies at each epoch by tossing a biased coin. As a bonus, the optimization procedure only has to solve the time-average dynamic programming equation repeatedly, making it theoretically feasible to choose the optimal course of action under the global constraint. In the ergodic case, the state distributions generated by the corresponding transition equations converge exponentially fast to their stationary (final) values. This makes it possible to employ all widely used optimization methods (such as gradient-like procedures, the extraproximal method, Lagrange multipliers, and Tikhonov regularization), together with the related numerical techniques. The book tackles a range of problems and theoretical Markov models: controllable and ergodic Markov chains, multi-objective Pareto front solutions, partially observable Markov chains, continuous-time Markov chains, Nash and Stackelberg equilibria, Lyapunov-like functions in Markov chains, best-reply strategies, Bayesian incentive-compatible mechanisms, Bayesian partially observable Markov games, bargaining solutions in the Nash and Kalai-Smorodinsky formulations, the multi-traffic signal-control synchronization problem, Rubinstein's non-cooperative bargaining solutions, and the transfer pricing problem as bargaining.
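For orientation, the time-average dynamic programming equation referred to above has, in the standard average-reward setting with states \(s\), actions \(a\), transition probabilities \(P(s'\mid s,a)\), and rewards \(r(s,a)\), the classical form (our notational sketch, not a formula quoted from the book):

\[
g + h(s) = \max_{a}\Big[\, r(s,a) + \sum_{s'} P(s' \mid s, a)\, h(s') \Big], \qquad s \in S,
\]

where \(g\) is the optimal average reward (gain) and \(h\) is the bias (relative value) function; under ergodicity, \(g\) does not depend on the initial state.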

Table of Contents

Frontmatter
Chapter 1. Controllable Markov Chains
Abstract
In this chapter, we describe a class of discrete-time, controllable, and ergodic Markov chains, in which both time and the state space are discrete. We start by outlining the fundamental model. Its single-step transition probabilities are then used to determine whether a Markov chain is ergodic. To convert the nonlinear Markov optimization problem into a linear one and make it tractable, we propose an auxiliary c-variable. With this setup, a discrete-time Markov chain simulation is presented, and a set of illustrative discrete-time Markov chain laws is constructed at the end.
Julio B. Clempner, Alexander Poznyak
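As an illustration of the c-variable idea described in this chapter, here is a minimal sketch (not the book's code) that solves an average-reward MDP as a linear program over stationary state-action frequencies c(s, a) using scipy; the transition and reward arrays are made-up toy data.

```python
import numpy as np
from scipy.optimize import linprog

# Toy ergodic MDP (made-up data): P[a, s, s2] is the probability of
# moving from state s to s2 under action a; r[a, s] is the reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
nA, nS, _ = P.shape

# Variables c[s, a] >= 0, flattened as index s * nA + a.
# Maximize sum_{s,a} c[s,a] * r(s,a)  <=>  minimize the negative.
cost = -np.array([r[a, s] for s in range(nS) for a in range(nA)])

# Balance: for each s2, sum_a c[s2, a] = sum_{s,a} P(s2|s,a) c[s,a],
# plus the normalization sum_{s,a} c[s,a] = 1.
A_eq = np.zeros((nS + 1, nS * nA))
for s2 in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[s2, s * nA + a] = float(s == s2) - P[a, s, s2]
A_eq[nS, :] = 1.0
b_eq = np.zeros(nS + 1)
b_eq[nS] = 1.0

res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (nS * nA))
c = res.x.reshape(nS, nA)
policy = c / c.sum(axis=1, keepdims=True)  # d(a|s) = c(s,a) / sum_a c(s,a)
print("average reward:", -res.fun, "\npolicy:\n", policy)
```

The key point of the reformulation is visible in the last lines: the linear program is solved in the c-variables, and a stationary (possibly randomized) policy is then recovered by normalizing each state's row.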
Chapter 2. Multiobjective Control
Abstract
A multi-objective Pareto front solution is presented in this chapter for a class of finite, discrete-time, ergodic, controllable Markov chains. We offer a technique that, given specific bounds, selects the best multi-objective option on the Pareto frontier, serving as a decision support system. The regularized penalty method uses a projection-gradient strategy to identify the Pareto policies along the Pareto frontier and is based on Tikhonov's regularization method. The goal is to tune the parameters as efficiently as possible while preserving the original form of the functional. After setting its initial value, we gradually decrease the regularization parameter until each policy closely approximates a Pareto policy. In this sense, we specify precisely how the parameter tends to zero and establish the convergence of the gradient regularized penalty algorithm. Our policy-gradient multi-objective algorithms, which likewise follow a gradient-based strategy, produce in the corresponding objective space a Pareto frontier consisting only of Pareto policies. We empirically validate the technique with a numerical example: a realistic vehicle routing planning problem aimed at improving security when transporting cash and valuables. In addition, we describe a portfolio optimization problem and plot its Pareto frontier. The decision-making techniques investigated in this chapter are consistent with the most widely used computational intelligence models in Artificial Intelligence research.
Julio B. Clempner, Alexander Poznyak
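A minimal sketch of the general pattern this chapter describes, under our own simplifying assumptions (made-up quadratic objectives on a probability simplex rather than Markov-chain functionals): projected gradient descent on a Tikhonov-regularized scalarization, with the regularization parameter driven toward zero, swept over weights to trace a Pareto front.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

# Two made-up quadratic objectives over a mixed strategy x on the simplex.
targets = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])]

def pareto_point(w, delta=1e-3, step=0.1, iters=500):
    # Projected gradient on the Tikhonov-regularized weighted sum
    # w*||x-t0||^2 + (1-w)*||x-t1||^2 + (delta/2)*||x||^2.
    x = np.ones(3) / 3.0
    for _ in range(iters):
        g = 2*w*(x - targets[0]) + 2*(1 - w)*(x - targets[1]) + delta*x
        x = project_simplex(x - step * g)
        delta *= 0.999  # drive the regularization parameter toward zero
    return x

# Sweeping the weight traces an approximate Pareto frontier.
front = [pareto_point(w) for w in np.linspace(0.05, 0.95, 10)]
print(np.round(front, 3))
```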
Chapter 3. Partially Observable Markov Chains
Abstract
The controlled Partially Observable Markov Decision Process (POMDP) framework has proven effective in a variety of fields where only partial knowledge of the problem's structure and parameters can be disclosed. Since some state variables are difficult to track and measure accurately, it may be preferable to base decisions on less accurate information. This chapter focuses on the design of an observer for a class of partially observable, ergodic, homogeneous, finite Markov chains. The main objective of the suggested approach is to derive the formulas for computing an observer and, as a consequence, the best control strategy. To solve the problem, we create a new variable that combines the policy, the observation kernel, and the distribution vector, and we derive the formulas needed to retrieve the variables of interest. The POMDP model's parameters are learned in a dynamic context. The development of the adaptive policies is based on an identification method in which we count the number of unobserved events to estimate the components of the utility and transition matrices. The practical relevance of the theoretical results is demonstrated through a numerical example applied to a portfolio optimization problem.
Julio B. Clempner, Alexander Poznyak
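The identification step described above, estimating matrix components from event counts, can be sketched as a generic frequency estimator; the class name, the prior, and the array layout below are our illustrative choices, not the chapter's formulas.

```python
import numpy as np

class TransitionEstimator:
    """Running frequency estimate of P(s'|s,a) from observed transitions
    (a generic count-based identifier; names and prior are our choices)."""

    def __init__(self, n_states, n_actions, prior=1.0):
        # A small Dirichlet-style prior keeps every estimate well-defined
        # before each (s, a) pair has actually been visited.
        self.counts = np.full((n_actions, n_states, n_states), prior)

    def update(self, s, a, s_next):
        self.counts[a, s, s_next] += 1.0

    def estimate(self):
        # Normalize counts row-wise into transition probabilities.
        return self.counts / self.counts.sum(axis=2, keepdims=True)
```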
Chapter 4. Continuous-Time Markov Chains
Abstract
The c-variable approach is extended in this chapter by adding a new linear constraint for continuous time. The benefit of this method is that it transforms the continuous-time Markov decision process into a discrete-time Markov decision process, where the linear constraints make the problem computationally tractable. Chemical reaction networks, where the concentration dynamics are described by a continuous-time Markov chain, serve as an example of the method's use. Using a discrete-state, continuous-time Markov decision process, we provide a mathematical optimization method for analyzing chemical processes. The first application is a single reversible reaction that produces the amidogen radical, for which we determine the optimal temperature minimizing a linear functional of interest. The second is a chemical reaction network involving the proton transfer, hydration, and tautomeric reactions of anthocyanin pigments; in this case, we find an optimal strategy over a set of pH values that minimizes the corresponding linear functional.
Julio B. Clempner, Alexander Poznyak
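One standard way to pass from a continuous-time chain to a discrete-time one is uniformization; the sketch below illustrates that classical construction (the chapter's own linear-constraint formulation may differ).

```python
import numpy as np

def uniformize(Q):
    # P = I + Q / lam with lam at least the largest exit rate; P is a
    # proper stochastic matrix sharing Q's stationary distribution.
    lam = np.max(-np.diag(Q))
    return np.eye(Q.shape[0]) + Q / lam, lam

# Toy 2-state generator (rows sum to zero; made-up rates).
Q = np.array([[-3.0, 3.0],
              [1.0, -1.0]])
P, lam = uniformize(Q)
print(P)   # [[0. 1.], [1/3 2/3]]
```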
Chapter 5. Nash and Stackelberg Equilibrium
Abstract
We provide an approach to locating the Nash equilibrium in this chapter. The technique relies on identifying a scalar \(\lambda ^{*}\) and the associated strategies \(d^{*}(\lambda ^{*})\) fixing particular bounds (min and max) that belong to the Pareto front. Bounds refer to limits placed by the players over the Pareto front that form a specific decision region from which the strategies can be selected. We first use a nonlinear programming problem to characterize the Pareto front of the game, introducing a set of linear constraints for the Markov chain game based on the c-variable technique. We suggest using the Euler method and a penalty function with regularization to solve the strong Nash equilibrium problem. Convergence to a single (strong) equilibrium point is ensured by Tikhonov's regularization method. The resulting single-objective constrained problems obtained from the regularized functional of the game are then solved with a nonlinear programming technique, and we use the gradient approach to resolve the first-order optimality conditions. The approach solves an optimization problem by adding the linear constraints necessary to identify the best strong strategy \(d^{*}(\lambda ^{*})\), starting from a utopia point (Pareto-optimal point) given an initial \(\lambda\) for the individual objectives. We demonstrate that the game's functional in the regularized problem decreases and ultimately converges, establishing the existence and uniqueness of the strong Nash equilibrium (Pareto-optimal Nash equilibrium). We also provide a method for calculating the strong Stackelberg/Nash equilibrium of Markov chain games. The minimization of the \(L_{p}\)-norm, which shortens the distance to the utopia point in Euclidean space, is considered when solving the cooperative n-leaders and m-followers Markov game. We then formulate a Pareto-optimal solution to the optimization problem. To find the strong \(L_{p}\)-Stackelberg/Nash equilibrium, we use a bi-level programming technique carried out through extraproximal optimization.
Julio B. Clempner, Alexander Poznyak
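Schematically, Tikhonov's regularization replaces the functional \(f\) of the game with a strongly convex surrogate (our notational sketch, not a formula quoted from the book):

\[
f_{\delta}(d) = f(d) + \frac{\delta}{2}\,\Vert d \Vert^{2}, \qquad \delta > 0,
\]

whose minimizer \(d^{*}(\delta)\) is unique for each \(\delta\) and, as \(\delta \to 0\), converges to the minimum-norm element of the original solution set; this is the mechanism that singles out one equilibrium among many.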
Chapter 6. Best-Reply Strategies in Repeated Games
Abstract
The actions that players naturally and frequently choose throughout a repeated game constitute the "best-reply strategy". Determining whether such strategies lead to an equilibrium point is a difficult and delicate task since the behavior of a single cost function under best-reply play often turns out to be non-monotonic, and convergence to a stable equilibrium is not always guaranteed, even in repeated games. In this chapter, we demonstrate that best-reply actions surely lead to an equilibrium point for a class of dynamic games on finite controllable Markov chains. This result is achieved through the concept of Lyapunov games, which is based on the design of a particular Lyapunov function (associated with a particular cost function) that decreases monotonically (is non-increasing) throughout the game. We provide a technique for constructing a Lyapunov-like function that describes how players behave in a repeated Markov chain game. In the recursive process, the Lyapunov-like function replaces the components of the ergodic system that model players' expected behavior in one-shot games. To demonstrate our claim, we first provide a non-converging state-value function that varies (increases and decreases) between states of the repeated Markov game. Then, we show that using a one-step-ahead fixed-local-optimal approach, this function can be expressed in a recursive format. Thus, we show that the recursive expression for the repeated game can be used to construct a Lyapunov-like function; the resulting function is monotonic and can only decrease (or remain the same) over time, regardless of the initial probability distribution. The best-reply methods examined in this chapter correspond to pure and stationary fixed-local-optimal actions, that is, to the one-step-ahead optimization algorithms widely employed in Artificial Intelligence. A bank marketing campaigns game and a duel game with best-reply strategies serve as applications of the proposed approach.
Julio B. Clempner, Alexander Poznyak
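A minimal sketch of the monotonicity phenomenon described above, under a strong simplifying assumption (a toy identical-interest matrix game rather than a Markov chain game, so the shared cost acts as the Lyapunov-like function):

```python
import numpy as np

# Toy identical-interest game: both players share the cost matrix C, so
# C itself is an exact potential playing the Lyapunov-like role above:
# every best reply weakly decreases it, forcing convergence.
C = np.array([[0.0, 2.0],
              [2.0, 1.0]])

i, j = 1, 0                       # arbitrary initial pure actions
trace = [C[i, j]]
for _ in range(10):
    i = int(np.argmin(C[:, j]))   # player 1 best-replies to j
    trace.append(C[i, j])
    j = int(np.argmin(C[i, :]))   # player 2 best-replies to i
    trace.append(C[i, j])

# The potential is non-increasing along best-reply play.
assert all(a >= b for a, b in zip(trace, trace[1:]))
print(trace[:5])
```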
Chapter 7. Mechanism Design
Abstract
This chapter presents an analytical method for computing Bayesian incentive-compatible mechanisms in which private information is revealed, for a class of controllable Markov games. We consider a dynamic setting where decisions are made over a finite number of time periods. Our approach introduces a new variable that combines the distribution vector, the strategies, and the mechanism design, and we develop the relationships needed to calculate the relevant variables analytically. With the addition of this variable, the problem becomes computationally tractable. The technique uses a Reinforcement Learning (RL) methodology to compute a mechanism that is nearly optimal and consistent with the game's equilibrium strategy. We employ the Bayesian-Nash equilibrium concept as the game's default equilibrium notion. Multiple equilibria exist, which poses an intriguing challenge because no single mechanism is optimal for the goal of profit maximization. To address this issue, we apply Tikhonov's approach and introduce a regularization parameter. We establish the game's equilibrium and convergence to a single incentive-compatible mechanism. This yields unique and significantly improved results in numerous game-theoretic problem areas, as well as incentive-compatible mechanisms that are consistent with the equilibrium of the game. To illustrate the recommended method, we provide a numerical example in the context of a dynamic public finance model with incomplete information.
Julio B. Clempner, Alexander Poznyak
Chapter 8. Joint Observer and Mechanism Design
Abstract
An intelligent agent is an autonomous entity that manages and learns the actions to be taken toward achieving its goals. As the Artificial Intelligence (AI) literature reports, it is difficult to build a method that computes effective decisions maximizing the overall reward of agents interacting with an environment under unknown, partial, and uncertain information. This chapter offers a solution to these problems: a foundation for Bayesian Partially Observable Markov Games (BPOMGs) supported by an AI strategy. The structure of the nucleus is governed by three essential concepts: game theory, learning, and inference. First, we provide a new general Bayesian strategy designed for games that accounts both for the partial information provided by the Bayesian model and for the incomplete information on the states of the Markov system; this approach uses a Partially Observable Markov Game (POMG) to deal with execution uncertainty. Second, we extend the design theory to include joint observer design and mechanism design (both unknown). Because agents act in their own self-interest, the mechanism is designed to induce them to reveal their private information and to achieve a desired outcome; the joint observer design captures the possibility that agents may not be motivated to deliver accurate information about their states. The transition matrices, which are also unknown, are estimated by the agents at each time step using a Reinforcement Learning (RL) technique. The result is an expanded POMG model that adds a new variable and provides an analytical method for computing both the observer design and the mechanism design. The suggested expansion makes a computationally challenging game-theoretic problem tractable. We derive the relations needed to retrieve analytically the variables of interest for each agent, such as the observation kernels, joint observers, mechanisms, strategies, and distribution vectors. The utility and efficacy of the suggested nucleus are confirmed by simulating a game-theoretic treatment of the patrolling problem that involves designing the mechanism, computing the observers, and applying the RL technique.
Julio B. Clempner, Alexander Poznyak
Chapter 9. Bargaining Games or How to Negotiate
Abstract
The term "bargaining game" describes a scenario in which participants may reach a mutually beneficial agreement, but there is a conflict of interest over which agreement to reach, and no agreement may be imposed without each player's consent. Bargaining and its game-theoretic solutions have been applied to negotiation protocols, duopoly market games, business transactions, arbitration, and other key scenarios, and equilibrium computation underpins all of these applications. This chapter explores the theory behind bargaining games and offers a way of solving the game-theoretic bargaining models put forward by Nash and Kalai-Smorodinsky, which provide an elegant axiomatic solution to the problem based on several fairness standards. We consider a class of continuous-time, controllable, and ergodic Markov games. The Nash bargaining solution is first introduced and axiomatized. The Kalai-Smorodinsky approach, which adds the monotonicity postulate to Nash's model, is then presented. We propose a bargaining solver, carried out iteratively through a set of nonlinear equations expressed via the Lagrange principle, with the Tikhonov regularization approach guaranteeing convergence to a particular equilibrium point. Each equation in this solver is an optimization problem whose necessary minimum condition is solved using the projection gradient method. This chapter's key finding illustrates how equilibrium calculations work in bargaining games. We specifically discuss the convergence analysis and the rate of convergence of the suggested technique. A numerical example contrasting the Nash and Kalai-Smorodinsky bargaining solutions illustrates the value of the method.
Julio B. Clempner, Alexander Poznyak
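To make the contrast between the two solution concepts concrete, here is a minimal sketch on a made-up one-dimensional feasible frontier (a toy stand-in for the chapter's Markov-game setting):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy feasible frontier (made-up): utilities (u1, u2) = (x, 1 - x**2)
# for x in [0, 1], with disagreement point d = (0, 0).
d = np.array([0.0, 0.0])
u = lambda x: np.array([x, 1.0 - x**2])

# Nash solution: maximize the product of gains over the frontier.
nash = minimize_scalar(lambda x: -(u(x)[0] - d[0]) * (u(x)[1] - d[1]),
                       bounds=(0.0, 1.0), method="bounded")

# Kalai-Smorodinsky: equal gains relative to the utopia point (1, 1),
# i.e. the frontier point where u1 / 1 = u2 / 1.
ks = minimize_scalar(lambda x: (u(x)[0] - u(x)[1]) ** 2,
                     bounds=(0.0, 1.0), method="bounded")

print("Nash:", np.round(u(nash.x), 3))   # ~ (0.577, 0.667)
print("KS:  ", np.round(u(ks.x), 3))     # ~ (0.618, 0.618)
```

The two concepts pick different frontier points even on this simple set: Nash maximizes the product of gains, while Kalai-Smorodinsky equalizes gains relative to the utopia point.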
Chapter 10. Multi-traffic Signal-Control Synchronization
Abstract
In this chapter, we provide a new paradigm that combines game theory and the extraproximal approach to represent the multi-traffic signal-control synchronization problem. The intersection's goal is to reduce queuing time, and the challenge for the signal controllers is to find the best signal timing strategy, i.e., to assign a green period to each signal phase. The players of the game are the signal controllers, and finding a green period at each junction aims to reduce signal and queuing delays. The problem involves two inherent constraints: (a) the number of arriving and departing vehicles varies for each street of the intersection; and (b) at every intersection, the period of time allowed for vehicles to pass the green light equals the duration of the red light. The first constraint gives rise to a leader-follower Stackelberg game model: streets with higher traffic demand more green time. The second constraint leads to a simultaneous game solution that better reflects the actual situation. The chapter then shows that, by exploiting the game's structure, the Nash equilibrium provides the solution. To solve the problem computationally and determine the best signal timing distribution, we present the c-variable technique. The extraproximal approach is a two-step iterative process: in the first step, an approximate equilibrium point is calculated; in the second step, this approximation is refined. The game is described in terms of coupled nonlinear programming problems that use the Lagrange principle. To guarantee the convergence of the cost functions to a Nash equilibrium point, the Tikhonov regularization approach is used. The extraproximal approach is also developed using Markov chains. A three-way intersection example demonstrates the method's use. Our contributions have significant implications for real-world applications.
Julio B. Clempner, Alexander Poznyak
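The two-step predict-then-correct structure mentioned above follows the same pattern as the classical extragradient iteration; below is a minimal sketch on a toy saddle-point problem (our illustration, not the chapter's traffic model):

```python
import numpy as np

# Toy monotone saddle-point problem min_x max_y x*y on [-1, 1]^2; its
# operator F(x, y) = (y, -x) makes plain gradient steps cycle, while the
# two-step predict-then-correct scheme converges to the equilibrium (0, 0).
F = lambda z: np.array([z[1], -z[0]])
proj = lambda z: np.clip(z, -1.0, 1.0)

z = np.array([1.0, 1.0])
gamma = 0.5                                # step size below 1/Lipschitz(F)
for _ in range(200):
    z_pred = proj(z - gamma * F(z))        # step 1: predict
    z = proj(z - gamma * F(z_pred))        # step 2: correct via prediction
print(np.round(z, 6))                      # ~ (0, 0)
```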
Chapter 11. Non-cooperative Bargaining with Unsophisticated Agents
Abstract
In a conventional non-cooperative bargaining scenario, two or more farsighted participants alternately make offers and counteroffers until an agreement is reached, with a penalty accounting for the time it takes the players to reach a decision. We provide a game that lets myopic participants achieve equilibrium as if they were farsighted agents. One of the game's main mechanics is that players are penalized both for deviating from their previous best-reply plan and for the time they take to make decisions at each stage of play. Our chapter adds to the existing research on bargaining with myopic agents while also broadening the class of processes and functions that can be used to define and apply Rubinstein's non-cooperative bargaining solutions.
Julio B. Clempner, Alexander Poznyak
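For reference, in Rubinstein's classical alternating-offers model with discount factors \(\delta_{1}, \delta_{2} \in (0,1)\), the unique subgame-perfect equilibrium gives the first proposer the share (a textbook formula, stated here for context rather than quoted from the chapter):

\[
x^{*} = \frac{1-\delta_{2}}{1-\delta_{1}\delta_{2}},
\]

with the opponent receiving \(1-x^{*}\); for equal discount factors \(\delta\) this is \(1/(1+\delta)\), which tends to an even split as the players become perfectly patient (\(\delta \to 1\)).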
Chapter 12. Transfer Pricing as Bargaining
Abstract
The Nash bargaining game-theoretic technique is used in this chapter to examine and solve the transfer pricing problem. We analyze a company with sequential transfers among its several divisions, where central management sets the transfer price to maximize operational profitability; throughout the negotiation process, the price shifting between divisions is a bargaining tool. First, we consider a disagreement point (status quo) between the firm's divisions, which serves as a threat point. We offer a methodology and framework, based on the Nash equilibrium approach, for calculating this disagreement point. Then, we introduce the bargaining solution, a single-valued function that selects an outcome from the feasible pay-offs of each bargaining problem. This solution results from cooperation between the company divisions involved in the transfer pricing problem. The agreement reached by the divisions in the game is the most desirable option within the range of feasible outcomes, yielding a distribution of the transfer price between divisions that maximizes profit. We provide an optimization approach for computing the bargaining solution. The method's usefulness is demonstrated through a number of examples, including Markov models in both continuous and discrete time.
Julio B. Clempner, Alexander Poznyak
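A minimal worked instance of the bargaining view of transfer pricing, with made-up numbers: suppose two divisions split a joint profit of \(\Pi = 10\) and the disagreement point (the divisional profits if no internal transfer takes place) is \(d = (2, 1)\). The symmetric Nash solution maximizes \((u_{1}-2)(u_{2}-1)\) subject to \(u_{1}+u_{2}=10\), giving each division its disagreement payoff plus half the surplus:

\[
u_{1}^{*} = 2 + \frac{10 - 2 - 1}{2} = 5.5, \qquad u_{2}^{*} = 1 + \frac{7}{2} = 4.5,
\]

and the transfer price is then set so that the divisional accounting profits match this split.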
Backmatter
Metadata
Title
Optimization and Games for Controllable Markov Chains
Authors
Julio B. Clempner
Alexander Poznyak
Copyright Year
2024
Electronic ISBN
978-3-031-43575-1
Print ISBN
978-3-031-43574-4
DOI
https://doi.org/10.1007/978-3-031-43575-1
