
Queueing Systems

27-06-2022

On the Whittle index of Markov modulated restless bandits

Authors: S. Duran, U. Ayesta, I. M. Verloop

Published in: Queueing Systems | Issue 3-4/2022


Abstract

In this paper, we study a Multi-Armed Restless Bandit Problem (MARBP) subject to time fluctuations. This model has numerous practical applications, for example in cloud computing systems and wireless communication networks. Each bandit is formed by two processes: a controllable process and an environment. The transition rates of the controllable process are determined by the state of the environment, which is an exogenous Markov process. The decision maker has full information on the state of every bandit, and the objective is to determine the optimal policy that minimises the long-run average cost. Given the complexity of the problem, we set out to characterise the Whittle index, which is obtained by solving a relaxed version of the MARBP. As reported in the literature, the resulting index heuristic performs extremely well for a wide variety of problems. Assuming that the optimal policy of the relaxed problem is of threshold type, we provide an algorithm that finds Whittle’s index. We then consider a multi-class queue with linear cost and impatient customers. For this model, we show threshold optimality, prove indexability, and obtain Whittle’s index in closed form. We also study the limiting regimes in which the environment is slow or fast relative to the controllable process. Through numerical simulations, we assess the suboptimality of Whittle’s index policy in a wide variety of scenarios; the general observation is that, as for the standard MARBP, the suboptimality gap of Whittle’s index policy is small.
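The relaxation mentioned in the abstract can be sketched in the standard form used for restless bandits. The notation below is an assumption made for illustration and is not taken from the paper: K bandits, of which at most M may be active at a time; X_k(t) = (N_k(t), D_k(t)) is the state of bandit k (controllable state and environment state); a_k(t) in {0,1} is the action, with 1 meaning active; C_k is the cost rate; and W is the Lagrange multiplier, interpreted as a subsidy for passivity.

```latex
% Assumed notation, not from the source: K, M, X_k, a_k, C_k, W as described above.

% Original problem: the capacity constraint must hold at every time t.
\min_{\pi}\ \limsup_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}\!\left[\int_0^T \sum_{k=1}^{K} C_k\big(X_k(t), a_k(t)\big)\,dt\right]
\quad \text{s.t.} \quad \sum_{k=1}^{K} a_k(t) \le M \ \ \text{for all } t \ge 0.

% Relaxed problem: same objective, but the constraint only has to hold on average.
\limsup_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}\!\left[\int_0^T \sum_{k=1}^{K} a_k(t)\,dt\right] \le M.

% Lagrangian of the relaxed problem: up to the constant W(K-M), it decouples
% into K single-bandit problems, one for each k.
\min_{\pi}\ \limsup_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}\!\left[\int_0^T \Big( C_k\big(X_k(t), a_k(t)\big)
  - W\big(1 - a_k(t)\big) \Big)\,dt\right],
\qquad k = 1,\dots,K.
```

Under this common formulation, a bandit is called indexable if the set of states in which passivity is optimal grows monotonically as W increases, and the Whittle index of a state is then the smallest W for which the passive action is optimal in that state. The index policy activates, at each time, the bandits with the largest current index values, subject to the capacity M.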



Title: On the Whittle index of Markov modulated restless bandits
Authors: S. Duran, U. Ayesta, I. M. Verloop
Publication date: 27-06-2022
Publisher: Springer US
Published in: Queueing Systems / Issue 3-4/2022
Print ISSN: 0257-0130
Electronic ISSN: 1572-9443
DOI: https://doi.org/10.1007/s11134-022-09737-y
