Skip to main content
Top
Published in: Queueing Systems 3-4/2022

27-06-2022

On the Whittle index of Markov modulated restless bandits

Authors: S. Duran, U. Ayesta, I. M. Verloop

Published in: Queueing Systems | Issue 3-4/2022

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

In this paper, we study a Multi-Armed Restless Bandit Problem (MARBP) subject to time fluctuations. This model has numerous applications in practice, like in cloud computing systems or in wireless communications networks. Each bandit is formed by two processes: a controllable process and an environment. The transition rates of the controllable process are determined by the state of the environment, which is an exogenous Markov process. The decision maker has full information on the state of every bandit, and the objective is to determine the optimal policy that minimises the long-run average cost. Given the complexity of the problem, we set out to characterise the Whittle index, which is obtained by solving a relaxed version of the MARBP. As reported in the literature, this heuristic performs extremely well for a wide variety of problems. Assuming that the optimal policy of the relaxed problem is of threshold type, we provide an algorithm that finds Whittle’s index. We then consider a multi-class queue with linear cost and impatient customers. For this model, we show threshold optimality, prove indexability, and obtain Whittle’s index in closed-form. We also study the limiting regimes in which the environment is relatively slower and faster than the controllable process. By numerical simulations, we assess the suboptimality of Whittle’s index policy in a wide variety of scenarios, and the general observation is that, as in the case of standard MARBP, the suboptimality gap of Whittle’s index policy is small.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Literature
1.
go back to reference Aalto, S., Lassila, P., Osti, P.: Whittle index approach to size-aware scheduling with time-varying channels. In: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 57–69, (2015) Aalto, S., Lassila, P., Osti, P.: Whittle index approach to size-aware scheduling with time-varying channels. In: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 57–69, (2015)
2.
go back to reference Altman, E., Avrachenkov, K.E., Núnez-Queija, R.: Perturbation analysis for denumerable markov chains with application to queueing models. Adv. Appl. Probab. 36(3), 839–853 (2004)CrossRef Altman, E., Avrachenkov, K.E., Núnez-Queija, R.: Perturbation analysis for denumerable markov chains with application to queueing models. Adv. Appl. Probab. 36(3), 839–853 (2004)CrossRef
3.
go back to reference Anand, A., de Veciana, G.: A Whittle’s index based approach for QoE optimization in wireless networks. In: Proceedings of ACM SIGMETRICS, Irvine, California, USA, (2018) Anand, A., de Veciana, G.: A Whittle’s index based approach for QoE optimization in wireless networks. In: Proceedings of ACM SIGMETRICS, Irvine, California, USA, (2018)
4.
go back to reference Ansell, P.S., Glazebrook, K.D., Niño-Mora, J., O’Keeffe, M.: Whittle’s index policy for a multi-class queueing system with convex holding costs. Math. Methods Oper. Res. 57, 21–39 (2003)CrossRef Ansell, P.S., Glazebrook, K.D., Niño-Mora, J., O’Keeffe, M.: Whittle’s index policy for a multi-class queueing system with convex holding costs. Math. Methods Oper. Res. 57, 21–39 (2003)CrossRef
5.
go back to reference Arapostathis, A., Das, A., Pang, G., Zheng, Y.: Optimal control of markov-modulated multiclass many-server queues. Stochast. Syst. 9(2), 83–181 (2019) Arapostathis, A., Das, A., Pang, G., Zheng, Y.: Optimal control of markov-modulated multiclass many-server queues. Stochast. Syst. 9(2), 83–181 (2019)
6.
go back to reference Argon, N.T., Ding, L., Glazebrook, K.D., Ziya, S.: Dynamic routing of customers with general delay costs in a multiserver queuing system. Probab. Eng. Inf. Sci. 23(2), 175–203 (2009)CrossRef Argon, N.T., Ding, L., Glazebrook, K.D., Ziya, S.: Dynamic routing of customers with general delay costs in a multiserver queuing system. Probab. Eng. Inf. Sci. 23(2), 175–203 (2009)CrossRef
7.
go back to reference Bhulai, S., Brooms, A.C., Spieksma, F.M.: On structural properties of the value function for an unbounded jump markov process with an application to a processor sharing retrial queue. Queueing Syst. 76(4), 425–446 (2014)CrossRef Bhulai, S., Brooms, A.C., Spieksma, F.M.: On structural properties of the value function for an unbounded jump markov process with an application to a processor sharing retrial queue. Queueing Syst. 76(4), 425–446 (2014)CrossRef
8.
go back to reference Borkar, V.S., Kasbekar, G.S., Pattathil, S., Shetty, P.: Opportunistic scheduling as restless bandits. IEEE Transactions on Control of Network Systems, (2017) Borkar, V.S., Kasbekar, G.S., Pattathil, S., Shetty, P.: Opportunistic scheduling as restless bandits. IEEE Transactions on Control of Network Systems, (2017)
9.
go back to reference Borkar, V.S., Pattathil, S.: Whittle indexability in egalitarian processor sharing systems. Ann. Oper. Res., pp. 1–21, (2017) Borkar, V.S., Pattathil, S.: Whittle indexability in egalitarian processor sharing systems. Ann. Oper. Res., pp. 1–21, (2017)
10.
go back to reference Borkar, V.S., Ravikumar, K., Saboo, K.: An index policy for dynamic pricing in cloud computing under price commitments. Appl. Math. 44, 215–245 (2017) Borkar, V.S., Ravikumar, K., Saboo, K.: An index policy for dynamic pricing in cloud computing under price commitments. Appl. Math. 44, 215–245 (2017)
11.
go back to reference Boucherie, R.J., Van Dijk, N.M.: Queueing networks: a fundamental approach, vol. 154. Springer Science & Business Media, NY (2010) Boucherie, R.J., Van Dijk, N.M.: Queueing networks: a fundamental approach, vol. 154. Springer Science & Business Media, NY (2010)
12.
go back to reference Budhiraja, A., Ghosh, A., Liu, X.: Scheduling control for markov-modulated single-server multiclass queueing systems in heavy traffic. Queueing Syst. 78(1), 57–97 (2014)CrossRef Budhiraja, A., Ghosh, A., Liu, X.: Scheduling control for markov-modulated single-server multiclass queueing systems in heavy traffic. Queueing Syst. 78(1), 57–97 (2014)CrossRef
13.
go back to reference Dai, J.G., He, S.: Many-server queues with customer abandonment: a survey of diffusion and fluid approximations. J. Syst. Sci. Syst. Eng. 21(1), 1–36 (2012)CrossRef Dai, J.G., He, S.: Many-server queues with customer abandonment: a survey of diffusion and fluid approximations. J. Syst. Sci. Syst. Eng. 21(1), 1–36 (2012)CrossRef
14.
go back to reference Duran, S., Verloop, I.M.: Asymptotic optimal control of markov-modulated restless bandits. Proc. ACM Measure. Anal. Comput. Syst. 2(1), 7 (2018) Duran, S., Verloop, I.M.: Asymptotic optimal control of markov-modulated restless bandits. Proc. ACM Measure. Anal. Comput. Syst. 2(1), 7 (2018)
15.
go back to reference Gast, N., Gaujal, B.: A mean field approach for optimization in discrete time. Dis. Event Dynam. Syst. 21(1), 63–101 (2011)CrossRef Gast, N., Gaujal, B.: A mean field approach for optimization in discrete time. Dis. Event Dynam. Syst. 21(1), 63–101 (2011)CrossRef
16.
go back to reference Gittins, J., Glazebrook, K., Weber, R.: Multi-Armed Bandit Allocation Indices. John Wiley & Sons, Chichester (1989) Gittins, J., Glazebrook, K., Weber, R.: Multi-Armed Bandit Allocation Indices. John Wiley & Sons, Chichester (1989)
17.
go back to reference Glazebrook, K.D., Kirkbride, C., Ouenniche, J.: Index policies for the admission control and routing of impatient customers to heterogeneous service stations. Oper. Res. 57, 975–989 (2009)CrossRef Glazebrook, K.D., Kirkbride, C., Ouenniche, J.: Index policies for the admission control and routing of impatient customers to heterogeneous service stations. Oper. Res. 57, 975–989 (2009)CrossRef
18.
go back to reference Glazebrook, K.D., Mitchell, H.M., Ansell, P.S.: Index policies for the maintenance of a collection of machines by a set of repairmen. Eur. J. Oper. Res. 165(1), 267–284 (2005)CrossRef Glazebrook, K.D., Mitchell, H.M., Ansell, P.S.: Index policies for the maintenance of a collection of machines by a set of repairmen. Eur. J. Oper. Res. 165(1), 267–284 (2005)CrossRef
19.
go back to reference Hasenbein, J., Perry, D. (Eds.): Special issue on queueing systems with abandonments. Queueing Syst. 75(2–4), 111–113 2013 Hasenbein, J., Perry, D. (Eds.): Special issue on queueing systems with abandonments. Queueing Syst. 75(2–4), 111–113 2013
20.
go back to reference Ji, B., Gupta, GG. R., Sharma, M., Lin, X., Shroff, N.B.: Achieving optimal throughput and near-optimal asymptotic delay performance in multichannel wireless networks with low complexity: a practical greedy scheduling policy. IEEE/ACM Trans. Network, 23(3):880–893, (2014) Ji, B., Gupta, GG. R., Sharma, M., Lin, X., Shroff, N.B.: Achieving optimal throughput and near-optimal asymptotic delay performance in multichannel wireless networks with low complexity: a practical greedy scheduling policy. IEEE/ACM Trans. Network, 23(3):880–893, (2014)
21.
go back to reference Larrañaga, M., Ayesta, U., Verloop, I.M.: Index policies for multi-class queues with convex holding cost and abandonments. In: Proceedings of ACM SIGMETRICS, Austin TX, USA, (2014) Larrañaga, M., Ayesta, U., Verloop, I.M.: Index policies for multi-class queues with convex holding cost and abandonments. In: Proceedings of ACM SIGMETRICS, Austin TX, USA, (2014)
22.
go back to reference Larrañaga, M., Ayesta, U., Verloop, I.M.: Asymptotically optimal index policies for an abandonment queue with convex holding cost. Queueing Syst. 81(2–3), 99–169 (2015)CrossRef Larrañaga, M., Ayesta, U., Verloop, I.M.: Asymptotically optimal index policies for an abandonment queue with convex holding cost. Queueing Syst. 81(2–3), 99–169 (2015)CrossRef
23.
go back to reference Larrañaga, M., Ayesta, U., Verloop, I.M.: Dynamic control of birth-and-death restless bandits: application to resource-allocation problems. IEEE/ACM Trans. Network. 24(6), 3812–3825 (2016)CrossRef Larrañaga, M., Ayesta, U., Verloop, I.M.: Dynamic control of birth-and-death restless bandits: application to resource-allocation problems. IEEE/ACM Trans. Network. 24(6), 3812–3825 (2016)CrossRef
24.
go back to reference Mahajan, A., Teneketzis, D.: Multi-armed bandit problems. In: Foundations and Application of Sensor Management, eds. A.O. Hero III, D.A. Castanon, D. Cochran and K. Kastella., pp. 121–308, Springer, Verlag, (2007) Mahajan, A., Teneketzis, D.: Multi-armed bandit problems. In: Foundations and Application of Sensor Management, eds. A.O. Hero III, D.A. Castanon, D. Cochran and K. Kastella., pp. 121–308, Springer, Verlag, (2007)
25.
go back to reference Niño-Mora, J.: Restless bandit marginal productivity indices, diminishing returns, and optimal control of make-to-order/make-to-stock M/G/1 queues. Math. Oper. Res. 31(1), 50–84 (2006)CrossRef Niño-Mora, J.: Restless bandit marginal productivity indices, diminishing returns, and optimal control of make-to-order/make-to-stock M/G/1 queues. Math. Oper. Res. 31(1), 50–84 (2006)CrossRef
26.
go back to reference Niño-Mora, J.: Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15, 161–198 (2007)CrossRef Niño-Mora, J.: Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15, 161–198 (2007)CrossRef
27.
go back to reference Niño-Mora, J., Villar, S.S.: Sensor scheduling for hunting elusive hiding targets via whittle’s restless bandit index policy. In: International Conference on NETwork Games, Control and Optimization (NetGCooP 2011), pages 1–8. IEEE, (2011) Niño-Mora, J., Villar, S.S.: Sensor scheduling for hunting elusive hiding targets via whittle’s restless bandit index policy. In: International Conference on NETwork Games, Control and Optimization (NetGCooP 2011), pages 1–8. IEEE, (2011)
28.
go back to reference Opp, M., Glazebrook, K., Kulkarni, V.G.: Outsourcing warranty repairs: dynamic allocation. Naval Res. Logist. (NRL) 52(5), 381–398 (2005)CrossRef Opp, M., Glazebrook, K., Kulkarni, V.G.: Outsourcing warranty repairs: dynamic allocation. Naval Res. Logist. (NRL) 52(5), 381–398 (2005)CrossRef
29.
go back to reference Ouyang, W., Eryilmaz, A., Shroff, N.B.: Asymptotically optimal downlink scheduling over markovian fading channels. In: 2012 Proceedings IEEE INFOCOM, pages 1224–1232. IEEE, (2012) Ouyang, W., Eryilmaz, A., Shroff, N.B.: Asymptotically optimal downlink scheduling over markovian fading channels. In: 2012 Proceedings IEEE INFOCOM, pages 1224–1232. IEEE, (2012)
30.
go back to reference Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)CrossRef Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)CrossRef
31.
go back to reference Stolyar, A.L.: Maxweight scheduling in a generalized switch: state space collapse and workload minimization in heavy traffic. Ann. Appl. Probab. 14(1), 1–53 (2004)CrossRef Stolyar, A.L.: Maxweight scheduling in a generalized switch: state space collapse and workload minimization in heavy traffic. Ann. Appl. Probab. 14(1), 1–53 (2004)CrossRef
32.
go back to reference Tijms, H.C.: Stochastic Modelling and Analysis: A Computational Approach. John Wiley & Sons Inc, NY (1986) Tijms, H.C.: Stochastic Modelling and Analysis: A Computational Approach. John Wiley & Sons Inc, NY (1986)
33.
go back to reference van Dijk, N.M.: Approximate uniformization for continuous-time markov chains with an application to performability analysis. Stochast. Processes Appl. 40(2), 339–357 (1992)CrossRef van Dijk, N.M.: Approximate uniformization for continuous-time markov chains with an application to performability analysis. Stochast. Processes Appl. 40(2), 339–357 (1992)CrossRef
34.
go back to reference Verloop, I.M.: Asymptotically optimal priority policies for indexable and nonindexable restless bandits. Ann. Appl. Probab. 26(4), 1947–1995 (2016)CrossRef Verloop, I.M.: Asymptotically optimal priority policies for indexable and nonindexable restless bandits. Ann. Appl. Probab. 26(4), 1947–1995 (2016)CrossRef
35.
go back to reference Weber, R.R., Weiss, G.: On an index policy for restless bandits. J. Appl. Probab. 27(03), 637–648 (1990)CrossRef Weber, R.R., Weiss, G.: On an index policy for restless bandits. J. Appl. Probab. 27(03), 637–648 (1990)CrossRef
36.
go back to reference Whittle, P.: Restless bandits: activity allocation in a changing world. J. Appl. Probab., 25(A):287–298, (1988) Whittle, P.: Restless bandits: activity allocation in a changing world. J. Appl. Probab., 25(A):287–298, (1988)
37.
go back to reference Whittle, P.: Optimal Control, Basics and Beyond. John Wiley & Sons, NY (1996) Whittle, P.: Optimal Control, Basics and Beyond. John Wiley & Sons, NY (1996)
Metadata
Title
On the Whittle index of Markov modulated restless bandits
Authors
S. Duran
U. Ayesta
I. M. Verloop
Publication date
27-06-2022
Publisher
Springer US
Published in
Queueing Systems / Issue 3-4/2022
Print ISSN: 0257-0130
Electronic ISSN: 1572-9443
DOI
https://doi.org/10.1007/s11134-022-09737-y

Other articles of this Issue 3-4/2022

Queueing Systems 3-4/2022 Go to the issue

Premium Partner