
2020 | OriginalPaper | Chapter

Reinforcement Learning of Supply Chain Control Policy Using Closed Loop Multi-agent Simulation

Authors : Souvik Barat, Prashant Kumar, Monika Gajrani, Harshad Khadilkar, Hardik Meisheri, Vinita Baniwal, Vinay Kulkarni

Published in: Multi-Agent-Based Simulation XX

Publisher: Springer International Publishing


Abstract

Reinforcement Learning (RL) has achieved a degree of success in control applications such as online gameplay and autonomous driving, but has rarely been used to manage operations of business-critical systems such as supply chains. A key aspect of using RL in the real world is to train the agent before deployment by computing the effect of its exploratory actions on the environment. While this effect is easy to compute for online gameplay (where the rules of the game are well known) and autonomous driving (where the dynamics of the vehicle are predictable), it is much more difficult for complex business systems, which exhibit uncertainty, adaptability, and emergent behaviour. In this paper, we describe a framework for the effective integration of a reinforcement learning controller with an actor-based multi-agent simulation of a supply chain network comprising a warehouse, a transportation system, and stores, with the objective of maximising product availability while minimising wastage under constraints.
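The closed-loop training scheme the abstract describes — an RL agent taking exploratory actions against a simulated supply chain rather than the live system — can be illustrated with a deliberately minimal sketch. Everything below (the `StoreSim` class, the demand and perishability models, the reward weights, the action set) is an illustrative assumption for exposition, not the authors' actual framework or simulation model:

```python
import random


class StoreSim:
    """Toy stand-in for an actor-based store agent (hypothetical model).

    Tracks shelf inventory of one perishable product; daily demand is
    sampled randomly and a fraction of leftover stock perishes."""

    def __init__(self, capacity=20, seed=0):
        self.capacity = capacity
        self.inventory = 10
        self.rng = random.Random(seed)

    def step(self, replenish_qty):
        # Delivery arrives from the warehouse, capped by shelf capacity.
        self.inventory = min(self.capacity, self.inventory + replenish_qty)
        demand = self.rng.randint(0, 8)
        sold = min(demand, self.inventory)
        self.inventory -= sold
        unmet = demand - sold              # lost availability
        wasted = self.inventory // 4       # crude perishability model
        self.inventory -= wasted
        # Reward trades off availability (sold, unmet) against wastage.
        reward = sold - 2 * unmet - wasted
        return self.inventory, reward


def train(episodes=200, horizon=30, epsilon=0.1, alpha=0.2, gamma=0.9):
    """Tabular Q-learning running in the loop with the simulation."""
    actions = [0, 4, 8]                    # candidate replenishment sizes
    q = {}                                 # (inventory, action) -> value
    rng = random.Random(42)
    for ep in range(episodes):
        env = StoreSim(seed=ep)            # fresh simulated episode
        state = env.inventory
        for _ in range(horizon):
            if rng.random() < epsilon:     # exploration is safe: it only
                a = rng.choice(actions)    # affects the simulation
            else:
                a = max(actions, key=lambda x: q.get((state, x), 0.0))
            nxt, r = env.step(a)
            best_next = max(q.get((nxt, x), 0.0) for x in actions)
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (r + gamma * best_next - old)
            state = nxt
    return q
```

The key design point mirrored here is that all exploratory (epsilon-greedy) actions are applied to the simulated store, so poor replenishment decisions during learning incur no real-world cost; the paper's framework replaces this toy environment with a full actor-based simulation of warehouse, transport, and stores.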


Metadata
Title
Reinforcement Learning of Supply Chain Control Policy Using Closed Loop Multi-agent Simulation
Authors
Souvik Barat
Prashant Kumar
Monika Gajrani
Harshad Khadilkar
Hardik Meisheri
Vinita Baniwal
Vinay Kulkarni
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-60843-9_3
