
2020 | OriginalPaper | Chapter

Reinforcement Learning of Supply Chain Control Policy Using Closed Loop Multi-agent Simulation

Authors : Souvik Barat, Prashant Kumar, Monika Gajrani, Harshad Khadilkar, Hardik Meisheri, Vinita Baniwal, Vinay Kulkarni

Published in: Multi-Agent-Based Simulation XX

Publisher: Springer International Publishing


Abstract

Reinforcement Learning (RL) has achieved a degree of success in control applications such as online gameplay and autonomous driving, but has rarely been used to manage operations of business-critical systems such as supply chains. A key aspect of using RL in the real world is to train the agent before deployment by computing the effect of its exploratory actions on the environment. While this effect is easy to compute for online gameplay (where the rules of the game are well known) and autonomous driving (where the dynamics of the vehicle are predictable), it is much more difficult for complex business systems, which exhibit uncertainty, adaptability, and emergent behaviour. In this paper, we describe a framework for the effective integration of a reinforcement learning controller with an actor-based multi-agent simulation of a supply chain network comprising a warehouse, a transportation system, and stores, with the objective of maximising product availability while minimising wastage under constraints.
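The closed-loop training scheme the abstract describes — an RL agent taking exploratory actions against a simulated supply chain rather than the live system — can be illustrated with a deliberately minimal sketch. Everything below (the `StoreSim` class, the demand and perishability models, the reward weights, the action set) is an illustrative assumption for exposition, not the authors' actual framework or simulation model:

```python
import random


class StoreSim:
    """Toy stand-in for an actor-based store agent (hypothetical model).

    Tracks shelf inventory of one perishable product; daily demand is
    sampled randomly and a fraction of leftover stock perishes."""

    def __init__(self, capacity=20, seed=0):
        self.capacity = capacity
        self.inventory = 10
        self.rng = random.Random(seed)

    def step(self, replenish_qty):
        # Delivery arrives from the warehouse, capped by shelf capacity.
        self.inventory = min(self.capacity, self.inventory + replenish_qty)
        demand = self.rng.randint(0, 8)
        sold = min(demand, self.inventory)
        self.inventory -= sold
        unmet = demand - sold              # lost availability
        wasted = self.inventory // 4       # crude perishability model
        self.inventory -= wasted
        # Reward trades off availability (sold, unmet) against wastage.
        reward = sold - 2 * unmet - wasted
        return self.inventory, reward


def train(episodes=200, horizon=30, epsilon=0.1, alpha=0.2, gamma=0.9):
    """Tabular Q-learning running in the loop with the simulation."""
    actions = [0, 4, 8]                    # candidate replenishment sizes
    q = {}                                 # (inventory, action) -> value
    rng = random.Random(42)
    for ep in range(episodes):
        env = StoreSim(seed=ep)            # fresh simulated episode
        state = env.inventory
        for _ in range(horizon):
            if rng.random() < epsilon:     # exploration is safe: it only
                a = rng.choice(actions)    # affects the simulation
            else:
                a = max(actions, key=lambda x: q.get((state, x), 0.0))
            nxt, r = env.step(a)
            best_next = max(q.get((nxt, x), 0.0) for x in actions)
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (r + gamma * best_next - old)
            state = nxt
    return q
```

The key design point mirrored here is that all exploratory (epsilon-greedy) actions are applied to the simulated store, so poor replenishment decisions during learning incur no real-world cost; the paper's framework replaces this toy environment with a full actor-based simulation of warehouse, transport, and stores.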


Metadata
Title
Reinforcement Learning of Supply Chain Control Policy Using Closed Loop Multi-agent Simulation
Authors
Souvik Barat
Prashant Kumar
Monika Gajrani
Harshad Khadilkar
Hardik Meisheri
Vinita Baniwal
Vinay Kulkarni
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-60843-9_3
