2020 | Original Paper | Book Chapter

Reinforcement Learning of Supply Chain Control Policy Using Closed Loop Multi-agent Simulation

Authors: Souvik Barat, Prashant Kumar, Monika Gajrani, Harshad Khadilkar, Hardik Meisheri, Vinita Baniwal, Vinay Kulkarni

Published in: Multi-Agent-Based Simulation XX

Publisher: Springer International Publishing

Abstract

Reinforcement Learning (RL) has achieved a degree of success in control applications such as online gameplay and autonomous driving, but has rarely been used to manage operations of business-critical systems such as supply chains. A key aspect of using RL in the real world is to train the agent before deployment by computing the effect of its exploratory actions on the environment. While this effect is easy to compute for online gameplay (where the rules of the game are well known) and autonomous driving (where vehicle dynamics are predictable), it is much more difficult for complex systems characterised by uncertainty, adaptability, and emergent behaviour. In this paper, we describe a framework for effectively integrating a reinforcement learning controller with an actor-based multi-agent simulation of the supply chain network, including the warehouse, transportation system, and stores, with the objective of maximising product availability while minimising wastage under constraints.
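
To make the closed-loop structure concrete, the following is a minimal Python sketch of the kind of interaction the abstract describes: an RL controller proposes replenishment actions, an actor-based simulation of stores and transport computes their effect, and the environment returns a reward trading product availability against wastage. All names (StoreAgent, SupplyChainEnv), the demand model, and the reward weights are illustrative assumptions for exposition, not the authors' implementation.

```python
import random

# Illustrative sketch only: class names, the demand model, and the
# reward weights are assumptions, not the paper's implementation.

class StoreAgent:
    """Simulation actor holding shelf inventory for one store."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.stock = capacity // 2

    def receive(self, quantity):
        # Deliveries beyond shelf capacity are discarded.
        self.stock = min(self.capacity, self.stock + quantity)

    def serve_demand(self):
        demand = random.randint(0, 20)   # stochastic customer demand
        sold = min(self.stock, demand)
        self.stock -= sold
        lost = demand - sold             # unmet demand hurts availability
        return sold, lost


class SupplyChainEnv:
    """Closed loop: RL actions are replenishment quantities; the
    multi-agent simulation computes their effect on the network."""

    def __init__(self, n_stores=3, truck_capacity=40):
        self.stores = [StoreAgent() for _ in range(n_stores)]
        self.truck_capacity = truck_capacity

    def state(self):
        return tuple(store.stock for store in self.stores)

    def step(self, replenishment):
        shipped = 0
        for store, qty in zip(self.stores, replenishment):
            # Transport constraint: one truck's capacity per time step.
            qty = min(qty, self.truck_capacity - shipped)
            store.receive(qty)
            shipped += qty
        sold = lost = 0
        for store in self.stores:
            s, l = store.serve_demand()
            sold += s
            lost += l
        # Proxy for wastage: stock held above a spoilage threshold.
        wastage = sum(max(0, store.stock - 80) for store in self.stores)
        reward = sold - lost - 0.5 * wastage
        return self.state(), reward


# Training-loop skeleton: the random action stands in for the policy.
env = SupplyChainEnv()
state = env.state()
for _ in range(1000):
    action = [random.randint(0, 20) for _ in env.stores]
    next_state, reward = env.step(action)
    # agent.learn(state, action, reward, next_state)  # hypothetical policy update
    state = next_state
```

In the paper's setting, the stand-in random action would be replaced by the trained policy's output, and the simulation entities would be full actors exchanging messages rather than plain Python objects.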

Metadata
Title
Reinforcement Learning of Supply Chain Control Policy Using Closed Loop Multi-agent Simulation
Authors
Souvik Barat
Prashant Kumar
Monika Gajrani
Harshad Khadilkar
Hardik Meisheri
Vinita Baniwal
Vinay Kulkarni
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-60843-9_3
