
2022 | OriginalPaper | Chapter

Learning to Solve a Stochastic Orienteering Problem with Time Windows

Authors : Fynn Schmitt-Ulms, André Hottung, Meinolf Sellmann, Kevin Tierney

Published in: Learning and Intelligent Optimization

Publisher: Springer International Publishing


Abstract

Reinforcement learning (RL) has seen increasing success at solving a variety of combinatorial optimization problems. These techniques have generally been applied to deterministic optimization problems with few side constraints, such as the traveling salesperson problem (TSP) or the capacitated vehicle routing problem (CVRP). With this in mind, the recent IJCAI AI for TSP competition challenged participants to apply RL to a difficult routing problem involving optimization under uncertainty and time windows. We present the winning submission to the challenge, which uses the policy optimization with multiple optima (POMO) approach combined with efficient active search and Monte Carlo roll-outs. Experimental results show that our approach outperforms the second-place approach by 1.7%. Furthermore, our computational results suggest that solving more realistic routing problems may not be as difficult as previously thought.
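The core idea behind a Monte Carlo roll-out for routing under uncertainty can be sketched as follows: the value of a decision is estimated by averaging its outcome over many sampled realizations of the stochastic travel times. This is a minimal, one-step illustration with invented names and an assumed uniform noise model; it is not the competition simulator or the submission's actual code.

```python
import random

def sample_travel_time(dist, rng):
    # Assumed noise model: travel time fluctuates around the nominal distance.
    return dist * rng.uniform(0.8, 1.2)

def rollout_value(current_time, dist, reward, close_t, penalty,
                  n_samples=100, seed=0):
    """Estimate the expected reward of moving to a node next by averaging
    over sampled travel times (a depth-one Monte Carlo roll-out)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        arrival = current_time + sample_travel_time(dist, rng)
        if arrival > close_t:
            total -= penalty   # arriving after the time window incurs p
        else:
            total += reward    # on-time arrival collects the node's reward
    return total / n_samples

# A decision rule would then pick the candidate with the highest estimate,
# e.g.: max(candidates, key=lambda c: rollout_value(t, c.dist, c.reward,
#                                                   c.close, penalty))
```

The winning submission applies this averaging principle to full tours proposed by the learned POMO policy rather than to single moves, but the estimation idea is the same.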


Footnotes
1
We note that the TDOP is abbreviated as the TD-OPSWTW in some works.
 
2
Although not the focus of our research, our approach can also generate complete solutions using the expected travel time for the supervised learning track; these match the winning team's solutions while requiring less computation time.
 
3
We assume the penalty is large enough (\(p > r_i\)) such that, in the version of the problem with recourse, we should always avoid the late arrival penalty at nodes.
 
Metadata
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-031-24866-5_8
