
2024 | OriginalPaper | Book chapter

Hybrid Surrogate Assisted Evolutionary Multiobjective Reinforcement Learning for Continuous Robot Control

Written by: Atanu Mazumdar, Ville Kyrki

Published in: Applications of Evolutionary Computation

Publisher: Springer Nature Switzerland


Abstract

Many real-world reinforcement learning (RL) problems involve multiple conflicting objective functions that must be optimized simultaneously. Finding the optimal policies (known as Pareto optimal policies) for different preferences over the objectives requires extensive state space exploration. Obtaining a dense set of Pareto optimal policies is therefore challenging and often reduces sample efficiency. In this paper, we propose a hybrid multiobjective policy optimization approach for solving multiobjective reinforcement learning (MORL) problems with continuous actions. Our approach combines the faster convergence of multiobjective policy gradient (MOPG) with a surrogate assisted multiobjective evolutionary algorithm (MOEA) to produce a dense set of Pareto optimal policies. The solutions found by the MOPG algorithm are used to build computationally inexpensive surrogate models in the parameter space of the policies; these surrogates approximate the return of a policy. An MOEA then uses the surrogates' mean prediction and the uncertainty in that prediction to find approximately optimal policies. The final solution policies are subsequently evaluated using the simulator and stored in an archive. Tests on multiobjective continuous action RL benchmarks show that a hybrid surrogate assisted multiobjective evolutionary optimizer with a robust selection criterion produces a dense set of Pareto optimal policies without extensively exploring the state space. We also apply the proposed approach to train Pareto optimal agents for autonomous driving, where the hybrid approach produced superior results compared to a state-of-the-art MOPG algorithm.
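To make the selection step more concrete, the following is a minimal sketch of how a surrogate's mean prediction and uncertainty could be combined before a Pareto (nondominated) filter is applied. It is not the authors' implementation: the abstract does not specify the robust selection criterion, so the pessimistic score `mean - kappa * std`, the parameter `kappa`, and the function names `nondominated_mask`, `pessimistic_returns` and `select_candidates` are all assumptions made for illustration. Returns are assumed to be maximized.

```python
import numpy as np


def nondominated_mask(returns):
    """Return a boolean mask marking rows that no other row dominates.

    Assumes maximization: row j dominates row i if j is at least as good
    in every objective and strictly better in at least one.
    """
    returns = np.asarray(returns, dtype=float)
    mask = np.ones(returns.shape[0], dtype=bool)
    for i in range(returns.shape[0]):
        at_least_as_good = np.all(returns >= returns[i], axis=1)
        strictly_better = np.any(returns > returns[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            mask[i] = False
    return mask


def pessimistic_returns(mean, std, kappa=1.0):
    """Hypothetical robust score: penalize predictions by their uncertainty."""
    return np.asarray(mean, dtype=float) - kappa * np.asarray(std, dtype=float)


def select_candidates(mean, std, kappa=1.0):
    """Indices of candidate policies that are nondominated under the
    pessimistic score (an illustrative stand-in for a robust criterion)."""
    scores = pessimistic_returns(mean, std, kappa)
    return np.flatnonzero(nondominated_mask(scores))


# Illustrative surrogate predictions for four candidate policies and two
# objectives; the numbers are made up for the example.
predicted_mean = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 1.0], [2.5, 2.9]])
predicted_std = np.array([[0.1, 0.1], [0.1, 0.1], [0.1, 0.1], [1.0, 1.0]])

robust_choice = select_candidates(predicted_mean, predicted_std, kappa=1.0)
mean_only_choice = select_candidates(predicted_mean, predicted_std, kappa=0.0)
```

In this toy data the fourth candidate looks competitive on its mean prediction alone but carries a large predicted uncertainty, so it is filtered out once the penalty is applied. In a workflow like the one described above, only the selected candidates would then be evaluated in the simulator and added to the archive.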

Metadata
Title
Hybrid Surrogate Assisted Evolutionary Multiobjective Reinforcement Learning for Continuous Robot Control
Written by
Atanu Mazumdar
Ville Kyrki
Copyright year
2024
DOI
https://doi.org/10.1007/978-3-031-56855-8_4
