Research article · DOI: 10.1145/3321707.3321799 · ACM GECCO Conference Proceedings

POET: open-ended coevolution of environments and their optimized solutions

Published: 13 July 2019

ABSTRACT

How can progress in machine learning and reinforcement learning be automated to generate its own never-ending curriculum of challenges without human intervention? The recent emergence of quality diversity (QD) algorithms offers a glimpse of the potential for such continual open-ended invention. For example, novelty search showcases the benefits of explicit novelty pressure, MAP-Elites and Innovation Engines highlight the advantage of explicit elitism within niches in an otherwise divergent process, and minimal criterion coevolution (MCC) reveals that problems and solutions can coevolve divergently. The Paired Open-Ended Trailblazer (POET) algorithm introduced in this paper combines these principles to produce a practical approach to generating an endless progression of diverse and increasingly challenging environments while at the same time explicitly optimizing their solutions. An intriguing implication is the opportunity to transfer solutions among environments, reflecting the view that innovation is a circuitous and unpredictable process. POET is tested in a 2-D obstacle-course domain, where it generates diverse and sophisticated behaviors that create and solve a wide range of environmental challenges, many of which cannot be solved by direct optimization or by a direct-path curriculum-building control algorithm. We hope that POET will inspire a new push towards open-ended discovery across many domains.
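To make the mechanics concrete, below is a minimal, self-contained sketch of a POET-style outer loop in Python. It is not the authors' implementation: environments and agents are each reduced to a single scalar so the loop runs stand-alone, and the helpers evaluate, es_step, and mc_satisfied are hypothetical stand-ins for the paper's episode evaluation, the evolution-strategies optimization of each paired agent, and the minimal-criterion check on newly mutated environments.

    # Conceptual sketch of a POET-style outer loop (not the authors' code).
    import random

    random.seed(0)

    def evaluate(theta, env):
        # Toy episode return: reward is highest when the agent parameter
        # matches the environment's difficulty parameter.
        return -abs(theta - env)

    def es_step(theta, env, sigma=0.1, pop=20, lr=0.5):
        # One evolution-strategies update, standing in for the ES optimizer
        # POET runs on each environment's paired agent.
        noise = [random.gauss(0, 1) for _ in range(pop)]
        grad = sum(evaluate(theta + sigma * n, env) * n for n in noise) / (pop * sigma)
        return theta + lr * grad

    def mc_satisfied(score, lo=-2.0, hi=-0.01):
        # Minimal criterion: accept child environments that are neither
        # already solved nor hopelessly hard for their parent's agent.
        return lo <= score <= hi

    pairs = [(0.0, 0.0)]  # list of (environment, paired agent) scalars
    for step in range(50):
        # 1. Periodically mutate an existing environment to create a child,
        #    keeping it only if the minimal criterion holds.
        if step % 10 == 0:
            env, theta = random.choice(pairs)
            child = env + random.gauss(0, 0.5)
            if mc_satisfied(evaluate(theta, child)):
                pairs.append((child, theta))
        # 2. Optimize every agent one ES step against its own environment.
        pairs = [(env, es_step(theta, env)) for env, theta in pairs]
        # 3. Transfer: if some other pair's agent already does better on
        #    this environment, adopt it (a simplified transfer test).
        for i, (env, theta) in enumerate(pairs):
            best = max((p[1] for p in pairs), key=lambda t: evaluate(t, env))
            if evaluate(best, env) > evaluate(theta, env):
                pairs[i] = (env, best)

    print([(round(e, 2), round(t, 2)) for e, t in pairs])

In the paper itself, agents are neural-network controllers for a bipedal walker, environments are parameterized obstacle courses, and transfer involves both direct copies and fine-tuned proposal candidates; the sketch preserves only the shape of the loop: mutate environments, optimize each paired agent, and attempt cross-pair transfers.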


• Published in

  GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference
  July 2019, 1545 pages
  ISBN: 978-1-4503-6111-8
  DOI: 10.1145/3321707
  Copyright © 2019 ACM

  Publisher: Association for Computing Machinery, New York, NY, United States


      Acceptance Rates

Overall acceptance rate: 1,669 of 4,410 submissions, 38%
