ABSTRACT
How can progress in machine learning and reinforcement learning be automated to generate its own never-ending curriculum of challenges without human intervention? The recent emergence of quality diversity (QD) algorithms offers a glimpse of the potential for such continual open-ended invention. For example, novelty search showcases the benefits of explicit novelty pressure, MAP-Elites and Innovation Engines highlight the advantage of explicit elitism within niches in an otherwise divergent process, and minimal criterion coevolution (MCC) reveals that problems and solutions can coevolve divergently. The Paired Open-Ended Trailblazer (POET) algorithm introduced in this paper combines these principles into a practical approach to generating an endless progression of diverse and increasingly challenging environments while at the same time explicitly optimizing their solutions. An intriguing implication is the opportunity to transfer solutions among environments, reflecting the view that innovation is a circuitous and unpredictable process. POET is tested in a 2-D obstacle-course domain, where it generates diverse and sophisticated behaviors that create and solve a wide range of environmental challenges, many of which cannot be solved by direct optimization or by a direct-path curriculum-building control algorithm. We hope that POET will inspire a new push towards open-ended discovery across many domains.
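To make the mechanics concrete, the sketch below illustrates a minimal POET-style loop in Python. It is an assumption-laden toy rather than the paper's implementation: environments are reduced to a single difficulty scalar, each paired agent is optimized with a simple evolution-strategies step, and the helper names (`evaluate`, `optimize_step`, `mutate_env`) and minimal-criterion thresholds are invented for illustration.

```python
import numpy as np

# Minimal POET-style sketch (illustrative only; not the paper's implementation).
# Environments are a single difficulty scalar, agents are parameter vectors, and
# `evaluate`, `optimize_step`, `mutate_env`, and the thresholds are assumptions.

rng = np.random.default_rng(0)

def evaluate(env, theta):
    """Stub score of agent `theta` on environment `env` (harder env -> lower score)."""
    return 100.0 - 10.0 * env + float(np.linalg.norm(theta))

def optimize_step(env, theta, sigma=0.1, n=20, lr=0.05):
    """One evolution-strategies update of an agent on its paired environment."""
    eps = rng.standard_normal((n, theta.size))
    scores = np.array([evaluate(env, theta + sigma * e) for e in eps])
    grad = (scores - scores.mean()) @ eps / (n * sigma)
    return theta + lr * grad

def mutate_env(env):
    """Perturb an environment to propose a (usually harder) child environment."""
    return max(0.0, env + rng.uniform(-0.5, 1.0))

MC_LOW, MC_HIGH = 50.0, 200.0            # minimal criterion: neither trivial nor hopeless
pairs = [(0.0, rng.standard_normal(8))]  # start with one easy environment-agent pair

for it in range(200):
    # 1. Periodically generate child environments; admit only those on which
    #    the parent agent scores inside the minimal-criterion band.
    if it % 20 == 0:
        children = [(mutate_env(env), theta.copy()) for env, theta in pairs]
        pairs += [(e, t) for e, t in children if MC_LOW <= evaluate(e, t) <= MC_HIGH]
        pairs = pairs[-10:]              # cap the number of active pairs

    # 2. Explicitly optimize each agent on its own paired environment.
    pairs = [(env, optimize_step(env, theta)) for env, theta in pairs]

    # 3. Transfer: if another pair's agent does better on this environment, adopt it.
    for i, (env, theta) in enumerate(pairs):
        best = max(pairs, key=lambda p: evaluate(env, p[1]))[1]
        if evaluate(env, best) > evaluate(env, theta):
            pairs[i] = (env, best.copy())
```

The three numbered phases mirror the ingredients named above: divergent generation of new environments gated by a minimal criterion, explicit optimization within each environment niche, and opportunistic transfer of solutions between environments.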
Supplemental material is available for download.
REFERENCES
- Charles W. Anderson. 1989. Learning to Control an Inverted Pendulum Using Neural Networks. IEEE Control Systems Magazine 9 (1989), 31--37.
- Mark Bedau. 2008. The Arrow of Complexity Hypothesis (Abstract). In Proceedings of the Eleventh International Conference on Artificial Life (ALIFE XI), Seth Bullock, Jason Noble, Richard Watson, and Mark Bedau (Eds.). MIT Press, Cambridge, MA, 750. http://www.alifexi.org/papers/ALIFExi-abstracts-0010.pdf
- Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. 2013. The Arcade Learning Environment: An Evaluation Platform for General Agents. Journal of Artificial Intelligence Research 47 (2013), 253--279.
- Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum Learning. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML). ACM, 41--48.
- Jonathan C. Brant and Kenneth O. Stanley. 2017. Minimal Criterion Coevolution: A New Approach to Open-Ended Search. In Proceedings of the 2017 Genetic and Evolutionary Computation Conference (GECCO). 67--74.
- Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016).
- N. Cheney, R. MacCurdy, J. Clune, and H. Lipson. 2013. Unshackling Evolution: Evolving Soft Robots with Multiple Materials and a Powerful Generative Encoding. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2013). ACM Press, New York, NY.
- Edoardo Conti, Vashisht Madhavan, Felipe Petroski Such, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. 2018. Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents. In Advances in Neural Information Processing Systems (NeurIPS) 31. 5032--5043.
- A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret. 2015. Robots That Can Adapt Like Animals. Nature 521 (2015), 503--507.
- Edwin D. De Jong. 2004. The Incremental Pareto-Coevolution Archive. In Genetic and Evolutionary Computation Conference (GECCO). Springer, 525--536.
- Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 248--255.
- Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. 2019. Go-Explore: A New Approach for Hard-Exploration Problems. arXiv preprint arXiv:1901.10995 (2019).
- Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. 2018. Diversity Is All You Need: Learning Skills without a Reward Function. arXiv preprint arXiv:1802.06070 (2018).
- S. G. Ficici and J. B. Pollack. 1998. Challenges in Coevolutionary Learning: Arms-Race Dynamics, Open-Endedness, and Mediocre Stable States. Artificial Life VI (1998), 238.
- Carlos Florensa, David Held, Xinyang Geng, and Pieter Abbeel. 2018. Automatic Goal Generation for Reinforcement Learning Agents. In International Conference on Machine Learning (ICML). 1514--1523.
- Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, and Pieter Abbeel. 2017. Reverse Curriculum Generation for Reinforcement Learning. In Proceedings of the 1st Annual Conference on Robot Learning (CoRL). 482--495.
- Sébastien Forestier, Yoan Mollard, and Pierre-Yves Oudeyer. 2017. Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. arXiv preprint arXiv:1708.02190 (2017).
- Faustino Gomez and Risto Miikkulainen. 1997. Incremental Evolution of Complex General Behavior. Adaptive Behavior 5 (1997), 317--342.
- Abhishek Gupta, Benjamin Eysenbach, Chelsea Finn, and Sergey Levine. 2018. Unsupervised Meta-Learning for Reinforcement Learning. arXiv preprint arXiv:1806.04640 (2018). http://arxiv.org/abs/1806.04640
- David Ha. 2017. Evolving Stable Strategies. http://blog.otoro.net/ (2017).
- David Ha. 2018. Reinforcement Learning for Improving Agent Design. arXiv preprint arXiv:1810.03779 (2018).
- Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin A. Riedmiller, and David Silver. 2017. Emergence of Locomotion Behaviours in Rich Environments. arXiv preprint arXiv:1707.02286 (2017).
- Joost Huizinga and Jeff Clune. 2018. Evolving Multimodal Robot Behavior via Many Stepping Stones with the Combinatorial Multi-Objective Evolutionary Algorithm. arXiv preprint arXiv:1807.03392 (2018).
- Joost Huizinga, Jean-Baptiste Mouret, and Jeff Clune. 2016. Does Aligning Phenotypic and Genotypic Modularity Improve the Evolution of Neural Networks? In Proceedings of the 2016 Genetic and Evolutionary Computation Conference (GECCO). 125--132.
- Edwin D. De Jong and Jordan B. Pollack. 2004. Ideal Evaluation from Coevolution. Evolutionary Computation 12, 2 (2004), 159--192.
- Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, and Sebastian Risi. 2018. Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation. arXiv preprint arXiv:1806.10729 (2018).
- Andrej Karpathy and Michiel Van De Panne. 2012. Curriculum Learning for Motor Skills. In Canadian Conference on Artificial Intelligence. Springer, 325--330.
- Diederik Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014).
- O. Klimov. 2016. BipedalWalkerHardcore-v2. https://gym.openai.com (2016).
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems. 1097--1105.
- W. B. Langdon. 2005. Pfeiffer - A Distributed Open-Ended Evolutionary System. In AISB'05: Proceedings of the Joint Symposium on Socially Inspired Computing (METAS 2005), Bruce Edmonds, Nigel Gilbert, Steven Gustafson, David Hales, and Natalio Krasnogor (Eds.). 7--13. http://www.cs.ucl.ac.uk/staf/W.Langdon/ftp/papers/wbl_metas2005.pdf
- Yann LeCun and Corinna Cortes. 1998. The MNIST Database of Handwritten Digits. (1998).
- Joel Lehman, Jay Chen, Jeff Clune, and Kenneth O. Stanley. 2018. ES Is More Than Just a Traditional Finite-Difference Approximator. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '18). ACM, New York, NY, USA, 450--457.
- Joel Lehman and Kenneth O. Stanley. 2010. Revising the Evolutionary Computation Abstraction: Minimal Criteria Novelty Search. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation (GECCO '10). ACM, New York, NY, USA, 103--110.
- Joel Lehman and Kenneth O. Stanley. 2011. Abandoning Objectives: Evolution through the Search for Novelty Alone. Evolutionary Computation 19, 2 (2011), 189--223.
- Joel Lehman and Kenneth O. Stanley. 2011. Evolving a Diversity of Virtual Creatures through Novelty Search and Local Competition. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO '11). 211--218.
- Joel Lehman and Kenneth O. Stanley. 2011. Novelty Search and the Problem with Objectives. In Genetic Programming Theory and Practice IX (GPTP 2011).
- Tambet Matiisen, Avital Oliver, Taco Cohen, and John Schulman. 2017. Teacher-Student Curriculum Learning. arXiv preprint arXiv:1707.00183 (2017).
- Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous Methods for Deep Reinforcement Learning. In International Conference on Machine Learning (ICML). 1928--1937.
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. 2015. Human-Level Control through Deep Reinforcement Learning. Nature 518, 7540 (2015), 529--533.
- Jean-Baptiste Mouret and Jeff Clune. 2015. Illuminating Search Spaces by Mapping Elites. arXiv preprint arXiv:1504.04909 (2015).
- Anh Nguyen, Jason Yosinski, and Jeff Clune. 2016. Understanding Innovation Engines: Automated Creativity and Improved Stochastic Optimization via Deep Learning. Evolutionary Computation 24, 3 (2016), 545--572.
- Elena Popovici, Anthony Bucci, R. Paul Wiegand, and Edwin D. De Jong. 2012. Coevolutionary Principles. Springer Berlin Heidelberg, Berlin, Heidelberg, 987--1033.
- Justin K. Pugh, Lisa B. Soros, and Kenneth O. Stanley. 2016. Quality Diversity: A New Frontier for Evolutionary Computation. Frontiers in Robotics and AI 3 (2016), 40.
- Ingo Rechenberg. 1978. Evolutionsstrategien. In Simulationsmethoden in der Medizin und Biologie. 83--114.
- Tim Salimans, Jonathan Ho, Xi Chen, and Ilya Sutskever. 2017. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arXiv preprint arXiv:1703.03864 (2017).
- Nikolay Savinov, Anton Raichuk, Raphael Marinier, Damien Vincent, Marc Pollefeys, Timothy Lillicrap, and Sylvain Gelly. 2018. Episodic Curiosity through Reachability. arXiv preprint arXiv:1810.02274 (2018).
- Jürgen Schmidhuber. 2013. PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem. Frontiers in Psychology 4 (2013), 313.
- John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust Region Policy Optimization. In International Conference on Machine Learning (ICML). 1889--1897.
- Frank Sehnke, Christian Osendorfer, Thomas Rückstieß, Alex Graves, Jan Peters, and Jürgen Schmidhuber. 2010. Parameter-Exploring Policy Gradients. Neural Networks 23, 4 (2010), 551--559.
- Noor Shaker, Julian Togelius, and Mark J. Nelson. 2016. Procedural Content Generation in Games. Springer.
- L. B. Soros, Nick Cheney, and Kenneth O. Stanley. 2016. How the Strictness of the Minimal Criterion Impacts Open-Ended Evolution. In ALIFE 15: The Fifteenth International Conference on the Synthesis and Simulation of Living Systems. 208--215.
- L. B. Soros and Kenneth O. Stanley. 2014. Identifying Necessary Conditions for Open-Ended Evolution through the Artificial Life World of Chromaria. In ALIFE 14: The Fourteenth International Conference on the Synthesis and Simulation of Living Systems. 793--800.
- Russell K. Standish. 2003. Open-Ended Artificial Evolution. International Journal of Computational Intelligence and Applications 3, 2 (2003), 167--175.
- Kenneth O. Stanley. 2007. Compositional Pattern Producing Networks: A Novel Abstraction of Development. Genetic Programming and Evolvable Machines Special Issue on Developmental Systems 8, 2 (2007), 131--162.
- Kenneth O. Stanley and Joel Lehman. 2015. Why Greatness Cannot Be Planned: The Myth of the Objective. Springer.
- Kenneth O. Stanley, Joel Lehman, and Lisa Soros. 2017. Open-Endedness: The Last Grand Challenge You've Never Heard Of. O'Reilly Online (2017). https://www.oreilly.com/ideas/open-endedness-the-last-grand-challenge-youve-never-heard-of
- Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. Vol. 1.
- Tim Taylor, Mark Bedau, Alastair Channon, David Ackley, et al. 2016. Open-Ended Evolution: Perspectives from the OEE Workshop in York. Artificial Life 22, 3 (2016), 408--423.
- Julian Togelius, Georgios N. Yannakakis, Kenneth O. Stanley, and Cameron Browne. 2011. Search-Based Procedural Content Generation: A Taxonomy and Survey. IEEE Transactions on Computational Intelligence and AI in Games 3, 3 (2011), 172--186.
- R. Paul Wiegand, William C. Liles, and Kenneth A. De Jong. 2001. An Empirical Analysis of Collaboration Methods in Cooperative Coevolutionary Algorithms. In Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation (GECCO '01). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1235--1242. http://dl.acm.org/citation.cfm?id=2955239.2955458
- Daan Wierstra, Tom Schaul, Jan Peters, and Juergen Schmidhuber. 2008. Natural Evolution Strategies. In 2008 IEEE Congress on Evolutionary Computation (CEC). 3381--3387.
- Ronald J. Williams. 1992. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning 8, 3--4 (1992), 229--256.
- Brian G. Woolley and Kenneth O. Stanley. 2011. On the Deleterious Effects of A Priori Objectives on Evolution and Representation. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO '11). ACM, Dublin, Ireland, 957--964.
- Xingwen Zhang, Jeff Clune, and Kenneth O. Stanley. 2017. On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent. arXiv preprint arXiv:1712.06564 (2017).