ABSTRACT
How can progress in machine learning and reinforcement learning be automated to generate its own never-ending curriculum of challenges without human intervention? The recent emergence of quality diversity (QD) algorithms offers a glimpse of the potential for such continual open-ended invention. For example, novelty search showcases the benefits of explicit novelty pressure, MAP-Elites and Innovation Engines highlight the advantage of explicit elitism within niches in an otherwise divergent process, and minimal criterion coevolution (MCC) reveals that problems and solutions can coevolve divergently. The Paired Open-Ended Trailblazer (POET) algorithm introduced in this paper combines these principles into a practical approach to generating an endless progression of diverse and increasingly challenging environments while at the same time explicitly optimizing their solutions. An intriguing implication is the opportunity to transfer solutions among environments, reflecting the view that innovation is a circuitous and unpredictable process. POET is tested in a 2-D obstacle-course domain, where it generates diverse and sophisticated behaviors that create and solve a wide range of environmental challenges, many of which cannot be solved by direct optimization or by a direct-path curriculum-building control algorithm. We hope that POET will inspire a new push towards open-ended discovery across many domains.
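To make the mechanics concrete, the sketch below illustrates a minimal POET-style loop in Python. It is an assumption-laden toy rather than the paper's implementation: environments are reduced to a single difficulty scalar, each paired agent is optimized with a simple evolution-strategies step, and the helper names (`evaluate`, `optimize_step`, `mutate_env`) and minimal-criterion thresholds are invented for illustration.

```python
import numpy as np

# Minimal POET-style sketch (illustrative only; not the paper's implementation).
# Environments are a single difficulty scalar, agents are parameter vectors, and
# `evaluate`, `optimize_step`, `mutate_env`, and the thresholds are assumptions.

rng = np.random.default_rng(0)

def evaluate(env, theta):
    """Stub score of agent `theta` on environment `env` (harder env -> lower score)."""
    return 100.0 - 10.0 * env + float(np.linalg.norm(theta))

def optimize_step(env, theta, sigma=0.1, n=20, lr=0.05):
    """One evolution-strategies update of an agent on its paired environment."""
    eps = rng.standard_normal((n, theta.size))
    scores = np.array([evaluate(env, theta + sigma * e) for e in eps])
    grad = (scores - scores.mean()) @ eps / (n * sigma)
    return theta + lr * grad

def mutate_env(env):
    """Perturb an environment to propose a (usually harder) child environment."""
    return max(0.0, env + rng.uniform(-0.5, 1.0))

MC_LOW, MC_HIGH = 50.0, 200.0            # minimal criterion: neither trivial nor hopeless
pairs = [(0.0, rng.standard_normal(8))]  # start with one easy environment-agent pair

for it in range(200):
    # 1. Periodically generate child environments; admit only those on which
    #    the parent agent scores inside the minimal-criterion band.
    if it % 20 == 0:
        children = [(mutate_env(env), theta.copy()) for env, theta in pairs]
        pairs += [(e, t) for e, t in children if MC_LOW <= evaluate(e, t) <= MC_HIGH]
        pairs = pairs[-10:]              # cap the number of active pairs

    # 2. Explicitly optimize each agent on its own paired environment.
    pairs = [(env, optimize_step(env, theta)) for env, theta in pairs]

    # 3. Transfer: if another pair's agent does better on this environment, adopt it.
    for i, (env, theta) in enumerate(pairs):
        best = max(pairs, key=lambda p: evaluate(env, p[1]))[1]
        if evaluate(env, best) > evaluate(env, theta):
            pairs[i] = (env, best.copy())
```

The three numbered phases mirror the ingredients named above: divergent generation of new environments gated by a minimal criterion, explicit optimization within each environment niche, and opportunistic transfer of solutions between environments.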
Supplemental material is available for download.
REFERENCES
- Charles W. Anderson. 1989. Learning to Control an Inverted Pendulum Using Neural Networks. IEEE Control Systems Magazine 9 (1989), 31--37.
- Mark Bedau. 2008. The Arrow of Complexity Hypothesis (Abstract). In Proceedings of the Eleventh International Conference on Artificial Life (ALIFE XI), Seth Bullock, Jason Noble, Richard Watson, and Mark Bedau (Eds.). MIT Press, Cambridge, MA, 750. http://www.alifexi.org/papers/ALIFExi-abstracts-0010.pdf
- Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. 2013. The Arcade Learning Environment: An Evaluation Platform for General Agents. Journal of Artificial Intelligence Research 47 (2013), 253--279.
- Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum Learning. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML). ACM, 41--48.
- Jonathan C. Brant and Kenneth O. Stanley. 2017. Minimal Criterion Coevolution: A New Approach to Open-Ended Search. In Proceedings of the 2017 Genetic and Evolutionary Computation Conference (GECCO). 67--74.
- Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016).
- N. Cheney, R. MacCurdy, J. Clune, and H. Lipson. 2013. Unshackling Evolution: Evolving Soft Robots with Multiple Materials and a Powerful Generative Encoding. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2013). ACM Press, New York, NY.
- Edoardo Conti, Vashisht Madhavan, Felipe Petroski Such, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. 2018. Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents. In Advances in Neural Information Processing Systems (NeurIPS) 31. 5032--5043.
- A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret. 2015. Robots That Can Adapt Like Animals. Nature 521 (2015), 503--507.
- Edwin D. De Jong. 2004. The Incremental Pareto-Coevolution Archive. In Genetic and Evolutionary Computation Conference (GECCO). Springer, 525--536.
- Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 248--255.
- Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. 2019. Go-Explore: A New Approach for Hard-Exploration Problems. arXiv preprint arXiv:1901.10995 (2019).
- Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. 2018. Diversity Is All You Need: Learning Skills without a Reward Function. arXiv preprint arXiv:1802.06070 (2018).
- S. G. Ficici and J. B. Pollack. 1998. Challenges in Coevolutionary Learning: Arms-Race Dynamics, Open-Endedness, and Mediocre Stable States. Artificial Life VI (1998), 238.
- Carlos Florensa, David Held, Xinyang Geng, and Pieter Abbeel. 2018. Automatic Goal Generation for Reinforcement Learning Agents. In International Conference on Machine Learning (ICML). 1514--1523.
- Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, and Pieter Abbeel. 2017. Reverse Curriculum Generation for Reinforcement Learning. In Proceedings of the 1st Annual Conference on Robot Learning (CoRL). 482--495.
- Sébastien Forestier, Yoan Mollard, and Pierre-Yves Oudeyer. 2017. Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. arXiv preprint arXiv:1708.02190 (2017).
- Faustino Gomez and Risto Miikkulainen. 1997. Incremental Evolution of Complex General Behavior. Adaptive Behavior 5 (1997), 317--342.
- Abhishek Gupta, Benjamin Eysenbach, Chelsea Finn, and Sergey Levine. 2018. Unsupervised Meta-Learning for Reinforcement Learning. arXiv preprint arXiv:1806.04640 (2018). http://arxiv.org/abs/1806.04640
- David Ha. 2017. Evolving Stable Strategies. http://blog.otoro.net/ (2017).
- David Ha. 2018. Reinforcement Learning for Improving Agent Design. arXiv preprint arXiv:1810.03779 (2018).
- Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin A. Riedmiller, and David Silver. 2017. Emergence of Locomotion Behaviours in Rich Environments. arXiv preprint arXiv:1707.02286 (2017).
- Joost Huizinga and Jeff Clune. 2018. Evolving Multimodal Robot Behavior via Many Stepping Stones with the Combinatorial Multi-Objective Evolutionary Algorithm. arXiv preprint arXiv:1807.03392 (2018).
- Joost Huizinga, Jean-Baptiste Mouret, and Jeff Clune. 2016. Does Aligning Phenotypic and Genotypic Modularity Improve the Evolution of Neural Networks? In Proceedings of the 2016 Genetic and Evolutionary Computation Conference (GECCO). 125--132.
- Edwin D. De Jong and Jordan B. Pollack. 2004. Ideal Evaluation from Coevolution. Evolutionary Computation 12, 2 (2004), 159--192.
- Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, and Sebastian Risi. 2018. Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation. arXiv preprint arXiv:1806.10729 (2018).
- Andrej Karpathy and Michiel Van De Panne. 2012. Curriculum Learning for Motor Skills. In Canadian Conference on Artificial Intelligence. Springer, 325--330.
- Diederik Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014).
- O. Klimov. 2016. BipedalWalkerHardcore-v2. https://gym.openai.com (2016).
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems. 1097--1105.
- W. B. Langdon. 2005. Pfeiffer - A Distributed Open-Ended Evolutionary System. In AISB'05: Proceedings of the Joint Symposium on Socially Inspired Computing (METAS 2005), Bruce Edmonds, Nigel Gilbert, Steven Gustafson, David Hales, and Natalio Krasnogor (Eds.). 7--13. http://www.cs.ucl.ac.uk/staf/W.Langdon/ftp/papers/wbl_metas2005.pdf
- Yann LeCun and Corinna Cortes. 1998. The MNIST Database of Handwritten Digits. (1998).
- Joel Lehman, Jay Chen, Jeff Clune, and Kenneth O. Stanley. 2018. ES Is More Than Just a Traditional Finite-Difference Approximator. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '18). ACM, New York, NY, USA, 450--457.
- Joel Lehman and Kenneth O. Stanley. 2010. Revising the Evolutionary Computation Abstraction: Minimal Criteria Novelty Search. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation (GECCO '10). ACM, New York, NY, USA, 103--110.
- Joel Lehman and Kenneth O. Stanley. 2011. Abandoning Objectives: Evolution through the Search for Novelty Alone. Evolutionary Computation 19, 2 (2011), 189--223.
- Joel Lehman and Kenneth O. Stanley. 2011. Evolving a Diversity of Virtual Creatures through Novelty Search and Local Competition. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO '11). 211--218.
- Joel Lehman and Kenneth O. Stanley. 2011. Novelty Search and the Problem with Objectives. In Genetic Programming Theory and Practice IX (GPTP 2011).
- Tambet Matiisen, Avital Oliver, Taco Cohen, and John Schulman. 2017. Teacher-Student Curriculum Learning. arXiv preprint arXiv:1707.00183 (2017).
- Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous Methods for Deep Reinforcement Learning. In International Conference on Machine Learning (ICML). 1928--1937.
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. 2015. Human-Level Control through Deep Reinforcement Learning. Nature 518, 7540 (2015), 529--533.
- Jean-Baptiste Mouret and Jeff Clune. 2015. Illuminating Search Spaces by Mapping Elites. arXiv preprint arXiv:1504.04909 (2015).
- Anh Nguyen, Jason Yosinski, and Jeff Clune. 2016. Understanding Innovation Engines: Automated Creativity and Improved Stochastic Optimization via Deep Learning. Evolutionary Computation 24, 3 (2016), 545--572.
- Elena Popovici, Anthony Bucci, R. Paul Wiegand, and Edwin D. De Jong. 2012. Coevolutionary Principles. Springer Berlin Heidelberg, Berlin, Heidelberg, 987--1033.
- Justin K. Pugh, Lisa B. Soros, and Kenneth O. Stanley. 2016. Quality Diversity: A New Frontier for Evolutionary Computation. Frontiers in Robotics and AI 3 (2016), 40.
- Ingo Rechenberg. 1978. Evolutionsstrategien. In Simulationsmethoden in der Medizin und Biologie. 83--114.
- Tim Salimans, Jonathan Ho, Xi Chen, and Ilya Sutskever. 2017. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arXiv preprint arXiv:1703.03864 (2017).
- Nikolay Savinov, Anton Raichuk, Raphael Marinier, Damien Vincent, Marc Pollefeys, Timothy Lillicrap, and Sylvain Gelly. 2018. Episodic Curiosity through Reachability. arXiv preprint arXiv:1810.02274 (2018).
- Jürgen Schmidhuber. 2013. PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem. Frontiers in Psychology 4 (2013), 313.
- John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust Region Policy Optimization. In International Conference on Machine Learning (ICML). 1889--1897.
- Frank Sehnke, Christian Osendorfer, Thomas Rückstieß, Alex Graves, Jan Peters, and Jürgen Schmidhuber. 2010. Parameter-Exploring Policy Gradients. Neural Networks 23, 4 (2010), 551--559.
- Noor Shaker, Julian Togelius, and Mark J. Nelson. 2016. Procedural Content Generation in Games. Springer.
- L. B. Soros, Nick Cheney, and Kenneth O. Stanley. 2016. How the Strictness of the Minimal Criterion Impacts Open-Ended Evolution. In ALIFE 15: The Fifteenth International Conference on the Synthesis and Simulation of Living Systems. 208--215.
- L. B. Soros and Kenneth O. Stanley. 2014. Identifying Necessary Conditions for Open-Ended Evolution through the Artificial Life World of Chromaria. In ALIFE 14: The Fourteenth International Conference on the Synthesis and Simulation of Living Systems. 793--800.
- Russell K. Standish. 2003. Open-Ended Artificial Evolution. International Journal of Computational Intelligence and Applications 3, 2 (2003), 167--175.
- Kenneth O. Stanley. 2007. Compositional Pattern Producing Networks: A Novel Abstraction of Development. Genetic Programming and Evolvable Machines Special Issue on Developmental Systems 8, 2 (2007), 131--162.
- Kenneth O. Stanley and Joel Lehman. 2015. Why Greatness Cannot Be Planned: The Myth of the Objective. Springer.
- Kenneth O. Stanley, Joel Lehman, and Lisa Soros. 2017. Open-Endedness: The Last Grand Challenge You've Never Heard Of. O'Reilly Online (2017). https://www.oreilly.com/ideas/open-endedness-the-last-grand-challenge-youve-never-heard-of
- Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. Vol. 1.
- Tim Taylor, Mark Bedau, Alastair Channon, David Ackley, et al. 2016. Open-Ended Evolution: Perspectives from the OEE Workshop in York. Artificial Life 22, 3 (2016), 408--423.
- Julian Togelius, Georgios N. Yannakakis, Kenneth O. Stanley, and Cameron Browne. 2011. Search-Based Procedural Content Generation: A Taxonomy and Survey. IEEE Transactions on Computational Intelligence and AI in Games 3, 3 (2011), 172--186.
- R. Paul Wiegand, William C. Liles, and Kenneth A. De Jong. 2001. An Empirical Analysis of Collaboration Methods in Cooperative Coevolutionary Algorithms. In Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation (GECCO '01). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1235--1242. http://dl.acm.org/citation.cfm?id=2955239.2955458
- Daan Wierstra, Tom Schaul, Jan Peters, and Juergen Schmidhuber. 2008. Natural Evolution Strategies. In 2008 IEEE Congress on Evolutionary Computation (CEC). 3381--3387.
- Ronald J. Williams. 1992. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning 8, 3--4 (1992), 229--256.
- Brian G. Woolley and Kenneth O. Stanley. 2011. On the Deleterious Effects of A Priori Objectives on Evolution and Representation. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO '11). ACM, Dublin, Ireland, 957--964.
- Xingwen Zhang, Jeff Clune, and Kenneth O. Stanley. 2017. On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent. arXiv preprint arXiv:1712.06564 (2017).