ABSTRACT
Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight. It is a consequence of selective attention in perception, which lets us stay focused on important parts of our world without distraction from irrelevant details. Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck. By constraining the agent's access to only a small fraction of the visual input, we show that its policies are directly interpretable in pixel space. We find neuroevolution well suited for training self-attention architectures for vision-based reinforcement learning (RL) tasks, since it allows us to incorporate modules containing discrete, non-differentiable operations that are useful for the agent. We argue that self-attention has properties similar to indirect encoding, in the sense that large implicit weight matrices are generated from a small number of key-query parameters, thus enabling our agent to solve challenging vision-based tasks with at least 1000x fewer parameters than existing methods. Since our agent attends only to task-critical visual hints, it is able to generalize to environments where task-irrelevant elements are modified, while conventional methods fail.
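The indirect-encoding claim above can be made concrete with a minimal sketch: an N x N attention matrix over N image patches is generated implicitly from two small key/query projection matrices, and a discrete top-k selection then restricts the agent to a few patches. All names, dimensions, and the top-k step here are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 144      # number of image patches (e.g., a 12x12 grid)
d_in = 147   # flattened patch features (e.g., 7x7 pixels x 3 channels)
d = 4        # small key/query projection dimension
top_k = 10   # number of patches the agent may attend to

# Storing an explicit N x N patch-to-patch weight matrix needs N*N values;
# the key-query factorization needs only 2 * d_in * d parameters.
explicit_params = N * N          # 20736
indirect_params = 2 * d_in * d   # 1176

W_q = rng.normal(size=(d_in, d))
W_k = rng.normal(size=(d_in, d))
X = rng.normal(size=(N, d_in))   # one flattened patch per row

# Scaled dot-product attention scores: an implicit N x N matrix.
scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(d)
A = np.exp(scores - scores.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)  # row-wise softmax

# Importance of each patch: total attention it receives from all patches.
importance = A.sum(axis=0)

# Discrete, non-differentiable top-k selection: keep only the k most
# attended patches. Gradient-based training struggles with this step,
# which is one motivation for using neuroevolution instead.
selected = np.argsort(importance)[-top_k:]
print(explicit_params, indirect_params, len(selected))
```

Note how the parameter count of the factorized form grows with the patch feature size and projection width, not with the number of pairwise patch interactions, which is the sense in which the attention matrix is indirectly encoded.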