ABSTRACT
Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight. It is a consequence of selective attention in perception, which lets us stay focused on important parts of our world without distraction from irrelevant details. Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck. By constraining the agent's access to only a small fraction of the visual input, we show that its policies are directly interpretable in pixel space. We find neuroevolution well suited for training self-attention architectures for vision-based reinforcement learning (RL) tasks, since it allows us to incorporate modules containing discrete, non-differentiable operations that are useful for the agent. We argue that self-attention has properties similar to indirect encoding, in the sense that large implicit weight matrices are generated from a small number of key-query parameters, thus enabling our agent to solve challenging vision-based tasks with at least 1000x fewer parameters than existing methods. Since our agent attends only to task-critical visual hints, it is able to generalize to environments where task-irrelevant elements are modified, while conventional methods fail.
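The indirect-encoding claim above can be made concrete with a minimal sketch: an N x N attention matrix over N image patches is generated implicitly from two small key/query projection matrices, and a discrete top-k selection then restricts the agent to a few patches. All names, dimensions, and the top-k step here are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 144      # number of image patches (e.g., a 12x12 grid)
d_in = 147   # flattened patch features (e.g., 7x7 pixels x 3 channels)
d = 4        # small key/query projection dimension
top_k = 10   # number of patches the agent may attend to

# Storing an explicit N x N patch-to-patch weight matrix needs N*N values;
# the key-query factorization needs only 2 * d_in * d parameters.
explicit_params = N * N          # 20736
indirect_params = 2 * d_in * d   # 1176

W_q = rng.normal(size=(d_in, d))
W_k = rng.normal(size=(d_in, d))
X = rng.normal(size=(N, d_in))   # one flattened patch per row

# Scaled dot-product attention scores: an implicit N x N matrix.
scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(d)
A = np.exp(scores - scores.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)  # row-wise softmax

# Importance of each patch: total attention it receives from all patches.
importance = A.sum(axis=0)

# Discrete, non-differentiable top-k selection: keep only the k most
# attended patches. Gradient-based training struggles with this step,
# which is one motivation for using neuroevolution instead.
selected = np.argsort(importance)[-top_k:]
print(explicit_params, indirect_params, len(selected))
```

Note how the parameter count of the factorized form grows with the patch feature size and projection width, not with the number of pairwise patch interactions, which is the sense in which the attention matrix is indirectly encoded.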