Abstract
How can neural networks learn the rich internal representations required for difficult tasks such as recognizing objects or understanding language?