Abstract
How can neural networks learn the rich internal representations required for difficult tasks such as recognizing objects or understanding language?