Open Access

Deep learning for AI

Published: 21 June 2021

Abstract

How can neural networks learn the rich internal representations required for difficult tasks such as recognizing objects or understanding language?



Published in

Communications of the ACM, Volume 64, Issue 7 (July 2021), 99 pages
ISSN: 0001-0782 | EISSN: 1557-7317 | DOI: 10.1145/3472147

Copyright © 2021 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States
