Published in: International Journal of Computer Vision | Issue 3/2021

27.11.2020

Deep Nets: What have They Ever Done for Vision?

Authors: Alan L. Yuille, Chenxi Liu

Abstract

This is an opinion paper about the strengths and weaknesses of Deep Nets for vision. They are at the heart of the enormous recent progress in artificial intelligence and are of growing importance in cognitive science and neuroscience. They have had many successes, but they also have several limitations and their inner workings remain poorly understood. At present Deep Nets perform very well on specific visual tasks with benchmark datasets, but they are much less general purpose, flexible, and adaptive than the human visual system. We argue that Deep Nets in their current form are unlikely to overcome the fundamental problem of computer vision, namely how to deal with the combinatorial explosion caused by the enormous complexity of natural images, and so to obtain the rich understanding of visual scenes that the human visual system achieves. We argue that this combinatorial explosion takes us into a regime where “big data is not enough” and where we need to rethink our methods for benchmarking performance and evaluating vision algorithms. We stress that, as vision algorithms are increasingly used in real-world applications, performance evaluation is not merely an academic exercise but has important consequences in the real world. It is impractical to review the entire Deep Net literature, so we restrict ourselves to a limited range of topics and references which are intended as entry points into the literature. The views expressed in this paper are our own and do not necessarily represent those of anybody else in the computer vision community.
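As a rough, back-of-envelope illustration of the combinatorial argument (our own sketch with hypothetical numbers, not taken from the paper): if a scene contains k objects, each of which can appear in one of m states (position, pose, occlusion, lighting), the number of distinct configurations grows as m^k and quickly dwarfs any finite dataset.

```python
# Back-of-envelope sketch (hypothetical numbers) of why a finite dataset covers
# only a vanishing fraction of a combinatorial space of scene configurations.
m = 100           # assumed number of states per object (position, pose, ...)
k = 10            # assumed number of objects in the scene
n_images = 10**9  # assumed (very large) dataset size

configurations = m ** k
print(f"possible configurations: {configurations:.2e}")                    # 1.00e+20
print(f"fraction covered by the dataset: {n_images / configurations:.2e}")  # 1.00e-11
```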


Footnotes
1
The first author remembers that in the mid-1990s and early 2000s the term “neural network” in the title of a submission to a computer vision conference was sadly a good predictor of rejection, and recalls sympathizing with researchers who were pursuing such unfashionable ideas.
 
2
In addition to visualization, training a small neural network (also known as a readout function) on top of deep features is another popular technique for assessing how much those features encode particular properties, and it is now widely adopted in the self-supervised learning literature (Noroozi et al. 2016; Zhang et al. 2016).
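A minimal sketch of such a readout probe (our own illustration in PyTorch; the backbone, data, and sizes are placeholders rather than those of the cited works) freezes the feature extractor and trains only a small classifier on its outputs:

```python
# Minimal readout-probe sketch in PyTorch (placeholder backbone and toy data).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pretrained deep feature extractor whose weights stay frozen.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in backbone.parameters():
    p.requires_grad = False

# The readout function: a small classifier trained on top of frozen features.
readout = nn.Linear(16, 10)
optimizer = torch.optim.SGD(readout.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Toy tensors standing in for images and labels of the probed property.
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))

for step in range(100):
    with torch.no_grad():              # the deep features are never updated
        features = backbone(images)
    loss = loss_fn(readout(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# How well the readout predicts the property on held-out data measures how
# strongly the frozen features encode that property.
```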
 
3
Admittedly, in ResNets (He et al. 2016) there is only one “decision layer”, and the analogy to “template matching” also weakens at higher layers due to the presence of the residual connections.
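For readers unfamiliar with the term, a residual connection simply adds a block’s input back to its output, so higher layers compute corrections to the incoming features rather than fresh templates. A minimal sketch (our own, not the exact block of He et al. 2016):

```python
# Minimal residual block: the output is F(x) + x.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x                      # the skip ("residual") connection
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + residual)  # add the input back before the nonlinearity

x = torch.randn(1, 16, 8, 8)
print(ResidualBlock(16)(x).shape)  # torch.Size([1, 16, 8, 8])
```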
 
5
The issue we are describing is related, more generally, to how Deep Nets can take unintended “shortcut” solutions, for example exploiting the chromatic aberration noticed in Doersch et al. (2015), or the low-level statistics and edge continuity noticed in Noroozi et al. (2016). In this paper we highlight “over-sensitivity to context” as the representative example, both because it is familiar and to keep the discussion contained.
 
6
The first author remembers that, when studying text detection for the visually impaired, we were so concerned about dataset biases that we recruited blind subjects who walked the streets of San Francisco taking images automatically (but found that the main difference from regular images was a greater variety of angles).
 
7
Available from Sowerby Research Centre, British Aerospace.
 
11
Quote during a public talk by a West Coast Professor who, perhaps coincidentally, had a start-up company.
 
References
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Süsstrunk, S. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2274–2282.
Alcorn, MA., Li, Q., Gong, Z., Wang, C., Mai, L., Ku, W., & Nguyen, A. (2019). Strike (with) a pose: Neural networks are easily fooled by strange poses of familiar objects. In CVPR, Computer Vision Foundation/IEEE (pp. 4845–4854).
Andreas, J., Rohrbach, M., Darrell, T., & Klein, D. (2016). Neural module networks. In CVPR, IEEE Computer Society (pp. 39–48).
Arbib, M. A., & Bonaiuto, J. J. (2016). From neuron to cognition via computational neuroscience. Cambridge: MIT Press.
Arterberry, M. E., & Kellman, P. J. (2016). Development of perception in infancy: The cradle of knowledge revisited. Oxford: Oxford University Press.
Athalye, A., Carlini, N., & Wagner, DA. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, JMLR.org, JMLR Workshop and Conference Proceedings (Vol. 80, pp. 274–283).
Barlow, H., & Tripathy, S. P. (1997). Correspondence noise and signal pooling in the detection of coherent visual motion. Journal of Neuroscience, 17(20), 7954–7966.
Bashford, A., & Levine, P. (2010). The Oxford handbook of the history of eugenics. OUP USA.
Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110(45), 18327–18332.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115.
Biggio, B., Corona, I., Maiorca, D., Nelson, B., Srndic, N., Laskov, P., Giacinto, G., & Roli, F. (2013). Evasion attacks against machine learning at test time. In ECML/PKDD (3), Springer, Lecture Notes in Computer Science (Vol. 8190, pp. 387–402).
Bowyer, KW., Kranenburg, C., & Dougherty, S. (1999). Edge detector evaluation using empirical ROC curves. In CVPR, IEEE Computer Society (pp. 1354–1359).
Boyden, E. S., Zhang, F., Bamberg, E., Nagel, G., & Deisseroth, K. (2005). Millisecond-timescale, genetically targeted optical control of neural activity. Nature Neuroscience, 8(9), 1263.
Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency (pp. 77–91).
Canny, J. F. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6), 679–698.
Chang, AX., Funkhouser, TA., Guibas, LJ., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., & Yu, F. (2015). Shapenet: An information-rich 3d model repository. CoRR abs/1512.03012.
Changizi, M. (2010). The vision revolution: How the latest research overturns everything we thought we knew about human vision. Benbella Books.
Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2018). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834–848.
Chen, X., & Yuille, AL. (2014). Articulated pose estimation by a graphical model with image dependent pairwise relations. In NIPS (pp. 1736–1744).
Chen, X., & Yuille, AL. (2015). Parsing occluded people by flexible compositions. In CVPR, IEEE Computer Society (pp. 3945–3954).
Chen, Y., Zhu, L., Lin, C., Yuille, AL., & Zhang, H. (2007). Rapid inference on a novel AND/OR graph for object detection, segmentation and parsing. In NIPS, Curran Associates, Inc. (pp. 289–296).
Chomsky, N. (2014). Aspects of the theory of syntax. Cambridge: MIT Press.
Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A., & Oliva, A. (2016). Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Scientific Reports, 6, 27755.
Clune, J., Mouret, J. B., & Lipson, H. (2013). The evolutionary origins of modularity. Proceedings of the Royal Society B: Biological Sciences, 280(1755), 20122863.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. MCSS, 2(4), 303–314.
Darwiche, A. (2018). Human-level intelligence or animal-like abilities? Communications of the ACM, 61(10), 56–67.
Deng, J., Dong, W., Socher, R., Li, L., Li, K., & Li, F. (2009). Imagenet: A large-scale hierarchical image database. In CVPR, IEEE Computer Society (pp. 248–255).
Doersch, C., Gupta, A., & Efros, AA. (2015). Unsupervised visual representation learning by context prediction. In ICCV, IEEE Computer Society (pp. 1422–1430).
Eigen, D., Puhrsch, C., & Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. In NIPS (pp. 2366–2374).
Everingham, M., Gool, L. J. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Firestone, C. (2020). Performance versus competence in human-machine comparisons. Proceedings of the National Academy of Sciences, in press.
Fukushima, K., & Miyake, S. (1982). Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. In Competition and cooperation in neural nets (pp. 267–285). Berlin: Springer.
Geisler, W. S. (2011). Contributions of ideal observer theory to vision research. Vision Research, 51(7), 771–781.
Geman, S. (2007). Compositionality in vision. In The grammar of vision: Probabilistic grammar-based models for visual scene understanding and object categorization.
George, D., Lehrach, W., Kansky, K., Lázaro-Gredilla, M., Laan, C., Marthi, B., et al. (2017). A generative vision model that trains with high data efficiency and breaks text-based captchas. Science, 358(6368), eaag2612.
Gibson, J. J. (1986). The ecological approach to visual perception. Hove: Psychology Press.
Girshick, RB., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, IEEE Computer Society (pp. 580–587).
Goodfellow, IJ., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, AC., & Bengio, Y. (2014). Generative adversarial nets. In NIPS (pp. 2672–2680).
Goodfellow, IJ., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. In International Conference on Learning Representations.
Gopnik, A., Meltzoff, A. N., & Kuhl, P. K. (1999). The scientist in the crib: Minds, brains, and how children learn. New York: William Morrow and Co.
Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2004). A theory of causal learning in children: Causal maps and bayes nets. Psychological Review, 111(1), 3.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New Jersey: John Wiley.
Gregoriou, G. G., Rossi, A. F., Ungerleider, L. G., & Desimone, R. (2014). Lesions of prefrontal cortex reduce attentional modulation of neuronal responses and synchrony in v4. Nature Neuroscience, 17(7), 1003–1011.
Gregory, R. L. (1973). Eye and brain: The psychology of seeing. New York: McGraw-Hill.
Grenander, U. (1993). General pattern theory: A mathematical study of regular structures. Oxford: Clarendon Press.
Guu, K., Pasupat, P., Liu, EZ., & Liang, P. (2017). From language to programs: Bridging reinforcement learning and maximum marginal likelihood. In ACL (1), Association for Computational Linguistics (pp. 1051–1062).
Guzmán, A. (1968). Decomposition of a visual scene into three-dimensional bodies. In Proceedings of the December 9–11, 1968, Fall Joint Computer Conference, Part I (pp. 291–304).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR, IEEE Computer Society (pp. 770–778).
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, RB. (2019). Momentum contrast for unsupervised visual representation learning. CoRR abs/1911.05722.
Hoffman, J., Tzeng, E., Park, T., Zhu, J., Isola, P., Saenko, K., et al. (2018). Cycada: Cycle-consistent adversarial domain adaptation. ICML, PMLR, Proceedings of Machine Learning Research, 80, 1994–2003.
Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In ECCV (3), Springer, Lecture Notes in Computer Science (Vol. 7574, pp. 340–353).
Hornik, K., Stinchcombe, M. B., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, JMLR.org, JMLR Workshop and Conference Proceedings (Vol. 37, pp. 448–456).
Jabr, F. (2012). The connectome debate: Is mapping the mind of a worm worth it? New York: Scientific American.
Jégou, S., Drozdzal, M., Vázquez, D., Romero, A., & Bengio, Y. (2017). The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In CVPR Workshops, IEEE Computer Society (pp. 1175–1183).
Julesz, B. (1971). Foundations of cyclopean perception. Chicago: U. Chicago Press.
Kaushik, D., Hovy, EH., & Lipton, ZC. (2020). Learning the difference that makes a difference with counterfactually-augmented data. In ICLR, OpenReview.net.
Kokkinos, I. (2017). Ubernet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In CVPR, IEEE Computer Society (pp. 5454–5463).
Konishi, S., Yuille, AL., Coughlan, JM., & Zhu, SC. (1999). Fundamental bounds on edge detection: An information theoretic evaluation of different edge cues. In CVPR, IEEE Computer Society (pp. 1573–1579).
Konishi, S., Yuille, A. L., Coughlan, J. M., & Zhu, S. C. (2003). Statistical edge detection: Learning and evaluating edge cues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(1), 57–74.
Kortylewski, A., He, J., Liu, Q., & Yuille, AL. (2020). Compositional convolutional neural networks: A deep architecture with innate robustness to partial occlusion. CoRR abs/2003.04490.
Kortylewski, A., Liu, Q., Wang, H., Zhang, Z., & Yuille, AL. (2020). Combining compositional models and deep networks for robust object classification under occlusion. In WACV, IEEE (pp. 1322–1330).
Krizhevsky, A., Sutskever, I., & Hinton, GE. (2012). Imagenet classification with deep convolutional neural networks. In NIPS (pp. 1106–1114).
LeCun, Y., Boser, B. E., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. E., et al. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.
Lee, T. S., & Mumford, D. (2003). Hierarchical bayesian inference in the visual cortex. JOSA A, 20(7), 1434–1448.
Lin, X., Wang, H., Li, Z., Zhang, Y., Yuille, AL., & Lee, TS. (2017). Transfer of view-manifold learning to similarity perception of novel objects. In International Conference on Learning Representations.
Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L., Fei-Fei, L., Yuille, AL., Huang, J., & Murphy, K. (2018). Progressive neural architecture search. In ECCV (1), Springer, Lecture Notes in Computer Science (Vol. 11205, pp. 19–35).
Liu, C., Dollár, P., He, K., Girshick, RB., Yuille, AL., & Xie, S. (2020). Are labels necessary for neural architecture search? CoRR abs/2003.12056.
Liu, R., Liu, C., Bai, Y., & Yuille, AL. (2019). Clevr-ref+: Diagnosing visual reasoning with referring expressions. In CVPR, Computer Vision Foundation/IEEE (pp. 4185–4194).
Liu, Z., Knill, D. C., & Kersten, D. (1995). Object classification for human and ideal observers. Vision Research, 35(4), 549–568.
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In CVPR, IEEE Computer Society (pp. 3431–3440).
Lu, H., & Yuille, AL. (2005). Ideal observers for detecting motion: Correspondence noise. In NIPS (pp. 827–834).
Lyu, J., Qiu, W., Wei, X., Zhang, Y., Yuille, AL., & Zha, Z. (2019). Identity preserve transform: Understand what activity classification models have learnt. CoRR abs/1912.06314.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. CoRR abs/1706.06083.
Mao, J., Wei, X., Yang, Y., Wang, J., Huang, Z., & Yuille, AL. (2015). Learning like a child: Fast novel visual concept learning from sentence descriptions of images. In ICCV, IEEE Computer Society (pp. 2533–2541).
Marcus, G. (2018). Deep learning: A critical appraisal. CoRR abs/1801.00631.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York: Henry Holt and Co. Inc.
Mayer, N., Ilg, E., Häusser, P., Fischer, P., Cremers, D., Dosovitskiy, A., & Brox, T. (2016). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In CVPR, IEEE Computer Society (pp. 4040–4048).
McManus, J. N., Li, W., & Gilbert, C. D. (2011). Adaptive shape processing in primary visual cortex. Proceedings of the National Academy of Sciences, 108(24), 9739–9746.
Mengistu, H., Huizinga, J., Mouret, J., & Clune, J. (2016). The evolutionary origins of hierarchy. PLoS Computational Biology, 12(6), e1004829.
Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. CoRR abs/1411.1784.
Mu, J., Qiu, W., Hager, GD., & Yuille, AL. (2019). Learning from synthetic animals. CoRR abs/1912.08265.
Mumford, D. (1994). Pattern theory: A unifying perspective. In First European Congress of Mathematics, Springer (pp. 187–224).
Mumford, D., & Desolneux, A. (2010). Pattern theory: The stochastic analysis of real-world signals. Cambridge: CRC Press.
Noroozi, M., & Favaro, P. (2016). Unsupervised learning of visual representations by solving jigsaw puzzles. In ECCV (6), Springer, Lecture Notes in Computer Science (Vol. 9910, pp. 69–84).
Papandreou, G., Chen, L., Murphy, KP., & Yuille, AL. (2015). Weakly- and semi-supervised learning of a deep convolutional network for semantic image segmentation. In ICCV, IEEE Computer Society (pp. 1742–1750).
Pearl, J. (1989). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann series in representation and reasoning, Morgan Kaufmann.
Pearl, J. (2009). Causality. Cambridge: Cambridge University Press.
Penn, D. C., Holyoak, K. J., & Povinelli, D. J. (2008). Darwin’s mistake: Explaining the discontinuity between human and nonhuman minds. Behavioral and Brain Sciences, 31(2), 109–130.
Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., & Dean, J. (2018). Efficient neural architecture search via parameter sharing. ICML, PMLR, Proceedings of Machine Learning Research, 80, 4092–4101.
Poirazi, P., & Mel, B. W. (2001). Impact of active dendrites and structural plasticity on the memory capacity of neural tissue. Neuron, 29(3), 779–796.
Qiao, S., Liu, C., Shen, W., & Yuille, AL. (2018). Few-shot image recognition by predicting parameters from activations. In CVPR, IEEE Computer Society (pp. 7229–7238).
Qiu, W., & Yuille, AL. (2016). Unrealcv: Connecting computer vision to unreal engine. In ECCV Workshops (3), Lecture Notes in Computer Science (Vol. 9915, pp. 909–916).
Ren, S., He, K., Girshick, RB., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS (pp. 91–99).
Ren, Z., Yan, J., Ni, B., Liu, B., Yang, X., & Zha, H. (2017). Unsupervised deep learning for optical flow estimation. In AAAI, AAAI Press (pp. 1495–1501).
Rensink, R. A., O’Regan, J. K., & Clark, J. J. (1997). To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8(5), 368–373.
Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1019.
Rosenfeld, A., Zemel, RS., & Tsotsos, JK. (2018). The elephant in the room. CoRR abs/1808.03305.
Rother, C., Kolmogorov, V., & Blake, A. (2004). “Grabcut”: Interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23(3), 309–314.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
Russell, S. J., & Norvig, P. (2010). Artificial Intelligence: A Modern Approach (Third International Edition). Pearson Education.
Sabour, S., Frosst, N., & Hinton, GE. (2017). Dynamic routing between capsules. In NIPS (pp. 3856–3866).
Salakhutdinov, R., Tenenbaum, JB., & Torralba, A. (2012). One-shot learning with a hierarchical nonparametric bayesian model. In ICML Unsupervised and Transfer Learning, JMLR.org, JMLR Proceedings (Vol. 27, pp. 195–206).
Santoro, A., Hill, F., Barrett, DGT., Morcos, AS., & Lillicrap, TP. (2018). Measuring abstract reasoning in neural networks. In ICML, JMLR.org, JMLR Workshop and Conference Proceedings (Vol. 80, pp. 4477–4486).
Seung, S. (2012). Connectome: How the brain’s wiring makes us who we are. HMH.
Shen, W., Zhao, K., Jiang, Y., Wang, Y., Bai, X., & Yuille, A. L. (2017a). Deepskeleton: Learning multi-task scale-associated deep side outputs for object skeleton extraction in natural images. IEEE Transactions on Image Processing, 26(11), 5298–5311.
Shen, Z., Liu, Z., Li, J., Jiang, Y., Chen, Y., & Xue, X. (2017). DSOD: Learning deeply supervised object detectors from scratch. In ICCV, IEEE Computer Society (pp. 1937–1945).
Shu, M., Liu, C., Qiu, W., & Yuille, AL. (2020). Identifying model weakness with adversarial examiner. In AAAI, AAAI Press (pp. 11998–12006).
Simons, D. J., & Chabris, C. F. (1999). Gorillas in our midst: Sustained inattentional blindness for dynamic events. Perception, 28(9), 1059–1074.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations.
Smirnakis, SM., & Yuille, AL. (1995). Neural implementation of bayesian vision theories by unsupervised learning. In The Neurobiology of Computation, Springer (pp. 427–432).
Smith, L., & Gasser, M. (2005). The development of embodied cognition: Six lessons from babies. Artificial Life, 11(1–2), 13–29.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, IJ., & Fergus, R. (2014). Intriguing properties of neural networks. In International Conference on Learning Representations.
Tjan, B. S., Braje, W. L., Legge, G. E., & Kersten, D. (1995). Human efficiency for recognizing 3-d objects in luminance noise. Vision Research, 35(21), 3053–3069.
Torralba, A., & Efros, AA. (2011). Unbiased look at dataset bias. In CVPR, IEEE Computer Society (pp. 1521–1528).
Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., & Madry, A. (2019). Robustness may be at odds with accuracy. In ICLR (Poster), OpenReview.net.
Tu, Z., Chen, X., Yuille, AL., & Zhu, SC. (2003). Image parsing: Unifying segmentation, detection, and recognition. In ICCV, IEEE Computer Society (pp. 18–25).
Uesato, J., O’Donoghue, B., Kohli, P., & van den Oord, A. (2018). Adversarial risk and the dangers of evaluating against weak attacks. ICML, PMLR, Proceedings of Machine Learning Research, 80, 5032–5041.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, AN., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In NIPS (pp. 5998–6008).
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., & Wierstra, D. (2016). Matching networks for one shot learning. In NIPS (pp. 3630–3638).
Wang, J., Zhang, Z., Premachandran, V., & Yuille, AL. (2015). Discovering internal representations from object-cnns using population encoding. CoRR abs/1511.06855.
Wang, J., Zhang, Z., Xie, C., Zhou, Y., Premachandran, V., Zhu, J., et al. (2018). Visual concepts and compositional voting. Annals of Mathematical Sciences and Applications, 2(3), 4.
Wang, P., & Yuille, AL. (2016). DOC: Deep occlusion estimation from a single image. In ECCV (1), Springer, Lecture Notes in Computer Science (Vol. 9905, pp. 545–561).
Wang, T., Zhao, J., Yatskar, M., Chang, K., & Ordonez, V. (2019). Balanced datasets are not enough: Estimating and mitigating gender bias in deep image representations. In ICCV, IEEE (pp. 5309–5318).
Wang, X., & Gupta, A. (2015). Unsupervised learning of visual representations using videos. In ICCV, IEEE Computer Society (pp. 2794–2802).
Wen, H., Shi, J., Zhang, Y., Lu, K. H., Cao, J., & Liu, Z. (2017). Neural encoding and decoding with deep learning for dynamic natural vision. Cerebral Cortex, 28, 1–25.
Wu, Z., Xiong, Y., Yu, SX., & Lin, D. (2018). Unsupervised feature learning via non-parametric instance discrimination. In CVPR, IEEE Computer Society (pp. 3733–3742).
Xia, F., Wang, P., Chen, L., & Yuille, AL. (2016). Zoom better to see clearer: Human and object parsing with hierarchical auto-zoom net. In ECCV (5), Springer, Lecture Notes in Computer Science (Vol. 9909, pp. 648–663).
Xia, Y., Zhang, Y., Liu, F., Shen, W., & Yuille, AL. (2020). Synthesize then compare: Detecting failures and anomalies for semantic segmentation. CoRR abs/2003.08440.
Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., & Yuille, AL. (2017). Adversarial examples for semantic segmentation and object detection. In ICCV, IEEE Computer Society (pp. 1378–1387).
Xie, C., Wang, J., Zhang, Z., Ren, Z., & Yuille, AL. (2018). Mitigating adversarial effects through randomization. In International Conference on Learning Representations.
Xie, L., & Yuille, AL. (2017). Genetic CNN. In ICCV, IEEE Computer Society (pp. 1388–1397).
Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In ICCV, IEEE Computer Society (pp. 1395–1403).
Xu, L., Krzyzak, A., & Yuille, A. L. (1994). On radial basis function nets and kernel regression: Statistical consistency, convergence rates, and receptive field size. Neural Networks, 7(4), 609–628.
Yamane, Y., Carlson, E. T., Bowman, K. C., Wang, Z., & Connor, C. E. (2008). A neural code for three-dimensional object shape in macaque inferotemporal cortex. Nature Neuroscience, 11(11), 1352–1360.
Yamins, D. L., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23), 8619–8624.
Yang, C., Kortylewski, A., Xie, C., Cao, Y., & Yuille, AL. (2020). Patchattack: A black-box texture-based attack with reinforcement learning. CoRR abs/2004.05682.
Yosinski, J., Clune, J., Nguyen, AM., Fuchs, TJ., & Lipson, H. (2015). Understanding neural networks through deep visualization. CoRR abs/1506.06579.
Yuille, A., & Kersten, D. (2006). Vision as bayesian inference: Analysis by synthesis? Trends in Cognitive Sciences, 10(7), 301–308.
Yuille, A. L., & Mottaghi, R. (2016). Complexity of representation and inference in compositional models with part sharing. Journal of Machine Learning Research, 17, 292–319.
Zbontar, J., & LeCun, Y. (2015). Computing the stereo matching cost with a convolutional neural network. In CVPR, IEEE Computer Society (pp. 1592–1599).
Zeiler, MD., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In ECCV (1), Springer, Lecture Notes in Computer Science (Vol. 8689, pp. 818–833).
Zendel, O., Murschitz, M., Humenberger, M., & Herzner, W. (2015). CV-HAZOP: Introducing test data validation for computer vision. In ICCV, IEEE Computer Society (pp. 2066–2074).
Zhang, R., Isola, P., & Efros, AA. (2016). Colorful image colorization. In ECCV (3), Springer, Lecture Notes in Computer Science (Vol. 9907, pp. 649–666).
Zhang, Y., Qiu, W., Chen, Q., Hu, X., & Yuille, AL. (2018). Unrealstereo: Controlling hazardous factors to analyze stereo vision. In 3DV, IEEE Computer Society (pp. 228–237).
Zhang, Z., Shen, W., Qiao, S., Wang, Y., Wang, B., & Yuille, AL. (2020). Robust face detection via learning small faces on hard images. In WACV, IEEE (pp. 1350–1359).
Zhou, B., Lapedriza, À., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In NIPS (pp. 487–495).
Zhou, B., Khosla, A., Lapedriza, À., Oliva, A., & Torralba, A. (2015). Object detectors emerge in deep scene cnns. In International Conference on Learning Representations.
Zhou, T., Brown, M., Snavely, N., & Lowe, DG. (2017). Unsupervised learning of depth and ego-motion from video. In CVPR, IEEE Computer Society (pp. 6612–6619).
Zhou, Z., & Firestone, C. (2019). Humans can decipher adversarial images. Nature Communications, 10(1), 1–9.
Zhu, H., Tang, P., Yuille, AL., Park, S., & Park, J. (2019). Robustness of object recognition under extreme occlusion in humans and computational models. In CogSci, cognitivesciencesociety.org (pp. 3213–3219).
Zhu, L., Chen, Y., Torralba, A., Freeman, WT., & Yuille, AL. (2010). Part and appearance sharing: Recursive compositional models for multi-view. In CVPR, IEEE Computer Society (pp. 1919–1926).
Zhu, S., & Mumford, D. (2006). A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision, 2(4), 259–362.
Zhu, Z., Xie, L., & Yuille, AL. (2017). Object recognition with and without objects. In IJCAI, ijcai.org (pp. 3609–3615).
Zitnick, C. L., Agrawal, A., Antol, S., Mitchell, M., Batra, D., & Parikh, D. (2016). Measuring machine intelligence through visual question answering. AI Magazine, 37(1), 63–72.
Zoph, B., & Le, QV. (2017). Neural architecture search with reinforcement learning. In ICLR, OpenReview.net.
Metadata
Title
Deep Nets: What have They Ever Done for Vision?
Authors
Alan L. Yuille
Chenxi Liu
Publication date
27.11.2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01405-z
