Published in: International Journal of Computer Vision | Issue 3/2021

27.11.2020

Deep Nets: What have They Ever Done for Vision?

Authors: Alan L. Yuille, Chenxi Liu

Abstract

This is an opinion paper about the strengths and weaknesses of Deep Nets for vision. They are at the heart of the enormous recent progress in artificial intelligence and are of growing importance in cognitive science and neuroscience. They have had many successes, but they also have several limitations and their inner workings remain poorly understood. At present Deep Nets perform very well on specific visual tasks with benchmark datasets, but they are much less general purpose, flexible, and adaptive than the human visual system. We argue that Deep Nets in their current form are unlikely to overcome the fundamental problem of computer vision, namely how to deal with the combinatorial explosion caused by the enormous complexity of natural images, and so to obtain the rich understanding of visual scenes that the human visual system achieves. We argue that this combinatorial explosion takes us into a regime where “big data is not enough” and where we need to rethink our methods for benchmarking performance and evaluating vision algorithms. We stress that, as vision algorithms are increasingly used in real-world applications, performance evaluation is not merely an academic exercise but has important consequences in the real world. It is impractical to review the entire Deep Net literature, so we restrict ourselves to a limited range of topics and references which are intended as entry points into the literature. The views expressed in this paper are our own and do not necessarily represent those of anybody else in the computer vision community.
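As a rough, back-of-envelope illustration of the combinatorial argument (our own sketch with hypothetical numbers, not taken from the paper): if a scene contains k objects, each of which can appear in one of m states (position, pose, occlusion, lighting), the number of distinct configurations grows as m^k and quickly dwarfs any finite dataset.

```python
# Back-of-envelope sketch (hypothetical numbers) of why a finite dataset covers
# only a vanishing fraction of a combinatorial space of scene configurations.
m = 100           # assumed number of states per object (position, pose, ...)
k = 10            # assumed number of objects in the scene
n_images = 10**9  # assumed (very large) dataset size

configurations = m ** k
print(f"possible configurations: {configurations:.2e}")                    # 1.00e+20
print(f"fraction covered by the dataset: {n_images / configurations:.2e}")  # 1.00e-11
```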


Footnotes
1
The first author remembers that in the mid-1990s and early 2000s the term “neural network” in the title of a submission to a computer vision conference was sadly a good predictor of rejection, and recalls sympathizing with researchers who were pursuing such unfashionable ideas.
 
2
In addition to visualization, training a small neural network (also known as a readout function) on top of deep features is another popular technique for assessing how much those features encode particular properties, and it is now widely adopted in the self-supervised learning literature (Noroozi et al. 2016; Zhang et al. 2016).
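A minimal sketch of such a readout probe (our own illustration in PyTorch; the backbone, data, and sizes are placeholders rather than those of the cited works) freezes the feature extractor and trains only a small classifier on its outputs:

```python
# Minimal readout-probe sketch in PyTorch (placeholder backbone and toy data).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pretrained deep feature extractor whose weights stay frozen.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in backbone.parameters():
    p.requires_grad = False

# The readout function: a small classifier trained on top of frozen features.
readout = nn.Linear(16, 10)
optimizer = torch.optim.SGD(readout.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Toy tensors standing in for images and labels of the probed property.
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))

for step in range(100):
    with torch.no_grad():              # the deep features are never updated
        features = backbone(images)
    loss = loss_fn(readout(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# How well the readout predicts the property on held-out data measures how
# strongly the frozen features encode that property.
```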
 
3
Admittedly, in ResNets (He et al. 2016) there is only one “decision layer”, and the analogy to “template matching” also weakens at higher layers due to the presence of the residual connections.
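For readers unfamiliar with the term, a residual connection simply adds a block’s input back to its output, so higher layers compute corrections to the incoming features rather than fresh templates. A minimal sketch (our own, not the exact block of He et al. 2016):

```python
# Minimal residual block: the output is F(x) + x.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x                      # the skip ("residual") connection
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + residual)  # add the input back before the nonlinearity

x = torch.randn(1, 16, 8, 8)
print(ResidualBlock(16)(x).shape)  # torch.Size([1, 16, 8, 8])
```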
 
5
The issue we are describing is related, more generally, to how Deep Nets can take unintended “shortcut” solutions, for example exploiting the chromatic aberration noticed in Doersch et al. (2015), or the low-level statistics and edge continuity noticed in Noroozi et al. (2016). In this paper we highlight “over-sensitivity to context” as the representative example, both because it is familiar and to keep the discussion contained.
 
6
The first author remembers that, when studying text detection for the visually impaired, we were so concerned about dataset biases that we recruited blind subjects who walked the streets of San Francisco taking images automatically (but found that the main difference from regular images was a greater variety of angles).
 
7
Available from Sowerby Research Centre, British Aerospace.
 
11
Quote during a public talk by a West Coast Professor who, perhaps coincidentally, had a start-up company.
 
References
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Süsstrunk, S. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2274–2282.
Alcorn, MA., Li, Q., Gong, Z., Wang, C., Mai, L., Ku, W., & Nguyen, A. (2019). Strike (with) a pose: Neural networks are easily fooled by strange poses of familiar objects. In CVPR, Computer Vision Foundation/IEEE (pp. 4845–4854).
Andreas, J., Rohrbach, M., Darrell, T., & Klein, D. (2016). Neural module networks. In CVPR, IEEE Computer Society (pp. 39–48).
Arbib, M. A., & Bonaiuto, J. J. (2016). From neuron to cognition via computational neuroscience. Cambridge: MIT Press.
Arterberry, M. E., & Kellman, P. J. (2016). Development of perception in infancy: The cradle of knowledge revisited. Oxford: Oxford University Press.
Athalye, A., Carlini, N., & Wagner, DA. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, JMLR.org, JMLR Workshop and Conference Proceedings (Vol. 80, pp. 274–283).
Barlow, H., & Tripathy, S. P. (1997). Correspondence noise and signal pooling in the detection of coherent visual motion. Journal of Neuroscience, 17(20), 7954–7966.
Bashford, A., & Levine, P. (2010). The Oxford handbook of the history of eugenics. OUP USA.
Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110(45), 18327–18332.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115.
Biggio, B., Corona, I., Maiorca, D., Nelson, B., Srndic, N., Laskov, P., Giacinto, G., & Roli, F. (2013). Evasion attacks against machine learning at test time. In ECML/PKDD (3), Springer, Lecture Notes in Computer Science (Vol. 8190, pp. 387–402).
Bowyer, KW., Kranenburg, C., & Dougherty, S. (1999). Edge detector evaluation using empirical ROC curves. In CVPR, IEEE Computer Society (pp. 1354–1359).
Boyden, E. S., Zhang, F., Bamberg, E., Nagel, G., & Deisseroth, K. (2005). Millisecond-timescale, genetically targeted optical control of neural activity. Nature Neuroscience, 8(9), 1263.
Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency (pp. 77–91).
Canny, J. F. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6), 679–698.
Chang, AX., Funkhouser, TA., Guibas, LJ., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., & Yu, F. (2015). Shapenet: An information-rich 3d model repository. CoRR abs/1512.03012.
Changizi, M. (2010). The vision revolution: How the latest research overturns everything we thought we knew about human vision. Benbella Books.
Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2018). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834–848.
Chen, X., & Yuille, AL. (2014). Articulated pose estimation by a graphical model with image dependent pairwise relations. In NIPS (pp. 1736–1744).
Chen, X., & Yuille, AL. (2015). Parsing occluded people by flexible compositions. In CVPR, IEEE Computer Society (pp. 3945–3954).
Chen, Y., Zhu, L., Lin, C., Yuille, AL., & Zhang, H. (2007). Rapid inference on a novel AND/OR graph for object detection, segmentation and parsing. In NIPS, Curran Associates, Inc. (pp. 289–296).
Chomsky, N. (2014). Aspects of the theory of syntax. Cambridge: MIT Press.
Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A., & Oliva, A. (2016). Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Scientific Reports, 6, 27755.
Clune, J., Mouret, J. B., & Lipson, H. (2013). The evolutionary origins of modularity. Proceedings of the Royal Society B: Biological Sciences, 280(1755), 20122863.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. MCSS, 2(4), 303–314.
Darwiche, A. (2018). Human-level intelligence or animal-like abilities? Communications of the ACM, 61(10), 56–67.
Deng, J., Dong, W., Socher, R., Li, L., Li, K., & Li, F. (2009). Imagenet: A large-scale hierarchical image database. In CVPR, IEEE Computer Society (pp. 248–255).
Doersch, C., Gupta, A., & Efros, AA. (2015). Unsupervised visual representation learning by context prediction. In ICCV, IEEE Computer Society (pp. 1422–1430).
Eigen, D., Puhrsch, C., & Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. In NIPS (pp. 2366–2374).
Everingham, M., Gool, L. J. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Firestone, C. (2020). Performance versus competence in human-machine comparisons. Proceedings of the National Academy of Sciences, in press.
Fukushima, K., & Miyake, S. (1982). Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. In Competition and cooperation in neural nets (pp. 267–285). Berlin: Springer.
Geisler, W. S. (2011). Contributions of ideal observer theory to vision research. Vision Research, 51(7), 771–781.
Geman, S. (2007). Compositionality in vision. In The grammar of vision: Probabilistic grammar-based models for visual scene understanding and object categorization.
George, D., Lehrach, W., Kansky, K., Lázaro-Gredilla, M., Laan, C., Marthi, B., et al. (2017). A generative vision model that trains with high data efficiency and breaks text-based captchas. Science, 358(6368), eaag2612.
Gibson, J. J. (1986). The ecological approach to visual perception. Hove: Psychology Press.
Girshick, RB., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, IEEE Computer Society (pp. 580–587).
Goodfellow, IJ., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, AC., & Bengio, Y. (2014). Generative adversarial nets. In NIPS (pp. 2672–2680).
Goodfellow, IJ., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. In International Conference on Learning Representations.
Gopnik, A., Meltzoff, A. N., & Kuhl, P. K. (1999). The scientist in the crib: Minds, brains, and how children learn. New York: William Morrow and Co.
Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2004). A theory of causal learning in children: Causal maps and bayes nets. Psychological Review, 111(1), 3.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New Jersey: John Wiley.
Gregoriou, G. G., Rossi, A. F., Ungerleider, L. G., & Desimone, R. (2014). Lesions of prefrontal cortex reduce attentional modulation of neuronal responses and synchrony in v4. Nature Neuroscience, 17(7), 1003–1011.
Gregory, R. L. (1973). Eye and brain: The psychology of seeing. New York: McGraw-Hill.
Grenander, U. (1993). General pattern theory: A mathematical study of regular structures. Oxford: Clarendon Press.
Guu, K., Pasupat, P., Liu, EZ., & Liang, P. (2017). From language to programs: Bridging reinforcement learning and maximum marginal likelihood. In ACL (1), Association for Computational Linguistics (pp. 1051–1062).
Guzmán, A. (1968). Decomposition of a visual scene into three-dimensional bodies. In Proceedings of the December 9–11, 1968, Fall Joint Computer Conference, Part I (pp. 291–304).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR, IEEE Computer Society (pp. 770–778).
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, RB. (2019). Momentum contrast for unsupervised visual representation learning. CoRR abs/1911.05722.
Hoffman, J., Tzeng, E., Park, T., Zhu, J., Isola, P., Saenko, K., et al. (2018). Cycada: Cycle-consistent adversarial domain adaptation. ICML, PMLR, Proceedings of Machine Learning Research, 80, 1994–2003.
Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In ECCV (3), Springer, Lecture Notes in Computer Science (Vol. 7574, pp. 340–353).
Hornik, K., Stinchcombe, M. B., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, JMLR.org, JMLR Workshop and Conference Proceedings (Vol. 37, pp. 448–456).
Jabr, F. (2012). The connectome debate: Is mapping the mind of a worm worth it? New York: Scientific American.
Jégou, S., Drozdzal, M., Vázquez, D., Romero, A., & Bengio, Y. (2017). The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In CVPR Workshops, IEEE Computer Society (pp. 1175–1183).
Julesz, B. (1971). Foundations of cyclopean perception. Chicago: U. Chicago Press.
Kaushik, D., Hovy, EH., & Lipton, ZC. (2020). Learning the difference that makes a difference with counterfactually-augmented data. In ICLR, OpenReview.net.
Kokkinos, I. (2017). Ubernet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In CVPR, IEEE Computer Society (pp. 5454–5463).
Konishi, S., Yuille, AL., Coughlan, JM., & Zhu, SC. (1999). Fundamental bounds on edge detection: An information theoretic evaluation of different edge cues. In CVPR, IEEE Computer Society (pp. 1573–1579).
Konishi, S., Yuille, A. L., Coughlan, J. M., & Zhu, S. C. (2003). Statistical edge detection: Learning and evaluating edge cues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(1), 57–74.
Kortylewski, A., He, J., Liu, Q., & Yuille, AL. (2020). Compositional convolutional neural networks: A deep architecture with innate robustness to partial occlusion. CoRR abs/2003.04490.
Kortylewski, A., Liu, Q., Wang, H., Zhang, Z., & Yuille, AL. (2020). Combining compositional models and deep networks for robust object classification under occlusion. In WACV, IEEE (pp. 1322–1330).
Krizhevsky, A., Sutskever, I., & Hinton, GE. (2012). Imagenet classification with deep convolutional neural networks. In NIPS (pp. 1106–1114).
LeCun, Y., Boser, B. E., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. E., et al. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.
Lee, T. S., & Mumford, D. (2003). Hierarchical bayesian inference in the visual cortex. JOSA A, 20(7), 1434–1448.
Lin, X., Wang, H., Li, Z., Zhang, Y., Yuille, AL., & Lee, TS. (2017). Transfer of view-manifold learning to similarity perception of novel objects. In International Conference on Learning Representations.
Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L., Fei-Fei, L., Yuille, AL., Huang, J., & Murphy, K. (2018). Progressive neural architecture search. In ECCV (1), Springer, Lecture Notes in Computer Science (Vol. 11205, pp. 19–35).
Liu, C., Dollár, P., He, K., Girshick, RB., Yuille, AL., & Xie, S. (2020). Are labels necessary for neural architecture search? CoRR abs/2003.12056.
Liu, R., Liu, C., Bai, Y., & Yuille, AL. (2019). Clevr-ref+: Diagnosing visual reasoning with referring expressions. In CVPR, Computer Vision Foundation/IEEE (pp. 4185–4194).
Liu, Z., Knill, D. C., & Kersten, D. (1995). Object classification for human and ideal observers. Vision Research, 35(4), 549–568.
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In CVPR, IEEE Computer Society (pp. 3431–3440).
Lu, H., & Yuille, AL. (2005). Ideal observers for detecting motion: Correspondence noise. In NIPS (pp. 827–834).
Lyu, J., Qiu, W., Wei, X., Zhang, Y., Yuille, AL., & Zha, Z. (2019). Identity preserve transform: Understand what activity classification models have learnt. CoRR abs/1912.06314.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. CoRR abs/1706.06083.
Mao, J., Wei, X., Yang, Y., Wang, J., Huang, Z., & Yuille, AL. (2015). Learning like a child: Fast novel visual concept learning from sentence descriptions of images. In ICCV, IEEE Computer Society (pp. 2533–2541).
Marcus, G. (2018). Deep learning: A critical appraisal. CoRR abs/1801.00631.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York: Henry Holt and Co. Inc.
Mayer, N., Ilg, E., Häusser, P., Fischer, P., Cremers, D., Dosovitskiy, A., & Brox, T. (2016). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In CVPR, IEEE Computer Society (pp. 4040–4048).
McManus, J. N., Li, W., & Gilbert, C. D. (2011). Adaptive shape processing in primary visual cortex. Proceedings of the National Academy of Sciences, 108(24), 9739–9746.
Mengistu, H., Huizinga, J., Mouret, J., & Clune, J. (2016). The evolutionary origins of hierarchy. PLoS Computational Biology, 12(6), e1004829.
Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. CoRR abs/1411.1784.
Mu, J., Qiu, W., Hager, GD., & Yuille, AL. (2019). Learning from synthetic animals. CoRR abs/1912.08265.
Mumford, D. (1994). Pattern theory: A unifying perspective. In First European Congress of Mathematics, Springer (pp. 187–224).
Mumford, D., & Desolneux, A. (2010). Pattern theory: The stochastic analysis of real-world signals. Cambridge: CRC Press.
Noroozi, M., & Favaro, P. (2016). Unsupervised learning of visual representations by solving jigsaw puzzles. In ECCV (6), Springer, Lecture Notes in Computer Science (Vol. 9910, pp. 69–84).
Papandreou, G., Chen, L., Murphy, KP., & Yuille, AL. (2015). Weakly- and semi-supervised learning of a deep convolutional network for semantic image segmentation. In ICCV, IEEE Computer Society (pp. 1742–1750).
Pearl, J. (1989). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann series in representation and reasoning, Morgan Kaufmann.
Pearl, J. (2009). Causality. Cambridge: Cambridge University Press.
Penn, D. C., Holyoak, K. J., & Povinelli, D. J. (2008). Darwin’s mistake: Explaining the discontinuity between human and nonhuman minds. Behavioral and Brain Sciences, 31(2), 109–130.
Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., & Dean, J. (2018). Efficient neural architecture search via parameter sharing. ICML, PMLR, Proceedings of Machine Learning Research, 80, 4092–4101.
Poirazi, P., & Mel, B. W. (2001). Impact of active dendrites and structural plasticity on the memory capacity of neural tissue. Neuron, 29(3), 779–796.
Qiao, S., Liu, C., Shen, W., & Yuille, AL. (2018). Few-shot image recognition by predicting parameters from activations. In CVPR, IEEE Computer Society (pp. 7229–7238).
Qiu, W., & Yuille, AL. (2016). Unrealcv: Connecting computer vision to unreal engine. In ECCV Workshops (3), Lecture Notes in Computer Science (Vol. 9915, pp. 909–916).
Ren, S., He, K., Girshick, RB., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS (pp. 91–99).
Ren, Z., Yan, J., Ni, B., Liu, B., Yang, X., & Zha, H. (2017). Unsupervised deep learning for optical flow estimation. In AAAI, AAAI Press (pp. 1495–1501).
Rensink, R. A., O’Regan, J. K., & Clark, J. J. (1997). To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8(5), 368–373.
Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1019.
Rosenfeld, A., Zemel, RS., & Tsotsos, JK. (2018). The elephant in the room. CoRR abs/1808.03305.
Rother, C., Kolmogorov, V., & Blake, A. (2004). “Grabcut”: Interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23(3), 309–314.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
Russell, S. J., & Norvig, P. (2010). Artificial Intelligence: A Modern Approach (Third International Edition). Pearson Education.
Sabour, S., Frosst, N., & Hinton, GE. (2017). Dynamic routing between capsules. In NIPS (pp. 3856–3866).
Salakhutdinov, R., Tenenbaum, JB., & Torralba, A. (2012). One-shot learning with a hierarchical nonparametric bayesian model. In ICML Unsupervised and Transfer Learning, JMLR.org, JMLR Proceedings (Vol. 27, pp. 195–206).
Santoro, A., Hill, F., Barrett, DGT., Morcos, AS., & Lillicrap, TP. (2018). Measuring abstract reasoning in neural networks. In ICML, JMLR.org, JMLR Workshop and Conference Proceedings (Vol. 80, pp. 4477–4486).
Seung, S. (2012). Connectome: How the brain’s wiring makes us who we are. HMH.
Shen, W., Zhao, K., Jiang, Y., Wang, Y., Bai, X., & Yuille, A. L. (2017a). Deepskeleton: Learning multi-task scale-associated deep side outputs for object skeleton extraction in natural images. IEEE Transactions on Image Processing, 26(11), 5298–5311.
Shen, Z., Liu, Z., Li, J., Jiang, Y., Chen, Y., & Xue, X. (2017). DSOD: Learning deeply supervised object detectors from scratch. In ICCV, IEEE Computer Society (pp. 1937–1945).
Shu, M., Liu, C., Qiu, W., & Yuille, AL. (2020). Identifying model weakness with adversarial examiner. In AAAI, AAAI Press (pp. 11998–12006).
Simons, D. J., & Chabris, C. F. (1999). Gorillas in our midst: Sustained inattentional blindness for dynamic events. Perception, 28(9), 1059–1074.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations.
Smirnakis, SM., & Yuille, AL. (1995). Neural implementation of bayesian vision theories by unsupervised learning. In The Neurobiology of Computation, Springer (pp. 427–432).
Smith, L., & Gasser, M. (2005). The development of embodied cognition: Six lessons from babies. Artificial Life, 11(1–2), 13–29.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, IJ., & Fergus, R. (2014). Intriguing properties of neural networks. In International Conference on Learning Representations.
Tjan, B. S., Braje, W. L., Legge, G. E., & Kersten, D. (1995). Human efficiency for recognizing 3-d objects in luminance noise. Vision Research, 35(21), 3053–3069.
Torralba, A., & Efros, AA. (2011). Unbiased look at dataset bias. In CVPR, IEEE Computer Society (pp. 1521–1528).
Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., & Madry, A. (2019). Robustness may be at odds with accuracy. In ICLR (Poster), OpenReview.net.
Tu, Z., Chen, X., Yuille, AL., & Zhu, SC. (2003). Image parsing: Unifying segmentation, detection, and recognition. In ICCV, IEEE Computer Society (pp. 18–25).
Uesato, J., O’Donoghue, B., Kohli, P., & van den Oord, A. (2018). Adversarial risk and the dangers of evaluating against weak attacks. ICML, PMLR, Proceedings of Machine Learning Research, 80, 5032–5041.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, AN., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In NIPS (pp. 5998–6008).
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., & Wierstra, D. (2016). Matching networks for one shot learning. In NIPS (pp. 3630–3638).
Wang, J., Zhang, Z., Premachandran, V., & Yuille, AL. (2015). Discovering internal representations from object-cnns using population encoding. CoRR abs/1511.06855.
Wang, J., Zhang, Z., Xie, C., Zhou, Y., Premachandran, V., Zhu, J., et al. (2018). Visual concepts and compositional voting. Annals of Mathematical Sciences and Applications, 2(3), 4.
Wang, P., & Yuille, AL. (2016). DOC: Deep occlusion estimation from a single image. In ECCV (1), Springer, Lecture Notes in Computer Science (Vol. 9905, pp. 545–561).
Wang, T., Zhao, J., Yatskar, M., Chang, K., & Ordonez, V. (2019). Balanced datasets are not enough: Estimating and mitigating gender bias in deep image representations. In ICCV, IEEE (pp. 5309–5318).
Wang, X., & Gupta, A. (2015). Unsupervised learning of visual representations using videos. In ICCV, IEEE Computer Society (pp. 2794–2802).
Wen, H., Shi, J., Zhang, Y., Lu, K. H., Cao, J., & Liu, Z. (2017). Neural encoding and decoding with deep learning for dynamic natural vision. Cerebral Cortex, 28, 1–25.
Wu, Z., Xiong, Y., Yu, SX., & Lin, D. (2018). Unsupervised feature learning via non-parametric instance discrimination. In CVPR, IEEE Computer Society (pp. 3733–3742).
Xia, F., Wang, P., Chen, L., & Yuille, AL. (2016). Zoom better to see clearer: Human and object parsing with hierarchical auto-zoom net. In ECCV (5), Springer, Lecture Notes in Computer Science (Vol. 9909, pp. 648–663).
Xia, Y., Zhang, Y., Liu, F., Shen, W., & Yuille, AL. (2020). Synthesize then compare: Detecting failures and anomalies for semantic segmentation. CoRR abs/2003.08440.
Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., & Yuille, AL. (2017). Adversarial examples for semantic segmentation and object detection. In ICCV, IEEE Computer Society (pp. 1378–1387).
Xie, C., Wang, J., Zhang, Z., Ren, Z., & Yuille, AL. (2018). Mitigating adversarial effects through randomization. In International Conference on Learning Representations.
Xie, L., & Yuille, AL. (2017). Genetic CNN. In ICCV, IEEE Computer Society (pp. 1388–1397).
Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In ICCV, IEEE Computer Society (pp. 1395–1403).
Xu, L., Krzyzak, A., & Yuille, A. L. (1994). On radial basis function nets and kernel regression: Statistical consistency, convergence rates, and receptive field size. Neural Networks, 7(4), 609–628.
Yamane, Y., Carlson, E. T., Bowman, K. C., Wang, Z., & Connor, C. E. (2008). A neural code for three-dimensional object shape in macaque inferotemporal cortex. Nature Neuroscience, 11(11), 1352–1360.
Yamins, D. L., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23), 8619–8624.
Yang, C., Kortylewski, A., Xie, C., Cao, Y., & Yuille, AL. (2020). Patchattack: A black-box texture-based attack with reinforcement learning. CoRR abs/2004.05682.
Yosinski, J., Clune, J., Nguyen, AM., Fuchs, TJ., & Lipson, H. (2015). Understanding neural networks through deep visualization. CoRR abs/1506.06579.
Yuille, A., & Kersten, D. (2006). Vision as bayesian inference: Analysis by synthesis? Trends in Cognitive Sciences, 10(7), 301–308.
Yuille, A. L., & Mottaghi, R. (2016). Complexity of representation and inference in compositional models with part sharing. Journal of Machine Learning Research, 17, 292–319.
Zbontar, J., & LeCun, Y. (2015). Computing the stereo matching cost with a convolutional neural network. In CVPR, IEEE Computer Society (pp. 1592–1599).
Zeiler, MD., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In ECCV (1), Springer, Lecture Notes in Computer Science (Vol. 8689, pp. 818–833).
Zendel, O., Murschitz, M., Humenberger, M., & Herzner, W. (2015). CV-HAZOP: Introducing test data validation for computer vision. In ICCV, IEEE Computer Society (pp. 2066–2074).
Zhang, R., Isola, P., & Efros, AA. (2016). Colorful image colorization. In ECCV (3), Springer, Lecture Notes in Computer Science (Vol. 9907, pp. 649–666).
Zhang, Y., Qiu, W., Chen, Q., Hu, X., & Yuille, AL. (2018). Unrealstereo: Controlling hazardous factors to analyze stereo vision. In 3DV, IEEE Computer Society (pp. 228–237).
Zhang, Z., Shen, W., Qiao, S., Wang, Y., Wang, B., & Yuille, AL. (2020). Robust face detection via learning small faces on hard images. In WACV, IEEE (pp. 1350–1359).
Zhou, B., Lapedriza, À., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In NIPS (pp. 487–495).
Zhou, B., Khosla, A., Lapedriza, À., Oliva, A., & Torralba, A. (2015). Object detectors emerge in deep scene cnns. In International Conference on Learning Representations.
Zhou, T., Brown, M., Snavely, N., & Lowe, DG. (2017). Unsupervised learning of depth and ego-motion from video. In CVPR, IEEE Computer Society (pp. 6612–6619).
Zhou, Z., & Firestone, C. (2019). Humans can decipher adversarial images. Nature Communications, 10(1), 1–9.
Zhu, H., Tang, P., Yuille, AL., Park, S., & Park, J. (2019). Robustness of object recognition under extreme occlusion in humans and computational models. In CogSci, cognitivesciencesociety.org (pp. 3213–3219).
Zhu, L., Chen, Y., Torralba, A., Freeman, WT., & Yuille, AL. (2010). Part and appearance sharing: Recursive compositional models for multi-view. In CVPR, IEEE Computer Society (pp. 1919–1926).
Zhu, S., & Mumford, D. (2006). A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision, 2(4), 259–362.
Zhu, Z., Xie, L., & Yuille, AL. (2017). Object recognition with and without objects. In IJCAI, ijcai.org (pp. 3609–3615).
Zitnick, C. L., Agrawal, A., Antol, S., Mitchell, M., Batra, D., & Parikh, D. (2016). Measuring machine intelligence through visual question answering. AI Magazine, 37(1), 63–72.
Zoph, B., & Le, QV. (2017). Neural architecture search with reinforcement learning. In ICLR, OpenReview.net.
Metadata
Title
Deep Nets: What have They Ever Done for Vision?
Authors
Alan L. Yuille
Chenxi Liu
Publication date
27.11.2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01405-z
