Published in: International Journal of Computer Vision 3/2021

24.11.2020

Compositional Convolutional Neural Networks: A Robust and Interpretable Model for Object Recognition Under Occlusion

Authors: Adam Kortylewski, Qing Liu, Angtian Wang, Yihong Sun, Alan Yuille


Abstract

Computer vision systems in real-world applications need to be robust to partial occlusion while also being explainable. In this work, we show that black-box deep convolutional neural networks (DCNNs) have only limited robustness to partial occlusion. We overcome these limitations by unifying DCNNs with part-based models into Compositional Convolutional Neural Networks (CompositionalNets), an interpretable deep architecture with innate robustness to partial occlusion. Specifically, we propose to replace the fully connected classification head of DCNNs with a differentiable compositional model that can be trained end-to-end. The structure of the compositional model enables CompositionalNets to decompose images into objects and context, as well as to further decompose object representations in terms of individual parts and the object's pose. The generative nature of our compositional model enables it to localize occluders and to recognize objects based on their non-occluded parts. We conduct extensive experiments on image classification and object detection, using images of artificially occluded objects from the PASCAL3D+ and ImageNet datasets and real images of partially occluded vehicles from the MS-COCO dataset. Our experiments show that CompositionalNets built from several popular DCNN backbones (VGG-16, ResNet50, ResNeXt) improve by a large margin over their non-compositional counterparts at classifying and detecting partially occluded objects. Furthermore, they can localize occluders accurately despite being trained with class-level supervision only. Finally, we demonstrate that CompositionalNets provide human-interpretable predictions, as their individual components can be understood as detecting parts and estimating an object's viewpoint.
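
To make the architecture concrete, below is a minimal PyTorch-style sketch of the core idea: keep a DCNN's convolutional backbone and replace its fully connected classification head with a generative compositional head that scores features against part models, while an explicit occluder model explains away positions the object model fits poorly. This is an illustrative simplification, not the authors' released implementation; the names (CompositionalHead, n_parts, occluder_logp) and the choice of a single constant occluder likelihood and one mixture component per class are assumptions of this sketch.

```python
# Illustrative sketch only: names and simplifications are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class CompositionalHead(nn.Module):
    """Generative compositional head: vMF-like part kernels, a per-class spatial
    mixture over parts, and a constant occluder model that can explain away
    feature positions the object model fits poorly."""
    def __init__(self, n_classes, n_parts=512, feat_dim=512, grid=7, occluder_logp=0.0):
        super().__init__()
        # Part kernels mu_k: unit vectors in feature space (vMF means, fixed kappa).
        self.mu = nn.Parameter(F.normalize(torch.randn(n_parts, feat_dim), dim=1))
        # Per-class, per-position mixture over parts (one component per class here;
        # the paper uses several components to cover different poses).
        self.log_A = nn.Parameter(torch.zeros(n_classes, n_parts, grid, grid))
        self.occluder_logp = occluder_logp  # assumed constant; learned from data in the paper

    def forward(self, feats):                      # feats: (B, feat_dim, H, W)
        f = F.normalize(feats, dim=1)              # unit feature vector at each position
        # Unnormalized vMF log-likelihood of each part at each position: mu_k . f_p
        part_ll = torch.einsum('kd,bdhw->bkhw', self.mu, f)      # (B, K, H, W)
        log_mix = torch.log_softmax(self.log_A, dim=1)           # normalize over parts
        # Object evidence per class and position: log sum_k A_{y,k,p} exp(mu_k . f_p)
        obj_ll = torch.logsumexp(part_ll.unsqueeze(1) + log_mix.unsqueeze(0), dim=2)
        # Robust max over the two explanations of each position:
        # "object part" vs. "occluder". Occluded positions cost a bounded constant.
        pos_ll = torch.maximum(obj_ll, torch.full_like(obj_ll, self.occluder_logp))
        occluder_map = obj_ll < self.occluder_logp  # positions the occluder explains
        return pos_ll.flatten(2).sum(-1), occluder_map  # (B, n_classes) scores; occluder mask

backbone = models.vgg16(weights=None).features   # keep the conv layers, drop the fc head
head = CompositionalHead(n_classes=12)           # e.g. the 12 rigid PASCAL3D+ categories
scores, occ = head(backbone(torch.randn(1, 3, 224, 224)))  # 224x224 input -> 7x7 feature grid
pred = scores.argmax(dim=1)                      # classify from non-occluded evidence
```

Because each spatial position independently chooses between the object model and the occluder model, a few occluded positions contribute a bounded constant to the class score instead of dragging it down; the same per-position choice is what produces the occluder localization described above, with class-level supervision only.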


Metadata
Title
Compositional Convolutional Neural Networks: A Robust and Interpretable Model for Object Recognition Under Occlusion
Authors
Adam Kortylewski
Qing Liu
Angtian Wang
Yihong Sun
Alan Yuille
Publication date
24.11.2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01401-3
