Published in: International Journal of Computer Vision 5/2018

Open Access 17 October 2017

Do Semantic Parts Emerge in Convolutional Neural Networks?

Authors: Abel Gonzalez-Garcia, Davide Modolo, Vittorio Ferrari


Abstract

Semantic object parts can be useful for several visual recognition tasks. Lately, these tasks have been addressed using Convolutional Neural Networks (CNNs), achieving outstanding results. In this work we study whether CNNs learn semantic parts in their internal representation. We investigate the responses of convolutional filters and try to associate their stimuli with semantic parts. We perform two extensive quantitative analyses. First, we use ground-truth part bounding-boxes from the PASCAL-Part dataset to determine how many of those semantic parts emerge in the CNN. We explore this emergence for different layers, network depths, and supervision levels. Second, we collect human judgements in order to study what fraction of all filters systematically fire on any semantic part, even if not annotated in PASCAL-Part. Moreover, we explore several connections between discriminative power and semantics. We identify the most discriminative filters for object recognition, and analyze whether they respond to semantic parts or to other image patches. We also investigate the other direction: we determine which semantic parts are the most discriminative and whether they correspond to those parts emerging in the network. This enables us to gain an even deeper understanding of the role of semantic parts in the network.

Metadata
Title
Do Semantic Parts Emerge in Convolutional Neural Networks?
Authors
Abel Gonzalez-Garcia
Davide Modolo
Vittorio Ferrari
Publication date
17 October 2017
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 5/2018
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-017-1048-0
