Published in: International Journal of Computer Vision 9/2019

23.03.2019

Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization

Authors: Xiangteng He, Yuxin Peng, Junjie Zhao

Abstract

Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories that belong to the same superclass. Since the distinctions among similar subcategories are subtle and local, telling them apart is highly challenging even for humans, so localizing the distinctions is essential for fine-grained visual categorization. Two problems are pivotal: (1) Which regions are discriminative and representative enough to distinguish a subcategory from the others? (2) How many discriminative regions are necessary to achieve the best categorization performance? Addressing these two problems adaptively and intelligently remains difficult: existing mainstream methods rely on artificial priors and experimental validation to decide which and how many regions to gaze at, which severely restricts their usability and scalability. To address these two problems, this paper proposes a multi-scale and multi-granularity deep reinforcement learning approach (M2DRL), which learns multi-granularity discriminative region attention and multi-scale region-based feature representation. Its main contributions are as follows: (1) Multi-granularity discriminative localization is proposed to localize the distinctions via a two-stage deep reinforcement learning approach, which discovers discriminative regions at multiple granularities in a hierarchical manner (the "which" problem) and determines the number of discriminative regions in an automatic and adaptive manner (the "how many" problem). (2) Multi-scale representation learning localizes regions at different scales and encodes images at different scales, boosting fine-grained categorization performance. (3) A semantic reward function is proposed to drive M2DRL to fully capture salient and conceptual visual information by jointly considering attention and category information in the reward, which allows the deep reinforcement learning to localize the distinctions in a weakly supervised or even unsupervised manner. (4) Unsupervised discriminative localization is further explored to avoid the heavy labor of annotation, greatly strengthening the usability and scalability of our M2DRL approach. Compared with state-of-the-art methods on two widely used fine-grained visual categorization datasets, M2DRL achieves the best categorization accuracy.
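The core idea of the abstract, an agent that sequentially selects discriminative regions and stops adaptively once further regions no longer help, can be illustrated with a toy tabular Q-learning sketch. This is not the authors' implementation: the candidate regions, saliency scores, confidence gains, reward weights, and step cost below are all hypothetical stand-ins for signals that the paper learns from data, and the tiny discrete action space replaces the paper's continuous multi-granularity localization.

```python
import random
from collections import defaultdict

# Hypothetical candidate regions with toy saliency and classifier-confidence
# scores; in the paper these signals are learned, not hard-coded.
REGIONS = {
    "head":       {"saliency": 0.9, "conf_gain": 0.30},
    "wing":       {"saliency": 0.7, "conf_gain": 0.20},
    "background": {"saliency": 0.2, "conf_gain": -0.10},
}
STOP = "stop"

def semantic_reward(action):
    """Toy analogue of a reward that jointly weighs attention (saliency)
    and category information (confidence gain); the step cost of 0.1 makes
    the agent stop once extra regions no longer pay off ("how many")."""
    if action == STOP:
        return 0.0
    r = REGIONS[action]
    return 0.5 * r["saliency"] + 0.5 * r["conf_gain"] - 0.1

def actions_for(state):
    # A region can be gazed at only once per episode.
    return [a for a in REGIONS if a not in state] + [STOP]

def train(episodes=3000, alpha=0.1, gamma=0.9, eps=0.3, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> value; state = chosen regions
    for _ in range(episodes):
        state = ()
        while True:
            acts = actions_for(state)
            a = (rng.choice(acts) if rng.random() < eps
                 else max(acts, key=lambda x: q[(state, x)]))
            r = semantic_reward(a)
            if a == STOP:
                q[(state, a)] += alpha * (r - q[(state, a)])
                break
            nxt = tuple(sorted(set(state) | {a}))
            target = r + gamma * max(q[(nxt, b)] for b in actions_for(nxt))
            q[(state, a)] += alpha * (target - q[(state, a)])
            state = nxt
    return q

def select_regions(q):
    """Greedy rollout: which regions to gaze at, and how many."""
    state, picked = (), []
    while True:
        a = max(actions_for(state), key=lambda x: q[(state, x)])
        if a == STOP:
            return picked
        picked.append(a)
        state = tuple(sorted(set(state) | {a}))

q = train()
selected = select_regions(q)
```

Under these toy numbers the learned policy picks the two regions whose reward exceeds the step cost ("head" and "wing") and stops before "background", so both the "which" and the "how many" decisions emerge from the reward rather than from a hand-set region count.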


Metadata
Title
Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization
Authors
Xiangteng He
Yuxin Peng
Junjie Zhao
Publication date
23.03.2019
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 9/2019
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-019-01176-2
