
Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization

Authors: Xiangteng He, Yuxin Peng, Junjie Zhao

Published in: International Journal of Computer Vision | Issue 9/2019


Abstract

Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories that belong to the same superclass. Since the distinctions among similar subcategories are subtle and local, it is highly challenging to tell them apart, even for humans. The localization of these distinctions is therefore essential for fine-grained visual categorization, and it poses two pivotal problems: (1) Which regions are discriminative and representative enough to distinguish one subcategory from the others? (2) How many discriminative regions are necessary to achieve the best categorization performance? Addressing these two problems adaptively and intelligently remains difficult: existing mainstream methods rely on artificial priors and experimental validation to decide which and how many regions to gaze at, which severely restricts their usability and scalability. To address both problems, this paper proposes a multi-scale and multi-granularity deep reinforcement learning approach (M2DRL), which learns multi-granularity discriminative region attention and multi-scale region-based feature representation. Its main contributions are as follows: (1) Multi-granularity discriminative localization is proposed to localize the distinctions via a two-stage deep reinforcement learning approach, which discovers discriminative regions at multiple granularities in a hierarchical manner (the "which" problem) and determines the number of discriminative regions automatically and adaptively (the "how many" problem). (2) Multi-scale representation learning localizes regions at different scales and encodes images at different scales, boosting fine-grained visual categorization performance. (3) A semantic reward function is proposed to drive M2DRL to fully capture salient and conceptual visual information by jointly considering attention and category information in the reward, which allows the deep reinforcement learning to localize the distinctions in a weakly supervised or even unsupervised manner. (4) Unsupervised discriminative localization is further explored to avoid the heavy labor of annotation and to greatly strengthen the usability and scalability of the M2DRL approach. Compared with state-of-the-art methods on two widely used fine-grained visual categorization datasets, our M2DRL approach achieves the best categorization accuracy.
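To make the pipeline described in the abstract more concrete, the sketch below is a minimal illustration, not the authors' released implementation: it replaces the learned two-stage deep Q-network with a simple greedy loop, and the function names (saliency_score, category_confidence, semantic_reward), the sliding-window candidate generation, the 0.5 reward weighting, and the stopping threshold min_gain are all assumptions made for illustration. It only shows the shape of the idea: score regions at several granularities with a reward that jointly considers attention and category information, and stop adding regions once the marginal reward becomes negligible, which is how the "how many" decision can be made adaptively.

```python
# Illustrative sketch only (hypothetical names and heuristics, not the M2DRL code):
# propose regions at several granularities, score them with a "semantic reward" that
# mixes saliency (attention) and category confidence, and stop adding regions once the
# reward gain becomes negligible.
import numpy as np

def saliency_score(image, box):
    """Attention term: mean activation inside the box (placeholder for a saliency map)."""
    x0, y0, x1, y1 = box
    return float(image[y0:y1, x0:x1].mean())

def category_confidence(image, box, rng):
    """Category term: stand-in for a classifier's confidence on the cropped region."""
    # A real system would run a CNN on the crop; here we return a dummy value.
    return float(rng.uniform(0.0, 1.0))

def semantic_reward(image, box, rng, alpha=0.5):
    """Jointly weight attention and category information (weighting is an assumption)."""
    return alpha * saliency_score(image, box) + (1 - alpha) * category_confidence(image, box, rng)

def candidate_boxes(h, w, granularity):
    """Sliding-window candidates; finer granularity -> smaller windows (multi-granularity)."""
    size = max(h, w) // granularity
    return [(x0, y0, x0 + size, y0 + size)
            for y0 in range(0, h - size + 1, size)
            for x0 in range(0, w - size + 1, size)]

def localize(image, granularities=(2, 4), min_gain=0.05, seed=0):
    """Greedy stand-in for the learned two-stage policy: pick regions coarse to fine and
    stop when the marginal reward drops below min_gain (the 'how many' problem)."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    chosen, total = [], 0.0
    for g in granularities:                       # one stage per granularity (coarse -> fine)
        scored = [(box, semantic_reward(image, box, rng)) for box in candidate_boxes(h, w, g)]
        for box, gain in sorted(scored, key=lambda t: -t[1]):
            if gain < min_gain:
                break                             # adaptive stopping: no more useful regions
            chosen.append((g, box, gain))
            total += gain
    return chosen, total

if __name__ == "__main__":
    img = np.zeros((64, 64))
    img[16:32, 16:32] = 1.0                       # toy image with one salient patch
    regions, score = localize(img)
    for g, box, r in regions:
        print(f"granularity={g} box={box} reward={r:.2f}")
```

In the actual M2DRL approach, the greedy selection and fixed threshold above are replaced by a policy learned with deep reinforcement learning, so which regions to attend to and when to stop are driven by the semantic reward rather than hand-set heuristics.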


Metadata
Title
Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization
Authors
Xiangteng He
Yuxin Peng
Junjie Zhao
Publication date
23-03-2019
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 9/2019
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-019-01176-2
