Published in: International Journal of Computer Vision 9/2019

23.03.2019

Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization

Authors: Xiangteng He, Yuxin Peng, Junjie Zhao

Abstract

Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories that belong to the same superclass. Since the distinctions among similar subcategories are subtle and local, telling them apart is highly challenging even for humans, so localizing the distinctions is essential for fine-grained visual categorization. Two problems are pivotal: (1) Which regions are discriminative and representative enough to distinguish a subcategory from the others? (2) How many discriminative regions are necessary to achieve the best categorization performance? Addressing these two problems adaptively and intelligently remains difficult: existing mainstream methods rely on artificial priors and experimental validation to decide which and how many regions to gaze at, which severely restricts their usability and scalability. To address these two problems, this paper proposes a multi-scale and multi-granularity deep reinforcement learning approach (M2DRL), which learns multi-granularity discriminative region attention and multi-scale region-based feature representation. Its main contributions are as follows: (1) Multi-granularity discriminative localization is proposed to localize the distinctions via a two-stage deep reinforcement learning approach, which discovers discriminative regions at multiple granularities in a hierarchical manner (the "which" problem) and determines the number of discriminative regions in an automatic and adaptive manner (the "how many" problem). (2) Multi-scale representation learning localizes regions at different scales and encodes images at different scales, boosting fine-grained categorization performance. (3) A semantic reward function is proposed to drive M2DRL to fully capture salient and conceptual visual information by jointly considering attention and category information in the reward, which allows the deep reinforcement learning to localize the distinctions in a weakly supervised or even unsupervised manner. (4) Unsupervised discriminative localization is further explored to avoid the heavy labor of annotation, greatly strengthening the usability and scalability of our M2DRL approach. Compared with state-of-the-art methods on two widely used fine-grained visual categorization datasets, M2DRL achieves the best categorization accuracy.
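The core idea of the abstract, an agent that sequentially selects discriminative regions and stops adaptively once further regions no longer help, can be illustrated with a toy tabular Q-learning sketch. This is not the authors' implementation: the candidate regions, saliency scores, confidence gains, reward weights, and step cost below are all hypothetical stand-ins for signals that the paper learns from data, and the tiny discrete action space replaces the paper's continuous multi-granularity localization.

```python
import random
from collections import defaultdict

# Hypothetical candidate regions with toy saliency and classifier-confidence
# scores; in the paper these signals are learned, not hard-coded.
REGIONS = {
    "head":       {"saliency": 0.9, "conf_gain": 0.30},
    "wing":       {"saliency": 0.7, "conf_gain": 0.20},
    "background": {"saliency": 0.2, "conf_gain": -0.10},
}
STOP = "stop"

def semantic_reward(action):
    """Toy analogue of a reward that jointly weighs attention (saliency)
    and category information (confidence gain); the step cost of 0.1 makes
    the agent stop once extra regions no longer pay off ("how many")."""
    if action == STOP:
        return 0.0
    r = REGIONS[action]
    return 0.5 * r["saliency"] + 0.5 * r["conf_gain"] - 0.1

def actions_for(state):
    # A region can be gazed at only once per episode.
    return [a for a in REGIONS if a not in state] + [STOP]

def train(episodes=3000, alpha=0.1, gamma=0.9, eps=0.3, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> value; state = chosen regions
    for _ in range(episodes):
        state = ()
        while True:
            acts = actions_for(state)
            a = (rng.choice(acts) if rng.random() < eps
                 else max(acts, key=lambda x: q[(state, x)]))
            r = semantic_reward(a)
            if a == STOP:
                q[(state, a)] += alpha * (r - q[(state, a)])
                break
            nxt = tuple(sorted(set(state) | {a}))
            target = r + gamma * max(q[(nxt, b)] for b in actions_for(nxt))
            q[(state, a)] += alpha * (target - q[(state, a)])
            state = nxt
    return q

def select_regions(q):
    """Greedy rollout: which regions to gaze at, and how many."""
    state, picked = (), []
    while True:
        a = max(actions_for(state), key=lambda x: q[(state, x)])
        if a == STOP:
            return picked
        picked.append(a)
        state = tuple(sorted(set(state) | {a}))

q = train()
selected = select_regions(q)
```

Under these toy numbers the learned policy picks the two regions whose reward exceeds the step cost ("head" and "wing") and stops before "background", so both the "which" and the "how many" decisions emerge from the reward rather than from a hand-set region count.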


Metadata
Title
Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization
Authors
Xiangteng He
Yuxin Peng
Junjie Zhao
Publication date
23.03.2019
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 9/2019
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-019-01176-2
