Published in: Cognitive Computation 4/2023

09-01-2022

An Attention-Driven Multi-label Image Classification with Semantic Embedding and Graph Convolutional Networks

Authors: Dengdi Sun, Leilei Ma, Zhuanlian Ding, Bin Luo

Abstract

Multi-label image classification is a fundamental task in computer vision. Recent methods are mostly based on deep learning and perform well at image understanding. However, previous studies have captured only image content information using convolutional neural networks (CNNs), ignoring the semantic structure information and the implicit dependencies between labels and image regions. More effective methods for integrating semantic information and visual features are therefore needed. In this study, we propose a novel framework for multi-label image classification, named FLNet, which exploits visual features and semantic structure simultaneously. Specifically, to strengthen the association between semantic annotations and image regions, we first integrate an attention mechanism with a CNN to focus on the target regions while ignoring irrelevant surrounding information, and then employ a graph convolutional network (GCN) to capture the structural information between multiple labels. Building on this architecture, we also introduce lateral connections that repeatedly inject the label system into the CNN backbone during GCN learning to improve performance and, consequently, learn interdependent classifiers for each image label. Experiments on two public multi-label benchmark datasets, MS-COCO and the PASCAL Visual Object Classes challenge (VOC 2007), demonstrate that our approach outperforms existing state-of-the-art methods. Our method learns specific target regions and enhances the association between labels and image regions by using semantic information and the attention mechanism, thereby combining the advantages of visual and semantic information to further improve classification performance. Finally, the correctness and effectiveness of the proposed method are demonstrated by visualizing the classifier results.
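
The abstract gives no equations, so the following is only a minimal NumPy sketch of the general idea it describes: a GCN propagates label embeddings over a label graph to produce one classifier per label, and those classifiers score an attention-pooled image feature. Everything concrete here is an assumption made for illustration, not the authors' FLNet: the symmetric adjacency normalisation, the two-layer GCN, the dot-product spatial attention, the random stand-in data, and all names and dimensions. The lateral connections mentioned in the abstract are not modelled.

```python
import numpy as np

rng = np.random.default_rng(0)


def normalize_adjacency(adj):
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2 (a common GCN choice, assumed here)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w, activate=True):
    """One graph convolution: mix label features along graph edges, then project."""
    out = a_norm @ h @ w
    return np.maximum(out, 0.0) if activate else out


def spatial_attention_pool(feature_map, query):
    """Softmax attention over image regions followed by a weighted sum.

    feature_map: (regions, channels), query: (channels,)
    """
    scores = feature_map @ query
    scores = scores - scores.max()              # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ feature_map, weights


# Hypothetical sizes: 4 labels, 8-d label embeddings, 16-d features, 7x7 regions.
num_labels, embed_dim, hidden_dim, channels, regions = 4, 8, 12, 16, 49

# Random stand-ins for a label co-occurrence graph, word embeddings, and CNN output.
adj = (rng.random((num_labels, num_labels)) > 0.5).astype(float)
adj = np.maximum(adj, adj.T)                    # make the graph undirected
np.fill_diagonal(adj, 0.0)
label_embeddings = rng.normal(size=(num_labels, embed_dim))
feature_map = rng.normal(size=(regions, channels))
query = rng.normal(size=channels)

# Untrained weights; in practice these would be learned end to end.
w1 = rng.normal(size=(embed_dim, hidden_dim)) * 0.1
w2 = rng.normal(size=(hidden_dim, channels)) * 0.1

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, label_embeddings, w1)
classifiers = gcn_layer(a_norm, hidden, w2, activate=False)  # (num_labels, channels)

image_feature, attn = spatial_attention_pool(feature_map, query)
logits = classifiers @ image_feature            # one score per label
probs = 1.0 / (1.0 + np.exp(-logits))           # independent sigmoid per label
print(probs.shape)
```

Because each row of `classifiers` is computed from its neighbours in the label graph, the per-label classifiers are interdependent rather than trained in isolation, which is the property the abstract emphasises.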

Metadata
Title
An Attention-Driven Multi-label Image Classification with Semantic Embedding and Graph Convolutional Networks
Authors
Dengdi Sun
Leilei Ma
Zhuanlian Ding
Bin Luo
Publication date
09-01-2022
Publisher
Springer US
Published in
Cognitive Computation / Issue 4/2023
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-021-09977-9
