

23.02.2022

Fixed-Size Objects Encoding for Visual Relationship Detection

Authors: Hengyue Pan, Xin Niu, Siqi Shen, Yixin Chen, Peng Qiao, Zhen Huang, Dongsheng Li

Published in: Neural Processing Letters | Issue 4/2022


Abstract

In this paper, we propose a fixed-size object encoding method, FOE-VRD, to improve the performance of visual relationship detection. For each relationship triplet in a given image, FOE-VRD considers not only the subject and object but also encodes all background objects of the image in one fixed-size vector, introducing additional background knowledge that assists the relationship detector. We first use a standard convolutional neural network as a feature extractor to generate high-level features of the input image. Then, for each relationship triplet, we apply ROI-pooling to the bounding boxes of the subject and the object to obtain two corresponding feature vectors. In addition, we propose a novel method to encode all background objects in each image with one fixed-size vector (the FBE vector). By concatenating the three resulting feature vectors, we encode the relationship as a single fixed-size vector, which is then fed into a fully connected neural network to obtain the predicate classification result. Experimental results on the VRD and Visual Genome databases show that the proposed method performs well on both predicate classification and relationship detection, especially in the zero-shot setting.
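The abstract describes a concrete pipeline: a CNN backbone produces a feature map, ROI-pooling over the subject and object boxes yields two fixed-size vectors, a third fixed-size vector summarizes all background objects, and the concatenation is classified by a fully connected network. The following is a minimal sketch of that flow in PyTorch (an assumption; the abstract does not fix a framework or backbone). The exact construction of the FBE vector is not given in the abstract, so an average of ROI-pooled background features stands in for it, and all class and variable names here are illustrative, not the authors' implementation.

```python
# Minimal sketch of the FOE-VRD pipeline described in the abstract.
# Assumptions: PyTorch/torchvision, a ResNet-18 backbone truncated at its C4
# stage (stride 16, 256 channels), and an averaged ROI feature as a stand-in
# for the paper's FBE background encoding.
import torch
import torch.nn as nn
from torchvision.models import resnet18
from torchvision.ops import roi_pool


class FOEVRDSketch(nn.Module):
    def __init__(self, num_predicates: int, pool_size: int = 7):
        super().__init__()
        backbone = resnet18(weights=None)
        # Keep conv1 .. layer3 only: output stride 16, 256 channels.
        self.features = nn.Sequential(*list(backbone.children())[:-3])
        self.pool_size = pool_size
        feat_dim = 256 * pool_size * pool_size
        # Subject, object, and background encodings are concatenated.
        self.classifier = nn.Sequential(
            nn.Linear(3 * feat_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_predicates),
        )

    def _pool(self, fmap: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # boxes: [K, 4] in image coordinates for a single image (batch index 0).
        rois = torch.cat([boxes.new_zeros(len(boxes), 1), boxes], dim=1)
        pooled = roi_pool(fmap, rois, (self.pool_size, self.pool_size),
                          spatial_scale=1.0 / 16)
        return pooled.flatten(1)  # [K, feat_dim]

    def forward(self, image, subj_box, obj_box, background_boxes):
        fmap = self.features(image)                    # [1, 256, H/16, W/16]
        subj = self._pool(fmap, subj_box)              # [1, feat_dim]
        obj = self._pool(fmap, obj_box)                # [1, feat_dim]
        if len(background_boxes) > 0:
            # Stand-in for the FBE vector: one fixed-size vector summarizing
            # all background objects, here via the mean of pooled features.
            bg = self._pool(fmap, background_boxes).mean(0, keepdim=True)
        else:
            bg = torch.zeros_like(subj)
        triplet = torch.cat([subj, obj, bg], dim=1)    # fixed-size encoding
        return self.classifier(triplet)                # predicate logits


# Example: one 600x800 image, one subject/object pair, two background objects.
model = FOEVRDSketch(num_predicates=70)
img = torch.randn(1, 3, 600, 800)
logits = model(img,
               subj_box=torch.tensor([[50.0, 60.0, 200.0, 300.0]]),
               obj_box=torch.tensor([[220.0, 80.0, 400.0, 350.0]]),
               background_boxes=torch.tensor([[10.0, 10.0, 80.0, 90.0],
                                              [300.0, 400.0, 500.0, 550.0]]))
```

Whatever the actual FBE construction, the key property illustrated here is that the background summary has the same dimensionality regardless of how many background objects the image contains, so the classifier always receives an input of fixed size.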
Metadata
Title
Fixed-Size Objects Encoding for Visual Relationship Detection
Authors
Hengyue Pan
Xin Niu
Siqi Shen
Yixin Chen
Peng Qiao
Zhen Huang
Dongsheng Li
Publication date
23.02.2022
Publisher
Springer US
Published in
Neural Processing Letters / Issue 4/2022
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-022-10766-0
