Published in: Neural Processing Letters 3/2023

26-08-2022

Refine-FPN: Instance Segmentation Based on a Non-local Multi-feature Aggregation Mechanism

Authors: Xiaolian Li, Lei Zhu, Wenwu Wang, Ke Yang


Abstract

Making good use of the multilevel structure of deep networks to extract multiscale features is crucial for instance segmentation. The Feature Pyramid Network (FPN) is a classical architecture that enriches the semantic information of multiscale objects. However, inherent defects in the FPN structure inevitably cause information loss during feature extraction and feature fusion. In this paper, we propose a feature pyramid structure, called Refine-FPN, based on a non-local multi-feature aggregation operation: a module that integrates multi-scale features and relies on attention mechanisms to improve the pyramid's feature representation. The algorithm enriches the detail of each feature layer by aggregating multiple features into a contextual, global feature representation. Replacing FPN with Refine-FPN in Mask R-CNN improves mask AP on the COCO dataset by 0.6% and 0.5% with ResNet-50 and ResNet-101 backbones, respectively. Moreover, the proposed method is easy to integrate into other popular architectures. For example, equipping Cascade Mask R-CNN with Refine-FPN improves mask AP by 0.5% and 0.4% with ResNet-50 and ResNet-101, respectively.
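The abstract does not give the details of the aggregation module, so the following is only a minimal NumPy sketch of the general idea it describes: pyramid levels are brought to a common resolution, averaged, passed through a non-local (self-attention) block, and the result is added back to every level. It is not the authors' implementation. The function names (`resize_nearest`, `non_local`, `refine_pyramid`), the averaging step, the nearest-neighbour resizing, the choice of reference level, and the random projection weights are all assumptions made for illustration. The batch dimension is omitted.

```python
import numpy as np

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map (illustrative choice)."""
    _, h, w = x.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return x[:, rows[:, None], cols[None, :]]

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x, w_q, w_k, w_v):
    """Non-local block (embedded-Gaussian form) on a (C, H, W) map, with a residual."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w).T                    # (N, C): one row per spatial position
    q, k, v = flat @ w_q, flat @ w_k, flat @ w_v    # query/key/value projections
    attn = softmax(q @ k.T, axis=-1)                # (N, N): every position attends to all others
    out = flat + attn @ v                           # residual connection
    return out.T.reshape(c, h, w)

def refine_pyramid(feats, w_q, w_k, w_v, ref_level=1):
    """Hypothetical aggregation: gather levels, fuse, apply attention, scatter back."""
    ref_h, ref_w = feats[ref_level].shape[1:]
    gathered = [resize_nearest(f, ref_h, ref_w) for f in feats]
    fused = np.mean(gathered, axis=0)               # simple average of all levels
    refined = non_local(fused, w_q, w_k, w_v)
    # Resize the refined map to each level's resolution and add it as a residual.
    return [f + resize_nearest(refined, f.shape[1], f.shape[2]) for f in feats]

# Toy pyramid: 4 levels, 8 channels, halving resolution at each level.
rng = np.random.default_rng(0)
C = 8
feats = [rng.standard_normal((C, s, s)) for s in (32, 16, 8, 4)]
w_q = rng.standard_normal((C, C // 2)) * 0.1        # random stand-ins for learned weights
w_k = rng.standard_normal((C, C // 2)) * 0.1
w_v = rng.standard_normal((C, C)) * 0.1
out = refine_pyramid(feats, w_q, w_k, w_v)
print([o.shape for o in out])
```

Each output level keeps the shape of its input level, so a block of this kind could in principle sit between a pyramid and the detection heads without changing their interfaces. In a trained network the projection weights would be learned, typically as 1x1 convolutions.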


Footnotes
1
To simplify the analysis, we omit the dimension in batch direction here.
 
Metadata
Title
Refine-FPN: Instance Segmentation Based on a Non-local Multi-feature Aggregation Mechanism
Authors
Xiaolian Li
Lei Zhu
Wenwu Wang
Ke Yang
Publication date
26-08-2022
Publisher
Springer US
Published in
Neural Processing Letters / Issue 3/2023
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-022-11016-z
