Neural Processing Letters

Publication date: 26-08-2022

Refine-FPN: Instance Segmentation Based on a Non-local Multi-feature Aggregation Mechanism

Authors: Xiaolian Li, Lei Zhu, Wenwu Wang, Ke Yang

Published in: Neural Processing Letters | Issue 3/2023

Abstract

Making effective use of the multilevel structure of deep networks to extract multiscale features is crucial for instance segmentation. The Feature Pyramid Network (FPN) is a classical architecture that enriches the semantic information of multiscale objects. However, inherent defects in the FPN structure inevitably cause information loss during feature extraction and feature fusion. In this paper, we propose a feature pyramid structure, called Refine-FPN, based on a non-local multi-feature aggregation operation: a module that integrates multiscale features and relies on attention mechanisms to improve the pyramid feature representation. The algorithm enriches the details of each feature layer by aggregating multiple features into a global contextual feature representation. Replacing FPN with Refine-FPN in Mask R-CNN improves mask AP on the COCO dataset by 0.6% and 0.5% with ResNet-50 and ResNet-101 backbones, respectively. Moreover, the proposed method is easy to integrate into other popular architectures. For example, equipping Cascade Mask R-CNN with Refine-FPN improves mask AP by 0.5% and 0.4% with ResNet-50 and ResNet-101, respectively.
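
The abstract does not spell out the internals of the aggregation module, so the following is only a minimal, dependency-free sketch of the general idea it names: resize several pyramid levels to a common size, fuse them, apply a non-local (dot-product attention) operation to the fused map, and add the resulting context back to each level. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`resize_nearest`, `non_local`, `refine_pyramid`), fusion by averaging, the residual addition, the absence of learned projections, and the treatment of each feature map as a flat list of positions with one channel vector per position.

```python
import math


def resize_nearest(feat, size):
    """Nearest-neighbour resize of a flat feature map (list of channel vectors)."""
    n = len(feat)
    return [feat[i * n // size] for i in range(size)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def non_local(feat):
    """Plain non-local operation: every position becomes a softmax-weighted
    sum of all positions, weighted by dot-product similarity.
    (Hypothetical simplification: no learned query/key/value projections.)"""
    out = []
    for query in feat:
        scores = [sum(a * b for a, b in zip(query, key)) for key in feat]
        weights = softmax(scores)
        out.append([
            sum(w * value[c] for w, value in zip(weights, feat))
            for c in range(len(query))
        ])
    return out


def refine_pyramid(levels):
    """Aggregate pyramid levels, build a global context, redistribute it.
    Assumes every level has the same number of channels."""
    # 1. Resize all levels to the size of the middle level.
    target = len(levels[len(levels) // 2])
    resized = [resize_nearest(f, target) for f in levels]

    # 2. Fuse by averaging across levels (an assumed fusion rule).
    channels = len(levels[0][0])
    fused = [
        [sum(r[i][c] for r in resized) / len(resized) for c in range(channels)]
        for i in range(target)
    ]

    # 3. Non-local attention over the fused map gives a global context.
    context = non_local(fused)

    # 4. Resize the context back to each level and add it as a residual.
    refined = []
    for f in levels:
        ctx = resize_nearest(context, len(f))
        refined.append([[x + y for x, y in zip(p, c)] for p, c in zip(f, ctx)])
    return refined


if __name__ == "__main__":
    # Three toy levels with 8, 4 and 2 positions, 2 channels each.
    pyramid = [[[1.0, 0.0]] * 8, [[0.0, 1.0]] * 4, [[1.0, 1.0]] * 2]
    for level in refine_pyramid(pyramid):
        print(len(level), level[0])
```

Each refined level keeps its original resolution, so a sketch like this could in principle sit between an FPN-style neck and the detection heads without changing their input shapes. A real implementation would operate on 2D tensors and use learned projections.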

Title: Refine-FPN: Instance Segmentation Based on a Non-local Multi-feature Aggregation Mechanism
Authors: Xiaolian Li, Lei Zhu, Wenwu Wang, Ke Yang
Publication date: 26-08-2022
Publisher: Springer US
Published in: Neural Processing Letters / Issue 3/2023
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI: https://doi.org/10.1007/s11063-022-11016-z
