Published in: Neural Processing Letters 5/2021

19-06-2021

Convolutional Feature Frequency Adaptive Fusion Object Detection Network

Authors: Lin Mao, Xuemeng Li, Dawei Yang, Rubo Zhang

Abstract

As convolutional layers deepen during feature extraction in deep learning networks, object detection performance degrades along with the gradual loss of feature integrity. This paper proposes a convolutional feature frequency adaptive fusion object detection network to compensate for the frequency information lost during convolutional feature propagation. Two branches carry the high- and low-frequency-domain channel information to keep feature delivery stable. The adaptive feature fusion network restores the contribution of the missing high-frequency features, improves the integrity of the features extracted by the convolutional neural network, and improves detection performance. Simulation tests showed that the algorithm's detection results improve significantly on blurred objects, overlapping objects, and objects with low contrast against the background. On the Common Objects in Context (COCO) dataset, the detection results were more than 1% higher than those of the CornerNet algorithm. The proposed algorithm therefore performs well for detecting pedestrians, vehicles, and other objects, and is suitable for application in autonomous vehicle systems and smart robots.
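
The abstract does not give the layer-level design, so the following is only a toy sketch of the general idea of splitting a feature into low- and high-frequency branches and fusing them with adaptive weights. The moving-average filter, the residual definition of the high-frequency branch, the softmax weighting, and all function names (`split_frequencies`, `adaptive_fuse`) are assumptions made for illustration, not the authors' method.

```python
import math


def split_frequencies(feature, window=3):
    """Split a 1-D feature row into low- and high-frequency parts.

    Assumption: low frequency = moving average with replicated borders,
    high frequency = residual, so low + high reproduces the input.
    """
    n = len(feature)
    half = window // 2
    low = []
    for i in range(n):
        # Clamp indices at the borders (replicate padding).
        neighbours = [feature[min(max(j, 0), n - 1)]
                      for j in range(i - half, i + half + 1)]
        low.append(sum(neighbours) / len(neighbours))
    high = [x - l for x, l in zip(feature, low)]
    return low, high


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(low, high, logits):
    """Fuse the two branches with weights that sum to one.

    Assumption: in a real network the two logits would be learned;
    here they are passed in by hand.
    """
    w_low, w_high = softmax(logits)
    return [w_low * l + w_high * h for l, h in zip(low, high)]


feature = [1.0, 1.0, 5.0, 1.0, 1.0]        # a single sharp peak
low, high = split_frequencies(feature)
fused = adaptive_fuse(low, high, [0.0, 1.0])  # favour the high-frequency branch
```

Raising the second logit shifts weight toward the high-frequency branch, which in this toy setting emphasises the sharp peak that the smoothing step flattens.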
Metadata
Title
Convolutional Feature Frequency Adaptive Fusion Object Detection Network
Authors
Lin Mao
Xuemeng Li
Dawei Yang
Rubo Zhang
Publication date
19-06-2021
Publisher
Springer US
Published in
Neural Processing Letters / Issue 5/2021
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-021-10560-4
