
2020 | OriginalPaper | Chapter

HoughNet: Integrating Near and Long-Range Evidence for Bottom-Up Object Detection

Authors : Nermin Samet, Samet Hicsonmez, Emre Akbas

Published in: Computer Vision – ECCV 2020

Publisher: Springer International Publishing


Abstract

This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby generalizing and enhancing current object detection methodology, which typically relies on only local evidence. On the COCO dataset, HoughNet's best model achieves 46.4 AP (and 65.1 \(AP_{50}\)), performing on par with the state-of-the-art in bottom-up object detection and outperforming most major one-stage and two-stage methods. We further validate the effectiveness of our proposal in another task, namely, "labels to photo" image generation, by integrating the voting module of HoughNet into two different GAN models and showing that the accuracy is significantly improved in both cases. Code is available at https://github.com/nerminsamet/houghnet.
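To make the voting mechanism from the abstract concrete, here is a minimal NumPy sketch of Hough-style vote accumulation over a log-polar field. This is an illustrative toy, not the paper's implementation: the function names (`log_polar_bins`, `vote`) and the uniform per-bin weights are assumptions for this sketch, whereas HoughNet itself uses learned, class-conditional vote weights produced by the network.

```python
import numpy as np

def log_polar_bins(max_radius=8, n_rings=3, n_angles=4):
    """Assign each (dy, dx) offset within max_radius to a log-polar bin:
    rings spaced logarithmically in distance, sectors evenly in angle."""
    bins = {}
    for dy in range(-max_radius, max_radius + 1):
        for dx in range(-max_radius, max_radius + 1):
            r = np.hypot(dy, dx)
            if r == 0 or r > max_radius:
                continue  # the centre cell contributes its own score separately
            ring = min(int(np.log2(r)), n_rings - 1)
            sector = int((np.arctan2(dy, dx) + np.pi) / (2 * np.pi) * n_angles) % n_angles
            bins[(dy, dx)] = ring * n_angles + sector
    return bins

def vote(scores, bin_weights, max_radius=8, n_rings=3, n_angles=4):
    """Each location (y+dy, x+dx) casts a vote onto target (y, x),
    weighted by the log-polar bin its offset falls into."""
    H, W = scores.shape
    vote_map = scores.copy()  # self-vote from the centre cell
    for (dy, dx), b in log_polar_bins(max_radius, n_rings, n_angles).items():
        w = bin_weights[b]
        # Shift the score map by (dy, dx) via slicing and accumulate.
        src = scores[max(dy, 0):H + min(dy, 0), max(dx, 0):W + min(dx, 0)]
        vote_map[max(-dy, 0):H + min(-dy, 0),
                 max(-dx, 0):W + min(-dx, 0)] += w * src
    return vote_map

# A single strong response at (8, 8) spreads support to surrounding cells,
# so evidence at one location can boost object presence at another.
scores = np.zeros((16, 16))
scores[8, 8] = 1.0
weights = np.full(3 * 4, 0.1)  # one (here uniform) weight per log-polar bin
vm = vote(scores, weights)
```

The log-polar layout gives fine spatial resolution near the target and coarse resolution far away, which is what lets long-range evidence contribute without an explosion in the number of vote parameters.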


Appendix
Footnotes
1
We provide a step-by-step animation of the voting process at https://shorturl.at/ilOP2.
Metadata
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-58595-2_25
