
2016 | OriginalPaper | Chapter

Category Aggregation Among Region Proposals for Object Detection

Authors: Linghui Li, Sheng Tang, Jianshe Zhou, Bin Wang, Qi Tian

Published in: Advances in Multimedia Information Processing - PCM 2016

Publisher: Springer International Publishing

Abstract

Recently, the overwhelming majority of object detection methods have focused on reducing the number of region proposals while keeping object recall high, without considering category information. This can produce many false positives caused by interference between categories, especially when the number of categories is very large. To eliminate such interference, we propose a novel category aggregation approach based on our observation that categories detected more frequently around an object have a higher probability of being present in the image. By further exploiting the co-occurrence relationships between categories, we can determine the most probable categories for an image in advance. Many false positives can thus be filtered out before the subsequent classification process. Our extensive experiments on the well-known ILSVRC 2015 detection dataset show that our approach achieves 49.0% mAP on the validation set and 45.36% mAP on the test set, ranking 5th in the ILSVRC 2015 detection task.
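
The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the function names (`aggregate_categories`, `filter_detections`), the parameters (`top_k`, `co_threshold`), the dictionary-based co-occurrence table, and the toy data are all assumptions made here for illustration. The sketch counts how often each category is predicted among an image's region proposals, keeps the most frequent ones, adds categories that strongly co-occur with them, and then discards detections outside that set.

```python
from collections import Counter


def aggregate_categories(proposal_labels, cooccurrence, top_k=2, co_threshold=0.5):
    """Pick likely image-level categories from per-proposal labels.

    proposal_labels: one predicted category per region proposal (assumed input).
    cooccurrence: dict mapping category -> {other category: score} (assumed format).
    top_k, co_threshold: hypothetical tuning parameters, not from the paper.
    """
    counts = Counter(proposal_labels)
    # Categories predicted most often among the proposals are treated as
    # the most likely to be present in the image.
    likely = {cat for cat, _ in counts.most_common(top_k)}
    # Add categories whose co-occurrence score with a selected category
    # reaches the threshold.
    expanded = set(likely)
    for cat in likely:
        for other, score in cooccurrence.get(cat, {}).items():
            if score >= co_threshold:
                expanded.add(other)
    return expanded


def filter_detections(detections, allowed):
    """Keep only (category, score) detections whose category is allowed."""
    return [d for d in detections if d[0] in allowed]


# Toy data, invented for this example.
labels = ["dog", "dog", "person", "dog", "sofa", "person", "cat"]
cooc = {"dog": {"leash": 0.8, "car": 0.1}, "person": {"bicycle": 0.3}}

allowed = aggregate_categories(labels, cooc)
detections = [("dog", 0.9), ("cat", 0.6), ("leash", 0.4), ("sofa", 0.7)]
kept = filter_detections(detections, allowed)
print(sorted(allowed))  # ['dog', 'leash', 'person']
print(kept)             # [('dog', 0.9), ('leash', 0.4)]
```

In this toy run, "cat" and "sofa" are each predicted only once and do not co-occur strongly with the selected categories, so their detections are filtered out before any later classification stage would see them.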

Metadata
Title
Category Aggregation Among Region Proposals for Object Detection
Authors
Linghui Li
Sheng Tang
Jianshe Zhou
Bin Wang
Qi Tian
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-48896-7_21